;
; jidctfst.asm - fast integer IDCT (MMX)
;
; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
; Copyright (C) 2016, D. R. Commander.
;
; Based on the x86 SIMD extension for IJG JPEG library
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208
;
; This file contains a fast, not so accurate integer implementation of
; the inverse DCT (Discrete Cosine Transform). The following code is
; based directly on the IJG's original jidctfst.c; see the jidctfst.c
; for more details.

%include "jsimdext.inc"
%include "jdct.inc"

; --------------------------------------------------------------------------

%define CONST_BITS  8  ; 14 is also OK.
%define PASS1_BITS  2

%if IFAST_SCALE_BITS != PASS1_BITS
%error "'IFAST_SCALE_BITS' must be equal to 'PASS1_BITS'."
%endif

%if CONST_BITS == 8
F_1_082 equ 277              ; FIX(1.082392200)
F_1_414 equ 362              ; FIX(1.414213562)
F_1_847 equ 473              ; FIX(1.847759065)
F_2_613 equ 669              ; FIX(2.613125930)
F_1_613 equ (F_2_613 - 256)  ; FIX(2.613125930) - FIX(1)
%else
; NASM cannot do compile-time arithmetic on floating-point constants.
%define DESCALE(x, n)  (((x) + (1 << ((n) - 1))) >> (n))
F_1_082 equ DESCALE(1162209775, 30 - CONST_BITS)  ; FIX(1.082392200)
F_1_414 equ DESCALE(1518500249, 30 - CONST_BITS)  ; FIX(1.414213562)
F_1_847 equ DESCALE(1984016188, 30 - CONST_BITS)  ; FIX(1.847759065)
F_2_613 equ DESCALE(2805822602, 30 - CONST_BITS)  ; FIX(2.613125930)
F_1_613 equ (F_2_613 - (1 << CONST_BITS))         ; FIX(2.613125930) - FIX(1)
%endif

; --------------------------------------------------------------------------
    SECTION     SEG_CONST

; PRE_MULTIPLY_SCALE_BITS <= 2 (to avoid overflow)
; CONST_BITS + CONST_SHIFT + PRE_MULTIPLY_SCALE_BITS == 16 (for pmulhw)

%define PRE_MULTIPLY_SCALE_BITS  2
%define CONST_SHIFT              (16 - PRE_MULTIPLY_SCALE_BITS - CONST_BITS)

    alignz      32
    GLOBAL_DATA(jconst_idct_ifast_mmx)

EXTN(jconst_idct_ifast_mmx):

PW_F1414       times 4 dw  F_1_414 << CONST_SHIFT
PW_F1847       times 4 dw  F_1_847 << CONST_SHIFT
PW_MF1613      times 4 dw -F_1_613 << CONST_SHIFT
PW_F1082       times 4 dw  F_1_082 << CONST_SHIFT
PB_CENTERJSAMP times 8 db  CENTERJSAMPLE

    alignz      32

; --------------------------------------------------------------------------
    SECTION     SEG_TEXT
    BITS        32
;
; Perform dequantization and inverse DCT on one block of coefficients.
;
; GLOBAL(void)
; jsimd_idct_ifast_mmx(void *dct_table, JCOEFPTR coef_block,
;                      JSAMPARRAY output_buf, JDIMENSION output_col)
;

%define dct_table(b)   (b) + 8          ; void *dct_table
%define coef_block(b)  (b) + 12         ; JCOEFPTR coef_block
%define output_buf(b)  (b) + 16         ; JSAMPARRAY output_buf
%define output_col(b)  (b) + 20         ; JDIMENSION output_col

%define original_ebp   ebp + 0
%define wk(i)          ebp - (WK_NUM - (i)) * SIZEOF_MMWORD
                                        ; mmword wk[WK_NUM]
%define WK_NUM         2
%define workspace      wk(0) - DCTSIZE2 * SIZEOF_JCOEF
                                        ; JCOEF workspace[DCTSIZE2]

    align       32
    GLOBAL_FUNCTION(jsimd_idct_ifast_mmx)

EXTN(jsimd_idct_ifast_mmx):
    push        ebp
    mov         eax, esp                    ; eax = original ebp
    sub         esp, byte 4
    and         esp, byte (-SIZEOF_MMWORD)  ; align to 64 bits
    mov         [esp], eax
    mov         ebp, esp                    ; ebp = aligned ebp
    lea         esp, [workspace]
    push        ebx
;   push        ecx                     ; need not be preserved
;   push        edx                     ; need not be preserved
    push        esi
    push        edi

    get_GOT     ebx                     ; get GOT address

    ; ---- Pass 1: process columns from input, store into work array.

;   mov         eax, [original_ebp]
    mov         edx, POINTER [dct_table(eax)]    ; quantptr
    mov         esi, JCOEFPTR [coef_block(eax)]  ; inptr
    lea         edi, [workspace]                 ; JCOEF *wsptr
    mov         ecx, DCTSIZE/4                   ; ctr
    alignx      16, 7
.columnloop:
%ifndef NO_ZERO_COLUMN_TEST_IFAST_MMX
    mov         eax, dword [DWBLOCK(1,0,esi,SIZEOF_JCOEF)]
    or          eax, dword [DWBLOCK(2,0,esi,SIZEOF_JCOEF)]
    jnz         short .columnDCT

    movq        mm0, MMWORD [MMBLOCK(1,0,esi,SIZEOF_JCOEF)]
    movq        mm1, MMWORD [MMBLOCK(2,0,esi,SIZEOF_JCOEF)]
    por         mm0, MMWORD [MMBLOCK(3,0,esi,SIZEOF_JCOEF)]
    por         mm1, MMWORD [MMBLOCK(4,0,esi,SIZEOF_JCOEF)]
    por         mm0, MMWORD [MMBLOCK(5,0,esi,SIZEOF_JCOEF)]
    por         mm1, MMWORD [MMBLOCK(6,0,esi,SIZEOF_JCOEF)]
    por         mm0, MMWORD [MMBLOCK(7,0,esi,SIZEOF_JCOEF)]
    por         mm1, mm0
    packsswb    mm1, mm1
    movd        eax, mm1
    test        eax, eax
    jnz         short .columnDCT

    ; -- AC terms all zero

    movq        mm0, MMWORD [MMBLOCK(0,0,esi,SIZEOF_JCOEF)]
    pmullw      mm0, MMWORD [MMBLOCK(0,0,edx,SIZEOF_IFAST_MULT_TYPE)]

    movq        mm2, mm0                ; mm0=in0=(00 01 02 03)
    punpcklwd   mm0, mm0                ; mm0=(00 00 01 01)
    punpckhwd   mm2, mm2                ; mm2=(02 02 03 03)

    movq        mm1, mm0
    punpckldq   mm0, mm0                ; mm0=(00 00 00 00)
    punpckhdq   mm1, mm1                ; mm1=(01 01 01 01)
    movq        mm3, mm2
    punpckldq   mm2, mm2                ; mm2=(02 02 02 02)
    punpckhdq   mm3, mm3                ; mm3=(03 03 03 03)

    movq        MMWORD [MMBLOCK(0,0,edi,SIZEOF_JCOEF)], mm0
    movq        MMWORD [MMBLOCK(0,1,edi,SIZEOF_JCOEF)], mm0
    movq        MMWORD [MMBLOCK(1,0,edi,SIZEOF_JCOEF)], mm1
    movq        MMWORD [MMBLOCK(1,1,edi,SIZEOF_JCOEF)], mm1
    movq        MMWORD [MMBLOCK(2,0,edi,SIZEOF_JCOEF)], mm2
    movq        MMWORD [MMBLOCK(2,1,edi,SIZEOF_JCOEF)], mm2
    movq        MMWORD [MMBLOCK(3,0,edi,SIZEOF_JCOEF)], mm3
    movq        MMWORD [MMBLOCK(3,1,edi,SIZEOF_JCOEF)], mm3
    jmp         near .nextcolumn
    alignx      16, 7
%endif
.columnDCT:

    ; -- Even part

    movq        mm0, MMWORD [MMBLOCK(0,0,esi,SIZEOF_JCOEF)]
    movq        mm1, MMWORD [MMBLOCK(2,0,esi,SIZEOF_JCOEF)]
    pmullw      mm0, MMWORD [MMBLOCK(0,0,edx,SIZEOF_IFAST_MULT_TYPE)]
    pmullw      mm1, MMWORD [MMBLOCK(2,0,edx,SIZEOF_IFAST_MULT_TYPE)]
    movq        mm2, MMWORD [MMBLOCK(4,0,esi,SIZEOF_JCOEF)]
    movq        mm3, MMWORD [MMBLOCK(6,0,esi,SIZEOF_JCOEF)]
    pmullw      mm2, MMWORD [MMBLOCK(4,0,edx,SIZEOF_IFAST_MULT_TYPE)]
    pmullw      mm3, MMWORD [MMBLOCK(6,0,edx,SIZEOF_IFAST_MULT_TYPE)]

    movq        mm4, mm0
    movq        mm5, mm1
    psubw       mm0, mm2                ; mm0=tmp11
    psubw       mm1, mm3
    paddw       mm4, mm2                ; mm4=tmp10
    paddw       mm5, mm3                ; mm5=tmp13

    psllw       mm1, PRE_MULTIPLY_SCALE_BITS
    pmulhw      mm1, [GOTOFF(ebx,PW_F1414)]
    psubw       mm1, mm5                ; mm1=tmp12

    movq        mm6, mm4
    movq        mm7, mm0
    psubw       mm4, mm5                ; mm4=tmp3
    psubw       mm0, mm1                ; mm0=tmp2
    paddw       mm6, mm5                ; mm6=tmp0
    paddw       mm7, mm1                ; mm7=tmp1

    movq        MMWORD [wk(1)], mm4     ; wk(1)=tmp3
    movq        MMWORD [wk(0)], mm0     ; wk(0)=tmp2

    ; -- Odd part

    movq        mm2, MMWORD [MMBLOCK(1,0,esi,SIZEOF_JCOEF)]
    movq        mm3, MMWORD [MMBLOCK(3,0,esi,SIZEOF_JCOEF)]
    pmullw      mm2, MMWORD [MMBLOCK(1,0,edx,SIZEOF_IFAST_MULT_TYPE)]
    pmullw      mm3, MMWORD [MMBLOCK(3,0,edx,SIZEOF_IFAST_MULT_TYPE)]
    movq        mm5, MMWORD [MMBLOCK(5,0,esi,SIZEOF_JCOEF)]
    movq        mm1, MMWORD [MMBLOCK(7,0,esi,SIZEOF_JCOEF)]
    pmullw      mm5, MMWORD [MMBLOCK(5,0,edx,SIZEOF_IFAST_MULT_TYPE)]
    pmullw      mm1, MMWORD [MMBLOCK(7,0,edx,SIZEOF_IFAST_MULT_TYPE)]

    movq        mm4, mm2
    movq        mm0, mm5
    psubw       mm2, mm1                ; mm2=z12
    psubw       mm5, mm3                ; mm5=z10
    paddw       mm4, mm1                ; mm4=z11
    paddw       mm0, mm3                ; mm0=z13

    movq        mm1, mm5                ; mm1=z10(unscaled)
    psllw       mm2, PRE_MULTIPLY_SCALE_BITS
    psllw       mm5, PRE_MULTIPLY_SCALE_BITS

    movq        mm3, mm4
    psubw       mm4, mm0
    paddw       mm3, mm0                ; mm3=tmp7

    psllw       mm4, PRE_MULTIPLY_SCALE_BITS
    pmulhw      mm4, [GOTOFF(ebx,PW_F1414)]  ; mm4=tmp11

    ; To avoid overflow...
    ;
    ; (Original)
    ; tmp12 = -2.613125930 * z10 + z5;
    ;
    ; (This implementation)
    ; tmp12 = (-1.613125930 - 1) * z10 + z5;
    ;       = -1.613125930 * z10 - z10 + z5;

    movq        mm0, mm5
    paddw       mm5, mm2
    pmulhw      mm5, [GOTOFF(ebx,PW_F1847)]   ; mm5=z5
    pmulhw      mm0, [GOTOFF(ebx,PW_MF1613)]
    pmulhw      mm2, [GOTOFF(ebx,PW_F1082)]
    psubw       mm0, mm1
    psubw       mm2, mm5                ; mm2=tmp10
    paddw       mm0, mm5                ; mm0=tmp12

    ; -- Final output stage

    psubw       mm0, mm3                ; mm0=tmp6
    movq        mm1, mm6
    movq        mm5, mm7
    paddw       mm6, mm3                ; mm6=data0=(00 01 02 03)
    paddw       mm7, mm0                ; mm7=data1=(10 11 12 13)
    psubw       mm1, mm3                ; mm1=data7=(70 71 72 73)
    psubw       mm5, mm0                ; mm5=data6=(60 61 62 63)
    psubw       mm4, mm0                ; mm4=tmp5

    movq        mm3, mm6                ; transpose coefficients(phase 1)
    punpcklwd   mm6, mm7                ; mm6=(00 10 01 11)
    punpckhwd   mm3, mm7                ; mm3=(02 12 03 13)
    movq        mm0, mm5                ; transpose coefficients(phase 1)
    punpcklwd   mm5, mm1                ; mm5=(60 70 61 71)
    punpckhwd   mm0, mm1                ; mm0=(62 72 63 73)

    movq        mm7, MMWORD [wk(0)]     ; mm7=tmp2
    movq        mm1, MMWORD [wk(1)]     ; mm1=tmp3

    movq        MMWORD [wk(0)], mm5     ; wk(0)=(60 70 61 71)
    movq        MMWORD [wk(1)], mm0     ; wk(1)=(62 72 63 73)

    paddw       mm2, mm4                ; mm2=tmp4
    movq        mm5, mm7
    movq        mm0, mm1
    paddw       mm7, mm4                ; mm7=data2=(20 21 22 23)
    paddw       mm1, mm2                ; mm1=data4=(40 41 42 43)
    psubw       mm5, mm4                ; mm5=data5=(50 51 52 53)
    psubw       mm0, mm2                ; mm0=data3=(30 31 32 33)

    movq        mm4, mm7                ; transpose coefficients(phase 1)
    punpcklwd   mm7, mm0                ; mm7=(20 30 21 31)
    punpckhwd   mm4, mm0                ; mm4=(22 32 23 33)
    movq        mm2, mm1                ; transpose coefficients(phase 1)
    punpcklwd   mm1, mm5                ; mm1=(40 50 41 51)
    punpckhwd   mm2, mm5                ; mm2=(42 52 43 53)

    movq        mm0, mm6                ; transpose coefficients(phase 2)
    punpckldq   mm6, mm7                ; mm6=(00 10 20 30)
    punpckhdq   mm0, mm7                ; mm0=(01 11 21 31)
    movq        mm5, mm3                ; transpose coefficients(phase 2)
    punpckldq   mm3, mm4                ; mm3=(02 12 22 32)
    punpckhdq   mm5, mm4                ; mm5=(03 13 23 33)

    movq        mm7, MMWORD [wk(0)]     ; mm7=(60 70 61 71)
    movq        mm4, MMWORD [wk(1)]     ; mm4=(62 72 63 73)

    movq        MMWORD [MMBLOCK(0,0,edi,SIZEOF_JCOEF)], mm6
    movq        MMWORD [MMBLOCK(1,0,edi,SIZEOF_JCOEF)], mm0
    movq        MMWORD [MMBLOCK(2,0,edi,SIZEOF_JCOEF)], mm3
    movq        MMWORD [MMBLOCK(3,0,edi,SIZEOF_JCOEF)], mm5

    movq        mm6, mm1                ; transpose coefficients(phase 2)
    punpckldq   mm1, mm7                ; mm1=(40 50 60 70)
    punpckhdq   mm6, mm7                ; mm6=(41 51 61 71)
    movq        mm0, mm2                ; transpose coefficients(phase 2)
    punpckldq   mm2, mm4                ; mm2=(42 52 62 72)
    punpckhdq   mm0, mm4                ; mm0=(43 53 63 73)

    movq        MMWORD [MMBLOCK(0,1,edi,SIZEOF_JCOEF)], mm1
    movq        MMWORD [MMBLOCK(1,1,edi,SIZEOF_JCOEF)], mm6
    movq        MMWORD [MMBLOCK(2,1,edi,SIZEOF_JCOEF)], mm2
    movq        MMWORD [MMBLOCK(3,1,edi,SIZEOF_JCOEF)], mm0

.nextcolumn:
    add         esi, byte 4*SIZEOF_JCOEF            ; coef_block
    add         edx, byte 4*SIZEOF_IFAST_MULT_TYPE  ; quantptr
    add         edi, byte 4*DCTSIZE*SIZEOF_JCOEF    ; wsptr
    dec         ecx                                 ; ctr
    jnz         near .columnloop

    ; ---- Pass 2: process rows from work array, store into output array.

    mov         eax, [original_ebp]
    lea         esi, [workspace]                   ; JCOEF *wsptr
    mov         edi, JSAMPARRAY [output_buf(eax)]  ; (JSAMPROW *)
    mov         eax, JDIMENSION [output_col(eax)]
    mov         ecx, DCTSIZE/4                     ; ctr
    alignx      16, 7
.rowloop:

    ; -- Even part

    movq        mm0, MMWORD [MMBLOCK(0,0,esi,SIZEOF_JCOEF)]
    movq        mm1, MMWORD [MMBLOCK(2,0,esi,SIZEOF_JCOEF)]
    movq        mm2, MMWORD [MMBLOCK(4,0,esi,SIZEOF_JCOEF)]
    movq        mm3, MMWORD [MMBLOCK(6,0,esi,SIZEOF_JCOEF)]

    movq        mm4, mm0
    movq        mm5, mm1
    psubw       mm0, mm2                ; mm0=tmp11
    psubw       mm1, mm3
    paddw       mm4, mm2                ; mm4=tmp10
    paddw       mm5, mm3                ; mm5=tmp13

    psllw       mm1, PRE_MULTIPLY_SCALE_BITS
    pmulhw      mm1, [GOTOFF(ebx,PW_F1414)]
    psubw       mm1, mm5                ; mm1=tmp12

    movq        mm6, mm4
    movq        mm7, mm0
    psubw       mm4, mm5                ; mm4=tmp3
    psubw       mm0, mm1                ; mm0=tmp2
    paddw       mm6, mm5                ; mm6=tmp0
    paddw       mm7, mm1                ; mm7=tmp1

    movq        MMWORD [wk(1)], mm4     ; wk(1)=tmp3
    movq        MMWORD [wk(0)], mm0     ; wk(0)=tmp2

    ; -- Odd part

    movq        mm2, MMWORD [MMBLOCK(1,0,esi,SIZEOF_JCOEF)]
    movq        mm3, MMWORD [MMBLOCK(3,0,esi,SIZEOF_JCOEF)]
    movq        mm5, MMWORD [MMBLOCK(5,0,esi,SIZEOF_JCOEF)]
    movq        mm1, MMWORD [MMBLOCK(7,0,esi,SIZEOF_JCOEF)]

    movq        mm4, mm2
    movq        mm0, mm5
    psubw       mm2, mm1                ; mm2=z12
    psubw       mm5, mm3                ; mm5=z10
    paddw       mm4, mm1                ; mm4=z11
    paddw       mm0, mm3                ; mm0=z13

    movq        mm1, mm5                ; mm1=z10(unscaled)
    psllw       mm2, PRE_MULTIPLY_SCALE_BITS
    psllw       mm5, PRE_MULTIPLY_SCALE_BITS

    movq        mm3, mm4
    psubw       mm4, mm0
    paddw       mm3, mm0                ; mm3=tmp7

    psllw       mm4, PRE_MULTIPLY_SCALE_BITS
    pmulhw      mm4, [GOTOFF(ebx,PW_F1414)]  ; mm4=tmp11

    ; To avoid overflow...
    ;
    ; (Original)
    ; tmp12 = -2.613125930 * z10 + z5;
    ;
    ; (This implementation)
    ; tmp12 = (-1.613125930 - 1) * z10 + z5;
    ;       = -1.613125930 * z10 - z10 + z5;

    movq        mm0, mm5
    paddw       mm5, mm2
    pmulhw      mm5, [GOTOFF(ebx,PW_F1847)]   ; mm5=z5
    pmulhw      mm0, [GOTOFF(ebx,PW_MF1613)]
    pmulhw      mm2, [GOTOFF(ebx,PW_F1082)]
    psubw       mm0, mm1
    psubw       mm2, mm5                ; mm2=tmp10
    paddw       mm0, mm5                ; mm0=tmp12

    ; -- Final output stage

    psubw       mm0, mm3                ; mm0=tmp6
    movq        mm1, mm6
    movq        mm5, mm7
    paddw       mm6, mm3                ; mm6=data0=(00 10 20 30)
    paddw       mm7, mm0                ; mm7=data1=(01 11 21 31)
    psraw       mm6, (PASS1_BITS+3)     ; descale
    psraw       mm7, (PASS1_BITS+3)     ; descale
    psubw       mm1, mm3                ; mm1=data7=(07 17 27 37)
    psubw       mm5, mm0                ; mm5=data6=(06 16 26 36)
    psraw       mm1, (PASS1_BITS+3)     ; descale
    psraw       mm5, (PASS1_BITS+3)     ; descale
    psubw       mm4, mm0                ; mm4=tmp5

    packsswb    mm6, mm5                ; mm6=(00 10 20 30 06 16 26 36)
    packsswb    mm7, mm1                ; mm7=(01 11 21 31 07 17 27 37)

    movq        mm3, MMWORD [wk(0)]     ; mm3=tmp2
    movq        mm0, MMWORD [wk(1)]     ; mm0=tmp3

    paddw       mm2, mm4                ; mm2=tmp4
    movq        mm5, mm3
    movq        mm1, mm0
    paddw       mm3, mm4                ; mm3=data2=(02 12 22 32)
    paddw       mm0, mm2                ; mm0=data4=(04 14 24 34)
    psraw       mm3, (PASS1_BITS+3)     ; descale
    psraw       mm0, (PASS1_BITS+3)     ; descale
    psubw       mm5, mm4                ; mm5=data5=(05 15 25 35)
    psubw       mm1, mm2                ; mm1=data3=(03 13 23 33)
    psraw       mm5, (PASS1_BITS+3)     ; descale
    psraw       mm1, (PASS1_BITS+3)     ; descale

    movq        mm4, [GOTOFF(ebx,PB_CENTERJSAMP)]  ; mm4=[PB_CENTERJSAMP]

438*dfc6aa5cSAndroid Build Coastguard Worker packsswb mm3, mm0 ; mm3=(02 12 22 32 04 14 24 34) 439*dfc6aa5cSAndroid Build Coastguard Worker packsswb mm1, mm5 ; mm1=(03 13 23 33 05 15 25 35) 440*dfc6aa5cSAndroid Build Coastguard Worker 441*dfc6aa5cSAndroid Build Coastguard Worker paddb mm6, mm4 442*dfc6aa5cSAndroid Build Coastguard Worker paddb mm7, mm4 443*dfc6aa5cSAndroid Build Coastguard Worker paddb mm3, mm4 444*dfc6aa5cSAndroid Build Coastguard Worker paddb mm1, mm4 445*dfc6aa5cSAndroid Build Coastguard Worker 446*dfc6aa5cSAndroid Build Coastguard Worker movq mm2, mm6 ; transpose coefficients(phase 1) 447*dfc6aa5cSAndroid Build Coastguard Worker punpcklbw mm6, mm7 ; mm6=(00 01 10 11 20 21 30 31) 448*dfc6aa5cSAndroid Build Coastguard Worker punpckhbw mm2, mm7 ; mm2=(06 07 16 17 26 27 36 37) 449*dfc6aa5cSAndroid Build Coastguard Worker movq mm0, mm3 ; transpose coefficients(phase 1) 450*dfc6aa5cSAndroid Build Coastguard Worker punpcklbw mm3, mm1 ; mm3=(02 03 12 13 22 23 32 33) 451*dfc6aa5cSAndroid Build Coastguard Worker punpckhbw mm0, mm1 ; mm0=(04 05 14 15 24 25 34 35) 452*dfc6aa5cSAndroid Build Coastguard Worker 453*dfc6aa5cSAndroid Build Coastguard Worker movq mm5, mm6 ; transpose coefficients(phase 2) 454*dfc6aa5cSAndroid Build Coastguard Worker punpcklwd mm6, mm3 ; mm6=(00 01 02 03 10 11 12 13) 455*dfc6aa5cSAndroid Build Coastguard Worker punpckhwd mm5, mm3 ; mm5=(20 21 22 23 30 31 32 33) 456*dfc6aa5cSAndroid Build Coastguard Worker movq mm4, mm0 ; transpose coefficients(phase 2) 457*dfc6aa5cSAndroid Build Coastguard Worker punpcklwd mm0, mm2 ; mm0=(04 05 06 07 14 15 16 17) 458*dfc6aa5cSAndroid Build Coastguard Worker punpckhwd mm4, mm2 ; mm4=(24 25 26 27 34 35 36 37) 459*dfc6aa5cSAndroid Build Coastguard Worker 460*dfc6aa5cSAndroid Build Coastguard Worker movq mm7, mm6 ; transpose coefficients(phase 3) 461*dfc6aa5cSAndroid Build Coastguard Worker punpckldq mm6, mm0 ; mm6=(00 01 02 03 04 05 06 07) 462*dfc6aa5cSAndroid Build Coastguard Worker punpckhdq mm7, 
mm0 ; mm7=(10 11 12 13 14 15 16 17) 463*dfc6aa5cSAndroid Build Coastguard Worker movq mm1, mm5 ; transpose coefficients(phase 3) 464*dfc6aa5cSAndroid Build Coastguard Worker punpckldq mm5, mm4 ; mm5=(20 21 22 23 24 25 26 27) 465*dfc6aa5cSAndroid Build Coastguard Worker punpckhdq mm1, mm4 ; mm1=(30 31 32 33 34 35 36 37) 466*dfc6aa5cSAndroid Build Coastguard Worker 467*dfc6aa5cSAndroid Build Coastguard Worker pushpic ebx ; save GOT address 468*dfc6aa5cSAndroid Build Coastguard Worker 469*dfc6aa5cSAndroid Build Coastguard Worker mov edx, JSAMPROW [edi+0*SIZEOF_JSAMPROW] 470*dfc6aa5cSAndroid Build Coastguard Worker mov ebx, JSAMPROW [edi+1*SIZEOF_JSAMPROW] 471*dfc6aa5cSAndroid Build Coastguard Worker movq MMWORD [edx+eax*SIZEOF_JSAMPLE], mm6 472*dfc6aa5cSAndroid Build Coastguard Worker movq MMWORD [ebx+eax*SIZEOF_JSAMPLE], mm7 473*dfc6aa5cSAndroid Build Coastguard Worker mov edx, JSAMPROW [edi+2*SIZEOF_JSAMPROW] 474*dfc6aa5cSAndroid Build Coastguard Worker mov ebx, JSAMPROW [edi+3*SIZEOF_JSAMPROW] 475*dfc6aa5cSAndroid Build Coastguard Worker movq MMWORD [edx+eax*SIZEOF_JSAMPLE], mm5 476*dfc6aa5cSAndroid Build Coastguard Worker movq MMWORD [ebx+eax*SIZEOF_JSAMPLE], mm1 477*dfc6aa5cSAndroid Build Coastguard Worker 478*dfc6aa5cSAndroid Build Coastguard Worker poppic ebx ; restore GOT address 479*dfc6aa5cSAndroid Build Coastguard Worker 480*dfc6aa5cSAndroid Build Coastguard Worker add esi, byte 4*SIZEOF_JCOEF ; wsptr 481*dfc6aa5cSAndroid Build Coastguard Worker add edi, byte 4*SIZEOF_JSAMPROW 482*dfc6aa5cSAndroid Build Coastguard Worker dec ecx ; ctr 483*dfc6aa5cSAndroid Build Coastguard Worker jnz near .rowloop 484*dfc6aa5cSAndroid Build Coastguard Worker 485*dfc6aa5cSAndroid Build Coastguard Worker emms ; empty MMX state 486*dfc6aa5cSAndroid Build Coastguard Worker 487*dfc6aa5cSAndroid Build Coastguard Worker pop edi 488*dfc6aa5cSAndroid Build Coastguard Worker pop esi 489*dfc6aa5cSAndroid Build Coastguard Worker; pop edx ; need not be preserved 490*dfc6aa5cSAndroid 
Build Coastguard Worker; pop ecx ; need not be preserved 491*dfc6aa5cSAndroid Build Coastguard Worker pop ebx 492*dfc6aa5cSAndroid Build Coastguard Worker mov esp, ebp ; esp <- aligned ebp 493*dfc6aa5cSAndroid Build Coastguard Worker pop esp ; esp <- original ebp 494*dfc6aa5cSAndroid Build Coastguard Worker pop ebp 495*dfc6aa5cSAndroid Build Coastguard Worker ret 496*dfc6aa5cSAndroid Build Coastguard Worker 497*dfc6aa5cSAndroid Build Coastguard Worker; For some reason, the OS X linker does not honor the request to align the 498*dfc6aa5cSAndroid Build Coastguard Worker; segment unless we do this. 499*dfc6aa5cSAndroid Build Coastguard Worker align 32 500