;
; jquantf.asm - sample data conversion and quantization (SSE & SSE2)
;
; Copyright 2009 Pierre Ossman <ossman@cendio.se> for Cendio AB
; Copyright (C) 2016, D. R. Commander.
;
; Based on the x86 SIMD extension for IJG JPEG library
; Copyright (C) 1999-2006, MIYASAKA Masaru.
; For conditions of distribution and use, see copyright notice in jsimdext.inc
;
; This file should be assembled with NASM (Netwide Assembler),
; can *not* be assembled with Microsoft's MASM or any compatible
; assembler (including Borland's Turbo Assembler).
; NASM is available from http://nasm.sourceforge.net/ or
; http://sourceforge.net/project/showfiles.php?group_id=6208

%include "jsimdext.inc"
%include "jdct.inc"

; --------------------------------------------------------------------------
    SECTION     SEG_TEXT
    BITS        32
;
; Load data into workspace, applying unsigned->signed conversion
;
; GLOBAL(void)
; jsimd_convsamp_float_sse2(JSAMPARRAY sample_data, JDIMENSION start_col,
;                           FAST_FLOAT *workspace);
;

%define sample_data  ebp + 8            ; JSAMPARRAY sample_data
%define start_col    ebp + 12           ; JDIMENSION start_col
%define workspace    ebp + 16           ; FAST_FLOAT *workspace

    align       32
    GLOBAL_FUNCTION(jsimd_convsamp_float_sse2)

EXTN(jsimd_convsamp_float_sse2):
    push        ebp
    mov         ebp, esp
    push        ebx
;   push        ecx                     ; need not be preserved
;   push        edx                     ; need not be preserved
    push        esi
    push        edi

    pcmpeqw     xmm7, xmm7
    psllw       xmm7, 7
    packsswb    xmm7, xmm7              ; xmm7 = PB_CENTERJSAMPLE (0x808080..)

    mov         esi, JSAMPARRAY [sample_data]  ; (JSAMPROW *)
    mov         eax, JDIMENSION [start_col]
    mov         edi, POINTER [workspace]       ; (FAST_FLOAT *)
    mov         ecx, DCTSIZE/2
    alignx      16, 7
.convloop:
    mov         ebx, JSAMPROW [esi+0*SIZEOF_JSAMPROW]  ; (JSAMPLE *)
    mov         edx, JSAMPROW [esi+1*SIZEOF_JSAMPROW]  ; (JSAMPLE *)

    movq        xmm0, XMM_MMWORD [ebx+eax*SIZEOF_JSAMPLE]
    movq        xmm1, XMM_MMWORD [edx+eax*SIZEOF_JSAMPLE]

    psubb       xmm0, xmm7              ; xmm0=(01234567)
    psubb       xmm1, xmm7              ; xmm1=(89ABCDEF)

    punpcklbw   xmm0, xmm0              ; xmm0=(*0*1*2*3*4*5*6*7)
    punpcklbw   xmm1, xmm1              ; xmm1=(*8*9*A*B*C*D*E*F)

    punpcklwd   xmm2, xmm0              ; xmm2=(***0***1***2***3)
    punpckhwd   xmm0, xmm0              ; xmm0=(***4***5***6***7)
    punpcklwd   xmm3, xmm1              ; xmm3=(***8***9***A***B)
    punpckhwd   xmm1, xmm1              ; xmm1=(***C***D***E***F)

    psrad       xmm2, (DWORD_BIT-BYTE_BIT)  ; xmm2=(0123)
    psrad       xmm0, (DWORD_BIT-BYTE_BIT)  ; xmm0=(4567)
    cvtdq2ps    xmm2, xmm2                  ; xmm2=(0123)
    cvtdq2ps    xmm0, xmm0                  ; xmm0=(4567)
    psrad       xmm3, (DWORD_BIT-BYTE_BIT)  ; xmm3=(89AB)
    psrad       xmm1, (DWORD_BIT-BYTE_BIT)  ; xmm1=(CDEF)
    cvtdq2ps    xmm3, xmm3                  ; xmm3=(89AB)
    cvtdq2ps    xmm1, xmm1                  ; xmm1=(CDEF)

    movaps      XMMWORD [XMMBLOCK(0,0,edi,SIZEOF_FAST_FLOAT)], xmm2
    movaps      XMMWORD [XMMBLOCK(0,1,edi,SIZEOF_FAST_FLOAT)], xmm0
    movaps      XMMWORD [XMMBLOCK(1,0,edi,SIZEOF_FAST_FLOAT)], xmm3
    movaps      XMMWORD [XMMBLOCK(1,1,edi,SIZEOF_FAST_FLOAT)], xmm1

    add         esi, byte 2*SIZEOF_JSAMPROW
    add         edi, byte 2*DCTSIZE*SIZEOF_FAST_FLOAT
    dec         ecx
    jnz         short .convloop

    pop         edi
    pop         esi
;   pop         edx                     ; need not be preserved
;   pop         ecx                     ; need not be preserved
    pop         ebx
    pop         ebp
    ret

; --------------------------------------------------------------------------
;
; Quantize/descale the coefficients, and store into coef_block
;
; GLOBAL(void)
; jsimd_quantize_float_sse2(JCOEFPTR coef_block, FAST_FLOAT *divisors,
;                           FAST_FLOAT *workspace);
;

%define coef_block  ebp + 8             ; JCOEFPTR coef_block
%define divisors    ebp + 12            ; FAST_FLOAT *divisors
%define workspace   ebp + 16            ; FAST_FLOAT *workspace

    align       32
    GLOBAL_FUNCTION(jsimd_quantize_float_sse2)

EXTN(jsimd_quantize_float_sse2):
    push        ebp
    mov         ebp, esp
;   push        ebx                     ; unused
;   push        ecx                     ; unused
;   push        edx                     ; need not be preserved
    push        esi
    push        edi

    mov         esi, POINTER [workspace]
    mov         edx, POINTER [divisors]
    mov         edi, JCOEFPTR [coef_block]
    mov         eax, DCTSIZE2/16
    alignx      16, 7
.quantloop:
    ; xmm0-xmm3 = workspace * divisors, 16 coefficients per iteration
    movaps      xmm0, XMMWORD [XMMBLOCK(0,0,esi,SIZEOF_FAST_FLOAT)]
    movaps      xmm1, XMMWORD [XMMBLOCK(0,1,esi,SIZEOF_FAST_FLOAT)]
    mulps       xmm0, XMMWORD [XMMBLOCK(0,0,edx,SIZEOF_FAST_FLOAT)]
    mulps       xmm1, XMMWORD [XMMBLOCK(0,1,edx,SIZEOF_FAST_FLOAT)]
    movaps      xmm2, XMMWORD [XMMBLOCK(1,0,esi,SIZEOF_FAST_FLOAT)]
    movaps      xmm3, XMMWORD [XMMBLOCK(1,1,esi,SIZEOF_FAST_FLOAT)]
    mulps       xmm2, XMMWORD [XMMBLOCK(1,0,edx,SIZEOF_FAST_FLOAT)]
    mulps       xmm3, XMMWORD [XMMBLOCK(1,1,edx,SIZEOF_FAST_FLOAT)]

    ; float -> int32, rounded per the current MXCSR rounding mode
    cvtps2dq    xmm0, xmm0
    cvtps2dq    xmm1, xmm1
    cvtps2dq    xmm2, xmm2
    cvtps2dq    xmm3, xmm3

    ; int32 -> int16 with signed saturation
    packssdw    xmm0, xmm1
    packssdw    xmm2, xmm3

    movdqa      XMMWORD [XMMBLOCK(0,0,edi,SIZEOF_JCOEF)], xmm0
    movdqa      XMMWORD [XMMBLOCK(1,0,edi,SIZEOF_JCOEF)], xmm2

    add         esi, byte 16*SIZEOF_FAST_FLOAT
    add         edx, byte 16*SIZEOF_FAST_FLOAT
    add         edi, byte 16*SIZEOF_JCOEF
    dec         eax
    jnz         short .quantloop

    pop         edi
    pop         esi
;   pop         edx                     ; need not be preserved
;   pop         ecx                     ; unused
;   pop         ebx                     ; unused
    pop         ebp
    ret

; For some reason, the OS X linker does not honor the request to align the
; segment unless we do this.
    align       32
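
; --------------------------------------------------------------------------
;
; Reference sketch (an addition for readers, not part of the original code):
; a scalar C-style outline of what the two routines above appear to compute.
; It assumes DCTSIZE = 8, DCTSIZE2 = 64, CENTERJSAMPLE = 128 and 8-bit
; samples; the loop variables row, col and i are illustrative only.
;
;   /* jsimd_convsamp_float_sse2 */
;   for (row = 0; row < DCTSIZE; row++)
;     for (col = 0; col < DCTSIZE; col++)
;       workspace[row * DCTSIZE + col] =
;         (FAST_FLOAT)((int)sample_data[row][start_col + col] - CENTERJSAMPLE);
;
;   /* jsimd_quantize_float_sse2 */
;   for (i = 0; i < DCTSIZE2; i++)
;     coef_block[i] = saturate_int16(round_int32(workspace[i] * divisors[i]));
;
; round_int32 and saturate_int16 are hypothetical helper names, not real
; functions: they stand in for cvtps2dq (rounds according to the current
; MXCSR rounding mode) and packssdw (signed saturation to 16 bits).
;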