.section #gk104_builtin_code
// DIV U32
//
// UNR recurrence (q = a / b):
// look for z such that 2^32 - b <= b * z < 2^32
// then q - 1 <= (a * z) / 2^32 <= q
//
// INPUT: $r0: dividend, $r1: divisor
// OUTPUT: $r0: result, $r1: modulus
// CLOBBER: $r2 - $r3, $p0 - $p1
// SIZE: 22 / 14 * 8 bytes
//
gk104_div_u32:
   sched 0x28 0x4 0x28 0x4 0x28 0x28 0x28
   bfind u32 $r2 $r1
   long xor b32 $r2 $r2 0x1f
   long mov b32 $r3 0x1
   shl b32 $r2 $r3 clamp $r2
   long cvt u32 $r1 neg u32 $r1
   long mul $r3 u32 $r1 u32 $r2
   add $r2 (mul high u32 $r2 u32 $r3) $r2
   sched 0x28 0x28 0x28 0x28 0x28 0x28 0x28
   mul $r3 u32 $r1 u32 $r2
   add $r2 (mul high u32 $r2 u32 $r3) $r2
   mul $r3 u32 $r1 u32 $r2
   add $r2 (mul high u32 $r2 u32 $r3) $r2
   mul $r3 u32 $r1 u32 $r2
   add $r2 (mul high u32 $r2 u32 $r3) $r2
   mul $r3 u32 $r1 u32 $r2
   sched 0x4 0x28 0x4 0x28 0x28 0x2c 0x4
   add $r2 (mul high u32 $r2 u32 $r3) $r2
   mov b32 $r3 $r0
   mul high $r0 u32 $r0 u32 $r2
   long cvt u32 $r2 neg u32 $r1
   long add $r1 (mul u32 $r1 u32 $r0) $r3
   set $p0 0x1 ge u32 $r1 $r2
   $p0 sub b32 $r1 $r1 $r2
   sched 0x28 0x2c 0x4 0x20 0x2e 0x28 0x20
   $p0 add b32 $r0 $r0 0x1
   $p0 set $p0 0x1 ge u32 $r1 $r2
   $p0 sub b32 $r1 $r1 $r2
   $p0 add b32 $r0 $r0 0x1
   long ret

// DIV S32, like DIV U32 after taking ABS(inputs)
//
// INPUT: $r0: dividend, $r1: divisor
// OUTPUT: $r0: result, $r1: modulus
// CLOBBER: $r2 - $r3, $p0 - $p3
//
gk104_div_s32:
   set $p2 0x1 lt s32 $r0 0x0
   set $p3 0x1 lt s32 $r1 0x0 xor $p2
   sched 0x20 0x28 0x28 0x4 0x28 0x04 0x28
   long cvt s32 $r0 abs s32 $r0
   long cvt s32 $r1 abs s32 $r1
   bfind u32 $r2 $r1
   long xor b32 $r2 $r2 0x1f
   long mov b32 $r3 0x1
   shl b32 $r2 $r3 clamp $r2
   cvt u32 $r1 neg u32 $r1
   sched 0x28 0x28 0x28 0x28 0x28 0x28 0x28
   mul $r3 u32 $r1 u32 $r2
   add $r2 (mul high u32 $r2 u32 $r3) $r2
   mul $r3 u32 $r1 u32 $r2
   add $r2 (mul high u32 $r2 u32 $r3) $r2
   mul $r3 u32 $r1 u32 $r2
   add $r2 (mul high u32 $r2 u32 $r3) $r2
   mul $r3 u32 $r1 u32 $r2
   sched 0x28 0x28 0x4 0x28 0x04 0x28 0x28
   add $r2 (mul high u32 $r2 u32 $r3) $r2
   mul $r3 u32 $r1 u32 $r2
   add $r2 (mul high u32 $r2 u32 $r3) $r2
   mov b32 $r3 $r0
   mul high $r0 u32 $r0 u32 $r2
   long cvt u32 $r2 neg u32 $r1
   long add $r1 (mul u32 $r1 u32 $r0) $r3
   sched 0x2c 0x04 0x28 0x2c 0x04 0x28 0x20
   set $p0 0x1 ge u32 $r1 $r2
   $p0 sub b32 $r1 $r1 $r2
   $p0 add b32 $r0 $r0 0x1
   $p0 set $p0 0x1 ge u32 $r1 $r2
   $p0 sub b32 $r1 $r1 $r2
   long $p0 add b32 $r0 $r0 0x1
   long $p3 cvt s32 $r0 neg s32 $r0
   sched 0x04 0x2e 0x04 0x28 0x04 0x20 0x2c
   $p2 cvt s32 $r1 neg s32 $r1
   long ret

// SULDP [for each format]
// $r4d: address
// $r2: surface info (format)
// $p0: access predicate
// $p1, $p2: caching predicate (00: cv, 01: ca, 10: cg)
//
// RGBA32
$p1 suldgb b128 $r0q ca zero u8 g[$r4d] $r2 $p0
set $p1 0x1 $p1 xor not $p2
$p2 suldgb b128 $r0q cg zero u8 g[$r4d] $r2 $p0
$p1 suldgb b128 $r0q cv zero u8 g[$r4d] $r2 $p0
long ret
// RGBA16_UNORM
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
$p1 suldgb b128 $r0q ca zero u8 g[$r4d] $r2 $p0
set $p1 0x1 $p1 xor not $p2
$p2 suldgb b128 $r0q cg zero u8 g[$r4d] $r2 $p0
$p1 suldgb b128 $r0q cv zero u8 g[$r4d] $r2 $p0
cvt rn f32 $r3 u16 1 $r1
cvt rn f32 $r2 u16 0 $r1
mul f32 $r3 $r3 0x37800074
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
cvt rn f32 $r1 u16 1 $r0
mul f32 $r2 $r2 0x37800074
cvt rn f32 $r0 u16 0 $r0
mul f32 $r1 $r1 0x37800074
mul f32 $r0 $r0 0x37800074
long ret
// RGBA16_SNORM
$p1 suldgb b64 $r0d ca zero u8 g[$r4d] $r2 $p0
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
set $p1 0x1 $p1 xor not $p2
$p2 suldgb b64 $r0d cg zero u8 g[$r4d] $r2 $p0
$p1 suldgb b64 $r0d cv zero u8 g[$r4d] $r2 $p0
cvt rn f32 $r3 s16 1 $r1
cvt rn f32 $r2 s16 0 $r1
mul f32 $r3 $r3 0x38000187
cvt rn f32 $r1 s16 1 $r0
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
mul f32 $r2 $r2 0x38000187
cvt rn f32 $r0 s16 0 $r0
mul f32 $r1 $r1 0x38000187
mul f32 $r0 $r0 0x38000187
long ret
// RGBA16_SINT
$p1 suldgb b64 $r0d ca zero u8 g[$r4d] $r2 $p0
set $p1 0x1 $p1 xor not $p2
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
$p2 suldgb b64 $r0d cg zero u8 g[$r4d] $r2 $p0
$p1 suldgb b64 $r0d cv zero u8 g[$r4d] $r2 $p0
cvt s32 $r3 s16 1 $r1
cvt s32 $r2 s16 0 $r1
cvt s32 $r1 s16 1 $r0
cvt s32 $r0 s16 0 $r0
long ret
// RGBA16_UINT
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
$p1 suldgb b64 $r0d ca zero u8 g[$r4d] $r2 $p0
set $p1 0x1 $p1 xor not $p2
$p2 suldgb b64 $r0d cg zero u8 g[$r4d] $r2 $p0
$p1 suldgb b64 $r0d cv zero u8 g[$r4d] $r2 $p0
cvt u32 $r3 u16 1 $r1
cvt u32 $r2 u16 0 $r1
cvt u32 $r1 u16 1 $r0
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
cvt u32 $r0 u16 0 $r0
long ret
// RGBA16_FLOAT
$p1 suldgb b64 $r0d ca zero u8 g[$r4d] $r2 $p0
set $p1 0x1 $p1 xor not $p2
$p2 suldgb b64 $r0d cg zero u8 g[$r4d] $r2 $p0
$p1 suldgb b64 $r0d cv zero u8 g[$r4d] $r2 $p0
cvt f32 $r3 f16 $r1 1
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
cvt f32 $r2 f16 $r1 0
cvt f32 $r1 f16 $r0 1
cvt f32 $r0 f16 $r0 0
long ret
// RG32_FLOAT
$p1 suldgb b64 $r0d ca zero u8 g[$r4d] $r2 $p0
set $p1 0x1 $p1 xor not $p2
$p2 suldgb b64 $r0d cg zero u8 g[$r4d] $r2 $p0
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
$p1 suldgb b64 $r0d cv zero u8 g[$r4d] $r2 $p0
long mov b32 $r2 0x00000000
long mov b32 $r3 0x3f800000
long ret
// RG32_xINT
$p1 suldgb b64 $r0d ca zero u8 g[$r4d] $r2 $p0
set $p1 0x1 $p1 xor not $p2
$p2 suldgb b64 $r0d cg zero u8 g[$r4d] $r2 $p0
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
$p1 suldgb b64 $r0d cv zero u8 g[$r4d] $r2 $p0
long mov b32 $r2 0x00000000
long mov b32 $r3 0x00000001
long ret
// RGB10A2_UNORM
$p1 suldgb b32 $r0 ca zero u8 g[$r4d] $r2 $p0
set $p1 0x1 $p1 xor not $p2
$p2 suldgb b32 $r0 cg zero u8 g[$r4d] $r2 $p0
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
$p1 suldgb b32 $r0 cv zero u8 g[$r4d] $r2 $p0
ext u32 $r1 $r0 0x0a0a
long mov b32 $r3 0x3f800000
ext u32 $r2 $r0 0x0a14
long and b32 $r0 $r0 0x3ff
cvt rn f32 $r2 u16 0 $r2
cvt rn f32 $r1 u16 0 $r1
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
mul f32 $r2 $r2 0x3a802007
cvt rn f32 $r0 u16 0 $r0
mul f32 $r1 $r1 0x3a802007
mul f32 $r0 $r0 0x3a802007
long ret
// RGB10A2_UINT
$p1 suldgb b32 $r0 ca zero u8 g[$r4d] $r2 $p0
set $p1 0x1 $p1 xor not $p2
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
$p2 suldgb b32 $r0 cg zero u8 g[$r4d] $r2 $p0
$p1 suldgb b32 $r0 cv zero u8 g[$r4d] $r2 $p0
ext u32 $r1 $r0 0x0a0a
long mov b32 $r3 0x00000001
ext u32 $r2 $r0 0x0a14
long and b32 $r0 $r0 0x3ff
long ret
// RGBA8_UNORM
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
$p1 suldgb b32 $r0 ca zero u8 g[$r4d] $r2 $p0
set $p1 0x1 $p1 xor not $p2
$p2 suldgb b32 $r0 cg zero u8 g[$r4d] $r2 $p0
$p1 suldgb b32 $r0 cv zero u8 g[$r4d] $r2 $p0
cvt rn f32 $r3 u8 3 $r0
cvt rn f32 $r2 u8 2 $r0
mul f32 $r3 $r3 0x3b808081
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
cvt rn f32 $r1 u8 1 $r0
mul f32 $r2 $r2 0x3b808081
cvt rn f32 $r0 u8 0 $r0
mul f32 $r1 $r1 0x3b808081
mul f32 $r0 $r0 0x3b808081
long ret
// RGBA8_SNORM
$p1 suldgb b32 $r0 ca zero u8 g[$r4d] $r2 $p0
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
set $p1 0x1 $p1 xor not $p2
$p2 suldgb b32 $r0 cg zero u8 g[$r4d] $r2 $p0
$p1 suldgb b32 $r0 cv zero u8 g[$r4d] $r2 $p0
cvt rn f32 $r3 s8 3 $r0
cvt rn f32 $r2 s8 2 $r0
mul f32 $r3 $r3 0x3c010204
cvt rn f32 $r1 s8 1 $r0
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
mul f32 $r2 $r2 0x3c010204
cvt rn f32 $r0 s8 0 $r0
mul f32 $r1 $r1 0x3c010204
mul f32 $r0 $r0 0x3c010204
long ret
// RGBA8_SINT
$p1 suldgb b32 $r0 ca zero u8 g[$r4d] $r2 $p0
set $p1 0x1 $p1 xor not $p2
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
$p2 suldgb b32 $r0 cg zero u8 g[$r4d] $r2 $p0
$p1 suldgb b32 $r0 cv zero u8 g[$r4d] $r2 $p0
cvt s32 $r3 s8 3 $r0
cvt s32 $r2 s8 2 $r0
cvt s32 $r1 s8 1 $r0
cvt s32 $r0 s8 0 $r0
long ret
// RGBA8_UINT
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
$p1 suldgb b32 $r0 ca zero u8 g[$r4d] $r2 $p0
set $p1 0x1 $p1 xor not $p2
$p2 suldgb b32 $r0 cg zero u8 g[$r4d] $r2 $p0
$p1 suldgb b32 $r0 cv zero u8 g[$r4d] $r2 $p0
cvt u32 $r3 u8 3 $r0
cvt u32 $r2 u8 2 $r0
cvt u32 $r1 u8 1 $r0
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
cvt u32 $r0 u8 0 $r0
long ret
// R5G6B5_UNORM
$p1 suldgb u16 $r0 ca zero u8 g[$r4d] $r2 $p0
set $p1 0x1 $p1 xor not $p2
$p2 suldgb u16 $r0 cg zero u8 g[$r4d] $r2 $p0
$p1 suldgb u16 $r0 cv zero u8 g[$r4d] $r2 $p0
ext u32 $r1 $r0 0x0605
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
long mov b32 $r3 0x3f800000
ext u32 $r2 $r0 0x050b
long and b32 $r0 $r0 0x1f
cvt rn f32 $r2 u8 0 $r2
cvt rn f32 $r1 u8 0 $r1
mul f32 $r2 $r2 0x3d042108
cvt rn f32 $r0 u8 0 $r0
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
mul f32 $r1 $r1 0x3c820821
mul f32 $r0 $r0 0x3d042108
long ret
// R5G5B5X1_UNORM
$p1 suldgb u16 $r0 ca zero u8 g[$r4d] $r2 $p0
set $p1 0x1 $p1 xor not $p2
$p2 suldgb u16 $r0 cg zero u8 g[$r4d] $r2 $p0
$p1 suldgb u16 $r0 cv zero u8 g[$r4d] $r2 $p0
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
ext u32 $r1 $r0 0x0505
ext u32 $r2 $r0 0x050a
long and b32 $r0 $r0 0x1f
long mov b32 $r3 0x3f800000
cvt rn f32 $r2 u8 0 $r2
cvt rn f32 $r1 u8 0 $r1
cvt rn f32 $r0 u8 0 $r0
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
mul f32 $r2 $r2 0x3d042108
mul f32 $r1 $r1 0x3d042108
mul f32 $r0 $r0 0x3d042108
long ret
// RG16_UNORM
$p1 suldgb b32 $r0 ca zero u8 g[$r4d] $r2 $p0
set $p1 0x1 $p1 xor not $p2
$p2 suldgb b32 $r0 cg zero u8 g[$r4d] $r2 $p0
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
$p1 suldgb b32 $r0 cv zero u8 g[$r4d] $r2 $p0
cvt rn f32 $r1 u16 1 $r0
cvt rn f32 $r0 u16 0 $r0
mul f32 $r1 $r1 0x37800074
mul f32 $r0 $r0 0x37800074
long mov b32 $r2 0x00000000
long mov b32 $r3 0x3f800000
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
long ret
// RG16_SNORM
$p1 suldgb b32 $r0 ca zero u8 g[$r4d] $r2 $p0
set $p1 0x1 $p1 xor not $p2
$p2 suldgb b32 $r0 cg zero u8 g[$r4d] $r2 $p0
$p1 suldgb b32 $r0 cv zero u8 g[$r4d] $r2 $p0
mov b32 $r3 0x3f800000
cvt rn f32 $r1 s16 1 $r0
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
mov b32 $r2 0x00000000
cvt rn f32 $r0 s16 0 $r0
mul f32 $r1 $r1 0x38000187
mul f32 $r0 $r0 0x38000187
long ret
// RG16_SINT
$p1 suldgb b32 $r0 ca zero u8 g[$r4d] $r2 $p0
set $p1 0x1 $p1 xor not $p2
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
$p2 suldgb b32 $r0 cg zero u8 g[$r4d] $r2 $p0
$p1 suldgb b32 $r0 cv zero u8 g[$r4d] $r2 $p0
mov b32 $r3 0x00000001
cvt s32 $r1 s16 1 $r0
mov b32 $r2 0x00000000
cvt s32 $r0 s16 0 $r0
long ret
// RG16_UINT
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
$p1 suldgb b32 $r0 ca zero u8 g[$r4d] $r2 $p0
set $p1 0x1 $p1 xor not $p2
$p2 suldgb b32 $r0 cg zero u8 g[$r4d] $r2 $p0
$p1 suldgb b32 $r0 cv zero u8 g[$r4d] $r2 $p0
mov b32 $r3 0x00000001
cvt u32 $r1 u16 1 $r0
mov b32 $r2 0x00000000
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
cvt u32 $r0 u16 0 $r0
long ret
// RG16_FLOAT
$p1 suldgb b32 $r0 ca zero u8 g[$r4d] $r2 $p0
set $p1 0x1 $p1 xor not $p2
$p2 suldgb b32 $r0 cg zero u8 g[$r4d] $r2 $p0
$p1 suldgb b32 $r0 cv zero u8 g[$r4d] $r2 $p0
mov b32 $r3 0x3f800000
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
cvt f32 $r1 f16 $r0 1
mov b32 $r2 0x00000000
cvt f32 $r0 f16 $r0 0
long ret
// R32_FLOAT
$p1 suldgb b32 $r0 ca zero u8 g[$r4d] $r2 $p0
set $p1 0x1 $p1 xor not $p2
$p2 suldgb b32 $r0 cg zero u8 g[$r4d] $r2 $p0
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
$p1 suldgb b32 $r0 cv zero u8 g[$r4d] $r2 $p0
long mov b32 $r3 0x3f800000
long mov b32 $r2 0x00000000
long mov b32 $r1 0x00000000
long ret
// R32_xINT
$p1 suldgb b32 $r0 ca zero u8 g[$r4d] $r2 $p0
set $p1 0x1 $p1 xor not $p2
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
$p2 suldgb b32 $r0 cg zero u8 g[$r4d] $r2 $p0
$p1 suldgb b32 $r0 cv zero u8 g[$r4d] $r2 $p0
long mov b32 $r3 0x00000001
long mov b32 $r2 0x00000000
long mov b32 $r1 0x00000000
long ret
// RG8_UNORM
$p1 suldgb u16 $r0 ca zero u8 g[$r4d] $r2 $p0
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
set $p1 0x1 $p1 xor not $p2
$p2 suldgb u16 $r0 cg zero u8 g[$r4d] $r2 $p0
$p1 suldgb u16 $r0 cv zero u8 g[$r4d] $r2 $p0
mov b32 $r3 0x3f800000
cvt rn f32 $r1 u8 1 $r0
mov b32 $r2 0x00000000
cvt rn f32 $r0 u8 0 $r0
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
mul f32 $r1 $r1 0x3b808081
mul f32 $r0 $r0 0x3b808081
long ret
// RG8_SNORM
$p1 suldgb u16 $r0 ca zero u8 g[$r4d] $r2 $p0
set $p1 0x1 $p1 xor not $p2
$p2 suldgb u16 $r0 cg zero u8 g[$r4d] $r2 $p0
$p1 suldgb u16 $r0 cv zero u8 g[$r4d] $r2 $p0
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
long mov b32 $r3 0x3f800000
cvt rn f32 $r1 s8 1 $r0
long mov b32 $r2 0x00000000
cvt rn f32 $r0 s8 0 $r0
mul f32 $r1 $r1 0x3c010204
mul f32 $r0 $r0 0x3c010204
long ret
// RG8_UINT
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
$p1 suldgb u16 $r0 ca zero u8 g[$r4d] $r2 $p0
set $p1 0x1 $p1 xor not $p2
$p2 suldgb u16 $r0 cg zero u8 g[$r4d] $r2 $p0
$p1 suldgb u16 $r0 cv zero u8 g[$r4d] $r2 $p0
long mov b32 $r3 0x00000001
cvt u32 $r1 u8 1 $r0
long mov b32 $r2 0x00000000
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
cvt u32 $r0 u8 0 $r0
long ret
// RG8_SINT
$p1 suldgb u16 $r0 ca zero u8 g[$r4d] $r2 $p0
set $p1 0x1 $p1 xor not $p2
$p2 suldgb u16 $r0 cg zero u8 g[$r4d] $r2 $p0
$p1 suldgb u16 $r0 cv zero u8 g[$r4d] $r2 $p0
long mov b32 $r3 0x00000001
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
cvt s32 $r1 s8 1 $r0
long mov b32 $r2 0x00000000
cvt s32 $r0 s8 0 $r0
long ret
// R16_UNORM
$p1 suldgb u16 $r0 ca zero u8 g[$r4d] $r2 $p0
set $p1 0x1 $p1 xor not $p2
$p2 suldgb u16 $r0 cg zero u8 g[$r4d] $r2 $p0
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
$p1 suldgb u16 $r0 cv zero u8 g[$r4d] $r2 $p0
long mov b32 $r3 0x3f800000
cvt rn f32 $r0 u16 0 $r0
long mov b32 $r2 0x00000000
long mov b32 $r1 0x00000000
mul f32 $r0 $r0 0x37800074
long ret
// R16_SNORM
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
$p1 suldgb u16 $r0 ca zero u8 g[$r4d] $r2 $p0
set $p1 0x1 $p1 xor not $p2
$p2 suldgb u16 $r0 cg zero u8 g[$r4d] $r2 $p0
$p1 suldgb u16 $r0 cv zero u8 g[$r4d] $r2 $p0
mov b32 $r3 0x3f800000
cvt rn f32 $r0 s16 0 $r0
long mov b32 $r2 0x00000000
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
long mov b32 $r1 0x00000000
mul f32 $r0 $r0 0x38000187
long ret
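// NOTE (illustrative, not part of the builtin library): the *_UNORM and
// *_SNORM loaders above convert each channel to f32 and then multiply by a
// raw IEEE-754 immediate such as 0x38000187. The host-side Python sketch
// below decodes those immediates and checks that each is close to
// 1 / (2^n - 1). The pairing of immediate to divisor is an assumption read
// off the format names, and f32_from_bits / snorm16_to_float are
// hypothetical helper names. The arithmetic runs in double precision, so
// results may differ from the GPU in the last bits.

```python
import struct


def f32_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE-754 single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Immediate used by `mul f32` -> assumed maximum channel code (2^n - 1).
# The right-hand values are an assumption inferred from the format names.
SCALE_IMMEDIATES = {
    0x37800074: 65535,  # 16-bit UNORM
    0x38000187: 32767,  # 16-bit SNORM
    0x3B808081: 255,    # 8-bit UNORM
    0x3C010204: 127,    # 8-bit SNORM
    0x3A802007: 1023,   # 10-bit UNORM (RGB10A2)
    0x3D042108: 31,     # 5-bit UNORM (R5G6B5 / R5G5B5X1)
    0x3C820821: 63,     # 6-bit UNORM (green channel of R5G6B5)
}


def snorm16_to_float(raw: int) -> float:
    """Rough model of `cvt rn f32 ... s16` followed by `mul f32 ... 0x38000187`."""
    return float(raw) * f32_from_bits(0x38000187)


if __name__ == "__main__":
    for bits, max_code in SCALE_IMMEDIATES.items():
        scale = f32_from_bits(bits)
        # A product near 1.0 means the immediate approximates 1 / max_code.
        print(f"{bits:#010x}: scale * {max_code} = {scale * max_code:.7f}")
```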
// R16_SINT
$p1 suldgb s16 $r0 ca zero u8 g[$r4d] $r2 $p0
set $p1 0x1 $p1 xor not $p2
$p2 suldgb s16 $r0 cg zero u8 g[$r4d] $r2 $p0
$p1 suldgb s16 $r0 cv zero u8 g[$r4d] $r2 $p0
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
long mov b32 $r3 0x00000001
long mov b32 $r2 0x00000000
long mov b32 $r1 0x00000000
long ret
// R16_UINT
$p1 suldgb u16 $r0 ca zero u8 g[$r4d] $r2 $p0
set $p1 0x1 $p1 xor not $p2
$p2 suldgb u16 $r0 cg zero u8 g[$r4d] $r2 $p0
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
$p1 suldgb u16 $r0 cv zero u8 g[$r4d] $r2 $p0
long mov b32 $r3 0x00000001
long mov b32 $r2 0x00000000
long mov b32 $r1 0x00000000
long ret
// R16_FLOAT
$p1 suldgb u16 $r0 ca zero u8 g[$r4d] $r2 $p0
set $p1 0x1 $p1 xor not $p2
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
$p2 suldgb u16 $r0 cg zero u8 g[$r4d] $r2 $p0
$p1 suldgb u16 $r0 cv zero u8 g[$r4d] $r2 $p0
long mov b32 $r3 0x3f800000
long mov b32 $r2 0x00000000
cvt f32 $r0 f16 $r0 0
mov b32 $r1 0x00000000
long ret
// R8_UNORM
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
$p1 suldgb u8 $r0 ca zero u8 g[$r4d] $r2 $p0
set $p1 0x1 $p1 xor not $p2
$p2 suldgb u8 $r0 cg zero u8 g[$r4d] $r2 $p0
$p1 suldgb u8 $r0 cv zero u8 g[$r4d] $r2 $p0
mov b32 $r3 0x3f800000
cvt rn f32 $r0 u8 0 $r0
mov b32 $r2 0x00000000
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
mul f32 $r0 $r0 0x3b808081
mov b32 $r1 0x00000000
long ret
// R8_SNORM
$p1 suldgb u8 $r0 ca zero u8 g[$r4d] $r2 $p0
set $p1 0x1 $p1 xor not $p2
$p2 suldgb u8 $r0 cg zero u8 g[$r4d] $r2 $p0
$p1 suldgb u8 $r0 cv zero u8 g[$r4d] $r2 $p0
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
mov b32 $r3 0x3f800000
cvt rn f32 $r0 s8 0 $r0
mov b32 $r2 0x00000000
mul f32 $r0 $r0 0x3c010204
mov b32 $r1 0x00000000
long ret
// R8_SINT
$p1 suldgb s8 $r0 ca zero u8 g[$r4d] $r2 $p0
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
set $p1 0x1 $p1 xor not $p2
$p2 suldgb s8 $r0 cg zero u8 g[$r4d] $r2 $p0
$p1 suldgb s8 $r0 cv zero u8 g[$r4d] $r2 $p0
long mov b32 $r3 0x00000001
long mov b32 $r2 0x00000000
long mov b32 $r1 0x00000000
long ret
// R8_UINT
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
$p1 suldgb u8 $r0 ca zero u8 g[$r4d] $r2 $p0
set $p1 0x1 $p1 xor not $p2
$p2 suldgb u8 $r0 cg zero u8 g[$r4d] $r2 $p0
$p1 suldgb u8 $r0 cv zero u8 g[$r4d] $r2 $p0
long mov b32 $r3 0x00000001
long mov b32 $r2 0x00000000
long mov b32 $r1 0x00000000
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
long ret
// R11G11B10_FLOAT TODO
$p1 suldgb b32 $r3 ca zero u8 g[$r4d] $r2 $p0
set $p1 0x1 $p1 xor not $p2
$p2 suldgb b32 $r3 cg zero u8 g[$r4d] $r2 $p0
$p1 suldgb b32 $r3 cv zero u8 g[$r4d] $r2 $p0
long mov b32 $r3 0x3f800000
long nop
sched 0x00 0x00 0x00 0x00 0x00 0x00 0x00
long nop
long ret


// RCP F64: Newton Raphson reciprocal(x): r_{i+1} = r_i * (2.0 - x * r_i)
//
// INPUT:   $r0d (x)
// OUTPUT:  $r0d (rcp(x))
// CLOBBER: $r2 - $r9, $p0
// SIZE:    9 * 8 bytes
//
gk104_rcp_f64:
   // Step 1: classify input according to exponent and value, and calculate
   // result for 0/inf/nan. $r2 holds the exponent value, which starts at
   // bit 52 (bit 20 of the upper half) and is 11 bits in length
   ext u32 $r2 $r1 0xb14
   add b32 $r3 $r2 0xffffffff
   joinat #rcp_rejoin
   // We want to check whether the exponent is 0 or 0x7ff (i.e. NaN, inf,
   // denorm, or 0). Do this by subtracting 1 from the exponent, which
   // makes it > 0x7fd in exactly those cases under unsigned comparison
   set $p0 0x1 gt u32 $r3 0x7fd
   // $r3: 0 for norms, 0x36 for denorms, -1 for others
   long mov b32 $r3 0x0
   sched 0x2f 0x04 0x2d 0x2b 0x2f 0x28 0x28
   join (not $p0) nop
   // Process all special values: NaN, inf, denorm, 0
   mov b32 $r3 0xffffffff
   // A number is NaN if its abs value is greater than or unordered with inf
   set $p0 0x1 gtu f64 abs $r0d 0x7ff0000000000000
   (not $p0) bra #rcp_inf_or_denorm_or_zero
   // NaN -> NaN, the next line sets the "quiet" bit of the result. This
   // behavior is seen on both the CPU and the blob
   join or b32 $r1 $r1 0x80000
rcp_inf_or_denorm_or_zero:
   and b32 $r4 $r1 0x7ff00000
   // Other values with a nonzero exponent field must be inf
   set $p0 0x1 eq s32 $r4 0x0
   sched 0x2b 0x04 0x2f 0x2d 0x2b 0x2f 0x20
   $p0 bra #rcp_denorm_or_zero
   // +/-Inf -> +/-0
   xor b32 $r1 $r1 0x7ff00000
   join mov b32 $r0 0x0
rcp_denorm_or_zero:
   set $p0 0x1 gtu f64 abs $r0d 0x0
   $p0 bra #rcp_denorm
   // +/-0 -> +/-Inf
   join or b32 $r1 $r1 0x7ff00000
rcp_denorm:
   // non-0 denorms: multiply by 2^54 (the 0x36 in $r3), join with norms
   mul rn f64 $r0d $r0d 0x4350000000000000
   sched 0x2f 0x28 0x2b 0x28 0x28 0x04 0x28
   join mov b32 $r3 0x36
rcp_rejoin:
   // All numbers with -1 in $r3 have their result ready in $r0d, so return
   // them; the others need further calculation
   set $p0 0x1 lt s32 $r3 0x0
   $p0 bra #rcp_end
   // Step 2: Before the real calculation goes on, renormalize the value to
   // the range [1, 2) by setting the exponent field to 0x3ff (the exponent
   // of 1). The result is in $r6d. The exponent will be recovered later.
   ext u32 $r2 $r1 0xb14
   and b32 $r7 $r1 0x800fffff
   add b32 $r7 $r7 0x3ff00000
   long mov b32 $r6 $r0
   sched 0x2b 0x04 0x28 0x28 0x2a 0x2b 0x2e
   // Step 3: Convert new value to float (no overflow will occur due to step
   // 2), calculate rcp and do one Newton-Raphson step
   cvt rz f32 $r5 f64 $r6d
   long rcp f32 $r4 $r5
   mov b32 $r0 0xbf800000
   fma rn f32 $r5 $r4 $r5 $r0
   fma rn f32 $r0 neg $r4 $r5 $r4
   // Step 4: convert result $r0 back to double, do Newton-Raphson steps
   cvt f64 $r0d f32 $r0
   cvt f64 $r6d neg f64 $r6d
   sched 0x2e 0x29 0x29 0x29 0x29 0x29 0x29
   cvt f64 $r8d f32 0x3f800000
   // 4 Newton-Raphson steps, tmp in $r4d, result in $r0d
   // The formula used here (and above) is:
   //     RCP_{n + 1} = 2 * RCP_{n} - x * RCP_{n} * RCP_{n}
   // The following code uses 2 FMAs for each step, and it basically
   // looks like:
   //     tmp = -src * RCP_{n} + 1
   //     RCP_{n + 1} = RCP_{n} * tmp + RCP_{n}
   fma rn f64 $r4d $r6d $r0d $r8d
   fma rn f64 $r0d $r0d $r4d $r0d
   fma rn f64 $r4d $r6d $r0d $r8d
   fma rn f64 $r0d $r0d $r4d $r0d
   fma rn f64 $r4d $r6d $r0d $r8d
   fma rn f64 $r0d $r0d $r4d $r0d
   sched 0x29 0x20 0x28 0x28 0x28 0x28 0x28
   fma rn f64 $r4d $r6d $r0d $r8d
   fma rn f64 $r0d $r0d $r4d $r0d
   // Step 5: Exponent recovery and final processing
   // The exponent is recovered by adding back the exponent delta that was
   // applied to the input. Suppose we want to calculate rcp(x), but we
   // have rcp(cx), then
   //     rcp(x) = c * rcp(cx)
   // The delta in exponent comes from two sources:
   //   1) The renormalization in step 2. The delta is:
   //      0x3ff - $r2
   //   2) (For denorm input) The 2^54 we multiplied by at rcp_denorm, stored
   //      in $r3
   // These 2 sources are summed in the first two lines below, and then
   // added to the exponent extracted from the result above.
   // Note that after processing, the new exponent may be >= 0x7ff (inf)
   // or <= 0 (denorm). Those cases are handled separately below
   subr b32 $r2 $r2 0x3ff
   long add b32 $r4 $r2 $r3
   ext u32 $r3 $r1 0xb14
   // New exponent in $r3
   long add b32 $r3 $r3 $r4
   add b32 $r2 $r3 0xffffffff
   sched 0x28 0x2b 0x28 0x2b 0x28 0x28 0x2b
   // (exponent-1) < 0x7fe (unsigned) means the result is in norm range
   // (same logic as in step 1)
   set $p0 0x1 lt u32 $r2 0x7fe
   (not $p0) bra #rcp_result_inf_or_denorm
   // Norms: convert exponents back and return
   shl b32 $r4 $r4 clamp 0x14
   long add b32 $r1 $r4 $r1
   bra #rcp_end
rcp_result_inf_or_denorm:
   // New exponent >= 0x7ff means that result is inf
   set $p0 0x1 ge s32 $r3 0x7ff
   (not $p0) bra #rcp_result_denorm
   sched 0x20 0x25 0x28 0x2b 0x23 0x25 0x2f
   // Infinity
   and b32 $r1 $r1 0x80000000
   long mov b32 $r0 0x0
   add b32 $r1 $r1 0x7ff00000
   bra #rcp_end
rcp_result_denorm:
   // A denorm result comes from a huge input. The rcp of the greatest
   // possible fp64, 0x7fefffffffffffff, is 0x0004000000000000, 1/4 of the
   // smallest normal value. Every other rcp result is greater than that.
   // If we set the exponent field to 1, we can recover the result by
   // multiplying it by 1/2 or 1/4. 1/2 is used if the "exponent" $r3 is 0,
   // otherwise 1/4 ($r3 should be -1 then). This is quite tricky but
   // greatly simplifies the logic here.
   set $p0 0x1 ne u32 $r3 0x0
   and b32 $r1 $r1 0x800fffff
   // 0x3e800000: 1/4
   $p0 cvt f64 $r6d f32 0x3e800000
   sched 0x2f 0x28 0x2c 0x2e 0x2a 0x20 0x27
   // 0x3f000000: 1/2
   (not $p0) cvt f64 $r6d f32 0x3f000000
   add b32 $r1 $r1 0x00100000
   mul rn f64 $r0d $r0d $r6d
rcp_end:
   long ret

// RSQ F64: Newton Raphson rsqrt(x): r_{i+1} = r_i * (1.5 - 0.5 * x * r_i * r_i)
//
// INPUT:   $r0d (x)
// OUTPUT:  $r0d (rsqrt(x))
// CLOBBER: $r2 - $r9, $p0 - $p1
// SIZE:    14 * 8 bytes
//
gk104_rsq_f64:
   // Before getting the initial result from rsqrt64h, two special cases
   // must be handled first.
   // 1. NaN: set the highest bit in the mantissa so that rsqrt64h is sure
   //    to recognize it as NaN
   set $p0 0x1 gtu f64 abs $r0d 0x7ff0000000000000
   $p0 or b32 $r1 $r1 0x00080000
   and b32 $r2 $r1 0x7fffffff
   sched 0x27 0x20 0x28 0x2c 0x25 0x28 0x28
   // 2. denorms and small normal values: using their original value will
   //    lose precision either at rsqrt64h or at the first of the
   //    Newton-Raphson steps below. Take 2 as a threshold in the exponent
   //    field, and multiply by 2^54 if the exponent is less than or equal
   //    to it. (The result is multiplied by 2^27 at the end to compensate.)
   ext u32 $r3 $r1 0xb14
   set $p1 0x1 le u32 $r3 0x2
   long or b32 $r2 $r0 $r2
   $p1 mul rn f64 $r0d $r0d 0x4350000000000000
   rsqrt64h $r5 $r1
   // rsqrt64h gives the correct result for 0/inf/nan. The following logic
   // checks whether the input is one of those (exponent is 0x7ff or all 0
   // except for the sign bit)
   set b32 $r6 ne u32 $r3 0x7ff
   long and b32 $r2 $r2 $r6
   sched 0x28 0x2b 0x20 0x27 0x28 0x2e 0x28
   set $p0 0x1 ne u32 $r2 0x0
   $p0 bra #rsq_norm
   // For 0/inf/nan, make sure the sign bit agrees with input and return
   and b32 $r1 $r1 0x80000000
   long mov b32 $r0 0x0
   long or b32 $r1 $r1 $r5
   long ret
rsq_norm:
   // For others, do 4 Newton-Raphson steps with the formula:
   //     RSQ_{n + 1} = RSQ_{n} * (1.5 - 0.5 * x * RSQ_{n} * RSQ_{n})
   // In the code below, each step is written as:
   //     tmp1 = 0.5 * x * RSQ_{n}
   //     tmp2 = -RSQ_{n} * tmp1 + 0.5
   //     RSQ_{n + 1} = RSQ_{n} * tmp2 + RSQ_{n}
   long mov b32 $r4 0x0
   sched 0x2f 0x29 0x29 0x29 0x29 0x29 0x29
   // 0x3f000000: 1/2
   cvt f64 $r8d f32 0x3f000000
   mul rn f64 $r2d $r0d $r8d
   mul rn f64 $r0d $r2d $r4d
   fma rn f64 $r6d neg $r4d $r0d $r8d
   fma rn f64 $r4d $r4d $r6d $r4d
   mul rn f64 $r0d $r2d $r4d
   fma rn f64 $r6d neg $r4d $r0d $r8d
   sched 0x29 0x29 0x29 0x29 0x29 0x29 0x29
   fma rn f64 $r4d $r4d $r6d $r4d
   mul rn f64 $r0d $r2d $r4d
   fma rn f64 $r6d neg $r4d $r0d $r8d
   fma rn f64 $r4d $r4d $r6d $r4d
   mul rn f64 $r0d $r2d $r4d
   fma rn f64 $r6d neg $r4d $r0d $r8d
   fma rn f64 $r4d $r4d $r6d $r4d
   sched 0x29 0x20 0x28 0x2e 0x00 0x00 0x00
   // Multiply the result by 2^27 for small inputs to undo the 2^54 scaling
   $p1 mul rn f64 $r4d $r4d 0x41a0000000000000
   long mov b32 $r1 $r5
   long mov b32 $r0 $r4
   long ret

//
// Trap handler.
// Requires at least 4 GPRs and 32 bytes of l[] memory to temporarily save GPRs.
// Low 32 bytes of l[] memory shouldn't be used if resumability is required.
//
// Trap info:
//   0x000: mutex
//   0x004: PC
//   0x008: trapstat
//   0x00c: warperr
//   0x010: tidx
//   0x014: tidy
//   0x018: tidz
//   0x01c: ctaidx
//   0x020: ctaidy
//   0x024: ctaidz
//   0x030: $r0q
//   0x130: $flags
//   0x140: s[]
//
st b128 wb l[0x00] $r0q
// check state of the warp and continue if it didn't cause the trap
long mov b32 $r1 $trapstat
long mov b32 $r3 $warperr
mov $r2 $flags mask 0xffff
and b32 0 $c $r1 $r3
e $c bra #end_cont
// spill control flow stack to l[]
long mov b32 $r3 16
spill_cfstack:
preret #end_exit
sub b32 $r3 $c $r3 0x1
lg $c bra #spill_cfstack
// retrieve pointer to trap info
mov b32 $r0 c0[0x1900]
mov b32 $r1 c0[0x1904]
// we only let a single faulting thread store its state
mov b32 $r3 0x1
exch b32 $r3 g[$r0d] $r3
joinat #end_exit
set $p0 0x1 eq u32 $r3 0x1
join $p0 nop
// store $c and $p registers
st b32 wb g[$r0d+0x130] $r2
// store $trapstat and $warperr
long mov b32 $r2 $trapstat
long mov b32 $r3 $warperr
st b64 wb g[$r0d+0x8] $r2d
// store registers
st b128 wb g[$r0d+0x40] $r4q
st b128 wb g[$r0d+0x50] $r8q
st b128 wb g[$r0d+0x60] $r12q
st b128 wb g[$r0d+0x70] $r16q
st b128 wb g[$r0d+0x80] $r20q
st b128 wb g[$r0d+0x90] $r24q
st b128 wb g[$r0d+0xa0] $r28q
st b128 wb g[$r0d+0xb0] $r32q
st b128 wb g[$r0d+0xc0] $r36q
st b128 wb g[$r0d+0xd0] $r40q
st b128 wb g[$r0d+0xe0] $r44q
st b128 wb g[$r0d+0xf0] $r48q
st b128 wb g[$r0d+0x100] $r52q
st b128 wb g[$r0d+0x110] $r56q
st b128 wb g[$r0d+0x120] $r60q
ld b64 $r2d cs l[0x0]
st b64 wb g[$r0d+0x30] $r2d
ld b64 $r2d cs l[0x8]
st b64 wb g[$r0d+0x38] $r2d
// store thread id
long mov b32 $r2 $tidx
long mov b32 $r3 $tidy
st b64 wb g[$r0d+0x10] $r2d
long mov b32 $r2 $tidz
long mov b32 $r3 $ctaidx
st b64 wb g[$r0d+0x18] $r2d
long mov b32 $r2 $ctaidy
long mov b32 $r3 $ctaidz
st b64 wb g[$r0d+0x20] $r2d
// store shared memory (in reverse order so $r0d is base again at the end)
long mov b32 $r3 $smemsz
sub b32 $r3 $c $r3 0x4
s $c bra #shared_done
add b32 $r0 $c $r0 $r3
add b32 $r1 $r1 0x0 $c
shared_loop:
long ld b32 $r2 s[$r3]
long st b32 wb g[$r0d+0x140] $r2
sub b32 $r0 $c $r0 0x4
sub b32 $r1 $r1 0x0 $c
sub b32 $r3 $c $r3 0x4
lg $c bra #shared_loop
shared_done:
// search the stack for trap entry to retrieve PC
mov b32 $r0 c0[0x1908]
mov b32 $r1 c0[0x190c]
membar sys
// invalidate caches so we can read stack entries via g[]
cctl ivall 0 l[0]
cctl ivall 0 g[$r0d]
// get offsets
mov b32 $r2 $physid
ext u32 $r3 $r2 0x0814 // MP id
ext u32 $r2 $r2 0x0608 // warp id
mul $r2 u32 $r2 u32 c0[0x1914] // warp offset
mul $r3 u32 $r3 u32 c0[0x1910] // MP offset
add b32 $r2 $r2 $r3 // MP + warp offset
add b32 $r0 $c $r0 $r2
add b32 $r1 $r1 0x0 $c
search_cstack:
mov b32 $r3 c0[0x1918] // cstack size
ld u8 $r2 cv g[$r0d+0x8]
set $p0 0x1 eq u32 $r2 0xa
$p0 bra #entry_found
add b32 $r0 $c $r0 0x10
add b32 $r1 $r1 0x0 $c
sub b32 $r3 $c $r3 0x10
lg $c bra #search_cstack
bra #end_exit
entry_found:
// load PC (may be unaligned and spread out)
ld b32 $r2 cv g[$r0d]
mov b32 $r0 c0[0x1900]
mov b32 $r1 c0[0x1904]
st b32 wb g[$r0d+0x4] $r2
join nop
// invalidate caches and exit
end_exit:
cctl ivall 0 g[0]
bpt pause 0x0
rtt terminate
end_cont:
bpt pause 0x0
mov $flags $r2 mask 0xffff
ld b128 $r0q cs l[0x00]
rtt

.section #gk104_builtin_offsets
.b64 #gk104_div_u32
.b64 #gk104_div_s32
.b64 #gk104_rcp_f64
.b64 #gk104_rsq_f64