# Unofficial GCN/RDNA ISA reference errata

## `v_sad_u32`

The Vega ISA reference writes its behaviour as:

```
D.u = abs(S0.i - S1.i) + S2.u.
```

This is incorrect. The actual behaviour is what is written in the GCN3 reference
guide:

```
ABS_DIFF (A,B) = (A>B) ? (A-B) : (B-A)
D.u = ABS_DIFF (S0.u,S1.u) + S2.u
```

The instruction doesn't take the absolute value of the signed difference of S0
and S1 (the _signed_ distance); it uses the _unsigned_ distance between the
operands. So `v_sad_u32(-5, 0, 0)` would return `4294967291` (`-5` interpreted
as unsigned), not `5`.

## `s_bfe_*`

The RDNA, Vega and GCN3 ISA references all state that these instructions don't
write SCC. They do.
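
Returning to the `v_sad_u32` erratum above, the difference between the two descriptions can be sketched in Python. This is only an illustration: the function names are ours, and wrapping the final sum to 32 bits is an assumption, since neither formula states what happens on overflow.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Reinterpret the low 32 bits of x as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def v_sad_u32(s0, s1, s2):
    """GCN3 description: unsigned absolute difference plus S2."""
    a, b = s0 & MASK32, s1 & MASK32
    diff = a - b if a > b else b - a
    # Assumption: the result wraps to 32 bits.
    return (diff + s2) & MASK32


def v_sad_u32_vega_doc(s0, s1, s2):
    """Vega description taken literally (signed difference); not what the hardware does."""
    diff = abs(to_signed32(s0) - to_signed32(s1))
    # Assumption: the result wraps to 32 bits.
    return (diff + s2) & MASK32
```

With these definitions, `v_sad_u32(-5, 0, 0)` gives `4294967291`, while the literal reading of the Vega text gives `5`.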

## `v_bcnt_u32_b32`

The Vega ISA reference writes its behaviour as:

```
D.u = 0;
for i in 0 ... 31 do
D.u += (S0.u[i] == 1 ? 1 : 0);
endfor.
```

This is incorrect. The actual behaviour (and number of operands) is what
is written in the GCN3 reference guide:

```
D.u = CountOneBits(S0.u) + S1.u.
```

## `v_alignbyte_b32`

All versions of the ISA document are vague about it, but after some trial and
error we discovered that only 2 bits of the 3rd operand are used.
Therefore, this instruction can't shift more than 24 bits.
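
A minimal Python sketch of this observation, assuming S0 supplies the upper 32 bits and S1 the lower 32 bits of the shifted 64-bit value. The function name is ours and the model only reflects what our experiments suggest, not an official definition.

```python
MASK32 = 0xFFFFFFFF


def v_alignbyte_b32(s0, s1, s2):
    """Model of v_alignbyte_b32 where only the low 2 bits of S2 matter."""
    # Assumption: S0 is the high dword, S1 the low dword.
    combined = ((s0 & MASK32) << 32) | (s1 & MASK32)
    # Only bits [1:0] of the third operand select the byte shift (0..3 bytes).
    shift_bits = 8 * (s2 & 0x3)
    return (combined >> shift_bits) & MASK32
```

Because the shift amount is masked to two bits, a third operand of `5` behaves like `1`, and `4` behaves like `0`.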

The correct description of `v_alignbyte_b32` is probably the following:

```
D.u = ({S0, S1} >> (8 * S2.u[1:0])) & 0xffffffff
```

## SMEM stores

The Vega ISA reference doesn't say this (or doesn't make it clear), but
the offset for SMEM stores must be in m0 if IMM == 0.

The RDNA ISA doesn't mention SMEM stores at all, but they seem to be supported
by the chip and are present in LLVM. However, AMD devs highly recommend avoiding
these instructions.

## SMEM atomics

RDNA ISA: same as the SMEM stores, the ISA pretends they don't exist, but they
are there in LLVM.

## VMEM stores

All reference guides say (under "Vector Memory Instruction Data Dependencies"):

> When a VM instruction is issued, the address is immediately read out of VGPRs
> and sent to the texture cache.
> Any texture or buffer resources and samplers
> are also sent immediately. However, write-data is not immediately sent to the
> texture cache.

Reading that, one might think that waitcnts need to be added when writing to
the registers used for a VMEM store's data. Experimentation has shown that this
does not seem to be the case on GFX8 and GFX9 (GFX6 and GFX7 are untested). It
also seems unlikely, since NOPs are apparently needed in a subset of these
situations.

## MIMG opcodes on GFX8/GCN3

The `image_atomic_{swap,cmpswap,add,sub}` opcodes in the GCN3 ISA reference
guide are incorrect. The Vega ISA reference guide has the correct ones.

## VINTRP encoding

The Vega ISA doc says the encoding should be `110010`, but `110101` works.

## VOP1 instructions encoded as VOP3

The RDNA ISA doc says that `0x140` should be added to the opcode, but that
doesn't work. What works is adding `0x180`, which LLVM also does.
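
As a small sketch of the opcode arithmetic described above (the helper and constant names are hypothetical, and the opcode value in the usage note is just an example input, not taken from any table):

```python
# Offset that works in practice on RDNA and matches what LLVM does.
VOP1_AS_VOP3_OFFSET = 0x180

# Offset stated in the RDNA ISA doc; kept only for comparison, it does not work.
DOCUMENTED_OFFSET = 0x140


def vop1_to_vop3_opcode(vop1_opcode):
    """Return the VOP3 opcode used to encode a VOP1 instruction on RDNA."""
    return vop1_opcode + VOP1_AS_VOP3_OFFSET
```

For example, a VOP1 opcode of `0x01` would be emitted as VOP3 opcode `0x181` rather than the documented `0x141`.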

## FLAT, Scratch, Global instructions

The NV bit was removed in RDNA, but some parts of the doc still mention it.

RDNA ISA doc 13.8.1 says that SADDR should be set to 0x7f when ADDR is used, but
9.3.1 says it should be set to NULL. We assume 9.3.1 is correct and set it to
SGPR_NULL.

## Legacy instructions

Some instructions have a `_LEGACY` variant which implements "DX9 rules", in which
the zero "wins" in multiplications, i.e. `0.0*x` is always `0.0`. The Vega ISA
mentions `V_MAC_LEGACY_F32`, but this instruction is not really there on Vega.

## LDS size and allocation granule

GFX7-8 ISA manuals are mistaken about the available LDS size.

* GFX7+ workgroups can use 64KB LDS.
  There is 64KB LDS per CU.
* GFX6 workgroups can use 32KB LDS.
  There is 64KB LDS per CU, but a single workgroup can only use half of it.

Regarding the LDS allocation granule, Mesa has the correct details and
the ISA manuals are mistaken.

## `m0` with LDS instructions on Vega and newer

The Vega ISA doc (both the old one and the "7nm" one) claims that LDS instructions
use the `m0` register for address clamping like older GPUs, but this is not the case.

In reality, only the `_addtid` variants of LDS instructions use `m0` on Vega and
newer GPUs, so the relevant section of the RDNA ISA doc seems to apply.
LLVM also doesn't emit any initialization of `m0` for LDS instructions, and this
was also confirmed by AMD devs.

## RDNA L0, L1 cache and DLC, GLC bits

The old L1 cache was renamed to L0, and a new L1 cache was added to RDNA. There
is one L1 cache per shader array. Some instruction encodings have DLC and
GLC bits that interact with the cache.

* DLC ("device level coherent") bit: controls the L1 cache
* GLC ("globally coherent") bit: controls the L0 cache

The recommendation from AMD devs is to always set these two bits at the same time,
as it doesn't make much sense to set them independently, aside from some
circumstances (e.g. we needn't set DLC when only one shader array is used).

Stores and atomics always bypass the L1 cache, so they don't support the DLC bit,
and it shouldn't be set in these cases. Setting DLC in these cases can result
in graphical glitches or hangs.

## RDNA `s_dcache_wb`

`s_dcache_wb` is not mentioned in the RDNA ISA doc, but it is needed in order
to achieve correct behavior in some SSBO CTS tests.

## RDNA subvector mode

The documentation of `s_subvector_loop_begin` and `s_subvector_loop_end` is not clear
on what sort of addressing should be used, but it says that it
"is equivalent to an `S_CBRANCH` with extra math", so the subvector loop handling
in ACO is done according to the `s_cbranch` doc.

## RDNA early rasterization

The ISA documentation says about `s_endpgm`:

> The hardware implicitly executes S_WAITCNT 0 and S_WAITCNT_VSCNT 0
> before executing this instruction.

What the doc doesn't say is that in case of NGG (and legacy VS) when there
are no param exports, the driver sets `NO_PC_EXPORT=1` for optimal performance,
and when this is set, the hardware will start clipping and rasterization
as soon as it encounters a position export with `DONE=1`, without waiting
for the NGG (or VS) to finish.

It can even launch PS waves before NGG (or VS) ends.

When this happens, any store performed by a VS is not guaranteed
to be complete when PS tries to load it, so we need to manually
make sure to insert wait instructions before the position exports.

## A16 and G16

On GFX9, the A16 field enables both 16-bit addresses and 16-bit derivatives.
On GFX10+, these are fully independent of each other: A16 controls 16-bit addresses
and G16 opcodes control 16-bit derivatives. A16 without G16 uses 32-bit derivatives.

## POPS collision wave ID argument (GFX9-10.3)

The 2020 RDNA and RDNA 2 ISA references contain incorrect offsets and widths of
the fields of the "POPS collision wave ID" SGPR argument.

According to the code generated for Rasterizer Ordered View usage in Direct3D,
the correct layout is:

* [31]: Whether overlap has occurred.
* [29:28] (GFX10+) / [28] (GFX9): ID of the packer the wave should be associated
  with.
* [25:16]: Newest overlapped wave ID.
* [9:0]: Current wave ID.
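
The layout above can be unpacked as in the following sketch. The function name, the `gfx10_plus` flag and the dictionary keys are hypothetical and only mirror the field list.

```python
def decode_pops_collision_wave_id(value, gfx10_plus):
    """Split the POPS collision wave ID SGPR into the fields listed above."""
    value &= 0xFFFFFFFF
    # The packer ID is 2 bits wide ([29:28]) on GFX10+ and 1 bit ([28]) on GFX9.
    packer_mask = 0x3 if gfx10_plus else 0x1
    return {
        "overlap": bool((value >> 31) & 0x1),                # [31]
        "packer_id": (value >> 28) & packer_mask,            # [29:28] or [28]
        "newest_overlapped_wave_id": (value >> 16) & 0x3FF,  # [25:16]
        "current_wave_id": value & 0x3FF,                    # [9:0]
    }
```

Both wave ID fields are 10 bits wide, so they are extracted with the same `0x3FF` mask.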

## RDNA3 `v_pk_fmac_f16_dpp`

"Table 30. Which instructions support DPP" in the RDNA3 ISA documentation has no exception for
VOP2 `v_pk_fmac_f16`. But like all other packed math opcodes, DPP does not function in practice.
RDNA1 and RDNA2 support `v_pk_fmac_f16_dpp`.


# Hardware Bugs

## SMEM corrupts VCCZ on SI/CI

[See this LLVM source.](https://github.com/llvm/llvm-project/blob/acb089e12ae48b82c0b05c42326196a030df9b82/llvm/lib/Target/AMDGPU/SIInsertWaits.cpp#L580-L616)

After issuing an SMEM instruction, we need to wait for it to finish and then
write to vcc (for example, `s_mov_b64 vcc, vcc`) to correct vccz.

Currently, we don't do this.

## SGPR offset on MUBUF prevents addr clamping on SI/CI

[See this LLVM source.](https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp#L1917-L1922)

This leads to wrong bounds checking; using a VGPR offset fixes it.

## Unused VMEM/DS destination lanes can't be used without waiting

On GFX11, we can't safely read/write unused lanes of VMEM/DS destination
VGPRs without waiting for the load to finish.

## GCN / GFX6 hazards

### VINTRP followed by a read with `v_readfirstlane` or `v_readlane`

One wait state must be inserted if a write to the dst VGPR of any `v_interp_*` is
followed by a read with `v_readfirstlane` or `v_readlane`, to fix GPU hangs on GFX6.
Note that `v_writelane_*` is apparently not affected. This hazard isn't
documented anywhere, but AMD confirmed it.

## RDNA / GFX10 hazards

### SMEM store followed by a load with the same address

We found that an `s_buffer_load` will produce incorrect results if it is preceded
by an `s_buffer_store` with the same address. Inserting an `s_nop` between them
does not mitigate the issue, so an `s_waitcnt lgkmcnt(0)` must be inserted.
This is not mentioned by LLVM among the other GFX10 bugs, but LLVM doesn't use
SMEM stores, so it's not surprising that they didn't notice it.

### VMEMtoScalarWriteHazard

Triggered by:
VMEM/FLAT/GLOBAL/SCRATCH/DS instruction reads an SGPR (or EXEC, or M0).
Then, a SALU/SMEM instruction writes the same SGPR.

Mitigated by:
A VALU instruction or an `s_waitcnt` between the two instructions.

### SMEMtoVectorWriteHazard

Triggered by:
An SMEM instruction reads an SGPR. Then, a VALU instruction writes that same SGPR.

Mitigated by:
Any non-SOPP SALU instruction (except `s_setvskip`, `s_version`, and any non-lgkmcnt `s_waitcnt`).

### Offset3fBug

Any branch that is located at offset 0x3f will be buggy. Just insert some NOPs to make sure no branch
is located at this offset.

### InstFwdPrefetchBug

According to LLVM, the `s_inst_prefetch` instruction can cause a hang on GFX10.
It seems to be resolved on GFX10.3+. There are no further details.

### LdsMisalignedBug

When there is a misaligned multi-dword FLAT load/store instruction in WGP mode,
it needs to be split into multiple single-dword FLAT instructions.

ACO doesn't use FLAT load/store on GFX10, so it is unaffected.

### FlatSegmentOffsetBug

The 12-bit immediate OFFSET field of FLAT instructions must always be 0.
GLOBAL and SCRATCH are unaffected.

ACO doesn't use FLAT load/store on GFX10, so it is unaffected.

### VcmpxPermlaneHazard

Triggered by:
Any permlane instruction that follows any VOPC instruction which writes exec.

Mitigated by:
Any VALU instruction except `v_nop`.

### VcmpxExecWARHazard

Triggered by:
Any non-VALU instruction reads the EXEC mask. Then, any VALU instruction writes the EXEC mask.

Mitigated by:
A VALU instruction that writes an SGPR (or has a valid SDST operand), or `s_waitcnt_depctr 0xfffe`.
Note: `s_waitcnt_depctr` is an internal instruction, so there is no further information
about what it does or what its operand means.

### LdsBranchVmemWARHazard

Triggered by:
VMEM/GLOBAL/SCRATCH instruction, then a branch, then a DS instruction,
or vice versa: DS instruction, then a branch, then a VMEM/GLOBAL/SCRATCH instruction.

Mitigated by:
Only `s_waitcnt_vscnt null, 0`. Needed even if the first instruction is a load.

### NSAClauseBug

"MIMG-NSA in a hard clause has unpredictable results on GFX10.1"

### NSAMaxSize5

NSA MIMG instructions should be limited to 3 dwords before GFX10.3 to avoid
stability issues: https://reviews.llvm.org/D103348

## RDNA3 / GFX11 hazards

### VcmpxPermlaneHazard

Same as GFX10.

### LdsDirectVALUHazard

Triggered by:
LDSDIR instruction writing a VGPR soon after it's used by a VALU instruction.

Mitigated by:
A vdst wait, preferably using the LDSDIR's field.

### LdsDirectVMEMHazard

Triggered by:
LDSDIR instruction writing a VGPR after it's used by a VMEM/DS instruction.

Mitigated by:
Waiting for the VMEM/DS instruction to finish, a VALU or export instruction, or
`s_waitcnt_depctr 0xffe3`.

### VALUTransUseHazard

Triggered by:
A VALU instruction reading a VGPR written by a transcendental VALU instruction without 6+ VALU or 2+
transcendental instructions in-between.

Mitigated by:
A va_vdst=0 wait: `s_waitcnt_depctr 0x0fff`

### VALUPartialForwardingHazard

Triggered by:
A VALU instruction reading two VGPRs: one written before an exec write by SALU and one after. To
trigger, there must be fewer than 3 VALU instructions between the first and second VGPR writes and
fewer than 5 VALU instructions between the second VGPR write and the current instruction.

Mitigated by:
A va_vdst=0 wait: `s_waitcnt_depctr 0x0fff`

### VALUMaskWriteHazard

Triggered by:
A SALU instruction writing, and then a SALU or VALU instruction reading, an SGPR that was previously
used as a lane mask for a VALU instruction.

Mitigated by:
A VALU instruction reading a non-exec SGPR before the SALU write, or a sa_sdst=0 wait after the
SALU write: `s_waitcnt_depctr 0xfffe`