# Unofficial GCN/RDNA ISA reference errata

## `v_sad_u32`

The Vega ISA reference writes its behaviour as:

```
D.u = abs(S0.i - S1.i) + S2.u.
```

This is incorrect. The actual behaviour is what is written in the GCN3 reference
guide:

```
ABS_DIFF (A,B) = (A>B) ? (A-B) : (B-A)
D.u = ABS_DIFF (S0.u,S1.u) + S2.u
```

The instruction doesn't subtract S1 from S0 and take the absolute value (the
_signed_ distance); it uses the _unsigned_ distance between the operands. So
`v_sad_u32(-5, 0, 0)` would return `4294967291` (`-5` interpreted as unsigned),
not `5`.

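The difference between the two readings can be checked with a small Python
model. This is only a sketch of the GCN3 pseudocode, not a description of the
hardware: the function names are made up for illustration, and the 32-bit
wrap-around of the final addition is an assumption.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x: int) -> int:
    """Reinterpret the low 32 bits of x as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def v_sad_u32(s0: int, s1: int, s2: int) -> int:
    """Unsigned reading (GCN3 guide): ABS_DIFF(S0.u, S1.u) + S2.u."""
    a, b = s0 & MASK32, s1 & MASK32  # operands reinterpreted as unsigned
    abs_diff = a - b if a > b else b - a
    # Assumption: the result wraps to 32 bits.
    return (abs_diff + (s2 & MASK32)) & MASK32


def v_sad_u32_vega_doc(s0: int, s1: int, s2: int) -> int:
    """Signed reading (Vega guide), shown only for contrast."""
    return (abs(to_signed32(s0) - to_signed32(s1)) + (s2 & MASK32)) & MASK32


print(v_sad_u32(-5, 0, 0))           # unsigned distance
print(v_sad_u32_vega_doc(-5, 0, 0))  # what the Vega text would imply
```

The two functions only disagree when the operands differ in their top bit,
which is why the error is easy to miss with small positive test values.
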
## `s_bfe_*`

The RDNA, Vega and GCN3 ISA references all state that these instructions don't
write SCC. They do.

## `v_bcnt_u32_b32`

The Vega ISA reference writes its behaviour as:

```
D.u = 0;
for i in 0 ... 31 do
D.u += (S0.u[i] == 1 ? 1 : 0);
endfor.
```

This is incorrect. The actual behaviour (and number of operands) is what
is written in the GCN3 reference guide:

```
D.u = CountOneBits(S0.u) + S1.u.
```

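As a quick illustration of the two-operand form, here is a minimal Python model
of the GCN3 formula. The function name mirrors the instruction for readability;
the 32-bit wrap of the sum is an assumption, not something the guides state.

```python
MASK32 = 0xFFFFFFFF


def v_bcnt_u32_b32(s0: int, s1: int) -> int:
    """D.u = CountOneBits(S0.u) + S1.u, modelled on 32-bit unsigned values."""
    ones = bin(s0 & MASK32).count("1")  # population count of S0
    # Assumption: the addition wraps to 32 bits.
    return (ones + (s1 & MASK32)) & MASK32


print(v_bcnt_u32_b32(0b1011, 10))
```

The second operand makes the instruction usable as an accumulating bit count,
which the single-operand Vega description cannot express.
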
## `v_alignbyte_b32`

All versions of the ISA document are vague about it, but after some trial and
error we discovered that only 2 bits of the 3rd operand are used.
Therefore, this instruction can't shift by more than 24 bits.

The correct description of `v_alignbyte_b32` is probably the following:

```
D.u = ({S0, S1} >> (8 * S2.u[1:0])) & 0xffffffff
```

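The formula above can be modelled in a few lines of Python. This is a sketch of
the "probably correct" description only; it assumes `{S0, S1}` places S0 in the
upper 32 bits, which is the usual reading of that notation.

```python
MASK32 = 0xFFFFFFFF


def v_alignbyte_b32(s0: int, s1: int, s2: int) -> int:
    """D.u = ({S0, S1} >> (8 * S2.u[1:0])) & 0xffffffff."""
    # Assumption: S0 is the high dword of the 64-bit concatenation.
    combined = ((s0 & MASK32) << 32) | (s1 & MASK32)
    shift = 8 * (s2 & 0x3)  # only the low 2 bits of S2 are used
    return (combined >> shift) & MASK32


print(hex(v_alignbyte_b32(0x11223344, 0x55667788, 1)))
print(hex(v_alignbyte_b32(0x11223344, 0x55667788, 5)))  # same shift as S2 = 1
```

Because only `S2.u[1:0]` is read, a shift amount of 4 or more silently wraps
around instead of selecting bytes further into S0.
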
## SMEM stores

The Vega ISA reference doesn't say this (or doesn't make it clear), but
the offset for SMEM stores must be in m0 if IMM == 0.

The RDNA ISA doesn't mention SMEM stores at all, but they seem to be supported
by the chip and are present in LLVM. AMD devs however highly recommend avoiding
these instructions.

## SMEM atomics

RDNA ISA: same as the SMEM stores, the ISA pretends they don't exist, but they
are there in LLVM.

## VMEM stores

All reference guides say (under "Vector Memory Instruction Data Dependencies"):

> When a VM instruction is issued, the address is immediately read out of VGPRs
> and sent to the texture cache. Any texture or buffer resources and samplers
> are also sent immediately. However, write-data is not immediately sent to the
> texture cache.

Reading that, one might think that waitcnts need to be added when writing to
the registers used for a VMEM store's data. Experimentation has shown that this
does not seem to be the case on GFX8 and GFX9 (GFX6 and GFX7 are untested). It
also seems unlikely, since NOPs are apparently needed in a subset of these
situations.

## MIMG opcodes on GFX8/GCN3

The `image_atomic_{swap,cmpswap,add,sub}` opcodes in the GCN3 ISA reference
guide are incorrect. The Vega ISA reference guide has the correct ones.

## VINTRP encoding

The Vega ISA doc says the encoding should be `110010`, but `110101` works.

## VOP1 instructions encoded as VOP3

The RDNA ISA doc says that `0x140` should be added to the opcode, but that
doesn't work. What works is adding `0x180`, which LLVM also does.

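A minimal sketch of the opcode translation described above. The constant and
function names are hypothetical, and the sample opcode `0x01` is used purely as
an input value for illustration.

```python
VOP3_OFFSET_FROM_DOC = 0x140  # what the RDNA ISA doc says (does not work)
VOP3_OFFSET_WORKING = 0x180   # what works in practice, and what LLVM does


def vop1_as_vop3_opcode(vop1_opcode: int) -> int:
    """Return the VOP3 opcode for a VOP1 instruction on RDNA."""
    return vop1_opcode + VOP3_OFFSET_WORKING


print(hex(vop1_as_vop3_opcode(0x01)))
```
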
## FLAT, Scratch, Global instructions

The NV bit was removed in RDNA, but some parts of the doc still mention it.

RDNA ISA doc 13.8.1 says that SADDR should be set to 0x7f when ADDR is used, but
9.3.1 says it should be set to NULL. We assume 9.3.1 is correct and set it to
SGPR_NULL.

## Legacy instructions

Some instructions have a `_LEGACY` variant which implements "DX9 rules", in which
the zero "wins" in multiplications, i.e. `0.0*x` is always `0.0`. The Vega ISA
mentions `V_MAC_LEGACY_F32`, but this instruction is not really there on Vega.

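The "zero wins" rule is easiest to see next to ordinary IEEE multiplication.
The sketch below uses Python floats (doubles) rather than 32-bit floats, and
`mul_legacy` is a hypothetical helper name; the sign of the returned zero is an
assumption, since the text above only says the result is `0.0`.

```python
import math


def mul_legacy(a: float, b: float) -> float:
    """Multiply with DX9 rules: if either operand is zero, the result is 0.0."""
    if a == 0.0 or b == 0.0:
        # Assumption: the result is +0.0 regardless of operand signs.
        return 0.0
    return a * b


print(0.0 * math.inf)              # IEEE: nan
print(mul_legacy(0.0, math.inf))   # DX9 rules: 0.0
print(mul_legacy(math.nan, 0.0))   # DX9 rules: 0.0
```

For non-zero operands the helper behaves exactly like a regular multiply.
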
## LDS size and allocation granule

GFX7-8 ISA manuals are mistaken about the available LDS size.

* GFX7+ workgroups can use 64KB LDS.
  There is 64KB LDS per CU.
* GFX6 workgroups can use 32KB LDS.
  There is 64KB LDS per CU, but a single workgroup can only use half of it.

Regarding the LDS allocation granule, Mesa has the correct details and
the ISA manuals are mistaken.

## `m0` with LDS instructions on Vega and newer

The Vega ISA doc (both the old one and the "7nm" one) claims that LDS instructions
use the `m0` register for address clamping like older GPUs, but this is not the case.

In reality, only the `_addtid` variants of LDS instructions use `m0` on Vega and
newer GPUs, so the relevant section of the RDNA ISA doc seems to apply.
LLVM also doesn't emit any initialization of `m0` for LDS instructions, and this
was also confirmed by AMD devs.

## RDNA L0, L1 cache and DLC, GLC bits

The old L1 cache was renamed to L0, and a new L1 cache was added to RDNA. There
is one L1 cache per shader array. Some instruction encodings have DLC and
GLC bits that interact with the cache.

* DLC ("device level coherent") bit: controls the L1 cache
* GLC ("globally coherent") bit: controls the L0 cache

The recommendation from AMD devs is to always set these two bits at the same time,
as it doesn't make much sense to set them independently, aside from some
circumstances (e.g. we needn't set DLC when only one shader array is used).

Stores and atomics always bypass the L1 cache, so they don't support the DLC bit,
and it shouldn't be set in these cases. Setting the DLC bit in these cases can
result in graphical glitches or hangs.

## RDNA `s_dcache_wb`

The `s_dcache_wb` instruction is not mentioned in the RDNA ISA doc, but it is
needed in order to achieve correct behavior in some SSBO CTS tests.

## RDNA subvector mode

The documentation of `s_subvector_loop_begin` and `s_subvector_loop_end` is not clear
on what sort of addressing should be used, but it says that it
"is equivalent to an `S_CBRANCH` with extra math", so the subvector loop handling
in ACO is done according to the `s_cbranch` doc.

## RDNA early rasterization

The ISA documentation says about `s_endpgm`:

> The hardware implicitly executes S_WAITCNT 0 and S_WAITCNT_VSCNT 0
> before executing this instruction.

What the doc doesn't say is that in case of NGG (and legacy VS) when there
are no param exports, the driver sets `NO_PC_EXPORT=1` for optimal performance,
and when this is set, the hardware will start clipping and rasterization
as soon as it encounters a position export with `DONE=1`, without waiting
for the NGG (or VS) to finish.

It can even launch PS waves before NGG (or VS) ends.

When this happens, any store performed by a VS is not guaranteed
to be complete when PS tries to load it, so we need to manually
insert wait instructions before the position exports.

## A16 and G16

On GFX9, the A16 field enables both 16-bit addresses and derivatives.
Since GFX10, these are fully independent of each other: A16 controls 16-bit
addresses and G16 opcodes control 16-bit derivatives. A16 without G16 uses
32-bit derivatives.

## POPS collision wave ID argument (GFX9-10.3)

The 2020 RDNA and RDNA 2 ISA references contain incorrect offsets and widths of
the fields of the "POPS collision wave ID" SGPR argument.

According to the code generated for Rasterizer Ordered View usage in Direct3D,
the correct layout is:

* [31]: Whether overlap has occurred.
* [29:28] (GFX10+) / [28] (GFX9): ID of the packer the wave should be associated
  with.
* [25:16]: Newest overlapped wave ID.
* [9:0]: Current wave ID.

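The layout listed above can be expressed as a small decoder. This is an
illustrative sketch only: the function name, the dictionary keys and the
`gfx10_plus` flag are invented here, and only the bit positions come from the
list.

```python
def decode_pops_collision_wave_id(arg: int, gfx10_plus: bool) -> dict:
    """Split the POPS collision wave ID SGPR argument into its fields."""
    packer_mask = 0x3 if gfx10_plus else 0x1  # [29:28] on GFX10+, [28] on GFX9
    return {
        "overlap_occurred": bool((arg >> 31) & 0x1),         # [31]
        "packer_id": (arg >> 28) & packer_mask,
        "newest_overlapped_wave_id": (arg >> 16) & 0x3FF,    # [25:16]
        "current_wave_id": arg & 0x3FF,                      # [9:0]
    }


# Sample value built by hand from the layout, not captured from hardware.
sample = (1 << 31) | (2 << 28) | (0x155 << 16) | 0x2AA
print(decode_pops_collision_wave_id(sample, gfx10_plus=True))
```
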
## RDNA3 `v_pk_fmac_f16_dpp`

"Table 30. Which instructions support DPP" in the RDNA3 ISA documentation has no exception for
VOP2 `v_pk_fmac_f16`. But as with all other packed math opcodes, DPP does not function in practice.
RDNA1 and RDNA2 support `v_pk_fmac_f16_dpp`.


# Hardware Bugs

## SMEM corrupts VCCZ on SI/CI

[See this LLVM source.](https://github.com/llvm/llvm-project/blob/acb089e12ae48b82c0b05c42326196a030df9b82/llvm/lib/Target/AMDGPU/SIInsertWaits.cpp#L580-L616)

After issuing an SMEM instruction, we need to wait for the SMEM instruction to
finish and then write to vcc (for example, `s_mov_b64 vcc, vcc`) to correct vccz.

Currently, we don't do this.

## SGPR offset on MUBUF prevents addr clamping on SI/CI

[See this LLVM source.](https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp#L1917-L1922)

This leads to wrong bounds checking; using a VGPR offset fixes it.

## Unused VMEM/DS destination lanes can't be used without waiting

On GFX11, we can't safely read/write unused lanes of VMEM/DS destination
VGPRs without waiting for the load to finish.

## GCN / GFX6 hazards

### VINTRP followed by a read with `v_readfirstlane` or `v_readlane`

It's required to insert 1 wait state if the dst VGPR of any `v_interp_*` is
followed by a read with `v_readfirstlane` or `v_readlane` to fix GPU hangs on GFX6.
Note that `v_writelane_*` is apparently not affected. This hazard isn't
documented anywhere but AMD confirmed it.

## RDNA / GFX10 hazards

### SMEM store followed by a load with the same address

We found that an `s_buffer_load` will produce incorrect results if it is preceded
by an `s_buffer_store` with the same address. Inserting an `s_nop` between them
does not mitigate the issue, so an `s_waitcnt lgkmcnt(0)` must be inserted.
This is not mentioned by LLVM among the other GFX10 bugs, but LLVM doesn't use
SMEM stores, so it's not surprising that they didn't notice it.

### VMEMtoScalarWriteHazard

Triggered by:
A VMEM/FLAT/GLOBAL/SCRATCH/DS instruction reads an SGPR (or EXEC, or M0).
Then, a SALU/SMEM instruction writes the same SGPR.

Mitigated by:
A VALU instruction or an `s_waitcnt` between the two instructions.

### SMEMtoVectorWriteHazard

Triggered by:
An SMEM instruction reads an SGPR. Then, a VALU instruction writes that same SGPR.

Mitigated by:
Any non-SOPP SALU instruction (except `s_setvskip`, `s_version`, and any non-lgkmcnt `s_waitcnt`).

### Offset3fBug

Any branch that is located at offset 0x3f will be buggy. Just insert some NOPs to make sure no branch
is located at this offset.

### InstFwdPrefetchBug

According to LLVM, the `s_inst_prefetch` instruction can cause a hang on GFX10.
It seems to be resolved on GFX10.3+. There are no further details.

### LdsMisalignedBug

When there is a misaligned multi-dword FLAT load/store instruction in WGP mode,
it needs to be split into multiple single-dword FLAT instructions.

ACO doesn't use FLAT load/store on GFX10, so it is unaffected.

### FlatSegmentOffsetBug

The 12-bit immediate OFFSET field of FLAT instructions must always be 0.
GLOBAL and SCRATCH are unaffected.

ACO doesn't use FLAT load/store on GFX10, so it is unaffected.

### VcmpxPermlaneHazard

Triggered by:
Any permlane instruction that follows any VOPC instruction which writes exec.

Mitigated by:
Any VALU instruction except `v_nop`.

### VcmpxExecWARHazard

Triggered by:
Any non-VALU instruction reads the EXEC mask. Then, any VALU instruction writes the EXEC mask.

Mitigated by:
A VALU instruction that writes an SGPR (or has a valid SDST operand), or `s_waitcnt_depctr 0xfffe`.
Note: `s_waitcnt_depctr` is an internal instruction, so there is no further information
about what it does or what its operand means.

### LdsBranchVmemWARHazard

Triggered by:
A VMEM/GLOBAL/SCRATCH instruction, then a branch, then a DS instruction,
or vice versa: a DS instruction, then a branch, then a VMEM/GLOBAL/SCRATCH instruction.

Mitigated by:
Only `s_waitcnt_vscnt null, 0`. Needed even if the first instruction is a load.

### NSAClauseBug

"MIMG-NSA in a hard clause has unpredictable results on GFX10.1"

### NSAMaxSize5

NSA MIMG instructions should be limited to 3 dwords before GFX10.3 to avoid
stability issues: https://reviews.llvm.org/D103348

## RDNA3 / GFX11 hazards

### VcmpxPermlaneHazard

Same as GFX10.

### LdsDirectVALUHazard

Triggered by:
An LDSDIR instruction writing a VGPR soon after it's used by a VALU instruction.

Mitigated by:
A vdst wait, preferably using the LDSDIR's field.

### LdsDirectVMEMHazard

Triggered by:
An LDSDIR instruction writing a VGPR after it's used by a VMEM/DS instruction.

Mitigated by:
Waiting for the VMEM/DS instruction to finish, a VALU or export instruction, or
`s_waitcnt_depctr 0xffe3`.

### VALUTransUseHazard

Triggered by:
A VALU instruction reading a VGPR written by a transcendental VALU instruction without 6+ VALU or 2+
transcendental instructions in-between.

Mitigated by:
A va_vdst=0 wait: `s_waitcnt_depctr 0x0fff`

### VALUPartialForwardingHazard

Triggered by:
A VALU instruction reading two VGPRs: one written before an exec write by SALU and one after. To
trigger, there must be fewer than 3 VALU instructions between the first and second VGPR writes and
fewer than 5 VALU instructions between the second VGPR write and the current instruction.

Mitigated by:
A va_vdst=0 wait: `s_waitcnt_depctr 0x0fff`

### VALUMaskWriteHazard

Triggered by:
A SALU write, followed by a SALU or VALU read, of an SGPR that was previously used as a lane mask
for a VALU instruction.

Mitigated by:
A VALU instruction reading a non-exec SGPR before the SALU write, or a sa_sdst=0 wait after the
SALU write: `s_waitcnt_depctr 0xfffe`

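
The `s_waitcnt_depctr` immediates quoted in the hazard descriptions are
consistent with a 16-bit value in which every counter field defaults to all
ones ("don't wait") and the field being waited on is cleared. The sketch below
encodes that reading. The bit positions are assumptions inferred from the
immediates in this document (`0x0fff`, `0xfffe`, `0xffe3`), and the name
`vm_vsrc` for the field cleared by `0xffe3` is also an assumption; none of this
is taken from an official description of the instruction.

```python
# Assumed (shift, width) of each field; inferred, not documented.
DEPCTR_FIELDS = {
    "va_vdst": (12, 4),  # cleared by 0x0fff
    "vm_vsrc": (2, 3),   # cleared by 0xffe3 (field name is an assumption)
    "sa_sdst": (0, 1),   # cleared by 0xfffe
}


def depctr_imm(**waits: int) -> int:
    """Build an s_waitcnt_depctr immediate; unspecified fields stay all ones."""
    imm = 0xFFFF
    for name, value in waits.items():
        shift, width = DEPCTR_FIELDS[name]
        mask = ((1 << width) - 1) << shift
        imm = (imm & ~mask) | ((value << shift) & mask)
    return imm


print(hex(depctr_imm(va_vdst=0)))
print(hex(depctr_imm(sa_sdst=0)))
print(hex(depctr_imm(vm_vsrc=0)))
```

Building the immediate from named fields makes it easier to combine several
waits into one instruction than hard-coding the individual constants.
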