xref: /aosp_15_r20/external/llvm/test/CodeGen/AMDGPU/lds-output-queue.ll (revision 9880d6810fe72a1726cb53787c6711e909410d58)
; RUN: llc < %s -march=r600 -mcpu=redwood -verify-machineinstrs | FileCheck %s
;
; This test checks that the LDS output queue (OQAP) is empty at the end of
; the ALU clause.

; CHECK-LABEL: {{^}}lds_input_queue:
; CHECK: LDS_READ_RET * OQAP
; CHECK-NOT: ALU clause
; CHECK: MOV * T{{[0-9]\.[XYZW]}}, OQAP

@local_mem = internal unnamed_addr addrspace(3) global [2 x i32] undef, align 4

define void @lds_input_queue(i32 addrspace(1)* %out, i32 addrspace(1)* %in, i32 %index) {
entry:
  %0 = getelementptr inbounds [2 x i32], [2 x i32] addrspace(3)* @local_mem, i32 0, i32 %index
  %1 = load i32, i32 addrspace(3)* %0
  call void @llvm.AMDGPU.barrier.local()

  ; This will start a new clause for the vertex fetch
  %2 = load i32, i32 addrspace(1)* %in
  %3 = add i32 %1, %2
  store i32 %3, i32 addrspace(1)* %out
  ret void
}

declare void @llvm.AMDGPU.barrier.local()

; The machine scheduler does not do proper alias analysis and assumes that
; loads from global values alias with all other loads.  (Note that a global
; value is different from a value in global memory.  A global value is a
; value that is declared outside of a function; it can reside in any address
; space.)
;
; This is a problem for scheduling the reads from the local data share (LDS).
; These reads are implemented using two instructions.  The first copies the
; data from LDS into the LDS output queue, and the second moves the data from
; the output queue into a general purpose register.  These two instructions
; don't have to be scheduled one after the other, but they do need to be
; scheduled in the same clause.  The aliasing problem mentioned above causes
; problems when there is a load from global memory which immediately follows
; a load from a global value that has been declared in the local memory space:
;
;  %0 = getelementptr inbounds [2 x i32], [2 x i32] addrspace(3)* @local_mem, i32 0, i32 %index
;  %1 = load i32, i32 addrspace(3)* %0
;  %2 = load i32, i32 addrspace(1)* %in
;
; The instruction selection phase will generate ISA that looks like this:
; %OQAP = LDS_READ_RET
; %vreg0 = MOV %OQAP
; %vreg1 = VTX_READ_32
; %vreg2 = ADD_INT %vreg1, %vreg0
;
; The bottom-up scheduler will schedule the two ALU instructions first:
;
; UNSCHEDULED:
; %OQAP = LDS_READ_RET
; %vreg1 = VTX_READ_32
;
; SCHEDULED:
;
; %vreg0 = MOV %OQAP
; %vreg2 = ADD_INT %vreg1, %vreg0
;
; Because the scheduler lacks proper alias analysis, it treats the local memory
; read (LDS_READ_RET) and the global memory read (VTX_READ_32) as having a
; chain dependency, so the global memory read will always be scheduled first.
; This will give us a final program which looks like this:
;
; ALU clause:
; %OQAP = LDS_READ_RET
; VTX clause:
; %vreg1 = VTX_READ_32
; ALU clause:
; %vreg0 = MOV %OQAP
; %vreg2 = ADD_INT %vreg1, %vreg0
;
; This is an illegal program because the OQAP def and use now occur in
; different ALU clauses.
;
; This test checks this scenario and makes sure it doesn't result in an
; illegal program.  For now, we have fixed this issue by merging the
; LDS_READ_RET and MOV together during instruction selection and then
; expanding them after scheduling.  Once the scheduler has better alias
; analysis, we should be able to keep these instructions separate before
; scheduling.
;
; CHECK-LABEL: {{^}}local_global_alias:
; CHECK: LDS_READ_RET
; CHECK-NOT: ALU clause
; CHECK: MOV * T{{[0-9]\.[XYZW]}}, OQAP
define void @local_global_alias(i32 addrspace(1)* %out, i32 addrspace(1)* %in) {
entry:
  %0 = getelementptr inbounds [2 x i32], [2 x i32] addrspace(3)* @local_mem, i32 0, i32 0
  %1 = load i32, i32 addrspace(3)* %0
  %2 = load i32, i32 addrspace(1)* %in
  %3 = add i32 %2, %1
  store i32 %3, i32 addrspace(1)* %out
  ret void
}