; RUN: llc < %s -march=r600 -mcpu=redwood -verify-machineinstrs | FileCheck %s
;
; This test checks that the LDS input queue is empty at the end of
; the ALU clause.

; CHECK-LABEL: {{^}}lds_input_queue:
; CHECK: LDS_READ_RET * OQAP
; CHECK-NOT: ALU clause
; CHECK: MOV * T{{[0-9]\.[XYZW]}}, OQAP

@local_mem = internal unnamed_addr addrspace(3) global [2 x i32] undef, align 4

define void @lds_input_queue(i32 addrspace(1)* %out, i32 addrspace(1)* %in, i32 %index) {
entry:
  %0 = getelementptr inbounds [2 x i32], [2 x i32] addrspace(3)* @local_mem, i32 0, i32 %index
  %1 = load i32, i32 addrspace(3)* %0
  call void @llvm.AMDGPU.barrier.local()

  ; This will start a new clause for the vertex fetch
  %2 = load i32, i32 addrspace(1)* %in
  %3 = add i32 %1, %2
  store i32 %3, i32 addrspace(1)* %out
  ret void
}

declare void @llvm.AMDGPU.barrier.local()

; The machine scheduler does not do proper alias analysis and assumes that
; loads from global values alias with all other loads. (Note that a global
; value is different from a value in global memory. A global value is a value
; that is declared outside of a function; it can reside in any address space.)
;
; This is a problem for scheduling the reads from the local data share (LDS).
; These reads are implemented using two instructions. The first copies the
; data from LDS into the LDS output queue, and the second moves the data from
; the output queue (OQAP) into a register. These two instructions don't have
; to be scheduled one after the other, but they do need to be scheduled in
; the same clause. The aliasing issue described above causes problems when a
; load from global memory immediately follows a load from a global value that
; has been declared in the local memory space:
;
;  %0 = getelementptr inbounds [2 x i32], [2 x i32] addrspace(3)* @local_mem, i32 0, i32 %index
;  %1 = load i32, i32 addrspace(3)* %0
;  %2 = load i32, i32 addrspace(1)* %in
;
; The instruction selection phase will generate ISA that looks like this:
;  %OQAP = LDS_READ_RET
;  %vreg0 = MOV %OQAP
;  %vreg1 = VTX_READ_32
;  %vreg2 = ADD_INT %vreg1, %vreg0
;
; The bottom scheduler will schedule the two ALU instructions first:
;
; UNSCHEDULED:
;  %OQAP = LDS_READ_RET
;  %vreg1 = VTX_READ_32
;
; SCHEDULED:
;
;  %vreg0 = MOV %OQAP
;  %vreg2 = ADD_INT %vreg1, %vreg0
;
; Because of the imprecise alias analysis, the scheduler assumes a chain
; dependency between the local memory read (LDS_READ_RET) and the global
; memory read (VTX_READ_32), so the global memory read will always be
; scheduled first. This will give us a final program which looks like this:
;
; ALU clause:
;  %OQAP = LDS_READ_RET
; VTX clause:
;  %vreg1 = VTX_READ_32
; ALU clause:
;  %vreg0 = MOV %OQAP
;  %vreg2 = ADD_INT %vreg1, %vreg0
;
; This is an illegal program because the OQAP def and use now occur in
; different ALU clauses.
;
; This test checks this scenario and makes sure it doesn't result in an
; illegal program. For now, we have fixed this issue by merging the
; LDS_READ_RET and MOV together during instruction selection and then
; expanding them after scheduling. Once the scheduler has better alias
; analysis, we should be able to keep these instructions separate before
; scheduling.
;
; CHECK-LABEL: {{^}}local_global_alias:
; CHECK: LDS_READ_RET
; CHECK-NOT: ALU clause
; CHECK: MOV * T{{[0-9]\.[XYZW]}}, OQAP
define void @local_global_alias(i32 addrspace(1)* %out, i32 addrspace(1)* %in) {
entry:
  %0 = getelementptr inbounds [2 x i32], [2 x i32] addrspace(3)* @local_mem, i32 0, i32 0
  %1 = load i32, i32 addrspace(3)* %0
  %2 = load i32, i32 addrspace(1)* %in
  %3 = add i32 %2, %1
  store i32 %3, i32 addrspace(1)* %out
  ret void
}
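
; For reference, a sketch of one legal clause layout for @local_global_alias.
; This is an illustration only, not actual llc output: the placement of the
; LDS read relative to the VTX clause is an assumption. The only property the
; CHECK lines for this function rely on is that no new ALU clause starts
; between LDS_READ_RET and the MOV that reads OQAP.
;
; ALU clause:
;  %OQAP = LDS_READ_RET
;  %vreg0 = MOV %OQAP
; VTX clause:
;  %vreg1 = VTX_READ_32
; ALU clause:
;  %vreg2 = ADD_INT %vreg1, %vreg0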