xref: /aosp_15_r20/external/llvm/lib/CodeGen/README.txt (revision 9880d6810fe72a1726cb53787c6711e909410d58)
//===---------------------------------------------------------------------===//

Common register allocation / spilling problem:

        mul lr, r4, lr
        str lr, [sp, #+52]
        ldr lr, [r1, #+32]
        sxth r3, r3
        ldr r4, [sp, #+52]
        mla r4, r3, lr, r4

can be:

        mul lr, r4, lr
        mov r4, lr
        str lr, [sp, #+52]
        ldr lr, [r1, #+32]
        sxth r3, r3
        mla r4, r3, lr, r4

and then "merge" mul and mov:

        mul r4, r4, lr
        str r4, [sp, #+52]
        ldr lr, [r1, #+32]
        sxth r3, r3
        mla r4, r3, lr, r4

This also increases the likelihood that the store becomes dead.

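The three sequences above can be checked against each other with a small
interpreter. This is only a sketch in Python, not LLVM code: the run function,
the tuple encoding of instructions and the initial register values are all
assumptions made for illustration, and only the instructions used above are
modelled.

```python
MASK32 = 0xFFFFFFFF


def sign_extend_halfword(value):
    """Sign-extend the low 16 bits of value, kept as an unsigned 32-bit number."""
    low = value & 0xFFFF
    return (low - 0x10000 if low & 0x8000 else low) & MASK32


def run(program, regs, memory):
    """Execute (opcode, operands...) tuples; return the final registers and memory."""
    regs, memory = dict(regs), dict(memory)
    for op, *args in program:
        if op == "mul":                      # mul rd, rn, rm
            rd, rn, rm = args
            regs[rd] = (regs[rn] * regs[rm]) & MASK32
        elif op == "mla":                    # mla rd, rn, rm, ra: rd = rn * rm + ra
            rd, rn, rm, ra = args
            regs[rd] = (regs[rn] * regs[rm] + regs[ra]) & MASK32
        elif op == "mov":                    # mov rd, rm
            rd, rm = args
            regs[rd] = regs[rm]
        elif op == "sxth":                   # sxth rd, rm
            rd, rm = args
            regs[rd] = sign_extend_halfword(regs[rm])
        elif op == "str":                    # str rt, [base, #offset]
            rt, base, offset = args
            memory[regs[base] + offset] = regs[rt]
        elif op == "ldr":                    # ldr rt, [base, #offset]
            rt, base, offset = args
            regs[rt] = memory[regs[base] + offset]
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return regs, memory


TAIL = [("ldr", "lr", "r1", 32), ("sxth", "r3", "r3")]
MLA = [("mla", "r4", "r3", "lr", "r4")]

# Spill and reload through the stack slot.
ORIGINAL = [("mul", "lr", "r4", "lr"), ("str", "lr", "sp", 52)] + TAIL + \
           [("ldr", "r4", "sp", 52)] + MLA
# Reload replaced by a copy placed right after the definition.
WITH_COPY = [("mul", "lr", "r4", "lr"), ("mov", "r4", "lr"),
             ("str", "lr", "sp", 52)] + TAIL + MLA
# Copy merged into the mul.
MERGED = [("mul", "r4", "r4", "lr"), ("str", "r4", "sp", 52)] + TAIL + MLA
```

Running all three programs from the same initial state should give identical
registers and identical stack contents, which is the property the rewrite
relies on.
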
//===---------------------------------------------------------------------===//

bb27 ...
        ...
        %reg1037 = ADDri %reg1039, 1
        %reg1038 = ADDrs %reg1032, %reg1039, %NOREG, 10
    Successors according to CFG: 0x8b03bf0 (#5)

bb76 (0x8b03bf0, LLVM BB @0x8b032d0, ID#5):
    Predecessors according to CFG: 0x8b0c5f0 (#3) 0x8b0a7c0 (#4)
        %reg1039 = PHI %reg1070, mbb<bb76.outer,0x8b0c5f0>, %reg1037, mbb<bb27,0x8b0a7c0>

Note that ADDri is not a two-address instruction. However, its result %reg1037
is an operand of the PHI node in bb76 and its operand %reg1039 is the result of
the PHI node. We should treat it as two-address code and make sure the ADDri is
scheduled after any node that reads %reg1039.

//===---------------------------------------------------------------------===//

Use local information (i.e. the register scavenger) to assign the reloaded
value a free register so that it can be reused:
        ldr r3, [sp, #+4]
        add r3, r3, #3
        ldr r2, [sp, #+8]
        add r2, r2, #2
        ldr r1, [sp, #+4]  <==
        add r1, r1, #1
        ldr r0, [sp, #+4]
        add r0, r0, #2

//===---------------------------------------------------------------------===//

LLVM aggressively lifts common subexpressions out of loops. Sometimes this has
negative side effects:

R1 = X + 4
R2 = X + 7
R3 = X + 15

loop:
load [i + R1]
...
load [i + R2]
...
load [i + R3]

Suppose register pressure is high, so that R1, R2 and R3 can be spilled. We
need to implement proper re-materialization to handle this:

R1 = X + 4
R2 = X + 7
R3 = X + 15

loop:
R1 = X + 4  @ re-materialized
load [i + R1]
...
R2 = X + 7 @ re-materialized
load [i + R2]
...
R3 = X + 15 @ re-materialized
load [i + R3]

Furthermore, with re-association, we can enable sharing:

R1 = X + 4
R2 = X + 7
R3 = X + 15

loop:
T = i + X
load [T + 4]
...
load [T + 7]
...
load [T + 15]

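
The re-association step rests on the identity i + (X + c) = (i + X) + c. A
minimal sketch in Python (the function names and the OFFSETS tuple are
illustrative assumptions, not LLVM code) that computes the load addresses both
ways:

```python
OFFSETS = (4, 7, 15)


def hoisted_addresses(x, i):
    """Addresses computed from R1, R2, R3 hoisted out of the loop."""
    hoisted = [x + c for c in OFFSETS]   # three values live across the whole loop
    return [i + r for r in hoisted]


def reassociated_addresses(x, i):
    """Addresses computed from one shared base T, with the constant in the load."""
    t = i + x                            # T = i + X, the only extra live value
    return [t + c for c in OFFSETS]
```

Both functions return the same addresses for any x and i, but the second keeps
one value live inside the loop instead of three, provided the target can fold
the constant into the addressing mode.
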
//===---------------------------------------------------------------------===//

It's not always a good idea to choose rematerialization over spilling. If all
the load / store instructions would be folded then spilling is cheaper because
it won't require new live intervals / registers. See 2003-05-31-LongShifts for
an example.

//===---------------------------------------------------------------------===//

With a copying garbage collector, derived pointers must not be retained across
collector safe points; the collector could move the objects and invalidate the
derived pointer. This is bad enough in the first place, but safe points can
crop up unpredictably. Consider:

        %array = load { i32, [0 x %obj] }** %array_addr
        %nth_el = getelementptr { i32, [0 x %obj] }* %array, i32 0, i32 %n
        %old = load %obj** %nth_el
        %z = div i64 %x, %y
        store %obj* %new, %obj** %nth_el

If the i64 division is lowered to a libcall, then a safe point will (must)
appear for the call site. If a collection occurs, %array and %nth_el no longer
point into the correct object.

The fix for this is to copy address calculations so that dependent pointers
are never live across safe point boundaries. But the loads cannot be copied
like this if there was an intervening store, so this may be hard to get right.

Only a concurrent mutator can trigger a collection at the libcall safe point.
So single-threaded programs do not have this requirement, even with a copying
collector. Still, LLVM optimizations would probably undo a front-end's careful
work.

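The hazard can be illustrated with a toy model. The sketch below is Python and
has nothing to do with LLVM's actual GC support: the Heap class, its roots
table and the two store helpers are hypothetical, and the collection is forced
at the safe point purely to make the problem visible.

```python
class Heap:
    """A toy heap of cells; roots map a name to (base address, length)."""

    def __init__(self, size):
        self.cells = [None] * size
        self.roots = {}

    def allocate(self, name, base, values):
        self.cells[base:base + len(values)] = values
        self.roots[name] = (base, len(values))

    def collect(self, name, new_base):
        """Move one object as a copying collector would, and update its root."""
        base, length = self.roots[name]
        values = self.cells[base:base + length]
        self.cells[base:base + length] = [None] * length   # old copy is dead
        self.cells[new_base:new_base + length] = values
        self.roots[name] = (new_base, length)


def store_with_stale_pointer(heap, name, n, new_value, safe_point):
    """Derived pointer computed before the safe point and used after it."""
    base, _ = heap.roots[name]
    nth_el = base + n            # derived pointer, like %nth_el
    safe_point()                 # e.g. the libcall for the i64 division
    heap.cells[nth_el] = new_value   # writes into the old, dead location


def store_with_recomputed_pointer(heap, name, n, new_value, safe_point):
    """Address calculation copied so that nothing derived is live across the call."""
    safe_point()
    base, _ = heap.roots[name]   # reload the base after the safe point
    heap.cells[base + n] = new_value
```

With a safe point that moves the object, the first helper leaves the live
object unchanged and scribbles on freed space, while the second updates the
moved object as intended.
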
//===---------------------------------------------------------------------===//

The OCaml frametable structure supports liveness information. It would be good
to support it.

//===---------------------------------------------------------------------===//

The FIXME in ComputeCommonTailLength in BranchFolding.cpp needs to be
revisited. The check is there to work around a misuse of directives in inline
assembly.

//===---------------------------------------------------------------------===//

It would be good to detect collector/target compatibility instead of silently
doing the wrong thing.

//===---------------------------------------------------------------------===//

It would be really nice to be able to write patterns in .td files for copies,
which would eliminate a bunch of explicit predicates on them (e.g. no side
effects).  Once this is in place, it would be even better to have tblgen
synthesize the various copy insertion/inspection methods in TargetInstrInfo.

//===---------------------------------------------------------------------===//

Stack coloring improvements:

1. Do proper LiveStackAnalysis on all stack objects including those which are
   not spill slots.
2. Reorder objects to fill in gaps between objects.
   e.g. 4, 1, <gap>, 4, 1, 1, 1, <gap>, 4 => 4, 1, 1, 1, 1, 4, 4

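The saving in item 2 can be worked out with a short Python sketch. It assumes
that every object is aligned to its own size and that sizes are powers of two;
frame_size and reorder_by_size are hypothetical helpers, and sorting by
decreasing size is just one simple policy that removes the gaps here, not
necessarily the ordering the note has in mind.

```python
def frame_size(sizes):
    """Lay objects out in order, each aligned to its own size; return bytes used."""
    offset = 0
    for size in sizes:
        offset = (offset + size - 1) // size * size   # round up to the alignment
        offset += size
    return offset


def reorder_by_size(sizes):
    """One simple gap-free ordering: largest (most aligned) objects first."""
    return sorted(sizes, reverse=True)


BEFORE = [4, 1, 4, 1, 1, 1, 4]   # layout with two gaps, as in the example
AFTER = [4, 1, 1, 1, 1, 4, 4]    # reordered layout from the example
```

Under these assumptions frame_size(BEFORE) is 20 bytes, while both
frame_size(AFTER) and frame_size(reorder_by_size(BEFORE)) are 16 bytes.
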
//===---------------------------------------------------------------------===//

The scheduler should be able to sort nearby instructions by their address. For
example, in an expanded memset sequence it's not uncommon to see code like this:

  movl $0, 4(%rdi)
  movl $0, 8(%rdi)
  movl $0, 12(%rdi)
  movl $0, 0(%rdi)

Each of the stores is independent, and the scheduler is currently making an
arbitrary decision about the order.

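As a rough illustration of the desired tie-break, the Python sketch below
orders a group of stores by base register and displacement. The Store tuple
and both functions are hypothetical, and the caller is assumed to pass only
stores already known to be independent of each other.

```python
from typing import NamedTuple


class Store(NamedTuple):
    value: int
    offset: int
    base: str


def sort_independent_stores(stores):
    """Order independent stores by base register, then by increasing offset."""
    return sorted(stores, key=lambda s: (s.base, s.offset))


def render(store):
    """Print a store in the AT&T syntax used above."""
    return f"movl ${store.value}, {store.offset}(%{store.base})"


MEMSET = [Store(0, 4, "rdi"), Store(0, 8, "rdi"),
          Store(0, 12, "rdi"), Store(0, 0, "rdi")]
```

Applied to MEMSET, the stores come out at offsets 0, 4, 8 and 12, so the
sequence walks memory in ascending order.
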
//===---------------------------------------------------------------------===//

Another opportunity in this code is that the $0 could be moved to a register:

  movl $0, 4(%rdi)
  movl $0, 8(%rdi)
  movl $0, 12(%rdi)
  movl $0, 0(%rdi)

This would save substantial code size, especially for longer sequences like
this. It would be easy to have a rule telling isel to avoid matching MOV32mi
if the immediate has more than some fixed number of uses. It's more involved
to teach the register allocator how to do late folding to recover from
excessive register pressure.

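The size difference can be estimated with a small Python sketch. It assumes
%rdi as the base register (no SIB or REX byte), displacements that fit in a
signed byte, and xorl %eax, %eax to materialize the zero. The helper names and
the default threshold are hypothetical; the note only says "some fixed number
of uses".

```python
XOR_ZERO_BYTES = 2   # xorl %eax, %eax: opcode + ModRM


def disp_bytes(offset):
    """A zero displacement off %rdi needs no displacement byte; otherwise disp8."""
    return 0 if offset == 0 else 1


def store_imm_bytes(offset):
    """movl $imm32, disp(%rdi): opcode + ModRM + displacement + 4-byte immediate."""
    return 1 + 1 + disp_bytes(offset) + 4


def store_reg_bytes(offset):
    """movl %eax, disp(%rdi): opcode + ModRM + displacement."""
    return 1 + 1 + disp_bytes(offset)


def sequence_sizes(offsets):
    """Return (bytes with immediate stores, bytes with the zero kept in a register)."""
    with_imm = sum(store_imm_bytes(o) for o in offsets)
    with_reg = XOR_ZERO_BYTES + sum(store_reg_bytes(o) for o in offsets)
    return with_imm, with_reg


def prefer_register(num_uses, threshold=2):
    """Hypothetical isel rule: skip MOV32mi once the immediate has enough uses."""
    return num_uses > threshold
```

For the four stores above, sequence_sizes([4, 8, 12, 0]) gives 27 bytes with
immediates against 13 bytes with the zero in a register, at the cost of one
extra live register.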