IR3 NOTES
=========

Some notes about ir3, the compiler and machine-specific IR for the shader ISA introduced with Adreno 3xx.  The same shader ISA is present, with some small differences, in Adreno 4xx.

Compared to the previous generation a2xx ISA (ir2), the a3xx ISA is a "simple" scalar instruction set.  However, in most cases the compiler is responsible for scheduling the instructions, because the hardware does not try to hide the shader core pipeline stages.  For example, a common (cat2) ALU instruction takes four cycles, so a subsequent cat2 instruction which uses the result must have three intervening instructions (or NOPs).  When operating on vec4s, the corresponding scalar instructions for the remaining three components typically fit into those slots.  However, that leaves a lot of edge cases where things fall over, like:

::

  ADD TEMP[0], TEMP[1], TEMP[2]
  MUL TEMP[0], TEMP[1], TEMP[0].wzyx

Here, the second instruction needs the output of the first group of scalar instructions in the wrong order, which leaves not enough instruction slots between ``add r0.w, r1.w, r2.w`` and ``mul r0.x, r1.x, r0.w``.  This is why the original (old) compiler, which translated nearly literally from TGSI to ir3, had a strong tendency to fall over.
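
The hazard above can be made concrete with a small counting sketch.  It is not taken from the compiler: the function names are hypothetical, and it assumes a naive straight-line translation in which the four scalar ``add``\s occupy slots 0..3 (one per component, in x/y/z/w order) and the four scalar ``mul``\s follow immediately, each reading one swizzled component of the ``add`` result.

.. code-block:: c

  #include <assert.h>

  /* From the text above: a cat2 result takes four cycles, so a dependent
   * cat2 instruction needs three intervening instructions (or nops). */
  #define CAT2_DELAY_SLOTS 3

  /* Nops needed directly before an instruction that would otherwise be
   * issued at slot `consumer` and reads a value produced at slot
   * `producer` (hypothetical helper). */
  unsigned nops_needed(unsigned producer, unsigned consumer)
  {
    assert(producer < consumer);
    unsigned intervening = consumer - producer - 1;
    return intervening >= CAT2_DELAY_SLOTS ? 0 : CAT2_DELAY_SLOTS - intervening;
  }

  /* Total nops for the second vec4 op, where component c of the second
   * op reads component swz[c] of the first op. */
  unsigned nops_for_swizzle(const unsigned swz[4])
  {
    unsigned pos = 4;   /* first free slot after the four scalar adds */
    unsigned total = 0;

    for (unsigned c = 0; c < 4; c++) {
      assert(swz[c] < 4);
      unsigned n = nops_needed(swz[c], pos);
      total += n;
      pos += n + 1;     /* the nops, then the instruction itself */
    }
    return total;
  }

With the identity swizzle every ``mul`` finds its operand already three slots old, so no padding is needed.  With ``.wzyx`` the first ``mul`` reads the value written by the immediately preceding ``add``, so three nops have to be inserted before it.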

So the current compiler's frontend instead generates a directed acyclic graph of instructions and basic blocks, which then goes through various additional passes that eventually schedule the instructions and assign registers.

For additional documentation about the hardware, see the wiki: `a3xx ISA
<https://github.com/freedreno/freedreno/wiki/A3xx-shader-instruction-set-architecture>`__.

External Structure
------------------

``ir3_shader``
    A single vertex/fragment/etc shader from the gallium perspective (i.e.
    it maps to a single TGSI shader).  It manages a set of shader variants
    which are generated on demand based on the shader key.

``ir3_shader_key``
    The configuration key that identifies a shader variant.  Different
    shader variants are generated based on other GL state
    (two-sided-color, render-to-alpha, etc) or render stages
    (binning-pass vertex shader).

``ir3_shader_variant``
    The actual HW shader generated based on input TGSI and shader key.

``ir3_compiler``
    Compiler frontend which generates ir3 and runs the various backend
    stages to schedule and do register assignment.
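
The relationship between a shader, its key and its variants can be sketched as a lookup-or-create cache.  This is only an illustration of the "generated on demand" idea: the ``sketch_*`` types and functions are hypothetical stand-ins, not the real ``ir3_shader`` API, the key holds just two example fields, and a counter stands in for the actual compile.

.. code-block:: c

  #include <assert.h>
  #include <stdbool.h>
  #include <stdlib.h>

  /* Hypothetical, much-reduced shader key. */
  struct sketch_key {
    bool color_two_side;
    bool binning_pass;
  };

  struct sketch_variant {
    struct sketch_key key;
    struct sketch_variant *next;
  };

  struct sketch_shader {
    struct sketch_variant *variants;  /* variants generated so far */
    unsigned num_compiles;            /* stands in for the real compile */
  };

  bool sketch_key_equal(const struct sketch_key *a, const struct sketch_key *b)
  {
    return a->color_two_side == b->color_two_side &&
           a->binning_pass == b->binning_pass;
  }

  /* Return the variant matching `key`, creating it on first use. */
  struct sketch_variant *
  sketch_get_variant(struct sketch_shader *shader, const struct sketch_key *key)
  {
    struct sketch_variant *v;

    assert(shader && key);
    for (v = shader->variants; v; v = v->next)
      if (sketch_key_equal(&v->key, key))
        return v;

    v = calloc(1, sizeof(*v));
    if (!v)
      return NULL;
    v->key = *key;
    v->next = shader->variants;
    shader->variants = v;
    shader->num_compiles++;
    return v;
  }

Asking twice for the same key returns the same variant without a second compile, while a different key (for example the binning-pass flavor of a vertex shader) yields a new variant.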

The IR
------

The ir3 IR maps quite directly to the hardware, in that instruction opcodes map directly to hardware opcodes, and dst/src register(s) map directly to the hardware dst/src register(s).  But there are a few extensions, in the form of meta_ instructions.  Additionally, for normal (non-const, etc) src registers, the ``IR3_REG_SSA`` flag is set and ``reg->instr`` points to the source instruction which produced that value.  So, for example, the following TGSI shader:

::

  VERT
  DCL IN[0]
  DCL IN[1]
  DCL OUT[0], POSITION
  DCL TEMP[0], LOCAL
    1: DP3 TEMP[0].x, IN[0].xyzz, IN[1].xyzz
    2: MOV OUT[0], TEMP[0].xxxx
    3: END

eventually generates:

.. graphviz::

  digraph G {
  rankdir=RL;
  nodesep=0.25;
  ranksep=1.5;
  subgraph clusterdce198 {
  label="vert";
  inputdce198 [shape=record,label="inputs|<in0> i0.x|<in1> i0.y|<in2> i0.z|<in4> i1.x|<in5> i1.y|<in6> i1.z"];
  instrdcf348 [shape=record,style=filled,fillcolor=lightgrey,label="{mov.f32f32|<dst0>|<src0> }"];
  instrdcedd0 [shape=record,style=filled,fillcolor=lightgrey,label="{mad.f32|<dst0>|<src0> |<src1> |<src2> }"];
  inputdce198:<in2>:w -> instrdcedd0:<src0>
  inputdce198:<in6>:w -> instrdcedd0:<src1>
  instrdcec30 [shape=record,style=filled,fillcolor=lightgrey,label="{mad.f32|<dst0>|<src0> |<src1> |<src2> }"];
  inputdce198:<in1>:w -> instrdcec30:<src0>
  inputdce198:<in5>:w -> instrdcec30:<src1>
  instrdceb60 [shape=record,style=filled,fillcolor=lightgrey,label="{mul.f|<dst0>|<src0> |<src1> }"];
  inputdce198:<in0>:w -> instrdceb60:<src0>
  inputdce198:<in4>:w -> instrdceb60:<src1>
  instrdceb60:<dst0> -> instrdcec30:<src2>
  instrdcec30:<dst0> -> instrdcedd0:<src2>
  instrdcedd0:<dst0> -> instrdcf348:<src0>
  instrdcf400 [shape=record,style=filled,fillcolor=lightgrey,label="{mov.f32f32|<dst0>|<src0> }"];
  instrdcedd0:<dst0> -> instrdcf400:<src0>
  instrdcf4b8 [shape=record,style=filled,fillcolor=lightgrey,label="{mov.f32f32|<dst0>|<src0> }"];
  instrdcedd0:<dst0> -> instrdcf4b8:<src0>
  outputdce198 [shape=record,label="outputs|<out0> o0.x|<out1> o0.y|<out2> o0.z|<out3> o0.w"];
  instrdcf348:<dst0> -> outputdce198:<out0>:e
  instrdcf400:<dst0> -> outputdce198:<out1>:e
  instrdcf4b8:<dst0> -> outputdce198:<out2>:e
  instrdcedd0:<dst0> -> outputdce198:<out3>:e
  }
  }

(after scheduling, etc, but before register assignment).

Internal Structure
~~~~~~~~~~~~~~~~~~

``ir3_block``
    Represents a basic block.

    TODO: currently blocks are nested, but I think I need to change that
    to a more conventional arrangement before implementing proper flow
    control.  Currently the only flow control handled is if/else, which
    gets flattened out, with results chosen by ``sel`` instructions.

``ir3_instruction``
    Represents a machine instruction or meta_ instruction.  Has pointers
    to dst register (``regs[0]``) and src register(s) (``regs[1..n]``),
    as needed.

``ir3_register``
    Represents a src or dst register; flags indicate const/relative/etc.
    If ``IR3_REG_SSA`` is set on a src register, the actual register
    number (name) has not been assigned yet, and instead the ``instr``
    field points to the src instruction.

In addition there are various util macros/functions to simplify manipulation/traversal of the graph:

``foreach_src(srcreg, instr)``
    Iterate over each of an instruction's source ``ir3_register``\s.

``foreach_src_n(srcreg, n, instr)``
    Like ``foreach_src``, also setting ``n`` to the source number (starting
    with ``0``).

``foreach_ssa_src(srcinstr, instr)``
    Iterate over each of an instruction's SSA source ``ir3_instruction``\s.
    This skips non-SSA sources (consts, etc), but includes virtual sources
    (such as the address register if `relative addressing`_ is used).

``foreach_ssa_src_n(srcinstr, n, instr)``
    Like ``foreach_ssa_src``, also setting ``n`` to the source number.

For example:

.. code-block:: c

  /* Take the worst-case delay over all SSA sources of instr. */
  unsigned delay = 0;
  foreach_ssa_src_n(src, i, instr) {
    unsigned d = delay_calc_srcn(ctx, src, instr, i);
    delay = MAX2(delay, d);
  }


TODO probably other helper/util stuff worth mentioning here

.. _meta:

Meta Instructions
~~~~~~~~~~~~~~~~~

**input**
    Used for shader inputs (registers configured in the command-stream
    to hold particular input values, written by the shader core before
    start of execution).  Also used for connecting up values within a
    basic block to an output of a previous block.

**output**
    Used to hold outputs of a basic block.

**flow**
    TODO

**phi**
    TODO

**collect**
    Groups registers which need to be assigned to consecutive scalar
    registers, for example ``sam`` (texture fetch) src instructions (see
    `register groups`_) or array element dereference
    (see `relative addressing`_).

**split**
    The counterpart to **collect**.  When an instruction such as ``sam``
    writes multiple components, **split** breaks the result into
    individual scalar components to be consumed by other instructions.


.. _`flow control`:

Flow Control
~~~~~~~~~~~~

TODO


.. _`register groups`:

Register Groups
~~~~~~~~~~~~~~~

Certain instructions, such as texture sample instructions, consume multiple consecutive scalar registers via a single src register encoded in the instruction, and/or write multiple consecutive scalar registers.  In the simplest example:

::

  sam (f32)(xyz)r2.x, r0.z, s#0, t#0

for a 2d texture, would read ``r0.zw`` to get the coordinate, and write ``r2.xyz``.
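
To spell out which scalar registers such a group covers, here is a small sketch.  It assumes scalar registers are numbered with four components (x, y, z, w) per full register, so that ``r0.z`` is scalar index 2; ``scalar_reg()`` and ``group_names()`` are hypothetical helpers for illustration, not functions from the compiler.

.. code-block:: c

  #include <assert.h>
  #include <stdio.h>

  /* Scalar index of component `comp` (0=x .. 3=w) of full register `num`. */
  unsigned scalar_reg(unsigned num, unsigned comp)
  {
    assert(comp < 4);
    return num * 4 + comp;
  }

  /* Write the names of `count` consecutive scalar registers starting at
   * `base` into `out`, separated by spaces.  Returns 0 on success or -1
   * if the buffer is too small. */
  int group_names(unsigned base, unsigned count, char *out, size_t len)
  {
    size_t used = 0;

    assert(out && len > 0);
    out[0] = '\0';
    for (unsigned i = 0; i < count; i++) {
      unsigned reg = base + i;
      int n = snprintf(out + used, len - used, "%sr%u.%c",
                       i ? " " : "", reg / 4, "xyzw"[reg % 4]);
      if (n < 0 || (size_t)n >= len - used)
        return -1;
      used += (size_t)n;
    }
    return 0;
  }

For the ``sam`` above, a source group of two registers starting at ``scalar_reg(0, 2)`` gives ``r0.z r0.w``, and a destination group of three registers starting at ``scalar_reg(2, 0)`` gives ``r2.x r2.y r2.z``.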

Before register assignment, to group the two components of the texture src together:

.. graphviz::

  digraph G {
    { rank=same;
      collect;
    };
    { rank=same;
      coord_x;
      coord_y;
    };
    sam -> collect [label="regs[1]"];
    collect -> coord_x [label="regs[1]"];
    collect -> coord_y [label="regs[2]"];
    coord_x -> coord_y [label="right",style=dotted];
    coord_y -> coord_x [label="left",style=dotted];
    coord_x [label="coord.x"];
    coord_y [label="coord.y"];
  }

The frontend sets up the SSA pointers from the ``sam`` source register to the ``collect`` meta instruction, which in turn points to the instructions producing the ``coord.x`` and ``coord.y`` values.  The grouping_ pass then sets up the ``left`` and ``right`` neighbor pointers of the ``collect``\'s sources, which are used later by the `register assignment`_ pass to assign blocks of scalar registers.

And likewise, for the consecutive scalar registers for the destination:

.. graphviz::

  digraph {
    { rank=same;
      A;
      B;
      C;
    };
    { rank=same;
      split_0;
      split_1;
      split_2;
    };
    A -> split_0;
    B -> split_1;
    C -> split_2;
    split_0 [label="split\noff=0"];
    split_0 -> sam;
    split_1 [label="split\noff=1"];
    split_1 -> sam;
    split_2 [label="split\noff=2"];
    split_2 -> sam;
    split_0 -> split_1 [label="right",style=dotted];
    split_1 -> split_0 [label="left",style=dotted];
    split_1 -> split_2 [label="right",style=dotted];
    split_2 -> split_1 [label="left",style=dotted];
    sam;
  }

.. _`relative addressing`:

Relative Addressing
~~~~~~~~~~~~~~~~~~~

Most instructions support indirect addressing (relative to the address register) into the const or gpr register file for some or all of their src/dst registers.  In this case the register accessed is taken from ``r<a0.x + n>`` or ``c<a0.x + n>``, i.e. address register (``a0.x``) value plus ``n``, where ``n`` is encoded in the instruction (rather than the absolute register number).

    Note that cat5 (texture sample) instructions are the notable exception, not
    supporting relative addressing of src or dst.
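
The ``r<a0.x + n>`` / ``c<a0.x + n>`` notation can be modeled with a toy register file.  This is only a sketch of the addressing arithmetic: the ``sketch_regs`` struct, its size and the helper names are made up, and the bounds assertion is an assumption of the sketch rather than a statement about what the hardware does with out-of-range addresses.

.. code-block:: c

  #include <assert.h>

  #define SKETCH_FILE_SIZE 16

  /* Toy model: a few scalar gprs, a few scalar consts, and a0.x. */
  struct sketch_regs {
    float r[SKETCH_FILE_SIZE];
    float c[SKETCH_FILE_SIZE];
    int a0_x;
  };

  /* Resolve <a0.x + n> to an absolute scalar register index. */
  unsigned resolve_relative(const struct sketch_regs *regs, int n)
  {
    int idx = regs->a0_x + n;
    assert(idx >= 0 && idx < SKETCH_FILE_SIZE);
    return (unsigned)idx;
  }

  /* Relative src read from the const file: c<a0.x + n>. */
  float read_const_relative(const struct sketch_regs *regs, int n)
  {
    return regs->c[resolve_relative(regs, n)];
  }

  /* Relative dst write to the gpr file: r<a0.x + n>. */
  void write_gpr_relative(struct sketch_regs *regs, int n, float value)
  {
    regs->r[resolve_relative(regs, n)] = value;
  }

With ``a0.x`` holding 3, ``c<a0.x + 2>`` reads scalar const 5 and ``r<a0.x + 1>`` writes scalar gpr 4.  Which gpr gets written is only known at run time, which is where the extra dependencies for relative dst come from.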

Relative addressing of the const file (for example, a uniform array) is relatively simple.  We don't do register assignment of the const file, so all that is required is to schedule things properly.  I.e. the instruction that writes the address register must be scheduled first, and we cannot have two different address register values live at one time.

But relative addressing of the gpr file (which can be as src or dst) has additional restrictions on register assignment (i.e. the array elements must be assigned to consecutive scalar registers).  And in the case of relative dst, subsequent instructions now depend on both the relative write and the previous instruction which wrote that register, since we do not know at compile time which actual register was written.

Each instruction has an optional ``address`` pointer, to capture the dependency on the address register value when relative addressing is used for any of the src/dst register(s).  This behaves as an additional virtual src register, i.e. ``foreach_ssa_src()`` will also iterate the address register (last).

    Note that ``nop``\s for timing constraints, type specifiers (i.e.
    ``add.f`` vs ``add.u``), etc, are omitted for brevity in the examples.

::

  mova a0.x, hr1.y
  sub r1.y, r2.x, r3.x
  add r0.x, r1.y, c<a0.x + 2>

results in:

.. graphviz::

  digraph {
    rankdir=LR;
    sub;
    const [label="const file"];
    add;
    mova;
    add -> mova;
    add -> sub;
    add -> const [label="off=2"];
  }

The scheduling pass has some smarts to schedule things such that only a single ``a0.x`` value is used at any one time.

To implement variable arrays, the NIR registers are stored as an ``ir3_array``,
which will be register allocated to consecutive hardware registers.  The array
access uses the id field in the ``ir3_register`` to map to the array being
accessed, and the offset field for the fixed offset within the array.  A NIR
indirect register read such as:

::

  decl_reg vec2 32 r0[2]
  ...
  vec2 32 ssa_19 = mov r0[0 + ssa_9]


results in:

::

  0000:0000:001:  shl.b hssa_19, hssa_17, himm[0.000000,1,0x1]
  0000:0000:002:  mov.s16s16 hr61.x, hssa_19
  0000:0000:002:  mov.u32u32 ssa_21, arr[id=1, offset=0, size=4, ssa_12], address=_[0000:0000:002:  mov.s16s16]
  0000:0000:002:  mov.u32u32 ssa_22, arr[id=1, offset=1, size=4, ssa_12], address=_[0000:0000:002:  mov.s16s16]


Array writes write to the array in ``instr->regs[0]->array.id``.  A NIR indirect
register write such as:

::

  decl_reg vec2 32 r0[2]
  ...
  r0[0 + ssa_12] = mov ssa_13

results in:

::

  0000:0000:001:  shl.b hssa_29, hssa_27, himm[0.000000,1,0x1]
  0000:0000:002:  mov.s16s16 hr61.x, hssa_29
  0000:0000:001:  mov.u32u32 arr[id=1, offset=0, size=4, ssa_17], c2.y, address=_[0000:0000:002:  mov.s16s16]
  0000:0000:004:  mov.u32u32 arr[id=1, offset=1, size=4, ssa_31], c2.z, address=_[0000:0000:002:  mov.s16s16]

Note that only cat1 (mov) can do an indirect write, and thus NIR register stores
may need to introduce an extra mov.

ir3 array accesses in the DAG get serialized by ``instr->barrier_class``
containing ``IR3_BARRIER_ARRAY_W`` or ``IR3_BARRIER_ARRAY_R``.

Shader Passes
-------------

After the frontend has generated the use-def graph of instructions, the graph is run through various passes which include scheduling_ and `register assignment`_.  Because inserting ``mov`` instructions after scheduling would also require inserting additional ``nop`` instructions (since it is too late to reschedule to try and fill the bubbles), the earlier stages try to ensure (at least given an infinite supply of registers) that `register assignment`_ after scheduling_ cannot fail.

    Note that we essentially have ~256 scalar registers in the
    architecture (although larger register usage will at some thresholds
    limit the number of threads which can run in parallel).  And at some
    point we will have to deal with spilling.

.. _flatten:

Flatten
~~~~~~~

In this stage, simple if/else blocks are flattened into a single block with ``phi`` nodes converted into ``sel`` instructions.  The a3xx ISA has very few predicated instructions, and we would prefer not to use branches for simple if/else.


.. _`copy propagation`:

Copy Propagation
~~~~~~~~~~~~~~~~

Currently the frontend inserts ``mov``\s in various cases, because certain categories of instructions have limitations about const regs as sources.  The CP pass then simply removes all simple ``mov``\s (i.e. src-type is same as dst-type, no abs/neg flags, etc).

The eventual plan is to invert that, with the frontend inserting no ``mov``\s and CP legalizing things.
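
The "simple ``mov``" test can be sketched as a predicate.  The struct, enum and flag names below are hypothetical, and only the two conditions named above (matching src/dst type, no abs/neg modifiers) are modeled; the "etc" means the real pass checks more than this.

.. code-block:: c

  #include <assert.h>
  #include <stdbool.h>

  /* Hypothetical stand-ins for the real type and flag definitions. */
  enum sketch_type { SKETCH_F16, SKETCH_F32, SKETCH_U32, SKETCH_S32 };

  #define SKETCH_REG_ABS 0x1
  #define SKETCH_REG_NEG 0x2

  struct sketch_mov {
    enum sketch_type src_type;
    enum sketch_type dst_type;
    unsigned src_flags;
  };

  /* True if the mov is a plain copy that CP could remove by pointing
   * its users directly at its source. */
  bool is_simple_mov(const struct sketch_mov *mov)
  {
    assert(mov);
    if (mov->src_type != mov->dst_type)
      return false;   /* a type conversion, not a copy */
    if (mov->src_flags & (SKETCH_REG_ABS | SKETCH_REG_NEG))
      return false;   /* the modifier changes the value */
    return true;
  }

A ``mov.f32f32`` without modifiers passes the test, while a conversion such as f16 to f32, or a ``mov`` with a negated source, has to stay.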


.. _grouping:

Grouping
~~~~~~~~

In the grouping pass, instructions which need to be grouped (for ``collect``\s, etc) have their ``left`` / ``right`` neighbor pointers set up.  In cases where there is a conflict (i.e. one instruction cannot have two unique left or right neighbors), an additional ``mov`` instruction is inserted.  This ensures that there is some possible valid `register assignment`_ at the later stages.
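
The conflict rule can be sketched as follows.  This is a deliberately conservative illustration with hypothetical names, not the pass itself: any source that already has a neighbor (or that would have to be its own neighbor) is replaced by a fresh ``mov`` copy, whereas the real pass can be smarter about sources that are already grouped compatibly.

.. code-block:: c

  #include <assert.h>
  #include <stdbool.h>
  #include <stddef.h>

  struct sketch_instr {
    const char *name;
    struct sketch_instr *left, *right;  /* neighbors in a register group */
    struct sketch_instr *copy_of;       /* set only for inserted movs */
  };

  /* Link srcs[0..n-1] as left/right neighbors.  Conflicting sources are
   * replaced in srcs[] by movs taken from `movs` (room for n entries).
   * Returns the number of movs inserted. */
  unsigned group_sources(struct sketch_instr *srcs[], unsigned n,
                         struct sketch_instr movs[])
  {
    unsigned nmovs = 0;

    assert(srcs && movs);
    for (unsigned i = 0; i < n; i++) {
      struct sketch_instr *want_left = i > 0 ? srcs[i - 1] : NULL;
      struct sketch_instr *s = srcs[i];

      /* Already part of a group, or repeated back to back. */
      bool conflict = s->left || s->right || s == want_left;
      if (conflict) {
        movs[nmovs] = (struct sketch_instr){ .name = "mov", .copy_of = s };
        s = srcs[i] = &movs[nmovs++];
      }
      if (want_left) {
        want_left->right = s;
        s->left = want_left;
      }
    }
    return nmovs;
  }

Grouping ``{a, b}`` and then ``{c, b}`` shows the conflict: ``b`` already has ``a`` as its left neighbor, so the second group gets a ``mov`` of ``b`` instead, and the first group's links stay intact.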


.. _depth:

Depth
~~~~~

In the depth pass, a depth is calculated for each instruction node within its basic block.  The depth is the sum of the required cycles (delay slots needed between two instructions plus one) of each instruction plus the max depth of any of its source instructions.  (meta_ instructions don't add to the depth.)  As an instruction's depth is calculated, it is inserted into a per-block list sorted with the deepest instruction first.  Unreachable instructions and inputs are marked.

    TODO: we should probably calculate both hard and soft depths (?) to
    try to coax additional instructions to fit in places where we need
    to use sync bits, such as after a texture fetch or SFU.
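
The depth rule can be written as a short memoized recursion.  This is a sketch under simplifying assumptions, with hypothetical names: each instruction carries a single fixed ``delay_slots`` value, and the sorted per-block list and the marking of unreachable instructions are left out.

.. code-block:: c

  #include <assert.h>
  #include <stdbool.h>

  #define SKETCH_MAX_SRCS 3

  struct sketch_instr {
    bool is_meta;          /* meta instructions add no depth */
    unsigned delay_slots;  /* delay slots required by this instruction */
    unsigned nsrcs;
    struct sketch_instr *srcs[SKETCH_MAX_SRCS];
    unsigned depth;        /* valid once `visited` is set */
    bool visited;
  };

  /* depth = (delay slots + 1) + max depth of any source instruction. */
  unsigned calc_depth(struct sketch_instr *instr)
  {
    assert(instr && instr->nsrcs <= SKETCH_MAX_SRCS);
    if (instr->visited)
      return instr->depth;

    unsigned max_src = 0;
    for (unsigned i = 0; i < instr->nsrcs; i++) {
      unsigned d = calc_depth(instr->srcs[i]);
      if (d > max_src)
        max_src = d;
    }

    unsigned cycles = instr->is_meta ? 0 : instr->delay_slots + 1;
    instr->depth = max_src + cycles;
    instr->visited = true;
    return instr->depth;
  }

For a ``mul`` fed by two inputs and a ``mad`` fed by two inputs plus the ``mul``, each with three delay slots, the inputs have depth 0, the ``mul`` depth 4 and the ``mad`` depth 8.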

.. _scheduling:

Scheduling
~~~~~~~~~~

After the grouping_ pass, there are no more instructions to insert or remove.  Scheduling of each basic block starts from the deepest node in the depth-sorted list created by the depth_ pass, recursively trying to schedule each instruction after its source instructions plus delay slots.  ``nop``\s are inserted as required.

.. _`register assignment`:

Register Assignment
~~~~~~~~~~~~~~~~~~~

TODO