Tiling
======

The naive view of an image in memory is that the pixels are stored one after
another, usually in X-major order.  An image that is arranged in this way is
called "linear".  Linear images, while easy to reason about, can have very bad
cache locality.  Graphics operations tend to act on pixels that are close
together in 2-D Euclidean space.  If you move one pixel to the right or left in
a linear image, you only move a few bytes to one side or the other in memory.
However, if you move one pixel up or down, you can end up kilobytes or even
megabytes away.

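To make the locality problem concrete, here is a minimal sketch of how a pixel
is addressed in a linear image.  The helper name ``linear_offset_B`` and its
parameters are made up for illustration and are not part of ISL.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   /* Byte offset of pixel (x, y) in a linear, X-major image.
    *
    * row_pitch_B is the distance in bytes between the starts of two
    * consecutive rows and cpp is the size of one pixel in bytes.
    * Illustrative helper only, not an ISL function.
    */
   static inline uint64_t
   linear_offset_B(uint32_t x, uint32_t y, uint32_t row_pitch_B, uint32_t cpp)
   {
      /* The pixel has to fit inside its row. */
      assert((uint64_t)x * cpp + cpp <= row_pitch_B);

      return (uint64_t)y * row_pitch_B + (uint64_t)x * cpp;
   }

For a 1920-pixel-wide RGBA8888 image with a tightly packed row pitch of 7680
bytes, stepping one pixel to the right moves 4 bytes, while stepping one pixel
down moves 7680 bytes.
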
Tiling (sometimes referred to as swizzling) is a method of re-arranging the
pixels of a surface so that pixels which are close in 2-D Euclidean space are
likely to be close in memory.

Basics
------

The basic idea of a tiled image is that the image is first divided into
two-dimensional blocks or tiles.  Each tile takes up a chunk of contiguous
memory and the tiles are arranged like pixels in a linear surface.  This is
best demonstrated with a specific example.  Suppose we have an RGBA8888 X-tiled
surface on Intel graphics.  Then the surface is divided into 128x8 pixel tiles,
each of which is 4KB of memory.  Within each tile, the pixels are laid out like
a 128x8 linear image.  The tiles themselves are laid out row-major in memory
like giant pixels.  This means that, as long as you don't leave your 128x8
tile, you can move in both dimensions without leaving the same 4K page in
memory.

.. image:: tiling-basic.svg
   :alt: Example of an X-tiled image

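The RGBA8888 X-tiled example can be sketched in code as follows.  This is only
an illustration under the assumptions stated in the text (128x8 pixel tiles of
4KB each, tiles laid out row-major); the helper name and the ``tiles_per_row``
parameter are hypothetical, and bit-6 swizzling is ignored.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   /* Parameters of the RGBA8888 X-tiled example. */
   #define TILE_W_PX   128u   /* tile width in pixels  */
   #define TILE_H_PX   8u     /* tile height in pixels */
   #define TILE_SIZE_B 4096u  /* bytes per tile        */
   #define CPP         4u     /* bytes per pixel       */

   /* Byte offset of pixel (x, y) in an X-tiled RGBA8888 surface that is
    * tiles_per_row tiles wide.  Illustrative helper, not an ISL function.
    */
   static inline uint64_t
   xtiled_rgba8888_offset_B(uint32_t x, uint32_t y, uint32_t tiles_per_row)
   {
      assert(x / TILE_W_PX < tiles_per_row);

      /* Which tile the pixel lives in; tiles are laid out row-major. */
      uint64_t tile_index = (uint64_t)(y / TILE_H_PX) * tiles_per_row +
                            (x / TILE_W_PX);

      /* Position inside the tile, which looks like a 128x8 linear image. */
      uint32_t in_tile_B = (y % TILE_H_PX) * (TILE_W_PX * CPP) +
                           (x % TILE_W_PX) * CPP;

      return tile_index * TILE_SIZE_B + in_tile_B;
   }

With this layout, moving one pixel down inside a tile costs 512 bytes no matter
how wide the surface is, and every pixel of a tile stays within the same 4KB.
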
You can, however, do even better than this.  Suppose that same image is,
instead, Y-tiled.  Then the surface is divided into 32x32 pixel tiles, each of
which is 4KB of memory.  Within a tile, each 64B cache line corresponds to a
4x4 pixel region of the image (you can think of it as a tile within a tile).
This means that very small deviations don't even leave the cache line.  This
added bit of pixel shuffling is known to have a substantial performance impact
in most real-world applications.

Intel graphics has several different tiling formats that we'll discuss in
detail in later sections.  The most commonly used as of the writing of this
chapter is Y-tiling.  In all tiling formats the basic principle is the same:
The image is divided into tiles of a particular size and, within those tiles,
the data is re-arranged (or swizzled) based on a particular pattern.  A tile
size will always be specified in bytes by rows and the actual X-dimension of
the tile in elements depends on the size of the element in bytes.

Bit-6 Swizzling
^^^^^^^^^^^^^^^

On some older hardware, there is an additional address swizzle that is applied
on top of the tiling format.  This has been removed starting with Broadwell
because, as it says in the Broadwell PRM Vol. 5 "Tiling Algorithm" (p. 17):

   Address Swizzling for Tiled-Surfaces is no longer used because the main
   memory controller has a more effective address swizzling algorithm.

Whether or not swizzling is enabled depends on the memory configuration of the
system.  Generally, systems with dual-channel RAM have swizzling enabled and
single-channel systems do not.  Supposedly, this swizzling allows for better
balancing between the two memory channels and increases performance.  Because
it depends on the memory configuration, which may change from one boot to the
next, it requires a run-time check.

The best documentation for bit-6 swizzling can be found in the Haswell PRM Vol.
5 "Memory Views" in the section entitled "Address Swizzling for Tiled-Y
Surfaces".  It exists on older platforms but the docs get progressively worse
the further you go back.

ISL Representation
------------------

The structure of any given tiling format is represented by ISL using the
:c:enum:`isl_tiling` enum and the :c:struct:`isl_tile_info` structure:

.. c:autoenum:: isl_tiling
   :file: src/intel/isl/isl.h
   :members:

.. c:autofunction:: isl_tiling_get_info
   :file: src/intel/isl/isl.c

.. c:autostruct:: isl_tile_info
   :members:

The ``isl_tile_info`` structure has two different sizes for a tile: a logical
size in surface elements and a physical size in bytes.  In order to determine
the proper logical size, the bits-per-block of the underlying format has to be
passed into ``isl_tiling_get_info``.  The proper way to compute the size of an
image in bytes given a width and height in elements is as follows:

.. code-block:: c

   /* Surface size in tiles, rounded up to whole tiles. */
   uint32_t width_tl = DIV_ROUND_UP(width_el * (format_bpb / tile_info.format_bpb),
                                    tile_info.logical_extent_el.w);
   uint32_t height_tl = DIV_ROUND_UP(height_el, tile_info.logical_extent_el.h);

   /* Pitch and size come from the physical extent, which is in bytes by rows. */
   uint32_t row_pitch = width_tl * tile_info.phys_extent_B.w;
   uint32_t size = height_tl * tile_info.phys_extent_B.h * row_pitch;

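As a worked example, the calculation can be wrapped in a self-contained sketch.
The ``tile_info_sketch`` structure is a simplified stand-in for the few
``isl_tile_info`` fields the calculation uses, not the real definition from
``isl.h``, and the two tile descriptions are filled in by hand from the sizes
quoted in this chapter rather than obtained from ``isl_tiling_get_info``.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   #define DIV_ROUND_UP(a, b) (((a) + (b) - 1) / (b))

   /* Simplified stand-in for the isl_tile_info fields used below. */
   struct tile_info_sketch {
      uint32_t format_bpb;    /* bits per block the logical extent assumes */
      uint32_t logical_w_el;  /* logical tile width in elements */
      uint32_t logical_h_el;  /* logical tile height in elements */
      uint32_t phys_w_B;      /* physical tile width in bytes */
      uint32_t phys_h_rows;   /* physical tile height in rows */
   };

   /* Y-tile for a 32-bit format: 32x32 elements, 128B x 32 rows. */
   static const struct tile_info_sketch y_tile_32bpp = { 32, 32, 32, 128, 32 };

   /* W-tile for 8-bit stencil: 64x64 elements, 128B x 32 rows. */
   static const struct tile_info_sketch w_tile = { 8, 64, 64, 128, 32 };

   /* Size in bytes of a width_el x height_el surface of whole tiles. */
   static inline uint32_t
   tiled_size_B(const struct tile_info_sketch *ti, uint32_t format_bpb,
                uint32_t width_el, uint32_t height_el)
   {
      assert(ti->format_bpb != 0 && ti->logical_w_el != 0 &&
             ti->logical_h_el != 0);

      uint32_t width_tl = DIV_ROUND_UP(width_el * (format_bpb / ti->format_bpb),
                                       ti->logical_w_el);
      uint32_t height_tl = DIV_ROUND_UP(height_el, ti->logical_h_el);
      uint32_t row_pitch = width_tl * ti->phys_w_B;

      return height_tl * ti->phys_h_rows * row_pitch;
   }

A 100x100 RGBA8888 surface needs 4x4 Y-tiles (65536 bytes), while a 100x100
stencil surface needs only 2x2 W-tiles (16384 bytes) even though both tiles
have the same physical footprint.
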
It is very important to note that there is no direct conversion between
:c:member:`isl_tile_info.logical_extent_el` and
:c:member:`isl_tile_info.phys_extent_B`.  It is tempting to assume that the
logical and physical heights are the same and simply divide the width of
:c:member:`isl_tile_info.phys_extent_B` by the size of the format (which is
what the PRM does) to get :c:member:`isl_tile_info.logical_extent_el` but
this is not at all correct.  Some tiling formats have logical and physical
heights that differ and so no such calculation will work in general.  The
easiest case study for this is W-tiling.  From the Sky Lake PRM Vol. 2d,
"RENDER_SURFACE_STATE" (p. 427):

   If the surface is a stencil buffer (and thus has Tile Mode set to
   TILEMODE_WMAJOR), the pitch must be set to 2x the value computed based on
   width, as the stencil buffer is stored with two rows interleaved.

What does this mean?  Why are we multiplying the pitch by two?  What does it
mean that "the stencil buffer is stored with two rows interleaved"?  The
answer to all these questions is that a W-tile (which is only used for
stencil) has a logical size of 64el x 64el but a physical size of 128B
x 32rows.  In memory, a W-tile has the same footprint as a Y-tile (128B
x 32rows) but every pair of rows in the stencil buffer is interleaved into
a single row of bytes yielding a two-dimensional area of 64el x 64el.  You can
consider this as its own tiling format or as a modification of Y-tiling.  The
interpretation in the PRMs varies by hardware generation; on Sandy Bridge they
simply said it was Y-tiled but by Sky Lake there is almost no mention of
Y-tiling in connection with stencil buffers and they are always W-tiled.  This
mismatch between logical and physical tile sizes is also relevant for
hierarchical depth buffers as well as single-channel MCS and CCS buffers.

X-tiling
--------

The simplest tiling format available on Intel graphics (which has been
available since gen4) is X-tiling.  An X-tile is 512B x 8rows and, within the
tile, the data is arranged in an X-major linear fashion.  You can also look at
X-tiling as being an 8x8 cache line grid where the cache lines are arranged
X-major as follows:

======= ======= ======= ======= ======= ======= ======= =======
`0x000` `0x040` `0x080` `0x0c0` `0x100` `0x140` `0x180` `0x1c0`
`0x200` `0x240` `0x280` `0x2c0` `0x300` `0x340` `0x380` `0x3c0`
`0x400` `0x440` `0x480` `0x4c0` `0x500` `0x540` `0x580` `0x5c0`
`0x600` `0x640` `0x680` `0x6c0` `0x700` `0x740` `0x780` `0x7c0`
`0x800` `0x840` `0x880` `0x8c0` `0x900` `0x940` `0x980` `0x9c0`
`0xa00` `0xa40` `0xa80` `0xac0` `0xb00` `0xb40` `0xb80` `0xbc0`
`0xc00` `0xc40` `0xc80` `0xcc0` `0xd00` `0xd40` `0xd80` `0xdc0`
`0xe00` `0xe40` `0xe80` `0xec0` `0xf00` `0xf40` `0xf80` `0xfc0`
======= ======= ======= ======= ======= ======= ======= =======

Each cache line represents a piece of a single row of pixels within the image.
The memory locations of two vertically adjacent pixels within the same X-tile
always differ by 512B or 8 cache lines.

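The X-tile layout is simple enough to write down directly.  The following is a
minimal sketch for a single tile, with ``u_B`` as the byte column and ``v`` as
the row; both helper names are illustrative and bit-6 swizzling is not applied.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   /* Byte offset within one X-tile of the byte at column u_B (0..511) and
    * row v (0..7).  Illustrative helper, not an ISL function.
    */
   static inline uint32_t
   xtile_offset_B(uint32_t u_B, uint32_t v)
   {
      assert(u_B < 512 && v < 8);

      /* An X-tile is just a 512B x 8 row linear image. */
      return v * 512 + u_B;
   }

   /* Index of the 64B cache line that holds a given tile offset. */
   static inline uint32_t
   cache_line_index(uint32_t offset_B)
   {
      return offset_B / 64;
   }

Stepping from row ``v`` to row ``v + 1`` at the same column always adds 512 to
the offset, which is 8 cache lines.
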
As mentioned above, X-tiling is slower than Y-tiling (though still faster than
linear).  However, until Sky Lake, the display scan-out hardware could only do
X-tiling so we have historically used X-tiling for all window-system buffers
(because X or a Wayland compositor may want to put it in a plane).

Bit-6 Swizzling
^^^^^^^^^^^^^^^

When bit-6 swizzling is enabled, bits 9 and 10 are XORed in with bit 6 of the
tiled address:

.. code-block:: c

   addr[6] ^= addr[9] ^ addr[10];

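The bit-indexing notation above is shorthand rather than valid C.  One way to
spell it out on an integer address is sketched below; the function name is
hypothetical.

.. code-block:: c

   #include <stdint.h>

   /* Apply X-tiling bit-6 swizzling to a tiled byte address: bit 6 is XORed
    * with bits 9 and 10.  Illustrative helper, not an ISL function.
    */
   static inline uint64_t
   swizzle_bit6_x(uint64_t addr)
   {
      uint64_t bit9 = (addr >> 9) & 1;
      uint64_t bit10 = (addr >> 10) & 1;

      return addr ^ ((bit9 ^ bit10) << 6);
   }

Because bits 9 and 10 are left untouched, applying the function twice returns
the original address, so the same helper both swizzles and unswizzles.
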
Y-tiling
--------

The Y-tiling format, also available since gen4, is substantially different from
X-tiling and performs much better in practice.  Each Y-tile is an 8x8 grid of
cache lines arranged Y-major as follows:

======= ======= ======= ======= ======= ======= ======= =======
`0x000` `0x200` `0x400` `0x600` `0x800` `0xa00` `0xc00` `0xe00`
`0x040` `0x240` `0x440` `0x640` `0x840` `0xa40` `0xc40` `0xe40`
`0x080` `0x280` `0x480` `0x680` `0x880` `0xa80` `0xc80` `0xe80`
`0x0c0` `0x2c0` `0x4c0` `0x6c0` `0x8c0` `0xac0` `0xcc0` `0xec0`
`0x100` `0x300` `0x500` `0x700` `0x900` `0xb00` `0xd00` `0xf00`
`0x140` `0x340` `0x540` `0x740` `0x940` `0xb40` `0xd40` `0xf40`
`0x180` `0x380` `0x580` `0x780` `0x980` `0xb80` `0xd80` `0xf80`
`0x1c0` `0x3c0` `0x5c0` `0x7c0` `0x9c0` `0xbc0` `0xdc0` `0xfc0`
======= ======= ======= ======= ======= ======= ======= =======

Each 64B cache line within the tile is laid out as 4 rows of 16B each:

====== ====== ====== ====== ====== ====== ====== ====== ====== ====== ====== ====== ====== ====== ====== ======
`0x00` `0x01` `0x02` `0x03` `0x04` `0x05` `0x06` `0x07` `0x08` `0x09` `0x0a` `0x0b` `0x0c` `0x0d` `0x0e` `0x0f`
`0x10` `0x11` `0x12` `0x13` `0x14` `0x15` `0x16` `0x17` `0x18` `0x19` `0x1a` `0x1b` `0x1c` `0x1d` `0x1e` `0x1f`
`0x20` `0x21` `0x22` `0x23` `0x24` `0x25` `0x26` `0x27` `0x28` `0x29` `0x2a` `0x2b` `0x2c` `0x2d` `0x2e` `0x2f`
`0x30` `0x31` `0x32` `0x33` `0x34` `0x35` `0x36` `0x37` `0x38` `0x39` `0x3a` `0x3b` `0x3c` `0x3d` `0x3e` `0x3f`
====== ====== ====== ====== ====== ====== ====== ====== ====== ====== ====== ====== ====== ====== ====== ======

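Putting the two tables together, a Y-tile is made of eight 16B-wide columns,
each 32 rows tall and 512B in size.  A minimal sketch of the resulting
intra-tile offset follows; the helper name is illustrative and bit-6 swizzling
is not applied.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   /* Byte offset within one Y-tile of the byte at column u_B (0..127) and
    * row v (0..31).  Illustrative helper, not an ISL function.
    */
   static inline uint32_t
   ytile_offset_B(uint32_t u_B, uint32_t v)
   {
      assert(u_B < 128 && v < 32);

      /* 512B per 16B-wide column, 16B per row, then the byte in the row. */
      return (u_B / 16) * 512 + v * 16 + (u_B % 16);
   }

For a 4-byte-per-pixel format, the 16B x 4 row cache line is exactly the 4x4
pixel block mentioned earlier: bytes (0, 0) through (15, 3) all land in the
first 64 bytes of the tile.
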
Y-tiling is widely regarded as being substantially faster than X-tiling so it
is generally preferred.  However, prior to Sky Lake, Y-tiling was not available
for scanout so X-tiling was used for any sort of window-system buffers.
Starting with Sky Lake, we can scan out from Y-tiled buffers.

Bit-6 Swizzling
^^^^^^^^^^^^^^^

When bit-6 swizzling is enabled, bit 9 is XORed in with bit 6 of the tiled
address:

.. code-block:: c

   addr[6] ^= addr[9];

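As with X-tiling, the notation above is shorthand.  Written against an integer
address it might look like the sketch below, where the function name is
hypothetical.

.. code-block:: c

   #include <stdint.h>

   /* Apply Y-tiling bit-6 swizzling to a tiled byte address: bit 6 is XORed
    * with bit 9.  Illustrative helper, not an ISL function.
    */
   static inline uint64_t
   swizzle_bit6_y(uint64_t addr)
   {
      return addr ^ (((addr >> 9) & 1) << 6);
   }

In terms of the Y-tile cache line grid, this swaps vertically adjacent pairs of
cache lines in every odd-numbered column and leaves even columns alone.
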
W-tiling
--------

W-tiling is a new tiling format added on Sandy Bridge for use in stencil
buffers.  W-tiling is similar to Y-tiling in that it's arranged as an 8x8
Y-major grid of cache lines.  The bytes within each cache line are arranged as
follows:

====== ====== ====== ====== ====== ====== ====== ======
`0x00` `0x01` `0x04` `0x05` `0x10` `0x11` `0x14` `0x15`
`0x02` `0x03` `0x06` `0x07` `0x12` `0x13` `0x16` `0x17`
`0x08` `0x09` `0x0c` `0x0d` `0x18` `0x19` `0x1c` `0x1d`
`0x0a` `0x0b` `0x0e` `0x0f` `0x1a` `0x1b` `0x1e` `0x1f`
`0x20` `0x21` `0x24` `0x25` `0x30` `0x31` `0x34` `0x35`
`0x22` `0x23` `0x26` `0x27` `0x32` `0x33` `0x36` `0x37`
`0x28` `0x29` `0x2c` `0x2d` `0x38` `0x39` `0x3c` `0x3d`
`0x2a` `0x2b` `0x2e` `0x2f` `0x3a` `0x3b` `0x3e` `0x3f`
====== ====== ====== ====== ====== ====== ====== ======

While W-tiling has been required for stencil all the way back to Sandy Bridge,
the docs are somewhat confused as to whether stencil buffers are W- or Y-tiled.
This seems to stem from the fact that the hardware appears to implement
W-tiling as a sort of modified Y-tiling.  One example of this is the somewhat
odd requirement that W-tiled buffers have their pitch multiplied by 2.  From
the Sky Lake PRM Vol. 2d, "RENDER_SURFACE_STATE" (p. 427):

   If the surface is a stencil buffer (and thus has Tile Mode set to
   TILEMODE_WMAJOR), the pitch must be set to 2x the value computed based on
   width, as the stencil buffer is stored with two rows interleaved.

The last phrase holds the key here: "the stencil buffer is stored with two rows
interleaved".  More accurately, a W-tiled buffer can be viewed as a Y-tiled
buffer with each set of 4 W-tiled lines interleaved to form 2 Y-tiled lines.
In ISL, we represent a W-tile as a tiling with a logical dimension of 64el
x 64el but a physical size of 128B x 32rows.  This cleanly takes care of the
pitch issue above and seems to nicely model the hardware.

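The byte arrangement inside a W-tile cache line is a bit interleave of the
element coordinates.  The sketch below reproduces the cache line table together
with the 8x8 Y-major cache line grid for a logical position ``(u, v)`` in
elements; the helper name is illustrative and this is not the code ISL or the
stencil blit shaders use.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   /* Byte offset within one W-tile of the 1-byte element at logical column u
    * (0..63) and logical row v (0..63).  Illustrative helper only.
    */
   static inline uint32_t
   wtile_offset_B(uint32_t u, uint32_t v)
   {
      assert(u < 64 && v < 64);

      /* Inside an 8el x 8el cache line the low three bits of u and v
       * alternate, starting with u in bit 0.
       */
      uint32_t in_line = (((u >> 0) & 1) << 0) |
                         (((v >> 0) & 1) << 1) |
                         (((u >> 1) & 1) << 2) |
                         (((v >> 1) & 1) << 3) |
                         (((u >> 2) & 1) << 4) |
                         (((v >> 2) & 1) << 5);

      /* Cache lines form an 8x8 Y-major grid: 64B per step down and 512B
       * per step to the right.
       */
      return (u / 8) * 512 + (v / 8) * 64 + in_line;
   }

All 64x64 logical elements map onto the 4096 bytes of the tile, which is the
same footprint as the 128B x 32rows physical description.
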
Tile4
-----

The tile4 format, introduced on Xe-HP, is somewhat similar to Y but with more
internal shuffling.  Each tile4 tile is an 8x8 grid of cache lines arranged
as follows:

======= ======= ======= ======= ======= ======= ======= =======
`0x000` `0x040` `0x080` `0x0c0` `0x200` `0x240` `0x280` `0x2c0`
`0x100` `0x140` `0x180` `0x1c0` `0x300` `0x340` `0x380` `0x3c0`
`0x400` `0x440` `0x480` `0x4c0` `0x600` `0x640` `0x680` `0x6c0`
`0x500` `0x540` `0x580` `0x5c0` `0x700` `0x740` `0x780` `0x7c0`
`0x800` `0x840` `0x880` `0x8c0` `0xa00` `0xa40` `0xa80` `0xac0`
`0x900` `0x940` `0x980` `0x9c0` `0xb00` `0xb40` `0xb80` `0xbc0`
`0xc00` `0xc40` `0xc80` `0xcc0` `0xe00` `0xe40` `0xe80` `0xec0`
`0xd00` `0xd40` `0xd80` `0xdc0` `0xf00` `0xf40` `0xf80` `0xfc0`
======= ======= ======= ======= ======= ======= ======= =======

Each 64B cache line within the tile is laid out the same way as for a Y-tile,
as 4 rows of 16B each:

====== ====== ====== ====== ====== ====== ====== ====== ====== ====== ====== ====== ====== ====== ====== ======
`0x00` `0x01` `0x02` `0x03` `0x04` `0x05` `0x06` `0x07` `0x08` `0x09` `0x0a` `0x0b` `0x0c` `0x0d` `0x0e` `0x0f`
`0x10` `0x11` `0x12` `0x13` `0x14` `0x15` `0x16` `0x17` `0x18` `0x19` `0x1a` `0x1b` `0x1c` `0x1d` `0x1e` `0x1f`
`0x20` `0x21` `0x22` `0x23` `0x24` `0x25` `0x26` `0x27` `0x28` `0x29` `0x2a` `0x2b` `0x2c` `0x2d` `0x2e` `0x2f`
`0x30` `0x31` `0x32` `0x33` `0x34` `0x35` `0x36` `0x37` `0x38` `0x39` `0x3a` `0x3b` `0x3c` `0x3d` `0x3e` `0x3f`
====== ====== ====== ====== ====== ====== ====== ====== ====== ====== ====== ====== ====== ====== ====== ======

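A possible way to compute a tile4 offset is sketched below.  It follows the
:c:enumerator:`isl_tiling.ISL_TILING_4` bit pattern given in the "Tiling as a
bit pattern" section: the cache line column and row are interleaved into
address bits 6 through 11, and the layout inside a cache line matches Y-tiling.
The helper name is illustrative.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   /* Byte offset within one tile4 tile of the byte at column u_B (0..127)
    * and row v (0..31).  Illustrative helper, not an ISL function.
    */
   static inline uint32_t
   tile4_offset_B(uint32_t u_B, uint32_t v)
   {
      assert(u_B < 128 && v < 32);

      uint32_t cx = u_B / 16;  /* cache line column, 0..7 */
      uint32_t cy = v / 4;     /* cache line row, 0..7 */

      /* Interleave the cache line coordinates into bits 6..11. */
      uint32_t line = (((cx >> 0) & 1) << 6) |
                      (((cx >> 1) & 1) << 7) |
                      (((cy >> 0) & 1) << 8) |
                      (((cx >> 2) & 1) << 9) |
                      (((cy >> 1) & 1) << 10) |
                      (((cy >> 2) & 1) << 11);

      /* Inside a cache line: 4 rows of 16B, as for a Y-tile. */
      return line + (v % 4) * 16 + (u_B % 16);
   }

Compared with Y-tiling, a 2x2 group of cache lines (32B x 8 rows) now occupies
512 contiguous bytes, so slightly larger neighbourhoods stay close in memory.
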
Tiling as a bit pattern
-----------------------

There is one more important angle on tiling that should be discussed before we
finish.  Every tiling can be described by three things:

 1. A logical width and height in elements
 2. A physical width in bytes and height in rows
 3. A mapping from logical elements to physical bytes within the tile

We have spent a good deal of time on the first two because this is what you
really need for doing surface layout calculations.  However, there are cases in
which the map from logical to physical elements is critical.  One example is
W-tiling where we have code to do W-tiled encoding and decoding in the shader
for doing stencil blits because the hardware does not allow us to render to
W-tiled surfaces.

There are many ways to mathematically describe the mapping from logical
elements to physical bytes.  In the PRMs they give a very complicated set of
formulas involving lots of multiplication, modulus, and sums that show you how
to compute the mapping.  With a little creativity, you can easily reduce those
to a set of bit shifts and ORs.  By far the simplest formulation, however, is
as a mapping from the bits of the texture coordinates to bits in the address.
Suppose that :math:`(u, v)` is the location of a 1-byte element within a tile.
If you represent :math:`u` as :math:`u_n u_{n-1} \cdots u_2 u_1 u_0` where
:math:`u_0` is the LSB and :math:`u_n` is the MSB of :math:`u` and similarly
:math:`v = v_m v_{m-1} \cdots v_2 v_1 v_0`, then the bits of the address within
the tile are given by the table below:

=========================================== =========== =========== =========== =========== =========== =========== =========== =========== =========== =========== =========== ===========
 Tiling                                          11          10          9           8           7           6           5           4           3           2           1           0
=========================================== =========== =========== =========== =========== =========== =========== =========== =========== =========== =========== =========== ===========
:c:enumerator:`isl_tiling.ISL_TILING_X`     :math:`v_2` :math:`v_1` :math:`v_0` :math:`u_8` :math:`u_7` :math:`u_6` :math:`u_5` :math:`u_4` :math:`u_3` :math:`u_2` :math:`u_1` :math:`u_0`
:c:enumerator:`isl_tiling.ISL_TILING_Y0`    :math:`u_6` :math:`u_5` :math:`u_4` :math:`v_4` :math:`v_3` :math:`v_2` :math:`v_1` :math:`v_0` :math:`u_3` :math:`u_2` :math:`u_1` :math:`u_0`
:c:enumerator:`isl_tiling.ISL_TILING_W`     :math:`u_5` :math:`u_4` :math:`u_3` :math:`v_5` :math:`v_4` :math:`v_3` :math:`v_2` :math:`u_2` :math:`v_1` :math:`u_1` :math:`v_0` :math:`u_0`
:c:enumerator:`isl_tiling.ISL_TILING_4`     :math:`v_4` :math:`v_3` :math:`u_6` :math:`v_2` :math:`u_5` :math:`u_4` :math:`v_1` :math:`v_0` :math:`u_3` :math:`u_2` :math:`u_1` :math:`u_0`
=========================================== =========== =========== =========== =========== =========== =========== =========== =========== =========== =========== =========== ===========

Constructing the mapping this way makes a lot of sense when you think about
hardware.  It may seem complex on paper but "simple" things such as addition
are relatively expensive in hardware while interleaving bits in a well-defined
pattern is practically free.  For a format that has more than one byte per
element, you simply chop bits off the bottom of the pattern, hard-code them to
0, and adjust bit indices as needed.  For a 128-bit format, for instance, the
Y-tiled pattern becomes :math:`u_2 u_1 u_0 v_4 v_3 v_2 v_1 v_0`.  The Sky Lake
PRM Vol. 5 in the section "2D Surfaces" contains an expanded version of the
above table (which we will not repeat here) that also includes the bit patterns
for the Ys and Yf tiling formats.

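The bit-pattern view lends itself to a table-driven implementation.  The
following sketch transcribes the four rows of the table, listed from address
bit 0 upwards, and gathers coordinate bits into an intra-tile offset.  The
``addr_bit`` structure, the ``pattern_*`` arrays, and the function name are all
invented for this example and do not exist in ISL.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   /* One address bit: which coordinate it comes from and which bit of it. */
   struct addr_bit {
      char axis;     /* 'u' or 'v' */
      uint8_t bit;   /* bit index within that coordinate */
   };

   /* Rows of the table, listed from address bit 0 up to address bit 11. */
   static const struct addr_bit pattern_x[12] = {
      {'u', 0}, {'u', 1}, {'u', 2}, {'u', 3}, {'u', 4}, {'u', 5},
      {'u', 6}, {'u', 7}, {'u', 8}, {'v', 0}, {'v', 1}, {'v', 2},
   };
   static const struct addr_bit pattern_y[12] = {
      {'u', 0}, {'u', 1}, {'u', 2}, {'u', 3}, {'v', 0}, {'v', 1},
      {'v', 2}, {'v', 3}, {'v', 4}, {'u', 4}, {'u', 5}, {'u', 6},
   };
   static const struct addr_bit pattern_w[12] = {
      {'u', 0}, {'v', 0}, {'u', 1}, {'v', 1}, {'u', 2}, {'v', 2},
      {'v', 3}, {'v', 4}, {'v', 5}, {'u', 3}, {'u', 4}, {'u', 5},
   };
   static const struct addr_bit pattern_4[12] = {
      {'u', 0}, {'u', 1}, {'u', 2}, {'u', 3}, {'v', 0}, {'v', 1},
      {'u', 4}, {'u', 5}, {'v', 2}, {'u', 6}, {'v', 3}, {'v', 4},
   };

   /* Gather bits of (u, v) into a 12-bit byte offset within the tile. */
   static inline uint32_t
   tile_offset_from_pattern(const struct addr_bit pattern[12],
                            uint32_t u, uint32_t v)
   {
      uint32_t offset = 0;

      for (unsigned i = 0; i < 12; i++) {
         assert(pattern[i].axis == 'u' || pattern[i].axis == 'v');
         uint32_t coord = pattern[i].axis == 'u' ? u : v;
         offset |= ((coord >> pattern[i].bit) & 1) << i;
      }

      return offset;
   }

For a 128-bit format, passing ``u = 16 * u_el`` leaves the low four address
bits at zero, and shifting the result right by 4 yields the element index whose
bits follow the :math:`u_2 u_1 u_0 v_4 v_3 v_2 v_1 v_0` pattern quoted above.
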