| Name | Date | Size | #Lines | LOC |
|------|------|------|--------|-----|
| .. | - | - | | |
| self-hosted-builder/ | 25-Apr-2025 | - | 153 | 104 |
| Makefile.in | 25-Apr-2025 | 1.2 KiB | 55 | 37 |
| README.md | 25-Apr-2025 | 11.6 KiB | 285 | 228 |
| crc32-vx.c | 25-Apr-2025 | 8.4 KiB | 223 | 78 |
| dfltcc_common.c | 25-Apr-2025 | 1.1 KiB | 37 | 18 |
| dfltcc_common.h | 25-Apr-2025 | 1.2 KiB | 42 | 32 |
| dfltcc_deflate.c | 25-Apr-2025 | 16.2 KiB | 440 | 277 |
| dfltcc_deflate.h | 25-Apr-2025 | 2.4 KiB | 61 | 46 |
| dfltcc_detail.h | 25-Apr-2025 | 7.8 KiB | 242 | 193 |
| dfltcc_inflate.c | 25-Apr-2025 | 5.2 KiB | 153 | 103 |
| dfltcc_inflate.h | 25-Apr-2025 | 1.9 KiB | 55 | 43 |
| s390_features.c | 25-Apr-2025 | 303 | 15 | 11 |
| s390_features.h | 25-Apr-2025 | 147 | 9 | 5 |

README.md

# Introduction

This directory contains SystemZ deflate hardware acceleration support.
It can be enabled using the following build commands:

    $ ./configure --with-dfltcc-deflate --with-dfltcc-inflate
    $ make

or

    $ cmake -DWITH_DFLTCC_DEFLATE=1 -DWITH_DFLTCC_INFLATE=1 .
    $ make

When built like this, zlib-ng compresses in hardware at level 1 and in
software at all other levels. Decompression always happens in hardware.
To enable hardware compression for levels 1-6 (i.e. to make it used by
default), add `-DDFLTCC_LEVEL_MASK=0x7e` to CFLAGS when building
zlib-ng.
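
As a rough illustration of how such a mask might be read, the sketch
below treats bit N of the mask as "use hardware at level N". That
reading is an assumption, though it is consistent with the text above:
`0x7e` has exactly bits 1 through 6 set. The helper name
`level_uses_hardware()` is hypothetical and is not part of zlib-ng.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not a zlib-ng function.
 * Assumption: bit N of the mask selects compression level N. */
static bool level_uses_hardware(uint32_t level_mask, int level)
{
    /* zlib compression levels run from 0 to 9. */
    if (level < 0 || level > 9)
        return false;
    return (level_mask & (UINT32_C(1) << level)) != 0;
}
```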

SystemZ deflate hardware acceleration is available on [IBM z15](
https://www.ibm.com/products/z15) and newer machines under the name [
"Integrated Accelerator for zEnterprise Data Compression"](
https://www.ibm.com/support/z-content-solutions/compression/). The
programming interface to it is a machine instruction called DEFLATE
CONVERSION CALL (DFLTCC). It is documented in Chapter 26 of [Principles
of Operation](https://publibfp.dhe.ibm.com/epubs/pdf/a227832c.pdf). Both
the code and the rest of this document refer to this feature simply as
"DFLTCC".

# Performance

Performance figures are published [here](
https://github.com/iii-i/zlib-ng/wiki/Performance-with-dfltcc-patch-applied-and-dfltcc-support-built-on-dfltcc-enabled-machine
). The compression speed-up can be as high as 110x and the decompression
speed-up can be as high as 15x.

# Limitations

Two DFLTCC compression calls with identical inputs are not guaranteed to
produce identical outputs. Therefore care should be taken when using
hardware compression if reproducible results are desired. In
particular, the zlib-ng-specific `zng_deflateSetParams()` call allows
setting the `Z_DEFLATE_REPRODUCIBLE` parameter, which disables DFLTCC
support for a particular stream.

DFLTCC does not support every zlib-ng feature, in particular:

* `inflate(Z_BLOCK)` and `inflate(Z_TREES)`
* `inflateMark()`
* `inflatePrime()`
* `inflateSyncPoint()`

When used, these functions either switch to software or, if this is not
possible, fail gracefully.

# Code structure

All SystemZ-specific code lives in the `arch/s390` directory and is
integrated with the rest of zlib-ng using hook macros.

## Hook macros

DFLTCC takes as arguments a parameter block, an input buffer, an output
buffer and a window. The `ZALLOC_DEFLATE_STATE()`,
`ZALLOC_INFLATE_STATE()`, `ZFREE_STATE()`, `ZCOPY_DEFLATE_STATE()`,
`ZCOPY_INFLATE_STATE()`, `ZALLOC_WINDOW()` and `TRY_FREE_WINDOW()`
macros encapsulate allocation details for the parameter block (which is
allocated alongside the zlib-ng state) and the window (which must be
page-aligned).
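
One common way to obtain a page-aligned buffer is C11 `aligned_alloc()`.
The sketch below only illustrates the alignment requirement: the 4 KiB
page size and both helper names are assumptions, and the real macros may
well work differently, for example because zlib-style streams let the
application supply its own allocator.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumption for illustration: pages are 4 KiB. */
#define SKETCH_PAGE_SIZE ((size_t)4096)

/* Hypothetical helper: returns a page-aligned buffer of at least `size`
 * bytes, or NULL on failure. Release it with free(). */
static void *alloc_page_aligned(size_t size)
{
    if (size == 0 || size > SIZE_MAX - (SKETCH_PAGE_SIZE - 1))
        return NULL;
    /* aligned_alloc() expects a size that is a multiple of the alignment. */
    size_t rounded = (size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE
                     * SKETCH_PAGE_SIZE;
    return aligned_alloc(SKETCH_PAGE_SIZE, rounded);
}

/* Hypothetical helper: checks the alignment of a pointer. */
static int is_page_aligned(const void *p)
{
    return ((uintptr_t)p % SKETCH_PAGE_SIZE) == 0;
}
```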

While the inflate software and hardware window formats match, this is
not the case for deflate. Therefore, `deflateSetDictionary()` and
`deflateGetDictionary()` need special handling, which is triggered using
the `DEFLATE_SET_DICTIONARY_HOOK()` and `DEFLATE_GET_DICTIONARY_HOOK()`
macros.

`deflateResetKeep()` and `inflateResetKeep()` update the DFLTCC
parameter block using the `DEFLATE_RESET_KEEP_HOOK()` and
`INFLATE_RESET_KEEP_HOOK()` macros.

The `INFLATE_PRIME_HOOK()`, `INFLATE_MARK_HOOK()` and
`INFLATE_SYNC_POINT_HOOK()` macros make the respective unsupported
calls fail gracefully.

`DEFLATE_PARAMS_HOOK()` implements switching between hardware and
software compression mid-stream using `deflateParams()`. Switching
normally entails flushing the current block, which might not be possible
in low-memory situations. `deflateParams()` uses the `DEFLATE_DONE()`
hook to detect and gracefully handle such situations.

The algorithm implemented in hardware has a different compression ratio
than the one implemented in software. The
`DEFLATE_BOUND_ADJUST_COMPLEN()` and `DEFLATE_NEED_CONSERVATIVE_BOUND()`
macros make `deflateBound()` return correct results for the hardware
implementation.

Actual compression and decompression are handled by the `DEFLATE_HOOK()`
and `INFLATE_TYPEDO_HOOK()` macros. Since inflation with DFLTCC manages
the window on its own, calling `updatewindow()` is suppressed using the
`INFLATE_NEED_UPDATEWINDOW()` macro.

In addition to compression, DFLTCC computes CRC-32 and Adler-32
checksums. Therefore, whenever it is used, software checksumming is
suppressed using the `DEFLATE_NEED_CHECKSUM()` and
`INFLATE_NEED_CHECKSUM()` macros.

While software always produces reproducible compression results, this
is not the case for DFLTCC. Therefore, zlib-ng users are given the
ability to specify whether or not reproducible compression results
are required. While it is always possible to specify this setting
before the compression begins, it is not always possible to do so in
the middle of a deflate stream: the exact conditions for that are
determined by the `DEFLATE_CAN_SET_REPRODUCIBLE()` macro.
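
The hook pattern described in this section can be sketched roughly as
follows. All names here (`struct stream_sketch`, `hw_deflate()`,
`DEFLATE_HOOK_SKETCH()`, `SKETCH_NO_HW`) are hypothetical stand-ins
rather than zlib-ng code, and the no-op expansion for builds without
hardware support is an assumption about how such hooks are commonly
written. The hardware function returns `1` when it handled the call and
`0` when the generic code should fall back to software.

```c
#include <stddef.h>

/* Hypothetical stand-in for a compression stream; not a zlib-ng type. */
struct stream_sketch {
    int hw_eligible;   /* non-zero if the hardware path may be used */
    size_t hw_calls;   /* how often the hardware path did the work */
    size_t sw_calls;   /* how often the software path did the work */
};

/* Hypothetical hardware path: 1 = handled, 0 = fall back to software. */
static int hw_deflate(struct stream_sketch *s)
{
    if (!s->hw_eligible)
        return 0;
    s->hw_calls++;
    return 1;
}

/* Assumption: without hardware support the hook collapses to a constant,
 * so the generic code compiles unchanged on other platforms. */
#ifndef SKETCH_NO_HW
#  define DEFLATE_HOOK_SKETCH(s) hw_deflate(s)
#else
#  define DEFLATE_HOOK_SKETCH(s) 0
#endif

/* Generic code calls the hook first and only then does its own work. */
static void generic_deflate(struct stream_sketch *s)
{
    if (DEFLATE_HOOK_SKETCH(s))
        return;        /* the hardware path handled this call */
    s->sw_calls++;     /* software fallback */
}
```

Keeping the platform-specific logic behind a macro means the generic
function contains a single extra line, regardless of how complicated the
hardware path is.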

## SystemZ-specific code

When zlib-ng is built with DFLTCC, the hooks described above are
converted to calls to functions that are implemented in the
`arch/s390/dfltcc_*` files. The functions can be grouped into three
broad categories:

* Base DFLTCC support, e.g. wrapping the machine instruction
  (`dfltcc()`) and allocating aligned memory (`dfltcc_alloc_state()`).
* Translating between software and hardware data formats, e.g.
  `dfltcc_deflate_set_dictionary()`.
* Translating between software and hardware state machines, e.g.
  `dfltcc_deflate()` and `dfltcc_inflate()`.

The functions from the first two categories are fairly simple. However,
various quirks in both the software and hardware state machines make the
functions from the third category quite complicated.

### `dfltcc_deflate()` function

This function is called by `deflate()` and has the following
responsibilities:

* Checking whether DFLTCC can be used with the current stream. If this
  is not the case, it returns `0`, making `deflate()` use some other
  function in order to compress in software. Otherwise it returns `1`.
* Block management and Huffman table generation. DFLTCC ends blocks only
  when explicitly instructed to do so by the software. Furthermore,
  whether to use fixed or dynamic Huffman tables must also be determined
  by the software. Since looking at data in order to gather statistics
  would negate performance benefits, the following approach is used: the
  first `DFLTCC_FIRST_FHT_BLOCK_SIZE` bytes are placed into a fixed
  block, and every subsequent `DFLTCC_BLOCK_SIZE` bytes are placed into
  a dynamic block.
* Writing EOBS (the end-of-block symbol). The Block Closing Control bit
  in the parameter block instructs DFLTCC to write EOBS; however,
  certain conditions need to be met: the input data length must be
  non-zero or the Continuation Flag must be set. To put this in simpler
  terms, DFLTCC will silently refuse to write EOBS if this is the only
  thing that it is asked to do. Since the code has to be able to emit
  EOBS in software anyway, Block Closing Control is never used in order
  to avoid tricky corner cases. Whether to write EOBS is instead
  controlled by the `soft_bcc` variable.
* Triggering block post-processing. Depending on the flush mode,
  `deflate()` must perform various additional actions when a block or a
  stream ends. `dfltcc_deflate()` informs `deflate()` about this using
  the `block_state *result` parameter.
* Converting software state fields into hardware parameter block fields,
  and vice versa. For example, `wrap` and Check Value Type, or
  `bi_valid` and Sub-Byte Boundary. Certain fields cannot be translated
  and must persist untouched in the parameter block between calls, for
  example, Continuation Flag or Continuation State Buffer.
* Handling flush modes and low-memory situations. These aspects are
  quite intertwined and pervasive. The general idea here is that, when
  the Continuation Flag is set, the code must not do anything in
  software, whether explicitly (e.g. by calling `send_eobs()`) or
  implicitly (by returning to `deflate()` with certain return and
  `*result` values).
* Ending streams. When a new block is started and the flush mode is
  `Z_FINISH`, the Block Header Final parameter block bit is used to mark
  this block as final. However, sometimes an empty final block is
  needed, and, unfortunately, just like with EOBS, DFLTCC will silently
  refuse to do this. The general idea of the DFLTCC implementation is to
  rely as much as possible on the existing code. To do so here, the code
  pretends that it does not support DFLTCC, which makes `deflate()` call
  a software compression function, which writes an empty final block.
  Whether this is required is controlled by the `need_empty_block`
  variable.
* Error handling. This is simply converting the
  Operation-Ending-Supplemental Code to a string. Errors can only happen
  due to things like memory corruption, and therefore they don't affect
  the `deflate()` return code.
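
The block management rule from the list above can be sketched as a
small planner that splits a known input length into blocks. This is an
illustration only: the two size constants use assumed values (4 KiB and
1 MiB), the names are hypothetical, and the real function works
incrementally on a stream rather than on a total length known up front.

```c
#include <stddef.h>

/* Assumed values, standing in for DFLTCC_FIRST_FHT_BLOCK_SIZE and
 * DFLTCC_BLOCK_SIZE; the actual settings may differ. */
#define SKETCH_FIRST_FHT_BLOCK_SIZE ((size_t)4096)
#define SKETCH_BLOCK_SIZE           ((size_t)1048576)

enum block_kind { BLOCK_FIXED, BLOCK_DYNAMIC };

struct block_plan {
    enum block_kind kind;  /* fixed or dynamic Huffman tables */
    size_t len;            /* number of input bytes in this block */
};

/* Hypothetical helper: fills `out` with at most `max_out` blocks covering
 * `total` input bytes and returns how many blocks were planned. */
static size_t plan_blocks(size_t total, struct block_plan *out,
                          size_t max_out)
{
    size_t n = 0;
    size_t limit = SKETCH_FIRST_FHT_BLOCK_SIZE;
    enum block_kind kind = BLOCK_FIXED;

    while (total > 0 && n < max_out) {
        size_t len = total < limit ? total : limit;
        out[n].kind = kind;
        out[n].len = len;
        n++;
        total -= len;
        /* Only the first block is fixed; all later ones are dynamic. */
        kind = BLOCK_DYNAMIC;
        limit = SKETCH_BLOCK_SIZE;
    }
    return n;
}
```

No statistics are gathered in this scheme: the choice between fixed and
dynamic tables depends only on the position in the stream, which is what
keeps the software side cheap.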

### `dfltcc_inflate()` function

This function is called by `inflate()` from the `TYPEDO` state (that is,
when all the metadata is parsed and the stream is positioned at the type
bits of the deflate block header) and is responsible for the following:

* Falling back to software when the flush mode is `Z_BLOCK` or
  `Z_TREES`. Unfortunately, there is no way to ask DFLTCC to stop
  decompressing on a block or tree boundary.
* `inflate()` decompression loop management. This is controlled using
  the return value, which can be either `DFLTCC_INFLATE_BREAK` or
  `DFLTCC_INFLATE_CONTINUE`.
* Converting software state fields into hardware parameter block fields,
  and vice versa. For example, `whave` and History Length, or `wnext`
  and History Offset.
* Ending streams. This instructs `inflate()` to return `Z_STREAM_END`
  and is controlled by the `last` state field.
* Error handling. As with deflate, error handling consists of converting
  the Operation-Ending-Supplemental Code to a string. Unlike deflate,
  errors may happen due to bad inputs, therefore they are propagated to
  `inflate()` by setting the `mode` field to `MEM` or `BAD`.
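
The loop management item above can be illustrated with a toy state
machine. Everything in it is hypothetical: the types, the `HW_INFLATE_*`
names and the simulated stream are stand-ins, and the interpretation of
the two return values (continue = let the loop dispatch again, break =
return to the caller, e.g. because input ran out) is an assumption made
for this sketch.

```c
/* Hypothetical stand-ins for the two loop-control return values. */
enum hw_action { HW_INFLATE_CONTINUE, HW_INFLATE_BREAK };

/* Hypothetical subset of decoder states. */
enum mode_sketch { MODE_TYPEDO, MODE_DONE_SKETCH };

struct inflate_sketch {
    enum mode_sketch mode;
    int last;              /* set once the final block has been decoded */
    unsigned avail_in;     /* units of input still available */
    unsigned blocks_left;  /* blocks remaining in the simulated stream */
};

/* Simulated hardware step: decodes one block per unit of input. */
static enum hw_action hw_inflate(struct inflate_sketch *s)
{
    if (s->last) {                  /* stream finished: leave TYPEDO */
        s->mode = MODE_DONE_SKETCH;
        return HW_INFLATE_CONTINUE;
    }
    if (s->avail_in == 0)           /* out of input: back to the caller */
        return HW_INFLATE_BREAK;
    s->avail_in--;
    if (s->blocks_left > 0 && --s->blocks_left == 0)
        s->last = 1;
    return HW_INFLATE_CONTINUE;
}

/* Simulated decompression loop; returns the number of hardware calls. */
static unsigned inflate_loop(struct inflate_sketch *s)
{
    unsigned calls = 0;
    while (s->mode == MODE_TYPEDO) {
        calls++;
        if (hw_inflate(s) == HW_INFLATE_BREAK)
            break;
    }
    return calls;
}
```

With enough input the loop runs until the simulated stream ends and the
state leaves `MODE_TYPEDO`; with too little input it stops early and the
state stays where it was, so a later call can resume.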

# Testing

Given the complexity of the DFLTCC machine instruction, it is not clear
whether QEMU TCG will ever support it. At the time of writing, one needs
access to an IBM z15+ VM or LPAR in order to test DFLTCC support. Since
DFLTCC is a non-privileged instruction, neither special VM/LPAR
configuration nor root access is required.

zlib-ng CI uses an IBM-provided z15 self-hosted builder for DFLTCC
testing. There are no IBM Z builds of the GitHub Actions runner, and
stable qemu-user has problems with .NET apps, so the builder runs the
x86_64 runner version with qemu-user built from the master branch.

## Configuring the builder

### Install prerequisites

```
$ sudo dnf install docker
```

### Add services

```
$ sudo cp self-hosted-builder/*.service /etc/systemd/system/
$ sudo systemctl daemon-reload
```

### Create a config file

```
$ sudo tee /etc/actions-runner
repo=<owner>/<name>
access_token=<ghp_***>
```

The access token should have the repo scope; consult
https://docs.github.com/en/rest/reference/actions#create-a-registration-token-for-a-repository
for details.

### Autostart the x86_64 emulation support

```
$ sudo systemctl enable --now qemu-user-static
```

### Autostart the runner

```
$ sudo systemctl enable --now actions-runner
```

## Rebuilding the image

In order to update the `iiilinuxibmcom/actions-runner` image, e.g. to get the
latest OS security fixes, use the following commands:

```
$ sudo docker build \
      --pull \
      -f self-hosted-builder/actions-runner.Dockerfile \
      -t iiilinuxibmcom/actions-runner \
      .
$ sudo systemctl restart actions-runner
```

## Removing persistent data

The `actions-runner` service stores various temporary data, such as runner
registration information, work directories and logs, in the `actions-runner`
volume. In order to remove it and start from scratch, e.g. when switching the
runner to a different repository, use the following commands:

```
$ sudo systemctl stop actions-runner
$ sudo docker rm -f actions-runner
$ sudo docker volume rm actions-runner
```