# Tutorial

The simplest use for the STG tools is to extract, store and compare ABI
representations.

This tutorial uses long options throughout. Equivalent short options can be
found in the manual pages for [`stg`](stg.md) and [`stgdiff`](stgdiff.md). Both
tools understand `-` as a shorthand for `/dev/stdout`.

<details>
<summary>Working Example - code and compilation</summary>

This small code sample will be used as a working example. Copy it into a file
called `tree.c`.

```c
struct N {
  struct N * left;
  struct N * right;
  int value;
};

unsigned int count(struct N * tree) {
  return tree ? count(tree->left) + count(tree->right) + 1 : 0;
}

int sum(struct N * tree) {
  return tree ? sum(tree->left) + sum(tree->right) + tree->value : 0;
}
```

Compile it:

```shell
gcc -Wall -Wextra -g -c tree.c -o tree.o
```

</details>

## Extraction from ELF / DWARF

`stg` is the tool for extracting ABI representations, though it can do more
sophisticated things as well. The simplest invocation of `stg` looks something
like this:

```shell
stg --elf library.so --output library.stg
```

Adding the `--annotate` option can be useful, especially if trying to debug ABI
issues or when experimenting with the tools, like now.

If the output consists of just symbols and you get a warning about missing DWARF
information, this means that `library.so` has no DWARF debugging information.
For meaningful results, `stg` should be run on an *unstripped* ELF file which
may require build system adjustments.

<details>
<summary>Working Example - ABI extraction</summary>

Run this:

```shell
stg --elf tree.o --annotate --output -
```

And you should get something like this:

<details>
<summary>Output</summary>

```proto
version: 0x00000002
root_id: 0x84ea5130  # interface
pointer_reference {
  id: 0x32b38621
  kind: POINTER
  pointee_type_id: 0xe08efe1a  # struct N
}
primitive {
  id: 0x4585663f
  name: "unsigned int"
  encoding: UNSIGNED_INTEGER
  bytesize: 0x00000004
}
primitive {
  id: 0x6720d32f
  name: "int"
  encoding: SIGNED_INTEGER
  bytesize: 0x00000004
}
member {
  id: 0x35cbdb23
  name: "left"
  type_id: 0x32b38621  # struct N*
}
member {
  id: 0x0b440ffb
  name: "right"
  type_id: 0x32b38621  # struct N*
  offset: 64
}
member {
  id: 0xa06f75d5
  name: "value"
  type_id: 0x6720d32f  # int
  offset: 128
}
struct_union {
  id: 0xe08efe1a
  kind: STRUCT
  name: "N"
  definition {
    bytesize: 24
    member_id: 0x35cbdb23  # struct N* left
    member_id: 0x0b440ffb  # struct N* right
    member_id: 0xa06f75d5  # int value
  }
}
function {
  id: 0x912c02a7
  return_type_id: 0x6720d32f  # int
  parameter_id: 0x32b38621  # struct N*
}
function {
  id: 0xc2779f73
  return_type_id: 0x4585663f  # unsigned int
  parameter_id: 0x32b38621  # struct N*
}
elf_symbol {
  id: 0xbb237197
  name: "count"
  is_defined: true
  symbol_type: FUNCTION
  type_id: 0xc2779f73  # unsigned int(struct N*)
  full_name: "count"
}
elf_symbol {
  id: 0x4fdeca38
  name: "sum"
  is_defined: true
  symbol_type: FUNCTION
  type_id: 0x912c02a7  # int(struct N*)
  full_name: "sum"
}
interface {
  id: 0x84ea5130
  symbol_id: 0xbb237197  # unsigned int count(struct N*)
  symbol_id: 0x4fdeca38  # int sum(struct N*)
}
```

</details>

</details>

## Filtering

One issue when first starting to manage the ABI of a binary is the wish to
restrict the interface surface to just the necessary minimum. Any superfluous
symbols or type definitions in the ABI representation can result in spurious ABI
differences in reports later on.

When it comes to the symbols exposed, it's common to control symbol
*visibility*. Type definitions can be either exposed in public header files or
hidden in private header files, with perhaps only public forward declarations,
but this does not remove any type definitions in the DWARF information.

STG provides filtering facilities for both symbols and types, for example:

```shell
stg --files '*.h' --elf library.so --output library.stg
```

This will ensure that definitions of any types defined outside any header files,
and perhaps used as opaque pointer handles, are omitted from the ABI
representation. If you separate public and private headers, then use an
appropriate glob pattern that distinguishes the two.

Sets of symbol or file names can be read from a file. In this example, all
symbols whose names begin with `api_`, except those in the `obsolete` file, are
kept.

```shell
stg --symbols 'api_* & ! :obsolete' --elf library.so --output library.stg
```

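To make the meaning of that filter expression concrete, here is a rough shell sketch of the same decision for a single symbol name. It is only an illustration of the rule described above, not how STG evaluates filters; the `keep_symbol` helper is hypothetical, and it assumes the list file holds bare names, one per line, with no header or comments.

```shell
# keep_symbol NAME LIST_FILE
# Succeeds when NAME starts with "api_" and is not listed in LIST_FILE.
# Hypothetical helper: it mimics 'api_* & ! :obsolete', nothing more.
keep_symbol() {
  case "$1" in
    api_*) ! grep -qxF -- "$1" "$2" ;;  # glob matched: keep unless listed
    *) return 1 ;;                      # glob did not match: drop
  esac
}
```

With `obsolete` containing the single line `api_old`, `keep_symbol api_new obsolete` succeeds while `keep_symbol api_old obsolete` and `keep_symbol helper obsolete` both fail.
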
For historical reasons, the literal filter file format is compatible with
libabigail's symbol list one, but this is subject to change.

```ini
[list]
 # one symbol per line
 foo # comments, whitespace and empty lines are all ignored
 bar

 baz
```

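To check which names such a file actually contributes, the stated rules can be approximated with `sed`. This is a sketch under assumptions, not STG's parser: the `list_symbols` helper is hypothetical, and treating any bracketed line such as `[list]` as a header to skip is a guess based on the example above.

```shell
# list_symbols FILE
# Prints the symbol names in a libabigail-style list file, one per line.
# Hypothetical helper that approximates the rules described in the text.
list_symbols() {
  sed -e 's/#.*$//' \
      -e 's/^[[:space:]]*//' \
      -e 's/[[:space:]]*$//' \
      -e '/^$/d' \
      -e '/^\[.*\]$/d' "$1"
}
```

The expressions run in order on each line: strip the comment, trim leading and trailing whitespace, drop lines left empty, then drop bracketed header lines.
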
<details>
<summary>Working Example - filtering the ABI</summary>

Let's say that `struct N` is supposed to be an opaque type that user code only
gets pointers to and, additionally, the function `count` should be excluded from
the ABI (perhaps due to an argument over its return type). We can exclude the
definition of `struct N`, along with that of any other types defined in
`tree.c`, using a file filter. The symbol can be excluded by name.

Run this:

```shell
stg --elf tree.o --files '*.h' --symbols '!count' --output -
```

The result should be something like this:

<details>
<summary>Output</summary>

```proto
version: 0x00000002
root_id: 0x84ea5130
pointer_reference {
  id: 0x26944aa7
  kind: POINTER
  pointee_type_id: 0xb011cc02
}
primitive {
  id: 0x6720d32f
  name: "int"
  encoding: SIGNED_INTEGER
  bytesize: 0x00000004
}
struct_union {
  id: 0xb011cc02
  kind: STRUCT
  name: "N"
}
function {
  id: 0x9425f186
  return_type_id: 0x6720d32f
  parameter_id: 0x26944aa7
}
elf_symbol {
  id: 0x4fdeca38
  name: "sum"
  is_defined: true
  symbol_type: FUNCTION
  type_id: 0x9425f186
  full_name: "sum"
}
interface {
  id: 0x84ea5130
  symbol_id: 0x4fdeca38
}
```

</details>

</details>

## ABI Comparison

`stgdiff` is the tool for comparing ABI representations and reporting
differences, though it has some other, more specialised, uses. The simplest
invocation of `stgdiff` looks something like this:

```shell
stgdiff --stg old/library.stg new/library.stg --output -
```

This will report ABI differences in the default (`small`) format.

<details>
<summary>Working Example - ABI differences - small format</summary>

The function `sum` has a type that depends on `struct N`. Any change to either
might affect the ABI exposed via `sum`. For example, if the type of the `value`
member is changed to `short` and the file is recompiled, STG can detect this
difference.

First rerun the STG extraction, specifying `--output tree-old.stg`. Make the
source code change, recompile and extract the ABI with `--output tree-new.stg`.

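Those steps can be scripted. The following is a minimal sketch that assumes `tree.c` is exactly the working example from the start of this tutorial and that `gcc` and `stg` are on the `PATH`; the helper names `make_value_short` and `snapshot_old_and_new` are made up for illustration.

```shell
# Change the `value` member of struct N from int to short, in place.
# Assumes the member is written exactly as in the working example.
make_value_short() {
  sed 's/^  int value;$/  short value;/' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Compile and extract the old ABI, edit the source, then compile and
# extract the new ABI. Stops at the first step that fails.
snapshot_old_and_new() {
  gcc -Wall -Wextra -g -c tree.c -o tree.o &&
  stg --elf tree.o --annotate --output tree-old.stg &&
  make_value_short tree.c &&
  gcc -Wall -Wextra -g -c tree.c -o tree.o &&
  stg --elf tree.o --annotate --output tree-new.stg
}
```

Running `snapshot_old_and_new` in the directory containing `tree.c` should leave both `tree-old.stg` and `tree-new.stg` there. Note that it modifies `tree.c`.
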
Then run this:

```shell
stgdiff --stg tree-old.stg tree-new.stg --output -
```

To get this:

```text
type 'struct N' changed
  member changed from 'int value' to 'short int value'
    type changed from 'int' to 'short int'

```

</details>

The `small` format omits parts of the ABI graph which haven't changed.[^1] To
see all impacted nodes, use `--format flat` instead.

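One way to compare the report formats side by side is to write each to its own file. This is a sketch only: the `write_all_formats` helper and the `tree-diff.*` output names are made up for illustration, and it uses only the format names mentioned in this tutorial.

```shell
# write_all_formats OLD.stg NEW.stg
# Writes one report per format to tree-diff.<format> in the current
# directory. Exit statuses are deliberately not checked between runs.
write_all_formats() {
  for format in small short flat plain viz; do
    stgdiff --stg "$1" "$2" --format "$format" --output "tree-diff.$format"
  done
}
```

For the working example this would be invoked as `write_all_formats tree-old.stg tree-new.stg`.
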
[^1]: The similarly named `short` format goes a bit further and will omit and
    summarise certain repetitive differences.

<details>
<summary>Working Example - ABI differences - flat format</summary>

```text
function symbol 'int sum(struct N*)' changed
  type 'int(struct N*)' changed
    parameter 1 type 'struct N*' changed
      pointed-to type 'struct N' changed

type 'struct N' changed
  member 'struct N* left' changed
    type 'struct N*' changed
      pointed-to type 'struct N' changed
  member 'struct N* right' changed
    type 'struct N*' changed
      pointed-to type 'struct N' changed
  member changed from 'int value' to 'short int value'
    type changed from 'int' to 'short int'

```

</details>

And if you really want to see more of the graph structure, use `--format plain`.

<details>
<summary>Working Example - ABI differences - plain format</summary>

```text
function symbol 'int sum(struct N*)' changed
  type 'int(struct N*)' changed
    parameter 1 type 'struct N*' changed
      pointed-to type 'struct N' changed
        member 'struct N* left' changed
          type 'struct N*' changed
            pointed-to type 'struct N' changed
              (being reported)
        member 'struct N* right' changed
          type 'struct N*' changed
            pointed-to type 'struct N' changed
              (being reported)
        member changed from 'int value' to 'short int value'
          type changed from 'int' to 'short int'

```

</details>

Or just use `--format viz` which generates input for
[Graphviz](https://graphviz.org/).

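Turning that output into an image takes one more step with Graphviz's `dot` command. A minimal sketch, assuming `dot` is installed; the `render_diff` helper and its output naming are hypothetical. The two commands are not chained on exit status because `stgdiff` may well report a non-zero status when it finds differences.

```shell
# render_diff OLD.stg NEW.stg BASENAME
# Writes BASENAME.dot (stgdiff's viz output) and BASENAME.svg (rendered).
render_diff() {
  stgdiff --stg "$1" "$2" --format viz --output "$3.dot"
  dot -Tsvg "$3.dot" -o "$3.svg"
}
```

For the working example, `render_diff tree-old.stg tree-new.stg tree-diff` would leave `tree-diff.dot` and `tree-diff.svg` in the current directory.
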
<details>
<summary>Working Example - ABI differences - viz format</summary>

```dot
digraph "ABI diff" {
  "0" [shape=rectangle, label="'interface'"]
  "1" [label="'int sum(struct N*)'"]
  "2" [label="'int(struct N*)'"]
  "3" [label="'struct N*'"]
  "4" [shape=rectangle, label="'struct N'"]
  "5" [label="'struct N* left'"]
  "5" -> "3" [label=""]
  "4" -> "5" [label=""]
  "6" [label="'struct N* right'"]
  "6" -> "3" [label=""]
  "4" -> "6" [label=""]
  "7" [label="'int value' → 'short int value'"]
  "8" [color=red, label="'int' → 'short int'"]
  "7" -> "8" [label=""]
  "4" -> "7" [label=""]
  "3" -> "4" [label="pointed-to"]
  "2" -> "3" [label="parameter 1"]
  "1" -> "2" [label=""]
  "0" -> "1" [label=""]
}
```

</details>
