# STG

STG stands for Symbol-Type Graph.

# Overview

STG models Application Binary Interfaces. It supports extraction of ABIs from
DWARF and ingestion of BTF and libabigail XML into its model. Its primary
purpose is monitoring an ABI for changes over time and reporting such changes in
a comprehensible fashion.

STG captures symbol information, the size and layout of structs, function
argument and return types and much more, in a graph representation. Difference
reporting happens via a graph comparison.

Currently, STG functionality is exposed as two command-line tools, `stg` (for
ABI extraction) and `stgdiff` (for ABI comparison), and a native file format.

## Model

STG's model is an *abstraction* which does not and cannot capture every possible
interface property, invariant or behaviour. Conversely, the model includes
distinctions which are API significant but not ABI significant.

Concretely, STG's model is a rooted, connected, directed graph where each kind
of node corresponds to a meaningful ABI entity such as a symbol, function type
or struct member.

Nodes have specific attributes, such as name or size. Outgoing edges specify
things like return type. STG's model does not impose any constraints on which
nodes may be joined by edges.

Each node has an identity. However, for the purpose of comparison, nodes are
considered equal if they are of the same kind, have the same attributes and
matching outgoing edges and all nodes reachable via a pair of matching edges are
(recursively) equal. Renumbering nodes, (de)duplicating nodes and
adding/removing unreachable nodes do not affect this relationship.

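The equality relation described above can be sketched in a few lines of Python.
This is an illustrative sketch only, not STG's implementation: the `Node` class,
the `equal` and `list_graph` functions and the dictionary-based graphs are
hypothetical names chosen for this example. A pair of nodes that is visited a
second time is assumed equal, which is what lets the comparison terminate on
cyclic graphs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical node: a kind, some attributes and labelled outgoing edges."""
    kind: str
    attrs: tuple = ()   # e.g. (("name", "node"), ("size", 8))
    edges: tuple = ()   # ordered (label, target id) pairs


def equal(g1, id1, g2, id2, assumed=None):
    """Compare node id1 of graph g1 with node id2 of graph g2 structurally."""
    if assumed is None:
        assumed = set()
    if (id1, id2) in assumed:
        # Pair already visited: assume equal so that cycles terminate.
        return True
    assumed.add((id1, id2))
    a, b = g1[id1], g2[id2]
    if a.kind != b.kind or a.attrs != b.attrs:
        return False
    if [label for label, _ in a.edges] != [label for label, _ in b.edges]:
        return False
    return all(equal(g1, t1, g2, t2, assumed)
               for (_, t1), (_, t2) in zip(a.edges, b.edges))


def list_graph(ids, size=8):
    """Build `struct node { struct node *next; }` using the given node ids."""
    struct, member, pointer = ids
    return {
        struct: Node("struct", (("name", "node"), ("size", size)),
                     (("member", member),)),
        member: Node("member", (("name", "next"),), (("type", pointer),)),
        pointer: Node("pointer", (), (("pointee", struct),)),
    }


original = list_graph((1, 2, 3))
renumbered = list_graph((30, 10, 20))
renumbered[99] = Node("primitive", (("name", "int"),))  # unreachable extra node
resized = list_graph((1, 2, 3), size=16)

print(equal(original, 1, renumbered, 30))  # True
print(equal(original, 1, resized, 1))      # False
```

Renumbering the nodes and adding an unreachable node leave the result
unchanged, while a change to the `size` attribute makes the roots unequal.
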
### Symbols

As modelled by STG, symbols correspond closely to ELF symbols as seen in
`.dynsym` for shared object files or in `.symtab` for object files. In the case
of the Linux kernel, the `.symtab` is enriched with metadata and the effective
"ksymtab" is actually a subset of the ELF symbols together with CRC and
namespace information.

STG links symbols to their source-level types where these are known. Symbols
defined purely in assembly language will not have type information.

The symbol table is contained in the root node of the graph, which is an
*Interface* node.

### Types

STG models the C, C++ and (to a limited extent) Rust type systems, though not
every language feature is covered.

For example, C++ template value parameters are poorly modelled for the simple
reason that this would require modelling C++ *values* as well as types,
something that DWARF itself doesn't do to the full extent permitted by C++20.

As type definitions are in general mutually recursive, an STG ABI is in general
a cyclic graph.

The root node of the graph can also contain a list of interface types, which may
not necessarily be reachable from the interface symbols.

## Supported Input Formats, Parsers and Limitations

STG can read its own native format for processing or comparison. It can also
process libabigail XML and BTF (`.BTF` ELF sections), with some limitations due
to model, design and implementation differences including missing features.

### Kinds of Node

STG has the following kinds of node.

*   **Special** - used for `void` and `...`
*   **Pointer / Reference** - `*`, `&` and `&&`
*   **Pointer to Member** - `foo::*`
*   **Typedef** - `typedef` and `using ... = ...`
*   **Qualified** - `const` and friends
*   **Primitive** - concrete types such as `int` and friends
*   **Array** - `foo[N]` - there is no distinction between zero and
    indeterminate length in the model
*   **Base Class** - inheritance metadata
*   **Method** - virtual functions (only)
*   **Member** - data member
*   **Variant Member** - discriminated member
*   **Struct / Union** - `struct foo` etc., Rust tuples too
*   **Enumeration** - including the underlying value type - only values that are
    within the range of a signed 64-bit integer are correctly modelled
*   **Variant** - for Rust enums holding data
*   **Function** - multiple arguments, single return type
*   **ELF Symbol** - name, version, ELF metadata, Linux kernel metadata
*   **Interface** - top-level collection of symbols and types

An STG ABI consists of a rooted, connected graph of such nodes, and *nothing
else*. STG is blind to anything that cannot be represented by its model.

### Native Format

STG's native file format is a protocol buffer text format. It is suitable for
revision control, rather than human consumption. It is effectively described by
[`stg.proto`](../stg.proto).

In this textual serialisation of ABI graphs, external node identifiers and node
order are chosen to minimise file changes when a small subset of the graph
changes.

As an example, this is the definition of the **Typedef** node kind:

```proto
message Typedef {
  fixed32 id = 1;
  string name = 2;
  fixed32 referred_type_id = 3;
}
```

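The `fixed32` fields above hold external node identifiers. This document does
not spell out how those identifiers are chosen, so the following is only a
sketch of one general approach that gives the stability described earlier:
derive each identifier from the node's content instead of its position. The
`assign_ids` function, the description strings and the use of CRC-32 are all
assumptions made for illustration.

```python
import zlib


def assign_ids(descriptions):
    """Give each node description a 32-bit id derived from its content."""
    ids = {}
    taken = set()
    for description in sorted(descriptions):
        candidate = zlib.crc32(description.encode("utf-8"))
        while candidate in taken:  # resolve the (rare) collision
            candidate = (candidate + 1) & 0xFFFFFFFF
        taken.add(candidate)
        ids[description] = candidate
    return ids


before = assign_ids({"struct foo", "typedef bar"})
after = assign_ids({"struct foo", "typedef bar", "enum baz"})

# Nodes present in both graphs keep their ids when an unrelated node is added.
print(all(after[d] == before[d] for d in before))
```

With sequential numbering, by contrast, inserting one node could shift the
identifier of every node after it and touch many unrelated lines in the file.
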
### Abigail (a.k.a. libabigail XML)

[libabigail](https://sourceware.org/libabigail/) is another project for ABI
monitoring. It uses a format that can be parsed as XML.

This command will transform Abigail into STG:

```shell
stg --abi library.xml --output library.stg
```

The main features modelled in Abigail but not STG are:

*   source file, line and column information
*   C++ access specifiers (public, protected, private)

The Abigail reader has these distinct phases of operation:

1.  text parsed into an XML tree
2.  XML cleaning - whitespace and unused attributes are stripped
3.  XML tidying - issues like duplicate nodes are resolved, if possible
4.  XML parsed into a graph with symbol information held separately
5.  symbols and root node added to the graph
6.  useless type qualifiers are stripped in post-processing

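Phases 1 and 2 above can be illustrated with Python's standard
`xml.etree.ElementTree` module. This is a sketch of the idea, not the STG
reader: the `clean` function, the `UNUSED` attribute set and the sample document
are assumptions for this example. The source location attributes were picked as
"unused" because STG does not model them.

```python
import xml.etree.ElementTree as ET

# Assumed set of attributes the model has no use for (source locations).
UNUSED = {"filepath", "line", "column"}

SAMPLE = """<abi-corpus>
  <type-decl name='int' size-in-bits='32' id='type-id-1'
             filepath='a.c' line='1' column='1'/>
</abi-corpus>"""


def clean(element):
    """Strip whitespace-only text and unused attributes, recursively."""
    if element.text is not None and not element.text.strip():
        element.text = None
    if element.tail is not None and not element.tail.strip():
        element.tail = None
    for name in UNUSED & element.attrib.keys():
        del element.attrib[name]
    for child in element:
        clean(child)


root = ET.fromstring(SAMPLE)  # phase 1: text parsed into an XML tree
clean(root)                   # phase 2: XML cleaning
print(root[0].attrib)
```

After cleaning, the `type-decl` element keeps only its `name`, `size-in-bits`
and `id` attributes, and the indentation between elements is gone.
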
### BTF

[BTF](https://docs.kernel.org/bpf/btf.html) is typically used for the Linux
kernel where it is generated by `pahole -J` from ELF and DWARF information. It
can also be generated natively instead of DWARF using `gcc -gbtf` and by Clang,
but only for eBPF targets.

This command will transform BTF into STG:

```shell
stg --btf vmlinux --output vmlinux.stg
```

STG has primarily been tested against the `pahole` (libbtf) dialect of BTF and
support is not complete.

*   split BTF is not supported at all
*   any `.BTF.ext` section is just ignored
*   some kinds of BTF node are not handled:
    *   `BTF_KIND_DATASEC` - skip
    *   `BTF_KIND_DECL_TAG` - abort
    *   `BTF_KIND_TYPE_TAG` - abort

The BTF reader has these distinct phases of operation:

1.  file is opened as ELF and `.BTF` section data found
2.  BTF header processed
3.  BTF nodes parsed into a graph with symbol information held separately
4.  symbols and root node added to the graph

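As a rough illustration of phase 2, the sketch below decodes the fixed-size
`struct btf_header` laid out in the kernel BTF documentation linked above. It
is not STG's reader: `parse_btf_header` and `BtfHeader` are hypothetical names,
and the sketch assumes little-endian data that has already been extracted from
the `.BTF` section.

```python
import struct
from collections import namedtuple

BTF_MAGIC = 0xEB9F
# u16 magic, u8 version, u8 flags, then five u32 fields (little-endian assumed).
HEADER_FORMAT = "<HBBIIIII"

BtfHeader = namedtuple(
    "BtfHeader",
    "magic version flags hdr_len type_off type_len str_off str_len",
)


def parse_btf_header(data):
    """Decode and sanity-check the header at the start of .BTF section data."""
    if len(data) < struct.calcsize(HEADER_FORMAT):
        raise ValueError("BTF data too short for header")
    header = BtfHeader._make(struct.unpack_from(HEADER_FORMAT, data))
    if header.magic != BTF_MAGIC:
        raise ValueError(f"bad BTF magic: {header.magic:#06x}")
    return header


# Synthetic header: 24-byte header, 12 bytes of types, then 5 bytes of strings.
sample = struct.pack(HEADER_FORMAT, BTF_MAGIC, 1, 0, 24, 0, 12, 12, 5)
header = parse_btf_header(sample)
print(header.hdr_len, header.type_len, header.str_off)  # 24 12 12
```

The type and string offsets are relative to the end of the header, so a reader
would add `hdr_len` to them before parsing the nodes in phase 3.
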
### DWARF

The ELF / DWARF reader operates similarly to the other readers at a high level,
but much more work has to be done to turn ELF symbols and DWARF DIEs into STG
nodes.

1.  the ELF file is checked for DWARF - missing DWARF results in a warning
2.  the ELF symbols are read (from `.dynsym` in the case of a shared object
    file)
3.  the DWARF information is parsed into a partial STG graph
4.  the ELF and DWARF information are stitched together, adding symbols and a
    root node to the graph
5.  useless type qualifiers are stripped in post-processing

## Output preprocessing

Before `stg` outputs a serialised graph, it performs:

1.  a type normalisation step that unifies overlapping type definitions
2.  a final deduplication step to eliminate other redundant nodes

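This document does not describe how deduplication is implemented. As a generic
illustration of the idea, the sketch below groups structurally equal nodes of a
possibly cyclic graph by repeatedly refining a partition until no class splits.
The `equivalence_classes` function and the tuple-based graph encoding are
assumptions for this example, not STG's algorithm.

```python
def equivalence_classes(graph):
    """Map each node id to a class number; equal nodes share a class.

    graph maps node id -> (local description, tuple of target node ids).
    """
    # Start by grouping nodes on their local description alone.
    classes = {node: local for node, (local, _) in graph.items()}
    while True:
        numbering = {}
        refined = {}
        for node, (_, targets) in graph.items():
            # A node's signature is its current class plus its targets' classes.
            signature = (classes[node], tuple(classes[t] for t in targets))
            refined[node] = numbering.setdefault(signature, len(numbering))
        if len(numbering) == len(set(classes.values())):
            return refined  # no class was split: the partition is stable
        classes = refined


# `struct node { struct node *next; }` stored twice, with the copies
# pointing at each other: six nodes, but only three distinct ones.
UNROLLED = {
    1: (("struct", "node"), (2,)),
    2: (("member", "next"), (3,)),
    3: (("pointer",), (4,)),
    4: (("struct", "node"), (5,)),
    5: (("member", "next"), (6,)),
    6: (("pointer",), (1,)),
}

classes = equivalence_classes(UNROLLED)
print(classes[1] == classes[4], len(set(classes.values())))  # True 3
```

Keeping one representative per class and redirecting edges to it would then
yield the deduplicated graph.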