# STG

STG stands for Symbol-Type Graph.

# Overview

STG models Application Binary Interfaces. It supports extraction of ABIs from
DWARF and ingestion of BTF and libabigail XML into its model. Its primary
purpose is monitoring an ABI for changes over time and reporting such changes in
a comprehensible fashion.

STG captures symbol information, the size and layout of structs, function
argument and return types and much more, in a graph representation. Difference
reporting happens via a graph comparison.

Currently, STG functionality is exposed as two command-line tools, `stg` (for
ABI extraction) and `stgdiff` (for ABI comparison), and a native file format.

## Model

STG's model is an *abstraction* which does not and cannot capture every possible
interface property, invariant or behaviour. Conversely, the model includes
distinctions which are API significant but not ABI significant.

Concretely, STG's model is a rooted, connected, directed graph where each kind
of node corresponds to a meaningful ABI entity such as a symbol, function type
or struct member.

Nodes have specific attributes, such as name or size. Outgoing edges specify
things like return type. STG's model does not impose any constraints on which
nodes may be joined by edges.

Each node has an identity. However, for the purpose of comparison, nodes are
considered equal if they are of the same kind, have the same attributes and
matching outgoing edges and all nodes reachable via a pair of matching edges are
(recursively) equal. Renumbering nodes, (de)duplicating nodes and
adding/removing unreachable nodes do not affect this relationship.

### Symbols

As modelled by STG, symbols correspond closely to ELF symbols as seen in
`.dynsym` for shared object files or in `.symtab` for object files. In the case
of the Linux kernel, the `.symtab` is enriched with metadata and the effective
"ksymtab" is actually a subset of the ELF symbols together with CRC and
namespace information.

STG links symbols to their source-level types where these are known. Symbols
defined purely in assembly language will not have type information.

The symbol table is contained in the root node of the graph, which is an
*Interface* node.

### Types

STG models the C, C++ and (to a limited extent) Rust type systems.

For example, C++ template value parameters are poorly modelled for the simple
reason that this would require modelling C++ *values* as well as types,
something that DWARF itself doesn't do to the full extent permitted by C++20.

As type definitions are in general mutually recursive, an STG ABI is in general
a cyclic graph.

The root node of the graph can also contain a list of interface types, which may
not necessarily be reachable from the interface symbols.

## Supported Input Formats, Parsers and Limitations

STG can read its own native format for processing or comparison. It can also
process libabigail XML and BTF (`.BTF` ELF sections), with some limitations due
to model, design and implementation differences including missing features.

### Kinds of Node

STG has the following kinds of node.

* **Special** - used for `void` and `...`
* **Pointer / Reference** - `*`, `&` and `&&`
* **Pointer to Member** - `foo::*`
* **Typedef** - `typedef` and `using ... = ...`
* **Qualified** - `const` and friends
* **Primitive** - concrete types such as `int` and friends
* **Array** - `foo[N]` - there is no distinction between zero and
  indeterminate length in the model
* **Base Class** - inheritance metadata
* **Method** - (only) virtual function
* **Member** - data member
* **Variant Member** - discriminated member
* **Struct / Union** - `struct foo` etc., Rust tuples too
* **Enumeration** - including the underlying value type - only values that are
  within the range of a signed 64-bit integer are correctly modelled
* **Variant** - for Rust enums holding data
* **Function** - multiple arguments, single return type
* **ELF Symbol** - name, version, ELF metadata, Linux kernel metadata
* **Interface** - top-level collection of symbols and types

An STG ABI consists of a rooted, connected graph of such nodes, and *nothing
else*. STG is blind to anything that cannot be represented by its model.

### Native Format

STG's native file format is a protocol buffer text format. It is suitable for
revision control, rather than human consumption. It is effectively described by
[`stg.proto`](../stg.proto).

In this textual serialisation of ABI graphs, external node identifiers and node
order are chosen to minimise file changes when a small subset of the graph
changes.

As an example, this is the definition of the **Typedef** node kind:

```proto
message Typedef {
  fixed32 id = 1;
  string name = 2;
  fixed32 referred_type_id = 3;
}
```

### Abigail (a.k.a. libabigail XML)

[libabigail](https://sourceware.org/libabigail/) is another project for ABI
monitoring. It uses a format that can be parsed as XML.

This command will transform Abigail into STG:

```shell
stg --abi library.xml --output library.stg
```

The main features modelled in Abigail but not STG are:

* source file, line and column information
* C++ access specifiers (public, protected, private)

The Abigail reader has these distinct phases of operation:

1. text parsed into an XML tree
2. XML cleaning - whitespace and unused attributes are stripped
3. XML tidying - issues like duplicate nodes are resolved, if possible
4. XML parsed into a graph with symbol information held separately
5. symbols and root node added to the graph
6. useless type qualifiers are stripped in post-processing

### BTF

[BTF](https://docs.kernel.org/bpf/btf.html) is typically used for the Linux
kernel where it is generated by `pahole -J` from ELF and DWARF information. It
can also be generated natively instead of DWARF using `gcc -gbtf` and by Clang,
but only for eBPF targets.

This command will transform BTF into STG:

```shell
stg --btf vmlinux --output vmlinux.stg
```

STG has primarily been tested against the `pahole` (libbtf) dialect of BTF and
support is not complete.

* split BTF is not supported at all
* any `.BTF.ext` section is just ignored
* some kinds of BTF node are not handled:
  * `BTF_KIND_DATASEC` - skip
  * `BTF_KIND_DECL_TAG` - abort
  * `BTF_KIND_TYPE_TAG` - abort

The BTF reader has these distinct phases of operation:

1. file is opened as ELF and `.BTF` section data found
2. BTF header processed
3. BTF nodes parsed into a graph with symbol information held separately
4. symbols and root node added to the graph

### DWARF

The ELF / DWARF reader operates similarly to the other readers at a high level,
but much more work has to be done to turn ELF symbols and DWARF DIEs into STG
nodes.

1. the ELF file is checked for DWARF - missing DWARF results in a warning
2. the ELF symbols are read (from `.dynsym` in the case of a shared object file)
3. the DWARF information is parsed into a partial STG graph
4. the ELF and DWARF information are stitched together, adding symbols and a
   root node to the graph
5. useless type qualifiers are stripped in post-processing

## Output preprocessing

Before `stg` outputs a serialised graph, it performs:

1. a type normalisation step that unifies overlapping type definitions
2. a final deduplication step to eliminate other redundant nodes
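
The deduplication step can be pictured in terms of the node equality described
under Model. The following is a minimal Python sketch and not STG's
implementation: the `Node` class, the edge labels and the `deduplicate` function
are hypothetical names chosen for illustration, and partition refinement is an
assumption about one way to compute recursive equality on a cyclic graph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical toy node: a kind, attributes and ordered, labelled edges."""
    kind: str
    attrs: tuple = ()   # e.g. (("name", "int"),)
    edges: tuple = ()   # e.g. (("pointee", 3),); targets are node ids


def deduplicate(graph, root):
    """Return (new_graph, new_root) with recursively equal nodes merged."""
    # Collect nodes reachable from the root; unreachable nodes are dropped.
    reachable, seen, stack = [], {root}, [root]
    while stack:
        node_id = stack.pop()
        reachable.append(node_id)
        for _, target in graph[node_id].edges:
            if target not in seen:
                seen.add(target)
                stack.append(target)

    def number(keys):
        # Give each distinct key a small integer class id.
        table = {}
        return {n: table.setdefault(keys[n], len(table)) for n in reachable}

    # Start from the coarsest guess: same kind, attributes and edge labels.
    cls = number({n: (graph[n].kind, graph[n].attrs,
                      tuple(label for label, _ in graph[n].edges))
                  for n in reachable})
    # Split classes until members of a class also agree on their targets' classes.
    while True:
        refined = number({n: (cls[n], tuple((label, cls[t])
                                            for label, t in graph[n].edges))
                          for n in reachable})
        stable = len(set(refined.values())) == len(set(cls.values()))
        cls = refined
        if stable:
            break

    # Keep one representative per class, with edges redirected to class ids.
    merged = {}
    for n in reachable:
        merged.setdefault(cls[n], Node(
            graph[n].kind, graph[n].attrs,
            tuple((label, cls[t]) for label, t in graph[n].edges)))
    return merged, cls[root]


# Toy input: two identical pointer-to-int chains and one unreachable node.
graph = {
    0: Node("Interface", (), (("symbol", 1), ("symbol", 2))),
    1: Node("Pointer", (), (("pointee", 3),)),
    2: Node("Pointer", (), (("pointee", 4),)),
    3: Node("Primitive", (("name", "int"),)),
    4: Node("Primitive", (("name", "int"),)),
    5: Node("Primitive", (("name", "char"),)),  # not reachable from node 0
}
merged, new_root = deduplicate(graph, 0)
print(len(merged))  # 3: one Interface, one Pointer, one Primitive
```

Refinement starts from a coarse grouping and only ever splits classes, so cycles
need no special handling: two self-referential structs with the same attributes
stay in one class because nothing ever distinguishes them.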