# Fast LLVM-based instrumentation for afl-fuzz

For the general instruction manual, see [docs/README.md](../docs/README.md).

For the GCC-based instrumentation, see
[README.gcc_plugin.md](README.gcc_plugin.md).

## 1) Introduction

! llvm_mode works with llvm versions 3.8 up to 17 - but 13+ is recommended !

The code in this directory allows you to instrument programs for AFL++ using
true compiler-level instrumentation, instead of the more crude assembly-level
rewriting approach taken by afl-gcc and afl-clang. This has several interesting
properties:

- The compiler can make many optimizations that are hard to pull off when
  manually inserting assembly. As a result, some slow, CPU-bound programs will
  run up to around 2x faster.

  The gains are less pronounced for fast binaries, where the speed is limited
  chiefly by the cost of creating new processes. In such cases, the gain will
  probably stay within 10%.

- The instrumentation is CPU-independent. At least in principle, you should be
  able to rely on it to fuzz programs on non-x86 architectures (after building
  afl-fuzz with AFL_NO_X86=1).

- The instrumentation can cope a bit better with multi-threaded targets.

- Because the feature relies on the internals of LLVM, it is clang-specific and
  will *not* work with GCC (see ../gcc_plugin/ for an alternative once it is
  available).

Once this implementation is shown to be sufficiently robust and portable, it
will probably replace afl-clang. For now, it can be built separately and
co-exists with the original code.

The idea and much of the initial implementation came from Laszlo Szekeres.

## 2a) How to use this - short

Set the `LLVM_CONFIG` variable to the clang version you want to use, e.g.:

```
LLVM_CONFIG=llvm-config-9 make
```

If you have compiled your own llvm version, specify the full path:

```
LLVM_CONFIG=~/llvm-project/build/bin/llvm-config make
```

If you try to use a new llvm version on an old Linux, this can fail because of
old C++ libraries. In this case, switching to gcc/g++ to compile llvm_mode
will usually work:

```
LLVM_CONFIG=llvm-config-7 REAL_CC=gcc REAL_CXX=g++ make
```

It is highly recommended to use the newest clang version you can put your hands
on :)

Then look at [README.persistent_mode.md](README.persistent_mode.md).

## 2b) How to use this - long

In order to leverage this mechanism, you need to have clang installed on your
system. You should also make sure that the llvm-config tool is in your path (or
pointed to via LLVM_CONFIG in the environment).

Note that if you have several LLVM versions installed, pointing LLVM_CONFIG to
the version you want to use will switch compiling to this specific version - if
your installation is set up correctly :-)

Unfortunately, some systems that do have clang come without llvm-config or the
LLVM development headers; one example of this is FreeBSD. FreeBSD users will
also run into problems with clang being built statically and not being able to
load modules (you'll see "Service unavailable" when loading afl-llvm-pass.so).

To solve all your problems, you can grab pre-built binaries for your OS from:

[https://llvm.org/releases/download.html](https://llvm.org/releases/download.html)

...and then put the bin/ directory from the tarball at the beginning of your
$PATH when compiling the feature and building packages later on. You don't need
to be root for that.

To build the instrumentation itself, type `make`. This will generate binaries
called afl-clang-fast and afl-clang-fast++ in the parent directory. Once this is
done, you can instrument third-party code in a way similar to the standard
operating mode of AFL, e.g.:

```
  CC=/path/to/afl/afl-clang-fast ./configure [...options...]
  make
```

Be sure to also set CXX to afl-clang-fast++ for C++ code.

Note that afl-clang-fast/afl-clang-fast++ are just pointers to afl-cc. You can
also use afl-cc/afl-c++ and direct it to use LLVM instrumentation by either
setting `AFL_CC_COMPILER=LLVM` or passing the parameter `--afl-llvm` via
CFLAGS/CXXFLAGS/CPPFLAGS.

The tool honors roughly the same environmental variables as afl-gcc (see
[docs/env_variables.md](../docs/env_variables.md)). This includes
`AFL_USE_ASAN`, `AFL_HARDEN`, and `AFL_DONT_OPTIMIZE`. However, `AFL_INST_RATIO`
is not honored as it does not serve a good purpose with the more effective
PCGUARD analysis.

## 3) Options

Several options are present to make llvm_mode faster or help it rearrange the
code to make afl-fuzz path discovery easier.

If you only need to instrument specific parts of the code, you can create an
instrument file list that names the C/C++ files to actually instrument. See
[README.instrument_list.md](README.instrument_list.md).

For splitting memcmp, strncmp, etc., see
[README.laf-intel.md](README.laf-intel.md).

Then there are different ways of instrumenting the target:

1. A better instrumentation strategy uses LTO and link time instrumentation.
   Note that not all targets can compile in this mode. However, if it works, it
   is the best option you can use. To go with this option, use
   afl-clang-lto/afl-clang-lto++. See [README.lto.md](README.lto.md).

2. Alternatively you can choose a completely different coverage method:

2a. N-GRAM coverage - which combines the previously visited edges with the
    current one. This explodes the map but on the other hand has proven to be
    effective for fuzzing. See
    [7) AFL++ N-Gram Branch Coverage](#7-afl-n-gram-branch-coverage).

2b. Context sensitive coverage - which combines the visited edges with an
    individual caller ID (the function that called the current one). See
    [6) AFL++ Context Sensitive Branch Coverage](#6-afl-context-sensitive-branch-coverage).

Then - in addition to one of the instrumentation options above - there is a
very effective new instrumentation option called CmpLog as an alternative to
laf-intel that allows AFL++ to apply mutations similar to Redqueen. See
[README.cmplog.md](README.cmplog.md).

Finally, if your llvm version is 8 or lower, you can activate a mode that
prevents a counter overflow from resulting in a value of 0. This is good for
path discovery, but the llvm implementation for x86 for this functionality is
not optimal and was only fixed in llvm 9. You can set this with
AFL_LLVM_NOT_ZERO=1.

Support for thread safe counters has been added for all modes. Activate it with
`AFL_LLVM_THREADSAFE_INST=1`. The tradeoff is better precision in multi threaded
apps for a slightly higher instrumentation overhead. This also disables the
nozero counter default for performance reasons.

## 4) Deferred initialization, persistent mode, shared memory fuzzing

This is the most powerful and effective fuzzing you can do. For a full
explanation, see [README.persistent_mode.md](README.persistent_mode.md).

## 5) Bonus feature: 'dict2file' pass

Just specify `AFL_LLVM_DICT2FILE=/absolute/path/file.txt` and during compilation
all constant string compare parameters will be written to this file to be used
with afl-fuzz' `-x` option.

Adding `AFL_LLVM_DICT2FILE_NO_MAIN=1` will skip parsing `main()`, which often
performs command line parsing whose string comparisons are not helpful for
fuzzing.

## 6) AFL++ Context Sensitive Branch Coverage

### What is this?

This is an LLVM-based implementation of the context sensitive branch coverage.

Basically every function gets its own ID and, every time an edge is logged,
all the IDs in the callstack are hashed and combined with the edge transition
hash to augment the classic edge coverage with the information about the calling
context.

So if both function A and function B call a function C, the coverage collected
in C will be different.

In math the coverage is collected as follows:
`map[current_location_ID ^ previous_location_ID >> 1 ^ hash_callstack_IDs] += 1`

The callstack hash is produced by XOR-ing the function IDs to avoid explosion
with recursive functions.
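
The following Python model is a minimal sketch of the formula above, not the actual LLVM pass. The class and method names, the map size of 2^16, and the example IDs are all assumptions made for illustration. It shows that the same block in C lands in a different map slot depending on whether A or B is on the callstack, and that XOR-ing keeps the context bounded under recursion.

```python
MAP_SIZE = 1 << 16  # assumed map size; the real one is set by MAP_SIZE_POW2


class CtxCoverage:
    """Toy model of context sensitive edge coverage (illustrative only)."""

    def __init__(self):
        self.map = bytearray(MAP_SIZE)
        self.prev_loc = 0  # previous_location_ID
        self.ctx = 0       # hash_callstack_IDs: XOR of all active function IDs

    def enter(self, func_id):
        self.ctx ^= func_id  # function entry: XOR the function ID in

    def leave(self, func_id):
        self.ctx ^= func_id  # function exit: XOR-ing again removes it

    def log_edge(self, cur_loc):
        idx = (cur_loc ^ (self.prev_loc >> 1) ^ self.ctx) % MAP_SIZE
        self.map[idx] = (self.map[idx] + 1) & 0xFF  # byte-sized counter
        self.prev_loc = cur_loc
        return idx


# Hypothetical function and block IDs.
FUNC_A, FUNC_B, FUNC_C = 0x1111, 0x2222, 0x3333
BLOCK_IN_C = 0x0042


def slot_when_called_from(caller_id):
    """Return the map slot hit by BLOCK_IN_C when C is called by caller_id."""
    cov = CtxCoverage()
    cov.enter(caller_id)
    cov.enter(FUNC_C)
    return cov.log_edge(BLOCK_IN_C)


slot_via_a = slot_when_called_from(FUNC_A)
slot_via_b = slot_when_called_from(FUNC_B)
print(hex(slot_via_a), hex(slot_via_b))  # two different slots for the same block

# Recursion: entering C three times leaves the same context as entering it once.
cov = CtxCoverage()
for _ in range(3):
    cov.enter(FUNC_C)
ctx_depth_3 = cov.ctx
```

Because repeated IDs cancel in pairs, a recursive function only ever toggles the context between two values, no matter how deep the recursion goes.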

### Usage

Set the `AFL_LLVM_INSTRUMENT=CTX` or `AFL_LLVM_CTX=1` environment variable.

It is highly recommended to increase the MAP_SIZE_POW2 definition in config.h to
at least 18 and maybe up to 20 for this as otherwise too many map collisions
occur.
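
To get a feel for why a larger map helps, the following back-of-the-envelope Python sketch estimates how many coverage tuples end up hidden behind an already used slot when they are spread uniformly at random over the map. The uniform distribution and the tuple count of 50,000 are hypothetical and chosen only for illustration; real targets will differ.

```python
def expected_used_slots(num_tuples, map_size_pow2):
    """Expected number of distinct slots hit by num_tuples uniform random indices."""
    m = 1 << map_size_pow2
    return m * (1.0 - (1.0 - 1.0 / m) ** num_tuples)


NUM_TUPLES = 50_000  # hypothetical number of distinct (edge, context) tuples

lost = {}
for pow2 in (16, 18, 20):
    used = expected_used_slots(NUM_TUPLES, pow2)
    lost[pow2] = NUM_TUPLES - used  # tuples hidden behind an already used slot
    print(f"MAP_SIZE_POW2={pow2}: ~{lost[pow2] / NUM_TUPLES:.0%} of tuples collide")
```

Under these assumptions the share of colliding tuples shrinks with every step from 16 to 18 to 20.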

### Caller Branch Coverage

If the context sensitive coverage introduces too many collisions and becomes
detrimental, the user can choose to augment edge coverage with just the called
function ID, instead of the entire callstack hash.

In math the coverage is collected as follows:
`map[current_location_ID ^ previous_location_ID >> 1 ^ previous_callee_ID] += 1`

Set the `AFL_LLVM_INSTRUMENT=CALLER` or `AFL_LLVM_CALLER=1` environment
variable.

## 7) AFL++ N-Gram Branch Coverage

### Source

This is an LLVM-based implementation of the n-gram branch coverage proposed in
the paper
["Be Sensitive and Collaborative: Analyzing Impact of Coverage Metrics in Greybox Fuzzing"](https://www.usenix.org/system/files/raid2019-wang-jinghan.pdf)
by Jinghan Wang, et al.

Note that the original implementation (available
[here](https://github.com/bitsecurerlab/afl-sensitive)) is built on top of AFL's
QEMU mode. This is essentially a port that uses LLVM vectorized instructions
(available from llvm versions 4.0.1 and higher) to achieve the same results when
compiling source code.

In math the branch coverage is performed as follows:
`map[current_location ^ prev_location[0] >> 1 ^ prev_location[1] >> 1 ^ ... up to n-1] += 1`
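
As a rough illustration of the formula above, the Python sketch below keeps a short history of previous locations and folds it into the map index. It is a toy model, not the vectorized LLVM implementation; the function name, the map size, the choice of keeping `n - 1` previous locations, and the location IDs are assumptions.

```python
from collections import deque

MAP_SIZE = 1 << 16  # assumed map size; the real one is set by MAP_SIZE_POW2


def ngram_indices(trace, n):
    """Return the map index touched at each step of a trace of location IDs."""
    # Assumption: remember the last n-1 locations, newest first, starting at 0.
    history = deque([0] * (n - 1), maxlen=n - 1)
    indices = []
    for cur in trace:
        idx = cur
        for prev in history:
            idx ^= prev >> 1  # every previous location is shifted, as in the formula
        indices.append(idx % MAP_SIZE)
        history.appendleft(cur)  # the oldest entry drops off the right end
    return indices


# Two hypothetical traces that differ only in their first location.
TRACE_1 = [0x10, 0x20, 0x30]
TRACE_2 = [0x50, 0x20, 0x30]

for n in (2, 4):
    last_1 = ngram_indices(TRACE_1, n)[-1]
    last_2 = ngram_indices(TRACE_2, n)[-1]
    print(f"n={n}: {hex(last_1)} vs {hex(last_2)}")
```

With `n = 2` the model reduces to plain edge coverage, so the final edge `0x20 -> 0x30` hits the same slot in both traces. With `n = 4` the differing first location is still in the history, and the two traces are told apart.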

### Usage

The size of `n` (i.e., the number of branches to remember) is an option that is
specified either in the `AFL_LLVM_INSTRUMENT=NGRAM-{value}` or the
`AFL_LLVM_NGRAM_SIZE` environment variable. Good values are 2, 4, or 8, valid
are 2-16.

It is highly recommended to increase the MAP_SIZE_POW2 definition in config.h to
at least 18 and maybe up to 20 for this as otherwise too many map collisions
occur.

## 8) NeverZero counters

In larger, complex, or reiterative programs, the byte sized counters that
collect the edge coverage can easily fill up and wrap around. This is not that
much of an issue - unless, by chance, it wraps just to a value of zero when the
program execution ends. In this case, afl-fuzz is not able to see that the edge
has been accessed and will ignore it.

NeverZero prevents this behavior. If a counter wraps, it jumps over the value 0
directly to a 1. This improves path discovery (by a very small amount) at a very
low cost (one instruction per edge).
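
The difference between a plain wrapping counter and a NeverZero counter can be sketched in a few lines of Python. This is a model of the behavior described above, not the code the instrumentation emits; the function names are made up for this example.

```python
def bump_plain(counter):
    """Plain byte-sized counter: 255 + 1 wraps around to 0."""
    return (counter + 1) & 0xFF


def bump_neverzero(counter):
    """NeverZero counter: on wrap-around, skip 0 and continue at 1."""
    counter = (counter + 1) & 0xFF
    return counter + (1 if counter == 0 else 0)


plain = neverzero = 0
for _ in range(256):  # an edge that is hit exactly 256 times
    plain = bump_plain(plain)
    neverzero = bump_neverzero(neverzero)

print(plain, neverzero)  # the plain counter reads 0, the NeverZero counter 1
```

In this model, an edge executed exactly 256 times looks untouched with the plain counter, while the NeverZero counter still records that it was hit.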

(The alternative of saturated counters has been tested also and proved to be
inferior in terms of path discovery.)

This is implemented in afl-gcc and afl-gcc-fast. For llvm_mode, however, it is
optional if thread safe counters are selected or the llvm version is below 9,
as there are severe performance costs in these cases.

If you want to enable this for llvm versions below 9 or thread safe counters,
then set

```
export AFL_LLVM_NOT_ZERO=1
```

In case you are on llvm 9 or greater and you do not want this behavior, then you
can set:

```
AFL_LLVM_SKIP_NEVERZERO=1
```

If the target does not have extensive loops or functions that are called a lot,
then this can give a small performance boost.

Please note that the default counter implementations are not thread safe!

Support for thread safe counters in mode LLVM CLASSIC can be activated by
setting `AFL_LLVM_THREADSAFE_INST=1`.

## 9) Source code coverage through instrumentation

Measuring source code coverage is a common task in fuzzing, but it is very
difficult to do in some situations (e.g. when using snapshot fuzzing).

When using the `AFL_LLVM_INSTRUMENT=llvm-codecov` option, afl-cc will use
native trace-pc-guard instrumentation but additionally select options that
are required to utilize the instrumentation for source code coverage.

In particular, it will switch the instrumentation to be per basic block
instead of instrumenting edges, disable all guard pruning and enable the
experimental pc-table support that allows the runtime to gather 100% of
instrumented basic blocks at start, including their locations.

Note: You must compile AFL with the `CODE_COVERAGE=1` option to enable the
respective parts in the AFL compiler runtime. Support is currently only
implemented for Nyx, but can in theory also work without Nyx.

Note: You might have to adjust `MAP_SIZE_POW2` in include/config.h to ensure
that your coverage map is large enough to hold all basic blocks of your
target program without any collisions.

More documentation on how to utilize this with Nyx will follow.