# Custom Mutators in AFL++

This file describes how you can implement custom mutations to be used in AFL++.
Currently, C/C++ libraries and Python modules are supported; both are
collectively referred to as custom mutators.

There is also experimental support for Rust in `custom_mutators/rust`. For
documentation, refer to that directory. Run `cargo doc -p custom_mutator --open`
in that directory to view the documentation in your web browser.

Implemented by
- C/C++ library (`*.so`): Khaled Yakdan from Code Intelligence
- Python module: Christian Holler from Mozilla

## 1) Introduction

Custom mutators can be passed to `afl-fuzz` to perform custom mutations on test
cases beyond those available in AFL++. For example, they can enable
structure-aware fuzzing by using libraries that perform mutations according to
a given grammar.

The custom mutator is passed to `afl-fuzz` via the `AFL_CUSTOM_MUTATOR_LIBRARY`
or `AFL_PYTHON_MODULE` environment variable, and must export a fuzz function.
AFL++ also supports multiple custom mutators, which can be specified in the
same `AFL_CUSTOM_MUTATOR_LIBRARY` environment variable, separated by
semicolons:

```bash
export AFL_CUSTOM_MUTATOR_LIBRARY="/full/path/to/mutator_first.so;/full/path/to/mutator_second.so"
```

For details, see [APIs](#2-apis) and [Usage](#3-usage).

The custom mutation stage is set to be the first non-deterministic stage (right
before the havoc stage).

Note: If `AFL_CUSTOM_MUTATOR_ONLY` is set, all mutations will solely be
performed with the custom mutator.

## 2) APIs

**IMPORTANT NOTE**: If you use our C/C++ API and you want to increase the size
of an `**out_buf` buffer, you have to use `afl_realloc()` for this, so include
`include/alloc-inl.h`. Otherwise, afl-fuzz will crash when trying to free
your buffers.

C/C++:

```c
void *afl_custom_init(afl_state_t *afl, unsigned int seed);
unsigned int afl_custom_fuzz_count(void *data, const unsigned char *buf, size_t buf_size);
void afl_custom_splice_optout(void *data);
size_t afl_custom_fuzz(void *data, unsigned char *buf, size_t buf_size, unsigned char **out_buf, unsigned char *add_buf, size_t add_buf_size, size_t max_size);
const char *afl_custom_describe(void *data, size_t max_description_len);
size_t afl_custom_post_process(void *data, unsigned char *buf, size_t buf_size, unsigned char **out_buf);
int afl_custom_init_trim(void *data, unsigned char *buf, size_t buf_size);
size_t afl_custom_trim(void *data, unsigned char **out_buf);
int afl_custom_post_trim(void *data, unsigned char success);
size_t afl_custom_havoc_mutation(void *data, unsigned char *buf, size_t buf_size, unsigned char **out_buf, size_t max_size);
unsigned char afl_custom_havoc_mutation_probability(void *data);
unsigned char afl_custom_queue_get(void *data, const unsigned char *filename);
void afl_custom_fuzz_send(void *data, const unsigned char *buf, size_t buf_size);
unsigned char afl_custom_queue_new_entry(void *data, const unsigned char *filename_new_queue, const unsigned char *filename_orig_queue);
const char *afl_custom_introspection(void *data);
void afl_custom_deinit(void *data);
```

Python:

```python
def init(seed):
    pass

def fuzz_count(buf):
    return cnt

def splice_optout():
    pass

def fuzz(buf, add_buf, max_size):
    return mutated_out

def describe(max_description_length):
    return "description_of_current_mutation"

def post_process(buf):
    return out_buf

def init_trim(buf):
    return cnt

def trim():
    return out_buf

def post_trim(success):
    return next_index

def havoc_mutation(buf, max_size):
    return mutated_out

def havoc_mutation_probability():
    return probability # int in [0, 100]

def queue_get(filename):
    return True

def fuzz_send(buf):
    pass

def queue_new_entry(filename_new_queue, filename_orig_queue):
    return False

def introspection():
    return string

def deinit():  # optional for Python
    pass
```
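
As a rough illustration of how the Python hooks listed above fit together, the
following is a minimal sketch of a module that implements `init`, `fuzz`, and
`deinit`. The mutation strategy (one byte flip plus an occasional splice with
`add_buf`) and the module-level `_rng` are assumptions chosen for brevity, not
something AFL++ prescribes.

```python
import random

# Private RNG (hypothetical helper state) so that the mutator is reproducible
# for a given seed passed in by AFL++.
_rng = random.Random()


def init(seed):
    """Called once at startup; seed the RNG."""
    _rng.seed(seed)


def fuzz(buf, add_buf, max_size):
    """Return a mutated copy of buf, never longer than max_size."""
    out = bytearray(buf)
    if out:
        # Flip at least one bit of one randomly chosen byte.
        pos = _rng.randrange(len(out))
        out[pos] ^= _rng.randrange(1, 256)
    if add_buf and _rng.random() < 0.25:
        # Occasionally keep a prefix of the input and append the splice data.
        cut = _rng.randrange(len(out) + 1)
        out = out[:cut] + bytearray(add_buf)
    return out[:max_size]


def deinit():
    """Nothing to clean up in this sketch."""
    pass
```

If the splice data is never used, `splice_optout` could be defined in addition,
as described in the next section.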

### Custom Mutation

- `init` (optional in Python):

    This method is called when AFL++ starts up and is used to seed RNG and set
    up buffers and state.

- `queue_get` (optional):

    This method determines whether AFL++ should fuzz the current queue entry
    or not. The decision applies to all defined custom mutators as well as to
    all of AFL++'s own mutators.

- `fuzz_count` (optional):

    When a queue entry is selected to be fuzzed, afl-fuzz selects the number of
    fuzzing attempts with this input based on a few factors. If, however, the
    custom mutator wants to set how often it is called for a specific queue
    entry itself, use this function. This function is most useful if
    `AFL_CUSTOM_MUTATOR_ONLY` is **not** used.

- `splice_optout` (optional):

    If this function is present, no splicing target is passed to the `fuzz`
    function. This saves time if splicing data is not needed by the custom
    fuzzing function.
    This function is never called; it only needs to be present to take effect.

- `fuzz` (optional):

    This method performs your custom mutations on a given input.
    The `add_buf` is the contents of another queue item that can be used for
    splicing - or anything else - and can also be ignored. If you are not
    using this additional data, then define `splice_optout` (see above).
    Returning a length of 0 is valid and is interpreted as skipping this
    one mutation result.
    For non-Python: the returned output buffer is under **your** memory
    management!

- `describe` (optional):

    When this function is called, it shall describe the current test case,
    generated by the last mutation. This will be called, for example, to name
    the written test case file after a crash occurred. Using it can help to
    reproduce crashing mutations.

- `havoc_mutation` and `havoc_mutation_probability` (optional):

    `havoc_mutation` performs a single custom mutation on a given input. This
    mutation is stacked with other mutations in havoc. The other method,
    `havoc_mutation_probability`, returns the probability that `havoc_mutation`
    is called in havoc. By default, it is 6%.

- `post_process` (optional):

    For some cases, the format of the mutated data returned from the custom
    mutator is not suitable to directly execute the target with this input. For
    example, when using libprotobuf-mutator, the data returned is in a protobuf
    format which corresponds to a given grammar. In order to execute the target,
    the protobuf data must be converted to the plain-text format expected by the
    target. In such scenarios, the user can define the `post_process` function.
    This function then transforms the data into the format expected by the
    API before the target is executed.

    In Python, this can return any object that implements the buffer protocol
    and supports `PyBUF_SIMPLE`. These include `bytes`, `bytearray`, etc.

    You can decide in the `post_process` mutator to not send the mutated data
    to the target, e.g., if it is too short, too corrupted, etc. If so,
    return a NULL buffer and zero length (or a 0 length string in Python).

    NOTE: Do not make any random changes to the data in this function!

    PERFORMANCE for C/C++: If possible, make the changes in-place (so modify
    `buf` directly, and return it as `*out_buf = buf`).

- `fuzz_send` (optional):

    This method can be used if you want to send data to the target yourself,
    e.g., via IPC. This replaces some usage of utils/afl_proxy but requires
    that you start the target with afl-fuzz.
    Example: [custom_mutators/examples/custom_send.c](../custom_mutators/examples/custom_send.c)

- `queue_new_entry` (optional):

    This method is called after adding a new test case to the queue. If the
    contents of the file were changed, return True, False otherwise.

- `introspection` (optional):

    This method is called after a new queue entry, crash, or timeout is
    discovered, if AFL++ was compiled with INTROSPECTION. The custom mutator
    can then return a string (const char *) that reports the exact mutations
    used.

- `deinit` (optional in Python):

    The last method to be called, deinitializing the state.
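
To make the interplay of some of these hooks more concrete, here is a hedged
Python sketch of `havoc_mutation`, `havoc_mutation_probability`, `describe`,
and `post_process`. The `MAGIC` header, the `MIN_BODY` threshold, and the
`_last_op` bookkeeping variable are hypothetical and exist only for
illustration; a real target will need its own format logic.

```python
MAGIC = b"HDR1"  # hypothetical header the target is assumed to expect
MIN_BODY = 2     # hypothetical minimum payload size worth executing

_last_op = "none"  # remembers the last mutation so describe() can report it


def havoc_mutation(buf, max_size):
    """Single stacked mutation: append one byte if there is room."""
    global _last_op
    out = bytearray(buf)
    if len(out) + 1 <= max_size:
        out.append(0x41)
        _last_op = "append-A"
    else:
        _last_op = "noop"
    return out


def havoc_mutation_probability():
    """Chance (int in [0, 100]) that havoc_mutation is used in havoc."""
    return 10


def describe(max_description_length):
    """Short label for the last mutation, truncated to the allowed length."""
    return _last_op[:max_description_length]


def post_process(buf):
    """Deterministically wrap the data; an empty result skips this input."""
    if len(buf) < MIN_BODY:
        return b""
    return MAGIC + bytes(buf)
```

Note that `post_process` in this sketch is purely deterministic, in line with
the note above that it should not make random changes.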

Note that there are also three functions for trimming, as described in the next
section.

### Trimming Support

The generic trimming routines implemented in AFL++ can easily destroy the
structure of complex formats, possibly leading to a point where you have a lot
of test cases in the queue that your custom mutator cannot process anymore but
your target application still accepts. This is especially the case when your
target can process a part of the input (causing coverage) and then errors out on
the remaining input.

In such cases, it makes sense to implement a custom trimming routine. The API
consists of multiple methods because after each trimming step, we have to go
back into the C code to check if the coverage bitmap is still the same for the
trimmed input. Here's a quick API description:

- `init_trim` (optional):

    This method is called at the start of each trimming operation and receives
    the initial buffer. It should return the number of iteration steps possible
    on this input (e.g., if your input has n elements and you want to remove
    them one by one, return n; if you do a binary search, return log(n); and so
    on).

    If your trimming algorithm doesn't allow you to easily determine the number
    of (remaining) steps (esp. while running), then you can alternatively
    return 1 here and always return 0 in `post_trim` until you are finished and
    no steps remain. At that point, returning 1 in `post_trim` will end the
    trimming routine. The whole current index/max iterations stuff is only used
    to show progress.

- `trim` (optional):

    This method is called for each trimming operation. It doesn't have any
    arguments because there is already the initial buffer from `init_trim` and
    we can memorize the current state in the data variables. This can also save
    reparsing steps for each iteration. It should return the trimmed input
    buffer.

- `post_trim` (optional):

    This method is called after each trim operation to inform you if your
    trimming step was successful or not (in terms of coverage). If you receive a
    failure here, you should reset your input to the last known good state. In
    any case, this method must return the next trim iteration index (from 0 to
    the maximum number of steps you returned in `init_trim`).

Omitting any of the three trimming methods will cause custom trimming to be
disabled and trigger a fallback to the built-in default trimming routine.
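
The three-step protocol above can be sketched in Python as follows. This is
only an illustration under the assumption that the input is line-oriented and
that removing whole lines is a sensible trimming unit; the `_trim` state
dictionary is a hypothetical helper, not part of the AFL++ API.

```python
# Hypothetical module-level state shared between the three trimming hooks.
_trim = {"lines": [], "idx": 0, "steps": 0, "done": 0}


def init_trim(buf):
    """Split the input into lines; one trimming step per line."""
    _trim["lines"] = bytes(buf).splitlines(keepends=True)
    _trim["idx"] = 0
    _trim["done"] = 0
    _trim["steps"] = len(_trim["lines"])
    return _trim["steps"]


def trim():
    """Return a candidate with the line at the current index removed."""
    i = _trim["idx"]
    lines = _trim["lines"]
    return bytearray(b"".join(lines[:i] + lines[i + 1:]))


def post_trim(success):
    """Commit or discard the last candidate and report the next step index."""
    if success:
        # Coverage unchanged: drop the line for good. The next line slides
        # into the same index, so the index itself stays put.
        del _trim["lines"][_trim["idx"]]
    else:
        # Coverage changed: keep the line (last known good state), move on.
        _trim["idx"] += 1
    _trim["done"] += 1
    return _trim["done"]
```

In this sketch, `post_trim` returns a running count of attempts, so the routine
ends after exactly as many steps as `init_trim` announced, with each original
line tried once.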

### Environment Variables

Optionally, the following environment variables are supported:

- `AFL_CUSTOM_MUTATOR_ONLY`

    Disable all other mutation stages. This can prevent broken test cases (those
    that your custom mutator can't work with anymore) from filling up your
    queue. Best combined with a custom trimming routine (see above) because
    trimming can cause the same test breakage as havoc and splice.

- `AFL_PYTHON_ONLY`

    Deprecated and removed, use `AFL_CUSTOM_MUTATOR_ONLY` instead.

- `AFL_DEBUG`

    When combined with `AFL_NO_UI`, this causes the C trimming code to emit
    additional messages about the performance and actions of your custom
    trimmer. Use this to see if it works :)

## 3) Usage

### Prerequisite

For Python mutators, the Python 3 or Python 2 development package is required.
On Debian/Ubuntu/Kali, it can be installed like this:

```bash
sudo apt install python3-dev
# or
sudo apt install python-dev
```

Then, AFL++ can be compiled with Python support. The AFL++ Makefile detects
Python3 through `python-config`/`python3-config` if it is in the PATH and
compiles `afl-fuzz` with the feature if available.

Note: for some distributions, you might also need the package `python[3]-apt`.
In case your setup is different, set the necessary variables like this:
`PYTHON_INCLUDE=/path/to/python/include LDFLAGS=-L/path/to/python/lib make`.

### Helpers

For C/C++ custom mutators, you get a pointer to `afl_state_t *afl` in
`afl_custom_init()`, which contains all the information that you need.
Note that if you access it, you need to recompile your custom mutator whenever
you update AFL++ because the structure might have changed!

For mutators written in Python, Rust, Go, etc., there are a few environment
variables set to help you get started:

`AFL_CUSTOM_INFO_PROGRAM` - the program name of the target that is executed.
If your custom mutator is used with modes like QEMU (`-Q`), this will still
contain the target program, not afl-qemu-trace.

`AFL_CUSTOM_INFO_PROGRAM_INPUT` - if the `-f` parameter is used with afl-fuzz,
then its value is found in this environment variable.

`AFL_CUSTOM_INFO_PROGRAM_ARGV` - this contains the parameters given to the
target program and still has the `@@` identifier in there.

Note: If `AFL_CUSTOM_INFO_PROGRAM_INPUT` is empty and `AFL_CUSTOM_INFO_PROGRAM_ARGV`
is either empty or does not contain `@@`, then the target gets the input via
`stdin`.

`AFL_CUSTOM_INFO_OUT` - This is the output directory for this fuzzer instance,
so if `afl-fuzz` was called with `-o out -S foobar`, then this will be set to
`out/foobar`.
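
As a small sketch of how a Python mutator might use these variables, the helper
below classifies how the target receives its input, following the note above.
The function name `target_input_mode` and its return values are hypothetical;
only the environment variable names come from this document.

```python
import os


def target_input_mode(environ=os.environ):
    """Return ("file", path), ("argv", args) or ("stdin", None)."""
    input_file = environ.get("AFL_CUSTOM_INFO_PROGRAM_INPUT", "")
    argv = environ.get("AFL_CUSTOM_INFO_PROGRAM_ARGV", "")
    if input_file:
        # afl-fuzz was started with -f: the target reads this file.
        return ("file", input_file)
    if "@@" in argv:
        # The target gets the test case path in place of "@@".
        return ("argv", argv)
    # Neither -f nor "@@": the test case is delivered via stdin.
    return ("stdin", None)
```

Passing the environment as a parameter keeps the helper easy to exercise
outside of `afl-fuzz`, for example with a plain dictionary.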

### Custom Mutator Preparation

For C/C++ mutators, the source code must be compiled as a shared object:

```bash
gcc -shared -fPIC -Wall -O3 example.c -o example.so
```

Note that if you specify multiple custom mutators, the corresponding functions
will be called in the order in which they are specified. E.g., the
`post_process` function of `example_first.so` will be called first, and then
that of `example_second.so`.

### Run

C/C++:

```bash
export AFL_CUSTOM_MUTATOR_LIBRARY="/full/path/to/example_first.so;/full/path/to/example_second.so"
afl-fuzz -i /path/to/input_dir -o /path/to/output_dir -- /path/to/program
```

Python:

```bash
export PYTHONPATH="$(dirname /full/path/to/example.py)"
export AFL_PYTHON_MODULE=example
afl-fuzz -i /path/to/input_dir -o /path/to/output_dir -- /path/to/program
```

## 4) Example

See [example.c](../custom_mutators/examples/example.c) and
[example.py](../custom_mutators/examples/example.py).

## 5) Other Resources

- AFL libprotobuf mutator
    - [bruce30262/libprotobuf-mutator_fuzzing_learning](https://github.com/bruce30262/libprotobuf-mutator_fuzzing_learning/tree/master/4_libprotobuf_aflpp_custom_mutator)
    - [thebabush/afl-libprotobuf-mutator](https://github.com/thebabush/afl-libprotobuf-mutator)
- [XML Fuzzing@NullCon 2017](https://www.agarri.fr/docs/XML_Fuzzing-NullCon2017-PUBLIC.pdf)
    - [A bug detected by AFL + XML-aware mutators](https://bugs.chromium.org/p/chromium/issues/detail?id=930663)