# Trace Processor (C++)

_The Trace Processor is a C++ library
([/src/trace_processor](/src/trace_processor)) that ingests traces encoded in a
wide variety of formats and exposes an SQL interface for querying trace events
contained in a consistent set of tables. It also has other features including
computation of summary metrics, annotating the trace with user-friendly
descriptions, and deriving new events from the contents of the trace._

![Trace processor block diagram](/docs/images/trace-processor.png)

## Quickstart

The [quickstart](/docs/quickstart/trace-analysis.md) provides a quick overview
of how to run SQL queries against traces using trace processor.

## Introduction

Events in a trace are optimized for fast, low-overhead recording. Therefore
traces need significant data processing to extract meaningful information from
them. This is compounded by the number of legacy formats which are still in
use and need to be supported in trace analysis tools.

The trace processor abstracts this complexity by parsing traces, extracting the
data inside, and exposing it in a set of database tables which can be queried
with SQL.

Features of the trace processor include:

* Execution of SQL queries on a custom, in-memory, columnar database backed by
  the SQLite query engine.
* A metrics subsystem which allows computation of summarized views of the trace
  (e.g. CPU or memory usage of a process, time taken for app startup etc.).
* Annotation of events in the trace with user-friendly descriptions, providing
  context and explanation of events to newer users.
* Creation of new events derived from the contents of the trace.

The formats supported by trace processor include:

* Perfetto native protobuf format
* Linux ftrace
* Android systrace
* Chrome JSON (including JSON embedding Android systrace text)
* Fuchsia binary format
* [Ninja](https://ninja-build.org/) logs (the build system)

The trace processor is embedded in a wide variety of trace analysis tools, including:

* [trace_processor](/docs/analysis/trace-processor.md), a standalone binary
  providing a shell interface (and the reference embedder).
* [Perfetto UI](https://ui.perfetto.dev), in the form of a WebAssembly module.
* [Android GPU Inspector](https://gpuinspector.dev/).
* [Android Studio](https://developer.android.com/studio/).

## Concepts

The trace processor has some foundational terminology and concepts which are
used in the rest of the documentation.

### Events

In the most general sense, a trace is simply a collection of timestamped
"events". Events can have associated metadata and context which allows them to
be interpreted and analyzed.

Events form the foundation of trace processor and are one of two types: slices
and counters.

#### Slices

![Examples of slices](/docs/images/slices.png)

A slice refers to an interval of time with some data describing what was
happening in that interval. Some examples of slices include:

* Scheduling slices for each CPU
* Atrace slices on Android
* Userspace slices from Chrome
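
Slices are exposed in the `slice` table. As a minimal sketch, the following
query lists a few of them:

```sql
-- A few slices with their timestamp, duration and name.
SELECT ts, dur, name
FROM slice
LIMIT 10;
```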

#### Counters

![Examples of counters](/docs/images/counters.png)

A counter is a continuous value which varies over time. Some examples of
counters include:

* CPU frequency for each CPU core
* RSS memory events - both from the kernel and polled from /proc/stats
* atrace counter events from Android
* Chrome counter events
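
Counters are exposed in the `counter` table. As a minimal sketch, the
following lists a few samples along with the name of the track they belong to:

```sql
-- A few counter samples, joined with their track to get the counter's name.
SELECT counter.ts, counter.value, counter_track.name
FROM counter
JOIN counter_track ON counter.track_id = counter_track.id
LIMIT 10;
```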

### Tracks

A track is a named partition of events of the same type and the same associated
context. For example:

* Scheduling slices have one track for each CPU
* Synchronous userspace slices have one track for each thread which emitted an
  event
* Async userspace slices have one track for each “cookie” linking a set of async
  events

The most intuitive way to think of a track is to imagine how they would be drawn
in a UI; if all the events are in a single row, they belong to the same track.
For example, all the scheduling events for CPU 5 are on the same track:

![CPU slices track](/docs/images/cpu-slice-track.png)

Tracks can be split into various types based on the type of event they contain
and the context they are associated with. Examples include:

* Global tracks are not associated with any context and contain slices
* Thread tracks are associated with a single thread and contain slices
* Counter tracks are not associated with any context and contain counters
* CPU counter tracks are associated with a single CPU and contain counters
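
Each of these track types is exposed as its own table. For example, the
following sketch counts the slices on each thread track (the `thread_track`
table is covered in more detail below):

```sql
-- Number of slices on each thread track.
SELECT thread_track.name, COUNT(*) AS slice_count
FROM slice
JOIN thread_track ON slice.track_id = thread_track.id
GROUP BY thread_track.id;
```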

### Thread and process identifiers

The handling of threads and processes needs special care when considered in the
context of tracing; identifiers for threads and processes (e.g. `pid`/`tgid` and
`tid` in Android/macOS/Linux) can be reused by the operating system over the
course of a trace. This means they cannot be relied upon as a unique identifier
when querying tables in trace processor.

To solve this problem, the trace processor uses `utid` (_unique_ tid) for
threads and `upid` (_unique_ pid) for processes. All references to threads and
processes (e.g. in CPU scheduling data, thread tracks) use `utid` and `upid`
instead of the system identifiers.
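
For example, if a `tid` was reused during the trace, looking it up in the
`thread` table (described below) yields one row per distinct thread; the `tid`
value here is hypothetical:

```sql
-- Each row is a distinct thread which used tid 123 during the trace.
SELECT utid, tid, name
FROM thread
WHERE tid = 123;
```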

## Writing Queries

### Context using tracks

A common question when querying tables in trace processor is: "how do I obtain
the process or thread for a slice?". Phrased more generally, the question is
"how do I get the context for an event?".

In trace processor, any context associated with all events on a track is found
in the associated `track` tables.

For example, to obtain the `utid` of any thread which emitted a `measure` slice:

```sql
SELECT utid
FROM slice
JOIN thread_track ON thread_track.id = slice.track_id
WHERE slice.name = 'measure'
```

Similarly, to obtain the `upid`s of any process which has a `mem.swap` counter
greater than 1000:

```sql
SELECT upid
FROM counter
JOIN process_counter_track ON process_counter_track.id = counter.track_id
WHERE process_counter_track.name = 'mem.swap' AND value > 1000
```

### Thread and process tables

While obtaining `utid`s and `upid`s is a step in the right direction, generally
users want the original `tid`, `pid`, and process/thread names.

The `thread` and `process` tables map `utid`s and `upid`s to threads and
processes respectively. For example, to look up the thread with `utid` 10:

```sql
SELECT tid, name
FROM thread
WHERE utid = 10
```

The `thread` and `process` tables can also be joined with the associated track
tables to jump directly from the slice or counter to the information about
processes and threads.

For example, to get a list of all the threads which emitted a `measure` slice:

```sql
SELECT thread.name AS thread_name
FROM slice
JOIN thread_track ON slice.track_id = thread_track.id
JOIN thread USING(utid)
WHERE slice.name = 'measure'
GROUP BY thread_name
```

## Helper functions

Helper functions are implemented in C++ and reduce the amount of boilerplate
which would otherwise need to be written in SQL.

### Extract args

`EXTRACT_ARG` is a helper function which retrieves a property of an
event (e.g. slice or counter) from the `args` table.

It takes an `arg_set_id` and `key` as input and returns the value looked
up in the `args` table.

For example, to retrieve the `prev_comm` field for `sched_switch` events in
the `ftrace_event` table:

```sql
SELECT EXTRACT_ARG(arg_set_id, 'prev_comm')
FROM ftrace_event
WHERE name = 'sched_switch'
```

Behind the scenes, the above query would desugar to the following:

```sql
SELECT
  (
    SELECT string_value
    FROM args
    WHERE key = 'prev_comm' AND args.arg_set_id = ftrace_event.arg_set_id
  )
FROM ftrace_event
WHERE name = 'sched_switch'
```

NOTE: while convenient, `EXTRACT_ARG` can be inefficient compared to a `JOIN`
when working with very large tables; a function call is required for every
row, which will be slower than the batch filters/sorts used by `JOIN`.
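
As a sketch, the `JOIN`-based equivalent of the example above could look like:

```sql
-- Join each ftrace event with its args row for the 'prev_comm' key.
SELECT args.string_value
FROM ftrace_event
JOIN args USING(arg_set_id)
WHERE ftrace_event.name = 'sched_switch' AND args.key = 'prev_comm'
```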

## Operator tables

SQL queries are usually sufficient to retrieve data from trace processor.
Sometimes though, certain constructs can be difficult to express in pure SQL.

In these situations, trace processor has special "operator tables" which solve
a particular problem in C++ but expose an SQL interface for queries to take
advantage of.

### Span join

Span join is a custom operator table which computes the intersection of
spans of time from two tables or views. A span in this context is a row in a
table/view which contains "ts" (timestamp) and "dur" (duration) columns.

A column (called the *partition*) can optionally be specified which divides the
rows from each table into partitions before computing the intersection.

![Span join block diagram](/docs/images/span-join.png)

```sql
-- Get all the scheduling slices
CREATE VIEW sp_sched AS
SELECT ts, dur, cpu, utid
FROM sched;

-- Get all the cpu frequency slices
CREATE VIEW sp_frequency AS
SELECT
  ts,
  lead(ts) OVER (PARTITION BY track_id ORDER BY ts) - ts as dur,
  cpu,
  value as freq
FROM counter
JOIN cpu_counter_track ON counter.track_id = cpu_counter_track.id
WHERE cpu_counter_track.name = 'cpufreq';

-- Create the span joined table which combines cpu frequency with
-- scheduling slices.
CREATE VIRTUAL TABLE sched_with_frequency
USING SPAN_JOIN(sp_sched PARTITIONED cpu, sp_frequency PARTITIONED cpu);

-- This span joined table can be queried as normal and has the columns from both
-- tables.
SELECT ts, dur, cpu, utid, freq
FROM sched_with_frequency;
```

NOTE: A partition can be specified on neither, either, or both tables. If
specified on both, the same column name has to be specified on each table.

WARNING: An important restriction on span joined tables is that spans from
the same table in the same partition *cannot* overlap. For performance
reasons, span join does not attempt to detect and error out in this situation;
instead, incorrect rows will silently be produced.

WARNING: Partitions must be integers. Importantly, string partitions are *not*
supported; note that strings *can* be converted to integers by
applying the `HASH` function to the string column.
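
For example, a hypothetical view with a string column could be made
partitionable with a sketch like this:

```sql
-- Convert the string 'name' column into an integer partition key using HASH.
CREATE VIEW sp_slice_by_name AS
SELECT ts, dur, HASH(name) AS name_hash
FROM slice;
```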

Left and outer span joins are also supported; both function analogously to
the left and outer joins from SQL.

```sql
-- Left table partitioned + right table unpartitioned.
CREATE VIRTUAL TABLE left_join
USING SPAN_LEFT_JOIN(table_a PARTITIONED a, table_b);

-- Both tables unpartitioned.
CREATE VIRTUAL TABLE outer_join
USING SPAN_OUTER_JOIN(table_x, table_y);
```

NOTE: there is a subtlety if the partitioned table is empty and is
either a) part of an outer join, or b) on the right side of a left join.
In this case, *no* slices will be emitted even if the other table is
non-empty. This approach was decided as being the most natural
after considering how span joins are used in practice.

### Ancestor slice

ancestor_slice is a custom operator table that takes a
[slice table's id column](/docs/analysis/sql-tables.autogen#slice) and computes
all slices on the same track that are direct parents above that id (i.e. given
a slice id, it will return as rows all slices that can be found by following
the parent_id column up to the top slice (depth = 0)).

The returned format is the same as the
[slice table](/docs/analysis/sql-tables.autogen#slice).

For example, the following finds the top level slice given a bunch of slices of
interest.

```sql
CREATE VIEW interesting_slices AS
SELECT id, ts, dur, track_id
FROM slice WHERE name LIKE '%interesting slice name%';

SELECT
  *
FROM
  interesting_slices LEFT JOIN
  ancestor_slice(interesting_slices.id) AS ancestor ON ancestor.depth = 0
```

### Ancestor slice by stack

ancestor_slice_by_stack is a custom operator table that takes a
[slice table's stack_id column](/docs/analysis/sql-tables.autogen#slice) and
finds all slice ids with that stack_id, then, for each id, it computes
all the ancestor slices similarly to
[ancestor_slice](/docs/analysis/trace-processor#ancestor-slice).

The returned format is the same as the
[slice table](/docs/analysis/sql-tables.autogen#slice).

For example, the following finds the top level slice of all slices with the
given name.

```sql
CREATE VIEW interesting_stack_ids AS
SELECT stack_id
FROM slice WHERE name LIKE '%interesting slice name%';

SELECT
  *
FROM
  interesting_stack_ids LEFT JOIN
  ancestor_slice_by_stack(interesting_stack_ids.stack_id) AS ancestor
  ON ancestor.depth = 0
```

### Descendant slice

descendant_slice is a custom operator table that takes a
[slice table's id column](/docs/analysis/sql-tables.autogen#slice) and
computes all slices on the same track that are nested under that id (i.e.
all slices that are on the same track at the same time frame with a depth
greater than the given slice's depth).

The returned format is the same as the
[slice table](/docs/analysis/sql-tables.autogen#slice).

For example, the following finds the number of slices under each slice of
interest.

```sql
CREATE VIEW interesting_slices AS
SELECT id, ts, dur, track_id
FROM slice WHERE name LIKE '%interesting slice name%';

SELECT
  *,
  (
    SELECT COUNT(*)
    FROM descendant_slice(interesting_slices.id)
  ) AS total_descendants
FROM interesting_slices
```

### Descendant slice by stack

descendant_slice_by_stack is a custom operator table that takes a
[slice table's stack_id column](/docs/analysis/sql-tables.autogen#slice) and
finds all slice ids with that stack_id, then, for each id, it computes
all the descendant slices similarly to
[descendant_slice](/docs/analysis/trace-processor#descendant-slice).

The returned format is the same as the
[slice table](/docs/analysis/sql-tables.autogen#slice).

For example, the following finds the next level descendant of all slices with
the given name.

```sql
CREATE VIEW interesting_stacks AS
SELECT stack_id, depth
FROM slice WHERE name LIKE '%interesting slice name%';

SELECT
  *
FROM
  interesting_stacks LEFT JOIN
  descendant_slice_by_stack(interesting_stacks.stack_id) AS descendant
  ON descendant.depth = interesting_stacks.depth + 1
```

### Connected/Following/Preceding flows

DIRECTLY_CONNECTED_FLOW, FOLLOWING_FLOW and PRECEDING_FLOW are custom operator
tables that take a
[slice table's id column](/docs/analysis/sql-tables.autogen#slice) and collect
all entries of the [flow table](/docs/analysis/sql-tables.autogen#flow) that
are directly or indirectly connected to the given starting slice.

`DIRECTLY_CONNECTED_FLOW(start_slice_id)` - contains all entries of the
[flow table](/docs/analysis/sql-tables.autogen#flow) that are present in any
chain of kind: `flow[0] -> flow[1] -> ... -> flow[n]`, where
`flow[i].slice_out = flow[i+1].slice_in` and `flow[0].slice_out = start_slice_id
OR start_slice_id = flow[n].slice_in`.

NOTE: Unlike the following/preceding flow functions, this function will not
include flows connected to ancestors or descendants while searching for flows
from a slice. It only includes the slices in the directly connected chain.

`FOLLOWING_FLOW(start_slice_id)` - contains all flows which can be reached from
a given slice via recursively following from a flow's outgoing slice to its
incoming one and from a reached slice to its child. The returned table contains
all entries of the [flow table](/docs/analysis/sql-tables.autogen#flow) that are
present in any chain of kind: `flow[0] -> flow[1] -> ... -> flow[n]`, where
`flow[i+1].slice_out IN DESCENDANT_SLICE(flow[i].slice_in) OR
flow[i+1].slice_out = flow[i].slice_in` and `flow[0].slice_out IN
DESCENDANT_SLICE(start_slice_id) OR flow[0].slice_out = start_slice_id`.

`PRECEDING_FLOW(start_slice_id)` - contains all flows which can be reached from
a given slice via recursively following from a flow's incoming slice to its
outgoing one and from a reached slice to its parent. The returned table contains
all entries of the [flow table](/docs/analysis/sql-tables.autogen#flow) that are
present in any chain of kind: `flow[n] -> flow[n-1] -> ... -> flow[0]`, where
`flow[i].slice_in IN ANCESTOR_SLICE(flow[i+1].slice_out) OR flow[i].slice_in =
flow[i+1].slice_out` and `flow[0].slice_in IN ANCESTOR_SLICE(start_slice_id) OR
flow[0].slice_in = start_slice_id`.

```sql
-- Number of following flows for each slice.
SELECT (SELECT COUNT(*) FROM FOLLOWING_FLOW(slice.id)) AS following FROM slice;
```
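
Similarly, `DIRECTLY_CONNECTED_FLOW` can be queried for a single slice; the
slice id here is hypothetical:

```sql
-- Flow entries in the chain directly connected to the slice with id 42.
SELECT slice_out, slice_in
FROM DIRECTLY_CONNECTED_FLOW(42);
```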

## Metrics

TIP: To see how to add a new metric to trace processor, see the checklist
[here](/docs/contributing/common-tasks.md#new-metric).

The metrics subsystem is a significant part of trace processor and thus is
documented on its own [page](/docs/analysis/metrics.md).

## Python API

The trace processor's C++ library is also exposed through Python. This
is documented on a [separate page](/docs/analysis/trace-processor-python.md).

## Testing

Trace processor is mainly tested in two ways:
1. Unit tests of low-level building blocks
2. "Diff" tests which parse traces and check the output of queries

### Unit tests

Unit testing trace processor is the same as in other parts of Perfetto and
other C++ projects. However, unlike the rest of Perfetto, unit testing is
relatively light in trace processor.

We have discovered over time that unit tests are generally too brittle
when dealing with code which parses traces, leading to painful, mechanical
changes being needed when refactoring.

Because of this, we choose to focus on diff tests for most areas (e.g.
parsing events, testing the schema of tables, testing metrics etc.) and only
use unit testing for the low-level building blocks on which the rest of
trace processor is built.

### Diff tests

Diff tests are essentially integration tests for trace processor and the
main way trace processor is tested.

Each diff test takes as input a) a trace file and b) a query file *or* a
metric name. It runs `trace_processor_shell` to parse the trace and then
executes the query/metric. The result is then compared to a 'golden' file and
any difference is highlighted.

All diff tests are organized under [test/trace_processor](/test/trace_processor)
in `tests{_category name}.py` files as methods of a class in each file
and are run by the script
[`tools/diff_test_trace_processor.py`](/tools/diff_test_trace_processor.py).
To add a new test, it's enough to add a new method starting with `test_` in a
suitable Python test file.

Methods can't take arguments and have to return a `DiffTestBlueprint`:
```python
class DiffTestBlueprint:
  trace: Union[Path, Json, Systrace, TextProto]
  query: Union[str, Path, Metric]
  out: Union[Path, Json, Csv, TextProto]
```

*Trace* and *Out*: For every type apart from `Path`, the contents of the
object will be treated as file contents, so they have to follow the same rules.

*Query*: For metric tests it is enough to provide the metric name. For query
tests, it can be a raw SQL statement, for example `"SELECT * FROM SLICE"`, or a
path to an `.sql` file.
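
For instance, a query-based diff test might look like the following sketch
(the trace file name and expected output are illustrative):

```python
# A hypothetical test method inside one of the tests*.py classes.
def test_measure_slice_count(self):
  return DiffTestBlueprint(
      trace=Path('example_trace.pftrace'),
      query="SELECT COUNT(*) AS cnt FROM slice WHERE name = 'measure';",
      out=Csv("""
      "cnt"
      4
      """))
```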

NOTE: `trace_processor_shell` and the associated proto descriptors need to be
built before running `tools/diff_test_trace_processor.py`. The easiest way to
do this is to run `tools/ninja -C <out directory>` both initially and on every
change to trace processor code or builtin metrics.

#### Choosing where to add diff tests

The `diff_tests/` folder contains four directories corresponding to different
areas of trace processor:

1. __stdlib__: Tests focusing on testing the Perfetto Standard Library, both
   the prelude and the regular modules. The subdirectories in this folder
   should generally correspond to directories in `perfetto_sql/stdlib`.
2. __parser__: Tests focusing on ensuring that different trace files are
   parsed correctly and the corresponding built-in tables are populated.
3. __metrics__: Tests focusing on testing metrics located in
   `trace_processor/metrics/sql`. This organisation is mostly historical
   and code (and corresponding tests) is expected to move to `stdlib` over
   time.
4. __syntax__: Tests focusing on testing the core syntax of PerfettoSQL
   (e.g. `CREATE PERFETTO TABLE` or `CREATE PERFETTO FUNCTION`).

__Scenario__: A new stdlib module `foo/bar.sql` is being added.

_Answer_: Add the test to the `stdlib/foo/bar_tests.py` file.

__Scenario__: A new event is being parsed, and the focus of the test is to
ensure the event is being parsed correctly.

_Answer_: Add the test in one of the `parser` subdirectories. Prefer adding a
test to an existing related directory (e.g. `sched`, `power`) if one exists.

__Scenario__: A new metric is being added and the focus of the test is to
ensure the metric is being correctly computed.

_Answer_: Add the test in one of the `metrics` subdirectories. Prefer adding a
test to an existing related directory if one exists. Also consider adding the
code in question to stdlib.

__Scenario__: A new dynamic table is being added and the focus of the test is
to ensure the dynamic table is being correctly computed.

_Answer_: Add the test to the `stdlib/dynamic_tables` folder.

__Scenario__: The internals of trace processor are being modified and the test
is to ensure the trace processor is correctly filtering/sorting important
built-in tables.

_Answer_: Add the test to the `parser/core_tables` folder.
554