# Trace Processor (C++)

_The Trace Processor is a C++ library
([/src/trace_processor](/src/trace_processor)) that ingests traces encoded in a
wide variety of formats and exposes an SQL interface for querying trace events
contained in a consistent set of tables. It also has other features including
computation of summary metrics, annotating the trace with user-friendly
descriptions and deriving new events from the contents of the trace._

## Quickstart

The [quickstart](/docs/quickstart/trace-analysis.md) provides a quick overview
on how to run SQL queries against traces using trace processor.

## Introduction

Events in a trace are optimized for fast, low-overhead recording. Therefore
traces need significant data processing to extract meaningful information from
them. This is compounded by the number of legacy formats which are still in
use and need to be supported in trace analysis tools.

The trace processor abstracts this complexity by parsing traces, extracting
the data inside, and exposing it in a set of database tables which can be
queried with SQL.

Features of the trace processor include:

* Execution of SQL queries on a custom, in-memory, columnar database backed by
  the SQLite query engine.
* A metrics subsystem which allows computation of summarized views of the
  trace (e.g. CPU or memory usage of a process, time taken for app startup
  etc.).
* Annotating events in the trace with user-friendly descriptions, providing
  context and explanation of events to newer users.
* Creation of new events derived from the contents of the trace.

The formats supported by trace processor include:

* Perfetto native protobuf format
* Linux ftrace
* Android systrace
* Chrome JSON (including JSON embedding Android systrace text)
* Fuchsia binary format
* [Ninja](https://ninja-build.org/) logs (the build system)

The trace processor is embedded in a wide variety of trace analysis tools,
including:

* [trace_processor](/docs/analysis/trace-processor.md), a standalone binary
  providing a shell interface (and the reference embedder).
* [Perfetto UI](https://ui.perfetto.dev), in the form of a WebAssembly module.
* [Android GPU Inspector](https://gpuinspector.dev/).
* [Android Studio](https://developer.android.com/studio/).

## Concepts

The trace processor has some foundational terminology and concepts which are
used in the rest of the documentation.

### Events

In the most general sense, a trace is simply a collection of timestamped
"events". Events can have associated metadata and context which allows them to
be interpreted and analyzed.

Events form the foundation of trace processor and are one of two types: slices
and counters.

#### Slices

A slice refers to an interval of time with some data describing what was
happening in that interval. Some examples of slices include:

* Scheduling slices for each CPU
* Atrace slices on Android
* Userspace slices from Chrome

#### Counters

A counter is a continuous value which varies over time. Some examples of
counters include:

* CPU frequency for each CPU core
* RSS memory events, both from the kernel and polled from /proc/stats
* Atrace counter events from Android
* Chrome counter events
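Slices live in the `slice` table and counters in the `counter` table. As a
minimal sketch of what querying them looks like (both queries use only
standard columns of those tables):

```sql
-- First ten slices in the trace, with their timestamps and durations.
SELECT ts, dur, name
FROM slice
ORDER BY ts
LIMIT 10;

-- First ten counter samples; each row is a timestamped value.
SELECT ts, value
FROM counter
ORDER BY ts
LIMIT 10;
```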
### Tracks

A track is a named partition of events of the same type which share the same
associated context. For example:

* Scheduling slices have one track for each CPU
* Synchronous userspace slices have one track for each thread which emitted an
  event
* Async userspace slices have one track for each “cookie” linking a set of
  async events

The most intuitive way to think of a track is to imagine how it would be drawn
in a UI; if all the events are in a single row, they belong to the same track.
For example, all the scheduling events for CPU 5 are on the same track.

Tracks can be split into various types based on the type of event they contain
and the context they are associated with. Examples include:

* Global tracks are not associated with any context and contain slices
* Thread tracks are associated with a single thread and contain slices
* Counter tracks are not associated with any context and contain counters
* CPU counter tracks are associated with a single CPU and contain counters

### Thread and process identifiers

The handling of threads and processes needs special care when considered in
the context of tracing; identifiers for threads and processes (e.g.
`pid`/`tgid` and `tid` in Android/macOS/Linux) can be reused by the operating
system over the course of a trace. This means they cannot be relied upon as
unique identifiers when querying tables in trace processor.

To solve this problem, the trace processor uses `utid` (_unique_ tid) for
threads and `upid` (_unique_ pid) for processes. All references to threads and
processes (e.g. in CPU scheduling data or thread tracks) use `utid` and `upid`
instead of the system identifiers.

## Writing Queries

### Context using tracks

A common question when querying tables in trace processor is: "how do I obtain
the process or thread for a slice?". Phrased more generally, the question is:
"how do I get the context for an event?".

In trace processor, any context associated with all events on a track is found
on the associated `track` tables.

For example, to obtain the `utid` of any thread which emitted a `measure`
slice:

```sql
SELECT utid
FROM slice
JOIN thread_track ON thread_track.id = slice.track_id
WHERE slice.name = 'measure'
```

Similarly, to obtain the `upid`s of any process which has a `mem.swap` counter
greater than 1000:

```sql
SELECT upid
FROM counter
JOIN process_counter_track ON process_counter_track.id = counter.track_id
WHERE process_counter_track.name = 'mem.swap' AND value > 1000
```

### Thread and process tables

While obtaining `utid`s and `upid`s is a step in the right direction, users
generally want the original `tid`, `pid`, and process/thread names.

The `thread` and `process` tables map `utid`s and `upid`s to threads and
processes respectively. For example, to look up the thread with `utid` 10:

```sql
SELECT tid, name
FROM thread
WHERE utid = 10
```

The `thread` and `process` tables can also be joined with the associated track
tables to jump directly from a slice or counter to information about the
process or thread.

For example, to get a list of all the threads which emitted a `measure` slice:

```sql
SELECT thread.name AS thread_name
FROM slice
JOIN thread_track ON slice.track_id = thread_track.id
JOIN thread USING(utid)
WHERE slice.name = 'measure'
GROUP BY thread_name
```
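The same pattern works for processes. As a sketch building on the `mem.swap`
example above, the following also pulls in the process name by joining the
`process` table via `upid`:

```sql
-- Name of each process whose mem.swap counter exceeded 1000.
SELECT process.name AS process_name, counter.ts, counter.value
FROM counter
JOIN process_counter_track ON process_counter_track.id = counter.track_id
JOIN process USING(upid)
WHERE process_counter_track.name = 'mem.swap' AND counter.value > 1000
```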
## Helper functions

Helper functions are functions implemented in C++ which reduce the amount of
boilerplate that needs to be written in SQL.

### Extract args

`EXTRACT_ARG` is a helper function which retrieves a property of an event
(e.g. slice or counter) from the `args` table.

It takes an `arg_set_id` and `key` as input and returns the value looked up in
the `args` table.

For example, to retrieve the `prev_comm` field for `sched_switch` events in
the `ftrace_event` table:

```sql
SELECT EXTRACT_ARG(arg_set_id, 'prev_comm')
FROM ftrace_event
WHERE name = 'sched_switch'
```

Behind the scenes, the above query desugars to the following:

```sql
SELECT
  (
    SELECT string_value
    FROM args
    WHERE key = 'prev_comm' AND args.arg_set_id = ftrace_event.arg_set_id
  )
FROM ftrace_event
WHERE name = 'sched_switch'
```

NOTE: while convenient, `EXTRACT_ARG` can be inefficient compared to a `JOIN`
when working with very large tables; a function call is required for every
row, which will be slower than the batch filters/sorts used by `JOIN`s.
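Where performance matters, the same lookup can be written as an explicit join
on the `args` table. A sketch equivalent to the query above (assuming, as
there, that `prev_comm` is stored as a string value):

```sql
SELECT args.string_value AS prev_comm
FROM ftrace_event
JOIN args USING(arg_set_id)
WHERE ftrace_event.name = 'sched_switch' AND args.key = 'prev_comm'
```

Note that, unlike `EXTRACT_ARG`, the inner join drops rows where the key is
missing instead of returning NULL for them.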
## Operator tables

SQL queries are usually sufficient to retrieve data from trace processor.
Sometimes though, certain constructs can be difficult to express in pure SQL.

In these situations, trace processor has special "operator tables" which solve
a particular problem in C++ but expose an SQL interface for queries to take
advantage of.

### Span join

Span join is a custom operator table which computes the intersection of spans
of time from two tables or views. A span in this context is a row in a
table/view which contains "ts" (timestamp) and "dur" (duration) columns.

A column (called the *partition*) can optionally be specified which divides
the rows from each table into partitions before computing the intersection.

```sql
-- Get all the scheduling slices.
CREATE VIEW sp_sched AS
SELECT ts, dur, cpu, utid
FROM sched;

-- Get all the CPU frequency slices.
CREATE VIEW sp_frequency AS
SELECT
  ts,
  lead(ts) OVER (PARTITION BY track_id ORDER BY ts) - ts AS dur,
  cpu,
  value AS freq
FROM counter
JOIN cpu_counter_track ON counter.track_id = cpu_counter_track.id
WHERE cpu_counter_track.name = 'cpufreq';

-- Create the span joined table which combines CPU frequency with
-- scheduling slices.
CREATE VIRTUAL TABLE sched_with_frequency
USING SPAN_JOIN(sp_sched PARTITIONED cpu, sp_frequency PARTITIONED cpu);

-- This span joined table can be queried as normal and has the columns from
-- both tables.
SELECT ts, dur, cpu, utid, freq
FROM sched_with_frequency;
```

NOTE: A partition can be specified on neither, either or both tables. If
specified on both, the same column name has to be specified on each table.

WARNING: An important restriction on span joined tables is that spans from
the same table in the same partition *cannot* overlap. For performance
reasons, span join does not attempt to detect and error out in this
situation; instead, incorrect rows will silently be produced.

WARNING: Partitions must be integers. Importantly, string partitions are
*not* supported; note that strings *can* be converted to integers by applying
the `HASH` function to the string column.
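For instance, to partition spans by thread name (a string), the name can be
hashed first. This is a minimal sketch (the view is hypothetical, and the
non-overlap restriction above still applies within each hashed partition):

```sql
-- Hypothetical: spans keyed by an integer hash of the thread name. Only
-- top-level slices (depth = 0) are kept to reduce the chance of overlapping
-- spans within a partition.
CREATE VIEW thread_name_spans AS
SELECT slice.ts, slice.dur, HASH(thread.name) AS name_hash
FROM slice
JOIN thread_track ON slice.track_id = thread_track.id
JOIN thread USING(utid)
WHERE slice.depth = 0;

-- name_hash can now be used as an integer partition column, e.g.
-- SPAN_JOIN(thread_name_spans PARTITIONED name_hash, ... PARTITIONED name_hash).
```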
Left and outer span joins are also supported; both function analogously to the
left and outer joins from SQL.

```sql
-- Left table partitioned + right table unpartitioned.
CREATE VIRTUAL TABLE left_join
USING SPAN_LEFT_JOIN(table_a PARTITIONED a, table_b);

-- Both tables unpartitioned.
CREATE VIRTUAL TABLE outer_join
USING SPAN_OUTER_JOIN(table_x, table_y);
```

NOTE: there is a subtlety if the partitioned table is empty and is either
a) part of an outer join, or b) on the right side of a left join. In this
case, *no* slices will be emitted even if the other table is non-empty. This
approach was decided as being the most natural after considering how span
joins are used in practice.

### Ancestor slice

`ancestor_slice` is a custom operator table that takes a
[slice table's id column](/docs/analysis/sql-tables.autogen#slice) and computes
all slices on the same track that are direct parents above that id (i.e. given
a slice id, it returns as rows all slices that can be found by following the
parent_id column up to the top slice (depth = 0)).

The returned format is the same as the
[slice table](/docs/analysis/sql-tables.autogen#slice).

For example, the following finds the top level slice given a bunch of slices
of interest:

```sql
CREATE VIEW interesting_slices AS
SELECT id, ts, dur, track_id
FROM slice WHERE name LIKE "%interesting slice name%";

SELECT
  *
FROM
  interesting_slices LEFT JOIN
  ancestor_slice(interesting_slices.id) AS ancestor ON ancestor.depth = 0
```

### Ancestor slice by stack

`ancestor_slice_by_stack` is a custom operator table that takes a
[slice table's stack_id column](/docs/analysis/sql-tables.autogen#slice) and
finds all slice ids with that stack_id; then, for each id, it computes all the
ancestor slices similarly to
[ancestor_slice](/docs/analysis/trace-processor#ancestor-slice).

The returned format is the same as the
[slice table](/docs/analysis/sql-tables.autogen#slice).

For example, the following finds the top level slice of all slices with the
given name:

```sql
CREATE VIEW interesting_stack_ids AS
SELECT stack_id
FROM slice WHERE name LIKE "%interesting slice name%";

SELECT
  *
FROM
  interesting_stack_ids LEFT JOIN
  ancestor_slice_by_stack(interesting_stack_ids.stack_id) AS ancestor
  ON ancestor.depth = 0
```

### Descendant slice

`descendant_slice` is a custom operator table that takes a
[slice table's id column](/docs/analysis/sql-tables.autogen#slice) and computes
all slices on the same track that are nested under that id (i.e. all slices
that are on the same track at the same time frame with a depth greater than
the given slice's depth).

The returned format is the same as the
[slice table](/docs/analysis/sql-tables.autogen#slice).

For example, the following finds the number of slices under each slice of
interest:

```sql
CREATE VIEW interesting_slices AS
SELECT id, ts, dur, track_id
FROM slice WHERE name LIKE "%interesting slice name%";

SELECT
  *,
  (
    SELECT COUNT(*)
    FROM descendant_slice(interesting_slices.id)
  ) AS total_descendants
FROM interesting_slices
```

### Descendant slice by stack

`descendant_slice_by_stack` is a custom operator table that takes a
[slice table's stack_id column](/docs/analysis/sql-tables.autogen#slice) and
finds all slice ids with that stack_id; then, for each id, it computes all the
descendant slices similarly to
[descendant_slice](/docs/analysis/trace-processor#descendant-slice).

The returned format is the same as the
[slice table](/docs/analysis/sql-tables.autogen#slice).

For example, the following finds the next level descendant of all slices with
the given name:

```sql
CREATE VIEW interesting_stacks AS
SELECT stack_id, depth
FROM slice WHERE name LIKE "%interesting slice name%";

SELECT
  *
FROM
  interesting_stacks LEFT JOIN
  descendant_slice_by_stack(interesting_stacks.stack_id) AS descendant
  ON descendant.depth = interesting_stacks.depth + 1
```

### Connected/Following/Preceding flows

`DIRECTLY_CONNECTED_FLOW`, `FOLLOWING_FLOW` and `PRECEDING_FLOW` are custom
operator tables that take a
[slice table's id column](/docs/analysis/sql-tables.autogen#slice) and collect
all entries of the [flow table](/docs/analysis/sql-tables.autogen#flow) that
are directly or indirectly connected to the given starting slice.

`DIRECTLY_CONNECTED_FLOW(start_slice_id)` - contains all entries of the
[flow table](/docs/analysis/sql-tables.autogen#flow) that are present in any
chain of kind: `flow[0] -> flow[1] -> ... -> flow[n]`, where
`flow[i].slice_out = flow[i+1].slice_in` and `flow[0].slice_out =
start_slice_id OR start_slice_id = flow[n].slice_in`.

NOTE: Unlike the following/preceding flow functions, this function will not
include flows connected to ancestors or descendants while searching for flows
from a slice. It only includes the slices in the directly connected chain.

`FOLLOWING_FLOW(start_slice_id)` - contains all flows which can be reached
from a given slice via recursively following from a flow's outgoing slice to
its incoming one and from a reached slice to its child. The returned table
contains all entries of the
[flow table](/docs/analysis/sql-tables.autogen#flow) that are present in any
chain of kind: `flow[0] -> flow[1] -> ... -> flow[n]`, where
`flow[i+1].slice_out IN DESCENDANT_SLICE(flow[i].slice_in) OR
flow[i+1].slice_out = flow[i].slice_in` and `flow[0].slice_out IN
DESCENDANT_SLICE(start_slice_id) OR flow[0].slice_out = start_slice_id`.

`PRECEDING_FLOW(start_slice_id)` - contains all flows which can be reached
from a given slice via recursively following from a flow's incoming slice to
its outgoing one and from a reached slice to its parent. The returned table
contains all entries of the
[flow table](/docs/analysis/sql-tables.autogen#flow) that are present in any
chain of kind: `flow[n] -> flow[n-1] -> ... -> flow[0]`, where
`flow[i].slice_in IN ANCESTOR_SLICE(flow[i+1].slice_out) OR flow[i].slice_in =
flow[i+1].slice_out` and `flow[0].slice_in IN ANCESTOR_SLICE(start_slice_id)
OR flow[0].slice_in = start_slice_id`.

```sql
-- Number of following flows for each slice.
SELECT (SELECT COUNT(*) FROM FOLLOWING_FLOW(slice.id)) AS following FROM slice;
```
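Like any operator table, these can be joined back against the `slice` table.
For instance, a sketch that lists the names of slices connected by flows to
one arbitrary slice (the id 42 is just an example):

```sql
-- Names of slices on either end of the flows directly connected to slice 42.
SELECT DISTINCT slice.name
FROM DIRECTLY_CONNECTED_FLOW(42) AS f
JOIN slice ON slice.id = f.slice_in OR slice.id = f.slice_out;
```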
## Metrics

TIP: To see how to add a new metric to trace processor, see the checklist
[here](/docs/contributing/common-tasks.md#new-metric).

The metrics subsystem is a significant part of trace processor and thus is
documented on its own [page](/docs/analysis/metrics.md).

## Python API

The trace processor's C++ library is also exposed through Python. This is
documented on a [separate page](/docs/analysis/trace-processor-python.md).

## Testing

Trace processor is mainly tested in two ways:

1. Unit tests of low-level building blocks
2. "Diff" tests which parse traces and check the output of queries

### Unit tests

Unit testing trace processor is the same as in other parts of Perfetto and
other C++ projects. However, unlike the rest of Perfetto, unit testing is
relatively light in trace processor.

We have discovered over time that unit tests are generally too brittle when
dealing with code which parses traces, leading to painful, mechanical changes
being needed when refactorings happen.

Because of this, we choose to focus on diff tests for most areas (e.g. parsing
events, testing the schema of tables, testing metrics etc.) and only use unit
testing for the low-level building blocks on which the rest of trace processor
is built.

### Diff tests

Diff tests are essentially integration tests for trace processor and the main
way trace processor is tested.

Each diff test takes as input a) a trace file and b) a query file *or* a
metric name. It runs `trace_processor_shell` to parse the trace and then
executes the query/metric. The result is then compared to a 'golden' file and
any difference is highlighted.

All diff tests are organized under
[test/trace_processor](/test/trace_processor) in `tests{_category name}.py`
files as methods of a class in each file and are run by the script
[`tools/diff_test_trace_processor.py`](/tools/diff_test_trace_processor.py).
To add a new test, it is enough to add a new method starting with `test_` to a
suitable Python test file.

Methods can't take arguments and have to return a `DiffTestBlueprint`:

```python
class DiffTestBlueprint:
  trace: Union[Path, Json, Systrace, TextProto]
  query: Union[str, Path, Metric]
  out: Union[Path, Json, Csv, TextProto]
```

*Trace* and *Out*: For every type apart from `Path`, the contents of the
object will be treated as file contents, so they have to follow the same
rules.

*Query*: For metric tests it is enough to provide the metric name. For query
tests, either a raw SQL statement (for example `"SELECT * FROM SLICE"`) or a
path to an `.sql` file can be provided.
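For illustration, a minimal sketch of such a test method is shown below. The
suite name, trace file and golden output are hypothetical; the helper classes
are the ones from the diff test harness described above.

```python
# A minimal sketch of a diff test; the suite name, trace file and golden
# output below are hypothetical. The import path is the one used by existing
# diff tests at the time of writing.
from python.generators.diff_tests.testing import Csv, DiffTestBlueprint, Path, TestSuite


class ExampleSliceTests(TestSuite):

  def test_slice_names(self):
    # Parse a (hypothetical) trace, run the query and diff the result
    # against the inline CSV golden output.
    return DiffTestBlueprint(
        trace=Path('example_trace.textproto'),
        query="SELECT name FROM slice ORDER BY name LIMIT 2;",
        out=Csv("""
        "name"
        "first_slice"
        "second_slice"
        """))
```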
NOTE: `trace_processor_shell` and the associated proto descriptors need to be
built before running `tools/diff_test_trace_processor.py`. The easiest way to
do this is to run `tools/ninja -C <out directory>` both initially and on every
change to trace processor code or builtin metrics.

#### Choosing where to add diff tests

The `diff_tests/` folder contains four directories corresponding to different
areas of trace processor:

1. __stdlib__: Tests focusing on testing the Perfetto Standard Library, both
   the prelude and the regular modules. The subdirectories in this folder
   should generally correspond to directories in `perfetto_sql/stdlib`.
2. __parser__: Tests focusing on ensuring that different trace files are
   parsed correctly and the corresponding built-in tables are populated.
3. __metrics__: Tests focusing on testing metrics located in
   `trace_processor/metrics/sql`. This organisation is mostly historical and
   the code (and corresponding tests) is expected to move to `stdlib` over
   time.
4. __syntax__: Tests focusing on testing the core syntax of PerfettoSQL
   (e.g. `CREATE PERFETTO TABLE` or `CREATE PERFETTO FUNCTION`).

__Scenario__: A new stdlib module `foo/bar.sql` is being added.

_Answer_: Add the test to the `stdlib/foo/bar_tests.py` file.

__Scenario__: A new event is being parsed, and the focus of the test is to
ensure the event is being parsed correctly.

_Answer_: Add the test in one of the `parser` subdirectories. Prefer adding a
test to an existing related directory (e.g. `sched`, `power`) if one exists.

__Scenario__: A new metric is being added and the focus of the test is to
ensure the metric is being correctly computed.

_Answer_: Add the test in one of the `metrics` subdirectories. Prefer adding a
test to an existing related directory if one exists. Also consider adding the
code in question to stdlib.

__Scenario__: A new dynamic table is being added and the focus of the test is
to ensure the dynamic table is being correctly computed.

_Answer_: Add the test to the `stdlib/dynamic_tables` folder.

__Scenario__: The internals of trace processor are being modified and the test
is to ensure the trace processor is correctly filtering/sorting important
built-in tables.

_Answer_: Add the test to the `parser/core_tables` folder.