# Tracing API and ABI: surfaces and stability

This document describes the API and ABI surface of the
[Perfetto Client Library][cli_lib], what can be expected to be stable long-term
and what cannot.

#### In summary

* The public C++ API in `include/perfetto/tracing/` is mostly stable but can
  occasionally break at compile-time throughout 2020.
* The C++ API within `include/perfetto/ext/` is internal-only and exposed only
  for Chromium.
* A new C API/ABI for a tracing shared library is in the works in
  `include/perfetto/public`. It is not stable yet.
* The tracing protocol ABI is based on protobuf-over-UNIX-socket and shared
  memory. It is long-term stable and maintains compatibility in both directions
  (old service + newer client and vice-versa).
* The [DataSourceDescriptor][data_source_descriptor.proto],
  [DataSourceConfig][data_source_config.proto] and
  [TracePacket][trace-packet-ref] protos are updated maintaining backwards
  compatibility unless a message is marked as experimental. Trace Processor
  deals with importing older trace formats.
* There is no version number in either the trace file or the tracing
  protocol, and there will never be one. Feature flags are used when necessary.

## C++ API

The Client Library C++ API allows an app to contribute to the trace with custom
trace events. Its headers live under [`include/perfetto/`](/include/perfetto).

There are three different tiers of this API, offering increasingly higher
expressive power, at the cost of increased complexity. The three tiers are built
on top of each other. (Googlers, for more details see also
[go/perfetto-client-api](http://go/perfetto-client-api)).

### Track Event (public)

This mainly consists of the `TRACE_EVENT*` macros defined in
[`track_event.h`](/include/perfetto/tracing/track_event.h).
Those macros provide apps with a quick and easy way to add common types of
instrumentation points (slices, counters, instant events).
For details and instructions see the [Client Library doc][cli_lib].

### Custom Data Sources (public)

This consists of the `perfetto::DataSource` base class and the
`perfetto::Tracing` controller class defined in
[`tracing.h`](/include/perfetto/tracing.h).
These classes allow an app to create custom data sources which can get
notifications about the tracing session lifecycle and emit custom protos in the
trace (e.g. memory snapshots, compositor layers, etc).

For details and instructions see the [Client Library doc][cli_lib].

Both the Track Event API and the custom data source API are meant to be public
APIs.

WARNING: The team is still iterating on this API surface. While we try to avoid
         deliberate breakages, some occasional compile-time breakages might be
         encountered when updating the library. The interface is expected to
         stabilize by the end of 2020.

### Producer / Consumer API (internal)

This consists of all the interfaces defined in the
[`include/perfetto/ext`](/include/perfetto/ext) directory. These provide access
to the lowest levels of the Perfetto internals (manually registering producers
and data sources, handling all IPCs).

These interfaces will always be highly unstable. We highly discourage
any project from depending on this API because it is too complex and extremely
hard to get right.
This API surface exists only for the Chromium project, which has unique
challenges (e.g., its own IPC system, complex sandboxing models) and has dozens
of subtle use cases accumulated over more than ten years of chrome://tracing
legacy. The team is continuously reshaping this surface to gradually
migrate all Chrome Tracing use cases over to Perfetto.

## Tracing Protocol ABI

The Tracing Protocol ABI consists of the following binary interfaces that allow
various processes in the operating system to contribute to tracing sessions and
inject tracing data into the tracing service:

 * [Socket protocol](#socket-protocol)
 * [Shared memory layout](#shmem-abi)
 * [Protobuf messages](#protos)

The whole tracing protocol ABI is binary stable across platforms and is updated
maintaining both backwards and forward compatibility. No breaking changes
have been introduced since its first revision in Android 9 (Pie, 2018).
See also the [ABI Stability](#abi-stability) section below.

### {#socket-protocol} Socket protocol

At the lowest level, the tracing protocol is initiated with a UNIX socket of
type `SOCK_STREAM` to the tracing service.
The tracing service listens on two distinct sockets: producer and consumer.

Both sockets use the same wire protocol, the `IPCFrame` message defined in
[wire_protocol.proto](/protos/perfetto/ipc/wire_protocol.proto). The wire
protocol is simply based on a sequence of length-prefixed messages of the form:
```
< 4 bytes len little-endian > < proto-encoded IPCFrame >

04 00 00 00 A0 A1 A2 A3   05 00 00 00 B0 B1 B2 B3 B4  ...
{ len: 4  } [ Frame 1   ] { len: 5  } [   Frame 2   ]
```

The `IPCFrame` proto message defines a request/response protocol that is
compatible with the [protobuf services syntax][proto_rpc]. `IPCFrame` defines
the following frame types:

1. `BindService {producer, consumer} -> service`<br>
   Binds to one of the two service ports (either `producer_port` or
   `consumer_port`).

2. `BindServiceReply service -> {producer, consumer}`<br>
   Replies to the bind request, listing all the RPC methods available, together
   with their method ID.

3. `InvokeMethod {producer, consumer} -> service`<br>
   Invokes an RPC method, identified by the ID returned by `BindServiceReply`.
   The invocation takes a single proto sub-message as its argument. Each method
   defines a pair of _request_ and _response_ message types.<br>
   For instance the `RegisterDataSource` defined in [producer_port.proto] takes
   a `perfetto.protos.RegisterDataSourceRequest` and returns a
   `perfetto.protos.RegisterDataSourceResponse`.

4. `InvokeMethodReply service -> {producer, consumer}`<br>
   Returns the result of the corresponding invocation or an error flag.
   If a method return signature is marked as `stream` (e.g.
   `returns (stream GetAsyncCommandResponse)`), the method invocation can be
   followed by more than one `InvokeMethodReply`, all with the same
   `request_id`. All replies in the stream except for the last one will have
   `has_more: true`, to notify the client that more responses for the same
   invocation will follow.

Here is what the traffic over the IPC socket looks like:

```
# [Prd > Svc] Bind request for the remote service named "producer_port"
request_id: 1
msg_bind_service { service_name: "producer_port" }

# [Svc > Prd] Service reply.
request_id: 1
msg_bind_service_reply: {
  success:    true
  service_id: 42
  methods:    {id: 2; name: "InitializeConnection" }
  methods:    {id: 5; name: "RegisterDataSource" }
  methods:    {id: 3; name: "UnregisterDataSource" }
  ...
}

# [Prd > Svc] Method invocation (RegisterDataSource)
request_id: 2
msg_invoke_method: {
  service_id: 42  # "producer_port"
  method_id:  5   # "RegisterDataSource"

  # Proto-encoded bytes for the RegisterDataSourceRequest message.
  args_proto: [XX XX XX XX]
}

# [Svc > Prd] Result of RegisterDataSource method invocation.
request_id: 2
msg_invoke_method_reply: {
  success:  true
  has_more: false  # EOF for this request

  # Proto-encoded bytes for the RegisterDataSourceResponse message.
  reply_proto: [XX XX XX XX]
}
```

#### Producer socket

The producer socket exposes the RPC interface defined in [producer_port.proto].
It allows processes to advertise data sources and their capabilities, receive
notifications about the tracing session lifecycle (trace being started, stopped)
and signal trace data commits and flush requests.

This socket is also used by the producer and the service to exchange a
tmpfs file descriptor during initialization for setting up the
[shared memory buffer](/docs/concepts/buffers.md) where tracing data will be
written (asynchronously).

On Android this socket is linked at `/dev/socket/traced_producer`. On all
platforms it is overridable via the `PERFETTO_PRODUCER_SOCK_NAME` env var.

On Android all apps and most system processes can connect to it
(see [`perfetto_producer` in SELinux policies][selinux_producer]).

In the Perfetto codebase, the [`traced_probes`](/src/traced/probes/) and
[`heapprofd`](/src/profiling/memory) processes use the producer socket for
injecting system-wide tracing / profiling data.

#### Consumer socket

The consumer socket exposes the RPC interface defined in [consumer_port.proto].
The consumer socket allows processes to control tracing sessions (start / stop
tracing) and read back trace data.

On Android this socket is linked at `/dev/socket/traced_consumer`. On all
platforms it is overridable via the `PERFETTO_CONSUMER_SOCK_NAME` env var.

Trace data contains sensitive information that discloses the activity of the
system (e.g., which processes / threads are running) and can allow side-channel
attacks. For this reason the consumer socket is intended to be exposed only to
a few privileged processes.

On Android, only the `adb shell` domain (used by various UI tools like
[Perfetto UI](https://ui.perfetto.dev/),
[Android Studio](https://developer.android.com/studio) or the
[Android GPU Inspector](https://github.com/google/agi))
and a few other trusted system services are allowed to access the consumer
socket (see [traced_consumer in SELinux][selinux_consumer]).

In the Perfetto codebase, the [`perfetto`](/docs/reference/perfetto-cli)
binary (`/system/bin/perfetto` on Android) provides a consumer implementation
and exposes it through a command line interface.

#### Socket protocol FAQs

_Why SOCK_STREAM and not DGRAM/SEQPACKET?_

1. To allow direct passthrough of the consumer socket on Android through
   `adb forward localabstract` and allow host tools to directly talk to the
   on-device tracing service. Today both the Perfetto UI and Android GPU
   Inspector do this.
2. To allow, in the future, direct control of a remote service over TCP or SSH
   tunneling.
3. Because the socket buffer for `SOCK_DGRAM` is extremely limited and
   `SOCK_SEQPACKET` is not supported on macOS.

_Why not gRPC?_

The team evaluated gRPC in late 2017 as an alternative but ruled it out
due to: (i) binary size and memory footprint; (ii) the complexity and overhead
of running a full HTTP/2 stack over a UNIX socket; (iii) the lack of
fine-grained control on back-pressure.

_Is the UNIX socket protocol used within Chrome processes?_

No. Within Chrome processes (the browser app, not CrOS) Perfetto doesn't use
any UNIX socket. Instead it uses the functionally equivalent
Mojo endpoints [`Producer{Client,Host}` and `Consumer{Client,Host}`][mojom].

### {#shmem-abi} Shared memory

This section describes the binary interface of the memory buffer shared between
a producer process and the tracing service (SMB).

The SMB is a staging area to decouple data sources living in the Producer
and allow them to do non-blocking async writes. An SMB is small-ish, typically
hundreds of KB. Its size is configurable by the producer when connecting.
For more architectural details about the SMB see also the
[buffers and dataflow doc](/docs/concepts/buffers.md) and the
[shared_memory_abi.h] sources.

#### Obtaining the SMB

The SMB is obtained by passing a tmpfs file descriptor over the producer socket
and memory-mapping it both from the producer and service.
The producer specifies the desired SMB size and memory layout when sending the
[`InitializeConnectionRequest`][producer_port.proto] request to the
service, which is the very first IPC sent after connection.
By default, the service creates the SMB and passes back its file descriptor to
the producer with the [`InitializeConnectionResponse`][producer_port.proto]
IPC reply. Recent versions of the service (Android R / 11) allow the FD to be
created by the producer and passed down to the service in the request. When the
service supports this, it acks the request setting
`InitializeConnectionResponse.using_shmem_provided_by_producer = true`. At the
time of writing this feature is used only by Chrome for dealing with lazy
Mojo initialization during startup tracing.

#### SMB memory layout: pages, chunks, fragments and packets

The SMB is partitioned into fixed-size pages. An SMB page must be an integer
multiple of 4KB. The only valid sizes are: 4KB, 8KB, 16KB, 32KB.

The size of an SMB page is determined by each Producer at connection time, via
the `shared_memory_page_size_hint_bytes` field of `InitializeConnectionRequest`
and cannot be changed afterwards. All pages in the SMB have the same size,
constant throughout the lifetime of the producer process.
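
The page-size rule above can be written down as a small check. This is a minimal sketch, not Perfetto code: `IsValidSmbPageSize` and `NumPages` are hypothetical helper names introduced only for illustration, and the sketch assumes the four sizes listed above are exhaustive.

```cpp
#include <cassert>
#include <cstddef>

// The text lists exactly four valid page sizes: 4KB, 8KB, 16KB, 32KB.
constexpr std::size_t kMinPageSize = 4 * 1024;
constexpr std::size_t kMaxPageSize = 32 * 1024;

// Hypothetical helper (not part of the Perfetto API): returns true if
// `bytes` is one of the four valid SMB page sizes, i.e. a power of two
// in the range [4KB, 32KB].
constexpr bool IsValidSmbPageSize(std::size_t bytes) {
  return bytes >= kMinPageSize && bytes <= kMaxPageSize &&
         (bytes & (bytes - 1)) == 0;
}

// Hypothetical helper: number of whole pages in an SMB of `smb_bytes`,
// or 0 if the page size is not valid.
constexpr std::size_t NumPages(std::size_t smb_bytes, std::size_t page_bytes) {
  return IsValidSmbPageSize(page_bytes) ? smb_bytes / page_bytes : 0;
}

// 12KB is a multiple of 4KB but is not in the list of valid sizes.
static_assert(!IsValidSmbPageSize(12 * 1024), "12KB must be rejected");
static_assert(NumPages(256 * 1024, 4096) == 64, "256KB SMB, 4KB pages");
```

Note that "integer multiple of 4KB" alone would admit 12KB; the explicit list of sizes is the stricter constraint, which is why the sketch checks for a power of two.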

**A page** is a fixed-sized partition of the shared memory buffer and is just a
container of chunks.
The Producer can partition each SMB Page using a limited number of predetermined
layouts (1 page : 1 chunk; 1 page : 2 chunks and so on).
The page layout is stored in a 32-bit atomic word in the page header. The same
32-bit word also contains the state of each chunk (2 bits per chunk).

Having fixed the total SMB size (hence the total memory overhead), the page
size is a triangular trade-off between:

1. IPC traffic: smaller pages -> more IPCs.
2. Producer lock freedom: larger pages -> larger chunks -> data sources can
   write more data without needing to swap chunks and synchronize.
3. Risk of write-starving the SMB: larger pages -> higher chance that the
   Service won't manage to drain them and the SMB remains full.

The page size, on the other hand, has no implications on memory wasted due to
fragmentation (see Chunk below).

**A chunk** is a portion of a Page and contains a linear sequence of
[`TracePacket(s)`][trace-packet-ref] (the root trace proto).

A Chunk defines the granularity of the interaction between the Producer and
tracing Service. When a producer fills a chunk it sends a `CommitData` IPC to
the service, asking it to copy its contents into the central non-shared buffers.

A chunk can be in one of the following four states:

* `Free` : The Chunk is free. The Service shall never touch it, the Producer
  can acquire it when writing and transition it into the `BeingWritten` state.

* `BeingWritten`: The Chunk is being written by the Producer and is not
  complete yet (i.e. there is still room to write other trace packets).
  The Service never alters the state of chunks in the `BeingWritten` state
  (but will still read them when flushing even if incomplete).

* `Complete`: The Producer is done writing the chunk and won't touch it
  again. The Service can move it to its non-shared ring buffer and mark the
  chunk as `BeingRead` -> `Free` when done.

* `BeingRead`: The Service is moving the page into its non-shared ring
  buffer. Producers never touch chunks in this state.
  _Note: this state ended up being never used as the service directly
  transitions chunks from `Complete` back to `Free`_.

A chunk is owned exclusively by one thread of one data source of the producer.

Chunks are essentially single-writer single-thread lock-free arenas. Locking
happens only when a Chunk is full and a new one needs to be acquired.

Locking happens only within the scope of a Producer process.
Inter-process locking is not generally allowed. The Producer cannot lock the
Service and vice versa. In the worst case, either of the two can starve the SMB
by marking all chunks as either being read or written. The only side effect of
that is losing trace data.
The only case when stalling on the writer-side (the Producer) can occur is when
a data source in a producer opts into using the
[`BufferExhaustedPolicy.kStall`](/docs/concepts/buffers.md) policy and the SMB
is full.

**[TracePacket][trace-packet-ref]** is the atom of tracing. Putting aside
pages and chunks, a trace is conceptually just a concatenation of
TracePacket(s).
A TracePacket can be big (up to 64 MB) and can span across several chunks, hence
across several pages.
A TracePacket can therefore be >> chunk size, >> page size and even >> SMB size.
The Chunk header carries metadata to deal with the TracePacket splitting.

Overview of the Page, Chunk, Fragment and Packet concepts:<br>

Memory layout of a Page:<br>

Because a packet can be larger than a page, the first and the last packets in
a chunk can be fragments.
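
The chunk state transitions described earlier in this section (`Free` -> `BeingWritten` -> `Complete` -> `Free`) can be sketched as compare-and-swap operations on the 32-bit page header word. This is only an illustration under stated assumptions: the numeric state values, the placement of chunk `i` at bits `[2*i, 2*i+1]`, and the names `PageHeader`, `GetChunkState` and `TryTransition` are all hypothetical. The authoritative layout is the one in [shared_memory_abi.h].

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// The four chunk states named in the text. The numeric values are an
// assumption of this sketch, not taken from the ABI header.
enum ChunkState : uint32_t {
  kFree = 0,
  kBeingWritten = 1,
  kBeingRead = 2,
  kComplete = 3,
};

constexpr uint32_t kStateBits = 2;  // "2 bits per chunk", per the text.
constexpr uint32_t kStateMask = 0x3;

// Hypothetical page header: a single 32-bit atomic word. Assumption: the
// state of chunk i lives at bits [2*i, 2*i+1]; layout bits are ignored.
struct PageHeader {
  std::atomic<uint32_t> word{0};
};

ChunkState GetChunkState(const PageHeader& page, uint32_t chunk_idx) {
  const uint32_t w = page.word.load(std::memory_order_acquire);
  return static_cast<ChunkState>((w >> (chunk_idx * kStateBits)) & kStateMask);
}

// Atomically moves one chunk from `expected` to `desired`. Returns false,
// leaving the word untouched, if the chunk is not in the `expected` state.
bool TryTransition(PageHeader& page, uint32_t chunk_idx,
                   ChunkState expected, ChunkState desired) {
  const uint32_t shift = chunk_idx * kStateBits;
  uint32_t old_word = page.word.load(std::memory_order_relaxed);
  for (;;) {
    if (((old_word >> shift) & kStateMask) != expected)
      return false;
    const uint32_t new_word = (old_word & ~(kStateMask << shift)) |
                              (static_cast<uint32_t>(desired) << shift);
    if (page.word.compare_exchange_weak(old_word, new_word,
                                        std::memory_order_acq_rel,
                                        std::memory_order_relaxed)) {
      return true;
    }
    // CAS failed: old_word now holds the current value, so retry.
  }
}
```

In this sketch a producer thread would call `TryTransition(page, i, kFree, kBeingWritten)` to acquire a chunk and `TryTransition(page, i, kBeingWritten, kComplete)` when done. Because the update is a single atomic word operation, neither side ever needs to hold a cross-process lock, which matches the no-inter-process-locking property described above.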

#### Post-facto patching through IPC

If a TracePacket is particularly large, it is very likely that the chunk that
contains its initial fragments is committed into the central buffers and removed
from the SMB by the time the last fragment of the same packet is written.

Nested messages in protobuf are prefixed by their length. In a zero-copy
direct-serialization scenario like tracing, the length is known only when the
last field of a submessage is written and cannot be known upfront.

Because of this, it is possible that when the last fragment of a packet is
written, the writer needs to backfill the size prefix in an earlier fragment,
which now might have disappeared from the SMB.

In order to do this, the tracing protocol allows patching the contents of a
chunk through the `CommitData` IPC (see
[`CommitDataRequest.ChunkToPatch`][commit_data_request.proto]) after the tracing
service copied it into the central buffer. There is no guarantee that the
fragment will still be there (e.g., it can be over-written in ring-buffer mode).
The service will patch the chunk only if it's still in the buffer and only if
the producer ID that wrote it matches the Producer ID of the patch request over
IPC (the Producer ID is not spoofable and is tied to the IPC socket file
descriptor).

### {#protos} Proto definitions

The following protobuf messages are part of the overall trace protocol ABI and
are updated maintaining backward-compatibility, unless marked as experimental
in the comments.

TIP: See also the _Updating A Message Type_ section of the
     [Protobuf Language Guide][proto-updating] for valid ABI-compatible changes
     when updating the schema of a protobuf message.

#### DataSourceDescriptor

Defined in [data_source_descriptor.proto].
This message is sent
Producer -> Service through IPC on the Producer socket during the Producer
initialization, before any tracing session is started. This message is used
to advertise a data source and its capabilities (e.g., which GPU HW
counters are supported, their possible sampling rates).

#### DataSourceConfig

Defined in [data_source_config.proto]. This message is sent:

* Consumer -> Service through IPC on the Consumer socket, as part of the
  [TraceConfig](/docs/concepts/config.md) when a Consumer starts a new tracing
  session.

* Service -> Producer through IPC on the Producer socket, as a reaction to the
  above. The service passes through each `DataSourceConfig` section defined in
  the `TraceConfig` to the corresponding Producer(s) that advertise that data
  source.

#### TracePacket

Defined in [trace_packet.proto]. This is the root object written by any data
source into the SMB when producing any form of trace event.
See the [TracePacket reference][trace-packet-ref] for the full details.

## {#abi-stability} ABI Stability

All the layers of the tracing protocol ABI are long-term stable and can only
be changed maintaining backwards compatibility.

This is due to the fact that on every Android release the `traced` service
gets frozen in the system image while unbundled apps (e.g. Chrome) and host
tools (e.g. Perfetto UI) can be updated at a more frequent cadence.

Both the following scenarios are possible:

#### Producer/Consumer client older than tracing service

This happens typically during Android development. At some point some newer code
is dropped in the Android platform and shipped to users, while client software
and host tools will lag behind (or simply the user has not updated their app /
tools).

The tracing service needs to support clients talking an older version of the
Producer or Consumer tracing protocol.

* Don't remove IPC methods from the service.
* Assume that fields added later to existing methods might be absent.
* For newer Producer/Consumer behaviors, advertise those behaviors through
  feature flags when connecting to the service. Good examples of this are the
  `will_notify_on_stop` or `handles_incremental_state_clear` flags in
  [data_source_descriptor.proto].

#### Producer/Consumer client newer than tracing service

This is the most likely scenario. At some point in 2022 a large number of phones
will still run Android P or Q, hence running a snapshot of the tracing service
from ~2018-2020, but will run a recent version of Google Chrome.
Chrome, when configured in system-tracing mode (i.e. system-wide + in-app
tracing), connects to Android's `traced` producer socket and talks the
latest version of the tracing protocol.

The producer/consumer client code needs to be able to talk with an older version
of the service, which might not support some newer features.

* Newer IPC methods defined in [producer_port.proto] won't exist in the older
  service. When connecting on the socket the service lists its RPC methods
  and the client is able to detect if a method is available or not.
  At the C++ IPC layer, invoking a method that doesn't exist on the service
  causes the `Deferred<>` promise to be rejected.

* Newer fields in existing IPC methods will just be ignored by the older version
  of the service.

* If the producer/consumer client depends on a new behavior of the service, and
  that behavior cannot be inferred by the presence of a method, a new feature
  flag must be exposed through the `QueryCapabilities` method.
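
The first bullet above, detecting whether an older service supports a method, can be sketched as a lookup in the method list returned by `BindServiceReply`. This is a minimal illustration, not the Perfetto IPC layer: `MethodTable` and `FindMethod` are hypothetical names, and the table is assumed to have been filled by the client from the `methods` entries of the reply.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical client-side view of a BindServiceReply:
// RPC method name -> method ID advertised by the service.
using MethodTable = std::map<std::string, uint32_t>;

// Hypothetical helper: looks up `name` among the advertised methods.
// On success stores the ID to use in InvokeMethod into `*id` and returns
// true. Returns false if the (possibly older) service did not list it.
bool FindMethod(const MethodTable& methods, const std::string& name,
                uint32_t* id) {
  const auto it = methods.find(name);
  if (it == methods.end())
    return false;
  *id = it->second;
  return true;
}
```

A client written this way can pick a fallback path before issuing the call, instead of invoking the method and handling the rejection afterwards.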

## Static linking vs shared library

The Perfetto C++ Client Library is only available in the form of a static
library and a single-source amalgamated SDK (which is effectively a static
library). The library implements the Tracing Protocol ABI so, once statically
linked, depends only on the socket and shared memory protocol ABI, which are
guaranteed to be stable.

No shared library distributions of the C++ library are available. We strongly
discourage teams from attempting to build the C++ tracing library as a shared
library and use it from a different linker unit. It is fine to link AND use the
client library within the same shared library, as long as none of the perfetto
C++ API is exported.

The `PERFETTO_EXPORT_COMPONENT` annotations are only used when building the
third tier of the client library in chromium component builds and cannot be
easily repurposed for delineating shared library boundaries for the other two
API tiers.

This is because the first two tiers of the Client Library C++ API make
extensive use of inline headers and C++ templates, in order to allow the
compiler to see through most of the layers of abstraction.

Maintaining the C++ ABI across hundreds of inlined functions and a shared
library is prohibitively expensive and too prone to break in extremely subtle
ways. For this reason the team has ruled out shared library distributions for
the time being.

A new C Client library API/ABI is in the works, but it's not stable yet.

[cli_lib]: /docs/instrumentation/tracing-sdk.md
[selinux_producer]: https://cs.android.com/search?q=perfetto_producer%20f:sepolicy.*%5C.te&sq=
[selinux_consumer]: https://cs.android.com/search?q=f:sepolicy%2F.*%5C.te%20traced_consumer&sq=
[mojom]: https://source.chromium.org/chromium/chromium/src/+/master:services/tracing/public/mojom/perfetto_service.mojom?q=producer%20f:%5C.mojom$%20perfetto&ss=chromium&originalUrl=https:%2F%2Fcs.chromium.org%2F
[proto_rpc]: https://developers.google.com/protocol-buffers/docs/proto#services
[producer_port.proto]: /protos/perfetto/ipc/producer_port.proto
[consumer_port.proto]: /protos/perfetto/ipc/consumer_port.proto
[trace_packet.proto]: /protos/perfetto/trace/trace_packet.proto
[data_source_descriptor.proto]: /protos/perfetto/common/data_source_descriptor.proto
[data_source_config.proto]: /protos/perfetto/config/data_source_config.proto
[trace-packet-ref]: /docs/reference/trace-packet-proto.autogen
[shared_memory_abi.h]: /include/perfetto/ext/tracing/core/shared_memory_abi.h
[commit_data_request.proto]: /protos/perfetto/common/commit_data_request.proto
[proto-updating]: https://developers.google.com/protocol-buffers/docs/proto#updating