# ProtoZero design document

ProtoZero is a zero-copy zero-alloc zero-syscall protobuf serialization library
purposefully built for Perfetto's tracing use cases.

## Motivations

ProtoZero has been designed and optimized for proto serialization, which is used
by all Perfetto tracing paths.
Deserialization was introduced only at a later stage of the project and is
mainly used by offline tools
(e.g., [TraceProcessor](/docs/analysis/trace-processor.md)).
The _zero-copy zero-alloc zero-syscall_ statement applies only to the
serialization code.

Perfetto makes extensive use of protobuf in tracing fast-paths. Every trace
event in Perfetto is a proto
(see [TracePacket](/docs/reference/trace-packet-proto.autogen) reference). This
allows events to be strongly typed and makes it easier for the team to maintain
backwards compatibility using a language that is understood across the board.

Tracing fast-paths need to have very little overhead, because instrumentation
points are sprinkled all over the codebase of projects like Android
and Chrome and are performance-critical.

Overhead here is not just defined as the CPU time (or instructions retired) it
takes to execute the instrumentation point. A big source of overhead in a
tracing system is the working set of the instrumentation points,
specifically the extra I-cache and D-cache misses that slow down the
non-tracing code _after_ the tracing instrumentation point.

The major design departures of ProtoZero from canonical C++ protobuf libraries
like [libprotobuf](https://github.com/google/protobuf) are:

* Treating serialization and deserialization as different use-cases served by
  different code.

* Optimizing for binary size and working-set-size on the serialization paths.

* Ignoring most of the error checking and long-tail features of protobuf
  (repeated vs optional, full type checks).

* ProtoZero is not designed as a general-purpose protobuf de/serialization
  library and is heavily customized to keep the trace-writing code minimal and
  allow the compiler to see through the architectural layers.

* Code generated by ProtoZero needs to be hermetic. When building the
  amalgamated [Tracing SDK](/docs/instrumentation/tracing-sdk.md), all the
  Perfetto tracing sources must not depend on any library other than the
  C++ standard library and the C library.

## Usage

At the build-system level, ProtoZero is extremely similar to the conventional
libprotobuf library.
The ProtoZero `.proto -> .pbzero.{cc,h}` compiler is built on top of the
libprotobuf parser and compiler infrastructure. ProtoZero is implemented as a
`protoc` compiler plugin.

ProtoZero has a build-time-only dependency on libprotobuf (the plugin depends
on libprotobuf's parser and compiler). The `.pbzero.{cc,h}` code generated by
it, however, has no runtime dependency (not even header-only dependencies) on
libprotobuf.

In order to generate ProtoZero stubs from proto you need to:

1. Build the ProtoZero compiler plugin, which lives in
   [src/protozero/protoc_plugin/](/src/protozero/protoc_plugin/).
   ```bash
   tools/ninja -C out/default protozero_plugin protoc
   ```

2. Invoke the libprotobuf `protoc` compiler passing the `protozero_plugin`:
   ```bash
   out/default/protoc \
       --plugin=protoc-gen-plugin=out/default/protozero_plugin \
       --plugin_out=wrapper_namespace=pbzero:/tmp/ \
       test_msg.proto
   ```
   This generates `/tmp/test_msg.pbzero.{cc,h}`.

   NOTE: The .cc file is always empty. ProtoZero-generated code is header only.
   The .cc file is emitted only because some build systems' rules assume that
   protobuf codegens generate both a .cc and a .h file.

## Proto serialization

The quickest way to understand ProtoZero design principles is to start from a
small example and compare the generated code between libprotobuf and ProtoZero.

```protobuf
syntax = "proto2";

message TestMsg {
  optional string str_val = 1;
  optional int32 int_val = 2;
  repeated TestMsg nested = 3;
}
```

#### libprotobuf approach

The libprotobuf approach is to generate a C++ class that has one member for each
proto field, with dedicated serialization and de-serialization methods.

```bash
out/default/protoc --cpp_out=. test_msg.proto
```

generates test_msg.pb.{cc,h}.
With many degrees of simplification, it looks
as follows:

```c++
// This class is generated by the standard protoc compiler in the .pb.h source.
class TestMsg : public protobuf::MessageLite {
 private:
  int32_t int_val_;
  ArenaStringPtr str_val_;
  RepeatedPtrField<TestMsg> nested_;  // Effectively a vector<TestMsg>

 public:
  const std::string& str_val() const;
  void set_str_val(const std::string& value);

  bool has_int_val() const;
  int32_t int_val() const;
  void set_int_val(int32_t value);

  ::TestMsg* add_nested();
  ::TestMsg* mutable_nested(int index);
  const TestMsg& nested(int index);

  std::string SerializeAsString();
  bool ParseFromString(const std::string&);
};
```

The main characteristics of these stubs are:

* Code generated from .proto messages can be used in the codebase as general
  purpose objects, without ever using the `SerializeAs*()` or `ParseFrom*()`
  methods (although anecdotal evidence suggests that most projects use these
  proto-generated classes only at the de/serialization endpoints).

* The end-to-end journey of serializing a proto involves two steps:
  1. Setting the individual int / string / vector fields of the generated class.
  2. Doing a serialization pass over these fields.

  In turn this has side-effects on the code generated. STL copy/assignment
  operators for strings and vectors are non-trivial because, for instance, they
  need to deal with dynamic memory resizing.

#### ProtoZero approach

```c++
// This class is generated by the ProtoZero plugin in the .pbzero.h source.
class TestMsg : public protozero::Message {
 public:
  void set_str_val(const std::string& value) {
    AppendBytes(/*field_id=*/1, value.data(), value.size());
  }
  void set_str_val(const char* data, size_t size) {
    AppendBytes(/*field_id=*/1, data, size);
  }
  void set_int_val(int32_t value) {
    AppendVarInt(/*field_id=*/2, value);
  }
  TestMsg* add_nested() {
    return BeginNestedMessage<TestMsg>(/*field_id=*/3);
  }
};
```

The ProtoZero-generated stubs are append-only. As the `set_*`, `add_*` methods
are invoked, the passed arguments are directly serialized into the target
buffer. This introduces some limitations:

* Readback is not possible: these classes cannot be used as C++ struct
  replacements.

* No error-checking is performed: nothing prevents a non-repeated field from
  being emitted twice in the serialized proto if the caller accidentally calls a
  `set_*()` method twice. Basic type checks are still performed at compile-time
  though.

* Nested fields must be filled in a stack fashion and cannot be written
  interleaved. Once a nested message is started, its fields must be set before
  going back to setting the fields of the parent message. This turns out to not
  be a problem for most tracing use-cases.

The append-only design also has a number of advantages:

* The classes generated by ProtoZero don't add any extra state on top of the
  base class they derive from (`protozero::Message`). They define only inline
  setter methods that call base-class serialization methods. Compilers can
  see through all the inline expansions of these methods.

* As a consequence of that, the binary cost of ProtoZero is independent of the
  number of protobuf messages defined and their fields, and depends only on the
  number of `set_*`/`add_*` calls. This (i.e. the binary cost of unused proto
  messages and fields) anecdotally has been a big issue with libprotobuf.

* The serialization methods don't involve any copy or dynamic allocation. The
  inline expansion calls directly into the corresponding `AppendVarInt()` /
  `AppendString()` methods of `protozero::Message`.

* This allows trace events to be serialized directly into the
  [tracing shared memory buffers](/docs/concepts/buffers.md), even if they are
  not contiguous.

### Scattered buffer writing

A key part of the ProtoZero design is supporting direct serialization on
non-globally-contiguous sequences of contiguous memory regions.

This happens by decoupling `protozero::Message`, the base class for all the
generated classes, from the `protozero::ScatteredStreamWriter`.
The problem it solves is the following: ProtoZero is based on direct
serialization into shared memory buffer chunks. These chunks are 4KB - 32KB in
most cases.
At the same time, there is no limit on how much data the caller will
try to write into an individual message: a trace event can be up to 256 MiB big.

#### Fast-path

At all times the underlying `ScatteredStreamWriter` knows the bounds
of the current buffer. All write operations are bounds-checked and hit a
slow-path when crossing the buffer boundary.

Most write operations can be completed within the current buffer boundaries.
In that case, the cost of a `set_*` operation is in essence a `memcpy()` with
the extra overhead of var-int encoding for protobuf preambles and
length-delimited fields.

#### Slow-path

When crossing the boundary, the slow-path asks the
`ScatteredStreamWriter::Delegate` for a new buffer. The implementation of
`GetNewBuffer()` is up to the client. In tracing use-cases, that call will
acquire a new thread-local chunk from the tracing shared memory buffer.

Other heap-based implementations are possible.
For instance, the ProtoZero
sources provide a helper class `HeapBuffered<TestMsg>`, mainly used in tests (see
[scattered_heap_buffer.h](/include/perfetto/protozero/scattered_heap_buffer.h)),
which allocates a new heap buffer when crossing the boundaries of the current
one.

Consider the following example:

```c++
TestMsg outer_msg;
for (int i = 0; i < 1000; i++) {
  TestMsg* nested = outer_msg.add_nested();
  nested->set_int_val(42);
}
```

At some point one of the `set_int_val()` calls will hit the slow-path and
acquire a new buffer. The overall idea is having a serialization mechanism
that is extremely lightweight most of the time and that requires some extra
function calls when crossing a buffer boundary, so that their cost gets
amortized across all trace events.

In the context of the overall Perfetto tracing use case, the slow-path involves
grabbing a process-local mutex and finding the next free chunk in the shared
memory buffer.
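
The fast-path / slow-path split described above can be sketched as follows.
This is a minimal illustration, not the actual ProtoZero code: `Range`,
`Delegate`, `ScatteredWriter` and `HeapDelegate` are hypothetical, simplified
stand-ins for the real classes, the delegate allocates from the heap rather
than from shared memory, and the process-local mutex is not modelled.

```c++
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <vector>

// Hypothetical stand-in for a contiguous memory region [begin, end).
struct Range {
  uint8_t* begin;
  uint8_t* end;
};

// Hypothetical stand-in for the delegate that hands out new buffers.
class Delegate {
 public:
  virtual ~Delegate() = default;
  virtual Range GetNewBuffer() = 0;
};

class ScatteredWriter {
 public:
  explicit ScatteredWriter(Delegate* delegate)
      : delegate_(delegate), cur_(delegate->GetNewBuffer()), pos_(cur_.begin) {}

  void WriteBytes(const void* src, size_t size) {
    // Fast-path: one bounds check plus a memcpy() into the current chunk.
    if (size <= static_cast<size_t>(cur_.end - pos_)) {
      std::memcpy(pos_, src, size);
      pos_ += size;
      return;
    }
    WriteBytesSlowPath(static_cast<const uint8_t*>(src), size);
  }

 private:
  // Slow-path: fill what is left of the current chunk, then keep asking the
  // delegate for new chunks until all the bytes have been written.
  void WriteBytesSlowPath(const uint8_t* src, size_t size) {
    while (size > 0) {
      if (pos_ == cur_.end) {
        cur_ = delegate_->GetNewBuffer();
        assert(cur_.begin < cur_.end);
        pos_ = cur_.begin;
      }
      const size_t n = std::min(size, static_cast<size_t>(cur_.end - pos_));
      std::memcpy(pos_, src, n);
      pos_ += n;
      src += n;
      size -= n;
    }
  }

  Delegate* delegate_;
  Range cur_;
  uint8_t* pos_;
};

// Heap-backed delegate handing out fixed-size chunks (illustration only). In
// the tracing use case this is where the critical section would live.
class HeapDelegate : public Delegate {
 public:
  explicit HeapDelegate(size_t chunk_size) : chunk_size_(chunk_size) {}

  Range GetNewBuffer() override {
    chunks_.push_back(std::unique_ptr<uint8_t[]>(new uint8_t[chunk_size_]()));
    uint8_t* begin = chunks_.back().get();
    return Range{begin, begin + chunk_size_};
  }

  size_t num_chunks() const { return chunks_.size(); }
  const uint8_t* chunk(size_t i) const { return chunks_[i].get(); }

 private:
  size_t chunk_size_;
  std::vector<std::unique_ptr<uint8_t[]>> chunks_;
};
```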
Writes are therefore lock-free as long as they happen within the
thread-local chunk and require a critical section to acquire a new chunk once
every 4KB-32KB (depending on the trace configuration).

The assumption is that the likelihood that two threads will cross the chunk
boundary and call `GetNewBuffer()` at the same time is extremely low and hence
the critical section is uncontended most of the time.

```mermaid
sequenceDiagram
  participant C as Call site
  participant M as Message
  participant SSR as ScatteredStreamWriter
  participant DEL as Buffer Delegate
  C->>M: set_int_val(...)
  activate C
  M->>SSR: AppendVarInt(...)
  deactivate C
  Note over C,SSR: A typical write on the fast-path

  C->>M: set_str_val(...)
  activate C
  M->>SSR: AppendString(...)
  SSR->>DEL: GetNewBuffer(...)
  deactivate C
  Note over C,DEL: A write on the slow-path when crossing 4KB - 32KB chunks.
```

### Deferred patching

Nested messages in the protobuf binary encoding are prefixed with their
varint-encoded size.

Consider the following:

```c++
TestMsg* nested = outer_msg.add_nested();
nested->set_int_val(42);
nested->set_str_val("foo");
```

The canonical encoding of this protobuf message, using libprotobuf, would be:

```bash
1a 07 0a 03 66 6f 6f 10 2a
^-+-^ ^-----+------^ ^-+-^
  |         |          |
  |         |          +--> Field ID: 2 [int_val], value = 42.
  |         |
  |         +------> Field ID: 1 [str_val], len = 3, value = "foo" (66 6f 6f).
  |
  +------> Field ID: 3 [nested], length: 7  # !!!
```

The second byte in this sequence (07) is problematic for direct encoding.
At the
point where `outer_msg.add_nested()` is called, we can't possibly know upfront
what the overall size of the nested message will be (in this case, 5 + 2 = 7).

The way we get around this in ProtoZero is by reserving four bytes for the
_size_ of each nested message and back-filling them once the message is
finalized (or when we try to set a field in one of the parent messages).
We do this by encoding the size of the message using redundant varint encoding,
in this case: `87 80 80 00` instead of `07`.

At the C++ level, the `protozero::Message` class holds a pointer to its `size`
field, which typically points to the beginning of the message, where the four
bytes are reserved, and back-fills it in the `Message::Finalize()` pass.

This works fine for cases where the entire message lies in one contiguous buffer
but opens a further challenge: a message can be several MBs big. Looking at this
from the overall tracing perspective, the shared memory buffer chunk that holds
the beginning of a message can be long gone (i.e. committed in the central
service buffer) by the time we get to the end.
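
For the simple single-buffer case, the reserve-then-back-fill scheme can be
sketched as follows. This is an illustration under stated assumptions, not the
`protozero::Message` implementation: `MiniWriter` and its methods are
hypothetical names, and it appends into a `std::vector` purely for brevity,
whereas ProtoZero writes into pre-existing buffers without allocating. Four
bytes with 7 payload bits each can hold sizes below 2^28, which is consistent
with the 256 MiB figure quoted earlier.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Number of bytes reserved for the size of each nested message.
constexpr size_t kSizeFieldBytes = 4;

// Hypothetical append-only writer, for illustration only.
class MiniWriter {
 public:
  void AppendVarIntField(uint32_t field_id, uint64_t value) {
    WriteVarInt((field_id << 3) | 0);  // Wire type 0: varint.
    WriteVarInt(value);
  }

  void AppendBytesField(uint32_t field_id, const std::string& value) {
    WriteVarInt((field_id << 3) | 2);  // Wire type 2: length-delimited.
    WriteVarInt(value.size());
    buf_.insert(buf_.end(), value.begin(), value.end());
  }

  // Starts a nested message: writes the preamble, reserves four bytes for the
  // size and returns their offset so that they can be back-filled later.
  size_t BeginNested(uint32_t field_id) {
    WriteVarInt((field_id << 3) | 2);
    const size_t size_offset = buf_.size();
    buf_.resize(buf_.size() + kSizeFieldBytes);
    return size_offset;
  }

  // Back-fills the reserved bytes with a redundant 4-byte varint. Every byte
  // but the last has the continuation bit set, even when its payload is zero.
  void EndNested(size_t size_offset) {
    size_t size = buf_.size() - size_offset - kSizeFieldBytes;
    assert(size < (1u << 28));  // 4 bytes x 7 payload bits.
    for (size_t i = 0; i < kSizeFieldBytes; ++i) {
      const uint8_t continuation = (i + 1 < kSizeFieldBytes) ? 0x80 : 0;
      buf_[size_offset + i] =
          static_cast<uint8_t>((size & 0x7f) | continuation);
      size >>= 7;
    }
  }

  const std::vector<uint8_t>& bytes() const { return buf_; }

 private:
  // Canonical (shortest) varint encoding, used for preambles and values.
  void WriteVarInt(uint64_t value) {
    while (value >= 0x80) {
      buf_.push_back(static_cast<uint8_t>(value | 0x80));
      value >>= 7;
    }
    buf_.push_back(static_cast<uint8_t>(value));
  }

  std::vector<uint8_t> buf_;
};
```

Calling `BeginNested(3)`, `AppendVarIntField(2, 42)`,
`AppendBytesField(1, "foo")` and then `EndNested()` on this sketch yields
`1a 87 80 80 00 10 2a 0a 03 66 6f 6f`: the fields appear in call order and the
size is the padded `87 80 80 00` rather than `07`.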

To support messages that span multiple chunks, at the tracing code level
(outside of ProtoZero), when a message crosses the buffer boundary, its `size`
field gets redirected to a temporary patch buffer
(see [patch_list.h](/src/tracing/core/patch_list.h)). This patch buffer is then
sent out-of-band, piggybacking over the next commit IPC (see
[Tracing Protocol ABI](/docs/design-docs/api-and-abi.md#tracing-protocol-abi)).

### Performance characteristics

NOTE: For the full code of the benchmark see
      `/src/protozero/test/protozero_benchmark.cc`

We consider two scenarios: writing a simple event and a nested event.

#### Simple event

Consists of filling a flat proto message with 4 integers (2 x 32-bit,
2 x 64-bit) and a 32-byte string, as follows:

```c++
template <typename T>
void FillMessage_Simple(T* msg) {
  msg->set_field_int32(...);
  msg->set_field_uint32(...);
  msg->set_field_int64(...);
  msg->set_field_uint64(...);
  msg->set_field_string(...);
}
```

#### Nested event

Consists of filling a similar message which is recursively nested 3 levels deep:

```c++
template <typename T>
void FillMessage_Nested(T* msg, int depth = 0) {
  FillMessage_Simple(msg);
  if (depth < 3) {
    auto* child = msg->add_field_nested();
    FillMessage_Nested(child, depth + 1);
  }
}
```

#### Comparison terms

We compare, for the same message type, the performance of ProtoZero,
libprotobuf and a speed-of-light serializer.

The speed-of-light serializer is a very simple C++ class that just appends
data into a linear buffer making all sorts of favourable assumptions. It does
not use any binary-stable encoding, it does not perform bound checking,
all writes are 64-bit aligned, and it doesn't deal with any thread-safety.

```c++
struct SOLMsg {
  template <typename T>
  void Append(T x) {
    // The memcpy will be elided by the compiler, which will emit just a
    // 64-bit aligned mov instruction.
    memcpy(reinterpret_cast<void*>(ptr_), &x, sizeof(x));
    ptr_ += sizeof(x);
  }

  void set_field_int32(int32_t x) { Append(x); }
  void set_field_uint32(uint32_t x) { Append(x); }
  void set_field_int64(int64_t x) { Append(x); }
  void set_field_uint64(uint64_t x) { Append(x); }
  // Note: strcpy() returns its destination argument, so ptr_ is left pointing
  // at the start of the copied string rather than past its end.
  void set_field_string(const char* str) { ptr_ = strcpy(ptr_, str); }

  alignas(uint64_t) char storage_[sizeof(g_fake_input_simple) + 8];
  char* ptr_ = &storage_[0];
};
```

The speed-of-light serializer serves as a reference for _how fast a serializer
could be if argument marshalling and bound checking were zero cost._

#### Benchmark results

##### Google Pixel 3 - aarch64

```bash
$ cat out/droid_arm64/args.gn
target_os = "android"
is_clang = true
is_debug = false
target_cpu = "arm64"

$ ninja -C out/droid_arm64/ perfetto_benchmarks && \
  adb push --sync out/droid_arm64/perfetto_benchmarks /data/local/tmp/perfetto_benchmarks && \
  adb shell '/data/local/tmp/perfetto_benchmarks --benchmark_filter=BM_Proto*'

------------------------------------------------------------------------
Benchmark                                 Time           CPU Iterations
------------------------------------------------------------------------
BM_Protozero_Simple_Libprotobuf         402 ns        398 ns    1732807
BM_Protozero_Simple_Protozero           242 ns        239 ns    2929528
BM_Protozero_Simple_SpeedOfLight        118 ns        117 ns    6101381
BM_Protozero_Nested_Libprotobuf        1810 ns       1800 ns     390468
BM_Protozero_Nested_Protozero           780 ns        773 ns     901369
BM_Protozero_Nested_SpeedOfLight        138 ns        136 ns    5147958
```

##### HP Z920 workstation (Intel Xeon E5-2690 v4) running Linux

```bash
$ cat out/linux_clang_release/args.gn
is_clang = true
is_debug = false

$ ninja -C out/linux_clang_release/ perfetto_benchmarks && \
  out/linux_clang_release/perfetto_benchmarks --benchmark_filter=BM_Proto*

------------------------------------------------------------------------
Benchmark                                 Time           CPU Iterations
------------------------------------------------------------------------
BM_Protozero_Simple_Libprotobuf         428 ns        428 ns    1624801
BM_Protozero_Simple_Protozero           261 ns        261 ns    2715544
BM_Protozero_Simple_SpeedOfLight        111 ns        111 ns    6297387
BM_Protozero_Nested_Libprotobuf        1625 ns       1625 ns     436411
BM_Protozero_Nested_Protozero           843 ns        843 ns     849302
BM_Protozero_Nested_SpeedOfLight        140 ns        140 ns    5012910
```
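
The absolute timings in the two result listings are easier to compare as
ratios. The following is a minimal sketch of that arithmetic, not part of the
Perfetto benchmark suite: the `Row` struct, `kRows` table, `Speedup()` and
`PrintSpeedups()` are hypothetical names introduced here, and the numbers are
copied by hand from the `Time` column of the results.

```c++
#include <cstdio>

// Hypothetical helper type for this sketch; it does not exist in Perfetto.
struct Row {
  const char* device;
  const char* workload;
  double libprotobuf_ns;
  double protozero_ns;
  double speed_of_light_ns;
};

// Wall-clock "Time" values copied from the benchmark results.
constexpr Row kRows[] = {
    {"Pixel 3", "Simple", 402, 242, 118},
    {"Pixel 3", "Nested", 1810, 780, 138},
    {"Xeon E5-2690 v4", "Simple", 428, 261, 111},
    {"Xeon E5-2690 v4", "Nested", 1625, 843, 140},
};

// Returns how many times faster the `fast_ns` run is than the `slow_ns` run.
constexpr double Speedup(double slow_ns, double fast_ns) {
  return slow_ns / fast_ns;
}

// Prints one line per benchmark row with the two ratios of interest.
void PrintSpeedups() {
  for (const Row& r : kRows) {
    std::printf(
        "%-16s %-6s ProtoZero vs libprotobuf: %.2fx, "
        "speed-of-light vs ProtoZero: %.2fx\n",
        r.device, r.workload, Speedup(r.libprotobuf_ns, r.protozero_ns),
        Speedup(r.protozero_ns, r.speed_of_light_ns));
  }
}
```

Calling `PrintSpeedups()` from a `main()` prints one line per row. On these
numbers ProtoZero is roughly 1.6x to 2.3x faster than libprotobuf, and remains
roughly 2x to 6x slower than the speed-of-light reference, with the larger gaps
in both directions showing up in the nested case.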