# ProtoZero design document

ProtoZero is a zero-copy zero-alloc zero-syscall protobuf serialization library
purposefully built for Perfetto's tracing use cases.

## Motivations

ProtoZero has been designed and optimized for proto serialization, which is used
by all Perfetto tracing paths.
Deserialization was introduced only at a later stage of the project and is
mainly used by offline tools
(e.g., [TraceProcessor](/docs/analysis/trace-processor.md)).
The _zero-copy zero-alloc zero-syscall_ statement applies only to the
serialization code.

Perfetto makes extensive use of protobuf in tracing fast-paths. Every trace
event in Perfetto is a proto
(see [TracePacket](/docs/reference/trace-packet-proto.autogen) reference). This
allows events to be strongly typed and makes it easier for the team to maintain
backwards compatibility using a language that is understood across the board.

Tracing fast-paths need to have very little overhead, because instrumentation
points are sprinkled all over the codebase of projects like Android
and Chrome and are performance-critical.

Overhead here is not defined just as the CPU time (or instructions retired) that
it takes to execute the instrumentation point. A big source of overhead in a
tracing system is the working set of the instrumentation points,
specifically the extra I-cache and D-cache misses that slow down the
non-tracing code _after_ the tracing instrumentation point.

The major design departures of ProtoZero from canonical C++ protobuf libraries
like [libprotobuf](https://github.com/google/protobuf) are:

* Treating serialization and deserialization as different use-cases served by
  different code.

* Optimizing for binary size and working-set-size on the serialization paths.

* Ignoring most of the error checking and long-tail features of protobuf
  (repeated vs optional, full type checks).

* ProtoZero is not designed as a general-purpose protobuf de/serialization
  library and is heavily customized to keep the trace-writing code minimal and
  allow the compiler to see through the architectural layers.

* Code generated by ProtoZero needs to be hermetic. When building the
  amalgamated [Tracing SDK](/docs/instrumentation/tracing-sdk.md), none of the
  Perfetto tracing sources may depend on any library other than the C++
  standard library and the C library.

## Usage

At the build-system level, ProtoZero is extremely similar to the conventional
libprotobuf library.
The ProtoZero `.proto -> .pbzero.{cc,h}` compiler is built on top of the
libprotobuf parser and compiler infrastructure. ProtoZero is implemented as a
`protoc` compiler plugin.

ProtoZero has a build-time-only dependency on libprotobuf (the plugin depends
on libprotobuf's parser and compiler). The `.pbzero.{cc,h}` code generated by
it, however, has no runtime dependency (not even header-only dependencies) on
libprotobuf.

In order to generate ProtoZero stubs from a `.proto` file you need to:

1. Build the ProtoZero compiler plugin, which lives in
   [src/protozero/protoc_plugin/](/src/protozero/protoc_plugin/).
   ```bash
   tools/ninja -C out/default protozero_plugin protoc
   ```

2. Invoke the libprotobuf `protoc` compiler passing the `protozero_plugin`:
   ```bash
   out/default/protoc \
       --plugin=protoc-gen-plugin=out/default/protozero_plugin \
       --plugin_out=wrapper_namespace=pbzero:/tmp/ \
       test_msg.proto
   ```
   This generates `/tmp/test_msg.pbzero.{cc,h}`.

   NOTE: The .cc file is always empty. ProtoZero-generated code is header only.
   The .cc file is emitted only because some build systems' rules assume that
   protobuf codegens generate both a .cc and a .h file.

## Proto serialization

The quickest way to understand ProtoZero design principles is to start from a
small example and compare the generated code between libprotobuf and ProtoZero.

```protobuf
syntax = "proto2";

message TestMsg {
  optional string str_val = 1;
  optional int32 int_val = 2;
  repeated TestMsg nested = 3;
}
```

#### libprotobuf approach

The libprotobuf approach is to generate a C++ class that has one member for each
proto field, with dedicated serialization and de-serialization methods.

```bash
out/default/protoc --cpp_out=. test_msg.proto
```

This generates `test_msg.pb.{cc,h}`. Greatly simplified, the generated code
looks as follows:

```c++
// This class is generated by the standard protoc compiler in the .pb.h source.
class TestMsg : public protobuf::MessageLite {
 private:
  int32_t int_val_;
  ArenaStringPtr str_val_;
  RepeatedPtrField<TestMsg> nested_;  // Effectively a vector<TestMsg>

 public:
  const std::string& str_val() const;
  void set_str_val(const std::string& value);

  bool has_int_val() const;
  int32_t int_val() const;
  void set_int_val(int32_t value);

  ::TestMsg* add_nested();
  ::TestMsg* mutable_nested(int index);
  const TestMsg& nested(int index) const;

  std::string SerializeAsString() const;
  bool ParseFromString(const std::string&);
};
```

The main characteristics of these stubs are:

* Code generated from .proto messages can be used in the codebase as general
  purpose objects, without ever using the `SerializeAs*()` or `ParseFrom*()`
  methods (although anecdotal evidence suggests that most projects use these
  proto-generated classes only at the de/serialization endpoints).

* The end-to-end journey of serializing a proto involves two steps:
  1. Setting the individual int / string / vector fields of the generated class.
  2. Doing a serialization pass over these fields.

  In turn this has side-effects on the code generated. STL copy/assignment
  operators for strings and vectors are non-trivial because, for instance, they
  need to deal with dynamic memory resizing.

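The two-step flow can be illustrated with a hand-written sketch. `TwoStepTestMsg`
below is a hypothetical stand-in, not libprotobuf-generated code: its setters
first copy values into members (the string copy may allocate), and a separate
`SerializeAsString()` pass then encodes them in field-number order. Nested
messages are left out for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical, hand-written stand-in for a libprotobuf-style class.
class TwoStepTestMsg {
 public:
  // Step 1: setters copy the arguments into the object's own members.
  void set_str_val(const std::string& value) {
    str_val_ = value;  // May allocate and copy.
    has_str_val_ = true;
  }
  void set_int_val(int32_t value) {
    int_val_ = value;
    has_int_val_ = true;
  }

  // Step 2: a separate pass walks the members and encodes them.
  std::string SerializeAsString() const {
    std::string out;
    if (has_str_val_) {
      out.push_back(static_cast<char>((1 << 3) | 2));  // Field 1, wire type 2.
      AppendVarInt(str_val_.size(), &out);
      out.append(str_val_);
    }
    if (has_int_val_) {
      out.push_back(static_cast<char>((2 << 3) | 0));  // Field 2, wire type 0.
      // Negative int32 values are sign-extended to 64 bits before encoding.
      AppendVarInt(static_cast<uint64_t>(static_cast<int64_t>(int_val_)),
                   &out);
    }
    return out;
  }

 private:
  static void AppendVarInt(uint64_t value, std::string* out) {
    while (value >= 0x80) {
      out->push_back(static_cast<char>((value & 0x7f) | 0x80));
      value >>= 7;
    }
    out->push_back(static_cast<char>(value));
  }

  std::string str_val_;
  int32_t int_val_ = 0;
  bool has_str_val_ = false;
  bool has_int_val_ = false;
};
```
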
#### ProtoZero approach

```c++
// This class is generated by the ProtoZero plugin in the .pbzero.h source.
class TestMsg : public protozero::Message {
 public:
  void set_str_val(const std::string& value) {
    AppendBytes(/*field_id=*/1, value.data(), value.size());
  }
  void set_str_val(const char* data, size_t size) {
    AppendBytes(/*field_id=*/1, data, size);
  }
  void set_int_val(int32_t value) {
    AppendVarInt(/*field_id=*/2, value);
  }
  TestMsg* add_nested() {
    return BeginNestedMessage<TestMsg>(/*field_id=*/3);
  }
};
```

The ProtoZero-generated stubs are append-only. As the `set_*` and `add_*`
methods are invoked, the passed arguments are serialized directly into the
target buffer. This introduces some limitations:

* Readback is not possible: these classes cannot be used as C++ struct
  replacements.

* No error-checking is performed: nothing prevents a non-repeated field from
  being emitted twice in the serialized proto if the caller accidentally calls a
  `set_*()` method twice. Basic type checks are still performed at compile-time
  though.

* Nested fields must be filled in a stack fashion and cannot be written
  interleaved. Once a nested message is started, its fields must be set before
  going back to setting the fields of the parent message. This turns out not to
  be a problem for most tracing use-cases.

The append-only design also has a number of advantages:

* The classes generated by ProtoZero don't add any extra state on top of the
  base class they derive from (`protozero::Message`). They define only inline
  setter methods that call base-class serialization methods. Compilers can
  see through all the inline expansions of these methods.

* As a consequence of that, the binary cost of ProtoZero is independent of the
  number of protobuf messages defined and their fields, and depends only on the
  number of `set_*`/`add_*` calls. The binary cost of unused proto messages and
  fields has anecdotally been a big issue with libprotobuf.

* The serialization methods don't involve any copy or dynamic allocation. The
  inline expansion calls directly into the corresponding `AppendVarInt()` /
  `AppendString()` methods of `protozero::Message`.

* This allows trace events to be serialized directly into the
  [tracing shared memory buffers](/docs/concepts/buffers.md), even if they are
  not contiguous.

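To make the append-only model concrete, here is a minimal sketch under
simplifying assumptions. `AppendOnlyMessage` and `TestMsgSketch` are
hypothetical stand-ins for `protozero::Message` and a generated stub: they
write into a single caller-provided buffer and omit bounds checks and nested
messages. Each setter encodes its argument into the buffer immediately, in call
order, and calling a setter twice simply emits the field twice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the base class: it only tracks a write cursor
// into a caller-provided buffer. Bounds checks are omitted for brevity.
class AppendOnlyMessage {
 public:
  explicit AppendOnlyMessage(uint8_t* buf) : begin_(buf), cur_(buf) {}

  void AppendVarInt(uint32_t field_id, uint64_t value) {
    WriteVarInt(static_cast<uint64_t>(field_id) << 3);  // Wire type 0.
    WriteVarInt(value);
  }

  void AppendBytes(uint32_t field_id, const void* data, size_t size) {
    WriteVarInt((static_cast<uint64_t>(field_id) << 3) | 2);  // Wire type 2.
    WriteVarInt(size);
    std::memcpy(cur_, data, size);
    cur_ += size;
  }

  size_t size() const { return static_cast<size_t>(cur_ - begin_); }

 private:
  void WriteVarInt(uint64_t value) {
    while (value >= 0x80) {
      *cur_++ = static_cast<uint8_t>((value & 0x7f) | 0x80);
      value >>= 7;
    }
    *cur_++ = static_cast<uint8_t>(value);
  }

  uint8_t* begin_;
  uint8_t* cur_;
};

// What a generated stub boils down to: stateless, inline forwarding setters.
class TestMsgSketch : public AppendOnlyMessage {
 public:
  using AppendOnlyMessage::AppendOnlyMessage;

  void set_str_val(const char* data, size_t size) {
    AppendBytes(/*field_id=*/1, data, size);
  }
  void set_int_val(int32_t value) {
    AppendVarInt(/*field_id=*/2,
                 static_cast<uint64_t>(static_cast<int64_t>(value)));
  }
};
```
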
### Scattered buffer writing

A key part of the ProtoZero design is supporting direct serialization on
non-globally-contiguous sequences of contiguous memory regions.

This happens by decoupling `protozero::Message`, the base class for all the
generated classes, from the `protozero::ScatteredStreamWriter`.
The problem it solves is the following: ProtoZero is based on direct
serialization into shared memory buffer chunks. These chunks are 4KB - 32KB in
most cases. At the same time, there is no limit on how much data the caller will
try to write into an individual message: a trace event can be up to 256 MiB big.

![ProtoZero scattered buffers diagram](/docs/images/protozero-ssw.png)

#### Fast-path

At all times the underlying `ScatteredStreamWriter` knows the bounds
of the current buffer. All write operations are bounds-checked and hit a
slow-path when crossing the buffer boundary.

Most write operations can be completed within the current buffer boundaries.
In that case, the cost of a `set_*` operation is in essence a `memcpy()` with
the extra overhead of var-int encoding for protobuf preambles and
length-delimited fields.

#### Slow-path

When crossing the boundary, the slow-path asks the
`ScatteredStreamWriter::Delegate` for a new buffer. The implementation of
`GetNewBuffer()` is up to the client. In tracing use-cases, that call will
acquire a new thread-local chunk from the tracing shared memory buffer.

Other heap-based implementations are possible. For instance, the ProtoZero
sources provide a helper class `HeapBuffered<TestMsg>`, mainly used in tests (see
[scattered_heap_buffer.h](/include/perfetto/protozero/scattered_heap_buffer.h)),
which allocates a new heap buffer when crossing the boundaries of the current
one.

Consider the following example:

```c++
TestMsg outer_msg;
for (int i = 0; i < 1000; i++) {
  TestMsg* nested = outer_msg.add_nested();
  nested->set_int_val(42);
}
```

At some point one of the `set_int_val()` calls will hit the slow-path and
acquire a new buffer. The overall idea is to have a serialization mechanism
that is extremely lightweight most of the time and that requires some extra
function calls only when crossing a buffer boundary, so that their cost gets
amortized across all trace events.

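The fast-path / slow-path split can be sketched as follows. All class names
here are hypothetical and the chunk size is deliberately tiny; the real
`ScatteredStreamWriter` and its delegate differ in detail. The point is that a
write that fits in the current chunk costs one bounds check plus a `memcpy()`,
while only a boundary-crossing write calls into the delegate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <vector>

constexpr size_t kChunkSize = 16;  // Tiny on purpose, for illustration.

struct BufferRange {
  uint8_t* begin;
  uint8_t* end;
};

// Hypothetical delegate interface modelled on the GetNewBuffer() contract.
class BufferDelegate {
 public:
  virtual ~BufferDelegate() = default;
  virtual BufferRange GetNewBuffer() = 0;
};

class ScatteredWriterSketch {
 public:
  explicit ScatteredWriterSketch(BufferDelegate* delegate)
      : delegate_(delegate),
        range_(delegate->GetNewBuffer()),
        cur_(range_.begin) {}

  void WriteBytes(const uint8_t* src, size_t size) {
    // Fast path: the write fits in the current chunk.
    if (size <= static_cast<size_t>(range_.end - cur_)) {
      std::memcpy(cur_, src, size);
      cur_ += size;
      return;
    }
    WriteBytesSlowPath(src, size);
  }

 private:
  // Slow path: fill the current chunk, then keep asking for new ones.
  void WriteBytesSlowPath(const uint8_t* src, size_t size) {
    while (size > 0) {
      if (cur_ == range_.end) {
        range_ = delegate_->GetNewBuffer();
        cur_ = range_.begin;
      }
      const size_t avail = static_cast<size_t>(range_.end - cur_);
      const size_t n = size < avail ? size : avail;
      std::memcpy(cur_, src, n);
      cur_ += n;
      src += n;
      size -= n;
    }
  }

  BufferDelegate* delegate_;
  BufferRange range_;
  uint8_t* cur_;
};

// Heap-backed delegate: every call hands out a fresh zeroed chunk.
class HeapDelegateSketch : public BufferDelegate {
 public:
  BufferRange GetNewBuffer() override {
    chunks.emplace_back(new uint8_t[kChunkSize]());
    uint8_t* begin = chunks.back().get();
    return BufferRange{begin, begin + kChunkSize};
  }
  std::vector<std::unique_ptr<uint8_t[]>> chunks;
};
```
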
In the context of the overall Perfetto tracing use case, the slow-path involves
grabbing a process-local mutex and finding the next free chunk in the shared
memory buffer. Hence writes are lock-free as long as they happen within the
thread-local chunk and require a critical section to acquire a new chunk once
every 4KB-32KB (depending on the trace configuration).

The assumption is that the likelihood that two threads will cross the chunk
boundary and call `GetNewBuffer()` at the same time is extremely low and hence
the critical section is uncontended most of the time.

```mermaid
sequenceDiagram
  participant C as Call site
  participant M as Message
  participant SSR as ScatteredStreamWriter
  participant DEL as Buffer Delegate
  C->>M: set_int_val(...)
  activate C
  M->>SSR: AppendVarInt(...)
  deactivate C
  Note over C,SSR: A typical write on the fast-path

  C->>M: set_str_val(...)
  activate C
  M->>SSR: AppendString(...)
  SSR->>DEL: GetNewBuffer(...)
  deactivate C
  Note over C,DEL: A write on the slow-path when crossing 4KB - 32KB chunks.
```

### Deferred patching

Nested messages in the protobuf binary encoding are prefixed with their
varint-encoded size.

Consider the following:

```c++
TestMsg* nested = outer_msg.add_nested();
nested->set_int_val(42);
nested->set_str_val("foo");
```

The canonical encoding of this protobuf message, using libprotobuf, would be:

```bash
1a 07 0a 03 66 6f 6f 10 2a
^-+-^ ^-----+------^ ^-+-^
  |         |          |
  |         |          +--> Field ID: 2 [int_val], value = 42.
  |         |
  |         +------> Field ID: 1 [str_val], len = 3, value = "foo" (66 6f 6f).
  |
  +------> Field ID: 3 [nested], length: 7  # !!!
```

The second byte in this sequence (07) is problematic for direct encoding. At the
point where `outer_msg.add_nested()` is called, we can't possibly know upfront
what the overall size of the nested message will be (in this case, 5 + 2 = 7).

The way we get around this in ProtoZero is by reserving four bytes for the
_size_ of each nested message and back-filling them once the message is
finalized (or when we try to set a field in one of the parent messages).
We do this by encoding the size of the message using redundant varint encoding,
in this case: `87 80 80 00` instead of `07`.

At the C++ level, the `protozero::Message` class holds a pointer to its `size`
field, which typically points to the beginning of the message, where the four
bytes are reserved, and back-fills it in the `Message::Finalize()` pass.

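A minimal sketch of the reserve-then-back-fill scheme for the contiguous case
follows. `NestedWriterSketch` and `WriteRedundantVarInt` are illustrative names
rather than the ProtoZero API, only one level of nesting is supported, and
field ids are assumed to be below 16 so that the preamble fits in one byte.
Since four bytes carry 4 x 7 = 28 payload bits, this layout can describe sizes
below 2^28 bytes (256 MiB).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr size_t kSizeFieldBytes = 4;

// Encodes `value` as a varint padded to exactly 4 bytes: the first three
// bytes always carry the continuation bit, so 7 becomes 87 80 80 00.
void WriteRedundantVarInt(uint32_t value, uint8_t* dst) {
  assert(value < (1u << 28));  // 4 bytes x 7 payload bits each.
  for (size_t i = 0; i < kSizeFieldBytes; i++) {
    const uint8_t continuation = (i + 1 < kSizeFieldBytes) ? 0x80 : 0;
    dst[i] = static_cast<uint8_t>((value & 0x7f) | continuation);
    value >>= 7;
  }
}

// Hypothetical single-buffer writer. Bounds checks are omitted for brevity.
class NestedWriterSketch {
 public:
  explicit NestedWriterSketch(uint8_t* buf) : begin_(buf), cur_(buf) {}

  void WriteByte(uint8_t b) { *cur_++ = b; }

  // Writes the field preamble and reserves room for the not-yet-known size.
  void BeginNested(uint8_t field_id) {
    assert(field_id < 16);  // Keeps the preamble to a single byte.
    WriteByte(static_cast<uint8_t>((field_id << 3) | 2));
    size_field_ = cur_;
    cur_ += kSizeFieldBytes;
  }

  // Back-fills the reserved bytes now that the payload length is known.
  void Finalize() {
    const uint8_t* payload_begin = size_field_ + kSizeFieldBytes;
    const size_t payload_size = static_cast<size_t>(cur_ - payload_begin);
    WriteRedundantVarInt(static_cast<uint32_t>(payload_size), size_field_);
    size_field_ = nullptr;
  }

  size_t size() const { return static_cast<size_t>(cur_ - begin_); }

 private:
  uint8_t* begin_;
  uint8_t* cur_;
  uint8_t* size_field_ = nullptr;
};
```
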
This works fine for cases where the entire message lies in one contiguous buffer
but opens a further challenge: a message can be several MBs big. Looking at this
from the overall tracing perspective, the shared memory buffer chunk that holds
the beginning of a message can be long gone (i.e. committed in the central
service buffer) by the time we get to the end.

In order to support this use case, at the tracing code level (outside of
ProtoZero), when a message crosses the buffer boundary, its `size` field gets
redirected to a temporary patch buffer
(see [patch_list.h](/src/tracing/core/patch_list.h)). This patch buffer is then
sent out-of-band, piggybacking over the next commit IPC (see
[Tracing Protocol ABI](/docs/design-docs/api-and-abi.md#tracing-protocol-abi)).

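The idea of an out-of-band patch can be sketched as below. This is an
illustration only: `PatchSketch`, `PatchListSketch` and `ApplyPatches` are
hypothetical names, and the struct layout is an assumption rather than
Perfetto's actual ABI. The producer queues "overwrite these 4 bytes in that
chunk" records instead of touching a chunk it no longer owns, and the receiving
side applies them to its copy of the committed chunk.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical patch record; field names are illustrative.
struct PatchSketch {
  uint32_t chunk_id = 0;
  uint16_t offset = 0;
  std::array<uint8_t, 4> size_field{};
};

// Producer side: the chunk is already gone, so the size is queued.
class PatchListSketch {
 public:
  void Add(uint32_t chunk_id, uint16_t offset, uint32_t msg_size) {
    PatchSketch patch;
    patch.chunk_id = chunk_id;
    patch.offset = offset;
    // Same 4-byte redundant varint layout as an in-place back-fill.
    for (size_t i = 0; i < patch.size_field.size(); i++) {
      const uint8_t continuation = (i < 3) ? 0x80 : 0;
      patch.size_field[i] =
          static_cast<uint8_t>((msg_size & 0x7f) | continuation);
      msg_size >>= 7;
    }
    pending_.push_back(patch);
  }

  // Hands over the queued patches, e.g. to attach them to the next commit.
  std::vector<PatchSketch> TakeAll() {
    std::vector<PatchSketch> out;
    out.swap(pending_);
    return out;
  }

 private:
  std::vector<PatchSketch> pending_;
};

// Receiving side: apply each patch to its copy of the committed chunk.
void ApplyPatches(const std::vector<PatchSketch>& patches,
                  std::map<uint32_t, std::vector<uint8_t>>* chunks) {
  for (const PatchSketch& p : patches) {
    auto it = chunks->find(p.chunk_id);
    if (it == chunks->end()) continue;  // Chunk no longer available.
    if (p.offset + p.size_field.size() > it->second.size()) continue;
    for (size_t i = 0; i < p.size_field.size(); i++) {
      it->second[p.offset + i] = p.size_field[i];
    }
  }
}
```
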
### Performance characteristics

NOTE: For the full code of the benchmark see
      `/src/protozero/test/protozero_benchmark.cc`

We consider two scenarios: writing a simple event and writing a nested event.

#### Simple event

Consists of filling a flat proto message with 4 integers (2 x 32-bit,
2 x 64-bit) and a 32-byte string, as follows:

```c++
template <typename T>
void FillMessage_Simple(T* msg) {
  msg->set_field_int32(...);
  msg->set_field_uint32(...);
  msg->set_field_int64(...);
  msg->set_field_uint64(...);
  msg->set_field_string(...);
}
```

#### Nested event

Consists of filling a similar message which is recursively nested 3 levels deep:

```c++
template <typename T>
void FillMessage_Nested(T* msg, int depth = 0) {
  FillMessage_Simple(msg);
  if (depth < 3) {
    auto* child = msg->add_field_nested();
    FillMessage_Nested(child, depth + 1);
  }
}
```

#### Comparison terms

We compare, for the same message type, the performance of ProtoZero,
libprotobuf and a speed-of-light serializer.

The speed-of-light serializer is a very simple C++ class that just appends
data into a linear buffer, making all sorts of favourable assumptions: it does
not use any binary-stable encoding, it does not perform bounds checking,
all writes are 64-bit aligned, and it doesn't deal with thread-safety.

```c++
struct SOLMsg {
  template <typename T>
  void Append(T x) {
    // The memcpy will be elided by the compiler, which will emit just a
    // 64-bit aligned mov instruction.
    memcpy(reinterpret_cast<void*>(ptr_), &x, sizeof(x));
    ptr_ += sizeof(x);
  }

  void set_field_int32(int32_t x) { Append(x); }
  void set_field_uint32(uint32_t x) { Append(x); }
  void set_field_int64(int64_t x) { Append(x); }
  void set_field_uint64(uint64_t x) { Append(x); }
  // Note: strcpy() returns its destination pointer, so ptr_ is not advanced
  // past the copied string here.
  void set_field_string(const char* str) { ptr_ = strcpy(ptr_, str); }

  alignas(uint64_t) char storage_[sizeof(g_fake_input_simple) + 8];
  char* ptr_ = &storage_[0];
};
```

The speed-of-light serializer serves as a reference for _how fast a serializer
could be if argument marshalling and bounds checking were zero cost._

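To make the "favourable assumptions" concrete, the following is a minimal, self-contained sketch of the same idea: values are copied verbatim into a linear buffer, with no field tags and no varint encoding. It is not the benchmark code; `LinearAppender`, `AppendString`, `FillOnce`, the 64-byte capacity and the sample values are all hypothetical names and numbers chosen for illustration.

```c++
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical stand-in for a speed-of-light serializer: raw, fixed-width
// appends into a linear buffer. The 64-byte capacity is an assumption.
struct LinearAppender {
  template <typename T>
  void Append(T x) {
    // No field tag and no varint encoding: the value is copied verbatim.
    memcpy(ptr_, &x, sizeof(x));
    ptr_ += sizeof(x);
  }

  void AppendString(const char* str) {
    // Copies the string including its NUL terminator and advances past it.
    const size_t len = strlen(str) + 1;
    memcpy(ptr_, str, len);
    ptr_ += len;
  }

  size_t bytes_written() const {
    return static_cast<size_t>(ptr_ - storage_);
  }

  alignas(uint64_t) char storage_[64];
  char* ptr_ = storage_;
};

// Appends one value per field type used by the benchmark message and
// returns the number of bytes produced.
inline size_t FillOnce(LinearAppender* msg) {
  msg->Append(int32_t{-1});
  msg->Append(uint32_t{2});
  msg->Append(int64_t{-3});
  msg->Append(uint64_t{4});
  msg->AppendString("abc");
  // The only check in this sketch, and it runs after the writes: an
  // overflow would already have happened by the time it fires.
  assert(msg->bytes_written() <= sizeof(msg->storage_));
  return msg->bytes_written();
}
```

Each integer occupies exactly `sizeof(T)` bytes regardless of its value, so the output size depends only on the field types and the string length. The result is not a valid protobuf message; the sketch only illustrates why such a serializer is a lower bound on cost rather than a usable alternative.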
#### Benchmark results

##### Google Pixel 3 - aarch64

```bash
$ cat out/droid_arm64/args.gn
target_os = "android"
is_clang = true
is_debug = false
target_cpu = "arm64"

$ ninja -C out/droid_arm64/ perfetto_benchmarks && \
  adb push --sync out/droid_arm64/perfetto_benchmarks /data/local/tmp/perfetto_benchmarks && \
  adb shell '/data/local/tmp/perfetto_benchmarks --benchmark_filter=BM_Proto*'

------------------------------------------------------------------------
Benchmark                                 Time           CPU Iterations
------------------------------------------------------------------------
BM_Protozero_Simple_Libprotobuf         402 ns        398 ns    1732807
BM_Protozero_Simple_Protozero           242 ns        239 ns    2929528
BM_Protozero_Simple_SpeedOfLight        118 ns        117 ns    6101381
BM_Protozero_Nested_Libprotobuf        1810 ns       1800 ns     390468
BM_Protozero_Nested_Protozero           780 ns        773 ns     901369
BM_Protozero_Nested_SpeedOfLight        138 ns        136 ns    5147958
```

##### HP Z920 workstation (Intel Xeon E5-2690 v4) running Linux

```bash
$ cat out/linux_clang_release/args.gn
is_clang = true
is_debug = false

$ ninja -C out/linux_clang_release/ perfetto_benchmarks && \
  out/linux_clang_release/perfetto_benchmarks --benchmark_filter=BM_Proto*

------------------------------------------------------------------------
Benchmark                                 Time           CPU Iterations
------------------------------------------------------------------------
BM_Protozero_Simple_Libprotobuf         428 ns        428 ns    1624801
BM_Protozero_Simple_Protozero           261 ns        261 ns    2715544
BM_Protozero_Simple_SpeedOfLight        111 ns        111 ns    6297387
BM_Protozero_Nested_Libprotobuf        1625 ns       1625 ns     436411
BM_Protozero_Nested_Protozero           843 ns        843 ns     849302
BM_Protozero_Nested_SpeedOfLight        140 ns        140 ns    5012910
```
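
The tables are easier to compare as ratios. The sketch below works through that arithmetic using the "Time" column of the results above; the `Timing` struct, the constant names and the two helper functions are hypothetical and exist only for this illustration.

```c++
#include <assert.h>

// One benchmark case: wall-clock "Time" column, in nanoseconds.
struct Timing {
  double libprotobuf_ns;
  double protozero_ns;
  double speed_of_light_ns;
};

// Values copied from the benchmark output in this section.
const Timing kPixel3Simple = {402, 242, 118};
const Timing kPixel3Nested = {1810, 780, 138};
const Timing kXeonSimple = {428, 261, 111};
const Timing kXeonNested = {1625, 843, 140};

// How many times faster ProtoZero is than libprotobuf for a given case.
inline double SpeedupOverLibprotobuf(const Timing& t) {
  assert(t.protozero_ns > 0);
  return t.libprotobuf_ns / t.protozero_ns;
}

// How many times slower ProtoZero is than the speed-of-light serializer.
inline double SlowdownVsSpeedOfLight(const Timing& t) {
  assert(t.speed_of_light_ns > 0);
  return t.protozero_ns / t.speed_of_light_ns;
}
```

With these numbers, ProtoZero comes out roughly 1.6x to 1.7x faster than libprotobuf on the simple event and roughly 1.9x to 2.3x faster on the nested event. Against the speed-of-light serializer it is about 2x slower on the simple event and about 6x slower on the nested one.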