.. _module-pw_tokenizer:

============
pw_tokenizer
============
.. pigweed-module::
   :name: pw_tokenizer

Logging is critical, but developers are often forced to choose between
adding more logging and saving crucial flash space. The ``pw_tokenizer`` module
enables **extensive logging with substantially less memory usage** by replacing
printf-style strings with binary tokens during compilation. It is designed to
integrate easily into existing logging systems.

Although the most common application of ``pw_tokenizer`` is binary logging,
**the tokenizer is general purpose and can be used to tokenize any strings**,
with or without printf-style arguments.

Why tokenize strings?

* **Dramatically reduce binary size** by removing string literals from binaries.
* **Reduce I/O traffic, RAM, and flash usage** by sending and storing compact tokens
  instead of strings. We've seen over 50% reduction in encoded log contents.
* **Reduce CPU usage** by replacing ``snprintf`` calls with simple tokenization code.
* **Remove potentially sensitive log, assert, and other strings** from binaries.

.. grid:: 1

   .. grid-item-card:: :octicon:`rocket` Get started
      :link: module-pw_tokenizer-get-started
      :link-type: ref
      :class-item: sales-pitch-cta-primary

      Integrate pw_tokenizer into your project.

.. grid:: 2

   .. grid-item-card:: :octicon:`code-square` Tokenization
      :link: module-pw_tokenizer-tokenization
      :link-type: ref
      :class-item: sales-pitch-cta-secondary

      Convert strings and arguments to tokens.

   .. grid-item-card:: :octicon:`code-square` Token databases
      :link: module-pw_tokenizer-token-databases
      :link-type: ref
      :class-item: sales-pitch-cta-secondary

      Store a mapping of tokens to the strings and arguments they represent.

.. grid:: 2

   .. grid-item-card:: :octicon:`code-square` Detokenization
      :link: module-pw_tokenizer-detokenization
      :link-type: ref
      :class-item: sales-pitch-cta-secondary

      Expand tokens back to the strings and arguments they represent.

   .. grid-item-card:: :octicon:`info` API reference
      :link: module-pw_tokenizer-api
      :link-type: ref
      :class-item: sales-pitch-cta-secondary

      Detailed reference information about the pw_tokenizer API.


.. _module-pw_tokenizer-tokenized-logging-example:

---------------------------
Tokenized logging in action
---------------------------
Here's an example of how ``pw_tokenizer`` enables you to store
and send the same logging information using significantly fewer
resources:

.. mermaid::

   flowchart TD

     subgraph after["After: Tokenized Logs (37 bytes saved!)"]
       after_log["LOG(#quot;Battery Voltage: %d mV#quot;, voltage)"] -- 4 bytes stored on-device as... -->
       after_encoding["d9 28 47 8e"] -- 6 bytes sent over the wire as... -->
       after_transmission["d9 28 47 8e aa 3e"] -- Displayed in logs as... -->
       after_display["#quot;Battery Voltage: 3989 mV#quot;"]
     end

     subgraph before["Before: No Tokenization"]
       before_log["LOG(#quot;Battery Voltage: %d mV#quot;, voltage)"] -- 41 bytes stored on-device as... -->
       before_encoding["#quot;Battery Voltage: %d mV#quot;"] -- 43 bytes sent over the wire as... -->
       before_transmission["#quot;Battery Voltage: 3989 mV#quot;"] -- Displayed in logs as... -->
       before_display["#quot;Battery Voltage: 3989 mV#quot;"]
     end

     style after stroke:#00c852,stroke-width:3px
     style before stroke:#ff5252,stroke-width:3px

A quick overview of how the tokenized version works:

* You tokenize ``"Battery Voltage: %d mV"`` with a macro like
  :c:macro:`PW_TOKENIZE_STRING`. You can use :ref:`module-pw_log_tokenized`
  to handle the tokenization automatically.
* After tokenization, ``"Battery Voltage: %d mV"`` becomes ``d9 28 47 8e``.
* The first 4 bytes sent over the wire are the token for
  ``"Battery Voltage: %d mV"``. The last 2 bytes are the value of ``voltage``
  converted to a varint using :ref:`module-pw_varint`.
* The logs are converted back to the original, human-readable message
  via the :ref:`Detokenization API <module-pw_tokenizer-detokenization>`
  and a :ref:`token database <module-pw_tokenizer-token-databases>`.

.. toctree::
   :hidden:
   :maxdepth: 1

   Get started <get_started>
   tokenization
   token_databases
   detokenization
   API reference <api>

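
The six-byte message from the tokenized logging overview above can be
reproduced with a short, self-contained Python sketch. It does not use the
``pw_tokenizer`` API: the function names and the in-memory ``TOKEN_DATABASE``
dictionary are hypothetical stand-ins, and the token value is copied from the
example rather than computed by hashing the string. The sketch also assumes
that the argument is ZigZag-encoded before the varint step, which is
consistent with the ``aa 3e`` bytes shown for a voltage of 3989.

.. code-block:: python

   import struct

   # Token copied from the example above: the bytes d9 28 47 8e are the
   # 32-bit value 0x8e4728d9 stored in little-endian order.
   BATTERY_TOKEN = 0x8E4728D9

   # Hypothetical stand-in for a token database (token -> format string).
   TOKEN_DATABASE = {BATTERY_TOKEN: "Battery Voltage: %d mV"}


   def zigzag_encode(value: int) -> int:
       """Maps signed integers to unsigned ones: 0, -1, 1, -2 -> 0, 1, 2, 3."""
       return value * 2 if value >= 0 else -value * 2 - 1


   def varint_encode(value: int) -> bytes:
       """Encodes an unsigned integer 7 bits at a time, low bits first."""
       out = bytearray()
       while True:
           byte = value & 0x7F
           value >>= 7
           if value:
               out.append(byte | 0x80)  # More bytes follow.
           else:
               out.append(byte)  # Final byte: continuation bit clear.
               return bytes(out)


   def encode_message(token: int, arg: int) -> bytes:
       """Builds the wire format: 4-byte token, then the varint argument."""
       return struct.pack("<I", token) + varint_encode(zigzag_encode(arg))


   def decode_message(data: bytes) -> str:
       """Looks up the token and formats the decoded argument into it."""
       (token,) = struct.unpack_from("<I", data)
       raw = 0
       for i, byte in enumerate(data[4:]):
           raw |= (byte & 0x7F) << (7 * i)
           if not byte & 0x80:
               break
       value = (raw >> 1) ^ -(raw & 1)  # Undo the ZigZag mapping.
       return TOKEN_DATABASE[token] % value


   message = encode_message(BATTERY_TOKEN, 3989)
   print(message.hex(" "))         # d9 28 47 8e aa 3e
   print(decode_message(message))  # Battery Voltage: 3989 mV

In a real project, these steps are handled by the tokenization macros and the
detokenization tools described in the pages linked above.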