# Emboss User Guide


## Getting Started

First, you must identify a data structure you want to read and write.  These are
often documented in hardware manuals a bit like [this one, for the fictional
BN-P-6000404 illuminated button panel](BogoNEL_BN-P-6000404_User_Guide.pdf).  We
will use the BN-P-6000404 as an example.


### A Caution

Emboss is still beta software.  While we believe that we will not need to make
any more breaking changes before 1.0, you may still encounter bugs and there are
many missing features.

You can contact `[email protected]` with any issues.  Emboss is not an
officially supported Google product, but the Emboss authors will try to answer
emails.


### System Requirements

#### Running the Emboss Compiler

The Emboss compiler requires Python 3.8 or later -- the minimum supported
version tracks the support timeline of the Python project.  On a Linux-like
system with Python 3 installed in the usual place (`/usr/bin/python3`), you
can run the `embossc` script at the top level on an `.emb` file to generate
C++, like so:

```
embossc --generate cc --output-path path/to/object/dir path/to/input.emb
```

If your project is using Bazel, the `build_defs.bzl` file has an
`emboss_cc_library` rule that you can use from your project.


#### Using the Generated C++ Code

The code generated by Emboss requires a C++11-compliant compiler, and a
reasonably up-to-date standard library.  Emboss has been tested with GCC and
Clang, libc++ and libstdc++.  In theory, it should work with MSVC, ICC, etc.,
but it has not been tested, so there are likely to be bugs.

The generated C++ code lives entirely in a `.h` file, one per `.emb` file.  All
of the generated code is in C++ templates or (in a very few cases) `inline`
functions.  The generated code is structured this way in order to implement
"pay-as-you-use" for code size: any functions, methods, or views that are not
used by your code won't end up in your final binary.  This is often important
for environments like microcontrollers!

There is an Emboss runtime library (under `runtime/cpp`), which is also
header-only.  You will need to add the root of the Emboss source tree to your
`#include` path.

Note that it is *strongly* recommended that you compile your release code with
at least some optimizations: `-Os` or `-O2`.  The Emboss generated code leans
fairly heavily on your C++ compiler's inlining and common code elimination to
produce fast, lean compiled code.


#### Contributing to the Compiler

If you want to contribute features or bugfixes to the Emboss compiler itself,
you will need Bazel to run the Emboss test suite.


### Create an `.emb` file

Next, you will need to translate your structures.

```
[$default byte_order: "LittleEndian"]
[(cpp) namespace: "bogonel::bnp6000404"]
```

The BN-P-6000404 uses little-endian numbers, so we can set the default byte
order to `LittleEndian`.  There is no particular C++ namespace implied by the
BN-P-6000404 user guide, so we use one that is specific to the BN-P-6000404.

The BN-P-6000404, like many devices with serial interfaces, uses a framed
message system, with a fixed header and a variable message body depending on a
message ID.  For the BN-P-6000404, this framing looks like this:

<!-- TODO(bolms): finalize the "magic value initialization" feature, document it
here.  -->

```
struct Message:
  -- Top-level message structure, specified in section 5.3 of the BN-P-6000404
  -- user guide.

  0 [+1]  UInt       sync_1
    [requires: this == 0x42]

  1 [+1]  UInt       sync_2
    [requires: this == 0x4E]

  2 [+1]  MessageId  message_id
    -- Type of message

  3 [+1]  UInt       message_length (ml)
    -- Length of message, including header and checksum

  # ... body fields to follow ...
```

We could have chosen to put the header fields into a separate `Header` structure
instead of placing them directly in the `Message` structure.

The `sync_1` and `sync_2` fields are required to have specific magic values, so
we add the appropriate `[requires: ...]` attributes to them.  This tells Emboss
that if those fields do not have those values, then the `Message` `struct` is
ill-formed: in the client code, the `Message` will not be `Ok()` if those fields
have the wrong values, and Emboss will not allow wrong values to be written into
those fields using the checked (default) APIs.
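
To make the effect of the two `[requires: ...]` attributes concrete, here is a
minimal hand-written sketch of the same check in plain C++.  It is not the code
that Emboss generates; `HasValidSyncBytes`, `kSync1`, and `kSync2` are
hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical constants mirroring the magic values in the .emb file.
constexpr std::uint8_t kSync1 = 0x42;  // ASCII 'B'
constexpr std::uint8_t kSync2 = 0x4E;  // ASCII 'N'

// Returns true only if the buffer is long enough to hold both sync fields and
// both hold their required values.
bool HasValidSyncBytes(const std::uint8_t *bytes, std::size_t size) {
  if (size < 2) return false;  // Not enough bytes for sync_1 and sync_2.
  return bytes[0] == kSync1 && bytes[1] == kSync2;
}
```

With Emboss, this kind of check is folded into `Ok()`, so client code does not
need to repeat it by hand.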

Unfortunately, BogoNEL does not provide a nice table of message IDs, but
fortunately there are only a few, so we can gather them from the individual
messages:

```
enum MessageId:
  -- Message type identifiers for the BN-P-6000404.
  IDENTIFICATION       = 0x01
  INTERACTION          = 0x02
  QUERY_IDENTIFICATION = 0x10
  QUERY_BUTTONS        = 0x11
  SET_ILLUMINATION     = 0x12
```

Next, we should translate the individual messages to Emboss.

```
struct Identification:
  -- IDENTIFICATION message, specified in section 5.3.3.

  0 [+4]  UInt       vendor
    # 0x4F474F42 is "BOGO" in ASCII, interpreted as a 4-byte little-endian
    # value.
    [requires: this == 0x4F47_4F42]

  0 [+4]  UInt:8[4]  vendor_ascii
    -- "BOGO" for BogoNEL Corp
    # The `vendor` field really contains the four ASCII characters "BOGO", so we
    # could use a byte array instead of a single UInt.  Since it is valid to
    # have overlapping fields, we can have both `vendor` and `vendor_ascii` in
    # our Emboss specification.

  4 [+2]  UInt       firmware_major
    -- Firmware major version

  6 [+2]  UInt       firmware_minor
    -- Firmware minor version
```

<!-- TODO(bolms): fixed-length, ASCIIZ, and variable-length string support? -->

The `Identification` structure is fairly straightforward.  In this case, we
provide an alternate view of the `vendor` field via `vendor_ascii`: 0x4F474F42
in little-endian works out to the ASCII characters "BOGO".
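
The relationship between the two overlapping fields can be checked with a few
lines of plain C++.  This is only a sketch of the byte arithmetic;
`ReadLittleEndian32` is a hypothetical helper, not part of the Emboss runtime.

```cpp
#include <cstdint>

// Assembles four bytes into a 32-bit value, treating bytes[0] as the least
// significant byte (little-endian), as the default byte order in the .emb
// file requests.
std::uint32_t ReadLittleEndian32(const unsigned char *bytes) {
  return static_cast<std::uint32_t>(bytes[0]) |
         (static_cast<std::uint32_t>(bytes[1]) << 8) |
         (static_cast<std::uint32_t>(bytes[2]) << 16) |
         (static_cast<std::uint32_t>(bytes[3]) << 24);
}
```

Feeding the bytes `'B'`, `'O'`, `'G'`, `'O'` (0x42, 0x4F, 0x47, 0x4F) to this
helper yields 0x4F474F42, the value required of `vendor`.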

Note that `vendor_ascii` uses `UInt:8[4]` for its type, and not `UInt[4]`.  For
most fields, we can use plain `UInt` and Emboss will figure out how big the
`UInt` should be, but for an array we must be explicit that we want 8-bit
elements.

```
struct Interaction:
  -- INTERACTION message, specified in section 5.3.4.

  0 [+1]  UInt           number_of_buttons (n)
    -- Number of buttons currently depressed by user

  4 [+n]  ButtonId:8[n]  button_id
    -- ID of pressed button.  A number of entries equal to number_of_buttons
    -- will be provided.
```

<!-- TODO(bolms): reserved field support -->

`Interaction` is also fairly straightforward.  The only tricky bit is the
`button_id` field: since `Interaction` can return a variable number of button
IDs, depending on how many buttons are currently pressed, the `button_id` field
must have length `n`.  It would have been OK to use `[+number_of_buttons]`, but
full field names can get cumbersome, particularly when the length involves a
more complex expression.  Instead, we set an *alias* for `number_of_buttons`
using `(n)`, and then use the alias in `button_id`'s length.  The `n` alias is
not visible outside of the `Interaction` message, and won't be available in the
generated code, so the short name is not likely to cause confusion.
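
For comparison, the following sketch shows roughly what reading this
variable-length layout by hand would involve.  `ReadButtonIds` is a
hypothetical function written for this guide, and it treats button IDs as raw
bytes rather than as the generated `ButtonId` enum.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hand-written sketch of the Interaction layout: a count `n` at offset 0 and
// `n` one-byte button IDs starting at offset 4.  Returns false if the buffer
// is too short to hold the count or all `n` IDs.
bool ReadButtonIds(const std::uint8_t *body, std::size_t size,
                   std::vector<std::uint8_t> *button_ids) {
  constexpr std::size_t kIdsOffset = 4;
  if (size < 1) return false;
  const std::size_t n = body[0];  // number_of_buttons
  if (size < kIdsOffset + n) return false;
  button_ids->assign(body + kIdsOffset, body + kIdsOffset + n);
  return true;
}
```

The `[+n]` length in the `.emb` file expresses the same bounds relationship
declaratively, so the offset and length arithmetic does not have to be
repeated in client code.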

```
enum ButtonId:
  -- Button IDs, specified in table 5-6.
  BUTTON_A = 0x00
  BUTTON_B = 0x04
  BUTTON_C = 0x08
  BUTTON_D = 0x0C
  BUTTON_E = 0x01
  BUTTON_F = 0x05
  BUTTON_G = 0x09
  BUTTON_H = 0x0D
  BUTTON_I = 0x02
  BUTTON_J = 0x06
  BUTTON_K = 0x0A
  BUTTON_L = 0x0E
  BUTTON_M = 0x03
  BUTTON_N = 0x07
  BUTTON_O = 0x0B
  BUTTON_P = 0x0F
```

We had to prefix all of the button names with `BUTTON_` because Emboss does not
allow single-character enum names.

The QUERY IDENTIFICATION and QUERY BUTTONS messages don't have any fields other
than `checksum`, so we will handle them a bit differently.

```
struct SetIllumination:
  -- SET ILLUMINATION message, specified in section 5.3.7.

  0 [+1]    bits:
    0 [+1]  Flag  red_channel_enable
      -- Enables setting the RED channel.

    1 [+1]  Flag  blue_channel_enable
      -- Enables setting the BLUE channel.

    2 [+1]  Flag  green_channel_enable
      -- Enables setting the GREEN channel.

  1 [+1]    UInt  blink_duty
      -- Sets the proportion of time between time on and time off for blink
      -- feature.
      --
      -- Minimum value = 0 (no illumination)
      --
      -- Maximum value = 240 (constant illumination)
      [requires: 0 <= this <= 240]

  2 [+2]    UInt  blink_period
      -- Sets the blink period, in milliseconds.
      --
      -- Minimum value = 10
      --
      -- Maximum value = 10000
      [requires: 10 <= this <= 10_000]

  4 [+4]    bits:
    0 [+32]  UInt:2[16]  intensity
      -- Intensity values for the unmasked channels.  2 bits of intensity for
      -- each button.
```

`SetIllumination` requires us to use bitfields.  The first bitfield is in the
CHANNEL MASK field: rather than making a single `channel_mask` field, Emboss
lets us specify the red, green, and blue channel masks separately.

As with `sync_1` and `sync_2`, we have added `[requires: ...]` to the
`blink_duty` and `blink_period` fields: this time, specifying a range of valid
values.  `[requires: ...]` accepts an arbitrary expression, which can be as
simple or as complex as desired.
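
One detail worth keeping in mind when moving between the `.emb` file and C++:
the chained form `10 <= this <= 10_000` used above does not carry over to C++
unchanged.  The sketch below shows hand-written equivalents of the two range
checks; `BlinkDutyInRange` and `BlinkPeriodInRange` are hypothetical names, not
generated code.

```cpp
#include <cstdint>

// Hand-written equivalents of the two range checks.  In C++,
// `10 <= x <= 10000` parses as `(10 <= x) <= 10000`, which is always true, so
// each bound has to be tested separately.
bool BlinkDutyInRange(std::uint8_t blink_duty) {
  // The lower bound (0) always holds for an unsigned value.
  return blink_duty <= 240;
}

bool BlinkPeriodInRange(std::uint16_t blink_period) {
  return blink_period >= 10 && blink_period <= 10000;
}
```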

It is not clear from BogoNEL's documentation whether "bit 0" means the least
significant or most significant bit of its byte, but a little experimentation
with the device shows that setting the least significant bit causes
`SetIllumination` to set its red channel.  Emboss always numbers bits in
bitfields from least significant (bit 0) to most significant.
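
As an illustration of that numbering, this sketch decodes the CHANNEL MASK byte
by hand, using the bit positions from the `.emb` file above (bit 0 red, bit 1
blue, bit 2 green).  `ChannelEnables` and `DecodeChannelMask` are hypothetical
names, not part of the generated code.

```cpp
#include <cstdint>

// Hypothetical plain struct holding the three decoded flags.
struct ChannelEnables {
  bool red;
  bool blue;
  bool green;
};

// Bit 0 is the least significant bit of the byte.
ChannelEnables DecodeChannelMask(std::uint8_t mask) {
  ChannelEnables enables;
  enables.red = ((mask >> 0) & 1u) != 0;
  enables.blue = ((mask >> 1) & 1u) != 0;
  enables.green = ((mask >> 2) & 1u) != 0;
  return enables;
}
```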

The other bitfield is the `intensity` array.  The BN-P-6000404 uses an array of
2-bit intensity values, so we specify that array.
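
To show what a `UInt:2[16]` array inside a 32-bit `bits` block amounts to, here
is a hand-written sketch of reading and writing one element.  It assumes the
32-bit word has already been read in little-endian order and that element 0
occupies the two least significant bits, following the bit numbering described
above.  `GetIntensity` and `SetIntensity` are hypothetical helpers.

```cpp
#include <cstdint>

// Reads element `index` (0..15) from sixteen 2-bit values packed into a
// 32-bit word, with element 0 in the two least significant bits.
std::uint8_t GetIntensity(std::uint32_t packed, unsigned index) {
  return static_cast<std::uint8_t>((packed >> (2 * index)) & 0x3u);
}

// Returns a copy of `packed` with element `index` replaced by the low two
// bits of `value`.
std::uint32_t SetIntensity(std::uint32_t packed, unsigned index,
                           std::uint8_t value) {
  const unsigned shift = 2 * index;
  packed &= ~(std::uint32_t{0x3} << shift);
  packed |= (static_cast<std::uint32_t>(value) & 0x3u) << shift;
  return packed;
}
```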

Finally, we should add all of the sub-messages into `Message`, and also take
care of `checksum`.  After making those changes, `Message` looks like:

```
struct Message:
  -- Top-level message structure, specified in section 5.3 of the BN-P-6000404
  -- user guide.

  0 [+1]       UInt                 sync_1
    [requires: this == 0x42]

  1 [+1]       UInt                 sync_2
    [requires: this == 0x4E]

  2 [+1]       MessageId            message_id
    -- Type of message

  3 [+1]       UInt                 message_length (ml)
    -- Length of message, including header and checksum

  if message_id == MessageId.IDENTIFICATION:
    4 [+ml-8]  Identification       identification

  if message_id == MessageId.INTERACTION:
    4 [+ml-8]  Interaction          interaction

  if message_id == MessageId.SET_ILLUMINATION:
    4 [+ml-8]  SetIllumination      set_illumination

  0 [+ml-4]    UInt:8[]             checksummed_bytes

  ml-4 [+4]    UInt                 checksum
```

By wrapping the various message types in `if message_id == ...` constructs,
those substructures will only be available when the `message_id` field is set to
the corresponding message type.  This kind of selection is used for any
structure field that is only valid some of the time.

The substructures all have the length `ml-8`.  The `ml` is a short alias for the
`message_length` field; these short aliases are available so that the field
types and names don't have to be pushed far to the right.  Aliases may only be
used directly in the same structure definition where they are created; they may
not be used elsewhere in an Emboss file, and they are not available in the
generated code.  The length is `ml-8` in this case because the `message_length`
includes the 4-byte header and the 4-byte checksum, which are left out of the
substructures.

Note that we simply don't have any subfield for QUERY IDENTIFICATION or QUERY
BUTTONS: since those messages do not have any fields, there is no need for a
zero-byte structure.

We also added the `checksummed_bytes` field as a convenience for computing the
checksum.
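
The offsets and lengths in the final `Message` definition all derive from
`message_length`.  The sketch below spells out that arithmetic in plain C++;
`FrameLayout` and `ComputeFrameLayout` are hypothetical names used only to
illustrate the layout.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical summary of where each part of a frame lives.
struct FrameLayout {
  std::size_t body_offset;        // Always 4: the body follows the header.
  std::size_t body_length;        // ml - 8: excludes header and checksum.
  std::size_t checksummed_bytes;  // ml - 4: everything before the checksum.
  std::size_t checksum_offset;    // ml - 4: the checksum is the last 4 bytes.
};

// Returns false if message_length cannot even hold the 4-byte header and the
// 4-byte checksum.
bool ComputeFrameLayout(std::uint8_t message_length, FrameLayout *layout) {
  if (message_length < 8) return false;
  layout->body_offset = 4;
  layout->body_length = message_length - 8u;
  layout->checksummed_bytes = message_length - 4u;
  layout->checksum_offset = message_length - 4u;
  return true;
}
```

For example, `SetIllumination` occupies 8 bytes, so a SET ILLUMINATION frame
has a `message_length` of 16, a body length of 8, and a checksum at offset 12.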


### Generate code

Once you have an `.emb`, you will need to generate code from it.

The simplest way to do so is to run the `embossc` tool:

```
embossc -I src --generate cc --output-path generated bogonel.emb
```

The `-I` option adds a directory to the *include path*.  The input file -- in
this case, `bogonel.emb` -- must be found somewhere on the include path.

The `--generate` option specifies which back end to use; `cc` is the C++ back
end.

The `--output-path` option specifies where the generated file should be placed.
Note that the output path will include all of the path components of the input
file: if the input file is `x/y/z.emb`, then the path `x/y/z.emb.h` will be
appended to the `--output-path`.  Missing directories will be created.
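
The output path rule can be summarized with a small helper.  This is only an
illustration of the rule as described above, not `embossc`'s implementation;
`GeneratedHeaderPath` is a hypothetical function, and it assumes `/` as the
path separator.

```cpp
#include <string>

// Joins the --output-path directory with the input file's relative path and
// appends ".h", mirroring the rule described in the text.
std::string GeneratedHeaderPath(const std::string &output_path,
                                const std::string &input_file) {
  std::string result = output_path;
  if (!result.empty() && result.back() != '/') result += '/';
  return result + input_file + ".h";
}
```

Under these assumptions, an output path of `generated` and an input file of
`x/y/z.emb` give `generated/x/y/z.emb.h`.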


<!-- #### Using Bazel -->

<!-- TODO(bolms): Make this usable from Bazel. -->


### Include the generated C++ code

Emboss generates a single C++ header file from your `.emb` by appending `.h` to
the file name: to use the BogoNEL definitions, you would `#include
"path/to/bogonel.emb.h"` in your C++ code.

Currently, Emboss does not generate a corresponding `.cc` file: the code that
Emboss generates is all templates, which exist in the `.h`.  Although the Emboss
maintainers (e.g., bolms@) like the simplicity of generating a single file, this
could change at some point.


### Use the generated C++ code

Emboss generates *views*, which your program can use to read and write existing
arrays of bytes, and which do not take ownership.  For example:

```c++
#include <cstdint>

#include "path/to/bogonel.emb.h"

// HandleBrokenMessage(), HandleIdentificationMessage(),
// HandleInteractionMessage(), and Log() are application-specific functions
// that are assumed to be defined elsewhere.

template <typename View>
bool ChecksumIsCorrect(View message_view);

// Handles BogoNEL BN-P-6000404 device messages from a byte stream.  Returns
// the number of bytes that were processed.  Unprocessed bytes should be
// passed into the next call.
int HandleBogonelPanelMessages(const char *bytes, int byte_count) {
  auto message_view = bogonel::bnp6000404::MakeMessageView(bytes, byte_count);

  // IsComplete() will return true if the view has enough bytes to fully
  // contain the message; i.e., that byte_count is at least
  // message_view.message_length().Read(), which already counts the header and
  // the checksum.
  if (!message_view.IsComplete()) {
    return 0;
  }

  // If Emboss is happy with the message, we still need to check the checksum:
  // Emboss does not (yet) have support for automatically checking checksums and
  // CRCs.
  if (!message_view.Ok() || !ChecksumIsCorrect(message_view)) {
    // If the message is complete, but not correct, we need to log an error.
    HandleBrokenMessage(message_view);
    return message_view.Size();
  }

  // At this point, we know the message is complete and (basically) OK, so
  // we dispatch it to a message-type-specific handler.
  switch (message_view.message_id().Read()) {
    case bogonel::bnp6000404::MessageId::IDENTIFICATION:
      HandleIdentificationMessage(message_view);
      break;

    case bogonel::bnp6000404::MessageId::INTERACTION:
      HandleInteractionMessage(message_view);
      break;

    case bogonel::bnp6000404::MessageId::QUERY_IDENTIFICATION:
    case bogonel::bnp6000404::MessageId::QUERY_BUTTONS:
    case bogonel::bnp6000404::MessageId::SET_ILLUMINATION:
      Log("Unexpected host to device message type.");
      break;

    default:
      Log("Unknown message type.");
      break;
  }

  return message_view.Size();
}

template <typename View>
bool ChecksumIsCorrect(View message_view) {
  // Sum every byte covered by `checksummed_bytes` and compare the total with
  // the stored `checksum` field.
  uint32_t checksum = 0;
  for (int i = 0; i < message_view.checksummed_bytes().ElementCount(); ++i) {
    checksum += message_view.checksummed_bytes()[i].Read();
  }
  return checksum == message_view.checksum().Read();
}
```

<!-- TODO(bolms): solidify support for checksums, so that the Ok() call in the
example actually checks them. -->

The `message_view` object in this example is a lightweight object that simply
provides *access* to the underlying `bytes` buffer.  Emboss views are very cheap
to construct because they only contain a couple of pointers and a length -- they
do not copy or take ownership of the underlying bytes.  This also means that you
have to keep the underlying bytes alive as long as you are using a view -- you
can't let them go out of scope or delete them.
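
The ownership model can be illustrated with a deliberately tiny stand-in.
`ByteView` below is a hypothetical class written for this guide, not the
Emboss-generated view type; it only shows that a view is a pointer plus a
length, and that copies of a view refer to the same storage.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal non-owning view: it stores a pointer and a length, and
// never allocates, copies, or frees the bytes it refers to.
class ByteView {
 public:
  ByteView(char *data, std::size_t size) : data_(data), size_(size) {}

  std::size_t size() const { return size_; }
  char Read(std::size_t i) const { return data_[i]; }
  // Writes go straight through to the caller's buffer.
  void Write(std::size_t i, char value) const { data_[i] = value; }

 private:
  char *data_;
  std::size_t size_;
};
```

Because such a view holds only a pointer and a length, the buffer it was
constructed from (for example a `std::vector<char>`) must outlive every view
and every copy of a view that refers to it.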

Views can also be used for writing, if they are given pointers to mutable
memory:

```c++
#include <vector>

#include "path/to/bogonel.emb.h"

// `lit_buttons` is expected to hold one entry per button (16 in total).
void ConstructSetIlluminationMessage(const std::vector<bool> &lit_buttons,
                                     std::vector<char> *result) {
  // The SetIllumination message has a constant size, so SizeInBytes() is
  // available as a static method.  The extra 8 bytes are the 4-byte header
  // and the 4-byte checksum.
  int length = bogonel::bnp6000404::SetIllumination::SizeInBytes() + 8;
  result->clear();
  result->resize(length);

  auto view = bogonel::bnp6000404::MakeMessageView(result);
  view.sync_1().Write(0x42);
  view.sync_2().Write(0x4E);
  view.message_id().Write(bogonel::bnp6000404::MessageId::SET_ILLUMINATION);
  view.message_length().Write(length);
  view.set_illumination().red_channel_enable().Write(true);
  view.set_illumination().blue_channel_enable().Write(true);
  view.set_illumination().green_channel_enable().Write(true);
  view.set_illumination().blink_duty().Write(240);
  view.set_illumination().blink_period().Write(10000);
  for (int i = 0; i < view.set_illumination().intensity().ElementCount();
       ++i) {
    view.set_illumination().intensity()[i].Write(lit_buttons[i] ? 3 : 0);
  }
  // The checksum field still has to be computed and written before the
  // message is sent.
}
```
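
The writing example leaves the `checksum` field unset.  The sketch below shows
one way it could be filled in by hand, assuming the simple additive checksum
used by `ChecksumIsCorrect` earlier in this guide and a little-endian 4-byte
checksum field at the end of the frame.  `SumBytes` and `StoreChecksum` are
hypothetical helpers that operate on raw bytes rather than on an Emboss view.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sums the first `count` bytes of the frame, mirroring the loop in
// ChecksumIsCorrect.
std::uint32_t SumBytes(const std::vector<std::uint8_t> &frame,
                       std::size_t count) {
  std::uint32_t sum = 0;
  for (std::size_t i = 0; i < count; ++i) sum += frame[i];
  return sum;
}

// Writes the checksum into the last 4 bytes of the frame, least significant
// byte first.  Returns false if the frame cannot hold a header and checksum.
bool StoreChecksum(std::vector<std::uint8_t> *frame) {
  if (frame->size() < 8) return false;
  const std::size_t checksum_offset = frame->size() - 4;
  const std::uint32_t sum = SumBytes(*frame, checksum_offset);
  for (std::size_t i = 0; i < 4; ++i) {
    (*frame)[checksum_offset + i] =
        static_cast<std::uint8_t>((sum >> (8 * i)) & 0xFFu);
  }
  return true;
}
```

For an 8-byte QUERY IDENTIFICATION frame with header bytes 0x42, 0x4E, 0x10,
0x08, the sum is 0xA8, so the last four bytes become 0xA8, 0x00, 0x00, 0x00.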


### Use the `.emb` Autoformatter

You can use the `.emb` autoformatter to avoid manual formatting.  For now, it is
available at `compiler/front_end/format.py`.

*TODO(bolms): Package the Emboss tools for easy workstation installation.*
486