# Emboss Language Reference

## Top Level Structure

An `.emb` file contains four sections: a documentation block, imports, an
attribute block containing attributes that apply to the whole module, and a
list of type definitions:

```
# Documentation block (optional)
-- This is an example of an .emb file, with every section.

# Imports (optional)
import "other.emb" as other
import "project/more.emb" as project_more

# Attribute block (optional)
[$default byte_order: "LittleEndian"]
[(cpp) namespace: "foo::bar::baz"]
[(java) namespace: "com.example.foo.bar.baz"]

# Type definitions
enum Foo:
  ONE    = 1
  TEN    = 10
  PURPLE = 12

struct Bar:
  0 [+4]  Foo       purple
  4 [+4]  UInt      payload_size (s)
  8 [+s]  UInt:8[]  payload
```

The documentation block, imports, and attribute block may each be omitted if
they are not needed.


### Comments

Comments start with `#` and extend to the end of the line:

```
struct Foo:  # This is a comment
  # This is a comment
  0 [+1]  UInt  field  # This is a comment
```

Comments are ignored.  They should not be confused with
[*documentation*](#documentation), which is intended to be used by some back
ends.


## Documentation

Documentation blocks may be attached to modules, types, fields, or enum values.
They are different from comments in that they will be used by the
(not-yet-ready) documentation generator back-end.

Documentation blocks take the form of any number of lines starting with `-- `:

```
-- This is a module documentation block.  Text in this block will be attached to
-- the module as documentation.
--
-- This is a new paragraph in the same module documentation block.
--
-- Module-level documentation should describe the purpose of the module, and may
-- point out the most salient features of the module.

struct Message:
  -- This is a documentation block attached to the Message structure.  It should
  -- describe the purpose of Message, and how it should be used.
  0 [+4]  UInt         header_length
    -- This is documentation for the header_length field.  Again, it should
    -- describe this specific field.
  4 [+4]  MessageType  message_type  -- Short docs can go on the same line.
```

Documentation should be written in CommonMark format, ignoring the leading
`-- `.


## Imports

An `import` line tells Emboss to read another `.emb` file and make its types
available to the current file under the given name.  For example, given the
import line:

```
import "other.emb" as helper
```

then the type `Type` from `other.emb` may be referenced as `helper.Type`.

The `--import-dir` command-line flag tells Emboss which directories to search
for imported files; it may be specified multiple times.  If no `--import-dir` is
specified, Emboss will search the current working directory.


## Attributes

Attributes are an extensible way of adding arbitrary information to a module,
type, field, or enum value.  Currently, only whitelisted attributes are allowed
by the Emboss compiler, but this may change in the future.

Attributes take a form like:

```
[name: value]            # name has value for the current entity.
[$default name: value]   # Default name to value for all sub-entities.
[(backend) name: value]  # Attribute for a specific back end.
```


### `byte_order`

The `byte_order` attribute is used to specify the byte order of `bits` fields
and of fields with an atomic type, such as `UInt`.

`byte_order` takes a string value, which must be one of `"BigEndian"`,
`"LittleEndian"`, or `"Null"`:

```
[$default byte_order: "LittleEndian"]

struct Foo:
  [$default byte_order: "Null"]

  0 [+4]  UInt  bar
    [byte_order: "BigEndian"]

  4 [+4]  bits:
    [byte_order: "LittleEndian"]

    0  [+23]  UInt  baz
    23 [+9]   UInt  qux

  8 [+1]  UInt  froble
```

A `$default` byte order may be set on a module or structure.

The `"BigEndian"` and `"LittleEndian"` byte orders set the byte order to big or
little endian, respectively.  That is, for little endian:

```
  byte 0   byte 1   byte 2   byte 3
+--------+--------+--------+--------+
|76543210|76543210|76543210|76543210|
+--------+--------+--------+--------+
 ^      ^ ^      ^ ^      ^ ^      ^
 07    00 15    08 23    16 31    24
 ^^^^^^^^^^^^^^^ bit ^^^^^^^^^^^^^^^
```

And for big endian:

```
  byte 0   byte 1   byte 2   byte 3
+--------+--------+--------+--------+
|76543210|76543210|76543210|76543210|
+--------+--------+--------+--------+
 ^      ^ ^      ^ ^      ^ ^      ^
 31    24 23    16 15    08 07    00
 ^^^^^^^^^^^^^^^ bit ^^^^^^^^^^^^^^^
```

The `"Null"` byte order is used if no `byte_order` attribute is specified.
`"Null"` indicates that the byte order is unknown; it is an error if a
byte-order-dependent field that is not exactly 8 bits has the `"Null"` byte
order.

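The two bit-numbering diagrams can be read as a decoding rule.  The following is
a minimal hand-written C++ sketch of that rule, not code generated by Emboss; the
function names `ReadLittleEndian` and `ReadBigEndian` are made up for this
illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hand-written illustration; not code generated by Emboss.

// Decodes `size` bytes (1 to 8) as a little-endian unsigned integer:
// byte 0 holds bits 0-7, byte 1 holds bits 8-15, and so on.
inline std::uint64_t ReadLittleEndian(const std::uint8_t* bytes,
                                      std::size_t size) {
  std::uint64_t value = 0;
  for (std::size_t i = 0; i < size; ++i) {
    value |= static_cast<std::uint64_t>(bytes[i]) << (8 * i);
  }
  return value;
}

// Decodes `size` bytes (1 to 8) as a big-endian unsigned integer:
// byte 0 holds the highest-order bits.
inline std::uint64_t ReadBigEndian(const std::uint8_t* bytes,
                                   std::size_t size) {
  std::uint64_t value = 0;
  for (std::size_t i = 0; i < size; ++i) {
    value = (value << 8) | bytes[i];
  }
  return value;
}
```

For a single byte the two functions agree, which matches the rule that only
fields that are not exactly 8 bits need a byte order other than `"Null"`.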

### `requires`

The `requires` attribute may be placed on an atomic field (e.g., type `UInt`,
`Int`, `Flag`, etc.) to specify a predicate that values of that field must
satisfy, or on a `struct` or `bits` to specify relationships between fields that
must be satisfied.

```
struct Foo:
  [requires: bar < qux]

  0 [+4]  UInt  bar
    [requires: this <= 999_999_999]

  4 [+4]  UInt  qux
    [requires: 100 <= this <= 1_000_000_000]

  let bar_plus_qux = bar + qux
    [requires: this >= 199]
```

For `[requires]` on a field, other fields may not be referenced, and the value
of the current field must be referred to as `this`.

For `[requires]` on a `struct` or `bits`, any atomic field in the structure may
be referenced.

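As a rough mental model, the four `[requires]` predicates in the `Foo` example
combine into a single validity check.  The sketch below is hand-written C++, not
Emboss output; `FooValues` and `FooRequirementsHold` are hypothetical names used
only here.

```cpp
#include <cstdint>

// Hypothetical plain-C++ mirror of the `Foo` example; not generated by Emboss.
struct FooValues {
  std::uint32_t bar;
  std::uint32_t qux;
};

// Returns true when every `[requires]` predicate from the example holds.
inline bool FooRequirementsHold(const FooValues& foo) {
  // Widen before adding so that the sum cannot wrap around.
  const std::uint64_t bar_plus_qux =
      static_cast<std::uint64_t>(foo.bar) + foo.qux;
  return foo.bar <= 999999999u &&                      // on field `bar`
         100u <= foo.qux && foo.qux <= 1000000000u &&  // on field `qux`
         bar_plus_qux >= 199u &&                       // on `bar_plus_qux`
         foo.bar < foo.qux;                            // on `struct Foo`
}
```

With these predicates, `bar = 99, qux = 100` passes, while `bar = 98, qux = 100`
fails because the sum is below 199.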

### `(cpp) namespace`

The `namespace` attribute is used by the C++ back end to determine which
namespace to place the generated code in:

```
[(cpp) namespace: "foo::bar::baz"]
```

A leading `::` is allowed, but not required; the previous example could also be
written as:

```
[(cpp) namespace: "::foo::bar::baz"]
```

Internally, Emboss will translate either of these into a nested `namespace foo {
namespace bar { namespace baz { ... } } }` wrapping the generated C++ code for
this module.

The `namespace` attribute may only be used at the module level; all structures
and enums within a module will be placed in the same namespace.

### `(cpp) enum_case`

The `enum_case` attribute tells the C++ back end which case to use when
emitting enum values in generated source. It does not change the text
representation, which always uses the original Emboss definition name as the
canonical name.

Currently, the supported cases are `SHOUTY_CASE` and `kCamelCase`.

A `$default` enum case can be set on a module, struct, bits, or enum and
applies to all enum values within that module, struct, bits, or enum
definition.

For example, to use `kCamelCase` by default for all enum values in a module:

```
[$default enum_case: "kCamelCase"]
```

This will change enum names like `UPPER_CHANNEL_RANGE_LIMIT` to
`kUpperChannelRangeLimit` in the C++ source for all enum values in the module.
Multiple case names can be specified, which is especially useful when
transitioning between two cases:

```
[enum_case: "SHOUTY_CASE, kCamelCase"]
```

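The renaming from `SHOUTY_CASE` to `kCamelCase` can be pictured as the string
transform below.  This is a hand-written sketch with a made-up helper name
(`ShoutyToKCamel`); Emboss's actual conversion may handle digits or unusual
names differently.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper, not part of Emboss: converts a SHOUTY_CASE name to
// the corresponding kCamelCase name.
inline std::string ShoutyToKCamel(const std::string& shouty) {
  std::string result = "k";
  bool start_of_word = true;
  for (unsigned char c : shouty) {
    if (c == '_') {
      start_of_word = true;  // Drop the underscore; capitalize what follows.
    } else if (start_of_word) {
      result += static_cast<char>(std::toupper(c));
      start_of_word = false;
    } else {
      result += static_cast<char>(std::tolower(c));
    }
  }
  return result;
}
```
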
### `text_output`

The `text_output` attribute may be attached to a `struct` or `bits` field to
control whether or not the field is included when emitting the text format
version of the structure.  For example:

```
struct SuppressedField:
  0 [+1]  UInt  a
  1 [+1]  UInt  b
    [text_output: "Skip"]
```

The text format output (as from `emboss::WriteToString()` in C++) would be of
the form:

```
{ a: 1 }
```

instead of the default:

```
{ a: 1, b: 2 }
```

For completeness, `[text_output: "Emit"]` may be used to explicitly specify that
a field should be included in text output.


### `external` specifier attributes

The `addressable_unit_size`, `type_requires`, `fixed_size_in_bits`, and
`is_integer` attributes are used on `external` types to tell the compiler what
it needs to know about the `external` types.  They are currently
unstable, and should only be used internally.


## Type Definitions

Emboss allows you to define structs, bits, and enums, and uses externals to
define "basic types."  Types may be defined in any order, and may freely
reference other types in the same module or any imported modules (including the
implicitly-imported prelude).

### `struct`

A `struct` defines a view of a sequence of bytes.  Each field of a `struct` is a
view of some particular subsequence of the `struct`'s bytes, whose
interpretation is determined by the field's type.

For example:

```
struct FramedMessage:
  -- A FramedMessage wraps a Message with magic bytes, lengths, and CRC.
  [$default byte_order: "LittleEndian"]
  0   [+4]  UInt     magic_value
  4   [+4]  UInt     header_length (h)
  8   [+4]  UInt     message_length (m)
  h   [+m]  Message  message
  h+m [+4]  UInt     crc32
    [byte_order: "BigEndian"]
```

The first line introduces the `struct` and gives it a name.  This name may be
used in field definitions to specify that the field has a structured type, and
is used in the generated code.  For example, to read the `message_length` from a
sequence of bytes in C++, you would construct a `FramedMessageView` over the
bytes:

```c++
// vector<uint8_t> bytes;
auto framed_message_view = FramedMessageView(&bytes[0], bytes.size());
uint32_t message_length = framed_message_view.message_length().Read();
```

(Note that the `FramedMessageView` does not take ownership of the bytes: it only
provides a view of them.)

Each field starts with a byte range (`0 [+4]`) that indicates *where* the field
sits in the struct.  For example, the `magic_value` field covers the first four
bytes of the struct.

Field locations *do not have to be constants*.  In the example above, the
`message` field starts at the end of the header (as determined by the
`header_length` field) and covers `message_length` bytes.

After the field's location is the field's *type*.  The type determines how the
field's bytes are interpreted: the `header_length` field will be interpreted as
an unsigned integer (`UInt`), while the `message` field is interpreted as a
`Message` -- another `struct` type defined elsewhere.

After the type is the field's *name*: this is a name used in the generated code
to access that field, as in `framed_message_view.message_length()`.  The name
may be followed by an optional *abbreviation*, like the `(h)` after
`header_length`.  The abbreviation can be used elsewhere in the `struct`, but is
not available in the generated code: `framed_message_view.h()` wouldn't compile.

Finally, fields may have attributes and documentation, just like any other
Emboss construct.

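To make the non-constant field locations in `FramedMessage` concrete, the
following hand-written C++ sketch computes where `message` and `crc32` would sit
for a given buffer.  It is not Emboss-generated code; `FramedMessageLayout`,
`ReadUInt32Le`, and `LocateFields` are hypothetical names, and the sketch does no
bounds checking.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hand-written sketch of the FramedMessage layout; not Emboss output.
struct FramedMessageLayout {
  std::size_t message_offset;  // h
  std::size_t message_size;    // m
  std::size_t crc32_offset;    // h + m
};

// Reads a 4-byte little-endian unsigned integer starting at `offset`.
inline std::uint32_t ReadUInt32Le(const std::vector<std::uint8_t>& bytes,
                                  std::size_t offset) {
  return static_cast<std::uint32_t>(bytes[offset]) |
         static_cast<std::uint32_t>(bytes[offset + 1]) << 8 |
         static_cast<std::uint32_t>(bytes[offset + 2]) << 16 |
         static_cast<std::uint32_t>(bytes[offset + 3]) << 24;
}

// Computes where the dynamically placed fields sit.  The caller must supply
// at least 12 bytes; this sketch performs no bounds checking.
inline FramedMessageLayout LocateFields(
    const std::vector<std::uint8_t>& bytes) {
  const std::uint32_t h = ReadUInt32Le(bytes, 4);  // header_length
  const std::uint32_t m = ReadUInt32Le(bytes, 8);  // message_length
  return {h, m, static_cast<std::size_t>(h) + m};
}
```

With `header_length` 12 and `message_length` 16, `message` covers bytes 12
through 27 and `crc32` starts at offset 28.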

#### `$next`

The keyword `$next` may be used in the offset expression of a physical field:

```
struct Foo:
  0     [+4]  UInt  x
  $next [+2]  UInt  y
  $next [+1]  UInt  z
  $next [+4]  UInt  q
```

`$next` translates to a built-in constant meaning "the end of the previous
physical field."  In the example above, `y` will start at offset 4 (0 + 4), `z`
starts at offset 6 (4 + 2), and `q` at 7 (6 + 1).

`$next` may be used in `bits` as well as `struct`s:

```
bits Bar:
  0     [+4]  UInt  x
  $next [+2]  UInt  y
  $next [+1]  UInt  z
  $next [+4]  UInt  q
```

You may use `$next` like a regular variable.  For example, if you want to leave
a two-byte gap between `z` and `q` (so that `q` starts at offset 9):

```
struct Foo:
  0       [+4]  UInt  x
  $next   [+2]  UInt  y
  $next   [+1]  UInt  z
  $next+2 [+4]  UInt  q
```

`$next` is particularly useful if your datasheet defines structures as lists of
fields without offsets, or if you are translating from a C or C++ packed
`struct`.

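The offsets that `$next` produces are a running sum of the preceding field
sizes.  A small hand-written C++ sketch of that arithmetic, using a hypothetical
helper named `OffsetsFromSizes`:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: given the sizes of consecutive fields, returns the
// offset each field would get if every field after the first used `$next`.
inline std::vector<std::size_t> OffsetsFromSizes(
    const std::vector<std::size_t>& sizes) {
  std::vector<std::size_t> offsets;
  std::size_t next = 0;  // The first field starts at offset 0.
  for (std::size_t size : sizes) {
    offsets.push_back(next);
    next += size;  // `$next` is the end of the field just placed.
  }
  return offsets;
}
```

For the sizes 4, 2, 1, and 4 from the first example, this yields offsets 0, 4,
6, and 7.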

#### Parameters

`struct`s and `bits` can take runtime parameters:

```
struct Foo(x: Int:8, y: Int:8):
  0 [+x]  UInt:8[]  xs
  x [+y]  UInt:8[]  ys

enum Version:
  VERSION_1 = 10
  VERSION_2 = 20

struct Bar(version: Version):
  0 [+1]  UInt  payload
  if payload == 1 && version == Version.VERSION_1:
    1 [+10]  OldPayload1  old_payload_1
  if payload == 1 && version == Version.VERSION_2:
    1 [+12]  NewPayload1  new_payload_1
```

Each parameter must have the form *name`:` type*.  Currently, the *type* can
be:

*   <code>UInt:*n*</code>, where *`n`* is a number from 1 to 64, inclusive.
*   <code>Int:*n*</code>, where *`n`* is a number from 1 to 64, inclusive.
*   The name of an Emboss `enum` type.

`UInt`- and `Int`-typed parameters are integers with the corresponding range:
for example, an `Int:4` parameter can have any integer value from -8 to +7.

`enum`-typed parameters can take any value in the `enum`'s native range.  Note
that Emboss `enum`s are *open*, so unnamed values are allowed.

Parameterized structures can be included in other structures by passing their
parameters:

```
struct Baz:
  0 [+1]     Version       version
  1 [+1]     UInt:8        size
  2 [+size]  Bar(version)  bar
```

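The value ranges of `UInt:n` and `Int:n` parameters described above follow the
usual unsigned and two's-complement formulas.  The hand-written C++ helpers
below (`IntMin`, `IntMax`, and `UIntMax` are made-up names, not part of Emboss)
compute those bounds for any width from 1 to 64:

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helpers, not part of Emboss.  `bits` must be from 1 to 64.

// Smallest value of a two's-complement `Int:bits`, i.e. -2^(bits-1).
inline std::int64_t IntMin(int bits) {
  return bits == 64 ? std::numeric_limits<std::int64_t>::min()
                    : -(std::int64_t{1} << (bits - 1));
}

// Largest value of an `Int:bits`, i.e. 2^(bits-1) - 1.
inline std::int64_t IntMax(int bits) {
  return bits == 64 ? std::numeric_limits<std::int64_t>::max()
                    : (std::int64_t{1} << (bits - 1)) - 1;
}

// Largest value of a `UInt:bits`, i.e. 2^bits - 1.
inline std::uint64_t UIntMax(int bits) {
  return bits == 64 ? std::numeric_limits<std::uint64_t>::max()
                    : (std::uint64_t{1} << bits) - 1;
}
```
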

#### Virtual "Fields"

It is possible to define a non-physical "field" whose value is an expression:

```
struct Foo:
  0 [+4]  UInt  bar
  let two_bar = 2 * bar
```

These virtual "fields" may be used like any other field in most circumstances:

```
struct Bar:
  0           [+4]  Foo   foo
  if foo.two_bar < 100:
    foo.two_bar [+4]  UInt  uint_at_offset_two_bar
```

Virtual fields may be integers, booleans, or enums:

```
enum Size:
  SMALL = 1
  LARGE = 2

struct Qux:
  0 [+4]  UInt  x
  let x_is_big = x > 100
  let x_size = x_is_big ? Size.LARGE : Size.SMALL
```

When a virtual field has a constant value, you may refer to it using its type:

```
struct Foo:
  let foo_offset = 0x120
  0 [+4]  UInt  foo

struct Bar:
  Foo.foo_offset [+4]  Foo  foo
```

This does not work for non-constant virtual fields:

```
struct Foo:
  0 [+4]  UInt  foo
  let foo_offset = foo + 10

struct Bar:
  Foo.foo_offset [+4]  Foo  foo  # Won't compile.
```

Note that, in some cases, you *must* use `Type.field`, and not `field.field`:

```
struct Foo:
  0 [+4]  UInt  foo
  let foo_offset = 10

struct Bar:
  # Won't compile: foo.foo_offset depends on foo, which depends on
  # foo.foo_offset.
  foo.foo_offset [+4]  Foo  foo

  # Will compile: Foo.foo_offset is a static constant.
  Foo.foo_offset [+4]  Foo  foo
```

This limitation may be lifted in the future, but it has no practical effect.


##### Aliases

Virtual fields of the form `let x = y` or `let x = y.z.q` are allowed even when
`y` or `q` are composite fields.  Virtuals of this form are considered to be
*aliases* of the referred field; in generated code, they may be written as well
as read, and writing through them is equivalent to writing to the aliased field.


##### Simple Transforms

Virtual fields of the forms `let x1 = y + 1`, `let x2 = 2 + y`,
`let x3 = y - 3`, and `let x4 = 4 - y`, where `y` is a writeable field, will be
writeable in the generated code.  When writing through these fields, the
transformed field will be set to an appropriate value.  For example, writing
`5` to `x1` will actually write `4` to `y`, and writing `6` to `x4` will write
`-2` to `y`.  This can be used to model fields whose raw values should be
adjusted by some constant value, e.g.:

```
struct PosixDate:
  0 [+1]  Int  raw_year
    -- Number of years since 1900.

  let year = raw_year + 1900
    -- Gregorian year number.

  1 [+1]  Int  zero_based_month
    -- Month number, from 0-11.  Good for looking up a month name in a table.

  let month = zero_based_month + 1
    -- Month number, from 1-12.  Good for printing directly.

  2 [+1]  Int  day
    -- Day number, one-based.
```

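Writing through a simple transform amounts to applying the inverse of the
transform before storing.  The hand-written C++ class below sketches that idea
for the `year` field of `PosixDate`; it is an analogy rather than Emboss output,
and the class name `PosixDateSketch` and its rejection of out-of-range writes are
assumptions made for this illustration.

```cpp
#include <cstdint>

// Hand-written analogue of the PosixDate example; not Emboss output.
class PosixDateSketch {
 public:
  // Physical field: years since 1900, stored in one signed byte.
  std::int8_t raw_year() const { return raw_year_; }

  // Virtual field `year = raw_year + 1900`: reading applies the transform.
  int year() const { return raw_year_ + 1900; }

  // Writing through `year` applies the inverse transform to `raw_year`.
  // Returns false, leaving the stored value unchanged, if the result does
  // not fit in the one-byte raw field.
  bool set_year(int year) {
    const int raw = year - 1900;
    if (raw < -128 || raw > 127) return false;
    raw_year_ = static_cast<std::int8_t>(raw);
    return true;
  }

 private:
  std::int8_t raw_year_ = 0;
};
```

Setting the year to 2024 stores 124 in the raw field; an attempt to set 2100
is rejected because 200 does not fit in a signed byte.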

#### Subtypes

A `struct` definition may contain other type definitions:

```
struct Foo:
  struct Bar:
    0 [+2]  UInt  baz
    2 [+2]  UInt  qux

  0 [+4]  Bar  bar
  4 [+4]  Bar  bar2
```


#### Conditional fields

A `struct` may have fields which are only present under some circumstances.
For example:

```
struct FramedMessage:
  0 [+4]  enum  message_id:
    TYPE1 = 1
    TYPE2 = 2

  if message_id == MessageId.TYPE1:
    4 [+16]  Type1Message  type_1_message

  if message_id == MessageId.TYPE2:
    4 [+8]   Type2Message  type_2_message
```

The `type_1_message` field will only be available if `message_id` is `TYPE1`,
and similarly the `type_2_message` field will only be available if `message_id`
is `TYPE2`.  If `message_id` is some other value, then neither field will be
available.


#### Inline `struct`

It is possible to define a `struct` inline in a `struct` field.  For example:

```
struct Message:
  [$default byte_order: "BigEndian"]
  0 [+4]  UInt    message_length
  4 [+4]  struct  payload:
    0 [+1]   UInt    incoming
    2 [+2]   UInt    scale_factor
```

This is equivalent to:

```
struct Message:
  [$default byte_order: "BigEndian"]

  struct Payload:
    0 [+1]   UInt    incoming
    2 [+2]   UInt    scale_factor

  0 [+4]  UInt     message_length
  4 [+4]  Payload  payload
```

This can be useful as a way to group related fields together.


#### Using `struct` to define a C-like `union`

Emboss doesn't support C-like `union`s directly via built-in type
definitions. However, you can use Emboss's overlapping fields feature to
effectively create a `union`:

```
struct Foo:
  0 [+1] UInt a
  0 [+2] UInt b
  0 [+4] UInt c
```


#### Automatically-Generated Fields

A `struct` will have `$size_in_bytes`, `$max_size_in_bytes`, and
`$min_size_in_bytes` virtual fields automatically generated.  These virtual
fields can be referenced inside the Emboss language just like any other virtual
field:

```
struct Inner:
  0 [+4]  UInt  field_a
  4 [+4]  UInt  field_b

struct Outer:
  0 [+1]                       UInt   message_type
  if message_type == 4:
    4 [+Inner.$size_in_bytes]  Inner  payload
```


##### `$size_in_bytes` {#size-in-bytes}

An Emboss `struct` has an *intrinsic* size, which is the size required to hold
every field in the `struct`, regardless of how many bytes are in the buffer that
backs the `struct`.  For example:

```
struct FixedSize:
  0 [+4]  UInt  long_field
  4 [+2]  UInt  short_field
```

In this case, `FixedSize.$size_in_bytes` will always be `6`, even if a
`FixedSize` is placed in a larger field:

```
struct Envelope:
  # padded_payload.$size_in_bytes == FixedSize.$size_in_bytes == 6
  0 [+8]  FixedSize  padded_payload
```

The intrinsic size of a `struct` might not be constant:

```
struct DynamicallySizedField:
  0 [+1]       UInt      length
  1 [+length]  UInt:8[]  payload
  # $size_in_bytes == 1 + length

struct DynamicallyPlacedField:
  0 [+1]       UInt  offset
  offset [+1]  UInt  payload
  # $size_in_bytes == offset + 1

struct OptionalField:
  0 [+1]    UInt  version
  if version > 3:
    1 [+1]  UInt  optional_field
  # $size_in_bytes == (version > 3 ? 2 : 1)
```

If the intrinsic size is dynamic, it can still be read dynamically from a field:

```
struct Envelope2:
  0 [+1]             UInt                   payload_size
  1 [+payload_size]  DynamicallySizedField  payload
  let padding_bytes = payload_size - payload.$size_in_bytes
```

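The size formulas given in the comments of the dynamic examples can be written
out as ordinary functions.  This is a hand-written C++ sketch of those formulas
only, with made-up function names; it is not how Emboss computes sizes
internally.

```cpp
#include <cstdint>

// Hand-written formulas matching the comments in the examples above.

// DynamicallySizedField: a 1-byte length followed by `length` payload bytes.
constexpr unsigned DynamicallySizedFieldSize(std::uint8_t length) {
  return 1u + length;
}

// DynamicallyPlacedField: a 1-byte payload located at `offset`.
constexpr unsigned DynamicallyPlacedFieldSize(std::uint8_t offset) {
  return offset + 1u;
}

// OptionalField: the second byte exists only when version > 3.
constexpr unsigned OptionalFieldSize(std::uint8_t version) {
  return version > 3 ? 2u : 1u;
}
```

Because `length` is a one-byte `UInt`, the first formula ranges from 1 to 256.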

##### `$max_size_in_bytes` {#max-size-in-bytes}

The `$max_size_in_bytes` virtual field is a constant value that is at least as
large as the largest possible value for `$size_in_bytes`.  In most cases, it
will exactly equal the largest possible message size, but it is possible to
outsmart Emboss's bounds checker.

```
struct DynamicallySizedStruct:
  0 [+1]       UInt      length
  1 [+length]  UInt:8[]  payload

struct PaddedContainer:
  0 [+DynamicallySizedStruct.$max_size_in_bytes]  DynamicallySizedStruct  s
  # s will be 256 bytes long.
```


##### `$min_size_in_bytes` {#min-size-in-bytes}

The `$min_size_in_bytes` virtual field is a constant value that is no larger
than the smallest possible value for `$size_in_bytes`.  In most cases, it will
exactly equal the smallest possible message size, but it is possible to
outsmart Emboss's bounds checker.

```
struct DynamicallySizedStruct:
  0 [+1]       UInt      length
  1 [+length]  UInt:8[]  payload

struct PaddedContainer:
  0 [+DynamicallySizedStruct.$min_size_in_bytes]  DynamicallySizedStruct  s
  # s will be 1 byte long.
```


### `enum`

An `enum` defines a set of named integers.

```
enum Color:
  BLACK   = 0
  RED     = 1
  GREEN   = 2
  YELLOW  = 3
  BLUE    = 4
  MAGENTA = 5
  CYAN    = 6
  WHITE   = 7

struct PaletteEntry:
  0 [+1]  UInt   id
  1 [+1]  Color  color
```

Enum values are always read the same way as `Int` or `UInt` -- that is, as an
unsigned integer or as a 2's-complement signed integer, depending on whether the
`enum` contains any negative values or not.

Enum values do not have to be contiguous, and may repeat:

```
enum Baud:
  B300     = 300
  B600     = 600
  B1200    = 1200
  STANDARD = 1200
```

All values in a single `enum` must either be between -9223372036854775808
(-2^63) and 9223372036854775807 (2^(63)-1), inclusive, or between 0 and
18446744073709551615 (2^(64)-1), inclusive.

It is valid to have an `enum` field that is too small to contain some values in
the `enum`:

```
enum LittleAndBig:
  LITTLE  = 1
  BIG     = 0x1_0000_0000

struct LittleOnly:
  0 [+1]  LittleAndBig:8  little_only  # Too small to hold LittleAndBig.BIG
```

Emboss `enum`s are *open*: they may take values that are not defined in the
`.emb`, as long as those values are in range.  The `is_signed` and
`maximum_bits` attributes, below, may be used to control the allowed range of
values.

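One way to picture an open enum in C++ is a scoped enumeration with a fixed
underlying type, which can hold any value of that type whether or not it has a
name.  The sketch below is a hand-written analogy for the `Color` example and
makes no claim about the exact declarations Emboss generates; `IsNamed` is a
hypothetical helper.

```cpp
#include <cstdint>

// Hand-written analogue of the Color enum; not Emboss output.
enum class Color : std::uint8_t {
  BLACK = 0,
  RED = 1,
  GREEN = 2,
  YELLOW = 3,
  BLUE = 4,
  MAGENTA = 5,
  CYAN = 6,
  WHITE = 7,
};

// Returns true if `color` matches one of the eight named values.
inline bool IsNamed(Color color) {
  return static_cast<std::uint8_t>(color) <= 7;
}
```

A one-byte field containing 200 still converts to a valid `Color` value here;
it simply has no name.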

#### `is_signed` Attribute

The attribute `is_signed` may be used to explicitly specify whether an `enum`
is signed or unsigned.  Normally, an `enum` is signed if there is at least one
negative value, and unsigned otherwise, but this behavior can be overridden:

```
enum ExplicitlySigned:
  [is_signed: true]
  POSITIVE = 10
```


#### `maximum_bits` Attribute

The attribute `maximum_bits` may be used to specify the *maximum* width of an
`enum`: fields of `enum` type may be smaller than `maximum_bits`, but never
larger:

```
enum ExplicitlySized:
  [maximum_bits: 32]
  MAX_VALUE = 0xffff_ffff

struct Foo:
  0 [+4]  ExplicitlySized  four_bytes  # 32-bit is fine
  #4 [+8]  ExplicitlySized  eight_bytes  # 64-bit field would be an error
```

If not specified, `maximum_bits` defaults to `64`.

This also allows back end code generators to use smaller types for `enum`s, in
some cases.


#### Inline `enum`

It is possible to provide an enum definition directly in a field definition in a
`struct` or `bits`:

```
struct TurnSpecification:
  0 [+1]  UInt  degrees
  1 [+1]  enum  direction:
    LEFT  = 0
    RIGHT = 1
```

This example creates a nested `enum` `TurnSpecification.Direction`, exactly as
if it were written:

```
struct TurnSpecification:
  enum Direction:
    LEFT  = 0
    RIGHT = 1

  0 [+1]  UInt       degrees
  1 [+1]  Direction  direction
```

This can be useful when a particular `enum` is short and only used in one place.

Note that `maximum_bits` and `is_signed` cannot be used on an inline `enum`.
If you need to use either of these attributes, make a separate `enum`.


### `bits`

A `bits` defines a view of an ordered sequence of bits.  Each field is a view of
some particular subsequence of the `bits`'s bits, whose interpretation is
determined by the field's type.

The structure of a `bits` definition is very similar to a `struct`, except that
a `struct` provides a structured view of bytes, where a `bits` provides a
structured view of bits.  Fields in a `bits` must have bit-oriented types (such
as other `bits`, `UInt`, `Bcd`, `Flag`).  Byte-oriented types, such as
`struct`s, may not be embedded in a `bits`.

For example:

```
bits ControlRegister:
  -- The `ControlRegister` holds basic control values.

  4 [+12]  UInt  horizontal_start_offset
    -- The number of pixel clock ticks to wait after the start of a line
    -- before starting to draw pixel data.

  3 [+1]   Flag  horizontal_overscan_disable
    -- If set, the electron gun will be disabled during the overscan period,
    -- otherwise the overscan color will be used.

  0 [+3]   UInt  horizontal_overscan_color
    -- The palette index of the overscan color to use.

struct RegisterPage:
  -- The registers of the BGA (Bogus Graphics Array) card.

  0 [+2]  ControlRegister  control_register
    [byte_order: "LittleEndian"]
```

The first line introduces the `bits` and gives it a name.  This name may be
used in field definitions to specify that the field has a structured type, and
is used in the generated code.

For example, to write a `horizontal_overscan_color` of 7 to a pair of bytes in
C++, you would use:

```c++
// vector<uint8_t> bytes;
auto register_page_view = RegisterPageWriter(&bytes[0], bytes.size());
register_page_view.control_register().horizontal_overscan_color().Write(7);
```

Similar to `struct`, each field starts with a *bit* range (`4 [+12]`) that
indicates which bits it covers.  For example, the `horizontal_overscan_disable`
field only covers bit 3.  Bit 0 always corresponds to the lowest-order bit of
the bitfield; that is, if a `UInt` covers the same bits as the `bits` construct,
then bit 0 in the `bits` will be the same as the `UInt` mod 2.  This is often,
but not always, how bits are numbered in protocol specifications.

After the field's location is the field's *type*.  The type determines how the
field's bits are interpreted: typical choices are `UInt` (for unsigned
integers), `Flag` (for boolean flags), and `enum`s.  Other `bits` may also be
used, as well as any `external` types declared with
`[addressable_unit_size: 1]`.

Fields may have attributes and documentation, just like any other Emboss
construct.

In generated code, reading or writing any field of a `bits` construct will cause
the entire field to be read or written -- something to keep in mind when reading
or writing a memory-mapped register space.

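The bit ranges in `ControlRegister` correspond to ordinary shift-and-mask
operations on the 16-bit register value.  The following hand-written C++ sketch
shows that correspondence; it is not Emboss output, and
`ControlRegisterValues`, `DecodeControlRegister`, and `WithOverscanColor` are
hypothetical names.

```cpp
#include <cstdint>

// Hand-decoded ControlRegister fields; not Emboss output.
struct ControlRegisterValues {
  std::uint16_t horizontal_start_offset;   // bits 4-15
  bool horizontal_overscan_disable;        // bit 3
  std::uint8_t horizontal_overscan_color;  // bits 0-2
};

// `raw` is the 16-bit register value after byte-order decoding.
inline ControlRegisterValues DecodeControlRegister(std::uint16_t raw) {
  ControlRegisterValues v;
  v.horizontal_start_offset = static_cast<std::uint16_t>((raw >> 4) & 0x0FFF);
  v.horizontal_overscan_disable = ((raw >> 3) & 0x1) != 0;
  v.horizontal_overscan_color = static_cast<std::uint8_t>(raw & 0x7);
  return v;
}

// Returns `raw` with only the overscan color replaced.  The caller still has
// to write the whole 16-bit value back.
inline std::uint16_t WithOverscanColor(std::uint16_t raw, std::uint8_t color) {
  return static_cast<std::uint16_t>((raw & ~0x7u) | (color & 0x7u));
}
```

`WithOverscanColor` is a read-modify-write of the full register value, which is
the behavior to keep in mind for memory-mapped registers.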

#### Anonymous `bits`

It is possible to use an anonymous `bits` definition directly in a `struct`;
for example:

```
struct Message:
  [$default byte_order: "BigEndian"]
  0 [+4]     UInt  message_length
  4 [+4]     bits:
    0 [+1]   Flag  incoming
    1 [+1]   Flag  last_fragment
    2 [+4]   UInt  scale_factor
    31 [+1]  Flag  error
```

In this case, the fields of the `bits` will be treated as though they are fields
of the outer struct.


#### Inline `bits`

Like `enum`s, it is also possible to define a named `bits` inline in a `struct`
or `bits`.  For example:

```
struct Message:
  [$default byte_order: "BigEndian"]
  0 [+4]     UInt  message_length
  4 [+4]     bits  payload:
    0 [+1]   Flag  incoming
    1 [+1]   Flag  last_fragment
    2 [+4]   UInt  scale_factor
    31 [+1]  Flag  error
```

This is equivalent to:

```
struct Message:
  [$default byte_order: "BigEndian"]

  bits  Payload:
    0 [+1]   Flag  incoming
    1 [+1]   Flag  last_fragment
    2 [+4]   UInt  scale_factor
    31 [+1]  Flag  error

  0 [+4]  UInt     message_length
  4 [+4]  Payload  payload
```

This can be useful as a way to group related fields together.


#### Automatically-Generated Fields

A `bits` will have `$size_in_bits`, `$max_size_in_bits`, and `$min_size_in_bits`
virtual fields automatically generated.  These virtual fields can be referenced
inside the Emboss language just like any other virtual field:

```
bits Inner:
  0 [+4]  UInt  field_a
  4 [+4]  UInt  field_b

struct Outer:
  0 [+1]                      UInt   message_type
  if message_type == 4:
    4 [+Inner.$size_in_bits]  Inner  payload
```


##### `$size_in_bits` {#size-in-bits}

Like a `struct`, an Emboss `bits` has an *intrinsic* size, which is the size
required to hold every field in the `bits`, regardless of how many bits are
in the buffer that backs the `bits`.  For example:

```
bits FixedSize:
  0 [+3]  UInt  long_field
  3 [+1]  Flag  short_field
```

In this case, `FixedSize.$size_in_bits` will always be `4`, even if a
`FixedSize` is placed in a larger field:

```
struct Envelope:
  # padded_payload.$size_in_bits == FixedSize.$size_in_bits == 4
  0 [+8]  FixedSize  padded_payload
```

Unlike `struct`s, the size of `bits` must be known at compile time; there are
no dynamic `$size_in_bits` fields.


##### `$max_size_in_bits` {#max-size-in-bits}

Since `bits` must be fixed size, the `$max_size_in_bits` field has the same
value as `$size_in_bits`.  It is provided for consistency with
`$max_size_in_bytes`.


##### `$min_size_in_bits` {#min-size-in-bits}

Since `bits` must be fixed size, the `$min_size_in_bits` field has the same
value as `$size_in_bits`.  It is provided for consistency with
`$min_size_in_bytes`.


### `external`

An `external` type is used when a type cannot be defined in Emboss itself;
instead, external code must be provided to manipulate the type.

Emboss's built-in types, such as `UInt`, `Bcd`, and `Flag`, are defined this way
in a special file called the *prelude*.  For example, `UInt` is defined as:

```
external UInt:
  -- UInt is an automatically-sized unsigned integer.
  [type_requires: $is_statically_sized && 1 <= $static_size_in_bits <= 64]
  [is_integer: true]
  [addressable_unit_size: 1]
```

`external` types are an unstable feature.  Contact `emboss-dev` if you would
like to add your own `external`s.


## Builtin Types and the Prelude

Emboss has a built-in module called the *Prelude*, which contains types that are
automatically usable from any module.  In particular, types like `Int` and
`UInt` are defined in the Prelude.

The Prelude is (more or less) a standard Emboss file, called `prelude.emb`, that
is embedded in the Emboss compiler.

<!-- TODO(bolms): When the documentation generator backend is built, generate
the Prelude documentation from prelude.emb. -->


### `UInt`

A `UInt` is an unsigned integer.  `UInt` can be anywhere from 1 to 64 bits in
size, and may be used both in `struct`s and in `bits`.  `UInt` fields may be
referenced in integer expressions.


### `Int`

An `Int` is a signed two's-complement integer.  `Int` can be anywhere from 1 to
64 bits in size, and may be used both in `struct`s and in `bits`.  `Int` fields
may be referenced in integer expressions.


### `Bcd`

(Note: `Bcd` is subject to change.)

A `Bcd` is an unsigned binary-coded decimal integer.  `Bcd` can be anywhere from
1 to 64 bits in size, and may be used both in `struct`s and in `bits`.  `Bcd`
fields may be referenced in integer expressions.

When a `Bcd`'s size is not a multiple of 4 bits, the high-order "digit" is
treated as if it were zero-extended to a multiple of 4 bits.  For example, a
7-bit `Bcd` value can store any number from 0 to 79.

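The zero-extension rule can be checked with a short Python sketch.  The helpers
`decode_bcd` and `max_bcd` are hypothetical names for this example, and
rejecting digits above 9 is an assumption of the sketch: this section does not
say how Emboss treats such bit patterns.

```python
def decode_bcd(raw, size_in_bits):
    """Decode an unsigned BCD value stored in the low size_in_bits of raw.

    Raises ValueError if any 4-bit digit is greater than 9 (an assumption of
    this sketch, not documented Emboss behavior).
    """
    if not 1 <= size_in_bits <= 64:
        raise ValueError("size_in_bits must be between 1 and 64")
    raw &= (1 << size_in_bits) - 1  # Missing high-order bits read as zero.
    value = 0
    place = 1
    while raw:
        digit = raw & 0xF
        if digit > 9:
            raise ValueError("invalid BCD digit: %d" % digit)
        value += digit * place
        place *= 10
        raw >>= 4
    return value


def max_bcd(size_in_bits):
    """Largest value a BCD field of the given width can represent."""
    full_digits, extra_bits = divmod(size_in_bits, 4)
    high_digit = min((1 << extra_bits) - 1, 9)
    return high_digit * 10 ** full_digits + (10 ** full_digits - 1)


print(decode_bcd(0b111_1001, 7))  # 79
print(max_bcd(7))                 # 79
```

With 7 bits, the high-order digit has only 3 bits and so tops out at 7, while
the low-order digit can reach 9, which gives the 0 to 79 range.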

### `Flag`

A `Flag` is a 1-bit boolean value.  A stored value of `0` means `false`, and a
stored value of `1` means `true`.


### `Float`

A `Float` is a floating-point value in an IEEE 754 binaryNN format, where NN is
the bit width.

Only 32- and 64-bit `Float`s are supported.  There are no current plans to
support 16- or 128-bit `Float`s, nor the nonstandard x86 80-bit `Float`s.

IEEE 754 does not specify which NaN bit patterns are signalling NaNs and which
are quiet NaNs, and thus Emboss also does not specify which NaNs are which.
This means that a quiet NaN written through an Emboss view on one system could
be read out as a signalling NaN through an Emboss view on a different system.
If this is a concern, the application must explicitly check for NaN before doing
arithmetic on any floating-point value read from a `Float` field.

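A minimal Python sketch of the "check for NaN first" pattern follows.  It reads
raw bytes with the standard `struct` module rather than through an
Emboss-generated view; the helper names and the little-endian, 32-bit layout are
assumptions made for this example.

```python
import math
import struct


def read_float32_le(buffer, offset=0):
    """Read a little-endian IEEE 754 binary32 value from a bytes-like object."""
    (value,) = struct.unpack_from("<f", buffer, offset)
    return value


def scaled_reading(buffer, scale):
    """Multiply the stored value by scale, refusing to compute with NaN."""
    value = read_float32_le(buffer)
    if math.isnan(value):
        raise ValueError("field holds NaN; refusing to do arithmetic on it")
    return value * scale


print(scaled_reading(struct.pack("<f", 1.5), 2.0))  # 3.0
```

The check runs before any arithmetic, so the result does not depend on whether
the stored NaN would be treated as quiet or signalling on the reading system.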

## General Syntax

### Names

All names in Emboss must be ASCII, for compatibility with languages such as C
and C++ that do not support Unicode identifiers.

Type names in Emboss are always `CamelCase`.  They must start with a capital
letter, contain at least one lower-case letter, and contain only letters and
digits.  They must match the regex `[A-Z][a-zA-Z0-9]*[a-z][a-zA-Z0-9]*`.

Imported module names and field names are always `snake_case`.  They must start
with a lower-case letter, and may only contain lower-case letters, digits, and
underscores.  They must match the regex `[a-z][a-z_0-9]*`.

Enum value names are always `SHOUTY_CASE`.  They must start with a capital
letter, may only contain capital letters, digits, and underscores, and must be
at least two characters long.  They must match the regex
`[A-Z][A-Z_0-9]*[A-Z_][A-Z_0-9]*`.

Additionally, names that are used as keywords in common programming languages
are disallowed.  A complete list can be found in the [Grammar
Reference](grammar.md).

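The three regexes can be tried out directly.  The following Python sketch
copies them verbatim from the text above; `classify_name` is a hypothetical
helper, and it does not check the disallowed-keyword list from the Grammar
Reference.

```python
import re

TYPE_NAME = re.compile(r"[A-Z][a-zA-Z0-9]*[a-z][a-zA-Z0-9]*")
FIELD_NAME = re.compile(r"[a-z][a-z_0-9]*")
ENUM_VALUE_NAME = re.compile(r"[A-Z][A-Z_0-9]*[A-Z_][A-Z_0-9]*")


def classify_name(name):
    """Return which kind of Emboss name `name` could be, or None."""
    if TYPE_NAME.fullmatch(name):
        return "type"
    if FIELD_NAME.fullmatch(name):
        return "field or module"
    if ENUM_VALUE_NAME.fullmatch(name):
        return "enum value"
    return None


for candidate in ["UInt", "payload_size", "PURPLE", "X", "Foo_Bar"]:
    print(candidate, "->", classify_name(candidate))
```

The single letter `X` matches none of the patterns: a type name needs a
lower-case letter, and an enum value name needs at least two characters.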

### Expressions

#### Primary expressions

Emboss primary expressions are field names (like `field` or `field.subfield`),
numeric constants (like `9` or `0x1_0000_0000`), enum value names (like
`Enum.VALUE`), and the boolean constants `true` and `false`.

Subfields may be specified using `.`; e.g., `foo.bar` references the `bar`
subfield of the `foo` field.  Emboss parses `.` before any expressions: unlike
many languages, something like `(foo).bar` is a syntax error in Emboss.

Enum values generally must be qualified by their type; e.g., `Color.RED` rather
than just `RED`.  Enums defined in other modules must use the imported module
name, as in `styles.Color.RED`.


#### Operators and Functions

Note: Emboss currently has a relatively limited set of operators because
operators have been implemented as needed.  If you could use an operator that is
not on the list, email `emboss-dev@`, and we'll see about adding it.

Emboss operators have the following precedence (tightest binding to loosest
binding):

1.  `()` `$max()` `$present()` `$upper_bound()` `$lower_bound()`
2.  unary `+` and `-` ([see note 1](#precedence-note-unary-plus-minus))
3.  `*`
4.  `+` `-`
5.  `<` `>` `==` `!=` `>=` `<=` ([see note 2](#precedence-note-comparisons))
6.  `&&` `||` ([see note 3](#precedence-note-and-or))
7.  `?:` ([see note 4](#precedence-note-choice))


###### Note 1 {#precedence-note-unary-plus-minus}

Only one unary `+` or `-` may be applied to an expression without parentheses.
These expressions are valid:

```
-5
+6
-(-x)
```

These are not:

```
- -5
-+5
+ +5
+-5
```


###### Note 2 {#precedence-note-comparisons}

The relational operators may be chained like so:

```
10 <= x < 50        # 10 <= x && x < 50
10 <= x == y < 50   # 10 <= x && x == y && y < 50
100 > y >= 2        # 100 > y && y >= 2
x == y == 15        # x == y && y == 15
```

These chains are not allowed:

```
10 < x > 50
10 < x == y >= z
x == y >= z <= 50
```

If one specifically wants to compare the result of a comparison, parentheses
must be used:

```
(x > 15) == (y > 15)
(x > 15) == true
```

The `!=` operator may not be chained.

A chain may contain either `<`, `<=`, and/or `==`, or `>`, `>=`, and/or `==`.
Greater-than comparisons may not be mixed with less-than comparisons.

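The chaining rules can be modeled in a few lines of Python.  This is only a
sketch of the rules as stated in this note, not the Emboss parser; `check_chain`
and `evaluate_chain` are hypothetical helpers that take the operands and
operators of a chain as separate lists.

```python
import operator

OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
}


def check_chain(ops):
    """Raise ValueError if the operator sequence breaks the chaining rules."""
    if len(ops) > 1:
        if "!=" in ops:
            raise ValueError("`!=` may not be chained")
        has_less = any(op in ("<", "<=") for op in ops)
        has_greater = any(op in (">", ">=") for op in ops)
        if has_less and has_greater:
            raise ValueError("cannot mix less-than and greater-than")


def evaluate_chain(operands, ops):
    """Evaluate `a op0 b op1 c ...` as `(a op0 b) && (b op1 c) && ...`."""
    if len(operands) != len(ops) + 1:
        raise ValueError("need exactly one more operand than operators")
    check_chain(ops)
    return all(
        OPERATORS[op](left, right)
        for op, left, right in zip(ops, operands, operands[1:])
    )


print(evaluate_chain([10, 20, 50], ["<=", "<"]))  # True: 10 <= 20 && 20 < 50
```

Calling `evaluate_chain([10, 20, 50], ["<", ">"])` raises `ValueError`, which
mirrors the rejection of `10 < x > 50`.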

###### Note 3 {#precedence-note-and-or}

The boolean logical operators have the same precedence, but may not be mixed
without parentheses.  The following are allowed:

```
x && y && z
x || y || z
(x || y) && z
x || (y && z)
```

The following are not allowed:

```
x || y && z
x && y || z
```


###### Note 4 {#precedence-note-choice}

The choice operator `?:` may not be chained without parentheses.  These are OK:

```
q ? x : (r ? y : z)
q ? (r ? x : y) : z
```

These are not:

```
q ? x : r ? y : z  # Is this `(q?x:r)?y:z` or `q?x:(r?y:z)`?
q ? r ? x : y : z  # Technically unambiguous, but visually confusing
```


1279
1280##### `()`
1281
1282Parentheses are used to override precedence.  The subexpression inside the
1283parentheses will be evaluated as a unit:
1284
1285```
12863 * 4 + 5 == 17
12873 * (4 + 5) == 27
1288```
1289
1290The value inside the parentheses can have any type; the value of the resulting
1291expression will have the same type.
1292
1293
1294##### `$present()`
1295
1296The `$present()` function takes a field as an argument, and returns `true` if
1297the field is present in its structure.
1298
1299```
1300struct PresentExample:
1301  0 [+1]    UInt  x
1302  if false:
1303    1 [+1]  UInt  y
1304  if x > 10:
1305    2 [+1]  UInt  z
1306  if $present(x):  # Always true
1307    0 [+1]  Int  x2
1308  if $present(y):  # Always false
1309    1 [+1]  Int  y2
1310  if $present(z):  # Equivalent to `if x > 10`
1311    2 [+1]  Int  z2
1312```
1313
1314`$present()` takes exactly one argument.
1315
1316The argument to `$present()` must be a reference to a field.  It can be a nested
1317reference, like `$present(x.y.z.q.r)`.  The type of the field does not matter.
1318
1319`$present()` returns a boolean.
1320
1321
1322##### `$max()`
1323
1324The `$max()` function returns the maximum value out of its arguments:
1325
1326```
1327$max(1) == 1
1328$max(-10, -5) == -5
1329$max(1, 2, 3, 4, 5, 6, 7, 8, 9, 10) == 10
1330```
1331
1332`$max()` requires at least one argument.  There is no explicit limit on the
1333number of arguments, but at some point the Emboss compiler will run out of
1334memory.
1335
1336All arguments to `$max()` must be integers, and it returns an integer.
1337
1338
1339##### `$upper_bound()`
1340
1341The `$upper_bound()` function returns a value that is at least as high as the
1342maximum possible value of its argument:
1343
1344```
1345$upper_bound(1) == 1
1346$upper_bound(-10) == -10
1347$upper_bound(foo) == 255  # If foo is UInt:8
1348$upper_bound($max(foo, 500)) == 500  # If foo is UInt:8
1349```
1350
1351Generally, `$upper_bound()` will return a tight bound, but it is possible to
1352outsmart Emboss's bounds checker.
1353
1354`$upper_bound()` takes a single integer argument, and returns a single integer
1355argument.
1356
1357
1358##### `$lower_bound()`
1359
1360The `$lower_bound()` function returns a value that is no greater than the
1361minimum possible value of its argument:
1362
1363```
1364$lower_bound(1) == 1
1365$lower_bound(-10) == -10
1366$lower_bound(foo) == -127  # If foo is Int:8
1367$lower_bound($min(foo, -500)) == -500  # If foo is Int:8
1368```
1369
1370Generally, `$lower_bound()` will return a tight bound, but it is possible to
1371outsmart Emboss's bounds checker.
1372
1373`$lower_bound()` takes a single integer argument, and returns a single integer
1374argument.
1375
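One way to picture how such bounds can be derived, and why they are not always
tight, is interval arithmetic.  The Python sketch below is an illustration
under that assumption; it is not Emboss's actual bounds checker, and every
function name in it is hypothetical.

```python
def uint_bounds(bits):
    """(lower, upper) bounds of an unsigned integer field of the given width."""
    return (0, (1 << bits) - 1)


def int_bounds(bits):
    """(lower, upper) bounds of a two's-complement field of the given width."""
    return (-(1 << (bits - 1)), (1 << (bits - 1)) - 1)


def constant(value):
    """Bounds of an integer constant."""
    return (value, value)


def max_bounds(*intervals):
    """Bounds of a maximum taken over arguments with the given bounds."""
    return (max(lo for lo, _ in intervals), max(hi for _, hi in intervals))


def sub_bounds(a, b):
    """Bounds of `a - b`, treating the two operands as unrelated."""
    return (a[0] - b[1], a[1] - b[0])


print(uint_bounds(8))                             # (0, 255)
print(int_bounds(8))                              # (-128, 127)
print(max_bounds(uint_bounds(8), constant(500)))  # (500, 500)
print(sub_bounds(uint_bounds(8), uint_bounds(8)))  # (-255, 255)
```

The last line shows how a simple analysis can lose tightness: if both operands
are the same field, the true value is always `0`, but an analysis that treats
them as unrelated reports a much wider range.  Emboss may or may not handle this
particular case more precisely.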

##### Unary `+` and `-`

The unary `+` operator returns its argument unchanged.

The unary `-` operator subtracts its argument from 0:

```
3 * -4 == 0 - 12
-(3 * 4) == -12
```

Unary `+` and `-` require an integer argument, and return an integer result.


##### `*`

`*` is the multiplication operator:

```
3 * 4 == 12
10 * 10 == 100
```

The `*` operator requires two integer arguments, and returns an integer.


##### `+` and `-`

`+` and `-` are the addition and subtraction operators, respectively:

```
3 + 4 == 7
3 - 4 == -1
```

The `+` and `-` operators require two integer arguments, and return an integer
result.


##### `==` and `!=`

The `==` operator returns `true` if its arguments are equal, and `false` if not.

The `!=` operator returns `false` if its arguments are equal, and `true` if not.

Both operators take two boolean arguments, two integer arguments, or two
arguments of the same enum type, and return a boolean result.


##### `<`, `<=`, `>`, and `>=`

The `<` operator returns `true` if its first argument is numerically less than
its second argument.

The `>` operator returns `true` if its first argument is numerically greater
than its second argument.

The `<=` operator returns `true` if its first argument is numerically less than
or equal to its second argument.

The `>=` operator returns `true` if its first argument is numerically greater
than or equal to its second argument.

All of these operators take two integer arguments, and return a boolean value.


##### `&&` and `||`

The `&&` operator returns `false` if either of its arguments is `false`, even
if the other argument cannot be computed.  `&&` returns `true` if both arguments
are `true`.

The `||` operator returns `true` if either of its arguments is `true`, even if
the other argument cannot be computed.  `||` returns `false` if both arguments
are `false`.

The `&&` and `||` operators require two boolean arguments, and return a boolean
result.

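This behavior can be modeled as three-valued logic.  In the Python sketch below,
`None` stands for an argument that cannot be computed; `emboss_and` and
`emboss_or` are hypothetical names.  Returning `None` when the result cannot be
decided (for example, `true && <unknown>`) is an assumption of the sketch, since
the text above only defines the cases that do produce a value.

```python
def emboss_and(left, right):
    """Model `&&`, where None stands for a value that cannot be computed."""
    if left is False or right is False:
        return False
    if left is True and right is True:
        return True
    return None  # Assumption: the result cannot be computed either.


def emboss_or(left, right):
    """Model `||`, where None stands for a value that cannot be computed."""
    if left is True or right is True:
        return True
    if left is False and right is False:
        return False
    return None  # Assumption: the result cannot be computed either.


print(emboss_and(False, None))  # False
print(emboss_or(None, True))    # True
```

Unlike short-circuit evaluation in C-family languages, the rule is symmetric:
the deciding argument may be on either side.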

##### `?:`

The `?:` operator, used like <code>*condition* ? *if\_true* :
*if\_false*</code>, returns *`if_true`* if *`condition`* is `true`, otherwise
*`if_false`*.

Other than having stricter type requirements for its arguments, it behaves like
the C, C++, Java, JavaScript, C#, etc. conditional operator `?:` (sometimes
called the "ternary operator").

The `?:` operator's *`condition`* argument must be a boolean, and the
*`if_true`* and *`if_false`* arguments must have the same type.  It returns the
same type as *`if_true`* and *`if_false`*.


### Numeric Constant Formats

Numeric constants in Emboss may be written in decimal, hexadecimal, or binary
format:

```
12      # The decimal value of 6 + 6.
012     # The same value; NOT interpreted as octal.
0xc     # The same value, written in hexadecimal.
0xC     # Hex digits may be written in capital letters.
        # Note that the 'x' must be lower-case: 0XC is not allowed.
0b1100  # The same value, in binary.
```

Decimal numbers may use `_` as a thousands separator:

```
1_000_000  # 1e6
123_456_789
```

Hexadecimal and binary numbers may use `_` as a separator every 4 or 8 digits:

```
0x1234_5678_9abc_def0
0x12345678_9abcdef0
0b1010_0101_1010_0101
0b10100101_10100101
```

If separators are used, they *must* be thousands separators (for decimal
numbers) or 4- or 8-digit separators (for binary or hexadecimal numbers); `_`
may *not* be placed arbitrarily.  Binary and hexadecimal numbers must be
consistent about whether they use 4- or 8-digit separators; they cannot be
mixed in the same constant:

```
1000_000              # Not allowed: missing the separator after 1.
1_000_00              # Not allowed: separators must be followed by a multiple
                      # of 3 digits.
0x1234_567            # Not allowed: separators must be followed by a multiple
                      # of 4 or 8 digits.
0x1234_5678_9abcdef0  # Not allowed: cannot mix 4- and 8-digit separators.
```

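The separator rules can be expressed as regular expressions.  The Python sketch
below is one reading of the rules in this section, not the grammar used by the
Emboss compiler, so edge cases may differ; `parse_constant` and the pattern
names are hypothetical.

```python
import re

# Either no separators at all, or a short leading group followed by
# fixed-width groups (3 digits for decimal, 4 or 8 for hex and binary).
_DECIMAL = re.compile(r"[0-9]+|[0-9]{1,3}(_[0-9]{3})+")
_HEX = re.compile(
    r"0x([0-9a-fA-F]+"
    r"|[0-9a-fA-F]{1,4}(_[0-9a-fA-F]{4})+"
    r"|[0-9a-fA-F]{1,8}(_[0-9a-fA-F]{8})+)"
)
_BINARY = re.compile(r"0b([01]+|[01]{1,4}(_[01]{4})+|[01]{1,8}(_[01]{8})+)")


def parse_constant(text):
    """Return the integer value of a numeric constant.

    Raises ValueError if the text does not follow the rules described above.
    """
    if _DECIMAL.fullmatch(text):
        return int(text.replace("_", ""), 10)
    if _HEX.fullmatch(text):
        return int(text[2:].replace("_", ""), 16)
    if _BINARY.fullmatch(text):
        return int(text[2:].replace("_", ""), 2)
    raise ValueError("not a valid numeric constant: %r" % text)


for text in ["12", "012", "0xc", "0xC", "0b1100"]:
    print(text, "->", parse_constant(text))  # Each prints 12.
```

Each of the "not allowed" constants above, as well as `0XC`, makes
`parse_constant` raise `ValueError`.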