# Emboss Language Reference

## Top Level Structure

An `.emb` file contains four sections: a documentation block, imports, an
attribute block, containing attributes which apply to the whole module, followed
by a list of type definitions:

```
# Documentation block (optional)
-- This is an example of an .emb file, with every section.

# Imports (optional)
import "other.emb" as other
import "project/more.emb" as project_more

# Attribute block (optional)
[$default byte_order: "LittleEndian"]
[(cpp) namespace: "foo::bar::baz"]
[(java) namespace: "com.example.foo.bar.baz"]

# Type definitions
enum Foo:
  ONE    = 1
  TEN    = 10
  PURPLE = 12

struct Bar:
  0 [+4]  Foo       purple
  4 [+4]  UInt      payload_size (s)
  8 [+s]  UInt:8[]  payload
```

The documentation and/or attribute blocks may be omitted if they are not
necessary.


### Comments

Comments start with `#` and extend to the end of the line:

```
struct Foo:  # This is a comment
  # This is a comment
  0 [+1]  UInt  field  # This is a comment
```

Comments are ignored.  They should not be confused with
[*documentation*](#documentation), which is intended to be used by some back
ends.


## Documentation

Documentation blocks may be attached to modules, types, fields, or enum values.
They are different from comments in that they will be used by the
(not-yet-ready) documentation generator back-end.

Documentation blocks take the form of any number of lines starting with `-- `:

```
-- This is a module documentation block.  Text in this block will be attached to
-- the module as documentation.
--
-- This is a new paragraph in the same module documentation block.
--
-- Module-level documentation should describe the purpose of the module, and may
-- point out the most salient features of the module.

struct Message:
  -- This is a documentation block attached to the Message structure.
  -- It should describe the purpose of Message, and how it should be used.
  0 [+4]  UInt         header_length
    -- This is documentation for the header_length field.  Again, it should
    -- describe this specific field.
  4 [+4]  MessageType  message_type  -- Short docs can go on the same line.
```

Documentation should be written in CommonMark format, ignoring the leading
`-- `.


## Imports

An `import` line tells Emboss to read another `.emb` file and make its types
available to the current file under the given name.  For example, given the
import line:

```
import "other.emb" as helper
```

then the type `Type` from `other.emb` may be referenced as `helper.Type`.

The `--import-dir` command-line flag tells Emboss which directories to search
for imported files; it may be specified multiple times.  If no `--import-dir` is
specified, Emboss will search the current working directory.


## Attributes

Attributes are an extensible way of adding arbitrary information to a module,
type, field, or enum value.  Currently, only whitelisted attributes are allowed
by the Emboss compiler, but this may change in the future.

Attributes take a form like:

```
[name: value]            # name has value for the current entity.
[$default name: value]   # Default name to value for all sub-entities.
[(backend) name: value]  # Attribute for a specific back end.
```


### `byte_order`

The `byte_order` attribute is used to specify the byte order of `bits` fields
and of fields with an atomic type, such as `UInt`.

`byte_order` takes a string value, which must be one of `"BigEndian"`,
`"LittleEndian"`, or `"Null"`:

```
[$default byte_order: "LittleEndian"]

struct Foo:
  [$default byte_order: "Null"]

  0 [+4]  UInt  bar
    [byte_order: "BigEndian"]

  4 [+4]  bits:
    [byte_order: "LittleEndian"]

    0  [+23]  UInt  baz
    23 [+9]   UInt  qux

  8 [+1]  UInt  froble
```

A `$default` byte order may be set on a module or structure.

The `"BigEndian"` and `"LittleEndian"` byte orders set the byte order to big or
little endian, respectively.  That is, for little endian:

```
  byte 0   byte 1   byte 2   byte 3
+--------+--------+--------+--------+
|76543210|76543210|76543210|76543210|
+--------+--------+--------+--------+
 ^      ^ ^      ^ ^      ^ ^      ^
 07    00 15    08 23    16 31    24
 ^^^^^^^^^^^^^^^ bit ^^^^^^^^^^^^^^^
```

And for big endian:

```
  byte 0   byte 1   byte 2   byte 3
+--------+--------+--------+--------+
|76543210|76543210|76543210|76543210|
+--------+--------+--------+--------+
 ^      ^ ^      ^ ^      ^ ^      ^
 31    24 23    16 15    08 07    00
 ^^^^^^^^^^^^^^^ bit ^^^^^^^^^^^^^^^
```

The `"Null"` byte order is used if no `byte_order` attribute is specified.
`"Null"` indicates that the byte order is unknown; it is an error if a
byte-order-dependent field that is not exactly 8 bits has the `"Null"` byte
order.


### `requires`

The `requires` attribute may be placed on an atomic field (e.g., type `UInt`,
`Int`, `Flag`, etc.) to specify a predicate that values of that field must
satisfy, or on a `struct` or `bits` to specify relationships between fields that
must be satisfied.

```
struct Foo:
  [requires: bar < qux]

  0 [+4]  UInt  bar
    [requires: this <= 999_999_999]

  4 [+4]  UInt  qux
    [requires: 100 <= this <= 1_000_000_000]

  let bar_plus_qux = bar + qux
    [requires: this >= 199]
```

For `[requires]` on a field, other fields may not be referenced, and the value
of the current field must be referred to as `this`.

For `[requires]` on a `struct` or `bits`, any atomic field in the structure may
be referenced.


### `(cpp) namespace`

The `namespace` attribute is used by the C++ back end to determine which
namespace to place the generated code in:

```
[(cpp) namespace: "foo::bar::baz"]
```

A leading `::` is allowed, but not required; the previous example could also be
written as:

```
[(cpp) namespace: "::foo::bar::baz"]
```

Internally, Emboss will translate either of these into a nested `namespace foo {
namespace bar { namespace baz { ... } } }` wrapping the generated C++ code for
this module.

The `namespace` attribute may only be used at the module level; all structures
and enums within a module will be placed in the same namespace.

### `(cpp) enum_case`

The `enum_case` attribute can be specified for the C++ back end to control the
case in which enum values are emitted in the generated source.  It does not
change the text representation, which always uses the original Emboss
definition name as the canonical name.

Currently, the supported cases are `SHOUTY_CASE` and `kCamelCase`.

A `$default` enum case can be set on a module, struct, bits, or enum and
applies to all enum values within that module, struct, bits, or enum
definition.

For example, to use `kCamelCase` by default for all enum values in a module:

```
[$default enum_case: "kCamelCase"]
```

This will change enum names like `UPPER_CHANNEL_RANGE_LIMIT` to
`kUpperChannelRangeLimit` in the C++ source for all enum values in the module.
Multiple case names can be specified, which is especially useful when
transitioning between two cases:

```
[enum_case: "SHOUTY_CASE, kCamelCase"]
```

### `text_output`

The `text_output` attribute may be attached to a `struct` or `bits` field to
control whether or not the field is included when emitting the text format
version of the structure.  For example:

```
struct SuppressedField:
  0 [+1]  UInt  a
  1 [+1]  UInt  b
    [text_output: "Skip"]
```

The text format output (as from `emboss::WriteToString()` in C++) would be of
the form:

```
{ a: 1 }
```

instead of the default:

```
{ a: 1, b: 2 }
```

For completeness, `[text_output: "Emit"]` may be used to explicitly specify that
a field should be included in text output.


### `external` specifier attributes

The `addressable_unit_size`, `type_requires`, `fixed_size_in_bits`, and
`is_integer` attributes are used on `external` types to tell the compiler what
it needs to know about the `external` types.  They are currently
unstable, and should only be used internally.


## Type Definitions

Emboss allows you to define structs, bits, and enums, and uses externals
to define "basic types."  (C-like unions are not a separate kind of type; they
can be modeled with overlapping `struct` fields.)  Types may be defined in any
order, and may freely reference other types in the same module or any imported
modules (including the implicitly-imported prelude).

### `struct`

A `struct` defines a view of a sequence of bytes.  Each field of a `struct` is a
view of some particular subsequence of the `struct`'s bytes, whose
interpretation is determined by the field's type.

For example:

```
struct FramedMessage:
  -- A FramedMessage wraps a Message with magic bytes, lengths, and CRC.
  [$default byte_order: "LittleEndian"]
  0   [+4]  UInt     magic_value
  4   [+4]  UInt     header_length  (h)
  8   [+4]  UInt     message_length (m)
  h   [+m]  Message  message
  h+m [+4]  UInt     crc32
    [byte_order: "BigEndian"]
```

The first line introduces the `struct` and gives it a name.  This name may be
used in field definitions to specify that the field has a structured type, and
is used in the generated code.  For example, to read the `message_length` from a
sequence of bytes in C++, you would construct a `FramedMessageView` over the
bytes:

```c++
// vector<uint8_t> bytes;
auto framed_message_view = FramedMessageView(&bytes[0], bytes.size());
uint32_t message_length = framed_message_view.message_length().Read();
```

(Note that the `FramedMessageView` does not take ownership of the bytes: it only
provides a view of them.)

Each field starts with a byte range (`0 [+4]`) that indicates *where* the field
sits in the struct.  For example, the `magic_value` field covers the first four
bytes of the struct.

Field locations *do not have to be constants*.  In the example above, the
`message` field starts at the end of the header (as determined by the
`header_length` field) and covers `message_length` bytes.

After the field's location is the field's *type*.  The type determines how the
field's bytes are interpreted: the `header_length` field will be interpreted as
an unsigned integer (`UInt`), while the `message` field is interpreted as a
`Message` -- another `struct` type defined elsewhere.

After the type is the field's *name*: this is a name used in the generated code
to access that field, as in `framed_message_view.message_length()`.  The name
may be followed by an optional *abbreviation*, like the `(h)` after
`header_length`.
The abbreviation can be used elsewhere in the `struct`, but is
not available in the generated code: `framed_message_view.h()` wouldn't compile.

Finally, fields may have attributes and documentation, just like any other
Emboss construct.


#### `$next`

The keyword `$next` may be used in the offset expression of a physical field:

```
struct Foo:
  0     [+4]  UInt  x
  $next [+2]  UInt  y
  $next [+1]  UInt  z
  $next [+4]  UInt  q
```

`$next` translates to a built-in constant meaning "the end of the previous
physical field."  In the example above, `y` will start at offset 4 (0 + 4), `z`
starts at offset 6 (4 + 2), and `q` at 7 (6 + 1).

`$next` may be used in `bits` as well as `struct`s:

```
bits Bar:
  0     [+4]  UInt  x
  $next [+2]  UInt  y
  $next [+1]  UInt  z
  $next [+4]  UInt  q
```

You may use `$next` like a regular variable.  For example, if you want to leave
a two-byte gap between `z` and `q` (so that `q` starts at offset 9):

```
struct Foo:
  0       [+4]  UInt  x
  $next   [+2]  UInt  y
  $next   [+1]  UInt  z
  $next+2 [+4]  UInt  q
```

`$next` is particularly useful if your datasheet defines structures as lists of
fields without offsets, or if you are translating from a C or C++ packed
`struct`.


#### Parameters

`struct`s and `bits` can take runtime parameters:

```
struct Foo(x: Int:8, y: Int:8):
  0 [+x]  UInt:8[]  xs
  x [+y]  UInt:8[]  ys

enum Version:
  VERSION_1 = 10
  VERSION_2 = 20

struct Bar(version: Version):
  0 [+1]  UInt  payload
  if payload == 1 && version == Version.VERSION_1:
    1 [+10]  OldPayload1  old_payload_1
  if payload == 1 && version == Version.VERSION_2:
    1 [+12]  NewPayload1  new_payload_1
```

Each parameter must have the form *name`:` type*.  Currently, the *type* can
be:

*   <code>UInt:*n*</code>, where *`n`* is a number from 1 to 64, inclusive.
*   <code>Int:*n*</code>, where *`n`* is a number from 1 to 64, inclusive.
*   The name of an Emboss `enum` type.

`UInt`- and `Int`-typed parameters are integers with the corresponding range:
for example, an `Int:4` parameter can have any integer value from -8 to +7.

`enum`-typed parameters can take any value in the `enum`'s native range.  Note
that Emboss `enum`s are *open*, so unnamed values are allowed.

Parameterized structures can be included in other structures by passing their
parameters:

```
struct Baz:
  0 [+1]     Version       version
  1 [+1]     UInt:8        size
  2 [+size]  Bar(version)  bar
```


#### Virtual "Fields"

It is possible to define a non-physical "field" whose value is an expression:

```
struct Foo:
  0 [+4]  UInt  bar
  let two_bar = 2 * bar
```

These virtual "fields" may be used like any other field in most circumstances:

```
struct Bar:
  0 [+4]  Foo  foo
  if foo.two_bar < 100:
    foo.two_bar [+4]  UInt  uint_at_offset_two_bar
```

Virtual fields may be integers, booleans, or an enum:

```
enum Size:
  SMALL = 1
  LARGE = 2

struct Qux:
  0 [+4]  UInt  x
  let x_is_big = x > 100
  let x_size = x_is_big ? Size.LARGE : Size.SMALL
```

When a virtual field has a constant value, you may refer to it using its type:

```
struct Foo:
  let foo_offset = 0x120
  0 [+4]  UInt  foo

struct Bar:
  Foo.foo_offset [+4]  Foo  foo
```

This does not work for non-constant virtual fields:

```
struct Foo:
  0 [+4]  UInt  foo
  let foo_offset = foo + 10

struct Bar:
  Foo.foo_offset [+4]  Foo  foo  # Won't compile.
```

Note that, in some cases, you *must* use Type.field, and not field.field:

```
struct Foo:
  0 [+4]  UInt  foo
  let foo_offset = 10

struct Bar:
  # Won't compile: foo.foo_offset depends on foo, which depends on
  # foo.foo_offset.
  foo.foo_offset [+4]  Foo  foo

  # Will compile: Foo.foo_offset is a static constant.
  Foo.foo_offset [+4]  Foo  foo
```

This limitation may be lifted in the future, but it has no practical effect.


##### Aliases

Virtual fields of the form `let x = y` or `let x = y.z.q` are allowed even when
`y` or `q` are composite fields.  Virtuals of this form are considered to be
*aliases* of the referred field; in generated code, they may be written as well
as read, and writing through them is equivalent to writing to the aliased field.


##### Simple Transforms

Virtual fields of the forms `let x1 = y + 1`, `let x2 = 2 + y`, `let x3 = y -
3`, and `let x4 = 4 - y`, where `y` is a writeable field, will be writeable in
the generated code.  When writing through these fields, the transformed field
will be set to an appropriate value.  For example, writing `5` to `x1` will
actually write `4` to `y`, and writing `6` to `x4` will write `-2` to `y`.  This
can be used to model fields whose raw values should be adjusted by some constant
value, e.g.:

```
struct PosixDate:
  0 [+1]  Int  raw_year
    -- Number of years since 1900.

  let year = raw_year + 1900
    -- Gregorian year number.

  1 [+1]  Int  zero_based_month
    -- Month number, from 0-11.  Good for looking up a month name in a table.

  let month = zero_based_month + 1
    -- Month number, from 1-12.  Good for printing directly.

  2 [+1]  Int  day
    -- Day number, one-based.
```


#### Subtypes

A `struct` definition may contain other type definitions:

```
struct Foo:
  struct Bar:
    0 [+2]  UInt  baz
    2 [+2]  UInt  qux

  0 [+4]  Bar  bar
  4 [+4]  Bar  bar2
```


#### Conditional fields

A `struct` may have fields which are only present under some
circumstances.
For example:

```
struct FramedMessage:
  0 [+4]  enum  message_id:
    TYPE1 = 1
    TYPE2 = 2

  if message_id == MessageId.TYPE1:
    4 [+16]  Type1Message  type_1_message

  if message_id == MessageId.TYPE2:
    4 [+8]   Type2Message  type_2_message
```

The `type_1_message` field will only be available if `message_id` is `TYPE1`,
and similarly the `type_2_message` field will only be available if `message_id`
is `TYPE2`.  If `message_id` is some other value, then neither field will be
available.


#### Inline `struct`

It is possible to define a `struct` inline in a `struct` field.  For example:

```
struct Message:
  [$default byte_order: "BigEndian"]
  0 [+4]  UInt    message_length
  4 [+4]  struct  payload:
    0 [+1]  UInt  incoming
    2 [+2]  UInt  scale_factor
```

This is equivalent to:

```
struct Message:
  [$default byte_order: "BigEndian"]

  struct Payload:
    0 [+1]  UInt  incoming
    2 [+2]  UInt  scale_factor

  0 [+4]  UInt     message_length
  4 [+4]  Payload  payload
```

This can be useful as a way to group related fields together.


#### Using `struct` to define a C-like `union`

Emboss doesn't support C-like `union`s directly via built-in type
definitions.  However, you can use Emboss's overlapping fields feature to
effectively create a `union`:

```
struct Foo:
  0 [+1]  UInt  a
  0 [+2]  UInt  b
  0 [+4]  UInt  c
```


#### Automatically-Generated Fields

A `struct` will have `$size_in_bytes`, `$max_size_in_bytes`, and
`$min_size_in_bytes` virtual fields automatically generated.
These virtual fields
can be referenced inside the Emboss language just like any other virtual field:

```
struct Inner:
  0 [+4]  UInt  field_a
  4 [+4]  UInt  field_b

struct Outer:
  0 [+1]  UInt  message_type
  if message_type == 4:
    4 [+Inner.$size_in_bytes]  Inner  payload
```


##### `$size_in_bytes` {#size-in-bytes}

An Emboss `struct` has an *intrinsic* size, which is the size required to hold
every field in the `struct`, regardless of how many bytes are in the buffer that
backs the `struct`.  For example:

```
struct FixedSize:
  0 [+4]  UInt  long_field
  4 [+2]  UInt  short_field
```

In this case, `FixedSize.$size_in_bytes` will always be `6`, even if a
`FixedSize` is placed in a larger field:

```
struct Envelope:
  # padded_payload.$size_in_bytes == FixedSize.$size_in_bytes == 6
  0 [+8]  FixedSize  padded_payload
```

The intrinsic size of a `struct` might not be constant:

```
struct DynamicallySizedField:
  0 [+1]       UInt      length
  1 [+length]  UInt:8[]  payload
  # $size_in_bytes == 1 + length

struct DynamicallyPlacedField:
  0      [+1]  UInt  offset
  offset [+1]  UInt  payload
  # $size_in_bytes == offset + 1

struct OptionalField:
  0 [+1]  UInt  version
  if version > 3:
    1 [+1]  UInt  optional_field
  # $size_in_bytes == (version > 3 ? 2 : 1)
```

If the intrinsic size is dynamic, it can still be read dynamically from a field:

```
struct Envelope2:
  0 [+1]             UInt                   payload_size
  1 [+payload_size]  DynamicallySizedField  payload
  let padding_bytes = payload_size - payload.$size_in_bytes
```


##### `$max_size_in_bytes` {#max-size-in-bytes}

The `$max_size_in_bytes` virtual field is a constant value that is at least as
large as the largest possible value for `$size_in_bytes`.
In most cases, it
will exactly equal the largest possible message size, but it is possible to
outsmart Emboss's bounds checker.

```
struct DynamicallySizedStruct:
  0 [+1]       UInt      length
  1 [+length]  UInt:8[]  payload

struct PaddedContainer:
  0 [+DynamicallySizedStruct.$max_size_in_bytes]  DynamicallySizedStruct  s
  # s will be 256 bytes long.
```


##### `$min_size_in_bytes` {#min-size-in-bytes}

The `$min_size_in_bytes` virtual field is a constant value that is no larger
than the smallest possible value for `$size_in_bytes`.  In most cases, it will
exactly equal the smallest possible message size, but it is possible to
outsmart Emboss's bounds checker.

```
struct DynamicallySizedStruct:
  0 [+1]       UInt      length
  1 [+length]  UInt:8[]  payload

struct PaddedContainer:
  0 [+DynamicallySizedStruct.$min_size_in_bytes]  DynamicallySizedStruct  s
  # s will be 1 byte long.
```


### `enum`

An `enum` defines a set of named integers.

```
enum Color:
  BLACK   = 0
  RED     = 1
  GREEN   = 2
  YELLOW  = 3
  BLUE    = 4
  MAGENTA = 5
  CYAN    = 6
  WHITE   = 7

struct PaletteEntry:
  0 [+1]  UInt   id
  1 [+1]  Color  color
```

Enum values are always read the same way as `Int` or `UInt` -- that is, as an
unsigned integer or as a 2's-complement signed integer, depending on whether the
`enum` contains any negative values or not.

Enum values do not have to be contiguous, and may repeat:

```
enum Baud:
  B300     = 300
  B600     = 600
  B1200    = 1200
  STANDARD = 1200
```

All values in a single `enum` must either be between -9223372036854775808
(-2^63) and 9223372036854775807 (2^(63)-1), inclusive, or between 0 and
18446744073709551615 (2^(64)-1), inclusive.

It is valid to have an `enum` field that is too small to contain some values in
the `enum`:

```
enum LittleAndBig:
  LITTLE = 1
  BIG    = 0x1_0000_0000

struct LittleOnly:
  0 [+1]  LittleAndBig:8  little_only  # Too small to hold LittleAndBig.BIG
```

Emboss `enum`s are *open*: they may take values that are not defined in the
`.emb`, as long as those values are in range.  The `is_signed` and
`maximum_bits` attributes, below, may be used to control the allowed range of
values.


#### `is_signed` Attribute

The attribute `is_signed` may be used to explicitly specify whether an `enum`
is signed or unsigned.  Normally, an `enum` is signed if there is at least one
negative value, and unsigned otherwise, but this behavior can be overridden:

```
enum ExplicitlySigned:
  [is_signed: true]
  POSITIVE = 10
```


#### `maximum_bits` Attribute

The attribute `maximum_bits` may be used to specify the *maximum* width of an
`enum`: fields of `enum` type may be smaller than `maximum_bits`, but never
larger:

```
enum ExplicitlySized:
  [maximum_bits: 32]
  MAX_VALUE = 0xffff_ffff

struct Foo:
  0 [+4]  ExplicitlySized  four_bytes  # 32-bit is fine
  #4 [+8]  ExplicitlySized  eight_bytes  # 64-bit field would be an error
```

If not specified, `maximum_bits` defaults to `64`.

This also allows back end code generators to use smaller types for `enum`s, in
some cases.
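
As a rough illustration of the ranges involved, the sketch below computes the
smallest and largest value representable in an *n*-bit unsigned or
two's-complement signed field.  The helper names (`MaxUnsigned`, `MinSigned`,
and `MaxSigned`) are hypothetical and are not part of Emboss or its generated
code; this is plain C++ arithmetic, not a description of how the Emboss compiler
performs its own checks.

```c++
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helpers for illustration only; not part of Emboss.
// All three assume 1 <= bits <= 64.

// Largest value an unsigned field of `bits` bits can hold.
constexpr std::uint64_t MaxUnsigned(int bits) {
  return bits == 64 ? std::numeric_limits<std::uint64_t>::max()
                    : (std::uint64_t{1} << bits) - 1;
}

// Smallest value a two's-complement signed field of `bits` bits can hold.
constexpr std::int64_t MinSigned(int bits) {
  return bits == 64 ? std::numeric_limits<std::int64_t>::min()
                    : -(std::int64_t{1} << (bits - 1));
}

// Largest value a two's-complement signed field of `bits` bits can hold.
constexpr std::int64_t MaxSigned(int bits) {
  return bits == 64 ? std::numeric_limits<std::int64_t>::max()
                    : (std::int64_t{1} << (bits - 1)) - 1;
}
```

With these helpers, `MaxUnsigned(32)` equals `0xffff_ffff`, which is why
`MAX_VALUE` in `ExplicitlySized` above still fits with `[maximum_bits: 32]`.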


#### Inline `enum`

It is possible to provide an enum definition directly in a field definition in a
`struct` or `bits`:

```
struct TurnSpecification:
  0 [+1]  UInt  degrees
  1 [+1]  enum  direction:
    LEFT  = 0
    RIGHT = 1
```

This example creates a nested `enum` `TurnSpecification.Direction`, exactly as
if it were written:

```
struct TurnSpecification:
  enum Direction:
    LEFT  = 0
    RIGHT = 1

  0 [+1]  UInt       degrees
  1 [+1]  Direction  direction
```

This can be useful when a particular `enum` is short and only used in one place.

Note that `maximum_bits` and `is_signed` cannot be used on an inline `enum`.
If you need to use either of these attributes, make a separate `enum`.


### `bits`

A `bits` defines a view of an ordered sequence of bits.  Each field is a view of
some particular subsequence of the `bits`'s bits, whose interpretation is
determined by the field's type.

The structure of a `bits` definition is very similar to a `struct`, except that
a `struct` provides a structured view of bytes, where a `bits` provides a
structured view of bits.  Fields in a `bits` must have bit-oriented types (such
as other `bits`, `UInt`, `Bcd`, `Flag`).  Byte-oriented types, such as
`struct`s, may not be embedded in a `bits`.

For example:

```
bits ControlRegister:
  -- The `ControlRegister` holds basic control values.

  4 [+12]  UInt  horizontal_start_offset
    -- The number of pixel clock ticks to wait after the start of a line
    -- before starting to draw pixel data.

  3 [+1]   Flag  horizontal_overscan_disable
    -- If set, the electron gun will be disabled during the overscan period,
    -- otherwise the overscan color will be used.

  0 [+3]   UInt  horizontal_overscan_color
    -- The palette index of the overscan color to use.

struct RegisterPage:
  -- The registers of the BGA (Bogus Graphics Array) card.

  0 [+2]  ControlRegister  control_register
    [byte_order: "LittleEndian"]
```

The first line introduces the `bits` and gives it a name.  This name may be
used in field definitions to specify that the field has a structured type, and
is used in the generated code.

For example, to write a `horizontal_overscan_color` of 7 to a pair of bytes in
C++, you would use:

```c++
// vector<uint8_t> bytes;
auto register_page_view = RegisterPageWriter(&bytes[0], bytes.size());
register_page_view.control_register().horizontal_overscan_color().Write(7);
```

Similar to `struct`, each field starts with a *bit* range (`4 [+12]`) that
indicates which bits it covers.  For example, the `horizontal_overscan_disable`
field only covers bit 3.  Bit 0 always corresponds to the lowest-order bit of
the bitfield; that is, if a `UInt` covers the same bits as the `bits` construct,
then bit 0 in the `bits` will be the same as the `UInt` mod 2.  This is often,
but not always, how bits are numbered in protocol specifications.

After the field's location is the field's *type*.  The type determines how the
field's bits are interpreted: typical choices are `UInt` (for unsigned
integers), `Flag` (for boolean flags), and `enum`s.  Other `bits` may also be
used, as well as any `external` types declared with `[addressable_unit_size:
1]`.

Fields may have attributes and documentation, just like any other Emboss
construct.

In generated code, reading or writing any field of a `bits` construct will cause
the entire field to be read or written -- something to keep in mind when reading
or writing a memory-mapped register space.
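
To illustrate what a whole-register access looks like, here is a minimal
hand-written sketch of a read-modify-write on the 16-bit little-endian
`ControlRegister` from the example above.  It does not use the Emboss-generated
API: every function name below is an assumption made up for this illustration,
and the code Emboss actually generates may differ.

```c++
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration; not Emboss-generated code.

// Reads a 16-bit little-endian value from the first two bytes.
std::uint16_t ReadLittleEndian16(const std::vector<std::uint8_t>& bytes) {
  return static_cast<std::uint16_t>(bytes[0] | (bytes[1] << 8));
}

// Writes a 16-bit value to the first two bytes in little-endian order.
void WriteLittleEndian16(std::vector<std::uint8_t>& bytes, std::uint16_t v) {
  bytes[0] = static_cast<std::uint8_t>(v & 0xff);
  bytes[1] = static_cast<std::uint8_t>(v >> 8);
}

// Extracts `width` bits starting at bit `start` (bit 0 is the lowest-order).
std::uint16_t GetBits(std::uint16_t reg, int start, int width) {
  return static_cast<std::uint16_t>((reg >> start) & ((1u << width) - 1));
}

// Returns `reg` with `width` bits starting at `start` replaced by `value`.
std::uint16_t SetBits(std::uint16_t reg, int start, int width,
                      std::uint16_t value) {
  const unsigned mask = ((1u << width) - 1) << start;
  return static_cast<std::uint16_t>((reg & ~mask) |
                                    ((unsigned{value} << start) & mask));
}

// Sets horizontal_overscan_color (bits 0..2).  Note that both bytes of the
// register are read and then written back, even though only 3 bits change.
void WriteOverscanColor(std::vector<std::uint8_t>& bytes, std::uint16_t color) {
  std::uint16_t reg = ReadLittleEndian16(bytes);
  reg = SetBits(reg, 0, 3, color);
  WriteLittleEndian16(bytes, reg);
}
```

On a memory-mapped register with read or write side effects, the extra read and
the write-back of unchanged bits in a sequence like this are what may matter.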


#### Anonymous `bits`

It is possible to use an anonymous `bits` definition directly in a `struct`;
for example:

```
struct Message:
  [$default byte_order: "BigEndian"]
  0 [+4]  UInt  message_length
  4 [+4]  bits:
    0  [+1]  Flag  incoming
    1  [+1]  Flag  last_fragment
    2  [+4]  UInt  scale_factor
    31 [+1]  Flag  error
```

In this case, the fields of the `bits` will be treated as though they are fields
of the outer struct.


#### Inline `bits`

Like `enum`s, it is also possible to define a named `bits` inline in a `struct`
or `bits`.  For example:

```
struct Message:
  [$default byte_order: "BigEndian"]
  0 [+4]  UInt  message_length
  4 [+4]  bits  payload:
    0  [+1]  Flag  incoming
    1  [+1]  Flag  last_fragment
    2  [+4]  UInt  scale_factor
    31 [+1]  Flag  error
```

This is equivalent to:

```
struct Message:
  [$default byte_order: "BigEndian"]

  bits Payload:
    0  [+1]  Flag  incoming
    1  [+1]  Flag  last_fragment
    2  [+4]  UInt  scale_factor
    31 [+1]  Flag  error

  0 [+4]  UInt     message_length
  4 [+4]  Payload  payload
```

This can be useful as a way to group related fields together.


#### Automatically-Generated Fields

A `bits` will have `$size_in_bits`, `$max_size_in_bits`, and `$min_size_in_bits`
virtual fields automatically generated.  These virtual fields can be referenced
inside the Emboss language just like any other virtual field:

```
bits Inner:
  0 [+4]  UInt  field_a
  4 [+4]  UInt  field_b

struct Outer:
  0 [+1]  UInt  message_type
  if message_type == 4:
    4 [+Inner.$size_in_bits]  Inner  payload
```


##### `$size_in_bits` {#size-in-bits}

Like a `struct`, an Emboss `bits` has an *intrinsic* size, which is the size
required to hold every field in the `bits`, regardless of how many bits are
in the buffer that backs the `bits`.
For example:

```
bits FixedSize:
  0 [+3]  UInt  long_field
  3 [+1]  Flag  short_field
```

In this case, `FixedSize.$size_in_bits` will always be `4`, even if a
`FixedSize` is placed in a larger field:

```
struct Envelope:
  # padded_payload.$size_in_bits == FixedSize.$size_in_bits == 4
  0 [+8]  FixedSize  padded_payload
```

Unlike `struct`s, the size of `bits` must be known at compile time; there are no
dynamic `$size_in_bits` fields.


##### `$max_size_in_bits` {#max-size-in-bits}

Since `bits` must be fixed size, the `$max_size_in_bits` field has the same
value as `$size_in_bits`.  It is provided for consistency with
`$max_size_in_bytes`.


##### `$min_size_in_bits` {#min-size-in-bits}

Since `bits` must be fixed size, the `$min_size_in_bits` field has the same
value as `$size_in_bits`.  It is provided for consistency with
`$min_size_in_bytes`.


### `external`

An `external` type is used when a type cannot be defined in Emboss itself;
instead, external code must be provided to manipulate the type.

Emboss's built-in types, such as `UInt`, `Bcd`, and `Flag`, are defined this way
in a special file called the *prelude*.  For example, `UInt` is defined as:

```
external UInt:
  -- UInt is an automatically-sized unsigned integer.
  [type_requires: $is_statically_sized && 1 <= $static_size_in_bits <= 64]
  [is_integer: true]
  [addressable_unit_size: 1]
```

`external` types are an unstable feature.  Contact `emboss-dev` if you would
like to add your own `external`s.


## Builtin Types and the Prelude

Emboss has a built-in module called the *Prelude*, which contains types that are
automatically usable from any module.  In particular, types like `Int` and
`UInt` are defined in the Prelude.

The Prelude is (more or less) a standard Emboss file, called `prelude.emb`, that
is embedded in the Emboss compiler.

<!-- TODO(bolms): When the documentation generator backend is built, generate
the Prelude documentation from prelude.emb. -->


### `UInt`

A `UInt` is an unsigned integer.  `UInt` can be anywhere from 1 to 64 bits in
size, and may be used both in `struct`s and in `bits`.  `UInt` fields may be
referenced in integer expressions.


### `Int`

An `Int` is a signed two's-complement integer.  `Int` can be anywhere from 1 to
64 bits in size, and may be used both in `struct`s and in `bits`.  `Int` fields
may be referenced in integer expressions.


### `Bcd`

(Note: `Bcd` is subject to change.)

A `Bcd` is an unsigned binary-coded decimal integer.  `Bcd` can be anywhere from
1 to 64 bits in size, and may be used both in `struct`s and in `bits`.  `Bcd`
fields may be referenced in integer expressions.

When a `Bcd`'s size is not a multiple of 4 bits, the high-order "digit" is
treated as if it were zero-extended to a multiple of 4 bits.  For example, a
7-bit `Bcd` value can store any number from 0 to 79.


### `Flag`

A `Flag` is a 1-bit boolean value.  A stored value of `0` means `false`, and a
stored value of `1` means `true`.


### `Float`

A `Float` is a floating-point value in an IEEE 754 binaryNN format, where NN is
the bit width.

Only 32- and 64-bit `Float`s are supported.  There are no current plans to
support 16- or 128-bit `Float`s, nor the nonstandard x86 80-bit `Float`s.

IEEE 754 does not specify which NaN bit patterns are signalling NaNs and which
are quiet NaNs, and thus Emboss also does not specify which NaNs are which.
This means that a quiet NaN written through an Emboss view on one system could
be read out as a signalling NaN through an Emboss view on a different system. If
this is a concern, the application must explicitly check for NaN before doing
arithmetic on any floating-point value read from a `Float` field.


## General Syntax

### Names

All names in Emboss must be ASCII, for compatibility with languages such as C
and C++ that do not support Unicode identifiers.

Type names in Emboss are always `CamelCase`. They must start with a capital
letter, contain at least one lower-case letter, and contain only letters and
digits. They are required to match the regex
`[A-Z][a-zA-Z0-9]*[a-z][a-zA-Z0-9]*`.

Imported module names and field names are always `snake_case`. They must start
with a lower-case letter, and may only contain lower-case letters, numbers, and
underscore. They must match the regex `[a-z][a-z_0-9]*`.

Enum value names are always `SHOUTY_CASE`. They must start with a capital
letter, may only contain capital letters, numbers, and underscore, and must be
at least two characters long. They must match the regex
`[A-Z][A-Z_0-9]*[A-Z_][A-Z_0-9]*`.

Additionally, names that are used as keywords in common programming languages
are disallowed. A complete list can be found in the [Grammar
Reference](grammar.md).


### Expressions

#### Primary expressions

Emboss primary expressions are field names (like `field` or `field.subfield`),
numeric constants (like `9` or `0x1_0000_0000`), enum value names (like
`Enum.VALUE`), and the boolean constants `true` and `false`.

Subfields may be specified using `.`; e.g., `foo.bar` references the `bar`
subfield of the `foo` field. Emboss parses `.` before any expressions: unlike
many languages, something like `(foo).bar` is a syntax error in Emboss.

Enum values generally must be qualified by their type; e.g., `Color.RED` rather
than just `RED`. Enums defined in other modules must use the imported module
name, as in `styles.Color.RED`.


#### Operators and Functions

Note: Emboss currently has a relatively limited set of operators because
operators have been implemented as needed. If you could use an operator that is
not on the list, email `emboss-dev@`, and we'll see about adding it.

Emboss operators have the following precedence (tightest binding to loosest
binding):

1. `()` `$max()` `$present()` `$upper_bound()` `$lower_bound()`
2. unary `+` and `-` ([see note 1](#precedence-note-unary-plus-minus))
3. `*`
4. `+` `-`
5. `<` `>` `==` `!=` `>=` `<=` ([see note 2](#precedence-note-comparisons))
6. `&&` `||` ([see note 3](#precedence-note-and-or))
7. `?:` ([see note 4](#precedence-note-choice))


###### Note 1 {#precedence-note-unary-plus-minus}

Only one unary `+` or `-` may be applied to an expression without parentheses.
These expressions are valid:

```
-5
+6
-(-x)
```

These are not:

```
- -5
-+5
+ +5
+-5
```


###### Note 2 {#precedence-note-comparisons}

The relational operators may be chained like so:

```
10 <= x < 50       # 10 <= x && x < 50
10 <= x == y < 50  # 10 <= x && x == y && y < 50
100 > y >= 2       # 100 > y && y >= 2
x == y == 15       # x == y && y == 15
```

These chains are not allowed:

```
10 < x > 50
10 < x == y >= z
x == y >= z <= 50
```

If one specifically wants to compare the result of a comparison, parentheses
must be used:

```
(x > 15) == (y > 15)
(x > 15) == true
```

The `!=` operator may not be chained.

A chain may contain either `<`, `<=`, and/or `==`, or `>`, `>=`, and/or `==`.
Greater-than comparisons may not be mixed with less-than comparisons.


###### Note 3 {#precedence-note-and-or}

The boolean logical operators have the same precedence, but may not be mixed
without parentheses. The following are allowed:

```
x && y && z
x || y || z
(x || y) && z
x || (y && z)
```

The following are not allowed:

```
x || y && z
x && y || z
```


###### Note 4 {#precedence-note-choice}

The choice operator `?:` may not be chained without parentheses. These are OK:

```
q ? x : (r ? y : z)
q ? (r ? x : y) : z
```

These are not:

```
q ? x : r ? y : z  # Is this `(q?x:r)?y:z` or `q?x:(r?y:z)`?
q ? r ? x : y : z  # Technically unambiguous, but visually confusing
```


##### `()`

Parentheses are used to override precedence. The subexpression inside the
parentheses will be evaluated as a unit:

```
3 * 4 + 5 == 17
3 * (4 + 5) == 27
```

The value inside the parentheses can have any type; the value of the resulting
expression will have the same type.


##### `$present()`

The `$present()` function takes a field as an argument, and returns `true` if
the field is present in its structure.

```
struct PresentExample:
  0 [+1] UInt x
  if false:
    1 [+1] UInt y
  if x > 10:
    2 [+1] UInt z
  if $present(x):  # Always true
    0 [+1] Int x2
  if $present(y):  # Always false
    1 [+1] Int y2
  if $present(z):  # Equivalent to `if x > 10`
    2 [+1] Int z2
```

`$present()` takes exactly one argument.

The argument to `$present()` must be a reference to a field. It can be a nested
reference, like `$present(x.y.z.q.r)`. The type of the field does not matter.

`$present()` returns a boolean.
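
The existence conditions in `PresentExample` can also be modeled outside
Emboss. The Python sketch below is only an illustration of the semantics
described in this section, not part of Emboss or its generated code; the
function name `present_fields` is hypothetical.

```python
def present_fields(x):
    """Return the names of the PresentExample fields that exist for a given x.

    Hypothetical helper: each `if` mirrors one condition in the example struct.
    """
    present = {"x"}        # x is unconditional.
    if False:              # `if false:` means y never exists.
        present.add("y")
    if x > 10:
        present.add("z")
    if "x" in present:     # $present(x): always true.
        present.add("x2")
    if "y" in present:     # $present(y): always false.
        present.add("y2")
    if "z" in present:     # $present(z): equivalent to x > 10.
        present.add("z2")
    return present
```

For instance, `present_fields(5)` contains only `x` and `x2`, while
`present_fields(11)` also contains `z` and `z2`.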


##### `$max()`

The `$max()` function returns the maximum value out of its arguments:

```
$max(1) == 1
$max(-10, -5) == -5
$max(1, 2, 3, 4, 5, 6, 7, 8, 9, 10) == 10
```

`$max()` requires at least one argument. There is no explicit limit on the
number of arguments, but at some point the Emboss compiler will run out of
memory.

All arguments to `$max()` must be integers, and it returns an integer.


##### `$upper_bound()`

The `$upper_bound()` function returns a value that is at least as high as the
maximum possible value of its argument:

```
$upper_bound(1) == 1
$upper_bound(-10) == -10
$upper_bound(foo) == 255  # If foo is UInt:8
$upper_bound($max(foo, 500)) == 500  # If foo is UInt:8
```

Generally, `$upper_bound()` will return a tight bound, but it is possible to
outsmart Emboss's bounds checker.

`$upper_bound()` takes a single integer argument, and returns a single integer.


##### `$lower_bound()`

The `$lower_bound()` function returns a value that is no greater than the
minimum possible value of its argument:

```
$lower_bound(1) == 1
$lower_bound(-10) == -10
$lower_bound(foo) == -128  # If foo is Int:8
$lower_bound($min(foo, -500)) == -500  # If foo is Int:8
```

Generally, `$lower_bound()` will return a tight bound, but it is possible to
outsmart Emboss's bounds checker.

`$lower_bound()` takes a single integer argument, and returns a single integer.


##### Unary `+` and `-`

The unary `+` operator returns its argument unchanged.

The unary `-` operator subtracts its argument from 0:

```
3 * -4 == 0 - 12
-(3 * 4) == -12
```

Unary `+` and `-` require an integer argument, and return an integer result.
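
Because unary `-` subtracts its argument from 0, negating a value that lies in
the range [lo, hi] gives a value in the range [-hi, -lo]. The Python sketch
below illustrates that arithmetic for fixed-width integers such as `UInt:8`.
The helper names are hypothetical, and this is plain interval arithmetic rather
than a description of how Emboss's bounds checker is implemented.

```python
def uint_range(bits):
    """Inclusive (lowest, highest) values of an unsigned integer of `bits` bits."""
    return (0, 2**bits - 1)


def int_range(bits):
    """Inclusive (lowest, highest) values of a two's-complement integer."""
    return (-(2 ** (bits - 1)), 2 ** (bits - 1) - 1)


def negate_range(value_range):
    """Range of -v for any v in value_range: the bounds swap and change sign."""
    lowest, highest = value_range
    return (-highest, -lowest)
```

Under these assumptions, `negate_range(uint_range(8))` is `(-255, 0)`: the
negation of an 8-bit unsigned value is never positive and never below -255.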


##### `*`

`*` is the multiplication operator:

```
3 * 4 == 12
10 * 10 == 100
```

The `*` operator requires two integer arguments, and returns an integer.


##### `+` and `-`

`+` and `-` are the addition and subtraction operators, respectively:

```
3 + 4 == 7
3 - 4 == -1
```

The `+` and `-` operators require two integer arguments, and return an integer
result.


##### `==` and `!=`

The `==` operator returns `true` if its arguments are equal, and `false` if not.

The `!=` operator returns `false` if its arguments are equal, and `true` if not.

Both operators take two boolean arguments, two integer arguments, or two
arguments of the same enum type, and return a boolean result.


##### `<`, `<=`, `>`, and `>=`

The `<` operator returns `true` if its first argument is numerically less than
its second argument.

The `>` operator returns `true` if its first argument is numerically greater
than its second argument.

The `<=` operator returns `true` if its first argument is numerically less than
or equal to its second argument.

The `>=` operator returns `true` if its first argument is numerically greater
than or equal to its second argument.

All of these operators take two integer arguments, and return a boolean value.


##### `&&` and `||`

The `&&` operator returns `false` if either of its arguments is `false`, even
if the other argument cannot be computed. `&&` returns `true` if both arguments
are `true`.

The `||` operator returns `true` if either of its arguments is `true`, even if
the other argument cannot be computed. `||` returns `false` if both arguments
are `false`.

The `&&` and `||` operators require two boolean arguments, and return a boolean
result.
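
The phrase "even if the other argument cannot be computed" describes a
three-valued logic. The Python sketch below models it, using `None` to stand
for a value that cannot be computed. The function names are hypothetical, and
the result when neither rule above applies (for example, `true` combined with
an uncomputable value) is an assumption: the sketch returns `None`, which the
text above does not explicitly specify.

```python
def emboss_and(a, b):
    """Model of `&&`; None stands for a value that cannot be computed."""
    if a is False or b is False:
        return False   # One false argument decides the result.
    if a is True and b is True:
        return True
    return None        # Assumption: otherwise the result is not computable.


def emboss_or(a, b):
    """Model of `||`; None stands for a value that cannot be computed."""
    if a is True or b is True:
        return True    # One true argument decides the result.
    if a is False and b is False:
        return False
    return None        # Assumption: otherwise the result is not computable.
```

With this model, `emboss_and(False, None)` is `False` and
`emboss_or(None, True)` is `True`, matching the rules stated above.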


##### `?:`

The `?:` operator, used like <code>*condition* ? *if\_true* :
*if\_false*</code>, returns *`if_true`* if *`condition`* is `true`, otherwise
*`if_false`*.

Other than having stricter type requirements for its arguments, it behaves like
the C, C++, Java, JavaScript, C#, etc. conditional operator `?:` (sometimes
called the "ternary operator").

The `?:` operator's *`condition`* argument must be a boolean, and the
*`if_true`* and *`if_false`* arguments must have the same type. It returns the
same type as *`if_true`* and *`if_false`*.


### Numeric Constant Formats

Numeric constants in Emboss may be written in decimal, hexadecimal, or binary
format:

```
12      # The decimal value of 6 + 6.
012     # The same value; NOT interpreted as octal.
0xc     # The same value, written in hexadecimal.
0xC     # Hex digits may be written in capital letters.
        # Note that the 'x' must be lower-case: 0XC is not allowed.
0b1100  # The same value, in binary.
```

Decimal numbers may use `_` as a thousands separator:

```
1_000_000  # 1e6
123_456_789
```

Hexadecimal and binary numbers may use `_` as a separator every 4 or 8 digits:

```
0x1234_5678_9abc_def0
0x12345678_9abcdef0
0b1010_0101_1010_0101
0b10100101_10100101
```

If separators are used, they *must* be thousands separators (for decimal
numbers) or 4- or 8-digit separators (for binary or hexadecimal numbers); `_`
may *not* be placed arbitrarily. Binary and hexadecimal numbers must be
consistent about whether they use 4- or 8-digit separators; they cannot be
mixed in the same constant:

```
1000_000              # Not allowed: missing the separator after 1.
1_000_00              # Not allowed: separators must be followed by a multiple
                      # of 3 digits.
0x1234_567            # Not allowed: separators must be followed by a multiple
                      # of 4 or 8 digits.
0x1234_5678_9abcdef0  # Not allowed: cannot mix 4- and 8-digit separators.
```

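
The separator rules above can be summarized as a regular expression. The
following Python sketch is one possible encoding of those rules, not the
grammar Emboss itself uses; the names `is_valid_constant` and `parse_constant`
are hypothetical. It assumes that the leading group before the first `_` may be
shorter than a full group (as in `0x1_0000_0000`) and that the `b` in `0b`,
like the `x` in `0x`, must be lower-case.

```python
import re

_HEX = r"[0-9a-fA-F]"
_NUMBER = re.compile(
    r"(?:"
    # Decimal: no separators, or thousands separators throughout.
    r"[0-9]+|[0-9]{1,3}(?:_[0-9]{3})+"
    # Hexadecimal: no separators, or consistent 4- or 8-digit groups.
    rf"|0x(?:{_HEX}+|{_HEX}{{1,4}}(?:_{_HEX}{{4}})+|{_HEX}{{1,8}}(?:_{_HEX}{{8}})+)"
    # Binary: same grouping rules as hexadecimal.
    r"|0b(?:[01]+|[01]{1,4}(?:_[01]{4})+|[01]{1,8}(?:_[01]{8})+)"
    r")"
)


def is_valid_constant(text):
    """Return True if text follows the numeric constant rules sketched above."""
    return _NUMBER.fullmatch(text) is not None


def parse_constant(text):
    """Return the integer value of a valid constant, or raise ValueError."""
    if not is_valid_constant(text):
        raise ValueError(f"not a valid numeric constant: {text!r}")
    digits = text.replace("_", "")
    if digits.startswith("0x"):
        return int(digits[2:], 16)
    if digits.startswith("0b"):
        return int(digits[2:], 2)
    return int(digits, 10)  # Leading zeros are decimal, never octal.
```

Under these assumptions, `parse_constant` gives the same value for `12`, `012`,
`0xc`, `0xC`, and `0b1100`, and `is_valid_constant` rejects `0XC`, `1000_000`,
and `0x1234_5678_9abcdef0`.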