# Design Sketch: Protocol Buffers <=> Emboss Translation

## Overview

There are many tools that operate on Protocol Buffer objects ("Protos").
Providing a way to translate between Protos and Emboss structures would allow
those tools to be used without writing a tedious translation layer.


## Defining an Equivalent Proto `message`

For each Emboss `struct`, `bits`, `enum`, and primitive type, there would need
to be some equivalent Proto encoding -- likely a `message` for each `struct` or
`bits`, a Proto `enum` inside a `message` for each `enum` (see below), and a
Proto primitive type for each Emboss primitive type.

There are two basic ways that the Proto definition could be generated:

1.  Human-Authored `.proto` Definitions:

    This requires more human effort when trying to use Emboss structures as
    Protos, likely approaching the level of effort to just hand-write a
    translation layer.  It *might* make it easier to use an existing Proto
    definition.

    It would also require significantly more flexibility, and therefore more
    complexity, in the Emboss compiler.

2.  Emboss Generates a `.proto` File:

    This option is likely to create slightly "unnatural" Proto definitions (see
    below for more details), but requires very little human effort to create a
    translation to a Proto.

    Escape hatches for "partially hand-coded" translations should be
    considered, even if they are not implemented in the first pass at Emboss
    <=> Proto translation.

Because a human always has the option to hand code their own translation, this
document will assume option 2: the Emboss compiler generates a Proto
definition.


### Proto2 vs Proto3

The current state of Google Protocol Buffers is a bit messy, with both "version
2" ("Proto2") and "version 3" ("Proto3") Protocol Buffers.  Proto2 and Proto3
can (mostly) freely interoperate -- Proto2 files can import and use messages
from Proto3 files and vice versa -- and both have long-term support guarantees
from Google.  Differences between Proto2 and Proto3 are highlighted below: it
is not clear whether Emboss should generate Proto2, Proto3, or both (via a flag
or file-level property).


### Primitive Types

#### `Int`, `UInt`

`Int` and `UInt` can map to Proto's `int32`, `int64`, `uint32`, and `uint64`.
Emboss integers narrower than a Proto integer type can be widened to the
next-larger Proto integer size.
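
One way to picture the widening rule is the small C++ sketch below.  The
function name `ProtoIntegerTypeFor` is hypothetical, and the sketch assumes
that Emboss integer fields are between 1 and 64 bits wide; it is an
illustration of the mapping, not code from the Emboss compiler.

```c++
#include <optional>
#include <string>

// Hypothetical helper: picks the narrowest Proto integer type that can hold
// every value of an Emboss Int/UInt field of the given width.
// Assumption: valid widths are 1 to 64 bits; anything else yields nullopt.
std::optional<std::string> ProtoIntegerTypeFor(int bit_width, bool is_signed) {
  if (bit_width < 1 || bit_width > 64) return std::nullopt;
  const bool fits_32 = bit_width <= 32;
  if (is_signed) return fits_32 ? "int32" : "int64";
  return fits_32 ? "uint32" : "uint64";
}
```

Under these assumptions an `Int:16` would become an `int32`, and a `UInt:48`
would become a `uint64`.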


#### `Float`

`Float` maps to Proto's `float` (32-bit) and `double` (64-bit).


#### `Flag`

`Flag` maps to Proto's `bool`.


#### (Future) Emboss String/Blob Type

A future Emboss string or blob type would translate to Proto's `string` or
`bytes`.  It is likely that an Emboss "string" will be `bytes` in Proto, since
Emboss is unlikely to enforce UTF-8 compliance.

Note that Proto (version 2 only?) C++ does not enforce UTF-8 compliance on
`string`, which can lead to crashes when the message is decoded in Python,
Java, or another language that properly enforces string encoding.


### Arrays

Unidimensional arrays map neatly to `repeated` Proto fields.

Multidimensional arrays must be handled with a wrapper `message` at each
dimension after the first.

Because of the way that Proto wire format works (see [Translation Between
Emboss View and Proto Wire Format](#between-emboss-view-and-proto-wire-format),
below), there is a slight technical advantage to wrapping the outermost array
in its own message.  This makes the (Proto) API slightly more awkward, but not
too bad.  With the outermost array wrapped:

```c++
auto element = structure.array_field().v(2);
auto nested_element = structure.array_2d_field().v(2).v(1);
```

vs without the outermost wrapper:

```c++
auto element = structure.array_field(2);
auto nested_element = structure.array_2d_field(2).v(1);
```


### Conditional Fields

In Proto2, conditional fields map fairly well to the concept of "presence" for
fields.  Proto2 allows non-present fields to be read -- returning the default
value for that field -- but this is not an issue for Emboss, which can easily
generate the appropriate <code>has_*field*()</code> calls.

Proto3 does not track existence for primitive types the way that Proto2 does.
The "recommended" workaround is to use standardized wrapper types
(`google.protobuf.FloatValue`, `google.protobuf.Int32Value`, etc.), which
introduce an extra layer.  There is a second workaround, related to the slightly
weird way that Proto handles `oneof`: if the primitive field is inside a
`oneof`, then it is *not* always present.  A `oneof` may contain a single
member, so primitive-typed fields could be generated as something like:

```
message Foo {
  oneof field_1_oneof {
    int32 field_1 = 1;
  }
}
```

Note that in Emboss, changing a field from unconditionally present to
conditionally present is (usually) a backwards-compatible change.


### (Future) Emboss Union Construct

An Emboss union construct would be necessary to take advantage of runtime space
savings from using a Proto `oneof`.


### `struct` and `bits`

`struct` and `bits` map neatly to `message`, with few issues.


#### Anonymous `bits`

Anonymous `bits` get "flattened" so that their fields appear to be part of their
enclosing structure.  This should be handled reasonably well via treating
read-write virtual fields as members of the `message`, and by suppressing the
"private" fields, such as anonymous `bits`.


#### Proto Field IDs

Proto requires each field to have a unique tag ID.  We propose that, for fields
with a fixed start location, the start location + 1 is used for a default tag
ID: since a change to a field's start location would be a breaking change to the
Emboss definition, it should be reasonably stable.  For fields with a variable
start location, virtual fields, or where the programmer wants a specific tag,
the attribute `[(proto) id]` can be used to specify the ID.

The "+ 1" is required since `0` is not a valid Proto tag ID.
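
The proposed default can be sketched in C++ as follows.  `FieldInfo` and
`ProtoTagFor` are hypothetical names invented for this illustration; they do
not correspond to anything in the Emboss compiler.

```c++
#include <cstdint>
#include <optional>

// Hypothetical summary of one Emboss field (not an Emboss compiler type).
struct FieldInfo {
  // Start location, or nullopt for variable-offset and virtual fields.
  std::optional<std::uint32_t> fixed_start;
  // Value of the `[(proto) id]` attribute, if the author supplied one.
  std::optional<std::uint32_t> explicit_id;
};

// Returns the Proto tag ID for `field`, or nullopt if the field has neither a
// fixed start location nor an explicit `[(proto) id]`.
std::optional<std::uint32_t> ProtoTagFor(const FieldInfo& field) {
  if (field.explicit_id) return field.explicit_id;
  if (field.fixed_start) return *field.fixed_start + 1;  // 0 is not a valid tag
  return std::nullopt;
}
```

One consequence worth noting: two fields that share a start location (for
example, mutually exclusive conditional fields) would receive the same default
under this rule, so at least one of them would presumably need an explicit
`[(proto) id]`.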


### `enum`

The Emboss `enum` construct does not map cleanly to the Proto `enum` construct,
with different issues in Proto2 vs Proto3.

Common to both, the names of Proto `enum` values are hoisted into the same
namespace as the `enum` itself (consistent with C's handling of `enum`),
which means that multiple `enum`s in the same context cannot hold the same value
name.  This can be handled -- somewhat awkwardly -- by wrapping the `enum` in a
"namespace" `message`, like:

```
message SomeEnum {
  enum SomeEnum {
    VALUE1 = 1;
    VALUE2 = 2;
  }
}
```

Additionally, Proto `enum` values must fit in an `int32`, whereas Emboss `enum`
values may require up to a `uint64`.
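
A generator would need some check along these lines before emitting a Proto
`enum` value.  The sketch below is illustrative only: `FitsInProtoEnum` is a
hypothetical name, and it assumes the Emboss value is available as an unsigned
64-bit number.

```c++
#include <cstdint>
#include <limits>

// Returns true if an Emboss enum value can be emitted as a Proto enum value,
// which must fit in an int32.
// Assumption: the Emboss value is non-negative and held in a uint64.
bool FitsInProtoEnum(std::uint64_t value) {
  constexpr std::uint64_t kMaxProtoEnum =
      static_cast<std::uint64_t>(std::numeric_limits<std::int32_t>::max());
  return value <= kMaxProtoEnum;
}
```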

Proto2: In Proto2, `enum`s are closed: unknown values are ignored on message
parse, so `enum` fields can never have an unknown value at runtime.  Emboss
`enum`s, much like C `enum`s, can hold unknown values.

Proto3: In Proto3, `enum`s are open, like Emboss `enum`s, but every Proto3
`enum` must have a first entry whose value is `0`.  In order to avoid
compatibility issues, Emboss should emit a well-known name for the `0` value in
every case.  There is a second issue in Proto3: there is no "has" bit for enum
fields, so conditional enum fields have to be wrapped in a `message`.
(TODO(bolms): are Proto3 `enum`s signed, unsigned, or either?)

Thus, for Proto2, `enum`s would produce something like:

```
message SomeEnum {
  enum SomeEnum {
    VALUE1 = 1;
    VALUE2 = 2;
  }
  oneof value_oneof {
    SomeEnum value = 1;
    int64 integer_value = 2;
  }
}
```

which would be included in structures as:

```
message SomeStruct {
  optional SomeEnum some_enum = 1;  // NOT SomeEnum.SomeEnum
}
```

For Proto3, the situation ends up similar:

```
message SomeEnum {
  enum SomeEnum {
    DEFAULT = 0;
    VALUE1 = 1;
    VALUE2 = 2;
  }
  SomeEnum value = 1;
}

message SomeStruct {
  optional SomeEnum some_enum = 1;  // NOT SomeEnum.SomeEnum
}
```


#### `enum` Name Restrictions

Proto enforces a (very slightly) stricter rule for the names of values within
an `enum` than Emboss does: they must not collide *even when translated to
CamelCase*.

For example, Emboss allows:

```
enum Foo:
  BAR_1_1 = 2
  BAR_11 = 11
```

When translated to CamelCase, `BAR_1_1` and `BAR_11` both become `Bar11`, and
thus are not allowed to be part of the same `enum` in Proto.

It may be sufficient to require `.emb` authors to update their `enum`s when
attempting to compile to Proto.
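
If the Emboss compiler were to diagnose such collisions itself, the check could
look roughly like the C++ sketch below.  `ToCamelCase` and
`FindCamelCaseCollisions` are hypothetical names, and the conversion is a
simple approximation of the rule described above, not a copy of what the Proto
compiler does.

```c++
#include <cctype>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Converts SHOUTY_CASE to CamelCase: underscores are dropped and each word
// keeps only its first character in upper case.
std::string ToCamelCase(const std::string& shouty_name) {
  std::string result;
  bool start_of_word = true;
  for (unsigned char c : shouty_name) {
    if (c == '_') {
      start_of_word = true;
      continue;
    }
    result += static_cast<char>(start_of_word ? std::toupper(c)
                                              : std::tolower(c));
    start_of_word = false;
  }
  return result;
}

// Returns (earlier name, later name) for every name whose CamelCase form was
// already produced by an earlier name in the list.
std::vector<std::pair<std::string, std::string>> FindCamelCaseCollisions(
    const std::vector<std::string>& names) {
  std::map<std::string, std::string> first_seen;  // CamelCase -> original
  std::vector<std::pair<std::string, std::string>> collisions;
  for (const std::string& name : names) {
    auto [it, inserted] = first_seen.emplace(ToCamelCase(name), name);
    if (!inserted) collisions.emplace_back(it->second, name);
  }
  return collisions;
}
```

For the `Foo` example, the check would report the pair (`BAR_1_1`, `BAR_11`),
which the compiler could turn into an error message for the `.emb` author.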


### Bookkeeping Fields

Emboss structures often have "bookkeeping" fields that are either irrelevant to
typical Proto consumers, or place unusual restrictions.

For example, fields which are used to calculate the offset of other fields are
generally not useful to Proto consumers:

```
struct Foo:
  0 [+4]  UInt  header_length (h)
  h [+4]  UInt  first_body_message
```

**These fields would still need to be set correctly when translating *from*
Proto to Emboss.**

Some of the pain could likely be mitigated via a [default
values](#default_values.md) feature, when implemented.

Field-length fields are somewhat trickier:

```
struct Foo:
  0 [+4]  UInt      message_length (m)
  4 [+m]  UInt:8[]  message_bytes
```

In Proto, `message_length` becomes an implicit part of `message_bytes`, since
`message_bytes` knows its own length.  For simple cases, as above, we
can likely have the Emboss compiler "just figure it out" and fold
`message_length` into `message_bytes`.  For more complex cases, we will
probably need to have explicit annotations (`[(proto) set_length_by: x =
some_expression]`), or just require applications using the Proto side to set
length fields correctly.
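
For the simple case, "folding" means that the Proto-to-Emboss direction derives
`message_length` from the size of `message_bytes`.  The C++ sketch below
hand-writes that step for the `Foo` struct above; `EncodeFoo` is a hypothetical
name, and little-endian byte order for `message_length` is an assumption made
for the example.

```c++
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Builds the binary image of `Foo` from the Proto-side `message_bytes` value.
// The length field is not supplied by the caller; it is derived from the
// payload.  Assumption: `message_length` is stored little-endian.
std::optional<std::vector<std::uint8_t>> EncodeFoo(
    const std::string& message_bytes) {
  // The payload length must be representable in the 4-byte length field.
  if (message_bytes.size() > 0xFFFFFFFFu) return std::nullopt;
  const auto length = static_cast<std::uint32_t>(message_bytes.size());

  std::vector<std::uint8_t> out;
  out.reserve(4 + message_bytes.size());
  for (int shift = 0; shift < 32; shift += 8) {
    out.push_back(static_cast<std::uint8_t>((length >> shift) & 0xFF));
  }
  out.insert(out.end(), message_bytes.begin(), message_bytes.end());
  return out;
}
```

In the other direction, the translator would simply omit `message_length` from
the Proto and copy `m` bytes into `message_bytes`.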

A similar problem happens with "message type" fields:

```
struct Foo:
  0 [+4]  MessageType  message_type (mt)
  if mt == MessageType.BAR:
    4 [+8]  Bar  bar
  if mt == MessageType.BAZ:
    4 [+16]  Baz  baz
  # ...
```

This will probably be easier to handle with a `union` construct in Emboss.
Again, "complex" cases will probably have to be handled by application code.


## Translation

### Between Emboss View and Proto In-Memory Format

Translation should be relatively straightforward; when going from Emboss to
Proto, the problem is roughly equivalent to serializing a View to text, and for
Proto to Emboss it is roughly equivalent to deserializing a View from text.

One minor difference is that the *deserialization* from Proto must occur in
dependency order, while serialization can happen in any order.  In Emboss text
format, *serialization* happens in dependency order, and deserialization happens
in whatever order is specified in the text.

As with deserialization from text, it is possible for the Proto message to
include untranslatable entries (e.g., an Emboss `Int:16` would be stored in a
Proto `int32`; a too-large value in the Proto `message` should be rejected).
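
The rejection step for the `Int:16` example might look like the following C++
sketch.  `NarrowToEmbossInt` is a hypothetical name, and the sketch only covers
signed fields of up to 32 bits.

```c++
#include <cstdint>
#include <optional>

// Narrows a Proto int32 to a signed Emboss integer that is `bit_width` bits
// wide.  Returns nullopt if the width is outside 1..32 or if `value` cannot be
// represented, in which case the translation should be rejected.
std::optional<std::int32_t> NarrowToEmbossInt(std::int32_t value,
                                              int bit_width) {
  if (bit_width < 1 || bit_width > 32) return std::nullopt;
  const std::int64_t max = (std::int64_t{1} << (bit_width - 1)) - 1;
  const std::int64_t min = -max - 1;
  if (value < min || value > max) return std::nullopt;
  return value;
}
```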


### Between Emboss View and Proto Wire Format

Since the Proto wire format is extremely stable and documented, it would be
possible for Emboss to emit code to directly translate between Emboss structs
and proto wire format.

*Serialization* is relatively straightforward; except for arrays, the code
structure is almost identical to the text serialization code structure.
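
As a rough idea of what generated serialization code would emit for each
integer or `bool` field, the C++ sketch below writes one varint-typed field in
Proto wire format: a tag of `(field_number << 3) | wire_type`, followed by the
value as a base-128 varint.  The function names are hypothetical, and
length-delimited and fixed-width fields are left out.

```c++
#include <cstdint>
#include <vector>

// Appends `value` as a base-128 varint: 7 bits per byte, least-significant
// group first, with the high bit set on every byte except the last.
void AppendVarint(std::uint64_t value, std::vector<std::uint8_t>& out) {
  while (value >= 0x80) {
    out.push_back(static_cast<std::uint8_t>(value | 0x80));
    value >>= 7;
  }
  out.push_back(static_cast<std::uint8_t>(value));
}

// Appends one field with wire type 0 (varint), e.g. a uint32 or bool field.
void AppendVarintField(std::uint32_t field_number, std::uint64_t value,
                       std::vector<std::uint8_t>& out) {
  constexpr std::uint64_t kWireTypeVarint = 0;
  AppendVarint((std::uint64_t{field_number} << 3) | kWireTypeVarint, out);
  AppendVarint(value, out);
}
```

For instance, field number 1 with value 150 serializes to the three bytes
`08 96 01`.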

*Deserialization* is problematic.  First and foremost, Proto does not specify an
order in which the fields of a structure will be serialized, so it is entirely
possible for the Emboss view to see a dependent field before its prerequisite
(e.g., have a variable-offset field before the offset specifier field).
Secondly, Proto repeated fields aren't really "arrays"; on the wire, other
fields can appear *in between* elements of repeated fields.  For Emboss, this
means that every array in the structure would have to maintain a cursor during
deserialization.
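
The per-array cursor can be illustrated with the C++ sketch below.  It works on
already-decoded (field number, value) records rather than raw bytes, and a
`std::vector` stands in for an Emboss array view; `Record`, `ArrayCursor`, and
`FillArrays` are all hypothetical names.

```c++
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// One decoded wire record: (field number, value).  Tag and varint decoding
// are omitted from this sketch.
using Record = std::pair<std::uint32_t, std::uint64_t>;

// Tracks where the next element of one repeated field should be written.
struct ArrayCursor {
  std::vector<std::uint64_t>* destination = nullptr;  // stand-in for a view
  std::size_t next_index = 0;
};

// Writes each record into the array for its field number, in arrival order.
// Elements of different arrays may be interleaved.  Returns false if a field
// receives more elements than its destination can hold.
bool FillArrays(const std::vector<Record>& records,
                std::map<std::uint32_t, ArrayCursor>& cursors) {
  for (const auto& [field_number, value] : records) {
    auto it = cursors.find(field_number);
    if (it == cursors.end()) continue;  // not an array field in this sketch
    ArrayCursor& cursor = it->second;
    if (cursor.next_index >= cursor.destination->size()) return false;
    (*cursor.destination)[cursor.next_index++] = value;
  }
  return true;
}
```

The sketch sidesteps the harder problem: in a real Emboss view the array's
offset or length may depend on a field that has not been seen yet.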

It *may* still be desirable to support serialization without trying to support
deserialization, or to support deserialization for a subset of structures, so
that we can send protos to/from microcontrollers: this would be an alternative
to Nanopb for some cases.


### Between Emboss View and [Nanopb](https://github.com/nanopb/nanopb)

In order to translate between Emboss views and Protos on microcontrollers and
other limited-memory devices, it may make sense to generate Emboss <=> Nanopb
code.  On top of the standard Proto generator, we would have to implement a
Nanopb options file generator, and translation code.


## Miscellaneous Notes

### Overlays

Emboss was designed with the notion that some backends would need their own
attributes -- for example, the `[(cpp) namespace]` attribute, and here there
are a number of `[(proto)]` attributes.

However, adding back-end-specific attributes still requires changes to be made
directly to the `.emb` file, which may be inconvenient for `.emb`s from third
parties.

Ideally, one could write an "overlay file," like:

```
message Foo
  [(proto) attr = value]

  field
    [(proto) field_attr = value]
```

This is not needed for a first pass at a Proto back end, but should be
considered.


### Generating an `.emb` From a `.proto`

There are cases where it would be useful to generate a microcontroller-friendly
representation of an existing Proto, rather than the other way around.

For most `message`s, it would be relatively straightforward to generate a
`struct`, like:

```
message Foo {
  optional int32 bar = 1;
  optional bool baz = 2;
  optional string qux = 3;
}
```

to:

```
struct Foo:
  0          [+4]             bits:
    0 [+1]    Flag  has_bar
    1 [+1]    Flag  has_baz
    if has_baz:
      2 [+1]  Flag  baz
    3 [+1]    Flag  has_qux

  if has_bar:
    4          [+4]           Int:32    bar

  if has_qux:
    8          [+4]           UInt:32   qux_offset
    12         [+4]           UInt:32   qux_length
    qux_offset [+qux_length]  UInt:8[]  qux
```

The main issue is that it would be difficult to maintain equivalent
backwards-compatibility guarantees to the ones that Proto provides as messages
evolve.

Also note that this format is fairly close to the [Cap'n
Proto](https://capnproto.org/) format.