xref: /aosp_15_r20/external/harfbuzz_ng/docs/serializer.md (revision 2d1272b857b1f7575e6e246373e1cb218663db8a)
1# Introduction
2
3In hb-subset serialization is the process of writing the subsetted font
4tables out to actual bytes in the final format. All serialization works
5through an object called the serialize context
6([hb_serialize_context_t](https://github.com/harfbuzz/harfbuzz/blob/main/src/hb-serialize.hh)).
7
8Internally the serialize context holds a fixed size memory buffer. For simple
9tables the final bytes are written into the buffer sequentially to produce
10the final serialized bytes.
11
12## Simple Tables
13
14Simple tables are tables that do not use offset graphs.
15
16To write a struct into the serialization context, first you call an
17allocation method on the context which requests a writable array of bytes of
18a fixed size. If the requested array will not exceed the bounds of the fixed
19buffer the serializer will return a pointer to the next unwritten portion
20of the buffer. Then the struct is cast onto the returned pointer and values
21are written to the structs fields.
22
23Internally the serialization context ends up looking like:
24
25```
26+-------+-------+-----+-------+--------------+
27| Obj 1 | Obj 2 | ... | Obj N | Unused Space |
28+-------+-------+-----+-------+--------------+
29```
30
31Here Obj N, is the object currently being written.
32
33## Complex Tables
34
35Complex tables are made up of graphs of objects, where offset's are used
36to form the edges of the graphs. Each object is a continuous slice of bytes
37that contains zero or more offsets pointing to more objects.
38
39In this case the serialization buffer has a different layout:
40
41```
42|- in progress objects -|              |--- packed objects --|
43+-----------+-----------+--------------+-------+-----+-------+
44|  Obj n+2  |  Obj n+1  | Unused Space | Obj n | ... | Obj 0 |
45+-----------+-----------+--------------+-------+-----+-------+
46|----------------------->              <---------------------|
47```
48
49The buffer holds two stacks:
50
511. In progress objects are held in a stack starting from the start of buffer
52   that grows towards the end of the buffer.
53
542. Packed objects are held in a stack that starts at the end of the buffer
55   and grows towards the start of the buffer.
56
57Once the object on the top of the in progress stack is finished being written
58its bytes are popped from the in progress stack and copied to the top of
59the packed objects stack. In the example above, finalizing Obj n+1
60would result in the following state:
61
62```
63+---------+--------------+---------+-------+-----+-------+
64| Obj n+2 | Unused Space | Obj n+1 | Obj n | ... | Obj 0 |
65+---------+--------------+---------+-------+-----+-------+
66```
67
68Each packed object is associated with an ID, it's zero based position in the packed
69objects stack. In this example Obj 0, would have an ID of 0.
70
71During serialization offsets that link from one object to another are stored
72using object ids. The serialize context maintains a list of links between
73objects. Each link records the parent object id, the child object id, the position
74of the offset field within the parent object, and the width of the offset.
75
76Links are always added to the current in progress object and you can only link to
77objects that have been packed and thus have an ID.
78
79### Object De-duplication
80
81An important optimization in packing offset graphs is de-duplicating equivalent objects. If you
82have two or more parent objects that point to child objects that are equivalent then you only need
83to encode the child once and can have the parents point to the same child. This can significantly
84reduce the final size of a serialized graph.
85
86During packing of an inprogress object the serialization context checks if any existing packed
87objects are equivalent to the object being packed. Here equivalence means the object has the
88exact same bytes and all of it's links are equivalent. If an equivalent object is found the
89in progress object is discarded and not copied to the packed object stack. The object id of
90the equivalent object is instead returned. Thus parent objects will then link to the existing
91equivalent object.
92
93To find equivalent objects the serialization context maintains a hashmap from object to the canonical
94object id.
95
96### Link Resolution
97
98Once all objects have been packed the next step is to assign actual values to all of the offset
99fields. Prior to this point all links in the graph have been recorded using object id's. For each
100link the resolver computes the offset between the parent and child and writes the offset into
101the serialization buffer at the appropriate location.
102
103### Offset Overflow Resolution
104
105If during link resolution the resolver finds that an offsets value would exceed what can be encoded
106in that offset field link resolution is aborted and the offset overflow resolver is invoked.
107That process is documented [here](reapcker.md).
108
109
110### Example of Complex Serialization
111
112
113If we wanted to serialize the following graph:
114
115```
116a--b--d
117 \   /
118   c
119```
120
121Serializer would be called like this:
122
123```c++
124hb_serialize_context_t ctx;
125
126struct root {
127  char name;
128  Offset16To<child> child_1;
129  Offset16To<child> child_2;
130}
131
132struct child {
133  char name;
134  Offset16To<char> leaf;
135}
136
137// Object A.
138ctx->push();
139root* a = ctx->start_embed<root> ();
140ctx->extend_min (a);
141a->name = 'a';
142
143// Object B.
144ctx->push();
145child* b = ctx->start_embed<child> ();
146ctx->extend_min (b);
147b->name = 'b';
148
149// Object D.
150ctx->push();
151*ctx->allocate_size<char> (1) = 'd';
152unsigned d_id = ctx->pop_pack ();
153
154ctx->add_link (b->leaf, d_id);
155unsigned b_id = ctx->pop_pack ();
156
157// Object C
158ctx->push();
159child* c = ctx->start_embed<child> ();
160ctx->extend_min (c);
161c->name = 'c';
162
163// Object D.
164ctx->push();
165*ctx->allocate_size<char> (1) = 'd';
166d_id = ctx->pop_pack (); // Serializer will automatically de-dup this with the previous 'd'
167
168ctx->add_link (c->leaf, d_id);
169unsigned c_id = ctx->pop_pack ();
170
171// Object A's links:
172ctx->add_link (a->child_1, b_id);
173ctx->add_link (a->child_2, c_id);
174ctx->pop_pack ();
175
176ctx->end_serialize ();
177
178```
179