1# Introduction 2 3In hb-subset serialization is the process of writing the subsetted font 4tables out to actual bytes in the final format. All serialization works 5through an object called the serialize context 6([hb_serialize_context_t](https://github.com/harfbuzz/harfbuzz/blob/main/src/hb-serialize.hh)). 7 8Internally the serialize context holds a fixed size memory buffer. For simple 9tables the final bytes are written into the buffer sequentially to produce 10the final serialized bytes. 11 12## Simple Tables 13 14Simple tables are tables that do not use offset graphs. 15 16To write a struct into the serialization context, first you call an 17allocation method on the context which requests a writable array of bytes of 18a fixed size. If the requested array will not exceed the bounds of the fixed 19buffer the serializer will return a pointer to the next unwritten portion 20of the buffer. Then the struct is cast onto the returned pointer and values 21are written to the structs fields. 22 23Internally the serialization context ends up looking like: 24 25``` 26+-------+-------+-----+-------+--------------+ 27| Obj 1 | Obj 2 | ... | Obj N | Unused Space | 28+-------+-------+-----+-------+--------------+ 29``` 30 31Here Obj N, is the object currently being written. 32 33## Complex Tables 34 35Complex tables are made up of graphs of objects, where offset's are used 36to form the edges of the graphs. Each object is a continuous slice of bytes 37that contains zero or more offsets pointing to more objects. 38 39In this case the serialization buffer has a different layout: 40 41``` 42|- in progress objects -| |--- packed objects --| 43+-----------+-----------+--------------+-------+-----+-------+ 44| Obj n+2 | Obj n+1 | Unused Space | Obj n | ... | Obj 0 | 45+-----------+-----------+--------------+-------+-----+-------+ 46|-----------------------> <---------------------| 47``` 48 49The buffer holds two stacks: 50 511. In progress objects are held in a stack starting from the start of buffer 52 that grows towards the end of the buffer. 53 542. Packed objects are held in a stack that starts at the end of the buffer 55 and grows towards the start of the buffer. 56 57Once the object on the top of the in progress stack is finished being written 58its bytes are popped from the in progress stack and copied to the top of 59the packed objects stack. In the example above, finalizing Obj n+1 60would result in the following state: 61 62``` 63+---------+--------------+---------+-------+-----+-------+ 64| Obj n+2 | Unused Space | Obj n+1 | Obj n | ... | Obj 0 | 65+---------+--------------+---------+-------+-----+-------+ 66``` 67 68Each packed object is associated with an ID, it's zero based position in the packed 69objects stack. In this example Obj 0, would have an ID of 0. 70 71During serialization offsets that link from one object to another are stored 72using object ids. The serialize context maintains a list of links between 73objects. Each link records the parent object id, the child object id, the position 74of the offset field within the parent object, and the width of the offset. 75 76Links are always added to the current in progress object and you can only link to 77objects that have been packed and thus have an ID. 78 79### Object De-duplication 80 81An important optimization in packing offset graphs is de-duplicating equivalent objects. If you 82have two or more parent objects that point to child objects that are equivalent then you only need 83to encode the child once and can have the parents point to the same child. This can significantly 84reduce the final size of a serialized graph. 85 86During packing of an inprogress object the serialization context checks if any existing packed 87objects are equivalent to the object being packed. Here equivalence means the object has the 88exact same bytes and all of it's links are equivalent. If an equivalent object is found the 89in progress object is discarded and not copied to the packed object stack. The object id of 90the equivalent object is instead returned. Thus parent objects will then link to the existing 91equivalent object. 92 93To find equivalent objects the serialization context maintains a hashmap from object to the canonical 94object id. 95 96### Link Resolution 97 98Once all objects have been packed the next step is to assign actual values to all of the offset 99fields. Prior to this point all links in the graph have been recorded using object id's. For each 100link the resolver computes the offset between the parent and child and writes the offset into 101the serialization buffer at the appropriate location. 102 103### Offset Overflow Resolution 104 105If during link resolution the resolver finds that an offsets value would exceed what can be encoded 106in that offset field link resolution is aborted and the offset overflow resolver is invoked. 107That process is documented [here](reapcker.md). 108 109 110### Example of Complex Serialization 111 112 113If we wanted to serialize the following graph: 114 115``` 116a--b--d 117 \ / 118 c 119``` 120 121Serializer would be called like this: 122 123```c++ 124hb_serialize_context_t ctx; 125 126struct root { 127 char name; 128 Offset16To<child> child_1; 129 Offset16To<child> child_2; 130} 131 132struct child { 133 char name; 134 Offset16To<char> leaf; 135} 136 137// Object A. 138ctx->push(); 139root* a = ctx->start_embed<root> (); 140ctx->extend_min (a); 141a->name = 'a'; 142 143// Object B. 144ctx->push(); 145child* b = ctx->start_embed<child> (); 146ctx->extend_min (b); 147b->name = 'b'; 148 149// Object D. 150ctx->push(); 151*ctx->allocate_size<char> (1) = 'd'; 152unsigned d_id = ctx->pop_pack (); 153 154ctx->add_link (b->leaf, d_id); 155unsigned b_id = ctx->pop_pack (); 156 157// Object C 158ctx->push(); 159child* c = ctx->start_embed<child> (); 160ctx->extend_min (c); 161c->name = 'c'; 162 163// Object D. 164ctx->push(); 165*ctx->allocate_size<char> (1) = 'd'; 166d_id = ctx->pop_pack (); // Serializer will automatically de-dup this with the previous 'd' 167 168ctx->add_link (c->leaf, d_id); 169unsigned c_id = ctx->pop_pack (); 170 171// Object A's links: 172ctx->add_link (a->child_1, b_id); 173ctx->add_link (a->child_2, c_id); 174ctx->pop_pack (); 175 176ctx->end_serialize (); 177 178``` 179