1////
2Copyright 2011-2016 Beman Dawes
3
4Distributed under the Boost Software License, Version 1.0.
5(http://www.boost.org/LICENSE_1_0.txt)
6////
7
8[#choosing]
9# Choosing between Conversion Functions, Buffer Types, and Arithmetic Types
10:idprefix: choosing_
11
12NOTE: Deciding which is the best endianness approach (conversion functions, buffer
13types, or arithmetic types) for a particular application involves complex
14engineering trade-offs. It is hard to assess those trade-offs without some
15understanding of the different interfaces, so you might want to read the
16<<conversion,conversion functions>>, <<buffers,buffer types>>, and
17<<arithmetic,arithmetic types>> pages before proceeding.
18
The best approach to endianness for a particular application depends on the
interaction between the application's needs and the characteristics of each of
the three approaches.
22
23*Recommendation:* If you are new to endianness, uncertain, or don't want to
24invest the time to study engineering trade-offs, use
25<<arithmetic,endian arithmetic types>>. They are safe, easy to use, and easy to
26maintain. Use the _<<choosing_anticipating_need,anticipating need>>_ design
27pattern locally around performance hot spots like lengthy loops, if needed.
28
29## Background
30
Dealing with endianness usually implies a program portability or a data
portability requirement, and often both. That means real programs dealing with
endianness are usually complex, so the examples shown here would really be
written as multiple functions spread across multiple translation units. They
would involve interfaces that cannot be altered because they are supplied by
third parties or the standard library.
37
38## Characteristics
39
40The characteristics that differentiate the three approaches to endianness are
41the endianness invariants, conversion explicitness, arithmetic operations, sizes
42available, and alignment requirements.
43
44### Endianness invariants
45
*Endian conversion functions* use objects of the ordinary {cpp} arithmetic types
like `int` or `unsigned short` to hold values. That breaks the implicit
invariant that the {cpp} language rules apply. The usual language rules only apply
if the endianness of the object is currently set to the native endianness for
the platform. That can make it very hard to reason about logic flow, and can
result in hard-to-find bugs.
52
53For example:
54
```
struct data_t  // big endian
{
  int32_t   v1;  // description ...
  int32_t   v2;  // description ...
  ... additional character data members (i.e. non-endian)
  int32_t   v3;  // description ...
};

data_t data;

read(data);
big_to_native_inplace(data.v1);  // v1 and v2 now hold native values
big_to_native_inplace(data.v2);  // v3 is left in big endian order

...

++data.v1;
third_party::func(data.v2);

...

native_to_big_inplace(data.v1);  // restore big endian order before writing
native_to_big_inplace(data.v2);
write(data);
```
81
82The programmer didn't bother to convert `data.v3` to native endianness because
83that member isn't used. A later maintainer needs to pass `data.v3` to the
84third-party function, so adds `third_party::func(data.v3);` somewhere deep in
85the code. This causes a silent failure because the usual invariant that an
86object of type `int32_t` holds a value as described by the {cpp} core language
87does not apply.
88
89*Endian buffer and arithmetic types* hold values internally as arrays of
90characters with an invariant that the endianness of the array never changes.
91That makes these types easier to use and programs easier to maintain.
92
93Here is the same example, using an endian arithmetic type:
94
```
struct data_t
{
  big_int32_t   v1;  // description ...
  big_int32_t   v2;  // description ...
  ... additional character data members (i.e. non-endian)
  big_int32_t   v3;  // description ...
};

data_t data;

read(data);

...

++data.v1;
third_party::func(data.v2);

...

write(data);
```
117
A later maintainer can add `third_party::func(data.v3)` and it will just work.
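
The invariant can be illustrated with a minimal sketch that uses only the
standard library. The class `big_int32_sketch` and the function
`third_party_func` are hypothetical stand-ins invented for this illustration;
they are not Boost.Endian's implementation. The point is that the stored bytes
are always big endian, and every read or write goes through a conversion, so no
code path can observe a value in the wrong byte order.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an endian arithmetic type (not Boost's implementation).
class big_int32_sketch
{
  unsigned char bytes_[4];  // invariant: most significant byte first, always

public:
  big_int32_sketch(std::int32_t v = 0) { *this = v; }

  // Native to big endian: happens on every store.
  big_int32_sketch& operator=(std::int32_t v)
  {
    const std::uint32_t u = static_cast<std::uint32_t>(v);
    bytes_[0] = static_cast<unsigned char>(u >> 24);
    bytes_[1] = static_cast<unsigned char>(u >> 16);
    bytes_[2] = static_cast<unsigned char>(u >> 8);
    bytes_[3] = static_cast<unsigned char>(u);
    return *this;
  }

  // Big endian to native: this is the implicit conversion.
  operator std::int32_t() const
  {
    const std::uint32_t u = (std::uint32_t(bytes_[0]) << 24)
                          | (std::uint32_t(bytes_[1]) << 16)
                          | (std::uint32_t(bytes_[2]) << 8)
                          |  std::uint32_t(bytes_[3]);
    return static_cast<std::int32_t>(u);
  }

  big_int32_sketch& operator++()
  {
    return *this = static_cast<std::int32_t>(*this) + 1;
  }

  const unsigned char* data() const { return bytes_; }
};

// Hypothetical third-party function that expects a native int32_t.
inline std::int32_t third_party_func(std::int32_t x) { return x * 2; }

inline void usage_example()
{
  big_int32_sketch v3(41);
  ++v3;                                // convert, increment, convert back
  assert(third_party_func(v3) == 84);  // implicit conversion to native
  assert(v3.data()[3] == 42);          // stored bytes are still big endian
}
```

Because the conversion is part of the type, a maintainer who passes a new member
to a function cannot forget it.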
119
120### Conversion explicitness
121
122*Endian conversion functions* and *buffer types* never perform implicit
123conversions. This gives users explicit control of when conversion occurs, and
124may help avoid unnecessary conversions.
125
126*Endian arithmetic types* perform conversion implicitly. That makes these types
127very easy to use, but can result in unnecessary conversions. Failure to hoist
128conversions out of inner loops can bring a performance penalty.
129
130### Arithmetic operations
131
132*Endian conversion functions* do not supply arithmetic operations, but this is
133not a concern since this approach uses ordinary {cpp} arithmetic types to hold
134values.
135
136*Endian buffer types* do not supply arithmetic operations. Although this
137approach avoids unnecessary conversions, it can result in the introduction of
138additional variables and confuse maintenance programmers.
139
140*Endian arithmetic types* do supply arithmetic operations. They are very easy to
141use if lots of arithmetic is involved.
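
The extra variable that the buffer approach tends to introduce can be seen in a
small sketch. The class `big_int32_buf_sketch` and its `store` member are
hypothetical names used only for illustration, written with the standard
library alone; Boost's buffer types expose a `value()` accessor, but this is not
their implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an endian buffer type (not Boost's implementation).
class big_int32_buf_sketch
{
  unsigned char bytes_[4];  // most significant byte first

public:
  explicit big_int32_buf_sketch(std::int32_t v = 0) { store(v); }

  // Explicit conversion from big endian storage to a native value.
  std::int32_t value() const
  {
    const std::uint32_t u = (std::uint32_t(bytes_[0]) << 24)
                          | (std::uint32_t(bytes_[1]) << 16)
                          | (std::uint32_t(bytes_[2]) << 8)
                          |  std::uint32_t(bytes_[3]);
    return static_cast<std::int32_t>(u);
  }

  // Explicit conversion from a native value to big endian storage.
  void store(std::int32_t v)
  {
    const std::uint32_t u = static_cast<std::uint32_t>(v);
    bytes_[0] = static_cast<unsigned char>(u >> 24);
    bytes_[1] = static_cast<unsigned char>(u >> 16);
    bytes_[2] = static_cast<unsigned char>(u >> 8);
    bytes_[3] = static_cast<unsigned char>(u);
  }
};

// The buffer has no operator++, so arithmetic needs a native temporary.
inline void increment(big_int32_buf_sketch& b)
{
  std::int32_t tmp = b.value();  // one visible conversion in
  ++tmp;                         // ordinary arithmetic on a native int
  b.store(tmp);                  // one visible conversion out
}

inline void usage_example()
{
  big_int32_buf_sketch b(41);
  increment(b);
  assert(b.value() == 42);
}
```

Every conversion is visible at the call site, which is the benefit; the
temporary `tmp` is the cost.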
142
143### Sizes
144
*Endian conversion functions* only support 1, 2, 4, and 8 byte integers.
That's sufficient for many applications.
147
148*Endian buffer and arithmetic types* support 1, 2, 3, 4, 5, 6, 7, and 8 byte
149integers. For an application where memory use or I/O speed is the limiting
150factor, using sizes tailored to application needs can be useful.
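
As an illustration of an odd size, a 3 byte big endian unsigned integer can be
read and written one byte at a time. The helpers `store_big_u24` and
`load_big_u24` below are hypothetical names for this sketch and use only the
standard library; with Boost.Endian, a type such as `big_uint24_t` would play
this role.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: write the low 24 bits of v, most significant byte first.
inline void store_big_u24(unsigned char* p, std::uint32_t v)
{
  p[0] = static_cast<unsigned char>(v >> 16);
  p[1] = static_cast<unsigned char>(v >> 8);
  p[2] = static_cast<unsigned char>(v);
}

// Hypothetical helper: read 3 big endian bytes into a native 32-bit value.
inline std::uint32_t load_big_u24(const unsigned char* p)
{
  return (std::uint32_t(p[0]) << 16)
       | (std::uint32_t(p[1]) << 8)
       |  std::uint32_t(p[2]);
}

inline void usage_example()
{
  unsigned char field[3];            // occupies 3 bytes in a record, not 4
  store_big_u24(field, 0x123456u);
  assert(field[0] == 0x12);          // most significant byte comes first
  assert(load_big_u24(field) == 0x123456u);
}
```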
151
152### Alignments
153
*Endian conversion functions* only support aligned integer and
floating-point types. That's sufficient for most applications.
156
*Endian buffer and arithmetic types* support both aligned and unaligned
integer and floating-point types. Unaligned types are rarely needed, but when
needed they are often very useful and workarounds are painful. For example,
non-portable code like this:

```
struct S {
  uint16_t a; // big endian
  uint32_t b; // big endian
} __attribute__ ((packed));
```

can be replaced with portable code like this:

```
struct S {
  big_uint16_t a;  // unaligned big endian
  big_uint32_t b;  // unaligned big endian
};
```
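
Why no packing attribute is needed can be sketched with plain byte arrays. The
struct names below are hypothetical and only the standard library is used. A
type whose only data member is an array of `unsigned char` has an alignment of
1, so a struct built from such types is laid out without padding on mainstream
ABIs (the {cpp} standard itself does not strictly promise the absence of
padding, hence the checks).

```cpp
#include <cstdint>

// Hypothetical byte-array members standing in for unaligned endian types.
struct u16_bytes_sketch { unsigned char b[2]; };
struct u32_bytes_sketch { unsigned char b[4]; };

struct S_sketch {
  u16_bytes_sketch a;  // offset 0
  u32_bytes_sketch b;  // offset 2, no padding needed
};

// The same fields as native integers, for comparison; typically padded.
struct S_native {
  std::uint16_t a;
  std::uint32_t b;
};

// True on mainstream ABIs; compilation fails loudly if a platform differs.
static_assert(alignof(S_sketch) == 1, "byte arrays impose no alignment");
static_assert(sizeof(S_sketch) == 6, "layout matches the 6 byte wire format");
static_assert(sizeof(S_native) >= sizeof(S_sketch), "native layout is no smaller");
```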
178
179## Design patterns
180
181Applications often traffic in endian data as records or packets containing
182multiple endian data elements. For simplicity, we will just call them records.
183
If the desired endianness differs from the native endianness, a conversion has
to be performed. When should that conversion occur? Three design patterns have
evolved.
187
188### Convert only as needed (i.e. lazy)
189
190This pattern defers conversion to the point in the code where the data
191element is actually used.
192
This pattern is appropriate when the endian elements that are actually used
vary greatly according to record content or other circumstances.
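
A minimal sketch of the lazy pattern, using only the standard library: a
record carries a tag that decides which field matters, and only that field is
converted. The record layout, the `kind` tag, and the function names are
assumptions made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Read 4 big endian bytes into a native value.
inline std::uint32_t load_big_u32(const unsigned char* p)
{
  return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16)
       | (std::uint32_t(p[2]) << 8)  |  std::uint32_t(p[3]);
}

// Hypothetical wire record: a tag byte followed by two big endian fields.
struct wire_record
{
  unsigned char kind;  // 0: only `a` is meaningful, otherwise only `b`
  unsigned char a[4];
  unsigned char b[4];
};

// Convert at the point of use; the unused field is never converted.
inline std::uint32_t payload(const wire_record& r)
{
  return r.kind == 0 ? load_big_u32(r.a) : load_big_u32(r.b);
}

inline void usage_example()
{
  wire_record r = { 1, {0, 0, 0, 1}, {0, 0, 1, 0} };
  assert(payload(r) == 256u);  // kind 1 selects b, which is 0x00000100
}
```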
195
196[#choosing_anticipating_need]
197### Convert in anticipation of need
198
199This pattern performs conversion to native endianness in anticipation of use,
200such as immediately after reading records. If needed, conversion to the output
201endianness is performed after all possible needs have passed, such as just
202before writing records.
203
One implementation of this pattern is to create a proxy record with endianness
converted to native in a read function, and expose only that proxy to the rest
of the implementation. A write function, if needed, handles the conversion
from native to the desired output endianness.
208
209This pattern is appropriate when all endian elements in a record are typically
210used regardless of record content or other circumstances.
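
The proxy record idea can be sketched as follows, using only the standard
library. The record layouts and the names `to_native` and `to_wire` are
hypothetical; a real read function would also perform the I/O.

```cpp
#include <cassert>
#include <cstdint>

inline std::uint32_t load_big_u32(const unsigned char* p)
{
  return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16)
       | (std::uint32_t(p[2]) << 8)  |  std::uint32_t(p[3]);
}

inline void store_big_u32(unsigned char* p, std::uint32_t v)
{
  p[0] = static_cast<unsigned char>(v >> 24);
  p[1] = static_cast<unsigned char>(v >> 16);
  p[2] = static_cast<unsigned char>(v >> 8);
  p[3] = static_cast<unsigned char>(v);
}

// Hypothetical external layout: every field is 4 big endian bytes.
struct wire_record   { unsigned char v1[4]; unsigned char v2[4]; };

// Proxy exposed to the rest of the program: ordinary native integers.
struct native_record { std::uint32_t v1; std::uint32_t v2; };

// Called right after reading: convert every field up front.
inline native_record to_native(const wire_record& w)
{
  native_record n = { load_big_u32(w.v1), load_big_u32(w.v2) };
  return n;
}

// Called just before writing: convert every field back.
inline wire_record to_wire(const native_record& n)
{
  wire_record w;
  store_big_u32(w.v1, n.v1);
  store_big_u32(w.v2, n.v2);
  return w;
}

inline void usage_example()
{
  wire_record in = { {0, 0, 0, 7}, {0x12, 0x34, 0x56, 0x78} };
  native_record rec = to_native(in);
  ++rec.v1;                        // plain native arithmetic, no conversions
  wire_record out = to_wire(rec);
  assert(out.v1[3] == 8);
}
```

Code between `to_native` and `to_wire` never sees big endian data, at the price
of converting fields that may turn out not to be used.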
211
212### Convert only as needed, except locally in anticipation of need
213
214This pattern in general defers conversion but for specific local needs does
215anticipatory conversion. Although particularly appropriate when coupled with the
216endian buffer or arithmetic types, it also works well with the conversion
217functions.
218
219Example:
220
221[subs=+quotes]
```
struct data_t
{
  big_int32_t   v1;
  big_int32_t   v2;
  big_int32_t   v3;
};

data_t data;

read(data);

...
++data.v1;
...

int32_t v3_temp = data.v3;  // hoist conversion out of loop

for (int32_t i = 0; i < `large-number`; ++i)
{
  ... `lengthy computation that accesses v3_temp` ...
}
data.v3 = v3_temp;

write(data);
```
248
249In general the above pseudo-code leaves conversion up to the endian arithmetic
250type `big_int32_t`. But to avoid conversion inside the loop, a temporary is
251created before the loop is entered, and then used to set the new value of
252`data.v3` after the loop is complete.
253
254Question: Won't the compiler's optimizer hoist the conversion out of the loop
255anyhow?
256
Answer: Visual {cpp} 2015 Preview, and probably other compilers, does not, even
for a toy test program. Although the savings are small (two register `bswap`
instructions), the cost might be significant if the loop is repeated enough
times. On the other hand, the program may be so dominated by I/O time that even
a lengthy loop will be immaterial.
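
The effect of hoisting can be made visible with an instrumented sketch that
counts logical conversions. The type `counting_big_int32` is a hypothetical
stand-in written for this illustration; it stores a native value and merely
counts how often the implicit conversion runs, so it says nothing about the
machine code a particular compiler generates.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical instrumented stand-in for an endian arithmetic type.
struct counting_big_int32
{
  std::int32_t native;  // a real endian type would store reordered bytes
  mutable long loads;   // number of implicit conversions performed

  operator std::int32_t() const { ++loads; return native; }
};

// Conversion left inside the loop: one conversion per iteration.
inline std::int64_t sum_naive(const counting_big_int32& v, int n)
{
  std::int64_t sum = 0;
  for (int i = 0; i < n; ++i)
    sum += v;
  return sum;
}

// Conversion hoisted into a native temporary: one conversion in total.
inline std::int64_t sum_hoisted(const counting_big_int32& v, int n)
{
  const std::int32_t temp = v;
  std::int64_t sum = 0;
  for (int i = 0; i < n; ++i)
    sum += temp;
  return sum;
}

inline void usage_example()
{
  counting_big_int32 a = { 3, 0 };
  counting_big_int32 b = { 3, 0 };
  assert(sum_naive(a, 1000) == sum_hoisted(b, 1000));  // same result
  assert(a.loads == 1000 && b.loads == 1);             // different cost
}
```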
262
263## Use case examples
264
265### Porting endian unaware codebase
266
An existing codebase runs on big endian systems. It does not currently deal
with endianness. The codebase needs to be modified so it can run on little
endian systems under various operating systems. To ease the transition and
protect the value of existing files, external data will continue to be
maintained as big endian.
272
273The <<arithmetic,endian arithmetic approach>> is recommended to meet these
274needs. A relatively small number of header files dealing with binary I/O layouts
275need to change types. For example, `short` or `int16_t` would change to
276`big_int16_t`. No changes are required for `.cpp` files.
277
278### Porting endian aware codebase
279
An existing codebase runs on little-endian Linux systems. It already deals with
endianness via
http://man7.org/linux/man-pages/man3/endian.3.html[Linux provided functions].
Because of a business merger, the codebase has to be quickly modified for
Windows and possibly other operating systems, while still supporting Linux. The
codebase is reliable and the programmers are all well aware of endian issues.
286
287These factors all argue for an <<conversion, endian conversion approach>> that
288just mechanically changes the calls to `htobe32`, etc. to
289`boost::endian::native_to_big`, etc. and replaces `<endian.h>` with
290`<boost/endian/conversion.hpp>`.
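
What such a call is expected to do can be sketched without either header. The
function `native_to_big_sketch` below is a hypothetical, standard-library-only
stand-in for the 32-bit case: it returns a value whose bytes in memory are in
big endian order, whatever the host byte order. It is not how Boost or the
Linux functions are implemented; it only shows the behavior that a mechanical
replacement should preserve.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a 32-bit native-to-big conversion.
inline std::uint32_t native_to_big_sketch(std::uint32_t x)
{
  const unsigned char bytes[4] = {
    static_cast<unsigned char>(x >> 24),
    static_cast<unsigned char>(x >> 16),
    static_cast<unsigned char>(x >> 8),
    static_cast<unsigned char>(x)
  };
  std::uint32_t result;
  std::memcpy(&result, bytes, sizeof result);  // in-memory order is now big endian
  return result;
}

inline void usage_example()
{
  const std::uint32_t wire = native_to_big_sketch(0x01020304u);
  unsigned char out[4];
  std::memcpy(out, &wire, sizeof out);
  assert(out[0] == 0x01 && out[3] == 0x04);  // holds on any host byte order
}
```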
291
292### Reliability and arithmetic-speed
293
A new, complex, multi-threaded application is to be developed that must run
on little endian machines, but do big endian network I/O. The developers believe
computational speed for endian variables is critical but have seen numerous bugs
result from inability to reason about endian conversion state. They are also
worried that future maintenance changes could inadvertently introduce a lot of
slow conversions if full-blown endian arithmetic types are used.
300
301The <<buffers,endian buffers>> approach is made-to-order for this use case.
302
303### Reliability and ease-of-use
304
305A new, complex, multi-threaded application is to be developed that must run on
306little endian machines, but do big endian network I/O. The developers believe
307computational speed for endian variables is *not critical* but have seen
308numerous bugs result from inability to reason about endian conversion state.
309They are also concerned about ease-of-use both during development and long-term
310maintenance.
311
Removing concern about conversion speed and adding concern about ease-of-use
tips the balance strongly in favor of the
<<arithmetic,endian arithmetic approach>>.
315