xref: /aosp_15_r20/external/bc/manuals/development.md (revision 5a6e848804d15c18a0125914844ee4eb0bda4fcf)
1# Development
2
3Updated: 22 Dec 2023
4
5This document is meant for the day when I (Gavin D. Howard) get [hit by a
6bus][1]. In other words, it's meant to make the [bus factor][1] a non-issue.
7
8This document is supposed to contain all of the knowledge necessary to develop
9`bc` and `dc`.
10
11In addition, this document is meant to add to the [oral tradition of software
12engineering][118], as described by Bryan Cantrill.
13
14This document will reference other parts of the repository. That is so a lot of
15the documentation can be closest to the part of the repo where it is actually
16necessary.
17
18## What Is It?
19
20This repository contains an implementation of both [POSIX `bc`][2] and [Unix
21`dc`][3].
22
23POSIX `bc` is a standard utility required for POSIX systems. `dc` is a
24historical utility that was included in early Unix and even predates both Unix
25and C. They both are arbitrary-precision command-line calculators with their own
26programming languages. `bc`'s language looks similar to C, with infix notation
27and including functions, while `dc` uses [Reverse Polish Notation][4] and allows
28the user to execute strings as though they were functions.
29
30In addition, it is also possible to build the arbitrary-precision math as a
31library, named [`bcl`][156].
32
33**Note**: for ease, I will refer to both programs as `bc` in this document.
34However, if I say "just `bc`," I am referring to just `bc`, and if I say `dc`, I
35am referring to just `dc`.
36
37### History
38
39This project started in January 2018 when a certain individual on IRC, hearing
40that I knew how to write parsers, asked me to write a `bc` parser for his math
41library. I did so. I thought about writing my own math library, but he
42disparaged my programming skills and made me think that I couldn't do it.
43
44However, he took so long to do it that I eventually decided to give it a try and
45had a working math portion in two weeks. It taught me that I should not listen
46to such people.
47
48From that point, I decided to make it an extreme learning experience about how
49to write quality software.
50
51That individual's main goal had been to get his `bc` into [toybox][16], and I
52managed to get my own `bc` in. I also got it in [busybox][17].
53
54Eventually, in late 2018, I also decided to try my hand at implementing
55[Karatsuba multiplication][18], an algorithm that that unnamed individual
56claimed I could never implement. It took me a bit, but I did it.
57
58This project became a passion project for me, and I continued. In mid-2019,
59Stefan Eßer suggested I improve performance by putting more than 1 digit in each
60section of the numbers. After I showed immaturity because of some burnout, I
61implemented his suggestion, and the results were incredible.
62
63Since that time, I have gradually been improving the `bc` as I have learned more
64about things like fuzzing, [`scan-build`][19], [valgrind][20],
65[AddressSanitizer][21] (and the other sanitizers), and many other things.
66
67One of my happiest moments was when my `bc` was made the default in FreeBSD.
68Another happiest moment was when I found out that my `bc` had shipped with macOS
69Ventura, without my knowledge.
70
71But since I believe in [finishing the software I write][22], I have done less
72work on `bc` over time, though there are still times when I put a lot of effort
73in, such as now (17 June 2021), when I am attempting to convince OpenBSD to use
74my `bc`.
75
76And that is why I am writing this document: someday, someone else is going to
77want to change my code, and this document is my attempt to make it as simple as
78possible.
79
80### Values
81
82[According to Bryan Cantrill][10], all software has values. I think he's
83correct, though I [added one value for programming languages in particular][11].
84
85However, for `bc`, his original list will do:
86
87* Approachability
88* Availability
89* Compatibility
90* Composability
91* Debuggability
92* Expressiveness
93* Extensibility
94* Interoperability
95* Integrity
96* Maintainability
97* Measurability
98* Operability
99* Performance
100* Portability
101* Resiliency
102* Rigor
103* Robustness
104* Safety
105* Security
106* Simplicity
107* Stability
108* Thoroughness
109* Transparency
110* Velocity
111
112There are several values that don't apply. The reason they don't apply is
113because `bc` and `dc` are existing utilities; this is just another
114reimplementation. The designs of `bc` and `dc` are set in stone; there is
115nothing we can do to change them, so let's get rid of those values that would
116apply to their design:
117
118* Compatibility
119* Integrity
120* Maintainability
121* Measurability
122* Performance
123* Portability
124* Resiliency
125* Rigor
126* Robustness
127* Safety
128* Security
129* Simplicity
130* Stability
131* Thoroughness
132* Transparency
133
134Furthermore, some of the remaining ones don't matter to me, so let me get rid of
135those and order the rest according to my *actual* values for this project:
136
137* Robustness
138* Stability
139* Portability
140* Compatibility
141* Performance
142* Security
143* Simplicity
144
145First is **robustness**. This `bc` and `dc` should be robust, accepting any
146input, never crashing, and instead, returning an error.
147
148Closely related to that is **stability**. The execution of `bc` and `dc` should
149be deterministic and never change for the same inputs, including the
150pseudo-random number generator (for the same seed).
151
152Third is **portability**. These programs should run everywhere that POSIX
153exists, as well as Windows. This means that just about every person on the
154planet will have access to these programs.
155
156Next is **compatibility**. These programs should, as much as possible, be
157compatible with other existing implementations and standards.
158
159Then we come to **performance**. A calculator is only usable if it's fast, so
160these programs should run as fast as possible.
161
162After that is **security**. These programs should *never* be the reason a user's
163computer is compromised.
164
165And finally, **simplicity**. Where possible, the code should be simple, while
166deferring to the above values.
167
168Keep these values in mind for the rest of this document, and for exploring any
169other part of this repo.
170
171#### Portability
172
173But before I go on, I want to talk about portability in particular.
174
175Most of these principles just require good attention and care, but portability
176is different. Sometimes, it requires pulling in code from other places and
177adapting it. In other words, sometimes I need to duplicate and adapt code.
178
179This happened in a few cases:
180
181* Option parsing (see [`include/opt.h`][35]).
182* History (see [`include/history.h`][36]).
183* Pseudo-Random Number Generator (see [`include/rand.h`][37]).
184
185This was done because I decided to ensure that `bc`'s dependencies were
186basically zero. In particular, either users have a normal install of Windows or
187they have a POSIX system.
188
189A POSIX system limited me to C99, `sh`, and zero external dependencies. That
190last item is why I pull code into `bc`: if I pull it in, it's not an external
191dependency.
192
193That's why `bc` has duplicated code. Remove it, and you risk `bc` not being
194portable to some platforms.
195
196## Suggested Course
197
198I do have a suggested course for programmers to follow when trying to understand
199this codebase. The order is this:
200
2011.	`bc` Spec.
2022.	Manpages.
2033.	Test suite.
2044.	Understand the build.
2055.	Algorithms manual.
2066.	Code concepts.
2077.	Repo structure.
2088.	Headers.
2099.	Source code.
210
211This order roughly follows this order:
212
2131. High-level requirements
2142. Low-level requirements
2153. High-level implementation
2164. Low-level implementation
217
218In other words, first understand what the code is *supposed* to do, then
219understand the code itself.
220
221## Useful External Tools
222
223I have a few tools external to `bc` that are useful:
224
225* A [Vim plugin with syntax files made specifically for my `bc` and `dc`][132].
226* A [repo of `bc` and `dc` scripts][133].
227* A set of `bash` aliases (see below).
228* A `.bcrc` file with items useful for my `bash` setup (see below).
229
230My `bash` aliases are these:
231
232```sh
233alias makej='make -j16'
234alias mcmake='make clean && make'
235alias mcmakej='make clean && make -j16'
236alias bcdebug='CPPFLAGS="-DBC_DEBUG_CODE=1" CFLAGS="-Weverything -Wno-padded \
237    -Wno-switch-enum -Wno-format-nonliteral -Wno-cast-align \
238    -Wno-unreachable-code-return -Wno-missing-noreturn \
239    -Wno-disabled-macro-expansion -Wno-unreachable-code -Wall -Wextra \
240    -pedantic -std=c99" ./configure.sh'
241alias bcconfig='CFLAGS="-Weverything -Wno-padded -Wno-switch-enum \
242    -Wno-format-nonliteral -Wno-cast-align -Wno-unreachable-code-return \
243    -Wno-missing-noreturn -Wno-disabled-macro-expansion -Wno-unreachable-code \
244    -Wall -Wextra -pedantic -std=c99" ./configure.sh'
245alias bcnoassert='CPPFLAGS="-DNDEBUG" CFLAGS="-Weverything -Wno-padded \
246    -Wno-switch-enum -Wno-format-nonliteral -Wno-cast-align \
247    -Wno-unreachable-code-return -Wno-missing-noreturn \
248    -Wno-disabled-macro-expansion -Wno-unreachable-code -Wall -Wextra \
249    -pedantic -std=c99" ./configure.sh'
250alias bcdebugnoassert='CPPFLAGS="-DNDEBUG -DBC_DEBUG_CODE=1" \
251    CFLAGS="-Weverything -Wno-padded -Wno-switch-enum -Wno-format-nonliteral \
252    -Wno-cast-align -Wno-unreachable-code-return -Wno-missing-noreturn \
253    -Wno-disabled-macro-expansion -Wno-unreachable-code -Wall -Wextra \
254    -pedantic -std=c99" ./configure.sh'
255alias bcunset='unset BC_LINE_LENGTH && unset BC_ENV_ARGS'
256```
257
258`makej` runs `make` with all of my cores.
259
260`mcmake` runs `make clean` before running `make`. It will take a target on the
261command-line.
262
263`mcmakej` is a combination of `makej` and `mcmake`.
264
265`bcdebug` configures `bc` for a full debug build, including `BC_DEBUG_CODE` (see
266[Debugging][134] below).
267
268`bcconfig` configures `bc` with Clang (Clang is my personal default compiler)
269using full warnings, with a few really loud and useless warnings turned off.
270
271`bcnoassert` configures `bc` to not have asserts built in.
272
273`bcdebugnoassert` is like `bcnoassert`, except it also configures `bc` for debug
274mode.
275
276`bcunset` unsets my personal `bc` environment variables, which are set to:
277
278```sh
279export BC_ENV_ARGS="-l $HOME/.bcrc"
280export BC_LINE_LENGTH="74"
281```
282
283Unsetting these environment variables are necessary for running
284[`scripts/release.sh`][83] because otherwise, it will error when attempting to
285run `bc -s` on my `$HOME/.bcrc`.
286
287Speaking of which, the contents of that file are:
288
289```bc
290define void print_time_unit(t){
291	if(t<10)print "0"
292	if(t<1&&t)print "0"
293	print t,":"
294}
295define void sec2time(t){
296	auto s,m,h,d,r
297	r=scale
298	scale=0
299	t=abs(t)
300	s=t%60
301	t-=s
302	m=t/60%60
303	t-=m
304	h=t/3600%24
305	t-=h
306	d=t/86400
307	if(d)print_time_unit(d)
308	if(h)print_time_unit(h)
309	print_time_unit(m)
310	if(s<10)print "0"
311	if(s<1&&s)print "0"
312	s
313	scale=r
314}
315define minutes(secs){
316	return secs/60;
317}
318define hours(secs){
319	return secs/3600;
320}
321define days(secs){
322	return secs/3600/24;
323}
324define years(secs){
325	return secs/3600/24/365.25;
326}
327define fbrand(b,p){
328	auto l,s,t
329	b=abs(b)$
330	if(b<2)b=2
331	s=scale
332	t=b^abs(p)$
333	l=ceil(l2(t),0)
334	if(l>scale)scale=l
335	t=irand(t)/t
336	scale=s
337	return t
338}
339define ifbrand(i,b,p){return irand(abs(i)$)+fbrand(b,p)}
340```
341
342This allows me to use `bc` as part of my `bash` prompt.
343
344## Code Style
345
346The code style for `bc` is...weird, and that comes from historical accident.
347
348In [History][23], I mentioned how I got my `bc` in [toybox][16]. Well, in order
349to do that, my `bc` originally had toybox style. Eventually, I changed to using
350tabs, and assuming they were 4 spaces wide, but other than that, I basically
351kept the same style, with some exceptions that are more or less dependent on my
352taste.
353
354However, I later managed to get [ClangFormat][24] to work, so I changed the
355style to that.
356
357### ClangFormat
358
359The style is now defined as whatever [ClangFormat][24] outputs using the
360existing `.clang-format` file. More precisely, the style is whatever is output
361when the following command is run in the root directory:
362
363```
364./scripts/format.sh
365```
366
367### Historical Style
368
369The code style used to be:
370
371* Tabs are 4 spaces.
372* Tabs are used at the beginning of lines for indent.
373* Spaces are used for alignment.
374* Lines are limited to 80 characters, period.
375* Pointer asterisk (`*`) goes with the variable (on the right), not the type,
376  unless it is for a pointer type returned from a function.
377* The opening brace is put on the same line as the header for the function,
378  loop, or `if` statement.
379* Unless the header is more than one line, in which case the opening brace is
380  put on its own line.
381* If the opening brace is put on its own line, there is no blank line after it.
382* If the opening brace is *not* put on its own line, there *is* a blank line
383  after it, *unless* the block is only one or two lines long.
384* Code lines are grouped into what I call "paragraphs." Basically, lines that
385  seem like they should go together are grouped together. This one comes down
386  to judgment.
387* Bodies of `if` statements, `else` statements, and loops that are one line
388  long are put on the same line as the statement, unless the header is more than
389  one line long, and/or, the header and body cannot fit into 80 characters with
390  a space inbetween them.
391* If single-line bodies are on a separate line from their headers, and the
392  headers are only a single line, then no braces are used.
393* However, braces are *always* used if they contain another `if` statement or
394  loop.
395* Loops with empty bodies are ended with a semicolon.
396* Expressions that return a boolean value are surrounded by paretheses.
397* Macro backslashes are aligned as far to the left as possible.
398* Binary operators have spaces on both sides.
399* If a line with binary operators overflows 80 characters, a newline is inserted
400  *after* binary operators.
401* Function modifiers and return types are on the same line as the function name.
402* With one exception, `goto`'s are only used to jump to the end of a function
403  for cleanup.
404* All structs, enums, and unions are `typedef`'ed.
405* All constant data is in one file: [`src/data.c`][131], but the corresponding
406  `extern` declarations are in the appropriate header file.
407* All local variables are declared at the beginning of the scope where they
408  appear. They may be initialized at that point, if it does not invoke UB or
409  otherwise cause bugs.
410* All precondition `assert()`'s (see [Asserts][135]) come *after* local variable
411  declarations.
412* Besides short `if` statements and loops, there should *never* be more than one
413  statement per line.
414
415## Repo Structure
416
417Functions are documented with Doxygen-style doc comments. Functions that appear
418in headers are documented in the headers, while static functions are documented
419where they are defined.
420
421### `configure`
422
423A symlink to [`configure.sh`][69].
424
425### `configure.sh`
426
427This is the script to configure `bc` and [`bcl`][156] for building.
428
429This `bc` has a custom build system. The reason for this is because of
430[*portability*][136].
431
432If `bc` used an outside build system, that build system would be an external
433dependency. Thus, I had to write a build system for `bc` that used nothing but
434C99 and POSIX utilities.
435
436One of those utilities is POSIX `sh`, which technically implements a
437Turing-complete programming language. It's a terrible one, but it works.
438
439A user that wants to build `bc` on a POSIX system (not Windows) first runs
440`configure.sh` with the options he wants. `configure.sh` uses those options and
441the `Makefile` template ([`Makefile.in`][70]) to generate an actual valid
442`Makefile`. Then `make` can do the rest.
443
444For more information about the build process, see the [Build System][142]
445section and the [build manual][14].
446
447For more information about shell scripts, see [POSIX Shell Scripts][76].
448
449`configure.sh` does the following:
450
4511.	It processes command-line arguments and figure out what the user wants to
452	build.
4532.	It reads in [`Makefile.in`][70].
4543.	One-by-one, it replaces placeholders (in [`Makefile.in`][70]) of the form
455	`%%<placeholder_name>%%` based on the [build type][81].
4564.	It appends a list of file targets based on the [build type][81].
4575.	It appends the correct test targets.
4586.	It copies the correct manpage and markdown manual for `bc` and `dc` into a
459	location from which they can be copied for install.
4607.	It does a `make clean` to reset the build state.
461
462### `.gitattributes`
463
464A `.gitattributes` file. This is needed to preserve the `crlf` line endings in
465the Visual Studio files.
466
467### `.gitignore`
468
469The `.gitignore`
470
471### `LICENSE.md`
472
473This is the `LICENSE` file, including the licenses of various software that I
474have borrowed.
475
476### `Makefile.in`
477
478This is the `Makefile` template for [`configure.sh`][69] to use for generating a
479`Makefile`.
480
481For more information, see [`configure.sh`][69], the [Build System][142] section,
482and the [build manual][14].
483
484Because of [portability][136], the generated `Makefile.in` should be a pure
485[POSIX `make`][74]-compatible `Makefile` (minus the placeholders). Here are a
486few snares for the unwary programmer in this file:
487
4881.	No extensions allowed, including and especially GNU extensions.
4892.	If new headers are added, they must also be added to `Makefile.in`.
4903.	Don't delete the `.POSIX:` empty target at the top; that's what tells `make`
491	implementations that pure [POSIX `make`][74] is needed.
492
493In particular, there is no way to set up variables other than the `=` operator.
494There are no conditionals, so all of the conditional stuff must be in
495[`configure.sh`][69]. This is, in fact, why [`configure.sh`][69] exists in the
496first place: [POSIX `make`][74] is barebones and only does a build with no
497configuration.
498
499### `NEWS.md`
500
501A running changelog with an entry for each version. This should be updated at
502the same time that [`include/version.h`][75] is.
503
504### `NOTICE.md`
505
506The `NOTICE` file with proper attributions.
507
508### `README.md`
509
510The `README`. Read it.
511
512### `benchmarks/`
513
514The folder containing files to generate benchmarks.
515
516Each of these files was made, at one time or another, to benchmark some
517experimental feature, so if it seems there is no rhyme or reason to these
518benchmarks, it is because there is none, besides historical accident.
519
520#### `bc/`
521
522The folder containing `bc` scripts to generate `bc` benchmarks.
523
524##### `add.bc`
525
526The file to generate the benchmark to benchmark addition in `bc`.
527
528##### `arrays_and_constants.bc`
529
530The file to generate the benchmark to benchmark `bc` using lots of array names
531and constants.
532
533##### `arrays.bc`
534
535The file to generate the benchmark to benchmark `bc` using lots of array names.
536
537##### `constants.bc`
538
539The file to generate the benchmark to benchmark `bc` using lots of constants.
540
541##### `divide.bc`
542
543The file to generate the benchmark to benchmark division in `bc`.
544
545##### `functions.bc`
546
547The file to generate the benchmark to benchmark `bc` using lots of functions.
548
549##### `irand_long.bc`
550
551The file to generate the benchmark to benchmark `bc` using lots of calls to
552`irand()` with large bounds.
553
554##### `irand_short.bc`
555
556The file to generate the benchmark to benchmark `bc` using lots of calls to
557`irand()` with small bounds.
558
559##### `lib.bc`
560
561The file to generate the benchmark to benchmark `bc` using lots of calls to
562heavy functions in `lib.bc`.
563
564##### `multiply.bc`
565
566The file to generate the benchmark to benchmark multiplication in `bc`.
567
568##### `newton_raphson_div_large.bc`
569
570The file to generate the benchmark to benchmark the Newton-Raphson division in
571[GitHub PR #72][229] with large numbers.
572
573##### `newton_raphson_div_small.bc`
574
575The file to generate the benchmark to benchmark the Newton-Raphson division in
576[GitHub PR #72][229] with small numbers.
577
578##### `newton_raphson_sqrt_large.bc`
579
580The file to generate the benchmark to benchmark the Newton-Raphson square root
581in [GitHub PR #72][229] with large numbers.
582
583##### `newton_raphson_sqrt_small.bc`
584
585The file to generate the benchmark to benchmark the Newton-Raphson square root
586in [GitHub PR #72][229] with small numbers.
587
588##### `postfix_incdec.bc`
589
590The file to generate the benchmark to benchmark `bc` using postfix increment and
591decrement operators.
592
593##### `power.bc`
594
595The file to generate the benchmark to benchmark power (exponentiation) in `bc`.
596
597##### `subtract.bc`
598
599The file to generate the benchmark to benchmark subtraction in `bc`.
600
601##### `strings.bc`
602
603The file to generate the benchmark to benchmark `bc` using lots of strings.
604
605#### `dc/`
606
607The folder containing `dc` scripts to generate `dc` benchmarks.
608
609##### `modexp.dc`
610
611The file to generate the benchmark to benchmark modular exponentiation in `dc`.
612
613### `gen/`
614
615A folder containing the files necessary to generate C strings that will be
616embedded in the executable.
617
618All of the files in this folder have license headers, but the program and script
619that can generate strings from them include code to strip the license header out
620before strings are generated.
621
622#### `bc_help.txt`
623
624A text file containing the text displayed for `bc -h` or `bc --help`.
625
626This text just contains the command-line options and a short summary of the
627differences from GNU and BSD `bc`'s. It also directs users to the manpage.
628
629The reason for this is because otherwise, the help would be far too long to be
630useful.
631
632**Warning**: The text has some `printf()` format specifiers. You need to make
633sure the format specifiers match the arguments given to `bc_file_printf()`.
634
635#### `dc_help.txt`
636
637A text file containing the text displayed for `dc -h` or `dc --help`.
638
639This text just contains the command-line options and a short summary of the
640differences from GNU and BSD `dc`'s. It also directs users to the manpage.
641
642The reason for this is because otherwise, the help would be far too long to be
643useful.
644
645**Warning**: The text has some `printf()` format specifiers. You need to make
646sure the format specifiers match the arguments given to `bc_file_printf()`.
647
648#### `lib.bc`
649
650A `bc` script containing the [standard math library][5] required by POSIX. See
651the [POSIX standard][2] for what is required.
652
653This file does not have any extraneous whitespace, except for tabs at the
654beginning of lines. That is because this data goes directly into the binary,
655and whitespace is extra bytes in the binary. Thus, not having any extra
656whitespace shrinks the resulting binary.
657
658However, tabs at the beginning of lines are kept for two reasons:
659
6601.	Readability. (This file is still code.)
6612.	The program and script that generate strings from this file can remove
662	tabs at the beginning of lines.
663
664For more details about the algorithms used, see the [algorithms manual][25].
665
666However, there are a few snares for unwary programmers.
667
668First, all constants must be one digit. This is because otherwise, multi-digit
669constants could be interpreted wrongly if the user uses a different `ibase`.
670This does not happen with single-digit numbers because they are guaranteed to be
671interpreted what number they would be if the `ibase` was as high as possible.
672
673This is why `A` is used in the library instead of `10`, and things like `2*9*A`
674for `180` in [`lib2.bc`][26].
675
676As an alternative, you can set `ibase` in the function, but if you do, make sure
677to set it with a single-digit number and beware the snare below...
678
679Second, `scale`, `ibase`, and `obase` must be safely restored before returning
680from any function in the library. This is because without the `-g` option,
681functions are allowed to change any of the globals.
682
683Third, all local variables in a function must be declared in an `auto` statement
684before doing anything else. This includes arrays. However, function parameters
685are considered predeclared.
686
687Fourth, and this is only a snare for `lib.bc`, not [`lib2.bc`][26], the code
688must not use *any* extensions. It has to work when users use the `-s` or `-w`
689flags.
690
691#### `lib2.bc`
692
693A `bc` script containing the [extended math library][7].
694
695Like [`lib.bc`][8], and for the same reasons, this file should have no
696extraneous whitespace, except for tabs at the beginning of lines.
697
698For more details about the algorithms used, see the [algorithms manual][25].
699
700Also, be sure to check [`lib.bc`][8] for the snares that can trip up unwary
701programmers when writing code for `lib2.bc`.
702
703#### `strgen.c`
704
705Code for the program to generate C strings from text files. This is the original
706program, although [`strgen.sh`][9] was added later.
707
708The reason I used C here is because even though I knew `sh` would be available
709(it must be available to run `configure.sh`), I didn't know how to do what I
710needed to do with POSIX utilities and `sh`.
711
712Later, [`strgen.sh`][9] was contributed by Stefan Eßer of FreeBSD, showing that
713it *could* be done with `sh` and POSIX utilities.
714
715However, `strgen.c` exists *still* exists because the versions generated by
716[`strgen.sh`][9] may technically hit an environmental limit. (See the [draft C99
717standard][12], page 21.) This is because [`strgen.sh`][9] generates string
718literals, and in C99, string literals can be limited to 4095 characters, and
719`gen/lib2.bc` is above that.
720
721Fortunately, the limit for "objects," which include `char` arrays, is much
722bigger: 65535 bytes, so that's what `strgen.c` generates.
723
724However, the existence of `strgen.c` does come with a cost: the build needs C99
725compiler that targets the host machine. For more information, see the ["Cross
726Compiling" section][13] of the [build manual][14].
727
728Read the comments in `strgen.c` for more detail about it, the arguments it
729takes, and how it works.
730
731#### `strgen.sh`
732
733An `sh` script that will generate C strings that uses only POSIX utilities. This
734exists for those situations where a host C99 compiler is not available, and the
735environment limits mentioned above in [`strgen.c`][15] don't matter.
736
737`strgen.sh` takes the same arguments as [`strgen.c`][15], and the arguments mean
738the exact same things, so see the comments in [`strgen.c`][15] for more detail
739about that, and see the comments in `strgen.sh` for more details about it and
740how it works.
741
742For more information about shell scripts, see [POSIX Shell Scripts][76].
743
744### `include/`
745
746A folder containing the headers.
747
748The headers are not included among the source code because I like it better that
749way. Also there were folders within `src/` at one point, and I did not want to
750see `#include "../some_header.h"` or things like that.
751
752So all headers are here, even though only one ([`bcl.h`][30]) is meant for end
753users (to be installed in `INCLUDEDIR`).
754
755#### `args.h`
756
757This file is the API for processing command-line arguments.
758
759#### `bc.h`
760
761This header is the API for `bc`-only items. This includes the `bc_main()`
762function and the `bc`-specific lexing and parsing items.
763
764The `bc` parser is perhaps the most sensitive part of the entire codebase. See
765the documentation in `bc.h` for more information.
766
767The code associated with this header is in [`src/bc.c`][40],
768[`src/bc_lex.c`][41], and [`src/bc_parse.c`][42].
769
770#### `bcl.h`
771
772This header is the API for the [`bcl`][156] library.
773
774This header is meant for distribution to end users and contains the API that end
775users of [`bcl`][156] can use in their own software.
776
777This header, because it's the public header, is also the root header. That means
778that it has platform-specific fixes for Windows. (If the fixes were not in this
779header, the build would fail on Windows.)
780
781The code associated with this header is in [`src/library.c`][43].
782
783#### `dc.h`
784
785This header is the API for `dc`-only items. This includes the `dc_main()`
786function and the `dc`-specific lexing and parsing items.
787
788The code associated with this header is in [`src/dc.c`][44],
789[`src/dc_lex.c`][45], and [`src/dc_parse.c`][46].
790
791#### `file.h`
792
793This header is for `bc`'s internal buffered I/O API.
794
795For more information about `bc`'s error handling and custom buffered I/O, see
796[Error Handling][97] and [Custom I/O][114], along with [`status.h`][176] and the
797notes about version [`3.0.0`][32] in the [`NEWS`][32].
798
799The code associated with this header is in [`src/file.c`][47].
800
801#### `history.h`
802
803This header is for `bc`'s implementation of command-line editing/history, which
804is based on a [UTF-8-aware fork][28] of [`linenoise`][29].
805
806For more information, see the [Command-Line History][189] section.
807
808The code associated with this header is in [`src/history.c`][48].
809
810#### `lang.h`
811
812This header defines the data structures and bytecode used for actual execution
813of `bc` and `dc` code.
814
815Yes, it's misnamed; that's an accident of history where the first things I put
816into it all seemed related to the `bc` language.
817
818The code associated with this header is in [`src/lang.c`][49].
819
820#### `lex.h`
821
822This header defines the common items that both programs need for lexing.
823
824The code associated with this header is in [`src/lex.c`][50],
825[`src/bc_lex.c`][41], and [`src/dc_lex.c`][45].
826
827#### `library.h`
828
829This header defines the things needed for [`bcl`][156] that users should *not*
830have access to. In other words, [`bcl.h`][30] is the *public* header for the
831library, and this header is the *private* header for the library.
832
833The code associated with this header is in [`src/library.c`][43].
834
835#### `num.h`
836
837This header is the API for numbers and math.
838
839The code associated with this header is in [`src/num.c`][39].
840
841#### `opt.h`
842
843This header is the API for parsing command-line arguments.
844
845It's different from [`args.h`][31] in that [`args.h`][31] is for the main code
846to process the command-line arguments into global data *after* they have already
847been parsed by `opt.h` into proper tokens. In other words, `opt.h` actually
848parses the command-line arguments, and [`args.h`][31] turns that parsed data
849into flags (bits), strings, and expressions that will be used later.
850
851Why are they separate? Because originally, `bc` used `getopt_long()` for
852parsing, so [`args.h`][31] was the only one that existed. After it was
853discovered that `getopt_long()` has different behavior on different platforms, I
854adapted a [public-domain option parsing library][34] to do the job instead. And
855in doing so, I gave it its own header.
856
857They could probably be combined, but I don't really care enough at this point.
858
859The code associated with this header is in [`src/opt.c`][51].
860
861#### `parse.h`
862
863This header defines the common items that both programs need for parsing.
864
865Note that the parsers don't produce abstract syntax trees (AST's) or any
866intermediate representations. They produce bytecode directly. In other words,
867they don't have special data structures except what they need to do their job.
868
869The code associated with this header is in [`src/parse.c`][50],
870[`src/bc_lex.c`][42], and [`src/dc_lex.c`][46].
871
872#### `program.h`
873
874This header defines the items needed to manage the data structures in
875[`lang.h`][38] as well as any helper functions needed to generate bytecode or
876execute it.
877
878The code associated with this header is in [`src/program.c`][53].
879
880#### `rand.h`
881
882This header defines the API for the [pseudo-random number generator
883(PRNG)][179].
884
885The PRNG only generates fixed-size integers. The magic of generating random
886numbers of arbitrary size is actually given to the code that does math
887([`src/num.c`][39]).
888
889The code associated with this header is in [`src/rand.c`][54].
890
891#### `read.h`
892
893This header defines the API for reading from files and `stdin`.
894
895Thus, [`file.h`][55] is really for buffered *output*, while this file is for
896*input*. There is no buffering needed for `bc`'s inputs.
897
898The code associated with this header is in [`src/read.c`][56].
899
900#### `status.h`
901
902This header has several things:
903
904* A list of possible errors that internal `bc` code can use.
905* Compiler-specific fixes.
906* Platform-specific fixes.
907* Macros for `bc`'s [error handling][97].
908
909There is no code associated with this header.
910
911#### `vector.h`
912
913This header defines the API for the vectors (resizable arrays) that are used for
914data structures.
915
916Vectors are what do the heavy lifting in almost all of `bc`'s data structures.
917Even the maps of identifiers and arrays use vectors.
918
919The code associated with this header is in [`src/vector.c`][228].
920
921#### `version.h`
922
923This header defines the version of `bc`.
924
925There is no code associated with this header.
926
927#### `vm.h`
928
929This header defines the API for setting up and running `bc` and `dc`.
930
931It is so named because I think of it as the "virtual machine" of `bc`, though
932that is probably not true as [`program.h`][57] is probably the "virtual machine"
933API. Thus, the name is more historical accident.
934
935The code associated with this header is in [`src/vm.c`][58].
936
937### `locales/`
938
939This folder contains a bunch of `.msg` files and soft links to the real `.msg`
940files. This is how locale support is implemented in `bc`.
941
942The files are in the format required by the [`gencat`][59] POSIX utility. They
943all have the same messages, in the same order, with the same numbering, under
944the same groups. This is because the locale system expects those messages in
945that order.
946
947The softlinks exist because for many locales, they would contain the exact same
948information. To prevent duplication, they are simply linked to a master copy.
949
950The naming format for all files is:
951
952```
953<language_code>_<country_code>.<encoding>.msg
954```
955
956This naming format must be followed for all locale files.
957
958### `manuals/`
959
960This folder contains the documentation for `bc`, `dc`, and [`bcl`][156], along
961with a few other manuals.
962
963#### `algorithms.md`
964
965This file explains the mathematical algorithms that are used.
966
967The hope is that this file will guide people in understanding how the math code
968works.
969
970#### `bc.1.md.in`
971
972This file is a template for the markdown version of the `bc` manual and
973manpages.
974
975For more information about how the manpages and markdown manuals are generated,
976and for why, see [`scripts/manpage.sh`][60] and [Manuals][86].
977
978#### `bcl.3`
979
980This is the manpage for the [`bcl`][156] library. It is generated from
981[`bcl.3.md`][61] using [`scripts/manpage.sh`][60].
982
983For the reason why I check generated data into the repo, see
984[`scripts/manpage.sh`][60] and [Manuals][86].
985
986#### `bcl.3.md`
987
988This is the markdown manual for the [`bcl`][156] library. It is the source for the
989generated [`bcl.3`][62] file.
990
991#### `benchmarks.md`
992
993This is a document that compares this `bc` to GNU `bc` in various benchmarks. It
994was last updated when version [`3.0.0`][32] was released.
995
996It has very little documentation value, other than showing what compiler options
997are useful for performance.
998
999#### `build.md`
1000
1001This is the [build manual][14].
1002
1003This `bc` has a custom build system. The reason for this is because of
1004[*portability*][136].
1005
1006If `bc` used an outside build system, that build system would be an external
1007dependency. Thus, I had to write a build system for `bc` that used nothing but
1008C99 and POSIX utilities, including barebones [POSIX `make`][74].
1009
1010for more information about the build system, see the [build system][142]
1011section, the [build manual][14], [`configure.sh`][69], and [`Makefile.in`][70].
1012
1013#### `dc.1.md.in`
1014
1015This file is a template for the markdown version of the `dc` manual and
1016manpages.
1017
1018For more information about how the manpages and markdown manuals are generated,
1019and for why, see [`scripts/manpage.sh`][60] and [Manuals][86].
1020
1021#### `development.md`
1022
1023The file you are reading right now.
1024
1025#### `header_bcl.txt`
1026
1027Used by [`scripts/manpage.sh`][60] to give the [`bcl.3`][62] manpage a proper
1028header.
1029
1030For more information about generating manuals, see [`scripts/manpage.sh`][60]
1031and [Manuals][86].
1032
1033#### `header_bc.txt`
1034
1035Used by [`scripts/manpage.sh`][60] to give the [generated `bc` manpages][79] a
1036proper header.
1037
1038For more information about generating manuals, see [`scripts/manpage.sh`][60]
1039and [Manuals][86].
1040
1041#### `header_dc.txt`
1042
1043Used by [`scripts/manpage.sh`][60] to give the [generated `dc` manpages][80] a
1044proper header.
1045
1046For more information about generating manuals, see [`scripts/manpage.sh`][60]
1047and [Manuals][86].
1048
1049#### `header.txt`
1050
1051Used by [`scripts/manpage.sh`][60] to give all generated manpages a license
1052header.
1053
1054For more information about generating manuals, see [`scripts/manpage.sh`][60]
1055and [Manuals][86].
1056
1057#### `release.md`
1058
1059A checklist that I try to somewhat follow when making a release.
1060
1061#### `bc/`
1062
1063A folder containing the `bc` manuals.
1064
1065Each `bc` manual corresponds to a [build type][81]. See that link for more
1066details.
1067
1068For each manual, there are two copies: the markdown version generated from the
1069template, and the manpage generated from the markdown version.
1070
1071#### `dc/`
1072
1073A folder containing the `dc` manuals.
1074
1075Each `dc` manual corresponds to a [build type][81]. See that link for more
1076details.
1077
1078For each manual, there are two copies: the markdown version generated from the
1079template, and the manpage generated from the markdown version.
1080
1081### `scripts/`
1082
1083This folder contains helper scripts. Most of them are written in pure [POSIX
1084`sh`][72], but three ([`afl.py`][94], [`karatsuba.py`][78], and
1085[`randmath.py`][95]) are written in Python 3, and one ([`ministat.c`][223]) is
1086written in C. [`ministat.c`][223] in particular is copied from elsewhere.
1087
1088For more information about the shell scripts, see [POSIX Shell Scripts][76].
1089
1090#### `afl.py`
1091
1092This script is meant to be used as part of the fuzzing workflow.
1093
1094It does one of two things: checks for valid crashes, or runs `bc` and or `dc`
1095under all of the paths found by [AFL++][125].
1096
1097See [Fuzzing][82] for more information about fuzzing, including this script.
1098
1099#### `alloc.sh`
1100
1101This script is a quick and dirty script to test whether or not the garbage
1102collection mechanism of the [`BcNum` caching][96] works. It has been little-used
1103because it tests something that is not important to correctness.
1104
1105#### `benchmark.sh`
1106
1107A script making it easy to run benchmarks and to run the executable produced by
1108[`ministat.c`][223] on them.
1109
1110For more information, see the [Benchmarks][144] section.
1111
1112#### `bitfuncgen.c`
1113
1114A source file for an executable to generate tests for `bc`'s bitwise functions
1115in [`gen/lib2.bc`][26]. The executable is `scripts/bitfuncgen`, and it is built
1116with `make bitfuncgen`. It produces the test on `stdout` and the expected
1117results on `stderr`. This means that to generat tests, use the following
1118invokation:
1119
1120```
1121scripts/bitfuncgen > tests/bc/bitfuncs.txt 2> tests/bc/bitfuncs_results.txt
1122```
1123
1124It calls `abort()` if it runs into an error.
1125
1126#### `exec-install.sh`
1127
1128This script is the magic behind making sure `dc` is installed properly if it's
1129a symlink to `bc`. It checks to see if it is a link, and if so, it just creates
1130a new symlink in the install directory. Of course, it also installs `bc` itself,
1131or `dc` when it's alone.
1132
1133#### `functions.sh`
1134
1135This file is a bunch of common functions for most of the POSIX shell scripts. It
1136is not supposed to be run; instead, it is *sourced* by other POSIX shell
1137scripts, like so:
1138
1139```
1140. "$scriptdir/functions.sh"
1141```
1142
1143or the equivalent, depending on where the sourcing script is.
1144
1145For more information about the shell scripts, see [POSIX Shell Scripts][76].
1146
1147#### `fuzz_prep.sh`
1148
1149Fuzzing is a regular activity when I am preparing for a release.
1150
1151This script handles all the options and such for building a fuzzable binary.
1152Instead of having to remember a bunch of options, I just put them in this script
1153and run the script when I want to fuzz.
1154
1155For more information about fuzzing, see [Fuzzing][82].
1156
1157#### `karatsuba.py`
1158
1159This script has at least one of two major differences from most of the other
1160scripts:
1161
1162* It's written in Python 3.
1163* It's meant for software packagers.
1164
1165For example, [`scripts/afl.py`][94] and [`scripts/randmath.py`][95] are both in
1166Python 3, but they are not meant for the end user or software packagers and are
1167not included in source distributions. But this script is.
1168
1169This script breaks my rule of only POSIX utilities necessary for package
1170maintainers, but there's a very good reason for that: it's only meant to be run
1171*once* when the package is created for the first time, and maybe not even then.
1172
1173You see, this script does two things: it tests the Karatsuba implementation at
1174various settings for `KARATSUBA_LEN`, and it figures out what the optimal
1175`KARATSUBA_LEN` is for the machine that it is running on.
1176
1177Package maintainers can use this script, when creating a package for this `bc`,
1178to figure out what is optimal for their users. Then they don't have to run it
1179ever again. So this script only has to run on the packagers machine.
1180
1181I tried to write the script in `sh`, by the way, and I finally accepted the
1182tradeoff of using Python 3 when it became too hard.
1183
1184However, I also mentioned that it's for testing Karatsuba with various settings
1185of `KARATSUBA_LEN`. Package maintainers will want to run the [test suite][124],
1186right?
1187
1188Yes, but this script is not part of the [test suite][124]; it's used for testing
1189in the [`scripts/release.sh`][83] script, which is maintainer use only.
1190
1191However, there is one snare with `karatsuba.py`: I didn't want the user to have
1192to install any Python libraries to run it. Keep that in mind if you change it.
1193
1194#### `link.sh`
1195
1196This script is the magic behind making `dc` a symlink of `bc` when both
1197calculators are built.
1198
1199#### `locale_install.sh`
1200
1201This script does what its name says: it installs locales.
1202
1203It turns out that this is complicated.
1204
1205There is a magic environment variable, `$NLSPATH`, that tells you how and where
1206you are supposed to install locales.
1207
1208Yes, *how*. And where.
1209
1210But now is not the place to rant about `$NLSPATH`. For more information on
1211locales and `$NLSPATH`, see [Locales][85].
1212
1213#### `locale_uninstall.sh`
1214
1215This script does what its name says: it uninstalls locales.
1216
1217This is far less complicated than installing locales. I basically generate a
1218wildcard path and then list all paths that fit that wildcard. Then I delete each
1219one of those paths. Easy.
1220
1221For more information on locales, see [Locales][85].
1222
1223#### `manpage.sh`
1224
1225This script is the one that generates markdown manuals from a template and a
1226manpage from a markdown manual.
1227
1228For more information about generating manuals, see [Manuals][86].
1229
1230#### `ministat.c`
1231
1232This is a file copied [from FreeBSD][221] that calculates the standard
1233statistical numbers, such as mean, average, and median, based on numbers
1234obtained from a file.
1235
1236For more information, see the [FreeBSD ministat(1) manpage][222].
1237
1238This file allows `bc` to build the `scripts/ministat` executable using the
1239command `make ministat`, and this executable helps programmers evaluate the
1240results of [benchmarks][144] more accurately.
1241
1242#### `package.sh`
1243
1244This script is what helps `bc` maintainers cut a release. It does the following:
1245
12461.	Creates the appropriate `git` tag.
12472.	Pushes the `git` tag.
12483.	Copies the repo to a temp directory.
12494.	Removes files that should not be included in source distributions.
12505.	Creates the tarballs.
12516.	Signs the tarballs.
12527.	Zips and signs the Windows executables if they exist.
12538.	Calculates and outputs SHA512 and SHA256 sums for all of the files,
1254	including the signatures.
1255
1256This script is for `bc` maintainers to use when cutting a release. It is not
1257meant for outside use. This means that some non-POSIX utilities can be used,
1258such as `git` and `gpg`.
1259
1260In addition, before using this script, it expects that the folders that Windows
1261generated when building `bc`, `dc`, and [`bcl`][156], are in the parent
1262directory of the repo, exactly as Windows generated them. If they are not there,
1263then it will not zip and sign, nor calculate sums of, the Windows executables.
1264
1265Because this script creates a tag and pushes it, it should *only* be run *ONCE*
1266per release.
1267
1268#### `radamsa.sh`
1269
1270A script to test `bc`'s command-line expression parsing code, which, while
1271simple, strives to handle as much as possible.
1272
1273What this script does is it uses the test cases in [`radamsa.txt`][98] an input
1274to the [Radamsa fuzzer][99].
1275
1276For more information, see the [Radamsa][128] section.
1277
1278#### `radamsa.txt`
1279
1280Initial test cases for the [`radamsa.sh`][100] script.
1281
1282#### `randmath.py`
1283
1284This script generates random math problems and checks that `bc`'s and `dc`'s
1285output matches the GNU `bc` and `dc`. (For this reason, it is necessary to have
1286GNU `bc` and `dc` installed before using this script.)
1287
1288One snare: be sure that this script is using the GNU `bc` and `dc`, not a
1289previously-installed version of this `bc` and `dc`.
1290
1291If you want to check for memory issues or failing asserts, you can build the
1292`bc` using `./scripts/fuzz_prep.sh -a`, and then run it under this script. Any
1293errors or crashes should be caught by the script and given to the user as part
1294of the "checklist" (see below).
1295
1296The basic idea behind this script is that it generates as many math problems as
1297it can, biasing towards situations that may be likely to have bugs, and testing
1298each math problem against GNU `bc` or `dc`.
1299
1300If GNU `bc` or `dc` fails, it just continues. If this `bc` or `dc` fails, it
1301stores that problem. If the output mismatches, it also stores the problem.
1302
1303Then, when the user sends a `SIGINT`, the script stops testing and goes into
1304report mode. One-by-one, it will go through the "checklist," the list of failed
1305problems, and present each problem to the user, as well as whether this `bc` or
1306`dc` crashed, and its output versus GNU. Then the user can decide to add them as
1307test cases, which it does automatically to the appropriate test file.
1308
1309#### `release_settings.txt`
1310
1311A text file of settings combinations that [`release.sh`][83] uses to ensure that
1312`bc` and `dc` build and work with various default settings. [`release.sh`][83]
1313simply reads it line by line and uses each line for one build.
1314
1315#### `release.sh`
1316
1317This script is for `bc` maintainers only. It runs `bc`, `dc`, and [`bcl`][156]
1318through a gauntlet that is mostly meant to be used in preparation for a release.
1319
1320It does the following:
1321
13221.	Builds every [build type][81], with every setting combo in
1323	[`release_settings.txt`][93] with both calculators, `bc` alone, and `dc`
1324	alone.
13252.	Builds every [build type][81], with every setting combo in
1326	[`release_settings.txt`][93] with both calculators, `bc` alone, and `dc`
1327	alone for 32-bit.
13283.	Does #1 and #2 for Debug, Release, Release with Debug Info, and Min Size
1329	Release builds.
13304.	Runs the [test suite][124] on every build, if desired.
13315.	Runs the [test suite][124] under [ASan, UBSan, and MSan][21] for every build
1332	type/setting combo.
13336.	Runs [`scripts/karatsuba.py`][78] in test mode.
13347.	Runs the [test suite][124] for both calculators, `bc` alone, and `dc` alone
1335	under [valgrind][20] and errors if there are any memory bugs or memory
1336	leaks.
1337
1338#### `safe-install.sh`
1339
1340A script copied from [musl][101] to atomically install files.
1341
1342#### `test_settings.sh`
1343
1344A quick and dirty script to help automate rebuilding while manually testing the
1345various default settings.
1346
1347This script uses [`test_settings.txt`][103] to generate the various settings
1348combos.
1349
1350For more information about settings, see [Settings][102] in the [build
1351manual][14].
1352
1353#### `test_settings.txt`
1354
1355A list of the various settings combos to be used by [`test_settings.sh`][104].
1356
1357### `src/`
1358
1359This folder is, obviously, where the actual heart and soul of `bc`, the source
1360code, is.
1361
1362All of the source files are in one folder; this simplifies the build system
1363immensely.
1364
1365There are separate files for `bc` and `dc` specific code ([`bc.c`][40],
1366[`bc_lex.c`][41], [`bc_parse.c`][42], [`dc.c`][44], [`dc_lex.c`][45], and
1367[`dc_parse.c`][46]) where possible because it is cleaner to exclude an entire
1368source file from a build than to have `#if`/`#endif` preprocessor guards.
1369
1370That said, it was easier in many cases to use preprocessor macros where both
1371calculators used much of the same code and data structures, so there is a
1372liberal sprinkling of them through the code.
1373
1374#### `args.c`
1375
1376Code for processing command-line arguments.
1377
1378The header for this file is [`include/args.h`][31].
1379
1380#### `bc.c`
1381
1382The code for the `bc` main function `bc_main()`.
1383
1384The header for this file is [`include/bc.h`][106].
1385
1386#### `bc_lex.c`
1387
1388The code for lexing that only `bc` needs.
1389
1390The headers for this file are [`include/lex.h`][180] and [`include/bc.h`][106].
1391
1392#### `bc_parse.c`
1393
1394The code for parsing that only `bc` needs. This code is the most complex and
1395subtle in the entire codebase.
1396
1397The headers for this file are [`include/parse.h`][181] and
1398[`include/bc.h`][106].
1399
1400#### `data.c`
1401
1402Due to [historical accident][23] because of a desire to get my `bc` into
1403[toybox][16], all of the constant data that `bc` needs is all in one file. This
1404is that file.
1405
1406There is no code in this file, but a lot of the const data has a heavy influence
1407on code, including the order of data in arrays because that order has to
1408correspond to the order of other things elsewhere in the codebase. If you change
1409the order of something in this file, run `make test`, and get errors, you
1410changed something that depends on the order that you messed up.
1411
1412Almost all headers have `extern` references to items in this file.
1413
1414#### `dc.c`
1415
1416The code for the `dc` main function `dc_main()`.
1417
1418The header for this file is [`include/dc.h`][182].
1419
1420#### `dc_lex.c`
1421
1422The code for lexing that only `dc` needs.
1423
1424The headers for this file are [`include/lex.h`][180] and [`include/dc.h`][182].
1425
1426#### `dc_parse.c`
1427
1428The code for parsing that only `dc` needs.
1429
1430The headers for this file are [`include/parse.h`][181] and
1431[`include/bc.h`][182].
1432
1433#### `file.c`
1434
1435The code for `bc`'s implementation of buffered I/O. For more information about
1436why I implemented my own buffered I/O, see [`include/file.h`][55], [Error
1437Handling][97], and [Custom I/O][114], along with [`status.h`][176] and the notes
1438about version [`3.0.0`][32] in the [`NEWS`][32].
1439
1440The header for this file is [`include/file.h`][55].
1441
1442#### `history.c`
1443
1444The code for `bc`'s implementation of command-line editing/history, which is
1445based on a [UTF-8-aware fork][28] of [`linenoise`][29].
1446
1447For more information, see the [Command-Line History][189] section.
1448
1449The header for this file is [`include/history.h`][36].
1450
1451#### `lang.c`
1452
1453The data structures used for actual execution of `bc` and `dc` code.
1454
1455While execution is done in [`src/program.c`][53], this file defines functions
1456for initializing, copying, and freeing the data structures, which is somewhat
1457orthogonal to actual execution.
1458
1459Yes, it's misnamed; that's an accident of history where the first things I put
1460into it all seemed related to the `bc` language.
1461
1462The header for this file is [`include/lang.h`][38].
1463
1464#### `lex.c`
1465
1466The code for the common things that both programs need for lexing.
1467
1468The header for this file is [`include/lex.h`][180].
1469
1470#### `library.c`
1471
1472The code to implement the public API of the `bcl` library.
1473
1474The code in this file does a lot to ensure that clients do not have to worry
1475about internal `bc` details, especially error handling with `setjmp()` and
1476`longjmp()`. That and encapsulating the handling of numbers are the bulk of what
1477the code in this file actually does because most of the library is still
1478implemented in [`src/num.c`][39].
1479
1480The headers for this file are [`include/bcl.h`][30] and
1481[`include/library.h`][183].
1482
1483#### `main.c`
1484
1485The entry point for both programs; this is the `main()` function.
1486
1487This file has no headers associated with it.
1488
1489#### `num.c`
1490
1491The code for all of the arbitrary-precision [numbers][177] and [math][178] in
1492`bc`.
1493
1494The header for this file is [`include/num.h`][184].
1495
1496#### `opt.c`
1497
1498The code for parsing command-line options.
1499
1500The header for this file is [`include/opt.h`][35].
1501
1502#### `parse.c`
1503
1504The code for the common items that both programs need for parsing.
1505
1506The header for this file is [`include/parse.h`][181].
1507
1508#### `program.c`
1509
1510The code for the actual execution engine for `bc` and `dc` code.
1511
1512The header for this file is [`include/program.h`][57].
1513
1514#### `rand.c`
1515
1516The code for the [pseudo-random number generator (PRNG)][179] and the special
1517stack handling it needs.
1518
1519The PRNG only generates fixed-size integers. The magic of generating random
1520numbers of arbitrary size is actually given to the code that does math
1521([`src/num.c`][39]).
1522
1523The header for this file is [`include/rand.h`][37].
1524
1525#### `read.c`
1526
1527The code for reading from files and `stdin`.
1528
1529The header for this file is [`include/read.h`][185].
1530
1531#### `vector.c`
1532
1533The code for [vectors][111], [maps][186], and [slab vectors][187], along with
1534slabs.
1535
1536The header for this file is [`include/vector.h`][174].
1537
1538#### `vm.c`
1539
1540The code for setting up and running `bc` and `dc`.
1541
1542It is so named because I think of it as the "virtual machine" of `bc`, though
1543that is probably not true as [`program.h`][57] is probably the "virtual machine"
1544code. Thus, the name is more historical accident.
1545
1546The header for this file is [`include/vm.h`][27].
1547
1548### `tests/`
1549
1550This directory contains the entire [test suite][124] and its infrastructure.
1551
1552#### `all.sh`
1553
1554A convenience script for the `make run_all_tests` target (see the [Group
1555Tests][141] section for more information).
1556
1557#### `all.txt`
1558
1559The file with the names of the calculators. This is to make it easier for the
1560test scripts to know where the standard and other test directories are.
1561
1562#### `bcl.c`
1563
1564The test for the [`bcl`][156] API. For more information, see the [`bcl`
1565Test][157] section.
1566
1567#### `error.sh`
1568
1569The script to run the file-based error tests in `tests/<calculator>/errors/` for
1570each calculator. For more information, see the [Error Tests][151] section.
1571
1572This is a separate script so that each error file can be run separately and in
1573parallel.
1574
1575#### `errors.sh`
1576
1577The script to run the line-based error tests in `tests/<calculator>/errors.txt`
1578for each calculator. For more information, see the [Error Tests][151] section.
1579
1580#### `extra_required.txt`
1581
1582The file with the list of tests which both calculators have that need the [Extra
1583Math build option][188]. This exists to make it easy for test scripts to skip
1584those tests when the [Extra Math build option][188] is disabled.
1585
1586#### `history.py`
1587
1588The file with all of the history tests. For more information, see the [History
1589Tests][155] section.
1590
1591#### `history.sh`
1592
1593The script to integrate [`history.py`][139] into the build system in a portable
1594way, and to skip it if necessary.
1595
1596This script also re-runs the test three times if it fails. This is because
1597`pexpect` can be flaky at times.
1598
1599#### `other.sh`
1600
1601The script to run the "other" (miscellaneous) tests for each calculator. For
1602more information, see the [Other Tests][154] section.
1603
1604#### `read.sh`
1605
1606The script to run the read tests for each calculator. For more information, see
1607the [`read()` Tests][153] section.
1608
1609#### `script.sed`
1610
1611The `sed` script to edit the output of GNU `bc` when generating script tests.
1612For more information, see the [Script Tests][150] section.
1613
1614#### `script.sh`
1615
1616The script for running one script test. For more information, see the [Script
1617Tests][150] section.
1618
1619#### `scripts.sh`
1620
1621The script to help the `make run_all_tests` (see the [Group Tests][141] section)
1622run all of the script tests.
1623
1624#### `stdin.sh`
1625
1626The script to run the `stdin` tests for each calculator. For more information,
1627see the [`stdin` Tests][152] section.
1628
1629#### `test.sh`
1630
1631The script to run one standard test. For more information, see the [Standard
1632Tests][149] section.
1633
1634#### `bc/`
1635
1636The standard tests directory for `bc`. For more information, see the [Standard
1637Tests][149] section.
1638
1639##### `all.txt`
1640
1641The file to tell the build system and `make run_all_tests` (see the [Group
1642Tests][141] section) what standard tests to run for `bc`, as well as in what
1643order.
1644
1645This file just lists the test names, one per line.
1646
1647##### `errors.txt`
1648
1649The initial error test file for `bc`. This file has one test per line. See the
1650[Error Tests][151] section for more information.
1651
1652##### `posix_errors.txt`
1653
1654The file of tests for POSIX compatibility for `bc`. This file has one test per
1655line. For more information, see the [Error Tests][151] section.
1656
1657##### `timeconst.sh`
1658
1659The script to run the `bc` tests that use the [Linux `timeconst.bc` script][6].
1660For more information, see the [Linux `timeconst.bc` Script][191]section.
1661
1662##### `errors/`
1663
1664The directory with error tests for `bc`, most discovered by AFL++ (see the
1665[Fuzzing][82] section). There is one test per file. For more information, see
1666the [Error Tests][151] section.
1667
1668##### `scripts/`
1669
1670The script tests directory for `bc`. For more information, see the [Script
1671Tests][150] section.
1672
1673###### `all.txt`
1674
1675A file to tell the build system and `make run_all_tests` (see the [Group
1676Tests][141] section) what script tests to run for `bc`, as well as in what
1677order.
1678
1679This file just lists the test names, one per line.
1680
1681#### `dc/`
1682
1683The standard tests directory for `dc`. For more information, see the [Standard
1684Tests][149] section.
1685
1686##### `all.txt`
1687
1688The file to tell the build system and `make run_all_tests` (see the [Group
1689Tests][141] section) what standard tests to run for `dc`, as well as in what
1690order.
1691
1692This file just lists the test names, one per line.
1693
1694##### `errors.txt`
1695
1696The initial error test file for `dc`. This file has one test per line. See the
1697[Error Tests][151] section for more information.
1698
1699##### `read_errors.txt`
1700
1701The file of tests errors with the `?` command (`read()` in `bc`). This file has
1702one test per line. See the [Error Tests][151] section for more information.
1703
1704##### `errors/`
1705
1706The directory with error tests for `dc`, most discovered by AFL++ (see the
1707[Fuzzing][82] section). There is one test per file. For more information, see
1708the [Error Tests][151] section.
1709
1710##### `scripts/`
1711
1712The script tests directory for `dc`. For more information, see the [Script
1713Tests][150] section.
1714
1715###### `all.txt`
1716
1717The file to tell the build system and `make run_all_tests` (see the [Group
1718Tests][141] section) what script tests to run for `dc`, as well as in what
1719order.
1720
1721This file just lists the test names, one per line.
1722
1723#### `fuzzing/`
1724
1725The directory containing the fuzzing infrastructure. For more information, see
1726the [Fuzzing][82] section.
1727
1728##### `bc_afl_continue.yaml`
1729
1730The [`tmuxp`][123] config (for use with [`tmux`][122]) for easily restarting a
1731fuzz run. For more information, see the [Convenience][130] subsection of the
1732[Fuzzing][82] section.
1733
1734##### `bc_afl.yaml`
1735
1736The [`tmuxp`][123] config (for use with [`tmux`][122]) for easily starting a
1737fuzz run. For more information, see the [Convenience][130] subsection of the
1738[Fuzzing][82] section.
1739
1740Be aware that this will delete all previous unsaved fuzzing tests in the output
1741directories.
1742
1743##### `bc_inputs1/`
1744
1745The fuzzing input directory for the first third of inputs for `bc`. For more
1746information, see the [Corpuses][192] subsection of the [Fuzzing][82] section.
1747
1748##### `bc_inputs2/`
1749
1750The fuzzing input directory for the second third of inputs for `bc`. For more
1751information, see the [Corpuses][192] subsection of the [Fuzzing][82] section.
1752
1753##### `bc_inputs3/`
1754
1755The fuzzing input directory for the third third of inputs for `bc`. For more
1756information, see the [Corpuses][192] subsection of the [Fuzzing][82] section.
1757
1758##### `dc_inputs/`
1759
1760The fuzzing input directory for the inputs for `dc`. For more information, see
1761the [Corpuses][192] subsection of the [Fuzzing][82] section.
1762
1763### `vs/`
1764
1765The directory containing all of the materials needed to build `bc`, `dc`, and
1766`bcl` on Windows.
1767
1768#### `bcl.sln`
1769
1770A Visual Studio solution file for [`bcl`][156]. This, along with
1771[`bcl.vcxproj`][63] and [`bcl.vcxproj.filters`][64] is what makes it possible to
1772build [`bcl`][156] on Windows.
1773
1774#### `bcl.vcxproj`
1775
1776A Visual Studio project file for [`bcl`][156]. This, along with [`bcl.sln`][65]
1777and [`bcl.vcxproj.filters`][64] is what makes it possible to build [`bcl`][156]
1778on Windows.
1779
1780#### `bcl.vcxproj.filters`
1781
1782A Visual Studio filters file for [`bcl`][156]. This, along with [`bcl.sln`][65]
1783and [`bcl.vcxproj`][63] is what makes it possible to build [`bcl`][156] on
1784Windows.
1785
1786#### `bc.sln`
1787
1788A Visual Studio solution file for `bc`. This, along with [`bc.vcxproj`][66]
1789and [`bc.vcxproj.filters`][67] is what makes it possible to build `bc` on
1790Windows.
1791
1792#### `bc.vcxproj`
1793
1794A Visual Studio project file for `bc`. This, along with [`bc.sln`][68] and
1795[`bc.vcxproj.filters`][67] is what makes it possible to build `bc` on Windows.
1796
1797#### `bc.vcxproj.filters`
1798
1799A Visual Studio filters file for `bc`. This, along with [`bc.sln`][68] and
1800[`bc.vcxproj`][66] is what makes it possible to build `bc` on Windows.
1801
1802#### `tests/`
1803
1804A directory of files to run tests on Windows.
1805
1806##### `tests_bc.bat`
1807
1808A file to run basic `bc` tests on Windows. It expects that it will be run from
1809the directory containing it, and it also expects a `bc.exe` in the same
1810directory.
1811
1812##### `tests_dc.bat`
1813
1814A file to run basic `dc` tests on Windows. It expects that it will be run from
1815the directory containing it, and it also expects a `bc.exe` in the same
1816directory.
1817
1818## Build System
1819
1820The build system is described in detail in the [build manual][14], so
1821maintainers should start there. This section, however, describes some parts of
1822the build system that only maintainers will care about.
1823
1824### Clean Targets
1825
1826`bc` has a default `make clean` target that cleans up the build files. However,
1827because `bc`'s build system can generate many different types of files, there
1828are other clean targets that may be useful:
1829
1830* `make clean_gen` cleans the `gen/strgen` executable generated from
1831  [`gen/strgen.c`][15]. It has no prerequisites.
1832* `make clean` cleans object files, `*.cat` files (see the [Locales][85]
1833  section), executables, and files generated from text files in [`gen/`][145],
1834  including `gen/strgen` if it was built. So this has a prerequisite on
1835  `make clean_gen` in normal use.
1836* `make clean_benchmarks` cleans [benchmarks][144], including the `ministat`
1837  executable. It has no prerequisites.
1838* `make clean_config` cleans the generated `Makefile` and the manuals that
1839  [`configure.sh`][69] copied in preparation for install. It also depends on
1840  `make clean` and `make clean_benchmarks`, so it cleans those items too. This
1841  is the target that [`configure.sh`][69] uses before it does its work.
1842* `make clean_coverage` cleans the generated coverage files for the [test
1843  suite][124]'s [code coverage][146] capabilities. It has no prerequisites. This
1844  is useful if the code coverage tools are giving errors.
1845* `make clean_tests` cleans *everything*. It has prerequisites on all previous
1846  clean targets, but it also cleans all of the [generated tests][143].
1847
1848When adding more generated files, you may need to add them to one of these
1849targets and/or add a target for them especially.
1850
1851### Preprocessor Macros
1852
1853`bc` and `dc` use *a lot* of preprocessor macros to ensure that each build type:
1854
1855* builds,
1856* works under the [test suite][124], and
1857* excludes as much code as possible from all builds.
1858
1859This section will explain the preprocessor style of `bc` and `dc`, as well as
1860provide an explanation of the macros used.
1861
1862#### Style
1863
1864The style of macro use in `bc` is pretty straightforward: I avoid depending on
1865macro definitions and instead, I set defaults if the macro is not defined and
1866then test the value if the macro with a plain `#if`.
1867
1868(Some examples of setting defaults are in [`include/status.h`][176], just above
1869the definition of the `BcStatus` enum.)
1870
1871In other words, I use `#if` instead of `#ifndef` or `#ifdef`, where possible.
1872
1873There are a couple of cases where I went with standard stuff instead.
1874
1875#### Standard Macros
1876
1877`BC_ENABLED`
1878
1879:   This macro expands to `1` if `bc` is enabled, `0` if disabled.
1880
1881`DC_ENABLED`
1882
1883:   This macro expands to `1` if `dc` is enabled, `0` if disabled.
1884
1885`BUILD_TYPE`
1886
1887:   The macro expands to the build type, which is one of: `A`, `E`, `H`, `N`,
1888    `EH`, `EN`, `HN`, `EHN`. This build type is used in the help text to direct
1889    the user to the correct markdown manual in the `git.gavinhoward.com`
1890    website.
1891
1892`EXECPREFIX`
1893
1894:   This macro expands to the prefix on the executable name. This is used to
1895    allow `bc` and `dc` to skip the prefix when finding out which calculator is
1896    executing.
1897
1898`BC_NUM_KARATSUBA_LEN`
1899
1900:   This macro expands to an integer, which is the length of numbers below which
1901    the Karatsuba multiplication algorithm switches to brute-force
1902    multiplication.
1903
1904`BC_ENABLE_EXTRA_MATH`
1905
1906:   This macro expands to `1` if the [Extra Math build option][188] is enabled,
1907    `0` if disabled.
1908
1909`BC_ENABLE_HISTORY`
1910
1911:   This macro expands to `1` if the [History build option][193] is enabled, `0`
1912    if disabled.
1913
1914`BC_ENABLE_NLS`
1915
1916:   This macro expands to `1` if the [NLS build option][193] (for locales) is
1917    enabled, `0` if disabled.
1918
1919`BC_ENABLE_LIBRARY`
1920
1921:   This macro expands to `1` if the [`bcl` library][156] is enabled, `0` if
1922    disabled. If this is enabled, building the calculators themselves is
1923    disabled, but both `BC_ENABLED` and `DC_ENABLED` must be non-zero.
1924
1925`BC_ENABLE_MEMCHECK`
1926
1927:   This macro expands to `1` if `bc` has been built for use with Valgrind's
1928    [Memcheck][194], `0` otherwise. This ensures that fatal errors still free
1929    all of their memory when exiting. `bc` does not do that normally because
1930    what's the point?
1931
1932`BC_ENABLE_AFL`
1933
1934:   This macro expands to `1` if `bc` has been built for fuzzing with
1935    [AFL++][125], `0` otherwise. See the [Fuzzing][82] section for more
1936    information.
1937
1938`BC_DEFAULT_BANNER`
1939
1940:   This macro expands to the default value for displaying the `bc` banner.
1941
1942`BC_DEFAULT_SIGINT_RESET`
1943
1944:   The macro expands to the default value for whether or not `bc` should reset
1945    on `SIGINT` or quit.
1946
1947`BC_DEFAULT_TTY_MODE`
1948
1949:   The macro expands to the default value for whether or not `bc` should use
1950    TTY mode when it available.
1951
1952`BC_DEFAULT_PROMPT`
1953
1954:   This macro expands to the default value for whether or not `bc` should use a
1955    prompt when TTY mode is available.
1956
1957`DC_DEFAULT_SIGINT_RESET`
1958
1959:   The macro expands to the default value for whether or not `dc` should reset
1960    on `SIGINT` or quit.
1961
1962`DC_DEFAULT_TTY_MODE`
1963
1964:   The macro expands to the default value for whether or not `dc` should use
1965    TTY mode when it available.
1966
1967`DC_DEFAULT_PROMPT`
1968
1969:   This macro expands to the default value for whether or not `dc` should use a
1970    prompt when TTY mode is available.
1971
1972`BC_DEBUG_CODE`
1973
1974:   If this macro expands to a non-zero integer, then `bc` is built with *a lot*
1975    of extra debugging code. This is never set by the build system and must be
1976    set by the programmer manually. This should never be set in builds given to
1977    end users. For more information, see the [Debugging][134] section.
1978
1979## Test Suite
1980
1981While the source code may be the heart and soul of `bc`, the test suite is the
1982arms and legs: it gives `bc` the power to do anything it needs to do.
1983
1984The test suite is what allowed `bc` to climb to such high heights of quality.
1985This even goes for fuzzing because fuzzing depends on the test suite for its
1986input corpuses. (See the [Fuzzing][82] section.)
1987
1988Understanding how the test suite works should be, I think, the first thing that
1989maintainers learn after learning what `bc` and `dc` should do. This is because
1990the test suite, properly used, gives confidence that changes have not caused
1991bugs or regressions.
1992
1993That is why I spent the time to make the test suite as easy to use and as fast
1994as possible.
1995
1996To use the test suite (assuming `bc` and/or `dc` are already built), run the
1997following command:
1998
1999```
2000make test
2001```
2002
2003That's it. That's all.
2004
2005It will return an error code if the test suite failed. It will also print out
2006information about the failure.
2007
2008If you want the test suite to go fast, then run the following command:
2009
2010```
2011make -j<cores> test
2012```
2013
2014Where `<cores>` is the number of cores that your computer has. Of course, this
2015requires a `make` implementation that supports that option, but most do. (And I
2016will use this convention throughout the rest of this section.)
2017
2018I have even tried as much as possible, to put longer-running tests near the
2019beginning of the run so that the entire suite runs as fast as possible.
2020
2021However, if you want to be sure which test is failing, then running a bare
2022`make test` is a great way to do that.
2023
2024But enough about how you have no excuses to use the test suite as much as
2025possible; let's talk about how it works and what you *can* do with it.
2026
2027### Standard Tests
2028
2029The heavy lifting of testing the math in `bc`, as well as basic scripting, is
2030done by the "standard tests" for each calculator.
2031
2032These tests use the files in the [`tests/bc/`][161] and [`tests/dc/`][162]
2033directories (except for [`tests/bc/all.txt`][163], [`tests/bc/errors.txt`][164],
2034[`tests/bc/posix_errors.txt`][165], [`tests/bc/timeconst.sh`][166],
2035[`tests/dc/all.txt`][167], [`tests/dc/errors.txt`][168], and
2036[`tests/dc/read_errors.txt`][175]), which are called the "standard test
2037directories."
2038
2039For every test, there is the test file and the results file. The test files have
2040names of the form `<test>.txt`, where `<test>` is the name of the test, and the
2041results files have names of the form `<test>_results.txt`.
2042
2043If the test file exists but the results file does not, the results for that test
2044are generated by a GNU-compatible `bc` or `dc`. See the [Generated Tests][143]
2045section.
2046
2047The `all.txt` file in each standard tests directory is what tells the test suite
2048and [build system][142] what tests there are, and the tests are either run in
2049that order, or in the case of parallel `make`, that is the order that the
2050targets are listed as prerequisites of `make test`.
2051
2052If the test exists in the `all.txt` file but does not *actually* exist, the test
2053and its results are generated by a GNU-compatible `bc` or `dc`. See the
2054[Generated Tests][143] section.
2055
2056To add a non-generated standard test, do the following:
2057
2058* Add the test file (`<test>.txt` in the standard tests directory).
2059* Add the results file (`<test>_results.txt` in the standard tests directory).
2060  You can skip this step if just the results file needs to be generated. See the
2061  [Generated Tests][147] section for more information.
2062* Add the name of the test to the `all.txt` file in the standard tests
2063  directory, putting it in the order it should be in. If possible, I would put
2064  longer tests near the beginning because they will start running earlier with
2065  parallel `make`. I always keep `decimal` first, though, as a smoke test.
2066
2067If you need to add a generated standard test, see the [Generated Tests][147]
2068section for how to do that.
2069
2070Some standard tests need to be skipped in certain cases. That is handled by the
2071[build system][142]. See the [Integration with the Build System][147] section
2072for more details.
2073
2074In addition to all of the above, the standard test directory is not only the
2075directory for the standard tests of each calculator, it is also the parent
2076directory of all other test directories for each calculator.
2077
2078#### `bc` Standard Tests
2079
2080The list of current (27 February 2023) standard tests for `bc` is below:
2081
2082decimal
2083
2084:   Tests decimal parsing and printing.
2085
2086print
2087
2088:   Tests printing in every base from decimal. This is near the top for
2089    performance of parallel testing.
2090
2091parse
2092
2093:   Tests parsing in any base and outputting in decimal. This is near the top
2094    for performance of parallel testing.
2095
2096lib2
2097
2098:   Tests the extended math library. This is near the top for performance of
2099    parallel testing.
2100
2101print2
2102
2103:   Tests printing at the extreme values of `obase`.
2104
2105length
2106
2107:   Tests the `length()` builtin function.
2108
2109scale
2110
2111:   Tests the `scale()` builtin function.
2112
2113shift
2114
2115:   Tests the left (`<<`) and right (`>>`) shift operators.
2116
2117add
2118
2119:   Tests addition.
2120
2121subtract
2122
2123:   Tests subtraction.
2124
2125multiply
2126
2127:   Tests multiplication.
2128
2129divide
2130
2131:   Tests division.
2132
2133modulus
2134
2135:   Tests modulus.
2136
2137power
2138
2139:   Tests power (exponentiation).
2140
2141sqrt
2142
2143:   Tests the `sqrt()` (square root) builtin function.
2144
2145trunc
2146
2147:   Tests the truncation (`$`) operator.
2148
2149places
2150
2151:   Tests the places (`@`) operator.
2152
2153vars
2154
2155:   Tests some usage of variables. This one came from [AFL++][125] I think.
2156
2157boolean
2158
2159:   Tests boolean operators.
2160
2161comp
2162
2163:   Tests comparison operators.
2164
2165abs
2166
2167:   Tests the `abs()` builtin function.
2168
2169assignments
2170
2171:   Tests assignment operators, including increment/decrement operators.
2172
2173functions
2174
2175:   Tests functions, specifically function parameters being replaced before they
2176    themselves are used. See the comment in `bc_program_call()` about the last
2177    condition.
2178
2179scientific
2180
2181:   Tests scientific notation.
2182
2183engineering
2184
2185:   Tests engineering notation.
2186
2187globals
2188
2189:   Tests that assigning to globals affects callers.
2190
2191strings
2192
2193:   Tests strings.
2194
2195strings2
2196
2197:   Tests string allocation in slabs, to ensure slabs work.
2198
2199letters
2200
2201:   Tests single and double letter numbers to ensure they behave differently.
2202    Single-letter numbers always be set to the same value, regardless of
2203    `ibase`.
2204
2205exponent
2206
2207:   Tests the `e()` function in the math library.
2208
2209log
2210
2211:   Tests the `l()` function in the math library.
2212
2213pi
2214
2215:   Tests that `bc` produces the right value of pi for numbers with varying
2216    `scale` values.
2217
2218arctangent
2219
2220:   Tests the `a()` function in the math library.
2221
2222sine
2223
2224:   Tests the `s()` function in the math library.
2225
2226cosine
2227
2228:   Tests the `c()` function in the math library.
2229
2230bessel
2231
2232:   Tests the `j()` function in the math library.
2233
2234fib
2235
2236:   Tests the `fib()` Fibonacci function in the extended math library.
2237
2238arrays
2239
2240:   Test arrays.
2241
2242misc
2243
2244:   Miscellaneous tests. I named it this because at the time, I struggled to
2245    classify them, but it's really testing multi-line numbers.
2246
2247misc1
2248
2249:   A miscellaneous test found by [AFL++][125].
2250
2251misc2
2252
2253:   A miscellaneous test found by [AFL++][125].
2254
2255misc3
2256
2257:   A miscellaneous test found by [AFL++][125].
2258
2259misc4
2260
2261:   A miscellaneous test found by [AFL++][125].
2262
2263misc5
2264
2265:   A miscellaneous test found by [AFL++][125].
2266
2267misc6
2268
2269:   A miscellaneous test found by [AFL++][125].
2270
2271misc7
2272
2273:   A miscellaneous test found by [AFL++][125].
2274
2275void
2276
2277:   Tests void functions.
2278
2279rand
2280
2281:   Tests the pseudo-random number generator and its special stack handling.
2282
2283rand_limits
2284
2285:   Tests the limits of the pseudo-random number generator `irand()`.
2286
2287recursive_arrays
2288
2289:   Tested the slab vector undo ability in used in `bc_parse_name()` when it
2290    existed. Now used as a stress test.
2291
2292divmod
2293
2294:   Tests divmod.
2295
2296modexp
2297
2298:   Tests modular exponentiation.
2299
2300bitfuncs
2301
2302:   Tests the bitwise functions, `band()`, `bor()`, `bxor()`, `blshift()` and
2303    `brshift()` in [`gen/lib2.bc`][26].
2304
2305leadingzero
2306
2307:   Tests the leading zero functionality and the `plz*()` and `pnlz*()`
2308    functions in [`gen/lib2.bc`][26].
2309
2310is_number
2311
2312:   Tests the `is_number()` built-in function.
2313
2314is_string
2315
2316:   Tests the `is_number()` built-in function.
2317
2318asciify_array
2319
2320:   Tests the ability of `asciify()` to convert an array into a full string.
2321
2322line_by_line1
2323
2324:   Tests the line-by-line behavior of `bc` with regards to `quit` in a function
2325    definition.
2326
2327line_by_line2
2328
2329:   Tests the line-by-line behavior of `bc` with regards to `quit`.
2330
2331line_loop_quit1
2332
2333:   Tests the behavior of `bc` with a `quit` after a single-line loop.
2334
2335line_loop_quit2
2336
2337:   Tests the behavior of `bc` with a `quit` after a single-line loop and a
2338    newline escape.
2339
2340#### `dc` Standard Tests
2341
2342The list of current (27 February 2023) standard tests for `dc` is below:
2343
2344decimal
2345
2346:   Tests decimal parsing and printing.
2347
2348length
2349
2350:   Tests the `length()` builtin function, including for strings and arrays.
2351
2352stack_len
2353
2354:   Tests taking the length of the results stack.
2355
2356exec_stack_len
2357
2358:   Tests taking the length of the execution stack.
2359
2360add
2361
2362:   Tests addition.
2363
2364subtract
2365
2366:   Tests subtraction.
2367
2368multiply
2369
2370:   Tests multiplication.
2371
2372divide
2373
2374:   Tests division.
2375
2376modulus
2377
2378:   Tests modulus.
2379
2380divmod
2381
2382:   Tests divmod.
2383
2384power
2385
2386:   Tests power (exponentiation).
2387
2388sqrt
2389
2390:   Tests the `sqrt()` (square root) builtin function.
2391
2392modexp
2393
2394:   Tests modular exponentiation.
2395
2396boolean
2397
2398:   Tests boolean operators.
2399
2400negate
2401
2402:   Tests negation as a command and as part of numbers.
2403
2404trunc
2405
2406:   Tests the truncation (`$`) operator.
2407
2408places
2409
2410:   Tests the places (`@`) operator.
2411
2412shift
2413
2414:   Tests the left (`<<`) and right (`>>`) shift operators.
2415
2416abs
2417
2418:   Tests the `abs()` builtin function (the `b` command).
2419
2420scientific
2421
2422:   Tests scientific notation.
2423
2424engineering
2425
2426:   Tests engineering notation.
2427
2428vars
2429
2430:   Tests some usage of variables. This one came from [AFL++][125] I think.
2431
2432misc
2433
2434:   Miscellaneous tests. I named it this because at the time, I struggled to
2435    classify them.
2436
2437misc1
2438
2439:   More miscellaneous tests. This used to be an error file
2440    (`tests/dc/errors/15.txt`) due to the presence of a invalid `u` character.
2441    However, starting with version `6.1.0`, `u` became a valid command
2442    (`is_number()`), so the error file became valid. It was checked manually and
2443    moved, and `tests/dc/errors/34.txt` became the new `tests/dc/errors/15.txt`.
2444
2445strings
2446
2447:   Tests strings.
2448
2449rand
2450
2451:   Tests the pseudo-random number generator and its special stack handling.
2452
2453is_number
2454
2455:   Tests the `is_number()` built-in function (the `u` command).
2456
2457is_string
2458
2459:   Tests the `is_number()` built-in function (the `t` command).
2460
2461### Script Tests
2462
2463The heavy lifting of testing the scripting of `bc` is done by the "script tests"
2464for each calculator.
2465
2466These tests use the files in the [`tests/bc/scripts/`][169] and
2467[`tests/dc/scripts/`][170] directories (except for
2468[`tests/bc/scripts/all.txt`][171] and [`tests/dc/scripts/all.txt`][172]), which
2469are called the "script test directories."
2470
2471To add a script test, do the following:
2472
2473* Add the test file (`<test>.bc` or `<test>.dc` in the script tests directory).
2474* Add the results file (`<test>.txt` in the script tests directory). You can
2475  skip this step if just the results file needs to be generated. See the
2476  [Generated Tests][147] section for more information.
2477* Add the name of the test to the `all.txt` file in the script tests directory,
2478  putting it in the order it should be in. If possible, I would put longer tests
2479  near the beginning because they will start running earlier with parallel
2480  `make`.
2481
2482Some script tests need to be skipped in certain cases. That is handled by the
2483[build system][142]. See the [Integration with the Build System][147] section
2484for more details.
2485
2486Another unique thing about the script tests, at least for `bc`: they test the
2487`-g` and `--global-stacks` flags. This means that all of the script tests for
2488`bc` are written assuming the `-g` flag was given on the command-line
2489
2490There is one extra piece of script tests: [`tests/script.sed`][190]. This `sed`
2491script is used to remove an incompatibility with GNU `bc`.
2492
2493If there is only one more character to print at the end of `BC_LINE_LENGTH`, GNU
2494`bc` still prints a backslash+newline+digit combo. OpenBSD doesn't, which is
2495correct according to my reading of the `bc` spec, so my `bc` doesn't as well.
2496
2497The `sed` script edits numbers that end with just one digit on a line by itself
2498to put it on the same line as others.
2499
2500#### `bc` Script Tests
2501
2502The list of current (27 February 2023) script tests for `bc` is below:
2503
2504print.bc
2505
2506:   Tests printing even harder than the print standard test.
2507
2508print2
2509
2510:   Tests printing at the extreme edge of line length in various bases.
2511
2512multiply.bc
2513
2514:   Tests multiplication even harder than the multiply standard test.
2515
2516divide.bc
2517
2518:   Tests division even harder than the divide standard test.
2519
2520subtract.bc
2521
2522:   Tests subtraction even harder than the subtract standard test.
2523
2524add.bc
2525
2526:   Tests addition even harder than the add standard test.
2527
2528parse.bc
2529
2530:   Tests parsing even harder than the parse standard test.
2531
2532root.bc
2533
2534:   Tests that `root()` and `cbrt()` do not go into an infinite loop on a
2535    pathological case found by a user.
2536
2537array.bc
2538
2539:   Tests arrays even harder than the arrays standard test.
2540
2541array2.bc
2542
2543:   Implements a test where an array element is passed as a parameter to a
2544    function, and then another array is passed to a later parameter that is
2545    named the same as the first array. This was added because of a bug found
2546    while writing a script in bc.
2547
2548atan.bc
2549
2550:   Tests arctangent even harder than the arctangent standard test.
2551
2552bessel.bc
2553
2554:   Tests bessel even harder than the bessel standard test.
2555
2556functions.bc
2557
2558:   Tests functions even harder than the functions standard test.
2559
2560globals.bc
2561
2562:   Tests global stacks directly.
2563
2564len.bc
2565
2566:   Tests the `length()` builtin on arrays.
2567
2568rand.bc
2569
2570:   Tests the random number generator in the presence of global stacks.
2571
2572references.bc
2573
2574:   Tests functions with array reference parameters.
2575
2576screen.bc
2577
2578:   A random script provided by an early user that he used to calculate the size
2579    of computer screens
2580
2581strings2.bc
2582
2583:   Tests escaping in strings.
2584
2585ifs.bc
2586
2587:   Tests proper ending of `if` statements without `else` statements.
2588
2589ifs2.bc
2590
2591:   More tests proper ending of `if` statements without `else` statements.
2592
2593#### `dc` Script Tests
2594
2595The list of current (27 February 2023) script tests for `dc` is below:
2596
2597prime.dc
2598
2599:   Tests scripting by generating the first 100,000 primes.
2600
2601asciify.dc
2602
2603:   Tests the asciify command.
2604
2605stream.dc
2606
2607:   Tests the stream command.
2608
2609array.dc
2610
2611:   Tests arrays.
2612
2613else.dc
2614
2615:   Tests else clauses on conditional execution commands.
2616
2617factorial.dc
2618
2619:   Tests scripting with factorial.
2620
2621loop.dc
2622
2623:   Tests scripting by implementing loops.
2624
2625quit.dc
2626
2627:   Tests the quit command in the presence of tail calls.
2628
2629weird.dc
2630
2631:   A miscellaneous test.
2632
2633no_clamp.dc
2634
2635:   A test to ensure `dc` has the same behavior as the BSD `dc` with digi
2636    clamping off when parsing numbers.
2637
2638### Error Tests
2639
2640One of the most useful parts of the `bc` test suite, in my opinion, is the heavy
2641testing of error conditions.
2642
2643Just about every error condition I can think of is tested, along with many
2644machine-generated (by [AFL++][125]) ones.
2645
2646However, because the error tests will often return error codes, they require
2647different infrastructure from the rest of the test suite, which assumes that
2648the calculator under test will return successfully. A lot of that infrastructure
2649is in the [`scripts/functions.sh`][105] script, but it basically allows the
2650calculator to exit with an error code and then tests that there *was* an error
2651code.
2652
2653Besides returning error codes, error tests also ensure that there is output from
2654`stderr`. This is to make sure that an error message is always printed.
2655
2656The error tests for each calculator are spread through two directories, due to
2657historical accident. These two directories are the standard test directory (see
2658the [Standard Tests][149] section) and the `errors/` directory directly
2659underneath the standard tests directory.
2660
2661This split is convenient, however, because the tests in each directory are
2662treated differently.
2663
2664The error tests in the standard test directory, which include `errors.txt` for
2665both calculators, `posix_errors.txt` for `bc`, and `read_errors.txt` for both,
2666are run by [`tests/errors.sh`][226]. It reads them line-by-line and shoves the
2667data through `stdin`. Each line is considered a separate test. For this reason,
2668there can't be any blank lines in the error files in the standard tests
2669directory because a blank line causes a successful exit.
2670
2671On the other hand, the tests in the `errors/` directory below the standard tests
2672directory are run by [`tests/error.sh`][227] and are considered to be one test
2673per file. As such, they are used differently. They are shoved into the
2674calculator through `stdin`, but they are also executed by passing them on the
2675command-line.
2676
2677To add an error test, first figure out which kind you want.
2678
2679Is it a simple one-liner, and you don't care if it's tested through a file?
2680
2681Then put it in one of the error files in the standard test directory. I would
2682only put POSIX errors in the `posix_errors.txt` file for `bc`, and only `read()`
2683errors in the `read_errors.txt` file for `dc`; all others I would put in the
2684respective `errors.txt` file.
2685
2686On the other hand, if you care if the error is run as a file on the
2687command-line, or the error requires multiple lines to reproduce, then put the
2688test in the respective `errors/` directory and run the [`configure.sh`][69]
2689script again.
2690
2691After that, you are done; the test suite will automatically pick up the new
2692test, and you don't have to tell the test suite the expected results.
2693
2694### `stdin` Tests
2695
2696The `stdin` tests specifically test the lexing and parsing of multi-line
2697comments and strings. This is important because when reading from `stdin`, the
2698calculators can only read one line at a time, so partial parses are possible.
2699
2700To add `stdin` tests, just add the tests to the `stdin.txt` file in the
2701respective standard tests directory, and add the expected results in the
2702`stdin_results.txt` in the respective standard tests directory.
2703
2704### `read()` Tests
2705
2706The `read()` tests are meant to test the `read()` builtin function, to ensure
2707that the parsing and execution is correct.
2708
2709Each line is one test, as that is the nature of using the `read()` function, so
2710to add a test, just add it as another line in the `read.txt` file in the
2711respective standard tests directory, and add its result to the
2712`read_results.txt` file in the respective standard tests directory.
2713
2714### Other Tests
2715
2716The "other" tests are just random tests that I could not easily classify under
2717other types of tests. They usually include things like command-line parsing and
2718environment variable testing.
2719
2720To add an other test, it requires adding the programming for it to
2721[`tests/other.sh`][195] because all of the tests are written specifically in
2722that script. It would be best to use the infrastructure in
2723[`scripts/functions.sh`][105].
2724
2725### Linux `timeconst.bc` Script
2726
2727One special script that `bc`'s test suite will use is the [Linux `timeconst.bc`
2728script][6].
2729
2730I made the test suite able to use this script because the reason the
2731[toybox][16] maintainer wanted my `bc` is because of this script, and I wanted
2732to be sure that it would run correctly on the script.
2733
2734However, it is not part of the distribution, nor is it part of the repository.
2735The reason for this is because [`timeconst.bc`][6] is under the GPL, while this
2736repo is under a BSD license.
2737
2738If you want `bc` to run tests on [`timeconst.bc`][6], download it and place it
2739at `tests/bc/scripts/timeconst.bc`. If it is there, the test suite will
2740automatically run its tests; otherwise, it will skip it.
2741
2742### History Tests
2743
2744There are automatic tests for history; however, they have dependencies: Python 3
2745and [`pexpect`][137].
2746
2747As a result, because I need the [test suite to be portable][138], like the rest
2748of `bc`, the history tests are carefully guarded with things to ensure that they
2749are skipped, rather than failing if Python and [`pexpect`][137] are not
2750installed. For this reason, there is a `sh` script, [`tests/history.sh`][140]
2751that runs the actual script, [`tests/history.py`][139].
2752
2753I have added as many tests as I could to cover as many lines and branches as
2754possible. I guess I could have done more, but doing so would have required a lot
2755of time.
2756
2757I have tried to make it as easy as possible to run the history tests. They will
2758run automatically if you use the `make test_history` command, and they will also
2759use parallel execution with `make -j<cores> test_history`.
2760
2761However, the history tests are meant only to be run by maintainers of `bc`; they
2762are *not* meant to be run by users and packagers. The reason for this is that
2763they only seem to work reliably on Linux; `pexpect` seems to have issues on
2764other platforms, especially timeout issues.
2765
2766Thus, they are excluded from running with `make test` and [`tests/all.sh`][225].
2767However, they can be run from the [`scripts/release.sh`][83] script.
2768
2769All of the tests are contained in [`tests/history.py`][139]. The reason for this
2770is because they are in Python, and I don't have an easy way of including Python
2771(or at the very least, I am not familiar enough with Python to do that). So they
2772are all in the same file to make it easier on me.
2773
2774Each test is one function in the script. They all take the same number and type
2775of arguments:
2776
27771.	`exe`: the executable to run.
27782.	`args`: the arguments to pass to the executable.
27793.	`env`: the environment.
2780
2781Each function creates a child process with `pexpect.spawn` and then tests with
2782that child. Then the function returns the child to the caller, who closes it
2783and checks its error code against its expected error code.
2784
2785Yes, the error code is not a success all the time. This is because of the UTF-8
2786tests; `bc` gives a fatal error on any non-ASCII data because ASCII is all `bc`
2787is required to handle, per the [standard][2].
2788
2789So in [`tests/history.py`][139], there are four main arrays:
2790
2791* `bc` test functions,
2792* `bc` expected error codes.
2793* `dc` test functions.
2794* `dc` expected error codes.
2795
2796[`tests/history.py`][139] takes an index as an argument; that index is what test
2797it should run. That index is used to index into the proper test and error code
2798array.
2799
2800If you need to add more history tests, you need to do the following:
2801
28021.	Add the function for that test to [`tests/history.py`][139].
28032.	Add the function to the proper array of tests.
28043.	Add the expected error code to the proper array of error codes.
28054.	Add a target for the test to [`Makefile.in`][70].
28065.	Add that target as a prerequisite to either `test_bc_history` or
2807	`test_dc_history`.
2808
2809You do not need to do anything to add the test to `history_all_tests` (see
2810[Group Tests][141] below) because the scripts will automatically run all of the
2811tests properly.
2812
2813### Generated Tests
2814
2815Some tests are *large*, and as such, it is impractical to check them into `git`.
2816Instead, the tests depend on the existence of a GNU-compatible `bc` in the
2817`PATH`, which is then used to generate the tests.
2818
2819If [`configure.sh`][69] was run with the `-G` argument, which disables generated
2820tests, then `make test` and friends will automatically skip generated tests.
2821This is useful to do on platforms that are not guaranteed to have a
2822GNU-compatible `bc` installed.
2823
2824However, adding a generated test is a complicated because you have to figure out
2825*where* you want to put the file to generate the test.
2826
2827For example, `bc`'s test suite will automatically use a GNU-compatible `bc` to
2828generate a `<test>_results.txt` file in the [standard tests][149] directory
2829(either `tests/bc/` or `tests/dc/`) if none exists for the `<test>` test. If no
2830`<test>.txt` file exists in the [standard tests][149] directory, then `bc`'s
2831test suite will look for a `<test>.bc` or `<test>.dc` file in the [script
2832tests][150] directory (either `tests/bc/scripts` or `tests/dc/scripts`), and if
2833that exists, it will use that script to generate the `<test>.txt` file in the
2834[standard tests][149] directory after which it will generate the
2835`<test>_results.txt` file in the [standard tests][149] directory.
2836
2837So you can choose to either:
2838
2839* Have a test in the [standard tests][149] directory without a corresponding
2840  `*_results.txt` file, or
2841* Have a script in the [script tests][150] directory to generate the
2842  corresponding file in the standard test directory before generating the
2843  corresponding `*_results.txt` file.
2844
2845Adding a script has a double benefit: the script itself can be used as a test.
2846However, script test results can also be generated.
2847
2848If `bc` is asked to run a script test, then if the script does not exist, `bc`'s
2849test suite returns an error. If it *does* exist, but no corresponding
2850`<test>.txt` file exists in the [script tests][150] directory, then a
2851GNU-compatible `bc` is used to generate the `<test>.txt` results file.
2852
2853If generated tests are disabled through [`configure.sh`][69], then these tests
2854are not generated if they do not exist. However, if they *do* exist, then they
2855are run. This can happen if a `make clean_tests` was not run between a build
2856that generated tests and a build that will not.
2857
2858### Group Tests
2859
2860While the test suite has a lot of targets in order to get parallel execution,
2861there are five targets that allow you to run each section, or all, of the test
2862suite as one unit:
2863
2864* `bc_all_tests` (`bc` tests)
2865* `timeconst_all_tests` ([Linux `timeconst.bc` script][6] tests)
2866* `dc_all_tests` (`dc` tests)
2867* `history_all_tests` (history tests)
2868* `run_all_tests` (combination of the previous four)
2869
2870In addition, there are more fine-grained targets available:
2871
2872* `test_bc` runs all `bc` tests (except history tests).
2873* `test_dc` runs all `dc` tests (except history tests).
2874* `test_bc_tests` runs all `bc` [standard tests][149].
2875* `test_dc_tests` runs all `dc` [standard tests][149].
2876* `test_bc_scripts` runs all `bc` [script tests][150].
2877* `test_dc_scripts` runs all `dc` [script tests][150].
2878* `test_bc_stdin` runs the `bc` [`stdin` tests][152].
2879* `test_dc_stdin` runs the `dc` [`stdin` tests][152].
2880* `test_bc_read` runs the `bc` [`read()` tests][153].
2881* `test_dc_read` runs the `dc` [`read()` tests][153].
2882* `test_bc_errors` runs the `bc` [error tests][151].
2883* `test_dc_errors` runs the `dc` [error tests][151].
2884* `test_bc_other` runs the `bc` [other tests][151].
2885* `test_dc_other` runs the `dc` [other tests][151].
2886* `timeconst` runs the tests for the [Linux `timeconst.bc` script][6].
2887* `test_history` runs all history tests.
2888* `test_bc_history` runs all `bc` history tests.
2889* `test_dc_history` runs all `dc` history tests.
2890
2891All of the above tests are parallelizable.
2892
2893### Individual Tests
2894
2895In addition to all of the above, individual test targets are available. These
2896are mostly useful for attempting to fix a singular test failure.
2897
2898These tests are:
2899
2900* `test_bc_<test>`, where `<test>` is the name of a `bc` [standard test][149].
2901  The name is the name of the test file without the `.txt` extension. It is the
2902  name printed by the test suite when running the test.
2903* `test_dc_<test>`, where `<test>` is the name of a `dc` [standard test][149].
2904  The name is the name of the test file without the `.txt` extension. It is the
2905  name printed by the test suite when running the test.
2906* `test_bc_script_<test>`, where `<test>` is the name of a `bc` [script
2907  test][150]. The name of the test is the name of the script without the `.bc`
2908  extension.
2909* `test_dc_script_<test>`, where `<test>` is the name of a `dc` [script
2910  test][150]. The name of the test is the name of the script without the `.dc`
2911  extension.
2912* `test_bc_history<idx>` runs the `bc` history test with index `<idx>`.
2913* `test_dc_history<idx>` runs the `dc` history test with index `<idx>`.
2914
2915### [`bcl`][156] Test
2916
2917When [`bcl`][156] is built, the [build system][142] automatically ensures that
2918`make test` runs the [`bcl`][156] test instead of the `bc` and `dc` tests.
2919
2920There is only one test, and it is built from [`tests/bcl.c`][158].
2921
2922The reason the test is in C is because [`bcl`][156] is a C library; I did not
2923want to have to write C code *and* POSIX `sh` scripts to run it.
2924
2925The reason there is only one test is because most of the code for the library is
2926tested by virtue of testing `bc` and `dc`; the test needs to only ensure that
2927the library bindings and plumbing do not interfere with the underlying code.
2928
2929However, just because there is only one test does not mean that it doesn't test
2930more than one thing. The code actually handles a series of tests, along with
2931error checking to ensure that nothing went wrong.
2932
2933To add a [`bcl`][156] test, just figure out what test you want, figure out where
2934in the [`tests/bcl.c`][158] would be best to put it, and put it there. Do as
2935much error checking as possible, and use the `err(BclError)` function. Ensure
2936that all memory is freed because that test is run through [Valgrind][159] and
2937[AddressSanitizer][160].
2938
2939### Integration with the Build System
2940
2941If it was not obvious by now, the test suite is heavily integrated into the
2942[build system][142], but the integration goes further than just making the test
2943suite easy to run from `make` and generating individual and group tests.
2944
2945The big problem the test suite has is that some `bc` code, stuff that is
2946important to test, is only in *some* builds. This includes all of the extra math
2947extensions, for example.
2948
2949So the test suite needs to have some way of turning off the tests that depend on
2950certain [build types][81] when those [build types][81] are not used.
2951
2952This is the reason the is tightly integrated with the [build system][142]: the
2953[build system][142] knows what [build type][81] was used and can tell the test
2954suite to turn off the tests that do not apply.
2955
2956It does this with arguments to the test scripts that are either a `1` or a `0`,
2957depending on whether tests of that type should be enabled or not. These
2958arguments are why I suggest, in the [Test Scripts][148] section, to always use a
2959`make` target to run the test suite or any individual test. I have added a lot
2960of targets to make this easy and as fast as possible.
2961
2962In addition to all of that, the build system is responsible for selecting the
2963`bc`/`dc` tests or the [`bcl` test][157].
2964
2965### Output Directories
2966
2967During any run of the test suite, the test suite outputs the results of running
2968various tests to files. These files are usually output to `tests/bc_outputs/`
2969and `tests/dc_outputs/`.
2970
2971However, in some cases, it may be necessary to output test results to a
2972different directory. If that is the case, set the environment variable
2973`BC_TEST_OUTPUT_DIR` to the name of the directory.
2974
2975If that is done, then test results will be written to
2976`$BC_TEST_OUTPUT_DIR/bc_outputs/` and `$BC_TEST_OUTPUT_DIR/dc_outputs/`.
2977
2978### Test Suite Portability
2979
2980The test suite is meant to be run by users and packagers as part of their
2981install process.
2982
2983This puts some constraints on the test suite, but the biggest is that the test
2984suite must be as [portable as `bc` itself][136].
2985
2986This means that the test suite must be implemented in pure POSIX `make`, `sh`,
2987and C99.
2988
2989#### Test Scripts
2990
2991To accomplish the portability, the test suite is run by a bunch of `sh` scripts
2992that have the constraints laid out in [POSIX Shell Scripts][76].
2993
2994However, that means they have some quirks, made worse by the fact that there are
2995[generated tests][143] and [tests that need to be skipped, but only
2996sometimes][147].
2997
2998This means that a lot of the scripts take an awkward number and type of
2999arguments. Some arguments are strings, but most are integers, like
3000[`scripts/release.sh`][83].
3001
3002It is for this reason that I do not suggest running the test scripts directly.
3003Instead, always use an appropriate `make` target, which already knows the
3004correct arguments for the test because of the [integration with the build
3005system][147].
3006
3007### Test Coverage
3008
3009In order to get test coverage information, you need `gcc`, `gcov`, and `gcovr`.
3010
3011If you have them, run the following commands:
3012
3013```
3014CC=gcc ./configure -gO3 -c
3015make -j<cores>
3016make coverage
3017```
3018
3019Note that `make coverage` does not have a `-j<cores>` part; it cannot be run in
3020parallel. If you try, you will get errors. And note that `CC=gcc` is used.
3021
3022After running those commands, you can open your web browser and open the
3023`index.html` file in the root directory of the repo. From there, you can explore
3024all of the coverage results.
3025
3026If you see lines or branches that you think you could hit with a manual
3027execution, do such manual execution, and then run the following command:
3028
3029```
3030make coverage_output
3031```
3032
3033and the coverage output will be updated.
3034
3035If you want to rerun `make coverage`, you must do a `make clean` and build
3036first, like this:
3037
3038```
3039make clean
3040make -j<cores>
3041make coverage
3042```
3043
3044Otherwise, you will get errors.
3045
3046If you want to run tests in parallel, you can do this:
3047
3048```
3049make -j<cores>
3050make -j<cores> test
3051make coverage_output
3052```
3053
3054and that will generate coverage output correctly.
3055
3056### [AddressSanitizer][21] and Friends
3057
3058To run the test suite under [AddressSanitizer][21] or any of its friends, use
3059the following commands:
3060
3061```
3062CFLAGS="-fsanitize=<sanitizer> ./configure -gO3 -m
3063make -j<cores>
3064make -j<cores> test
3065```
3066
3067where `<sanitizer>` is the correct name of the desired sanitizer. There is one
3068exception to the above: `UndefinedBehaviorSanitizer` should be run on a build
3069that has zero optimization, so for `UBSan`, use the following commands:
3070
3071```
3072CFLAGS="-fsanitize=undefined" ./configure -gO0 -m
3073make -j<cores>
3074make -j<cores> test
3075```
3076
3077### [Valgrind][20]
3078
3079To run the test suite under [Valgrind][20], run the following commands:
3080
3081```
3082./configure -gO3 -v
3083make -j<cores>
3084make -j<cores> test
3085```
3086
3087It really is that easy. I have directly added infrastructure to the build system
3088and the test suite to ensure that if [Valgrind][20] detects any memory errors or
3089any memory leaks at all, it will tell the test suite infrastructure to report an
3090error and exit accordingly.
3091
3092## POSIX Shell Scripts
3093
3094There is a lot of shell scripts in this repository, and every single one of them
3095is written in pure [POSIX `sh`][72].
3096
3097The reason that they are written in [POSIX `sh`][72] is for *portability*: POSIX
3098systems are only guaranteed to have a barebones implementation of `sh`
3099available.
3100
3101There are *many* snares for unwary programmers attempting to modify
3102[`configure.sh`][69], any of the scripts in this directory, [`strgen.sh`][9], or
3103any of the scripts in [`tests/`][77]. Here are some of them:
3104
31051.	No `bash`-isms.
31062.	Only POSIX standard utilities are allowed.
31073.	Only command-line options defined in the POSIX standard for POSIX utilities
3108	are allowed.
31094.	Only the standardized behavior of POSIX utilities is allowed.
31105.	Functions return data by *printing* it. Using `return` sets their exit code.
3111
3112In other words, the script must only use what is standardized in the [`sh`][72]
3113and [Shell Command Language][73] standards in POSIX. This is *hard*. It precludes
3114things like `local` and the `[[ ]]` notation.
3115
3116These are *enormous* restrictions and must be tested properly. I put out at
3117least one release with a change to `configure.sh` that wasn't portable. That was
3118an embarrassing mistake.
3119
3120The lack of `local`, by the way, is why variables in functions are named with
3121the form:
3122
3123```
3124_<function_name>_<var_name>
3125```
3126
3127This is done to prevent any clashes of variable names with already existing
3128names. And this applies to *all* shell scripts. However, there are a few times
3129when that naming convention is *not* used; all of them are because those
3130functions are required to change variables in the global scope.
3131
3132### Maintainer-Only Scripts
3133
3134If a script is meant to be used for maintainers (of `bc`, not package
3135maintainers), then rules 2, 3, and 4 don't need to be followed as much because
3136it is assumed that maintainers will be able to install whatever tools are
3137necessary to do the job.
3138
3139## Manuals
3140
3141The manuals for `bc` and `dc` are all generated, and the manpages for `bc`,
3142`dc`, and `bcl` are also generated.
3143
3144Why?
3145
3146I don't like the format of manpages, and I am not confident in my ability to
3147write them. Also, they are not easy to read on the web.
3148
3149So that explains why `bcl`'s manpage is generated from its markdown version. But
3150why are the markdown versions of the `bc` and `dc` generated?
3151
3152Because the content of the manuals needs to change based on the [build
3153type][81]. For example, if `bc` was built with no history support, it should not
3154have the **COMMAND LINE HISTORY** section in its manual. If it did, that would
3155just confuse users.
3156
3157So the markdown manuals for `bc` and `dc` are generated from templates
3158([`manuals/bc.1.md.in`][89] and [`manuals/dc.1.md.in`][90]). And from there,
3159the manpages are generated from the generated manuals.
3160
3161The generated manpage for `bcl` ([`manuals/bcl.3`][62]) is checked into version
3162control, and the generated markdown manuals and manpages for `bc`
3163([`manuals/bc`][79]) and `dc` ([`manuals/dc`][80]) are as well.
3164
3165This is because generating the manuals and manpages requires a heavy dependency
3166that only maintainers should care about: [Pandoc][92]. Because users [should not
3167have to install *any* dependencies][136], the files are generated, checked into
3168version control, and included in distribution tarballs.
3169
3170If you run [`configure.sh`][69], you have an easy way of generating the markdown
3171manuals and manpages: just run `make manpages`. This target calls
3172[`scripts/manpage.sh`][60] appropriately for `bc`, `dc`, and `bcl`.
3173
3174For more on how generating manuals and manpages works, see
3175[`scripts/manpage.sh`][60].
3176
3177## Locales
3178
3179The locale system of `bc` is enormously complex, but that's because
3180POSIX-compatible locales are terrible.
3181
3182How are they terrible?
3183
3184First, `gencat` does not work for generating cross-compilation. In other words,
3185it does not generate machine-portable files. There's nothing I can do about
3186this except for warn users.
3187
3188Second, the format of `.msg` files is...interesting. Thank goodness it is text
3189because otherwise, it would be impossible to get them right.
3190
3191Third, `.msg` files are not used. In other words, `gencat` exists. Why?
3192
3193Fourth, `$NLSPATH` is an awful way to set where and *how* to install locales.
3194
3195Yes, where and *how*.
3196
3197Obviously, from it's name, it's a path, and that's the where. The *how* is more
3198complicated.
3199
3200It's actually *not* a path, but a path template. It's a format string, and it
3201can have a few format specifiers. For more information on that, see [this
3202link][84]. But in essence, those format specifiers configure how each locale is
3203supposed to be installed.
3204
3205With all those problems, why use POSIX locales? Portability, as always. I can't
3206assume that `gettext` will be available, but I *can* pretty well assume that
3207POSIX locales will be available.
3208
3209The locale system of `bc` includes all files under [`locales/`][85],
3210[`scripts/locale_install.sh`][87], [`scripts/locale_uninstall.sh`][88],
3211[`scripts/functions.sh`][105], the `bc_err_*` constants in [`src/data.c`][131],
3212and the parts of the build system needed to activate it. There is also code in
3213[`src/vm.c`][58] (in `bc_vm_gettext()`) for loading the current locale.
3214
3215If the order of error messages and/or categories are changed, the order of
3216errors must be changed in the enum, the default error messages and categories in
3217[`src/data.c`][131], and all of the messages and categories in the `.msg` files
3218under [`locales/`][85].
3219
3220## Static Analysis
3221
3222I do *some* static analysis on `bc`.
3223
3224I used to use [Coverity][196], but I stopped using it when it started giving me
3225too many false positives and also because it had a vulnerability.
3226
3227However, I still use the [Clang Static Analyzer][197] through
3228[`scan-build`][19]. I only use it in debug mode because I have to add some
3229special code to make it not complain about things that are definitely not a
3230problem.
3231
3232The most frequent example of false positives is where a local is passed to a
3233function to be initialized. [`scan-build`][19] misses that fact, so I
3234pre-initialize such locals to prevent the warnings.
3235
3236To run `scan-build`, do the following:
3237
3238```
3239make clean
3240scan-build make
3241```
3242
3243`scan-build` will print its warnings to `stdout`.
3244
3245## Fuzzing
3246
3247The quality of this `bc` is directly related to the amount of fuzzing I did. As
3248such, I spent a lot of work making the fuzzing convenient and fast, though I do
3249admit that it took me a long time to admit that it did need to be faster.
3250
3251First, there were several things which make fuzzing fast:
3252
3253* Using [AFL++][125]'s deferred initialization.
3254* Splitting `bc`'s corpuses.
3255* Parallel fuzzing.
3256
3257Second, there are several things which make fuzzing convenient:
3258
3259* Preprepared input corpuses.
3260* [`scripts/fuzz_prep.sh`][119].
3261* `tmux` and `tmuxp` configs.
3262* [`scripts/afl.py`][94].
3263
3264### Fuzzing Performance
3265
3266Fuzzing with [AFL++][125] can be ***SLOW***. Spending the time to make it as
3267fast as possible is well worth the time.
3268
3269However, there is a caveat to the above: it is easy to make [AFL++][125] crash,
3270be unstable, or be unable to find "paths" (see [AFL++ Quickstart][129]) if the
3271performance enhancements are done poorly.
3272
3273To stop [AFL++][125] from crashing on test cases, and to be stable, these are
3274the requirements:
3275
3276* The state at startup must be *exactly* the same.
3277* The virtual memory setup at startup must be *exactly* the same.
3278
3279The first isn't too hard; it's the second that is difficult.
3280
3281`bc` allocates a lot of memory at start. ("A lot" is relative; it's far less
3282than most programs.) After going through an execution run, however, some of that
3283memory, while it could be cleared and reset, is in different places because of
3284vectors. Since vectors reallocate, their allocations are not guaranteed to be in
3285the same place.
3286
3287So to make all three work, I had to set up the deferred initialization and
3288persistent mode *before* any memory was allocated (except for `vm.jmp_bufs`,
3289which is probably what caused the stability to drop below 100%). However, using
3290deferred alone let me put the [AFL++][125] initialization further back. This
3291works because [AFL++][125] sets up a `fork()` server that `fork()`'s `bc` right
3292at that call. Thus, every run has the exact same virtual memory setup, and each
3293run can skip all of the setup code.
3294
3295I tested `bc` using [AFL++][125]'s deferred initialization, plus persistent
3296mode, plus shared memory fuzzing. In order to do it safely, with stability above
329799%, all of that was actually *slower* than using just deferred initialization
3298with the initialization *right before* `stdin` was read. And as a bonus, the
3299stability in that situation is 100%.
3300
3301As a result, my [AFL++][125] setup only uses deferred initialization. That's the
3302`__AFL_INIT()` call.
3303
3304(Note: there is one more big item that must be done in order to have 100%
3305stability: the pseudo-random number generator *must* start with *exactly* the
3306same seed for every run. This is set up with the `tmux` and `tmuxp` configs that
3307I talk about below in [Convenience][130]. This seed is set before the
3308`__AFL_INIT()` call, so setting it has no runtime cost for each run, but without
3309it, stability would be abysmal.)
3310
3311On top of that, while `dc` is plenty fast under fuzzing (because of a faster
3312parser and less test cases), `bc` can be slow. So I have split the `bc` input
3313corpus into three parts, and I set fuzzers to run on each individually. This
3314means that they will duplicate work, but they will also find more stuff.
3315
3316On top of all of that, each input corpus (the three `bc` corpuses and the one
3317`dc` corpus) is set to run with 4 fuzzers. That works out perfectly for two
3318reasons: first, my machine has 16 cores, and second, the [AFL++][125] docs
3319recommend 4 parallel fuzzers, at least, to run different "power schedules."
3320
3321### Convenience
3322
3323The preprepared input corpuses are contained in the
3324`tests/fuzzing/bc_inputs{1,2,3}/`, and `tests/fuzzing/dc_inputs` directories.
3325There are three `bc` directories and only one `dc` directory because `bc`'s
3326input corpuses are about three times as large, and `bc` is a larger program;
3327it's going to need much more fuzzing.
3328
3329(They do share code though, so fuzzing all of them still tests a lot of the same
3330math code.)
3331
3332The next feature of convenience is the [`scripts/fuzz_prep.sh`][119] script. It
3333assumes the existence of `afl-clang-lto` in the `$PATH`, but if that exists, it
3334automatically configures and builds `bc` with a fuzz-ideal build.
3335
3336A fuzz-ideal build has several things:
3337
3338* `afl-clang-lto` as the compiler. (See [AFL++ Quickstart][129].)
3339* Debug mode, to crash as easily as possible.
3340* Full optimization (including [Link-Time Optimization][126]), for performance.
3341* [AFL++][125]'s deferred initialization (see [Fuzzing Performance][127] above).
3342* And `AFL_HARDEN=1` during the build to harden the build. See the [AFL++][125]
3343  documentation for more information.
3344
3345There is one big thing that a fuzz-ideal build does *not* have: it does not use
3346[AFL++][125]'s `libdislocator.so`. This is because `libdislocator.so` crashes if
3347it fails to allocate memory. I do not want to consider those as crashes because
3348my `bc` does, in fact, handle them gracefully by exiting with a set error code.
3349So `libdislocator.so` is not an option.
3350
3351However, to add to [`scripts/fuzz_prep.sh`][119] making a fuzz-ideal build, in
3352`tests/fuzzing/`, there are two `yaml` files: [`tests/fuzzing/bc_afl.yaml`][120]
3353and [`tests/fuzzing/bc_afl_continue.yaml`][121]. These files are meant to be
3354used with [`tmux`][122] and [`tmuxp`][123]. While other programmers will have to
3355adjust the `start_directory` item, once it is adjusted, then using this command:
3356
3357```
3358tmuxp load tests/fuzzing/bc_afl.yaml
3359```
3360
3361will start fuzzing.
3362
3363In other words, to start fuzzing, the sequence is:
3364
3365```
3366./scripts/fuzz_prep.sh
3367tmuxp load tests/fuzzing/bc_afl.yaml
3368```
3369
3370Doing that will load, in `tmux`, 16 separate instances of [AFL++][125], 12 on
3371`bc` and 4 on `dc`. The outputs will be put into the
3372`tests/fuzzing/bc_outputs{1,2,3}/` and `tests/fuzzing/dc_outputs/` directories.
3373
3374(Note that loading that config will also delete all unsaved [AFL++][125] output
3375from the output directories.)
3376
3377Sometimes, [AFL++][125] will report crashes when there are none. When crashes
3378are reported, I always run the following command:
3379
3380```
3381./scripts/afl.py <dir>
3382```
3383
3384where `dir` is one of `bc1`, `bc2`, `bc3`, or `dc`, depending on which of the
338516 instances reported the crash. If it was one of the first four (`bc11` through
3386`bc14`), I use `bc1`. If it was one of the second four (`bc21` through `bc24`, I
3387use `bc2`. If it was one of the third four (`bc31` through `bc34`, I use `bc3`.
3388And if it was `dc`, I use `dc`.
3389
3390The [`scripts/afl.py`][94] script will report whether [AFL++][125] correctly
3391reported a crash or not. If so, it will copy the crashing test case to
3392`.test.txt` and tell you whether it was from running it as a file or through
3393`stdin`.
3394
3395From there, I personally always investigate the crash and fix it. Then, when the
3396crash is fixed, I either move `.test.txt` to `tests/{bc,dc}/errors/<idx>.txt` as
3397an error test (if it produces an error) or I create a new
3398`tests/{bc,dc}/misc<idx>.txt` test for it and a corresponding results file. (See
3399[Test Suite][124] for more information about the test suite.) In either case,
3400`<idx>` is the next number for a file in that particular place. For example, if
3401the last file in `tests/{bc,dc}/errors/` is `tests/{bc,dc}/errors/18.txt`, I
3402move `.test.txt` to `tests/bc/error/19.txt`.
3403
3404Then I immediately run [`scripts/afl.py`][94] again to find the next crash
3405because often, [AFL++][125] found multiple test cases that trigger the same
3406crash. If it finds another, I repeat the process until it is happy.
3407
3408Once it *is* happy, I do the same `fuzz_prep.sh`, `tmuxp load` sequence and
3409restart fuzzing. Why do I restart instead of continuing? Because with the
3410changes, the test outputs could be stale and invalid.
3411
3412However, there *is* a case where I continue: if [`scripts/afl.py`][94] finds
3413that every crash reported by [AFL++][125] is invalid. If that's the case, I can
3414just continue with the command:
3415
3416```
3417tmuxp load tests/fuzzing/bc_afl_continue.yaml
3418```
3419
3420(Note: I admit that I usually run [`scripts/afl.py`][94] while the fuzzer is
3421still running, so often, I don't find a need to continue since there was no
3422stop. However, the capability is there, if needed.)
3423
3424In addition, my fuzzing setup, including the `tmux` and `tmuxp` configs,
3425automatically set up [AFL++][125] power schedules (see [Fuzzing
3426Performance][127] above). They also set up the parallel fuzzing such that there
3427is one fuzzer in each group of 4 that does deterministic fuzzing. It's always
3428the first one in each group.
3429
3430For more information about deterministic fuzzing, see the [AFL++][125]
3431documentation.
3432
3433### Corpuses
3434
3435I occasionally add to the input corpuses. These files come from new files in the
3436[Test Suite][124]. In fact, I use soft links when the files are the same.
3437
3438However, when I add new files to an input corpus, I sometimes reduce the size of
3439the file by removing some redundancies.
3440
3441And then, when adding to the `bc` corpuses, I try to add them evenly so that
3442each corpus will take about the same amount of time to get to a finished state.
3443
3444### [AFL++][125] Quickstart
3445
3446The way [AFL++][125] works is complicated.
3447
3448First, it is the one to invoke the compiler. It leverages the compiler to add
3449code to the binary to help it know when certain branches are taken.
3450
3451Then, when fuzzing, it uses that branch information to generate information
3452about the "path" that was taken through the binary.
3453
3454I don't know what AFL++ counts as a new path, but each new path is added to an
3455output corpus, and it is later used as a springboard to find new paths.
3456
3457This is what makes AFL++ so effective: it's not just blindly thrashing a binary;
3458it adapts to the binary by leveraging information about paths.
3459
3460### Fuzzing Runs
3461
3462For doing a fuzzing run, I expect about a week or two where my computer is
3463basically unusable, except for text editing and light web browsing.
3464
3465Yes, it can take two weeks for me to do a full fuzzing run, and that does *not*
3466include the time needed to find and fix crashes; it only counts the time on the
3467*last* run, the one that does not find any crashes. This means that the entire
3468process can take a month or more.
3469
3470What I use as an indicator that the fuzzing run is good enough is when the
3471number of "Pending" paths (see [AFL++ Quickstart][129] above) for all fuzzer
3472instances, except maybe the deterministic instances, is below 50. And even then,
3473I try to let deterministic instances get that far as well.
3474
3475You can see how many pending paths are left in the "path geometry" section of
3476the [AFL++][125] dashboard.
3477
3478Also, to make [AFL++][125] quit, you need to send it a `SIGINT`, either with
3479`Ctrl+c` or some other method. It will not quit until you tell it to.
3480
3481### Radamsa
3482
3483I rarely use [Radamsa][99] instead of [AFL++][125]. In fact, it's only happened
3484once.
3485
3486The reason I use [Radamsa][99] instead of [AFL++][125] is because it is easier
3487to use with varying command-line arguments, which was needed for testing `bc`'s
3488command-line expression parsing code, and [AFL++][125] is best when testing
3489input from `stdin`.
3490
3491[`scripts/radamsa.sh`][100] does also do fuzzing on the [AFL++][125] inputs, but
3492it's not as effective at that, so I don't really use it for that either.
3493
3494[`scripts/radamsa.sh`][100] and [Radamsa][99] were only really used once; I have
3495not had to touch the command-line expression parsing code since.
3496
3497### [AddressSanitizer][21] with Fuzzing
3498
3499One advantage of using [AFL++][125] is that it saves every test case that
3500generated a new path (see [AFL++ Quickstart][129] above), and it doesn't delete
3501them when the user makes it quit.
3502
3503Keeping them around is not a good idea, for several reasons:
3504
3505* They are frequently large.
3506* There are a lot of them.
3507* They go stale; after `bc` is changed, the generated paths may not be valid
3508  anymore.
3509
3510However, before they are deleted, they can definitely be leveraged for even
3511*more* bug squashing by running *all* of the paths through a build of `bc` with
3512[AddressSanitizer][21].
3513
3514This can easily be done with these four commands:
3515
3516```
3517./scripts/fuzz_prep.sh -a
3518./scripts/afl.py --asan bc1
3519./scripts/afl.py --asan bc2
3520./scripts/afl.py --asan bc3
3521./scripts/afl.py --asan dc
3522```
3523
3524(By the way, the last four commands could be run in separate terminals to do the
3525processing in parallel.)
3526
3527These commands build an [ASan][21]-enabled build of `bc` and `dc` and then they
3528run `bc` and `dc` on all of the found crashes and path output corpuses. This is
3529to check that no path or crash has found any memory errors, including memory
3530leaks.
3531
3532Because the output corpuses can contain test cases that generate infinite loops
3533in `bc` or `dc`, [`scripts/afl.py`][94] has a timeout of 8 seconds, which is far
3534greater than the timeout that [AFL++][125] uses and should be enough to catch
3535any crash.
3536
3537If [AFL++][125] fails to find crashes *and* [ASan][21] fails to find memory
3538errors on the outputs of [AFL++][125], that is an excellent indicator of very
3539few bugs in `bc`, and a release can be made with confidence.
3540
3541## Code Concepts
3542
3543This section is about concepts that, if understood, will make it easier to
3544understand the code as it is written.
3545
3546The concepts in this section are not found in a single source file, but they are
3547littered throughout the code. That's why I am writing them all down in a single
3548place.
3549
3550### POSIX Mode
3551
3552POSIX mode is `bc`-only.
3553
3554In fact, POSIX mode is two different modes: Standard Mode and Warning Mode.
3555These modes are designed to help users write POSIX-compatible `bc` scripts.
3556
3557#### Standard Mode
3558
3559Standard Mode is activated with the `-s` or `--standard` flags.
3560
3561In this mode, `bc` will error if any constructs are used that are not strictly
3562compatible with the [POSIX `bc` specification][2].
3563
3564#### Warning Mode
3565
3566Warning Mode is activated with the `-w` or `--warn` flags.
3567
3568In this mode, `bc` will issue warnings, but continue, if any constructs are used
3569that are not strictly compatible with the [POSIX `bc` specification][2].
3570
3571### Memory Management
3572
3573The memory management in `bc` is simple: everything is owned by one thing.
3574
3575If something is in a vector, it is owned by that vector.
3576
3577If something is contained in a struct, it is owned by that struct with one
3578exception: structs can be given pointers to other things, but only if those
3579other things will outlast the struct itself.
3580
3581As an example, the `BcParse` struct has a pointer to the one `BcProgram` in
3582`bc`. This is okay because the program is initialized first and deallocated
3583last.
3584
3585In other words, it's simple: if a field in a struct is a pointer, then unless
3586that pointer is directly allocated by the struct (like the vector array or the
3587number limb array), that struct does not own the item at that pointer.
3588Otherwise, the struct *does* own the item.
3589
3590### [Async-Signal-Safe][115] Signal Handling
3591
3592`bc` is not the typical Unix utility. Most Unix utilities are I/O bound, but
3593`bc` is, by and large, CPU-bound. This has several consequences, but the biggest
3594is that there is no easy way to allow signals to interrupt it.
3595
3596This consequence is not obvious, but it comes from the fact that a lot of I/O
3597operations can be interrupted and return [`EINTR`][198]. This makes such I/O
3598calls natural places for allowing signals to interrupt execution, even when the
3599signal comes during execution, and not interrupting I/O calls. The way this is
3600done is setting a flag in the signal handler, which is checked around the time
3601of the I/O call, when it is convenient.
3602
3603Alternatively, I/O bound programs can use the [self-pipe trick][199].
3604
3605Neither of these are possible in `bc` because the execution of math code can
3606take a long time. If a signal arrives during this long execution time, setting a
3607flag like an I/O bound application and waiting until the next I/O call could
3608take seconds, minutes, hours, or even days. (Last I checked, my `bc` takes a
3609week to calculate a million digits of pi, and it's not slow as far as `bc`
3610implementations go.)
3611
3612Thus, using just the technique of setting the flag just will not work for an
3613interactive calculator.
3614
3615Well, it can, but it requires a lot of code and massive inefficiencies. I know
3616this because that was the original implementation.
3617
3618The original implementation set a flag and just exit the signal handler. Then,
3619on just about every loop header, I have a check for the signal flag. These
3620checks happened on every iteration of every loop. It was a massive waste because
3621it was polling, and [polling is evil][200].
3622
3623So for version [3.0.0][32], I expended a lot of effort to change the
3624implementation.
3625
3626In the new system, code *outside* the signal handler sets a flag (`vm.sig_lock`)
3627to tell the signal handler whether it can use `longjmp()` to stop the current
3628execution. If so, it does. If not, it sets a flag, which then is used by the
3629code outside the signal handler that set the `vm.sig_lock` flag. When that code
3630unsets `vm.sig_lock`, it checks to see if a signal happened, and if so, that
3631code executes the `longjmp()` and stops the current execution.
3632
3633Other than that, the rest of the interrupt-based implementation is best
3634described in the [Error Handling][97].
3635
3636However, there are rules for signal handlers that I must lay out.
3637
3638First, signal handlers can only call [async-signal-safe][115] functions.
3639
3640Second, any field set or read by both the signal handler and normal code must be
3641a `volatile sig_atomic_t`.
3642
3643Third, when setting such fields, they must be set to constants and no math can
3644be done on them. This restriction and the above restriction exist in order to
3645ensure that the setting of the fields is always atomic with respect to signals.
3646
3647These rules exist for *any* code using Unix signal handlers, not just `bc`.
3648
3649#### Vectors and Numbers
3650
3651Vectors and numbers needed special consideration with the interrupt-based signal
3652handling.
3653
3654When vectors and numbers are about to allocate, or *reallocate* their arrays,
3655they need to lock signals to ensure that they do not call `malloc()` and friends
3656and get interrupted by a signal because, as you will see in the [Error
3657Handling][97] section, `longjmp()` cannot be used in a signal handler if it may
3658be able to interrupt a non-[async-signal-safe][115] function like `malloc()` and
3659friends.
3660
3661### Asserts
3662
3663If you asked me what procedure is used the most in `bc`, I would reply without
3664hesitation, "`assert()`."
3665
3666I use `assert()` everywhere. In fact, it is what made [fuzzing][82] with
3667[AFL++][125] so effective. [AFL++][125] is incredibly good at finding crashes,
3668and a failing `assert()` counts as one.
3669
3670So while a lot of bad bugs might have corrupted data and *not* caused crashes,
3671because I put in so many `assert()`'s, they were *turned into* crashing bugs,
3672and [AFL++][125] found them.
3673
3674By far, the most bugs it found this way was in the `bc` parser. (See the [`bc`
3675Parsing][110] for more information.) And even though I was careful to put
3676`assert()`'s everywhere, most parser bugs manifested during execution of
3677bytecode because the virtual machine assumes the bytecode is valid.
3678
3679Sidenote: one of those bugs caused an infinite recursion when running the sine
3680(`s()`) function in the math library, so yes, parser bugs can be *very* weird.
3681
3682Anyway, the way I did `assert()`'s was like this: whenever I realized that I
3683had put assumptions into the code, I would put an `assert()` there to test it
3684**and** to *document* it.
3685
3686Yes, documentation. In fact, by far the best documentation of the code in `bc`
3687is actually the `assert()`'s. The only time I would not put an `assert()` to
3688test an assumption is if that assumption was already tested by an `assert()`
3689earlier.
3690
3691As an example, if a function calls another function and passes a pointer that
3692the caller previously `assert()`'ed was *not* `NULL`, then the callee does not
3693have to `assert()` it too, unless *also* called by another function that does
3694not `assert()` that.
3695
3696At first glance, it may seem like putting asserts for pointers being non-`NULL`
3697everywhere would actually be good, but unfortunately, not for fuzzing. Each
3698`assert()` is a branch, and [AFL++][125] rates its own effectiveness based on
3699how many branches it covers. If there are too many `assert()`'s, it may think
3700that it is not being effective and that more fuzzing is needed.
3701
3702This means that `assert()`'s show up most often in two places: function
3703preconditions and function postconditions.
3704
3705Function preconditions are `assert()`'s that test conditions relating to the
3706arguments a function was given. They appear at the top of the function, usually
3707before anything else (except maybe initializing a local variable).
3708
3709Function postconditions are `assert()`'s that test the return values or other
3710conditions when a function exits. These are at the bottom of a function or just
3711before a `return` statement.
3712
3713The other `assert()`'s cover various miscellaneous assumptions.
3714
3715If you change the code, I ***HIGHLY*** suggest that you use `assert()`'s to
3716document your assumptions. And don't remove them when [AFL++][125] gleefully
3717crashes `bc` and `dc` over and over again.
3718
3719### Vectors
3720
3721In `bc`, vectors mean resizable arrays, and they are the most fundamental piece
3722of code in the entire codebase.
3723
3724I had previously written a [vector implementation][112], which I used to guide
3725my decisions, but I wrote a new one so that `bc` would not have a dependency. I
3726also didn't make it as sophisticated; the one in `bc` is very simple.
3727
3728Vectors store some information about the type that they hold:
3729
3730* The size (as returned by `sizeof`).
3731* An enum designating the destructor.
3732
3733If the destructor is `BC_DTOR_NONE`, it is counted as the type not having a
3734destructor.
3735
3736But by storing the size, the vector can then allocate `size * cap` bytes, where
3737`cap` is the capacity. Then, when growing the vector, the `cap` is doubled again
3738and again until it is bigger than the requested size.
3739
3740But to store items, or to push items, or even to return items, the vector has to
3741figure out where they are, since to it, the array just looks like an array of
3742bytes.
3743
3744It does this by calculating a pointer to the underlying type with
3745`v + (i * size)`, where `v` is the array of bytes, `i` is the index of the
3746desired element, and `size` is the size of the underlying type.
3747
3748Doing that, vectors can avoid undefined behavior (because `char` pointers can
3749be cast to any other pointer type), while calculating the exact position of
3750every element.
3751
3752Because it can do that, it can figure out where to push new elements by
3753calculating `v + (len * size)`, where `len` is the number of items actually in
3754the vector.
3755
3756By the way, `len` is different from `cap`. While `cap` is the amount of storage
3757*available*, `len` is the number of actual elements in the vector at the present
3758point in time.
3759
3760Growing the vector happens when `len` is equal to `cap` *before* pushing new
3761items, not after.
3762
3763To add a destructor, you need to add an enum item to `BcDtorType` in
3764[`include/vector.h`][174] and add the actual destructor in the same place as the
3765enum item in the `bc_vec_dtors[]` array in [`src/data.c`][131].
3766
3767#### Pointer Invalidation
3768
3769There is one big danger with the vectors as currently implemented: pointer
3770invalidation.
3771
3772If a piece of code receives a pointer from a vector, then adds an item to the
3773vector before they finish using the pointer, that code must then update the
3774pointer from the vector again.
3775
3776This is because any pointer inside the vector is calculated based off of the
3777array in the vector, and when the vector grows, it can `realloc()` the array,
3778which may move it in memory. If that is done, any pointer returned by
3779`bc_vec_item()`, `bc_vec_top()` and `bc_vec_item_rev()` will be invalid.
3780
3781This fact was the single most common cause of crashes in the early days of this
3782`bc`; wherever I have put a comment about pointers becoming invalidated and
3783updating them with another call to `bc_vec_item()` and friends, *do **NOT**
3784remove that code!*
3785
3786#### Maps
3787
3788Maps in `bc` are...not.
3789
3790They are really a combination of two vectors. Those combinations are easily
3791recognized in the source because one vector is named `<name>s` (plural), and the
3792other is named `<name>_map`.
3793
3794There are currently three, all in `BcProgram`:
3795
3796* `fns` and `fn_map` (`bc` functions).
3797* `vars` and `var_map` (variables).
3798* `arrs` and `arr_map` (arrays).
3799
3800They work like this: the `<name>_map` vector holds `BcId`'s, which just holds a
3801string and an index. The string is the name of the item, and the index is the
3802index of that item in the `<name>s` vector.
3803
3804Obviously, I could have just done a linear search for items in the `<name>s`
3805vector, but that would be slow with a lot of functions/variables/arrays.
3806Instead, I ensure that whenever an item is inserted into the `<name>_map`
3807vector, the item is inserted in sorted order. This means that the `<name>_map`
3808is always sorted (by the names of the items).
3809
3810So when looking up an item in the "map", what is really done is this:
3811
38121.	A binary search is carried out on the names in the `<name>_map` vector.
38132.	When one is found, it returns the index in the `<name>_map` vector where the
3814	item was found.
38153.	This index is then used to retrieve the `BcId`.
38164.	The index from the `BcId` is then used to index into the `<name>s` vector,
3817	which returns the *actual* desired item.
3818
3819Why were the `<name>s` and `<name>_map` vectors not combined for ease? The
3820answer is that sometime, when attempting to insert into the "map", code might
3821find that something is already there. For example, a function with that name may
3822already exist, or the variable might already exist.
3823
3824If the insert fails, then the name already exists, and the inserting code can
3825forego creating a new item to put into the vector. However, if there is no item,
3826the inserting code must create a new item and insert it.
3827
3828If the two vectors were combined together, it would not be possible to separate
3829the steps such that creating a new item could be avoided if it already exists.
3830
3831#### Slabs and Slab Vectors
3832
3833`bc` allocates *a lot* of small strings, and small allocations are the toughest
3834for general-purpose allocators to handle efficiently.
3835
3836Because of that reason, I decided to create a system for allocating small
3837strings using something that I call a "slab vector" after [slab
3838allocators][201].
3839
3840These vectors allocate what I call "slabs," which are just an allocation of a
3841single page with a length to tell the slab how much of the slab is used.
3842
3843The vector itself holds slabs, and when the slab vector is asked to allocate a
3844string, it attempts to in the last slab. If that slab cannot do so, it allocates
3845a new slab and allocates from that.
3846
3847There is one exception: if a string is going to be bigger than 128 bytes, then
3848the string is directly allocated, and a slab is created with that pointer and a
3849length of `SIZE_MAX`, which tells the slab vector that it is a direct
3850allocation. Then, the last slab is pushed into the next spot and the new special
3851slab is put into the vacated spot. This ensures that a non-special slab is
3852always last.
3853
3854### Command-Line History
3855
3856When I first wrote `bc`, I immediately started using it in order to eat my own
3857dog food.
3858
3859It sucked, and the biggest reason why was because of the lack of command-line
3860history.
3861
3862At first, I just dealt with it, not knowing how command-line history might be
3863implemented.
3864
3865Eventually, I caved and attempted to adapt [`linenoise-mob`][28], which I had
3866known about for some time.
3867
3868It turned out to be easier than I thought; the hardest part was the tedious
3869renaming of everything to fit the `bc` naming scheme.
3870
3871Understanding command-line history in `bc` is really about understanding VT-100
3872escape codes, so I would start there.
3873
3874Now, the history implementation of `bc` has been adapted far beyond that initial
3875adaptation to make the command-line history implementation perfect for `bc`
3876alone, including integrating it into `bc`'s [Custom I/O][114] and making sure
3877that it does not disturb output that did not end with a newline.
3878
3879On top of that, at one point, I attempted to get history to work on Windows. It
3880barely worked after a lot of work and a lot of portability code, but even with
3881all of that, it does not have at least one feature: multi-line pasting from the
3882clipboard.
3883
3884#### Editline and Readline
3885
3886I also implemented an integration with both editline and readline, though not at
3887the same time. This allows distributions that want to use either one in `bc` to
3888do so. This enables users to use their `.editrc` and `.inputrc` settings.
3889
3890The integration is custom to each library, and it does *not* involve any of
3891`bc`'s custom history code. It also required changes to `bc`'s [Custom
3892I/O][114].
3893
3894### Error Handling
3895
3896The error handling on `bc` got an overhaul for version [`3.0.0`][32], and it
3897became one of the things that taught me the most about C in particular and
3898programming in general.
3899
3900Before then, error handling was manual. Almost all functions returned a
3901`BcStatus` indicating if an error had occurred. This led to a proliferation of
3902lines like:
3903
3904```
3905if (BC_ERR(s)) return s;
3906```
3907
3908In fact, a quick and dirty count of such lines in version `2.7.2` (the last
3909version before [`3.0.0`][32]) turned up 252 occurrences of that sort of line.
3910
3911And that didn't even guarantee that return values were checked *everywhere*.
3912
3913But before I can continue, let me back up a bit.
3914
3915From the beginning, I decided that I would not do what GNU `bc` does on errors;
3916it tries to find a point at which it can recover. Instead, I decided that I
3917would have `bc` reset to a clean slate, which I believed, would reduce the
3918number of bugs where an unclean state caused errors with continuing execution.
3919
3920So from the beginning, errors would essentially unwind the stack until they got
3921to a safe place from which to clean the slate, reset, and ask for more input.
3922
3923Well, if that weren't enough, `bc` also has to handle [POSIX signals][113]. As
3924such, it had a signal handler that set a flag. But it could not safely interrupt
3925execution, so that's all it could do.
3926
3927In order to actually respond to the signal, I had to litter checks for the flag
3928*everywhere* in the code. And I mean *everywhere*. They had to be checked on
3929every iteration of *every* loop. They had to be checked going into and out of
3930certain functions.
3931
3932It was a mess.
3933
3934But fortunately for me, signals did the same thing that errors did: they unwound
3935the stack to the *same* place.
3936
3937Do you see where I am going with this?
3938
3939It turns out that what I needed was a [async-signal-safe][115] form of what
3940programmers call "exceptions" in other languages.
3941
3942I knew that [`setjmp()`][116] and [`longjmp()`][117] are used in C to implement
3943exceptions, so I thought I would learn how to use them. How hard could it be?
3944
3945Quite hard, it turns out, especially in the presence of signals. And that's
3946because there are a few big snares:
3947
39481.	The value of any local variables are not guaranteed to be preserved after a
3949	`longjmp()` back into a function.
39502.	While `longjmp()` is required to be [async-signal-safe][115], if it is
3951	invoked by a signal handler that interrupted a non-[async-signal-safe][115]
3952	function, then the behavior is undefined.
39533.	Any mutation that is not guaranteed to be atomic with respect to signals may
3954	be incomplete when a signal arrives.
3955
3956Oh boy.
3957
3958For number 1, the answer to this is to hide data that must stay changed behind
3959pointers. Only the *pointers* are considered local, so as long as I didn't do
3960any modifying pointer arithmetic, pointers and their data would be safe. For
3961cases where I have local data that must change and stay changed, I needed to
3962*undo* the `setjmp()`, do the change, and the *redo* the `setjmp()`.
3963
3964For number 2 and number 3, `bc` needs some way to tell the signal handler that
3965it cannot do a `longjmp()`. This is done by "locking" signals with a `volatile
3966sig_atomic_t`. (For more information, see the [Async-Signal-Safe Signal
3967Handling][173] section.) For every function that calls a function that is not
3968async-signal-safe, they first need to use `BC_SIG_LOCK` to lock signals, and
3969afterward, use `BC_SIG_UNLOCK` to unlock signals.
3970
3971Code also need to do this for all global, non-atomic mutation, which means that
3972modifying any part of the `BcVm` global struct.
3973
3974`BC_SIG_UNLOCK` has another requirement: it must check for signals or errors and
3975jump if necessary.
3976
3977On top of all of that, *all* functions with cleanup needed to be able to run
3978their cleanup. This meant that `longjmp()` could not just jump to the finish; it
3979had to start what I call a "jump series," using a stack of `jmp_buf`'s
3980(`jmp_bufs` in `BcVm`). Each `longjmp()` uses the top of the `jmp_bufs` stack to
3981execute its jump. Then, if the cleanup code was executed because of a jump, the
3982cleanup code was responsible for continuing the jump series by popping the
3983previous item off the stack and using the new top of the stack for a jump.
3984
3985In this way, C++-style exceptions were implemented in pure C. Not fun, but it
3986works. However, the order of operations matters, especially in the macros that
3987help implement the error handling.
3988
3989For example, in `BC_UNSETJMP`, signals are unlocked before checking for signals.
3990If a signal comes between, that's fine; it will still cause a jump to the right
3991place. However, disabling the lock after could mean that a signal could come
3992*after* checking for signals, but before signals were unlocked, causing the
3993handling of the signal to be delayed.
3994
3995#### Custom I/O
3996
3997Why did I implement my own buffered I/O for `bc`? Because I use `setjmp()` and
3998`longjmp()` for error handling (see the [Error Handling][97] section), and the
3999buffered I/O in `libc` does not interact well with the use of those procedures;
4000all of the buffered I/O API is basically non-[async-signal-safe][115].
4001
4002Implementing custom buffered I/O had other benefits. First, it allowed me to
4003tightly integrate history with the I/O code. Second, it allowed me to make
4004changes to history in order to make it adapt to user prompts.
4005
4006### Lexing
4007
4008To simplify parsing, both calculators use lexers to turn the text into a more
4009easily-parsable form.
4010
4011While some tokens are only one character long, others require many tokens, and
4012some of those need to store all of the text corresponding to the token for use
4013by the parsers. Tokens that need to store their corresponding text include, but
4014are not limited to:
4015
4016* Strings.
4017* Numbers.
4018* Identifiers.
4019
4020For this purpose, the lexer has a [vector][111] named `str` to store the data
4021for tokens. This data is overwritten if another token is lexed that needs to
4022store data, so the parsers need to copy the data before calling the lexer again.
4023
4024Both lexers do some of the same things:
4025
4026* Lex identifiers into tokens, storing the identifier in `str`.
4027* Lex number strings into tokens, storing the string in `str`.
4028* Lex whitespace.
4029* Lex comments.
4030
4031Other than that, and some common plumbing, the lexers have separate code.
4032
4033#### `dc` Lexing
4034
4035The `dc` lexer is remarkably simple; in fact, besides [`src/main.c`][205],
4036[`src/bc.c`][40], and [`src/dc.c`][44], which just contain one function each,
4037the only file smaller than [`src/dc_lex.c`][45] is [`src/args.c`][206], which
4038just processes command-line arguments after they are parsed by
4039[`src/opt.c`][51].
4040
4041For most characters, the `dc` lexer is able to convert directly from the
4042character to its corresponding token. This happens using `dc_lex_tokens[]` in
4043[`src/data.c`][131].
4044
4045`dc`'s lexer also has to lex the register name after lexing tokens for commands
4046that need registers.
4047
4048And finally, `dc`'s lexer needs to parse `dc` strings, which is the only part of
4049the `dc` lexer that is more complex than the `bc` lexer. This is because `dc`
4050strings need to have a balanced number of brackets.
4051
4052#### `bc` Lexing
4053
4054The `bc` lexer is fairly simple. It does the following things:
4055
4056* Lexes `bc` strings.
4057* Lexes `bc` identifiers. This is necessary because this is how `bc` keywords
4058  are lexed. After ensuring that an identifier is not a keyword, the `bc` lexer
4059  allows the common identifier function to take over.
4060* Turns characters and groups of characters into `bc` operator tokens.
4061
4062### Parsing
4063
4064The difference between parsing `bc` and `dc` code is...vast. The `dc` parser is
4065simple, while the `bc` parser is the most complex piece of code in the entire
4066codebase.
4067
4068However, they both do some of the same things.
4069
4070First, the parsers do *not* use [abstract syntax trees][207]; instead, they
4071directly generate the bytecode that will be executed by the `BcProgram` code.
4072Even in the case of `bc`, this heavily simplifies the parsing because the
4073[Shunting-Yard Algorithm][109] is designed to generate [Reverse Polish
4074Notation][108], which is basically directly executable.
4075
4076Second, any extra data that the `BcProgram` needs for execution is stored into
4077functions (see the [Functions][208] section). These include constants and
4078strings.
4079
4080#### `dc` Parsing
4081
4082The parser for `dc`, like its lexer, is remarkably simple. In fact, the easiness
4083of lexing and parsing [Reverse Polish notation][108] is probably why it was used
4084for `dc` when it was first created at Bell Labs.
4085
4086For most tokens, the `dc` parser is able to convert directly from the token
4087to its corresponding instruction. This happens using `dc_parse_insts[]` in
4088[`src/data.c`][131].
4089
4090`dc`'s parser also has to parse the register name for commands that need
4091registers. This is the most complex part of the `dc` parser; each different
4092register command needs to be parsed differently because most of them require two
4093or more instructions to execute properly.
4094
4095For example, storing in a register requires a swap instruction and an assignment
4096instruction.
4097
4098Another example are conditional execution instructions; they need to produce the
4099instruction for the condition, and then they must parse a possible "else" part,
4100which might not exist.
4101
4102##### Existing Commands
4103
4104`dc` is based on commands, which are usually one letter. The following table is
4105a table of which ASCII characters are already used:
4106
4107| Characters | Used? | For...                                     |
4108|------------|-------|--------------------------------------------|
4109| Space      | x     | Separator                                  |
4110| `!`        | x     | Conditional Execution of Registers         |
4111| `"`        | x     | Bounded Rand Operator                      |
4112| `#`        | x     | Comments                                   |
4113| `$`        | x     | Truncation                                 |
4114| `%`        | x     | Modulus                                    |
4115| `&`        |       |                                            |
4116| `'`        | x     | Rand Operator                              |
4117| `(`        | x     | Greater Than Operator                      |
4118| `)`        | x     | Less Than Operator                         |
4119| `*`        | x     | Multiplication                             |
4120| `+`        | x     | Addition                                   |
4121| `,`        | x     | Depth of Execution Stack                   |
4122| `-`        | x     | Subtraction                                |
4123| `.`        | x     | Numbers                                    |
4124| `/`        | x     | Division                                   |
4125| `0-9`      | x     | Numbers                                    |
4126| `:`        | x     | Store into Array                           |
4127| `;`        | x     | Load from Array                            |
4128| `<`        | x     | Conditional Execution of Registers         |
4129| `=`        | x     | Conditional Execution of Registers         |
4130| `>`        | x     | Conditional Execution of Registers         |
4131| `?`        | x     | Ask for User Input                         |
4132| `@`        | x     | Places Operator                            |
4133| `A-F`      | x     | Numbers                                    |
4134| `G`        | x     | Equal Operator                             |
4135| `H`        | x     | Shift Left                                 |
4136| `I`        | x     | Push `ibase` onto Stack                    |
4137| `J`        | x     | Push `seed` onto Stack                     |
4138| `K`        | x     | Push `scale` onto Stack                    |
4139| `L`        | x     | Pop off of Register                        |
4140| `M`        | x     | Boolean And Operator                       |
4141| `N`        | x     | Boolean Not Operator                       |
4142| `O`        | x     | Push `obase` onto Stack                    |
4143| `P`        | x     | Byte Stream Printing                       |
4144| `Q`        | x     | Quit Some Number of Macros                 |
4145| `R`        | x     | Pop Top of Stack                           |
4146| `S`        | x     | Push onto Register                         |
4147| `T`        | x     | Push Max `ibase` onto Stack                |
4148| `U`        | x     | Push Max `obase` onto Stack                |
4149| `V`        | x     | Push Max `scale` onto Stack                |
4150| `W`        | x     | Push Max of `'` Operator                   |
4151| `X`        | x     | Scale of a Number                          |
4152| `Y`        | x     | Length of Array                            |
4153| `Z`        | x     | Number of Significant Digits               |
4154| `[`        | x     | Strings                                    |
4155| `\\`       | x     | Escaping Brackets in Strings               |
4156| `]`        | x     | Strings                                    |
4157| `^`        | x     | Power                                      |
4158| `_`        | x     | Negative Numbers and Negation              |
4159| Backtick   |       |                                            |
4160| `a`        | x     | Asciify                                    |
4161| `b`        | x     | Absolute Value                             |
4162| `c`        | x     | Clear Stack                                |
4163| `d`        | x     | Duplication of Top of Stack                |
4164| `e`        | x     | Else in Conditional Execution of Registers |
4165| `f`        | x     | Printing the Stack                         |
4166| `g`        | x     | Global Settings                            |
4167| `h`        | x     | Shift Right                                |
4168| `i`        | x     | Set `ibase`                                |
4169| `j`        | x     | Set `seed`                                 |
4170| `k`        | x     | Set `scale`                                |
4171| `l`        | x     | Load from Register                         |
4172| `m`        | x     | Boolean Or Operator                        |
4173| `n`        | x     | Print and Pop                              |
4174| `o`        | x     | Set `obase`                                |
4175| `p`        | x     | Print with Newline                         |
4176| `q`        | x     | Quit Two Macros                            |
4177| `r`        | x     | Swap Top Two Items                         |
4178| `s`        | x     | Store into Register                        |
4179| `t`        | x     | Equivalent of `bc`'s `is_number()`         |
4180| `u`        | x     | Equivalent of `bc`'s `is_string()`         |
4181| `v`        | x     | Square Root                                |
4182| `w`        |       |                                            |
4183| `x`        | x     | Execute String                             |
4184| `y`        | x     | Current Depth of a Register                |
4185| `z`        | x     | Current Depth of Stack                     |
4186| `{`        | x     | Greater Than or Equal Operator             |
4187| `\|`       | x     | Moduler Exponentiation                     |
4188| `}`        | x     | Less Than or Equal Operator                |
4189| `~`        | x     | Division and Modulus Combined              |
4190
4191#### `bc` Parsing
4192
4193`bc`'s parser is, by far, the most sensitive piece of code in this software, and
4194there is a very big reason for that: `bc`'s standard is awful and defined a very
4195poor language.
4196
4197The standard says that either semicolons or newlines can end statements. Trying
4198to parse the end of a statement when it can either be a newline or a semicolon
4199is subtle. Doing it in the presence of control flow constructs that do not have
4200to use braces is even harder.
4201
4202And then comes the biggest complication of all: `bc` has to assume that it is
4203*always* at a REPL (Read-Eval-Print Loop). `bc` is, first and foremost, an
4204*interactive* utility.
4205
4206##### Flags
4207
4208All of this means that `bc` has to be able to partially parse something, store
4209enough data to recreate that state later, and return, making sure to not
4210execute anything in the meantime.
4211
4212*That* is what the flags in [`include/bc.h`][106] are: they are the state that
4213`bc` is saving for itself.
4214
4215It saves them in a stack, by the way, because it's possible to nest
4216structures, just like any other programming language. Thus, not only does it
4217have to store state, it needs to do it arbitrarily, and still be able to
4218come back to it.
4219
4220So `bc` stores its parser state with flags in a stack. Careful setting of these
4221flags, along with properly using them and maintaining the flag stack, are what
4222make `bc` parsing work, but it's complicated. In fact, as I mentioned, the `bc`
4223parser is the single most subtle, fickle, and sensitive piece of code in the
4224entire codebase. Only one thing came close once: square root, and that was only
4225sensitive because I wrote it wrong. This parser is pretty good, and it is
4226*still* sensitive. And flags are the reason why.
4227
4228For more information about what individual flags there are, see the comments in
4229[`include/bc.h`][106].
4230
4231##### Labels
4232
4233`bc`'s language is Turing-complete. That means that code needs the ability to
4234jump around, specifically to implement control flow like `if` statements and
4235loops.
4236
4237`bc` handles this while parsing with what I called "labels."
4238
4239Labels are markers in the bytecode. They are stored in functions alongside the
4240bytecode, and they are just indices into the bytecode.
4241
4242When the `bc` parser creates a label, it pushes an index onto the labels array,
4243and the index of the label in that array is the index that will be inserted into
4244the bytecode.
4245
4246Then, when a jump happens, the index pulled out of the bytecode is used to index
4247the labels array, and the label (index) at the index is then used to set the
4248instruction pointer.
4249
4250##### Cond Labels
4251
4252"Cond" labels are so-called because they are used by conditionals.
4253
4254The key to them is that they come *before* the code that uses them. In other
4255words, when jumping to a condition, code is jumping *backwards*.
4256
4257This means that when a cond label is created, the value that should go there is
4258well-known. Cond labels are easy.
4259
4260However, they are still stored on a stack so that the parser knows what cond
4261label to use.
4262
4263##### Exit Labels
4264
4265Exit labels are not so easy.
4266
4267"Exit" labels are so-called because they are used by code "exiting" out of `if`
4268statements or loops.
4269
4270The key to them is that they come *after* the code that uses them. In other
4271words, when jumping to an exit, code is jumping *forwards*.
4272
4273But this means that when an exit label is created, the value that should go
4274there is *not* known. The code that needs it must be parsed and generated first.
4275
4276That means that exit labels are created with the index of `SIZE_MAX`, which is
4277then specifically checked for with an assert in `bc_program_exec()` before using
4278those indices.
4279
4280There should ***NEVER*** be a case when an exit label is not filled in properly
4281if the parser has no bugs. This is because every `if` statement, every loop,
4282must have an exit, so the exit must be set. If not, there is a bug.
4283
4284Exit labels are also stored on a stack so that the parser knows what exit label
4285to use.
4286
4287##### Expression Parsing
4288
4289`bc` has expressions like you might expect in a typical programming language.
4290This means [infix notation][107].
4291
4292One thing about infix notation is that you can't just generate code straight
4293from it like you can with [Reverse Polish notation][108]. It requires more work
4294to shape it into a form that works for execution on a stack machine.
4295
4296That extra work is called the [Shunting-Yard algorithm][109], and the form it
4297translates infix notation into is...[Reverse Polish notation][108].
4298
4299In order to understand the rest of this section, you must understand the
4300[Shunting-Yard algorithm][109]. Go do that before you read on.
4301
4302###### Operator Stack
4303
4304In `bc`, the [Shunting-Yard algorithm][109] is implemented with bytecode as the
4305output and an explicit operator stack (the `ops` field in `BcParse`) as the
4306operator stack. It stores tokens from `BcLex`.
4307
4308However, there is one **HUGE** hangup: multiple expressions can stack. This
4309means that multiple expressions can be parsed at one time (think an array element
4310expression in the middle of a larger expression). Because of that, we need to
4311keep track of where the previous expression ended. That's what `start` parameter
4312to `bc_parse_operator()` is.
4313
4314Parsing multiple expressions on one operator stack only works because
4315expressions can only *stack*; this means that, if an expression begins before
4316another ends, it must *also* end before that other expression ends. This
4317property ensures that operators will never interfere with each other on the
4318operator stack.
4319
4320Also, the operator precedence is reversed: a lower precedence *value* means a
4321higher precedence.
4322
4323###### Recursion
4324
4325Because expressions can stack, parsing expressions actually requires recursion.
4326Well, it doesn't *require* it, but the code is much more readable that way.
4327
4328This recursion is indirect; the functions that `bc_parse_expr_err()` (the actual
4329expression parsing function) calls can, in turn, call it.
4330
4331###### Expression Flags
4332
4333There is one more big thing: not all expressions in `bc` are equal.
4334
4335Some expressions have requirements that others don't have. For example, only
4336array arguments can be arrays (which are technically not expressions, but are
4337treated as such for parsing), and some operators (in POSIX) are not allowed in
4338certain places.
4339
4340For this reason, functions that are part of the expression parsing
4341infrastructure in `bc`'s parser usually take a `flags` argument. This is meant
4342to be passed to children, and somewhere, they will be checked to ensure that the
4343resulting expression meets its requirements.
4344
4345There are also places where the flags are changed. This is because the
4346requirements change.
4347
4348Maintaining the integrity of the requirements flag set is an important part of
4349the `bc` parser. However, they do not have to be stored on a stack because their
4350stack is implicit from the recursion that expression parsing uses.
4351
4352### Functions
4353
4354Functions, in `bc`, are data structures that contain the bytecode and data
4355produced by the parsers. Functions are what the `BcProgram` program executes.
4356
4357#### Main and Read Functions
4358
4359There are two functions that always exist, which I call the "main" and "read"
4360functions.
4361
4362The "main" function is the function in which any code and data outside other
4363functions is put. Basically, it is the function where the scripting code ends
4364up.
4365
4366The "read" function is the function that is reset and parsed every time a call
4367to the `read()` builtin function happens.
4368
4369#### `dc` Strings
4370
4371In `dc`, strings can be executed, and since there are no actual "functions" in
4372`dc`, strings are handled as functions. In fact, they are effectively translated
4373into functions by parsing.
4374
4375##### Tail Calls
4376
4377Since strings in `dc` are functions, and the fact that `dc` has no native loops,
4378such loops are implemented in `dc` code using strings with conditional execution
4379commands at the end of strings.
4380
4381When such conditional execution, or even unconditional execution, commands are
4382the very last commands in a string, then `dc` can perform a [tail call][202].
4383
4384This is done by recording the fact that a tail call happened, done by
4385incrementing an integer on a stack. When a string is executed *without* a tail
4386call, a new entry is pushed onto the stack with the integer `1`.
4387
4388When a string finally quits that followed tail calls, its stack entry is popped,
4389eliminating all of those tail calls.
4390
4391Why perform tail calls? Because otherwise, `dc` would be subject to the same
4392thing that plagues [functional programming languages][203]: stack overflow. In
4393`dc`'s case, that would manifest itself as a growing [heap][204], because the
4394execution stack is stored on the heap, until a fatal allocation failure would
4395occur.
4396
4397#### Execution
4398
4399Execution is handled by an interpreter implemented using `BcProgram` and code
4400in [`src/program.c`][53].
4401
4402The interpreter is a mix between a [stack machine][210] and a [register
4403machine][211]. It is a stack machine in that operations happen on a stack I call
4404the "results stack," but it is a register machine in that items on the stack can
4405be stored to and loaded from "registers" (`dc` terminology), variables (`bc`
4406terminology), and arrays.
4407
4408##### Stacks
4409
4410There are two stacks in the interpreter:
4411
4412* The "results" stack (as mentioned above).
4413* The "execution" stack.
4414
4415The results stack (the `results` field of the `BcProgram` struct) is the stack
4416where the results of computations are stored. It is what makes the interpreter
4417part [stack machine][210]. It is filled with `BcResult`'s.
4418
4419The execution stack (the `stack` field of the `BcProgram` struct) is the stack
4420that tracks the current execution state of the interpreter. It is the presence
4421of this separate stack that allows the interpreter to implement the machine as a
4422loop, rather than recursively. It is filled with `BcInstPtr`'s, which are the
4423"instruction pointers."
4424
4425These instruction pointers have three fields, all integers:
4426
4427* `func`, the index of the function that is currently executing.
4428* `idx`, the index of the next bytecode instruction to execute in the function's
4429  bytecode array.
4430* `len`, which is the length of the results stack when the function started
4431  executing. This is not used by `dc`, but it used by `bc` because functions
4432  in `bc` should never affect the results stack of their callers.
4433
4434With these three fields, and always executing using the instruction pointer at
4435the top of the execution stack, the interpreter can always keep track of its
4436execution.
4437
4438When a function or a string starts executing, a new `BcInstPtr` is pushed onto
4439the execution stack for it. This includes if a function was called recursively.
4440And then, when the function or string returns, its `BcInstPtr` is popped off of
4441the execution stack.
4442
4443##### Bytecode
4444
4445Execution of functions are done through bytecode produced directly by the
4446parsers (see the [Parsing][209]). This bytecode is stored in the `code`
4447[vector][111] of the `BcFunc` struct.
4448
4449This is a vector for two reasons:
4450
4451* It makes it easier to add bytecode to the vector in the parsers.
4452* `bc` allows users to redefine functions.
4453
4454The reason I can use bytecode is because there are less than 256 instructions,
4455so an `unsigned char` can store all the bytecodes.
4456
4457###### Bytecode Indices
4458
4459There is one other factor to bytecode: there are instructions that need to
4460reference strings, constants, variables, or arrays. Bytecode need some way to
4461reference those things.
4462
4463Fortunately, all of those things can be referenced in the same way: with indices
4464because all of the items are in vectors.
4465
4466So `bc` has a way of encoding an index into bytecode. It does this by, after
4467pushing the instruction that references anything, pushing a byte set to the
4468length of the index in bytes, then the bytes of the index are pushed in
4469little-endian order.
4470
4471Then, when the interpreter encounters an instruction that needs one or more
4472items, it decodes the index or indices there and updates the `idx` field of the
4473current `BcInstPtr` to point to the byte after the index or indices.
4474
4475One more thing: the encoder of the indices only pushes as many bytes as
4476necessary to encode the index. It stops pushing when the index has no more bytes
4477with any 1 bits.
4478
4479##### Variables
4480
4481In `bc`, the vector of variables, `vars` in `BcProgram`, is not a vector of
4482numbers; it is a vector of vector of numbers. The first vector is the vector of
4483variables, the second is the variable stack, and the last level is the actual
4484number.
4485
4486This is because both `bc` and `dc` need variables to be stacks.
4487
4488For `dc`, registers are *defined* to be stacks.
4489
4490For `bc`, variables as stacks is how function arguments/parameters and function
4491`auto` variables are implemented.
4492
4493When a function is called, and a value needs to be used as a function argument,
4494a copy of the value is pushed onto the stack corresponding to the variable with
4495the same name as the function's parameter. For `auto` variables, a new number
4496set to zero is pushed onto each stack corresponding to the `auto` variables.
4497(Zero is used because the [`bc` spec][2] requires that `auto` variables are set
4498to zero.)
4499
4500It is in this way that the old value of the variable, which may not even be
4501related to the function parameter or `auto` variable, is preserved while the
4502variable is used as a function parameter or `auto` variable.
4503
4504When the function returns, of course, the stacks of the variables for the
4505parameters and `auto`'s will have their top item popped, restoring the old value
4506as it was before the function call.
4507
4508##### Arrays
4509
4510Like variables, arrays are also implemented as stacks. However, because they are
4511arrays, there is yet another level; the `arrs` field in `BcProgram` is a vector
4512of vectors of vectors of numbers. The first of the two levels is the vector of
4513arrays, the second the stack of for each array, the third the actual array, and
4514last the numbers in the array.
4515
4516`dc` has no need of this extra stack, but `bc` does because arrays can be
4517function parameters themselves.
4518
4519When arrays are used for function arguments, they are copied with a deep copy;
4520each item of the source vector is copied. This is because in `bc`, according to
4521the [`bc` spec][2], all function arguments are passed by value.
4522
4523However, array references are possible (see below).
4524
4525When arrays are used as `auto`'s, a new vector is pushed with one element; if
4526more elements are needed, the array is grown automatically, and new elements are
4527given the value of zero.
4528
4529In fact, if *any* array is accessed and does not have an element at that index,
4530the array is automatically grown to that size, and all new elements are given
4531the value zero. This behavior is guaranteed by the [`bc` spec][2].
4532
4533###### Array References
4534
4535Array references had to be implemented as vectors themselves because they must
4536be pushed on the vectors stacks, which, as seen above, expect vectors
4537themselves.
4538
4539So thus, references are implemented as vectors on the vector stacks. These
4540vectors are not vectors of vectors themselves; they are vectors of bytes; in
4541fact, the fact that they are byte vectors and not vector vectors is how a
4542reference vector is detected.
4543
4544These reference vectors always have the same two things pushed: a byte encoding
4545(the same way bytecode indices are) of the referenced vector's index in the
4546`arrs` vector, and a byte encoding of the referenced vectors index in the vector
4547stack.
4548
4549If an item in a referenced vector is needed, then the reference is dereferenced,
4550and the item is returned.
4551
4552If a reference vector is passed to a function that does *not* expect a
4553reference, the vector is dereferenced and a deep copy is done, in the same way
4554as vectors are copied for normal array function parameters.
4555
4556### Callbacks
4557
4558There are many places in `bc` and `dc` where function pointers are used:
4559
4560* To implement destructors in vectors. (See the [Vectors][111] section.)
4561* To select the correct lex and parse functions for `bc` and `dc`.
4562* To select the correct function to execute unary operators.
4563* To select the correct function to execute binary operators.
4564* To calculate the correct number size for binary operators.
4565* To print a "digit" of a number.
4566* To seed the pseudo-random number generator.
4567
4568And there might be more.
4569
4570In every case, they are used for reducing the amount of code. Instead of
4571`if`/`else` chains, such as:
4572
4573```
4574if (BC_IS_BC) {
4575	bc_parse_parse(vm.parse);
4576}
4577else {
4578	dc_parse_parse(vm.parse);
4579}
4580```
4581
4582The best example of this is `bc_num_binary()`. It is called by every binary
4583operator. It figures out if it needs to allocate space for a new `BcNum`. If so,
4584it allocates the space and then calls the function pointer to the *true*
4585operation.
4586
4587Doing it like that shrunk the code *immensely*. First, instead of every single
4588binary operator duplicating the allocation code, it only exists in one place.
4589Second, `bc_num_binary()` itself does not have a massive `if`/`else` chain or a
4590`switch` statement.
4591
4592But perhaps the most important use was for destructors in vectors.
4593
4594Most of the data structures in `bc` are stored in vectors. If I hadn't made
4595destructors available for vectors, then ensuring that `bc` had no memory leaks
4596would have been nigh impossible. As it is, I check `bc` for memory leaks every
4597release when I change the code, and I have not released `bc` after version
4598`1.0.0` with any memory leaks, as far as I can remember anyway.
4599
4600### Numbers
4601
4602In order to do arbitrary-precision math, as `bc` must do, there must be some way
4603of representing arbitrary-precision numbers. `BcNum` in [`include/num.h`][184]
4604is `bc`'s way of doing that.
4605
4606(Note: the word ["limb"][214] is used below; it has a specific meaning when
4607applied to arbitrary-precision numbers. It means one piece of the number. It can
4608have a single digit, which is what GNU `bc` does, or it can have multiple, which
4609is what this `bc` does.)
4610
4611This struct needs to store several things:
4612
4613* The array of limbs of the number. This is the `num` field.
4614* The location of the decimal point. This is the `rdx` (short for [radix][215])
4615  field.
4616* The number of limbs the number has. This is the `len` field.
4617* Whether the number is negative or not. This is the least significant bit of
4618  the `rdx` field. More on that later.
4619
4620In addition, `bc`'s number stores the capacity of the limb array; this is the
4621`cap` field.
4622
4623If the number needs to grow, and the capacity of the number is big enough, the
4624number is not reallocated; the number of limbs is just added to.
4625
4626There is one additional wrinkle: to make the usual operations (binary operators)
4627fast, the decimal point is *not* allowed to be in the middle of a limb; it must
4628always be between limbs, after all limbs (integer), or before all limbs (real
4629between -1 and 1).
4630
4631The reason for this is because addition, subtraction, multiplication, and
4632division expect digits to be lined up on the decimal point. By requiring that it
4633be between limbs, no extra alignment is needed, and those operations can proceed
4634without extra overhead.
4635
4636This does make some operations, most notably extending, truncating, and
4637shifting, more expensive, but the overhead is constant, and these operations are
4638usually cheap compared to the binary operators anyway.
4639
4640This also requires something else: `bc` numbers need to know *exactly* how many
4641decimal places they have after the decimal point. If the decimal point must be
4642inbetween limbs, the last decimal place could be in the middle of a limb. The
4643amount of decimal places in a number is carefully tracked and stored in the
4644`scale` field, and this number must always coincide with the `rdx` field by the
4645following formula:
4646
4647```
4648scale + (BC_BASE_DIGS - 1) / BC_BASE_DIGS == rdx >> 1
4649```
4650
4651(`BC_BASE_DIGS` is the number of decimal digits stored in one limb. It is 9 on
465264-bit systems and 4 on other systems.)
4653
4654Yes, `rdx` is shifted; that is because the negative bit is stored in the least
4655significant bit of the `rdx` field, and the actual radix (amount of limbs after
4656the decimal/radix point) is stored in the rest of the bits. This is safe because
4657`BC_BASE_DIGS` is always at least 4, which means `rdx` will always need at least
46582 bits less than `scale`.
4659
4660In addition to `rdx` always matching `scale`, another invariant is that `rdx`
4661must always be less than or equal to `len`. (Because `scale` may be greater than
4662`rdx`, `scale` does not have to be less than or equal to `len`.)
4663
4664Another invariant is that `len` must always be less than or equal to `cap`, for
4665obvious reasons.
4666
4667The last thing programmers need to know is that the limb array is stored in
4668little-endian order. This means that the last decimal places are in the limb
4669stored at index 0, and the most significant digits are stored at index `len-1`.
4670
4671This is done to make the most important operations fast. Addition and
4672subtraction are done from least significant to most significant limbs, which
4673means they can speed through memory in the way most computers are best at.
4674Multiplication does the same, sort of, and with division, it matters less.
4675Comparison does need to go backwards, but that's after exhausting all other
4676alternatives, including for example, checking the length of the integer portion
4677of each limb array.
4678
4679Finally, here are some possible special situations with numbers and what they
4680mean:
4681
4682* `len == 0`: the number equals 0.
4683* `len == 0 && scale != 0`: the number equals 0, but it has a `scale` value.
4684  This is the only case where `scale` does not have to coincide with `rdx`
4685  This can happen with division, for example, that sets a specific `scale` for
4686  the result value but may produce 0.
4687* `(rdx >> 1) == len`: the number is greater than or equal to 1, or less than or
4688  equal to -1.
4689* `(rdx >> 1) < len`: the number is greater than -1 and less than 1, not
4690  including 0, although this will be true for 0 as well. However, 0 is always
4691  assumed to be represented by `len == 0`.
4692* `(rdx >> 1) == 0`: the number is an integer. In this case, `scale` must also
4693  equal 0.
4694
4695#### Math Style
4696
4697When I wrote the math for `bc`, I adopted a certain style that, if known, will
4698make it easier to understand the code. The style follows these rules:
4699
4700* `BcNum` arguments always come before arguments of other types.
4701* Among the `BcNum` arguments, the operands always come first, and the `BcNum`
4702  where the result(s) will be stored come last.
4703* Error checking is placed first in the function.
4704* Easy cases are placed next.
4705* Preparation, such as allocating temporaries, comes next.
4706* The actual math.
4707* Cleanup and ensuring invariants.
4708
4709While these rules are not hard and fast, using them as a guide will probably
4710help.
4711
4712#### Digit Clamping
4713
4714There is a question of what to do when parsing numbers and coming across digits
4715that are greater than or equal to the current `ibase`. `bc` and `dc` do one of
4716two things:
4717
4718* Clamp the digit to the maximum correct digit, or
4719* Use the digit as-is, multiplying it by the `ibase` to the power of the digit's
4720  position (starting from 0 at the least significant digit).
4721
4722The former behavior is what GNU `bc` does, and the latter is what GNU `dc`, the
4723OpenBSD `bc`, and the OpenBSD `dc` do.
4724
4725It is possible to switch between the two with the `-c` and `-C` command-line
4726options. However, it is important for developers to understand that both
4727behaviors exist and should exist.
4728
4729The library can also do both, and it is set in a global for each thread.
4730
4731### Strings as Numbers
4732
4733Strings can be assigned to variables. This is a problem because the vectors for
4734variable stacks expect `BcNum` structs only.
4735
4736While I could have made a union, I decided that the complexity of adding an
4737entirely new type, with destructor and everything, was not worth it. Instead, I
4738took advantage of the fact that `free()`, when passed a `NULL` pointer, will do
4739nothing.
4740
4741Using that, I made it so `BcNum`'s could store strings instead. This is marked
4742by the `BcNum` having a `NULL` limb array (`num`) and a `cap` of 0 (which should
4743*never* happen with a real number, though the other fields could be 0).
4744
4745The `BcNum` stores the function that stores the string in the `rdx` field, and
4746it stores the index of the string in the `scale` field. This is used to actually
4747load the string if necessary.
4748
4749Note that historically, string information was stored in the `loc` field of
4750the `d` union in a `BcResult`. This was changed recently to standardize; now,
4751all string information are stored in the `n` field of the `d` union regardless.
4752This means that all string information is stored in `BcNum`'s. This removes
4753extra cases.
4754
4755Also, if a temp is made with a string, then the result type should still be
4756`BC_RESULT_STR`, not `BC_RESULT_TEMP`. This is to make it easier to do type
4757checks.
4758
4759### Pseudo-Random Number Generator
4760
4761In order to understand this section, I suggest you read the information in the
4762manpages about the pseudo-random number generator (PRNG) first; that will help
4763you understand the guarantees it has, which is important because this section
4764delves into implementation details.
4765
4766First, the PRNG I use is seeded; this is because most OS's have an excellent
4767cryptographically secure PRNG available via command-line, usually
4768`/dev/urandom`, but the only *seeded* PRNG available is usually `bash`'s
4769`$RANDOM`, which is essentially a wrapper around C's `rand()`.
4770
4771`rand()` is...bad. It is only guaranteed to return 15 bits of random data.
4772Obviously, getting good random data out of that would be hard with that alone,
4773but implementations also seem to be poor.
4774
4775On top of that, `bc` is an arbitrary-precision calculator; if I made it able to
4776generate random numbers, I could make it generate random numbers of any size,
4777and since it would be seeded, results would be reproducible, when wanted.
4778
4779So to get that, I needed a seeded PRNG with good characteristics. After scouring
4780the Internet, I decided on the [PCG PRNG][215], mostly because of [this blog
4781post][216]. Part of the reason was the behavior of the xoroshiro128+ author, who
4782hates on PCG and its author, but also because PCG seemed to do better when
4783tested by independent parties.
4784
4785After that decision, I faced a challenge: PCG requires 255 bits of seed: 128 for
4786the actual seed, and 127 for the "increment." (Melissa O'Neill, the PCG author,
4787likens the increment to selecting a codebook.)
4788
4789I could, of course, put the entire 255 bits into one massive arbitrary-precision
4790number; `bc` is good at that, after all. But that didn't sit right with me
4791because it would mean any seed selected by users would have the real portion
4792ignored, which is stupid in a program like `bc`.
4793
4794Instead, I decided to make the integer portion the increment (clamped down to
4795size), and the real portion the seed.
4796
4797In most cases, this would be a bad idea because you cannot, in general, know how
4798many decimal places you need to represent any number with `n` real digits in
4799base `b` in another base. However, there is an easy to how many decimal digits
4800after the decimal point it takes to represent reals of base 2 in base 10: the
4801power of two.
4802
4803It turns out that, for base 2 represented in base 10, the power of 2 is
4804*exactly* how many digits are necessary to represent *any* number `n/2^p`, where
4805`p` is the power of 2. This is because at every halving, the number of decimal
4806places increases by 1:
4807
4808```
48090.5
48100.25
48110.125
48120.0625
48130.03125
48140.015625
4815...
4816```
4817
4818This happens because because every decimal representation of `1/2^p` ends with a
48195 because 5 is half of 10. Because 5 is odd, half of it always ends with the
4820digits 25, where 2 is where the previous 5 was, and the new 5 is one decimal
4821place further right.
4822
4823So the algorithm to convert all 255 bits of the seed is as follows:
4824
48251.	Convert the increment to a `BcNum`.
48262.	Convert the seed to a `BcNum`.
48273.	Divide the seed by `2^128` with a `scale` of 128. (For 32-bit systems,
4828	substitute 64 bits for 128.)
48294.	Add the two numbers together.
4830
4831Likewise, the algorithm to convert from a user-supplied number to a seed is:
4832
48331.	Truncate a copy of the number.
48342.	Subtract the result from #1 from the original number. This gives the real
4835	portion of the number.
48363.	Clamp the result of #1 to 127 (or 63) bits. This is the increment.
48374.	Multiply the result of #2 by `2^128`.
48385.	Truncate the result of #4. This is the seed.
4839
4840#### Generating Arbitrary-Precision Numbers
4841
4842I wrote a function (`bc_rand_bounded()`) that will return unbiased results with
4843any bound below the max that PCG can generate.
4844
4845To generate an integer of arbitrary size using a bound, `bc` simply uses
4846`bc_rand_bounded()` to generate numbers with a bound `10^BC_BASE_DIGS` for as
4847many limbs as needed to satisfy the bigger bound.
4848
4849To generate numbers with arbitrary precision after the decimal point, `bc`
4850merely generates an arbitrary precision integer with the bound `10^p`, where `p`
4851is the desired number of decimal places, then divides in by `10^p` with a
4852`scale` of `p`.
4853
4854## Debug Code
4855
4856Besides building `bc` in debug mode with the `-g` flag to [`configure.sh`][69],
4857programmers can also add `-DBC_DEBUG_CODE=1` to the `CFLAGS`. This will enable
4858the inclusion of *a lot* of extra code to assist with debugging.
4859
4860For more information, see all of the code guarded by `#if BC_DEBUG_CODE` in the
4861[`include/`][212] directory and in the [`src/`][213] directory.
4862
4863Yes, all of the code is guarded by `#if` preprocessor statements; this is
4864because the code should *never* be in a release build, and by making programmers
4865add this manually (not even an option to [`configure.sh`][69]), it is easier to
4866ensure that never happens.
4867
4868However, that said, the extra debug code is useful; that was why I kept it in.
4869
4870## Performance
4871
4872While I have put in a lot of effort to make `bc` as fast as possible, there
4873might be some things users can do to speed it up without changing the code.
4874
4875First, users can probably use [profile-guided optimization][217] to optimize
4876even better, using the test suite to profile.
4877
4878Second, I included macros that might help branch placement and prediction:
4879
4880* `BC_ERR(e)`
4881* `BC_UNLIKELY(e)`
4882* `BC_NO_ERR(e)`
4883* `BC_LIKELY(e)`
4884
4885`BC_ERR` is the same as `BC_UNLIKELY`, and `BC_NO_ERR` is the same as
4886`BC_LIKELY`; I just added them to also document branches that lead to error
4887conditions or *away* from error conditions.
4888
4889Anyway, if `BC_LIKELY` and `BC_UNLIKELY` are not defined during compilation,
4890they expand to nothing but the argument they were given.
4891
4892They can, however, be defined to `__builtin_expect((e), 1)` and
4893`__builtin_expect((e), 0)`, respectively, on GCC and Clang for better branch
4894prediction and placement. (For more information about `__builtin_expect()` see
4895the [GCC documentation][218].)
4896
4897There might be other compilers that can take advantage of that, but I don't know
4898anything about that.
4899
4900Also, as stated in the [build manual][219], link-time optimization is excellent
4901at optimizing this `bc`. Use it.
4902
4903### Benchmarks
4904
4905To help programmers improve performance, I have built and assembled
4906infrastructure to make benchmarking easy.
4907
4908First, in order to easily run benchmarks, I created
4909[`scripts/benchmark.sh`][220].
4910
4911Second, I copied and adapted [`ministat.c`][223] [from FreeBSD][221], to make it
4912easier to judge whether the results are significant or not.
4913
4914Third, I made the `make` clean target `make clean_benchmarks`, to clean
4915`scripts/ministat` and the generated benchmark files.
4916
4917Fourth, I made it so [`scripts/benchmark.sh`][220] outputs the timing and memory
4918data in a format that is easy for `scripts/ministat` to digest.
4919
4920To add a benchmark, add a script in the right directory to generate the
4921benchmark. Yes, generate.
4922
4923All of the benchmarks are generated first, from `.bc` and `.dc` files in the
4924[`benchmarks/bc/`][91] and [`benchmarks/dc/`][224]. This is so that massive
4925amounts of data can be generated and then pushed through the calculators.
4926
4927If you need to benchmark `bc` or `dc` with simple loops, have the generator
4928files simply print the loop code.
4929
4930### Caching of Numbers
4931
4932In order to provide some performance boost, `bc` tries to reuse old `BcNum`'s
4933that have the default capacity (`BC_NUM_DEF_SIZE`).
4934
4935It does this by allowing `bc_num_free()` to put the limb array onto a
4936statically-allocated stack (it's just a global array with a set size). Then,
4937when a `BcNum` with the default capacity is needed, `bc_num_init()` asks if any
4938are available. If the answer is yes, the one on top of the stack is returned.
4939Otherwise, `NULL` is returned, and `bc_num_free()` knows it needs to `malloc()`
4940a new limb array.
4941
4942When the stack is filled, any numbers that `bc` attempts to put on it are just
4943freed.
4944
4945This setup saved a few percent in my testing for version [3.0.0][32], which is
4946when I added it.
4947
4948## `bcl`
4949
4950At the request of one of my biggest users, I spent the time to make a build mode
4951where the number and math code of `bc` could be wrapped into a library, which I
4952called `bcl`.
4953
4954This mode is exclusive; `bc` and `dc` themselves are *not* built when building
4955`bcl`.
4956
4957The only things in the `bc` math code that is not included is:
4958
4959* Printing newlines (clients do not care about `bc`'s line lenth restriction).
4960* `dc`'s stream print.
4961
4962Even the [pseudo-random number generator][179] is included, with extra support
4963for generating real numbers with it. (In `bc`, such support is in
4964[`lib2.bc`][26].)
4965
4966### Signal Handling
4967
4968`bcl` does not have the capability for signal handling (as of version `6.0.0`).
4969
4970### Threads
4971
4972It is possible to use `bcl` across multiple threads. However, `bcl` must be
4973initialized on each thread where it will be used.
4974
4975This is because `bcl` uses thread-specific data to store the "globals" for each
4976thread. This is to ensure that threads do not stomp on each other's numbers or
4977other data structures.
4978
4979### Contexts
4980
4981Contexts were an idea by the same user that requested `bcl`. They are meant to
4982make it so multiple clients in one program can keep their data separate from
4983each other.
4984
4985### Numbers
4986
4987Numbers in `bcl` are literally indices into an encapsulated array of numbers,
4988hidden in the context. These indices are then passed to clients to refer to
4989numbers later.
4990
4991### Operand Consumption
4992
4993Most math functions in `bcl` "consume" their operand arguments; the arguments
4994are freed, whether or not an error is returned.
4995
4996This is to make it easy to implement math code, like this:
4997
4998```
4999n = bcl_add(bcl_mul(a, b), bcl_div(c, d));
5000```
5001
5002If numbers need to be preserved, they can be with `bcl_dup()`:
5003
5004```
5005n = bcl_add(bcl_mul(bcl_dup(a), bc_dup(b)), bcl_div(bcl_dup(c), bcl_dup(d)));
5006```
5007
5008### Errors
5009
5010Errors can be encoded in the indices representing numbers, and where necessary,
5011clients are responsible for checking those errors.
5012
5013The encoding of errors is this: if an error happens, the value `0-error` is
5014returned. To decode, do the exact same thing. Thus, any index above
5015`0-num_errors` is an error.
5016
5017If an index that represents an error is passed to a math function, that function
5018propagates the error to its result and does not perform the math operation.
5019
5020All of this is to, once again, make it easy to implement the math code as above.
5021
5022However, where possible, errors are returned directly.
5023
5024[1]: https://en.wikipedia.org/wiki/Bus_factor
5025[2]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/bc.html#top
5026[3]: https://en.wikipedia.org/wiki/Dc_(Unix)
5027[4]: https://en.wikipedia.org/wiki/Reverse_Polish_notation
5028[5]: ./bc/A.1.md#standard-library
5029[6]: https://github.com/torvalds/linux/blob/master/kernel/time/timeconst.bc
5030[7]: ./bc/A.1.md#extended-library
5031[8]: #libbc-2
5032[9]: #strgensh
5033[10]: https://vimeo.com/230142234
5034[11]: https://gavinhoward.com/2019/12/values-for-yao/
5035[12]: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf
5036[13]: ./build.md#cross-compiling
5037[14]: ./build.md
5038[15]: #strgenc
5039[16]: http://landley.net/toybox/about.html
5040[17]: https://www.busybox.net/
5041[18]: https://en.wikipedia.org/wiki/Karatsuba_algorithm
5042[19]: https://clang-analyzer.llvm.org/scan-build.html
5043[20]: https://www.valgrind.org/
5044[21]: https://clang.llvm.org/docs/AddressSanitizer.html
5045[22]: https://gavinhoward.com/2019/11/finishing-software/
5046[23]: #history
5047[24]: https://clang.llvm.org/docs/ClangFormat.html
5048[25]: ./algorithms.md
5049[26]: #lib2bc
5050[27]: #vmh
5051[28]: https://github.com/rain-1/linenoise-mob
5052[29]: https://github.com/antirez/linenoise
5053[30]: #bclh
5054[31]: #argsh
5055[32]: ../NEWS.md#3-0-0
5056[33]: ../NEWS.md
5057[34]: https://github.com/skeeto/optparse
5058[35]: #opth
5059[36]: #historyh
5060[37]: #randh
5061[38]: #langh
5062[39]: #numc
5063[40]: #bcc
5064[41]: #bc_lexc
5065[42]: #bc_parsec
5066[43]: #libraryc
5067[44]: #dcc
5068[45]: #dc_lexc
5069[46]: #dc_parsec
5070[47]: #filec
5071[48]: #historyc
5072[49]: #langc
5073[50]: #lexc
5074[51]: #optc
5075[52]: #parsec
5076[53]: #programc
5077[54]: #randc
5078[55]: #fileh
5079[56]: #readc
5080[57]: #programh
5081[58]: #vmc
5082[59]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/gencat.html#top
5083[60]: #manpagesh
5084[61]: #bcl3md
5085[62]: #bcl3
5086[63]: #bclvcxproj
5087[64]: #bclvcxprojfilters
5088[65]: #bclsln
5089[66]: #bcvcxproj
5090[67]: #bcvcxprojfilters
5091[68]: #bcsln
5092[69]: #configuresh
5093[70]: #makefilein
5094[71]: #functionsh
5095[72]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sh.html#top
5096[73]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18
5097[74]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/make.html#top
5098[75]: #versionh
5099[76]: ##posix-shell-scripts
5100[77]: #tests
5101[78]: #karatsubapy
5102[79]: #bc-1
5103[80]: #dc-1
5104[81]: ./build.md#build-type
5105[82]: #fuzzing-1
5106[83]: #releasesh
5107[84]: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_02
5108[85]: #locales-1
5109[86]: #manuals-1
5110[87]: #locale_installsh
5111[88]: #locale_uninstallsh
5112[89]: #bc1mdin
5113[90]: #dc1mdin
5114[91]: #bc
5115[92]: https://pandoc.org/
5116[93]: #release_settingstxt
5117[94]: #aflpy
5118[95]: #randmathpy
5119[96]: #caching-of-numbers
5120[97]: #error-handling
5121[98]: #radamsatxt
5122[99]: https://gitlab.com/akihe/radamsa
5123[100]: #radamsash
5124[101]: https://musl.libc.org/
5125[102]: ./build.md#settings
5126[103]: #test_settingstxt
5127[104]: #test_settingssh
5128[105]: #functionssh
5129[106]: #bch
5130[107]: https://en.wikipedia.org/wiki/Infix_notation
5131[108]: https://en.wikipedia.org/wiki/Reverse_Polish_notation
5132[109]: https://en.wikipedia.org/wiki/Shunting-yard_algorithm
5133[110]: #bc-parsing
5134[111]: #vectors
5135[112]: https://git.yzena.com/Yzena/Yc/src/branch/master/include/yc/vector.h
5136[113]: https://en.wikipedia.org/wiki/Signal_(IPC)
5137[114]: #custom-io
5138[115]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03_03
5139[116]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/setjmp.html
5140[117]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/longjmp.html
5141[118]: https://www.youtube.com/watch?v=4PaWFYm0kEw
5142[119]: #fuzz_prepsh
5143[120]: #bc_aflyaml
5144[121]: #bc_afl_continueyaml
5145[122]: https://github.com/tmux/tmux
5146[123]: https://tmuxp.git-pull.com/
5147[124]: #test-suite
5148[125]: https://aflplus.plus/
5149[126]: #link-time-optimization
5150[127]: #fuzzing-performance
5151[128]: #radamsa
5152[129]: #afl-quickstart
5153[130]: #convenience
5154[131]: #datac
5155[132]: https://git.gavinhoward.com/gavin/vim-bc
5156[133]: https://git.gavinhoward.com/gavin/bc_libs
5157[134]: #debugging
5158[135]: #asserts
5159[136]: #portability
5160[137]: https://pexpect.readthedocs.io/en/stable/
5161[138]: #test-suite-portability
5162[139]: #historypy
5163[140]: #historysh
5164[141]: #group-tests
5165[142]: #build-system
5166[143]: #generated-tests
5167[144]: #benchmarks-1
5168[145]: #gen
5169[146]: #test-coverage
5170[147]: #integration-with-the-build-system
5171[148]: #test-scripts
5172[149]: #standard-tests
5173[150]: #script-tests
5174[151]: #error-tests
5175[152]: #stdin-tests
5176[153]: #read-tests
5177[154]: #other-tests
5178[155]: #history-tests
5179[156]: #bcl
5180[157]: #bcl-test
5181[158]: #bclc
5182[159]: #valgrind
5183[160]: #addresssanitizer-and-friends
5184[161]: #bc-2
5185[162]: #dc-2
5186[163]: #alltxt-1
5187[164]: #errorstxt
5188[165]: #posix_errorstxt
5189[166]: #timeconstsh
5190[167]: #alltxt-3
5191[168]: #errorstxt-1
5192[169]: #scripts-1
5193[170]: #scripts-2
5194[171]: #alltxt-2
5195[172]: #alltxt-4
5196[173]: #async-signal-safe-signal-handling
5197[174]: #vectorh
5198[175]: #read_errorstxt
5199[176]: #statush
5200[177]: #numbers
5201[178]: #math-style
5202[179]: #pseudo-random-number-generator
5203[180]: #lexh
5204[181]: #parseh
5205[182]: #dch
5206[183]: #libraryh
5207[184]: #numh
5208[185]: #readh
5209[186]: #maps
5210[187]: #slabs-and-slab-vectors
5211[188]: ./build.md#extra-math
5212[189]: #command-line-history
5213[190]: #scriptsed
5214[191]: #linux-timeconstbc-script
5215[192]: #corpuses
5216[193]: ./build.md#history
5217[194]: https://www.valgrind.org/docs/manual/mc-manual.html
5218[195]: #othersh
5219[196]: https://scan.coverity.com/
5220[197]: https://clang-analyzer.llvm.org/
5221[198]: https://unix.stackexchange.com/questions/253349/eintr-is-there-a-rationale-behind-it
5222[199]: https://cr.yp.to/docs/selfpipe.html
5223[200]: https://skarnet.org/cgi-bin/archive.cgi?2:mss:1607:201701:dfblejammjllfkggpcph
5224[201]: https://slembcke.github.io/2020/10/12/CustomAllocators.html#1-slab-allocator
5225[202]: https://en.wikipedia.org/wiki/Tail_call
5226[203]: https://en.wikipedia.org/wiki/Functional_programming_language
5227[204]: https://en.wikipedia.org/wiki/C_dynamic_memory_allocation
5228[205]: #mainc
5229[206]: #argc
5230[207]: https://en.wikipedia.org/wiki/Abstract_syntax_tree
5231[208]: #functions
5232[209]: #parsing
5233[210]: https://en.wikipedia.org/wiki/Stack_machine
5234[211]: https://en.wikipedia.org/wiki/Register_machine
5235[212]: #include
5236[213]: #src
5237[214]: https://gmplib.org/manual/Nomenclature-and-Types
5238[215]: https://en.wikipedia.org/wiki/Radix_point
5239[216]: #main-and-read-functions
5240[215]: https://www.pcg-random.org/
5241[216]: https://lemire.me/blog/2017/08/22/testing-non-cryptographic-random-number-generators-my-results/
5242[217]: https://en.wikipedia.org/wiki/Profile-guided_optimization
5243[218]: https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fexpect
5244[219]: ./build.md#optimization
5245[220]: #benchmarksh
5246[221]: https://cgit.freebsd.org/src/tree/usr.bin/ministat/ministat.c
5247[222]: https://www.freebsd.org/cgi/man.cgi?query=ministat&apropos=0&sektion=0&manpath=FreeBSD+13.0-RELEASE+and+Ports&arch=default&format=html
5248[223]: #ministatc
5249[224]: #dc
5250[225]: #allsh
5251[226]: #errorssh
5252[227]: #errorsh
5253[228]: #vectorc
5254[229]: https://github.com/gavinhoward/bc/pull/72
5255