1# Development 2 3Updated: 22 Dec 2023 4 5This document is meant for the day when I (Gavin D. Howard) get [hit by a 6bus][1]. In other words, it's meant to make the [bus factor][1] a non-issue. 7 8This document is supposed to contain all of the knowledge necessary to develop 9`bc` and `dc`. 10 11In addition, this document is meant to add to the [oral tradition of software 12engineering][118], as described by Bryan Cantrill. 13 14This document will reference other parts of the repository. That is so a lot of 15the documentation can be closest to the part of the repo where it is actually 16necessary. 17 18## What Is It? 19 20This repository contains an implementation of both [POSIX `bc`][2] and [Unix 21`dc`][3]. 22 23POSIX `bc` is a standard utility required for POSIX systems. `dc` is a 24historical utility that was included in early Unix and even predates both Unix 25and C. They both are arbitrary-precision command-line calculators with their own 26programming languages. `bc`'s language looks similar to C, with infix notation 27and including functions, while `dc` uses [Reverse Polish Notation][4] and allows 28the user to execute strings as though they were functions. 29 30In addition, it is also possible to build the arbitrary-precision math as a 31library, named [`bcl`][156]. 32 33**Note**: for ease, I will refer to both programs as `bc` in this document. 34However, if I say "just `bc`," I am referring to just `bc`, and if I say `dc`, I 35am referring to just `dc`. 36 37### History 38 39This project started in January 2018 when a certain individual on IRC, hearing 40that I knew how to write parsers, asked me to write a `bc` parser for his math 41library. I did so. I thought about writing my own math library, but he 42disparaged my programming skills and made me think that I couldn't do it. 43 44However, he took so long to do it that I eventually decided to give it a try and 45had a working math portion in two weeks. It taught me that I should not listen 46to such people. 47 48From that point, I decided to make it an extreme learning experience about how 49to write quality software. 50 51That individual's main goal had been to get his `bc` into [toybox][16], and I 52managed to get my own `bc` in. I also got it in [busybox][17]. 53 54Eventually, in late 2018, I also decided to try my hand at implementing 55[Karatsuba multiplication][18], an algorithm that that unnamed individual 56claimed I could never implement. It took me a bit, but I did it. 57 58This project became a passion project for me, and I continued. In mid-2019, 59Stefan Eßer suggested I improve performance by putting more than 1 digit in each 60section of the numbers. After I showed immaturity because of some burnout, I 61implemented his suggestion, and the results were incredible. 62 63Since that time, I have gradually been improving the `bc` as I have learned more 64about things like fuzzing, [`scan-build`][19], [valgrind][20], 65[AddressSanitizer][21] (and the other sanitizers), and many other things. 66 67One of my happiest moments was when my `bc` was made the default in FreeBSD. 68Another happiest moment was when I found out that my `bc` had shipped with macOS 69Ventura, without my knowledge. 70 71But since I believe in [finishing the software I write][22], I have done less 72work on `bc` over time, though there are still times when I put a lot of effort 73in, such as now (17 June 2021), when I am attempting to convince OpenBSD to use 74my `bc`. 75 76And that is why I am writing this document: someday, someone else is going to 77want to change my code, and this document is my attempt to make it as simple as 78possible. 79 80### Values 81 82[According to Bryan Cantrill][10], all software has values. I think he's 83correct, though I [added one value for programming languages in particular][11]. 84 85However, for `bc`, his original list will do: 86 87* Approachability 88* Availability 89* Compatibility 90* Composability 91* Debuggability 92* Expressiveness 93* Extensibility 94* Interoperability 95* Integrity 96* Maintainability 97* Measurability 98* Operability 99* Performance 100* Portability 101* Resiliency 102* Rigor 103* Robustness 104* Safety 105* Security 106* Simplicity 107* Stability 108* Thoroughness 109* Transparency 110* Velocity 111 112There are several values that don't apply. The reason they don't apply is 113because `bc` and `dc` are existing utilities; this is just another 114reimplementation. The designs of `bc` and `dc` are set in stone; there is 115nothing we can do to change them, so let's get rid of those values that would 116apply to their design: 117 118* Compatibility 119* Integrity 120* Maintainability 121* Measurability 122* Performance 123* Portability 124* Resiliency 125* Rigor 126* Robustness 127* Safety 128* Security 129* Simplicity 130* Stability 131* Thoroughness 132* Transparency 133 134Furthermore, some of the remaining ones don't matter to me, so let me get rid of 135those and order the rest according to my *actual* values for this project: 136 137* Robustness 138* Stability 139* Portability 140* Compatibility 141* Performance 142* Security 143* Simplicity 144 145First is **robustness**. This `bc` and `dc` should be robust, accepting any 146input, never crashing, and instead, returning an error. 147 148Closely related to that is **stability**. The execution of `bc` and `dc` should 149be deterministic and never change for the same inputs, including the 150pseudo-random number generator (for the same seed). 151 152Third is **portability**. These programs should run everywhere that POSIX 153exists, as well as Windows. This means that just about every person on the 154planet will have access to these programs. 155 156Next is **compatibility**. These programs should, as much as possible, be 157compatible with other existing implementations and standards. 158 159Then we come to **performance**. A calculator is only usable if it's fast, so 160these programs should run as fast as possible. 161 162After that is **security**. These programs should *never* be the reason a user's 163computer is compromised. 164 165And finally, **simplicity**. Where possible, the code should be simple, while 166deferring to the above values. 167 168Keep these values in mind for the rest of this document, and for exploring any 169other part of this repo. 170 171#### Portability 172 173But before I go on, I want to talk about portability in particular. 174 175Most of these principles just require good attention and care, but portability 176is different. Sometimes, it requires pulling in code from other places and 177adapting it. In other words, sometimes I need to duplicate and adapt code. 178 179This happened in a few cases: 180 181* Option parsing (see [`include/opt.h`][35]). 182* History (see [`include/history.h`][36]). 183* Pseudo-Random Number Generator (see [`include/rand.h`][37]). 184 185This was done because I decided to ensure that `bc`'s dependencies were 186basically zero. In particular, either users have a normal install of Windows or 187they have a POSIX system. 188 189A POSIX system limited me to C99, `sh`, and zero external dependencies. That 190last item is why I pull code into `bc`: if I pull it in, it's not an external 191dependency. 192 193That's why `bc` has duplicated code. Remove it, and you risk `bc` not being 194portable to some platforms. 195 196## Suggested Course 197 198I do have a suggested course for programmers to follow when trying to understand 199this codebase. The order is this: 200 2011. `bc` Spec. 2022. Manpages. 2033. Test suite. 2044. Understand the build. 2055. Algorithms manual. 2066. Code concepts. 2077. Repo structure. 2088. Headers. 2099. Source code. 210 211This order roughly follows this order: 212 2131. High-level requirements 2142. Low-level requirements 2153. High-level implementation 2164. Low-level implementation 217 218In other words, first understand what the code is *supposed* to do, then 219understand the code itself. 220 221## Useful External Tools 222 223I have a few tools external to `bc` that are useful: 224 225* A [Vim plugin with syntax files made specifically for my `bc` and `dc`][132]. 226* A [repo of `bc` and `dc` scripts][133]. 227* A set of `bash` aliases (see below). 228* A `.bcrc` file with items useful for my `bash` setup (see below). 229 230My `bash` aliases are these: 231 232```sh 233alias makej='make -j16' 234alias mcmake='make clean && make' 235alias mcmakej='make clean && make -j16' 236alias bcdebug='CPPFLAGS="-DBC_DEBUG_CODE=1" CFLAGS="-Weverything -Wno-padded \ 237 -Wno-switch-enum -Wno-format-nonliteral -Wno-cast-align \ 238 -Wno-unreachable-code-return -Wno-missing-noreturn \ 239 -Wno-disabled-macro-expansion -Wno-unreachable-code -Wall -Wextra \ 240 -pedantic -std=c99" ./configure.sh' 241alias bcconfig='CFLAGS="-Weverything -Wno-padded -Wno-switch-enum \ 242 -Wno-format-nonliteral -Wno-cast-align -Wno-unreachable-code-return \ 243 -Wno-missing-noreturn -Wno-disabled-macro-expansion -Wno-unreachable-code \ 244 -Wall -Wextra -pedantic -std=c99" ./configure.sh' 245alias bcnoassert='CPPFLAGS="-DNDEBUG" CFLAGS="-Weverything -Wno-padded \ 246 -Wno-switch-enum -Wno-format-nonliteral -Wno-cast-align \ 247 -Wno-unreachable-code-return -Wno-missing-noreturn \ 248 -Wno-disabled-macro-expansion -Wno-unreachable-code -Wall -Wextra \ 249 -pedantic -std=c99" ./configure.sh' 250alias bcdebugnoassert='CPPFLAGS="-DNDEBUG -DBC_DEBUG_CODE=1" \ 251 CFLAGS="-Weverything -Wno-padded -Wno-switch-enum -Wno-format-nonliteral \ 252 -Wno-cast-align -Wno-unreachable-code-return -Wno-missing-noreturn \ 253 -Wno-disabled-macro-expansion -Wno-unreachable-code -Wall -Wextra \ 254 -pedantic -std=c99" ./configure.sh' 255alias bcunset='unset BC_LINE_LENGTH && unset BC_ENV_ARGS' 256``` 257 258`makej` runs `make` with all of my cores. 259 260`mcmake` runs `make clean` before running `make`. It will take a target on the 261command-line. 262 263`mcmakej` is a combination of `makej` and `mcmake`. 264 265`bcdebug` configures `bc` for a full debug build, including `BC_DEBUG_CODE` (see 266[Debugging][134] below). 267 268`bcconfig` configures `bc` with Clang (Clang is my personal default compiler) 269using full warnings, with a few really loud and useless warnings turned off. 270 271`bcnoassert` configures `bc` to not have asserts built in. 272 273`bcdebugnoassert` is like `bcnoassert`, except it also configures `bc` for debug 274mode. 275 276`bcunset` unsets my personal `bc` environment variables, which are set to: 277 278```sh 279export BC_ENV_ARGS="-l $HOME/.bcrc" 280export BC_LINE_LENGTH="74" 281``` 282 283Unsetting these environment variables are necessary for running 284[`scripts/release.sh`][83] because otherwise, it will error when attempting to 285run `bc -s` on my `$HOME/.bcrc`. 286 287Speaking of which, the contents of that file are: 288 289```bc 290define void print_time_unit(t){ 291 if(t<10)print "0" 292 if(t<1&&t)print "0" 293 print t,":" 294} 295define void sec2time(t){ 296 auto s,m,h,d,r 297 r=scale 298 scale=0 299 t=abs(t) 300 s=t%60 301 t-=s 302 m=t/60%60 303 t-=m 304 h=t/3600%24 305 t-=h 306 d=t/86400 307 if(d)print_time_unit(d) 308 if(h)print_time_unit(h) 309 print_time_unit(m) 310 if(s<10)print "0" 311 if(s<1&&s)print "0" 312 s 313 scale=r 314} 315define minutes(secs){ 316 return secs/60; 317} 318define hours(secs){ 319 return secs/3600; 320} 321define days(secs){ 322 return secs/3600/24; 323} 324define years(secs){ 325 return secs/3600/24/365.25; 326} 327define fbrand(b,p){ 328 auto l,s,t 329 b=abs(b)$ 330 if(b<2)b=2 331 s=scale 332 t=b^abs(p)$ 333 l=ceil(l2(t),0) 334 if(l>scale)scale=l 335 t=irand(t)/t 336 scale=s 337 return t 338} 339define ifbrand(i,b,p){return irand(abs(i)$)+fbrand(b,p)} 340``` 341 342This allows me to use `bc` as part of my `bash` prompt. 343 344## Code Style 345 346The code style for `bc` is...weird, and that comes from historical accident. 347 348In [History][23], I mentioned how I got my `bc` in [toybox][16]. Well, in order 349to do that, my `bc` originally had toybox style. Eventually, I changed to using 350tabs, and assuming they were 4 spaces wide, but other than that, I basically 351kept the same style, with some exceptions that are more or less dependent on my 352taste. 353 354However, I later managed to get [ClangFormat][24] to work, so I changed the 355style to that. 356 357### ClangFormat 358 359The style is now defined as whatever [ClangFormat][24] outputs using the 360existing `.clang-format` file. More precisely, the style is whatever is output 361when the following command is run in the root directory: 362 363``` 364./scripts/format.sh 365``` 366 367### Historical Style 368 369The code style used to be: 370 371* Tabs are 4 spaces. 372* Tabs are used at the beginning of lines for indent. 373* Spaces are used for alignment. 374* Lines are limited to 80 characters, period. 375* Pointer asterisk (`*`) goes with the variable (on the right), not the type, 376 unless it is for a pointer type returned from a function. 377* The opening brace is put on the same line as the header for the function, 378 loop, or `if` statement. 379* Unless the header is more than one line, in which case the opening brace is 380 put on its own line. 381* If the opening brace is put on its own line, there is no blank line after it. 382* If the opening brace is *not* put on its own line, there *is* a blank line 383 after it, *unless* the block is only one or two lines long. 384* Code lines are grouped into what I call "paragraphs." Basically, lines that 385 seem like they should go together are grouped together. This one comes down 386 to judgment. 387* Bodies of `if` statements, `else` statements, and loops that are one line 388 long are put on the same line as the statement, unless the header is more than 389 one line long, and/or, the header and body cannot fit into 80 characters with 390 a space inbetween them. 391* If single-line bodies are on a separate line from their headers, and the 392 headers are only a single line, then no braces are used. 393* However, braces are *always* used if they contain another `if` statement or 394 loop. 395* Loops with empty bodies are ended with a semicolon. 396* Expressions that return a boolean value are surrounded by paretheses. 397* Macro backslashes are aligned as far to the left as possible. 398* Binary operators have spaces on both sides. 399* If a line with binary operators overflows 80 characters, a newline is inserted 400 *after* binary operators. 401* Function modifiers and return types are on the same line as the function name. 402* With one exception, `goto`'s are only used to jump to the end of a function 403 for cleanup. 404* All structs, enums, and unions are `typedef`'ed. 405* All constant data is in one file: [`src/data.c`][131], but the corresponding 406 `extern` declarations are in the appropriate header file. 407* All local variables are declared at the beginning of the scope where they 408 appear. They may be initialized at that point, if it does not invoke UB or 409 otherwise cause bugs. 410* All precondition `assert()`'s (see [Asserts][135]) come *after* local variable 411 declarations. 412* Besides short `if` statements and loops, there should *never* be more than one 413 statement per line. 414 415## Repo Structure 416 417Functions are documented with Doxygen-style doc comments. Functions that appear 418in headers are documented in the headers, while static functions are documented 419where they are defined. 420 421### `configure` 422 423A symlink to [`configure.sh`][69]. 424 425### `configure.sh` 426 427This is the script to configure `bc` and [`bcl`][156] for building. 428 429This `bc` has a custom build system. The reason for this is because of 430[*portability*][136]. 431 432If `bc` used an outside build system, that build system would be an external 433dependency. Thus, I had to write a build system for `bc` that used nothing but 434C99 and POSIX utilities. 435 436One of those utilities is POSIX `sh`, which technically implements a 437Turing-complete programming language. It's a terrible one, but it works. 438 439A user that wants to build `bc` on a POSIX system (not Windows) first runs 440`configure.sh` with the options he wants. `configure.sh` uses those options and 441the `Makefile` template ([`Makefile.in`][70]) to generate an actual valid 442`Makefile`. Then `make` can do the rest. 443 444For more information about the build process, see the [Build System][142] 445section and the [build manual][14]. 446 447For more information about shell scripts, see [POSIX Shell Scripts][76]. 448 449`configure.sh` does the following: 450 4511. It processes command-line arguments and figure out what the user wants to 452 build. 4532. It reads in [`Makefile.in`][70]. 4543. One-by-one, it replaces placeholders (in [`Makefile.in`][70]) of the form 455 `%%<placeholder_name>%%` based on the [build type][81]. 4564. It appends a list of file targets based on the [build type][81]. 4575. It appends the correct test targets. 4586. It copies the correct manpage and markdown manual for `bc` and `dc` into a 459 location from which they can be copied for install. 4607. It does a `make clean` to reset the build state. 461 462### `.gitattributes` 463 464A `.gitattributes` file. This is needed to preserve the `crlf` line endings in 465the Visual Studio files. 466 467### `.gitignore` 468 469The `.gitignore` 470 471### `LICENSE.md` 472 473This is the `LICENSE` file, including the licenses of various software that I 474have borrowed. 475 476### `Makefile.in` 477 478This is the `Makefile` template for [`configure.sh`][69] to use for generating a 479`Makefile`. 480 481For more information, see [`configure.sh`][69], the [Build System][142] section, 482and the [build manual][14]. 483 484Because of [portability][136], the generated `Makefile.in` should be a pure 485[POSIX `make`][74]-compatible `Makefile` (minus the placeholders). Here are a 486few snares for the unwary programmer in this file: 487 4881. No extensions allowed, including and especially GNU extensions. 4892. If new headers are added, they must also be added to `Makefile.in`. 4903. Don't delete the `.POSIX:` empty target at the top; that's what tells `make` 491 implementations that pure [POSIX `make`][74] is needed. 492 493In particular, there is no way to set up variables other than the `=` operator. 494There are no conditionals, so all of the conditional stuff must be in 495[`configure.sh`][69]. This is, in fact, why [`configure.sh`][69] exists in the 496first place: [POSIX `make`][74] is barebones and only does a build with no 497configuration. 498 499### `NEWS.md` 500 501A running changelog with an entry for each version. This should be updated at 502the same time that [`include/version.h`][75] is. 503 504### `NOTICE.md` 505 506The `NOTICE` file with proper attributions. 507 508### `README.md` 509 510The `README`. Read it. 511 512### `benchmarks/` 513 514The folder containing files to generate benchmarks. 515 516Each of these files was made, at one time or another, to benchmark some 517experimental feature, so if it seems there is no rhyme or reason to these 518benchmarks, it is because there is none, besides historical accident. 519 520#### `bc/` 521 522The folder containing `bc` scripts to generate `bc` benchmarks. 523 524##### `add.bc` 525 526The file to generate the benchmark to benchmark addition in `bc`. 527 528##### `arrays_and_constants.bc` 529 530The file to generate the benchmark to benchmark `bc` using lots of array names 531and constants. 532 533##### `arrays.bc` 534 535The file to generate the benchmark to benchmark `bc` using lots of array names. 536 537##### `constants.bc` 538 539The file to generate the benchmark to benchmark `bc` using lots of constants. 540 541##### `divide.bc` 542 543The file to generate the benchmark to benchmark division in `bc`. 544 545##### `functions.bc` 546 547The file to generate the benchmark to benchmark `bc` using lots of functions. 548 549##### `irand_long.bc` 550 551The file to generate the benchmark to benchmark `bc` using lots of calls to 552`irand()` with large bounds. 553 554##### `irand_short.bc` 555 556The file to generate the benchmark to benchmark `bc` using lots of calls to 557`irand()` with small bounds. 558 559##### `lib.bc` 560 561The file to generate the benchmark to benchmark `bc` using lots of calls to 562heavy functions in `lib.bc`. 563 564##### `multiply.bc` 565 566The file to generate the benchmark to benchmark multiplication in `bc`. 567 568##### `newton_raphson_div_large.bc` 569 570The file to generate the benchmark to benchmark the Newton-Raphson division in 571[GitHub PR #72][229] with large numbers. 572 573##### `newton_raphson_div_small.bc` 574 575The file to generate the benchmark to benchmark the Newton-Raphson division in 576[GitHub PR #72][229] with small numbers. 577 578##### `newton_raphson_sqrt_large.bc` 579 580The file to generate the benchmark to benchmark the Newton-Raphson square root 581in [GitHub PR #72][229] with large numbers. 582 583##### `newton_raphson_sqrt_small.bc` 584 585The file to generate the benchmark to benchmark the Newton-Raphson square root 586in [GitHub PR #72][229] with small numbers. 587 588##### `postfix_incdec.bc` 589 590The file to generate the benchmark to benchmark `bc` using postfix increment and 591decrement operators. 592 593##### `power.bc` 594 595The file to generate the benchmark to benchmark power (exponentiation) in `bc`. 596 597##### `subtract.bc` 598 599The file to generate the benchmark to benchmark subtraction in `bc`. 600 601##### `strings.bc` 602 603The file to generate the benchmark to benchmark `bc` using lots of strings. 604 605#### `dc/` 606 607The folder containing `dc` scripts to generate `dc` benchmarks. 608 609##### `modexp.dc` 610 611The file to generate the benchmark to benchmark modular exponentiation in `dc`. 612 613### `gen/` 614 615A folder containing the files necessary to generate C strings that will be 616embedded in the executable. 617 618All of the files in this folder have license headers, but the program and script 619that can generate strings from them include code to strip the license header out 620before strings are generated. 621 622#### `bc_help.txt` 623 624A text file containing the text displayed for `bc -h` or `bc --help`. 625 626This text just contains the command-line options and a short summary of the 627differences from GNU and BSD `bc`'s. It also directs users to the manpage. 628 629The reason for this is because otherwise, the help would be far too long to be 630useful. 631 632**Warning**: The text has some `printf()` format specifiers. You need to make 633sure the format specifiers match the arguments given to `bc_file_printf()`. 634 635#### `dc_help.txt` 636 637A text file containing the text displayed for `dc -h` or `dc --help`. 638 639This text just contains the command-line options and a short summary of the 640differences from GNU and BSD `dc`'s. It also directs users to the manpage. 641 642The reason for this is because otherwise, the help would be far too long to be 643useful. 644 645**Warning**: The text has some `printf()` format specifiers. You need to make 646sure the format specifiers match the arguments given to `bc_file_printf()`. 647 648#### `lib.bc` 649 650A `bc` script containing the [standard math library][5] required by POSIX. See 651the [POSIX standard][2] for what is required. 652 653This file does not have any extraneous whitespace, except for tabs at the 654beginning of lines. That is because this data goes directly into the binary, 655and whitespace is extra bytes in the binary. Thus, not having any extra 656whitespace shrinks the resulting binary. 657 658However, tabs at the beginning of lines are kept for two reasons: 659 6601. Readability. (This file is still code.) 6612. The program and script that generate strings from this file can remove 662 tabs at the beginning of lines. 663 664For more details about the algorithms used, see the [algorithms manual][25]. 665 666However, there are a few snares for unwary programmers. 667 668First, all constants must be one digit. This is because otherwise, multi-digit 669constants could be interpreted wrongly if the user uses a different `ibase`. 670This does not happen with single-digit numbers because they are guaranteed to be 671interpreted what number they would be if the `ibase` was as high as possible. 672 673This is why `A` is used in the library instead of `10`, and things like `2*9*A` 674for `180` in [`lib2.bc`][26]. 675 676As an alternative, you can set `ibase` in the function, but if you do, make sure 677to set it with a single-digit number and beware the snare below... 678 679Second, `scale`, `ibase`, and `obase` must be safely restored before returning 680from any function in the library. This is because without the `-g` option, 681functions are allowed to change any of the globals. 682 683Third, all local variables in a function must be declared in an `auto` statement 684before doing anything else. This includes arrays. However, function parameters 685are considered predeclared. 686 687Fourth, and this is only a snare for `lib.bc`, not [`lib2.bc`][26], the code 688must not use *any* extensions. It has to work when users use the `-s` or `-w` 689flags. 690 691#### `lib2.bc` 692 693A `bc` script containing the [extended math library][7]. 694 695Like [`lib.bc`][8], and for the same reasons, this file should have no 696extraneous whitespace, except for tabs at the beginning of lines. 697 698For more details about the algorithms used, see the [algorithms manual][25]. 699 700Also, be sure to check [`lib.bc`][8] for the snares that can trip up unwary 701programmers when writing code for `lib2.bc`. 702 703#### `strgen.c` 704 705Code for the program to generate C strings from text files. This is the original 706program, although [`strgen.sh`][9] was added later. 707 708The reason I used C here is because even though I knew `sh` would be available 709(it must be available to run `configure.sh`), I didn't know how to do what I 710needed to do with POSIX utilities and `sh`. 711 712Later, [`strgen.sh`][9] was contributed by Stefan Eßer of FreeBSD, showing that 713it *could* be done with `sh` and POSIX utilities. 714 715However, `strgen.c` exists *still* exists because the versions generated by 716[`strgen.sh`][9] may technically hit an environmental limit. (See the [draft C99 717standard][12], page 21.) This is because [`strgen.sh`][9] generates string 718literals, and in C99, string literals can be limited to 4095 characters, and 719`gen/lib2.bc` is above that. 720 721Fortunately, the limit for "objects," which include `char` arrays, is much 722bigger: 65535 bytes, so that's what `strgen.c` generates. 723 724However, the existence of `strgen.c` does come with a cost: the build needs C99 725compiler that targets the host machine. For more information, see the ["Cross 726Compiling" section][13] of the [build manual][14]. 727 728Read the comments in `strgen.c` for more detail about it, the arguments it 729takes, and how it works. 730 731#### `strgen.sh` 732 733An `sh` script that will generate C strings that uses only POSIX utilities. This 734exists for those situations where a host C99 compiler is not available, and the 735environment limits mentioned above in [`strgen.c`][15] don't matter. 736 737`strgen.sh` takes the same arguments as [`strgen.c`][15], and the arguments mean 738the exact same things, so see the comments in [`strgen.c`][15] for more detail 739about that, and see the comments in `strgen.sh` for more details about it and 740how it works. 741 742For more information about shell scripts, see [POSIX Shell Scripts][76]. 743 744### `include/` 745 746A folder containing the headers. 747 748The headers are not included among the source code because I like it better that 749way. Also there were folders within `src/` at one point, and I did not want to 750see `#include "../some_header.h"` or things like that. 751 752So all headers are here, even though only one ([`bcl.h`][30]) is meant for end 753users (to be installed in `INCLUDEDIR`). 754 755#### `args.h` 756 757This file is the API for processing command-line arguments. 758 759#### `bc.h` 760 761This header is the API for `bc`-only items. This includes the `bc_main()` 762function and the `bc`-specific lexing and parsing items. 763 764The `bc` parser is perhaps the most sensitive part of the entire codebase. See 765the documentation in `bc.h` for more information. 766 767The code associated with this header is in [`src/bc.c`][40], 768[`src/bc_lex.c`][41], and [`src/bc_parse.c`][42]. 769 770#### `bcl.h` 771 772This header is the API for the [`bcl`][156] library. 773 774This header is meant for distribution to end users and contains the API that end 775users of [`bcl`][156] can use in their own software. 776 777This header, because it's the public header, is also the root header. That means 778that it has platform-specific fixes for Windows. (If the fixes were not in this 779header, the build would fail on Windows.) 780 781The code associated with this header is in [`src/library.c`][43]. 782 783#### `dc.h` 784 785This header is the API for `dc`-only items. This includes the `dc_main()` 786function and the `dc`-specific lexing and parsing items. 787 788The code associated with this header is in [`src/dc.c`][44], 789[`src/dc_lex.c`][45], and [`src/dc_parse.c`][46]. 790 791#### `file.h` 792 793This header is for `bc`'s internal buffered I/O API. 794 795For more information about `bc`'s error handling and custom buffered I/O, see 796[Error Handling][97] and [Custom I/O][114], along with [`status.h`][176] and the 797notes about version [`3.0.0`][32] in the [`NEWS`][32]. 798 799The code associated with this header is in [`src/file.c`][47]. 800 801#### `history.h` 802 803This header is for `bc`'s implementation of command-line editing/history, which 804is based on a [UTF-8-aware fork][28] of [`linenoise`][29]. 805 806For more information, see the [Command-Line History][189] section. 807 808The code associated with this header is in [`src/history.c`][48]. 809 810#### `lang.h` 811 812This header defines the data structures and bytecode used for actual execution 813of `bc` and `dc` code. 814 815Yes, it's misnamed; that's an accident of history where the first things I put 816into it all seemed related to the `bc` language. 817 818The code associated with this header is in [`src/lang.c`][49]. 819 820#### `lex.h` 821 822This header defines the common items that both programs need for lexing. 823 824The code associated with this header is in [`src/lex.c`][50], 825[`src/bc_lex.c`][41], and [`src/dc_lex.c`][45]. 826 827#### `library.h` 828 829This header defines the things needed for [`bcl`][156] that users should *not* 830have access to. In other words, [`bcl.h`][30] is the *public* header for the 831library, and this header is the *private* header for the library. 832 833The code associated with this header is in [`src/library.c`][43]. 834 835#### `num.h` 836 837This header is the API for numbers and math. 838 839The code associated with this header is in [`src/num.c`][39]. 840 841#### `opt.h` 842 843This header is the API for parsing command-line arguments. 844 845It's different from [`args.h`][31] in that [`args.h`][31] is for the main code 846to process the command-line arguments into global data *after* they have already 847been parsed by `opt.h` into proper tokens. In other words, `opt.h` actually 848parses the command-line arguments, and [`args.h`][31] turns that parsed data 849into flags (bits), strings, and expressions that will be used later. 850 851Why are they separate? Because originally, `bc` used `getopt_long()` for 852parsing, so [`args.h`][31] was the only one that existed. After it was 853discovered that `getopt_long()` has different behavior on different platforms, I 854adapted a [public-domain option parsing library][34] to do the job instead. And 855in doing so, I gave it its own header. 856 857They could probably be combined, but I don't really care enough at this point. 858 859The code associated with this header is in [`src/opt.c`][51]. 860 861#### `parse.h` 862 863This header defines the common items that both programs need for parsing. 864 865Note that the parsers don't produce abstract syntax trees (AST's) or any 866intermediate representations. They produce bytecode directly. In other words, 867they don't have special data structures except what they need to do their job. 868 869The code associated with this header is in [`src/parse.c`][50], 870[`src/bc_lex.c`][42], and [`src/dc_lex.c`][46]. 871 872#### `program.h` 873 874This header defines the items needed to manage the data structures in 875[`lang.h`][38] as well as any helper functions needed to generate bytecode or 876execute it. 877 878The code associated with this header is in [`src/program.c`][53]. 879 880#### `rand.h` 881 882This header defines the API for the [pseudo-random number generator 883(PRNG)][179]. 884 885The PRNG only generates fixed-size integers. The magic of generating random 886numbers of arbitrary size is actually given to the code that does math 887([`src/num.c`][39]). 888 889The code associated with this header is in [`src/rand.c`][54]. 890 891#### `read.h` 892 893This header defines the API for reading from files and `stdin`. 894 895Thus, [`file.h`][55] is really for buffered *output*, while this file is for 896*input*. There is no buffering needed for `bc`'s inputs. 897 898The code associated with this header is in [`src/read.c`][56]. 899 900#### `status.h` 901 902This header has several things: 903 904* A list of possible errors that internal `bc` code can use. 905* Compiler-specific fixes. 906* Platform-specific fixes. 907* Macros for `bc`'s [error handling][97]. 908 909There is no code associated with this header. 910 911#### `vector.h` 912 913This header defines the API for the vectors (resizable arrays) that are used for 914data structures. 915 916Vectors are what do the heavy lifting in almost all of `bc`'s data structures. 917Even the maps of identifiers and arrays use vectors. 918 919The code associated with this header is in [`src/vector.c`][228]. 920 921#### `version.h` 922 923This header defines the version of `bc`. 924 925There is no code associated with this header. 926 927#### `vm.h` 928 929This header defines the API for setting up and running `bc` and `dc`. 930 931It is so named because I think of it as the "virtual machine" of `bc`, though 932that is probably not true as [`program.h`][57] is probably the "virtual machine" 933API. Thus, the name is more historical accident. 934 935The code associated with this header is in [`src/vm.c`][58]. 936 937### `locales/` 938 939This folder contains a bunch of `.msg` files and soft links to the real `.msg` 940files. This is how locale support is implemented in `bc`. 941 942The files are in the format required by the [`gencat`][59] POSIX utility. They 943all have the same messages, in the same order, with the same numbering, under 944the same groups. This is because the locale system expects those messages in 945that order. 946 947The softlinks exist because for many locales, they would contain the exact same 948information. To prevent duplication, they are simply linked to a master copy. 949 950The naming format for all files is: 951 952``` 953<language_code>_<country_code>.<encoding>.msg 954``` 955 956This naming format must be followed for all locale files. 957 958### `manuals/` 959 960This folder contains the documentation for `bc`, `dc`, and [`bcl`][156], along 961with a few other manuals. 962 963#### `algorithms.md` 964 965This file explains the mathematical algorithms that are used. 966 967The hope is that this file will guide people in understanding how the math code 968works. 969 970#### `bc.1.md.in` 971 972This file is a template for the markdown version of the `bc` manual and 973manpages. 974 975For more information about how the manpages and markdown manuals are generated, 976and for why, see [`scripts/manpage.sh`][60] and [Manuals][86]. 977 978#### `bcl.3` 979 980This is the manpage for the [`bcl`][156] library. It is generated from 981[`bcl.3.md`][61] using [`scripts/manpage.sh`][60]. 982 983For the reason why I check generated data into the repo, see 984[`scripts/manpage.sh`][60] and [Manuals][86]. 985 986#### `bcl.3.md` 987 988This is the markdown manual for the [`bcl`][156] library. It is the source for the 989generated [`bcl.3`][62] file. 990 991#### `benchmarks.md` 992 993This is a document that compares this `bc` to GNU `bc` in various benchmarks. It 994was last updated when version [`3.0.0`][32] was released. 995 996It has very little documentation value, other than showing what compiler options 997are useful for performance. 998 999#### `build.md` 1000 1001This is the [build manual][14]. 1002 1003This `bc` has a custom build system. The reason for this is because of 1004[*portability*][136]. 1005 1006If `bc` used an outside build system, that build system would be an external 1007dependency. Thus, I had to write a build system for `bc` that used nothing but 1008C99 and POSIX utilities, including barebones [POSIX `make`][74]. 1009 1010for more information about the build system, see the [build system][142] 1011section, the [build manual][14], [`configure.sh`][69], and [`Makefile.in`][70]. 1012 1013#### `dc.1.md.in` 1014 1015This file is a template for the markdown version of the `dc` manual and 1016manpages. 1017 1018For more information about how the manpages and markdown manuals are generated, 1019and for why, see [`scripts/manpage.sh`][60] and [Manuals][86]. 1020 1021#### `development.md` 1022 1023The file you are reading right now. 1024 1025#### `header_bcl.txt` 1026 1027Used by [`scripts/manpage.sh`][60] to give the [`bcl.3`][62] manpage a proper 1028header. 1029 1030For more information about generating manuals, see [`scripts/manpage.sh`][60] 1031and [Manuals][86]. 1032 1033#### `header_bc.txt` 1034 1035Used by [`scripts/manpage.sh`][60] to give the [generated `bc` manpages][79] a 1036proper header. 1037 1038For more information about generating manuals, see [`scripts/manpage.sh`][60] 1039and [Manuals][86]. 1040 1041#### `header_dc.txt` 1042 1043Used by [`scripts/manpage.sh`][60] to give the [generated `dc` manpages][80] a 1044proper header. 1045 1046For more information about generating manuals, see [`scripts/manpage.sh`][60] 1047and [Manuals][86]. 1048 1049#### `header.txt` 1050 1051Used by [`scripts/manpage.sh`][60] to give all generated manpages a license 1052header. 1053 1054For more information about generating manuals, see [`scripts/manpage.sh`][60] 1055and [Manuals][86]. 1056 1057#### `release.md` 1058 1059A checklist that I try to somewhat follow when making a release. 1060 1061#### `bc/` 1062 1063A folder containing the `bc` manuals. 1064 1065Each `bc` manual corresponds to a [build type][81]. See that link for more 1066details. 1067 1068For each manual, there are two copies: the markdown version generated from the 1069template, and the manpage generated from the markdown version. 1070 1071#### `dc/` 1072 1073A folder containing the `dc` manuals. 1074 1075Each `dc` manual corresponds to a [build type][81]. See that link for more 1076details. 1077 1078For each manual, there are two copies: the markdown version generated from the 1079template, and the manpage generated from the markdown version. 1080 1081### `scripts/` 1082 1083This folder contains helper scripts. Most of them are written in pure [POSIX 1084`sh`][72], but three ([`afl.py`][94], [`karatsuba.py`][78], and 1085[`randmath.py`][95]) are written in Python 3, and one ([`ministat.c`][223]) is 1086written in C. [`ministat.c`][223] in particular is copied from elsewhere. 1087 1088For more information about the shell scripts, see [POSIX Shell Scripts][76]. 1089 1090#### `afl.py` 1091 1092This script is meant to be used as part of the fuzzing workflow. 1093 1094It does one of two things: checks for valid crashes, or runs `bc` and or `dc` 1095under all of the paths found by [AFL++][125]. 1096 1097See [Fuzzing][82] for more information about fuzzing, including this script. 1098 1099#### `alloc.sh` 1100 1101This script is a quick and dirty script to test whether or not the garbage 1102collection mechanism of the [`BcNum` caching][96] works. It has been little-used 1103because it tests something that is not important to correctness. 1104 1105#### `benchmark.sh` 1106 1107A script making it easy to run benchmarks and to run the executable produced by 1108[`ministat.c`][223] on them. 1109 1110For more information, see the [Benchmarks][144] section. 1111 1112#### `bitfuncgen.c` 1113 1114A source file for an executable to generate tests for `bc`'s bitwise functions 1115in [`gen/lib2.bc`][26]. The executable is `scripts/bitfuncgen`, and it is built 1116with `make bitfuncgen`. It produces the test on `stdout` and the expected 1117results on `stderr`. This means that to generat tests, use the following 1118invokation: 1119 1120``` 1121scripts/bitfuncgen > tests/bc/bitfuncs.txt 2> tests/bc/bitfuncs_results.txt 1122``` 1123 1124It calls `abort()` if it runs into an error. 1125 1126#### `exec-install.sh` 1127 1128This script is the magic behind making sure `dc` is installed properly if it's 1129a symlink to `bc`. It checks to see if it is a link, and if so, it just creates 1130a new symlink in the install directory. Of course, it also installs `bc` itself, 1131or `dc` when it's alone. 1132 1133#### `functions.sh` 1134 1135This file is a bunch of common functions for most of the POSIX shell scripts. It 1136is not supposed to be run; instead, it is *sourced* by other POSIX shell 1137scripts, like so: 1138 1139``` 1140. "$scriptdir/functions.sh" 1141``` 1142 1143or the equivalent, depending on where the sourcing script is. 1144 1145For more information about the shell scripts, see [POSIX Shell Scripts][76]. 1146 1147#### `fuzz_prep.sh` 1148 1149Fuzzing is a regular activity when I am preparing for a release. 1150 1151This script handles all the options and such for building a fuzzable binary. 1152Instead of having to remember a bunch of options, I just put them in this script 1153and run the script when I want to fuzz. 1154 1155For more information about fuzzing, see [Fuzzing][82]. 1156 1157#### `karatsuba.py` 1158 1159This script has at least one of two major differences from most of the other 1160scripts: 1161 1162* It's written in Python 3. 1163* It's meant for software packagers. 1164 1165For example, [`scripts/afl.py`][94] and [`scripts/randmath.py`][95] are both in 1166Python 3, but they are not meant for the end user or software packagers and are 1167not included in source distributions. But this script is. 1168 1169This script breaks my rule of only POSIX utilities necessary for package 1170maintainers, but there's a very good reason for that: it's only meant to be run 1171*once* when the package is created for the first time, and maybe not even then. 1172 1173You see, this script does two things: it tests the Karatsuba implementation at 1174various settings for `KARATSUBA_LEN`, and it figures out what the optimal 1175`KARATSUBA_LEN` is for the machine that it is running on. 1176 1177Package maintainers can use this script, when creating a package for this `bc`, 1178to figure out what is optimal for their users. Then they don't have to run it 1179ever again. So this script only has to run on the packagers machine. 1180 1181I tried to write the script in `sh`, by the way, and I finally accepted the 1182tradeoff of using Python 3 when it became too hard. 1183 1184However, I also mentioned that it's for testing Karatsuba with various settings 1185of `KARATSUBA_LEN`. Package maintainers will want to run the [test suite][124], 1186right? 1187 1188Yes, but this script is not part of the [test suite][124]; it's used for testing 1189in the [`scripts/release.sh`][83] script, which is maintainer use only. 1190 1191However, there is one snare with `karatsuba.py`: I didn't want the user to have 1192to install any Python libraries to run it. Keep that in mind if you change it. 1193 1194#### `link.sh` 1195 1196This script is the magic behind making `dc` a symlink of `bc` when both 1197calculators are built. 1198 1199#### `locale_install.sh` 1200 1201This script does what its name says: it installs locales. 1202 1203It turns out that this is complicated. 1204 1205There is a magic environment variable, `$NLSPATH`, that tells you how and where 1206you are supposed to install locales. 1207 1208Yes, *how*. And where. 1209 1210But now is not the place to rant about `$NLSPATH`. For more information on 1211locales and `$NLSPATH`, see [Locales][85]. 1212 1213#### `locale_uninstall.sh` 1214 1215This script does what its name says: it uninstalls locales. 1216 1217This is far less complicated than installing locales. I basically generate a 1218wildcard path and then list all paths that fit that wildcard. Then I delete each 1219one of those paths. Easy. 1220 1221For more information on locales, see [Locales][85]. 1222 1223#### `manpage.sh` 1224 1225This script is the one that generates markdown manuals from a template and a 1226manpage from a markdown manual. 1227 1228For more information about generating manuals, see [Manuals][86]. 1229 1230#### `ministat.c` 1231 1232This is a file copied [from FreeBSD][221] that calculates the standard 1233statistical numbers, such as mean, average, and median, based on numbers 1234obtained from a file. 1235 1236For more information, see the [FreeBSD ministat(1) manpage][222]. 1237 1238This file allows `bc` to build the `scripts/ministat` executable using the 1239command `make ministat`, and this executable helps programmers evaluate the 1240results of [benchmarks][144] more accurately. 1241 1242#### `package.sh` 1243 1244This script is what helps `bc` maintainers cut a release. It does the following: 1245 12461. Creates the appropriate `git` tag. 12472. Pushes the `git` tag. 12483. Copies the repo to a temp directory. 12494. Removes files that should not be included in source distributions. 12505. Creates the tarballs. 12516. Signs the tarballs. 12527. Zips and signs the Windows executables if they exist. 12538. Calculates and outputs SHA512 and SHA256 sums for all of the files, 1254 including the signatures. 1255 1256This script is for `bc` maintainers to use when cutting a release. It is not 1257meant for outside use. This means that some non-POSIX utilities can be used, 1258such as `git` and `gpg`. 1259 1260In addition, before using this script, it expects that the folders that Windows 1261generated when building `bc`, `dc`, and [`bcl`][156], are in the parent 1262directory of the repo, exactly as Windows generated them. If they are not there, 1263then it will not zip and sign, nor calculate sums of, the Windows executables. 1264 1265Because this script creates a tag and pushes it, it should *only* be run *ONCE* 1266per release. 1267 1268#### `radamsa.sh` 1269 1270A script to test `bc`'s command-line expression parsing code, which, while 1271simple, strives to handle as much as possible. 1272 1273What this script does is it uses the test cases in [`radamsa.txt`][98] an input 1274to the [Radamsa fuzzer][99]. 1275 1276For more information, see the [Radamsa][128] section. 1277 1278#### `radamsa.txt` 1279 1280Initial test cases for the [`radamsa.sh`][100] script. 1281 1282#### `randmath.py` 1283 1284This script generates random math problems and checks that `bc`'s and `dc`'s 1285output matches the GNU `bc` and `dc`. (For this reason, it is necessary to have 1286GNU `bc` and `dc` installed before using this script.) 1287 1288One snare: be sure that this script is using the GNU `bc` and `dc`, not a 1289previously-installed version of this `bc` and `dc`. 1290 1291If you want to check for memory issues or failing asserts, you can build the 1292`bc` using `./scripts/fuzz_prep.sh -a`, and then run it under this script. Any 1293errors or crashes should be caught by the script and given to the user as part 1294of the "checklist" (see below). 1295 1296The basic idea behind this script is that it generates as many math problems as 1297it can, biasing towards situations that may be likely to have bugs, and testing 1298each math problem against GNU `bc` or `dc`. 1299 1300If GNU `bc` or `dc` fails, it just continues. If this `bc` or `dc` fails, it 1301stores that problem. If the output mismatches, it also stores the problem. 1302 1303Then, when the user sends a `SIGINT`, the script stops testing and goes into 1304report mode. One-by-one, it will go through the "checklist," the list of failed 1305problems, and present each problem to the user, as well as whether this `bc` or 1306`dc` crashed, and its output versus GNU. Then the user can decide to add them as 1307test cases, which it does automatically to the appropriate test file. 1308 1309#### `release_settings.txt` 1310 1311A text file of settings combinations that [`release.sh`][83] uses to ensure that 1312`bc` and `dc` build and work with various default settings. [`release.sh`][83] 1313simply reads it line by line and uses each line for one build. 1314 1315#### `release.sh` 1316 1317This script is for `bc` maintainers only. It runs `bc`, `dc`, and [`bcl`][156] 1318through a gauntlet that is mostly meant to be used in preparation for a release. 1319 1320It does the following: 1321 13221. Builds every [build type][81], with every setting combo in 1323 [`release_settings.txt`][93] with both calculators, `bc` alone, and `dc` 1324 alone. 13252. Builds every [build type][81], with every setting combo in 1326 [`release_settings.txt`][93] with both calculators, `bc` alone, and `dc` 1327 alone for 32-bit. 13283. Does #1 and #2 for Debug, Release, Release with Debug Info, and Min Size 1329 Release builds. 13304. Runs the [test suite][124] on every build, if desired. 13315. Runs the [test suite][124] under [ASan, UBSan, and MSan][21] for every build 1332 type/setting combo. 13336. Runs [`scripts/karatsuba.py`][78] in test mode. 13347. Runs the [test suite][124] for both calculators, `bc` alone, and `dc` alone 1335 under [valgrind][20] and errors if there are any memory bugs or memory 1336 leaks. 1337 1338#### `safe-install.sh` 1339 1340A script copied from [musl][101] to atomically install files. 1341 1342#### `test_settings.sh` 1343 1344A quick and dirty script to help automate rebuilding while manually testing the 1345various default settings. 1346 1347This script uses [`test_settings.txt`][103] to generate the various settings 1348combos. 1349 1350For more information about settings, see [Settings][102] in the [build 1351manual][14]. 1352 1353#### `test_settings.txt` 1354 1355A list of the various settings combos to be used by [`test_settings.sh`][104]. 1356 1357### `src/` 1358 1359This folder is, obviously, where the actual heart and soul of `bc`, the source 1360code, is. 1361 1362All of the source files are in one folder; this simplifies the build system 1363immensely. 1364 1365There are separate files for `bc` and `dc` specific code ([`bc.c`][40], 1366[`bc_lex.c`][41], [`bc_parse.c`][42], [`dc.c`][44], [`dc_lex.c`][45], and 1367[`dc_parse.c`][46]) where possible because it is cleaner to exclude an entire 1368source file from a build than to have `#if`/`#endif` preprocessor guards. 1369 1370That said, it was easier in many cases to use preprocessor macros where both 1371calculators used much of the same code and data structures, so there is a 1372liberal sprinkling of them through the code. 1373 1374#### `args.c` 1375 1376Code for processing command-line arguments. 1377 1378The header for this file is [`include/args.h`][31]. 1379 1380#### `bc.c` 1381 1382The code for the `bc` main function `bc_main()`. 1383 1384The header for this file is [`include/bc.h`][106]. 1385 1386#### `bc_lex.c` 1387 1388The code for lexing that only `bc` needs. 1389 1390The headers for this file are [`include/lex.h`][180] and [`include/bc.h`][106]. 1391 1392#### `bc_parse.c` 1393 1394The code for parsing that only `bc` needs. This code is the most complex and 1395subtle in the entire codebase. 1396 1397The headers for this file are [`include/parse.h`][181] and 1398[`include/bc.h`][106]. 1399 1400#### `data.c` 1401 1402Due to [historical accident][23] because of a desire to get my `bc` into 1403[toybox][16], all of the constant data that `bc` needs is all in one file. This 1404is that file. 1405 1406There is no code in this file, but a lot of the const data has a heavy influence 1407on code, including the order of data in arrays because that order has to 1408correspond to the order of other things elsewhere in the codebase. If you change 1409the order of something in this file, run `make test`, and get errors, you 1410changed something that depends on the order that you messed up. 1411 1412Almost all headers have `extern` references to items in this file. 1413 1414#### `dc.c` 1415 1416The code for the `dc` main function `dc_main()`. 1417 1418The header for this file is [`include/dc.h`][182]. 1419 1420#### `dc_lex.c` 1421 1422The code for lexing that only `dc` needs. 1423 1424The headers for this file are [`include/lex.h`][180] and [`include/dc.h`][182]. 1425 1426#### `dc_parse.c` 1427 1428The code for parsing that only `dc` needs. 1429 1430The headers for this file are [`include/parse.h`][181] and 1431[`include/bc.h`][182]. 1432 1433#### `file.c` 1434 1435The code for `bc`'s implementation of buffered I/O. For more information about 1436why I implemented my own buffered I/O, see [`include/file.h`][55], [Error 1437Handling][97], and [Custom I/O][114], along with [`status.h`][176] and the notes 1438about version [`3.0.0`][32] in the [`NEWS`][32]. 1439 1440The header for this file is [`include/file.h`][55]. 1441 1442#### `history.c` 1443 1444The code for `bc`'s implementation of command-line editing/history, which is 1445based on a [UTF-8-aware fork][28] of [`linenoise`][29]. 1446 1447For more information, see the [Command-Line History][189] section. 1448 1449The header for this file is [`include/history.h`][36]. 1450 1451#### `lang.c` 1452 1453The data structures used for actual execution of `bc` and `dc` code. 1454 1455While execution is done in [`src/program.c`][53], this file defines functions 1456for initializing, copying, and freeing the data structures, which is somewhat 1457orthogonal to actual execution. 1458 1459Yes, it's misnamed; that's an accident of history where the first things I put 1460into it all seemed related to the `bc` language. 1461 1462The header for this file is [`include/lang.h`][38]. 1463 1464#### `lex.c` 1465 1466The code for the common things that both programs need for lexing. 1467 1468The header for this file is [`include/lex.h`][180]. 1469 1470#### `library.c` 1471 1472The code to implement the public API of the `bcl` library. 1473 1474The code in this file does a lot to ensure that clients do not have to worry 1475about internal `bc` details, especially error handling with `setjmp()` and 1476`longjmp()`. That and encapsulating the handling of numbers are the bulk of what 1477the code in this file actually does because most of the library is still 1478implemented in [`src/num.c`][39]. 1479 1480The headers for this file are [`include/bcl.h`][30] and 1481[`include/library.h`][183]. 1482 1483#### `main.c` 1484 1485The entry point for both programs; this is the `main()` function. 1486 1487This file has no headers associated with it. 1488 1489#### `num.c` 1490 1491The code for all of the arbitrary-precision [numbers][177] and [math][178] in 1492`bc`. 1493 1494The header for this file is [`include/num.h`][184]. 1495 1496#### `opt.c` 1497 1498The code for parsing command-line options. 1499 1500The header for this file is [`include/opt.h`][35]. 1501 1502#### `parse.c` 1503 1504The code for the common items that both programs need for parsing. 1505 1506The header for this file is [`include/parse.h`][181]. 1507 1508#### `program.c` 1509 1510The code for the actual execution engine for `bc` and `dc` code. 1511 1512The header for this file is [`include/program.h`][57]. 1513 1514#### `rand.c` 1515 1516The code for the [pseudo-random number generator (PRNG)][179] and the special 1517stack handling it needs. 1518 1519The PRNG only generates fixed-size integers. The magic of generating random 1520numbers of arbitrary size is actually given to the code that does math 1521([`src/num.c`][39]). 1522 1523The header for this file is [`include/rand.h`][37]. 1524 1525#### `read.c` 1526 1527The code for reading from files and `stdin`. 1528 1529The header for this file is [`include/read.h`][185]. 1530 1531#### `vector.c` 1532 1533The code for [vectors][111], [maps][186], and [slab vectors][187], along with 1534slabs. 1535 1536The header for this file is [`include/vector.h`][174]. 1537 1538#### `vm.c` 1539 1540The code for setting up and running `bc` and `dc`. 1541 1542It is so named because I think of it as the "virtual machine" of `bc`, though 1543that is probably not true as [`program.h`][57] is probably the "virtual machine" 1544code. Thus, the name is more historical accident. 1545 1546The header for this file is [`include/vm.h`][27]. 1547 1548### `tests/` 1549 1550This directory contains the entire [test suite][124] and its infrastructure. 1551 1552#### `all.sh` 1553 1554A convenience script for the `make run_all_tests` target (see the [Group 1555Tests][141] section for more information). 1556 1557#### `all.txt` 1558 1559The file with the names of the calculators. This is to make it easier for the 1560test scripts to know where the standard and other test directories are. 1561 1562#### `bcl.c` 1563 1564The test for the [`bcl`][156] API. For more information, see the [`bcl` 1565Test][157] section. 1566 1567#### `error.sh` 1568 1569The script to run the file-based error tests in `tests/<calculator>/errors/` for 1570each calculator. For more information, see the [Error Tests][151] section. 1571 1572This is a separate script so that each error file can be run separately and in 1573parallel. 1574 1575#### `errors.sh` 1576 1577The script to run the line-based error tests in `tests/<calculator>/errors.txt` 1578for each calculator. For more information, see the [Error Tests][151] section. 1579 1580#### `extra_required.txt` 1581 1582The file with the list of tests which both calculators have that need the [Extra 1583Math build option][188]. This exists to make it easy for test scripts to skip 1584those tests when the [Extra Math build option][188] is disabled. 1585 1586#### `history.py` 1587 1588The file with all of the history tests. For more information, see the [History 1589Tests][155] section. 1590 1591#### `history.sh` 1592 1593The script to integrate [`history.py`][139] into the build system in a portable 1594way, and to skip it if necessary. 1595 1596This script also re-runs the test three times if it fails. This is because 1597`pexpect` can be flaky at times. 1598 1599#### `other.sh` 1600 1601The script to run the "other" (miscellaneous) tests for each calculator. For 1602more information, see the [Other Tests][154] section. 1603 1604#### `read.sh` 1605 1606The script to run the read tests for each calculator. For more information, see 1607the [`read()` Tests][153] section. 1608 1609#### `script.sed` 1610 1611The `sed` script to edit the output of GNU `bc` when generating script tests. 1612For more information, see the [Script Tests][150] section. 1613 1614#### `script.sh` 1615 1616The script for running one script test. For more information, see the [Script 1617Tests][150] section. 1618 1619#### `scripts.sh` 1620 1621The script to help the `make run_all_tests` (see the [Group Tests][141] section) 1622run all of the script tests. 1623 1624#### `stdin.sh` 1625 1626The script to run the `stdin` tests for each calculator. For more information, 1627see the [`stdin` Tests][152] section. 1628 1629#### `test.sh` 1630 1631The script to run one standard test. For more information, see the [Standard 1632Tests][149] section. 1633 1634#### `bc/` 1635 1636The standard tests directory for `bc`. For more information, see the [Standard 1637Tests][149] section. 1638 1639##### `all.txt` 1640 1641The file to tell the build system and `make run_all_tests` (see the [Group 1642Tests][141] section) what standard tests to run for `bc`, as well as in what 1643order. 1644 1645This file just lists the test names, one per line. 1646 1647##### `errors.txt` 1648 1649The initial error test file for `bc`. This file has one test per line. See the 1650[Error Tests][151] section for more information. 1651 1652##### `posix_errors.txt` 1653 1654The file of tests for POSIX compatibility for `bc`. This file has one test per 1655line. For more information, see the [Error Tests][151] section. 1656 1657##### `timeconst.sh` 1658 1659The script to run the `bc` tests that use the [Linux `timeconst.bc` script][6]. 1660For more information, see the [Linux `timeconst.bc` Script][191]section. 1661 1662##### `errors/` 1663 1664The directory with error tests for `bc`, most discovered by AFL++ (see the 1665[Fuzzing][82] section). There is one test per file. For more information, see 1666the [Error Tests][151] section. 1667 1668##### `scripts/` 1669 1670The script tests directory for `bc`. For more information, see the [Script 1671Tests][150] section. 1672 1673###### `all.txt` 1674 1675A file to tell the build system and `make run_all_tests` (see the [Group 1676Tests][141] section) what script tests to run for `bc`, as well as in what 1677order. 1678 1679This file just lists the test names, one per line. 1680 1681#### `dc/` 1682 1683The standard tests directory for `dc`. For more information, see the [Standard 1684Tests][149] section. 1685 1686##### `all.txt` 1687 1688The file to tell the build system and `make run_all_tests` (see the [Group 1689Tests][141] section) what standard tests to run for `dc`, as well as in what 1690order. 1691 1692This file just lists the test names, one per line. 1693 1694##### `errors.txt` 1695 1696The initial error test file for `dc`. This file has one test per line. See the 1697[Error Tests][151] section for more information. 1698 1699##### `read_errors.txt` 1700 1701The file of tests errors with the `?` command (`read()` in `bc`). This file has 1702one test per line. See the [Error Tests][151] section for more information. 1703 1704##### `errors/` 1705 1706The directory with error tests for `dc`, most discovered by AFL++ (see the 1707[Fuzzing][82] section). There is one test per file. For more information, see 1708the [Error Tests][151] section. 1709 1710##### `scripts/` 1711 1712The script tests directory for `dc`. For more information, see the [Script 1713Tests][150] section. 1714 1715###### `all.txt` 1716 1717The file to tell the build system and `make run_all_tests` (see the [Group 1718Tests][141] section) what script tests to run for `dc`, as well as in what 1719order. 1720 1721This file just lists the test names, one per line. 1722 1723#### `fuzzing/` 1724 1725The directory containing the fuzzing infrastructure. For more information, see 1726the [Fuzzing][82] section. 1727 1728##### `bc_afl_continue.yaml` 1729 1730The [`tmuxp`][123] config (for use with [`tmux`][122]) for easily restarting a 1731fuzz run. For more information, see the [Convenience][130] subsection of the 1732[Fuzzing][82] section. 1733 1734##### `bc_afl.yaml` 1735 1736The [`tmuxp`][123] config (for use with [`tmux`][122]) for easily starting a 1737fuzz run. For more information, see the [Convenience][130] subsection of the 1738[Fuzzing][82] section. 1739 1740Be aware that this will delete all previous unsaved fuzzing tests in the output 1741directories. 1742 1743##### `bc_inputs1/` 1744 1745The fuzzing input directory for the first third of inputs for `bc`. For more 1746information, see the [Corpuses][192] subsection of the [Fuzzing][82] section. 1747 1748##### `bc_inputs2/` 1749 1750The fuzzing input directory for the second third of inputs for `bc`. For more 1751information, see the [Corpuses][192] subsection of the [Fuzzing][82] section. 1752 1753##### `bc_inputs3/` 1754 1755The fuzzing input directory for the third third of inputs for `bc`. For more 1756information, see the [Corpuses][192] subsection of the [Fuzzing][82] section. 1757 1758##### `dc_inputs/` 1759 1760The fuzzing input directory for the inputs for `dc`. For more information, see 1761the [Corpuses][192] subsection of the [Fuzzing][82] section. 1762 1763### `vs/` 1764 1765The directory containing all of the materials needed to build `bc`, `dc`, and 1766`bcl` on Windows. 1767 1768#### `bcl.sln` 1769 1770A Visual Studio solution file for [`bcl`][156]. This, along with 1771[`bcl.vcxproj`][63] and [`bcl.vcxproj.filters`][64] is what makes it possible to 1772build [`bcl`][156] on Windows. 1773 1774#### `bcl.vcxproj` 1775 1776A Visual Studio project file for [`bcl`][156]. This, along with [`bcl.sln`][65] 1777and [`bcl.vcxproj.filters`][64] is what makes it possible to build [`bcl`][156] 1778on Windows. 1779 1780#### `bcl.vcxproj.filters` 1781 1782A Visual Studio filters file for [`bcl`][156]. This, along with [`bcl.sln`][65] 1783and [`bcl.vcxproj`][63] is what makes it possible to build [`bcl`][156] on 1784Windows. 1785 1786#### `bc.sln` 1787 1788A Visual Studio solution file for `bc`. This, along with [`bc.vcxproj`][66] 1789and [`bc.vcxproj.filters`][67] is what makes it possible to build `bc` on 1790Windows. 1791 1792#### `bc.vcxproj` 1793 1794A Visual Studio project file for `bc`. This, along with [`bc.sln`][68] and 1795[`bc.vcxproj.filters`][67] is what makes it possible to build `bc` on Windows. 1796 1797#### `bc.vcxproj.filters` 1798 1799A Visual Studio filters file for `bc`. This, along with [`bc.sln`][68] and 1800[`bc.vcxproj`][66] is what makes it possible to build `bc` on Windows. 1801 1802#### `tests/` 1803 1804A directory of files to run tests on Windows. 1805 1806##### `tests_bc.bat` 1807 1808A file to run basic `bc` tests on Windows. It expects that it will be run from 1809the directory containing it, and it also expects a `bc.exe` in the same 1810directory. 1811 1812##### `tests_dc.bat` 1813 1814A file to run basic `dc` tests on Windows. It expects that it will be run from 1815the directory containing it, and it also expects a `bc.exe` in the same 1816directory. 1817 1818## Build System 1819 1820The build system is described in detail in the [build manual][14], so 1821maintainers should start there. This section, however, describes some parts of 1822the build system that only maintainers will care about. 1823 1824### Clean Targets 1825 1826`bc` has a default `make clean` target that cleans up the build files. However, 1827because `bc`'s build system can generate many different types of files, there 1828are other clean targets that may be useful: 1829 1830* `make clean_gen` cleans the `gen/strgen` executable generated from 1831 [`gen/strgen.c`][15]. It has no prerequisites. 1832* `make clean` cleans object files, `*.cat` files (see the [Locales][85] 1833 section), executables, and files generated from text files in [`gen/`][145], 1834 including `gen/strgen` if it was built. So this has a prerequisite on 1835 `make clean_gen` in normal use. 1836* `make clean_benchmarks` cleans [benchmarks][144], including the `ministat` 1837 executable. It has no prerequisites. 1838* `make clean_config` cleans the generated `Makefile` and the manuals that 1839 [`configure.sh`][69] copied in preparation for install. It also depends on 1840 `make clean` and `make clean_benchmarks`, so it cleans those items too. This 1841 is the target that [`configure.sh`][69] uses before it does its work. 1842* `make clean_coverage` cleans the generated coverage files for the [test 1843 suite][124]'s [code coverage][146] capabilities. It has no prerequisites. This 1844 is useful if the code coverage tools are giving errors. 1845* `make clean_tests` cleans *everything*. It has prerequisites on all previous 1846 clean targets, but it also cleans all of the [generated tests][143]. 1847 1848When adding more generated files, you may need to add them to one of these 1849targets and/or add a target for them especially. 1850 1851### Preprocessor Macros 1852 1853`bc` and `dc` use *a lot* of preprocessor macros to ensure that each build type: 1854 1855* builds, 1856* works under the [test suite][124], and 1857* excludes as much code as possible from all builds. 1858 1859This section will explain the preprocessor style of `bc` and `dc`, as well as 1860provide an explanation of the macros used. 1861 1862#### Style 1863 1864The style of macro use in `bc` is pretty straightforward: I avoid depending on 1865macro definitions and instead, I set defaults if the macro is not defined and 1866then test the value if the macro with a plain `#if`. 1867 1868(Some examples of setting defaults are in [`include/status.h`][176], just above 1869the definition of the `BcStatus` enum.) 1870 1871In other words, I use `#if` instead of `#ifndef` or `#ifdef`, where possible. 1872 1873There are a couple of cases where I went with standard stuff instead. 1874 1875#### Standard Macros 1876 1877`BC_ENABLED` 1878 1879: This macro expands to `1` if `bc` is enabled, `0` if disabled. 1880 1881`DC_ENABLED` 1882 1883: This macro expands to `1` if `dc` is enabled, `0` if disabled. 1884 1885`BUILD_TYPE` 1886 1887: The macro expands to the build type, which is one of: `A`, `E`, `H`, `N`, 1888 `EH`, `EN`, `HN`, `EHN`. This build type is used in the help text to direct 1889 the user to the correct markdown manual in the `git.gavinhoward.com` 1890 website. 1891 1892`EXECPREFIX` 1893 1894: This macro expands to the prefix on the executable name. This is used to 1895 allow `bc` and `dc` to skip the prefix when finding out which calculator is 1896 executing. 1897 1898`BC_NUM_KARATSUBA_LEN` 1899 1900: This macro expands to an integer, which is the length of numbers below which 1901 the Karatsuba multiplication algorithm switches to brute-force 1902 multiplication. 1903 1904`BC_ENABLE_EXTRA_MATH` 1905 1906: This macro expands to `1` if the [Extra Math build option][188] is enabled, 1907 `0` if disabled. 1908 1909`BC_ENABLE_HISTORY` 1910 1911: This macro expands to `1` if the [History build option][193] is enabled, `0` 1912 if disabled. 1913 1914`BC_ENABLE_NLS` 1915 1916: This macro expands to `1` if the [NLS build option][193] (for locales) is 1917 enabled, `0` if disabled. 1918 1919`BC_ENABLE_LIBRARY` 1920 1921: This macro expands to `1` if the [`bcl` library][156] is enabled, `0` if 1922 disabled. If this is enabled, building the calculators themselves is 1923 disabled, but both `BC_ENABLED` and `DC_ENABLED` must be non-zero. 1924 1925`BC_ENABLE_MEMCHECK` 1926 1927: This macro expands to `1` if `bc` has been built for use with Valgrind's 1928 [Memcheck][194], `0` otherwise. This ensures that fatal errors still free 1929 all of their memory when exiting. `bc` does not do that normally because 1930 what's the point? 1931 1932`BC_ENABLE_AFL` 1933 1934: This macro expands to `1` if `bc` has been built for fuzzing with 1935 [AFL++][125], `0` otherwise. See the [Fuzzing][82] section for more 1936 information. 1937 1938`BC_DEFAULT_BANNER` 1939 1940: This macro expands to the default value for displaying the `bc` banner. 1941 1942`BC_DEFAULT_SIGINT_RESET` 1943 1944: The macro expands to the default value for whether or not `bc` should reset 1945 on `SIGINT` or quit. 1946 1947`BC_DEFAULT_TTY_MODE` 1948 1949: The macro expands to the default value for whether or not `bc` should use 1950 TTY mode when it available. 1951 1952`BC_DEFAULT_PROMPT` 1953 1954: This macro expands to the default value for whether or not `bc` should use a 1955 prompt when TTY mode is available. 1956 1957`DC_DEFAULT_SIGINT_RESET` 1958 1959: The macro expands to the default value for whether or not `dc` should reset 1960 on `SIGINT` or quit. 1961 1962`DC_DEFAULT_TTY_MODE` 1963 1964: The macro expands to the default value for whether or not `dc` should use 1965 TTY mode when it available. 1966 1967`DC_DEFAULT_PROMPT` 1968 1969: This macro expands to the default value for whether or not `dc` should use a 1970 prompt when TTY mode is available. 1971 1972`BC_DEBUG_CODE` 1973 1974: If this macro expands to a non-zero integer, then `bc` is built with *a lot* 1975 of extra debugging code. This is never set by the build system and must be 1976 set by the programmer manually. This should never be set in builds given to 1977 end users. For more information, see the [Debugging][134] section. 1978 1979## Test Suite 1980 1981While the source code may be the heart and soul of `bc`, the test suite is the 1982arms and legs: it gives `bc` the power to do anything it needs to do. 1983 1984The test suite is what allowed `bc` to climb to such high heights of quality. 1985This even goes for fuzzing because fuzzing depends on the test suite for its 1986input corpuses. (See the [Fuzzing][82] section.) 1987 1988Understanding how the test suite works should be, I think, the first thing that 1989maintainers learn after learning what `bc` and `dc` should do. This is because 1990the test suite, properly used, gives confidence that changes have not caused 1991bugs or regressions. 1992 1993That is why I spent the time to make the test suite as easy to use and as fast 1994as possible. 1995 1996To use the test suite (assuming `bc` and/or `dc` are already built), run the 1997following command: 1998 1999``` 2000make test 2001``` 2002 2003That's it. That's all. 2004 2005It will return an error code if the test suite failed. It will also print out 2006information about the failure. 2007 2008If you want the test suite to go fast, then run the following command: 2009 2010``` 2011make -j<cores> test 2012``` 2013 2014Where `<cores>` is the number of cores that your computer has. Of course, this 2015requires a `make` implementation that supports that option, but most do. (And I 2016will use this convention throughout the rest of this section.) 2017 2018I have even tried as much as possible, to put longer-running tests near the 2019beginning of the run so that the entire suite runs as fast as possible. 2020 2021However, if you want to be sure which test is failing, then running a bare 2022`make test` is a great way to do that. 2023 2024But enough about how you have no excuses to use the test suite as much as 2025possible; let's talk about how it works and what you *can* do with it. 2026 2027### Standard Tests 2028 2029The heavy lifting of testing the math in `bc`, as well as basic scripting, is 2030done by the "standard tests" for each calculator. 2031 2032These tests use the files in the [`tests/bc/`][161] and [`tests/dc/`][162] 2033directories (except for [`tests/bc/all.txt`][163], [`tests/bc/errors.txt`][164], 2034[`tests/bc/posix_errors.txt`][165], [`tests/bc/timeconst.sh`][166], 2035[`tests/dc/all.txt`][167], [`tests/dc/errors.txt`][168], and 2036[`tests/dc/read_errors.txt`][175]), which are called the "standard test 2037directories." 2038 2039For every test, there is the test file and the results file. The test files have 2040names of the form `<test>.txt`, where `<test>` is the name of the test, and the 2041results files have names of the form `<test>_results.txt`. 2042 2043If the test file exists but the results file does not, the results for that test 2044are generated by a GNU-compatible `bc` or `dc`. See the [Generated Tests][143] 2045section. 2046 2047The `all.txt` file in each standard tests directory is what tells the test suite 2048and [build system][142] what tests there are, and the tests are either run in 2049that order, or in the case of parallel `make`, that is the order that the 2050targets are listed as prerequisites of `make test`. 2051 2052If the test exists in the `all.txt` file but does not *actually* exist, the test 2053and its results are generated by a GNU-compatible `bc` or `dc`. See the 2054[Generated Tests][143] section. 2055 2056To add a non-generated standard test, do the following: 2057 2058* Add the test file (`<test>.txt` in the standard tests directory). 2059* Add the results file (`<test>_results.txt` in the standard tests directory). 2060 You can skip this step if just the results file needs to be generated. See the 2061 [Generated Tests][147] section for more information. 2062* Add the name of the test to the `all.txt` file in the standard tests 2063 directory, putting it in the order it should be in. If possible, I would put 2064 longer tests near the beginning because they will start running earlier with 2065 parallel `make`. I always keep `decimal` first, though, as a smoke test. 2066 2067If you need to add a generated standard test, see the [Generated Tests][147] 2068section for how to do that. 2069 2070Some standard tests need to be skipped in certain cases. That is handled by the 2071[build system][142]. See the [Integration with the Build System][147] section 2072for more details. 2073 2074In addition to all of the above, the standard test directory is not only the 2075directory for the standard tests of each calculator, it is also the parent 2076directory of all other test directories for each calculator. 2077 2078#### `bc` Standard Tests 2079 2080The list of current (27 February 2023) standard tests for `bc` is below: 2081 2082decimal 2083 2084: Tests decimal parsing and printing. 2085 2086print 2087 2088: Tests printing in every base from decimal. This is near the top for 2089 performance of parallel testing. 2090 2091parse 2092 2093: Tests parsing in any base and outputting in decimal. This is near the top 2094 for performance of parallel testing. 2095 2096lib2 2097 2098: Tests the extended math library. This is near the top for performance of 2099 parallel testing. 2100 2101print2 2102 2103: Tests printing at the extreme values of `obase`. 2104 2105length 2106 2107: Tests the `length()` builtin function. 2108 2109scale 2110 2111: Tests the `scale()` builtin function. 2112 2113shift 2114 2115: Tests the left (`<<`) and right (`>>`) shift operators. 2116 2117add 2118 2119: Tests addition. 2120 2121subtract 2122 2123: Tests subtraction. 2124 2125multiply 2126 2127: Tests multiplication. 2128 2129divide 2130 2131: Tests division. 2132 2133modulus 2134 2135: Tests modulus. 2136 2137power 2138 2139: Tests power (exponentiation). 2140 2141sqrt 2142 2143: Tests the `sqrt()` (square root) builtin function. 2144 2145trunc 2146 2147: Tests the truncation (`$`) operator. 2148 2149places 2150 2151: Tests the places (`@`) operator. 2152 2153vars 2154 2155: Tests some usage of variables. This one came from [AFL++][125] I think. 2156 2157boolean 2158 2159: Tests boolean operators. 2160 2161comp 2162 2163: Tests comparison operators. 2164 2165abs 2166 2167: Tests the `abs()` builtin function. 2168 2169assignments 2170 2171: Tests assignment operators, including increment/decrement operators. 2172 2173functions 2174 2175: Tests functions, specifically function parameters being replaced before they 2176 themselves are used. See the comment in `bc_program_call()` about the last 2177 condition. 2178 2179scientific 2180 2181: Tests scientific notation. 2182 2183engineering 2184 2185: Tests engineering notation. 2186 2187globals 2188 2189: Tests that assigning to globals affects callers. 2190 2191strings 2192 2193: Tests strings. 2194 2195strings2 2196 2197: Tests string allocation in slabs, to ensure slabs work. 2198 2199letters 2200 2201: Tests single and double letter numbers to ensure they behave differently. 2202 Single-letter numbers always be set to the same value, regardless of 2203 `ibase`. 2204 2205exponent 2206 2207: Tests the `e()` function in the math library. 2208 2209log 2210 2211: Tests the `l()` function in the math library. 2212 2213pi 2214 2215: Tests that `bc` produces the right value of pi for numbers with varying 2216 `scale` values. 2217 2218arctangent 2219 2220: Tests the `a()` function in the math library. 2221 2222sine 2223 2224: Tests the `s()` function in the math library. 2225 2226cosine 2227 2228: Tests the `c()` function in the math library. 2229 2230bessel 2231 2232: Tests the `j()` function in the math library. 2233 2234fib 2235 2236: Tests the `fib()` Fibonacci function in the extended math library. 2237 2238arrays 2239 2240: Test arrays. 2241 2242misc 2243 2244: Miscellaneous tests. I named it this because at the time, I struggled to 2245 classify them, but it's really testing multi-line numbers. 2246 2247misc1 2248 2249: A miscellaneous test found by [AFL++][125]. 2250 2251misc2 2252 2253: A miscellaneous test found by [AFL++][125]. 2254 2255misc3 2256 2257: A miscellaneous test found by [AFL++][125]. 2258 2259misc4 2260 2261: A miscellaneous test found by [AFL++][125]. 2262 2263misc5 2264 2265: A miscellaneous test found by [AFL++][125]. 2266 2267misc6 2268 2269: A miscellaneous test found by [AFL++][125]. 2270 2271misc7 2272 2273: A miscellaneous test found by [AFL++][125]. 2274 2275void 2276 2277: Tests void functions. 2278 2279rand 2280 2281: Tests the pseudo-random number generator and its special stack handling. 2282 2283rand_limits 2284 2285: Tests the limits of the pseudo-random number generator `irand()`. 2286 2287recursive_arrays 2288 2289: Tested the slab vector undo ability in used in `bc_parse_name()` when it 2290 existed. Now used as a stress test. 2291 2292divmod 2293 2294: Tests divmod. 2295 2296modexp 2297 2298: Tests modular exponentiation. 2299 2300bitfuncs 2301 2302: Tests the bitwise functions, `band()`, `bor()`, `bxor()`, `blshift()` and 2303 `brshift()` in [`gen/lib2.bc`][26]. 2304 2305leadingzero 2306 2307: Tests the leading zero functionality and the `plz*()` and `pnlz*()` 2308 functions in [`gen/lib2.bc`][26]. 2309 2310is_number 2311 2312: Tests the `is_number()` built-in function. 2313 2314is_string 2315 2316: Tests the `is_number()` built-in function. 2317 2318asciify_array 2319 2320: Tests the ability of `asciify()` to convert an array into a full string. 2321 2322line_by_line1 2323 2324: Tests the line-by-line behavior of `bc` with regards to `quit` in a function 2325 definition. 2326 2327line_by_line2 2328 2329: Tests the line-by-line behavior of `bc` with regards to `quit`. 2330 2331line_loop_quit1 2332 2333: Tests the behavior of `bc` with a `quit` after a single-line loop. 2334 2335line_loop_quit2 2336 2337: Tests the behavior of `bc` with a `quit` after a single-line loop and a 2338 newline escape. 2339 2340#### `dc` Standard Tests 2341 2342The list of current (27 February 2023) standard tests for `dc` is below: 2343 2344decimal 2345 2346: Tests decimal parsing and printing. 2347 2348length 2349 2350: Tests the `length()` builtin function, including for strings and arrays. 2351 2352stack_len 2353 2354: Tests taking the length of the results stack. 2355 2356exec_stack_len 2357 2358: Tests taking the length of the execution stack. 2359 2360add 2361 2362: Tests addition. 2363 2364subtract 2365 2366: Tests subtraction. 2367 2368multiply 2369 2370: Tests multiplication. 2371 2372divide 2373 2374: Tests division. 2375 2376modulus 2377 2378: Tests modulus. 2379 2380divmod 2381 2382: Tests divmod. 2383 2384power 2385 2386: Tests power (exponentiation). 2387 2388sqrt 2389 2390: Tests the `sqrt()` (square root) builtin function. 2391 2392modexp 2393 2394: Tests modular exponentiation. 2395 2396boolean 2397 2398: Tests boolean operators. 2399 2400negate 2401 2402: Tests negation as a command and as part of numbers. 2403 2404trunc 2405 2406: Tests the truncation (`$`) operator. 2407 2408places 2409 2410: Tests the places (`@`) operator. 2411 2412shift 2413 2414: Tests the left (`<<`) and right (`>>`) shift operators. 2415 2416abs 2417 2418: Tests the `abs()` builtin function (the `b` command). 2419 2420scientific 2421 2422: Tests scientific notation. 2423 2424engineering 2425 2426: Tests engineering notation. 2427 2428vars 2429 2430: Tests some usage of variables. This one came from [AFL++][125] I think. 2431 2432misc 2433 2434: Miscellaneous tests. I named it this because at the time, I struggled to 2435 classify them. 2436 2437misc1 2438 2439: More miscellaneous tests. This used to be an error file 2440 (`tests/dc/errors/15.txt`) due to the presence of a invalid `u` character. 2441 However, starting with version `6.1.0`, `u` became a valid command 2442 (`is_number()`), so the error file became valid. It was checked manually and 2443 moved, and `tests/dc/errors/34.txt` became the new `tests/dc/errors/15.txt`. 2444 2445strings 2446 2447: Tests strings. 2448 2449rand 2450 2451: Tests the pseudo-random number generator and its special stack handling. 2452 2453is_number 2454 2455: Tests the `is_number()` built-in function (the `u` command). 2456 2457is_string 2458 2459: Tests the `is_number()` built-in function (the `t` command). 2460 2461### Script Tests 2462 2463The heavy lifting of testing the scripting of `bc` is done by the "script tests" 2464for each calculator. 2465 2466These tests use the files in the [`tests/bc/scripts/`][169] and 2467[`tests/dc/scripts/`][170] directories (except for 2468[`tests/bc/scripts/all.txt`][171] and [`tests/dc/scripts/all.txt`][172]), which 2469are called the "script test directories." 2470 2471To add a script test, do the following: 2472 2473* Add the test file (`<test>.bc` or `<test>.dc` in the script tests directory). 2474* Add the results file (`<test>.txt` in the script tests directory). You can 2475 skip this step if just the results file needs to be generated. See the 2476 [Generated Tests][147] section for more information. 2477* Add the name of the test to the `all.txt` file in the script tests directory, 2478 putting it in the order it should be in. If possible, I would put longer tests 2479 near the beginning because they will start running earlier with parallel 2480 `make`. 2481 2482Some script tests need to be skipped in certain cases. That is handled by the 2483[build system][142]. See the [Integration with the Build System][147] section 2484for more details. 2485 2486Another unique thing about the script tests, at least for `bc`: they test the 2487`-g` and `--global-stacks` flags. This means that all of the script tests for 2488`bc` are written assuming the `-g` flag was given on the command-line 2489 2490There is one extra piece of script tests: [`tests/script.sed`][190]. This `sed` 2491script is used to remove an incompatibility with GNU `bc`. 2492 2493If there is only one more character to print at the end of `BC_LINE_LENGTH`, GNU 2494`bc` still prints a backslash+newline+digit combo. OpenBSD doesn't, which is 2495correct according to my reading of the `bc` spec, so my `bc` doesn't as well. 2496 2497The `sed` script edits numbers that end with just one digit on a line by itself 2498to put it on the same line as others. 2499 2500#### `bc` Script Tests 2501 2502The list of current (27 February 2023) script tests for `bc` is below: 2503 2504print.bc 2505 2506: Tests printing even harder than the print standard test. 2507 2508print2 2509 2510: Tests printing at the extreme edge of line length in various bases. 2511 2512multiply.bc 2513 2514: Tests multiplication even harder than the multiply standard test. 2515 2516divide.bc 2517 2518: Tests division even harder than the divide standard test. 2519 2520subtract.bc 2521 2522: Tests subtraction even harder than the subtract standard test. 2523 2524add.bc 2525 2526: Tests addition even harder than the add standard test. 2527 2528parse.bc 2529 2530: Tests parsing even harder than the parse standard test. 2531 2532root.bc 2533 2534: Tests that `root()` and `cbrt()` do not go into an infinite loop on a 2535 pathological case found by a user. 2536 2537array.bc 2538 2539: Tests arrays even harder than the arrays standard test. 2540 2541array2.bc 2542 2543: Implements a test where an array element is passed as a parameter to a 2544 function, and then another array is passed to a later parameter that is 2545 named the same as the first array. This was added because of a bug found 2546 while writing a script in bc. 2547 2548atan.bc 2549 2550: Tests arctangent even harder than the arctangent standard test. 2551 2552bessel.bc 2553 2554: Tests bessel even harder than the bessel standard test. 2555 2556functions.bc 2557 2558: Tests functions even harder than the functions standard test. 2559 2560globals.bc 2561 2562: Tests global stacks directly. 2563 2564len.bc 2565 2566: Tests the `length()` builtin on arrays. 2567 2568rand.bc 2569 2570: Tests the random number generator in the presence of global stacks. 2571 2572references.bc 2573 2574: Tests functions with array reference parameters. 2575 2576screen.bc 2577 2578: A random script provided by an early user that he used to calculate the size 2579 of computer screens 2580 2581strings2.bc 2582 2583: Tests escaping in strings. 2584 2585ifs.bc 2586 2587: Tests proper ending of `if` statements without `else` statements. 2588 2589ifs2.bc 2590 2591: More tests proper ending of `if` statements without `else` statements. 2592 2593#### `dc` Script Tests 2594 2595The list of current (27 February 2023) script tests for `dc` is below: 2596 2597prime.dc 2598 2599: Tests scripting by generating the first 100,000 primes. 2600 2601asciify.dc 2602 2603: Tests the asciify command. 2604 2605stream.dc 2606 2607: Tests the stream command. 2608 2609array.dc 2610 2611: Tests arrays. 2612 2613else.dc 2614 2615: Tests else clauses on conditional execution commands. 2616 2617factorial.dc 2618 2619: Tests scripting with factorial. 2620 2621loop.dc 2622 2623: Tests scripting by implementing loops. 2624 2625quit.dc 2626 2627: Tests the quit command in the presence of tail calls. 2628 2629weird.dc 2630 2631: A miscellaneous test. 2632 2633no_clamp.dc 2634 2635: A test to ensure `dc` has the same behavior as the BSD `dc` with digi 2636 clamping off when parsing numbers. 2637 2638### Error Tests 2639 2640One of the most useful parts of the `bc` test suite, in my opinion, is the heavy 2641testing of error conditions. 2642 2643Just about every error condition I can think of is tested, along with many 2644machine-generated (by [AFL++][125]) ones. 2645 2646However, because the error tests will often return error codes, they require 2647different infrastructure from the rest of the test suite, which assumes that 2648the calculator under test will return successfully. A lot of that infrastructure 2649is in the [`scripts/functions.sh`][105] script, but it basically allows the 2650calculator to exit with an error code and then tests that there *was* an error 2651code. 2652 2653Besides returning error codes, error tests also ensure that there is output from 2654`stderr`. This is to make sure that an error message is always printed. 2655 2656The error tests for each calculator are spread through two directories, due to 2657historical accident. These two directories are the standard test directory (see 2658the [Standard Tests][149] section) and the `errors/` directory directly 2659underneath the standard tests directory. 2660 2661This split is convenient, however, because the tests in each directory are 2662treated differently. 2663 2664The error tests in the standard test directory, which include `errors.txt` for 2665both calculators, `posix_errors.txt` for `bc`, and `read_errors.txt` for both, 2666are run by [`tests/errors.sh`][226]. It reads them line-by-line and shoves the 2667data through `stdin`. Each line is considered a separate test. For this reason, 2668there can't be any blank lines in the error files in the standard tests 2669directory because a blank line causes a successful exit. 2670 2671On the other hand, the tests in the `errors/` directory below the standard tests 2672directory are run by [`tests/error.sh`][227] and are considered to be one test 2673per file. As such, they are used differently. They are shoved into the 2674calculator through `stdin`, but they are also executed by passing them on the 2675command-line. 2676 2677To add an error test, first figure out which kind you want. 2678 2679Is it a simple one-liner, and you don't care if it's tested through a file? 2680 2681Then put it in one of the error files in the standard test directory. I would 2682only put POSIX errors in the `posix_errors.txt` file for `bc`, and only `read()` 2683errors in the `read_errors.txt` file for `dc`; all others I would put in the 2684respective `errors.txt` file. 2685 2686On the other hand, if you care if the error is run as a file on the 2687command-line, or the error requires multiple lines to reproduce, then put the 2688test in the respective `errors/` directory and run the [`configure.sh`][69] 2689script again. 2690 2691After that, you are done; the test suite will automatically pick up the new 2692test, and you don't have to tell the test suite the expected results. 2693 2694### `stdin` Tests 2695 2696The `stdin` tests specifically test the lexing and parsing of multi-line 2697comments and strings. This is important because when reading from `stdin`, the 2698calculators can only read one line at a time, so partial parses are possible. 2699 2700To add `stdin` tests, just add the tests to the `stdin.txt` file in the 2701respective standard tests directory, and add the expected results in the 2702`stdin_results.txt` in the respective standard tests directory. 2703 2704### `read()` Tests 2705 2706The `read()` tests are meant to test the `read()` builtin function, to ensure 2707that the parsing and execution is correct. 2708 2709Each line is one test, as that is the nature of using the `read()` function, so 2710to add a test, just add it as another line in the `read.txt` file in the 2711respective standard tests directory, and add its result to the 2712`read_results.txt` file in the respective standard tests directory. 2713 2714### Other Tests 2715 2716The "other" tests are just random tests that I could not easily classify under 2717other types of tests. They usually include things like command-line parsing and 2718environment variable testing. 2719 2720To add an other test, it requires adding the programming for it to 2721[`tests/other.sh`][195] because all of the tests are written specifically in 2722that script. It would be best to use the infrastructure in 2723[`scripts/functions.sh`][105]. 2724 2725### Linux `timeconst.bc` Script 2726 2727One special script that `bc`'s test suite will use is the [Linux `timeconst.bc` 2728script][6]. 2729 2730I made the test suite able to use this script because the reason the 2731[toybox][16] maintainer wanted my `bc` is because of this script, and I wanted 2732to be sure that it would run correctly on the script. 2733 2734However, it is not part of the distribution, nor is it part of the repository. 2735The reason for this is because [`timeconst.bc`][6] is under the GPL, while this 2736repo is under a BSD license. 2737 2738If you want `bc` to run tests on [`timeconst.bc`][6], download it and place it 2739at `tests/bc/scripts/timeconst.bc`. If it is there, the test suite will 2740automatically run its tests; otherwise, it will skip it. 2741 2742### History Tests 2743 2744There are automatic tests for history; however, they have dependencies: Python 3 2745and [`pexpect`][137]. 2746 2747As a result, because I need the [test suite to be portable][138], like the rest 2748of `bc`, the history tests are carefully guarded with things to ensure that they 2749are skipped, rather than failing if Python and [`pexpect`][137] are not 2750installed. For this reason, there is a `sh` script, [`tests/history.sh`][140] 2751that runs the actual script, [`tests/history.py`][139]. 2752 2753I have added as many tests as I could to cover as many lines and branches as 2754possible. I guess I could have done more, but doing so would have required a lot 2755of time. 2756 2757I have tried to make it as easy as possible to run the history tests. They will 2758run automatically if you use the `make test_history` command, and they will also 2759use parallel execution with `make -j<cores> test_history`. 2760 2761However, the history tests are meant only to be run by maintainers of `bc`; they 2762are *not* meant to be run by users and packagers. The reason for this is that 2763they only seem to work reliably on Linux; `pexpect` seems to have issues on 2764other platforms, especially timeout issues. 2765 2766Thus, they are excluded from running with `make test` and [`tests/all.sh`][225]. 2767However, they can be run from the [`scripts/release.sh`][83] script. 2768 2769All of the tests are contained in [`tests/history.py`][139]. The reason for this 2770is because they are in Python, and I don't have an easy way of including Python 2771(or at the very least, I am not familiar enough with Python to do that). So they 2772are all in the same file to make it easier on me. 2773 2774Each test is one function in the script. They all take the same number and type 2775of arguments: 2776 27771. `exe`: the executable to run. 27782. `args`: the arguments to pass to the executable. 27793. `env`: the environment. 2780 2781Each function creates a child process with `pexpect.spawn` and then tests with 2782that child. Then the function returns the child to the caller, who closes it 2783and checks its error code against its expected error code. 2784 2785Yes, the error code is not a success all the time. This is because of the UTF-8 2786tests; `bc` gives a fatal error on any non-ASCII data because ASCII is all `bc` 2787is required to handle, per the [standard][2]. 2788 2789So in [`tests/history.py`][139], there are four main arrays: 2790 2791* `bc` test functions, 2792* `bc` expected error codes. 2793* `dc` test functions. 2794* `dc` expected error codes. 2795 2796[`tests/history.py`][139] takes an index as an argument; that index is what test 2797it should run. That index is used to index into the proper test and error code 2798array. 2799 2800If you need to add more history tests, you need to do the following: 2801 28021. Add the function for that test to [`tests/history.py`][139]. 28032. Add the function to the proper array of tests. 28043. Add the expected error code to the proper array of error codes. 28054. Add a target for the test to [`Makefile.in`][70]. 28065. Add that target as a prerequisite to either `test_bc_history` or 2807 `test_dc_history`. 2808 2809You do not need to do anything to add the test to `history_all_tests` (see 2810[Group Tests][141] below) because the scripts will automatically run all of the 2811tests properly. 2812 2813### Generated Tests 2814 2815Some tests are *large*, and as such, it is impractical to check them into `git`. 2816Instead, the tests depend on the existence of a GNU-compatible `bc` in the 2817`PATH`, which is then used to generate the tests. 2818 2819If [`configure.sh`][69] was run with the `-G` argument, which disables generated 2820tests, then `make test` and friends will automatically skip generated tests. 2821This is useful to do on platforms that are not guaranteed to have a 2822GNU-compatible `bc` installed. 2823 2824However, adding a generated test is a complicated because you have to figure out 2825*where* you want to put the file to generate the test. 2826 2827For example, `bc`'s test suite will automatically use a GNU-compatible `bc` to 2828generate a `<test>_results.txt` file in the [standard tests][149] directory 2829(either `tests/bc/` or `tests/dc/`) if none exists for the `<test>` test. If no 2830`<test>.txt` file exists in the [standard tests][149] directory, then `bc`'s 2831test suite will look for a `<test>.bc` or `<test>.dc` file in the [script 2832tests][150] directory (either `tests/bc/scripts` or `tests/dc/scripts`), and if 2833that exists, it will use that script to generate the `<test>.txt` file in the 2834[standard tests][149] directory after which it will generate the 2835`<test>_results.txt` file in the [standard tests][149] directory. 2836 2837So you can choose to either: 2838 2839* Have a test in the [standard tests][149] directory without a corresponding 2840 `*_results.txt` file, or 2841* Have a script in the [script tests][150] directory to generate the 2842 corresponding file in the standard test directory before generating the 2843 corresponding `*_results.txt` file. 2844 2845Adding a script has a double benefit: the script itself can be used as a test. 2846However, script test results can also be generated. 2847 2848If `bc` is asked to run a script test, then if the script does not exist, `bc`'s 2849test suite returns an error. If it *does* exist, but no corresponding 2850`<test>.txt` file exists in the [script tests][150] directory, then a 2851GNU-compatible `bc` is used to generate the `<test>.txt` results file. 2852 2853If generated tests are disabled through [`configure.sh`][69], then these tests 2854are not generated if they do not exist. However, if they *do* exist, then they 2855are run. This can happen if a `make clean_tests` was not run between a build 2856that generated tests and a build that will not. 2857 2858### Group Tests 2859 2860While the test suite has a lot of targets in order to get parallel execution, 2861there are five targets that allow you to run each section, or all, of the test 2862suite as one unit: 2863 2864* `bc_all_tests` (`bc` tests) 2865* `timeconst_all_tests` ([Linux `timeconst.bc` script][6] tests) 2866* `dc_all_tests` (`dc` tests) 2867* `history_all_tests` (history tests) 2868* `run_all_tests` (combination of the previous four) 2869 2870In addition, there are more fine-grained targets available: 2871 2872* `test_bc` runs all `bc` tests (except history tests). 2873* `test_dc` runs all `dc` tests (except history tests). 2874* `test_bc_tests` runs all `bc` [standard tests][149]. 2875* `test_dc_tests` runs all `dc` [standard tests][149]. 2876* `test_bc_scripts` runs all `bc` [script tests][150]. 2877* `test_dc_scripts` runs all `dc` [script tests][150]. 2878* `test_bc_stdin` runs the `bc` [`stdin` tests][152]. 2879* `test_dc_stdin` runs the `dc` [`stdin` tests][152]. 2880* `test_bc_read` runs the `bc` [`read()` tests][153]. 2881* `test_dc_read` runs the `dc` [`read()` tests][153]. 2882* `test_bc_errors` runs the `bc` [error tests][151]. 2883* `test_dc_errors` runs the `dc` [error tests][151]. 2884* `test_bc_other` runs the `bc` [other tests][151]. 2885* `test_dc_other` runs the `dc` [other tests][151]. 2886* `timeconst` runs the tests for the [Linux `timeconst.bc` script][6]. 2887* `test_history` runs all history tests. 2888* `test_bc_history` runs all `bc` history tests. 2889* `test_dc_history` runs all `dc` history tests. 2890 2891All of the above tests are parallelizable. 2892 2893### Individual Tests 2894 2895In addition to all of the above, individual test targets are available. These 2896are mostly useful for attempting to fix a singular test failure. 2897 2898These tests are: 2899 2900* `test_bc_<test>`, where `<test>` is the name of a `bc` [standard test][149]. 2901 The name is the name of the test file without the `.txt` extension. It is the 2902 name printed by the test suite when running the test. 2903* `test_dc_<test>`, where `<test>` is the name of a `dc` [standard test][149]. 2904 The name is the name of the test file without the `.txt` extension. It is the 2905 name printed by the test suite when running the test. 2906* `test_bc_script_<test>`, where `<test>` is the name of a `bc` [script 2907 test][150]. The name of the test is the name of the script without the `.bc` 2908 extension. 2909* `test_dc_script_<test>`, where `<test>` is the name of a `dc` [script 2910 test][150]. The name of the test is the name of the script without the `.dc` 2911 extension. 2912* `test_bc_history<idx>` runs the `bc` history test with index `<idx>`. 2913* `test_dc_history<idx>` runs the `dc` history test with index `<idx>`. 2914 2915### [`bcl`][156] Test 2916 2917When [`bcl`][156] is built, the [build system][142] automatically ensures that 2918`make test` runs the [`bcl`][156] test instead of the `bc` and `dc` tests. 2919 2920There is only one test, and it is built from [`tests/bcl.c`][158]. 2921 2922The reason the test is in C is because [`bcl`][156] is a C library; I did not 2923want to have to write C code *and* POSIX `sh` scripts to run it. 2924 2925The reason there is only one test is because most of the code for the library is 2926tested by virtue of testing `bc` and `dc`; the test needs to only ensure that 2927the library bindings and plumbing do not interfere with the underlying code. 2928 2929However, just because there is only one test does not mean that it doesn't test 2930more than one thing. The code actually handles a series of tests, along with 2931error checking to ensure that nothing went wrong. 2932 2933To add a [`bcl`][156] test, just figure out what test you want, figure out where 2934in the [`tests/bcl.c`][158] would be best to put it, and put it there. Do as 2935much error checking as possible, and use the `err(BclError)` function. Ensure 2936that all memory is freed because that test is run through [Valgrind][159] and 2937[AddressSanitizer][160]. 2938 2939### Integration with the Build System 2940 2941If it was not obvious by now, the test suite is heavily integrated into the 2942[build system][142], but the integration goes further than just making the test 2943suite easy to run from `make` and generating individual and group tests. 2944 2945The big problem the test suite has is that some `bc` code, stuff that is 2946important to test, is only in *some* builds. This includes all of the extra math 2947extensions, for example. 2948 2949So the test suite needs to have some way of turning off the tests that depend on 2950certain [build types][81] when those [build types][81] are not used. 2951 2952This is the reason the is tightly integrated with the [build system][142]: the 2953[build system][142] knows what [build type][81] was used and can tell the test 2954suite to turn off the tests that do not apply. 2955 2956It does this with arguments to the test scripts that are either a `1` or a `0`, 2957depending on whether tests of that type should be enabled or not. These 2958arguments are why I suggest, in the [Test Scripts][148] section, to always use a 2959`make` target to run the test suite or any individual test. I have added a lot 2960of targets to make this easy and as fast as possible. 2961 2962In addition to all of that, the build system is responsible for selecting the 2963`bc`/`dc` tests or the [`bcl` test][157]. 2964 2965### Output Directories 2966 2967During any run of the test suite, the test suite outputs the results of running 2968various tests to files. These files are usually output to `tests/bc_outputs/` 2969and `tests/dc_outputs/`. 2970 2971However, in some cases, it may be necessary to output test results to a 2972different directory. If that is the case, set the environment variable 2973`BC_TEST_OUTPUT_DIR` to the name of the directory. 2974 2975If that is done, then test results will be written to 2976`$BC_TEST_OUTPUT_DIR/bc_outputs/` and `$BC_TEST_OUTPUT_DIR/dc_outputs/`. 2977 2978### Test Suite Portability 2979 2980The test suite is meant to be run by users and packagers as part of their 2981install process. 2982 2983This puts some constraints on the test suite, but the biggest is that the test 2984suite must be as [portable as `bc` itself][136]. 2985 2986This means that the test suite must be implemented in pure POSIX `make`, `sh`, 2987and C99. 2988 2989#### Test Scripts 2990 2991To accomplish the portability, the test suite is run by a bunch of `sh` scripts 2992that have the constraints laid out in [POSIX Shell Scripts][76]. 2993 2994However, that means they have some quirks, made worse by the fact that there are 2995[generated tests][143] and [tests that need to be skipped, but only 2996sometimes][147]. 2997 2998This means that a lot of the scripts take an awkward number and type of 2999arguments. Some arguments are strings, but most are integers, like 3000[`scripts/release.sh`][83]. 3001 3002It is for this reason that I do not suggest running the test scripts directly. 3003Instead, always use an appropriate `make` target, which already knows the 3004correct arguments for the test because of the [integration with the build 3005system][147]. 3006 3007### Test Coverage 3008 3009In order to get test coverage information, you need `gcc`, `gcov`, and `gcovr`. 3010 3011If you have them, run the following commands: 3012 3013``` 3014CC=gcc ./configure -gO3 -c 3015make -j<cores> 3016make coverage 3017``` 3018 3019Note that `make coverage` does not have a `-j<cores>` part; it cannot be run in 3020parallel. If you try, you will get errors. And note that `CC=gcc` is used. 3021 3022After running those commands, you can open your web browser and open the 3023`index.html` file in the root directory of the repo. From there, you can explore 3024all of the coverage results. 3025 3026If you see lines or branches that you think you could hit with a manual 3027execution, do such manual execution, and then run the following command: 3028 3029``` 3030make coverage_output 3031``` 3032 3033and the coverage output will be updated. 3034 3035If you want to rerun `make coverage`, you must do a `make clean` and build 3036first, like this: 3037 3038``` 3039make clean 3040make -j<cores> 3041make coverage 3042``` 3043 3044Otherwise, you will get errors. 3045 3046If you want to run tests in parallel, you can do this: 3047 3048``` 3049make -j<cores> 3050make -j<cores> test 3051make coverage_output 3052``` 3053 3054and that will generate coverage output correctly. 3055 3056### [AddressSanitizer][21] and Friends 3057 3058To run the test suite under [AddressSanitizer][21] or any of its friends, use 3059the following commands: 3060 3061``` 3062CFLAGS="-fsanitize=<sanitizer> ./configure -gO3 -m 3063make -j<cores> 3064make -j<cores> test 3065``` 3066 3067where `<sanitizer>` is the correct name of the desired sanitizer. There is one 3068exception to the above: `UndefinedBehaviorSanitizer` should be run on a build 3069that has zero optimization, so for `UBSan`, use the following commands: 3070 3071``` 3072CFLAGS="-fsanitize=undefined" ./configure -gO0 -m 3073make -j<cores> 3074make -j<cores> test 3075``` 3076 3077### [Valgrind][20] 3078 3079To run the test suite under [Valgrind][20], run the following commands: 3080 3081``` 3082./configure -gO3 -v 3083make -j<cores> 3084make -j<cores> test 3085``` 3086 3087It really is that easy. I have directly added infrastructure to the build system 3088and the test suite to ensure that if [Valgrind][20] detects any memory errors or 3089any memory leaks at all, it will tell the test suite infrastructure to report an 3090error and exit accordingly. 3091 3092## POSIX Shell Scripts 3093 3094There is a lot of shell scripts in this repository, and every single one of them 3095is written in pure [POSIX `sh`][72]. 3096 3097The reason that they are written in [POSIX `sh`][72] is for *portability*: POSIX 3098systems are only guaranteed to have a barebones implementation of `sh` 3099available. 3100 3101There are *many* snares for unwary programmers attempting to modify 3102[`configure.sh`][69], any of the scripts in this directory, [`strgen.sh`][9], or 3103any of the scripts in [`tests/`][77]. Here are some of them: 3104 31051. No `bash`-isms. 31062. Only POSIX standard utilities are allowed. 31073. Only command-line options defined in the POSIX standard for POSIX utilities 3108 are allowed. 31094. Only the standardized behavior of POSIX utilities is allowed. 31105. Functions return data by *printing* it. Using `return` sets their exit code. 3111 3112In other words, the script must only use what is standardized in the [`sh`][72] 3113and [Shell Command Language][73] standards in POSIX. This is *hard*. It precludes 3114things like `local` and the `[[ ]]` notation. 3115 3116These are *enormous* restrictions and must be tested properly. I put out at 3117least one release with a change to `configure.sh` that wasn't portable. That was 3118an embarrassing mistake. 3119 3120The lack of `local`, by the way, is why variables in functions are named with 3121the form: 3122 3123``` 3124_<function_name>_<var_name> 3125``` 3126 3127This is done to prevent any clashes of variable names with already existing 3128names. And this applies to *all* shell scripts. However, there are a few times 3129when that naming convention is *not* used; all of them are because those 3130functions are required to change variables in the global scope. 3131 3132### Maintainer-Only Scripts 3133 3134If a script is meant to be used for maintainers (of `bc`, not package 3135maintainers), then rules 2, 3, and 4 don't need to be followed as much because 3136it is assumed that maintainers will be able to install whatever tools are 3137necessary to do the job. 3138 3139## Manuals 3140 3141The manuals for `bc` and `dc` are all generated, and the manpages for `bc`, 3142`dc`, and `bcl` are also generated. 3143 3144Why? 3145 3146I don't like the format of manpages, and I am not confident in my ability to 3147write them. Also, they are not easy to read on the web. 3148 3149So that explains why `bcl`'s manpage is generated from its markdown version. But 3150why are the markdown versions of the `bc` and `dc` generated? 3151 3152Because the content of the manuals needs to change based on the [build 3153type][81]. For example, if `bc` was built with no history support, it should not 3154have the **COMMAND LINE HISTORY** section in its manual. If it did, that would 3155just confuse users. 3156 3157So the markdown manuals for `bc` and `dc` are generated from templates 3158([`manuals/bc.1.md.in`][89] and [`manuals/dc.1.md.in`][90]). And from there, 3159the manpages are generated from the generated manuals. 3160 3161The generated manpage for `bcl` ([`manuals/bcl.3`][62]) is checked into version 3162control, and the generated markdown manuals and manpages for `bc` 3163([`manuals/bc`][79]) and `dc` ([`manuals/dc`][80]) are as well. 3164 3165This is because generating the manuals and manpages requires a heavy dependency 3166that only maintainers should care about: [Pandoc][92]. Because users [should not 3167have to install *any* dependencies][136], the files are generated, checked into 3168version control, and included in distribution tarballs. 3169 3170If you run [`configure.sh`][69], you have an easy way of generating the markdown 3171manuals and manpages: just run `make manpages`. This target calls 3172[`scripts/manpage.sh`][60] appropriately for `bc`, `dc`, and `bcl`. 3173 3174For more on how generating manuals and manpages works, see 3175[`scripts/manpage.sh`][60]. 3176 3177## Locales 3178 3179The locale system of `bc` is enormously complex, but that's because 3180POSIX-compatible locales are terrible. 3181 3182How are they terrible? 3183 3184First, `gencat` does not work for generating cross-compilation. In other words, 3185it does not generate machine-portable files. There's nothing I can do about 3186this except for warn users. 3187 3188Second, the format of `.msg` files is...interesting. Thank goodness it is text 3189because otherwise, it would be impossible to get them right. 3190 3191Third, `.msg` files are not used. In other words, `gencat` exists. Why? 3192 3193Fourth, `$NLSPATH` is an awful way to set where and *how* to install locales. 3194 3195Yes, where and *how*. 3196 3197Obviously, from it's name, it's a path, and that's the where. The *how* is more 3198complicated. 3199 3200It's actually *not* a path, but a path template. It's a format string, and it 3201can have a few format specifiers. For more information on that, see [this 3202link][84]. But in essence, those format specifiers configure how each locale is 3203supposed to be installed. 3204 3205With all those problems, why use POSIX locales? Portability, as always. I can't 3206assume that `gettext` will be available, but I *can* pretty well assume that 3207POSIX locales will be available. 3208 3209The locale system of `bc` includes all files under [`locales/`][85], 3210[`scripts/locale_install.sh`][87], [`scripts/locale_uninstall.sh`][88], 3211[`scripts/functions.sh`][105], the `bc_err_*` constants in [`src/data.c`][131], 3212and the parts of the build system needed to activate it. There is also code in 3213[`src/vm.c`][58] (in `bc_vm_gettext()`) for loading the current locale. 3214 3215If the order of error messages and/or categories are changed, the order of 3216errors must be changed in the enum, the default error messages and categories in 3217[`src/data.c`][131], and all of the messages and categories in the `.msg` files 3218under [`locales/`][85]. 3219 3220## Static Analysis 3221 3222I do *some* static analysis on `bc`. 3223 3224I used to use [Coverity][196], but I stopped using it when it started giving me 3225too many false positives and also because it had a vulnerability. 3226 3227However, I still use the [Clang Static Analyzer][197] through 3228[`scan-build`][19]. I only use it in debug mode because I have to add some 3229special code to make it not complain about things that are definitely not a 3230problem. 3231 3232The most frequent example of false positives is where a local is passed to a 3233function to be initialized. [`scan-build`][19] misses that fact, so I 3234pre-initialize such locals to prevent the warnings. 3235 3236To run `scan-build`, do the following: 3237 3238``` 3239make clean 3240scan-build make 3241``` 3242 3243`scan-build` will print its warnings to `stdout`. 3244 3245## Fuzzing 3246 3247The quality of this `bc` is directly related to the amount of fuzzing I did. As 3248such, I spent a lot of work making the fuzzing convenient and fast, though I do 3249admit that it took me a long time to admit that it did need to be faster. 3250 3251First, there were several things which make fuzzing fast: 3252 3253* Using [AFL++][125]'s deferred initialization. 3254* Splitting `bc`'s corpuses. 3255* Parallel fuzzing. 3256 3257Second, there are several things which make fuzzing convenient: 3258 3259* Preprepared input corpuses. 3260* [`scripts/fuzz_prep.sh`][119]. 3261* `tmux` and `tmuxp` configs. 3262* [`scripts/afl.py`][94]. 3263 3264### Fuzzing Performance 3265 3266Fuzzing with [AFL++][125] can be ***SLOW***. Spending the time to make it as 3267fast as possible is well worth the time. 3268 3269However, there is a caveat to the above: it is easy to make [AFL++][125] crash, 3270be unstable, or be unable to find "paths" (see [AFL++ Quickstart][129]) if the 3271performance enhancements are done poorly. 3272 3273To stop [AFL++][125] from crashing on test cases, and to be stable, these are 3274the requirements: 3275 3276* The state at startup must be *exactly* the same. 3277* The virtual memory setup at startup must be *exactly* the same. 3278 3279The first isn't too hard; it's the second that is difficult. 3280 3281`bc` allocates a lot of memory at start. ("A lot" is relative; it's far less 3282than most programs.) After going through an execution run, however, some of that 3283memory, while it could be cleared and reset, is in different places because of 3284vectors. Since vectors reallocate, their allocations are not guaranteed to be in 3285the same place. 3286 3287So to make all three work, I had to set up the deferred initialization and 3288persistent mode *before* any memory was allocated (except for `vm.jmp_bufs`, 3289which is probably what caused the stability to drop below 100%). However, using 3290deferred alone let me put the [AFL++][125] initialization further back. This 3291works because [AFL++][125] sets up a `fork()` server that `fork()`'s `bc` right 3292at that call. Thus, every run has the exact same virtual memory setup, and each 3293run can skip all of the setup code. 3294 3295I tested `bc` using [AFL++][125]'s deferred initialization, plus persistent 3296mode, plus shared memory fuzzing. In order to do it safely, with stability above 329799%, all of that was actually *slower* than using just deferred initialization 3298with the initialization *right before* `stdin` was read. And as a bonus, the 3299stability in that situation is 100%. 3300 3301As a result, my [AFL++][125] setup only uses deferred initialization. That's the 3302`__AFL_INIT()` call. 3303 3304(Note: there is one more big item that must be done in order to have 100% 3305stability: the pseudo-random number generator *must* start with *exactly* the 3306same seed for every run. This is set up with the `tmux` and `tmuxp` configs that 3307I talk about below in [Convenience][130]. This seed is set before the 3308`__AFL_INIT()` call, so setting it has no runtime cost for each run, but without 3309it, stability would be abysmal.) 3310 3311On top of that, while `dc` is plenty fast under fuzzing (because of a faster 3312parser and less test cases), `bc` can be slow. So I have split the `bc` input 3313corpus into three parts, and I set fuzzers to run on each individually. This 3314means that they will duplicate work, but they will also find more stuff. 3315 3316On top of all of that, each input corpus (the three `bc` corpuses and the one 3317`dc` corpus) is set to run with 4 fuzzers. That works out perfectly for two 3318reasons: first, my machine has 16 cores, and second, the [AFL++][125] docs 3319recommend 4 parallel fuzzers, at least, to run different "power schedules." 3320 3321### Convenience 3322 3323The preprepared input corpuses are contained in the 3324`tests/fuzzing/bc_inputs{1,2,3}/`, and `tests/fuzzing/dc_inputs` directories. 3325There are three `bc` directories and only one `dc` directory because `bc`'s 3326input corpuses are about three times as large, and `bc` is a larger program; 3327it's going to need much more fuzzing. 3328 3329(They do share code though, so fuzzing all of them still tests a lot of the same 3330math code.) 3331 3332The next feature of convenience is the [`scripts/fuzz_prep.sh`][119] script. It 3333assumes the existence of `afl-clang-lto` in the `$PATH`, but if that exists, it 3334automatically configures and builds `bc` with a fuzz-ideal build. 3335 3336A fuzz-ideal build has several things: 3337 3338* `afl-clang-lto` as the compiler. (See [AFL++ Quickstart][129].) 3339* Debug mode, to crash as easily as possible. 3340* Full optimization (including [Link-Time Optimization][126]), for performance. 3341* [AFL++][125]'s deferred initialization (see [Fuzzing Performance][127] above). 3342* And `AFL_HARDEN=1` during the build to harden the build. See the [AFL++][125] 3343 documentation for more information. 3344 3345There is one big thing that a fuzz-ideal build does *not* have: it does not use 3346[AFL++][125]'s `libdislocator.so`. This is because `libdislocator.so` crashes if 3347it fails to allocate memory. I do not want to consider those as crashes because 3348my `bc` does, in fact, handle them gracefully by exiting with a set error code. 3349So `libdislocator.so` is not an option. 3350 3351However, to add to [`scripts/fuzz_prep.sh`][119] making a fuzz-ideal build, in 3352`tests/fuzzing/`, there are two `yaml` files: [`tests/fuzzing/bc_afl.yaml`][120] 3353and [`tests/fuzzing/bc_afl_continue.yaml`][121]. These files are meant to be 3354used with [`tmux`][122] and [`tmuxp`][123]. While other programmers will have to 3355adjust the `start_directory` item, once it is adjusted, then using this command: 3356 3357``` 3358tmuxp load tests/fuzzing/bc_afl.yaml 3359``` 3360 3361will start fuzzing. 3362 3363In other words, to start fuzzing, the sequence is: 3364 3365``` 3366./scripts/fuzz_prep.sh 3367tmuxp load tests/fuzzing/bc_afl.yaml 3368``` 3369 3370Doing that will load, in `tmux`, 16 separate instances of [AFL++][125], 12 on 3371`bc` and 4 on `dc`. The outputs will be put into the 3372`tests/fuzzing/bc_outputs{1,2,3}/` and `tests/fuzzing/dc_outputs/` directories. 3373 3374(Note that loading that config will also delete all unsaved [AFL++][125] output 3375from the output directories.) 3376 3377Sometimes, [AFL++][125] will report crashes when there are none. When crashes 3378are reported, I always run the following command: 3379 3380``` 3381./scripts/afl.py <dir> 3382``` 3383 3384where `dir` is one of `bc1`, `bc2`, `bc3`, or `dc`, depending on which of the 338516 instances reported the crash. If it was one of the first four (`bc11` through 3386`bc14`), I use `bc1`. If it was one of the second four (`bc21` through `bc24`, I 3387use `bc2`. If it was one of the third four (`bc31` through `bc34`, I use `bc3`. 3388And if it was `dc`, I use `dc`. 3389 3390The [`scripts/afl.py`][94] script will report whether [AFL++][125] correctly 3391reported a crash or not. If so, it will copy the crashing test case to 3392`.test.txt` and tell you whether it was from running it as a file or through 3393`stdin`. 3394 3395From there, I personally always investigate the crash and fix it. Then, when the 3396crash is fixed, I either move `.test.txt` to `tests/{bc,dc}/errors/<idx>.txt` as 3397an error test (if it produces an error) or I create a new 3398`tests/{bc,dc}/misc<idx>.txt` test for it and a corresponding results file. (See 3399[Test Suite][124] for more information about the test suite.) In either case, 3400`<idx>` is the next number for a file in that particular place. For example, if 3401the last file in `tests/{bc,dc}/errors/` is `tests/{bc,dc}/errors/18.txt`, I 3402move `.test.txt` to `tests/bc/error/19.txt`. 3403 3404Then I immediately run [`scripts/afl.py`][94] again to find the next crash 3405because often, [AFL++][125] found multiple test cases that trigger the same 3406crash. If it finds another, I repeat the process until it is happy. 3407 3408Once it *is* happy, I do the same `fuzz_prep.sh`, `tmuxp load` sequence and 3409restart fuzzing. Why do I restart instead of continuing? Because with the 3410changes, the test outputs could be stale and invalid. 3411 3412However, there *is* a case where I continue: if [`scripts/afl.py`][94] finds 3413that every crash reported by [AFL++][125] is invalid. If that's the case, I can 3414just continue with the command: 3415 3416``` 3417tmuxp load tests/fuzzing/bc_afl_continue.yaml 3418``` 3419 3420(Note: I admit that I usually run [`scripts/afl.py`][94] while the fuzzer is 3421still running, so often, I don't find a need to continue since there was no 3422stop. However, the capability is there, if needed.) 3423 3424In addition, my fuzzing setup, including the `tmux` and `tmuxp` configs, 3425automatically set up [AFL++][125] power schedules (see [Fuzzing 3426Performance][127] above). They also set up the parallel fuzzing such that there 3427is one fuzzer in each group of 4 that does deterministic fuzzing. It's always 3428the first one in each group. 3429 3430For more information about deterministic fuzzing, see the [AFL++][125] 3431documentation. 3432 3433### Corpuses 3434 3435I occasionally add to the input corpuses. These files come from new files in the 3436[Test Suite][124]. In fact, I use soft links when the files are the same. 3437 3438However, when I add new files to an input corpus, I sometimes reduce the size of 3439the file by removing some redundancies. 3440 3441And then, when adding to the `bc` corpuses, I try to add them evenly so that 3442each corpus will take about the same amount of time to get to a finished state. 3443 3444### [AFL++][125] Quickstart 3445 3446The way [AFL++][125] works is complicated. 3447 3448First, it is the one to invoke the compiler. It leverages the compiler to add 3449code to the binary to help it know when certain branches are taken. 3450 3451Then, when fuzzing, it uses that branch information to generate information 3452about the "path" that was taken through the binary. 3453 3454I don't know what AFL++ counts as a new path, but each new path is added to an 3455output corpus, and it is later used as a springboard to find new paths. 3456 3457This is what makes AFL++ so effective: it's not just blindly thrashing a binary; 3458it adapts to the binary by leveraging information about paths. 3459 3460### Fuzzing Runs 3461 3462For doing a fuzzing run, I expect about a week or two where my computer is 3463basically unusable, except for text editing and light web browsing. 3464 3465Yes, it can take two weeks for me to do a full fuzzing run, and that does *not* 3466include the time needed to find and fix crashes; it only counts the time on the 3467*last* run, the one that does not find any crashes. This means that the entire 3468process can take a month or more. 3469 3470What I use as an indicator that the fuzzing run is good enough is when the 3471number of "Pending" paths (see [AFL++ Quickstart][129] above) for all fuzzer 3472instances, except maybe the deterministic instances, is below 50. And even then, 3473I try to let deterministic instances get that far as well. 3474 3475You can see how many pending paths are left in the "path geometry" section of 3476the [AFL++][125] dashboard. 3477 3478Also, to make [AFL++][125] quit, you need to send it a `SIGINT`, either with 3479`Ctrl+c` or some other method. It will not quit until you tell it to. 3480 3481### Radamsa 3482 3483I rarely use [Radamsa][99] instead of [AFL++][125]. In fact, it's only happened 3484once. 3485 3486The reason I use [Radamsa][99] instead of [AFL++][125] is because it is easier 3487to use with varying command-line arguments, which was needed for testing `bc`'s 3488command-line expression parsing code, and [AFL++][125] is best when testing 3489input from `stdin`. 3490 3491[`scripts/radamsa.sh`][100] does also do fuzzing on the [AFL++][125] inputs, but 3492it's not as effective at that, so I don't really use it for that either. 3493 3494[`scripts/radamsa.sh`][100] and [Radamsa][99] were only really used once; I have 3495not had to touch the command-line expression parsing code since. 3496 3497### [AddressSanitizer][21] with Fuzzing 3498 3499One advantage of using [AFL++][125] is that it saves every test case that 3500generated a new path (see [AFL++ Quickstart][129] above), and it doesn't delete 3501them when the user makes it quit. 3502 3503Keeping them around is not a good idea, for several reasons: 3504 3505* They are frequently large. 3506* There are a lot of them. 3507* They go stale; after `bc` is changed, the generated paths may not be valid 3508 anymore. 3509 3510However, before they are deleted, they can definitely be leveraged for even 3511*more* bug squashing by running *all* of the paths through a build of `bc` with 3512[AddressSanitizer][21]. 3513 3514This can easily be done with these four commands: 3515 3516``` 3517./scripts/fuzz_prep.sh -a 3518./scripts/afl.py --asan bc1 3519./scripts/afl.py --asan bc2 3520./scripts/afl.py --asan bc3 3521./scripts/afl.py --asan dc 3522``` 3523 3524(By the way, the last four commands could be run in separate terminals to do the 3525processing in parallel.) 3526 3527These commands build an [ASan][21]-enabled build of `bc` and `dc` and then they 3528run `bc` and `dc` on all of the found crashes and path output corpuses. This is 3529to check that no path or crash has found any memory errors, including memory 3530leaks. 3531 3532Because the output corpuses can contain test cases that generate infinite loops 3533in `bc` or `dc`, [`scripts/afl.py`][94] has a timeout of 8 seconds, which is far 3534greater than the timeout that [AFL++][125] uses and should be enough to catch 3535any crash. 3536 3537If [AFL++][125] fails to find crashes *and* [ASan][21] fails to find memory 3538errors on the outputs of [AFL++][125], that is an excellent indicator of very 3539few bugs in `bc`, and a release can be made with confidence. 3540 3541## Code Concepts 3542 3543This section is about concepts that, if understood, will make it easier to 3544understand the code as it is written. 3545 3546The concepts in this section are not found in a single source file, but they are 3547littered throughout the code. That's why I am writing them all down in a single 3548place. 3549 3550### POSIX Mode 3551 3552POSIX mode is `bc`-only. 3553 3554In fact, POSIX mode is two different modes: Standard Mode and Warning Mode. 3555These modes are designed to help users write POSIX-compatible `bc` scripts. 3556 3557#### Standard Mode 3558 3559Standard Mode is activated with the `-s` or `--standard` flags. 3560 3561In this mode, `bc` will error if any constructs are used that are not strictly 3562compatible with the [POSIX `bc` specification][2]. 3563 3564#### Warning Mode 3565 3566Warning Mode is activated with the `-w` or `--warn` flags. 3567 3568In this mode, `bc` will issue warnings, but continue, if any constructs are used 3569that are not strictly compatible with the [POSIX `bc` specification][2]. 3570 3571### Memory Management 3572 3573The memory management in `bc` is simple: everything is owned by one thing. 3574 3575If something is in a vector, it is owned by that vector. 3576 3577If something is contained in a struct, it is owned by that struct with one 3578exception: structs can be given pointers to other things, but only if those 3579other things will outlast the struct itself. 3580 3581As an example, the `BcParse` struct has a pointer to the one `BcProgram` in 3582`bc`. This is okay because the program is initialized first and deallocated 3583last. 3584 3585In other words, it's simple: if a field in a struct is a pointer, then unless 3586that pointer is directly allocated by the struct (like the vector array or the 3587number limb array), that struct does not own the item at that pointer. 3588Otherwise, the struct *does* own the item. 3589 3590### [Async-Signal-Safe][115] Signal Handling 3591 3592`bc` is not the typical Unix utility. Most Unix utilities are I/O bound, but 3593`bc` is, by and large, CPU-bound. This has several consequences, but the biggest 3594is that there is no easy way to allow signals to interrupt it. 3595 3596This consequence is not obvious, but it comes from the fact that a lot of I/O 3597operations can be interrupted and return [`EINTR`][198]. This makes such I/O 3598calls natural places for allowing signals to interrupt execution, even when the 3599signal comes during execution, and not interrupting I/O calls. The way this is 3600done is setting a flag in the signal handler, which is checked around the time 3601of the I/O call, when it is convenient. 3602 3603Alternatively, I/O bound programs can use the [self-pipe trick][199]. 3604 3605Neither of these are possible in `bc` because the execution of math code can 3606take a long time. If a signal arrives during this long execution time, setting a 3607flag like an I/O bound application and waiting until the next I/O call could 3608take seconds, minutes, hours, or even days. (Last I checked, my `bc` takes a 3609week to calculate a million digits of pi, and it's not slow as far as `bc` 3610implementations go.) 3611 3612Thus, using just the technique of setting the flag just will not work for an 3613interactive calculator. 3614 3615Well, it can, but it requires a lot of code and massive inefficiencies. I know 3616this because that was the original implementation. 3617 3618The original implementation set a flag and just exit the signal handler. Then, 3619on just about every loop header, I have a check for the signal flag. These 3620checks happened on every iteration of every loop. It was a massive waste because 3621it was polling, and [polling is evil][200]. 3622 3623So for version [3.0.0][32], I expended a lot of effort to change the 3624implementation. 3625 3626In the new system, code *outside* the signal handler sets a flag (`vm.sig_lock`) 3627to tell the signal handler whether it can use `longjmp()` to stop the current 3628execution. If so, it does. If not, it sets a flag, which then is used by the 3629code outside the signal handler that set the `vm.sig_lock` flag. When that code 3630unsets `vm.sig_lock`, it checks to see if a signal happened, and if so, that 3631code executes the `longjmp()` and stops the current execution. 3632 3633Other than that, the rest of the interrupt-based implementation is best 3634described in the [Error Handling][97]. 3635 3636However, there are rules for signal handlers that I must lay out. 3637 3638First, signal handlers can only call [async-signal-safe][115] functions. 3639 3640Second, any field set or read by both the signal handler and normal code must be 3641a `volatile sig_atomic_t`. 3642 3643Third, when setting such fields, they must be set to constants and no math can 3644be done on them. This restriction and the above restriction exist in order to 3645ensure that the setting of the fields is always atomic with respect to signals. 3646 3647These rules exist for *any* code using Unix signal handlers, not just `bc`. 3648 3649#### Vectors and Numbers 3650 3651Vectors and numbers needed special consideration with the interrupt-based signal 3652handling. 3653 3654When vectors and numbers are about to allocate, or *reallocate* their arrays, 3655they need to lock signals to ensure that they do not call `malloc()` and friends 3656and get interrupted by a signal because, as you will see in the [Error 3657Handling][97] section, `longjmp()` cannot be used in a signal handler if it may 3658be able to interrupt a non-[async-signal-safe][115] function like `malloc()` and 3659friends. 3660 3661### Asserts 3662 3663If you asked me what procedure is used the most in `bc`, I would reply without 3664hesitation, "`assert()`." 3665 3666I use `assert()` everywhere. In fact, it is what made [fuzzing][82] with 3667[AFL++][125] so effective. [AFL++][125] is incredibly good at finding crashes, 3668and a failing `assert()` counts as one. 3669 3670So while a lot of bad bugs might have corrupted data and *not* caused crashes, 3671because I put in so many `assert()`'s, they were *turned into* crashing bugs, 3672and [AFL++][125] found them. 3673 3674By far, the most bugs it found this way was in the `bc` parser. (See the [`bc` 3675Parsing][110] for more information.) And even though I was careful to put 3676`assert()`'s everywhere, most parser bugs manifested during execution of 3677bytecode because the virtual machine assumes the bytecode is valid. 3678 3679Sidenote: one of those bugs caused an infinite recursion when running the sine 3680(`s()`) function in the math library, so yes, parser bugs can be *very* weird. 3681 3682Anyway, the way I did `assert()`'s was like this: whenever I realized that I 3683had put assumptions into the code, I would put an `assert()` there to test it 3684**and** to *document* it. 3685 3686Yes, documentation. In fact, by far the best documentation of the code in `bc` 3687is actually the `assert()`'s. The only time I would not put an `assert()` to 3688test an assumption is if that assumption was already tested by an `assert()` 3689earlier. 3690 3691As an example, if a function calls another function and passes a pointer that 3692the caller previously `assert()`'ed was *not* `NULL`, then the callee does not 3693have to `assert()` it too, unless *also* called by another function that does 3694not `assert()` that. 3695 3696At first glance, it may seem like putting asserts for pointers being non-`NULL` 3697everywhere would actually be good, but unfortunately, not for fuzzing. Each 3698`assert()` is a branch, and [AFL++][125] rates its own effectiveness based on 3699how many branches it covers. If there are too many `assert()`'s, it may think 3700that it is not being effective and that more fuzzing is needed. 3701 3702This means that `assert()`'s show up most often in two places: function 3703preconditions and function postconditions. 3704 3705Function preconditions are `assert()`'s that test conditions relating to the 3706arguments a function was given. They appear at the top of the function, usually 3707before anything else (except maybe initializing a local variable). 3708 3709Function postconditions are `assert()`'s that test the return values or other 3710conditions when a function exits. These are at the bottom of a function or just 3711before a `return` statement. 3712 3713The other `assert()`'s cover various miscellaneous assumptions. 3714 3715If you change the code, I ***HIGHLY*** suggest that you use `assert()`'s to 3716document your assumptions. And don't remove them when [AFL++][125] gleefully 3717crashes `bc` and `dc` over and over again. 3718 3719### Vectors 3720 3721In `bc`, vectors mean resizable arrays, and they are the most fundamental piece 3722of code in the entire codebase. 3723 3724I had previously written a [vector implementation][112], which I used to guide 3725my decisions, but I wrote a new one so that `bc` would not have a dependency. I 3726also didn't make it as sophisticated; the one in `bc` is very simple. 3727 3728Vectors store some information about the type that they hold: 3729 3730* The size (as returned by `sizeof`). 3731* An enum designating the destructor. 3732 3733If the destructor is `BC_DTOR_NONE`, it is counted as the type not having a 3734destructor. 3735 3736But by storing the size, the vector can then allocate `size * cap` bytes, where 3737`cap` is the capacity. Then, when growing the vector, the `cap` is doubled again 3738and again until it is bigger than the requested size. 3739 3740But to store items, or to push items, or even to return items, the vector has to 3741figure out where they are, since to it, the array just looks like an array of 3742bytes. 3743 3744It does this by calculating a pointer to the underlying type with 3745`v + (i * size)`, where `v` is the array of bytes, `i` is the index of the 3746desired element, and `size` is the size of the underlying type. 3747 3748Doing that, vectors can avoid undefined behavior (because `char` pointers can 3749be cast to any other pointer type), while calculating the exact position of 3750every element. 3751 3752Because it can do that, it can figure out where to push new elements by 3753calculating `v + (len * size)`, where `len` is the number of items actually in 3754the vector. 3755 3756By the way, `len` is different from `cap`. While `cap` is the amount of storage 3757*available*, `len` is the number of actual elements in the vector at the present 3758point in time. 3759 3760Growing the vector happens when `len` is equal to `cap` *before* pushing new 3761items, not after. 3762 3763To add a destructor, you need to add an enum item to `BcDtorType` in 3764[`include/vector.h`][174] and add the actual destructor in the same place as the 3765enum item in the `bc_vec_dtors[]` array in [`src/data.c`][131]. 3766 3767#### Pointer Invalidation 3768 3769There is one big danger with the vectors as currently implemented: pointer 3770invalidation. 3771 3772If a piece of code receives a pointer from a vector, then adds an item to the 3773vector before they finish using the pointer, that code must then update the 3774pointer from the vector again. 3775 3776This is because any pointer inside the vector is calculated based off of the 3777array in the vector, and when the vector grows, it can `realloc()` the array, 3778which may move it in memory. If that is done, any pointer returned by 3779`bc_vec_item()`, `bc_vec_top()` and `bc_vec_item_rev()` will be invalid. 3780 3781This fact was the single most common cause of crashes in the early days of this 3782`bc`; wherever I have put a comment about pointers becoming invalidated and 3783updating them with another call to `bc_vec_item()` and friends, *do **NOT** 3784remove that code!* 3785 3786#### Maps 3787 3788Maps in `bc` are...not. 3789 3790They are really a combination of two vectors. Those combinations are easily 3791recognized in the source because one vector is named `<name>s` (plural), and the 3792other is named `<name>_map`. 3793 3794There are currently three, all in `BcProgram`: 3795 3796* `fns` and `fn_map` (`bc` functions). 3797* `vars` and `var_map` (variables). 3798* `arrs` and `arr_map` (arrays). 3799 3800They work like this: the `<name>_map` vector holds `BcId`'s, which just holds a 3801string and an index. The string is the name of the item, and the index is the 3802index of that item in the `<name>s` vector. 3803 3804Obviously, I could have just done a linear search for items in the `<name>s` 3805vector, but that would be slow with a lot of functions/variables/arrays. 3806Instead, I ensure that whenever an item is inserted into the `<name>_map` 3807vector, the item is inserted in sorted order. This means that the `<name>_map` 3808is always sorted (by the names of the items). 3809 3810So when looking up an item in the "map", what is really done is this: 3811 38121. A binary search is carried out on the names in the `<name>_map` vector. 38132. When one is found, it returns the index in the `<name>_map` vector where the 3814 item was found. 38153. This index is then used to retrieve the `BcId`. 38164. The index from the `BcId` is then used to index into the `<name>s` vector, 3817 which returns the *actual* desired item. 3818 3819Why were the `<name>s` and `<name>_map` vectors not combined for ease? The 3820answer is that sometime, when attempting to insert into the "map", code might 3821find that something is already there. For example, a function with that name may 3822already exist, or the variable might already exist. 3823 3824If the insert fails, then the name already exists, and the inserting code can 3825forego creating a new item to put into the vector. However, if there is no item, 3826the inserting code must create a new item and insert it. 3827 3828If the two vectors were combined together, it would not be possible to separate 3829the steps such that creating a new item could be avoided if it already exists. 3830 3831#### Slabs and Slab Vectors 3832 3833`bc` allocates *a lot* of small strings, and small allocations are the toughest 3834for general-purpose allocators to handle efficiently. 3835 3836Because of that reason, I decided to create a system for allocating small 3837strings using something that I call a "slab vector" after [slab 3838allocators][201]. 3839 3840These vectors allocate what I call "slabs," which are just an allocation of a 3841single page with a length to tell the slab how much of the slab is used. 3842 3843The vector itself holds slabs, and when the slab vector is asked to allocate a 3844string, it attempts to in the last slab. If that slab cannot do so, it allocates 3845a new slab and allocates from that. 3846 3847There is one exception: if a string is going to be bigger than 128 bytes, then 3848the string is directly allocated, and a slab is created with that pointer and a 3849length of `SIZE_MAX`, which tells the slab vector that it is a direct 3850allocation. Then, the last slab is pushed into the next spot and the new special 3851slab is put into the vacated spot. This ensures that a non-special slab is 3852always last. 3853 3854### Command-Line History 3855 3856When I first wrote `bc`, I immediately started using it in order to eat my own 3857dog food. 3858 3859It sucked, and the biggest reason why was because of the lack of command-line 3860history. 3861 3862At first, I just dealt with it, not knowing how command-line history might be 3863implemented. 3864 3865Eventually, I caved and attempted to adapt [`linenoise-mob`][28], which I had 3866known about for some time. 3867 3868It turned out to be easier than I thought; the hardest part was the tedious 3869renaming of everything to fit the `bc` naming scheme. 3870 3871Understanding command-line history in `bc` is really about understanding VT-100 3872escape codes, so I would start there. 3873 3874Now, the history implementation of `bc` has been adapted far beyond that initial 3875adaptation to make the command-line history implementation perfect for `bc` 3876alone, including integrating it into `bc`'s [Custom I/O][114] and making sure 3877that it does not disturb output that did not end with a newline. 3878 3879On top of that, at one point, I attempted to get history to work on Windows. It 3880barely worked after a lot of work and a lot of portability code, but even with 3881all of that, it does not have at least one feature: multi-line pasting from the 3882clipboard. 3883 3884#### Editline and Readline 3885 3886I also implemented an integration with both editline and readline, though not at 3887the same time. This allows distributions that want to use either one in `bc` to 3888do so. This enables users to use their `.editrc` and `.inputrc` settings. 3889 3890The integration is custom to each library, and it does *not* involve any of 3891`bc`'s custom history code. It also required changes to `bc`'s [Custom 3892I/O][114]. 3893 3894### Error Handling 3895 3896The error handling on `bc` got an overhaul for version [`3.0.0`][32], and it 3897became one of the things that taught me the most about C in particular and 3898programming in general. 3899 3900Before then, error handling was manual. Almost all functions returned a 3901`BcStatus` indicating if an error had occurred. This led to a proliferation of 3902lines like: 3903 3904``` 3905if (BC_ERR(s)) return s; 3906``` 3907 3908In fact, a quick and dirty count of such lines in version `2.7.2` (the last 3909version before [`3.0.0`][32]) turned up 252 occurrences of that sort of line. 3910 3911And that didn't even guarantee that return values were checked *everywhere*. 3912 3913But before I can continue, let me back up a bit. 3914 3915From the beginning, I decided that I would not do what GNU `bc` does on errors; 3916it tries to find a point at which it can recover. Instead, I decided that I 3917would have `bc` reset to a clean slate, which I believed, would reduce the 3918number of bugs where an unclean state caused errors with continuing execution. 3919 3920So from the beginning, errors would essentially unwind the stack until they got 3921to a safe place from which to clean the slate, reset, and ask for more input. 3922 3923Well, if that weren't enough, `bc` also has to handle [POSIX signals][113]. As 3924such, it had a signal handler that set a flag. But it could not safely interrupt 3925execution, so that's all it could do. 3926 3927In order to actually respond to the signal, I had to litter checks for the flag 3928*everywhere* in the code. And I mean *everywhere*. They had to be checked on 3929every iteration of *every* loop. They had to be checked going into and out of 3930certain functions. 3931 3932It was a mess. 3933 3934But fortunately for me, signals did the same thing that errors did: they unwound 3935the stack to the *same* place. 3936 3937Do you see where I am going with this? 3938 3939It turns out that what I needed was a [async-signal-safe][115] form of what 3940programmers call "exceptions" in other languages. 3941 3942I knew that [`setjmp()`][116] and [`longjmp()`][117] are used in C to implement 3943exceptions, so I thought I would learn how to use them. How hard could it be? 3944 3945Quite hard, it turns out, especially in the presence of signals. And that's 3946because there are a few big snares: 3947 39481. The value of any local variables are not guaranteed to be preserved after a 3949 `longjmp()` back into a function. 39502. While `longjmp()` is required to be [async-signal-safe][115], if it is 3951 invoked by a signal handler that interrupted a non-[async-signal-safe][115] 3952 function, then the behavior is undefined. 39533. Any mutation that is not guaranteed to be atomic with respect to signals may 3954 be incomplete when a signal arrives. 3955 3956Oh boy. 3957 3958For number 1, the answer to this is to hide data that must stay changed behind 3959pointers. Only the *pointers* are considered local, so as long as I didn't do 3960any modifying pointer arithmetic, pointers and their data would be safe. For 3961cases where I have local data that must change and stay changed, I needed to 3962*undo* the `setjmp()`, do the change, and the *redo* the `setjmp()`. 3963 3964For number 2 and number 3, `bc` needs some way to tell the signal handler that 3965it cannot do a `longjmp()`. This is done by "locking" signals with a `volatile 3966sig_atomic_t`. (For more information, see the [Async-Signal-Safe Signal 3967Handling][173] section.) For every function that calls a function that is not 3968async-signal-safe, they first need to use `BC_SIG_LOCK` to lock signals, and 3969afterward, use `BC_SIG_UNLOCK` to unlock signals. 3970 3971Code also need to do this for all global, non-atomic mutation, which means that 3972modifying any part of the `BcVm` global struct. 3973 3974`BC_SIG_UNLOCK` has another requirement: it must check for signals or errors and 3975jump if necessary. 3976 3977On top of all of that, *all* functions with cleanup needed to be able to run 3978their cleanup. This meant that `longjmp()` could not just jump to the finish; it 3979had to start what I call a "jump series," using a stack of `jmp_buf`'s 3980(`jmp_bufs` in `BcVm`). Each `longjmp()` uses the top of the `jmp_bufs` stack to 3981execute its jump. Then, if the cleanup code was executed because of a jump, the 3982cleanup code was responsible for continuing the jump series by popping the 3983previous item off the stack and using the new top of the stack for a jump. 3984 3985In this way, C++-style exceptions were implemented in pure C. Not fun, but it 3986works. However, the order of operations matters, especially in the macros that 3987help implement the error handling. 3988 3989For example, in `BC_UNSETJMP`, signals are unlocked before checking for signals. 3990If a signal comes between, that's fine; it will still cause a jump to the right 3991place. However, disabling the lock after could mean that a signal could come 3992*after* checking for signals, but before signals were unlocked, causing the 3993handling of the signal to be delayed. 3994 3995#### Custom I/O 3996 3997Why did I implement my own buffered I/O for `bc`? Because I use `setjmp()` and 3998`longjmp()` for error handling (see the [Error Handling][97] section), and the 3999buffered I/O in `libc` does not interact well with the use of those procedures; 4000all of the buffered I/O API is basically non-[async-signal-safe][115]. 4001 4002Implementing custom buffered I/O had other benefits. First, it allowed me to 4003tightly integrate history with the I/O code. Second, it allowed me to make 4004changes to history in order to make it adapt to user prompts. 4005 4006### Lexing 4007 4008To simplify parsing, both calculators use lexers to turn the text into a more 4009easily-parsable form. 4010 4011While some tokens are only one character long, others require many tokens, and 4012some of those need to store all of the text corresponding to the token for use 4013by the parsers. Tokens that need to store their corresponding text include, but 4014are not limited to: 4015 4016* Strings. 4017* Numbers. 4018* Identifiers. 4019 4020For this purpose, the lexer has a [vector][111] named `str` to store the data 4021for tokens. This data is overwritten if another token is lexed that needs to 4022store data, so the parsers need to copy the data before calling the lexer again. 4023 4024Both lexers do some of the same things: 4025 4026* Lex identifiers into tokens, storing the identifier in `str`. 4027* Lex number strings into tokens, storing the string in `str`. 4028* Lex whitespace. 4029* Lex comments. 4030 4031Other than that, and some common plumbing, the lexers have separate code. 4032 4033#### `dc` Lexing 4034 4035The `dc` lexer is remarkably simple; in fact, besides [`src/main.c`][205], 4036[`src/bc.c`][40], and [`src/dc.c`][44], which just contain one function each, 4037the only file smaller than [`src/dc_lex.c`][45] is [`src/args.c`][206], which 4038just processes command-line arguments after they are parsed by 4039[`src/opt.c`][51]. 4040 4041For most characters, the `dc` lexer is able to convert directly from the 4042character to its corresponding token. This happens using `dc_lex_tokens[]` in 4043[`src/data.c`][131]. 4044 4045`dc`'s lexer also has to lex the register name after lexing tokens for commands 4046that need registers. 4047 4048And finally, `dc`'s lexer needs to parse `dc` strings, which is the only part of 4049the `dc` lexer that is more complex than the `bc` lexer. This is because `dc` 4050strings need to have a balanced number of brackets. 4051 4052#### `bc` Lexing 4053 4054The `bc` lexer is fairly simple. It does the following things: 4055 4056* Lexes `bc` strings. 4057* Lexes `bc` identifiers. This is necessary because this is how `bc` keywords 4058 are lexed. After ensuring that an identifier is not a keyword, the `bc` lexer 4059 allows the common identifier function to take over. 4060* Turns characters and groups of characters into `bc` operator tokens. 4061 4062### Parsing 4063 4064The difference between parsing `bc` and `dc` code is...vast. The `dc` parser is 4065simple, while the `bc` parser is the most complex piece of code in the entire 4066codebase. 4067 4068However, they both do some of the same things. 4069 4070First, the parsers do *not* use [abstract syntax trees][207]; instead, they 4071directly generate the bytecode that will be executed by the `BcProgram` code. 4072Even in the case of `bc`, this heavily simplifies the parsing because the 4073[Shunting-Yard Algorithm][109] is designed to generate [Reverse Polish 4074Notation][108], which is basically directly executable. 4075 4076Second, any extra data that the `BcProgram` needs for execution is stored into 4077functions (see the [Functions][208] section). These include constants and 4078strings. 4079 4080#### `dc` Parsing 4081 4082The parser for `dc`, like its lexer, is remarkably simple. In fact, the easiness 4083of lexing and parsing [Reverse Polish notation][108] is probably why it was used 4084for `dc` when it was first created at Bell Labs. 4085 4086For most tokens, the `dc` parser is able to convert directly from the token 4087to its corresponding instruction. This happens using `dc_parse_insts[]` in 4088[`src/data.c`][131]. 4089 4090`dc`'s parser also has to parse the register name for commands that need 4091registers. This is the most complex part of the `dc` parser; each different 4092register command needs to be parsed differently because most of them require two 4093or more instructions to execute properly. 4094 4095For example, storing in a register requires a swap instruction and an assignment 4096instruction. 4097 4098Another example are conditional execution instructions; they need to produce the 4099instruction for the condition, and then they must parse a possible "else" part, 4100which might not exist. 4101 4102##### Existing Commands 4103 4104`dc` is based on commands, which are usually one letter. The following table is 4105a table of which ASCII characters are already used: 4106 4107| Characters | Used? | For... | 4108|------------|-------|--------------------------------------------| 4109| Space | x | Separator | 4110| `!` | x | Conditional Execution of Registers | 4111| `"` | x | Bounded Rand Operator | 4112| `#` | x | Comments | 4113| `$` | x | Truncation | 4114| `%` | x | Modulus | 4115| `&` | | | 4116| `'` | x | Rand Operator | 4117| `(` | x | Greater Than Operator | 4118| `)` | x | Less Than Operator | 4119| `*` | x | Multiplication | 4120| `+` | x | Addition | 4121| `,` | x | Depth of Execution Stack | 4122| `-` | x | Subtraction | 4123| `.` | x | Numbers | 4124| `/` | x | Division | 4125| `0-9` | x | Numbers | 4126| `:` | x | Store into Array | 4127| `;` | x | Load from Array | 4128| `<` | x | Conditional Execution of Registers | 4129| `=` | x | Conditional Execution of Registers | 4130| `>` | x | Conditional Execution of Registers | 4131| `?` | x | Ask for User Input | 4132| `@` | x | Places Operator | 4133| `A-F` | x | Numbers | 4134| `G` | x | Equal Operator | 4135| `H` | x | Shift Left | 4136| `I` | x | Push `ibase` onto Stack | 4137| `J` | x | Push `seed` onto Stack | 4138| `K` | x | Push `scale` onto Stack | 4139| `L` | x | Pop off of Register | 4140| `M` | x | Boolean And Operator | 4141| `N` | x | Boolean Not Operator | 4142| `O` | x | Push `obase` onto Stack | 4143| `P` | x | Byte Stream Printing | 4144| `Q` | x | Quit Some Number of Macros | 4145| `R` | x | Pop Top of Stack | 4146| `S` | x | Push onto Register | 4147| `T` | x | Push Max `ibase` onto Stack | 4148| `U` | x | Push Max `obase` onto Stack | 4149| `V` | x | Push Max `scale` onto Stack | 4150| `W` | x | Push Max of `'` Operator | 4151| `X` | x | Scale of a Number | 4152| `Y` | x | Length of Array | 4153| `Z` | x | Number of Significant Digits | 4154| `[` | x | Strings | 4155| `\\` | x | Escaping Brackets in Strings | 4156| `]` | x | Strings | 4157| `^` | x | Power | 4158| `_` | x | Negative Numbers and Negation | 4159| Backtick | | | 4160| `a` | x | Asciify | 4161| `b` | x | Absolute Value | 4162| `c` | x | Clear Stack | 4163| `d` | x | Duplication of Top of Stack | 4164| `e` | x | Else in Conditional Execution of Registers | 4165| `f` | x | Printing the Stack | 4166| `g` | x | Global Settings | 4167| `h` | x | Shift Right | 4168| `i` | x | Set `ibase` | 4169| `j` | x | Set `seed` | 4170| `k` | x | Set `scale` | 4171| `l` | x | Load from Register | 4172| `m` | x | Boolean Or Operator | 4173| `n` | x | Print and Pop | 4174| `o` | x | Set `obase` | 4175| `p` | x | Print with Newline | 4176| `q` | x | Quit Two Macros | 4177| `r` | x | Swap Top Two Items | 4178| `s` | x | Store into Register | 4179| `t` | x | Equivalent of `bc`'s `is_number()` | 4180| `u` | x | Equivalent of `bc`'s `is_string()` | 4181| `v` | x | Square Root | 4182| `w` | | | 4183| `x` | x | Execute String | 4184| `y` | x | Current Depth of a Register | 4185| `z` | x | Current Depth of Stack | 4186| `{` | x | Greater Than or Equal Operator | 4187| `\|` | x | Moduler Exponentiation | 4188| `}` | x | Less Than or Equal Operator | 4189| `~` | x | Division and Modulus Combined | 4190 4191#### `bc` Parsing 4192 4193`bc`'s parser is, by far, the most sensitive piece of code in this software, and 4194there is a very big reason for that: `bc`'s standard is awful and defined a very 4195poor language. 4196 4197The standard says that either semicolons or newlines can end statements. Trying 4198to parse the end of a statement when it can either be a newline or a semicolon 4199is subtle. Doing it in the presence of control flow constructs that do not have 4200to use braces is even harder. 4201 4202And then comes the biggest complication of all: `bc` has to assume that it is 4203*always* at a REPL (Read-Eval-Print Loop). `bc` is, first and foremost, an 4204*interactive* utility. 4205 4206##### Flags 4207 4208All of this means that `bc` has to be able to partially parse something, store 4209enough data to recreate that state later, and return, making sure to not 4210execute anything in the meantime. 4211 4212*That* is what the flags in [`include/bc.h`][106] are: they are the state that 4213`bc` is saving for itself. 4214 4215It saves them in a stack, by the way, because it's possible to nest 4216structures, just like any other programming language. Thus, not only does it 4217have to store state, it needs to do it arbitrarily, and still be able to 4218come back to it. 4219 4220So `bc` stores its parser state with flags in a stack. Careful setting of these 4221flags, along with properly using them and maintaining the flag stack, are what 4222make `bc` parsing work, but it's complicated. In fact, as I mentioned, the `bc` 4223parser is the single most subtle, fickle, and sensitive piece of code in the 4224entire codebase. Only one thing came close once: square root, and that was only 4225sensitive because I wrote it wrong. This parser is pretty good, and it is 4226*still* sensitive. And flags are the reason why. 4227 4228For more information about what individual flags there are, see the comments in 4229[`include/bc.h`][106]. 4230 4231##### Labels 4232 4233`bc`'s language is Turing-complete. That means that code needs the ability to 4234jump around, specifically to implement control flow like `if` statements and 4235loops. 4236 4237`bc` handles this while parsing with what I called "labels." 4238 4239Labels are markers in the bytecode. They are stored in functions alongside the 4240bytecode, and they are just indices into the bytecode. 4241 4242When the `bc` parser creates a label, it pushes an index onto the labels array, 4243and the index of the label in that array is the index that will be inserted into 4244the bytecode. 4245 4246Then, when a jump happens, the index pulled out of the bytecode is used to index 4247the labels array, and the label (index) at the index is then used to set the 4248instruction pointer. 4249 4250##### Cond Labels 4251 4252"Cond" labels are so-called because they are used by conditionals. 4253 4254The key to them is that they come *before* the code that uses them. In other 4255words, when jumping to a condition, code is jumping *backwards*. 4256 4257This means that when a cond label is created, the value that should go there is 4258well-known. Cond labels are easy. 4259 4260However, they are still stored on a stack so that the parser knows what cond 4261label to use. 4262 4263##### Exit Labels 4264 4265Exit labels are not so easy. 4266 4267"Exit" labels are so-called because they are used by code "exiting" out of `if` 4268statements or loops. 4269 4270The key to them is that they come *after* the code that uses them. In other 4271words, when jumping to an exit, code is jumping *forwards*. 4272 4273But this means that when an exit label is created, the value that should go 4274there is *not* known. The code that needs it must be parsed and generated first. 4275 4276That means that exit labels are created with the index of `SIZE_MAX`, which is 4277then specifically checked for with an assert in `bc_program_exec()` before using 4278those indices. 4279 4280There should ***NEVER*** be a case when an exit label is not filled in properly 4281if the parser has no bugs. This is because every `if` statement, every loop, 4282must have an exit, so the exit must be set. If not, there is a bug. 4283 4284Exit labels are also stored on a stack so that the parser knows what exit label 4285to use. 4286 4287##### Expression Parsing 4288 4289`bc` has expressions like you might expect in a typical programming language. 4290This means [infix notation][107]. 4291 4292One thing about infix notation is that you can't just generate code straight 4293from it like you can with [Reverse Polish notation][108]. It requires more work 4294to shape it into a form that works for execution on a stack machine. 4295 4296That extra work is called the [Shunting-Yard algorithm][109], and the form it 4297translates infix notation into is...[Reverse Polish notation][108]. 4298 4299In order to understand the rest of this section, you must understand the 4300[Shunting-Yard algorithm][109]. Go do that before you read on. 4301 4302###### Operator Stack 4303 4304In `bc`, the [Shunting-Yard algorithm][109] is implemented with bytecode as the 4305output and an explicit operator stack (the `ops` field in `BcParse`) as the 4306operator stack. It stores tokens from `BcLex`. 4307 4308However, there is one **HUGE** hangup: multiple expressions can stack. This 4309means that multiple expressions can be parsed at one time (think an array element 4310expression in the middle of a larger expression). Because of that, we need to 4311keep track of where the previous expression ended. That's what `start` parameter 4312to `bc_parse_operator()` is. 4313 4314Parsing multiple expressions on one operator stack only works because 4315expressions can only *stack*; this means that, if an expression begins before 4316another ends, it must *also* end before that other expression ends. This 4317property ensures that operators will never interfere with each other on the 4318operator stack. 4319 4320Also, the operator precedence is reversed: a lower precedence *value* means a 4321higher precedence. 4322 4323###### Recursion 4324 4325Because expressions can stack, parsing expressions actually requires recursion. 4326Well, it doesn't *require* it, but the code is much more readable that way. 4327 4328This recursion is indirect; the functions that `bc_parse_expr_err()` (the actual 4329expression parsing function) calls can, in turn, call it. 4330 4331###### Expression Flags 4332 4333There is one more big thing: not all expressions in `bc` are equal. 4334 4335Some expressions have requirements that others don't have. For example, only 4336array arguments can be arrays (which are technically not expressions, but are 4337treated as such for parsing), and some operators (in POSIX) are not allowed in 4338certain places. 4339 4340For this reason, functions that are part of the expression parsing 4341infrastructure in `bc`'s parser usually take a `flags` argument. This is meant 4342to be passed to children, and somewhere, they will be checked to ensure that the 4343resulting expression meets its requirements. 4344 4345There are also places where the flags are changed. This is because the 4346requirements change. 4347 4348Maintaining the integrity of the requirements flag set is an important part of 4349the `bc` parser. However, they do not have to be stored on a stack because their 4350stack is implicit from the recursion that expression parsing uses. 4351 4352### Functions 4353 4354Functions, in `bc`, are data structures that contain the bytecode and data 4355produced by the parsers. Functions are what the `BcProgram` program executes. 4356 4357#### Main and Read Functions 4358 4359There are two functions that always exist, which I call the "main" and "read" 4360functions. 4361 4362The "main" function is the function in which any code and data outside other 4363functions is put. Basically, it is the function where the scripting code ends 4364up. 4365 4366The "read" function is the function that is reset and parsed every time a call 4367to the `read()` builtin function happens. 4368 4369#### `dc` Strings 4370 4371In `dc`, strings can be executed, and since there are no actual "functions" in 4372`dc`, strings are handled as functions. In fact, they are effectively translated 4373into functions by parsing. 4374 4375##### Tail Calls 4376 4377Since strings in `dc` are functions, and the fact that `dc` has no native loops, 4378such loops are implemented in `dc` code using strings with conditional execution 4379commands at the end of strings. 4380 4381When such conditional execution, or even unconditional execution, commands are 4382the very last commands in a string, then `dc` can perform a [tail call][202]. 4383 4384This is done by recording the fact that a tail call happened, done by 4385incrementing an integer on a stack. When a string is executed *without* a tail 4386call, a new entry is pushed onto the stack with the integer `1`. 4387 4388When a string finally quits that followed tail calls, its stack entry is popped, 4389eliminating all of those tail calls. 4390 4391Why perform tail calls? Because otherwise, `dc` would be subject to the same 4392thing that plagues [functional programming languages][203]: stack overflow. In 4393`dc`'s case, that would manifest itself as a growing [heap][204], because the 4394execution stack is stored on the heap, until a fatal allocation failure would 4395occur. 4396 4397#### Execution 4398 4399Execution is handled by an interpreter implemented using `BcProgram` and code 4400in [`src/program.c`][53]. 4401 4402The interpreter is a mix between a [stack machine][210] and a [register 4403machine][211]. It is a stack machine in that operations happen on a stack I call 4404the "results stack," but it is a register machine in that items on the stack can 4405be stored to and loaded from "registers" (`dc` terminology), variables (`bc` 4406terminology), and arrays. 4407 4408##### Stacks 4409 4410There are two stacks in the interpreter: 4411 4412* The "results" stack (as mentioned above). 4413* The "execution" stack. 4414 4415The results stack (the `results` field of the `BcProgram` struct) is the stack 4416where the results of computations are stored. It is what makes the interpreter 4417part [stack machine][210]. It is filled with `BcResult`'s. 4418 4419The execution stack (the `stack` field of the `BcProgram` struct) is the stack 4420that tracks the current execution state of the interpreter. It is the presence 4421of this separate stack that allows the interpreter to implement the machine as a 4422loop, rather than recursively. It is filled with `BcInstPtr`'s, which are the 4423"instruction pointers." 4424 4425These instruction pointers have three fields, all integers: 4426 4427* `func`, the index of the function that is currently executing. 4428* `idx`, the index of the next bytecode instruction to execute in the function's 4429 bytecode array. 4430* `len`, which is the length of the results stack when the function started 4431 executing. This is not used by `dc`, but it used by `bc` because functions 4432 in `bc` should never affect the results stack of their callers. 4433 4434With these three fields, and always executing using the instruction pointer at 4435the top of the execution stack, the interpreter can always keep track of its 4436execution. 4437 4438When a function or a string starts executing, a new `BcInstPtr` is pushed onto 4439the execution stack for it. This includes if a function was called recursively. 4440And then, when the function or string returns, its `BcInstPtr` is popped off of 4441the execution stack. 4442 4443##### Bytecode 4444 4445Execution of functions are done through bytecode produced directly by the 4446parsers (see the [Parsing][209]). This bytecode is stored in the `code` 4447[vector][111] of the `BcFunc` struct. 4448 4449This is a vector for two reasons: 4450 4451* It makes it easier to add bytecode to the vector in the parsers. 4452* `bc` allows users to redefine functions. 4453 4454The reason I can use bytecode is because there are less than 256 instructions, 4455so an `unsigned char` can store all the bytecodes. 4456 4457###### Bytecode Indices 4458 4459There is one other factor to bytecode: there are instructions that need to 4460reference strings, constants, variables, or arrays. Bytecode need some way to 4461reference those things. 4462 4463Fortunately, all of those things can be referenced in the same way: with indices 4464because all of the items are in vectors. 4465 4466So `bc` has a way of encoding an index into bytecode. It does this by, after 4467pushing the instruction that references anything, pushing a byte set to the 4468length of the index in bytes, then the bytes of the index are pushed in 4469little-endian order. 4470 4471Then, when the interpreter encounters an instruction that needs one or more 4472items, it decodes the index or indices there and updates the `idx` field of the 4473current `BcInstPtr` to point to the byte after the index or indices. 4474 4475One more thing: the encoder of the indices only pushes as many bytes as 4476necessary to encode the index. It stops pushing when the index has no more bytes 4477with any 1 bits. 4478 4479##### Variables 4480 4481In `bc`, the vector of variables, `vars` in `BcProgram`, is not a vector of 4482numbers; it is a vector of vector of numbers. The first vector is the vector of 4483variables, the second is the variable stack, and the last level is the actual 4484number. 4485 4486This is because both `bc` and `dc` need variables to be stacks. 4487 4488For `dc`, registers are *defined* to be stacks. 4489 4490For `bc`, variables as stacks is how function arguments/parameters and function 4491`auto` variables are implemented. 4492 4493When a function is called, and a value needs to be used as a function argument, 4494a copy of the value is pushed onto the stack corresponding to the variable with 4495the same name as the function's parameter. For `auto` variables, a new number 4496set to zero is pushed onto each stack corresponding to the `auto` variables. 4497(Zero is used because the [`bc` spec][2] requires that `auto` variables are set 4498to zero.) 4499 4500It is in this way that the old value of the variable, which may not even be 4501related to the function parameter or `auto` variable, is preserved while the 4502variable is used as a function parameter or `auto` variable. 4503 4504When the function returns, of course, the stacks of the variables for the 4505parameters and `auto`'s will have their top item popped, restoring the old value 4506as it was before the function call. 4507 4508##### Arrays 4509 4510Like variables, arrays are also implemented as stacks. However, because they are 4511arrays, there is yet another level; the `arrs` field in `BcProgram` is a vector 4512of vectors of vectors of numbers. The first of the two levels is the vector of 4513arrays, the second the stack of for each array, the third the actual array, and 4514last the numbers in the array. 4515 4516`dc` has no need of this extra stack, but `bc` does because arrays can be 4517function parameters themselves. 4518 4519When arrays are used for function arguments, they are copied with a deep copy; 4520each item of the source vector is copied. This is because in `bc`, according to 4521the [`bc` spec][2], all function arguments are passed by value. 4522 4523However, array references are possible (see below). 4524 4525When arrays are used as `auto`'s, a new vector is pushed with one element; if 4526more elements are needed, the array is grown automatically, and new elements are 4527given the value of zero. 4528 4529In fact, if *any* array is accessed and does not have an element at that index, 4530the array is automatically grown to that size, and all new elements are given 4531the value zero. This behavior is guaranteed by the [`bc` spec][2]. 4532 4533###### Array References 4534 4535Array references had to be implemented as vectors themselves because they must 4536be pushed on the vectors stacks, which, as seen above, expect vectors 4537themselves. 4538 4539So thus, references are implemented as vectors on the vector stacks. These 4540vectors are not vectors of vectors themselves; they are vectors of bytes; in 4541fact, the fact that they are byte vectors and not vector vectors is how a 4542reference vector is detected. 4543 4544These reference vectors always have the same two things pushed: a byte encoding 4545(the same way bytecode indices are) of the referenced vector's index in the 4546`arrs` vector, and a byte encoding of the referenced vectors index in the vector 4547stack. 4548 4549If an item in a referenced vector is needed, then the reference is dereferenced, 4550and the item is returned. 4551 4552If a reference vector is passed to a function that does *not* expect a 4553reference, the vector is dereferenced and a deep copy is done, in the same way 4554as vectors are copied for normal array function parameters. 4555 4556### Callbacks 4557 4558There are many places in `bc` and `dc` where function pointers are used: 4559 4560* To implement destructors in vectors. (See the [Vectors][111] section.) 4561* To select the correct lex and parse functions for `bc` and `dc`. 4562* To select the correct function to execute unary operators. 4563* To select the correct function to execute binary operators. 4564* To calculate the correct number size for binary operators. 4565* To print a "digit" of a number. 4566* To seed the pseudo-random number generator. 4567 4568And there might be more. 4569 4570In every case, they are used for reducing the amount of code. Instead of 4571`if`/`else` chains, such as: 4572 4573``` 4574if (BC_IS_BC) { 4575 bc_parse_parse(vm.parse); 4576} 4577else { 4578 dc_parse_parse(vm.parse); 4579} 4580``` 4581 4582The best example of this is `bc_num_binary()`. It is called by every binary 4583operator. It figures out if it needs to allocate space for a new `BcNum`. If so, 4584it allocates the space and then calls the function pointer to the *true* 4585operation. 4586 4587Doing it like that shrunk the code *immensely*. First, instead of every single 4588binary operator duplicating the allocation code, it only exists in one place. 4589Second, `bc_num_binary()` itself does not have a massive `if`/`else` chain or a 4590`switch` statement. 4591 4592But perhaps the most important use was for destructors in vectors. 4593 4594Most of the data structures in `bc` are stored in vectors. If I hadn't made 4595destructors available for vectors, then ensuring that `bc` had no memory leaks 4596would have been nigh impossible. As it is, I check `bc` for memory leaks every 4597release when I change the code, and I have not released `bc` after version 4598`1.0.0` with any memory leaks, as far as I can remember anyway. 4599 4600### Numbers 4601 4602In order to do arbitrary-precision math, as `bc` must do, there must be some way 4603of representing arbitrary-precision numbers. `BcNum` in [`include/num.h`][184] 4604is `bc`'s way of doing that. 4605 4606(Note: the word ["limb"][214] is used below; it has a specific meaning when 4607applied to arbitrary-precision numbers. It means one piece of the number. It can 4608have a single digit, which is what GNU `bc` does, or it can have multiple, which 4609is what this `bc` does.) 4610 4611This struct needs to store several things: 4612 4613* The array of limbs of the number. This is the `num` field. 4614* The location of the decimal point. This is the `rdx` (short for [radix][215]) 4615 field. 4616* The number of limbs the number has. This is the `len` field. 4617* Whether the number is negative or not. This is the least significant bit of 4618 the `rdx` field. More on that later. 4619 4620In addition, `bc`'s number stores the capacity of the limb array; this is the 4621`cap` field. 4622 4623If the number needs to grow, and the capacity of the number is big enough, the 4624number is not reallocated; the number of limbs is just added to. 4625 4626There is one additional wrinkle: to make the usual operations (binary operators) 4627fast, the decimal point is *not* allowed to be in the middle of a limb; it must 4628always be between limbs, after all limbs (integer), or before all limbs (real 4629between -1 and 1). 4630 4631The reason for this is because addition, subtraction, multiplication, and 4632division expect digits to be lined up on the decimal point. By requiring that it 4633be between limbs, no extra alignment is needed, and those operations can proceed 4634without extra overhead. 4635 4636This does make some operations, most notably extending, truncating, and 4637shifting, more expensive, but the overhead is constant, and these operations are 4638usually cheap compared to the binary operators anyway. 4639 4640This also requires something else: `bc` numbers need to know *exactly* how many 4641decimal places they have after the decimal point. If the decimal point must be 4642inbetween limbs, the last decimal place could be in the middle of a limb. The 4643amount of decimal places in a number is carefully tracked and stored in the 4644`scale` field, and this number must always coincide with the `rdx` field by the 4645following formula: 4646 4647``` 4648scale + (BC_BASE_DIGS - 1) / BC_BASE_DIGS == rdx >> 1 4649``` 4650 4651(`BC_BASE_DIGS` is the number of decimal digits stored in one limb. It is 9 on 465264-bit systems and 4 on other systems.) 4653 4654Yes, `rdx` is shifted; that is because the negative bit is stored in the least 4655significant bit of the `rdx` field, and the actual radix (amount of limbs after 4656the decimal/radix point) is stored in the rest of the bits. This is safe because 4657`BC_BASE_DIGS` is always at least 4, which means `rdx` will always need at least 46582 bits less than `scale`. 4659 4660In addition to `rdx` always matching `scale`, another invariant is that `rdx` 4661must always be less than or equal to `len`. (Because `scale` may be greater than 4662`rdx`, `scale` does not have to be less than or equal to `len`.) 4663 4664Another invariant is that `len` must always be less than or equal to `cap`, for 4665obvious reasons. 4666 4667The last thing programmers need to know is that the limb array is stored in 4668little-endian order. This means that the last decimal places are in the limb 4669stored at index 0, and the most significant digits are stored at index `len-1`. 4670 4671This is done to make the most important operations fast. Addition and 4672subtraction are done from least significant to most significant limbs, which 4673means they can speed through memory in the way most computers are best at. 4674Multiplication does the same, sort of, and with division, it matters less. 4675Comparison does need to go backwards, but that's after exhausting all other 4676alternatives, including for example, checking the length of the integer portion 4677of each limb array. 4678 4679Finally, here are some possible special situations with numbers and what they 4680mean: 4681 4682* `len == 0`: the number equals 0. 4683* `len == 0 && scale != 0`: the number equals 0, but it has a `scale` value. 4684 This is the only case where `scale` does not have to coincide with `rdx` 4685 This can happen with division, for example, that sets a specific `scale` for 4686 the result value but may produce 0. 4687* `(rdx >> 1) == len`: the number is greater than or equal to 1, or less than or 4688 equal to -1. 4689* `(rdx >> 1) < len`: the number is greater than -1 and less than 1, not 4690 including 0, although this will be true for 0 as well. However, 0 is always 4691 assumed to be represented by `len == 0`. 4692* `(rdx >> 1) == 0`: the number is an integer. In this case, `scale` must also 4693 equal 0. 4694 4695#### Math Style 4696 4697When I wrote the math for `bc`, I adopted a certain style that, if known, will 4698make it easier to understand the code. The style follows these rules: 4699 4700* `BcNum` arguments always come before arguments of other types. 4701* Among the `BcNum` arguments, the operands always come first, and the `BcNum` 4702 where the result(s) will be stored come last. 4703* Error checking is placed first in the function. 4704* Easy cases are placed next. 4705* Preparation, such as allocating temporaries, comes next. 4706* The actual math. 4707* Cleanup and ensuring invariants. 4708 4709While these rules are not hard and fast, using them as a guide will probably 4710help. 4711 4712#### Digit Clamping 4713 4714There is a question of what to do when parsing numbers and coming across digits 4715that are greater than or equal to the current `ibase`. `bc` and `dc` do one of 4716two things: 4717 4718* Clamp the digit to the maximum correct digit, or 4719* Use the digit as-is, multiplying it by the `ibase` to the power of the digit's 4720 position (starting from 0 at the least significant digit). 4721 4722The former behavior is what GNU `bc` does, and the latter is what GNU `dc`, the 4723OpenBSD `bc`, and the OpenBSD `dc` do. 4724 4725It is possible to switch between the two with the `-c` and `-C` command-line 4726options. However, it is important for developers to understand that both 4727behaviors exist and should exist. 4728 4729The library can also do both, and it is set in a global for each thread. 4730 4731### Strings as Numbers 4732 4733Strings can be assigned to variables. This is a problem because the vectors for 4734variable stacks expect `BcNum` structs only. 4735 4736While I could have made a union, I decided that the complexity of adding an 4737entirely new type, with destructor and everything, was not worth it. Instead, I 4738took advantage of the fact that `free()`, when passed a `NULL` pointer, will do 4739nothing. 4740 4741Using that, I made it so `BcNum`'s could store strings instead. This is marked 4742by the `BcNum` having a `NULL` limb array (`num`) and a `cap` of 0 (which should 4743*never* happen with a real number, though the other fields could be 0). 4744 4745The `BcNum` stores the function that stores the string in the `rdx` field, and 4746it stores the index of the string in the `scale` field. This is used to actually 4747load the string if necessary. 4748 4749Note that historically, string information was stored in the `loc` field of 4750the `d` union in a `BcResult`. This was changed recently to standardize; now, 4751all string information are stored in the `n` field of the `d` union regardless. 4752This means that all string information is stored in `BcNum`'s. This removes 4753extra cases. 4754 4755Also, if a temp is made with a string, then the result type should still be 4756`BC_RESULT_STR`, not `BC_RESULT_TEMP`. This is to make it easier to do type 4757checks. 4758 4759### Pseudo-Random Number Generator 4760 4761In order to understand this section, I suggest you read the information in the 4762manpages about the pseudo-random number generator (PRNG) first; that will help 4763you understand the guarantees it has, which is important because this section 4764delves into implementation details. 4765 4766First, the PRNG I use is seeded; this is because most OS's have an excellent 4767cryptographically secure PRNG available via command-line, usually 4768`/dev/urandom`, but the only *seeded* PRNG available is usually `bash`'s 4769`$RANDOM`, which is essentially a wrapper around C's `rand()`. 4770 4771`rand()` is...bad. It is only guaranteed to return 15 bits of random data. 4772Obviously, getting good random data out of that would be hard with that alone, 4773but implementations also seem to be poor. 4774 4775On top of that, `bc` is an arbitrary-precision calculator; if I made it able to 4776generate random numbers, I could make it generate random numbers of any size, 4777and since it would be seeded, results would be reproducible, when wanted. 4778 4779So to get that, I needed a seeded PRNG with good characteristics. After scouring 4780the Internet, I decided on the [PCG PRNG][215], mostly because of [this blog 4781post][216]. Part of the reason was the behavior of the xoroshiro128+ author, who 4782hates on PCG and its author, but also because PCG seemed to do better when 4783tested by independent parties. 4784 4785After that decision, I faced a challenge: PCG requires 255 bits of seed: 128 for 4786the actual seed, and 127 for the "increment." (Melissa O'Neill, the PCG author, 4787likens the increment to selecting a codebook.) 4788 4789I could, of course, put the entire 255 bits into one massive arbitrary-precision 4790number; `bc` is good at that, after all. But that didn't sit right with me 4791because it would mean any seed selected by users would have the real portion 4792ignored, which is stupid in a program like `bc`. 4793 4794Instead, I decided to make the integer portion the increment (clamped down to 4795size), and the real portion the seed. 4796 4797In most cases, this would be a bad idea because you cannot, in general, know how 4798many decimal places you need to represent any number with `n` real digits in 4799base `b` in another base. However, there is an easy to how many decimal digits 4800after the decimal point it takes to represent reals of base 2 in base 10: the 4801power of two. 4802 4803It turns out that, for base 2 represented in base 10, the power of 2 is 4804*exactly* how many digits are necessary to represent *any* number `n/2^p`, where 4805`p` is the power of 2. This is because at every halving, the number of decimal 4806places increases by 1: 4807 4808``` 48090.5 48100.25 48110.125 48120.0625 48130.03125 48140.015625 4815... 4816``` 4817 4818This happens because because every decimal representation of `1/2^p` ends with a 48195 because 5 is half of 10. Because 5 is odd, half of it always ends with the 4820digits 25, where 2 is where the previous 5 was, and the new 5 is one decimal 4821place further right. 4822 4823So the algorithm to convert all 255 bits of the seed is as follows: 4824 48251. Convert the increment to a `BcNum`. 48262. Convert the seed to a `BcNum`. 48273. Divide the seed by `2^128` with a `scale` of 128. (For 32-bit systems, 4828 substitute 64 bits for 128.) 48294. Add the two numbers together. 4830 4831Likewise, the algorithm to convert from a user-supplied number to a seed is: 4832 48331. Truncate a copy of the number. 48342. Subtract the result from #1 from the original number. This gives the real 4835 portion of the number. 48363. Clamp the result of #1 to 127 (or 63) bits. This is the increment. 48374. Multiply the result of #2 by `2^128`. 48385. Truncate the result of #4. This is the seed. 4839 4840#### Generating Arbitrary-Precision Numbers 4841 4842I wrote a function (`bc_rand_bounded()`) that will return unbiased results with 4843any bound below the max that PCG can generate. 4844 4845To generate an integer of arbitrary size using a bound, `bc` simply uses 4846`bc_rand_bounded()` to generate numbers with a bound `10^BC_BASE_DIGS` for as 4847many limbs as needed to satisfy the bigger bound. 4848 4849To generate numbers with arbitrary precision after the decimal point, `bc` 4850merely generates an arbitrary precision integer with the bound `10^p`, where `p` 4851is the desired number of decimal places, then divides in by `10^p` with a 4852`scale` of `p`. 4853 4854## Debug Code 4855 4856Besides building `bc` in debug mode with the `-g` flag to [`configure.sh`][69], 4857programmers can also add `-DBC_DEBUG_CODE=1` to the `CFLAGS`. This will enable 4858the inclusion of *a lot* of extra code to assist with debugging. 4859 4860For more information, see all of the code guarded by `#if BC_DEBUG_CODE` in the 4861[`include/`][212] directory and in the [`src/`][213] directory. 4862 4863Yes, all of the code is guarded by `#if` preprocessor statements; this is 4864because the code should *never* be in a release build, and by making programmers 4865add this manually (not even an option to [`configure.sh`][69]), it is easier to 4866ensure that never happens. 4867 4868However, that said, the extra debug code is useful; that was why I kept it in. 4869 4870## Performance 4871 4872While I have put in a lot of effort to make `bc` as fast as possible, there 4873might be some things users can do to speed it up without changing the code. 4874 4875First, users can probably use [profile-guided optimization][217] to optimize 4876even better, using the test suite to profile. 4877 4878Second, I included macros that might help branch placement and prediction: 4879 4880* `BC_ERR(e)` 4881* `BC_UNLIKELY(e)` 4882* `BC_NO_ERR(e)` 4883* `BC_LIKELY(e)` 4884 4885`BC_ERR` is the same as `BC_UNLIKELY`, and `BC_NO_ERR` is the same as 4886`BC_LIKELY`; I just added them to also document branches that lead to error 4887conditions or *away* from error conditions. 4888 4889Anyway, if `BC_LIKELY` and `BC_UNLIKELY` are not defined during compilation, 4890they expand to nothing but the argument they were given. 4891 4892They can, however, be defined to `__builtin_expect((e), 1)` and 4893`__builtin_expect((e), 0)`, respectively, on GCC and Clang for better branch 4894prediction and placement. (For more information about `__builtin_expect()` see 4895the [GCC documentation][218].) 4896 4897There might be other compilers that can take advantage of that, but I don't know 4898anything about that. 4899 4900Also, as stated in the [build manual][219], link-time optimization is excellent 4901at optimizing this `bc`. Use it. 4902 4903### Benchmarks 4904 4905To help programmers improve performance, I have built and assembled 4906infrastructure to make benchmarking easy. 4907 4908First, in order to easily run benchmarks, I created 4909[`scripts/benchmark.sh`][220]. 4910 4911Second, I copied and adapted [`ministat.c`][223] [from FreeBSD][221], to make it 4912easier to judge whether the results are significant or not. 4913 4914Third, I made the `make` clean target `make clean_benchmarks`, to clean 4915`scripts/ministat` and the generated benchmark files. 4916 4917Fourth, I made it so [`scripts/benchmark.sh`][220] outputs the timing and memory 4918data in a format that is easy for `scripts/ministat` to digest. 4919 4920To add a benchmark, add a script in the right directory to generate the 4921benchmark. Yes, generate. 4922 4923All of the benchmarks are generated first, from `.bc` and `.dc` files in the 4924[`benchmarks/bc/`][91] and [`benchmarks/dc/`][224]. This is so that massive 4925amounts of data can be generated and then pushed through the calculators. 4926 4927If you need to benchmark `bc` or `dc` with simple loops, have the generator 4928files simply print the loop code. 4929 4930### Caching of Numbers 4931 4932In order to provide some performance boost, `bc` tries to reuse old `BcNum`'s 4933that have the default capacity (`BC_NUM_DEF_SIZE`). 4934 4935It does this by allowing `bc_num_free()` to put the limb array onto a 4936statically-allocated stack (it's just a global array with a set size). Then, 4937when a `BcNum` with the default capacity is needed, `bc_num_init()` asks if any 4938are available. If the answer is yes, the one on top of the stack is returned. 4939Otherwise, `NULL` is returned, and `bc_num_free()` knows it needs to `malloc()` 4940a new limb array. 4941 4942When the stack is filled, any numbers that `bc` attempts to put on it are just 4943freed. 4944 4945This setup saved a few percent in my testing for version [3.0.0][32], which is 4946when I added it. 4947 4948## `bcl` 4949 4950At the request of one of my biggest users, I spent the time to make a build mode 4951where the number and math code of `bc` could be wrapped into a library, which I 4952called `bcl`. 4953 4954This mode is exclusive; `bc` and `dc` themselves are *not* built when building 4955`bcl`. 4956 4957The only things in the `bc` math code that is not included is: 4958 4959* Printing newlines (clients do not care about `bc`'s line lenth restriction). 4960* `dc`'s stream print. 4961 4962Even the [pseudo-random number generator][179] is included, with extra support 4963for generating real numbers with it. (In `bc`, such support is in 4964[`lib2.bc`][26].) 4965 4966### Signal Handling 4967 4968`bcl` does not have the capability for signal handling (as of version `6.0.0`). 4969 4970### Threads 4971 4972It is possible to use `bcl` across multiple threads. However, `bcl` must be 4973initialized on each thread where it will be used. 4974 4975This is because `bcl` uses thread-specific data to store the "globals" for each 4976thread. This is to ensure that threads do not stomp on each other's numbers or 4977other data structures. 4978 4979### Contexts 4980 4981Contexts were an idea by the same user that requested `bcl`. They are meant to 4982make it so multiple clients in one program can keep their data separate from 4983each other. 4984 4985### Numbers 4986 4987Numbers in `bcl` are literally indices into an encapsulated array of numbers, 4988hidden in the context. These indices are then passed to clients to refer to 4989numbers later. 4990 4991### Operand Consumption 4992 4993Most math functions in `bcl` "consume" their operand arguments; the arguments 4994are freed, whether or not an error is returned. 4995 4996This is to make it easy to implement math code, like this: 4997 4998``` 4999n = bcl_add(bcl_mul(a, b), bcl_div(c, d)); 5000``` 5001 5002If numbers need to be preserved, they can be with `bcl_dup()`: 5003 5004``` 5005n = bcl_add(bcl_mul(bcl_dup(a), bc_dup(b)), bcl_div(bcl_dup(c), bcl_dup(d))); 5006``` 5007 5008### Errors 5009 5010Errors can be encoded in the indices representing numbers, and where necessary, 5011clients are responsible for checking those errors. 5012 5013The encoding of errors is this: if an error happens, the value `0-error` is 5014returned. To decode, do the exact same thing. Thus, any index above 5015`0-num_errors` is an error. 5016 5017If an index that represents an error is passed to a math function, that function 5018propagates the error to its result and does not perform the math operation. 5019 5020All of this is to, once again, make it easy to implement the math code as above. 5021 5022However, where possible, errors are returned directly. 5023 5024[1]: https://en.wikipedia.org/wiki/Bus_factor 5025[2]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/bc.html#top 5026[3]: https://en.wikipedia.org/wiki/Dc_(Unix) 5027[4]: https://en.wikipedia.org/wiki/Reverse_Polish_notation 5028[5]: ./bc/A.1.md#standard-library 5029[6]: https://github.com/torvalds/linux/blob/master/kernel/time/timeconst.bc 5030[7]: ./bc/A.1.md#extended-library 5031[8]: #libbc-2 5032[9]: #strgensh 5033[10]: https://vimeo.com/230142234 5034[11]: https://gavinhoward.com/2019/12/values-for-yao/ 5035[12]: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf 5036[13]: ./build.md#cross-compiling 5037[14]: ./build.md 5038[15]: #strgenc 5039[16]: http://landley.net/toybox/about.html 5040[17]: https://www.busybox.net/ 5041[18]: https://en.wikipedia.org/wiki/Karatsuba_algorithm 5042[19]: https://clang-analyzer.llvm.org/scan-build.html 5043[20]: https://www.valgrind.org/ 5044[21]: https://clang.llvm.org/docs/AddressSanitizer.html 5045[22]: https://gavinhoward.com/2019/11/finishing-software/ 5046[23]: #history 5047[24]: https://clang.llvm.org/docs/ClangFormat.html 5048[25]: ./algorithms.md 5049[26]: #lib2bc 5050[27]: #vmh 5051[28]: https://github.com/rain-1/linenoise-mob 5052[29]: https://github.com/antirez/linenoise 5053[30]: #bclh 5054[31]: #argsh 5055[32]: ../NEWS.md#3-0-0 5056[33]: ../NEWS.md 5057[34]: https://github.com/skeeto/optparse 5058[35]: #opth 5059[36]: #historyh 5060[37]: #randh 5061[38]: #langh 5062[39]: #numc 5063[40]: #bcc 5064[41]: #bc_lexc 5065[42]: #bc_parsec 5066[43]: #libraryc 5067[44]: #dcc 5068[45]: #dc_lexc 5069[46]: #dc_parsec 5070[47]: #filec 5071[48]: #historyc 5072[49]: #langc 5073[50]: #lexc 5074[51]: #optc 5075[52]: #parsec 5076[53]: #programc 5077[54]: #randc 5078[55]: #fileh 5079[56]: #readc 5080[57]: #programh 5081[58]: #vmc 5082[59]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/gencat.html#top 5083[60]: #manpagesh 5084[61]: #bcl3md 5085[62]: #bcl3 5086[63]: #bclvcxproj 5087[64]: #bclvcxprojfilters 5088[65]: #bclsln 5089[66]: #bcvcxproj 5090[67]: #bcvcxprojfilters 5091[68]: #bcsln 5092[69]: #configuresh 5093[70]: #makefilein 5094[71]: #functionsh 5095[72]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sh.html#top 5096[73]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18 5097[74]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/make.html#top 5098[75]: #versionh 5099[76]: ##posix-shell-scripts 5100[77]: #tests 5101[78]: #karatsubapy 5102[79]: #bc-1 5103[80]: #dc-1 5104[81]: ./build.md#build-type 5105[82]: #fuzzing-1 5106[83]: #releasesh 5107[84]: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_02 5108[85]: #locales-1 5109[86]: #manuals-1 5110[87]: #locale_installsh 5111[88]: #locale_uninstallsh 5112[89]: #bc1mdin 5113[90]: #dc1mdin 5114[91]: #bc 5115[92]: https://pandoc.org/ 5116[93]: #release_settingstxt 5117[94]: #aflpy 5118[95]: #randmathpy 5119[96]: #caching-of-numbers 5120[97]: #error-handling 5121[98]: #radamsatxt 5122[99]: https://gitlab.com/akihe/radamsa 5123[100]: #radamsash 5124[101]: https://musl.libc.org/ 5125[102]: ./build.md#settings 5126[103]: #test_settingstxt 5127[104]: #test_settingssh 5128[105]: #functionssh 5129[106]: #bch 5130[107]: https://en.wikipedia.org/wiki/Infix_notation 5131[108]: https://en.wikipedia.org/wiki/Reverse_Polish_notation 5132[109]: https://en.wikipedia.org/wiki/Shunting-yard_algorithm 5133[110]: #bc-parsing 5134[111]: #vectors 5135[112]: https://git.yzena.com/Yzena/Yc/src/branch/master/include/yc/vector.h 5136[113]: https://en.wikipedia.org/wiki/Signal_(IPC) 5137[114]: #custom-io 5138[115]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03_03 5139[116]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/setjmp.html 5140[117]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/longjmp.html 5141[118]: https://www.youtube.com/watch?v=4PaWFYm0kEw 5142[119]: #fuzz_prepsh 5143[120]: #bc_aflyaml 5144[121]: #bc_afl_continueyaml 5145[122]: https://github.com/tmux/tmux 5146[123]: https://tmuxp.git-pull.com/ 5147[124]: #test-suite 5148[125]: https://aflplus.plus/ 5149[126]: #link-time-optimization 5150[127]: #fuzzing-performance 5151[128]: #radamsa 5152[129]: #afl-quickstart 5153[130]: #convenience 5154[131]: #datac 5155[132]: https://git.gavinhoward.com/gavin/vim-bc 5156[133]: https://git.gavinhoward.com/gavin/bc_libs 5157[134]: #debugging 5158[135]: #asserts 5159[136]: #portability 5160[137]: https://pexpect.readthedocs.io/en/stable/ 5161[138]: #test-suite-portability 5162[139]: #historypy 5163[140]: #historysh 5164[141]: #group-tests 5165[142]: #build-system 5166[143]: #generated-tests 5167[144]: #benchmarks-1 5168[145]: #gen 5169[146]: #test-coverage 5170[147]: #integration-with-the-build-system 5171[148]: #test-scripts 5172[149]: #standard-tests 5173[150]: #script-tests 5174[151]: #error-tests 5175[152]: #stdin-tests 5176[153]: #read-tests 5177[154]: #other-tests 5178[155]: #history-tests 5179[156]: #bcl 5180[157]: #bcl-test 5181[158]: #bclc 5182[159]: #valgrind 5183[160]: #addresssanitizer-and-friends 5184[161]: #bc-2 5185[162]: #dc-2 5186[163]: #alltxt-1 5187[164]: #errorstxt 5188[165]: #posix_errorstxt 5189[166]: #timeconstsh 5190[167]: #alltxt-3 5191[168]: #errorstxt-1 5192[169]: #scripts-1 5193[170]: #scripts-2 5194[171]: #alltxt-2 5195[172]: #alltxt-4 5196[173]: #async-signal-safe-signal-handling 5197[174]: #vectorh 5198[175]: #read_errorstxt 5199[176]: #statush 5200[177]: #numbers 5201[178]: #math-style 5202[179]: #pseudo-random-number-generator 5203[180]: #lexh 5204[181]: #parseh 5205[182]: #dch 5206[183]: #libraryh 5207[184]: #numh 5208[185]: #readh 5209[186]: #maps 5210[187]: #slabs-and-slab-vectors 5211[188]: ./build.md#extra-math 5212[189]: #command-line-history 5213[190]: #scriptsed 5214[191]: #linux-timeconstbc-script 5215[192]: #corpuses 5216[193]: ./build.md#history 5217[194]: https://www.valgrind.org/docs/manual/mc-manual.html 5218[195]: #othersh 5219[196]: https://scan.coverity.com/ 5220[197]: https://clang-analyzer.llvm.org/ 5221[198]: https://unix.stackexchange.com/questions/253349/eintr-is-there-a-rationale-behind-it 5222[199]: https://cr.yp.to/docs/selfpipe.html 5223[200]: https://skarnet.org/cgi-bin/archive.cgi?2:mss:1607:201701:dfblejammjllfkggpcph 5224[201]: https://slembcke.github.io/2020/10/12/CustomAllocators.html#1-slab-allocator 5225[202]: https://en.wikipedia.org/wiki/Tail_call 5226[203]: https://en.wikipedia.org/wiki/Functional_programming_language 5227[204]: https://en.wikipedia.org/wiki/C_dynamic_memory_allocation 5228[205]: #mainc 5229[206]: #argc 5230[207]: https://en.wikipedia.org/wiki/Abstract_syntax_tree 5231[208]: #functions 5232[209]: #parsing 5233[210]: https://en.wikipedia.org/wiki/Stack_machine 5234[211]: https://en.wikipedia.org/wiki/Register_machine 5235[212]: #include 5236[213]: #src 5237[214]: https://gmplib.org/manual/Nomenclature-and-Types 5238[215]: https://en.wikipedia.org/wiki/Radix_point 5239[216]: #main-and-read-functions 5240[215]: https://www.pcg-random.org/ 5241[216]: https://lemire.me/blog/2017/08/22/testing-non-cryptographic-random-number-generators-my-results/ 5242[217]: https://en.wikipedia.org/wiki/Profile-guided_optimization 5243[218]: https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fexpect 5244[219]: ./build.md#optimization 5245[220]: #benchmarksh 5246[221]: https://cgit.freebsd.org/src/tree/usr.bin/ministat/ministat.c 5247[222]: https://www.freebsd.org/cgi/man.cgi?query=ministat&apropos=0&sektion=0&manpath=FreeBSD+13.0-RELEASE+and+Ports&arch=default&format=html 5248[223]: #ministatc 5249[224]: #dc 5250[225]: #allsh 5251[226]: #errorssh 5252[227]: #errorsh 5253[228]: #vectorc 5254[229]: https://github.com/gavinhoward/bc/pull/72 5255