This directory contains data needed by Bison.

# Directory Content
## Skeletons
Bison skeletons: the general shapes of the different parser kinds, that are
specialized for specific grammars by the bison program.

Currently, the supported skeletons are:

- yacc.c
  It used to be named bison.simple: it corresponds to C Yacc
  compatible LALR(1) parsers.

- lalr1.cc
  Produces a C++ parser class.

- lalr1.java
  Produces a Java parser class.

- glr.c
  A Generalized LR C parser based on Bison's LALR(1) tables.

- glr.cc
  A Generalized LR C++ parser. Actually a C++ wrapper around glr.c.

These skeletons are the only ones supported by the Bison team. Because the
interface between skeletons and the bison program is not finished, *we are
not bound to it*. In particular, Bison is not mature enough for us to
consider that "foreign skeletons" are supported.

## m4sugar
This directory contains M4sugar, sort of an extended library for M4, which
is used by Bison to instantiate the skeletons.

## xslt
This directory contains XSLT programs that transform Bison's XML output into
various formats.

- bison.xsl
  A library of routines used by the other XSLT programs.

- xml2dot.xsl
  Conversion into GraphViz's dot format.

- xml2text.xsl
  Conversion into text.

- xml2xhtml.xsl
  Conversion into XHTML.

# Implementation Notes About the Skeletons

"Skeleton" in Bison parlance means "backend": the bison executable feeds a
skeleton with LR tables, facts about the symbols, etc., and the skeleton
generates the output (say parser.cc, parser.hh, location.hh, etc.).
Skeletons are only in charge of generating the parser and its auxiliary
files; they do not generate the XML output, the parser.output reports, or
the graphical rendering.

The bits of information passing from bison to the backend are named
"muscles". Muscles are passed to M4 via its standard input: it's a set of
m4 definitions.
To see them, use `--trace=muscles`.

Except for muscles, whose names are generated by bison, the skeletons have
no constraint at all on the macro names: there is no technical/theoretical
limitation, as long as you generate the output, you can do what you want.
However, of course, that would be a bad idea if, say, the C and C++
skeletons used different approaches and had completely different
implementations. That would be a maintenance nightmare.

Below, we document some of the macros that we use in several of the
skeletons. If you are to write a new skeleton, please implement them for
your language. Overall, be sure to follow the same patterns as the existing
skeletons.

## Vocabulary

We use "formal arguments", or "formals" for short, to denote the declared
parameters of a function (e.g., `int argc, const char **argv`). Yes, this
is somewhat contradictory with `param` in the `%param` directives.

We use "effective arguments", or "args" for short, to denote the values
passed in function calls (e.g., `argc, argv`).

## Symbols

### `b4_symbol(NUM, FIELD)`
In order to unify the handling of the various aspects of symbols (tag, type
name, whether terminal, etc.), the bison executable defines one macro per
(token, field), where field can be `has_id`, `id`, etc.: see
`prepare_symbol_definitions()` in `src/output.c`.

NUM can be:
- `empty` to denote the "empty" pseudo-symbol when it exists,
- `eof`, `error`, or `undef`,
- a symbol number.

FIELD can be:

- `has_id`: 0 or 1
  Whether the symbol has an `id`.

- `id`: string (e.g., `exp`, `NUM`, or `TOK_NUM` with api.token.prefix)
  If `has_id`, the name of the token kind (prefixed by api.token.prefix if
  defined), otherwise empty. Guaranteed to be usable as a C identifier.
  This is used to define the token kind (i.e., the enum used by the return
  value of yylex). Should be named `token_kind`.

- `tag`: string
  A human readable representation of the symbol. Can be `'foo'`,
  `'foo.id'`, `'"foo"'` etc.

- `code`: integer
  The token code associated with the token kind `id`.
  The external number as used by yylex. Can be the ASCII code for a
  character token, some number chosen by bison, or some user number in the
  case of `%token FOO <NUM>`. Corresponds to `yychar` in `yacc.c`.

- `is_token`: 0 or 1
  Whether this is a terminal symbol.

- `kind_base`: string (e.g., `YYSYMBOL_exp`, `YYSYMBOL_NUM`)
  The base of the symbol kind, i.e., the enumerator of this symbol (token or
  nonterminal) which is mapped to its `number`.

- `kind`: string
  Same as `kind_base`, but possibly with a prefix in some languages. E.g.,
  EOF's `kind_base` and `kind` are `YYSYMBOL_YYEOF` in C, but are
  `S_YYEOF` and `symbol_kind::S_YYEOF` in C++.

- `number`: integer
  The code associated with the `kind`.
  The internal number (computed from the external number by yytranslate).
  Corresponds to yytoken in yacc.c. This is the same number that serves as
  key in `b4_symbol(NUM, FIELD)`.

  In bison, symbols are first assigned increasing numbers in order of
  appearance (but tokens first, then nterms). After grammar reduction,
  unused nterms are then renumbered to appear last (i.e., first tokens, then
  used nterms and finally unused nterms). This final number NUM is the one
  contained in this field, and it is the one used as key in
  `b4_symbol(NUM, FIELD)`.

  The code of the rule actions, however, is emitted before we know what
  symbols are unused, so they use the original numbers. To avoid confusion,
  they actually use "orig NUM" instead of just "NUM". bison also emits
  definitions for `b4_symbol(orig NUM, number)` that map from original
  numbers to the new ones.
  `b4_symbol` also resolves `orig NUM` for the other fields, i.e.,
  `b4_symbol(orig 42, tag)` would return the tag of the symbol whose
  original number was 42.

- `has_type`: 0, 1
  Whether the symbol has a semantic value.

- `type_tag`: string
  When api.value.type=union, the generated name for the union member.
  yytype_INT etc. for symbols with `has_id`, otherwise yytype_1 etc.

- `type`: string
  If it has a semantic value, its type tag, or, if variants are used,
  its type.
  In the case of api.value.type=union, type is the real type (e.g. int).

- `slot`: string
  If it has a semantic value, the name of the union member (i.e., bounces to
  either `type_tag` or `type`). It would be better to fix our mess and
  always use `type` for the true type of the member, and `type_tag` for the
  name of the union member.

- `has_printer`: 0, 1
- `printer`: string
- `printer_file`: string
- `printer_line`: integer
- `printer_loc`: location
  If the symbol has a printer, everything about it.

- `has_destructor`, `destructor`, `destructor_file`, `destructor_line`, `destructor_loc`
  Likewise.

### `b4_symbol_value(VAL, [SYMBOL-NUM], [TYPE-TAG])`
Expansion of `$$`, `$1`, `$<TYPE-TAG>3`, etc.

The semantic value from a given VAL.
- `VAL`: some semantic value storage (typically a union), e.g., `yylval`.
- `SYMBOL-NUM`: the symbol number from which we extract the type tag.
- `TYPE-TAG`: the type tag forced by the user with `<TYPE-TAG>`.

The result can be used safely: it is put in parentheses to avoid nasty
precedence issues.

### `b4_lhs_value(SYMBOL-NUM, [TYPE])`
Expansion of `$$` or `$<TYPE>$`, for symbol `SYMBOL-NUM`.

### `b4_rhs_data(RULE-LENGTH, POS)`
The data corresponding to the symbol `#POS`, where the current rule has
`RULE-LENGTH` symbols on RHS.
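
As an illustration of how `RULE-LENGTH` and `POS` combine, here is a minimal
C sketch. It assumes a stack-based skeleton in which a pointer designates
the topmost slot of the value stack (the role `yyvsp` plays in `yacc.c`), so
that the last RHS symbol sits at offset 0 and the first one at offset
`1 - RULE-LENGTH`. The names `value_t`, `rhs_offset`, and `rhs_data` are
hypothetical and not part of Bison; the real macros are written in M4 and
may differ in detail.

```c
#include <assert.h>

/* Hypothetical semantic value type, standing in for the parser's
   value type (an assumption for this sketch). */
typedef union
{
  int ival;
  double dval;
} value_t;

/* Offset of RHS symbol number POS (1-based), relative to the top of the
   stack, for a rule with RULE_LENGTH symbols on its RHS.  The last RHS
   symbol is at offset 0, the first one at 1 - RULE_LENGTH.  */
static int
rhs_offset (int rule_length, int pos)
{
  assert (1 <= pos && pos <= rule_length);
  return pos - rule_length;
}

/* Data for RHS symbol number POS, given TOP, a pointer to the topmost
   slot of the value stack.  */
static value_t *
rhs_data (value_t *top, int rule_length, int pos)
{
  return top + rhs_offset (rule_length, pos);
}
```

For a rule with three RHS symbols, `rhs_data (top, 3, 1)` designates the
slot two below the top, and `rhs_data (top, 3, 3)` designates the top
itself.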

### `b4_rhs_value(RULE-LENGTH, POS, SYMBOL-NUM, [TYPE])`
Expansion of `$<TYPE>POS`, where the current rule has `RULE-LENGTH` symbols
on RHS.

<!--

Local Variables:
mode: markdown
fill-column: 76
ispell-dictionary: "american"
End:

Copyright (C) 2002, 2008-2015, 2018-2021 Free Software Foundation, Inc.

This file is part of GNU Bison.

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <https://www.gnu.org/licenses/>.

-->