This directory contains data needed by Bison.

# Directory Content
## Skeletons
Bison skeletons: the general shapes of the different parser kinds, which are
specialized for specific grammars by the bison program.

Currently, the supported skeletons are:

- yacc.c
  It used to be named bison.simple: it corresponds to C Yacc
  compatible LALR(1) parsers.

- lalr1.cc
  Produces a C++ parser class.

- lalr1.java
  Produces a Java parser class.

- glr.c
  A Generalized LR C parser based on Bison's LALR(1) tables.

- glr.cc
  A Generalized LR C++ parser.  Actually a C++ wrapper around glr.c.

These skeletons are the only ones supported by the Bison team.  Because the
interface between skeletons and the bison program is not finished, *we are
not bound to it*.  In particular, Bison is not mature enough for us to
consider that "foreign skeletons" are supported.

## m4sugar
This directory contains M4sugar, a kind of extended library for M4, which
is used by Bison to instantiate the skeletons.

## xslt
This directory contains XSLT programs that transform Bison's XML output into
various formats.

- bison.xsl
  A library of routines used by the other XSLT programs.

- xml2dot.xsl
  Conversion into Graphviz's dot format.

- xml2text.xsl
  Conversion into text.

- xml2xhtml.xsl
  Conversion into XHTML.

# Implementation Notes About the Skeletons

"Skeleton" in Bison parlance means "backend": a skeleton is fed by the bison
executable with LR tables, facts about the symbols, etc., and it generates
the output (say parser.cc, parser.hh, location.hh, etc.).  Skeletons are
only in charge of generating the parser and its auxiliary files; they do
not generate the XML output, the parser.output reports, or the graphical
rendering.

The bits of information passed from bison to the backend are named
"muscles".  Muscles are passed to M4 via its standard input as a set of
M4 definitions.  To see them, use `--trace=muscles`.

Except for muscles, whose names are generated by bison, the skeletons have
no constraint at all on the macro names: there is no technical or
theoretical limitation, and as long as you generate the output, you can do
what you want.  However, it would of course be a bad idea if, say, the C and
C++ skeletons used different approaches and had completely different
implementations.  That would be a maintenance nightmare.

Below, we document some of the macros that we use in several of the
skeletons.  If you are to write a new skeleton, please implement them for
your language.  Overall, be sure to follow the same patterns as the existing
skeletons.

## Vocabulary

We use "formal arguments", or "formals" for short, to denote the declared
parameters of a function (e.g., `int argc, const char **argv`).  Yes, this
is somewhat at odds with `param` in the `%param` directives.

We use "effective arguments", or "args" for short, to denote the values
passed in function calls (e.g., `argc, argv`).

## Symbols

### `b4_symbol(NUM, FIELD)`
In order to unify the handling of the various aspects of symbols (tag, type
name, whether terminal, etc.), the bison executable defines one macro per
(symbol, field) pair, where field can be `has_id`, `id`, etc.: see
`prepare_symbol_definitions()` in `src/output.c`.

NUM can be:
- `empty` to denote the "empty" pseudo-symbol when it exists,
- `eof`, `error`, or `undef`,
- a symbol number.

FIELD can be:

- `has_id`: 0 or 1
  Whether the symbol has an `id`.

- `id`: string (e.g., `exp`, `NUM`, or `TOK_NUM` with api.token.prefix)
  If `has_id`, the name of the token kind (prefixed by api.token.prefix if
  defined), otherwise empty.  Guaranteed to be usable as a C identifier.
  This is used to define the token kind (i.e., the enum used by the return
  value of yylex).  This field should really be named `token_kind`.

- `tag`: string
  A human-readable representation of the symbol.  Can be `'foo'`,
  `'foo.id'`, `'"foo"'`, etc.

- `code`: integer
  The token code associated with the token kind `id`.
  The external number as used by yylex.  Can be the ASCII code for a
  character token, some number chosen by bison, or some user number in the
  case of `%token FOO <NUM>`.  Corresponds to `yychar` in `yacc.c`.

- `is_token`: 0 or 1
  Whether this is a terminal symbol.

- `kind_base`: string (e.g., `YYSYMBOL_exp`, `YYSYMBOL_NUM`)
  The base of the symbol kind, i.e., the enumerator of this symbol (token or
  nonterminal) which is mapped to its `number`.

- `kind`: string
  Same as `kind_base`, but possibly with a prefix in some languages.  E.g.,
  EOF's `kind_base` and `kind` are both `YYSYMBOL_YYEOF` in C, but are
  `S_YYEOF` and `symbol_kind::S_YYEOF` in C++.

- `number`: integer
  The code associated with the `kind`.
  The internal number (computed from the external number by yytranslate).
  Corresponds to yytoken in yacc.c.  This is the same number that serves as
  key in `b4_symbol(NUM, FIELD)`.

  In bison, symbols are first assigned increasing numbers in order of
  appearance (but tokens first, then nterms).  After grammar reduction,
  unused nterms are then renumbered to appear last (i.e., first tokens, then
  used nterms and finally unused nterms).  This final number NUM is the one
  contained in this field, and it is the one used as key in
  `b4_symbol(NUM, FIELD)`.

  The code of the rule actions, however, is emitted before we know what
  symbols are unused, so the actions use the original numbers.  To avoid
  confusion, they actually use "orig NUM" instead of just "NUM".  bison also
  emits definitions for `b4_symbol(orig NUM, number)` that map from original
  numbers to the new ones.  `b4_symbol` also resolves `orig NUM` for the
  other fields, i.e., `b4_symbol(orig 42, tag)` would return the tag of the
  symbol whose original number was 42.

- `has_type`: 0 or 1
  Whether the symbol has a semantic value.

- `type_tag`: string
  When api.value.type=union, the generated name for the union member:
  `yytype_INT`, etc. for symbols that have an `id`, otherwise `yytype_1`,
  etc.

- `type`: string
  If the symbol has a semantic value, its type tag, or, if variants are
  used, its type.
  In the case of api.value.type=union, type is the real type (e.g., int).

- `slot`: string
  If the symbol has a semantic value, the name of the union member (i.e.,
  bounces to either `type_tag` or `type`).  It would be better to fix our
  mess and always use `type` for the true type of the member, and `type_tag`
  for the name of the union member.

- `has_printer`: 0 or 1
- `printer`: string
- `printer_file`: string
- `printer_line`: integer
- `printer_loc`: location
  If the symbol has a printer, everything about it.

- `has_destructor`, `destructor`, `destructor_file`, `destructor_line`, `destructor_loc`
  Likewise.

### `b4_symbol_value(VAL, [SYMBOL-NUM], [TYPE-TAG])`
Expansion of `$$`, `$1`, `$<TYPE-TAG>3`, etc.

The semantic value from a given VAL.
- `VAL`: some semantic value storage (typically a union), e.g., `yylval`.
- `SYMBOL-NUM`: the symbol number from which we extract the type tag.
- `TYPE-TAG`: the type tag forced by the user with `<TYPE-TAG>`.

The result can be used safely: it is put in parens to avoid nasty precedence
issues.

### `b4_lhs_value(SYMBOL-NUM, [TYPE])`
Expansion of `$$` or `$<TYPE>$`, for symbol `SYMBOL-NUM`.

### `b4_rhs_data(RULE-LENGTH, POS)`
The data corresponding to the symbol `#POS`, where the current rule has
`RULE-LENGTH` symbols on its RHS.

### `b4_rhs_value(RULE-LENGTH, POS, SYMBOL-NUM, [TYPE])`
Expansion of `$<TYPE>POS`, where the current rule has `RULE-LENGTH` symbols
on its RHS.

<!--

Local Variables:
mode: markdown
fill-column: 76
ispell-dictionary: "american"
End:

Copyright (C) 2002, 2008-2015, 2018-2021 Free Software Foundation, Inc.

This file is part of GNU Bison.

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <https://www.gnu.org/licenses/>.

-->