xref: /aosp_15_r20/external/starlark-go/doc/impl.md (revision 4947cdc739c985f6d86941e22894f5cefe7c9e9a)
1*4947cdc7SCole Faust
2*4947cdc7SCole Faust# Starlark in Go: Implementation
3*4947cdc7SCole Faust
4*4947cdc7SCole FaustThis document (a work in progress) describes some of the design
5*4947cdc7SCole Faustchoices of the Go implementation of Starlark.
6*4947cdc7SCole Faust
7*4947cdc7SCole Faust  * [Scanner](#scanner)
8*4947cdc7SCole Faust  * [Parser](#parser)
9*4947cdc7SCole Faust  * [Resolver](#resolver)
10*4947cdc7SCole Faust  * [Evaluator](#evaluator)
11*4947cdc7SCole Faust    * [Data types](#data-types)
12*4947cdc7SCole Faust    * [Freezing](#freezing)
13*4947cdc7SCole Faust  * [Testing](#testing)
14*4947cdc7SCole Faust
15*4947cdc7SCole Faust
16*4947cdc7SCole Faust## Scanner
17*4947cdc7SCole Faust
18*4947cdc7SCole FaustThe scanner is derived from Russ Cox's
19*4947cdc7SCole Faust[buildifier](https://github.com/bazelbuild/buildtools/tree/master/buildifier)
20*4947cdc7SCole Fausttool, which pretty-prints Bazel BUILD files.
21*4947cdc7SCole Faust
22*4947cdc7SCole FaustMost of the work happens in `(*scanner).nextToken`.
23*4947cdc7SCole Faust
24*4947cdc7SCole Faust## Parser
25*4947cdc7SCole Faust
26*4947cdc7SCole FaustThe parser is hand-written recursive-descent parser. It uses the
27*4947cdc7SCole Fausttechnique of [precedence
28*4947cdc7SCole Faustclimbing](http://www.engr.mun.ca/~theo/Misc/exp_parsing.htm#climbing)
29*4947cdc7SCole Faustto reduce the number of productions.
30*4947cdc7SCole Faust
31*4947cdc7SCole FaustIn some places the parser accepts a larger set of programs than are
32*4947cdc7SCole Fauststrictly valid, leaving the task of rejecting them to the subsequent
33*4947cdc7SCole Faustresolver pass. For example, in the function call `f(a, b=c)` the
34*4947cdc7SCole Faustparser accepts any expression for `a` and `b`, even though `b` may
35*4947cdc7SCole Faustlegally be only an identifier. For the parser to distinguish these
36*4947cdc7SCole Faustcases would require additional lookahead.
37*4947cdc7SCole Faust
38*4947cdc7SCole Faust## Resolver
39*4947cdc7SCole Faust
40*4947cdc7SCole FaustThe resolver reports structural errors in the program, such as the use
41*4947cdc7SCole Faustof `break` and `continue` outside of a loop.
42*4947cdc7SCole Faust
43*4947cdc7SCole FaustStarlark has stricter syntactic limitations than Python. For example,
44*4947cdc7SCole Faustit does not permit `for` loops or `if` statements at top level, nor
45*4947cdc7SCole Faustdoes it permit global variables to be bound more than once.
46*4947cdc7SCole FaustThese limitations come from the Bazel project's desire to make it easy
47*4947cdc7SCole Faustto identify the sole statement that defines each global, permitting
48*4947cdc7SCole Faustaccurate cross-reference documentation.
49*4947cdc7SCole Faust
50*4947cdc7SCole FaustIn addition, the resolver validates all variable names, classifying
51*4947cdc7SCole Faustthem as references to universal, global, local, or free variables.
52*4947cdc7SCole FaustLocal and free variables are mapped to a small integer, allowing the
53*4947cdc7SCole Faustevaluator to use an efficient (flat) representation for the
54*4947cdc7SCole Faustenvironment.
55*4947cdc7SCole Faust
56*4947cdc7SCole FaustNot all features of the Go implementation are "standard" (that is,
57*4947cdc7SCole Faustsupported by Bazel's Java implementation), at least for now, so
58*4947cdc7SCole Faustnon-standard features such as `lambda`, `float`, and `set`
59*4947cdc7SCole Faustare flag-controlled.  The resolver reports
60*4947cdc7SCole Faustany uses of dialect features that have not been enabled.
61*4947cdc7SCole Faust
62*4947cdc7SCole Faust
63*4947cdc7SCole Faust## Evaluator
64*4947cdc7SCole Faust
65*4947cdc7SCole Faust### Data types
66*4947cdc7SCole Faust
67*4947cdc7SCole Faust<b>Integers:</b> Integers are representing using `big.Int`, an
68*4947cdc7SCole Faustarbitrary precision integer. This representation was chosen because,
69*4947cdc7SCole Faustfor many applications, Starlark must be able to handle without loss
70*4947cdc7SCole Faustprotocol buffer values containing signed and unsigned 64-bit integers,
71*4947cdc7SCole Faustwhich requires 65 bits of precision.
72*4947cdc7SCole Faust
73*4947cdc7SCole FaustSmall integers (<256) are preallocated, but all other values require
74*4947cdc7SCole Faustmemory allocation. Integer performance is relatively poor, but it
75*4947cdc7SCole Faustmatters little for Bazel-like workloads which depend much
76*4947cdc7SCole Faustmore on lists of strings than on integers. (Recall that a typical loop
77*4947cdc7SCole Faustover a list in Starlark does not materialize the loop index as an `int`.)
78*4947cdc7SCole Faust
79*4947cdc7SCole FaustAn optimization worth trying would be to represent integers using
80*4947cdc7SCole Fausteither an `int32` or `big.Int`, with the `big.Int` used only when
81*4947cdc7SCole Faust`int32` does not suffice. Using `int32`, not `int64`, for "small"
82*4947cdc7SCole Faustnumbers would make it easier to detect overflow from operations like
83*4947cdc7SCole Faust`int32 * int32`, which would trigger the use of `big.Int`.
84*4947cdc7SCole Faust
85*4947cdc7SCole Faust<b>Floating point</b>:
86*4947cdc7SCole FaustFloating point numbers are represented using Go's `float64`.
87*4947cdc7SCole FaustAgain, `float` support is required to support protocol buffers. The
88*4947cdc7SCole Faustexistence of floating-point NaN and its infamous comparison behavior
89*4947cdc7SCole Faust(`NaN != NaN`) had many ramifications for the API, since we cannot
90*4947cdc7SCole Faustassume the result of an ordered comparison is either less than,
91*4947cdc7SCole Faustgreater than, or equal: it may also fail.
92*4947cdc7SCole Faust
93*4947cdc7SCole Faust<b>Strings</b>:
94*4947cdc7SCole Faust
95*4947cdc7SCole FaustTODO: discuss UTF-8 and string.bytes method.
96*4947cdc7SCole Faust
97*4947cdc7SCole Faust<b>Dictionaries and sets</b>:
98*4947cdc7SCole FaustStarlark dictionaries have predictable iteration order.
99*4947cdc7SCole FaustFurthermore, many Starlark values are hashable in Starlark even though
100*4947cdc7SCole Faustthe Go values that represent them are not hashable in Go: big
101*4947cdc7SCole Faustintegers, for example.
102*4947cdc7SCole FaustConsequently, we cannot use Go maps to implement Starlark's dictionary.
103*4947cdc7SCole Faust
104*4947cdc7SCole FaustWe use a simple hash table whose buckets are linked lists, each
105*4947cdc7SCole Faustelement of which holds up to 8 key/value pairs. In a well-distributed
106*4947cdc7SCole Fausttable the list should rarely exceed length 1. In addition, each
107*4947cdc7SCole Faustkey/value item is part of doubly-linked list that maintains the
108*4947cdc7SCole Faustinsertion order of the elements for iteration.
109*4947cdc7SCole Faust
110*4947cdc7SCole Faust<b>Struct:</b>
111*4947cdc7SCole FaustThe `starlarkstruct` Go package provides a non-standard Starlark
112*4947cdc7SCole Faustextension data type, `struct`, that maps field identifiers to
113*4947cdc7SCole Faustarbitrary values. Fields are accessed using dot notation: `y = s.f`.
114*4947cdc7SCole FaustThis data type is extensively used in Bazel, but its specification is
115*4947cdc7SCole Faustcurrently evolving.
116*4947cdc7SCole Faust
117*4947cdc7SCole FaustStarlark has no `class` mechanism, nor equivalent of Python's
118*4947cdc7SCole Faust`namedtuple`, though it is likely that future versions will support
119*4947cdc7SCole Faustsome way to define a record data type of several fields, with a
120*4947cdc7SCole Faustrepresentation more efficient than a hash table.
121*4947cdc7SCole Faust
122*4947cdc7SCole Faust
123*4947cdc7SCole Faust### Freezing
124*4947cdc7SCole Faust
125*4947cdc7SCole FaustAll mutable values created during module initialization are _frozen_
126*4947cdc7SCole Faustupon its completion. It is this property that permits a Starlark module
127*4947cdc7SCole Faustto be referenced by two Starlark threads running concurrently (such as
128*4947cdc7SCole Faustthe initialization threads of two other modules) without the
129*4947cdc7SCole Faustpossibility of a data race.
130*4947cdc7SCole Faust
131*4947cdc7SCole FaustThe Go implementation supports freezing by storing an additional
132*4947cdc7SCole Faust"frozen" Boolean variable in each mutable object. Once this flag is set,
133*4947cdc7SCole Faustall subsequent attempts at mutation fail. Every value defines a
134*4947cdc7SCole FaustFreeze method that sets its own frozen flag if not already set, and
135*4947cdc7SCole Faustcalls Freeze for each value that it contains.
136*4947cdc7SCole FaustFor example, when a list is frozen, it freezes each of its elements;
137*4947cdc7SCole Faustwhen a dictionary is frozen, it freezes each of its keys and values;
138*4947cdc7SCole Faustand when a function value is frozen, it freezes each of the free
139*4947cdc7SCole Faustvariables and parameter default values implicitly referenced by its closure.
140*4947cdc7SCole FaustApplication-defined types must also follow this discipline.
141*4947cdc7SCole Faust
142*4947cdc7SCole FaustThe freeze mechanism in the Go implementation is finer grained than in
143*4947cdc7SCole Faustthe Java implementation: in effect, the latter has one "frozen" flag
144*4947cdc7SCole Faustper module, and every value holds a reference to the frozen flag of
145*4947cdc7SCole Faustits module. This makes setting the frozen flag more efficient---a
146*4947cdc7SCole Faustsimple bit flip, no need to traverse the object graph---but coarser
147*4947cdc7SCole Faustgrained. Also, it complicates the API slightly because to construct a
148*4947cdc7SCole Faustlist, say, requires a reference to the frozen flag it should use.
149*4947cdc7SCole Faust
150*4947cdc7SCole FaustThe Go implementation would also permit the freeze operation to be
151*4947cdc7SCole Faustexposed to the program, for example as a built-in function.
152*4947cdc7SCole FaustThis has proven valuable in writing tests of the freeze mechanism
153*4947cdc7SCole Faustitself, but is otherwise mostly a curiosity.
154*4947cdc7SCole Faust
155*4947cdc7SCole Faust
156*4947cdc7SCole Faust### Fail-fast iterators
157*4947cdc7SCole Faust
158*4947cdc7SCole FaustIn some languages (such as Go), a program may mutate a data structure
159*4947cdc7SCole Faustwhile iterating over it; for example, a range loop over a map may
160*4947cdc7SCole Faustdelete map elements. In other languages (such as Java), iterators do
161*4947cdc7SCole Faustextra bookkeeping so that modification of the underlying collection
162*4947cdc7SCole Faustinvalidates the iterator, and the next attempt to use it fails.
163*4947cdc7SCole FaustThis often helps to detect subtle mistakes.
164*4947cdc7SCole Faust
165*4947cdc7SCole FaustStarlark takes this a step further. Instead of mutation of the
166*4947cdc7SCole Faustcollection invalidating the iterator, the act of iterating makes the
167*4947cdc7SCole Faustcollection temporarily immutable, so that an attempt to, say, delete a
168*4947cdc7SCole Faustdict element while looping over the dict, will fail. The error is
169*4947cdc7SCole Faustreported against the delete operation, not the iteration.
170*4947cdc7SCole Faust
171*4947cdc7SCole FaustThis is implemented by having each mutable iterable value record a
172*4947cdc7SCole Faustcounter of active iterators. Starting a loop increments this counter,
173*4947cdc7SCole Faustand completing a loop decrements it. A collection with a nonzero
174*4947cdc7SCole Faustcounter behaves as if frozen. If the collection is actually frozen,
175*4947cdc7SCole Faustthe counter bookkeeping is unnecessary. (Consequently, iterator
176*4947cdc7SCole Faustbookkeeping is needed only while objects are still mutable, before
177*4947cdc7SCole Faustthey can have been published to another thread, and thus no
178*4947cdc7SCole Faustsynchronization is necessary.)
179*4947cdc7SCole Faust
180*4947cdc7SCole FaustA consequence of this design is that in the Go API, it is imperative
181*4947cdc7SCole Faustto call `Done` on each iterator once it is no longer needed.
182*4947cdc7SCole Faust
183*4947cdc7SCole Faust```
184*4947cdc7SCole FaustTODO
185*4947cdc7SCole Fauststarlark.Value interface and subinterfaces
186*4947cdc7SCole Faustargument passing to builtins: UnpackArgs, UnpackPositionalArgs.
187*4947cdc7SCole Faust```
188*4947cdc7SCole Faust
189*4947cdc7SCole Faust<b>Evaluation strategy:</b>
190*4947cdc7SCole FaustThe evaluator uses a simple recursive tree walk, returning a value or
191*4947cdc7SCole Faustan error for each expression. We have experimented with just-in-time
192*4947cdc7SCole Faustcompilation of syntax trees to bytecode, but two limitations in the
193*4947cdc7SCole Faustcurrent Go compiler prevent this strategy from outperforming the
194*4947cdc7SCole Fausttree-walking evaluator.
195*4947cdc7SCole Faust
196*4947cdc7SCole FaustFirst, the Go compiler does not generate a "computed goto" for a
197*4947cdc7SCole Faustswitch statement ([Go issue
198*4947cdc7SCole Faust5496](https://github.com/golang/go/issues/5496)). A bytecode
199*4947cdc7SCole Faustinterpreter's main loop is a for-loop around a switch statement with
200*4947cdc7SCole Faustdozens or hundreds of cases, and the speed with which each case can be
201*4947cdc7SCole Faustdispatched strongly affects overall performance.
202*4947cdc7SCole FaustCurrently, a switch statement generates a binary tree of ordered
203*4947cdc7SCole Faustcomparisons, requiring several branches instead of one.
204*4947cdc7SCole Faust
205*4947cdc7SCole FaustSecond, the Go compiler's escape analysis assumes that the underlying
206*4947cdc7SCole Faustarray from a `make([]Value, n)` allocation always escapes
207*4947cdc7SCole Faust([Go issue 20533](https://github.com/golang/go/issues/20533)).
208*4947cdc7SCole FaustBecause the bytecode interpreter's operand stack has a non-constant
209*4947cdc7SCole Faustlength, it must be allocated with `make`. The resulting allocation
210*4947cdc7SCole Faustadds to the cost of each Starlark function call; this can be tolerated
211*4947cdc7SCole Faustby amortizing one very large stack allocation across many calls.
212*4947cdc7SCole FaustMore problematic appears to be the cost of the additional GC write
213*4947cdc7SCole Faustbarriers incurred by every VM operation: every intermediate result is
214*4947cdc7SCole Faustsaved to the VM's operand stack, which is on the heap.
215*4947cdc7SCole FaustBy contrast, intermediate results in the tree-walking evaluator are
216*4947cdc7SCole Faustnever stored to the heap.
217*4947cdc7SCole Faust
218*4947cdc7SCole Faust```
219*4947cdc7SCole FaustTODO
220*4947cdc7SCole Faustframes, backtraces, errors.
221*4947cdc7SCole Faustthreads
222*4947cdc7SCole FaustPrint
223*4947cdc7SCole FaustLoad
224*4947cdc7SCole Faust```
225*4947cdc7SCole Faust
226*4947cdc7SCole Faust## Testing
227*4947cdc7SCole Faust
228*4947cdc7SCole Faust```
229*4947cdc7SCole FaustTODO
230*4947cdc7SCole Fauststarlarktest package
231*4947cdc7SCole Faust`assert` module
232*4947cdc7SCole Fauststarlarkstruct
233*4947cdc7SCole Faustintegration with Go testing.T
234*4947cdc7SCole Faust```
235*4947cdc7SCole Faust
236*4947cdc7SCole Faust
237*4947cdc7SCole Faust## TODO
238*4947cdc7SCole Faust
239*4947cdc7SCole Faust
240*4947cdc7SCole Faust```
241*4947cdc7SCole FaustDiscuss practical separation of code and data.
242*4947cdc7SCole Faust```
243