1# The Frame Stack
2
3Each call to a Python function has an activation record,
4commonly known as a "frame".
5Python semantics allows frames to outlive the activation,
6so they have (before 3.11) been allocated on the heap.
7This is expensive as it requires many allocations and
8results in poor locality of reference.
9
10In 3.11, rather than have these frames scattered about memory,
11as happens for heap-allocated objects, frames are allocated
12contiguously in a per-thread stack.
13This improves performance significantly for two reasons:
14* It reduces allocation overhead to a pointer comparison and increment.
15* Stack allocated data has the best possible locality and will always be in
16  CPU cache.
17
18Generator and coroutines still need heap allocated activation records, but
19can be linked into the per-thread stack so as to not impact performance too much.
20
21## Layout
22
23Each activation record consists of four conceptual sections:
24
25* Local variables (including arguments, cells and free variables)
26* Evaluation stack
27* Specials: The per-frame object references needed by the VM: globals dict,
28  code object, etc.
29* Linkage: Pointer to the previous activation record, stack depth, etc.
30
31### Layout
32
33The specials and linkage sections are a fixed size, so are grouped together.
34
35Each activation record is laid out as:
36* Specials and linkage
37* Locals
38* Stack
39
40This seems to provide the best performance without excessive complexity.
41It needs the interpreter to hold two pointers, a frame pointer and a stack pointer.
42
43#### Alternative layout
44
45An alternative layout that was used for part of 3.11 alpha was:
46
47* Locals
48* Specials and linkage
49* Stack
50
51This has the advantage that no copying is required when making a call,
52as the arguments on the stack are (usually) already in the correct
53location for the parameters. However, it requires the VM to maintain
54an extra pointer for the locals, which can hurt performance.
55
56A variant that only needs the need two pointers is to reverse the numbering
57of the locals, so that the last one is numbered `0`, and the first in memory
58is numbered `N-1`.
59This allows the locals, specials and linkage to accessed from the frame pointer.
60We may implement this in the future.
61
62#### Note:
63
64> In a contiguous stack, we would need to save one fewer registers, as the
65> top of the caller's activation record would be the same at the base of the
66> callee's. However, since some activation records are kept on the heap we
67> cannot do this.
68
69### Generators and Coroutines
70
71Generators and coroutines contain a `_PyInterpreterFrame`
72The specials sections contains the following pointers:
73
74* Globals dict
75* Builtins dict
76* Locals dict (not the "fast" locals, but the locals for eval and class creation)
77* Code object
78* Heap allocated `PyFrameObject` for this activation record, if any.
79* The function.
80
81The pointer to the function is not strictly required, but it is cheaper to
82store a strong reference to the function and borrowed references to the globals
83and builtins, than strong references to both globals and builtins.
84
85### Frame objects
86
87When creating a backtrace or when calling `sys._getframe()` the frame becomes
88visible to Python code. When this happens a new `PyFrameObject` is created
89and a strong reference to it placed in the `frame_obj` field of the specials
90section. The `frame_obj` field is initially `NULL`.
91
92The `PyFrameObject` may outlive a stack-allocated `_PyInterpreterFrame`.
93If it does then `_PyInterpreterFrame` is copied into the `PyFrameObject`,
94except the evaluation stack which must be empty at this point.
95The linkage section is updated to reflect the new location of the frame.
96
97This mechanism provides the appearance of persistent, heap-allocated
98frames for each activation, but with low runtime overhead.
99
100### Generators and Coroutines
101
102
103Generator objects have a `_PyInterpreterFrame` embedded in them.
104This means that creating a generator requires only a single allocation,
105reducing allocation overhead and improving locality of reference.
106The embedded frame is linked into the per-thread frame when iterated or
107awaited.
108
109If a frame object associated with a generator outlives the generator, then
110the embedded `_PyInterpreterFrame` is copied into the frame object.
111
112
113All the above applies to coroutines and async generators as well.
114
115### Field names
116
117Many of the fields in `_PyInterpreterFrame` were copied from the 3.10 `PyFrameObject`.
118Thus, some of the field names may be a bit misleading.
119
120For example the `f_globals` field has a `f_` prefix implying it belongs to the
121`PyFrameObject` struct, although it belongs to the `_PyInterpreterFrame` struct.
122We may rationalize this naming scheme for 3.12.