1# The Frame Stack 2 3Each call to a Python function has an activation record, 4commonly known as a "frame". 5Python semantics allows frames to outlive the activation, 6so they have (before 3.11) been allocated on the heap. 7This is expensive as it requires many allocations and 8results in poor locality of reference. 9 10In 3.11, rather than have these frames scattered about memory, 11as happens for heap-allocated objects, frames are allocated 12contiguously in a per-thread stack. 13This improves performance significantly for two reasons: 14* It reduces allocation overhead to a pointer comparison and increment. 15* Stack allocated data has the best possible locality and will always be in 16 CPU cache. 17 18Generator and coroutines still need heap allocated activation records, but 19can be linked into the per-thread stack so as to not impact performance too much. 20 21## Layout 22 23Each activation record consists of four conceptual sections: 24 25* Local variables (including arguments, cells and free variables) 26* Evaluation stack 27* Specials: The per-frame object references needed by the VM: globals dict, 28 code object, etc. 29* Linkage: Pointer to the previous activation record, stack depth, etc. 30 31### Layout 32 33The specials and linkage sections are a fixed size, so are grouped together. 34 35Each activation record is laid out as: 36* Specials and linkage 37* Locals 38* Stack 39 40This seems to provide the best performance without excessive complexity. 41It needs the interpreter to hold two pointers, a frame pointer and a stack pointer. 42 43#### Alternative layout 44 45An alternative layout that was used for part of 3.11 alpha was: 46 47* Locals 48* Specials and linkage 49* Stack 50 51This has the advantage that no copying is required when making a call, 52as the arguments on the stack are (usually) already in the correct 53location for the parameters. However, it requires the VM to maintain 54an extra pointer for the locals, which can hurt performance. 55 56A variant that only needs the need two pointers is to reverse the numbering 57of the locals, so that the last one is numbered `0`, and the first in memory 58is numbered `N-1`. 59This allows the locals, specials and linkage to accessed from the frame pointer. 60We may implement this in the future. 61 62#### Note: 63 64> In a contiguous stack, we would need to save one fewer registers, as the 65> top of the caller's activation record would be the same at the base of the 66> callee's. However, since some activation records are kept on the heap we 67> cannot do this. 68 69### Generators and Coroutines 70 71Generators and coroutines contain a `_PyInterpreterFrame` 72The specials sections contains the following pointers: 73 74* Globals dict 75* Builtins dict 76* Locals dict (not the "fast" locals, but the locals for eval and class creation) 77* Code object 78* Heap allocated `PyFrameObject` for this activation record, if any. 79* The function. 80 81The pointer to the function is not strictly required, but it is cheaper to 82store a strong reference to the function and borrowed references to the globals 83and builtins, than strong references to both globals and builtins. 84 85### Frame objects 86 87When creating a backtrace or when calling `sys._getframe()` the frame becomes 88visible to Python code. When this happens a new `PyFrameObject` is created 89and a strong reference to it placed in the `frame_obj` field of the specials 90section. The `frame_obj` field is initially `NULL`. 91 92The `PyFrameObject` may outlive a stack-allocated `_PyInterpreterFrame`. 93If it does then `_PyInterpreterFrame` is copied into the `PyFrameObject`, 94except the evaluation stack which must be empty at this point. 95The linkage section is updated to reflect the new location of the frame. 96 97This mechanism provides the appearance of persistent, heap-allocated 98frames for each activation, but with low runtime overhead. 99 100### Generators and Coroutines 101 102 103Generator objects have a `_PyInterpreterFrame` embedded in them. 104This means that creating a generator requires only a single allocation, 105reducing allocation overhead and improving locality of reference. 106The embedded frame is linked into the per-thread frame when iterated or 107awaited. 108 109If a frame object associated with a generator outlives the generator, then 110the embedded `_PyInterpreterFrame` is copied into the frame object. 111 112 113All the above applies to coroutines and async generators as well. 114 115### Field names 116 117Many of the fields in `_PyInterpreterFrame` were copied from the 3.10 `PyFrameObject`. 118Thus, some of the field names may be a bit misleading. 119 120For example the `f_globals` field has a `f_` prefix implying it belongs to the 121`PyFrameObject` struct, although it belongs to the `_PyInterpreterFrame` struct. 122We may rationalize this naming scheme for 3.12.