// This module exists only to provide a separate page for the implementation
// documentation.

//! Notes on `sharded-slab`'s implementation and design.
//!
//! # Design
//!
//! The sharded slab's design is strongly inspired by the ideas presented by
//! Leijen, Zorn, and de Moura in [Mimalloc: Free List Sharding in
//! Action][mimalloc]. In this report, the authors present a novel design for a
//! memory allocator based on a concept of _free list sharding_.
//!
//! Memory allocators must keep track of what memory regions are not currently
//! allocated ("free") in order to provide them to future allocation requests.
//! The term [_free list_][freelist] refers to a technique for performing this
//! bookkeeping, where each free block stores a pointer to the next free block,
//! forming a linked list. The memory allocator keeps a pointer to the most
//! recently freed block, the _head_ of the free list. To allocate more memory,
//! the allocator pops from the free list by setting the head pointer to the
//! next free block of the current head block, and returning the previous head.
//! To deallocate a block, the block is pushed to the free list by setting its
//! first word to the current head pointer, and the head pointer is set to point
//! to the deallocated block. Most implementations of slab allocators backed by
//! arrays or vectors use a similar technique, where pointers are replaced by
//! indices into the backing array.
//!
//! When allocations and deallocations can occur concurrently across threads,
//! they must synchronize accesses to the free list, either by putting the
//! entire allocator state inside a lock, or by using atomic operations to
//! treat the free list as a lock-free structure (such as a [Treiber stack]). In
//! both cases, there is a significant performance cost: even when the free
//! list is lock-free, a noticeable amount of time is likely to be spent in
//! compare-and-swap loops. Ideally, the global synchronization point created
//! by the single global free list could be avoided as much as possible.
//!
//! The approach presented by Leijen, Zorn, and de Moura is to introduce
//! sharding and thus increase the granularity of synchronization significantly.
//! In mimalloc, the heap is _sharded_ so that each thread has its own
//! thread-local heap. Objects are always allocated from the local heap of the
//! thread where the allocation is performed. Because allocations are always
//! done from a thread's local heap, they need not be synchronized.
//!
//! However, since objects can move between threads before being deallocated,
//! _deallocations_ may still occur concurrently. Therefore, Leijen et al.
//! introduce a concept of _local_ and _global_ free lists. When an object is
//! deallocated on the same thread it was originally allocated on, it is placed
//! on the local free list; if it is deallocated on another thread, it goes on
//! the global free list for the heap of the thread from which it originated. To
//! allocate, the local free list is used first; if it is empty, the entire
//! global free list is popped onto the local free list. Since the local free
//! list is only ever accessed by the thread it belongs to, it does not require
//! synchronization at all, and because the global free list is popped from
//! infrequently, the cost of synchronization has a reduced impact. The majority
//! of allocations can occur without any synchronization at all, and
//! deallocations only require synchronization when an object has left its
//! parent thread (a relatively uncommon case).
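//!
//! Below is a minimal sketch of this free list sharding discipline, using
//! invented names (`Shard`, `local_head`, `remote_head`, and so on) rather
//! than this crate's actual types: the owning thread pushes to and pops from
//! its local list without synchronization, other threads push onto the remote
//! ("global") list with a compare-and-swap loop, and the owner steals the
//! entire remote list with a single atomic swap only when its local list is
//! empty. A real implementation would also need interior mutability so that a
//! shard can actually be shared across threads; that bookkeeping is elided
//! here.
//!
//! ```rust,ignore
//! use std::sync::atomic::{AtomicUsize, Ordering};
//!
//! const NULL: usize = usize::MAX; // sentinel meaning "no next free slot"
//!
//! struct Shard {
//!     next: Vec<AtomicUsize>,   // per-slot "next free slot" links
//!     local_head: usize,        // head of the local free list (owner thread only)
//!     remote_head: AtomicUsize, // head of the remote free list (any thread)
//! }
//!
//! impl Shard {
//!     fn new(capacity: usize) -> Self {
//!         // Initially, every slot is on the local free list: slot i links to i + 1.
//!         let next = (0..capacity)
//!             .map(|i| AtomicUsize::new(if i + 1 < capacity { i + 1 } else { NULL }))
//!             .collect();
//!         Shard {
//!             next,
//!             local_head: if capacity == 0 { NULL } else { 0 },
//!             remote_head: AtomicUsize::new(NULL),
//!         }
//!     }
//!
//!     // Owner thread only: pop from the unsynchronized local list, stealing
//!     // the whole remote list with one atomic swap if the local list is empty.
//!     fn alloc(&mut self) -> Option<usize> {
//!         if self.local_head == NULL {
//!             self.local_head = self.remote_head.swap(NULL, Ordering::Acquire);
//!         }
//!         let head = self.local_head;
//!         if head == NULL {
//!             return None; // both lists are empty; a real slab would grow a new page
//!         }
//!         self.local_head = self.next[head].load(Ordering::Relaxed);
//!         Some(head)
//!     }
//!
//!     // Owner thread only: push onto the local list with no synchronization.
//!     fn free_local(&mut self, slot: usize) {
//!         self.next[slot].store(self.local_head, Ordering::Relaxed);
//!         self.local_head = slot;
//!     }
//!
//!     // Any other thread: Treiber-style push onto the remote list.
//!     fn free_remote(&self, slot: usize) {
//!         let mut head = self.remote_head.load(Ordering::Relaxed);
//!         loop {
//!             self.next[slot].store(head, Ordering::Relaxed);
//!             match self.remote_head.compare_exchange_weak(
//!                 head, slot, Ordering::Release, Ordering::Relaxed,
//!             ) {
//!                 Ok(_) => return,
//!                 Err(actual) => head = actual,
//!             }
//!         }
//!     }
//! }
//!
//! fn main() {
//!     let mut shard = Shard::new(2);
//!     let a = shard.alloc().unwrap();
//!     let b = shard.alloc().unwrap();
//!     shard.free_local(a);  // freed on the owning thread
//!     shard.free_remote(b); // another thread would do this for a "remote" free
//!     assert_eq!(shard.alloc(), Some(a)); // the local list is used first
//!     assert_eq!(shard.alloc(), Some(b)); // then the stolen remote list
//! }
//! ```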
//!
//! [mimalloc]: https://www.microsoft.com/en-us/research/uploads/prod/2019/06/mimalloc-tr-v1.pdf
//! [freelist]: https://en.wikipedia.org/wiki/Free_list
//! [Treiber stack]: https://en.wikipedia.org/wiki/Treiber_stack
//!
//! # Implementation
//!
//! A slab is represented as an array of [`MAX_THREADS`] _shards_. A shard
//! consists of a vector of one or more _pages_ plus associated metadata.
//! Finally, a page consists of an array of _slots_, plus the head indices of
//! its local and remote free lists.
//!
//! ```text
//! ┌─────────────┐
//! │ shard 1     │
//! │             │    ┌─────────────┐        ┌────────┐
//! │ pages───────┼───▶│ page 1      │        │        │
//! ├─────────────┤    ├─────────────┤  ┌────▶│  next──┼─┐
//! │ shard 2     │    │ page 2      │  │     ├────────┤ │
//! ├─────────────┤    │             │  │     │XXXXXXXX│ │
//! │ shard 3     │    │ local_head──┼──┘     ├────────┤ │
//! └─────────────┘    │ remote_head─┼──┐     │        │◀┘
//!       ...          ├─────────────┤  │     │  next──┼─┐
//! ┌─────────────┐    │ page 3      │  │     ├────────┤ │
//! │ shard n     │    └─────────────┘  │     │XXXXXXXX│ │
//! └─────────────┘          ...        │     ├────────┤ │
//!                    ┌─────────────┐  │     │XXXXXXXX│ │
//!                    │ page n      │  │     ├────────┤ │
//!                    └─────────────┘  │     │        │◀┘
//!                                     └────▶│  next──┼───▶  ...
//!                                           ├────────┤
//!                                           │XXXXXXXX│
//!                                           └────────┘
//! ```
//!
//! The size of the first page in a shard is always a power of two, and every
//! subsequent page added after the first is twice as large as the page that
//! precedes it.
//!
//! ```text
//! pg.
//! ┌───┐   ┌─┬─┐
//! │ 0 │───▶ │ │
//! ├───┤   ├─┼─┼─┬─┐
//! │ 1 │───▶ │ │ │ │
//! ├───┤   ├─┼─┼─┼─┼─┬─┬─┬─┐
//! │ 2 │───▶ │ │ │ │ │ │ │ │
//! ├───┤   ├─┼─┼─┼─┼─┼─┼─┼─┼─┬─┬─┬─┬─┬─┬─┬─┐
//! │ 3 │───▶ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │
//! └───┘   └─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┘
//! ```
//!
//! When searching for a free slot, the smallest page is searched first, and if
//! it is full, the search proceeds to the next page until either a free slot is
//! found or all available pages have been searched. If all available pages have
//! been searched and the maximum number of pages has not yet been reached, a
//! new page is then allocated (a simplified sketch of this search appears at
//! the end of these notes).
//!
//! Since every page is twice as large as the previous page, and all page sizes
//! are powers of two, we can determine the page index that contains a given
//! address by shifting the address down by the smallest page size and looking
//! at how many twos places are necessary to represent that number, which tells
//! us what power-of-two page size it fits inside. We can determine the number
//! of twos places by counting the number of leading zeros (unused twos places)
//! in the number's binary representation, and subtracting that count from the
//! total number of bits in a word.
//!
//! The formula for determining the page number that contains an offset is thus:
//!
//! ```rust,ignore
//! WIDTH - ((offset + INITIAL_PAGE_SIZE) >> INDEX_SHIFT).leading_zeros()
//! ```
//!
//! where `WIDTH` is the number of bits in a `usize`, and `INDEX_SHIFT` is
//!
//! ```rust,ignore
//! INITIAL_PAGE_SIZE.trailing_zeros() + 1;
//! ```
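//!
//! As a concrete illustration, assume (hypothetically) an initial page size of
//! 32 slots; the formula then assigns offsets `0..=31` to page 0, `32..=95` to
//! page 1, `96..=223` to page 2, and so on. The constant names below mirror the
//! formula above, but the values are only an example; the real values come from
//! the slab's configuration.
//!
//! ```rust,ignore
//! const WIDTH: usize = usize::BITS as usize;       // bits in a `usize`
//! const INITIAL_PAGE_SIZE: usize = 32;             // hypothetical first page size
//! const INDEX_SHIFT: usize = INITIAL_PAGE_SIZE.trailing_zeros() as usize + 1;
//!
//! /// Returns the index of the page containing the given slot offset.
//! fn page_index(offset: usize) -> usize {
//!     // Shift down so every offset on the first page maps to 0, then count
//!     // how many twos places the shifted value needs.
//!     WIDTH - ((offset + INITIAL_PAGE_SIZE) >> INDEX_SHIFT).leading_zeros() as usize
//! }
//!
//! fn main() {
//!     assert_eq!(page_index(0), 0);   // first slot of page 0 (32 slots)
//!     assert_eq!(page_index(31), 0);  // last slot of page 0
//!     assert_eq!(page_index(32), 1);  // first slot of page 1 (64 slots)
//!     assert_eq!(page_index(95), 1);  // last slot of page 1
//!     assert_eq!(page_index(96), 2);  // first slot of page 2 (128 slots)
//! }
//! ```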
//!
//! [`MAX_THREADS`]: https://docs.rs/sharded-slab/latest/sharded_slab/trait.Config.html#associatedconstant.MAX_THREADS
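//!
//! For reference, here is the simplified free-slot search mentioned earlier.
//! The `Page` type and its linear-scan `try_alloc` are stand-ins invented for
//! this example (the real implementation finds a free slot through each page's
//! local and remote free lists, and its sizes come from the slab's
//! configuration); the sketch shows only the page-walk order and the doubling
//! growth policy.
//!
//! ```rust,ignore
//! const INITIAL_PAGE_SIZE: usize = 32; // hypothetical size of the first page
//! const MAX_PAGES: usize = 16;         // hypothetical cap on pages per shard
//!
//! struct Page {
//!     slots: Vec<Option<u64>>, // `None` marks a free slot in this toy version
//! }
//!
//! impl Page {
//!     fn with_capacity(capacity: usize) -> Self {
//!         Page { slots: vec![None; capacity] }
//!     }
//!
//!     // Claim a free slot on this page, returning its page-local index.
//!     fn try_alloc(&mut self, value: u64) -> Option<usize> {
//!         let idx = self.slots.iter().position(|slot| slot.is_none())?;
//!         self.slots[idx] = Some(value);
//!         Some(idx)
//!     }
//! }
//!
//! // Search pages from smallest to largest; grow by a doubled page if all are full.
//! fn alloc(pages: &mut Vec<Page>, value: u64) -> Option<usize> {
//!     let mut first_slot = 0; // offset of the current page's first slot
//!     for page in pages.iter_mut() {
//!         if let Some(idx) = page.try_alloc(value) {
//!             return Some(first_slot + idx);
//!         }
//!         first_slot += page.slots.len();
//!     }
//!     if pages.len() == MAX_PAGES {
//!         return None; // the shard is at capacity
//!     }
//!     // Every page added after the first is twice as large as its predecessor.
//!     let mut page = Page::with_capacity(INITIAL_PAGE_SIZE << pages.len());
//!     let idx = page.try_alloc(value).expect("a fresh page has free slots");
//!     pages.push(page);
//!     Some(first_slot + idx)
//! }
//!
//! fn main() {
//!     let mut pages = Vec::new();
//!     // The first 32 insertions fill page 0 (32 slots); the 33rd adds page 1.
//!     let offsets: Vec<usize> = (0..33).map(|i| alloc(&mut pages, i).unwrap()).collect();
//!     assert_eq!(offsets[31], 31); // last slot on page 0
//!     assert_eq!(offsets[32], 32); // first slot on page 1
//!     assert_eq!(pages.len(), 2);
//! }
//! ```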