| Name | Date | Size | #Lines | LOC |
|---|---|---|---|---|
| COW.cpp | 25-Apr-2025 | 5.4 KiB | 153 | 88 |
| COW.h | 25-Apr-2025 | 1 KiB | 33 | 14 |
| COWDeleter.cpp | 25-Apr-2025 | 1.1 KiB | 43 | 33 |
| COWDeleter.h | 25-Apr-2025 | 2 KiB | 67 | 24 |
| DeviceGuardImplInterface.cpp | 25-Apr-2025 | 518 B | 17 | 11 |
| DeviceGuardImplInterface.h | 25-Apr-2025 | 12.9 KiB | 366 | 148 |
| FakeGuardImpl.h | 25-Apr-2025 | 3.1 KiB | 103 | 81 |
| GPUTrace.cpp | 25-Apr-2025 | 425 B | 19 | 13 |
| GPUTrace.h | 25-Apr-2025 | 864 B | 29 | 14 |
| HermeticPyObjectTLS.cpp | 25-Apr-2025 | 447 B | 22 | 14 |
| HermeticPyObjectTLS.h | 25-Apr-2025 | 2.4 KiB | 60 | 17 |
| InlineDeviceGuard.h | 25-Apr-2025 | 15.3 KiB | 430 | 185 |
| InlineEvent.h | 25-Apr-2025 | 3.8 KiB | 140 | 117 |
| InlineStreamGuard.h | 25-Apr-2025 | 9.5 KiB | 257 | 147 |
| LocalDispatchKeySet.cpp | 25-Apr-2025 | 4.1 KiB | 118 | 69 |
| LocalDispatchKeySet.h | 25-Apr-2025 | 6.1 KiB | 165 | 98 |
| PyInterpreter.cpp | 25-Apr-2025 | 4.5 KiB | 148 | 122 |
| PyInterpreter.h | 25-Apr-2025 | 11 KiB | 264 | 114 |
| PyObjectSlot.cpp | 25-Apr-2025 | 2.3 KiB | 74 | 51 |
| PyObjectSlot.h | 25-Apr-2025 | 8 KiB | 191 | 77 |
| PythonDispatcherTLS.cpp | 25-Apr-2025 | 741 B | 30 | 23 |
| PythonDispatcherTLS.h | 25-Apr-2025 | 549 B | 25 | 19 |
| README-cow.md | 25-Apr-2025 | 3.3 KiB | 68 | 58 |
| README.md | 25-Apr-2025 | 815 B | 14 | 11 |
| SizesAndStrides.cpp | 25-Apr-2025 | 2.8 KiB | 80 | 64 |
| SizesAndStrides.h | 25-Apr-2025 | 8.2 KiB | 316 | 242 |
| TorchDispatchModeTLS.cpp | 25-Apr-2025 | 6.7 KiB | 197 | 165 |
| TorchDispatchModeTLS.h | 25-Apr-2025 | 2.2 KiB | 68 | 41 |
| VirtualGuardImpl.h | 25-Apr-2025 | 3.1 KiB | 104 | 84 |
| alloc_cpu.cpp | 25-Apr-2025 | 4.5 KiB | 167 | 133 |
| alloc_cpu.h | 25-Apr-2025 | 178 B | 13 | 7 |

README-cow.md

Copy-on-write storage
=====================
This library adds support for copy-on-write storage, i.e. lazy copies,
to tensors. The design maintains the PyTorch invariant that tensors
alias if and only if they share a storage. Thus, tensors that are lazy
copies of one another will have distinct storages that share a data
allocation.
8
Thread-safety
-------------
The correctness of this design hinges on the pre-existing PyTorch user
requirement (and general default programming assumption) that users
are responsible for guaranteeing that writes do not take place
concurrently with reads or other writes.
15
Lazily copied tensors add a complication to this programming model
because users are not required to know whether lazy copies exist and
are not required to serialize writes across lazy copies. For example:
two tensors with distinct storages that share a copy-on-write data
context may be given to different threads that may do whatever they
wish to them, and the runtime is required to guarantee safety.
22
It turns out that this is not that difficult to protect against
because, due to the copy-on-write requirement, we just need to
materialize a tensor upon writing. This could be done entirely without
synchronization if we materialized each copy; however, we apply a
common-sense optimization that elides the copy for the last remaining
reference. This requires waiting for any pending copies.
29
### Thread-safety detailed design
There are two operations that affect the copy-on-write details of a
tensor:

1) lazy-clone (e.g. an explicit call or a hidden implementation detail
   added through an operator like reshape)
2) materialization (i.e. any write to the tensor)
37
The key insight that we exploit is that lazy-clone is logically a read
operation and materialization is logically a write operation. This
means that, for a given set of tensors that share a storage, if
materialization is taking place, no other read operation, including
lazy-clone, can be concurrent with it.
43
However, this insight only applies within a set of tensors that share
a storage. We also have to be concerned with tensors with different
storages that share a copy-on-write context. In this world,
materialization can race with lazy-clone or even other
materializations. _However_, in order for this to be the case, there
must be _at least_ two references to the context. This means that the
context _can not_ vanish out from under you if you are performing a
lazy-clone, and hence, it only requires an atomic refcount bump.
52
The most complicated case is when all lazy copies are concurrently
materializing. In this case, because a write is occurring, there are
no in-flight lazy-clones taking place. We must simply ensure that all
lazy copies are able to materialize (read the data) concurrently. If
we didn't have the aforementioned optimization where the last copy
steals the data, we could get away with no locking whatsoever: each
makes a copy and decrements the refcount. However, because of the
optimization, we require the loser of the materialization race to
wait for the pending copies to finish, and then steal the data
without copying it.
63
We implement this by taking a shared lock when copying the data and
taking an exclusive lock when stealing the data. The exclusive lock
acquisition ensures that all pending shared locks are finished before
we steal the data.

README.md

c10/core/impl provides headers for functionality that is only needed in very
*specific* use-cases (e.g., you are defining a new device type), which are
generally only needed by C10 or PyTorch code. If you are an ordinary end-user,
you **should not** use headers in this folder. We permanently give NO
backwards-compatibility guarantees for implementations in this folder.

Compare with [c10/util](../../util), which provides functionality that is not
directly related to being a deep learning library (e.g., C++20 polyfills), but
may still be generally useful and visible to users.

(We don't call this c10/detail, because the detail namespace convention is for
*header private* details. However, c10::impl may be utilized from external
headers; it simply indicates that the functionality is not for end users.)
14