README-cow.md
Copy-on-write storage
=====================
This library adds support for copy-on-write storage, i.e. lazy copies,
to tensors. The design maintains the PyTorch invariant that tensors
alias if and only if they share a storage. Thus, tensors that are lazy
copies of one another will have distinct storages that share a data
allocation.
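
One way to picture this ownership structure is the following minimal
C++ sketch. The `CowContext` and `Storage` types here are hypothetical
simplifications, not the actual c10 classes: every lazily copied
tensor gets its own storage object, and all of those storages point at
a single shared context that owns the allocation.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <shared_mutex>

// One context owns the data allocation. Its refcount counts the
// storages sharing it; the mutex synchronizes the materialization
// scheme described later in this document.
struct CowContext {
  std::unique_ptr<std::byte[]> data;
  std::size_t size_bytes = 0;
  std::atomic<std::int64_t> refcount{1};
  std::shared_mutex mutex;
};

// Each tensor in a family of lazy copies has a *distinct* Storage (so
// the "tensors alias iff they share a storage" invariant holds), but
// all of them share one CowContext.
struct Storage {
  CowContext* ctx = nullptr;
};
```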

Thread-safety
-------------
The correctness of this design hinges on the pre-existing PyTorch user
requirement (and general default programming assumption) that users
are responsible for guaranteeing that writes do not take place
concurrently with reads and other writes.

Lazily copied tensors add a complication to this programming model
because users are not required to know whether lazy copies exist and
are not required to serialize writes across lazy copies. For example:
two tensors with distinct storages that share a copy-on-write data
context may be given to different threads that may do whatever they
wish to them, and the runtime is required to guarantee that this is
safe.

It turns out that this is not that difficult to protect because, due
to the copy-on-write requirement, we just need to materialize a tensor
upon writing. This could be done entirely without synchronization if
we materialized each copy; however, we have a common-sense
optimization that elides the copy for the last remaining reference.
This requires waiting for any pending copies.
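
To make the baseline concrete, here is a hedged sketch, continuing the
hypothetical types above, of the fully unsynchronized scheme in which
every materialization copies: each writer copies the data, decrements
the refcount, and the last one out frees the shared allocation. It is
the last-reference steal, shown in the detailed design below, that
introduces the need to wait.

```cpp
#include <cstring>

// Copy the shared allocation into a fresh, private allocation.
std::unique_ptr<std::byte[]> copy_data(const CowContext& ctx) {
  auto fresh = std::make_unique<std::byte[]>(ctx.size_bytes);
  std::memcpy(fresh.get(), ctx.data.get(), ctx.size_bytes);
  return fresh;
}

// Always-copy materialization: no locking needed. Our own counted
// reference keeps `old` alive for the duration of the copy; whoever
// decrements the refcount to zero frees the context.
void materialize_always_copy(Storage& s) {
  CowContext* old = s.ctx;
  s.ctx = new CowContext{copy_data(*old), old->size_bytes};
  if (old->refcount.fetch_sub(1, std::memory_order_acq_rel) == 1) {
    delete old;
  }
}
```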

### Thread-safety detailed design
There are two operations that affect the copy-on-write details of a
tensor:

1) lazy-clone (e.g. an explicit call or a hidden implementation detail
   added through an operator like reshape)
2) materialization (i.e. any write to the tensor)

The key insight that we exploit is that lazy-clone is logically a read
operation and materialization is logically a write operation. This
means that, for a given set of tensors that share a storage, if
materialization is taking place, no other read operation, including
lazy-clone, can be concurrent with it.

However, this insight only applies within a set of tensors that share
a storage. We also have to be concerned with tensors with different
storages that share a copy-on-write context. In this world,
materialization can race with lazy-clone or even other
materializations. _However_, in order for this to be the case, there
must be _at least_ two references to the context. This means that the
context _cannot_ vanish out from under you if you are performing a
lazy-clone, and hence, it only requires an atomic refcount bump.
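
Continuing the hypothetical sketch from above: because the source
storage already holds one counted reference, the context is guaranteed
to stay alive for the duration of the call, and the refcount bump is
the only synchronization lazy-clone needs.

```cpp
// Lazy-clone is logically a read: it creates a new storage that shares
// the existing context. The caller's reference pins the context, so a
// relaxed atomic increment suffices.
Storage lazy_clone(const Storage& src) {
  src.ctx->refcount.fetch_add(1, std::memory_order_relaxed);
  return Storage{src.ctx};
}
```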

The most complicated case is when all lazy copies are concurrently
materializing. In this case, because a write is occurring on every
lazy copy, there can be no in-flight lazy-clones. We must simply
ensure that all lazy copies are able to materialize (read the data)
concurrently. If we didn't have the aforementioned optimization where
the last copy steals the data, we could get away with no locking
whatsoever: each makes a copy and decrements the refcount. However,
because of the optimization, we require the loser of the materializing
race to wait for the pending copies to finish, and then steal the data
without copying it.

We implement this by taking a shared lock when copying the data and
taking an exclusive lock when stealing the data. The exclusive lock
acquisition ensures that all pending shared locks are finished before
we steal the data.
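
Here is a hedged sketch of that locking scheme, again building on the
hypothetical types and the `copy_data` helper above (the real c10
implementation differs in its details). Copiers take the shared lock
*before* decrementing the refcount, which guarantees that a stealer
observing the count reach zero will block on the exclusive lock until
every in-flight copy has finished reading the data.

```cpp
// Materialization with the last-reference steal. Copiers proceed
// concurrently under the shared lock; the loser of the race waits for
// them via the exclusive lock and then takes the allocation for itself.
void materialize(Storage& s) {
  CowContext* ctx = s.ctx;
  {
    // Decrement under the shared lock so that, once the refcount hits
    // zero, every other materializer either still holds the shared
    // lock or has already completed its copy.
    std::shared_lock<std::shared_mutex> shared(ctx->mutex);
    if (ctx->refcount.fetch_sub(1, std::memory_order_acq_rel) > 1) {
      // Not the last reference: copy the data under the shared lock,
      // concurrently with any other copiers.
      s.ctx = new CowContext{copy_data(*ctx), ctx->size_bytes};
      return;
    }
    // Last reference: drop the shared lock before taking the
    // exclusive one.
  }
  // Acquiring the exclusive lock waits for all pending shared locks,
  // i.e. for all pending copies, to finish. After that the allocation
  // is exclusively ours and no copy is needed.
  std::unique_lock<std::shared_mutex> exclusive(ctx->mutex);
  ctx->refcount.store(1, std::memory_order_relaxed);
}
```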
README.md
c10/core/impl provides headers for functionality that is only needed in very
*specific* use-cases (e.g., you are defining a new device type), which are
generally only needed by C10 or PyTorch code. If you are an ordinary end-user,
you **should not** use headers in this folder. We permanently give NO
backwards-compatibility guarantees for implementations in this folder.

Compare with [c10/util](../../util), which provides functionality that is not
directly related to being a deep learning library (e.g., C++20 polyfills), but
may still be generally useful and visible to users.

(We don't call this c10/detail, because the detail namespace convention is for
*header private* details. However, c10::impl may be utilized from external
headers; it simply indicates that the functionality is not for end users.)