README-cow.md
Copy-on-write storage
=====================
This library adds support for copy-on-write storage, i.e. lazy copies,
to tensors. The design maintains the PyTorch invariant that tensors
alias if and only if they share a storage. Thus, tensors that are lazy
copies of one another will have distinct storages that share a data
allocation.
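
One way to picture this ownership structure is the following minimal
C++ sketch. The `CowContext` and `Storage` types here are hypothetical
simplifications, not the actual c10 classes: every lazily copied
tensor gets its own storage object, and all of those storages point at
a single shared context that owns the allocation.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <shared_mutex>

// One context owns the data allocation. Its refcount counts the
// storages sharing it; the mutex synchronizes the materialization
// scheme described later in this document.
struct CowContext {
  std::unique_ptr<std::byte[]> data;
  std::size_t size_bytes = 0;
  std::atomic<std::int64_t> refcount{1};
  std::shared_mutex mutex;
};

// Each tensor in a family of lazy copies has a *distinct* Storage (so
// the "tensors alias iff they share a storage" invariant holds), but
// all of them share one CowContext.
struct Storage {
  CowContext* ctx = nullptr;
};
```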

Thread-safety
-------------
The correctness of this design hinges on the pre-existing PyTorch user
requirement (and general default programming assumption) that users
are responsible for guaranteeing that writes do not take place
concurrently with reads and other writes.

Lazily copied tensors add a complication to this programming model
because users are not required to know whether lazy copies exist and
are not required to serialize writes across lazy copies. For example:
two tensors with distinct storages that share a copy-on-write data
context may be given to different threads that may do whatever they
wish to them, and the runtime is required to guarantee that this is
safe.

It turns out that this is not that difficult to protect because, due
to the copy-on-write requirement, we just need to materialize a tensor
upon writing. This could be done entirely without synchronization if
we materialized each copy; however, we have a common-sense
optimization that elides the copy for the last remaining reference.
This requires waiting for any pending copies.
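
To make the baseline concrete, here is a hedged sketch, continuing the
hypothetical types above, of the fully unsynchronized scheme in which
every materialization copies: each writer copies the data, decrements
the refcount, and the last one out frees the shared allocation. It is
the last-reference steal, shown in the detailed design below, that
introduces the need to wait.

```cpp
#include <cstring>

// Copy the shared allocation into a fresh, private allocation.
std::unique_ptr<std::byte[]> copy_data(const CowContext& ctx) {
  auto fresh = std::make_unique<std::byte[]>(ctx.size_bytes);
  std::memcpy(fresh.get(), ctx.data.get(), ctx.size_bytes);
  return fresh;
}

// Always-copy materialization: no locking needed. Our own counted
// reference keeps `old` alive for the duration of the copy; whoever
// decrements the refcount to zero frees the context.
void materialize_always_copy(Storage& s) {
  CowContext* old = s.ctx;
  s.ctx = new CowContext{copy_data(*old), old->size_bytes};
  if (old->refcount.fetch_sub(1, std::memory_order_acq_rel) == 1) {
    delete old;
  }
}
```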

### Thread-safety detailed design
There are two operations that affect the copy-on-write details of a
tensor:

1) lazy-clone (e.g. an explicit call or a hidden implementation detail
   added through an operator like reshape)
2) materialization (i.e. any write to the tensor)

The key insight that we exploit is that lazy-clone is logically a read
operation and materialization is logically a write operation. This
means that, for a given set of tensors that share a storage, if
materialization is taking place, no other read operation, including
lazy-clone, can be concurrent with it.

However, this insight only applies within a set of tensors that share
a storage. We also have to be concerned with tensors with different
storages that share a copy-on-write context. In this world,
materialization can race with lazy-clone or even other
materializations. _However_, in order for this to be the case, there
must be _at least_ two references to the context. This means that the
context _cannot_ vanish out from under you if you are performing a
lazy-clone, and hence, it only requires an atomic refcount bump.
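
Continuing the hypothetical sketch from above: because the source
storage already holds one counted reference, the context is guaranteed
to stay alive for the duration of the call, and the refcount bump is
the only synchronization lazy-clone needs.

```cpp
// Lazy-clone is logically a read: it creates a new storage that shares
// the existing context. The caller's reference pins the context, so a
// relaxed atomic increment suffices.
Storage lazy_clone(const Storage& src) {
  src.ctx->refcount.fetch_add(1, std::memory_order_relaxed);
  return Storage{src.ctx};
}
```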

The most complicated case is when all lazy copies are concurrently
materializing. In this case, because a write is occurring on every
lazy copy, there can be no in-flight lazy-clones. We must simply
ensure that all lazy copies are able to materialize (read the data)
concurrently. If we didn't have the aforementioned optimization where
the last copy steals the data, we could get away with no locking
whatsoever: each makes a copy and decrements the refcount. However,
because of the optimization, we require the loser of the materializing
race to wait for the pending copies to finish, and then steal the data
without copying it.

We implement this by taking a shared lock when copying the data and
taking an exclusive lock when stealing the data. The exclusive lock
acquisition ensures that all pending shared locks are finished before
we steal the data.
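
Here is a hedged sketch of that locking scheme, again building on the
hypothetical types and the `copy_data` helper above (the real c10
implementation differs in its details). Copiers take the shared lock
*before* decrementing the refcount, which guarantees that a stealer
observing the count reach zero will block on the exclusive lock until
every in-flight copy has finished reading the data.

```cpp
// Materialization with the last-reference steal. Copiers proceed
// concurrently under the shared lock; the loser of the race waits for
// them via the exclusive lock and then takes the allocation for itself.
void materialize(Storage& s) {
  CowContext* ctx = s.ctx;
  {
    // Decrement under the shared lock so that, once the refcount hits
    // zero, every other materializer either still holds the shared
    // lock or has already completed its copy.
    std::shared_lock<std::shared_mutex> shared(ctx->mutex);
    if (ctx->refcount.fetch_sub(1, std::memory_order_acq_rel) > 1) {
      // Not the last reference: copy the data under the shared lock,
      // concurrently with any other copiers.
      s.ctx = new CowContext{copy_data(*ctx), ctx->size_bytes};
      return;
    }
    // Last reference: drop the shared lock before taking the
    // exclusive one.
  }
  // Acquiring the exclusive lock waits for all pending shared locks,
  // i.e. for all pending copies, to finish. After that the allocation
  // is exclusively ours and no copy is needed.
  std::unique_lock<std::shared_mutex> exclusive(ctx->mutex);
  ctx->refcount.store(1, std::memory_order_relaxed);
}
```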
README.md
c10/core/impl provides headers for functionality that is only needed in very
*specific* use-cases (e.g., you are defining a new device type), which are
generally only needed by C10 or PyTorch code. If you are an ordinary end-user,
you **should not** use headers in this folder. We permanently give NO
backwards-compatibility guarantees for implementations in this folder.

Compare with [c10/util](../../util), which provides functionality that is not
directly related to being a deep learning library (e.g., C++20 polyfills), but
may still be generally useful and visible to users.

(We don't call this c10/detail, because the detail namespace convention is for
*header private* details. However, c10::impl may be utilized from external
headers; it simply indicates that the functionality is not for end users.)