# Broadcasting semantics

This document describes how the broadcasting semantics in XLA work.

## What is broadcasting?

Broadcasting is the process of making arrays with different shapes have
compatible shapes for arithmetic operations. The terminology is borrowed from
Numpy
[broadcasting](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).

Broadcasting may be required for operations between multi-dimensional arrays of
different ranks, or between multi-dimensional arrays with different but
compatible shapes. Consider the addition `X+v` where `X` is a matrix (an array
of rank 2) and `v` is a vector (an array of rank 1). To perform element-wise
addition, XLA needs to "broadcast" the vector `v` to the same rank as the
matrix `X`, by replicating `v` a certain number of times. The vector's length
has to match at least one of the dimensions of the matrix.

For example:

    |1 2 3| + |7 8 9|
    |4 5 6|

The matrix's dimensions are (2,3), the vector's are (3). The vector is broadcast
by replicating it over rows to get:

    |1 2 3| + |7 8 9| = |8  10 12|
    |4 5 6|   |7 8 9|   |11 13 15|

In Numpy, this is called
[broadcasting](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).

## Principles

The XLA language is as strict and explicit as possible, avoiding implicit and
"magical" features. Such features may make some computations slightly easier to
define, at the cost of more assumptions baked into user code that will be
difficult to change in the long term. If necessary, implicit and magical
features can be added in client-level wrappers.

With regard to broadcasting, XLA requires an explicit broadcasting
specification on operations between arrays of different ranks. This differs
from Numpy, which infers the specification when possible.

## Broadcasting a lower-rank array onto a higher-rank array

*Scalars* can always be broadcast over arrays without an explicit specification
of broadcasting dimensions. An element-wise binary operation between a scalar
and an array means applying the operation between the scalar and each element
of the array. For example, adding a scalar to a matrix means producing a matrix
each element of which is the sum of the scalar and the corresponding element of
the input matrix.

    |1 2 3| + 7 = |8  9  10|
    |4 5 6|       |11 12 13|
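
As a minimal sketch of what this looks like with the `XlaBuilder` C++ client
API (discussed further below), the hypothetical helper below only builds the
computation; a scalar operand needs no broadcasting specification:

    #include "tensorflow/compiler/xla/client/xla_builder.h"

    // Illustrative sketch: add a scalar to a 2x3 matrix. No explicit
    // broadcasting dimensions are required for a scalar operand.
    xla::XlaOp ScalarAddExample(xla::XlaBuilder* builder) {
      auto x = xla::ConstantR2<float>(builder, {{1, 2, 3}, {4, 5, 6}});
      auto s = xla::ConstantR0<float>(builder, 7);
      return xla::Add(x, s);  // Result: |8 9 10; 11 12 13|.
    }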

Most broadcasting needs can be captured by using a tuple of dimensions on a
binary operation. When the inputs to the operation have different ranks, this
broadcasting tuple specifies which dimension(s) in the **higher-rank** array to
match with the **lower-rank** array.

Consider the previous example. Instead of adding a scalar to a (2,3) matrix,
add a vector of dimension (3) to a matrix of dimensions (2,3). *Without
specifying broadcasting, this operation is invalid.* To correctly request
matrix-vector addition, specify the broadcasting dimension to be (1), meaning
the vector's dimension is matched to dimension 1 of the matrix. In 2D, if
dimension 0 is considered as rows and dimension 1 as columns, this means that
each element of the vector becomes a column of a size matching the number of
rows in the matrix:

    |7 8 9| ==> |7 8 9|
                |7 8 9|
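
Expressed with the `XlaBuilder` API discussed below, this matrix-vector
addition could look like the following sketch (the helper name is hypothetical;
only the computation is built, not executed):

    #include "tensorflow/compiler/xla/client/xla_builder.h"

    // Illustrative sketch: add a vector of shape (3) to a matrix of shape (2,3).
    xla::XlaOp MatrixVectorAddExample(xla::XlaBuilder* builder) {
      auto x = xla::ConstantR2<float>(builder, {{1, 2, 3}, {4, 5, 6}});
      auto v = xla::ConstantR1<float>(builder, {7, 8, 9});
      // The vector's only dimension is matched to dimension 1 (columns) of the
      // matrix; omitting broadcast_dimensions here would make the op invalid.
      return xla::Add(x, v, /*broadcast_dimensions=*/{1});
    }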

As a more complex example, consider adding a 3-element vector (dimension (3)) to
a 3x3 matrix (dimensions (3,3)). There are two ways broadcasting can happen for
this example:

(1) A broadcasting dimension of 1 can be used. Each vector element becomes a
column and the vector is duplicated for each row in the matrix.

    |7 8 9| ==> |7 8 9|
                |7 8 9|
                |7 8 9|

(2) A broadcasting dimension of 0 can be used. Each vector element becomes a row
and the vector is duplicated for each column in the matrix.

     |7| ==> |7 7 7|
     |8|     |8 8 8|
     |9|     |9 9 9|

> Note: when adding a 2x3 matrix to a 3-element vector, a broadcasting dimension
> of 0 is invalid.

The broadcasting dimensions can be a tuple that describes how a lower-rank
shape is broadcast into a higher-rank shape. For example, given a 2x3x4 cuboid
and a 3x4 matrix, a broadcasting tuple (1,2) means matching the matrix to
dimensions 1 and 2 of the cuboid.

This type of broadcast is used in the binary ops in `XlaBuilder`, if the
`broadcast_dimensions` argument is given. For example, see
[XlaBuilder::Add](https://www.tensorflow.org/code/tensorflow/compiler/xla/client/xla_builder.cc).
In the XLA source code, this type of broadcasting is sometimes called "InDim"
broadcasting.
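
As a sketch of the cuboid example above, the `broadcast_dimensions` argument
would be (1,2); the parameter shapes and the helper name are illustrative
assumptions:

    #include "tensorflow/compiler/xla/client/xla_builder.h"
    #include "tensorflow/compiler/xla/shape_util.h"

    // Illustrative sketch: add a 3x4 matrix to a 2x3x4 cuboid, matching the
    // matrix to dimensions 1 and 2 of the cuboid.
    xla::XlaOp CuboidMatrixAddExample(xla::XlaBuilder* builder) {
      auto cuboid = xla::Parameter(
          builder, 0, xla::ShapeUtil::MakeShape(xla::F32, {2, 3, 4}), "cuboid");
      auto matrix = xla::Parameter(
          builder, 1, xla::ShapeUtil::MakeShape(xla::F32, {3, 4}), "matrix");
      return xla::Add(cuboid, matrix, /*broadcast_dimensions=*/{1, 2});
    }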

### Formal definition

The broadcasting attribute allows matching a lower-rank array to a higher-rank
array, by specifying which dimensions of the higher-rank array to match. For
example, for an array with dimensions MxNxPxQ, a vector with dimension T can be
matched as follows:

              MxNxPxQ

    dim 3:          T
    dim 2:        T
    dim 1:      T
    dim 0:    T

In each case, T has to be equal to the matching dimension of the higher-rank
array. The vector's values are then broadcast from the matched dimension to all
the other dimensions.

To match a TxV matrix onto the MxNxPxQ array, a pair of broadcasting dimensions
are used:

              MxNxPxQ
    dim 2,3:      T V
    dim 1,2:    T V
    dim 0,3:  T     V
    etc...

The order of dimensions in the broadcasting tuple has to be the order in which
the lower-rank array's dimensions are expected to match the higher-rank array's
dimensions. The first element in the tuple says which dimension in the
higher-rank array has to match dimension 0 in the lower-rank array, the second
element says which dimension has to match dimension 1, and so on. The order of
broadcast dimensions has to be strictly increasing. In the previous example, it
is illegal to match V to N and T to P; it is also illegal to match V to both P
and N.
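
The same matching can also be spelled out with the standalone `BroadcastInDim`
op in `XlaBuilder`, which takes the output dimension sizes and the broadcast
dimensions explicitly. The sketch below uses arbitrary concrete sizes
(M,N,P,Q = 2,3,4,5 and T,V = 3,4) and a hypothetical helper name:

    #include "tensorflow/compiler/xla/client/xla_builder.h"
    #include "tensorflow/compiler/xla/shape_util.h"

    // Illustrative sketch: broadcast a 3x4 (TxV) matrix into a 2x3x4x5
    // (MxNxPxQ) array, matching T to dimension 1 (N) and V to dimension 2 (P).
    xla::XlaOp BroadcastTxVExample(xla::XlaBuilder* builder) {
      auto tv = xla::Parameter(
          builder, 0, xla::ShapeUtil::MakeShape(xla::F32, {3, 4}), "tv");
      // Operand dimension 0 maps to output dimension 1, and operand dimension
      // 1 maps to output dimension 2; the listed dimensions are increasing.
      return xla::BroadcastInDim(tv, /*out_dim_size=*/{2, 3, 4, 5},
                                 /*broadcast_dimensions=*/{1, 2});
    }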

## Broadcasting similar-rank arrays with degenerate dimensions

A related broadcasting problem is broadcasting two arrays that have the same
rank but different dimension sizes. Similarly to Numpy's rules, this is only
possible when the arrays are *compatible*. Two arrays are compatible when all
their dimensions are compatible. Two dimensions are compatible if:

*   They are equal, or
*   One of them is 1 (a "degenerate" dimension)

When two compatible arrays are encountered, the result shape has the maximum
among the two inputs at every dimension index.

Examples:

1.  (2,1) and (2,3) broadcast to (2,3).
2.  (1,2,5) and (7,2,5) broadcast to (7,2,5).
3.  (7,2,5) and (7,1,5) broadcast to (7,2,5).
4.  (7,2,5) and (7,2,6) are incompatible and cannot be broadcast.

A special case arises, and is also supported, where each of the input arrays has
a degenerate dimension at a different index. In this case, the result is an
"outer operation": (2,1) and (1,3) broadcast to (2,3). For more examples,
consult the
[Numpy documentation on broadcasting](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).
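
With the `XlaBuilder` API, this kind of degenerate-dimension broadcasting needs
no `broadcast_dimensions` argument because the ranks already match; a small
sketch of the "outer operation" case (helper name hypothetical):

    #include "tensorflow/compiler/xla/client/xla_builder.h"

    // Illustrative sketch: (2,1) + (1,3) broadcasts to (2,3).
    xla::XlaOp OuterAddExample(xla::XlaBuilder* builder) {
      auto a = xla::ConstantR2<float>(builder, {{1}, {2}});      // shape (2,1)
      auto b = xla::ConstantR2<float>(builder, {{10, 20, 30}});  // shape (1,3)
      // Both degenerate dimensions are broadcast; the result has shape (2,3):
      // |11 21 31; 12 22 32|.
      return xla::Add(a, b);
    }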

## Broadcast composition

Broadcasting of a lower-rank array to a higher-rank array **and** broadcasting
using degenerate dimensions can both be performed in the same binary operation.
For example, a vector of size 4 and a matrix of size 1x2 can be added together
using a broadcast dimensions value of (0):

    |1 2 3 4| + [5 6]    // [5 6] is a 1x2 matrix, not a vector.

First, the vector is broadcast up to rank 2 (matrix) using the broadcast
dimensions. The single value (0) in the broadcast dimensions indicates that
dimension zero of the vector matches dimension zero of the matrix. This
produces a matrix of size 4xM where the value M is chosen to match the
corresponding dimension size in the 1x2 array. Therefore, a 4x2 matrix is
produced:

    |1 1| + [5 6]
    |2 2|
    |3 3|
    |4 4|

Then "degenerate dimension broadcasting" broadcasts dimension zero of the 1x2
matrix to match the corresponding dimension size of the left-hand side:

    |1 1| + |5 6|     |6  7|
    |2 2| + |5 6|  =  |7  8|
    |3 3| + |5 6|     |8  9|
    |4 4| + |5 6|     |9 10|
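
A sketch of this composed broadcast with the `XlaBuilder` API (helper name
hypothetical; the computation is only built, not executed):

    #include "tensorflow/compiler/xla/client/xla_builder.h"

    // Illustrative sketch: add a vector of shape (4) to a 1x2 matrix. The
    // vector is first broadcast to 4x2 via the broadcast dimensions, then the
    // matrix's degenerate dimension 0 is broadcast to size 4.
    xla::XlaOp ComposedBroadcastExample(xla::XlaBuilder* builder) {
      auto v = xla::ConstantR1<float>(builder, {1, 2, 3, 4});
      auto m = xla::ConstantR2<float>(builder, {{5, 6}});  // shape (1,2)
      // Dimension 0 of the vector matches dimension 0 of the result.
      return xla::Add(v, m, /*broadcast_dimensions=*/{0});
    }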

A more complicated example is a matrix of size 1x2 added to an array of size
4x3x1 using broadcast dimensions of (1, 2). First, the 1x2 matrix is broadcast
up to rank 3 using the broadcast dimensions, producing an intermediate Mx1x2
array where the dimension size M is determined by the larger operand (the 4x3x1
array); this yields a 4x1x2 intermediate array. M is at dimension 0 (the
left-most dimension) because dimensions 1 and 2 are mapped to the dimensions of
the original 1x2 matrix, as the broadcast dimensions are (1, 2). This
intermediate array can then be added to the 4x3x1 array using broadcasting of
degenerate dimensions to produce a 4x3x2 array result.
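
And a corresponding sketch of this last example (shapes taken from the text,
helper name hypothetical):

    #include "tensorflow/compiler/xla/client/xla_builder.h"
    #include "tensorflow/compiler/xla/shape_util.h"

    // Illustrative sketch: add a 1x2 matrix to a 4x3x1 array with broadcast
    // dimensions (1, 2); the intermediate shape is 4x1x2 and the result 4x3x2.
    xla::XlaOp ComposedBroadcast3DExample(xla::XlaBuilder* builder) {
      auto a = xla::Parameter(
          builder, 0, xla::ShapeUtil::MakeShape(xla::F32, {4, 3, 1}), "a");
      auto m = xla::Parameter(
          builder, 1, xla::ShapeUtil::MakeShape(xla::F32, {1, 2}), "m");
      return xla::Add(a, m, /*broadcast_dimensions=*/{1, 2});
    }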
205