xref: /aosp_15_r20/external/eigen/doc/HiPerformance.dox (revision bf2c37156dfe67e5dfebd6d394bad8b2ab5804d4)
1*bf2c3715SXin Li
2*bf2c3715SXin Linamespace Eigen {
3*bf2c3715SXin Li
4*bf2c3715SXin Li/** \page TopicWritingEfficientProductExpression Writing efficient matrix product expressions
5*bf2c3715SXin Li
6*bf2c3715SXin LiIn general achieving good performance with Eigen does no require any special effort:
7*bf2c3715SXin Lisimply write your expressions in the most high level way. This is especially true
8*bf2c3715SXin Lifor small fixed size matrices. For large matrices, however, it might be useful to
9*bf2c3715SXin Litake some care when writing your expressions in order to minimize useless evaluations
10*bf2c3715SXin Liand optimize the performance.
11*bf2c3715SXin LiIn this page we will give a brief overview of the Eigen's internal mechanism to simplify
12*bf2c3715SXin Liand evaluate complex product expressions, and discuss the current limitations.
13*bf2c3715SXin LiIn particular we will focus on expressions matching level 2 and 3 BLAS routines, i.e,
14*bf2c3715SXin Liall kind of matrix products and triangular solvers.
15*bf2c3715SXin Li
16*bf2c3715SXin LiIndeed, in Eigen we have implemented a set of highly optimized routines which are very similar
17*bf2c3715SXin Lito BLAS's ones. Unlike BLAS, those routines are made available to user via a high level and
18*bf2c3715SXin Linatural API. Each of these routines can compute in a single evaluation a wide variety of expressions.
19*bf2c3715SXin LiGiven an expression, the challenge is then to map it to a minimal set of routines.
20*bf2c3715SXin LiAs explained latter, this mechanism has some limitations, and knowing them will allow
21*bf2c3715SXin Liyou to write faster code by making your expressions more Eigen friendly.
22*bf2c3715SXin Li
23*bf2c3715SXin Li\section GEMM General Matrix-Matrix product (GEMM)
24*bf2c3715SXin Li
25*bf2c3715SXin LiLet's start with the most common primitive: the matrix product of general dense matrices.
26*bf2c3715SXin LiIn the BLAS world this corresponds to the GEMM routine. Our equivalent primitive can
27*bf2c3715SXin Liperform the following operation:
28*bf2c3715SXin Li\f$ C.noalias() += \alpha op1(A) op2(B) \f$
29*bf2c3715SXin Liwhere A, B, and C are column and/or row major matrices (or sub-matrices),
30*bf2c3715SXin Lialpha is a scalar value, and op1, op2 can be transpose, adjoint, conjugate, or the identity.
31*bf2c3715SXin LiWhen Eigen detects a matrix product, it analyzes both sides of the product to extract a
32*bf2c3715SXin Liunique scalar factor alpha, and for each side, its effective storage order, shape, and conjugation states.
33*bf2c3715SXin LiMore precisely each side is simplified by iteratively removing trivial expressions such as scalar multiple,
34*bf2c3715SXin Linegation and conjugation. Transpose and Block expressions are not evaluated and they only modify the storage order
35*bf2c3715SXin Liand shape. All other expressions are immediately evaluated.
36*bf2c3715SXin LiFor instance, the following expression:
37*bf2c3715SXin Li\code m1.noalias() -= s4 * (s1 * m2.adjoint() * (-(s3*m3).conjugate()*s2))  \endcode
38*bf2c3715SXin Liis automatically simplified to:
39*bf2c3715SXin Li\code m1.noalias() += (s1*s2*conj(s3)*s4) * m2.adjoint() * m3.conjugate() \endcode
40*bf2c3715SXin Liwhich exactly matches our GEMM routine.
41*bf2c3715SXin Li
42*bf2c3715SXin Li\subsection GEMM_Limitations Limitations
43*bf2c3715SXin LiUnfortunately, this simplification mechanism is not perfect yet and not all expressions which could be
44*bf2c3715SXin Lihandled by a single GEMM-like call are correctly detected.
45*bf2c3715SXin Li<table class="manual" style="width:100%">
46*bf2c3715SXin Li<tr>
47*bf2c3715SXin Li<th>Not optimal expression</th>
48*bf2c3715SXin Li<th>Evaluated as</th>
49*bf2c3715SXin Li<th>Optimal version (single evaluation)</th>
50*bf2c3715SXin Li<th>Comments</th>
51*bf2c3715SXin Li</tr>
52*bf2c3715SXin Li<tr>
53*bf2c3715SXin Li<td>\code
54*bf2c3715SXin Lim1 += m2 * m3; \endcode</td>
55*bf2c3715SXin Li<td>\code
56*bf2c3715SXin Litemp = m2 * m3;
57*bf2c3715SXin Lim1 += temp; \endcode</td>
58*bf2c3715SXin Li<td>\code
59*bf2c3715SXin Lim1.noalias() += m2 * m3; \endcode</td>
60*bf2c3715SXin Li<td>Use .noalias() to tell Eigen the result and right-hand-sides do not alias.
61*bf2c3715SXin Li    Otherwise the product m2 * m3 is evaluated into a temporary.</td>
62*bf2c3715SXin Li</tr>
63*bf2c3715SXin Li<tr class="alt">
64*bf2c3715SXin Li<td></td>
65*bf2c3715SXin Li<td></td>
66*bf2c3715SXin Li<td>\code
67*bf2c3715SXin Lim1.noalias() += s1 * (m2 * m3); \endcode</td>
68*bf2c3715SXin Li<td>This is a special feature of Eigen. Here the product between a scalar
69*bf2c3715SXin Li    and a matrix product does not evaluate the matrix product but instead it
70*bf2c3715SXin Li    returns a matrix product expression tracking the scalar scaling factor. <br>
71*bf2c3715SXin Li    Without this optimization, the matrix product would be evaluated into a
72*bf2c3715SXin Li    temporary as in the next example.</td>
73*bf2c3715SXin Li</tr>
74*bf2c3715SXin Li<tr>
75*bf2c3715SXin Li<td>\code
76*bf2c3715SXin Lim1.noalias() += (m2 * m3).adjoint(); \endcode</td>
77*bf2c3715SXin Li<td>\code
78*bf2c3715SXin Litemp = m2 * m3;
79*bf2c3715SXin Lim1 += temp.adjoint(); \endcode</td>
80*bf2c3715SXin Li<td>\code
81*bf2c3715SXin Lim1.noalias() += m3.adjoint()
82*bf2c3715SXin Li*              * m2.adjoint(); \endcode</td>
83*bf2c3715SXin Li<td>This is because the product expression has the EvalBeforeNesting bit which
84*bf2c3715SXin Li    enforces the evaluation of the product by the Tranpose expression.</td>
85*bf2c3715SXin Li</tr>
86*bf2c3715SXin Li<tr class="alt">
87*bf2c3715SXin Li<td>\code
88*bf2c3715SXin Lim1 = m1 + m2 * m3; \endcode</td>
89*bf2c3715SXin Li<td>\code
90*bf2c3715SXin Litemp = m2 * m3;
91*bf2c3715SXin Lim1 = m1 + temp; \endcode</td>
92*bf2c3715SXin Li<td>\code m1.noalias() += m2 * m3; \endcode</td>
93*bf2c3715SXin Li<td>Here there is no way to detect at compile time that the two m1 are the same,
94*bf2c3715SXin Li    and so the matrix product will be immediately evaluated.</td>
95*bf2c3715SXin Li</tr>
96*bf2c3715SXin Li<tr>
97*bf2c3715SXin Li<td>\code
98*bf2c3715SXin Lim1.noalias() = m4 + m2 * m3; \endcode</td>
99*bf2c3715SXin Li<td>\code
100*bf2c3715SXin Litemp = m2 * m3;
101*bf2c3715SXin Lim1 = m4 + temp; \endcode</td>
102*bf2c3715SXin Li<td>\code
103*bf2c3715SXin Lim1 = m4;
104*bf2c3715SXin Lim1.noalias() += m2 * m3; \endcode</td>
105*bf2c3715SXin Li<td>First of all, here the .noalias() in the first expression is useless because
106*bf2c3715SXin Li    m2*m3 will be evaluated anyway. However, note how this expression can be rewritten
107*bf2c3715SXin Li    so that no temporary is required. (tip: for very small fixed size matrix
108*bf2c3715SXin Li    it is slightly better to rewrite it like this: m1.noalias() = m2 * m3; m1 += m4;</td>
109*bf2c3715SXin Li</tr>
110*bf2c3715SXin Li<tr class="alt">
111*bf2c3715SXin Li<td>\code
112*bf2c3715SXin Lim1.noalias() += (s1*m2).block(..) * m3; \endcode</td>
113*bf2c3715SXin Li<td>\code
114*bf2c3715SXin Litemp = (s1*m2).block(..);
115*bf2c3715SXin Lim1 += temp * m3; \endcode</td>
116*bf2c3715SXin Li<td>\code
117*bf2c3715SXin Lim1.noalias() += s1 * m2.block(..) * m3; \endcode</td>
118*bf2c3715SXin Li<td>This is because our expression analyzer is currently not able to extract trivial
119*bf2c3715SXin Li    expressions nested in a Block expression. Therefore the nested scalar
120*bf2c3715SXin Li    multiple cannot be properly extracted.</td>
121*bf2c3715SXin Li</tr>
122*bf2c3715SXin Li</table>
123*bf2c3715SXin Li
124*bf2c3715SXin LiOf course all these remarks hold for all other kind of products involving triangular or selfadjoint matrices.
125*bf2c3715SXin Li
126*bf2c3715SXin Li*/
127*bf2c3715SXin Li
128*bf2c3715SXin Li}
129