==========================
Auto-Vectorization in LLVM
==========================

.. contents::
   :local:

LLVM has two vectorizers: The :ref:`Loop Vectorizer <loop-vectorizer>`,
which operates on Loops, and the :ref:`SLP Vectorizer
<slp-vectorizer>`. These vectorizers
focus on different optimization opportunities and use different techniques.
The SLP vectorizer merges multiple scalars that are found in the code into
vectors while the Loop Vectorizer widens instructions in loops
to operate on multiple consecutive iterations.

Both the Loop Vectorizer and the SLP Vectorizer are enabled by default.

.. _loop-vectorizer:

The Loop Vectorizer
===================

Usage
-----

The Loop Vectorizer is enabled by default, but it can be disabled
through clang using the command line flag:

.. code-block:: console

   $ clang ... -fno-vectorize file.c

Command line flags
^^^^^^^^^^^^^^^^^^

The loop vectorizer uses a cost model to decide on the optimal vectorization factor
and unroll factor. However, users of the vectorizer can force the vectorizer to use
specific values. Both 'clang' and 'opt' support the flags below.

Users can control the vectorization SIMD width using the command line flag "-force-vector-width".

.. code-block:: console

  $ clang -mllvm -force-vector-width=8 ...
  $ opt -loop-vectorize -force-vector-width=8 ...
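
As a rough mental model only (the vectorizer works on LLVM IR, not on source
code), a forced width of 8 asks for a loop in which each step covers eight
consecutive iterations, with a scalar loop handling the leftover iterations.
The hand-written sketch below illustrates that shape. The function names
``scale_scalar`` and ``scale_width8`` and the constant ``VF`` are hypothetical
and are not part of LLVM.

.. code-block:: c++

  #include <cstddef>

  // Hypothetical scalar loop of the kind the vectorizer is asked to widen.
  void scale_scalar(float *A, const float *B, float K, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
      A[i] = B[i] * K;
  }

  // Hand-written model of the same loop at a width of 8. The inner loop
  // stands in for a single vector operation over eight lanes.
  void scale_width8(float *A, const float *B, float K, std::size_t n) {
    constexpr std::size_t VF = 8; // assumed width, mirrors -force-vector-width=8
    std::size_t i = 0;
    for (; i + VF <= n; i += VF)
      for (std::size_t lane = 0; lane < VF; ++lane)
        A[i + lane] = B[i + lane] * K;
    for (; i < n; ++i) // scalar remainder for the last n % 8 elements
      A[i] = B[i] * K;
  }

Both functions compute the same result; only the iteration structure differs.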

Users can control the unroll factor using the command line flag "-force-vector-unroll".

.. code-block:: console

  $ clang -mllvm -force-vector-unroll=2 ...
  $ opt -loop-vectorize -force-vector-unroll=2 ...

Pragma loop hint directives
^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``#pragma clang loop`` directive allows loop vectorization hints to be
specified for the subsequent for, while, do-while, or C++11 range-based for
loop. The directive allows vectorization and interleaving to be enabled or
disabled. Vector width as well as interleave count can also be manually
specified. The following example explicitly enables vectorization and
interleaving:

.. code-block:: c++

  #pragma clang loop vectorize(enable) interleave(enable)
  while(...) {
    ...
  }

The following example implicitly enables vectorization and interleaving by
specifying a vector width and interleaving count:

.. code-block:: c++

  #pragma clang loop vectorize_width(2) interleave_count(2)
  for(...) {
    ...
  }

See the Clang
`language extensions
<http://clang.llvm.org/docs/LanguageExtensions.html#extensions-for-loop-hint-optimizations>`_
for details.

Diagnostics
-----------

Many loops cannot be vectorized, including loops with complicated control flow,
unvectorizable types, and unvectorizable calls. The loop vectorizer generates
optimization remarks which can be queried using command line options to identify
and diagnose loops that are skipped by the loop vectorizer.

Optimization remarks are enabled using:

``-Rpass=loop-vectorize`` identifies loops that were successfully vectorized.

``-Rpass-missed=loop-vectorize`` identifies loops that failed vectorization and
indicates whether vectorization was explicitly specified.

``-Rpass-analysis=loop-vectorize`` identifies the statements that caused
vectorization to fail.

Consider the following loop:

.. code-block:: c++

  #pragma clang loop vectorize(enable)
  for (int i = 0; i < Length; i++) {
    switch(A[i]) {
    case 0: A[i] = i*2; break;
    case 1: A[i] = i;   break;
    default: A[i] = 0;
    }
  }

The command line ``-Rpass-missed=loop-vectorize`` prints the remark:

.. code-block:: console

  no_switch.cpp:4:5: remark: loop not vectorized: vectorization is explicitly enabled [-Rpass-missed=loop-vectorize]

And the command line ``-Rpass-analysis=loop-vectorize`` indicates that the
switch statement cannot be vectorized.

.. code-block:: console

  no_switch.cpp:4:5: remark: loop not vectorized: loop contains a switch statement [-Rpass-analysis=loop-vectorize]
    switch(A[i]) {
    ^

To ensure line and column numbers are produced, include the command line options
``-gline-tables-only`` and ``-gcolumn-info``. See the Clang `user manual
<http://clang.llvm.org/docs/UsersManual.html#options-to-emit-optimization-reports>`_
for details.

Features
--------

The LLVM Loop Vectorizer has a number of features that allow it to vectorize
complex loops.

Loops with unknown trip count
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The Loop Vectorizer supports loops with an unknown trip count.
In the loop below, the iteration ``start`` and ``end`` points are unknown,
and the Loop Vectorizer has a mechanism to vectorize loops that do not start
at zero. In this example, the trip count ``end - start`` may not be a multiple
of the vector width, and the vectorizer has to execute the last few iterations
as scalar code. Keeping a scalar copy of the loop increases the code size.

.. code-block:: c++

  void bar(float *A, float* B, float K, int start, int end) {
    for (int i = start; i < end; ++i)
      A[i] *= B[i] + K;
  }

Runtime Checks of Pointers
^^^^^^^^^^^^^^^^^^^^^^^^^^

In the example below, if the pointers A and B point to consecutive addresses,
then it is illegal to vectorize the code because some elements of A will be
written before they are read from array B.

Some programmers use the 'restrict' keyword to notify the compiler that the
pointers are disjoint, but in our example, the Loop Vectorizer has no way of
knowing that the pointers A and B are unique. The Loop Vectorizer handles this
loop by placing code that checks, at runtime, if the arrays A and B point to
disjoint memory locations. If arrays A and B overlap, then the scalar version
of the loop is executed.

.. code-block:: c++

  void bar(float *A, float* B, float K, int n) {
    for (int i = 0; i < n; ++i)
      A[i] *= B[i] + K;
  }


Reductions
^^^^^^^^^^

In this example the ``sum`` variable is used by consecutive iterations of
the loop. Normally, this would prevent vectorization, but the vectorizer can
detect that 'sum' is a reduction variable. The variable 'sum' becomes a vector
of integers, and at the end of the loop the elements of the vector are added
together to create the correct result.
We support a number of different
reduction operations, such as addition, multiplication, XOR, AND and OR.

.. code-block:: c++

  int foo(int *A, int *B, int n) {
    unsigned sum = 0;
    for (int i = 0; i < n; ++i)
      sum += A[i] + 5;
    return sum;
  }

We support floating point reduction operations when ``-ffast-math`` is used.

Inductions
^^^^^^^^^^

In this example the value of the induction variable ``i`` is saved into an
array. The Loop Vectorizer knows to vectorize induction variables.

.. code-block:: c++

  void bar(float *A, float* B, float K, int n) {
    for (int i = 0; i < n; ++i)
      A[i] = i;
  }

If Conversion
^^^^^^^^^^^^^

The Loop Vectorizer is able to "flatten" the IF statement in the code and
generate a single stream of instructions. The Loop Vectorizer supports any
control flow in the innermost loop. The innermost loop may contain complex
nesting of IFs, ELSEs and even GOTOs.

.. code-block:: c++

  int foo(int *A, int *B, int n) {
    unsigned sum = 0;
    for (int i = 0; i < n; ++i)
      if (A[i] > B[i])
        sum += A[i] + 5;
    return sum;
  }

Pointer Induction Variables
^^^^^^^^^^^^^^^^^^^^^^^^^^^

This example uses the ``std::accumulate`` function of the C++ standard library.
This
loop uses C++ iterators, which are pointers, and not integer indices.
The Loop Vectorizer detects pointer induction variables and can vectorize
this loop. This feature is important because many C++ programs use iterators.

.. code-block:: c++

  int baz(int *A, int n) {
    return std::accumulate(A, A + n, 0);
  }

Reverse Iterators
^^^^^^^^^^^^^^^^^

The Loop Vectorizer can vectorize loops that count backwards.

.. code-block:: c++

  void foo(int *A, int *B, int n) {
    for (int i = n; i > 0; --i)
      A[i] += 1;
  }

Scatter / Gather
^^^^^^^^^^^^^^^^

The Loop Vectorizer can vectorize code that becomes a sequence of scalar instructions
that scatter/gather memory.

.. code-block:: c++

  void foo(int * A, int * B, int n) {
    for (intptr_t i = 0; i < n; ++i)
      A[i] += B[i * 4];
  }

In many situations the cost model will inform LLVM that this is not beneficial
and LLVM will only vectorize such code if forced with "-mllvm -force-vector-width=#".

Vectorization of Mixed Types
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The Loop Vectorizer can vectorize programs with mixed types. The Vectorizer
cost model can estimate the cost of the type conversion and decide if
vectorization is profitable.

.. code-block:: c++

  void foo(int *A, char *B, int n, int k) {
    for (int i = 0; i < n; ++i)
      A[i] += 4 * B[i];
  }

Global Structures Alias Analysis
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Access to global structures can also be vectorized, with alias analysis being
used to make sure accesses don't alias. Run-time checks can also be added on
pointer access to structure members.

Many variations are supported, but some that rely on undefined behaviour being
ignored (as other compilers do) are still being left un-vectorized.

.. code-block:: c++

  struct { int A[100], K, B[100]; } Foo;

  void foo() {
    for (int i = 0; i < 100; ++i)
      Foo.A[i] = Foo.B[i] + 100;
  }

Vectorization of function calls
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The Loop Vectorizer can vectorize intrinsic math functions.
See the table below for a list of these functions.

+-----+-----+---------+
| pow | exp | exp2    |
+-----+-----+---------+
| sin | cos | sqrt    |
+-----+-----+---------+
| log |log2 | log10   |
+-----+-----+---------+
|fabs |floor| ceil    |
+-----+-----+---------+
|fma  |trunc|nearbyint|
+-----+-----+---------+
|     |     | fmuladd |
+-----+-----+---------+

The loop vectorizer knows about special instructions on the target and will
vectorize a loop containing a function call that maps to the instructions. For
example, the loop below will be vectorized on Intel x86 if the SSE4.1 roundps
instruction is available.

.. code-block:: c++

  void foo(float *f) {
    for (int i = 0; i != 1024; ++i)
      f[i] = floorf(f[i]);
  }

Partial unrolling during vectorization
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Modern processors feature multiple execution units, and only programs that contain a
high degree of parallelism can fully utilize the entire width of the machine.
The Loop Vectorizer increases the instruction level parallelism (ILP) by
performing partial-unrolling of loops.

In the example below the entire array is accumulated into the variable 'sum'.
This is inefficient because only a single execution port can be used by the processor.
By unrolling the code the Loop Vectorizer allows two or more execution ports
to be used simultaneously.

.. code-block:: c++

  int foo(int *A, int *B, int n) {
    unsigned sum = 0;
    for (int i = 0; i < n; ++i)
      sum += A[i];
    return sum;
  }

The Loop Vectorizer uses a cost model to decide when it is profitable to unroll loops.
The decision to unroll the loop depends on the register pressure and the generated code size.

Performance
-----------

This section shows the execution time of Clang on a simple benchmark:
`gcc-loops <http://llvm.org/viewvc/llvm-project/test-suite/trunk/SingleSource/UnitTests/Vectorizer/>`_.
This benchmark is a collection of loops from the GCC autovectorization
`page <http://gcc.gnu.org/projects/tree-ssa/vectorization.html>`_ by Dorit Nuzman.

The chart below compares GCC-4.7, ICC-13, and Clang-SVN with and without loop vectorization at -O3, tuned for "corei7-avx", running on a Sandybridge iMac.
The Y-axis shows the time in msec. Lower is better. The last column shows the geomean of all the kernels.

.. image:: gcc-loops.png

The chart below shows Linpack-pc with the same configuration. The result is in Mflops; higher is better.

.. image:: linpack-pc.png

.. _slp-vectorizer:

The SLP Vectorizer
==================

Details
-------

The goal of SLP vectorization (a.k.a. superword-level parallelism) is
to combine similar independent instructions
into vector instructions. Memory accesses, arithmetic operations, comparison
operations, and PHI-nodes can all be vectorized using this technique.

For example, the following function performs very similar operations on its
inputs (a1, b1) and (a2, b2). The basic-block vectorizer may combine these
into vector operations.

.. code-block:: c++

  void foo(int a1, int a2, int b1, int b2, int *A) {
    A[0] = a1*(a1 + b1)/b1 + 50*b1/a1;
    A[1] = a2*(a2 + b2)/b2 + 50*b2/a2;
  }

The SLP-vectorizer processes the code bottom-up, across basic blocks, in search of scalars to combine.

Usage
------

The SLP Vectorizer is enabled by default, but it can be disabled
through clang using the command line flag:

.. code-block:: console

   $ clang -fno-slp-vectorize file.c

LLVM has a second basic block vectorization phase
which is more compile-time intensive (the BB vectorizer). This optimization
can be enabled through clang using the command line flag:

.. code-block:: console

   $ clang -fslp-vectorize-aggressive file.c
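
As a rough illustration of what "combining scalars" means for the ``foo``
example in the Details section, the hand-written sketch below packs
(a1, a2) and (b1, b2) into two-lane values and applies each operation to both
lanes at once. The ``Int2`` type and the helpers ``add2``, ``mul2`` and
``div2`` are hypothetical stand-ins for vector instructions. The SLP
Vectorizer itself works on LLVM IR and may choose a different form.

.. code-block:: c++

  // Hypothetical two-lane integer value standing in for a vector register.
  struct Int2 {
    int lane[2];
  };

  static Int2 add2(Int2 x, Int2 y) {
    return {{x.lane[0] + y.lane[0], x.lane[1] + y.lane[1]}};
  }
  static Int2 mul2(Int2 x, Int2 y) {
    return {{x.lane[0] * y.lane[0], x.lane[1] * y.lane[1]}};
  }
  static Int2 div2(Int2 x, Int2 y) {
    return {{x.lane[0] / y.lane[0], x.lane[1] / y.lane[1]}};
  }

  // Scalar version, as in the Details section.
  void foo(int a1, int a2, int b1, int b2, int *A) {
    A[0] = a1*(a1 + b1)/b1 + 50*b1/a1;
    A[1] = a2*(a2 + b2)/b2 + 50*b2/a2;
  }

  // Model of the combined form: every operation handles both lanes together.
  void foo_packed(int a1, int a2, int b1, int b2, int *A) {
    Int2 a = {{a1, a2}}, b = {{b1, b2}}, fifty = {{50, 50}};
    Int2 r = add2(div2(mul2(a, add2(a, b)), b), div2(mul2(fifty, b), a));
    A[0] = r.lane[0];
    A[1] = r.lane[1];
  }

For nonzero inputs both functions store the same two values into ``A``.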