# Benchmark Tools

## compare.py

`compare.py` can be used to compare the results of benchmarks.

### Dependencies
The utility relies on the [scipy](https://www.scipy.org) package, which can be installed using pip:
```bash
pip3 install -r requirements.txt
```

### Displaying aggregates only

The switch `-a` / `--display_aggregates_only` can be used to control whether
the normal iterations or only the aggregates are displayed. When passed, it is
passed through to the benchmark binaries being run and is also taken into
account by the tool itself: only the aggregates are displayed, not the normal
runs. It affects only the display; the separate runs are still used to
calculate the U test.

### Modes of operation

There are three modes of operation:

1. Just compare two benchmarks
The program is invoked like:

``` bash
$ compare.py benchmarks <benchmark_baseline> <benchmark_contender> [benchmark options]...
```
Where `<benchmark_baseline>` and `<benchmark_contender>` each specify either a benchmark executable file or a JSON output file. The type of the input file is automatically detected. If a benchmark executable is specified, the benchmark is run to obtain the results. Otherwise, the results are simply loaded from the output file.

`[benchmark options]` are passed to the benchmark invocations. They can be anything the binary accepts: either the standard `--benchmark_*` parameters or custom parameters your binary takes.
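As a rough illustration of this mode, here is a minimal Python sketch (not `compare.py`'s own code) of what happens to the two result sets: benchmarks are paired by exact name, names without a match are skipped, and the `Time` and `CPU` columns are computed as `(new - old) / |old|`. It assumes the JSON layout written by `--benchmark_out`, a top-level `benchmarks` list whose entries carry `name`, `real_time` and `cpu_time`; the helper names `load_times`, `relative_change` and `compare_runs` are hypothetical.

```python
import json


def load_times(path):
    """Map benchmark name -> (real_time, cpu_time) from a --benchmark_out JSON file.

    Assumes a top-level "benchmarks" list whose entries all carry
    "name", "real_time" and "cpu_time".
    """
    with open(path) as f:
        data = json.load(f)
    return {b["name"]: (b["real_time"], b["cpu_time"]) for b in data["benchmarks"]}


def relative_change(old, new):
    """Relative difference as shown in the Time and CPU columns: (new - old) / |old|."""
    return (new - old) / abs(old)


def compare_runs(baseline, contender):
    """Pair benchmarks by exact name and compute the relative Time/CPU change.

    Baseline benchmarks without an identically named contender entry are
    left out of the result, as described for this mode.
    """
    rows = []
    for name, (old_time, old_cpu) in baseline.items():
        if name not in contender:
            continue  # no exact name match: omitted from the diff
        new_time, new_cpu = contender[name]
        rows.append((name,
                     relative_change(old_time, new_time),
                     relative_change(old_cpu, new_cpu)))
    return rows


if __name__ == "__main__":
    # Rounded Time/CPU values from the example output of this mode.
    old = {"BM_memcpy/64": (76, 76), "BM_copy/8": (222, 222)}
    new = {"BM_memcpy/64": (73, 73), "BM_copy/8": (223, 223), "BM_only_new": (1, 1)}
    for name, d_time, d_cpu in compare_runs(old, new):
        print(f"{name:<15}{d_time:+.4f}{d_cpu:+10.4f}")
```

With these rounded inputs `BM_memcpy/64` comes out at -0.0395 rather than the -0.0468 shown in the example output, because the displayed times are rounded while the tool works from the full-precision values in the results.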

Example output:
```
$ ./compare.py benchmarks ./a.out ./a.out
RUNNING: ./a.out --benchmark_out=/tmp/tmprBT5nW
Run on (8 X 4000 MHz CPU s)
2017-11-07 21:16:44
------------------------------------------------------
Benchmark               Time           CPU Iterations
------------------------------------------------------
BM_memcpy/8            36 ns         36 ns   19101577   211.669MB/s
BM_memcpy/64           76 ns         76 ns    9412571   800.199MB/s
BM_memcpy/512          84 ns         84 ns    8249070   5.64771GB/s
BM_memcpy/1024        116 ns        116 ns    6181763   8.19505GB/s
BM_memcpy/8192        643 ns        643 ns    1062855   11.8636GB/s
BM_copy/8             222 ns        222 ns    3137987   34.3772MB/s
BM_copy/64           1608 ns       1608 ns     432758   37.9501MB/s
BM_copy/512         12589 ns      12589 ns      54806   38.7867MB/s
BM_copy/1024        25169 ns      25169 ns      27713   38.8003MB/s
BM_copy/8192       201165 ns     201112 ns       3486   38.8466MB/s
RUNNING: ./a.out --benchmark_out=/tmp/tmpt1wwG_
Run on (8 X 4000 MHz CPU s)
2017-11-07 21:16:53
------------------------------------------------------
Benchmark               Time           CPU Iterations
------------------------------------------------------
BM_memcpy/8            36 ns         36 ns   19397903   211.255MB/s
BM_memcpy/64           73 ns         73 ns    9691174   839.635MB/s
BM_memcpy/512          85 ns         85 ns    8312329   5.60101GB/s
BM_memcpy/1024        118 ns        118 ns    6438774   8.11608GB/s
BM_memcpy/8192        656 ns        656 ns    1068644   11.6277GB/s
BM_copy/8             223 ns        223 ns    3146977   34.2338MB/s
BM_copy/64           1611 ns       1611 ns     435340   37.8751MB/s
BM_copy/512         12622 ns      12622 ns      54818   38.6844MB/s
BM_copy/1024        25257 ns      25239 ns      27779   38.6927MB/s
BM_copy/8192       205013 ns     205010 ns       3479   38.108MB/s
Comparing ./a.out to ./a.out
Benchmark                 Time             CPU      Time Old      Time New       CPU Old       CPU New
------------------------------------------------------------------------------------------------------
BM_memcpy/8            +0.0020         +0.0020            36            36            36            36
BM_memcpy/64           -0.0468         -0.0470            76            73            76            73
BM_memcpy/512          +0.0081         +0.0083            84            85            84            85
BM_memcpy/1024         +0.0098         +0.0097           116           118           116           118
BM_memcpy/8192         +0.0200         +0.0203           643           656           643           656
BM_copy/8              +0.0046         +0.0042           222           223           222           223
BM_copy/64             +0.0020         +0.0020          1608          1611          1608          1611
BM_copy/512            +0.0027         +0.0026         12589         12622         12589         12622
BM_copy/1024           +0.0035         +0.0028         25169         25257         25169         25239
BM_copy/8192           +0.0191         +0.0194        201165        205013        201112        205010
```

For every benchmark in the first run, the tool looks for the benchmark with exactly the same name in the second run and then compares the results. If the names differ, the benchmark is omitted from the diff.
Note that the values in the `Time` and `CPU` columns are calculated as `(new - old) / |old|`.

2. Compare two different filters of one benchmark
The program is invoked like:

``` bash
$ compare.py filters <benchmark> <filter_baseline> <filter_contender> [benchmark options]...
```
Where `<benchmark>` specifies either a benchmark executable file or a JSON output file. The type of the input file is automatically detected. If a benchmark executable is specified, the benchmark is run to obtain the results. Otherwise, the results are simply loaded from the output file.

Where `<filter_baseline>` and `<filter_contender>` are the same regex filters that you would pass to the `[--benchmark_filter=<regex>]` parameter of the benchmark binary.

`[benchmark options]` are passed to the benchmark invocations.
They can be anything the binary accepts: either the standard `--benchmark_*` parameters or custom parameters your binary takes.

Example output:
```
$ ./compare.py filters ./a.out BM_memcpy BM_copy
RUNNING: ./a.out --benchmark_filter=BM_memcpy --benchmark_out=/tmp/tmpBWKk0k
Run on (8 X 4000 MHz CPU s)
2017-11-07 21:37:28
------------------------------------------------------
Benchmark               Time           CPU Iterations
------------------------------------------------------
BM_memcpy/8            36 ns         36 ns   17891491   211.215MB/s
BM_memcpy/64           74 ns         74 ns    9400999   825.646MB/s
BM_memcpy/512          87 ns         87 ns    8027453   5.46126GB/s
BM_memcpy/1024        111 ns        111 ns    6116853   8.5648GB/s
BM_memcpy/8192        657 ns        656 ns    1064679   11.6247GB/s
RUNNING: ./a.out --benchmark_filter=BM_copy --benchmark_out=/tmp/tmpAvWcOM
Run on (8 X 4000 MHz CPU s)
2017-11-07 21:37:33
----------------------------------------------------
Benchmark             Time           CPU Iterations
----------------------------------------------------
BM_copy/8           227 ns        227 ns    3038700   33.6264MB/s
BM_copy/64         1640 ns       1640 ns     426893   37.2154MB/s
BM_copy/512       12804 ns      12801 ns      55417   38.1444MB/s
BM_copy/1024      25409 ns      25407 ns      27516   38.4365MB/s
BM_copy/8192     202986 ns     202990 ns       3454   38.4871MB/s
Comparing BM_memcpy to BM_copy (from ./a.out)
Benchmark                               Time             CPU      Time Old      Time New       CPU Old       CPU New
--------------------------------------------------------------------------------------------------------------------
[BM_memcpy vs. BM_copy]/8            +5.2829         +5.2812            36           227            36           227
[BM_memcpy vs. BM_copy]/64          +21.1719        +21.1856            74          1640            74          1640
[BM_memcpy vs. BM_copy]/512        +145.6487       +145.6097            87         12804            87         12801
[BM_memcpy vs. BM_copy]/1024       +227.1860       +227.1776           111         25409           111         25407
[BM_memcpy vs. BM_copy]/8192       +308.1664       +308.2898           657        202986           656        202990
```

As you can see, the tool applies the filters to the benchmarks both when running them and before computing the diff. To make the diff work, the part of each name matched by the filter is replaced with a common string. This lets you compare two different benchmark families within one benchmark binary.
Note that the values in the `Time` and `CPU` columns are calculated as `(new - old) / |old|`.

3. Compare filter one from benchmark one to filter two from benchmark two:
The program is invoked like:

``` bash
$ compare.py benchmarksfiltered <benchmark_baseline> <filter_baseline> <benchmark_contender> <filter_contender> [benchmark options]...
```

Where `<benchmark_baseline>` and `<benchmark_contender>` each specify either a benchmark executable file or a JSON output file. The type of the input file is automatically detected. If a benchmark executable is specified, the benchmark is run to obtain the results. Otherwise, the results are simply loaded from the output file.

Where `<filter_baseline>` and `<filter_contender>` are the same regex filters that you would pass to the `[--benchmark_filter=<regex>]` parameter of the benchmark binary.

`[benchmark options]` are passed to the benchmark invocations. They can be anything the binary accepts: either the standard `--benchmark_*` parameters or custom parameters your binary takes.
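The renaming trick shared by the `filters` and `benchmarksfiltered` modes can be sketched in a few lines of Python. This is an illustration under assumptions, not the tool's implementation: it uses Python's `re.search` semantics for the filters, which may differ in corner cases from the regex engine used by the benchmark library, and the function names `rename_matches` and `pair_filtered` are hypothetical. The common string mirrors the `[BM_memcpy vs. BM_copy]` prefix seen in the `filters` example output.

```python
import re


def rename_matches(names, pattern, common):
    """Keep the names matched by `pattern`, replacing the first match with `common`.

    Returns a dict mapping renamed name -> original name.
    """
    regex = re.compile(pattern)
    renamed = {}
    for name in names:
        if regex.search(name):
            # A function replacement avoids interpreting backslashes in `common`.
            renamed[regex.sub(lambda _m: common, name, count=1)] = name
    return renamed


def pair_filtered(baseline_names, baseline_filter, contender_names, contender_filter):
    """Pair benchmarks selected by two different filters via a shared placeholder."""
    common = f"[{baseline_filter} vs. {contender_filter}]"
    baseline = rename_matches(baseline_names, baseline_filter, common)
    contender = rename_matches(contender_names, contender_filter, common)
    # Only keys present on both sides can be compared.
    return [(key, baseline[key], contender[key]) for key in baseline if key in contender]


if __name__ == "__main__":
    names = ["BM_memcpy/8", "BM_memcpy/64", "BM_copy/8", "BM_copy/64"]
    for key, old, new in pair_filtered(names, "BM_memcpy", names, "BM_copy"):
        print(f"{key}: {old} vs. {new}")
```

Running it prints `[BM_memcpy vs. BM_copy]/8: BM_memcpy/8 vs. BM_copy/8` followed by the corresponding `/64` line; once both sides share the same key, the pairing and `(new - old) / |old|` computation proceed exactly as in the plain `benchmarks` mode.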

Example output:
```
$ ./compare.py benchmarksfiltered ./a.out BM_memcpy ./a.out BM_copy
RUNNING: ./a.out --benchmark_filter=BM_memcpy --benchmark_out=/tmp/tmp_FvbYg
Run on (8 X 4000 MHz CPU s)
2017-11-07 21:38:27
------------------------------------------------------
Benchmark               Time           CPU Iterations
------------------------------------------------------
BM_memcpy/8            37 ns         37 ns   18953482   204.118MB/s
BM_memcpy/64           74 ns         74 ns    9206578   828.245MB/s
BM_memcpy/512          91 ns         91 ns    8086195   5.25476GB/s
BM_memcpy/1024        120 ns        120 ns    5804513   7.95662GB/s
BM_memcpy/8192        664 ns        664 ns    1028363   11.4948GB/s
RUNNING: ./a.out --benchmark_filter=BM_copy --benchmark_out=/tmp/tmpDfL5iE
Run on (8 X 4000 MHz CPU s)
2017-11-07 21:38:32
----------------------------------------------------
Benchmark             Time           CPU Iterations
----------------------------------------------------
BM_copy/8           230 ns        230 ns    2985909   33.1161MB/s
BM_copy/64         1654 ns       1653 ns     419408   36.9137MB/s
BM_copy/512       13122 ns      13120 ns      53403   37.2156MB/s
BM_copy/1024      26679 ns      26666 ns      26575   36.6218MB/s
BM_copy/8192     215068 ns     215053 ns       3221   36.3283MB/s
Comparing BM_memcpy (from ./a.out) to BM_copy (from ./a.out)
Benchmark                               Time             CPU      Time Old      Time New       CPU Old       CPU New
--------------------------------------------------------------------------------------------------------------------
[BM_memcpy vs. BM_copy]/8            +5.1649         +5.1637            37           230            37           230
[BM_memcpy vs. BM_copy]/64          +21.4352        +21.4374            74          1654            74          1653
[BM_memcpy vs. BM_copy]/512        +143.6022       +143.5865            91         13122            91         13120
[BM_memcpy vs. BM_copy]/1024       +221.5903       +221.4790           120         26679           120         26666
[BM_memcpy vs. BM_copy]/8192       +322.9059       +323.0096           664        215068           664        215053
```
This is a mix of the previous two modes: two (potentially different) benchmark binaries are run, and a different filter is applied to each one.
Note that the values in the `Time` and `CPU` columns are calculated as `(new - old) / |old|`.

### Note: Interpreting the output

Performance measurements are an art, and performance comparisons are doubly so.
Results are often noisy and don't necessarily show large absolute differences,
so by visual inspection alone it is not at all apparent whether two
measurements actually show a performance change. It is even more
confusing with multiple benchmark repetitions.

Thankfully, we can use statistical tests on the results to determine
whether the performance has changed in a statistically significant way.
`compare.py` uses the [Mann–Whitney U
test](https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test), with the null
hypothesis being that there is no difference in performance.
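To build intuition about what the test computes, here is a self-contained Python sketch of a two-sided Mann–Whitney U test using only the standard library. It ranks the pooled samples (averaging ranks over ties), derives U for the first sample, and converts it to a p-value with the tie-corrected normal approximation. It is an illustration, not the tool's code: `compare.py` relies on scipy, whose `scipy.stats.mannwhitneyu` can also apply a continuity correction or an exact method for small samples, so its p-values may differ slightly from this sketch. The function name and the sample timings are hypothetical.

```python
import math
from statistics import NormalDist


def mann_whitney_u(sample_a, sample_b):
    """Two-sided Mann-Whitney U test via the tie-corrected normal approximation.

    Returns (u_a, p_value), where u_a is the U statistic of `sample_a`.
    No continuity correction is applied.
    """
    n1, n2 = len(sample_a), len(sample_b)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = sorted(list(sample_a) + list(sample_b))
    n = n1 + n2

    # Assign average 1-based ranks to runs of tied values; collect the tie term.
    rank_of = {}
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        tie_count = j - i
        tie_term += tie_count ** 3 - tie_count
        i = j

    rank_sum_a = sum(rank_of[value] for value in sample_a)
    u_a = rank_sum_a - n1 * (n1 + 1) / 2

    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return u_a, 1.0  # every value tied: no evidence of a difference
    z = (u_a - mean_u) / math.sqrt(var_u)
    p_value = 2 * NormalDist().cdf(-abs(z))
    return u_a, min(p_value, 1.0)


if __name__ == "__main__":
    # Hypothetical per-repetition times (ns) for an old and a new build.
    old = [90.1, 89.7, 90.4, 90.0, 89.9, 90.3, 90.2, 89.8, 90.5]
    new = [77.2, 76.9, 77.4, 77.0, 77.1, 76.8, 77.3, 77.5, 76.7]
    u, p = mann_whitney_u(old, new)
    print(f"U = {u}, p = {p:.6f}")
```

The test only looks at the ordering of the repetitions, not their magnitudes, which is why it is robust to the occasional outlier run but needs enough repetitions to say anything.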

**The output below is a summary of a benchmark comparison with statistics
provided for a multi-threaded process.**
```
Benchmark                                               Time         CPU     Time Old     Time New      CPU Old      CPU New
-----------------------------------------------------------------------------------------------------------------------------
benchmark/threads:1/process_time/real_time_pvalue     0.0000      0.0000    U Test, Repetitions: 27 vs 27
benchmark/threads:1/process_time/real_time_mean      -0.1442     -0.1442           90           77           90           77
benchmark/threads:1/process_time/real_time_median    -0.1444     -0.1444           90           77           90           77
benchmark/threads:1/process_time/real_time_stddev    +0.3974     +0.3933            0            0            0            0
benchmark/threads:1/process_time/real_time_cv        +0.6329     +0.6280            0            0            0            0
OVERALL_GEOMEAN                                      -0.1442     -0.1442            0            0            0            0
```
--------------------------------------------
Here's a breakdown of each row:

**benchmark/threads:1/process_time/real_time_pvalue**: This shows the _p-value_ for
the statistical test comparing the performance of the process running with one
thread. A value of 0.0000 suggests a statistically significant difference in
performance.
The comparison was conducted using the U Test (Mann-Whitney
U Test) with 27 repetitions for each case.

**benchmark/threads:1/process_time/real_time_mean**: This shows the relative
difference in mean execution time between the two cases. The negative
value (-0.1442) implies that the new process is faster by about 14.42%. The old
time was 90 units, while the new time is 77 units.

**benchmark/threads:1/process_time/real_time_median**: Similarly, this shows the
relative difference in the median execution time. Again, the new process is
faster, by about 14.44%.

**benchmark/threads:1/process_time/real_time_stddev**: This is the relative
difference in the standard deviation of the execution time, which is a measure
of how much variation or dispersion there is around the mean. A positive value
(+0.3974) implies that the execution time varies more in the new process.

**benchmark/threads:1/process_time/real_time_cv**: CV stands for Coefficient of
Variation. It is the ratio of the standard deviation to the mean. It provides a
standardized measure of dispersion.
An increase (+0.6329) indicates more
relative variability in the new process.

**OVERALL_GEOMEAN**: Geomean stands for geometric mean, a type of average that is
less influenced by outliers. The negative value indicates a general improvement
in the new process. However, given that the values are all zero for the old and
new times, this seems to be a mistake or placeholder in the output.

-----------------------------------------

Here is what the different columns in the `compare.py` benchmarking output
above represent:

 1. **Benchmark:** The name of the function being benchmarked, along with the
    size of the input (after the slash).

 2. **Time:** The average wall-clock time per operation, across all iterations.

 3. **CPU:** The average CPU time per operation, across all iterations.

 4. **Iterations:** The number of iterations the benchmark was run to get a
    stable estimate.

 5. **Time Old and Time New:** These represent the average time it takes for a
    function to run in two different scenarios or versions. For example, you
    might be comparing how fast a function runs before and after you make some
    changes to it.

 6. **CPU Old and CPU New:** These show the average amount of CPU time that the
    function uses in two different scenarios or versions. This is similar to
    Time Old and Time New, but focuses on CPU usage instead of overall time.

In the comparison section, the relative differences in both time and CPU time
are displayed for each input size.

A statistically significant difference is determined by a **p-value**, which is
the probability of observing a difference at least as large as the measured one
purely by random chance, that is, if the null hypothesis (no difference) were
true. A smaller p-value indicates stronger evidence against the null hypothesis.

**Therefore:**
 1. If the p-value is less than the chosen significance level (alpha), we
    reject the null hypothesis and conclude the benchmarks are significantly
    different.
 2. If the p-value is greater than or equal to alpha, we fail to reject the
    null hypothesis and treat the two benchmarks as similar.

The result of the statistical test is also communicated through color coding:
```diff
+ Green:
```
 The benchmarks are _**statistically different**_. This could mean the
 performance has either **significantly improved** or **significantly
 deteriorated**. You should look at the actual performance numbers to see which
 is the case.
```diff
- Red:
```
 The benchmarks are _**statistically similar**_. This means that no
 **significant change** in performance was detected.

In statistical terms, **'green'** means we reject the null hypothesis that
there's no difference in performance, and **'red'** means we fail to reject the
null hypothesis. This might seem counter-intuitive if you're expecting 'green'
to mean 'improved performance' and 'red' to mean 'worsened performance'.
But remember, in this context, 'success' means 'successfully finding a
difference', and 'failure' means 'failing to find a difference'.

Also, please note that **even if** we determine that there **is** a
statistically significant difference between the two measurements, it does not
_necessarily_ mean that the actual benchmarks that were measured **are**
different. Conversely, even if we determine that there is **no**
statistically significant difference between the two measurements, it does not
necessarily mean that the actual benchmarks that were measured **are not**
different.
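The aggregate rows in the example summary of this section (`_mean`, `_median`, `_stddev`, `_cv`) and the `OVERALL_GEOMEAN` row all report relative changes between old and new statistics. The Python sketch below recomputes such statistics from per-repetition times with the standard `statistics` module, assuming the same `(new - old) / |old|` formula as the per-benchmark rows and a sample (n - 1) standard deviation. For `OVERALL_GEOMEAN` it assumes the change between the geometric means of per-benchmark times; the text above only says it is a geometric mean. In practice the aggregates come from the benchmark binary's own output, so treat this as an approximation for intuition; the helper names are hypothetical.

```python
import statistics


def relative_change(old, new):
    """(new - old) / |old|, the formula used for the Time and CPU columns."""
    return (new - old) / abs(old)


def summarize(runs):
    """Mean, median, sample standard deviation and coefficient of variation."""
    mean = statistics.mean(runs)
    stddev = statistics.stdev(runs)  # sample (n - 1) standard deviation
    return {"mean": mean, "median": statistics.median(runs),
            "stddev": stddev, "cv": stddev / mean}


def aggregate_changes(old_runs, new_runs):
    """Relative change of each aggregate between two sets of repetitions."""
    old_stats, new_stats = summarize(old_runs), summarize(new_runs)
    return {key: relative_change(old_stats[key], new_stats[key]) for key in old_stats}


def overall_geomean_change(old_times, new_times):
    """Relative change between the geometric means of per-benchmark times (assumed)."""
    return relative_change(statistics.geometric_mean(old_times),
                           statistics.geometric_mean(new_times))


if __name__ == "__main__":
    # Hypothetical repetitions of one benchmark, in ns.
    old = [90.0, 90.5, 89.5, 90.2, 89.8]
    new = [77.0, 77.9, 76.1, 77.4, 76.6]
    for key, change in aggregate_changes(old, new).items():
        print(f"{key:<7}{change:+.4f}")
    # Per-benchmark Time values taken from the first example output.
    print(f"geomean {overall_geomean_change([36, 222, 1608], [36, 223, 1611]):+.4f}")
```

When the same benchmarks are paired, the ratio of the two geometric means equals the geometric mean of the per-benchmark ratios, so one slow-but-unchanged benchmark does not dominate the overall figure the way it would with an arithmetic mean.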

### U test

If the benchmarks were run with a sufficient number of repetitions, the tool can
perform a [U Test](https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test) of
the null hypothesis that it is equally likely that a randomly selected value
from one sample will be less than or greater than a randomly selected value
from a second sample.

If the calculated p-value is lower than the significance level alpha, then the
result is said to be statistically significant and the null hypothesis is
rejected. In other words, the two benchmarks aren't identical.

**WARNING**: requires a **LARGE** number of repetitions (no fewer than 9) to be
meaningful!
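The decision rule in this section, together with the repetition warning and the green/red color coding, fits in a few lines. The Python sketch below is hypothetical: `alpha = 0.05` is a common significance level chosen here as an assumption, the minimum of 9 repetitions comes from the warning above, and the returned strings are illustrative rather than the tool's actual output.

```python
def utest_verdict(p_value, old_repetitions, new_repetitions,
                  alpha=0.05, min_repetitions=9):
    """Classify a U-test result following the rules described in this document.

    alpha=0.05 is an assumed significance level; min_repetitions follows the
    'no fewer than 9 repetitions' warning.
    """
    if min(old_repetitions, new_repetitions) < min_repetitions:
        return "not meaningful: too few repetitions"
    if p_value < alpha:
        # Reject the null hypothesis: reported in green.
        return "green: statistically different"
    # Fail to reject the null hypothesis (p >= alpha): reported in red.
    return "red: no statistically significant difference found"


if __name__ == "__main__":
    # The p-value row of the example summary: 0.0000 with 27 vs 27 repetitions.
    print(utest_verdict(0.0000, 27, 27))
    print(utest_verdict(0.30, 27, 27))
    print(utest_verdict(0.01, 5, 5))
```

The three calls print the green verdict, the red verdict, and the too-few-repetitions verdict respectively. Note that a green verdict says nothing about the direction of the change; the sign of the `Time` and `CPU` columns does.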