# Instruction count microbenchmarks
## Quick start

### To run the benchmark:

```
# From pytorch root
cd benchmarks/instruction_counts
python main.py
```

Currently `main.py` contains a very simple threadpool (so that run time isn't
unbearably onerous) and simply prints the results. These components will be
upgraded in subsequent PRs.

### To define a new benchmark:
* `TimerArgs`: Low level definition which maps directly to
`torch.utils.benchmark.Timer`
* `GroupedStmts`: Benchmark a snippet (Python, C++, or both). Can automatically
generate TorchScript and autograd variants.
* `GroupedModules`: Like `GroupedStmts`, but takes `nn.Module`s
* `GroupedVariants`: Benchmark-per-line to define many related benchmarks in a
single code block.

## Architecture
### Benchmark definition.

One primary goal of this suite is to make it easy to define semantically
related clusters of benchmarks. The crux of this effort is the
`GroupedBenchmark` class, which is defined in `core/api.py`. It takes a
definition for a set of related benchmarks, and produces one or more concrete
cases. It's helpful to see an example to understand how the machinery works.
Consider the following benchmark:

```
# `GroupedStmts` is an alias of `GroupedBenchmark.init_from_stmts`
benchmark = GroupedStmts(
    py_stmt=r"y = x * w",
    cpp_stmt=r"auto y = x * w;",

    setup=GroupedSetup(
        py_setup="""
            x = torch.ones((4, 4))
            w = torch.ones((4, 4), requires_grad=True)
        """,
        cpp_setup="""
            auto x = torch::ones({4, 4});
            auto w = torch::ones({4, 4});
            w.set_requires_grad(true);
        """,
    ),

    signature="f(x, w) -> y",
    torchscript=True,
    autograd=True,
)
```

It is trivial to generate Timers for the eager forward mode case (ignoring
`num_threads` for now):

```
Timer(
    stmt=benchmark.py_fwd_stmt,
    setup=benchmark.setup.py_setup,
)

Timer(
    stmt=benchmark.cpp_fwd_stmt,
    setup=benchmark.setup.cpp_setup,
    language="cpp",
)
```

Moreover, because `signature` is provided we know that creation of `x` and `w`
is part of setup, and the overall computation uses `x` and `w` to produce `y`.
As a result, we can derive TorchScript'd and AutoGrad variants as well. We can
deduce that a TorchScript model will take the form:

```
@torch.jit.script
def f(x, w):
    # Paste `benchmark.py_fwd_stmt` into the function body.
    y = x * w
    return y  # Set by `-> y` in signature.
```

And because we will want to use this model in both Python and C++, we save it to
disk and load it as needed. At this point Timers for TorchScript become:

```
Timer(
    stmt="""
        y = jit_model(x, w)
    """,
    setup="""
        # benchmark.setup.py_setup
        # jit_model = torch.jit.load(...)
        # Warm up jit_model
    """,
)

Timer(
    stmt="""
        std::vector<torch::jit::IValue> ivalue_inputs({
            torch::jit::IValue({x}),
            torch::jit::IValue({w})
        });
        auto y = jit_model.forward(ivalue_inputs);
    """,
    setup="""
        // benchmark.setup.cpp_setup
        // jit_model = torch::jit::load(...)
        // Warm up jit_model
    """,
    language="cpp",
)
```

While nothing above is particularly complex, there is non-trivial bookkeeping
(managing the model artifact, setting up IValues) which, if done manually, would
be rather bug-prone and hard to read.

The story is similar for autograd: because we know the output variable (`y`)
and we make sure to assign it when calling TorchScript models, testing AutoGrad
is as simple as appending `y.backward()` (or `y.backward();` in C++) to the
stmt of the forward-only variant. Of course this requires that `signature` be
provided, as there is nothing special about the name `y`.

The logic for the manipulations above is split between `core/api.py` (for
generating `stmt` based on language, Eager/TorchScript, with or without AutoGrad)
and `core/expand.py` (for larger, more expansive generation). The benchmarks
themselves are defined in `definitions/standard.py`. The current set is chosen
to demonstrate the various model definition APIs, and will be expanded when the
benchmark runner infrastructure is better equipped to deal with a larger run.
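
To make the signature-driven derivation concrete, here is a minimal sketch of
how a string such as `f(x, w) -> y` could be turned into the source of a
scriptable function and into an autograd statement. The helper names
(`parse_signature`, `make_script_source`, `make_autograd_stmt`) are hypothetical
and are not taken from `core/api.py`; the real implementation may differ.

```python
import re
import textwrap
from typing import List, Tuple


def parse_signature(signature: str) -> Tuple[str, List[str], str]:
    """Split "f(x, w) -> y" into ("f", ["x", "w"], "y")."""
    match = re.fullmatch(r"\s*(\w+)\((.*)\)\s*->\s*(\w+)\s*", signature)
    if match is None:
        raise ValueError(f"Invalid signature: {signature!r}")
    name, raw_args, output = match.groups()
    args = [a.strip() for a in raw_args.split(",") if a.strip()]
    return name, args, output


def make_script_source(py_stmt: str, signature: str) -> str:
    """Wrap the forward statement in a function that returns the output."""
    name, args, output = parse_signature(signature)
    body = textwrap.indent(textwrap.dedent(py_stmt).strip(), "    ")
    return f"def {name}({', '.join(args)}):\n{body}\n    return {output}\n"


def make_autograd_stmt(py_stmt: str, signature: str) -> str:
    """Append a backward call on the declared output to the forward stmt."""
    _, _, output = parse_signature(signature)
    return f"{textwrap.dedent(py_stmt).strip()}\n{output}.backward()"


# Hypothetical usage with the statement and signature from the example above.
script_source = make_script_source("y = x * w", "f(x, w) -> y")
autograd_stmt = make_autograd_stmt("y = x * w", "f(x, w) -> y")
print(script_source)
print(autograd_stmt)
```

The sketch only manipulates strings, so it runs without PyTorch installed. In
the suite, the generated function would additionally be scripted, saved to disk,
and loaded in the Timer setup as described above.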

### Benchmark execution.

Once `expand.materialize` has flattened the abstract benchmark definitions into
`TimerArgs`, they can be sent to a worker (`worker/main.py`) subprocess for
execution. This worker has no concept of the larger benchmark suite; `TimerArgs`
is a one-to-one and direct mapping to the `torch.utils.benchmark.Timer` instance
that the worker instantiates.
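
As a rough illustration of that one-to-one mapping, the sketch below uses a
hypothetical `SketchTimerArgs` dataclass whose fields mirror keyword arguments
of `torch.utils.benchmark.Timer`. The class name, its field set, and the JSON
handoff are all assumptions made for this example; they are not the actual
`TimerArgs` definition or the protocol used by `worker/main.py`.

```python
import dataclasses
import json
from typing import Any, Dict


@dataclasses.dataclass(frozen=True)
class SketchTimerArgs:
    """Hypothetical stand-in for `TimerArgs`; the field names are assumptions."""
    stmt: str
    setup: str = "pass"
    num_threads: int = 1
    language: str = "python"


def serialize(args: SketchTimerArgs) -> str:
    """Encode the arguments so a parent process could hand them to a worker."""
    return json.dumps(dataclasses.asdict(args))


def deserialize(payload: str) -> SketchTimerArgs:
    """Rebuild the arguments on the worker side."""
    return SketchTimerArgs(**json.loads(payload))


def to_timer_kwargs(args: SketchTimerArgs) -> Dict[str, Any]:
    """Each field becomes one Timer keyword; no suite-level context is needed."""
    return dataclasses.asdict(args)


args = SketchTimerArgs(
    stmt="y = x * w",
    setup="x = torch.ones((4, 4)); w = torch.ones((4, 4))",
)
received = deserialize(serialize(args))
timer_kwargs = to_timer_kwargs(received)
# In a worker with PyTorch installed, these could then be passed along as:
#   torch.utils.benchmark.Timer(**timer_kwargs)
print(timer_kwargs)
```

Keeping the payload a flat record of Timer keywords is what lets the worker stay
unaware of `GroupedBenchmark` and the rest of the suite.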