# Instruction count microbenchmarks
## Quick start

### To run the benchmark:

```
# From pytorch root
cd benchmarks/instruction_counts
python main.py
```

Currently `main.py` contains a very simple threadpool (so that run time isn't
unbearably onerous) and simply prints the results. These components will be
upgraded in subsequent PRs.
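
A minimal sketch of the "simple threadpool" pattern, assuming (as described under *Benchmark execution*) that each job runs in its own subprocess. It is not the code in `main.py`: `run_job` and `run_all` are hypothetical names, and the child process only echoes an id in place of a real benchmark worker.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_job(job_id: int) -> tuple[int, str]:
    """Run one (hypothetical) benchmark job in its own subprocess."""
    # Stand-in for launching a real worker; the child just echoes its id.
    proc = subprocess.run(
        [sys.executable, "-c", f"print('job {job_id} done')"],
        capture_output=True,
        text=True,
        check=True,
    )
    return job_id, proc.stdout.strip()


def run_all(num_jobs: int, max_workers: int = 4) -> dict[int, str]:
    # Threads only wait on subprocesses, so the GIL is not a bottleneck here.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_job, range(num_jobs)))


if __name__ == "__main__":
    for job_id, output in sorted(run_all(8).items()):
        print(job_id, output)
```

Threads are enough for this design because the heavy lifting happens in separate processes; the pool only bounds how many run concurrently.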

### To define a new benchmark:
* `TimerArgs`: Low-level definition which maps directly to
`torch.utils.benchmark.Timer`.
* `GroupedStmts`: Benchmark a snippet (Python, C++, or both). Can automatically
generate TorchScript and autograd variants.
* `GroupedModules`: Like `GroupedStmts`, but takes `nn.Module`s.
* `GroupedVariants`: Benchmark-per-line to define many related benchmarks in a
single code block.

## Architecture
### Benchmark definition.

One primary goal of this suite is to make it easy to define semantically
related clusters of benchmarks. The crux of this effort is the
`GroupedBenchmark` class, which is defined in `core/api.py`. It takes a
definition for a set of related benchmarks, and produces one or more concrete
cases. It's helpful to see an example to understand how the machinery works.
Consider the following benchmark:

```
# Run from `benchmarks/instruction_counts` so that `core` is importable.
from core.api import GroupedSetup, GroupedStmts

# `GroupedStmts` is an alias of `GroupedBenchmark.init_from_stmts`
benchmark = GroupedStmts(
    py_stmt=r"y = x * w",
    cpp_stmt=r"auto y = x * w;",

    setup=GroupedSetup(
        py_setup="""
            x = torch.ones((4, 4))
            w = torch.ones((4, 4), requires_grad=True)
        """,
        cpp_setup="""
            auto x = torch::ones({4, 4});
            auto w = torch::ones({4, 4});
            w.set_requires_grad(true);
        """,
    ),

    signature="f(x, w) -> y",
    torchscript=True,
    autograd=True,
)
```

It is trivial to generate Timers for the eager, forward-only case (ignoring
`num_threads` for now):

```
from torch.utils.benchmark import Timer

Timer(
    stmt=benchmark.py_fwd_stmt,
    setup=benchmark.setup.py_setup,
)

Timer(
    stmt=benchmark.cpp_fwd_stmt,
    setup=benchmark.setup.cpp_setup,
    language="cpp",
)
```

Moreover, because `signature` is provided, we know that creation of `x` and `w`
is part of setup, and that the overall computation uses `x` and `w` to produce
`y`. As a result, we can derive TorchScript and autograd variants as well. We
can deduce that a TorchScript model will take the form:

```
import torch

@torch.jit.script
def f(x, w):
    # Paste `benchmark.py_fwd_stmt` into the function body.
    y = x * w
    return y  # Set by `-> y` in signature.
```
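
The deduction from `signature` is mostly string manipulation. A minimal sketch, assuming a signature of exactly the form `name(arg, ...) -> output` with a single output; `parse_signature` and `make_model_src` are hypothetical helpers, not the suite's actual functions:

```python
import re
import textwrap

# Matches e.g. "f(x, w) -> y": a function name, an argument list, one output.
_SIGNATURE_RE = re.compile(r"^\s*(\w+)\((.*)\)\s*->\s*(\w+)\s*$")


def parse_signature(signature: str) -> tuple[str, list[str], str]:
    """Split 'f(x, w) -> y' into ('f', ['x', 'w'], 'y')."""
    match = _SIGNATURE_RE.match(signature)
    if match is None:
        raise ValueError(f"Unsupported signature: {signature!r}")
    name, args, output = match.groups()
    arg_names = [a.strip() for a in args.split(",") if a.strip()]
    return name, arg_names, output


def make_model_src(signature: str, py_stmt: str) -> str:
    """Paste `py_stmt` into a function body that returns the declared output."""
    name, arg_names, output = parse_signature(signature)
    body = textwrap.indent(textwrap.dedent(py_stmt).strip(), "    ")
    return f"def {name}({', '.join(arg_names)}):\n{body}\n    return {output}\n"


print(make_model_src("f(x, w) -> y", "y = x * w"))
```

Scripting the generated function and saving it to disk, which the text describes next, is left out of this sketch.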

Because we will want to use this model in both Python and C++, we save it to
disk and load it as needed. At this point, the Timers for TorchScript become:

```
Timer(
    stmt="""
        y = jit_model(x, w)
    """,
    setup="""
        # benchmark.setup.py_setup
        # jit_model = torch.jit.load(...)
        # Warm up jit_model
    """,
)

Timer(
    stmt="""
        std::vector<torch::jit::IValue> ivalue_inputs({
            torch::jit::IValue(x),
            torch::jit::IValue(w)
        });
        auto y = jit_model.forward(ivalue_inputs).toTensor();
    """,
    setup="""
        // benchmark.setup.cpp_setup
        // jit_model = torch::jit::load(...)
        // Warm up jit_model
    """,
    language="cpp",
)
```

While nothing above is particularly complex, there is non-trivial bookkeeping
(managing the model artifact, setting up IValues) which, if done manually, would
be bug-prone and hard to read.
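
Generating the call site from the argument names is one way to take the IValue bookkeeping off the benchmark author. This is a sketch under assumptions: `make_jit_stmt` is a hypothetical helper, `jit_model` is the variable name used in the Timer examples above, and `.toTensor()` assumes the model returns a single Tensor. `torch::jit::Module::forward` and `IValue::toTensor` are real libtorch calls, but the suite's own generated code may differ.

```python
def make_jit_stmt(language: str, arg_names: list[str], output: str) -> str:
    """Build the stmt that calls a loaded TorchScript model named `jit_model`."""
    if language == "py":
        return f"{output} = jit_model({', '.join(arg_names)})"
    if language == "cpp":
        # One IValue per argument, in signature order.
        ivalues = ",\n".join(f"    torch::jit::IValue({name})" for name in arg_names)
        return (
            "std::vector<torch::jit::IValue> ivalue_inputs({\n"
            f"{ivalues}\n"
            "});\n"
            # Assumes the model returns a single Tensor.
            f"auto {output} = jit_model.forward(ivalue_inputs).toTensor();"
        )
    raise ValueError(f"Unknown language: {language!r}")


print(make_jit_stmt("cpp", ["x", "w"], "y"))
```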

The story is similar for autograd: because we know the output variable (`y`)
and we make sure to assign it when calling TorchScript models, testing autograd
is as simple as appending `y.backward()` (or `y.backward();` in C++) to the
stmt of the forward-only variant. Of course, this requires that `signature` be
provided, as there is nothing special about the name `y`.
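
The append step described above fits in a few lines. A minimal sketch, assuming the output name has already been parsed from `signature`; `add_backward` is a hypothetical helper:

```python
def add_backward(fwd_stmt: str, output: str, language: str) -> str:
    """Derive the autograd variant by appending a backward call on `output`."""
    # Like the text, this only appends the call; it does not check that
    # `output` is a scalar, which a bare `backward()` requires in PyTorch.
    if language == "py":
        return f"{fwd_stmt}\n{output}.backward()"
    if language == "cpp":
        return f"{fwd_stmt}\n{output}.backward();"
    raise ValueError(f"Unknown language: {language!r}")


print(add_backward("y = jit_model(x, w)", "y", "py"))
```

Because the same helper works on eager and TorchScript stmts alike, the autograd variants come for free once the forward-only stmts exist.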

The logic for the manipulations above is split between `core/api.py` (for
generating `stmt` based on language, Eager/TorchScript, and with or without
autograd) and `core/expand.py` (for larger-scale generation). The benchmarks
themselves are defined in `definitions/standard.py`. The current set is chosen
to demonstrate the various model definition APIs, and will be expanded when the
benchmark runner infrastructure is better equipped to deal with a larger run.
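
To see why a dedicated expansion step is useful, note how the flags multiply: with both languages, `torchscript=True` and `autograd=True`, a single definition already yields eight cases. This is an illustrative sketch, not the contents of `core/api.py` or `core/expand.py`; `expand_variants` and its string labels are hypothetical.

```python
import itertools


def expand_variants(
    has_py: bool, has_cpp: bool, torchscript: bool, autograd: bool
) -> list[tuple[str, str, str]]:
    """Enumerate (language, mode, pass) cases for one grouped definition."""
    languages = [lang for lang, present in (("py", has_py), ("cpp", has_cpp)) if present]
    modes = ["eager"] + (["torchscript"] if torchscript else [])
    passes = ["fwd"] + (["fwd+bwd"] if autograd else [])
    # Cartesian product: every language in every mode, with and without autograd.
    return list(itertools.product(languages, modes, passes))


cases = expand_variants(has_py=True, has_cpp=True, torchscript=True, autograd=True)
print(len(cases))
```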

### Benchmark execution.

Once `expand.materialize` has flattened the abstract benchmark definitions into
`TimerArgs`, they can be sent to a worker (`worker/main.py`) subprocess for
execution. This worker has no concept of the larger benchmark suite; `TimerArgs`
maps directly, one-to-one, onto the `torch.utils.benchmark.Timer` instance
that the worker instantiates.
142*da0073e9SAndroid Build Coastguard Workerthat the worker instantiates.
143