# PyTorch Operator Micro-benchmarks

This benchmark suite provides a systematic way to measure the performance of operators for a wide range of inputs. The generated benchmark data fully characterize the performance of an operator in terms of execution time and the efficiency of the PyTorch framework used.

## Features

Key Features:

1\. Language used: Python

2\. Supported Frameworks: PyTorch

3\. Supported PyTorch mode: eager and JIT

4\. Input shapes: user-defined shapes, randomly generated shapes

## Getting Started

## Initial Setup
The instructions below install a cpp\_extension for PyTorch, which is required to run the benchmark suite.
```bash
cd pt_extension
python setup.py install
```

## How to run the benchmarks:

Run the `torch.add` benchmark:
```bash
cd pytorch/benchmarks/operator_benchmark
python -m pt.add_test --omp-num-threads 1 --mkl-num-threads 1
```
Note: here we set both the number of OpenMP and MKL threads to 1. If you want to benchmark operators with multithreading (intra-op parallelism), adjust the `--omp-num-threads` and `--mkl-num-threads` flags accordingly.

List all the supported tests:
```bash
python -m pt.add_test --list-tests
```

Filter and run a test (use `add_M8_N16_K32` as an example):
```bash
python -m pt.add_test --test-name add_M8_N16_K32 --omp-num-threads 1 --mkl-num-threads 1
```

Run all the supported benchmarks:
```bash
python -m benchmark_all_test
```

## Code to support `torch.add` in the benchmark
The following example shows the code to support `torch.add` with 27 different tests. In the following sections, we'll step through the complete flow of adding PyTorch operators to the benchmark suite. Existing benchmarks for operators are in the `pt` directory and we highly recommend putting your new operators in those locations.

```python
import operator_benchmark as op_bench
import torch

add_short_configs = op_bench.cross_product_configs(
    M=[8, 64, 128],
    N=range(2, 10, 3),
    K=[2 ** x for x in range(0, 3)],
    tags=["short"]
)

class AddBenchmark(op_bench.TorchBenchmarkBase):
    def init(self, M, N, K, device):
        self.inputs = {
            "input_one": torch.rand(M, N, K, device=device, requires_grad=self.auto_set()),
            "input_two": torch.rand(M, N, K, device=device, requires_grad=self.auto_set())
        }
        self.set_module_name("add")

    def forward(self, input_one, input_two):
        return torch.add(input_one, input_two)

op_bench.generate_pt_test(add_short_configs, AddBenchmark)
```

## Output and Command Line Control of the Benchmark
The output is intended to be human readable. Here is an example output for `torch.add`:
```
# ----------------------------------------
# PyTorch Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N16_K32
# Input: M: 8, N: 16, K: 32
Forward Execution Time (us) : 6.651

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M16_N16_K64
# Input: M: 16, N: 16, K: 64
Forward Execution Time (us) : 11.976

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K128
# Input: M: 64, N: 64, K: 128
Forward Execution Time (us) : 222.370
```
At a high level, the output includes the execution time of `torch.add` with three different inputs. Let's look at each line in detail:

1\. `Tag: short` tags a group of inputs. For each operator, you could be interested in a large number of inputs, but you may not always want to run all of them. `Tag` allows you to run only some of the inputs. Most of the inputs to the operators supported in the benchmark are grouped using two tags. One group is tagged with `short`, which stores some commonly used shapes. The other group is tagged with `long`, which stores many random inputs to provide better coverage than `short`.

2\. `Benchmarking PyTorch: add` shows the name of the operator being benchmarked.

3\. `Mode: Eager` shows that the benchmark is running in PyTorch eager mode.

4\. `Name: add_M8_N16_K32` is the name of the test; it can be used to filter tests.

5\. `Input: M: 8, N: 16, K: 32` shows the inputs to the operator.

6\. `Forward Execution Time (us) : 6.651` reports the execution time of the operator in microseconds.

### Command-Line Control
You can control all aspects of the benchmark suite through the command line. Find details of those arguments by running the following command or by looking into `benchmark_runner.py`.
```bash
python benchmark_runner.py --help
```

Run all the supported benchmarks:
```bash
python -m benchmark_all_test --omp-num-threads 1 --mkl-num-threads 1
```

List all the supported operators:
```bash
python -m benchmark_all_test --list-ops
```

List all the supported tests:
```bash
python -m benchmark_all_test --list-tests
```

Filter and run an operator (use `add` as an example):
```bash
python -m benchmark_all_test --operators add --omp-num-threads 1 --mkl-num-threads 1
```
Note: this filter is based on the operator name rather than the file name.

Run the `torch.add` benchmark with the tag `long`:
```bash
python -m pt.add_test --tag-filter long
```

## Adding New Operators to the Benchmark Suite
In the previous sections, we gave several examples to show how to run the operators already available in the benchmark suite. In the following sections, we'll step through the complete flow of adding PyTorch operators to the benchmark suite. Existing benchmarks for operators are in the `pt` directory and we highly recommend putting your new operators in those directories as well.

### Add a New PyTorch Operator
Let's say you want to measure the execution time of the following operator:
```python
C = torch.add(A, B) # Shape of A and B is [M, N, K]
```
The code below shows how to add it to the benchmark suite. Let's go over the example line by line.
```python
import operator_benchmark as op_bench
import torch

add_long_configs = op_bench.cross_product_configs(
    M=[8, 64, 128],
    N=range(2, 10, 3),
    K=[2 ** x for x in range(0, 3)],
    tags=["long"]
)

add_short_configs = op_bench.config_list(
    attr_names=["M", "N", "K"],
    attrs=[
        [8, 16, 32],
        [16, 16, 64],
        [64, 64, 128],
    ],
    tags=["short"],
)

class AddBenchmark(op_bench.TorchBenchmarkBase):
    def init(self, M, N, K, device):
        self.inputs = {
            "input_one": torch.rand(M, N, K, device=device, requires_grad=self.auto_set()),
            "input_two": torch.rand(M, N, K, device=device, requires_grad=self.auto_set())
        }
        self.set_module_name("add")

    def forward(self, input_one, input_two):
        return torch.add(input_one, input_two)

op_bench.generate_pt_test(add_long_configs + add_short_configs, AddBenchmark)

if __name__ == "__main__":
    op_bench.benchmark_runner.main()
```

#### Part 1. Specify Inputs to Operators
For the `torch.add` operator, we would like to make sure it delivers good performance with input tensors of small, medium, and large sizes. We have introduced two helper functions for users to easily generate combinations of inputs.
```python
# Generate list configurations that will be used for benchmark experiments
add_long_configs = op_bench.cross_product_configs(
    M=[8, 64, 128],
    N=range(2, 10, 3),
    K=[2 ** x for x in range(0, 3)],
    tags=["long"]
)

add_short_configs = op_bench.config_list(
    attr_names=["M", "N", "K"],
    attrs=[
        [8, 16, 32],
        [16, 16, 64],
        [64, 64, 128],
    ],
    tags=["short"],
)
```
Let's look at it in detail:

1\. `op_bench.config_list` is a helper function which specifies a list of inputs to operators. It takes three parameters, `attr_names`, `attrs`, and `tags`, all of which are Python lists. `attr_names` stores the names of the inputs. `attrs` stores the real value of each input. In this example, three different inputs will be returned: `M=8, N=16, K=32; M=16, N=16, K=64; M=64, N=64, K=128`.

2\. `op_bench.cross_product_configs` is another helper function, used to generate a cartesian product of the inputs. Each input is specified in a Python list. In this example, the helper method returns a combination of 27 (len(M) * len(N) * len(K)) inputs; the sketch below makes that count concrete.

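To make that count concrete, the snippet below enumerates the same cartesian product in plain Python. It is only an illustration of what `cross_product_configs` covers and is not part of the benchmark API:

```python
import itertools

M = [8, 64, 128]                    # 3 values
N = list(range(2, 10, 3))           # [2, 5, 8] -> 3 values
K = [2 ** x for x in range(0, 3)]   # [1, 2, 4] -> 3 values

# cross_product_configs benchmarks every (M, N, K) combination
combinations = list(itertools.product(M, N, K))
print(len(combinations))   # 27 = 3 * 3 * 3
print(combinations[0])     # (8, 2, 1)
```
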
#### Part 2. Create Tensors and Add Computation
After inputs are provided, we now look at adding the computation of an operator. Adding a new operator requires implementing a new `TorchBenchmarkBase` subclass. Every new class is required to implement 2 methods:
* `init` is used to create tensors based on the inputs we provided before. In this example, the parameters to `init` are `M`, `N`, and `K`, which have been specified in the input configuration. `init` also packs all the needed inputs together into a dictionary `self.inputs`, which will be provided to `forward` as arguments for running the benchmark.
* `forward` includes the operator to be tested and the computation based on the tensors created in `init`. Apart from `self`, the order of the arguments must match the entries specified in `self.inputs`.

The example below shows the code for `torch.add`:
```python
# Given one set of M, N, K, the init method creates input tensors based on
# that. The forward method does the torch.add calculation on those input tensors.

class AddBenchmark(op_bench.TorchBenchmarkBase):
    def init(self, M, N, K, device):
        # this is the method where you need to create tensors;
        # M, N, and K can appear in a different order, but they must match
        # the names in the configs
        self.inputs = {
            "input_one": torch.rand(M, N, K, device=device, requires_grad=self.auto_set()),
            "input_two": torch.rand(M, N, K, device=device, requires_grad=self.auto_set())
        }
        self.set_module_name("add")

    def forward(self, input_one, input_two):
        # this is the method where the operator runs its computation
        return torch.add(input_one, input_two)
```

#### Part 3. Register Tests With the Benchmark Suite
After we have the inputs and the benchmark class, it's time to register them with our benchmark suite. Here is how it looks:
```python
op_bench.generate_pt_test(add_long_configs + add_short_configs, AddBenchmark)
```
`generate_pt_test` takes two parameters: the input configs and the benchmark class.

#### Part 4. Run the Registered Tests
To run the benchmark, we use the main method in the `benchmark_runner` module.
```python
if __name__ == "__main__":
    op_bench.benchmark_runner.main()
```
That's it. You just added a new operator to the benchmark suite!

### Add a List of Operators
In the previous sections, we introduced the steps required to add a single operator to the benchmark suite. There are scenarios where you want to extend the benchmark suite with a list of operators which can share the same inputs. For example, to benchmark the `abs` and `acos` operators, you can use the same set of inputs for both.

Let's say we want to benchmark the following operators separately:
```python
C = torch.abs(A) # Shape of A [M, N]
C = torch.acos(A) # Shape of A [M, N]
```
The following code shows how to do that:
```python
import operator_benchmark as op_bench
import torch

unary_ops_configs = op_bench.config_list(
    attrs=[
        [128, 128],
        [256, 256],
        [1024, 1024],
    ],
    attr_names=["M", "N"],
    tags=["short"]
)

unary_ops_list = op_bench.op_list(
    attr_names=["op_name", "op_func"],
    attrs=[
        ["abs", torch.abs],
        ["acos", torch.acos],
    ],
)

class UnaryOpBenchmark(op_bench.TorchBenchmarkBase):
    def init(self, M, N, device, op_func):
        self.inputs = {
            "input": torch.rand(M, N, device=device)
        }
        self.op_func = op_func

    def forward(self, input):
        return self.op_func(input)

op_bench.generate_pt_tests_from_op_list(unary_ops_list, unary_ops_configs, UnaryOpBenchmark)

if __name__ == "__main__":
    op_bench.benchmark_runner.main()
```
The inputs to those operators are specified using the same method we went over before, so we skip that part here.

#### Part 1. Specify the List of Operators
To add a list of operators to the benchmark suite, we introduce the `op_bench.op_list` method, which takes two parameters:
* `attrs` stores the name of each operator and the method that does the real calculation.
* `attr_names` stores the names of the values in `attrs`.

The example below shows the code to add `torch.abs` and `torch.acos`:
```python
unary_ops_list = op_bench.op_list(
    attr_names=["op_name", "op_func"],
    attrs=[
        ["abs", torch.abs],
        ["acos", torch.acos],
    ],
)
```

#### Part 2. Create Tensors and Add Computation
In this example, both operators share the same input, so we only need to implement one `TorchBenchmarkBase` subclass.
Every new subclass is required to implement 2 methods:
* `init` is used to create tensors and set the operator name and function. In this example, the parameters to `init` are `M`, `N`, and `op_func`, which have been specified in the configurations.
* `forward` includes the operator to be tested and the computation based on the tensors created in `init`. Apart from `self`, the order of the arguments must match the entries specified in `self.inputs`.

Here is the code for `abs` and `acos`:

```python
class UnaryOpBenchmark(op_bench.TorchBenchmarkBase):
    def init(self, M, N, device, op_func):
        # The M and N match the attr_names in the input configuration
        # The op_func matches the attr_name in the ops configuration
        self.inputs = {
            "input": torch.rand(M, N, device=device)
        }
        self.op_func = op_func

    def forward(self, input):
        return self.op_func(input)
```

#### Part 3. Register a List of Operators
To register multiple operators, we introduce the `generate_pt_tests_from_op_list` function, which takes three parameters: the list of operators, the configs, and the benchmark class.
Here is an example:
```python
op_bench.generate_pt_tests_from_op_list(unary_ops_list, unary_ops_configs, UnaryOpBenchmark)
```

### Add Gradient Ops
In this section, we go over the steps to benchmark the backward path of operators.

#### For PyTorch Gradient Ops
To measure the performance of an operator in its backward path, only two changes are needed in addition to the steps we covered for the forward path:

1\. Specify `requires_grad=True` when creating the tensor. This is the standard PyTorch way of enabling the backward path.

2\. Use `generate_pt_gradient_test` to register the tests.

The example below shows the relevant snippets; a more complete sketch follows after the code block.
```python
self.input_one = torch.rand(M, N, K, requires_grad=True)
generate_pt_gradient_test(long_configs + short_configs, TorchAddBenchmark)
```
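
For completeness, here is a minimal end-to-end sketch of a gradient benchmark for `torch.add`, assembled from the pieces covered earlier in this README. The class name `AddGradientBenchmark` and the config values are illustrative, not part of the existing suite:

```python
import operator_benchmark as op_bench
import torch

add_short_configs = op_bench.config_list(
    attr_names=["M", "N", "K"],
    attrs=[
        [8, 16, 32],
        [16, 16, 64],
    ],
    tags=["short"],
)

class AddGradientBenchmark(op_bench.TorchBenchmarkBase):
    def init(self, M, N, K, device):
        # requires_grad=True enables the backward path for these tensors
        self.inputs = {
            "input_one": torch.rand(M, N, K, device=device, requires_grad=True),
            "input_two": torch.rand(M, N, K, device=device, requires_grad=True),
        }
        self.set_module_name("add")

    def forward(self, input_one, input_two):
        return torch.add(input_one, input_two)

# generate_pt_gradient_test registers the tests for the backward path
op_bench.generate_pt_gradient_test(add_short_configs, AddGradientBenchmark)

if __name__ == "__main__":
    op_bench.benchmark_runner.main()
```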