# XNNPACK Backend

[XNNPACK](https://github.com/google/XNNPACK) is a library of optimized neural network operators for ARM and x86 CPU platforms. Our delegate lowers models to run using these highly optimized CPU operators. You can try out lowering and running some example models in the demo. Please refer to the following docs for more information on the XNNPACK delegate:
- [XNNPACK Backend Delegate Overview](https://pytorch.org/executorch/stable/native-delegates-executorch-xnnpack-delegate.html)
- [XNNPACK Delegate Export Tutorial](https://pytorch.org/executorch/stable/tutorial-xnnpack-delegate-lowering.html)

## Directory structure

```bash
examples/xnnpack
├── quantization                      # Scripts to illustrate PyTorch 2 Export Quantization workflow with XNNPACKQuantizer
│   └── example.py
├── aot_compiler.py                   # The main script to illustrate the full AOT (export, quantization, delegation) workflow with XNNPACK delegate
└── README.md                         # This file
```

## Delegating a Floating-point Model

The following command will produce a floating-point XNNPACK delegated model `mv2_xnnpack_fp32.pte` that can be run using XNNPACK's operators. It will also print out the lowered graph, showing which parts of the model have been lowered to XNNPACK via `executorch_call_delegate`.

```bash
# For MobileNet V2
python3 -m examples.xnnpack.aot_compiler --model_name="mv2" --delegate
```
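
For reference, the lowering that `aot_compiler.py` illustrates can be sketched directly in Python. This is only an outline under the assumption of a recent ExecuTorch release; export APIs and module paths have moved between versions, so treat it as illustrative rather than as the script's exact implementation:

```python
import torch
import torchvision.models as models

from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge_transform_and_lower

# Eval-mode MobileNet V2 and a sample input matching its expected shape.
model = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.DEFAULT).eval()
sample_inputs = (torch.randn(1, 3, 224, 224),)

# Export to an ATen graph, then lower supported subgraphs to the XNNPACK backend.
exported_program = torch.export.export(model, sample_inputs)
edge_program = to_edge_transform_and_lower(
    exported_program, partitioner=[XnnpackPartitioner()]
)

# Serialize to a .pte file that the runtime (and xnn_executor_runner) can load.
executorch_program = edge_program.to_executorch()
with open("mv2_xnnpack_fp32.pte", "wb") as f:
    f.write(executorch_program.buffer)
```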

Once we have the model binary (`.pte`) file, we can run it with the ExecuTorch runtime using the `xnn_executor_runner`. With CMake, first configure the build as follows:

```bash
# cd to the root of executorch repo
cd executorch

# Get a clean cmake-out directory
rm -rf cmake-out
mkdir cmake-out

# Configure cmake
cmake \
    -DCMAKE_INSTALL_PREFIX=cmake-out \
    -DCMAKE_BUILD_TYPE=Release \
    -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
    -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
    -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON \
    -DEXECUTORCH_BUILD_XNNPACK=ON \
    -DEXECUTORCH_ENABLE_LOGGING=ON \
    -DPYTHON_EXECUTABLE=python \
    -Bcmake-out .
```

Then you can build the runtime components with:

```bash
cmake --build cmake-out -j9 --target install --config Release
```

Finally, you should be able to run this model with the following command:

```bash
./cmake-out/backends/xnnpack/xnn_executor_runner --model_path ./mv2_xnnpack_fp32.pte
```

## Quantization
If you are not already familiar with it, first learn more about the generic PyTorch 2 Export Quantization workflow in the [Quantization Flow Docs](https://pytorch.org/executorch/stable/quantization-overview.html).

Here we will discuss quantizing a model suitable for XNNPACK delegation using XNNPACKQuantizer.

Though it is typical to run this quantized model via the XNNPACK delegate, we want to highlight that this is just another quantization flavor: the quantized model can also be run without the XNNPACK delegate, using only standard quantization operators.

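To give a sense of what this flow looks like in code, here is a minimal PT2E quantization sketch with XNNPACKQuantizer. It assumes a recent PyTorch/ExecuTorch install; the capture API and the quantizer's import path have moved between releases, so check against the versions you have installed:

```python
import torch
import torchvision.models as models

from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

model = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.DEFAULT).eval()
sample_inputs = (torch.randn(1, 3, 224, 224),)

# Capture the model ahead of quantization (the exact capture API differs across PyTorch versions).
captured = torch.export.export_for_training(model, sample_inputs).module()

# Annotate the graph with symmetric quantization settings, calibrate, then convert.
quantizer = XNNPACKQuantizer()
quantizer.set_global(get_symmetric_quantization_config())
prepared = prepare_pt2e(captured, quantizer)
prepared(*sample_inputs)  # calibration pass with representative data
quantized_model = convert_pt2e(prepared)
```
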
A shared library that registers the out variants of the quantized operators (e.g., `quantized_decomposed::add.out`) into EXIR is required. With CMake, follow the instructions in `test_quantize.sh` to build it; the default path is `cmake-out/kernels/quantized/libquantized_ops_lib.so`.

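For context, such a library is typically registered by loading it with `torch.ops.load_library` before export; a minimal sketch, assuming the default output path mentioned above:

```python
import torch

# Load the shared library so the out-variant quantized operators
# (e.g. quantized_decomposed::add.out) are registered before export.
torch.ops.load_library("cmake-out/kernels/quantized/libquantized_ops_lib.so")
```
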
Then you can generate an XNNPACK quantized model with the following command by passing the path to the shared library into the script `quantization/example.py`:
```bash
python3 -m examples.xnnpack.quantization.example --model_name "mv2" --so_library "<path/to/so/lib>" # for MobileNet V2

# This should generate a ./mv2_quantized.pte file, if successful.
```
You can find more valid quantized example models by running:
```bash
python3 -m examples.xnnpack.quantization.example --help
```

## Running the XNNPACK Model with CMake
After exporting the XNNPACK delegated model, we can try running it with example inputs using CMake. We can build and use the `xnn_executor_runner`, which is a sample wrapper for the ExecuTorch Runtime and XNNPACK Backend. We begin by configuring the CMake build as follows:
```bash
# cd to the root of executorch repo
cd executorch

# Get a clean cmake-out directory
rm -rf cmake-out
mkdir cmake-out

# Configure cmake
cmake \
    -DCMAKE_INSTALL_PREFIX=cmake-out \
    -DCMAKE_BUILD_TYPE=Release \
    -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
    -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
    -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON \
    -DEXECUTORCH_BUILD_XNNPACK=ON \
    -DEXECUTORCH_ENABLE_LOGGING=ON \
    -DPYTHON_EXECUTABLE=python \
    -Bcmake-out .
```
Then you can build the runtime components with:

```bash
cmake --build cmake-out -j9 --target install --config Release
```

Now you should be able to find the executable built at `./cmake-out/backends/xnnpack/xnn_executor_runner`. You can run the executable with the model you generated as follows:
```bash
./cmake-out/backends/xnnpack/xnn_executor_runner --model_path=./mv2_quantized.pte
```

## Delegating a Quantized Model

The following command will produce an XNNPACK quantized and delegated model `mv2_xnnpack_q8.pte` that can be run using XNNPACK's operators. It will also print out the lowered graph, showing which parts of the model have been lowered to XNNPACK via `executorch_call_delegate`.

```bash
python3 -m examples.xnnpack.aot_compiler --model_name "mv2" --quantize --delegate
```
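
Conceptually, `--quantize --delegate` chains the two flows shown earlier: quantize with XNNPACKQuantizer, then lower the quantized graph to XNNPACK. Below is a minimal sketch under the same version assumptions as the earlier snippets; the actual `aot_compiler.py` may differ in details:

```python
import torch
import torchvision.models as models

from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge_transform_and_lower

model = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.DEFAULT).eval()
sample_inputs = (torch.randn(1, 3, 224, 224),)

# 1) Quantize with the PT2E flow (see the Quantization section above).
captured = torch.export.export_for_training(model, sample_inputs).module()
quantizer = XNNPACKQuantizer()
quantizer.set_global(get_symmetric_quantization_config())
prepared = prepare_pt2e(captured, quantizer)
prepared(*sample_inputs)  # calibration
quantized_model = convert_pt2e(prepared)

# 2) Export the quantized model and lower supported subgraphs to XNNPACK.
exported_program = torch.export.export(quantized_model, sample_inputs)
edge_program = to_edge_transform_and_lower(
    exported_program, partitioner=[XnnpackPartitioner()]
)

# 3) Serialize to the .pte file consumed by xnn_executor_runner.
executorch_program = edge_program.to_executorch()
with open("mv2_xnnpack_q8.pte", "wb") as f:
    f.write(executorch_program.buffer)
```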