# Building and Running ExecuTorch with Qualcomm AI Engine Direct Backend

In this tutorial we will walk you through building ExecuTorch for
Qualcomm AI Engine Direct and running a model on it.

Qualcomm AI Engine Direct is also referred to as QNN in the source and documentation.


<!----This will show a grid card on the page----->
::::{grid} 2
:::{grid-item-card}  What you will learn in this tutorial:
:class-card: card-prerequisites
* In this tutorial you will learn how to lower and deploy a model for Qualcomm AI Engine Direct.
:::
:::{grid-item-card}  Tutorials we recommend you complete before this:
:class-card: card-prerequisites
* [Introduction to ExecuTorch](intro-how-it-works.md)
* [Setting up ExecuTorch](getting-started-setup.md)
* [Building ExecuTorch with CMake](runtime-build-and-cross-compilation.md)
:::
::::


## What's Qualcomm AI Engine Direct?

[Qualcomm AI Engine Direct](https://developer.qualcomm.com/software/qualcomm-ai-engine-direct-sdk)
is designed to provide unified, low-level APIs for AI development.

Developers can interact with various accelerators on Qualcomm SoCs through this set of APIs, including
the Kryo CPU, Adreno GPU, and Hexagon processors. More details can be found [here](https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/overview.html).

Currently, this ExecuTorch backend can delegate AI computations to Hexagon processors through the Qualcomm AI Engine Direct APIs.


## Prerequisites (Hardware and Software)

### Host OS

At the moment of updating this tutorial, the Linux host operating system that
the QNN backend is verified with is Ubuntu 22.04 LTS x64.
Usually, we verify the backend on the same OS version that QNN itself is verified with;
that version is documented in the QNN SDK.

### Hardware:
You will need an Android smartphone, connected via adb, running on one of the below Qualcomm SoCs:
 - SM8450 (Snapdragon 8 Gen 1)
 - SM8475 (Snapdragon 8 Gen 1+)
 - SM8550 (Snapdragon 8 Gen 2)
 - SM8650 (Snapdragon 8 Gen 3)

This example is verified with SM8550 and SM8450.

### Software:

 - Follow the ExecuTorch recommended Python version.
 - A compiler to compile the AOT parts, e.g., the GCC compiler that comes with Ubuntu LTS.
 - [Android NDK](https://developer.android.com/ndk). This example is verified with NDK 26c.
 - [Qualcomm AI Engine Direct SDK](https://developer.qualcomm.com/software/qualcomm-ai-engine-direct-sdk)
   - Click the "Get Software" button to download a version of the QNN SDK.
   - However, at the moment of updating this tutorial, the above website does not provide a QNN SDK newer than 2.22.6.
   - Below are public links to download various QNN versions. We hope they become publicly discoverable soon.
   - [QNN 2.26.0](https://softwarecenter.qualcomm.com/api/download/software/qualcomm_neural_processing_sdk/v2.26.0.240828.zip)

The directory of the installed Qualcomm AI Engine Direct SDK looks like:
```
├── benchmarks
├── bin
├── docs
├── examples
├── include
├── lib
├── LICENSE.pdf
├── NOTICE.txt
├── NOTICE_WINDOWS.txt
├── QNN_NOTICE.txt
├── QNN_README.txt
├── QNN_ReleaseNotes.txt
├── ReleaseNotes.txt
├── ReleaseNotesWindows.txt
├── sdk.yaml
└── share
```


## Setting up your developer environment

### Conventions

`$QNN_SDK_ROOT` refers to the root of the Qualcomm AI Engine Direct SDK,
i.e., the directory containing `QNN_README.txt`.

`$ANDROID_NDK_ROOT` refers to the root of the Android NDK.

`$EXECUTORCH_ROOT` refers to the root of the ExecuTorch git repository.

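For reference, a hypothetical setup might look like the below; every path here is a placeholder that you should adjust to where you actually unpacked each package.

```bash
# All paths below are placeholders; point them at your own installations.
export QNN_SDK_ROOT=/path/to/qnn-sdk
export ANDROID_NDK_ROOT=/path/to/android-ndk
export EXECUTORCH_ROOT=/path/to/executorch
```
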
### Setup environment variables

We set `LD_LIBRARY_PATH` to make sure the dynamic linker can find the QNN libraries.

Further, we set `PYTHONPATH` to make it easier to develop with and import the ExecuTorch
Python APIs.

```bash
export LD_LIBRARY_PATH=$QNN_SDK_ROOT/lib/x86_64-linux-clang/:$LD_LIBRARY_PATH
export PYTHONPATH=$EXECUTORCH_ROOT/..
```

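As a quick sanity check, you can confirm the QNN libraries are visible at that path; based on the SDK layout shown earlier, we expect the HTP backend library to be among them:

```bash
# libQnnHtp.so should be listed if the path is correct.
ls $QNN_SDK_ROOT/lib/x86_64-linux-clang/ | grep -i qnnhtp
```
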
## Build

An example script for the below building instructions is [here](https://github.com/pytorch/executorch/blob/main/backends/qualcomm/scripts/build.sh).
We recommend using the script because the ExecuTorch build commands can change from time to time.
The above script is actively used and is updated more frequently than this tutorial.
An example usage is:
```bash
cd $EXECUTORCH_ROOT
./backends/qualcomm/scripts/build.sh
# or
./backends/qualcomm/scripts/build.sh --release
```

### AOT (Ahead-of-time) components:

Python APIs on x64 are required to compile models to a Qualcomm AI Engine Direct binary.

```bash
cd $EXECUTORCH_ROOT
mkdir build-x86
cd build-x86
# Note that the below command might change.
# Please refer to the above build.sh for the latest workable commands.
cmake .. \
  -DCMAKE_INSTALL_PREFIX=$PWD \
  -DEXECUTORCH_BUILD_QNN=ON \
  -DQNN_SDK_ROOT=${QNN_SDK_ROOT} \
  -DEXECUTORCH_BUILD_DEVTOOLS=ON \
  -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
  -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON \
  -DEXECUTORCH_ENABLE_EVENT_TRACER=ON \
  -DPYTHON_EXECUTABLE=python3 \
  -DEXECUTORCH_SEPARATE_FLATCC_HOST_PROJECT=OFF

# nproc is used to detect the number of available CPUs.
# If it is not applicable, feel free to use the number you want.
cmake --build $PWD --target "PyQnnManagerAdaptor" "PyQnnWrapperAdaptor" -j$(nproc)

# Install the Python APIs to the correct import path.
# The filename might vary depending on your Python and host version.
cp -f backends/qualcomm/PyQnnManagerAdaptor.cpython-310-x86_64-linux-gnu.so $EXECUTORCH_ROOT/backends/qualcomm/python
cp -f backends/qualcomm/PyQnnWrapperAdaptor.cpython-310-x86_64-linux-gnu.so $EXECUTORCH_ROOT/backends/qualcomm/python

# Workaround for fbs files in exir/_serialize
cp $EXECUTORCH_ROOT/schema/program.fbs $EXECUTORCH_ROOT/exir/_serialize/program.fbs
cp $EXECUTORCH_ROOT/schema/scalar_type.fbs $EXECUTORCH_ROOT/exir/_serialize/scalar_type.fbs
```

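As an optional sanity check, you can confirm that the copied adaptor shared objects landed where the Python code expects to import them from:

```bash
# Both the PyQnnManagerAdaptor and PyQnnWrapperAdaptor .so files should be listed.
ls $EXECUTORCH_ROOT/backends/qualcomm/python/
```
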
### Runtime:

An example `qnn_executor_runner` executable is used to run the compiled `pte` model.

Commands to build `qnn_executor_runner` for Android:

```bash
cd $EXECUTORCH_ROOT
mkdir build-android
cd build-android
# build executorch & qnn_executorch_backend
cmake .. \
    -DCMAKE_INSTALL_PREFIX=$PWD \
    -DEXECUTORCH_BUILD_QNN=ON \
    -DQNN_SDK_ROOT=$QNN_SDK_ROOT \
    -DEXECUTORCH_BUILD_DEVTOOLS=ON \
    -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
    -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON \
    -DEXECUTORCH_ENABLE_EVENT_TRACER=ON \
    -DPYTHON_EXECUTABLE=python3 \
    -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK_ROOT/build/cmake/android.toolchain.cmake \
    -DANDROID_ABI='arm64-v8a' \
    -DANDROID_NATIVE_API_LEVEL=23

# nproc is used to detect the number of available CPUs.
# If it is not applicable, feel free to use the number you want.
cmake --build $PWD --target install -j$(nproc)

cmake ../examples/qualcomm \
    -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK_ROOT/build/cmake/android.toolchain.cmake \
    -DANDROID_ABI='arm64-v8a' \
    -DANDROID_NATIVE_API_LEVEL=23 \
    -DCMAKE_PREFIX_PATH="$PWD/lib/cmake/ExecuTorch;$PWD/third-party/gflags;" \
    -DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=BOTH \
    -DPYTHON_EXECUTABLE=python3 \
    -Bexamples/qualcomm

cmake --build examples/qualcomm -j$(nproc)

# qnn_executor_runner can be found under examples/qualcomm
# The full path is $EXECUTORCH_ROOT/build-android/examples/qualcomm/qnn_executor_runner
ls examples/qualcomm
```
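
As a quick check that the runner was cross-compiled for the device rather than the host, you can inspect it with `file`; the exact output location may vary slightly between versions, hence the `find`:

```bash
# Expect something like "ELF 64-bit LSB ... ARM aarch64" in the output.
find examples/qualcomm -name qnn_executor_runner -exec file {} \;
```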

**Note:** If you want to build for release, add `-DCMAKE_BUILD_TYPE=Release` to the `cmake` command options.


## Deploying and running on device

### AOT compile a model

Refer to [this script](https://github.com/pytorch/executorch/blob/main/examples/qualcomm/scripts/deeplab_v3.py) for the exact flow.
We use deeplab-v3-resnet101 as an example in this tutorial. Run the below commands to compile:

```bash
cd $EXECUTORCH_ROOT

python -m examples.qualcomm.scripts.deeplab_v3 -b build-android -m SM8550 --compile_only --download
```

You might see something like the below:

```
[INFO][Qnn ExecuTorch] Destroy Qnn context
[INFO][Qnn ExecuTorch] Destroy Qnn device
[INFO][Qnn ExecuTorch] Destroy Qnn backend

opcode         name                      target                       args                           kwargs
-------------  ------------------------  ---------------------------  -----------------------------  --------
placeholder    arg684_1                  arg684_1                     ()                             {}
get_attr       lowered_module_0          lowered_module_0             ()                             {}
call_function  executorch_call_delegate  executorch_call_delegate     (lowered_module_0, arg684_1)   {}
call_function  getitem                   <built-in function getitem>  (executorch_call_delegate, 0)  {}
call_function  getitem_1                 <built-in function getitem>  (executorch_call_delegate, 1)  {}
output         output                    output                       ([getitem_1, getitem],)        {}
```

The compiled model is `./deeplab_v3/dlv3_qnn.pte`.
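
Under the hood, the script follows the general ExecuTorch delegate flow: export the model, convert it to an Edge program, partition the graph to the QNN backend, and serialize the result to a `.pte` file. Below is a minimal, hypothetical sketch of that flow; the Qualcomm-specific partitioner and compiler-spec helpers are assumptions whose names vary between versions, so refer to the linked script for the real code.

```python
# A simplified sketch of the generic lowering flow, not the exact script.
import torch
from executorch.exir import to_edge

class TinyModel(torch.nn.Module):  # stand-in for deeplab-v3-resnet101
    def forward(self, x):
        return torch.nn.functional.relu(x)

example_inputs = (torch.randn(1, 3, 224, 224),)
exported = torch.export.export(TinyModel(), example_inputs)
edge = to_edge(exported)
# Hypothetical step: partition and delegate subgraphs to the QNN backend.
# The partitioner and its compiler specs live under backends/qualcomm and
# their exact names differ across ExecuTorch versions.
# edge = edge.to_backend(QnnPartitioner(compiler_specs))
exec_prog = edge.to_executorch()
with open("dlv3_qnn.pte", "wb") as f:
    f.write(exec_prog.buffer)
```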


### Test model inference on QNN HTP emulator

We can test model inference on the HTP emulator before deploying it to a device.

Let's build `qnn_executor_runner` for an x64 host:
```bash
# assuming the AOT components are built.
cd $EXECUTORCH_ROOT/build-x86
cmake ../examples/qualcomm \
  -DCMAKE_PREFIX_PATH="$PWD/lib/cmake/ExecuTorch;$PWD/third-party/gflags;" \
  -DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=BOTH \
  -DPYTHON_EXECUTABLE=python3 \
  -Bexamples/qualcomm

cmake --build examples/qualcomm -j$(nproc)

# qnn_executor_runner can be found under examples/qualcomm
# The full path is $EXECUTORCH_ROOT/build-x86/examples/qualcomm/qnn_executor_runner
ls examples/qualcomm/
```

To run the HTP emulator, the dynamic linker needs to access the QNN libraries and `libqnn_executorch_backend.so`.
We add the below two paths to the `LD_LIBRARY_PATH` environment variable:
  1. `$QNN_SDK_ROOT/lib/x86_64-linux-clang/`
  2. `$EXECUTORCH_ROOT/build-x86/lib/`

The first path is for the QNN libraries, including the HTP emulator. It was already configured in the AOT compilation section.

The second path is for `libqnn_executorch_backend.so`.

So, we can run `./deeplab_v3/dlv3_qnn.pte` by:
```bash
cd $EXECUTORCH_ROOT/build-x86
export LD_LIBRARY_PATH=$EXECUTORCH_ROOT/build-x86/lib/:$LD_LIBRARY_PATH
examples/qualcomm/qnn_executor_runner --model_path ../deeplab_v3/dlv3_qnn.pte
```

We should see some output like the below. Note that the emulator can take some time to finish.
```
I 00:00:00.354662 executorch:qnn_executor_runner.cpp:213] Method loaded.
I 00:00:00.356460 executorch:qnn_executor_runner.cpp:261] ignoring error from set_output_data_ptr(): 0x2
I 00:00:00.357991 executorch:qnn_executor_runner.cpp:261] ignoring error from set_output_data_ptr(): 0x2
I 00:00:00.357996 executorch:qnn_executor_runner.cpp:265] Inputs prepared.

I 00:01:09.328144 executorch:qnn_executor_runner.cpp:414] Model executed successfully.
I 00:01:09.328159 executorch:qnn_executor_runner.cpp:421] Write etdump to etdump.etdp, Size = 424
[INFO] [Qnn ExecuTorch]: Destroy Qnn backend parameters
[INFO] [Qnn ExecuTorch]: Destroy Qnn context
[INFO] [Qnn ExecuTorch]: Destroy Qnn device
[INFO] [Qnn ExecuTorch]: Destroy Qnn backend
```
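
The run also writes an `etdump.etdp` profiling artifact, because event tracing was enabled in the build. Below is a hedged sketch of inspecting it with the ExecuTorch devtools `Inspector`; the import path is our assumption for this revision and may differ in older releases:

```python
# Parse the etdump emitted by qnn_executor_runner and print per-event stats.
from executorch.devtools import Inspector

inspector = Inspector(etdump_path="etdump.etdp")
inspector.print_data_tabular()
```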

### Run model inference on an Android smartphone with Qualcomm SoCs

***Step 1***. We need to push the required QNN libraries to the device.

```bash
# make sure you have write permission on the below path.
DEVICE_DIR=/data/local/tmp/executorch_qualcomm_tutorial/
adb shell "mkdir -p ${DEVICE_DIR}"
adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnHtp.so ${DEVICE_DIR}
adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnSystem.so ${DEVICE_DIR}
adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnHtpV69Stub.so ${DEVICE_DIR}
adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnHtpV73Stub.so ${DEVICE_DIR}
adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnHtpV75Stub.so ${DEVICE_DIR}
adb push ${QNN_SDK_ROOT}/lib/hexagon-v69/unsigned/libQnnHtpV69Skel.so ${DEVICE_DIR}
adb push ${QNN_SDK_ROOT}/lib/hexagon-v73/unsigned/libQnnHtpV73Skel.so ${DEVICE_DIR}
adb push ${QNN_SDK_ROOT}/lib/hexagon-v75/unsigned/libQnnHtpV75Skel.so ${DEVICE_DIR}
```
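
Pushing every Stub/Skel variant, as above, is the safe default. If you know your SoC's Hexagon architecture version, only the matching pair (plus `libQnnHtp.so` and `libQnnSystem.so`) should be needed; to our understanding, SM8550 corresponds to HTP v73, for example:

```bash
# Hypothetical minimal set for SM8550 (HTP v73) only.
adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnHtpV73Stub.so ${DEVICE_DIR}
adb push ${QNN_SDK_ROOT}/lib/hexagon-v73/unsigned/libQnnHtpV73Skel.so ${DEVICE_DIR}
```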

***Step 2***. We also need to tell the dynamic linkers on Android and Hexagon
where to find these libraries, by setting `ADSP_LIBRARY_PATH` and `LD_LIBRARY_PATH`.
So, we can run `qnn_executor_runner` like:

```bash
adb push ./deeplab_v3/dlv3_qnn.pte ${DEVICE_DIR}
adb push ${EXECUTORCH_ROOT}/build-android/examples/qualcomm/executor_runner/qnn_executor_runner ${DEVICE_DIR}
adb push ${EXECUTORCH_ROOT}/build-android/lib/libqnn_executorch_backend.so ${DEVICE_DIR}
adb shell "cd ${DEVICE_DIR} \
           && export LD_LIBRARY_PATH=${DEVICE_DIR} \
           && export ADSP_LIBRARY_PATH=${DEVICE_DIR} \
           && ./qnn_executor_runner --model_path ./dlv3_qnn.pte"
```

You should see something like the below:

```
I 00:00:00.257354 executorch:qnn_executor_runner.cpp:213] Method loaded.
I 00:00:00.323502 executorch:qnn_executor_runner.cpp:262] ignoring error from set_output_data_ptr(): 0x2
I 00:00:00.357496 executorch:qnn_executor_runner.cpp:262] ignoring error from set_output_data_ptr(): 0x2
I 00:00:00.357555 executorch:qnn_executor_runner.cpp:265] Inputs prepared.
I 00:00:00.364824 executorch:qnn_executor_runner.cpp:414] Model executed successfully.
I 00:00:00.364875 executorch:qnn_executor_runner.cpp:425] Write etdump to etdump.etdp, Size = 424
[INFO] [Qnn ExecuTorch]: Destroy Qnn backend parameters
[INFO] [Qnn ExecuTorch]: Destroy Qnn context
[INFO] [Qnn ExecuTorch]: Destroy Qnn backend
```
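
Optionally, you can pull the profiling artifact back to the host for inspection; per the log above, it is written to the working directory on the device:

```bash
adb pull ${DEVICE_DIR}/etdump.etdp .
```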

The above command merely executes the model. If we want to feed real inputs and get model outputs, we can use:
```bash
cd $EXECUTORCH_ROOT
python -m examples.qualcomm.scripts.deeplab_v3 -b build-android -m SM8550 --download -s <device_serial>
```
The `<device_serial>` can be found by the `adb devices` command.

After the above command, pre-processed inputs and outputs are placed in the `$EXECUTORCH_ROOT/deeplab_v3` and `$EXECUTORCH_ROOT/deeplab_v3/outputs` folders.

The command-line arguments are written in [utils.py](https://github.com/pytorch/executorch/blob/main/examples/qualcomm/scripts/utils.py#L127).
The model, inputs, and output location are passed to `qnn_executor_runner` by `--model_path`, `--input_list_path`, and `--output_folder_path`.
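
For reference, a manual on-device invocation with explicit inputs and outputs might look like the below; the example script normally assembles this command for you, and the `input_list.txt` filename and `outputs` folder here are hypothetical:

```bash
adb shell "cd ${DEVICE_DIR} \
           && export LD_LIBRARY_PATH=${DEVICE_DIR} \
           && export ADSP_LIBRARY_PATH=${DEVICE_DIR} \
           && ./qnn_executor_runner --model_path ./dlv3_qnn.pte \
                --input_list_path ./input_list.txt \
                --output_folder_path ./outputs"
```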


### Running a model via ExecuTorch's android demo-app

An Android demo-app using the Qualcomm AI Engine Direct Backend can be found in
`examples`. Please refer to the android demo app [tutorial](https://pytorch.org/executorch/stable/demo-apps-android.html).

## Supported model list

Please refer to `$EXECUTORCH_ROOT/examples/qualcomm/scripts/` and `$EXECUTORCH_ROOT/examples/qualcomm/oss_scripts/` for the list of supported models.

## What is coming?

 - Improve the performance of llama3-8B-Instruct and support batch prefill.
 - We will support pre-compiled binaries from [Qualcomm AI Hub](https://aihub.qualcomm.com/).

## FAQ

If you encounter any issues while reproducing the tutorial, please file a GitHub
issue on the ExecuTorch repo and use the `#qcom_aisw` tag.
365