# Building and Running ExecuTorch with Qualcomm AI Engine Direct Backend

In this tutorial we will walk you through the process of getting started to
build ExecuTorch for Qualcomm AI Engine Direct and running a model on it.

Qualcomm AI Engine Direct is also referred to as QNN in the source and documentation.


<!----This will show a grid card on the page----->
::::{grid} 2
:::{grid-item-card} What you will learn in this tutorial:
:class-card: card-prerequisites
* In this tutorial you will learn how to lower and deploy a model for Qualcomm AI Engine Direct.
:::
:::{grid-item-card} Tutorials we recommend you complete before this:
:class-card: card-prerequisites
* [Introduction to ExecuTorch](intro-how-it-works.md)
* [Setting up ExecuTorch](getting-started-setup.md)
* [Building ExecuTorch with CMake](runtime-build-and-cross-compilation.md)
:::
::::


## What's Qualcomm AI Engine Direct?

[Qualcomm AI Engine Direct](https://developer.qualcomm.com/software/qualcomm-ai-engine-direct-sdk)
is designed to provide unified, low-level APIs for AI development.

Developers can interact with various accelerators on Qualcomm SoCs through this set of APIs, including
the Kryo CPU, the Adreno GPU, and Hexagon processors. More details can be found [here](https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/overview.html).

Currently, this ExecuTorch backend can delegate AI computations to Hexagon processors through Qualcomm AI Engine Direct APIs.


## Prerequisites (Hardware and Software)

### Host OS

At the time of updating this tutorial, the Linux host operating system that the QNN backend
is verified with is Ubuntu 22.04 LTS x64.
Usually, we verify the backend on the same OS version that QNN itself is
verified with. That version is documented in the QNN SDK.
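As a small convenience (not part of the official setup flow), the snippet below prints the host distribution and CPU architecture so you can compare them against the verified configuration:

```shell
# Print the host distribution and CPU architecture; compare them against
# the verified configuration (Ubuntu 22.04 LTS on x86_64).
# /etc/os-release is present on common Linux distributions.
grep -E '^(NAME|VERSION_ID)=' /etc/os-release
uname -m
```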

### Hardware:
You will need an adb-connected Android smartphone running one of the below Qualcomm SoCs:
 - SM8450 (Snapdragon 8 Gen 1)
 - SM8475 (Snapdragon 8 Gen 1+)
 - SM8550 (Snapdragon 8 Gen 2)
 - SM8650 (Snapdragon 8 Gen 3)

This example is verified with SM8550 and SM8450.

### Software:

 - Follow the ExecuTorch recommended Python version.
 - A compiler to compile the AOT parts, e.g., the GCC compiler that comes with Ubuntu LTS.
 - [Android NDK](https://developer.android.com/ndk). This example is verified with NDK 26c.
 - [Qualcomm AI Engine Direct SDK](https://developer.qualcomm.com/software/qualcomm-ai-engine-direct-sdk)
   - Click the "Get Software" button to download a version of the QNN SDK.
   - However, at the time of updating this tutorial, the above website does not provide a QNN SDK newer than 2.22.6.
   - Below are public links to download various QNN versions. We hope they will be publicly discoverable soon.
   - [QNN 2.26.0](https://softwarecenter.qualcomm.com/api/download/software/qualcomm_neural_processing_sdk/v2.26.0.240828.zip)

The directory with the installed Qualcomm AI Engine Direct SDK looks like:
```
├── benchmarks
├── bin
├── docs
├── examples
├── include
├── lib
├── LICENSE.pdf
├── NOTICE.txt
├── NOTICE_WINDOWS.txt
├── QNN_NOTICE.txt
├── QNN_README.txt
├── QNN_ReleaseNotes.txt
├── ReleaseNotes.txt
├── ReleaseNotesWindows.txt
├── sdk.yaml
└── share
```


## Setting up your developer environment

### Conventions

`$QNN_SDK_ROOT` refers to the root of the Qualcomm AI Engine Direct SDK,
i.e., the directory containing `QNN_README.txt`.

`$ANDROID_NDK_ROOT` refers to the root of the Android NDK.

`$EXECUTORCH_ROOT` refers to the root of the executorch git repository.

### Setup environment variables

We set `LD_LIBRARY_PATH` to make sure the dynamic linker can find QNN libraries.
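Before relying on these variables, it can help to sanity-check that the three roots defined in the conventions point at real installations. The `check_root` helper below is purely illustrative (it is not part of the ExecuTorch scripts), and the marker files are assumptions based on the layouts described in this tutorial:

```shell
# Illustrative helper (not part of ExecuTorch): verify that a convention
# variable points at a directory containing an expected marker file.
check_root() {
  name=$1 dir=$2 marker=$3
  if [ -e "$dir/$marker" ]; then
    echo "$name OK: $dir"
  else
    echo "warning: \$$name ($dir) does not contain $marker" >&2
    return 1
  fi
}

# Markers chosen from the layouts mentioned in this tutorial; warnings are
# non-fatal here so that a partial setup can still proceed.
check_root QNN_SDK_ROOT     "${QNN_SDK_ROOT:-unset}"     QNN_README.txt || true
check_root ANDROID_NDK_ROOT "${ANDROID_NDK_ROOT:-unset}" build/cmake/android.toolchain.cmake || true
check_root EXECUTORCH_ROOT  "${EXECUTORCH_ROOT:-unset}"  backends/qualcomm || true
```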

Further, we set `PYTHONPATH` to make it easier to develop with and import ExecuTorch
Python APIs.

```bash
export LD_LIBRARY_PATH=$QNN_SDK_ROOT/lib/x86_64-linux-clang/:$LD_LIBRARY_PATH
export PYTHONPATH=$EXECUTORCH_ROOT/..
```

## Build

An example script for the below build instructions is [here](https://github.com/pytorch/executorch/blob/main/backends/qualcomm/scripts/build.sh).
We recommend using the script because the ExecuTorch build commands can change from time to time.
The script is actively used and is updated more frequently than this tutorial.
An example usage is
```bash
cd $EXECUTORCH_ROOT
./backends/qualcomm/scripts/build.sh
# or
./backends/qualcomm/scripts/build.sh --release
```

### AOT (Ahead-of-time) components:

Python APIs on x64 are required to compile models to a Qualcomm AI Engine Direct binary.

```bash
cd $EXECUTORCH_ROOT
mkdir build-x86
cd build-x86
# Note that the below command might change.
# Please refer to the above build.sh for the latest workable commands.
cmake .. \
  -DCMAKE_INSTALL_PREFIX=$PWD \
  -DEXECUTORCH_BUILD_QNN=ON \
  -DQNN_SDK_ROOT=${QNN_SDK_ROOT} \
  -DEXECUTORCH_BUILD_DEVTOOLS=ON \
  -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
  -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON \
  -DEXECUTORCH_ENABLE_EVENT_TRACER=ON \
  -DPYTHON_EXECUTABLE=python3 \
  -DEXECUTORCH_SEPARATE_FLATCC_HOST_PROJECT=OFF

# nproc is used to detect the number of available CPUs.
# If it is not applicable, please feel free to use the number you want.
cmake --build $PWD --target "PyQnnManagerAdaptor" "PyQnnWrapperAdaptor" -j$(nproc)

# install Python APIs to the correct import path
# The filename might vary depending on your Python and host version.
cp -f backends/qualcomm/PyQnnManagerAdaptor.cpython-310-x86_64-linux-gnu.so $EXECUTORCH_ROOT/backends/qualcomm/python
cp -f backends/qualcomm/PyQnnWrapperAdaptor.cpython-310-x86_64-linux-gnu.so $EXECUTORCH_ROOT/backends/qualcomm/python

# Workaround for fbs files in exir/_serialize
cp $EXECUTORCH_ROOT/schema/program.fbs $EXECUTORCH_ROOT/exir/_serialize/program.fbs
cp $EXECUTORCH_ROOT/schema/scalar_type.fbs $EXECUTORCH_ROOT/exir/_serialize/scalar_type.fbs
```

### Runtime:

An example `qnn_executor_runner` executable is used to run the compiled `pte` model.

Commands to build `qnn_executor_runner` for Android:

```bash
cd $EXECUTORCH_ROOT
mkdir build-android
cd build-android
# build executorch & qnn_executorch_backend
cmake .. \
  -DCMAKE_INSTALL_PREFIX=$PWD \
  -DEXECUTORCH_BUILD_QNN=ON \
  -DQNN_SDK_ROOT=$QNN_SDK_ROOT \
  -DEXECUTORCH_BUILD_DEVTOOLS=ON \
  -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
  -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON \
  -DEXECUTORCH_ENABLE_EVENT_TRACER=ON \
  -DPYTHON_EXECUTABLE=python3 \
  -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK_ROOT/build/cmake/android.toolchain.cmake \
  -DANDROID_ABI='arm64-v8a' \
  -DANDROID_NATIVE_API_LEVEL=23

# nproc is used to detect the number of available CPUs.
# If it is not applicable, please feel free to use the number you want.
cmake --build $PWD --target install -j$(nproc)

cmake ../examples/qualcomm \
  -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK_ROOT/build/cmake/android.toolchain.cmake \
  -DANDROID_ABI='arm64-v8a' \
  -DANDROID_NATIVE_API_LEVEL=23 \
  -DCMAKE_PREFIX_PATH="$PWD/lib/cmake/ExecuTorch;$PWD/third-party/gflags;" \
  -DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=BOTH \
  -DPYTHON_EXECUTABLE=python3 \
  -Bexamples/qualcomm

cmake --build examples/qualcomm -j$(nproc)

# qnn_executor_runner can be found under examples/qualcomm
# The full path is $EXECUTORCH_ROOT/build-android/examples/qualcomm/qnn_executor_runner
ls examples/qualcomm
```

**Note:** If you want to build for release, add `-DCMAKE_BUILD_TYPE=Release` to the `cmake` command options.


## Deploying and running on device

### AOT compile a model

Refer to [this script](https://github.com/pytorch/executorch/blob/main/examples/qualcomm/scripts/deeplab_v3.py) for the exact flow.
We use deeplab-v3-resnet101 as an example in this tutorial.
Run the below commands to compile:

```bash
cd $EXECUTORCH_ROOT

python -m examples.qualcomm.scripts.deeplab_v3 -b build-android -m SM8550 --compile_only --download
```

You might see something like the below:

```
[INFO][Qnn ExecuTorch] Destroy Qnn context
[INFO][Qnn ExecuTorch] Destroy Qnn device
[INFO][Qnn ExecuTorch] Destroy Qnn backend

opcode         name                      target                       args                           kwargs
-------------  ------------------------  ---------------------------  -----------------------------  --------
placeholder    arg684_1                  arg684_1                     ()                             {}
get_attr       lowered_module_0          lowered_module_0             ()                             {}
call_function  executorch_call_delegate  executorch_call_delegate     (lowered_module_0, arg684_1)   {}
call_function  getitem                   <built-in function getitem>  (executorch_call_delegate, 0)  {}
call_function  getitem_1                 <built-in function getitem>  (executorch_call_delegate, 1)  {}
output         output                    output                       ([getitem_1, getitem],)        {}
```

The compiled model is `./deeplab_v3/dlv3_qnn.pte`.


### Test model inference on QNN HTP emulator

We can test model inference on the HTP emulator before deploying the model to a device.

Let's build `qnn_executor_runner` for an x64 host:
```bash
# assuming the AOT components are built.
cd $EXECUTORCH_ROOT/build-x86
cmake ../examples/qualcomm \
  -DCMAKE_PREFIX_PATH="$PWD/lib/cmake/ExecuTorch;$PWD/third-party/gflags;" \
  -DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=BOTH \
  -DPYTHON_EXECUTABLE=python3 \
  -Bexamples/qualcomm

cmake --build examples/qualcomm -j$(nproc)

# qnn_executor_runner can be found under examples/qualcomm
# The full path is $EXECUTORCH_ROOT/build-x86/examples/qualcomm/qnn_executor_runner
ls examples/qualcomm/
```

To run the HTP emulator, the dynamic linker needs access to the QNN libraries and to `libqnn_executorch_backend.so`.
We add the below two paths to the `LD_LIBRARY_PATH` environment variable:
 1. `$QNN_SDK_ROOT/lib/x86_64-linux-clang/`
 2.
`$EXECUTORCH_ROOT/build-x86/lib/`

The first path is for the QNN libraries, including the HTP emulator. It was already configured in the AOT compilation section.

The second path is for `libqnn_executorch_backend.so`.

So, we can run `./deeplab_v3/dlv3_qnn.pte` by:
```bash
cd $EXECUTORCH_ROOT/build-x86
export LD_LIBRARY_PATH=$EXECUTORCH_ROOT/build-x86/lib/:$LD_LIBRARY_PATH
examples/qualcomm/qnn_executor_runner --model_path ../deeplab_v3/dlv3_qnn.pte
```

We should see some output like the below. Note that the emulator can take some time to finish.
```bash
I 00:00:00.354662 executorch:qnn_executor_runner.cpp:213] Method loaded.
I 00:00:00.356460 executorch:qnn_executor_runner.cpp:261] ignoring error from set_output_data_ptr(): 0x2
I 00:00:00.357991 executorch:qnn_executor_runner.cpp:261] ignoring error from set_output_data_ptr(): 0x2
I 00:00:00.357996 executorch:qnn_executor_runner.cpp:265] Inputs prepared.

I 00:01:09.328144 executorch:qnn_executor_runner.cpp:414] Model executed successfully.
I 00:01:09.328159 executorch:qnn_executor_runner.cpp:421] Write etdump to etdump.etdp, Size = 424
[INFO] [Qnn ExecuTorch]: Destroy Qnn backend parameters
[INFO] [Qnn ExecuTorch]: Destroy Qnn context
[INFO] [Qnn ExecuTorch]: Destroy Qnn device
[INFO] [Qnn ExecuTorch]: Destroy Qnn backend
```

### Run model inference on an Android smartphone with Qualcomm SoCs

***Step 1***. We need to push the required QNN libraries to the device.

```bash
# make sure you have write permission on the below path.
DEVICE_DIR=/data/local/tmp/executorch_qualcomm_tutorial/
adb shell "mkdir -p ${DEVICE_DIR}"
adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnHtp.so ${DEVICE_DIR}
adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnSystem.so ${DEVICE_DIR}
adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnHtpV69Stub.so ${DEVICE_DIR}
adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnHtpV73Stub.so ${DEVICE_DIR}
adb push ${QNN_SDK_ROOT}/lib/aarch64-android/libQnnHtpV75Stub.so ${DEVICE_DIR}
adb push ${QNN_SDK_ROOT}/lib/hexagon-v69/unsigned/libQnnHtpV69Skel.so ${DEVICE_DIR}
adb push ${QNN_SDK_ROOT}/lib/hexagon-v73/unsigned/libQnnHtpV73Skel.so ${DEVICE_DIR}
adb push ${QNN_SDK_ROOT}/lib/hexagon-v75/unsigned/libQnnHtpV75Skel.so ${DEVICE_DIR}
```

***Step 2***. We also need to tell the dynamic linkers on Android and Hexagon
where to find these libraries by setting `ADSP_LIBRARY_PATH` and `LD_LIBRARY_PATH`.
Then, we can run `qnn_executor_runner` like

```bash
adb push ./deeplab_v3/dlv3_qnn.pte ${DEVICE_DIR}
adb push ${EXECUTORCH_ROOT}/build-android/examples/qualcomm/executor_runner/qnn_executor_runner ${DEVICE_DIR}
adb push ${EXECUTORCH_ROOT}/build-android/lib/libqnn_executorch_backend.so ${DEVICE_DIR}
adb shell "cd ${DEVICE_DIR} \
           && export LD_LIBRARY_PATH=${DEVICE_DIR} \
           && export ADSP_LIBRARY_PATH=${DEVICE_DIR} \
           && ./qnn_executor_runner --model_path ./dlv3_qnn.pte"
```

You should see something like the below:

```
I 00:00:00.257354 executorch:qnn_executor_runner.cpp:213] Method loaded.
I 00:00:00.323502 executorch:qnn_executor_runner.cpp:262] ignoring error from set_output_data_ptr(): 0x2
I 00:00:00.357496 executorch:qnn_executor_runner.cpp:262] ignoring error from set_output_data_ptr(): 0x2
I 00:00:00.357555 executorch:qnn_executor_runner.cpp:265] Inputs prepared.
I 00:00:00.364824 executorch:qnn_executor_runner.cpp:414] Model executed successfully.
I 00:00:00.364875 executorch:qnn_executor_runner.cpp:425] Write etdump to etdump.etdp, Size = 424
[INFO] [Qnn ExecuTorch]: Destroy Qnn backend parameters
[INFO] [Qnn ExecuTorch]: Destroy Qnn context
[INFO] [Qnn ExecuTorch]: Destroy Qnn backend
```

So far, the model is merely executed. If we want to feed real inputs and retrieve model outputs, we can use
```bash
cd $EXECUTORCH_ROOT
python -m examples.qualcomm.scripts.deeplab_v3 -b build-android -m SM8550 --download -s <device_serial>
```
The `<device_serial>` can be found with the `adb devices` command.

After the above command, pre-processed inputs and outputs are placed in the `$EXECUTORCH_ROOT/deeplab_v3` and `$EXECUTORCH_ROOT/deeplab_v3/outputs` folders.

The command-line arguments are written in [utils.py](https://github.com/pytorch/executorch/blob/main/examples/qualcomm/scripts/utils.py#L127).
The model, inputs, and output location are passed to `qnn_executor_runner` by `--model_path`, `--input_list_path`, and `--output_folder_path`.


### Running a model via ExecuTorch's android demo-app

An Android demo-app using the Qualcomm AI Engine Direct backend can be found in
`examples`. Please refer to the android demo app [tutorial](https://pytorch.org/executorch/stable/demo-apps-android.html).

## Supported model list

Please refer to `$EXECUTORCH_ROOT/examples/qualcomm/scripts/` and `$EXECUTORCH_ROOT/examples/qualcomm/oss_scripts/` for the list of supported models.

## What is coming?

 - Improve the performance for llama3-8B-Instruct and support batch prefill.
 - We will support pre-compiled binaries from [Qualcomm AI Hub](https://aihub.qualcomm.com/).

## FAQ

If you encounter any issues while reproducing the tutorial, please file a GitHub
issue on the ExecuTorch repo and use the `#qcom_aisw` tag.