# ExecuTorch Llama Android Demo App

**[UPDATE - 10/24]** We have added support for running quantized Llama 3.2 1B/3B models in demo apps on the [XNNPACK backend](https://github.com/pytorch/executorch/blob/main/examples/demo-apps/android/LlamaDemo/docs/delegates/xnnpack_README.md). We currently support inference with SpinQuant and QAT+LoRA quantization methods.

We’re excited to share that the newly revamped Android demo app is live, with many updates that provide a more intuitive and smoother user experience built around a chat use case! The primary goal of this app is to showcase how easily ExecuTorch can be integrated into an Android app and how to exercise the many features ExecuTorch and Llama models have to offer.

This app serves as a valuable resource to inspire your creativity and provide foundational code that you can customize and adapt for your particular use case.

Please dive in and start exploring our demo app today! We look forward to any feedback and are excited to see your innovative ideas.


## Key Concepts
From this demo app, you will learn many key concepts, including:
* How to prepare Llama models, build the ExecuTorch library, and run model inference across delegates
* How to expose the ExecuTorch library via a JNI layer
* The current ExecuTorch app-facing capabilities

The goal is for you to see the kind of support ExecuTorch provides and feel comfortable leveraging it for your own use cases.

## Supported Models
The models this app supports are (availability varies by delegate):
* Llama 3.2 Quantized 1B/3B
* Llama 3.2 1B/3B in BF16
* Llama Guard 3 1B
* Llama 3.1 8B
* Llama 3 8B
* Llama 2 7B
* LLaVA-1.5 vision model (XNNPACK only)


## Building the APK
First, note that ExecuTorch currently provides support across three delegates. Once you have identified the delegate of your choice, follow the corresponding README link for complete end-to-end instructions, from environment setup to exporting the models, building the ExecuTorch libraries, and running the app on device:

| Delegate      | Resource |
| ------------- | ------------- |
| XNNPACK (CPU-based library)  | [link](https://github.com/pytorch/executorch/blob/main/examples/demo-apps/android/LlamaDemo/docs/delegates/xnnpack_README.md) |
| QNN (Qualcomm AI Accelerators)  | [link](https://github.com/pytorch/executorch/blob/main/examples/demo-apps/android/LlamaDemo/docs/delegates/qualcomm_README.md) |
| MediaTek (MediaTek AI Accelerators)  | [link](https://github.com/pytorch/executorch/blob/main/examples/demo-apps/android/LlamaDemo/docs/delegates/mediatek_README.md)  |


## How to Use the App

This section covers the main steps to use the app, along with code snippets of the ExecuTorch API.

We recommend Android Studio for loading the app, development, and running it on device:
1. Open Android Studio and select "Open an existing Android Studio project" to open examples/demo-apps/android/LlamaDemo.
2. Run the app (^R). This builds and launches the app on the phone.

### Opening the App

Below are the UI features for the app.

Select the settings widget to get started with picking a model, its parameters, and any prompts.
<p align="center">
<img src="https://raw.githubusercontent.com/pytorch/executorch/refs/heads/main/docs/source/_static/img/opening_the_app_details.png" style="width:800px">
</p>



### Select Models and Parameters

Once you have selected the model, tokenizer, and model type, you are ready to click "Load Model" to have the app load the model and return to the main Chat activity.
<p align="center">
      <img src="https://raw.githubusercontent.com/pytorch/executorch/refs/heads/main/docs/source/_static/img/settings_menu.png" style="width:300px">
</p>



Optional Parameters:
* Temperature: Defaults to 0. You can adjust the temperature for the model as well; the model will reload upon any adjustment.
* System Prompt: Without any formatting, you can enter a system prompt, for example "you are a travel assistant" or "give me a response in a few sentences".
* User Prompt: Intended for more advanced users. If you would like to manually input a prompt, you can do so by modifying the `{{user prompt}}` placeholder, and you can modify the special tokens as well (see the prompt-format sketch after this list). Once changed, go back to the main Chat activity to send.
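
To make the special-token structure concrete, here is a minimal, hypothetical sketch of how a manually formatted prompt might be assembled. It follows the published Llama 3 Instruct chat format; it is not code from this app, and the helper name is illustrative only.

```java
// Hypothetical helper illustrating the Llama 3 Instruct chat format.
// The special tokens below come from the Llama 3 prompt specification,
// not from this app's source.
public static String formatLlama3Prompt(String systemPrompt, String userPrompt) {
  return "<|begin_of_text|>"
      + "<|start_header_id|>system<|end_header_id|>\n\n" + systemPrompt + "<|eot_id|>"
      + "<|start_header_id|>user<|end_header_id|>\n\n" + userPrompt + "<|eot_id|>"
      + "<|start_header_id|>assistant<|end_header_id|>\n\n";
}
```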

#### ExecuTorch App API

```java
// Upon returning to the Main Chat Activity
mModule = new LlamaModule(
    ModelUtils.getModelCategory(mCurrentSettingsFields.getModelType()),
    modelPath,
    tokenizerPath,
    temperature);
int loadResult = mModule.load();
```

* `modelCategory`: Indicates whether it is a text-only or vision model
* `modelPath`: path to the .pte model file
* `tokenizerPath`: path to the tokenizer .bin file
* `temperature`: model parameter that adjusts the randomness of the model’s output
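
`load()` returns a status code that is worth checking before generating. A minimal sketch, assuming (as is conventional in this API) that a non-zero result indicates failure; the log tag is illustrative and `android.util.Log` is used for output:

```java
int loadResult = mModule.load();
if (loadResult != 0) {
  // Surface the failure instead of attempting to generate with an unloaded model.
  Log.e("LlamaDemo", "Model load failed with status " + loadResult);
}
```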


### User Prompt
Once the model has loaded successfully, enter any prompt and click the send (i.e. generate) button to send it to the model.
<p align="center">
<img src="https://raw.githubusercontent.com/pytorch/executorch/refs/heads/main/docs/source/_static/img/load_complete_and_start_prompt.png" style="width:300px">
</p>

You can ask follow-up questions as well.
<p align="center">
<img src="https://raw.githubusercontent.com/pytorch/executorch/refs/heads/main/docs/source/_static/img/chat.png" style="width:300px">
</p>

#### ExecuTorch App API

```java
mModule.generate(prompt, sequence_length, MainActivity.this);
```
* `prompt`: User-formatted prompt
* `sequence_length`: Number of tokens to generate in response to a prompt
* `MainActivity.this`: Indicates that the callback functions (onResult(), onStats()) are implemented in this class.
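
Because generation can take a while, a typical Android pattern is to run it off the UI thread so the app stays responsive. The sketch below is illustrative, not the app's exact code; the executor field and variable names are assumptions:

```java
// Illustrative sketch: run generation on a background thread.
// ExecutorService and Executors are from java.util.concurrent.
ExecutorService executor = Executors.newSingleThreadExecutor();
executor.execute(() -> mModule.generate(prompt, sequence_length, MainActivity.this));
```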

[*LLaVA-1.5: XNNPACK delegate only*]

For LLaVA-1.5, select the exported LLaVA .pte file and its tokenizer in the Settings menu and load the model. After this, you can send an image from your gallery, or take a live picture, along with a text prompt to the model.

<p align="center">
<img src="https://raw.githubusercontent.com/pytorch/executorch/refs/heads/main/docs/source/_static/img/llava_example.png" style="width:300px">
</p>


### Output Generated
To show completion of the follow-up question, below is the complete, detailed response from the model.
<p align="center">
<img src="https://raw.githubusercontent.com/pytorch/executorch/refs/heads/main/docs/source/_static/img/chat_response.png" style="width:300px">
</p>

#### ExecuTorch App API

Ensure the callback class you passed to `mModule.generate()` implements the following functions. In this example, that class is `MainActivity`.
```java
@Override
public void onResult(String result) {
  // result contains the next token of the response.
  // onResult will continue to be invoked until the response is complete.
}

@Override
public void onStats(float tps) {
  // tps (tokens per second) stats are provided by the framework.
}
```
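
Since `onResult()` delivers one token at a time and may be invoked from a background thread, a common pattern is to accumulate tokens and post UI updates to the main thread. A minimal sketch follows; the `StringBuilder` field and `mResponseTextView` are assumptions for illustration, not the app's actual members:

```java
private final StringBuilder mResponse = new StringBuilder();

@Override
public void onResult(String result) {
  mResponse.append(result);
  // runOnUiThread is an Activity method; views must be updated on the main thread.
  runOnUiThread(() -> mResponseTextView.setText(mResponse.toString()));
}
```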

## Reporting Issues
If you encounter any bugs or issues while following this tutorial, please file an issue on [GitHub](https://github.com/pytorch/executorch/issues/new).