# Summary
This example demonstrates how to run [Llama models](https://www.llama.com/) on mobile via ExecuTorch. We use XNNPACK to accelerate performance and 4-bit groupwise quantization to fit the model on a phone.

Supported models:

- Llama 3.2 1B and 3B
- Llama 3.2 Quantized 1B and 3B
- Llama 3.1 8B
- Llama 3 8B
- [Llama 2 7B](../llama2/README.md)

Pretrained models are not included in this repo. We suggest downloading them [here](https://ai.meta.com/resources/models-and-libraries/llama-downloads/).

This page contains the basic recipe for running Llama. See the [Llama utils page](./UTILS.md) for more advanced use cases such as fine-tuning and running smaller models for educational purposes.

# What is Llama?
Llama is a collection of large language models trained on publicly available data. These models are based on the transformer architecture, which allows them to process input sequences of arbitrary length and generate output sequences of variable length. One of the key features of Llama models is their ability to generate coherent and contextually relevant text. This is achieved through the use of attention mechanisms, which allow the model to focus on different parts of the input sequence as it generates output. Additionally, Llama models are pre-trained on a large corpus of text with a language-modeling objective, which helps them learn to predict the next word in a sequence.

Llama models have been shown to perform well on a variety of natural language processing tasks, including language translation, question answering, and text summarization. They are also capable of generating human-like text, which makes them useful for creative writing and other applications where natural language generation is important.

Overall, Llama models are powerful and versatile language models that can be used for a wide range of natural language processing tasks. Their ability to generate coherent and contextually relevant text makes them particularly useful for applications such as chatbots, virtual assistants, and language translation.

Please note that the models are subject to the [Llama 2 Acceptable Use Policy](https://github.com/facebookresearch/llama/blob/main/USE_POLICY.md), [Llama 3 Acceptable Use Policy](https://github.com/meta-llama/llama3/blob/main/USE_POLICY.md) and [Responsible Use Guide](https://ai.meta.com/static-resource/responsible-use-guide/).


# Results

## Llama 3.2 1B/3B and quantized 1B/3B models

For the Llama 3.2 1B/3B models, we support both the original BF16 format and 4-bit quantization, using SpinQuant and QAT+LoRA, for enhanced performance.

The quantized models were optimized primarily for Arm CPU architectures by leveraging XNNPACK and the KleidiAI library. Work is underway to enable quantization on mobile accelerators for Llama 1B/3B.

### Enablement

We have successfully verified performance on the following devices: iPhone 15 Pro, iPhone 15 Pro Max, Samsung Galaxy S24+, Samsung Galaxy S22, and OnePlus 12 (featuring 16GB RAM).

Note that the Llama 3.2 3B unquantized BF16 model was only tested on the OnePlus 12, which has sufficient memory (16GB RAM) to support its size requirements.

### Quantization

The 1B/3B models are sensitive to accuracy loss when regular post-training quantization (PTQ) is applied. To balance accuracy, performance, and memory, we used 4-bit quantization with the [SpinQuant](https://github.com/facebookresearch/SpinQuant/tree/main) and QAT+LoRA methods.

Our quantization scheme involves three parts, applicable to both methods:

- We quantize all linear layers in all transformer blocks with a 4-bit groupwise scheme (group size 32) for weights and 8-bit per-token dynamic quantization for activations.
- The classification layer is quantized to 8-bit per-channel for weights and 8-bit per-token dynamic quantization for activations.
- We employ 8-bit per-channel quantization for the embedding.

We use [torchao](https://github.com/pytorch/ao) library APIs to define these schemes.
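
For illustration only, the sketch below shows how the linear-layer part of this scheme can be expressed with torchao; it assumes a recent torchao release that provides `quantize_` and `int8_dynamic_activation_int4_weight`, and it is not the exact code path used by the export script (the classification-layer and embedding schemes are applied through separate export options shown later on this page).

```
from torch import nn
from torchao.quantization import quantize_, int8_dynamic_activation_int4_weight

# Toy stand-in for the linear layers inside the transformer blocks.
model = nn.Sequential(nn.Linear(64, 64), nn.Linear(64, 64))

# 8-bit per-token dynamic activations + 4-bit groupwise weights (group size 32).
quantize_(model, int8_dynamic_activation_int4_weight(group_size=32))
```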

#### SpinQuant

The SpinQuant method takes the original weights and produces optimized quantized weights with minimal outliers, resulting in higher accuracy. This is achieved without any fine-tuning of the weights and requires only 100 iterations on a single A100 node.

SpinQuant can generate quantized weights that are [compatible with ExecuTorch](https://github.com/facebookresearch/SpinQuant/tree/main?tab=readme-ov-file#3-export-to-executorch); specifically, they can be integrated with the existing optimized XNNPACK kernels (e.g., groupwise 4-bit weights and 8-bit dynamic activations). This allows developers to benefit from the higher accuracy of SpinQuant while also taking advantage of the strong performance of ExecuTorch acceleration.
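
The core intuition can be sketched in a few lines of PyTorch: multiplying the weights by an orthogonal rotation leaves the layer output unchanged (as long as the activations are rotated the same way) while spreading out outliers, so the rotated weights quantize with less error. This is only a toy illustration of the idea; the real SpinQuant pipeline learns its rotations and handles the full model as described in the linked repository.

```
import torch

torch.manual_seed(0)
W = torch.randn(256, 256)
W[0, 0] = 50.0                      # inject an outlier that hurts quantization
x = torch.randn(4, 256)

# Random orthogonal rotation (Q factor of a Gaussian matrix).
R, _ = torch.linalg.qr(torch.randn(256, 256))

def quant_error(w):
    # Symmetric 4-bit per-tensor round-trip error, for illustration only.
    scale = w.abs().max() / 7.0
    w_q = torch.clamp(torch.round(w / scale), -8, 7) * scale
    return (w - w_q).pow(2).mean()

# The rotated computation is mathematically equivalent...
print(torch.allclose(x @ W.T, (x @ R) @ (W @ R).T, atol=1e-3))
# ...but the rotated weights typically quantize with much lower error.
print(quant_error(W).item(), quant_error(W @ R).item())
```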

#### Quantization-Aware Training and LoRA (QAT+LoRA)

Quantization-Aware Training (QAT) is employed to simulate the effects of quantization during the training of Llama 3.2 models, enabling optimization of their performance in low-precision environments. To initialize QAT, BF16 Llama 3.2 model checkpoints obtained after supervised fine-tuning (SFT) are used, and an additional full round of SFT training with QAT is performed. The backbone of the QAT model is then frozen and another round of SFT is performed with low-rank adaptation (LoRA) adaptors applied to all layers within the transformer block. Meanwhile, the LoRA adaptors' weights and activations are maintained in BF16.
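
Conceptually, QAT inserts "fake quantization" into the forward pass so the network learns weights that survive low-precision rounding, while gradients flow to the full-precision weights through a straight-through estimator. The snippet below is a minimal sketch of that mechanism, not Meta's actual training recipe.

```
import torch

def fake_quantize(x, n_bits=4):
    # Simulate symmetric n-bit rounding in the forward pass.
    qmax = 2 ** (n_bits - 1) - 1
    scale = x.detach().abs().max().clamp(min=1e-9) / qmax
    x_q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
    # Straight-through estimator: forward uses x_q, backward sees the identity.
    return x + (x_q - x).detach()

w = torch.randn(16, 16, requires_grad=True)
x = torch.randn(8, 16)
y = x @ fake_quantize(w).T   # a training step would compute a loss on y
y.sum().backward()           # gradients still reach the full-precision w
print(w.grad.shape)
```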

### Accuracy

Please see the [Llama 3.2 model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD.md) for accuracy evaluations.

### Performance

Llama 3.2 1B and 3B performance was measured on an Android OnePlus 12 device. The performance measurement is expressed in terms of tokens per second using an [adb binary-based approach](#step-4-run-benchmark-on-android-phone) with a prompt length of 64, and was measured with the KleidiAI library. KleidiAI is not yet enabled by default; use `-DEXECUTORCH_XNNPACK_ENABLE_KLEIDI=ON` to enable it in the build.

|Model  | Decode (tokens/s) | Time-to-first-token (sec) | Prefill (tokens/s) | Model size (PTE file size in MiB) | Memory size (RSS in MiB) |
|-------|------------------:|--------------------------:| ------------------:|----------------------------------:| ------------------------:|
|1B BF16 (baseline) | 19.2 |  1.0 | 60.3  | 2,358 | 3,185 |
|1B SpinQuant | 50.2 (2.6x) | 0.3 (-76.9%) | 260.5 (4.3x) | 1,083 (-54.1%)  | 1,921 (-39.7%) |
|1B QAT+LoRA | 45.8 (2.4x) | 0.3 (-76.0%)  | 252.0 (4.2x) | 1,127 (-52.2%)  | 2,255 (-29.2%) |
|3B BF16 (baseline) | 7.6  | 3.0 | 21.2 | 6,129 | 7,419 |
|3B SpinQuant | 19.7 (2.6x) | 0.7 (-76.4%) | 89.7 (4.2x) | 2,435 (-60.3%) | 3,726 (-49.8%) |
|3B QAT+LoRA | 18.5 (2.4x) | 0.7 (-76.1%) | 88.8 (4.2x) | 2,529 (-58.7%) | 4,060 (-45.3%) |


<table>
  <tr>
    <td>
        <img src="./Android3_2_1B_bf16.gif" width="300">
        <br>
        <em> Llama3.2 1B, unquantized, BF16 on Android phone. </em>
    </td>
    <td>
      <img src="./Android3_2_3B_SpinQuant.gif" width="300">
      <br>
      <em>
      Llama3.2 3B, 4bit quantized (SpinQuant) on Android phone
      </em>
    </td>
  </tr>
</table>

## Llama 3/3.1 8B
Since the Llama 3 8B model needs at least 4-bit quantization to fit even on some high-end phones, the results presented here correspond to the 4-bit groupwise post-training quantized (PTQ) model.

### Enablement

For Llama 3 8B and Llama 3.1 8B, we have so far verified 4-bit quantized models on the iPhone 15 Pro, iPhone 15 Pro Max, Samsung Galaxy S24+, and OnePlus 12 (with 16GB RAM).

### Quantization

We employed PTQ 4-bit groupwise weight quantization with per-token dynamic activation quantization for all the linear layers of the model. Dynamic quantization refers to quantizing activations dynamically, such that the quantization parameters for activations are calculated from the min/max range at runtime. Here we quantized activations to 8 bits (signed integer). Furthermore, weights are statically quantized; in our case, weights were per-channel groupwise quantized with 4-bit signed integers. Due to Llama 3's vocabulary size, we had to quantize the embedding lookup table as well. For these results, the embedding lookup table was groupwise quantized to 4 bits with a group size of 32.
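
To make the terminology concrete, the sketch below shows in plain PyTorch how per-token dynamic 8-bit activation scales and 4-bit groupwise weight scales are computed; it illustrates the arithmetic only and is not the kernel code ExecuTorch actually runs.

```
import torch

def quantize_activations_per_token(x):
    # x: (..., hidden_dim). Scales are derived at runtime from each token's range.
    scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-9) / 127.0
    q = torch.clamp(torch.round(x / scale), -128, 127).to(torch.int8)
    return q, scale

def quantize_weights_groupwise_4bit(w, group_size=32):
    # w: (out_features, in_features). Each group of `group_size` input channels
    # shares one scale; values are stored as signed 4-bit integers in [-8, 7].
    out_f, in_f = w.shape
    wg = w.reshape(out_f, in_f // group_size, group_size)
    scale = wg.abs().amax(dim=-1, keepdim=True).clamp(min=1e-9) / 7.0
    q = torch.clamp(torch.round(wg / scale), -8, 7).to(torch.int8)
    return q.reshape(out_f, in_f), scale.squeeze(-1)

acts_q, acts_scale = quantize_activations_per_token(torch.randn(2, 8, 4096))
w_q, w_scale = quantize_weights_groupwise_4bit(torch.randn(4096, 4096))
```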

We use [torchao](https://github.com/pytorch/ao) library APIs to define these schemes.

### Accuracy

We evaluated WikiText perplexity using [LM Eval](https://github.com/EleutherAI/lm-evaluation-harness). Below are the results for two different group sizes, with max_seq_length 2048 and limit 1000.

|Model | Baseline (FP32) | Groupwise 4-bit (128) | Groupwise 4-bit (256) |
|--------|-----------------|-----------------------|-----------------------|
|Llama 3 8B | 7.9 | 9.4 | 9.7 |

Please note that LM Eval reports perplexity normalized by word count instead of token count. You may therefore see different WikiText perplexity numbers from other sources if they normalize differently. More details can be found [here](https://github.com/EleutherAI/lm-evaluation-harness/issues/2301).
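
Both numbers come from the same total log-likelihood; only the normalizer differs, as the sketch below illustrates with made-up counts.

```
import math

# Hypothetical numbers for illustration: total negative log-likelihood of a
# corpus, plus its token and word counts.
total_nll = 120000.0
num_tokens = 60000
num_words = 45000

token_perplexity = math.exp(total_nll / num_tokens)  # normalized by tokens
word_perplexity = math.exp(total_nll / num_words)    # what LM Eval reports
print(token_perplexity, word_perplexity)
```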

### Performance

Llama 3 8B performance was measured on the Samsung Galaxy S22, S24, and OnePlus 12 devices. The performance measurement is expressed in terms of tokens per second using an [adb binary-based approach](#step-4-run-benchmark-on-android-phone).

|Device  | Groupwise 4-bit (128) | Groupwise 4-bit (256) |
|--------|-----------------------|-----------------------|
|Galaxy S22  | 7.85 tokens/second | 8.4 tokens/second |
|Galaxy S24 | 10.91 tokens/second | 11.21 tokens/second |
|OnePlus 12 | 10.85 tokens/second | 11.02 tokens/second |

<p align="center">
      <br>
      <img src="./llama_via_xnnpack.gif" width=300>
      <br>
      <em>
      Llama3.1 8B, 4bit quantized on Android phone
      </em>
</p>

[Please visit this section to try it on non-CPU backends, including CoreML, MPS, Qualcomm HTP, or MediaTek](non_cpu_backends.md).

# Instructions

## Tested on

- macOS (M1/M2), Linux.
- For Llama 3 8B, your device may require at least 32GB of RAM. If this is a constraint for you, please try the [smaller stories model](./UTILS.md).

## Step 1: Setup
> :warning: **Double-check your Python environment**: make sure `conda activate <VENV>` is run before any of the bash and Python scripts.

1. Follow the [tutorial](https://pytorch.org/executorch/main/getting-started-setup) to set up ExecuTorch. For installation, run `./install_requirements.sh --pybind xnnpack`.
2. Run `examples/models/llama/install_requirements.sh` to install a few dependencies.


## Step 2: Prepare model

### Option A: Download and export Llama 3.2 1B/3B model

1. Download `consolidated.00.pth`, `params.json`, and `tokenizer.model` from the [Llama website](https://www.llama.com/llama-downloads/) or [Hugging Face](https://huggingface.co/meta-llama/Llama-3.2-1B). For chat use cases, download the instruct models.

2. Export the model and generate a `.pte` file.

- To use the **original BF16** version, without any quantization:
```
# No quantization
# Set these paths to point to the downloaded files
LLAMA_CHECKPOINT=path/to/checkpoint.pth
LLAMA_PARAMS=path/to/params.json

python -m examples.models.llama.export_llama \
  --checkpoint "${LLAMA_CHECKPOINT:?}" \
  --params "${LLAMA_PARAMS:?}" \
  -kv \
  --use_sdpa_with_kv_cache \
  -X \
  -d bf16 \
  --metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}' \
  --output_name="llama3_2.pte"
```

- To use **SpinQuant**, there are two options:
    - Download directly from the [Llama website](https://www.llama.com/llama-downloads). The model weights are prequantized and can be exported to a `pte` file directly.
    - Follow the SpinQuant [instructions](https://github.com/facebookresearch/SpinQuant/tree/main?tab=readme-ov-file#3-export-to-executorch) to produce an ExecuTorch-compatible checkpoint, then export that checkpoint with the command below.

```
# SpinQuant
# Set these paths to point to the exported files
LLAMA_QUANTIZED_CHECKPOINT=path/to/spinquant/checkpoint.pth
LLAMA_PARAMS=path/to/spinquant/params.json

python -m examples.models.llama.export_llama \
   --checkpoint "${LLAMA_QUANTIZED_CHECKPOINT:?}" \
   --params "${LLAMA_PARAMS:?}" \
   --use_sdpa_with_kv_cache \
   -X \
   --xnnpack-extended-ops \
   --preq_mode 8da4w_output_8da8w \
   --preq_group_size 32 \
   --max_seq_length 2048 \
   --output_name "llama3_2.pte" \
   -kv \
   -d fp32 \
   --preq_embedding_quantize 8,0 \
   --use_spin_quant native \
   --metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}'
```

- To use **QAT+LoRA**, download directly from the [Llama website](https://www.llama.com/llama-downloads). The model weights are prequantized and can be exported to a `pte` file directly by:

```
# QAT+LoRA
# Set these paths to point to the exported files
LLAMA_QUANTIZED_CHECKPOINT=path/to/qlora/checkpoint.pth
LLAMA_PARAMS=path/to/qlora/params.json

python -m examples.models.llama.export_llama \
   --checkpoint "${LLAMA_QUANTIZED_CHECKPOINT:?}" \
   --params "${LLAMA_PARAMS:?}" \
   -qat \
   -lora 16 \
   --preq_mode 8da4w_output_8da8w \
   --preq_group_size 32 \
   --preq_embedding_quantize 8,0 \
   --use_sdpa_with_kv_cache \
   -kv \
   -X \
   --xnnpack-extended-ops \
   -d fp32 \
   --max_seq_length 2048 \
   --output_name "llama3_2.pte" \
   --metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}'
```

### Option B: Download and export Llama 3 8B instruct model

You can export and run the original Llama 3 8B instruct model.

1. Llama 3 pretrained parameters can be downloaded from [Meta's official Llama 3 repository](https://github.com/meta-llama/llama3/).

2. Export the model and generate a `.pte` file.
    ```
    python -m examples.models.llama.export_llama \
        --checkpoint <consolidated.00.pth> \
        -p <params.json> \
        -kv \
        --use_sdpa_with_kv_cache \
        -X \
        -qmode 8da4w \
        --group_size 128 \
        -d fp32 \
        --metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}' \
        --embedding-quantize 4,32 \
        --output_name="llama3_kv_sdpa_xnn_qe_4_32.pte"
    ```
    Due to the larger vocabulary size of Llama 3, we recommend quantizing the embeddings with `--embedding-quantize 4,32` as shown above to further reduce the model size.


    If you're interested in deploying on non-CPU backends, [please refer to the non-CPU backends section](non_cpu_backends.md).

## Step 3: Run on your computer to validate

1. Build ExecuTorch with optimized CPU performance as follows. Build options are available [here](https://github.com/pytorch/executorch/blob/main/CMakeLists.txt#L59).
    ```
    cmake -DPYTHON_EXECUTABLE=python \
        -DCMAKE_INSTALL_PREFIX=cmake-out \
        -DEXECUTORCH_ENABLE_LOGGING=1 \
        -DCMAKE_BUILD_TYPE=Release \
        -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
        -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
        -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON \
        -DEXECUTORCH_BUILD_XNNPACK=ON \
        -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
        -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
        -DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
        -Bcmake-out .

    cmake --build cmake-out -j16 --target install --config Release
    ```
    Note for Mac users: there is a known linking issue with Xcode 15.1. Refer to the Common Issues and Mitigations section below for solutions.

2. Build the llama runner.
    ```
    cmake -DPYTHON_EXECUTABLE=python \
        -DCMAKE_INSTALL_PREFIX=cmake-out \
        -DCMAKE_BUILD_TYPE=Release \
        -DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
        -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
        -DEXECUTORCH_BUILD_XNNPACK=ON \
        -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
        -Bcmake-out/examples/models/llama \
        examples/models/llama

    cmake --build cmake-out/examples/models/llama -j16 --config Release
    ```

3. Run the model. Run options are available [here](https://github.com/pytorch/executorch/blob/main/examples/models/llama/main.cpp#L18-L40).
    ```
    cmake-out/examples/models/llama/llama_main --model_path=<model pte file> --tokenizer_path=<tokenizer.model> --prompt=<prompt>
    ```

To build for the CoreML backend and validate on Mac, replace `-DEXECUTORCH_BUILD_XNNPACK=ON` with `-DEXECUTORCH_BUILD_COREML=ON`.

## Step 4: Run benchmark on Android phone

**1. Build llama runner binary for Android**

*Prerequisite*: Android NDK (tested with r27b), which can be downloaded from [here](https://developer.android.com/ndk/downloads). Note that on macOS the NDK download can be unpacked and the NDK folder located inside it.

**1.1 Set Android NDK**
```
export ANDROID_NDK=<path-to-android-ndk>
```
**1.2 Build ExecuTorch and associated libraries for Android**
```
cmake -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
    -DANDROID_ABI=arm64-v8a \
    -DANDROID_PLATFORM=android-23 \
    -DCMAKE_INSTALL_PREFIX=cmake-out-android \
    -DCMAKE_BUILD_TYPE=Release \
    -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
    -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
    -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON \
    -DEXECUTORCH_ENABLE_LOGGING=1 \
    -DPYTHON_EXECUTABLE=python \
    -DEXECUTORCH_BUILD_XNNPACK=ON \
    -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
    -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
    -DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
    -Bcmake-out-android .

cmake --build cmake-out-android -j16 --target install --config Release
```

**1.3 Build llama runner for Android**
```
cmake -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
    -DANDROID_ABI=arm64-v8a \
    -DANDROID_PLATFORM=android-23 \
    -DCMAKE_INSTALL_PREFIX=cmake-out-android \
    -DCMAKE_BUILD_TYPE=Release \
    -DPYTHON_EXECUTABLE=python \
    -DEXECUTORCH_BUILD_XNNPACK=ON \
    -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
    -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
    -DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
    -Bcmake-out-android/examples/models/llama \
    examples/models/llama

cmake --build cmake-out-android/examples/models/llama -j16 --config Release
```

**2. Run on Android via adb shell**

*Prerequisite*: Make sure USB debugging is enabled via the developer options on your phone.

**2.1 Connect your Android phone**

**2.2 Upload the model, tokenizer, and llama runner binary to the phone**
```
adb shell mkdir -p /data/local/tmp/llama
adb push <model.pte> /data/local/tmp/llama/
adb push <tokenizer.model> /data/local/tmp/llama/
adb push cmake-out-android/examples/models/llama/llama_main /data/local/tmp/llama/
```

**2.3 Run the model**
```
adb shell "cd /data/local/tmp/llama && ./llama_main --model_path <model.pte> --tokenizer_path <tokenizer.model> --prompt \"What is the capital of France?\" --seq_len 120 --warmup=1"
```
## Step 5: Build mobile apps

### iOS

Please refer to [this tutorial](https://pytorch.org/executorch/main/llm/llama-demo-ios.html) for full instructions on building the iOS LLAMA Demo App. Rename the `tokenizer.model` file to `tokenizer.bin`, because the demo app looks for a tokenizer file with the .bin extension.

### Android
Please refer to [this tutorial](https://pytorch.org/executorch/main/llm/llama-demo-android.html) for full instructions on building the Android LLAMA Demo App.


## Utility tools for Llama enablement

### Evaluate model accuracy

> Forewarning: Model evaluation without a GPU may take a long time, especially on larger models.

We use [LM Eval](https://github.com/EleutherAI/lm-evaluation-harness) to evaluate model accuracy.

For base models, use the following example command to calculate perplexity on WikiText.
```
python -m examples.models.llama.eval_llama \
    -c <checkpoint.pth> \
    -p <params.json> \
    -t <tokenizer.model/bin> \
    -kv \
    -d <checkpoint dtype> \
    --max_seq_len <max sequence length> \
    --limit <number of samples>
```

For instruct models, use the following example command to calculate the MMLU score.
```
python -m examples.models.llama.eval_llama \
    -c <checkpoint.pth> \
    -p <params.json> \
    -t <tokenizer.model/bin> \
    -kv \
    -d <checkpoint dtype> \
    --tasks mmlu \
    --num_fewshot 5 \
    --max_seq_len <max sequence length>
```

See the [Llama utils page](./UTILS.md) for more advanced use cases such as fine-tuning, running smaller models for educational purposes, and quick iteration and verification.

# What is coming next?
## Quantization
- Enabling the FP16 model to leverage a smaller group size for 4-bit quantization.
- Enabling GPTQ for 4-bit groupwise quantization.
- Enabling custom quantization.
- Lower-bit quantization.
## Models
- Enabling more generative AI models and architectures.
## Performance
- Performance improvements via techniques such as speculative decoding.
- Enabling Llama and other architectures via Vulkan.
- Enabling performant execution of widely used quantization schemes.

# Notes
This example tries to reuse the original Python code, with minimal modifications to make it compatible with current ExecuTorch:
1. Since ExecuTorch does not support the complex Tensor data type, we use customized functions to compute rotary embeddings with real numbers (see the sketch after this list). Please see [GitHub issue: Support complex data type in ExecuTorch](https://github.com/pytorch/executorch/issues/886).
2. No CUDA. ExecuTorch is focused on edge use cases where CUDA is not available on most edge devices.
3. No dependency on fairscale. ColumnParallelLinear, ParallelEmbedding, and training are not needed and are not supported in ExecuTorch.
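
As an illustration of the first point, the sketch below shows the standard real-valued rotary-embedding computation using precomputed cos/sin tables over interleaved dimension pairs; it is mathematically equivalent to the complex-number formulation but is only a sketch, not the exact code used in this example.

```
import torch

def precompute_rope_cos_sin(head_dim, max_seq_len, theta=10000.0):
    # One frequency per pair of dimensions, as in the original Llama code.
    freqs = 1.0 / (theta ** (torch.arange(0, head_dim, 2).float() / head_dim))
    angles = torch.outer(torch.arange(max_seq_len).float(), freqs)
    return torch.cos(angles), torch.sin(angles)   # each (max_seq_len, head_dim // 2)

def apply_rope_real(x, cos, sin):
    # x: (batch, seq, n_heads, head_dim). Rotate interleaved pairs (x[2i], x[2i+1])
    # with real arithmetic instead of complex multiplication.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = cos[None, :, None, :], sin[None, :, None, :]
    out1 = x1 * cos - x2 * sin
    out2 = x1 * sin + x2 * cos
    return torch.stack((out1, out2), dim=-1).flatten(-2)

cos, sin = precompute_rope_cos_sin(head_dim=64, max_seq_len=128)
q = torch.randn(1, 128, 8, 64)
q_rot = apply_rope_real(q, cos, sin)   # same shape as q
```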


# Common Issues and Mitigations
- To clean your build:
```
git clean -xfd
pip uninstall executorch
./install_requirements.sh --pybind xnnpack

rm -rf cmake-out
```
- If you encounter `pthread`-related issues at link time, add `pthread` to `target_link_libraries` in `CMakeLists.txt`.
- On Mac, if there is a linking error in Step 4 with an error message like
```
0  0x100823648  __assert_rtn + 72
1  0x10074bc5c  ld::Fixup::applyFixup(ld::Atom const*, ld::LayoutLinkedImage const&, unsigned char*) const + 8268
2  0x1007de7d8  ___ZN2ld16LayoutExecutable27writeContentWithoutLinkEditENSt3__14spanIhLm18446744073709551615EEEy_block_invoke + 332
3  0x188cca428  _dispatch_client_callout2 + 20
4  0x188cde850  _dispatch_apply_invoke3 + 336
5  0x188cca3e8  _dispatch_client_callout + 20
6  0x188ccbc68  _dispatch_once_callout + 32
7  0x188cdeeec  _dispatch_apply_invoke_and_wait + 372
8  0x188cdde9c  _dispatch_apply_with_attr_f + 1212
9  0x188cde08c  dispatch_apply + 96
10  0x1007de9e4  void mapReduce<ld::Atom const*, mach_o::Error>(std::__1::span<ld::Atom const*, 18446744073709551615ul>, unsigned long, void (unsigned long, mach_o::Error&, std::__1::span<ld::Atom const*, 18446744073709551615ul>) block_pointer, void (std::__1::span<mach_o::Error, 18446744073709551615ul>) block_pointer) + 336
11  0x1007de594  ld::LayoutExecutable::writeContentWithoutLinkEdit(std::__1::span<unsigned char, 18446744073709551615ul>, unsigned long long) + 1180
12  0x1007e4020  ld::LayoutExecutable::writeToFile(char const*) + 15248
13  0x1007962e8  main + 9424
ld: Assertion failed: (extras.otherInstrOffset != 0 && "Kind::arm64_adrp_ldr missing extra info"), function applyFixup, file Fixup.cpp, line 793.
clang: error: linker command failed with exit code 1 (use -v to see invocation)
```
It is a known issue with Xcode version 15.1.
Mitigation: update to the most recent Xcode version, then clean and rebuild.