# Summary
This example demonstrates how to run a [Phi-3-mini](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct) 3.8B model via ExecuTorch. We use the XNNPACK backend to accelerate performance, along with XNNPACK symmetric per-channel quantization.
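
For context, the quantization is applied during export via PyTorch's PT2E flow. Below is a minimal sketch of symmetric per-channel XNNPACK quantization on a toy module; the real logic lives in `export_phi-3-mini.py`, and the quantizer's module path can differ between PyTorch/ExecuTorch releases.
```python
# Minimal PT2E sketch of XNNPACK symmetric per-channel quantization on a toy
# module. Illustrative only; not the exact code in export_phi-3-mini.py.
import torch
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (  # path may vary by release
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

toy = torch.nn.Sequential(torch.nn.Linear(64, 64))
example_inputs = (torch.randn(1, 64),)

graph = torch.export.export_for_training(toy, example_inputs).module()
quantizer = XNNPACKQuantizer().set_global(
    get_symmetric_quantization_config(is_per_channel=True)
)
prepared = prepare_pt2e(graph, quantizer)  # insert observers
prepared(*example_inputs)                  # calibrate on sample data
quantized = convert_pt2e(prepared)         # fold observers into q/dq ops
```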

# Instructions
## Step 1: Setup
1. Follow the [tutorial](https://pytorch.org/executorch/main/getting-started-setup) to set up ExecuTorch. For installation, run `./install_requirements.sh --pybind xnnpack`.
2. Currently, we support transformers v4.44.2. Install it with the following command:
```
pip uninstall -y transformers ; pip install transformers==4.44.2
```
## Step 2: Prepare and run the model
1. Download `tokenizer.model` from HuggingFace and create `tokenizer.bin`:
```
cd executorch
wget -O tokenizer.model "https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/resolve/main/tokenizer.model?download=true"
python -m extension.llm.tokenizer.tokenizer -t tokenizer.model -o tokenizer.bin
```
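
Optionally, you can sanity-check the downloaded `tokenizer.model` before converting it. A small sketch, assuming the `sentencepiece` package is installed (it is not otherwise required by this example):
```python
# Optional sanity check of the downloaded SentencePiece model.
# Assumes `pip install sentencepiece`; not part of the export flow itself.
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="tokenizer.model")
print(sp.vocab_size())                            # vocabulary size
print(sp.encode("Hello, world!", out_type=str))   # subword pieces
```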
2. Export the model. This step will take a few minutes to finish.
```
python -m examples.models.phi-3-mini.export_phi-3-mini -c "4k" -s 128 -o phi-3-mini.pte
```
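
Under the hood, the export script follows the standard ExecuTorch lowering flow: capture the model with `torch.export`, convert it to the Edge dialect, delegate supported subgraphs to XNNPACK, and serialize a `.pte` program. A condensed sketch of that flow on a toy module (illustrative; the script's exact APIs may differ between releases):
```python
# Condensed ExecuTorch lowering flow on a toy module; the real export logic
# lives in export_phi-3-mini.py. Illustrative only.
import torch
from executorch.backends.xnnpack.partition.xnnpack_partitioner import (
    XnnpackPartitioner,
)
from executorch.exir import to_edge

toy = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())
example_inputs = (torch.randn(1, 64),)

exported = torch.export.export(toy, example_inputs)  # capture the graph
edge = to_edge(exported)                             # Edge dialect program
edge = edge.to_backend(XnnpackPartitioner())         # delegate to XNNPACK
et_program = edge.to_executorch()                    # final ExecuTorch program

with open("toy.pte", "wb") as f:
    f.write(et_program.buffer)
```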
3. Build and run the model.
- Build ExecuTorch with optimized CPU performance as follows. Build options are available [here](https://github.com/pytorch/executorch/blob/main/CMakeLists.txt#L59).
  ```
  cmake -DPYTHON_EXECUTABLE=python \
      -DCMAKE_INSTALL_PREFIX=cmake-out \
      -DEXECUTORCH_ENABLE_LOGGING=1 \
      -DCMAKE_BUILD_TYPE=Release \
      -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
      -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
      -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON \
      -DEXECUTORCH_BUILD_XNNPACK=ON \
      -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
      -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
      -DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
      -Bcmake-out .

  cmake --build cmake-out -j16 --target install --config Release
  ```
- Build the Phi-3-mini runner.
```
cmake -DPYTHON_EXECUTABLE=python \
    -DCMAKE_INSTALL_PREFIX=cmake-out \
    -DCMAKE_BUILD_TYPE=Release \
    -DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
    -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
    -DEXECUTORCH_BUILD_XNNPACK=ON \
    -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
    -Bcmake-out/examples/models/phi-3-mini \
    examples/models/phi-3-mini

cmake --build cmake-out/examples/models/phi-3-mini -j16 --config Release
```
- Run the model. Options are documented [here](https://github.com/pytorch/executorch/blob/main/examples/models/phi-3-mini/main.cpp#L13-L30).
```
cmake-out/examples/models/phi-3-mini/phi_3_mini_runner \
    --model_path=phi-3-mini.pte \
    --tokenizer_path=tokenizer.bin \
    --seq_len=128 \
    --temperature=0 \
    --prompt="<|system|>
You are a helpful assistant.<|end|>
<|user|>
What is the capital of France?<|end|>
<|assistant|>"
```
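
The prompt above follows Phi-3's chat format: each turn is wrapped in `<|system|>`, `<|user|>`, or `<|assistant|>` role tags and terminated with `<|end|>`, and the trailing `<|assistant|>` tag asks the model to generate the reply; with `--temperature=0`, sampling is effectively greedy, so the output is deterministic. Rather than hand-writing these tags, you can build the string with the pinned transformers tokenizer's chat template. A small sketch (illustrative only; the runner just receives the final string via `--prompt`):
```python
# Build a Phi-3 chat prompt with transformers' chat template instead of
# writing the role tags by hand. Illustrative sketch; uses the
# transformers==4.44.2 install from Step 1.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-128k-instruct")
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]
# add_generation_prompt=True appends the trailing <|assistant|> tag.
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)  # should match the role-tag layout shown above
```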