# Running an ExecuTorch Model Using the Module Extension in C++

**Author:** [Anthony Shoumikhin](https://github.com/shoumikhin)

In the [Running an ExecuTorch Model in C++ Tutorial](running-a-model-cpp-tutorial.md), we explored the lower-level ExecuTorch APIs for running an exported model. While these APIs offer zero overhead, great flexibility, and control, they can be verbose and complex for regular use. To simplify this and resemble PyTorch's eager mode in Python, we introduce the `Module` facade APIs over the regular ExecuTorch runtime APIs. The `Module` APIs provide the same flexibility but default to commonly used components like `DataLoader` and `MemoryAllocator`, hiding most of the intricate details.

## Example

Let's see how we can run the `SimpleConv` model generated from the [Exporting to ExecuTorch tutorial](./tutorials/export-to-executorch-tutorial) using the `Module` and [`TensorPtr`](extension-tensor.md) APIs:

```cpp
#include <executorch/extension/module/module.h>
#include <executorch/extension/tensor/tensor.h>

using namespace ::executorch::extension;

// Create a Module.
Module module("/path/to/model.pte");

// Wrap the input data with a Tensor.
float input[1 * 3 * 256 * 256];
auto tensor = from_blob(input, {1, 3, 256, 256});

// Perform an inference.
const auto result = module.forward(tensor);

// Check for success or failure.
if (result.ok()) {
  // Retrieve the output data.
  const auto output = result->at(0).toTensor().const_data_ptr<float>();
}
```

The code now boils down to creating a `Module` and calling `forward()` on it, with no additional setup. Let's take a closer look at these and other `Module` APIs to better understand the internal workings.

## APIs

### Creating a Module

Creating a `Module` object is a fast operation that does not involve significant processing time or memory allocation. The actual loading of a `Program` and a `Method` happens lazily on the first inference unless explicitly requested with a dedicated API.

```cpp
Module module("/path/to/model.pte");
```
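
The constructor also accepts a `Module::LoadMode` argument that controls how the program file is brought into memory; the profiling example later in this tutorial passes one explicitly. A minimal sketch, assuming the `MmapUseMlock` mode used there:

```cpp
// Memory-map the program file and use mlock to keep its pages resident.
Module module("/path/to/model.pte", Module::LoadMode::MmapUseMlock);
```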

### Force-Loading a Method

To force-load the `Module` (and thus the underlying ExecuTorch `Program`) at any time, use the `load()` function:

```cpp
const auto error = module.load();

assert(module.is_loaded());
```
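
The returned `Error` can be compared against `Error::Ok` to detect a loading failure (see the Result and Error Types section below). A minimal sketch:

```cpp
using ::executorch::runtime::Error;

if (const auto error = module.load(); error != Error::Ok) {
  // Loading failed; handle the error before attempting an inference.
}
```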

To force-load a particular `Method`, call the `load_method()` function:

```cpp
const auto error = module.load_method("forward");

assert(module.is_method_loaded("forward"));
```

You can also use the convenience function to load the `forward` method:

```cpp
const auto error = module.load_forward();

assert(module.is_method_loaded("forward"));
```

**Note:** The `Program` is loaded automatically before any `Method` is loaded. Subsequent attempts to load them have no effect if a previous attempt was successful.

### Querying for Metadata

Get a set of method names that a `Module` contains using the `method_names()` function:

```cpp
const auto method_names = module.method_names();

if (method_names.ok()) {
  assert(method_names->count("forward"));
}
```

**Note:** `method_names()` will force-load the `Program` when called for the first time.

To introspect miscellaneous metadata about a particular method, use the `method_meta()` function, which returns a `MethodMeta` struct:

```cpp
const auto method_meta = module.method_meta("forward");

if (method_meta.ok()) {
  assert(method_meta->name() == "forward");
  assert(method_meta->num_inputs() > 0);

  const auto input_meta = method_meta->input_tensor_meta(0);
  if (input_meta.ok()) {
    assert(input_meta->scalar_type() == ScalarType::Float);
  }

  const auto output_meta = method_meta->output_tensor_meta(0);
  if (output_meta.ok()) {
    assert(output_meta->sizes().size() > 0);
  }
}
```

**Note:** `method_meta()` will also force-load the `Method` the first time it is called.
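
That metadata can also drive allocation. For example, instead of hardcoding the input shape as in the earlier examples, you can size and shape the input buffer from the reported dimensions. A sketch, assuming a single `Float` input and the `executorch::aten::SizesType` alias used by the `TensorPtr` APIs:

```cpp
const auto method_meta = module.method_meta("forward");

if (method_meta.ok()) {
  const auto input_meta = method_meta->input_tensor_meta(0);

  if (input_meta.ok()) {
    // Allocate exactly as many elements as the method expects for input 0.
    std::vector<float> input(input_meta->nbytes() / sizeof(float));

    // Reuse the reported shape instead of hardcoding it.
    std::vector<executorch::aten::SizesType> sizes(
        input_meta->sizes().begin(), input_meta->sizes().end());

    const auto result = module.forward(from_blob(input.data(), sizes));
  }
}
```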

### Performing an Inference

Assuming the `Program`'s method names and their input format are known ahead of time, you can run methods directly by name using the `execute()` function:

```cpp
const auto result = module.execute("forward", tensor);
```

For the standard `forward()` method, the above can be simplified:

```cpp
const auto result = module.forward(tensor);
```

**Note:** `execute()` or `forward()` will load the `Program` and the `Method` the first time they are called. Therefore, the first inference will take longer, as the model is loaded lazily and prepared for execution unless it was explicitly loaded earlier.
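
To keep that cost out of latency-sensitive code, pair it with the force-loading APIs shown earlier:

```cpp
// Pay the loading cost upfront, e.g. during app startup.
module.load_forward();

// The first forward() call now runs without the loading overhead.
const auto result = module.forward(tensor);
```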

### Setting Input and Output

You can set individual input and output values for methods with the following APIs.

#### Setting Inputs

Inputs can be any `EValue`, which includes tensors, scalars, lists, and other supported types. To set a specific input value for a method:

```cpp
module.set_input("forward", input_value, input_index);
```

- `input_value` is an `EValue` representing the input you want to set.
- `input_index` is the zero-based index of the input to set.

For example, to set the first input tensor:

```cpp
module.set_input("forward", tensor_value, 0);
```

You can also set multiple inputs at once:

```cpp
std::vector<runtime::EValue> inputs = {input1, input2, input3};
module.set_inputs("forward", inputs);
```

**Note:** You can skip the method name argument for the `forward()` method.

By pre-setting all inputs, you can perform an inference without passing any arguments:

```cpp
const auto result = module.forward();
```

Or set some of the inputs ahead of time and pass the rest at call time:

```cpp
// Set the second input ahead of time.
module.set_input(input_value_1, 1);

// Execute the method, providing the first input at call time.
const auto result = module.forward(input_value_0);
```

**Note:** Pre-set inputs are stored in the `Module` and reused across subsequent executions.

When you no longer need a pre-set input, clear it by setting it to a default-constructed `EValue`:

```cpp
module.set_input(runtime::EValue(), 1);
```

#### Setting Outputs

Only outputs of the `Tensor` type can be set at runtime, and they must not be memory-planned at model export time. Memory-planned tensors are preallocated during model export and cannot be replaced.

To set the output tensor for a specific method:

```cpp
module.set_output("forward", output_tensor, output_index);
```

- `output_tensor` is an `EValue` containing the tensor you want to set as the output.
- `output_index` is the zero-based index of the output to set.

**Note:** Ensure that the output tensor you're setting matches the expected shape and data type of the method's output.

You can skip the method name for `forward()` and the index for the first output:

```cpp
module.set_output(output_tensor);
```

**Note:** Pre-set outputs are stored in the `Module` and reused across subsequent executions, just like inputs.
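
Putting it together, you can make `forward()` write its result directly into memory you own. A minimal sketch, assuming the first output is not memory-planned and has the hypothetical shape `{1, 10}`:

```cpp
// A user-owned output buffer; the {1, 10} shape is hypothetical.
float output[1 * 10];
auto output_tensor = from_blob(output, {1, 10});

// Route the first output of forward() into the buffer.
module.set_output(output_tensor);

// On success, the result lands directly in `output`.
const auto result = module.forward(tensor);
```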

### Result and Error Types

Most of the ExecuTorch APIs return either `Result` or `Error` types:

- [`Error`](https://github.com/pytorch/executorch/blob/main/runtime/core/error.h) is a C++ enum containing valid error codes. The default is `Error::Ok`, denoting success.

- [`Result`](https://github.com/pytorch/executorch/blob/main/runtime/core/result.h) can hold either an `Error` if the operation fails, or a payload such as an `EValue` wrapping a `Tensor` if successful. To check if a `Result` is valid, call `ok()`. To retrieve the `Error`, use `error()`, and to get the data, use `get()` or dereference operators like `*` and `->`, as the sketch below shows.
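
For example, the inference result used throughout this tutorial can be handled as follows (a minimal sketch):

```cpp
const auto result = module.forward(tensor);

if (result.ok()) {
  // Success: dereference the Result to reach the payload.
  const auto output = result->at(0).toTensor().const_data_ptr<float>();
} else {
  // Failure: retrieve the error code for logging or recovery.
  const auto error = result.error();
}
```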

### Profiling the Module

Use [ExecuTorch Dump](etdump.md) to trace model execution. Create an `ETDumpGen` instance and pass it to the `Module` constructor. After executing a method, save the `ETDump` data to a file for further analysis:

```cpp
#include <cstdlib>
#include <fstream>
#include <memory>

#include <executorch/extension/module/module.h>
#include <executorch/devtools/etdump/etdump_flatcc.h>

using namespace ::executorch::extension;

Module module("/path/to/model.pte", Module::LoadMode::MmapUseMlock, std::make_unique<ETDumpGen>());

// Execute a method, e.g., module.forward(...); or module.execute("my_method", ...);

if (auto* etdump = dynamic_cast<ETDumpGen*>(module.event_tracer())) {
  const auto trace = etdump->get_etdump_data();

  if (trace.buf && trace.size > 0) {
    // Take ownership of the trace buffer so it is freed on scope exit.
    std::unique_ptr<void, decltype(&free)> guard(trace.buf, free);
    std::ofstream file("/path/to/trace.etdump", std::ios::binary);

    if (file) {
      file.write(static_cast<const char*>(trace.buf), trace.size);
    }
  }
}
```

## Conclusion

The `Module` APIs provide a simplified interface for running ExecuTorch models in C++, closely resembling the experience of PyTorch's eager mode. By abstracting away the complexities of the lower-level runtime APIs, developers can focus on model execution without worrying about the underlying details.