This subtree contains operator implementations that ExecuTorch clients can use and
contribute to. For internal users, please see `executorch/kernels/fb/README.md`.

## Layout

- `kernels`: Contains implementations and tests for the operators defined
  in the YAML files.
  - `kernels/portable/cpu`: Pure C++ implementations of the operators defined in the
    YAML files.
  - `kernels/optimized/cpu`: Optimized C++ implementations of the operators defined in the
    YAML files, for specific hardware platforms.
  - `kernels/aten`: A thin wrapper layer that hooks the ATen library up to ExecuTorch.
  - `kernels/test`: Tests for all operator implementations. Since all
    implementations should behave identically, the same tests should pass for
    all target types.

## Help & Improvements

If you have problems or questions, or have suggestions for ways to make
implementation and testing better, please contact [Dave
Bort](https://fb.workplace.com/profile.php?id=100042415022179), [Mengwei
Liu](https://fb.workplace.com/profile.php?id=100024007250862), or [Martin
Yuan](https://fb.workplace.com/profile.php?id=100020734910364) on the PyTorch
Edge team.

## Contributing

Please follow these steps and guidelines when adding a new operator
implementation to this library. The goals of these guidelines are to:
- Make it straightforward to add new operator implementations.
- Ensure that the operator implementations are of high quality, and are easy to
  maintain.
- Make it easy for users to find available operator implementations, and to
  trust in their quality and behavioral stability.

### Your code must be compatible with ExecuTorch types

ExecuTorch does not use `at::Tensor`, `at::ScalarType`, `c10::Scalar`, or any of
the types defined by PyTorch core in the `at` or `c10` namespaces. To retain
tighter control over CPU and memory runtime behavior, ExecuTorch reimplements
compatible but restricted subsets of those types.

[`//runtime/core/exec_aten/exec_aten.h`](https://github.com/pytorch/executorch/blob/main/runtime/core/exec_aten/exec_aten.h)
contains the mapping between ATen/c10 types and the ExecuTorch types. The
ExecuTorch types are defined in other headers in that same directory,
[`//runtime/core/portable_type/`](https://github.com/pytorch/executorch/tree/main/runtime/core/portable_type).

The ExecuTorch types are source-compatible with the ATen/c10 types; if you write
code that works with the ExecuTorch types, then that same code should work when
built against ATen/c10. However, some features of `at::Tensor` and other
ATen/c10 types may not be present. In many cases this is intentional, but
in other cases we can consider adding the missing features.
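
For example, the following sketch works against either type system, because it only touches the source-compatible subset of the Tensor API (`is_float_matrix` is a hypothetical helper, not part of the library):
```
#include <executorch/runtime/core/exec_aten/exec_aten.h>

using exec_aten::ScalarType;
using exec_aten::Tensor;

// Hypothetical helper: uses only APIs that exist on both the ExecuTorch
// Tensor and at::Tensor, so the same source builds in either mode.
bool is_float_matrix(const Tensor& t) {
  return t.dim() == 2 && t.scalar_type() == ScalarType::Float;
}
```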

### Declare the operator in a YAML file

We use YAML files to declare the ATen operators or custom operators implemented by this kernel library.

Before implementing, the operator must be declared in exactly one of the
operator YAML files:
- [`//kernels/portable/functions.yaml`](https://github.com/pytorch/executorch/blob/main/kernels/portable/functions.yaml)
  - Add your entry here if your operator overload (e.g., `op: add.out`)
    appears in the core PyTorch file
    [`pytorch/aten/src/ATen/native/native_functions.yaml`](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/native_functions.yaml).
  - Also add your entry to [`//kernels/aten/functions.yaml`](https://github.com/pytorch/executorch/blob/main/kernels/aten/functions.yaml) for test coverage.
- [`//kernels/portable/custom_ops.yaml`](https://github.com/pytorch/executorch/blob/main/kernels/portable/custom_ops.yaml)
  - Add your entry here if your operator overload does *not* appear in the core PyTorch `native_functions.yaml`.

The next sections describe how to add a YAML entry.

#### YAML Schema

The YAML schema is a DSL for describing the operators and the kernels that implement them. Each YAML file is a contract between AOT model export and runtime execution: if followed correctly, it ensures that the ExecuTorch runtime can link the C++ implementation of an operator to the exported model artifact. Here are the rules for writing your own YAML files.

**Out variants only**

ExecuTorch only supports out-style operators, where:
- The caller provides the output Tensor or Tensor list in the final position
  with the name `out`.
- The C++ function modifies and returns the same `out` argument.
  - If the return type in the YAML file is `()` (which maps to `void`), the C++
    function should still modify `out` but does not need to return anything.
- The `out` argument must be keyword-only, which means it must follow an
  argument named `*`, as in the `add.out` example below.
- Conventionally, these out operators are named using the pattern `<name>.out`
  or `<name>.<overload>_out`.

Since all output values are returned via an `out` parameter, ExecuTorch ignores
the actual C++ function return value. But, to be consistent, functions should
always return `out` when the return type is non-`void`.
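
In C++, the convention looks like this minimal sketch (a hypothetical `my_op.out` kernel; validation and computation elided):
```
// The caller owns and provides `out`; the kernel writes the result into it.
Tensor& my_op_out(const Tensor& self, Tensor& out) {
  // ... validate shapes/dtypes, then write the result into `out` ...
  return out; // return `out` whenever the return type is non-void
}
```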

**Can only return `Tensor` or `()`**

ExecuTorch only supports operators that return a single `Tensor`, or the unit
type `()` (which maps to `void`). It does not support returning any other types,
including lists, optionals, tuples, or scalars like `bool`.

**Supported argument types**

ExecuTorch does not support all of the argument types that core PyTorch
supports. See [this
spreadsheet](https://docs.google.com/spreadsheets/d/1uArc0r1Yq1QSeyRJZKzZ8Wkz0eS9TsM39ghmMAZCXDA/edit#gid=0)
for the list of supported and unsupported types.
<!-- TODO(dbort): Once that list stabilizes, move to a table in this file
so that external users can see it. -->

**Functions only, no methods**

ExecuTorch does not support Tensor methods, and assumes `variants: function` for
all operators. Entries like `variants: method` or `variants: function, method`
will be ignored.

#### Add your operator entry

Some examples of operator entries:

ATen operator with a default kernel:
```
- op: add.out
  kernels:
    - arg_meta: null
      kernel_name: torch::executor::add_out
```

ATen operator with a dtype/dim-order specialized kernel (works for the `Double` dtype, and the dim order needs to be (0, 1, 2, 3)):
```
- op: add.out
  type_alias:
    T0: [Double]
  dim_order_alias:
    D0: [[0, 1, 2, 3]]
  kernels:
    - arg_meta:
        self: [T0, D0]
        other: [T0, D0]
        out: [T0, D0]
      kernel_name: torch::executor::add_out
```

Custom operator with a default kernel:
```
- func: allclose.out(Tensor self, Tensor other, float rtol=1e-05, float atol=1e-08, bool equal_nan=False, bool dummy_param=False, *, Tensor(a!) out) -> Tensor(a!)
  kernels:
    - arg_meta: null
      kernel_name: torch::executor::allclose_out
```

Top-level attributes:
* `op` (if the operator appears in `native_functions.yaml`) or `func` (for a custom operator). The value for this key is the full operator name, including the overload name, for `op`, or a full operator schema (namespace, operator name, operator overload name, and schema string) for `func`. For schema syntax, please refer to these [instructions](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/README.md).

* `kernels`: defines the kernel information. It consists of `arg_meta` and `kernel_name`, which are bound together to describe "for input tensors with this metadata, use this kernel".
* `type_alias` (optional): gives aliases to possible dtype options. `T0: [Double, Float]` means `T0` can be either `Double` or `Float`.
* `dim_order_alias` (optional): similar to `type_alias`, gives names to possible dim-order options.

Attributes under `kernels`:
* `arg_meta`: a list of "tensor arg name" entries. The values for these keys are the dtype and dim-order aliases implemented by the corresponding `kernel_name`. A value of `null` means that the kernel will be used for all types of input.
* `kernel_name`: the expected name of the C++ function that will implement this operator. You can put whatever you want here, but you should follow the convention of replacing the `.` in the overload name with an underscore, and lowercasing all characters. In this example, `add.out` uses the C++ function named `add_out`, and `add.Scalar_out` would become `add_scalar_out`, with a lowercase `s`. We support namespaces for kernels, but note that we insert `native::` into the last level of the namespace, so `custom::add_out` in `kernel_name` will point to `custom::native::add_out` (see the sketch below).
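
For example, with `kernel_name: custom::add_out`, codegen binds the operator to `custom::native::add_out`, so the implementation should be declared along these lines (a sketch):
```
namespace custom {
namespace native {

// Must live in the `native` sub-namespace of the declared kernel namespace.
Tensor& add_out(const Tensor& self, const Tensor& other, Tensor& out);

} // namespace native
} // namespace custom
```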

### Find operator base name

The base name is the part of the operator name before the `.`, excluding any
trailing underscores. The rest of this document refers to this as `<name>`.

E.g., these operator overloads all have a base name of `add`:
- `add.Scalar`
- `add.Tensor`
- `add.out`
- `add_.Tensor`

So, if you were implementing `add.out`, then your operator base name would be
`add`, and you would replace `<name>` with `add` everywhere below.

### Selective build

When using macros that require a `NAME` argument, e.g. `#define ET_SWITCH_REAL_TYPES_AND(ADDITIONAL, TYPE, CONTEXT, NAME, CTYPE_ALIAS, ...)`, make sure to pass in the same operator name defined in `functions.yaml`. This is the base name plus the variant, e.g. `add.out` or `add.Scalar_out`. The operator name is required for dtype selective build, which matches against the operator names and dtypes present in a model.
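
For example, inside an `add.out` kernel the dtype dispatch might look like the following sketch (the surrounding `ctx`, `out`, and lambda body are assumed context, not verbatim library code):
```
// NAME is "add.out", matching the entry in functions.yaml, so that dtype
// selective build can match this dispatch against the model's operators.
ET_SWITCH_REAL_TYPES_AND(
    Bool, out.scalar_type(), ctx, "add.out", CTYPE, [&] {
      // ... typed implementation using CTYPE ...
    });
```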

### Overview of files and targets

For the operator base name `<name>`, you should work with these files. The sections below give more details about what they should contain.

- `./kernels/portable/cpu/op_<name>.cpp`: The implementations of operator overloads
  with base name `<name>`. This is the file that clients will link into their
  runtimes.
- `./kernels/portable/CMakeLists.txt`: The CMake build file for all the
  `op_<name>.cpp` files in the same directory.
- `./kernels/test/op_<name>_test.cpp`: Unit tests for the operator overloads
  with base name `<name>`.
  - Note that the tests under this directory are specific to the portable kernels.
    To share tests between multiple kernels, put them in `../test`.
  - Note that the tests do not live under `cpu`; tests should be
    implementation-agnostic. This lets us run the same tests against all
    implementations of a given operator, which should behave identically.
- `./kernels/test/CMakeLists.txt`: The CMake build file for all the
  `op_<name>_test.cpp` files in the same directory.

For an example, see the `add` operator (note that these are slightly different
from the `add` examples in this doc):
- [`executorch/kernels/portable/cpu/op_add.cpp`](https://github.com/pytorch/executorch/blob/main/kernels/portable/cpu/op_add.cpp):
  Implementations.
- [`./kernels/portable/CMakeLists.txt`](https://github.com/pytorch/executorch/blob/main/kernels/portable/CMakeLists.txt):
  Builds the portable ops.
- [`executorch/kernels/test/op_add_test.cpp`](https://github.com/pytorch/executorch/blob/main/kernels/test/op_add_test.cpp):
  Unit tests.
- [`./kernels/test/CMakeLists.txt`](https://github.com/pytorch/executorch/blob/main/kernels/test/CMakeLists.txt):
  Builds the kernel tests.

### Add the operator implementation to CMakeLists.txt

The portable operator files are collected by [`./kernels/portable/CMakeLists.txt`](https://github.com/pytorch/executorch/blob/main/kernels/portable/CMakeLists.txt) with a glob on `./kernels/portable/cpu/*.cpp`. Ensure your operator file is in that directory.

NOTE: A given `op_<name>` cannot implement both ATen-compatible and
non-ATen-compatible (i.e., custom) operators. We suggest adding the suffix
`_custom` if necessary: e.g., `op_add` for ATen-compatible overloads of
the `add` operator, and `op_add_custom` for non-ATen-compatible overloads.

NOTE: An `op_<name>` may not have dependencies outside of `//executorch`.
This library is intended to be portable, open-sourceable, and self-contained.

### Create a skeleton .cpp file for the operator implementation

If not already present, create the file
`executorch/kernels/portable/cpu/op_<name>.cpp`, which should follow the
pattern:
```
// Copyright (c) Meta Platforms, Inc. and affiliates.
#include <executorch/runtime/kernel/kernel_includes.h>

namespace torch {
namespace executor {
namespace native {

namespace {
// <helper code>
} // namespace

// <operator overload implementations>

} // namespace native
} // namespace executor
} // namespace torch
```

### Find the function signature for the operator overload

When you add an entry to the YAML file, the codegen tools will generate an
expected function signature for you to implement in a file called
`NativeFunctions.h`. To build and find that generated header:

1. Build ExecuTorch:
```
cmake -DCMAKE_INSTALL_PREFIX=cmake-out \
  -DCMAKE_BUILD_TYPE=Release \
  -DPYTHON_EXECUTABLE=python \
  -Bcmake-out .
cmake --build cmake-out -j9 --target install --config Release
```
2. The generated `NativeFunctions.h` file is located at:
```
cmake-out/kernels/portable/portable_ops_lib/NativeFunctions.h
```

Since this header is generated from the YAML files, re-run the build if you have modified your
operator's entry in those files.

Open the file and look for the function with the same name that you earlier
added to the YAML file. For `add_out`, this might look like:
```
TORCH_API torch::executor::Tensor & add_out(const at::Tensor & self, const at::Tensor & other, at::Tensor & out);
```

This is the function signature that you will need to implement.

### Add a stub implementation

Now that you have your function signature, add a stub to the `op_<name>.cpp`
file that just returns the `out` argument. For example:
```
Tensor& add_out(
    const Tensor& self,
    const Tensor& other,
    Tensor& out) {
  return out;
}
```

Note that you should drop the `TORCH_API` attribute, and should drop `at::`.

### Create a skeleton test .cpp file

If not already present, create the file
`executorch/kernels/portable/test/op_<name>_test.cpp`. Here's a suggested
starting point:
```
// Copyright (c) Meta Platforms, Inc. and affiliates.

#include <executorch/kernels/test/FunctionHeaderWrapper.h> // Declares the operator
#include <executorch/runtime/core/exec_aten/exec_aten.h>
#include <executorch/runtime/core/exec_aten/testing_util/tensor_factory.h>
#include <executorch/runtime/core/exec_aten/testing_util/tensor_util.h>

#include <gtest/gtest.h>

using namespace ::testing;
using exec_aten::ScalarType;
using exec_aten::Tensor;
using torch::executor::native::<operator_function_name>;
using torch::executor::testing::IsCloseTo;
using torch::executor::testing::TensorFactory;

TEST(Op<Name>Test, SmokeTest) {
  TensorFactory<ScalarType::Int> tf;

  Tensor a = tf.make(/*sizes=*/{2, 2}, /*data=*/{1, 1, 1, 1});
  Tensor b = tf.ones(/*sizes=*/{2, 2});
  Tensor z = tf.zeros(/*sizes=*/{2, 2});

  EXPECT_EQ(a, b); // Exact equality
  EXPECT_THAT(a, IsCloseTo(b)); // For floating-point tensors

  EXPECT_NE(a, z);
  EXPECT_THAT(a, Not(IsCloseTo(z)));
}
```

### Add the operator test to CMakeLists.txt

Now, we have to add the test to [`executorch/kernels/test/CMakeLists.txt`](https://github.com/pytorch/executorch/blob/main/kernels/test/CMakeLists.txt). Note that this file builds all of the kernel tests.

For portable kernels, add your test file to [`all_test_sources`](https://github.com/pytorch/executorch/blob/main/kernels/test/CMakeLists.txt#L69).

For optimized kernels, add your test file to [`_optimized_kernels_test_sources`](https://github.com/pytorch/executorch/blob/main/kernels/test/CMakeLists.txt#L230).

### Implement and test the operator

You should now be able to implement and test your operator. It's helpful to see
how other operators do it, so take a look at `op_add`:
- [`executorch/kernels/portable/cpu/op_add.cpp`](https://github.com/pytorch/executorch/blob/main/kernels/portable/cpu/op_add.cpp)
- [`executorch/kernels/test/op_add_test.cpp`](https://github.com/pytorch/executorch/blob/main/kernels/test/op_add_test.cpp)

Check out how it uses helper macros like `ET_CHECK_SAME_SHAPE_AND_DTYPE` and
`ET_FORALL_REAL_TYPES` when implementing the operator, and test helpers like
`TensorFactory` and `IsCloseTo()` when testing.
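
As a rough sketch of that dispatch pattern (the `add_impl` helper is hypothetical; see `op_add.cpp` for the real code):
```
// ET_FORALL_REAL_TYPES expands ADD_CASE once per (ctype, dtype) pair,
// generating one switch case per real dtype.
#define ADD_CASE(ctype, dtype)         \
  case ScalarType::dtype:              \
    add_impl<ctype>(self, other, out); \
    break;

switch (self.scalar_type()) {
  ET_FORALL_REAL_TYPES(ADD_CASE)
  default:
    ET_CHECK_MSG(false, "Unhandled dtype %d", (int)self.scalar_type());
}
#undef ADD_CASE
```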

Once you have your operator and corresponding tests in place, we can try it out.

1. Build ExecuTorch.
```
cmake . \
  -DCMAKE_INSTALL_PREFIX=cmake-out \
  -DEXECUTORCH_USE_CPP_CODE_COVERAGE=ON \
  -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
  -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
  -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
  -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
  -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON \
  -DEXECUTORCH_BUILD_DEVTOOLS=ON \
  -DEXECUTORCH_BUILD_VULKAN=OFF \
  -DEXECUTORCH_BUILD_XNNPACK=ON \
  -Bcmake-out

cmake --build cmake-out -j9 --target install
```
2. Build gtest.
```
mkdir -p third-party/googletest/build
cd third-party/googletest/build
cmake .. -DCMAKE_INSTALL_PREFIX=.
make -j4
make install
cd ../../../
```

3. Build the kernel tests.
```
cmake kernels/test \
  -DCMAKE_BUILD_TYPE=Debug \
  -DCMAKE_INSTALL_PREFIX=cmake-out \
  -DEXECUTORCH_USE_CPP_CODE_COVERAGE=ON \
  -DCMAKE_PREFIX_PATH="$(pwd)/third-party/googletest/build" \
  -Bcmake-out/kernels/test
cmake --build cmake-out/kernels/test -j9
```
4. Run the tests. You should see your test here.
```
./cmake-out/kernels/test/portable_kernels_test
./cmake-out/kernels/test/optimized_kernels_test
```

#### Implementation restrictions

To reduce dependencies and size, to ensure portability, and to conform to the
restrictions of embedded environments, your operator implementations:

- Must not include C++ stdlib headers, or use C++ stdlib types. For example,
  `string`/`basic_string`, `vector`, `unordered_map`, `cout`, and `unique_ptr`
  must not be used.
- Must not dynamically allocate memory, or cause memory to be dynamically
  allocated. All non-stack memory must be provided as a function parameter by
  the caller, typically via an `out` parameter or another tensor parameter to be
  used as scratch space.
  - This includes direct calls to `new`, `malloc`, `realloc`, etc., as well as
    operations that allocate under the hood, like `make_unique` or the creation
    of a `vector` or `string`, for example.
- Must be stateless.
- Must be thread-safe. Note that the ExecuTorch environment does not provide
  a locking construct, so this means that operator implementations must not
  modify global memory.
- Must work in an environment without threads. This, along with the stateless
  requirement, means that thread-local storage must not be used.
- Must not use `stdout`, `stderr`, or other file/stream IO via `printf`/`cout`,
  etc.; instead, use `ET_LOG` from `executorch/runtime/platform/log.h`, as in
  the sketch after this list.
- Must not use `assert()`. Instead, use `ET_CHECK` and other macros from
  `executorch/runtime/platform/assert.h`.
- Must not raise exceptions. Instead, use `ET_CHECK` and other macros from
  `executorch/runtime/platform/assert.h`.
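
A minimal sketch of the logging and checking style these rules imply (the specific check is an arbitrary example):
```
#include <executorch/runtime/platform/assert.h>
#include <executorch/runtime/platform/log.h>

// Fail fast without exceptions or assert(): ET_CHECK_MSG aborts with a
// message when the condition is false.
ET_CHECK_MSG(self.numel() == out.numel(), "self and out sizes must match");

// Log without stdio; ET_LOG takes a severity level such as Info or Error.
ET_LOG(Info, "add_out: %zu elements", (size_t)out.numel());
```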

Note that not all of these apply to *every* ExecuTorch-compatible operator
implementation, only to those included in this portable library.

For example, a target-specific custom operator that initiates a DMA copy would be
stateful, and would probably modify global memory, but it would need to use
target-specific APIs to do so. But, since this library is only for portable
operator implementations, the operators it contains can't depend on
target-specific APIs like that.

### Shared kernel tests (executorch/kernels/test)
The portable kernel implementations and their corresponding tests can be used as a
reference for other kernels. We can also share the test cases in
`//executorch/kernels/test`, which contains common resources for kernel testing.

*generate_wrapper* generates a header, `FunctionHeaderWrapper.h`, which simply
includes the corresponding `Functions.h` file for the specified kernel:
`#include <executorch/kernels/{}/Functions.h>`. With that, the test sources don't need to know
which kernel we are testing or which `Functions.h` we should use.

With *_common_op_test* we use a single test source file (`op_<op>_test.cpp`) in this directory.
We automatically find the corresponding registered dispatch function through `Functions.h`, so
the same source can be used to test multiple kernels.

In `<kernel>/test/` we can put kernel-specific test cases.

*supported_features* is used to distinguish between different kernel features. For example,
ATen supports mixing input and output dtypes while portable doesn't. When we expect death in
portable testing in such a case, we can check the supported features of the running kernel and
bypass the death test if the feature is supported, as in the sketch after this list.
- The default values of the supported features are in `test/supported_features.yaml`.
- Each kernel needs to override its supported features in `<kernel>/test/supported_features_def.yaml`.
  See the example in `supported_features_def_example.yaml`.
- This ensures that all kernels can share the same C++ test case source.
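
In a test, that bypass might look like the following sketch (the header path and the `is_aten` field are assumptions based on the generated test utilities):
```
#include <executorch/kernels/test/supported_features.h>
#include <gtest/gtest.h>

using torch::executor::testing::SupportedFeatures;

TEST(OpAddTest, MixedDtypesDie) {
  if (SupportedFeatures::get()->is_aten) {
    // The ATen kernel supports mixed input/output dtypes, so the death
    // check does not apply to it; bypass instead of failing.
    GTEST_SKIP() << "ATen kernel supports mixed dtypes";
  }
  // ... EXPECT_DEATH on the portable kernel here ...
}
```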