This subtree contains operator implementations that ExecuTorch clients can use
and contribute to. For internal users, please see
`executorch/kernels/fb/README.md`.

## Layout

- `kernels`: Contains implementations and tests for the operators defined
  in the YAML files.
  - `kernels/portable/cpu`: Pure C++ implementations of the operators defined
    in the YAML files.
  - `kernels/optimized/cpu`: Optimized C++ implementations of the operators
    defined in the YAML files, for specific hardware platforms.
  - `kernels/aten`: A thin wrapper layer that hooks the ATen library into
    ExecuTorch.
  - `kernels/test`: Tests for all operator implementations. Since all
    implementations should behave identically, the same tests should pass for
    all target types.

## Help & Improvements

If you have problems or questions, or have suggestions for ways to make
implementation and testing better, please contact [Dave
Bort](https://fb.workplace.com/profile.php?id=100042415022179), [Mengwei
Liu](https://fb.workplace.com/profile.php?id=100024007250862), or [Martin
Yuan](https://fb.workplace.com/profile.php?id=100020734910364) on the PyTorch
Edge team.

## Contributing

Please follow these steps and guidelines when adding a new operator
implementation to this library. The goals of these guidelines are to:
- Make it straightforward to add new operator implementations.
- Ensure that the operator implementations are of high quality, and are easy to
  maintain.
- Make it easy for users to find available operator implementations, and to
  trust in their quality and behavioral stability.

### Your code must be compatible with ExecuTorch types

ExecuTorch does not use `at::Tensor`, `at::ScalarType`, `c10::Scalar`, or any of
the types defined by PyTorch core in the `at` or `c10` namespaces. To retain
tighter control over CPU and memory runtime behavior, ExecuTorch reimplements
compatible but restricted subsets of those types.

[`//runtime/core/exec_aten/exec_aten.h`](https://github.com/pytorch/executorch/blob/main/runtime/core/exec_aten/exec_aten.h)
contains the mapping between ATen/c10 types and the ExecuTorch types. The
ExecuTorch types are defined in other headers in that same directory,
[`//runtime/core/portable_type/`](https://github.com/pytorch/executorch/tree/main/runtime/core/portable_type).

The ExecuTorch types are source-compatible with the ATen/c10 types: if you
write code that works with the ExecuTorch types, that same code should work
when built against ATen/c10. However, some features of `at::Tensor` and other
ATen/c10 types may not be present. In many cases this is intentional, but in
other cases we can consider adding the missing features.
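To illustrate, here is a minimal sketch of code written against the `exec_aten`
aliases. The helper `is_float_scalar` is hypothetical (not part of the
library), but the aliases and `Tensor` methods shown exist in both type
systems, so the function should build against either one:

```
// Hypothetical helper: written against the exec_aten aliases, it compiles
// whether Tensor is torch::executor::Tensor or at::Tensor.
#include <executorch/runtime/core/exec_aten/exec_aten.h>

using exec_aten::ScalarType;
using exec_aten::Tensor;

bool is_float_scalar(const Tensor& t) {
  // scalar_type() and numel() are part of the shared, restricted subset.
  return t.scalar_type() == ScalarType::Float && t.numel() == 1;
}
```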
### Declare the operator in a YAML file

We use YAML files to declare the ATen operators or custom operators implemented
by this kernel library.

Before implementing, the operator must be declared in exactly one of the
operator YAML files:
- [`//kernels/portable/functions.yaml`](https://github.com/pytorch/executorch/blob/main/kernels/portable/functions.yaml)
  - Add your entry here if your operator overload (e.g., `op: add.out`)
    appears in the core PyTorch file
    [`pytorch/aten/src/ATen/native/native_functions.yaml`](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/native_functions.yaml).
  - Also add your entry to [`//kernels/aten/functions.yaml`](https://github.com/pytorch/executorch/blob/main/kernels/aten/functions.yaml) for test coverage.
- [`//kernels/portable/custom_ops.yaml`](https://github.com/pytorch/executorch/blob/main/kernels/portable/custom_ops.yaml)
  - Add your entry here if your operator overload does *not* appear in the core
    PyTorch `native_functions.yaml`.

The next sections describe how to add a YAML entry.

#### YAML Schema

The YAML schema is a DSL that describes the operators and the kernels that
implement them. These YAML files are a contract between AOT model export and
runtime execution: if followed correctly, they ensure that the ExecuTorch
runtime can link the C++ implementation of an operator to the exported model
artifact. Here are the rules for writing your own YAML entries.

**Out variants only**

ExecuTorch only supports out-style operators, where:
- The caller provides the output Tensor or Tensor list in the final position
  with the name `out`.
- The C++ function modifies and returns the same `out` argument.
  - If the return type in the YAML file is `()` (which maps to `void`), the C++
    function should still modify `out` but does not need to return anything.
- The `out` argument must be keyword-only, which means it must appear after the
  `*` marker in the schema, as in the `add.out` example below.
- Conventionally, these out operators are named using the pattern `<name>.out`
  or `<name>.<overload>_out`.

Since all output values are returned via an `out` parameter, ExecuTorch ignores
the actual C++ function return value. But, to be consistent, functions should
always return `out` when the return type is non-`void`.
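For reference, this is the `add.out` schema from the core
`native_functions.yaml`. The `*` makes every argument after it keyword-only,
and the `(a!)` annotation marks `out` as mutated and aliased by the returned
tensor:

```
- func: add.out(Tensor self, Tensor other, *, Scalar alpha=1, Tensor(a!) out) -> Tensor(a!)
```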
**Can only return `Tensor` or `()`**

ExecuTorch only supports operators that return a single `Tensor`, or the unit
type `()` (which maps to `void`). It does not support returning any other
types, including lists, optionals, tuples, or scalars like `bool`.

**Supported argument types**

ExecuTorch does not support all of the argument types that core PyTorch
supports. See [this
spreadsheet](https://docs.google.com/spreadsheets/d/1uArc0r1Yq1QSeyRJZKzZ8Wkz0eS9TsM39ghmMAZCXDA/edit#gid=0)
for the list of supported and unsupported types.
<!-- TODO(dbort): Once that list stabilizes, move to a table in this file
so that external users can see it. -->

**Functions only, no methods**

ExecuTorch does not support Tensor methods, and assumes `variants: function`
for all operators. Entries like `variants: method` or `variants: function,
method` will be ignored.

#### Add your operator entry

Some example operator entries:

ATen operator with a default kernel:
```
- op: add.out
  kernels:
    - arg_meta: null
      kernel_name: torch::executor::add_out
```

ATen operator with a dtype/dim-order-specialized kernel (works for the `Double`
dtype, with a required dim order of (0, 1, 2, 3)):
```
- op: add.out
  type_alias:
    T0: [Double]
  dim_order_alias:
    D0: [[0, 1, 2, 3]]
  kernels:
    - arg_meta:
        self: [T0, D0]
        other: [T0, D0]
        out: [T0, D0]
      kernel_name: torch::executor::add_out
```

Custom operator with a default kernel:
```
- func: allclose.out(Tensor self, Tensor other, float rtol=1e-05, float atol=1e-08, bool equal_nan=False, bool dummy_param=False, *, Tensor(a!) out) -> Tensor(a!)
  kernels:
    - arg_meta: null
      kernel_name: torch::executor::allclose_out
```

Top-level attributes:
* `op` (if the operator appears in `native_functions.yaml`) or `func` (for a
  custom operator). The value for the `op` key must be the full operator name,
  including the overload name; the value for the `func` key must be a full
  operator schema (namespace, operator name, operator overload name, and schema
  string). For the schema syntax, please refer to these
  [instructions](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/README.md).
* `kernels`: defines the kernel information. It consists of `arg_meta` and
  `kernel_name`, which are bound together to express "for input tensors with
  this metadata, use this kernel".
* `type_alias` (optional): gives aliases to possible dtype options. `T0:
  [Double, Float]` means `T0` can be either `Double` or `Float`.
* `dim_order_alias` (optional): similar to `type_alias`, gives names to
  possible dim order options.

Attributes under `kernels`:
* `arg_meta`: a list of "tensor arg name" entries, whose values are the dtype
  and dim order aliases implemented by the corresponding `kernel_name`. A value
  of `null` means that the kernel will be used for all input types.
* `kernel_name`: the expected name of the C++ function that will implement this
  operator. You can put whatever you want here, but you should follow the
  convention of replacing the `.` in the overload name with an underscore, and
  lowercasing all characters. For example, `add.out` uses the C++ function
  named `add_out`, and `add.Scalar_out` would become `add_scalar_out`, with a
  lowercase `s`. Namespaces are supported for kernels, but note that `native::`
  is inserted into the last level of the namespace, so `custom::add_out` in
  `kernel_name` will point to `custom::native::add_out`.

### Find operator base name

The base name is the part of the operator name before the `.`, excluding any
trailing underscores. The rest of this document refers to this as `<name>`.

E.g., these operator overloads all have a base name of `add`:
- `add.Scalar`
- `add.Tensor`
- `add.out`
- `add_.Tensor`

So, if you were implementing `add.out`, your operator base name would be `add`,
and you would replace `<name>` with `add` everywhere below.

### Selective build

When using macros that require a `NAME` argument, e.g.
`#define ET_SWITCH_REAL_TYPES_AND(ADDITIONAL, TYPE, CONTEXT, NAME, CTYPE_ALIAS, ...)`,
make sure to pass in the same operator name defined in `functions.yaml`: the
base name plus the variant, e.g. `add.out` or `add.Scalar_out`. The operator
name is required for dtype selective build, which matches against the operator
names and dtypes present in a model. The sketch below shows this.
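Here is a minimal sketch of how that `NAME` argument is used inside a kernel.
`example_add_out` is hypothetical, and the `RuntimeContext` parameter and
`const_data_ptr`/`mutable_data_ptr` accessors are assumptions about the current
`Tensor` API rather than a prescribed pattern:

```
// Hypothetical kernel: the NAME argument is the functions.yaml operator name
// ("add.out"), not the C++ function name.
#include <executorch/runtime/kernel/kernel_includes.h>

namespace torch {
namespace executor {
namespace native {

Tensor& example_add_out(
    RuntimeContext& ctx,
    const Tensor& self,
    const Tensor& other,
    Tensor& out) {
  ET_SWITCH_REAL_TYPES_AND(
      Bool, out.scalar_type(), ctx, "add.out", CTYPE, [&]() {
        // Within this lambda, CTYPE is the C++ type for out's runtime dtype.
        const CTYPE* a = self.const_data_ptr<CTYPE>();
        const CTYPE* b = other.const_data_ptr<CTYPE>();
        CTYPE* o = out.mutable_data_ptr<CTYPE>();
        for (size_t i = 0; i < static_cast<size_t>(out.numel()); ++i) {
          o[i] = a[i] + b[i];
        }
      });
  return out;
}

} // namespace native
} // namespace executor
} // namespace torch
```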
### Overview of files and targets

For the operator base name `<name>`, you should work with these files. The
sections below give more details about what they should contain.

- `./kernels/portable/cpu/op_<name>.cpp`: The implementations of operator
  overloads with base name `<name>`. This is the file that clients will link
  into their runtimes.
- `./kernels/portable/CMakeLists.txt`: The CMake build file for all the
  `op_<name>.cpp` files in the same directory.
- `./kernels/test/op_<name>_test.cpp`: Unit tests for the operator overloads
  with base name `<name>`.
  - Note that tests under this directory are specific to the portable kernels.
    To share tests between multiple kernels, put them in `../test`.
  - Note that the tests do not live under `cpu`; tests should be
    implementation-agnostic. This lets us run the same tests against all
    implementations of a given operator, which should behave identically.
- `./kernels/test/CMakeLists.txt`: The CMake build file for all the
  `op_<name>_test.cpp` files in the same directory.

For an example, see the `add` operator (note that these are slightly different
from the `add` examples in this doc):
- [`executorch/kernels/portable/cpu/op_add.cpp`](https://github.com/pytorch/executorch/blob/main/kernels/portable/cpu/op_add.cpp):
  Implementations.
- [`./kernels/portable/CMakeLists.txt`](https://github.com/pytorch/executorch/blob/main/kernels/portable/CMakeLists.txt):
  Build portable ops.
- [`executorch/kernels/test/op_add_test.cpp`](https://github.com/pytorch/executorch/blob/main/kernels/test/op_add_test.cpp):
  Unit tests.
- [`./kernels/test/CMakeLists.txt`](https://github.com/pytorch/executorch/blob/main/kernels/test/CMakeLists.txt):
  Build kernel tests.

### Add the operator implementation to CMakeLists.txt

The portable operator files are collected by
[`./kernels/portable/CMakeLists.txt`](https://github.com/pytorch/executorch/blob/main/kernels/portable/CMakeLists.txt)
with a glob on `./kernels/portable/cpu/*.cpp`. Ensure that your operator file
is in that directory.

NOTE: A given `op_<name>` cannot implement both ATen-compatible and
non-ATen-compatible (i.e., custom) operators. We suggest adding the suffix
`_custom` if necessary: e.g., `op_add` for ATen-compatible overloads of the
`add` operator, and `op_add_custom` for non-ATen-compatible overloads.

NOTE: An `op_<name>` may not have dependencies outside of `//executorch`.
This library is intended to be portable, open-sourceable, and self-contained.

### Create a skeleton .cpp file for the operator implementation

If not already present, create the file
`executorch/kernels/portable/cpu/op_<name>.cpp`, which should follow this
pattern:
```
// Copyright (c) Meta Platforms, Inc. and affiliates.
#include <executorch/runtime/kernel/kernel_includes.h>

namespace torch {
namespace executor {
namespace native {

namespace {
// <helper code>
} // namespace

// <operator overload implementations>

} // namespace native
} // namespace executor
} // namespace torch
```

### Find the function signature for the operator overload

When you add an entry to the YAML file, the codegen tools will generate an
expected function signature for you to implement in a file called
`NativeFunctions.h`. To build and find that generated header:

1. Build ExecuTorch:
```
cmake -DCMAKE_INSTALL_PREFIX=cmake-out \
    -DCMAKE_BUILD_TYPE=Release \
    -DPYTHON_EXECUTABLE=python \
    -Bcmake-out .
cmake --build cmake-out -j9 --target install --config Release
```
2. Find the generated `NativeFunctions.h` at:
```
cmake-out/kernels/portable/portable_ops_lib/NativeFunctions.h
```

Since this header is generated from the YAML files, re-run the build whenever
you modify your operator's entry in those files.

Open the file and look for the function with the same name that you added to
the YAML file earlier.
For `add_out`, it might look like:
```
TORCH_API torch::executor::Tensor & add_out(const at::Tensor & self, const at::Tensor & other, at::Tensor & out);
```

This is the function signature that you will need to implement.

### Add a stub implementation

Now that you have your function signature, add a stub to the `op_<name>.cpp`
file that just returns the `out` argument. For example:
```
Tensor& add_out(
    const Tensor& self,
    const Tensor& other,
    Tensor& out) {
  return out;
}
```

Note that you should drop the `TORCH_API` attribute, and should drop the
`at::` prefixes.

### Create a skeleton test .cpp file

If not already present, create the file
`executorch/kernels/test/op_<name>_test.cpp`. Here's a suggested starting
point:
```
// Copyright (c) Meta Platforms, Inc. and affiliates.

#include <executorch/kernels/test/FunctionHeaderWrapper.h> // Declares the operator
#include <executorch/runtime/core/exec_aten/exec_aten.h>
#include <executorch/runtime/core/exec_aten/testing_util/tensor_factory.h>
#include <executorch/runtime/core/exec_aten/testing_util/tensor_util.h>

#include <gtest/gtest.h>

using namespace ::testing;
using exec_aten::ScalarType;
using exec_aten::Tensor;
using torch::executor::native::<operator_function_name>;
using torch::executor::testing::IsCloseTo;
using torch::executor::testing::TensorFactory;

TEST(Op<Name>Test, SmokeTest) {
  TensorFactory<ScalarType::Int> tf;

  Tensor a = tf.make(/*sizes=*/{2, 2}, /*data=*/{1, 1, 1, 1});
  Tensor b = tf.ones(/*sizes=*/{2, 2});
  Tensor z = tf.zeros(/*sizes=*/{2, 2});

  EXPECT_EQ(a, b); // Exact equality
  EXPECT_THAT(a, IsCloseTo(b)); // For floating-point tensors

  EXPECT_NE(a, z);
  EXPECT_THAT(a, Not(IsCloseTo(z)));
}
```

### Add the operator test to CMakeLists.txt

Now, add the test to
[`executorch/kernels/test/CMakeLists.txt`](https://github.com/pytorch/executorch/blob/main/kernels/test/CMakeLists.txt),
which builds all of the kernel tests.

For portable kernels, add your test file to
[`all_test_sources`](https://github.com/pytorch/executorch/blob/main/kernels/test/CMakeLists.txt#L69).

For optimized kernels, add your test file to
[`_optimized_kernels_test_sources`](https://github.com/pytorch/executorch/blob/main/kernels/test/CMakeLists.txt#L230).

### Implement and test the operator

You should now be able to implement and test your operator. It's helpful to see
how other operators do it, so take a look at `op_add`:
- [`executorch/kernels/portable/cpu/op_add.cpp`](https://github.com/pytorch/executorch/blob/main/kernels/portable/cpu/op_add.cpp)
- [`executorch/kernels/test/op_add_test.cpp`](https://github.com/pytorch/executorch/blob/main/kernels/test/op_add_test.cpp)

Check out how it uses helper macros like `ET_CHECK_SAME_SHAPE_AND_DTYPE` and
`ET_FORALL_REAL_TYPES` when implementing the operator, and test helpers like
`TensorFactory` and `IsCloseTo()` when testing. The sketch below shows the
typical dtype-dispatch pattern.
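For a flavor of those macros, here is a minimal, hypothetical unary operator.
`my_abs_out` is not a real operator in this library, and the data accessors are
assumptions about the current `Tensor` API:

```
// Hypothetical operator: dispatches on dtype with ET_FORALL_REAL_TYPES after
// validating with ET_CHECK_SAME_SHAPE_AND_DTYPE2 (the two-tensor variant).
#include <executorch/runtime/kernel/kernel_includes.h>

namespace torch {
namespace executor {
namespace native {

namespace {
// Templated helper: elementwise absolute value over the flat data.
template <typename CTYPE>
void abs_data(const Tensor& in, Tensor& out) {
  const CTYPE* in_data = in.const_data_ptr<CTYPE>();
  CTYPE* out_data = out.mutable_data_ptr<CTYPE>();
  for (size_t i = 0; i < static_cast<size_t>(in.numel()); ++i) {
    out_data[i] =
        in_data[i] < 0 ? static_cast<CTYPE>(-in_data[i]) : in_data[i];
  }
}
} // namespace

Tensor& my_abs_out(const Tensor& self, Tensor& out) {
  // Dies via ET_CHECK (no exceptions) if the shapes or dtypes differ.
  ET_CHECK_SAME_SHAPE_AND_DTYPE2(self, out);

// Expands to one `case ScalarType::dtype:` per real dtype.
#define MY_ABS_IMPL(ctype, dtype) \
  case ScalarType::dtype:         \
    abs_data<ctype>(self, out);   \
    break;

  switch (self.scalar_type()) {
    ET_FORALL_REAL_TYPES(MY_ABS_IMPL)
    default:
      ET_CHECK_MSG(false, "Unsupported dtype");
  }
#undef MY_ABS_IMPL

  return out;
}

} // namespace native
} // namespace executor
} // namespace torch
```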
Once you have your operator and corresponding tests in place, try it out.

1. Build ExecuTorch.
```
cmake . \
    -DCMAKE_INSTALL_PREFIX=cmake-out \
    -DEXECUTORCH_USE_CPP_CODE_COVERAGE=ON \
    -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
    -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
    -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
    -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
    -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON \
    -DEXECUTORCH_BUILD_DEVTOOLS=ON \
    -DEXECUTORCH_BUILD_VULKAN=OFF \
    -DEXECUTORCH_BUILD_XNNPACK=ON \
    -Bcmake-out

cmake --build cmake-out -j9 --target install
```
2. Build gtest.
```
mkdir -p third-party/googletest/build
cd third-party/googletest/build
cmake .. -DCMAKE_INSTALL_PREFIX=.
make -j4
make install
cd ../../../
```
3. Build the kernel tests.
```
cmake kernels/test \
    -DCMAKE_BUILD_TYPE=Debug \
    -DCMAKE_INSTALL_PREFIX=cmake-out \
    -DEXECUTORCH_USE_CPP_CODE_COVERAGE=ON \
    -DCMAKE_PREFIX_PATH="$(pwd)/third-party/googletest/build" \
    -Bcmake-out/kernels/test
cmake --build cmake-out/kernels/test -j9
```
4. Run the tests. You should see your test listed in the output.
```
./cmake-out/kernels/test/portable_kernels_test
./cmake-out/kernels/test/optimized_kernels_test
```

#### Implementation restrictions

To reduce dependencies and size, to ensure portability, and to conform to the
restrictions of embedded environments, your operator implementations:

- Must not include C++ stdlib headers or use C++ stdlib types. For example,
  `string`/`basic_string`, `vector`, `unordered_map`, `cout`, and `unique_ptr`
  must not be used.
- Must not dynamically allocate memory, or cause memory to be dynamically
  allocated. All non-stack memory must be provided as a function parameter by
  the caller, typically via an `out` parameter or another tensor parameter to
  be used as scratch space.
  - This includes direct calls to `new`, `malloc`, `realloc`, etc., as well as
    operations that allocate under the hood, like `make_unique` or the creation
    of a `vector` or `string`.
- Must be stateless.
- Must be thread-safe. Note that the ExecuTorch environment does not provide a
  locking construct, so this means that operator implementations must not
  modify global memory.
- Must work in an environment without threads. This, along with the
  statelessness requirement, means that thread-local storage must not be used.
- Must not use `stdout`, `stderr`, or other file/stream IO via `printf`/`cout`,
  etc.; instead, use `ET_LOG` from `executorch/runtime/platform/log.h` (see the
  sketch at the end of this section).
- Must not use `assert()`. Instead, use `ET_CHECK` and the other macros from
  `executorch/runtime/platform/assert.h`.
- Must not raise exceptions. Instead, use `ET_CHECK` and the other macros from
  `executorch/runtime/platform/assert.h`.

Note that not all of these apply to *every* ExecuTorch-compatible operator
implementation, only to those included in this portable library.

For example, a target-specific custom operator that initiates a DMA copy would
be stateful, and would probably modify global memory, but it would need to use
target-specific APIs to do so. Since this library is only for portable operator
implementations, however, the operators it contains can't depend on
target-specific APIs like that.
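As a sketch of that logging and assertion style (the function and messages are
purely illustrative):

```
// Illustrative only: portable-friendly logging and checks, with no stdio,
// asserts, or exceptions.
#include <cinttypes>

#include <executorch/runtime/platform/assert.h>
#include <executorch/runtime/platform/log.h>

void check_dim_example(int64_t dim, int64_t ndim) {
  // Instead of printf()/std::cout:
  ET_LOG(Info, "Checking dim %" PRId64 " of %" PRId64, dim, ndim);
  // Instead of assert() or throwing: ET_CHECK_MSG aborts with a message.
  ET_CHECK_MSG(
      dim >= 0 && dim < ndim,
      "dim %" PRId64 " out of range [0, %" PRId64 ")",
      dim,
      ndim);
}
```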
### Shared kernel tests (executorch/kernels/test)

The portable kernel implementations and their corresponding tests can be used
as references for other kernels. We can also share the test cases in
`//executorch/kernels/test`, which contains common resources for kernel
testing.

*generate_wrapper* generates a header, `FunctionHeaderWrapper.h`, which simply
includes the `Functions.h` file for the specified kernel
(`#include <executorch/kernels/{}/Functions.h>`). With that, the test sources
don't need to know which kernel is being tested or which `Functions.h` to use.

With *_common_op_test*, we use a single test source file (`op_<op>_test.cpp`)
in this directory. It automatically finds the corresponding registered dispatch
function through `Functions.h`, so it can be used to test multiple kernels.

Kernel-specific test cases go in `<kernel>/test/`.

*supported_features* is used to distinguish between the features of different
kernels. For example, ATen supports mixing input and output dtypes, while the
portable kernels don't. When a portable test expects death in such a case, it
can check the supported features of the running kernel and bypass the check if
the feature is supported; see the sketch below.
- The default values of the supported features are in
  `test/supported_features.yaml`.
- Each kernel overrides its supported features in
  `<kernel>/test/supported_features_def.yaml`. See the example in
  `supported_features_def_example.yaml`.
- This ensures that all kernels can share the same C++ test case sources.
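Here is a hedged sketch of that bypass pattern in a test. The `is_aten` field
and the test itself are illustrative; check `supported_features.yaml` and the
generated code for the real field names:

```
// Illustrative only: skip a portable-specific death test when the kernel
// under test supports the feature being exercised.
#include <executorch/kernels/test/supported_features.h>
#include <gtest/gtest.h>

TEST(OpFooOutTest, MismatchedOutputDtypeDies) {
  if (torch::executor::testing::SupportedFeatures::get()->is_aten) {
    // ATen kernels allow mixed input/output dtypes, so there is nothing to
    // check here; bypass instead of expecting death.
    GTEST_SKIP();
  }
  // EXPECT_DEATH(...) for kernels that must reject mismatched dtypes.
}
```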