# Custom Operator Registration Examples

This folder contains examples of registering custom operators into PyTorch, as well as registering their kernels into the ExecuTorch runtime.

## How to run

Prerequisite: finish the [setup guide](https://pytorch.org/executorch/stable/getting-started-setup).

Run:

```bash
cd executorch
bash examples/portable/custom_ops/test_custom_ops.sh
```

## AOT registration

To use custom ops in the ExecuTorch AOT flow (EXIR), the first option is to register the custom ops into the PyTorch JIT runtime using the `torch.library` APIs.

We can see an example in `custom_ops_1.py`, where we register `my_ops::mul3` and `my_ops::mul3_out`. `my_ops` is the namespace, and it shows up in the way we call the operator, e.g., `torch.ops.my_ops.mul3.default`. For more information about PyTorch operators, check out [`pytorch/torch/_ops.py`](https://github.com/pytorch/pytorch/blob/main/torch/_ops.py).
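
A minimal sketch of this style of registration (the actual code lives in `custom_ops_1.py`; the multiply-by-3 implementation body here is illustrative):

```python
import torch
from torch.library import Library, impl

# Define an operator library under the "my_ops" namespace.
my_op_lib = Library("my_ops", "DEF")
my_op_lib.define("mul3(Tensor input) -> Tensor")

# Register an implementation for the functional variant.
@impl(my_op_lib, "mul3", dispatch_key="CompositeExplicitAutograd")
def mul3_impl(a: torch.Tensor) -> torch.Tensor:
    return a * 3

# The op is now reachable through the torch.ops namespace.
print(torch.ops.my_ops.mul3.default(torch.ones(2)))  # tensor([3., 3.])
```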

Notice that we need both the functional variant and the out variant for custom ops, because EXIR needs to perform memory planning on the out variant `my_ops::mul3_out`.
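
Continuing the sketch above, the matching out variant might look like the following (again illustrative; the exact schema should mirror what `custom_ops_1.py` defines):

```python
# Out variant: writes into a caller-provided tensor. This is the variant
# EXIR memory planning operates on, since the output buffer is explicit.
my_op_lib.define("mul3.out(Tensor input, *, Tensor(a!) output) -> Tensor(a!)")

@impl(my_op_lib, "mul3.out", dispatch_key="CompositeExplicitAutograd")
def mul3_out_impl(a: torch.Tensor, *, output: torch.Tensor) -> torch.Tensor:
    output.copy_(a)
    output.mul_(3)
    return output
```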

The second option is to register the custom ops into the PyTorch JIT runtime using the C++ APIs (`TORCH_LIBRARY`/`TORCH_LIBRARY_IMPL`). This means writing C++ code that depends on `libtorch`.

We added an example in `custom_ops_2.cpp`, where we implement and register `my_ops::mul4`, along with `custom_ops_2_out.cpp`, which implements `my_ops::mul4_out`.

By linking both against `libtorch` and the `executorch` library, we can build a shared library, `libcustom_ops_aot_lib_2`, that can be dynamically loaded by the Python environment to register these ops into PyTorch. This is done by `torch.ops.load_library(<path_to_libcustom_ops_aot_lib_2>)` in `custom_ops_2.py`.
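
In spirit, the loading step is just the following sketch (the path is hypothetical and depends on where your build writes the shared library):

```python
import torch

# Hypothetical build-output path; substitute the actual location of the
# shared library on your machine.
lib_path = "cmake-out/examples/portable/custom_ops/libcustom_ops_aot_lib_2.so"

# Loading the library runs its static initializers, which execute the
# TORCH_LIBRARY registration code and make my_ops::mul4 visible to PyTorch.
torch.ops.load_library(lib_path)

print(torch.ops.my_ops.mul4.default(torch.ones(2)))  # tensor([4., 4.])
```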

## C++ kernel registration

After the model is exported by EXIR, we need C++ implementations of these custom ops in order to run it. For example, `custom_ops_1_out.cpp` is a C++ kernel that can be plugged into the ExecuTorch runtime. Beyond that, we also need a way to bind the PyTorch op to this kernel. This binding is specified in `custom_ops.yaml`:
```yaml
- func: my_ops::mul3.out(Tensor input, *, Tensor(a!) output) -> Tensor(a!)
  kernels:
    - arg_meta: null
      kernel_name: custom::mul3_out_impl # sub-namespace native:: is auto-added
```
For details on how to write these YAML entries, please refer to [`kernels/portable/README.md`](https://github.com/pytorch/executorch/blob/main/kernels/portable/README.md).

Currently we use CMake as the build system to link the `my_ops::mul3.out` kernel (implemented in `custom_ops_1_out.cpp`) into the ExecuTorch runtime. See the instructions in `examples/portable/custom_ops/test_custom_ops.sh` (`test_cmake_custom_op_1`).

## Selective build

Note that we have defined custom ops for both `my_ops::mul3.out` and `my_ops::mul4.out` in `custom_ops.yaml`. To reduce binary size, we can choose to register only the operators used in the model. This is done by passing a list of operators to the `gen_oplist` custom rule, for example: `--root_ops="my_ops::mul4.out"`.

We then let the custom ops library depend on this target, so that only the ops we want are registered.

For more information about selective build, please refer to [`selective_build.md`](../../../docs/source/kernel-library-selective-build.md).