The quantized folder holds the implementation of the low-level quantized kernels.
The kernels are registered in the `torch::_ops` namespace and operate on the quantized `at::Tensor` data type.
You can learn more about quantized tensors in the [quantized tensor API wiki](https://github.com/pytorch/pytorch/wiki/Introducing-Quantized-Tensor) page.

This document serves as an entry point for quantized kernel implementation.

## Implementing native quantized ops

The new quantized ops are almost always located under the `ATen/native/quantized/cpu` folder. For
the sake of an example, let us implement an element-wise quantized [logical XAND](https://en.wiktionary.org/wiki/XAND)
operation under `ATen/native/quantized/cpu/qxand.cpp`.

### Step 0. Implement the quantized function

Before writing the quantized kernel and registering it, let us implement a quantized function.
It will serve as a basis for the discussion that follows.
The snippet below shows the implementation of a quantized XAND operator, with support for all implemented quantized types.

```c++
Tensor quantized_xand(Tensor qa, Tensor qb) {
  // Some type checks for qa and qb should be here...
  Tensor qc;
  double scale = qa.q_scale();
  int64_t zero_point = qa.q_zero_point();

  AT_DISPATCH_QINT_TYPES(qa.scalar_type(), "quantized_xand", [&]() {
    // Allocate the output with the same quantization parameters as the input,
    // then iterate over the inputs and write the result element-wise.
    qc = at::_empty_affine_quantized(
        qa.sizes(), at::device(kCPU).dtype(SCALAR_TYPE), scale, zero_point);
    auto iter = TensorIterator::binary_op(qc, qa, qb);
    cpu_kernel(iter, [&](scalar_t a_value, scalar_t b_value) -> scalar_t {
      return scalar_t(a_value.val_ & b_value.val_);
    });
  });
  return qc;
}
```

The code above is fairly straightforward:
it takes two quantized tensors `qa` and `qb`, and uses `cpu_kernel` to produce a quantized tensor `qc`.
We also use the [`TensorIterator`](https://caffe2.ai/doxygen-c/html/structat_1_1_tensor_iterator.html) in this example.
The only part that requires explicit explanation is `AT_DISPATCH_QINT_TYPES`.
This macro makes sure that the underlying code works with all quantized types.
It provides several useful "aliases":

- `SCALAR_TYPE` -- `ScalarType` of the quantized tensor (e.g. `kQInt8`)
- `scalar_t` -- quantized data type (dtype, e.g. `qint8`)
- `underlying_t` -- underlying POD data type (dtype, e.g. `int8_t`)

The macro takes three arguments:

1. Quantized data type. This defines what the "aliases" resolve to.
In the example above they are derived from `qa.scalar_type()`, so the resulting tensor has the same dtype as `qa`.
2. Function name. This argument is currently used only for error reporting.
3. Implementation lambda. The main implementation should sit in the body of this lambda.
It should also use the aliases for the quantized data types instead of the explicit data types (see the sketch after this list).
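
For illustration only, the sketch below uses a hypothetical helper (not part of the XAND kernel) to show what each alias refers to inside the dispatch lambda:

```c++
// Hypothetical helper, shown only to illustrate the aliases provided by the macro.
void inspect_first_value(const at::Tensor& qa) {
  AT_DISPATCH_QINT_TYPES(qa.scalar_type(), "inspect_first_value", [&]() {
    // SCALAR_TYPE  -- at::ScalarType value, e.g. at::kQInt8
    // scalar_t     -- quantized wrapper type, e.g. c10::qint8
    // underlying_t -- raw storage type, e.g. int8_t
    underlying_t raw = qa.data_ptr<scalar_t>()[0].val_;
    TORCH_CHECK(qa.scalar_type() == SCALAR_TYPE);
    (void)raw;
  });
}
```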

### Step 1. Define the schema

Update `aten/src/ATen/native/quantized/library.cpp` and add
a `def` for your new operator:

```c++
TORCH_LIBRARY(quantized, m) {
  // ... the existing definitions ...
  m.def("quantized::xand(Tensor qa, Tensor qb) -> Tensor");
}
```

`def` takes a **function schema string**, which describes the usage of the op.
In the example above the schema is `"quantized::xand(Tensor qa, Tensor qb) -> Tensor"`.
This translates to a `torch._ops.ops.quantized.xand` function in Python with the appropriate signature.
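
The schema language also supports extra scalar arguments and named overloads. For instance, a hypothetical variant of our op that lets the caller pick the output quantization parameters (similar to how `quantized::add` is declared) could be added to the same `TORCH_LIBRARY` block:

```c++
// Hypothetical overload, shown only to illustrate the schema syntax:
// extra scalar arguments select the output quantization parameters.
m.def("quantized::xand.scale(Tensor qa, Tensor qb, float output_scale, int output_zero_point) -> Tensor");
```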
### Step 2. Register the implementation

The registration is done using `TORCH_LIBRARY_IMPL`, which binds the kernel to a dispatch key (`QuantizedCPU` here):

```c++
TORCH_LIBRARY_IMPL(quantized, QuantizedCPU, m) {
  m.impl("xand", TORCH_FN(quantized_xand));
}
```
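
For completeness, a CUDA kernel (if one existed) would be registered the same way under the `QuantizedCUDA` dispatch key. A sketch, assuming a hypothetical `quantized_xand_cuda` implementation:

```c++
// Hypothetical: assumes a quantized_xand_cuda kernel has been implemented.
TORCH_LIBRARY_IMPL(quantized, QuantizedCUDA, m) {
  m.impl("xand", TORCH_FN(quantized_xand_cuda));
}
```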

### Step 2b. [Optional] Registering the operation with the `native_functions.yaml`

In some cases, if the signature of the quantized function and its non-quantized counterpart are the same, it is worth adding it to `ATen/native/native_functions.yaml`.
A detailed explanation of this file can be found [here](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/README.md).

**If adding a new entry to the `native_functions.yaml`:**

```yaml
- func: quantized_xand(Tensor qa, Tensor qb) -> Tensor
  dispatch:
    QuantizedCPU: quantized_xand
```

**If adding to an existing entry in the `native_functions.yaml`:**

If an entry already exists in the YAML file and you would like to add a quantized kernel to it, you can just add a new dispatch entry.
For example, let's assume an `xand` function already existed in the YAML file.
In that case, the modification would look like this:

```yaml
- func: xand(Tensor a, Tensor b) -> Tensor
  dispatch:
    CPU: _xand_cpu     # Assume this existed
    CUDA: _xand_cuda   # Assume this existed
    QuantizedCPU: quantized_xand
```
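
Once the entry exists in `native_functions.yaml`, the code generator emits the usual C++ bindings, so the kernel is also reachable through the regular `at::` API. A sketch, assuming the hypothetical `xand` entry above:

```c++
// Hypothetical: relies on the `xand` entry sketched above being present
// in native_functions.yaml at build time.
at::Tensor qc = at::xand(qa, qb);  // routes to quantized_xand for QuantizedCPU inputs
```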

### Putting it all together

The final file `ATen/native/quantized/cpu/qxand.cpp` would look as follows:

```c++
#include <ATen/ATen.h>
#include <ATen/NativeFunctions.h> // Needed for the `native_functions.yaml` registration
#include <ATen/core/Type.h>
#include <torch/library.h>
#include <ATen/native/TensorIterator.h>
#include <ATen/native/cpu/Loops.h>

namespace at {
  namespace native {
  Tensor quantized_xand(Tensor qa, Tensor qb) {
    Tensor qc;
    // The awesome op implementation from Step 0 goes here...
    return qc;
  }

  TORCH_LIBRARY_IMPL(quantized, QuantizedCPU, m) {
    m.impl("xand", TORCH_FN(quantized_xand));
  }
}}  // namespace at::native
```

### Step 3. Administrative stuff

Before the op can be used, it needs to be compiled.
If the op is placed under `native/quantized/cpu`, this is already done for you.
However, if the location is changed, two files must be modified:

- *`caffe2/aten/TARGETS`* -- You can follow the existing example and add your path somewhere in that file. Notice that this file already lists the paths to the quantized source files:
```bash
ATEN_NATIVE_CPP = glob([
#...
  "src/ATen/native/quantized/**/*.cpp",
])
```

- *`caffe2/aten/src/ATen/CMakeLists.txt`* -- Again, following the example, you must add your paths.
The current quantization paths are added as follows:

```bash
FILE(GLOB native_quantized_cpp
          "native/quantized/*.cpp"
          "native/quantized/cpu/*.cpp")
```

## Using quantized ops

### Python

Usage in Python is pretty easy.
To implement the Python quantized function using our kernel, you can do the following:

```python
from torch._ops import ops

def quantized_xand(qa, qb):
  # Notice the schema changed from `quantized::xand` to `quantized.xand`
  return ops.quantized.xand(qa, qb)
```
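
As a usage sketch (assuming the kernel above has been compiled into your PyTorch build; the scale and zero point below are arbitrary):

```python
import torch

a, b = torch.rand(4, 4), torch.rand(4, 4)
qa = torch.quantize_per_tensor(a, scale=0.1, zero_point=0, dtype=torch.quint8)
qb = torch.quantize_per_tensor(b, scale=0.1, zero_point=0, dtype=torch.quint8)

qc = quantized_xand(qa, qb)  # dispatches to the QuantizedCPU kernel registered above
```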

**Note:** If writing new PyTorch functions that use quantized kernels,
it is strongly encouraged to place them in `torch/ao/nn/quantized/functional.py`.

### C++

You should not need to use the registered kernels in C++.
Although **officially not supported**, you can use the following:

```c++
Tensor quantized_xand(Tensor qa, Tensor qb) {
  static const c10::OperatorHandle op =
      c10::Dispatcher::singleton().findSchema({"quantized::xand", ""}).value();
  return op.typed<Tensor(Tensor, Tensor)>().call(qa, qb);
}
```
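
A usage sketch, assuming the wrapper above is linked together with the registered kernel; the quantization parameters below are arbitrary:

```c++
// Usage sketch: arbitrary quantization parameters, quint8 storage assumed.
at::Tensor a = at::rand({4, 4});
at::Tensor b = at::rand({4, 4});
at::Tensor qa = at::quantize_per_tensor(a, /*scale=*/0.1, /*zero_point=*/0, at::kQUInt8);
at::Tensor qb = at::quantize_per_tensor(b, /*scale=*/0.1, /*zero_point=*/0, at::kQUInt8);
at::Tensor qc = quantized_xand(qa, qb);
```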