# Microkernel naming conventions

This document deciphers XNNPACK's microkernel naming convention.

## General conventions

Microkernel function names follow this convention:

`xnn_<datatype>_<microkernel><activation?>_ukernel_<parameters>__<arch>`

Where `<datatype>` can be:

- `cs16`
- `f16` - 16-bit half precision float
- `f32` - 32-bit single precision float
- `qc8`
- `qs8` - quantized signed 8 bit
- `qu8` - quantized unsigned 8 bit
- `s16`
- `u32`
- `x8`
- `x16`
- `x24`
- `x32`
- `xx`

`<microkernel>` is the type of microkernel, such as:

- `gemm`
- `igemm`
- `avgpool`

`<activation>`, if supported for the microkernel, is the activation that is
fused into the microkernel:

- `linear`
- `minmax`
- `relu`

`<parameters>` are microkernel specific, and can mean different things depending
on the microkernel (see below for details).

`<arch>` is the architecture the microkernel is optimized for, and can contain
further subdivisions for additional instruction sets supported on the specified
architecture, or processor information:

- `scalar`
- `aarch32_neon_cortex_a55`
- `neonv8_mlal`
- `wasm`
- `avx512`
- `avx512skx`

## GEMM and IGEMM microkernels

The `<parameters>` for GEMM and IGEMM microkernels represent the `mr` and `nr`
of the microkernel. You can think of them as the number of rows and columns of
the output tile calculated by the microkernel.

E.g. `xnn_f32_gemm_minmax_ukernel_4x8__aarch32_neon_cortex_a7` has `mr` = 4 and
`nr` = 8, so it processes 4 × 8 = 32 elements of the output matrix.

## Average Pooling and Global Average Pooling

These microkernels come in two varieties: uni-pass and multi-pass.

Uni-pass microkernels have `Cx` in their name, where `C` is a number. Such a
microkernel processes up to and including `C` elements.

Multi-pass microkernels have `CpDx` in their name, where `C` and `D` are
numbers. Such a microkernel processes `D` elements in the first pass and in each
middle pass (which can run multiple times), and up to `C` elements in the last
pass.

E.g. `xnn_f32_avgpool_minmax_ukernel_9x__neon_c4` is a uni-pass microkernel that
can process up to 9 elements.
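
## Example: decoding a microkernel name

The conventions above can be sketched as a small Python parser. This is an illustration only and not part of XNNPACK: the helper names (`parse_ukernel_name`, `gemm_tile`, `pooling_passes`) are hypothetical, the datatype and activation sets contain only the values listed in this document, and the sketch assumes that everything after the double underscore (including a suffix such as `_c4`) belongs to `<arch>`.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Only the values listed in this document; the real set may be larger.
DATATYPES = {"cs16", "f16", "f32", "qc8", "qs8", "qu8", "s16",
             "u32", "x8", "x16", "x24", "x32", "xx"}
ACTIVATIONS = {"linear", "minmax", "relu"}


class UkernelName(NamedTuple):
    datatype: str
    microkernel: str
    activation: Optional[str]  # None when no activation is fused
    parameters: str
    arch: str


def parse_ukernel_name(name: str) -> UkernelName:
    """Split a name of the form
    xnn_<datatype>_<microkernel><activation?>_ukernel_<parameters>__<arch>."""
    head, sep, tail = name.partition("_ukernel_")
    if not sep or not head.startswith("xnn_"):
        raise ValueError(f"not a microkernel name: {name!r}")
    # Assumption: the first double underscore separates parameters from arch.
    parameters, sep, arch = tail.partition("__")
    if not sep or not parameters or not arch:
        raise ValueError(f"missing parameters or arch: {name!r}")
    parts = head[len("xnn_"):].split("_")
    datatype, rest = parts[0], parts[1:]
    if datatype not in DATATYPES:
        raise ValueError(f"unknown datatype: {datatype!r}")
    # The activation, when present, is the last token before "_ukernel_".
    activation = rest.pop() if rest and rest[-1] in ACTIVATIONS else None
    if not rest:
        raise ValueError(f"missing microkernel type: {name!r}")
    return UkernelName(datatype, "_".join(rest), activation, parameters, arch)


def gemm_tile(parameters: str) -> Tuple[int, int]:
    """Return (mr, nr) for a GEMM/IGEMM parameter string such as '4x8'."""
    m = re.fullmatch(r"(\d+)x(\d+)", parameters)
    if m is None:
        raise ValueError(f"not an <mr>x<nr> string: {parameters!r}")
    return int(m.group(1)), int(m.group(2))


def pooling_passes(parameters: str) -> Tuple[int, Optional[int]]:
    """Return (C, D) for 'Cx' (uni-pass, D is None) or 'CpDx' (multi-pass)."""
    m = re.fullmatch(r"(\d+)(?:p(\d+))?x", parameters)
    if m is None:
        raise ValueError(f"not a Cx or CpDx string: {parameters!r}")
    c = int(m.group(1))
    d = int(m.group(2)) if m.group(2) else None
    return c, d
```

With these helpers, parsing `xnn_f32_gemm_minmax_ukernel_4x8__aarch32_neon_cortex_a7` yields datatype `f32`, microkernel `gemm`, activation `minmax`, parameters `4x8`, and arch `aarch32_neon_cortex_a7`; `gemm_tile("4x8")` then gives `(4, 8)`, matching the 32 output elements mentioned above. Likewise, `pooling_passes("9x")` gives `(9, None)` for the uni-pass average pooling example.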