fuser - OpenGrok cross reference for /aosp_15_r20/external/pytorch/torch/csrc/jit/codegen/fuser/

# PyTorch Fuser

The fuser accepts subgraphs wrapped in "fusion nodes" and tries to execute them by just-in-time (JIT) compiling kernels that run all of the subgraph's operations.
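To illustrate what fusing operations into one kernel buys, the sketch below compares an unfused and a fused version of the elementwise expression `a * b + c`. It is a hand-written C++ analogy, not code the fuser generates: the function names are hypothetical and plain `std::vector<float>` stands in for tensors. The unfused version makes two passes and materializes an intermediate buffer, while the fused version does the same arithmetic in a single pass.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unfused (hypothetical): each operation is a separate pass ("kernel"),
// and the intermediate result is written to memory between them.
std::vector<float> mulAddUnfused(const std::vector<float>& a,
                                 const std::vector<float>& b,
                                 const std::vector<float>& c) {
  assert(a.size() == b.size() && b.size() == c.size());
  std::vector<float> tmp(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) tmp[i] = a[i] * b[i];   // pass 1: multiply
  std::vector<float> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) out[i] = tmp[i] + c[i];  // pass 2: add
  return out;
}

// Fused (hypothetical): one pass performs both operations, and no
// intermediate buffer is allocated.
std::vector<float> mulAddFused(const std::vector<float>& a,
                               const std::vector<float>& b,
                               const std::vector<float>& c) {
  assert(a.size() == b.size() && b.size() == c.size());
  std::vector<float> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] * b[i] + c[i];
  return out;
}
```

Both compute `a * b + c` elementwise; the fused loop skips one full write and read of the intermediate, which is the kind of memory traffic a fused kernel avoids.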

## Code Organization

The fuser is designed hierarchically, with device-independent logic eventually deferring to device-specific logic and implementation. The device-specific code is (mostly) found in each device's subdirectory. The device-independent logic has six components:

* The Interface (interface.h/cpp) has functions to register and run fusions, interrogate fusion functionality, and perform debugging.
* The Compiler (compiler.h/cpp) performs "upfront" and "runtime" compilation. When fusions are registered, upfront compilation produces fallback code and performs some shape inference. When a fusion is run, runtime compilation invokes code generation and the device-specific compilation logic.
* The Code Generator (codegen.h/cpp) produces the string to be compiled on the device.
* The Executor (executor.h/cpp) runs requested fusions. It performs shape inference, expands tensors as necessary, determines the device to run on, acquires a cached compiled kernel or requests that the Compiler produce a new one, invokes device-specific code to launch the kernel, and updates the stack.
* The Fallback (fallback.h/cpp) runs subgraphs that can't be fused, either because shape inference couldn't determine a common tensor size or because the tensors are on a device that doesn't support fusion.
* The Kernel Specification Cache (kernel_cache.h/cpp) is a thread-safe cache holding the device-independent specifications produced during upfront compilation. These specifications each have their own thread-safe stores of compiled kernels that the Executor checks before requesting runtime compilation.
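The split between the Kernel Specification Cache and each specification's own kernel store can be sketched as follows. This is a minimal illustration under assumptions, not the fuser's actual code: `CompiledKernel`, `KernelSpec`, `KernelSpecCache`, `acquireKernel`, and the string "argument key" are all hypothetical, and the compiler is passed in as a callback. It shows the Executor checking a spec's store first and requesting runtime compilation only on a miss.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a device-specific compiled kernel.
struct CompiledKernel {
  std::string source;  // e.g. the string produced by code generation
};

// Hypothetical device-independent specification from upfront compilation.
// It owns a thread-safe store of kernels keyed by an argument signature.
class KernelSpec {
 public:
  std::shared_ptr<CompiledKernel> findKernel(const std::string& argKey) const {
    std::lock_guard<std::mutex> guard(mutex_);
    auto it = kernels_.find(argKey);
    return it == kernels_.end() ? nullptr : it->second;
  }

  // Stores the kernel unless one already exists; returns whichever is stored.
  std::shared_ptr<CompiledKernel> cacheKernel(const std::string& argKey,
                                              std::shared_ptr<CompiledKernel> kernel) {
    std::lock_guard<std::mutex> guard(mutex_);
    return kernels_.emplace(argKey, std::move(kernel)).first->second;
  }

 private:
  mutable std::mutex mutex_;
  std::unordered_map<std::string, std::shared_ptr<CompiledKernel>> kernels_;
};

// Hypothetical thread-safe cache of specs, keyed by an integer fusion key.
class KernelSpecCache {
 public:
  std::int64_t store() {
    std::lock_guard<std::mutex> guard(mutex_);
    const std::int64_t key = nextKey_++;
    specs_.emplace(key, std::make_unique<KernelSpec>());
    return key;
  }

  KernelSpec* retrieve(std::int64_t key) {
    std::lock_guard<std::mutex> guard(mutex_);
    auto it = specs_.find(key);
    return it == specs_.end() ? nullptr : it->second.get();
  }

 private:
  std::mutex mutex_;
  std::int64_t nextKey_ = 0;
  // unique_ptr keeps each KernelSpec (which holds a mutex) at a stable address.
  std::unordered_map<std::int64_t, std::unique_ptr<KernelSpec>> specs_;
};

// Hypothetical Executor step: reuse a cached kernel or request runtime compilation.
std::shared_ptr<CompiledKernel> acquireKernel(
    KernelSpecCache& cache, std::int64_t key, const std::string& argKey,
    const std::function<CompiledKernel(const std::string&)>& compile) {
  assert(compile && "a compile callback is required");
  KernelSpec* spec = cache.retrieve(key);
  if (spec == nullptr) return nullptr;                         // fusion never registered
  if (auto kernel = spec->findKernel(argKey)) return kernel;  // cache hit
  // Cache miss: compile, then publish; if another thread won the race, reuse its kernel.
  return spec->cacheKernel(argKey, std::make_shared<CompiledKernel>(compile(argKey)));
}
```

Locking each spec separately, rather than the whole cache, lets different fusions look up their kernels independently. If two threads miss at the same time, both compile, but `cacheKernel` keeps whichever result was stored first, so all callers share one kernel per argument key.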

The device-specific components have logic for compiling and running code in FusedKernelCPU (cpu/fused_kernel.h/cpp) and FusedKernelCUDA (cuda/fused_kernel.h/cpp).
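The hand-off from device-independent to device-specific logic can be pictured as an abstract kernel interface plus a per-device registry. In this sketch, `FusedKernelBase`, `CpuKernel`, `CudaKernel`, `registerBackend`, and `compileFor` are hypothetical names standing in for FusedKernelCPU, FusedKernelCUDA, and however they are hooked up; the real classes' interfaces are not reproduced, and no code is actually compiled.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

enum class DeviceType { CPU, CUDA };

// Hypothetical device-independent interface; each backend supplies a subclass.
class FusedKernelBase {
 public:
  explicit FusedKernelBase(std::string code) : code_(std::move(code)) {}
  virtual ~FusedKernelBase() = default;
  virtual std::string backend() const = 0;  // which device-specific path handles it
  const std::string& code() const { return code_; }

 private:
  std::string code_;  // generated source the backend would compile
};

class CpuKernel : public FusedKernelBase {
 public:
  using FusedKernelBase::FusedKernelBase;
  std::string backend() const override { return "cpu"; }  // would invoke a host compiler
};

class CudaKernel : public FusedKernelBase {
 public:
  using FusedKernelBase::FusedKernelBase;
  std::string backend() const override { return "cuda"; }  // would invoke a GPU compiler
};

using KernelFactory =
    std::function<std::unique_ptr<FusedKernelBase>(const std::string&)>;

std::map<DeviceType, KernelFactory>& backendRegistry() {
  static std::map<DeviceType, KernelFactory> registry;
  return registry;
}

void registerBackend(DeviceType device, KernelFactory factory) {
  assert(factory && "backend factory must be callable");
  backendRegistry()[device] = std::move(factory);
}

bool hasBackend(DeviceType device) { return backendRegistry().count(device) != 0; }

// Device-independent entry point: defers to whichever backend is registered.
std::unique_ptr<FusedKernelBase> compileFor(DeviceType device, const std::string& code) {
  auto it = backendRegistry().find(device);
  if (it == backendRegistry().end()) return nullptr;  // no backend: caller falls back
  return it->second(code);
}
```

A build with CUDA support would register `CudaKernel` the same way. When no backend is registered for a device, `compileFor` returns `nullptr`, which is the point where the Fallback would run the subgraph instead.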

| Name | Date | Size | #Lines | LOC |
|---|---|---|---|---|
| cpu/ | 25-Apr-2025 | - | 664 | 530 |
| cuda/ | 25-Apr-2025 | - | 734 | 589 |
| README.md | 25-Apr-2025 | 1.9 KiB | 17 | 11 |
| arg_spec.h | 25-Apr-2025 | 1.4 KiB | 57 | 38 |
| codegen.cpp | 25-Apr-2025 | 23.7 KiB | 687 | 556 |
| codegen.h | 25-Apr-2025 | 745 B | 25 | 17 |
| compiler.cpp | 25-Apr-2025 | 9.9 KiB | 299 | 226 |
| compiler.h | 25-Apr-2025 | 1.8 KiB | 57 | 40 |
| executor.cpp | 25-Apr-2025 | 13.4 KiB | 406 | 309 |
| executor.h | 25-Apr-2025 | 500 B | 20 | 12 |
| fallback.cpp | 25-Apr-2025 | 1.4 KiB | 48 | 37 |
| fallback.h | 25-Apr-2025 | 174 B | 12 | 6 |
| fused_kernel.h | 25-Apr-2025 | 3.2 KiB | 99 | 61 |
| interface.cpp | 25-Apr-2025 | 3.1 KiB | 108 | 79 |
| interface.h | 25-Apr-2025 | 1.7 KiB | 55 | 25 |
| kernel_cache.cpp | 25-Apr-2025 | 2.6 KiB | 89 | 63 |
| kernel_cache.h | 25-Apr-2025 | 994 B | 34 | 16 |
| kernel_spec.h | 25-Apr-2025 | 4.4 KiB | 148 | 109 |
| partition_desc.h | 25-Apr-2025 | 1.7 KiB | 59 | 41 |
| tensor_desc.h | 25-Apr-2025 | 2.6 KiB | 99 | 75 |
| tensor_info.h | 25-Apr-2025 | 536 B | 25 | 16 |