swiftshader/docs/SamplingRoutines.md

*03ce13f7SAndroid Build Coastguard WorkerSampling Routines
*03ce13f7SAndroid Build Coastguard Worker=================
*03ce13f7SAndroid Build Coastguard Worker
*03ce13f7SAndroid Build Coastguard WorkerIntroduction
*03ce13f7SAndroid Build Coastguard Worker------------
*03ce13f7SAndroid Build Coastguard Worker
*03ce13f7SAndroid Build Coastguard WorkerLike other modern real-time graphics APIs, Vulkan has support for [sampler objects](https://www.khronos.org/registry/vulkan/specs/1.2/html/vkspec.html#samplers) which provide the sampling state to be used by image reading and sampling instructions in the shaders. [Sampler descriptors](https://www.khronos.org/registry/vulkan/specs/1.2/html/vkspec.html#descriptorsets-sampler) contain or reference this state. The sampler descriptor is [combined](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpSampledImage) with an image descriptor, and this combination may only be known at shader execution time.
*03ce13f7SAndroid Build Coastguard Worker
*03ce13f7SAndroid Build Coastguard WorkerThis poses a challenge to SwiftShader's use of [dynamic code generation](Reactor.md), where we wish to specialize the sampling code for both the image's properties (most notably the format) and the sampler state. Historically, sampler state was either part of the texture objects or both the texture and sampler object were bound to a texture unit, ahead of shader execution.
*03ce13f7SAndroid Build Coastguard Worker
*03ce13f7SAndroid Build Coastguard WorkerJIT Trampolines
*03ce13f7SAndroid Build Coastguard Worker---------------
*03ce13f7SAndroid Build Coastguard Worker
*03ce13f7SAndroid Build Coastguard WorkerThe solution is to defer code generation for the sampling instructions until shader execution. For each image sampling operation we generate a call to a C++ function which will provide the specialized routine based on the image and sampler descriptor used at run-time. Then we call the returned routine.
*03ce13f7SAndroid Build Coastguard Worker
*03ce13f7SAndroid Build Coastguard WorkerNote that this differs from typical JIT-compilers' use of trampoline functions in that we generate code specific to the combination of state, and adapt it to changes in state dynamically.
*03ce13f7SAndroid Build Coastguard Worker
*03ce13f7SAndroid Build Coastguard Worker3-Level Caching
*03ce13f7SAndroid Build Coastguard Worker---------------
*03ce13f7SAndroid Build Coastguard Worker
*03ce13f7SAndroid Build Coastguard WorkerWe cache the generated sampling routines, using the descriptors as well as the type of sampling instruction, as the key. This is done at three levels, described in reverse order for easier understanding:
*03ce13f7SAndroid Build Coastguard Worker
*03ce13f7SAndroid Build Coastguard WorkerL3: At the third and last level, we use a generic least-recently-used (LRU) cache, just like the caches of the pipeline stages' routines. It is protected by a mutex, which may experience high contention due to all shader worker threads needing the sampling routines.
*03ce13f7SAndroid Build Coastguard Worker
*03ce13f7SAndroid Build Coastguard WorkerL2: To mitigate that, there's a second-level cache which contains a 'snapshot' of the last-level cache, which can be queried concurrently without locking. The snapshot is updated at pipeline barriers. While much faster than the last-level cache's critical section, the hash table lookup is still a lot of work per sampling instruction.
*03ce13f7SAndroid Build Coastguard Worker
*03ce13f7SAndroid Build Coastguard WorkerL1: Often the descriptors being used don't change between executions of the sampling instruction. Which is where the first-level or '[inline](https://en.wikipedia.org/wiki/Inline_caching)' cache comes in. It is a single-entry cache implemented at the compiled sampling instruction level. Before calling out to the C++ function to retrieve the routine, we check if the sampler and image descriptor haven't changed since the last execution of the instruction. Note that this cache doesn't use the instruction type as part of the lookup key, since each sampling instruction instance gets its own inline cache.
*03ce13f7SAndroid Build Coastguard Worker
*03ce13f7SAndroid Build Coastguard WorkerDescriptor Identifiers
*03ce13f7SAndroid Build Coastguard Worker----------------------
*03ce13f7SAndroid Build Coastguard Worker
*03ce13f7SAndroid Build Coastguard WorkerTo make testing whether the descriptor state remained the same fast, they have unique 32-bit identifiers. Note that sampler object state and image view state that is relevant to sampling routine specialization may not be unique among sampler and image view objects. For image views we're able to compress the state into the 32-bit identifier itself to avoid unnecessary recompiles.
*03ce13f7SAndroid Build Coastguard Worker
*03ce13f7SAndroid Build Coastguard WorkerFor sampler state, which is considerably larger than 32-bit, we keep a map of it to the unique identifiers. We keep count of how many sampler objects share each identifier, so we know when we can remove the entry.
*03ce13f7SAndroid Build Coastguard Worker
*03ce13f7SAndroid Build Coastguard WorkerBoth these 32-bit identifiers are the only thing used as the key of the first-level sampling routine cache.