Motivation
There have been quite a few requests for pieces of custom blending, but the analysis hasn’t been linked together into a coherent investigation.
Existing issues:
- Advanced blend equations implementable on top of custom blending
- Dual source blending implementable on top of custom blending
- Fragment shader interlock/ordering half of the custom blending use case
- Fragment shader framebuffer fetch the other half of the custom blending use case
The use case is being able to have materials in a rendered scene that are not simply blended (or min’ed or max’ed) with the rest of the scene. This isn’t really relevant to physically-based renderers, but is more relevant for things like cartoon shaders.
This is used in Lumberyard, Just Cause 3, Grid 2, and The Forge.
Similarly, achieving effects like the “vibrancy” effect that’s used all over macOS and iOS would use custom blending. This uses a custom formula to make sure that the foreground is always visible and readable on top of any possible background.
Programmable blending functionality can’t be emulated by either API-level texture barriers or by adding additional render passes, because there’s nowhere to save the intermediate results of overlapping geometry. This investigation is about additional capabilities, rather than additional performance.
Difficulty
There are two distinct pieces here:
- Being able to read from (and write to) the rendering destination
- Because the order of fragment shader execution is undefined, overlapping geometry needs some synchronization for the read/modify/write cycle to be race-free at each pixel.
Unfortunately, support in the various APIs is different for each of these pieces.
Direct3D
Direct3D has no facility for reading from the framebuffer (that I could find). You can, however, bind a texture as a RWTexture, but then your reads and writes are unordered.
In Shader Model 5.1, there’s another object which is a drop-in replacement for RWTextures: RasterizerOrderedTextures. These have the guarantee that all operations on the resource, between any two fragment shader invocations which target the same framebuffer location (and level and sample), are strictly ordered. Beyond that, the ordering is guaranteed to match API submission order.
This means that, if you bind the destination texture as a UAV, rather than binding it as a framebuffer, you can do programmable blending on that resource.
It looks like it’s a requirement that all D3D12 devices support Shader Model 5.1. However, support for Rasterizer Ordered Views is optional; to detect support, check the ROVsSupported field in the result of ID3D12Device::CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS).
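As a rough illustration, a fragment shader doing custom blending against a rasterizer-ordered UAV might look like the following sketch. The names and the blend formula are illustrative, and it assumes the destination texture was created with the unordered-access flag and bound as a UAV rather than as a render target:

```hlsl
// Destination bound as a rasterizer-ordered UAV instead of a render target.
RasterizerOrderedTexture2D<float4> destination : register(u0);

void main(float4 position : SV_Position, float4 color : COLOR0)
{
    uint2 coord = uint2(position.xy);
    // Because the texture is rasterizer-ordered, this read/modify/write is
    // ordered in submission order between overlapping fragments.
    float4 dst = destination[coord];
    destination[coord] = color + dst * (1.0 - color.a); // custom blend formula
}
```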
macOS Metal
Similarly to Direct3D, macOS Metal doesn’t have any facility for reading from the framebuffer. However, you can use the same trick of binding the texture as a texture2d<float, access::read_write> instead of binding it to the framebuffer.
Then, you can mark the texture as belonging to a “Raster Order Group”, which has the same guarantees that RasterizerOrdered resources have in HLSL. You do this by simply annotating the image with [[raster_order_group(0)]].
Unfortunately, not all hardware supports raster order groups, and support isn’t aligned with any of the existing GPU family demarcations; instead, authors have to check device.areRasterOrderGroupsSupported. Also, not all hardware supports access::read_write textures; authors have to query support by checking MTLDevice.readWriteTextureSupport.
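Putting the two annotations together, a macOS Metal fragment shader doing custom blending might look like this sketch (names and blend formula are illustrative; it assumes the device reports raster order group and read-write texture support):

```metal
#include <metal_stdlib>
using namespace metal;

struct FragmentIn {
    float4 position [[position]];
    float4 color;
};

// Destination bound as a read_write texture in raster order group 0,
// rather than attached as a color attachment.
fragment void customBlend(FragmentIn in [[stage_in]],
                          texture2d<float, access::read_write> dst
                              [[texture(0), raster_order_group(0)]])
{
    uint2 coord = uint2(in.position.xy);
    // The raster order group serializes this read/modify/write per pixel
    // across overlapping fragments.
    float4 background = dst.read(coord);
    dst.write(in.color + background * (1.0 - in.color.a), coord);
}
```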
iOS Metal
iOS Metal has the same concept of Raster Order Groups, but extends it to work with the framebuffer. The fragment shader can mark a value as both a framebuffer color and a member of a raster order group by annotating it with [[color(0), raster_order_group(0)]]. It can read this value from the framebuffer by simply repeating the same object as a parameter to the shader:
```metal
struct PixelShaderOutput {
    uint result [[color(0), raster_order_group(0)]];
};

fragment PixelShaderOutput fragmentShader(PixelShaderOutput pixelShaderOutput) {
    ...
}
```
This means that programmable blending works naturally.
Vulkan
The story on Vulkan is much more complicated: KhronosGroup/Vulkan-Ecosystem#27. Nothing is present in pure Vulkan, but there are some extensions:
- VK_EXT_fragment_shader_interlock (GPUInfo says 8% on Windows, 4% on Linux, and 0% on Android): Adds explicit functions for locking and unlocking an implicit mutex. There’s one mutex per pixel/level/sample in the framebuffer. Given that none of the other APIs support explicit locking and unlocking, and the fact that the other APIs’ designs are easier to get right than this kind of explicit API, I’d recommend against adding this design to WebGPU.
- GL_EXT_shader_framebuffer_fetch: Lets you read from the framebuffer, but this is a GL extension, not a Vulkan extension.
- VK_EXT_blend_operation_advanced: Doesn’t allow true programmable blending, but does allow some pre-canned blend equations. Also, the presence of this extension doesn’t mean that the blend equations actually work on overlapping geometry; there’s an extra bit exposed by this extension which represents whether the blend operations are thread-safe with respect to overlapping fragments.
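For reference, the explicit lock/unlock style of VK_EXT_fragment_shader_interlock looks roughly like the following GLSL sketch (the GLSL side is exposed via the GL_ARB_fragment_shader_interlock extension and compiled to SPIR-V; names and the blend formula are illustrative):

```glsl
#version 450
#extension GL_ARB_fragment_shader_interlock : require

layout(pixel_interlock_ordered) in;
layout(binding = 0, rgba8) uniform image2D destination;
layout(location = 0) in vec4 color;

void main()
{
    ivec2 coord = ivec2(gl_FragCoord.xy);
    beginInvocationInterlockARB();
    // Critical section: ordered per pixel across overlapping fragments.
    vec4 dst = imageLoad(destination, coord);
    imageStore(destination, coord, color + dst * (1.0 - color.a));
    endInvocationInterlockARB();
}
```

Note how easy it would be to misplace the begin/end calls here, which is why the implicit designs in the other APIs are easier to get right.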
OpenGL (just for fun)
ARB_shader_image_load_store includes a memoryBarrier() GLSL function which can be used to order reads and writes to resources. INTEL_fragment_shader_ordering includes a modal API where you can toggle between “all reads/writes are unordered” and “all reads/writes are ordered” by calling beginFragmentShaderOrderingINTEL() at the boundary.