
Investigation: Programmable Blending #442

@litherum

Description

Motivation

There have been quite a few requests for pieces of custom blending, but the analysis hasn’t been linked together into a coherent investigation.

Existing issues:

The use case is being able to have materials in a rendered scene that are not simply blended (or min’ed or max’ed) with the rest of the scene. This isn’t really relevant to physically-based renderers, but is more relevant for things like cartoon shaders.

This is used in Lumberyard, Just Cause 3, Grid 2, and The Forge.

Similarly, achieving effects like the “vibrancy” effect that’s used all over macOS and iOS would use custom blending. This uses a custom formula to make sure that the foreground is always visible and readable on top of any possible background. Here’s an example:

(Screenshot: vibrancy example, 2019-09-20)

Programmable blending functionality can’t be emulated by either API-level texture barriers or by adding additional render passes, because there’s nowhere to save the intermediate results of overlapping geometry. This investigation is about additional capabilities, rather than additional performance.

Difficulty

There are two distinct pieces here:

  • Being able to read from (and write to) the rendering destination
  • Because the order of fragment shader execution is undefined, overlapping geometry needs some synchronization so that each pixel’s read/modify/write cycle is race-free.

Unfortunately, support in the various APIs is different for each of these pieces.

Direct3D

Direct3D has no facility for reading from the framebuffer (that I could find). You can, however, bind a texture as a RWTexture; if you do, your reads and writes are unordered.

In Shader Model 5.1, there’s another object which is a drop-in replacement for RWTextures: RasterizerOrderedTextures. These guarantee that all operations on the resource, between any two fragment shader invocations which target the same framebuffer location (and level and sample), are strictly ordered. Beyond that, the ordering is guaranteed to match API submission order.

This means that, if you bind the destination texture as a UAV, rather than binding it as a framebuffer, you can do programmable blending on that resource.
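A minimal HLSL sketch of this approach, assuming the destination texture has been bound as a UAV at register u0 (the blend formula and register assignment are illustrative, not from the original issue):

```hlsl
// The destination is bound as an ROV UAV instead of as a render target.
RasterizerOrderedTexture2D<float4> destination : register(u0);

float4 customBlend(float4 source, float4 dest)
{
    // Placeholder for an application-specific blend formula.
    return source + dest * (1.0 - source.a);
}

void fragmentShader(float4 position : SV_Position, float4 source : COLOR0)
{
    uint2 coord = uint2(position.xy);
    // Accesses to the ROV are ordered against other invocations
    // covering the same pixel, so this read/modify/write is race-free.
    float4 dest = destination[coord];
    destination[coord] = customBlend(source, dest);
}
```

Note that the shader writes its result through the UAV rather than returning an SV_Target value.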

It looks like all D3D12 devices are required to support Shader Model 5.1. However, support for Rasterizer Ordered Views is optional; to detect support, check the ROVsSupported field in the result of ID3D12Device::CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS).
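The detection check looks roughly like this (a sketch against an existing ID3D12Device; error handling elided):

```cpp
// Query the optional-feature struct and read the ROV bit.
D3D12_FEATURE_DATA_D3D12_OPTIONS options = {};
HRESULT hr = device->CheckFeatureSupport(
    D3D12_FEATURE_D3D12_OPTIONS, &options, sizeof(options));
bool rovsSupported = SUCCEEDED(hr) && options.ROVsSupported;
```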

macOS Metal

Similarly to Direct3D, macOS Metal doesn’t have any facility for reading from the framebuffer. However, you can do the same trick of binding the texture as a texture2D<access::read_write> instead of binding it to the framebuffer.

Then, you can mark the texture as belonging to a “Raster Order Group”, which has the same guarantees that RasterizerOrdered resources have in HLSL. You do this by simply annotating the image with [[raster_order_group(0)]].

Unfortunately, not all hardware supports raster order groups, and support isn’t aligned with any of the existing GPU Family demarcations; instead, authors have to check MTLDevice.areRasterOrderGroupsSupported. Also, not all hardware supports access::read_write textures; authors have to query support via MTLDevice.readWriteTextureSupport.
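Putting the two pieces together, a Metal fragment shader doing programmable blending on macOS might look like this (a sketch; the binding index, vertex output struct, and blend formula are illustrative):

```metal
#include <metal_stdlib>
using namespace metal;

struct VertexOut {
    float4 position [[position]];
    float4 color;
};

fragment void fragmentShader(
    VertexOut in [[stage_in]],
    // Bound as an ordinary texture argument, not as a render target.
    texture2d<float, access::read_write> destination
        [[texture(0), raster_order_group(0)]])
{
    uint2 coord = uint2(in.position.xy);
    float4 dest = destination.read(coord);
    // Placeholder blend formula; substitute the application's own.
    float4 blended = in.color + dest * (1.0 - in.color.a);
    destination.write(blended, coord);
}
```

Both device checks above must pass before using this shader.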

iOS Metal

iOS Metal has the same concept of Raster Order Groups, but extends it to work with the framebuffer. The fragment shader can mark a value as both a framebuffer color and a raster_order_group by annotating it with [[color(0), raster_order_group(0)]]. It can read this value from the framebuffer by simply repeating the same object as a parameter to the shader:

struct PixelShaderOutput {
    uint result [[color(0), raster_order_group(0)]];
};

fragment PixelShaderOutput fragmentShader(PixelShaderOutput pixelShaderOutput) {
    ...
}

This means that programmable blending works naturally.

Vulkan

The story on Vulkan is much more complicated: KhronosGroup/Vulkan-Ecosystem#27. Nothing is present in pure Vulkan, but there are some extensions:

VK_EXT_fragment_shader_interlock (GPUInfo says 8% on Windows, 4% on Linux, and 0% on Android): Adds explicit functions for locking and unlocking an implicit mutex; there’s one mutex per pixel/level/sample in the framebuffer. Given that none of the other APIs support explicit locking and unlocking, and that the other APIs’ designs are easier to get right than this kind of explicit API, I’d recommend against adding this design to WebGPU.
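For reference, the explicit lock/unlock style looks like this in GLSL (a sketch; the storage image binding and blend formula are illustrative):

```glsl
#version 450
#extension GL_EXT_fragment_shader_interlock : require

// One ordered critical section per pixel/sample.
layout(pixel_interlock_ordered) in;
layout(binding = 0, rgba8) uniform image2D destination;
layout(location = 0) in vec4 sourceColor;

void main()
{
    ivec2 coord = ivec2(gl_FragCoord.xy);
    beginInvocationInterlockEXT();
    vec4 dest = imageLoad(destination, coord);
    // Placeholder blend formula; substitute the application's own.
    vec4 blended = sourceColor + dest * (1.0 - sourceColor.a);
    imageStore(destination, coord, blended);
    endInvocationInterlockEXT();
}
```

The explicit begin/end pair is exactly the part that has no analogue in the D3D or Metal designs.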

GL_EXT_shader_framebuffer_fetch: Lets you read from the framebuffer, but this is a GL extension, not a Vulkan extension.

VK_EXT_blend_operation_advanced: Doesn’t allow true programmable blending, but does allow some pre-canned blend equations. Also, the presence of this extension doesn’t mean the blend equations actually work for overlapping geometry; the extension exposes an extra bit which indicates whether the blend operations are coherent with respect to overlapping fragments.
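That extra bit can be queried through the extension’s feature struct (a sketch; assumes the extension has already been confirmed present on the physical device):

```cpp
// Chain the advanced-blend feature struct into a features2 query and
// read the coherency bit for overlapping fragments.
VkPhysicalDeviceBlendOperationAdvancedFeaturesEXT advancedBlend = {};
advancedBlend.sType =
    VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_BLEND_OPERATION_ADVANCED_FEATURES_EXT;

VkPhysicalDeviceFeatures2 features2 = {};
features2.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FEATURES_2;
features2.pNext = &advancedBlend;

vkGetPhysicalDeviceFeatures2(physicalDevice, &features2);
bool coherent = advancedBlend.advancedBlendCoherentOperations == VK_TRUE;
```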

OpenGL (just for fun)

ARB_shader_image_load_store includes a memoryBarrier() GLSL function which can be used to order reads and writes to resources. INTEL_fragment_shader_ordering includes a modal API: reads and writes are unordered until the shader calls beginFragmentShaderOrderingINTEL(), after which they are ordered for the remainder of the invocation.
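The Intel extension’s modal style looks like this in GLSL (a sketch; the storage image binding and blend logic are illustrative):

```glsl
#version 440
#extension GL_INTEL_fragment_shader_ordering : require

layout(binding = 0, rgba8) uniform image2D destination;

void main()
{
    ivec2 coord = ivec2(gl_FragCoord.xy);
    // From this call onward, accesses by overlapping fragments at the
    // same pixel are ordered; note there is no matching "end" call.
    beginFragmentShaderOrderingINTEL();
    vec4 dest = imageLoad(destination, coord);
    imageStore(destination, coord, dest * 0.5); // placeholder blend
}
```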

Labels: api (WebGPU API), investigation, wgsl (WebGPU Shading Language Issues)
