
Investigation: Bindless resources #380

@litherum

Currently, in WebGPU, if a draw/dispatch call wants to use a resource, that resource must be part of a pre-baked "bind group" and then associated with the draw call inside the current render/compute pass. This means that all the resources that the draw/dispatch call could possibly access are explicitly listed by the programmer at the draw/dispatch site.

"Bindless" is a model where the programmer doesn't explicitly list all of the available resources at the draw/dispatch site. Instead, a large swath of resources are made available to the GPU ahead of time (e.g. during application launch) and then shaders can access any/all of them at runtime.

This is distinct from "sparse" resources, in which a single resource is only partly resident on the GPU. This post considers each resource as indivisible and doesn't consider sparse resources.

Motivation

Engines can be surprised by the set of resources that are needed to draw the next frame. Because bind groups in WebGPU are immutable, it is likely that no existing bind group exactly matches that set: the bind groups that do hold the required resources probably also hold many resources the frame doesn't need.

A common pattern for rendering engines is to "bind, draw, bind, draw, bind, draw, etc." If the application doesn't need to bind between every draw call, the number of API calls is significantly reduced. Removing the binds also allows instancing / multi-draw to collapse all the draw calls down to a single one.

It also means that the shader source code is insensitive to the details of which resources are in which bind groups. This improves shader reusability and simplicity. Engines wanting to use a novel set of resources don't have to modify the shader source code.

Lastly, because resources are not limited to just the ones available in the current set of bind groups, the number of available resources to a single shader is significantly larger.

Difficulty

If resources are not explicitly bound, they are accessed through handles (effectively pointers). As we know, dereferencing an arbitrary pointer is forbidden in Web APIs. In the native APIs, it's easily possible to trick the GPU into thinking that some arbitrary number is actually a resource and end up crashing the GPU. Even if the pointer is valid, if it points to a resource that's the wrong type, it's undefined behavior.

Also, the native graphics APIs all have some affordances for resource residency. If the GPU runs out of memory, the runtime is allowed to move some resources off the GPU into main memory to make space. When the runtime knows exactly which resources each draw/dispatch will access, it can make sure that those resources are resident at the time the draw/dispatch occurs. With bindless resources, the runtime doesn't know a priori which resources will be used by the shader, so that information must come from the programmer. If the programmer gets it wrong and writes a shader that accesses a nonresident resource, undefined behavior occurs.

If we wanted to make this safe, we would probably have to inject code into the shader at any sites of resource usage. This injected code would have to validate that the resource is of the right type and is resident. This can't be encoded in a switch statement, because the set of resident resources isn't known at shader compilation time; instead, that information would have to be exposed via a side-channel buffer which the injected code would scan. However, this is a scan over a potentially large list of resident resources, which may actually be so slow that it makes the whole endeavor not worth it.
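To make the cost concrete, here is a minimal CPU-side sketch of what such an injected check might do. It is written in C++ rather than a shading language, and the ResidentEntry layout, the ResourceKind enum, and the validateAccess name are all assumptions made for illustration; none of them come from an existing API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical resource kinds; a real implementation would have more.
enum class ResourceKind : std::uint32_t { Buffer, Texture2D, Sampler };

// One entry of the hypothetical side-channel buffer: a handle the runtime
// knows to be resident, plus the kind of resource it refers to.
struct ResidentEntry {
    std::uint64_t handle;
    ResourceKind kind;
};

// What the injected code would do before every resource access: linearly
// scan the resident list and accept the handle only if it is present and
// has the kind the shader expects. Cost is O(number of resident resources).
bool validateAccess(const std::vector<ResidentEntry>& resident,
                    std::uint64_t handle, ResourceKind expected) {
    for (const ResidentEntry& entry : resident) {
        if (entry.handle == handle) {
            return entry.kind == expected;
        }
    }
    return false;  // Unknown or nonresident handle: reject the access.
}
```

Every resource access in the shader would pay for this scan, which is the performance concern raised above.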

Metal

Metal allows for this model using argument buffers. Inside Metal Shading Language, you can simply place the resource inside the buffer's type:

struct Material {
    texture2d<float> surfaceTexture;
    texture2d<float> specularTexture;
};
kernel void myKernel(constant Material* material [[buffer(0)]]) { ... }

After you've compiled this source code to a MTLLibrary, you can populate this buffer by using the MTLArgumentEncoder class, like this:

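// makeArgumentEncoder(bufferIndex:) is a method on MTLFunction, so look up the function first.
let function = library.makeFunction(name: "myKernel")!
let argumentEncoder = function.makeArgumentEncoder(bufferIndex: 0)
let buffer = device.makeBuffer(length: argumentEncoder.encodedLength, options: ...)
argumentEncoder.setArgumentBuffer(buffer, offset: 0)
// Indices follow the order of the fields in the Material struct.
argumentEncoder.setTexture(theSurfaceTexture, index: 0)
argumentEncoder.setTexture(theSpecularTexture, index: 1)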

They can also be populated without MSL by using MTLArgumentDescriptor.

Authors must handle residency themselves by calling useResource(_:usage:) on the command encoder that issues the draw call. This means that authors don't have to remember to mark their resources as evictable, but they do have to issue one call for every encoder that uses this technique. If the system evicts some resource that you mark as required, your app will pause until the system can make the resource resident.

There is a hardware component. Tier 1 hardware can do all of the above, and Tier 2 hardware can write to an argument buffer from inside a shader (by simply assigning to the texture field) and can dynamically index into the material array in the above example.

So, a bindless design can be achieved by building an argument buffer with all necessary resources, and making sure that the necessary ones are resident whenever they need to be.

Direct3D 12

All resource accesses go through a root signature. A root signature defines which resources are available to the shader, and how they are mapped to semantics in the shader source code. A root signature can contain literal data or a direct reference to a resource. It can also hold a "descriptor table" (described below).

A "descriptor" is simply a reference to a resource. A "descriptor heap" is a linear array of descriptors. Descriptor heaps are expected to be quite large, because there can only be a single one active* for any single draw/dispatch, and switching to a new active one is expected to be slow. A "descriptor table" contains a collection of indices which represent items inside the active descriptor heap. This set can be quite large, and is represented as a collection of contiguous ranges.

// One contiguous range of CBV descriptors, starting at BaseShaderRegister.
CD3DX12_DESCRIPTOR_RANGE range;
range.Init(D3D12_DESCRIPTOR_RANGE_TYPE_CBV, NumDescriptors, BaseShaderRegister, RegisterSpace, OffsetInDescriptorsFromTableStart);
// A root parameter that exposes the range as a descriptor table.
CD3DX12_ROOT_PARAMETER parameter;
parameter.InitAsDescriptorTable(1, &range, ...);
const D3D12_ROOT_PARAMETER parameters[] = { parameter };
CD3DX12_ROOT_SIGNATURE_DESC descRootSignature;
descRootSignature.Init(_countof(parameters), parameters, ...);

As you can see, each item in the table isn't associated with a particular register number in the shader. Instead, each range is given a "base shader register," and each resource in the range simply increments that value. So, if you have a range of 2 descriptors, and the base shader register is 5, the shader can use register(b5) and register(b6).
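The register arithmetic can be sketched in a few lines of standalone C++. The DescriptorRange struct and the cbvRegisterName helper below are hypothetical and exist only for this illustration; they are not part of D3D12.

```cpp
#include <cassert>
#include <string>

// Hypothetical mirror of the two range fields that matter here.
struct DescriptorRange {
    unsigned numDescriptors;
    unsigned baseShaderRegister;
};

// Name of the constant-buffer register ("b<N>") that the descriptor at
// position indexInRange of the range is visible as in the shader.
std::string cbvRegisterName(const DescriptorRange& range, unsigned indexInRange) {
    assert(indexInRange < range.numDescriptors);
    return "b" + std::to_string(range.baseShaderRegister + indexInRange);
}
```

For a range of 2 descriptors with base shader register 5, indices 0 and 1 map to b5 and b6, matching the example in the text.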

Alternatively, the shader can declare an array of resources. ConstantBuffer<Foo> theBuffers[] : register(b5); does the same as above (b registers hold constant buffers; a StructuredBuffer array would instead use a t register and an SRV range). The array is as long as the root signature says the range is. However, when doing this, the index into the array expression needs to be dynamically uniform.

You indicate which resources need to be resident explicitly by calling ID3D12Device::Evict() and ID3D12Device::MakeResident(). These are stateful, so it's easy to forget to call them in balancing pairs. However, you don't have to call them every frame. There's also a way for you to assign priorities so the system knows which resources are less necessary than others.
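One way an engine might keep those calls balanced is to tie them to a scope. The sketch below is an assumption-heavy illustration: MockDevice is a hypothetical stand-in that only counts calls (it is not ID3D12Device and does not use its signatures), resources are plain integer ids, and ResidencyScope is a made-up helper name.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the residency part of a device. It assumes
// reference-counted residency and only tracks counts per resource id.
struct MockDevice {
    std::map<int, int> residencyCount;
    void makeResident(int id) { ++residencyCount[id]; }
    void evict(int id) {
        assert(residencyCount[id] > 0);  // Evict must follow a makeResident.
        --residencyCount[id];
    }
    bool isResident(int id) const {
        auto it = residencyCount.find(id);
        return it != residencyCount.end() && it->second > 0;
    }
};

// RAII helper: makes the resources resident on construction and evicts the
// same set on destruction, so the two calls cannot get out of balance.
class ResidencyScope {
public:
    ResidencyScope(MockDevice& device, std::vector<int> resources)
        : device_(device), resources_(std::move(resources)) {
        for (int id : resources_) device_.makeResident(id);
    }
    ~ResidencyScope() {
        for (int id : resources_) device_.evict(id);
    }
    ResidencyScope(const ResidencyScope&) = delete;
    ResidencyScope& operator=(const ResidencyScope&) = delete;

private:
    MockDevice& device_;
    std::vector<int> resources_;
};
```

A scope like this could live for many frames, which fits the observation that the calls don't have to be made every frame.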

So, a bindless design can be achieved by making a big descriptor heap with a big descriptor table with all necessary resources, and making sure that the necessary ones are resident whenever they need to be.
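The relationship between the heap and a table can be modeled as follows. This is a simplified sketch, not D3D12 code: Descriptor, TableRange, and resolveTable are hypothetical names, and the table is assumed to be a start position in the heap plus ranges expressed as offsets from that start (mirroring OffsetInDescriptorsFromTableStart above).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Descriptor = std::uint64_t;  // Hypothetical opaque resource reference.

// One contiguous range of a descriptor table.
struct TableRange {
    std::size_t offsetFromTableStart;  // Measured in descriptors, not bytes.
    std::size_t count;
};

// Collect the descriptors a table makes visible, given where the table
// starts inside the active heap.
std::vector<Descriptor> resolveTable(const std::vector<Descriptor>& heap,
                                     std::size_t tableStart,
                                     const std::vector<TableRange>& ranges) {
    std::vector<Descriptor> visible;
    for (const TableRange& range : ranges) {
        const std::size_t first = tableStart + range.offsetFromTableStart;
        assert(first + range.count <= heap.size());  // Range must fit in the heap.
        for (std::size_t i = 0; i < range.count; ++i) {
            visible.push_back(heap[first + i]);
        }
    }
    return visible;
}
```

In a bindless setup, a single range would cover most or all of the heap, so the shader can reach any descriptor in it.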

* for CBVs/SRVs/UAVs

Vulkan

Core Vulkan doesn't support the bindless model. Beyond that, the story is very confusing. There are two extensions which seem to be related:

VK_EXT_descriptor_indexing. GPUInfo says it has 42% support on Windows, 52% support on Linux, and 1% on Android. The idea seems to be that you create a single descriptor set (Vulkan's analogue of a bind group) with every resource inside it. Not every resource needs to be valid, and the contents of the set can be modified during the lifetime of the program. The last resource in the set can be an unbounded array.

VK_EXT_buffer_device_address. GPUInfo says it has 13% support on Windows, 16% on Linux, and 0% on Android. The idea seems to be that you can get the physical address for a buffer using vkGetBufferDeviceAddress(). The shading language adds a new type that lets you dereference this address. (This sounds SUPER unsafe.)

OpenGL (Just for fun)

It's an extension: ARB_bindless_texture. A handle to a bindless texture is represented as a uint64. You can get the uint64 by calling glGetTextureHandleARB(). Once you have the handle, you can give it to the shader any way you like (even as a vertex attribute). Inside the shader, you just put your sampler2D inside the buffer block, similarly to how Metal Shading Language works. Similarly to HLSL, this works with unbounded arrays. And, similarly to HLSL, indexing into the array has to be done with a dynamically uniform expression.

Similarly to D3D12, you have to manually mark which resources are necessary by calling glMakeTextureHandleResidentARB() / glMakeTextureHandleNonResidentARB() & friends.

Recommendation

I think our hands are tied here because of Vulkan. The Android support for the bindless extensions is just too low. I don't think we can do it.

Also, I haven't run any performance numbers, but it's hard to imagine the validation code injected into the shader could be executed quickly. It may end up being slower than just not using bindless resources in the first place.
