
Efficient Per-Frame/Transient Bind Groups #915

@tklajnscek

Description


Quick TL;DR:

Is there a way to efficiently create transient bind groups in WebGPU? If so, what is it?
If not, is the group willing to entertain the idea of a simple hint/flag to help with this?

The problem

Obviously, as many bind groups as possible should be created up front and then reused over and over again, but some resources will always be unknown until draw time or shortly before it.

For buffers, WebGPU has dynamic offsets, which let us change the binding offset without re-creating the bind group. The limitation is that every offset must point into the same buffer, but that is mostly workable.
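
To make the buffer case concrete, here is a minimal sketch of how per-draw uniform blocks might be packed into one buffer so that a single bind group can serve every draw through dynamic offsets. It assumes WebGPU's default minUniformBufferOffsetAlignment of 256 bytes (a real app should read device.limits instead) and a fixed binding size per draw; the helper names alignUp and dynamicOffsets are hypothetical.

```typescript
// Default WebGPU limit; a real app should read device.limits.minUniformBufferOffsetAlignment.
const MIN_UNIFORM_OFFSET_ALIGNMENT = 256;

// Round `value` up to the next multiple of `alignment`.
function alignUp(value: number, alignment: number): number {
  return Math.ceil(value / alignment) * alignment;
}

// Hypothetical helper: every draw binds `blockSize` bytes of one shared uniform buffer.
// Each draw's dynamic offset must be a multiple of the alignment limit, so the blocks
// are laid out with an aligned stride.
function dynamicOffsets(
  drawCount: number,
  blockSize: number,
  alignment: number = MIN_UNIFORM_OFFSET_ALIGNMENT,
): { offsets: number[]; bufferSize: number } {
  const stride = alignUp(blockSize, alignment);
  const offsets = Array.from({ length: drawCount }, (_, i) => i * stride);
  // The last draw only needs `blockSize` bytes past its offset.
  const bufferSize = drawCount === 0 ? 0 : (drawCount - 1) * stride + blockSize;
  return { offsets, bufferSize };
}
```

With a bind group created once over that buffer (layout entry buffer: { type: "uniform", hasDynamicOffset: true }, binding size blockSize), each draw would call pass.setBindGroup(0, bindGroup, [offsets[i]]). This only works because every draw stays inside the same buffer, which is exactly the limitation noted above.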

Unfortunately, there is no equivalent for textures. In many cases textures are not known up front, for example render targets that are used as inputs to later draws. Currently the only option is to create single-use bind groups, use them once, and throw them away at every draw. That seems wasteful, since the underlying implementation is most likely not designed for this kind of usage. There is probably also a case for dynamic buffer bindings that are not just offsets within a single buffer.
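
For reference, the single-use pattern described above looks roughly like the sketch below. To keep it self-contained, it declares minimal structural stand-ins instead of the real GPUDevice / GPURenderPassEncoder types from @webgpu/types; the descriptor shape ({ layout, entries: [{ binding, resource }] }) follows the WebGPU API, while the group index 1, the binding numbers, and the function name are hypothetical.

```typescript
// Minimal structural stand-ins for the WebGPU types used here (assumption: a real
// project would use GPUDevice, GPUBindGroupLayout, GPURenderPassEncoder, etc.).
interface BindGroupEntryLike { binding: number; resource: unknown; }
interface BindGroupDescriptorLike { label?: string; layout: unknown; entries: BindGroupEntryLike[]; }
interface DeviceLike { createBindGroup(descriptor: BindGroupDescriptorLike): unknown; }
interface PassLike {
  setBindGroup(index: number, bindGroup: unknown): void;
  draw(vertexCount: number): void;
}

// Hypothetical draw helper: the input texture (e.g. a render target written earlier in the
// frame) is only known now, so a bind group is built for this one draw and then dropped.
function drawWithTransientTexture(
  device: DeviceLike,
  pass: PassLike,
  layout: unknown,
  sampler: unknown,
  inputView: unknown,
  vertexCount: number,
): void {
  const bindGroup = device.createBindGroup({
    label: "transient-per-draw",
    layout,
    entries: [
      { binding: 0, resource: sampler },
      { binding: 1, resource: inputView },
    ],
  });
  pass.setBindGroup(1, bindGroup);
  pass.draw(vertexCount);
  // No explicit cleanup: GPUBindGroup has no destroy(), so the object is simply
  // left for garbage collection once nothing references it.
}
```

Every call allocates a new bind group object, which is the per-draw cost this issue asks about.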

I realize one could argue that all render targets should be known up front and baked into static bind groups, but that is not so straightforward as the rendering pipeline grows in complexity: parts of it may even be generated at runtime (render graphs), and pooled/transient render targets complicate things further.

(Just one possible) Solution

To address this, our engine has the notion of dynamic bind groups, which are optimized to allocate linearly and fill descriptors efficiently on every supported platform:

  • For DX12, we have one large GPU-visible heap that we partition into a static area and N dynamic areas (where N = the maximum number of buffered frames). The static area is baked once at init from CPU-visible heaps, while descriptors are copied into the dynamic areas during each frame. Allocation is linear, with no fragmentation or freeing: the current frame's area is simply reset to 0 at the start of the frame, guarded by a single fence, and the copies are quite fast.
  • For Vulkan, we have N descriptor pools for dynamic descriptor sets. At the start of each frame we reset the current pool, allocate sets linearly from it, fill in the descriptors, and guard it with a single fence. This is essentially the same scheme as on DX12.
  • For Metal, we use argument buffers for static bind groups and individual setVertexXXX/setFragmentXXX calls for dynamic bind groups.
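
The DX12 scheme above can be modeled in a few lines. This is a sketch in plain TypeScript, not the engine's actual code: descriptor slots are just indices (a real backend would copy CPU-visible descriptors into them), and the single fence is represented by a hypothetical completedFrame value, the highest frame index the GPU is known to have finished. The Vulkan variant has the same shape, with vkResetDescriptorPool on the current frame's pool taking the place of the cursor reset.

```typescript
// Sketch of a GPU-visible heap split into one static area followed by N per-frame
// dynamic areas, with linear allocation and a whole-area reset each frame.
class PartitionedDescriptorHeap {
  private readonly dynamicBase: number;
  private cursor = 0;     // next free slot within the current frame's dynamic area
  private frameIndex = 0; // monotonically increasing frame counter

  constructor(
    readonly staticCount: number,
    readonly dynamicCountPerFrame: number,
    readonly maxFramesInFlight: number,
  ) {
    this.dynamicBase = staticCount; // dynamic areas start right after the static area
  }

  get capacity(): number {
    return this.staticCount + this.dynamicCountPerFrame * this.maxFramesInFlight;
  }

  // Called at the start of a frame. Frame f reuses the area last used by frame f - N,
  // so the fence must show that frame as completed before the area is overwritten.
  beginFrame(frameIndex: number, completedFrame: number): void {
    if (frameIndex - completedFrame > this.maxFramesInFlight) {
      throw new Error("GPU may still be reading this frame's dynamic area");
    }
    this.frameIndex = frameIndex;
    this.cursor = 0; // linear reset: nothing is freed individually
  }

  // Linearly allocate `count` contiguous slots; returns the first slot index in the heap.
  allocate(count: number): number {
    if (this.cursor + count > this.dynamicCountPerFrame) {
      throw new Error("dynamic area exhausted for this frame");
    }
    const areaStart =
      this.dynamicBase + (this.frameIndex % this.maxFramesInFlight) * this.dynamicCountPerFrame;
    const first = areaStart + this.cursor;
    this.cursor += count;
    return first;
  }
}
```

With 100 static slots, 64 dynamic slots per frame, and two frames in flight, frame 0 allocates from slot 100 onward, frame 1 from slot 164, and frame 2 wraps back to slot 100 once the fence reports frame 0 as complete.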

These bind groups work exactly like regular ones, except that they live for only a single frame and never need to be cleaned up: it is a simple fire-and-forget system.

Question/Proposal

Is the current WebGPU spec sufficient to avoid performance issues with this usage pattern? If so, developers should simply create bind groups for each draw that needs them, use them, and forget them immediately.

Or do you feel it is worth investigating this further, potentially by adding a flag/usage hint to bind groups that lets the implementation handle them better, faster, or with less overhead?

Metadata


Assignees

No one assigned

    Labels

    api (WebGPU API), feature request (a request for a new GPU feature exposed in the API), question

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests
