Efficient Per-Frame/Transient Bind Groups #915

### Quick TL;DR:
Is there a way to efficiently create _transient_ bind groups in WebGPU? If so, what is it?
If not, is the group willing to entertain the idea of a simple hint/flag to help with this?

### The problem
Obviously, as many bind groups as possible should be created up front and then reused, but there will always be some things that are unknown until draw time or shortly before it.

For some resources, such as buffers, WebGPU has dynamic offsets, which let us change the offset without re-creating bind groups. The limitation is that we always stay within the same buffer, but that's mostly workable.
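To make the dynamic-offset path concrete, here is a minimal sketch. It assumes the bind group layout declares the buffer binding with `hasDynamicOffset: true` and that the per-draw uniforms are packed into one buffer at a stride aligned to `minUniformBufferOffsetAlignment` (256 by default). `PassLike` and `drawWithDynamicOffsets` are hypothetical names used only so the snippet runs without a GPU; a real `GPURenderPassEncoder` fits the same shape.

```typescript
// Default value of the WebGPU limit `minUniformBufferOffsetAlignment`.
// Real code should read it from `device.limits` instead of hard-coding it.
const MIN_UNIFORM_OFFSET_ALIGNMENT = 256;

// Round `size` up to the next multiple of `alignment`.
function alignUp(size: number, alignment: number): number {
  return Math.ceil(size / alignment) * alignment;
}

// Structural stand-in for the parts of GPURenderPassEncoder used here (assumption).
interface PassLike {
  setBindGroup(index: number, bindGroup: unknown, dynamicOffsets?: number[]): void;
  draw(vertexCount: number): void;
}

// Hypothetical helper: issues `drawCount` draws that all reuse ONE bind group,
// changing only the dynamic offset into the shared uniform buffer.
function drawWithDynamicOffsets(
  pass: PassLike,
  bindGroup: unknown,
  drawCount: number,
  uniformSize: number,
  alignment: number = MIN_UNIFORM_OFFSET_ALIGNMENT,
): number[] {
  const stride = alignUp(uniformSize, alignment);
  const offsets: number[] = [];
  for (let i = 0; i < drawCount; i++) {
    const offset = i * stride;
    pass.setBindGroup(0, bindGroup, [offset]); // no bind group re-creation
    pass.draw(3);
    offsets.push(offset);
  }
  return offsets;
}
```

The bind group is created once; only the offset changes per draw, which is exactly what has no equivalent for textures.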

Unfortunately, there's no such thing for textures. There are many cases where textures are not known up front, such as render targets that are used as inputs to other draws. Currently there's no good option other than to create single-use bind groups, use them, and throw them away at every draw. That seems wasteful, since the underlying implementation is most likely not designed for this kind of usage. There's probably also a case to be made for dynamic buffers that are not just offsets within one buffer.
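For reference, the single-use pattern looks roughly like the sketch below. `DeviceLike`, `TexturePassLike`, and `drawWithTransientTexture` are hypothetical names, and the binding numbers are assumptions; the descriptor shape mirrors what `GPUDevice.createBindGroup` takes, so a real device and pass encoder could be passed in unchanged.

```typescript
// Structural stand-ins for the WebGPU objects involved (assumptions, so the
// sketch type-checks and runs without a GPU or @webgpu/types).
interface BindGroupDescriptorLike {
  layout: unknown;
  entries: { binding: number; resource: unknown }[];
}
interface DeviceLike {
  createBindGroup(descriptor: BindGroupDescriptorLike): unknown;
}
interface TexturePassLike {
  setBindGroup(index: number, bindGroup: unknown): void;
  draw(vertexCount: number): void;
}

// Hypothetical helper: the texture view is only known at draw time (for
// example, a pooled render target), so a bind group is created, used once,
// and dropped.
function drawWithTransientTexture(
  device: DeviceLike,
  pass: TexturePassLike,
  layout: unknown,
  sampler: unknown,
  textureView: unknown,
): void {
  const bindGroup = device.createBindGroup({
    layout,
    entries: [
      { binding: 0, resource: sampler },     // binding numbers are illustrative
      { binding: 1, resource: textureView },
    ],
  });
  pass.setBindGroup(0, bindGroup);
  pass.draw(3);
  // Nothing to release explicitly: the bind group is simply left for the
  // garbage collector once the pass no longer references it.
}
```

Every draw that samples a late-bound texture pays for one `createBindGroup` call, which is the cost this issue is asking about.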

Now, I do realize that you could say _but you should know all the render targets up front and should be able to bake these as static bind groups_. It's not that straightforward as rendering pipeline complexity grows: some of it is even generated at runtime (render graphs), and then you start throwing in pooled/transient render targets.

### (Just one possible) Solution
To address this in our engine, we have the notion of dynamic bind groups, which are optimized to allocate linearly and fill descriptors efficiently on all supported platforms:
- For DX12, we have one large GPU-visible heap that we partition into a static area and N dynamic areas (where N = max buffered frames). The static descriptors are baked from CPU-visible heaps once on init, but the dynamic ones get descriptors copied into them during each frame. The allocation is linear: there's no fragmentation or freeing, the allocator is just reset back to 0 at the start of the frame, guarded by a single fence, and the copies are quite fast.
- For Vulkan, we have N descriptor pools for dynamic descriptor sets, which we reset at the start of the frame, linearly allocate from, fill with descriptors, and guard with a single fence. So it's basically the same as DX12.
- For Metal, we use Argument Buffers for static bind groups and individual setVertex/FragmentXXX calls for dynamic bind groups.

These bind groups work exactly the same as regular ones, except that their lifetime is only a single frame. They don't have to be cleaned up; it's a simple fire-and-forget system.
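The linear, per-frame allocation described for DX12 and Vulkan can be sketched as below. This is an illustration of the idea, not our engine's code: the class name, its parameters, and the slot-index model of the heap are all assumptions, and the fence wait is left to the caller.

```typescript
// Hypothetical model of a descriptor heap split into one static area followed
// by N dynamic areas, one per buffered frame. Slots are plain indices here.
class FrameLinearDescriptorAllocator {
  private readonly staticSlots: number;
  private readonly slotsPerFrame: number;
  private readonly framesInFlight: number;
  private base = 0;   // first slot of the current frame's dynamic area
  private cursor = 0; // next free slot, relative to `base`

  constructor(staticSlots: number, slotsPerFrame: number, framesInFlight: number) {
    this.staticSlots = staticSlots;
    this.slotsPerFrame = slotsPerFrame;
    this.framesInFlight = framesInFlight;
    this.base = staticSlots;
  }

  // Call at the start of each frame, AFTER waiting on the fence that guards
  // the area being reused (the wait itself is outside this sketch).
  beginFrame(frameIndex: number): void {
    const area = frameIndex % this.framesInFlight;
    this.base = this.staticSlots + area * this.slotsPerFrame;
    this.cursor = 0; // the whole area is "freed" by resetting one number
  }

  // Reserve `count` consecutive slots and return the index of the first one.
  allocate(count: number): number {
    if (this.cursor + count > this.slotsPerFrame) {
      throw new Error("dynamic descriptor area exhausted for this frame");
    }
    const first = this.base + this.cursor;
    this.cursor += count;
    return first;
  }
}
```

Usage would be one `beginFrame` per frame, then one `allocate` per dynamic bind group, followed by copying that group's descriptors into the returned slots. Nothing is ever freed individually, which is what makes the scheme fire and forget.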

### Question/Proposal
Is the current WebGPU spec enough to avoid performance issues with these? If so, developers should just create the bind groups with each draw that needs them, use them, and forget them immediately.

Or do you feel it's worth investigating this further, potentially adding a flag/usage hint to bind groups that lets the implementation handle these better/faster/lighter?
