
GPU Web 2025‐11‐05


GPU Web WG 2025-11-05 Atlantic-time

Chair: CW
Scribe: KR
Location: Google Meet

Tentative agenda

  • Administrivia
  • CTS Update
  • Specify immediate data API and WGSL <immediate> address space #5423
  • [main branch] Specify Compatibility Mode #5402
  • Add support for memoryless textures #5396
  • [bindless] Add the proposal for how bindings are updated in dynamic binding arrays. #5379
  • [bindless] What limit for maxDynamicBindingArraySize? #5373
  • [bindless] Alternative to the GPUBindGroup addition of a variable size "dynamic binding array" #5372
  • Agenda for next meeting

Attendance

  • Google
    • Brandon Jones
    • Corentin Wallez
    • Geoff Lang
    • Kai Ninomiya
    • Ken Russell
    • Stephen White
  • Mozilla
    • Jim Blandy
  • Nvidia
    • Fabio Bernardon
    • Markus Tavenrath
  • Albin Bernhardsson
  • Connor Fitzgerald

Administrivia

  • None

CTS Update

  • None

Specify immediate data API and WGSL <immediate> address space #5423

  • Shaobo has a PR up, with comments from Kai and others
  • Two things to discuss:
  • 1. Is there any CTS or implementation that is up to the spec yet? We shouldn't merge until we have both.
    • JB: Moz agrees. Want CTS ready to go. Our impl isn't up to spec either.
  • 2. Can we make the name shorter? SetImmediates (see the sketch after this section)
    • CF: short name good 😀
    • JB: I have feedback on non-API-visible names, can be editorial
  • CW: keep the PR open until we have impls / CTS ready
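
For orientation, here is a minimal TypeScript sketch of the shape being discussed in #5423. The `var<immediate>` address space and the `setImmediates` name come from the PR and the naming discussion above, but the exact signatures are still open, so treat everything here as hypothetical.

```ts
// Hypothetical sketch of the immediate-data shape discussed in #5423.
// The exact names and signatures (var<immediate>, setImmediates) are still
// being bikeshedded in the PR and may not match what ships.

const shaderSource = /* wgsl */ `
  // A small block of per-draw data living in the proposed <immediate>
  // address space instead of a uniform buffer.
  struct DrawParams {
    tint : vec4f,
  }
  var<immediate> params : DrawParams;

  @fragment fn fs() -> @location(0) vec4f {
    return params.tint;
  }
`;

// On the API side the encoder would receive the bytes directly, roughly like
// push constants in Vulkan / root constants in D3D12:
function drawWithImmediates(pass: GPURenderPassEncoder, pipeline: GPURenderPipeline) {
  pass.setPipeline(pipeline);
  // setImmediates is the shorter name floated in the meeting; the signature is a guess.
  (pass as any).setImmediates(/* offsetBytes */ 0, new Float32Array([1, 0, 0, 1]));
  pass.draw(3);
}
```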

[main branch] Specify Compatibility Mode #5402

  • CW: any remaining comments?
  • JB: Teo had one point about derivatives, but other than that, he looked it over and says everything looks great
    • Are "Fine" derivatives guaranteed to actually be fine? #4325
  • CF: think we should expect fine, and if we can't provide it, warn
  • SW: Intel Mac doesn't support fine in core
  • KN: Mike told us that if you use fine derivatives in Metal on Intel Macs, you'll get coarse. Metal doesn't have separate coarse/fine derivatives; all variants have the same impl. Coarse is always allowed to be fine, unfortunately. Will never be fixed.
  • CW: that sounds like a bug but we feel we can emulate these with quad shuffles
  • KN: yes, think we can make core give you fine derivatives when you ask for fine. So I think it's fine to disallow fine derivatives in Compat.
  • CF: emulating on top of subgroup ops doesn't sound possible unless subgroup ops are guaranteed on all Mac impls. These might not exist.
  • KN: sounds like they always exist on those Macs.
  • CW: so full subgroup quad shuffle on macOS requires MSL 2.1 (James Price?). Supported in macOS 10.15 which is MSL 2.2. Worst case we can say it's a driver bug.
  • SW: should we raise an error in Compat if you try to use fine derivatives?
  • KN: think so.
  • CW: yes.
  • SW: will put up a PR for that (see the sketch after this section). Do we want to do this before merge?
  • CW: think so. This is the biggest issue remaining. Can we have agreement to merge after this is fixed and we fix editorial issues?
  • TT: sounds fine.
  • CW: and Mike was happy with this as well. Awesome!
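
A hedged sketch of what the agreed behavior would look like from JS, assuming the compatibility opt-in keeps its current `featureLevel: "compatibility"` shape. Where exactly the error surfaces (shader module creation vs. pipeline creation) is up to the PR Stephen will put up; this only illustrates "using fine derivatives is a validation error in Compat".

```ts
// Assumes a Compat device; the exact error-surfacing point is not settled.
const adapter = await navigator.gpu.requestAdapter({ featureLevel: "compatibility" });
const device = await adapter!.requestDevice();

device.pushErrorScope("validation");
device.createShaderModule({
  code: /* wgsl */ `
    @fragment fn fs(@builtin(position) p : vec4f) -> @location(0) vec4f {
      // dpdxFine / dpdyFine / fwidthFine would be disallowed in Compat, since
      // some drivers (e.g. Metal on Intel Macs) only ever give coarse results.
      let d = dpdxFine(p.x);
      return vec4f(d, 0.0, 0.0, 1.0);
    }
  `,
});
const error = await device.popErrorScope(); // expected to be non-null in Compat
```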

Add support for memoryless textures #5396

  • CW: Mike commented on the issue. In WebKit they wanted to support transient textures, but it was too expensive without the hint.
  • CF: wgpu also has an impl of transient attachments. Almost exactly what Mike proposed. It has a usage flag - you can only use that usage if the other usage is render-attachment. One extra validation rule (see the sketch after this section).
  • CW: same in Dawn. Must use it the same way as render attachment.
  • CF: sounds great.
  • KN: sounds good. The Vk feature's a little strange: lazily allocated and can never be freed. Is this a problem? If it becomes one, impls can probably free the mem, allocate a new one and swap out the resource underneath, since it's a render attachment.
  • AB: not sure about reusing the memory. In Vk, you can query the amount of committed memory. Monotonically increasing. Driver doesn't reclaim even if it could, because if you rendered last frame you'd want to render this frame too - no point in reclaiming in most cases.
  • JB: and the query is on device memory? Not on images?
  • AB: yes.
  • CF: so on Vk device memory can show up and there's no way to get rid of it?
  • AB: can always get rid of it manually by deleting the texture.
  • CW: in Mesa it's like render target aliasing, used if you need to spill the tile. The VkFramebuffer allocation is the one that allocates the spilling memory. Transient / memoryless - so it can alias all the render passes in the same memory if it can. AB, how does the proposal look to you?
  • AB: looks great from our side. Should be simple to implement as well. My colleague had a suggestion for an alternative approach: instead of texture usage flag, when you set render attachment in render pass, instead of ImageView, set a descriptor saying you want something transient with this format, sample count, etc. Impl would then manage the resource. Then on non-tiled GPU you could alias memory resources too. But only a limited set of memory usage.
  • CF: feel that impls could do funky things under this API if they wanted to.
  • AB: a little harder since you don't know what resources will be used simultaneously in the same pass in the future. But could probably do something like that.
  • CF: think the proposal right now is clear as to what's happening. Matches what the API shape is in the backends.
  • AB: agree, and everyone's implemented in that direction already.
  • CW: consensus to go in this direction.
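
An illustrative sketch of the usage-flag direction agreed on above. The flag name used here (`TRANSIENT_ATTACHMENT`) is an assumption; #5396 only fixes the general shape: a texture marked transient may only additionally have render-attachment usage, and its contents never need to be loaded from or stored to memory.

```ts
// Hypothetical usage bit standing in for whatever name #5396 settles on.
declare const GPUTextureUsage_TRANSIENT_ATTACHMENT: number;

function createTransientDepth(device: GPUDevice, width: number, height: number): GPUTexture {
  return device.createTexture({
    size: [width, height],
    format: "depth24plus",
    // Validation rule discussed above: the only other allowed usage is RENDER_ATTACHMENT.
    usage: GPUTextureUsage.RENDER_ATTACHMENT | GPUTextureUsage_TRANSIENT_ATTACHMENT,
  });
}

// A transient attachment only makes sense if it is never loaded or stored,
// which is what lets tilers keep it on-tile / lazily allocated:
function beginPass(encoder: GPUCommandEncoder, color: GPUTextureView, depth: GPUTexture) {
  return encoder.beginRenderPass({
    colorAttachments: [{ view: color, loadOp: "clear", storeOp: "store" }],
    depthStencilAttachment: {
      view: depth.createView(),
      depthLoadOp: "clear",
      depthClearValue: 1.0,
      depthStoreOp: "discard",
    },
  });
}
```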

[bindless] Add the proposal for how bindings are updated in dynamic binding arrays. #5379

  • CW: will try to explain Gregg's concern.
  • CW: was presented at the F2F
  • CW: Want to mutate the bindless bindgroup to add/remove things to/from it as assets are loaded/unloaded
  • CW: Gregg has a valid concern: why do we ask the user to do all that (update/insertBinding/clone; see the sketch after this section), when we could do it for them?
  • CW: problem here: nice to make things easier but:
    • 1) if you can mutate the BG at any point, we either race with the GPU (don't know if binding's in use by the GPU or not), or we need a stall. Reason is that Bindless SetBindGroup records a GPU pointer. (Discussion with Moz) Can't patch the pointer between Encoder.Finish() and submission. Whatever was put in SetBindGroup must be used at submission.
    • 2) don't have a portable way to put descriptor/BindGroup updates on the queue timeline. In Vk without fence extensions, and in D3D12, only the content timeline is allowed to update BindGroups. Once the GPU starts being able to use a bindgroup entry, that entry can't be updated until the GPU work is finished.
    • Because we can only do the mods on the CPU, on the content timeline - it's like buffer mapping, we have to wait for things to finish before we can start CPU work.
  • Going back, then:
  • We'd want to let devs say, I want this slot to be this, make it happen. If something before was in use by the GPU, have to either 1) race with the GPU (we won't do that) 2) synchronously wait for the GPU to be finished - don't want to do that, massive multi-frame stall 3) do some kind of copy-on-write. Update BindGroup - this entry's still in use by the GPU - copy the BG, update the entry in this copy.
  • (3) is what Gregg suggested we could do in browsers.
  • CW: I argue that this isn't a palatable solution because - if the "update binding" performs COW, then all SetBindGroups before won't see the binding. Either COW happened or it didn't. Portability hazard, unpredictable behavior of the API. For that reason, and the reasons Jasper suggested for being explicit about when e.g. a 2 MB copy might happen - I think COW isn't palatable here.
  • CF: +1 in favor of this being an explicit operation. A lot of the complaints we get with our current bindless impl by users - this operation's expensive, and they have to do this to update anything at all. The user should want to do it, and have it in their code.
  • KR: Why would the copy-on-write behavior be application visible? If the bindgroup is in use by the GPU and the browser knows it, then all implementations need to perform the copy-on-write. So why is the behavior not portable?
  • CW: It's a race condition between the browser noticing that the GPU is finished with the work and the application calling updateBindings.
  • KR: I’m assuming there’s some sort of quick CPU polling option. We do a poll, if the thing is still in use then we do the copy on write. Doesn’t matter if the work finishes by the time the copy happens. I fail to see why there’s a race if we do this carefully? If we’re checking fences and provisionally doing the COW.
  • CW: That’s exactly what we’re doing. If the GPU is super fast the next time you call update bindings it’s already done. Then you release to production and start running on slower machine and the GPU takes more time. When you call update bindings the fence hasn’t passed. Previous calls don’t see the updated bindings. So there’s now two paths of the code running.
  • KR: I don’t see how that causes a problem?
  • CW: The reason why it affects previous versions of the binding is that if you have a big scene, start a render pass, bind the bind group, and have a new asset loaded. You add it to the bind group and maybe COW happens. If it does happen you need to set the bind group again, but if it doesn’t then the previous binding stays put.
  • KR: I can’t see how that happens? If you update the bind group you must be seeing the new version after, and the old version before the update, always.
  • BJ: At the moment the pattern is that you don't have to call setBindGroup over and over again; it stays set after setBindGroup. If we do CoW there is a concern that we need to set the bindgroup again where it was set in order to see the new bindings. But if no CoW happens then the new binding would just appear.
  • CW: Two problems with the race. Need to update the bind groups and previous draws may or may not see the binding.
  • KN: Think what Ken is suggesting is that as soon as you call setBindGroup, you can't update that bind group anymore. And Corentin is saying that's not an acceptable/usable API.
  • KR: If you mutate a bind group that has been set, the implementation should implicitly take responsibility for changing the state. Even if by splitting…
  • CW: We could have an API that does that. The other problem: say you are a ray tracer, and you have an algorithm that uses bindless data to do something on your frame. You walk your scene graph, notice a new object, and update the bind group. On a fast machine no COW happens, and the start of your frame sees the updated objects. If the COW happens then the start of your frame won't see the updated object, and interesting things happen. It's a race that's not controlled by the user, and there will always be a way to construct timings that make it easy to break.
  • KR: If the mutation is handled implicitly by the implementation there should never be a scenario where that race can happen?
  • CW: No, but lets go through the queue.
  • JB: I know we want to support update and insert, but if we are only appending new resources is that still a usable model? If we limit mutation to appends it’s a lot easier to deal with. That said I don’t know how this is intended to be used.
  • CW: The slot we get back is a number, so people can see the order of insertion, but that’s fine. The reason we support updates is that some people like to say “slot three is reserved for X”. Helps portability from other APIs. Also helps if you want to allocate a range (all of my cubemaps are in range X-Y).
  • JB: Feels analogous to managing a malloc heap. It’s a mutating operation across threads, but nobody thinks about it that way. When you have ops that are guaranteed “fresh” that offers nice semantics. Seems like something that we could leverage.
  • CF: I’m trying really hard to not take away features that exist on native for bindless. People find bindless to be simple and flexible, and solve lots of problems. Want to preserve that when porting to WebGPU. Realized an API level problem with COW. You now have two distinct bind groups with the same “name”. We’ve all seen how people handle the fact that writeBuffers happens OOO. The more I think about it, the scarier it looks. What I like about clone with updates is you get a new name. The user sees that they are different. For understanding the behind the scenes it’s well worth the boilerplate.
  • JB: In the process of trying to keep things safe we’re adding complicated new machinery that native APIs don’t think about. Need to think about how we can actually ship, not necessarily what the native APIs look like. We have a lot of existing compromises in that direction.
  • CW: Can you explain?
  • JB: Is there anything analogous to clone with updates in native?
  • CF: Yes. <Some vulkan thing>
  • CW: In practice they have their own insert binding that they use manually. They avoid needing the clone-with-update equivalent by allocating enough space up front.
  • JB: They’re using their understanding of the system to avoid unsafe behavior? Is that right?
  • CW: They only run into problems if they run out of space. <Missed something here about doing inserts for them>
  • KR: They know which regions are safe to mutate on the fly.
  • JB: That AAA behavior is what I meant. They know the internals and how to use it safely.
  • CF: Yes, they MUST know that.
  • CW: Only a few minutes left, happy to sync offline. You said “every time you call insert bindings it semantically creates a new version of the bindings and some get optimized out.” That’s a new proposal, different than what we discussed here. Worth discussing.
  • CF: That was my original proposal, I think, before I discussed it with Jasper.
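
To make the contrast with copy-on-write concrete, here is a hedged TypeScript sketch of the explicit pattern argued for above. None of these names (`GPUDynamicBindingArray`, `insertBinding`, `cloneWithUpdates`, `setDynamicBindingArray`) are settled API; they stand in for the "update / insertBinding / clone" operations from #5379. The point is that the application decides when the (possibly multi-megabyte) copy happens and gets a distinct object back, instead of the browser doing a hidden, timing-dependent copy-on-write.

```ts
// All types and methods below are hypothetical stand-ins for the #5379 proposal.
interface GPUDynamicBindingArray {
  // Appends a binding and returns the slot index it landed in.
  insertBinding(resource: GPUTextureView | GPUBufferBinding): number;
  // Produces a new array sharing the old contents plus the given slot updates.
  cloneWithUpdates(updates: Map<number, GPUTextureView | GPUBufferBinding>): GPUDynamicBindingArray;
}

function onAssetLoaded(
  pass: GPURenderPassEncoder,
  bindings: GPUDynamicBindingArray,
  newTexture: GPUTextureView,
): GPUDynamicBindingArray {
  // The copy is explicit and visible in application code...
  const updates = new Map<number, GPUTextureView | GPUBufferBinding>([[3, newTexture]]);
  const updated = bindings.cloneWithUpdates(updates);
  // ...and because `updated` is a distinct object, the application has to
  // rebind it explicitly. Draws recorded before this point keep seeing
  // `bindings`, on fast and slow GPUs alike; there is no race on whether a
  // hidden COW happened before or after earlier setBindGroup calls.
  (pass as any).setDynamicBindingArray(0, updated); // hypothetical setter
  return updated;
}
```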

[bindless] What limit for maxDynamicBindingArraySize? #5373

[bindless] Alternative to the GPUBindGroup addition of a variable size "dynamic binding array" #5372

Agenda for next meeting
