GPU Web 2024 05 29

GPU Web WG 2024-05-29 Atlantic-time

Chair: KG

Scribe: JB/KR/MM

Location: Google Meet

Tentative agenda

Administrivia
CTS Update
“Tacit Resolution” Queue
Add optional feature clip-distances (if there are new questions) #4588
Add optional feature dual_source_blending #4621
Push constant proposal #4612
Should there be a limit on the number of open GPUCommandEncoder instances? #4622
[editorial] Consider if there's a much better name for "image copies" #4573
[spec] What to do with the “Detailed Operations” section?
Agenda for next meeting

Attendance

Apple
- Mike Wyrzykowski
Google
- Kai Ninomiya
- Ken Russell
Intel
- Hao Li
- Jiawei Shao
- Shaobo Yan
- Zhaoming Jiang
Microsoft
- Rafael Cintron
Mozilla
- Jim Blandy
- Kelsey Gilbert
Mehmet Oguz Derin
Myles Maxfield

Administrivia

None

CTS Update

KN: GT has been working on compressed textures, MOD working on 3d textures with compressed texture formats. There were some issues with ETC2 textures on Metal, which we’ll want to discuss at some point.

“Tacit Resolution” Queue

Add missing multisample.count validation #4681
- Backends essentially require this - in render pipeline creation they require the sample count to be supported by the format. (We’re just more restrictive in that we only allow 1 and 4 regardless of format.)

Add optional feature clip-distances (if there are new questions) #4588

JS: Kai had one last comment about whether we needed ??, but this is a pretty straightforward feature to implement on the platform APIs
KN: The compressed texture thing is a good example of this: I would like to keep the spec in a good state, and we thought were in good shape with 3D textures but it turned around to bite us. So in the future I would like to be more careful, and get things implemented landed in the CTS before we add things to the spec, so we can keep the spec “Green”
KG: Yes, we are supposed to land what we think is “likely” in the working drafts, and you’re not supposed to implement the working drafts. We’re supposed to be focussing on things we need to cut the CR. So we need to focus on actually getting to the CR, not adding new features
KR: Just to clarify: working is definitely going towards making CR, lots of PRs from Kai and Brandon
KN: So, what does that mean we should do with this PR?
KG: Hard to say. In part, I’d like to simply stop adding features to the spec, and stop making our work harder. Alternatively, we should just branch already, and separate the thing we’re going to move to CR from the WD that we’re going to continue to contribute new features to.
MM: from an outsider's perspective, CR means nothing, and implementations are what actually matter. if implementations aren't going to implement something, it makes sense to wait to land in the spec. otherwise, landing is good
KG: These things we’re considering are too complicated to get right in the first PR. The WD is supposed to be where we land these ideas and iterate on them as we discover problems. The standard is supposed to be the thing that people can expect we’re moving towards implementing.
KG: There’s not a single right answer: if this group reaches consensus that we should merge this, then we should merge it. I’m advocating not trying to get this right on the first try. We’re supposed to respond to our attempts to implement them [by adjusting the WD]. But whatever the group’s consensus is on what to do with this PR, that’s what we’ll do.
KN: I don’t feel too strongly: If we want to treat the WD as a WD even in this intermediate period, that’s fine with me. We can make sure that nothing we’re positive isn’t ready for CR appears in the CR branch.
KG: How does Google feel about merging this PR?
KN: Google is okay with merging this PR.
KG: I think Mozilla is ready to merge this PR too. Is there more work needed?
KN: The spec is ready.
Conclusion: merge it
(merged)

Add optional feature dual_source_blending #4621

JS: discussed this feature last year. Some issues while creating the PR. Kai raised a blocking issue.
KN: trying to figure out validation rules in the WGSL spec. If you put blend_src on anything, you need exactly 2 locations? Blend sources 0 and 1, on the entry point? And different locations, can't use blend_src at all? Validate this eagerly in WGSL? Trying to figure out how to actually write this down.
JS: agree, no big issues, just spec wording.
KG: anything you wanted feedback from the group on?
JS: only updated the PR, found some new info on Mac/Metal - outputs have to have the same type when doing dual-source blending, or it will error. Simplifies this PR a lot, actually.
KG: need approvals on the PR.
KN: needs editorial work; we'll bring this back here, maybe don't need to wait for a Pacific time meeting.

Push constant proposal #4612

SY: debating whether we should define push constant limits. MM listed several options for implementers. For implicit ones, have more burden - one block in shaders for normal push constant buffer, and the other for uniform buffer. And, prepare a bind group, or have holes for backing buffers.
MM: https://github.com/gpuweb/gpuweb/pull/4612#issuecomment-2138245033
SY: for explicit proposal - have to limit whole numbers. If we want to go this way, perf should be proven - or make sure it performs better than the implicit one.
SY: my opinion: slightly prefer explicit push constants. Concerned about impl burden. Lots of extra work on the API side; shader side, not sure about perf loss risks.
KN: agree that impl burden should be a consideration. Would be nice to have proof that using native push constants had the best perf. Think it's better to keep this feature simple.
SY: this feature's to encourage people to upload small size of data. Maybe limit could help. Going totally implicit - think some requirements can't be met.
MW: can we change the language to not call it "push constants"? It sort of implies it's going into the command buffer, not possible at all in Metal.
SY: yes, "immediate data" sounds better to me; will update the PR.
KG: It sounds like people need more time to consider the different options
SY: I can make a benchmark to show the performance improvement of native push constants
MM: That would be awesome. Please make sure you run it on a variety of different vendors’ cards.

Should there be a limit on the number of open GPUCommandEncoder instances? #4622

MW: wrote a benchmark. Can request a large limit, don't see degradation until excessive numbers are use. Can just make the limit large and make createCommandEncoder when you get to the device-specific number. Impl-wise, very small change on our side.
KG: sounds like broad acceptability?
KN: don't want OOM from createCommandEncoder. Think it will cause weird errors in apps.
MM: The application has to be able to report OOM during command encoder creation. any thing that allocates memory has to be able to report OOM.
KG: I hear you but the platform has endured limits people have run into before. Better if we didn't have to, but probably good enough for vast amount of apps. Apps will probably either fix themselves or file bugs against impls, maybe encourage impls to get better.
KG: It’s similar to maxStringLength in Javascript. It isn’t a problem until it is
KN: to clarify, I'd rather lose the device than OOM.
MM: Sounds good.
KR: I’m concerned that permitting OOM here or Device loss is going to make the API feel fragile: on some subset of devices code that works fine elsewhere is just going go boom. And people are going to start having to work around it. It seems like it’s just better to require the implementation to bear the complexity of providing a more robust implementation that defers to a slower implementation.
MM: This is already true. Make one too-many Javascript objects, and boom you get OOM
KN: (reading Myles' comments)
- MM: Thanks Kai :)
KN: we don't explicitly permit device loss anywhere - but it can happen any time. Wouldn't need to explicitly call it out. Safari can make the choice to lose the device when this happens. If there are stability problems with this in Safari, it's Safari's problem and they can try to mitigate it, for example with record-and-replay. Basically true with anything in the API - just an impl detail.
KG: there are edge cases in FF's WebGL impl where we lose the device for technically valid operations. In practice, tolerable to the impl (a successful, shipping impl). Not about perfectly robust, but sufficiently robust so that real apps can run on your impl.
MW: we did consider record-and-replay for this. Don't know if it would buy us much on lower-end devices - as opposed to requesting higher limit. Repro I uploaded - 6 MB per command buffer. Can record all these, but at some point you'll be killed due to memory pressure anyway. That was our thought process during investigation.
KG: think we'd be within consensus to not have a stated limit, or a very high stated limit in the spec. In practice, most apps don't rely on having more than that.
MW: on Mac, could make 64,000 buffers before the driver complained.
KG: I'd propose that we don't set a limit for this today. But it is useful to give ourselves a mandatory minimum for the purpose of CTS testing. In the spec, impls must support a certain number of open GPUCommandEncoders. Not unbounded, but not making it an explicit limit.
KN: think that's a great idea.
KG: what should be the mandatory amount? 64?
MW: 64 is fine. Tried on an iPad - could make 2000 command buffers before OOM limit.
KG: how about 1000? CTS will try to make 1000 of these and expect it succeeds.
Consensus for this.

[editorial] Consider if there's a much better name for "image copies" #4573

JB: There doesn’t seem to be a name that’s much better, so let’s stop discussing this topic
KN: There isn’t anything that’s perfect, so it depends how bad the current name is. I think the current name is actually pretty bad, but not terrible.
KG: It’s tricky for me to be confident in my opinion, because i’m “damaged” by this existing in API specs. This is the way it’s used a lot of times. I don’t feel like this is much much better than what we currently have. Maybe interested parties will poke at in this PR and see if anything comes out of it. It might be useful to think about constraining this to something individual rather than all at once
JB: One of the ways to provide value in names is to have them be consistent. I think there’s a reason to consider them all at once. We have to have some way for the process to be bounded. Time is expensive.
KN: The way we came to using the word “image” here is it’s in the API in the form of <missed>. It’s totally unclear that those two uses of the word “image” are related at all. Nothing is binding us to keep it. Editors can discuss this.
KG: My 2c is “images” makes a lot of sense if you’re coming from graphics background. That’s the common background. Any 2D surface of texel data is called an image.
KN: Maybe it’s all good and we should just stick with it. Most likely, yes. But editors will discuss one more time.
KG: Any other thoughts? They can go into GitHub later.

[spec] What to do with the “Detailed Operations” section?

KN: Another editing thing. Brandon isn’t here, he’s the one poking at this the most. We have this section, and it defines what actually happens when you rendering stuff. It’s got some good details and it’s not very complete. Dzmitry wrote most of this, because he wanted to have this written out. A lot of it is a useful reference. It’s full of holes. “There’s a hole here, there’s a hole here.”
KN: For the purposes of getting to CR, we’ve been thinking about what to do with this. It’s probably too complicated to flesh out the entire thing completely. We had this plan to point at the Vulkan spec, and say “Here’s the inputs we push to the Vulkan spec, and we get the output.” We haven’t figure out how to do this, and it’s also complicated.
KN: One idea is take the section, extract the bits that were important to the rest of the spec, and keep them. Everything else that’s a reference for how things are supposed to work would be a separate document, which would be OK if it’s full of holes. It would include key details like standard sample locations, but not the full algorithms.
KN: Would this be sufficient for CR, if we actually didn’t describe how to draw stuff?
KG: We don’t have to exhaustively list every component. It has to be useful by a partially-informed person to create an implementation. We will always have gaps. WebAudio specs don’t describe how to do an FFT. We can just say “rasterization happens.” That, in concert with the CTS, will be sufficient. It would be nice to have a little bit, maybe, for “what does dispatchWorkgroups() do?” but….
KN: Okay. Brandon and I will continue looking at this and see what we can do. I think we can come up with something that doesn’t involve moving all of this out of the spec. But that’s an option as well, if there’s enough left over. We could make a copy of this in another document, and reduce the amount that’s in this spec… but I agree.
KN: The fact that it’s supposed to be implementable on hardware, and implementable on these backend APIs puts all the restrictions in place, really. So, I think it’s probably suitable for CR if we are a bit hand-wavey when it comes to some of the details.
KG: I find it particularly difficult to phrase what would be pseudocode and put that into an Algorithm in bikeshed. I wish we had details about how things operated even if they weren’t in the normative Algorithm form. Whatever helps us get the details in, as long as we all understand it, that’s what we need
KN: We just need to say “here’s what various things do” and we pick and choose the important stuff.
KG: The OpenGL spec goes into exhaustive detail on how filtering works. I would not wish for anyone to have to convert that to Bikeshed algorithm format. If it would be useful to have those things in the spec, but onerous to put them into the algorithmic format… any form we find useful is good. We can improve on it later.
KG: We have a section for transfer. Do we have any transfer operations?
KN: Those are the copy operations. I don’t know why they’re called “transfer”
KG: OK. It’s hard to say exactly where the threshold is. Put enough in that you feel like an implementation is possible. We can ask each person in the WG to send it off for CR. And we can address any comments that come in.
KN: I wonder if we can take everything that says “editorial note” and fill it out with details.
KG: File an editor’s issue, and we can do incremental improvement.
KN: We’ll keep figuring it out.

Agenda for next meeting

Add optional feature dual_source_blending #4621
Push constant proposal #4612
KG: Let’s do it at Atlantic time. In 2 weeks. On the 12th.
Agenda requests
- 3D compressed textures: ETC on macOS is failing validation #3183
- Feature levels again?
- (not before the 24th) Presentation about GPU memory isolation from UCSC

GPU Web 2024 05 29

GPU Web WG 2024-05-29 Atlantic-time

Tentative agenda

Attendance

Administrivia

CTS Update

“Tacit Resolution” Queue

Add optional feature clip-distances (if there are new questions) #4588

Add optional feature dual_source_blending #4621

Push constant proposal #4612

Should there be a limit on the number of open GPUCommandEncoder instances? #4622

[editorial] Consider if there's a much better name for "image copies" #4573

[spec] What to do with the “Detailed Operations” section?

Agenda for next meeting

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!