
GPU Web 2020 11 09

Chair: Corentin

Scribe: Austin / Ken

Location: Google Meet

Tentative agenda

  • Multi-queue synchronization #1169
  • Documented guaranteed swap chain formats and validation #1185
  • PR burndown
    • Add validation rule for QuerySet and Timestamp Query #1161
    • Make blending state explicit #1134
  • Agenda for next meeting

Attendance

  • Apple
    • Dean Jackson
  • Google
    • Austin Eng
    • Brandon Jones
    • Corentin Wallez
    • James Darpinian
    • Kai Ninomiya
    • Ken Russell
  • Kings Distributed Systems
    • Dominic Cerisano
    • Hamada Gasmallah
  • Microsoft
    • Damyan Pepper
  • Mozilla
    • Dzmitry Malyshau
    • Jeff Gilbert
  • Eduardo H.P. Souza
  • Michael Shannon
  • Mehmet Oguz Derin
  • Timo de Kort

Administrivia

  • W3C's new Patent Policy
    • DM: Choice of going with it or the old way? Or are we required?
    • CW: WG has a Patent Policy which is the former one. There’s a new (better?) policy which we can optionally switch to. Probably shouldn’t answer now, but should reach out to W3C reps.
  • Publish WebGPU as First Public Working Draft #1210
    • François suggests that we can do it soon because the specification covers most of the API, though right now it doesn’t cover things like rendering, etc. Should we have a placeholder description to make sure we take into account the Essential Claims Exclusion?
    • KN: Not sure if we need that because we’re not implementing rasterization algorithms - we’re just asking the system to do it. Obviously don’t know the legal answer here, but it’s possible the legal things for that aren’t relevant. Especially if we don’t describe it and just say to go with the documentation on how rasterization occurs.
    • DJ: That’s how we looked at it internally. This is not describing that level of detail. It’s describing APIs on top of things that already do it. If we do want to cover rasterization algorithms, that’s different.
    • CW: Aren’t we going to need to describe the rasterization algorithms in some form? Because we don’t want the LOD computation to differ between systems. Most likely not different, but may be small differences in the APIs.
    • KN: Maybe those could be carved out as an exclusion. Not contributing a rasterization algorithm but just describing one.
    • KR: Could just be covered by the conformance suite. Like WebGL, this is a wrapper API and doesn’t contribute the algorithms.
    • CW: Difference is that WebGL is explicitly a wrapper API. Basically GLES, but formalized for constraints of web / JS. It says, “we take GLES, and add a layer on top”. Don’t know how practical this is for WebGPU. “Go look at Vk / D3D12 func spec for how these things work.”
    • KR: Anticipate adding a ton of text to describe line / triangle rasterization?
    • CW: maybe. We need to guarantee things like min precision of certain operations, etc. Even that the point center is at (0,0) and not (0.5, 0.5) like D3D9.
    • KR: Inclination is that like Canvas2D which doesn’t cover the exact details like that. It just says: these are the pixels covered if you drew XYZ. Don’t think WebGPU spec would cover the exact details of how the line is rasterized. I think that would be covered by the conformance suite and not have patented documentation in the spec.
    • CW: What do other people think?
    • KN: would be nice if we could sidestep that issue. Should try to.
    • RC: +1 to writing as little as we can get away with. Corentin’s right, we can’t just refer to the OpenGL man pages, will have to spec things ourselves.
    • DM: I don’t know about this one.
    • CW: let’s think about it more. I need to research more, had some concerns about not having it.

Documented guaranteed swap chain formats and validation #1185

  • CW: is the only active part of the discussion the part about the format?
  • BJ: question about the formats that should be guaranteed. BGRA8, BGRA8_SRGB, unorm’s in there somewhere, RGB16F. If any of those formats are not something you want or you want more, let us know.
  • BJ: also the behavior that PR describes right now is that you can only use either one of those guaranteed formats or the preferred format. To use the preferred format you have to query it from the API first. Can’t assume RGBA will be a preferred format all the time. These are the 2 areas in question.
  • KN: Only other thing I remember is how it interacts with wide color and HDR. That discussion is still going. Should we allow f16 only on canvases created with certain color space attributes?
  • BJ: I do recall one of the discussion points: when querying preferred swap chain format, you’re using both the device and canvas as inputs. Why? What do we expect to change that outcome? e.g. canvas is on an HDR monitor, or there is an HDR monitor plugged into the system that the canvas might be put on at some point? This text right now says the UA can use whatever logic it wants to choose the preferred format, but we could use more text to describe what impacts the decision.
  • CW: also the API’s currently asynchronous, and potentially it could be synchronous.
  • KN: good point. I don’t know how these would change except maybe on Android and you’re displaying other wide-color-gamut content vs. not.
  • CW: none of these depend on the GPUDevice nor asynchrony. On Android the non-WCG is RGBA8Unorm; other OSs are BGRA8Unorm (?)
  • KN: suspect we’d choose the preferred format statically per OS depending on whether the canvas is HDR or not. Need you to tell us that information.
  • JG: whether sRGB is the optimal format depends on the system.
  • KN: had it in mind that that was fixed.
  • JG: I don’t think that’s how it works. sRGB is a different encoding than unorm 8
  • DM: On d3d you can’t create an unorm swapchain. You have to create a view of it.
  • KN: don’t think we can use D3D swapchains, practically speaking, in a browser. Talked with a lot of people.
  • KR: pretty sure that Chrome’s windows team has found you can create overlays for RGB swapchains, not just YUV - at least on some GPUs. They’re looking into doing that for desync=true canvases and probably offscreen canvas too
  • JG: we looked into this in Firefox for our DirectComposition path.
  • KN: might be that you don’t use SwapChains for this but another mechanism.
  • CW: preferred swap chain format might be BGRA8 unorm, sRGB, and not just BGRA8 unorm.
  • JG: We have at least one device where that’s the only actual fast path.
  • CW: do you know this on GPU process start?
  • JG: probably, yes.
  • CW: maybe we can remove the asynchrony. RGBA8Unorm, or sRGB variants, are acceptable formats. getPreferredSwapChainFormat depends only on canvas format, not GPU device, and no asynchrony.
  • DM: if it doesn’t depend on the device then it doesn’t know about what API will be used with it.
  • JG: why remove the GPUDevice?
  • CW: if it’s unnecessary. Is it? Can we type up a list of exact formats? Do we need a device for that determination?
  • DM: what does it guarantee? If I take the preferred format, am I guaranteed my swap chain will be created with this format?
  • JG: yes.
  • CW: whatever’s on the SwapChainDescriptor’s what you get.
  • DM: so if my preferred format isn’t part of the baseline formats, e.g. if I’m on a special HDR monitor, I got the preferred format, disconnected the HDR monitor, and then create the swap chain with the preferred format, is that legal?
  • JG: Preferred format would need to be part of the valid acceptable formats.
  • DM: that’s not what we discussed earlier. We want a list of valid formats all the time, and the preferred format can be different.
  • KN: Probably would be a fixed list of what those could be, but probably would not be allowed by default. Don’t think it will depend on the monitor. It might depend on the GPU vendor, but it would be limited.
  • JG: for portability reasons you need to be able to test these without the device. Need to be able to ask for a format that isn’t the preferred one, and test that.
  • CW: For SDR we can assume it’ll always be one of the RGBA8unorm variants - sRGB or a different swizzle. And say those formats are always allowed - and then figure out HDR later.
  • KN: Agree we should figure out HDR later, and would appreciate Jeff and Ken’s input on that.
  • JG: Also a discussion about how canvas does HDR. WICG canvas color space. https://github.com/WICG/canvas-color-space
  • KR: I’d be concerned about advertising something like BGRA all the time if it’s not natively supported and incurs a full device blit. If people write content assuming this, and on mobile platforms incur a fullscreen blit, it’ll be a big perf cliff. We’ve run into this with alpha:false and premultipliedAlpha:false on WebGL.
  • JG: Agree, but we need to accept a wider variety than what the platform natively supports in the name of portability. If you’re on desktop and write it assuming BGRA, and then go to Android and test there, your application just won’t work. Would be nice if it worked but maybe not as well.
  • KR: What about devtools emulating it? Emulate mobile device would only advertise RGBA8 path.
  • JG: good idea, we should have this. The problem is that there are a lot of constraints on how this interacts with things: you have to attach it to a render target, and it actually has to be that format, as opposed to how it works in WebGL, where you can’t see what the real format is. Have to handle weird things like blitting multisampled things to it. There are more constraints in WebGPU, so we can’t have the same softening of the edges like we do in WebGL. It has to be actual formats, or behave like they are.
  • CW: maybe we can say: these are the 4 formats the swap chain supports. If your canvas is SDR, these are the only types you can pass in. Preferred format - which you should use - should be one of these 4. Can test your code - will be portable - but all docs should say, please use the preferred swap chain format.
  • BJ: Know we didn’t want to get too into HDR, but one question about that pattern is that what if the preferred format is an HDR format, but my content is not? Operating under the assumption that if I ask for an HDR rendering pipeline and write to it as if it’s rgb or sRGB, it might not work out well. I’d like to know the preferred 8-bit format.
  • KN: We wouldn’t prefer an HDR format unless the canvas were set up that way - which is an explicit step.
  • BJ: what does that look like? Is that something that happens during rendering context creation time?
  • KN: yes. We don’t have that path yet.
  • JG: FWIW, there’s an orthogonality between pixel format and color space, so we don’t have to worry so much. If you render 1.0 into the red channel, you’ll get 1.0 no matter what the pixel format is. It just might be a different XYZ color based on the color space.
  • KN: still wouldn’t want to get an F16 texture out of the blue. Only if you did something saying you can support it.
  • JG: it can be part of the acceptable formats though.
  • KN: Probably would like to have it depend on how the canvas is set up, or a flag in getPreferredSwapChainFormat.
  • BJ: Does that also mean the float16 format should not be guaranteed unless the flag has been set?
  • KN: probably.
  • JG: I don’t see why we have to forbid those. Not the preferred format, but we can sample / composite those in our renderer process.
  • BJ: Doesn’t tone mapping have to take place?
  • JG: only for color spaces. They’re different from pixel formats.
  • KN: Some color spaces can’t handle a number greater than 1 which f16 will support.
  • CW: question is: general consensus is the preferred swap chain format should be part of the allowed formats, in general. For testing, and other reasons. Question now: if your canvas is SDR, can you use FP16? Do we want to make people handle this complexity? Or only if you have the flag set? Otherwise I think there’s rough agreement on the rest of the ideas.
  • KR: Think it would be fine if it’s opt-in. Not clear how the opt-in is made available.
    …
    How about we say that that’s where you opt in to higher bit depth, and the preferred swapchain format falls out of that.
  • CW: Okay so preferred format changes on that, but it also changes the list of guaranteed formats. Now we’re also saying the preferred format also has to be part of it - so it’s more like the allowed format list.
  • BJ: Preferred formats only added to allowed/guaranteed formats after being queried?
  • CW: I think the consensus is that we want to guarantee that the preferred format is part of the original allowed format list.
  • RC: I think that’s reasonable.
  • CW: And so, maybe we can start with that and talk offline about whether fp16 and rgb10a2 are part of the allowed format list for SDR canvases or not. It would be a small change compared to the current state of the spec.
  • BJ: Okay sounds good to me. I’ll make the changes and can continue discussion on the github issue.
  • DM: about asynchrony - can we agree to not have this asynchronous?
  • CW: I think so.
  • DM: can we have it on something that has both the GPU context and adapter?
  • KN: Not sure if the adapter matters.
  • RC: if you have multiple adapters, the list of allowed formats might differ.
  • KN: think we have to guarantee the list of supported formats on all devices.
  • RC: then same thing between a nice monitor and a crummy monitor.
  • KN: not strong connection between adapters & monitors. Can always move content between them.
  • MS: problem I see: on the Intel side, almost every CPU has an integrated GPU. Will every integrated chip drag down the capabilities of the system?
  • JG: not every CPU has an integrated GPU.
  • KN: I think that we should support exactly the same list of formats on every system and the browser would be responsible for making it work.
  • CW: to go back to DM’s question: 1) do we need asynchrony for getPreferredFormat? 2) does it depend only on the swapchain, or the device you want to use? Michael’s saying it should depend on the device. Many will want BGRAunorm, sRGB, and maybe the bigger GPU on the system will want something different.
  • JG: When we say device do you mean adapter?
  • CW: Yes, sorry.
  • BJ: Device could be passed but it implicitly represents the adapter.
  • DM: there’s no need to pass the device. Pass the adapter. That’s what you pass at the native level.
  • CW: the native APIs take adapters. So: this API is synchronous, and returns one of the allowed formats. [A rough sketch of this shape follows at the end of this section.]
  • DM: Takes in GPUCanvasContext and GPUAdapter.
  • CW: HDR we can discuss offline.
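
A minimal sketch, in TypeScript, of the shape converged on above. The method name getSwapChainPreferredFormat, its placement on the canvas context, the synchronous signature taking a GPUAdapter, and the guaranteed SDR format list are taken from this discussion and PR #1185 rather than from a finalized spec, so every identifier below is illustrative (and assumes WebGPU type declarations such as @webgpu/types from this era).

```ts
// Illustrative only; names and signatures follow the discussion above, not a spec.
// Candidate always-allowed SDR swap chain formats from #1185 (rgba16float and
// rgb10a2unorm would need an HDR / wide-gamut opt-in and are still open):
const kGuaranteedSwapChainFormats = [
  'bgra8unorm', 'bgra8unorm-srgb',
  'rgba8unorm', 'rgba8unorm-srgb',
];

const adapter = await navigator.gpu.requestAdapter();
const device = await adapter.requestDevice();

const canvas = document.querySelector('canvas');
const context = canvas.getContext('gpupresent');

// Synchronous query taking the canvas context (as `this`) and the adapter;
// the result is guaranteed to be one of the formats listed above.
const format = context.getSwapChainPreferredFormat(adapter);

// Whatever format is passed in the descriptor is the format you get.
const swapChain = context.configureSwapChain({ device, format });
```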

Multi-queue synchronization #1169

  • DM: had 2 proposals. Talked a lot, converged into one. Tricky parts:
  • DM: 1) removes the notion of fences. Integrates fence semantics into the queue itself. JG suggested trying to get that in separately from the multi-queue change, so that’s probably what we’ll do next.
    • CW: trying to present that part. Fence logic was folded here in GPUQueue.
    • DM: each Queue has a counter with two values: last submitted job, last completed job. That’s how you synchronize between them.
  • DM: 2) It’s still explicit. You say what queues you want to synchronize with. Because there’s no indirection via fences, it becomes very clear how to synchronize. I believe it’s easy enough to use that people will be able to use it.
  • RC: is last signaled value what you tie to signalWith? not something that increments each frame?
  • CW: Of note, signal() changes to not take a value. Instead, it gives you a value that is guaranteed to be increasing.
  • RC: talking specifically about attribute called lastSignaledValue. Does that increment when you call the API, or can the UA increment that whenever it wants?
  • CW: It would go up when you call signal(). I propose it always returns ++lastSignaledValue.
  • RC: if it returns 2, then lastSignaledValue will return 1?
  • CW: no. If lastSignaledValue is 1, signal() returns 2. then lastSignaledValue == 2. It’s the last value returned by signal. For composability, and to make it easier for people to remember.
  • JG: Do we need to have this in the multiqueue PR?
  • CW: it’s necessary for multi-queue things because it simplifies the behavior of multi-queue heavily. No multiple fences per queue that you deal with.
  • JG: it simplifies it but doesn’t fundamentally make it possible.
  • CW: Yes.
  • DM: I’d say the properties of the proposal are different if it lands without the change. If we want to separate them we should block multiqueue on the change first.
  • CW: I think the fence as it is today is a useless object. Not useful to have more than 1 per queue. For composability, maybe, but this proposal is composable, too. Seems like a good improvement to merge Fence and Queue.
  • RC: is it also the case that the valid values you can pass into wait() are valid values you got from signal() before? Can’t hang system by calling wait(6 million)?
  • CW: Yes. Cannot hang with this.
  • JG: You change fence relatively substantially here, but keep the parts you say were useless.
  • CW: Can you elaborate?
  • JG: You said you felt Fence was a useless object. It’s weird we’re changing Fence given..
  • KN: it’s not that fences were useless, but the ability to make multiple Fences per Queue were useless.
  • CW: two things. what Kai said, and in a single queue world, the fence object is only there so that you can see the queue has finished work. And actually you could do the same thing by calling mapAsync. Fence doesn’t bring that much value.
  • JG: I think one of the core reasons we have them this way is because it’s mostly what they look like in D3D and Vk. And what is a concern anytime we take something people do and simplify it is that it becomes useful later and we have to add it back in. Then we end up with two things.
  • DM: I don’t see how we’re going to need that.
  • JG: hasn’t happened yet. That’s the risk here. We’re trying to design an API for 5-10 years. All the APIs aren’t done changing things around. Seems risky to take it out and add more constraints because it doesn’t look valuable right now.
  • CW: it’s a valid concern. For fences - fences are always used in all APIs as a flag saying “we went past this execution point or not”. Even if APIs start using them for more things, the intuition is that the native APIs say that. Maybe you can wait on something not signaled yet; interop; do integration between different pieces of hardware; etc. My intuition is that there won’t be major change from saying that we went past this point of execution on a particular timeline. That’s what we’re doing here.
  • JG: here we’re changing the sovereign Fence object to a tuple of GPUQueue and signal value.
  • CW: that’s what our implementation was doing internally.
  • JG: if you typedef it that way, that’s what a GPUFence is.
  • CW: For multiqueue, it helps to reduce the complexity by having a single fence per queue.
  • JG: But we don’t have that here.
  • CW: there’s a single D3D-style fence. It’s a number that gets incremented as execution goes along. Compared to current state of the API, it’s like no createFence(), but GPU.fence / Queue.fence. What’s the use of having multiple fences per queue?
  • JG: I don’t have a concrete use case right now. My concern is that this is a big change. I understand that this simplifies multi-queue, but the reason we’re off in the weeds is that this is a big semantic change.
  • CW: A little background: the proposal is that when you do a submit with a bunch of resource transfers, the receiving queue explicitly waits on the value. There’s explicit handoff and explicit synchronization. If you have multiple fences per queue, there’s a question of when you do a transfer, which fence is it tied to? Or is it tied to all the fences as soon as they’re signaled after this point - but that might be a lot of impl tracking complexity… [A sketch of the single-fence-per-queue shape follows at the end of this section.]
  • KN: also: I don’t think we had agreement on multi-queue without this. There was pushback on having it be explicit. This has it implicit, to simplify things and make them palatable.
  • JG: It’s hard to call things explicit vs. implicit and describe them as whether or not people had appetite for them. IIRC the part people didn’t have appetite for was different from this.
  • CW: what should we do for this? Split the multi-queue and Fence changes, see what each of them looks like independently, and keep that merged proposal as a reference?
  • KN: is there a reason to not just try to solve the fence thing first? It changes MQ enough that I think we should just block MQ on it.
  • JG: we can talk about it. I think the fence changes aren’t necessary for MQ, but the solution for how this handles MQ without the fence changes would have been sufficient. Would like to know more about what this makes a lot better.
  • CW: I can write this down. What the tracking would look like with multiple fences and a single fence.
  • JG: we can have fences not exposed to the users. That’s what I thought we’d be doing with submit with transfers.
  • CW: I’ll try to replay the process that went into this proposal.
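
A rough sketch, in TypeScript, of the merged fence-and-queue shape discussed above. The interface and member names are illustrative (the #1169 proposal is still under discussion); the behavior shown - signal() taking no argument and returning ++lastSignaledValue, and wait() only accepting values previously returned by signal() so a bogus value cannot hang the system - is as described in this meeting.

```ts
// Illustrative sketch of the #1169 direction; none of these names are spec'd.

// Placeholder for the real GPUCommandBuffer type (from @webgpu/types).
type GPUCommandBuffer = unknown;

interface GPUQueueWithTimeline {
  // Last value returned by signal(); the queue also tracks internally how far
  // its GPU timeline has progressed (last submitted vs. last completed work).
  readonly lastSignaledValue: number;

  // Marks a point on this queue's timeline and returns the new, monotonically
  // increasing value, i.e. ++lastSignaledValue. No value is passed in.
  signal(): number;

  // Makes later submissions on this queue wait until `producer` has passed
  // `value`. Only values previously returned by producer.signal() are valid,
  // so a made-up value is a validation error rather than a hang.
  wait(producer: GPUQueueWithTimeline, value: number): void;

  submit(commandBuffers: GPUCommandBuffer[]): void;
}

// Usage sketch: explicit handoff of a resource produced on queueA to queueB.
declare const queueA: GPUQueueWithTimeline;
declare const queueB: GPUQueueWithTimeline;
declare const workThatWritesTexture: GPUCommandBuffer;
declare const workThatReadsTexture: GPUCommandBuffer;

queueA.submit([workThatWritesTexture]);
const point = queueA.signal();   // e.g. returns 2 if lastSignaledValue was 1
queueB.wait(queueA, point);      // receiving queue explicitly waits on the value
queueB.submit([workThatReadsTexture]);
```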

PR burndown

Agenda for next meeting

  • There are too many combinations of texture bind point type and sampler bind point type #1164
  • More multi-queue / fence changes?
  • Robust buffer access performance cost? This meeting or the WGSL meeting?
    • JG: probably WGSL.