Proposal for timeline fences #94
Conversation
IIUC the question (i.e. it's waiting for a fence on the CPU), we can borrow

Question: what's the value of having a value stored in a fence, given the limitations? Can't we get away with creating a new fence for every signal, like this?

```webidl
interface WebGPUQueue {
    [NewObject] WebGPUFence signal();
};

interface WebGPUFence {
    boolean isSignaled();
    Promise<void> onSignaled();
    void wait(); // only for workers
};
```
The solution you are describing is what's in the sketch IDL and what Vulkan has. Vulkan developers have complained that they want timeline fences because it matches their existing usage of D3D12. Also, having a number in the fence means you don't have to do the frame-counting yourself, something that almost every user of WebGPU would do, so we do it for them. It also creates less garbage.
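To illustrate the point about frame-counting, here is a minimal sketch (plain Python, not the WebGPU API; the class and method names are hypothetical) of how a single timeline fence with a monotonically increasing value covers every frame, where boolean fences would need a new object per frame:

```python
class TimelineFence:
    """Hypothetical model of a D3D12-style timeline fence."""

    def __init__(self):
        self.completed_value = 0  # last value the GPU has signaled

    def signal(self, value):
        # Signal operations must be well-ordered: each value increases.
        assert value > self.completed_value
        self.completed_value = value

    def is_completed(self, value):
        return self.completed_value >= value


# One fence object covers all frames: the app signals frame N and later
# checks whether frame N - 1 finished, with no per-frame fences to collect.
fence = TimelineFence()
for frame in range(1, 4):
    fence.signal(frame)           # pretend the GPU finished this frame
assert fence.is_completed(2)      # frame 2 is done
assert not fence.is_completed(4)  # frame 4 hasn't been signaled yet
```

This is why the fence does the frame-counting for the application: the value *is* the frame count.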
Force-pushed from a7a5ead to 6e3ff17
PTAL @kvark @jdashg @litherum @RafaelCintron, I addressed the comments from the call, so I think it is in a good state now (even if the prose is a bit ew).
> To mirror native APIs, fences are created directly on the `WebGPUDevice`.
> `` ```webidl ``
if we put webidl in the design docs, how do we make sure it's valid?
No idea, I don't think there is anything that can compile the code snippets the way rustdoc does.
> ## Questions
> - Should we call fences "timelines" and have them created on queues like so: `queue.createTimeline()`?
"Fence" is more familiar from D3D12, which we mimic here.
> ```webidl
> // Queue
> interface WebGPUQueue {
>     void submit(sequence<WebGPUCommandBuffer> buffers);
>     WebGPUFence insertFence();
> ```
Now that you've deleted insertFence and changed wait, this pull request should contain modifications to the buffer-mapping document to explain the new usage.
The whole buffer-mapping.md doc is broken now because we don't have transitions; we need to rework it completely and figure out how it will work with the updated fences, if at all. Since buffer mapping isn't an enqueued operation anymore, I don't think it makes sense to wait on a fence to see the mapping.
Since this PR makes it out-of-date, I was expecting you to update the buffer mapping document as part of this PR.
We can do it as a subsequent update if you wish but we should tackle it sooner rather than later.
At the F2F we discussed buffer mapping again and seemed to have consensus around:
- We need `WebGPUBuffer.mapReadAsync` and `WebGPUBuffer.mapWriteAsync`. `mapWrite*` returns a zero-filled buffer.
- The buffer is either owned by the GPU (on one or multiple queues) or owned by the CPU. Never both at the same time.
Things that were still unknowns:
- How do we know when the mapping is ready? We talked about promises but fell into the same promises-vs-not debate as a year ago.
- Are we able to map sub-ranges of the buffer, or only the whole range?
At this point I think our old proposal of having a `WebGPUMappedMemory` object might make sense (minus the transitions).
I agree that buffer mapping needs to be handled sooner than later, but I think it is way bigger than the scope of this PR.
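For concreteness, here is a small sketch (plain Python, not the WebGPU API; `map_write` and `unmap` are hypothetical stand-ins for `mapWriteAsync`/unmapping) of the ownership model discussed at the F2F, where a buffer is owned by either the GPU or the CPU but never both:

```python
class Buffer:
    """Hypothetical model: exclusive GPU-or-CPU ownership of a buffer."""

    def __init__(self, size):
        self.size = size
        self.owner = "gpu"  # buffers start out usable by queues
        self.data = None

    def map_write(self):
        # Transfers ownership to the CPU and returns a zero-filled range,
        # mirroring "mapWrite* returns a zero-filled buffer".
        assert self.owner == "gpu", "buffer is already mapped"
        self.owner = "cpu"
        self.data = bytearray(self.size)
        return self.data

    def unmap(self):
        # Ownership goes back to the GPU; the CPU view is invalidated.
        assert self.owner == "cpu", "buffer is not mapped"
        self.owner = "gpu"
        self.data = None


buf = Buffer(16)
view = buf.map_write()
assert all(b == 0 for b in view)  # zero-filled on map
view[0] = 0xFF                    # CPU writes while it owns the buffer
buf.unmap()                       # ownership returns to the GPU
assert buf.owner == "gpu"
```

The exclusive-ownership invariant is what removes the need for explicit transitions in the mapping path.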
Fair enough. Might make sense to put a note in the buffer mapping explainer telling people that parts of it are incorrect and need to be reworked.
Done
design/TimelineFences.md
> This call generates a Javascript exception if `value` is greater than the fence's signaled value (to make sure the promise completes in finite time).
> There is no way to synchronously wait on a fence in this proposal (it is better handled via the requestMainLoop proposal).
> The promise is completed as soon as the fence's completed value is highr than `value`.
Nit: highr -> higher.
Done
> A Javascript exception is generated if:
> - `value` is smaller or equal to the signaled value of the fence.
> - [multi-queue] the fence must be signaled on the queue passed as the `signalQueue` of its descriptor.
>
> This restriction is to make sure the signal operations are well-ordered, which would be more complicated if you could signal on multiple queues.
Since the API will be validating that any wait on a fence will be unblocked by a prior signal operation that has been enqueued, I am not sure I understand how multi-queue signaling would affect well-orderedness. Can you please provide an example where this would break? What would become more complicated, such that single-queue signaling would simplify it?
Things would break down in cases like this. Let F be a timeline fence (current and signaled value at 0) and Q1, Q2, Q3 be queues. The application issues the following:
1. `Q1.signal(F, 4)`
2. `Q2.signal(F, 5)`
3. `Q3.wait(F, 5)`
If Q1 is scheduled before Q2 then the fence value is 5 and Q3 is unblocked. All is good.
If Q2 is scheduled before Q1 then the fence value is 4, and Q3 is never unblocked, which isn't valid (all execution must finish in finite time, this is a good thing and also a requirement of Vulkan on Linux).
To avoid this case we would need to validate that the signal operation on Q2 happens after the signal operation on Q1. This would require the application to put another fence between Q1 and Q2 and the implementation to walk the signaling graph to do the validation. This is complicated and I don't expect signaling on multiple queues to be a common use case at all.
Single-queue signaling helps because the ordering of the signal operations is obvious.
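The hazard above can be simulated in a few lines (plain Python, hypothetical names; each signal sets the fence's completed value, as in the example): because queue scheduling order is not guaranteed, the two interleavings of Q1 and Q2 give different final values, and one of them strands the wait.

```python
def run(schedule):
    """Apply signal values in scheduling order; each signal sets the value."""
    completed = 0
    for value in schedule:
        completed = value
    return completed


def wait_unblocked(completed, value):
    """A wait(value) unblocks once the completed value reaches it."""
    return completed >= value


# Q3 waits for value 5.
assert wait_unblocked(run([4, 5]), 5)      # Q1 then Q2: fence ends at 5, fine
assert not wait_unblocked(run([5, 4]), 5)  # Q2 then Q1: ends at 4, Q3 hangs
```

Restricting signals to a single queue makes the `[4, 5]` ordering the only legal one, so the second case can never be enqueued.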
Thank you for the example, @Kangz. I agree that what you've proposed here should avoid the problem in your example as well as other deadlocks where two queues wait for different fences and the unblocking Signals are trapped behind those waits.
Out of curiosity, what happens in Vulkan on Linux if you have multiple queues waiting for something that never gets signaled? Does the OS hang or crash or something else?
Separately, if we ever add Signaling from the CPU to the API, then your example would become a problem. Since CPU Signaling is done on the fence itself, as opposed to the queue, we can't enforce the "that has been enqueued" part of "any wait on a fence will be unblocked by a prior signal operation that has been enqueued" earlier in the document.
Signaling on the CPU allows us to do producer/consumer scenarios that involve long-running GPU operations whose final step depends on a CPU upload. Similarly, multi-queue GPU operations may produce textures that are read back by the CPU in a round-robin fashion. Both of these scenarios can still be done, but we would lose CPU/GPU parallelism.
I do not think this is a big deal for MVP but something to keep in mind for the future.
> Out of curiosity, what happens in Vulkan on Linux if you have multiple queues waiting for something that never gets signaled? Does the OS hang or crash or something else?
No idea :)
> Separately, if we ever add Signaling from the CPU to the API, then your example would become a problem. Since CPU Signaling is done on the fence itself, as opposed to the queue, we can't enforce the "that has been enqueued" part of "any wait on a fence will be unblocked by a prior signal operation that has been enqueued" earlier in the document.
This has the same issue that it isn't supported on Vulkan and isn't possible on Linux. I agree that it is an interesting use-case so if/when Vulkan has that capability we can think about it again. Also a fence on the CPU would probably be only signalable on the CPU.
@RafaelCintron are there other changes you'd like before merging this PR?
Thanks for the review, I'll land this now.