
Conversation

@Kangz (Contributor) commented Oct 8, 2018

No description provided.

@dmikis commented Oct 8, 2018

How do we wait synchronously on fences?

IIUC the question (i.e., it's about waiting for a fence on the CPU), we can borrow the Atomics.wait logic: allow blocking in workers, and throw an exception on the main thread (i.e., disallow blocking on the main thread).
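
A minimal sketch of those semantics, assuming a hypothetical blocking `wait()` method on the fence (nothing here is settled API):

```js
// Hypothetical sketch: Atomics.wait-style rules for a blocking fence wait.
// `fence.wait()` is an assumed method, not part of the current proposal.
function waitSynchronously(fence) {
  const inWorker =
    typeof WorkerGlobalScope !== 'undefined' &&
    self instanceof WorkerGlobalScope;
  if (!inWorker) {
    // Mirror Atomics.wait: blocking the main thread is disallowed.
    throw new TypeError('cannot block on a fence on the main thread');
  }
  fence.wait(); // in a worker: block until the GPU signals the fence
}
```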

@dmikis commented Oct 8, 2018

Question: what's the value of having a value stored in a fence, given the limitations? Can't we get away with creating a new fence for every signal:

interface WebGPUQueue {
    [NewObject] WebGPUFence signal();
};

interface WebGPUFence {
    boolean isSignaled();
    Promise<void> onSignaled();
    void wait(); // only for workers
};

?
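
For illustration, usage of that shape might look like this (hypothetical sketch; `queue` and `commandBuffer` are assumed to exist, inside an async function):

```js
// Hypothetical usage of the one-fence-per-signal shape sketched above.
queue.submit([commandBuffer]);
const fence = queue.signal();        // a fresh fence for this point in time
await fence.onSignaled();            // resolves once the GPU reaches it
console.assert(fence.isSignaled());  // now reports true
```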

@Kangz (Contributor, Author) commented Oct 8, 2018

The solution you are describing is what's in the sketch IDL and what Vulkan has. Vulkan developers have complained that they want timeline fences because that matches their existing usage of D3D12. Also, having a number in the fence means you don't have to do the frame-counting yourself, something that almost every user of WebGPU would otherwise do, so we do it for them. It also creates less garbage.

Having wait be like Atomics.wait would be an option; I'll add it to the open questions.
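
To make the frame-counting point concrete, here is a hedged sketch of a triple-buffered render loop paced by a single timeline fence; the `queue.signal(fence, value)` and `fence.onCompletion(value)` shapes, and the `recordFrame` helper, are assumptions rather than settled API:

```js
// Hypothetical sketch: one timeline fence replaces per-frame fence objects.
const kMaxFramesInFlight = 3;
let frameNumber = 0;

async function renderFrame() {
  frameNumber++;
  if (frameNumber > kMaxFramesInFlight) {
    // Wait until the GPU is at most kMaxFramesInFlight frames behind.
    await fence.onCompletion(frameNumber - kMaxFramesInFlight);
  }
  queue.submit([recordFrame()]);
  queue.signal(fence, frameNumber); // fence value == last submitted frame
}
```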

@Kangz Kangz force-pushed the fences branch 2 times, most recently from a7a5ead to 6e3ff17 on October 8, 2018 at 21:09
@Kangz (Contributor, Author) commented Oct 8, 2018

PTAL @kvark @jdashg @litherum @RafaelCintron. I addressed the comments from the call, so I think it is in a good state now (even if the prose is a bit ew).


To mirror native APIs, fences are created directly on the `WebGPUDevice`.

`` ```webidl ``
Contributor:

if we put webidl in the design docs, how do we make sure it's valid?

Contributor Author:

No idea; I don't think there is something that can compile the snippets of code like rustdoc does.


## Questions

- Should we call fences "timelines" and have them created on queues like so `queue.createTimeline()`?
Contributor:

"Fence" is more familiar from D3D12, which we mimic here.

// Queue
interface WebGPUQueue {
    void submit(sequence<WebGPUCommandBuffer> buffers);
    WebGPUFence insertFence();
Contributor:

Now that you've deleted insertFence and changed wait, this pull request should contain modifications to the buffer mapping document to explain the new usage.

Contributor Author:

The whole buffer-mapping.md doc is broken now because we don't have transitions; we need to rework it completely and figure out how it will work with the updated fences, if at all. Since buffer mapping isn't an enqueued operation anymore, I don't think it makes sense to wait on a fence to observe the mapping.

Contributor:

Since this PR makes it out-of-date, I was expecting you to update the buffer mapping document as part of this PR.

We can do it as a subsequent update if you wish but we should tackle it sooner rather than later.

Contributor Author:

At the F2F we discussed buffer mapping again and seemed to have consensus around:

  • We need WebGPUBuffer.mapReadAsync and WebGPUBuffer.mapWriteAsync.
  • mapWrite* returns a zero-filled buffer.
  • The buffer is either owned by the GPU (on one or multiple queues) or owned by the CPU. Never both at the same time.

Things that were still unknowns:

  • How do we know when the mapping is ready? We talked about promises but fell into the same promises-vs.-not debate as a year ago.
  • Are we able to map sub-ranges of the buffer, or only the whole range?

At this point I think our old proposal of having a WebGPUMappedMemory object might make sense (minus the transitions).

I agree that buffer mapping needs to be handled sooner rather than later, but I think it is way beyond the scope of this PR.
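
As a hedged sketch of the consensus above (the exact signatures were still open at this point; `commandBufferUsing` is a hypothetical helper):

```js
// Hypothetical sketch of the mapWriteAsync consensus described above.
const arrayBuffer = await buffer.mapWriteAsync(); // CPU now owns the buffer
new Float32Array(arrayBuffer).set([1, 2, 3, 4]);  // memory starts zero-filled
buffer.unmap();                                   // ownership returns to the GPU
queue.submit([commandBufferUsing(buffer)]);       // safe to use on a queue again
```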

Contributor:

Fair enough. Might make sense to put a note in the buffer mapping explainer telling people that parts of it are incorrect and need to be reworked.

Contributor Author:

Done


This call generates a Javascript exception if `value` is greater than the fence's signaled value (to make sure the promise completes in finite time).
There is no way to synchronously wait on a fence in this proposal (it is better handled via the requestMainLoop proposal).
The promise is completed as soon as the fence's completed value is highr than `value`.
Contributor:

Nit: highr -> higher.

Contributor Author:

Done
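
To make the wait semantics described above concrete, a small sketch (the `onCompletion` name for the promise-returning wait is an assumption):

```js
// Hypothetical illustration of the wait rules quoted above.
queue.signal(fence, 2);       // the fence's signaled value is now 2
await fence.onCompletion(2);  // OK: 2 <= signaled value; resolves in finite
                              // time once the GPU completes past this value
fence.onCompletion(3);        // throws: 3 > signaled value, so the promise
                              // could never be guaranteed to complete
```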

A Javascript exception is generated if:
- `value` is smaller than or equal to the signaled value of the fence.
- [multi-queue] the fence is signaled on a queue other than the `signalQueue` passed in its descriptor.
This restriction is to make sure the signal operations are well-ordered, which would be more complicated if you could signal on multiple queues.
Contributor:

Since the API will be validating that any wait on a fence will be unblocked by a prior signal operation that has been enqueued, I am not sure I understand how multi-queue signaling would affect well-orderedness. Can you please provide an example where this would break? What would become more complicated that single-queue signaling would simplify?

Contributor Author:

Things would break down in cases like this. Let F be a timeline fence (current and signaled value at 0) and Q1, Q2, Q3 be queues. The application issues the following:

  • Q1.signal(F, 4)
  • Q2.signal(F, 5)
  • Q3.wait(F, 5)

If Q1 is scheduled before Q2 then the fence value is 5 and Q3 is unblocked. All is good.

If Q2 is scheduled before Q1 then the fence value ends at 4, and Q3 is never unblocked, which isn't valid (all execution must finish in finite time; this is a good thing and also a requirement of Vulkan on Linux).

To avoid this case we would need to validate that the signal operation on Q2 happens after the signal operation on Q1. This would require the application to put another fence between Q1 and Q2 and the implementation to walk the signaling graph to do the validation. This is complicated and I don't expect signaling on multiple queues to be a common use case at all.

Single-queue signaling helps because the ordering of the signal operations is obvious.
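
The same race, written out as a hypothetical sketch:

```js
// Hypothetical sketch of the scheduling race described above.
// F starts with current and signaled value 0.
Q1.signal(F, 4);
Q2.signal(F, 5);
Q3.wait(F, 5);
// If the driver schedules Q1 before Q2, F reaches 4 and then 5: Q3 unblocks.
// If it schedules Q2 before Q1, F's value ends at 4 and Q3's wait on 5 is
// never satisfied, violating the finite-time requirement.
```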

Contributor (@RafaelCintron, Oct 16, 2018):

Thank you for the example, @Kangz. I agree that what you've proposed here should avoid the problem in your example, as well as other deadlocks where two queues wait for different fences and the unblocking signals are trapped behind those waits.

Out of curiosity, what happens in Vulkan on Linux if you have multiple queues waiting for something that never gets signaled? Does the OS hang or crash or something else?

Contributor:

Separately, if we ever add signaling from the CPU to the API, then your example would become a problem. Since CPU signaling is done on the fence itself, as opposed to on a queue, we can't enforce the "that has been enqueued" part of "any wait on a fence will be unblocked by a prior signal operation that has been enqueued" from earlier in the document.

Signaling on the CPU allows us to do producer/consumer scenarios that involve long-running GPU operations whose final step depends on a CPU upload. Similarly, multi-queue GPU operations may produce textures that are read back by the CPU in round-robin fashion. Both of these scenarios can still be done, but we would lose CPU/GPU parallelism.

I do not think this is a big deal for MVP but something to keep in mind for the future.

Contributor Author:

Out of curiosity, what happens in Vulkan on Linux if you have multiple queues waiting for something that never gets signaled? Does the OS hang or crash or something else?

No idea :)

Separately, if we ever add Signaling from the CPU to the API, then your example would become a problem. Since CPU Signaling is done on the fence itself, as opposed to the queue, we can't enforce the "that has been enqueued" part of "any wait on a fence will be unblocked by a prior signal operation that has been enqueued" earlier in the document.

This has the same issue: it isn't supported by Vulkan and isn't possible on Linux. I agree that it is an interesting use case, so if/when Vulkan has that capability we can think about it again. Also, a fence signalable on the CPU would probably be signalable only on the CPU.

@Kangz (Contributor, Author) commented Oct 16, 2018

@RafaelCintron are there other changes you'd like before merging this PR?

@Kangz (Contributor, Author) commented Oct 21, 2018

Thanks for the review, I'll land this now.

@Kangz Kangz merged commit 06ea9c5 into gpuweb:master Oct 21, 2018
@Kangz Kangz deleted the fences branch October 21, 2018 10:03
@kainino0x kainino0x added the multi-queue Part of the multi-queue feature label Dec 16, 2020