Following #1065, I want to propose an API that can serve as a reference point in the discussion.
Queue discovery
We are constrained by Vulkan's discovery mechanism here, which requires specifying the queues to be used at logical device creation time. Therefore:
```webidl
dictionary GPUAdapterQueues {
    // number of available general queues, must be >= 1
    required unsigned long general;
    // number of available compute-only queues
    required unsigned long compute;
    // number of available copy-only queues
    required unsigned long copy;
};

interface GPUAdapter {
    // Discover the available queues. Note that this is a method because we can't have
    // readonly attributes of dictionary types in WebIDL.
    GPUAdapterQueues availableQueues();
    ...
};
```
On Metal prior to MTLEvent support, this would always return a single general queue.
On D3D12, an implementation can return N queues of each type. It doesn't really matter what N is.
On Vulkan, an implementation may return a subset of what the VkPhysicalDevice exposes.
Note: it's always safe for an implementation to expose a single general queue, as a matter of fingerprint reduction.
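For illustration, a minimal sketch of the discovery flow in script; only `availableQueues()` is the proposed addition, the adapter request is today's API:

```ts
const adapter = await navigator.gpu.requestAdapter();
if (!adapter) throw new Error("no adapter");

// Proposed method from above: returns the available counts per queue type.
const queues = adapter.availableQueues();

// Every adapter reports at least one general queue.
console.assert(queues.general >= 1);

// An application can use the counts to decide what to request, e.g. only
// ask for a compute queue when async compute is actually available.
const wantAsyncCompute = queues.compute >= 1;
```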
Initialization
And here is the way to request them:
```webidl
enum GPUQueueType {
    "general",
    "compute",
    "copy",
};

dictionary GPUDeviceDescriptor {
    sequence<GPUQueueType> queues = ["general"];
};

interface GPUDevice {
    readonly attribute FrozenArray<GPUQueue> queues;
};
```
We'll make the defaultQueue attribute equivalent to queues[0].
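A minimal sketch of device creation under this shape, assuming queues come back in request order:

```ts
// Request one general queue and one compute-only queue (proposed member).
const device = await adapter.requestDevice({
  queues: ["general", "compute"],
});

// queues[0] corresponds to the first requested entry, and is also what
// defaultQueue resolves to.
const generalQueue = device.queues[0];
const computeQueue = device.queues[1];
```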
Commands
Command and render bundle encoders are going to be created from the queue, instead of the device:
```webidl
interface GPUQueue {
    GPUCommandEncoder createCommandEncoder(optional GPUCommandEncoderDescriptor desc = {});
    GPURenderBundleEncoder createRenderBundleEncoder(optional GPURenderBundleEncoderDescriptor desc = {});
};
```
A queue's type limits the kinds of operations that can be submitted in the corresponding command buffers:
- "general" allows all operations
- "compute" doesn't allow render passes
- "copy" only allows
copy_operations
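For example, a sketch of recording on a compute-only queue under this proposal, using the queues requested earlier:

```ts
// Encoders come from a queue, so the queue type scopes the recording.
const encoder = computeQueue.createCommandEncoder();

// Allowed: "compute" queues accept compute passes (and copies).
const pass = encoder.beginComputePass();
pass.end();

// encoder.beginRenderPass({...}) here would generate a validation error,
// since "compute" queues don't allow render passes.

computeQueue.submit([encoder.finish()]);
```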
Buffers
There appears to be no cost in sharing the buffers between queues in D3D12, and presumably the same applies to Vulkan with "concurrent" sharing mode. Therefore, buffers are always considered shared. They can be used by any queue, or even multiple queues at the same time, but with restrictions.
A buffer can only be used simultaneously on multiple queues if its combined usage across the command buffers that execute simultaneously is a subset of the { "input", "constant", "storage-read" } internal usages. The idea is that, when multiple queues are requested, implementations will treat these three internal usages as one big "shader-read-only" usage for synchronization purposes. Notice the lack of copy usages here, since D3D12 doesn't allow mixing them in.
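As a sketch of that rule (the usage strings and the helper are illustrative, not spec text):

```ts
// The three internal usages that collapse into "shader-read-only".
const SHADER_READ_ONLY = new Set(["input", "constant", "storage-read"]);

// Hypothetical validation helper: a buffer may execute on several queues at
// once only if every usage across those command buffers is in the set above.
// Copy usages are deliberately absent, matching the D3D12 restriction.
function canShareSimultaneously(combinedUsages: Iterable<string>): boolean {
  for (const usage of combinedUsages) {
    if (!SHADER_READ_ONLY.has(usage)) return false;
  }
  return true;
}
```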
Internally, we'll associate each buffer with a set of queues that currently "own" it on the device timeline, in order to know when to insert synchronization between queues. If the implementation sees a buffer used on a queue that is not in the current "owner" set, and the combined usage across the submissions is not "shader-read-only", it will need to insert fences/semaphores/events internally, so that the new submission only starts when the previous owners of the buffer are done. This is GPU-GPU synchronization, which still doesn't involve the CPU.
If the user has already inserted the fence signaling and waiting to synchronize the submissions, it's expected that the implementation can detect that and omit additional synchronization.
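For example (createFence/signal existed in drafts of the time; a GPU-side queue wait is purely hypothetical here, standing in for whatever the multi-queue API would adopt):

```ts
const fence = generalQueue.createFence();  // GPUFence, initial value 0

// `writeCommands`/`readCommands` are previously recorded command buffers.
generalQueue.submit([writeCommands]);      // writes into `buffer`
generalQueue.signal(fence, 1);             // signal once the writes are done

// Hypothetical GPU-side wait: the compute queue doesn't start the second
// submission until the fence reaches 1, so the implementation can detect
// the dependency and omit its own internal synchronization.
computeQueue.wait(fence, 1);
computeQueue.submit([readCommands]);       // reads `buffer` safely
```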
Textures
For textures, both D3D12 and Vulkan have penalties for "concurrent" sharing. Therefore, we can expose textures in a way that only a single queue can use a texture subresource at a time. Important note: we still work with individual subresources. Different subresources of a texture can still be used by multiple queues.
Synchronization rules similar to GPUBuffer's apply: if a texture subresource is used on a different queue, the submissions need to be linked/separated by a GPUFence; otherwise, we generate an error on submission.
Internally, we'll associate each texture subresource with a single queue (not a set of queues) that currently "owns" it on the device timeline. When an implementation sees a texture subresource used in a submission on a different queue, it synchronizes the submissions by inserting the appropriate fence/semaphore/event signaling on the old queue and waiting (on the GPU) on the new queue. In addition, on Vulkan the implementation submits commands to "release" ownership on the old queue and "acquire" ownership on the new queue.
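A sketch of that bookkeeping, with hypothetical names for what would be implementation-internal state:

```ts
// Each texture subresource maps to the single queue that owns it on the
// device timeline; a string key stands in for (texture, aspect, mip, layer).
const subresourceOwner = new Map<string, GPUQueue>();

function trackSubmission(queue: GPUQueue, usedSubresources: string[]) {
  for (const key of usedSubresources) {
    const previousOwner = subresourceOwner.get(key);
    if (previousOwner !== undefined && previousOwner !== queue) {
      // Insert a fence/semaphore/event: signal on the old queue, GPU-wait
      // on the new one. On Vulkan, additionally record a queue family
      // "release" on the old queue and an "acquire" on the new queue.
    }
    subresourceOwner.set(key, queue);
  }
}
```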
Explicit handover
This is an option to consider. We could also allow users to explicitly specify this release of ownership, in order to allow the implementation to skip the additional synchronization after the first submission is done. This can be exposed by the following addition to the command buffer creation:
```webidl
dictionary GPUTextureSubresourceRange {
    GPUTextureAspect aspect = "all";
    GPUIntegerCoordinate baseMipLevel = 0;
    GPUIntegerCoordinate mipLevelCount;
    GPUIntegerCoordinate baseArrayLayer = 0;
    GPUIntegerCoordinate arrayLayerCount;
};

dictionary GPUTextureHandover : GPUTextureSubresourceRange {
    required GPUTexture texture;
    required unsigned long targetQueueIndex;
};

dictionary GPUCommandBufferDescriptor : GPUObjectDescriptorBase {
    sequence<GPUTextureHandover> handoverTextures = [];
};
```
The subresource range must be a subset of the subresources used by this command buffer; otherwise, an error is generated.
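A sketch of how a handover could be expressed, assuming the descriptor is passed to finish(), where the command buffer gets created:

```ts
const encoder = generalQueue.createCommandEncoder();
// ... record a render pass that writes `texture` mip 0, layer 0 ...
const commandBuffer = encoder.finish({
  handoverTextures: [{
    texture,
    baseMipLevel: 0,
    mipLevelCount: 1,
    baseArrayLayer: 0,
    arrayLayerCount: 1,
    targetQueueIndex: 1,  // hand ownership to device.queues[1]
  }],
});
generalQueue.submit([commandBuffer]);
// device.queues[1] can now use the subresource without extra implicit sync.
```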
Note: the GPUTextureSubresourceRange will also be used as a base for GPUTextureViewDescriptor.
This API would allow the implementation to:
- do fence signaling upon submission
- insert the appropriate explicit queue family transition at the end of the command buffer
If this is used correctly, on Vulkan we could avoid an extra internal submission (that needs to "release" ownership and signal a semaphore).
Concurrent
As a follow-up after MVP, we can consider a way of exposing the "concurrent" mode for textures, which translates to VK_SHARING_MODE_CONCURRENT and D3D12_RESOURCE_FLAG_ALLOW_SIMULTANEOUS_ACCESS.
It doesn't seem critical for MVP, since users can do pretty much everything with the "exclusive" texture mode, and if they need concurrent access, they can still use buffers. It's still something to consider exposing, but it comes with several caveats:
- it can't be used with MSAA and depth textures, for example. This is probably not too difficult to specify.
- it comes "free" for textures with the STORAGE usage flag (since the color compression is disabled anyway).
- we'll have to use VK_IMAGE_LAYOUT_GENERAL for all read-only usage, internally, on Vulkan and D3D12. It's less efficient to sample from, for example, versus VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL.
So we can expect a somewhat reasonable question from users: why don't you allow us to use concurrent mode, if our textures are STORAGE already, and we only use them as such?