Following #1065, I want to propose an API that can serve as a reference point in the discussion.
Queue discovery
We are constrained by Vulkan's discovery mechanism here, which requires specifying the queues to be used at logical device creation time. Therefore:
```webidl
dictionary GPUAdapterQueues {
    // number of available general queues, must be >= 1
    required unsigned long general;
    // number of available compute-only queues
    required unsigned long compute;
    // number of available copy-only queues
    required unsigned long copy;
};

interface GPUAdapter {
    // Discover the available queues. Note that this is a method because we can't have
    // readonly attributes of dictionary types in WebIDL.
    GPUAdapterQueues availableQueues();
    ...
};
```
On Metal prior to MTLEvent support, this would always return a single general queue.
On D3D12, an implementation can return N queues of each type. It doesn't really matter what N is.
On Vulkan, an implementation may return a subset of what the VkPhysicalDevice exposes.
Note: it's always safe for an implementation to expose a single general queue, as a matter of fingerprint reduction.
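For illustration, a minimal sketch of the discovery flow in script; only `availableQueues()` is the proposed addition, the adapter request is today's API:

```ts
const adapter = await navigator.gpu.requestAdapter();
if (!adapter) throw new Error("no adapter");

// Proposed method from above: returns the available counts per queue type.
const queues = adapter.availableQueues();

// Every adapter reports at least one general queue.
console.assert(queues.general >= 1);

// An application can use the counts to decide what to request, e.g. only
// ask for a compute queue when async compute is actually available.
const wantAsyncCompute = queues.compute >= 1;
```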
Initialization
And here is the way to request them:
```webidl
enum GPUQueueType {
    "general",
    "compute",
    "copy",
};

dictionary GPUDeviceDescriptor {
    sequence<GPUQueueType> queues = ["general"];
};

interface GPUDevice {
    readonly attribute FrozenArray<GPUQueue> queues;
};
```
We'll make the defaultQueue attribute equivalent to queues[0].
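A minimal sketch of device creation under this shape, assuming queues come back in request order:

```ts
// Request one general queue and one compute-only queue (proposed member).
const device = await adapter.requestDevice({
  queues: ["general", "compute"],
});

// queues[0] corresponds to the first requested entry, and is also what
// defaultQueue resolves to.
const generalQueue = device.queues[0];
const computeQueue = device.queues[1];
```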
Commands
Command and render bundle encoders are going to be created from the queue, instead of the device:
```webidl
interface GPUQueue {
    GPUCommandEncoder createCommandEncoder(optional GPUCommandEncoderDescriptor desc = {});
    GPURenderBundleEncoder createRenderBundleEncoder(optional GPURenderBundleEncoderDescriptor desc = {});
};
```
A queue's type limits the kinds of operations that can be submitted in the corresponding command buffers:
- "general" allows all operations
- "compute" doesn't allow render passes
- "copy" only allows
copy_operations
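For example, a sketch of recording on a compute-only queue under this proposal, using the queues requested earlier:

```ts
// Encoders come from a queue, so the queue type scopes the recording.
const encoder = computeQueue.createCommandEncoder();

// Allowed: "compute" queues accept compute passes (and copies).
const pass = encoder.beginComputePass();
pass.end();

// encoder.beginRenderPass({...}) here would generate a validation error,
// since "compute" queues don't allow render passes.

computeQueue.submit([encoder.finish()]);
```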
Buffers
There appears to be no cost in sharing the buffers between queues in D3D12, and presumably the same applies to Vulkan with "concurrent" sharing mode. Therefore, buffers are always considered shared. They can be used by any queue, or even multiple queues at the same time, but with restrictions.
A buffer can only be used simultaneously on multiple queues if its combined usage across the command buffers that execute simultaneously is a subset of the { "input", "constant", "storage-read" } internal usages. The idea is that, when multiple queues are requested, implementations will treat these three internal usages as one big "shader-read-only" usage for synchronization purposes. Notice the lack of copy usages here, since D3D12 doesn't allow mixing them in.
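As a sketch of that rule (the usage strings and the helper are illustrative, not spec text):

```ts
// The three internal usages that collapse into "shader-read-only".
const SHADER_READ_ONLY = new Set(["input", "constant", "storage-read"]);

// Hypothetical validation helper: a buffer may execute on several queues at
// once only if every usage across those command buffers is in the set above.
// Copy usages are deliberately absent, matching the D3D12 restriction.
function canShareSimultaneously(combinedUsages: Iterable<string>): boolean {
  for (const usage of combinedUsages) {
    if (!SHADER_READ_ONLY.has(usage)) return false;
  }
  return true;
}
```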
Internally, we'll associate each buffer with a set of queues that currently "own" it on the device timeline, in order to know when to insert synchronization between queues. If the implementation sees a buffer used on a queue that is not in the current "owner" set, and the combined usage across the submissions is not "shader-read-only", it will need to insert fences/semaphores/events internally, so that the new submission only starts when the previous owners of the buffer are done. This is GPU-GPU synchronization, which still doesn't involve the CPU.
If the user has already inserted the fence signaling and waiting to synchronize the submissions, it's expected that the implementation can detect that and omit additional synchronization.
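For example (createFence/signal existed in drafts of the time; a GPU-side queue wait is purely hypothetical here, standing in for whatever the multi-queue API would adopt):

```ts
const fence = generalQueue.createFence();  // GPUFence, initial value 0

// `writeCommands`/`readCommands` are previously recorded command buffers.
generalQueue.submit([writeCommands]);      // writes into `buffer`
generalQueue.signal(fence, 1);             // signal once the writes are done

// Hypothetical GPU-side wait: the compute queue doesn't start the second
// submission until the fence reaches 1, so the implementation can detect
// the dependency and omit its own internal synchronization.
computeQueue.wait(fence, 1);
computeQueue.submit([readCommands]);       // reads `buffer` safely
```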
Textures
For textures, both D3D12 and Vulkan have penalties for "concurrent" sharing. Therefore, we can expose textures in a way that only a single queue can use a texture subresource at a time. Important note: we still work with individual subresources. Different subresources of a texture can still be used by multiple queues.
Synchronization rules similar to GPUBuffer's apply: if a texture subresource is used on a different queue, the submissions need to be linked/separated by a GPUFence; otherwise, we generate an error on submission.
Internally, we'll associate each texture subresource with a single queue (not a set of queues) that currently "owns" it on the device timeline. When an implementation sees a texture subresource used in a submission on a different queue, it synchronizes the submissions by inserting the appropriate fence/semaphore/event signaling on the old queue and waiting (on the GPU) on the new queue. In addition, on Vulkan the implementation submits commands to "release" ownership on the old queue and "acquire" ownership on the new queue.
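A sketch of that bookkeeping, with hypothetical names for what would be implementation-internal state:

```ts
// Each texture subresource maps to the single queue that owns it on the
// device timeline; a string key stands in for (texture, aspect, mip, layer).
const subresourceOwner = new Map<string, GPUQueue>();

function trackSubmission(queue: GPUQueue, usedSubresources: string[]) {
  for (const key of usedSubresources) {
    const previousOwner = subresourceOwner.get(key);
    if (previousOwner !== undefined && previousOwner !== queue) {
      // Insert a fence/semaphore/event: signal on the old queue, GPU-wait
      // on the new one. On Vulkan, additionally record a queue family
      // "release" on the old queue and an "acquire" on the new queue.
    }
    subresourceOwner.set(key, queue);
  }
}
```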
Explicit handover
This is an option to consider. We could also allow users to explicitly specify this release of ownership, in order to allow the implementation to skip the additional synchronization after the first submission is done. This can be exposed by the following addition to the command buffer creation:
```webidl
dictionary GPUTextureSubresourceRange {
    GPUTextureAspect aspect = "all";
    GPUIntegerCoordinate baseMipLevel = 0;
    GPUIntegerCoordinate mipLevelCount;
    GPUIntegerCoordinate baseArrayLayer = 0;
    GPUIntegerCoordinate arrayLayerCount;
};

dictionary GPUTextureHandover : GPUTextureSubresourceRange {
    required GPUTexture texture;
    required unsigned long targetQueueIndex;
};

dictionary GPUCommandBufferDescriptor : GPUObjectDescriptorBase {
    sequence<GPUTextureHandover> handoverTextures = [];
};
```
The subresource range must be a subset of the subresources used by this command buffer; otherwise, an error is generated.
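A sketch of how a handover could be expressed, assuming the descriptor is passed to finish(), where the command buffer gets created:

```ts
const encoder = generalQueue.createCommandEncoder();
// ... record a render pass that writes `texture` mip 0, layer 0 ...
const commandBuffer = encoder.finish({
  handoverTextures: [{
    texture,
    baseMipLevel: 0,
    mipLevelCount: 1,
    baseArrayLayer: 0,
    arrayLayerCount: 1,
    targetQueueIndex: 1,  // hand ownership to device.queues[1]
  }],
});
generalQueue.submit([commandBuffer]);
// device.queues[1] can now use the subresource without extra implicit sync.
```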
Note: the GPUTextureSubresourceRange will also be used as a base for GPUTextureViewDescriptor.
This API would allow the implementation to:
- do fence signaling upon submission
- insert the appropriate explicit queue family transition at the end of the command buffer
If this is used correctly, on Vulkan we could avoid an extra internal submission (that needs to "release" ownership and signal a semaphore).
Concurrent
As a follow-up after MVP, we can consider a way of exposing the "concurrent" mode for textures, which translates to VK_SHARING_MODE_CONCURRENT and D3D12_RESOURCE_FLAG_ALLOW_SIMULTANEOUS_ACCESS.
It doesn't seem critical for MVP, since users can do pretty much everything with the "exclusive" texture mode, and if they need concurrent access, they can still use buffers. It's still something to consider exposing, but it comes with several caveats:
- it can't be used with MSAA and depth textures, for example. This is probably not too difficult to specify.
- it comes "free" for textures with the STORAGE usage flag (since the color compression is disabled anyway).
- we'll have to use VK_IMAGE_LAYOUT_GENERAL for all read-only usage, internally, on Vulkan and D3D12. It's less efficient to sample from, for example, versus VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL.
So we can expect a somewhat reasonable question from users: why don't you allow us to use concurrent mode, if our textures are STORAGE already, and we only use them as such?