Proposal for a simple streaming API #154

@kvark

Intro

The reasoning assumes some familiarity with #138 and the surrounding discussion.

Here are the base points for this proposal:

  1. Persistent mapping is the most efficient path on native APIs for streaming resources in from the CPU. The user sub-allocates within one or more persistently mapped buffers; some parts are being updated by the CPU while others are used by the GPU as the source for copies into device-local memory.
  2. WebGPU (currently) tracks the usage of a buffer object as a whole: we don't allow parts of the same buffer to be in different usage states. It also aims for WebGPU buffers (and textures) to correspond 1:1 to native objects in the respective APIs.
  3. We can't allow CPU-GPU data races in the upload/download scenarios.

I think those points start conflicting with each other as soon as we talk about CPU-visible buffers. If we treat the buffer as a whole on the GPU side (i.e. when considering it for use as a vertex buffer, or in a resource group), we can't have parts of it being accessed by the CPU. At some level we'd need to track buffer sub-ranges and manage persistent mapping. Here I propose to resolve the conflict by stating that CPU-visible resources are only going to be used for copy operations, and thus we don't need full-featured mapping semantics for a general WebGPUBuffer.

This proposal introduces new commands for data uploads and downloads. The browser can implement them with whichever method is most efficient (e.g. persistent mapping). Regular buffers then keep the 1:1 relation to native objects and the constraint of a single (mutable) usage.
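To illustrate the kind of bookkeeping this moves into the browser, here is a minimal sketch of a FIFO ring allocator over a single persistently mapped staging buffer. Everything in it is an assumption for illustration: `StagingRing`, `StagingRange` and their methods are hypothetical names, the mapped memory is modeled as a plain `ArrayBuffer`, and a real implementation would release ranges based on GPU fences rather than explicit calls.

```typescript
// Hypothetical browser-side helper, not part of the proposed API.
interface StagingRange {
  offset: number;
  size: number;
}

class StagingRing {
  private readonly capacity: number;
  private readonly memory: ArrayBuffer; // stands in for persistently mapped memory
  private readonly inFlight: number[] = []; // bytes reserved per allocation, oldest first
  private head = 0; // next free byte
  private used = 0; // bytes reserved, including padding skipped when wrapping

  constructor(capacity: number) {
    this.capacity = capacity;
    this.memory = new ArrayBuffer(capacity);
  }

  // Reserve a contiguous range for the CPU to fill; null means "ring is full".
  allocate(size: number): StagingRange | null {
    if (size <= 0 || size > this.capacity) return null;
    const wraps = this.head + size > this.capacity;
    // When the range does not fit before the end, skip the tail and start at 0.
    const padding = wraps ? this.capacity - this.head : 0;
    if (this.used + padding + size > this.capacity) return null;
    const offset = wraps ? 0 : this.head;
    this.head = (offset + size) % this.capacity;
    this.used += padding + size;
    this.inFlight.push(padding + size);
    return { offset, size };
  }

  // Called once the GPU has finished copying out of the oldest range.
  releaseOldest(): void {
    const reserved = this.inFlight.shift();
    if (reserved === undefined) return;
    this.used -= reserved;
    if (this.used === 0) this.head = 0; // empty ring: restart from the beginning
  }

  // CPU-side view of a reserved range.
  view(range: StagingRange): Uint8Array {
    return new Uint8Array(this.memory, range.offset, range.size);
  }
}
```

Ranges are released strictly in allocation order here, which matches command buffers completing in submission order; an implementation with out-of-order completion would need a different scheme.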

Code

dictionary WebGPUTextureDataLayout {
    u32 rowPitch;
    u32 slicePitch;
};

partial interface WebGPUCommandBuffer {
  Promise<ArrayBuffer> uploadBuffer(
    WebGPUBuffer dst,
    u32 dstOffset,
    u32 size);

  Promise<ArrayBuffer> downloadBuffer(
    WebGPUBuffer src,
    u32 srcOffset,
    u32 size);

  Promise<ArrayBuffer> uploadTexture(
    WebGPUTextureDataLayout layout,
    WebGPUTextureCopyView destination,
    WebGPUExtent3D size);

  Promise<ArrayBuffer> downloadTexture(
    WebGPUTextureCopyView source,
    WebGPUTextureDataLayout layout,
    WebGPUExtent3D size);
};
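The proposal does not spell out what `rowPitch` and `slicePitch` mean. Assuming they are byte strides between consecutive rows and consecutive depth slices, as in the native APIs, addressing into the staging ArrayBuffer could look like the sketch below. The `texelOffset` and `minStagingSize` helpers and the `bytesPerTexel` parameter are hypothetical and only serve the illustration.

```typescript
// Mirrors the WebGPUTextureDataLayout dictionary above; units assumed to be bytes.
interface TextureDataLayout {
  rowPitch: number;   // bytes from the start of one row to the next
  slicePitch: number; // bytes from the start of one depth slice to the next
}

// Byte offset of texel (x, y, z) inside the staging ArrayBuffer.
function texelOffset(
  layout: TextureDataLayout,
  x: number,
  y: number,
  z: number,
  bytesPerTexel: number,
): number {
  return z * layout.slicePitch + y * layout.rowPitch + x * bytesPerTexel;
}

// Smallest staging size that holds a width x height x depth region:
// the offset of the last row of the last slice, plus one tightly packed row.
function minStagingSize(
  layout: TextureDataLayout,
  width: number,
  height: number,
  depth: number,
  bytesPerTexel: number,
): number {
  return texelOffset(layout, 0, height - 1, depth - 1, bytesPerTexel) + width * bytesPerTexel;
}
```

For example, with `rowPitch = 256`, `slicePitch = 1024` and 4-byte texels, texel (2, 3, 1) lives at byte 1024 + 768 + 8 = 1800.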

Semantics

When recording a command buffer, the user may request to upload data to, or download data from, some of the resources. Those operations return Promise<ArrayBuffer>.

For upload operations, the promise is resolved once the WebGPU implementation has prepared the staging area for an upload. The initial contents of the ArrayBuffer are guaranteed to be zeroed out. The implementation would also hold the submission of the command buffer until all the upload callbacks (from promises) are done. This guarantees that the data for an upload is ready, and the implementation can proceed with the transfer into device-local memory, where all of the resources (buffers/textures) are allocated.
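The upload flow can be sketched with a small mock. None of this is real WebGPU: `MockBuffer`, `MockCommandBuffer` and `submit` are hypothetical stand-ins, device-local memory is modeled as a `Uint8Array`, and "holding the submission until the callbacks are done" is approximated by awaiting each staging promise, which works here because the callbacks are attached before `submit` is called.

```typescript
// Hypothetical stand-in for a WebGPUBuffer living in device-local memory.
class MockBuffer {
  readonly deviceLocal: Uint8Array;
  constructor(size: number) {
    this.deviceLocal = new Uint8Array(size);
  }
}

interface PendingUpload {
  dst: MockBuffer;
  dstOffset: number;
  staging: Promise<ArrayBuffer>;
}

class MockCommandBuffer {
  readonly uploads: PendingUpload[] = [];

  // Records the copy and hands out the staging area for the user to fill.
  uploadBuffer(dst: MockBuffer, dstOffset: number, size: number): Promise<ArrayBuffer> {
    if (dstOffset + size > dst.deviceLocal.length) {
      throw new RangeError("upload range exceeds destination buffer");
    }
    // A fresh ArrayBuffer is zero-filled, matching the guarantee in the text.
    const staging = Promise.resolve(new ArrayBuffer(size));
    this.uploads.push({ dst, dstOffset, staging });
    return staging;
  }
}

// Waits for every staging area, then performs the copies into device-local memory.
async function submit(cmd: MockCommandBuffer): Promise<void> {
  for (const upload of cmd.uploads) {
    const bytes = new Uint8Array(await upload.staging);
    upload.dst.deviceLocal.set(bytes, upload.dstOffset);
  }
}

async function demoUpload(): Promise<number[]> {
  const vertexBuffer = new MockBuffer(8);
  const cmd = new MockCommandBuffer();
  // The user fills the staging area from the promise callback.
  cmd.uploadBuffer(vertexBuffer, 4, 4).then((staging) => {
    new Uint8Array(staging).set([1, 2, 3, 4]);
  });
  await submit(cmd);
  return Array.from(vertexBuffer.deviceLocal); // [0, 0, 0, 0, 1, 2, 3, 4]
}
```

How a real implementation would detect that user callbacks have finished is left open by the text above; the mock sidesteps that question rather than answering it.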

For download operations, the promise is resolved once the GPU is done executing this command buffer, and the ArrayBuffer would contain the data read from device-local memory of corresponding resources.
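The download side can be mocked in the same spirit. Again, all names are hypothetical: `finishExecution` stands in for the GPU completing the command buffer, and device-local memory is a plain `Uint8Array`.

```typescript
// Hypothetical stand-in for a WebGPUBuffer living in device-local memory.
class MockBuffer {
  readonly deviceLocal: Uint8Array;
  constructor(size: number) {
    this.deviceLocal = new Uint8Array(size);
  }
}

interface PendingDownload {
  src: MockBuffer;
  srcOffset: number;
  size: number;
  resolve: (data: ArrayBuffer) => void;
}

class MockCommandBuffer {
  readonly downloads: PendingDownload[] = [];

  // Records the read-back; the promise stays pending until execution finishes.
  downloadBuffer(src: MockBuffer, srcOffset: number, size: number): Promise<ArrayBuffer> {
    if (srcOffset + size > src.deviceLocal.length) {
      throw new RangeError("download range exceeds source buffer");
    }
    return new Promise<ArrayBuffer>((resolve) => {
      this.downloads.push({ src, srcOffset, size, resolve });
    });
  }
}

// Stands in for "the GPU is done executing this command buffer".
function finishExecution(cmd: MockCommandBuffer): void {
  for (const d of cmd.downloads) {
    const out = new ArrayBuffer(d.size);
    // Copy out of device-local memory so the user never aliases it.
    new Uint8Array(out).set(d.src.deviceLocal.subarray(d.srcOffset, d.srcOffset + d.size));
    d.resolve(out);
  }
}

async function demoDownload(): Promise<number[]> {
  const src = new MockBuffer(6);
  src.deviceLocal.set([10, 20, 30, 40, 50, 60]);
  const cmd = new MockCommandBuffer();
  const pending = cmd.downloadBuffer(src, 2, 3);
  finishExecution(cmd);
  return Array.from(new Uint8Array(await pending)); // [30, 40, 50]
}
```

Because the user only ever sees a copy, there is no window in which the CPU and GPU touch the same memory, which is the data-race guarantee from the base points.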

Analysis

One clear downside is the inability to have resources accessible from both the CPU and the GPU directly. On some architectures, shared memory can be used effectively by the GPU and could technically allow the graphics programmer to avoid a copy (from CPU-visible to device-local memory), but this proposal doesn't allow that.

A follow-up upside, however, is that developers don't need to consider those architectures and implement diverging code paths for them. This results in simpler code and more portable behavior of the programs for our API.

Simplicity, in general, is the main feature of this proposal. The complexity of recycling buffer ranges and synchronizing mapping access is put on the browser implementation. I believe this is in line with the direction WebGPU is currently heading (implicit synchronization, no memory objects).

Another downside is that a single ArrayBuffer corresponds to a single copy operation. I don't think this limitation is essential to the core of the proposal, and we can relax it if it turns out to be too constraining.
