Proposal for a simple streaming API

## Intro
The reasoning assumes some familiarity with #138 and the surrounding discussion.

Here are the base points for this proposal:
  1. Persistent mapping is the most efficient path on native APIs for streaming resources in from the CPU. The user sub-allocates within one or more persistently mapped buffers: some parts are being updated by the CPU while others are used by the GPU for copying data into device-local memory.
  2. WebGPU (currently) tracks the usage of a buffer object as a whole: we don't allow parts of the same buffer to be in different usage states. It also aims for WebGPU buffers (and textures) to correspond 1:1 to the native objects in the respective APIs.
  3. We can't allow CPU-GPU data races for the upload/download scenarios.

I think those points start conflicting with each other as soon as we talk about CPU-visible buffers. If we treat the buffer as a whole on the GPU side (i.e. when considering it to be used as a vertex buffer, or in a resource group), we can't have parts of it being accessed by the CPU. At some level we'd need to track buffer sub-ranges and manage persistent mapping. Here I propose to resolve the conflict by stating that CPU-visible resources are only going to be used for copy operations, and thus we don't need full-featured mapping semantics for a general `WebGPUBuffer`.

This proposal introduces new commands for data uploads/downloads. The browser can implement them with the most efficient method available (via persistent mapping). Regular buffers then keep the 1:1 relation to native objects and the constraint of a single (mutable) usage.
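To make the browser-side bookkeeping more concrete, here is a minimal sketch of how an implementation might sub-allocate ranges from one persistently mapped staging buffer and recycle them once the GPU has finished copying. Everything in it (`StagingRing`, `allocate`, `releaseOldest`) is hypothetical and not part of this proposal or of any native API; it also assumes the GPU finishes copies in submission order.

```typescript
// Hypothetical sketch: a ring allocator over one persistently mapped
// staging buffer. None of these names exist in WebGPU or a native API.
interface StagingRange {
  offset: number; // byte offset inside the staging buffer
  size: number;   // byte length handed to the CPU for writing
}

class StagingRing {
  private readonly capacity: number;
  private head = 0; // next free byte
  private used = 0; // bytes still owned by in-flight copies (incl. padding)
  private readonly inFlight: number[] = []; // footprint per live range, oldest first

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // Reserve `size` contiguous bytes, or return null if the ring is too full.
  allocate(size: number): StagingRange | null {
    if (size <= 0 || size > this.capacity) return null;
    if (this.used === 0) this.head = 0; // nothing in flight: start over
    const wraps = this.head + size > this.capacity;
    // When wrapping, the unused tail bytes are charged to this range.
    const padding = wraps ? this.capacity - this.head : 0;
    if (this.used + padding + size > this.capacity) return null;
    const offset = wraps ? 0 : this.head;
    this.head = (offset + size) % this.capacity;
    this.used += padding + size;
    this.inFlight.push(padding + size);
    return { offset, size };
  }

  // Called once the GPU has finished the oldest pending copy.
  releaseOldest(): void {
    const footprint = this.inFlight.shift();
    if (footprint !== undefined) this.used -= footprint;
  }
}
```

With this proposal, the user never sees such a structure: the `ArrayBuffer` handed out by an upload call would simply be a view of whatever range the browser reserved.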

## Code

```idl
dictionary WebGPUTextureDataLayout {
    u32 rowPitch;
    u32 slicePitch;
};

partial interface WebGPUCommandBuffer {
  Promise<ArrayBuffer> uploadBuffer(
    WebGPUBuffer dst,
    u32 dstOffset,
    u32 size);

  Promise<ArrayBuffer> downloadBuffer(
    WebGPUBuffer src,
    u32 srcOffset,
    u32 size);

  Promise<ArrayBuffer> uploadTexture(
    WebGPUTextureDataLayout layout,
    WebGPUTextureCopyView destination,
    WebGPUExtent3D size);

  Promise<ArrayBuffer> downloadTexture(
    WebGPUTextureCopyView source,
    WebGPUTextureDataLayout layout,
    WebGPUExtent3D size);
};
```
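The proposal does not spell out what `rowPitch` and `slicePitch` mean. Assuming the usual native-API convention (bytes between consecutive rows, and bytes between consecutive depth slices) and an uncompressed format, the position of a texel inside the `ArrayBuffer` could be computed as in the sketch below. The helper names and the `bytesPerTexel` parameter are illustrative assumptions, not part of the IDL above.

```typescript
// Assumed meaning (not defined by the proposal):
//   rowPitch   = bytes from the start of one row to the start of the next
//   slicePitch = bytes from the start of one depth slice to the next
interface TextureDataLayout {
  rowPitch: number;
  slicePitch: number;
}

// Byte offset of texel (x, y, z) inside the staging ArrayBuffer.
function texelByteOffset(
  layout: TextureDataLayout,
  x: number,
  y: number,
  z: number,
  bytesPerTexel: number,
): number {
  return z * layout.slicePitch + y * layout.rowPitch + x * bytesPerTexel;
}

// Smallest ArrayBuffer size that covers a width x height x depth region:
// the offset of the last texel plus the size of that texel.
function requiredBytes(
  layout: TextureDataLayout,
  width: number,
  height: number,
  depth: number,
  bytesPerTexel: number,
): number {
  return (
    texelByteOffset(layout, width - 1, height - 1, depth - 1, bytesPerTexel) +
    bytesPerTexel
  );
}
```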

## Semantics

When recording a command buffer, the user may request to upload data to, or download data from, some of the resources. Those operations return `Promise<ArrayBuffer>`.

For upload operations, the promise is resolved once the WebGPU implementation has prepared the staging area for an upload. The initial contents of the `ArrayBuffer` are guaranteed to be zeroed out. The implementation would also hold the submission of the command buffer until all the upload callbacks (from promises) are done. This guarantees that the data for an upload is ready, and the implementation can proceed with the transfer into device-local memory, where all of the resources (buffers/textures) are allocated.

For download operations, the promise is resolved once the GPU is done executing this command buffer, and the `ArrayBuffer` would contain the data read from the device-local memory of the corresponding resources.
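The ordering described in this section can be modelled with a small mock. This is only a sketch under loud assumptions: `MockCommandBuffer`, `beginUpload`, `requestDownload` and the explicit `done()` signal are all hypothetical, callbacks stand in for the promises, and the proposal itself does not say how the implementation learns that an upload callback has finished.

```typescript
// Hypothetical mock of the proposed ordering; no real GPU work happens.
type DownloadCallback = (data: ArrayBuffer) => void;

class MockCommandBuffer {
  private pendingUploads = 0;
  private submitRequested = false;
  private executed = false;
  private readonly downloads: { size: number; onData: DownloadCallback }[] = [];

  // Hands out a staging area; a fresh ArrayBuffer is zero-filled.
  // `done` is an assumed way for the user to signal "data is written".
  beginUpload(size: number): { staging: ArrayBuffer; done: () => void } {
    this.pendingUploads++;
    let finished = false;
    const done = (): void => {
      if (finished) return; // ignore repeated signals
      finished = true;
      this.pendingUploads--;
      this.tryExecute();
    };
    return { staging: new ArrayBuffer(size), done };
  }

  // The callback fires only after the command buffer has executed.
  requestDownload(size: number, onData: DownloadCallback): void {
    this.downloads.push({ size, onData });
  }

  // Submission is held while any upload is still being filled.
  submit(): void {
    this.submitRequested = true;
    this.tryExecute();
  }

  hasExecuted(): boolean {
    return this.executed;
  }

  private tryExecute(): void {
    if (!this.submitRequested || this.pendingUploads > 0 || this.executed) return;
    this.executed = true; // stands in for the GPU finishing the work
    for (const d of this.downloads) {
      d.onData(new ArrayBuffer(d.size)); // placeholder for data read back
    }
  }
}
```

In this model, calling `submit()` before `done()` leaves the command buffer pending; it executes, and download callbacks fire, only once every upload has been signalled as complete.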

## Analysis

One clear downside is the inability to have resources accessible from both the CPU and the GPU directly. On some architectures, shared memory can be used effectively by GPUs and could technically allow the graphics programmer to avoid a copy (from CPU-visible to device-local memory), but this proposal doesn't allow that.

A related upside, however, is that developers don't need to consider those architectures and implement diverging code paths for them. This results in simpler code and more portable behavior for programs using our API.

Simplicity, in general, is the main feature of this proposal. The complexity of recycling buffer ranges and synchronizing the mapping access is put on the browser implementation. I believe this is in line with the current direction WebGPU is heading (implicit synchronization, no memory objects).

Another downside is that a single `ArrayBuffer` corresponds to a single copy operation. I don't think this limitation is essential to the core of the proposal, and we can adjust it if it turns out to be too constraining.

