Proposals for buffer operations (immediate uploads, buffer mapping)

PTAL, this is basically #49 but as an investigation, and with additional alternatives.

Our thoughts on these proposals are the following:

 - Buffer mapping 1: our preferred solution, but a bit complex
 - Buffer mapping 2: Mapping whole buffers seems too heavyweight for multi-process implementations
 - Immediate uploads 1: Goes very well with Buffer mapping 1
 - Immediate uploads 2: strongly in favor

# Buffer operations

This describes `WebGPUBuffer` operations that are used by applications to interact directly with the content of the buffer's memory.
The two primitives we need to support are the CPU writing data inside the buffer for use by the GPU (upload) and the CPU reading data produced by the GPU (readback).

Design constraints are:

 - For the portability of the API, prevent data races between the CPU and the GPU.
 - For performance, minimize the number of times the data is copied around.
 - To make the API non-blocking, only allow asynchronous readbacks.
 - For performance on multi-process implementations, make an asynchronous upload path.

Two alternative proposals are described for buffer mapping, `WebGPUMappedMemory` and whole-buffer mapping.
Two further proposals, which aren't mutually exclusive, are described for immediate data uploads: one based on a `mapWriteSync` call returning a `WebGPUMappedMemory` and another using `setSubData`.

## Buffer mapping proposal 1

### map[Write|Read]Async and unmap

The way to have the minimal number of copies for upload and readback is to provide a buffer mapping mechanism.
This mechanism has to be asynchronous to ensure the GPU is done using the buffer before the application can look into the ArrayBuffer.
Otherwise, on implementations where the ArrayBuffer is directly a pointer to the buffer memory, data races between the CPU and the GPU could occur.

We want the status of a map operation to act both as a promise and as something pollable, since each has its advantages.
`WebGPUMappedMemory` is an object that is `then`-able, meaning that it acts like a Javascript `Promise` but is pollable at the same time.

The mapping operations for `WebGPUBuffer` are:

```webidl
partial interface WebGPUBuffer {
    WebGPUMappedMemory mapWriteAsync(u32 offset, u32 size);
    WebGPUMappedMemory mapReadAsync(u32 offset, u32 size);
};
```

These operations return new `WebGPUMappedMemory` objects representing the requested range of the buffer for writing or reading.
The results are initialized in the "pending" state and transition at a Javascript task boundary to the "available" state when the implementation can determine the GPU is done using the buffer.
Calling `mapReadAsync` or `mapWriteAsync` puts the buffer in the mapped state.
No operations are allowed on a buffer in that state except additional calls to `mapReadAsync` or `mapWriteAsync` and calls to `unmap`.
In particular a mapped buffer cannot be used in a `WebGPUCommandBuffer` given to `WebGPUQueue.submit`.
The following must be true or a validation error occurs for `mapWriteAsync` (resp. `mapReadAsync`):

 - The buffer must have been created with the `WebGPUBufferUsage.MAP_WRITE` (resp. `WebGPUBufferUsage.MAP_READ`) usage.
 - `offset + size` must not overflow and must be at most the size of the buffer.
 - The `[offset, offset + size)` range must not intersect the range of another `WebGPUMappedMemory` on the same buffer which hasn't been previously invalidated.
 - The buffer must not have been destroyed.
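
As a rough illustration of the range checks in the list above, here is a TypeScript sketch. It is not part of the proposal: `validateMapRange`, `MappedRange`, and reporting the failure as a returned string (rather than a validation error) are assumptions made for the example, and the usage and destroyed checks are left out.

```typescript
// Hypothetical helper; the names are illustrative and not part of the proposal.
interface MappedRange {
  offset: number;
  size: number;
}

const U32_MAX = 0xffffffff;

// Returns null when the range is acceptable, or a description of the failure.
function validateMapRange(
  bufferSize: number,
  liveRanges: readonly MappedRange[],
  offset: number,
  size: number,
): string | null {
  // Both arguments are u32 in the IDL.
  for (const v of [offset, size]) {
    if (!Number.isInteger(v) || v < 0 || v > U32_MAX) {
      return "offset and size must be u32 values";
    }
  }
  // JavaScript numbers hold this sum exactly, so u32 overflow is checked explicitly.
  const end = offset + size;
  if (end > U32_MAX) return "offset + size overflows u32";
  if (end > bufferSize) return "range exceeds the buffer size";
  // Non-empty half-open ranges [a, b) and [c, d) intersect iff a < d and c < b.
  for (const r of liveRanges) {
    if (offset < r.offset + r.size && r.offset < end) {
      return "range intersects a live WebGPUMappedMemory";
    }
  }
  return null;
}
```

Because the ranges are half-open, mapping `[0, 8)` and then `[8, 16)` on a 16-byte buffer passes, while `[4, 12)` is rejected once `[0, 8)` is live.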

Then a mapped buffer can be unmapped with:

```webidl
partial interface WebGPUBuffer {
    void unmap();
};
```

This operation invalidates all the `WebGPUMappedMemory` created from the buffer and puts the buffer in the unmapped state.
The buffer must be in the mapped state otherwise a validation error occurs when `unmap` is called.

### WebGPUMappedMemory

`WebGPUMappedMemory` is an object representing a mapped region of a buffer that's both pollable and promise-like.

It can be in one of three states: pending, available and invalidated.

The pollable interface is:

```webidl
partial interface WebGPUMappedMemory {
    bool isPending();
    ArrayBuffer getPointer();
};
```

`isPending` returns true if the object is in the pending state, false otherwise.
`getPointer` returns an ArrayBuffer representing the buffer data if the object is in the available state, null otherwise.

`WebGPUMappedMemory` is also `then`-able, meaning that it acts like a Javascript `Promise`:

```webidl
partial interface WebGPUMappedMemory {
    Promise then(WebGPUMappedMemorySuccessCallback success,
                 optional WebGPUMappedMemoryErrorCallback error);
};
```

This acts like a `Promise<ArrayBuffer>.then` that is resolved on the Javascript task boundary in which the implementation detects the GPU is done with the buffer.
On that boundary:

 - The `WebGPUMappedMemory` goes in the available state.
 - If the `WebGPUMappedMemory` was created via `WebGPUBuffer.mapWriteAsync`, its content is cleared to 0.
 - `success` is called with the content of the memory as an argument.

If `success` hasn't been called when the WebGPUMappedMemory gets invalidated (meaning the object is still in the pending state), `error` is called instead.
When `WebGPUMappedMemory` goes from the available state to the invalidated state, the `ArrayBuffer` for its content gets neutered.
The return value of `then` acts like the return value of `Promise.then`.

The `ArrayBuffer` of a `WebGPUMappedMemory` created from a `mapWriteAsync` is where the application should write the data and its content is made available to the buffer when the `WebGPUMappedMemory` is invalidated (i.e. `WebGPUBuffer.unmap` is called).
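
To make the three states concrete, here is a minimal TypeScript mock of the object described in this section. It is only a sketch under assumptions: `MockMappedMemory`, `makeAvailable` and `invalidate` are hypothetical names, `makeAvailable` stands in for the task boundary at which the implementation sees that the GPU is done, and `invalidate` stands in for what `WebGPUBuffer.unmap` would trigger. The mock drops its reference to the `ArrayBuffer` instead of neutering it.

```typescript
type MappedState = "pending" | "available" | "invalidated";

// Hypothetical mock of the pollable, then-able object; not a real implementation.
class MockMappedMemory {
  private state: MappedState = "pending";
  private data: ArrayBuffer | null = null;
  private settle: {
    resolve: (data: ArrayBuffer) => void;
    reject: (reason: Error) => void;
  } | null = null;
  private readonly promise: Promise<ArrayBuffer>;

  constructor() {
    this.promise = new Promise<ArrayBuffer>((resolve, reject) => {
      this.settle = { resolve, reject };
    });
    // Keep a rejection from being reported when nobody called then().
    this.promise.catch(() => {});
  }

  isPending(): boolean {
    return this.state === "pending";
  }

  // The data in the available state, null otherwise.
  getPointer(): ArrayBuffer | null {
    return this.state === "available" ? this.data : null;
  }

  then<T>(success: (data: ArrayBuffer) => T, error?: (reason: Error) => T): Promise<T> {
    return this.promise.then(success, error);
  }

  // Assumption: stands in for the task boundary where the GPU is known to be done.
  makeAvailable(byteLength: number): void {
    if (this.state !== "pending") return;
    this.state = "available";
    this.data = new ArrayBuffer(byteLength); // zero-filled, as for mapWriteAsync
    this.settle?.resolve(this.data);
  }

  // Assumption: stands in for the invalidation caused by WebGPUBuffer.unmap().
  invalidate(): void {
    if (this.state === "pending") {
      this.settle?.reject(new Error("invalidated while pending"));
    }
    // A real implementation would neuter the ArrayBuffer here.
    this.state = "invalidated";
    this.data = null;
  }
}
```

An application could poll `isPending()` once per frame, or call `then` and let the callback run; both observe the same transition.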

## Buffer mapping proposal 2

In this proposal a buffer is always mapped as a whole as an asynchronous operation.
Mapping for reading (resp. writing) is done using `WebGPUBuffer.mapRead` (resp. `WebGPUBuffer.mapWrite`).
The mapping calls put the buffer in the "mapped" state.
A Javascript error is thrown under these conditions:

 - The buffer hasn't been created with the `MAP_READ` (resp. `MAP_WRITE`) usage.
 - The buffer isn't in the unmapped state.
 - The buffer has been destroyed.

```webidl
partial interface WebGPUBuffer {
    void mapRead();
    void mapWrite();
};
```

Mapping is an asynchronous operation. After it resolves, the buffer's `mapping` member is updated to represent the content of the buffer for `mapRead` (resp. a zero-filled `ArrayBuffer` ready to receive data from the application for `mapWrite`).
Resolution can only happen at a Javascript task boundary, and only after the implementation has determined it is safe to give the CPU access to the buffer.
Resolution is guaranteed to complete no later than the point at which all previously enqueued operations have finished executing (as can be observed with `WebGPUFence`).

```webidl
partial interface WebGPUBuffer {
    readonly attribute ArrayBuffer? mapping;
};
```

The buffer is unmapped with a call to `unmap` which puts it in the unmapped state.
It is an error to call `unmap` while in the unmapped state.
In the mapped state it is an error to do operations on the buffer (such as `setSubData` or enqueuing commands using the buffer).

```webidl
partial interface WebGPUBuffer {
    void unmap();
};
```
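
The state transitions of this proposal can be sketched with a small TypeScript mock. Everything here is an assumption made for illustration: `MockWholeBuffer`, `resolveMapping` (a stand-in for the task boundary at which resolution happens) and the usage flag values are hypothetical, and so are the choices that `unmap` commits the written data and resets `mapping` to null. The destroyed check is omitted.

```typescript
// Hypothetical usage flag values; the proposal does not define them.
const MAP_READ = 1 << 0;
const MAP_WRITE = 1 << 1;

class MockWholeBuffer {
  private contents: ArrayBuffer;
  private usage: number;
  private kind: "read" | "write" | null = null; // non-null while in the mapped state
  private current: ArrayBuffer | null = null;

  constructor(size: number, usage: number) {
    this.contents = new ArrayBuffer(size);
    this.usage = usage;
  }

  // Mirrors `readonly attribute ArrayBuffer? mapping`.
  get mapping(): ArrayBuffer | null {
    return this.current;
  }

  mapRead(): void {
    this.beginMap("read", MAP_READ);
  }

  mapWrite(): void {
    this.beginMap("write", MAP_WRITE);
  }

  private beginMap(kind: "read" | "write", required: number): void {
    if ((this.usage & required) === 0) throw new Error(`missing usage for ${kind} mapping`);
    if (this.kind !== null) throw new Error("buffer is not in the unmapped state");
    this.kind = kind; // mapped state starts now; `mapping` stays null until resolution
  }

  // Assumption: stands in for the task boundary where the mapping resolves.
  resolveMapping(): void {
    if (this.kind === null || this.current !== null) return;
    this.current = this.kind === "read"
      ? this.contents.slice(0) // copy of the buffer content
      : new ArrayBuffer(this.contents.byteLength); // zero-filled
  }

  unmap(): void {
    if (this.kind === null) throw new Error("buffer is already unmapped");
    // Assumption: data written through a write mapping is committed on unmap.
    if (this.kind === "write" && this.current !== null) {
      this.contents = this.current.slice(0);
    }
    this.kind = null;
    this.current = null;
  }
}
```

With this mock, an application would call `mapWrite()`, wait until `mapping` is non-null, fill it, and then call `unmap()`.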

## Immediate data upload proposal 1

When mapping for writing, the application doesn't see GPU state since the content is cleared to 0.
This means WebGPU can expose a `mapWriteSync` primitive that behaves exactly like `mapWriteAsync` except that the returned `WebGPUMappedMemory` object starts in the available state.

```webidl
partial interface WebGPUBuffer {
    WebGPUMappedMemory mapWriteSync(u32 offset, u32 size);
};
```

## Immediate data upload proposal 2

Buffer mapping is the path with the least number of copies but it is often useful to upload data to a buffer *right now*, if only for debugging.
A `WebGPUBuffer` operation is provided that takes an ArrayBuffer and copies its content at an offset in the buffer.

```webidl
partial interface WebGPUBuffer {
    void setSubData(ArrayBuffer data, u32 offset);
};
```

This operation acts as if it was done after all previous "device-level" commands and before all subsequent "device-level" commands.
"Device-level" commands are all commands not buffered in a `WebGPUCommandBuffer`, and include `WebGPUQueue.submit`.
The content of `data` is only read during the call and can be modified by the application afterwards.
The following must be true or a validation error occurs:

 - The buffer must have been created with the `WebGPUBufferUsage.TRANSFER_DST` usage flag.
 - `offset + data.byteLength` must not overflow and must be at most the size of the buffer.
 - The buffer must not be currently mapped.
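
The checks above and the copy-during-the-call behavior can be sketched in TypeScript as follows. This is an illustration under assumptions, not the proposed implementation: `SketchBuffer`, its fields, the `TRANSFER_DST` value and the use of thrown errors in place of validation errors are all hypothetical.

```typescript
// Hypothetical stand-in for a buffer; field names are illustrative only.
interface SketchBuffer {
  storage: Uint8Array;
  usage: number;
  mapped: boolean;
}

// Hypothetical flag value; the proposal does not define it.
const TRANSFER_DST = 1 << 0;

function setSubData(buffer: SketchBuffer, data: ArrayBuffer, offset: number): void {
  if ((buffer.usage & TRANSFER_DST) === 0) {
    throw new Error("buffer was not created with TRANSFER_DST");
  }
  if (!Number.isInteger(offset) || offset < 0 || offset > 0xffffffff) {
    throw new Error("offset must be a u32 value");
  }
  const end = offset + data.byteLength;
  if (end > 0xffffffff || end > buffer.storage.byteLength) {
    throw new Error("offset + data.byteLength exceeds the buffer");
  }
  if (buffer.mapped) {
    throw new Error("buffer is currently mapped");
  }
  // The bytes are copied during the call, so later writes to `data`
  // are not observed by the buffer.
  buffer.storage.set(new Uint8Array(data), offset);
}
```

After the call returns, the application can reuse or overwrite `data` without affecting the buffer content.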

## Unused designs

### Persistently mapped buffer

A persistently mapped buffer is one where the result of mapping the buffer can be kept by the application while the buffer is in use by the GPU.
We didn't find a way to have persistently mapped buffers and at the same time keep things data race free between the CPU and GPU.
Being data race free would be possible if ArrayBuffer could be unneutered but this is not the case.

### Promise<ArrayBuffer> readback();

This didn't have a pollable interface and forced an extra buffer-to-buffer copy to occur if the GPU execution could be resumed immediately.

### Dawn's MapReadAsync(callback);

Not a pollable interface.

## Issues

### GC discoverability

It isn't clear yet what happens when a buffer gets garbage collected while it is mapped.
The simple answer is that the `WebGPUMappedMemory` objects get invalidated but that would allow the application to discover when the GC runs.

### GC pressure

The `WebGPUMappedMemory` design makes each mapped region create two garbage collected objects. This could lead to some GC pressure.

### Side effects between mapped memory regions

What happens when the regions of two `WebGPUMappedMemory` objects overlap in the buffer?
Are writes made through one visible through the other?
If they are, maybe `WebGPUMappedMemory.getPointer` should return an `ArrayBufferView` instead.
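
For reference, this is how overlapping `ArrayBufferView`s over a single `ArrayBuffer` behave in Javascript today: a write through one view is visible through the other. The sketch below uses only standard typed arrays; `overlappingViewsShareWrites` is a hypothetical name, and whether mapped regions should behave this way is exactly the open question.

```typescript
// Two views over one ArrayBuffer whose byte ranges overlap on [8, 12).
function overlappingViewsShareWrites(): number {
  const backing = new ArrayBuffer(16);
  const viewA = new Uint8Array(backing, 0, 12); // bytes [0, 12)
  const viewB = new Uint8Array(backing, 8, 8); // bytes [8, 16)
  viewA[10] = 42; // byte 10 of the backing store
  return viewB[2]; // the same byte, since 8 + 2 = 10
}
```

Two separate `ArrayBuffer` objects, in contrast, never alias each other, so overlapping regions exposed that way would each hold an independent copy.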

### Interactions with workers

Can a buffer be mapped in multiple different workers?
If that's the case, the pointer should be represented with a `SharedArrayBuffer`.
