Proposal: BufferSubRanges to solve buffer-tracking, streaming and descriptor set explosion.

# Motivation and Overview

**TL;DR** Basically TextureViews, but for Buffers.

As you may have noticed from #138, I'm of the opinion that whole-buffer tracking will be a great hindrance to porting and general development, and will require WebGPU-specific workarounds to make normal resource streaming possible.

However, the WebGPU group's focus on portability takes the usual performant implementations off the table.

In general, Mapping Proposal 1 from #138 is quite good, as it allows for:
- Zero-copy write or readback (if heap sharing or external memory is available, or WebGPU is a single-process implementation)
- Using the mappable buffer for any purpose

However, it fails w.r.t. the following:
- Having to create multiple copies of the buffer due to whole-buffer single mutable usage tracking

Extending that proposal with a new BufferSubRange (name T.B.C.) object will allow finer-grained usage tracking of buffers by declaring the sub-ranges to be tracked up front, while an alignment requirement limits the total number of sub-ranges and the difficulty of tracking them.
This allows us to partition a `WebGPUBuffer` (which we hope is represented by a single native-API buffer) into multiple `WebGPUBufferSubRange`s on which the actual "single mutable state" rule gets enforced.
This will prevent "Bind Group Explosion" and other side effects in streaming scenarios, such as with this example streaming allocator made by @Kangz:
```js
// A "mapping" is a WebGPUBuffer and its ArrayBuffer mapping.
// {"buffer": WebGPUBuffer, "pointer": ArrayBuffer}

let availableMappings = [];
let mappingsUsedThisFrame = [];

let currentMapping = null;
let currentOffset = 0;

function getFreshMapping() {
    if (availableMappings.length == 0) {
        let buffer = device.createBuffer({"size": 2*1024*1024, "usage": ...});
        let pointer = buffer.mapWriteSync(0, buffer.size);
        availableMappings.push({"buffer": buffer, "pointer":pointer});
    }

    currentMapping = availableMappings.pop();
    // Start writing at the beginning of the fresh mapping.
    currentOffset = 0;
    mappingsUsedThisFrame.push(currentMapping);

    return currentMapping;
}

// Returns a (mapping, offset) pair.
function requestUploadSpace(size) {
    if (currentMapping == null || currentOffset + size > currentMapping.buffer.size) {
        getFreshMapping();
    }

    let offset = currentOffset;
    currentOffset += size;
    // JavaScript has no tuples, so return the pair as an array.
    return [currentMapping, offset];
}

function unmapUploadBuffers() {
    mappingsUsedThisFrame.forEach(mapping => mapping.buffer.unmap());
}

function remapAsyncUploadBuffers() {
    mappingsUsedThisFrame.forEach(mapping => {
        mapping.buffer.mapWriteAsync(0, mapping.buffer.size).then(pointer => {
            availableMappings.push({
                "buffer": mapping.buffer,
                "pointer": pointer,
            });
        });
    });
}

function endOfFrame() {
    mappingsUsedThisFrame = [];
    currentMapping = null;
    currentOffset = 0;
}


function frame() {
    let commands = device.createCommandBufferBuilder();

    frameObjects.forEach(object => {
        // uploadUniformData uses requestUploadSpace to know in which ArrayBuffer to put data
        // it also knows which WebGPUBuffer this corresponds to and return it so it can be used
        // to build the command buffer.
        const [buffer, offset] = uploadUniformData(object);
        renderObjectUsingUniform(object, commands, buffer, offset);
    });

    unmapUploadBuffers();

    queue.submit(commands);

    remapAsyncUploadBuffers();
    endOfFrame();
}
``` 

In such a scenario the WebGPU app would have to make a `WebGPUBindGroup` for each possible combination of actual buffer objects being used.
If, however, we could use dynamic buffer offsets (#116) and track usage per `WebGPUBufferSubRange` instead, there would be no need to create and use different bind groups.

# Mapping Proposal

We introduce a new object, its creation parameters and mapping flags:
```webidl
interface WebGPUMappingType {
    const u32 MAP_READ = 1;
    const u32 MAP_WRITE = 2;
};

dictionary WebGPUBufferSubRangeDescriptor {
    u32 offset;
    u32 size;
    WebGPUBufferUsageFlags usage;
};

interface WebGPUBufferSubRange {    
    Promise then(WebGPUUsageFinishedCallback success);

    WebGPUMappedMemory mapAsync(WebGPUMappingType accessFlags, u32 offset, u32 size);
    void unmap();

    void destroy();
};

interface WebGPUBuffer {
    WebGPUBufferSubRange createSubRange(WebGPUBufferSubRangeDescriptor descriptor);
};
```

The actual mapping is a slight spin on Proposal 1 from #138, with a few changes:
1. `offset` is an offset into the `WebGPUBufferSubRange`, not the parent `WebGPUBuffer`
2. `WebGPUBuffer` cannot be mapped, only a subrange represented by `WebGPUBufferSubRange` can.
3. The promise is generalized and is used to queue up a callback which will be called after the BufferSubRange's last use by any of the pending and already submitted command buffers. Any command buffer that uses a BufferSubRange that has any promise pending at the time of WebGPU submission will hold back the actual native API submission until these promises reject or fulfill and the callbacks return.
4. Calling `mapAsync` will throw an error if the BufferSubRange is "in use" by the GPU, which we define as being used by a currently executing command on any queue, or by any command which has not yet executed but is in a command buffer that has already been submitted to any queue.
5. A `WebGPUBufferSubRange` must not be destroyed while mapped or "in use" by the GPU.
6. A `WebGPUBuffer` cannot be destroyed while any of its child `WebGPUBufferSubRange`s cannot be destroyed (that is, while any child is mapped or "in use" by the GPU).
7. The memory can be mapped for both read and write (not necessary, but useful).
8. For non-read mappings the memory is either zeroed or is memory recycled from previous `WebGPUBufferSubRange` mappings made by the same application (not necessary, but doubles streaming performance).

**I'm obviously open to modifications to points 3 onwards.**
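
Rules 1, 4 and 5 amount to a small state machine, which the following sketch tries to illustrate. It is a mock and not a WebGPU implementation: `MockSubRange`, `noteSubmitted` and `noteCompleted` are hypothetical names, a `Uint8Array` stands in for `WebGPUMappedMemory`, and rejecting a second `mapAsync` on an already mapped range is an assumption the proposal does not state.

```js
const MAP_READ = 1;
const MAP_WRITE = 2;

// Mock stand-in for WebGPUBufferSubRange; all names here are hypothetical.
class MockSubRange {
    constructor(size) {
        this.size = size;
        this.pendingGpuUses = 0; // submitted command buffers still using this range
        this.mapped = false;
        this.destroyed = false;
    }

    // "In use" per rule 4: referenced by executing or submitted-but-pending commands.
    get inUse() { return this.pendingGpuUses > 0; }

    noteSubmitted() { this.pendingGpuUses += 1; }
    noteCompleted() { this.pendingGpuUses -= 1; }

    // accessFlags is accepted but not interpreted by this mock.
    mapAsync(accessFlags, offset, size) {
        if (this.destroyed) throw new Error("sub-range has been destroyed");
        if (this.inUse) throw new Error("sub-range is in use by the GPU"); // rule 4
        if (this.mapped) throw new Error("sub-range is already mapped"); // assumption
        // Rule 1: offset is relative to the sub-range, not the parent buffer.
        if (offset + size > this.size) throw new Error("mapping exceeds the sub-range");
        this.mapped = true;
        return new Uint8Array(size); // zero-filled, as rule 8 allows for write mappings
    }

    unmap() { this.mapped = false; }

    destroy() {
        // Rule 5: cannot destroy while mapped or in use.
        if (this.mapped || this.inUse) throw new Error("sub-range is mapped or in use");
        this.destroyed = true;
    }
}
```

A real implementation would derive `pendingGpuUses` from queue submission and completion events rather than from explicit calls.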


Key advantages of this proposal:
1. Persistently Mapped Buffers can be used by all implementations internally
2. Zero-copy is possible on single-process implementations and any implementations which can use Vulkan external memory or D3D12 shared heaps
3. It's easy to use `WebGPUBufferSubRange`s in place of memory blocks in existing mappable GPU buffer allocators, also using `WebGPUBuffer.createSubRange` and `WebGPUBufferSubRange.destroy` as a fast and easy way to coalesce free memory blocks
4. Two `WebGPUBufferSubRange`s created from the same `WebGPUBuffer` are actually sub-ranges of the same native API buffer object, hence internally they are the same object which optimizes descriptor set changes and creation.
5. Like in native Persistently Mapped Buffer applications, we can have one range of the buffer mapped while having the GPU use another
6. `WebGPUBufferSubRange` creation and destruction are really light-weight operations that require no API calls except for event polling
7. Granularity of range-splitting is reduced by alignment requirement

# WebGPUBufferSubRange Creation

The expectation is that `WebGPUBufferSubRange` creation and destruction will be really high performance compared to `WebGPUBuffer` creation and destruction.

`WebGPUBuffer.createSubRange` will return an error unless all of the following are true:
- `descriptor.offset` and `descriptor.size` are both multiples of 256 (the alignment can be higher, even 4096) [1]
- The `[descriptor.offset,descriptor.offset+descriptor.size)` range is contained within the creating buffer
- The `[descriptor.offset,descriptor.offset+descriptor.size)` range does not include any part of a range assigned to a `WebGPUBufferSubRange` previously created from the same `WebGPUBuffer` which has not been destroyed yet
- Each bit in the `descriptor.usage` field was also present in the `WebGPUBufferDescriptor.usage` field that the creating `WebGPUBuffer` was created with
- If `descriptor.offset` is not equal to 0 or `descriptor.size` does not equal the creating `WebGPUBuffer`'s size, then the `descriptor.usage` field contains neither the `INDEX` nor the `VERTEX` `WebGPUBufferUsage` flag [2]
- The creating `WebGPUBuffer` has not been destroyed

[1] This places the same requirement on `WebGPUBuffer` size alignment.

[2] This only applies if the robustness guarantees provided by the native API are relied on.
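
The creation rules above can be sketched as a single validation function. This is only an illustration under stated assumptions: the parent buffer is modeled as a plain object with hypothetical fields (`size`, `usage`, `destroyed`, `liveSubRanges`), the usage bit values are made up because the proposal does not define them, and the function returns an error string or `null` instead of whatever error mechanism WebGPU ends up using.

```js
// Hypothetical usage bit values; the proposal does not define them.
const Usage = { MAP_READ: 1, MAP_WRITE: 2, INDEX: 4, VERTEX: 8, UNIFORM: 16, STORAGE: 32 };
const SUBRANGE_ALIGNMENT = 256; // could be higher, for example 4096

// `buffer` stands in for a WebGPUBuffer:
// { size, usage, destroyed, liveSubRanges: [{ offset, size }, ...] }
// Returns null when the descriptor is valid, otherwise a description of the error.
function validateSubRange(buffer, descriptor) {
    const { offset, size, usage } = descriptor;
    const end = offset + size;

    if (buffer.destroyed) {
        return "the creating buffer has been destroyed";
    }
    if (offset % SUBRANGE_ALIGNMENT !== 0 || size % SUBRANGE_ALIGNMENT !== 0) {
        return "offset and size must be multiples of the alignment";
    }
    if (end > buffer.size) {
        return "range is not contained within the creating buffer";
    }
    // Half-open ranges [a, b) and [c, d) overlap when a < d and c < b.
    const overlaps = buffer.liveSubRanges.some(
        r => offset < r.offset + r.size && r.offset < end);
    if (overlaps) {
        return "range overlaps a live sub-range of the same buffer";
    }
    if ((usage & ~buffer.usage) !== 0) {
        return "usage contains bits the creating buffer was not created with";
    }
    const coversWholeBuffer = offset === 0 && size === buffer.size;
    if (!coversWholeBuffer && (usage & (Usage.INDEX | Usage.VERTEX)) !== 0) {
        return "INDEX and VERTEX usage require a sub-range covering the whole buffer";
    }
    return null;
}
```

The overlap test here is a linear scan over the live sub-ranges. Because offsets and sizes are aligned, an implementation could instead keep one bit per aligned block, which is part of what makes the alignment requirement attractive.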

## Word about Robustness

Because the `WebGPUBufferSubRange` does not map 1:1 with the native APIs' buffer objects (only `WebGPUBuffer` does), we cannot rely on out-of-bounds accesses of index buffers or vertex buffers to be handled gracefully (because they are not bound as ranges, unlike UBOs and SSBOs).

This is why, if the `WebGPUBufferSubRange` does not contain the entire parent `WebGPUBuffer`, it cannot be used as vertex (fetched by bad indices) or index (fetched by bad draw indirect) input data.

This obviously does not apply if #117 were to be implemented, or if the robustness guarantees of the native APIs were deemed insufficient and `WebGPU` implementations handled out-of-bounds and invalid accesses entirely by themselves.

# Changes to the rest of WebGPU

Naturally `WebGPUBufferSubRange` would take the place of `WebGPUBuffer` in most APIs except for bind-groups.

## Changes to Bind Groups

These would be the new Bind Group and Binding definitions:
```webidl
dictionary WebGPUStaticBufferBinding {
    WebGPUBufferSubRange bufferRange;
    u32 offset;
    u32 size;
};

dictionary WebGPUDynamicBufferBinding {
    WebGPUBuffer buffer;
    u32 size;
};

typedef (WebGPUSampler or WebGPUTextureView or WebGPUStaticBufferBinding or WebGPUDynamicBufferBinding) WebGPUBindingResource;
```

Assuming #149 is merged, and modifying it:
```webidl
dictionary WebGPUBufferSubRangeAndOffsetPair {
    WebGPUBufferSubRange subBuffer;
    u32 offset; // offset is an offset into subBuffer specified above
};

partial interface WebGPUProgrammablePassEncoder {
    void setBindGroup(u32 index, WebGPUBindGroup bindGroup, optional sequence<WebGPUBufferSubRangeAndOffsetPair> dynamicOffsets);
};
```
Here we add the constraints that:
1. The `WebGPUBuffer` given by `WebGPUDynamicBufferBinding.buffer` of the `WebGPUDynamicBufferBinding` associated with a sequence element `WebGPUBufferSubRangeAndOffsetPair` must be the parent (creating) buffer of `WebGPUBufferSubRangeAndOffsetPair.subBuffer`
2. The range given by `[WebGPUBufferSubRangeAndOffsetPair.offset,WebGPUBufferSubRangeAndOffsetPair.offset+WebGPUDynamicBufferBinding.size)` must not overflow `WebGPUBufferSubRangeAndOffsetPair.subBuffer`
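
The two constraints can be checked roughly as follows. This is a sketch with plain objects standing in for the WebGPU types; the `parent` field on the sub-range, the pairing of bindings and offsets by position, and the count check are all assumptions made for illustration and are not spelled out in the proposal.

```js
// dynamicBindings: [{ buffer, size }], standing in for WebGPUDynamicBufferBinding.
// dynamicOffsets:  [{ subBuffer: { parent, size }, offset }], standing in for
//                  WebGPUBufferSubRangeAndOffsetPair.
// Returns null when valid, otherwise a description of the first violation.
function validateDynamicOffsets(dynamicBindings, dynamicOffsets) {
    // Assumption: one pair per dynamic binding, matched by position.
    if (dynamicBindings.length !== dynamicOffsets.length) {
        return "expected one sub-range/offset pair per dynamic binding";
    }
    for (let i = 0; i < dynamicBindings.length; i++) {
        const binding = dynamicBindings[i];
        const { subBuffer, offset } = dynamicOffsets[i];

        // Constraint 1: the sub-range must have been created from the bound buffer.
        if (subBuffer.parent !== binding.buffer) {
            return `pair ${i}: sub-range was not created from the bound buffer`;
        }
        // Constraint 2: [offset, offset + binding.size) must fit in the sub-range.
        if (offset + binding.size > subBuffer.size) {
            return `pair ${i}: bound range overflows the sub-range`;
        }
    }
    return null;
}
```

Since the bind group only names the parent `WebGPUBuffer`, the same bind group stays valid whichever sub-range is supplied at `setBindGroup` time, which is what avoids creating one bind group per buffer combination.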


And finally, changing the rest:
```webidl
partial interface WebGPURenderPassEncoder {
    void setIndexBuffer(WebGPUBufferSubRange buffer, u32 offset);
    void setVertexBuffers(u32 startSlot, sequence<WebGPUBufferSubRange> buffers, sequence<u32> offsets);
};

dictionary WebGPUBufferCopyView {
    WebGPUBufferSubRange subBuffer;
    u32 offset;
    u32 rowPitch;
    u32 imageHeight;
};

partial interface WebGPUCommandBuffer {
    void copyBufferToBuffer(
        WebGPUBufferSubRange src,
        u32 srcOffset,
        WebGPUBufferSubRange dst,
        u32 dstOffset,
        u32 size);
};
```
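
Because sub-ranges of one `WebGPUBuffer` share a single native buffer, an implementation would presumably translate sub-range-relative offsets into parent-buffer offsets before issuing the native call. The sketch below shows that translation for `copyBufferToBuffer`. It assumes that copy offsets are relative to the sub-range (as stated for mapping and dynamic offsets), and `toParentOffset` and `resolveCopy` are hypothetical helper names.

```js
// subRange: { offset, size }, where offset is the sub-range's position in its parent.
// Returns the offset in the parent (native) buffer for an access of `size` bytes.
function toParentOffset(subRange, offset, size) {
    if (offset + size > subRange.size) {
        throw new RangeError("access exceeds the sub-range");
    }
    return subRange.offset + offset;
}

// Resolves a sub-range based copy into parent-buffer offsets for the native API.
function resolveCopy(src, srcOffset, dst, dstOffset, size) {
    return {
        srcOffset: toParentOffset(src, srcOffset, size),
        dstOffset: toParentOffset(dst, dstOffset, size),
        size,
    };
}
```

For example, copying 128 bytes from offset 64 of a sub-range that starts at byte 512 of its parent reads from parent offset 576.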
