Proposal: BufferSubRanges to solve buffer-tracking, streaming and descriptor set explosion. #156

@devshgraphicsprogramming

Motivation and Overview

TL;DR Basically TextureViews, but for Buffers.

As you may have noticed from #138 I'm of the opinion that whole-buffer tracking will be a great hindrance to porting and general development, and will require WebGPU specific work-arounds to make normal resource streaming possible.

However, the WebGPU group's focus on portability takes the usual performant implementations off the table.

In general, Mapping Proposal 1 from #138 is quite good, as it allows for:

  • Zero-copy write or readback (if heap sharing or external memory is available, or WebGPU is a single-process implementation)
  • Using the mappable buffer for any purpose

However, it fails with respect to the following:

  • Having to create multiple copies of the buffer due to whole-buffer single mutable usage tracking

Extending that proposal by the introduction of the new BufferSubRange (name T.B.C.) object will allow finer-grained usage tracking of buffers by declaring the sub-ranges to be tracked up-front, while an alignment requirement limits the total number of sub-ranges and the difficulty of tracking them.
This allows us to partition a WebGPUBuffer (which we hope is represented by a single native-API buffer) into multiple WebGPUBufferSubRanges on which the actual "single mutable state" rule gets enforced.
This will prevent "Bind Group Explosion" and other side-effects in streaming scenarios, such as with this example streaming allocator made by @Kangz:

// A "mapping" is a WebGPUBuffer and its ArrayBuffer mapping.
// {"buffer": WebGPUBuffer, "pointer": ArrayBuffer}

let availableMappings = [];
let mappingsUsedThisFrame = [];

let currentMapping = null;
let currentOffset = 0;

function getFreshMapping() {
    if (availableMappings.length == 0) {
        let buffer = device.createBuffer({"size": 2*1024*1024, "usage": ...});
        let pointer = buffer.mapWriteSync(0, buffer.size);
        availableMappings.push({"buffer": buffer, "pointer":pointer});
    }

    currentMapping = availableMappings.pop();
    mappingsUsedThisFrame.push(currentMapping);

    return currentMapping;
}

// Returns a [mapping, offset] pair.
function requestUploadSpace(size) {
    if (currentMapping == null || currentOffset + size > currentMapping.buffer.size) {
        getFreshMapping();
        currentOffset = 0; // a fresh mapping starts empty
    }

    let offset = currentOffset;
    currentOffset += size;
    return [currentMapping, offset];
}

function unmapUploadBuffers() {
    mappingsUsedThisFrame.forEach(mapping => mapping.buffer.unmap());
}

function remapAsyncUploadBuffers() {
    mappingsUsedThisFrame.forEach(mapping => {
        mapping.buffer.mapWriteAsync(0, mapping.buffer.size).then(pointer => {
            availableMappings.push({
                "buffer": mapping.buffer,
                "pointer": pointer,
            });
        });
    });
}

function endOfFrame() {
    mappingsUsedThisFrame = [];
    currentMapping = null;
    currentOffset = 0;
}


function frame() {
    let commands = device.createCommandBufferBuilder();

    frameObjects.forEach(object => {
        // uploadUniformData uses requestUploadSpace to know in which ArrayBuffer to put data.
        // It also knows which WebGPUBuffer this corresponds to and returns it so it can be used
        // to build the command buffer.
        const [buffer, offset] = uploadUniformData(object);
        renderObjectUsingUniform(object, commands, buffer, offset);
    });

    unmapUploadBuffers();

    queue.submit(commands);

    remapAsyncUploadBuffers();
    endOfFrame();
}

In such a scenario the WebGPU app would have to make a WebGPUBindGroup for each possible combination of actual buffer objects being used.
If however we could use dynamic buffer offsets #116 and track per-WebGPUBufferSubRange instead, then there would be no need for creating and using different bind-groups.
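
To make the difference concrete, here is a small counting sketch. It is only an illustration under assumptions: `countBindGroups` and the buffer ids are made up, and a bind group is assumed to be keyed purely by the identity of the buffers it references.

```javascript
// Counts how many distinct bind groups a frame needs when a bind group is keyed
// by the identity of the buffers it references. `draws` is an array of arrays of
// buffer ids (one id per binding slot); the ids are invented for this example.
function countBindGroups(draws) {
    return new Set(draws.map(ids => ids.join("|"))).size;
}

// Whole-buffer tracking: uploads spill across several separate WebGPUBuffers.
const perBuffer = [["A", "A"], ["A", "B"], ["B", "B"], ["B", "C"]];
// Sub-range tracking: every upload lives in a sub-range of one parent buffer "P",
// and the per-draw position is supplied as a dynamic offset instead.
const perParent = [["P", "P"], ["P", "P"], ["P", "P"], ["P", "P"]];

console.log(countBindGroups(perBuffer), countBindGroups(perParent)); // 4 1
```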

Mapping Proposal

We introduce a new object, its creation parameters and mapping flags:

interface WebGPUMappingType {
    const u32 MAP_READ = 1;
    const u32 MAP_WRITE = 2;
};

dictionary WebGPUBufferSubRangeDescriptor {
    u32 offset;
    u32 size;
    WebGPUBufferUsageFlags usage;
};

interface WebGPUBufferSubRange {    
    Promise then(WebGPUUsageFinishedCallback success);

    WebGPUMappedMemory mapAsync(WebGPUMappingType accessFlags, u32 offset, u32 size);
    void unmap();

    void destroy();
};

interface WebGPUBuffer {
    WebGPUBufferSubRange createSubRange(WebGPUBufferSubRangeDescriptor descriptor);
};

The actual mapping is a slight spin on Proposal 1 from #138 , with a few changes:

  1. offset is an offset into the WebGPUBufferSubRange, not the parent WebGPUBuffer
  2. WebGPUBuffer cannot be mapped, only a subrange represented by WebGPUBufferSubRange can.
  3. The promise is generalized and is used to queue up a callback which will run after the BufferSubRange's last use by any of the pending and already submitted command buffers. Any command buffer that uses a BufferSubRange with a promise pending at the time of WebGPU submission will delay its actual native API submission until these promises reject or fulfill and the callbacks return.
  4. Calling mapAsync will throw an error if the BufferSubRange is "in use" by the GPU, which we define as being used in a currently executing command on any queue or by any command which has not yet executed but in a command buffer that has already been submitted to any queue.
  5. A WebGPUBufferSubRange must not be destroyed while mapped or "in use" by the GPU.
  6. A WebGPUBuffer cannot be destroyed while any of its "children" WebGPUBufferSubRanges cannot be destroyed (that is, while any of them is still mapped or in use by the GPU, per point 5).
  7. The memory can be mapped for both read and write (not necessary, but useful)
  8. For non-read mappings the memory is either zero-ed or memory recycled from previous WebGPUBufferSubRange mappings made by the same application (not necessary, but doubles streaming performance)

I'm obviously open to modifications to points 3 onwards.
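
The lifetime rules in points 4 to 6 can be sketched with a pair of mock classes. This is a toy model, not an implementation: `MockBuffer`, `MockSubRange` and the `pendingGpuUses` counter are hypothetical, `mapAsync` is made synchronous for brevity, and point 6 is read as "the parent can only be destroyed once every child could itself be destroyed".

```javascript
// Toy model of the proposed lifetime rules. Only the method names mapAsync,
// unmap and destroy come from the proposal; everything else is assumed.
class MockBuffer {
    constructor() { this.children = new Set(); this.destroyed = false; }
    destroy() {
        // Point 6 (as read here): blocked while any child is mapped or in use.
        for (const child of this.children) {
            if (!child.canBeDestroyed()) throw new Error("a child sub-range cannot be destroyed yet");
        }
        this.destroyed = true;
    }
}

class MockSubRange {
    constructor(parent) {
        this.parent = parent;
        this.mapped = false;
        this.pendingGpuUses = 0; // submitted command buffers still referencing this range
        parent.children.add(this);
    }
    canBeDestroyed() { return !this.mapped && this.pendingGpuUses === 0; }
    mapAsync() {
        // Point 4: mapping a range that is "in use" by the GPU is an error.
        if (this.pendingGpuUses > 0) throw new Error("sub-range is in use by the GPU");
        this.mapped = true;
    }
    unmap() { this.mapped = false; }
    destroy() {
        // Point 5: no destruction while mapped or in use.
        if (!this.canBeDestroyed()) throw new Error("sub-range is mapped or in use");
        this.parent.children.delete(this);
    }
}
```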

Key advantages of this proposal:

  1. Persistently Mapped Buffers can be used by all implementations internally
  2. Zero-copy is possible on single-process implementations and any implementations which can use Vulkan external memory or D3D12 shared heaps
  3. It's easy to use WebGPUBufferSubRanges in place of memory blocks in existing mappable GPU buffer allocators, also using WebGPUBuffer.createSubRange and WebGPUBufferSubRange.destroy as a fast and easy way to coalesce free memory blocks
  4. Two WebGPUBufferSubRanges created from the same WebGPUBuffer are actually sub-ranges of the same native API buffer object, hence internally they are the same object which optimizes descriptor set changes and creation.
  5. Like in native Persistently Mapped Buffer applications, we can have one range of the buffer mapped while having the GPU use another
  6. WebGPUBufferSubRange creation and destruction are really light-weight operations that require no API calls except for event polling
  7. Granularity of range-splitting is reduced by alignment requirement
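
As an illustration of advantage 3, coalescing two adjacent free blocks could look roughly like this. It is a sketch against a mock: `MockParent` only tracks which ranges are live, `coalesce` is a hypothetical allocator helper, and alignment and usage checks are left out.

```javascript
// Mock parent buffer that only records which [offset, offset + size) ranges are live.
class MockParent {
    constructor(size) { this.size = size; this.live = []; }
    createSubRange({ offset, size }) {
        const end = offset + size;
        const overlaps = this.live.some(r => offset < r.offset + r.size && r.offset < end);
        if (end > this.size || overlaps) throw new Error("invalid sub-range");
        const range = {
            offset,
            size,
            destroy: () => { this.live = this.live.filter(r => r !== range); },
        };
        this.live.push(range);
        return range;
    }
}

// Merge two adjacent free blocks: destroy both, then create one covering range.
function coalesce(parent, a, b) {
    if (a.offset + a.size !== b.offset) throw new Error("blocks are not adjacent");
    a.destroy();
    b.destroy();
    return parent.createSubRange({ offset: a.offset, size: a.size + b.size });
}
```

Because no native buffer is created or destroyed here, the merge costs only bookkeeping, which is the point the list above makes about sub-range creation being light-weight.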

WebGPUBufferSubRange Creation

The expectation is that WebGPUBufferSubRange creation and destruction will be really high performance compared to WebGPUBuffer creation and destruction.

WebGPUBuffer.createSubRange will return an error unless all of the following are true:

  • descriptor.offset and descriptor.size are both multiples of 256 (can be higher, even 4096) [1]
  • The [descriptor.offset,descriptor.offset+descriptor.size) range is contained within the creating buffer
  • The [descriptor.offset,descriptor.offset+descriptor.size) range does not include any part of a range assigned to a WebGPUBufferSubRange previously created from the same WebGPUBuffer which has not been destroyed yet
  • Each bit in the descriptor.usage field was also present in the WebGPUBufferDescriptor.usage field that the creating WebGPUBuffer was created with
  • If descriptor.offset is not equal to 0 or descriptor.size does not equal the creating WebGPUBuffer's size, then the descriptor.usage field does not contain the WebGPUBufferUsage flags INDEX or VERTEX [2]
  • The creating WebGPUBuffer has not been destroyed

[1] This places the same requirement on WebGPUBuffer size alignment.
[2] This only applies if the robustness guarantees provided by the native APIs are relied on.
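
The creation rules above can be sketched as a plain validation function. This is a minimal mock-up under assumptions: `validateSubRange`, the shape of the `buffer` record, the `Usage` bit values and the 256-byte `ALIGNMENT` are all chosen for illustration and are not part of the proposal's IDL.

```javascript
// Hypothetical usage bits; the real WebGPUBufferUsage values are not given here.
const Usage = { INDEX: 1, VERTEX: 2, UNIFORM: 4, STORAGE: 8 };
const ALIGNMENT = 256; // assumed; the proposal allows a larger value

// Returns null if the descriptor is acceptable, otherwise a reason string.
// `buffer` is a mock record: { size, usage, destroyed, liveSubRanges: [{ offset, size }] }.
function validateSubRange(buffer, desc) {
    if (buffer.destroyed) return "parent buffer destroyed";
    if (desc.offset % ALIGNMENT !== 0 || desc.size % ALIGNMENT !== 0) return "misaligned";

    const end = desc.offset + desc.size;
    if (end > buffer.size) return "out of bounds";

    const overlaps = buffer.liveSubRanges.some(
        r => desc.offset < r.offset + r.size && r.offset < end);
    if (overlaps) return "overlaps a live sub-range";

    if ((desc.usage & ~buffer.usage) !== 0) return "usage not a subset of parent usage";

    // INDEX and VERTEX are only allowed when the sub-range covers the whole buffer.
    const coversWholeBuffer = desc.offset === 0 && desc.size === buffer.size;
    if (!coversWholeBuffer && (desc.usage & (Usage.INDEX | Usage.VERTEX)) !== 0) {
        return "INDEX/VERTEX requires a whole-buffer sub-range";
    }
    return null;
}
```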

Word about Robustness

Because the WebGPUBufferSubRange does not map 1:1 with the native APIs' buffer objects (only WebGPUBuffer does), we cannot rely on out-of-bounds accesses of index buffers or vertex buffers to be handled gracefully (because they are not bound as ranges, unlike UBOs and SSBOs).

This is why, if the WebGPUBufferSubRange does not cover the entire parent WebGPUBuffer, it cannot be used as vertex (fetched by bad indices) or index (fetched by bad draw indirect) input data.

This obviously does not apply if #117 were to be implemented, or if the robustness guarantees of the native APIs were deemed insufficient and WebGPU implementations handled out-of-bounds and invalid accesses entirely by themselves.

Changes to the rest of WebGPU

Naturally WebGPUBufferSubRange would take the place of WebGPUBuffer in most APIs except for bind-groups.

Changes to Bind Groups

These would be the new Bind Group and Binding definitions:

dictionary WebGPUStaticBufferBinding {
    WebGPUBufferSubRange bufferRange;
    u32 offset;
    u32 size;
};

dictionary WebGPUDynamicBufferBinding {
    WebGPUBuffer buffer;
    u32 size;
};

typedef (WebGPUSampler or WebGPUTextureView or WebGPUStaticBufferBinding or WebGPUDynamicBufferBinding) WebGPUBindingResource;

Assuming #149 is merged, and modifying it:

dictionary WebGPUBufferSubRangeAndOffsetPair {
    WebGPUBufferSubRange subBuffer;
    u32 offset; // offset is an offset into subBuffer specified above
};

partial interface WebGPUProgrammablePassEncoder {
    void setBindGroup(u32 index, WebGPUBindGroup bindGroup, optional sequence<WebGPUBufferSubRangeAndOffsetPair> dynamicOffsets);
};

Here we add constraints that:

  1. The WebGPUBuffer given by WebGPUDynamicBufferBinding.buffer of the WebGPUDynamicBufferBinding associated with a sequence element WebGPUBufferSubRangeAndOffsetPair, must be the parent creating buffer of the WebGPUBufferSubRangeAndOffsetPair.subBuffer
  2. The range given by [WebGPUBufferSubRangeAndOffsetPair.offset,WebGPUBufferSubRangeAndOffsetPair.offset+WebGPUDynamicBufferBinding.size) must not extend past the end of WebGPUBufferSubRangeAndOffsetPair.subBuffer
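
Both constraints reduce to a short check at setBindGroup time. The following is a sketch only: `validateDynamicOffset` is a hypothetical helper, and the mock sub-range is assumed to carry `parent` and `size` fields that the proposed interface does not expose.

```javascript
// Mock check of the two setBindGroup constraints.
// binding: { buffer, size }       stands in for a WebGPUDynamicBufferBinding
// pair:    { subBuffer, offset }  stands in for a WebGPUBufferSubRangeAndOffsetPair
// subBuffer is a mock record: { parent, size }
function validateDynamicOffset(binding, pair) {
    // Constraint 1: the sub-range must have been created from the bound buffer.
    if (pair.subBuffer.parent !== binding.buffer) return false;
    // Constraint 2: [offset, offset + binding.size) must fit inside the sub-range.
    return pair.offset + binding.size <= pair.subBuffer.size;
}
```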

And finally, changing the rest:

partial interface WebGPURenderPassEncoder {
    void setIndexBuffer(WebGPUBufferSubRange buffer, u32 offset);
    void setVertexBuffers(u32 startSlot, sequence<WebGPUBufferSubRange> buffers, sequence<u32> offsets);
};

dictionary WebGPUBufferCopyView {
    WebGPUBufferSubRange subBuffer;
    u32 offset;
    u32 rowPitch;
    u32 imageHeight;
};

partial interface WebGPUCommandBuffer {
    void copyBufferToBuffer(
        WebGPUBufferSubRange src,
        u32 srcOffset,
        WebGPUBufferSubRange dst,
        u32 dstOffset,
        u32 size);
};
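
Assuming srcOffset and dstOffset are relative to their sub-ranges (as the mapping offsets are), an implementation would have to translate them into offsets within the parent native buffer before issuing the copy. A sketch of that translation, where `toParentOffset` and the `{ offset, size }` records are hypothetical:

```javascript
// Hypothetical helper: resolves an offset relative to a sub-range into an offset
// relative to the parent (native) buffer, rejecting copies that leave the sub-range.
// `subRange` is a mock record { offset, size } describing where it sits in its parent.
function toParentOffset(subRange, relativeOffset, copySize) {
    if (relativeOffset + copySize > subRange.size) {
        throw new RangeError("copy extends past the end of the sub-range");
    }
    return subRange.offset + relativeOffset;
}

const src = { offset: 512, size: 256 };
const dst = { offset: 1024, size: 512 };
const size = 128;
const nativeSrcOffset = toParentOffset(src, 64, size);  // 576
const nativeDstOffset = toParentOffset(dst, 256, size); // 1280
```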
