Description
Motivation and Overview
TL;DR Basically TextureViews, but for Buffers.
As you may have noticed from #138 I'm of the opinion that whole-buffer tracking will be a great hindrance to porting and general development, and will require WebGPU specific work-arounds to make normal resource streaming possible.
However, the WebGPU group's focus on portability takes the usual performant implementations off the table.
In general, Mapping Proposal 1 from #138 is quite good, as it allows for:
- Zero-copy write or readback (if heap sharing or external memory is available, or WebGPU is a single-process implementation)
- Using the mappable buffer for any purpose
However it fails w.r.t. the following:
- Having to create multiple copies of the buffer due to whole-buffer single mutable usage tracking
Extending that proposal by the introduction of the new BufferSubRange (name T.B.C.) object will allow finer-grain usage tracking of buffers by declaring the sub-ranges to be tracked up-front, while an alignment requirement ensures that the total number of sub-ranges and the difficulty of tracking them is reduced.
This allows us to partition a WebGPUBuffer (which we hope is represented by a single native-API buffer) into multiple WebGPUBufferSubRanges on which the actual "single mutable state" rule gets enforced.
This will prevent "Bind Group Explosion" and other side-effects in streaming scenarios, such as with this example streaming allocator made by @Kangz:
// A "mapping" is a WebGPUBuffer and its ArrayBuffer mapping.
// {"buffer": WebGPUBuffer, "pointer": ArrayBuffer}
let availableMappings = [];
let mappingsUsedThisFrame = [];
let currentMapping = null;
let currentOffset = 0;

function getFreshMapping() {
    if (availableMappings.length == 0) {
        let buffer = device.createBuffer({"size": 2*1024*1024, "usage": ...});
        let pointer = buffer.mapWriteSync(0, buffer.size);
        availableMappings.push({"buffer": buffer, "pointer": pointer});
    }
    currentMapping = availableMappings.pop();
    mappingsUsedThisFrame.push(currentMapping);
    // Start writing at the beginning of the fresh mapping.
    currentOffset = 0;
    return currentMapping;
}

// Returns a [mapping, offset] pair.
function requestUploadSpace(size) {
    if (currentMapping == null || currentOffset + size > currentMapping.buffer.size) {
        getFreshMapping();
    }
    let offset = currentOffset;
    currentOffset += size;
    return [currentMapping, offset];
}

function unmapUploadBuffers() {
    mappingsUsedThisFrame.forEach(mapping => mapping.buffer.unmap());
}

function remapAsyncUploadBuffers() {
    mappingsUsedThisFrame.forEach(mapping => {
        mapping.buffer.mapWriteAsync(0, mapping.buffer.size).then(pointer => {
            availableMappings.push({
                "buffer": mapping.buffer,
                "pointer": pointer,
            });
        });
    });
}

function endOfFrame() {
    mappingsUsedThisFrame = [];
    currentMapping = null;
    currentOffset = 0;
}

function frame() {
    let commands = device.createCommandBufferBuilder();
    frameObjects.forEach(object => {
        // uploadUniformData uses requestUploadSpace to know in which ArrayBuffer to put data.
        // It also knows which WebGPUBuffer this corresponds to and returns it so it can be used
        // to build the command buffer.
        const [buffer, offset] = uploadUniformData(object);
        renderObjectUsingUniform(object, commands, buffer, offset);
    });
    unmapUploadBuffers();
    queue.submit(commands);
    remapAsyncUploadBuffers();
    endOfFrame();
}
In such a scenario the WebGPU app would have to make a WebGPUBindGroup for each possible combination of actual buffer objects being used.
If however we could use dynamic buffer offsets (#116) and track per-WebGPUBufferSubRange instead, then there would be no need for creating and using different bind-groups.
Mapping Proposal
We introduce a new object, its creation parameters and mapping flags:
interface WebGPUMappingType {
    const u32 MAP_READ = 1;
    const u32 MAP_WRITE = 2;
};

dictionary WebGPUBufferSubRangeDescriptor {
    u32 offset;
    u32 size;
    WebGPUBufferUsageFlags usage;
};

interface WebGPUBufferSubRange {
    Promise then(WebGPUUsageFinishedCallback success);
    WebGPUMappedMemory mapAsync(WebGPUMappingType accessFlags, u32 offset, u32 size);
    void unmap();
    void destroy();
};

interface WebGPUBuffer {
    WebGPUBufferSubRange createSubRange(WebGPUBufferSubRangeDescriptor descriptor);
};
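As a rough illustration of declaring sub-ranges up-front, the sketch below computes aligned descriptor objects shaped like WebGPUBufferSubRangeDescriptor for one large buffer. The helper names (alignUp, partitionIntoSubRanges), the 256-byte alignment constant and the plain-object descriptors are assumptions made for this example; nothing here calls a real WebGPU API.

```javascript
// Assumed alignment; the proposal only says "multiples of 256 (can be higher)".
const SUBRANGE_ALIGNMENT = 256;

// Round value up to the next multiple of alignment.
function alignUp(value, alignment) {
  return Math.ceil(value / alignment) * alignment;
}

// Hypothetical helper: split a buffer of bufferSize bytes into equally sized,
// aligned, non-overlapping sub-range descriptors. Any tail that is too small
// for a full range is left unused.
function partitionIntoSubRanges(bufferSize, requestedRangeSize, usage) {
  const rangeSize = alignUp(requestedRangeSize, SUBRANGE_ALIGNMENT);
  const descriptors = [];
  for (let offset = 0; offset + rangeSize <= bufferSize; offset += rangeSize) {
    descriptors.push({ offset, size: rangeSize, usage });
  }
  return descriptors;
}

// Example: a 2 MiB buffer split into 64 KiB streaming chunks.
// The usage value 0 is a placeholder, not a real flag combination.
const streamingRanges = partitionIntoSubRanges(2 * 1024 * 1024, 64 * 1024, 0);
```

Each descriptor would then be passed to createSubRange once, so the streaming allocator hands out sub-ranges of a single buffer instead of separate buffers.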
The actual mapping is a slight spin on Proposal 1 from #138, with a few changes:
- offset is an offset into the WebGPUBufferSubRange, not the parent WebGPUBuffer
- WebGPUBuffer cannot be mapped, only a subrange represented by WebGPUBufferSubRange can.
- The promise is generalized and is used to queue up a callback which will return after the BufferSubRange's last use by any of the pending and already submitted command buffers. Any command buffer that uses a BufferSubRange that has any promise pending at the time of WebGPU submission will delay the actual native API submission until these promises reject or fulfill and the callbacks return.
- Calling mapAsync will throw an error if the BufferSubRange is "in use" by the GPU, which we define as being used in a currently executing command on any queue or by any command which has not yet executed but is in a command buffer that has already been submitted to any queue.
- SubRange must not be destroyed while mapped or "used" by the GPU
- WebGPUBuffer cannot be destroyed while any of its "children" WebGPUBufferSubRanges cannot be destroyed.
- The memory can be mapped for both read and write (not necessary, but useful)
- For non-read mappings the memory is either zero-ed or memory recycled from previous WebGPUBufferSubRange mappings made by the same application (not necessary, but doubles streaming performance)
I'm obviously open to modifications to points 3 onwards.
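The lifetime rules in the list above can be modelled with a small toy class. This is only a sketch: MockSubRange, markSubmitted and markCompleted are hypothetical names, the "in use" state is tracked with a simple counter, and mapAsync returns a zero-filled ArrayBuffer directly rather than a WebGPUMappedMemory.

```javascript
// Toy model of the mapping and destruction rules; not a real WebGPU object.
class MockSubRange {
  constructor(size) {
    this.size = size;
    this.mapped = false;
    this.pendingGpuUses = 0; // submitted command buffers still using this range
    this.destroyed = false;
  }

  // Assumed hooks that a queue implementation would call on submit/completion.
  markSubmitted() { this.pendingGpuUses += 1; }
  markCompleted() { this.pendingGpuUses -= 1; }

  get inUse() { return this.pendingGpuUses > 0; }

  // offset is relative to the sub-range, not to the parent buffer.
  mapAsync(offset, size) {
    if (this.destroyed) throw new Error("sub-range was destroyed");
    if (this.inUse) throw new Error("sub-range is in use by the GPU");
    if (offset + size > this.size) throw new Error("mapping exceeds sub-range");
    this.mapped = true;
    return new ArrayBuffer(size); // zero-filled, like a fresh write mapping
  }

  unmap() { this.mapped = false; }

  destroy() {
    if (this.mapped || this.inUse) {
      throw new Error("cannot destroy while mapped or in use by the GPU");
    }
    this.destroyed = true;
  }
}
```

In this model a range that has been submitted must complete before it can be mapped again, and it must be unmapped before it can be destroyed.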
Key advantages of this proposal:
- Persistently Mapped Buffers can be used by all implementations internally
- Zero-copy is possible on single-process implementations and any implementations which can use Vulkan external memory or D3D12 shared heaps
- It's easy to use WebGPUBufferSubRanges in place of memory blocks in existing mappable GPU buffer allocators, also using WebGPUBuffer.createSubRange and WebGPUBufferSubRange.destroy as a fast and easy way to coalesce free memory blocks
- Two WebGPUBufferSubRanges created from the same WebGPUBuffer are actually sub-ranges of the same native API buffer object, hence internally they are the same object, which optimizes descriptor set changes and creation.
- Like in native Persistently Mapped Buffer applications, we can have one range of the buffer mapped while having the GPU use another
- WebGPUBufferSubRange creation and destruction are really light-weight operations that require no API calls except for event polling
- Granularity of range-splitting is reduced by the alignment requirement
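The point about coalescing free memory blocks could look roughly like the sketch below, which merges adjacent free blocks in an allocator's free list. The function name coalesceFreeBlocks and the { offset, size } block shape are assumptions; in a real allocator each merged block would presumably correspond to destroying the old sub-ranges and calling createSubRange once for the combined range.

```javascript
// Hypothetical free-list helper: merge blocks that touch end-to-start.
// Input blocks are { offset, size } objects and are not modified.
function coalesceFreeBlocks(blocks) {
  const sorted = [...blocks].sort((a, b) => a.offset - b.offset);
  const merged = [];
  for (const block of sorted) {
    const last = merged[merged.length - 1];
    if (last && last.offset + last.size === block.offset) {
      // Adjacent to the previous block: grow it instead of adding a new one.
      last.size += block.size;
    } else {
      merged.push({ offset: block.offset, size: block.size });
    }
  }
  return merged;
}
```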
WebGPUBufferSubRange Creation
The expectation is that WebGPUBufferSubRange creation and destruction will be really high performance compared to WebGPUBuffer creation and destruction.
WebGPUBuffer.createSubRange will return an error unless all of the following hold:
- descriptor.offset and descriptor.size are both multiples of 256 (can be higher, even 4096) [1]
- The [descriptor.offset, descriptor.offset+descriptor.size) range is contained within the creating buffer
- The [descriptor.offset, descriptor.offset+descriptor.size) range does not include any part of a range assigned to a WebGPUBufferSubRange previously created from the same WebGPUBuffer which has not been destroyed yet
- Each bit in the descriptor.usage field was also present in the WebGPUBufferDescriptor.usage field that the creating WebGPUBuffer was created with
- If descriptor.offset is not equal to 0 or descriptor.size does not equal the creating WebGPUBuffer's size, then the descriptor.usage field does not contain the WebGPUBufferUsage flags INDEX or VERTEX [2]
- The creating WebGPUBuffer has not been destroyed
[1] This places the same requirement on WebGPUBuffer size alignment.
[2] This only applies if the robustness guarantees provided by the native API are relied on.
Word about Robustness
Because the WebGPUBufferSubRange does not map 1:1 with the native APIs' buffer objects (only WebGPUBuffer does), we cannot rely on out-of-bounds accesses of index buffers or vertex buffers to be handled gracefully (because they are not bound as ranges, unlike UBOs and SSBOs).
This is why, if the WebGPUBufferSubRange does not contain the entire parent WebGPUBuffer, it cannot be used as vertex (fetched by bad indices) or index (fetched by bad draw indirect) input data.
This obviously does not apply if #117 were to be implemented, or if the robustness guarantees of the native APIs were deemed insufficient and WebGPU implementations handled out-of-bounds and invalid accesses entirely by themselves.
Changes to the rest of WebGPU
Naturally WebGPUBufferSubRange would take the place of WebGPUBuffer in most APIs except for bind-groups.
Changes to Bind Groups
These would be the new Bind Group and Binding definitions:
dictionary WebGPUStaticBufferBinding {
    WebGPUBufferSubRange bufferRange;
    u32 offset;
    u32 size;
};

dictionary WebGPUDynamicBufferBinding {
    WebGPUBuffer buffer;
    u32 size;
};

typedef (WebGPUSampler or WebGPUTextureView or WebGPUStaticBufferBinding or WebGPUDynamicBufferBinding) WebGPUBindingResource;
Assuming #149 is merged, and modifying it:
dictionary WebGPUBufferSubRangeAndOffsetPair {
    WebGPUBufferSubRange subBuffer;
    u32 offset; // offset is an offset into subBuffer specified above
};

partial interface WebGPUProgrammablePassEncoder {
    void setBindGroup(u32 index, WebGPUBindGroup bindGroup, optional sequence<WebGPUBufferSubRangeAndOffsetPair> dynamicOffsets);
};
Here we add constraints that:
- The WebGPUBuffer given by WebGPUDynamicBufferBinding.buffer of the WebGPUDynamicBufferBinding associated with a sequence element WebGPUBufferSubRangeAndOffsetPair must be the parent creating buffer of the WebGPUBufferSubRangeAndOffsetPair.subBuffer
- The range given by [WebGPUBufferSubRangeAndOffsetPair.offset, WebGPUBufferSubRangeAndOffsetPair.offset+WebGPUDynamicBufferBinding.size) must not overflow WebGPUBufferSubRangeAndOffsetPair.subBuffer
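The two constraints can be expressed as a small check performed at setBindGroup time. The sketch below uses plain objects; in particular the parent field on the sub-range is an assumption made for the example (the interface above exposes no such attribute), and validateDynamicOffset is a hypothetical name.

```javascript
// binding: { buffer, size }                       models WebGPUDynamicBufferBinding
// pair:    { subBuffer: { parent, size }, offset } models WebGPUBufferSubRangeAndOffsetPair
// Returns null when both constraints hold, otherwise a description of the violation.
function validateDynamicOffset(binding, pair) {
  // Constraint 1: the sub-range must have been created from the bound buffer.
  if (pair.subBuffer.parent !== binding.buffer) {
    return "subBuffer was not created from the bound WebGPUBuffer";
  }
  // Constraint 2: [offset, offset + binding.size) must fit inside the sub-range.
  if (pair.offset + binding.size > pair.subBuffer.size) {
    return "dynamic binding range overflows the sub-range";
  }
  return null;
}
```

Because the bind group only names the parent buffer, the same bind group can be reused with any sub-range of that buffer, while usage tracking still happens per sub-range.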
And finally, changing the rest:
partial interface WebGPURenderPassEncoder {
    void setIndexBuffer(WebGPUBufferSubRange buffer, u32 offset);
    void setVertexBuffers(u32 startSlot, sequence<WebGPUBufferSubRange> buffers, sequence<u32> offsets);
};

dictionary WebGPUBufferCopyView {
    WebGPUBufferSubRange subBuffer;
    u32 offset;
    u32 rowPitch;
    u32 imageHeight;
};

partial interface WebGPUCommandBuffer {
    void copyBufferToBuffer(
        WebGPUBufferSubRange src,
        u32 srcOffset,
        WebGPUBufferSubRange dst,
        u32 dstOffset,
        u32 size);
};