-
Notifications
You must be signed in to change notification settings - Fork 329
Push constant proposal #4612
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Push constant proposal #4612
Conversation
Previews, as seen when this build job started (6d92094): |
proposals/push-constants.md
Outdated
uint32_t pushConstantsSize = 0; | ||
}; | ||
``` | ||
two pipeline layouts are defined to be “compatible for push constants” if they were created with identical push constant size. It means push constants values can share between pipeline layouts that are compatible for push constants. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume it's a compile error for a module to use push constants outside of its defined size range?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thoughts is that we follow out-of-bounds access here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can validate that the shader module uses no more than the push constants allowed by the module so we should be fine. Also dynamic indexing of push constants is not possible so in the shader we know that all accesses are in bounds.
proposals/push-constants.md
Outdated
| --- | --- | --- | --- | --- | --- | --- | | ||
| `var<push_constant>` | Immutable | [Module](https://www.w3.org/TR/WGSL/#module-scope) | [Concrete](https://www.w3.org/TR/WGSL/#type-concrete) [constructible](https://www.w3.org/TR/WGSL/#constructible) [host-shareable](https://www.w3.org/TR/WGSL/#host-shareable) excludes [array types](https://www.w3.org/TR/WGSL/#array-types) and [structure types](https://www.w3.org/TR/WGSL/#struct-types) contains array members | Disallowed | | Yes. [uniform buffer](https://www.w3.org/TR/WGSL/#uniform-buffer) | | ||
|
||
NOTE: Each [entry point](https://www.w3.org/TR/WGSL/#entry-point) can statically use at most one push constant variable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why, exactly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make implementation a simpler. So we don't need to introduce @offset
in wgsl to describe multiple push constant variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to enforce a single push constant interface we should consider an alternate design of making it a parameter to the entry point (identified via an attribute). If you did consider it, what did you see as the draw backs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally would prefer push constants to be like bindings, and not tied to a specific entry point. I tend to keep my bindings/push constants in a separate file I can reuse across passes like so.
HLSL seems to have requirements around having push constants be structs anyways gfx-rs/wgpu#5571
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was my strawman in #75 (comment) considered at all? If we assume 4-byte elements and alignment, I think this could become a lot easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@magcius in that proposal, where are @parameter
variable declared? At the global scope of as function arguments? It seems very similar to specifying an offset, except that it stays in "words" instead of bytes. It'd be nice to support smaller data types in push constants eventually so IMHO it's still be better to act on bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to enforce a single push constant interface we should consider an alternate design of making it a parameter to the entry point (identified via an attribute). If you did consider it, what did you see as the draw backs?
I agree it is an optional design. Personally, I'm slightly prefer current address space mode proposal because:
- It defines a global scope variable (like uniform), so that the helper functions could use it directly. ( instead of accepting attributes as parameter, if we pass push constant variable as entry point function parameter)
- From API aspect,
pushConstantBytes
is related withPipelineLayout
rather thanShaderModule
. Take it as a special binding resource is natural.
proposals/push-constants.md
Outdated
|
||
```javascript | ||
interface GPUCommandEncoder { | ||
void setPushConstantU32(uint32_t offset, uint32_t value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What units is offset
in? If bytes, are there any restrictions? If I do setPushConstantU32(1, 0x12345678);
, will it overwrite bytes 1, 2, 3, 4 and leave byte 0 alone? Should there be alignment restrictions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Same as bufferOffset )
It is in bytes, and it should be multiple of 4 bytes. And the alignment should follow this(https://gpuweb.github.io/gpuweb/wgsl/#alignment-and-size). So it means a the pushconstantSize = variable sizes + padding sizes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about f16? Should it just be set via a packed shared buffer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
F16 should be set via packed shared buffer, or setPushConstantU32
so that we don't change the bits.
proposals/push-constants.md
Outdated
|
||
| Address space | Sharing among invocations | Default access mode | Notes | | ||
| --- | --- | --- | --- | | ||
| `push_constant` | Invocations in [shader stage](https://www.w3.org/TR/WGSL/#shader-stages) | [read](https://www.w3.org/TR/WGSL/#access-read) | For [uniform buffer](https://www.w3.org/TR/WGSL/#uniform-buffer) variables exclude [array types](https://www.w3.org/TR/WGSL/#array-types) variable or [structure types](https://www.w3.org/TR/WGSL/#struct-types) variable contains [array types](https://www.w3.org/TR/WGSL/#array-types) attributes | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the uniform buffer reference a copy paste error?
Separately, are arrays excluded to avoid Vulkan's default uniform access requirements? It will seem arbitrary to users unfamiliar with Vulkan. Is this a better resolution than requiring non-uniform indexing (probably)? Is it better than having a uniformity requirement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the uniform buffer reference a copy paste error?
No, I think uniform buffer
variable defined in wgsl aligns with what I want with push constant
. I think push constant variables
should be host-shareable constructible type, and must satisfy the address space layout constraints.
Separately, are arrays excluded to avoid Vulkan's default uniform access requirements?
Not only Vulkan, but also DX12. DX12 says Arrays are not permitted in constant buffers that get mapped onto root constants since dynamic indexing in the root signature space is not supported (https://learn.microsoft.com/en-us/windows/win32/direct3d12/using-constants-directly-in-the-root-signature)
proposals/push-constants.md
Outdated
| --- | --- | --- | --- | --- | --- | --- | | ||
| `var<push_constant>` | Immutable | [Module](https://www.w3.org/TR/WGSL/#module-scope) | [Concrete](https://www.w3.org/TR/WGSL/#type-concrete) [constructible](https://www.w3.org/TR/WGSL/#constructible) [host-shareable](https://www.w3.org/TR/WGSL/#host-shareable) excludes [array types](https://www.w3.org/TR/WGSL/#array-types) and [structure types](https://www.w3.org/TR/WGSL/#struct-types) contains array members | Disallowed | | Yes. [uniform buffer](https://www.w3.org/TR/WGSL/#uniform-buffer) | | ||
|
||
NOTE: Each [entry point](https://www.w3.org/TR/WGSL/#entry-point) can statically use at most one push constant variable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to enforce a single push constant interface we should consider an alternate design of making it a parameter to the entry point (identified via an attribute). If you did consider it, what did you see as the draw backs?
proposals/push-constants.md
Outdated
|
||
```javascript | ||
interface GPUCommandEncoder { | ||
void setPushConstantU32(uint32_t offset, uint32_t value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about f16? Should it just be set via a packed shared buffer?
proposals/push-constants.md
Outdated
Four new functions in `GPUCommandEncoder`. | ||
|
||
```javascript | ||
interface GPUCommandEncoder { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like these commands should either take a pipeline or be on render/compute encoders.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On render/compute encoders. And that's why I tend to add these APIs in GPUCommandEncoder
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it stands wouldn't all pipelines need the same push constants. That's more what I was getting at of needing to associate the command with a particular pipeline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GPUCommandEncoder is the top-level encoder. If you want to add these methods to render/compute encoders you need to add them to GPUBindingCommandsMixin (assuming they be used in render bundles).
We could associate push constant values with a particular pipeline by passing them as part of setPipeline()
(kind of parallel to overrides in createPipeline()
), but since IIUC setPushConstant()
is often going to be called multiple times per setPipeline()
, I think it makes more sense to make it parallel setBindGroup()
: there are separate states for the current {pipeline, bind groups, push constants} and we capture all the current states each time we hit a draw/dispatch call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it stands wouldn't all pipelines need the same push constants. That's more what I was getting at of needing to associate the command with a particular pipeline.
Yes. My expected use case is:
passEncoder.setPipeline().
passEncoder.setBindGroup();
passEncoder.setPushConstant();
...
// setBindGroup();
// setPushConstant();
If you want to add these methods to render/compute encoders you need to add them to GPUBindingCommandsMixin (assuming they be used in render bundles).
Thanks! That's the place I expected.
proposals/push-constants.md
Outdated
void setPushConstantU32(uint32_t offset, uint32_t value); | ||
void setPushConstantI32(uint32_t offset, int32_t value); | ||
void setPushConstantF32(uin32_t offset, float32_t value); | ||
void setPushConstants(uint32_t offset, uint32_t count, AllowSharedBufferSource data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kelsey points out we would want to add a dataOffset and dataCount (like other entry points that take BufferSources) so we can point directly into a larger array buffer instead of copying it into a new ArrayBuffer).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the API shape to void setImmediateDataRange(uint32_t rangeOffset, AllowSharedBufferSource data, optional dataOffset, optional size);
proposals/push-constants.md
Outdated
void setPushConstantU32(uint32_t offset, uint32_t value); | ||
void setPushConstantI32(uint32_t offset, int32_t value); | ||
void setPushConstantF32(uin32_t offset, float32_t value); | ||
void setPushConstants(uint32_t offset, uint32_t count, AllowSharedBufferSource data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general this looks good, but I think we only need the last signature taking an AllowSharedBufferSource
and not the other overloads (U32, I32, F32, etc).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. But tend to keep these U32, I32, F32 for convenience , especially we tend to support small amount of data uploading.
I think this will have impact on the already complex matrix of GPUSupportedLimits. Have we considered making setVertexBuffer take an ArrayBufferSource? Then we don't need to change the limits. I didn't take a look at the D3D or Vulkan backends yet so maybe that is the motivation? On Metal, this is as simple as calling setVertexBytes: / setFragmentBytes: |
Assuming that push constants would be implemented using D3D12's root constants... If I'm not mistaken, D3D12 has a fixed (per-device?) amount of space in the root signature. In shipping WebGPU, the WebGPU implementation gets to decide how to use that space. For example, if the WebGPU Pipeline Layout includes 2 textures, the WebGPU implementation can place those 2 texture descriptors directly inside the D3D12 root signature. Or, alternatively, the WebGPU implementation can choose to put both texture descriptors into a descriptor table, and make the root signature point to the descriptor table. The WebGPU implementation has freedom here to use this fixed root signature space as it desires. Okay, now let's add in support for push constants. Push constants would also live directly within the fixed root signature space. But what if all the root signature space is already occupied (e.g. by those texture descriptors from the above paragraph)? Even if WebGPU only allows a single push constant, that doesn't actually solve the problem. The root signature space may be 100% occupied by descriptors, leaving no space at all for push constants. There are a variety of potential solutions to this problem:
|
Previously conducted performance analysis of push constants: |
GPU Web WG 2024-05-08 Pacific-time
|
Just point out possible implementation issues I observed:
|
It sounds like D3D and Vulkan can be optimized to not use a buffer when the push constants are small enough and there is space. With Metal, we always need to reserve a buffer slot for the push constants. Though it would be possible to map them all into one buffer slot. Abstracting this concept seems to further justify that the term 'push constants' does not necessarily need to exist in WebGPU. Rather, the D3D and Vulkan backends can implement small, dynamic setBindGroup calls into push constants and on Metal, we would use set[Vertex/Fragment]Bytes unless the data is large, then we would make a buffer |
If you tie push constants to specifically render pipelines, then you can't use them for compute shaders (and later, other types of pipelines), which would be unfortunate. |
Yeah, people in this thread are using a variety of different terminology. Above, when people say "setVertexBytes()" or "setFragmentBytes()," I believe they are referring to the Metal functions of the same name, as an example of a function that takes immediate data (rather than a method taking resources or bind groups). Even in Metal, there's a way to supply immediate data to compute pipelines. |
tl;dr: Thinking deeper, I think the only way it can work in Vulkan is either to just not add the feature at all, or add an explicit API for it. I think the biggest unanswered question here regarding "let push constants be an implementation detail" is this: Consider such a program:
The question is: What should the code in the shader be to handle both of these cases? In the first case, we'd want to promote the In SPIR-V, the text of the shader is actually different depending on whether or not you're accessing a push constant or not. For example, in GLSL (which I'm using here just as a readable form of SPIR-V), you'd either have this:
Or this:
If we operate under the constraint that no amount of recompilation is allowed to happen while commands are being recorded into the render pass - not even supplying specialization constants - then the only way this can work is if both blocks are present in the shader, and each access to the data is guarded by a flag - in software - to determine whether the data was bound to Assuming it is in fact a dealbreaker, it seems like the only way this could ever work (again, assuming that the committee does want to make it work, which I'm still not convinced has been sufficiently justified) would be to come up with a system where the implementation can guarantee that the immediate data being set by the app will fit into push constant space. This is why I suggested a But there's even an additional problem - what about the following program:
How should this work, assuming no amount of recompilation is acceptable? Imagine the bind group contains a buffer whose contents is populated by a previous compute shader - the CPU has never seen this data, and therefore such data cannot be hoisted to push constants. There would have to be some way, at the API level, to distinguish the immediate data from the bind group data, so that the shader wouldn't have to handle both cases in software at runtime. I think the only way this could work is to add a new type of descriptor in the bind group layout. That way, an entry in the bind group layout could either refer to a buffer, or refer to immediate data, but not both. Any data set by immediates for one draw call would be incapable of being clobbered by a GPU-populated buffer for a subsequent draw call - thereby allowing the shader compiler to know that it's safe to only emit the So then that begs the question: What should this new type of descriptor in the bind group layout be named? Perhaps it should be named ... "push constant"... |
This is turning into a bit of a bikeshed opportunity, but I'll add my two cents to the jar:
To give a bit of an unprompted history lesson and awareness of how these features are implemented in practice... The simplified AMD interpretation of push constants (not applicable to all drivers and hardware, just giving one mental model): The hardware has some general-purpose 32-bit scalar registers (SGPRs); these registers are active storage for threads, but also can be initialized with user constants set up by the hardware that launches threads. Some of these registers are used by the driver itself for internal purposes (e.g. the "base draw offset" feature from Vulkan would get passed by an SGPR), some of them are used by the driver to implement binding models (D3D12 CBVs might get mapped directly to an SGPR), and some of them are used to store memory addresses to binding tables (e.g. if you have a lot of textures, the driver will allocate a block of memory which has the texture descriptors, and store a pointer to the table in an SGPR; these are D3D12's "descriptor tables"). The D3D12 idea was that all root signature parameters (root constants, CBVs, descriptor tables) would map to SGPRs; however, as the driver sometimes needs some for its own purposes, this isn't a 1:1 mapping, and sometimes the driver has to jettison some of the parameters to a memory block and do some finagling to rearrange all of the binding updates. Note that since the usage of an SGPR is arbitrary (it can contain a memory address or a value, just depending on how it's used), the usage needs to be determined at root signature time; in practice, the AMD compiler effectively reserves a few driver SGPRs that work across all platforms. So, in order to make "transparent push constants" work, we can't simply replace the first N bytes of a uniform buffer of a bind group with immediate data; using it with a bind group will require a pointer in the SGPRs, while using it with push constants will require the values in SGPRs. Instead, we'll need syntax that lets us optimistically compile the first N push constants to SGPRs, and then the rest get jettisoned to a memory buffer. This is enabled a bit more by the extra pipeline layout field so we know up-front what will fit, but it still seems more complicated than just having a small fixed-size limit in the specification. We can always expand to support bigger sizes through emulation later. |
I think we have to be careful here - if the biggest use case is simply "make some indices available to a shader" then a simple API convenience method would be sufficient, without changing the binding model or adding push constants at all. |
(I just realized that we already have that simple API convenience method: |
GPU Web WG 2024-06-26 Atlantic-time
|
It would still be nice to get rid of the typed method names which seem out of place with the rest of the specification:
and maybe allow something more flexible like:
Not sure what @shaoboyan thinks |
Unless we expect that we'll be adding a lot more data types I don't see any particular advantage to passing the type as an enum, and it probably adds some additional work for implementors to do the value cast in the implementation rather than allowing the WebIDL layer to handle it for you. Plus I think the variant with separate methods for each type is more WASM-friendly as well? |
I proposed the original wrapper functions as optional sugar, and I'm fine with them being removed in favor of an ArrayBuffer interface if we can't agree on them. We already require that users construct ArrayBuffers for writeBuffer so it's not like a user should have any difficulty figuring it out. |
I'm a fan of using IDL to handle type conversion. I think immediate data is a perf focus API so I prefer avoiding any non-necessary steps.
According to this benchmark, I think the perf gap between typed data and constructed array buffer is not trivial. So I support specific typed APIs
I'm slightly preferred to put type conversion in IDL layer so that we might benefit from future IDL opts. And again, this might help the uploading perf compared with extra enum variable. |
The question to me is comparing: setImmediateDataF32(0, 123.0);
setImmediateDataF32(4, 456.0);
setImmediateDataF32(8, 789.0);
setImmediateDataF32(12, 101112.0); and setImmediateData(0, new Float32Array([123.0, 456.0, 789.0, 101112.0])); There are already multiple tricks to cut down on array buffer allocation cost (e.g. allocate one large buffer which you reuse). I also wish that JS implementations would see this as an opportunity to improve TypedArray performance, but I probably can't have my wish here. |
Internally in Firefox's WebGL implementation, we convert all calls to e.g. uniform1i into something called UniformData: This very nicely predisposes me to setImmediateData being more generic, rather than having typed entrypoints. :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per API meeting today - let's get this proposal doc landed! I've added comments about what we should write down before landing it.
Co-authored-by: Kai Ninomiya <kainino1@gmail.com>
GPU Web WG 2024-07-10 Atlantic-time
|
GPU Web WG 2024-07-17 Pacific-time
|
- Should pipelineLayout defines immediate range compatible? | ||
- Implementation internal immediate data usage could easily break compatibility. Implementation needs extra | ||
effort to ensure such compatibility. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still an open question? There is this note on line 101:
NOTE: two pipeline layouts are defined to be “compatible for immediate data” if they were created with identical immediate data byte size. It means immediate data values can share between pipeline layouts that are compatible for immediate data.
Push constants proposal. Related issue #75