Context
Every API with render targets has at least the following load ops:
- Load, where the initial value of each pixel in the render pass is loaded from memory
- Clear, where each pixel is set to a constant value
- Don't care, where pixels can start at any value because the app will overwrite them anyway
Likewise every API has at least the following store ops:
- Store, where the value is written to memory
- Discard, where the stored value is undefined (and can just not be written)
- Resolve, for multisample attachments (although it is part of the subpass description in Vulkan)
"Clear", "Don't care" and "Discard" are extremely important on tiler GPUs because they allow skipping memory operations at the beginning and end of render passes, which saves a lot of power on mobile platforms.
(side note: Vulkan resolve attachments are part of pipeline compatibility, this will be mildly annoying)
Proposal 1
"Don't care" yields an undefined value, which we don't want, so WebGPU could expose only "Clear", which on mobile GPUs usually happens inside the tile memory and is super cheap.
"Discard" yields an undefined value too, but "Store" is really expensive. Instead we could have a "StoreZero" which acts as if it stores zeroes in memory, but actually discards the contents and clears the texture lazily the next time it is used. If the next use of the texture is as an attachment with loadOp "Clear" with zeroes, then that counts as the lazy clear and is cheap!
So the way to make transient attachments efficient would be to loadOp "Clear" with zeroes and "StoreZero" at the end.
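To make the lazy-clear idea concrete, here is a minimal sketch of the bookkeeping an implementation might do. Everything in it is hypothetical (`LazyTexture`, `pendingZero`, the counters); it is a toy model of one texture, not WebGPU API or any real implementation.

```typescript
// Hypothetical toy model of Proposal 1; none of these names are WebGPU API.
type LoadOp = { kind: "load" } | { kind: "clear"; value: number };
type StoreOp = "store" | "storeZero";

class LazyTexture {
  memory: number[];     // backing memory; stale while pendingZero is set
  pendingZero = false;  // contents are logically zero but not written yet
  memoryWrites = 0;     // full-texture writes to memory
  memoryReads = 0;      // full-texture reads from memory

  constructor(size: number) {
    this.memory = new Array<number>(size).fill(0);
  }

  // Perform the deferred clear before anything reads the memory.
  private materialize(): void {
    if (this.pendingZero) {
      this.memory.fill(0);
      this.memoryWrites++;
      this.pendingZero = false;
    }
  }

  // Returns the tile contents at the start of the pass.
  beginPass(load: LoadOp): number[] {
    if (load.kind === "clear") {
      // The clear happens in tile memory and overwrites every pixel,
      // so a pending lazy clear can be dropped without touching memory.
      this.pendingZero = false;
      return new Array<number>(this.memory.length).fill(load.value);
    }
    this.materialize();
    this.memoryReads++;
    return [...this.memory];
  }

  endPass(tile: number[], store: StoreOp): void {
    if (store === "store") {
      this.memory = [...tile];
      this.memoryWrites++;
    } else {
      // StoreZero: discard the tile and remember that the texture is zero.
      this.pendingZero = true;
    }
  }

  // Any other use of the texture (e.g. sampling) sees the zeroes.
  read(): number[] {
    this.materialize();
    this.memoryReads++;
    return [...this.memory];
  }
}
```

In this model a transient attachment that is always cleared to zero and ended with `storeZero` never touches memory; the deferred clear is only paid for if something else reads the texture.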
The issue with proposal 1
Proposal 1 would work great on tilers that can clear in the tile cache, but what about immediate-mode GPUs or tiler GPUs without an explicit tile cache? For these, "Clear" with zero and "StoreZero" could cause the following to happen:
- The texture is cleared to zero outside the render pass
- The render pass starts, loading pixel values (no tile-cache clearing)
- The render pass ends, storing pixel values (no discard)
This would be even worse than "Load" and "Store" because an additional clear happens.
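The comparison can be spelled out by counting the steps that touch memory. This is only a rough sketch: the step names and the two hardware categories are made up for illustration, and each step is counted as one full pass over the attachment.

```typescript
// Hypothetical illustration; step names and hardware kinds are assumptions.
type Hardware = "tiler" | "immediate";
type Step = "clear-in-tile" | "discard" | "clear-in-memory" | "load" | "store";

// What "Clear" with zero + "StoreZero" could turn into under Proposal 1.
function proposal1Steps(hw: Hardware): Step[] {
  return hw === "tiler"
    ? ["clear-in-tile", "discard"]
    : ["clear-in-memory", "load", "store"];
}

// Plain "Load" + "Store", for comparison.
function loadStoreSteps(): Step[] {
  return ["load", "store"];
}

// Only these steps read or write the attachment's memory.
const touchesMemory = (s: Step): boolean =>
  s === "clear-in-memory" || s === "load" || s === "store";

function memoryPasses(steps: Step[]): number {
  return steps.filter(touchesMemory).length;
}
```

Under these assumptions, `memoryPasses(proposal1Steps("immediate"))` is larger than `memoryPasses(loadStoreSteps())`, while `memoryPasses(proposal1Steps("tiler"))` is zero.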
Proposal 2
Have the following load ops:
- Load
- Clear
- LoadOrClearZero, where either all pixel values start at zero or all pixel values start at the previously stored value.
Have the following store ops:
- Store
- StoreZero
- StoreOrStoreZero, where either all pixels get zero stored (lazily) or all pixels get their final value stored.
Using LoadOrClearZero and StoreOrStoreZero would allow the WebGPU implementation to choose the optimal load and store ops for the underlying hardware without exposing the native ops, and with only a few differences visible to the application.
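As a rough sketch of how an implementation might resolve the flexible ops, assuming it only distinguishes tilers with a tile cache from everything else: the `GpuKind` type, the function names and the string spellings of the ops below are hypothetical, and the tiler/immediate mapping is my reading of the reasoning above, not a settled design.

```typescript
// Hypothetical sketch of Proposal 2; names and spellings are assumptions.
type ProposedLoadOp = "load" | "clear" | "loadOrClearZero";
type ProposedStoreOp = "store" | "storeZero" | "storeOrStoreZero";
type GpuKind = "tiler" | "immediate";

// Pick the cheap concrete load op for the hardware.
function resolveLoadOp(op: ProposedLoadOp, gpu: GpuKind): "load" | "clear" {
  if (op !== "loadOrClearZero") return op;
  // Tilers clear to zero in tile memory; other GPUs keep the stored value.
  return gpu === "tiler" ? "clear" : "load";
}

// Pick the cheap concrete store op for the hardware.
function resolveStoreOp(op: ProposedStoreOp, gpu: GpuKind): "store" | "storeZero" {
  if (op !== "storeOrStoreZero") return op;
  // Tilers discard (lazy zero); other GPUs have already written the pixels.
  return gpu === "tiler" ? "storeZero" : "store";
}
```

With this mapping, an attachment used with `loadOrClearZero` and `storeOrStoreZero` resolves to clear plus `storeZero` on a tiler and to load plus store elsewhere, so the application sees either zeroes or the previous contents, never undefined values.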