Description
Different hardware running the same algorithm may need dramatically different threads-per-threadgroup values in order to get good performance. A small device may need a small value so the register file doesn't spill to main memory, but a big device may need a large value to take advantage of all its multiprocessing lanes. However, an author has no reliable way to know in advance which value will perform well on a given device.
This problem is not specific to WebGPU, but we do have a goal of portable performance.
This is doubly bad because the current design of WebGPU bakes these values into the shader as literals. Therefore, even if the application could divine good values at runtime, it would have to rewrite its shader to use them.
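To make the "rewrite its shader" cost concrete, here is a minimal sketch of what an application might end up doing: treating the shader source as a text template and substituting the threadgroup size before compilation. The `__WORKGROUP_SIZE__` placeholder, the `specializeShader` helper, and the WGSL-like shader text are all assumptions made for illustration; none of them come from WebGPU itself.

```typescript
// Hypothetical shader template. The placeholder token and the WGSL-like
// syntax are illustrative assumptions, not something WebGPU defines.
const SHADER_TEMPLATE = `
@compute @workgroup_size(__WORKGROUP_SIZE__)
fn main(@builtin(global_invocation_id) id : vec3<u32>) {
  // ... kernel body ...
}
`;

// Return a copy of the template with the placeholder replaced by a literal.
// Hypothetical helper: the application has to do this string surgery itself
// because the size is baked into the shader text.
function specializeShader(template: string, workgroupSize: number): string {
  if (!Number.isInteger(workgroupSize) || workgroupSize <= 0) {
    throw new RangeError(`invalid workgroup size: ${workgroupSize}`);
  }
  return template.split("__WORKGROUP_SIZE__").join(String(workgroupSize));
}
```

Every distinct value produces a distinct shader string, so each one has to be compiled separately, which is part of what makes the baked-in literal awkward.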
Similarly, each implementation/device has its own thresholds of acceptable values, and these thresholds are specific to each shader. Therefore, compiling a compute shader on one platform may succeed, but on another platform may fail because it ran up against an implementation-specific limit.
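Because the acceptable range depends on both the device and the specific shader, an application today would probably have to probe for a working value. The following is only a sketch of that fallback loop: `pickWorkgroupSize` and the `tryCompile` callback are hypothetical names, and the callback stands in for whatever "compile this shader with this size and report success" looks like in a real application.

```typescript
// Hypothetical callback: returns true if the shader compiled successfully
// with the given threads-per-threadgroup value on the current device.
type TryCompile = (workgroupSize: number) => boolean;

// Walk the candidates in the caller's preference order and return the first
// one the implementation accepts, or undefined if none of them compile.
function pickWorkgroupSize(
  candidates: readonly number[],
  tryCompile: TryCompile,
): number | undefined {
  for (const size of candidates) {
    if (tryCompile(size)) {
      return size;
    }
  }
  return undefined;
}
```

For example, `pickWorkgroupSize([256, 128, 64], tryCompile)` would return the first of those sizes that compiles. Note that this only finds a value that is accepted, not one that performs well; the preference order is still a guess by the author.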