I’m looking at Example 23 and noticed something when using kGemmSplitKParallel mode that I’d like to get clarified:
In this mode, the example explicitly allocates a block of workspace memory as part of its setup. However, when the GEMM kernel actually executes, this pre-allocated workspace does not appear to be used: I see no reads from or writes to it during kernel runtime.
I’m not sure whether I’ve missed some condition that would explain why I can’t detect any use of this workspace.
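
For reference, one way a write to the workspace could be detected is a sentinel check: fill the buffer with a known byte pattern, run the kernel, then count the bytes that changed. The sketch below is host-only and purely illustrative. The names `kSentinel`, `fill_sentinel`, `count_modified`, and the two `fake_kernel_*` functions are hypothetical stand-ins and are not part of Example 23 or CUTLASS. With a real device workspace, the pattern would presumably be copied to the device before the launch and copied back afterwards.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Arbitrary sentinel byte written into the workspace before the run (assumption).
constexpr std::uint8_t kSentinel = 0xA5;

// Fill the whole buffer with the sentinel pattern.
void fill_sentinel(std::vector<std::uint8_t>& workspace) {
  for (auto& byte : workspace) {
    byte = kSentinel;
  }
}

// Count bytes that no longer hold the sentinel, i.e. bytes overwritten
// with a different value since fill_sentinel() was called.
std::size_t count_modified(const std::vector<std::uint8_t>& workspace) {
  std::size_t modified = 0;
  for (auto byte : workspace) {
    if (byte != kSentinel) {
      ++modified;
    }
  }
  return modified;
}

// Hypothetical stand-in for a kernel that writes partial results into
// the first half of the workspace.
void fake_kernel_using_workspace(std::vector<std::uint8_t>& workspace) {
  for (std::size_t i = 0; i < workspace.size() / 2; ++i) {
    workspace[i] = 0;
  }
}

// Hypothetical stand-in for a kernel that never touches the workspace.
void fake_kernel_ignoring_workspace(std::vector<std::uint8_t>& /*workspace*/) {}
```

A nonzero result from `count_modified` after the run would indicate that the workspace was written. A result of zero only shows that no byte changed value: this check cannot detect reads, and it would miss a write that happens to store the sentinel value itself.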