What is your question?
Hi everyone, I modified the example to run in JIT mode, i.e., gemm(mA, mB, mC, stream), but observed cache misses across different processes.
In fact, the stream is only used at kernel launch and does not affect compilation, so representing it as a placeholder in the mangled name might help.
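To illustrate the placeholder idea, here is a minimal pure-Python sketch. It is not CuTeDSL code: FakeStream, FakeTensor, describe_arg, and mangle_name are hypothetical names, and the assumption is that only dtype and shape of tensor arguments matter for compilation. Every stream collapses to the same token, so two launches on different streams map to the same cache key.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeStream:
    """Hypothetical stand-in for a CUDA stream handle (not a CuTeDSL type)."""
    handle: int


@dataclass(frozen=True)
class FakeTensor:
    """Hypothetical stand-in for a tensor: only dtype and shape are kept."""
    dtype: str
    shape: tuple


def describe_arg(arg):
    """Return a process-independent description of one argument."""
    if isinstance(arg, FakeStream):
        # The stream only matters at launch time, so every stream
        # becomes the same placeholder token in the key.
        return "stream"
    if isinstance(arg, FakeTensor):
        dims = "x".join(str(d) for d in arg.shape)
        return f"tensor<{arg.dtype},{dims}>"
    return f"{type(arg).__name__}={arg!r}"


def mangle_name(func_name, args):
    """Build a cache key from the function name and argument descriptions."""
    sig = ",".join(describe_arg(a) for a in args)
    # hashlib gives the same digest in every process, unlike the built-in
    # hash() for str, which is randomized per process by default.
    digest = hashlib.sha256(sig.encode()).hexdigest()[:16]
    return f"{func_name}_{digest}"


if __name__ == "__main__":
    mA = FakeTensor("f16", (128, 64))
    mB = FakeTensor("f16", (64, 256))
    mC = FakeTensor("f32", (128, 256))
    k1 = mangle_name("gemm", (mA, mB, mC, FakeStream(0x1)))
    k2 = mangle_name("gemm", (mA, mB, mC, FakeStream(0x2)))
    print(k1 == k2)  # same key although the streams differ
```

Shape or dtype changes still produce a different key, which is the behavior one would want from a compile cache.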
Do you have any suggestions? Thanks!
http://github.com/NVIDIA/cutlass/blob/bd96096d58e4886e204cd1d71a385ca73e7719b8/examples/python/CuTeDSL/hopper/dense_gemm.py#L381
http://github.com/NVIDIA/cutlass/blob/bd96096d58e4886e204cd1d71a385ca73e7719b8/python/CuTeDSL/cutlass/base_dsl/dsl.py#L555
Furthermore, could you expose the mangle_name API so that users can check whether recompilation is needed (i.e., for AoT compilation)?
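As an illustration of how an exposed mangled name could be used, here is a minimal sketch of an on-disk cache check. The names mangle_name, artifact_path, needs_recompile, and compile_if_needed are hypothetical, the signature is assumed to be a string describing the compile-relevant argument properties, and compile_fn stands in for whatever actually produces the compiled artifact; none of this is an existing CuTeDSL API.

```python
import hashlib
import os


def mangle_name(func_name, signature):
    """Hypothetical: derive a stable name from the function and its signature."""
    digest = hashlib.sha256(f"{func_name}|{signature}".encode()).hexdigest()[:16]
    return f"{func_name}_{digest}"


def artifact_path(cache_dir, name):
    """Location of the compiled artifact for a given mangled name."""
    return os.path.join(cache_dir, name + ".bin")


def needs_recompile(cache_dir, name):
    """Recompilation is needed only if no artifact exists for this name."""
    return not os.path.exists(artifact_path(cache_dir, name))


def compile_if_needed(cache_dir, func_name, signature, compile_fn):
    """Return (artifact path, whether compile_fn was invoked)."""
    name = mangle_name(func_name, signature)
    path = artifact_path(cache_dir, name)
    if needs_recompile(cache_dir, name):
        data = compile_fn()  # placeholder for the real compile step
        with open(path, "wb") as f:
            f.write(data)
        return path, True
    return path, False


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        _, compiled = compile_if_needed(d, "gemm", "f16,128x64", lambda: b"binary")
        _, compiled_again = compile_if_needed(d, "gemm", "f16,128x64", lambda: b"binary")
        print(compiled, compiled_again)  # first call compiles, second reuses
```

Because the name is derived deterministically, a second process computing the same signature would find the same artifact and skip compilation.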