[QST] Do CUTLASS GEMM kernels use distributed shared memory? #2770

**Is there any GEMM kernel that uses distributed shared memory?**
I have profiled many Hopper GEMM kernels with Nsight Compute, but found no data transfer through distributed shared memory.
Examples: `cutlass/examples/*hopper_gemm*`

- 48_hopper_warp_specialized_gemm
- 49_hopper_gemm_with_collective_builder
- ...63_hopper_gemm_with_weight_prefetch
- ./python/CuTeDSL/hopper/*
- ./cute/tutorial/hopper/*

So I would like to know whether CUTLASS has implemented distributed shared memory in any GEMM kernel.
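To be concrete about the kind of access I am looking for in the profiles, here is a minimal sketch of a distributed shared memory read using the cooperative groups cluster API. It is not taken from CUTLASS: the kernel name `dsmem_probe`, the cluster size of 2, and the launch configuration are assumptions chosen only for illustration, and it needs an `sm_90` build (for example `nvcc -arch=sm_90`).

```cuda
#include <cooperative_groups.h>
#include <cstdio>
#include <cuda_runtime.h>

namespace cg = cooperative_groups;

// Hypothetical probe kernel (not part of CUTLASS).
// Each block stores its cluster rank in its own shared memory, then reads the
// value held by the neighbouring block through distributed shared memory.
__global__ void __cluster_dims__(2, 1, 1) dsmem_probe(int* out)
{
    __shared__ int smem_value;

    cg::cluster_group cluster = cg::this_cluster();
    unsigned int rank = cluster.block_rank();

    if (threadIdx.x == 0) {
        smem_value = static_cast<int>(rank);
    }
    // Ensure every block in the cluster has written its shared memory.
    cluster.sync();

    if (threadIdx.x == 0) {
        unsigned int peer = (rank + 1) % cluster.num_blocks();
        // Map the local shared address to the same variable in the peer block.
        int* remote = cluster.map_shared_rank(&smem_value, peer);
        out[blockIdx.x] = *remote;  // load from the peer block's shared memory
    }
    // Keep the peer's shared memory alive until all remote reads are done.
    cluster.sync();
}

int main()
{
    constexpr int kBlocks = 2;  // assumed: one cluster of two blocks
    int* d_out = nullptr;
    int h_out[kBlocks] = {-1, -1};

    cudaMalloc(&d_out, sizeof(h_out));
    dsmem_probe<<<kBlocks, 32>>>(d_out);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::printf("kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);

    // Each block should report the rank of its neighbour in the cluster.
    for (int i = 0; i < kBlocks; ++i) {
        std::printf("block %d read %d from its peer\n", i, h_out[i]);
    }
    cudaFree(d_out);
    return 0;
}
```

Profiling a probe like this with Nsight Compute should give a baseline for how remote shared memory traffic is reported, which can then be compared against the GEMM examples listed above.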



