I noticed that the kernel generator python/cutlass_library/generator.py cannot emit kernels with fp16/fp32 inputs for the sm120 architecture; only fp8 is supported.
While sm100 kernels do support fp16/fp32 inputs, they cannot run directly on sm120 GPUs. Are there any restrictions that prevent this?
// Epilogue collective builder from a generated sm100 fp16 GEMM kernel.
using cutlass3x_sm100_tensorop_gemm_f16_f16_f16_f16_f16_128x256x64_0x0x1_0_tnt_align8_1sm_epilogue =
typename cutlass::epilogue::collective::CollectiveBuilder<
cutlass::arch::Sm100, cutlass::arch::OpClassTensorOp,
cute::Shape<cute::_128, cute::_256, cute::_64>,
cute::Shape<int, int, cute::_1>,
cutlass::epilogue::collective::EpilogueTileAuto,
cutlass::half_t, cutlass::half_t,
cutlass::half_t, cutlass::layout::RowMajor, 8,
cutlass::half_t, cutlass::layout::RowMajor, 8,
cutlass::epilogue::TmaWarpSpecialized1Sm,
cutlass::epilogue::fusion::LinearCombination<
cutlass::half_t,
cutlass::half_t,
cutlass::half_t,
cutlass::half_t
>>::CollectiveOp;
The code above is part of a CUTLASS kernel generated for sm100. If I replace cutlass::arch::Sm100 with cutlass::arch::Sm120, the build fails with the error "SM120 TmaWarpSpecialized builder currently only supports F8F6F4 MMA."
So how can I generate kernels with fp16/fp32 inputs/accumulator/D on the sm120 architecture?
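For reference, here is a minimal sketch of the substitution described above, applied to the epilogue builder I quoted. The alias name Sm120EpilogueAttempt is mine; every other parameter is carried over unchanged from the generated sm100 kernel and is not necessarily a valid choice for sm120.

// Hypothetical sm120 variant of the epilogue collective shown above.
// Only cutlass::arch::Sm120 is changed; all other parameters are copied
// from the sm100 kernel. As described in this issue, building the kernel
// with this substitution currently fails with:
//   "SM120 TmaWarpSpecialized builder currently only supports F8F6F4 MMA."
using Sm120EpilogueAttempt =
    typename cutlass::epilogue::collective::CollectiveBuilder<
        cutlass::arch::Sm120, cutlass::arch::OpClassTensorOp,
        cute::Shape<cute::_128, cute::_256, cute::_64>,
        cute::Shape<int, int, cute::_1>,
        cutlass::epilogue::collective::EpilogueTileAuto,
        cutlass::half_t, cutlass::half_t,
        cutlass::half_t, cutlass::layout::RowMajor, 8,
        cutlass::half_t, cutlass::layout::RowMajor, 8,
        cutlass::epilogue::TmaWarpSpecialized1Sm,
        cutlass::epilogue::fusion::LinearCombination<
            cutlass::half_t,
            cutlass::half_t,
            cutlass::half_t,
            cutlass::half_t
        >>::CollectiveOp;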