[QST] What is the best timing to trigger dependent grid launch? #2778

I've seen many kernels that call `cutlass::arch::launch_dependent_grids()` or `cudaTriggerProgrammaticLaunchCompletion()` at the end of the kernel. Is this good practice?

I have a kernel with a heavy preamble that runs right after a CUTLASS GEMM kernel, so I want that preamble to overlap with the preceding GEMM.

In this case, wouldn't it be more efficient to trigger the dependent grid launch at the beginning of the GEMM kernel, so that the preamble of the following kernel actually runs in parallel with it? What is the tradeoff between triggering PDL (programmatic dependent launch) early versus late in the primary kernel?
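To make the question concrete, here is a minimal sketch of the two placements using the plain CUDA runtime API rather than CUTLASS. The names `primary_kernel`, `secondary_kernel`, `run_pdl_demo`, and the `trigger_early` flag are hypothetical and only for illustration; `primary_kernel` is a trivial stand-in for the GEMM. It assumes a device with compute capability 9.0 or newer and compilation with something like `-arch=sm_90`. As far as I understand, the secondary kernel must call `cudaGridDependencySynchronize()` before reading the primary's output, and that call waits for the primary grid to complete, so an early trigger should only change when the secondary's preamble can start, not when its dependent part can run.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical stand-in for the GEMM (primary) kernel.
__global__ void primary_kernel(float* buf, int n, bool trigger_early) {
    // Early placement: signal as soon as this block starts running.
    if (trigger_early) cudaTriggerProgrammaticLaunchCompletion();

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = 2.0f * static_cast<float>(i);  // the "main work"

    // Late placement: signal only after the main work has been issued.
    if (!trigger_early) cudaTriggerProgrammaticLaunchCompletion();
}

// Hypothetical dependent (secondary) kernel with a preamble.
__global__ void secondary_kernel(const float* in, float* out, int n) {
    // Preamble: must not touch anything written by the primary kernel.
    float bias = 0.0f;
    for (int k = 0; k < 64; ++k) bias += 1.0f / 64.0f;  // sums to exactly 1.0f

    // Wait until the primary grid has completed and its writes are visible.
    cudaGridDependencySynchronize();

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] + bias;
}

// Returns 0 on success, nonzero on a CUDA error or a wrong result.
int run_pdl_demo(bool trigger_early) {
    const int n = 1024, threads = 256, blocks = n / threads;
    float *buf = nullptr, *out = nullptr;
    cudaStream_t stream;
    if (cudaStreamCreate(&stream) != cudaSuccess) return 1;
    if (cudaMalloc(&buf, n * sizeof(float)) != cudaSuccess) return 1;
    if (cudaMalloc(&out, n * sizeof(float)) != cudaSuccess) return 1;

    primary_kernel<<<blocks, threads, 0, stream>>>(buf, n, trigger_early);

    // Opt the secondary launch into programmatic dependent launch.
    cudaLaunchAttribute attr[1];
    attr[0].id = cudaLaunchAttributeProgrammaticStreamSerialization;
    attr[0].val.programmaticStreamSerializationAllowed = 1;

    cudaLaunchConfig_t cfg = {};
    cfg.gridDim = dim3(blocks);
    cfg.blockDim = dim3(threads);
    cfg.dynamicSmemBytes = 0;
    cfg.stream = stream;
    cfg.attrs = attr;
    cfg.numAttrs = 1;

    const float* in = buf;
    if (cudaLaunchKernelEx(&cfg, secondary_kernel, in, out, n) != cudaSuccess) return 2;
    if (cudaStreamSynchronize(stream) != cudaSuccess) return 3;

    std::vector<float> host(n);
    if (cudaMemcpy(host.data(), out, n * sizeof(float),
                   cudaMemcpyDeviceToHost) != cudaSuccess) return 4;

    int status = 0;
    for (int i = 0; i < n; ++i) {
        if (host[i] != 2.0f * static_cast<float>(i) + 1.0f) { status = 5; break; }
    }

    cudaFree(buf);
    cudaFree(out);
    cudaStreamDestroy(stream);
    return status;
}
```

Calling `run_pdl_demo(true)` and `run_pdl_demo(false)` should produce the same results. My understanding is that the secondary grid becomes eligible to launch only once every block of the primary grid has either triggered or exited, so with a multi-wave GEMM even an early trigger cannot take effect before the last wave has started. The part I am unsure about is the cost side: whether early-resident secondary blocks take registers or shared memory away from the GEMM's remaining blocks.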


