[XLA:GPU] Enable wgmma with block_n=8 in Triton and autotuner #97505
[XLA:GPU] Enable wgmma with block_n=8 in Triton and autotuner
The workaround in AccelerateMatmul forces a contracting-contiguous (K-contiguous) layout for rhs when N is small. This prevents triggering the unsupported no-swizzle lowering. A sketch of the idea follows.
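A minimal sketch of the layout-forcing idea, assuming a simple tile-config struct; all names here are hypothetical and do not correspond to the actual AccelerateMatmul code:

```cpp
// Illustrative only: with a small N tile, the rhs operand must be laid out
// with the contracting (K) dimension contiguous so that WGMMA lowering can
// use a swizzled shared-memory layout instead of the unsupported
// no-swizzle path.
struct MatmulTileConfig {
  int block_m;
  int block_n;
  int block_k;
};

// Decide whether rhs should be forced into a contracting-contiguous
// (K-minor) layout before WGMMA lowering. The threshold is a placeholder,
// not the real condition used by the pass.
bool ForceRhsKContiguous(const MatmulTileConfig& cfg) {
  constexpr int kSmallBlockN = 8;
  return cfg.block_n <= kSmallBlockN;
}
```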
We also add checks to the WGMMA lowering to catch incorrectly filled descriptors; previously these would either crash at runtime or produce incorrect results.
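A hedged sketch of the kind of descriptor sanity check described above; the field names, encoding, and specific conditions are illustrative, not the real WGMMA descriptor format used by the lowering:

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical view of a shared-memory matrix descriptor.
struct WgmmaDescriptorFields {
  uint64_t base_address;       // shared-memory base address
  uint32_t leading_dim_bytes;  // stride of the leading dimension
  uint32_t swizzle_mode;       // 0 means "no swizzle" in this sketch
};

// Reject descriptors that would otherwise crash at runtime or silently
// produce incorrect results when consumed by WGMMA.
void CheckWgmmaDescriptor(const WgmmaDescriptorFields& d) {
  if (d.base_address % 16 != 0) {
    throw std::runtime_error("WGMMA descriptor: base address misaligned");
  }
  if (d.swizzle_mode == 0) {
    throw std::runtime_error("WGMMA descriptor: no-swizzle layout unsupported");
  }
  if (d.leading_dim_bytes == 0) {
    throw std::runtime_error("WGMMA descriptor: zero leading dimension");
  }
}
```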