[wip][llvmgpu][codegen] Remove warp reduction pipeline #21433
Conversation
Signed-off-by: James Newling <james.newling@gmail.com>
Force-pushed from aa6508f to e908914.
@@ -207,7 +207,8 @@ func.func @main2(%arg0: tensor<2x130x130x4xf16>, %arg1: tensor<3x3x4x320xf16>, %

 // -----

-// We want to skip padding skinny matmul cases, since warpReduction is more performant for it.
+// We want to skip padding skinny matmul cases, since WarpReduction is more performant for it.
+// TODO(newling) reconsider this, as WarpReduction is removed.
We don't need to reconsider this; warp reduction is still a "strategy", just one that the LLVMGPUVectorDistribute pipeline can handle.
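(For reference, not part of this diff: sending one of these cases through the vector distribute pipeline amounts to selecting LLVMGPUVectorDistribute in the translation info. The sketch below is illustrative only; the exact attribute syntax is an assumption and varies across IREE versions.)

// Illustrative sketch only: attribute syntax is assumed and may not match IREE head.
#translation = #iree_codegen.translation_info<pipeline = LLVMGPUVectorDistribute workgroup_size = [64, 1, 1] subgroup_size = 64>
func.func @pinned_to_vector_distribute() attributes {translation_info = #translation} {
  return
}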
  #hal.pipeline.binding<storage_buffer>,
  #hal.pipeline.binding<storage_buffer>
]>
func.func @dynamic_softmax() {
We need this test to go through vector distribute.
I have a (not yet pushed) branch which can do this now. I tested the numerics, and they match the numerics at ToM (top of main) with WarpReduction... which are incorrect. Created an issue to track this: #21468
// -----

func.func @dynamic_parallel_dims(%dynsize : index, %input : tensor<4x?x4096xf16>) -> tensor<4x?xf32> {
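  // (Editorial sketch, not from the original diff: the deleted test body is not shown
  // above. A reduction with this signature would plausibly look like the following,
  // reducing the trailing 4096 dimension into f32 accumulators.)
  %cst = arith.constant 0.0 : f32
  %empty = tensor.empty(%dynsize) : tensor<4x?xf32>
  %fill = linalg.fill ins(%cst : f32) outs(%empty : tensor<4x?xf32>) -> tensor<4x?xf32>
  %result = linalg.generic {
      indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d1, d2)>,
                       affine_map<(d0, d1, d2) -> (d0, d1)>],
      iterator_types = ["parallel", "parallel", "reduction"]
    } ins(%input : tensor<4x?x4096xf16>) outs(%fill : tensor<4x?xf32>) {
  ^bb0(%in: f16, %out: f32):
    %ext = arith.extf %in : f16 to f32
    %sum = arith.addf %ext, %out : f32
    linalg.yield %sum : f32
  } -> tensor<4x?xf32>
  return %result : tensor<4x?xf32>
}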
None of these tests should be deleted; they need to go through vector distribute now.
%4 = hal.interface.constant.load layout(#pipeline_layout) ordinal(4) : i32
%5 = arith.index_castui %0 : i32 to index
%6 = arith.index_castui %1 : i32 to index
%7 = arith.index_castui %2 : i32 to index
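// (Editorial sketch, not from the original test: the casted indices above are
// presumably used as the dynamic dimensions of the dispatch tensors, roughly as
// follows; %binding stands in for an earlier hal.interface.binding.subspan.)
%8 = iree_tensor_ext.dispatch.tensor.load %binding, offsets = [0, 0], sizes = [%5, 4096], strides = [1, 1]
    : !iree_tensor_ext.dispatch.tensor<readonly:tensor<?x4096xf16>>{%5} -> tensor<?x4096xf16>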
We need to send this through vector distribute; this cannot be deleted.
%4 = iree_tensor_ext.dispatch.tensor.load %1, offsets = [0, 0], sizes = [32000, 4096], strides = [1, 1] : !iree_tensor_ext.dispatch.tensor<readonly:tensor<32000x4096xf16>> -> tensor<32000x4096xf16>
%5 = tensor.empty() : tensor<2x32000xf16>
%6 = linalg.fill ins(%cst : f16) outs(%5 : tensor<2x32000xf16>) -> tensor<2x32000xf16>
%7 = linalg.matmul_transpose_b ins(%3, %4 : tensor<2x4096xf16>, tensor<32000x4096xf16>) outs(%6 : tensor<2x32000xf16>) -> tensor<2x32000xf16>
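For background: with only 2 rows in the output, the matmul_transpose_b above is a skinny matmul, essentially one matvec per output row, which is why the warp reduction strategy favoured it. Written out as a generalized linalg.generic (a sketch for illustration, reusing the SSA values from the excerpt), the d2 = 4096 reduction dimension dominates:

// Generalized form of the matmul_transpose_b above (illustration only):
// C[d0, d1] += A[d0, d2] * B[d1, d2], with d2 = 4096 the reduction dimension.
%7 = linalg.generic {
    indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>,
                     affine_map<(d0, d1, d2) -> (d1, d2)>,
                     affine_map<(d0, d1, d2) -> (d0, d1)>],
    iterator_types = ["parallel", "parallel", "reduction"]
  } ins(%3, %4 : tensor<2x4096xf16>, tensor<32000x4096xf16>) outs(%6 : tensor<2x32000xf16>) {
^bb0(%a: f16, %b: f16, %acc: f16):
  %mul = arith.mulf %a, %b : f16
  %add = arith.addf %mul, %acc : f16
  linalg.yield %add : f16
} -> tensor<2x32000xf16>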
These need to go through vector distribute. None of the tests in config_matvec can be deleted.
OK. I will push on PRs like #21430 to get this lit test (and others) working with vector distribute.
%5 = linalg.generic {
    indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d0)>],
    iterator_types = ["parallel", "reduction"]
  } ins(%2 : tensor<2x512xf32>) outs(%4 : tensor<2xf32>) {
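// (Editorial sketch, not part of the original test: the generic's body is cut off
// in this excerpt; for a sum reduction over the 512 dimension it would presumably
// continue along these lines.)
^bb0(%in: f32, %out: f32):
  %sum = arith.addf %in, %out : f32
  linalg.yield %sum : f32
} -> tensor<2xf32>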
Instead of deleting these tests, they need to be moved to the pipeline_vector_distribute file and checked to see whether we can handle them with vector distribute.
This PR is to see what the fallout is in the e2e CI when the warp reduction pipeline is removed.