[wip][llvmgpu][codegen] Remove warp reduction pipeline #21433


Draft · newling wants to merge 6 commits into main from warp_reduction_remove_and_see_fallout
Conversation

@newling (Contributor) commented on Jul 21, 2025

This PR removes the warp reduction pipeline to see what the fallout is in the e2e CI.
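
For context, the dispatches that currently select the warp reduction pipeline are reduction-heavy ones such as matvec / skinny matmul, softmax, and plain reductions (the lit tests touched below). A minimal, illustrative example of such a dispatch, with shapes borrowed from the config_matvec test further down; the function name and wrapper are made up and not taken from this PR:

// Illustrative only: a skinny matmul / matvec-like dispatch of the kind the
// warp reduction pipeline currently targets. Shapes follow the config_matvec test.
func.func @skinny_matmul_example(%lhs: tensor<2x4096xf16>, %rhs: tensor<32000x4096xf16>) -> tensor<2x32000xf16> {
  %cst = arith.constant 0.0 : f16
  %empty = tensor.empty() : tensor<2x32000xf16>
  %fill = linalg.fill ins(%cst : f16) outs(%empty : tensor<2x32000xf16>) -> tensor<2x32000xf16>
  %res = linalg.matmul_transpose_b ins(%lhs, %rhs : tensor<2x4096xf16>, tensor<32000x4096xf16>)
           outs(%fill : tensor<2x32000xf16>) -> tensor<2x32000xf16>
  return %res : tensor<2x32000xf16>
}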

@newling changed the title from "[wip][[llvmgpu]codegen] Remove warp reduction pipeline" to "[wip][llvmgpu][codegen] Remove warp reduction pipeline" on Jul 21, 2025
newling added 5 commits July 21, 2025 15:10
@newling force-pushed the warp_reduction_remove_and_see_fallout branch from aa6508f to e908914 on Jul 21, 2025 at 22:44
@@ -207,7 +207,8 @@ func.func @main2(%arg0: tensor<2x130x130x4xf16>, %arg1: tensor<3x3x4x320xf16>, %

// -----

-// We want to skip padding skinny matmul cases, since warpReduction is more performant for it.
+// We want to skip padding skinny matmul cases, since WarpReduction is more performant for it.
+// TODO(newling) reconsider this, as WarpReduction is removed.
Contributor:
We don't need to reconsider this; warp reduction is still a "strategy", it's just that the LLVMGPUVectorDistribute pipeline can handle it.
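
Here "warp reduction as a strategy" means lowering the cross-lane part of a reduction to subgroup/warp primitives. A minimal sketch of the kind of warp-level op such a strategy typically bottoms out on, written with the upstream MLIR gpu dialect and made-up SSA names, not code from this PR:

// Each lane holds a partial value; one subgroup primitive combines them
// across the warp/subgroup. Illustrative only.
%warp_sum = gpu.subgroup_reduce add %lane_partial : (f32) -> f32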

#hal.pipeline.binding<storage_buffer>,
#hal.pipeline.binding<storage_buffer>
]>
func.func @dynamic_softmax() {
Contributor:
We need this test to go through vector distribute.

Contributor Author (@newling):
I have a (not yet pushed) branch which can do this now. I tested numerics, and they match the numerics on ToM with WarpReduction... which are incorrect. Created an issue to track this: #21468
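
The @dynamic_softmax body is not shown in this excerpt; for reference, a dynamically shaped softmax written with the linalg named op, with element type and rank assumed for illustration (not the actual test contents), looks roughly like:

// Illustrative only: softmax over the trailing dimension of a dynamically shaped input.
func.func @softmax_dyn_example(%in: tensor<?x?xf16>) -> tensor<?x?xf16> {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %d0 = tensor.dim %in, %c0 : tensor<?x?xf16>
  %d1 = tensor.dim %in, %c1 : tensor<?x?xf16>
  %empty = tensor.empty(%d0, %d1) : tensor<?x?xf16>
  %out = linalg.softmax dimension(1)
           ins(%in : tensor<?x?xf16>) outs(%empty : tensor<?x?xf16>) -> tensor<?x?xf16>
  return %out : tensor<?x?xf16>
}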


// -----

func.func @dynamic_parallel_dims(%dynsize : index, %input : tensor<4x?x4096xf16>) -> tensor<4x?xf32> {
Contributor:
None of these tests should be deleted; they need to go through vector distribute now.

%4 = hal.interface.constant.load layout(#pipeline_layout) ordinal(4) : i32
%5 = arith.index_castui %0 : i32 to index
%6 = arith.index_castui %1 : i32 to index
%7 = arith.index_castui %2 : i32 to index
Contributor:
We need to send this through vector distribute; this cannot be deleted.

%4 = iree_tensor_ext.dispatch.tensor.load %1, offsets = [0, 0], sizes = [32000, 4096], strides = [1, 1] : !iree_tensor_ext.dispatch.tensor<readonly:tensor<32000x4096xf16>> -> tensor<32000x4096xf16>
%5 = tensor.empty() : tensor<2x32000xf16>
%6 = linalg.fill ins(%cst : f16) outs(%5 : tensor<2x32000xf16>) -> tensor<2x32000xf16>
%7 = linalg.matmul_transpose_b ins(%3, %4 : tensor<2x4096xf16>, tensor<32000x4096xf16>) outs(%6 : tensor<2x32000xf16>) -> tensor<2x32000xf16>
Contributor:
These need to go through vector distribute. None of the tests in config_matvec can be deleted.

Contributor Author (@newling):
Ok. I will push on PRs like #21430 to get this and the other lit tests working with vector distribute.

%5 = linalg.generic {
indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d0)>],
iterator_types = ["parallel", "reduction"]
} ins(%2 : tensor<2x512xf32>) outs(%4 : tensor<2xf32>) {
Contributor:
Instead of deleting these tests, they should be moved to the pipeline_vector_distribute file and checked to see whether we can do them with vector distribute.
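
For reference, the linalg.generic excerpt above cuts off before the region body; a self-contained version of that reduction, assuming an f32 add reduction with a zero-filled init since the body and the producers of %2 and %4 are not shown, would look like:

// Assumed completion of the excerpt above; the reduction kind is not visible
// in the diff, so arith.addf is used for illustration.
func.func @reduce_2x512_example(%in: tensor<2x512xf32>) -> tensor<2xf32> {
  %cst = arith.constant 0.0 : f32
  %empty = tensor.empty() : tensor<2xf32>
  %init = linalg.fill ins(%cst : f32) outs(%empty : tensor<2xf32>) -> tensor<2xf32>
  %res = linalg.generic {
    indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d0)>],
    iterator_types = ["parallel", "reduction"]
  } ins(%in : tensor<2x512xf32>) outs(%init : tensor<2xf32>) {
  ^bb0(%x: f32, %acc: f32):
    %sum = arith.addf %x, %acc : f32
    linalg.yield %sum : f32
  } -> tensor<2xf32>
  return %res : tensor<2xf32>
}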
