-
Notifications
You must be signed in to change notification settings - Fork 730
Pattern to hoist pack unpack ops from scf.for op #21431
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Pattern to hoist pack unpack ops from scf.for op #21431
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly looks good, just some suggestions to harden the pattern and testing
compiler/src/iree/compiler/Codegen/Common/GPU/GPUPackToIntrinsics.cpp
Outdated
Show resolved
Hide resolved
compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_pack_to_instrinsics.mlir
Outdated
Show resolved
Hide resolved
compiler/src/iree/compiler/Codegen/Common/GPU/test/gpu_pack_to_instrinsics.mlir
Outdated
Show resolved
Hide resolved
bfd1ee6
to
9b2603a
Compare
compiler/src/iree/compiler/Codegen/Common/GPU/GPUPackToIntrinsics.cpp
Outdated
Show resolved
Hide resolved
compiler/src/iree/compiler/Codegen/Common/GPU/GPUPackToIntrinsics.cpp
Outdated
Show resolved
Hide resolved
compiler/src/iree/compiler/Codegen/Common/GPU/GPUPackToIntrinsics.cpp
Outdated
Show resolved
Hide resolved
compiler/src/iree/compiler/Codegen/Common/GPU/GPUPackToIntrinsics.cpp
Outdated
Show resolved
Hide resolved
8d43595
to
144417a
Compare
Signed-off-by: Yash Deshpande <ydeshpan@amd.com>
144417a
to
19a1783
Compare
for (auto operand : yieldOp.getOperands()) { | ||
unpackOp = operand.getDefiningOp<linalg::UnPackOp>(); | ||
if (!unpackOp) { | ||
tiedResultIdx++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can use for for (auto [tiedResultIdx, operand] = llvm::enumerate(yieldOp.getOperands())
, this way you dont need to increment tiedResultIdx at different points.
#map = affine_map<(d0, d1, d2) -> (d0, d2)> | ||
#map1 = affine_map<(d0, d1, d2) -> (d2, d1)> | ||
#map2 = affine_map<(d0, d1, d2) -> (d0, d1)> | ||
func.func @hoist_pack_unpack(%arg0 : tensor<4x8x16x4xf32>, %arg1 : tensor<8x8x16x4xf32>, %arg2 : tensor<64x127xf32>) -> tensor<64x127xf32>{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this test, testing something that the later test with two pack/unpack pairs is not? If it is not, you can remove this one.
#map = affine_map<(d0, d1, d2) -> (d0, d2)> | ||
#map1 = affine_map<(d0, d1, d2) -> (d2, d1)> | ||
#map2 = affine_map<(d0, d1, d2) -> (d0, d1)> | ||
func.func @no_hoist_pack_unpack(%arg0 : tensor<4x8x16x4xf32>, %arg1 : tensor<8x4x16x4xf32>, %arg2 : tensor<64x64xf32>) -> tensor<64x64xf32>{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make the test names more descriptive, e.g no_hoist_pack_unpack_mismatched
no_hoist_pack_unpack_mismatched_otherusers
%c0 = arith.constant 0 : index | ||
%c32 = arith.constant 32 : index | ||
%padding_value = arith.constant 0.000000e+00 : f32 | ||
%1, %result0 = scf.for %arg3 = %c0 to %c512 step %c32 iter_args(%arg5 = %input, %arg4 = %arg2) -> (tensor<64x128xf32>, tensor<64x127xf32>) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe do another argument in the middle that doesn't have a pack/unpack on it?
linalg::PackOp packOp; | ||
int64_t tiedResultIdx = 0; | ||
|
||
// Iterate through all operands of yieldOp & hoist each available |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this comment is misleading, you hoist the first legal pack/unpack pair?
This pattern looks for a
scf.yield
op within anscf.for
, whose producer is alinalg.unpack
op & hoist both the unpack & the corresponding pack ops outside thescf.for
within GPUPackToIntrinsicsPass.This pattern is helpful if we want to do reduction tiling before packToIntrinsics.