Fix FX Graph Cache issue in register_da8w4_concat_linear_cpu_pass #2907
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2907
Note: Links to docs will display an error until the docs builds have been completed. ✅ No failures as of commit 0bf3410 with merge base bc2c83e. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Hi @cyxlily! Thank you for your pull request and welcome to our community. Action Required: In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you. Process: In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations, and the pull request will be tagged. If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!
Force-pushed from bb4f148 to 62d3689
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!
Thanks for the fix!
Force-pushed from 62d3689 to 535f344
Fix a bug where the FX Graph Cache was being bypassed when using register_da8w4_concat_linear_cpu_pass, preventing cache hits on subsequent model runs. Implement DA8W4ConcatLinearCPUPass, which inherits from CustomGraphPass, and ensure it can be serialized and cached as an FX graph properly. Add a unit test: when the FX graph is saved, the fxgraph_cache_bypass counter should remain at 0, confirming that the custom pass is no longer rejected by the cache system. Signed-off-by: Cui, Yuxin <yuxin.cui@intel.com>
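For readers following along, here is a minimal sketch of the pattern the description refers to, assuming the CustomGraphPass and get_hash_for_files helpers from torch._inductor.custom_graph_pass in recent PyTorch releases. The class name is hypothetical and the pass body is elided, so this illustrates only the caching hook, not the actual torchao implementation.

import torch
from torch._inductor.custom_graph_pass import CustomGraphPass, get_hash_for_files

class ConcatLinearPassSketch(CustomGraphPass):  # hypothetical name, not the torchao class
    def __call__(self, graph: torch.fx.Graph) -> None:
        # The real pass rewrites the graph here (e.g. concatenating the weights
        # of parallel da8w4 linear ops); omitted in this sketch.
        pass

    def uuid(self):
        # A stable identity for the pass implementation. The FX graph cache can
        # fold this into its cache key; passes it cannot hash force a cache bypass.
        return get_hash_for_files((__file__,))

# Illustrative registration as an Inductor post-grad pass:
# torch._inductor.config.post_grad_custom_post_pass = ConcatLinearPassSketch()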
It's disabled in #2623 because of internal failures, so we'll need to import this to make sure there are no internal failures, I think.
@jerryzh168 has imported this pull request. If you are a Meta employee, you can view this in D81553661.
Thanks
Hi @jerryzh168 May I know the results of the internal checks? Thanks.
test/quantization/test_da8w4_cpu.py (outdated)
# The trailing "(" is to avoid matching the op in the comment
assert code[0].count("torch.ops.torchao.da8w4_linear_cpu.default(") == 1

# ensure the custom DA8W4ConcatLinearCPUPass is properly cached as fxgraph
how does the added test make sure DA8W4ConcatLinearCPUPass is properly cached?
It checks the counter. Do you think the comment here should be improved? Thanks.
yeah, probably some explanation would be helpful on how the cache works and why the counters would show that DA8W4ConcatLinearCPUPass is cached; I don't quite get it right now
How about changing the comment to "# ensure the custom DA8W4ConcatLinearCPUPass is not bypassed when saving as fxgraph"?
Still not clear what this means. Can you add a more detailed explanation of how the caching works by default, what the desired behavior for caching is, and how the test makes sure that behavior occurs?
Just adding the content of the summary to the comment might be enough, and also make sure the variable naming is clearer.
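As background for this thread, a minimal sketch of the counter check under discussion, assuming the counters dict from torch._dynamo.utils that Inductor's FX graph cache updates; the toy module and input are placeholders, not the real quantized model from the test.

import torch
from torch._dynamo.utils import counters

# Toy module standing in for the quantized model under test.
class TinyLinear(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 16)

    def forward(self, x):
        return self.linear(x)

counters.clear()
compiled = torch.compile(TinyLinear())
compiled(torch.randn(2, 16))  # first run: Inductor tries to save the compiled FX graph to its cache
# If Inductor cannot hash a registered custom pass, it bypasses the FX graph cache
# and increments this counter; with a CustomGraphPass-based pass it should stay at 0.
assert counters["inductor"]["fxgraph_cache_bypass"] == 0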
Thanks. It's updated.
I checked with @henrylhtsang and he says that this should be fine.
Modify the test description for test_da8w4_cpu. Signed-off-by: Cui, Yuxin <yuxin.cui@intel.com>
Hi @jerryzh168 How is the internal check going? BTW, the PR has been updated since your last import. Thanks.
test/quantization/test_da8w4_cpu.py (outdated)
# The trailing "(" is to avoid matching the op in the comment
assert code[0].count("torch.ops.torchao.da8w4_linear_cpu.default(") == 1

# ensure the custom DA8W4ConcatLinearCPUPass is not bypassed when saving as fxgraph
can you update the comments to make it clearer what is being tested? why does this test the cache bypass?
Thanks. It's updated.
test/quantization/test_da8w4_cpu.py (outdated)
)
assert torch.allclose(y, y_ref)

disable_fxgraph_cache_bypass = counters["inductor"]["fxgraph_cache_bypass"]
I guess the variable naming is a bit confusing; I understand this is making sure that, with the new pass, the FX graph cache is not bypassing anything now.
Can merge after making the comments clearer.
Signed-off-by: Cui, Yuxin <yuxin.cui@intel.com>
@pytorchbot merge
Merge failed. Reason: 1 mandatory check(s) are pending/not yet run.
Dig deeper by viewing the pending checks on hud
@jerryzh168 Could you help to merge?
@jerryzh168 has imported this pull request. If you are a Meta employee, you can view this in D81553661.
Sorry, I don't know how to merge this because I can't make the Meta internal-only changes check pass. Can you create a new PR?
Merged now, thanks @cyxlily @Xia-Weiwen
@cyxlily @Xia-Weiwen Is it possible to do the Inductor changes only on your side? This PR has caused problems twice already; I don't think it's worth it to try to land it again.
Hi @jerryzh168 May I know what the new issue is?
Hi @jerryzh168 I feel it is OK for us to enable the fusion pass in our script. Please feel free to revert it if needed. However, we are just curious about the issue you encountered and whether it means Inductor fusion passes are not so friendly to extend, or whether the CI should be improved somehow.
Similar issues as before, I think: the cache hit count becomes 0 after this PR.
Yeah, will let you know if there is any followup afterwards.
Fix a bug where the FX Graph Cache was being bypassed when using register_da8w4_concat_linear_cpu_pass, preventing cache hits on subsequent model runs.
Implement DA8W4ConcatLinearCPUPass, which inherits from CustomGraphPass, and ensure it can be serialized and cached as an FX graph properly. Add a unit test: when the FX graph is saved, the fxgraph_cache_bypass counter should remain at 0, confirming that the custom pass is no longer rejected by the cache system.
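A hedged sketch of how the cache-hit behavior mentioned in this thread can be checked across recompilations, assuming the fx_graph_cache config flag, the fxgraph_cache_hit counter, and torch._dynamo.reset() behave as in current PyTorch; fn and x are placeholders, not code from this PR.

import torch
import torch._inductor.config as inductor_config
from torch._dynamo.utils import counters

inductor_config.fx_graph_cache = True  # make sure the FX graph cache is enabled

def fn(x):
    return torch.nn.functional.relu(x) + 1

x = torch.randn(8)

counters.clear()
torch.compile(fn)(x)  # first compile: a cache miss is fine, but there should be no bypass
assert counters["inductor"]["fxgraph_cache_bypass"] == 0

torch._dynamo.reset()  # drop the in-memory compiled code so the next call recompiles
torch.compile(fn)(x)  # recompilation should be served from the FX graph cache
assert counters["inductor"]["fxgraph_cache_hit"] >= 1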