Fix FX Graph Cache issue in register_da8w4_concat_linear_cpu_pass #2907
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2907
Note: Links to docs will display an error until the docs builds have been completed. ✅ No failures as of commit 0bf3410 with merge base bc2c83e. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Hi @cyxlily! Thank you for your pull request and welcome to our community. Action Required: In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you. Process: In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations, and the pull request will be tagged. If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!
Force-pushed from bb4f148 to 62d3689
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!
Thanks for the fix!
Force-pushed from 62d3689 to 535f344
Fix a bug where the FX Graph Cache was being bypassed when using register_da8w4_concat_linear_cpu_pass, preventing cache hits on subsequent model runs. Implement DA8W4ConcatLinearCPUPass, which inherits from CustomGraphPass, and ensure it can be serialized and cached as an FX graph properly. Add a unit test: when the FX graph is saved, the fxgraph_cache_bypass counter should remain at 0, confirming that the custom pass is no longer rejected by the cache system. Signed-off-by: Cui, Yuxin <yuxin.cui@intel.com>
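For readers following along, here is a minimal sketch of the pattern the description refers to, assuming the CustomGraphPass and get_hash_for_files helpers from torch._inductor.custom_graph_pass in recent PyTorch releases. The class name is hypothetical and the pass body is elided, so this illustrates only the caching hook, not the actual torchao implementation.

import torch
from torch._inductor.custom_graph_pass import CustomGraphPass, get_hash_for_files

class ConcatLinearPassSketch(CustomGraphPass):  # hypothetical name, not the torchao class
    def __call__(self, graph: torch.fx.Graph) -> None:
        # The real pass rewrites the graph here (e.g. concatenating the weights
        # of parallel da8w4 linear ops); omitted in this sketch.
        pass

    def uuid(self):
        # A stable identity for the pass implementation. The FX graph cache can
        # fold this into its cache key; passes it cannot hash force a cache bypass.
        return get_hash_for_files((__file__,))

# Illustrative registration as an Inductor post-grad pass:
# torch._inductor.config.post_grad_custom_post_pass = ConcatLinearPassSketch()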
It's disabled in #2623 because of internal failures, so we'll need to import this to make sure there are no internal failures, I think.
@jerryzh168 has imported this pull request. If you are a Meta employee, you can view this in D81553661.
Thanks
Hi @jerryzh168 May I know the results of the internal checks? Thanks.
test/quantization/test_da8w4_cpu.py (outdated)
# The trailing "(" is to avoid matching the op in the comment
assert code[0].count("torch.ops.torchao.da8w4_linear_cpu.default(") == 1

# ensure the custom DA8W4ConcatLinearCPUPass is properly cached as fxgraph
how does the added test make sure DA8W4ConcatLinearCPUPass is properly cached?
It checks the counter. Do you think the comment here should be improved? Thanks.
yeah, probably some explanation would be helpful on how the cache works and why the counters would show that DA8W4ConcatLinearCPUPass is cached; I don't quite get it right now
How about changing the comment to "# ensure the custom DA8W4ConcatLinearCPUPass is not bypassed when saving as fxgraph"?
Still not clear what this means. Can you add a more detailed explanation of how the caching works by default, what the desired behavior for caching is, and how the test makes sure that behavior occurs?
Just adding the content of the summary to the comment might be enough, and also make sure the variable naming is clearer.
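As background for this thread, a minimal sketch of the counter check under discussion, assuming the counters dict from torch._dynamo.utils that Inductor's FX graph cache updates; the toy module and input are placeholders, not the real quantized model from the test.

import torch
from torch._dynamo.utils import counters

# Toy module standing in for the quantized model under test.
class TinyLinear(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 16)

    def forward(self, x):
        return self.linear(x)

counters.clear()
compiled = torch.compile(TinyLinear())
compiled(torch.randn(2, 16))  # first run: Inductor tries to save the compiled FX graph to its cache
# If Inductor cannot hash a registered custom pass, it bypasses the FX graph cache
# and increments this counter; with a CustomGraphPass-based pass it should stay at 0.
assert counters["inductor"]["fxgraph_cache_bypass"] == 0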
Thanks. It's updated.
I checked with @henrylhtsang and he says that this should be fine.
Modify the test description for test_da8w4_cpu. Signed-off-by: Cui, Yuxin <yuxin.cui@intel.com>
Hi @jerryzh168 How is the internal check going? BTW, the PR has been updated since your last import. Thanks.
test/quantization/test_da8w4_cpu.py (outdated)
# The trailing "(" is to avoid matching the op in the comment
assert code[0].count("torch.ops.torchao.da8w4_linear_cpu.default(") == 1

# ensure the custom DA8W4ConcatLinearCPUPass is not bypassed when saving as fxgraph
can you update the comments to make it clearer what is being tested? why does this test the cache bypass?
Thanks. It's updated.
test/quantization/test_da8w4_cpu.py (outdated)
)
assert torch.allclose(y, y_ref)

disable_fxgraph_cache_bypass = counters["inductor"]["fxgraph_cache_bypass"]
I guess the variable naming is a bit confusing; I understand this is making sure that, with the new pass, the FX graph cache is not bypassing anything now.
Can merge after making the comments clearer.
Signed-off-by: Cui, Yuxin <yuxin.cui@intel.com>
@pytorchbot merge
Merge failed. Reason: 1 mandatory check(s) are pending/not yet run.
Dig deeper by viewing the pending checks on hud
@jerryzh168 Could you help to merge?
@jerryzh168 has imported this pull request. If you are a Meta employee, you can view this in D81553661.
Sorry, I don't know how to merge this because I can't make the Meta internal-only changes check pass. Can you create a new PR?
Merged now, thanks @cyxlily @Xia-Weiwen
@cyxlily @Xia-Weiwen Is it possible to do the Inductor changes only on your side? This PR has caused problems twice already; I don't think it's worth it to try to land it again.
Hi @jerryzh168 May I know what the new issue is?
Hi @jerryzh168 I feel it is OK for us to enable the fusion pass in our script. Please feel free to revert it if needed. However, we are just curious about the issue you encountered and whether it means Inductor fusion passes are not so friendly to extend, or whether the CI should be improved somehow.
Similar issues as before, I think: the cache hit count becomes 0 after this PR.
Yeah, will let you know if there is any followup afterwards.
Fix a bug where the FX Graph Cache was being bypassed when using register_da8w4_concat_linear_cpu_pass, preventing cache hits on subsequent model runs.
Implement DA8W4ConcatLinearCPUPass, which inherits from CustomGraphPass, and ensure it can be serialized and cached as an FX graph properly. Add a unit test: when the FX graph is saved, the fxgraph_cache_bypass counter should remain at 0, confirming that the custom pass is no longer rejected by the cache system.
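A hedged sketch of how the cache-hit behavior mentioned in this thread can be checked across recompilations, assuming the fx_graph_cache config flag, the fxgraph_cache_hit counter, and torch._dynamo.reset() behave as in current PyTorch; fn and x are placeholders, not code from this PR.

import torch
import torch._inductor.config as inductor_config
from torch._dynamo.utils import counters

inductor_config.fx_graph_cache = True  # make sure the FX graph cache is enabled

def fn(x):
    return torch.nn.functional.relu(x) + 1

x = torch.randn(8)

counters.clear()
torch.compile(fn)(x)  # first compile: a cache miss is fine, but there should be no bypass
assert counters["inductor"]["fxgraph_cache_bypass"] == 0

torch._dynamo.reset()  # drop the in-memory compiled code so the next call recompiles
torch.compile(fn)(x)  # recompilation should be served from the FX graph cache
assert counters["inductor"]["fxgraph_cache_hit"] >= 1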