Conversation

cyxlily (Contributor) commented Aug 29, 2025

Fix a bug where the FX Graph Cache was bypassed when using register_da8w4_concat_linear_cpu_pass, preventing cache hits on subsequent model runs.
Implement DA8W4ConcatLinearCPUPass, which inherits from CustomGraphPass, so the pass can be serialized and the compiled FX graph saved to the cache properly. Add a unit test: when the FX graph is saved, the fxgraph_cache_bypass counter should remain at 0, confirming that the custom pass is no longer being rejected by the cache system.
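For context, Inductor only caches a compiled FX graph if every registered custom post-grad pass can contribute a stable hash to the cache key; a plain function cannot, so the cache is bypassed. Below is a minimal sketch of the pattern this PR adopts (CustomGraphPass and get_hash_for_files are real Inductor APIs; the pass body shown is an illustrative assumption, not the actual torchao code):

import torch
from torch._inductor.custom_graph_pass import CustomGraphPass, get_hash_for_files

class DA8W4ConcatLinearCPUPass(CustomGraphPass):
    def __call__(self, graph: torch.fx.Graph) -> None:
        # Rewrite the post-grad graph in place, e.g. fuse parallel da8w4
        # linears sharing an input into one concatenated linear (elided here).
        ...

    def uuid(self) -> bytes:
        # Hash the file defining the pass so the FX graph cache key changes
        # whenever the pass implementation changes, instead of forcing a bypass.
        return get_hash_for_files((__file__,))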

pytorch-bot bot commented Aug 29, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2907

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 0bf3410 with merge base bc2c83e:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla bot commented Aug 29, 2025

Hi @cyxlily!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

@Xia-Weiwen added the "topic: not user facing" label Aug 29, 2025
@cyxlily force-pushed the DA8W4ConcatLinearCPUPass branch from bb4f148 to 62d3689 on August 29, 2025 09:05
meta-cla bot commented Aug 29, 2025

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@meta-cla bot added the "CLA Signed" label Aug 29, 2025
@Xia-Weiwen (Collaborator) left a comment:

Thanks for the fix!

Fix the bug that the FX Graph Cache was being bypassed when using the register_da8w4_concat_linear_cpu_pass, preventing cache hits on subsequent model runs. Implement DA8W4ConcatLinearCPUPass that inherits from CustomGraphPass. Ensure it can be serialized and saved as fxgraph properly. Add the unit test. When saving fxgraph, the fxgraph_cache_bypass should remain at 0, confirming that the custom pass is no longer being rejected by the cache system.

Signed-off-by: Cui, Yuxin <yuxin.cui@intel.com>
@jerryzh168 (Contributor) commented:

it's disabled in #2623 because of internal failures, so we'll need to import this to make sure there are no internal failures, I think

@facebook-github-bot (Contributor) commented:

@jerryzh168 has imported this pull request. If you are a Meta employee, you can view this in D81553661.

@Xia-Weiwen (Collaborator) commented:

> it's disabled in #2623 because of internal failures, so we'll need to import this to make sure there are no internal failures, I think

Thanks

@Xia-Weiwen (Collaborator) commented:

Hi @jerryzh168 May I know the results of internal checks? Thanks.

# The trailing "(" is to avoid matching the op in the comment
assert code[0].count("torch.ops.torchao.da8w4_linear_cpu.default(") == 1

# ensure the custom DA8W4ConcatLinearCPUPass is properly cached as fxgraph
@jerryzh168 (Contributor) commented:

how does the added test make sure DA8W4ConcatLinearCPUPass is properly cached?

@Xia-Weiwen (Collaborator) commented:

It checks the counter. Do you think the comment here should be improved? Thanks.

@jerryzh168 (Contributor) commented:

yeah, probably some explanation would be helpful on how the cache works and why the counters would show that DA8W4ConcatLinearCPUPass is cached; I don't quite get it right now

@cyxlily (Contributor, Author) commented:

How about changing the comment to "# ensure the custom DA8W4ConcatLinearCPUPass is not bypassed when saving as fxgraph"?

@jerryzh168 (Contributor) commented Sep 11, 2025:

still not clear what this means; can you add a more detailed explanation of how the caching works by default, what the desired caching behavior is, and how the test makes sure the desired behavior occurs?

just adding the content of the summary to the comment might be enough, and also make sure the variable naming is clearer

@cyxlily (Contributor, Author) commented:

Thanks. It's updated.
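For illustration, the verification pattern under discussion looks roughly like this (a sketch only; the model, input, and surrounding setup are assumptions, the real assertions live in test_da8w4_cpu):

from torch._dynamo.utils import counters

counters.clear()
y = compiled_model(x)  # hypothetical torch.compile-d model and input
# If the registered custom pass were unhashable, Inductor would refuse to
# cache the compiled graph and increment this counter. Staying at 0 means
# the FX graph was saved to the cache instead of being bypassed.
assert counters["inductor"]["fxgraph_cache_bypass"] == 0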

@jerryzh168 (Contributor) commented:

I checked with @henrylhtsang and he says that this should be fine

@cyxlily requested a review from @jerryzh168 on September 8, 2025 01:21
Modify the test description for test_da8w4_cpu.

Signed-off-by: Cui, Yuxin <yuxin.cui@intel.com>
@Xia-Weiwen (Collaborator) commented Sep 9, 2025:

Hi @jerryzh168 How is the internal check going? BTW, the PR has been updated since your last import. Thanks.

# The trailing "(" is to avoid matching the op in the comment
assert code[0].count("torch.ops.torchao.da8w4_linear_cpu.default(") == 1

# ensure the custom DA8W4ConcatLinearCPUPass is not bypassed when saving as fxgraph
@jerryzh168 (Contributor) commented:

can you update the comments to make it clearer what is being tested? why does this test the cache bypass?

@cyxlily (Contributor, Author) commented:

Thanks. It's updated.

@jerryzh168 self-requested a review on September 11, 2025 04:15
)
assert torch.allclose(y, y_ref)

disable_fxgraph_cache_bypass = counters["inductor"]["fxgraph_cache_bypass"]
@jerryzh168 (Contributor) commented:

I guess the variable naming is a bit confusing; I understand this is making sure that, with the new pass, the FX graph cache is not bypassing anything now
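A rename along those lines might read more clearly (just a suggestion sketch, not the committed code; counters is the same torch._dynamo.utils.counters used above):

# The "num_" prefix signals this is a count read from the Inductor counters,
# not a flag that disables anything.
num_fxgraph_cache_bypasses = counters["inductor"]["fxgraph_cache_bypass"]
assert num_fxgraph_cache_bypasses == 0  # the custom pass must not force a bypass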

@jerryzh168 (Contributor) left a comment:

can merge after making the comments clearer

Signed-off-by: Cui, Yuxin <yuxin.cui@intel.com>
@cyxlily (Contributor, Author) commented Sep 12, 2025:

@pytorchbot merge

@pytorchmergebot (Collaborator) commented:

Merge failed

Reason: 1 mandatory check(s) are pending/not yet run. The first few are:

  • Facebook CLA Check

Dig deeper by viewing the pending checks on hud

Details for Dev Infra team (raised by workflow job): failing merge rule: superuser

@cyxlily (Contributor, Author) commented Sep 12, 2025:

@jerryzh168 Could you help merge this?

@jerryzh168 enabled auto-merge (squash) September 12, 2025 17:26
@jerryzh168 disabled auto-merge September 12, 2025 17:26
@facebook-github-bot (Contributor) commented:

@jerryzh168 has imported this pull request. If you are a Meta employee, you can view this in D81553661.

@jerryzh168 enabled auto-merge (squash) September 12, 2025 17:31
@jerryzh168 disabled auto-merge September 12, 2025 17:32
@jerryzh168 (Contributor) commented Sep 12, 2025:

sorry, I don't know how to merge this because I can't make the Meta internal-only changes check pass; can you create a new PR?

@jerryzh168 merged commit 045c959 into pytorch:main on Sep 12, 2025 (20 checks passed)
@jerryzh168 (Contributor) commented:

merged now, thanks @cyxlily @Xia-Weiwen

@jerryzh168 (Contributor) commented:

@cyxlily @Xia-Weiwen is it possible to do the Inductor changes only on your side? This PR has caused problems twice already; I don't think it's worth it to try to land it again

@Xia-Weiwen (Collaborator) commented:

Hi @jerryzh168 May I know what the new issue is?

@Xia-Weiwen (Collaborator) commented Oct 9, 2025:

Hi @jerryzh168 I feel it is ok for us to enable the fusion pass in our script. Please feel free to revert it if needed. However, we are just curious about the issue you met and wondering whether Inductor fusion passes are not so friendly to extend, or whether the CI should be improved somehow.

@jerryzh168 (Contributor) commented:

> Hi @jerryzh168 May I know what the new issue is?

similar issues as before, I think; the cache hit count becomes 0 after this PR

@jerryzh168 (Contributor) commented:

> Hi @jerryzh168 I feel it is ok for us to enable the fusion pass in our script. Please feel free to revert it if needed. However, we are just curious about the issue you met and wondering whether Inductor fusion passes are not so friendly to extend, or whether the CI should be improved somehow.

yeah, will let you know if there is any followup afterwards


Labels: CLA Signed, topic: not user facing
