hf integration doc page #2899
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2899
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 304f4ec with merge base f0cca99.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Thanks for the doc! I think right now there's a lot of duplicated content, probably because of the current organization. I think for simplicity we should just have a HF transformers section and a HF diffusers section, and within each section just show how to load the model, quantize it, save and reload it, and do inference on it. That way we can just show each code block once. So in summary I think a better organization is something like:
## Integration with HF transformers
- installation
- load the model, quantize, save_pretrained + push_to_hub
- reload the quantized model, inference
## Integration with HF diffusers
- same as above
## Supported quantization types
## Configuration system
## Serving with vLLM (just link to our vllm doc page)
## Safetensors support
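For reference, a minimal sketch of the load → quantize → save/push → reload flow described in the outline above, assuming transformers' TorchAoConfig integration accepts torchao config objects (model id and output paths are placeholders, not content from this PR):

```python
import torch
from transformers import AutoModelForCausalLM, TorchAoConfig
from torchao.quantization import Int8WeightOnlyConfig

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder checkpoint

# Quantize on load by passing a torchao config through TorchAoConfig.
quant_config = TorchAoConfig(quant_type=Int8WeightOnlyConfig())
quantized_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=quant_config,
)

# Save locally and/or push to the Hub. torchao checkpoints have historically
# required non-safetensors serialization (see the Safetensors support section).
quantized_model.save_pretrained("llama-3.1-8b-int8wo", safe_serialization=False)
# quantized_model.push_to_hub("your-org/llama-3.1-8b-int8wo", safe_serialization=False)

# Reload the quantized checkpoint for inference.
reloaded = AutoModelForCausalLM.from_pretrained(
    "llama-3.1-8b-int8wo", device_map="auto", torch_dtype=torch.bfloat16
)
```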
```{note}
For more information on supported quantization and sparsity configurations, see [HF-Torchao Docs](https://huggingface.co/docs/transformers/main/en/quantization/torchao).
```
This is a bit outdated now. I think a good next task will be to update it with the new fp8+int4 and fp8+fp8 configs
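As a rough illustration of what that update might look like (a sketch, not final doc content: the fp8 dynamic-activation + fp8 weight config exists in torchao.quantization, while the exact class name of the fp8 + int4 variant is an assumption to verify against the torchao API reference):

```python
from transformers import AutoModelForCausalLM, TorchAoConfig
from torchao.quantization import Float8DynamicActivationFloat8WeightConfig

# fp8 dynamic activations + fp8 weights (targets H100-class hardware).
quant_config = TorchAoConfig(quant_type=Float8DynamicActivationFloat8WeightConfig())
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",  # placeholder checkpoint
    torch_dtype="bfloat16",
    device_map="auto",
    quantization_config=quant_config,
)
# The fp8-activation + int4-weight config mentioned above would be wired in the
# same way; its exact class name may differ from any guess made here.
```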
I think we should probably have a single place that we can point people to that contains information about:
- if on A100, what are the things to try based on the workload, and what are the trade-offs
- the same for H100 and CPU

This should probably live in torchao, and both transformers and diffusers can link to torchao.
Looks great! Just need to fix the numbering a bit and this is good to go from my side.
@sayakpaul @stevhliu @jerryzh168 any thoughts from you guys?
```{note}
Example Output:

```
love this!
(serving-with-vllm)=
### 2. Serving with vLLM
Maybe there should be an inference/serving section (on the same level as "Configuration System"), where vLLM, HF transformers, and HF diffusers are 3 separate ways to do this. Right now the numbering is a bit confusing: we have 2. vLLM, 3a. HF transformers, and 3b. HF diffusers.
It would also be nice to either have a direct link to the relevant section (https://docs.pytorch.org/ao/main/torchao_vllm_integration.html#usage-examples) or just include the code snippet here so users don't have to navigate to a different page.
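For context, a serving snippet of that shape would be fairly short; a sketch (the repo id below is a placeholder for a torchao-quantized checkpoint pushed to the Hub, not an actual model from this PR):

```python
from vllm import LLM, SamplingParams

# Placeholder repo id for a checkpoint quantized with torchao and pushed to the Hub.
llm = LLM(model="your-org/your-model-int8wo")
sampling_params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["What are we having for dinner?"], sampling_params)
print(outputs[0].outputs[0].text)
```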
i'll add a link directly to the usage examples section so that there isn't duplicate code between the pages
Thanks for the doc! 🤗
```python
from diffusers import FluxPipeline, FluxTransformer2DModel, TorchAoConfig

model_id = "black-forest-labs/Flux.1-Dev"
```
Should we eventually update this example to use Int8WeightOnlyConfig? (see PR here: huggingface/diffusers#12275)
yeah I think so, probably after the PR is merged
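For reference, a version of the Flux example using the string-based quant_type that diffusers' TorchAoConfig currently accepts might look roughly like this (a sketch; once the PR linked above lands, an Int8WeightOnlyConfig object could presumably be passed instead of the string):

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, TorchAoConfig

model_id = "black-forest-labs/Flux.1-Dev"

# int8 weight-only quantization of the transformer via diffusers' TorchAoConfig.
quantization_config = TorchAoConfig("int8wo")
transformer = FluxTransformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=quantization_config,
    torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
    model_id, transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")

image = pipe(
    "A cat holding a sign that says hello world", num_inference_steps=28
).images[0]
image.save("flux_int8wo.png")
```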
Recall how we quantized models using HuggingFace Transformers in Part 1. Now we can use the model for inference.
I feel the code examples should live in the transformers and diffusers pages themselves, and here we just need to link to them.
Agreed with @jerryzh168 here. We should probably just include two basic examples (one for transformers and one for diffusers) and then provide links. This way the content stays lean, to-the-point, and up-to-date (as the HF docs are generally up-to-date about the integrations).
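For context, the transformers-side basic example being discussed here would only be a few lines, e.g. loading an already-quantized checkpoint and generating (a sketch; the repo id is a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id for a torchao-quantized checkpoint hosted on the Hub.
model_id = "your-org/llama-3.1-8b-int8wo"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("What are we having for dinner?", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```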
Looks great, thanks!
looks good, see some comments inline
Adding HuggingFace integration docs with Transformers/Diffusers per #2873