
Conversation

liangel-02 (Contributor):

Adding HuggingFace integration docs with Transformers/Diffusers per #2873


pytorch-bot bot commented Aug 28, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2899

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 304f4ec with merge base f0cca99:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla bot added the CLA Signed label Aug 28, 2025
@andrewor14 self-requested a review August 28, 2025 16:48
@andrewor14 added the topic: documentation label Aug 28, 2025

@andrewor14 (Contributor) left a comment:

Thanks for the doc! I think right now there's a lot of duplicated content, probably because of the current organization. For simplicity we should just have an HF transformers section and an HF diffusers section, and within each section show how to load the model, quantize it, save and reload it, and run inference on it. That way each code block appears only once. So in summary I think a better organization is something like the outline below (a sketch of the per-section flow follows it):

```
## Integration with HF transformers
- installation
- load the model, quantize, save_pretrained + push_to_hub
- reload the quantized model, inference
## Integration with HF diffusers
- same as above
## Supported quantization types
## Configuration system
## Serving with vLLM (just link to our vllm doc page)
## Safetensors support
```
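
As a concrete companion to the outline above, here is a minimal sketch of the transformers flow (load, quantize, save/push, reload, inference). The model id and Hub repo name are hypothetical placeholders, and the config class assumes a recent torchao release:

```python
import torch
from torchao.quantization import Int8WeightOnlyConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TorchAoConfig

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # hypothetical example model

# Load and quantize in one step: transformers applies the torchao config
# to the weights as they are loaded.
quant_config = TorchAoConfig(quant_type=Int8WeightOnlyConfig())
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=quant_config,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Save locally and/or push to the Hub. torchao tensor subclasses are not
# safetensors-compatible out of the box, hence safe_serialization=False.
model.save_pretrained("llama-3.1-8b-int8wo", safe_serialization=False)
model.push_to_hub("your-org/llama-3.1-8b-int8wo", safe_serialization=False)

# Reload the quantized checkpoint and run inference.
quantized = AutoModelForCausalLM.from_pretrained(
    "your-org/llama-3.1-8b-int8wo",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
inputs = tokenizer("What is quantization?", return_tensors="pt").to(quantized.device)
print(tokenizer.decode(quantized.generate(**inputs, max_new_tokens=32)[0]))
```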

```{note}
For more information on supported quantization and sparsity configurations, see [HF-Torchao Docs](https://huggingface.co/docs/transformers/main/en/quantization/torchao).
```

A Contributor commented:

This is a bit outdated now. I think a good next task will be to update it with the new fp8+int4 and fp8+fp8 configs
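
For context, wiring those up in transformers would presumably look something like this (a hedged sketch: the config class names below are from recent torchao releases, and the exact spellings and defaults may differ):

```python
from torchao.quantization import (
    Float8DynamicActivationFloat8WeightConfig,  # fp8 activations + fp8 weights
    Float8DynamicActivationInt4WeightConfig,    # fp8 activations + int4 weights
)
from transformers import TorchAoConfig

# fp8 + fp8: dynamic float8 activations with float8 weights
fp8_fp8_config = TorchAoConfig(quant_type=Float8DynamicActivationFloat8WeightConfig())

# fp8 + int4: dynamic float8 activations with int4 weights
fp8_int4_config = TorchAoConfig(quant_type=Float8DynamicActivationInt4WeightConfig())
```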

@jerryzh168 (Contributor) commented Sep 11, 2025:

I think we should probably have a single place that we can point people to that contains information about:

- if on A100, what to try based on the workload, and what the trade-offs are
- same for H100 and CPU

This should probably live in torchao and both transformers and diffusers can link to torchao

@liangel-02 marked this pull request as ready for review September 9, 2025 15:19

@andrewor14 (Contributor) left a comment:

Looks great! Just need to fix the numbering a bit and this is good to go from my side.

@sayakpaul @stevhliu @jerryzh168 any thoughts from you guys?


```{note}
Example Output:
![alt text](output.png "Model Output")
```

A Contributor commented:

love this!


(serving-with-vllm)=
### 2. Serving with VLLM

A Contributor commented:

Maybe there should be an inference/serving section (on the same level as "Configuration System"), where vLLM, HF transformers, and HF diffusers are 3 separate ways to do this. Right now the numbering is a bit confusing, we have 2. vLLM, 3a. HF transformers, and 3b. HF diffusers.

Another comment:

It would also be nice to either have a direct link to the relevant section (https://docs.pytorch.org/ao/main/torchao_vllm_integration.html#usage-examples) or just include the code snippet here so users don't have to navigate to a different page.

liangel-02 (Contributor, Author) replied:

I'll add a link directly to the usage examples section so that there isn't duplicate code between the pages.
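
For reference, the linked usage examples boil down to something like the following (a hedged sketch using vLLM's offline LLM API and the hypothetical checkpoint name from earlier; the torchao vLLM page remains the authoritative source):

```python
from vllm import LLM, SamplingParams

# Hypothetical torchao-quantized checkpoint pushed to the Hub; vLLM is
# expected to pick up the quantization settings from the checkpoint config.
llm = LLM(model="your-org/llama-3.1-8b-int8wo")
params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["What is quantization?"], params)
print(outputs[0].outputs[0].text)
```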

@stevhliu left a comment:

Thanks for the doc! 🤗

```python
from diffusers import FluxPipeline, FluxTransformer2DModel, TorchAoConfig

model_id = "black-forest-labs/Flux.1-Dev"
```


Should we eventually update this example to use Int8WeightOnlyConfig? (see PR here huggingface/diffusers#12275)

A Contributor replied:

yeah I think so, probably after the PR is merged
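
Until then, a minimal sketch of the full example as currently documented, using the "int8wo" string alias that the linked PR would replace with torchao's Int8WeightOnlyConfig (prompt and device placement are illustrative):

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, TorchAoConfig

model_id = "black-forest-labs/Flux.1-Dev"

# "int8wo" is the current string alias; the linked diffusers PR would swap
# this for torchao's Int8WeightOnlyConfig object.
quantization_config = TorchAoConfig("int8wo")
transformer = FluxTransformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=quantization_config,
    torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
    model_id, transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")

image = pipe("A cat holding a sign that says hello", num_inference_steps=28).images[0]
image.save("output.png")
```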


Recall how we quantized models using HuggingFace Transformers in Part 1. Now we can use the model for inference.

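The quoted code block is truncated in this view; a minimal sketch of the inference step it refers to might look like this (hypothetical checkpoint name, assuming a model quantized and saved as in Part 1):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoint produced by the Part 1 quantize-and-save flow.
checkpoint = "your-org/llama-3.1-8b-int8wo"

model = AutoModelForCausalLM.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

inputs = tokenizer("What is quantization?", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```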

@jerryzh168 (Contributor) commented:

I feel the code examples should live in the transformers and diffusers pages themselves, and here we just need to link to them.

A Contributor replied:

Agreed with @jerryzh168 here.

We should probably just include two basic examples (one for transformers and one for diffusers) and then provide links.

This way the content stays lean, to-the-point, and up-to-date (as the HF docs are generally up-to-date about the integrations).

@liangel-02 force-pushed the hf_integration_docs branch 3 times, most recently from bf8ecff to b45a76a, September 12, 2025 16:30

@andrewor14 (Contributor) left a comment:

Looks great, thanks!

@jerryzh168 (Contributor) left a comment:

Looks good, see some comments inline.

@liangel-02 merged commit cc65dc5 into main Sep 12, 2025
21 checks passed
@liangel-02 deleted the hf_integration_docs branch September 12, 2025 23:15