hf integration doc page #2899
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2899
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 304f4ec with merge base f0cca99.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Thanks for the doc! I think right now there's a lot of duplicated content, probably because of the current organization. I think for simplicity we should just have a HF transformers section and a HF diffusers section, and within each section just show how to load the model, quantize it, save and reload it, and do inference on it. That way we can just show each code block once. So in summary I think a better organization is something like:
## Integration with HF transformers
- installation
- load the model, quantize, save_pretrained + push_to_hub
- reload the quantized model, inference
## Integration with HF diffusers
- same as above
## Supported quantization types
## Configuration system
## Serving with vLLM (just link to our vllm doc page)
## Safetensors support
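For reference, a minimal sketch of the load → quantize → save/push → reload flow described in the outline above, assuming transformers' TorchAoConfig integration accepts torchao config objects (model id and output paths are placeholders, not content from this PR):

```python
import torch
from transformers import AutoModelForCausalLM, TorchAoConfig
from torchao.quantization import Int8WeightOnlyConfig

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder checkpoint

# Quantize on load by passing a torchao config through TorchAoConfig.
quant_config = TorchAoConfig(quant_type=Int8WeightOnlyConfig())
quantized_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=quant_config,
)

# Save locally and/or push to the Hub. torchao checkpoints have historically
# required non-safetensors serialization (see the Safetensors support section).
quantized_model.save_pretrained("llama-3.1-8b-int8wo", safe_serialization=False)
# quantized_model.push_to_hub("your-org/llama-3.1-8b-int8wo", safe_serialization=False)

# Reload the quantized checkpoint for inference.
reloaded = AutoModelForCausalLM.from_pretrained(
    "llama-3.1-8b-int8wo", device_map="auto", torch_dtype=torch.bfloat16
)
```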
```{note}
For more information on supported quantization and sparsity configurations, see [HF-Torchao Docs](https://huggingface.co/docs/transformers/main/en/quantization/torchao).
```
This is a bit outdated now. I think a good next task will be to update it with the new fp8+int4 and fp8+fp8 configs
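As a rough illustration of what that update might look like (a sketch, not final doc content: the fp8 dynamic-activation + fp8 weight config exists in torchao.quantization, while the exact class name of the fp8 + int4 variant is an assumption to verify against the torchao API reference):

```python
from transformers import AutoModelForCausalLM, TorchAoConfig
from torchao.quantization import Float8DynamicActivationFloat8WeightConfig

# fp8 dynamic activations + fp8 weights (targets H100-class hardware).
quant_config = TorchAoConfig(quant_type=Float8DynamicActivationFloat8WeightConfig())
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",  # placeholder checkpoint
    torch_dtype="bfloat16",
    device_map="auto",
    quantization_config=quant_config,
)
# The fp8-activation + int4-weight config mentioned above would be wired in the
# same way; its exact class name may differ from any guess made here.
```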
I think we should probably have a single place that we can point people to that contains information about:
- if on A100, what are the things to try based on the workload, and what are the trade-offs
- the same for H100 and CPU

This should probably live in torchao, and both transformers and diffusers can link to torchao.
Looks great! Just need to fix the numbering a bit and this is good to go from my side.
@sayakpaul @stevhliu @jerryzh168 any thoughts from you guys?
```{note}
Example Output:

```
love this!
(serving-with-vllm)=
### 2. Serving with vLLM
Maybe there should be an inference/serving section (on the same level as "Configuration System"), where vLLM, HF transformers, and HF diffusers are 3 separate ways to do this. Right now the numbering is a bit confusing: we have 2. vLLM, 3a. HF transformers, and 3b. HF diffusers.
It would also be nice to either have a direct link to the relevant section (https://docs.pytorch.org/ao/main/torchao_vllm_integration.html#usage-examples) or just include the code snippet here so users don't have to navigate to a different page.
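For context, a serving snippet of that shape would be fairly short; a sketch (the repo id below is a placeholder for a torchao-quantized checkpoint pushed to the Hub, not an actual model from this PR):

```python
from vllm import LLM, SamplingParams

# Placeholder repo id for a checkpoint quantized with torchao and pushed to the Hub.
llm = LLM(model="your-org/your-model-int8wo")
sampling_params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["What are we having for dinner?"], sampling_params)
print(outputs[0].outputs[0].text)
```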
i'll add a link directly to the usage examples section so that there isn't duplicate code between the pages
Thanks for the doc! 🤗
```python
from diffusers import FluxPipeline, FluxTransformer2DModel, TorchAoConfig

model_id = "black-forest-labs/Flux.1-Dev"
```
Should we eventually update this example to use Int8WeightOnlyConfig? (see PR here: huggingface/diffusers#12275)
yeah I think so, probably after the PR is merged
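For reference, a version of the Flux example using the string-based quant_type that diffusers' TorchAoConfig currently accepts might look roughly like this (a sketch; once the PR linked above lands, an Int8WeightOnlyConfig object could presumably be passed instead of the string):

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, TorchAoConfig

model_id = "black-forest-labs/Flux.1-Dev"

# int8 weight-only quantization of the transformer via diffusers' TorchAoConfig.
quantization_config = TorchAoConfig("int8wo")
transformer = FluxTransformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=quantization_config,
    torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
    model_id, transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")

image = pipe(
    "A cat holding a sign that says hello world", num_inference_steps=28
).images[0]
image.save("flux_int8wo.png")
```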
Recall how we quantized models using HuggingFace Transformers in Part 1. Now we can use the model for inference.
I feel the code examples should live in the transformers and diffusers pages themselves, and here we just need to link to them.
Agreed with @jerryzh168 here. We should probably just include two basic examples (one for transformers and one for diffusers) and then provide links. This way the content stays lean, to-the-point, and up-to-date (as the HF docs are generally up-to-date about the integrations).
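For context, the transformers-side basic example being discussed here would only be a few lines, e.g. loading an already-quantized checkpoint and generating (a sketch; the repo id is a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id for a torchao-quantized checkpoint hosted on the Hub.
model_id = "your-org/llama-3.1-8b-int8wo"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("What are we having for dinner?", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```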
Looks great, thanks!
looks good, see some comments inline
Adding HuggingFace integration docs with Transformers/Diffusers per #2873