
WIP: end-to-end ONNX export and inference for stable diffusion #730


Draft: fxmarty wants to merge 25 commits into main from research-end-to-end-onnx-stable-diffusion

Conversation

fxmarty (Contributor) commented Jan 31, 2023

A work-in-progress extension of the stable diffusion deployment through the ONNX + ONNX Runtime deployment path, allowing the end-to-end pipeline to run through a single InferenceSession, with the exception of tokenization and timestep generation, which are done ahead of time.
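For reference, here is a minimal usage sketch of what running such a merged graph could look like with ONNX Runtime. The input names (`input_ids`, `timesteps`, `latents`), the model path, and the shapes are assumptions for illustration, not this PR's actual interface; tokenization and timestep generation happen outside the session, as described above.

```python
# Hypothetical sketch: input names, shapes and the model path are assumptions,
# not this PR's actual interface.
import numpy as np
import onnxruntime as ort
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
session = ort.InferenceSession(
    "stable_diffusion_end_to_end.onnx",
    providers=["CUDAExecutionProvider"],
)

# Tokenization is done ahead of time, outside the ONNX graph.
input_ids = tokenizer(
    "an astronaut riding a horse",
    padding="max_length",
    max_length=tokenizer.model_max_length,
    return_tensors="np",
).input_ids.astype(np.int64)

# Timesteps are also generated ahead of time by the scheduler.
timesteps = np.linspace(999, 0, num=50).round().astype(np.int64)
latents = np.random.randn(1, 4, 64, 64).astype(np.float32)

# Everything else (text encoder, denoising loop, VAE decode) runs in one session.
images = session.run(
    None, {"input_ids": input_ids, "timesteps": timesteps, "latents": latents}
)[0]
```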

From my preliminary tests, the ONNX export and inference through CUDAExecutionProvider work nicely:
[screenshot]

Major remaining issues:

  • Inference through CPUExecutionProvider yields garbage
  • Memory usage for single-batch inference with CUDAExecutionProvider is huge, up to 20 GB, even though the model itself is < GB and PyTorch inference in the same setting takes ~5 GB.

From the two major issues above, it's clear that this POC is not usable as is. On top of fixing them, there would remain quite a bit of work:

  • Separate more clearly the scheduler and pipeline implementations. Currently there's a hack that mixes the two, which is very bad for generality.
  • Support pipelines other than text2img (e.g. img2img as well).
  • Test on a larger variety of models; currently this is only stable-diffusion 1.4.
  • Test with TensorrtExecutionProvider ==> I expect issues there as well, as in my experience TensorRT's Loop / If support can be buggy: Model run with TensorrtExecutionProvider outputs different results compared to CPUExecutionProvider / CUDAExecutionProvider when the ONNX Loop operator is used onnx/onnx-tensorrt#891. To be honest, I'm not sure how interested the NVIDIA folks are in TensorrtExecutionProvider.
  • Test in fp16 (if that is possible with ONNX Runtime for this kind of complex model).
  • Support passing width and height as inputs, or alternatively expose them as constants that can be modified (a minimal export sketch follows this list).
  • Support passing guidance_scale as a model input.
  • Test with num_inference_steps != 50.
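To make the width/height and guidance_scale items above concrete, here is a minimal export sketch. The DummyPipeline module, the input/output names, and the opset are placeholders for illustration, not this PR's code; the idea is simply that guidance_scale becomes a graph input and the spatial dimensions become dynamic axes instead of baked-in constants.

```python
# Illustrative only: DummyPipeline stands in for the scripted end-to-end pipeline.
import torch

class DummyPipeline(torch.nn.Module):
    def forward(self, input_ids, timesteps, latents, guidance_scale):
        # Placeholder computation that touches every input so they all stay graph inputs.
        scale = guidance_scale + timesteps[0].float() * 0 + input_ids[0, 0].float() * 0
        return latents * scale

torch.onnx.export(
    DummyPipeline(),
    (
        torch.ones(1, 77, dtype=torch.int64),        # input_ids
        torch.linspace(999, 0, 50).to(torch.int64),  # timesteps
        torch.randn(1, 4, 64, 64),                   # initial latents
        torch.tensor(7.5),                           # guidance_scale passed as an input
    ),
    "stable_diffusion_end_to_end.onnx",
    input_names=["input_ids", "timesteps", "latents", "guidance_scale"],
    output_names=["images"],
    dynamic_axes={
        "latents": {0: "batch", 2: "height_div_8", 3: "width_div_8"},
        "images": {0: "batch", 2: "height_div_8", 3: "width_div_8"},
    },
    opset_version=16,
)
```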

Longer term goals once all of this is tested:

  • Integration with optimum.exporters
  • Possibly, have an ORTStableDiffusionEndToEndPipeline or something similar that is PyTorch-free.

@fxmarty fxmarty marked this pull request as draft January 31, 2023 10:59
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

fxmarty (Contributor, Author) commented Feb 2, 2023

Inference through CPUExecutionProvider yields garbage

is due to a bug in FusedConv in ONNX Runtime, tracked in microsoft/onnxruntime#14500

Memory usage for a single-batch inference with CUDAExecutionProvider is huge

Tracked here: microsoft/onnxruntime#14526 . ONNX Runtime seems to be very greedy compared to PyTorch when it comes to GPU memory. From what I tried, CUDAExecutionProvider is basically unusable for stable diffusion currently.

fxmarty (Contributor, Author) commented Feb 3, 2023

torch.jit.trace is pretty much unusable with deep loops: pytorch/pytorch#93943. I'll just go with torch.jit.script.
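A minimal illustration (not this PR's code) of the difference: the number of denoising iterations depends on the timesteps input, which torch.jit.trace would freeze at trace time, whereas torch.jit.script keeps the loop in the graph.

```python
# Toy example only: the loop body below is a placeholder, not the actual UNet step.
import torch

class DenoisingLoop(torch.nn.Module):
    def forward(self, latents: torch.Tensor, timesteps: torch.Tensor) -> torch.Tensor:
        # Data-dependent loop: the number of iterations comes from the input,
        # so tracing would hard-code it while scripting preserves it.
        for i in range(timesteps.shape[0]):
            latents = latents - 0.01 * latents * timesteps[i].float()
        return latents

scripted = torch.jit.script(DenoisingLoop())
out = scripted(torch.randn(1, 4, 64, 64), torch.arange(50))
```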

@fxmarty force-pushed the research-end-to-end-onnx-stable-diffusion branch from 5f49c60 to 7f8685a on February 15, 2023 15:35
philipwan commented

When I run the code from this branch, I can generate the single .pt and .onnx files, but running the ONNX model multiple times with the same input gives different results. Why is this? Are there any random parameters that need to be fixed?

python run_ort.py --gpu
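For what it's worth, in typical stable diffusion pipelines the main source of run-to-run variation is the randomly sampled initial latents; whether run_ort.py exposes a seed is an assumption here, but fixing the latents as sketched below would make repeated runs deterministic.

```python
# Assumption: the script samples the initial latents with NumPy; seeding that
# sampling (or reusing saved latents) removes the run-to-run randomness.
import numpy as np

rng = np.random.default_rng(seed=0)
latents = rng.standard_normal((1, 4, 64, 64), dtype=np.float32)
np.save("fixed_latents.npy", latents)  # reuse the same latents across runs
```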

@echarlaix added the onnx (Related to the ONNX export) label on Jun 4, 2025