WIP: end-to-end ONNX export and inference for stable diffusion #730
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
The garbage output with CPUExecutionProvider is due to a bug in FusedConv in ONNX Runtime, tracked in microsoft/onnxruntime#14500.
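Not part of this PR, but as a minimal diagnostic sketch: since FusedConv is produced by ONNX Runtime's graph optimizations, disabling them for a CPU session is one way to check whether the fusion is what corrupts the output (the model path below is a placeholder, and whether this actually sidesteps the bug is untested here):

```python
import onnxruntime as ort

# Disable graph optimizations so the Conv + activation fusion (FusedConv)
# is not applied. Heavy-handed, but useful to confirm whether the fusion
# is responsible for the garbage CPUExecutionProvider output.
sess_options = ort.SessionOptions()
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL

session = ort.InferenceSession(
    "stable_diffusion_e2e.onnx",  # placeholder path
    sess_options=sess_options,
    providers=["CPUExecutionProvider"],
)
```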
Tracked here: microsoft/onnxruntime#14526. ONNX Runtime seems to be very greedy compared to PyTorch when it comes to GPU memory. From what I tried, CUDAExecutionProvider is basically unusable for stable diffusion currently.
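For reference only: ONNX Runtime exposes per-provider options that constrain the CUDA memory arena, which may or may not help with this issue. A hedged sketch (the model path and the 8 GiB limit are placeholder values, not tested against this PR):

```python
import onnxruntime as ort

# CUDAExecutionProvider options controlling the memory arena:
# "gpu_mem_limit" caps the arena size in bytes, and
# "arena_extend_strategy" avoids the default power-of-two growth.
cuda_provider = (
    "CUDAExecutionProvider",
    {
        "device_id": 0,
        "gpu_mem_limit": 8 * 1024 * 1024 * 1024,  # 8 GiB, placeholder value
        "arena_extend_strategy": "kSameAsRequested",
    },
)

session = ort.InferenceSession(
    "stable_diffusion_e2e.onnx",  # placeholder path
    providers=[cuda_provider, "CPUExecutionProvider"],
)
```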
Force-pushed from 5f49c60 to 7f8685a.
When I run the code on this branch, I can generate the .pt and .onnx files, but running the ONNX model multiple times with the same input gives different outputs. Why is this? Are there any random parameters that need to be fixed? `python run_ort.py --gpu`
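Not an authoritative answer, but a common source of this nondeterminism in stable diffusion is the randomly sampled initial latents. A hedged sketch of pinning them (the model path, input names, and latent shape are assumptions for illustration, not necessarily what `run_ort.py` uses):

```python
import numpy as np
import onnxruntime as ort

# Seed the RNG so the initial latent noise is identical across runs.
rng = np.random.default_rng(seed=0)
latents = rng.standard_normal((1, 4, 64, 64), dtype=np.float32)  # assumed latent shape

session = ort.InferenceSession(
    "stable_diffusion_e2e.onnx",  # placeholder path
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# Feed the same latents (and the same other inputs) on every run. If the
# exported graph samples noise internally via a RandomNormal node, outputs
# can still differ unless that node's seed attribute is fixed at export time.
outputs = session.run(None, {
    "latents": latents,  # assumed input name; other required inputs omitted here
})
```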
A work-in-progress extension of the stable diffusion deployment through the ONNX + ONNX Runtime path, allowing the end-to-end pipeline to run through a single `InferenceSession`, with the exception of tokenization and timestep generation, which are done ahead of time (a usage sketch of this single-session setup is included at the end of this description). From my preliminary tests, the ONNX export and inference through `CUDAExecutionProvider` work nicely.

Major remaining issues:

* `CPUExecutionProvider` yields garbage.
* GPU memory usage with `CUDAExecutionProvider` is huge, up to 20 GB, although the model is < GB and PyTorch inference takes ~5 GB in the same setting.

From the two major issues above, it's clear that this POC is not usable as is. On top of fixing them, there would remain quite a bit of work:

* `TensorrtExecutionProvider` ==> I expect to have issues there as well, as in my experience the Loop / If support from TensorRT can be buggy: "Model run with `TensorrtExecutionProvider` outputs different results compared to `CPUExecutionProvider` / `CUDAExecutionProvider` when the ONNX `Loop` operator is used" (onnx/onnx-tensorrt#891). I'm not sure how much NVIDIA folks are interested in TensorrtExecutionProvider, to be honest.
* Support `width` and `height` as inputs, or alternatively, have them as constants that can be modified.
* Support `guidance_scale` as a model input.
* Support `num_inference_steps != 50`.

Longer term goals once all of this is tested:

* Integration in `optimum.exporters`.
* An `ORTStableDiffusionEndToEndPipeline`, or something like this, that is PyTorch-free.
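As an illustration only, here is a minimal sketch of what calling such a single-session pipeline could look like, with tokenization and timestep generation done ahead of time. The model path, input/output names, latent shape, and timestep schedule are all assumptions for the sake of the example, not the actual interface of this PR:

```python
import numpy as np
import onnxruntime as ort
from transformers import CLIPTokenizer

# Tokenization happens ahead of time, outside the ONNX graph.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
prompt_ids = tokenizer(
    "an astronaut riding a horse",
    padding="max_length",
    max_length=tokenizer.model_max_length,
    return_tensors="np",
).input_ids.astype(np.int64)

# Timesteps are also generated ahead of time (50 evenly spaced steps over
# 1000 training steps, purely illustrative).
timesteps = np.arange(0, 1000, 1000 // 50, dtype=np.int64)[::-1].copy()

# Initial latent noise, seeded for reproducibility.
latents = np.random.default_rng(0).standard_normal((1, 4, 64, 64), dtype=np.float32)

session = ort.InferenceSession(
    "stable_diffusion_e2e.onnx",  # placeholder path
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# Input and output names are hypothetical. The denoising loop runs inside
# the graph (ONNX Loop), so a single run() call produces the final image.
(image,) = session.run(None, {
    "input_ids": prompt_ids,
    "timesteps": timesteps,
    "latents": latents,
})
```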