+
Skip to content

sony/cmt

Repository files navigation

Consistency Mid-Training (CMT)

TL;DR: CMT introduces a mid-training stage between (diffusion) pre-training and flow map post-training, cutting training costs dramatically: achieving SOTA 2-step FIDs while using up to 98% less data and GPU time than Consistency Models, and about 50% less total training time than Mean Flow.

👉 Checking here for our paper: arXiv

FID vs. training time for vanilla ECD and CMT (ours) on ImageNet 512×512

FID vs. training time for vanilla MF and CMT (ours) on ImageNet 256×256.

Two-Step Generated Images by CMT Trained on ImageNet 512×512

General Instructions:

The main branch is for unconditional CIFAR-10, FFHQ, and AFHQv2.

🔔 We will release codes and checkpoints for ImageNet 256x256 and 512x512 soon.

Our code structure follows ECT's format. The dependencies are the same as ECT. Before training, please put the dataset zip file under the datasets folder.

CIFAR-10

CIFAR-10 Dataset

Download the dataset zip from Google Drive and put it under the datasets folder, i.e., datasets/CIFAR-10-32x32.zip. Otherwise, one can obtain the dataset by following EDM's Preparing Dataset section.

https://drive.google.com/drive/folders/1EPl9fc8XOgc135x8l0EGwTsHLf0ip_pQ?usp=drive_link.

CMT Mid-Training Stage

bash run_dcp_cifar.sh <NGPUs> <PORT> --desc bs128.300k

The command is mid-training with a 38.4Mimgs budget in 300k iterations. The resulting model checkpoint, logs, and all other details are available at https://drive.google.com/drive/folders/1Kc5pGlR-4aeXYYGEM6XzN9r4svmc6eRV?usp=drive_link.

CMT Post-Training Stage with ECT

bash run_dcp+ecm_cifar.sh <NGPUs> <PORT> --desc bs128.100k

It uses \Delta t = 1/4096 throughout the 12.8Mimgs/100K iterations training and uses the mid-training checkpoint. The resulting model is a SOTA CM with 2.74 one-step FID and 1.97 two-step FID. The resulting SOTA model checkpoint, logs, and all other details are available at https://drive.google.com/drive/folders/1Di17c-QdUV9yfC9rpZgjg-I2Tcjp-WLu?usp=drive_link.

FFHQ (64 x 64)

FFHQ Dataset

Download the dataset zip from Google Drive and put it under the datasets folder, i.e., datasets/ffhq-64x64.zip. Otherwise, one can obtain the dataset by following EDM's Preparing Dataset section.

We provide the processed FFHQ dataset in https://drive.google.com/drive/folders/1EPl9fc8XOgc135x8l0EGwTsHLf0ip_pQ?usp=drive_link.

CMT Mid-Training Stage

bash run_dcp_ffhq.sh <NGPUs> <PORT> --desc bs128.300k

The command is mid-training with a 38.4 Mimgs budget in 300k iterations. The resulting model checkpoint, logs, and all other details are available at https://drive.google.com/drive/folders/1UlPyNUEAZ5aM8OKc01F1Nw4DvoFxZQbf?usp=drive_link.

CMT Post-Training Stage with ECT

bash run_dcp+ecm_ffhq.sh <NGPUs> <PORT> --desc bs128.100k

It uses \Delta t = 1/512 throughout the 12.8Mimgs/100K training and uses the mid-training checkpoint. The resulting model is a SOTA CM with 3.89 one-step FID and 2.75 two-step FID. The resulting SOTA model checkpoint, logs, and all other details are available at https://drive.google.com/drive/folders/1m4v8cqcd1nelBZdMAUxxfj9EHOTlkReW?usp=drive_link.

AFHQv2 (64 x 64)

AFHQ Dataset

Download the dataset zip from Google Drive and put it under the datasets folder, i.e., datasets/afhqv2-64x64.zip. Otherwise, one can obtain the dataset by following EDM's Preparing Dataset section.

We provide the processed AFHQv2 dataset in https://drive.google.com/drive/folders/1EPl9fc8XOgc135x8l0EGwTsHLf0ip_pQ?usp=drive_link.

CMT Mid-Training Stage

bash run_dcp_afhq.sh <NGPUs> <PORT> --desc bs128.300k

Same hyperparameters as FFHQ. The results are at https://drive.google.com/drive/folders/1PezXgBQLLvNib_iKpYrPj26NX59Mml0y?usp=drive_link.

CMT Post-Training Stage with ECT

bash run_dcp+ecm_afhq.sh <NGPUs> <PORT> --desc bs128.100k

1/2-step FID=3.28/2.34. The results are at https://drive.google.com/drive/folders/1Bq7g7l-ErK_7EsrAomsGCaR7QMLVU_TJ?usp=drive_link.

Evaluations

First, one needs to download the pretrained checkpoints from the above links.

Then, run the following:

bash eval_ecm.sh <NGPUs> <PORT> --resume <CKPT_PATH>
bash eval_ecm_ffhq.sh <NGPUs> <PORT> --resume <CKPT_PATH>

ImageNet 64x64, 256x256, and 512x512

Please refer to other branches for the three resolutions of ImageNet.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published
点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载