SoftCFG is an open-source implementation of the uncertainty-guided inference method described in the paper "SoftCFG: Uncertainty-Guided Stable Guidance for Visual Autoregressive Model" (arXiv:2510.00996). This framework addresses key challenges in applying Classifier-Free Guidance (CFG) to autoregressive (AR) models for image generation, such as guidance diminishing and over-guidance. SoftCFG distributes adaptive perturbations across generated tokens based on their uncertainty, ensuring stable and persistent guidance while resolving conflicts between text conditions and visual coherence. It introduces Step Normalization to bound cumulative perturbations for long-sequence stability.
The method is training-free, model-agnostic, and compatible with existing AR pipelines like AliTok (arXiv:2506.05289, GitHub) and RAR (arXiv:2411.00776, GitHub). Experiments on ImageNet 256×256 show state-of-the-art FID scores among AR models, e.g., reducing FID from 1.37 to 1.27 on AliTok-XL.
- Uncertainty-Guided Perturbation: Adaptively scales value caches based on token confidence (e.g., `1 - p_max`) to provide context-aware guidance, reducing artifacts such as duplicated objects.
- Step Normalization: Bounds cumulative perturbations to prevent guidance explosion, ensuring stability in long sequences (see the sketch after this list).
- Compatibility: Seamlessly integrates with AR models like AliTok (B/L/XL variants) and RAR, supporting class-conditional and text-to-image generation.
- Performance Gains: Achieves SOTA FID on ImageNet 256×256 (e.g., 1.27 on AliTok-XL), with negligible inference overhead (e.g., <1% slowdown).
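For intuition, here is a minimal, self-contained sketch of the two mechanisms above in plain PyTorch. It is illustrative only: the function name, tensor shapes, and normalization details are assumptions rather than this repository's actual API; see the paper and the bundled samplers for the real implementation.

```python
import torch

def softcfg_value_scaling(values, token_probs, strength=1.0, step_norm=True):
    """Illustrative SoftCFG sketch (hypothetical API, not the repo's code).

    values:      (seq_len, hidden) cached value states of generated tokens
                 in the unconditional branch.
    token_probs: (seq_len, vocab) softmax distributions from which each
                 token was sampled.
    """
    p_max = token_probs.max(dim=-1).values  # per-token confidence
    uncertainty = 1.0 - p_max               # the paper's 1 - p_max uncertainty
    # Confident tokens keep their context; uncertain ones are damped, which
    # keeps the conditional/unconditional gap (the guidance signal) alive.
    scale = 1.0 - strength * uncertainty
    if step_norm:
        # Step Normalization: rescale so the total perturbation stays bounded
        # as the sequence grows, preventing guidance explosion.
        scale = scale * (scale.numel() / scale.sum().clamp(min=1e-6))
    return values * scale.unsqueeze(-1)

# Example: 16 generated tokens, hidden size 64, vocabulary size 1024
vals = torch.randn(16, 64)
probs = torch.softmax(torch.randn(16, 1024), dim=-1)
perturbed = softcfg_value_scaling(vals, probs)
```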
Requirements:
- Python 3.8+
- PyTorch 2.0+ (with CUDA for GPU acceleration)
- Accelerate (for distributed training/inference)
- Transformers, NumPy, Pillow, and other basics (see `requirements.txt`)
- Optional: WandB for logging, the ADM evaluator for FID computation
1. Clone the repository:

   ```bash
   git clone https://github.com/[your-username]/SoftCFG.git
   cd SoftCFG
   ```

2. Create a virtual environment (recommended):

   ```bash
   python -m venv venv
   source venv/bin/activate
   ```

3. Install dependencies by following the installation instructions for the base AR models, AliTok and RAR.

4. Download pre-trained models:
   - AliTok models: from Google Drive (tokenizer and AR checkpoints for the B/L/XL variants).
   - RAR models: from the 1d-tokenizer GitHub repository (see README_RAR.md for checkpoints).

5. Prepare evaluation tools (for ImageNet FID):

   ```bash
   git clone https://github.com/openai/guided-diffusion.git
   cd guided-diffusion/evaluations
   wget https://openaipublic.blob.core.windows.net/diffusion/jul-2021/ref_batches/imagenet/256/VIRTUAL_imagenet256_labeled.npz
   cd ../..
   ```
SoftCFG is applied during inference on top of base AR models. Below are examples for class-conditional generation on ImageNet 256×256.
Generate 50k samples with SoftCFG:

```bash
cd SoftCFG/alitok && \
torchrun --nnodes=1 --nproc_per_node=8 sample_imagenet.py \
    --config configs/alitok_xl.yaml \
    --experiment.output_dir="output/softcfg_alitok_xl" \
    --experiment.generator_checkpoint="weights/alitok_xl.bin" \
    --model.generator.guidance_scale=9 \
    --model.generator.guidance_scale_pow=1.5 \
    --model.generator.softcfg_strength=1 \
    --model.generator.step_norm=True
# set --model.generator.softcfg_strength=0 to run vanilla CFG without SoftCFG
```
Evaluate FID:

```bash
python guided-diffusion/evaluations/evaluator.py \
    VIRTUAL_imagenet256_labeled.npz \
    output/softcfg_alitok_xl.npz
```
Key sampling flags:
- `--softcfg_strength`: set to `0` for the baseline ('cfg') or `1` for SoftCFG (ours).
- `--guidance_scale`: strength of guidance.
- `--guidance_scale_pow`: exponent of the cosine guidance schedule (see the sketch after this list).
- `--step_norm`: enable Step Normalization (bool, default `True`).
- For the full list of options, run `python sample_imagenet.py --help`.
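For intuition, one common form of such a cosine-annealed schedule is sketched below. This is an illustrative assumption, not the exact formula used by the AliTok/RAR samplers; `power` plays the role of `--guidance_scale_pow`.

```python
import math

def annealed_guidance(step: int, total_steps: int, guidance_scale: float,
                      power: float = 1.5) -> float:
    # Cosine-annealed CFG weight: ramps from 1 (no guidance) toward
    # `guidance_scale` over the token sequence; `power` warps the ramp.
    ratio = (step / max(total_steps - 1, 1)) ** power
    return 1.0 + (guidance_scale - 1.0) * (1.0 - math.cos(math.pi * ratio)) / 2.0
```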
Generated images are saved as `.png` files in the output directory. With SoftCFG on AliTok-XL, expect an FID of roughly 1.27.
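The evaluator above expects samples as an `.npz` batch. If your run only produced `.png` files, a small helper along these lines can pack them; this is a hypothetical utility, not part of the repo, using the `arr_0` key that guided-diffusion's `evaluator.py` reads.

```python
# Hypothetical helper (not part of this repo): pack sampled PNGs into the
# .npz layout that guided-diffusion's evaluator.py reads (key 'arr_0').
from pathlib import Path

import numpy as np
from PIL import Image

def pack_npz(sample_dir: str, out_path: str) -> None:
    files = sorted(Path(sample_dir).glob("*.png"))
    batch = np.stack([np.asarray(Image.open(f).convert("RGB")) for f in files])
    np.savez(out_path, arr_0=batch)

pack_npz("output/softcfg_alitok_xl", "output/softcfg_alitok_xl.npz")
```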
SoftCFG also supports the Lumina-mGPT-2.0 autoregressive pipeline through the updated Jacobi sampler. After setting up the base project (see `Lumina-mGPT-2.0/README.md` for checkpoint and dependency details), you can enable SoftCFG with:
```bash
cd SoftCFG/Lumina-mGPT-2.0/lumina_mgpt && \
python generate_examples/generate.py \
    --model_path path/to/ckpt \
    --save_path outputs/softcfg_lumina/ \
    --task t2i \
    --cfg 9.0 \
    --temperature 1.0 \
    --top_k 2048 \
    --speculative_jacobi \
    --softcfg_strength 1.0 \
    --softcfg_disable_step_norm false
```
Key flags:
- `--softcfg_strength`: set to `0` to disable SoftCFG, or a positive value to control the perturbation strength (default `1.0`).
- `--softcfg_disable_step_norm`: add this flag to turn off Step Normalization; omit it (or pass `false`) to keep the default stabilization.
The sampler writes images to `--save_path` and logs timings/FIDs using the same workflow as the original Lumina-mGPT release.
RobusTok inherits the same RAR backbone as 1D-Tokenizer, so the SoftCFG hooks mirror that integration. After syncing the new `RobusTok` submodule, enable SoftCFG by setting the generator flags in the config or passing them on the CLI:
```bash
cd SoftCFG/RobusTok && \
torchrun --nnodes=1 --nproc_per_node=8 sample_imagenet_rar.py \
    config=configs/generator/rar.yaml \
    experiment.output_dir="output/softcfg_robus" \
    experiment.generator_checkpoint="weights/rar_b.pt" \
    model.generator.guidance_scale=16.0 \
    model.generator.guidance_scale_pow=2.75 \
    model.generator.softcfg_strength=1 \
    model.generator.step_norm=True
```
Notes:
- `model.generator.softcfg_strength`: set to `0` to disable SoftCFG (the config default); positive values apply uncertainty-weighted KV scaling.
- `model.generator.step_norm`: keeps Step Normalization on by default; set it to `False` to match vanilla CFG behaviour.
- The sampler and `demo_util.sample_fn(_with_tf)` automatically forward these options to `RAR.generate` (see the sketch after this list for programmatic use).
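For programmatic use, a sketch along these lines should work. Assumptions: the config is OmegaConf-style (as the `config=...` CLI syntax suggests), and the helper names come from the notes above; exact loader and sampler signatures may differ in the repo.

```python
# Illustrative only: set the generator flags in code instead of on the CLI.
from omegaconf import OmegaConf

config = OmegaConf.load("configs/generator/rar.yaml")
config.experiment.generator_checkpoint = "weights/rar_b.pt"
config.model.generator.guidance_scale = 16.0
config.model.generator.guidance_scale_pow = 2.75
config.model.generator.softcfg_strength = 1.0  # 0 disables SoftCFG
config.model.generator.step_norm = True        # keep Step Normalization on
# demo_util.sample_fn(...) then forwards these options to RAR.generate.
```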
We welcome contributions! To contribute:
1. Fork the repository.
2. Create a new branch (`git checkout -b feature-branch`).
3. Make your changes and commit (`git commit -m "Add support for new AR model"`).
4. Push to the branch (`git push origin feature-branch`).
5. Open a pull request describing your changes.
Please adhere to PEP 8 standards and include tests for new features.
This project is licensed under the MIT License. See the LICENSE file for details.
If you use SoftCFG, please cite our paper:
```bibtex
@article{SoftCFG2025,
  title={SoftCFG: Uncertainty-Guided Stable Guidance for Visual Autoregressive Model},
  author={Dongli Xu and Aleksei Tiulpin and Matthew B. Blaschko},
  journal={arXiv preprint arXiv:2510.00996},
  year={2025},
  url={https://arxiv.org/abs/2510.00996}
}
```
This implementation builds on AliTok (GitHub), RAR (GitHub), and Lumina-mGPT-2.0 (GitHub). Thanks to the original authors for their foundational work.
For questions, open a GitHub issue or email dongliixu@gmail.com.