SoftCFG is an open-source implementation of the uncertainty-guided inference method described in the paper "SoftCFG: Uncertainty-Guided Stable Guidance for Visual Autoregressive Model" (arXiv:2510.00996). This framework addresses key challenges in applying Classifier-Free Guidance (CFG) to autoregressive (AR) models for image generation, such as guidance diminishing and over-guidance. SoftCFG distributes adaptive perturbations across generated tokens based on their uncertainty, ensuring stable and persistent guidance while resolving conflicts between text conditions and visual coherence. It introduces Step Normalization to bound cumulative perturbations for long-sequence stability.
The method is training-free, model-agnostic, and compatible with existing AR pipelines like AliTok (arXiv:2506.05289, GitHub) and RAR (arXiv:2411.00776, GitHub). Experiments on ImageNet 256×256 show state-of-the-art FID scores among AR models, e.g., reducing FID from 1.37 to 1.27 on AliTok-XL.
- Uncertainty-Guided Perturbation: Adaptively scales value caches based on token confidence (e.g., `1 - p_max`) to provide context-aware guidance, reducing artifacts such as duplicated objects.
- Step Normalization: Bounds cumulative perturbations to prevent guidance explosion, ensuring stability in long sequences (see the sketch after this list).
- Compatibility: Seamlessly integrates with AR models like AliTok (B/L/XL variants) and RAR, supporting class-conditional and text-to-image generation.
- Performance Gains: Achieves SOTA FID on ImageNet 256×256 (e.g., 1.27 on AliTok-XL), with negligible inference overhead (e.g., <1% slowdown).
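For intuition, here is a minimal, self-contained sketch of the two mechanisms above in plain PyTorch. It is illustrative only: the function name, tensor shapes, and normalization details are assumptions rather than this repository's actual API; see the paper and the bundled samplers for the real implementation.

```python
import torch

def softcfg_value_scaling(values, token_probs, strength=1.0, step_norm=True):
    """Illustrative SoftCFG sketch (hypothetical API, not the repo's code).

    values:      (seq_len, hidden) cached value states of generated tokens
                 in the unconditional branch.
    token_probs: (seq_len, vocab) softmax distributions from which each
                 token was sampled.
    """
    p_max = token_probs.max(dim=-1).values  # per-token confidence
    uncertainty = 1.0 - p_max               # the paper's 1 - p_max uncertainty
    # Confident tokens keep their context; uncertain ones are damped, which
    # keeps the conditional/unconditional gap (the guidance signal) alive.
    scale = 1.0 - strength * uncertainty
    if step_norm:
        # Step Normalization: rescale so the total perturbation stays bounded
        # as the sequence grows, preventing guidance explosion.
        scale = scale * (scale.numel() / scale.sum().clamp(min=1e-6))
    return values * scale.unsqueeze(-1)

# Example: 16 generated tokens, hidden size 64, vocabulary size 1024
vals = torch.randn(16, 64)
probs = torch.softmax(torch.randn(16, 1024), dim=-1)
perturbed = softcfg_value_scaling(vals, probs)
```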
Requirements:
- Python 3.8+
- PyTorch 2.0+ (with CUDA for GPU acceleration)
- Accelerate (for distributed training/inference)
- Transformers, NumPy, Pillow, and other basics (see `requirements.txt`)
- Optional: WandB for logging, the ADM evaluator for FID computation
1. Clone the repository:

   ```bash
   git clone https://github.com/[your-username]/SoftCFG.git
   cd SoftCFG
   ```

2. Create a virtual environment (recommended):

   ```bash
   python -m venv venv
   source venv/bin/activate
   ```

3. Install dependencies by following the installation instructions for the base AR models, AliTok and RAR.

4. Download pre-trained models:
   - AliTok models: from Google Drive (tokenizer and AR checkpoints for the B/L/XL variants).
   - RAR models: from the 1d-tokenizer GitHub repository (see README_RAR.md for checkpoints).

5. Prepare evaluation tools (for ImageNet FID):

   ```bash
   git clone https://github.com/openai/guided-diffusion.git
   cd guided-diffusion/evaluations
   wget https://openaipublic.blob.core.windows.net/diffusion/jul-2021/ref_batches/imagenet/256/VIRTUAL_imagenet256_labeled.npz
   cd ../..
   ```
SoftCFG is applied during inference on top of base AR models. Below are examples for class-conditional generation on ImageNet 256×256.
Generate 50k samples with SoftCFG:

```bash
cd SoftCFG/alitok && \
torchrun --nnodes=1 --nproc_per_node=8 sample_imagenet.py \
    --config configs/alitok_xl.yaml \
    --experiment.output_dir="output/softcfg_alitok_xl" \
    --experiment.generator_checkpoint="weights/alitok_xl.bin" \
    --model.generator.guidance_scale=9 \
    --model.generator.guidance_scale_pow=1.5 \
    --model.generator.softcfg_strength=1 \
    --model.generator.step_norm=True
# set --model.generator.softcfg_strength=0 to run vanilla CFG without SoftCFG
```
Evaluate FID:

```bash
python guided-diffusion/evaluations/evaluator.py \
    VIRTUAL_imagenet256_labeled.npz \
    output/softcfg_alitok_xl.npz
```
Key sampling flags:
- `--softcfg_strength`: set to `0` for the baseline ('cfg') or `1` for SoftCFG (ours).
- `--guidance_scale`: strength of guidance.
- `--guidance_scale_pow`: exponent of the cosine guidance schedule (see the sketch after this list).
- `--step_norm`: enable Step Normalization (bool, default `True`).
- For the full list of options, run `python sample_imagenet.py --help`.
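For intuition, one common form of such a cosine-annealed schedule is sketched below. This is an illustrative assumption, not the exact formula used by the AliTok/RAR samplers; `power` plays the role of `--guidance_scale_pow`.

```python
import math

def annealed_guidance(step: int, total_steps: int, guidance_scale: float,
                      power: float = 1.5) -> float:
    # Cosine-annealed CFG weight: ramps from 1 (no guidance) toward
    # `guidance_scale` over the token sequence; `power` warps the ramp.
    ratio = (step / max(total_steps - 1, 1)) ** power
    return 1.0 + (guidance_scale - 1.0) * (1.0 - math.cos(math.pi * ratio)) / 2.0
```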
Generated images are saved as `.png` files in the output directory. With SoftCFG on AliTok-XL, expect an FID of roughly 1.27.
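The evaluator above expects samples as an `.npz` batch. If your run only produced `.png` files, a small helper along these lines can pack them; this is a hypothetical utility, not part of the repo, using the `arr_0` key that guided-diffusion's `evaluator.py` reads.

```python
# Hypothetical helper (not part of this repo): pack sampled PNGs into the
# .npz layout that guided-diffusion's evaluator.py reads (key 'arr_0').
from pathlib import Path

import numpy as np
from PIL import Image

def pack_npz(sample_dir: str, out_path: str) -> None:
    files = sorted(Path(sample_dir).glob("*.png"))
    batch = np.stack([np.asarray(Image.open(f).convert("RGB")) for f in files])
    np.savez(out_path, arr_0=batch)

pack_npz("output/softcfg_alitok_xl", "output/softcfg_alitok_xl.npz")
```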
SoftCFG also supports the Lumina-mGPT-2.0 autoregressive pipeline through the updated Jacobi sampler. After setting up the base project (see `Lumina-mGPT-2.0/README.md` for checkpoint and dependency details), you can enable SoftCFG with:
```bash
cd SoftCFG/Lumina-mGPT-2.0/lumina_mgpt && \
python generate_examples/generate.py \
    --model_path path/to/ckpt \
    --save_path outputs/softcfg_lumina/ \
    --task t2i \
    --cfg 9.0 \
    --temperature 1.0 \
    --top_k 2048 \
    --speculative_jacobi \
    --softcfg_strength 1.0 \
    --softcfg_disable_step_norm false
```
Key flags:
- `--softcfg_strength`: set to `0` to disable SoftCFG, or a positive value to control the perturbation strength (default `1.0`).
- `--softcfg_disable_step_norm`: add this flag to turn off Step Normalization; omit it (or pass `false`) to keep the default stabilization.
The sampler writes images to `--save_path` and logs timings/FIDs using the same workflow as the original Lumina-mGPT release.
RobusTok inherits the same RAR backbone as 1D-Tokenizer, so the SoftCFG hooks mirror that integration. After syncing the new `RobusTok` submodule, enable SoftCFG by setting the generator flags in the config or passing them on the CLI:
```bash
cd SoftCFG/RobusTok && \
torchrun --nnodes=1 --nproc_per_node=8 sample_imagenet_rar.py \
    config=configs/generator/rar.yaml \
    experiment.output_dir="output/softcfg_robus" \
    experiment.generator_checkpoint="weights/rar_b.pt" \
    model.generator.guidance_scale=16.0 \
    model.generator.guidance_scale_pow=2.75 \
    model.generator.softcfg_strength=1 \
    model.generator.step_norm=True
```
Notes:
- `model.generator.softcfg_strength`: set to `0` to disable SoftCFG (the config default); positive values apply uncertainty-weighted KV scaling.
- `model.generator.step_norm`: keeps Step Normalization on by default; set it to `False` to match vanilla CFG behaviour.
- The sampler and `demo_util.sample_fn(_with_tf)` automatically forward these options to `RAR.generate` (see the sketch after this list for programmatic use).
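For programmatic use, a sketch along these lines should work. Assumptions: the config is OmegaConf-style (as the `config=...` CLI syntax suggests), and the helper names come from the notes above; exact loader and sampler signatures may differ in the repo.

```python
# Illustrative only: set the generator flags in code instead of on the CLI.
from omegaconf import OmegaConf

config = OmegaConf.load("configs/generator/rar.yaml")
config.experiment.generator_checkpoint = "weights/rar_b.pt"
config.model.generator.guidance_scale = 16.0
config.model.generator.guidance_scale_pow = 2.75
config.model.generator.softcfg_strength = 1.0  # 0 disables SoftCFG
config.model.generator.step_norm = True        # keep Step Normalization on
# demo_util.sample_fn(...) then forwards these options to RAR.generate.
```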
We welcome contributions! To contribute:
1. Fork the repository.
2. Create a new branch (`git checkout -b feature-branch`).
3. Make your changes and commit (`git commit -m "Add support for new AR model"`).
4. Push to the branch (`git push origin feature-branch`).
5. Open a pull request describing your changes.
Please adhere to PEP 8 standards and include tests for new features.
This project is licensed under the MIT License. See the LICENSE file for details.
If you use SoftCFG, please cite our paper:
```bibtex
@article{SoftCFG2025,
  title={SoftCFG: Uncertainty-Guided Stable Guidance for Visual Autoregressive Model},
  author={Dongli Xu and Aleksei Tiulpin and Matthew B. Blaschko},
  journal={arXiv preprint arXiv:2510.00996},
  year={2025},
  url={https://arxiv.org/abs/2510.00996}
}
```
This implementation builds on AliTok (GitHub), RAR (GitHub), and Lumina-mGPT-2.0 (GitHub). Thanks to the original authors for their foundational work.
For questions, open a GitHub issue or email dongliixu@gmail.com.