SoftCFG: Uncertainty-Guided Stable Guidance for Visual Autoregressive Models


Overview

SoftCFG is an open-source implementation of the uncertainty-guided inference method described in the paper "SoftCFG: Uncertainty-Guided Stable Guidance for Visual Autoregressive Model" (arXiv:2510.00996). This framework addresses key challenges in applying Classifier-Free Guidance (CFG) to autoregressive (AR) models for image generation, such as guidance diminishing and over-guidance. SoftCFG distributes adaptive perturbations across generated tokens based on their uncertainty, ensuring stable and persistent guidance while resolving conflicts between text conditions and visual coherence. It introduces Step Normalization to bound cumulative perturbations for long-sequence stability.

The method is training-free, model-agnostic, and compatible with existing AR pipelines like AliTok (arXiv:2506.05289, GitHub) and RAR (arXiv:2411.00776, GitHub). Experiments on ImageNet 256×256 show state-of-the-art FID scores among AR models, e.g., reducing FID from 1.37 to 1.27 on AliTok-XL.

Features

  • Uncertainty-Guided Perturbation: Adaptively scales value caches based on token confidence (e.g., 1 - p_max) to provide context-aware guidance, reducing artifacts like duplicated objects.
  • Step Normalization: Bounds cumulative perturbations to prevent guidance explosion, ensuring stability in long sequences (both mechanisms are sketched in code after this list).
  • Compatibility: Seamlessly integrates with AR models like AliTok (B/L/XL variants) and RAR, supporting class-conditional and text-to-image generation.
  • Performance Gains: Achieves SOTA FID on ImageNet 256×256 (e.g., 1.27 on AliTok-XL), with negligible inference overhead (e.g., <1% slowdown).
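
To make the two mechanisms concrete, here is a minimal PyTorch sketch of uncertainty-weighted value-cache scaling with a step-normalization pass. It illustrates the idea only and is not the repository's implementation: the function name and the exact normalization formula are assumptions, and the paper defines the precise forms.

import torch

def softcfg_scale_values(value_cache: torch.Tensor,
                         token_logits: torch.Tensor,
                         strength: float = 1.0,
                         step_norm: bool = True) -> torch.Tensor:
    # value_cache: (seq_len, dim) cached V entries for generated tokens.
    # token_logits: (seq_len, vocab) logits each token was sampled from.
    probs = torch.softmax(token_logits, dim=-1)
    uncertainty = 1.0 - probs.max(dim=-1).values      # u_i = 1 - p_max
    # Confident tokens keep their value cache; uncertain ones are damped.
    weights = 1.0 - strength * uncertainty            # (seq_len,)
    if step_norm:
        # Step Normalization (conceptual): rescale the per-token
        # perturbations so their total stays bounded as the sequence
        # grows; the paper's exact bound may differ.
        perturbation = 1.0 - weights
        perturbation = perturbation / perturbation.sum().clamp_min(1e-8)
        weights = 1.0 - perturbation
    return value_cache * weights.unsqueeze(-1)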

Installation

Prerequisites

  • Python 3.8+
  • PyTorch 2.0+ (with CUDA for GPU acceleration)
  • Accelerate (for distributed training/inference)
  • Transformers, NumPy, Pillow, and other basics (see requirements.txt)
  • Optional: WANDB for logging, ADM evaluator for FID computation

Steps

  1. Clone the repository:

    git clone https://github.com/Xudangliatiger/SoftCFG.git
    cd SoftCFG
  2. Create a virtual environment (recommended):

    python -m venv venv
    source venv/bin/activate 
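    # On Windows, use: venv\Scripts\activate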
  3. Install dependencies: follow the installation instructions for the base AR models, AliTok and RAR.

  4. Download pre-trained models:

    • AliTok models: From Google Drive (e.g., tokenizer and AR checkpoints for B/L/XL).
    • RAR models: From 1d-tokenizer GitHub (check README_RAR.md for checkpoints).
  5. Prepare evaluation tools (for ImageNet FID):

    git clone https://github.com/openai/guided-diffusion.git
    cd guided-diffusion/evaluations
    wget https://openaipublic.blob.core.windows.net/diffusion/jul-2021/ref_batches/imagenet/256/VIRTUAL_imagenet256_labeled.npz
    cd ../..

Usage

SoftCFG is applied during inference on top of base AR models. Below are examples for class-conditional generation on ImageNet 256×256.

Basic Example with AliTok-XL

Generate 50k samples with SoftCFG:

cd SoftCFG/alitok && \
torchrun --nnodes=1 --nproc_per_node=8 sample_imagenet.py \
  --config configs/alitok_xl.yaml \
  --experiment.output_dir="output/softcfg_alitok_xl" \
  --experiment.generator_checkpoint="weights/alitok_xl.bin" \
  --model.generator.guidance_scale=9 \
  --model.generator.guidance_scale_pow=1.5 \
  --model.generator.softcfg_strength=1 \
  --model.generator.step_norm=True

Set --model.generator.softcfg_strength=0 to disable SoftCFG and fall back to vanilla CFG.

Evaluate FID:

python guided-diffusion/evaluations/evaluator.py \
  VIRTUAL_imagenet256_labeled.npz \
  output/softcfg_alitok_xl.npz

Command-Line Arguments

  • --softcfg_strength: Set to 0 for the vanilla CFG baseline or 1 for SoftCFG (ours).
  • --guidance_scale: Classifier-free guidance strength.
  • --guidance_scale_pow: Exponent of the power-cosine guidance schedule (see the sketch after this list).
  • --step_norm: Enable Step Normalization (bool, default True).
  • For the full set of options, run python sample_imagenet.py --help.
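
For intuition about --guidance_scale_pow, RAR-style samplers typically ramp the guidance scale with a power-cosine schedule over the generation steps. The function below is a hedged illustration of that common form; the function name and exact formula are assumptions, so check the sampler code for the version used here.

import math

def cosine_guidance(step: int, num_steps: int,
                    guidance_scale: float = 9.0,
                    guidance_scale_pow: float = 1.5) -> float:
    # Ramp from ~1 (no guidance) at the first step up to guidance_scale
    # at the last step; larger pow values delay the ramp.
    t = step / max(num_steps - 1, 1)
    ramp = 0.5 * (1.0 - math.cos(math.pi * t ** guidance_scale_pow))
    return 1.0 + (guidance_scale - 1.0) * ramp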

Example Output

Generated images are saved as .png files in the output directory. For metrics, expect FID ~1.27 on AliTok-XL with SoftCFG.
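
If a run leaves individual .png files rather than the .npz the evaluator expects (the sampling scripts may already emit an .npz, as in the example above), a small packing helper can bridge the gap. This is a hypothetical convenience script, not part of the repo; it assumes the ADM evaluator's usual layout of a uint8 array of shape (N, 256, 256, 3) stored under the default "arr_0" key.

import numpy as np
from PIL import Image
from pathlib import Path

def pack_samples(png_dir: str, out_npz: str) -> None:
    # Stack every PNG in the directory into one (N, H, W, 3) uint8 array.
    paths = sorted(Path(png_dir).glob("*.png"))
    arr = np.stack([np.asarray(Image.open(p).convert("RGB"), dtype=np.uint8)
                    for p in paths])
    np.savez(out_npz, arr)  # positional save stores the array as "arr_0"

pack_samples("output/softcfg_alitok_xl", "output/softcfg_alitok_xl.npz")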

SoftCFG with Lumina-mGPT-2.0

SoftCFG also supports the Lumina-mGPT-2.0 autoregressive pipeline through the updated Jacobi sampler. After setting up the base project (see Lumina-mGPT-2.0/README.md for checkpoint and dependency details), you can enable SoftCFG with:

cd SoftCFG/Lumina-mGPT-2.0/lumina_mgpt && \
python generate_examples/generate.py \
  --model_path path/to/ckpt \
  --save_path outputs/softcfg_lumina/ \
  --task t2i \
  --cfg 9.0 \
  --temperature 1.0 \
  --top_k 2048 \
  --speculative_jacobi \
  --softcfg_strength 1.0 \
  --softcfg_disable_step_norm false

Key flags:

  • --softcfg_strength: Set to 0 to disable SoftCFG; positive values control the perturbation strength (default 1.0).
  • --softcfg_disable_step_norm: Pass true to turn off Step Normalization; pass false (or omit the flag) to keep the default stabilization.

The sampler writes images to --save_path and logs timings/FIDs using the same workflow as the original Lumina-mGPT release.

SoftCFG with RobusTok

RobusTok inherits the same RAR backbone as 1D-Tokenizer, so the SoftCFG hooks mirror that integration. After syncing the new RobusTok submodule, enable SoftCFG by setting the generator flags or passing them on the CLI:

cd SoftCFG/RobusTok && \
torchrun --nnodes=1 --nproc_per_node=8 sample_imagenet_rar.py \
  config=configs/generator/rar.yaml \
  experiment.output_dir="output/softcfg_robus" \
  experiment.generator_checkpoint="weights/rar_b.pt" \
  model.generator.guidance_scale=16.0 \
  model.generator.guidance_scale_pow=2.75 \
  model.generator.softcfg_strength=1 \
  model.generator.step_norm=True

Notes:

  • model.generator.softcfg_strength: Set to 0 to disable SoftCFG (the config default); positive values apply uncertainty-weighted KV scaling.
  • model.generator.step_norm: Step Normalization is on by default; set to False to match vanilla CFG behavior.
  • The sampler and demo_util.sample_fn(_with_tf) automatically forward these options to RAR.generate.

Contributing

We welcome contributions! To contribute:

  1. Fork the repository.
  2. Create a new branch (git checkout -b feature-branch).
  3. Make your changes and commit (git commit -m "Add support for new AR model").
  4. Push to the branch (git push origin feature-branch).
  5. Open a pull request describing your changes.

Please adhere to PEP 8 standards and include tests for new features.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Citation

If you use SoftCFG, please cite our paper:

@article{SoftCFG2025,
  title={SoftCFG: Uncertainty-Guided Stable Guidance for Visual Autoregressive Model},
  author={Dongli Xu and Aleksei Tiulpin and Matthew B. Blaschko},
  journal={arXiv preprint arXiv:2510.00996},
  year={2025},
  url={https://arxiv.org/abs/2510.00996}
}

Acknowledgments

This implementation builds on AliTok (GitHub), RAR (GitHub), and Lumina-mGPT-2.0 (GitHub). Thanks to the original authors for their foundational work.

Contact

For questions, open a GitHub issue or email dongliixu@gmail.com.
