Stars
OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340
(CVPR 2025) Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis
[SIGGRAPH Asia 2025] DreamO: A Unified Framework for Image Customization
Reference PyTorch implementation and models for DINOv3
TrackGait is a subproject of OpenGait that implements a gait recognition system.
[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". A…
[TMLR 2025🔥] A survey of autoregressive models in vision.
NarrLV: Towards a Comprehensive Narrative-Centric Evaluation for Long Video Generation Models
Implementation of "FLUX-Text: A Simple and Advanced Diffusion Transformer Baseline for Scene Text Editing"
[ICLR 2025] ControlAR: Controllable Image Generation with Autoregressive Models
Official implementation of "STAR: Scale-wise Text-to-image generation via Auto-Regressive representations"
EditAR: Unified Conditional Generation with Autoregressive Models (CVPR 2025)
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning
Official training and inference code for the bitwise tokenizer
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning
[NeurIPS 2024] The official PyTorch implementation of "Learning Truncated Causal History Model for Video Restoration".
A collection of literature after or concurrent with Masked Autoencoder (MAE) (Kaiming He et al.).
This is the official implementation for ControlVAR.
High-performance Image Tokenizers for VAR and AR
This is a repo to track the latest autoregressive visual generation papers.
PyTorch implementation of FractalGen https://arxiv.org/abs/2502.17437
Industry-leading face manipulation platform
[CVPR 2025 Oral] Infinity ∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
State-of-the-art 2D and 3D Face Analysis Project
[CVPR 2025 (Oral)] Open implementation of "RandAR"
🔥[CVPR2025] EventGPT: Event Stream Understanding with Multimodal Large Language Models
DiffuEraser is a diffusion model for video inpainting that achieves strong content completeness and temporal consistency while maintaining acceptable efficiency.