Stars
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
Code release for https://kovenyu.com/WonderWorld/
EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling
Official implementation for "JanusVLN: Decoupling Semantics and Spatiality with Dual Implicit Memory for Vision-Language Navigation"
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation
rCM: SOTA Diffusion Distillation & Few-Step Video Generation
[ICCV 2025] D^3QE: Learning Discrete Distribution Discrepancy-aware Quantization Error for Autoregressive-Generated Image Detection
A fast and simple implementation of RL algorithms, designed to run fully on GPU.
Official repo for paper "EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning"
Media Downloader is a Qt/C++ front end to yt-dlp, youtube-dl, gallery-dl, lux, you-get, svtplay-dl, aria2c, wget and safari books.
[SIGGRAPH 2025] One Model to Rig Them All: Diverse Skeleton Rigging with UniRig
TransNet V2: Shot Boundary Detection Neural Network
TypeMovie-ParaAttention is an enhanced version of ParaAttention, designed to accelerate Diffusion Transformer (DiT) model inference with context parallelism, dynamic caching, and a new high-perform…
Pandora: Towards General World Model with Natural Language Actions and Video States
The fundamental package for scientific computing with Python.
Implementation of "Hyperspherical Latents Improve Continuous-Token Autoregressive"
Training library for Megatron-based models
Official code for the CVPR 2025 paper "SemanticDraw: Towards Real-Time Interactive Content Creation from Image Diffusion Models."
An open-source vibe-coding platform that helps you build your own vibe-coding platform, built entirely on the Cloudflare stack
[CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation