Stars
[ICLR 2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
Qwen-Image-Lightning: Speed up the Qwen-Image model with distillation
An open source implementation of CLIP.
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image (see the usage sketch after this list)
Background Music, a macOS audio utility: automatically pause your music, set individual apps' volumes and record system audio.
⚡FlashRAG: A Python Toolkit for Efficient RAG Research (WWW 2025 Resource)
Code for Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? [COLM 2024]
A state-of-the-art open-source image editing model that aims to provide performance comparable to closed-source models like GPT-4o and Gemini 2 Flash.
(CVPR 2025) Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
Qwen2.5-Omni is an end-to-end multimodal model from the Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, and video, and of generating speech in real time.
Qwen3-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud.
ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations [COLM 2025]
Fine-Tuning Dataset Auto-Generation for Graph Query Languages.
Chat2Graph: Graph Native Agentic System.
FULL Augment Code, Claude Code, Cluely, CodeBuddy, Comet, Cursor, Devin AI, Junie, Kiro, Leap.new, Lovable, Manus Agent Tools, NotionAI, Orchids.app, Perplexity, Poke, Qoder, Replit, Same.dev, Trae…
[ICCV 2025] DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models (official implementation)
GPT-ImgEval: Evaluating GPT-4o’s state-of-the-art image generation capabilities
Official implementation of the paper "AnyDoor: Zero-Shot Object-Level Image Customization"
[CVPR 2025 Highlight] Official implementation of "MangaNinja: Line Art Colorization with Precise Reference Following"
VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model
The image prompt adapter (IP-Adapter) is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt (see the sketch after this list).
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
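For the two CLIP entries above, here is a minimal zero-shot image-text matching sketch against the openai/CLIP Python API; the image path and the candidate captions are placeholder assumptions:

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Placeholder inputs: one image and two candidate captions.
image = preprocess(Image.open("cat.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(["a photo of a cat", "a photo of a dog"]).to(device)

with torch.no_grad():
    # Similarity logits between the image and each caption.
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1)

print(probs)  # the higher-probability caption is the more relevant snippet
```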
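For the IP-Adapter entry above, a minimal sketch of attaching an image prompt adapter to a pretrained Stable Diffusion pipeline, assuming a recent diffusers release with built-in IP-Adapter loading; the model IDs, reference image, and prompt are placeholders:

```python
import torch
from diffusers import StableDiffusionPipeline
from diffusers.utils import load_image

# Load a pretrained text-to-image pipeline, then attach IP-Adapter weights
# so a reference image can steer generation alongside the text prompt.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_ip_adapter(
    "h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin"
)
pipe.set_ip_adapter_scale(0.6)  # trade-off between image prompt and text prompt

ip_image = load_image("style_reference.png")  # placeholder reference image
result = pipe(
    prompt="a dog sitting on the beach, best quality",
    ip_adapter_image=ip_image,
    num_inference_steps=50,
).images[0]
result.save("output.png")
```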