- SK Telecom
- @South Korea
Stars
Qwen3-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
🔥🔥 Official Repo of UMO: Scaling Multi-Identity Consistency for Image Customization via Matching Reward
Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference
HunyuanImage-2.1: An Efficient Diffusion Model for High-Resolution (2K) Text-to-Image Generation
A parallel VAE that avoids OOM in high-resolution image generation
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
A unified inference and post-training framework for accelerated video generation.
An official implementation of DanceGRPO: Unleashing GRPO on Visual Generation
Official implementation of Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning
[Official] Voost: A Unified and Scalable Diffusion Transformer for Bidirectional Virtual Try-On and Try-Off
Reference PyTorch implementation and models for DINOv3
Embedding Atlas is a tool that provides interactive visualizations for large embeddings. It allows you to visualize, cross-filter, and search embeddings and metadata.
[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory…
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
Official repository of "Beyond Fixed: Training-Free Variable-Length Denoising for Diffusion Large Language Models"
Wan: Open and Advanced Large-Scale Video Generative Models
[NeurIPS 2025] Official implementation of "XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation".
Ming - facilitating advanced multimodal understanding and generation capabilities built upon the Ling LLM.
The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model.
MiniCPM4 & MiniCPM4.1: Ultra-Efficient LLMs on End Devices, achieving 3x+ generation speedup on reasoning tasks