Stars
Code release for 'Struct2D: A Perception-Guided Framework for Spatial Reasoning in Large Multimodal Models'
Official PyTorch implementation of One-Minute Video Generation with Test-Time Training
[CVPR 2025] WildAvatar: Learning In-the-wild 3D Avatars from the Web
Wan: Open and Advanced Large-Scale Video Generative Models
Scalable and memory-optimized training of diffusion models
A high-throughput and memory-efficient inference and serving engine for LLMs
SGLang is a fast serving framework for large language models and vision language models.
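This project aims to share the technical principles behind large models and hands-on experience (large-model engineering and deploying large-model applications).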
[ICCV 2025, Oral] TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models
No fortress, purely open ground. OpenManus is Coming.
[CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents
SkyReels V1: The first and most advanced open-source human-centric video foundation model
LAVIS - A One-stop Library for Language-Vision Intelligence
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
HunyuanVideo: A Systematic Framework For Large Video Generation Model
New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos
A simple pip-installable Python tool to generate your own HTML citation world map from your Google Scholar ID.
FleVRS: Towards Flexible Visual Relationship Segmentation, NeurIPS 2024
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
The best OSS video generation models, created by Genmo
Agent S: an open agentic framework that uses computers like a human
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
[CVPR 2024] On the Content Bias in Fréchet Video Distance