Tsinghua University
Beijing, China (UTC +08:00)
thuwzy.github.io
Starred repositories
Qwen-Image-Lightning: Speed up Qwen-Image model with distillation
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.
Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference
Hunyuan3D-Omni: A Unified Framework for Controllable Generation of 3D Assets
ViPE: Video Pose Engine for Geometric 3D Perception
A curated collection of fun and creative examples generated with Nano Banana🍌, a Gemini-2.5-flash-image based model. We also release Nano-consistent-150K openly to support the community's development…
Voyager is an interactive RGBD video generation model conditioned on camera input, and supports real-time 3D reconstruction.
4DNeX: Feed-Forward 4D Generative Modeling Made Easy
Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition
Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model
Generate large-scale explorable 3D scenes with high-quality panorama videos from a single image or text prompt.
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
Code for "Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers" (NeurIPS 2024)
Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels with Hunyuan3D World Model
PhysX: Physical-Grounded 3D Asset Generation (NeurIPS 2025, Spotlight)
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
Towards a Generative 3D World Engine for Embodied Intelligence
Mesh Silksong: Auto-Regressive Mesh Generation as Weaving Silk
Code implementation for: From Virtual Games to Real-World Play
Awesome-LLM-3D: a curated list of resources on multi-modal large language models in the 3D world
Efficient Part-level 3D Object Generation via Dual Volume Packing
Official codebase for "Self Forcing: Bridging Training and Inference in Autoregressive Video Diffusion" (NeurIPS 2025 Spotlight)
[NeurIPS 2025 Spotlight] A Native Multimodal LLM for 3D Generation and Understanding
Ongoing research training transformer models at scale
Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets
Official repository for BrickGPT, the first approach for generating physically stable toy brick models from text prompts.