NWPU -> NKU
Tianjin, China
https://jbwang1997.github.io/
Stars
Official code of "MuDG: Taming Multi-modal Diffusion with Gaussian Splatting for Urban Scene Reconstruction"
[ICCV 2025] The official PyTorch code for the paper: "DiT4SR: Taming Diffusion Transformer for Real-World Image Super-Resolution"
[CVPR 2024 - Oral, Best Paper Award Candidate] Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation
[CVPR 2025 Highlight] Video Depth Anything: Consistent Depth Estimation for Super-Long Videos
Open-source simulator for autonomous driving research.
Official code for the paper: Depth Anything At Any Condition
[ICCV 2023] MatrixCity: A Large-scale City Dataset for City-scale Neural Rendering and Beyond.
[CVPR 2025 Oral & Award Candidate] Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models
[ICCV 2025] DiST-4D: Disentangled Spatiotemporal Diffusion with Metric Depth for 4D Driving Scene Generation
The official code for the paper: LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs
Dingo: A Comprehensive AI Data Quality Evaluation Tool
Adding Scene-Centric Forecasting Control to Occupancy World Model
SUPIR aims to develop practical algorithms for photo-realistic image restoration in the wild. Our new online demo is also released at suppixel.ai.
[ICLR 2023 Oral] Zero-Shot Image Restoration Using Denoising Diffusion Null-Space Model
Cosmos-Predict2 is a collection of general-purpose world foundation models for Physical AI that can be fine-tuned into customized world models for downstream applications.
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
[ICCV 2025] GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors
MMaDA - Open-Sourced Multimodal Large Diffusion Language Models
The official repo of Qwen-VL (通义千问-VL), the chat & pretrained large vision-language model proposed by Alibaba Cloud.
[ICCV 2025] Nexus: Decoupled Diffusion Sparks Adaptive Scene Generation
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.
A Paper List for Humanoid Robot Learning.
Enhancing Representations through Heterogeneous Self-Supervised Learning (TPAMI 2025)
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
Wan: Open and Advanced Large-Scale Video Generative Models
A SOTA open-source image editing model, which aims to provide performance comparable to closed-source models such as GPT-4o and Gemini 2 Flash.