- NWPU -> NKU
- Tianjin, China
- https://jbwang1997.github.io/
Stars
[NeurIPS 2025] Official implementation for "Flow Matching-Based Autonomous Driving Planning with Advanced Interactive Behavior Modeling"
[ICCV 2025] SuperDec: 3D Scene Decomposition with Superquadric Primitives.
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
Official Implementation of DA^2: Depth Anything in Any Direction
SOTAMak1r / Infinite-Forcing
Forked from guandeh17/Self-Forcing
Infinite-Forcing: Towards Infinite-Long Video Generation
[NeurIPS 2025 (Spotlight)] The implementation for the paper "4DGT: Learning a 4D Gaussian Transformer Using Real-World Monocular Videos"
[NeurIPS'25 Spotlight] GeoSVR: Taming Sparse Voxels for Geometrically Accurate Surface Reconstruction
[NeurIPS 2025] RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning
A minimal implementation of DeepMind's Genie world model
[CVPR 2024 Highlight] Visual Point Cloud Forecasting
Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)
Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion (ICCV 2025)
Qwen3-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
[NeurIPS 2025] "Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning"
Official implementation of "Visual Instruction Pretraining for Domain-Specific Foundation Models"
MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal search tools.
Tongyi Deep Research, the Leading Open-source Deep Research Agent
[CVPR'25 Oral] MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision
OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
Official implementation of Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction
Code for FastVGGT: Training-Free Acceleration of Visual Geometry Transformer
[ICLR'23 Spotlight & ECCV'24 & IJCV'24] MapTR: Structured Modeling and Learning for Online Vectorized HD Map Construction
[ICLR 2025] A PyTorch implementation for STORM: Spatiotemporal Reconstruction Model for Large-Scale Outdoor Scenes
Scalable and Generalizable Autonomous Driving Scene Synthesis
The official repo for the paper "VQ-VLA: Improving Vision-Language-Action Models via Scaling Vector-Quantized Action Tokenizers" (ICCV 2025)
MiroThinker is a series of open-source agentic models trained for deep research and complex tool-use scenarios.