Xin Zhou LMD0311

😇

79 followers · 36 following

Huazhong University of Science & Technology
Wuhan, Hubei Province, China
18:00 (UTC +08:00)
https://orcid.org/0009-0009-4752-6118
@THELMDOFZHOUXIN
https://lmd0311.github.io/

Lists (1)

🚀 My stack

1 repository

Stars

OpenHelix-Team / LLaVA-VLA

LLaVA-VLA: A Simple Yet Powerful Vision-Language-Action Model [Actively Maintained🔥]

Python · 109 stars · 2 forks · Updated Jul 24, 2025

ZhaoYujie2002 / LangSplatV2

Python · 88 stars · 4 forks · Updated Jul 13, 2025

dk-liang / UniFuture

A Unified Driving World Model for Future Generation and Perception

Python · 110 stars · 4 forks · Updated Jul 22, 2025

Perceive-Anything / PAM

Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos

Jupyter Notebook · 267 stars · 14 forks · Updated Jun 26, 2025

Yaofang-Liu / Pusa-VidGen

Pusa: Thousands Timesteps Video Diffusion Model

Python · 523 stars · 38 forks · Updated Jul 26, 2025

InternRobotics / Aether

[ICCV 2025] Aether: Geometric-Aware Unified World Modeling

Python · 416 stars · 4 forks · Updated Jul 7, 2025

UMass-Embodied-AGI / MindJourney

Source code for the paper "MindJourney: Test-Time Scaling with World Models for Spatial Reasoning"

Python · 65 stars · 1 fork · Updated Jul 18, 2025

yyfz / Pi3

Code of π^3: Scalable Permutation-Equivariant Visual Geometry Learning

Python · 844 stars · 21 forks · Updated Jul 18, 2025

LMD0311 / HERMES

[ICCV 2025] HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation

Python · 130 stars · 5 forks · Updated Jul 14, 2025

kwsong0113 / diffusion-forcing-transformer

[ICML 2025] Official PyTorch Implementation of "History-Guided Video Diffusion"

Python · 416 stars · 18 forks · Updated Jul 1, 2025

FlagOpen / RoboBrain2.0

RoboBrain 2.0: Advanced version of RoboBrain. See Better. Think Harder. Do Smarter. 🎉🎉🎉

Python · 472 stars · 36 forks · Updated Jul 25, 2025

facebookresearch / habitat-lab

A modular high-level library to train embodied AI agents across a variety of tasks and environments.

Python · 2,477 stars · 574 forks · Updated Jul 24, 2025

tianweiy / CausVid

(CVPR 2025) From Slow Bidirectional to Fast Autoregressive Video Diffusion Models

Python · 813 stars · 37 forks · Updated May 17, 2025

Stability-AI / stable-virtual-camera

Stable Virtual Camera: Generative View Synthesis with Diffusion Models

Python · 1,386 stars · 96 forks · Updated Jun 5, 2025

HorizonRobotics / EmbodiedGen

Towards a Generative 3D World Engine for Embodied Intelligence

Python · 265 stars · 13 forks · Updated Jul 21, 2025

HW-whistleblower / True-Story-of-Pangu

The true story of the hardship and darkness behind the development of the Noah Pangu large model.

11,284 stars · 1,385 forks · Updated Jul 9, 2025

liuff19 / LangScene-X

[ICCV 2025] LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion

Python · 258 stars · 16 forks · Updated Jul 15, 2025

H-EmbodVis / EasyCache

Less is Enough: Training-Free Video Diffusion Acceleration via Runtime-Adaptive Caching

Python · 201 stars · 3 forks · Updated Jul 19, 2025

PaddlePaddle / ERNIE

The official repository for ERNIE 4.5 and ERNIEKit – its industrial-grade development toolkit based on PaddlePaddle.

Jupyter Notebook · 7,371 stars · 1,400 forks · Updated Jul 25, 2025

runjiali-rl / vmem

[ICCV 2025 ⭐highlight⭐] Implementation of VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory

Python · 222 stars · 11 forks · Updated Jul 25, 2025

PKU-YuanGroup / UniWorld-V1

UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation

Python · 656 stars · 19 forks · Updated Jul 16, 2025

Visual-Agent / DeepEyes

Python · 669 stars · 28 forks · Updated Jul 7, 2025

MiniMax-AI / MiniMax-M1

MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model.

Python · 2,740 stars · 221 forks · Updated Jul 7, 2025

Yuliang-Liu / MonkeyOCR

A lightweight LMM-based Document Parsing Model

Python · 5,237 stars · 321 forks · Updated Jul 24, 2025

bingreeky / GMemory

SAS · 50 stars · 4 forks · Updated Jun 10, 2025

JAMESYJL / ShapeLLM-Omni

A Native Multimodal LLM for 3D Generation and Understanding

Python · 463 stars · 25 forks · Updated Jul 4, 2025

microsoft / TRELLIS

Official repo for paper "Structured 3D Latents for Scalable and Versatile 3D Generation" (CVPR'25 Spotlight).

Python · 10,195 stars · 879 forks · Updated May 30, 2025

facebookresearch / Multi-SpatialMLLM

Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models

Python · 137 stars · 6 forks · Updated May 26, 2025

dvlab-research / Jenga

Official Implementation: Training-Free Efficient Video Generation via Dynamic Token Carving

Python · 221 stars · 10 forks · Updated Jun 29, 2025

helblazer811 / Diffusion-Explorer

Interactive visualizations of the geometric intuition behind diffusion models.

Svelte · 804 stars · 32 forks · Updated Jun 17, 2025