Tsinghua University, Beijing, China

Stars
Generative Universal Verifier as Multimodal Meta-Reasoner
verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for the paper "Group-in-Group Policy Optimization for LLM Agent Training".
MiroThinker is a series of open-source agentic models trained for deep research and complex tool-use scenarios.
(ArXiv25) Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning
A Benchmark for Evaluating MLLMs' Geometry Performance on Long-Step Problems Requiring Auxiliary Lines
A version of verl that supports diverse tool use
My learning notes and code for ML systems (MLSys).
A unified framework for controllable caption generation across images, videos, and audio. Supports multi-modal inputs and customizable caption styles.
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.
[NeurIPS'25] The official code of "PeRL: Permutation-Enhanced Reinforcement Learning for Interleaved Vision-Language Reasoning"
[ICLR2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM.
The official repo of Qwen-VL (通义千问-VL), the chat and pretrained large vision-language model proposed by Alibaba Cloud.
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, Llava, GLM4v, Ph…
Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024); see the LoRA sketch after this list.
Fully open reproduction of DeepSeek-R1
verl: Volcano Engine Reinforcement Learning for LLMs
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL; see the GRPO advantage sketch after this list.
Latest Advances on System-2 Reasoning
This repository provides a valuable reference for researchers in the field of multimodality; start your exploration of RL-based reasoning MLLMs here!
OctoTools: An agentic framework with extensible tools for complex reasoning
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
Witness the "aha moment" of a VLM for less than $3.
ToRA is a series of Tool-integrated Reasoning LLM Agents designed to solve challenging mathematical reasoning problems by interacting with tools [ICLR'24]; see the tool-call loop sketch after this list.
Scalable RL solution for advanced reasoning of language models
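The fine-tuning toolkits starred above (ms-swift, LLaMA-Factory) wrap parameter-efficient methods such as LoRA behind their own CLIs and configs. As a rough illustration of the underlying pattern, here is a minimal sketch using the Hugging Face peft library directly rather than either toolkit's API; the base model name is chosen only for illustration.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

# Illustrative base model; any causal LM supported by the toolkits above works similarly.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")

# Attach low-rank adapters so only a small fraction of the weights is trained.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of parameters are trainable
```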
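Several of the RL frameworks above (verl, EasyR1, verl-agent) support GRPO-style training, whose core step is a group-relative advantage: each sampled response is scored against the other responses to the same prompt. The sketch below is a generic illustration of that normalization, not code from any of the listed repositories.

```python
import numpy as np

def grpo_advantages(rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Group-relative advantages: normalize each response's reward by the
    mean and std of its own group (all responses sampled for one prompt)."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: four responses to one prompt, scored 0/1 by a rule-based verifier.
group_rewards = np.array([1.0, 0.0, 0.0, 1.0])
print(grpo_advantages(group_rewards))  # correct answers get positive advantage
```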
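Finally, the tool-integrated reasoning agents above (ToRA, OctoTools, Agent-R1) all revolve around a loop of parsing a tool call from the model's output, executing it, and feeding the observation back. The toy sketch below illustrates that loop with made-up tag names and a made-up tool registry; it is not taken from any of those codebases.

```python
import re

def call_tool(name: str, arg: str) -> str:
    """Hypothetical tool registry; real frameworks route to code interpreters, search, etc."""
    tools = {"calculator": lambda expr: str(eval(expr, {"__builtins__": {}}))}
    return tools[name](arg)

def agent_step(model_output: str) -> str:
    """If the model emitted a <tool>name: args</tool> block, run the tool and
    return the observation to append to the context; otherwise pass the text through."""
    match = re.search(r"<tool>(\w+):\s*(.+?)</tool>", model_output)
    if match:
        return f"<observation>{call_tool(match.group(1), match.group(2))}</observation>"
    return model_output

print(agent_step("<tool>calculator: (12 + 7) * 3</tool>"))  # -> <observation>57</observation>
```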