_{RLinf: Reinforcement Learning Infrastructure for Agentic AI}

_{RLinf: Reinforcement Learning Infrastructure for Agentic AI}

RLinf is a flexible and scalable open-source infrastructure designed for post-training foundation models via reinforcement learning. The 'inf' in RLinf stands for Infrastructure, highlighting its role as a robust backbone for next-generation training. It also stands for Infinite, symbolizing the system’s support for open-ended learning, continuous generalization, and limitless possibilities in intelligence development.

What's NEW!

[2025/08] RLinf is open-sourced. The formal v0.1 will be released soon.
[2025/09] The paper RLinf: Flexible and Efficient Large-scale Reinforcement Learning via Macro-to-Micro Flow Transformation is released.

Key Features

RLinf is unique with:

Macro-to-Micro Flow: a new paradigm M2Flow, which executes macro-level logical flows through micro-level execution flows, decoupling logical workflow construction (programmable) from physical communication and scheduling (efficiency).
Flexible Execution Modes
- Collocated mode: shares all GPUs across all workers.
- Disaggregated mode: enables fine-grained pipelining.
- Hybrid mode: a customizable combination of different placement modes, integrating both collocated and disaggregated modes.
Auto-scheduling Strategy: automatically selects the most suitable execution mode based on the training workload, without the need for manual resource allocation.
Embodied Agent Support
- Fast adaptation support for mainstream VLA models: OpenVLA, OpenVLA-OFT, π₀ and π₀.₅.
- Support for mainstream CPU & GPU-based simulators via standardized RL interfaces: ManiSkill3, LIBERO.
- Enabling the first RL fine-tuning of the $\pi_0$ and $\pi_{0.5}$ model family with a flow-matching action expert.

RLinf is fast with:

Hybrid mode with fine-grained pipelining: achieves a 120%+ throughput improvement compared to other frameworks.
Automatic Online Scaling Strategy: dynamically scales training resources, with GPU switching completed within seconds, further improving efficiency by 20–40% while preserving the on-policy nature of RL algorithms.

RLinf is flexible and easy to use with:

Multiple Backend Integrations
- FSDP + Hugging Face: rapid adaptation to new models and algorithms, ideal for beginners and fast prototyping.
- Megatron + SGLang: optimized for large-scale training, delivering maximum efficiency for expert users with demanding workloads.
Adaptive communication via the asynchronous communication channel
Built-in support for popular RL methods, including PPO, GRPO, DAPO, Reinforce++, and more.

Main Results

Embodied Intelligence

OpenVLA-OFT model results on ManiSkill3
Model	Vision	Semantic	Position	Average
rl4vla	76.6%	75.4%	77.6%	76.1%
GRPO-OpenVLA-OFT	84.6%	51.6%	42.9%	61.5%
PPO-OpenVLA-OFT	80.5%	56.6%	56.1%	64.5%
PPO-OpenVLA	82.0%	80.6%	89.3%	82.2%
GRPO-OpenVLA	74.7%	74.4%	81.6%	75.5%

OpenVLA-OFT model results on LIBERO
Model	Spatial	Goal	Object	Long	Average
OpenVLA-OFT-SFT (one-shot)	56.5%	45.6%	25.6%	9.7%	34.4%
OpenVLA-OFT-RLinf	99.0%	99.0%	99.0%	94.4%	97.9%
Improvement	+42.5%	+53.4%	+73.4%	+84.7%	+63.5%

RLinf supports both PPO and GRPO algorithms, enabling state-of-the-art training for Vision-Language-Action models.
The framework provides seamless integration with mainstream embodied intelligence benchmarks, including ManiSkill3 and LIBERO, and achieves strong performance across diverse evaluation metrics.

Math Reasoning

1.5B model results
Model	AIME 24	AIME 25	GPQA-diamond	Average
DeepSeek-R1-Distill-Qwen-1.5B (base model)	28.33	24.90	27.45	26.89
DeepMath-1.5B	37.80	30.42	32.11	33.44
DeepScaleR-1.5B-Preview	40.41	30.93	27.54	32.96
AReaL-1.5B-Preview-Stage-3	40.73	31.56	28.10	33.46
AReaL-1.5B-retrain*	44.42	34.27	33.81	37.50
FastCuRL-1.5B-V3	43.65	32.49	35.00	37.05
RLinf-math-1.5B	48.44	35.63	38.46	40.84

* We retrain the model using the default settings for 600 steps.

7B model results
Model	AIME 24	AIME 25	GPQA-diamond	Average
DeepSeek-R1-Distill-Qwen-7B (base model)	54.90	40.20	45.48	46.86
AReaL-boba-RL-7B	61.66	49.38	46.93	52.66
Skywork-OR1-7B	66.87	52.49	44.43	54.60
Polaris-7B-Preview	68.55	51.24	43.88	54.56
AceMath-RL-Nemotron-7B	67.30	55.00	45.57	55.96
RLinf-math-7B	68.33	52.19	48.18	56.23

RLinf achieves state-of-the-art performance on math reasoning tasks, consistently outperforming existing models across multiple benchmarks (AIME 24, AIME 25, GPQA-diamond) for both 1.5B and 7B model sizes.

Roadmap

1. System-Level Enhancements

Support for heterogeneous GPUs
Support for asynchronous pipeline execution
Support for Mixture of Experts (MoE)
Support for vLLM inference backend

2. Application-Level Extensions

Support for Vision-Language Models (VLMs) training
Support for deep searcher agent training
Support for multi-agent training
Support for integration with more embodied simulators (e.g., Meta-World, GENESIS, RoboTwin)
Support for more Vision Language Action models (VLAs), such as GR00T, WALL-OSS
Support for world model
Support for real-world RL embodied intelligence

Getting Started

Complete documentation for RLinf can be found Here.

Quickstart

Key Design

Example Gallery

Advanced Features

Extending The Framework:

Blogs

Comparison with VeRL

Build Status

Type	Status
Reasoning RL-MATH
Embodied RL-VLA

Contribution Guidelines

We welcome contributions to RLinf. Please read contribution guide before taking action.

Citation and Acknowledgement

If you find RLinf helpful, please cite the paper:

@misc{yu2025rlinfflexibleefficientlargescale,
  title={RLinf: Flexible and Efficient Large-scale Reinforcement Learning via Macro-to-Micro Flow Transformation}, 
  author={Chao Yu and Yuanqing Wang and Zhen Guo and Hao Lin and Si Xu and Hongzhi Zang and Quanlu Zhang and Yongji Wu and Chunyang Zhu and Junhao Hu and Zixiao Huang and Mingjie Wei and Yuqing Xie and Ke Yang and Bo Dai and Zhexuan Xu and Xiangyuan Wang and Xu Fu and Zhihao Liu and Kang Chen and Weilin Liu and Gang Liu and Boxun Li and Jianlei Yang and Zhi Yang and Guohao Dai and Yu Wang},
  year={2025},
  eprint={2509.15965},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2509.15965}, 
}

If you use RL+VLA in RLinf, you can also cite our empirical study paper:

@misc{liu2025rlbringvlageneralization,
  title={What Can RL Bring to VLA Generalization? An Empirical Study}, 
  author={Jijia Liu and Feng Gao and Bingwen Wei and Xinlei Chen and Qingmin Liao and Yi Wu and Chao Yu and Yu Wang},
  year={2025},
  eprint={2505.19789},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2505.19789}, 
}

Acknowledgements RLinf has been inspired by, and benefits from, the ideas and tooling of the broader open-source community. In particular, we would like to thank the teams and contributors behind VeRL, AReaL, Megatron-LM, SGLang, and PyTorch Fully Sharded Data Parallel (FSDP), and if we have inadvertently missed your project or contribution, please open an issue or a pull request so we can properly credit you.

Contact: We welcome applications from Postdocs, PhD/Master's students, and interns. Join us in shaping the future of RL infrastructure and embodied AI!

Chao Yu: zoeyuchao@gmail.com
Yu Wang: yu-wang@tsinghua.edu.cn

Name		Name	Last commit message	Last commit date
Latest commit History 76 Commits
.github		.github
docker		docker
docs		docs
examples		examples
ray_utils		ray_utils
requirements		requirements
rlinf		rlinf
tests		tests
toolkits		toolkits
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

_{RLinf: Reinforcement Learning Infrastructure for Agentic AI}

What's NEW!

Key Features

Main Results

Embodied Intelligence

Math Reasoning

Roadmap

1. System-Level Enhancements

2. Application-Level Extensions

Getting Started

Build Status

Contribution Guidelines

Citation and Acknowledgement

About

Uh oh!

Releases

Packages

Languages

License

anHappyDog/RLinf

Folders and files

Latest commit

History

Repository files navigation

RLinf: Reinforcement Learning Infrastructure for Agentic AI

What's NEW!

Key Features

Main Results

Embodied Intelligence

Math Reasoning

Roadmap

1. System-Level Enhancements

2. Application-Level Extensions

Getting Started

Build Status

Contribution Guidelines

Citation and Acknowledgement

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

_{RLinf: Reinforcement Learning Infrastructure for Agentic AI}

Packages