rlvr

Here are 23 public repositories matching this topic...

alibaba / ROLL

An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models

rlhf agentic rlvr

Updated Oct 14, 2025
Python

pat-jj / s3

[EMNLP'25] s3 - ⚡ Efficient & Effective Search Agent Training via RL for RAG (Verifier-Powered RLVR for Search with Minimal Data)

information-retrieval efficiency verifier rag large-language-models search-agent gpt-5 agentic-ai rlvr

Updated Oct 15, 2025
Python

thinkwee / AgentsMeetRL

An Awesome List of Agentic Models Trained with Reinforcement Learning

agent awesome-list multiagent reinforcement llm rlhf large-language-model tool-learning agentic-workflow agentic-ai agentic-coding rlvr llm-age

Updated Oct 13, 2025
HTML

thuml / RLVR-World

Official repository for "RLVR-World: Training World Models with Reinforcement Learning" (NeurIPS 2025), https://arxiv.org/abs/2505.13934

text-game video-generation robotic-manipulation video-prediction web-agent real2sim world-model webarena video-gpt grpo verl rlvr reinforcement-learning-with-verifiable-rewards

Updated Sep 27, 2025
Python

InternLM / CapRL

An official implementation of "CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning"

image-captioning multi-modal caption-generation llm vision-language-model large-vision-language-models grpo rlvr

Updated Oct 15, 2025
Python

A curated list of awesome resources about reward construction for AI agents. This repository covers cutting-edge research and practical guides on defining and collecting rewards to build more intelligent and aligned AI agents.

agent awesome reinforcement-learning rl awesome-list llm reward-model agentic-ai rlvr agent-training

Updated Sep 1, 2025

teilomillet / retrain

A Python library that uses Reinforcement Learning (RL) to train LLMs.

mcp rl llm deepseek rlvr

Updated Aug 1, 2025
Python

sileod / reasoning_core

An RL environment with procedurally generated symbolic reasoning data

logic dataset dataset-generation reasoning llm grpo verifiers rlvr

Updated Sep 25, 2025
Python

RUC-GSAI / YuLan-SwarmIntell

🐝 SwarmBench: Benchmarking LLMs' Swarm Intelligence

benchmark swarm swarm-intelligence kilobots swarm-robotics llms-benchmarking rlvr

Updated May 21, 2025
Python

osoleve / glitchlings

Enemies for your LLM

nlp linguistics adversarial-data-augmentation rlvr

Updated Oct 16, 2025
Python

zli12321 / free-form-grpo

GRPO for training long-form QA and instructions with a long-form reward model

reinforcement-learning-algorithms evaluation-framework reward-design rl-training long-form-text-generation qwen2-5 grpo rlvr

Updated Jul 17, 2025
Python

purbeshmitra / MOTIF

MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs

reinforcement-learning llm-training rlvr

Updated Jul 6, 2025
Python

slowfastai / LLM-Tool-Integrated-Reasoning-TIR-Papers

A curated collection of research papers on LLM Tool-Integrated Reasoning (TIR), where LLMs enhance reasoning by interacting with external tools such as calculators, search engines, and code interpreters.

reinforcement-learning reinf tool-use large-language-models llms function-calling llm-reasoning rlvr tool-integrated-reasoning

Updated Aug 20, 2025

kylebrussell / cap-rlvr

CAP RLVR: Reinforcement Learning from Human Feedback for Legal Reasoning using Caselaw Access Project data. Complete GRPO training pipeline with OpenAI Gym environments, deterministic reward functions, and multi-stage curriculum learning for legal LLM development.