RLHFlow

Code for the Workflow of Reinforcement Learning from Human Feedback (RLHF)

Popular repositories

  1. RLHF-Reward-Modeling (Public)

    Recipes for training reward models for RLHF.

    Python · 1.5k stars · 103 forks

  2. Online-RLHF (Public)

    A recipe for online RLHF and online iterative DPO.

    Python · 533 stars · 50 forks

  3. Online-DPO-R1 (Public)

    Codebase for iterative DPO using rule-based rewards.

    Python · 258 stars · 33 forks

  4. Minimal-RL (Public)

    Python · 241 stars · 11 forks

  5. Self-rewarding-reasoning-LLM (Public)

    Recipes for training self-rewarding reasoning LLMs.

    Python · 226 stars · 10 forks

  6. Reinforce-Ada (Public)

    An adaptive sampling framework for Reinforce-style LLM post-training.

    Python · 63 stars · 6 forks

Repositories

Showing 10 of 11 repositories

People

This organization has no public members.
