Flow Matching Policy Gradients

This repo implements Flow Policy Optimization (FPO) for reinforcement learning in continuous action spaces.

Please see the blog and paper for more details.
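At a high level, FPO trains a flow matching policy with a PPO-style clipped surrogate, replacing the exact likelihood ratio with one estimated from the change in the conditional flow matching (CFM) loss on the sampled actions. The sketch below illustrates this idea in PyTorch; it is a minimal illustration assuming a hypothetical `velocity_net(x_t, t, obs)` policy network, and the helper names and hyperparameters are ours, not this repo's actual API.

```python
import torch

# Hypothetical sketch of the FPO objective, not this repo's actual API.
# `velocity_net(x_t, t, obs)` is an assumed policy network that predicts
# the flow's velocity at interpolation point x_t and time t.

def cfm_loss(velocity_net, obs, actions, noise, t):
    """Per-sample conditional flow matching loss for the taken actions.

    Uses a rectified-flow interpolant x_t = (1 - t) * noise + t * action,
    whose conditional velocity target is (action - noise).
    """
    x_t = (1.0 - t) * noise + t * actions
    target_v = actions - noise
    pred_v = velocity_net(x_t, t, obs)
    return ((pred_v - target_v) ** 2).mean(dim=-1)  # shape: (batch,)

def fpo_surrogate(velocity_net, obs, actions, noise, t, old_loss, adv,
                  clip_eps=0.2):
    """PPO-style clipped surrogate with the likelihood ratio replaced by
    exp(old CFM loss - new CFM loss).

    `old_loss` must be computed with the *same* (noise, t) samples under
    the pre-update parameters, so the difference estimates the log-ratio.
    """
    new_loss = cfm_loss(velocity_net, obs, actions, noise, t)
    ratio = torch.exp(old_loss - new_loss)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return -torch.min(ratio * adv, clipped * adv).mean()
```

See the paper for the exact estimator and its derivation.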

David McAllister¹*, Songwei Ge¹*, Brent Yi¹*, Chung Min Kim¹, Ethan Weber¹, Hongsuk Choi¹, Haiwen Feng¹,², and Angjoo Kanazawa¹. Flow Matching Policy Gradients. arXiv, 2025.
¹UC Berkeley, ²Max Planck Institute for Intelligent Systems

Updates

  • July 28, 2025: Initial code release.

Repository Structure

Our initial release contains three FPO implementations. Stay tuned for more updates!

Gridworld

gridworld/ contains PyTorch code for the gridworld experiments, which is based on Eric Yu's PPO implementation.

MuJoCo Playground

playground/ contains JAX code for both FPO and the PPO baselines in the DeepMind Control Suite experiments, built on MuJoCo Playground and Brax.

PHC

phc/ contains PyTorch code for the humanoid control experiments, which is based on Puffer PHC.

Acknowledgements

We thank Qiyang (Colin) Li, Oleg Rybkin, Lily Goli and Michael Psenka for helpful discussions and feedback on the manuscript. We thank Arthur Allshire, Tero Karras, Miika Aittala, Kevin Zakka and Seohong Park for insightful input and feedback on implementation details and the broader context of this work.
