This repo implements Flow Policy Optimization (FPO) for reinforcement learning in continuous action spaces.
Please see the blog and paper for more details.
David McAllister1,*, Songwei Ge1,*, Brent Yi1,*, Chung Min Kim1, Ethan Weber1, Hongsuk Choi1, Haiwen Feng1,2, and Angjoo Kanazawa1. Flow Matching Policy Gradients. arXiv, 2025.
- July 28, 2025: Initial code release.
Our initial release contains the following FPO implementations. Stay tuned for more updates!
gridworld/
contains PyTorch code for the gridworld experiments, which is based on
Eric Yu's PPO implementation.
playground/
contains JAX code for the FPO and PPO baselines in the DeepMind Control Suite experiments, which is based on
MuJoCo Playground and Brax.
phc/
contains PyTorch code for the humanoid control experiments, which is based on
Puffer PHC.
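
At a high level, FPO swaps PPO's exact likelihood ratio for a ratio estimated from per-sample conditional flow matching (CFM) losses, then plugs it into the usual clipped surrogate. The sketch below illustrates that idea only; the function name, arguments, and Monte Carlo details are illustrative assumptions, not the API of the implementations in this repo — see the paper for the actual objective.

```python
import numpy as np

def fpo_clipped_surrogate(cfm_loss_new, cfm_loss_old, advantages, clip_eps=0.2):
    """Illustrative FPO-style clipped surrogate (to be maximized).

    cfm_loss_new / cfm_loss_old: per-sample CFM loss estimates under the
    current and behavior policies. A lower CFM loss corresponds to a
    higher estimated likelihood, so the PPO-style ratio is
    exp(old_loss - new_loss).
    """
    ratio = np.exp(cfm_loss_old - cfm_loss_new)
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    # Pessimistic (elementwise min) clipped objective, as in PPO.
    return np.mean(np.minimum(ratio * advantages, clipped * advantages))
```

When the new and old CFM losses coincide, the ratio is 1 and the surrogate reduces to the mean advantage, matching the PPO limit.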
We thank Qiyang (Colin) Li, Oleg Rybkin, Lily Goli and Michael Psenka for helpful discussions and feedback on the manuscript. We thank Arthur Allshire, Tero Karras, Miika Aittala, Kevin Zakka and Seohong Park for insightful input and feedback on implementation details and the broader context of this work.