2 changes: 1 addition & 1 deletion .gitignore
@@ -180,4 +180,4 @@ log
.DS_Store
Roms

scratch/*.py
scratch/*
152 changes: 41 additions & 111 deletions README.md
@@ -826,85 +826,52 @@ If you're using TorchRL, please refer to this BibTeX entry to cite this work:

## Installation

Create a conda environment where the packages will be installed.
### Create a new virtual environment:
```bash
python -m venv torchrl
source torchrl/bin/activate  # On Windows use: torchrl\Scripts\activate
```

Or create a conda environment where the packages will be installed.

```
conda create --name torch_rl python=3.9
conda activate torch_rl
conda create --name torchrl python=3.9
conda activate torchrl
```

**PyTorch**
### Install dependencies:

Depending on the use of functorch that you want to make, you may want to
#### PyTorch

Depending on the use of torchrl that you want to make, you may want to
install the latest (nightly) PyTorch release or the latest stable version of PyTorch.
See [here](https://pytorch.org/get-started/locally/) for a detailed list of commands,
including `pip3` or other special installation instructions.
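
For example, a CPU-only install of the stable release typically looks like the snippet below (a sketch only; use the selector linked above to get the exact command for your OS, package manager and CUDA version):

```bash
# Stable CPU-only PyTorch wheels from the official index (adjust the index URL for CUDA builds)
pip3 install torch --index-url https://download.pytorch.org/whl/cpu
```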

**Torchrl**
TorchRL offers a few pre-defined dependencies such as `"torchrl[tests]"`, `"torchrl[atari]"` etc.

#### Torchrl

You can install the **latest stable release** by using
```bash
pip3 install torchrl
```
This should work on Linux, Windows 10 and macOS (Intel or Apple Silicon chips).
On certain Windows machines (Windows 11), one should install the library locally (see below).

For AArch64 machines, the binaries are not yet stored on PyPI so you will need to download them directly from
the [release page](https://github.com/pytorch/rl/releases/) or install the library via
```
pip3 install git+https://github.com/pytorch/rl@v0.8.0
```
This should work on Linux (including AArch64 machines), Windows 10 and macOS (Apple Silicon chips only).
On certain Windows machines (Windows 11), one should build the library locally.
This can be done in two ways:

To install extra dependencies, call
```bash
pip3 install "torchrl[atari,dm_control,gym_continuous,rendering,tests,utils,marl,open_spiel,checkpointing]"
```
or a subset of these.

To install torchrl together with a PyTorch version recent enough for all replay buffer features, use
```bash
pip3 install "torchrl[replay_buffer]"
```
since some features in the replay buffer require PyTorch 2.7.0 or above.

One may also desire to install the library locally. Three main reasons can motivate this:
- the nightly/stable release isn't available for one's platform (e.g., Windows 11, nightlies for Apple Silicon, etc.);
- contributing to the code;
- installing torchrl with a previous version of PyTorch (any version >= 2.1). Note that this should also be doable via a regular install
  followed by a downgrade to an earlier PyTorch version -- but the C++ binaries will not be available, so some features, such as
  prioritized replay buffers, will not work.

**Disclaimer**: As of today, TorchRL is roughly compatible with any pytorch version >= 2.1 and installing it will not
directly require a newer version of pytorch to be installed. Indirectly though, tensordict still requires the latest
PyTorch to be installed and we are working hard to loosen that requirement.
The C++ binaries of TorchRL (mainly for prioritized replay buffers) will only work with PyTorch 2.7.0 and above.
Some features (e.g., working with nested jagged tensors) may also
be limited with older versions of pytorch. It is recommended to use the latest TorchRL with the latest PyTorch version
unless there is a strong reason not to do so.

To install the library locally, start by cloning the repo:
```bash
# Install and build locally v0.8.1 of the library without cloning
pip3 install git+https://github.com/pytorch/rl@v0.8.1
# Clone the library and build it locally
git clone https://github.com/pytorch/tensordict
git clone https://github.com/pytorch/rl
```
and don't forget to check out the branch or tag you want to use for the build:
```bash
git checkout v0.8.0
pip install -e tensordict
pip install -e rl
```

Go to the directory where you have cloned the torchrl repo and install it (after
installing `ninja`)
```bash
cd /path/to/torchrl/
pip3 install ninja -U
python setup.py develop
```
Note that a local tensordict build requires `cmake` to be installed via [homebrew](https://brew.sh/) (macOS) or another package manager
such as `apt`, `apt-get`, `conda` or `yum` (but NOT `pip`), as well as `pip install "pybind11[global]"`.
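
For instance, the build prerequisites can be set up roughly as follows (a sketch; package names are the usual ones for Homebrew and apt, so double-check them on your system):

```bash
# macOS (Homebrew)
brew install cmake ninja

# Ubuntu / Debian
# sudo apt-get install -y cmake ninja-build

# In the active Python environment
pip install "pybind11[global]"
```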

One can also build the wheels to distribute to co-workers using
```bash
python setup.py bdist_wheel
```
Your wheels will be stored under `./dist/torchrl<name>.whl` and installable via
```bash
pip install torchrl<name>.whl
```

**Warning**: Unfortunately, `pip3 install -e .` does not currently work. Contributions to help fix this are welcome!

On M1 machines, this should work out-of-the-box with the nightly build of PyTorch.
If building this artifact on macOS M1 does not work correctly, or if at execution time the message
`(mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e'))` appears, then try

```
ARCHFLAGS="-arch arm64" python setup.py develop
```

The **nightly build** can be installed via
```bash
pip3 install tensordict-nightly torchrl-nightly
```
which we currently only ship for Linux machines.
Importantly, the nightly builds require the nightly builds of PyTorch too.
Also, a local build of torchrl combined with the nightly build of tensordict may fail: install both nightlies or both local builds, but do not mix them.

To run a quick sanity check, leave that directory (e.g. by executing `cd ~/`)
and try to import the library.
```
python -c "import torchrl"
```
This should not return any warning or error.

**Optional dependencies**

@@ -959,43 +926,6 @@
```bash
pip3 install tensorboard
pip3 install wandb
```

**Troubleshooting**

If a `ModuleNotFoundError: No module named 'torchrl._torchrl'` error occurs (or
a warning indicating that the C++ binaries could not be loaded),
it means that the C++ extensions were not installed or not found.

- One common reason might be that you are trying to import torchrl from within the
git repo location. The following code snippet should return an error if
torchrl has not been installed in `develop` mode:
```
cd ~/path/to/rl/repo
python -c 'from torchrl.envs.libs.gym import GymEnv'
```
If this is the case, consider executing torchrl from another location.
- If you're not importing torchrl from within its repo location, it could be
caused by a problem during the local installation. Check the log after the
`python setup.py develop`. One common cause is a g++/C++ version discrepancy
and/or a problem with the `ninja` library.
- If the problem persists, feel free to open an issue on the topic in the repo,
we'll do our best to help!
- On **macOS**, we recommend installing Xcode first.
With Apple Silicon M1 chips, make sure you are using the arm64-built python
(e.g. [here](https://betterprogramming.pub/how-to-install-pytorch-on-apple-m1-series-512b3ad9bc6)).
Running the following lines of code
```
wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
python collect_env.py
```
should display
```
OS: macOS *** (arm64)
```
and not
```
OS: macOS **** (x86_64)
```

Versioning issues can cause error messages of the type `undefined symbol`
and such. For these, refer to the [versioning issues document](https://github.com/pytorch/rl/blob/main/knowledge_base/VERSIONING_ISSUES.md)
for a complete explanation and proposed workarounds.
8 changes: 1 addition & 7 deletions setup.py
@@ -195,13 +195,7 @@ def _main(argv):
sys.argv = [sys.argv[0]] + unknown

extra_requires = {
"atari": [
"gym",
"atari-py",
"ale-py",
"gym[accept-rom-license]",
"pygame",
],
"atari": ["gymnasium[atari]"],
"dm_control": ["dm_control"],
"replay_buffer": ["torch>=2.7.0"],
"gym_continuous": ["gymnasium<1.0", "mujoco"],
136 changes: 136 additions & 0 deletions sota-implementations/grpo/README.md
@@ -0,0 +1,136 @@
# GRPO: Group Relative Policy Optimization

This is an implementation of GRPO for language models, built on top of TorchRL.

## Overview

GRPO is a method for training language models using reinforcement learning, with the following key features:
- Multi-GPU support with efficient device management
- Mixed precision training
- Gradient accumulation
- Automatic checkpointing
- Comprehensive logging with Weights & Biases
- Hydra configuration system
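
At its core, GRPO samples a group of G completions per prompt, normalizes their rewards into a group-relative advantage, and optimizes a clipped, KL-regularized objective against a reference model. A sketch of the usual formulation from the GRPO literature (the exact loss terms used by this implementation may differ):

$$
\hat{A}_i = \frac{r_i - \operatorname{mean}(r_1,\dots,r_G)}{\operatorname{std}(r_1,\dots,r_G)},\qquad
J(\theta) = \mathbb{E}\!\left[\frac{1}{G}\sum_{i=1}^{G}\frac{1}{|o_i|}\sum_{t=1}^{|o_i|}
\min\!\big(\rho_{i,t}\hat{A}_i,\ \operatorname{clip}(\rho_{i,t},1-\epsilon,1+\epsilon)\hat{A}_i\big)\right]
- \beta\, D_{\mathrm{KL}}\!\left(\pi_\theta \,\|\, \pi_{\mathrm{ref}}\right)
$$

where $\rho_{i,t}$ is the per-token probability ratio between the current policy and the policy that generated the completions.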

## Installation

1. Install dependencies:
```bash
# GSM8K deps
pip install -r sota-implementations/grpo/requirements_gsm8k.txt
# IFEval deps
pip install -r sota-implementations/grpo/requirements_ifeval.txt
```

2. Set required environment variables:
```bash
export VLLM_USE_V1=0 # Required for vLLM compatibility
```

## Hardware Requirements

- At least 3 CUDA-capable GPUs:
- Training device(s)
- vLLM inference device
- Reference model device

Devices can be controlled via the `training_model.devices`, `inference_model.devices` and `ref_model.devices` arguments.
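
For example, a four-GPU split could be requested as follows (illustrative values; the exact format expected by the `*.devices` fields should be checked against the YAML config files):

```bash
# Hypothetical split: two GPUs for training, one for vLLM inference, one for the reference model
python grpo.py 'training_model.devices=[0,1]' 'inference_model.devices=[2]' 'ref_model.devices=[3]'
```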

## Configuration

The training configuration is managed through Hydra. There are two main configuration files:
- `config/grpo_gsm8k.yaml`: Configuration for GSM8K tasks (the default)
- `config/grpo_ifeval.yaml`: Configuration optimized for IFEval tasks

## Usage

### Basic Training

```bash
python grpo.py
```

### Run with IFEval Config

```bash
python grpo.py --config-name grpo_ifeval
```

### Override Config Values

```bash
# Change dataset
python grpo.py env.dataset=ifeval

# Modify training parameters
python grpo.py train.epochs=2 train.optimizer.lr=2e-5

# Change model
python grpo.py model.name=meta-llama/Llama-2-7b-hf
```

### Hyperparameter Sweeps

```bash
# Learning rate sweep
python grpo.py --multirun train.optimizer.lr=1e-4,1e-5,1e-6

# Multiple parameters
python grpo.py --multirun \
train.optimizer.lr=1e-4,1e-5 \
policy.kl_coef=0.01,0.1
```

## Monitoring

Training progress is logged to Weights & Biases with the following metrics:
- Reward
- Advantage
- KL penalty
- Sequence length
- ESS (Effective Sample Size)
- Loss metrics (objective, clip fraction, etc.)
- Gradient norm

## Checkpointing

Checkpoints are saved every `logging.checkpoint_frequency` batches and contain:
- Model state
- Optimizer state
- Gradient scaler state (for mixed precision)
- Full configuration
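
A quick way to peek at what a checkpoint contains (a sketch; the path and key names are illustrative, not guaranteed by this implementation):

```bash
# Print the top-level keys of a saved checkpoint (adjust the path to an actual run)
python -c "import torch; ckpt = torch.load('outputs/2025-01-01/00-00-00/checkpoints/checkpoint_100.pt', map_location='cpu', weights_only=False); print(list(ckpt.keys()))"
```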

## Debugging Out-of-Memory Issues

- vLLM: Reduce `inference_model.gpu_memory_utilization=FRACTION` or the number of environments run
  in parallel (`env.num_envs=N`).
- KL scoring: If KL scoring is computed over the whole batch of data,
  reduce the number of environments run in parallel (`env.num_envs=N`).
- Training: Reduce the batch size (`train.optim_batch_size`). A combined override is sketched below.
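
A combined memory-saving invocation might look like this (values are illustrative; verify the key names against the config files before relying on them):

```bash
# Smaller vLLM memory fraction, fewer parallel environments, smaller optimizer batch
python grpo.py inference_model.gpu_memory_utilization=0.5 env.num_envs=2 train.optim_batch_size=1
```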

## Directory Structure

```
sota-implementations/grpo/
├── config/
│   ├── grpo_gsm8k.yaml    # Main configuration file (GSM8K)
│   └── grpo_ifeval.yaml   # Configuration for the IFEval task
├── grpo.py                # Training script
├── grpo_utils.py          # Utility functions
└── README.md              # This file
```

## Output Structure

Each run creates a timestamped directory under `outputs/`:
```
outputs/
└── YYYY-MM-DD/
    └── HH-MM-SS/
        ├── checkpoints/
        │   └── checkpoint_*.pt
        └── .hydra/
            └── config.yaml
```

For hyperparameter sweeps, outputs are stored under `multirun/`.