2 changes: 1 addition & 1 deletion .gitignore
@@ -180,4 +180,4 @@ log
.DS_Store
Roms

scratch/*.py
scratch/*
152 changes: 41 additions & 111 deletions README.md
@@ -826,85 +826,52 @@ If you're using TorchRL, please refer to this BibTeX entry to cite this work:

## Installation

Create a conda environment where the packages will be installed.
### Create a new virtual environment:
```bash
python -m venv torchrl
source torchrl/bin/activate  # On Windows use: torchrl\Scripts\activate
```

Or create a conda environment where the packages will be installed.

```
conda create --name torch_rl python=3.9
conda activate torch_rl
conda create --name torchrl python=3.9
conda activate torchrl
```

**PyTorch**
### Install dependencies:

Depending on the use of functorch that you want to make, you may want to
#### PyTorch

Depending on the use of torchrl that you want to make, you may want to
install the latest (nightly) PyTorch release or the latest stable version of PyTorch.
See [here](https://pytorch.org/get-started/locally/) for a detailed list of commands,
including `pip3` or other special installation instructions.
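
For example, a CPU-only install of the stable release typically looks like the snippet below (a sketch only; use the selector linked above to get the exact command for your OS, package manager and CUDA version):

```bash
# Stable CPU-only PyTorch wheels from the official index (adjust the index URL for CUDA builds)
pip3 install torch --index-url https://download.pytorch.org/whl/cpu
```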

**Torchrl**
TorchRL offers a few pre-defined dependencies such as `"torchrl[tests]"`, `"torchrl[atari]"` etc.

#### Torchrl

You can install the **latest stable release** by using
```bash
pip3 install torchrl
```
This should work on Linux, Windows 10 and macOS (Intel or Apple Silicon chips).
On certain Windows machines (Windows 11), one should install the library locally (see below).

For AArch64 machines, the binaries are not yet stored on PyPI so you will need to download them directly from
the [release page](https://github.com/pytorch/rl/releases/) or install the library via
```
pip3 install git+https://github.com/pytorch/rl@v0.8.0
```
This should work on Linux (including AArch64 machines), Windows 10 and macOS (Apple Silicon chips only).
On certain Windows machines (Windows 11), one should build the library locally.
This can be done in two ways:

To install extra dependencies, call
```bash
pip3 install "torchrl[atari,dm_control,gym_continuous,rendering,tests,utils,marl,open_spiel,checkpointing]"
```
or a subset of these.

To install torchrl together with a PyTorch version recent enough for all replay buffer features, use
```bash
pip3 install "torchrl[replay_buffer]"
```
since some features in the replay buffer require PyTorch 2.7.0 or above.

One may also desire to install the library locally. Three main reasons can motivate this:
- the nightly/stable release isn't available for one's platform (e.g., Windows 11, nightlies for Apple Silicon, etc.);
- contributing to the code;
- installing torchrl with a previous version of PyTorch (any version >= 2.1). Note that this should also be doable via a regular install
  followed by a downgrade to an earlier PyTorch version -- but the C++ binaries will not be available, so some features, such as
  prioritized replay buffers, will not work.

**Disclaimer**: As of today, TorchRL is roughly compatible with any pytorch version >= 2.1 and installing it will not
directly require a newer version of pytorch to be installed. Indirectly though, tensordict still requires the latest
PyTorch to be installed and we are working hard to loosen that requirement.
The C++ binaries of TorchRL (mainly for prioritized replay buffers) will only work with PyTorch 2.7.0 and above.
Some features (e.g., working with nested jagged tensors) may also
be limited with older versions of pytorch. It is recommended to use the latest TorchRL with the latest PyTorch version
unless there is a strong reason not to do so.

To install the library locally, start by cloning the repo:
```bash
# Install and build locally v0.8.1 of the library without cloning
pip3 install git+https://github.com/pytorch/rl@v0.8.1
# Clone the library and build it locally
git clone https://github.com/pytorch/tensordict
git clone https://github.com/pytorch/rl
```
and don't forget to check out the branch or tag you want to use for the build:
```bash
git checkout v0.8.0
pip install -e tensordict
pip install -e rl
```

Go to the directory where you have cloned the torchrl repo and install it (after
installing `ninja`)
```bash
cd /path/to/torchrl/
pip3 install ninja -U
python setup.py develop
```
Note that a local tensordict build requires `cmake` to be installed via [homebrew](https://brew.sh/) (macOS) or another package manager
such as `apt`, `apt-get`, `conda` or `yum` (but NOT `pip`), as well as `pip install "pybind11[global]"`.
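
For instance, the build prerequisites can be set up roughly as follows (a sketch; package names are the usual ones for Homebrew and apt, so double-check them on your system):

```bash
# macOS (Homebrew)
brew install cmake ninja

# Ubuntu / Debian
# sudo apt-get install -y cmake ninja-build

# In the active Python environment
pip install "pybind11[global]"
```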

One can also build the wheels to distribute to co-workers using
```bash
python setup.py bdist_wheel
```
Your wheels will be stored under `./dist/torchrl<name>.whl` and installable via
```bash
pip install torchrl<name>.whl
```

**Warning**: Unfortunately, `pip3 install -e .` does not currently work. Contributions to help fix this are welcome!

On M1 machines, this should work out-of-the-box with the nightly build of PyTorch.
If building this artifact on macOS M1 does not work correctly, or if at execution time the message
`(mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e'))` appears, then try

```
ARCHFLAGS="-arch arm64" python setup.py develop
```

The **nightly build** can be installed via
```bash
pip3 install tensordict-nightly torchrl-nightly
```
which we currently only ship for Linux machines.
Importantly, the nightly builds require the nightly builds of PyTorch too.
Also, a local build of torchrl combined with the nightly build of tensordict may fail: install both nightlies or both local builds, but do not mix them.

To run a quick sanity check, leave that directory (e.g. by executing `cd ~/`)
and try to import the library.
```
python -c "import torchrl"
```
This should not return any warning or error.

**Optional dependencies**

@@ -959,43 +926,6 @@
```bash
pip3 install tensorboard
pip3 install wandb
```

**Troubleshooting**

If a `ModuleNotFoundError: No module named 'torchrl._torchrl'` error occurs (or
a warning indicating that the C++ binaries could not be loaded),
it means that the C++ extensions were not installed or not found.

- One common reason might be that you are trying to import torchrl from within the
git repo location. The following code snippet should return an error if
torchrl has not been installed in `develop` mode:
```
cd ~/path/to/rl/repo
python -c 'from torchrl.envs.libs.gym import GymEnv'
```
If this is the case, consider executing torchrl from another location.
- If you're not importing torchrl from within its repo location, it could be
caused by a problem during the local installation. Check the log after the
`python setup.py develop`. One common cause is a g++/C++ version discrepancy
and/or a problem with the `ninja` library.
- If the problem persists, feel free to open an issue on the topic in the repo,
we'll do our best to help!
- On **macOS**, we recommend installing Xcode first.
With Apple Silicon M1 chips, make sure you are using the arm64-built python
(e.g. [here](https://betterprogramming.pub/how-to-install-pytorch-on-apple-m1-series-512b3ad9bc6)).
Running the following lines of code
```
wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
python collect_env.py
```
should display
```
OS: macOS *** (arm64)
```
and not
```
OS: macOS **** (x86_64)
```

Versioning issues can cause error messages of the type `undefined symbol`
and such. For these, refer to the [versioning issues document](https://github.com/pytorch/rl/blob/main/knowledge_base/VERSIONING_ISSUES.md)
for a complete explanation and proposed workarounds.
8 changes: 1 addition & 7 deletions setup.py
@@ -195,13 +195,7 @@ def _main(argv):
sys.argv = [sys.argv[0]] + unknown

extra_requires = {
"atari": [
"gym",
"atari-py",
"ale-py",
"gym[accept-rom-license]",
"pygame",
],
"atari": ["gymnasium[atari]"],
"dm_control": ["dm_control"],
"replay_buffer": ["torch>=2.7.0"],
"gym_continuous": ["gymnasium<1.0", "mujoco"],
136 changes: 136 additions & 0 deletions sota-implementations/grpo/README.md
@@ -0,0 +1,136 @@
# GRPO: Group Relative Policy Optimization

This is an implementation of GRPO for language models, built on top of TorchRL.

## Overview

GRPO is a method for training language models using reinforcement learning, with the following key features:
- Multi-GPU support with efficient device management
- Mixed precision training
- Gradient accumulation
- Automatic checkpointing
- Comprehensive logging with Weights & Biases
- Hydra configuration system
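
At its core, GRPO samples a group of G completions per prompt, normalizes their rewards into a group-relative advantage, and optimizes a clipped, KL-regularized objective against a reference model. A sketch of the usual formulation from the GRPO literature (the exact loss terms used by this implementation may differ):

$$
\hat{A}_i = \frac{r_i - \operatorname{mean}(r_1,\dots,r_G)}{\operatorname{std}(r_1,\dots,r_G)},\qquad
J(\theta) = \mathbb{E}\!\left[\frac{1}{G}\sum_{i=1}^{G}\frac{1}{|o_i|}\sum_{t=1}^{|o_i|}
\min\!\big(\rho_{i,t}\hat{A}_i,\ \operatorname{clip}(\rho_{i,t},1-\epsilon,1+\epsilon)\hat{A}_i\big)\right]
- \beta\, D_{\mathrm{KL}}\!\left(\pi_\theta \,\|\, \pi_{\mathrm{ref}}\right)
$$

where $\rho_{i,t}$ is the per-token probability ratio between the current policy and the policy that generated the completions.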

## Installation

1. Install dependencies:
```bash
# GSM8K deps
pip install -r sota-implementations/grpo/requirements_gsm8k.txt
# IFEval deps
pip install -r sota-implementations/grpo/requirements_ifeval.txt
```

2. Set required environment variables:
```bash
export VLLM_USE_V1=0 # Required for vLLM compatibility
```

## Hardware Requirements

- At least 3 CUDA-capable GPUs:
- Training device(s)
- vLLM inference device
- Reference model device

Devices can be controlled via the `training_model.devices`, `inference_model.devices` and `ref_model.devices` arguments.
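
For example, a four-GPU split could be requested as follows (illustrative values; the exact format expected by the `*.devices` fields should be checked against the YAML config files):

```bash
# Hypothetical split: two GPUs for training, one for vLLM inference, one for the reference model
python grpo.py 'training_model.devices=[0,1]' 'inference_model.devices=[2]' 'ref_model.devices=[3]'
```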

## Configuration

The training configuration is managed through Hydra. There are two main configuration files:
- `config/grpo_gsm8k.yaml`: Configuration for GSM8K tasks (the default)
- `config/grpo_ifeval.yaml`: Configuration optimized for IFEval tasks

## Usage

### Basic Training

```bash
python grpo.py
```

### Run with IFEval Config

```bash
python grpo.py --config-name grpo_ifeval
```

### Override Config Values

```bash
# Change dataset
python grpo.py env.dataset=ifeval

# Modify training parameters
python grpo.py train.epochs=2 train.optimizer.lr=2e-5

# Change model
python grpo.py model.name=meta-llama/Llama-2-7b-hf
```

### Hyperparameter Sweeps

```bash
# Learning rate sweep
python grpo.py --multirun train.optimizer.lr=1e-4,1e-5,1e-6

# Multiple parameters
python grpo.py --multirun \
train.optimizer.lr=1e-4,1e-5 \
policy.kl_coef=0.01,0.1
```

## Monitoring

Training progress is logged to Weights & Biases with the following metrics:
- Reward
- Advantage
- KL penalty
- Sequence length
- ESS (Effective Sample Size)
- Loss metrics (objective, clip fraction, etc.)
- Gradient norm

## Checkpointing

Checkpoints are saved every `logging.checkpoint_frequency` batches and contain:
- Model state
- Optimizer state
- Gradient scaler state (for mixed precision)
- Full configuration
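
A quick way to peek at what a checkpoint contains (a sketch; the path and key names are illustrative, not guaranteed by this implementation):

```bash
# Print the top-level keys of a saved checkpoint (adjust the path to an actual run)
python -c "import torch; ckpt = torch.load('outputs/2025-01-01/00-00-00/checkpoints/checkpoint_100.pt', map_location='cpu', weights_only=False); print(list(ckpt.keys()))"
```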

## Debugging Out-of-Memory Issues

- vLLM: Reduce `inference_model.gpu_memory_utilization=FRACTION` or the number of environments run
  in parallel (`env.num_envs=N`).
- KL scoring: If KL scoring is computed over the whole batch of data,
  reduce the number of environments run in parallel (`env.num_envs=N`).
- Training: Reduce the batch size (`train.optim_batch_size`). A combined override is sketched below.
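
A combined memory-saving invocation might look like this (values are illustrative; verify the key names against the config files before relying on them):

```bash
# Smaller vLLM memory fraction, fewer parallel environments, smaller optimizer batch
python grpo.py inference_model.gpu_memory_utilization=0.5 env.num_envs=2 train.optim_batch_size=1
```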

## Directory Structure

```
sota-implementations/grpo/
├── config/
│   ├── grpo_gsm8k.yaml    # Main configuration file (GSM8K)
│   └── grpo_ifeval.yaml   # Configuration for the IFEval task
├── grpo.py                # Training script
├── grpo_utils.py          # Utility functions
└── README.md              # This file
```

## Output Structure

Each run creates a timestamped directory under `outputs/`:
```
outputs/
└── YYYY-MM-DD/
    └── HH-MM-SS/
        ├── checkpoints/
        │   └── checkpoint_*.pt
        └── .hydra/
            └── config.yaml
```

For hyperparameter sweeps, outputs are stored under `multirun/`.