Open
Description
Right now NeMo RL uses uv, and that has worked for us so far. With newer fixes and next-gen hardware, we'll have to support PyTorch coming from NGC containers as opposed to the official wheels on the torch index.
As an example, we won't be able to use torch 2.8 wheels (when they come out) b/c they do not ship with a new enough NCCL for Blackwell.
This requires some design, but initial thoughts on directions to try (in order of complexity) are:
- Add a second Dockerfile (w/ NGC PyTorch base) and add a switch in NeMo RL to use a "non-isolated" environment
  - this would mean we try to install everything in the system environment
  - may need to think about how to ensure libraries like `vllm` do not override system torch
  - need a switch in nemo-rl to use the PY_EXECUTABLES.SYSTEM everywhere
- Keep the existing isolated environments, but somehow constrain the `uv` resolver to not install torch (and its transitive dependencies, e.g., pypi nccl/nvidia wheels)
- This path should also not be made the default in NeMo RL, since that would prevent us from being "pip installable"
- Ideally we keep this functionality in the main branch.
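A rough sketch of what an opt-in switch could look like, with the isolated uv path staying the default. Everything named here is an assumption for illustration: the `EXAMPLE_USE_SYSTEM_PYTHON` variable and the `python_command` helper are hypothetical, and the two branches merely stand in for the isolated executables versus PY_EXECUTABLES.SYSTEM; the real wiring inside NeMo RL may look quite different.

```python
import os
import sys

# Hypothetical environment variable; not an existing NeMo RL setting.
SWITCH_ENV_VAR = "EXAMPLE_USE_SYSTEM_PYTHON"

# Stand-in for the isolated path: run through uv against the lockfile.
ISOLATED_PREFIX = ["uv", "run", "--locked"]


def python_command(script, env=None):
    """Build the command used to launch `script`.

    Defaults to the isolated uv environment. Only when the switch is
    explicitly enabled does it fall back to the system interpreter
    (e.g. the Python that ships inside an NGC PyTorch container).
    """
    env = os.environ if env is None else env
    use_system = env.get(SWITCH_ENV_VAR, "0").lower() in {"1", "true", "yes"}
    if use_system:
        return [sys.executable, script]
    return [*ISOLATED_PREFIX, "python", script]
```

Keeping the switch off by default means an ordinary install still goes through uv, while an NGC-based Dockerfile could set the variable once in its environment.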