
Conversation


@vmoens (Collaborator) commented Apr 23, 2025

[ghstack-poisoned]

pytorch-bot bot commented Apr 23, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2914

Note: Links to docs will display an error until the docs builds have been completed.

❌ 8 New Failures, 1 Cancelled Job, 2 Pending, 4 Unrelated Failures

As of commit 844cf3b with merge base 21ef725:

NEW FAILURES - The following jobs have failed:

CANCELLED JOB - The following job was cancelled. Please retry:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

vmoens pushed a commit that referenced this pull request Apr 23, 2025
ghstack-source-id: f685b50
Pull Request resolved: #2914
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 23, 2025
[ghstack-poisoned]
vmoens pushed a commit that referenced this pull request Apr 23, 2025
ghstack-source-id: fe044d8
Pull Request resolved: #2914
Collaborator Author

@vmoens vmoens left a comment


I'm trying to rethink the sender and receiver one last time.

I think we always need a sender: one way or another, you always need to push the weights somewhere (vllm will never ask for the weights; you push the weights to vllm).

In centralized settings, where a central collector orchestrates satellite ones, the central collector's responsibility is to push the weights to the workers (note that this is not the scheme we are using, which is decentralized).

The receiver, on the other hand, is accessory: it covers settings where a worker can ask for the weights by itself at a given interval, or when some conditions are met.

The update_policy_weights_ function then looks like

def update_policy_weights_(self, *args, **kwargs):
    weights = self.receive(*args, **kwargs)  # a no-op if the weights (or a handle to them) are already in the args
    self.send(weights)  # never a no-op: this is where the weight update actually occurs

@mikaylagawarecki @Darktex
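To make the receive-then-send composition concrete, here is a minimal, self-contained sketch of the idea; the class name and the pull-source stub are illustrative only, not TorchRL's actual API:

```python
# Hypothetical sketch of the receive-then-send composition described above.
# `ToyWeightUpdater` and its internals are made up for illustration.

class ToyWeightUpdater:
    def __init__(self, policy_weights):
        # The live weights that `send` overwrites in place.
        self.policy_weights = policy_weights

    def receive(self, weights=None):
        # No-op when the weights (or a handle to them) are passed in directly;
        # a real receiver would otherwise pull them from some source.
        if weights is None:
            raise NotImplementedError("no pull source in this toy example")
        return weights

    def send(self, weights):
        # The actual update: never a no-op.
        self.policy_weights.update(weights)

    def update_policy_weights_(self, weights=None):
        self.send(self.receive(weights))


updater = ToyWeightUpdater({"w": 0.0})
updater.update_policy_weights_({"w": 1.0})
```

Passing the weights explicitly makes `receive` a pass-through, while `send` always performs the in-place write, matching the split described above.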

@property
def collector(self) -> torchrl.collectors.DataCollectorBase:  # noqa
    """The collector or container of the receiver."""
    return self._collector_wr() if self._collector_wr is not None else None
Collaborator Author


I'm saying collector or container because we may want to use these classes with something other than a collector (e.g. a sender in a parameter server).
cc @mikaylagawarecki
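A small runnable sketch of the weakref pattern behind that property: storing the owner through `weakref.ref` lets the container be a collector, a parameter server, or anything else, without creating a reference cycle. The class names here are illustrative stand-ins, not TorchRL's API:

```python
# Illustrative sketch of the weakref-backed `collector` property quoted above.
import weakref


class Receiver:
    def __init__(self):
        self._collector_wr = None

    def register_collector(self, collector):
        # Hold the owner weakly so receiver and collector can reference
        # each other without keeping each other alive.
        self._collector_wr = weakref.ref(collector)

    @property
    def collector(self):
        """The collector or container of the receiver."""
        return self._collector_wr() if self._collector_wr is not None else None


class Collector:  # stand-in for any container (collector, parameter server, ...)
    pass


c = Collector()
r = Receiver()
r.register_collector(c)
assert r.collector is c
del c  # once the container is garbage-collected, the property returns None
```

The property dereferences the weakref on each access, so a dead container simply yields `None` instead of a dangling reference.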

[ghstack-poisoned]
vmoens pushed a commit that referenced this pull request Apr 23, 2025
ghstack-source-id: a53a09e
Pull Request resolved: #2914

.. figure:: /_static/img/param-update.svg

In this setting, a parameter server holds various copies of the parameters. The "pulling" of the weights from the
Contributor


why do you envision this to hold various copies rather than one?

.. figure:: /_static/img/param-update.svg

In this setting, a parameter server holds various copies of the parameters. The "pulling" of the weights from the
parameter server is handled by the main collector receiver. The main collector server sender instance sends the
Contributor


main collector server

Is it accurate to think of this as the main thread in RayCollector?

the local inference worker. It is particularly useful when the training and inference occur on the same machine but on
- :class:`~torchrl.collectors.WeightUpdateSenderBase`: This component handles the distribution of policy weights to
the policy or to remote inference workers. Every collector -- server or worker -- should have a `WeightUpdateSenderBase`
instance to handle the "push" operation of the weights to the policy.
Contributor

@mikaylagawarecki mikaylagawarecki Apr 24, 2025


I think "push/pull" and "sender/receiver" are confusing 🫤 In particular, for me the Receiver == "Puller" part is tough to wrap my head around.

Pull architecture: the client sends the request, and the server responds accordingly
Push architecture: the server pushes data to clients as updates become available

The confusion for me is that I think of sender → receiver as "sender actively pushes, receiver passively receives", hence receiver == puller is not intuitive.

Collaborator Author


Got it.
In this context I'm starting to think that having two separate classes will always be confusing, so perhaps we should just have one that can be customized at will.
In every case I've dealt with so far, it never occurred that I could write senders and receivers that would compose freely, which tells me that a perfectly composable API may be an illusion.
I'm myself a bit confused about what should live within each of these classes, to be honest...
I'll refactor this to have a single Updater class that gives a somewhat unopinionated implementation of the update functionality!
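For what it's worth, one way such a single, unopinionated Updater could look is a base class whose pull and push steps are overridable hooks; everything below (names, hooks, the subclass) is a hypothetical sketch, not the actual refactor:

```python
# Hypothetical single-Updater sketch: one class, two overridable hooks,
# one public entry point. All names are made up for illustration.

class ToyUpdaterBase:
    def __init__(self, policy_weights):
        self.policy_weights = policy_weights

    # --- hooks users may override -------------------------------------
    def _get_weights(self, weights=None):
        """Resolve the weights to apply (pass-through if given explicitly)."""
        return weights if weights is not None else dict(self.policy_weights)

    def _apply_weights(self, weights):
        """Actually write the weights into the policy."""
        self.policy_weights.update(weights)

    # --- the one public entry point -----------------------------------
    def update_policy_weights_(self, weights=None):
        self._apply_weights(self._get_weights(weights))


class DoublingUpdater(ToyUpdaterBase):
    """Example customization: transform the weights before applying them."""

    def _apply_weights(self, weights):
        super()._apply_weights({k: 2 * v for k, v in weights.items()})


u = DoublingUpdater({"w": 1.0})
u.update_policy_weights_({"w": 3.0})
```

Collapsing sender and receiver into one class keeps the "receive then send" ordering internal, so users customize hooks instead of composing two objects.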


Weight Synchronization in Distributed Environments
--------------------------------------------------
Contributor

@mikaylagawarecki mikaylagawarecki Apr 24, 2025


I read the diagram above as

  1. CollectorServer: main thread of RayCollector
  2. Collector Worker {i}, remote DataCollector

If this read is correct, in my mind it might sometimes make sense to have the receiver on the collector worker rather than the collector server: if the number of remote workers is sufficiently high, a collector worker might not be colocated with the collector server, in which case it might not make sense to pass the weights "two hops" to reach the worker.

Separate question: from the diagram it looks like the collector server chooses when to pull from the param server and then "forcefully pushes" to all the workers at once. Is this design intentional? (E.g., is the purpose to batch workers to different collector servers and update them in batches?)

@vmoens vmoens added the Refactoring Refactoring of an existing feature label Apr 28, 2025
[ghstack-poisoned]
vmoens pushed a commit that referenced this pull request Apr 28, 2025
ghstack-source-id: 42ce4ae
Pull Request resolved: #2914
[ghstack-poisoned]
vmoens pushed a commit that referenced this pull request Apr 28, 2025
ghstack-source-id: 72b710a
Pull Request resolved: #2914
@vmoens vmoens merged commit 844cf3b into gh/vmoens/130/base Apr 28, 2025
56 of 75 checks passed
vmoens pushed a commit that referenced this pull request Apr 28, 2025
ghstack-source-id: 72b710a
Pull Request resolved: #2914
@vmoens vmoens deleted the gh/vmoens/130/head branch April 28, 2025 15:23