Improve offline algo performance #1261

Open. MischaPanch wants to merge 4 commits into master from improve-offline-algo-performance

MischaPanch (Collaborator) commented May 21, 2025

For @arnaujc91

Background: currently, most offline learning algorithms do their pre-processing in _preprocess_batch, i.e. on every sampled batch, which is highly suboptimal. Instead, it should be done once in process_buffer.

This PR introduces a general class that implements process_buffer using _preprocess_batch, which allows converting an OffPolicyAlgorithm into an efficient OfflineAlgorithm.
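The idea can be illustrated with a minimal, self-contained sketch. All classes below (ToyBuffer, ToyOffPolicyAlgorithm, ToyOfflineFromOffPolicy) are hypothetical stand-ins and not the actual classes of this PR; the sketch only assumes that the per-batch preprocessing is a pure function of the transitions, so that it can be applied once to the whole buffer.

```python
# Toy stand-ins for illustration only; none of this is the library's real API.


class ToyBuffer:
    """Holds transitions as a dict of equal-length lists."""

    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(next(iter(self.data.values())))

    def sample(self, indices):
        return {key: [values[i] for i in indices] for key, values in self.data.items()}


class ToyOffPolicyAlgorithm:
    """Off-policy style: preprocessing is meant to run on every sampled batch."""

    def __init__(self):
        self.preprocess_calls = 0

    def _preprocess_batch(self, batch):
        self.preprocess_calls += 1
        out = dict(batch)
        out["rew_scaled"] = [r * 0.5 for r in batch["rew"]]  # arbitrary example transform
        return out


class ToyOfflineFromOffPolicy(ToyOffPolicyAlgorithm):
    """Offline style: reuse _preprocess_batch, but apply it once to the full buffer."""

    def process_buffer(self, buffer):
        full_batch = buffer.sample(range(len(buffer)))
        return ToyBuffer(self._preprocess_batch(full_batch))


buffer = ToyBuffer({"obs": [0, 1, 2], "rew": [1.0, 2.0, 4.0]})
algo = ToyOfflineFromOffPolicy()
processed = algo.process_buffer(buffer)

# Later sampling reads the precomputed key; no further preprocessing is needed.
for indices in ([0, 1], [2, 0], [1, 2]):
    batch = processed.sample(indices)
```

In this sketch the preprocessing runs once regardless of how many batches are sampled afterwards, which is the intended performance gain for a fixed offline dataset.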

Current status: when used to improve the performance of TD3BC, the algorithm converges and the tests pass, but the determinism test fails, meaning that something changed in the processing or at least in the random number generation. If it is the latter, the failure is not a problem, but I currently don't see why anything RNG-related should have changed.
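As a generic illustration of why a refactoring can break a determinism check without changing the algorithm's logic, the sketch below uses Python's standard random module. The function train_trace and its extra_draw flag are hypothetical; whether anything like this happens in this PR is exactly the open question.

```python
import random


def train_trace(seed, extra_draw=False):
    """Return a short sequence of seeded random values, standing in for sampled indices."""
    rng = random.Random(seed)
    if extra_draw:
        # Hypothetical: the refactored code consumes one additional random number,
        # e.g. through one extra sampling call during buffer processing.
        rng.random()
    return [rng.randrange(10**6) for _ in range(20)]


baseline = train_trace(seed=0)
repeated = train_trace(seed=0)
shifted = train_trace(seed=0, extra_draw=True)

print(baseline == repeated)  # same seed, same code path
print(baseline == shifted)   # same seed, but the RNG stream was advanced beforehand
```

With the same seed and the same number of draws, the trace is reproduced exactly; a single additional draw shifts every value that follows, so the result differs even though the seed is unchanged.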

The implementation that changes the buffer's managed batch is rather hacky. I suspect something goes wrong with the indexing, but after 20 minutes of debugging I haven't yet pinned down what causes the determinism test to fail. Understanding this will require reading through sample_indices in ReplayBuffer and ReplayBufferManager. Note that the two classes handle sample_indices(None) inconsistently, but that shouldn't play a role for this PR.

In the course of this PR, determinism snapshots should be created from the dev-v2 branch; after switching to this branch, the determinism test of td3bc should then succeed. See the docstring of AlgorithmDeterminismTest in determinism_test.py for further details.
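The snapshot workflow can be sketched generically as follows. This is not how AlgorithmDeterminismTest is implemented; run_training and check_against_snapshot are hypothetical helpers, and the JSON file only stands in for whatever snapshot format the real test uses.

```python
import json
import random
import tempfile
from pathlib import Path


def run_training(seed: int) -> list[int]:
    """Hypothetical stand-in for a short, fully seeded training run."""
    rng = random.Random(seed)
    return [rng.randrange(1000) for _ in range(5)]


def check_against_snapshot(path: Path, result: list[int], update: bool = False) -> bool:
    """Store `result` as the snapshot if `update` is set, otherwise compare against it."""
    if update:
        path.write_text(json.dumps(result))
        return True
    return json.loads(path.read_text()) == result


with tempfile.TemporaryDirectory() as tmp:
    snapshot = Path(tmp) / "td3bc_snapshot.json"
    # Step 1: on the reference branch, record the snapshot.
    check_against_snapshot(snapshot, run_training(seed=0), update=True)
    # Step 2: on the refactored branch, the same seed should reproduce it.
    same = check_against_snapshot(snapshot, run_training(seed=0))
    # A run that diverges (simulated here with another seed) fails the check.
    different = check_against_snapshot(snapshot, run_training(seed=1))
```

The key point is that the snapshot must come from the reference branch; creating it on the refactored branch would make the comparison pass trivially.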

The PR is finished when all relevant offline algorithms inherit from OfflineAlgorithmFromOffPolicyAlgorithm and either the determinism tests pass, or a source of different random number generation caused by the refactoring has been identified.

@MischaPanch MischaPanch changed the base branch from master to dev-v2 May 21, 2025 10:48
@MischaPanch MischaPanch force-pushed the improve-offline-algo-performance branch from ae605bf to c743fcf Compare May 21, 2025 10:55
We should be able to set batches with supersets of reserved_keys; there's no reason not to allow that
Would lead to duplicated initialization of parents, in particular nn.Module, which is problematic
@MischaPanch MischaPanch force-pushed the improve-offline-algo-performance branch from dbcfad2 to 0ebf152 Compare July 14, 2025 13:27
Base automatically changed from dev-v2 to master July 15, 2025 08:36