Basepolicy requires replay buffer restricting modularity. 

Not really a coding issue. 
You assert in your paper that the code base is highly modular but the algorithms are very strongly tied to your implementation of a replay buffer. All of the static methods in the base policy require the replay buffer.

It would be non-trivial to decouple the algorithmic implementations from the replay.

I would like to recommend that you rename learn to _learn. learn without the underscore implies that it is a stand alone method that can be publicly called but, at least for DDPG and child classes, learn requires everything surrounding it in update.
```
if buffer is None:
    return {}
batch, indices = buffer.sample(sample_size)
self.updating = True
batch = self.process_fn(batch, buffer, indices)
result = self.learn(batch, **kwargs)
self.post_process_fn(batch, buffer, indices)
if self.lr_scheduler is not None:
    self.lr_scheduler.step()
self.updating = False
return result
```
The docstring of learn is also not indicative of this limitation.     




Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Basepolicy requires replay buffer restricting modularity. #898

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Basepolicy requires replay buffer restricting modularity. #898

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions