
BasePolicy requires a replay buffer, restricting modularity. #898

@llewynS

Not really a coding issue.
You assert in your paper that the code base is highly modular, but the algorithms are tightly coupled to your implementation of a replay buffer. All of the static methods in the base policy require the replay buffer.

It would be non-trivial to decouple the algorithm implementations from the replay buffer.
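To make the point concrete, here is a minimal sketch of what a looser coupling could look like: the algorithm depends only on a small sampling interface rather than on a concrete buffer class. Everything here (`BatchSource`, `ListSource`, `mean_of_sample`) is a hypothetical name made up for illustration and is not part of Tianshou.

```python
import random
from typing import Any, Protocol, Sequence


class BatchSource(Protocol):
    """Hypothetical minimal interface an algorithm needs from its data source."""

    def sample(self, batch_size: int) -> tuple[list[Any], list[int]]:
        ...


class ListSource:
    """Toy data source backed by a plain list (not a Tianshou class)."""

    def __init__(self, items: Sequence[Any], seed: int = 0) -> None:
        self._items = list(items)
        self._rng = random.Random(seed)

    def sample(self, batch_size: int) -> tuple[list[Any], list[int]]:
        # Draw indices with replacement, then gather the matching items.
        indices = [self._rng.randrange(len(self._items)) for _ in range(batch_size)]
        return [self._items[i] for i in indices], indices


def mean_of_sample(source: BatchSource, batch_size: int) -> float:
    """Stand-in for an algorithm step that relies only on `sample`."""
    batch, _ = source.sample(batch_size)
    return sum(batch) / len(batch)
```

With an interface like this, any object that provides `sample` could feed the algorithm, for example `mean_of_sample(ListSource([2.0, 2.0, 2.0]), 4)`.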

I would like to recommend that you rename learn to _learn. Without the underscore, learn looks like a standalone method that can be called publicly, but, at least for DDPG and its child classes, learn depends on everything that surrounds it in update:

if buffer is None:
    return {}
batch, indices = buffer.sample(sample_size)
self.updating = True
batch = self.process_fn(batch, buffer, indices)
result = self.learn(batch, **kwargs)
self.post_process_fn(batch, buffer, indices)
if self.lr_scheduler is not None:
    self.lr_scheduler.step()
self.updating = False
return result

The docstring of learn does not mention this limitation either.
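As a rough illustration of the suggestion, the sketch below keeps update as the public entry point and makes the learning step private, with a docstring that states the dependency. `SketchPolicy` and `ToyBuffer` are hypothetical stand-ins, not Tianshou's classes; the processing hooks are placeholders and the learning-rate scheduler step is left out for brevity.

```python
class ToyBuffer:
    """Hypothetical buffer exposing only `sample` (not Tianshou's ReplayBuffer)."""

    def sample(self, sample_size):
        return [1.0] * sample_size, list(range(sample_size))


class SketchPolicy:
    """Hypothetical policy illustrating the proposed naming."""

    def __init__(self):
        self.updating = False

    def process_fn(self, batch, buffer, indices):
        # Placeholder for pre-processing, e.g. computing return targets.
        return [x * 2 for x in batch]

    def post_process_fn(self, batch, buffer, indices):
        # Placeholder for post-processing, e.g. updating buffer priorities.
        pass

    def _learn(self, batch):
        """Run one learning step on an already pre-processed batch.

        Meant to be called only from `update`, which handles sampling,
        `process_fn` and `post_process_fn`.
        """
        return {"loss": sum(batch) / len(batch)}

    def update(self, sample_size, buffer):
        """Public entry point: sample, pre-process, learn, post-process."""
        if buffer is None:
            return {}
        batch, indices = buffer.sample(sample_size)
        self.updating = True
        batch = self.process_fn(batch, buffer, indices)
        result = self._learn(batch)
        self.post_process_fn(batch, buffer, indices)
        self.updating = False
        return result
```

The leading underscore follows the usual Python convention for non-public methods, so callers are steered towards `update` and away from calling the learning step on an unprocessed batch.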

Metadata

Assignees: No one assigned
Labels: discussion (Discussion of a typical issue), refactoring (No change to functionality)
Status: Done