
No minibatch for computation of logp_old in PPOPolicy #1164

@jvasso

Description

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
    • design request (i.e. "X should be changed to Y.")
  • I have visited the source website
  • I have searched through the issue tracker for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:

I have noticed that in the implementation of PPOPolicy, the old log probabilities logp_old are computed without minibatching:

with torch.no_grad():
    batch.logp_old = self(batch).dist.log_prob(batch.act)

This makes the algorithm unusable when the collected batch is too large: batch_size only controls the minibatches used inside learn(), so there is no way to limit the size of this single forward pass.
I suggest simply adding minibatch support here:

logp_old = []
with torch.no_grad():
    for minibatch in batch.split(self._batch, shuffle=False, merge_last=True):
        logp_old.append(self(minibatch).dist.log_prob(minibatch.act))
    batch.logp_old = torch.cat(logp_old, dim=0).flatten()
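
To illustrate the pattern independently of Tianshou internals, here is a self-contained PyTorch sketch of chunked log-prob evaluation; peak memory scales with the chunk size rather than with the full batch. The network, the Categorical action head, and the chunk_size value are illustrative assumptions, not part of the Tianshou API.

# Standalone PyTorch sketch (not Tianshou code): evaluate per-sample
# log-probabilities of a fixed policy in chunks so that only one chunk
# of activations is alive at a time.
import torch
from torch import nn
from torch.distributions import Categorical


def chunked_log_prob(policy_net: nn.Module, obs: torch.Tensor,
                     act: torch.Tensor, chunk_size: int = 1024) -> torch.Tensor:
    chunks = []
    with torch.no_grad():  # no autograd graph is kept for these forward passes
        for start in range(0, obs.shape[0], chunk_size):
            logits = policy_net(obs[start:start + chunk_size])
            dist = Categorical(logits=logits)
            chunks.append(dist.log_prob(act[start:start + chunk_size]))
    return torch.cat(chunks, dim=0)


if __name__ == "__main__":
    net = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 4))
    obs = torch.randn(100_000, 8)               # large on-policy batch
    act = torch.randint(0, 4, (100_000,))
    logp_old = chunked_log_prob(net, obs, act)  # shape: (100_000,)
    print(logp_old.shape)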

The version of Tianshou that I'm using is 1.0.0.
