
No minibatch for computation of logp_old in PPOPolicy #1164

@jvasso

Description

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
    • design request (i.e. "X should be changed to Y.")
  • I have visited the source website
  • I have searched through the issue tracker for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:

I have noticed that in the implementation of PPOPolicy, the old log probabilities logp_old are computed without minibatching:

with torch.no_grad():
    batch.logp_old = self(batch).dist.log_prob(batch.act)

This makes the algorithm unusable when the collected batch is too large: batch_size only controls the minibatches used inside learn(), so there is no way to limit the size of this single forward pass.
I suggest simply adding minibatch support here:

logp_old = []
with torch.no_grad():
    for minibatch in batch.split(self._batch, shuffle=False, merge_last=True):
        logp_old.append(self(minibatch).dist.log_prob(minibatch.act))
    batch.logp_old = torch.cat(logp_old, dim=0).flatten()
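
To illustrate the pattern independently of Tianshou internals, here is a self-contained PyTorch sketch of chunked log-prob evaluation; peak memory scales with the chunk size rather than with the full batch. The network, the Categorical action head, and the chunk_size value are illustrative assumptions, not part of the Tianshou API.

# Standalone PyTorch sketch (not Tianshou code): evaluate per-sample
# log-probabilities of a fixed policy in chunks so that only one chunk
# of activations is alive at a time.
import torch
from torch import nn
from torch.distributions import Categorical


def chunked_log_prob(policy_net: nn.Module, obs: torch.Tensor,
                     act: torch.Tensor, chunk_size: int = 1024) -> torch.Tensor:
    chunks = []
    with torch.no_grad():  # no autograd graph is kept for these forward passes
        for start in range(0, obs.shape[0], chunk_size):
            logits = policy_net(obs[start:start + chunk_size])
            dist = Categorical(logits=logits)
            chunks.append(dist.log_prob(act[start:start + chunk_size]))
    return torch.cat(chunks, dim=0)


if __name__ == "__main__":
    net = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 4))
    obs = torch.randn(100_000, 8)               # large on-policy batch
    act = torch.randint(0, 4, (100_000,))
    logp_old = chunked_log_prob(net, obs, act)  # shape: (100_000,)
    print(logp_old.shape)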

The version of Tianshou that I'm using is 1.0.0.
