
Variable batch-size during on-policy training #185

@youkaichao

Description

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
  • I have visited the source website
  • I have searched through the issue tracker for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:
    import tianshou, torch, sys
    print(tianshou.__version__, torch.__version__, sys.version)
    0.2.5, 1.5.0, 3.6.10

On-policy training now splits the replay buffer into chunks. For 10 transitions with a batch size of 4, tianshou yields three batches of size 4, 4, and 2, causing the variable batch-size problem.
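The chunking behavior can be sketched as follows (a minimal stand-alone illustration, not tianshou's actual implementation; `split_batches` is a hypothetical helper):

```python
def split_batches(n_transitions, batch_size):
    # Walk the buffer front to back, cutting consecutive chunks;
    # the final chunk keeps whatever remains, so it can be smaller.
    sizes = []
    for start in range(0, n_transitions, batch_size):
        sizes.append(min(batch_size, n_transitions - start))
    return sizes

print(split_batches(10, 4))  # [4, 4, 2]
```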

Generally this is fine. But it sometimes causes problems, especially when using BatchNorm during training, which raises an error if the batch size is 1.
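The BatchNorm failure is easy to reproduce: in training mode, PyTorch's `BatchNorm1d` computes per-channel batch statistics, and the variance of a single sample is undefined, so a forward pass with batch size 1 raises a `ValueError`:

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(4)
bn.train()  # training mode: batch statistics are computed per forward pass

bn(torch.randn(3, 4))      # batch size 3: variance is well defined, works
try:
    bn(torch.randn(1, 4))  # batch size 1: per-channel variance is undefined
except ValueError as err:
    print(err)
```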

I think there are three possible solutions:

  • Discard the trailing partial batch. For the example above, yield only two batches with batch size 4, 4.

  • Pad the last batch up to the full size. For the example above, yield three batches with batch size 4, 4, 4.

  • Sample batches with a fixed batch size. For the example above, sample three batches of size 4 from the 10 transitions.
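A rough sketch of the three options side by side (`iter_fixed_batches` and its `mode` argument are hypothetical helpers for illustration, not part of tianshou's API):

```python
import random

def iter_fixed_batches(indices, batch_size, mode="discard"):
    """Yield uniformly sized index chunks from `indices`.

    mode="discard": drop the trailing partial chunk.
    mode="pad":     top up the last chunk by re-using earlier indices.
    mode="sample":  draw ceil(n / batch_size) random full-size batches.
    """
    n = len(indices)
    if mode == "sample":
        n_batches = -(-n // batch_size)  # ceiling division
        for _ in range(n_batches):
            yield random.sample(indices, batch_size)
        return
    for start in range(0, n, batch_size):
        chunk = indices[start:start + batch_size]
        if len(chunk) < batch_size:
            if mode == "discard":
                return
            # pad: re-use indices from the front of the list
            chunk = chunk + indices[:batch_size - len(chunk)]
        yield chunk

idx = list(range(10))
print([len(c) for c in iter_fixed_batches(idx, 4, "discard")])  # [4, 4]
print([len(c) for c in iter_fixed_batches(idx, 4, "pad")])      # [4, 4, 4]
print([len(c) for c in iter_fixed_batches(idx, 4, "sample")])   # [4, 4, 4]
```

The trade-offs differ: discarding loses data every epoch, padding duplicates a few transitions, and sampling changes the pass from a sequential sweep into random draws (some transitions may be seen twice, others not at all in a given pass).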

Labels: enhancement (feature that is not a new algorithm or an algorithm enhancement)
