
Variable batch-size during on-policy training #185

@youkaichao

Description

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
  • I have visited the source website
  • I have searched through the issue tracker for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:
    import tianshou, torch, sys
    print(tianshou.__version__, torch.__version__, sys.version)
    0.2.5, 1.5.0, 3.6.10

On-policy training now splits the replay buffer into chunks. For 10 transitions with a batch size of 4, tianshou yields three batches of size 4, 4, and 2, causing the variable batch-size problem.
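The chunking behavior can be sketched as follows (a minimal stand-alone illustration, not tianshou's actual implementation; `split_batches` is a hypothetical helper):

```python
def split_batches(n_transitions, batch_size):
    # Walk the buffer front to back, cutting consecutive chunks;
    # the final chunk keeps whatever remains, so it can be smaller.
    sizes = []
    for start in range(0, n_transitions, batch_size):
        sizes.append(min(batch_size, n_transitions - start))
    return sizes

print(split_batches(10, 4))  # [4, 4, 2]
```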

Generally this is fine. But it sometimes causes problems, especially when using BatchNorm during training, which raises an error if the batch size is 1.
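The BatchNorm failure is easy to reproduce: in training mode, PyTorch's `BatchNorm1d` computes per-channel batch statistics, and the variance of a single sample is undefined, so a forward pass with batch size 1 raises a `ValueError`:

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(4)
bn.train()  # training mode: batch statistics are computed per forward pass

bn(torch.randn(3, 4))      # batch size 3: variance is well defined, works
try:
    bn(torch.randn(1, 4))  # batch size 1: per-channel variance is undefined
except ValueError as err:
    print(err)
```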

I think there are three possible solutions:

  • Discard the trailing partial batch. For the example above, yield only two batches with batch size 4, 4.

  • Pad the last batch up to the full size. For the example above, yield three batches with batch size 4, 4, 4.

  • Sample batches with a fixed batch size. For the example above, sample three batches of size 4 from the 10 transitions.
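A rough sketch of the three options side by side (`iter_fixed_batches` and its `mode` argument are hypothetical helpers for illustration, not part of tianshou's API):

```python
import random

def iter_fixed_batches(indices, batch_size, mode="discard"):
    """Yield uniformly sized index chunks from `indices`.

    mode="discard": drop the trailing partial chunk.
    mode="pad":     top up the last chunk by re-using earlier indices.
    mode="sample":  draw ceil(n / batch_size) random full-size batches.
    """
    n = len(indices)
    if mode == "sample":
        n_batches = -(-n // batch_size)  # ceiling division
        for _ in range(n_batches):
            yield random.sample(indices, batch_size)
        return
    for start in range(0, n, batch_size):
        chunk = indices[start:start + batch_size]
        if len(chunk) < batch_size:
            if mode == "discard":
                return
            # pad: re-use indices from the front of the list
            chunk = chunk + indices[:batch_size - len(chunk)]
        yield chunk

idx = list(range(10))
print([len(c) for c in iter_fixed_batches(idx, 4, "discard")])  # [4, 4]
print([len(c) for c in iter_fixed_batches(idx, 4, "pad")])      # [4, 4, 4]
print([len(c) for c in iter_fixed_batches(idx, 4, "sample")])   # [4, 4, 4]
```

The trade-offs differ: discarding loses data every epoch, padding duplicates a few transitions, and sampling changes the pass from a sequential sweep into random draws (some transitions may be seen twice, others not at all in a given pass).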

Labels: enhancement (feature that is not a new algorithm or an algorithm enhancement)
