- I have marked all applicable categories:
  - exception-raising bug
  - RL algorithm bug
  - documentation request (i.e. "X is missing from the documentation.")
  - new feature request
- I have visited the source website
- I have searched through the issue tracker for duplicates
- I have mentioned version numbers, operating system and environment, where applicable:
```python
import tianshou, torch, sys
print(tianshou.__version__, torch.__version__, sys.version)
# 0.2.5, 1.5.0, 3.6.10
```
On-policy training now splits the replay buffer into chunks: for 10 transitions with batch size 4, Tianshou yields three batches of sizes 4, 4, and 2, so the batch size varies across updates.
Generally this is fine, but it sometimes causes problems, especially when BatchNorm is used during training, since BatchNorm raises an error when the batch size is 1.
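For illustration, here is a minimal sketch that reproduces the current chunking behavior (this is not Tianshou's actual implementation; `split_indices` is a hypothetical stand-in for the buffer split):

```python
import numpy as np

def split_indices(length, size):
    """Hypothetical stand-in for the buffer split: shuffle, then chunk."""
    indices = np.random.permutation(length)
    for i in range(0, length, size):
        yield indices[i:i + size]

# 10 transitions with batch size 4 -> chunks of size 4, 4, 2.
print([len(chunk) for chunk in split_indices(10, 4)])  # [4, 4, 2]
```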
I think there are three possible solutions (a sketch of each follows the list):
- Discard the leftover partial batch. For the example above, yield only two batches with batch size 4, 4.
- Pad the last batch. For the example above, yield three batches with batch size 4, 4, 4.
- Sample batches with a fixed batch size. For the example above, sample three batches of size 4 from the 10 transitions.
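A minimal sketch of the three options, under the same assumptions as above (none of these helper names are Tianshou API, and option 2 pads by re-sampling earlier indices, which is only one possible padding scheme):

```python
import numpy as np

def split_discard(length, size):
    # Option 1: drop the trailing partial batch entirely.
    indices = np.random.permutation(length)
    for i in range(0, length - size + 1, size):
        yield indices[i:i + size]

def split_pad(length, size):
    # Option 2: pad the last batch with indices re-sampled from earlier chunks
    # (assumes length >= size, so there is something to pad from).
    indices = np.random.permutation(length)
    for i in range(0, length, size):
        chunk = indices[i:i + size]
        if len(chunk) < size:
            pad = np.random.choice(indices[:i], size - len(chunk), replace=False)
            chunk = np.concatenate([chunk, pad])
        yield chunk

def split_resample(length, size):
    # Option 3: draw each fixed-size batch independently, so every batch
    # has exactly `size` elements.
    for _ in range(-(-length // size)):  # ceil(length / size) batches
        yield np.random.choice(length, size, replace=False)

for fn in (split_discard, split_pad, split_resample):
    print(fn.__name__, [len(c) for c in fn(10, 4)])
# split_discard [4, 4]
# split_pad [4, 4, 4]
# split_resample [4, 4, 4]
```

Note the trade-offs: option 1 silently drops transitions, while options 2 and 3 let a transition appear in more than one batch within an epoch, which slightly changes the sampling semantics.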