
Extend and fix reward/return normalizations #927

@MischaPanch

Description


There are several ways in which reward/return/value normalization could be improved, but the most pressing one is the following:

Currently, PGPolicy instantiates self.ret_rms = RunningMeanStd(), and RunningMeanStd has a default value of clip_max=10. This cannot be adjusted by users! (except through monkey-patching, of course)

This might work well for some standard envs, but the clipping value is arbitrary, and making it non-configurable is a major hindrance for users, most of whom are probably not even aware of it.
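
To make the behaviour concrete, here is a minimal, self-contained sketch of the kind of running mean/std return normalization involved. The class name and the Welford-style update are illustrative assumptions, not Tianshou's actual RunningMeanStd implementation, but they show exactly where the hard-coded clip_max would need to become a user-facing parameter:

```python
import numpy as np
from typing import Optional


class RunningReturnNormalizer:
    """Illustrative running mean/std normalizer with a configurable clipping
    bound -- a sketch of the behaviour PGPolicy currently hard-codes via
    RunningMeanStd() with clip_max=10, not Tianshou's actual class."""

    def __init__(self, clip_max: Optional[float] = 10.0, eps: float = 1e-8):
        self.mean, self.var, self.count = 0.0, 1.0, eps
        self.clip_max = clip_max  # the value this issue asks to expose to users
        self.eps = eps

    def update(self, returns: np.ndarray) -> None:
        # Welford-style merge of the running statistics with a batch of returns.
        batch_mean, batch_var, batch_count = returns.mean(), returns.var(), len(returns)
        delta = batch_mean - self.mean
        total = self.count + batch_count
        new_mean = self.mean + delta * batch_count / total
        m2 = (self.var * self.count + batch_var * batch_count
              + delta ** 2 * self.count * batch_count / total)
        self.mean, self.var, self.count = new_mean, m2 / total, total

    def normalize(self, returns: np.ndarray) -> np.ndarray:
        normalized = (returns - self.mean) / np.sqrt(self.var + self.eps)
        if self.clip_max is not None:
            # This clip is what a non-configurable clip_max=10 silently applies.
            normalized = np.clip(normalized, -self.clip_max, self.clip_max)
        return normalized
```

Until something like clip_max is exposed through the policy constructor, the only way to change it is the monkey-patching route mentioned above, e.g. overwriting policy.ret_rms after construction with a differently configured RunningMeanStd (assuming RunningMeanStd accepts clip_max as a constructor argument, which the default value cited above suggests).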

Generally, how best to normalize quantities in RL is an active discussion, and normalization can play an important role in performance. I believe tianshou should be extended to accommodate various normalization strategies.
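
One possible shape for such an extension, purely as a sketch: the ReturnNormalizer protocol and the return_normalizer constructor argument below are hypothetical, not an existing Tianshou API, but they illustrate how policies could accept any normalization strategy instead of hard-coding one:

```python
import numpy as np
from typing import Protocol


class ReturnNormalizer(Protocol):
    """Hypothetical interface a policy could accept instead of instantiating
    RunningMeanStd() internally with fixed defaults."""

    def update(self, returns: np.ndarray) -> None: ...

    def normalize(self, returns: np.ndarray) -> np.ndarray: ...


# Hypothetical usage: the user picks and configures the strategy, e.g.
#   PGPolicy(..., return_normalizer=RunningReturnNormalizer(clip_max=None))
# which would make clip_max -- or entirely different normalization schemes --
# a user-level decision instead of a hard-coded default.
```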

Metadata


Labels: enhancement (Feature that is not a new algorithm or an algorithm enhancement), good first issue (Good for newcomers)


Status: To do
