
Extend and fix reward/return normalizations #927

@MischaPanch

Description


There are several ways in which reward/return/value normalization could be improved, but the most pressing one is the following:

Currently, PGPolicy instantiates self.ret_rms = RunningMeanStd(), and RunningMeanStd has a default value of clip_max=10. This cannot be adjusted by users! (except through monkey-patching, of course)

This might work well for some standard envs, but the clipping value is arbitrary, and making it non-configurable is a major hindrance for users, most of whom are probably not even aware of it.
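
To make the behaviour concrete, here is a minimal, self-contained sketch of the kind of running mean/std return normalization involved. The class name and the Welford-style update are illustrative assumptions, not Tianshou's actual RunningMeanStd implementation, but they show exactly where the hard-coded clip_max would need to become a user-facing parameter:

```python
import numpy as np
from typing import Optional


class RunningReturnNormalizer:
    """Illustrative running mean/std normalizer with a configurable clipping
    bound -- a sketch of the behaviour PGPolicy currently hard-codes via
    RunningMeanStd() with clip_max=10, not Tianshou's actual class."""

    def __init__(self, clip_max: Optional[float] = 10.0, eps: float = 1e-8):
        self.mean, self.var, self.count = 0.0, 1.0, eps
        self.clip_max = clip_max  # the value this issue asks to expose to users
        self.eps = eps

    def update(self, returns: np.ndarray) -> None:
        # Welford-style merge of the running statistics with a batch of returns.
        batch_mean, batch_var, batch_count = returns.mean(), returns.var(), len(returns)
        delta = batch_mean - self.mean
        total = self.count + batch_count
        new_mean = self.mean + delta * batch_count / total
        m2 = (self.var * self.count + batch_var * batch_count
              + delta ** 2 * self.count * batch_count / total)
        self.mean, self.var, self.count = new_mean, m2 / total, total

    def normalize(self, returns: np.ndarray) -> np.ndarray:
        normalized = (returns - self.mean) / np.sqrt(self.var + self.eps)
        if self.clip_max is not None:
            # This clip is what a non-configurable clip_max=10 silently applies.
            normalized = np.clip(normalized, -self.clip_max, self.clip_max)
        return normalized
```

Until something like clip_max is exposed through the policy constructor, the only way to change it is the monkey-patching route mentioned above, e.g. overwriting policy.ret_rms after construction with a differently configured RunningMeanStd (assuming RunningMeanStd accepts clip_max as a constructor argument, which the default value cited above suggests).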

Generally, how best to normalize quantities in RL is an active discussion, and normalization can play an important role in performance. I believe tianshou should be extended to accommodate various normalization strategies.
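
One possible shape for such an extension, purely as a sketch: the ReturnNormalizer protocol and the return_normalizer constructor argument below are hypothetical, not an existing Tianshou API, but they illustrate how policies could accept any normalization strategy instead of hard-coding one:

```python
import numpy as np
from typing import Protocol


class ReturnNormalizer(Protocol):
    """Hypothetical interface a policy could accept instead of instantiating
    RunningMeanStd() internally with fixed defaults."""

    def update(self, returns: np.ndarray) -> None: ...

    def normalize(self, returns: np.ndarray) -> np.ndarray: ...


# Hypothetical usage: the user picks and configures the strategy, e.g.
#   PGPolicy(..., return_normalizer=RunningReturnNormalizer(clip_max=None))
# which would make clip_max -- or entirely different normalization schemes --
# a user-level decision instead of a hard-coded default.
```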

Metadata


Labels: enhancement (Feature that is not a new algorithm or an algorithm enhancement), good first issue (Good for newcomers)


Status: To do
