There are several ways in which reward/return/value normalization could be improved, but the most pressing issue is the following:

Currently `PGPolicy` instantiates `self.ret_rms = RunningMeanStd()`, and `RunningMeanStd` has a default value of `clip_max=10`. This cannot be adjusted by users (except through monkey-patching, of course).

This might work well for some standard envs, but the clipping value is arbitrary, and making it non-configurable is a major hindrance for users, most of whom are probably not even aware of it.
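For illustration, here is roughly what the hard-coded clip does and what the workaround looks like today (a minimal sketch; the import path and the `update`/`norm` interface are assumed from tianshou's current code):

```python
import numpy as np
from tianshou.utils import RunningMeanStd  # import path assumed

# The default clip_max=10 silently bounds normalized returns.
rms = RunningMeanStd()
rms.update(np.array([0.0, 1.0]))     # running stats: mean 0.5, std 0.5
print(rms.norm(np.array([1000.0])))  # ~1999 before clipping -> clipped to 10,
                                     # the true magnitude is lost

# Current workaround: monkey-patch the internal object after construction,
# e.g. policy.ret_rms = RunningMeanStd(clip_max=100.0)
```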
Generally, how best to normalize values in RL is an active discussion, and normalization can play an important role in performance. I believe tianshou should be extended to accommodate various normalization strategies.
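One way this could look (just a sketch; the `Normalizer` protocol and the `ret_rms` constructor parameter are hypothetical, not existing tianshou API) is a small strategy interface that policies accept instead of hard-coding `RunningMeanStd`:

```python
from typing import Protocol
import numpy as np

class Normalizer(Protocol):
    """Hypothetical interface a policy could accept; names are illustrative."""

    def update(self, data_array: np.ndarray) -> None: ...
    def norm(self, data_array: np.ndarray) -> np.ndarray: ...

class IdentityNormalizer:
    """One possible strategy: no normalization at all."""

    def update(self, data_array: np.ndarray) -> None:
        pass

    def norm(self, data_array: np.ndarray) -> np.ndarray:
        return data_array

# A hypothetical extension point: PGPolicy(..., ret_rms=IdentityNormalizer())
# would let users pick clipped, unclipped, or fully custom normalization.
```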