Remove reward_normalization option in off-policy algorithms #298

ChenDRAG · 2021-02-26T07:41:49Z

This PR removes the `reward_normalization` option from the off-policy algorithms and from the related tests/examples, because the original reward normalization in the n-step return computation causes training instability once the replay buffer is full (at 1M steps in the attached graph).
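A minimal sketch of why buffer-based reward normalization can break at the moment the buffer fills. This is not tianshou's code. The reward-only ring buffer, the `reward_stats` helper and its `stat_window` parameter are hypothetical, and the assumption that the statistics come from a fixed slice at the start of the buffer is made only to make the wraparound visible. Once the buffer is full and starts overwriting its oldest slots, the mean/std used to rescale every n-step target can jump, so identical transitions suddenly get very different targets.

```python
import numpy as np


class RewardRingBuffer:
    """Fixed-size circular buffer that stores only rewards (simplified)."""

    def __init__(self, size: int):
        self.size = size
        self.rew = np.zeros(size)
        self.ptr = 0    # next slot to write
        self.count = 0  # number of filled slots

    def add(self, r: float) -> None:
        self.rew[self.ptr] = r
        self.ptr = (self.ptr + 1) % self.size  # wrap around: overwrite oldest data
        self.count = min(self.count + 1, self.size)


def reward_stats(buf: RewardRingBuffer, stat_window: int = 1000):
    """Hypothetical normalizer: mean/std of the first `stat_window` slots.

    Before the buffer is full these slots hold the oldest rewards; after it
    wraps they are overwritten with the newest ones.
    """
    window = buf.rew[: min(buf.count, stat_window)]
    mean, std = float(window.mean()), float(window.std())
    if np.isclose(std, 0.0):
        mean, std = 0.0, 1.0  # avoid dividing by ~0
    return mean, std


def nstep_return(rews, dones, q_boot, gamma, mean=0.0, std=1.0):
    """n-step TD target: sum_k gamma^k * (r_k - mean) / std + gamma^n * q_boot.

    Stops at the first terminal step (no bootstrapping past episode end).
    With mean=0, std=1 this is the plain, unnormalized n-step return.
    """
    target = 0.0
    for k, (r, done) in enumerate(zip(rews, dones)):
        target += gamma**k * (r - mean) / std
        if done:
            return target
    return target + gamma ** len(rews) * q_boot


buf = RewardRingBuffer(size=5000)
for t in range(5000):           # fill the buffer; rewards grow as the agent improves
    buf.add(0.001 * t)          # 0.000 ... 4.999
m1, s1 = reward_stats(buf)      # computed from the oldest rewards
for t in range(5000, 6000):     # buffer is full: these overwrite slots 0..999
    buf.add(0.001 * t)          # 5.000 ... 5.999
m2, s2 = reward_stats(buf)      # same slots, now holding the newest rewards

# The same 3-step transition gets a very different target right after the wrap.
sample = dict(rews=[5.5, 5.5, 5.5], dones=[False] * 3, q_boot=0.0, gamma=0.99)
t_before = nstep_return(**sample, mean=m1, std=s1)
t_after = nstep_return(**sample, mean=m2, std=s2)
print(f"mean {m1:.4f} -> {m2:.4f}, target {t_before:.2f} -> {t_after:.2f}")
```

In this sketch, dropping the option corresponds to always calling `nstep_return` with `mean=0.0, std=1.0`, so the scale of the targets no longer depends on which rewards currently sit in the buffer.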

codecov-io · 2021-02-26T07:59:05Z

Codecov Report

Merging #298 (c530139) into dev (3108b9d) will increase coverage by 0.01%.
The diff coverage is 97.14%.

@@            Coverage Diff             @@
##              dev     #298      +/-   ##
==========================================
+ Coverage   93.89%   93.91%   +0.01%     
==========================================
  Files          47       47              
  Lines        3241     3235       -6     
==========================================
- Hits         3043     3038       -5     
+ Misses        198      197       -1
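The coverage figures are Hits / Lines. A quick check, assuming Codecov rounds down to two decimals (its default `precision: 2`, `round: down` settings), reproduces both totals and the +0.01% change; rounding to nearest would instead give +0.02%, because the unrounded change is about 0.0196 percentage points. The helper names below are illustrative, not part of Codecov.

```python
import math


def codecov_pct(hits: int, lines: int) -> float:
    """Line coverage in percent, rounded *down* to two decimals (assumed Codecov default)."""
    return math.floor(10000 * hits / lines) / 100


def codecov_delta(before: tuple, after: tuple) -> float:
    """Coverage change from the unrounded ratios, then rounded down to two decimals."""
    raw = 100 * (after[0] / after[1] - before[0] / before[1])
    return math.floor(raw * 100) / 100


before = (3043, 3241)  # dev @ 3108b9d: (hits, lines)
after = (3038, 3235)   # #298 @ c530139: (hits, lines)
print(codecov_pct(*before), codecov_pct(*after), codecov_delta(before, after))
```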

| Flag | Coverage Δ | |
|---|---|---|
| unittests | `93.91% <97.14%> (+0.01%)` | ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | |
|---|---|---|
| tianshou/policy/modelbase/psrl.py | `97.40% <ø> (ø)` | |
| tianshou/policy/modelfree/c51.py | `93.61% <ø> (ø)` | |
| tianshou/policy/modelfree/discrete_sac.py | `87.69% <ø> (ø)` | |
| tianshou/utils/net/common.py | `93.61% <ø> (ø)` | |
| tianshou/policy/base.py | `77.23% <75.00%> (-0.11%)` | ⬇️ |
| tianshou/data/buffer.py | `93.29% <100.00%> (ø)` | |
| tianshou/policy/imitation/base.py | `100.00% <100.00%> (ø)` | |
| tianshou/policy/imitation/discrete_bcq.py | `98.43% <100.00%> (ø)` | |
| tianshou/policy/modelfree/a2c.py | `86.20% <100.00%> (ø)` | |
| tianshou/policy/modelfree/ddpg.py | `98.70% <100.00%> (-0.02%)` | ⬇️ |

... and 9 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Powered by Codecov. Last update 3108b9d...c530139.

* remove rew_norm in nstep implementation
* improve test
* remove runnable/
* various doc fix

Co-authored-by: n+e <trinkle23897@gmail.com>

remove rew_norm in offpolicy algorithm

bc4d0bc

ChenDRAG requested a review from Trinkle23897 February 26, 2021 07:41

all

6c75da7

Trinkle23897 reviewed Feb 26, 2021

tianshou/policy/modelfree/ddpg.py (outdated, resolved)

tianshou/policy/modelfree/dqn.py (outdated, resolved)

examples/mujoco/runnable/ant_v2_ddpg.py (outdated, resolved)

Trinkle23897 added 11 commits February 26, 2021 21:15

defaults -> Default

f5f9b24

action=store_true

76be246

fix test_ddpg: pass 10 seed within avg 30s

6f76618

fix test_td3: pass 10 seed within avg 35s

9291d5a

fix test_drqn: 10 seed avg < 20s

72e074d

test td3 seed

f6ef057

fix test_sac

19a66f8

td3 seed=1

33e6ae0

change psrl seed to see what happens

ec1096b

remove runnable/

4146534

greater

c530139

Trinkle23897 approved these changes Feb 27, 2021

Trinkle23897 merged commit f22b539 into thu-ml:dev Feb 27, 2021
