
Remove reward_normalization option in off-policy algorithms #298


Merged
merged 13 commits into from
Feb 27, 2021


ChenDRAG
Collaborator

This PR removes the `reward_normalization` option from the off-policy algorithms, along with the related tests and examples. The original reward normalization in `n_step_return` causes training instability once the buffer is full (at 1M steps in the graph below).
[Image: training curve graph, e363823530e46ececb68476981be87e]
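
To illustrate why normalizing rewards inside an n-step return can be fragile, here is a minimal sketch. It is not Tianshou's implementation: the helper `n_step_return`, its parameters, and the toy buffers are hypothetical and only assume that the normalization statistics are taken from whatever rewards are currently stored in the buffer. Under that assumption, the same transitions receive a different target when the buffer contents change.

```python
import numpy as np


def n_step_return(rew, done, next_value, gamma=0.99, mean=0.0, std=1.0):
    """Hypothetical helper: discounted return over len(rew) steps plus a bootstrapped tail.

    rew, done: rewards and terminal flags of consecutive steps.
    next_value: value estimate of the state reached after the last step.
    mean, std: reward-normalization statistics (defaults mean "no normalization").
    """
    ret, discount = 0.0, 1.0
    for r, d in zip(rew, done):
        ret += discount * (r - mean) / std
        discount *= gamma
        if d:  # episode ended: do not bootstrap past a terminal step
            return ret
    return ret + discount * next_value


# The same two-step window of rewards, evaluated three ways.
rew = np.array([1.0, 1.0])
done = np.array([False, False])

plain = n_step_return(rew, done, next_value=0.0, gamma=0.5)

# Assumed toy buffers: the stored rewards drift as old data is overwritten.
early_buffer = np.array([0.0, 0.0, 1.0, 1.0])
late_buffer = np.array([1.0, 1.0, 2.0, 2.0])

early = n_step_return(rew, done, 0.0, gamma=0.5,
                      mean=early_buffer.mean(), std=early_buffer.std())
late = n_step_return(rew, done, 0.0, gamma=0.5,
                     mean=late_buffer.mean(), std=late_buffer.std())

print(plain, early, late)
```

In this toy setup the unnormalized target stays fixed, while the normalized target for the identical window changes sign between the two buffers. Whether this is the exact mechanism behind the instability reported above is an assumption; the sketch only shows that the targets become dependent on buffer statistics.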


codecov-io commented Feb 26, 2021

Codecov Report

Merging #298 (c530139) into dev (3108b9d) will increase coverage by 0.01%.
The diff coverage is 97.14%.


@@            Coverage Diff             @@
##              dev     #298      +/-   ##
==========================================
+ Coverage   93.89%   93.91%   +0.01%     
==========================================
  Files          47       47              
  Lines        3241     3235       -6     
==========================================
- Hits         3043     3038       -5     
+ Misses        198      197       -1     
| Flag | Coverage Δ |
|---|---|
| unittests | 93.91% <97.14%> (+0.01%) ⬆️ |


| Impacted Files | Coverage Δ |
|---|---|
| tianshou/policy/modelbase/psrl.py | 97.40% <ø> (ø) |
| tianshou/policy/modelfree/c51.py | 93.61% <ø> (ø) |
| tianshou/policy/modelfree/discrete_sac.py | 87.69% <ø> (ø) |
| tianshou/utils/net/common.py | 93.61% <ø> (ø) |
| tianshou/policy/base.py | 77.23% <75.00%> (-0.11%) ⬇️ |
| tianshou/data/buffer.py | 93.29% <100.00%> (ø) |
| tianshou/policy/imitation/base.py | 100.00% <100.00%> (ø) |
| tianshou/policy/imitation/discrete_bcq.py | 98.43% <100.00%> (ø) |
| tianshou/policy/modelfree/a2c.py | 86.20% <100.00%> (ø) |
| tianshou/policy/modelfree/ddpg.py | 98.70% <100.00%> (-0.02%) ⬇️ |

... and 9 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3108b9d...c530139.

Trinkle23897 merged commit f22b539 into thu-ml:dev on Feb 27, 2021
BFAnas pushed a commit to BFAnas/tianshou that referenced this pull request May 5, 2024
* remove rew_norm in nstep implementation
* improve test
* remove runnable/
* various doc fix

Co-authored-by: n+e <trinkle23897@gmail.com>