Trainer refactor : some definition change #293

ChenDRAG · 2021-02-19T03:04:21Z

This is the 4th of the 6 commits mentioned in #274; it refactors the trainer to fix #161. See #274 for more details.
To avoid a large commit like #280, I will split the change into 2 PRs. The first focuses on some definition changes in the trainer, to make it friendlier to use and consistent with typical usage in research papers. The second focuses on refactoring the logging method to fix the NaN reward bug and the log interval. Hopefully, after these two PRs the fundamental changes to tianshou/data will be finished, and we can finally concentrate on building benchmarks for tianshou.
This is the first PR.

tianshou/trainer/onpolicy.py

tianshou/trainer/offpolicy.py

docs/tutorials/dqn.rst

codecov-io · 2021-02-19T10:55:09Z

Codecov Report

Merging #293 (fdb4e57) into dev (150d0ec) will increase coverage by 0.02%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##              dev     #293      +/-   ##
==========================================
+ Coverage   94.51%   94.53%   +0.02%     
==========================================
  Files          45       45              
  Lines        3152     3164      +12     
==========================================
+ Hits         2979     2991      +12     
  Misses        173      173

Flag	Coverage Δ
unittests	`94.53% <100.00%> (+0.02%)`	⬆️

Flags with carried forward coverage won't be shown.

Impacted Files	Coverage Δ
tianshou/data/buffer.py	`98.57% <ø> (ø)`
tianshou/policy/base.py	`76.80% <ø> (ø)`
tianshou/policy/modelfree/pg.py	`97.36% <ø> (ø)`
tianshou/policy/random.py	`100.00% <ø> (ø)`
tianshou/trainer/utils.py	`100.00% <ø> (ø)`
tianshou/data/collector.py	`94.89% <100.00%> (ø)`
tianshou/trainer/offline.py	`100.00% <100.00%> (ø)`
tianshou/trainer/offpolicy.py	`100.00% <100.00%> (ø)`
tianshou/trainer/onpolicy.py	`97.05% <100.00%> (+0.18%)`	⬆️

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 150d0ec...fdb4e57.

docs/tutorials/tictactoe.rst

tianshou/trainer/offline.py

tianshou/trainer/onpolicy.py

Trinkle23897

examples/box2d/[lunarlander_dqn|bipedal_hardcore_sac|acrobot_dualdqn].py need to add the update-per-step option.
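Adding the option to an example script could look roughly like the sketch below. It is a hypothetical argparse sketch: the flag spelling follows the `step-per-collect` / `update-per-step` names used in this PR, but the `get_args` helper and all default values are illustrative assumptions, not the contents of the actual box2d examples.

```python
import argparse
from typing import List, Optional


def get_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # Hypothetical CLI mirroring the renamed trainer options; defaults are illustrative.
    parser = argparse.ArgumentParser()
    # Environment steps per epoch.
    parser.add_argument("--step-per-epoch", type=int, default=10000)
    # Environment steps collected before each round of network updates.
    parser.add_argument("--step-per-collect", type=int, default=10)
    # Gradient updates per collected environment step (may be fractional).
    parser.add_argument("--update-per-step", type=float, default=0.1)
    return parser.parse_args(argv)
```

argparse turns `--update-per-step` into the attribute `update_per_step`, so `get_args(["--update-per-step", "0.25"]).update_per_step` evaluates to `0.25`.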

tianshou/trainer/offline.py

Trinkle23897 · 2021-02-19T23:09:45Z

tianshou/trainer/offpolicy.py

@@ -33,7 +33,7 @@ def offpolicy_trainer(
 ) -> Dict[str, Union[float, str]]:
    """A wrapper for off-policy trainer procedure.

-    The "step" in trainer means a policy network update.
+    The "step" in trainer means an environment frame.


Suggested change:

-    The "step" in trainer means an environment frame.
+    The "step" in trainer means an environment step.

It seems that elsewhere in the docs an environment 'step' is described as a 'frame'. Wouldn't it be odd to call it a 'step' here?

I agree with this suggestion. Since we now explain that "step" refers to an environment step, every "frame" in the docs can be changed to "step". In addition, the note on "step" in the offline trainer can be deleted, because we use the word "update" to refer to a gradient step.
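To make the distinction concrete, with "step" meaning an environment step and "update" meaning a gradient step, one off-policy epoch can be modeled as below. This is a simplified sketch, not tianshou's trainer code: `count_epoch` is a hypothetical helper, and it assumes each collection returns exactly `step_per_collect` transitions and that each collection is followed by `round(update_per_step * collected_steps)` gradient updates.

```python
from typing import Tuple


def count_epoch(step_per_epoch: int, step_per_collect: int,
                update_per_step: float) -> Tuple[int, int]:
    """Hypothetical helper: return (env_steps, gradient_updates) for one epoch."""
    if step_per_collect <= 0:
        raise ValueError("step_per_collect must be positive")
    env_steps = 0
    gradient_updates = 0
    while env_steps < step_per_epoch:
        # Assumption: the collector returns exactly step_per_collect transitions.
        collected = step_per_collect
        env_steps += collected
        # Gradient updates scale with the number of environment steps just collected.
        gradient_updates += round(update_per_step * collected)
    return env_steps, gradient_updates
```

Under these assumptions, `count_epoch(1000, 10, 0.1)` gives 1000 environment steps and 100 updates, so the ratio of updates to environment steps stays at `update_per_step` regardless of `step_per_collect`.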

@danagi please help us check again, including the docstrings of the 3 trainers and the description of the trainer in docs/tutorials/[dqn|concepts].rst. Thanks!

Co-authored-by: n+e <trinkle23897@qq.com>


tianshou/policy/base.py

Co-authored-by: danagi <420147879@qq.com>

ChenDRAG · 2021-02-21T05:02:03Z

Ready to merge now, I think.

This PR focuses on some definition changes in the trainer, to make it friendlier to use and consistent with typical usage in research papers. Specifically, it changes `collect-per-step` to `step-per-collect`, adds `update-per-step` / `episode-per-collect` accordingly, and updates the documentation.
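Since a collection round can be bounded either by environment steps (`step-per-collect`) or by episodes (`episode-per-collect`), exactly one of the two should be set. The sketch below shows one way to validate that, assuming the values are forwarded to the collector as `n_step` / `n_episode` keyword arguments; the helper name `collect_kwargs` is hypothetical and not part of tianshou.

```python
from typing import Dict, Optional


def collect_kwargs(step_per_collect: Optional[int] = None,
                   episode_per_collect: Optional[int] = None) -> Dict[str, int]:
    """Hypothetical helper: map trainer options to collector keyword arguments."""
    # Exactly one stopping criterion must be given for each collection round.
    if (step_per_collect is None) == (episode_per_collect is None):
        raise ValueError(
            "set exactly one of step_per_collect and episode_per_collect")
    if step_per_collect is not None:
        if step_per_collect <= 0:
            raise ValueError("step_per_collect must be positive")
        return {"n_step": step_per_collect}
    assert episode_per_collect is not None
    if episode_per_collect <= 0:
        raise ValueError("episode_per_collect must be positive")
    return {"n_episode": episode_per_collect}
```

For example, `collect_kwargs(step_per_collect=2000)` returns `{"n_step": 2000}`, while passing both options, or neither, raises `ValueError`.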

rebase

e8e0ec1

Trinkle23897 force-pushed the trainer_definition branch from 241681e to e8e0ec1 on February 19, 2021 03:09

Trinkle23897 reviewed Feb 19, 2021

tianshou/trainer/onpolicy.py (resolved)

tianshou/trainer/onpolicy.py (outdated, resolved)

Trinkle23897 requested a review from danagi February 19, 2021 03:18

pep8 fix

3f215f9

Trinkle23897 reviewed Feb 19, 2021

tianshou/trainer/offpolicy.py (outdated, resolved)

docs/tutorials/dqn.rst (outdated, resolved)

ChenDRAG added 10 commits February 19, 2021 11:40

remove collect method

652a5de

small fix

cfdefe1

test fix

b9a7597

fix a bug

237f16a

pep8 fix

9ac0228

adjust update option to be consistent with history

27568a3

pep8 fix

044f909

some other change

fe412ac

fix test

6262239

update change

75750e8

restart test

602fa2d

Trinkle23897 reviewed Feb 19, 2021

docs/tutorials/tictactoe.rst (resolved)

Trinkle23897 reviewed Feb 19, 2021

tianshou/trainer/offline.py (outdated, resolved)

Trinkle23897 reviewed Feb 19, 2021

tianshou/trainer/onpolicy.py (outdated, resolved)

Trinkle23897 reviewed Feb 19, 2021

tianshou/trainer/onpolicy.py (outdated, resolved)

fix

7642f44

danagi requested changes Feb 19, 2021

tianshou/trainer/onpolicy.py (outdated, resolved)

ChenDRAG added 2 commits February 19, 2021 22:38

solve review

cccea72

fix review

670d3df

ChenDRAG requested review from danagi and Trinkle23897 February 19, 2021 14:42

Trinkle23897 reviewed Feb 19, 2021

tianshou/trainer/offline.py (outdated, resolved)

Trinkle23897 reviewed Feb 19, 2021

ChenDRAG and others added 8 commits February 20, 2021 08:59

Update tianshou/trainer/offline.py

3abdaff

Co-authored-by: n+e <trinkle23897@qq.com>

update option in box2d

d383229

Merge branch 'trainer_definition' of github.com:ChenDRAG/tianshou into trainer_definition

5d6b23c

pep8fix

f3727a0

fix test

9f7d410

fix doc

2d9fc02

fix doc

28f5f8f

fix doc

6330c51

danagi reviewed Feb 20, 2021

tianshou/policy/base.py (outdated, resolved)

Trinkle23897 and others added 2 commits February 20, 2021 20:16

Update tianshou/policy/base.py

4597367

Co-authored-by: danagi <420147879@qq.com>

replace frame with transition

fdb4e57

danagi approved these changes Feb 21, 2021

Trinkle23897 merged commit 7036073 into thu-ml:dev Feb 21, 2021

ChenDRAG mentioned this pull request Feb 21, 2021

Trainer refactor : flexible logger #295 (merged)

Trinkle23897 linked an issue Apr 21, 2021 that may be closed by this pull request

Plans of releasing mujoco benchmark with ddpg/sac/td3 on Tianshou #274 (closed)

ChenDRAG commented Feb 19, 2021

codecov-io commented Feb 19, 2021 (edited)

Trinkle23897 left a comment

Trinkle23897 Feb 19, 2021

ChenDRAG Feb 20, 2021

danagi Feb 20, 2021

Trinkle23897 Feb 20, 2021

ChenDRAG commented Feb 21, 2021