这是indexloc提供的服务,不要输入任何密码
Skip to content

Trainer refactor : some definition change #293

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 26 commits into from
Feb 21, 2021

Conversation

ChenDRAG
Copy link
Collaborator

This is the 4th commit of 6 commits mentioned in #274, which features refactor of trainer to fix #161. You can check #274 for more detail.
To avoid large commit as in #280, I will change in 2 pr. first focus on some definition change of trainer to make it more friendly to use and be consistent with typical usage in research papers. Second pr focus on refactor of logging method to solve bug of nan reward and log interval. After these two pr, hopefully fundamental change of tianshou/data is finished. We then can concentrate on building benchmarks of tianshou finally.
This is the first pr.

@Trinkle23897 Trinkle23897 requested a review from danagi February 19, 2021 03:18
@codecov-io
Copy link

codecov-io commented Feb 19, 2021

Codecov Report

Merging #293 (fdb4e57) into dev (150d0ec) will increase coverage by 0.02%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##              dev     #293      +/-   ##
==========================================
+ Coverage   94.51%   94.53%   +0.02%     
==========================================
  Files          45       45              
  Lines        3152     3164      +12     
==========================================
+ Hits         2979     2991      +12     
  Misses        173      173              
Flag Coverage Δ
unittests 94.53% <100.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
tianshou/data/buffer.py 98.57% <ø> (ø)
tianshou/policy/base.py 76.80% <ø> (ø)
tianshou/policy/modelfree/pg.py 97.36% <ø> (ø)
tianshou/policy/random.py 100.00% <ø> (ø)
tianshou/trainer/utils.py 100.00% <ø> (ø)
tianshou/data/collector.py 94.89% <100.00%> (ø)
tianshou/trainer/offline.py 100.00% <100.00%> (ø)
tianshou/trainer/offpolicy.py 100.00% <100.00%> (ø)
tianshou/trainer/onpolicy.py 97.05% <100.00%> (+0.18%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 150d0ec...fdb4e57. Read the comment docs.

Copy link
Collaborator

@Trinkle23897 Trinkle23897 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

examples/box2d/[lunarlander_dqn|bipedal_hardcore_sac|acrobot_dualdqn].py need to add update-per-step

@@ -33,7 +33,7 @@ def offpolicy_trainer(
) -> Dict[str, Union[float, str]]:
"""A wrapper for off-policy trainer procedure.

The "step" in trainer means a policy network update.
The "step" in trainer means an environment frame.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The "step" in trainer means an environment frame.
The "step" in trainer means an environment step.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems in other places in doc, environment 'step' is described as frame. wouldn't it be weird that here is step?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this suggestion. Now that we have explained that the step refers to the environment step, all frames in the doc can be changed to steps. In addition, the annotation of step in offline trainer can also be deleted, because we have used the word update to refer to the gradient step.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danagi please help us check again including docstring of 3 trainers and the description of trainer in docs/tutorials/[dqn|concepts].rst, thanks!

Trinkle23897 and others added 2 commits February 20, 2021 20:16
@ChenDRAG
Copy link
Collaborator Author

Ready to merge now, I think.

@Trinkle23897 Trinkle23897 merged commit 7036073 into thu-ml:dev Feb 21, 2021
@Trinkle23897 Trinkle23897 linked an issue Apr 21, 2021 that may be closed by this pull request
BFAnas pushed a commit to BFAnas/tianshou that referenced this pull request May 5, 2024
This PR focus on some definition change of trainer to make it more friendly to use and be consistent with typical usage in research papers, typically change `collect-per-step` to `step-per-collect`, add `update-per-step` / `episode-per-collect` accordingly, and modify the documentation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Plans of releasing mujoco benchmark with ddpg/sac/td3 on Tianshou
4 participants