-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Trainer refactor : some definition change #293
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
241681e
to
e8e0ec1
Compare
Codecov Report
@@ Coverage Diff @@
## dev #293 +/- ##
==========================================
+ Coverage 94.51% 94.53% +0.02%
==========================================
Files 45 45
Lines 3152 3164 +12
==========================================
+ Hits 2979 2991 +12
Misses 173 173
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
examples/box2d/[lunarlander_dqn|bipedal_hardcore_sac|acrobot_dualdqn].py need to add update-per-step
tianshou/trainer/offpolicy.py
Outdated
@@ -33,7 +33,7 @@ def offpolicy_trainer( | |||
) -> Dict[str, Union[float, str]]: | |||
"""A wrapper for off-policy trainer procedure. | |||
|
|||
The "step" in trainer means a policy network update. | |||
The "step" in trainer means an environment frame. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "step" in trainer means an environment frame. | |
The "step" in trainer means an environment step. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems in other places in doc, environment 'step' is described as frame. wouldn't it be weird that here is step?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this suggestion. Now that we have explained that the step
refers to the environment step, all frames
in the doc can be changed to steps
. In addition, the annotation of step
in offline trainer can also be deleted, because we have used the word update
to refer to the gradient step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danagi please help us check again including docstring of 3 trainers and the description of trainer in docs/tutorials/[dqn|concepts].rst
, thanks!
Co-authored-by: danagi <420147879@qq.com>
Ready to merge now, I think. |
This PR focus on some definition change of trainer to make it more friendly to use and be consistent with typical usage in research papers, typically change `collect-per-step` to `step-per-collect`, add `update-per-step` / `episode-per-collect` accordingly, and modify the documentation.
This is the 4th commit of 6 commits mentioned in #274, which features refactor of trainer to fix #161. You can check #274 for more detail.
To avoid large commit as in #280, I will change in 2 pr. first focus on some definition change of trainer to make it more friendly to use and be consistent with typical usage in research papers. Second pr focus on refactor of logging method to solve bug of nan reward and log interval. After these two pr, hopefully fundamental change of tianshou/data is finished. We then can concentrate on building benchmarks of tianshou finally.
This is the first pr.