
optimize training procedure and improve code coverage #189


Merged
merged 60 commits into thu-ml:master from optimize on Aug 27, 2020

Conversation

Trinkle23897
Collaborator

@Trinkle23897 Trinkle23897 commented Aug 22, 2020

This is a cherry-pick of #187

  1. add `policy.eval()` to the "watch performance" section of all test scripts
  2. remove dict return support for the collector's `preprocess_fn`
  3. add `__contains__` and `pop` to `Batch`: `key in batch`, `batch.pop(key, deft)`
  4. collect exactly `n_episode` episodes when `n_episode` is given as a list, and save fake data into `cache_buffer` when `self.buffer` is None (Memory error when sampling from collector #184)
  5. fix TensorBoard logging: the horizontal axis now shows env steps instead of gradient steps; also add test results to TensorBoard
  6. add `test_returns` covering both GAE and n-step returns
  7. reorder the type checks in batch.py and converter.py so the most common case is checked first
  8. fix a shape inconsistency for `torch.Tensor` in the replay buffer
  9. remove `**kwargs` in `ReplayBuffer`
  10. remove the default value of `batch.split()` and add a `merge_last` argument (Variable batch-size during on-policy training #185)
  11. improve n-step return efficiency
  12. add `max_batchsize` to on-policy algorithms
  13. fix a potential bug in `subproc.wait`
  14. fix `RecurrentActorProb`
  15. improve code coverage (from 90% to 95%) and remove dead code
  16. fix some incorrect type annotations

These improvements also increase the training FPS: on my machine the previous version reached only ~1800 FPS, while this version reaches ~2050 FPS (faster than v0.2.4.post1).
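Item 3 in action: a minimal, self-contained stand-in for tianshou's `Batch` (a toy class, not the real implementation) showing the semantics of the new `__contains__` and `pop` methods:

```python
class MiniBatch:
    """Toy stand-in for tianshou's Batch (illustration only), with the
    membership test and pop() semantics added by this PR."""

    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

    def __contains__(self, key):
        # enables `key in batch`
        return key in self.__dict__

    def pop(self, key, *default):
        # enables `batch.pop(key, deft)`: remove and return the value,
        # falling back to the default (if given) when the key is absent
        try:
            return self.__dict__.pop(key)
        except KeyError:
            if default:
                return default[0]
            raise


b = MiniBatch(obs=1, act=2)
assert "obs" in b
assert b.pop("act") == 2
assert b.pop("missing", None) is None
assert "act" not in b
```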
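Item 10's `merge_last` behavior can be sketched with a hypothetical index-splitting helper (not tianshou's actual `Batch.split` code): when the last chunk would be smaller than `size`, it is folded into the previous chunk instead of being yielded on its own, which keeps mini-batch sizes stable during on-policy updates (#185).

```python
def split_indices(length, size, merge_last=False):
    """Return (start, stop) pairs covering range(length) in chunks of at
    most `size`; merge_last folds a short final chunk into the previous."""
    bounds = list(range(0, length, size)) + [length]
    if merge_last and len(bounds) > 2 and bounds[-1] - bounds[-2] < size:
        bounds.pop(-2)  # drop the boundary that starts the short tail
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]


assert split_indices(10, 4) == [(0, 4), (4, 8), (8, 10)]
assert split_indices(10, 4, merge_last=True) == [(0, 4), (4, 10)]
```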
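For item 6, a compact sketch of the GAE computation that such a returns test exercises (single trajectory, assuming no intermediate episode boundaries, unlike tianshou's done-aware version):

```python
import numpy as np


def gae_advantages(rewards, values, next_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for one trajectory (sketch only)."""
    vals = np.append(values, next_value)  # V(s_0..s_T)
    adv = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * vals[t + 1] - vals[t]
        running = delta + gamma * lam * running
        adv[t] = running
    return adv


# with gamma = lam = 1 the advantage is the full return-to-go minus V(s)
adv = gae_advantages([1.0, 1.0], [0.0, 0.0], 0.0, gamma=1.0, lam=1.0)
assert adv.tolist() == [2.0, 1.0]
```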

@Trinkle23897 Trinkle23897 changed the title Optimize training procedure and improve code coverage WIP: optimize training procedure and improve code coverage Aug 22, 2020
@duburcqa
Collaborator

duburcqa commented Aug 26, 2020

I'm done with the review! Nice PR!

youkaichao
youkaichao previously approved these changes Aug 27, 2020
@Trinkle23897 Trinkle23897 merged commit 94bfb32 into thu-ml:master Aug 27, 2020
@Trinkle23897 Trinkle23897 deleted the optimize branch August 27, 2020 04:15
BFAnas pushed a commit to BFAnas/tianshou that referenced this pull request May 5, 2024

Successfully merging this pull request may close these issues.

Variable batch-size during on-policy training
Memory error when sampling from collector
Batch & Buffer profiling
4 participants