
optimize training procedure and improve code coverage #189


Merged
merged 60 commits into thu-ml:master from optimize on Aug 27, 2020

Conversation

Trinkle23897
Collaborator

@Trinkle23897 Trinkle23897 commented Aug 22, 2020

This is a cherry-pick of #187

  1. add `policy.eval()` to the "watch performance" section of all test scripts
  2. remove dict return support for the collector's `preprocess_fn`
  3. add `__contains__` and `pop` to `Batch`: `key in batch`, `batch.pop(key, deft)`
  4. collect exactly `n_episode` episodes when `n_episode` is given as a list, and save fake data into `cache_buffer` when `self.buffer` is None (Memory error when sampling from collector #184)
  5. fix TensorBoard logging: the horizontal axis now shows env steps instead of gradient steps; also add test results to TensorBoard
  6. add `test_returns` covering both GAE and n-step returns
  7. reorder the type checks in batch.py and converter.py so the most common case is checked first
  8. fix a shape inconsistency for `torch.Tensor` in the replay buffer
  9. remove `**kwargs` in `ReplayBuffer`
  10. remove the default value of `batch.split()` and add a `merge_last` argument (Variable batch-size during on-policy training #185)
  11. improve n-step return efficiency
  12. add `max_batchsize` to on-policy algorithms
  13. fix a potential bug in `subproc.wait`
  14. fix `RecurrentActorProb`
  15. improve code coverage (from 90% to 95%) and remove dead code
  16. fix some incorrect type annotations

These improvements also increase the training FPS: on my machine the previous version reached only ~1800 FPS, while this version reaches ~2050 FPS (faster than v0.2.4.post1).
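Item 3 in action: a minimal, self-contained stand-in for tianshou's `Batch` (a toy class, not the real implementation) showing the semantics of the new `__contains__` and `pop` methods:

```python
class MiniBatch:
    """Toy stand-in for tianshou's Batch (illustration only), with the
    membership test and pop() semantics added by this PR."""

    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

    def __contains__(self, key):
        # enables `key in batch`
        return key in self.__dict__

    def pop(self, key, *default):
        # enables `batch.pop(key, deft)`: remove and return the value,
        # falling back to the default (if given) when the key is absent
        try:
            return self.__dict__.pop(key)
        except KeyError:
            if default:
                return default[0]
            raise


b = MiniBatch(obs=1, act=2)
assert "obs" in b
assert b.pop("act") == 2
assert b.pop("missing", None) is None
assert "act" not in b
```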
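Item 10's `merge_last` behavior can be sketched with a hypothetical index-splitting helper (not tianshou's actual `Batch.split` code): when the last chunk would be smaller than `size`, it is folded into the previous chunk instead of being yielded on its own, which keeps mini-batch sizes stable during on-policy updates (#185).

```python
def split_indices(length, size, merge_last=False):
    """Return (start, stop) pairs covering range(length) in chunks of at
    most `size`; merge_last folds a short final chunk into the previous."""
    bounds = list(range(0, length, size)) + [length]
    if merge_last and len(bounds) > 2 and bounds[-1] - bounds[-2] < size:
        bounds.pop(-2)  # drop the boundary that starts the short tail
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]


assert split_indices(10, 4) == [(0, 4), (4, 8), (8, 10)]
assert split_indices(10, 4, merge_last=True) == [(0, 4), (4, 10)]
```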
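For item 6, a compact sketch of the GAE computation that such a returns test exercises (single trajectory, assuming no intermediate episode boundaries, unlike tianshou's done-aware version):

```python
import numpy as np


def gae_advantages(rewards, values, next_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for one trajectory (sketch only)."""
    vals = np.append(values, next_value)  # V(s_0..s_T)
    adv = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * vals[t + 1] - vals[t]
        running = delta + gamma * lam * running
        adv[t] = running
    return adv


# with gamma = lam = 1 the advantage is the full return-to-go minus V(s)
adv = gae_advantages([1.0, 1.0], [0.0, 0.0], 0.0, gamma=1.0, lam=1.0)
assert adv.tolist() == [2.0, 1.0]
```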

@Trinkle23897 Trinkle23897 changed the title Optimize training procedure and improve code coverage WIP: optimize training procedure and improve code coverage Aug 22, 2020
@duburcqa
Collaborator

duburcqa commented Aug 26, 2020

I'm done with the review! Nice PR!

youkaichao
youkaichao previously approved these changes Aug 27, 2020
@Trinkle23897 Trinkle23897 merged commit 94bfb32 into thu-ml:master Aug 27, 2020
@Trinkle23897 Trinkle23897 deleted the optimize branch August 27, 2020 04:15
BFAnas pushed a commit to BFAnas/tianshou that referenced this pull request May 5, 2024

Successfully merging this pull request may close these issues.

Variable batch-size during on-policy training
Memory error when sampling from collector
Batch & Buffer profiling
4 participants