Step collector implementation #280
… for replaybuffer
Codecov Report
@@            Coverage Diff             @@
##              dev     #280      +/-   ##
==========================================
- Coverage   94.64%   94.47%   -0.17%
==========================================
  Files          45       45
  Lines        3027     3152     +125
==========================================
+ Hits         2865     2978     +113
- Misses        162      174      +12
I think this PR is ready to merge now.
Other suggestions will go into the next PR, because this one is already too large (over 2000 lines, though many of the changes are in the tests).
This is the third of the 6 commits mentioned in thu-ml#274; it refactors Collector to fix thu-ml#245. See thu-ml#274 for more detail.

Things changed in this PR:
1. refactor the collector to be cleaner, and split out AsyncCollector to support asyncvenv;
2. change the buffer.add API to add(batch, buffer_ids); add several types of buffer (VectorReplayBuffer, PrioritizedVectorReplayBuffer, etc.);
3. add policy.exploration_noise(act, batch) -> act;
4. small change in BasePolicy.compute_*_returns;
5. move reward_metric from collector to trainer;
6. fix the np.asanyarray issue (different numpy versions produce different output);
7. flake8 max-line-length = 88;
8. polish docs and fix tests.

Co-authored-by: n+e <trinkle23897@gmail.com>
This is the third of the 6 commits mentioned in #274; it refactors Collector to fix #245. See #274 for more detail.
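To make the add(batch, buffer_ids) change easier to picture, here is a minimal sketch of how a vectorized buffer might route one transition per environment into its own sub-buffer. It is not the code from this PR: `ToyVectorBuffer`, its fields, and the scalar observations are hypothetical stand-ins, and the only idea borrowed from the description is that `buffer_ids` selects which sub-buffers receive data.

```python
import numpy as np


class ToyVectorBuffer:
    """Hypothetical toy, NOT Tianshou's VectorReplayBuffer.

    One flat array is split into `buffer_num` equal sub-buffers,
    each with its own circular write cursor.
    """

    def __init__(self, total_size: int, buffer_num: int) -> None:
        assert total_size % buffer_num == 0
        self.buffer_num = buffer_num
        self.sub_size = total_size // buffer_num
        self.obs = np.zeros(total_size, dtype=np.float64)
        # Per-sub-buffer write position and number of stored items.
        self.cursor = np.zeros(buffer_num, dtype=np.int64)
        self.length = np.zeros(buffer_num, dtype=np.int64)

    def add(self, batch_obs, buffer_ids=None) -> np.ndarray:
        """Store batch_obs[i] in sub-buffer buffer_ids[i]; return the flat indices.

        Assumes buffer_ids contains no duplicates within one call.
        """
        batch_obs = np.asarray(batch_obs, dtype=np.float64)
        if buffer_ids is None:
            # Default: one item for every sub-buffer, in order.
            buffer_ids = np.arange(self.buffer_num)
        buffer_ids = np.asarray(buffer_ids, dtype=np.int64)
        assert len(batch_obs) == len(buffer_ids)
        # Flat index = start of the sub-buffer + its local cursor.
        idx = buffer_ids * self.sub_size + self.cursor[buffer_ids]
        self.obs[idx] = batch_obs
        # Advance each touched cursor, wrapping around when full.
        self.cursor[buffer_ids] = (self.cursor[buffer_ids] + 1) % self.sub_size
        self.length[buffer_ids] = np.minimum(
            self.length[buffer_ids] + 1, self.sub_size
        )
        return idx


buf = ToyVectorBuffer(total_size=6, buffer_num=3)
buf.add([1.0, 2.0, 3.0])                # all three environments stepped
buf.add([4.0, 5.0], buffer_ids=[0, 2])  # only environments 0 and 2 are ready
```

Passing only the ready environment ids is the case an asynchronous collector would hit, which is presumably why the id argument is part of the new signature.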