merge dev to master #302

Trinkle23897 · 2021-03-01T13:38:41Z

This merge contains several API and implementation updates as described in #274.

This is the second commit of 6 commits mentioned in #274, which features a minor refactor of ReplayBuffer and adds two new ReplayBuffer classes, CachedReplayBuffer and ReplayBufferManager. You can check #274 for more detail.

1. Add ReplayBufferManager (handles a list of buffers) and CachedReplayBuffer;
2. Make sure the reserved keys cannot be edited by methods like `buffer.done = xxx`;
3. Add `set_batch` method for manually choosing the batch the ReplayBuffer wants to handle;
4. Add `sample_index` method, same as `sample` but returning only the index instead of both index and batch data;
5. Add `prev` (one-step previous transition index), `next` (one-step next transition index) and `unfinished_index` (the last modified index whose done==False);
6. Separate `alloc_fn` method for allocating new memory for `self._meta` when a new `(key, value)` pair comes in;
7. Move buffer's documentation to `docs/tutorials/concepts.rst`.

Co-authored-by: n+e <trinkle23897@gmail.com>
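The `prev`, `next` and `unfinished_index` semantics described in item 5 can be illustrated with a toy circular buffer. This is a minimal sketch, not Tianshou's implementation: `ToyRingBuffer` and its internals are hypothetical, it stores only `done` flags in a plain list, and it assumes that `prev`/`next` stay in place at episode boundaries and at the most recently written entry.

```python
class ToyRingBuffer:
    """Toy circular buffer that stores only `done` flags (hypothetical class)."""

    def __init__(self, maxsize):
        self.maxsize = maxsize
        self.done = [False] * maxsize
        self._index = 0  # position of the next write
        self._size = 0   # number of valid entries

    def add(self, done):
        self.done[self._index] = bool(done)
        self._index = (self._index + 1) % self.maxsize
        self._size = min(self._size + 1, self.maxsize)

    def _last(self):
        # Most recently written index; only valid for a non-empty buffer.
        return (self._index - 1) % self._size

    def prev(self, i):
        """Previous index in the same episode, or `i` if `i` starts an episode."""
        p = (i - 1) % self._size
        if self.done[p] or p == self._last():
            return i
        return p

    def next(self, i):
        """Next index in the same episode, or `i` if nothing follows it yet."""
        if self.done[i] or i == self._last():
            return i
        return (i + 1) % self._size

    def unfinished_index(self):
        """Last written index if its episode is still running, else an empty list."""
        if self._size == 0 or self.done[self._last()]:
            return []
        return [self._last()]


buf = ToyRingBuffer(maxsize=5)
for flag in [False, False, True, False]:
    buf.add(flag)

prev_indices = [buf.prev(i) for i in range(4)]
next_indices = [buf.next(i) for i in range(4)]
unfinished = buf.unfinished_index()
```

Here index 2 closes the first episode, so `prev(3)` and `next(2)` both return their own argument, and index 3 is reported as unfinished.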

1. `_create_value(Batch(a={}, b=[1, 2, 3]), 10, False)`

   before:

   ```python
   TypeError: cannot concatenate with Batch() which is scalar
   ```

   after:

   ```python
   Batch(
       a: Batch(),
       b: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]),
   )
   ```

2. creating keys in a batch's subkey, e.g.

   ```python
   a = Batch(info={"key1": [0, 1], "key2": [2, 3]})
   a[0] = Batch(info={"key1": 2, "key3": 4})
   print(a)
   ```

   before:

   ```python
   Batch(
       info: Batch(
           key1: array([0, 1]),
           key2: array([0, 3]),
       ),
   )
   ```

   after:

   ```python
   ValueError: Creating keys is not supported by item assignment.
   ```

3. small optimization for `Batch.stack_` and `Batch.cat_`, raise ValueError when receiving invalid data format.
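The key check in item 2 can be sketched without Tianshou by using nested dicts of lists in place of `Batch`. The helper `assign_row` is hypothetical and only mirrors the idea of rejecting unknown keys before writing anything. Resetting keys that are missing from the assigned value to 0 is an assumption of this sketch, chosen to match the zeroed `key2` entry in the "before" output above.

```python
def assign_row(target, index, value):
    """Write `value` into row `index` of `target`, a nested dict of lists.

    Keys absent from `target` are rejected instead of being created.
    """
    extra = set(value) - set(target)
    if extra:
        raise ValueError("Creating keys is not supported by item assignment.")
    for key, column in target.items():
        if isinstance(column, dict):
            # Recurse into sub-dicts so nested keys are checked as well.
            assign_row(column, index, value.get(key, {}))
        else:
            # Assumption of this sketch: keys missing from `value` reset to 0.
            column[index] = value.get(key, 0)


a = {"info": {"key1": [0, 1], "key2": [2, 3]}}
assign_row(a, 0, {"info": {"key1": 2}})  # allowed: no new keys

try:
    assign_row(a, 0, {"info": {"key1": 2, "key3": 4}})
    error = None
except ValueError as exc:
    error = str(exc)
```

Because the check runs before any write at each level, the rejected assignment leaves `a` as it was after the first call.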

This is the third PR of 6 commits mentioned in #274, which features a refactor of Collector to fix #245. You can check #274 for more detail.

Things changed in this PR:

1. refactor collector to be cleaner, split AsyncCollector to support asyncvenv;
2. change buffer.add api to add(batch, buffer_ids); add several types of buffer (VectorReplayBuffer, PrioritizedVectorReplayBuffer, etc.)
3. add policy.exploration_noise(act, batch) -> act
4. small change in BasePolicy.compute_*_returns
5. move reward_metric from collector to trainer
6. fix np.asanyarray issue (different versions of numpy produce different output)
7. flake8 maxlength=88
8. polish docs and fix test

Co-authored-by: n+e <trinkle23897@gmail.com>
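The `exploration_noise(act, batch) -> act` hook from item 3 takes the policy's chosen actions and returns a perturbed copy. A minimal epsilon-greedy sketch for discrete actions follows. `ToyDiscretePolicy`, `n_actions`, `eps` and `seed` are hypothetical names, actions are plain Python lists, and `batch` is accepted only to match the signature.

```python
import random


class ToyDiscretePolicy:
    """Hypothetical policy over `n_actions` discrete actions."""

    def __init__(self, n_actions, eps, seed=0):
        self.n_actions = n_actions
        self.eps = eps
        self._rng = random.Random(seed)

    def exploration_noise(self, act, batch):
        """Return a copy of `act` where each entry is randomized with prob. `eps`."""
        noisy = []
        for a in act:
            if self._rng.random() < self.eps:
                noisy.append(self._rng.randrange(self.n_actions))
            else:
                noisy.append(a)
        return noisy


greedy = [2, 0, 1, 2]
unchanged = ToyDiscretePolicy(3, eps=0.0).exploration_noise(greedy, batch=None)
randomized = ToyDiscretePolicy(3, eps=1.0).exploration_noise(greedy, batch=None)
```

Keeping the noise in a separate method lets the caller decide when to apply it, for example during training collection but not during evaluation.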

This PR focuses on changing some trainer definitions to make the trainer friendlier to use and consistent with typical usage in research papers: it changes `collect-per-step` to `step-per-collect`, adds `update-per-step` / `episode-per-collect` accordingly, and modifies the documentation.
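How `step-per-collect` and `update-per-step` interact can be shown with simple arithmetic. The sketch below is a toy loop, not the trainer itself: `count_updates` is a hypothetical helper, `step_per_epoch` is an assumed name for the epoch length, and the sketch assumes each collect is followed by `round(update_per_step * step_per_collect)` gradient updates.

```python
def count_updates(step_per_epoch, step_per_collect, update_per_step):
    """Count collect calls and gradient updates in one epoch of a toy loop."""
    env_steps = collects = updates = 0
    while env_steps < step_per_epoch:
        env_steps += step_per_collect  # one collect gathers this many transitions
        collects += 1
        # Assumption: updates scale with the number of freshly collected steps.
        updates += round(update_per_step * step_per_collect)
    return collects, updates


collects, updates = count_updates(
    step_per_epoch=1000, step_per_collect=10, update_per_step=0.1
)
dense_collects, dense_updates = count_updates(
    step_per_epoch=1000, step_per_collect=1, update_per_step=1
)
```

With 10 steps per collect and 0.1 updates per step, the loop performs one update per collect; with 1 step per collect and 1 update per step, it alternates one environment step with one update.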

This PR focuses on refactoring the logging method to fix the NaN reward bug and the log interval bug. After these two PRs, the fundamental changes to tianshou/data are hopefully finished, and we can finally concentrate on building Tianshou benchmarks.

Things changed:

1. trainer now accepts logger (BasicLogger or LazyLogger) instead of writer;
2. remove utils.SummaryWriter;
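The two logging problems named above, NaN rewards and the log interval, can be illustrated with a toy logger. This is a sketch under stated assumptions and not the BasicLogger code: `ToyIntervalLogger`, `log_train` and `records` are hypothetical, a list stands in for the writer, and skipping steps with zero finished episodes is one assumed way to avoid reporting a mean over nothing.

```python
class ToyIntervalLogger:
    """Hypothetical logger that records training stats every `interval` steps."""

    def __init__(self, interval):
        self.interval = interval
        self.last_step = -1
        self.records = []  # stands in for a TensorBoard-style writer

    def log_train(self, step, n_episodes, reward):
        if n_episodes == 0:
            return  # no finished episode, so there is no mean reward to report
        if step - self.last_step >= self.interval:
            self.records.append((step, reward))
            self.last_step = step


logger = ToyIntervalLogger(interval=100)
logger.log_train(step=50, n_episodes=0, reward=float("nan"))  # skipped: no episode
logger.log_train(step=120, n_episodes=2, reward=1.5)          # recorded
logger.log_train(step=150, n_episodes=1, reward=2.0)          # skipped: too soon
logger.log_train(step=230, n_episodes=1, reward=3.0)          # recorded
```

Passing such an object to the trainer instead of a raw writer keeps the interval and filtering policy in one place.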

* consider timelimit.truncated in calculating returns by default
* remove ignore_done
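The idea behind taking time-limit truncation into account is that an episode cut off by a step limit has not reached a true terminal state, so the value of the next state should still be bootstrapped. A minimal one-step sketch follows. `td_target` is a hypothetical helper, and `truncated` is assumed to come from the `TimeLimit.truncated` entry that Gym's `TimeLimit` wrapper puts in `info`.

```python
def td_target(rew, done, truncated, next_value, gamma=0.99):
    """One-step target that still bootstraps when a time limit cut the episode."""
    terminal = done and not truncated  # only a real terminal state stops bootstrapping
    return rew + (0.0 if terminal else gamma * next_value)


cut_by_time_limit = td_target(1.0, done=True, truncated=True, next_value=10.0, gamma=0.9)
real_terminal = td_target(1.0, done=True, truncated=False, next_value=10.0, gamma=0.9)
mid_episode = td_target(1.0, done=False, truncated=False, next_value=10.0, gamma=0.9)
```

A truncated step is therefore treated like a mid-episode step when computing the target, while a genuine terminal step uses the reward alone.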

* remove rew_norm in nstep implementation
* improve test
* remove runnable/
* various doc fix

Co-authored-by: n+e <trinkle23897@gmail.com>
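For context on the first bullet, an n-step return without any reward normalization sums the raw discounted rewards and adds a bootstrapped tail value. The sketch below is a generic textbook form and not the library's code: `nstep_return` and its parameters are hypothetical, and inputs are plain Python lists starting at the transition of interest.

```python
def nstep_return(rews, dones, bootstrap_value, gamma, n):
    """n-step return from index 0: raw discounted rewards plus a bootstrapped tail."""
    steps = min(n, len(rews))
    total = 0.0
    for k in range(steps):
        total += (gamma ** k) * rews[k]
        if dones[k]:
            return total  # episode ended, nothing to bootstrap from
    return total + (gamma ** steps) * bootstrap_value


full = nstep_return([1, 1, 1], [False, False, False], bootstrap_value=10.0, gamma=0.5, n=3)
cut = nstep_return([1, 1, 1], [False, True, False], bootstrap_value=10.0, gamma=0.5, n=3)
```

In the first call the three rewards contribute 1.75 and the tail contributes 0.125 * 10; in the second the episode ends at the second step, so only the first two rewards count.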

Things changed in this PR (#303):

- various docs update, add TOC
- split buffer into several files
- fix venv action_space randomness

v0.4.0

ChenDRAG and others added 10 commits January 29, 2021 12:23

merge master into dev

d918022

Improve buffer.prev() & buffer.next() (#294)

e99e1b0

Add Timelimit trick to optimize policies (#296)

3108b9d

Remove reward_normalization option in offpolicy algorithm (#298)

f22b539

fix vecenv action_space randomness (#300)

31e7f44

Trinkle23897 linked an issue Mar 1, 2021 that may be closed by this pull request

Plans of releasing mujoco benchmark with ddpg/sac/td3 on Tianshou #274

Closed

fix venv seed, add TOC in docs, and split buffer.py into several files (#303)

454c86c

Trinkle23897 requested a review from ChenDRAG March 2, 2021 05:51

ChenDRAG approved these changes Mar 2, 2021

Trinkle23897 merged commit 389bdb7 into master Mar 2, 2021

Trinkle23897 deleted the dev branch March 25, 2021 01:49

BFAnas pushed a commit to BFAnas/tianshou that referenced this pull request May 5, 2024

Merge pull request thu-ml#302 from thu-ml/dev

3c9f257

v0.4.0
