
Add CachedReplayBuffer and ReplayBufferManager #278


Merged: 36 commits into thu-ml:master on Jan 29, 2021

Conversation

ChenDRAG (Collaborator) commented Jan 20, 2021

This is the second of the 6 commits mentioned in #274. It features a minor refactor of ReplayBuffer and adds two new ReplayBuffer classes, CachedReplayBuffer and ReplayBufferManager. You can check #274 for more detail.

  1. Add ReplayBufferManager (handles a list of buffers) and CachedReplayBuffer;
  2. Make sure the reserved keys cannot be edited by assignments like `buffer.done = xxx`;
  3. Add a `set_batch` method for manually choosing the batch the ReplayBuffer wants to handle;
  4. Add a `sample_index` method, same as `sample` but returning only the indices instead of both the indices and the batch data;
  5. Add `prev` (one-step previous transition index), `next` (one-step next transition index) and `unfinished_index` (the last modified indices whose done == False); see the usage sketch after this list;
  6. Separate an `alloc_fn` method for allocating new memory for `self._meta` when a new `(key, value)` pair comes in;
  7. Move the buffer documentation to `docs/tutorials/concepts.rst`.
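
As a rough usage sketch (the exact signatures are an assumption based on the list above: `sample_index` taking a batch size, `prev`/`next` taking an index array), the new methods could be exercised like this:

import numpy as np
from tianshou.data import ReplayBuffer

buf = ReplayBuffer(size=10)
for i in range(5):
    # store a few transitions; only the third one terminates an episode
    buf.add(obs=i, act=i, rew=float(i), done=(i == 2))

idx = buf.sample_index(4)            # indices only, no batch data
prev_idx = buf.prev(idx)             # one-step previous transition of each index
next_idx = buf.next(idx)             # one-step next transition of each index
unfinished = buf.unfinished_index()  # last added indices whose done == False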

ChenDRAG changed the title from "update cachedreplaybuffer" to "update cachedreplaybuffer&vecreplaybuffer" on Jan 20, 2021
ChenDRAG changed the title from "update cachedreplaybuffer&vecreplaybuffer" to "update CachedReplayBuffer&VecReplayBuffer" on Jan 20, 2021
ChenDRAG (Collaborator, Author)

@Trinkle23897 Could you please help me check whether it is possible to separate the stack option from the other abilities of ReplayBuffer? I have not done that part, and it is a little difficult for me because I'm not familiar with the RNN stack usage. Thanks a lot.

Trinkle23897 marked this pull request as draft on January 20, 2021 09:53
Trinkle23897 changed the title from "update CachedReplayBuffer&VecReplayBuffer" to "Add VectorReplayBuffer and CachedReplayBuffer" on Jan 23, 2021
Trinkle23897 (Collaborator) commented Jan 24, 2021

Update: resolved

@duburcqa I have some trouble with the new buffer type's pickling. (I think the current code is clean but lacks unit tests; this will be fixed soon...)

Because the CachedReplayBuffer needs to perform getitem across many sub-buffers, if each sub-buffer's _meta were a separate batch (not memory-contiguous), it would have to getitem from each of them and then Batch.cat all of the data, which may cause a lot of overhead. To avoid this, our approach is to create _meta at the top level and send batch slices to the sub-buffers via set_batch.
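
As a minimal illustration of the idea with plain numpy (not the actual Batch/set_batch code): slicing a contiguous array yields views, so each sub-buffer can write through its own slice while the top-level storage stays contiguous, and reading the whole buffer never needs a Batch.cat.

import numpy as np

top = np.zeros(30)                            # stands in for top_buffer._meta (3 sub-buffers of size 10)
subs = [top[0:10], top[10:20], top[20:30]]    # stands in for the slices handed out via set_batch

subs[1][0] = 42.0                             # "sub-buffer 1" stores a transition through its view
assert top[10] == 42.0                        # the write lands in the contiguous top-level storage
assert subs[1].base is top                    # the slice is a view, not a separate allocation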

Here is an example: we have a buffer that contains three sub-buffers, each of size 10 (30 in total). At this point, we add a new key-value pair called info["Timelimit.truncate"]: bool. It will allocate new memory as follows:

Step 1: one sub-buffer's `_add_to_buffer` raises an exception and calls a hook function `_alloc` to
        request new memory for this key;
Step 2: the top level of this buffer class receives the signal and allocates `np.zeros(30, np.bool_)` into
        `top_buffer._meta.info["Timelimit.truncate"]`;
Step 3: the whole `top_buffer._meta` is dispatched to the sub-buffers, i.e.,
        `subbuffer[0].set_batch(top_buffer._meta[0:10])`.

The current implementation takes the alloc_fn hook as an input argument, but in VectorReplayBuffer and CachedReplayBuffer this approach is not compatible with pickle, because a user-defined hook cannot be pickled. You can run the script under test/base/test_buffer.py to see this issue.
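
For reference, a minimal sketch of the limitation (a hypothetical class, not the actual buffer code): an instance holding a user-defined lambda as its hook cannot be pickled.

import pickle

class BufferLike:                    # hypothetical stand-in for a buffer that stores a hook
    def __init__(self, alloc_fn=None):
        self.alloc_fn = alloc_fn

pickle.dumps(BufferLike())           # fine: nothing unpicklable inside

bad = BufferLike(alloc_fn=lambda key, shape: None)   # user-defined hook
try:
    pickle.dumps(bad)
except (pickle.PicklingError, AttributeError) as err:
    print("cannot pickle the user-defined hook:", err)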

I haven't figured out a solution to this issue; could you please give us some suggestions?

Trinkle23897 marked this pull request as ready for review on January 24, 2021 10:07
Trinkle23897 requested a review from duburcqa on January 24, 2021 10:07
Trinkle23897 changed the title from "Add VectorReplayBuffer and CachedReplayBuffer" to "Add CachedReplayBuffer" on Jan 25, 2021
codecov-io commented Jan 25, 2021

Codecov Report

Merging #278 (b9385d8) into master (1eb6137) will increase coverage by 0.24%.
The diff coverage is 99.57%.


@@            Coverage Diff             @@
##           master     #278      +/-   ##
==========================================
+ Coverage   94.39%   94.64%   +0.24%     
==========================================
  Files          45       45              
  Lines        2892     3006     +114     
==========================================
+ Hits         2730     2845     +115     
+ Misses        162      161       -1     
Flag Coverage Δ
unittests 94.64% <99.57%> (+0.24%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
tianshou/data/buffer.py 99.69% <99.57%> (+0.64%) ⬆️
tianshou/data/__init__.py 100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1eb6137...b9385d8.

Trinkle23897 (Collaborator)

If you think it is a special case, your workaround is fine

My current thinking is that this is a special case. I haven't come up with other demands or requirements similar to alloc_fn.

duburcqa (Collaborator)

My current thinking is that this is a special case. I haven't come up with other demands or requirements similar to alloc_fn.

Why not then! But maybe the name is not very appropriate if it is an instance method (supposedly private) instead of a hook?

duburcqa (Collaborator)

commit 425c2bd dropped the speed from 1980 to 1900

What are you using for profiling? Personally I'm now using py-spy and it is just mesmerising (open the interactive svg in a web browser):

py-spy record --native -o profiling.svg -- python3 bench.py

duburcqa (Collaborator)

I removed @staticmethod in the previous implementation and overwrote the sub-buffer's alloc_fn in the initialization of ReplayBuffers

Hmm... that looks kind of hacky, but not meaningless actually. Does it still pickle correctly after that change at the instance level?

Trinkle23897 (Collaborator) commented Jan 28, 2021

Hmm... that looks kind of hacky, but not meaningless actually.

I think so, but I haven't come up with any better approach that is compatible with pickle.

Does it still pickle correctly after that change at the instance level?

Sure, I added the related test under test_hdf5.
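
For reference, a minimal round-trip check along those lines (a sketch only; the constructor arguments mirror the snippet later in this thread, not necessarily the exact test in test_hdf5):

import pickle
import numpy as np
from tianshou.data import ReplayBuffer, CachedReplayBuffer

buf = CachedReplayBuffer(ReplayBuffer(10000), 1, 10000)
buf.add(obs=[np.random.rand(4, 84, 84)], act=[1], rew=[1], done=[1])
restored = pickle.loads(pickle.dumps(buf))   # should no longer raise at the instance level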

Trinkle23897 (Collaborator) commented Jan 28, 2021

But maybe the name is not very appropriate if it is an instance method (supposedly private) instead of a hook?

What's your suggestion? (Also, I'm not sure the name ReplayBuffers is good enough, since it is only one character away from ReplayBuffer, but changing it to ReplayBufferList would be even more misleading because we already have ListReplayBuffer.)

duburcqa (Collaborator) commented Jan 28, 2021

I'm not sure the name ReplayBuffers is good enough, since it is only one character away from ReplayBuffer, but changing it to ReplayBufferList would be even more misleading because we already have ListReplayBuffer

What about ReplayBufferManager, ReplayBufferSupervisor, ReplayBufferOrchestrator, ReplayBufferDispatcher ...

duburcqa (Collaborator) commented Jan 28, 2021

What's your suggestion?

I don't know, maybe just _buffer_allocator or _buffer_dynamic_allocator.

…self._episode_rew/len defaults to 0.0/0 instead of np.array
Trinkle23897 (Collaborator)

What are you using for profiling?

(profiling output image attached)

duburcqa (Collaborator) commented Jan 28, 2021

Your profiling graph is polluted by non-significant method calls (such as module imports); you should use a script that runs for a longer duration.

I was using this tool before, but I find it less readable than py-spy in the end. But both are good :)

Trinkle23897 (Collaborator)

py-spy result profiling.zip

I can't find the part related to the replay buffer. Is there anything wrong?

duburcqa (Collaborator)

I can't find the part related to the replay buffer. Is there anything wrong?

And yet it is there! Just use the search functionality integrated into the svg itself, and look for buffer.py.

duburcqa (Collaborator) commented Jan 28, 2021

Looking at the time spent in this module, I don't think it is relevant to optimize it further x)

NB: just disable the --native option to avoid showing the time spent in torch, if you prefer.

duburcqa previously approved these changes Jan 28, 2021
Trinkle23897 changed the title from "Add CachedReplayBuffer" to "Add CachedReplayBuffer and ReplayBufferManager" on Jan 29, 2021
ChenDRAG (Collaborator, Author)

I think CachedReplayBuffer is ready now.

Trinkle23897 merged commit f0129f4 into thu-ml:master on Jan 29, 2021
Trinkle23897 (Collaborator) commented Jan 30, 2021

@duburcqa There is a question that confuses me.

import numpy as np
from tianshou.data import ReplayBuffer, CachedReplayBuffer
buf = CachedReplayBuffer(ReplayBuffer(10000), 1, 10000)
buf.add(obs=[np.random.rand(4, 84, 84)], act=[1], rew=[1], done=[1])
buf = CachedReplayBuffer(ReplayBuffer(10000), 1, 10000)
buf.add(obs=[np.random.rand(4, 84, 84)], act=[1], rew=[1], done=[1])

Repeatedly executing the last two lines, you'll see that the actual memory usage keeps increasing. That's weird, because theoretically the reference count of the previous buf = CachedReplayBuffer(...) drops to 0. However, it seems the garbage collection doesn't happen. Do you have any idea?

duburcqa (Collaborator) commented Jan 30, 2021

What about this?

import gc
import numpy as np
from tianshou.data import ReplayBuffer, CachedReplayBuffer
for _ in range(100):
    buf = CachedReplayBuffer(ReplayBuffer(10000), 1, 10000)
    buf.add(obs=[np.random.rand(4, 84, 84)], act=[1], rew=[1], done=[1])
    gc.collect()

If there is an actual memory leak, you should be able to find the line that is responsible for it using memory_profiler. I use it frequently and it is quite good, even though it is time-consuming to use. mprof especially is pretty useful.
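
For example, a line-by-line check with memory_profiler could look roughly like this (the script name and loop count are placeholders):

# leak_check.py -- run with:  python3 -m memory_profiler leak_check.py
# (or `mprof run leak_check.py` followed by `mprof plot` for a memory-over-time plot)
import numpy as np
from memory_profiler import profile
from tianshou.data import ReplayBuffer, CachedReplayBuffer

@profile
def rebuild_buffers(n=20):
    # re-create the buffer repeatedly, as in the snippet above
    for _ in range(n):
        buf = CachedReplayBuffer(ReplayBuffer(10000), 1, 10000)
        buf.add(obs=[np.random.rand(4, 84, 84)], act=[1], rew=[1], done=[1])

if __name__ == "__main__":
    rebuild_buffers()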

Trinkle23897 (Collaborator)

gc.collect() works fine. Thanks~

Trinkle23897 mentioned this pull request on Feb 16, 2021
Trinkle23897 linked an issue on Apr 21, 2021 that may be closed by this pull request
BFAnas pushed a commit to BFAnas/tianshou that referenced this pull request May 5, 2024

Co-authored-by: n+e <trinkle23897@gmail.com>
Development

Successfully merging this pull request may close these issues.

Plans of releasing mujoco benchmark with ddpg/sac/td3 on Tianshou
4 participants