
Add CachedReplayBuffer and ReplayBufferManager #278


Merged: 36 commits into thu-ml:master on Jan 29, 2021

Conversation

ChenDRAG (Collaborator) commented Jan 20, 2021

This is the second of the 6 commits mentioned in #274. It features a minor refactor of ReplayBuffer and adds two new ReplayBuffer classes, CachedReplayBuffer and ReplayBufferManager. You can check #274 for more detail.

  1. Add ReplayBufferManager (handles a list of buffers) and CachedReplayBuffer;
  2. Make sure the reserved keys cannot be edited by assignments like `buffer.done = xxx`;
  3. Add a `set_batch` method for manually choosing the batch the ReplayBuffer wants to handle;
  4. Add a `sample_index` method, same as `sample` but returning only the indices instead of both the indices and the batch data;
  5. Add `prev` (one-step previous transition index), `next` (one-step next transition index) and `unfinished_index` (the last modified indices whose done == False); see the usage sketch after this list;
  6. Separate an `alloc_fn` method for allocating new memory for `self._meta` when a new `(key, value)` pair comes in;
  7. Move the buffer documentation to `docs/tutorials/concepts.rst`.
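
As a rough usage sketch (the exact signatures are an assumption based on the list above: `sample_index` taking a batch size, `prev`/`next` taking an index array), the new methods could be exercised like this:

import numpy as np
from tianshou.data import ReplayBuffer

buf = ReplayBuffer(size=10)
for i in range(5):
    # store a few transitions; only the third one terminates an episode
    buf.add(obs=i, act=i, rew=float(i), done=(i == 2))

idx = buf.sample_index(4)            # indices only, no batch data
prev_idx = buf.prev(idx)             # one-step previous transition of each index
next_idx = buf.next(idx)             # one-step next transition of each index
unfinished = buf.unfinished_index()  # last added indices whose done == False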

ChenDRAG changed the title from "update cachedreplaybuffer" to "update cachedreplaybuffer&vecreplaybuffer" on Jan 20, 2021
ChenDRAG changed the title from "update cachedreplaybuffer&vecreplaybuffer" to "update CachedReplayBuffer&VecReplayBuffer" on Jan 20, 2021
ChenDRAG (Collaborator, Author)

@Trinkle23897 Could you please help me check whether it is possible to separate the stack option from the other abilities of ReplayBuffer? I have not done that part, and it is a little difficult for me because I'm not familiar with the RNN stack usage. Thanks a lot.

Trinkle23897 marked this pull request as draft on January 20, 2021 09:53
Trinkle23897 changed the title from "update CachedReplayBuffer&VecReplayBuffer" to "Add VectorReplayBuffer and CachedReplayBuffer" on Jan 23, 2021
Trinkle23897 (Collaborator) commented Jan 24, 2021

Update: resolved

@duburcqa I have some trouble with the new buffer type's pickling. (I think the current code is clean but lacks unit tests; this will be fixed soon...)

Because the CachedReplayBuffer needs to perform getitem across many sub-buffers, if each sub-buffer's _meta were a separate batch (not memory-contiguous), it would have to getitem from each of them and then Batch.cat all of the data, which may cause a lot of overhead. To avoid this, our approach is to create _meta at the top level and send batch slices to the sub-buffers via set_batch.
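
As a minimal illustration of the idea with plain numpy (not the actual Batch/set_batch code): slicing a contiguous array yields views, so each sub-buffer can write through its own slice while the top-level storage stays contiguous, and reading the whole buffer never needs a Batch.cat.

import numpy as np

top = np.zeros(30)                            # stands in for top_buffer._meta (3 sub-buffers of size 10)
subs = [top[0:10], top[10:20], top[20:30]]    # stands in for the slices handed out via set_batch

subs[1][0] = 42.0                             # "sub-buffer 1" stores a transition through its view
assert top[10] == 42.0                        # the write lands in the contiguous top-level storage
assert subs[1].base is top                    # the slice is a view, not a separate allocation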

Here is an example: we have a buffer that contains three sub-buffers, each of size 10 (30 in total). At this point, we add a new key-value pair called info["Timelimit.truncate"]: bool. It will allocate new memory as follows:

Step 1: one sub-buffer's `_add_to_buffer` raises an exception and calls a hook function `_alloc` to
        request new memory for this key;
Step 2: the top level of this buffer class receives the signal and allocates `np.zeros(30, np.bool_)` into
        `top_buffer._meta.info["Timelimit.truncate"]`;
Step 3: the whole `top_buffer._meta` is dispatched to the sub-buffers, i.e.,
        `subbuffer[0].set_batch(top_buffer._meta[0:10])`.

The current implementation takes the alloc_fn hook as an input argument, but in VectorReplayBuffer and CachedReplayBuffer this approach is not compatible with pickle, because a user-defined hook cannot be pickled. You can run the script under test/base/test_buffer.py to see this issue.
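
For reference, a minimal sketch of the limitation (a hypothetical class, not the actual buffer code): an instance holding a user-defined lambda as its hook cannot be pickled.

import pickle

class BufferLike:                    # hypothetical stand-in for a buffer that stores a hook
    def __init__(self, alloc_fn=None):
        self.alloc_fn = alloc_fn

pickle.dumps(BufferLike())           # fine: nothing unpicklable inside

bad = BufferLike(alloc_fn=lambda key, shape: None)   # user-defined hook
try:
    pickle.dumps(bad)
except (pickle.PicklingError, AttributeError) as err:
    print("cannot pickle the user-defined hook:", err)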

I haven't figured out a solution to this issue; could you please give us some suggestions?

Trinkle23897 marked this pull request as ready for review on January 24, 2021 10:07
Trinkle23897 requested a review from duburcqa on January 24, 2021 10:07
Trinkle23897 changed the title from "Add VectorReplayBuffer and CachedReplayBuffer" to "Add CachedReplayBuffer" on Jan 25, 2021
codecov-io commented Jan 25, 2021

Codecov Report

Merging #278 (b9385d8) into master (1eb6137) will increase coverage by 0.24%.
The diff coverage is 99.57%.


@@            Coverage Diff             @@
##           master     #278      +/-   ##
==========================================
+ Coverage   94.39%   94.64%   +0.24%     
==========================================
  Files          45       45              
  Lines        2892     3006     +114     
==========================================
+ Hits         2730     2845     +115     
+ Misses        162      161       -1     
Flag Coverage Δ
unittests 94.64% <99.57%> (+0.24%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
tianshou/data/buffer.py 99.69% <99.57%> (+0.64%) ⬆️
tianshou/data/__init__.py 100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1eb6137...b9385d8.

Trinkle23897 (Collaborator)

If you think it is a special case, your workaround is fine

My current thinking is that this is a special case. I haven't come up with other demands or requirements similar to alloc_fn.

duburcqa (Collaborator)

My current thinking is that this is a special case. I haven't come up with other demands or requirements similar to alloc_fn.

Why not then! But maybe the name is not very appropriate if it is an instance method (supposedly private) instead of a hook?

duburcqa (Collaborator)

commit 425c2bd dropped the speed from 1980 to 1900

What are you using for profiling? Personally I'm now using py-spy and it is just mesmerising (open the interactive svg in a web browser):

py-spy record --native -o profiling.svg -- python3 bench.py

duburcqa (Collaborator)

I removed @staticmethod in the previous implementation and overwrote the sub-buffer's alloc_fn in the initialization of ReplayBuffers

Hmm... that looks kind of hacky, but not meaningless actually. Does it still pickle correctly after that change at the instance level?

Trinkle23897 (Collaborator) commented Jan 28, 2021

Hmm... that looks kind of hacky, but not meaningless actually.

I think so, but I haven't come up with any better approach that is compatible with pickle.

Does it still pickle correctly after that change at the instance level?

Sure, I added the related test under test_hdf5.
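
For reference, a minimal round-trip check along those lines (a sketch only; the constructor arguments mirror the snippet later in this thread, not necessarily the exact test in test_hdf5):

import pickle
import numpy as np
from tianshou.data import ReplayBuffer, CachedReplayBuffer

buf = CachedReplayBuffer(ReplayBuffer(10000), 1, 10000)
buf.add(obs=[np.random.rand(4, 84, 84)], act=[1], rew=[1], done=[1])
restored = pickle.loads(pickle.dumps(buf))   # should no longer raise at the instance level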

Trinkle23897 (Collaborator) commented Jan 28, 2021

But maybe the name is not very appropriate if it is an instance method (supposedly private) instead of a hook?

What's your suggestion? (Also, I'm not sure the name ReplayBuffers is good enough, since it is only one character away from ReplayBuffer, but changing it to ReplayBufferList would be even more misleading because we already have ListReplayBuffer.)

duburcqa (Collaborator) commented Jan 28, 2021

I'm not sure the name ReplayBuffers is good enough, since it is only one character away from ReplayBuffer, but changing it to ReplayBufferList would be even more misleading because we already have ListReplayBuffer

What about ReplayBufferManager, ReplayBufferSupervisor, ReplayBufferOrchestrator, ReplayBufferDispatcher ...

duburcqa (Collaborator) commented Jan 28, 2021

What's your suggestion?

I don't know, maybe just _buffer_allocator or _buffer_dynamic_allocator.

…self._episode_rew/len defaults to 0.0/0 instead of np.array
Trinkle23897 (Collaborator)

What are you using for profiling?

(profiling output image attached)

duburcqa (Collaborator) commented Jan 28, 2021

Your profiling graph is polluted by non-significant method calls (such as module imports); you should use a script that runs for a longer duration.

I was using this tool before, but I find it less readable than py-spy in the end. But both are good :)

Trinkle23897 (Collaborator)

py-spy result profiling.zip

I can't find the part related to the replay buffer. Is there anything wrong?

duburcqa (Collaborator)

I can't find the part related to the replay buffer. Is there anything wrong?

And yet it is there! Just use the search functionality integrated into the svg itself, and look for buffer.py.

duburcqa (Collaborator) commented Jan 28, 2021

Looking at the time spent in this module, I don't think it is relevant to optimize it further x)

NB: just disable the --native option to avoid showing the time spent in torch, if you prefer.

duburcqa previously approved these changes Jan 28, 2021
Trinkle23897 changed the title from "Add CachedReplayBuffer" to "Add CachedReplayBuffer and ReplayBufferManager" on Jan 29, 2021
ChenDRAG (Collaborator, Author)

I think CachedReplayBuffer is ready now.

Trinkle23897 merged commit f0129f4 into thu-ml:master on Jan 29, 2021
Trinkle23897 (Collaborator) commented Jan 30, 2021

@duburcqa There is a question that confuses me.

import numpy as np
from tianshou.data import ReplayBuffer, CachedReplayBuffer
buf = CachedReplayBuffer(ReplayBuffer(10000), 1, 10000)
buf.add(obs=[np.random.rand(4, 84, 84)], act=[1], rew=[1], done=[1])
buf = CachedReplayBuffer(ReplayBuffer(10000), 1, 10000)
buf.add(obs=[np.random.rand(4, 84, 84)], act=[1], rew=[1], done=[1])

Repeatedly executing the last two lines, you'll see that the actual memory usage keeps increasing. That's weird, because theoretically the reference count of the previous buf = CachedReplayBuffer(...) drops to 0. However, it seems the garbage collection doesn't happen. Do you have any idea?

duburcqa (Collaborator) commented Jan 30, 2021

What about this?

import gc
import numpy as np
from tianshou.data import ReplayBuffer, CachedReplayBuffer
for _ in range(100):
    buf = CachedReplayBuffer(ReplayBuffer(10000), 1, 10000)
    buf.add(obs=[np.random.rand(4, 84, 84)], act=[1], rew=[1], done=[1])
    gc.collect()

If there is an actual memory leak, you should be able to find the line that is responsible for it using memory_profiler. I use it frequently and it is quite good, even though it is time-consuming to use. mprof especially is pretty useful.
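
For example, a line-by-line check with memory_profiler could look roughly like this (the script name and loop count are placeholders):

# leak_check.py -- run with:  python3 -m memory_profiler leak_check.py
# (or `mprof run leak_check.py` followed by `mprof plot` for a memory-over-time plot)
import numpy as np
from memory_profiler import profile
from tianshou.data import ReplayBuffer, CachedReplayBuffer

@profile
def rebuild_buffers(n=20):
    # re-create the buffer repeatedly, as in the snippet above
    for _ in range(n):
        buf = CachedReplayBuffer(ReplayBuffer(10000), 1, 10000)
        buf.add(obs=[np.random.rand(4, 84, 84)], act=[1], rew=[1], done=[1])

if __name__ == "__main__":
    rebuild_buffers()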

Trinkle23897 (Collaborator)

gc.collect() works fine. Thanks~

Trinkle23897 mentioned this pull request on Feb 16, 2021
Trinkle23897 linked an issue on Apr 21, 2021 that may be closed by this pull request
BFAnas pushed a commit to BFAnas/tianshou that referenced this pull request May 5, 2024

Co-authored-by: n+e <trinkle23897@gmail.com>
Development

Successfully merging this pull request may close these issues.

Plans of releasing mujoco benchmark with ddpg/sac/td3 on Tianshou
4 participants