
Improve PER #159

Merged
Trinkle23897 merged 43 commits into thu-ml:dev from prio-buffer on Aug 6, 2020
Conversation

Trinkle23897 (Collaborator)

  1. Use a segment tree to rewrite the previous PrioReplayBuffer code and add tests (a minimal sketch of the idea is shown below).
  2. Enable all Q-learning algorithms to use PER.
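
For reference, here is a minimal sketch of the prefix-sum sampling idea behind the rewrite, assuming a simplified array-backed sum tree with a power-of-two size. It is an illustration only, not the actual tianshou SegmentTree (which also supports vectorized access):

import numpy as np

class SumTree:
    """Simplified sum tree: leaves hold priorities, node i sums its subtree."""

    def __init__(self, size):
        assert size > 0 and size & (size - 1) == 0, "power-of-two size in this sketch"
        self._size = size
        self._value = np.zeros(2 * size)  # leaves live at [size, 2 * size)

    def __setitem__(self, idx, value):
        assert value >= 0.0, "get_prefix_sum_idx requires non-negative priorities"
        idx += self._size
        self._value[idx] = value
        while idx > 1:  # propagate the new sum up to the root
            idx //= 2
            self._value[idx] = self._value[2 * idx] + self._value[2 * idx + 1]

    def reduce(self):
        return self._value[1]  # the root holds the total priority mass

    def get_prefix_sum_idx(self, scalar):
        """Find the smallest i with value[0] + ... + value[i] > scalar."""
        idx = 1
        while idx < self._size:  # descend; each step halves the search range
            left = 2 * idx
            if self._value[left] > scalar:
                idx = left  # scalar falls inside the left subtree
            else:
                scalar -= self._value[left]
                idx = left + 1
        return idx - self._size

# sampling proportional to priority: O(log n) per draw instead of O(n)
tree = SumTree(8)
for i, p in enumerate([0.1, 0.4, 0.0, 1.5, 0.2, 0.8, 0.0, 1.0]):
    tree[i] = p
scalars = np.random.rand(64) * tree.reduce()
indices = [tree.get_prefix_sum_idx(s) for s in scalars]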

@Trinkle23897 Trinkle23897 changed the title WIP: Prio Experience Replay WIP: Improve PER Jul 23, 2020
codecov-commenter commented Aug 1, 2020

Codecov Report

Merging #159 into dev will increase coverage by 0.87%.
The diff coverage is 94.59%.

Impacted file tree graph

@@            Coverage Diff             @@
##              dev     #159      +/-   ##
==========================================
+ Coverage   88.63%   89.50%   +0.87%     
==========================================
  Files          38       38              
  Lines        2226     2278      +52     
==========================================
+ Hits         1973     2039      +66     
+ Misses        253      239      -14     
Flag         Coverage Δ
#unittests   89.50% <94.59%> (+0.87%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files                        Coverage Δ
tianshou/data/utils/converter.py      87.23% <ø> (ø)
tianshou/policy/modelfree/ddpg.py     97.46% <75.00%> (-1.24%) ⬇️
tianshou/policy/modelfree/sac.py      85.41% <83.33%> (-0.61%) ⬇️
tianshou/policy/modelfree/td3.py      98.59% <83.33%> (-1.41%) ⬇️
tianshou/data/utils/segtree.py        94.82% <94.82%> (ø)
tianshou/data/__init__.py             100.00% <100.00%> (ø)
tianshou/data/buffer.py               96.62% <100.00%> (+4.03%) ⬆️
tianshou/policy/base.py               95.38% <100.00%> (+0.38%) ⬆️
tianshou/policy/modelfree/dqn.py      97.50% <100.00%> (-0.18%) ⬇️
tianshou/utils/__init__.py
... and 3 more

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 99a1d40...0b8a316. Read the comment docs.

@Trinkle23897 Trinkle23897 changed the title WIP: Improve PER Improve PER Aug 2, 2020
duburcqa previously approved these changes Aug 2, 2020
Trinkle23897 (Collaborator, Author) commented Aug 2, 2020

Discussion: is it necessary to support the segment tree with min/max operators? By definition, a segment tree supports any associative binary operator, i.e. op(a, op(b, c)) = op(op(a, b), c). But the sum tree we use is not an exact segment tree: we additionally require the elements in the tree to be non-negative.

This constraint applies only to get_prefix_sum_idx; for the normal usage of a segment tree (computing op over value[l:r]), no such constraint exists.
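
To illustrate that point, the same array-backed layout supports a range reduce under any associative operator, with no sign restriction on the elements. A hypothetical min-tree sketch (not part of this PR):

import numpy as np

class MinTree:
    """Same layout as the sum tree, but reduces with min instead of +."""

    def __init__(self, size):
        assert size > 0 and size & (size - 1) == 0
        self._size = size
        self._value = np.full(2 * size, np.inf)  # inf is the identity for min

    def __setitem__(self, idx, value):
        # note: no non-negativity requirement, unlike get_prefix_sum_idx
        idx += self._size
        self._value[idx] = value
        while idx > 1:
            idx //= 2
            self._value[idx] = min(self._value[2 * idx], self._value[2 * idx + 1])

    def reduce(self, left, right):
        """min(value[left:right]) in O(log n), the classic iterative query."""
        result = np.inf
        left += self._size
        right += self._size
        while left < right:
            if left & 1:  # left is a right child: take it and move past it
                result = min(result, self._value[left])
                left += 1
            if right & 1:  # right is exclusive: step back onto the node
                right -= 1
                result = min(result, self._value[right])
            left //= 2
            right //= 2
        return result

tree = MinTree(8)
tree[3] = -2.5  # negative values are fine for min/max reduction
assert tree.reduce(0, 8) == -2.5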

Comment on lines +204 to +220
# profile: compare batched sampling via np.random.choice vs. the tree
if __name__ == '__main__':
    import numpy as np
    from timeit import timeit

    size = 100000
    bsz = 64
    naive = np.random.rand(size)
    tree = SegmentTree(size)
    tree[np.arange(size)] = naive

    def sample_npbuf():
        # O(n) per batch: np.random.choice walks the full probability vector
        return np.random.choice(size, bsz, p=naive / naive.sum())

    def sample_tree():
        # O(bsz * log n): one tree descent per sampled prefix-sum scalar
        scalar = np.random.rand(bsz) * tree.reduce()
        return tree.get_prefix_sum_idx(scalar)

    print('npbuf', timeit(sample_npbuf, setup=sample_npbuf, number=1000))
    print('tree', timeit(sample_tree, setup=sample_tree, number=1000))
Collaborator

Would it be better to make this a separate function?

Comment on lines +217 to +219
# prio buffer update
if isinstance(buffer, PrioritizedReplayBuffer):
batch.update_weight = buffer.update_weight
Collaborator

Attaching the function buffer.update_weight as a field of the batch object is a hack and should be avoided. Since it has been here for a while and this PR is already very large, I will open a new PR to deal with it.

The solution may be something like the following: add two methods, BasePolicy.update and BasePolicy.post_process_fn. Writing the updated weights back into the buffer can then be done in BasePolicy.post_process_fn, and the trainer functions just have to call BasePolicy.update.

def update(self, buffer, batch_size):
    # sample a batch together with the buffer indices it came from
    batch, indices = buffer.sample(batch_size)
    # pre-processing, e.g. computing the n-step return
    batch = self.process_fn(batch, buffer, indices)
    self.learn(batch)
    # post-processing, e.g. writing new priorities back into a PER buffer
    self.post_process_fn(batch, buffer, indices)
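
Under this proposal, the PER weight update could live in the post-processing hook rather than on the batch. A hypothetical sketch (the batch.td field holding the freshly computed TD errors is an assumed name, not an existing attribute):

def post_process_fn(self, batch, buffer, indices):
    # hypothetical: only PER buffers need their priorities refreshed
    if isinstance(buffer, PrioritizedReplayBuffer):
        # assumes learn() stored the new TD errors on the batch as batch.td
        buffer.update_weight(indices, batch.td)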

youkaichao previously approved these changes Aug 4, 2020
youkaichao (Collaborator) left a comment

A nice PR to improve the efficiency of prioritized buffer!

@Trinkle23897 Trinkle23897 requested a review from duburcqa August 4, 2020 04:50
Trinkle23897 (Collaborator, Author)

@duburcqa it should be okay now, please have a check.

@Trinkle23897 Trinkle23897 merged commit 140b1c2 into thu-ml:dev Aug 6, 2020
@Trinkle23897 Trinkle23897 deleted the prio-buffer branch August 6, 2020 02:27
BFAnas pushed a commit to BFAnas/tianshou that referenced this pull request May 5, 2024
- Use a segment tree to rewrite the previous PrioReplayBuffer code and add tests

- Enable all Q-learning algorithms to use PER