Add Rainbow DQN #386
Conversation
Codecov Report
```diff
@@            Coverage Diff             @@
##           master     #386      +/-   ##
==========================================
+ Coverage   94.69%   94.82%   +0.13%
==========================================
  Files          57       58       +1
  Lines        3749     3807      +58
==========================================
+ Hits         3550     3610      +60
+ Misses        199      197       -2
```
Also, could you please check #393?
I thought for backward compatibility it might be better to put the weight norm hack on the policy side. Basically, it's up to the policy to decide whether it wants to use the weight norm hack or not. I don't have a strong opinion about it, so if you insist, I can move it back to the buffer side.
So how about adding an extra argument to the prioritized buffer and setting it to False by default?
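For illustration, here is a minimal sketch of what such an opt-in argument on the buffer side could look like. The flag name `weight_norm`, its default, and the buffer internals are assumptions for the sake of the example, not the actual tianshou API:

```python
import numpy as np


class PrioritizedReplayBufferSketch:
    """Sketch: importance-sampling weights with an opt-in max-weight normalization."""

    def __init__(self, size: int, alpha: float = 0.6, beta: float = 0.4,
                 weight_norm: bool = False) -> None:
        self._size = size
        self._alpha = alpha
        self._beta = beta
        self._weight_norm = weight_norm  # off by default for backward compatibility
        self._priorities = np.ones(size, dtype=np.float64)

    def get_weight(self, index: np.ndarray) -> np.ndarray:
        # w_i = (N * P(i)) ** (-beta), as in the prioritized experience replay paper
        probs = self._priorities ** self._alpha
        probs /= probs.sum()
        weights = (self._size * probs[index]) ** (-self._beta)
        if self._weight_norm:
            # normalize by the largest weight in the sampled batch so that weights <= 1
            weights /= weights.max()
        return weights
```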
@nuance1979 is it ready now...?
Sorry for the delay. After fixing the weight normalization and beta annealing, some tasks (Enduro, SpaceInvaders, etc.) still get terrible results, so I tried hard to figure out why. Let me summarize my current findings: I compared the current NoisyLinear implementation with https://github.com/deepmind/dqn_zoo/ and found two differences:
I tried to align with the dqn_zoo implementation on the above two points but found no meaningful difference in Enduro performance. So I tried turning off some features of Rainbow:
Therefore, for Enduro, the NoisyLinear layer hurts performance. I suspect the same is true for other low-performing tasks. I feel that I have exhausted my ideas for exploring further. I could add "--no-noisy" as a task-specific parameter for Enduro (and potentially more tasks with similar behavior), or I could keep the low-performing numbers for the sake of consistency. What do you think? @Trinkle23897
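For reference, here is a minimal sketch of a NoisyLinear layer with factorized Gaussian noise in the style of the NoisyNet paper; it illustrates the mechanism under discussion and is not this PR's or dqn_zoo's exact implementation (names such as `sigma0` and `reset_noise` are assumptions):

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class NoisyLinear(nn.Module):
    """Factorized-noise linear layer in the style of the NoisyNet paper."""

    def __init__(self, in_features: int, out_features: int, sigma0: float = 0.5) -> None:
        super().__init__()
        self.in_features, self.out_features = in_features, out_features
        self.mu_w = nn.Parameter(torch.empty(out_features, in_features))
        self.sigma_w = nn.Parameter(torch.empty(out_features, in_features))
        self.mu_b = nn.Parameter(torch.empty(out_features))
        self.sigma_b = nn.Parameter(torch.empty(out_features))
        self.register_buffer("eps_in", torch.zeros(in_features))
        self.register_buffer("eps_out", torch.zeros(out_features))
        bound = 1.0 / math.sqrt(in_features)
        nn.init.uniform_(self.mu_w, -bound, bound)
        nn.init.uniform_(self.mu_b, -bound, bound)
        nn.init.constant_(self.sigma_w, sigma0 * bound)
        nn.init.constant_(self.sigma_b, sigma0 * bound)
        self.reset_noise()

    @staticmethod
    def _scaled_noise(size: int) -> torch.Tensor:
        # f(x) = sign(x) * sqrt(|x|) applied to standard Gaussian samples
        x = torch.randn(size)
        return x.sign() * x.abs().sqrt()

    def reset_noise(self) -> None:
        # resample the factorized noise; typically done once per training step
        self.eps_in.copy_(self._scaled_noise(self.in_features))
        self.eps_out.copy_(self._scaled_noise(self.out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            weight = self.mu_w + self.sigma_w * torch.outer(self.eps_out, self.eps_in)
            bias = self.mu_b + self.sigma_b * self.eps_out
        else:
            # evaluation uses the noise-free mean parameters
            weight, bias = self.mu_w, self.mu_b
        return F.linear(x, weight, bias)
```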
That's fine for --no-noisy. But have you ever tried --training-num=4? I suspect it suffers from policy lag.
I tried it before and it didn't make a difference. But things have changed a lot since then. I'll kick off an experiment to confirm. |
I got the result:
Better than the default ("--training-num=10") but still far worse than "--no-noisy".
Okay, that's fine. Feel free to add it.
I might have found a bug in my code which could cause NoisyLinear to not work correctly. I'm running experiments to confirm.
I have another, unrelated question: usually the Atari network's feature part ends with a Linear(3136, 512) layer instead of nn.Flatten. I see that only https://github.com/ku2482/fqf-iqn-qrdqn.pytorch uses the latter setting; is that correct?
I think that's correct. Otherwise it would not match the input shape of the following linear layer. In the Kaixhin/Rainbow repo there is no nn.Flatten, but I found the following line, which is just another way of flattening: https://github.com/Kaixhin/Rainbow/blob/9ff5567ad1234ae0ed30d8471e8f13ae07119395/model.py#L71
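To make the equivalence concrete, here is a small self-contained check, assuming the usual Nature-CNN output of 64 channels of 7x7 for 84x84 inputs:

```python
import torch
import torch.nn as nn

# Dummy conv output for a batch of 8 observations: 64 channels of 7x7 = 3136 features.
x = torch.randn(8, 64, 7, 7)
a = nn.Flatten()(x)            # what fqf-iqn-qrdqn.pytorch does
b = x.view(x.size(0), -1)      # what Kaixhin/Rainbow's model.py does
assert a.shape == b.shape == (8, 3136)
assert torch.equal(a, b)       # identical values, so only the style differs
```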
Well, what I mean is that in the on-policy Atari setting they use the following:
```
ActorCriticCnnPolicy(
  (features_extractor): NatureCNN(
    (cnn): Sequential(
      (0): Conv2d(4, 32, kernel_size=(8, 8), stride=(4, 4))
      (1): ReLU()
      (2): Conv2d(32, 64, kernel_size=(4, 4), stride=(2, 2))
      (3): ReLU()
      (4): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
      (5): ReLU()
      (6): Flatten(start_dim=1, end_dim=-1)
    )
    (linear): Sequential(
      (0): Linear(in_features=3136, out_features=512, bias=True)
      (1): ReLU()
    )
  )
  (action_net): Linear(in_features=512, out_features=6, bias=True)
  (value_net): Linear(in_features=512, out_features=1, bias=True)
)
```
Could you please double-check the authors' implementations in the offline-RL Atari settings you previously implemented, together with FQF and IQN?
I see. The above structure essentially shares one more linear layer of (3136, 512) between the action and value nets. I checked a few repos:
So for Rainbow's action and value nets, we are with the majority. For IQN, the equivalent question is whether the CosNet has input size 3136 or 512 (i.e., after one linear layer of (3136, 512)). All the repos above have input size 3136, following the original paper. Same for FQF. For offline-RL methods, CQL doesn't have this problem since no two heads share layers; CRR does have this issue since there are actor and critic nets. However, I can't find the reference implementation from the authors.
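To make the structural question concrete, here is a small sketch of the two head layouts under discussion, using the layer sizes from the printout above (the 6-dimensional action output is only an example):

```python
import torch.nn as nn

# (a) Shared extra Linear(3136, 512): both heads operate on the same 512-dim features,
#     as in the ActorCriticCnnPolicy printout above.
shared = nn.Sequential(nn.Linear(3136, 512), nn.ReLU())
action_head = nn.Linear(512, 6)
value_head = nn.Linear(512, 1)

# (b) No shared layer after the CNN: each head builds its own MLP on the raw
#     3136-dim flattened features.
action_net = nn.Sequential(nn.Linear(3136, 512), nn.ReLU(), nn.Linear(512, 6))
value_net = nn.Sequential(nn.Linear(3136, 512), nn.ReLU(), nn.Linear(512, 1))
```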
I have fixed a bug caused by a misunderstanding of the role NoisyLinear plays in the Rainbow model. Basically, I had disabled explore_noise whenever NoisyLinear layers were used, so the bad results with the NoisyLinear layer were mainly due to the absence of exploration. After the fix, the results with NoisyLinear are comparable to, but not always better than, the results without it. Since we only have a single run, I wouldn't draw any conclusion from it. Now I will not use the "--no-noisy" option for the reported results in README.md. There is one exception: Seaquest. When I accidentally disabled explore_noise, I got much higher results (~16000 vs ~2300). I couldn't figure out the reason. Anyway, I think this PR is ready to merge.
Cool, I'll take a look this weekend |
@nuance1979 is that the cause of the bad performance? Could you please check it when you're free? Many thanks!
- add RainbowPolicy
- add `set_beta` method in prio_buffer
- add NoisyLinear in utils/network
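As an illustration of how the new `set_beta` method could be used for beta annealing, here is a minimal sketch; the `train_fn(epoch, env_step)` hook follows tianshou's trainer convention, while the stub buffer class and the schedule values are assumptions for the example:

```python
class PrioritizedBufferStub:
    """Stand-in for the real prioritized buffer; only set_beta matters here."""

    def __init__(self, beta: float = 0.4) -> None:
        self.beta = beta

    def set_beta(self, beta: float) -> None:
        self.beta = beta


buffer = PrioritizedBufferStub()
beta_start, beta_final, anneal_steps = 0.4, 1.0, 1_000_000


def train_fn(epoch: int, env_step: int) -> None:
    # Linearly anneal beta from beta_start to beta_final over anneal_steps env steps.
    frac = min(env_step / anneal_steps, 1.0)
    buffer.set_beta(beta_start + frac * (beta_final - beta_start))
```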
I am currently running Atari examples. Will update the results soon.