Add Rainbow DQN #386
Conversation
Codecov Report
```diff
@@            Coverage Diff             @@
##           master     #386      +/-   ##
==========================================
+ Coverage   94.69%   94.82%   +0.13%
==========================================
  Files          57       58       +1
  Lines        3749     3807      +58
==========================================
+ Hits         3550     3610      +60
+ Misses        199      197       -2
```
Also, could you please check #393?
I thought for backward compatibility it might be better to put the weight norm hack on the policy side. Basically, it's up to the policy to decide whether it wants to use the weight norm hack or not. I don't have a strong opinion about it, so if you insist, I can move it back to the buffer side.
So how about adding an extra argument to the prioritized buffer and setting it to False by default?
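For illustration, here is a minimal sketch of what such an opt-in argument on the buffer side could look like. The flag name `weight_norm`, its default, and the buffer internals are assumptions for the sake of the example, not the actual tianshou API:

```python
import numpy as np


class PrioritizedReplayBufferSketch:
    """Sketch: importance-sampling weights with an opt-in max-weight normalization."""

    def __init__(self, size: int, alpha: float = 0.6, beta: float = 0.4,
                 weight_norm: bool = False) -> None:
        self._size = size
        self._alpha = alpha
        self._beta = beta
        self._weight_norm = weight_norm  # off by default for backward compatibility
        self._priorities = np.ones(size, dtype=np.float64)

    def get_weight(self, index: np.ndarray) -> np.ndarray:
        # w_i = (N * P(i)) ** (-beta), as in the prioritized experience replay paper
        probs = self._priorities ** self._alpha
        probs /= probs.sum()
        weights = (self._size * probs[index]) ** (-self._beta)
        if self._weight_norm:
            # normalize by the largest weight in the sampled batch so that weights <= 1
            weights /= weights.max()
        return weights
```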
@nuance1979 is it ready now...?
Sorry for the delay. After fixing the weight normalization and beta annealing, some tasks (Enduro, SpaceInvaders, etc.) still get terrible results, so I tried hard to figure out why. Let me summarize my current findings: I compared the current NoisyLinear implementation with https://github.com/deepmind/dqn_zoo/ and found two differences:
I tried to align with the dqn_zoo implementation on the above two points but found no meaningful difference in Enduro performance. So I tried turning off some features of Rainbow:
Therefore, for Enduro, the NoisyLinear layer hurts performance. I suspect the same is true for other low-performing tasks. I feel that I have exhausted my ideas for exploring further. I could add "--no-noisy" as a task-specific parameter for Enduro (and potentially more tasks with similar behavior), or I could keep the low-performing numbers for the sake of consistency. What do you think? @Trinkle23897
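For reference, here is a minimal sketch of a NoisyLinear layer with factorized Gaussian noise in the style of the NoisyNet paper; it illustrates the mechanism under discussion and is not this PR's or dqn_zoo's exact implementation (names such as `sigma0` and `reset_noise` are assumptions):

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class NoisyLinear(nn.Module):
    """Factorized-noise linear layer in the style of the NoisyNet paper."""

    def __init__(self, in_features: int, out_features: int, sigma0: float = 0.5) -> None:
        super().__init__()
        self.in_features, self.out_features = in_features, out_features
        self.mu_w = nn.Parameter(torch.empty(out_features, in_features))
        self.sigma_w = nn.Parameter(torch.empty(out_features, in_features))
        self.mu_b = nn.Parameter(torch.empty(out_features))
        self.sigma_b = nn.Parameter(torch.empty(out_features))
        self.register_buffer("eps_in", torch.zeros(in_features))
        self.register_buffer("eps_out", torch.zeros(out_features))
        bound = 1.0 / math.sqrt(in_features)
        nn.init.uniform_(self.mu_w, -bound, bound)
        nn.init.uniform_(self.mu_b, -bound, bound)
        nn.init.constant_(self.sigma_w, sigma0 * bound)
        nn.init.constant_(self.sigma_b, sigma0 * bound)
        self.reset_noise()

    @staticmethod
    def _scaled_noise(size: int) -> torch.Tensor:
        # f(x) = sign(x) * sqrt(|x|) applied to standard Gaussian samples
        x = torch.randn(size)
        return x.sign() * x.abs().sqrt()

    def reset_noise(self) -> None:
        # resample the factorized noise; typically done once per training step
        self.eps_in.copy_(self._scaled_noise(self.in_features))
        self.eps_out.copy_(self._scaled_noise(self.out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            weight = self.mu_w + self.sigma_w * torch.outer(self.eps_out, self.eps_in)
            bias = self.mu_b + self.sigma_b * self.eps_out
        else:
            # evaluation uses the noise-free mean parameters
            weight, bias = self.mu_w, self.mu_b
        return F.linear(x, weight, bias)
```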
That's fine for --no-noisy. But have you ever tried --training-num=4? I suspect it suffers from policy lag.
I tried it before and it didn't make a difference. But things have changed a lot since then. I'll kick off an experiment to confirm. |
I got the result:
Better than the default ("--training-num=10") but still far worse than "--no-noisy".
Okay, that's fine. Feel free to add it.
I might have found a bug in my code which could cause NoisyLinear to not work correctly. I'm running experiments to confirm.
I have another, unrelated question: usually the Atari network's feature part ends with a Linear(3136, 512) layer instead of nn.Flatten. I see that only https://github.com/ku2482/fqf-iqn-qrdqn.pytorch uses the latter setting; is that correct?
I think that's correct. Otherwise it would not match the input shape of the following linear layer. In the Kaixhin/Rainbow repo there is no nn.Flatten, but I found the following line, which is just another way of flattening: https://github.com/Kaixhin/Rainbow/blob/9ff5567ad1234ae0ed30d8471e8f13ae07119395/model.py#L71
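To make the equivalence concrete, here is a small self-contained check, assuming the usual Nature-CNN output of 64 channels of 7x7 for 84x84 inputs:

```python
import torch
import torch.nn as nn

# Dummy conv output for a batch of 8 observations: 64 channels of 7x7 = 3136 features.
x = torch.randn(8, 64, 7, 7)
a = nn.Flatten()(x)            # what fqf-iqn-qrdqn.pytorch does
b = x.view(x.size(0), -1)      # what Kaixhin/Rainbow's model.py does
assert a.shape == b.shape == (8, 3136)
assert torch.equal(a, b)       # identical values, so only the style differs
```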
Well, what I mean is that in the on-policy Atari setting they use the following:
```
ActorCriticCnnPolicy(
  (features_extractor): NatureCNN(
    (cnn): Sequential(
      (0): Conv2d(4, 32, kernel_size=(8, 8), stride=(4, 4))
      (1): ReLU()
      (2): Conv2d(32, 64, kernel_size=(4, 4), stride=(2, 2))
      (3): ReLU()
      (4): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
      (5): ReLU()
      (6): Flatten(start_dim=1, end_dim=-1)
    )
    (linear): Sequential(
      (0): Linear(in_features=3136, out_features=512, bias=True)
      (1): ReLU()
    )
  )
  (action_net): Linear(in_features=512, out_features=6, bias=True)
  (value_net): Linear(in_features=512, out_features=1, bias=True)
)
```
Could you please double-check the authors' implementations in the offline-RL Atari settings you previously implemented, together with FQF and IQN?
I see. The above structure essentially shares one more linear layer of (3136, 512) between the action and value nets. I checked a few repos:
So for Rainbow's action and value nets, we are with the majority. For IQN, the equivalent question is whether the CosNet has input size 3136 or 512 (i.e., after one linear layer of (3136, 512)). All the repos above have input size 3136, following the original paper. Same for FQF. For offline-RL methods, CQL doesn't have this problem since no two heads share layers; CRR does have this issue since there are actor and critic nets. However, I can't find the reference implementation from the authors.
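To make the structural question concrete, here is a small sketch of the two head layouts under discussion, using the layer sizes from the printout above (the 6-dimensional action output is only an example):

```python
import torch.nn as nn

# (a) Shared extra Linear(3136, 512): both heads operate on the same 512-dim features,
#     as in the ActorCriticCnnPolicy printout above.
shared = nn.Sequential(nn.Linear(3136, 512), nn.ReLU())
action_head = nn.Linear(512, 6)
value_head = nn.Linear(512, 1)

# (b) No shared layer after the CNN: each head builds its own MLP on the raw
#     3136-dim flattened features.
action_net = nn.Sequential(nn.Linear(3136, 512), nn.ReLU(), nn.Linear(512, 6))
value_net = nn.Sequential(nn.Linear(3136, 512), nn.ReLU(), nn.Linear(512, 1))
```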
I have fixed a bug caused by a misunderstanding of the role NoisyLinear plays in the Rainbow model. Basically, I had disabled explore_noise whenever NoisyLinear layers were used, so the bad results with the NoisyLinear layer were mainly due to the absence of exploration. After the fix, the results with NoisyLinear are comparable to, but not always better than, the results without it. Since we only have a single run, I wouldn't draw any conclusion from it. Now I will not use the "--no-noisy" option for the reported results in README.md. There is one exception: Seaquest. When I accidentally disabled explore_noise, I got much higher results (~16000 vs ~2300). I couldn't figure out the reason. Anyway, I think this PR is ready to merge.
Cool, I'll take a look this weekend |
@nuance1979 is that the cause of the bad performance? Could you please check it when you're free? Many thanks!
- add RainbowPolicy
- add `set_beta` method in prio_buffer
- add NoisyLinear in utils/network
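As an illustration of how the new `set_beta` method could be used for beta annealing, here is a minimal sketch; the `train_fn(epoch, env_step)` hook follows tianshou's trainer convention, while the stub buffer class and the schedule values are assumptions for the example:

```python
class PrioritizedBufferStub:
    """Stand-in for the real prioritized buffer; only set_beta matters here."""

    def __init__(self, beta: float = 0.4) -> None:
        self.beta = beta

    def set_beta(self, beta: float) -> None:
        self.beta = beta


buffer = PrioritizedBufferStub()
beta_start, beta_final, anneal_steps = 0.4, 1.0, 1_000_000


def train_fn(epoch: int, env_step: int) -> None:
    # Linearly anneal beta from beta_start to beta_final over anneal_steps env steps.
    frac = min(env_step / anneal_steps, 1.0)
    buffer.set_beta(beta_start + frac * (beta_final - beta_start))
```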
I am currently running Atari examples. Will update the results soon.