-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Add discrete Critic Regularized Regression #367
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
nuance1979
commented
May 14, 2021
- Add discrete Critic Regularized Regression
- Reference: paper and code
- Note that CRR itself does not seem to work well on Atari tasks but adding CQL loss/regularizer helps.
Codecov Report
@@ Coverage Diff @@
## master #367 +/- ##
==========================================
- Coverage 94.55% 94.53% -0.02%
==========================================
Files 54 55 +1
Lines 3505 3567 +62
==========================================
+ Hits 3314 3372 +58
- Misses 191 195 +4
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
FYI, I tried a variant of CRR where I replaced the DQN critic with a QRDQN (num_quantiles=200). It was slightly better but not much: For Breakout, after 40 iterations, I got 161.4 for QRCRR vs 137.2 for CRR; for Pong, after 40 iterations, I got 18.5 for QRCRR vs 16.8 for CRR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you ever considered inheriting from QRDQN or DiscreteCQL instead of PG? Because the only reused part is forward
and it is really simple (just a few line, just discrete
, no other cases in PGPolicy) -- and maybe that would be more concise?
Yes, I have. But after some thought, I chose to inherit from PGPolicy to emphasize the fact that CRR is a policy gradient model, i.e., the critic, be it DQN, QRDQN or else, is only a by-product of the training process and does NOT contribute to the evaluation of the policy. I also thought about multiple inheritance, i.e., let CRR inherits from both PGPolicy and DQN or QRDQN or DiscreteCQL but I realized that it would not make the implementation simpler. For example, because none of them has a The only place I can think of improving is to refactor the Anyway that's just my opinion. I can certainly change it if you insist. |