Add Fully-parameterized Quantile Function #376
Conversation
nuance1979
commented
Jun 4, 2021
- Add Fully-parameterized Quantile Function
- Reference: the original FQF paper and its reference implementation
- Note:
- For some reason, in my experiments, I encountered mode collapse (the entropy of the taus goes to zero, the fraction loss skyrockets, the quantile loss goes to zero, and test rewards drop to almost zero) much more often than either the paper or the reference implementation suggested (they mentioned odds of roughly 1 in 20 seeds). I therefore set the default value of the entropy regularization coefficient to 10 (see the sketch after this list for where this coefficient enters).
- I tried hard to figure out the cause by making my implementation as close to the reference as possible, sometimes at the expense of unnecessarily complicated code and wasteful computation. However, I still experienced mode collapse quite often. Since I didn't see any gain (e.g., in training stability), I removed those modifications to stay consistent with tianshou conventions.
- The training time is much longer than IQN's, as expected.
- I will rerun all experiments and update results after code review and revisions are done.
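For context, here is a minimal, self-contained sketch (not the code in this PR) of how a fraction proposal distribution, its entropy, and an entropy bonus weighted by a coefficient like the default of 10 mentioned above could fit together. Shapes, names, and the placeholder fraction-loss term are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def propose_fractions(logits: torch.Tensor):
    """Turn raw logits (batch, N) into monotone fractions 0 = tau_0 < ... < tau_N = 1."""
    probs = F.softmax(logits, dim=-1)                    # (batch, N) proposal distribution
    taus = F.pad(torch.cumsum(probs, dim=-1), (1, 0))    # prepend tau_0 = 0 -> (batch, N + 1)
    tau_hats = (taus[:, :-1] + taus[:, 1:]) / 2          # midpoints fed to the quantile head
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(-1)  # entropy of the proposal distribution
    return taus, tau_hats, entropy

logits = torch.randn(4, 32, requires_grad=True)
taus, tau_hats, entropy = propose_fractions(logits)

# Mode collapse shows up as this entropy going to zero (all probability mass
# concentrates on a few fractions). The remedy described above is a large
# entropy bonus subtracted from the fraction loss.
ent_coef = 10.0
fraction_loss = torch.zeros(())  # placeholder for the W1-gradient-based fraction loss
total_fraction_loss = fraction_loss - ent_coef * entropy.mean()
```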
Codecov Report
@@ Coverage Diff @@
## master #376 +/- ##
==========================================
+ Coverage 94.37% 94.45% +0.08%
==========================================
Files 56 57 +1
Lines 3644 3751 +107
==========================================
+ Hits 3439 3543 +104
- Misses 205 208 +3
Why? Have you aligned your implementation with that repo in all details? Their results show FQF/IQN/QR-DQN all have similar convergence times.
I meant it was slower in terms of wall-clock time, not in terms of training steps, which I believe is what you meant by "convergence time". The slowdown in wall-clock time is simply due to the extra computation of the fraction proposal net and its gradient. I'll post some numbers and pictures once I get them.
Oh I see, so is it possible to reduce the model forward time by reusing the previous result? The profiling result shows that a lot of the time is spent in the network part.
Sure. I know there is a place in
Wait, I take that back. It wasn't exactly the same computation, because the feature net of the target model is different from that of the online model. However, since they usually do not differ much, the reference implementation does take this shortcut to save some computation. I'm running an experiment to see how much wall-clock time it can save.
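To make the shortcut concrete, here is a rough sketch under the assumption of hypothetical `feature_net`, `fraction_net`, and `quantile_net` callables (not tianshou's actual module names or this PR's code): the online feature embedding is computed once, and the fractions proposed from it are reused on the target side instead of re-running a fraction proposal through the target model's slightly different feature net.

```python
import torch

def fqf_forward(obs, feature_net, fraction_net, quantile_net,
                target_feature_net, target_quantile_net):
    """Compute online and target quantiles with a single fraction proposal."""
    # The expensive online feature-net pass is done once and its output reused
    # for both the fraction proposal and the online quantile head.
    embedding = feature_net(obs)                  # (batch, feature_dim)
    taus, tau_hats = fraction_net(embedding)      # fractions from the cached embedding
    online_quantiles = quantile_net(embedding, tau_hats)

    with torch.no_grad():
        # Shortcut: the target side reuses the fractions proposed from the
        # *online* embedding rather than proposing its own from the target
        # feature net, which usually differs only slightly.
        target_embedding = target_feature_net(obs)
        target_quantiles = target_quantile_net(target_embedding, tau_hats)

    return taus, tau_hats, online_quantiles, target_quantiles
```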
No other comments, great job!
Just came here to say thank you for this contribution, and also for explaining why the default entropy regularization coefficient was set to 10.0; I was specifically wondering about this and found the explanation helpful!