
Add Fully-parameterized Quantile Function #376


Merged
15 commits merged into thu-ml:master on Jun 15, 2021

Conversation

nuance1979
Collaborator

  • Add Fully-parameterized Quantile Function
  • Reference: paper and code
  • Notes:
    • For some reason, in my experiments I ran into mode collapse (the entropy of the taus goes to zero, the fraction loss skyrockets, the quantile loss goes to zero, and test rewards drop to almost zero) much more often than both the paper and the reference implementation suggest (they mention odds of roughly 1 in 20 seeds). I therefore set the default value of the entropy regularization coefficient to 10 (a sketch of this term follows this list).
    • I tried hard to track down the cause by making my implementation as close to the reference as possible, sometimes at the expense of unnecessarily complicated code and wasteful computation. I still hit mode collapse quite often, and since I saw no gain (e.g., in training stability), I removed those modifications to stay consistent with tianshou conventions.
    • The training time is much longer than IQN, as expected.
    • I will rerun all experiments and update the results once code review and revisions are done.
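For context, here is a minimal sketch of the FQF fraction proposal step, showing where the entropy mentioned above comes from and how an entropy regularization coefficient would scale it. This is illustrative only, not the code added in this PR; the function name propose_fractions and the argument ent_coef are made up for the example.

```python
import torch
import torch.nn.functional as F

def propose_fractions(logits: torch.Tensor, ent_coef: float = 10.0):
    """Illustrative sketch of fraction proposal in FQF (not this PR's actual code)."""
    probs = F.softmax(logits, dim=-1)                  # fraction sizes, sum to 1
    taus = F.pad(torch.cumsum(probs, dim=-1), (1, 0))  # tau_0 = 0, ..., tau_N = 1
    tau_hats = (taus[..., :-1] + taus[..., 1:]) / 2.0  # midpoints fed to the quantile net
    entropy = torch.distributions.Categorical(probs=probs).entropy()
    # Mode collapse shows up as entropy -> 0; the fraction loss (computed from
    # dW_1/dtau elsewhere) is regularized as: fraction_loss - ent_coef * entropy.mean()
    return taus, tau_hats, entropy

# usage on dummy logits from a fraction proposal net
logits = torch.randn(8, 32)                # batch of 8, 32 proposed fractions
taus, tau_hats, entropy = propose_fractions(logits)
```

In the mode-collapse failure, the softmax concentrates on a few fractions, so this entropy drops toward zero; a larger ent_coef pushes back against that.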

@codecov-commenter

codecov-commenter commented Jun 4, 2021

Codecov Report

Merging #376 (cd90a99) into master (21b2b22) will increase coverage by 0.08%.
The diff coverage is 97.22%.


@@            Coverage Diff             @@
##           master     #376      +/-   ##
==========================================
+ Coverage   94.37%   94.45%   +0.08%     
==========================================
  Files          56       57       +1     
  Lines        3644     3751     +107     
==========================================
+ Hits         3439     3543     +104     
- Misses        205      208       +3     
Flag        Coverage Δ
unittests   94.45% <97.22%> (+0.08%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files                       Coverage Δ
tianshou/policy/modelfree/fqf.py     95.71% <95.71%> (ø)
tianshou/policy/__init__.py          100.00% <100.00%> (ø)
tianshou/utils/net/discrete.py       100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 21b2b22...cd90a99.

@Trinkle23897 Trinkle23897 linked an issue Jun 5, 2021 that may be closed by this pull request
@Trinkle23897
Collaborator

Trinkle23897 commented Jun 6, 2021

  • The training time is much longer than IQN, as expected.

Why? Have you aligned all the details with that repo? Their results show that FQF/IQN/QR-DQN all have similar convergence times.

@nuance1979
Collaborator Author

  • The training time is much longer than IQN, as expected.

Why? Have you aligned all the details with that repo? Their results show that FQF/IQN/QR-DQN all have similar convergence times.

I meant it was slower in terms of wall clock time, not in terms of training steps, which I believe is what you meant by "convergence time". The slowdown in wall clock time is simply due to the extra computation of the fraction proposal net and its gradient. I'll post some numbers and pictures once I get them.
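To make the wall-clock point concrete, here is a toy sketch, using placeholder modules and losses rather than the actual tianshou code, of the extra work an FQF update does compared with an IQN update: a separate fraction loss is backpropagated through the fraction proposal net and a second optimizer is stepped on top of the usual quantile-regression update.

```python
import torch
import torch.nn as nn

# Stand-in modules; the real networks are much larger.
feature_net = nn.Linear(4, 64)
quantile_head = nn.Linear(64, 32)   # plays the role of the IQN-style quantile network
fraction_net = nn.Linear(64, 32)    # the extra FQF-only fraction proposal net

main_optim = torch.optim.Adam(
    list(feature_net.parameters()) + list(quantile_head.parameters()), lr=1e-4)
fraction_optim = torch.optim.RMSprop(fraction_net.parameters(), lr=1e-5)

obs = torch.randn(8, 4)
feat = feature_net(obs)

# Placeholder losses; only the control flow matters here.
quantile_loss = quantile_head(feat).pow(2).mean()
fraction_loss = fraction_net(feat.detach()).pow(2).mean()

main_optim.zero_grad()
quantile_loss.backward()            # the update IQN also performs
main_optim.step()

fraction_optim.zero_grad()
fraction_loss.backward()            # the additional backward pass that costs wall-clock time
fraction_optim.step()
```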

@nuance1979
Collaborator Author

For example, a single run of Qbert: ~7h for IQN vs ~11h for FQF. Best reward: 15837.50 for IQN vs 15650 for FQF. (FQF is not a clear winner in Qbert; in some other games, it is.)

IQN reward curve: (image: Qbert_rew)

FQF reward curve: (image: Qbert_rew_fqf_new)

@Trinkle23897
Collaborator

I meant it was slower in terms of wall clock time

Oh I see. So is it possible to reduce the model forward time by reusing the previous result? The profiling results show that a lot of time is spent in the network part.
I'll try to understand the complex operations in the network and see if there's any way we can optimize them.

@nuance1979
Collaborator Author

I meant it was slower in terms of wall clock time

Oh I see. So is it possible to reduce the model forward time by reusing the previous result? The profiling results show that a lot of time is spent in the network part.
I'll try to understand the complex operations in the network and see if there's any way we can optimize them.

Sure. I know there is a place in _target_q() where the same computation (one forward pass of the fraction proposal net) is performed twice. I can start by optimizing it.

@nuance1979
Collaborator Author

I meant it was slower in terms of wall clock time

Oh I see. So is it possible to reduce the model forward time by reusing the previous result? The profiling results show that a lot of time is spent in the network part.
I'll try to understand the complex operations in the network and see if there's any way we can optimize them.

Sure. I know there is a place in _target_q() where the same computation (one forward pass of the fraction proposal net) is performed twice. I can start by optimizing it.

Wait, I take it back. It wasn't exactly the same computation, because the feature net of the target model is different from that of the online model. However, since the two usually do not differ much, the reference implementation does take this shortcut to save some computation. I'm running an experiment to see how much wall clock time it saves.
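For reference, a hedged sketch of the shortcut being discussed; the names (online_feature_net, fraction_net, target_model) are hypothetical, not the actual tianshou API. The idea is to reuse the fractions proposed from the online model's features when evaluating the target network, rather than running the fraction proposal net a second time on the target model's own features.

```python
import torch

def target_q_with_reused_fractions(online_feature_net, fraction_net, target_model, obs_next):
    """Hypothetical sketch of the _target_q() shortcut; not the merged implementation."""
    with torch.no_grad():
        feat = online_feature_net(obs_next)      # features from the online model
        taus, tau_hats, _ = fraction_net(feat)   # fractions proposed once, from online features
        # Evaluate the *target* model at the reused tau_hats instead of
        # re-proposing fractions from the target model's own feature net.
        quantiles = target_model(obs_next, tau_hats)            # (batch, n_fractions, n_actions)
        weights = (taus[..., 1:] - taus[..., :-1]).unsqueeze(-1)
        return (weights * quantiles).sum(dim=1)  # expected Q, weighted by fraction sizes
```

As noted above, the online and target feature nets are not identical, so the reused fractions are only an approximation of what the target net would propose; the experiment below checks whether that affects the final reward.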

@nuance1979
Collaborator Author

Using the shortcut mentioned above, a single run of Qbert now takes ~9.5h. Best reward: 16172.5. So it is faster with no loss of performance; the code does look a bit ugly, though.

(attached screenshot: Screen Shot 2021-06-07 at 8:16:13 PM)

Collaborator

@Trinkle23897 Trinkle23897 left a comment


No other comments, great job!

@Trinkle23897 Trinkle23897 merged commit c0bc8e0 into thu-ml:master Jun 15, 2021
@nuance1979 nuance1979 deleted the fqf_new branch October 6, 2021 17:27
@emrul

emrul commented Oct 2, 2023

Just came here to say thank you for this contribution, but also thank you for explaining why the default entropy regularization coefficient was set to 10.0; I was specifically wondering about this and found the explanation helpful!

BFAnas pushed a commit to BFAnas/tianshou that referenced this pull request May 5, 2024
Development

Successfully merging this pull request may close these issues.

Implicit Quantile Network (IQN) and Fully parameterized Quantile Function (FQF)
4 participants