
Add Fully-parameterized Quantile Function #376


Merged
15 commits merged into thu-ml:master on Jun 15, 2021

Conversation

nuance1979
Collaborator

  • Add Fully-parameterized Quantile Function
  • Reference: paper and code
  • Notes:
    • For some reason, in my experiments I ran into mode collapse (the entropy of the taus goes to zero, the fraction loss skyrockets, the quantile loss goes to zero, and test rewards drop to almost zero) much more often than both the paper and the reference implementation suggest (they mention odds of roughly 1 in 20 seeds). I therefore set the default value of the entropy regularization coefficient to 10 (a sketch of this term follows this list).
    • I tried hard to track down the cause by making my implementation as close to the reference as possible, sometimes at the expense of unnecessarily complicated code and wasteful computation. I still hit mode collapse quite often, and since I saw no gain (e.g., in training stability), I removed those modifications to stay consistent with tianshou conventions.
    • The training time is much longer than IQN, as expected.
    • I will rerun all experiments and update the results once code review and revisions are done.
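For context, here is a minimal sketch of the FQF fraction proposal step, showing where the entropy mentioned above comes from and how an entropy regularization coefficient would scale it. This is illustrative only, not the code added in this PR; the function name propose_fractions and the argument ent_coef are made up for the example.

```python
import torch
import torch.nn.functional as F

def propose_fractions(logits: torch.Tensor, ent_coef: float = 10.0):
    """Illustrative sketch of fraction proposal in FQF (not this PR's actual code)."""
    probs = F.softmax(logits, dim=-1)                  # fraction sizes, sum to 1
    taus = F.pad(torch.cumsum(probs, dim=-1), (1, 0))  # tau_0 = 0, ..., tau_N = 1
    tau_hats = (taus[..., :-1] + taus[..., 1:]) / 2.0  # midpoints fed to the quantile net
    entropy = torch.distributions.Categorical(probs=probs).entropy()
    # Mode collapse shows up as entropy -> 0; the fraction loss (computed from
    # dW_1/dtau elsewhere) is regularized as: fraction_loss - ent_coef * entropy.mean()
    return taus, tau_hats, entropy

# usage on dummy logits from a fraction proposal net
logits = torch.randn(8, 32)                # batch of 8, 32 proposed fractions
taus, tau_hats, entropy = propose_fractions(logits)
```

In the mode-collapse failure, the softmax concentrates on a few fractions, so this entropy drops toward zero; a larger ent_coef pushes back against that.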

@codecov-commenter

codecov-commenter commented Jun 4, 2021

Codecov Report

Merging #376 (cd90a99) into master (21b2b22) will increase coverage by 0.08%.
The diff coverage is 97.22%.


@@            Coverage Diff             @@
##           master     #376      +/-   ##
==========================================
+ Coverage   94.37%   94.45%   +0.08%     
==========================================
  Files          56       57       +1     
  Lines        3644     3751     +107     
==========================================
+ Hits         3439     3543     +104     
- Misses        205      208       +3     
Flag        Coverage Δ
unittests   94.45% <97.22%> (+0.08%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files                       Coverage Δ
tianshou/policy/modelfree/fqf.py     95.71% <95.71%> (ø)
tianshou/policy/__init__.py          100.00% <100.00%> (ø)
tianshou/utils/net/discrete.py       100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 21b2b22...cd90a99.

@Trinkle23897 Trinkle23897 linked an issue Jun 5, 2021 that may be closed by this pull request
@Trinkle23897
Collaborator

Trinkle23897 commented Jun 6, 2021

  • The training time is much longer than IQN, as expected.

Why? Have you aligned all the details with that repo? Their results show that FQF/IQN/QR-DQN all have similar convergence times.

@nuance1979
Collaborator Author

  • The training time is much longer than IQN, as expected.

Why? Have you aligned all the details with that repo? Their results show that FQF/IQN/QR-DQN all have similar convergence times.

I meant it was slower in terms of wall clock time, not in terms of training steps, which I believe is what you meant by "convergence time". The slowdown in wall clock time is simply due to the extra computation of the fraction proposal net and its gradient. I'll post some numbers and pictures once I get them.
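To make the wall-clock point concrete, here is a toy sketch, using placeholder modules and losses rather than the actual tianshou code, of the extra work an FQF update does compared with an IQN update: a separate fraction loss is backpropagated through the fraction proposal net and a second optimizer is stepped on top of the usual quantile-regression update.

```python
import torch
import torch.nn as nn

# Stand-in modules; the real networks are much larger.
feature_net = nn.Linear(4, 64)
quantile_head = nn.Linear(64, 32)   # plays the role of the IQN-style quantile network
fraction_net = nn.Linear(64, 32)    # the extra FQF-only fraction proposal net

main_optim = torch.optim.Adam(
    list(feature_net.parameters()) + list(quantile_head.parameters()), lr=1e-4)
fraction_optim = torch.optim.RMSprop(fraction_net.parameters(), lr=1e-5)

obs = torch.randn(8, 4)
feat = feature_net(obs)

# Placeholder losses; only the control flow matters here.
quantile_loss = quantile_head(feat).pow(2).mean()
fraction_loss = fraction_net(feat.detach()).pow(2).mean()

main_optim.zero_grad()
quantile_loss.backward()            # the update IQN also performs
main_optim.step()

fraction_optim.zero_grad()
fraction_loss.backward()            # the additional backward pass that costs wall-clock time
fraction_optim.step()
```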

@nuance1979
Collaborator Author

For example, a single run of Qbert: ~7h for IQN vs ~11h for FQF. Best reward: 15837.50 for IQN vs 15650 for FQF. (FQF is not a clear winner in Qbert; in some other games, it is.)

IQN reward curve: (image: Qbert_rew)

FQF reward curve: (image: Qbert_rew_fqf_new)

@Trinkle23897
Collaborator

I meant it was slower in terms of wall clock time

Oh I see. So is it possible to reduce the model forward time by reusing the previous result? The profiling results show that a lot of time is spent in the network part.
I'll try to understand the complex operations in the network and see if there's any way we can optimize them.

@nuance1979
Collaborator Author

I meant it was slower in terms of wall clock time

Oh I see. So is it possible to reduce the model forward time by reusing the previous result? The profiling results show that a lot of time is spent in the network part.
I'll try to understand the complex operations in the network and see if there's any way we can optimize them.

Sure. I know there is a place in _target_q() where the same computation (one forward pass of the fraction proposal net) is performed twice. I can start by optimizing it.

@nuance1979
Collaborator Author

I meant it was slower in terms of wall clock time

Oh I see. So is it possible to reduce the model forward time by reusing the previous result? The profiling results show that a lot of time is spent in the network part.
I'll try to understand the complex operations in the network and see if there's any way we can optimize them.

Sure. I know there is a place in _target_q() where the same computation (one forward pass of the fraction proposal net) is performed twice. I can start by optimizing it.

Wait, I take it back. It wasn't exactly the same computation, because the feature net of the target model is different from that of the online model. However, since the two usually do not differ much, the reference implementation does take this shortcut to save some computation. I'm running an experiment to see how much wall clock time it saves.
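For reference, a hedged sketch of the shortcut being discussed; the names (online_feature_net, fraction_net, target_model) are hypothetical, not the actual tianshou API. The idea is to reuse the fractions proposed from the online model's features when evaluating the target network, rather than running the fraction proposal net a second time on the target model's own features.

```python
import torch

def target_q_with_reused_fractions(online_feature_net, fraction_net, target_model, obs_next):
    """Hypothetical sketch of the _target_q() shortcut; not the merged implementation."""
    with torch.no_grad():
        feat = online_feature_net(obs_next)      # features from the online model
        taus, tau_hats, _ = fraction_net(feat)   # fractions proposed once, from online features
        # Evaluate the *target* model at the reused tau_hats instead of
        # re-proposing fractions from the target model's own feature net.
        quantiles = target_model(obs_next, tau_hats)            # (batch, n_fractions, n_actions)
        weights = (taus[..., 1:] - taus[..., :-1]).unsqueeze(-1)
        return (weights * quantiles).sum(dim=1)  # expected Q, weighted by fraction sizes
```

As noted above, the online and target feature nets are not identical, so the reused fractions are only an approximation of what the target net would propose; the experiment below checks whether that affects the final reward.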

@nuance1979
Collaborator Author

Using the shortcut mentioned above, a single run of Qbert now takes ~9.5h. Best reward: 16172.5. So it is faster with no loss of performance; the code does look a bit ugly, though.

(attached screenshot: Screen Shot 2021-06-07 at 8:16:13 PM)

Collaborator

@Trinkle23897 Trinkle23897 left a comment


No other comments, great job!

@Trinkle23897 Trinkle23897 merged commit c0bc8e0 into thu-ml:master Jun 15, 2021
@nuance1979 nuance1979 deleted the fqf_new branch October 6, 2021 17:27
@emrul

emrul commented Oct 2, 2023

Just came here to say thank you for this contribution, but also thank you for explaining why the default entropy regularization coefficient was set to 10.0; I was specifically wondering about this and found the explanation helpful!

BFAnas pushed a commit to BFAnas/tianshou that referenced this pull request May 5, 2024
Development

Successfully merging this pull request may close these issues.

Implicit Quantile Network (IQN) and Fully parameterized Quantile Function (FQF)
4 participants