这是indexloc提供的服务,不要输入任何密码
Skip to content

Numba acceleration #193

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Sep 2, 2020
Merged

Numba acceleration #193

merged 18 commits into from
Sep 2, 2020

Conversation

Trinkle23897
Copy link
Collaborator

@Trinkle23897 Trinkle23897 commented Aug 27, 2020

Training FPS improvement (base commit is 94bfb32):
test_pdqn: 1660 (without numba) -> 1930
discrete/test_ppo: 5100 -> 5170

since nstep has little impact on overall performance, the unit test result is:
GAE: 4.1s -> 0.057s
nstep: 0.3s -> 0.15s (little improvement)

Others:

  • fix a bug in ttt set_eps
  • keep only sumtree in segment tree implementation
  • dirty fix for asyncVenv check_id test

@Trinkle23897 Trinkle23897 marked this pull request as draft August 27, 2020 06:48
@youkaichao youkaichao marked this pull request as ready for review August 27, 2020 14:20
@youkaichao youkaichao marked this pull request as draft August 27, 2020 14:23
@youkaichao youkaichao marked this pull request as ready for review August 27, 2020 14:24
@youkaichao youkaichao marked this pull request as draft August 27, 2020 14:24
@duburcqa
Copy link
Collaborator

Could you add more information about the relevance of improving the efficiency of 'segtree' ? How much time is spent in it ?

@Trinkle23897
Copy link
Collaborator Author

Could you add more information about the relevance of improving the efficiency of 'segtree' ? How much time is spent in it ?

Sure, posted on the top.

@Trinkle23897 Trinkle23897 marked this pull request as ready for review August 30, 2020 09:57
@Trinkle23897 Trinkle23897 requested a review from duburcqa August 30, 2020 13:36
@Trinkle23897
Copy link
Collaborator Author

@duburcqa ready for review, any comment is appreciated!

return target_q


def _compile():
Copy link
Collaborator

@duburcqa duburcqa Aug 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you want to precompile the function ? I don't see the point. It goes against the JIT (Just-in-Time compilation) principle. I would use instead AOT (Ahead-of-Time compilation) if you really think it is better to pre-compile whose methods.

Here is the official documentation, including examples. Note that it is possible to define the shape of the input arrays without passing examples for both JIT and AOT.

Copy link
Collaborator Author

@Trinkle23897 Trinkle23897 Aug 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I post the reason in the docstring. Since I want to compare the training time of several versions, if we do not compile the main function pattern ahead of time, the first-time compilation time will add onto the training time in the collector, and thus it will show that Numba slows down the whole process (from 2050 fps to 1800 fps).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think AOT is a good choice since we cannot determine some arguments' type ahead of time, for example, v_s_ in GAE.

Copy link
Collaborator

@duburcqa duburcqa Aug 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I would rather move _compile at the beginning of the brenchmark, rather than here. I thing it makes more sense, because it is really just a benchmarking issue.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think AOT is a good choice since we cannot determine some arguments' type ahead of time, for example, v_s_ in GAE.

I don't now, it is strange. Compilation support variable size numpy array, just the number of dimensions must be known in advance. Knowing that you can also precompile several signatures if necessary.

Copy link
Collaborator

@duburcqa duburcqa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done with review!

Copy link
Collaborator Author

@Trinkle23897 Trinkle23897 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think _compile should under tianshou.utils and include also SegTree. I'll change it.

@duburcqa
Copy link
Collaborator

Nice ! Good to go now :)

Comment on lines 123 to 124
from tianshou.utils import pre_compile
pre_compile() # exclude compilation time to get the correct train_speed
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a bit strange for newcomers. Maybe pre_compile can be called during tianshou.init .

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, but this conflicts with @duburcqa #193 (comment)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I would rather move _compile at the beginning of the brenchmark, rather than here. I thing it makes more sense, because it is really just a benchmarking issue.

There are many benchmarks in tianshou, but some of the benchmarks (in test folder) are also examples for beginners. Therefore, I think it is not a good solution to add pre-compile for every benchmark. Pre-compile in tianshou.__init__.py should be enough.

@duburcqa

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get your point but it is an anti-pattern. JIT is for jit. If precompile is actually able to precompile all you need, then I would use, it works just the same as jit and it is actually intenses to do what you are looking for.

Copy link
Collaborator

@duburcqa duburcqa Sep 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well AOT is not exactly as portable as JIT in practice because of some limitations. You can put it in init if you want but I really don't like the idea. I would rather keep it as it is right now. Maybe a more comprehensive name for beginner would be enough, like 'init_profiler'. It is not exactly what it does but it is what it is designed for. From this name plus a comment it would be clear it is there on the CI.

Copy link
Collaborator Author

@Trinkle23897 Trinkle23897 Sep 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But beginners may build up their own code and benchmark and say, "Hey, your provided result is not compatible with my own test", something like #169 and #196. That's a sad thing.
Also, maybe more functions will be added to the future version. Perhaps they need to be compiled too.

Copy link
Collaborator

@duburcqa duburcqa Sep 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is a sad thing. People have never been able to properly benchmark a code, you can see this in all but one in a million posts related to stuff this on stackoverflow. It is a petty to lower our standard to match people programming knowledge, especially from people not contributing but only complaining. But I get your point too and I understand your point of view. At the end it can be changed in the future pretty easily, it is not an code design issue so it is not a big deal. As I see you seem to both agree on this point, maybe you should go for it.

@Trinkle23897
Copy link
Collaborator Author

I have no more commits on this pr.

@Trinkle23897 Trinkle23897 merged commit a746187 into thu-ml:master Sep 2, 2020
@Trinkle23897 Trinkle23897 deleted the numba branch September 2, 2020 04:50
Trinkle23897 added a commit that referenced this pull request Sep 2, 2020
Training FPS improvement (base commit is 94bfb32):
test_pdqn: 1660 (without numba) -> 1930
discrete/test_ppo: 5100 -> 5170

since nstep has little impact on overall performance, the unit test result is:
GAE: 4.1s -> 0.057s
nstep: 0.3s -> 0.15s (little improvement)

Others:
- fix a bug in ttt set_eps
- keep only sumtree in segment tree implementation
- dirty fix for asyncVenv check_id test
BFAnas pushed a commit to BFAnas/tianshou that referenced this pull request May 5, 2024
Training FPS improvement (base commit is 94bfb32):
test_pdqn: 1660 (without numba) -> 1930
discrete/test_ppo: 5100 -> 5170

since nstep has little impact on overall performance, the unit test result is:
GAE: 4.1s -> 0.057s
nstep: 0.3s -> 0.15s (little improvement)

Others:
- fix a bug in ttt set_eps
- keep only sumtree in segment tree implementation
- dirty fix for asyncVenv check_id test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants