-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Numba acceleration #193
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Numba acceleration #193
Conversation
Could you add more information about the relevance of improving the efficiency of 'segtree' ? How much time is spent in it ? |
Sure, posted on the top. |
@duburcqa ready for review, any comment is appreciated! |
tianshou/policy/base.py
Outdated
return target_q | ||
|
||
|
||
def _compile(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you want to precompile the function ? I don't see the point. It goes against the JIT (Just-in-Time compilation) principle. I would use instead AOT (Ahead-of-Time compilation) if you really think it is better to pre-compile whose methods.
Here is the official documentation, including examples. Note that it is possible to define the shape of the input arrays without passing examples for both JIT and AOT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I post the reason in the docstring. Since I want to compare the training time of several versions, if we do not compile the main function pattern ahead of time, the first-time compilation time will add onto the training time in the collector, and thus it will show that Numba slows down the whole process (from 2050 fps to 1800 fps).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think AOT is a good choice since we cannot determine some arguments' type ahead of time, for example, v_s_
in GAE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then I would rather move _compile at the beginning of the brenchmark, rather than here. I thing it makes more sense, because it is really just a benchmarking issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think AOT is a good choice since we cannot determine some arguments' type ahead of time, for example, v_s_ in GAE.
I don't now, it is strange. Compilation support variable size numpy array, just the number of dimensions must be known in advance. Knowing that you can also precompile several signatures if necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done with review!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think _compile
should under tianshou.utils
and include also SegTree. I'll change it.
Nice ! Good to go now :) |
test/continuous/test_td3.py
Outdated
from tianshou.utils import pre_compile | ||
pre_compile() # exclude compilation time to get the correct train_speed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a bit strange for newcomers. Maybe pre_compile
can be called during tianshou.init .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so, but this conflicts with @duburcqa #193 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then I would rather move _compile at the beginning of the brenchmark, rather than here. I thing it makes more sense, because it is really just a benchmarking issue.
There are many benchmarks in tianshou, but some of the benchmarks (in test
folder) are also examples for beginners. Therefore, I think it is not a good solution to add pre-compile for every benchmark. Pre-compile in tianshou.__init__.py
should be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get your point but it is an anti-pattern. JIT is for jit. If precompile is actually able to precompile all you need, then I would use, it works just the same as jit and it is actually intenses to do what you are looking for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well AOT is not exactly as portable as JIT in practice because of some limitations. You can put it in init if you want but I really don't like the idea. I would rather keep it as it is right now. Maybe a more comprehensive name for beginner would be enough, like 'init_profiler'. It is not exactly what it does but it is what it is designed for. From this name plus a comment it would be clear it is there on the CI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it is a sad thing. People have never been able to properly benchmark a code, you can see this in all but one in a million posts related to stuff this on stackoverflow. It is a petty to lower our standard to match people programming knowledge, especially from people not contributing but only complaining. But I get your point too and I understand your point of view. At the end it can be changed in the future pretty easily, it is not an code design issue so it is not a big deal. As I see you seem to both agree on this point, maybe you should go for it.
I have no more commits on this pr. |
Training FPS improvement (base commit is 94bfb32): test_pdqn: 1660 (without numba) -> 1930 discrete/test_ppo: 5100 -> 5170 since nstep has little impact on overall performance, the unit test result is: GAE: 4.1s -> 0.057s nstep: 0.3s -> 0.15s (little improvement) Others: - fix a bug in ttt set_eps - keep only sumtree in segment tree implementation - dirty fix for asyncVenv check_id test
Training FPS improvement (base commit is 94bfb32): test_pdqn: 1660 (without numba) -> 1930 discrete/test_ppo: 5100 -> 5170 since nstep has little impact on overall performance, the unit test result is: GAE: 4.1s -> 0.057s nstep: 0.3s -> 0.15s (little improvement) Others: - fix a bug in ttt set_eps - keep only sumtree in segment tree implementation - dirty fix for asyncVenv check_id test
Training FPS improvement (base commit is 94bfb32):
test_pdqn: 1660 (without numba) -> 1930
discrete/test_ppo: 5100 -> 5170
since nstep has little impact on overall performance, the unit test result is:
GAE: 4.1s -> 0.057s
nstep: 0.3s -> 0.15s (little improvement)
Others: