[STF] Allow CUfunction/CUkernel (driver API) in the cuda_kernel(_chain) API #5215
/ok to test b3304a1
/ok to test 8651e9f
/ok to test 0a0cd17
🟩 CI finished in 33m 48s: Pass: 100%/32 | Total: 6h 58m | Avg: 13m 05s | Max: 28m 23s | Hits: 75%/16246
| | Project |
|---|---|
| | CCCL Infrastructure |
| | CCCL Packaging |
| | libcu++ |
| | CUB |
| | Thrust |
| +/- | CUDA Experimental |
| | stdpar |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |

Modifications in project or dependencies?

| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | CCCL Packaging |
| | libcu++ |
| | CUB |
| | Thrust |
| +/- | CUDA Experimental |
| | stdpar |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |
🏃 Runner counts (total jobs: 32)
# | Runner |
---|---|
17 | linux-amd64-cpu16 |
6 | linux-amd64-gpu-rtx2080-latest-1 |
4 | windows-amd64-cpu16 |
4 | linux-arm64-cpu16 |
1 | linux-amd64-gpu-h100-latest-1 |
…rs used by kernels
/ok to test fd42c70
/ok to test e12de1c
🟩 CI finished in 28m 23s: Pass: 100%/32 | Total: 7h 07m | Avg: 13m 21s | Max: 28m 19s | Hits: 75%/16246
/ok to test 74a4577
🟩 CI finished in 14m 51s: Pass: 100%/32 | Total: 2h 46m | Avg: 5m 12s | Max: 11m 31s | Hits: 98%/15930
Made a pass - thanks @caugonnet and thanks @davebayer for the good points.
Co-authored-by: Andrei Alexandrescu <andrei@erdani.com>
…pe.cuh Co-authored-by: Andrei Alexandrescu <andrei@erdani.com>
/ok to test 76a9a45
/ok to test 01d638f
🟩 CI finished in 31m 09s: Pass: 100%/32 | Total: 7h 06m | Avg: 13m 19s | Max: 30m 47s | Hits: 75%/15930
/ok to test 4c8cfaf
🟩 CI finished in 31m 42s: Pass: 100%/32 | Total: 7h 07m | Avg: 13m 22s | Max: 29m 09s | Hits: 75%/15930
{

template <typename T>
inline constexpr bool is_function_or_kernel_v = ::std::is_same_v<T, CUfunction> || ::std::is_same_v<T, CUkernel>;
Q: Wouldn't it make sense to call it `is_cufunction_or_cukernel_v`?
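For readers following the naming question, here is a minimal sketch of how a trait of this shape can gate an overload. `FakeCUfunction`, `FakeCUkernel`, and `accepts_driver_handle` are hypothetical stand-ins so the snippet compiles without `<cuda.h>`; the real handle types are assumed to be distinct opaque pointer types, and this is not the code from the PR.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for the driver API's opaque handles (assumption:
// the real CUfunction and CUkernel are distinct pointer types from <cuda.h>).
struct fake_func_st;
struct fake_kern_st;
using FakeCUfunction = fake_func_st*;
using FakeCUkernel   = fake_kern_st*;

// Same shape as the trait under review, using the name proposed in the question.
template <typename T>
inline constexpr bool is_cufunction_or_cukernel_v =
  std::is_same_v<T, FakeCUfunction> || std::is_same_v<T, FakeCUkernel>;

// Hypothetical overload selected only for driver handles.
template <typename Fun, std::enable_if_t<is_cufunction_or_cukernel_v<Fun>, int> = 0>
constexpr bool accepts_driver_handle(Fun)
{
  return true;
}

// Fallback for every other callable or value.
template <typename Fun, std::enable_if_t<!is_cufunction_or_cukernel_v<Fun>, int> = 0>
constexpr bool accepts_driver_handle(Fun)
{
  return false;
}

// Compile-time check that the trait matches exactly the two handle types.
static_assert(is_cufunction_or_cukernel_v<FakeCUfunction>);
static_assert(is_cufunction_or_cukernel_v<FakeCUkernel>);
static_assert(!is_cufunction_or_cukernel_v<int>);
```

Note that `std::is_same_v` is an exact match, so a cv-qualified or reference type would need to be decayed before being tested with a trait like this.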
auto* ker_ptr = ::std::get_if<CUfunction>(&func_variant);
if (!ker_ptr)
{
  // If this is a CUkernel, the cast to a CUfunction is sufficient
  ker_ptr = reinterpret_cast<const CUfunction*>(::std::get_if<CUkernel>(&func_variant));
}
return cuda_try<cuFuncGetAttribute>(CU_FUNC_ATTRIBUTE_NUM_REGS, *ker_ptr);
Honestly, I am not sure whether this is correct. The documentation explicitly says that `cuLaunchKernel` can be used with `(CUfunction)cukernel`, but there is no such note for `cuFuncGetAttribute`, especially given that there is a `cuKernelGetAttribute` function that takes `CUkernel` + `CUdevice` arguments.
I believe that a CUkernel is effectively turned into the same thing as a CUfunction, so you can call the CUfunction functions on it; the cast might do the work of picking up the underlying current device.
I have serious doubts here. Why would functions like `cuKernelGetFunction` or `cuKernelSetAttribute` exist then? It may be that they do exactly what you describe, but I believe the right thing is to use the `cuKernelXxx` functions for `CUkernel` and the `cuFuncXxx` functions for `CUfunction`.
If you don't want to use them, you should verify this internally :)
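The suggestion to keep the two function families separate can be sketched as a `std::visit` dispatch over the variant, with no cast between handle types. This is only an illustration: the `Fake*` types and the two query helpers are hypothetical stand-ins for `CUfunction`, `CUkernel`, `cuFuncGetAttribute(&value, attrib, func)` and `cuKernelGetAttribute(&value, attrib, kernel, dev)`, so that the snippet compiles without the CUDA driver. It says nothing about whether the cast in the PR is valid.

```cpp
#include <cassert>
#include <type_traits>
#include <variant>

// Hypothetical stand-ins for the opaque driver handles.
struct fake_func_st { int num_regs; };
struct fake_kern_st { int num_regs_per_device[2]; };
using FakeCUfunction = fake_func_st*;
using FakeCUkernel   = fake_kern_st*;

// Stand-in for the cuFuncXxx family: a function handle is already bound to a context.
int fake_func_num_regs(FakeCUfunction f)
{
  return f->num_regs;
}

// Stand-in for the cuKernelXxx family: a kernel handle needs an explicit device.
int fake_kernel_num_regs(FakeCUkernel k, int dev)
{
  return k->num_regs_per_device[dev];
}

// Route each alternative of the variant to its own query function.
int num_regs(const std::variant<FakeCUfunction, FakeCUkernel>& handle, int dev)
{
  return std::visit(
    [dev](auto h) {
      if constexpr (std::is_same_v<decltype(h), FakeCUfunction>)
      {
        return fake_func_num_regs(h);
      }
      else
      {
        return fake_kernel_num_regs(h, dev);
      }
    },
    handle);
}
```

With the real driver API, the device argument for the kernel path would have to come from somewhere, for example the current context, which is the point raised in the replies about not necessarily being in the right device context.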
Because you are not necessarily in the context of the device for which you are trying to extract the CUfunction.
@davebayer What I'll do then is add a new unit test that checks, for example, whether the driver API and the runtime API report the same number.
That's all possible. My only concern is: are we sure this won't change in the future?
Description
Make it possible to pass a CUfunction to ctx.cuda_kernel and ctx.cuda_kernel_chain constructs
closes
Checklist