Description
Recently, when benchmarking libcudf on a DGX system, I ran into an issue where the memory resource (MR) set up by libcudf was only respected by nvbench on GPU0. On other GPUs, we observed that a CUDA MR was used in place of the MR provided by libcudf. However, the compute did run on the correct GPU as specified by devices, so the root cause may be different from that of some related issues (e.g. #113).
This works and uses the default pool MR on GPU4:
nsys profile -f true --gpu-metrics-device=all --output=report_cudavis --env-var CUDA_VISIBLE_DEVICES=4 ./STREAM_COMPACTION_NVBENCH -d 0 -b 1 -a NumRows=100000000 --timeout 0.3 -a Type=[I32] -a keep=[any] -a cardinality=10000000
This does not work and somehow uses a CUDA MR on GPU4:
nsys profile -f true --gpu-metrics-device=all --output=report ./STREAM_COMPACTION_NVBENCH -d 4 -b 1 -a NumRows=100000000 --timeout 0.3 -a Type=[I32] -a keep=[any] -a cardinality=10000000
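One possible explanation, not confirmed here, is an ordering problem: RMM tracks memory resources per device, and a device that never had a resource set falls back to a plain CUDA MR. If the benchmark sets its MR while device 0 is current and nvbench only later switches to the device requested with -d, the new device would see the fallback. The sketch below is a toy model of that behavior in plain C++. It does not call RMM or CUDA, and ToyRuntime and its members are hypothetical names that only mimic cudaSetDevice and rmm::mr::set_current_device_resource / get_current_device_resource.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model only: hypothetical names, NOT the RMM or CUDA APIs.
struct ToyRuntime {
  int current_device = 0;                // models the current CUDA device
  std::map<int, std::string> resources;  // device id -> name of the MR set for it

  // Models cudaSetDevice(): changes which device later calls refer to.
  void set_device(int id) {
    assert(id >= 0);
    current_device = id;
  }

  // Models rmm::mr::set_current_device_resource(): only the device that is
  // current at the time of the call is affected.
  void set_current_device_resource(const std::string& name) {
    resources[current_device] = name;
  }

  // Models rmm::mr::get_current_device_resource(): a device with no resource
  // set falls back to the plain CUDA MR.
  std::string get_current_device_resource() const {
    auto it = resources.find(current_device);
    return it == resources.end() ? std::string("cuda") : it->second;
  }
};
```

In this model, calling set_current_device_resource("pool") and then set_device(4) leaves device 4 on the "cuda" fallback, which matches the -d 4 run. With CUDA_VISIBLE_DEVICES=4 and -d 0, the process never leaves device 0, so the pool MR stays in effect, which matches the working run. Whether libcudf and nvbench actually execute in this order is an assumption that would need to be checked.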