Description
Issue type
Bug
Have you reproduced the bug with TensorFlow Nightly?
Yes
Source
binary
TensorFlow version
2.20.0.dev20250314
Custom code
No
OS platform and distribution
Windows 11 - WSL2 - Ubuntu 22.04.5 LTS
Mobile device
No response
Python version
3.10.12
Bazel version
7.4.1
GCC/compiler version
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
CUDA/cuDNN version
CUDA Version: 12.8
GPU model and memory
RTX 5090 32GB
Current behavior?
I had hoped that TensorFlow would work on the RTX 5090, but sadly it does not. Building from source failed, and the environment script did not run as shipped either. Bash is my primary programming language, so I was at least able to tidy that script up here:
But I wasn't able to get TensorFlow running. I had a similar issue with PyTorch, which needed CUDA 12.8.* to work on the Blackwell cards, but no dice with the nightly build of TensorFlow. Below is my test and its output, followed by the tf_env.txt from my patched script.
It may be helpful to know that NVIDIA themselves seem to have it running here:
https://docs.nvidia.com/deeplearning/frameworks/tensorflow-release-notes/rel-25-02.html
But I get the same errors that this other guy does when I try it out:
https://www.reddit.com/r/tensorflow/comments/1iutjoj/tensorflow_2501_cuda_128_rtx_5090_on_wsl2_cuda/
Another conversation I found may also be helpful. According to the people there, TensorFlow needs to support CUDA 12.8.1 in order to support Blackwell (aka the RTX 50-series cards):
https://discuss.ai.google.dev/t/building-tensorflow-from-source-for-rtx5000-gpu-series/65171/15
(tfnightie) mitch@win11ml:~/stable_diff
$ cat tfnightie/test_2.py
import tensorflow as tf
import time
# Check if TensorFlow sees the GPU
print("TensorFlow version:", tf.__version__)
print("Available GPUs:", tf.config.experimental.list_physical_devices('GPU'))
# Matrix multiplication test
shape = (5000, 5000)
a = tf.random.normal(shape)
b = tf.random.normal(shape)
# Time execution on GPU
with tf.device('/GPU:0'):
    print("Running on GPU...")
    start_time = time.time()
    c = tf.matmul(a, b)
    tf.print("Matrix multiplication (GPU) done.")
    print("Execution time (GPU):", time.time() - start_time, "seconds")
# Time execution on CPU for comparison
with tf.device('/CPU:0'):
    print("Running on CPU...")
    start_time = time.time()
    c = tf.matmul(a, b)
    tf.print("Matrix multiplication (CPU) done.")
    print("Execution time (CPU):", time.time() - start_time, "seconds")
(tfnightie) mitch@win11ml:~/stable_diff
$ python tfnightie/test_2.py
2025-03-14 21:35:33.400099: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
TensorFlow version: 2.20.0-dev20250314
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
W0000 00:00:1742009735.413544 326199 gpu_device.cc:2429] TensorFlow was not built with CUDA kernel binaries compatible with compute capability 12.0. CUDA kernels will be jit-compiled from PTX, which could take 30 minutes or longer.
Available GPUs: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
W0000 00:00:1742009735.417720 326199 gpu_device.cc:2429] TensorFlow was not built with CUDA kernel binaries compatible with compute capability 12.0. CUDA kernels will be jit-compiled from PTX, which could take 30 minutes or longer.
I0000 00:00:1742009735.572153 326199 gpu_device.cc:2018] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 29043 MB memory: -> device: 0, name: NVIDIA GeForce RTX 5090, pci bus id: 0000:09:00.0, compute capability: 12.0
2025-03-14 21:35:36.969440: W tensorflow/compiler/mlir/tools/kernel_gen/tf_gpu_runtime_wrappers.cc:40] 'cuModuleLoadData(&module, data)' failed with 'CUDA_ERROR_INVALID_PTX'
2025-03-14 21:35:36.969480: W tensorflow/compiler/mlir/tools/kernel_gen/tf_gpu_runtime_wrappers.cc:40] 'cuModuleGetFunction(&function, module, kernel_name)' failed with 'CUDA_ERROR_INVALID_HANDLE'
2025-03-14 21:35:36.969505: W tensorflow/core/framework/op_kernel.cc:1843] INTERNAL: 'cuLaunchKernel(function, gridX, gridY, gridZ, blockX, blockY, blockZ, 0, reinterpret_cast<CUstream>(stream), params, nullptr)' failed with 'CUDA_ERROR_INVALID_HANDLE'
2025-03-14 21:35:36.969533: I tensorflow/core/framework/local_rendezvous.cc:407] Local rendezvous is aborting with status: INTERNAL: 'cuLaunchKernel(function, gridX, gridY, gridZ, blockX, blockY, blockZ, 0, reinterpret_cast<CUstream>(stream), params, nullptr)' failed with 'CUDA_ERROR_INVALID_HANDLE'
Traceback (most recent call last):
  File "/home/mitch/stable_diff/tfnightie/test_2.py", line 10, in <module>
    a = tf.random.normal(shape)
  File "/home/mitch/.virtualenvs/tfnightie/lib/python3.10/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/mitch/.virtualenvs/tfnightie/lib/python3.10/site-packages/tensorflow/python/framework/ops.py", line 6027, in raise_from_not_ok_status
    raise core._status_to_exception(e) from None # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InternalError: {{function_node __wrapped__Mul_device_/job:localhost/replica:0/task:0/device:GPU:0}} 'cuLaunchKernel(function, gridX, gridY, gridZ, blockX, blockY, blockZ, 0, reinterpret_cast<CUstream>(stream), params, nullptr)' failed with 'CUDA_ERROR_INVALID_HANDLE' [Op:Mul] name:
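The warning in the log above says this binary has no kernels built for compute capability 12.0 and falls back to PTX JIT. A minimal sketch of how one might confirm that from Python is shown here. It compares the targets reported by `tf.sysconfig.get_build_info()` with the capability reported by `tf.config.experimental.get_device_details()`. The helper names `parse_target` and `classify_support` are hypothetical, and the compatibility rule is a simplification: it assumes a cubin runs on the same major version with an equal or higher minor version, that PTX can be JIT-compiled for any equal or higher capability, and it ignores architecture-specific suffixes such as the `a` in `sm_90a`.

```python
import re


def parse_target(target):
    """Parse a target such as 'sm_89' or 'compute_90' into (kind, (major, minor))."""
    match = re.fullmatch(r"(sm|compute)_(\d+)(\d)[a-z]?", target)
    if match is None:
        raise ValueError(f"unrecognized CUDA target: {target!r}")
    kind, major, minor = match.group(1), int(match.group(2)), int(match.group(3))
    return kind, (major, minor)


def classify_support(device_cc, build_targets):
    """Return 'native', 'ptx-jit', or 'none' for a device capability.

    Simplified rule (an assumption, see the text above):
      - 'sm_XY' binaries run on devices with the same major and minor >= Y.
      - 'compute_XY' PTX can be JIT-compiled for any capability >= X.Y.
    """
    best = "none"
    for target in build_targets:
        kind, cc = parse_target(target)
        if kind == "sm" and cc[0] == device_cc[0] and cc[1] <= device_cc[1]:
            return "native"
        if kind == "compute" and cc <= device_cc:
            best = "ptx-jit"
    return best


def main():
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed; nothing to check.")
        return
    # CPU-only builds may not report this key, so default to an empty list.
    targets = tf.sysconfig.get_build_info().get("cuda_compute_capabilities", [])
    print("Wheel was built for:", list(targets))
    for gpu in tf.config.list_physical_devices("GPU"):
        details = tf.config.experimental.get_device_details(gpu)
        cc = details.get("compute_capability")
        if cc is None:
            continue
        verdict = classify_support(tuple(cc), targets)
        print(details.get("device_name"), cc, "->", verdict)


if __name__ == "__main__":
    main()
```

A `ptx-jit` verdict only means the fallback path is the one being taken. It does not guarantee that the embedded PTX loads on the device, which is exactly what fails in the log above with `CUDA_ERROR_INVALID_PTX`.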
Also, while NVIDIA's site says that the compute capability of the RTX 5090 is "10.0", the card itself reports "12.0". I am not sure that info will be helpful, but it threw me for a loop:
$ cat <<EOF > card_details.cu
#include <cuda_runtime.h>
#include <iostream>
int main() {
    cudaDeviceProp prop;
    int device;
    cudaGetDevice(&device); // Get the current device ID
    cudaGetDeviceProperties(&prop, device); // Get device properties
    size_t free_mem, total_mem;
    cudaMemGetInfo(&free_mem, &total_mem); // Get VRAM usage
    std::cout << "GPU Name: " << prop.name << std::endl;
    std::cout << "Compute Capability: " << prop.major << "." << prop.minor << std::endl;
    std::cout << "VRAM Usage: " << (total_mem - free_mem) / (1024 * 1024) << " MB / " << total_mem / (1024 * 1024) << " MB" << std::endl;
    return 0;
}
EOF
$ nvcc card_details.cu -o card_details && ./card_details
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
GPU Name: NVIDIA GeForce RTX 5090
Compute Capability: 12.0
VRAM Usage: 1763 MB / 32606 MB
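For anyone who wants to cross-check the reported compute capability without compiling anything, here is a rough Python sketch that asks `nvidia-smi` directly. It assumes the installed `nvidia-smi` supports the `compute_cap` query field, which older drivers may not. The function names `parse_compute_caps` and `query_gpus` are hypothetical.

```python
import subprocess


def parse_compute_caps(csv_text):
    """Parse 'name, compute_cap' CSV rows into (name, (major, minor)) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        # Split on the last comma so a comma inside the name would not break parsing.
        name, cap = (field.strip() for field in line.rsplit(",", 1))
        major, minor = cap.split(".")
        rows.append((name, (int(major), int(minor))))
    return rows


def query_gpus():
    """Run nvidia-smi and return the parsed rows (assumes compute_cap is supported)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,compute_cap", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_compute_caps(result.stdout)


if __name__ == "__main__":
    try:
        for name, cc in query_gpus():
            print(f"{name}: compute capability {cc[0]}.{cc[1]}")
    except (FileNotFoundError, subprocess.CalledProcessError) as err:
        print("nvidia-smi query failed:", err)
```

The parsing is kept separate from the subprocess call so it can be exercised on a machine without a GPU.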
tf_env.txt
== check python ====================================================
python version: 3.10.12
python branch:
python build version: ('main', 'Feb 4 2025 14:57:36')
python compiler version: GCC 11.4.0
python implementation: CPython
== check os platform ===============================================
os: Linux
os kernel version: #1 SMP Tue Nov 5 00:21:55 UTC 2024
os release version: 5.15.167.4-microsoft-standard-WSL2
os platform: Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.35
freedesktop os release: {'NAME': 'Ubuntu', 'ID': 'ubuntu', 'PRETTY_NAME': 'Ubuntu 22.04.5 LTS', 'VERSION_ID': '22.04', 'VERSION': '22.04.5 LTS (Jammy Jellyfish)', 'VERSION_CODENAME': 'jammy', 'ID_LIKE': 'debian', 'HOME_URL': 'https://www.ubuntu.com/', 'SUPPORT_URL': 'https://help.ubuntu.com/', 'BUG_REPORT_URL': 'https://bugs.launchpad.net/ubuntu/', 'PRIVACY_POLICY_URL': 'https://www.ubuntu.com/legal/terms-and-policies/privacy-policy', 'UBUNTU_CODENAME': 'jammy'}
mac version: ('', ('', '', ''), '')
uname: uname_result(system='Linux', node='win11ml', release='5.15.167.4-microsoft-standard-WSL2', version='#1 SMP Tue Nov 5 00:21:55 UTC 2024', machine='x86_64')
architecture: ('64bit', 'ELF')
machine: x86_64
== are we in docker ================================================
No
== c++ compiler ====================================================
/usr/bin/c++
c++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
== check pips ======================================================
numpy 2.1.3
protobuf 5.29.3
tf_nightly 2.20.0.dev20250314
== check for virtualenv ============================================
Running inside a virtual environment.
== tensorflow import ===============================================
2025-03-14 21:02:48.002965: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
W0000 00:00:1742007769.198398 317963 gpu_device.cc:2429] TensorFlow was not built with CUDA kernel binaries compatible with compute capability 12.0. CUDA kernels will be jit-compiled from PTX, which could take 30 minutes or longer.
W0000 00:00:1742007769.202246 317963 gpu_device.cc:2429] TensorFlow was not built with CUDA kernel binaries compatible with compute capability 12.0. CUDA kernels will be jit-compiled from PTX, which could take 30 minutes or longer.
I0000 00:00:1742007769.355021 317963 gpu_device.cc:2018] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 29043 MB memory: -> device: 0, name: NVIDIA GeForce RTX 5090, pci bus id: 0000:09:00.0, compute capability: 12.0
tf.version.VERSION = 2.20.0-dev20250314
tf.version.GIT_VERSION = v1.12.1-123444-g07ff428d432
tf.version.COMPILER_VERSION = Ubuntu Clang 18.1.8 (++20240731024944+3b5b5c1ec4a3-1~exp1~20240731145000.144)
Sanity check: <tf.Tensor: shape=(1,), dtype=int32, numpy=array([1], dtype=int32)>
libcudnn not found
== env =============================================================
LD_LIBRARY_PATH /usr/local/cuda-12.8/lib64:
DYLD_LIBRARY_PATH is unset
== nvidia-smi ======================================================
Fri Mar 14 21:02:52 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.06             Driver Version: 572.70         CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 5090        On  |   00000000:09:00.0 Off |                  N/A |
|  0%   43C    P1             78W /  600W |    2115MiB /  32607MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A              31      G   /Xwayland                             N/A      |
|    0   N/A  N/A              35      G   /Xwayland                             N/A      |
+-----------------------------------------------------------------------------------------+
== cuda libs =======================================================
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudart_static.a
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudart.so.11.8.89
/usr/local/cuda-12.8/targets/x86_64-linux/lib/libcudart_static.a
/usr/local/cuda-12.8/targets/x86_64-linux/lib/libcudart.so.12.8.90
== tensorflow installation =========================================
tensorflow not found
== tf_nightly installation =========================================
Name: tf_nightly
Version: 2.20.0.dev20250314
Summary: TensorFlow is an open source machine learning framework for everyone.
Home-page: https://www.tensorflow.org/
Author-email: packages@tensorflow.org
License: Apache 2.0
Location: /home/mitch/.virtualenvs/tfnightie/lib/python3.10/site-packages
Required-by:
== python version ==================================================
(major, minor, micro, releaselevel, serial)
(3, 10, 12, 'final', 0)
== bazel version ===================================================
Bazelisk version: v1.25.0
Build label: 7.4.1
Build time: Mon Nov 11 21:24:53 2024 (1731360293)
Build timestamp: 1731360293
Build timestamp as int: 1731360293
Standalone code to reproduce the issue
Try running anything with an RTX 5090. My test script is above.
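A smaller reproduction might look like the sketch below. In the traceback, the failure already occurs at `tf.random.normal`, before any matmul runs, so this version runs that single op on the CPU and then on the GPU and reports which one fails instead of crashing. The helper name `probe` and the tensor shape are arbitrary choices for illustration, and the broad `except Exception` is deliberate so that any error type gets reported.

```python
def probe(label, fn):
    """Run fn() and return (label, True, result) or (label, False, error text)."""
    try:
        return (label, True, fn())
    except Exception as err:  # report every failure rather than aborting the run
        return (label, False, f"{type(err).__name__}: {err}")


def main():
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed; nothing to reproduce.")
        return

    def normal_on(device):
        """Build a callable that creates and sums a small random tensor on device."""
        def run():
            with tf.device(device):
                # float() forces the value back to the host, so the kernel must run.
                return float(tf.reduce_sum(tf.random.normal((4, 4))))
        return run

    for device in ("/CPU:0", "/GPU:0"):
        label, ok, detail = probe(device, normal_on(device))
        status = "ok" if ok else "FAILED"
        print(f"{label}: {status} ({detail})")


if __name__ == "__main__":
    main()
```

On a working setup both devices should report `ok`. On the configuration described in this issue, the GPU line would be expected to fail with the same `InternalError` as in the traceback.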