这是indexloc提供的服务,不要输入任何密码
Skip to content

TensorFlow on RTX 5090 #89272

@maludwig

Description

@maludwig

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

Yes

Source

binary

TensorFlow version

2.20.0.dev20250314

Custom code

No

OS platform and distribution

Windows 11 - WSL2 - Ubuntu 22.04.5 LTS

Mobile device

No response

Python version

3.10.12

Bazel version

7.4.1

GCC/compiler version

gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0

CUDA/cuDNN version

CUDA Version: 12.8

GPU model and memory

RTX 5090 32GB

Current behavior?

I had hoped that tensorflow would work on the RTX 5090 at all. It does not, sadly. I tried building from source but that didn't work either. I tried running the environment script but that didn't work either. At least bash is my primary programming language, so I was able to tidy that one up here:

#89271

But I wasn't able to get tensorflow running. I had a similar issue with PyTorch, which needed to use CUDA 12.8.* to work on the Blackwell cards, but no dice with the nightly build of tensorflow. Below is my test and the output, and under that is the tf_env.txt from my patched script.

It may be helpful to know that nvidia themselves seem to have it running here:

https://docs.nvidia.com/deeplearning/frameworks/tensorflow-release-notes/rel-25-02.html

But I get the same errors that this other guy does when I try it out:

https://www.reddit.com/r/tensorflow/comments/1iutjoj/tensorflow_2501_cuda_128_rtx_5090_on_wsl2_cuda/

This conversation was another one I found that may be helpful, according to these guys, you need to support CUDA 12.8.1 to support Blackwell (aka the RTX 50## series cards):

https://discuss.ai.google.dev/t/building-tensorflow-from-source-for-rtx5000-gpu-series/65171/15


(tfnightie) mitch@win11ml:~/stable_diff
$ cat tfnightie/test_2.py
import tensorflow as tf
import time

# Check if TensorFlow sees the GPU
print("TensorFlow version:", tf.__version__)
print("Available GPUs:", tf.config.experimental.list_physical_devices('GPU'))

# Matrix multiplication test
shape = (5000, 5000)
a = tf.random.normal(shape)
b = tf.random.normal(shape)

# Time execution on GPU
with tf.device('/GPU:0'):
    print("Running on GPU...")
    start_time = time.time()
    c = tf.matmul(a, b)
    tf.print("Matrix multiplication (GPU) done.")
    print("Execution time (GPU):", time.time() - start_time, "seconds")

# Time execution on CPU for comparison
with tf.device('/CPU:0'):
    print("Running on CPU...")
    start_time = time.time()
    c = tf.matmul(a, b)
    tf.print("Matrix multiplication (CPU) done.")
    print("Execution time (CPU):", time.time() - start_time, "seconds")




(tfnightie) mitch@win11ml:~/stable_diff
$ python tfnightie/test_2.py
2025-03-14 21:35:33.400099: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
TensorFlow version: 2.20.0-dev20250314
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
W0000 00:00:1742009735.413544  326199 gpu_device.cc:2429] TensorFlow was not built with CUDA kernel binaries compatible with compute capability 12.0. CUDA kernels will be jit-compiled from PTX, which could take 30 minutes or longer.
Available GPUs: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
W0000 00:00:1742009735.417720  326199 gpu_device.cc:2429] TensorFlow was not built with CUDA kernel binaries compatible with compute capability 12.0. CUDA kernels will be jit-compiled from PTX, which could take 30 minutes or longer.
I0000 00:00:1742009735.572153  326199 gpu_device.cc:2018] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 29043 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 5090, pci bus id: 0000:09:00.0, compute capability: 12.0
2025-03-14 21:35:36.969440: W tensorflow/compiler/mlir/tools/kernel_gen/tf_gpu_runtime_wrappers.cc:40] 'cuModuleLoadData(&module, data)' failed with 'CUDA_ERROR_INVALID_PTX'

2025-03-14 21:35:36.969480: W tensorflow/compiler/mlir/tools/kernel_gen/tf_gpu_runtime_wrappers.cc:40] 'cuModuleGetFunction(&function, module, kernel_name)' failed with 'CUDA_ERROR_INVALID_HANDLE'

2025-03-14 21:35:36.969505: W tensorflow/core/framework/op_kernel.cc:1843] INTERNAL: 'cuLaunchKernel(function, gridX, gridY, gridZ, blockX, blockY, blockZ, 0, reinterpret_cast<CUstream>(stream), params, nullptr)' failed with 'CUDA_ERROR_INVALID_HANDLE'
2025-03-14 21:35:36.969533: I tensorflow/core/framework/local_rendezvous.cc:407] Local rendezvous is aborting with status: INTERNAL: 'cuLaunchKernel(function, gridX, gridY, gridZ, blockX, blockY, blockZ, 0, reinterpret_cast<CUstream>(stream), params, nullptr)' failed with 'CUDA_ERROR_INVALID_HANDLE'
Traceback (most recent call last):
  File "/home/mitch/stable_diff/tfnightie/test_2.py", line 10, in <module>
    a = tf.random.normal(shape)
  File "/home/mitch/.virtualenvs/tfnightie/lib/python3.10/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/mitch/.virtualenvs/tfnightie/lib/python3.10/site-packages/tensorflow/python/framework/ops.py", line 6027, in raise_from_not_ok_status
    raise core._status_to_exception(e) from None  # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InternalError: {{function_node __wrapped__Mul_device_/job:localhost/replica:0/task:0/device:GPU:0}} 'cuLaunchKernel(function, gridX, gridY, gridZ, blockX, blockY, blockZ, 0, reinterpret_cast<CUstream>(stream), params, nullptr)' failed with 'CUDA_ERROR_INVALID_HANDLE' [Op:Mul] name:

Also, while nvidia's site says that the Compute Capability of the RTX5090 is "10.0", the card itself seems to report "12.0". I am not so sure that info will be helpful, but it spun me for a loop:


$ cat <<EOF > card_details.cu
> #include <cuda_runtime.h>
#include <iostream>

int main() {
    cudaDeviceProp prop;
    int device;

    cudaGetDevice(&device); // Get the current device ID
    cudaGetDeviceProperties(&prop, device); // Get device properties

    size_t free_mem, total_mem;
    cudaMemGetInfo(&free_mem, &total_mem); // Get VRAM usage

    std::cout << "GPU Name: " << prop.name << std::endl;
    std::cout << "Compute Capability: " << prop.major << "." << prop.minor << std::endl;
    std::cout << "VRAM Usage: " << (total_mem - free_mem) / (1024 * 1024) << " MB / " << total_mem / (1024 * 1024) << " MB" << std::endl;

    return 0;
}
EOF



$ nvcc card_details.cu -o card_details && ./card_details
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
GPU Name: NVIDIA GeForce RTX 5090
Compute Capability: 12.0
VRAM Usage: 1763 MB / 32606 MB

tf_env.txt


== check python ====================================================
python version: 3.10.12
python branch:
python build version: ('main', 'Feb  4 2025 14:57:36')
python compiler version: GCC 11.4.0
python implementation: CPython


== check os platform ===============================================
os: Linux
os kernel version: #1 SMP Tue Nov 5 00:21:55 UTC 2024
os release version: 5.15.167.4-microsoft-standard-WSL2
os platform: Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.35
freedesktop os release: {'NAME': 'Ubuntu', 'ID': 'ubuntu', 'PRETTY_NAME': 'Ubuntu 22.04.5 LTS', 'VERSION_ID': '22.04', 'VERSION': '22.04.5 LTS (Jammy Jellyfish)', 'VERSION_CODENAME': 'jammy', 'ID_LIKE': 'debian', 'HOME_URL': 'https://www.ubuntu.com/', 'SUPPORT_URL': 'https://help.ubuntu.com/', 'BUG_REPORT_URL': 'https://bugs.launchpad.net/ubuntu/', 'PRIVACY_POLICY_URL': 'https://www.ubuntu.com/legal/terms-and-policies/privacy-policy', 'UBUNTU_CODENAME': 'jammy'}
mac version: ('', ('', '', ''), '')
uname: uname_result(system='Linux', node='win11ml', release='5.15.167.4-microsoft-standard-WSL2', version='#1 SMP Tue Nov 5 00:21:55 UTC 2024', machine='x86_64')
architecture: ('64bit', 'ELF')
machine: x86_64

== are we in docker ================================================
No

== c++ compiler ====================================================
/usr/bin/c++
c++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


== check pips ======================================================
numpy                   2.1.3
protobuf                5.29.3
tf_nightly              2.20.0.dev20250314

== check for virtualenv ============================================
Running inside a virtual environment.

== tensorflow import ===============================================
2025-03-14 21:02:48.002965: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
W0000 00:00:1742007769.198398  317963 gpu_device.cc:2429] TensorFlow was not built with CUDA kernel binaries compatible with compute capability 12.0. CUDA kernels will be jit-compiled from PTX, which could take 30 minutes or longer.
W0000 00:00:1742007769.202246  317963 gpu_device.cc:2429] TensorFlow was not built with CUDA kernel binaries compatible with compute capability 12.0. CUDA kernels will be jit-compiled from PTX, which could take 30 minutes or longer.
I0000 00:00:1742007769.355021  317963 gpu_device.cc:2018] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 29043 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 5090, pci bus id: 0000:09:00.0, compute capability: 12.0

tf.version.VERSION = 2.20.0-dev20250314
tf.version.GIT_VERSION = v1.12.1-123444-g07ff428d432
tf.version.COMPILER_VERSION = Ubuntu Clang 18.1.8 (++20240731024944+3b5b5c1ec4a3-1~exp1~20240731145000.144)

Sanity check: <tf.Tensor: shape=(1,), dtype=int32, numpy=array([1], dtype=int32)>
libcudnn not found

== env =============================================================
LD_LIBRARY_PATH /usr/local/cuda-12.8/lib64:
DYLD_LIBRARY_PATH is unset

== nvidia-smi ======================================================
Fri Mar 14 21:02:52 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.06             Driver Version: 572.70         CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 5090        On  |   00000000:09:00.0 Off |                  N/A |
|  0%   43C    P1             78W /  600W |    2115MiB /  32607MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A              31      G   /Xwayland                             N/A      |
|    0   N/A  N/A              35      G   /Xwayland                             N/A      |
+-----------------------------------------------------------------------------------------+

== cuda libs =======================================================
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudart_static.a
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudart.so.11.8.89
/usr/local/cuda-12.8/targets/x86_64-linux/lib/libcudart_static.a
/usr/local/cuda-12.8/targets/x86_64-linux/lib/libcudart.so.12.8.90

== tensorflow installation =========================================
tensorflow not found

== tf_nightly installation =========================================
Name: tf_nightly
Version: 2.20.0.dev20250314
Summary: TensorFlow is an open source machine learning framework for everyone.
Home-page: https://www.tensorflow.org/
Author-email: packages@tensorflow.org
License: Apache 2.0
Location: /home/mitch/.virtualenvs/tfnightie/lib/python3.10/site-packages
Required-by:

== python version ==================================================
(major, minor, micro, releaselevel, serial)
(3, 10, 12, 'final', 0)

== bazel version ===================================================
Bazelisk version: v1.25.0
Build label: 7.4.1
Build time: Mon Nov 11 21:24:53 2024 (1731360293)
Build timestamp: 1731360293
Build timestamp as int: 1731360293

Standalone code to reproduce the issue

Try running anything with an RTX 5090. My test script is above.

Relevant log output

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions