TensorFlow on RTX 5090

### Issue type

Bug

### Have you reproduced the bug with TensorFlow Nightly?

Yes

### Source

binary

### TensorFlow version

2.20.0.dev20250314

### Custom code

No

### OS platform and distribution

Windows 11 - WSL2 - Ubuntu 22.04.5 LTS

### Mobile device

_No response_

### Python version

3.10.12

### Bazel version

7.4.1

### GCC/compiler version

gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0

### CUDA/cuDNN version

CUDA Version: 12.8

### GPU model and memory

RTX 5090 32GB

### Current behavior?

I had hoped that TensorFlow would work on the RTX 5090. Sadly, it does not. Building from source didn't work, and neither did running the environment script. Bash is my primary programming language, though, so I was at least able to tidy that script up here:

https://github.com/tensorflow/tensorflow/pull/89271

But I wasn't able to get TensorFlow running. I had a similar issue with PyTorch, which needed CUDA 12.8.* to work on the Blackwell cards, but no dice with the nightly build of TensorFlow. Below is my test and its output, followed by the `tf_env.txt` from my patched script.

It may be helpful to know that NVIDIA themselves seem to have it running here:

https://docs.nvidia.com/deeplearning/frameworks/tensorflow-release-notes/rel-25-02.html

But I get the same errors that this other guy does when I try it out:

https://www.reddit.com/r/tensorflow/comments/1iutjoj/tensorflow_2501_cuda_128_rtx_5090_on_wsl2_cuda/

I also found this conversation, which may be helpful. According to the people there, you need to support CUDA 12.8.1 to support Blackwell (aka the RTX 50## series cards):

https://discuss.ai.google.dev/t/building-tensorflow-from-source-for-rtx5000-gpu-series/65171/15


```

(tfnightie) mitch@win11ml:~/stable_diff
$ cat tfnightie/test_2.py
import tensorflow as tf
import time

# Check if TensorFlow sees the GPU
print("TensorFlow version:", tf.__version__)
print("Available GPUs:", tf.config.experimental.list_physical_devices('GPU'))

# Matrix multiplication test
shape = (5000, 5000)
a = tf.random.normal(shape)
b = tf.random.normal(shape)

# Time execution on GPU
with tf.device('/GPU:0'):
    print("Running on GPU...")
    start_time = time.time()
    c = tf.matmul(a, b)
    tf.print("Matrix multiplication (GPU) done.")
    print("Execution time (GPU):", time.time() - start_time, "seconds")

# Time execution on CPU for comparison
with tf.device('/CPU:0'):
    print("Running on CPU...")
    start_time = time.time()
    c = tf.matmul(a, b)
    tf.print("Matrix multiplication (CPU) done.")
    print("Execution time (CPU):", time.time() - start_time, "seconds")




(tfnightie) mitch@win11ml:~/stable_diff
$ python tfnightie/test_2.py
2025-03-14 21:35:33.400099: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
TensorFlow version: 2.20.0-dev20250314
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
W0000 00:00:1742009735.413544  326199 gpu_device.cc:2429] TensorFlow was not built with CUDA kernel binaries compatible with compute capability 12.0. CUDA kernels will be jit-compiled from PTX, which could take 30 minutes or longer.
Available GPUs: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
W0000 00:00:1742009735.417720  326199 gpu_device.cc:2429] TensorFlow was not built with CUDA kernel binaries compatible with compute capability 12.0. CUDA kernels will be jit-compiled from PTX, which could take 30 minutes or longer.
I0000 00:00:1742009735.572153  326199 gpu_device.cc:2018] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 29043 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 5090, pci bus id: 0000:09:00.0, compute capability: 12.0
2025-03-14 21:35:36.969440: W tensorflow/compiler/mlir/tools/kernel_gen/tf_gpu_runtime_wrappers.cc:40] 'cuModuleLoadData(&module, data)' failed with 'CUDA_ERROR_INVALID_PTX'

2025-03-14 21:35:36.969480: W tensorflow/compiler/mlir/tools/kernel_gen/tf_gpu_runtime_wrappers.cc:40] 'cuModuleGetFunction(&function, module, kernel_name)' failed with 'CUDA_ERROR_INVALID_HANDLE'

2025-03-14 21:35:36.969505: W tensorflow/core/framework/op_kernel.cc:1843] INTERNAL: 'cuLaunchKernel(function, gridX, gridY, gridZ, blockX, blockY, blockZ, 0, reinterpret_cast<CUstream>(stream), params, nullptr)' failed with 'CUDA_ERROR_INVALID_HANDLE'
2025-03-14 21:35:36.969533: I tensorflow/core/framework/local_rendezvous.cc:407] Local rendezvous is aborting with status: INTERNAL: 'cuLaunchKernel(function, gridX, gridY, gridZ, blockX, blockY, blockZ, 0, reinterpret_cast<CUstream>(stream), params, nullptr)' failed with 'CUDA_ERROR_INVALID_HANDLE'
Traceback (most recent call last):
  File "/home/mitch/stable_diff/tfnightie/test_2.py", line 10, in <module>
    a = tf.random.normal(shape)
  File "/home/mitch/.virtualenvs/tfnightie/lib/python3.10/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/mitch/.virtualenvs/tfnightie/lib/python3.10/site-packages/tensorflow/python/framework/ops.py", line 6027, in raise_from_not_ok_status
    raise core._status_to_exception(e) from None  # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InternalError: {{function_node __wrapped__Mul_device_/job:localhost/replica:0/task:0/device:GPU:0}} 'cuLaunchKernel(function, gridX, gridY, gridZ, blockX, blockY, blockZ, 0, reinterpret_cast<CUstream>(stream), params, nullptr)' failed with 'CUDA_ERROR_INVALID_HANDLE' [Op:Mul] name:
```
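
The warning in the log says the binary has no kernels built for compute capability 12.0 and falls back to PTX. A minimal sketch for checking that from Python follows. It assumes `tf.sysconfig.get_build_info()` exposes a `cuda_compute_capabilities` list with tags shaped like `sm_89` or `compute_90` (the exact contents may differ per build), and the helper names `parse_capability`, `classify`, and `report` are hypothetical. The matching rules reflect my understanding of CUDA compatibility: an `sm_` binary runs on the same major version with an equal or higher minor, and `compute_` PTX can be JIT-compiled for any equal or newer capability.

```python
def parse_capability(tag):
    """Turn a tag such as 'sm_89' or 'compute_120' into (kind, (major, minor))."""
    kind, _, number = tag.partition("_")
    # Keep digits only, which drops suffixes such as the 'a' in 'sm_90a'.
    digits = "".join(ch for ch in number if ch.isdigit())
    return kind, (int(digits[:-1]), int(digits[-1]))


def classify(built_for, device_cc):
    """Return 'native', 'ptx-jit', or 'unsupported' for a device capability."""
    major, minor = tuple(device_cc)
    parsed = [parse_capability(tag) for tag in built_for]
    # Precompiled binaries: same major version, minor not newer than the device.
    if any(kind == "sm" and cc[0] == major and cc[1] <= minor for kind, cc in parsed):
        return "native"
    # Embedded PTX: any capability that is not newer than the device.
    if any(kind == "compute" and cc <= (major, minor) for kind, cc in parsed):
        return "ptx-jit"
    return "unsupported"


def report():
    """Print how the installed TensorFlow build relates to each visible GPU."""
    import tensorflow as tf  # imported lazily so the helpers work without TensorFlow

    build = tf.sysconfig.get_build_info()
    built_for = list(build.get("cuda_compute_capabilities", []))  # assumed key
    print("Built against CUDA:", build.get("cuda_version"))
    print("Kernel targets:", built_for)
    for gpu in tf.config.list_physical_devices("GPU"):
        details = tf.config.experimental.get_device_details(gpu)
        cc = details.get("compute_capability")
        if cc is not None:
            print(details.get("device_name"), cc, "->", classify(built_for, cc))


if __name__ == "__main__":
    try:
        report()
    except ImportError:
        print("TensorFlow is not installed in this environment.")
```

A `ptx-jit` result for the RTX 5090 would match the warning above. It only says that JIT compilation will be attempted, not that it will succeed, which is where the `CUDA_ERROR_INVALID_PTX` in this log comes from.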

Also, while NVIDIA's site says that the compute capability of the RTX 5090 is "10.0", the card itself reports "12.0". I'm not sure that info will be helpful, but it threw me for a loop:

```

$ cat <<EOF > card_details.cu
#include <cuda_runtime.h>
#include <iostream>

int main() {
    cudaDeviceProp prop;
    int device;

    cudaGetDevice(&device); // Get the current device ID
    cudaGetDeviceProperties(&prop, device); // Get device properties

    size_t free_mem, total_mem;
    cudaMemGetInfo(&free_mem, &total_mem); // Get VRAM usage

    std::cout << "GPU Name: " << prop.name << std::endl;
    std::cout << "Compute Capability: " << prop.major << "." << prop.minor << std::endl;
    std::cout << "VRAM Usage: " << (total_mem - free_mem) / (1024 * 1024) << " MB / " << total_mem / (1024 * 1024) << " MB" << std::endl;

    return 0;
}
EOF



$ nvcc card_details.cu -o card_details && ./card_details
nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
GPU Name: NVIDIA GeForce RTX 5090
Compute Capability: 12.0
VRAM Usage: 1763 MB / 32606 MB
```
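
For anyone who wants to confirm the reported value without compiling anything, here is a rough Python sketch that asks `nvidia-smi` instead. It assumes the installed driver supports the `compute_cap` query field (older drivers may not), and the function names `parse_compute_caps` and `query_gpus` are hypothetical.

```python
import shutil
import subprocess


def parse_compute_caps(csv_text):
    """Parse 'name, major.minor' rows into (name, (major, minor)) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        try:
            name, cap = (field.strip() for field in line.rsplit(",", 1))
            major, minor = cap.split(".")
            gpus.append((name, (int(major), int(minor))))
        except ValueError:
            # Skip rows that are not in the expected shape (e.g. error text).
            continue
    return gpus


def query_gpus():
    """Return the GPUs reported by nvidia-smi, or an empty list if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,compute_cap", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []
    return parse_compute_caps(result.stdout)


if __name__ == "__main__":
    for name, (major, minor) in query_gpus():
        print(f"{name}: compute capability {major}.{minor}")
```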


### tf_env.txt
```

== check python ====================================================
python version: 3.10.12
python branch:
python build version: ('main', 'Feb  4 2025 14:57:36')
python compiler version: GCC 11.4.0
python implementation: CPython


== check os platform ===============================================
os: Linux
os kernel version: #1 SMP Tue Nov 5 00:21:55 UTC 2024
os release version: 5.15.167.4-microsoft-standard-WSL2
os platform: Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.35
freedesktop os release: {'NAME': 'Ubuntu', 'ID': 'ubuntu', 'PRETTY_NAME': 'Ubuntu 22.04.5 LTS', 'VERSION_ID': '22.04', 'VERSION': '22.04.5 LTS (Jammy Jellyfish)', 'VERSION_CODENAME': 'jammy', 'ID_LIKE': 'debian', 'HOME_URL': 'https://www.ubuntu.com/', 'SUPPORT_URL': 'https://help.ubuntu.com/', 'BUG_REPORT_URL': 'https://bugs.launchpad.net/ubuntu/', 'PRIVACY_POLICY_URL': 'https://www.ubuntu.com/legal/terms-and-policies/privacy-policy', 'UBUNTU_CODENAME': 'jammy'}
mac version: ('', ('', '', ''), '')
uname: uname_result(system='Linux', node='win11ml', release='5.15.167.4-microsoft-standard-WSL2', version='#1 SMP Tue Nov 5 00:21:55 UTC 2024', machine='x86_64')
architecture: ('64bit', 'ELF')
machine: x86_64

== are we in docker ================================================
No

== c++ compiler ====================================================
/usr/bin/c++
c++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


== check pips ======================================================
numpy                   2.1.3
protobuf                5.29.3
tf_nightly              2.20.0.dev20250314

== check for virtualenv ============================================
Running inside a virtual environment.

== tensorflow import ===============================================
2025-03-14 21:02:48.002965: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
W0000 00:00:1742007769.198398  317963 gpu_device.cc:2429] TensorFlow was not built with CUDA kernel binaries compatible with compute capability 12.0. CUDA kernels will be jit-compiled from PTX, which could take 30 minutes or longer.
W0000 00:00:1742007769.202246  317963 gpu_device.cc:2429] TensorFlow was not built with CUDA kernel binaries compatible with compute capability 12.0. CUDA kernels will be jit-compiled from PTX, which could take 30 minutes or longer.
I0000 00:00:1742007769.355021  317963 gpu_device.cc:2018] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 29043 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 5090, pci bus id: 0000:09:00.0, compute capability: 12.0

tf.version.VERSION = 2.20.0-dev20250314
tf.version.GIT_VERSION = v1.12.1-123444-g07ff428d432
tf.version.COMPILER_VERSION = Ubuntu Clang 18.1.8 (++20240731024944+3b5b5c1ec4a3-1~exp1~20240731145000.144)

Sanity check: <tf.Tensor: shape=(1,), dtype=int32, numpy=array([1], dtype=int32)>
libcudnn not found

== env =============================================================
LD_LIBRARY_PATH /usr/local/cuda-12.8/lib64:
DYLD_LIBRARY_PATH is unset

== nvidia-smi ======================================================
Fri Mar 14 21:02:52 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.124.06             Driver Version: 572.70         CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 5090        On  |   00000000:09:00.0 Off |                  N/A |
|  0%   43C    P1             78W /  600W |    2115MiB /  32607MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A              31      G   /Xwayland                             N/A      |
|    0   N/A  N/A              35      G   /Xwayland                             N/A      |
+-----------------------------------------------------------------------------------------+

== cuda libs =======================================================
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudart_static.a
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudart.so.11.8.89
/usr/local/cuda-12.8/targets/x86_64-linux/lib/libcudart_static.a
/usr/local/cuda-12.8/targets/x86_64-linux/lib/libcudart.so.12.8.90

== tensorflow installation =========================================
tensorflow not found

== tf_nightly installation =========================================
Name: tf_nightly
Version: 2.20.0.dev20250314
Summary: TensorFlow is an open source machine learning framework for everyone.
Home-page: https://www.tensorflow.org/
Author-email: packages@tensorflow.org
License: Apache 2.0
Location: /home/mitch/.virtualenvs/tfnightie/lib/python3.10/site-packages
Required-by:

== python version ==================================================
(major, minor, micro, releaselevel, serial)
(3, 10, 12, 'final', 0)

== bazel version ===================================================
Bazelisk version: v1.25.0
Build label: 7.4.1
Build time: Mon Nov 11 21:24:53 2024 (1731360293)
Build timestamp: 1731360293
Build timestamp as int: 1731360293
```

### Standalone code to reproduce the issue

```shell
Try running anything with an RTX 5090. My test script is above.
```

### Relevant log output

```shell

```
