TensorFlow was not built with CUDA kernel binaries compatible with compute capability 12.0 CUDA_ERROR_INVALID_HANDLE

### Issue type

Bug

### Have you reproduced the bug with TensorFlow Nightly?

Yes

### Source

binary

### TensorFlow version

tf-nightly-2.21.0.dev20250722

### Custom code

No

### OS platform and distribution

Ubuntu 20.04

### Mobile device

no

### Python version

3.11

### Bazel version

_No response_

### GCC/compiler version

_No response_

### CUDA/cuDNN version

12.8.1/9.8

### GPU model and memory

RTX 5080, 16 GB

### Current behavior?

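TensorFlow logs the warning below when it creates the GPU device. The first GPU op (the `Cast` triggered by calling the `Dense` layer) then fails: `cuModuleLoadData` returns `CUDA_ERROR_INVALID_PTX`, and the subsequent `cuLaunchKernel` fails with `CUDA_ERROR_INVALID_HANDLE`.

WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
W0000 00:00:1753232013.685876   24341 gpu_device.cc:2431] TensorFlow was not built with CUDA kernel binaries compatible with compute capability 12.0. CUDA kernels will be jit-compiled from PTX, which could take 30 minutes or longer.

CUDA_ERROR_INVALID_HANDLE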

### Standalone code to reproduce the issue

```python
import tensorflow as tf
import numpy as np
from tensorflow.python.client import device_lib
import keras

print("Keras version: ", keras.__version__)

print(device_lib.list_local_devices())

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
tensor = tf.convert_to_tensor(x)

print("Tensor: ", tensor)

# ========================================== define model ======================================
input_data = keras.Input(shape=(8, 1))

# Data Encoder: this call raises the InternalError shown in the log below (Op:Cast on GPU:0)
dx = keras.layers.Dense(16, activation='relu')(input_data)

print("dx", dx.shape)
```

### Relevant log output

```text
Keras version:  3.10.0.dev2025072204
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
W0000 00:00:1753232219.958168   25258 gpu_device.cc:2431] TensorFlow was not built with CUDA kernel binaries compatible with compute capability 12.0. CUDA kernels will be jit-compiled from PTX, which could take 30 minutes or longer.
I0000 00:00:1753232220.029881   25258 gpu_device.cc:2020] Created device /device:GPU:0 with 11546 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 5080, pci bus id: 0000:01:00.0, compute capability: 12.0
W0000 00:00:1753232220.033014   25258 gpu_device.cc:2431] TensorFlow was not built with CUDA kernel binaries compatible with compute capability 12.0. CUDA kernels will be jit-compiled from PTX, which could take 30 minutes or longer.
W0000 00:00:1753232220.035555   25258 gpu_device.cc:2431] TensorFlow was not built with CUDA kernel binaries compatible with compute capability 12.0. CUDA kernels will be jit-compiled from PTX, which could take 30 minutes or longer.
I0000 00:00:1753232220.037159   25258 gpu_device.cc:2020] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 11546 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 5080, pci bus id: 0000:01:00.0, compute capability: 12.0
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 13363326776403279234
xla_global_id: -1
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 12107055104
locality {
  bus_id: 1
  links {
  }
}
incarnation: 7207726466463925696
physical_device_desc: "device: 0, name: NVIDIA GeForce RTX 5080, pci bus id: 0000:01:00.0, compute capability: 12.0"
xla_global_id: 416903419
]
Tensor:  tf.Tensor(
[[1 2 3]
 [4 5 6]
 [7 8 9]], shape=(3, 3), dtype=int64)
2025-07-23 10:57:00.115747: W tensorflow/compiler/mlir/tools/kernel_gen/tf_gpu_runtime_wrappers.cc:40] 'cuModuleLoadData(&module, data)' failed with 'CUDA_ERROR_INVALID_PTX'

2025-07-23 10:57:00.115757: W tensorflow/compiler/mlir/tools/kernel_gen/tf_gpu_runtime_wrappers.cc:40] 'cuModuleGetFunction(&function, module, kernel_name)' failed with 'CUDA_ERROR_INVALID_HANDLE'

2025-07-23 10:57:00.115761: W tensorflow/core/framework/op_kernel.cc:1842] INTERNAL: 'cuLaunchKernel(function, gridX, gridY, gridZ, blockX, blockY, blockZ, 0, reinterpret_cast<CUstream>(stream), params, nullptr)' failed with 'CUDA_ERROR_INVALID_HANDLE'
2025-07-23 10:57:00.115766: I tensorflow/core/framework/local_rendezvous.cc:407] Local rendezvous is aborting with status: INTERNAL: 'cuLaunchKernel(function, gridX, gridY, gridZ, blockX, blockY, blockZ, 0, reinterpret_cast<CUstream>(stream), params, nullptr)' failed with 'CUDA_ERROR_INVALID_HANDLE'
Traceback (most recent call last):
  File "/home/mike/catkin_ws2/src/mypy311/scripts/tftest.py", line 19, in <module>
    dx = keras.layers.Dense(16, activation='relu')(input_data)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mike/PycharmProjects/py311/.venv/lib/python3.11/site-packages/keras/src/utils/traceback_utils.py", line 122, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/mike/PycharmProjects/py311/.venv/lib/python3.11/site-packages/keras/src/backend/tensorflow/core.py", line 152, in convert_to_tensor
    return tf.cast(x, dtype)
           ^^^^^^^^^^^^^^^^^
tensorflow.python.framework.errors_impl.InternalError: {{function_node __wrapped__Cast_device_/job:localhost/replica:0/task:0/device:GPU:0}} 'cuLaunchKernel(function, gridX, gridY, gridZ, blockX, blockY, blockZ, 0, reinterpret_cast<CUstream>(stream), params, nullptr)' failed with 'CUDA_ERROR_INVALID_HANDLE' [Op:Cast] name: 

Process finished with exit code 1
```
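For anyone triaging a similar report, here is a minimal sketch of how the mismatch in the log can be checked. It parses the compute capability from the device description and compares it against a list of architectures a build ships kernels for. The helper names (`parse_compute_capability`, `has_native_kernels`, `local_build_capabilities`) and the `example_build` list are hypothetical and for illustration only. Reading the list from `tf.sysconfig.get_build_info()` assumes the build exposes a `cuda_compute_capabilities` key, which may not hold for every build.

```python
import re


def parse_compute_capability(device_desc):
    """Return (major, minor) parsed from a physical_device_desc string, or None."""
    match = re.search(r"compute capability: (\d+)\.(\d+)", device_desc)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def has_native_kernels(capability, built_capabilities):
    """Check whether any sm_XY entry is binary-compatible with the device.

    A cubin built for X.y runs on devices X.z with z >= y. compute_XY entries
    are PTX only and still need JIT compilation, so they are ignored here.
    """
    major, minor = capability
    for entry in built_capabilities:
        m = re.fullmatch(r"sm_(\d+)(\d)", entry)
        if m and int(m.group(1)) == major and int(m.group(2)) <= minor:
            return True
    return False


def local_build_capabilities():
    """Read the capability list from the installed TensorFlow build.

    Assumption: GPU builds expose a "cuda_compute_capabilities" key; an empty
    list is returned when the key is absent.
    """
    import tensorflow as tf  # imported lazily so the helpers above work without TF

    return list(tf.sysconfig.get_build_info().get("cuda_compute_capabilities", []))


# Device description copied from the log output above.
desc = ("device: 0, name: NVIDIA GeForce RTX 5080, pci bus id: 0000:01:00.0, "
        "compute capability: 12.0")
# Hypothetical build list for illustration only; not taken from this nightly.
example_build = ["sm_80", "sm_89", "compute_90"]

capability = parse_compute_capability(desc)
print(capability, has_native_kernels(capability, example_build))
```

With the hypothetical list this prints `(12, 0) False`, which corresponds to the "not built with CUDA kernel binaries compatible with compute capability 12.0" warning. On a machine with TensorFlow installed, passing `local_build_capabilities()` instead of `example_build` would show what the installed wheel actually reports.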

TensorFlow was not built with CUDA kernel binaries compatible with compute capability 12.0 CUDA_ERROR_INVALID_HANDLE #97387