`tf.linalg.matrix_rank` produces inconsistent output on CPU vs GPU with `tol=6`

### Issue type

Bug

### Have you reproduced the bug with TensorFlow Nightly?

Yes

### Source

source

### TensorFlow version

2.20.0-dev20250715

### Custom code

Yes

### OS platform and distribution

Linux Ubuntu 20.04

### Mobile device

_No response_

### Python version

3.12

### Bazel version

_No response_

### GCC/compiler version

_No response_

### CUDA/cuDNN version

_No response_

### GPU model and memory

_No response_

### Current behavior?

Running `tf.linalg.matrix_rank` on a `float64` tensor with `tol=6` produces different values on CPU and GPU. On the release versions `2.18.0` and `2.19.0`, neither the CPU nor the GPU output matched `numpy` (see [colab](https://colab.research.google.com/drive/1IMkKpTyy8jRpnK5V0IZv0jV9SSIHI7on?usp=sharing)). On nightly (`2.20.0-dev20250715`), the GPU output matches `numpy`, but the CPU output does not.

### Standalone code to reproduce the issue

```python
import tensorflow as tf
import numpy as np

print(tf.__version__)   # 2.20.0-dev20250715

a = tf.ones((48, 74), dtype=tf.float64) * -88917319269045.
tol = 6.

with tf.device('/cpu:0'):
    output_cpu = tf.linalg.matrix_rank(a, tol=tol)

with tf.device('/gpu:0'):
    output_gpu = tf.linalg.matrix_rank(a, tol=tol)

output_np = np.linalg.matrix_rank(a.numpy(), tol=tol)

print("CPU output:", output_cpu)        # 4
print("GPU output:", output_gpu)        # 1
print("NumPy output:", output_np)       # 1
```

### Relevant log output

```shell
2.20.0-dev20250715
CPU output: tf.Tensor(4, shape=(), dtype=int32)
GPU output: tf.Tensor(1, shape=(), dtype=int32)
NumPy output: 1
```
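
A possible way to narrow this down is to look at the singular values directly. The NumPy-only sketch below rebuilds the same matrix and compares `tol=6` against the default tolerance documented for `np.linalg.matrix_rank` (`S.max() * max(M, N) * eps`). The variable names are illustrative and it does not call TensorFlow, so it only suggests, rather than confirms, why the backends disagree: if `tol=6` sits below the rounding-noise scale of the SVD, the count of singular values above it may depend on the SVD implementation.

```python
import numpy as np

# Same matrix as in the reproduction: every entry equals `scale`.
rows, cols = 48, 74
scale = -88917319269045.0
a = np.ones((rows, cols), dtype=np.float64) * scale

# Singular values, returned in descending order.
s = np.linalg.svd(a, compute_uv=False)
sigma_max = s[0]

# A constant matrix has exact rank 1, with one nonzero singular value
# equal to |scale| * sqrt(rows * cols).
expected_sigma_max = abs(scale) * np.sqrt(rows * cols)

# Default tolerance documented for np.linalg.matrix_rank when tol is None.
default_tol = sigma_max * max(rows, cols) * np.finfo(np.float64).eps

print("largest singular value:   ", sigma_max)
print("expected largest value:   ", expected_sigma_max)
print("default NumPy tolerance:  ", default_tol)
print("remaining singular values:", s[1:5])  # rounding noise, backend dependent
```

If the default tolerance printed here is well above 6, the remaining singular values are most likely rounding noise rather than signal, and `tol=6` would be counting noise that differs between SVD backends.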
