Inconsistent NotEqual broadcasting behavior between CPU and GPU (CPU fails silently, GPU raises error) #97227

@pras529

Description

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

Yes

Source

source

TensorFlow version

2.19.0

Custom code

Yes

OS platform and distribution

No response

Mobile device

No response

Python version

Python 3.12

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

When tf.raw_ops.NotEqual is called with two tensors whose shapes are not broadcastable, the behavior is inconsistent between the CPU and GPU implementations.

The GPU implementation detects the invalid input and raises an InvalidArgumentError, which is the expected behavior when the shapes cannot be broadcast to a common shape.
The CPU implementation raises no error and instead returns a misleading scalar value (tf.Tensor(True, shape=(), dtype=bool)) when the op is called with incompatible_shape_error=False.
This violates the principle of device consistency: the same operation with the same inputs should yield the same result, or the same error, on every device. The GPU's strict error handling is preferable because it prevents silent bugs in user code.

Failing loudly on invalid inputs is crucial for preventing silent errors and difficult-to-debug numerical issues. The CPU implementation should be updated to match the GPU and raise an error when this operation receives non-broadcastable shapes.
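
To make "not broadcastable" concrete, here is a minimal pure-Python sketch of the usual NumPy-style broadcasting rule, applied to the shapes used in the reproduction. The helper name `broadcast_shape` is hypothetical and is not part of TensorFlow. It only illustrates the rule and is not the check either kernel actually performs.

```python
from itertools import zip_longest
from typing import Optional, Sequence, Tuple


def broadcast_shape(a: Sequence[int], b: Sequence[int]) -> Optional[Tuple[int, ...]]:
    """Return the broadcast shape of a and b, or None if they are incompatible.

    Hypothetical helper for illustration; not a TensorFlow API.
    """
    result = []
    # Compare dimensions from the trailing end, padding the shorter shape with 1s.
    for da, db in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            # Sizes differ and neither is 1, so no common shape exists.
            return None
    return tuple(reversed(result))


# Shapes from the reproduction: the second-to-last dimensions are 4 and 3.
print(broadcast_shape((4, 1), (1, 28, 2, 3, 2)))
# A compatible variant for comparison.
print(broadcast_shape((3, 1), (1, 28, 2, 3, 2)))
```

The first call returns None because 4 and 3 differ and neither is 1. Within TensorFlow, tf.broadcast_static_shape performs a comparable check on TensorShape objects and raises a ValueError for incompatible shapes, so user code could call it before the op as a device-independent guard.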

Standalone code to reproduce the issue

import numpy as np
import tensorflow as tf

# Set seed for reproducibility
np.random.seed(202)

# Generate input tensors with non-broadcastable shapes
# x.shape = (4, 1)
# y.shape = (1, 28, 2, 3, 2)
x = np.random.uniform(-32767., 127., size=(4, 1)).astype(np.float32)
y = np.random.uniform(0., 89., size=(1, 28, 2, 3, 2)).astype(np.float32)

# Convert to TensorFlow tensors
x_tensor = tf.constant(x, dtype=tf.float32)
y_tensor = tf.constant(y, dtype=tf.float32)

# --- CPU Execution ---
# This runs without error and produces a misleading result
try:
    with tf.device("/CPU:0"):
        result_cpu = tf.raw_ops.NotEqual(
            x=x_tensor,
            y=y_tensor,
            incompatible_shape_error=False,
            name="selu_cpu",
        )
    print("CPU Result:", result_cpu)
except Exception as e:
    print("CPU Error:", e)


# --- GPU Execution ---
# This correctly fails with an InvalidArgumentError
try:
    with tf.device("/GPU:0"):
        result_gpu = tf.raw_ops.NotEqual(
            x=x_tensor,
            y=y_tensor,
            incompatible_shape_error=False,
            name="selu_gpu",
        )
    print("GPU Result:", result_gpu)
except Exception as e:
    print("\nGPU Error:", e)

Relevant log output

**CPU Output:**
CPU Result: tf.Tensor(True, shape=(), dtype=bool)

**GPU Output:**
GPU Error: {{function_node __wrapped__NotEqual_device_/job:localhost/replica:0/task:0/device:GPU:0}} required broadcastable shapes [Op:NotEqual] name: selu_gpu
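
The comparison in the logs above can be turned into an automated consistency check. The sketch below is one possible shape for such a check, not an existing TensorFlow utility. The names `outcome_per_device`, `is_consistent`, and `fake_not_equal` are hypothetical, and `fake_not_equal` is a stand-in that merely imitates the reported behavior so the example runs without TensorFlow or a GPU.

```python
from typing import Callable, Dict, Sequence, Tuple

Outcome = Tuple[str, str]


def outcome_per_device(run: Callable[[str], object],
                       devices: Sequence[str]) -> Dict[str, Outcome]:
    """Call run(device) for each device and record either the result or the error type."""
    outcomes: Dict[str, Outcome] = {}
    for device in devices:
        try:
            outcomes[device] = ("ok", repr(run(device)))
        except Exception as exc:
            outcomes[device] = ("error", type(exc).__name__)
    return outcomes


def is_consistent(outcomes: Dict[str, Outcome]) -> bool:
    """True when every device produced the same result or the same error type."""
    return len(set(outcomes.values())) <= 1


def fake_not_equal(device: str) -> bool:
    """Stand-in (assumption): imitates the behavior described in this report."""
    if device == "/GPU:0":
        raise ValueError("required broadcastable shapes")
    return True


outcomes = outcome_per_device(fake_not_equal, ("/CPU:0", "/GPU:0"))
print(outcomes)
print("consistent:", is_consistent(outcomes))
```

With TensorFlow installed, `run` could instead wrap the tf.raw_ops.NotEqual call from the reproduction inside `with tf.device(device):`, so that a regression test fails whenever the devices disagree.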
