
Inconsistent behavior for tf.raw_ops.NotEqual between CPU and GPU with non-broadcastable shapes #97204

@jiren-the-gray

Description


Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

Yes

Source

source

TensorFlow version

2.19.0

Custom code

Yes

OS platform and distribution

No response

Mobile device

No response

Python version

Python 3.12

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

When tf.raw_ops.NotEqual is called with two tensors whose shapes are
not broadcastable, the behavior is inconsistent between the CPU and GPU
implementations.

  • The GPU correctly identifies the invalid input and raises an
    InvalidArgumentError, which is the expected behavior for a
    mathematically invalid operation.
  • The CPU, however, fails silently and returns a misleading scalar
    value (tf.Tensor(True, shape=(), dtype=bool)), even when the
    incompatible_shape_error=False flag is used.

This violates the principle of device consistency, where the same
operation with the same inputs should yield the same result or error
across all devices. The GPU's strict error handling is preferable as it
prevents silent bugs in user code.

Expected Behavior

The behavior should be consistent across all devices. The most correct
and safest behavior would be for both the CPU and GPU to raise an
InvalidArgumentError.

Failing loudly on invalid inputs is crucial for preventing silent errors
and difficult-to-debug numerical issues. The CPU implementation should
be updated to match the GPU's stricter and more correct behavior of
erroring out when presented with non-broadcastable shapes for this
operation.
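
For reference, the broadcasting rule that the shapes in this report violate can be sketched in plain Python. This is only an illustration of the usual NumPy-style rule, not TensorFlow's actual implementation; the helper name `broadcast_shape` is hypothetical and does not exist in TensorFlow.

```python
from itertools import zip_longest


def broadcast_shape(shape_a, shape_b):
    """Return the NumPy-style broadcast shape, or raise ValueError.

    Hypothetical helper for illustration only; not a TensorFlow API.
    """
    result = []
    # Compare dimensions starting from the trailing axis. A shape that
    # runs out of dimensions is padded with 1 on the left.
    for a, b in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if a == b or b == 1:
            result.append(a)
        elif a == 1:
            result.append(b)
        else:
            # Two different sizes, neither of which is 1: not broadcastable.
            raise ValueError(
                f"shapes {tuple(shape_a)} and {tuple(shape_b)} are not broadcastable"
            )
    return tuple(reversed(result))


# Shapes from the reproduction script: the second-to-last axes (4 vs. 3)
# differ and neither is 1, so the check fails loudly.
try:
    broadcast_shape((4, 1), (1, 28, 2, 3, 2))
except ValueError as e:
    print("Rejected:", e)
```

Under this rule, x.shape = (4, 1) and y.shape = (1, 28, 2, 3, 2) are incompatible because the aligned dimensions 4 and 3 differ and neither is 1.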

Standalone code to reproduce the issue

import numpy as np
import tensorflow as tf

# Set seed for reproducibility
np.random.seed(202)

# Generate input tensors with non-broadcastable shapes
# x.shape = (4, 1)
# y.shape = (1, 28, 2, 3, 2)
x = np.random.uniform(-32767., 127., size=(4, 1)).astype(np.float32)
y = np.random.uniform(0., 89., size=(1, 28, 2, 3, 2)).astype(np.float32)

# Convert to TensorFlow tensors
x_tensor = tf.constant(x, dtype=tf.float32)
y_tensor = tf.constant(y, dtype=tf.float32)

# --- CPU Execution ---
# This runs without error and produces a misleading result
try:
    with tf.device("/CPU:0"):
        result_cpu = tf.raw_ops.NotEqual(
            x=x_tensor,
            y=y_tensor,
            incompatible_shape_error=False,
            name="selu_cpu",
        )
    print("CPU Result:", result_cpu)
except Exception as e:
    print("CPU Error:", e)


# --- GPU Execution ---
# This correctly fails with an InvalidArgumentError
try:
    with tf.device("/GPU:0"):
        result_gpu = tf.raw_ops.NotEqual(
            x=x_tensor,
            y=y_tensor,
            incompatible_shape_error=False,
            name="selu_gpu",
        )
    print("GPU Result:", result_gpu)
except Exception as e:
    print("\nGPU Error:", e)

Relevant log output

**CPU Output:**


CPU Result: tf.Tensor(True, shape=(), dtype=bool)


**GPU Output:**


GPU Error: {{function_node
__wrapped__NotEqual_device_/job:localhost/replica:0/task:0/device:GPU:0}}
required broadcastable shapes [Op:NotEqual] name: selu_gpu
