Non-deterministic behaviour: tf.math.unsorted_segment_sum uses CUDA Atomic Operations

**System information**
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04.3
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
- TensorFlow installed from (source or binary): binary
- TensorFlow version (use command below): v2.1.0-rc2-17-ge5bf8de and v2.2.0-rc4-8-g2b96f3662b
- Python version: 3.7.7
- Bazel version (if compiling from source):
- GCC/Compiler version (if compiling from source):
- CUDA/cuDNN version: 10.1.105 and 7.6.5.32
- GPU model and memory: RTX6000 24GB

**Describe the current behavior**
Currently, tf.math.unsorted_segment_sum uses non-deterministic GPU kernels, which is a significant obstacle to making TensorFlow deterministic. Other TensorFlow functions, such as tf.gather, use tf.math.unsorted_segment_sum during backprop and therefore inherit the non-determinism.

Affected functions I have found so far:
 - tfa.image.dense_image_warp (on backprop)
 - tf.gather (on backprop)
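To illustrate the tf.gather case: the gradient of tf.gather with respect to a variable is a sparse `tf.IndexedSlices`, and turning it into a dense tensor has to sum the rows of repeated indices. Below is a minimal sketch with hypothetical toy inputs. My understanding is that TensorFlow performs this densification with unsorted_segment_sum, so the explicit call should produce the same tensor.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy inputs: rows 0 and 2 are gathered twice each, row 4 once.
params = tf.Variable(tf.ones([5, 3]))
indices = tf.constant([0, 2, 2, 4, 0])

with tf.GradientTape() as tape:
    loss = tf.reduce_sum(tf.gather(params, indices))

# Gradient w.r.t. a gathered variable: a tf.IndexedSlices holding one row of
# values per gathered index, with duplicate indices not yet combined.
grad = tape.gradient(loss, params)

# Densifying sums the rows that share an index.
dense_grad = tf.convert_to_tensor(grad)

# The same accumulation written explicitly with unsorted_segment_sum.
manual_grad = tf.math.unsorted_segment_sum(
    grad.values, grad.indices, num_segments=params.shape[0])

print(dense_grad.numpy()[:, 0])  # how often each row was gathered: [2. 0. 2. 0. 1.]
```

With large float inputs on GPU, that accumulation is where run-to-run differences in the tf.gather gradient would come from.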

**Describe the expected behavior**

When TF_DETERMINISTIC_OPS=1, tf.math.unsorted_segment_sum should use deterministic GPU kernels so that results are reproducible.

**Who will benefit from this bug fix correction?**
Determinism is an extremely important part of our venture into deep learning as a community. Without determinism, it is hard to reliably tune hyperparameters or conduct other investigations such as ablation studies. Whilst many TensorFlow operations have a deterministic alternative when the environment variable TF_DETERMINISTIC_OPS=1 is set, tf.math.unsorted_segment_sum seems to have slipped under the radar, perhaps because other operations (such as tf.reduce_sum) took priority.

Introducing this level of determinism will make TensorFlow a better candidate for deep learning deployments in more sensitive domains such as medicine. For example, a radiologist should not get one result from a scan and then a different result when the same scan is processed again. Non-determinism also undermines public trust in AI more broadly. ~~As far as I'm aware, PyTorch offers full deterministic capabilities (perhaps due to the benefit of hindsight with TensorFlow not having it).~~

**Standalone code to reproduce the issue**
Code to reproduce the issue:
(Edit: please see the updated code here instead: https://github.com/tensorflow/tensorflow/issues/39751#issuecomment-632590302. It adds seed settings, TF_DETERMINISTIC_OPS, etc., and the issue still reproduces.)

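```
import numpy as np
import tensorflow as tf

num_segments = 4
data = tf.random.normal([30, 256, 256])  # already a tf.Tensor, no tf.constant() needed
# One segment id per element of `data`, so each output entry sums ~1/4 of all elements.
segments = np.random.randint(low=0, high=num_segments, size=tuple(data.shape))

# The inputs are identical on every iteration, so every printed result should match.
for _ in range(5):
    reduced_summed = tf.math.unsorted_segment_sum(data, segments, num_segments)
    print(reduced_summed)
```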
Output:

```
tf.Tensor([-273.92117  380.23163 1279.9718  -839.6437 ], shape=(4,), dtype=float32)
tf.Tensor([-273.92395  380.22168 1279.9834  -839.62573], shape=(4,), dtype=float32)
tf.Tensor([-273.91425  380.22177 1279.9773  -839.62976], shape=(4,), dtype=float32)
tf.Tensor([-273.9177  380.2243 1279.9733 -839.6427], shape=(4,), dtype=float32)
tf.Tensor([-273.91568  380.2217  1279.9747  -839.64166], shape=(4,), dtype=float32)
```

Note: all printed results are different, but they should be identical.
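Why the outputs drift: if each GPU thread atomically adds its element into the output, the order of the additions depends on thread scheduling, and float32 addition is not associative. A minimal NumPy sketch with three hand-picked values shows that grouping alone changes the rounded result:

```python
import numpy as np

# Hand-picked float32 values where the grouping of additions changes the result.
a, b, c = np.float32(1e8), np.float32(-1e8), np.float32(1.0)

left = (a + b) + c   # 0 + 1 -> 1.0
right = a + (b + c)  # -1e8 + 1 rounds back to -1e8 in float32, so the sum is 0.0

print(left, right)  # 1.0 0.0
```

The differences in the output above are much smaller in relative terms, but they arise the same way: same inputs, different accumulation order.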

A Colab notebook with this code is available at: https://colab.research.google.com/drive/1HNHSfERQ_IDDDM7bgabii9TQPufpsutp?usp=sharing

**Unit Tests**
A unit test should check that the code above produces the same result on every iteration of the for loop, rather than a different result each time.

Coming soon.
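Until the real tests land, here is a sketch of such a test, assuming Python's `unittest` and that TF_DETERMINISTIC_OPS is set before TensorFlow is imported. The class and method names are hypothetical. It reuses the shapes from the reproduction script and requires bit-for-bit equality, because an approximate comparison would hide exactly the drift reported here. On CPU it should pass; on the affected GPU setup I would expect it to fail until a deterministic kernel is used.

```python
import os
os.environ["TF_DETERMINISTIC_OPS"] = "1"  # set before TensorFlow is imported

import unittest

import numpy as np
import tensorflow as tf


class UnsortedSegmentSumDeterminismTest(unittest.TestCase):
    """Hypothetical test: repeated calls on identical inputs must match exactly."""

    def test_repeated_calls_are_bitwise_identical(self):
        num_segments = 4
        rng = np.random.default_rng(0)
        data = tf.constant(rng.standard_normal((30, 256, 256)).astype(np.float32))
        segment_ids = tf.constant(rng.integers(0, num_segments, size=(30, 256, 256)))

        expected = tf.math.unsorted_segment_sum(data, segment_ids, num_segments).numpy()
        for _ in range(5):
            result = tf.math.unsorted_segment_sum(data, segment_ids, num_segments).numpy()
            # assert_array_equal demands exact equality, not closeness.
            np.testing.assert_array_equal(result, expected)


if __name__ == "__main__":
    unittest.main(argv=["determinism_test"], exit=False)
```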


**Other info / logs**

The GPU kernels for this op are implemented in https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/segment_reduction_ops_gpu.cu.cc
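A CPU emulation of the atomic-add pattern may help show the mechanism. This sketch assumes that `np.add.at` (which is unbuffered) applies its updates one at a time in index order, so permuting the input order stands in for varying GPU thread scheduling. The helper name `scatter_add_in_order` is hypothetical and is not TensorFlow's kernel.

```python
import numpy as np


def scatter_add_in_order(data, segment_ids, num_segments, order):
    """Hypothetical emulation of atomic adds: accumulate data[k] into
    out[segment_ids[k]] in float32, one element at a time, in `order`."""
    out = np.zeros(num_segments, dtype=np.float32)
    # np.add.at is unbuffered, so repeated segment ids accumulate one by one.
    np.add.at(out, segment_ids[order], data[order])
    return out


rng = np.random.default_rng(0)
num_segments = 4
data = rng.standard_normal(10_000).astype(np.float32)
segment_ids = rng.integers(0, num_segments, size=data.size)

# Same accumulation order every time -> bitwise-identical results.
fixed = np.arange(data.size)
run_a = scatter_add_in_order(data, segment_ids, num_segments, fixed)
run_b = scatter_add_in_order(data, segment_ids, num_segments, fixed)

# A different order per run, as with atomics -> results may differ in the last bits.
shuffled = [scatter_add_in_order(data, segment_ids, num_segments, rng.permutation(data.size))
            for _ in range(3)]

# float64 reference: every float32 run should be close to it, even if not identical.
reference = np.zeros(num_segments)
np.add.at(reference, segment_ids, data.astype(np.float64))

print(run_a, *shuffled, sep="\n")
```

In this model, a deterministic kernel is one that fixes the accumulation order (or the reduction tree) regardless of scheduling, which usually costs some speed compared with unordered atomics.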

More information coming soon
