
[determinism] Add softmax/cross-entropy op exceptions for GPU determinism #47925


Conversation

duncanriach
Contributor

@duncanriach duncanriach commented Mar 19, 2021

High-Level Summary

This PR adds and tests the following functionality:

When the environment variable TF_DETERMINISTIC_OPS is set to "true" or "1", an attempt to run the following ops on a GPU will throw tf.errors.UnimplementedError (with an understandable message).

tf.nn.softmax_cross_entropy_with_logits
tf.nn.sparse_softmax_cross_entropy_with_logits
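
For illustration, here is a minimal Python sketch of the expected behavior, assuming a visible GPU and a TensorFlow build that includes this change; the shapes are arbitrary and the exact error message is not shown:

```python
import os
# Must be set before TensorFlow initializes its kernels.
os.environ["TF_DETERMINISTIC_OPS"] = "1"

import tensorflow as tf

logits = tf.random.normal([4, 10])
labels = tf.one_hot([1, 2, 3, 4], depth=10)

try:
    # On a GPU, this is expected to raise tf.errors.UnimplementedError
    # while no deterministic GPU kernel is available.
    tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)
except tf.errors.UnimplementedError as e:
    print("Deterministic GPU kernel not available:", e)
```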

Please see RFC: Enhancing determinism in TF (being added via tensorflow/community PR 346).

Additional Notes

Data Types

The exceptions will be thrown for all currently GPU-supported data types for the logits input: tf.float16 and tf.float32 for both ops, and, additionally, tf.float64 for tf.nn.softmax_cross_entropy_with_logits.

Exception-throwing is tested for all combinations of the relevant logits data types and labels data types (tf.int32 and tf.int64), in both eager and graph mode, when the op is used in the forward direction.
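
The following is a rough sketch of how such a dtype sweep might look for the sparse op (not the actual test code in this PR; the helper name and shapes are made up for illustration), exercising both eager execution and graph mode via tf.function:

```python
import tensorflow as tf

def expect_unimplemented(logits_dtype, labels_dtype):
    logits = tf.constant([[1.0, 2.0], [3.0, 4.0]], dtype=logits_dtype)
    labels = tf.constant([0, 1], dtype=labels_dtype)

    def run_op():
        return tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=labels, logits=logits)

    # Exercise the op in eager mode, then wrapped in a tf.function (graph mode).
    for fn in (run_op, tf.function(run_op)):
        try:
            with tf.device("/GPU:0"):
                fn()
        except tf.errors.UnimplementedError:
            pass  # expected while TF_DETERMINISTIC_OPS is enabled

for logits_dtype in (tf.float16, tf.float32):
    for labels_dtype in (tf.int32, tf.int64):
        expect_unimplemented(logits_dtype, labels_dtype)
```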

Forward vs Backward

It is currently suspected that the random noise in the gradients passed backward from these ops actually originates in the forward-path algorithm, although the backward-path algorithm might add additional noise. However, because these are loss functions, the backprop path is not, and cannot be, used without the forward-path algorithm also being used. Therefore, exception-throwing specifically on the backward path is not necessary and is neither implemented nor tested in this PR.

When these ops have a fully deterministic mode of operation, the bit-exact reproducibility of the outputs of both the forward and backward paths of the ops should be verified.
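
As an illustration of why guarding the forward path is sufficient, the hypothetical sketch below (not from the PR) computes the loss and its gradient with tf.GradientTape; the forward op runs, and would raise, before any gradient is ever requested:

```python
import tensorflow as tf

logits = tf.Variable(tf.random.normal([4, 10]))
labels = tf.one_hot([1, 2, 3, 4], depth=10)

with tf.GradientTape() as tape:
    # Forward pass: on a GPU with TF_DETERMINISTIC_OPS enabled, this is where
    # tf.errors.UnimplementedError would be raised.
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))

# Backward pass: never reached if the forward pass already threw.
grads = tape.gradient(loss, [logits])
```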

XLA

The tests will not be run with XLA auto-jit enabled because any XLA implementation of these ops will not throw these exceptions.

When a fully deterministic mode for these ops is implemented, the bit-exact reproducibility of the outputs of both the forward and backward paths of the ops should be verified both with and without XLA auto-jit enabled.
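
For reference, here is a hedged sketch of two common ways XLA compilation can be enabled in TF 2.x (the actual tests may use other mechanisms, such as TF_XLA_FLAGS); either route routes the ops through XLA rather than the regular GPU kernels, bypassing the exception added here:

```python
import tensorflow as tf

# Option 1: enable XLA auto-clustering (auto-jit) globally.
tf.config.optimizer.set_jit(True)

# Option 2: request XLA compilation for a specific function
# (named experimental_compile in earlier TF 2.x releases).
@tf.function(jit_compile=True)
def xla_loss(labels, logits):
    return tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)
```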

@google-ml-butler google-ml-butler bot added the size:M CL Change Size: Medium label Mar 19, 2021
@google-cla google-cla bot added the cla: yes label Mar 19, 2021
@duncanriach
Contributor Author

@sanjoy: It would be so cool for this one to get into TensorFlow version 2.5 as well. :-)

@gbaned gbaned self-assigned this Mar 19, 2021
@gbaned gbaned requested review from penpornk and sanjoy and removed request for penpornk March 19, 2021 18:33
Comment on lines 88 to 89
(!RequireDeterminism() ||
DisableSoftmaxXentWithLogitsOpDeterminismExceptions()),
Contributor

Unnecessary parens.

@duncanriach duncanriach requested a review from sanjoy March 19, 2021 21:44
@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Mar 20, 2021
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Mar 20, 2021
@gbaned gbaned added the kokoro:force-run Tests on submitted change label Mar 21, 2021
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Mar 21, 2021
@google-ml-butler google-ml-butler bot removed the ready to pull PR ready for merge process label Mar 22, 2021
@duncanriach duncanriach requested a review from sanjoy March 22, 2021 21:21
@@ -58,6 +86,17 @@ class SoftmaxXentWithLogitsOp : public OpKernel {
"2-dimensional, or broadcasted to be "
"2-dimensional"));

if (std::is_same<Device, GPUDevice>::value) {
Contributor

Is the CPU implementation deterministic?

Contributor Author

@duncanriach duncanriach Mar 23, 2021

We just confirmed that the CPU implementation is deterministic (thanks @wenscarl). I'm considering adding tests to prove/confirm/ensure that in a future PR.
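
A simplified sketch of such a CPU determinism check (hypothetical, not the test referred to here) could run the op twice with identical inputs on the CPU and require bit-exact outputs; a real test would likely also compare against a separate run or a known-deterministic reference:

```python
import tensorflow as tf

logits = tf.random.normal([1024, 1000], seed=123)
labels = tf.random.uniform([1024], maxval=1000, dtype=tf.int64, seed=456)

with tf.device("/CPU:0"):
    a = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)
    b = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)

# Bit-exact equality, not just approximate closeness.
assert bool(tf.reduce_all(tf.equal(a, b)))
```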

@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Mar 23, 2021
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Mar 23, 2021
@duncanriach
Contributor Author

duncanriach commented Mar 24, 2021

@sanjoy, please will it be possible to get this merged before the version 2.5 branch is cut, which I believe will be on March 25 (tomorrow)?

@google-ml-butler google-ml-butler bot added the kokoro:force-run Tests on submitted change label Mar 25, 2021
@sanjoy
Contributor

sanjoy commented Mar 25, 2021

@sanjoy, please will it be possible to get this merged before the version 2.5 branch is cut, which I believe will be on March 25 (tomorrow)?

I just approved it, but I can't guarantee that it will make it through the merge process before the branch cut.

@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Mar 25, 2021
@duncanriach
Contributor Author

My spam filter was acting up and I missed your comment, @sanjoy, sorry. I also missed the internal checks failure. This PR didn't make it into 2.5. Please will you let me know what the feedback/copybara problem is? (I can't see into it).

@sanjoy
Contributor

sanjoy commented Mar 26, 2021

My spam filter was acting up and I missed your comment, @sanjoy, sorry. I also missed the internal checks failure. This PR didn't make it into 2.5. Please will you let me know what the feedback/copybara problem is? (I can't see into it).

Looks like some internal tooling failure, I'll merge it manually. Unfortunately this won't make 2.5, as you noted.

Labels
cla: yes ready to pull PR ready for merge process size:M CL Change Size: Medium
4 participants