Hello everyone,
I would like to resuscitate a very old issue. Actually, it is so old that even GitHub's autocompletion doesn't offer it after typing "#": #216. This request has been raised several times, but it still hasn't been resolved.
In short, TensorFlow's broadcasting interface is not "good enough" :)
Let's first check how broadcasting works in NumPy:
```
In [1]: import numpy as np

In [2]: a = np.random.rand(2, 3, 4)

In [3]: b = np.random.rand(4, 5)

In [4]: a @ b
Out[4]:
array([[[1.42709275, 1.40630067, 0.46525725, 0.68734581, 0.65227036],
        [2.01336504, 1.59980866, 0.93739699, 0.63190484, 0.92472892],
        [1.82979902, 1.46193243, 0.85498406, 0.5994646 , 0.77767957]],

       [[1.83010035, 1.49088728, 0.76694665, 0.65568003, 0.89110954],
        [2.12214864, 1.41728107, 1.04566743, 0.60652825, 0.97115822],
        [2.32478779, 2.06297214, 1.02016205, 0.81821249, 1.02604722]]])
```
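For reference, the rule NumPy applies here: `np.matmul` treats the last two axes as matrix dimensions and broadcasts all leading axes, so the single `(4, 5)` matrix is implicitly reused across `a`'s leading `(2, 3)` batch shape:

```python
import numpy as np

a = np.random.rand(2, 3, 4)
b = np.random.rand(4, 5)

# b has no leading axes, so it is broadcast across a's (2, 3) batch shape;
# the product contracts a's last axis (4) with b's first axis (4).
assert (a @ b).shape == (2, 3, 5)
```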
Now, let's check what TF has to offer:
```
In [25]: a = tf.random.normal((2, 3, 4))

In [26]: b = tf.random.normal((4, 5))

In [27]: a @ b
...
InvalidArgumentError: In[0] is not a matrix. Instead it has shape [2,3,4] [Op:MatMul] name: matmul/
```
Ouch! The "correct" way of doing it in TF (of course there are others; one is sketched further below) is:
```
In [26]: a = tf.random.normal((2, 3, 4))

In [27]: b = tf.random.normal((4, 5))

In [28]: a @ tf.broadcast_to(b, tf.concat([a.shape[:-2], b.shape], axis=0))
Out[28]:
<tf.Tensor: id=87, shape=(2, 3, 5), dtype=float32, numpy=
array([[[ 1.1977772 , -1.363074  ,  1.8021748 ,  0.1448586 , -0.6269997 ],
        [ 1.2322128 , -2.1586194 ,  0.09486479,  0.02937585,  0.9694344 ],
        [ 0.5580032 ,  6.11664   , -0.24535722,  0.16691092, -2.2263217 ]],

       [[-0.7386743 ,  1.2142425 ,  1.1371945 , -1.2736351 , -2.971829  ],
        [-1.9222848 , -0.7198772 , -0.9807504 ,  0.02805561,  1.0210879 ],
        [ 1.8334148 ,  0.80895233,  1.2308785 , -0.23910654, -1.5128168 ]]], dtype=float32)>
```
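One of those other ways, for the record, is `tf.einsum` with explicit batch indices, which sidesteps the manual broadcast entirely. A minimal sketch (the index labels are arbitrary):

```python
import tensorflow as tf

a = tf.random.normal((2, 3, 4))
b = tf.random.normal((4, 5))

# 'abc,cd->abd': contract a's last axis with b's first axis; the leading
# batch axes of the left operand pass through untouched.
c = tf.einsum('abc,cd->abd', a, b)
assert c.shape == (2, 3, 5)
```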
You can see how much effort it takes to make an operation broadcast over two distinct tensors: extract the leading (batch) shape from the left tensor, take the full shape of the right tensor, concatenate the two along the correct axis, and finally call tf.broadcast_to. A minimal wrapper hiding this boilerplate is sketched below.
...
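Here is a sketch of such a wrapper; `broadcast_matmul` is a hypothetical name, not an existing TF API:

```python
import tensorflow as tf

def broadcast_matmul(a, b):
    """Hypothetical helper: matmul an [..., m, k] tensor with a [k, n] matrix
    by materializing b across a's leading (batch) dimensions."""
    # Leading shape of the left operand + full shape of the right operand.
    target_shape = tf.concat([tf.shape(a)[:-2], tf.shape(b)], axis=0)
    return a @ tf.broadcast_to(b, target_shape)

a = tf.random.normal((2, 3, 4))
b = tf.random.normal((4, 5))
assert broadcast_matmul(a, b).shape == (2, 3, 5)
```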
The same applies to Cholesky, triangular solve, and other linear-algebra operations (see the triangular-solve sketch below). It is very upsetting that such a crucial feature isn't available out of the box.
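For example, sharing a single right-hand side across a batch of systems with tf.linalg.triangular_solve needs the same dance, since the batch dimensions are not broadcast for you. A sketch with made-up shapes:

```python
import tensorflow as tf

# A [2, 3]-batch of 4x4 lower-triangular systems and one shared rhs.
chol = tf.linalg.cholesky(tf.eye(4, batch_shape=[2, 3]))  # [2, 3, 4, 4]
rhs = tf.random.normal((4, 1))                            # [4, 1]

# Calling tf.linalg.triangular_solve(chol, rhs) directly fails with the same
# shape complaint, so rhs must be materialized across the batch dims first.
rhs_b = tf.broadcast_to(
    rhs, tf.concat([tf.shape(chol)[:-2], tf.shape(rhs)], axis=0))
x = tf.linalg.triangular_solve(chol, rhs_b)               # [2, 3, 4, 1]
```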
Another concern is the performance of these "solutions", e.g. the memory consumption of the tile and broadcast_to operations: they physically copy the tensor to match the leading dimensions. (For the matmul case above, a copy-free reshape trick is sketched below.) Of course, a native TensorFlow broadcasting implementation would be preferable here.
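For the matmul case above, where only the left operand carries batch dimensions, a reshape avoids materializing any broadcast copy of b. A sketch of that workaround:

```python
import tensorflow as tf

a = tf.random.normal((2, 3, 4))
b = tf.random.normal((4, 5))

# Flatten the batch dimensions into the row dimension, do one plain matmul,
# and restore the batch shape afterwards: no broadcast copy of b is created.
flat = tf.reshape(a, [-1, tf.shape(a)[-1]]) @ b           # [6, 5]
c = tf.reshape(flat, tf.concat([tf.shape(a)[:-1], tf.shape(b)[-1:]], axis=0))
assert c.shape == (2, 3, 5)
```

This only works because b has no batch dimensions of its own; the general case still needs real broadcasting support in the kernels.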
Kind regards,
Artem Artemev