
fix: resolve TensorFlow keras lazy loader recursion error #3126


Closed
wants to merge 1 commit into from


rkuester
Contributor

Set TF_USE_LEGACY_KERAS=1 to force TensorFlow 2.19.0 to use tf_keras
instead of Keras 3.x if the build environment has both keras 3.10.0 and
tf_keras 2.19.0 installed. This prevents TensorFlow's lazy loader from
entering infinite recursion when trying to determine which Keras version
to use for tf.keras.models.Sequential().

This fixes runtime_test which was failing with:
"RecursionError: maximum recursion depth exceeded in comparison"
at line checking "if self._tfll_keras_version == 'keras_3'"

BUG=#3125
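A minimal sketch of the `TF_USE_LEGACY_KERAS` workaround described above, as a script might apply it. Per the Keras compatibility notes linked in the commit message, the variable tells TensorFlow 2.16+ to resolve `tf.keras` to the separately installed `tf_keras` package. We assume it is safest to set it before `tensorflow` is first imported. The helper name `force_legacy_keras` is hypothetical; the PR's actual change may be structured differently.

```python
import os
import sys


def force_legacy_keras() -> None:
    """Ask TensorFlow 2.16+ to resolve tf.keras to the tf_keras package.

    Hypothetical helper: sets TF_USE_LEGACY_KERAS=1 in this process's
    environment. Assumption: it must happen before TensorFlow is imported,
    so refuse to run if TensorFlow is already loaded.
    """
    if "tensorflow" in sys.modules:
        raise RuntimeError(
            "set TF_USE_LEGACY_KERAS before importing tensorflow")
    os.environ["TF_USE_LEGACY_KERAS"] = "1"
```

Call it at the very top of the script, before `import tensorflow as tf`; `tf.keras.models.Sequential()` should then come from `tf_keras` rather than Keras 3.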

@rkuester rkuester requested a review from a team as a code owner June 24, 2025 22:56
@rkuester rkuester requested review from suleshahid and removed request for suleshahid June 24, 2025 22:58
Set TF_USE_LEGACY_KERAS=1 to force TensorFlow 2.19.0 to use
tf_keras instead of Keras 3.x if the environment has Keras 3.x
installed. This workaround prevents TensorFlow's lazy loader
from infinitely recursing when trying to determine which
Keras version to use. See:

    https://keras.io/getting_started/#tensorflow--keras-2-backwards-compatibility.

This fixes generate_test_models.py which was failing with:
"RecursionError: maximum recursion depth exceeded in comparison"
at line checking "if self._tfll_keras_version == 'keras_3'"

BUG=tensorflow#3125
@rkuester
Contributor Author

Oops, must ensure that tf_keras is installed.

Have I mentioned how much I dislike our non-hermetic Python setup in Bazel? I might fix that soon.
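Since the workaround only helps if `tf_keras` is actually importable, a guard along these lines checks for the package with `importlib.util.find_spec` before opting in. This is a sketch; `is_installed` and `maybe_use_legacy_keras` are hypothetical names. Note that `find_spec` only locates a top-level module; it does not execute it.

```python
import importlib.util
import os


def is_installed(module_name: str) -> bool:
    """True if the top-level module can be found on the current sys.path.

    find_spec locates the module without executing it, so a package that
    would crash at import time still counts as "installed" here.
    """
    return importlib.util.find_spec(module_name) is not None


def maybe_use_legacy_keras() -> bool:
    """Hypothetical guard: opt into tf_keras only if it is actually present.

    Returns True if TF_USE_LEGACY_KERAS was set, False otherwise.
    """
    if not is_installed("tf_keras"):
        return False
    os.environ["TF_USE_LEGACY_KERAS"] = "1"
    return True
```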

@suleshahid
Collaborator

Thanks Ryan. Should we also add this for all other files that use Keras? Or does the issue only occur for this one?

@suleshahid
Collaborator

Also, is it not an issue for CI because we have only one Keras version in those testing environments?

@rkuester
Contributor Author

rkuester commented Jul 1, 2025

> Thanks Ryan. Should we also add this for all other files that use Keras? Or does the issue only occur for this one?

AFAICT, the problem only rears its head when Keras >= 3 is visible alongside (older versions of?) TensorFlow. I'm not 100% sure of the circumstances in which this happens. Perhaps it only happens with some of our Bazel targets due to varying dependency sets and Bazel's sandboxing 🤷‍♂️.

I had Keras 3 in my environment because I was doing model compression with the TensorFlow Model Optimization Toolkit, which has a dependency on Keras 3.

Maybe the workaround in this PR isn't the right solution. I can think of at least two other possibilities:

  1. Upgrade the version of TF that Bazel's Python environment brings in, and see if that fixes the incompatibility.
  2. Configure the Bazel Python toolchain to be hermetic, so the user's environment can't affect it.

Those seemed like changes with possibly farther-reaching impact, hence my affinity for the workaround. I'll experiment with it a bit more and follow up.
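One small ingredient of option 2 is keeping the per-user site-packages directory (on Linux typically under `~/.local`) off `sys.path`. The sketch below is an illustration, not a Bazel toolchain configuration: it starts child interpreters with different flags and reports the `sys.path` each one sees. `-s` skips the user site directory, and `-I` (isolated mode) implies both `-s` and `-E`. The helper name `child_sys_path` is hypothetical.

```python
import json
import site
import subprocess
import sys

# Per-user site-packages for this interpreter. At startup it is added to
# sys.path only if the directory exists and user site is enabled.
USER_SITE = site.getusersitepackages()


def child_sys_path(*flags: str) -> list:
    """Return sys.path as seen by a fresh interpreter started with `flags`."""
    result = subprocess.run(
        [sys.executable, *flags, "-c",
         "import json, sys; print(json.dumps(sys.path))"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)
```

For example, `USER_SITE in child_sys_path("-I")` should be `False`, while `USER_SITE in child_sys_path()` is `True` whenever that directory exists, which is the situation that let `psutil` leak in here.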

@rkuester
Contributor Author

rkuester commented Jul 1, 2025

Ugh.

Turns out I was wrong about the underlying problem. The workaround does indeed avoid importing Keras, but that "fixes" a different underlying problem. The underlying problem is that Keras imports the module psutil for whatever reason, and psutil has a crashing bug. See the backtrace below.

psutil is imported if and only if it happens to be in the environment, and it happened to be in my environment. Uninstalling psutil fixes everything without the TF_USE_LEGACY_KERAS=1 workaround.
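"Imported if and only if it happens to be in the environment" is the usual optional-dependency pattern. The sketch below assumes Keras guards its `import psutil` roughly like `optional_import` (we haven't confirmed the exact code). It shows why such a guard doesn't help here: it only swallows `ImportError`, so a package that is present but raises something else while importing, such as psutil's `AttributeError`, still crashes the importer. `broken_dep` is a hypothetical throwaway module written to a temporary directory.

```python
import importlib
import sys
import tempfile
from pathlib import Path
from types import ModuleType
from typing import Optional, Tuple


def optional_import(name: str) -> Optional[ModuleType]:
    """Import `name` if available; return None only when it is missing.

    Only ImportError (including ModuleNotFoundError) is swallowed. Any
    other exception raised while the module executes propagates.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def demo() -> Tuple[Optional[ModuleType], str]:
    """Show both outcomes: a missing module, and a present-but-broken one."""
    missing = optional_import("no_such_module_for_this_demo")
    with tempfile.TemporaryDirectory() as tmp:
        # broken_dep mimics psutil's failure: it touches an attribute that
        # its `signal` module lacks while being imported.
        Path(tmp, "broken_dep.py").write_text(
            "import signal\nsignal.NoSuchAttribute\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            optional_import("broken_dep")
            crash = "no error"
        except AttributeError as exc:
            crash = str(exc)
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("broken_dep", None)
    return missing, crash
```

Calling `demo()` returns `None` for the missing module and the `AttributeError` message for `broken_dep`, mirroring the backtrace: only removing the broken package (or fixing it) makes the optional import behave.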

There appears to be no problem with TensorFlow coexisting with Keras >= 3. In fact, Keras >= 3 has been in our Bazel Python dependencies alongside TensorFlow 2.18 for some time now without any issues.

I will withdraw this PR, and open or reopen an issue about configuring Bazel's Python toolchain to be hermetic.

Backtrace:

[....]
  File "/tmp/bazel-working-directory/tflite_micro/bazel-out/k8-fastbuild/bin/python/tflite_micro/runtime_test.runfiles/tflm_pip_deps_keras/site-packages/keras/src/saving/saving_lib.py", line 37, in <module>
    import psutil
  File "/home/rkuester/.local/lib/python3.10/site-packages/psutil/__init__.py", line 95, in <module>
    from . import _pslinux as _psplatform
  File "/home/rkuester/.local/lib/python3.10/site-packages/psutil/_pslinux.py", line 25, in <module>
    from . import _psposix
  File "/home/rkuester/.local/lib/python3.10/site-packages/psutil/_psposix.py", line 50, in <module>
    'Negsignal', {x.name: -x.value for x in signal.Signals}
AttributeError: module 'signal' has no attribute 'Signals'
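The paths in the backtrace show the non-hermeticity directly: `keras` comes from Bazel's runfiles, while `psutil` is loaded from `/home/rkuester/.local/lib/python3.10/site-packages`, the per-user site directory. A minimal sketch for spotting such leaks from inside a test process follows. The helper names `modules_loaded_from` and `user_site_leaks` are hypothetical, and the check only sees modules that have already been imported.

```python
import site
import sys
from pathlib import Path
from types import ModuleType
from typing import List, Mapping


def modules_loaded_from(directory: str,
                        modules: Mapping[str, ModuleType]) -> List[str]:
    """Sorted names of modules whose __file__ lives under `directory`."""
    root = Path(directory)
    hits = []
    # Copy the items first: imports can mutate sys.modules while we iterate.
    for name, module in list(modules.items()):
        # Built-in and namespace modules may have no __file__ (or None).
        filename = getattr(module, "__file__", None)
        if filename and Path(filename).is_relative_to(root):
            hits.append(name)
    return sorted(hits)


def user_site_leaks() -> List[str]:
    """Modules imported from the per-user site-packages (e.g. ~/.local)."""
    return modules_loaded_from(site.getusersitepackages(), sys.modules)
```

Printing `user_site_leaks()` at the end of a Bazel test would, in a setup like the one above, list `psutil` and its submodules. `Path.is_relative_to` requires Python 3.9+.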


3 participants