Fix missing Keras variables shim #3907

Merged: 1 commit, May 4, 2023

nvcastet
Collaborator

Checklist before submitting

  • Did you read the contributor guide?
  • Did you update the docs?
  • Did you write any tests to validate this change?
  • Did you update the CHANGELOG, if this change affects users?

Description

Fixes # (issue).

Review process to land

  1. All tests and other checks must succeed.
  2. At least one member of the technical steering committee must review and approve.
  3. If any member of the technical steering committee requests changes, they must be addressed.

Signed-off-by: Nicolas Castet <ncastet@nvidia.com>
@nvcastet nvcastet requested a review from maxhgerlach April 28, 2023 22:46
@github-actions

github-actions bot commented Apr 29, 2023

Unit Test Results

Files: 1 155 (-74); suites: 1 155 (-74); duration: 13h 46m 45s (-21m 12s)
Tests: 887 (±0): 832 passed (+39), 54 skipped (±0), 1 failed (-39)
Runs: 25 708 (-1 909): 18 290 passed (-969), 7 416 skipped (-730), 2 failed (-210)

For more details on these failures, see this check.

Results for commit fa9cc27. ± Comparison against base commit 39c8f7c.

♻️ This comment has been updated with latest results.

@github-actions

github-actions bot commented Apr 29, 2023

Unit Test Results (with flaky tests)

Files: 1 386 (-289); suites: 1 386 (-289); duration: 15h 49m 3s (-1h 29m 9s)
Tests: 887 (±0): 831 passed (+39), 54 skipped (±0), 2 failed (-39)
Runs: 30 600 (-7 865): 21 707 passed (-4 041), 8 888 skipped (-3 304), 5 failed (-520)

For more details on these failures, see this check.

Results for commit fa9cc27. ± Comparison against base commit 39c8f7c.

♻️ This comment has been updated with latest results.

@maxhgerlach
Collaborator

Triggered the GPU pipelines to run again. The previous failures look like flakiness. Let's wait for those to pass.

@maxhgerlach
Collaborator

This time the GPU head pipeline mostly passed, except for Spark Lightning MNIST, which has been failing there for a while.

Re-triggered the regular GPU pipeline once more.

@maxhgerlach
Collaborator

Hmm, it looks as if download.pytorch.org is not very reliable at the moment.

Looking in links: https://download.pytorch.org/whl/cu101/torch_stable.html
WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7fba0576c310>: Failed to establish a new connection: [Errno -2] Name or service not known')': /whl/cu101/torch_stable.html
WARNING: Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7fba0574ad90>: Failed to establish a new connection: [Errno -2] Name or service not known')': /whl/cu101/torch_stable.html
WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7fba0576c4d0>: Failed to establish a new connection: [Errno -2] Name or service not known')': /whl/cu101/torch_stable.html
WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7fba0576c810>: Failed to establish a new connection: [Errno -2] Name or service not known')': /whl/cu101/torch_stable.html
WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7fba0576cb50>: Failed to establish a new connection: [Errno -2] Name or service not known')': /whl/cu101/torch_stable.html
ERROR: Could not find a version that satisfies the requirement torch==1.5.0+cu101 (from versions: 1.0.0, 1.0.1, 1.0.1.post2, 1.1.0, 1.2.0, 1.3.0, 1.3.1, 1.4.0, 1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.10.2, 1.11.0, 1.12.0, 1.12.1, 1.13.0, 1.13.1)
ERROR: No matching distribution found for torch==1.5.0+cu101

Some builds fail with errors like this one.
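
For illustration only: the log above shows pip counting down its own retries (total=4 down to total=0) before giving up on a DNS error. A build step can wrap a flaky download in a similar loop with a growing delay between attempts. The sketch below is not part of this PR or of Horovod's build scripts; `retry_with_backoff` and its parameters are hypothetical names, and it only helps when the outage is shorter than the total wait.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation` until it succeeds or `attempts` calls have failed.

    Hypothetical helper: only OSError (which includes ConnectionError and
    DNS lookup failures) is treated as transient. The delay doubles after
    each failed attempt: base_delay, 2 * base_delay, 4 * base_delay, ...
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except OSError:
            if attempt == attempts - 1:
                # Out of attempts: surface the last error to the caller.
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

The `sleep` parameter exists so the waiting can be replaced in tests; a real build step would leave it at the default.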

@maxhgerlach maxhgerlach added this to the v0.28.0 milestone May 2, 2023
@maxhgerlach
Collaborator

Some of the builds keep failing on Buildkite with errors downloading from https://download.pytorch.org/. @EnricoMi, does this look like something we can address in any way from our end?

@EnricoMi
Collaborator

EnricoMi commented May 3, 2023

As long as the link is valid, there is nothing we can do about it. I reran the build jobs and the server seems to be fixed.

@maxhgerlach
Collaborator

Thank you

@maxhgerlach maxhgerlach merged commit 67ea042 into horovod:master May 4, 2023

3 participants