@grodowski
Contributor

Based on experience, if the writer database fails in between the copy and cutover stages (e.g. during a cutover pause), the heartbeat writes fail and stop. The migration then enters a throttled state and loops forever on throttler.shouldThrottle().

Since this state is irrecoverable, make the heartbeat writer panic if retries are exhausted, so that the migration can fail and be restarted later.

We've tested this in production now without issues, but if there's some desired behaviour that this PR changes, we might consider adding a new CLI option to control it - feedback welcome.

Closes #1569 (issue has some more context too)

  • contributed code is using same conventions as original code
  • script/cibuild returns with no formatting errors, build errors or unit test errors.

Copilot AI review requested due to automatic review settings August 6, 2025 11:34
Contributor

Copilot AI left a comment

Pull Request Overview

This PR addresses an infinite loop issue in gh-ost where heartbeat write failures during database unavailability lead to unrecoverable throttled states. The change makes the heartbeat writer panic when retries are exhausted, allowing the migration to fail and be restarted later instead of looping indefinitely.

  • Adds panic mechanism when heartbeat injection fails after exhausting retries
  • Prevents infinite throttling loops during database failures between copy and cutover stages

@meiji163
Contributor

Good catch @grodowski, I believe I have seen this happen as well.

Development

Successfully merging this pull request may close these issues.

gh-ost process stuck infinitely after writer database failure

2 participants