-
Notifications
You must be signed in to change notification settings - Fork 901
Refine delay jitter for exponential backoff #7206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refine delay jitter for exponential backoff #7206
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #7206 +/- ##
=========================================
Coverage 89.93% 89.93%
- Complexity 6676 6678 +2
=========================================
Files 750 750
Lines 20168 20176 +8
Branches 1978 1978
=========================================
+ Hits 18138 18146 +8
Misses 1435 1435
Partials 595 595 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Is this fixing the problem discussed in #7004? |
yeah, looks like it's the same issue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new logic for computing backoff looks good. Just wondering why touch the other logic, which needs to be pretty carefully considered.
long currentBackoffNanos = | ||
Math.min(nextBackoffNanos, retryPolicy.getMaxBackoff().toNanos()); | ||
long backoffNanos = (long) (randomJitter.get() * currentBackoffNanos); | ||
nextBackoffNanos = (long) (currentBackoffNanos * retryPolicy.getBackoffMultiplier()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should also update the implementation in JdkHttpSender.
I know its not ideal that there are two implementations.. Maybe worth adding a utility function to RetryPolicy
that computes the backoff for a given attempt N
. Signature might look like:
public long computeBackoffNanosForAttempt(int attempt, Random randomSource) {...}
It wouldn't be as efficient as the current implementation, but...
- its such a tiny amount of compute that who cares
- the compute is trivial compared to the overall cost of preparing and executing and HTTP request
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, Added code for JdkHttpSender
I did not consider adding that method to calculate a backoff delay time
looking at the code I would say we can build more abstractions for sending requests and checking responses and exceptions, but not sure it's really helpful so let's probably move on with duplicated approach as we had before
try { | ||
response = chain.proceed(chain.request()); | ||
if (response != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the motivation for changing this part of the logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if response is null and no exception happened, the code fails in throw exception;
line because exception is null. when response is null it's not something transient
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose this is possible but I haven't seen response null in practice. If it does occur, we can simply add a null check immediately after response = chain.proceed(chain.request())
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's exactly what I did in this PR 😃
also null check was before this change so I intentionally keep it (but I suspect I can just drop it)
also previous code has the issue - if previous(before last) attempt returned rertryable response but the last attempt gets retryable exception method still returns the previous response which is not good as the last state (exception) should be returned
I replied in threads, tell me please if there are still any unanswered questions |
Thanks! |
Adjusted the jitter calculation to improve the randomness and distribution of delays in the exponential backoff logic.
current implementation is not based on exponential backoff but more uses randomly generated numbers in range (0, upperBound) and only upperBound exponentially grows, so in some cases we generate relatively low delays for retries

Also there was an existing link
https://github.com/grpc/proposal/blob/master/A6-client-retries.md#exponential-backoff in code to implementation of exponential backoff but actual source code was different