-
-
Notifications
You must be signed in to change notification settings - Fork 414
fix: refactor validateColumnsByRangeResponse #8482
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
eb8b77f to
af8b5db
Compare
Performance Report✔️ no performance regression detected Full benchmark results
|
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## unstable #8482 +/- ##
============================================
- Coverage 52.18% 51.93% -0.26%
============================================
Files 852 848 -4
Lines 65317 65732 +415
Branches 4793 4792 -1
============================================
+ Hits 34083 34135 +52
- Misses 31165 31528 +363
Partials 69 69 🚀 New features to boost your workflow:
|
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request fixes two bugs related to range downloads and refactors the column validation logic to be more robust against out-of-order or incomplete data from peers. The changes look solid and significantly improve the validation heuristics. I've found one area where the error reporting could be clearer, but overall this is a great improvement.
**Motivation** - reviewing #8482 - the review was getting quite detailed, seemed to be more direct to just PR into it **Description** - review of validateColumnsByRangeResponse - simplify the duplicate/order checking loop - use a map to deduplicate the various cases - minor updates to the index checking loop - reuse map from ^, use non-nullish array --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
wemeetagain
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, will leave open for more confirmation from others
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request effectively addresses two crashes observed in devnet logs by adding safety checks for empty sidecar arrays in cacheByRangeResponses and empty block arrays in validateBlockByRangeResponse. The core of the change is a significant and well-executed refactoring of validateColumnsByRangeResponse to improve validation of column sidecars. The new implementation is more robust, checking for duplicates and correct (slot, index) ordering as per the spec, and handles partial responses more gracefully.
My review includes a few suggestions. I've identified a high-severity bug in the new column ordering validation logic that would prevent it from correctly detecting out-of-order indices. I've also included a couple of medium-severity suggestions to improve the readability of error messages for better debugging. Overall, this is a solid improvement to the sync process's stability and correctness.
| if (currentSlot !== slot) { | ||
| // a new slot has started, reset index | ||
| currentIndex = -1; | ||
| } else { | ||
| currentIndex = columnSidecar.index; | ||
| } | ||
| currentSlot = slot; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic for updating currentSlot and currentIndex is flawed and will fail to detect out-of-order column indices within a slot.
Specifically, on the first item of a new slot, currentIndex is not updated to that item's index. In the next iteration for the same slot, the check currentIndex > columnSidecar.index will use an incorrect (stale or reset) value for currentIndex, causing the out-of-order check to fail.
The logic can be simplified to correctly update the state for the next iteration's check.
if (currentSlot !== slot) {
currentSlot = slot;
}
currentIndex = columnSidecar.index;There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the issue I raised in #8560. Left as is it will cause extra errors, fixing to the suggestion will bury out of order errors. Kinda not sure which way to go but open to suggestions. I tend to think leaving this unless anyone thinks it should be changed to the above.
nflaig
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
didn't do another pass but all my previous comments are addressed
| if (expectedColumns === 0) { | ||
| continue; | ||
| let blobCount: number; | ||
| if (!isForkPostFulu(forkName)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we're checking fork of block slot, looks like it should belong to validateBlockByRangeResponse() logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct, there should not be a case where this happens. I had it with a default parameter but @nflaig wanted it fork gated so I added that check and threw the error. It will likely not ever get thrown though unless there is a code bug. The conditional will be needed below though as handling of columns will be different isPostGloas
| /** | ||
| * If no columns are found for a block and there are commitments on the block then stop checking and just | ||
| * return early. Even if there were columns returned for subsequent slots that doesn't matter because | ||
| * we will be re-requesting them again anyway. Leftovers just get ignored |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this comment here
I expect this function to either throw error, or return result with some warnings
the below break breaks that contract/assumption, it only returns some first blocks of the whole response
I suppose in this case we just need to throw error instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the by-range code can pull partial epochs now. So its possible to pull part of the epoch and then on the next go around the rest of the epoch gets downloaded
|
🎉 This PR is included in v1.36.0 🎉 |
Motivation
Found two small bugs while looking through devnet logs
and
There is a bug in
validateColumnsByRangeResponsethat is passing through an empty array ofblockColumnSidecarson this linelodestar/packages/beacon-node/src/sync/utils/downloadByRange.ts
Line 770 in bca2040
cacheByRangeResponses. While going through the validation with a fine toothed comb I noticed that there were a couple of conditions that we are not checking per spec.Scope
Changed heuristic for how columns are validated in ByRange and added in checks for column delivery in
(slot, column_index)order.