-
-
Notifications
You must be signed in to change notification settings - Fork 413
feat: sync from unfinalized checkpoint state #8527
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
**Motivation** - we already have local checkpoint states in either db or file, this PR use one of them to save syncing time the next time the node is restarted - see #7504 (comment) **Description** 2 new init unfinalized state options: - new `--lastPersistedCheckpointState`, true by default to load from the last safe persisted checkpoint state. And the last safe persisted checkpoint state is considered unfinalized - use `--checkpointState` option: this is old option and support finalized state only, this PR supports booting from unfinalized state similar to `--lastPersistedCheckpointState` ==> to quickly sync a new node, we can use this option. Then remove it and the next time node will use `--lastPersistedCheckpointState` option by default - we can configure one of our nodes with `chain.nHistoricalStatesFileDataStore = true` - then feed other nodes with a persisted "safe checkpoint state" from there in `~/checkpoint_states` folder - a persisted checkpoint state is consider to be safe to boot if - it should be the checkpoint state that's unique in its epoch - its last processed block slot should be at epoch boundary or last slot of previous epoch - state slot should be at epoch boundary - state slot should be equal to epoch * SLOTS_PER_EPOCH other options to boot from `stateArchived` or `checkpointSyncUrl` are considered finalized states including `wssCheckpoint`. It's not possible to use `wssCheckpoint` option with unfinalized state for now. **TODO** - this is for `holesky-rescue`, consider supporting this for `unstable` branch too - update document in that case --------- Co-authored-by: Tuyen Nguyen <twoeths@users.noreply.github.com>
**Motivation** - implement an api to get a node synced asap **Description** - new api: `eth/v1/lodestar/persisted_checkpoint_state` to return a state based on an optional `rootHex:epoch` param - if not specified, return the latest safe checkpoint state - a node need to specify `--checkpointState` from the previous PR #7509 **Test** - [x] `curl -H "Accept: application/octet-stream" http://localhost:9596/eth/v1/lodestar/persisted_checkpoint_state -o latest_checkpoint_state.ssz` - [x] `curl -H "Accept: application/octet-stream" http://localhost:9596/eth/v1/lodestar/persisted_checkpoint_state?checkpoint_id=0x4f4d4c1b81141fe77a4a1c6a376dbe64ed9baa8f123664195e4f710c9fc4238d:118936 -o state_epoch_118936.ssz` --------- Co-authored-by: Tuyen Nguyen <twoeths@users.noreply.github.com> Co-authored-by: Nico Flaig <nflaig@protonmail.com>
…ied (#7542) We have added a new api in #7541 to serve latest checkpoint states. However the problem is that `--checkpointState` right now assumes that we receive a ssz-serialized state bytes, so we also need to ensure that the REST API only returns SSZ responses by applying the correct Accept header. This PR includes the following changes - Ensure we only receive SSZ responses if REST API is queried - Consider `--forceCheckpointSync` flag and if set always prioritze remote checkpoint from `--checkpointState` over latest safe checkpoint in db - Minor tweaks to logs to make it easier to understand what the node is doing - Misc formatting changes
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## unstable #8527 +/- ##
============================================
- Coverage 52.18% 51.98% -0.20%
============================================
Files 852 852
Lines 65317 65703 +386
Branches 4793 4800 +7
============================================
+ Hits 34083 34156 +73
- Misses 31165 31478 +313
Partials 69 69 🚀 New features to boost your workflow:
|
Performance Report✔️ no performance regression detected Full benchmark results
|
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a significant feature: the ability to sync from an unfinalized checkpoint state. This is particularly useful for faster node startup and recovery, especially during periods of non-finality. The implementation is comprehensive, touching upon API definitions, CLI options, state initialization logic, and the core fork choice mechanism. The introduction of initializeForkChoiceFromUnfinalizedState with its proto-array scaffolding is a clever approach to handle this complex scenario. The code is well-structured, and the new "unsafe" options are appropriately hidden and documented with warnings. Overall, this is a solid implementation of a complex feature. I have a couple of minor suggestions for code quality and performance improvements.
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a significant and well-implemented feature to allow syncing from an unfinalized checkpoint state. The changes are comprehensive, touching upon CLI options, API definitions, chain initialization, and the core fork choice logic. The implementation is robust, with a clear separation between finalized and unfinalized startup paths, and includes a clever mechanism to find the latest safe persisted checkpoint. The new complex logic, particularly in fork choice initialization, appears sound and is a great addition for faster node startup in certain scenarios. I have one minor suggestion regarding a misleading comment, but overall, this is an excellent contribution.
| return block; | ||
| } | ||
|
|
||
| getFinalizedBlockSlot(): Slot { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a little strange considering its just the finalized epoch start slot and not actually the finalized block slot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe getFinalizedCheckpointSlot()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed it to getFinalizedCheckpointSlot, could also consider getAnchorBlockSlot
|
🎉 This PR is included in v1.36.0 🎉 |
Motivation
This was a feature we developed for rescuing Holesky as part of #7501 to quickly sync nodes to head during a period of long non-finality (~3 weeks). While it's unlikely we will have such a long period of non-finality on mainnet, this feature is still useful to have for much shorter periods and testing purposes on devnets.
It is now part of Ethereum protocol hardening mitigations described here
We will support this with the exception that our checkpoint state needs to be an epoch boundary checkpoint.
Description
The main feature of this PR is to allow initializing a node from an unfinalized checkpoint state either retrieved locally or from a remote source.
This behavior is disabled by default but can be enabled by either adding
--lastPersistedCheckpointStateflag to load from the last safe persisted checkpoint state stored locally--unsafeCheckpointStateto provide a file path or url to an unfinalized checkpoint state to start syncing from which can be used with new endpointGET /eth/v1/lodestar/persisted_checkpoint_stateto sync from a remote node or by sharing states fromcheckpoint_statesfolderBoth of these options are not safe to use on a network that recently finalized an epoch and must only be considered if syncing from last finalized checkpoint state is unfeasible.
An unfinalized checkpoint state persisted locally is only considered to be safe to boot if
epoch * SLOTS_PER_EPOCHBut even if these criteria are met, there is chance that the node will end up on a minority chain as it will not be able to pivot to another chain that conflicts with the checkpoint state it was initialized from.
Other existing flags (like
--checkpointState) are unchanged by this PR and will continue to expect a finalized checkpoint state.Previous PRs #7509, #7541, #7542 not merged to unstable are included.
Closes #7963
cc @twoeths