First cluster implementation #2441
Conversation
Hi @svagner, thanks a lot for the contribution. 🙌 Is it worth creating an issue first to discuss the approach and keep this PR for the technical discussion?
Force-pushed from be36983 to 25386eb.
I'll create an issue then. We already have a couple of them, but there has been no movement on clustering. There is also some discussion in the previous MR, but it still doesn't hurt to have a separate issue for the discussion.
The cluster would only have one 'leader' at a time; all other nodes are followers (so this is an implementation of a model with one master and multiple standby nodes). The 'master' node executes the checks and sends notifications; 'follower' nodes do neither (they run with the 'no-checks' and 'quiet-mode' options enabled). This also adds a new (optional) dependency, raftdb, to store state and perform leader election.
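A minimal sketch of what gating on leadership could look like, assuming the cluster layer wraps hashicorp/raft (which exposes `Raft.State()` and the `raft.Leader` state); the `Cluster` type and the `runChecks`/`sendNotifications` callbacks are hypothetical names for illustration, not taken from the PR:

```go
// Sketch: only the leader executes checks and sends notifications.
package cluster

import (
	"github.com/hashicorp/raft"
)

// Cluster wraps the raft node used for leader election (hypothetical type).
type Cluster struct {
	node *raft.Raft
}

// IsLeader reports whether this node currently holds leadership.
func (c *Cluster) IsLeader() bool {
	return c.node.State() == raft.Leader
}

// Tick runs one scheduler iteration: the leader does the work,
// followers return early (as if 'no-checks' and 'quiet-mode' were set).
func (c *Cluster) Tick(runChecks, sendNotifications func()) {
	if !c.IsLeader() {
		return
	}
	runChecks()
	sendNotifications()
}
```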
For now, we look at a global variable that is initialized once at startup. If we want the flexibility to restart the scheduler (config/API reload, clustering, etc.), it should instead be the scheduler's start time.
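A minimal sketch of that suggestion, assuming a hypothetical `Scheduler` type; the field and method names are illustrative, not from the PR:

```go
package scheduler

import "time"

// Scheduler owns its own start time instead of reading a package-level
// global, so restarting it (e.g. after a config reload) resets it naturally.
type Scheduler struct {
	startedAt time.Time
}

// Start records the moment this scheduler instance began running.
func (s *Scheduler) Start() {
	s.startedAt = time.Now()
	// ... launch check loops ...
}

// Uptime is measured from the scheduler's start, not the process start.
func (s *Scheduler) Uptime() time.Duration {
	return time.Since(s.startedAt)
}
```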
Force-pushed from 72beb61 to d54b595.
The new implementation is in #2472.
Description
A solution for issue #2443.
Same as #2345, with conflicts resolved and some new API endpoints.
The following new API endpoints were added:
POST /api/cluster/recover_cluster — used to manually force a new configuration in order to recover from a loss of quorum where the current configuration cannot be restored, such as when several servers die at the same time. This works by reading all the current state for this server, creating a snapshot with the supplied configuration, and then truncating the Raft log. This is the only safe way to force a given configuration without actually altering the log to insert any new entries, which could cause conflicts with other servers with a different state.
WARNING! This operation implicitly commits all entries in the Raft log, so in general this is an extremely unsafe operation. If you've lost your other servers and are performing a manual recovery, then you've also lost the commit information, so this is likely the best you can do, but you should be aware that calling this can cause Raft log entries that were in the process of being replicated, but not yet committed, to become committed.
Example:
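The original example did not survive extraction; below is a minimal sketch of invoking the endpoint with Go's standard library. The server address and the JSON payload shape (a list of surviving peers) are assumptions for illustration, not confirmed by the PR:

```go
// Hypothetical call to the recovery endpoint; address and payload
// shape are assumptions.
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

func main() {
	// Assumed payload: the new cluster configuration to force.
	body := []byte(`{"peers": [{"id": "node-1", "address": "10.0.0.1:8080"}]}`)

	resp, err := http.Post(
		"http://localhost:8080/api/cluster/recover_cluster",
		"application/json",
		bytes.NewReader(body),
	)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```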
POST /api/cluster/change_master — move leadership to another node in the cluster. Example:
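Again, the original example is missing; under the same assumptions as the previous sketch, a call might look like this (the `target` field naming the node that should receive leadership is hypothetical):

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

func main() {
	resp, err := http.Post(
		"http://localhost:8080/api/cluster/change_master",
		"application/json",
		bytes.NewReader([]byte(`{"target": "node-2"}`)), // assumed payload
	)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```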
Type of change
From the following, please check the options that are relevant.
How has this been tested?
Checklist: