Motivation
In a multi-controller deployment, triggering failover based strictly on a single controller's local probes is risky. If that controller experiences partial network degradation to a Kvrocks node while its peers still observe a healthy state, it may initiate an unwarranted failover, leading to a split-brain scenario and unnecessary traffic shedding.
Implementation
- Voting mechanism: Each controller independently probes every Kvrocks node. Before promoting a new master, the leader collects unanimous votes from all peer controllers via POST /internal/vote. A single NO (or unreachable peer with a live lease) blocks the failover.
- Peer discovery: Controllers register themselves in the store with a heartbeat. ListActivePeers returns peers whose leases are still alive. Expired peers are excluded from quorum rather than blocking failover.