Fix: controller: Avoid election storm due to incompatible CIB

Description

The DC accepts a joining node even if its local CIB manager will later
reject the joining node's CIB (for example, due to schema version
incompatibility). This can cause an election storm.

do_dc_join_finalize() calls cib_t:cmds:sync_from() against the joining
node, syncing its CIB across the cluster. However, the DC's CIB manager
may fail to apply the diff, and the DC won't notice until it gets an
error code via the sync callback.

Here, if a joining node has the max generation so far, we verify on the
DC side that we recognize its schema name before accepting its join
request. That eliminates most of the cases in which we would sync the
CIB and then reject the CIB ourselves on the DC.
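
The check described above can be sketched roughly as follows. This is an
illustration only, not the code from this commit: the schema list, the
function names, and the has_max_generation flag are all hypothetical
stand-ins for whatever the controller actually uses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical list of schema names that this DC recognizes. */
static const char *const known_schemas[] = {
    "pacemaker-3.7",
    "pacemaker-3.8",
    "pacemaker-3.9",
};

/* Return true if the given schema name is in the DC's list (hypothetical). */
bool schema_is_known(const char *name)
{
    size_t n = sizeof(known_schemas) / sizeof(known_schemas[0]);

    if (name == NULL) {
        return false;
    }
    for (size_t i = 0; i < n; i++) {
        if (strcmp(name, known_schemas[i]) == 0) {
            return true;
        }
    }
    return false;
}

/*
 * Decide whether to accept a join request (hypothetical helper).
 * Only a node with the max generation so far would have its CIB synced
 * across the cluster, so only that node's schema name is checked.
 */
bool accept_join_request(bool has_max_generation, const char *schema_name)
{
    if (!has_max_generation) {
        return true;
    }
    return schema_is_known(schema_name);
}
```

In this sketch, a node offering the max generation with an unrecognized
schema name is rejected up front, while a node whose CIB would not be
used as the sync source is accepted regardless of its schema name.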

If the CIB sync does fail in the finalize step, then we add the node to
a table of nodes whose CIB syncs have failed, and then we trigger a new
election, this time nacking that node if it sends a join request. The
node remains in the failed sync table until it leaves the cluster (which
should happen after it's nacked).
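
A minimal sketch of the failed sync table, under stated assumptions: the
fixed-size array, the size limits, and every function name below are
hypothetical and chosen to keep the example self-contained. They are not
taken from this commit, which may well use a different data structure.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_FAILED 32   /* hypothetical capacity */
#define NAME_LEN   64   /* hypothetical max node name length, incl. NUL */

/* Table of nodes whose CIB syncs have failed; "" marks a free slot. */
static char failed_sync_nodes[MAX_FAILED][NAME_LEN];

bool failed_sync_contains(const char *node)
{
    if (node == NULL || node[0] == '\0') {
        return false;
    }
    for (size_t i = 0; i < MAX_FAILED; i++) {
        if (strcmp(failed_sync_nodes[i], node) == 0) {
            return true;
        }
    }
    return false;
}

/* Record a node; returns false if the name is invalid or the table is full. */
bool failed_sync_add(const char *node)
{
    if (node == NULL || node[0] == '\0' || strlen(node) >= NAME_LEN) {
        return false;
    }
    if (failed_sync_contains(node)) {
        return true;
    }
    for (size_t i = 0; i < MAX_FAILED; i++) {
        if (failed_sync_nodes[i][0] == '\0') {
            strcpy(failed_sync_nodes[i], node);  /* length checked above */
            return true;
        }
    }
    return false;
}

/* Called when a node leaves the cluster: forget its failed sync. */
void failed_sync_remove(const char *node)
{
    if (node == NULL || node[0] == '\0') {
        return;
    }
    for (size_t i = 0; i < MAX_FAILED; i++) {
        if (strcmp(failed_sync_nodes[i], node) == 0) {
            failed_sync_nodes[i][0] = '\0';
        }
    }
}

/*
 * Sync callback sketch: rc == 0 means success. On failure, remember the
 * node and return true to tell the caller to trigger a new election.
 */
bool handle_sync_result(const char *node, int rc)
{
    if (rc == 0) {
        return false;
    }
    failed_sync_add(node);
    return true;
}

/* During the next join round, nack any node found in the table. */
bool should_nack_join(const char *node)
{
    return failed_sync_contains(node);
}
```

The point of keeping the entry until the node leaves is that the same
node cannot be accepted, synced from, and rejected again on every
election, which is what would otherwise sustain the storm.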

Closes T455

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>

Details

Provenance
nrwahl2 authored on Nov 26 2022, 7:30 PM
Parents
rP2d69be23f443: Merge pull request #2965 from nrwahl2/nrwahl2-T456