galera: recover after network split in a 2-node cluster

Description

Galera maintains its own quorum, and when a network split
occurs in a two-node cluster, both nodes become inquorate.

The resource agent always demotes a node when it loses
galera quorum; however it cannot promote it back because
it waits for the other node to advertise its DB sequence
number in the CIB, and that information is unavailable
during the network split.

Overcome this limitation by telling the resource agent
to use the quorum information from pacemaker. After fencing
has taken place, the surviving node is the only one active, so
it is safe to restart galera from it. Once the fenced node
comes back online, it won't have quorum until the network
is restored, so it cannot restart galera locally and
split-brain is avoided.

This heuristic is made available via a new option
"two_node_mode". By default it is disabled, so the resource
agent works as usual.
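
The decision described above can be sketched as follows. This is an
illustrative Python model only, not the resource agent's code: the
names NodeView and can_restart_galera are hypothetical, and the
"usual" path (the node with the highest advertised sequence number
restarts the cluster) is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NodeView:
    """What one node can observe locally (all names are hypothetical)."""
    name: str
    pacemaker_quorate: bool        # quorum as reported by pacemaker
    seqnos_in_cib: Dict[str, int]  # sequence numbers advertised in the CIB
    two_node_mode: bool = False    # the new option, disabled by default


def can_restart_galera(view: NodeView, cluster_nodes: List[str]) -> bool:
    """Return True if this node may restart galera from its own state."""
    if all(n in view.seqnos_in_cib for n in cluster_nodes):
        # Usual path (assumed here): every node has advertised its
        # sequence number, so the most up-to-date node restarts galera.
        best = max(cluster_nodes, key=lambda n: view.seqnos_in_cib[n])
        return best == view.name

    # During a network split the peer's sequence number is missing.
    # With two_node_mode enabled, fall back to pacemaker's quorum:
    # only the node that survived fencing is quorate.
    if view.two_node_mode and len(cluster_nodes) == 2:
        return view.pacemaker_quorate

    # Option disabled: keep waiting for the other node, as before.
    return False


nodes = ["node1", "node2"]
survivor = NodeView("node1", True, {"node1": 10}, two_node_mode=True)
fenced = NodeView("node2", False, {"node2": 8}, two_node_mode=True)
print(can_restart_galera(survivor, nodes), can_restart_galera(fenced, nodes))
```

In this model the fenced node never restarts galera on its own,
because it stays inquorate from pacemaker's point of view until the
network is restored.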

Details

Provenance
Damien Ciabrini <dciabrin@redhat.com> Authored on Jul 24 2020, 8:54 AM
Damien Ciabrini <damien.ciabrini@gmail.com> Committed on Oct 27 2020, 6:38 AM
Parents
rR0a221d5292c6: Merge pull request #1566 from aleksei-burlakov/change-interface-re
Branches
Unknown
Tags
Unknown
