rabbitmq-cluster: fail monitor when node is in minority partition

Description

It's possible for mnesia to still be running while it is partitioned.
It's also possible to get into this state without pacemaker seeing
the node go down, so no corrective action is taken.

When monitoring, check the number of nodes that pacemaker thinks are
running, and compare it to the number of nodes that mnesia thinks are
running. If mnesia sees only a minority of the total nodes, fail the
monitor so corrective action can be taken to rejoin the cluster.
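
The comparison described above can be sketched as a small shell helper.
This is a minimal illustration, not the code from the commit: the
function name rmq_in_minority is hypothetical, and both counts are
assumed to have been collected already (one from pacemaker, one from
mnesia). "Minority" is taken here to mean strictly fewer than half of
the nodes pacemaker expects.

```shell
# Hypothetical helper, for illustration only.
# $1 = number of nodes mnesia reports as running (assumed input)
# $2 = number of nodes pacemaker considers running (assumed input)
# Returns 0 when mnesia sees strictly fewer than half of those nodes.
rmq_in_minority() {
    mnesia_count=$1
    pacemaker_count=$2
    # Doubling the mnesia count avoids integer-division rounding:
    # 1 of 3 -> 2 < 3 (minority); 2 of 4 -> 4 < 4 is false (not a minority).
    [ $((mnesia_count * 2)) -lt "$pacemaker_count" ]
}

# Example: a node that sees only itself in a three-node cluster.
if rmq_in_minority 1 3; then
    echo "minority partition, monitor should fail"
fi
```

Under this reading an exact half, such as 2 of 4 nodes, does not count
as a minority, so an even split would not fail the monitor.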

This also adds a new function, rmq_app_running, which checks only
whether the app is running and ignores the partition status. It is
now used instead of the full monitor in the few places where we
don't care about partition state.
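
An app-only check of this kind might look like the sketch below. It is
an assumption about the shape of such a function, not the body of
rmq_app_running from the commit. The name rmq_app_running_sketch is
hypothetical, and the application list is passed in as an argument; in
a real agent it would presumably come from something like
rabbitmqctl eval 'application:which_applications().'.

```shell
# Standard OCF return codes.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7

# Hypothetical sketch, for illustration only.
# $1 = Erlang application list as text (assumed input), e.g. the
#      printed result of application:which_applications().
# Returns OCF_SUCCESS if a {rabbit,...} entry is present, otherwise
# OCF_NOT_RUNNING. Partition state is deliberately not examined.
rmq_app_running_sketch() {
    if printf '%s\n' "$1" | grep -q '{rabbit,'; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}
```

Because the pattern requires a comma straight after "rabbit", an entry
such as {rabbit_common,...} on its own does not count as the app
running.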

Resolves: RHBZ#1639826

Details

Provenance
John Eckersberg <jeckersb@redhat.com>
Authored on Oct 16 2018, 4:21 PM
Parents
rR7e151ba9a4b1: Merge pull request #1241 from oalbrigt/nfsserver-shared_infodir-fixes
Branches
Unknown
Tags
Unknown
