Description

Fix: libcrmcluster: use uint64_t type for corosync ringid (membership id) when updating node state

The cause is not clear yet, but some users' clusters ran into a situation
where corosync somehow bumped the ringid to a huge number such as
"4294967300", which is just above the maximum uint32_t value (4294967295):

corosync[...]:   [TOTEM ] A new membership (X.X.X.X:4294967300) was
formed. Members

The node state was updated, but its "last_seen" was set to a truncated
ringid:

pacemaker-controld  [...] info: Quorum retained | membership=4294967300
members=1
pacemaker-controld  [...] notice: Node node1 state is now member |
nodeid=... previous=unknown source=pcmk_quorum_notification
pacemaker-controld  [...] info: Cluster node node1 is now member (was in
unknown state)
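
To illustrate the truncation, here is a minimal sketch of what happens to
the ringid from the log above when it passes through a 32-bit variable.
The helper name truncate_ringid() is hypothetical and is not part of
pacemaker or corosync; it relies only on the standard C rule that
conversion to an unsigned type is done modulo 2^32.

```c
#include <stdint.h>

/* Hypothetical helper: store a 64-bit ringid in a 32-bit variable.
 * Conversion to uint32_t keeps only the low 32 bits (value mod 2^32). */
static uint32_t truncate_ringid(uint64_t ringid)
{
    return (uint32_t) ringid;
}
```

With the value from the log, truncate_ringid(4294967300ULL) yields 4,
since 4294967300 - 4294967296 = 4.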

The node then immediately reaped itself from the membership cache, because
the wrong "last_seen" did not match the actual ringid:

pacemaker-controld  [...] notice: Node node1 state is now lost |
nodeid=... previous=member source=crm_reap_unseen_nodes
pacemaker-controld  [...] info: Cluster node node1 is now lost (was
member)
pacemaker-controld  [...] error: We're not part of the cluster anymore

pacemaker-controld would then exit, get respawned, and hit the same
failure again ...

In any case, the corosync ringid is a uint64_t, so it should be passed and
stored as one without truncation.

Details

Provenance
gao-yan authored on Apr 8 2020, 11:36 AM
Parents
rPeb73f2237798: Merge pull request #2027 from gao-yan/combine-priority-fencing-delay