HomeClusterLabs Projects

totem: Increase ring_id seq after load

Description

totem: Increase ring_id seq after load

This patch handles the situation where the leader
node (the node with lowest node_id) crashes and is started again
before token timeout of the rest of the cluster.
The newly restarted node restores the ringid of the old ring from
stable storage, so it has the same ringid as rest of the nodes,
but ARU is zero. If the node is able to create a singleton membership
before receiving the joinlist from rest of the cluster,
everything works as expected, because the ring id gets increased
correctly.

But if the node receives a joinlist from another cluster node before
its own joinlist, then it continues as it would had it never left
the cluster. This is not correct, because the new node should always
create a singleton configuration first.

During the recovery phase, ARUs are compared and because they differ
(the ARU of the old leader node is 0), the other nodes
try to sent all of their previous messages. This is impossible
(even if it was correct), because other nodes have already freed most
of those messages. The implementation uses an assert to limit maximum
number of messages sent during recovery (we could fix this,
but it's not really the point).

The solution here is to increase the ring_id sequence number by 1 after
loading it from storage. During creation of the commit token it is
always increased by 4, so it will not collide with an existing
sequence.

Thanks Christine Caulfield <ccaulfie@redhat.com> for clarify commit
message.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>

Details

Provenance
jfriesseAuthored on Jul 15 2019, 8:08 AM
Parents
rC0d1a1b1329a7: init: Use cpgtool instead of cfgtool
Branches
Unknown
Tags
Unknown

Event Timeline