Fix: controller: Delay join finalization if a transition is in progress

Description

While a transition is in progress, CIB updates may be generated and
received rapidly as resource actions complete. This can cause problems
if it happens during a controller join sequence.

The last two major steps of the join sequence are:

  1. The client sends XML containing its resource history, obtained from its local executor, to the DC in do_cl_join_finalize_respond().
  2. The DC receives this client resource history in do_dc_join_ack(), deletes the client's node state in the CIB, and writes the received client resource history to the CIB as the client's new node state.

However, suppose a resource action completes after the client generates
its resource history XML. Further suppose that action is recorded in the
CIB and is received by the DC's CIB manager before the DC updates the
client's node state. In this case, the newer history item is deleted
from the DC's CIB, because the DC overwrites the client's node state
with the history XML that the client fetched earlier. The DC then no
longer knows that the action completed on the client.
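
The lost update described above can be reproduced with a toy model. This
is only a sketch: struct toy_history, the helper functions, and the
operation names are hypothetical stand-ins, not Pacemaker's XML-based
resource history or its CIB API.

```c
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_OPS 8

/* Hypothetical stand-in for one node's resource history */
struct toy_history {
    const char *ops[TOY_MAX_OPS];
    int n_ops;
};

static void toy_history_add(struct toy_history *h, const char *op)
{
    if (h->n_ops < TOY_MAX_OPS) {
        h->ops[h->n_ops++] = op;
    }
}

static bool toy_history_has(const struct toy_history *h, const char *op)
{
    for (int i = 0; i < h->n_ops; i++) {
        if (strcmp(h->ops[i], op) == 0) {
            return true;
        }
    }
    return false;
}

/* Returns true if the DC's copy still records migrate_to after the join */
static bool toy_simulate_join_race(void)
{
    struct toy_history executor = { { NULL }, 0 };
    struct toy_history dc_cib = { { NULL }, 0 };
    struct toy_history snapshot;

    toy_history_add(&executor, "rsc_start");
    dc_cib = executor;

    /* Step 1: the client snapshots its history for the DC */
    snapshot = executor;

    /* Race: an action completes after the snapshot was taken, and the
     * result reaches the DC's CIB before the DC processes the snapshot
     */
    toy_history_add(&executor, "rsc_migrate_to");
    toy_history_add(&dc_cib, "rsc_migrate_to");

    /* Step 2: the DC replaces the node state with the stale snapshot */
    dc_cib = snapshot;

    return toy_history_has(&dc_cib, "rsc_migrate_to");
}
```

Calling toy_simulate_join_race() returns false: the DC's copy no longer
contains the migrate_to result, even though the action completed.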

This can result in an action improperly being scheduled a second time.
Specifically, a user reported an issue in which a migrate_to operation
was run a second time after completing. The second time, the migrate_to
operation failed because the resource was no longer physically present
on the source node.

The do_dc_join_finalize() function in controld_join_dc.c contains a
block that delays join finalization while a transition is in progress.
If the R_IN_TRANSITION bit is set in the input register, the controller
stalls.

The problem is that nothing sets this bit. It was added by commit
a1c1b340 in 2005, and the line of code that set the bit was mistakenly
removed by commit feef7987 in 2008. We can tell that removing the
bit-setting line of code was a mistake, because the code that clears the
bit was kept (and moved elsewhere), while the code that checks the bit
was unmodified.
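
As a rough illustration of why the existing check never fires, consider
the sketch below. The names and the bit value are hypothetical; the real
R_IN_TRANSITION value and the real input register are not reproduced
here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit value, chosen only for this illustration */
#define TOY_R_IN_TRANSITION (UINT64_C(1) << 0)

static uint64_t toy_input_register = 0;

static void toy_transition_started(void)
{
    /* The equivalent of the line removed in 2008 would have been:
     *     toy_input_register |= TOY_R_IN_TRANSITION;
     * Without it, nothing ever sets the bit.
     */
}

static void toy_transition_finished(void)
{
    /* The clearing code was kept, but it has nothing to clear */
    toy_input_register &= ~TOY_R_IN_TRANSITION;
}

static bool toy_finalize_would_stall(void)
{
    /* The check was left unmodified, so it always evaluates to false */
    return (toy_input_register & TOY_R_IN_TRANSITION) != 0;
}
```

In this model, toy_finalize_would_stall() returns false both during and
after a transition, so the stall branch is effectively dead code.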

We do want to delay finalization if a transition is in progress.
However, the R_IN_TRANSITION bit itself is no longer necessary:
controld_globals.transition_graph->complete fulfills the same role, so
we can use that and remove R_IN_TRANSITION. The complete flag is
initialized to false (via calloc()) when a new graph is created during
do_te_invoke(). It's set to true by the time we reach notify_crmd()
(usually by te_graph_trigger()), which is where we previously cleared
the R_IN_TRANSITION bit.
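
A minimal sketch of the replacement check follows. struct toy_graph and
the function names are hypothetical simplifications, not the actual
Pacemaker types or the code in this commit. Treating a missing graph as
"no transition in progress" is an assumption made for the sketch.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the transition graph */
struct toy_graph {
    bool complete;
};

enum toy_finalize_result {
    TOY_FINALIZE_STALLED,
    TOY_FINALIZE_PROCEEDED
};

/* calloc() zero-fills the allocation, so complete starts out false */
static struct toy_graph *toy_graph_new(void)
{
    return calloc(1, sizeof(struct toy_graph));
}

/* Stall join finalization while a transition is still in progress */
static enum toy_finalize_result
toy_join_finalize(const struct toy_graph *graph)
{
    /* Assumption: no graph at all means no transition is running */
    if ((graph != NULL) && !graph->complete) {
        return TOY_FINALIZE_STALLED;
    }
    return TOY_FINALIZE_PROCEEDED;
}
```

With a freshly created graph, toy_join_finalize() reports a stall. Once
the complete flag is set to true, finalization proceeds.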

This simple fix appears to resolve the known race conditions with client
history fetching versus CIB updates during a join sequence.

Closes T375

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>

Details

Provenance
nrwahl2 authored on Mar 22 2023, 5:28 AM
Parents
rP11fb9fb36eda: Merge pull request #3048 from nrwahl2/nrwahl2-T379_fix