
crmadmin -D hangs forever if there is no DC
Open, High, Public

Authored By
nrwahl2
Dec 11 2023, 3:16 PM

Description

Regression apparently introduced by a6ec43e3 (in 2.1.5). See https://github.com/ClusterLabs/pacemaker/pull/2902#pullrequestreview-1775657071.

gao-yan:

Now, upon cluster startup, if "crmadmin -D" is executed before a DC has been elected, it hangs forever.

AFAICS the lookup involves a ping request to the DC. But given that there's no DC to actually handle the request at that moment, crmadmin most probably won't receive any reply in this situation...

Previously, with an implied timeout of 30s, the command would eventually return with an error rather than hanging forever.

Of course, the first question is what the best default should be for such a case.

kgaillot:

Hmm, it seems suboptimal that an IPC request sent to CRM_SYSTEM_DC will be broadcast to the CPG and no one will reply if there is no DC.

It would be nice if the local controller could return an error immediately in relay_message() if the message is for the DC but controld_globals.dc_name is NULL (possibly only for CRM_OP_PING to reduce the scope for unexpected consequences). Unfortunately the local controller might just not have learned the DC yet (in do_cl_join_announce() and do_cl_join_query() we set DC to NULL while waiting to hear from the actual DC). Maybe it's the lesser evil to get an error in that situation (especially if we can return something like EAGAIN/CRM_EX_UNSATISFIED).
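
A minimal sketch of that early-return check, written as a standalone function rather than as a patch to relay_message(). The struct, the function name, and the "dc"/"ping" string constants are hypothetical stand-ins for controld_globals, CRM_SYSTEM_DC, and CRM_OP_PING; only the decision logic is the point, and EAGAIN is just one of the candidate return codes mentioned above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Placeholder values; the real constants live in Pacemaker's headers */
#define SKETCH_SYSTEM_DC "dc"
#define SKETCH_OP_PING   "ping"

/* Hypothetical stand-in for the controller's global state */
struct sketch_globals {
    const char *dc_name;    /* NULL while the local node knows of no DC */
};

/*
 * Decide whether a message may be relayed.
 * Returns 0 to relay as usual, or EAGAIN if the message is a ping
 * addressed to the DC while the local controller knows of no DC.
 */
static int
sketch_check_dc_route(const struct sketch_globals *globals,
                      const char *sys_to, const char *op)
{
    assert(globals != NULL);

    if ((sys_to == NULL) || (strcmp(sys_to, SKETCH_SYSTEM_DC) != 0)) {
        return 0;       /* Not addressed to the DC */
    }
    if ((op == NULL) || (strcmp(op, SKETCH_OP_PING) != 0)) {
        return 0;       /* Limit the scope to pings */
    }
    if (globals->dc_name == NULL) {
        return EAGAIN;  /* No known DC: fail now instead of broadcasting */
    }
    return 0;
}
```

The caller would turn a nonzero result into an immediate error reply to the IPC client, so the client never waits on a broadcast that nobody answers.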


The idea was that synchronous requests are supposed to time out after 5 seconds (see crm_ipc_send())... maybe we misread that code.

Event Timeline

nrwahl2 triaged this task as High priority. Dec 11 2023, 3:16 PM
nrwahl2 created this task.
nrwahl2 created this object with edit policy "Restricted Project (Project)".

All of this stems from dropping the mainloops in pcmk_cluster_queries.c. We replaced them with sync when the timeout is 0 in a6ec43e3. We replaced them with poll when the timeout is greater than 0 in 8771565f. The sync change is causing this problem. It's unclear if the poll change has caused or will cause any problems; that seems less likely, but it's possible.

It was a misunderstanding that synchronous requests effectively have a 5-second timeout. That's only the timeout for sending an IPC message and receiving its reply (crm_ipc_send() and internal_ipc_get_reply()). In the current implementation of pcmk_cluster_queries.c, we don't poll after pcmk_controld_api_*() returns, so I must have misread the 5-second timeout in crm_ipc_send() as the effective timeout.

However, in pcmk__send_ipc_request(), we call crm_ipc_read() in a rapid, infinite loop until we receive a message, which is quite similar to polling... I'm not sure there's much effective difference between this and polling with an infinite timeout.

I probably thought the reply from internal_ipc_get_reply() was all we were waiting for, but apparently we're waiting for more after that, and we never receive it in this case.

For pcmk__designated_controller(), it seems reasonable to just ask the local controller for controld_globals.dc_name, using that as the source of truth instead of broadcasting a ping.

  • If it's NULL when there is in fact a DC somewhere, that should be a very temporary state, and "no DC" is still "true" from the local perspective.
  • If it's set when the DC has gone offline or something, then the same reasoning applies: the state should be temporary, and it's still the local controller's view.
  • If it's possible for multiple nodes to transiently see themselves as DC, then that already complicates the "ping the DC and wait for a response" approach we use now. There's no obviously correct answer in that case.

Alternatively or in addition, we could put a more general check somewhere like relay_message() as Ken suggested in https://github.com/ClusterLabs/pacemaker/pull/2902#discussion_r1422844596.

It comes down to what we care more about:

  • whether the node where we run the command believes there's a DC, or
  • whether there's a DC anywhere in the cluster.
    • This involves probing the cluster in search of global truth. As proposed, we could use EAGAIN or similar to say "this is a bad time to send out a probe."
    • But we could still hang forever in edge cases where controld_globals.dc_name != NULL but no node views itself as DC upon receiving the broadcast.
    • If two or more nodes temporarily view themselves as DC, then we'll only return the first node that we get a reply from. The other reply(-ies) will get ignored (or if not filtered out properly, mess up a later API call when they're received).

I'm also thinking of a more general concern, though it's probably unavoidable except by using timeouts as a workaround.

If an IPC message is sent successfully, but the target doesn't receive the cluster message for whatever reason (or the target ignores it because, for example, the message is intended for the DC but the target is not yet the DC), then we'll wait forever on a synchronous call or wait until the timeout on an async call. The problem is that we send the message only once, and we assume the target received and acked it.

Thinking out loud, I wonder if it would be feasible to send cluster messages at intervals until we get the first reply, and discard any further replies. This gets more complicated for requests that modify state on the target; we'd have to make sure that they're idempotent or that the target processes only the first one that it receives.
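
To make the "target processes only the first one" half of that idea concrete, here is a small sketch of duplicate suppression keyed on a request ID. Everything in it is hypothetical: Pacemaker's cluster messages are not assumed to carry an ID in this form, and the fixed-size ring of remembered IDs is an arbitrary choice for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_SEEN 64  /* How many recent request IDs to remember */
#define SKETCH_ID_LEN   37  /* Room for a UUID-style ID plus terminator */

/* Ring of recently seen request IDs; zero-initialize before first use */
struct sketch_seen {
    char ids[SKETCH_MAX_SEEN][SKETCH_ID_LEN];
    size_t count;           /* Total IDs recorded so far */
};

/*
 * Return true if this is the first time the ID has been seen (the
 * request should be processed), or false if it is a retransmission.
 * IDs are compared on their first SKETCH_ID_LEN - 1 characters.
 */
static bool
sketch_first_delivery(struct sketch_seen *seen, const char *id)
{
    size_t stored;
    size_t slot;

    assert((seen != NULL) && (id != NULL));

    stored = (seen->count < SKETCH_MAX_SEEN) ? seen->count : SKETCH_MAX_SEEN;
    for (size_t i = 0; i < stored; i++) {
        if (strncmp(seen->ids[i], id, SKETCH_ID_LEN - 1) == 0) {
            return false;   /* Duplicate of an earlier request */
        }
    }

    /* Record the new ID, overwriting the oldest entry once the ring is full */
    slot = seen->count % SKETCH_MAX_SEEN;
    strncpy(seen->ids[slot], id, SKETCH_ID_LEN - 1);
    seen->ids[slot][SKETCH_ID_LEN - 1] = '\0';
    seen->count++;
    return true;
}
```

The sender side would need the mirror image: accept the first reply matching the request ID and drop later ones. A bounded ring also means a very late retransmission could be processed again, so this does not replace making state-changing requests idempotent.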

gao-yan posted the following in https://github.com/ClusterLabs/pacemaker/pull/2902#discussion_r1457468975, in response to this task. I'm inclined to agree with their comments.

Good thinking over there.

My two cents. Unless I'm missing anything, it sounds good enough to me for "crmadmin -D" to have the same understanding as the local controller regarding who the DC is. If our controller doesn't think there's a DC yet, probably so be it and just tell crmadmin the same?

Basically, it probably shouldn't cost a ping request to the DC just to know who the DC is. A ping would be more meaningful if we wanted to achieve the following with a single command/request:

crmadmin -S $(crmadmin -D -q)