Regression apparently introduced by a6ec43e3 (in 2.1.5). See https://github.com/ClusterLabs/pacemaker/pull/2902#pullrequestreview-1775657071.
gao-yan:
On cluster startup, if "crmadmin -D" is executed before a DC has been elected, it now hangs forever.
AFAICS the lookup involves a ping request to the DC. But given that there is no DC to handle the request at that point, crmadmin most likely won't receive any reply in that situation.
Previously, with an implied 30s timeout, the command would eventually return with an error rather than hanging forever.
Of course, the first question is what the best default should be for such a case.
kgaillot:
Hmm, it seems suboptimal that an IPC request sent to CRM_SYSTEM_DC will be broadcast to the CPG and no one will reply if there is no DC.
It would be nice if the local controller could return an error immediately in relay_message() if the message is for the DC but controld_globals.dc_name is NULL (possibly only for CRM_OP_PING, to reduce the scope for unexpected consequences). Unfortunately, the local controller might simply not have learned the DC yet (in do_cl_join_announce() and do_cl_join_query() we set the DC to NULL while waiting to hear from the actual DC). Maybe getting an error in that situation is the lesser evil (especially if we can return something like EAGAIN/CRM_EX_UNSATISFIED).
The idea was that synchronous requests are supposed to time out after 5 seconds (see crm_ipc_send())... maybe we misread that code.
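
For illustration only, the early-return idea for relay_message() might look roughly like the standalone sketch below. It is not Pacemaker code: struct sketch_globals, sketch_check_dc_request(), and the SKETCH_* constants are hypothetical stand-ins for controld_globals.dc_name, the check that would live in relay_message(), and CRM_SYSTEM_DC/CRM_OP_PING. The string values are placeholders. It assumes EAGAIN is the error we would want to hand back, and it limits the check to pings as suggested above.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Placeholder values standing in for CRM_SYSTEM_DC and CRM_OP_PING */
#define SKETCH_SYSTEM_DC "dc"
#define SKETCH_OP_PING   "ping"

/* Hypothetical stand-in for the controller's view of the DC
 * (controld_globals.dc_name in the discussion above) */
struct sketch_globals {
    const char *dc_name;    /* NULL while no DC is known locally */
};

/* Decide whether a request can be relayed or should fail fast.
 *
 * Returns 0 if the request may be relayed as usual, or EAGAIN if it is
 * a ping addressed to the DC while no DC is known. A NULL dc_name can
 * also mean "DC not learned yet", so the caller may simply retry.
 */
static int
sketch_check_dc_request(const struct sketch_globals *globals,
                        const char *sys_to, const char *op)
{
    if ((globals == NULL) || (sys_to == NULL) || (op == NULL)) {
        return 0;   /* Not enough information; leave behavior unchanged */
    }
    if (strcmp(sys_to, SKETCH_SYSTEM_DC) != 0) {
        return 0;   /* Not addressed to the DC */
    }
    if (strcmp(op, SKETCH_OP_PING) != 0) {
        return 0;   /* Only pings are short-circuited, to limit the scope */
    }
    if (globals->dc_name != NULL) {
        return 0;   /* A DC is known, so someone should reply */
    }
    return EAGAIN;  /* No DC known: fail now instead of waiting forever */
}
```

With something along these lines, a client such as crmadmin would get an immediate "try again" result instead of waiting for a reply that never arrives. How that result would be mapped to CRM_EX_UNSATISFIED or another exit code is left open here.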