Fix: tools: crm_mon --one-shot fails while pacemaker is shutting down

Description

crm_mon --one-shot checks the pacemakerd state before trying to get a
CIB connection. If pacemakerd is shutting down, that check fails with
ENOTCONN and crm_mon never connects to the CIB.
This can cause a resource agent that calls crm_mon (for example,
ocf:heartbeat:pgsql) to fail to stop during shutdown.

This is a regression introduced by commit 3f342e3.
crm_mon.c:pacemakerd_status() returns pcmk_rc_ok if pacemakerd is
shutting down, since 49ebe4c and 46d6edd (fixes for CLBZ#5471). 3f342e3
refactored crm_mon --one-shot to use library functions. pcmk__status()
now does most of the work, calling pcmk_status.c:pacemakerd_status().
That function returns ENOTCONN if pacemakerd is shutting down. As a
result, we don't try to connect to the CIB during shutdown.

Here we update pcmk__status() to use pcmk__pacemakerd_status() instead
of a static and mostly redundant pacemakerd_status(). It receives the
pacemakerd state via an output pointer argument. If pacemakerd is
running or shutting down (or if we get an EREMOTEIO rc), we try
connecting to the fencer and CIB. However, as long as we successfully
get the pacemakerd state, we return success from pcmk__status(), since
we did obtain the cluster status.
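
The decision described above can be sketched roughly as follows. This is
an illustration only, not the code from this commit: the sketch_* names,
the simplified state enum, and the choice to return the CIB result in the
EREMOTEIO case are all assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef EREMOTEIO
#define EREMOTEIO 121   /* Linux value; fallback for platforms lacking it */
#endif

#define SKETCH_RC_OK 0  /* hypothetical stand-in for pcmk_rc_ok */

/* Hypothetical, simplified set of pacemakerd states */
enum sketch_state {
    SKETCH_UNKNOWN,
    SKETCH_STARTING,
    SKETCH_RUNNING,
    SKETCH_SHUTTING_DOWN
};

/* Should we go on to contact the fencer and the CIB? */
static bool
sketch_should_query(int pcmkd_rc, enum sketch_state state)
{
    if (pcmkd_rc == EREMOTEIO) {
        return true;    /* the caller may consider this rc OK */
    }
    return (pcmkd_rc == SKETCH_RC_OK)
           && ((state == SKETCH_RUNNING) || (state == SKETCH_SHUTTING_DOWN));
}

/*
 * pcmkd_rc and state model the result of the pacemakerd query; cib_rc
 * models the result of the fencer/CIB connection attempt. *tried_cib
 * reports whether that attempt would have been made.
 */
static int
sketch_status(int pcmkd_rc, enum sketch_state state, int cib_rc,
              bool *tried_cib)
{
    assert(tried_cib != NULL);
    *tried_cib = sketch_should_query(pcmkd_rc, state);

    if (pcmkd_rc == SKETCH_RC_OK) {
        /* We obtained the pacemakerd state, so the status call succeeded
         * even if the CIB was unreachable or never contacted. */
        return SKETCH_RC_OK;
    }

    /* Assumption: without a pacemakerd state, report the CIB result if
     * we tried it, else the original error. */
    return *tried_cib ? cib_rc : pcmkd_rc;
}
```

With this shape, a resource agent that runs crm_mon --one-shot during
shutdown would see success as long as the pacemakerd state was obtained,
which is the behavior the fix aims to restore.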

A couple of minor notes:

  • pcmk__status() now takes a timeout argument that it passes to pcmk__pacemakerd_status(). timeout == 0 uses pcmk_ipc_dispatch_sync, matching the old implementation. A positive timeout uses pcmk_ipc_dispatch_main.
  • pcmk_cluster_queries.c:ipc_connect() no longer always prints a "Could not connect" error for EREMOTEIO. The caller may consider it OK.
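
The timeout rule in the first note amounts to a small mapping. The sketch
below uses a local enum and a hypothetical helper name rather than the
real Pacemaker IPC types, and it assumes an unsigned timeout with no
particular unit.

```c
/* Hypothetical mirror of the two dispatch modes named in the note */
enum sketch_dispatch {
    SKETCH_DISPATCH_MAIN,   /* stands in for pcmk_ipc_dispatch_main */
    SKETCH_DISPATCH_SYNC    /* stands in for pcmk_ipc_dispatch_sync */
};

/* Pick a dispatch mode from the timeout: 0 keeps the old synchronous
 * behavior; any positive value selects main-loop dispatch. */
static enum sketch_dispatch
sketch_dispatch_for_timeout(unsigned int timeout)
{
    return (timeout == 0) ? SKETCH_DISPATCH_SYNC : SKETCH_DISPATCH_MAIN;
}
```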

Fixes T579
Fixes CLBZ#5501

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>

Details

Provenance
nrwahl2 authored on Oct 10 2022, 9:10 PM
Parents
rPa6ec43e37655: Refactor: libpacemaker: Default to sync dispatch in pcmk_cluster_queries