Fix: libcrmcommon: Retry pcmk_connect_ipc() if EAGAIN

Description

When a controller is elected DC, it initiates a scheduler IPC connection
(new_schedulerd_ipc_connection()). If a signal (generally SIGTERM for
cluster shutdown) interrupts the connection attempt, the controller logs
"Error connecting to the scheduler: Resource temporarily unavailable"
and "Cannot shut down gracefully without the scheduler", then terminates
quickly instead of shutting down gracefully.

If the connection error code is EAGAIN, we should try again before
giving up, so that we have a chance of shutting down gracefully in this
situation.

This commit adds a retry to pcmk_connect_ipc(), to proactively avoid
similar timing issues that other callers may face. If this turns out to
be problematic (for example, if each connection attempt takes a while
in some situation and this doubles the time it takes to give up), then
we can move the retry to the caller(s) that require(s) it.
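
The retry described above can be sketched roughly as follows. This is
only an illustration, not the code from this commit: connect_fn,
connect_with_retry(), struct fake_server, and fake_connect() are
hypothetical names, and the attempt limit of 2 is an assumption inferred
from the "doubles the time it takes to give up" remark. The callback is
assumed to return 0 on success or an errno-style code on failure.

```c
#include <errno.h>

/* Hypothetical connect callback: 0 on success, errno-style code on failure */
typedef int (*connect_fn)(void *user_data);

/* Call try_connect() up to max_attempts times, retrying only while the
 * result is EAGAIN. Any other result (success or a different error) is
 * returned immediately. Returns EINVAL if max_attempts is not positive.
 */
static int
connect_with_retry(connect_fn try_connect, void *user_data, int max_attempts)
{
    int rc = EINVAL;

    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        rc = try_connect(user_data);
        if (rc != EAGAIN) {
            break;  /* Success, or an error that a retry would not fix */
        }
    }
    return rc;
}

/* Hypothetical stand-in for an IPC server: fails with EAGAIN a set number
 * of times, then accepts the connection.
 */
struct fake_server {
    int eagain_left;    /* Remaining attempts that will fail with EAGAIN */
    int calls;          /* Total connection attempts seen so far */
};

static int
fake_connect(void *user_data)
{
    struct fake_server *server = user_data;

    server->calls++;
    if (server->eagain_left > 0) {
        server->eagain_left--;
        return EAGAIN;
    }
    return 0;
}
```

With a limit of 2, connect_with_retry(fake_connect, &server, 2) succeeds
when only the first attempt is interrupted, and still returns EAGAIN if
both attempts fail, so the caller's existing error handling is unchanged
in the persistent-failure case.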

Closes T444

Signed-off-by: Reid Wahl <nrwahl@protonmail.com>

Details

Provenance
nrwahl2 authored on Nov 17 2022, 4:15 PM
Parents
rP5249d34ced83: Merge pull request #2944 from nrwahl2/nrwahl2-T239_pt1