HomeClusterLabs Projects

Fix: various: Correctly detect completion of systemd start/stop actions
dc58e1177858Unpublished

Unpublished Commit ยท Learn More

Not On Permanent Ref: This commit is not an ancestor of any permanent ref.
This commit no longer exists in the repository. It may have been part of a branch which was deleted.

Description

Fix: various: Correctly detect completion of systemd start/stop actions

When systemd receives a StartUnit() or StopUnit() method call, it
returns almost immediately, as soon as a start/stop job is enqueued. A
successful return code does NOT indicate that the start/stop has
finished.

Previously, we worked around this in action_complete() with a hack that
scheduled a follow-up monitor after a successful start/stop method call,
which polled the service after 2 seconds to see whether it was actually
running. However, this was not a robust solution. Timing issues could
result in Pacemaker having an incorrect view of the resource's status or
prematurely declaring the action as failed.

Now, we follow the best practice as documented in the systemd D-Bus API
doc (see StartUnit()):
https://www.freedesktop.org/software/systemd/man/latest/org.freedesktop.systemd1.html#Methods

After kicking off a systemd start/stop action, we make note of the job's
D-Bus object path. Then we register a D-Bus message filter that looks
for a JobRemoved signal whose bus path matches. This signal indicates
that the job has completed and includes its result. When we find the
matching signal, we set the action's result. We then remove the filter,
which causes the action to be finalized and freed. In the case of the
executor daemon, the action has a callback (action_complete()) that runs
during finalization and sets the executor's view of the action result.

Monitor actions still need much of the existing workaround code in
action_complete(), so we keep it for now. We bail out for start/stop
actions after setting the result as described above.

Ref T25

Co-authored-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>

Details

Provenance
clumensAuthored on Jan 16 2025, 4:21 PM
nrwahl2Committed on Wed, Feb 19, 7:30 PM

Event Timeline

Commit No Longer Exists

This commit no longer exists in the repository.