Fix: various: Correctly detect completion of systemd start/stop actions
When systemd receives a StartUnit() or StopUnit() method call, it
returns almost immediately, as soon as a start/stop job is enqueued. A
successful method reply does NOT indicate that the start/stop has
finished.
Previously, we worked around this in action_complete() with a hack:
after a successful start/stop method call, it scheduled a follow-up
monitor that polled the service 2 seconds later to check whether it was
actually running. This was not robust. A fixed delay does not track how
long the job actually takes, so timing issues could leave Pacemaker with
an incorrect view of the resource's status or cause it to prematurely
declare the action failed.
Now we follow the best practice described in the systemd D-Bus API
documentation (see StartUnit()):
https://www.freedesktop.org/software/systemd/man/latest/org.freedesktop.systemd1.html#Methods
After kicking off a systemd start/stop action, we record the D-Bus
object path of the job returned by the method call. Then we register a
D-Bus message filter that looks for a JobRemoved signal whose job path
argument matches it. This signal indicates that the job has completed
and includes its result. When we find the matching signal, we set the
action's result. We then remove the filter, which causes the action to
be finalized and freed. In the case of the executor daemon, the action
has a callback (action_complete()) that runs during finalization and
sets the executor's view of the action result.
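
The matching step can be sketched roughly as below. This is a minimal,
self-contained C sketch and not Pacemaker's actual code. It assumes the
arguments of systemd's JobRemoved(u id, o job, s unit, s result) signal
have already been unpacked from the D-Bus message. The names
struct job_removed, struct pending_action, pending_action_init() and
job_removed_filter() are hypothetical. The sketch treats only the
result "done" as success.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Arguments of systemd's JobRemoved(u id, o job, s unit, s result)
 * signal, assumed to be already unpacked from the D-Bus message. */
struct job_removed {
    unsigned int id;
    const char *job_path;   /* D-Bus object path of the finished job */
    const char *unit;       /* e.g. "foo.service" */
    const char *result;     /* "done", "canceled", "timeout", "failed", ... */
};

/* Hypothetical stand-in for an in-flight start/stop action. */
struct pending_action {
    char job_path[256];     /* job path returned by StartUnit()/StopUnit() */
    bool finished;
    bool succeeded;
};

/* Record the job path returned by the StartUnit()/StopUnit() reply. */
static void
pending_action_init(struct pending_action *action, const char *job_path)
{
    assert(action != NULL && job_path != NULL);
    strncpy(action->job_path, job_path, sizeof(action->job_path) - 1);
    action->job_path[sizeof(action->job_path) - 1] = '\0';
    action->finished = false;
    action->succeeded = false;
}

/* Filter callback: returns true only if the signal is for this action's
 * job. The caller would then remove the filter and finalize the action.
 * Signals for other jobs are ignored, and waiting continues. */
static bool
job_removed_filter(struct pending_action *action,
                   const struct job_removed *sig)
{
    assert(action != NULL && sig != NULL);

    if (action->finished || strcmp(sig->job_path, action->job_path) != 0) {
        return false;   /* not our job (or already handled) */
    }
    action->finished = true;
    action->succeeded = (strcmp(sig->result, "done") == 0);
    return true;
}
```

In a real D-Bus filter, the point where job_removed_filter() returns
true is where the filter would be removed, letting the action be
finalized. Any JobRemoved signal for another job falls through
untouched.
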
Monitor actions still need much of the existing workaround code in
action_complete(), so we keep it for now. We bail out for start/stop
actions after setting the result as described above.
Ref T25
Co-authored-by: Reid Wahl <nrwahl@protonmail.com>
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>