Fix: fencer: avoid infinite loop if device is removed during operation

Description

Previously, the fencer could go into an infinite loop if a target node's
fencing topology was removed from the configuration after an unfencing
operation had been initiated but before its result was reported.

When the result arrived, advance_topology_device_in_level() would call
call_remote_stonith(), which would call stonith_choose_peer(). That function
loops for as long as advance_topology_level() returns OK, which it always did
once the topology was empty.
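
The failure mode can be illustrated with a small stand-alone model. This is
only a sketch under stated assumptions, not the fencer's code: the struct, the
model_* function names, and the max_iter cap are all hypothetical, and the cap
exists solely so the example terminates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not Pacemaker's real types. */
enum { MODEL_OK = 0, MODEL_ERR = -1 };

struct model_op {
    int n_levels;   /* 0 models a topology removed mid-operation */
    int level;      /* current topology level, 1-based */
};

/* Models the pre-fix behavior: an empty topology reports success. */
static int model_advance_level(struct model_op *op)
{
    assert(op != NULL);
    if (op->n_levels == 0) {
        return MODEL_OK;        /* nothing to advance, yet still "OK" */
    }
    if (op->level >= op->n_levels) {
        return MODEL_ERR;       /* ran out of levels */
    }
    op->level++;
    return MODEL_OK;
}

/* Models the peer-selection loop. The sketch assumes no peer is ever
 * found, so the loop ends only when advancing fails or when the
 * artificial max_iter cap is hit. Returns the iterations performed. */
static int model_choose_peer_iterations(struct model_op *op, int max_iter)
{
    int iterations = 0;

    do {
        iterations++;
        /* a peer search would go here; assume it finds nothing */
    } while (iterations < max_iter && model_advance_level(op) == MODEL_OK);

    return iterations;
}
```

With a three-level topology the model loop stops once the levels run out. With
n_levels set to 0 it stops only at the cap, which corresponds to the infinite
loop described above.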

This is not observed in all operating environments, so it may also depend on
some external factor such as the glib version.

The fix is to have stonith_choose_peer() treat an empty topology as an error in
this situation (controlled by a new argument, since another caller still wants
an empty topology to be considered a success).
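
A minimal sketch of this kind of fix follows, again as a stand-alone model
rather than the actual patch. The sketch_* names and the empty_ok parameter are
hypothetical, and placing the new argument on the level-advancing helper is an
assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not Pacemaker's real types. */
enum { SKETCH_OK = 0, SKETCH_ERR = -1 };

struct sketch_op {
    int n_levels;   /* 0 models a topology removed mid-operation */
    int level;      /* current topology level, 1-based */
};

/* empty_ok (assumed name) lets each caller decide whether an empty
 * topology counts as success or as an error. */
static int sketch_advance_level(struct sketch_op *op, bool empty_ok)
{
    assert(op != NULL);
    if (op->n_levels == 0) {
        return empty_ok ? SKETCH_OK : SKETCH_ERR;
    }
    if (op->level >= op->n_levels) {
        return SKETCH_ERR;      /* ran out of levels */
    }
    op->level++;
    return SKETCH_OK;
}

/* The peer-selection loop passes false, so an empty topology ends the
 * loop. max_iter is kept only as a safety net for the sketch. */
static int sketch_choose_peer_iterations(struct sketch_op *op, int max_iter)
{
    int iterations = 0;

    do {
        iterations++;
        /* a peer search would go here; assume it finds nothing */
    } while (iterations < max_iter
             && sketch_advance_level(op, false) == SKETCH_OK);

    return iterations;
}
```

In this model the loop exits after a single pass when the topology is empty,
while a caller that passes true still sees an empty topology as success.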

Details

Provenance
kgaillot authored on May 8 2020, 8:05 PM
Parents
rPabe23ce4f0ea: Refactor: fencer: rename two potentially confusing functions