Fix: fencer: Refresh CIB devices on change to nodes section

This fixes a regression introduced by bf7ffcd. As of that commit, the
fake local node is created after all resources have been unpacked. So it
doesn't get added to resources' allowed_nodes tables.
This prevents registration of fencing devices when the fencer receives a
CIB diff that removes the local node. For example, the user may have
replaced the CIB with a boilerplate configuration that has an empty
nodes section. Registering a fencing device requires that the local node
be in the resource's allowed nodes table.
One option would be to add the fake local node to all resources' allowed
nodes tables immediately after creating it. However, it shouldn't
necessarily be an allowed node for all resources. For example, if
symmetric-cluster=false, then a node should not be placed in a
resource's allowed nodes table by default.
It's difficult to ensure correctness of the allowed nodes tables when a
fake local node is involved. It may involve repeated code or a fragile
and confusing dependence on the order of unpacking.
Since the fake local node is a hack in the first place, it seems better
to avoid using it where possible. Currently the only code that even sets
the local_node_name member of scheduler->priv is in:
- the fencer
- crm_mon when showing the "times" section
This commit works as follows. If the fencer receives a CIB diff
notification that contains the nodes section, it triggers a full refresh
of fencing device registrations. In our example above, where a user has
replaced the CIB with a configuration that has an empty nodes section,
this means all fencing device registrations will be removed.
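
As a rough illustration only (this is not the actual fencer code), the
"does this diff contain the nodes section?" check could be sketched as
below. NODES_PATH, is_path_prefix(), path_affects_nodes(), and
diff_needs_full_refresh() are hypothetical names, and representing a
diff as a list of changed path strings is an assumption made for the
example. A change counts if it is at or below the nodes section, or at
one of its ancestors (for example, a replacement of the whole
configuration section).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed location of the nodes section, for illustration only */
#define NODES_PATH "/cib/configuration/nodes"

/* Return true if prefix equals path or names one of its ancestors */
static bool
is_path_prefix(const char *prefix, const char *path)
{
    size_t len = strlen(prefix);

    if (strncmp(path, prefix, len) != 0) {
        return false;
    }
    /* Require a component boundary so "nodes_extra" does not match "nodes" */
    return (path[len] == '\0') || (path[len] == '/') || (path[len] == '[');
}

/* Return true if a change at this path touches the nodes section */
static bool
path_affects_nodes(const char *path)
{
    return is_path_prefix(NODES_PATH, path)     /* at or below nodes */
           || is_path_prefix(path, NODES_PATH); /* ancestor of nodes */
}

/* Hypothetical helper: does this diff call for a full device refresh? */
static bool
diff_needs_full_refresh(const char *const *changed_paths, size_t n_paths)
{
    for (size_t i = 0; i < n_paths; i++) {
        if (path_affects_nodes(changed_paths[i])) {
            return true;
        }
    }
    return false;
}
```

With this sketch, a diff that only changes something under
/cib/configuration/resources would not trigger the full refresh, while
one that deletes an entry under /cib/configuration/nodes would.
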
However, the controller also has a CIB diff notification callback:
do_cib_updated(). The controller's callback repopulates the nodes
section with up-to-date information from the cluster layer (or its node
cache) if it finds that an untrusted client like cibadmin has modified
the nodes section. Then it updates the CIB accordingly.
The fencer will receive this updated CIB and refresh fencing device
registrations again, re-registering the fencing devices that were just
removed.
Note that in the common case, we're not doing all this wasted work. The
"remove and then re-register" sequence should happen only when a user
has modified the CIB in a sloppy way (for example, by deleting nodes
from the CIB's nodes section that have not been removed from the
cluster).
In short, it seems better to rely on the controller's maintenance of the
CIB's node list than on a "fake local node" hack in the scheduler.
See the following pull requests from Hideo Yamauchi and their
discussions:
https://github.com/ClusterLabs/pacemaker/pull/3849
https://github.com/ClusterLabs/pacemaker/pull/3852
Thanks to Hideo for the report and finding the cause.
Signed-off-by: Reid Wahl <nrwahl@protonmail.com>