High: pacemakerd vs. IPC/procfs confused deputy authenticity issue (4/4)

Description

[4/4: CPG users to be careful about now-more-probable rival processes]

In essence, this comes down to pacemaker treating the CPG members it
sees at a node as effectively the only ones that can plausibly co-exist
at that node, which doesn't hold and asks for a wider reconciliation
with reality.

However, in practical terms, since there are two factors lowering the
priority of doing so:

1/ possibly the only non-self-inflicted scenario is that some of the
   cluster stack processes fail -- this is a problem that shall rather
   be deferred to arranged node disarming/fencing to stay on the safe
   side with 100% certainty, at the cost of a possibly long-lasting
   failover process at other nodes (as for the other possibility,
   someone running some of these by accident so that they effectively
   become rival processes, it's like getting hands cut when playing
   with a lawnmower in an unintended way)

2/ for state tracking of the peer nodes, it may possibly cause trouble
   in case the process observed as left wasn't the last one for the
   particular node, even if presumably just temporarily, since the
   situation may eventually resolve with imposed serialization of
   the rival processes via API end-point singleton restriction (this
   is also the most likely cause of why such a non-final leave gets
   observed in the first place), except in one case -- the legitimate
   API end-point carrier possibly won't be acknowledged as returned
   by its peers, at least not immediately, unless it tries to join
   anew, which verges on undefined behaviour (at least per corosync
   documentation)

we make do just with a light code change so as to

  • limit 1/ some more with an in-daemon self-check for a pre-existing end-point (this is to complement the checks already made in the parent daemon prior to spawning new instances, only some moments later; note that we don't have any lock file etc. mechanisms to prevent parallel runs of the same daemons, and people could run these of their own accord), and to
  • guard against interference from the rivals at the same node per 2/ by ignoring their non-final leave messages altogether.
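
The second measure can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from this commit: `struct member_addr` is a hypothetical stand-in mirroring the `nodeid` and `pid` fields of corosync's `struct cpg_address`, and `leave_is_final` is a made-up helper name. The idea is that a leave event for a node only counts when no other process from that node remains in the current membership list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in mirroring fields of corosync's struct cpg_address */
struct member_addr {
    uint32_t nodeid;
    uint32_t pid;
};

/*
 * Return 1 when no process from `nodeid` remains in the current
 * membership (a final leave), or 0 when a sibling process from the same
 * node is still a member (a non-final leave, to be ignored).
 */
int
leave_is_final(uint32_t nodeid, const struct member_addr *members,
               size_t n_members)
{
    assert(members != NULL || n_members == 0);

    for (size_t i = 0; i < n_members; i++) {
        if (members[i].nodeid == nodeid) {
            return 0;   /* a rival or the legitimate process is still around */
        }
    }
    return 1;
}
```

A configuration-change handler would presumably call such a helper for every entry of the list of departed members and update the peer state only when it returns 1.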

Note that CPG at this point is already expected to be authenticity-safe.

Regarding the now-more-probable part, we actually traded the inherently
racy procfs scanning for something (exactly that singleton mentioned
above) rather firm (and unfakeable), but we admittedly lost track of
processes that are after CPG membership (that is, another form of
a shared state) prior to (or in a non-deterministic order allowing for
the same) caring about publishing the end-point.
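
To illustrate why a singleton end-point is firmer than procfs scanning: claiming a name is a single operation arbitrated by the kernel, so two rivals cannot both succeed. The sketch below assumes, purely for illustration, that the end-point is a Linux abstract-namespace unix socket; `claim_endpoint` and the end-point name are hypothetical, and this is not how this commit implements the check.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Try to claim `name` as a Linux abstract-namespace unix socket address.
 * Returns the bound descriptor on success, or -1 with errno set
 * (EADDRINUSE when another process already holds the name).
 */
int
claim_endpoint(const char *name)
{
    struct sockaddr_un addr;
    size_t len = strlen(name);
    int fd;

    if (len == 0 || len > sizeof(addr.sun_path) - 1) {
        errno = EINVAL;
        return -1;
    }

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    /* A leading NUL byte selects the abstract namespace (Linux-specific) */
    memcpy(addr.sun_path + 1, name, len);

    if (bind(fd, (struct sockaddr *) &addr,
             (socklen_t) (offsetof(struct sockaddr_un, sun_path) + 1 + len)) < 0) {
        int saved = errno;

        close(fd);      /* close() may clobber errno, so restore it */
        errno = saved;
        return -1;
    }
    return fd;
}
```

Note that a process joining the CPG before making such a claim is exactly the gap described above: the singleton only serializes rivals from the moment they try to publish the end-point.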

Big thanks are owed to Yan Gao of SUSE for the early discovery and
reporting of this discrepancy arising from the earlier commits in the set.

Details

Provenance
Jan Pokorný <jpokorny@redhat.com> (authored on Apr 15 2019, 6:13 PM)
Parents
rP052e6045eea7: High: pacemakerd vs. IPC/procfs confused deputy authenticity issue (3/4)
Branches
Unknown
Tags
Unknown