Fix: scheduler: process remote shutdowns correctly

Description

When unpacking node histories, the scheduler can make multiple passes through
the node_state entries, because the state of remote node connections (on other
nodes) must be known before the history of the remote node itself can be
unpacked.

When unpacking a remote or guest node's history, the scheduler also unpacks its
transient attributes. If the shutdown attribute has been set, the scheduler
marks the node as shutting down.

Previously, it would also set the remote connection's next role to stopped at
that point. However, if the remote connection's history on another node
happened to be processed later in the node history unpacking, and a probe had
found the connection not running there, the next role was reset to unknown.
The connection stop was then never scheduled, and the shutdown hung until it
timed out.

Now, set the remote connection's next role to stopped for shutdowns only after
all node histories have been unpacked.
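
The ordering problem can be illustrated with a small model. This is only a
sketch, not Pacemaker's code: the Connection class, the unpack() function, the
event names, and the defer_shutdown flag are all hypothetical and exist only to
show why setting the next role during unpacking is order-dependent while
setting it afterward is not.

```python
from dataclasses import dataclass


# Hypothetical, simplified model; none of these names come from Pacemaker.
@dataclass
class Connection:
    next_role: str = "unknown"


def unpack(events, defer_shutdown):
    """Process history events in order; return the connection's next role."""
    conn = Connection()
    shutting_down = False
    for event in events:
        if event == "shutdown_attribute":
            # The remote node's transient attributes say it is shutting down.
            shutting_down = True
            if not defer_shutdown:
                # Old behavior: set the next role immediately.
                conn.next_role = "stopped"
        elif event == "probe_not_running":
            # Connection history on another node: a probe found it not
            # running, which resets the next role.
            conn.next_role = "unknown"
    if defer_shutdown and shutting_down:
        # New behavior: apply the role once all histories are unpacked.
        conn.next_role = "stopped"
    return conn.next_role


# The problematic order: shutdown attribute seen before the probe result.
events = ["shutdown_attribute", "probe_not_running"]
old_result = unpack(events, defer_shutdown=False)
new_result = unpack(events, defer_shutdown=True)
```

In this model the immediate approach loses the stopped role whenever the probe
result is processed after the shutdown attribute, whereas the deferred approach
gives the same result for either processing order.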

Details

Provenance
kgaillot authored on Jan 22 2021, 5:45 PM
Parents
rP2ae780b8746f: Log: scheduler: use new function to set a resource's next role