Fix: scheduler: avoid container ping-pong
A bundle replica colocates its remote connection with its container, using a
finite score so that the container can run on Pacemaker Remote nodes.
However, a container shouldn't move around based solely on the preferences of
its remote connection, so treat the colocation as not having influence if the
container is already active.
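As a rough illustration of the idea (hypothetical, simplified types and names;
not the actual scheduler code), the connection-with-container colocation keeps
its finite score, but its influence over the container's placement is dropped
once the container is already active:

    #include <stdbool.h>

    /* Hypothetical, simplified types; the real scheduler structures differ. */
    struct resource {
        bool active;    /* whether the resource is currently running */
    };

    struct colocation {
        int score;      /* finite score: a preference, not a mandate */
        bool influence; /* whether the dependent's preferences may move the primary */
    };

    /* Colocate a replica's remote connection (dependent) with its container
     * (primary). The score stays finite so the container can still run on
     * Pacemaker Remote nodes, and the connection only influences the
     * container's placement while the container is not already active, so a
     * running container is not moved just to suit its connection.
     */
    void
    colocate_connection_with_container(const struct resource *container,
                                       struct colocation *colocation)
    {
        colocation->score = 100;                    /* finite, not INFINITY */
        colocation->influence = !container->active;
    }
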
An example scenario where this would otherwise be a problem:
- A bundle with two replicas gets the first replica assigned to a cluster
  node, then the second to a remote node. Because the second replica's
  container is on a remote node, its remote connection must run on a cluster
  node, and it lands on the cluster node hosting the first replica.
- The first replica fails. Clean instances are always assigned before failed
  instances (which makes sense and avoids multiple problems), so in this
  transition the second replica is assigned first. Because of its
  connection-with-container colocation, the second replica is scheduled to
  move to the cluster node, and the first replica is scheduled to be
  recovered on the remote node. This move of the second replica is
  unnecessary, but it gets worse.
- Once both containers are stopped as part of that transition, a new
  transition is calculated. Since the first replica is no longer failed, it
  is assigned first again and ends up on the cluster node again. This means
  the second replica's restart was scheduled completely unnecessarily, but it
  gets worse.
- If the second replica's container starts first and a new transition is
  scheduled, the second replica will be assigned first, because active
  instances are always assigned before inactive instances (see the ordering
  sketch after this list). New transitions are scheduled every time one of
  the containers is integrated as a guest node, so the cluster can get into a
  loop where the containers continually bounce back and forth.
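To make the ordering that drives this loop explicit, here is a hedged sketch
(hypothetical names, not the real Pacemaker comparator) of the two assignment
rules the scenario relies on: active instances are assigned before inactive
ones, and clean instances before failed ones. Both conditions can change from
one transition to the next, which is what lets the "first" instance flip:

    #include <stdbool.h>

    /* Hypothetical, simplified instance state; the real scheduler tracks
     * much more than this.
     */
    struct instance {
        bool active;    /* currently running */
        bool failed;    /* has a recorded failure */
    };

    /* Returns < 0 if a should be assigned before b: active instances first,
     * then clean (non-failed) instances before failed ones. Because both
     * flags can change between transitions (a container stops, or a failure
     * is resolved by recovery), the instance assigned first can flip, which
     * is what allows the containers to ping-pong.
     */
    int
    compare_assignment_order(const struct instance *a, const struct instance *b)
    {
        if (a->active != b->active) {
            return a->active ? -1 : 1;   /* active instances first */
        }
        if (a->failed != b->failed) {
            return a->failed ? 1 : -1;   /* clean instances first */
        }
        return 0;
    }
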
This might not be a complete fix for the general problem (instance assignment
order is affected by conditions that can theoretically flip-flop), but it takes
care of the bundle connection-with-container case.