
Configure Multiple Fencing Devices

Pacemaker supports fencing to ensure that problematic nodes cannot corrupt resources. When a single fencing device is insufficient or undesirable, Pacemaker's fencing topologies let you configure multiple fencing devices.

The Problem

A popular fencing method is to use the IPMI BMC within the node. However, this approach has two single points of failure, making it less than ideal on its own.

  • The IPMI draws power from the host's power supply. If the host loses power, the IPMI cannot respond to fence requests, so the fence action fails and the cluster cannot recover resources that were active on the node.
  • The IPMI's network connection uses a single network interface, so a broken or disconnected network cable, failed switch port or switch, or failure in the NIC itself would leave the IPMI interface inaccessible. This problem is even worse if the IPMI shares its network connection with the host.

Other scenarios may arise with similar limitations for a single fencing mechanism.

The Solution

A simple solution is to add a second fence method as a fallback in case the primary fence method fails. One option is to use one or more switched PDUs, which allow another node to cut the power outlets feeding the target node's power supplies. For high availability, cluster nodes often have redundant PSUs, so in this example, we will show how to configure two PDUs, each powering one of the node's PSUs, as a fallback fencing method.

To provide total redundancy when two switches are available, the PDU(s) can be connected to the second switch. This ensures that the backup fence method remains available should the primary switch fail completely. However, this requires a more complex network configuration that is outside the scope of this mini-tutorial.

Ordering

We prefer IPMI fencing because when it works and confirms that a node is off, we can be certain that the fence action succeeded. Switched PDUs, on the other hand, report success as soon as the requested power outlets are switched off, whether or not the node is actually plugged into them. If a user moved the node's power cable(s) after the fencing was configured, the fence action may return a "success" when the node did not actually power off.

For this reason, we will want to ensure that the IPMI fence method is used when possible. We only want to fall back to the PDU-based fencing if IPMI fails.

Implementation

We want to configure a fencing topology that says, "Try the IPMI interface first, but if it fails, call both PDUs and only consider fencing a success if both PDUs successfully cut power".
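
The evaluation order described above can be modeled with a short Python sketch. This is only an illustration of the "levels tried in order, every device in a level must succeed" logic, not Pacemaker's implementation or any toolset's syntax. The device names (ipmi_n01, pdu1_n01, pdu2_n01), the node name, and the fence_node helper are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical model: a fence device is a callable that takes the target
# node name and returns True only if it confirms the fence action succeeded.
FenceDevice = Callable[[str], bool]


def fence_node(target: str,
               levels: List[List[str]],
               devices: Dict[str, FenceDevice]) -> bool:
    """Try each level in order; a level succeeds only if all its devices do."""
    for level in levels:
        # all() stops at the first failing device, so the rest of a failed
        # level is skipped and the next level is tried instead.
        if all(devices[name](target) for name in level):
            return True
    return False  # every level failed, so the node is not confirmed fenced


calls: List[str] = []  # records which devices were asked to fence


def make_device(name: str, succeeds: bool) -> FenceDevice:
    """Build a stand-in device that logs the call and returns a fixed result."""
    def device(target: str) -> bool:
        calls.append(name)
        return succeeds
    return device


# Assumed scenario: the node lost power, so its IPMI cannot respond.
devices = {
    "ipmi_n01": make_device("ipmi_n01", succeeds=False),
    "pdu1_n01": make_device("pdu1_n01", succeeds=True),
    "pdu2_n01": make_device("pdu2_n01", succeeds=True),
}

# Level 1: IPMI alone. Level 2: both PDUs, which must both succeed.
levels = [["ipmi_n01"], ["pdu1_n01", "pdu2_n01"]]

fenced = fence_node("node1", levels, devices)
```

In this scenario the IPMI level fails, so both PDUs are called and the fence succeeds only because both report success. If either PDU failed, fence_node would return False and the node would not be treated as fenced.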

How to accomplish this depends on the toolset you are using. See one of the following as appropriate:

Last Author: kgaillot
Last Edited: Jan 9 2024, 1:53 PM