
Configure Multiple Fencing Devices Using pcs

This page describes how to implement the setup from Configure Multiple Fencing Devices (using that page's example of IPMI followed by two switched PDUs) with the higher-level pcs tool.

Starting Point

For a frame of reference, the cluster starts with this configuration:

Cluster Name: an-cluster-03
Corosync Nodes:
  pcmk-1 pcmk-2 
Pacemaker Nodes:
  pcmk-1 pcmk-2 

Resources:   

Stonith Devices: 

Fencing Levels: 

Location Constraints:

Ordering Constraints:

Colocation Constraints:

Cluster Properties:
  cluster-infrastructure: corosync
  no-quorum-policy: ignore
  stonith-enabled: false

Assumptions

We will need to make a few assumptions about our example cluster:

  • It is a two-node cluster with the node names pcmk-1 and pcmk-2.
  • The two PDUs are accessible at the network addresses pdu-1 and pdu-2, and will be accessed using the fence_apc_snmp fence agent.
  • The fencing details for pcmk-1 are:
    • Its IPMI device address is pcmk-1.ipmi, with login name admin and password secret.
    • Its power supplies are connected to port 1 of both pdu-1 and pdu-2.
  • The fencing details for pcmk-2 are:
    • Its IPMI device address is pcmk-2.ipmi, with login name admin and password secret.
    • Its power supplies are connected to port 2 of both pdu-1 and pdu-2.

Adapt the example below to the names, addresses, credentials, and fence agents appropriate to your cluster.

Configure Fencing Devices

  • Configure the IPMI fence device for pcmk-1:
pcs stonith create fence_pcmk1_ipmi fence_ipmilan \
  pcmk_host_list="pcmk-1" ipaddr="pcmk-1.ipmi" \
  login="admin" passwd="secret" delay=15 \
  op monitor interval=60s
  • Configure the two PDU fence devices for pcmk-1:

    Note that we've added power_wait="5" to the second PDU device. This tells the fence agent to wait 5 seconds after turning off the node's port on the second PDU, before power is restored. That gives the node's power supplies plenty of time to drain completely, ensuring that the node really loses power.
pcs stonith create fence_pcmk1_psu1 fence_apc_snmp \
  pcmk_host_list="pcmk-1" ipaddr="pdu-1" \
  port="1" op monitor interval="60s"
pcs stonith create fence_pcmk1_psu2 fence_apc_snmp \
  pcmk_host_list="pcmk-1" ipaddr="pdu-2" \
  port="1" power_wait="5" op monitor interval="60s"
  • Repeat for pcmk-2:
pcs stonith create fence_pcmk2_ipmi fence_ipmilan \
  pcmk_host_list="pcmk-2" ipaddr="pcmk-2.ipmi" \
  login="admin" passwd="secret" delay=15 \
  op monitor interval=60s
pcs stonith create fence_pcmk2_psu1 fence_apc_snmp \
  pcmk_host_list="pcmk-2" ipaddr="pdu-1" \
  port="2" op monitor interval="60s"
pcs stonith create fence_pcmk2_psu2 fence_apc_snmp \
  pcmk_host_list="pcmk-2" ipaddr="pdu-2" \
  port="2" power_wait="5" op monitor interval="60s"
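Because these six commands differ only in the node name and the PDU port, they can also be generated from a short list. The following is a minimal dry-run sketch, not a pcs feature: the helper gen_fence_cmds is hypothetical, and it assumes this page's naming scheme (fence_<node>_ipmi, fence_<node>_psu1, fence_<node>_psu2), the <node>.ipmi address pattern, and the example credentials. It only prints the commands; it does not run them.

```shell
#!/bin/sh
# Dry-run sketch: print the "pcs stonith create" commands from this page
# instead of running them. Assumes the fence_<node>_{ipmi,psu1,psu2} naming,
# <node>.ipmi IPMI addresses, and the example admin/secret credentials.

# Hypothetical helper: emit the three device commands for one node.
gen_fence_cmds() {
    node="$1"    # cluster node name, e.g. pcmk-1
    port="$2"    # outlet used by the node's power supplies on both PDUs
    id=$(printf '%s' "$node" | sed 's/-//g')   # pcmk-1 -> pcmk1

    echo "pcs stonith create fence_${id}_ipmi fence_ipmilan pcmk_host_list=\"$node\" ipaddr=\"$node.ipmi\" login=\"admin\" passwd=\"secret\" delay=15 op monitor interval=60s"
    echo "pcs stonith create fence_${id}_psu1 fence_apc_snmp pcmk_host_list=\"$node\" ipaddr=\"pdu-1\" port=\"$port\" op monitor interval=\"60s\""
    # Only the second PDU device gets power_wait, as in the commands above.
    echo "pcs stonith create fence_${id}_psu2 fence_apc_snmp pcmk_host_list=\"$node\" ipaddr=\"pdu-2\" port=\"$port\" power_wait=\"5\" op monitor interval=\"60s\""
}

# node:port pairs taken from the Assumptions section
for pair in pcmk-1:1 pcmk-2:2; do
    gen_fence_cmds "${pair%%:*}" "${pair##*:}"
done
```

The printed lines are the same commands as above, each joined onto a single line. After reviewing them, you could paste them (or pipe the output to sh) on one cluster node.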

Configuring fencing_topology

The next step is to tell Pacemaker the order in which we want the fencing methods to run.

Each fencing level may have one or more fence devices. When fencing is required, Pacemaker will try each level in sequence, stopping at the first level that succeeds. Therefore, separate levels function as a "fallback" mechanism (logical "or"). At any given level, all the devices in that level will be tried in succession, and all must succeed for the level to succeed (logical "and").
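To make the "or" and "and" semantics concrete, here is a shell sketch that models them. It is only an illustration, not how Pacemaker is implemented: try_device and fence_with_topology are hypothetical helpers, the stub pretends every IPMI device fails and every PDU device succeeds, and the sketch assumes a level is abandoned at its first failed device.

```shell
#!/bin/sh
# Illustrative model of fencing levels, NOT Pacemaker's actual code:
# levels are tried in order until one succeeds (logical "or"), and every
# device within a level must succeed for that level to succeed (logical "and").

# Hypothetical stand-in for running one fence device. It pretends the IPMI
# device is unreachable and the PDU devices work.
try_device() {
    case "$1" in
        *_ipmi) echo "  $1: failed"; return 1 ;;
        *)      echo "  $1: ok";     return 0 ;;
    esac
}

# Each argument is one level: a comma-separated list of device IDs.
fence_with_topology() {
    n=1
    for level in "$@"; do
        echo "level $n: $level"
        level_ok=1
        for dev in $(printf '%s' "$level" | tr ',' ' '); do
            if ! try_device "$dev"; then
                level_ok=0
                break   # one failed device fails the whole level
            fi
        done
        if [ "$level_ok" -eq 1 ]; then
            echo "fenced at level $n"
            return 0
        fi
        n=$((n + 1))
    done
    echo "all levels failed"
    return 1
}

fence_with_topology fence_pcmk1_ipmi fence_pcmk1_psu1,fence_pcmk1_psu2
```

With these stubs, the example call reports level 1 failing on fence_pcmk1_ipmi, then both PDU devices succeeding, so the node is fenced at level 2.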

For our example, tell Pacemaker that the IPMI-based fence devices are the primary methods to use, and the switched PDUs are the fallback methods:

pcs stonith level add 1 pcmk-1 fence_pcmk1_ipmi
pcs stonith level add 2 pcmk-1 fence_pcmk1_psu1,fence_pcmk1_psu2

pcs stonith level add 1 pcmk-2 fence_pcmk2_ipmi
pcs stonith level add 2 pcmk-2 fence_pcmk2_psu1,fence_pcmk2_psu2
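Since each node's levels follow the same pattern, these commands can also be generated in a loop. This is a minimal dry-run sketch that assumes the same fence_<node>_ipmi / _psu1 / _psu2 device names created earlier; gen_level_cmds is a hypothetical helper, not part of pcs, and it only prints the commands.

```shell
#!/bin/sh
# Dry-run sketch: print the "pcs stonith level add" commands for each node,
# assuming the fence_<node>_{ipmi,psu1,psu2} device naming used on this page.
gen_level_cmds() {
    node="$1"
    id=$(printf '%s' "$node" | sed 's/-//g')   # pcmk-1 -> pcmk1
    # Level 1: IPMI alone (primary method).
    echo "pcs stonith level add 1 $node fence_${id}_ipmi"
    # Level 2: both PDU devices together (fallback; both must succeed).
    echo "pcs stonith level add 2 $node fence_${id}_psu1,fence_${id}_psu2"
}

for node in pcmk-1 pcmk-2; do
    gen_level_cmds "$node"
done
```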

When Pacemaker needs to reboot a node using multiple devices in the same level, it turns them all off, then turns them all on, rather than rebooting each in turn, to ensure the node is completely fenced.
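This ordering matters because each node has two power supplies. The sketch below is a simplified model, not Pacemaker's actual code: device_action and reboot_level are hypothetical helpers that only log each step, showing the order of actions for a level that contains both PDU devices.

```shell
#!/bin/sh
# Simplified model of a multi-device reboot, NOT Pacemaker's actual code:
# every device in the level is turned off before any device is turned on.

# Hypothetical stand-in for a fence agent action; it only logs the step.
device_action() {
    echo "$2 $1"    # e.g. "off fence_pcmk1_psu1"
}

reboot_level() {
    for dev in "$@"; do device_action "$dev" off; done   # all off first
    for dev in "$@"; do device_action "$dev" on;  done   # then all on
}

reboot_level fence_pcmk1_psu1 fence_pcmk1_psu2
```

Because both devices are switched off before either is switched back on, both of the node's power supplies are off at the same time. Rebooting each device in turn (off, on, off, on) would leave one supply powered throughout, and the node would never lose power. With power_wait="5" on the second PDU device, the agent's 5-second wait after its off step should fall between the last off and the first on.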

Enable and Test Fencing

Now that fencing is configured, we can enable it:

pcs property set stonith-enabled=true

You can test this by unplugging the IPMI interface for pcmk-1 and then crashing pcmk-1, which triggers pcmk-2 to fence it. After the IPMI fencing attempt times out, you should see PDU 1's port 1 turn off, then PDU 2's port 1 turn off, then the crashed node power down, then PDU 1's port 1 turn back on, and finally PDU 2's port 1 turn back on. If you configured your server's BIOS to power on after power loss, or to return to its last state after power loss, the server should then start to power back on.

Last Author: kgaillot
Last Edited: Oct 31 2023, 5:04 PM