
Configure Multiple Fencing Devices Using Crm

This page describes how to carry out the Configure Multiple Fencing Devices procedure (using that page's example of IPMI followed by two switched PDUs) with the higher-level crm shell.

Starting Point

For a frame of reference, the cluster starts with this configuration:

node $id="1" an-c03n01.alteeve.ca
node $id="2" an-c03n02.alteeve.ca
property $id="cib-bootstrap-options" \
	cluster-infrastructure="corosync" \
	no-quorum-policy="ignore" \
	stonith-enabled="false"

Assumptions

We will need to make a few assumptions about our example cluster:

  • It is a two-node cluster with the node names pcmk-1 and pcmk-2.
  • The two PDUs are accessible at the network addresses pdu-1 and pdu-2, and will be accessed using the fence_apc_snmp fence agent.
  • The fencing details for pcmk-1 are:
    • IPMI device address is pcmk-1.ipmi, the login name is admin and the password is secret.
    • Its power supplies are connected to port 1 of both pdu-1 and pdu-2.
  • The fencing details for pcmk-2 are:
    • IPMI device address is pcmk-2.ipmi, the login name is admin and the password is secret.
    • Its power supplies are connected to port 2 of both pdu-1 and pdu-2.

Adapt the example below to the names, addresses, credentials, and fence agents appropriate to your cluster.

Configure Fencing Devices

  • Configure the IPMI fence device for pcmk-1:
crm configure primitive fence_pcmk1_ipmi stonith:fence_ipmilan params \
  ipaddr="pcmk-1.ipmi" login="admin" passwd="secret" delay="15" \
  pcmk_host_list="pcmk-1" op monitor interval="60s"
  • Configure the two PDU fence devices for pcmk-1:

    Note that we've added power_wait="5" to the second PDU, which tells the fence agent to wait 5 seconds after turning off the port on the second PDU before power is restored. This gives the node's power supplies plenty of time to drain completely, ensuring that the node loses power.
crm configure primitive fence_pcmk1_psu1 stonith:fence_apc_snmp params \
  ipaddr="pdu-1" port="1" pcmk_host_list="pcmk-1" op monitor interval="60s"
crm configure primitive fence_pcmk1_psu2 stonith:fence_apc_snmp params \
  ipaddr="pdu-2" port="1" pcmk_host_list="pcmk-1" power_wait="5" op monitor interval="60s"
  • Repeat for pcmk-2:
crm configure primitive fence_pcmk2_ipmi stonith:fence_ipmilan params \
  ipaddr="pcmk-2.ipmi" login="admin" passwd="secret" \
  pcmk_host_list="pcmk-2" op monitor interval="60s"
crm configure primitive fence_pcmk2_psu1 stonith:fence_apc_snmp params \
  ipaddr="pdu-1" port="2" pcmk_host_list="pcmk-2" op monitor interval="60s"
crm configure primitive fence_pcmk2_psu2 stonith:fence_apc_snmp params \
  ipaddr="pdu-2" port="2" pcmk_host_list="pcmk-2" power_wait="5" op monitor interval="60s"

Configuring fencing_topology

The next step is to tell Pacemaker the order in which we want the fencing methods to run. This is done using the general format:

nodeX: method1 method2a,method2b [methodN ...]

This says, "For nodeX, try 'method1' first. If that fails, try 'method2a' and then 'method2b', and make sure both succeed. If either fails, consider the attempt failed and move on to 'methodN'."
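
The level logic described above can be sketched as a toy model in plain shell. This is only an illustration of the ordering rules, not Pacemaker code: try_device, fence_node, and the FAILING variable are hypothetical names invented for this sketch, and a device is treated as failing simply because its name is listed in FAILING.

```shell
#!/bin/sh
# Toy model of fencing levels; this is NOT how Pacemaker is implemented.

# Hypothetical stand-in for a fence attempt: a device "fails" if its
# name appears in the space-separated FAILING variable.
try_device() {
  case " ${FAILING:-} " in
    *" $1 "*) return 1 ;;
    *) return 0 ;;
  esac
}

# $1 = node name; remaining arguments = levels, in the order to try them.
# Devices within one level are separated by commas.
fence_node() {
  node=$1
  shift
  for level in "$@"; do
    ok=1
    # Every device in the level must succeed, tried in the order listed.
    for dev in $(echo "$level" | tr ',' ' '); do
      try_device "$dev" || { ok=0; break; }
    done
    if [ "$ok" = 1 ]; then
      echo "$node fenced by level: $level"
      return 0
    fi
  done
  echo "$node: all levels failed"
  return 1
}
```

For example, with FAILING="fence_pcmk1_ipmi", calling fence_node pcmk-1 fence_pcmk1_ipmi fence_pcmk1_psu1,fence_pcmk1_psu2 skips the first level and reports that the node was fenced by the PDU level. If fence_pcmk1_psu2 is also listed in FAILING, the second level fails as a whole and the function reports that all levels failed.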

For our example:

crm configure fencing_topology \
  pcmk-1: fence_pcmk1_ipmi fence_pcmk1_psu1,fence_pcmk1_psu2 \
  pcmk-2: fence_pcmk2_ipmi fence_pcmk2_psu1,fence_pcmk2_psu2

When Pacemaker needs to reboot a node using multiple devices in the same level, it turns them all off, then turns them all on, rather than rebooting each in turn, to ensure the node is completely fenced.
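
To make the "all off, then all on" ordering concrete, here is a rough manual equivalent for one node's PDU ports. It is a sketch under stated assumptions: it uses the pdu-1 and pdu-2 addresses from this example, it assumes your fence_apc_snmp build accepts the common -a (address), -n (plug), and -o (action) options, and the run and pdu_reboot helpers are hypothetical names made up here. Pacemaker does not invoke the agents this way; the point is only the order of operations. The helper prints commands instead of running them unless DRY_RUN=0 is set, because running them really cuts power.

```shell
#!/bin/sh
# Dry-run wrapper: print the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Power-cycle one node through both PDUs.
# $1 = the PDU port the node's power supplies are plugged into.
pdu_reboot() {
  port=$1
  # First turn the port off on every PDU, so the node fully loses power...
  for pdu in pdu-1 pdu-2; do
    run fence_apc_snmp -a "$pdu" -n "$port" -o off
  done
  # ...and only then turn the ports back on.
  for pdu in pdu-1 pdu-2; do
    run fence_apc_snmp -a "$pdu" -n "$port" -o on
  done
}
```

Calling pdu_reboot 1 in dry-run mode prints the two off commands followed by the two on commands. Rebooting each PDU in turn instead would leave the node powered by the other supply the whole time, which is why the order matters.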

Enable Fencing

Now that fencing is configured, we can enable it:

crm configure property stonith-enabled=true

You can test this by unplugging the IPMI interface for pcmk-1 and then crashing that node, which triggers pcmk-2 to fence it. After the IPMI attempt times out, you should see port 1 on PDU 1 turn off, then port 1 on PDU 2 turn off, and the crashed node lose power. Port 1 on PDU 1 should then turn back on, followed by port 1 on PDU 2. If you configured your server's BIOS to power on after power loss, or to return to its last state after power loss, the server should start to power back on.
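
As an alternative to crashing the node, a fence can also be requested by hand with stonith_admin, which ships with Pacemaker. The sketch below assumes the long options --list, --reboot, and --history are available in your Pacemaker version; the run and fence_test helpers are hypothetical names made up for this example. Commands are only printed unless DRY_RUN=0 is set, since the reboot request really fences the node.

```shell
#!/bin/sh
# Dry-run wrapper: print the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1 = name of the node to fence
fence_test() {
  node=$1
  # List the devices Pacemaker considers able to fence the node.
  run stonith_admin --list "$node"
  # Ask the cluster to reboot the node using the configured topology.
  run stonith_admin --reboot "$node"
  # Review the fencing actions recorded for the node.
  run stonith_admin --history "$node"
}
```

Run fence_test pcmk-1 from pcmk-2 first in dry-run mode to review the commands, then with DRY_RUN=0 once you are ready for pcmk-1 to be fenced.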

Last Author
kgaillot
Last Edited
Jan 9 2024, 1:54 PM
