diff --git a/doc/Pacemaker_Explained/en-US/Ch-Stonith.xml b/doc/Pacemaker_Explained/en-US/Ch-Stonith.xml index a6b2d62f62..5ef243587c 100644 --- a/doc/Pacemaker_Explained/en-US/Ch-Stonith.xml +++ b/doc/Pacemaker_Explained/en-US/Ch-Stonith.xml @@ -1,84 +1,190 @@ Protecting Your Data - STONITH
Why You Need STONITH STONITH is an acronym for Shoot-The-Other-Node-In-The-Head and it protects your data from being corrupted by rogue nodes or concurrent access. A node being unresponsive does not mean it has stopped accessing your data. The only way to be 100% sure that your data is safe is to use STONITH, so that we can be certain the node is truly offline before allowing the data to be accessed from another node. STONITH also has a role to play in the event that a clustered service cannot be stopped. In this case, the cluster uses STONITH to force the whole node offline, thereby making it safe to start the service elsewhere.
What STONITH Device Should You Use It is crucial that the STONITH device can allow the cluster to differentiate between a node failure and a network one. The biggest mistake people make in choosing a STONITH device is to use a remote power switch (such as many on-board IPMI controllers) that shares power with the node it controls. In such cases, the cluster cannot be sure if the node is really offline, or active and suffering from a network fault. Likewise, any device that relies on the machine being active (such as SSH-based "devices" used during testing) is inappropriate.
Configuring STONITH
Find the correct driver: stonith -L
-
- Since every device is different, the parameters needed to configure it will vary.
- To find out the parameters required by the device: stonith -t type -n
-
- Hopefully the developers chose names that make sense, if not you can query for some additional information by finding an active cluster node and running:
- lrmadmin -M stonith type pacemaker
- The output should be XML formatted text containing additional parameter descriptions
+ Since every device is different, the parameters needed to configure it will vary.
+ To find out the parameters associated with the device, run:
+ stonith_admin --metadata -a type
+
+ The output should be XML-formatted text containing additional parameter descriptions. We
+ will endeavor to make the output more friendly in a later version.
- Create a file called stonith.xml containing a primitive resource with a class of stonith, a type of type and a parameter for each of the values returned in step 2
- Create a clone from the primitive resource if the device can shoot more than one node and supports multiple simultaneous connections.
- Upload it into the CIB using cibadmin: cibadmin -C -o resources --xml-file stonith.xml
+
+ Create a file called stonith.xml containing a primitive resource with a class of
+ stonith, a type of type and a parameter for each of the values
+ returned in step 2
+
+
+ If the device does not know how to fence nodes based on their uname, you may also need
+ to set the special pcmk_host_map parameter. See man
+ stonithd for details.
+
+
+ If the device does not support the list command, you may also
+ need to set the special pcmk_host_list and/or
+ pcmk_host_check parameters. See man stonithd
+ for details.
+
+
+ If the device does not expect the victim to be specified with the
+ port parameter, you may also need to set the special
+ pcmk_host_argument parameter. See man stonithd
+ for details.
+
+
+ Upload it into the CIB using cibadmin: cibadmin -C -o resources --xml-file
+ stonith.xml
+
+
+ Once the stonith resource is running, you can test it by executing:
+ stonith_admin --reboot nodename, although
+ you might want to stop the cluster on that machine first.
+
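The configuration steps above can be sketched as a small shell script. This is a minimal sketch and not part of the patch: the `run` helper, the `stonith_workflow` function and the `AGENT`, `NODE` and `DRY_RUN` variables are hypothetical names introduced for illustration. With `DRY_RUN=1` (the default assumed here) the commands are only printed, so the script can be read through on a machine that is not a cluster node.

```shell
#!/bin/sh
# Sketch of the STONITH configuration workflow (illustrative only).
# AGENT, NODE and DRY_RUN are hypothetical knobs, not Pacemaker settings.
AGENT="${AGENT:-fence_ipmilan}"
NODE="${NODE:-nodename}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

stonith_workflow() {
    # Inspect the parameters the agent accepts (XML on stdout).
    run stonith_admin --metadata -a "$AGENT"
    # Upload the hand-written resource definition into the CIB.
    run cibadmin -C -o resources --xml-file stonith.xml
    # Once the resource is running, test fencing against one node.
    run stonith_admin --reboot "$NODE"
}

stonith_workflow
```

Setting `DRY_RUN=0` on a real cluster node executes the three commands in order; the last one reboots `NODE`, so it is worth stopping the cluster on that machine first.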
Example
- Assuming we have an IBM BladeCenter consisting of four nodes and the management interface is active on 10.0.0.1, then we would chose the external/ibmrsa driver in step 2 and obtain the following list of parameters
+ Assuming we have a chassis containing four nodes and an IPMI device active on 10.0.0.1, then
+ we would choose the fence_ipmilan driver in step 2 and obtain the
+ following list of parameters
Obtaining a list of STONITH Parameters
- stonith -t external/ibmrsa -n
+ stonith_admin --metadata -a fence_ipmilan
- hostname ipaddr userid passwd type
-
+<?xml version="1.0" ?>
+<resource-agent name="fence_ipmilan" shortdesc="Fence agent for IPMI over LAN">
+<longdesc>
+fence_ipmilan is an I/O Fencing agent which can be used with machines controlled by IPMI. This agent calls support software using ipmitool (http://ipmitool.sf.net/).
+
+To use fence_ipmilan with HP iLO 3 you have to enable lanplus option (lanplus / -P) and increase wait after operation to 4 seconds (power_wait=4 / -T 4)</longdesc>
+<parameters>
+ <parameter name="auth" unique="1">
+ <getopt mixed="-A" />
+ <content type="string" />
+ <shortdesc lang="en">IPMI Lan Auth type (md5, password, or none)</shortdesc>
+ </parameter>
+ <parameter name="ipaddr" unique="1">
+ <getopt mixed="-a" />
+ <content type="string" />
+ <shortdesc lang="en">IPMI Lan IP to talk to</shortdesc>
+ </parameter>
+ <parameter name="passwd" unique="1">
+ <getopt mixed="-p" />
+ <content type="string" />
+ <shortdesc lang="en">Password (if required) to control power on IPMI device</shortdesc>
+ </parameter>
+ <parameter name="passwd_script" unique="1">
+ <getopt mixed="-S" />
+ <content type="string" />
+ <shortdesc lang="en">Script to retrieve password (if required)</shortdesc>
+ </parameter>
+ <parameter name="lanplus" unique="1">
+ <getopt mixed="-P" />
+ <content type="boolean" />
+ <shortdesc lang="en">Use Lanplus</shortdesc>
+ </parameter>
+ <parameter name="login" unique="1">
+ <getopt mixed="-l" />
+ <content type="string" />
+ <shortdesc lang="en">Username/Login (if required) to control power on IPMI device</shortdesc>
+ </parameter>
+ <parameter name="action" unique="1">
+ <getopt mixed="-o" />
+ <content type="string" default="reboot"/>
+ <shortdesc lang="en">Operation to perform. Valid operations: on, off, reboot, status, list, diag, monitor or metadata</shortdesc>
+ </parameter>
+ <parameter name="timeout" unique="1">
+ <getopt mixed="-t" />
+ <content type="string" />
+ <shortdesc lang="en">Timeout (sec) for IPMI operation</shortdesc>
+ </parameter>
+ <parameter name="cipher" unique="1">
+ <getopt mixed="-C" />
+ <content type="string" />
+ <shortdesc lang="en">Ciphersuite to use (same as ipmitool -C parameter)</shortdesc>
+ </parameter>
+ <parameter name="method" unique="1">
+ <getopt mixed="-M" />
+ <content type="string" default="onoff"/>
+ <shortdesc lang="en">Method to fence (onoff or cycle)</shortdesc>
+ </parameter>
+ <parameter name="power_wait" unique="1">
+ <getopt mixed="-T" />
+ <content type="string" default="2"/>
+ <shortdesc lang="en">Wait X seconds after on/off operation</shortdesc>
+ </parameter>
+ <parameter name="delay" unique="1">
+ <getopt mixed="-f" />
+ <content type="string" />
+ <shortdesc lang="en">Wait X seconds before fencing is started</shortdesc>
+ </parameter>
+ <parameter name="verbose" unique="1">
+ <getopt mixed="-v" />
+ <content type="boolean" />
+ <shortdesc lang="en">Verbose mode</shortdesc>
+ </parameter>
+</parameters>
+<actions>
+ <action name="on" />
+ <action name="off" />
+ <action name="reboot" />
+ <action name="status" />
+ <action name="diag" />
+ <action name="list" />
+ <action name="monitor" />
+ <action name="metadata" />
+</actions>
+</resource-agent>
+
from which we would create a STONITH resource fragment that might look like this Sample STONITH Resource - - - - + - + - + - - ]]>
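A resource fragment of this kind can be sketched with a short shell script that writes stonith.xml. This is a minimal sketch under stated assumptions, not the fragment from the patch: the resource and nvpair ids, the `EXAMPLE_USER` and `EXAMPLE_PASSWORD` values, the `node1` to `node4` host names and the `OUT` variable are hypothetical placeholders. Only the `stonith` class, the `fence_ipmilan` type, the `ipaddr`, `login` and `passwd` parameter names, the `pcmk_host_list` parameter and the 10.0.0.1 address come from the text above.

```shell
#!/bin/sh
# Write an illustrative stonith.xml for the fence_ipmilan example.
# All ids, credentials and node names below are hypothetical placeholders.
OUT="${OUT:-stonith.xml}"

cat > "$OUT" <<'EOF'
<primitive id="Fencing" class="stonith" type="fence_ipmilan">
  <instance_attributes id="Fencing-params">
    <nvpair id="Fencing-ipaddr" name="ipaddr" value="10.0.0.1"/>
    <nvpair id="Fencing-login" name="login" value="EXAMPLE_USER"/>
    <nvpair id="Fencing-passwd" name="passwd" value="EXAMPLE_PASSWORD"/>
    <nvpair id="Fencing-pcmk_host_list" name="pcmk_host_list" value="node1 node2 node3 node4"/>
  </instance_attributes>
</primitive>
EOF

# Optional well-formedness check, only if xmllint happens to be installed.
if command -v xmllint >/dev/null 2>&1; then
    xmllint --noout "$OUT" && echo "$OUT is well-formed"
fi
```

The resulting file is what the `cibadmin -C -o resources --xml-file stonith.xml` step would upload. Whether `pcmk_host_list` is needed depends on whether the device supports the list command, as noted in the configuration steps.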