diff --git a/doc/Pacemaker_Explained/en-US/Ch-Stonith.txt b/doc/Pacemaker_Explained/en-US/Ch-Stonith.txt
index 04a28f38f1..7fed8278d6 100644
--- a/doc/Pacemaker_Explained/en-US/Ch-Stonith.txt
+++ b/doc/Pacemaker_Explained/en-US/Ch-Stonith.txt
@@ -1,239 +1,315 @@
= Configure STONITH =

////
We prefer [[ch-stonith]], but older versions of asciidoc don't deal
well with that construct for chapter headings
////
anchor:ch-stonith[Chapter 13, STONITH]
indexterm:[STONITH, Configuration]

== What Is STONITH ==

STONITH is an acronym for Shoot-The-Other-Node-In-The-Head and it
protects your data from being corrupted by rogue nodes or concurrent
access.

Just because a node is unresponsive doesn't mean it isn't accessing
your data. The only way to be 100% sure that your data is safe is to
use STONITH, so we can be certain that the node is truly offline
before allowing the data to be accessed from another node.

STONITH also has a role to play in the event that a clustered service
cannot be stopped. In this case, the cluster uses STONITH to force the
whole node offline, thereby making it safe to start the service
elsewhere.

== What STONITH Device Should You Use ==

It is crucial that the STONITH device allows the cluster to
differentiate between a node failure and a network one.

The biggest mistake people make in choosing a STONITH device is to use
a remote power switch (such as many on-board IPMI controllers) that
shares power with the node it controls. In such cases, the cluster
cannot be sure if the node is really offline, or active and suffering
from a network fault.

Likewise, any device that relies on the machine being active (such as
SSH-based "devices" used during testing) is inappropriate.

== Differences of STONITH Resources ==

Stonith resources are somewhat special in Pacemaker.

In previous versions, only "running" resources could be used by
Pacemaker for fencing. This requirement has been relaxed to allow
other parts of the cluster (such as resources like DRBD) to reliably
initiate fencing. footnote:[Fencing a node while Pacemaker was moving
stonith resources around would otherwise fail]

Now all nodes have access to their definitions and instantiate them
on-the-fly when needed.

[NOTE]
===========
To disable a fencing device/resource, 'target-role' can be set as you
would for a normal resource.
===========

[NOTE]
===========
To prevent a specific node from using a fencing device, location
constraints will work as expected.
===========
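
As a minimal sketch of the two notes above (the resource id
+FenceDevice+, the constraint id and the node name +pcmk-3+ are purely
illustrative, not taken from this chapter), a disabled fencing device
and a constraint keeping one node from using it might look like this:

[source,XML]
----
<!-- Stop (disable) the fencing device by setting its target-role -->
<primitive id="FenceDevice" class="stonith" type="fence_ipmilan">
  <meta_attributes id="FenceDevice-meta">
    <nvpair id="FenceDevice-target-role" name="target-role" value="Stopped"/>
  </meta_attributes>
</primitive>

<!-- Prevent pcmk-3 from using this particular fencing device -->
<rsc_location id="no-FenceDevice-on-pcmk-3" rsc="FenceDevice" node="pcmk-3" score="-INFINITY"/>
----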

[IMPORTANT]
===========
Currently there is a limitation that fencing resources may only have
one set of meta-attributes and one set of instance-attributes. This
can be revisited if it becomes a significant limitation for people.
===========

== Configuring STONITH ==

[NOTE]
===========
Both configuration shells include functionality to simplify the
process below, particularly the step for deciding which parameters are
required. However, since this document deals only with core components,
you should refer to the Stonith chapter of +Clusters from Scratch+ for
those details.
===========

. Find the correct driver: +stonith_admin --list-installed+

. Find the required parameters associated with the device:
  +stonith_admin --metadata --agent <agent name>+

. Create a file called +stonith.xml+ containing a primitive resource
  with a class of 'stonith', a type matching the agent chosen in step
  1, and a parameter for each of the values returned in step 2
  (a sketch of such a file appears after these steps).

. If the device does not know how to fence nodes based on their uname,
  you may also need to set the special +pcmk_host_map+ parameter.
  See +man stonithd+ for details.

. If the device does not support the list command, you may also need
  to set the special +pcmk_host_list+ and/or +pcmk_host_check+
  parameters. See +man stonithd+ for details.

. If the device does not expect the victim to be specified with the
  port parameter, you may also need to set the special
  +pcmk_host_argument+ parameter. See +man stonithd+ for details.

. Upload it into the CIB using cibadmin:
  +cibadmin -C -o resources --xml-file stonith.xml+

. Set stonith-enabled to true:
  +crm_attribute -t crm_config -n stonith-enabled -v true+

. Once the stonith resource is running, you can test it by executing
  +stonith_admin --reboot nodename+, although you might want to stop
  the cluster on that machine first.
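
For instance, a minimal sketch of such a +stonith.xml+ for a power
switch that addresses its outlets by number (the agent, addresses,
credentials and node names below are purely illustrative) might combine
the agent's own parameters with the special +pcmk_host_map+ and
+pcmk_host_check+ parameters mentioned above:

[source,XML]
----
<primitive id="FenceAPC" class="stonith" type="fence_apc">
  <instance_attributes id="FenceAPC-params">
    <!-- Parameters taken from the agent's metadata (step 2) -->
    <nvpair id="FenceAPC-ipaddr" name="ipaddr" value="10.0.0.2"/>
    <nvpair id="FenceAPC-login"  name="login"  value="apc"/>
    <nvpair id="FenceAPC-passwd" name="passwd" value="apc"/>
    <!-- Special parameters interpreted by the cluster rather than the
         agent: map unames to outlet numbers and use that static list -->
    <nvpair id="FenceAPC-host-map"   name="pcmk_host_map"   value="pcmk-1:1;pcmk-2:2"/>
    <nvpair id="FenceAPC-host-check" name="pcmk_host_check" value="static-list"/>
  </instance_attributes>
</primitive>
----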

=== Example ===

Assuming we have a chassis containing four nodes and an IPMI device
active on 10.0.0.1, we would choose the fence_ipmilan driver in step 2
and obtain the following list of parameters:

.Obtaining a list of STONITH Parameters
[source,C]
----
# stonith_admin --metadata -a fence_ipmilan
----

[source,XML]
----
fence_ipmilan is an I/O Fencing agent which can be used with machines
controlled by IPMI. This agent calls support software using ipmitool
(http://ipmitool.sf.net/).

To use fence_ipmilan with HP iLO 3 you have to enable lanplus option
(lanplus / -P) and increase wait after operation to 4 seconds
(power_wait=4 / -T 4)

IPMI Lan Auth type (md5, password, or none)
IPMI Lan IP to talk to
Password (if required) to control power on IPMI device
Script to retrieve password (if required)
Use Lanplus
Username/Login (if required) to control power on IPMI device
Operation to perform. Valid operations: on, off, reboot, status,
list, diag, monitor or metadata
Timeout (sec) for IPMI operation
Ciphersuite to use (same as ipmitool -C parameter)
Method to fence (onoff or cycle)
Wait X seconds after on/off operation
Wait X seconds before fencing is started
Verbose mode
----

from which we would create a STONITH resource fragment that might look
like this:

.Sample STONITH Resource
[source,XML]
----
<primitive id="Fencing" class="stonith" type="fence_ipmilan">
  <instance_attributes id="Fencing-params">
    <nvpair id="Fencing-ipaddr" name="ipaddr" value="10.0.0.1"/>
    <nvpair id="Fencing-login"  name="login"  value="testuser"/>
    <nvpair id="Fencing-passwd" name="passwd" value="abc123"/>
  </instance_attributes>
  <operations>
    <op id="Fencing-monitor-120s" name="monitor" interval="120s" timeout="60s"/>
  </operations>
</primitive>
----

And finally, since we disabled it earlier, we need to re-enable
STONITH.

[source,Bash]
----
# crm_attribute -t crm_config -n stonith-enabled -v true
----
+
+== Advanced Fencing Configurations ==
+
+Some people consider that having one fencing device is a single point
+of failure footnote:[Not true, since a node or resource must fail
+before fencing even has a chance to]; others prefer removing the node
+from the storage and network instead of turning it off.
+
+Whatever the reason, Pacemaker supports fencing nodes with multiple
+devices through a feature called fencing topologies.
+
+Simply create the individual devices as you normally would and then
+define one or more fencing levels in the fencing-topology section of
+the configuration.
+
+* Each level is attempted in +ascending index+ order
+* If a device fails, +processing terminates+ for the current level:
+  no further devices in that level are exercised, and the next level
+  is attempted instead
+* If the operation succeeds for all the listed devices in a level,
+  the level is deemed to have passed
+* Processing stops +when a level has passed+ or there are no more
+  levels to try
+
+Some suggested uses of topologies include:
+
+* try poison-pill and fall back to power
+* try disk and network, and fall back to power if either fails
+* initiate a kdump and then power off the node
+
+.Properties of Fencing Levels
+[width="95%",cols="1m,6<",options="header",align="center"]
+|=========================================================
+
+|Field
+|Description
+
+|id
+|Your name for the level
+ indexterm:[id,fencing-level]
+ indexterm:[Fencing,fencing-level,id]
+
+|target
+|The node to which this level applies
+ indexterm:[target,fencing-level]
+ indexterm:[Fencing,fencing-level,target]
+
+|index
+|The order in which to attempt the levels.
+ Levels are attempted in +ascending index+ order +until one succeeds+.
+ indexterm:[index,fencing-level]
+ indexterm:[Fencing,fencing-level,index]
+
+|devices
+|A comma separated list of devices that must all succeed for this
+ level to pass
+ indexterm:[devices,fencing-level]
+ indexterm:[Fencing,fencing-level,devices]
+
+|=========================================================
+
+.Example use of Fencing Topologies
+[source,XML]
+----
+<cib>
+  <configuration>
+    ...
+    <fencing-topology>
+      <!-- For node1, try poison-pill and fall back to power -->
+      <fencing-level id="f-node1.1" target="node1" index="1" devices="poison-pill"/>
+      <fencing-level id="f-node1.2" target="node1" index="2" devices="power"/>
+      <!-- For node2, try disk and network, and fall back to power -->
+      <fencing-level id="f-node2.1" target="node2" index="1" devices="disk,network"/>
+      <fencing-level id="f-node2.2" target="node2" index="2" devices="power"/>
+    </fencing-topology>
+    ...
+  </configuration>
+  <status/>
+</cib>
+----
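+
+The fencing-topology section is part of the cluster configuration, so
+it can be edited with the same core tools used earlier in this chapter.
+As a minimal sketch (the temporary file name and the use of +vi+ are
+illustrative, not a Pacemaker requirement), one way to add the section
+is to dump the current CIB, insert it, and load the result back:
+
+[source,Bash]
+----
+# Dump the current CIB to a temporary file
+cibadmin --query > tmp.xml
+
+# Edit tmp.xml and add the fencing-topology section inside configuration
+vi tmp.xml
+
+# Load the modified configuration back into the cluster
+cibadmin --replace --xml-file tmp.xml
+----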