STONITH is an acronym for Shoot-The-Other-Node-In-The-Head and it
protects your data from being corrupted by rogue nodes or concurrent
access.
Just because a node is unresponsive doesn't mean it isn't
accessing your data. The only way to be 100% sure that your data is
safe is to use STONITH, so that we can be certain the node is truly
offline before allowing the data to be accessed from another node.
STONITH also has a role to play in the event that a clustered service
cannot be stopped. In this case, the cluster uses STONITH to force the
whole node offline, thereby making it safe to start the service
elsewhere.
== What STONITH Device Should You Use ==
It is crucial that the STONITH device allows the cluster to
differentiate between a node failure and a network failure.
The biggest mistake people make in choosing a STONITH device is to
use a remote power switch (such as many on-board IPMI controllers) that
shares power with the node it controls. In such cases, the cluster
cannot be sure whether the node is really offline, or active and suffering
from a network fault.
Likewise, any device that relies on the machine being active (such as
SSH-based "devices" used during testing) is inappropriate.
== Configuring STONITH ==
ifdef::pcs[]
. Find the correct driver: +pcs stonith list+
. Find the parameters associated with the device: +pcs stonith describe <agent name>+
. Create a local copy of the configuration to make changes to: +pcs cluster cib stonith_cfg+
. Create the fencing resource using +pcs -f stonith_cfg stonith create <stonith_id>
<stonith device type> [stonith device options]+
. Set +stonith-enabled+ to true: +pcs -f stonith_cfg property set stonith-enabled=true+
endif::[]
ifdef::crm[]
. Find the correct driver: +stonith_admin --list-installed+
. Since every device is different, the parameters needed to configure
it will vary. To find out the parameters associated with the device,
run: +stonith_admin --metadata --agent type+
The output should be XML-formatted text containing additional
parameter descriptions. We will endeavor to make the output more
friendly in a later version.
. Enter the shell (+crm+), create an editable copy of the existing
configuration (+cib new stonith+), and then create a fencing resource
containing a primitive resource with a class of stonith, a type of
+type+ and a parameter for each of the values returned in step 2:
+configure primitive ...+
endif::[]
. If the device does not know how to fence nodes based on their uname,
you may also need to set the special +pcmk_host_map+ parameter. See
+man stonithd+ for details.
. If the device does not support the list command, you may also need
to set the special +pcmk_host_list+ and/or +pcmk_host_check+
parameters. See +man stonithd+ for details.
. If the device does not expect the victim to be specified with the
port parameter, you may also need to set the special
+pcmk_host_argument+ parameter. See +man stonithd+ for details.
ifdef::crm[]
. Upload it into the CIB from the shell: +cib commit stonith+
endif::[]
ifdef::pcs[]
. Commit the new configuration. +pcs cluster push cib stonith_cfg+
endif::[]
. Once the stonith resource is running, you can test it by executing
+stonith_admin --reboot nodename+, although you might want to stop the
cluster on that machine first.
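
The XML metadata returned by +stonith_admin --metadata+ in step 2 can be hard to scan by eye. The following is a minimal sketch, not part of the cluster tools, of how it could be condensed into one line per parameter using Python's standard library. The sample document is an abbreviated, hand-written excerpt in the same format, and the function names +summarize_parameters+ and +format_parameters+ are hypothetical.

[source,python]
----
import xml.etree.ElementTree as ET

# Abbreviated, hand-written sample in the fence agent metadata format.
# Real input would come from `stonith_admin --metadata --agent <type>`.
SAMPLE_METADATA = """<?xml version="1.0" ?>
<resource-agent name="fence_ipmilan" shortdesc="Fence agent for IPMI over LAN">
  <parameters>
    <parameter name="ipaddr" unique="1">
      <getopt mixed="-a" />
      <content type="string" />
      <shortdesc lang="en">IPMI Lan IP to talk to</shortdesc>
    </parameter>
    <parameter name="method" unique="1">
      <getopt mixed="-M" />
      <content type="string" default="onoff"/>
      <shortdesc lang="en">Method to fence (onoff or cycle)</shortdesc>
    </parameter>
  </parameters>
</resource-agent>
"""


def summarize_parameters(metadata_xml):
    """Return (name, description, default) tuples, one per <parameter>."""
    root = ET.fromstring(metadata_xml)
    rows = []
    for param in root.findall("./parameters/parameter"):
        content = param.find("content")
        # The default attribute is optional; None means "no default given".
        default = content.get("default") if content is not None else None
        description = param.findtext("shortdesc", default="").strip()
        rows.append((param.get("name"), description, default))
    return rows


def format_parameters(rows):
    """Render the tuples as 'name: description (default: value)' lines."""
    lines = []
    for name, description, default in rows:
        suffix = f" (default: {default})" if default is not None else ""
        lines.append(f"{name}: {description}{suffix}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_parameters(summarize_parameters(SAMPLE_METADATA)))
----

To try it against a real agent, the output of the +stonith_admin+ command could be saved to a file and its contents passed to +summarize_parameters+ in place of the sample.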
== Example ==
Assuming we have a chassis containing four nodes and an IPMI device
active on 10.0.0.1, we would choose the fence_ipmilan driver in step
2 and obtain the following list of parameters:
.Obtaining a list of STONITH Parameters
ifdef::pcs[]
[source,C]
----
# pcs stonith describe fence_ipmilan
Stonith options for: fence_ipmilan
auth: IPMI Lan Auth type (md5, password, or none)
ipaddr: IPMI Lan IP to talk to
passwd: Password (if required) to control power on IPMI device
passwd_script: Script to retrieve password (if required)
lanplus: Use Lanplus
login: Username/Login (if required) to control power on IPMI device
action: Operation to perform. Valid operations: on, off, reboot, status, list, diag, monitor or metadata
timeout: Timeout (sec) for IPMI operation
cipher: Ciphersuite to use (same as ipmitool -C parameter)
method: Method to fence (onoff or cycle)
power_wait: Wait X seconds after on/off operation
delay: Wait X seconds before fencing is started
privlvl: Privilege level on IPMI device
verbose: Verbose mode
----
endif::[]
ifdef::crm[]
[source,C]
----
# stonith_admin --metadata -a fence_ipmilan
----
[source,XML]
----
<?xml version="1.0" ?>
<resource-agent name="fence_ipmilan" shortdesc="Fence agent for IPMI over LAN">
<longdesc>
fence_ipmilan is an I/O Fencing agent which can be used with machines controlled by IPMI. This agent calls support software using ipmitool (http://ipmitool.sf.net/).
To use fence_ipmilan with HP iLO 3 you have to enable lanplus option (lanplus / -P) and increase wait after operation to 4 seconds (power_wait=4 / -T 4)</longdesc>
<parameters>
<parameter name="auth" unique="1">
<getopt mixed="-A" />
<content type="string" />
<shortdesc lang="en">IPMI Lan Auth type (md5, password, or none)</shortdesc>
</parameter>
<parameter name="ipaddr" unique="1">
<getopt mixed="-a" />
<content type="string" />
<shortdesc lang="en">IPMI Lan IP to talk to</shortdesc>
</parameter>
<parameter name="passwd" unique="1">
<getopt mixed="-p" />
<content type="string" />
<shortdesc lang="en">Password (if required) to control power on IPMI device</shortdesc>
</parameter>
<parameter name="passwd_script" unique="1">
<getopt mixed="-S" />
<content type="string" />
<shortdesc lang="en">Script to retrieve password (if required)</shortdesc>
</parameter>
<parameter name="lanplus" unique="1">
<getopt mixed="-P" />
<content type="boolean" />
<shortdesc lang="en">Use Lanplus</shortdesc>
</parameter>
<parameter name="login" unique="1">
<getopt mixed="-l" />
<content type="string" />
<shortdesc lang="en">Username/Login (if required) to control power on IPMI device</shortdesc>
</parameter>
<parameter name="action" unique="1">
<getopt mixed="-o" />
<content type="string" default="reboot"/>
<shortdesc lang="en">Operation to perform. Valid operations: on, off, reboot, status, list, diag, monitor or metadata</shortdesc>
</parameter>
<parameter name="timeout" unique="1">
<getopt mixed="-t" />
<content type="string" />
<shortdesc lang="en">Timeout (sec) for IPMI operation</shortdesc>
</parameter>
<parameter name="cipher" unique="1">
<getopt mixed="-C" />
<content type="string" />
<shortdesc lang="en">Ciphersuite to use (same as ipmitool -C parameter)</shortdesc>
</parameter>
<parameter name="method" unique="1">
<getopt mixed="-M" />
<content type="string" default="onoff"/>
<shortdesc lang="en">Method to fence (onoff or cycle)</shortdesc>
</parameter>
<parameter name="power_wait" unique="1">
<getopt mixed="-T" />
<content type="string" default="2"/>
<shortdesc lang="en">Wait X seconds after on/off operation</shortdesc>
</parameter>
<parameter name="delay" unique="1">
<getopt mixed="-f" />
<content type="string" />
<shortdesc lang="en">Wait X seconds before fencing is started</shortdesc>
</parameter>
<parameter name="verbose" unique="1">
<getopt mixed="-v" />
<content type="boolean" />
<shortdesc lang="en">Verbose mode</shortdesc>
</parameter>
</parameters>
<actions>
<action name="on" />
<action name="off" />
<action name="reboot" />
<action name="status" />
<action name="diag" />
<action name="list" />
<action name="monitor" />
<action name="metadata" />
</actions>
</resource-agent>
----
endif::[]
from which we would create a STONITH resource fragment that might look
The tool uses the same library as the live cluster to show what it
would have done given the supplied input. Its output, in addition to
a significant amount of logging, is stored in two files, +tmp.graph+
and +tmp.dot+. Both are representations of the same thing: the
cluster's response to your changes.
The graph file stores the complete transition, containing a list
of all the actions, their parameters and their prerequisites.
Because the transition graph is not terribly easy to read, the tool
also generates a Graphviz dot-file representing the same information.
== Interpreting the Graphviz output ==
* Arrows indicate ordering dependencies
* Dashed-arrows indicate dependencies that are not present in the transition graph
* Actions with a dashed border of any color do not form part of the transition graph
* Actions with a green border form part of the transition graph
* Actions with a red border are ones the cluster would like to execute but cannot run
* Actions with a blue border are ones the cluster does not feel need to be executed
* Actions with orange text are pseudo/pretend actions that the cluster uses to simplify the graph
* Actions with black text are sent to the LRM
* Resource actions have text of the form pass:[<replaceable>rsc</replaceable>]_pass:[<replaceable>action</replaceable>]_pass:[<replaceable>interval</replaceable>] pass:[<replaceable>node</replaceable>]
* Any action depending on an action with a red border will not be able to execute.
* Loops are _really_ bad. Please report them to the development team.
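
The rules above can also be applied mechanically. The following is a minimal sketch, assuming that the dot-file encodes the border as a +color+ attribute, the text colour as +fontcolor+ and a dashed border as +style=dashed+. These attribute names and the sample excerpt are assumptions made for illustration, so check them against a file generated on your own cluster. The function names are hypothetical.

[source,python]
----
import re

# Hand-written excerpt; attribute names are assumptions (see the text above).
SAMPLE_DOT = '''digraph "g" {
"rsc1_monitor_0 node2" -> "probe_complete node2" [ style = bold]
"rsc1_monitor_0 node2" [ style=bold color="green" fontcolor="black" ]
"probe_complete node2" [ style=bold color="green" fontcolor="orange" ]
"rsc1_stop_0 node1" [ style=dashed color="red" fontcolor="black" ]
"rsc3_start_0 node2" [ style=dashed color="blue" fontcolor="black" ]
}
'''

BORDER_MEANING = {
    "green": "part of the transition graph",
    "red": "wanted by the cluster but cannot run",
    "blue": "not considered necessary by the cluster",
}
TEXT_MEANING = {
    "orange": "pseudo action",
    "black": "sent to the LRM",
}

# A node definition is a quoted label followed directly by [attributes].
NODE_LINE = re.compile(r'^"(?P<label>[^"]+)"\s*\[(?P<attrs>[^\]]*)\]\s*$')
ATTRIBUTE = re.compile(r'(\w+)\s*=\s*"?([\w-]+)"?')
# Resource actions are labelled "<rsc>_<action>_<interval> <node>".
RESOURCE_ACTION = re.compile(
    r'^(?P<rsc>.+)_(?P<action>[^_ ]+)_(?P<interval>\d+) (?P<node>\S+)$'
)


def describe_actions(dot_text):
    """Map each action label to the meaning of its border and text colour."""
    result = {}
    for line in dot_text.splitlines():
        match = NODE_LINE.match(line.strip())
        if match is None:
            continue  # edges, braces and anything else are skipped
        attrs = dict(ATTRIBUTE.findall(match.group("attrs")))
        result[match.group("label")] = {
            "border": BORDER_MEANING.get(attrs.get("color"), "unknown"),
            "text": TEXT_MEANING.get(attrs.get("fontcolor"), "unknown"),
            # A dashed border of any colour means "not in the graph".
            "in_graph": attrs.get("style") != "dashed",
        }
    return result


def parse_resource_action(label):
    """Split a resource action label into its parts, or return None."""
    match = RESOURCE_ACTION.match(label)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for label, meaning in describe_actions(SAMPLE_DOT).items():
        print(label, meaning, parse_resource_action(label))
----

Labels that do not follow the resource action form, such as the pseudo action in the sample, make +parse_resource_action+ return +None+ rather than a guess.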
=== Small Cluster Transition ===
image::images/Policy-Engine-small.png["An example transition graph as represented by Graphviz",width="16cm",height="6cm",align="center"]
In the above example, it appears that a new node, +node2+, has come
online and that the cluster is checking to make sure +rsc1+, +rsc2+
and +rsc3+ are not already running there (indicated by the
+*_monitor_0+ entries). Once it has done that, and assuming the resources
are not active there, it would like to stop +rsc1+ and +rsc2+
on +node1+ and move them to +node2+. However, there appears to be
some problem and the cluster cannot, or is not permitted to, perform the
stop actions, which implies it also cannot perform the start actions.
For some reason, the cluster does not want to start +rsc3+ anywhere.
For information on the options supported by `crm_simulate`, use
the `--help` option.
=== Complex Cluster Transition ===
image::images/Policy-Engine-big.png["Another, slightly more complex, transition graph that you're not expected to be able to read",width="16cm",height="20cm",align="center"]
== Do I Need to Update the Configuration on all Cluster Nodes? ==
No. Any changes are immediately synchronized to the other active
members of the cluster.
To reduce bandwidth, the cluster only broadcasts the incremental
updates that result from your changes and uses MD5 checksums to ensure