diff --git a/doc/sphinx/Pacemaker_Explained/fencing.rst b/doc/sphinx/Pacemaker_Explained/fencing.rst index d9b8f21d72..df928b5dbc 100644 --- a/doc/sphinx/Pacemaker_Explained/fencing.rst +++ b/doc/sphinx/Pacemaker_Explained/fencing.rst @@ -1,1026 +1,1170 @@ +.. index:: + single: fencing + single: STONITH + +.. _fencing: + Fencing ------- -.. Convert_to_RST: - - anchor:ch-fencing[Chapter 6, Fencing] - indexterm:[Fencing, Configuration] - indexterm:[STONITH, Configuration] - - == What Is Fencing? == - - 'Fencing' is the ability to make a node unable to run resources, even when that - node is unresponsive to cluster commands. - - Fencing is also known as 'STONITH', an acronym for "Shoot The Other Node In The - Head", since the most common fencing method is cutting power to the node. - Another method is "fabric fencing", cutting the node's access to some - capability required to run resources (such as network access or a shared disk). - - == Why Is Fencing Necessary? == - - Fencing protects your data from being corrupted by malfunctioning nodes or - unintentional concurrent access to shared resources. - - Fencing protects against the "split brain" failure scenario, where cluster - nodes have lost the ability to reliably communicate with each other but are - still able to run resources. If the cluster just assumed that uncommunicative - nodes were down, then multiple instances of a resource could be started on - different nodes. - - The effect of split brain depends on the resource type. For example, an IP - address brought up on two hosts on a network will cause packets to randomly be - sent to one or the other host, rendering the IP useless. For a database or - clustered file system, the effect could be much more severe, causing data - corruption or divergence. - - Fencing also is used when a resource cannot otherwise be stopped. If a failed - resource fails to stop, it cannot be recovered elsewhere. Fencing the - resource's node is the only way to ensure the resource is recoverable. - - Users may also configure the +on-fail+ property of any resource operation to - +fencing+, in which case the cluster will fence the resource's node if the - operation fails. - - == Fence Devices == - - A 'fence device' (or 'fencing device') is a special type of resource that - provides the means to fence a node. - - Examples of fencing devices include intelligent power switches and IPMI devices - that accept SNMP commands to cut power to a node, and iSCSI controllers that - allow SCSI reservations to be used to cut a node's access to a shared disk. - - Since fencing devices will be used to recover from loss of networking - connectivity to other nodes, it is essential that they do not rely on the same - network as the cluster itself, otherwise that network becomes a single point of - failure. - - Since loss of a node due to power outage is indistinguishable from loss of - network connectivity to that node, it is also essential that at least one fence - device for a node does not share power with that node. For example, an on-board - IPMI controller that shares power with its host should not be used as the sole - fencing device for that host. - - Since fencing is used to isolate malfunctioning nodes, no fence device should - rely on its target functioning properly. This includes, for example, devices - that ssh into a node and issue a shutdown command (such devices might be - suitable for testing, but never for production). 
- - == Fence Agents == - - A 'fence agent' (or 'fencing agent') is a +stonith+-class resource agent. - - The fence agent standard provides commands (such as +off+ and +reboot+) that - the cluster can use to fence nodes. As with other resource agent classes, - this allows a layer of abstraction so that Pacemaker doesn't need any knowledge - about specific fencing technologies -- that knowledge is isolated in the agent. - - == When a Fence Device Can Be Used == - - Fencing devices do not actually "run" like most services. Typically, they just - provide an interface for sending commands to an external device. - - Additionally, fencing may be initiated by Pacemaker, by other cluster-aware software - such as DRBD or DLM, or manually by an administrator, at any point in the - cluster life cycle, including before any resources have been started. - - To accommodate this, Pacemaker does not require the fence device resource to be - "started" in order to be used. Whether a fence device is started or not - determines whether a node runs any recurring monitor for the device, and gives - the node a slight preference for being chosen to execute fencing using that - device. - - By default, any node can execute any fencing device. If a fence device is - disabled by setting its +target-role+ to Stopped, then no node can use that - device. If mandatory location constraints prevent a specific node from - "running" a fence device, then that node will never be chosen to execute - fencing using the device. A node may fence itself, but the cluster will choose - that only if no other nodes can do the fencing. - - A common configuration scenario is to have one fence device per target node. - In such a case, users often configure anti-location constraints so that - the target node does not monitor its own device. The best practice is to make - the constraint optional (i.e. a finite negative score rather than +-INFINITY+), - so that the node can fence itself if no other nodes can. - - == Limitations of Fencing Resources == - - Fencing resources have certain limitations that other resource classes don't: - - * They may have only one set of meta-attributes and one set of instance - attributes. - * If <> are used to determine fencing resource options, these - may only be evaluated when first read, meaning that later changes to the - rules will have no effect. Therefore, it is better to avoid confusion and not - use rules at all with fencing resources. - - These limitations could be revisited if there is significant user demand. - - == Special Options for Fencing Resources == - - The table below lists special instance attributes that may be set for any - fencing resource ('not' meta-attributes, even though they are interpreted by - pacemaker rather than the fence agent). These are also listed in the man page - for +pacemaker-fenced+. - - .Additional Properties of Fencing Resources - [width="95%",cols="8m,3,6,<12",options="header",align="center"] - |========================================================= - - |Field - |Type - |Default - |Description - - |stonith-timeout - |NA - |NA - a|Older versions used this to override the default period to wait for a STONITH (reboot, on, off) action to complete for this device. - It has been replaced by the +pcmk_reboot_timeout+ and +pcmk_off_timeout+ properties. - indexterm:[stonith-timeout,Fencing] - indexterm:[Fencing,Property,stonith-timeout] - - //// - (not yet implemented) - priority - integer - 0 - The priority of the STONITH resource. 
Devices are tried in order of highest priority to lowest. - indexterm priority,Fencing - indexterm Fencing,Property,priority - //// - - |provides - |string - | - |Any special capability provided by the fence device. Currently, only one such - capability is meaningful: +unfencing+ (see <>). - indexterm:[provides,Fencing] - indexterm:[Fencing,Property,provides] - - |pcmk_host_map - |string - | - |A mapping of host names to ports numbers for devices that do not support host names. - Example: +node1:1;node2:2,3+ tells the cluster to use port 1 for - *node1* and ports 2 and 3 for *node2*. If +pcmk_host_check+ is explicitly set - to +static-list+, either this or +pcmk_host_list+ must be set. - indexterm:[pcmk_host_map,Fencing] - indexterm:[Fencing,Property,pcmk_host_map] - - |pcmk_host_list - |string - | - |A list of machines controlled by this device. If +pcmk_host_check+ is - explicitly set to +static-list+, either this or +pcmk_host_map+ must be set. - indexterm:[pcmk_host_list,Fencing] - indexterm:[Fencing,Property,pcmk_host_list] - - |pcmk_host_check - |string - |A value appropriate to other configuration options and - device capabilities (see note below) - a|How to determine which machines are controlled by the device. - Allowed values: - - * +dynamic-list:+ query the device via the "list" command - * +static-list:+ check the +pcmk_host_list+ or +pcmk_host_map+ attribute - * +status:+ query the device via the "status" command - * +none:+ assume every device can fence every machine - - indexterm:[pcmk_host_check,Fencing] - indexterm:[Fencing,Property,pcmk_host_check] - - |pcmk_delay_max - |time - |0s - |Enable a random delay of up to the time specified before executing fencing - actions. This is sometimes used in two-node clusters to ensure that the - nodes don't fence each other at the same time. The overall delay introduced - by pacemaker is derived from this random delay value adding a static delay so - that the sum is kept below the maximum delay. - - indexterm:[pcmk_delay_max,Fencing] - indexterm:[Fencing,Property,pcmk_delay_max] - - |pcmk_delay_base - |time - |0s - |Enable a static delay before executing fencing actions. This can be used - e.g. in two-node clusters to ensure that the nodes don't fence each other, - by having separate fencing resources with different values. The node that is - fenced with the shorter delay will lose a fencing race. The overall delay - introduced by pacemaker is derived from this value plus a random delay such - that the sum is kept below the maximum delay. - - indexterm:[pcmk_delay_base,Fencing] - indexterm:[Fencing,Property,pcmk_delay_base] - - |pcmk_action_limit - |integer - |1 - |The maximum number of actions that can be performed in parallel on this - device, if the cluster option +concurrent-fencing+ is +true+. -1 is unlimited. - - indexterm:[pcmk_action_limit,Fencing] - indexterm:[Fencing,Property,pcmk_action_limit] - - |pcmk_host_argument - |string - |+port+ otherwise +plug+ if supported according to the metadata of the fence agent - |'Advanced use only.' Which parameter should be supplied to the fence agent to - identify the node to be fenced. Some devices support neither the standard +plug+ - nor the deprecated +port+ parameter, or may provide additional ones. Use this to - specify an alternate, device-specific parameter. A value of +none+ tells the - cluster not to supply any additional parameters. 
- indexterm:[pcmk_host_argument,Fencing] - indexterm:[Fencing,Property,pcmk_host_argument] - - |pcmk_reboot_action - |string - |reboot - |'Advanced use only.' The command to send to the resource agent in order to - reboot a node. Some devices do not support the standard commands or may provide - additional ones. Use this to specify an alternate, device-specific command. - indexterm:[pcmk_reboot_action,Fencing] - indexterm:[Fencing,Property,pcmk_reboot_action] - - |pcmk_reboot_timeout - |time - |60s - |'Advanced use only.' Specify an alternate timeout to use for `reboot` actions - instead of the value of +stonith-timeout+. Some devices need much more or less - time to complete than normal. Use this to specify an alternate, device-specific - timeout. - indexterm:[pcmk_reboot_timeout,Fencing] - indexterm:[Fencing,Property,pcmk_reboot_timeout] - indexterm:[stonith-timeout,Fencing] - indexterm:[Fencing,Property,stonith-timeout] - - |pcmk_reboot_retries - |integer - |2 - |'Advanced use only.' The maximum number of times to retry the `reboot` command - within the timeout period. Some devices do not support multiple connections, and - operations may fail if the device is busy with another task, so Pacemaker will - automatically retry the operation, if there is time remaining. Use this option - to alter the number of times Pacemaker retries before giving up. - indexterm:[pcmk_reboot_retries,Fencing] - indexterm:[Fencing,Property,pcmk_reboot_retries] - - |pcmk_off_action - |string - |off - |'Advanced use only.' The command to send to the resource agent in order to - shut down a node. Some devices do not support the standard commands or may provide - additional ones. Use this to specify an alternate, device-specific command. - indexterm:[pcmk_off_action,Fencing] - indexterm:[Fencing,Property,pcmk_off_action] - - |pcmk_off_timeout - |time - |60s - |'Advanced use only.' Specify an alternate timeout to use for `off` actions - instead of the value of +stonith-timeout+. Some devices need much more or less - time to complete than normal. Use this to specify an alternate, device-specific - timeout. - indexterm:[pcmk_off_timeout,Fencing] - indexterm:[Fencing,Property,pcmk_off_timeout] - indexterm:[stonith-timeout,Fencing] - indexterm:[Fencing,Property,stonith-timeout] - - |pcmk_off_retries - |integer - |2 - |'Advanced use only.' The maximum number of times to retry the `off` command - within the timeout period. Some devices do not support multiple connections, and - operations may fail if the device is busy with another task, so Pacemaker will - automatically retry the operation, if there is time remaining. Use this option - to alter the number of times Pacemaker retries before giving up. - indexterm:[pcmk_off_retries,Fencing] - indexterm:[Fencing,Property,pcmk_off_retries] - - |pcmk_list_action - |string - |list - |'Advanced use only.' The command to send to the resource agent in order to - list nodes. Some devices do not support the standard commands or may provide - additional ones. Use this to specify an alternate, device-specific command. - indexterm:[pcmk_list_action,Fencing] - indexterm:[Fencing,Property,pcmk_list_action] - - |pcmk_list_timeout - |time - |60s - |'Advanced use only.' Specify an alternate timeout to use for `list` actions - instead of the value of +stonith-timeout+. Some devices need much more or less - time to complete than normal. Use this to specify an alternate, device-specific - timeout. 
- indexterm:[pcmk_list_timeout,Fencing] - indexterm:[Fencing,Property,pcmk_list_timeout] - - |pcmk_list_retries - |integer - |2 - |'Advanced use only.' The maximum number of times to retry the `list` command - within the timeout period. Some devices do not support multiple connections, and - operations may fail if the device is busy with another task, so Pacemaker will - automatically retry the operation, if there is time remaining. Use this option - to alter the number of times Pacemaker retries before giving up. - indexterm:[pcmk_list_retries,Fencing] - indexterm:[Fencing,Property,pcmk_list_retries] - - |pcmk_monitor_action - |string - |monitor - |'Advanced use only.' The command to send to the resource agent in order to - report extended status. Some devices do not support the standard commands or may provide - additional ones. Use this to specify an alternate, device-specific command. - indexterm:[pcmk_monitor_action,Fencing] - indexterm:[Fencing,Property,pcmk_monitor_action] - - |pcmk_monitor_timeout - |time - |60s - |'Advanced use only.' Specify an alternate timeout to use for `monitor` actions - instead of the value of +stonith-timeout+. Some devices need much more or less - time to complete than normal. Use this to specify an alternate, device-specific - timeout. - indexterm:[pcmk_monitor_timeout,Fencing] - indexterm:[Fencing,Property,pcmk_monitor_timeout] - - |pcmk_monitor_retries - |integer - |2 - |'Advanced use only.' The maximum number of times to retry the `monitor` command - within the timeout period. Some devices do not support multiple connections, and - operations may fail if the device is busy with another task, so Pacemaker will - automatically retry the operation, if there is time remaining. Use this option - to alter the number of times Pacemaker retries before giving up. - indexterm:[pcmk_monitor_retries,Fencing] - indexterm:[Fencing,Property,pcmk_monitor_retries] - - |pcmk_status_action - |string - |status - |'Advanced use only.' The command to send to the resource agent in order to - report status. Some devices do not support the standard commands or may provide - additional ones. Use this to specify an alternate, device-specific command. - indexterm:[pcmk_status_action,Fencing] - indexterm:[Fencing,Property,pcmk_status_action] - - |pcmk_status_timeout - |time - |60s - |'Advanced use only.' Specify an alternate timeout to use for `status` actions - instead of the value of +stonith-timeout+. Some devices need much more or less - time to complete than normal. Use this to specify an alternate, device-specific - timeout. - indexterm:[pcmk_status_timeout,Fencing] - indexterm:[Fencing,Property,pcmk_status_timeout] - - |pcmk_status_retries - |integer - |2 - |'Advanced use only.' The maximum number of times to retry the `status` command - within the timeout period. Some devices do not support multiple connections, and - operations may fail if the device is busy with another task, so Pacemaker will - automatically retry the operation, if there is time remaining. Use this option - to alter the number of times Pacemaker retries before giving up. - indexterm:[pcmk_status_retries,Fencing] - indexterm:[Fencing,Property,pcmk_status_retries] - - |========================================================= - - [NOTE] - ==== - The default value for +pcmk_host_check+ is +static-list+ if either - +pcmk_host_list+ or +pcmk_host_map+ is configured. 
If neither of those are - configured, the default is +dynamic-list+ if the fence device supports the list - action, or +status+ if the fence device supports the status action but not the - list action. If none of those conditions apply, the default is +none+. - ==== - - [[s-unfencing]] - == Unfencing == - - With fabric fencing (such as cutting network or shared disk access rather than - power), it is expected that the cluster will fence the node, and - then a system administrator must manually investigate what went wrong, correct - any issues found, then reboot (or restart the cluster services on) the node. - - Once the node reboots and rejoins the cluster, some fabric fencing devices - require an explicit command to restore the node's access. This capability is - called 'unfencing' and is typically implemented as the fence agent's +on+ - command. - - If any cluster resource has +requires+ set to +unfencing+, then that resource - will not be probed or started on a node until that node has been unfenced. - - == Fence Devices Dependent on Other Resources == - - In some cases, a fence device may require some other cluster resource (such as - an IP address) to be active in order to function properly. - - This is obviously undesirable in general: fencing may be required when the - depended-on resource is not active, or fencing may be required because the node - running the depended-on resource is no longer responding. - - However, this may be acceptable under certain conditions: - - * The dependent fence device should not be able to target any node that is - allowed to run the depended-on resource. - - * The depended-on resource should not be disabled during production operation. - - * The +concurrent-fencing+ cluster property should be set to +true+. Otherwise, - if both the node running the depended-on resource and some node targeted by - the dependent fence device need to be fenced, the fencing of the node - running the depended-on resource might be ordered first, making the second - fencing impossible and blocking further recovery. With concurrent fencing, - the dependent fence device might fail at first due to the depended-on - resource being unavailable, but it will be retried and eventually succeed - once the resource is brought back up. - - Even under those conditions, there is one unlikely problem scenario. The DC - always schedules fencing of itself after any other fencing needed, to avoid - unnecessary repeated DC elections. If the dependent fence device targets the - DC, and both the DC and a different node running the depended-on resource need - to be fenced, the DC fencing will always fail and block further recovery. Note, - however, that losing a DC node entirely causes some other node to become DC and - schedule the fencing, so this is only a risk when a stop or other operation - with +on-fail+ set to +fencing+ fails on the DC. - - == Configuring Fencing == - - . Find the correct driver: - + - ---- - # stonith_admin --list-installed - ---- - - . Find the required parameters associated with the device - (replacing $AGENT_NAME with the name obtained from the previous step): - + - ---- - # stonith_admin --metadata --agent $AGENT_NAME - ---- - - . Create a file called +stonith.xml+ containing a primitive resource - with a class of +stonith+, a type equal to the agent name obtained earlier, - and a parameter for each of the values returned in the previous step. - - . 
If the device does not know how to fence nodes based on their uname,
-   you may also need to set the special +pcmk_host_map+ parameter. See
-   `man pacemaker-fenced` for details.
-
-   . If the device does not support the `list` command, you may also need
-   to set the special +pcmk_host_list+ and/or +pcmk_host_check+
-   parameters. See `man pacemaker-fenced` for details.
-
-   . If the device does not expect the victim to be specified with the
-   `port` parameter, you may also need to set the special
-   +pcmk_host_argument+ parameter. See `man pacemaker-fenced` for details.
-
-   . Upload it into the CIB using cibadmin:
-   +
-   ----
-   # cibadmin -C -o resources --xml-file stonith.xml
-   ----
-
-   . Set +stonith-enabled+ to true:
-   +
-   ----
-   # crm_attribute -t crm_config -n stonith-enabled -v true
-   ----
-
-   . Once the stonith resource is running, you can test it by executing the
-   following (although you might want to stop the cluster on that machine
-   first):
-   +
-   ----
-   # stonith_admin --reboot nodename
-   ----
-
-   === Example Fencing Configuration ===
-
-   Assume we have a chassis containing four nodes and an IPMI device
-   active on 192.0.2.1. We would choose the `fence_ipmilan` driver,
-   and obtain the following list of parameters:
-
-   .Obtaining a list of Fence Agent Parameters
-   ====
-   ----
-   # stonith_admin --metadata -a fence_ipmilan
-   ----
-
-   [source,XML]
-   ----
-   (fence_ipmilan metadata XML omitted)
-   ----
-   ====
-
-   Based on that, we would create a fencing resource fragment that might look
-   like this:
-
-   .An IPMI-based Fencing Resource
-   ====
-   [source,XML]
-   ----
-   (IPMI-based fencing primitive XML omitted)
-   ----
-   ====
-
-   Finally, we need to enable fencing:
-   ----
-   # crm_attribute -t crm_config -n stonith-enabled -v true
-   ----
-
-   == Fencing Topologies ==
-
-   Pacemaker supports fencing nodes with multiple devices through a feature called
-   'fencing topologies'. Fencing topologies may be used to provide alternative
-   devices in case one fails, or to require multiple devices to all be executed
-   successfully in order to consider the node successfully fenced, or even a
-   combination of the two.
-
-   Create the individual devices as you normally would, then define one or more
-   +fencing-level+ entries in the +fencing-topology+ section of the configuration.
-
-   * Each fencing level is attempted in order of ascending +index+. Allowed
-   values are 1 through 9.
-   * If a device fails, processing terminates for the current level.
-   No further devices in that level are exercised, and the next level is attempted instead.
-   * If the operation succeeds for all the listed devices in a level, the level is deemed to have passed.
-   * The operation is finished when a level has passed (success), or all levels have been attempted (failed).
-   * If the operation failed, the next step is determined by the scheduler
-   and/or the controller.
-
-   Some possible uses of topologies include:
-
-   * Try on-board IPMI, then an intelligent power switch if that fails
-   * Try fabric fencing of both disk and network, then fall back to power fencing
-   if either fails
-   * Wait up to a certain time for a kernel dump to complete, then cut power to
-   the node
-
-   .Properties of Fencing Levels
-   [width="95%",cols="1m,<3",options="header",align="center"]
-   |=========================================================
-
-   |Field
-   |Description
-
-   |id
-   |A unique name for the level
-   indexterm:[id,fencing-level]
-   indexterm:[Fencing,fencing-level,id]
-
-   |target
-   |The name of a single node to which this level applies
-   indexterm:[target,fencing-level]
-   indexterm:[Fencing,fencing-level,target]
-
-   |target-pattern
-   |An extended regular expression (as defined in
-   http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04[POSIX])
-   matching the names of nodes to which this level applies
-   indexterm:[target-pattern,fencing-level]
-   indexterm:[Fencing,fencing-level,target-pattern]
-
-   |target-attribute
-   |The name of a node attribute that is set (to +target-value+) for nodes to
-   which this level applies
-   indexterm:[target-attribute,fencing-level]
-   indexterm:[Fencing,fencing-level,target-attribute]
-
-   |target-value
-   |The node attribute value (of +target-attribute+) that is set for nodes to
-   which this level applies
-   indexterm:[target-attribute,fencing-level]
-   indexterm:[Fencing,fencing-level,target-attribute]
-
-   |index
-   |The order in which to attempt the levels.
-   Levels are attempted in ascending order 'until one succeeds'.
-   Valid values are 1 through 9.
-   indexterm:[index,fencing-level]
-   indexterm:[Fencing,fencing-level,index]
-
-   |devices
-   |A comma-separated list of devices that must all be tried for this level
-   indexterm:[devices,fencing-level]
-   indexterm:[Fencing,fencing-level,devices]
-
-   |=========================================================
-
-   .Fencing topology with different devices for different nodes
-   ====
-   [source,XML]
-   ----
-   (example fencing-topology XML omitted)
-   ----
-   ====
-
-   === Example Dual-Layer, Dual-Device Fencing Topologies ===
-
-   The following example illustrates an advanced use of +fencing-topology+ in a cluster with the following properties:
-
-   * 3 nodes (2 active prod-mysql nodes, 1 prod_mysql-rep in standby for quorum purposes)
-   * the active nodes have an IPMI-controlled power board reached at 192.0.2.1 and 192.0.2.2
-   * the active nodes also have two independent PSUs (Power Supply Units)
-   connected to two independent PDUs (Power Distribution Units) reached at
-   198.51.100.1 (port 10 and port 11) and 203.0.113.1 (port 10 and port 11)
-   * the first fencing method uses the `fence_ipmi` agent
-   * the second fencing method uses the `fence_apc_snmp` agent targeting 2 fencing devices (one per PSU, either port 10 or 11)
-   * fencing is only implemented for the active nodes and has location constraints
-   * fencing topology is set to try IPMI fencing first then default to a "sure-kill" dual PDU fencing
-
-   In a normal failure scenario, STONITH will first select +fence_ipmi+ to try to kill the faulty node.
-   Using a fencing topology, if that first method fails, STONITH will then move on to selecting +fence_apc_snmp+ twice:
-
-   * once for the first PDU
-   * again for the second PDU
-
-   The fence action is considered successful only if both PDUs report the required status.
If any of them fails, STONITH loops back to the first fencing method, +fence_ipmi+, and so on until the node is fenced or the fencing action is cancelled.
-
-   .First fencing method: single IPMI device
-
-   Each cluster node has its own dedicated IPMI channel that can be called for fencing using the following primitives:
-   [source,XML]
-   ----
-   (IPMI fencing primitive XML omitted)
-   ----
-
-   .Second fencing method: dual PDU devices
-
-   Each cluster node also has two distinct power channels controlled by two
-   distinct PDUs. That means a total of 4 fencing devices configured as follows:
-
-   - Node 1, PDU 1, PSU 1 @ port 10
-   - Node 1, PDU 2, PSU 2 @ port 10
-   - Node 2, PDU 1, PSU 1 @ port 11
-   - Node 2, PDU 2, PSU 2 @ port 11
-
-   The matching fencing agents are configured as follows:
-   [source,XML]
-   ----
-   (PDU fencing primitive XML omitted)
-   ----
-
-   .Location Constraints
-
-   To prevent STONITH from trying to run a fencing agent on the same node it is
-   supposed to fence, constraints are placed on all the fencing primitives:
-   [source,XML]
-   ----
-   (location constraint XML omitted)
-   ----
-
-   .Fencing topology
-
-   Now that all the fencing resources are defined, it's time to create the right topology.
-   We want to first fence using IPMI and if that does not work, fence both PDUs to effectively and surely kill the node.
-   [source,XML]
-   ----
-   (fencing-topology XML omitted)
-   ----
-   Please note, in +fencing-topology+, the lowest +index+ value determines the priority of the first fencing method.
-
-   .Final configuration
-
-   Put together, the configuration looks like this:
-   [source,XML]
-   ----
-   (full CIB XML omitted)
-   ----
+What Is Fencing?
+################
+
+*Fencing* is the ability to make a node unable to run resources, even when that
+node is unresponsive to cluster commands.
+
+Fencing is also known as *STONITH*, an acronym for "Shoot The Other Node In The
+Head", since the most common fencing method is cutting power to the node.
+Another method is "fabric fencing", cutting the node's access to some
+capability required to run resources (such as network access or a shared disk).
+
+.. index::
+   single: fencing; why necessary
+
+Why Is Fencing Necessary?
+#########################
+
+Fencing protects your data from being corrupted by malfunctioning nodes or
+unintentional concurrent access to shared resources.
+
+Fencing protects against the "split brain" failure scenario, where cluster
+nodes have lost the ability to reliably communicate with each other but are
+still able to run resources. If the cluster just assumed that uncommunicative
+nodes were down, then multiple instances of a resource could be started on
+different nodes.
+
+The effect of split brain depends on the resource type. For example, an IP
+address brought up on two hosts on a network will cause packets to randomly be
+sent to one or the other host, rendering the IP useless. For a database or
+clustered file system, the effect could be much more severe, causing data
+corruption or divergence.
+
+Fencing is also used when a resource cannot otherwise be stopped. If a
+resource fails to stop on a node, it cannot be started on a different node
+without risking the same type of conflict as split-brain. Fencing the
+original node ensures the resource can be safely started elsewhere.
+
+Users may also configure the ``on-fail`` property of :ref:`operation` or the
+``loss-policy`` property of
+:ref:`ticket constraints <ticket-constraints>` to ``fence``, in which
+case the cluster will fence the resource's node if the operation fails or the
+ticket is lost.
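+
+For example, a ``stop`` operation might request fencing on failure like this
+(a minimal sketch; the ``id`` value here is illustrative):
+
+.. code-block:: xml
+
+   <op id="my-rsc-stop" name="stop" interval="0s" timeout="60s"
+       on-fail="fence"/>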
+ +.. index:: + single: fencing; device + +Fence Devices +############# + +A *fence device* or *fencing device* is a special type of resource that +provides the means to fence a node. + +Examples of fencing devices include intelligent power switches and IPMI devices +that accept SNMP commands to cut power to a node, and iSCSI controllers that +allow SCSI reservations to be used to cut a node's access to a shared disk. + +Since fencing devices will be used to recover from loss of networking +connectivity to other nodes, it is essential that they do not rely on the same +network as the cluster itself, otherwise that network becomes a single point of +failure. + +Since loss of a node due to power outage is indistinguishable from loss of +network connectivity to that node, it is also essential that at least one fence +device for a node does not share power with that node. For example, an on-board +IPMI controller that shares power with its host should not be used as the sole +fencing device for that host. + +Since fencing is used to isolate malfunctioning nodes, no fence device should +rely on its target functioning properly. This includes, for example, devices +that ssh into a node and issue a shutdown command (such devices might be +suitable for testing, but never for production). + +.. index:: + single: fencing; agent + +Fence Agents +############ + +A *fence agent* or *fencing agent* is a ``stonith``-class resource agent. + +The fence agent standard provides commands (such as ``off`` and ``reboot``) +that the cluster can use to fence nodes. As with other resource agent classes, +this allows a layer of abstraction so that Pacemaker doesn't need any knowledge +about specific fencing technologies -- that knowledge is isolated in the agent. + +When a Fence Device Can Be Used +############################### + +Fencing devices do not actually "run" like most services. Typically, they just +provide an interface for sending commands to an external device. + +Additionally, fencing may be initiated by Pacemaker, by other cluster-aware +software such as DRBD or DLM, or manually by an administrator, at any point in +the cluster life cycle, including before any resources have been started. + +To accommodate this, Pacemaker does not require the fence device resource to be +"started" in order to be used. Whether a fence device is started or not +determines whether a node runs any recurring monitor for the device, and gives +the node a slight preference for being chosen to execute fencing using that +device. + +By default, any node can execute any fencing device. If a fence device is +disabled by setting its ``target-role`` to ``Stopped``, then no node can use +that device. If a location constraint with a negative score prevents a specific +node from "running" a fence device, then that node will never be chosen to +execute fencing using the device. A node may fence itself, but the cluster will +choose that only if no other nodes can do the fencing. + +A common configuration scenario is to have one fence device per target node. +In such a case, users often configure anti-location constraints so that +the target node does not monitor its own device. + +Limitations of Fencing Resources +################################ + +Fencing resources have certain limitations that other resource classes don't: + +* They may have only one set of meta-attributes and one set of instance + attributes. 
+* If :ref:`rules` are used to determine fencing resource options, these
+  might be evaluated only when first read, meaning that later changes to the
+  rules will have no effect. Therefore, it is better to avoid confusion and not
+  use rules at all with fencing resources.
+
+These limitations could be revisited if there is sufficient user demand.
+
+.. index::
+   single: fencing; special instance attributes
+
+.. _fencing-attributes:
+
+Special Options for Fencing Resources
+#####################################
+
+The table below lists special instance attributes that may be set for any
+fencing resource (*not* meta-attributes, even though they are interpreted by
+Pacemaker rather than the fence agent). These are also listed in the man page
+for ``pacemaker-fenced``.
+
+.. Not_Yet_Implemented: + + +----------------------+---------+--------------------+----------------------------------------+ + | priority | integer | 0 | .. index:: | + | | | | single: priority | + | | | | | + | | | | The priority of the fence device. | + | | | | Devices are tried in order of highest | + | | | | priority to lowest. | + +----------------------+---------+--------------------+----------------------------------------+ + +.. table:: **Additional Properties of Fencing Resources** + + +----------------------+---------+--------------------+----------------------------------------+ + | Field | Type | Default | Description | + +======================+=========+====================+========================================+ + | stonith-timeout | time | | .. index:: | + | | | | single: stonith-timeout | + | | | | | + | | | | Older versions used this to override | + | | | | the default period to wait for a fence | + | | | | action (reboot, on, or off) to | + | | | | complete for this device. It has been | + | | | | replaced by the | + | | | | ``pcmk_reboot_timeout`` and | + | | | | ``pcmk_off_timeout`` properties. | + +----------------------+---------+--------------------+----------------------------------------+ + | provides | string | | .. index:: | + | | | | single: provides | + | | | | | + | | | | Any special capability provided by the | + | | | | fence device. Currently, only one such | + | | | | capability is meaningful: | + | | | | :ref:`unfencing <unfencing>`. | + +----------------------+---------+--------------------+----------------------------------------+ + | pcmk_host_map | string | | .. index:: | + | | | | single: pcmk_host_map | + | | | | | + | | | | A mapping of host names to port | + | | | | numbers for devices that do not | + | | | | support host names. | + | | | | | + | | | | Example: ``node1:1;node2:2,3`` tells | + | | | | the cluster to use port 1 for | + | | | | ``node1`` and ports 2 and 3 for | + | | | | ``node2``. If ``pcmk_host_check`` is | + | | | | explicitly set to ``static-list``, | + | | | | either this or ``pcmk_host_list`` must | + | | | | be set. | + +----------------------+---------+--------------------+----------------------------------------+ + | pcmk_host_list | string | | .. index:: | + | | | | single: pcmk_host_list | + | | | | | + | | | | A list of machines controlled by this | + | | | | device. If ``pcmk_host_check`` is | + | | | | explicitly set to ``static-list``, | + | | | | either this or ``pcmk_host_map`` must | + | | | | be set. | + +----------------------+---------+--------------------+----------------------------------------+ + | pcmk_host_check | string | The default is | .. 
index:: | + | | | ``static-list`` if | single: pcmk_host_check | + | | | either | | + | | | ``pcmk_host_list`` | How to determine which machines are | + | | | or | controlled by the device. Allowed | + | | | ``pcmk_host_map`` | values: | + | | | is configured. If | | + | | | neither of those | * ``dynamic-list:`` query the device | + | | | are configured, | via the agent's ``list`` action | + | | | the default is | * ``static-list:`` check the | + | | | ``dynamic-list`` | ``pcmk_host_list`` or | + | | | if the fence | ``pcmk_host_map`` attribute | + | | | device supports | * ``status:`` query the device via the | + | | | the list action, | "status" command | + | | | or ``status`` if | * ``none:`` assume the device can | + | | | the fence device | fence any node | + | | | supports the | | + | | | status action but | | + | | | not the list | | + | | | action. If none of | | + | | | those conditions | | + | | | apply, the default | | + | | | is ``none``. | | + +----------------------+---------+--------------------+----------------------------------------+ + | pcmk_delay_max | time | 0s | .. index:: | + | | | | single: pcmk_delay_max | + | | | | | + | | | | Enable a random delay of up to the | + | | | | time specified before executing | + | | | | fencing actions. This is sometimes | + | | | | used in two-node clusters to ensure | + | | | | that the nodes don't fence each other | + | | | | at the same time. The overall delay | + | | | | introduced by pacemaker is derived | + | | | | from this random delay value adding a | + | | | | static delay so that the sum is kept | + | | | | below the maximum delay. | + +----------------------+---------+--------------------+----------------------------------------+ + | pcmk_delay_base | time | 0s | .. index:: | + | | | | single: pcmk_delay_base | + | | | | | + | | | | Enable a static delay before executing | + | | | | fencing actions. This can be used, for | + | | | | example, in two-node clusters to | + | | | | ensure that the nodes don't fence each | + | | | | other, by having separate fencing | + | | | | resources with different values. The | + | | | | node that is fenced with the shorter | + | | | | delay will lose a fencing race. The | + | | | | overall delay introduced by pacemaker | + | | | | is derived from this value plus a | + | | | | random delay such that the sum is kept | + | | | | below the maximum delay. | + +----------------------+---------+--------------------+----------------------------------------+ + | pcmk_action_limit | integer | 1 | .. index:: | + | | | | single: pcmk_action_limit | + | | | | | + | | | | The maximum number of actions that can | + | | | | be performed in parallel on this | + | | | | device, if the cluster option | + | | | | ``concurrent-fencing`` is ``true``. A | + | | | | value of -1 means unlimited. | + +----------------------+---------+--------------------+----------------------------------------+ + | pcmk_host_argument | string | ``port`` otherwise | .. index:: | + | | | ``plug`` if | single: pcmk_host_argument | + | | | supported | | + | | | according to the | *Advanced use only.* Which parameter | + | | | metadata of the | should be supplied to the fence agent | + | | | fence agent | to identify the node to be fenced. | + | | | | Some devices support neither the | + | | | | standard ``plug`` nor the deprecated | + | | | | ``port`` parameter, or may provide | + | | | | additional ones. Use this to specify | + | | | | an alternate, device-specific | + | | | | parameter. 
A value of ``none`` tells | + | | | | the cluster not to supply any | + | | | | additional parameters. | + +----------------------+---------+--------------------+----------------------------------------+ + | pcmk_reboot_action | string | reboot | .. index:: | + | | | | single: pcmk_reboot_action | + | | | | | + | | | | *Advanced use only.* The command to | + | | | | send to the resource agent in order to | + | | | | reboot a node. Some devices do not | + | | | | support the standard commands or may | + | | | | provide additional ones. Use this to | + | | | | specify an alternate, device-specific | + | | | | command. | + +----------------------+---------+--------------------+----------------------------------------+ + | pcmk_reboot_timeout | time | 60s | .. index:: | + | | | | single: pcmk_reboot_timeout | + | | | | | + | | | | *Advanced use only.* Specify an | + | | | | alternate timeout to use for | + | | | | ``reboot`` actions instead of the | + | | | | value of ``stonith-timeout``. Some | + | | | | devices need much more or less time to | + | | | | complete than normal. Use this to | + | | | | specify an alternate, device-specific | + | | | | timeout. | + +----------------------+---------+--------------------+----------------------------------------+ + | pcmk_reboot_retries | integer | 2 | .. index:: | + | | | | single: pcmk_reboot_retries | + | | | | | + | | | | *Advanced use only.* The maximum | + | | | | number of times to retry the | + | | | | ``reboot`` command within the timeout | + | | | | period. Some devices do not support | + | | | | multiple connections, and operations | + | | | | may fail if the device is busy with | + | | | | another task, so Pacemaker will | + | | | | automatically retry the operation, if | + | | | | there is time remaining. Use this | + | | | | option to alter the number of times | + | | | | Pacemaker retries before giving up. | + +----------------------+---------+--------------------+----------------------------------------+ + | pcmk_off_action | string | off | .. index:: | + | | | | single: pcmk_off_action | + | | | | | + | | | | *Advanced use only.* The command to | + | | | | send to the resource agent in order to | + | | | | shut down a node. Some devices do not | + | | | | support the standard commands or may | + | | | | provide additional ones. Use this to | + | | | | specify an alternate, device-specific | + | | | | command. | + +----------------------+---------+--------------------+----------------------------------------+ + | pcmk_off_timeout | time | 60s | .. index:: | + | | | | single: pcmk_off_timeout | + | | | | | + | | | | *Advanced use only.* Specify an | + | | | | alternate timeout to use for | + | | | | ``off`` actions instead of the | + | | | | value of ``stonith-timeout``. Some | + | | | | devices need much more or less time to | + | | | | complete than normal. Use this to | + | | | | specify an alternate, device-specific | + | | | | timeout. | + +----------------------+---------+--------------------+----------------------------------------+ + | pcmk_off_retries | integer | 2 | .. index:: | + | | | | single: pcmk_off_retries | + | | | | | + | | | | *Advanced use only.* The maximum | + | | | | number of times to retry the | + | | | | ``off`` command within the timeout | + | | | | period. 
Some devices do not support | + | | | | multiple connections, and operations | + | | | | may fail if the device is busy with | + | | | | another task, so Pacemaker will | + | | | | automatically retry the operation, if | + | | | | there is time remaining. Use this | + | | | | option to alter the number of times | + | | | | Pacemaker retries before giving up. | + +----------------------+---------+--------------------+----------------------------------------+ + | pcmk_list_action | string | list | .. index:: | + | | | | single: pcmk_list_action | + | | | | | + | | | | *Advanced use only.* The command to | + | | | | send to the resource agent in order to | + | | | | list nodes. Some devices do not | + | | | | support the standard commands or may | + | | | | provide additional ones. Use this to | + | | | | specify an alternate, device-specific | + | | | | command. | + +----------------------+---------+--------------------+----------------------------------------+ + | pcmk_list_timeout | time | 60s | .. index:: | + | | | | single: pcmk_list_timeout | + | | | | | + | | | | *Advanced use only.* Specify an | + | | | | alternate timeout to use for | + | | | | ``list`` actions instead of the | + | | | | value of ``stonith-timeout``. Some | + | | | | devices need much more or less time to | + | | | | complete than normal. Use this to | + | | | | specify an alternate, device-specific | + | | | | timeout. | + +----------------------+---------+--------------------+----------------------------------------+ + | pcmk_list_retries | integer | 2 | .. index:: | + | | | | single: pcmk_list_retries | + | | | | | + | | | | *Advanced use only.* The maximum | + | | | | number of times to retry the | + | | | | ``list`` command within the timeout | + | | | | period. Some devices do not support | + | | | | multiple connections, and operations | + | | | | may fail if the device is busy with | + | | | | another task, so Pacemaker will | + | | | | automatically retry the operation, if | + | | | | there is time remaining. Use this | + | | | | option to alter the number of times | + | | | | Pacemaker retries before giving up. | + +----------------------+---------+--------------------+----------------------------------------+ + | pcmk_monitor_action | string | monitor | .. index:: | + | | | | single: pcmk_monitor_action | + | | | | | + | | | | *Advanced use only.* The command to | + | | | | send to the resource agent in order to | + | | | | report extended status. Some devices do| + | | | | not support the standard commands or | + | | | | may provide additional ones. Use this | + | | | | to specify an alternate, | + | | | | device-specific command. | + +----------------------+---------+--------------------+----------------------------------------+ + | pcmk_monitor_timeout | time | 60s | .. index:: | + | | | | single: pcmk_monitor_timeout | + | | | | | + | | | | *Advanced use only.* Specify an | + | | | | alternate timeout to use for | + | | | | ``monitor`` actions instead of the | + | | | | value of ``stonith-timeout``. Some | + | | | | devices need much more or less time to | + | | | | complete than normal. Use this to | + | | | | specify an alternate, device-specific | + | | | | timeout. | + +----------------------+---------+--------------------+----------------------------------------+ + | pcmk_monitor_retries | integer | 2 | .. 
index:: | + | | | | single: pcmk_monitor_retries | + | | | | | + | | | | *Advanced use only.* The maximum | + | | | | number of times to retry the | + | | | | ``monitor`` command within the timeout | + | | | | period. Some devices do not support | + | | | | multiple connections, and operations | + | | | | may fail if the device is busy with | + | | | | another task, so Pacemaker will | + | | | | automatically retry the operation, if | + | | | | there is time remaining. Use this | + | | | | option to alter the number of times | + | | | | Pacemaker retries before giving up. | + +----------------------+---------+--------------------+----------------------------------------+ + | pcmk_status_action | string | status | .. index:: | + | | | | single: pcmk_status_action | + | | | | | + | | | | *Advanced use only.* The command to | + | | | | send to the resource agent in order to | + | | | | report status. Some devices do | + | | | | not support the standard commands or | + | | | | may provide additional ones. Use this | + | | | | to specify an alternate, | + | | | | device-specific command. | + +----------------------+---------+--------------------+----------------------------------------+ + | pcmk_status_timeout | time | 60s | .. index:: | + | | | | single: pcmk_status_timeout | + | | | | | + | | | | *Advanced use only.* Specify an | + | | | | alternate timeout to use for | + | | | | ``status`` actions instead of the | + | | | | value of ``stonith-timeout``. Some | + | | | | devices need much more or less time to | + | | | | complete than normal. Use this to | + | | | | specify an alternate, device-specific | + | | | | timeout. | + +----------------------+---------+--------------------+----------------------------------------+ + | pcmk_status_retries | integer | 2 | .. index:: | + | | | | single: pcmk_status_retries | + | | | | | + | | | | *Advanced use only.* The maximum | + | | | | number of times to retry the | + | | | | ``status`` command within the timeout | + | | | | period. Some devices do not support | + | | | | multiple connections, and operations | + | | | | may fail if the device is busy with | + | | | | another task, so Pacemaker will | + | | | | automatically retry the operation, if | + | | | | there is time remaining. Use this | + | | | | option to alter the number of times | + | | | | Pacemaker retries before giving up. | + +----------------------+---------+--------------------+----------------------------------------+ + +.. index:: + single: unfencing + single: fencing; unfencing + +.. _unfencing: + +Unfencing +######### + +With fabric fencing (such as cutting network or shared disk access rather than +power), it is expected that the cluster will fence the node, and then a system +administrator must manually investigate what went wrong, correct any issues +found, then reboot (or restart the cluster services on) the node. + +Once the node reboots and rejoins the cluster, some fabric fencing devices +require an explicit command to restore the node's access. This capability is +called *unfencing* and is typically implemented as the fence agent's ``on`` +command. + +If any cluster resource has ``requires`` set to ``unfencing``, then that +resource will not be probed or started on a node until that node has been +unfenced. + +Fence Devices Dependent on Other Resources +########################################## + +In some cases, a fence device may require some other cluster resource (such as +an IP address) to be active in order to function properly. 
+ +This is obviously undesirable in general: fencing may be required when the +depended-on resource is not active, or fencing may be required because the node +running the depended-on resource is no longer responding. + +However, this may be acceptable under certain conditions: + +* The dependent fence device should not be able to target any node that is + allowed to run the depended-on resource. + +* The depended-on resource should not be disabled during production operation. + +* The ``concurrent-fencing`` cluster property should be set to ``true``. + Otherwise, if both the node running the depended-on resource and some node + targeted by the dependent fence device need to be fenced, the fencing of the + node running the depended-on resource might be ordered first, making the + second fencing impossible and blocking further recovery. With concurrent + fencing, the dependent fence device might fail at first due to the + depended-on resource being unavailable, but it will be retried and eventually + succeed once the resource is brought back up. + +Even under those conditions, there is one unlikely problem scenario. The DC +always schedules fencing of itself after any other fencing needed, to avoid +unnecessary repeated DC elections. If the dependent fence device targets the +DC, and both the DC and a different node running the depended-on resource need +to be fenced, the DC fencing will always fail and block further recovery. Note, +however, that losing a DC node entirely causes some other node to become DC and +schedule the fencing, so this is only a risk when a stop or other operation +with ``on-fail`` set to ``fencing`` fails on the DC. + +.. index:: + single: fencing; configuration + +Configuring Fencing +################### + +Higher-level tools can provide simpler interfaces to this process, but using +Pacemaker command-line tools, this is how you could configure a fence device. + +#. Find the correct driver: + + .. code-block:: none + + # stonith_admin --list-installed + + .. note:: + + You may have to install packages to make fence agents available on your + host. Searching your available packages for ``fence-`` is usually + helpful. Ensure the packages providing the fence agents you require are + installed on every cluster node. + +#. Find the required parameters associated with the device + (replacing ``$AGENT_NAME`` with the name obtained from the previous step): + + .. code-block:: none + + # stonith_admin --metadata --agent $AGENT_NAME + +#. Create a file called ``stonith.xml`` containing a primitive resource + with a class of ``stonith``, a type equal to the agent name obtained earlier, + and a parameter for each of the values returned in the previous step. + +#. If the device does not know how to fence nodes based on their uname, + you may also need to set the special ``pcmk_host_map`` parameter. See + :ref:`fencing-attributes` for details. + +#. If the device does not support the ``list`` command, you may also need + to set the special ``pcmk_host_list`` and/or ``pcmk_host_check`` + parameters. See :ref:`fencing-attributes` for details. + +#. If the device does not expect the victim to be specified with the + ``port`` parameter, you may also need to set the special + ``pcmk_host_argument`` parameter. See :ref:`fencing-attributes` for details. + +#. Upload it into the CIB using cibadmin: + + .. code-block:: none + + # cibadmin --create --scope resources --xml-file stonith.xml + +#. Set ``stonith-enabled`` to true: + + .. 
code-block:: none + + # crm_attribute --type crm_config --name stonith-enabled --update true + +#. Once the stonith resource is running, you can test it by executing the + following, replacing ``$NODE_NAME`` with the name of the node to fence + (although you might want to stop the cluster on that machine first): + + .. code-block:: none + + # stonith_admin --reboot $NODE_NAME + + +Example Fencing Configuration +_____________________________ + +For this example, we assume we have a cluster node, ``pcmk-1``, whose IPMI +controller is reachable at the IP address 192.0.2.1. The IPMI controller uses +the username ``testuser`` and the password ``abc123``. + +#. Looking at what's installed, we may see a variety of available agents: + + .. code-block:: none + + # stonith_admin --list-installed + + .. code-block:: none + + (... some output omitted ...) + fence_idrac + fence_ilo3 + fence_ilo4 + fence_ilo5 + fence_imm + fence_ipmilan + (... some output omitted ...) + + Perhaps after some reading some man pages and doing some Internet searches, + we might decide ``fence_ipmilan`` is our best choice. + +#. Next, we would check what parameters ``fence_ipmilan`` provides: + + .. code-block:: none + + # stonith_admin --metadata -a fence_ipmilan + + .. code-block:: xml + + + + + + + + fence_ipmilan is an I/O Fencing agentwhich can be used with machines controlled by IPMI.This agent calls support software ipmitool (http://ipmitool.sf.net/). WARNING! This fence agent might report success before the node is powered off. You should use -m/method onoff if your fence device works correctly with that option. + + + + + + Fencing action + + + + + + IPMI Lan Auth type. + + + + + Ciphersuite to use (same as ipmitool -C parameter) + + + + + Hexadecimal-encoded Kg key for IPMIv2 authentication + + + + + IP address or hostname of fencing device + + + + + IP address or hostname of fencing device + + + + + TCP/UDP port to use for connection with device + + + + + Use Lanplus to improve security of connection + + + + + Login name + + + + + + Method to fence + + + + + Login password or passphrase + + + + + Script to run to retrieve password + + + + + Login password or passphrase + + + + + Script to run to retrieve password + + + + + IP address or hostname of fencing device (together with --port-as-ip) + + + + + IP address or hostname of fencing device (together with --port-as-ip) + + + + + + Privilege level on IPMI device + + + + + Bridge IPMI requests to the remote target address + + + + + Login name + + + + + Disable logging to stderr. Does not affect --verbose or --debug-file or logging to syslog. 
+.. index::
+   single: fencing; topology
+   single: fencing-topology
+   single: fencing-level
+
+Fencing Topologies
+##################
+
+Pacemaker supports fencing nodes with multiple devices through a feature called
+*fencing topologies*. Fencing topologies may be used to provide alternative
+devices in case one fails, to require multiple devices to all be executed
+successfully in order to consider the node successfully fenced, or a
+combination of the two.
+
+Create the individual devices as you normally would, then define one or more
+``fencing-level`` entries in the ``fencing-topology`` section of the
+configuration.
+
+* Each fencing level is attempted in order of ascending ``index``. Allowed
+  values are 1 through 9.
+* If a device fails, processing terminates for the current level. No further
+  devices in that level are exercised, and the next level is attempted instead.
+* If the operation succeeds for all the listed devices in a level, the level is
+  deemed to have passed.
+* The operation is finished when a level has passed (success), or all levels
+  have been attempted (failed).
+* If the operation failed, the next step is determined by the scheduler and/or
+  the controller.
+
+Some possible uses of topologies include:
+
+* Try on-board IPMI, then an intelligent power switch if that fails
+* Try fabric fencing of both disk and network, then fall back to power fencing
+  if either fails
+* Wait up to a certain time for a kernel dump to complete, then cut power to
+  the node (see the sketch below)
+
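+As a sketch of the last idea above, assuming a ``fence_kdump`` device named
+``kdump`` and a power fencing device named ``power`` have already been
+configured for node ``pcmk-1`` (both device names are hypothetical), the
+topology entries might look like this:
+
+.. code-block:: xml
+
+   <fencing-topology>
+     <!-- Wait for a kernel dump to begin, then fall back to power fencing -->
+     <fencing-level id="fl-pcmk-1-kdump" target="pcmk-1" index="1" devices="kdump"/>
+     <fencing-level id="fl-pcmk-1-power" target="pcmk-1" index="2" devices="power"/>
+   </fencing-topology>
+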
+.. table:: **Attributes of a fencing-level Element**
+
+   +------------------+-----------------------------------------------------------------------------------------+
+   | Attribute        | Description                                                                             |
+   +==================+=========================================================================================+
+   | id               | .. index::                                                                              |
+   |                  |    pair: fencing-level; id                                                              |
+   |                  |                                                                                         |
+   |                  | A unique name for this element (required)                                               |
+   +------------------+-----------------------------------------------------------------------------------------+
+   | target           | .. index::                                                                              |
+   |                  |    pair: fencing-level; target                                                          |
+   |                  |                                                                                         |
+   |                  | The name of a single node to which this level applies                                   |
+   +------------------+-----------------------------------------------------------------------------------------+
+   | target-pattern   | .. index::                                                                              |
+   |                  |    pair: fencing-level; target-pattern                                                  |
+   |                  |                                                                                         |
+   |                  | An extended regular expression (as defined in `POSIX                                    |
+   |                  | <https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04>`_) |
+   |                  | matching the names of nodes to which this level applies                                 |
+   +------------------+-----------------------------------------------------------------------------------------+
+   | target-attribute | .. index::                                                                              |
+   |                  |    pair: fencing-level; target-attribute                                                |
+   |                  |                                                                                         |
+   |                  | The name of a node attribute that is set (to ``target-value``) for nodes to which this  |
+   |                  | level applies                                                                           |
+   +------------------+-----------------------------------------------------------------------------------------+
+   | target-value     | .. index::                                                                              |
+   |                  |    pair: fencing-level; target-value                                                    |
+   |                  |                                                                                         |
+   |                  | The node attribute value (of ``target-attribute``) that is set for nodes to which this  |
+   |                  | level applies                                                                           |
+   +------------------+-----------------------------------------------------------------------------------------+
+   | index            | .. index::                                                                              |
+   |                  |    pair: fencing-level; index                                                           |
+   |                  |                                                                                         |
+   |                  | The order in which to attempt the levels. Levels are attempted in ascending order       |
+   |                  | *until one succeeds*. Valid values are 1 through 9.                                     |
+   +------------------+-----------------------------------------------------------------------------------------+
+   | devices          | .. index::                                                                              |
+   |                  |    pair: fencing-level; devices                                                         |
+   |                  |                                                                                         |
+   |                  | A comma-separated list of devices that must all be tried for this level                 |
+   +------------------+-----------------------------------------------------------------------------------------+
+
+.. note:: **Fencing topology with different devices for different nodes**
+
+   .. code-block:: xml
+
+      <cib>
+        <configuration>
+          ...
+          <fencing-topology>
+            <!-- For pcmk-1, try poison-pill and fail back to power -->
+            <fencing-level id="f-p1.1" target="pcmk-1" index="1" devices="poison-pill"/>
+            <fencing-level id="f-p1.2" target="pcmk-1" index="2" devices="power"/>
+            <!-- For pcmk-2, try disk and network, and fail back to power -->
+            <fencing-level id="f-p2.1" target="pcmk-2" index="1" devices="disk,network"/>
+            <fencing-level id="f-p2.2" target="pcmk-2" index="2" devices="power"/>
+          </fencing-topology>
+          ...
+        </configuration>
+      </cib>
+
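+.. note:: **Fencing topology matching nodes by pattern**
+
+   Where many nodes share the same devices, ``target-pattern`` can be used
+   instead of listing each node in a separate ``target`` entry. This is only a
+   sketch: it assumes node names beginning with ``pcmk-`` and two existing
+   fence devices named ``poison-pill`` and ``power``.
+
+   .. code-block:: xml
+
+      <fencing-topology>
+        <!-- For all nodes, try poison-pill and fail back to power -->
+        <fencing-level id="f-all.1" target-pattern="pcmk-.*" index="1" devices="poison-pill"/>
+        <fencing-level id="f-all.2" target-pattern="pcmk-.*" index="2" devices="power"/>
+      </fencing-topology>
+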
-   ...
-   ----
-
-   == Remapping Reboots ==
-
-   When the cluster needs to reboot a node, whether because +stonith-action+ is +reboot+ or because
-   a reboot was manually requested (such as by `stonith_admin --reboot`), it will remap that to
-   other commands in two cases:
-
-   . If the chosen fencing device does not support the +reboot+ command, the cluster
-   will ask it to perform +off+ instead.
-
-   . If a fencing topology level with multiple devices must be executed, the cluster
-   will ask all the devices to perform +off+, then ask the devices to perform +on+.
-
-   To understand the second case, consider the example of a node with redundant
-   power supplies connected to intelligent power switches. Rebooting one switch
-   and then the other would have no effect on the node. Turning both switches off,
-   and then on, actually reboots the node.
-
-   In such a case, the fencing operation will be treated as successful as long as
-   the +off+ commands succeed, because then it is safe for the cluster to recover
-   any resources that were on the node. Timeouts and errors in the +on+ phase will
-   be logged but ignored.
-
-   When a reboot operation is remapped, any action-specific timeout for the
-   remapped action will be used (for example, +pcmk_off_timeout+ will be used when
-   executing the +off+ command, not +pcmk_reboot_timeout+).
+
+
+Example Dual-Layer, Dual-Device Fencing Topologies
+__________________________________________________
+
+The following example illustrates an advanced use of ``fencing-topology`` in a
+cluster with the following properties:
+
+* 2 nodes (prod-mysql1 and prod-mysql2)
+* the nodes have IPMI controllers reachable at 192.0.2.1 and 192.0.2.2
+* the nodes each have two independent Power Supply Units (PSUs) connected to
+  two independent Power Distribution Units (PDUs) reachable at 198.51.100.1
+  (port 10 and port 11) and 203.0.113.1 (port 10 and port 11)
+* fencing via the IPMI controller uses the ``fence_ipmilan`` agent (1 fence
+  device per controller, with each device targeting a separate node)
+* fencing via the PDUs uses the ``fence_apc_snmp`` agent (1 fence device per
+  PDU, with both devices targeting both nodes)
+* a random delay is used to lessen the chance of a "death match"
+* fencing topology is set to try IPMI fencing first, then dual PDU fencing if
+  that fails
+
+In a node failure scenario, Pacemaker will first select ``fence_ipmilan`` to
+try to kill the faulty node. Using the fencing topology, if that method fails,
+it will then move on to selecting ``fence_apc_snmp`` twice (once for the first
+PDU, then again for the second PDU).
+
+The fence action is considered successful only if both PDUs report the required
+status. If either fails, fencing loops back to the first fencing method,
+``fence_ipmilan``, and so on, until the node is fenced or the fencing action is
+cancelled.
+
+.. note:: **First fencing method: single IPMI device per target**
+
+   Each cluster node has its own dedicated IPMI controller that can be
+   contacted for fencing using the following primitives (the credentials shown
+   are placeholders):
+
+   .. code-block:: xml
+
+      <primitive class="stonith" id="fence_prod-mysql1_ipmi" type="fence_ipmilan">
+        <instance_attributes id="fence_prod-mysql1_ipmi-params">
+          <nvpair id="fence_prod-mysql1_ipmi-ip" name="ip" value="192.0.2.1"/>
+          <nvpair id="fence_prod-mysql1_ipmi-username" name="username" value="fencing"/>
+          <nvpair id="fence_prod-mysql1_ipmi-password" name="password" value="fencing"/>
+          <nvpair id="fence_prod-mysql1_ipmi-lanplus" name="lanplus" value="1"/>
+          <nvpair id="fence_prod-mysql1_ipmi-host-list" name="pcmk_host_list" value="prod-mysql1"/>
+          <!-- random delay of up to 15s to lessen the chance of a death match -->
+          <nvpair id="fence_prod-mysql1_ipmi-delay-max" name="pcmk_delay_max" value="15s"/>
+        </instance_attributes>
+      </primitive>
+
+      <primitive class="stonith" id="fence_prod-mysql2_ipmi" type="fence_ipmilan">
+        <instance_attributes id="fence_prod-mysql2_ipmi-params">
+          <nvpair id="fence_prod-mysql2_ipmi-ip" name="ip" value="192.0.2.2"/>
+          <nvpair id="fence_prod-mysql2_ipmi-username" name="username" value="fencing"/>
+          <nvpair id="fence_prod-mysql2_ipmi-password" name="password" value="fencing"/>
+          <nvpair id="fence_prod-mysql2_ipmi-lanplus" name="lanplus" value="1"/>
+          <nvpair id="fence_prod-mysql2_ipmi-host-list" name="pcmk_host_list" value="prod-mysql2"/>
+          <nvpair id="fence_prod-mysql2_ipmi-delay-max" name="pcmk_delay_max" value="15s"/>
+        </instance_attributes>
+      </primitive>
+
+.. note:: **Second fencing method: dual PDU devices**
+
+   Each cluster node also has 2 distinct power supplies controlled by 2
+   distinct PDUs:
+
+   * Node 1: PDU 1 port 10 and PDU 2 port 10
+   * Node 2: PDU 1 port 11 and PDU 2 port 11
+
+   The matching fencing agents are configured as follows (SNMP credentials
+   omitted for brevity):
+
+   .. code-block:: xml
+
+      <primitive class="stonith" id="fence_prod-mysql_apc1" type="fence_apc_snmp">
+        <instance_attributes id="fence_prod-mysql_apc1-params">
+          <nvpair id="fence_prod-mysql_apc1-ip" name="ip" value="198.51.100.1"/>
+          <!-- map each node to its outlet on this PDU -->
+          <nvpair id="fence_prod-mysql_apc1-host-map" name="pcmk_host_map"
+                  value="prod-mysql1:10;prod-mysql2:11"/>
+        </instance_attributes>
+      </primitive>
+
+      <primitive class="stonith" id="fence_prod-mysql_apc2" type="fence_apc_snmp">
+        <instance_attributes id="fence_prod-mysql_apc2-params">
+          <nvpair id="fence_prod-mysql_apc2-ip" name="ip" value="203.0.113.1"/>
+          <nvpair id="fence_prod-mysql_apc2-host-map" name="pcmk_host_map"
+                  value="prod-mysql1:10;prod-mysql2:11"/>
+        </instance_attributes>
+      </primitive>
+
+.. note:: **Fencing topology**
+
+   Now that all the fencing resources are defined, it's time to create the
+   right topology. We want to first fence using IPMI and, if that does not
+   work, fence both PDUs to effectively and surely kill the node.
+
+   .. code-block:: xml
+
+      <fencing-topology>
+        <fencing-level id="fl-prod-mysql1-1" target="prod-mysql1" index="1" devices="fence_prod-mysql1_ipmi"/>
+        <fencing-level id="fl-prod-mysql1-2" target="prod-mysql1" index="2" devices="fence_prod-mysql_apc1,fence_prod-mysql_apc2"/>
+        <fencing-level id="fl-prod-mysql2-1" target="prod-mysql2" index="1" devices="fence_prod-mysql2_ipmi"/>
+        <fencing-level id="fl-prod-mysql2-2" target="prod-mysql2" index="2" devices="fence_prod-mysql_apc1,fence_prod-mysql_apc2"/>
+      </fencing-topology>
+
+   In ``fencing-topology``, the lowest ``index`` value for a target determines
+   its first fencing method.
+
+Remapping Reboots
+#################
+
+When the cluster needs to reboot a node, whether because ``stonith-action`` is
+``reboot`` or because a reboot was requested externally (such as by
+``stonith_admin --reboot``), it will remap that to other commands in two cases:
+
+* If the chosen fencing device does not support the ``reboot`` command, the
+  cluster will ask it to perform ``off`` instead.
+
+* If a fencing topology level with multiple devices must be executed, the
+  cluster will ask all the devices to perform ``off``, then ask the devices to
+  perform ``on``.
+
+To understand the second case, consider the example of a node with redundant
+power supplies connected to intelligent power switches. Rebooting one switch
+and then the other would have no effect on the node. Turning both switches off,
+and then on, actually reboots the node.
+
+In such a case, the fencing operation will be treated as successful as long as
+the ``off`` commands succeed, because then it is safe for the cluster to
+recover any resources that were on the node. Timeouts and errors in the ``on``
+phase will be logged but ignored.
+
+When a reboot operation is remapped, any action-specific timeout for the
+remapped action will be used (for example, ``pcmk_off_timeout`` will be used
+when executing the ``off`` command, not ``pcmk_reboot_timeout``).
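+
+For example, if a power switch needs extra time to confirm that a node's power
+is off, the ``off`` phase of a remapped reboot could be given a longer timeout
+by adding a line like the following (a sketch, reusing the ``Fencing`` resource
+from the earlier example) to the device's instance attributes:
+
+.. code-block:: xml
+
+   <!-- allow up to two minutes for the off command during a remapped reboot -->
+   <nvpair id="Fencing-pcmk_off_timeout" name="pcmk_off_timeout" value="120s"/>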