diff --git a/doc/Pacemaker_Explained/en-US/Ch-Options.txt b/doc/Pacemaker_Explained/en-US/Ch-Options.txt
index 05db3f3ce8..ee204c7eea 100644
--- a/doc/Pacemaker_Explained/en-US/Ch-Options.txt
+++ b/doc/Pacemaker_Explained/en-US/Ch-Options.txt
@@ -1,454 +1,455 @@
= Cluster-Wide Configuration =
== CIB Properties ==
Certain settings are defined by CIB properties (that is, attributes of the
+cib+ tag) rather than with the rest of the cluster configuration in the
+configuration+ section.
The reason is simply a matter of parsing. These options are used by the
configuration database which is, by design, mostly ignorant of the content it
holds. So the decision was made to place them in an easy-to-find location.
.CIB Properties
[width="95%",cols="2m,5<",options="header",align="center"]
|=========================================================
|Field |Description
| admin_epoch |
indexterm:[Configuration Version,Cluster]
indexterm:[Cluster,Option,Configuration Version]
indexterm:[admin_epoch,Cluster Option]
indexterm:[Cluster,Option,admin_epoch]
When a node joins the cluster, the cluster performs a check to see
which node has the best configuration. It asks the node with the highest
(+admin_epoch+, +epoch+, +num_updates+) tuple to replace the configuration on
all the nodes -- which makes setting them, and setting them correctly, very
important. +admin_epoch+ is never modified by the cluster; you can use this
to make the configurations on any inactive nodes obsolete. _Never set this
value to zero_. In such cases, the cluster cannot tell the difference between
your configuration and the "empty" one used when nothing is found on disk.
| epoch |
indexterm:[epoch,Cluster Option]
indexterm:[Cluster,Option,epoch]
The cluster increments this every time the configuration is updated (usually by
the administrator).
| num_updates |
indexterm:[num_updates,Cluster Option]
indexterm:[Cluster,Option,num_updates]
The cluster increments this every time the configuration or status is updated
(usually by the cluster) and resets it to 0 when epoch changes.
| validate-with |
indexterm:[validate-with,Cluster Option]
indexterm:[Cluster,Option,validate-with]
Determines the type of XML validation that will be done on the configuration.
If set to +none+, the cluster will not verify that updates conform to the
DTD (nor reject ones that don't). This option can be useful when
operating a mixed-version cluster during an upgrade.
|cib-last-written |
indexterm:[cib-last-written,Cluster Property]
indexterm:[Cluster,Property,cib-last-written]
Indicates when the configuration was last written to disk. Maintained by the
cluster; for informational purposes only.
|have-quorum |
indexterm:[have-quorum,Cluster Property]
indexterm:[Cluster,Property,have-quorum]
Indicates if the cluster has quorum. If false, this may mean that the
cluster cannot start resources or fence other nodes (see
+no-quorum-policy+ below). Maintained by the cluster.
|dc-uuid |
indexterm:[dc-uuid,Cluster Property]
indexterm:[Cluster,Property,dc-uuid]
Indicates which cluster node is the current leader. Used by the
cluster when placing resources and determining the order of some
events. Maintained by the cluster.
|=========================================================
=== Working with CIB Properties ===
Although these fields can be written to by the user, in
most cases the cluster will overwrite any values specified by the
user with the "correct" ones.
To change the ones that can be specified by the user,
for example +admin_epoch+, one should use:
----
# cibadmin --modify --xml-text ''
----
A complete set of CIB properties will look something like this:
.Attributes set for a cib object
======
[source,XML]
-------
-------
======
[[s-cluster-options]]
== Cluster Options ==
Cluster options, as you might expect, control how the cluster behaves
when confronted with certain situations.
They are grouped into sets within the +crm_config+ section, and, in advanced
configurations, there may be more than one set. (This will be described later
in the section on <> where we will show how to have the cluster use
different sets of options during working hours than during weekends.) For now,
we will describe the simple case where each option is present at most once.
You can obtain an up-to-date list of cluster options, including
their default values, by running the `man pengine` and `man crmd` commands.
.Cluster Options
[width="95%",cols="5m,2,11>).
| enable-startup-probes | TRUE |
indexterm:[enable-startup-probes,Cluster Option]
indexterm:[Cluster,Option,enable-startup-probes]
Should the cluster check for active resources during startup?
| maintenance-mode | FALSE |
indexterm:[maintenance-mode,Cluster Option]
indexterm:[Cluster,Option,maintenance-mode]
Should the cluster refrain from monitoring, starting and stopping resources?
| stonith-enabled | TRUE |
indexterm:[stonith-enabled,Cluster Option]
indexterm:[Cluster,Option,stonith-enabled]
Should failed nodes and nodes with resources that can't be stopped be
shot? If you value your data, set up a STONITH device and enable this.
If true, or unset, the cluster will refuse to start resources unless
one or more STONITH resources have been configured.
If false, unresponsive nodes are immediately assumed to be running no
resources, and resource takeover to online nodes starts without any
further protection (which means _data loss_ if the unresponsive node
still accesses shared storage, for example). See also the +requires+
meta-attribute in <>.
| stonith-action | reboot |
indexterm:[stonith-action,Cluster Option]
indexterm:[Cluster,Option,stonith-action]
Action to send to STONITH device. Allowed values are +reboot+ and +off+.
The value +poweroff+ is also allowed, but is only used for
legacy devices.
| stonith-timeout | 60s |
indexterm:[stonith-timeout,Cluster Option]
indexterm:[Cluster,Option,stonith-timeout]
How long to wait for STONITH actions (reboot, on, off) to complete
| stonith-max-attempts | 10 |
indexterm:[stonith-max-attempts,Cluster Option]
indexterm:[Cluster,Option,stonith-max-attempts]
How many times fencing can fail for a target before the cluster will no longer
immediately re-attempt it. '(since 1.1.17)'
| concurrent-fencing | FALSE |
indexterm:[concurrent-fencing,Cluster Option]
indexterm:[Cluster,Option,concurrent-fencing]
Is the cluster allowed to initiate multiple fence actions concurrently?
+'(since 1.1.15)'
| cluster-delay | 60s |
indexterm:[cluster-delay,Cluster Option]
indexterm:[Cluster,Option,cluster-delay]
Estimated maximum round-trip delay over the network (excluding action
execution). If the TE requires an action to be executed on another node,
it will consider the action failed if it does not get a response
from the other node in this time (after considering the action's
own timeout). The "correct" value will depend on the speed and load of your
network and cluster nodes.
| dc-deadtime | 20s |
indexterm:[dc-deadtime,Cluster Option]
indexterm:[Cluster,Option,dc-deadtime]
How long to wait for a response from other nodes during startup.
The "correct" value will depend on the speed/load of your network and the type of switches used.
| cluster-recheck-interval | 15min |
indexterm:[cluster-recheck-interval,Cluster Option]
indexterm:[Cluster,Option,cluster-recheck-interval]
Polling interval for time-based changes to options, resource parameters and constraints.
The Cluster is primarily event-driven, but your configuration can have
elements that take effect based on the time of day. To ensure these changes
take effect, we can optionally poll the cluster's status for changes. A value
of 0 disables polling. Positive values are an interval (in seconds unless other
SI units are specified, e.g. 5min).
| pe-error-series-max | -1 |
indexterm:[pe-error-series-max,Cluster Option]
indexterm:[Cluster,Option,pe-error-series-max]
The number of PE inputs resulting in ERRORs to save. Used when reporting problems.
A value of -1 means unlimited (report all).
| pe-warn-series-max | -1 |
indexterm:[pe-warn-series-max,Cluster Option]
indexterm:[Cluster,Option,pe-warn-series-max]
The number of PE inputs resulting in WARNINGs to save. Used when reporting problems.
A value of -1 means unlimited (report all).
| pe-input-series-max | -1 |
indexterm:[pe-input-series-max,Cluster Option]
indexterm:[Cluster,Option,pe-input-series-max]
The number of "normal" PE inputs to save. Used when reporting problems.
A value of -1 means unlimited (report all).
| placement-strategy | default |
indexterm:[placement-strategy,Cluster Option]
indexterm:[Cluster,Option,placement-strategy]
How the cluster should allocate resources to nodes (see <>).
Allowed values are +default+, +utilization+, +balanced+, and +minimal+.
'(since 1.1.0)'
| node-health-strategy | none |
indexterm:[node-health-strategy,Cluster Option]
indexterm:[Cluster,Option,node-health-strategy]
How the cluster should react to node health attributes (see <>).
Allowed values are +none+, +migrate-on-red+, +only-green+, +progressive+, and
+custom+.
| node-health-base | 0 |
indexterm:[node-health-base,Cluster Option]
indexterm:[Cluster,Option,node-health-base]
The base health score assigned to a node. Only used when
+node-health-strategy+ is +progressive+. '(since 1.1.16)'
| node-health-green | 0 |
indexterm:[node-health-green,Cluster Option]
indexterm:[Cluster,Option,node-health-green]
The score to use for a node health attribute whose value is +green+.
Only used when +node-health-strategy+ is +progressive+ or +custom+.
| node-health-yellow | 0 |
indexterm:[node-health-yellow,Cluster Option]
indexterm:[Cluster,Option,node-health-yellow]
The score to use for a node health attribute whose value is +yellow+.
Only used when +node-health-strategy+ is +progressive+ or +custom+.
| node-health-red | 0 |
indexterm:[node-health-red,Cluster Option]
indexterm:[Cluster,Option,node-health-red]
The score to use for a node health attribute whose value is +red+.
Only used when +node-health-strategy+ is +progressive+ or +custom+.
| remove-after-stop | FALSE |
indexterm:[remove-after-stop,Cluster Option]
indexterm:[Cluster,Option,remove-after-stop]
_Advanced Use Only:_ Should the cluster remove resources from the LRM after
they are stopped? Values other than the default are, at best, poorly tested and
potentially dangerous.
| startup-fencing | TRUE |
indexterm:[startup-fencing,Cluster Option]
indexterm:[Cluster,Option,startup-fencing]
_Advanced Use Only:_ Should the cluster shoot unseen nodes?
Not using the default is very unsafe!
| election-timeout | 2min |
indexterm:[election-timeout,Cluster Option]
indexterm:[Cluster,Option,election-timeout]
_Advanced Use Only:_ If you need to adjust this value, it probably indicates
the presence of a bug.
| shutdown-escalation | 20min |
indexterm:[shutdown-escalation,Cluster Option]
indexterm:[Cluster,Option,shutdown-escalation]
_Advanced Use Only:_ If you need to adjust this value, it probably indicates
the presence of a bug.
| crmd-integration-timeout | 3min |
indexterm:[crmd-integration-timeout,Cluster Option]
indexterm:[Cluster,Option,crmd-integration-timeout]
_Advanced Use Only:_ If you need to adjust this value, it probably indicates
the presence of a bug.
| crmd-finalization-timeout | 30min |
indexterm:[crmd-finalization-timeout,Cluster Option]
indexterm:[Cluster,Option,crmd-finalization-timeout]
_Advanced Use Only:_ If you need to adjust this value, it probably indicates
the presence of a bug.
| crmd-transition-delay | 0s |
indexterm:[crmd-transition-delay,Cluster Option]
indexterm:[Cluster,Option,crmd-transition-delay]
_Advanced Use Only:_ Delay cluster recovery for the configured interval to
allow for additional/related events to occur. Useful if your configuration is
sensitive to the order in which ping updates arrive.
Enabling this option will slow down cluster recovery under
all conditions.
|default-resource-stickiness | 0 |
indexterm:[default-resource-stickiness,Cluster Option]
indexterm:[Cluster,Option,default-resource-stickiness]
_Deprecated:_ See <> instead
| is-managed-default | TRUE |
indexterm:[is-managed-default,Cluster Option]
indexterm:[Cluster,Option,is-managed-default]
_Deprecated:_ See <> instead
| default-action-timeout | 20s |
indexterm:[default-action-timeout,Cluster Option]
indexterm:[Cluster,Option,default-action-timeout]
_Deprecated:_ See <> instead
|=========================================================
=== Querying and Setting Cluster Options ===
indexterm:[Querying,Cluster Option]
indexterm:[Setting,Cluster Option]
indexterm:[Cluster,Querying Options]
indexterm:[Cluster,Setting Options]
Cluster options can be queried and modified using the `crm_attribute` tool. To
get the current value of +cluster-delay+, you can run:
----
# crm_attribute --query --name cluster-delay
----
which is more simply written as
----
# crm_attribute -G -n cluster-delay
----
If a value is found, you'll see a result like this:
----
# crm_attribute -G -n cluster-delay
scope=crm_config name=cluster-delay value=60s
----
If no value is found, the tool will display an error:
----
# crm_attribute -G -n clusta-deway
scope=crm_config name=clusta-deway value=(null)
Error performing operation: No such device or address
----
To use a different value (for example, 30 seconds), simply run:
----
# crm_attribute --name cluster-delay --update 30s
----
To go back to the cluster's default value, you can delete the value, for example:
----
# crm_attribute --name cluster-delay --delete
Deleted crm_config option: id=cib-bootstrap-options-cluster-delay name=cluster-delay
----
=== When Options are Listed More Than Once ===
If you ever see something like the following, it means that the option you're modifying is present more than once.
.Deleting an option that is listed twice
=======
------
# crm_attribute --name batch-limit --delete
Multiple attributes match name=batch-limit in crm_config:
Value: 50 (set=cib-bootstrap-options, id=cib-bootstrap-options-batch-limit)
Value: 100 (set=custom, id=custom-batch-limit)
Please choose from one of the matches above and supply the 'id' with --id
-------
=======
In such cases, follow the on-screen instructions to perform the
requested action. To determine which value is currently being used by
the cluster, refer to <>.
diff --git a/doc/Pacemaker_Explained/en-US/Ch-Stonith.txt b/doc/Pacemaker_Explained/en-US/Ch-Stonith.txt
index 84053f53c4..133077a74b 100644
--- a/doc/Pacemaker_Explained/en-US/Ch-Stonith.txt
+++ b/doc/Pacemaker_Explained/en-US/Ch-Stonith.txt
@@ -1,908 +1,909 @@
= STONITH =
////
We prefer [[ch-stonith]], but older versions of asciidoc don't deal well
with that construct for chapter headings
////
anchor:ch-stonith[Chapter 13, STONITH]
indexterm:[STONITH, Configuration]
== What Is STONITH? ==
STONITH (an acronym for "Shoot The Other Node In The Head"), also called
'fencing', protects your data from being corrupted by rogue nodes or concurrent
access.
Just because a node is unresponsive, this doesn't mean it isn't
accessing your data. The only way to be 100% sure that your data is
safe, is to use STONITH so we can be certain that the node is truly
offline, before allowing the data to be accessed from another node.
STONITH also has a role to play in the event that a clustered service
cannot be stopped. In this case, the cluster uses STONITH to force the
whole node offline, thereby making it safe to start the service
elsewhere.
== What STONITH Device Should You Use? ==
It is crucial that the STONITH device can allow the cluster to
differentiate between a node failure and a network one.
The biggest mistake people make in choosing a STONITH device is to
use a remote power switch (such as many on-board IPMI controllers) that
shares power with the node it controls. In such cases, the cluster
cannot be sure if the node is really offline, or active and suffering
from a network fault.
Likewise, any device that relies on the machine being active (such as
SSH-based "devices" used during testing) are inappropriate.
== Special Treatment of STONITH Resources ==
STONITH resources are somewhat special in Pacemaker.
STONITH may be initiated by pacemaker or by other parts of the cluster
(such as resources like DRBD or DLM). To accommodate this, pacemaker
does not require the STONITH resource to be in the 'started' state
in order to be used, thus allowing reliable use of STONITH devices in such a
case.
[NOTE]
====
In pacemaker versions 1.1.9 and earlier, this feature either did not exist or
did not work well. Only "running" STONITH resources could be used by Pacemaker
for fencing, and if another component tried to fence a node while Pacemaker was
moving STONITH resources, the fencing could fail.
====
All nodes have access to STONITH devices' definitions and instantiate them
on-the-fly when needed, but preference is given to 'verified' instances, which
are the ones that are 'started' according to the cluster's knowledge.
In the case of a cluster split, the partition with a verified instance
will have a slight advantage, because the STONITH daemon in the other partition
will have to hear from all its current peers before choosing a node to
perform the fencing.
Fencing resources do work the same as regular resources in some respects:
* +target-role+ can be used to enable or disable the resource
* Location constraints can be used to prevent a specific node from using the resource
[IMPORTANT]
===========
Currently there is a limitation that fencing resources may only have
one set of meta-attributes and one set of instance attributes. This
can be revisited if it becomes a significant limitation for people.
===========
See the table below or run `man stonithd` to see special instance attributes
that may be set for any fencing resource, regardless of fence agent.
.Properties of Fencing Resources
[width="95%",cols="5m,2,3,10
----
====
Based on that, we would create a STONITH resource fragment that might look
like this:
.An IPMI-based STONITH Resource
====
[source,XML]
----
----
====
Finally, we need to enable STONITH:
----
# crm_attribute -t crm_config -n stonith-enabled -v true
----
== Advanced STONITH Configurations ==
Some people consider that having one fencing device is a single point
of failure footnote:[Not true, since a node or resource must fail
before fencing even has a chance to]; others prefer removing the node
from the storage and network instead of turning it off.
Whatever the reason, Pacemaker supports fencing nodes with multiple
devices through a feature called 'fencing topologies'.
Simply create the individual devices as you normally would, then
define one or more +fencing-level+ entries in the +fencing-topology+ section of
the configuration.
* Each fencing level is attempted in order of ascending +index+. Allowed
values are 1 through 9.
* If a device fails, processing terminates for the current level.
No further devices in that level are exercised, and the next level is attempted instead.
* If the operation succeeds for all the listed devices in a level, the level is deemed to have passed.
* The operation is finished when a level has passed (success), or all levels have been attempted (failed).
* If the operation failed, the next step is determined by the Policy Engine and/or `crmd`.
Some possible uses of topologies include:
* Try poison-pill and fail back to power
* Try disk and network, and fall back to power if either fails
* Initiate a kdump and then poweroff the node
.Properties of Fencing Levels
[width="95%",cols="1m,3<",options="header",align="center"]
|=========================================================
|Field
|Description
|id
|A unique name for the level
indexterm:[id,fencing-level]
indexterm:[Fencing,fencing-level,id]
|target
|The name of a single node to which this level applies
indexterm:[target,fencing-level]
indexterm:[Fencing,fencing-level,target]
|target-pattern
|A regular expression matching the names of nodes to which this level applies
'(since 1.1.14)'
indexterm:[target-pattern,fencing-level]
indexterm:[Fencing,fencing-level,target-pattern]
|target-attribute
|The name of a node attribute that is set (to +target-value+) for nodes to
which this level applies '(since 1.1.14)'
indexterm:[target-attribute,fencing-level]
indexterm:[Fencing,fencing-level,target-attribute]
|target-value
|The node attribute value (of +target-attribute+) that is set for nodes to
which this level applies '(since 1.1.14)'
indexterm:[target-attribute,fencing-level]
indexterm:[Fencing,fencing-level,target-attribute]
|index
|The order in which to attempt the levels.
Levels are attempted in ascending order 'until one succeeds'.
Valid values are 1 through 9.
indexterm:[index,fencing-level]
indexterm:[Fencing,fencing-level,index]
|devices
|A comma-separated list of devices that must all be tried for this level
indexterm:[devices,fencing-level]
indexterm:[Fencing,fencing-level,devices]
|=========================================================
.Fencing topology with different devices for different nodes
====
[source,XML]
----
...
...
----
====
=== Example Dual-Layer, Dual-Device Fencing Topologies ===
The following example illustrates an advanced use of +fencing-topology+ in a cluster with the following properties:
* 3 nodes (2 active prod-mysql nodes, 1 prod_mysql-rep in standby for quorum purposes)
* the active nodes have an IPMI-controlled power board reached at 192.0.2.1 and 192.0.2.2
* the active nodes also have two independent PSUs (Power Supply Units)
connected to two independent PDUs (Power Distribution Units) reached at
198.51.100.1 (port 10 and port 11) and 203.0.113.1 (port 10 and port 11)
* the first fencing method uses the `fence_ipmi` agent
* the second fencing method uses the `fence_apc_snmp` agent targetting 2 fencing devices (one per PSU, either port 10 or 11)
* fencing is only implemented for the active nodes and has location constraints
* fencing topology is set to try IPMI fencing first then default to a "sure-kill" dual PDU fencing
In a normal failure scenario, STONITH will first select +fence_ipmi+ to try to kill the faulty node.
Using a fencing topology, if that first method fails, STONITH will then move on to selecting +fence_apc_snmp+ twice:
* once for the first PDU
* again for the second PDU
The fence action is considered successful only if both PDUs report the required status. If any of them fails, STONITH loops back to the first fencing method, +fence_ipmi+, and so on until the node is fenced or fencing action is cancelled.
.First fencing method: single IPMI device
Each cluster node has it own dedicated IPMI channel that can be called for fencing using the following primitives:
[source,XML]
----
----
.Second fencing method: dual PDU devices
Each cluster node also has two distinct power channels controlled by two
distinct PDUs. That means a total of 4 fencing devices configured as follows:
- Node 1, PDU 1, PSU 1 @ port 10
- Node 1, PDU 2, PSU 2 @ port 10
- Node 2, PDU 1, PSU 1 @ port 11
- Node 2, PDU 2, PSU 2 @ port 11
The matching fencing agents are configured as follows:
[source,XML]
----
----
.Location Constraints
To prevent STONITH from trying to run a fencing agent on the same node it is
supposed to fence, constraints are placed on all the fencing primitives:
[source,XML]
----
----
.Fencing topology
Now that all the fencing resources are defined, it's time to create the right topology.
We want to first fence using IPMI and if that does not work, fence both PDUs to effectively and surely kill the node.
[source,XML]
----
----
Please note, in +fencing-topology+, the lowest +index+ value determines the priority of the first fencing method.
.Final configuration
Put together, the configuration looks like this:
[source,XML]
----
...
...
----
== Remapping Reboots ==
When the cluster needs to reboot a node, whether because +stonith-action+ is +reboot+ or because
a reboot was manually requested (such as by `stonith_admin --reboot`), it will remap that to
other commands in two cases:
. If the chosen fencing device does not support the +reboot+ command, the cluster
will ask it to perform +off+ instead.
. If a fencing topology level with multiple devices must be executed, the cluster
will ask all the devices to perform +off+, then ask the devices to perform +on+.
To understand the second case, consider the example of a node with redundant
power supplies connected to intelligent power switches. Rebooting one switch
and then the other would have no effect on the node. Turning both switches off,
and then on, actually reboots the node.
In such a case, the fencing operation will be treated as successful as long as
the +off+ commands succeed, because then it is safe for the cluster to recover
any resources that were on the node. Timeouts and errors in the +on+ phase will
be logged but ignored.
When a reboot operation is remapped, any action-specific timeout for the
remapped action will be used (for example, +pcmk_off_timeout+ will be used when
executing the +off+ command, not +pcmk_reboot_timeout+).
[NOTE]
====
In Pacemaker versions 1.1.13 and earlier, reboots will not be remapped in the
second case. To achieve the same effect, separate fencing devices for off and
on actions must be configured.
====