diff --git a/doc/Pacemaker_Explained/en-US/Ch-Constraints.txt b/doc/Pacemaker_Explained/en-US/Ch-Constraints.txt
index 4477c64c75..694c35d053 100644
--- a/doc/Pacemaker_Explained/en-US/Ch-Constraints.txt
+++ b/doc/Pacemaker_Explained/en-US/Ch-Constraints.txt
@@ -1,879 +1,881 @@
= Resource Constraints =
indexterm:[Resource,Constraints]
== Scores ==
Scores of all kinds are integral to how the cluster works.
Practically everything from moving a resource to deciding which
resource to stop in a degraded cluster is achieved by manipulating
scores in some way.
Scores are calculated per resource and node. Any node with a
negative score for a resource can't run that resource. The cluster
places a resource on the node with the highest score for it.
=== Infinity Math ===
Pacemaker implements +INFINITY+ (or equivalently, ++INFINITY+) internally as a
score of 1,000,000. Addition and subtraction with it follow these three basic
rules:
* Any value + +INFINITY+ = +INFINITY+
* Any value - +INFINITY+ = +-INFINITY+
* +INFINITY+ - +INFINITY+ = +-INFINITY+
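
As a hedged illustration of the third rule, suppose two location constraints
(location constraints are described later in this chapter) give the same
resource opposite infinite scores on the same node. The resource, node, and
constraint names here are made up for the example. Assuming no other
constraints or scores apply, the two scores combine to +-INFINITY+, so
+Webserver+ would not be allowed to run on +sles-1+.
.Hypothetical constraints whose scores combine to -INFINITY
======
[source,XML]
-------
<constraints>
  <!-- INFINITY + -INFINITY = -INFINITY for Webserver on sles-1 -->
  <rsc_location id="loc-must-run" rsc="Webserver" node="sles-1" score="INFINITY"/>
  <rsc_location id="loc-must-not-run" rsc="Webserver" node="sles-1" score="-INFINITY"/>
</constraints>
-------
======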
[NOTE]
======
What if you want to use a score higher than 1,000,000? Typically this possibility
arises when someone wants to base the score on some external metric that might
go above 1,000,000.
The short answer is you can't.
The long answer is that it is sometimes possible to work around this limitation
creatively. You may be able to set the score to some computed value based on
the external metric rather than use the metric directly. For nodes, you can
store the metric as a node attribute, and query the attribute when computing
the score (possibly as part of a custom resource agent).
======
== Deciding Which Nodes a Resource Can Run On ==
indexterm:[Location Constraints]
indexterm:[Resource,Constraints,Location]
'Location constraints' tell the cluster which nodes a resource can run on.
There are two alternative strategies. One way is to say that, by default,
resources can run anywhere, and then the location constraints specify nodes
that are not allowed (an 'opt-out' cluster). The other way is to start with
nothing able to run anywhere, and use location constraints to selectively
enable allowed nodes (an 'opt-in' cluster).
Whether you should choose opt-in or opt-out depends on your
personal preference and the make-up of your cluster. If most of your
resources can run on most of the nodes, then an opt-out arrangement is
likely to result in a simpler configuration. On the other hand, if
most resources can only run on a small subset of nodes, an opt-in
configuration might be simpler.
=== Location Properties ===
.Properties of a rsc_location Constraint
[width="95%",cols="2m,1,5<a",options="header",align="center"]
|=========================================================
|Field
|Default
|Description
|id
|
|A unique name for the constraint
indexterm:[id,Location Constraints]
indexterm:[Constraints,Location,id]
|rsc
|
|The name of the resource to which this constraint applies
indexterm:[rsc,Location Constraints]
indexterm:[Constraints,Location,rsc]
|rsc-pattern
|
-|A regular expression matching the names of resources to which this constraint
+|An extended regular expression (as defined in
+ http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04[POSIX])
+ matching the names of resources to which this constraint
applies, if +rsc+ is not specified; if the regular expression contains
submatches and the constraint is governed by a rule (see <<ch-rules>>), the
submatches can be referenced as +%0+ through +%9+ in the rule's
+score-attribute+ or a rule expression's +attribute+
indexterm:[rsc-pattern,Location Constraints]
indexterm:[Constraints,Location,rsc-pattern]
|node
|
|A node's name
indexterm:[node,Location Constraints]
indexterm:[Constraints,Location,node]
|score
|
|Positive values indicate a preference for running the affected resource(s) on
this node -- the higher the value, the stronger the preference. Negative values
indicate the resource(s) should avoid this node (a value of +-INFINITY+
changes "should" to "must").
indexterm:[score,Location Constraints]
indexterm:[Constraints,Location,score]
|resource-discovery
|always
|Whether Pacemaker should perform resource discovery (that is, check whether
the resource is already running) for this resource on this node. This should
normally be left as the default, so that rogue instances of a service can be
stopped when they are running where they are not supposed to be. However,
there are two situations where disabling resource discovery is a good idea:
when a service is not installed on a node, discovery might return an error
(properly written OCF agents will not, so this is usually only seen with other
agent types); and when Pacemaker Remote is used to scale a cluster to hundreds
of nodes, limiting resource discovery to allowed nodes can significantly boost
performance.
* +always:+ Always perform resource discovery for the specified resource on this node.
* +never:+ Never perform resource discovery for the specified resource on this node.
This option should generally be used with a -INFINITY score, although that is not strictly
required.
* +exclusive:+ Perform resource discovery for the specified resource only on
this node (and other nodes similarly marked as +exclusive+). Multiple location
constraints using +exclusive+ discovery for the same resource across
different nodes create a subset of nodes that resource-discovery is exclusive to.
If a resource is marked for +exclusive+ discovery on one or more nodes, that
resource is only allowed to be placed within that subset of nodes.
indexterm:[Resource Discovery,Location Constraints]
indexterm:[Constraints,Location,Resource Discovery]
|=========================================================
[WARNING]
=========
Setting resource-discovery to +never+ or +exclusive+ removes Pacemaker's
ability to detect and stop unwanted instances of a service running
where it's not supposed to be. It is up to the system administrator (you!)
to make sure that the service can 'never' be active on nodes without
resource-discovery (such as by leaving the relevant software uninstalled).
=========
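As a sketch of how +rsc-pattern+ and +resource-discovery+ might be used
together, the following fragment assumes a set of hypothetical resources whose
names begin with +web-+, and a node +sles-3+ on which the web server software
is not installed. The constraint IDs and the pattern are illustrative only.
.Hypothetical location constraints using rsc-pattern and resource-discovery
======
[source,XML]
-------
<constraints>
  <!-- Resources matching ^web- prefer sles-1 -->
  <rsc_location id="loc-web-prefer-sles-1" rsc-pattern="^web-" node="sles-1" score="100"/>
  <!-- Never place or probe those resources on sles-3 -->
  <rsc_location id="loc-web-never-sles-3" rsc-pattern="^web-" node="sles-3"
                score="-INFINITY" resource-discovery="never"/>
</constraints>
-------
======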
=== Asymmetrical "Opt-In" Clusters ===
indexterm:[Asymmetrical Opt-In Clusters]
indexterm:[Cluster Type,Asymmetrical Opt-In]
To create an opt-in cluster, start by preventing resources from
running anywhere by default:
----
# crm_attribute --name symmetric-cluster --update false
----
Then start enabling nodes. The following fragment says that the web
server prefers *sles-1*, the database prefers *sles-2* and both can
fail over to *sles-3* if their most preferred node fails.
.Opt-in location constraints for two resources
======
[source,XML]
-------
<constraints>
<rsc_location id="loc-1" rsc="Webserver" node="sles-1" score="200"/>
<rsc_location id="loc-2" rsc="Webserver" node="sles-3" score="0"/>
<rsc_location id="loc-3" rsc="Database" node="sles-2" score="200"/>
<rsc_location id="loc-4" rsc="Database" node="sles-3" score="0"/>
</constraints>
-------
======
=== Symmetrical "Opt-Out" Clusters ===
indexterm:[Symmetrical Opt-Out Clusters]
indexterm:[Cluster Type,Symmetrical Opt-Out]
To create an opt-out cluster, start by allowing resources to run
anywhere by default:
----
# crm_attribute --name symmetric-cluster --update true
----
Then start disabling nodes. The following fragment is the equivalent
of the above opt-in configuration.
.Opt-out location constraints for two resources
======
[source,XML]
-------
<constraints>
<rsc_location id="loc-1" rsc="Webserver" node="sles-1" score="200"/>
<rsc_location id="loc-2-dont-run" rsc="Webserver" node="sles-2" score="-INFINITY"/>
<rsc_location id="loc-3-dont-run" rsc="Database" node="sles-1" score="-INFINITY"/>
<rsc_location id="loc-4" rsc="Database" node="sles-2" score="200"/>
</constraints>
-------
======
[[node-score-equal]]
=== What if Two Nodes Have the Same Score ===
If two nodes have the same score, then the cluster will choose one.
This choice may seem random and may not be what was intended; however,
the cluster was not given enough information to know any better.
.Constraints where a resource prefers two nodes equally
======
[source,XML]
-------
<constraints>
<rsc_location id="loc-1" rsc="Webserver" node="sles-1" score="INFINITY"/>
<rsc_location id="loc-2" rsc="Webserver" node="sles-2" score="INFINITY"/>
<rsc_location id="loc-3" rsc="Database" node="sles-1" score="500"/>
<rsc_location id="loc-4" rsc="Database" node="sles-2" score="300"/>
<rsc_location id="loc-5" rsc="Database" node="sles-2" score="200"/>
</constraints>
-------
======
In the example above, assuming no other constraints and an inactive
cluster, +Webserver+ would probably be placed on +sles-1+ and +Database+ on
+sles-2+. It would likely have placed +Webserver+ based on the node's
uname and +Database+ based on the desire to spread the resource load
evenly across the cluster. However, other factors can also be involved
in more complex configurations.
[[s-resource-ordering]]
== Specifying the Order in which Resources Should Start/Stop ==
indexterm:[Resource,Constraints,Ordering]
indexterm:[Resource,Start Order]
indexterm:[Ordering Constraints]
'Ordering constraints' tell the cluster the order in which resources should
start.
[IMPORTANT]
====
Ordering constraints affect 'only' the ordering of resources;
they do 'not' require that the resources be placed on the
same node. If you want resources to be started on the same node
'and' in a specific order, you need both an ordering constraint 'and'
a colocation constraint (see <<s-resource-colocation>>), or
alternatively, a group (see <<group-resources>>).
====
=== Ordering Properties ===
.Properties of a rsc_order Constraint
[width="95%",cols="1m,1,4<a",options="header",align="center"]
|=========================================================
|Field
|Default
|Description
|id
|
|A unique name for the constraint
indexterm:[id,Ordering Constraints]
indexterm:[Constraints,Ordering,id]
|first
|
|Name of the resource that the +then+ resource depends on
indexterm:[first,Ordering Constraints]
indexterm:[Constraints,Ordering,first]
|then
|
|Name of the dependent resource
indexterm:[then,Ordering Constraints]
indexterm:[Constraints,Ordering,then]
|first-action
|start
|The action that the +first+ resource must complete before +then-action+
can be initiated for the +then+ resource. Allowed values: +start+,
+stop+, +promote+, +demote+.
indexterm:[first-action,Ordering Constraints]
indexterm:[Constraints,Ordering,first-action]
|then-action
|value of +first-action+
|The action that the +then+ resource can execute only after the
+first-action+ on the +first+ resource has completed. Allowed
values: +start+, +stop+, +promote+, +demote+.
indexterm:[then-action,Ordering Constraints]
indexterm:[Constraints,Ordering,then-action]
|kind
|
|How to enforce the constraint. Allowed values:
* +Optional:+ Just a suggestion. Only applies if both resources are
executing the specified actions. Any change in state by the +first+ resource
will have no effect on the +then+ resource.
* +Mandatory:+ Always. If +first+ does not perform +first-action+, +then+ will
not be allowed to perform +then-action+. If +first+ is restarted, +then+
(if running) will be stopped beforehand and started afterward.
* +Serialize:+ Ensure that no two stop/start actions occur concurrently
for the resources. +First+ and +then+ can start in either order,
but one must complete starting before the other can be started. A typical use
case is when resource start-up puts a high load on the host.
indexterm:[kind,Ordering Constraints]
indexterm:[Constraints,Ordering,kind]
|symmetrical
|TRUE for +Mandatory+ and +Optional+ kinds. FALSE for +Serialize+ kind.
|If true, the reverse of the constraint applies for the opposite action (for
example, if B starts after A starts, then B stops before A stops).
+Serialize+ orders cannot be symmetrical.
indexterm:[symmetrical,Ordering Constraints]
indexterm:[Ordering Constraints,symmetrical]
|=========================================================
+Promote+ and +demote+ apply to the master role of
<<s-resource-promotable,promotable>> resources.
=== Optional and mandatory ordering ===
Here is an example of ordering constraints where +Database+ 'must' start before
+Webserver+, and +IP+ 'should' start before +Webserver+ if they both need to be
started:
.Optional and mandatory ordering constraints
======
[source,XML]
-------
<constraints>
<rsc_order id="order-1" first="IP" then="Webserver" kind="Optional"/>
<rsc_order id="order-2" first="Database" then="Webserver" kind="Mandatory" />
</constraints>
-------
======
Because the above example lets +symmetrical+ default to TRUE,
+Webserver+ must be stopped before +Database+ can be stopped,
and +Webserver+ should be stopped before +IP+
if they both need to be stopped.
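The +first-action+, +then-action+ and +kind+ properties can be combined in
other ways as well. The sketch below is illustrative only: it assumes a
hypothetical promotable resource +DB-clone+ and two hypothetical resources
+Backup-A+ and +Backup-B+ whose start-up is expensive. The first constraint
starts +Webserver+ only after +DB-clone+ has been promoted; the second keeps
the two backup resources from starting or stopping at the same time.
.Hypothetical ordering constraints using actions and the Serialize kind
======
[source,XML]
-------
<constraints>
  <!-- Start Webserver only after DB-clone has been promoted -->
  <rsc_order id="order-promote-then-start" first="DB-clone" first-action="promote"
             then="Webserver" then-action="start"/>
  <!-- Either may go first, but not both at once -->
  <rsc_order id="order-serialize-backups" first="Backup-A" then="Backup-B" kind="Serialize"/>
</constraints>
-------
======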
[[s-resource-colocation]]
== Placing Resources Relative to other Resources ==
indexterm:[Resource,Constraints,Colocation]
indexterm:[Resource,Location Relative to other Resources]
'Colocation constraints' tell the cluster that the location of one resource
depends on the location of another one.
Colocation has an important side-effect: it affects the order in which
resources are assigned to a node. Think about it: You can't place A relative to
B unless you know where B is.
footnote:[
While the human brain is sophisticated enough to read the constraint
in any order and choose the correct one depending on the situation,
the cluster is not quite so smart. Yet.
]
So when you are creating colocation constraints, it is important to
consider whether you should colocate A with B, or B with A.
Another thing to keep in mind is that, assuming A is colocated with
B, the cluster will take into account A's preferences when
deciding which node to choose for B.
For a detailed look at exactly how this occurs, see
http://clusterlabs.org/doc/Colocation_Explained.pdf[Colocation Explained].
[IMPORTANT]
====
Colocation constraints affect 'only' the placement of resources; they do 'not'
require that the resources be started in a particular order. If you want
resources to be started on the same node 'and' in a specific order, you need
both an ordering constraint (see <<s-resource-ordering>>) 'and' a colocation
constraint, or alternatively, a group (see <<group-resources>>).
====
=== Colocation Properties ===
.Properties of a rsc_colocation Constraint
[width="95%",cols="1m,1,4<",options="header",align="center"]
|=========================================================
|Field
|Default
|Description
|id
|
|A unique name for the constraint (required).
indexterm:[id,Colocation Constraints]
indexterm:[Constraints,Colocation,id]
|rsc
|
|The name of a resource that should be located relative to +with-rsc+ (required).
indexterm:[rsc,Colocation Constraints]
indexterm:[Constraints,Colocation,rsc]
|with-rsc
|
|The name of the resource used as the colocation target. The cluster will
decide where to put this resource first and then decide where to put +rsc+ (required).
indexterm:[with-rsc,Colocation Constraints]
indexterm:[Constraints,Colocation,with-rsc]
|node-attribute
|#uname
|The node attribute that must be the same on the node running +rsc+ and the
node running +with-rsc+ for the constraint to be satisfied. (For details,
see <<s-coloc-attribute>>.)
indexterm:[node-attribute,Colocation Constraints]
indexterm:[Constraints,Colocation,node-attribute]
|score
|
|Positive values indicate the resources should run on the same
node. Negative values indicate the resources should run on
different nodes. Values of \+/- +INFINITY+ change "should" to "must".
indexterm:[score,Colocation Constraints]
indexterm:[Constraints,Colocation,score]
|=========================================================
=== Mandatory Placement ===
Mandatory placement occurs when the constraint's score is
++INFINITY+ or +-INFINITY+. In such cases, if the constraint can't be
satisfied, then the +rsc+ resource is not permitted to run. For
+score=INFINITY+, this includes cases where the +with-rsc+ resource is
not active.
If you need resource +A+ to always run on the same machine as
resource +B+, you would add the following constraint:
.Mandatory colocation constraint for two resources
====
[source,XML]
<rsc_colocation id="colocate" rsc="A" with-rsc="B" score="INFINITY"/>
====
Remember, because +INFINITY+ was used, if +B+ can't run on any
of the cluster nodes (for whatever reason) then +A+ will not
be allowed to run. Whether +A+ is running or not has no effect on +B+.
Alternatively, you may want the opposite -- that +A+ 'cannot'
run on the same machine as +B+. In this case, use
+score="-INFINITY"+.
.Mandatory anti-colocation constraint for two resources
====
[source,XML]
<rsc_colocation id="anti-colocate" rsc="A" with-rsc="B" score="-INFINITY"/>
====
Again, by specifying +-INFINITY+, the constraint is binding. So if the
only place left to run is where +B+ already is, then
+A+ may not run anywhere.
As with +INFINITY+, +B+ can run even if +A+ is stopped.
However, in this case +A+ also can run if +B+ is stopped, because it still
meets the constraint of +A+ and +B+ not running on the same node.
=== Advisory Placement ===
If mandatory placement is about "must" and "must not", then advisory
placement is the "I'd prefer if" alternative. For constraints with
scores greater than +-INFINITY+ and less than +INFINITY+, the cluster
will try to accommodate your wishes but may ignore them if the
alternative is to stop some of the cluster resources.
As in life, where if enough people prefer something it effectively
becomes mandatory, advisory colocation constraints can combine with
other elements of the configuration to behave as if they were
mandatory.
.Advisory colocation constraint for two resources
====
[source,XML]
<rsc_colocation id="colocate-maybe" rsc="A" with-rsc="B" score="500"/>
====
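Advisory scores can also be negative. As a hedged example (the constraint ID
and the score of +-500+ are arbitrary choices for illustration), the following
constraint asks the cluster to keep +A+ away from +B+ where possible, while
still allowing them to share a node if there is no better alternative.
.Hypothetical advisory anti-colocation constraint for two resources
====
[source,XML]
<rsc_colocation id="anti-colocate-maybe" rsc="A" with-rsc="B" score="-500"/>
====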
[[s-coloc-attribute]]
=== Colocation by Node Attribute ===
The +node-attribute+ property of a colocation constraint allows you to express
the requirement, "these resources must be on similar nodes".
As an example, imagine that you have two Storage Area Networks (SANs) that are
not controlled by the cluster, and each node is connected to one or the other.
You may have two resources +r1+ and +r2+ such that +r2+ needs to use the same
SAN as +r1+, but doesn't necessarily have to be on the same exact node.
In such a case, you could define a <<s-node-attributes,node attribute>> named
+san+, with the value +san1+ or +san2+ on each node as appropriate. Then, you
could colocate +r2+ with +r1+ using +node-attribute+ set to +san+.
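A minimal sketch of such a constraint follows. The constraint ID is
hypothetical, and a score of +INFINITY+ is assumed because +r2+ is described
as needing the same SAN as +r1+; an advisory score could be used instead if
the requirement were only a preference.
.Hypothetical colocation constraint using a node attribute
====
[source,XML]
<rsc_colocation id="colocate-r2-with-r1-san" rsc="r2" with-rsc="r1" score="INFINITY" node-attribute="san"/>
====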
[[s-resource-sets]]
== Resource Sets ==
'Resource sets' allow multiple resources to be affected by a single constraint.
.A set of 3 resources
====
[source,XML]
----
<resource_set id="resource-set-example">
<resource_ref id="A"/>
<resource_ref id="B"/>
<resource_ref id="C"/>
</resource_set>
----
====
Resource sets are valid inside +rsc_location+,
+rsc_order+ (see <<s-resource-sets-ordering>>),
+rsc_colocation+ (see <<s-resource-sets-colocation>>),
and +rsc_ticket+ (see <<s-ticket-constraints>>) constraints.
A resource set has a number of properties that can be set,
though not all have an effect in all contexts.
.Properties of a resource_set
[width="95%",cols="2m,1,5<a",options="header",align="center"]
|=========================================================
|Field
|Default
|Description
|id
|
|A unique name for the set
indexterm:[id,Resource Sets]
indexterm:[Constraints,Resource Sets,id]
|sequential
|true
|Whether the members of the set must be acted on in order.
Meaningful within +rsc_order+ and +rsc_colocation+.
indexterm:[sequential,Resource Sets]
indexterm:[Constraints,Resource Sets,sequential]
|require-all
|true
|Whether all members of the set must be active before continuing.
With the current implementation, the cluster may continue even if only one
member of the set is started, but if more than one member of the set is
starting at the same time, the cluster will still wait until all of those have
started before continuing (this may change in future versions).
Meaningful within +rsc_order+.
indexterm:[require-all,Resource Sets]
indexterm:[Constraints,Resource Sets,require-all]
|role
|
|Limit the effect of the constraint to the specified role.
Meaningful within +rsc_location+, +rsc_colocation+ and +rsc_ticket+.
indexterm:[role,Resource Sets]
indexterm:[Constraints,Resource Sets,role]
|action
|
|Limit the effect of the constraint to the specified action.
Meaningful within +rsc_order+.
indexterm:[action,Resource Sets]
indexterm:[Constraints,Resource Sets,action]
|score
|
|'Advanced use only.' Use a specific score for this set within the constraint.
indexterm:[score,Resource Sets]
indexterm:[Constraints,Resource Sets,score]
|=========================================================
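Sets inside ordering and colocation constraints are covered in the sections
that follow. For a location constraint, a set might be used as in the sketch
below, which assumes that resources +A+ and +B+ should both prefer a node named
+sles-1+; the IDs, node name, and score are illustrative only.
.Hypothetical location constraint applied to a set of resources
====
[source,XML]
----
<rsc_location id="loc-set-example" node="sles-1" score="200">
  <resource_set id="located-set-example">
    <resource_ref id="A"/>
    <resource_ref id="B"/>
  </resource_set>
</rsc_location>
----
====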
[[s-resource-sets-ordering]]
== Ordering Sets of Resources ==
A common situation is for an administrator to create a chain of
ordered resources, such as:
.A chain of ordered resources
======
[source,XML]
-------
<constraints>
<rsc_order id="order-1" first="A" then="B" />
<rsc_order id="order-2" first="B" then="C" />
<rsc_order id="order-3" first="C" then="D" />
</constraints>
-------
======
.Visual representation of the four resources' start order for the above constraints
image::images/resource-set.png["Ordered set",width="16cm",height="2.5cm",align="center"]
=== Ordered Set ===
To simplify this situation, resource sets (see <<s-resource-sets>>) can be used
within ordering constraints:
.A chain of ordered resources expressed as a set
======
[source,XML]
-------
<constraints>
<rsc_order id="order-1">
<resource_set id="ordered-set-example" sequential="true">
<resource_ref id="A"/>
<resource_ref id="B"/>
<resource_ref id="C"/>
<resource_ref id="D"/>
</resource_set>
</rsc_order>
</constraints>
-------
======
While the set-based format is not less verbose, it is significantly
easier to get right and maintain.
[IMPORTANT]
=========
If you use a higher-level tool, pay attention to how it exposes this
functionality. Depending on the tool, creating a set +A B+ may be equivalent to
+A then B+, or +B then A+.
=========
=== Ordering Multiple Sets ===
The syntax can be expanded to allow sets of resources to be ordered relative to
each other, where the members of each individual set may be ordered or
unordered (controlled by the +sequential+ property). In the example below, +A+
and +B+ can both start in parallel, as can +C+ and +D+; however, +C+ and +D+ can
only start once _both_ +A+ _and_ +B+ are active.
.Ordered sets of unordered resources
======
[source,XML]
-------
<constraints>
<rsc_order id="order-1">
<resource_set id="ordered-set-1" sequential="false">
<resource_ref id="A"/>
<resource_ref id="B"/>
</resource_set>
<resource_set id="ordered-set-2" sequential="false">
<resource_ref id="C"/>
<resource_ref id="D"/>
</resource_set>
</rsc_order>
</constraints>
-------
======
.Visual representation of the start order for two ordered sets of unordered resources
image::images/two-sets.png["Two ordered sets",width="13cm",height="7.5cm",align="center"]
Of course either set -- or both sets -- of resources can also be
internally ordered (by setting +sequential="true"+) and there is no
limit to the number of sets that can be specified.
.Advanced use of set ordering - Three ordered sets, two of which are internally unordered
======
[source,XML]
-------
<constraints>
<rsc_order id="order-1">
<resource_set id="ordered-set-1" sequential="false">
<resource_ref id="A"/>
<resource_ref id="B"/>
</resource_set>
<resource_set id="ordered-set-2" sequential="true">
<resource_ref id="C"/>
<resource_ref id="D"/>
</resource_set>
<resource_set id="ordered-set-3" sequential="false">
<resource_ref id="E"/>
<resource_ref id="F"/>
</resource_set>
</rsc_order>
</constraints>
-------
======
.Visual representation of the start order for the three sets defined above
image::images/three-sets.png["Three ordered sets",width="16cm",height="7.5cm",align="center"]
[IMPORTANT]
====
An ordered set with +sequential=false+ makes sense only if there is another
set in the constraint. Otherwise, the constraint has no effect.
====
=== Resource Set OR Logic ===
The unordered set logic discussed so far has all been "AND" logic.
To illustrate this, take the three-set figure in the previous section.
Those sets can be expressed as +(A and B) then \(C) then (D) then (E and F)+.
Say, for example, we want to change the first set, +(A and B)+, to use "OR" logic
so the sets look like this: +(A or B) then \(C) then (D) then (E and F)+.
This functionality can be achieved through the use of the +require-all+
option. This option defaults to TRUE, which is why the
"AND" logic is used by default. Setting +require-all=false+ means only one
resource in the set needs to be started before continuing on to the next set.
.Resource Set "OR" logic: Three ordered sets, where the first set is internally unordered with "OR" logic
======
[source,XML]
-------
<constraints>
<rsc_order id="order-1">
<resource_set id="ordered-set-1" sequential="false" require-all="false">
<resource_ref id="A"/>
<resource_ref id="B"/>
</resource_set>
<resource_set id="ordered-set-2" sequential="true">
<resource_ref id="C"/>
<resource_ref id="D"/>
</resource_set>
<resource_set id="ordered-set-3" sequential="false">
<resource_ref id="E"/>
<resource_ref id="F"/>
</resource_set>
</rsc_order>
</constraints>
-------
======
[IMPORTANT]
====
An ordered set with +require-all=false+ makes sense only in conjunction with
+sequential=false+. Think of it like this: +sequential=false+ modifies the set
to be an unordered set using "AND" logic by default, and adding
+require-all=false+ flips the unordered set's "AND" logic to "OR" logic.
====
[[s-resource-sets-colocation]]
== Colocating Sets of Resources ==
Another common situation is for an administrator to create a set of
colocated resources.
One way to do this would be to define a resource group (see
<<group-resources>>), but that cannot always accurately express the desired
state.
Another way would be to define each relationship as an individual constraint,
but that causes a constraint explosion as the number of resources and
combinations grow. An example of this approach:
.Chain of colocated resources
======
[source,XML]
-------
<constraints>
<rsc_colocation id="coloc-1" rsc="D" with-rsc="C" score="INFINITY"/>
<rsc_colocation id="coloc-2" rsc="C" with-rsc="B" score="INFINITY"/>
<rsc_colocation id="coloc-3" rsc="B" with-rsc="A" score="INFINITY"/>
</constraints>
-------
======
To make things easier, resource sets (see <<s-resource-sets>>) can be used
within colocation constraints. As with the chained version, a
resource that can't be active prevents any resource that must be
colocated with it from being active. For example, if +B+ is not
able to run, then both +C+ and by inference +D+ must also remain
stopped. Here is an example +resource_set+:
.Equivalent colocation chain expressed using +resource_set+
======
[source,XML]
-------
<constraints>
<rsc_colocation id="coloc-1" score="INFINITY" >
<resource_set id="colocated-set-example" sequential="true">
<resource_ref id="A"/>
<resource_ref id="B"/>
<resource_ref id="C"/>
<resource_ref id="D"/>
</resource_set>
</rsc_colocation>
</constraints>
-------
======
[IMPORTANT]
=========
If you use a higher-level tool, pay attention to how it exposes this
functionality. Depending on the tool, creating a set +A B+ may be equivalent to
+A with B+, or +B with A+.
=========
This notation can also be used to tell the cluster that sets of resources must
be colocated relative to each other, where the individual members of each set
may or may not depend on each other being active (controlled by the
+sequential+ property).
In this example, +A+, +B+, and +C+ will each be colocated with +D+.
+D+ must be active, but any of +A+, +B+, or +C+ may be inactive without
affecting any other resources.
.Using colocated sets to specify a common peer
======
[source,XML]
-------
<constraints>
<rsc_colocation id="coloc-1" score="INFINITY" >
<resource_set id="colocated-set-1" sequential="false">
<resource_ref id="A"/>
<resource_ref id="B"/>
<resource_ref id="C"/>
</resource_set>
<resource_set id="colocated-set-2" sequential="true">
<resource_ref id="D"/>
</resource_set>
</rsc_colocation>
</constraints>
-------
======
[IMPORTANT]
====
A colocated set with +sequential=false+ makes sense only if there is another
set in the constraint. Otherwise, the constraint has no effect.
====
There is no inherent limit to the number and size of the sets used.
The only thing that matters is that in order for any member of one set
in the constraint to be active, all members of sets listed after it must also
be active (and naturally on the same node); and if a set has +sequential="true"+,
then in order for one member of that set to be active, all members listed
before it must also be active.
If desired, you can restrict the dependency to instances of promotable clone
resources that are in a specific role, using the set's +role+ property.
.Colocation chain in which the members of the middle set have no interdependencies, and the last listed set (which the cluster places first) is restricted to instances in master status.
======
[source,XML]
-------
<constraints>
<rsc_colocation id="coloc-1" score="INFINITY" >
<resource_set id="colocated-set-1" sequential="true">
<resource_ref id="B"/>
<resource_ref id="A"/>
</resource_set>
<resource_set id="colocated-set-2" sequential="false">
<resource_ref id="C"/>
<resource_ref id="D"/>
<resource_ref id="E"/>
</resource_set>
<resource_set id="colocated-set-3" sequential="true" role="Master">
<resource_ref id="G"/>
<resource_ref id="F"/>
</resource_set>
</rsc_colocation>
</constraints>
-------
======
.Visual representation of the above example (resources to the left are placed first)
image::images/three-sets-complex.png["Colocation chain",width="16cm",height="9cm",align="center"]
[NOTE]
====
Pay close attention to the order in which resources and sets are listed.
While the colocation dependency for members of any one set is last-to-first,
the colocation dependency for multiple sets is first-to-last. In the above
example, +B+ is colocated with +A+, but +colocated-set-1+ is
colocated with +colocated-set-2+.
Unlike ordered sets, colocated sets do not use the +require-all+ option.
====
diff --git a/doc/Pacemaker_Explained/en-US/Ch-Stonith.txt b/doc/Pacemaker_Explained/en-US/Ch-Stonith.txt
index 52883bebc7..4d1361ae8f 100644
--- a/doc/Pacemaker_Explained/en-US/Ch-Stonith.txt
+++ b/doc/Pacemaker_Explained/en-US/Ch-Stonith.txt
@@ -1,936 +1,938 @@
= STONITH =
////
We prefer [[ch-stonith]], but older versions of asciidoc don't deal well
with that construct for chapter headings
////
anchor:ch-stonith[Chapter 13, STONITH]
indexterm:[STONITH, Configuration]
== What Is STONITH? ==
STONITH (an acronym for "Shoot The Other Node In The Head"), also called
'fencing', protects your data from being corrupted by rogue nodes or concurrent
access.
Just because a node is unresponsive doesn't mean it isn't
accessing your data. The only way to be 100% sure that your data is
safe is to use STONITH to make certain that the node is truly
offline before allowing the data to be accessed from another node.
STONITH also has a role to play in the event that a clustered service
cannot be stopped. In this case, the cluster uses STONITH to force the
whole node offline, thereby making it safe to start the service
elsewhere.
== What STONITH Device Should You Use? ==
It is crucial that the STONITH device allow the cluster to
differentiate between a node failure and a network failure.
The biggest mistake people make in choosing a STONITH device is to
use a remote power switch (such as many on-board IPMI controllers) that
shares power with the node it controls. In such cases, the cluster
cannot be sure if the node is really offline, or active and suffering
from a network fault.
Likewise, any device that relies on the machine being active (such as
SSH-based "devices" used during testing) is inappropriate.
== Special Treatment of STONITH Resources ==
STONITH resources are somewhat special in Pacemaker.
STONITH may be initiated by Pacemaker or by other parts of the cluster
(such as resources like DRBD or DLM). To accommodate this, Pacemaker
does not require the STONITH resource to be in the 'started' state
in order to be used, thus allowing reliable use of STONITH devices in such a
case.
All nodes have access to STONITH devices' definitions and instantiate them
on-the-fly when needed, but preference is given to 'verified' instances, which
are the ones that are 'started' according to the cluster's knowledge.
In the case of a cluster split, the partition with a verified instance
will have a slight advantage, because the STONITH daemon in the other partition
will have to hear from all its current peers before choosing a node to
perform the fencing.
Fencing resources do work the same as regular resources in some respects:
* +target-role+ can be used to enable or disable the resource
* Location constraints can be used to prevent a specific node from using the resource
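As a minimal sketch of the two points above, a fencing resource could be disabled with +target-role+ and kept off one node with a location constraint. The ids +Fencing-meta+ and +loc-Fencing-avoid-pcmk-1+ and the node name +pcmk-1+ are hypothetical, and the device's instance attributes are omitted for brevity.
.Disabling a fencing resource and restricting it by location (illustrative)
====
[source,XML]
----
<primitive id="Fencing" class="stonith" type="fence_ipmilan">
  <meta_attributes id="Fencing-meta">
    <!-- Stopped disables the device; remove or set to Started to enable it -->
    <nvpair id="Fencing-target-role" name="target-role" value="Stopped"/>
  </meta_attributes>
  <!-- instance attributes for the device would go here -->
</primitive>
<!-- Prevent node pcmk-1 from using this fencing resource -->
<rsc_location id="loc-Fencing-avoid-pcmk-1" rsc="Fencing" node="pcmk-1" score="-INFINITY"/>
----
====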
[IMPORTANT]
===========
Currently there is a limitation that fencing resources may only have
one set of meta-attributes and one set of instance attributes. This
can be revisited if it becomes a significant limitation for people.
===========
See the table below or run `man stonithd` to see special instance attributes
that may be set for any fencing resource, regardless of fence agent.
.Additional Properties of Fencing Resources
[width="95%",cols="5m,2,3,10<a",options="header",align="center"]
|=========================================================
|Field
|Type
|Default
|Description
|stonith-timeout
|NA
|NA
|Older versions used this to override the default period to wait for a STONITH (reboot, on, off) action to complete for this device.
It has been replaced by the +pcmk_reboot_timeout+ and +pcmk_off_timeout+ properties.
indexterm:[stonith-timeout,Fencing]
indexterm:[Fencing,Property,stonith-timeout]
////
priority
integer
0
The priority of the STONITH resource. Devices are tried in order of highest priority to lowest.
indexterm:[priority,Fencing]
indexterm:[Fencing,Property,priority]
////
|provides
|string
|
|Any special capability provided by the fence device. Currently, only one such
capability is meaningful: +unfencing+ (see <<s-unfencing>>).
indexterm:[provides,Fencing]
indexterm:[Fencing,Property,provides]
|pcmk_host_map
|string
|
|A mapping of host names to port numbers for devices that do not support host names.
Example: +node1:1;node2:2,3+ tells the cluster to use port 1 for
*node1* and ports 2 and 3 for *node2*.
indexterm:[pcmk_host_map,Fencing]
indexterm:[Fencing,Property,pcmk_host_map]
|pcmk_host_list
|string
|
|A list of machines controlled by this device (optional unless
+pcmk_host_check+ is +static-list+).
indexterm:[pcmk_host_list,Fencing]
indexterm:[Fencing,Property,pcmk_host_list]
|pcmk_host_check
|string
|dynamic-list
|How to determine which machines are controlled by the device.
Allowed values:
* +dynamic-list:+ query the device
* +static-list:+ check the +pcmk_host_list+ attribute
* +none:+ assume every device can fence every machine
indexterm:[pcmk_host_check,Fencing]
indexterm:[Fencing,Property,pcmk_host_check]
|pcmk_delay_max
|time
|0s
|Enable a random delay of up to the time specified before executing stonith
actions. This is sometimes used in two-node clusters to ensure that the
nodes don't fence each other at the same time. The overall delay introduced
by Pacemaker is this random delay plus any static delay (see
+pcmk_delay_base+), with the random part chosen so that the sum stays below
the maximum delay.
indexterm:[pcmk_delay_max,Fencing]
indexterm:[Fencing,Property,pcmk_delay_max]
|pcmk_delay_base
|time
|0s
|Enable a static delay before executing stonith actions. This can be used,
for example, in two-node clusters to ensure that the nodes don't fence each
other, by having separate fencing resources with different values. The node
that is fenced with the shorter delay will lose a fencing race. The overall
delay introduced by Pacemaker is this value plus a random delay (see
+pcmk_delay_max+), such that the sum is kept below the maximum delay.
indexterm:[pcmk_delay_base,Fencing]
indexterm:[Fencing,Property,pcmk_delay_base]
|pcmk_action_limit
|integer
|1
|The maximum number of actions that can be performed in parallel on this
device, if the cluster option +concurrent-fencing+ is +true+. -1 is unlimited.
indexterm:[pcmk_action_limit,Fencing]
indexterm:[Fencing,Property,pcmk_action_limit]
|pcmk_host_argument
|string
|port
|'Advanced use only.' Which parameter should be supplied to the resource agent
to identify the node to be fenced. Some devices do not support the standard
+port+ parameter or may provide additional ones. Use this to specify an
alternate, device-specific parameter. A value of +none+ tells the
cluster not to supply any additional parameters.
indexterm:[pcmk_host_argument,Fencing]
indexterm:[Fencing,Property,pcmk_host_argument]
|pcmk_reboot_action
|string
|reboot
|'Advanced use only.' The command to send to the resource agent in order to
reboot a node. Some devices do not support the standard commands or may provide
additional ones. Use this to specify an alternate, device-specific command.
indexterm:[pcmk_reboot_action,Fencing]
indexterm:[Fencing,Property,pcmk_reboot_action]
|pcmk_reboot_timeout
|time
|60s
|'Advanced use only.' Specify an alternate timeout to use for `reboot` actions
instead of the value of +stonith-timeout+. Some devices need much more or less
time to complete than normal. Use this to specify an alternate, device-specific
timeout.
indexterm:[pcmk_reboot_timeout,Fencing]
indexterm:[Fencing,Property,pcmk_reboot_timeout]
indexterm:[stonith-timeout,Fencing]
indexterm:[Fencing,Property,stonith-timeout]
|pcmk_reboot_retries
|integer
|2
|'Advanced use only.' The maximum number of times to retry the `reboot` command
within the timeout period. Some devices do not support multiple connections, and
operations may fail if the device is busy with another task, so Pacemaker will
automatically retry the operation, if there is time remaining. Use this option
to alter the number of times Pacemaker retries before giving up.
indexterm:[pcmk_reboot_retries,Fencing]
indexterm:[Fencing,Property,pcmk_reboot_retries]
|pcmk_off_action
|string
|off
|'Advanced use only.' The command to send to the resource agent in order to
shut down a node. Some devices do not support the standard commands or may provide
additional ones. Use this to specify an alternate, device-specific command.
indexterm:[pcmk_off_action,Fencing]
indexterm:[Fencing,Property,pcmk_off_action]
|pcmk_off_timeout
|time
|60s
|'Advanced use only.' Specify an alternate timeout to use for `off` actions
instead of the value of +stonith-timeout+. Some devices need much more or less
time to complete than normal. Use this to specify an alternate, device-specific
timeout.
indexterm:[pcmk_off_timeout,Fencing]
indexterm:[Fencing,Property,pcmk_off_timeout]
indexterm:[stonith-timeout,Fencing]
indexterm:[Fencing,Property,stonith-timeout]
|pcmk_off_retries
|integer
|2
|'Advanced use only.' The maximum number of times to retry the `off` command
within the timeout period. Some devices do not support multiple connections, and
operations may fail if the device is busy with another task, so Pacemaker will
automatically retry the operation, if there is time remaining. Use this option
to alter the number of times Pacemaker retries before giving up.
indexterm:[pcmk_off_retries,Fencing]
indexterm:[Fencing,Property,pcmk_off_retries]
|pcmk_list_action
|string
|list
|'Advanced use only.' The command to send to the resource agent in order to
list nodes. Some devices do not support the standard commands or may provide
additional ones. Use this to specify an alternate, device-specific command.
indexterm:[pcmk_list_action,Fencing]
indexterm:[Fencing,Property,pcmk_list_action]
|pcmk_list_timeout
|time
|60s
|'Advanced use only.' Specify an alternate timeout to use for `list` actions
instead of the value of +stonith-timeout+. Some devices need much more or less
time to complete than normal. Use this to specify an alternate, device-specific
timeout.
indexterm:[pcmk_list_timeout,Fencing]
indexterm:[Fencing,Property,pcmk_list_timeout]
|pcmk_list_retries
|integer
|2
|'Advanced use only.' The maximum number of times to retry the `list` command
within the timeout period. Some devices do not support multiple connections, and
operations may fail if the device is busy with another task, so Pacemaker will
automatically retry the operation, if there is time remaining. Use this option
to alter the number of times Pacemaker retries before giving up.
indexterm:[pcmk_list_retries,Fencing]
indexterm:[Fencing,Property,pcmk_list_retries]
|pcmk_monitor_action
|string
|monitor
|'Advanced use only.' The command to send to the resource agent in order to
report extended status. Some devices do not support the standard commands or may provide
additional ones. Use this to specify an alternate, device-specific command.
indexterm:[pcmk_monitor_action,Fencing]
indexterm:[Fencing,Property,pcmk_monitor_action]
|pcmk_monitor_timeout
|time
|60s
|'Advanced use only.' Specify an alternate timeout to use for `monitor` actions
instead of the value of +stonith-timeout+. Some devices need much more or less
time to complete than normal. Use this to specify an alternate, device-specific
timeout.
indexterm:[pcmk_monitor_timeout,Fencing]
indexterm:[Fencing,Property,pcmk_monitor_timeout]
|pcmk_monitor_retries
|integer
|2
|'Advanced use only.' The maximum number of times to retry the `monitor` command
within the timeout period. Some devices do not support multiple connections, and
operations may fail if the device is busy with another task, so Pacemaker will
automatically retry the operation, if there is time remaining. Use this option
to alter the number of times Pacemaker retries before giving up.
indexterm:[pcmk_monitor_retries,Fencing]
indexterm:[Fencing,Property,pcmk_monitor_retries]
|pcmk_status_action
|string
|status
|'Advanced use only.' The command to send to the resource agent in order to
report status. Some devices do not support the standard commands or may provide
additional ones. Use this to specify an alternate, device-specific command.
indexterm:[pcmk_status_action,Fencing]
indexterm:[Fencing,Property,pcmk_status_action]
|pcmk_status_timeout
|time
|60s
|'Advanced use only.' Specify an alternate timeout to use for `status` actions
instead of the value of +stonith-timeout+. Some devices need much more or less
time to complete than normal. Use this to specify an alternate, device-specific
timeout.
indexterm:[pcmk_status_timeout,Fencing]
indexterm:[Fencing,Property,pcmk_status_timeout]
|pcmk_status_retries
|integer
|2
|'Advanced use only.' The maximum number of times to retry the `status` command
within the timeout period. Some devices do not support multiple connections, and
operations may fail if the device is busy with another task, so Pacemaker will
automatically retry the operation, if there is time remaining. Use this option
to alter the number of times Pacemaker retries before giving up.
indexterm:[pcmk_status_retries,Fencing]
indexterm:[Fencing,Property,pcmk_status_retries]
|=========================================================
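The following fragments sketch how a few of these properties might be combined. They are illustrations under stated assumptions rather than a recommended configuration: the resource ids, node names (+node1+, +node2+), the address 192.0.2.11 and the 5s delay are hypothetical, and other device parameters (such as credentials) are left out.
.Illustrative use of pcmk_host_map, pcmk_host_check and pcmk_delay_base
====
[source,XML]
----
<!-- A power switch that identifies outlets by port number, not host name -->
<primitive id="fence-pdu" class="stonith" type="fence_apc_snmp">
  <instance_attributes id="fence-pdu-params">
    <nvpair id="fence-pdu-ipaddr" name="ipaddr" value="198.51.100.1"/>
    <!-- port 1 for node1; ports 2 and 3 for node2 -->
    <nvpair id="fence-pdu-pcmk_host_map" name="pcmk_host_map" value="node1:1;node2:2,3"/>
  </instance_attributes>
</primitive>
<!-- A per-node device with a static host list and a static delay.
     Fencing of node1 waits 5s, so node1 would tend to win a fencing
     race against a node whose device has no delay. -->
<primitive id="fence-node1" class="stonith" type="fence_ipmilan">
  <instance_attributes id="fence-node1-params">
    <nvpair id="fence-node1-ipaddr" name="ipaddr" value="192.0.2.11"/>
    <nvpair id="fence-node1-pcmk_host_check" name="pcmk_host_check" value="static-list"/>
    <nvpair id="fence-node1-pcmk_host_list" name="pcmk_host_list" value="node1"/>
    <nvpair id="fence-node1-pcmk_delay_base" name="pcmk_delay_base" value="5s"/>
  </instance_attributes>
</primitive>
----
====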
[[s-unfencing]]
== Unfencing ==
Most fence devices cut the power to the target. By contrast, fence devices that
perform 'fabric fencing' cut off a node's access to some critical resource,
such as a shared disk or a network switch.
With fabric fencing, it is expected that the cluster will fence the node, and
then a system administrator must manually investigate what went wrong, correct
any issues found, then reboot (or restart the cluster services on) the node.
Once the node reboots and rejoins the cluster, some fabric fencing devices
require an explicit command to restore the node's access to the critical
resource. This capability is called 'unfencing' and is typically implemented
as the fence agent's +on+ command.
If any cluster resource has +requires+ set to +unfencing+, then that resource
will not be probed or started on a node until that node has been unfenced.
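A rough sketch of how this might look in the CIB follows. It assumes that +provides+ is set as a meta-attribute of the fence device (as some higher-level tools do) and uses the +fence_scsi+ agent purely as an example of fabric fencing; the ids, node names and the +shared-fs+ resource are hypothetical, and agent-specific parameters are omitted.
.A fence device that provides unfencing, and a resource that requires it (illustrative)
====
[source,XML]
----
<primitive id="fence-scsi" class="stonith" type="fence_scsi">
  <meta_attributes id="fence-scsi-meta">
    <!-- Advertise that this device can unfence nodes -->
    <nvpair id="fence-scsi-provides" name="provides" value="unfencing"/>
  </meta_attributes>
  <instance_attributes id="fence-scsi-params">
    <nvpair id="fence-scsi-pcmk_host_list" name="pcmk_host_list" value="pcmk-1 pcmk-2"/>
    <!-- device-specific parameters would go here -->
  </instance_attributes>
</primitive>
<primitive id="shared-fs" class="ocf" provider="heartbeat" type="Filesystem">
  <meta_attributes id="shared-fs-meta">
    <!-- Do not probe or start on a node until that node has been unfenced -->
    <nvpair id="shared-fs-requires" name="requires" value="unfencing"/>
  </meta_attributes>
  <!-- agent parameters would go here -->
</primitive>
----
====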
== Configuring STONITH ==
[NOTE]
===========
Higher-level configuration shells include functionality to simplify the
process below, particularly the step for deciding which parameters are
required. However, since this document deals only with core
components, you should refer to the STONITH chapter of the
http://www.clusterlabs.org/doc/[Clusters from Scratch] guide for those details.
===========
. Find the correct driver:
+
----
# stonith_admin --list-installed
----
. Find the required parameters associated with the device
(replacing $AGENT_NAME with the name obtained from the previous step):
+
----
# stonith_admin --metadata --agent $AGENT_NAME
----
. Create a file called +stonith.xml+ containing a primitive resource
with a class of +stonith+, a type equal to the agent name obtained earlier,
and a parameter for each of the values returned in the previous step.
. If the device does not know how to fence nodes based on their uname,
you may also need to set the special +pcmk_host_map+ parameter. See
`man stonithd` for details.
. If the device does not support the `list` command, you may also need
to set the special +pcmk_host_list+ and/or +pcmk_host_check+
parameters. See `man stonithd` for details.
. If the device does not expect the victim to be specified with the
`port` parameter, you may also need to set the special
+pcmk_host_argument+ parameter. See `man stonithd` for details.
. Upload it into the CIB using cibadmin:
+
----
# cibadmin -C -o resources --xml-file stonith.xml
----
. Set +stonith-enabled+ to true:
+
----
# crm_attribute -t crm_config -n stonith-enabled -v true
----
. Once the stonith resource is running, you can test it by executing the
following (although you might want to stop the cluster on that machine
first):
+
----
# stonith_admin --reboot nodename
----
=== Example STONITH Configuration ===
Assume we have a chassis containing four nodes and an IPMI device
active on 192.0.2.1. We would choose the `fence_ipmilan` driver,
and obtain the following list of parameters:
.Obtaining a list of STONITH Parameters
====
----
# stonith_admin --metadata -a fence_ipmilan
----
[source,XML]
----
<resource-agent name="fence_ipmilan" shortdesc="Fence agent for IPMI over LAN">
<symlink name="fence_ilo3" shortdesc="Fence agent for HP iLO3"/>
<symlink name="fence_ilo4" shortdesc="Fence agent for HP iLO4"/>
<symlink name="fence_idrac" shortdesc="Fence agent for Dell iDRAC"/>
<symlink name="fence_imm" shortdesc="Fence agent for IBM Integrated Management Module"/>
<longdesc>
</longdesc>
<vendor-url>
</vendor-url>
<parameters>
<parameter name="auth" unique="0" required="0">
<getopt mixed="-A"/>
<content type="string"/>
<shortdesc lang="en">
</shortdesc>
</parameter>
<parameter name="ipaddr" unique="0" required="1">
<getopt mixed="-a"/>
<content type="string"/>
<shortdesc lang="en">
</shortdesc>
</parameter>
<parameter name="passwd" unique="0" required="0">
<getopt mixed="-p"/>
<content type="string"/>
<shortdesc lang="en">
</shortdesc>
</parameter>
<parameter name="passwd_script" unique="0" required="0">
<getopt mixed="-S"/>
<content type="string"/>
<shortdesc lang="en">
</shortdesc>
</parameter>
<parameter name="lanplus" unique="0" required="0">
<getopt mixed="-P"/>
<content type="boolean"/>
<shortdesc lang="en">
</shortdesc>
</parameter>
<parameter name="login" unique="0" required="0">
<getopt mixed="-l"/>
<content type="string"/>
<shortdesc lang="en">
</shortdesc>
</parameter>
<parameter name="action" unique="0" required="0">
<getopt mixed="-o"/>
<content type="string" default="reboot"/>
<shortdesc lang="en">
</shortdesc>
</parameter>
<parameter name="timeout" unique="0" required="0">
<getopt mixed="-t"/>
<content type="string"/>
<shortdesc lang="en">
</shortdesc>
</parameter>
<parameter name="cipher" unique="0" required="0">
<getopt mixed="-C"/>
<content type="string"/>
<shortdesc lang="en">
</shortdesc>
</parameter>
<parameter name="method" unique="0" required="0">
<getopt mixed="-M"/>
<content type="string" default="onoff"/>
<shortdesc lang="en">
</shortdesc>
</parameter>
<parameter name="power_wait" unique="0" required="0">
<getopt mixed="-T"/>
<content type="string" default="2"/>
<shortdesc lang="en">
</shortdesc>
</parameter>
<parameter name="delay" unique="0" required="0">
<getopt mixed="-f"/>
<content type="string"/>
<shortdesc lang="en">
</shortdesc>
</parameter>
<parameter name="privlvl" unique="0" required="0">
<getopt mixed="-L"/>
<content type="string"/>
<shortdesc lang="en">
</shortdesc>
</parameter>
<parameter name="verbose" unique="0" required="0">
<getopt mixed="-v"/>
<content type="boolean"/>
<shortdesc lang="en">
</shortdesc>
</parameter>
</parameters>
<actions>
<action name="on"/>
<action name="off"/>
<action name="reboot"/>
<action name="status"/>
<action name="diag"/>
<action name="list"/>
<action name="monitor"/>
<action name="metadata"/>
<action name="stop" timeout="20s"/>
<action name="start" timeout="20s"/>
</actions>
</resource-agent>
----
====
Based on that, we would create a STONITH resource fragment that might look
like this:
.An IPMI-based STONITH Resource
====
[source,XML]
----
<primitive id="Fencing" class="stonith" type="fence_ipmilan" >
<instance_attributes id="Fencing-params" >
<nvpair id="Fencing-passwd" name="passwd" value="abc123" />
<nvpair id="Fencing-login" name="login" value="testuser" />
<nvpair id="Fencing-ipaddr" name="ipaddr" value="192.0.2.1" />
<nvpair id="Fencing-pcmk_host_list" name="pcmk_host_list" value="pcmk-1 pcmk-2" />
</instance_attributes>
<operations >
<op id="Fencing-monitor-10m" interval="10m" name="monitor" timeout="300s" />
</operations>
</primitive>
----
====
Finally, we need to enable STONITH:
----
# crm_attribute -t crm_config -n stonith-enabled -v true
----
== Advanced STONITH Configurations ==
Some people consider that having one fencing device is a single point
of failure footnote:[Not true, since a node or resource must fail
before fencing even has a chance to]; others prefer removing the node
from the storage and network instead of turning it off.
Whatever the reason, Pacemaker supports fencing nodes with multiple
devices through a feature called 'fencing topologies'.
Simply create the individual devices as you normally would, then
define one or more +fencing-level+ entries in the +fencing-topology+ section of
the configuration.
* Each fencing level is attempted in order of ascending +index+. Allowed
values are 1 through 9.
* If a device fails, processing terminates for the current level.
No further devices in that level are exercised, and the next level is attempted instead.
* If the operation succeeds for all the listed devices in a level, the level is deemed to have passed.
* The operation is finished when a level has passed (success), or all levels have been attempted (failed).
* If the operation failed, the next step is determined by the Policy Engine and/or `crmd`.
Some possible uses of topologies include:
* Try poison-pill and fall back to power
* Try disk and network, and fall back to power if either fails
* Initiate a kdump and then power off the node
.Properties of Fencing Levels
[width="95%",cols="1m,3<",options="header",align="center"]
|=========================================================
|Field
|Description
|id
|A unique name for the level
indexterm:[id,fencing-level]
indexterm:[Fencing,fencing-level,id]
|target
|The name of a single node to which this level applies
indexterm:[target,fencing-level]
indexterm:[Fencing,fencing-level,target]
|target-pattern
-|A regular expression matching the names of nodes to which this level applies
+|An extended regular expression (as defined in
+ http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04[POSIX])
+ matching the names of nodes to which this level applies
indexterm:[target-pattern,fencing-level]
indexterm:[Fencing,fencing-level,target-pattern]
|target-attribute
|The name of a node attribute that is set (to +target-value+) for nodes to
which this level applies
indexterm:[target-attribute,fencing-level]
indexterm:[Fencing,fencing-level,target-attribute]
|target-value
|The node attribute value (of +target-attribute+) that is set for nodes to
which this level applies
indexterm:[target-value,fencing-level]
indexterm:[Fencing,fencing-level,target-value]
|index
|The order in which to attempt the levels.
Levels are attempted in ascending order 'until one succeeds'.
Valid values are 1 through 9.
indexterm:[index,fencing-level]
indexterm:[Fencing,fencing-level,index]
|devices
|A comma-separated list of devices that must all be tried for this level
indexterm:[devices,fencing-level]
indexterm:[Fencing,fencing-level,devices]
|=========================================================
.Fencing topology with different devices for different nodes
====
[source,XML]
----
<cib crm_feature_set="3.0.6" validate-with="pacemaker-1.2" admin_epoch="1" epoch="0" num_updates="0">
<configuration>
...
<fencing-topology>
<!-- For pcmk-1, try poison-pill and fall back to power -->
<fencing-level id="f-p1.1" target="pcmk-1" index="1" devices="poison-pill"/>
<fencing-level id="f-p1.2" target="pcmk-1" index="2" devices="power"/>
<!-- For pcmk-2, try disk and network, and fall back to power -->
<fencing-level id="f-p2.1" target="pcmk-2" index="1" devices="disk,network"/>
<fencing-level id="f-p2.2" target="pcmk-2" index="2" devices="power"/>
</fencing-topology>
...
</configuration>
<status/>
</cib>
----
====
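Levels do not have to name a single node. As a sketch of the other matching fields described in the table above, the following fragment selects nodes by name pattern and by node attribute. The device names (+ipmi+, +pdu-a+, +pdu-b+, +rack1-pdu+), the level ids, the +^prod-mysql+ pattern and the +rack+ node attribute are all hypothetical; the two approaches are shown side by side only for comparison.
.Fencing levels selected by pattern and by node attribute (illustrative)
====
[source,XML]
----
<fencing-topology>
  <!-- Nodes whose names begin with "prod-mysql": try IPMI, then both PDUs -->
  <fencing-level id="fl-pattern.1" target-pattern="^prod-mysql" index="1" devices="ipmi"/>
  <fencing-level id="fl-pattern.2" target-pattern="^prod-mysql" index="2" devices="pdu-a,pdu-b"/>
  <!-- Nodes that have the node attribute rack=1: use that rack's PDU -->
  <fencing-level id="fl-rack1.1" target-attribute="rack" target-value="1" index="1" devices="rack1-pdu"/>
</fencing-topology>
----
====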
=== Example Dual-Layer, Dual-Device Fencing Topologies ===
The following example illustrates an advanced use of +fencing-topology+ in a cluster with the following properties:
* 3 nodes (2 active prod-mysql nodes, 1 prod-mysql-rep in standby for quorum purposes)
* the active nodes have an IPMI-controlled power board reached at 192.0.2.1 and 192.0.2.2
* the active nodes also have two independent PSUs (Power Supply Units)
connected to two independent PDUs (Power Distribution Units) reached at
198.51.100.1 (port 10 and port 11) and 203.0.113.1 (port 10 and port 11)
* the first fencing method uses the `fence_ipmilan` agent
* the second fencing method uses the `fence_apc_snmp` agent targeting 2 fencing devices (one per PSU, either port 10 or 11)
* fencing is only implemented for the active nodes and has location constraints
* fencing topology is set to try IPMI fencing first then fall back to a "sure-kill" dual PDU fencing
In a normal failure scenario, STONITH will first select +fence_ipmilan+ to try to kill the faulty node.
Using a fencing topology, if that first method fails, STONITH will then move on to selecting +fence_apc_snmp+ twice:
* once for the first PDU
* again for the second PDU
The fence action is considered successful only if both PDUs report the required status. If either of them fails, STONITH loops back to the first fencing method, +fence_ipmilan+, and so on until the node is fenced or the fencing action is cancelled.
.First fencing method: single IPMI device
Each cluster node has its own dedicated IPMI channel that can be called for fencing using the following primitives:
[source,XML]
----
<primitive class="stonith" id="fence_prod-mysql1_ipmi" type="fence_ipmilan">
<instance_attributes id="fence_prod-mysql1_ipmi-instance_attributes">
<nvpair id="fence_prod-mysql1_ipmi-instance_attributes-ipaddr" name="ipaddr" value="192.0.2.1"/>
<nvpair id="fence_prod-mysql1_ipmi-instance_attributes-action" name="action" value="off"/>
<nvpair id="fence_prod-mysql1_ipmi-instance_attributes-login" name="login" value="fencing"/>
<nvpair id="fence_prod-mysql1_ipmi-instance_attributes-passwd" name="passwd" value="finishme"/>
<nvpair id="fence_prod-mysql1_ipmi-instance_attributes-verbose" name="verbose" value="true"/>
<nvpair id="fence_prod-mysql1_ipmi-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="prod-mysql1"/>
<nvpair id="fence_prod-mysql1_ipmi-instance_attributes-lanplus" name="lanplus" value="true"/>
</instance_attributes>
</primitive>
<primitive class="stonith" id="fence_prod-mysql2_ipmi" type="fence_ipmilan">
<instance_attributes id="fence_prod-mysql2_ipmi-instance_attributes">
<nvpair id="fence_prod-mysql2_ipmi-instance_attributes-ipaddr" name="ipaddr" value="192.0.2.2"/>
<nvpair id="fence_prod-mysql2_ipmi-instance_attributes-action" name="action" value="off"/>
<nvpair id="fence_prod-mysql2_ipmi-instance_attributes-login" name="login" value="fencing"/>
<nvpair id="fence_prod-mysql2_ipmi-instance_attributes-passwd" name="passwd" value="finishme"/>
<nvpair id="fence_prod-mysql2_ipmi-instance_attributes-verbose" name="verbose" value="true"/>
<nvpair id="fence_prod-mysql2_ipmi-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="prod-mysql2"/>
<nvpair id="fence_prod-mysql2_ipmi-instance_attributes-lanplus" name="lanplus" value="true"/>
</instance_attributes>
</primitive>
----
.Second fencing method: dual PDU devices
Each cluster node also has two distinct power channels controlled by two
distinct PDUs. That means a total of 4 fencing devices configured as follows:
- Node 1, PDU 1, PSU 1 @ port 10
- Node 1, PDU 2, PSU 2 @ port 10
- Node 2, PDU 1, PSU 1 @ port 11
- Node 2, PDU 2, PSU 2 @ port 11
The matching fencing agents are configured as follows:
[source,XML]
----
<primitive class="stonith" id="fence_prod-mysql1_apc1" type="fence_apc_snmp">
<instance_attributes id="fence_prod-mysql1_apc1-instance_attributes">
<nvpair id="fence_prod-mysql1_apc1-instance_attributes-ipaddr" name="ipaddr" value="198.51.100.1"/>
<nvpair id="fence_prod-mysql1_apc1-instance_attributes-action" name="action" value="off"/>
<nvpair id="fence_prod-mysql1_apc1-instance_attributes-port" name="port" value="10"/>
<nvpair id="fence_prod-mysql1_apc1-instance_attributes-login" name="login" value="fencing"/>
<nvpair id="fence_prod-mysql1_apc1-instance_attributes-passwd" name="passwd" value="fencing"/>
<nvpair id="fence_prod-mysql1_apc1-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="prod-mysql1"/>
</instance_attributes>
</primitive>
<primitive class="stonith" id="fence_prod-mysql1_apc2" type="fence_apc_snmp">
<instance_attributes id="fence_prod-mysql1_apc2-instance_attributes">
<nvpair id="fence_prod-mysql1_apc2-instance_attributes-ipaddr" name="ipaddr" value="203.0.113.1"/>
<nvpair id="fence_prod-mysql1_apc2-instance_attributes-action" name="action" value="off"/>
<nvpair id="fence_prod-mysql1_apc2-instance_attributes-port" name="port" value="10"/>
<nvpair id="fence_prod-mysql1_apc2-instance_attributes-login" name="login" value="fencing"/>
<nvpair id="fence_prod-mysql1_apc2-instance_attributes-passwd" name="passwd" value="fencing"/>
<nvpair id="fence_prod-mysql1_apc2-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="prod-mysql1"/>
</instance_attributes>
</primitive>
<primitive class="stonith" id="fence_prod-mysql2_apc1" type="fence_apc_snmp">
<instance_attributes id="fence_prod-mysql2_apc1-instance_attributes">
<nvpair id="fence_prod-mysql2_apc1-instance_attributes-ipaddr" name="ipaddr" value="198.51.100.1"/>
<nvpair id="fence_prod-mysql2_apc1-instance_attributes-action" name="action" value="off"/>
<nvpair id="fence_prod-mysql2_apc1-instance_attributes-port" name="port" value="11"/>
<nvpair id="fence_prod-mysql2_apc1-instance_attributes-login" name="login" value="fencing"/>
<nvpair id="fence_prod-mysql2_apc1-instance_attributes-passwd" name="passwd" value="fencing"/>
<nvpair id="fence_prod-mysql2_apc1-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="prod-mysql2"/>
</instance_attributes>
</primitive>
<primitive class="stonith" id="fence_prod-mysql2_apc2" type="fence_apc_snmp">
<instance_attributes id="fence_prod-mysql2_apc2-instance_attributes">
<nvpair id="fence_prod-mysql2_apc2-instance_attributes-ipaddr" name="ipaddr" value="203.0.113.1"/>
<nvpair id="fence_prod-mysql2_apc2-instance_attributes-action" name="action" value="off"/>
<nvpair id="fence_prod-mysql2_apc2-instance_attributes-port" name="port" value="11"/>
<nvpair id="fence_prod-mysql2_apc2-instance_attributes-login" name="login" value="fencing"/>
<nvpair id="fence_prod-mysql2_apc2-instance_attributes-passwd" name="passwd" value="fencing"/>
<nvpair id="fence_prod-mysql2_apc2-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="prod-mysql2"/>
</instance_attributes>
</primitive>
----
.Location Constraints
To prevent STONITH from trying to run a fencing agent on the same node it is
supposed to fence, constraints are placed on all the fencing primitives:
[source,XML]
----
<constraints>
<rsc_location id="l_fence_prod-mysql1_ipmi" node="prod-mysql1" rsc="fence_prod-mysql1_ipmi" score="-INFINITY"/>
<rsc_location id="l_fence_prod-mysql2_ipmi" node="prod-mysql2" rsc="fence_prod-mysql2_ipmi" score="-INFINITY"/>
<rsc_location id="l_fence_prod-mysql1_apc2" node="prod-mysql1" rsc="fence_prod-mysql1_apc2" score="-INFINITY"/>
<rsc_location id="l_fence_prod-mysql1_apc1" node="prod-mysql1" rsc="fence_prod-mysql1_apc1" score="-INFINITY"/>
<rsc_location id="l_fence_prod-mysql2_apc1" node="prod-mysql2" rsc="fence_prod-mysql2_apc1" score="-INFINITY"/>
<rsc_location id="l_fence_prod-mysql2_apc2" node="prod-mysql2" rsc="fence_prod-mysql2_apc2" score="-INFINITY"/>
</constraints>
----
.Fencing topology
Now that all the fencing resources are defined, it's time to create the right topology.
We want to first fence using IPMI and if that does not work, fence both PDUs to effectively and surely kill the node.
[source,XML]
----
<fencing-topology>
<fencing-level devices="fence_prod-mysql1_ipmi" id="fencing-2" index="1" target="prod-mysql1"/>
<fencing-level devices="fence_prod-mysql1_apc1,fence_prod-mysql1_apc2" id="fencing-3" index="2" target="prod-mysql1"/>
<fencing-level devices="fence_prod-mysql2_ipmi" id="fencing-0" index="1" target="prod-mysql2"/>
<fencing-level devices="fence_prod-mysql2_apc1,fence_prod-mysql2_apc2" id="fencing-1" index="2" target="prod-mysql2"/>
</fencing-topology>
----
Note that in +fencing-topology+, levels are tried in ascending order of +index+, so the level with the lowest +index+ is the first fencing method attempted.
.Final configuration
Put together, the configuration looks like this:
[source,XML]
----
<cib admin_epoch="0" crm_feature_set="3.0.7" epoch="292" have-quorum="1" num_updates="29" validate-with="pacemaker-1.2">
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="true"/>
<nvpair id="cib-bootstrap-options-stonith-action" name="stonith-action" value="off"/>
<nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="3"/>
...
</cluster_property_set>
</crm_config>
<nodes>
<node id="prod-mysql1" uname="prod-mysql1"/>
<node id="prod-mysql2" uname="prod-mysql2"/>
<node id="prod-mysql-rep1" uname="prod-mysql-rep1">
<instance_attributes id="prod-mysql-rep1">
<nvpair id="prod-mysql-rep1-standby" name="standby" value="on"/>
</instance_attributes>
</node>
</nodes>
<resources>
<primitive class="stonith" id="fence_prod-mysql1_ipmi" type="fence_ipmilan">
<instance_attributes id="fence_prod-mysql1_ipmi-instance_attributes">
<nvpair id="fence_prod-mysql1_ipmi-instance_attributes-ipaddr" name="ipaddr" value="192.0.2.1"/>
<nvpair id="fence_prod-mysql1_ipmi-instance_attributes-action" name="action" value="off"/>
<nvpair id="fence_prod-mysql1_ipmi-instance_attributes-login" name="login" value="fencing"/>
<nvpair id="fence_prod-mysql1_ipmi-instance_attributes-passwd" name="passwd" value="finishme"/>
<nvpair id="fence_prod-mysql1_ipmi-instance_attributes-verbose" name="verbose" value="true"/>
<nvpair id="fence_prod-mysql1_ipmi-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="prod-mysql1"/>
<nvpair id="fence_prod-mysql1_ipmi-instance_attributes-lanplus" name="lanplus" value="true"/>
</instance_attributes>
</primitive>
<primitive class="stonith" id="fence_prod-mysql2_ipmi" type="fence_ipmilan">
<instance_attributes id="fence_prod-mysql2_ipmi-instance_attributes">
<nvpair id="fence_prod-mysql2_ipmi-instance_attributes-ipaddr" name="ipaddr" value="192.0.2.2"/>
<nvpair id="fence_prod-mysql2_ipmi-instance_attributes-action" name="action" value="off"/>
<nvpair id="fence_prod-mysql2_ipmi-instance_attributes-login" name="login" value="fencing"/>
<nvpair id="fence_prod-mysql2_ipmi-instance_attributes-passwd" name="passwd" value="finishme"/>
<nvpair id="fence_prod-mysql2_ipmi-instance_attributes-verbose" name="verbose" value="true"/>
<nvpair id="fence_prod-mysql2_ipmi-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="prod-mysql2"/>
<nvpair id="fence_prod-mysql2_ipmi-instance_attributes-lanplus" name="lanplus" value="true"/>
</instance_attributes>
</primitive>
<primitive class="stonith" id="fence_prod-mysql1_apc1" type="fence_apc_snmp">
<instance_attributes id="fence_prod-mysql1_apc1-instance_attributes">
<nvpair id="fence_prod-mysql1_apc1-instance_attributes-ipaddr" name="ipaddr" value="198.51.100.1"/>
<nvpair id="fence_prod-mysql1_apc1-instance_attributes-action" name="action" value="off"/>
<nvpair id="fence_prod-mysql1_apc1-instance_attributes-port" name="port" value="10"/>
<nvpair id="fence_prod-mysql1_apc1-instance_attributes-login" name="login" value="fencing"/>
<nvpair id="fence_prod-mysql1_apc1-instance_attributes-passwd" name="passwd" value="fencing"/>
<nvpair id="fence_prod-mysql1_apc1-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="prod-mysql1"/>
</instance_attributes>
</primitive>
<primitive class="stonith" id="fence_prod-mysql1_apc2" type="fence_apc_snmp">
<instance_attributes id="fence_prod-mysql1_apc2-instance_attributes">
<nvpair id="fence_prod-mysql1_apc2-instance_attributes-ipaddr" name="ipaddr" value="203.0.113.1"/>
<nvpair id="fence_prod-mysql1_apc2-instance_attributes-action" name="action" value="off"/>
<nvpair id="fence_prod-mysql1_apc2-instance_attributes-port" name="port" value="10"/>
<nvpair id="fence_prod-mysql1_apc2-instance_attributes-login" name="login" value="fencing"/>
<nvpair id="fence_prod-mysql1_apc2-instance_attributes-passwd" name="passwd" value="fencing"/>
<nvpair id="fence_prod-mysql1_apc2-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="prod-mysql1"/>
</instance_attributes>
</primitive>
<primitive class="stonith" id="fence_prod-mysql2_apc1" type="fence_apc_snmp">
<instance_attributes id="fence_prod-mysql2_apc1-instance_attributes">
<nvpair id="fence_prod-mysql2_apc1-instance_attributes-ipaddr" name="ipaddr" value="198.51.100.1"/>
<nvpair id="fence_prod-mysql2_apc1-instance_attributes-action" name="action" value="off"/>
<nvpair id="fence_prod-mysql2_apc1-instance_attributes-port" name="port" value="11"/>
<nvpair id="fence_prod-mysql2_apc1-instance_attributes-login" name="login" value="fencing"/>
<nvpair id="fence_prod-mysql2_apc1-instance_attributes-passwd" name="passwd" value="fencing"/>
<nvpair id="fence_prod-mysql2_apc1-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="prod-mysql2"/>
</instance_attributes>
</primitive>
<primitive class="stonith" id="fence_prod-mysql2_apc2" type="fence_apc_snmp">
<instance_attributes id="fence_prod-mysql2_apc2-instance_attributes">
<nvpair id="fence_prod-mysql2_apc2-instance_attributes-ipaddr" name="ipaddr" value="203.0.113.1"/>
<nvpair id="fence_prod-mysql2_apc2-instance_attributes-action" name="action" value="off"/>
<nvpair id="fence_prod-mysql2_apc2-instance_attributes-port" name="port" value="11"/>
<nvpair id="fence_prod-mysql2_apc2-instance_attributes-login" name="login" value="fencing"/>
<nvpair id="fence_prod-mysql2_apc2-instance_attributes-passwd" name="passwd" value="fencing"/>
<nvpair id="fence_prod-mysql2_apc2-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="prod-mysql2"/>
</instance_attributes>
</primitive>
</resources>
<constraints>
<rsc_location id="l_fence_prod-mysql1_ipmi" node="prod-mysql1" rsc="fence_prod-mysql1_ipmi" score="-INFINITY"/>
<rsc_location id="l_fence_prod-mysql2_ipmi" node="prod-mysql2" rsc="fence_prod-mysql2_ipmi" score="-INFINITY"/>
<rsc_location id="l_fence_prod-mysql1_apc2" node="prod-mysql1" rsc="fence_prod-mysql1_apc2" score="-INFINITY"/>
<rsc_location id="l_fence_prod-mysql1_apc1" node="prod-mysql1" rsc="fence_prod-mysql1_apc1" score="-INFINITY"/>
<rsc_location id="l_fence_prod-mysql2_apc1" node="prod-mysql2" rsc="fence_prod-mysql2_apc1" score="-INFINITY"/>
<rsc_location id="l_fence_prod-mysql2_apc2" node="prod-mysql2" rsc="fence_prod-mysql2_apc2" score="-INFINITY"/>
</constraints>
<fencing-topology>
<fencing-level devices="fence_prod-mysql1_ipmi" id="fencing-2" index="1" target="prod-mysql1"/>
<fencing-level devices="fence_prod-mysql1_apc1,fence_prod-mysql1_apc2" id="fencing-3" index="2" target="prod-mysql1"/>
<fencing-level devices="fence_prod-mysql2_ipmi" id="fencing-0" index="1" target="prod-mysql2"/>
<fencing-level devices="fence_prod-mysql2_apc1,fence_prod-mysql2_apc2" id="fencing-1" index="2" target="prod-mysql2"/>
</fencing-topology>
...
</configuration>
</cib>
----
== Remapping Reboots ==
When the cluster needs to reboot a node, whether because +stonith-action+ is +reboot+ or because
a reboot was manually requested (such as by `stonith_admin --reboot`), it will remap that to
other commands in two cases:
. If the chosen fencing device does not support the +reboot+ command, the cluster
will ask it to perform +off+ instead.
. If a fencing topology level with multiple devices must be executed, the cluster
will ask all the devices to perform +off+, then ask the devices to perform +on+.
To understand the second case, consider the example of a node with redundant
power supplies connected to intelligent power switches. Rebooting one switch
and then the other would have no effect on the node. Turning both switches off,
and then on, actually reboots the node.
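As a sketch of the second case, consider the two-device level for +prod-mysql1+ from the final configuration shown earlier. The comments describe the sequence the cluster would follow if a reboot were requested at this level. That configuration sets +stonith-action+ to +off+, so a +reboot+ request is assumed here purely for illustration.

[source,XML]
----
<fencing-topology>
  <!-- This level lists two devices, so a reboot of prod-mysql1 is remapped:
       "off" is requested from both devices first, then "on" from both. -->
  <fencing-level devices="fence_prod-mysql1_apc1,fence_prod-mysql1_apc2" id="fencing-3" index="2" target="prod-mysql1"/>
</fencing-topology>
----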
In such a case, the fencing operation will be treated as successful as long as
the +off+ commands succeed, because then it is safe for the cluster to recover
any resources that were on the node. Timeouts and errors in the +on+ phase will
be logged but ignored.
When a reboot operation is remapped, any action-specific timeout for the
remapped action will be used (for example, +pcmk_off_timeout+ will be used when
executing the +off+ command, not +pcmk_reboot_timeout+).
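For instance, a fencing device could carry separate timeouts for the two actions. The fragment below is only a sketch: it reuses +fence_prod-mysql1_apc1+ from the earlier example, and the nvpair ids and timeout values are illustrative assumptions, not recommendations.

[source,XML]
----
<primitive class="stonith" id="fence_prod-mysql1_apc1" type="fence_apc_snmp">
  <instance_attributes id="fence_prod-mysql1_apc1-instance_attributes">
    <!-- Other device parameters omitted for brevity -->
    <!-- Applies when the device is asked to perform "reboot" directly -->
    <nvpair id="fence_prod-mysql1_apc1-instance_attributes-pcmk_reboot_timeout" name="pcmk_reboot_timeout" value="60s"/>
    <!-- Applies to "off", including when a reboot has been remapped to "off" -->
    <nvpair id="fence_prod-mysql1_apc1-instance_attributes-pcmk_off_timeout" name="pcmk_off_timeout" value="120s"/>
  </instance_attributes>
</primitive>
----

With values like these, a remapped reboot would allow up to 120 seconds for the +off+ command rather than the 60 seconds configured for +reboot+.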
