diff --git a/doc/Pacemaker_Explained/en-US/Book_Info.xml b/doc/Pacemaker_Explained/en-US/Book_Info.xml
index 56b9d9bfda..bce0089524 100644
--- a/doc/Pacemaker_Explained/en-US/Book_Info.xml
+++ b/doc/Pacemaker_Explained/en-US/Book_Info.xml
@@ -1,35 +1,35 @@
Configuration Explained
An A-Z guide to Pacemaker's Configuration Options
Pacemaker
1.1
- 5
+ 6
0
The purpose of this document is to definitively explain the concepts used to configure Pacemaker.
To achieve this, it will focus exclusively on the XML syntax used to configure Pacemaker's
Cluster Information Base (CIB).
diff --git a/doc/Pacemaker_Explained/en-US/Ch-Advanced-Resources.txt b/doc/Pacemaker_Explained/en-US/Ch-Advanced-Resources.txt
index 4060201d92..9d0bb7a43c 100644
--- a/doc/Pacemaker_Explained/en-US/Ch-Advanced-Resources.txt
+++ b/doc/Pacemaker_Explained/en-US/Ch-Advanced-Resources.txt
@@ -1,1045 +1,1052 @@
= Advanced Resource Types =
[[group-resources]]
== Groups - A Syntactic Shortcut ==
indexterm:[Group Resources]
indexterm:[Resources,Groups]
One of the most common elements of a cluster is a set of resources
that need to be located together, start sequentially, and stop in the
reverse order. To simplify this configuration, we support the concept
of groups.
.A group of two primitive resources
======
[source,XML]
-------
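<!-- A reconstructed sketch of the elided example; the resource classes,
     types, and IP address shown here are illustrative assumptions. -->
<group id="shortcut">
   <primitive id="Public-IP" class="ocf" type="IPaddr" provider="heartbeat">
      <instance_attributes id="params-public-ip">
         <nvpair id="public-ip-addr" name="ip" value="192.0.2.2"/>
      </instance_attributes>
   </primitive>
   <primitive id="Email" class="lsb" type="exim"/>
</group>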
-------
======
Although the example above contains only two resources, there is no
limit to the number of resources a group can contain. The example is
also sufficient to explain the fundamental properties of a group:
* Resources are started in the order they appear (+Public-IP+
first, then +Email+)
* Resources are stopped in the reverse of the order in which they appear
(+Email+ first, then +Public-IP+)
If a resource in the group can't run anywhere, then nothing listed after it
is allowed to run, either.
* If +Public-IP+ can't run anywhere, neither can +Email+;
* but if +Email+ can't run anywhere, this does not affect +Public-IP+
in any way
The group above is logically equivalent to writing:
.How the cluster sees a group resource
======
[source,XML]
-------
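<!-- A hedged sketch of the equivalent configuration: the same two primitives
     plus explicit colocation and ordering constraints. IDs are illustrative. -->
<configuration>
   <resources>
      <primitive id="Public-IP" class="ocf" type="IPaddr" provider="heartbeat">
         <instance_attributes id="params-public-ip">
            <nvpair id="public-ip-addr" name="ip" value="192.0.2.2"/>
         </instance_attributes>
      </primitive>
      <primitive id="Email" class="lsb" type="exim"/>
   </resources>
   <constraints>
      <rsc_colocation id="email-with-ip" rsc="Email" with-rsc="Public-IP" score="INFINITY"/>
      <rsc_order id="ip-then-email" first="Public-IP" then="Email"/>
   </constraints>
</configuration>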
-------
======
Obviously as the group grows bigger, the reduced configuration effort
can become significant.
Another (typical) example of a group is a DRBD volume, the filesystem
mount, an IP address, and an application that uses them.
=== Group Properties ===
.Properties of a Group Resource
[width="95%",cols="3m,5<",options="header",align="center"]
|=========================================================
|Field
|Description
|id
|A unique name for the group
indexterm:[id,Group Resource Property]
indexterm:[Resource,Group Property,id]
|=========================================================
=== Group Options ===
Groups inherit the +priority+, +target-role+, and +is-managed+ properties
from primitive resources. See <> for information about
those properties.
=== Group Instance Attributes ===
Groups have no instance attributes. However, any that are set for the group
object will be inherited by the group's children.
=== Group Contents ===
Groups may only contain a collection of cluster resources (see
<>). To refer to a child of a group resource, just use
the child's +id+ instead of the group's.
=== Group Constraints ===
Although it is possible to reference a group's children in
constraints, it is usually preferable to reference the group itself.
.Some constraints involving groups
======
[source,XML]
-------
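<!-- A hedged sketch; the group id ("shortcut") and the resource and node
     names are illustrative assumptions. -->
<constraints>
   <rsc_location id="group-prefers-node1" rsc="shortcut" node="node1" score="500"/>
   <rsc_colocation id="webserver-with-group" rsc="Webserver" with-rsc="shortcut"/>
   <rsc_order id="start-group-then-webserver" first="shortcut" then="Webserver"/>
</constraints>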
-------
======
=== Group Stickiness ===
indexterm:[resource-stickiness,Groups]
Stickiness, the measure of how much a resource wants to stay where it
is, is additive in groups. Every active resource of the group will
contribute its stickiness value to the group's total. So if the
default +resource-stickiness+ is 100, and a group has seven members,
five of which are active, then the group as a whole will prefer its
current location with a score of 500.
[[s-resource-clone]]
== Clones - Resources That Get Active on Multiple Hosts ==
indexterm:[Clone Resources]
indexterm:[Resources,Clones]
Clones were initially conceived as a convenient way to start multiple
instances of an IP address resource and have them distributed throughout the
cluster for load balancing. They have turned out to be quite useful for
a number of purposes including integrating with the Distributed Lock Manager
(used by many cluster filesystems), the fencing subsystem, and OCFS2.
You can clone any resource, provided the resource agent supports it.
Three types of cloned resources exist:
* Anonymous
* Globally unique
* Stateful
'Anonymous' clones are the simplest. These behave
completely identically everywhere they are running. Because of this,
there can be only one copy of an anonymous clone active per machine.
'Globally unique' clones are distinct entities. A copy of the clone
running on one machine is not equivalent to another instance on
another node, nor would any two copies on the same node be
equivalent.
'Stateful' clones are covered later in <>.
.A clone of an LSB resource
======
[source,XML]
-------
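<!-- A reconstructed sketch; the resource name and the meta-attribute shown
     are illustrative assumptions. -->
<clone id="apache-clone">
   <meta_attributes id="apache-clone-meta">
      <nvpair id="apache-unique" name="globally-unique" value="false"/>
   </meta_attributes>
   <primitive id="apache" class="lsb" type="apache"/>
</clone>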
-------
======
=== Clone Properties ===
.Properties of a Clone Resource
[width="95%",cols="3m,5<",options="header",align="center"]
|=========================================================
|Field
|Description
|id
|A unique name for the clone
indexterm:[id,Clone Property]
indexterm:[Clone,Property,id]
|=========================================================
=== Clone Options ===
Options inherited from <> resources:
+priority, target-role, is-managed+
.Clone-specific configuration options
[width="95%",cols="1m,1,3<",options="header",align="center"]
|=========================================================
|Field
|Default
|Description
|clone-max
|number of nodes in cluster
|How many copies of the resource to start
indexterm:[clone-max,Clone Option]
indexterm:[Clone,Option,clone-max]
|clone-node-max
|1
|How many copies of the resource can be started on a single node
indexterm:[clone-node-max,Clone Option]
indexterm:[Clone,Option,clone-node-max]
+|clone-min
+|1
+|Require at least this number of clone instances to be runnable before allowing
+resources depending on the clone to be runnable '(since 1.1.14)'
+ indexterm:[clone-min,Clone Option]
+ indexterm:[Clone,Option,clone-min]
+
|notify
|true
|When stopping or starting a copy of the clone, tell all the other
copies beforehand and again when the action was successful. Allowed values:
+false+, +true+
indexterm:[notify,Clone Option]
indexterm:[Clone,Option,notify]
|globally-unique
|false
|Does each copy of the clone perform a different function? Allowed
values: +false+, +true+
indexterm:[globally-unique,Clone Option]
indexterm:[Clone,Option,globally-unique]
|ordered
|false
|Should the copies be started in series (instead of in
parallel)? Allowed values: +false+, +true+
indexterm:[ordered,Clone Option]
indexterm:[Clone,Option,ordered]
|interleave
|false
|If this clone depends on another clone via an ordering constraint,
is it allowed to start after the local instance of the other clone
starts, rather than wait for all instances of the other clone to start?
Allowed values: +false+, +true+
indexterm:[interleave,Clone Option]
indexterm:[Clone,Option,interleave]
|=========================================================
=== Clone Instance Attributes ===
Clones have no instance attributes; however, any that are set here
will be inherited by the clone's children.
=== Clone Contents ===
Clones must contain exactly one primitive or group resource.
[WARNING]
You should never reference the name of a clone's child.
If you think you need to do this, you probably need to re-evaluate your design.
=== Clone Constraints ===
In most cases, a clone will have a single copy on each active cluster
node. If this is not the case, you can indicate which nodes the
cluster should preferentially assign copies to with resource location
constraints. These constraints are written no differently from those
for primitive resources except that the clone's +id+ is used.
.Some constraints involving clones
======
[source,XML]
-------
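<!-- A hedged sketch; resource and node names are illustrative assumptions,
     chosen to match the discussion that follows. -->
<constraints>
   <rsc_location id="clone-prefers-node1" rsc="apache-clone" node="node1" score="500"/>
   <rsc_colocation id="stats-with-clone" rsc="apache-stats" with-rsc="apache-clone"/>
   <rsc_order id="start-clone-then-stats" first="apache-clone" then="apache-stats"/>
</constraints>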
-------
======
Ordering constraints behave slightly differently for clones. In the
example above, +apache-stats+ will wait until all copies of +apache-clone+
that need to be started have done so before being started itself.
Only if _no_ copies can be started will +apache-stats+ be prevented
from being active. Additionally, the clone will wait for
+apache-stats+ to be stopped before stopping itself.
Colocation of a primitive or group resource with a clone means that
the resource can run on any machine with an active copy of the clone.
The cluster will choose a copy based on where the clone is running and
the resource's own location preferences.
Colocation between clones is also possible. If one clone +A+ is colocated
with another clone +B+, the set of allowed locations for +A+ is limited to
nodes on which +B+ is (or will be) active. Placement is then performed
normally.
[[s-clone-stickiness]]
=== Clone Stickiness ===
indexterm:[resource-stickiness,Clones]
To achieve a stable allocation pattern, clones are slightly sticky by
default. If no value for +resource-stickiness+ is provided, the clone
will use a value of 1. Being a small value, it causes minimal
disturbance to the score calculations of other resources but is enough
to prevent Pacemaker from needlessly moving copies around the cluster.
[NOTE]
====
For globally unique clones, this may result in multiple instances of the
clone staying on a single node, even after another eligible node becomes
active (for example, after being put into standby mode then made active again).
If you do not want this behavior, specify a +resource-stickiness+ of 0
for the clone temporarily and let the cluster adjust, then set it back
to 1 if you want the default behavior to apply again.
====
=== Clone Resource Agent Requirements ===
Any resource can be used as an anonymous clone, as it requires no
additional support from the resource agent. Whether it makes sense to
do so depends on your resource and its resource agent.
Globally unique clones do require some additional support in the
resource agent. In particular, it must only respond with
+$\{OCF_SUCCESS}+ if the node has that exact instance active. All
other probes for instances of the clone should result in
+$\{OCF_NOT_RUNNING}+ (or one of the other OCF error codes if
the instance has failed).
Individual instances of a clone are identified by appending a colon and a
numerical offset, e.g. +apache:2+.
Resource agents can find out how many copies there are by examining
the +OCF_RESKEY_CRM_meta_clone_max+ environment variable and which
copy it is by examining +OCF_RESKEY_CRM_meta_clone+.
The resource agent must not make any assumptions (based on
+OCF_RESKEY_CRM_meta_clone+) about which numerical instances are active. In
particular, the list of active copies will not always be an unbroken
sequence, nor always start at 0.
==== Clone Notifications ====
Supporting notifications requires the +notify+ action to be
implemented. If supported, the notify action will be passed a
number of extra variables which, when combined with additional
context, can be used to calculate the current state of the cluster and
what is about to happen to it.
.Environment variables supplied with Clone notify actions
[width="95%",cols="5,3<",options="header",align="center"]
|=========================================================
|Variable
|Description
|OCF_RESKEY_CRM_meta_notify_type
|Allowed values: +pre+, +post+
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,type]
indexterm:[type,Notification Environment Variable]
|OCF_RESKEY_CRM_meta_notify_operation
|Allowed values: +start+, +stop+
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,operation]
indexterm:[operation,Notification Environment Variable]
|OCF_RESKEY_CRM_meta_notify_start_resource
|Resources to be started
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,start_resource]
indexterm:[start_resource,Notification Environment Variable]
|OCF_RESKEY_CRM_meta_notify_stop_resource
|Resources to be stopped
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,stop_resource]
indexterm:[stop_resource,Notification Environment Variable]
|OCF_RESKEY_CRM_meta_notify_active_resource
|Resources that are running
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,active_resource]
indexterm:[active_resource,Notification Environment Variable]
|OCF_RESKEY_CRM_meta_notify_inactive_resource
|Resources that are not running
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,inactive_resource]
indexterm:[inactive_resource,Notification Environment Variable]
|OCF_RESKEY_CRM_meta_notify_start_uname
|Nodes on which resources will be started
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,start_uname]
indexterm:[start_uname,Notification Environment Variable]
|OCF_RESKEY_CRM_meta_notify_stop_uname
|Nodes on which resources will be stopped
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,stop_uname]
indexterm:[stop_uname,Notification Environment Variable]
|OCF_RESKEY_CRM_meta_notify_active_uname
|Nodes on which resources are running
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,active_uname]
indexterm:[active_uname,Notification Environment Variable]
|OCF_RESKEY_CRM_meta_notify_inactive_uname
|Nodes on which resources are not running
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,inactive_uname]
indexterm:[inactive_uname,Notification Environment Variable]
|=========================================================
The variables come in pairs, such as
+OCF_RESKEY_CRM_meta_notify_start_resource+ and
+OCF_RESKEY_CRM_meta_notify_start_uname+ and should be treated as an
array of whitespace-separated elements.
Thus, in order to indicate that +clone:0+ will be started on +sles-1+,
+clone:2+ will be started on +sles-3+, and +clone:3+ will be started
on +sles-2+, the cluster would set:
.Notification variables
======
[source,Bash]
-------
OCF_RESKEY_CRM_meta_notify_start_resource="clone:0 clone:2 clone:3"
OCF_RESKEY_CRM_meta_notify_start_uname="sles-1 sles-3 sles-2"
-------
======
==== Proper Interpretation of Notification Environment Variables ====
.Pre-notification (stop):
* Active resources: +$OCF_RESKEY_CRM_meta_notify_active_resource+
* Inactive resources: +$OCF_RESKEY_CRM_meta_notify_inactive_resource+
* Resources to be started: +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Resources to be stopped: +$OCF_RESKEY_CRM_meta_notify_stop_resource+
.Post-notification (stop) / Pre-notification (start):
* Active resources
** +$OCF_RESKEY_CRM_meta_notify_active_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_stop_resource+
* Inactive resources
** +$OCF_RESKEY_CRM_meta_notify_inactive_resource+
** plus +$OCF_RESKEY_CRM_meta_notify_stop_resource+
* Resources that were started: +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Resources that were stopped: +$OCF_RESKEY_CRM_meta_notify_stop_resource+
.Post-notification (start):
* Active resources:
** +$OCF_RESKEY_CRM_meta_notify_active_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_stop_resource+
** plus +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Inactive resources:
** +$OCF_RESKEY_CRM_meta_notify_inactive_resource+
** plus +$OCF_RESKEY_CRM_meta_notify_stop_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Resources that were started: +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Resources that were stopped: +$OCF_RESKEY_CRM_meta_notify_stop_resource+
[[s-resource-multistate]]
== Multi-state - Resources That Have Multiple Modes ==
indexterm:[Multi-state Resources]
indexterm:[Resources,Multi-state]
Multi-state resources are a specialization of clone resources; please
ensure you understand <> before continuing!
Multi-state resources allow the instances to be in one of two operating modes
(called 'roles'). The roles are called 'master' and 'slave', but can mean
whatever you wish them to mean. The only limitation is that when an instance is
started, it must come up in the slave role.
=== Multi-state Properties ===
.Properties of a Multi-State Resource
[width="95%",cols="3m,5<",options="header",align="center"]
|=========================================================
|Field
|Description
|id
|Your name for the multi-state resource
indexterm:[id,Multi-State Property]
indexterm:[Multi-State,Property,id]
|=========================================================
=== Multi-state Options ===
Options inherited from <> resources:
+priority+, +target-role+, +is-managed+
Options inherited from <> resources:
+clone-max+, +clone-node-max+, +notify+, +globally-unique+, +ordered+,
+interleave+
.Multi-state-specific resource configuration options
[width="95%",cols="1m,1,3<",options="header",align="center"]
|=========================================================
|Field
|Default
|Description
|master-max
|1
|How many copies of the resource can be promoted to the +master+ role
indexterm:[master-max,Multi-State Option]
indexterm:[Multi-State,Option,master-max]
|master-node-max
|1
|How many copies of the resource can be promoted to the +master+ role on
a single node
indexterm:[master-node-max,Multi-State Option]
indexterm:[Multi-State,Option,master-node-max]
|=========================================================
=== Multi-state Instance Attributes ===
Multi-state resources have no instance attributes; however, any that
are set here will be inherited by a master's children.
=== Multi-state Contents ===
Masters must contain exactly one primitive or group resource.
[WARNING]
You should never reference the name of a master's child.
If you think you need to do this, you probably need to re-evaluate your design.
=== Monitoring Multi-State Resources ===
The usual monitor actions are insufficient to monitor a multi-state resource,
because pacemaker needs to verify not only that the resource is active, but
also that its actual role matches its intended one.
Define two monitoring actions: the usual one will cover the slave role,
and an additional one with +role="master"+ will cover the master role.
.Monitoring both states of a multi-state resource
======
[source,XML]
-------
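<!-- A hedged sketch; the resource agent and operation IDs are illustrative.
     Note the two monitor operations with different intervals and roles. -->
<master id="myMasterRsc">
   <primitive id="myRsc" class="ocf" type="myApp" provider="myCorp">
      <operations>
         <op id="myRsc-monitor-slave" name="monitor" interval="60"/>
         <op id="myRsc-monitor-master" name="monitor" interval="61" role="Master"/>
      </operations>
   </primitive>
</master>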
-------
======
[IMPORTANT]
===========
It is crucial that _every_ monitor operation has a different interval!
Pacemaker currently differentiates between operations
only by resource and interval; so if (for example) a master/slave resource had
the same monitor interval for both roles, Pacemaker would ignore the
role when checking the status -- which would cause unexpected return
codes, and therefore unnecessary complications.
===========
=== Multi-state Constraints ===
In most cases, multi-state resources will have a single copy on each
active cluster node. If this is not the case, you can indicate which
nodes the cluster should preferentially assign copies to with resource
location constraints. These constraints are written no differently from
those for primitive resources except that the master's +id+ is used.
When considering multi-state resources in constraints, for most
purposes it is sufficient to treat them as clones. The exception is
when the +rsc-role+ and/or +with-rsc-role+ fields (for colocation
constraints) and +first-action+ and/or +then-action+ fields (for
ordering constraints) are used.
.Additional constraint options relevant to multi-state resources
[width="95%",cols="1m,1,3<",options="header",align="center"]
|=========================================================
|Field
|Default
|Description
|rsc-role
|started
|An additional attribute of colocation constraints that specifies the
role that +rsc+ must be in. Allowed values: +started+, +master+,
+slave+.
indexterm:[rsc-role,Ordering Constraints]
indexterm:[Constraints,Ordering,rsc-role]
|with-rsc-role
|started
|An additional attribute of colocation constraints that specifies the
role that +with-rsc+ must be in. Allowed values: +started+,
+master+, +slave+.
indexterm:[with-rsc-role,Ordering Constraints]
indexterm:[Constraints,Ordering,with-rsc-role]
|first-action
|start
|An additional attribute of ordering constraints that specifies the
action that the +first+ resource must complete before executing the
specified action for the +then+ resource. Allowed values: +start+,
+stop+, +promote+, +demote+.
indexterm:[first-action,Ordering Constraints]
indexterm:[Constraints,Ordering,first-action]
|then-action
|value of +first-action+
|An additional attribute of ordering constraints that specifies the
action that the +then+ resource can only execute after the
+first-action+ on the +first+ resource has completed. Allowed
values: +start+, +stop+, +promote+, +demote+.
indexterm:[then-action,Ordering Constraints]
indexterm:[Constraints,Ordering,then-action]
|=========================================================
.Constraints involving multi-state resources
======
[source,XML]
-------
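<!-- A hedged sketch; resource and node names are illustrative assumptions,
     chosen to match the discussion that follows. -->
<constraints>
   <rsc_location id="db-prefers-node1" rsc="database" node="node1" score="500"/>
   <rsc_colocation id="myapp-with-db-master" rsc="myApp" with-rsc="database"
     with-rsc-role="Master" score="INFINITY"/>
   <rsc_order id="promote-db-then-start-myapp" first="database" first-action="promote"
     then="myApp" then-action="start"/>
</constraints>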
-------
======
In the example above, +myApp+ will wait until one of the database
copies has been started and promoted to master before being started
itself on the same node. Only if no copies can be promoted will +myApp+ be
prevented from being active. Additionally, the cluster will wait for
+myApp+ to be stopped before demoting the database.
Colocation of a primitive or group resource with a multi-state
resource means that it can run on any machine with an active copy of
the multi-state resource that has the specified role (+master+ or
+slave+). In the example above, the cluster will choose a location based on
where the database is running as a +master+, and if there are multiple
+master+ instances, it will also factor in +myApp+'s own location
preferences when deciding which node to choose.
Colocation with regular clones and other multi-state resources is also
possible. In such cases, the set of allowed locations for the +rsc+
clone is (after role filtering) limited to nodes on which the
+with-rsc+ multi-state resource is (or will be) in the specified role.
Placement is then performed as normal.
==== Using Multi-state Resources in Colocation Sets ====
.Additional colocation set options relevant to multi-state resources
[width="95%",cols="1m,1,6<",options="header",align="center"]
|=========================================================
|Field
|Default
|Description
|role
|started
|The role that 'all members' of the set must be in. Allowed values: +started+, +master+,
+slave+.
indexterm:[role,Ordering Constraints]
indexterm:[Constraints,Ordering,role]
|=========================================================
In the following example +B+'s master must be located on the same node as +A+'s master.
Additionally, resources +C+ and +D+ must be located on the same node as +A+'s
and +B+'s masters.
.Colocate C and D with A's and B's master instances
======
[source,XML]
-------
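<!-- A hedged sketch; resource IDs match the A, B, C and D used in the text. -->
<constraints>
   <rsc_colocation id="coloc-masters-and-apps" score="INFINITY">
      <resource_set id="coloc-set-masters" sequential="true" role="Master">
         <resource_ref id="A"/>
         <resource_ref id="B"/>
      </resource_set>
      <resource_set id="coloc-set-apps" sequential="true">
         <resource_ref id="C"/>
         <resource_ref id="D"/>
      </resource_set>
   </rsc_colocation>
</constraints>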
-------
======
==== Using Multi-state Resources in Ordering Sets ====
.Additional ordered set options relevant to multi-state resources
[width="95%",cols="1m,1,3<",options="header",align="center"]
|=========================================================
|Field
|Default
|Description
|action
|value of +first-action+
|An additional attribute of ordering constraint sets that specifies the
action that applies to 'all members' of the set. Allowed
values: +start+, +stop+, +promote+, +demote+.
indexterm:[action,Ordering Constraints]
indexterm:[Constraints,Ordering,action]
|=========================================================
.Start C and D after first promoting A and B
======
[source,XML]
-------
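<!-- A hedged sketch; resource IDs match the A, B, C and D used in the text. -->
<constraints>
   <rsc_order id="order-promote-then-start" score="INFINITY">
      <resource_set id="ordered-set-masters" sequential="true" action="promote">
         <resource_ref id="A"/>
         <resource_ref id="B"/>
      </resource_set>
      <resource_set id="ordered-set-apps" sequential="true" action="start">
         <resource_ref id="C"/>
         <resource_ref id="D"/>
      </resource_set>
   </rsc_order>
</constraints>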
-------
======
In the above example, +B+ cannot be promoted to a master role until +A+ has
been promoted. Additionally, resources +C+ and +D+ must wait until +A+ and +B+
have been promoted before they can start.
=== Multi-state Stickiness ===
indexterm:[resource-stickiness,Multi-State]
As with regular clones, multi-state resources are
slightly sticky by default. See <> for details.
=== Which Resource Instance is Promoted ===
During the start operation, most resource agents should call
the `crm_master` utility. This tool automatically detects both the
resource and host and should be used to set a preference for being
promoted. Based on this, +master-max+, and +master-node-max+, the
instance(s) with the highest preference will be promoted.
An alternative is to create a location constraint that
indicates which nodes are most preferred as masters.
.Explicitly preferring node1 to be promoted to master
======
[source,XML]
-------
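<!-- A hedged sketch; the multi-state resource and node names are illustrative. -->
<rsc_location id="master-location" rsc="myMasterRsc">
   <rule id="master-rule" score="100" role="master">
      <expression id="master-exp" attribute="#uname" operation="eq" value="node1"/>
   </rule>
</rsc_location>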
-------
======
=== Requirements for Multi-state Resource Agents ===
Since multi-state resources are an extension of cloned resources, all
the requirements for resource agents that support clones are also requirements
for resource agents that support multi-state resources.
Additionally, multi-state resources require two extra
actions, +demote+ and +promote+, which are responsible for
changing the state of the resource. Like +start+ and +stop+, they
should return +$\{OCF_SUCCESS}+ if they completed successfully or a
relevant error code if they did not.
The states can mean whatever you wish, but when the resource is
started, it must come up in the mode called +slave+. From there the
cluster will decide which instances to promote to +master+.
In addition to the clone requirements for monitor actions, agents must
also _accurately_ report which state they are in. The cluster relies
on the agent to report its status (including role) accurately and does
not indicate to the agent what role it currently believes it to be in.
.Role implications of OCF return codes
[width="95%",cols="1,1<",options="header",align="center"]
|=========================================================
|Monitor Return Code
|Description
|OCF_NOT_RUNNING
|Stopped
indexterm:[Return Code,OCF_NOT_RUNNING]
|OCF_SUCCESS
|Running (Slave)
indexterm:[Return Code,OCF_SUCCESS]
|OCF_RUNNING_MASTER
|Running (Master)
indexterm:[Return Code,OCF_RUNNING_MASTER]
|OCF_FAILED_MASTER
|Failed (Master)
indexterm:[Return Code,OCF_FAILED_MASTER]
|Other
|Failed (Slave)
|=========================================================
==== Multi-state Notifications ====
Like clones, supporting notifications requires the +notify+ action to
be implemented. If supported, the notify action will be passed a
number of extra variables which, when combined with additional
context, can be used to calculate the current state of the cluster and
what is about to happen to it.
.Environment variables supplied with multi-state notify actions footnote:[Emphasized variables are specific to +Master+ resources, and all behave in the same manner as described for Clone resources.]
[width="95%",cols="5,3<",options="header",align="center"]
|=========================================================
|Variable
|Description
|OCF_RESKEY_CRM_meta_notify_type
|Allowed values: +pre+, +post+
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,type]
indexterm:[type,Notification Environment Variable]
|OCF_RESKEY_CRM_meta_notify_operation
|Allowed values: +start+, +stop+
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,operation]
indexterm:[operation,Notification Environment Variable]
|OCF_RESKEY_CRM_meta_notify_active_resource
|Resources that are running
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,active_resource]
indexterm:[active_resource,Notification Environment Variable]
|OCF_RESKEY_CRM_meta_notify_inactive_resource
|Resources that are not running
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,inactive_resource]
indexterm:[inactive_resource,Notification Environment Variable]
|_OCF_RESKEY_CRM_meta_notify_master_resource_
|Resources that are running in +Master+ mode
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,master_resource]
indexterm:[master_resource,Notification Environment Variable]
|_OCF_RESKEY_CRM_meta_notify_slave_resource_
|Resources that are running in +Slave+ mode
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,slave_resource]
indexterm:[slave_resource,Notification Environment Variable]
|OCF_RESKEY_CRM_meta_notify_start_resource
|Resources to be started
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,start_resource]
indexterm:[start_resource,Notification Environment Variable]
|OCF_RESKEY_CRM_meta_notify_stop_resource
|Resources to be stopped
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,stop_resource]
indexterm:[stop_resource,Notification Environment Variable]
|_OCF_RESKEY_CRM_meta_notify_promote_resource_
|Resources to be promoted
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,promote_resource]
indexterm:[promote_resource,Notification Environment Variable]
|_OCF_RESKEY_CRM_meta_notify_demote_resource_
|Resources to be demoted
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,demote_resource]
indexterm:[demote_resource,Notification Environment Variable]
|OCF_RESKEY_CRM_meta_notify_start_uname
|Nodes on which resources will be started
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,start_uname]
indexterm:[start_uname,Notification Environment Variable]
|OCF_RESKEY_CRM_meta_notify_stop_uname
|Nodes on which resources will be stopped
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,stop_uname]
indexterm:[stop_uname,Notification Environment Variable]
|_OCF_RESKEY_CRM_meta_notify_promote_uname_
|Nodes on which resources will be promoted
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,promote_uname]
indexterm:[promote_uname,Notification Environment Variable]
|_OCF_RESKEY_CRM_meta_notify_demote_uname_
|Nodes on which resources will be demoted
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,demote_uname]
indexterm:[demote_uname,Notification Environment Variable]
|OCF_RESKEY_CRM_meta_notify_active_uname
|Nodes on which resources are running
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,active_uname]
indexterm:[active_uname,Notification Environment Variable]
|OCF_RESKEY_CRM_meta_notify_inactive_uname
|Nodes on which resources are not running
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,inactive_uname]
indexterm:[inactive_uname,Notification Environment Variable]
|_OCF_RESKEY_CRM_meta_notify_master_uname_
|Nodes on which resources are running in +Master+ mode
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,master_uname]
indexterm:[master_uname,Notification Environment Variable]
|_OCF_RESKEY_CRM_meta_notify_slave_uname_
|Nodes on which resources are running in +Slave+ mode
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,slave_uname]
indexterm:[slave_uname,Notification Environment Variable]
|=========================================================
==== Proper Interpretation of Multi-state Notification Environment Variables ====
.Pre-notification (demote):
* +Active+ resources: +$OCF_RESKEY_CRM_meta_notify_active_resource+
* +Master+ resources: +$OCF_RESKEY_CRM_meta_notify_master_resource+
* +Slave+ resources: +$OCF_RESKEY_CRM_meta_notify_slave_resource+
* Inactive resources: +$OCF_RESKEY_CRM_meta_notify_inactive_resource+
* Resources to be started: +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Resources to be promoted: +$OCF_RESKEY_CRM_meta_notify_promote_resource+
* Resources to be demoted: +$OCF_RESKEY_CRM_meta_notify_demote_resource+
* Resources to be stopped: +$OCF_RESKEY_CRM_meta_notify_stop_resource+
.Post-notification (demote) / Pre-notification (stop):
* +Active+ resources: +$OCF_RESKEY_CRM_meta_notify_active_resource+
* +Master+ resources:
** +$OCF_RESKEY_CRM_meta_notify_master_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_demote_resource+
* +Slave+ resources: +$OCF_RESKEY_CRM_meta_notify_slave_resource+
* Inactive resources: +$OCF_RESKEY_CRM_meta_notify_inactive_resource+
* Resources to be started: +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Resources to be promoted: +$OCF_RESKEY_CRM_meta_notify_promote_resource+
* Resources to be demoted: +$OCF_RESKEY_CRM_meta_notify_demote_resource+
* Resources to be stopped: +$OCF_RESKEY_CRM_meta_notify_stop_resource+
* Resources that were demoted: +$OCF_RESKEY_CRM_meta_notify_demote_resource+
.Post-notification (stop) / Pre-notification (start)
* +Active+ resources:
** +$OCF_RESKEY_CRM_meta_notify_active_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_stop_resource+
* +Master+ resources:
** +$OCF_RESKEY_CRM_meta_notify_master_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_demote_resource+
* +Slave+ resources:
** +$OCF_RESKEY_CRM_meta_notify_slave_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_stop_resource+
* Inactive resources:
** +$OCF_RESKEY_CRM_meta_notify_inactive_resource+
** plus +$OCF_RESKEY_CRM_meta_notify_stop_resource+
* Resources to be started: +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Resources to be promoted: +$OCF_RESKEY_CRM_meta_notify_promote_resource+
* Resources to be demoted: +$OCF_RESKEY_CRM_meta_notify_demote_resource+
* Resources to be stopped: +$OCF_RESKEY_CRM_meta_notify_stop_resource+
* Resources that were demoted: +$OCF_RESKEY_CRM_meta_notify_demote_resource+
* Resources that were stopped: +$OCF_RESKEY_CRM_meta_notify_stop_resource+
.Post-notification (start) / Pre-notification (promote)
* +Active+ resources:
** +$OCF_RESKEY_CRM_meta_notify_active_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_stop_resource+
** plus +$OCF_RESKEY_CRM_meta_notify_start_resource+
* +Master+ resources:
** +$OCF_RESKEY_CRM_meta_notify_master_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_demote_resource+
* +Slave+ resources:
** +$OCF_RESKEY_CRM_meta_notify_slave_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_stop_resource+
** plus +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Inactive resources:
** +$OCF_RESKEY_CRM_meta_notify_inactive_resource+
** plus +$OCF_RESKEY_CRM_meta_notify_stop_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Resources to be started: +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Resources to be promoted: +$OCF_RESKEY_CRM_meta_notify_promote_resource+
* Resources to be demoted: +$OCF_RESKEY_CRM_meta_notify_demote_resource+
* Resources to be stopped: +$OCF_RESKEY_CRM_meta_notify_stop_resource+
* Resources that were started: +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Resources that were demoted: +$OCF_RESKEY_CRM_meta_notify_demote_resource+
* Resources that were stopped: +$OCF_RESKEY_CRM_meta_notify_stop_resource+
.Post-notification (promote)
* +Active+ resources:
** +$OCF_RESKEY_CRM_meta_notify_active_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_stop_resource+
** plus +$OCF_RESKEY_CRM_meta_notify_start_resource+
* +Master+ resources:
** +$OCF_RESKEY_CRM_meta_notify_master_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_demote_resource+
** plus +$OCF_RESKEY_CRM_meta_notify_promote_resource+
* +Slave+ resources:
** +$OCF_RESKEY_CRM_meta_notify_slave_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_stop_resource+
** plus +$OCF_RESKEY_CRM_meta_notify_start_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_promote_resource+
* Inactive resources:
** +$OCF_RESKEY_CRM_meta_notify_inactive_resource+
** plus +$OCF_RESKEY_CRM_meta_notify_stop_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Resources to be started: +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Resources to be promoted: +$OCF_RESKEY_CRM_meta_notify_promote_resource+
* Resources to be demoted: +$OCF_RESKEY_CRM_meta_notify_demote_resource+
* Resources to be stopped: +$OCF_RESKEY_CRM_meta_notify_stop_resource+
* Resources that were started: +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Resources that were promoted: +$OCF_RESKEY_CRM_meta_notify_promote_resource+
* Resources that were demoted: +$OCF_RESKEY_CRM_meta_notify_demote_resource+
* Resources that were stopped: +$OCF_RESKEY_CRM_meta_notify_stop_resource+
diff --git a/doc/Pacemaker_Explained/en-US/Ch-Stonith.txt b/doc/Pacemaker_Explained/en-US/Ch-Stonith.txt
index f9422a9457..a5bcf0dcfa 100644
--- a/doc/Pacemaker_Explained/en-US/Ch-Stonith.txt
+++ b/doc/Pacemaker_Explained/en-US/Ch-Stonith.txt
@@ -1,873 +1,892 @@
= STONITH =
////
We prefer [[ch-stonith]], but older versions of asciidoc don't deal well
with that construct for chapter headings
////
anchor:ch-stonith[Chapter 13, STONITH]
indexterm:[STONITH, Configuration]
== What Is STONITH? ==
STONITH (an acronym for "Shoot The Other Node In The Head"), also called
'fencing', protects your data from being corrupted by rogue nodes or concurrent
access.
Just because a node is unresponsive doesn't mean it has stopped
accessing your data. The only way to be 100% sure that your data is
safe is to use STONITH, so we can be certain that the node is truly
offline before allowing the data to be accessed from another node.
STONITH also has a role to play in the event that a clustered service
cannot be stopped. In this case, the cluster uses STONITH to force the
whole node offline, thereby making it safe to start the service
elsewhere.
== What STONITH Device Should You Use? ==
It is crucial that the STONITH device allows the cluster to
differentiate between a node failure and a network failure.
The biggest mistake people make in choosing a STONITH device is to
use a remote power switch (such as many on-board IPMI controllers) that
shares power with the node it controls. In such cases, the cluster
cannot be sure if the node is really offline, or active and suffering
from a network fault.
Likewise, any device that relies on the machine being active (such as
an SSH-based "device" used during testing) is inappropriate.
== Special Treatment of STONITH Resources ==
STONITH resources are somewhat special in Pacemaker.
STONITH may be initiated by pacemaker or by other parts of the cluster
(such as resources like DRBD or DLM). To accommodate this, pacemaker
does not require the STONITH resource to be in the 'started' state
in order to be used, thus allowing reliable use of STONITH devices in such a
case.
[NOTE]
====
In pacemaker versions 1.1.9 and earlier, this feature either did not exist or
did not work well. Only "running" STONITH resources could be used by Pacemaker
for fencing, and if another component tried to fence a node while Pacemaker was
moving STONITH resources, the fencing could fail.
====
All nodes have access to STONITH devices' definitions and instantiate them
on-the-fly when needed, but preference is given to 'verified' instances, which
are the ones that are 'started' according to the cluster's knowledge.
In the case of a cluster split, the partition with a verified instance
will have a slight advantage, because the STONITH daemon in the other partition
will have to hear from all its current peers before choosing a node to
perform the fencing.
Fencing resources do work the same as regular resources in some respects:
* +target-role+ can be used to enable or disable the resource
* Location constraints can be used to prevent a specific node from using the resource
[IMPORTANT]
===========
Currently there is a limitation that fencing resources may only have
one set of meta-attributes and one set of instance attributes. This
can be revisited if it becomes a significant limitation for people.
===========
See the table below or run `man stonithd` to see special instance attributes
that may be set for any fencing resource, regardless of fence agent.
.Properties of Fencing Resources
[width="95%",cols="5m,2,3,10
----
====
Based on that, we would create a STONITH resource fragment that might look
like this:
.An IPMI-based STONITH Resource
====
[source,XML]
----
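<!-- A hedged sketch; the agent name (fence_ipmilan), credentials, address and
     host list are illustrative assumptions. -->
<primitive id="Fencing" class="stonith" type="fence_ipmilan">
   <instance_attributes id="Fencing-params">
      <nvpair id="Fencing-ipaddr" name="ipaddr" value="192.0.2.1"/>
      <nvpair id="Fencing-login" name="login" value="testuser"/>
      <nvpair id="Fencing-passwd" name="passwd" value="abc123"/>
      <nvpair id="Fencing-pcmk_host_list" name="pcmk_host_list" value="pcmk-1 pcmk-2"/>
   </instance_attributes>
   <operations>
      <op id="Fencing-monitor-10m" name="monitor" interval="10m" timeout="300s"/>
   </operations>
</primitive>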
----
====
Finally, we need to enable STONITH:
----
# crm_attribute -t crm_config -n stonith-enabled -v true
----
== Advanced STONITH Configurations ==
Some people consider that having one fencing device is a single point
of failure footnote:[Not true, since a node or resource must fail
before fencing even has a chance to]; others prefer removing the node
from the storage and network instead of turning it off.
Whatever the reason, Pacemaker supports fencing nodes with multiple
devices through a feature called 'fencing topologies'.
Simply create the individual devices as you normally would, then
define one or more +fencing-level+ entries in the +fencing-topology+ section of
the configuration.
* Each fencing level is attempted in order of ascending +index+. Allowed
indexes are 0 to 9.
* If a device fails, processing terminates for the current level.
No further devices in that level are exercised, and the next level is attempted instead.
* If the operation succeeds for all the listed devices in a level, the level is deemed to have passed.
* The operation is finished when a level has passed (success), or all levels have been attempted (failed).
* If the operation failed, the next step is determined by the Policy Engine and/or `crmd`.
Some possible uses of topologies include:
* Try poison-pill and fail back to power
* Try disk and network, and fall back to power if either fails
* Initiate a kdump and then poweroff the node
.Properties of Fencing Levels
-[width="95%",cols="1m,6<",options="header",align="center"]
+[width="95%",cols="1m,3<",options="header",align="center"]
|=========================================================
|Field
|Description
|id
|A unique name for the level
indexterm:[id,fencing-level]
indexterm:[Fencing,fencing-level,id]
|target
-|The node to which this level applies
+|The name of a single node to which this level applies
indexterm:[target,fencing-level]
indexterm:[Fencing,fencing-level,target]
+|target-pattern
+|A regular expression matching the names of nodes to which this level applies
+'(since 1.1.14)'
+ indexterm:[target-pattern,fencing-level]
+ indexterm:[Fencing,fencing-level,target-pattern]
+
+|target-attribute
+|The name of a node attribute that is set for nodes to which this level applies
+'(since 1.1.14)'
+ indexterm:[target-attribute,fencing-level]
+ indexterm:[Fencing,fencing-level,target-attribute]
+
|index
|The order in which to attempt the levels.
Levels are attempted in ascending order 'until one succeeds'.
indexterm:[index,fencing-level]
indexterm:[Fencing,fencing-level,index]
|devices
|A comma-separated list of devices that must all be tried for this level
indexterm:[devices,fencing-level]
indexterm:[Fencing,fencing-level,devices]
|=========================================================
.Fencing topology with different devices for different nodes
====
[source,XML]
----
...
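<!-- A hedged sketch of the elided fencing-topology fragment; node and device
     names are illustrative assumptions. -->
<fencing-topology>
   <!-- For pcmk-1, try poison-pill and fall back to power -->
   <fencing-level id="f-p1.1" target="pcmk-1" index="1" devices="poison-pill"/>
   <fencing-level id="f-p1.2" target="pcmk-1" index="2" devices="power"/>
   <!-- For pcmk-2, try disk and network, and fall back to power -->
   <fencing-level id="f-p2.1" target="pcmk-2" index="1" devices="disk,network"/>
   <fencing-level id="f-p2.2" target="pcmk-2" index="2" devices="power"/>
</fencing-topology>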
...
----
====
=== Example Dual-Layer, Dual-Device Fencing Topologies ===
The following example illustrates an advanced use of +fencing-topology+ in a cluster with the following properties:
* 3 nodes (2 active prod-mysql nodes, 1 prod_mysql-rep in standby for quorum purposes)
* the active nodes have an IPMI-controlled power board reached at 192.0.2.1 and 192.0.2.2
* the active nodes also have two independent PSUs (Power Supply Units)
connected to two independent PDUs (Power Distribution Units) reached at
198.51.100.1 (port 10 and port 11) and 203.0.113.1 (port 10 and port 11)
* the first fencing method uses the `fence_ipmi` agent
* the second fencing method uses the `fence_apc_snmp` agent targeting 2 fencing devices (one per PSU, either port 10 or 11)
* fencing is only implemented for the active nodes and has location constraints
* fencing topology is set to try IPMI fencing first then default to a "sure-kill" dual PDU fencing
In a normal failure scenario, STONITH will first select +fence_ipmi+ to try to kill the faulty node.
Using a fencing topology, if that first method fails, STONITH will then move on to selecting +fence_apc_snmp+ twice:
* once for the first PDU
* again for the second PDU
The fence action is considered successful only if both PDUs report the required status. If either of them fails, STONITH loops back to the first fencing method, +fence_ipmi+, and so on, until the node is fenced or the fencing action is cancelled.
.First fencing method: single IPMI device
Each cluster node has its own dedicated IPMI channel that can be called for fencing using the following primitives:
[source,XML]
----
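<!-- A hedged sketch; the node names (prod-mysql1/2), credentials and addresses
     are illustrative assumptions based on the description above. -->
<primitive class="stonith" id="fence_prod-mysql1_ipmi" type="fence_ipmilan">
   <instance_attributes id="fence_prod-mysql1_ipmi-params">
      <nvpair id="fence_prod-mysql1_ipmi-ipaddr" name="ipaddr" value="192.0.2.1"/>
      <nvpair id="fence_prod-mysql1_ipmi-login" name="login" value="fencing"/>
      <nvpair id="fence_prod-mysql1_ipmi-passwd" name="passwd" value="secret"/>
   </instance_attributes>
</primitive>
<primitive class="stonith" id="fence_prod-mysql2_ipmi" type="fence_ipmilan">
   <instance_attributes id="fence_prod-mysql2_ipmi-params">
      <nvpair id="fence_prod-mysql2_ipmi-ipaddr" name="ipaddr" value="192.0.2.2"/>
      <nvpair id="fence_prod-mysql2_ipmi-login" name="login" value="fencing"/>
      <nvpair id="fence_prod-mysql2_ipmi-passwd" name="passwd" value="secret"/>
   </instance_attributes>
</primitive>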
----
.Second fencing method: dual PDU devices
Each cluster node also has two distinct power channels controlled by two
distinct PDUs. That means a total of 4 fencing devices configured as follows:
- Node 1, PDU 1, PSU 1 @ port 10
- Node 1, PDU 2, PSU 2 @ port 10
- Node 2, PDU 1, PSU 1 @ port 11
- Node 2, PDU 2, PSU 2 @ port 11
The matching fencing agents are configured as follows:
[source,XML]
----
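<!-- A hedged sketch of the PDU devices; resource names are illustrative
     assumptions, and the addresses and ports follow the layout listed above.
     Only the two prod-mysql1 devices are shown; the prod-mysql2 devices are
     analogous, using port 11 on each PDU. -->
<primitive class="stonith" id="fence_prod-mysql1_apc1" type="fence_apc_snmp">
   <instance_attributes id="fence_prod-mysql1_apc1-params">
      <nvpair id="fence_prod-mysql1_apc1-ipaddr" name="ipaddr" value="198.51.100.1"/>
      <nvpair id="fence_prod-mysql1_apc1-port" name="port" value="10"/>
   </instance_attributes>
</primitive>
<primitive class="stonith" id="fence_prod-mysql1_apc2" type="fence_apc_snmp">
   <instance_attributes id="fence_prod-mysql1_apc2-params">
      <nvpair id="fence_prod-mysql1_apc2-ipaddr" name="ipaddr" value="203.0.113.1"/>
      <nvpair id="fence_prod-mysql1_apc2-port" name="port" value="10"/>
   </instance_attributes>
</primitive>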
----
.Location Constraints
To prevent STONITH from trying to run a fencing agent on the same node it is
supposed to fence, constraints are placed on all the fencing primitives:
[source,XML]
----
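<!-- A hedged sketch; one -INFINITY location constraint per fencing primitive,
     keeping it off the node it is supposed to fence. Names are illustrative
     and match the earlier sketches; the prod-mysql2 constraints are analogous. -->
<constraints>
   <rsc_location id="l_fence_prod-mysql1_ipmi" rsc="fence_prod-mysql1_ipmi"
     node="prod-mysql1" score="-INFINITY"/>
   <rsc_location id="l_fence_prod-mysql1_apc1" rsc="fence_prod-mysql1_apc1"
     node="prod-mysql1" score="-INFINITY"/>
   <rsc_location id="l_fence_prod-mysql1_apc2" rsc="fence_prod-mysql1_apc2"
     node="prod-mysql1" score="-INFINITY"/>
</constraints>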
----
.Fencing topology
Now that all the fencing resources are defined, it's time to create the right topology.
We want to fence using IPMI first and, if that does not work, fence both PDUs to reliably kill the node.
[source,XML]
----
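<!-- A hedged sketch; level 1 is IPMI, level 2 fences both PDUs at once.
     Device and node names match the earlier illustrative sketches. -->
<fencing-topology>
   <fencing-level id="fl-prod-mysql1-1" target="prod-mysql1" index="1"
     devices="fence_prod-mysql1_ipmi"/>
   <fencing-level id="fl-prod-mysql1-2" target="prod-mysql1" index="2"
     devices="fence_prod-mysql1_apc1,fence_prod-mysql1_apc2"/>
   <fencing-level id="fl-prod-mysql2-1" target="prod-mysql2" index="1"
     devices="fence_prod-mysql2_ipmi"/>
   <fencing-level id="fl-prod-mysql2-2" target="prod-mysql2" index="2"
     devices="fence_prod-mysql2_apc1,fence_prod-mysql2_apc2"/>
</fencing-topology>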
----
Please note that in +fencing-topology+, the lowest +index+ value determines which fencing method is tried first.
.Final configuration
Put together, the configuration looks like this:
[source,XML]
----
...
...
----
== Remapping Reboots ==
When the cluster needs to reboot a node, whether because +stonith-action+ is +reboot+ or because
a reboot was manually requested (such as by `stonith_admin --reboot`), it will remap that to
other commands in two cases:
. If the chosen fencing device does not support the +reboot+ command, the cluster
will ask it to perform +off+ instead.
. If a fencing topology level with multiple devices must be executed, the cluster
will ask all the devices to perform +off+, then ask the devices to perform +on+.
To understand the second case, consider the example of a node with redundant
power supplies connected to intelligent power switches. Rebooting one switch
and then the other would have no effect on the node. Turning both switches off,
and then on, actually reboots the node.
In such a case, the fencing operation will be treated as successful as long as
the +off+ commands succeed, because then it is safe for the cluster to recover
any resources that were on the node. Timeouts and errors in the +on+ phase will
be logged but ignored.
When a reboot operation is remapped, any action-specific timeout for the
remapped action will be used (for example, +pcmk_off_timeout+ will be used when
executing the +off+ command, not +pcmk_reboot_timeout+).
+
+[NOTE]
+====
+In Pacemaker versions 1.1.13 and earlier, reboots will not be remapped in the
+second case. To achieve the same effect, separate fencing devices for off and
+on actions must be configured.
+====
diff --git a/doc/Pacemaker_Explained/en-US/Revision_History.xml b/doc/Pacemaker_Explained/en-US/Revision_History.xml
index eecd34b59b..33010d5c0e 100644
--- a/doc/Pacemaker_Explained/en-US/Revision_History.xml
+++ b/doc/Pacemaker_Explained/en-US/Revision_History.xml
@@ -1,60 +1,72 @@
Revision History
1-0
19 Oct 2009
AndrewBeekhofandrew@beekhof.net
Import from Pages.app
2-0
26 Oct 2009
AndrewBeekhofandrew@beekhof.net
Cleanup and reformatting of docbook xml complete
3-0
Tue Nov 12 2009
AndrewBeekhofandrew@beekhof.net
Split book into chapters and pass validation
Re-organize book for use with Publican
4-0
Mon Oct 8 2012
AndrewBeekhofandrew@beekhof.net
Converted to asciidoc
(which is converted to docbook for use with
Publican)
5-0
Mon Feb 23 2015
KenGaillotkgaillot@redhat.com
Update for clarity, stylistic consistency and current command-line syntax
+
+ 6-0
+ Tue Dec 8 2015
+ KenGaillotkgaillot@redhat.com
+
+
+
+ Update for Pacemaker 1.1.14
+
+
+
+
diff --git a/doc/Pacemaker_Remote/en-US/Book_Info.xml b/doc/Pacemaker_Remote/en-US/Book_Info.xml
index a26494e742..12e1ab891d 100644
--- a/doc/Pacemaker_Remote/en-US/Book_Info.xml
+++ b/doc/Pacemaker_Remote/en-US/Book_Info.xml
@@ -1,75 +1,75 @@
%BOOK_ENTITIES;
]>
Pacemaker Remote
Scaling High Availability Clusters
- 4
+ 5
0
The document exists as both a reference and deployment guide for the Pacemaker Remote service.
The example commands in this document will use:
&DISTRO; &DISTRO_VERSION; as the host operating system
Pacemaker Remote to perform resource management within guest nodes and remote nodes
KVM for virtualization
libvirt to manage guest nodes
Corosync to provide messaging and membership services on cluster nodes
Pacemaker to perform resource management on cluster nodes
pcs as the cluster configuration toolset
The concepts are the same for other distributions,
virtualization platforms, toolsets, and messaging
layers, and should be easily adaptable.
diff --git a/doc/Pacemaker_Remote/en-US/Ch-Intro.txt b/doc/Pacemaker_Remote/en-US/Ch-Intro.txt
index 438ecd2aa4..16934907f8 100644
--- a/doc/Pacemaker_Remote/en-US/Ch-Intro.txt
+++ b/doc/Pacemaker_Remote/en-US/Ch-Intro.txt
@@ -1,193 +1,198 @@
= Scaling a Pacemaker Cluster =
== Overview ==
In a basic Pacemaker high-availability
cluster,footnote:[See the http://www.clusterlabs.org/doc/[Pacemaker
documentation], especially 'Clusters From Scratch' and 'Pacemaker Explained',
for basic information about high-availability using Pacemaker]
each node runs the full cluster stack of corosync and all Pacemaker components.
This allows great flexibility but limits scalability to around 16 nodes.
To allow for scalability to dozens or even hundreds of nodes, Pacemaker
allows nodes not running the full cluster stack to integrate into the cluster
and have the cluster manage their resources as if they were a cluster node.
== Terms ==
cluster node::
A node running the full high-availability stack of corosync and all
Pacemaker components. Cluster nodes may run cluster resources, run
all Pacemaker command-line tools (`crm_mon`, `crm_resource` and so on),
execute fencing actions, count toward cluster quorum, and serve as the
cluster's Designated Controller (DC).
(((cluster node)))
(((node,cluster node)))
pacemaker_remote::
A small service daemon that allows a host to be used as a Pacemaker node
without running the full cluster stack. Nodes running pacemaker_remote
may run cluster resources and most command-line tools, but cannot perform
other functions of full cluster nodes such as fencing execution, quorum
voting or DC eligibility. The pacemaker_remote daemon is an enhanced
version of Pacemaker's local resource management daemon (LRMD).
(((pacemaker_remote)))
remote node::
A physical host running pacemaker_remote. Remote nodes have a special
resource that manages communication with the cluster. This is sometimes
referred to as the 'baremetal' case.
(((remote node)))
(((node,remote node)))
guest node::
A virtual host running pacemaker_remote. Guest nodes differ from remote
nodes mainly in that the guest node is itself a resource that the cluster
manages.
(((guest node)))
(((node,guest node)))
[NOTE]
======
'Remote' in this document refers to the node not being a part of the underlying
corosync cluster. It has nothing to do with physical proximity. Remote nodes
and guest nodes are subject to the same latency requirements as cluster nodes,
which means they are typically in the same data center.
======
[NOTE]
======
It is important to distinguish the various roles a virtual machine can serve
in Pacemaker clusters:
* A virtual machine can run the full cluster stack, in which case it is a
cluster node and is not itself managed by the cluster.
* A virtual machine can be managed by the cluster as a resource, without the
cluster having any awareness of the services running inside the virtual
machine. The virtual machine is 'opaque' to the cluster.
* A virtual machine can be a cluster resource, and run pacemaker_remote
to make it a guest node, allowing the cluster to manage services
inside it. The virtual machine is 'transparent' to the cluster.
======
== Support in Pacemaker Versions ==
It is recommended to run Pacemaker 1.1.12 or later when using pacemaker_remote
due to important bug fixes. An overview of changes in pacemaker_remote
capability by version:
+.1.1.14
+* Resources that create guest nodes can be included in groups
+* reconnect_interval option for remote nodes
+* Bug fixes, including a memory leak
+
.1.1.13
* Support for maintenance mode
* Remote nodes can recover without being fenced when the cluster node
hosting their connection fails
* Running pacemaker_remote within LXC environments is deprecated due to
newly added Pacemaker support for isolated resources
* Bug fixes
.1.1.12
* Support for permanent node attributes
* Support for migration
* Bug fixes
.1.1.11
* Support for IPv6
* Support for remote nodes
* Support for transient node attributes
* Support for clusters with mixed endian architectures
* Bug fixes
.1.1.10
* Bug fixes
.1.1.9
* Initial version to include pacemaker_remote
* Limited to guest nodes in KVM/LXC environments using only IPv4;
all nodes' architectures must have the same endianness
== Guest Nodes ==
(((guest node)))
(((node,guest node)))
*"I want a Pacemaker cluster to manage virtual machine resources, but I also
want Pacemaker to be able to manage the resources that live within those
virtual machines."*
Without pacemaker_remote, the possibilities for implementing the above use case
have significant limitations:
* The cluster stack could be run on the physical hosts only, which loses the
ability to monitor resources within the guests.
* A separate cluster could be on the virtual guests, which quickly hits
scalability issues.
* The cluster stack could be run on the guests using the same cluster as the
physical hosts, which also hits scalability issues and complicates fencing.
With pacemaker_remote:
* The physical hosts are cluster nodes (running the full cluster stack).
* The virtual machines are guest nodes (running the pacemaker_remote service).
Nearly zero configuration is required on the virtual machine.
* The cluster stack on the cluster nodes launches the virtual machines and
immediately connects to the pacemaker_remote service on them, allowing the
virtual machines to integrate into the cluster.
The key difference here between the guest nodes and the cluster nodes is that
the guest nodes do not run the cluster stack. This means they will never become
the DC, initiate fencing actions or participate in quorum voting.
On the other hand, this also means that they are not bound to the scalability
limits associated with the cluster stack (no 16-node corosync member limits to
deal with). That isn't to say that guest nodes can scale indefinitely, but it
is known that guest nodes scale horizontally much further than cluster nodes.
Other than the quorum limitation, these guest nodes behave just like cluster
nodes with respect to resource management. The cluster is fully capable of
managing and monitoring resources on each guest node. You can build constraints
against guest nodes, put them in standby, or do whatever else you'd expect to
be able to do with cluster nodes. They even show up in `crm_mon` output as
nodes.
To solidify the concept, below is an example that is very similar to an actual
deployment we test in our developer environment to verify guest node scalability:
* 16 cluster nodes running the full corosync + pacemaker stack
* 64 Pacemaker-managed virtual machine resources running pacemaker_remote configured as guest nodes
* 64 Pacemaker-managed webserver and database resources configured to run on the 64 guest nodes
With this deployment, you would have 64 webservers and databases running on 64
virtual machines on 16 hardware nodes, all of which are managed and monitored by
the same Pacemaker deployment. It is known that pacemaker_remote can scale
to this size and possibly much further, depending on the specific scenario.
== Remote Nodes ==
(((remote node)))
(((node,remote node)))
*"I want my traditional high-availability cluster to scale beyond the limits
imposed by the corosync messaging layer."*
Ultimately, the primary advantage of remote nodes over cluster nodes is
scalability. There are likely some other use cases related to geographically
distributed HA clusters that remote nodes may serve a purpose in, but those use
cases are not well understood at this point.
Like guest nodes, remote nodes will never become the DC, initiate
fencing actions or participate in quorum voting.
That is not to say, however, that fencing of a remote node works any
differently than that of a cluster node. The Pacemaker policy engine
understands how to fence remote nodes. As long as a fencing device exists, the
cluster is capable of ensuring remote nodes are fenced in the exact same way as
cluster nodes.
== Expanding the Cluster Stack ==
With pacemaker_remote, the traditional view of the high-availability stack can
be expanded to include a new layer:
.Traditional HA Stack
image::images/pcmk-ha-cluster-stack.png["Traditional Pacemaker+Corosync Stack",width="17cm",height="9cm",align="center"]
.HA Stack With Guest Nodes
image::images/pcmk-ha-remote-stack.png["Pacemaker+Corosync Stack With pacemaker_remote",width="20cm",height="10cm",align="center"]
diff --git a/doc/Pacemaker_Remote/en-US/Ch-Options.txt b/doc/Pacemaker_Remote/en-US/Ch-Options.txt
index abe511fd35..f04b8b6e94 100644
--- a/doc/Pacemaker_Remote/en-US/Ch-Options.txt
+++ b/doc/Pacemaker_Remote/en-US/Ch-Options.txt
@@ -1,115 +1,121 @@
= Configuration Explained =
The walk-through examples use some of these options, but don't explain exactly
what they mean or do. This section is meant to be the go-to resource for all
the options available for configuring pacemaker_remote-based nodes.
(((configuration)))
== Resource Meta-Attributes for Guest Nodes ==
When configuring a virtual machine to use as a guest node, these are the
metadata options available to enable the resource as a guest node and
define its connection parameters.
.Meta-attributes for configuring VM resources as guest nodes
[width="95%",cols="2m,1,4<",options="header",align="center"]
|=========================================================
|Option
|Default
|Description
|remote-node
|'none'
|The node name of the guest node this resource defines. This both enables the
resource as a guest node and defines the unique name used to identify the
guest node. If no other parameters are set, this value will also be assumed as
the hostname to use when connecting to pacemaker_remote on the VM. This value
*must not* overlap with any resource or node IDs.
|remote-port
|3121
|The port on the virtual machine that the cluster will use to connect to
pacemaker_remote.
|remote-addr
|'value of' +remote-node+
|The IP address or hostname to use when connecting to pacemaker_remote on the VM.
|remote-connect-timeout
|60s
|How long before a pending guest connection will time out.
|=========================================================
== Connection Resources for Remote Nodes ==
A remote node is defined by a connection resource. That connection resource
has instance attributes that define where the remote node is located on the
network and how to communicate with it.
Descriptions of these instance attributes can be retrieved using the following
`pcs` command:
----
# pcs resource describe remote
-ocf:pacemaker:remote -
-
-
+ocf:pacemaker:remote - remote resource agent
Resource options:
server: Server location to connect to. This can be an ip address or hostname.
port: tcp port to connect to.
+ reconnect_interval: Time in seconds to wait before attempting to reconnect to
+ a remote node after an active connection to the remote
+ node has been severed. This wait is recurring. If
+ reconnect fails after the wait period, a new reconnect
+ attempt will be made after observing the wait time. When
+ this option is in use, pacemaker will keep attempting to
+ reach out and connect to the remote node indefinitely
+ after each wait interval.
----
When defining a remote node's connection resource, it is common and recommended
to name the connection resource the same as the remote node's hostname. By
default, if no *server* option is provided, the cluster will attempt to contact
the remote node using the resource name as the hostname.
Example defining a remote node with the hostname *remote1*:
----
# pcs resource create remote1 remote
----
Example defining a remote node to connect to a specific IP address and port:
----
# pcs resource create remote1 remote server=192.168.122.200 port=8938
----
== Environment Variables for Daemon Start-up ==
Authentication and encryption of the connection between cluster nodes
and nodes running pacemaker_remote is achieved using
https://en.wikipedia.org/wiki/TLS-PSK[TLS-PSK] encryption/authentication
over TCP (port 3121 by default). This means that both the cluster node and
remote node must share the same private key. By default, this
key is placed at +/etc/pacemaker/authkey+ on each node.
You can change the default port and/or key location for Pacemaker and
pacemaker_remote via environment variables. These environment variables can be
enabled by placing them in the +/etc/sysconfig/pacemaker+ file.
----
#==#==# Pacemaker Remote
# Use a custom directory for finding the authkey.
PCMK_authkey_location=/etc/pacemaker/authkey
#
# Specify a custom port for Pacemaker Remote connections
PCMK_remote_port=3121
----
== Removing Remote Nodes and Guest Nodes ==
If the resource creating a guest node, or the *ocf:pacemaker:remote* resource
creating a connection to a remote node, is removed from the configuration, the
affected node will continue to show up in output as an offline node.
If you want to get rid of that output, run (replacing $NODE_NAME appropriately):
----
# crm_node --force --remove $NODE_NAME
----
[WARNING]
=========
Be absolutely sure that the node's resource has been deleted from the
configuration first.
=========
diff --git a/doc/Pacemaker_Remote/en-US/Revision_History.xml b/doc/Pacemaker_Remote/en-US/Revision_History.xml
index 269b549a11..1954f14d96 100644
--- a/doc/Pacemaker_Remote/en-US/Revision_History.xml
+++ b/doc/Pacemaker_Remote/en-US/Revision_History.xml
@@ -1,37 +1,42 @@
%BOOK_ENTITIES;
]>
Revision History
1-0
Tue Mar 19 2013
DavidVosseldavidvossel@gmail.com
Import from Pages.app
2-0
Tue May 13 2013
DavidVosseldavidvossel@gmail.com
Added Future Features Section
3-0
Fri Oct 18 2013
DavidVosseldavidvossel@gmail.com
Added Baremetal remote-node feature documentation
4-0
Tue Aug 25 2015
KenGaillotkgaillot@redhat.com
Targeted CentOS 7.1 and Pacemaker 1.1.12+, updated for current terminology and practice
+
+ 5-0
+ Tue Dec 8 2015
+ KenGaillotkgaillot@redhat.com
+ Updated for Pacemaker 1.1.14
+
-