diff --git a/doc/sphinx/Pacemaker_Explained/advanced-resources.rst b/doc/sphinx/Pacemaker_Explained/advanced-resources.rst
index 5bb49dc884..b65f8c26db 100644
--- a/doc/sphinx/Pacemaker_Explained/advanced-resources.rst
+++ b/doc/sphinx/Pacemaker_Explained/advanced-resources.rst
@@ -1,1597 +1,1598 @@
Advanced Resource Types
-----------------------
.. index::
single: group resource
single: resource; group
.. _group-resources:
Groups - A Syntactic Shortcut
#############################
One of the most common elements of a cluster is a set of resources
that need to be located together, start sequentially, and stop in the
reverse order. To simplify this configuration, we support the concept
of groups.
.. topic:: A group of two primitive resources
.. code-block:: xml
Although the example above contains only two resources, there is no
limit to the number of resources a group can contain. The example is
also sufficient to explain the fundamental properties of a group:
* Resources are started in the order they appear in (**Public-IP** first,
then **Email**)
* Resources are stopped in the reverse of the order in which they appear
(**Email** first, then **Public-IP**)
If a resource in the group can't run anywhere, then nothing after it
is allowed to run either.
* If **Public-IP** can't run anywhere, neither can **Email**;
* but if **Email** can't run anywhere, this does not affect **Public-IP**
in any way
The group above is logically equivalent to writing:
.. topic:: How the cluster sees a group resource
.. code-block:: xml
As the group grows, the reduced configuration effort can become
significant.
Another (typical) example of a group is a DRBD volume, the filesystem
mount, an IP address, and an application that uses them.
.. index::
pair: XML element; group
Group Properties
________________
.. table:: **Properties of a Group Resource**

   +-------+--------------------------------------+
   | Field | Description                          |
   +=======+======================================+
   | id    | .. index::                           |
   |       |    single: group; property, id       |
   |       |    single: property; id (group)      |
   |       |    single: id; group property        |
   |       |                                      |
   |       | A unique name for the group          |
   +-------+--------------------------------------+
Group Options
_____________
Groups inherit the ``priority``, ``target-role``, and ``is-managed`` properties
from primitive resources. See :ref:`resource_options` for information about
those properties.
Group Instance Attributes
_________________________
Groups have no instance attributes. However, any that are set for the group
object will be inherited by the group's children.
Group Contents
______________
Groups may contain only primitive resources (see
:ref:`primitive-resource`). To refer to a child of a group resource, use
the child's ``id`` instead of the group's.
Group Constraints
_________________
Although it is possible to reference a group's children in
constraints, it is usually preferable to reference the group itself.
.. topic:: Some constraints involving groups
.. code-block:: xml
.. index::
pair: resource-stickiness; group
Group Stickiness
________________
Stickiness, the measure of how much a resource wants to stay where it
is, is additive in groups. Every active resource of the group will
contribute its stickiness value to the group's total. So if the
default ``resource-stickiness`` is 100, and a group has seven members,
five of which are active, then the group as a whole will prefer its
current location with a score of 500.
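The additive rule can be sketched in a few lines of Python. This is only an
illustration of the arithmetic described above, not Pacemaker's scheduler
code; the function name ``group_stickiness`` and the member names ``rsc1``
through ``rsc7`` are hypothetical.

```python
def group_stickiness(member_stickiness, active_members):
    """Sum the stickiness of the group's active members only."""
    return sum(
        score
        for name, score in member_stickiness.items()
        if name in active_members
    )


# Seven members, each using a default resource-stickiness of 100.
members = {f"rsc{i}": 100 for i in range(1, 8)}
# Five of the seven members are currently active.
active = {"rsc1", "rsc2", "rsc3", "rsc4", "rsc5"}

print(group_stickiness(members, active))  # 500
```

Inactive members contribute nothing, so the same group with no active
members would have a total of 0.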
.. index::
single: clone
single: resource; clone
.. _s-resource-clone:
Clones - Resources That Can Have Multiple Active Instances
##########################################################
*Clone* resources are resources that can have more than one copy active at the
same time. This allows you, for example, to run a copy of a daemon on every
node. You can clone any primitive or group resource [#]_.
Anonymous versus Unique Clones
______________________________
A clone resource is configured to be either *anonymous* or *globally unique*.
Anonymous clones are the simplest. These behave completely identically
everywhere they are running. Because of this, there can be only one instance of
an anonymous clone active per node.
The instances of globally unique clones are distinct entities. All instances
are launched identically, but one instance of the clone is not identical to any
other instance, whether running on the same node or a different node. As an
example, a cloned IP address can use special kernel functionality such that
each instance handles a subset of requests for the same IP address.
.. index::
single: promotable clone
single: resource; promotable
.. _s-resource-promotable:
Promotable clones
_________________
If a clone is *promotable*, its instances can perform a special role that
Pacemaker will manage via the ``promote`` and ``demote`` actions of the resource
agent.
Services that support such a special role have various terms for the special
role and the default role: primary and secondary, master and replica,
controller and worker, etc. Pacemaker uses the terms *promoted* and
*unpromoted* to be agnostic to what the service calls them or what they do.
All that Pacemaker cares about is that an instance comes up in the unpromoted role
when started, and the resource agent supports the ``promote`` and ``demote`` actions
to manage entering and exiting the promoted role.
.. index::
pair: XML element; clone
Clone Properties
________________
.. table:: **Properties of a Clone Resource**

   +-------+--------------------------------------+
   | Field | Description                          |
   +=======+======================================+
   | id    | .. index::                           |
   |       |    single: clone; property, id       |
   |       |    single: property; id (clone)      |
   |       |    single: id; clone property        |
   |       |                                      |
   |       | A unique name for the clone          |
   +-------+--------------------------------------+
.. index::
pair: options; clone
Clone Options
_____________
:ref:`Options <resource_options>` inherited from primitive resources:
``priority``, ``target-role``, ``is-managed``
.. table:: **Clone-specific configuration options**
+-------------------+-----------------+-------------------------------------------------------+
| Field | Default | Description |
+===================+=================+=======================================================+
| globally-unique | false | .. index:: |
| | | single: clone; option, globally-unique |
| | | single: option; globally-unique (clone) |
| | | single: globally-unique; clone option |
| | | |
| | | If **true**, each clone instance performs a |
| | | distinct function |
+-------------------+-----------------+-------------------------------------------------------+
- | clone-max | number of nodes | .. index:: |
- | | in the cluster | single: clone; option, clone-max |
+ | clone-max | 0 | .. index:: |
+ | | | single: clone; option, clone-max |
| | | single: option; clone-max (clone) |
| | | single: clone-max; clone option |
| | | |
| | | The maximum number of clone instances that can |
- | | | be started across the entire cluster |
+ | | | be started across the entire cluster. If 0, the |
+ | | | number of nodes in the cluster will be used. |
+-------------------+-----------------+-------------------------------------------------------+
| clone-node-max | 1 | .. index:: |
| | | single: clone; option, clone-node-max |
| | | single: option; clone-node-max (clone) |
| | | single: clone-node-max; clone option |
| | | |
| | | If ``globally-unique`` is **true**, the maximum |
| | | number of clone instances that can be started |
| | | on a single node |
+-------------------+-----------------+-------------------------------------------------------+
| clone-min | 0 | .. index:: |
| | | single: clone; option, clone-min |
| | | single: option; clone-min (clone) |
| | | single: clone-min; clone option |
| | | |
| | | Require at least this number of clone instances |
| | | to be runnable before allowing resources |
| | | depending on the clone to be runnable. A value |
| | | of 0 means require all clone instances to be |
| | | runnable. |
+-------------------+-----------------+-------------------------------------------------------+
| notify | false | .. index:: |
| | | single: clone; option, notify |
| | | single: option; notify (clone) |
| | | single: notify; clone option |
| | | |
| | | Call the resource agent's **notify** action for |
| | | all active instances, before and after starting |
| | | or stopping any clone instance. The resource |
| | | agent must support this action. |
| | | Allowed values: **false**, **true** |
+-------------------+-----------------+-------------------------------------------------------+
| ordered | false | .. index:: |
| | | single: clone; option, ordered |
| | | single: option; ordered (clone) |
| | | single: ordered; clone option |
| | | |
| | | If **true**, clone instances must be started |
| | | sequentially instead of in parallel. |
| | | Allowed values: **false**, **true** |
+-------------------+-----------------+-------------------------------------------------------+
| interleave | false | .. index:: |
| | | single: clone; option, interleave |
| | | single: option; interleave (clone) |
| | | single: interleave; clone option |
| | | |
| | | When this clone is ordered relative to another |
| | | clone, if this option is **false** (the default), |
| | | the ordering is relative to *all* instances of |
| | | the other clone, whereas if this option is |
| | | **true**, the ordering is relative only to |
| | | instances on the same node. |
| | | Allowed values: **false**, **true** |
+-------------------+-----------------+-------------------------------------------------------+
| promotable | false | .. index:: |
| | | single: clone; option, promotable |
| | | single: option; promotable (clone) |
| | | single: promotable; clone option |
| | | |
| | | If **true**, clone instances can perform a |
| | | special role that Pacemaker will manage via the |
| | | resource agent's **promote** and **demote** |
| | | actions. The resource agent must support these |
| | | actions. |
| | | Allowed values: **false**, **true** |
+-------------------+-----------------+-------------------------------------------------------+
| promoted-max | 1 | .. index:: |
| | | single: clone; option, promoted-max |
| | | single: option; promoted-max (clone) |
| | | single: promoted-max; clone option |
| | | |
| | | If ``promotable`` is **true**, the number of |
| | | instances that can be promoted at one time |
| | | across the entire cluster |
+-------------------+-----------------+-------------------------------------------------------+
| promoted-node-max | 1 | .. index:: |
| | | single: clone; option, promoted-node-max |
| | | single: option; promoted-node-max (clone) |
| | | single: promoted-node-max; clone option |
| | | |
| | | If ``promotable`` is **true** and ``globally-unique`` |
| | | is **false**, the number of clone instances that |
| | | can be promoted at one time on a single node |
+-------------------+-----------------+-------------------------------------------------------+
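As a rough illustration of the ``clone-max`` row as revised in this change, the
following Python sketch resolves the configured value to an actual instance
limit. The function name ``effective_clone_max`` and its parameters are
hypothetical and are not part of Pacemaker.

```python
def effective_clone_max(clone_max, node_count):
    """Return the instance limit, treating 0 as "number of cluster nodes"."""
    return node_count if clone_max == 0 else clone_max


print(effective_clone_max(0, 3))  # 3: the default follows the cluster size
print(effective_clone_max(2, 3))  # 2: an explicit limit is used as given
```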
.. note:: **Deprecated Terminology**
In older documentation and online examples, you may see promotable clones
referred to as *multi-state*, *stateful*, or *master/slave*; these mean the
same thing as *promotable*. Certain syntax is supported for backward
compatibility, but is deprecated and will be removed in a future version:
* Using a ``master`` tag, instead of a ``clone`` tag with the ``promotable``
meta-attribute set to ``true``
* Using the ``master-max`` meta-attribute instead of ``promoted-max``
* Using the ``master-node-max`` meta-attribute instead of
``promoted-node-max``
* Using ``Master`` as a role name instead of ``Promoted``
* Using ``Slave`` as a role name instead of ``Unpromoted``
Clone Contents
______________
Clones must contain exactly one primitive or group resource.
.. topic:: A clone that runs a web server on all nodes
.. code-block:: xml
.. warning::
You should never reference the name of a clone's child (the primitive or group
resource being cloned). If you think you need to do this, you probably need to
re-evaluate your design.
Clone Instance Attributes
_________________________
Clones have no instance attributes; however, any that are set here will be
inherited by the clone's child.
.. index::
single: clone; constraint
Clone Constraints
_________________
In most cases, a clone will have a single instance on each active cluster
node. If this is not the case, you can indicate which nodes the
cluster should preferentially assign copies to with resource location
constraints. These constraints are written no differently from those
for primitive resources except that the clone's **id** is used.
.. topic:: Some constraints involving clones
.. code-block:: xml
Ordering constraints behave slightly differently for clones. In the
example above, ``apache-stats`` will wait until all copies of ``apache-clone``
that need to be started have done so before being started itself.
Only if *no* copies can be started will ``apache-stats`` be prevented
from being active. Additionally, the clone will wait for
``apache-stats`` to be stopped before stopping itself.
Colocation of a primitive or group resource with a clone means that
the resource can run on any node with an active instance of the clone.
The cluster will choose an instance based on where the clone is running and
the resource's own location preferences.
Colocation between clones is also possible. If one clone **A** is colocated
with another clone **B**, the set of allowed locations for **A** is limited to
nodes on which **B** is (or will be) active. Placement is then performed
normally.
.. index::
single: promotable clone; constraint
.. _promotable-clone-constraints:
Promotable Clone Constraints
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
For promotable clone resources, the ``first-action`` and/or ``then-action`` fields
for ordering constraints may be set to ``promote`` or ``demote`` to constrain the
promoted role, and colocation constraints may contain ``rsc-role`` and/or
``with-rsc-role`` fields.
.. topic:: Constraints involving promotable clone resources
.. code-block:: xml
In the example above, **myApp** will wait until one of the database
copies has been started and promoted before being started
itself on the same node. Only if no copies can be promoted will **myApp** be
prevented from being active. Additionally, the cluster will wait for
**myApp** to be stopped before demoting the database.
Colocation of a primitive or group resource with a promotable clone
resource means that it can run on any node with an active instance of
the promotable clone resource that has the specified role (``Promoted`` or
``Unpromoted``). In the example above, the cluster will choose a location
based on where the database is running in the promoted role, and if there are
multiple promoted instances it will also factor in **myApp**'s own location
preferences when deciding which location to choose.
Colocation with regular clones and other promotable clone resources is also
possible. In such cases, the set of allowed locations for the ``rsc``
clone is (after role filtering) limited to nodes on which the
``with-rsc`` promotable clone resource is (or will be) in the specified role.
Placement is then performed as normal.
Using Promotable Clone Resources in Colocation Sets
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When a promotable clone is used in a :ref:`resource set `
inside a colocation constraint, the resource set may take a ``role`` attribute.
In the following example, an instance of **B** may be promoted only on a node
where **A** is in the promoted role. Additionally, resources **C** and **D**
must be located on a node where both **A** and **B** are promoted.
.. topic:: Colocate C and D with A's and B's promoted instances
.. code-block:: xml
Using Promotable Clone Resources in Ordered Sets
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When a promotable clone is used in a :ref:`resource set `
inside an ordering constraint, the resource set may take an ``action``
attribute.
.. topic:: Start C and D after first promoting A and B
.. code-block:: xml
In the above example, **B** cannot be promoted until **A** has been promoted.
Additionally, resources **C** and **D** must wait until **A** and **B** have
been promoted before they can start.
.. index::
pair: resource-stickiness; clone
.. _s-clone-stickiness:
Clone Stickiness
________________
To achieve a stable allocation pattern, clones are slightly sticky by
default. If no value for ``resource-stickiness`` is provided, the clone
will use a value of 1. Being a small value, it causes minimal
disturbance to the score calculations of other resources but is enough
to prevent Pacemaker from needlessly moving copies around the cluster.
.. note::
For globally unique clones, this may result in multiple instances of the
clone staying on a single node, even after another eligible node becomes
active (for example, after being put into standby mode then made active again).
If you do not want this behavior, specify a ``resource-stickiness`` of 0
for the clone temporarily and let the cluster adjust, then set it back
to 1 if you want the default behavior to apply again.
.. important::
If ``resource-stickiness`` is set in the ``rsc_defaults`` section, it will
apply to clone instances as well. This means an explicit ``resource-stickiness``
of 0 in ``rsc_defaults`` works differently from the implicit default used when
``resource-stickiness`` is not specified.
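The difference between an explicit 0 and the implicit default can be sketched
as follows. This is a simplified model that assumes a value set on the clone
itself takes precedence over ``rsc_defaults``; the function name and parameter
names are hypothetical.

```python
def effective_clone_stickiness(explicit=None, rsc_default=None):
    """Pick the stickiness a clone would use under the rules above.

    explicit:    value set on the clone itself, or None if unset
    rsc_default: value set in rsc_defaults, or None if unset
    """
    if explicit is not None:
        return explicit
    if rsc_default is not None:
        return rsc_default
    return 1  # implicit default that keeps clones slightly sticky


print(effective_clone_stickiness())               # 1
print(effective_clone_stickiness(rsc_default=0))  # 0
```

With nothing configured the clone is slightly sticky, whereas an explicit 0 in
``rsc_defaults`` removes that stickiness.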
Clone Resource Agent Requirements
_________________________________
Any resource can be used as an anonymous clone, as it requires no
additional support from the resource agent. Whether it makes sense to
do so depends on your resource and its resource agent.
Resource Agent Requirements for Globally Unique Clones
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Globally unique clones require additional support in the resource agent. In
particular, it must only respond with ``${OCF_SUCCESS}`` if the node has that
exact instance active. All other probes for instances of the clone should
result in ``${OCF_NOT_RUNNING}`` (or one of the other OCF error codes if
they have failed).
Individual instances of a clone are identified by appending a colon and a
numerical offset, e.g. **apache:2**.
A resource agent can find out how many copies there are by examining
the ``OCF_RESKEY_CRM_meta_clone_max`` environment variable, and which
instance it is by examining ``OCF_RESKEY_CRM_meta_clone``.
The resource agent must not make any assumptions (based on
``OCF_RESKEY_CRM_meta_clone``) about which numerical instances are active. In
particular, the list of active copies will not always be an unbroken
sequence, nor always start at 0.
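The naming convention can be illustrated with a small Python sketch that
splits an instance name into its base name and numerical offset. The helper
name ``split_instance`` is hypothetical, and the sample list is invented to
show that active instances need not be contiguous.

```python
def split_instance(instance_id):
    """Split an instance name such as "apache:2" into ("apache", 2)."""
    name, sep, offset = instance_id.rpartition(":")
    if not sep or not offset.isdigit():
        raise ValueError(f"not a clone instance name: {instance_id!r}")
    return name, int(offset)


# The active list need not be an unbroken sequence or start at 0.
active = "apache:1 apache:4".split()
print([split_instance(i) for i in active])  # [('apache', 1), ('apache', 4)]
```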
Resource Agent Requirements for Promotable Clones
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Promotable clone resources require two extra actions, ``demote`` and ``promote``,
which are responsible for changing the state of the resource. Like **start** and
**stop**, they should return ``${OCF_SUCCESS}`` if they completed successfully or
a relevant error code if they did not.
The states can mean whatever you wish, but when the resource is
started, it must come up in the unpromoted role. From there, the
cluster will decide which instances to promote.
In addition to the clone requirements for monitor actions, agents must
also *accurately* report which state they are in. The cluster relies
on the agent to report its status (including role) accurately and does
not indicate to the agent what role it currently believes it to be in.
.. table:: **Role implications of OCF return codes**
+----------------------+--------------------------------------------------+
| Monitor Return Code | Description |
+======================+==================================================+
| OCF_NOT_RUNNING | .. index:: |
| | single: OCF_NOT_RUNNING |
| | single: OCF return code; OCF_NOT_RUNNING |
| | |
| | Stopped |
+----------------------+--------------------------------------------------+
| OCF_SUCCESS | .. index:: |
| | single: OCF_SUCCESS |
| | single: OCF return code; OCF_SUCCESS |
| | |
| | Running (Unpromoted) |
+----------------------+--------------------------------------------------+
| OCF_RUNNING_PROMOTED | .. index:: |
| | single: OCF_RUNNING_PROMOTED |
| | single: OCF return code; OCF_RUNNING_PROMOTED |
| | |
| | Running (Promoted) |
+----------------------+--------------------------------------------------+
| OCF_FAILED_PROMOTED | .. index:: |
| | single: OCF_FAILED_PROMOTED |
| | single: OCF return code; OCF_FAILED_PROMOTED |
| | |
| | Failed (Promoted) |
+----------------------+--------------------------------------------------+
| Other | .. index:: |
| | single: return code |
| | |
| | Failed (Unpromoted) |
+----------------------+--------------------------------------------------+
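The table can be read as a lookup from monitor return code to role, as in the
following Python sketch. The numeric values are those defined by the OCF
resource agent API rather than by this section, and the function name
``role_from_monitor_rc`` is hypothetical.

```python
# Numeric values as defined by the OCF resource agent API (assumption:
# they are not listed in this section).
OCF_SUCCESS = 0
OCF_NOT_RUNNING = 7
OCF_RUNNING_PROMOTED = 8
OCF_FAILED_PROMOTED = 9

ROLE_BY_RC = {
    OCF_NOT_RUNNING: "Stopped",
    OCF_SUCCESS: "Running (Unpromoted)",
    OCF_RUNNING_PROMOTED: "Running (Promoted)",
    OCF_FAILED_PROMOTED: "Failed (Promoted)",
}


def role_from_monitor_rc(rc):
    """Map a monitor return code to the role it implies."""
    # Any code not listed in the table counts as a failure while unpromoted.
    return ROLE_BY_RC.get(rc, "Failed (Unpromoted)")


print(role_from_monitor_rc(OCF_RUNNING_PROMOTED))  # Running (Promoted)
print(role_from_monitor_rc(1))                     # Failed (Unpromoted)
```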
Clone Notifications
~~~~~~~~~~~~~~~~~~~
If the clone has the ``notify`` meta-attribute set to **true**, and the resource
agent supports the ``notify`` action, Pacemaker will call the action when
appropriate, passing a number of extra variables which, when combined with
additional context, can be used to calculate the current state of the cluster
and what is about to happen to it.
.. index::
single: clone; environment variables
single: notify; environment variables
.. table:: **Environment variables supplied with Clone notify actions**
+----------------------------------------------+-------------------------------------------------------------------------------+
| Variable | Description |
+==============================================+===============================================================================+
| OCF_RESKEY_CRM_meta_notify_type | .. index:: |
| | single: environment variable; OCF_RESKEY_CRM_meta_notify_type |
| | single: OCF_RESKEY_CRM_meta_notify_type |
| | |
| | Allowed values: **pre**, **post** |
+----------------------------------------------+-------------------------------------------------------------------------------+
| OCF_RESKEY_CRM_meta_notify_operation | .. index:: |
| | single: environment variable; OCF_RESKEY_CRM_meta_notify_operation |
| | single: OCF_RESKEY_CRM_meta_notify_operation |
| | |
| | Allowed values: **start**, **stop** |
+----------------------------------------------+-------------------------------------------------------------------------------+
| OCF_RESKEY_CRM_meta_notify_start_resource | .. index:: |
| | single: environment variable; OCF_RESKEY_CRM_meta_notify_start_resource |
| | single: OCF_RESKEY_CRM_meta_notify_start_resource |
| | |
| | Resources to be started |
+----------------------------------------------+-------------------------------------------------------------------------------+
| OCF_RESKEY_CRM_meta_notify_stop_resource | .. index:: |
| | single: environment variable; OCF_RESKEY_CRM_meta_notify_stop_resource |
| | single: OCF_RESKEY_CRM_meta_notify_stop_resource |
| | |
| | Resources to be stopped |
+----------------------------------------------+-------------------------------------------------------------------------------+
| OCF_RESKEY_CRM_meta_notify_active_resource | .. index:: |
| | single: environment variable; OCF_RESKEY_CRM_meta_notify_active_resource |
| | single: OCF_RESKEY_CRM_meta_notify_active_resource |
| | |
| | Resources that are running |
+----------------------------------------------+-------------------------------------------------------------------------------+
| OCF_RESKEY_CRM_meta_notify_inactive_resource | .. index:: |
| | single: environment variable; OCF_RESKEY_CRM_meta_notify_inactive_resource |
| | single: OCF_RESKEY_CRM_meta_notify_inactive_resource |
| | |
| | Resources that are not running |
+----------------------------------------------+-------------------------------------------------------------------------------+
| OCF_RESKEY_CRM_meta_notify_start_uname | .. index:: |
| | single: environment variable; OCF_RESKEY_CRM_meta_notify_start_uname |
| | single: OCF_RESKEY_CRM_meta_notify_start_uname |
| | |
| | Nodes on which resources will be started |
+----------------------------------------------+-------------------------------------------------------------------------------+
| OCF_RESKEY_CRM_meta_notify_stop_uname | .. index:: |
| | single: environment variable; OCF_RESKEY_CRM_meta_notify_stop_uname |
| | single: OCF_RESKEY_CRM_meta_notify_stop_uname |
| | |
| | Nodes on which resources will be stopped |
+----------------------------------------------+-------------------------------------------------------------------------------+
| OCF_RESKEY_CRM_meta_notify_active_uname | .. index:: |
| | single: environment variable; OCF_RESKEY_CRM_meta_notify_active_uname |
| | single: OCF_RESKEY_CRM_meta_notify_active_uname |
| | |
| | Nodes on which resources are running |
+----------------------------------------------+-------------------------------------------------------------------------------+
The variables come in pairs, such as
``OCF_RESKEY_CRM_meta_notify_start_resource`` and
``OCF_RESKEY_CRM_meta_notify_start_uname``, and should be treated as an
array of whitespace-separated elements.
``OCF_RESKEY_CRM_meta_notify_inactive_resource`` is an exception, as the
matching **uname** variable does not exist since inactive resources
are not running on any node.
Thus, in order to indicate that **clone:0** will be started on **sles-1**,
**clone:2** will be started on **sles-3**, and **clone:3** will be started
on **sles-2**, the cluster would set:
.. topic:: Notification variables
.. code-block:: none
OCF_RESKEY_CRM_meta_notify_start_resource="clone:0 clone:2 clone:3"
OCF_RESKEY_CRM_meta_notify_start_uname="sles-1 sles-3 sles-2"
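A resource agent written in Python could pair the two lists as in the sketch
below, which assumes the elements of both variables appear in matching order.
The function name ``pair_notify_lists`` is hypothetical; only the variable
contents come from the example above.

```python
def pair_notify_lists(resources, unames):
    """Pair each whitespace-separated resource with its node name."""
    rsc_list = resources.split()
    node_list = unames.split()
    if len(rsc_list) != len(node_list):
        raise ValueError("resource and uname lists differ in length")
    return dict(zip(rsc_list, node_list))


start_resource = "clone:0 clone:2 clone:3"
start_uname = "sles-1 sles-3 sles-2"

print(pair_notify_lists(start_resource, start_uname))
# {'clone:0': 'sles-1', 'clone:2': 'sles-3', 'clone:3': 'sles-2'}
```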
.. note::
Pacemaker will log but otherwise ignore failures of notify actions.
Interpretation of Notification Variables
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
**Pre-notification (stop):**
* Active resources: ``$OCF_RESKEY_CRM_meta_notify_active_resource``
* Inactive resources: ``$OCF_RESKEY_CRM_meta_notify_inactive_resource``
* Resources to be started: ``$OCF_RESKEY_CRM_meta_notify_start_resource``
* Resources to be stopped: ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
**Post-notification (stop) / Pre-notification (start):**
* Active resources:
* ``$OCF_RESKEY_CRM_meta_notify_active_resource``
* minus ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
* Inactive resources:
* ``$OCF_RESKEY_CRM_meta_notify_inactive_resource``
* plus ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
* Resources that were started: ``$OCF_RESKEY_CRM_meta_notify_start_resource``
* Resources that were stopped: ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
**Post-notification (start):**
* Active resources:
* ``$OCF_RESKEY_CRM_meta_notify_active_resource``
* minus ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
* plus ``$OCF_RESKEY_CRM_meta_notify_start_resource``
* Inactive resources:
* ``$OCF_RESKEY_CRM_meta_notify_inactive_resource``
* plus ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
* minus ``$OCF_RESKEY_CRM_meta_notify_start_resource``
* Resources that were started: ``$OCF_RESKEY_CRM_meta_notify_start_resource``
* Resources that were stopped: ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
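The set arithmetic for the post-start notification can be sketched in Python.
This is a minimal illustration that applies the lists above literally to a
dictionary of environment variables; the helper names ``notify_set`` and
``state_after_start`` and the sample values are hypothetical.

```python
PREFIX = "OCF_RESKEY_CRM_meta_notify_"


def notify_set(env, suffix):
    """Return one notification variable as a set of names."""
    return set(env.get(PREFIX + suffix, "").split())


def state_after_start(env):
    """Compute (active, inactive) as seen in a post-start notification."""
    start = notify_set(env, "start_resource")
    stop = notify_set(env, "stop_resource")
    active = (notify_set(env, "active_resource") - stop) | start
    inactive = (notify_set(env, "inactive_resource") | stop) - start
    return active, inactive


# Invented sample values; a real agent would pass os.environ instead.
sample = {
    PREFIX + "active_resource": "clone:0",
    PREFIX + "inactive_resource": "clone:1 clone:2",
    PREFIX + "start_resource": "clone:1",
    PREFIX + "stop_resource": "clone:0",
}
active, inactive = state_after_start(sample)
print(sorted(active))    # ['clone:1']
print(sorted(inactive))  # ['clone:0', 'clone:2']
```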
Extra Notifications for Promotable Clones
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. index::
single: clone; environment variables
single: promotable; environment variables
.. table:: **Extra environment variables supplied for promotable clones**
+------------------------------------------------+---------------------------------------------------------------------------------+
| Variable | Description |
+================================================+=================================================================================+
| OCF_RESKEY_CRM_meta_notify_promoted_resource | .. index:: |
| | single: environment variable; OCF_RESKEY_CRM_meta_notify_promoted_resource |
| | single: OCF_RESKEY_CRM_meta_notify_promoted_resource |
| | |
| | Resources that are running in the promoted role |
+------------------------------------------------+---------------------------------------------------------------------------------+
| OCF_RESKEY_CRM_meta_notify_unpromoted_resource | .. index:: |
| | single: environment variable; OCF_RESKEY_CRM_meta_notify_unpromoted_resource |
| | single: OCF_RESKEY_CRM_meta_notify_unpromoted_resource |
| | |
| | Resources that are running in the unpromoted role |
+------------------------------------------------+---------------------------------------------------------------------------------+
| OCF_RESKEY_CRM_meta_notify_promote_resource | .. index:: |
| | single: environment variable; OCF_RESKEY_CRM_meta_notify_promote_resource |
| | single: OCF_RESKEY_CRM_meta_notify_promote_resource |
| | |
| | Resources to be promoted |
+------------------------------------------------+---------------------------------------------------------------------------------+
| OCF_RESKEY_CRM_meta_notify_demote_resource | .. index:: |
| | single: environment variable; OCF_RESKEY_CRM_meta_notify_demote_resource |
| | single: OCF_RESKEY_CRM_meta_notify_demote_resource |
| | |
| | Resources to be demoted |
+------------------------------------------------+---------------------------------------------------------------------------------+
| OCF_RESKEY_CRM_meta_notify_promote_uname | .. index:: |
| | single: environment variable; OCF_RESKEY_CRM_meta_notify_promote_uname |
| | single: OCF_RESKEY_CRM_meta_notify_promote_uname |
| | |
| | Nodes on which resources will be promoted |
+------------------------------------------------+---------------------------------------------------------------------------------+
| OCF_RESKEY_CRM_meta_notify_demote_uname | .. index:: |
| | single: environment variable; OCF_RESKEY_CRM_meta_notify_demote_uname |
| | single: OCF_RESKEY_CRM_meta_notify_demote_uname |
| | |
| | Nodes on which resources will be demoted |
+------------------------------------------------+---------------------------------------------------------------------------------+
| OCF_RESKEY_CRM_meta_notify_promoted_uname | .. index:: |
| | single: environment variable; OCF_RESKEY_CRM_meta_notify_promoted_uname |
| | single: OCF_RESKEY_CRM_meta_notify_promoted_uname |
| | |
| | Nodes on which resources are running in the promoted role |
+------------------------------------------------+---------------------------------------------------------------------------------+
| OCF_RESKEY_CRM_meta_notify_unpromoted_uname | .. index:: |
| | single: environment variable; OCF_RESKEY_CRM_meta_notify_unpromoted_uname |
| | single: OCF_RESKEY_CRM_meta_notify_unpromoted_uname |
| | |
| | Nodes on which resources are running in the unpromoted role |
+------------------------------------------------+---------------------------------------------------------------------------------+
Interpretation of Promotable Notification Variables
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
**Pre-notification (demote):**
* Active resources: ``$OCF_RESKEY_CRM_meta_notify_active_resource``
* Promoted resources: ``$OCF_RESKEY_CRM_meta_notify_promoted_resource``
* Unpromoted resources: ``$OCF_RESKEY_CRM_meta_notify_unpromoted_resource``
* Inactive resources: ``$OCF_RESKEY_CRM_meta_notify_inactive_resource``
* Resources to be started: ``$OCF_RESKEY_CRM_meta_notify_start_resource``
* Resources to be promoted: ``$OCF_RESKEY_CRM_meta_notify_promote_resource``
* Resources to be demoted: ``$OCF_RESKEY_CRM_meta_notify_demote_resource``
* Resources to be stopped: ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
**Post-notification (demote) / Pre-notification (stop):**
* Active resources: ``$OCF_RESKEY_CRM_meta_notify_active_resource``
* Promoted resources:
* ``$OCF_RESKEY_CRM_meta_notify_promoted_resource``
* minus ``$OCF_RESKEY_CRM_meta_notify_demote_resource``
* Unpromoted resources: ``$OCF_RESKEY_CRM_meta_notify_unpromoted_resource``
* Inactive resources: ``$OCF_RESKEY_CRM_meta_notify_inactive_resource``
* Resources to be started: ``$OCF_RESKEY_CRM_meta_notify_start_resource``
* Resources to be promoted: ``$OCF_RESKEY_CRM_meta_notify_promote_resource``
* Resources to be demoted: ``$OCF_RESKEY_CRM_meta_notify_demote_resource``
* Resources to be stopped: ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
* Resources that were demoted: ``$OCF_RESKEY_CRM_meta_notify_demote_resource``
**Post-notification (stop) / Pre-notification (start):**
* Active resources:
* ``$OCF_RESKEY_CRM_meta_notify_active_resource``
* minus ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
* Promoted resources:
* ``$OCF_RESKEY_CRM_meta_notify_promoted_resource``
* minus ``$OCF_RESKEY_CRM_meta_notify_demote_resource``
* Unpromoted resources:
* ``$OCF_RESKEY_CRM_meta_notify_unpromoted_resource``
* minus ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
* Inactive resources:
* ``$OCF_RESKEY_CRM_meta_notify_inactive_resource``
* plus ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
* Resources to be started: ``$OCF_RESKEY_CRM_meta_notify_start_resource``
* Resources to be promoted: ``$OCF_RESKEY_CRM_meta_notify_promote_resource``
* Resources to be demoted: ``$OCF_RESKEY_CRM_meta_notify_demote_resource``
* Resources to be stopped: ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
* Resources that were demoted: ``$OCF_RESKEY_CRM_meta_notify_demote_resource``
* Resources that were stopped: ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
**Post-notification (start) / Pre-notification (promote):**
* Active resources:
* ``$OCF_RESKEY_CRM_meta_notify_active_resource``
* minus ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
* plus ``$OCF_RESKEY_CRM_meta_notify_start_resource``
* Promoted resources:
* ``$OCF_RESKEY_CRM_meta_notify_promoted_resource``
* minus ``$OCF_RESKEY_CRM_meta_notify_demote_resource``
* Unpromoted resources:
* ``$OCF_RESKEY_CRM_meta_notify_unpromoted_resource``
* minus ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
* plus ``$OCF_RESKEY_CRM_meta_notify_start_resource``
* Inactive resources:
* ``$OCF_RESKEY_CRM_meta_notify_inactive_resource``
* plus ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
* minus ``$OCF_RESKEY_CRM_meta_notify_start_resource``
* Resources to be started: ``$OCF_RESKEY_CRM_meta_notify_start_resource``
* Resources to be promoted: ``$OCF_RESKEY_CRM_meta_notify_promote_resource``
* Resources to be demoted: ``$OCF_RESKEY_CRM_meta_notify_demote_resource``
* Resources to be stopped: ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
* Resources that were started: ``$OCF_RESKEY_CRM_meta_notify_start_resource``
* Resources that were demoted: ``$OCF_RESKEY_CRM_meta_notify_demote_resource``
* Resources that were stopped: ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
**Post-notification (promote):**
* Active resources:
* ``$OCF_RESKEY_CRM_meta_notify_active_resource``
* minus ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
* plus ``$OCF_RESKEY_CRM_meta_notify_start_resource``
* Promoted resources:
* ``$OCF_RESKEY_CRM_meta_notify_promoted_resource``
* minus ``$OCF_RESKEY_CRM_meta_notify_demote_resource``
* plus ``$OCF_RESKEY_CRM_meta_notify_promote_resource``
* Unpromoted resources:
* ``$OCF_RESKEY_CRM_meta_notify_unpromoted_resource``
* minus ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
* plus ``$OCF_RESKEY_CRM_meta_notify_start_resource``
* minus ``$OCF_RESKEY_CRM_meta_notify_promote_resource``
* Inactive resources:
* ``$OCF_RESKEY_CRM_meta_notify_inactive_resource``
* plus ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
* minus ``$OCF_RESKEY_CRM_meta_notify_start_resource``
* Resources to be started: ``$OCF_RESKEY_CRM_meta_notify_start_resource``
* Resources to be promoted: ``$OCF_RESKEY_CRM_meta_notify_promote_resource``
* Resources to be demoted: ``$OCF_RESKEY_CRM_meta_notify_demote_resource``
* Resources to be stopped: ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
* Resources that were started: ``$OCF_RESKEY_CRM_meta_notify_start_resource``
* Resources that were promoted: ``$OCF_RESKEY_CRM_meta_notify_promote_resource``
* Resources that were demoted: ``$OCF_RESKEY_CRM_meta_notify_demote_resource``
* Resources that were stopped: ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
Monitoring Promotable Clone Resources
_____________________________________
The usual monitor actions are insufficient to monitor a promotable clone
resource, because Pacemaker needs to verify not only that the resource is
active, but also that its actual role matches its intended one.
Define two monitoring actions: the usual one will cover the unpromoted role,
and an additional one with ``role="Promoted"`` will cover the promoted role.
.. topic:: Monitoring both states of a promotable clone resource
.. code-block:: xml
.. important::
It is crucial that *every* monitor operation has a different interval!
Pacemaker currently differentiates between operations
only by resource and interval; so if (for example) a promotable clone resource
had the same monitor interval for both roles, Pacemaker would ignore the
role when checking the status -- which would cause unexpected return
codes, and therefore unnecessary complications.
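
As a minimal sketch of the two-monitor arrangement described above, the
operations might be configured as follows. The ids, the ``myCorp``/``myApp``
agent, and the interval values are hypothetical placeholders chosen for
illustration; only the distinct intervals and the ``role`` attribute matter here.

.. code-block:: xml

   <!-- Hypothetical ids, agent, and intervals, for illustration only -->
   <clone id="example-promotable">
     <meta_attributes id="example-promotable-meta">
       <nvpair id="example-promotable-meta-promotable" name="promotable" value="true"/>
     </meta_attributes>
     <primitive id="example-rsc" class="ocf" provider="myCorp" type="myApp">
       <operations>
         <!-- No role given: this monitor covers the unpromoted role -->
         <op id="example-rsc-monitor-unpromoted" name="monitor" interval="60s"/>
         <!-- Different interval, explicitly tied to the promoted role -->
         <op id="example-rsc-monitor-promoted" name="monitor" interval="61s" role="Promoted"/>
       </operations>
     </primitive>
   </clone>
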
.. _s-promotion-scores:
Determining Which Instance is Promoted
______________________________________
Pacemaker can choose a promotable clone instance to be promoted in one of two
ways:
* Promotion scores: These are node attributes set via the ``crm_attribute``
command using the ``--promotion`` option, which generally would be called by
the resource agent's start action if it supports promotable clones. This tool
automatically detects both the resource and host, and should be used to set a
preference for being promoted. Based on this, ``promoted-max``, and
``promoted-node-max``, the instance(s) with the highest preference will be
promoted.
* Constraints: Location constraints can indicate which nodes are most preferred
to be promoted.
.. topic:: Explicitly preferring node1 to be promoted
.. code-block:: xml
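
A location constraint of this kind could look roughly like the following
sketch. The ids, the score of 100, and the resource name ``example-promotable``
are hypothetical; the rule applies its score only to the promoted role, and
only on the node whose name is **node1**.

.. code-block:: xml

   <!-- Hypothetical ids and resource name, for illustration only -->
   <rsc_location id="example-promoted-location" rsc="example-promotable">
     <rule id="example-promoted-rule" score="100" role="Promoted">
       <expression id="example-promoted-expr" attribute="#uname" operation="eq" value="node1"/>
     </rule>
   </rsc_location>
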
.. index::
single: bundle resource
single: resource; bundle
pair: container; Docker
pair: container; podman
pair: container; rkt
.. _s-resource-bundle:
-Bundles - Isolated Environments
-###############################
+Bundles - Containerized Resources
+#################################
-Pacemaker supports a special syntax for launching a
+Pacemaker supports a special syntax for launching a service inside a
`container `_
with any infrastructure it requires: the *bundle*.
Pacemaker bundles support `Docker `_,
`podman `_ *(since 2.0.1)*, and
`rkt `_ container technologies. [#]_
.. topic:: A bundle for a containerized web server
.. code-block:: xml
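
For orientation, a bundle of this kind might be written along the following
lines. This is a sketch under stated assumptions, not a tested configuration:
the image name, IP address, host interface, and all ids are hypothetical, and
the image is assumed to contain both the web server and Pacemaker Remote.

.. code-block:: xml

   <!-- Hypothetical image, addresses, and ids, for illustration only -->
   <bundle id="httpd-bundle">
     <podman image="example/httpd:latest" replicas="3"/>
     <network ip-range-start="192.168.122.131" host-netmask="24" host-interface="eth0">
       <port-mapping id="httpd-port" port="80"/>
     </network>
     <primitive id="httpd" class="ocf" provider="heartbeat" type="apache"/>
   </bundle>
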
.. index::
single: bundle resource
single: resource; bundle
Bundle Prerequisites
____________________
Before configuring a bundle in Pacemaker, the user must install the appropriate
container launch technology (Docker, podman, or rkt), and supply a fully
configured container image, on every node allowed to run the bundle.
Pacemaker will create an implicit resource of type **ocf:heartbeat:docker**,
**ocf:heartbeat:podman**, or **ocf:heartbeat:rkt** to manage a bundle's
container. The user must ensure that the appropriate resource agent is
installed on every node allowed to run the bundle.
.. index::
pair: XML element; bundle
Bundle Properties
_________________
.. table:: **XML Attributes of a bundle Element**
+-------------+-----------------------------------------------+
| Attribute | Description |
+=============+===============================================+
| id | .. index:: |
| | single: bundle; attribute, id |
| | single: attribute; id (bundle) |
| | single: id; bundle attribute |
| | |
| | A unique name for the bundle (required) |
+-------------+-----------------------------------------------+
| description | .. index:: |
| | single: bundle; attribute, description |
| | single: attribute; description (bundle) |
| | single: description; bundle attribute |
| | |
| | Arbitrary text (not used by Pacemaker) |
+-------------+-----------------------------------------------+
A bundle must contain exactly one ``docker``, ``podman``, or ``rkt`` element.
.. index::
pair: XML element; docker
pair: XML element; podman
pair: XML element; rkt
single: resource; bundle
Bundle Container Properties
___________________________
.. table:: **XML attributes of a docker, podman, or rkt Element**
+-------------------+------------------------------------+---------------------------------------------------+
| Attribute | Default | Description |
+===================+====================================+===================================================+
| image | | .. index:: |
| | | single: docker; attribute, image |
| | | single: attribute; image (docker) |
| | | single: image; docker attribute |
| | | single: podman; attribute, image |
| | | single: attribute; image (podman) |
| | | single: image; podman attribute |
| | | single: rkt; attribute, image |
| | | single: attribute; image (rkt) |
| | | single: image; rkt attribute |
| | | |
| | | Container image tag (required) |
+-------------------+------------------------------------+---------------------------------------------------+
| replicas | Value of ``promoted-max`` | .. index:: |
| | if that is positive, else 1 | single: docker; attribute, replicas |
| | | single: attribute; replicas (docker) |
| | | single: replicas; docker attribute |
| | | single: podman; attribute, replicas |
| | | single: attribute; replicas (podman) |
| | | single: replicas; podman attribute |
| | | single: rkt; attribute, replicas |
| | | single: attribute; replicas (rkt) |
| | | single: replicas; rkt attribute |
| | | |
| | | A positive integer specifying the number of |
| | | container instances to launch |
+-------------------+------------------------------------+---------------------------------------------------+
| replicas-per-host | 1 | .. index:: |
| | | single: docker; attribute, replicas-per-host |
| | | single: attribute; replicas-per-host (docker) |
| | | single: replicas-per-host; docker attribute |
| | | single: podman; attribute, replicas-per-host |
| | | single: attribute; replicas-per-host (podman) |
| | | single: replicas-per-host; podman attribute |
| | | single: rkt; attribute, replicas-per-host |
| | | single: attribute; replicas-per-host (rkt) |
| | | single: replicas-per-host; rkt attribute |
| | | |
| | | A positive integer specifying the number of |
| | | container instances allowed to run on a |
| | | single node |
+-------------------+------------------------------------+---------------------------------------------------+
| promoted-max | 0 | .. index:: |
| | | single: docker; attribute, promoted-max |
| | | single: attribute; promoted-max (docker) |
| | | single: promoted-max; docker attribute |
| | | single: podman; attribute, promoted-max |
| | | single: attribute; promoted-max (podman) |
| | | single: promoted-max; podman attribute |
| | | single: rkt; attribute, promoted-max |
| | | single: attribute; promoted-max (rkt) |
| | | single: promoted-max; rkt attribute |
| | | |
| | | A non-negative integer that, if positive, |
| | | indicates that the containerized service |
| | | should be treated as a promotable service, |
| | | with this many replicas allowed to run the |
| | | service in the promoted role |
+-------------------+------------------------------------+---------------------------------------------------+
| network | | .. index:: |
| | | single: docker; attribute, network |
| | | single: attribute; network (docker) |
| | | single: network; docker attribute |
| | | single: podman; attribute, network |
| | | single: attribute; network (podman) |
| | | single: network; podman attribute |
| | | single: rkt; attribute, network |
| | | single: attribute; network (rkt) |
| | | single: network; rkt attribute |
| | | |
| | | If specified, this will be passed to the |
| | | ``docker run``, ``podman run``, or |
| | | ``rkt run`` command as the network setting |
| | | for the container. |
+-------------------+------------------------------------+---------------------------------------------------+
| run-command | ``/usr/sbin/pacemaker-remoted`` if | .. index:: |
| | bundle contains a **primitive**, | single: docker; attribute, run-command |
| | otherwise none | single: attribute; run-command (docker) |
| | | single: run-command; docker attribute |
| | | single: podman; attribute, run-command |
| | | single: attribute; run-command (podman) |
| | | single: run-command; podman attribute |
| | | single: rkt; attribute, run-command |
| | | single: attribute; run-command (rkt) |
| | | single: run-command; rkt attribute |
| | | |
| | | This command will be run inside the container |
| | | when launching it ("PID 1"). If the bundle |
| | | contains a **primitive**, this command *must* |
| | | start ``pacemaker-remoted`` (but could, for |
| | | example, be a script that does other stuff, too). |
+-------------------+------------------------------------+---------------------------------------------------+
| options | | .. index:: |
| | | single: docker; attribute, options |
| | | single: attribute; options (docker) |
| | | single: options; docker attribute |
| | | single: podman; attribute, options |
| | | single: attribute; options (podman) |
| | | single: options; podman attribute |
| | | single: rkt; attribute, options |
| | | single: attribute; options (rkt) |
| | | single: options; rkt attribute |
| | | |
| | | Extra command-line options to pass to the |
| | | ``docker run``, ``podman run``, or ``rkt run`` |
| | | command |
+-------------------+------------------------------------+---------------------------------------------------+
.. note::
Considerations when using cluster configurations or container images from
Pacemaker 1.1:
* If the container image has a pre-2.0.0 version of Pacemaker, set ``run-command``
to ``/usr/sbin/pacemaker_remoted`` (note the underbar instead of dash).
* ``masters`` is accepted as an alias for ``promoted-max``, but is deprecated since
2.0.0, and support for it will be removed in a future version.
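
The container attributes in the table above can be combined as in the
following fragment. It is only a sketch: the bundle id and image name are
hypothetical, and the ``network`` and ``primitive`` elements that a promotable
bundle would also need are omitted for brevity.

.. code-block:: xml

   <!-- Hypothetical id and image; network and primitive elements omitted -->
   <bundle id="example-db-bundle">
     <!-- Three replicas, at most one per node, one of them promoted -->
     <docker image="example/db:latest"
             replicas="3"
             replicas-per-host="1"
             promoted-max="1"
             run-command="/usr/sbin/pacemaker-remoted"/>
   </bundle>
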
Bundle Network Properties
_________________________
A bundle may optionally contain one ``network`` element.
.. index::
pair: XML element; network
single: resource; bundle
single: bundle; networking
.. table:: **XML attributes of a network Element**
+----------------+---------+------------------------------------------------------------+
| Attribute | Default | Description |
+================+=========+============================================================+
| add-host | TRUE | .. index:: |
| | | single: network; attribute, add-host |
| | | single: attribute; add-host (network) |
| | | single: add-host; network attribute |
| | | |
| | | If TRUE, and ``ip-range-start`` is used, Pacemaker will |
| | | automatically ensure that ``/etc/hosts`` inside the |
| | | containers has entries for each |
| | | :ref:`replica name <s-resource-bundle-note-replica-names>` |
| | | and its assigned IP. |
+----------------+---------+------------------------------------------------------------+
| ip-range-start | | .. index:: |
| | | single: network; attribute, ip-range-start |
| | | single: attribute; ip-range-start (network) |
| | | single: ip-range-start; network attribute |
| | | |
| | | If specified, Pacemaker will create an implicit |
| | | ``ocf:heartbeat:IPaddr2`` resource for each container |
| | | instance, starting with this IP address, using up to |
| | | ``replicas`` sequential addresses. These addresses can be |
| | | used from the host's network to reach the service inside |
| | | the container, though they are not visible within the |
| | | container itself. Only IPv4 addresses are currently |
| | | supported. |
+----------------+---------+------------------------------------------------------------+
| host-netmask | 32 | .. index:: |
| | | single: network; attribute, host-netmask |
| | | single: attribute; host-netmask (network) |
| | | single: host-netmask; network attribute |
| | | |
| | | If ``ip-range-start`` is specified, the IP addresses |
| | | are created with this CIDR netmask (as a number of bits). |
+----------------+---------+------------------------------------------------------------+
| host-interface | | .. index:: |
| | | single: network; attribute, host-interface |
| | | single: attribute; host-interface (network) |
| | | single: host-interface; network attribute |
| | | |
| | | If ``ip-range-start`` is specified, the IP addresses are |
| | | created on this host interface (by default, it will be |
| | | determined from the IP address). |
+----------------+---------+------------------------------------------------------------+
| control-port | 3121 | .. index:: |
| | | single: network; attribute, control-port |
| | | single: attribute; control-port (network) |
| | | single: control-port; network attribute |
| | | |
| | | If the bundle contains a ``primitive``, the cluster will |
| | | use this integer TCP port for communication with |
| | | Pacemaker Remote inside the container. Changing this is |
| | | useful when the container is unable to listen on the |
| | | default port, for example, when the container uses the |
| | | host's network rather than ``ip-range-start`` (in which |
| | | case ``replicas-per-host`` must be 1), or when the bundle |
| | | may run on a Pacemaker Remote node that is already |
| | | listening on the default port. Any ``PCMK_remote_port`` |
| | | environment variable set on the host or in the container |
| | | is ignored for bundle connections. |
+----------------+---------+------------------------------------------------------------+
.. _s-resource-bundle-note-replica-names:
.. note::
Replicas are named by the bundle id plus a dash and an integer counter starting
with zero. For example, if a bundle named **httpd-bundle** has **replicas=2**, its
containers will be named **httpd-bundle-0** and **httpd-bundle-1**.
.. index::
pair: XML element; port-mapping
Additionally, a ``network`` element may optionally contain one or more
``port-mapping`` elements.
.. table:: **Attributes of a port-mapping Element**
+---------------+-------------------+------------------------------------------------------+
| Attribute | Default | Description |
+===============+===================+======================================================+
| id | | .. index:: |
| | | single: port-mapping; attribute, id |
| | | single: attribute; id (port-mapping) |
| | | single: id; port-mapping attribute |
| | | |
| | | A unique name for the port mapping (required) |
+---------------+-------------------+------------------------------------------------------+
| port | | .. index:: |
| | | single: port-mapping; attribute, port |
| | | single: attribute; port (port-mapping) |
| | | single: port; port-mapping attribute |
| | | |
| | | If this is specified, connections to this TCP port |
| | | number on the host network (on the container's |
| | | assigned IP address, if ``ip-range-start`` is |
| | | specified) will be forwarded to the container |
| | | network. Exactly one of ``port`` or ``range`` |
| | | must be specified in a ``port-mapping``. |
+---------------+-------------------+------------------------------------------------------+
| internal-port | value of ``port`` | .. index:: |
| | | single: port-mapping; attribute, internal-port |
| | | single: attribute; internal-port (port-mapping) |
| | | single: internal-port; port-mapping attribute |
| | | |
| | | If ``port`` and this are specified, connections |
| | | to ``port`` on the host's network will be |
| | | forwarded to this port on the container network. |
+---------------+-------------------+------------------------------------------------------+
| range | | .. index:: |
| | | single: port-mapping; attribute, range |
| | | single: attribute; range (port-mapping) |
| | | single: range; port-mapping attribute |
| | | |
| | | If this is specified, connections to these TCP |
| | | port numbers (expressed as *first_port*-*last_port*) |
| | | on the host network (on the container's assigned IP |
| | | address, if ``ip-range-start`` is specified) will |
| | | be forwarded to the same ports in the container |
| | | network. Exactly one of ``port`` or ``range`` |
| | | must be specified in a ``port-mapping``. |
+---------------+-------------------+------------------------------------------------------+
.. note::
If the bundle contains a ``primitive``, Pacemaker will automatically map the
``control-port``, so it is not necessary to specify that port in a
``port-mapping``.
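
To illustrate how ``port``, ``internal-port``, and ``range`` fit together, a
``network`` element might be sketched as follows. The addresses, interface
name, port numbers, and ids are hypothetical values chosen for the example.

.. code-block:: xml

   <!-- Hypothetical addresses, ports, and ids, for illustration only -->
   <network ip-range-start="192.168.122.131" host-netmask="24" host-interface="eth0">
     <!-- Host port 8080 is forwarded to port 80 inside the container -->
     <port-mapping id="example-http" port="8080" internal-port="80"/>
     <!-- Ports 9000 through 9010 are forwarded to the same ports inside -->
     <port-mapping id="example-range" range="9000-9010"/>
   </network>

Each ``port-mapping`` here specifies exactly one of ``port`` or ``range``, as
the table requires.
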
.. index::
pair: XML element; storage
pair: XML element; storage-mapping
single: resource; bundle
.. _s-bundle-storage:
Bundle Storage Properties
_________________________
A bundle may optionally contain one ``storage`` element. A ``storage`` element
has no properties of its own, but may contain one or more ``storage-mapping``
elements.
.. table:: **Attributes of a storage-mapping Element**
+-----------------+---------+-------------------------------------------------------------+
| Attribute | Default | Description |
+=================+=========+=============================================================+
| id | | .. index:: |
| | | single: storage-mapping; attribute, id |
| | | single: attribute; id (storage-mapping) |
| | | single: id; storage-mapping attribute |
| | | |
| | | A unique name for the storage mapping (required) |
+-----------------+---------+-------------------------------------------------------------+
| source-dir | | .. index:: |
| | | single: storage-mapping; attribute, source-dir |
| | | single: attribute; source-dir (storage-mapping) |
| | | single: source-dir; storage-mapping attribute |
| | | |
| | | The absolute path on the host's filesystem that will be |
| | | mapped into the container. Exactly one of ``source-dir`` |
| | | and ``source-dir-root`` must be specified in a |
| | | ``storage-mapping``. |
+-----------------+---------+-------------------------------------------------------------+
| source-dir-root | | .. index:: |
| | | single: storage-mapping; attribute, source-dir-root |
| | | single: attribute; source-dir-root (storage-mapping) |
| | | single: source-dir-root; storage-mapping attribute |
| | | |
| | | The start of a path on the host's filesystem that will |
| | | be mapped into the container, using a different |
| | | subdirectory on the host for each container instance. |
| | | The subdirectory will be named the same as the |
| | | :ref:`replica name <s-resource-bundle-note-replica-names>`. |
| | | Exactly one of ``source-dir`` and ``source-dir-root`` |
| | | must be specified in a ``storage-mapping``. |
+-----------------+---------+-------------------------------------------------------------+
| target-dir | | .. index:: |
| | | single: storage-mapping; attribute, target-dir |
| | | single: attribute; target-dir (storage-mapping) |
| | | single: target-dir; storage-mapping attribute |
| | | |
| | | The path name within the container where the host |
| | | storage will be mapped (required) |
+-----------------+---------+-------------------------------------------------------------+
| options | | .. index:: |
| | | single: storage-mapping; attribute, options |
| | | single: attribute; options (storage-mapping) |
| | | single: options; storage-mapping attribute |
| | | |
| | | A comma-separated list of file system mount |
| | | options to use when mapping the storage |
+-----------------+---------+-------------------------------------------------------------+
.. note::
Pacemaker does not define the behavior if the source directory does not already
exist on the host. However, it is expected that the container technology and/or
its resource agent will create the source directory in that case.
.. note::
If the bundle contains a ``primitive``,
Pacemaker will automatically map the equivalent of
``source-dir=/etc/pacemaker/authkey target-dir=/etc/pacemaker/authkey``
and ``source-dir-root=/var/log/pacemaker/bundles target-dir=/var/log`` into the
container, so it is not necessary to specify those paths in a
``storage-mapping``.
.. important::
The ``PCMK_authkey_location`` environment variable must not be set to anything
other than the default of ``/etc/pacemaker/authkey`` on any node in the cluster.
.. important::
If SELinux is used in enforcing mode on the host, you must ensure the container
is allowed to use any storage you mount into it. For Docker and podman bundles,
adding "Z" to the mount options will create a container-specific label for the
mount that allows the container access.
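
The difference between ``source-dir`` and ``source-dir-root`` can be seen in a
sketch such as the one below. All paths and ids are hypothetical. Assuming a
bundle named **httpd-bundle**, the second mapping would give replica
**httpd-bundle-0** the host directory ``/srv/example/state/httpd-bundle-0``,
while the first mapping shares one host directory among all replicas.

.. code-block:: xml

   <!-- Hypothetical paths and ids, for illustration only -->
   <storage>
     <!-- Same host directory for every replica; "Z" relabels it for SELinux -->
     <storage-mapping id="example-content"
                      source-dir="/srv/html"
                      target-dir="/var/www/html"
                      options="rw,Z"/>
     <!-- One host subdirectory per replica, named after the replica -->
     <storage-mapping id="example-state"
                      source-dir-root="/srv/example/state"
                      target-dir="/var/lib/example"
                      options="rw,Z"/>
   </storage>
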
.. index::
single: resource; bundle
Bundle Primitive
________________
A bundle may optionally contain one :ref:`primitive `
resource. The primitive may have operations, instance attributes, and
meta-attributes defined, as usual.
If a bundle contains a primitive resource, the container image must include
the Pacemaker Remote daemon, and at least one of ``ip-range-start`` or
``control-port`` must be configured in the bundle. Pacemaker will create an
implicit **ocf:pacemaker:remote** resource for the connection, launch
Pacemaker Remote within the container, and monitor and manage the primitive
resource via Pacemaker Remote.
If the bundle has more than one container instance (replica), the primitive
resource will function as an implicit :ref:`clone ` -- a
:ref:`promotable clone ` if the bundle has ``promoted-max``
greater than zero.
.. note::
If you want to pass environment variables to a bundle's Pacemaker Remote
connection or primitive, you have two options:
* Environment variables whose value is the same regardless of the underlying host
may be set using the container element's ``options`` attribute.
* If you want variables to have host-specific values, you can use the
:ref:`storage-mapping <s-bundle-storage>` element to map a file on the host as
``/etc/pacemaker/pcmk-init.env`` in the container *(since 2.0.3)*.
Pacemaker Remote will parse this file as a shell-like format, with
variables set as NAME=VALUE, ignoring blank lines and comments starting
with "#".
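
Both options could be sketched as follows. The ids, image name, variable name,
and host path are hypothetical, and the ``network`` and ``primitive`` elements
are omitted. The example assumes that the container technology accepts ``-e
NAME=VALUE`` on its ``run`` command line, as Docker and podman do.

.. code-block:: xml

   <!-- Hypothetical names and paths; network and primitive elements omitted -->
   <bundle id="example-bundle">
     <!-- Same value on every host: pass the variable on the run command line -->
     <podman image="example/app:latest" options="-e EXAMPLE_MODE=production"/>
     <storage>
       <!-- Host-specific values: map a per-host file into the container -->
       <storage-mapping id="example-env"
                        source-dir="/etc/example/pcmk-init.env"
                        target-dir="/etc/pacemaker/pcmk-init.env"
                        options="ro"/>
     </storage>
   </bundle>
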
.. important::
When a bundle has a ``primitive``, Pacemaker on all cluster nodes must be able to
contact Pacemaker Remote inside the bundle's containers.
* The containers must have an accessible network (for example, ``network`` should
not be set to "none" with a ``primitive``).
* The default, using a distinct network space inside the container, works in
combination with ``ip-range-start``. Any firewall must allow access from all
cluster nodes to the ``control-port`` on the container IPs.
* If the container shares the host's network space (for example, by setting
``network`` to "host"), a unique ``control-port`` should be specified for each
bundle. Any firewall must allow access from all cluster nodes to the
``control-port`` on all cluster and remote node IPs.
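
For the shared-network case, a bundle might be sketched as follows. The ids,
image name, and port number 3122 are hypothetical; the point is only that
``control-port`` is set to a value other than the default and differs from any
other bundle's port.

.. code-block:: xml

   <!-- Hypothetical ids, image, and port number, for illustration only -->
   <bundle id="example-hostnet-bundle">
     <!-- The container shares the host's network space -->
     <podman image="example/app:latest" network="host"/>
     <!-- No ip-range-start, so a distinct control-port identifies this bundle -->
     <network control-port="3122"/>
     <primitive id="example-app" class="ocf" provider="heartbeat" type="apache"/>
   </bundle>
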
.. index::
single: resource; bundle
.. _s-bundle-attributes:
Bundle Node Attributes
______________________
If the bundle has a ``primitive``, the primitive's resource agent may want to set
node attributes such as :ref:`promotion scores <s-promotion-scores>`. However, with
containers, it is not apparent which node should get the attribute.
If the container uses shared storage that is the same no matter which node the
container is hosted on, then it is appropriate to use the promotion score on the
bundle node itself.
On the other hand, if the container uses storage exported from the underlying host,
then it may be more appropriate to use the promotion score on the underlying host.
Since this depends on the particular situation, the
``container-attribute-target`` resource meta-attribute allows the user to specify
which approach to use. If it is set to ``host``, then user-defined node attributes
will be checked on the underlying host. If it is anything else, the local node
(in this case the bundle node) is used as usual.
This only applies to user-defined attributes; the cluster will always check the
local node for cluster-defined attributes such as ``#uname``.
If ``container-attribute-target`` is ``host``, the cluster will pass additional
environment variables to the primitive's resource agent that allow it to set
node attributes appropriately: ``CRM_meta_container_attribute_target`` (identical
to the meta-attribute value) and ``CRM_meta_physical_host`` (the name of the
underlying host).
.. note::
When called by a resource agent, the ``attrd_updater`` and ``crm_attribute``
commands will automatically check those environment variables and set
attributes appropriately.
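
As a sketch of how this meta-attribute might be set, the fragment below places
it on the bundle's primitive. The ids and the ``myCorp``/``myDb`` agent are
hypothetical, and setting it on the primitive rather than on the bundle is an
assumption made for the example.

.. code-block:: xml

   <!-- Hypothetical ids and agent, for illustration only -->
   <primitive id="example-db" class="ocf" provider="myCorp" type="myDb">
     <meta_attributes id="example-db-meta">
       <!-- User-defined node attributes are checked on the underlying host -->
       <nvpair id="example-db-meta-target" name="container-attribute-target" value="host"/>
     </meta_attributes>
   </primitive>
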
.. index::
single: resource; bundle
Bundle Meta-Attributes
______________________
Any meta-attribute set on a bundle will be inherited by the bundle's
primitive and any resources implicitly created by Pacemaker for the bundle.
This includes options such as ``priority``, ``target-role``, and ``is-managed``. See
:ref:`resource_options` for more information.
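
For example, a bundle could be kept stopped with a sketch like the following.
The ids and image name are hypothetical, and the placement of
``meta_attributes`` directly under ``bundle`` is assumed to be accepted by the
schema of the Pacemaker version in use.

.. code-block:: xml

   <!-- Hypothetical ids and image, for illustration only -->
   <bundle id="example-bundle">
     <meta_attributes id="example-bundle-meta">
       <!-- Inherited by the primitive and the implicit resources -->
       <nvpair id="example-bundle-meta-target-role" name="target-role" value="Stopped"/>
     </meta_attributes>
     <podman image="example/app:latest"/>
   </bundle>
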
Limitations of Bundles
______________________
Restarting Pacemaker while a bundle is unmanaged or the cluster is in
maintenance mode may cause the bundle to fail.
Bundles may not be explicitly cloned or included in groups. This includes the
bundle's primitive and any resources implicitly created by Pacemaker for the
bundle. (If ``replicas`` is greater than 1, the bundle will behave like a clone
implicitly.)
Bundles do not have instance attributes, utilization attributes, or operations,
though a bundle's primitive may have them.
A bundle with a primitive can run on a Pacemaker Remote node only if the bundle
uses a distinct ``control-port``.
.. [#] Of course, the service must support running multiple instances.
.. [#] Docker is a trademark of Docker, Inc. No endorsement by or association with
Docker, Inc. is implied.
diff --git a/doc/sphinx/Pacemaker_Explained/constraints.rst b/doc/sphinx/Pacemaker_Explained/constraints.rst
index 8722f81866..b5b9f8b144 100644
--- a/doc/sphinx/Pacemaker_Explained/constraints.rst
+++ b/doc/sphinx/Pacemaker_Explained/constraints.rst
@@ -1,1061 +1,1061 @@
.. index::
single: constraint
single: resource; constraint
.. _constraints:
Resource Constraints
--------------------
.. index::
single: resource; score
single: node; score
Scores
######
Scores of all kinds are integral to how the cluster works.
Practically everything from moving a resource to deciding which
resource to stop in a degraded cluster is achieved by manipulating
scores in some way.
Scores are calculated per resource and node. Any node with a
negative score for a resource can't run that resource. The cluster
places a resource on the node with the highest score for it.
Infinity Math
_____________
Pacemaker implements **INFINITY** (or equivalently, **+INFINITY**) internally as a
score of 1,000,000. Addition and subtraction with it follow these three basic
rules:
* Any value + **INFINITY** = **INFINITY**
* Any value - **INFINITY** = **-INFINITY**
* **INFINITY** - **INFINITY** = **-INFINITY**
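
The third rule can be illustrated with a pair of constraints on the same
resource and node. This is a sketch with hypothetical ids, resource name, and
node name: the two scores combine as **INFINITY** - **INFINITY** =
**-INFINITY**, so the resource cannot run on that node.

.. code-block:: xml

   <!-- Hypothetical ids, resource, and node, for illustration only -->
   <constraints>
     <rsc_location id="example-prefer-node1" rsc="Webserver" node="node1" score="INFINITY"/>
     <rsc_location id="example-avoid-node1" rsc="Webserver" node="node1" score="-INFINITY"/>
   </constraints>
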
.. note::
What if you want to use a score higher than 1,000,000? Typically this possibility
arises when someone wants to base the score on some external metric that might
go above 1,000,000.
The short answer is you can't.
The long answer is it is sometimes possible to work around this limitation
creatively. You may be able to set the score to some computed value based on
the external metric rather than use the metric directly. For nodes, you can
store the metric as a node attribute, and query the attribute when computing
the score (possibly as part of a custom resource agent).
.. _location-constraint:
.. index::
single: location constraint
single: constraint; location
Deciding Which Nodes a Resource Can Run On
##########################################
*Location constraints* tell the cluster which nodes a resource can run on.
There are two alternative strategies. One way is to say that, by default,
resources can run anywhere, and then the location constraints specify nodes
that are not allowed (an *opt-out* cluster). The other way is to start with
nothing able to run anywhere, and use location constraints to selectively
enable allowed nodes (an *opt-in* cluster).
Whether you should choose opt-in or opt-out depends on your
personal preference and the make-up of your cluster. If most of your
resources can run on most of the nodes, then an opt-out arrangement is
likely to result in a simpler configuration. On the other hand, if
most resources can only run on a small subset of nodes, an opt-in
configuration might be simpler.
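
As a rough sketch of the opt-in strategy, the fragment below disables the
default of running anywhere and then enables one node for one resource. It
relies on the ``symmetric-cluster`` cluster option, which is not described in
this section, and the ids, resource name, node name, and score are hypothetical.

.. code-block:: xml

   <!-- Hypothetical ids, resource, node, and score, for illustration only -->
   <configuration>
     <crm_config>
       <cluster_property_set id="example-options">
         <!-- Opt-in: nothing runs anywhere unless a constraint allows it -->
         <nvpair id="example-symmetric" name="symmetric-cluster" value="false"/>
       </cluster_property_set>
     </crm_config>
     <constraints>
       <!-- Selectively enable node1 for the Webserver resource -->
       <rsc_location id="example-allow-node1" rsc="Webserver" node="node1" score="200"/>
     </constraints>
   </configuration>

In an opt-out cluster, the same ``rsc_location`` element would instead carry a
score of **-INFINITY** for each node the resource must avoid.
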
.. index::
pair: XML element; rsc_location
single: constraint; rsc_location
Location Properties
___________________
.. table:: **Attributes of a rsc_location Element**
+--------------------+---------+----------------------------------------------------------------------------------------------+
| Attribute | Default | Description |
+====================+=========+==============================================================================================+
| id | | .. index:: |
| | | single: rsc_location; attribute, id |
| | | single: attribute; id (rsc_location) |
| | | single: id; rsc_location attribute |
| | | |
| | | A unique name for the constraint (required) |
+--------------------+---------+----------------------------------------------------------------------------------------------+
| rsc | | .. index:: |
| | | single: rsc_location; attribute, rsc |
| | | single: attribute; rsc (rsc_location) |
| | | single: rsc; rsc_location attribute |
| | | |
| | | The name of the resource to which this constraint |
| | | applies. A location constraint must either have a |
| | | ``rsc``, have a ``rsc-pattern``, or contain at |
| | | least one resource set. |
+--------------------+---------+----------------------------------------------------------------------------------------------+
| rsc-pattern | | .. index:: |
| | | single: rsc_location; attribute, rsc-pattern |
| | | single: attribute; rsc-pattern (rsc_location) |
| | | single: rsc-pattern; rsc_location attribute |
| | | |
| | | A pattern matching the names of resources to which |
| | | this constraint applies. The syntax is the same as |
| | | `POSIX `_ |
| | | extended regular expressions, with the addition of an |
| | | initial *!* indicating that resources *not* matching |
| | | the pattern are selected. If the regular expression |
| | | contains submatches, and the constraint is governed by |
| | | a :ref:`rule `, the submatches can be |
| | | referenced as **%1** through **%9** in the rule's |
| | | ``score-attribute`` or a rule expression's ``attribute``. |
| | | A location constraint must either have a ``rsc``, have a |
| | | ``rsc-pattern``, or contain at least one resource set. |
+--------------------+---------+----------------------------------------------------------------------------------------------+
| node | | .. index:: |
| | | single: rsc_location; attribute, node |
| | | single: attribute; node (rsc_location) |
| | | single: node; rsc_location attribute |
| | | |
| | | The name of the node to which this constraint applies. |
| | | A location constraint must either have a ``node`` and |
| | | ``score``, or contain at least one rule. |
+--------------------+---------+----------------------------------------------------------------------------------------------+
| score | | .. index:: |
| | | single: rsc_location; attribute, score |
| | | single: attribute; score (rsc_location) |
| | | single: score; rsc_location attribute |
| | | |
| | | Positive values indicate a preference for running the |
| | | affected resource(s) on ``node`` -- the higher the value, |
| | | the stronger the preference. Negative values indicate |
| | | the resource(s) should avoid this node (a value of |
| | | **-INFINITY** changes "should" to "must"). A location |
| | | constraint must either have a ``node`` and ``score``, |
| | | or contain at least one rule. |
+--------------------+---------+----------------------------------------------------------------------------------------------+
| resource-discovery | always | .. index:: |
| | | single: rsc_location; attribute, resource-discovery |
| | | single: attribute; resource-discovery (rsc_location) |
| | | single: resource-discovery; rsc_location attribute |
| | | |
| | | Whether Pacemaker should perform resource discovery |
| | | (that is, check whether the resource is already running) |
| | | for this resource on this node. This should normally be |
| | | left as the default, so that rogue instances of a |
| | | service can be stopped when they are running where they |
| | | are not supposed to be. However, there are two |
| | | situations where disabling resource discovery is a good |
| | | idea: when a service is not installed on a node, |
| | | discovery might return an error (properly written OCF |
| | | agents will not, so this is usually only seen with other |
| | | agent types); and when Pacemaker Remote is used to scale |
| | | a cluster to hundreds of nodes, limiting resource |
| | | discovery to allowed nodes can significantly boost |
| | | performance. |
| | | |
| | | * ``always:`` Always perform resource discovery for |
| | | the specified resource on this node. |
| | | |
| | | * ``never:`` Never perform resource discovery for the |
| | | specified resource on this node. This option should |
| | | generally be used with a -INFINITY score, although |
| | | that is not strictly required. |
| | | |
| | | * ``exclusive:`` Perform resource discovery for the |
| | | specified resource only on this node (and other nodes |
| | | similarly marked as ``exclusive``). Multiple location |
| | | constraints using ``exclusive`` discovery for the |
| | | same resource across different nodes create a subset |
| | | of nodes to which resource discovery is limited. If a |
| | | resource is marked for ``exclusive`` discovery on one |
| | | or more nodes, that resource is only allowed to be |
| | | placed within that subset of nodes. |
+--------------------+---------+----------------------------------------------------------------------------------------------+
.. warning::
Setting ``resource-discovery`` to ``never`` or ``exclusive`` removes Pacemaker's
ability to detect and stop unwanted instances of a service running
where it's not supposed to be. It is up to the system administrator (you!)
to make sure that the service can *never* be active on nodes without
``resource-discovery`` (such as by leaving the relevant software uninstalled).
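
As a sketch of how ``resource-discovery`` is set on a location constraint, the
fragment below bans a resource from a node and disables discovery there. The
constraint ``id``, the resource name **Webserver**, and the node name **sles-3**
are illustrative assumptions, not values required by Pacemaker.

.. code-block:: xml

   <!-- Illustrative only: id, resource name, and node name are assumptions -->
   <rsc_location id="loc-no-discovery" rsc="Webserver" node="sles-3"
                 score="-INFINITY" resource-discovery="never"/>
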
.. index::
single: Asymmetrical Clusters
single: Opt-In Clusters
Asymmetrical "Opt-In" Clusters
______________________________
To create an opt-in cluster, start by preventing resources from running anywhere
by default:
.. code-block:: none
# crm_attribute --name symmetric-cluster --update false
Then start enabling nodes. The following fragment says that the web
server prefers **sles-1**, the database prefers **sles-2** and both can
fail over to **sles-3** if their most preferred node fails.
.. topic:: Opt-in location constraints for two resources
.. code-block:: xml
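
A minimal sketch of opt-in constraints matching the description above might
look like the following. The resource names **Webserver** and **Database**, the
``id`` values, and the scores 200 and 0 are assumptions chosen for illustration.

.. code-block:: xml

   <!-- Illustrative only: ids and scores are assumptions -->
   <constraints>
     <rsc_location id="loc-1" rsc="Webserver" node="sles-1" score="200"/>
     <rsc_location id="loc-2" rsc="Webserver" node="sles-3" score="0"/>
     <rsc_location id="loc-3" rsc="Database" node="sles-2" score="200"/>
     <rsc_location id="loc-4" rsc="Database" node="sles-3" score="0"/>
   </constraints>

In an opt-in cluster, a score of 0 is enough to make a node eligible; the
higher score marks the preferred node.
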
.. index::
single: Symmetrical Clusters
single: Opt-Out Clusters
Symmetrical "Opt-Out" Clusters
______________________________
To create an opt-out cluster, start by allowing resources to run
anywhere by default:
.. code-block:: none
# crm_attribute --name symmetric-cluster --update true
Then start disabling nodes. The following fragment is the equivalent
of the above opt-in configuration.
.. topic:: Opt-out location constraints for two resources
.. code-block:: xml
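
A minimal sketch of opt-out constraints with the same effect might look like
the following. The resource names, ``id`` values, and the score of 200 are
assumptions chosen for illustration; **sles-3** needs no constraint because
every node is allowed by default.

.. code-block:: xml

   <!-- Illustrative only: ids and scores are assumptions -->
   <constraints>
     <rsc_location id="loc-1" rsc="Webserver" node="sles-1" score="200"/>
     <rsc_location id="loc-2-do-not-run" rsc="Webserver" node="sles-2" score="-INFINITY"/>
     <rsc_location id="loc-3-do-not-run" rsc="Database" node="sles-1" score="-INFINITY"/>
     <rsc_location id="loc-4" rsc="Database" node="sles-2" score="200"/>
   </constraints>
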
.. _node-score-equal:
What if Two Nodes Have the Same Score
_____________________________________
If two nodes have the same score, then the cluster will choose one.
This choice may seem random and may not be what was intended; however,
the cluster was not given enough information to know any better.
.. topic:: Constraints where a resource prefers two nodes equally
.. code-block:: xml
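
A minimal sketch of such a configuration follows, in which each resource has
the same score on **sles-1** and **sles-2**. The ``id`` values and the specific
scores are assumptions chosen for illustration.

.. code-block:: xml

   <!-- Illustrative only: ids and scores are assumptions -->
   <constraints>
     <rsc_location id="loc-1" rsc="Webserver" node="sles-1" score="INFINITY"/>
     <rsc_location id="loc-2" rsc="Webserver" node="sles-2" score="INFINITY"/>
     <rsc_location id="loc-3" rsc="Database" node="sles-1" score="500"/>
     <rsc_location id="loc-4" rsc="Database" node="sles-2" score="500"/>
   </constraints>
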
In the example above, assuming no other constraints and an inactive
cluster, **Webserver** would probably be placed on **sles-1** and **Database** on
**sles-2**. It would likely have placed **Webserver** based on the node's
uname and **Database** based on the desire to spread the resource load
evenly across the cluster. However, other factors can also be involved
in more complex configurations.
.. index::
single: constraint; ordering
single: resource; start order
.. _s-resource-ordering:
Specifying the Order in which Resources Should Start/Stop
#########################################################
*Ordering constraints* tell the cluster the order in which certain
resource actions should occur.
.. important::
Ordering constraints affect *only* the ordering of resource actions;
they do *not* require that the resources be placed on the
same node. If you want resources to be started on the same node
*and* in a specific order, you need both an ordering constraint *and*
a colocation constraint (see :ref:`s-resource-colocation`), or
alternatively, a group (see :ref:`group-resources`).
.. index::
pair: XML element; rsc_order
pair: constraint; rsc_order
Ordering Properties
___________________
.. table:: **Attributes of a rsc_order Element**
+--------------+----------------------------+-------------------------------------------------------------------+
| Field | Default | Description |
+==============+============================+===================================================================+
| id | | .. index:: |
| | | single: rsc_order; attribute, id |
| | | single: attribute; id (rsc_order) |
| | | single: id; rsc_order attribute |
| | | |
| | | A unique name for the constraint |
+--------------+----------------------------+-------------------------------------------------------------------+
| first | | .. index:: |
| | | single: rsc_order; attribute, first |
| | | single: attribute; first (rsc_order) |
| | | single: first; rsc_order attribute |
| | | |
| | | Name of the resource that the ``then`` resource |
| | | depends on |
+--------------+----------------------------+-------------------------------------------------------------------+
| then | | .. index:: |
| | | single: rsc_order; attribute, then |
| | | single: attribute; then (rsc_order) |
| | | single: then; rsc_order attribute |
| | | |
| | | Name of the dependent resource |
+--------------+----------------------------+-------------------------------------------------------------------+
| first-action | start | .. index:: |
| | | single: rsc_order; attribute, first-action |
| | | single: attribute; first-action (rsc_order) |
| | | single: first-action; rsc_order attribute |
| | | |
| | | The action that the ``first`` resource must complete |
| | | before ``then-action`` can be initiated for the ``then`` |
| | | resource. Allowed values: ``start``, ``stop``, |
| | | ``promote``, ``demote``. |
+--------------+----------------------------+-------------------------------------------------------------------+
| then-action | value of ``first-action`` | .. index:: |
| | | single: rsc_order; attribute, then-action |
| | | single: attribute; then-action (rsc_order) |
| | | single: then-action; rsc_order attribute |
| | | |
| | | The action that the ``then`` resource can execute only |
| | | after the ``first-action`` on the ``first`` resource has |
| | | completed. Allowed values: ``start``, ``stop``, |
| | | ``promote``, ``demote``. |
+--------------+----------------------------+-------------------------------------------------------------------+
| kind | Mandatory | .. index:: |
| | | single: rsc_order; attribute, kind |
| | | single: attribute; kind (rsc_order) |
| | | single: kind; rsc_order attribute |
| | | |
| | | How to enforce the constraint. Allowed values: |
| | | |
| | | * ``Mandatory:`` ``then-action`` will never be initiated |
| | | for the ``then`` resource unless and until ``first-action`` |
| | | successfully completes for the ``first`` resource. |
| | | |
| | | * ``Optional:`` The constraint applies only if both specified |
| | | resource actions are scheduled in the same transition |
| | | (that is, in response to the same cluster state). This |
| | | means that ``then-action`` is allowed on the ``then`` |
| | | resource regardless of the state of the ``first`` resource, |
| | | but if both actions happen to be scheduled at the same time, |
| | | they will be ordered. |
| | | |
| | | * ``Serialize:`` Ensure that the specified actions are never |
| | | performed concurrently for the specified resources. |
| | | ``first-action`` and ``then-action`` can be executed in either |
| | | order, but one must complete before the other can be initiated. |
| | | An example use case is when resource start-up puts a high load |
| | | on the host. |
+--------------+----------------------------+-------------------------------------------------------------------+
| symmetrical | TRUE for ``Mandatory`` and | .. index:: |
| | ``Optional`` kinds. FALSE | single: rsc_order; attribute, symmetrical |
| | for ``Serialize`` kind. | single: attribute; symmetrical (rsc_order) |
| | | single: symmetrical; rsc_order attribute |
| | | |
| | | If true, the reverse of the constraint applies for the |
| | | opposite action (for example, if B starts after A starts, |
| | | then B stops before A stops). ``Serialize`` orders cannot |
| | | be symmetrical. |
+--------------+----------------------------+-------------------------------------------------------------------+
The ``promote`` and ``demote`` actions apply to :ref:`promotable `
clone resources.
Optional and mandatory ordering
_______________________________
Here is an example of ordering constraints where **Database** *must* start before
**Webserver**, and **IP** *should* start before **Webserver** if they both need to be
started:
.. topic:: Optional and mandatory ordering constraints
.. code-block:: xml
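
A minimal sketch of the two constraints described above might look like this.
The ``id`` values are assumptions chosen for illustration; ``kind`` selects
between the "must" and "should" behavior.

.. code-block:: xml

   <!-- Illustrative only: ids are assumptions -->
   <constraints>
     <rsc_order id="order-1" first="IP" then="Webserver" kind="Optional"/>
     <rsc_order id="order-2" first="Database" then="Webserver" kind="Mandatory"/>
   </constraints>
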
Because the above example lets ``symmetrical`` default to TRUE, **Webserver**
must be stopped before **Database** can be stopped, and **Webserver** should be
stopped before **IP** if they both need to be stopped.
.. index::
single: colocation
single: constraint; colocation
single: resource; location relative to other resources
.. _s-resource-colocation:
Placing Resources Relative to other Resources
#############################################
*Colocation constraints* tell the cluster that the location of one resource
depends on the location of another one.
Colocation has an important side-effect: it affects the order in which
resources are assigned to a node. Think about it: You can't place A relative to
B unless you know where B is [#]_.
So when you are creating colocation constraints, it is important to
consider whether you should colocate A with B, or B with A.
.. important::
Colocation constraints affect *only* the placement of resources; they do *not*
require that the resources be started in a particular order. If you want
resources to be started on the same node *and* in a specific order, you need
both an ordering constraint (see :ref:`s-resource-ordering`) *and* a colocation
constraint, or alternatively, a group (see :ref:`group-resources`).
.. index::
pair: XML element; rsc_colocation
single: constraint; rsc_colocation
Colocation Properties
_____________________
.. table:: **Attributes of a rsc_colocation Constraint**
+----------------+----------------+--------------------------------------------------------+
| Field | Default | Description |
+================+================+========================================================+
| id | | .. index:: |
| | | single: rsc_colocation; attribute, id |
| | | single: attribute; id (rsc_colocation) |
| | | single: id; rsc_colocation attribute |
| | | |
| | | A unique name for the constraint (required). |
+----------------+----------------+--------------------------------------------------------+
| rsc | | .. index:: |
| | | single: rsc_colocation; attribute, rsc |
| | | single: attribute; rsc (rsc_colocation) |
| | | single: rsc; rsc_colocation attribute |
| | | |
| | | The name of a resource that should be located |
| | | relative to ``with-rsc``. A colocation constraint must |
| | | either contain at least one |
| | | :ref:`resource set `, or specify both |
| | | ``rsc`` and ``with-rsc``. |
+----------------+----------------+--------------------------------------------------------+
| with-rsc | | .. index:: |
| | | single: rsc_colocation; attribute, with-rsc |
| | | single: attribute; with-rsc (rsc_colocation) |
| | | single: with-rsc; rsc_colocation attribute |
| | | |
| | | The name of the resource used as the colocation |
| | | target. The cluster will decide where to put this |
| | | resource first and then decide where to put ``rsc``. |
| | | A colocation constraint must either contain at least |
| | | one :ref:`resource set `, or specify |
| | | both ``rsc`` and ``with-rsc``. |
+----------------+----------------+--------------------------------------------------------+
| node-attribute | #uname | .. index:: |
| | | single: rsc_colocation; attribute, node-attribute |
| | | single: attribute; node-attribute (rsc_colocation) |
| | | single: node-attribute; rsc_colocation attribute |
| | | |
| | | If ``rsc`` and ``with-rsc`` are specified, this node |
| | | attribute must be the same on the node running ``rsc`` |
| | | and the node running ``with-rsc`` for the constraint |
| | | to be satisfied. (For details, see |
| | | :ref:`s-coloc-attribute`.) |
+----------------+----------------+--------------------------------------------------------+
| score | 0 | .. index:: |
| | | single: rsc_colocation; attribute, score |
| | | single: attribute; score (rsc_colocation) |
| | | single: score; rsc_colocation attribute |
| | | |
| | | Positive values indicate the resources should run on |
| | | the same node. Negative values indicate the resources |
| | | should run on different nodes. Values of |
| | | +/- ``INFINITY`` change "should" to "must". |
+----------------+----------------+--------------------------------------------------------+
| rsc-role | Started | .. index:: |
| | | single: rsc_colocation; attribute, rsc-role |
| | | single: attribute; rsc-role (rsc_colocation) |
| | | single: rsc-role; rsc_colocation attribute |
| | | |
| | | If ``rsc`` and ``with-rsc`` are specified, and ``rsc`` |
| | | is a :ref:`promotable clone `, |
| | | the constraint applies only to ``rsc`` instances in |
| | | this role. Allowed values: ``Started``, ``Promoted``, |
| | | ``Unpromoted``. For details, see |
| | | :ref:`promotable-clone-constraints`. |
+----------------+----------------+--------------------------------------------------------+
| with-rsc-role | Started | .. index:: |
| | | single: rsc_colocation; attribute, with-rsc-role |
| | | single: attribute; with-rsc-role (rsc_colocation) |
| | | single: with-rsc-role; rsc_colocation attribute |
| | | |
| | | If ``rsc`` and ``with-rsc`` are specified, and |
| | | ``with-rsc`` is a |
| | | :ref:`promotable clone `, the |
| | | constraint applies only to ``with-rsc`` instances in |
| | | this role. Allowed values: ``Started``, ``Promoted``, |
| | | ``Unpromoted``. For details, see |
| | | :ref:`promotable-clone-constraints`. |
+----------------+----------------+--------------------------------------------------------+
| influence | value of | .. index:: |
| | ``critical`` | single: rsc_colocation; attribute, influence |
| | meta-attribute | single: attribute; influence (rsc_colocation) |
| | for ``rsc`` | single: influence; rsc_colocation attribute |
| | | |
| | | Whether to consider the location preferences of |
| | | ``rsc`` when ``with-rsc`` is already active. Allowed |
| | | values: ``true``, ``false``. For details, see |
- | | | :ref:`s-coloc-influence`. |
+ | | | :ref:`s-coloc-influence`. *(since 2.1.0)* |
+----------------+----------------+--------------------------------------------------------+
Mandatory Placement
___________________
Mandatory placement occurs when the constraint's score is
**+INFINITY** or **-INFINITY**. In such cases, if the constraint can't be
satisfied, then the **rsc** resource is not permitted to run. For
``score=INFINITY``, this includes cases where the ``with-rsc`` resource is
not active.
If you need resource **A** to always run on the same machine as
resource **B**, you would add the following constraint:
.. topic:: Mandatory colocation constraint for two resources
.. code-block:: xml
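
A minimal sketch of such a constraint, assuming an illustrative ``id`` of
``colocate``, could be written as:

.. code-block:: xml

   <!-- Illustrative only: the id is an assumption -->
   <rsc_colocation id="colocate" rsc="A" with-rsc="B" score="INFINITY"/>
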
Remember, because **INFINITY** was used, if **B** can't run on any
of the cluster nodes (for whatever reason) then **A** will not
be allowed to run. Whether **A** is running or not has no effect on **B**.
Alternatively, you may want the opposite -- that **A** *cannot*
run on the same machine as **B**. In this case, use ``score="-INFINITY"``.
.. topic:: Mandatory anti-colocation constraint for two resources
.. code-block:: xml
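
A minimal sketch of the anti-colocation form, assuming an illustrative ``id``
of ``anti-colocate``, could be written as:

.. code-block:: xml

   <!-- Illustrative only: the id is an assumption -->
   <rsc_colocation id="anti-colocate" rsc="A" with-rsc="B" score="-INFINITY"/>
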
Again, by specifying **-INFINITY**, the constraint is binding. So if the
only place left to run is where **B** already is, then **A** may not run anywhere.
As with **INFINITY**, **B** can run even if **A** is stopped. However, in this
case **A** also can run if **B** is stopped, because it still meets the
constraint of **A** and **B** not running on the same node.
Advisory Placement
__________________
If mandatory placement is about "must" and "must not", then advisory
placement is the "I'd prefer if" alternative. For constraints with
scores greater than **-INFINITY** and less than **INFINITY**, the cluster
will try to accommodate your wishes but may ignore them if the
alternative is to stop some of the cluster resources.
As in life, where if enough people prefer something it effectively
becomes mandatory, advisory colocation constraints can combine with
other elements of the configuration to behave as if they were
mandatory.
.. topic:: Advisory colocation constraint for two resources
.. code-block:: xml
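
A minimal sketch of an advisory constraint follows. The ``id`` and the score of
500 are assumptions chosen for illustration; any finite score makes the
constraint advisory rather than mandatory.

.. code-block:: xml

   <!-- Illustrative only: id and score are assumptions -->
   <rsc_colocation id="colocate-maybe" rsc="A" with-rsc="B" score="500"/>
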
.. _s-coloc-attribute:
Colocation by Node Attribute
____________________________
The ``node-attribute`` property of a colocation constraint allows you to express
the requirement, "these resources must be on similar nodes".
As an example, imagine that you have two Storage Area Networks (SANs) that are
not controlled by the cluster, and each node is connected to one or the other.
You may have two resources **r1** and **r2** such that **r2** needs to use the same
SAN as **r1**, but doesn't necessarily have to be on the same exact node.
In such a case, you could define a :ref:`node attribute ` named
**san**, with the value **san1** or **san2** on each node as appropriate. Then, you
could colocate **r2** with **r1** using ``node-attribute`` set to **san**.
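
A minimal sketch of the colocation constraint for this scenario might look like
the following. The ``id`` and the ``INFINITY`` score are assumptions chosen for
illustration; the **san** node attribute itself must be defined separately on
each node.

.. code-block:: xml

   <!-- Illustrative only: id and score are assumptions -->
   <rsc_colocation id="coloc-same-san" rsc="r2" with-rsc="r1"
                   score="INFINITY" node-attribute="san"/>
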
.. _s-coloc-influence:
Colocation Influence
____________________
By default, if A is colocated with B, the cluster will take into account A's
preferences when deciding where to place B, to maximize the chance that both
resources can run.
For a detailed look at exactly how this occurs, see
`Colocation Explained `_.
However, if ``influence`` is set to ``false`` in the colocation constraint,
this will happen only if B is inactive and needing to be started. If B is
already active, A's preferences will have no effect on placing B.
As an example of when this would be desirable, consider
a nonessential reporting tool colocated with a resource-intensive service
that takes a long time to start. If the reporting tool fails enough times to
reach its migration threshold, by default the cluster will want to move both
resources to another node if possible. Setting ``influence`` to ``false`` on
the colocation constraint would mean that the reporting tool would be stopped
in this situation instead, to avoid forcing the service to move.
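
A minimal sketch of a colocation constraint for this scenario follows. The
resource names ``reporting-tool`` and ``big-service``, the ``id``, and the
score are hypothetical and chosen only for illustration.

.. code-block:: xml

   <!-- Illustrative only: id, resource names, and score are hypothetical -->
   <rsc_colocation id="coloc-reporting" rsc="reporting-tool"
                   with-rsc="big-service" score="INFINITY" influence="false"/>
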
The ``critical`` resource meta-attribute is a convenient way to specify the
default for all colocation constraints and groups involving a particular
resource.
.. note::
If a noncritical resource is a member of a group, all later members of the
group will be treated as noncritical, even if they are marked as (or left to
default to) critical.
.. _s-resource-sets:
Resource Sets
#############
.. index::
single: constraint; resource set
single: resource; resource set
*Resource sets* allow multiple resources to be affected by a single constraint.
.. topic:: A set of 3 resources
.. code-block:: xml
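
A minimal sketch of a set with three members could look like this. The set
``id`` and the resource names **A**, **B**, and **C** are assumptions chosen
for illustration.

.. code-block:: xml

   <!-- Illustrative only: set id and resource names are assumptions -->
   <resource_set id="resource-set-example">
     <resource_ref id="A"/>
     <resource_ref id="B"/>
     <resource_ref id="C"/>
   </resource_set>
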
Resource sets are valid inside ``rsc_location``, ``rsc_order``
(see :ref:`s-resource-sets-ordering`), ``rsc_colocation``
(see :ref:`s-resource-sets-colocation`), and ``rsc_ticket``
(see :ref:`ticket-constraints`) constraints.
A resource set has a number of properties that can be set, though not all
have an effect in all contexts.
.. index::
pair: XML element; resource_set
.. table:: **Attributes of a resource_set Element**
+-------------+------------------+--------------------------------------------------------+
| Field | Default | Description |
+=============+==================+========================================================+
| id | | .. index:: |
| | | single: resource_set; attribute, id |
| | | single: attribute; id (resource_set) |
| | | single: id; resource_set attribute |
| | | |
| | | A unique name for the set (required) |
+-------------+------------------+--------------------------------------------------------+
| sequential | true | .. index:: |
| | | single: resource_set; attribute, sequential |
| | | single: attribute; sequential (resource_set) |
| | | single: sequential; resource_set attribute |
| | | |
| | | Whether the members of the set must be acted on in |
| | | order. Meaningful within ``rsc_order`` and |
| | | ``rsc_colocation``. |
+-------------+------------------+--------------------------------------------------------+
| require-all | true | .. index:: |
| | | single: resource_set; attribute, require-all |
| | | single: attribute; require-all (resource_set) |
| | | single: require-all; resource_set attribute |
| | | |
| | | Whether all members of the set must be active before |
| | | continuing. With the current implementation, the |
| | | cluster may continue even if only one member of the |
| | | set is started, but if more than one member of the set |
| | | is starting at the same time, the cluster will still |
| | | wait until all of those have started before continuing |
| | | (this may change in future versions). Meaningful |
| | | within ``rsc_order``. |
+-------------+------------------+--------------------------------------------------------+
| role | | .. index:: |
| | | single: resource_set; attribute, role |
| | | single: attribute; role (resource_set) |
| | | single: role; resource_set attribute |
| | | |
| | | The constraint applies only to resource set members |
| | | that are :ref:`s-resource-promotable` in this |
| | | role. Meaningful within ``rsc_location``, |
| | | ``rsc_colocation`` and ``rsc_ticket``. |
| | | Allowed values: ``Started``, ``Promoted``, |
| | | ``Unpromoted``. For details, see |
| | | :ref:`promotable-clone-constraints`. |
+-------------+------------------+--------------------------------------------------------+
| action | value of | .. index:: |
| | ``first-action`` | single: resource_set; attribute, action |
| | in the enclosing | single: attribute; action (resource_set) |
| | ordering | single: action; resource_set attribute |
| | constraint | |
| | | The action that applies to *all members* of the set. |
| | | Meaningful within ``rsc_order``. Allowed values: |
| | | ``start``, ``stop``, ``promote``, ``demote``. |
+-------------+------------------+--------------------------------------------------------+
| score | | .. index:: |
| | | single: resource_set; attribute, score |
| | | single: attribute; score (resource_set) |
| | | single: score; resource_set attribute |
| | | |
| | | *Advanced use only.* Use a specific score for this |
| | | set within the constraint. |
+-------------+------------------+--------------------------------------------------------+
.. _s-resource-sets-ordering:
Ordering Sets of Resources
##########################
A common situation is for an administrator to create a chain of ordered
resources, such as:
.. topic:: A chain of ordered resources
.. code-block:: xml
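
A minimal sketch of such a chain for four resources might look like the
following. The resource names **A** through **D** and the ``id`` values are
assumptions chosen for illustration.

.. code-block:: xml

   <!-- Illustrative only: ids and resource names are assumptions -->
   <constraints>
     <rsc_order id="order-1" first="A" then="B"/>
     <rsc_order id="order-2" first="B" then="C"/>
     <rsc_order id="order-3" first="C" then="D"/>
   </constraints>
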
.. topic:: Visual representation of the four resources' start order for the above constraints
.. image:: images/resource-set.png
:alt: Ordered set
Ordered Set
___________
To simplify this situation, :ref:`s-resource-sets` can be used within ordering
constraints:
.. topic:: A chain of ordered resources expressed as a set
.. code-block:: xml
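
A minimal sketch of a four-resource chain expressed as a single set follows.
The resource names **A** through **D** and the ``id`` values are assumptions
chosen for illustration.

.. code-block:: xml

   <!-- Illustrative only: ids and resource names are assumptions -->
   <constraints>
     <rsc_order id="order-1">
       <resource_set id="ordered-set-example" sequential="true">
         <resource_ref id="A"/>
         <resource_ref id="B"/>
         <resource_ref id="C"/>
         <resource_ref id="D"/>
       </resource_set>
     </rsc_order>
   </constraints>
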
While the set-based format is not less verbose, it is significantly easier to
get right and maintain.
.. important::
If you use a higher-level tool, pay attention to how it exposes this
functionality. Depending on the tool, creating a set **A B** may be equivalent to
**A then B**, or **B then A**.
Ordering Multiple Sets
______________________
The syntax can be expanded to allow sets of resources to be ordered relative to
each other, where the members of each individual set may be ordered or
unordered (controlled by the ``sequential`` property). In the example below, **A**
and **B** can both start in parallel, as can **C** and **D**; however, **C** and
**D** can only start once *both* **A** *and* **B** are active.
.. topic:: Ordered sets of unordered resources
.. code-block:: xml
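
A minimal sketch of two ordered sets whose members are internally unordered
might look like this. The ``id`` values are assumptions chosen for
illustration.

.. code-block:: xml

   <!-- Illustrative only: ids are assumptions -->
   <constraints>
     <rsc_order id="order-1">
       <resource_set id="ordered-set-1" sequential="false">
         <resource_ref id="A"/>
         <resource_ref id="B"/>
       </resource_set>
       <resource_set id="ordered-set-2" sequential="false">
         <resource_ref id="C"/>
         <resource_ref id="D"/>
       </resource_set>
     </rsc_order>
   </constraints>
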
.. topic:: Visual representation of the start order for two ordered sets of
unordered resources
.. image:: images/two-sets.png
:alt: Two ordered sets
Of course either set -- or both sets -- of resources can also be internally
ordered (by setting ``sequential="true"``) and there is no limit to the number
of sets that can be specified.
.. topic:: Advanced use of set ordering - Three ordered sets, two of which are
internally unordered
.. code-block:: xml
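
A minimal sketch with three sets follows, where the first and last sets are
internally unordered and the middle set is ordered. The resource names **A**
through **F** and the ``id`` values are assumptions chosen for illustration.

.. code-block:: xml

   <!-- Illustrative only: ids and resource names are assumptions -->
   <constraints>
     <rsc_order id="order-1">
       <resource_set id="ordered-set-1" sequential="false">
         <resource_ref id="A"/>
         <resource_ref id="B"/>
       </resource_set>
       <resource_set id="ordered-set-2" sequential="true">
         <resource_ref id="C"/>
         <resource_ref id="D"/>
       </resource_set>
       <resource_set id="ordered-set-3" sequential="false">
         <resource_ref id="E"/>
         <resource_ref id="F"/>
       </resource_set>
     </rsc_order>
   </constraints>
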
.. topic:: Visual representation of the start order for the three sets defined above
.. image:: images/three-sets.png
:alt: Three ordered sets
.. important::
An ordered set with ``sequential=false`` makes sense only if there is another
set in the constraint. Otherwise, the constraint has no effect.
Resource Set OR Logic
_____________________
The unordered set logic discussed so far has all been "AND" logic. To illustrate
this, take the three-set figure in the previous section. Those sets can be
expressed as **(A and B) then (C) then (D) then (E and F)**.
Say, for example, that we want to change the first set, **(A and B)**, to use "OR" logic,
so the sets look like this: **(A or B) then (C) then (D) then (E and F)**. This
functionality can be achieved through the use of the ``require-all`` option.
This option defaults to TRUE, which is why the "AND" logic is used by default.
Setting ``require-all=false`` means only one resource in the set needs to be
started before continuing on to the next set.
.. topic:: Resource Set "OR" logic: Three ordered sets, where the first set is
internally unordered with "OR" logic
.. code-block:: xml
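
A minimal sketch of **(A or B) then (C) then (D) then (E and F)** might look
like the following. The ``id`` values are assumptions chosen for illustration;
the only difference from plain "AND" logic is ``require-all="false"`` on the
first set.

.. code-block:: xml

   <!-- Illustrative only: ids are assumptions -->
   <constraints>
     <rsc_order id="order-1">
       <resource_set id="ordered-set-1" sequential="false" require-all="false">
         <resource_ref id="A"/>
         <resource_ref id="B"/>
       </resource_set>
       <resource_set id="ordered-set-2" sequential="true">
         <resource_ref id="C"/>
         <resource_ref id="D"/>
       </resource_set>
       <resource_set id="ordered-set-3" sequential="false">
         <resource_ref id="E"/>
         <resource_ref id="F"/>
       </resource_set>
     </rsc_order>
   </constraints>
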
.. important::
An ordered set with ``require-all=false`` makes sense only in conjunction with
``sequential=false``. Think of it like this: ``sequential=false`` modifies the set
to be an unordered set using "AND" logic by default, and adding
``require-all=false`` flips the unordered set's "AND" logic to "OR" logic.
.. _s-resource-sets-colocation:
Colocating Sets of Resources
############################
Another common situation is for an administrator to create a set of
colocated resources.
The simplest way to do this is to define a resource group (see
:ref:`group-resources`), but that cannot always accurately express the desired
relationships. For example, maybe the resources do not need to be ordered.
Another way would be to define each relationship as an individual constraint,
but that causes a difficult-to-follow constraint explosion as the number of
resources and combinations grows.
.. topic:: Colocation chain as individual constraints, where A is placed first,
then B, then C, then D
.. code-block:: xml
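
A minimal sketch of such a chain, written as individual constraints, might look
like this. The ``id`` values and the ``INFINITY`` scores are assumptions chosen
for illustration.

.. code-block:: xml

   <!-- Illustrative only: ids and scores are assumptions -->
   <constraints>
     <rsc_colocation id="coloc-1" rsc="D" with-rsc="C" score="INFINITY"/>
     <rsc_colocation id="coloc-2" rsc="C" with-rsc="B" score="INFINITY"/>
     <rsc_colocation id="coloc-3" rsc="B" with-rsc="A" score="INFINITY"/>
   </constraints>
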
To express complicated relationships with a simplified syntax [#]_,
:ref:`resource sets ` can be used within colocation constraints.
.. topic:: Equivalent colocation chain expressed using **resource_set**
.. code-block:: xml
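
A minimal sketch of the same chain as a single sequential set follows. The
``id`` values and the ``INFINITY`` score are assumptions chosen for
illustration.

.. code-block:: xml

   <!-- Illustrative only: ids and score are assumptions -->
   <constraints>
     <rsc_colocation id="coloc-1" score="INFINITY">
       <resource_set id="colocated-set-example" sequential="true">
         <resource_ref id="A"/>
         <resource_ref id="B"/>
         <resource_ref id="C"/>
         <resource_ref id="D"/>
       </resource_set>
     </rsc_colocation>
   </constraints>
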
.. note::
Within a ``resource_set``, the resources are listed in the order they are
*placed*, which is the reverse of the order in which they are *colocated*.
In the above example, resource **A** is placed before resource **B**, which is
the same as saying resource **B** is colocated with resource **A**.
As with individual constraints, a resource that can't be active prevents any
resource that must be colocated with it from being active. In both of the two
previous examples, if **B** is unable to run, then both **C** and, by inference, **D**
must remain stopped.
.. important::
If you use a higher-level tool, pay attention to how it exposes this
functionality. Depending on the tool, creating a set **A B** may be equivalent to
**A with B**, or **B with A**.
Resource sets can also be used to tell the cluster that entire *sets* of
resources must be colocated relative to each other, while the individual
members within any one set may or may not be colocated relative to each other
(determined by the set's ``sequential`` property).
In the following example, resources **B**, **C**, and **D** will each be colocated
with **A** (which will be placed first). **A** must be able to run in order for any
of the resources to run, but any of **B**, **C**, or **D** may be stopped without
affecting any of the others.
.. topic:: Using colocated sets to specify a shared dependency
.. code-block:: xml
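
   <!-- Illustrative sketch only: the constraint and set IDs are assumptions.
        The set containing A is listed last, so it is placed first. -->
   <constraints>
     <rsc_colocation id="coloc-1" score="INFINITY">
       <resource_set id="colocated-set-2" sequential="false">
         <resource_ref id="B"/>
         <resource_ref id="C"/>
         <resource_ref id="D"/>
       </resource_set>
       <resource_set id="colocated-set-1" sequential="true">
         <resource_ref id="A"/>
       </resource_set>
     </rsc_colocation>
   </constraints>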
.. note::
Pay close attention to the order in which resources and sets are listed.
While the members of any one sequential set are placed first to last (i.e., the
colocation dependency is last with first), multiple sets are placed last to
first (i.e., the colocation dependency is first with last).
.. important::
A colocated set with ``sequential="false"`` makes sense only if there is
another set in the constraint. Otherwise, the constraint has no effect.
There is no inherent limit to the number and size of the sets used.
The only thing that matters is that in order for any member of one set
in the constraint to be active, all members of sets listed after it must also
be active (and naturally on the same node); and if a set has ``sequential="true"``,
then in order for one member of that set to be active, all members listed
before it must also be active.
If desired, you can restrict the dependency to instances of promotable clone
resources that are in a specific role, using the set's ``role`` property.
.. topic:: Colocation in which the members of the middle set have no
interdependencies, and the last set listed applies only to promoted
instances
.. code-block:: xml
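
   <!-- Illustrative sketch only: the resource names and all IDs are
        placeholders and may not match the figure. The role value is
        spelled "Promoted" as in recent Pacemaker releases. -->
   <constraints>
     <rsc_colocation id="coloc-1" score="INFINITY">
       <resource_set id="colocated-set-1" sequential="true">
         <resource_ref id="F"/>
         <resource_ref id="G"/>
       </resource_set>
       <!-- Middle set: members have no interdependencies -->
       <resource_set id="colocated-set-2" sequential="false">
         <resource_ref id="C"/>
         <resource_ref id="D"/>
         <resource_ref id="E"/>
       </resource_set>
       <!-- Last set: applies only to promoted instances -->
       <resource_set id="colocated-set-3" sequential="true" role="Promoted">
         <resource_ref id="A"/>
         <resource_ref id="B"/>
       </resource_set>
     </rsc_colocation>
   </constraints>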
.. topic:: Visual representation of the above example (resources are placed from
left to right)
.. image:: ../shared/images/pcmk-colocated-sets.png
:alt: Colocation chain
.. note::
Unlike ordered sets, colocated sets do not use the ``require-all`` option.
.. [#] While the human brain is sophisticated enough to read the constraint
in any order and choose the correct one depending on the situation,
the cluster is not quite so smart. Yet.
.. [#] which is not the same as saying easy to follow
diff --git a/doc/sphinx/Pacemaker_Explained/fencing.rst b/doc/sphinx/Pacemaker_Explained/fencing.rst
index 9ed12b39a4..7ee9979e86 100644
--- a/doc/sphinx/Pacemaker_Explained/fencing.rst
+++ b/doc/sphinx/Pacemaker_Explained/fencing.rst
@@ -1,1170 +1,1286 @@
.. index::
single: fencing
single: STONITH
.. _fencing:
Fencing
-------
What Is Fencing?
################
*Fencing* is the ability to make a node unable to run resources, even when that
node is unresponsive to cluster commands.
Fencing is also known as *STONITH*, an acronym for "Shoot The Other Node In The
Head", since the most common fencing method is cutting power to the node.
Another method is "fabric fencing", cutting the node's access to some
capability required to run resources (such as network access or a shared disk).
.. index::
single: fencing; why necessary
Why Is Fencing Necessary?
#########################
Fencing protects your data from being corrupted by malfunctioning nodes or
unintentional concurrent access to shared resources.
Fencing protects against the "split brain" failure scenario, where cluster
nodes have lost the ability to reliably communicate with each other but are
still able to run resources. If the cluster just assumed that uncommunicative
nodes were down, then multiple instances of a resource could be started on
different nodes.
The effect of split brain depends on the resource type. For example, an IP
address brought up on two hosts on a network will cause packets to randomly be
sent to one or the other host, rendering the IP useless. For a database or
clustered file system, the effect could be much more severe, causing data
corruption or divergence.
Fencing is also used when a resource cannot otherwise be stopped. If a
resource fails to stop on a node, it cannot be started on a different node
without risking the same type of conflict as split-brain. Fencing the
original node ensures the resource can be safely started elsewhere.
Users may also configure the ``on-fail`` property of :ref:`operation` or the
``loss-policy`` property of
:ref:`ticket constraints ` to ``fence``, in which
case the cluster will fence the resource's node if the operation fails or the
ticket is lost.
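
As a minimal sketch of both settings, assuming a hypothetical resource ``db``
and a hypothetical ticket ``ticketA`` (the agent and all IDs are placeholders),
the relevant CIB fragments might look like this:

.. code-block:: xml

   <!-- In the resources section: fence the node if the monitor fails -->
   <primitive id="db" class="ocf" provider="pacemaker" type="Dummy">
     <operations>
       <op id="db-monitor-10s" name="monitor" interval="10s" on-fail="fence"/>
     </operations>
   </primitive>

   <!-- In the constraints section: fence the node running db if the
        ticket is lost -->
   <rsc_ticket id="db-needs-ticketA" rsc="db" ticket="ticketA" loss-policy="fence"/>
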
.. index::
single: fencing; device
Fence Devices
#############
A *fence device* or *fencing device* is a special type of resource that
provides the means to fence a node.
Examples of fencing devices include intelligent power switches and IPMI devices
that accept SNMP commands to cut power to a node, and iSCSI controllers that
allow SCSI reservations to be used to cut a node's access to a shared disk.
Since fencing devices will be used to recover from loss of networking
connectivity to other nodes, it is essential that they do not rely on the same
network as the cluster itself, otherwise that network becomes a single point of
failure.
Since loss of a node due to power outage is indistinguishable from loss of
network connectivity to that node, it is also essential that at least one fence
device for a node does not share power with that node. For example, an on-board
IPMI controller that shares power with its host should not be used as the sole
fencing device for that host.
Since fencing is used to isolate malfunctioning nodes, no fence device should
rely on its target functioning properly. This includes, for example, devices
that ssh into a node and issue a shutdown command (such devices might be
suitable for testing, but never for production).
.. index::
single: fencing; agent
Fence Agents
############
A *fence agent* or *fencing agent* is a ``stonith``-class resource agent.
The fence agent standard provides commands (such as ``off`` and ``reboot``)
that the cluster can use to fence nodes. As with other resource agent classes,
this allows a layer of abstraction so that Pacemaker doesn't need any knowledge
about specific fencing technologies -- that knowledge is isolated in the agent.
+Pacemaker supports two fence agent standards, both inherited from
+no-longer-active projects:
+
+* Red Hat Cluster Suite (RHCS) style: These are typically installed in
+ ``/usr/sbin`` with names starting with ``fence_``.
+
+* Linux-HA style: These typically have names starting with ``external/``.
+ Pacemaker can support these agents using the **fence_legacy** RHCS-style
+ agent as a wrapper, *if* support was enabled when Pacemaker was built, which
+ requires the ``cluster-glue`` library.
+
When a Fence Device Can Be Used
###############################
Fencing devices do not actually "run" like most services. Typically, they just
provide an interface for sending commands to an external device.
Additionally, fencing may be initiated by Pacemaker, by other cluster-aware
software such as DRBD or DLM, or manually by an administrator, at any point in
the cluster life cycle, including before any resources have been started.
To accommodate this, Pacemaker does not require the fence device resource to be
"started" in order to be used. Whether a fence device is started or not
determines whether a node runs any recurring monitor for the device, and gives
the node a slight preference for being chosen to execute fencing using that
device.
By default, any node can execute any fencing device. If a fence device is
disabled by setting its ``target-role`` to ``Stopped``, then no node can use
that device. If a location constraint with a negative score prevents a specific
node from "running" a fence device, then that node will never be chosen to
execute fencing using the device. A node may fence itself, but the cluster will
choose that only if no other nodes can do the fencing.
A common configuration scenario is to have one fence device per target node.
In such a case, users often configure anti-location constraints so that
the target node does not monitor its own device.
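
For example, assuming a fence device resource with the made-up ID
``fence-node1`` that targets ``node1``, such an anti-location constraint could
be sketched as:

.. code-block:: xml

   <!-- Hypothetical IDs: keep fence-node1 from "running" on its own target -->
   <rsc_location id="fence-node1-not-on-node1" rsc="fence-node1"
                 node="node1" score="-INFINITY"/>

With a constraint like this, ``node1`` does not run the recurring monitor for
``fence-node1`` and is not chosen to execute fencing with it.
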
Limitations of Fencing Resources
################################
Fencing resources have certain limitations that other resource classes don't:
* They may have only one set of meta-attributes and one set of instance
attributes.
* If :ref:`rules` are used to determine fencing resource options, these
might be evaluated only when first read, meaning that later changes to the
rules will have no effect. Therefore, it is better to avoid confusion and not
use rules at all with fencing resources.
These limitations could be revisited if there is sufficient user demand.
.. index::
single: fencing; special instance attributes
.. _fencing-attributes:
-Special Options for Fencing Resources
-#####################################
+Special Meta-Attributes for Fencing Resources
+#############################################
+
+The table below lists special resource meta-attributes that may be set for any
+fencing resource.
+
+.. table:: **Additional Properties of Fencing Resources**
+
+ +----------------------+---------+--------------------+----------------------------------------+
+ | Field | Type | Default | Description |
+ +======================+=========+====================+========================================+
+ | provides | string | | .. index:: |
+ | | | | single: provides |
+ | | | | |
+ | | | | Any special capability provided by the |
+ | | | | fence device. Currently, only one such |
+ | | | | capability is meaningful: |
+ | | | | :ref:`unfencing `. |
+ +----------------------+---------+--------------------+----------------------------------------+
+
+Special Instance Attributes for Fencing Resources
+#################################################
The table below lists special instance attributes that may be set for any
fencing resource (*not* meta-attributes, even though they are interpreted by
Pacemaker rather than the fence agent). These are also listed in the man page
for ``pacemaker-fenced``.
.. Not_Yet_Implemented:
+----------------------+---------+--------------------+----------------------------------------+
| priority | integer | 0 | .. index:: |
| | | | single: priority |
| | | | |
| | | | The priority of the fence device. |
| | | | Devices are tried in order of highest |
| | | | priority to lowest. |
+----------------------+---------+--------------------+----------------------------------------+
.. table:: **Additional Properties of Fencing Resources**
+----------------------+---------+--------------------+----------------------------------------+
| Field | Type | Default | Description |
+======================+=========+====================+========================================+
| stonith-timeout | time | | .. index:: |
| | | | single: stonith-timeout |
| | | | |
- | | | | Older versions used this to override |
- | | | | the default period to wait for a fence |
- | | | | action (reboot, on, or off) to |
- | | | | complete for this device. It has been |
- | | | | replaced by the |
- | | | | ``pcmk_reboot_timeout`` and |
- | | | | ``pcmk_off_timeout`` properties. |
- +----------------------+---------+--------------------+----------------------------------------+
- | provides | string | | .. index:: |
- | | | | single: provides |
- | | | | |
- | | | | Any special capability provided by the |
- | | | | fence device. Currently, only one such |
- | | | | capability is meaningful: |
- | | | | :ref:`unfencing `. |
+ | | | | This is not used by Pacemaker (see the |
+ | | | | ``pcmk_reboot_timeout``, |
+ | | | | ``pcmk_off_timeout``, etc. properties |
+ | | | | instead), but it may be used by |
+ | | | | Linux-HA fence agents. |
+----------------------+---------+--------------------+----------------------------------------+
| pcmk_host_map | string | | .. index:: |
| | | | single: pcmk_host_map |
| | | | |
| | | | A mapping of host names to port |
| | | | numbers for devices that do not |
| | | | support host names. |
| | | | |
| | | | Example: ``node1:1;node2:2,3`` tells |
| | | | the cluster to use port 1 for |
| | | | ``node1`` and ports 2 and 3 for |
| | | | ``node2``. If ``pcmk_host_check`` is |
| | | | explicitly set to ``static-list``, |
| | | | either this or ``pcmk_host_list`` must |
| | | | be set. |
+----------------------+---------+--------------------+----------------------------------------+
| pcmk_host_list | string | | .. index:: |
| | | | single: pcmk_host_list |
| | | | |
| | | | A list of machines controlled by this |
| | | | device. If ``pcmk_host_check`` is |
| | | | explicitly set to ``static-list``, |
| | | | either this or ``pcmk_host_map`` must |
| | | | be set. |
+----------------------+---------+--------------------+----------------------------------------+
- | pcmk_host_check | string | The default is | .. index:: |
- | | | ``static-list`` if | single: pcmk_host_check |
- | | | either | |
- | | | ``pcmk_host_list`` | How to determine which machines are |
- | | | or | controlled by the device. Allowed |
- | | | ``pcmk_host_map`` | values: |
- | | | is configured. If | |
- | | | neither of those | * ``dynamic-list:`` query the device |
- | | | are configured, | via the agent's ``list`` action |
- | | | the default is | * ``static-list:`` check the |
- | | | ``dynamic-list`` | ``pcmk_host_list`` or |
- | | | if the fence | ``pcmk_host_map`` attribute |
- | | | device supports | * ``status:`` query the device via the |
- | | | the list action, | "status" command |
- | | | or ``status`` if | * ``none:`` assume the device can |
- | | | the fence device | fence any node |
- | | | supports the | |
- | | | status action but | |
- | | | not the list | |
- | | | action. If none of | |
- | | | those conditions | |
- | | | apply, the default | |
- | | | is ``none``. | |
+ | pcmk_host_check | string | Value appropriate | .. index:: |
+ | | | to other | single: pcmk_host_check |
+ | | | parameters (see | |
+ | | | "Default Check | The method Pacemaker should use to |
+ | | | Type" below) | determine which nodes can be targeted |
+ | | | | by this device. Allowed values: |
+ | | | | |
+ | | | | * ``static-list:`` targets are listed |
+ | | | | in the ``pcmk_host_list`` or |
+ | | | | ``pcmk_host_map`` attribute |
+ | | | | * ``dynamic-list:`` query the device |
+ | | | | via the agent's ``list`` action |
+ | | | | * ``status:`` query the device via the |
+ | | | | agent's ``status`` action |
+ | | | | * ``none:`` assume the device can |
+ | | | | fence any node |
+----------------------+---------+--------------------+----------------------------------------+
| pcmk_delay_max | time | 0s | .. index:: |
| | | | single: pcmk_delay_max |
| | | | |
| | | | Enable a delay of no more than the |
| | | | time specified before executing |
| | | | fencing actions. Pacemaker derives the |
| | | | overall delay by taking the value of |
| | | | ``pcmk_delay_base`` and adding a |
| | | | random delay value such that the sum |
| | | | is kept below this maximum. This is |
| | | | sometimes used in two-node clusters to |
| | | | ensure that the nodes don't fence each |
| | | | other at the same time. |
+----------------------+---------+--------------------+----------------------------------------+
| pcmk_delay_base | time | 0s | .. index:: |
| | | | single: pcmk_delay_base |
| | | | |
| | | | Enable a static delay before executing |
| | | | fencing actions. This can be used, for |
| | | | example, in two-node clusters to |
| | | | ensure that the nodes don't fence each |
| | | | other, by having separate fencing |
| | | | resources with different values. The |
| | | | node that is fenced with the shorter |
| | | | delay will lose a fencing race. The |
| | | | overall delay introduced by Pacemaker |
| | | | is derived from this value plus a |
| | | | random delay such that the sum is kept |
| | | | below the maximum delay. |
+----------------------+---------+--------------------+----------------------------------------+
| pcmk_action_limit | integer | 1 | .. index:: |
| | | | single: pcmk_action_limit |
| | | | |
| | | | The maximum number of actions that can |
| | | | be performed in parallel on this |
- | | | | device, if the cluster option |
- | | | | ``concurrent-fencing`` is ``true``. A |
- | | | | value of -1 means unlimited. |
+ | | | | device. A value of -1 means unlimited. |
+ | | | | Node fencing actions initiated by the |
+ | | | | cluster (as opposed to an administrator|
+ | | | | running the ``stonith_admin`` tool or |
+ | | | | the fencer running recurring device |
+ | | | | monitors and ``status`` and ``list`` |
+ | | | | commands) are additionally subject to |
+ | | | | the ``concurrent-fencing`` cluster |
+ | | | | property. |
+----------------------+---------+--------------------+----------------------------------------+
| pcmk_host_argument | string | ``port`` otherwise | .. index:: |
| | | ``plug`` if | single: pcmk_host_argument |
| | | supported | |
| | | according to the | *Advanced use only.* Which parameter |
| | | metadata of the | should be supplied to the fence agent |
| | | fence agent | to identify the node to be fenced. |
| | | | Some devices support neither the |
| | | | standard ``plug`` nor the deprecated |
| | | | ``port`` parameter, or may provide |
| | | | additional ones. Use this to specify |
| | | | an alternate, device-specific |
| | | | parameter. A value of ``none`` tells |
| | | | the cluster not to supply any |
| | | | additional parameters. |
+----------------------+---------+--------------------+----------------------------------------+
| pcmk_reboot_action | string | reboot | .. index:: |
| | | | single: pcmk_reboot_action |
| | | | |
| | | | *Advanced use only.* The command to |
| | | | send to the resource agent in order to |
| | | | reboot a node. Some devices do not |
| | | | support the standard commands or may |
| | | | provide additional ones. Use this to |
| | | | specify an alternate, device-specific |
| | | | command. |
+----------------------+---------+--------------------+----------------------------------------+
| pcmk_reboot_timeout | time | 60s | .. index:: |
| | | | single: pcmk_reboot_timeout |
| | | | |
| | | | *Advanced use only.* Specify an |
| | | | alternate timeout to use for |
| | | | ``reboot`` actions instead of the |
| | | | value of ``stonith-timeout``. Some |
| | | | devices need much more or less time to |
| | | | complete than normal. Use this to |
| | | | specify an alternate, device-specific |
| | | | timeout. |
+----------------------+---------+--------------------+----------------------------------------+
| pcmk_reboot_retries | integer | 2 | .. index:: |
| | | | single: pcmk_reboot_retries |
| | | | |
| | | | *Advanced use only.* The maximum |
| | | | number of times to retry the |
| | | | ``reboot`` command within the timeout |
| | | | period. Some devices do not support |
| | | | multiple connections, and operations |
| | | | may fail if the device is busy with |
| | | | another task, so Pacemaker will |
| | | | automatically retry the operation, if |
| | | | there is time remaining. Use this |
| | | | option to alter the number of times |
| | | | Pacemaker retries before giving up. |
+----------------------+---------+--------------------+----------------------------------------+
| pcmk_off_action | string | off | .. index:: |
| | | | single: pcmk_off_action |
| | | | |
| | | | *Advanced use only.* The command to |
| | | | send to the resource agent in order to |
| | | | shut down a node. Some devices do not |
| | | | support the standard commands or may |
| | | | provide additional ones. Use this to |
| | | | specify an alternate, device-specific |
| | | | command. |
+----------------------+---------+--------------------+----------------------------------------+
| pcmk_off_timeout | time | 60s | .. index:: |
| | | | single: pcmk_off_timeout |
| | | | |
| | | | *Advanced use only.* Specify an |
| | | | alternate timeout to use for |
| | | | ``off`` actions instead of the |
| | | | value of ``stonith-timeout``. Some |
| | | | devices need much more or less time to |
| | | | complete than normal. Use this to |
| | | | specify an alternate, device-specific |
| | | | timeout. |
+----------------------+---------+--------------------+----------------------------------------+
| pcmk_off_retries | integer | 2 | .. index:: |
| | | | single: pcmk_off_retries |
| | | | |
| | | | *Advanced use only.* The maximum |
| | | | number of times to retry the |
| | | | ``off`` command within the timeout |
| | | | period. Some devices do not support |
| | | | multiple connections, and operations |
| | | | may fail if the device is busy with |
| | | | another task, so Pacemaker will |
| | | | automatically retry the operation, if |
| | | | there is time remaining. Use this |
| | | | option to alter the number of times |
| | | | Pacemaker retries before giving up. |
+----------------------+---------+--------------------+----------------------------------------+
| pcmk_list_action | string | list | .. index:: |
| | | | single: pcmk_list_action |
| | | | |
| | | | *Advanced use only.* The command to |
| | | | send to the resource agent in order to |
| | | | list nodes. Some devices do not |
| | | | support the standard commands or may |
| | | | provide additional ones. Use this to |
| | | | specify an alternate, device-specific |
| | | | command. |
+----------------------+---------+--------------------+----------------------------------------+
| pcmk_list_timeout | time | 60s | .. index:: |
| | | | single: pcmk_list_timeout |
| | | | |
| | | | *Advanced use only.* Specify an |
| | | | alternate timeout to use for |
| | | | ``list`` actions instead of the |
| | | | value of ``stonith-timeout``. Some |
| | | | devices need much more or less time to |
| | | | complete than normal. Use this to |
| | | | specify an alternate, device-specific |
| | | | timeout. |
+----------------------+---------+--------------------+----------------------------------------+
| pcmk_list_retries | integer | 2 | .. index:: |
| | | | single: pcmk_list_retries |
| | | | |
| | | | *Advanced use only.* The maximum |
| | | | number of times to retry the |
| | | | ``list`` command within the timeout |
| | | | period. Some devices do not support |
| | | | multiple connections, and operations |
| | | | may fail if the device is busy with |
| | | | another task, so Pacemaker will |
| | | | automatically retry the operation, if |
| | | | there is time remaining. Use this |
| | | | option to alter the number of times |
| | | | Pacemaker retries before giving up. |
+----------------------+---------+--------------------+----------------------------------------+
| pcmk_monitor_action | string | monitor | .. index:: |
| | | | single: pcmk_monitor_action |
| | | | |
| | | | *Advanced use only.* The command to |
| | | | send to the resource agent in order to |
| | | | report extended status. Some devices do|
| | | | not support the standard commands or |
| | | | may provide additional ones. Use this |
| | | | to specify an alternate, |
| | | | device-specific command. |
+----------------------+---------+--------------------+----------------------------------------+
| pcmk_monitor_timeout | time | 60s | .. index:: |
| | | | single: pcmk_monitor_timeout |
| | | | |
| | | | *Advanced use only.* Specify an |
| | | | alternate timeout to use for |
| | | | ``monitor`` actions instead of the |
| | | | value of ``stonith-timeout``. Some |
| | | | devices need much more or less time to |
| | | | complete than normal. Use this to |
| | | | specify an alternate, device-specific |
| | | | timeout. |
+----------------------+---------+--------------------+----------------------------------------+
| pcmk_monitor_retries | integer | 2 | .. index:: |
| | | | single: pcmk_monitor_retries |
| | | | |
| | | | *Advanced use only.* The maximum |
| | | | number of times to retry the |
| | | | ``monitor`` command within the timeout |
| | | | period. Some devices do not support |
| | | | multiple connections, and operations |
| | | | may fail if the device is busy with |
| | | | another task, so Pacemaker will |
| | | | automatically retry the operation, if |
| | | | there is time remaining. Use this |
| | | | option to alter the number of times |
| | | | Pacemaker retries before giving up. |
+----------------------+---------+--------------------+----------------------------------------+
| pcmk_status_action | string | status | .. index:: |
| | | | single: pcmk_status_action |
| | | | |
| | | | *Advanced use only.* The command to |
| | | | send to the resource agent in order to |
| | | | report status. Some devices do |
| | | | not support the standard commands or |
| | | | may provide additional ones. Use this |
| | | | to specify an alternate, |
| | | | device-specific command. |
+----------------------+---------+--------------------+----------------------------------------+
| pcmk_status_timeout | time | 60s | .. index:: |
| | | | single: pcmk_status_timeout |
| | | | |
| | | | *Advanced use only.* Specify an |
| | | | alternate timeout to use for |
| | | | ``status`` actions instead of the |
| | | | value of ``stonith-timeout``. Some |
| | | | devices need much more or less time to |
| | | | complete than normal. Use this to |
| | | | specify an alternate, device-specific |
| | | | timeout. |
+----------------------+---------+--------------------+----------------------------------------+
| pcmk_status_retries | integer | 2 | .. index:: |
| | | | single: pcmk_status_retries |
| | | | |
| | | | *Advanced use only.* The maximum |
| | | | number of times to retry the |
| | | | ``status`` command within the timeout |
| | | | period. Some devices do not support |
| | | | multiple connections, and operations |
| | | | may fail if the device is busy with |
| | | | another task, so Pacemaker will |
| | | | automatically retry the operation, if |
| | | | there is time remaining. Use this |
| | | | option to alter the number of times |
| | | | Pacemaker retries before giving up. |
+----------------------+---------+--------------------+----------------------------------------+
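
To illustrate the ``pcmk_delay_base`` description in the table, here is a rough
sketch for a two-node cluster. The device IDs and the choice of the
``fence_ipmilan`` agent are assumptions, and the connection parameters each
device would need are omitted.

.. code-block:: xml

   <primitive id="fence-node1" class="stonith" type="fence_ipmilan">
     <instance_attributes id="fence-node1-params">
       <nvpair id="fence-node1-host-list" name="pcmk_host_list" value="node1"/>
       <!-- Fencing of node1 waits 5 seconds before executing -->
       <nvpair id="fence-node1-delay" name="pcmk_delay_base" value="5s"/>
     </instance_attributes>
   </primitive>
   <primitive id="fence-node2" class="stonith" type="fence_ipmilan">
     <instance_attributes id="fence-node2-params">
       <nvpair id="fence-node2-host-list" name="pcmk_host_list" value="node2"/>
       <!-- No delay configured, so the default of 0s applies -->
     </instance_attributes>
   </primitive>

If both nodes try to fence each other at the same moment, ``node2`` is the
target with the shorter delay, so it would be expected to lose the race.
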
+Default Check Type
+##################
+
+If the user does not explicitly configure ``pcmk_host_check`` for a fence
+device, a default value appropriate to other configured parameters will be
+used:
+
+* If either ``pcmk_host_list`` or ``pcmk_host_map`` is configured,
+ ``static-list`` will be used;
+* otherwise, if the fence device supports the ``list`` action, and the first
+ attempt at using ``list`` succeeds, ``dynamic-list`` will be used;
+* otherwise, if the fence device supports the ``status`` action, ``status``
+ will be used;
+* otherwise, ``none`` will be used.
+
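
As an illustration of the first rule, the following sketch configures
``pcmk_host_map`` (reusing the example mapping from the ``pcmk_host_map``
description) without setting ``pcmk_host_check``, so ``static-list`` would be
used. The device ID, the choice of the ``fence_apc_snmp`` agent, and the
omitted connection parameters are assumptions for the example.

.. code-block:: xml

   <primitive id="fence-pdu" class="stonith" type="fence_apc_snmp">
     <instance_attributes id="fence-pdu-params">
       <!-- node1 is on port 1; node2 is on ports 2 and 3 -->
       <nvpair id="fence-pdu-host-map" name="pcmk_host_map"
               value="node1:1;node2:2,3"/>
       <!-- pcmk_host_check is not set, so static-list applies by default -->
     </instance_attributes>
   </primitive>
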
.. index::
single: unfencing
single: fencing; unfencing
.. _unfencing:
Unfencing
#########
With fabric fencing (such as cutting network or shared disk access rather than
power), it is expected that the cluster will fence the node, and then a system
administrator must manually investigate what went wrong, correct any issues
found, then reboot (or restart the cluster services on) the node.
Once the node reboots and rejoins the cluster, some fabric fencing devices
require an explicit command to restore the node's access. This capability is
called *unfencing* and is typically implemented as the fence agent's ``on``
command.
If any cluster resource has ``requires`` set to ``unfencing``, then that
resource will not be probed or started on a node until that node has been
unfenced.
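
A minimal sketch of such a resource, assuming a placeholder ``Dummy`` resource
named ``shared-data`` (all IDs are made up), sets ``requires`` as a
meta-attribute:

.. code-block:: xml

   <primitive id="shared-data" class="ocf" provider="pacemaker" type="Dummy">
     <meta_attributes id="shared-data-meta">
       <!-- Do not probe or start this resource until the node is unfenced -->
       <nvpair id="shared-data-requires" name="requires" value="unfencing"/>
     </meta_attributes>
   </primitive>
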
+Fencing and Quorum
+##################
+
+In general, a cluster partition may execute fencing only if the partition has
+quorum, and the ``stonith-enabled`` cluster property is set to true. However,
+there are exceptions:
+
+* The requirements apply only to fencing initiated by Pacemaker. If an
+ administrator initiates fencing using the ``stonith_admin`` command, or an
+ external application such as DLM initiates fencing using Pacemaker's C API,
+ the requirements do not apply.
+
+* A cluster partition without quorum is allowed to fence any active member of
+ that partition. As a corollary, this allows a ``no-quorum-policy`` of
+ ``suicide`` to work.
+
+* If the ``no-quorum-policy`` cluster property is set to ``ignore``, then
+ quorum is not required to execute fencing of any node.
+
+Fencing Timeouts
+################
+
+Fencing timeouts are complicated, since a single fencing operation can involve
+many steps, each of which may have a separate timeout.
+
+Fencing may be initiated in one of several ways:
+
+* An administrator may initiate fencing using the ``stonith_admin`` tool,
+ which has a ``--timeout`` option (defaulting to 2 minutes) that will be used
+ as the fence operation timeout.
+
+* An external application such as DLM may initiate fencing using the Pacemaker
+ C API. The application will specify the fence operation timeout in this case,
+ which might or might not be configurable by the user.
+
+* The cluster may initiate fencing itself. In this case, the
+ ``stonith-timeout`` cluster property (defaulting to 1 minute) will be used as
+ the fence operation timeout.
+
+However fencing is initiated, the initiator contacts Pacemaker's fencer
+(``pacemaker-fenced``) to request fencing. This connection and request has its
+own timeout, separate from the fencing operation timeout, but usually happens
+very quickly.
+
+The fencer will contact all fencers in the cluster to ask what devices they
+have available to fence the target node. The fence operation timeout will be
+used as the timeout for each of these queries.
+
+Once a fencing device has been selected, the fencer will check whether any
+action-specific timeout has been configured for the device, to use instead of
+the fence operation timeout. For example, if ``stonith-timeout`` is 60 seconds,
+but the fencing device has ``pcmk_reboot_timeout`` configured as 90 seconds,
+then a timeout of 90 seconds will be used for reboot actions using that device.
+
+A device may have retries configured, in which case the timeout applies across
+all attempts. For example, if a device has ``pcmk_reboot_retries`` configured
+as 2, and the first reboot attempt fails, the second attempt will only have
+whatever time is remaining in the action timeout after subtracting how much
+time the first attempt used. This means that if the first attempt fails due to
+using the entire timeout, no further attempts will be made. There is currently
+no way to configure a per-attempt timeout.
+
+If more than one device is required to fence a target, whether due to failure
+of the first device or a fencing topology with multiple devices configured for
+the target, each device will have its own separate action timeout.
+
+For all of the above timeouts, the fencer will generally multiply the
+configured value by 1.2 to get an actual value to use, to account for time
+needed by the fencer's own processing.
+
+Separate from the fencer's timeouts, some fence agents have internal timeouts
+for individual steps of their fencing process. These agents often have
+parameters to configure these timeouts, such as ``login-timeout``,
+``shell-timeout``, or ``power-timeout``. Many such agents also have a
+``disable-timeout`` parameter to ignore their internal timeouts and just let
+Pacemaker handle the timeout. This causes a difference in retry behavior.
+If ``disable-timeout`` is not set, and the agent hits one of its internal
+timeouts, it will report that as a failure to Pacemaker, which can then retry.
+If ``disable-timeout`` is set, and Pacemaker hits a timeout for the agent, then
+there will be no time remaining, and no retry will be done.
+
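
The 60-second and 90-second example above could be configured roughly as
follows. This is only a sketch: the device ID and the ``fence_ipmilan`` agent
are assumptions, and the device's connection parameters are omitted.

.. code-block:: xml

   <!-- Fence operation timeout for fencing initiated by the cluster -->
   <crm_config>
     <cluster_property_set id="cib-bootstrap-options">
       <nvpair id="opt-stonith-timeout" name="stonith-timeout" value="60s"/>
     </cluster_property_set>
   </crm_config>

   <!-- Device-specific override: reboot actions using this device get
        90 seconds, shared across the first attempt and any retries -->
   <primitive id="fence-slow-bmc" class="stonith" type="fence_ipmilan">
     <instance_attributes id="fence-slow-bmc-params">
       <nvpair id="fence-slow-bmc-reboot-timeout"
               name="pcmk_reboot_timeout" value="90s"/>
       <nvpair id="fence-slow-bmc-reboot-retries"
               name="pcmk_reboot_retries" value="2"/>
     </instance_attributes>
   </primitive>

Given the 1.2 multiplier mentioned above, the fencer would generally allow
about 90 × 1.2 = 108 seconds for a reboot using this device.
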
Fence Devices Dependent on Other Resources
##########################################
In some cases, a fence device may require some other cluster resource (such as
an IP address) to be active in order to function properly.
This is obviously undesirable in general: fencing may be required when the
depended-on resource is not active, or fencing may be required because the node
running the depended-on resource is no longer responding.
However, this may be acceptable under certain conditions:
* The dependent fence device should not be able to target any node that is
allowed to run the depended-on resource.
* The depended-on resource should not be disabled during production operation.
* The ``concurrent-fencing`` cluster property should be set to ``true``.
Otherwise, if both the node running the depended-on resource and some node
targeted by the dependent fence device need to be fenced, the fencing of the
node running the depended-on resource might be ordered first, making the
second fencing impossible and blocking further recovery. With concurrent
fencing, the dependent fence device might fail at first due to the
depended-on resource being unavailable, but it will be retried and eventually
succeed once the resource is brought back up.
Even under those conditions, there is one unlikely problem scenario. The DC
always schedules fencing of itself after any other fencing needed, to avoid
unnecessary repeated DC elections. If the dependent fence device targets the
DC, and both the DC and a different node running the depended-on resource need
to be fenced, the DC fencing will always fail and block further recovery. Note,
however, that losing a DC node entirely causes some other node to become DC and
schedule the fencing, so this is only a risk when a stop or other operation
with ``on-fail`` set to ``fencing`` fails on the DC.
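
For reference, enabling ``concurrent-fencing`` as recommended above might look
like the following sketch in the ``crm_config`` section. The set ID
``cib-bootstrap-options`` is a common convention and the nvpair ID is
arbitrary; both are assumptions here rather than requirements.

.. code-block:: xml

   <crm_config>
     <cluster_property_set id="cib-bootstrap-options">
       <!-- Allow the cluster to initiate fencing of several targets in parallel -->
       <nvpair id="cib-bootstrap-options-concurrent-fencing"
               name="concurrent-fencing" value="true"/>
     </cluster_property_set>
   </crm_config>
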
.. index::
single: fencing; configuration
Configuring Fencing
###################
Higher-level tools can provide simpler interfaces to this process, but using
Pacemaker command-line tools, this is how you could configure a fence device.
#. Find the correct driver:
.. code-block:: none
# stonith_admin --list-installed
.. note::
You may have to install packages to make fence agents available on your
host. Searching your available packages for ``fence-`` is usually
helpful. Ensure the packages providing the fence agents you require are
installed on every cluster node.
#. Find the required parameters associated with the device
(replacing ``$AGENT_NAME`` with the name obtained from the previous step):
.. code-block:: none
# stonith_admin --metadata --agent $AGENT_NAME
#. Create a file called ``stonith.xml`` containing a primitive resource
with a class of ``stonith``, a type equal to the agent name obtained earlier,
and a parameter for each of the values returned in the previous step.
#. If the device does not know how to fence nodes based on their uname,
you may also need to set the special ``pcmk_host_map`` parameter. See
:ref:`fencing-attributes` for details.
#. If the device does not support the ``list`` command, you may also need
to set the special ``pcmk_host_list`` and/or ``pcmk_host_check``
parameters. See :ref:`fencing-attributes` for details.
#. If the device does not expect the victim to be specified with the
``port`` parameter, you may also need to set the special
``pcmk_host_argument`` parameter. See :ref:`fencing-attributes` for details.
#. Upload it into the CIB using cibadmin:
.. code-block:: none
# cibadmin --create --scope resources --xml-file stonith.xml
#. Set ``stonith-enabled`` to true:
.. code-block:: none
# crm_attribute --type crm_config --name stonith-enabled --update true
#. Once the stonith resource is running, you can test it by executing the
following, replacing ``$NODE_NAME`` with the name of the node to fence
(although you might want to stop the cluster on that machine first):
.. code-block:: none
# stonith_admin --reboot $NODE_NAME
Example Fencing Configuration
_____________________________
For this example, we assume we have a cluster node, ``pcmk-1``, whose IPMI
controller is reachable at the IP address 192.0.2.1. The IPMI controller uses
the username ``testuser`` and the password ``abc123``.
#. Looking at what's installed, we may see a variety of available agents:
.. code-block:: none
# stonith_admin --list-installed
.. code-block:: none
(... some output omitted ...)
fence_idrac
fence_ilo3
fence_ilo4
fence_ilo5
fence_imm
fence_ipmilan
(... some output omitted ...)
Perhaps after reading some man pages and doing some Internet searches,
we might decide ``fence_ipmilan`` is our best choice.
#. Next, we would check what parameters ``fence_ipmilan`` provides:
.. code-block:: none
# stonith_admin --metadata -a fence_ipmilan
.. code-block:: xml
fence_ipmilan is an I/O Fencing agent which can be used with machines controlled by IPMI. This agent calls support software ipmitool (http://ipmitool.sf.net/). WARNING! This fence agent might report success before the node is powered off. You should use -m/method onoff if your fence device works correctly with that option.
Fencing action
IPMI Lan Auth type.
Ciphersuite to use (same as ipmitool -C parameter)
Hexadecimal-encoded Kg key for IPMIv2 authentication
IP address or hostname of fencing device
IP address or hostname of fencing device
TCP/UDP port to use for connection with device
Use Lanplus to improve security of connection
Login name
Method to fence
Login password or passphrase
Script to run to retrieve password
Login password or passphrase
Script to run to retrieve password
IP address or hostname of fencing device (together with --port-as-ip)
IP address or hostname of fencing device (together with --port-as-ip)
Privilege level on IPMI device
Bridge IPMI requests to the remote target address
Login name
Disable logging to stderr. Does not affect --verbose or --debug-file or logging to syslog.
Verbose mode
Write debug information to given file
Write debug information to given file
Display version information and exit
Display help and exit
Wait X seconds before fencing is started
Path to ipmitool binary
Wait X seconds for cmd prompt after login
Make "port/plug" to be an alias to IP address
Test X seconds for status change after ON/OFF
Wait X seconds after issuing ON/OFF
Wait X seconds for cmd prompt after issuing command
Count of attempts to retry power on
Use sudo (without password) when calling 3rd party software
Use sudo (without password) when calling 3rd party software
Path to sudo binary
Once we've decided what parameter values we think we need, it is a good idea
to run the fence agent's status action manually, to verify that our values
work correctly:
.. code-block:: none
# fence_ipmilan --lanplus -a 192.0.2.1 -l testuser -p abc123 -o status
Chassis Power is on
#. Based on that, we might create a fencing resource configuration like this in
``stonith.xml`` (or any file name, just use the same name with ``cibadmin``
later):
.. code-block:: xml
.. note::
Even though the man page shows that the ``action`` parameter is
supported, we do not provide that in the resource configuration.
Pacemaker will supply an appropriate action whenever the fence device
must be used.
#. In this case, we don't need to configure ``pcmk_host_map`` because
``fence_ipmilan`` ignores the target node name and instead uses its
``ip`` parameter to know how to contact the IPMI controller.
#. We do need to let Pacemaker know which cluster node can be fenced by this
device, since ``fence_ipmilan`` doesn't support the ``list`` action. Add
a line like this to the agent's instance attributes:
.. code-block:: xml
#. We don't need to configure ``pcmk_host_argument`` since ``ip`` is all the
fence agent needs (it ignores the target name).
#. Make the configuration active:
.. code-block:: none
# cibadmin --create --scope resources --xml-file stonith.xml
#. Set ``stonith-enabled`` to true (this only has to be done once):
.. code-block:: none
# crm_attribute --type crm_config --name stonith-enabled --update true
#. Since our cluster is still in testing, we can reboot ``pcmk-1`` without
bothering anyone, so we'll test our fencing configuration by running this
from one of the other cluster nodes:
.. code-block:: none
# stonith_admin --reboot pcmk-1
Then we will verify that the node did, in fact, reboot.
We can repeat that process to create a separate fencing resource for each node.
With some other fence device types, a single fencing resource can be
used for all nodes. In fact, we could do that with ``fence_ipmilan``, using the
``port-as-ip`` parameter along with ``pcmk_host_map``. Either approach is
fine.
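
Putting the steps of this example together, the resulting ``stonith.xml``
might look roughly like the sketch below. The resource and nvpair IDs and the
recurring monitor operation are assumptions added for illustration. The
parameter names ``lanplus``, ``ip``, ``username``, and ``password`` are assumed
to correspond to the options used in the manual ``status`` test; check them
against the metadata reported by your installed agent.

.. code-block:: xml

   <!-- Sketch only: IDs and the monitor operation are illustrative -->
   <primitive id="Fencing-pcmk-1" class="stonith" type="fence_ipmilan">
     <instance_attributes id="Fencing-pcmk-1-params">
       <nvpair id="Fencing-pcmk-1-lanplus" name="lanplus" value="1"/>
       <nvpair id="Fencing-pcmk-1-ip" name="ip" value="192.0.2.1"/>
       <nvpair id="Fencing-pcmk-1-username" name="username" value="testuser"/>
       <nvpair id="Fencing-pcmk-1-password" name="password" value="abc123"/>
       <!-- Tells Pacemaker which node this device can fence -->
       <nvpair id="Fencing-pcmk-1-host-list" name="pcmk_host_list" value="pcmk-1"/>
     </instance_attributes>
     <operations>
       <op id="Fencing-pcmk-1-monitor-10m" name="monitor" interval="10m"
           timeout="300s"/>
     </operations>
   </primitive>

Note that no ``action`` parameter appears, since Pacemaker supplies the action
itself.
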
.. index::
single: fencing; topology
single: fencing-topology
single: fencing-level
Fencing Topologies
##################
Pacemaker supports fencing nodes with multiple devices through a feature called
*fencing topologies*. Fencing topologies may be used to provide alternative
devices in case one fails, or to require multiple devices to all be executed
successfully in order to consider the node successfully fenced, or even a
combination of the two.
Create the individual devices as you normally would, then define one or more
``fencing-level`` entries in the ``fencing-topology`` section of the
configuration.
* Each fencing level is attempted in order of ascending ``index``. Allowed
values are 1 through 9.
* If a device fails, processing terminates for the current level. No further
devices in that level are exercised, and the next level is attempted instead.
* If the operation succeeds for all the listed devices in a level, the level is
deemed to have passed.
* The operation is finished when a level has passed (success), or all levels
have been attempted (failed).
* If the operation failed, the next step is determined by the scheduler and/or
the controller.
Some possible uses of topologies include:
* Try on-board IPMI, then an intelligent power switch if that fails
* Try fabric fencing of both disk and network, then fall back to power fencing
if either fails
* Wait up to a certain time for a kernel dump to complete, then cut power to
the node
.. table:: **Attributes of a fencing-level Element**
+------------------+-----------------------------------------------------------------------------------------+
| Attribute | Description |
+==================+=========================================================================================+
| id | .. index:: |
| | pair: fencing-level; id |
| | |
| | A unique name for this element (required) |
+------------------+-----------------------------------------------------------------------------------------+
| target | .. index:: |
| | pair: fencing-level; target |
| | |
| | The name of a single node to which this level applies |
+------------------+-----------------------------------------------------------------------------------------+
| target-pattern | .. index:: |
| | pair: fencing-level; target-pattern |
| | |
| | An extended regular expression (as defined in `POSIX |
| | `_) |
| | matching the names of nodes to which this level applies |
+------------------+-----------------------------------------------------------------------------------------+
| target-attribute | .. index:: |
| | pair: fencing-level; target-attribute |
| | |
| | The name of a node attribute that is set (to ``target-value``) for nodes to which this |
| | level applies |
+------------------+-----------------------------------------------------------------------------------------+
| target-value | .. index:: |
| | pair: fencing-level; target-value |
| | |
| | The node attribute value (of ``target-attribute``) that is set for nodes to which this |
| | level applies |
+------------------+-----------------------------------------------------------------------------------------+
| index | .. index:: |
| | pair: fencing-level; index |
| | |
| | The order in which to attempt the levels. Levels are attempted in ascending order |
| | *until one succeeds*. Valid values are 1 through 9. |
+------------------+-----------------------------------------------------------------------------------------+
| devices | .. index:: |
| | pair: fencing-level; devices |
| | |
| | A comma-separated list of devices that must all be tried for this level |
+------------------+-----------------------------------------------------------------------------------------+
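
As a sketch of the less common targeting attributes in the table above, the
following levels select nodes by name pattern and by node attribute rather
than by a single node name. The node naming scheme, the ``rack`` node
attribute, and the device names are hypothetical.

.. code-block:: xml

   <fencing-topology>
     <!-- Applies to every node whose name matches the extended regular expression -->
     <fencing-level id="fl-db-1" target-pattern="^prod-db[0-9]+$"
                    index="1" devices="fence-ipmi-db"/>
     <!-- Applies to every node whose "rack" node attribute is set to "1" -->
     <fencing-level id="fl-rack1-1" target-attribute="rack" target-value="1"
                    index="1" devices="fence-pdu-rack1"/>
   </fencing-topology>
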
.. note:: **Fencing topology with different devices for different nodes**
.. code-block:: xml
...
...
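
A topology of this kind could be sketched as follows. The node names
(``pcmk-1``, ``pcmk-2``), the device names, and the level IDs are assumptions
chosen for illustration; the devices themselves would be defined as ordinary
``stonith`` resources elsewhere in the configuration.

.. code-block:: xml

   <fencing-topology>
     <!-- pcmk-1: try the first device, then fall back to power fencing -->
     <fencing-level id="f-p1.1" target="pcmk-1" index="1" devices="poison-pill"/>
     <fencing-level id="f-p1.2" target="pcmk-1" index="2" devices="power"/>

     <!-- pcmk-2: disk and network fencing must both succeed,
          otherwise fall back to power fencing -->
     <fencing-level id="f-p2.1" target="pcmk-2" index="1" devices="disk,network"/>
     <fencing-level id="f-p2.2" target="pcmk-2" index="2" devices="power"/>
   </fencing-topology>
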
Example Dual-Layer, Dual-Device Fencing Topologies
__________________________________________________
The following example illustrates an advanced use of ``fencing-topology`` in a
cluster with the following properties:
* 2 nodes (prod-mysql1 and prod-mysql2)
* the nodes have IPMI controllers reachable at 192.0.2.1 and 192.0.2.2
* the nodes each have two independent Power Supply Units (PSUs) connected to
two independent Power Distribution Units (PDUs) reachable at 198.51.100.1
(port 10 and port 11) and 203.0.113.1 (port 10 and port 11)
* fencing via the IPMI controller uses the ``fence_ipmilan`` agent (1 fence device
per controller, with each device targeting a separate node)
* fencing via the PDUs uses the ``fence_apc_snmp`` agent (1 fence device per
PDU, with both devices targeting both nodes)
* a random delay is used to lessen the chance of a "death match"
* fencing topology is set to try IPMI fencing first then dual PDU fencing if
that fails
In a node failure scenario, Pacemaker will first select ``fence_ipmilan`` to
try to kill the faulty node. Using the fencing topology, if that method fails,
it will then move on to selecting ``fence_apc_snmp`` twice (once for the first
PDU, then again for the second PDU).
The fence action is considered successful only if both PDUs report the required
status. If either of them fails, fencing loops back to the first fencing method,
``fence_ipmilan``, and so on, until the node is fenced or the fencing action is
cancelled.
.. note:: **First fencing method: single IPMI device per target**
Each cluster node has its own dedicated IPMI controller that can be contacted
for fencing using the following primitives:
.. code-block:: xml
.. note:: **Second fencing method: dual PDU devices**
Each cluster node also has 2 distinct power supplies controlled by 2
distinct PDUs:
* Node 1: PDU 1 port 10 and PDU 2 port 10
* Node 2: PDU 1 port 11 and PDU 2 port 11
The matching fencing agents are configured as follows:
.. code-block:: xml
.. note:: **Fencing topology**
Now that all the fencing resources are defined, it's time to create the
right topology. We want to fence using IPMI first and, if that does not work,
fence via both PDUs to make sure the node is actually powered off.
.. code-block:: xml
In ``fencing-topology``, the lowest ``index`` value for a target determines
its first fencing method.
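
The overall shape of this configuration can be sketched as below. This is an
outline under assumptions, not a complete configuration: all IDs are
illustrative, authentication parameters for both agents are omitted, the
``8s`` delay is an arbitrary value, and only one device of each kind is
written out in full. The ``ip`` parameter name should be checked against each
agent's metadata.

.. code-block:: xml

   <configuration>
     <!-- crm_config, nodes and constraints omitted -->
     <resources>
       <!-- First method: one IPMI device per node (prod-mysql2 is analogous,
            using 192.0.2.2) -->
       <primitive id="fence_prod-mysql1_ipmi" class="stonith" type="fence_ipmilan">
         <instance_attributes id="fence_prod-mysql1_ipmi-params">
           <nvpair id="fence_prod-mysql1_ipmi-ip" name="ip" value="192.0.2.1"/>
           <nvpair id="fence_prod-mysql1_ipmi-hosts" name="pcmk_host_list"
                   value="prod-mysql1"/>
           <!-- Random delay to lessen the chance of a "death match" -->
           <nvpair id="fence_prod-mysql1_ipmi-delay" name="pcmk_delay_max"
                   value="8s"/>
         </instance_attributes>
       </primitive>

       <!-- Second method: each PDU device can target both nodes (fence_apc2 is
            analogous, using 203.0.113.1) -->
       <primitive id="fence_apc1" class="stonith" type="fence_apc_snmp">
         <instance_attributes id="fence_apc1-params">
           <nvpair id="fence_apc1-ip" name="ip" value="198.51.100.1"/>
           <nvpair id="fence_apc1-hosts" name="pcmk_host_list"
                   value="prod-mysql1,prod-mysql2"/>
           <!-- Map each node name to its PDU port -->
           <nvpair id="fence_apc1-map" name="pcmk_host_map"
                   value="prod-mysql1:10;prod-mysql2:11"/>
           <nvpair id="fence_apc1-delay" name="pcmk_delay_max" value="8s"/>
         </instance_attributes>
       </primitive>
     </resources>

     <fencing-topology>
       <!-- Level 1: IPMI. Level 2: both PDUs must succeed. -->
       <fencing-level id="level-1-1" target="prod-mysql1" index="1"
                      devices="fence_prod-mysql1_ipmi"/>
       <fencing-level id="level-1-2" target="prod-mysql1" index="2"
                      devices="fence_apc1,fence_apc2"/>
       <fencing-level id="level-2-1" target="prod-mysql2" index="1"
                      devices="fence_prod-mysql2_ipmi"/>
       <fencing-level id="level-2-2" target="prod-mysql2" index="2"
                      devices="fence_apc1,fence_apc2"/>
     </fencing-topology>
   </configuration>
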
Remapping Reboots
#################
When the cluster needs to reboot a node, whether because ``stonith-action`` is
``reboot`` or because a reboot was requested externally (such as by
``stonith_admin --reboot``), it will remap that to other commands in two cases:
* If the chosen fencing device does not support the ``reboot`` command, the
cluster will ask it to perform ``off`` instead.
* If a fencing topology level with multiple devices must be executed, the
cluster will ask all the devices to perform ``off``, then ask the devices to
perform ``on``.
To understand the second case, consider the example of a node with redundant
power supplies connected to intelligent power switches. Rebooting one switch
and then the other would have no effect on the node. Turning both switches off,
and then on, actually reboots the node.
In such a case, the fencing operation will be treated as successful as long as
the ``off`` commands succeed, because then it is safe for the cluster to
recover any resources that were on the node. Timeouts and errors in the ``on``
phase will be logged but ignored.
When a reboot operation is remapped, any action-specific timeout for the
remapped action will be used (for example, ``pcmk_off_timeout`` will be used
when executing the ``off`` command, not ``pcmk_reboot_timeout``).
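
For instance, a device configured as in the following sketch would use the
120-second limit, not the 60-second one, for the ``off`` phase of a remapped
reboot. The device name, the IDs, and the timeout values are hypothetical.

.. code-block:: xml

   <primitive id="fence-pdu-1" class="stonith" type="fence_apc_snmp">
     <instance_attributes id="fence-pdu-1-params">
       <!-- Used only when the device is actually asked to perform "reboot" -->
       <nvpair id="fence-pdu-1-reboot-timeout" name="pcmk_reboot_timeout"
               value="60s"/>
       <!-- Used when a reboot is remapped to "off" -->
       <nvpair id="fence-pdu-1-off-timeout" name="pcmk_off_timeout"
               value="120s"/>
     </instance_attributes>
   </primitive>
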
diff --git a/doc/sphinx/Pacemaker_Explained/intro.rst b/doc/sphinx/Pacemaker_Explained/intro.rst
index 3c3805b931..a1240c308c 100644
--- a/doc/sphinx/Pacemaker_Explained/intro.rst
+++ b/doc/sphinx/Pacemaker_Explained/intro.rst
@@ -1,22 +1,22 @@
Introduction
------------
The Scope of this Document
##########################
This document is intended to be an exhaustive reference for configuring
Pacemaker. To achieve this, it focuses on the XML syntax used to configure the
CIB.
For those who are allergic to XML, multiple higher-level front-ends
(both command-line and GUI) are available. These tools will not be covered
in this document, though the concepts explained here should make the
functionality of these tools more easily understood.
Users may be interested in other parts of the
`Pacemaker documentation set `_,
-such as 'Clusters from Scratch', a step-by-step guide to setting up an
-example cluster, and 'Pacemaker Administration', a guide to maintaining a
+such as *Clusters from Scratch*, a step-by-step guide to setting up an
+example cluster, and *Pacemaker Administration*, a guide to maintaining a
cluster.
.. include:: ../shared/pacemaker-intro.rst
diff --git a/doc/sphinx/Pacemaker_Explained/options.rst b/doc/sphinx/Pacemaker_Explained/options.rst
index 9bc92ef53b..c83be50819 100644
--- a/doc/sphinx/Pacemaker_Explained/options.rst
+++ b/doc/sphinx/Pacemaker_Explained/options.rst
@@ -1,611 +1,618 @@
Cluster-Wide Configuration
--------------------------
.. index::
pair: XML element; cib
pair: XML element; configuration
Configuration Layout
####################
The cluster is defined by the Cluster Information Base (CIB), which uses XML
notation. The simplest CIB, an empty one, looks like this:
.. topic:: An empty configuration
.. code-block:: xml
The empty configuration above contains the major sections that make up a CIB:
* ``cib``: The entire CIB is enclosed with a ``cib`` element. Certain
fundamental settings are defined as attributes of this element.
* ``configuration``: This section -- the primary focus of this document --
contains traditional configuration information such as what resources the
cluster serves and the relationships among them.
* ``crm_config``: cluster-wide configuration options
* ``nodes``: the machines that host the cluster
* ``resources``: the services run by the cluster
* ``constraints``: indications of how resources should be placed
* ``status``: This section contains the history of each resource on each
node. Based on this data, the cluster can construct the complete current
state of the cluster. The authoritative source for this section is the
local executor (pacemaker-execd process) on each cluster node, and the
cluster will occasionally repopulate the entire section. For this reason,
it is never written to disk, and administrators are advised against
modifying it in any way.
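
For orientation, an empty CIB containing the sections listed above might look
roughly like this sketch. The attribute values on the ``cib`` element are
illustrative assumptions; the actual values depend on the Pacemaker version
and the cluster's history.

.. code-block:: xml

   <!-- Attribute values are placeholders for illustration -->
   <cib crm_feature_set="3.6.0" validate-with="pacemaker-3.5"
        epoch="1" num_updates="0" admin_epoch="0">
     <configuration>
       <crm_config/>
       <nodes/>
       <resources/>
       <constraints/>
     </configuration>
     <status/>
   </cib>
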
In this document, configuration settings will be described as properties or
options based on how they are defined in the CIB:
* Properties are XML attributes of an XML element.
* Options are name-value pairs expressed as ``nvpair`` child elements of an XML
element.
Normally, you will use command-line tools that abstract the XML, so the
distinction will be unimportant; both properties and options are cluster
settings you can tweak.
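
To make the distinction concrete, the following sketch shows both kinds of
setting on one resource. The resource, its agent, and the IDs are hypothetical
and serve only to illustrate the XML shapes.

.. code-block:: xml

   <!-- "id", "class", "provider" and "type" are properties (XML attributes) -->
   <primitive id="Public-IP" class="ocf" provider="heartbeat" type="IPaddr2">
     <meta_attributes id="Public-IP-meta">
       <!-- "is-managed" is an option (an nvpair child element) -->
       <nvpair id="Public-IP-meta-is-managed" name="is-managed" value="false"/>
     </meta_attributes>
   </primitive>
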
CIB Properties
##############
Certain settings are defined by CIB properties (that is, attributes of the
``cib`` tag) rather than with the rest of the cluster configuration in the
``configuration`` section.
The reason is simply a matter of parsing. These options are used by the
configuration database which is, by design, mostly ignorant of the content it
holds. So the decision was made to place them in an easy-to-find location.
.. table:: **CIB Properties**
+------------------+-----------------------------------------------------------+
| Attribute | Description |
+==================+===========================================================+
| admin_epoch | .. index:: |
| | pair: admin_epoch; cib |
| | |
| | When a node joins the cluster, the cluster performs a |
| | check to see which node has the best configuration. It |
| | asks the node with the highest (``admin_epoch``, |
| | ``epoch``, ``num_updates``) tuple to replace the |
| | configuration on all the nodes -- which makes setting |
| | them, and setting them correctly, very important. |
| | ``admin_epoch`` is never modified by the cluster; you can |
| | use this to make the configurations on any inactive nodes |
| | obsolete. |
| | |
| | **Warning:** Never set this value to zero. In such cases, |
| | the cluster cannot tell the difference between your |
| | configuration and the "empty" one used when nothing is |
| | found on disk. |
+------------------+-----------------------------------------------------------+
| epoch | .. index:: |
| | pair: epoch; cib |
| | |
| | The cluster increments this every time the configuration |
| | is updated (usually by the administrator). |
+------------------+-----------------------------------------------------------+
| num_updates | .. index:: |
| | pair: num_updates; cib |
| | |
| | The cluster increments this every time the configuration |
| | or status is updated (usually by the cluster) and resets |
| | it to 0 when epoch changes. |
+------------------+-----------------------------------------------------------+
| validate-with | .. index:: |
| | pair: validate-with; cib |
| | |
| | Determines the type of XML validation that will be done |
| | on the configuration. If set to ``none``, the cluster |
| | will not verify that updates conform to the DTD (nor |
| | reject ones that don't). |
+------------------+-----------------------------------------------------------+
| cib-last-written | .. index:: |
| | pair: cib-last-written; cib |
| | |
| | Indicates when the configuration was last written to |
| | disk. Maintained by the cluster; for informational |
| | purposes only. |
+------------------+-----------------------------------------------------------+
| have-quorum | .. index:: |
| | pair: have-quorum; cib |
| | |
| | Indicates if the cluster has quorum. If false, this may |
| | mean that the cluster cannot start resources or fence |
| | other nodes (see ``no-quorum-policy`` below). Maintained |
| | by the cluster. |
+------------------+-----------------------------------------------------------+
| dc-uuid | .. index:: |
| | pair: dc-uuid; cib |
| | |
| | Indicates which cluster node is the current leader. Used |
| | by the cluster when placing resources and determining the |
| | order of some events. Maintained by the cluster. |
+------------------+-----------------------------------------------------------+
.. _cluster_options:
Cluster Options
###############
Cluster options, as you might expect, control how the cluster behaves when
confronted with various situations.
They are grouped into sets within the ``crm_config`` section. In advanced
configurations, there may be more than one set. (This will be described later
in the chapter on :ref:`rules` where we will show how to have the cluster use
different sets of options during working hours than during weekends.) For now,
we will describe the simple case where each option is present at most once.
You can obtain an up-to-date list of cluster options, including their default
values, by running the ``man pacemaker-schedulerd`` and
``man pacemaker-controld`` commands.
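
In the simple case, a single set of cluster options in ``crm_config`` might
look like the sketch below. The set ID, the nvpair IDs, and the chosen values
are assumptions for illustration, not recommendations.

.. code-block:: xml

   <crm_config>
     <cluster_property_set id="cib-bootstrap-options">
       <!-- Each option appears at most once as a name/value pair -->
       <nvpair id="cib-bootstrap-options-no-quorum-policy"
               name="no-quorum-policy" value="freeze"/>
       <nvpair id="cib-bootstrap-options-stonith-timeout"
               name="stonith-timeout" value="120s"/>
     </cluster_property_set>
   </crm_config>
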
.. table:: **Cluster Options**
+---------------------------+---------+----------------------------------------------------+
| Option | Default | Description |
+===========================+=========+====================================================+
| cluster-name | | .. index:: |
| | | pair: cluster option; cluster-name |
| | | |
| | | An (optional) name for the cluster as a whole. |
| | | This is mostly for users' convenience for use |
| | | as desired in administration, but this can be |
| | | used in the Pacemaker configuration in |
| | | :ref:`rules` (as the ``#cluster-name`` |
| | | :ref:`node attribute |
| | | `). It may |
| | | also be used by higher-level tools when |
| | | displaying cluster information, and by |
| | | certain resource agents (for example, the |
| | | ``ocf:heartbeat:GFS2`` agent stores the |
| | | cluster name in filesystem meta-data). |
+---------------------------+---------+----------------------------------------------------+
| dc-version | | .. index:: |
| | | pair: cluster option; dc-version |
| | | |
| | | Version of Pacemaker on the cluster's DC. |
| | | Determined automatically by the cluster. Often |
| | | includes the hash which identifies the exact |
| | | Git changeset it was built from. Used for |
| | | diagnostic purposes. |
+---------------------------+---------+----------------------------------------------------+
| cluster-infrastructure | | .. index:: |
| | | pair: cluster option; cluster-infrastructure |
| | | |
| | | The messaging stack on which Pacemaker is |
| | | currently running. Determined automatically by |
| | | the cluster. Used for informational and |
| | | diagnostic purposes. |
+---------------------------+---------+----------------------------------------------------+
| no-quorum-policy | stop | .. index:: |
| | | pair: cluster option; no-quorum-policy |
| | | |
| | | What to do when the cluster does not have |
| | | quorum. Allowed values: |
| | | |
| | | * ``ignore:`` continue all resource management |
| | | * ``freeze:`` continue resource management, but |
| | | don't recover resources from nodes not in the |
| | | affected partition |
| | | * ``stop:`` stop all resources in the affected |
| | | cluster partition |
| | | * ``demote:`` demote promotable resources and |
| | | stop all other resources in the affected |
| | | cluster partition *(since 2.0.5)* |
| | | * ``suicide:`` fence all nodes in the affected |
| | | cluster partition |
+---------------------------+---------+----------------------------------------------------+
| batch-limit | 0 | .. index:: |
| | | pair: cluster option; batch-limit |
| | | |
| | | The maximum number of actions that the cluster |
| | | may execute in parallel across all nodes. The |
| | | "correct" value will depend on the speed and |
| | | load of your network and cluster nodes. If zero, |
| | | the cluster will impose a dynamically calculated |
- | | | limit only when any node has high load. |
+ | | | limit only when any node has high load. If -1, the |
+ | | | cluster will not impose any limit. |
+---------------------------+---------+----------------------------------------------------+
| migration-limit | -1 | .. index:: |
| | | pair: cluster option; migration-limit |
| | | |
| | | The number of |
| | | :ref:`live migration ` actions |
| | | that the cluster is allowed to execute in |
| | | parallel on a node. A value of -1 means |
| | | unlimited. |
+---------------------------+---------+----------------------------------------------------+
| symmetric-cluster | true | .. index:: |
| | | pair: cluster option; symmetric-cluster |
| | | |
| | | Whether resources can run on any node by default |
| | | (if false, a resource is allowed to run on a |
| | | node only if a |
| | | :ref:`location constraint ` |
| | | enables it) |
+---------------------------+---------+----------------------------------------------------+
| stop-all-resources | false | .. index:: |
| | | pair: cluster option; stop-all-resources |
| | | |
| | | Whether all resources should be disallowed from |
| | | running (can be useful during maintenance) |
+---------------------------+---------+----------------------------------------------------+
| stop-orphan-resources | true | .. index:: |
| | | pair: cluster option; stop-orphan-resources |
| | | |
| | | Whether resources that have been deleted from |
| | | the configuration should be stopped. This value |
| | | takes precedence over ``is-managed`` (that is, |
| | | even unmanaged resources will be stopped when |
| | | orphaned if this value is ``true``) |
+---------------------------+---------+----------------------------------------------------+
| stop-orphan-actions | true | .. index:: |
| | | pair: cluster option; stop-orphan-actions |
| | | |
| | | Whether recurring :ref:`operations ` |
| | | that have been deleted from the configuration |
| | | should be cancelled |
+---------------------------+---------+----------------------------------------------------+
| start-failure-is-fatal | true | .. index:: |
| | | pair: cluster option; start-failure-is-fatal |
| | | |
| | | Whether a failure to start a resource on a |
| | | particular node prevents further start attempts |
| | | on that node. If ``false``, the cluster will |
| | | decide whether the node is still eligible based |
| | | on the resource's current failure count and |
| | | :ref:`migration-threshold `. |
+---------------------------+---------+----------------------------------------------------+
| enable-startup-probes | true | .. index:: |
| | | pair: cluster option; enable-startup-probes |
| | | |
| | | Whether the cluster should check the |
| | | pre-existing state of resources when the cluster |
| | | starts |
+---------------------------+---------+----------------------------------------------------+
| maintenance-mode | false | .. index:: |
| | | pair: cluster option; maintenance-mode |
| | | |
| | | Whether the cluster should refrain from |
| | | monitoring, starting and stopping resources |
+---------------------------+---------+----------------------------------------------------+
| stonith-enabled | true | .. index:: |
| | | pair: cluster option; stonith-enabled |
| | | |
| | | Whether the cluster is allowed to fence nodes |
| | | (for example, failed nodes and nodes with |
| | | resources that can't be stopped). |
| | | |
| | | If true, at least one fence device must be |
| | | configured before resources are allowed to run. |
| | | |
| | | If false, unresponsive nodes are immediately |
| | | assumed to be running no resources, and resource |
| | | recovery on online nodes starts without any |
| | | further protection (which can mean *data loss* |
| | | if the unresponsive node still accesses shared |
| | | storage, for example). See also the |
| | | :ref:`requires ` resource |
| | | meta-attribute. |
+---------------------------+---------+----------------------------------------------------+
| stonith-action | reboot | .. index:: |
| | | pair: cluster option; stonith-action |
| | | |
| | | Action the cluster should send to the fence agent |
| | | when a node must be fenced. Allowed values are |
| | | ``reboot``, ``off``, and (for legacy agents only) |
| | | ``poweroff``. |
+---------------------------+---------+----------------------------------------------------+
| stonith-timeout | 60s | .. index:: |
| | | pair: cluster option; stonith-timeout |
| | | |
| | | How long to wait for ``on``, ``off``, and |
| | | ``reboot`` fence actions to complete by default. |
+---------------------------+---------+----------------------------------------------------+
| stonith-max-attempts | 10 | .. index:: |
| | | pair: cluster option; stonith-max-attempts |
| | | |
| | | How many times fencing can fail for a target |
| | | before the cluster will no longer immediately |
| | | re-attempt it. |
+---------------------------+---------+----------------------------------------------------+
| stonith-watchdog-timeout | 0 | .. index:: |
| | | pair: cluster option; stonith-watchdog-timeout |
| | | |
| | | If nonzero, and the cluster detects |
| | | ``have-watchdog`` as ``true``, then watchdog-based |
| | | self-fencing will be performed via SBD when |
| | | fencing is required, without requiring a fencing |
| | | resource explicitly configured. |
| | | |
| | | If this is set to a positive value, unseen nodes |
| | | are assumed to self-fence within this much time. |
| | | |
| | | **Warning:** It must be ensured that this value is |
| | | larger than the ``SBD_WATCHDOG_TIMEOUT`` |
| | | environment variable on all nodes. Pacemaker |
| | | verifies the settings individually on all nodes |
| | | and prevents startup or shuts down if configured |
| | | wrongly on the fly. It is strongly recommended |
| | | that ``SBD_WATCHDOG_TIMEOUT`` be set to the same |
| | | value on all nodes. |
| | | |
| | | If this is set to a negative value, and |
| | | ``SBD_WATCHDOG_TIMEOUT`` is set, twice that value |
| | | will be used. |
| | | |
| | | **Warning:** In this case, it is essential (and |
| | | currently not verified by pacemaker) that |
| | | ``SBD_WATCHDOG_TIMEOUT`` is set to the same |
| | | value on all nodes. |
+---------------------------+---------+----------------------------------------------------+
| concurrent-fencing | false | .. index:: |
| | | pair: cluster option; concurrent-fencing |
| | | |
- | | | Whether the cluster is allowed to initiate multiple|
- | | | fence actions concurrently |
+ | | | Whether the cluster is allowed to initiate |
+ | | | multiple fence actions concurrently. Fence actions |
+ | | | initiated externally, such as via the |
+ | | | ``stonith_admin`` tool or an application such as |
+ | | | DLM, or by the fencer itself such as recurring |
+ | | | device monitors and ``status`` and ``list`` |
+ | | | commands, are not limited by this option. |
+---------------------------+---------+----------------------------------------------------+
| fence-reaction | stop | .. index:: |
| | | pair: cluster option; fence-reaction |
| | | |
| | | How should a cluster node react if notified of its |
| | | own fencing? A cluster node may receive |
| | | notification of its own fencing if fencing is |
| | | misconfigured, or if fabric fencing is in use that |
| | | doesn't cut cluster communication. Allowed values |
| | | are ``stop`` to attempt to immediately stop |
| | | pacemaker and stay stopped, or ``panic`` to |
| | | attempt to immediately reboot the local node, |
| | | falling back to stop on failure. The default is |
| | | likely to be changed to ``panic`` in a future |
| | | release. *(since 2.0.3)* |
+---------------------------+---------+----------------------------------------------------+
| priority-fencing-delay | 0 | .. index:: |
| | | pair: cluster option; priority-fencing-delay |
| | | |
| | | Apply this delay to any fencing targeting the lost |
| | | nodes with the highest total resource priority in |
| | | case we don't have the majority of the nodes in |
| | | our cluster partition, so that the more |
| | | significant nodes potentially win any fencing |
| | | match (especially meaningful in a split-brain of a |
| | | 2-node cluster). A promoted resource instance |
| | | takes the resource's priority plus 1 if the |
| | | resource's priority is not 0. Any static or random |
| | | delays introduced by ``pcmk_delay_base`` and |
| | | ``pcmk_delay_max`` configured for the |
| | | corresponding fencing resources will be added to |
| | | this delay. This delay should be significantly |
| | | greater than (safely twice) the maximum delay from |
| | | those parameters. *(since 2.0.4)* |
+---------------------------+---------+----------------------------------------------------+
| cluster-delay | 60s | .. index:: |
| | | pair: cluster option; cluster-delay |
| | | |
| | | Estimated maximum round-trip delay over the |
| | | network (excluding action execution). If the DC |
| | | requires an action to be executed on another node, |
| | | it will consider the action failed if it does not |
| | | get a response from the other node in this time |
| | | (after considering the action's own timeout). The |
| | | "correct" value will depend on the speed and load |
| | | of your network and cluster nodes. |
+---------------------------+---------+----------------------------------------------------+
| dc-deadtime | 20s | .. index:: |
| | | pair: cluster option; dc-deadtime |
| | | |
| | | How long to wait for a response from other nodes |
| | | during startup. The "correct" value will depend on |
| | | the speed/load of your network and the type of |
| | | switches used. |
+---------------------------+---------+----------------------------------------------------+
| cluster-ipc-limit | 500 | .. index:: |
| | | pair: cluster option; cluster-ipc-limit |
| | | |
| | | The maximum IPC message backlog before one cluster |
| | | daemon will disconnect another. This is of use in |
| | | large clusters, for which a good value is the |
| | | number of resources in the cluster multiplied by |
| | | the number of nodes. The default of 500 is also |
| | | the minimum. Raise this if you see |
| | | "Evicting client" messages for cluster daemon PIDs |
| | | in the logs. |
+---------------------------+---------+----------------------------------------------------+
| pe-error-series-max | -1 | .. index:: |
| | | pair: cluster option; pe-error-series-max |
| | | |
| | | The number of scheduler inputs resulting in errors |
| | | to save. Used when reporting problems. A value of |
- | | | -1 means unlimited (report all). |
+ | | | -1 means unlimited (report all), and 0 means none. |
+---------------------------+---------+----------------------------------------------------+
- | pe-warn-series-max | -1 | .. index:: |
+ | pe-warn-series-max | 5000 | .. index:: |
| | | pair: cluster option; pe-warn-series-max |
| | | |
| | | The number of scheduler inputs resulting in |
| | | warnings to save. Used when reporting problems. A |
- | | | value of -1 means unlimited (report all). |
+ | | | value of -1 means unlimited (report all), and 0 |
+ | | | means none. |
+---------------------------+---------+----------------------------------------------------+
- | pe-input-series-max | -1 | .. index:: |
+ | pe-input-series-max | 4000 | .. index:: |
| | | pair: cluster option; pe-input-series-max |
| | | |
| | | The number of "normal" scheduler inputs to save. |
| | | Used when reporting problems. A value of -1 means |
- | | | unlimited (report all). |
+ | | | unlimited (report all), and 0 means none. |
+---------------------------+---------+----------------------------------------------------+
| enable-acl | false | .. index:: |
| | | pair: cluster option; enable-acl |
| | | |
| | | Whether :ref:`acl` should be used to authorize |
| | | modifications to the CIB |
+---------------------------+---------+----------------------------------------------------+
| placement-strategy | default | .. index:: |
| | | pair: cluster option; placement-strategy |
| | | |
| | | How the cluster should allocate resources to nodes |
| | | (see :ref:`utilization`). Allowed values are |
| | | ``default``, ``utilization``, ``balanced``, and |
| | | ``minimal``. |
+---------------------------+---------+----------------------------------------------------+
| node-health-strategy | none | .. index:: |
| | | pair: cluster option; node-health-strategy |
| | | |
| | | How the cluster should react to node health |
| | | attributes (see :ref:`node-health`). Allowed values|
| | | are ``none``, ``migrate-on-red``, ``only-green``, |
| | | ``progressive``, and ``custom``. |
+---------------------------+---------+----------------------------------------------------+
| node-health-base | 0 | .. index:: |
| | | pair: cluster option; node-health-base |
| | | |
| | | The base health score assigned to a node. Only |
| | | used when ``node-health-strategy`` is |
| | | ``progressive``. |
+---------------------------+---------+----------------------------------------------------+
| node-health-green | 0 | .. index:: |
| | | pair: cluster option; node-health-green |
| | | |
| | | The score to use for a node health attribute whose |
| | | value is ``green``. Only used when |
| | | ``node-health-strategy`` is ``progressive`` or |
| | | ``custom``. |
+---------------------------+---------+----------------------------------------------------+
| node-health-yellow | 0 | .. index:: |
| | | pair: cluster option; node-health-yellow |
| | | |
| | | The score to use for a node health attribute whose |
| | | value is ``yellow``. Only used when |
| | | ``node-health-strategy`` is ``progressive`` or |
| | | ``custom``. |
+---------------------------+---------+----------------------------------------------------+
| node-health-red | 0 | .. index:: |
| | | pair: cluster option; node-health-red |
| | | |
| | | The score to use for a node health attribute whose |
| | | value is ``red``. Only used when |
| | | ``node-health-strategy`` is ``progressive`` or |
| | | ``custom``. |
+---------------------------+---------+----------------------------------------------------+
| cluster-recheck-interval | 15min | .. index:: |
| | | pair: cluster option; cluster-recheck-interval |
| | | |
| | | Pacemaker is primarily event-driven, and looks |
| | | ahead to know when to recheck the cluster for |
| | | failure timeouts and most time-based rules |
| | | *(since 2.0.3)*. However, it will also recheck the |
| | | cluster after this amount of inactivity. This has |
| | | two goals: rules with ``date_spec`` are only |
| | | guaranteed to be checked this often, and it also |
| | | serves as a fail-safe for some kinds of scheduler |
| | | bugs. A value of 0 disables this polling; positive |
| | | values are a time interval. |
+---------------------------+---------+----------------------------------------------------+
| shutdown-lock | false | .. index:: |
| | | pair: cluster option; shutdown-lock |
| | | |
| | | The default of false allows active resources to be |
| | | recovered elsewhere when their node is cleanly |
| | | shut down, which is what the vast majority of |
| | | users will want. However, some users prefer to |
| | | make resources highly available only for failures, |
| | | with no recovery for clean shutdowns. If this |
| | | option is true, resources active on a node when it |
| | | is cleanly shut down are kept "locked" to that |
| | | node (not allowed to run elsewhere) until they |
| | | start again on that node after it rejoins (or for |
| | | at most ``shutdown-lock-limit``, if set). Stonith |
| | | resources and Pacemaker Remote connections are |
| | | never locked. Clone and bundle instances and the |
| | | promoted role of promotable clones are currently |
| | | never locked, though support could be added in a |
| | | future release. Locks may be manually cleared |
| | | using the ``--refresh`` option of ``crm_resource`` |
| | | (both the resource and node must be specified; |
| | | this works with remote nodes if their connection |
| | | resource's ``target-role`` is set to ``Stopped``, |
| | | but not if Pacemaker Remote is stopped on the |
| | | remote node without disabling the connection |
| | | resource). *(since 2.0.4)* |
+---------------------------+---------+----------------------------------------------------+
| shutdown-lock-limit | 0 | .. index:: |
| | | pair: cluster option; shutdown-lock-limit |
| | | |
| | | If ``shutdown-lock`` is true, and this is set to a |
| | | nonzero time duration, locked resources will be |
| | | allowed to start after this much time has passed |
| | | since the node shutdown was initiated, even if the |
| | | node has not rejoined. (This works with remote |
| | | nodes only if their connection resource's |
| | | ``target-role`` is set to ``Stopped``.) |
| | | *(since 2.0.4)* |
+---------------------------+---------+----------------------------------------------------+
| remove-after-stop | false | .. index:: |
| | | pair: cluster option; remove-after-stop |
| | | |
| | | *Deprecated* Should the cluster remove |
| | | resources from Pacemaker's executor after they are |
| | | stopped? Values other than the default are, at |
| | | best, poorly tested and potentially dangerous. |
| | | This option is deprecated and will be removed in a |
| | | future release. |
+---------------------------+---------+----------------------------------------------------+
| startup-fencing | true | .. index:: |
| | | pair: cluster option; startup-fencing |
| | | |
| | | *Advanced Use Only:* Should the cluster fence |
| | | unseen nodes at start-up? Setting this to false is |
| | | unsafe, because the unseen nodes could be active |
| | | and running resources but unreachable. |
+---------------------------+---------+----------------------------------------------------+
| election-timeout | 2min | .. index:: |
| | | pair: cluster option; election-timeout |
| | | |
| | | *Advanced Use Only:* If you need to adjust this |
| | | value, it probably indicates the presence of a bug.|
+---------------------------+---------+----------------------------------------------------+
| shutdown-escalation | 20min | .. index:: |
| | | pair: cluster option; shutdown-escalation |
| | | |
| | | *Advanced Use Only:* If you need to adjust this |
| | | value, it probably indicates the presence of a bug.|
+---------------------------+---------+----------------------------------------------------+
| join-integration-timeout | 3min | .. index:: |
| | | pair: cluster option; join-integration-timeout |
| | | |
| | | *Advanced Use Only:* If you need to adjust this |
| | | value, it probably indicates the presence of a bug.|
+---------------------------+---------+----------------------------------------------------+
| join-finalization-timeout | 30min | .. index:: |
| | | pair: cluster option; join-finalization-timeout |
| | | |
| | | *Advanced Use Only:* If you need to adjust this |
| | | value, it probably indicates the presence of a bug.|
+---------------------------+---------+----------------------------------------------------+
| transition-delay | 0s | .. index:: |
| | | pair: cluster option; transition-delay |
| | | |
| | | *Advanced Use Only:* Delay cluster recovery for |
| | | the configured interval to allow for additional or |
| | | related events to occur. This can be useful if |
| | | your configuration is sensitive to the order in |
| | | which ping updates arrive. Enabling this option |
| | | will slow down cluster recovery under all |
| | | conditions. |
+---------------------------+---------+----------------------------------------------------+
diff --git a/doc/sphinx/Pacemaker_Explained/resources.rst b/doc/sphinx/Pacemaker_Explained/resources.rst
index 003d4ebfb0..773188c7cc 100644
--- a/doc/sphinx/Pacemaker_Explained/resources.rst
+++ b/doc/sphinx/Pacemaker_Explained/resources.rst
@@ -1,1036 +1,1036 @@
.. _resource:
Cluster Resources
-----------------
.. _s-resource-primitive:
What is a Cluster Resource?
###########################
.. index::
single: resource
A resource is a service made highly available by a cluster.
The simplest type of resource, a *primitive* resource, is described
in this chapter. More complex forms, such as groups and clones,
are described in later chapters.
Every primitive resource has a *resource agent*. A resource agent is an
external program that abstracts the service it provides and presents a
consistent view to the cluster.
This allows the cluster to be agnostic about the resources it manages.
The cluster doesn't need to understand how the resource works because
it relies on the resource agent to do the right thing when given a
**start**, **stop** or **monitor** command. For this reason, it is crucial
that resource agents are well-tested.
Typically, resource agents come in the form of shell scripts. However,
they can be written using any technology (such as C, Python or Perl)
that the author is comfortable with.
.. _s-resource-supported:
.. index::
single: resource; class
Resource Classes
################
Pacemaker supports several classes of agents:
* OCF
* LSB
* Systemd
* Upstart (deprecated)
* Service
* Fencing
* Nagios Plugins
.. index::
single: resource; OCF
single: OCF; resources
single: Open Cluster Framework; resources
Open Cluster Framework
______________________
The OCF standard [#]_ is basically an extension of the Linux Standard
Base conventions for init scripts to:
* support parameters,
* make them self-describing, and
* make them extensible
OCF specs have strict definitions of the exit codes that actions must return [#]_.
The cluster follows these specifications exactly, and giving the wrong
exit code will cause the cluster to behave in ways you will likely
find puzzling and annoying. In particular, the cluster needs to
distinguish a completely stopped resource from one which is in some
erroneous and indeterminate state.
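
The distinction between "cleanly stopped" and "failed" can be illustrated with
a minimal sketch of a ``monitor`` action. This is not taken from any shipped
agent: the function name ``demo_monitor`` and the PID-file approach are
assumptions made for illustration. The numeric values are the OCF exit codes
for success (0), generic error (1), and not running (7).

.. code-block:: sh

   #!/bin/sh
   # OCF exit codes used by this sketch
   OCF_SUCCESS=0
   OCF_ERR_GENERIC=1
   OCF_NOT_RUNNING=7

   # demo_monitor PIDFILE (hypothetical helper, for illustration only)
   demo_monitor() {
       pidfile=$1

       # No PID file: the service is cleanly stopped (or was never started)
       [ -f "$pidfile" ] || return "$OCF_NOT_RUNNING"

       pid=$(cat "$pidfile")

       # PID file present and the process exists: the service is running
       if kill -0 "$pid" 2>/dev/null; then
           return "$OCF_SUCCESS"
       fi

       # PID file present but no such process: the service has failed
       return "$OCF_ERR_GENERIC"
   }

An agent that returned the generic error for a cleanly stopped service would
make the cluster treat every stopped instance as failed.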
Parameters are passed to the resource agent as environment variables, with the
special prefix ``OCF_RESKEY_``. So, a parameter which the user thinks
of as ``ip`` will be passed to the resource agent as ``OCF_RESKEY_ip``. The
number and purpose of the parameters is left to the resource agent; however,
the resource agent should use the **meta-data** command to advertise any that it
supports.
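
As a minimal sketch of how an agent sees such a parameter, the shell function
below reads ``OCF_RESKEY_ip`` from its environment. The function name
``demo_show_ip`` and its message text are hypothetical; the return value 6 is
the OCF exit code for a configuration error.

.. code-block:: sh

   #!/bin/sh
   # demo_show_ip (hypothetical): report the "ip" parameter the cluster passed in
   demo_show_ip() {
       # The user-visible parameter "ip" arrives as OCF_RESKEY_ip
       if [ -z "${OCF_RESKEY_ip:-}" ]; then
           echo "required parameter 'ip' is not set" >&2
           return 6   # OCF_ERR_CONFIGURED
       fi
       echo "would manage address $OCF_RESKEY_ip"
   }

When testing an agent by hand, the same variable can be exported in the shell
before invoking the agent, which mimics what the cluster does.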
The OCF class is the preferred one, as it is an industry standard,
highly flexible (allowing parameters to be passed to agents in a
non-positional manner) and self-describing.
For more information, see the
`reference `_ and
the *Resource Agents* chapter of *Pacemaker Administration*.
.. index::
single: resource; LSB
single: LSB; resources
single: Linux Standard Base; resources
Linux Standard Base
___________________
*LSB* resource agents are more commonly known as *init scripts*. If a full path
is not given, they are assumed to be located in ``/etc/init.d``.
Commonly, they are provided by the OS distribution. In order to be used
with a Pacemaker cluster, they must conform to the LSB specification [#]_.
.. warning::
Many distributions or particular software packages claim LSB compliance
but ship with broken init scripts. For details on how to check whether
your init script is LSB-compatible, see the `Resource Agents` chapter of
`Pacemaker Administration`. Common problematic violations of the LSB
standard include:
* Not implementing the ``status`` operation at all
* Not observing the correct exit status codes for
``start``/``stop``/``status`` actions
* Starting a started resource returns an error
* Stopping a stopped resource returns an error
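
The violations listed above can be checked by hand. The following is a rough
sketch, not a Pacemaker tool: ``run_expect`` and ``lsb_check`` are hypothetical
helper names, and the expected exit statuses (0 for success, 3 from ``status``
when the service is stopped) are those defined by the LSB specification. The
sketch really starts and stops the service, so it should only be tried on a
test machine.

.. code-block:: sh

   #!/bin/sh
   # run_expect SCRIPT WANTED ACTION: run "SCRIPT ACTION", compare its exit status
   run_expect() {
       got=0
       "$1" "$3" >/dev/null 2>&1 || got=$?
       if [ "$got" -ne "$2" ]; then
           echo "FAIL: '$3' returned $got, expected $2"
           return 1
       fi
       return 0
   }

   # lsb_check SCRIPT: returns 0 only if every check below passes
   lsb_check() {
       script=$1
       failed=0
       run_expect "$script" 0 stop   || failed=1   # begin from a stopped state
       run_expect "$script" 3 status || failed=1   # stopped service reports 3
       run_expect "$script" 0 start  || failed=1
       run_expect "$script" 0 status || failed=1   # running service reports 0
       run_expect "$script" 0 start  || failed=1   # starting twice is not an error
       run_expect "$script" 0 stop   || failed=1
       run_expect "$script" 0 stop   || failed=1   # stopping twice is not an error
       run_expect "$script" 3 status || failed=1
       return "$failed"
   }

For example, ``lsb_check /etc/init.d/some-service`` (the path is a placeholder)
prints one ``FAIL`` line per violated expectation and returns nonzero if any
check failed.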
.. important::
Remember to make sure the computer is `not` configured to start any
services at boot time -- that should be controlled by the cluster.
.. _s-resource-supported-systemd:
.. index::
single: Resource; Systemd
single: Systemd; resources
Systemd
_______
Most Linux distributions have replaced the old
`SysV `_ style of
initialization daemons and scripts with
`Systemd `_.
Pacemaker is able to manage these services `if they are present`.
Instead of init scripts, systemd has `unit files`. Generally, the
services (unit files) are provided by the OS distribution, but there
are online guides for converting from init scripts [#]_.
.. important::
Remember to make sure the computer is `not` configured to start any
services at boot time -- that should be controlled by the cluster.
.. index::
single: Resource; Upstart
single: Upstart; resources
Upstart
_______
Some distributions replaced the old
`SysV `_ style of
initialization daemons (and scripts) with
`Upstart `_.
Pacemaker is able to manage these services `if they are present`.
Instead of init scripts, Upstart has `jobs`. Generally, the
services (jobs) are provided by the OS distribution.
.. important::
Remember to make sure the computer is `not` configured to start any
services at boot time -- that should be controlled by the cluster.
.. warning::
Upstart support is deprecated in Pacemaker. Upstart is no longer an actively
maintained project, and test platforms for it are no longer readily usable.
Support will likely be dropped entirely at the next major release of
Pacemaker.
.. index::
single: Resource; System Services
single: System Service; resources
System Services
_______________
Since there are various types of system services (``systemd``,
``upstart``, and ``lsb``), Pacemaker supports a special ``service`` alias which
intelligently figures out which one applies to a given cluster node.
This is particularly useful when the cluster contains a mix of
``systemd``, ``upstart``, and ``lsb``.
In order, Pacemaker will try to find the named service as:
* an LSB init script
* a Systemd unit file
* an Upstart job
.. index::
single: Resource; STONITH
single: STONITH; resources
STONITH
_______
The STONITH class is used exclusively for fencing-related resources. This is
discussed later in :ref:`fencing`.
.. index::
single: Resource; Nagios Plugins
single: Nagios Plugins; resources
Nagios Plugins
______________
Nagios Plugins [#]_ allow us to monitor services on remote hosts.
Pacemaker is able to do remote monitoring with the plugins `if they are
present`.
A common use case is to configure them as resources belonging to a resource
container (usually a virtual machine), and the container will be restarted
if any of them has failed. Another use is to configure them as ordinary
resources to be used for monitoring hosts or services via the network.
The supported parameters are the same as the long options of the plugin.
.. _primitive-resource:
Resource Properties
###################
These values tell the cluster which resource agent to use for the resource,
where to find that resource agent and what standards it conforms to.
.. table:: **Properties of a Primitive Resource**
+----------+------------------------------------------------------------------+
| Field | Description |
+==========+==================================================================+
| id | .. index:: |
| | single: id; resource |
| | single: resource; property, id |
| | |
| | Your name for the resource |
+----------+------------------------------------------------------------------+
| class | .. index:: |
| | single: class; resource |
| | single: resource; property, class |
| | |
| | The standard the resource agent conforms to. Allowed values: |
| | ``lsb``, ``nagios``, ``ocf``, ``service``, ``stonith``, |
| | ``systemd``, ``upstart`` |
+----------+------------------------------------------------------------------+
| type | .. index:: |
| | single: type; resource |
| | single: resource; property, type |
| | |
| | The name of the Resource Agent you wish to use. E.g. |
| | ``IPaddr`` or ``Filesystem`` |
+----------+------------------------------------------------------------------+
| provider | .. index:: |
| | single: provider; resource |
| | single: resource; property, provider |
| | |
| | The OCF spec allows multiple vendors to supply the same resource |
| | agent. To use the OCF resource agents supplied by the Heartbeat |
| | project, you would specify ``heartbeat`` here. |
+----------+------------------------------------------------------------------+
The XML definition of a resource can be queried with the **crm_resource** tool.
For example:
.. code-block:: none
# crm_resource --resource Email --query-xml
might produce:
.. topic:: A system resource definition
.. code-block:: xml
.. note::
One of the main drawbacks of system service (LSB, systemd, or
Upstart) resources is that they do not allow any parameters!
.. topic:: An OCF resource definition
.. code-block:: xml
.. _resource_options:
Resource Options
################
Resources have two types of options: *meta-attributes* and *instance attributes*.
Meta-attributes apply to any type of resource, while instance attributes
are specific to each resource agent.
Resource Meta-Attributes
________________________
Meta-attributes are used by the cluster to decide how a resource should
behave and can be easily set using the ``--meta`` option of the
**crm_resource** command.
.. table:: **Meta-attributes of a Primitive Resource**
+----------------------------+----------------------------------+------------------------------------------------------+
| Field | Default | Description |
+============================+==================================+======================================================+
| priority | 0 | .. index:: |
| | | single: priority; resource option |
| | | single: resource; option, priority |
| | | |
| | | If not all resources can be active, the cluster |
| | | will stop lower priority resources in order to |
| | | keep higher priority ones active. |
+----------------------------+----------------------------------+------------------------------------------------------+
| critical | true | .. index:: |
| | | single: critical; resource option |
| | | single: resource; option, critical |
| | | |
| | | Use this value as the default for ``influence`` in |
| | | all :ref:`colocation constraints |
| | | ` involving this resource, |
| | | as well as the implicit colocation constraints |
| | | created if this resource is in a :ref:`group |
| | | `. For details, see |
- | | | :ref:`s-coloc-influence`. |
+ | | | :ref:`s-coloc-influence`. *(since 2.1.0)* |
+----------------------------+----------------------------------+------------------------------------------------------+
| target-role | Started | .. index:: |
| | | single: target-role; resource option |
| | | single: resource; option, target-role |
| | | |
| | | What state should the cluster attempt to keep this |
| | | resource in? Allowed values: |
| | | |
| | | * ``Stopped:`` Force the resource to be stopped |
| | | * ``Started:`` Allow the resource to be started |
| | | (and in the case of :ref:`promotable clone |
| | | resources `, promoted |
| | | if appropriate) |
| | | * ``Unpromoted:`` Allow the resource to be started, |
| | | but only in the unpromoted role if the resource is |
| | | :ref:`promotable ` |
| | | * ``Promoted:`` Equivalent to ``Started`` |
+----------------------------+----------------------------------+------------------------------------------------------+
| is-managed | TRUE | .. index:: |
| | | single: is-managed; resource option |
| | | single: resource; option, is-managed |
| | | |
| | | Is the cluster allowed to start and stop |
| | | the resource? Allowed values: ``true``, ``false`` |
+----------------------------+----------------------------------+------------------------------------------------------+
| maintenance | FALSE | .. index:: |
| | | single: maintenance; resource option |
| | | single: resource; option, maintenance |
| | | |
| | | Similar to the ``maintenance-mode`` |
| | | :ref:`cluster option `, but for |
| | | a single resource. If true, the resource will not |
| | | be started, stopped, or monitored on any node. This |
| | | differs from ``is-managed`` in that monitors will |
| | | not be run. Allowed values: ``true``, ``false`` |
+----------------------------+----------------------------------+------------------------------------------------------+
| resource-stickiness | 1 for individual clone | .. _resource-stickiness: |
| | instances, 0 for all | |
| | other resources | .. index:: |
| | | single: resource-stickiness; resource option |
| | | single: resource; option, resource-stickiness |
| | | |
| | | A score that will be added to the current node when |
| | | a resource is already active. This allows running |
| | | resources to stay where they are, even if they |
| | | would be placed elsewhere if they were being |
| | | started from a stopped state. |
+----------------------------+----------------------------------+------------------------------------------------------+
| requires | ``quorum`` for resources | .. _requires: |
| | with a ``class`` of ``stonith``, | |
| | otherwise ``unfencing`` if | .. index:: |
| | unfencing is active in the | single: requires; resource option |
| | cluster, otherwise ``fencing`` | single: resource; option, requires |
| | if ``stonith-enabled`` is true, | |
| | otherwise ``quorum`` | Conditions under which the resource can be |
| | | started. Allowed values: |
| | | |
| | | * ``nothing:`` can always be started |
| | | * ``quorum:`` The cluster can only start this |
| | | resource if a majority of the configured nodes |
| | | are active |
| | | * ``fencing:`` The cluster can only start this |
| | | resource if a majority of the configured nodes |
| | | are active *and* any failed or unknown nodes |
| | | have been :ref:`fenced ` |
| | | * ``unfencing:`` The cluster can only start this |
| | | resource if a majority of the configured nodes |
| | | are active *and* any failed or unknown nodes have |
| | | been fenced *and* only on nodes that have been |
| | | :ref:`unfenced ` |
+----------------------------+----------------------------------+------------------------------------------------------+
| migration-threshold | INFINITY | .. index:: |
| | | single: migration-threshold; resource option |
| | | single: resource; option, migration-threshold |
| | | |
| | | How many failures may occur for this resource on |
| | | a node, before this node is marked ineligible to |
| | | host this resource. A value of 0 indicates that this |
| | | feature is disabled (the node will never be marked |
| | | ineligible); by contrast, the cluster treats |
| | | INFINITY (the default) as a very large but finite |
| | | number. This option has an effect only if the |
| | | failed operation specifies ``on-fail`` as |
| | | ``restart`` (the default), and additionally for |
| | | failed ``start`` operations, if the cluster |
| | | property ``start-failure-is-fatal`` is ``false``. |
+----------------------------+----------------------------------+------------------------------------------------------+
| failure-timeout | 0 | .. index:: |
| | | single: failure-timeout; resource option |
| | | single: resource; option, failure-timeout |
| | | |
| | | How many seconds to wait before acting as if the |
| | | failure had not occurred, and potentially allowing |
| | | the resource back to the node on which it failed. |
| | | A value of 0 indicates that this feature is |
| | | disabled. |
+----------------------------+----------------------------------+------------------------------------------------------+
| multiple-active | stop_start | .. index:: |
| | | single: multiple-active; resource option |
| | | single: resource; option, multiple-active |
| | | |
| | | What should the cluster do if it ever finds the |
| | | resource active on more than one node? Allowed |
| | | values: |
| | | |
| | | * ``block``: mark the resource as unmanaged |
| | | * ``stop_only``: stop all active instances and |
| | | leave them that way |
| | | * ``stop_start``: stop all active instances and |
| | | start the resource in one location only |
+----------------------------+----------------------------------+------------------------------------------------------+
| allow-migrate | TRUE for ocf:pacemaker:remote | Whether the cluster should try to "live migrate" |
| | resources, FALSE otherwise | this resource when it needs to be moved (see |
| | | :ref:`live-migration`) |
+----------------------------+----------------------------------+------------------------------------------------------+
| container-attribute-target | | Specific to bundle resources; see |
| | | :ref:`s-bundle-attributes` |
+----------------------------+----------------------------------+------------------------------------------------------+
| remote-node | | The name of the Pacemaker Remote guest node this |
| | | resource is associated with, if any. If |
| | | specified, this both enables the resource as a |
| | | guest node and defines the unique name used to |
| | | identify the guest node. The guest must be |
| | | configured to run the Pacemaker Remote daemon |
| | | when it is started. **WARNING:** This value |
| | | cannot overlap with any resource or node IDs. |
+----------------------------+----------------------------------+------------------------------------------------------+
| remote-port | 3121 | If ``remote-node`` is specified, the port on the |
| | | guest used for its Pacemaker Remote connection. |
| | | The Pacemaker Remote daemon on the guest must |
| | | be configured to listen on this port. |
+----------------------------+----------------------------------+------------------------------------------------------+
| remote-addr | value of ``remote-node`` | If ``remote-node`` is specified, the IP |
| | | address or hostname used to connect to the |
| | | guest via Pacemaker Remote. The Pacemaker Remote |
| | | daemon on the guest must be configured to accept |
| | | connections on this address. |
+----------------------------+----------------------------------+------------------------------------------------------+
| remote-connect-timeout | 60s | If ``remote-node`` is specified, how long before |
| | | a pending guest connection will time out. |
+----------------------------+----------------------------------+------------------------------------------------------+
As an example of setting resource options, if you performed the following
commands on an LSB Email resource:
.. code-block:: none
# crm_resource --meta --resource Email --set-parameter priority --parameter-value 100
# crm_resource -m -r Email -p multiple-active -v block
the resulting resource definition might be:
.. topic:: An LSB resource with cluster options
.. code-block:: xml
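
   <!-- Illustrative sketch only: the agent type ("exim") and the element ids
        are assumptions chosen for this example. -->
   <primitive id="Email" class="lsb" type="exim">
     <meta_attributes id="Email-meta_attributes">
       <nvpair id="Email-meta_attributes-priority" name="priority" value="100"/>
       <nvpair id="Email-meta_attributes-multiple-active" name="multiple-active" value="block"/>
     </meta_attributes>
   </primitive>
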
In addition to the cluster-defined meta-attributes described above, you may
also configure arbitrary meta-attributes of your own choosing. Most commonly,
this would be done for use in :ref:`rules `. For example, an IT department
might define a custom meta-attribute to indicate which company department each
resource is intended for. To reduce the chance of name collisions with
cluster-defined meta-attributes added in the future, it is recommended to use
a unique, organization-specific prefix for such attributes.
.. _s-resource-defaults:
Setting Global Defaults for Resource Meta-Attributes
____________________________________________________
To set a default value for a resource option, add it to the
``rsc_defaults`` section with ``crm_attribute``. For example,
.. code-block:: none
# crm_attribute --type rsc_defaults --name is-managed --update false
would prevent the cluster from starting or stopping any of the
resources in the configuration (unless of course the individual
resources were specifically enabled by having their ``is-managed`` set to
``true``).
Resource Instance Attributes
____________________________
The resource agents of some resource classes (lsb, systemd and upstart *not* among them)
can be given parameters which determine how they behave and which instance
of a service they control.
If your resource agent supports parameters, you can add them with the
``crm_resource`` command. For example,
.. code-block:: none
# crm_resource --resource Public-IP --set-parameter ip --parameter-value 192.0.2.2
would create an entry in the resource like this:
.. topic:: An example OCF resource with instance attributes
.. code-block:: xml
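
   <!-- Illustrative sketch only: the agent (ocf:heartbeat:IPaddr) and the
        element ids are assumptions chosen for this example. -->
   <primitive id="Public-IP" class="ocf" type="IPaddr" provider="heartbeat">
     <instance_attributes id="params-public-ip">
       <nvpair id="public-ip-addr" name="ip" value="192.0.2.2"/>
     </instance_attributes>
   </primitive>
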
For an OCF resource, the result would be an environment variable
called ``OCF_RESKEY_ip`` with a value of ``192.0.2.2``.
The list of instance attributes supported by an OCF resource agent can be
found by calling the resource agent with the ``meta-data`` command.
The output contains an XML description of all the supported
attributes, their purpose and default values.
.. topic:: Displaying the metadata for the Dummy resource agent template
.. code-block:: none
# export OCF_ROOT=/usr/lib/ocf
# $OCF_ROOT/resource.d/pacemaker/Dummy meta-data
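.. code-block:: xml

   <resource-agent name="Dummy">
   <version>1.1</version>

   <longdesc lang="en">
   This is a dummy OCF resource agent. It does absolutely nothing except keep track
   of whether it is running or not, and can be configured so that actions fail or
   take a long time. Its purpose is primarily for testing, and to serve as a
   template for resource agent writers.
   </longdesc>
   <shortdesc lang="en">Example stateless resource agent</shortdesc>

   <parameters>
   <parameter name="state">
   <longdesc lang="en">
   Location to store the resource state in.
   </longdesc>
   <shortdesc lang="en">State file</shortdesc>
   </parameter>

   <parameter name="passwd">
   <longdesc lang="en">
   Fake password field
   </longdesc>
   <shortdesc lang="en">Password</shortdesc>
   </parameter>

   <parameter name="fake">
   <longdesc lang="en">
   Fake attribute that can be changed to cause a reload
   </longdesc>
   <shortdesc lang="en">Fake attribute that can be changed to cause a reload</shortdesc>
   </parameter>

   <parameter name="op_sleep">
   <longdesc lang="en">
   Number of seconds to sleep during operations. This can be used to test how
   the cluster reacts to operation timeouts.
   </longdesc>
   <shortdesc lang="en">Operation sleep duration in seconds.</shortdesc>
   </parameter>

   <parameter name="fail_start_on">
   <longdesc lang="en">
   Start, migrate_from, and reload-agent actions will return failure if running on
   the host specified here, but the resource will run successfully anyway (future
   monitor calls will find it running). This can be used to test on-fail=ignore.
   </longdesc>
   <shortdesc lang="en">Report bogus start failure on specified host</shortdesc>
   </parameter>

   <parameter name="envfile">
   <longdesc lang="en">
   If this is set, the environment will be dumped to this file for every call.
   </longdesc>
   <shortdesc lang="en">Environment dump file</shortdesc>
   </parameter>
   </parameters>
   </resource-agent>
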
.. index::
single: resource; action
single: resource; operation
.. _operation:
Resource Operations
###################
*Operations* are actions the cluster can perform on a resource by calling the
resource agent. Resource agents must support certain common operations such as
start, stop, and monitor, and may implement any others.
Operations may be explicitly configured for two purposes: to override defaults
for options (such as timeout) that the cluster will use whenever it initiates
the operation, and to run an operation on a recurring basis (for example, to
monitor the resource for failure).
.. topic:: An OCF resource with a non-default start timeout
.. code-block:: xml
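
   <!-- Illustrative sketch only: the agent, the ids, and the 60s timeout are
        assumptions chosen for this example. -->
   <primitive id="Public-IP" class="ocf" type="IPaddr" provider="heartbeat">
     <operations>
       <op id="Public-IP-start" name="start" timeout="60s"/>
     </operations>
     <instance_attributes id="params-public-ip">
       <nvpair id="public-ip-addr" name="ip" value="192.0.2.2"/>
     </instance_attributes>
   </primitive>
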
Pacemaker identifies operations by a combination of name and interval, so this
combination must be unique for each resource. That is, you should not configure
two operations for the same resource with the same name and interval.
.. _operation_properties:
Operation Properties
____________________
Operation properties may be specified directly in the ``op`` element as
XML attributes, or in a separate ``meta_attributes`` block as ``nvpair`` elements.
XML attributes take precedence over ``nvpair`` elements if both are specified.
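
As a minimal sketch of the two forms, the same ``timeout`` could be written in
either of the following ways. The ``id`` values, the interval, and the
20-second timeout are placeholders chosen for illustration; the two ``op``
elements are alternatives, not meant to be configured together.

.. code-block:: xml

   <!-- Form 1: timeout given directly as an XML attribute of op -->
   <op id="example-monitor-10s" name="monitor" interval="10s" timeout="20s"/>

   <!-- Form 2: timeout given as an nvpair inside a meta_attributes block -->
   <op id="example-monitor-10s" name="monitor" interval="10s">
     <meta_attributes id="example-monitor-10s-meta">
       <nvpair id="example-monitor-10s-timeout" name="timeout" value="20s"/>
     </meta_attributes>
   </op>
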
.. table:: **Properties of an Operation**

   +----------------+-----------------------------------+-----------------------------------------------------+
   | Field          | Default                           | Description                                         |
   +================+===================================+=====================================================+
   | id             |                                   | .. index::                                          |
   |                |                                   |    single: id; action property                      |
   |                |                                   |    single: action; property, id                     |
   |                |                                   |                                                     |
   |                |                                   | A unique name for the operation.                    |
   +----------------+-----------------------------------+-----------------------------------------------------+
   | name           |                                   | .. index::                                          |
   |                |                                   |    single: name; action property                    |
   |                |                                   |    single: action; property, name                   |
   |                |                                   |                                                     |
   |                |                                   | The action to perform. This can be any action       |
   |                |                                   | supported by the agent; common values include       |
   |                |                                   | ``monitor``, ``start``, and ``stop``.               |
   +----------------+-----------------------------------+-----------------------------------------------------+
   | interval       | 0                                 | .. index::                                          |
   |                |                                   |    single: interval; action property                |
   |                |                                   |    single: action; property, interval               |
   |                |                                   |                                                     |
   |                |                                   | How frequently (in seconds) to perform the          |
   |                |                                   | operation. A value of 0 means "when needed".        |
   |                |                                   | A positive value defines a *recurring action*,      |
   |                |                                   | which is typically used with                        |
   |                |                                   | :ref:`monitor <s-resource-monitoring>`.             |
   +----------------+-----------------------------------+-----------------------------------------------------+
   | timeout        |                                   | .. index::                                          |
   |                |                                   |    single: timeout; action property                 |
   |                |                                   |    single: action; property, timeout                |
   |                |                                   |                                                     |
   |                |                                   | How long to wait before declaring the action        |
   |                |                                   | has failed                                          |
   +----------------+-----------------------------------+-----------------------------------------------------+
   | on-fail        | Varies by action:                 | .. index::                                          |
   |                |                                   |    single: on-fail; action property                 |
   |                | * ``stop``: ``fence`` if          |    single: action; property, on-fail                |
   |                |   ``stonith-enabled`` is true     |                                                     |
   |                |   or ``block`` otherwise          | The action to take if this action ever fails.       |
   |                | * ``demote``: ``on-fail`` of the  | Allowed values:                                     |
   |                |   ``monitor`` action with         |                                                     |
   |                |   ``role`` set to ``Promoted``,   | * ``ignore:`` Pretend the resource did not fail.    |
   |                |   if present, enabled, and        | * ``block:`` Don't perform any further operations   |
   |                |   configured to a value other     |   on the resource.                                  |
   |                |   than ``demote``, or ``restart`` | * ``stop:`` Stop the resource and do not start      |
   |                |   otherwise                       |   it elsewhere.                                     |
   |                | * all other actions: ``restart``  | * ``demote:`` Demote the resource, without a        |
   |                |                                   |   full restart. This is valid only for ``promote``  |
   |                |                                   |   actions, and for ``monitor`` actions with both    |
   |                |                                   |   a nonzero ``interval`` and ``role`` set to        |
   |                |                                   |   ``Promoted``; for any other action, a             |
   |                |                                   |   configuration error will be logged, and the       |
   |                |                                   |   default behavior will be used. *(since 2.0.5)*    |
   |                |                                   | * ``restart:`` Stop the resource and start it       |
   |                |                                   |   again (possibly on a different node).             |
   |                |                                   | * ``fence:`` STONITH the node on which the          |
   |                |                                   |   resource failed.                                  |
   |                |                                   | * ``standby:`` Move *all* resources away from the   |
   |                |                                   |   node on which the resource failed.                |
   +----------------+-----------------------------------+-----------------------------------------------------+
   | enabled        | TRUE                              | .. index::                                          |
   |                |                                   |    single: enabled; action property                 |
   |                |                                   |    single: action; property, enabled                |
   |                |                                   |                                                     |
   |                |                                   | If ``false``, ignore this operation definition.     |
   |                |                                   | This is typically used to pause a particular        |
   |                |                                   | recurring ``monitor`` operation; for instance, it   |
   |                |                                   | can complement the respective resource being        |
   |                |                                   | unmanaged (``is-managed=false``), as this alone     |
   |                |                                   | will :ref:`not block any configured monitoring      |
   |                |                                   | <s-monitoring-unmanaged>`. Disabling the operation  |
   |                |                                   | does not suppress all actions of the given type.    |
   |                |                                   | Allowed values: ``true``, ``false``.                |
   +----------------+-----------------------------------+-----------------------------------------------------+
   | record-pending | TRUE                              | .. index::                                          |
   |                |                                   |    single: record-pending; action property          |
   |                |                                   |    single: action; property, record-pending         |
   |                |                                   |                                                     |
   |                |                                   | If ``true``, the intention to perform the operation |
   |                |                                   | is recorded so that GUIs and CLI tools can indicate |
   |                |                                   | that an operation is in progress. This is best set  |
   |                |                                   | as an *operation default*                           |
   |                |                                   | (see :ref:`s-operation-defaults`). Allowed values:  |
   |                |                                   | ``true``, ``false``.                                |
   +----------------+-----------------------------------+-----------------------------------------------------+
   | role           |                                   | .. index::                                          |
   |                |                                   |    single: role; action property                    |
   |                |                                   |    single: action; property, role                   |
   |                |                                   |                                                     |
   |                |                                   | Run the operation only on node(s) that the cluster  |
   |                |                                   | thinks should be in the specified role. This only   |
   |                |                                   | makes sense for recurring ``monitor`` operations.   |
   |                |                                   | Allowed (case-sensitive) values: ``Stopped``,       |
   |                |                                   | ``Started``, and in the case of :ref:`promotable    |
   |                |                                   | clone resources `,                                  |
   |                |                                   | ``Unpromoted`` and ``Promoted``.                    |
   +----------------+-----------------------------------+-----------------------------------------------------+

.. note::
When ``on-fail`` is set to ``demote``, recovery from failure by a successful
demote causes the cluster to recalculate whether and where a new instance
should be promoted. The node with the failure is eligible, so if promotion
scores have not changed, it will be promoted again.
There is no direct equivalent of ``migration-threshold`` for the promoted
role, but the same effect can be achieved with a location constraint using a
:ref:`rule ` with a node attribute expression for the resource's fail
count.
For example, to immediately ban the promoted role from a node with any
failed promote or promoted instance monitor:
.. code-block:: xml
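
   <!-- Illustrative sketch only: the constraint, rule, and expression ids are
        arbitrary placeholders. The attribute names follow the description
        below (promote has interval 0; the monitor interval is 10000 ms). -->
   <rsc_location id="loc1" rsc="my_primitive">
     <rule id="rule1" score="-INFINITY" role="Promoted" boolean-op="or">
       <expression id="expr1" attribute="fail-count-my_primitive#promote_0"
         operation="gte" value="1"/>
       <expression id="expr2" attribute="fail-count-my_primitive#monitor_10000"
         operation="gte" value="1"/>
     </rule>
   </rsc_location>
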
This example assumes that there is a promotable clone of the ``my_primitive``
resource (note that the primitive name, not the clone name, is used in the
rule), and that there is a recurring 10-second-interval monitor configured for
the promoted role (fail count attributes specify the interval in
milliseconds).
.. _s-resource-monitoring:
Monitoring Resources for Failure
________________________________
When Pacemaker first starts a resource, it runs one-time ``monitor`` operations
(referred to as *probes*) to ensure the resource is running where it's
supposed to be, and not running where it's not supposed to be. (This behavior
can be affected by the ``resource-discovery`` location constraint property.)
Other than those initial probes, Pacemaker will *not* (by default) check that
the resource continues to stay healthy [#]_. You must configure ``monitor``
operations explicitly to perform these checks.
.. topic:: An OCF resource with a recurring health check
.. code-block:: xml
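
   <!-- Illustrative sketch only: the agent, the ids, and the 60s interval are
        assumptions chosen for this example. -->
   <primitive id="Public-IP" class="ocf" type="IPaddr" provider="heartbeat">
     <operations>
       <op id="public-ip-check" name="monitor" interval="60s"/>
     </operations>
     <instance_attributes id="params-public-ip">
       <nvpair id="public-ip-addr" name="ip" value="192.0.2.2"/>
     </instance_attributes>
   </primitive>
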
By default, a ``monitor`` operation will ensure that the resource is running
where it is supposed to. The operation's ``role`` property can be used for
further checking.
For example, if a resource has one ``monitor`` operation with
``interval=10 role=Started`` and a second ``monitor`` operation with
``interval=11 role=Stopped``, the cluster will run the first monitor on any nodes
it thinks *should* be running the resource, and the second monitor on any nodes
that it thinks *should not* be running the resource (for the truly paranoid,
who want to know when an administrator manually starts a service by mistake).
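
The pair of monitors just described might be configured along these lines.
This is only a sketch: the ``id`` values are hypothetical, and the surrounding
``primitive`` element is omitted.

.. code-block:: xml

   <operations>
     <!-- Runs on nodes where the cluster expects the resource to be active -->
     <op id="example-monitor-started" name="monitor" interval="10" role="Started"/>
     <!-- Runs on nodes where the cluster expects the resource to be inactive -->
     <op id="example-monitor-stopped" name="monitor" interval="11" role="Stopped"/>
   </operations>

The two intervals differ because operations on the same resource must be
unique in their combination of name and interval.
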
.. note::
Currently, monitors with ``role=Stopped`` are not implemented for
:ref:`clone ` resources.
.. _s-monitoring-unmanaged:
Monitoring Resources When Administration is Disabled
____________________________________________________
Recurring ``monitor`` operations behave differently under various administrative
settings:
* When a resource is unmanaged (by setting ``is-managed=false``): No monitors
will be stopped.
If the unmanaged resource is stopped on a node where the cluster thinks it
should be running, the cluster will detect and report that it is not, but it
will not consider the monitor failed, and will not try to start the resource
until it is managed again.
Starting the unmanaged resource on a different node is strongly discouraged
and will at least cause the cluster to consider the resource failed, and
may require the resource's ``target-role`` to be set to ``Stopped`` then
``Started`` to be recovered.
* When a node is put into standby: All resources will be moved away from the
node, and all ``monitor`` operations will be stopped on the node, except those
specifying ``role`` as ``Stopped`` (which will be newly initiated if
appropriate).
* When the cluster is put into maintenance mode: All resources will be marked
as unmanaged. All monitor operations will be stopped, except those
specifying ``role`` as ``Stopped`` (which will be newly initiated if
appropriate). As with single unmanaged resources, starting
a resource on a node other than where the cluster expects it to be will
cause problems.
.. _s-operation-defaults:
Setting Global Defaults for Operations
______________________________________
You can change the global default values for operation properties
in a given cluster. These are defined in an ``op_defaults`` section
of the CIB's ``configuration`` section, and can be set with
``crm_attribute``. For example,
.. code-block:: none
# crm_attribute --type op_defaults --name timeout --update 20s
would default each operation's ``timeout`` to 20 seconds. If an
operation's definition also includes a value for ``timeout``, then that
value would be used for that operation instead.
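
For orientation, the resulting ``op_defaults`` section might look roughly like
the following sketch. The ``id`` values shown are placeholders; the actual ids
are chosen by the tool that creates the entry.

.. code-block:: xml

   <op_defaults>
     <meta_attributes id="op_defaults-meta_attributes">
       <!-- Applies to any operation that does not set its own timeout -->
       <nvpair id="op_defaults-meta_attributes-timeout" name="timeout" value="20s"/>
     </meta_attributes>
   </op_defaults>
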
When Implicit Operations Take a Long Time
_________________________________________
The cluster will always perform a number of implicit operations: ``start``,
``stop`` and a non-recurring ``monitor`` operation used at startup to check
whether the resource is already active. If one of these is taking too long,
then you can create an entry for it and specify a longer timeout.
.. topic:: An OCF resource with custom timeouts for its implicit actions
.. code-block:: xml
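
   <!-- Illustrative sketch only: the agent, the ids, and the timeout values
        are assumptions chosen for this example. -->
   <primitive id="Public-IP" class="ocf" type="IPaddr" provider="heartbeat">
     <operations>
       <!-- interval="0" marks the one-time startup monitor (probe) -->
       <op id="public-ip-startup" name="monitor" interval="0" timeout="90s"/>
       <op id="public-ip-start" name="start" interval="0" timeout="180s"/>
       <op id="public-ip-stop" name="stop" interval="0" timeout="15min"/>
     </operations>
     <instance_attributes id="params-public-ip">
       <nvpair id="public-ip-addr" name="ip" value="192.0.2.2"/>
     </instance_attributes>
   </primitive>
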
Multiple Monitor Operations
___________________________
Provided no two operations (for a single resource) have the same name
and interval, you can have as many ``monitor`` operations as you like.
In this way, you can do a superficial health check every minute and
progressively more intensive ones at longer intervals.
To tell the resource agent what kind of check to perform, you need to
provide each monitor with a different value for a common parameter.
The OCF standard creates a special parameter called ``OCF_CHECK_LEVEL``
for this purpose and dictates that it is "made available to the
resource agent without the normal ``OCF_RESKEY`` prefix".
Whatever name you choose, you can specify it by adding an
``instance_attributes`` block to the ``op`` tag. It is up to each
resource agent to look for the parameter and decide how to use it.
.. topic:: An OCF resource with two recurring health checks, performing
different levels of checks specified via ``OCF_CHECK_LEVEL``.
.. code-block:: xml
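
   <!-- Illustrative sketch only: the agent, the ids, the intervals, and the
        check-level values (10 and 20) are assumptions chosen for this
        example. How each level is interpreted is up to the resource agent. -->
   <primitive id="Public-IP" class="ocf" type="IPaddr" provider="heartbeat">
     <operations>
       <!-- Superficial check every 60 seconds -->
       <op id="public-ip-health-60" name="monitor" interval="60">
         <instance_attributes id="params-public-ip-depth-60">
           <nvpair id="pass-public-ip-depth-60" name="OCF_CHECK_LEVEL" value="10"/>
         </instance_attributes>
       </op>
       <!-- More thorough check every 300 seconds -->
       <op id="public-ip-health-300" name="monitor" interval="300">
         <instance_attributes id="params-public-ip-depth-300">
           <nvpair id="pass-public-ip-depth-300" name="OCF_CHECK_LEVEL" value="20"/>
         </instance_attributes>
       </op>
     </operations>
     <instance_attributes id="params-public-ip">
       <nvpair id="public-ip-level" name="ip" value="192.0.2.2"/>
     </instance_attributes>
   </primitive>
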
Disabling a Monitor Operation
_____________________________
The easiest way to stop a recurring monitor is to just delete it.
However, there can be times when you only want to disable it
temporarily. In such cases, simply add ``enabled=false`` to the
operation's definition.
.. topic:: Example of an OCF resource with a disabled health check
.. code-block:: xml
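
   <!-- Illustrative sketch only: the agent, the ids, and the 60s interval are
        assumptions chosen for this example. -->
   <primitive id="Public-IP" class="ocf" type="IPaddr" provider="heartbeat">
     <operations>
       <!-- enabled="false" pauses this recurring monitor without deleting it -->
       <op id="public-ip-check" name="monitor" interval="60s" enabled="false"/>
     </operations>
     <instance_attributes id="params-public-ip">
       <nvpair id="public-ip-addr" name="ip" value="192.0.2.2"/>
     </instance_attributes>
   </primitive>
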
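This can be achieved from the command line by executing the following, where
``public-ip-check`` stands in for the ``id`` of the operation to disable:

.. code-block:: none

   # cibadmin --modify --xml-text '<op id="public-ip-check" enabled="false"/>'
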
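Once you've done whatever you needed to do, you can then re-enable it with the
following (again, ``public-ip-check`` stands in for the operation's ``id``):

.. code-block:: none

   # cibadmin --modify --xml-text '<op id="public-ip-check" enabled="true"/>'
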
.. [#] See https://github.com/ClusterLabs/OCF-spec/tree/master/ra. The
Pacemaker implementation has been somewhat extended from the OCF specs.
.. [#] The resource-agents source code includes the **ocf-tester** script,
which can be useful in this regard.
.. [#] See http://refspecs.linux-foundation.org/LSB_3.0.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
for the LSB Spec as it relates to init scripts.
.. [#] For example, http://0pointer.de/blog/projects/systemd-for-admins-3.html
.. [#] The project has two independent forks, hosted at
https://www.nagios-plugins.org/ and https://www.monitoring-plugins.org/. Output
from both projects' plugins is similar, so plugins from either project can be
used with pacemaker.
.. [#] Currently, anyway. Automatic monitoring operations may be added in a future
version of Pacemaker.
diff --git a/doc/sphinx/shared/pacemaker-intro.rst b/doc/sphinx/shared/pacemaker-intro.rst
index 3473636843..49ad93361b 100644
--- a/doc/sphinx/shared/pacemaker-intro.rst
+++ b/doc/sphinx/shared/pacemaker-intro.rst
@@ -1,196 +1,196 @@
What Is Pacemaker?
####################
Pacemaker is a high-availability *cluster resource manager* -- software that
runs on a set of hosts (a *cluster* of *nodes*) in order to preserve integrity
and minimize downtime of desired services (*resources*). [#]_ It is maintained
by the `ClusterLabs `_ community.
Pacemaker's key features include:
* Detection of and recovery from node- and service-level failures
* Ability to ensure data integrity by fencing faulty nodes
* Support for one or more nodes per cluster
* Support for multiple resource interface standards (anything that can be
scripted can be clustered)
* Support (but no requirement) for shared storage
* Support for practically any redundancy configuration (active/passive, N+1,
etc.)
* Automatically replicated configuration that can be updated from any node
* Ability to specify cluster-wide relationships between services,
such as ordering, colocation and anti-colocation
* Support for advanced service types, such as *clones* (services that need to
be active on multiple nodes), *promotable clones* (clones that can run in
one of two roles), and containerized services
* Unified, scriptable cluster management tools
.. note:: **Fencing**
*Fencing*, also known as *STONITH* (an acronym for Shoot The Other Node In
The Head), is the ability to ensure that it is not possible for a node to be
running a service. This is accomplished via *fence devices* such as
intelligent power switches that cut power to the target, or intelligent
network switches that cut the target's access to the local network.
Pacemaker represents fence devices as a special class of resource.
A cluster cannot safely recover from certain failure conditions, such as an
unresponsive node, without fencing.
Cluster Architecture
____________________
At a high level, a cluster can be viewed as having these parts (which together
are often referred to as the *cluster stack*):
* **Resources:** These are the reason for the cluster's being -- the services
that need to be kept highly available.
* **Resource agents:** These are scripts or operating system components that
start, stop, and monitor resources, given a set of resource parameters.
These provide a uniform interface between Pacemaker and the managed
services.
* **Fence agents:** These are scripts that execute node fencing actions,
given a target and fence device parameters.
* **Cluster membership layer:** This component provides reliable messaging,
membership, and quorum information about the cluster. Currently, Pacemaker
supports `Corosync `_ as this layer.
* **Cluster resource manager:** Pacemaker provides the brain that processes
and reacts to events that occur in the cluster. These events may include
nodes joining or leaving the cluster; resource events caused by failures,
maintenance, or scheduled activities; and other administrative actions.
To achieve the desired availability, Pacemaker may start and stop resources
and fence nodes.
* **Cluster tools:** These provide an interface for users to interact with the
cluster. Various command-line and graphical (GUI) interfaces are available.
Most managed services are not, themselves, cluster-aware. However, many popular
open-source cluster filesystems make use of a common *Distributed Lock
Manager* (DLM), which makes direct use of Corosync for its messaging and
membership capabilities and Pacemaker for the ability to fence nodes.
.. image:: ../shared/images/pcmk-stack.png
:alt: Example cluster stack
:align: center
Pacemaker Architecture
______________________
Pacemaker itself is composed of multiple daemons that work together:
* pacemakerd
* pacemaker-attrd
* pacemaker-based
* pacemaker-controld
* pacemaker-execd
* pacemaker-fenced
* pacemaker-schedulerd
.. image:: ../shared/images/pcmk-internals.png
:alt: Pacemaker software components
:align: center
-The Pacemaker master process (pacemakerd) spawns all the other daemons, and
+Pacemaker's main process (pacemakerd) spawns all the other daemons, and
respawns them if they unexpectedly exit.
The *Cluster Information Base* (CIB) is an
`XML `_ representation of the cluster's
configuration and the state of all nodes and resources. The *CIB manager*
(pacemaker-based) keeps the CIB synchronized across the cluster, and handles
requests to modify it.
The *attribute manager* (pacemaker-attrd) maintains a database of attributes
for all nodes, keeps it synchronized across the cluster, and handles requests
to modify them. These attributes are usually recorded in the CIB.
Given a snapshot of the CIB as input, the *scheduler* (pacemaker-schedulerd)
determines what actions are necessary to achieve the desired state of the
cluster.
The *local executor* (pacemaker-execd) handles requests to execute
resource agents on the local cluster node, and returns the result.
The *fencer* (pacemaker-fenced) handles requests to fence nodes. Given a target
node, the fencer decides which cluster node(s) should execute which fencing
device(s), and calls the necessary fencing agents (either directly, or via
requests to the fencer peers on other nodes), and returns the result.
The *controller* (pacemaker-controld) is Pacemaker's coordinator, maintaining a
consistent view of the cluster membership and orchestrating all the other
components.
Pacemaker centralizes cluster decision-making by electing one of the controller
-instances as the 'Designated Controller' ('DC'). Should the elected DC process
+instances as the *Designated Controller* (*DC*). Should the elected DC process
(or the node it is on) fail, a new one is quickly established. The DC responds
to cluster events by taking a current snapshot of the CIB, feeding it to the
scheduler, then asking the executors (either directly on the local node, or via
requests to controller peers on other nodes) and the fencer to execute any
necessary actions.
.. note:: **Old daemon names**

   The Pacemaker daemons were renamed in version 2.0. You may still find
   references to the old names, especially in documentation targeted to
   version 1.1.

   .. table::

      +-------------------+---------------------+
      | Old name          | New name            |
      +===================+=====================+
      | attrd             | pacemaker-attrd     |
      +-------------------+---------------------+
      | cib               | pacemaker-based     |
      +-------------------+---------------------+
      | crmd              | pacemaker-controld  |
      +-------------------+---------------------+
      | lrmd              | pacemaker-execd     |
      +-------------------+---------------------+
      | stonithd          | pacemaker-fenced    |
      +-------------------+---------------------+
      | pacemaker_remoted | pacemaker-remoted   |
      +-------------------+---------------------+

Node Redundancy Designs
_______________________
Pacemaker supports practically any `node redundancy configuration
`_
including *Active/Active*, *Active/Passive*, *N+1*, *N+M*, *N-to-1* and
*N-to-N*.
Active/passive clusters with two (or more) nodes using Pacemaker and
`DRBD `_ are
a cost-effective high-availability solution for many situations. One of the
nodes provides the desired services, and if it fails, the other node takes
over.
.. image:: ../shared/images/pcmk-active-passive.png
:alt: Active/Passive Redundancy
:align: center
Pacemaker also supports multiple nodes in a shared-failover design, reducing
hardware costs by allowing several active/passive clusters to be combined and
share a common backup node.
.. image:: ../shared/images/pcmk-shared-failover.png
:alt: Shared Failover
:align: center
When shared storage is available, every node can potentially be used for
failover. Pacemaker can even run multiple copies of services to spread out the
workload. This is sometimes called *N-to-N* redundancy.
.. image:: ../shared/images/pcmk-active-active.png
:alt: N to N Redundancy
:align: center
.. rubric:: Footnotes
.. [#] *Cluster* is sometimes used in other contexts to refer to hosts grouped
together for other purposes, such as high-performance computing (HPC),
but Pacemaker is not intended for those purposes.