diff --git a/doc/sphinx/Pacemaker_Administration/agents.rst b/doc/sphinx/Pacemaker_Administration/agents.rst
index f6df901cdf..c348acd710 100644
--- a/doc/sphinx/Pacemaker_Administration/agents.rst
+++ b/doc/sphinx/Pacemaker_Administration/agents.rst
@@ -1,1182 +1,1182 @@
.. index::
single: resource agent
Resource Agents
---------------
Action Completion
#################
If one resource depends on another resource via constraints, the cluster will
interpret an expected result as sufficient to continue with dependent actions.
This may cause timing issues if the resource agent start returns before the
service is not only launched but fully ready to perform its function, or if the
resource agent stop returns before the service has fully released all its
claims on system resources. At a minimum, the start or stop should not return
before a status command would return the expected (started or stopped) result.
.. index::
single: OCF resource agent
single: resource agent; OCF
OCF Resource Agents
###################
.. index::
single: OCF resource agent; location
Location of Custom Scripts
__________________________
OCF Resource Agents are found in ``/usr/lib/ocf/resource.d/$PROVIDER``
When creating your own agents, you are encouraged to create a new directory
under ``/usr/lib/ocf/resource.d/`` so that they are not confused with (or
overwritten by) the agents shipped by existing providers.
So, for example, if you choose the provider name of big-corp and want a new
resource named big-app, you would create a resource agent called
``/usr/lib/ocf/resource.d/big-corp/big-app`` and define a resource:
.. code-block:: xml
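   <!-- A minimal sketch only; the resource id "custom-app" is illustrative -->
   <primitive id="custom-app" class="ocf" provider="big-corp" type="big-app"/>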
.. index::
single: OCF resource agent; action
Actions
_______
All OCF resource agents are required to implement the following actions.
.. list-table:: **Required Actions for OCF Agents**
:class: longtable
- :widths: 1 4 3
+ :widths: 15 25 60
:header-rows: 1
* - Action
- Description
- Instructions
* - .. _start_action:
.. index::
single: OCF resource agent; start
single: start action
start
- Start the resource
- Return :ref:`OCF_SUCCESS <OCF_SUCCESS>` on success and an appropriate
error code otherwise. Must not report success until the resource is fully
active.
* - .. _stop_action:
.. index::
single: OCF resource agent; stop
single: stop action
stop
- Stop the resource
- Return :ref:`OCF_SUCCESS <OCF_SUCCESS>` on success and an appropriate
error code otherwise. Must not report success until the resource is fully
stopped.
* - .. _monitor_action:
.. index::
single: OCF resource agent; monitor
single: monitor action
monitor
- Check the resource's state
- Return :ref:`OCF_SUCCESS <OCF_SUCCESS>` if the resource is running,
:ref:`OCF_NOT_RUNNING <OCF_NOT_RUNNING>` if it is stopped, and any other
:ref:`OCF exit code <ocf_return_codes>` if it is failed. **Note:** The
monitor action should test the state of the resource on the local machine
only.
* - .. _meta_data_action:
.. index::
single: OCF resource agent; meta-data
single: meta-data action
meta-data
- Describe the resource
- Provide information about this resource in the XML format defined by the
OCF standard. Return :ref:`OCF_SUCCESS <OCF_SUCCESS>`. **Note:** This is
*not* required to be performed as root.
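As a rough sketch of how these required actions fit together, a minimal agent
(hypothetical and far from production-ready) might dispatch on the action name
passed as its first argument and exit with the standard OCF codes:

.. code-block:: sh

   #!/bin/sh
   # Hypothetical skeleton only: a real agent must manage an actual service,
   # emit complete meta-data, and handle errors per the OCF standard.
   case "$1" in
       start)
           # Launch the service, then wait until a status check confirms it
           # is fully active before reporting success.
           exit 0  # OCF_SUCCESS
           ;;
       stop)
           # Stop the service and wait until it has released its resources.
           exit 0  # OCF_SUCCESS
           ;;
       monitor)
           # Check the state of the local instance only.
           exit 7  # OCF_NOT_RUNNING here; 0 (OCF_SUCCESS) if running
           ;;
       meta-data)
           # Print the agent's meta-data XML (omitted in this sketch).
           exit 0  # OCF_SUCCESS
           ;;
       *)
           exit 3  # OCF_ERR_UNIMPLEMENTED
           ;;
   esac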
OCF resource agents may optionally implement additional actions. Some are used
only with advanced resource types such as clones.
.. list-table:: **Optional Actions for OCF Resource Agents**
:class: longtable
- :widths: 1 4 3
+ :widths: 15 45 40
:header-rows: 1
* - Action
- Description
- Instructions
* - .. _validate_all_action:
.. index::
single: OCF resource agent; validate-all
single: validate-all action
validate-all
- Validate the instance parameters provided.
- Return :ref:`OCF_SUCCESS <OCF_SUCCESS>` if parameters are valid,
:ref:`OCF_ERR_ARGS <OCF_ERR_ARGS>` if not valid, and
:ref:`OCF_ERR_CONFIGURED <OCF_ERR_CONFIGURED>` if resource is not
configured.
* - .. _promote_action:
.. index::
single: OCF resource agent; promote
single: promote action
promote
- Bring the local instance of a promotable clone resource to the promoted
role.
- Return :ref:`OCF_SUCCESS <OCF_SUCCESS>` on success.
* - .. _demote_action:
.. index::
single: OCF resource agent; demote
single: demote action
demote
- Bring the local instance of a promotable clone resource to the unpromoted
role.
- Return :ref:`OCF_SUCCESS <OCF_SUCCESS>` on success.
* - .. _notify_action:
.. index::
single: OCF resource agent; notify
single: notify action
notify
- Used by the cluster to send the agent pre- and post-notification events
telling the resource what has happened and what will happen.
- Must not fail. Must return :ref:`OCF_SUCCESS <OCF_SUCCESS>`.
* - .. _reload_action:
.. index::
single: OCF resource agent; reload
single: reload action
reload
- Reload the service's own configuration.
- Not used by Pacemaker.
* - .. _reload_agent_action:
.. index::
single: OCF resource agent; reload-agent
single: reload-agent action
reload-agent
- Make effective any changes in instance parameters marked as reloadable in
the agent's meta-data.
- This is used when the agent can handle a change in some of its parameters
more efficiently than stopping and starting the resource.
* - .. _recover_action:
.. index::
single: OCF resource agent; recover
single: recover action
recover
- Restart the service.
- Not used by Pacemaker.
.. important::
If you create a new OCF resource agent, use `ocf-tester` to verify that the
agent complies with the OCF standard properly.
.. index::
single: OCF resource agent; return code
How Are OCF Return Codes Interpreted?
_____________________________________
The first thing the cluster does is to check the return code against the
expected result. If the result does not match the expected value, then the
operation is considered to have failed, and recovery action is initiated.
There are three types of failure recovery:
.. list-table:: **Types of Recovery Performed by the Cluster**
:class: longtable
- :widths: 1 5 5
+ :widths: 10 45 45
:header-rows: 1
* - Type
- Description
- Action Taken by the Cluster
* - .. _soft_error:
.. index::
single: OCF resource agent; soft error
soft
- A transient error
- Restart the resource or move it to a new location
* - .. _hard_error:
.. index::
single: OCF resource agent; hard error
hard
- A non-transient error that may be specific to the current node
- Move the resource elsewhere and prevent it from being retried on the
current node
* - .. _fatal_error:
.. index::
single: OCF resource agent; fatal error
fatal
- A non-transient error that will be common to all cluster nodes (for
example, a bad configuration was specified)
- Stop the resource and prevent it from being started on any cluster node
.. _ocf_return_codes:
OCF Return Codes
________________
The following table outlines the various OCF return codes and the type of
recovery the cluster will initiate when a failure code is received. Although
counterintuitive, even actions that return ``OCF_SUCCESS`` can be considered to
have failed, if ``OCF_SUCCESS`` was not the expected return value.
.. list-table:: **OCF Exit Codes and Their Recovery Types**
:class: longtable
- :widths: 1 3 6 2
+ :widths: 8 32 50 10
:header-rows: 1
* - Exit Code
- OCF Alias
- Description
- Recovery
* - .. _OCF_SUCCESS:
.. index::
single: OCF_SUCCESS
single: OCF return code; OCF_SUCCESS
pair: OCF return code; 0
0
- OCF_SUCCESS
- Success. The command completed successfully. This is the expected result
for all start, stop, promote, and demote actions.
- :ref:`soft <soft_error>`
* - .. _OCF_ERR_GENERIC:
.. index::
single: OCF_ERR_GENERIC
single: OCF return code; OCF_ERR_GENERIC
pair: OCF return code; 1
1
- OCF_ERR_GENERIC
- Generic "there was a problem" error code.
- :ref:`hard <hard_error>`
* - .. _OCF_ERR_ARGS:
.. index::
single: OCF_ERR_ARGS
single: OCF return code; OCF_ERR_ARGS
pair: OCF return code; 2
2
- OCF_ERR_ARGS
- The resource's parameter values are not valid on this machine (for
example, a value refers to a file not found on the local host).
- :ref:`hard <hard_error>`
* - .. _OCF_ERR_UNIMPLEMENTED:
.. index::
single: OCF_ERR_UNIMPLEMENTED
single: OCF return code; OCF_ERR_UNIMPLEMENTED
pair: OCF return code; 3
3
- OCF_ERR_UNIMPLEMENTED
- The requested action is not implemented.
- :ref:`hard <hard_error>`
* - .. _OCF_ERR_PERM:
.. index::
single: OCF_ERR_PERM
single: OCF return code; OCF_ERR_PERM
pair: OCF return code; 4
4
- OCF_ERR_PERM
- The resource agent does not have sufficient privileges to complete the
task.
- :ref:`hard <hard_error>`
* - .. _OCF_ERR_INSTALLED:
.. index::
single: OCF_ERR_INSTALLED
single: OCF return code; OCF_ERR_INSTALLED
pair: OCF return code; 5
5
- OCF_ERR_INSTALLED
- The tools required by the resource are not installed on this machine.
- :ref:`hard <hard_error>`
* - .. _OCF_ERR_CONFIGURED:
.. index::
single: OCF_ERR_CONFIGURED
single: OCF return code; OCF_ERR_CONFIGURED
pair: OCF return code; 6
6
- OCF_ERR_CONFIGURED
- The resource's parameter values are inherently invalid (for example, a
required parameter was not given).
- :ref:`fatal <fatal_error>`
* - .. _OCF_NOT_RUNNING:
.. index::
single: OCF_NOT_RUNNING
single: OCF return code; OCF_NOT_RUNNING
pair: OCF return code; 7
7
- OCF_NOT_RUNNING
- The resource is safely stopped. This should only be returned by monitor
actions, not stop actions.
- N/A
* - .. _OCF_RUNNING_PROMOTED:
.. index::
single: OCF_RUNNING_PROMOTED
single: OCF return code; OCF_RUNNING_PROMOTED
pair: OCF return code; 8
8
- OCF_RUNNING_PROMOTED
- The resource is running in the promoted role.
- :ref:`soft <soft_error>`
* - .. _OCF_FAILED_PROMOTED:
.. index::
single: OCF_FAILED_PROMOTED
single: OCF return code; OCF_FAILED_PROMOTED
pair: OCF return code; 9
9
- OCF_FAILED_PROMOTED
- The resource is (or might be) in the promoted role but has failed. The
resource will be demoted, stopped, and then started (and possibly
promoted) again.
- :ref:`soft <soft_error>`
* - .. _OCF_DEGRADED:
.. index::
single: OCF_DEGRADED
single: OCF return code; OCF_DEGRADED
pair: OCF return code; 190
190
- OCF_DEGRADED
- The resource is properly active, but in such a condition that future
failures are more likely.
- none
* - .. _OCF_DEGRADED_PROMOTED:
.. index::
single: OCF_DEGRADED_PROMOTED
single: OCF return code; OCF_DEGRADED_PROMOTED
pair: OCF return code; 191
191
- OCF_DEGRADED_PROMOTED
- The resource is properly active in the promoted role, but in such a
condition that future failures are more likely.
- none
* - other
- *none*
- Custom error code.
- soft
Exceptions to the recovery handling described above:
* Probes (non-recurring monitor actions) that find a resource active
(or in the promoted role) will not result in recovery action unless it is
also found active elsewhere.
* The recovery action taken when a resource is found active more than
once is determined by the resource's ``multiple-active`` property.
* Recurring actions that return ``OCF_ERR_UNIMPLEMENTED``
do not cause any type of recovery.
* Actions that return one of the "degraded" codes will be treated the same as
if they had returned success, but status output will indicate that the
resource is degraded.
.. _ocf_env_vars:
Environment Variables
_____________________
Pacemaker sets certain environment variables when it executes an OCF resource
agent. Agents can check these variables to get information about resource
parameters or the execution environment.
**Note:** Pacemaker may set other environment variables for its own purposes.
They may be present in the agent's environment, but Pacemaker is not providing
them for the agent's use, and so the agent should not rely on any variables not
listed in the table below.
.. list-table:: **OCF Environment Variables**
:class: longtable
- :widths: 1 6
+ :widths: 50 50
:header-rows: 1
* - Environment Variable
- Description
* - .. _OCF_CHECK_LEVEL:
.. index::
single: OCF_CHECK_LEVEL
single: environment variable; OCF_CHECK_LEVEL
OCF_CHECK_LEVEL
- Requested intensity level of checks in ``monitor`` and ``validate-all``
actions. Usually set as an operation attribute; see Pacemaker Explained
for an example.
* - .. _OCF_EXIT_REASON_PREFIX:
.. index::
single: OCF_EXIT_REASON_PREFIX
single: environment variable; OCF_EXIT_REASON_PREFIX
OCF_EXIT_REASON_PREFIX
- Prefix for printing fatal error messages from the resource agent.
* - .. _OCF_RA_VERSION_MAJOR:
.. index::
single: OCF_RA_VERSION_MAJOR
single: environment variable; OCF_RA_VERSION_MAJOR
OCF_RA_VERSION_MAJOR
- Major version number of the OCF Resource Agent API. If the script does
not support this revision, it should report an error.
See the `OCF specification `_ for an
explanation of the versioning scheme used. The version number is split
into two numbers for ease of use in shell scripts. These two may be used
by the agent to determine whether it is run under an OCF-compliant
resource manager.
* - .. _OCF_RA_VERSION_MINOR:
.. index::
single: OCF_RA_VERSION_MINOR
single: environment variable; OCF_RA_VERSION_MINOR
OCF_RA_VERSION_MINOR
- Minor version number of the OCF Resource Agent API. See
:ref:`OCF_RA_VERSION_MAJOR <OCF_RA_VERSION_MAJOR>` for more details.
* - .. _OCF_RESKEY_crm_feature_set:
.. index::
single: OCF_RESKEY_crm_feature_set
single: environment variable; OCF_RESKEY_crm_feature_set
OCF_RESKEY_crm_feature_set
- ``crm_feature_set`` on the DC (or on the local node, if the agent is run
by ``crm_resource``).
* - .. _OCF_RESKEY_CRM_meta_interval:
.. index::
single: OCF_RESKEY_CRM_meta_interval
single: environment variable; OCF_RESKEY_CRM_meta_interval
OCF_RESKEY_CRM_meta_interval
- Interval (in milliseconds) of the current operation.
* - .. _OCF_RESKEY_CRM_meta_name:
.. index::
single: OCF_RESKEY_CRM_meta_name
single: environment variable; OCF_RESKEY_CRM_meta_name
OCF_RESKEY_CRM_meta_name
- Name of the current operation.
* - .. _OCF_RESKEY_CRM_meta_notify:
.. index::
single: OCF_RESKEY_CRM_meta_notify_*
single: environment variable; OCF_RESKEY_CRM_meta_notify_*
OCF_RESKEY_CRM_meta_notify_*
- See :ref:`Clone Notifications <clone_notifications>`.
* - .. _OCF_RESKEY_CRM_meta_on_node:
.. index::
single: OCF_RESKEY_CRM_meta_on_node
single: environment variable; OCF_RESKEY_CRM_meta_on_node
OCF_RESKEY_CRM_meta_on_node
- Name of the node where the current operation is running.
* - .. _OCF_RESKEY_CRM_meta_on_node_uuid:
.. index::
single: OCF_RESKEY_CRM_meta_on_node_uuid
single: environment variable; OCF_RESKEY_CRM_meta_on_node_uuid
OCF_RESKEY_CRM_meta_on_node_uuid
- Cluster-layer ID of the node where the current operation is running (or
node name for Pacemaker Remote nodes).
* - .. _OCF_RESKEY_CRM_meta_physical_host:
.. index::
single: OCF_RESKEY_CRM_meta_physical_host
single: environment variable; OCF_RESKEY_CRM_meta_physical_host
OCF_RESKEY_CRM_meta_physical_host
- If the node where the current operation is running is a guest node, the
host on which the container is running.
* - .. _OCF_RESKEY_CRM_meta_timeout:
.. index::
single: OCF_RESKEY_CRM_meta_timeout
single: environment variable; OCF_RESKEY_CRM_meta_timeout
OCF_RESKEY_CRM_meta_timeout
- Timeout (in milliseconds) of the current operation.
* - .. _OCF_RESKEY_CRM_meta:
.. index::
single: OCF_RESKEY_CRM_meta_*
single: environment variable; OCF_RESKEY_CRM_meta_*
OCF_RESKEY_CRM_meta_*
- Each of a resource's meta-attributes is converted to an environment
variable prefixed with "OCF_RESKEY_CRM_meta\_". See Pacemaker Explained
for some meta-attributes that have special meaning to Pacemaker.
* - .. _OCF_RESKEY:
.. index::
single: OCF_RESKEY_*
single: environment variable; OCF_RESKEY_*
OCF_RESKEY_*
- Each of a resource's instance parameters is converted to an environment
variable prefixed with "OCF_RESKEY\_".
* - .. _OCF_RESOURCE_INSTANCE:
.. index::
single: OCF_RESOURCE_INSTANCE
single: environment variable; OCF_RESOURCE_INSTANCE
OCF_RESOURCE_INSTANCE
- The name of the resource instance.
* - .. _OCF_RESOURCE_PROVIDER:
.. index::
single: OCF_RESOURCE_PROVIDER
single: environment variable; OCF_RESOURCE_PROVIDER
OCF_RESOURCE_PROVIDER
- The name of the resource agent provider.
* - .. _OCF_RESOURCE_TYPE:
.. index::
single: OCF_RESOURCE_TYPE
single: environment variable; OCF_RESOURCE_TYPE
OCF_RESOURCE_TYPE
- The name of the resource type.
* - .. _OCF_ROOT:
.. index::
single: OCF_ROOT
single: environment variable; OCF_ROOT
OCF_ROOT
- The root of the OCF directory hierarchy.
* - .. _OCF_TRACE_FILE:
.. index::
single: OCF_TRACE_FILE
single: environment variable; OCF_TRACE_FILE
OCF_TRACE_FILE
- The absolute path or file descriptor to write trace output to, if
``OCF_TRACE_RA`` is set to true. Pacemaker sets this only to
``/dev/stderr`` and only when running a resource agent via
``crm_resource``.
* - .. _OCF_TRACE_RA:
.. index::
single: OCF_TRACE_RA
single: environment variable; OCF_TRACE_RA
OCF_TRACE_RA
- If set to true, enable tracing of the resource agent. Trace output is
written to ``OCF_TRACE_FILE`` if set; otherwise, it's written to a file
in ``OCF_RESKEY_trace_dir`` if set or in a default directory if not.
Pacemaker sets this to true only when running a resource agent via
``crm_resource`` with one or more ``-V`` flags.
* - .. _PCMK_DEBUGLOG:
.. _HA_DEBUGLOG:
.. index::
single: PCMK_DEBUGLOG
single: environment variable; PCMK_DEBUGLOG
single: HA_DEBUGLOG
single: environment variable; HA_DEBUGLOG
PCMK_DEBUGLOG (and HA_DEBUGLOG)
- Where to write resource agent debug logs. Pacemaker sets this to
``PCMK_logfile`` if set to a value other than ``none`` and if debugging
is enabled for the executor.
* - .. _PCMK_LOGFACILITY:
.. _HA_LOGFACILITY:
.. index::
single: PCMK_LOGFACILITY
single: environment variable; PCMK_LOGFACILITY
single: HA_LOGFACILITY
single: environment variable; HA_LOGFACILITY
PCMK_LOGFACILITY (and HA_LOGFACILITY)
- Syslog facility for resource agent logs. Pacemaker sets this to
``PCMK_logfacility`` if set to a value other than ``none`` or
``/dev/null``.
* - .. _PCMK_LOGFILE:
.. _HA_LOGFILE:
.. index::
single: PCMK_LOGFILE
single: environment variable; PCMK_LOGFILE
single: HA_LOGFILE
single: environment variable; HA_LOGFILE
PCMK_LOGFILE (and HA_LOGFILE)
- Where to write resource agent logs. Pacemaker sets this to
``PCMK_logfile`` if set to a value other than ``none``.
* - .. _PCMK_service:
.. index::
single: PCMK_service
single: environment variable; PCMK_service
PCMK_service
- The name of the Pacemaker subsystem or command-line tool that's executing
the resource agent. Specific values are subject to change; useful mainly
for logging.
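As a simple illustration of how an agent consumes these variables, a
hypothetical agent with an ``ip`` instance parameter might read it and the
current operation like this (the parameter name is purely illustrative):

.. code-block:: sh

   # Sketch only: "ip" is an illustrative parameter name
   ip="${OCF_RESKEY_ip:?required parameter ip is not set}"
   op="${OCF_RESKEY_CRM_meta_name:-unknown}"
   echo "running ${op} for ${OCF_RESOURCE_INSTANCE} with ip=${ip}"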
Clone Resource Agent Requirements
_________________________________
Any resource can be used as an anonymous clone, as it requires no additional
support from the resource agent. Whether it makes sense to do so depends on your
resource and its resource agent.
Resource Agent Requirements for Globally Unique Clones
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Globally unique clones require additional support in the resource agent. In
particular, it must respond with ``OCF_SUCCESS`` only if the node has that exact
instance active. All other probes for instances of the clone should result in
``OCF_NOT_RUNNING`` (or one of the other OCF error codes if they are failed).
Individual instances of a clone are identified by appending a colon and a
numerical offset (for example, ``apache:2``).
A resource agent can find out how many copies there are by examining the
``OCF_RESKEY_CRM_meta_clone_max`` environment variable and which instance it is
by examining ``OCF_RESKEY_CRM_meta_clone``.
The resource agent must not make any assumptions (based on
``OCF_RESKEY_CRM_meta_clone``) about which numerical instances are active. In
particular, the list of active copies is not always an unbroken sequence, nor
does it always start at 0.
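For illustration, an agent could read these variables along the following
lines (a sketch; per the caveat above, the instance number should otherwise be
treated as opaque):

.. code-block:: sh

   # Sketch: clone metadata supplied by the cluster
   total="${OCF_RESKEY_CRM_meta_clone_max:-1}"   # number of clone instances
   me="${OCF_RESKEY_CRM_meta_clone:-0}"          # this instance's offset
   echo "instance ${me} of ${total}"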
Resource Agent Requirements for Promotable Clones
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Promotable clone resources require two extra actions, ``demote`` and ``promote``,
which are responsible for changing the state of the resource. Like ``start`` and
``stop``, they should return ``OCF_SUCCESS`` if they completed successfully or a
relevant error code if they did not.
The states can mean whatever you wish, but when the resource is started, it must
begin in the unpromoted role. From there, the cluster will decide which
instances to promote.
In addition to the clone requirements for monitor actions, agents must also
*accurately* report which state they are in. The cluster relies on the agent to
report its status (including role) accurately and does not indicate to the agent
what role it currently believes it to be in.
.. list-table:: **Role Implications of OCF Return Codes**
:class: longtable
- :widths: 1 3
+ :widths: 50 50
:header-rows: 1
* - Monitor Return Code
- Description
* - :ref:`OCF_NOT_RUNNING <OCF_NOT_RUNNING>`
- .. index::
single: OCF_NOT_RUNNING
single: OCF return code; OCF_NOT_RUNNING
Stopped
* - :ref:`OCF_SUCCESS <OCF_SUCCESS>`
- .. index::
single: OCF_SUCCESS
single: OCF return code; OCF_SUCCESS
Running (Unpromoted)
* - :ref:`OCF_RUNNING_PROMOTED <OCF_RUNNING_PROMOTED>`
- .. index::
single: OCF_RUNNING_PROMOTED
single: OCF return code; OCF_RUNNING_PROMOTED
Running (Promoted)
* - :ref:`OCF_FAILED_PROMOTED <OCF_FAILED_PROMOTED>`
- .. index::
single: OCF_FAILED_PROMOTED
single: OCF return code; OCF_FAILED_PROMOTED
Failed (Promoted)
* - Other
- Failed (Unpromoted)
.. _clone_notifications:
Clone Notifications
~~~~~~~~~~~~~~~~~~~
If the clone has the ``notify`` meta-attribute set to ``true`` and the resource
agent supports the ``notify`` action, Pacemaker will call the action when
appropriate, passing a number of extra variables. These variables, when combined
with additional context, can be used to calculate the current state of the
cluster and what is about to happen to it.
.. index::
single: clone; environment variables
single: notify; environment variables
.. list-table:: **Environment Variables Supplied with Clone Notify Actions**
:class: longtable
- :widths: 1 1
+ :widths: 50 50
:header-rows: 1
* - Variable
- Description
* - .. _OCF_RESKEY_CRM_meta_notify_type:
.. index::
single: environment variable; OCF_RESKEY_CRM_meta_notify_type
single: OCF_RESKEY_CRM_meta_notify_type
OCF_RESKEY_CRM_meta_notify_type
- Allowed values: ``pre``, ``post``
* - .. _OCF_RESKEY_CRM_meta_notify_operation:
.. index::
single: environment variable; OCF_RESKEY_CRM_meta_notify_operation
single: OCF_RESKEY_CRM_meta_notify_operation
OCF_RESKEY_CRM_meta_notify_operation
- Allowed values: ``start``, ``stop``
* - .. _OCF_RESKEY_CRM_meta_notify_start_resource:
.. index::
single: environment variable; OCF_RESKEY_CRM_meta_notify_start_resource
single: OCF_RESKEY_CRM_meta_notify_start_resource
OCF_RESKEY_CRM_meta_notify_start_resource
- Resources to be started
* - .. _OCF_RESKEY_CRM_meta_notify_stop_resource:
.. index::
single: environment variable; OCF_RESKEY_CRM_meta_notify_stop_resource
single: OCF_RESKEY_CRM_meta_notify_stop_resource
OCF_RESKEY_CRM_meta_notify_stop_resource
- Resources to be stopped
* - .. _OCF_RESKEY_CRM_meta_notify_active_resource:
.. index::
single: environment variable; OCF_RESKEY_CRM_meta_notify_active_resource
single: OCF_RESKEY_CRM_meta_notify_active_resource
OCF_RESKEY_CRM_meta_notify_active_resource
- Resources that are running
* - .. _OCF_RESKEY_CRM_meta_notify_inactive_resource:
.. index::
single: environment variable; OCF_RESKEY_CRM_meta_notify_inactive_resource
single: OCF_RESKEY_CRM_meta_notify_inactive_resource
OCF_RESKEY_CRM_meta_notify_inactive_resource
- Resources that are not running
* - .. _OCF_RESKEY_CRM_meta_notify_start_uname:
.. index::
single: environment variable; OCF_RESKEY_CRM_meta_notify_start_uname
single: OCF_RESKEY_CRM_meta_notify_start_uname
OCF_RESKEY_CRM_meta_notify_start_uname
- Nodes on which resources will be started
* - .. _OCF_RESKEY_CRM_meta_notify_stop_uname:
.. index::
single: environment variable; OCF_RESKEY_CRM_meta_notify_stop_uname
single: OCF_RESKEY_CRM_meta_notify_stop_uname
OCF_RESKEY_CRM_meta_notify_stop_uname
- Nodes on which resources will be stopped
* - .. _OCF_RESKEY_CRM_meta_notify_active_uname:
.. index::
single: environment variable; OCF_RESKEY_CRM_meta_notify_active_uname
single: OCF_RESKEY_CRM_meta_notify_active_uname
OCF_RESKEY_CRM_meta_notify_active_uname
- Nodes on which resources are running
The variables come in pairs, such as
``OCF_RESKEY_CRM_meta_notify_start_resource`` and
``OCF_RESKEY_CRM_meta_notify_start_uname``, and should be treated as an array of
whitespace-separated elements.
``OCF_RESKEY_CRM_meta_notify_inactive_resource`` is an exception, as the
matching ``uname`` variable does not exist since inactive resources are not
running on any node.
Thus, in order to indicate that ``clone:0`` will be started on ``sles-1``,
``clone:2`` will be started on ``sles-3``, and ``clone:3`` will be started
on ``sles-2``, the cluster would set:
.. topic:: Notification Variables
.. code-block:: none
OCF_RESKEY_CRM_meta_notify_start_resource="clone:0 clone:2 clone:3"
OCF_RESKEY_CRM_meta_notify_start_uname="sles-1 sles-3 sles-2"
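An agent's ``notify`` action could walk such paired lists along these lines
(a sketch that relies on ordinary shell word splitting):

.. code-block:: sh

   # Sketch: pair each resource to be started with its target node
   set -- $OCF_RESKEY_CRM_meta_notify_start_uname
   for rsc in $OCF_RESKEY_CRM_meta_notify_start_resource; do
       node="$1"
       shift
       echo "${rsc} will be started on ${node}"
   done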
.. note::
Pacemaker will log but otherwise ignore failures of notify actions.
Interpretation of Notification Variables
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
**Pre-notification (stop):**
* Active resources: ``$OCF_RESKEY_CRM_meta_notify_active_resource``
* Inactive resources: ``$OCF_RESKEY_CRM_meta_notify_inactive_resource``
* Resources to be started: ``$OCF_RESKEY_CRM_meta_notify_start_resource``
* Resources to be stopped: ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
**Post-notification (stop) / Pre-notification (start):**
* Active resources
* ``$OCF_RESKEY_CRM_meta_notify_active_resource``
* minus ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
* Inactive resources
* ``$OCF_RESKEY_CRM_meta_notify_inactive_resource``
* plus ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
* Resources that were started: ``$OCF_RESKEY_CRM_meta_notify_start_resource``
* Resources that were stopped: ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
**Post-notification (start):**
* Active resources:
* ``$OCF_RESKEY_CRM_meta_notify_active_resource``
* minus ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
* plus ``$OCF_RESKEY_CRM_meta_notify_start_resource``
* Inactive resources:
* ``$OCF_RESKEY_CRM_meta_notify_inactive_resource``
* plus ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
* minus ``$OCF_RESKEY_CRM_meta_notify_start_resource``
* Resources that were started: ``$OCF_RESKEY_CRM_meta_notify_start_resource``
* Resources that were stopped: ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
Extra Notifications for Promotable Clones
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. index::
single: clone; environment variables
single: promotable; environment variables
.. list-table:: **Extra Environment Variables Supplied for Promotable Clones**
:class: longtable
- :widths: 1 1
+ :widths: 50 50
:header-rows: 1
* - Variable
- Description
* - .. _OCF_RESKEY_CRM_meta_notify_promoted_resource:
.. index::
single: environment variable; OCF_RESKEY_CRM_meta_notify_promoted_resource
single: OCF_RESKEY_CRM_meta_notify_promoted_resource
OCF_RESKEY_CRM_meta_notify_promoted_resource
- Resources that are running in the promoted role
* - .. _OCF_RESKEY_CRM_meta_notify_unpromoted_resource:
.. index::
single: environment variable; OCF_RESKEY_CRM_meta_notify_unpromoted_resource
single: OCF_RESKEY_CRM_meta_notify_unpromoted_resource
OCF_RESKEY_CRM_meta_notify_unpromoted_resource
- Resources that are running in the unpromoted role
* - .. _OCF_RESKEY_CRM_meta_notify_promote_resource:
.. index::
single: environment variable; OCF_RESKEY_CRM_meta_notify_promote_resource
single: OCF_RESKEY_CRM_meta_notify_promote_resource
OCF_RESKEY_CRM_meta_notify_promote_resource
- Resources to be promoted
* - .. _OCF_RESKEY_CRM_meta_notify_demote_resource:
.. index::
single: environment variable; OCF_RESKEY_CRM_meta_notify_demote_resource
single: OCF_RESKEY_CRM_meta_notify_demote_resource
OCF_RESKEY_CRM_meta_notify_demote_resource
- Resources to be demoted
* - .. _OCF_RESKEY_CRM_meta_notify_promote_uname:
.. index::
single: environment variable; OCF_RESKEY_CRM_meta_notify_promote_uname
single: OCF_RESKEY_CRM_meta_notify_promote_uname
OCF_RESKEY_CRM_meta_notify_promote_uname
- Nodes on which resources will be promoted
* - .. _OCF_RESKEY_CRM_meta_notify_demote_uname:
.. index::
single: environment variable; OCF_RESKEY_CRM_meta_notify_demote_uname
single: OCF_RESKEY_CRM_meta_notify_demote_uname
OCF_RESKEY_CRM_meta_notify_demote_uname
- Nodes on which resources will be demoted
* - .. _OCF_RESKEY_CRM_meta_notify_promoted_uname:
.. index::
single: environment variable; OCF_RESKEY_CRM_meta_notify_promoted_uname
single: OCF_RESKEY_CRM_meta_notify_promoted_uname
OCF_RESKEY_CRM_meta_notify_promoted_uname
- Nodes on which resources are running in the promoted role
* - .. _OCF_RESKEY_CRM_meta_notify_unpromoted_uname:
.. index::
single: environment variable; OCF_RESKEY_CRM_meta_notify_unpromoted_uname
single: OCF_RESKEY_CRM_meta_notify_unpromoted_uname
OCF_RESKEY_CRM_meta_notify_unpromoted_uname
- Nodes on which resources are running in the unpromoted role
Interpretation of Promotable Notification Variables
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
**Pre-notification (demote):**
* Active resources: ``$OCF_RESKEY_CRM_meta_notify_active_resource``
* Promoted resources: ``$OCF_RESKEY_CRM_meta_notify_promoted_resource``
* Unpromoted resources: ``$OCF_RESKEY_CRM_meta_notify_unpromoted_resource``
* Inactive resources: ``$OCF_RESKEY_CRM_meta_notify_inactive_resource``
* Resources to be started: ``$OCF_RESKEY_CRM_meta_notify_start_resource``
* Resources to be promoted: ``$OCF_RESKEY_CRM_meta_notify_promote_resource``
* Resources to be demoted: ``$OCF_RESKEY_CRM_meta_notify_demote_resource``
* Resources to be stopped: ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
**Post-notification (demote) / Pre-notification (stop):**
* Active resources: ``$OCF_RESKEY_CRM_meta_notify_active_resource``
* Promoted resources:
* ``$OCF_RESKEY_CRM_meta_notify_promoted_resource``
* minus ``$OCF_RESKEY_CRM_meta_notify_demote_resource``
* Unpromoted resources: ``$OCF_RESKEY_CRM_meta_notify_unpromoted_resource``
* Inactive resources: ``$OCF_RESKEY_CRM_meta_notify_inactive_resource``
* Resources to be started: ``$OCF_RESKEY_CRM_meta_notify_start_resource``
* Resources to be promoted: ``$OCF_RESKEY_CRM_meta_notify_promote_resource``
* Resources to be demoted: ``$OCF_RESKEY_CRM_meta_notify_demote_resource``
* Resources to be stopped: ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
* Resources that were demoted: ``$OCF_RESKEY_CRM_meta_notify_demote_resource``
**Post-notification (stop) / Pre-notification (start)**
* Active resources:
* ``$OCF_RESKEY_CRM_meta_notify_active_resource``
* minus ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
* Promoted resources:
* ``$OCF_RESKEY_CRM_meta_notify_promoted_resource``
* minus ``$OCF_RESKEY_CRM_meta_notify_demote_resource``
* Unpromoted resources:
* ``$OCF_RESKEY_CRM_meta_notify_unpromoted_resource``
* minus ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
* Inactive resources:
* ``$OCF_RESKEY_CRM_meta_notify_inactive_resource``
* plus ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
* Resources to be started: ``$OCF_RESKEY_CRM_meta_notify_start_resource``
* Resources to be promoted: ``$OCF_RESKEY_CRM_meta_notify_promote_resource``
* Resources to be demoted: ``$OCF_RESKEY_CRM_meta_notify_demote_resource``
* Resources to be stopped: ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
* Resources that were demoted: ``$OCF_RESKEY_CRM_meta_notify_demote_resource``
* Resources that were stopped: ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
**Post-notification (start) / Pre-notification (promote)**
* Active resources:
* ``$OCF_RESKEY_CRM_meta_notify_active_resource``
* minus ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
* plus ``$OCF_RESKEY_CRM_meta_notify_start_resource``
* Promoted resources:
* ``$OCF_RESKEY_CRM_meta_notify_promoted_resource``
* minus ``$OCF_RESKEY_CRM_meta_notify_demote_resource``
* Unpromoted resources:
* ``$OCF_RESKEY_CRM_meta_notify_unpromoted_resource``
* minus ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
* plus ``$OCF_RESKEY_CRM_meta_notify_start_resource``
* Inactive resources:
* ``$OCF_RESKEY_CRM_meta_notify_inactive_resource``
* plus ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
* minus ``$OCF_RESKEY_CRM_meta_notify_start_resource``
* Resources to be started: ``$OCF_RESKEY_CRM_meta_notify_start_resource``
* Resources to be promoted: ``$OCF_RESKEY_CRM_meta_notify_promote_resource``
* Resources to be demoted: ``$OCF_RESKEY_CRM_meta_notify_demote_resource``
* Resources to be stopped: ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
* Resources that were started: ``$OCF_RESKEY_CRM_meta_notify_start_resource``
* Resources that were demoted: ``$OCF_RESKEY_CRM_meta_notify_demote_resource``
* Resources that were stopped: ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
**Post-notification (promote)**
* Active resources:
* ``$OCF_RESKEY_CRM_meta_notify_active_resource``
* minus ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
* plus ``$OCF_RESKEY_CRM_meta_notify_start_resource``
* Promoted resources:
* ``$OCF_RESKEY_CRM_meta_notify_promoted_resource``
* minus ``$OCF_RESKEY_CRM_meta_notify_demote_resource``
* plus ``$OCF_RESKEY_CRM_meta_notify_promote_resource``
* Unpromoted resources:
* ``$OCF_RESKEY_CRM_meta_notify_unpromoted_resource``
* minus ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
* plus ``$OCF_RESKEY_CRM_meta_notify_start_resource``
* minus ``$OCF_RESKEY_CRM_meta_notify_promote_resource``
* Inactive resources:
* ``$OCF_RESKEY_CRM_meta_notify_inactive_resource``
* plus ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
* minus ``$OCF_RESKEY_CRM_meta_notify_start_resource``
* Resources to be started: ``$OCF_RESKEY_CRM_meta_notify_start_resource``
* Resources to be promoted: ``$OCF_RESKEY_CRM_meta_notify_promote_resource``
* Resources to be demoted: ``$OCF_RESKEY_CRM_meta_notify_demote_resource``
* Resources to be stopped: ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
* Resources that were started: ``$OCF_RESKEY_CRM_meta_notify_start_resource``
* Resources that were promoted: ``$OCF_RESKEY_CRM_meta_notify_promote_resource``
* Resources that were demoted: ``$OCF_RESKEY_CRM_meta_notify_demote_resource``
* Resources that were stopped: ``$OCF_RESKEY_CRM_meta_notify_stop_resource``
.. index::
single: resource agent; LSB
single: LSB resource agent
single: init script
LSB Resource Agents (Init Scripts)
##################################
LSB Compliance
______________
The relevant part of the
`LSB specifications `_
includes a description of all the return codes listed here.
Assuming `some_service` is configured correctly and currently
inactive, the following sequence will help you determine if it is
LSB-compatible:
#. Start (stopped):
.. code-block:: none
# /etc/init.d/some_service start ; echo "result: $?"
* Did the service start?
* Did the echo command print ``result: 0`` (in addition to the init script's
usual output)?
#. Status (running):
.. code-block:: none
# /etc/init.d/some_service status ; echo "result: $?"
* Did the script accept the command?
* Did the script indicate the service was running?
* Did the echo command print ``result: 0`` (in addition to the init script's
usual output)?
#. Start (running):
.. code-block:: none
# /etc/init.d/some_service start ; echo "result: $?"
* Is the service still running?
* Did the echo command print ``result: 0`` (in addition to the init
script's usual output)?
#. Stop (running):
.. code-block:: none
# /etc/init.d/some_service stop ; echo "result: $?"
* Was the service stopped?
* Did the echo command print ``result: 0`` (in addition to the init
script's usual output)?
#. Status (stopped):
.. code-block:: none
# /etc/init.d/some_service status ; echo "result: $?"
* Did the script accept the command?
* Did the script indicate the service was not running?
* Did the echo command print ``result: 3`` (in addition to the init
script's usual output)?
#. Stop (stopped):
.. code-block:: none
# /etc/init.d/some_service stop ; echo "result: $?"
* Is the service still stopped?
* Did the echo command print ``result: 0`` (in addition to the init
script's usual output)?
#. Status (failed):
This step is not readily testable and relies on manual inspection of the script.
The script can use one of the error codes (other than 3) listed in the
LSB spec to indicate that it is active but failed. This tells the
cluster that before moving the resource to another node, it needs to
stop it on the existing one first.
If the answer to any of the above questions is no, then the script is not
LSB-compliant. Your options are then to either fix the script or write an OCF
agent based on the existing script.
diff --git a/doc/sphinx/Pacemaker_Administration/alerts.rst b/doc/sphinx/Pacemaker_Administration/alerts.rst
index 05424dca0b..9066580d76 100644
--- a/doc/sphinx/Pacemaker_Administration/alerts.rst
+++ b/doc/sphinx/Pacemaker_Administration/alerts.rst
@@ -1,343 +1,343 @@
.. index::
single: alert; agents
Alert Agents
------------
.. index::
single: alert; sample agents
Using the Sample Alert Agents
#############################
Pacemaker provides several sample alert agents, installed in
``/usr/share/pacemaker/alerts`` by default.
While these sample scripts may be copied and used as-is, they are provided
mainly as templates to be edited to suit your purposes. See their source code
for the full set of instance attributes they support.
.. topic:: Sending cluster events as SNMP v2c traps
.. code-block:: xml
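   <!-- A sketch only: the alert id, agent path, and trap destination value
        are illustrative -->
   <configuration>
     <alerts>
       <alert id="snmp_alert" path="/path/to/snmp.sh">
         <meta_attributes id="snmp_alert-meta">
           <nvpair id="snmp_alert-timestamp-format" name="timestamp-format"
                   value="%Y-%m-%d,%H:%M:%S.%01N"/>
         </meta_attributes>
         <recipient id="snmp_alert-recipient" value="192.168.1.2"/>
       </alert>
     </alerts>
   </configuration>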
.. note:: **SNMP alert agent attributes**
The ``timestamp-format`` meta-attribute should always be set to
``%Y-%m-%d,%H:%M:%S.%01N`` when using the SNMP agent, to match the SNMP
standard.
The SNMP agent provides a number of instance attributes in addition to the
one used in the example above. The most useful are ``trap_version``, which
defaults to ``2c``, and ``trap_community``, which defaults to ``public``.
See the source code for more details.
.. topic:: Sending cluster events as SNMP v3 traps
.. code-block:: xml
.. note:: **SNMP v3 trap configuration**
To use SNMP v3, ``trap_version`` must be set to ``3``. ``trap_community``
will be ignored.
The example above uses the ``trap_options`` instance attribute to override
the security level, authentication protocol, authentication user, and
authentication password from snmp.conf. These will be passed to the snmptrap
command. Passing the password on the command line is considered insecure;
specify authentication and privacy options suitable for your environment.
.. topic:: Sending cluster events as e-mails
.. code-block:: xml
.. index::
single: alert; agent development
Writing an Alert Agent
######################
.. index::
single: alert; environment variables
single: environment variable; alert agents
.. list-table:: **Environment variables passed to alert agents**
:class: longtable
- :widths: 1 3 1
+ :widths: 30 50 20
:header-rows: 1
* - Environment Variable
- Description
- Alert Types
* - .. _CRM_alert_kind:
.. index::
single: environment variable; CRM_alert_kind
single: CRM_alert_kind
CRM_alert_kind
- The type of alert (``node``, ``fencing``, ``resource``, or
``attribute``)
- all
* - .. _CRM_alert_node:
.. index::
single: environment variable; CRM_alert_node
single: CRM_alert_node
CRM_alert_node
- Name of affected node
- all
* - .. _CRM_alert_node_sequence:
.. index::
single: environment variable; CRM_alert_node_sequence
single: CRM_alert_node_sequence
CRM_alert_node_sequence
- A sequence number increased whenever an alert is being issued on the
local node, which can be used to reference the order in which alerts
have been issued by Pacemaker. An alert for an event that happened later
in time reliably has a higher sequence number than alerts for earlier
events. This number has no cluster-wide meaning.
- all
* - .. _CRM_alert_recipient:
.. index::
single: environment variable; CRM_alert_recipient
single: CRM_alert_recipient
CRM_alert_recipient
- The configured recipient
- all
* - .. _CRM_alert_timestamp:
.. index::
single: environment variable; CRM_alert_timestamp
single: CRM_alert_timestamp
CRM_alert_timestamp
- A timestamp created prior to executing the agent, in the format
specified by the ``timestamp-format`` meta-attribute. This allows the
agent to have a reliable, high-precision time of when the event
occurred, regardless of when the agent itself was invoked (which could
potentially be delayed due to system load, etc.).
- all
* - .. _CRM_alert_timestamp_epoch:
.. index::
single: environment variable; CRM_alert_timestamp_epoch
single: CRM_alert_timestamp_epoch
CRM_alert_timestamp_epoch
- The same time as ``CRM_alert_timestamp``, expressed as the integer
number of seconds since January 1, 1970. This (along with
``CRM_alert_timestamp_usec``) can be useful for alert agents that need
to format time in a specific way rather than let the user configure it.
- all
* - .. _CRM_alert_timestamp_usec:
.. index::
single: environment variable; CRM_alert_timestamp_usec
single: CRM_alert_timestamp_usec
CRM_alert_timestamp_usec
- The same time as ``CRM_alert_timestamp``, expressed as the integer
number of microseconds since ``CRM_alert_timestamp_epoch``.
- all
* - .. _CRM_alert_version:
.. index::
single: environment variable; CRM_alert_version
single: CRM_alert_version
CRM_alert_version
- The version of Pacemaker sending the alert
- all
* - .. _CRM_alert_desc:
.. index::
single: environment variable; CRM_alert_desc
single: CRM_alert_desc
CRM_alert_desc
- Detail about event. For ``node`` alerts, this is the node's current
state (``member`` or ``lost``). For ``fencing`` alerts, this is a
summary of the requested fencing operation, including origin, target,
and fencing operation error code, if any. For ``resource`` alerts, this
is a readable string equivalent of ``CRM_alert_status``.
- ``node``, ``fencing``, ``resource``
* - .. _CRM_alert_nodeid:
.. index::
single: environment variable; CRM_alert_nodeid
single: CRM_alert_nodeid
CRM_alert_nodeid
- ID of node whose status changed
- ``node``
* - .. _CRM_alert_rc:
.. index::
single: environment variable; CRM_alert_rc
single: CRM_alert_rc
CRM_alert_rc
- The numerical return code of the fencing or resource operation
- ``fencing``, ``resource``
* - .. _CRM_alert_task:
.. index::
single: environment variable; CRM_alert_task
single: CRM_alert_task
CRM_alert_task
- The requested fencing or resource operation
- ``fencing``, ``resource``
* - .. _CRM_alert_exec_time:
.. index::
single: environment variable; CRM_alert_exec_time
single: CRM_alert_exec_time
CRM_alert_exec_time
- The (wall-clock) time, in milliseconds, that it took to execute the
action. If the action timed out, ``CRM_alert_status`` will be 2,
``CRM_alert_desc`` will be "Timed Out", and this value will be the
action timeout. May not be supported on all platforms. *(since 2.0.1)*
- ``resource``
* - .. _CRM_alert_interval:
.. index::
single: environment variable; CRM_alert_interval
single: CRM_alert_interval
CRM_alert_interval
- The interval of the resource operation
- ``resource``
* - .. _CRM_alert_rsc:
.. index::
single: environment variable; CRM_alert_rsc
single: CRM_alert_rsc
CRM_alert_rsc
- The name of the affected resource
- ``resource``
* - .. _CRM_alert_status:
.. index::
single: environment variable; CRM_alert_status
single: CRM_alert_status
CRM_alert_status
- A numerical code used by Pacemaker to represent the operation result
- ``resource``
* - .. _CRM_alert_target_rc:
.. index::
single: environment variable; CRM_alert_target_rc
single: CRM_alert_target_rc
CRM_alert_target_rc
- The expected numerical return code of the operation
- ``resource``
* - .. _CRM_alert_attribute_name:
.. index::
single: environment variable; CRM_alert_attribute_name
single: CRM_alert_attribute_name
CRM_alert_attribute_name
- The name of the node attribute that changed
- ``attribute``
* - .. _CRM_alert_attribute_value:
.. index::
single: environment variable; CRM_alert_attribute_value
single: CRM_alert_attribute_value
CRM_alert_attribute_value
- The new value of the node attribute that changed
- ``attribute``
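At its simplest, an alert agent is just an executable that reads these
variables; for example (a hypothetical sketch, with an illustrative log path):

.. code-block:: sh

   #!/bin/sh
   # Hypothetical alert agent: append one line per alert to a log file
   logfile="/var/log/pacemaker-alerts.log"
   echo "${CRM_alert_timestamp} ${CRM_alert_kind} on ${CRM_alert_node}:" \
        "${CRM_alert_desc:-no detail}" >> "${logfile}"
   exit 0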
Special concerns when writing alert agents:
* Alert agents may be called with no recipient (if none is configured),
so the agent must be able to handle this situation, even if it
only exits in that case. (Users may modify the configuration in
stages, and add a recipient later.)
* If more than one recipient is configured for an alert, the alert agent will
be called once per recipient. If an agent is not able to run concurrently, it
should be configured with only a single recipient. The agent is free,
however, to interpret the recipient as a list.
* When a cluster event occurs, all alerts are fired off at the same time as
separate processes. Depending on how many alerts and recipients are
configured, and on what is done within the alert agents,
a significant load burst may occur. The agent could be written to take
this into consideration, for example by queueing resource-intensive actions
into some other instance, instead of directly executing them.
* Alert agents are run as the |CRM_DAEMON_USER| user, which has a minimal set
of permissions. If an agent requires additional privileges, it is
recommended to configure ``sudo`` to allow the agent to run the necessary
commands as another user with the appropriate privileges.
* As always, take care to validate and sanitize user-configured parameters,
such as ``CRM_alert_timestamp`` (whose content is specified by the
user-configured ``timestamp-format``), ``CRM_alert_recipient``, and all
instance attributes. Mostly this is needed simply to protect against
configuration errors, but if some user can modify the CIB without having
|CRM_DAEMON_USER| access to the cluster nodes, it is a potential security
concern as well, and sanitizing helps guard against code injection.
diff --git a/doc/sphinx/Pacemaker_Administration/moving.rst b/doc/sphinx/Pacemaker_Administration/moving.rst
index 3d6a92af51..5881dd2910 100644
--- a/doc/sphinx/Pacemaker_Administration/moving.rst
+++ b/doc/sphinx/Pacemaker_Administration/moving.rst
@@ -1,305 +1,305 @@
Moving Resources
----------------
.. index::
single: resource; move
Moving Resources Manually
#########################
There are primarily two occasions when you would want to move a resource from
its current location: when the whole node is under maintenance, and when a
single resource needs to be moved.
.. index::
single: standby mode
single: node; standby mode
Standby Mode
____________
Since everything eventually comes down to a score, you could create constraints
for every resource to prevent them from running on one node. While Pacemaker
configuration can seem convoluted at times, not even we would require this of
administrators.
Instead, you can set a special node attribute which tells the cluster "don't
let anything run here". There is even a helpful tool to help query and set it,
called ``crm_standby``. To check the standby status of the current machine,
run:
.. code-block:: none
# crm_standby -G
A value of ``on`` indicates that the node is *not* able to host any resources,
while a value of ``off`` says that it *can*.
You can also check the status of other nodes in the cluster by specifying the
`--node` option:
.. code-block:: none
# crm_standby -G --node sles-2
To change the current node's standby status, use ``-v`` instead of ``-G``:
.. code-block:: none
# crm_standby -v on
Again, you can change another host's value by supplying a hostname with
``--node``.
A cluster node in standby mode will not run resources, but still contributes to
quorum, and may fence or be fenced by nodes.
Moving One Resource
___________________
When only one resource is required to move, we could do this by creating
location constraints. However, once again we provide a user-friendly shortcut
as part of the ``crm_resource`` command, which creates and modifies the extra
constraints for you. If ``Email`` were running on ``sles-1`` and you wanted it
moved to a specific location, the command would look something like:
.. code-block:: none
# crm_resource -M -r Email -H sles-2
Behind the scenes, the tool will create the following location constraint:
.. code-block:: xml
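   <!-- A sketch of the generated constraint; the exact id chosen by the
        tool may differ -->
   <rsc_location id="cli-prefer-Email" rsc="Email" node="sles-2" score="INFINITY"/>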
It is important to note that subsequent invocations of ``crm_resource -M`` are
not cumulative. So, if you ran these commands:
.. code-block:: none
# crm_resource -M -r Email -H sles-2
# crm_resource -M -r Email -H sles-3
then it is as if you had never performed the first command.
To allow the resource to move back again, use:
.. code-block:: none
# crm_resource -U -r Email
Note the use of the word *allow*. The resource *can* move back to its original
location, but depending on ``resource-stickiness``, location constraints, and
so forth, it might stay where it is.
To be absolutely certain that it moves back to ``sles-1``, move it there before
issuing the call to ``crm_resource -U``:
.. code-block:: none
# crm_resource -M -r Email -H sles-1
# crm_resource -U -r Email
Alternatively, if you only care that the resource should be moved from its
current location, try:
.. code-block:: none
# crm_resource -B -r Email
which will instead create a negative constraint, like:
.. code-block:: xml
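   <!-- A sketch, assuming Email is currently active on sles-1; the exact id
        chosen by the tool may differ -->
   <rsc_location id="cli-ban-Email-on-sles-1" rsc="Email" node="sles-1" score="-INFINITY"/>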
This will achieve the desired effect, but will also have long-term
consequences. As the tool will warn you, the creation of a ``-INFINITY``
constraint will prevent the resource from running on that node until
``crm_resource -U`` is used. This includes the situation where every other
cluster node is no longer available!
In some cases, such as when ``resource-stickiness`` is set to ``INFINITY``, it
is possible that you will end up with nodes with the same score, forcing the
cluster to choose one (which may not be the one you want). The tool can detect
some of these cases and deals with them by creating both positive and negative
constraints. For example:
.. code-block:: xml
which has the same long-term consequences as discussed earlier.
Moving Resources Due to Connectivity Changes
############################################
You can configure the cluster to move resources when external connectivity is
lost in two steps.
.. index::
single: ocf:pacemaker:ping resource
single: ping resource
Tell Pacemaker to Monitor Connectivity
______________________________________
First, add an ``ocf:pacemaker:ping`` resource to the cluster. The ``ping``
resource uses the system utility of the same name to test whether a list of
machines (specified by DNS hostname or IP address) are reachable, and uses the
results to maintain a node attribute.
The node attribute is called ``pingd`` by default, but is customizable in order
to allow multiple ping groups to be defined.
Normally, the ping resource should run on all cluster nodes, which means that
you'll need to create a clone. A template for this can be found below, along
with a description of the most interesting parameters.
.. table:: **Commonly Used ocf:pacemaker:ping Resource Parameters**
- :widths: 1 4
+ :widths: 20 80
+--------------------+--------------------------------------------------------------+
| Resource Parameter | Description |
+====================+==============================================================+
| dampen | .. index:: |
| | single: ocf:pacemaker:ping resource; dampen parameter |
| | single: dampen; ocf:pacemaker:ping resource parameter |
| | |
| | The time to wait (dampening) for further changes to occur. |
| | Use this to prevent a resource from bouncing around the |
| | cluster when cluster nodes notice the loss of connectivity |
| | at slightly different times. |
+--------------------+--------------------------------------------------------------+
| multiplier | .. index:: |
| | single: ocf:pacemaker:ping resource; multiplier parameter |
| | single: multiplier; ocf:pacemaker:ping resource parameter |
| | |
| | The number of connected ping nodes gets multiplied by this |
| | value to get a score. Useful when there are multiple ping |
| | nodes configured. |
+--------------------+--------------------------------------------------------------+
| host_list | .. index:: |
| | single: ocf:pacemaker:ping resource; host_list parameter |
| | single: host_list; ocf:pacemaker:ping resource parameter |
| | |
| | The machines to contact in order to determine the current |
| | connectivity status. Allowed values include resolvable DNS |
| | connectivity host names, IPv4 addresses, and IPv6 addresses. |
+--------------------+--------------------------------------------------------------+
.. topic:: Example ping resource that checks node connectivity once every minute
.. code-block:: xml
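   <!-- A sketch of a cloned ping resource; the ids and host_list value are
        illustrative -->
   <clone id="Connected">
     <primitive id="ping" class="ocf" provider="pacemaker" type="ping">
       <instance_attributes id="ping-attrs">
         <nvpair id="ping-dampen" name="dampen" value="5s"/>
         <nvpair id="ping-multiplier" name="multiplier" value="1000"/>
         <nvpair id="ping-hosts" name="host_list" value="192.0.2.1"/>
       </instance_attributes>
       <operations>
         <op id="ping-monitor-60s" interval="60s" name="monitor"/>
       </operations>
     </primitive>
   </clone>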
.. important::
You're only half done. The next section deals with telling Pacemaker how to
deal with the connectivity status that ``ocf:pacemaker:ping`` is recording.
Tell Pacemaker How to Interpret the Connectivity Data
_____________________________________________________
.. important::
Before attempting the following, make sure you understand rules. See the
"Rules" chapter of the *Pacemaker Explained* document for details.
There are a number of ways to use the connectivity data.
The most common setup is for people to have a single ping target (for example,
the service network's default gateway), to prevent the cluster from running a
resource on any unconnected node.
.. topic:: Don't run a resource on unconnected nodes
.. code-block:: xml
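   <!-- A sketch: keep a hypothetical "WebServer" resource off nodes where
        the pingd attribute is not defined; ids are illustrative -->
   <rsc_location id="WebServer-no-connectivity" rsc="WebServer">
     <rule id="WebServer-no-connectivity-rule" score="-INFINITY">
       <expression id="WebServer-no-connectivity-expr"
                   attribute="pingd" operation="not_defined"/>
     </rule>
   </rsc_location>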
A more complex setup is to have a number of ping targets configured. You can
require the cluster to only run resources on nodes that can connect to all (or
a minimum subset) of them.
.. topic:: Run only on nodes connected to three or more ping targets
.. code-block:: xml
...
...
...
Alternatively, you can tell the cluster only to *prefer* nodes with the best
connectivity, by using ``score-attribute`` in the rule. Just be sure to set
``multiplier`` to a value higher than that of ``resource-stickiness`` (and
don't set either of them to ``INFINITY``).
.. topic:: Prefer node with most connected ping nodes
.. code-block:: xml
It is perhaps easier to think of this in terms of the simple constraints that
the cluster translates it into. For example, if ``sles-1`` is connected to all
five ping nodes but ``sles-2`` is only connected to two, then it would be as if
you instead had the following constraints in your configuration:
.. topic:: How the cluster translates the above location constraint
.. code-block:: xml
The advantage is that you don't have to manually update any constraints
whenever your network connectivity changes.
You can also combine the concepts above into something even more complex. The
example below shows how you can prefer the node with the most connected ping
nodes provided they have connectivity to at least three (again assuming that
``multiplier`` is set to 1000).
.. topic:: More complex example of choosing location based on connectivity
.. code-block:: xml
diff --git a/doc/sphinx/Pacemaker_Administration/options.rst b/doc/sphinx/Pacemaker_Administration/options.rst
index 776bb3606c..f75862441c 100644
--- a/doc/sphinx/Pacemaker_Administration/options.rst
+++ b/doc/sphinx/Pacemaker_Administration/options.rst
@@ -1,232 +1,232 @@
.. index:: client options
Client Options
--------------
Pacemaker uses several environment variables set on the client side.
.. note:: Directory and file paths below may differ on your system depending on
your Pacemaker build settings. Check your Pacemaker configuration
file to find the correct paths.
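For example, a client tool can be pointed at a saved CIB file rather than the
live cluster by setting ``CIB_file`` (the file path here is illustrative):

.. code-block:: none

   # CIB_file=/tmp/cib-backup.xml crm_mon -1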
.. list-table:: **Client-side Environment Variables**
:class: longtable
- :widths: 2 4 5
+ :widths: 20 30 50
:header-rows: 1
* - Environment Variable
- Default
- Description
* - .. _CIB_encrypted:
.. index::
single: CIB_encrypted
single: environment variable; CIB_encrypted
CIB_encrypted
- true
- Whether to encrypt network traffic. Used with :ref:`CIB_port <CIB_port>`
for connecting to a remote CIB instance; ignored if
:ref:`CIB_port <CIB_port>` is not set.
* - .. _CIB_file:
.. index::
single: CIB_file
single: environment variable; CIB_file
CIB_file
-
- If set, CIB connections are created against the named XML file. Clients
read an input CIB from, and write the result CIB to, the named file.
Ignored if :ref:`CIB_shadow <CIB_shadow>` is set.
* - .. _CIB_passwd:
.. index::
single: CIB_passwd
single: environment variable; CIB_passwd
CIB_passwd
-
- :ref:`$CIB_user <CIB_user>`'s password. Read from the command line if
unset. Used with :ref:`CIB_port <CIB_port>` for connecting to a remote
CIB instance; ignored if :ref:`CIB_port <CIB_port>` is not set.
* - .. _CIB_port:
.. index::
single: CIB_port
single: environment variable; CIB_port
CIB_port
-
- If set, CIB connections are created as clients to a remote CIB instance
on :ref:`$CIB_server <CIB_server>` via this port. Ignored if
:ref:`CIB_shadow <CIB_shadow>` or :ref:`CIB_file <CIB_file>` is set.
* - .. _CIB_server:
.. index::
single: CIB_server
single: environment variable; CIB_server
CIB_server
- localhost
- The host to connect to. Used with :ref:`CIB_port <CIB_port>` for
connecting to a remote CIB instance; ignored if
:ref:`CIB_port <CIB_port>` is not set.
* - .. _CIB_ca_file:
.. index::
single: CIB_ca_file
single: environment variable; CIB_ca_file
CIB_ca_file
-
- If this, :ref:`CIB_cert_file <CIB_cert_file>`, and
:ref:`CIB_key_file <CIB_key_file>` are set, remote CIB administration
will be encrypted using X.509 (SSL/TLS) certificates, with this root
certificate for the certificate authority. Used with
:ref:`CIB_port <CIB_port>` for connecting to a remote CIB instance;
ignored if :ref:`CIB_port <CIB_port>` is not set.
* - .. _CIB_cert_file:
.. index::
single: CIB_cert_file
single: environment variable; CIB_cert_file
CIB_cert_file
-
- If this, :ref:`CIB_ca_file `, and
:ref:`CIB_key_file ` are set, remote CIB administration
will be encrypted using X.509 (SSL/TLS) certificates, with this
certificate for the local host. Used with :ref:`CIB_port ` for
connecting to a remote CIB instance; ignored if
:ref:`CIB_port ` is not set.
* - .. _CIB_key_file:
.. index::
single: CIB_key_file
single: environment variable; CIB_key_file
CIB_key_file
-
- If this, :ref:`CIB_ca_file `, and
:ref:`CIB_cert_file ` are set, remote CIB administration
will be encrypted using X.509 (SSL/TLS) certificates, with this
private key for the local host. Used with :ref:`CIB_port ` for
connecting to a remote CIB instance; ignored if
:ref:`CIB_port ` is not set.
* - .. _CIB_crl_file:
.. index::
single: CIB_crl_file
single: environment variable; CIB_crl_file
CIB_crl_file
-
- If this, :ref:`CIB_ca_file `,
:ref:`CIB_cert_file `, and
:ref:`CIB_key_file ` are all set, then certificates listed
in this PEM-format Certificate Revocation List file will be rejected.
* - .. _CIB_shadow:
.. index::
single: CIB_shadow
single: environment variable; CIB_shadow
CIB_shadow
-
- If set, CIB connections are created against a temporary working
("shadow") CIB file called ``shadow.$CIB_shadow`` in
:ref:`$CIB_shadow_dir `. Should be set only to the name
of a shadow CIB created by :ref:`crm_shadow `. Otherwise,
behavior is undefined.
* - .. _CIB_shadow_dir:
.. index::
single: CIB_shadow_dir
single: environment variable; CIB_shadow_dir
CIB_shadow_dir
- |CRM_CONFIG_DIR| if the current user is ``root`` or |CRM_DAEMON_USER|;
otherwise ``$HOME/.cib`` if :ref:`$HOME ` is set; otherwise
``$TMPDIR/.cib`` if :ref:`$TMPDIR ` is set to an absolute path;
otherwise ``/tmp/.cib``
- If set, shadow files are created in this directory. Ignored if
:ref:`CIB_shadow ` is not set.
* - .. _CIB_user:
.. index::
single: CIB_user
single: environment variable; CIB_user
CIB_user
- |CRM_DAEMON_USER| if used with :ref:`CIB_port `, or the current
effective user otherwise
- If used with :ref:`CIB_port `, connect to
:ref:`$CIB_server ` as this user. Must be part of the
|CRM_DAEMON_GROUP| group on :ref:`$CIB_server `. Otherwise
(without :ref:`CIB_port `), this is used only for ACL and
display purposes.
* - .. _EDITOR:
.. index::
single: EDITOR
single: environment variable; EDITOR
EDITOR
-
- Text editor to use for editing shadow files. Required for the ``--edit``
command of :ref:`crm_shadow `.
* - .. _HOME:
.. index::
single: HOME
single: environment variable; HOME
HOME
- Current user's home directory as configured in the passwd database, if an
entry exists
- Used to create a default :ref:`CIB_shadow_dir ` for non-
privileged users.
* - .. _PE_fail:
.. index::
single: PE_fail
single: environment variable; PE_fail
PE_fail
- 0
- Advanced use only: A dummy graph action with action ID matching this
option will be marked as failed. Primarily for developer use with
scheduler simulations.
* - .. _PS1:
.. index::
single: PS1
single: environment variable; PS1
PS1
-
- The shell's primary prompt string. Used by
:ref:`crm_shadow `: set to indicate that the user is in an
interactive shadow CIB session, and checked to determine whether the user
is already in an interactive session before creating a new one.
* - .. _SHELL:
.. index::
single: SHELL
single: environment variable; SHELL
SHELL
-
- Absolute path to a shell. Used by :ref:`crm_shadow ` when
launching an interactive session.
* - .. _TMPDIR:
.. index::
single: TMPDIR
single: environment variable; TMPDIR
TMPDIR
- /tmp
- Directory for temporary files. If not an absolute path, the default is
used instead.
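As an illustration (the file name and shadow name below are arbitrary examples,
not defaults), several of these variables can be combined with ordinary client
commands to direct them at something other than the live CIB:

.. code-block:: none

   # CIB_file=/tmp/saved-cib.xml crm_mon -1
   # CIB_shadow=test cibadmin --query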
diff --git a/doc/sphinx/Pacemaker_Administration/tools.rst b/doc/sphinx/Pacemaker_Administration/tools.rst
index de9ee85607..7e1eb4eb69 100644
--- a/doc/sphinx/Pacemaker_Administration/tools.rst
+++ b/doc/sphinx/Pacemaker_Administration/tools.rst
@@ -1,561 +1,562 @@
.. index:: command-line tool
Using Pacemaker Command-Line Tools
----------------------------------
.. index::
single: command-line tool; output format
.. _cmdline_output:
Controlling Command Line Output
###############################
Some of the Pacemaker command-line utilities have been converted to a new
output system. Among these tools are ``crm_mon`` and ``stonith_admin``. This
is an ongoing project, and more tools will be converted over time. This system
lets you control the formatting of output with ``--output-as=`` and the
destination of output with ``--output-to=``.
The available formats vary by tool, but at least plain text and XML are
supported by all tools that use the new system. The default format is plain
text. The default destination is stdout but can be redirected to any file.
Some formats support command line options for changing the style of the output.
For instance:
.. code-block:: none
# crm_mon --help-output
Usage:
crm_mon [OPTION?]
Provides a summary of cluster's current state.
Outputs varying levels of detail in a number of different formats.
Output Options:
--output-as=FORMAT Specify output format as one of: console (default), html, text, xml
--output-to=DEST Specify file name for output (or "-" for stdout)
--html-cgi Add text needed to use output in a CGI program
--html-stylesheet=URI Link to an external CSS stylesheet
--html-title=TITLE Page title
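For instance, to capture a one-shot snapshot of cluster status as XML in a file
(the destination path here is only an example):

.. code-block:: none

   # crm_mon -1 --output-as=xml --output-to=/tmp/cluster-status.xml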
.. index::
single: crm_mon
single: command-line tool; crm_mon
.. _crm_mon:
Monitor a Cluster with crm_mon
##############################
The ``crm_mon`` utility displays the current state of an active cluster. It can
show the cluster status organized by node or by resource, and can be used in
either single-shot or dynamically updating mode. It can also display operations
performed and information about failures.
Using this tool, you can examine the state of the cluster for irregularities,
and see how it responds when you cause or simulate failures.
See the manual page or the output of ``crm_mon --help`` for a full description
of its many options.
.. topic:: Sample output from crm_mon -1
.. code-block:: none
Cluster Summary:
* Stack: corosync
* Current DC: node2 (version 2.0.0-1) - partition with quorum
* Last updated: Mon Jan 29 12:18:42 2018
* Last change: Mon Jan 29 12:18:40 2018 by root via crm_attribute on node3
* 5 nodes configured
* 2 resources configured
Node List:
* Online: [ node1 node2 node3 node4 node5 ]
* Active resources:
* Fencing (stonith:fence_xvm): Started node1
* IP (ocf:heartbeat:IPaddr2): Started node2
.. topic:: Sample output from crm_mon -n -1
.. code-block:: none
Cluster Summary:
* Stack: corosync
* Current DC: node2 (version 2.0.0-1) - partition with quorum
* Last updated: Mon Jan 29 12:21:48 2018
* Last change: Mon Jan 29 12:18:40 2018 by root via crm_attribute on node3
* 5 nodes configured
* 2 resources configured
* Node List:
* Node node1: online
* Fencing (stonith:fence_xvm): Started
* Node node2: online
* IP (ocf:heartbeat:IPaddr2): Started
* Node node3: online
* Node node4: online
* Node node5: online
As mentioned in an earlier chapter, the DC is the node where decisions are
made. The cluster elects a node to be DC as needed. The choice of DC matters
to an administrator only because its logs will have the most information about
why decisions were made.
.. index::
pair: crm_mon; CSS
.. _crm_mon_css:
Styling crm_mon HTML output
___________________________
Various parts of ``crm_mon``'s HTML output have a CSS class associated with
them. Not everything does, but some of the most interesting portions do. In
the following example, the status of each node has an ``online`` class and the
details of each resource have an ``rsc-ok`` class.
.. code-block:: html

   <h2>Node List</h2>
   <ul>
   <li>
   <span>Node: cluster01 <span class="online"> online</span></span>
   </li>
   <li><ul><li><span class="rsc-ok">ping (ocf::pacemaker:ping): Started</span></li></ul></li>
   <li>
   <span>Node: cluster02 <span class="online"> online</span></span>
   </li>
   <li><ul><li><span class="rsc-ok">ping (ocf::pacemaker:ping): Started</span></li></ul></li>
   </ul>
By default, a stylesheet for styling these classes is included in the head of
the HTML output. The relevant portions of this stylesheet that would be used
in the above example are:
.. code-block:: css
If you want to override some or all of the styling, simply create your own
stylesheet, place it on a web server, and pass ``--html-stylesheet=<URL>``
to ``crm_mon``. The link is added after the default stylesheet, so your
changes take precedence. You don't need to duplicate the entire default.
Only include what you want to change.
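As a sketch (the stylesheet URL and output path below are placeholders), a
styled status page could be generated with:

.. code-block:: none

   # crm_mon -1 --output-as=html --output-to=/tmp/status.html \
       --html-stylesheet=https://example.com/pacemaker.css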
.. index::
single: cibadmin
single: command-line tool; cibadmin
.. _cibadmin:
Edit the CIB XML with cibadmin
##############################
The most flexible tool for modifying the configuration is Pacemaker's
``cibadmin`` command. With ``cibadmin``, you can query, add, remove, update
or replace any part of the configuration. All changes take effect immediately,
so there is no need to perform a reload-like operation.
The simplest way of using ``cibadmin`` is to use it to save the current
configuration to a temporary file, edit that file with your favorite
text or XML editor, and then upload the revised configuration.
.. topic:: Safely using an editor to modify the cluster configuration
.. code-block:: none
# cibadmin --query > tmp.xml
# vi tmp.xml
# cibadmin --replace --xml-file tmp.xml
Some of the better XML editors can make use of a RELAX NG schema to
help make sure any changes you make are valid. The schema describing
the configuration can be found in ``pacemaker.rng``, which may be
deployed in a location such as ``/usr/share/pacemaker`` depending on your
operating system distribution and how you installed the software.
If you want to modify just one section of the configuration, you can
query and replace just that section to avoid modifying any others.
.. topic:: Safely using an editor to modify only the resources section
.. code-block:: none
# cibadmin --query --scope resources > tmp.xml
# vi tmp.xml
# cibadmin --replace --scope resources --xml-file tmp.xml
To quickly delete a part of the configuration, identify the object you wish to
delete by XML tag and id. For example, you might search the CIB for all
STONITH-related configuration:
.. topic:: Searching for STONITH-related configuration items
.. code-block:: none
# cibadmin --query | grep stonith
If you wanted to delete the ``primitive`` tag with id ``child_DoFencing``,
you would run:
.. code-block:: none
# cibadmin --delete --xml-text '<primitive id="child_DoFencing"/>'
See the cibadmin man page for more options.
.. warning::
Never edit the live ``cib.xml`` file directly. Pacemaker will detect such
changes and refuse to use the configuration.
.. index::
single: crm_shadow
single: command-line tool; crm_shadow
.. _crm_shadow:
Batch Configuration Changes with crm_shadow
###########################################
Often, it is desirable to preview the effects of a series of configuration
changes before updating the live configuration all at once. For this purpose,
``crm_shadow`` creates a "shadow" copy of the configuration and arranges for
all the command-line tools to use it.
To begin, simply invoke ``crm_shadow --create`` with a name of your choice,
and follow the simple on-screen instructions. Shadow copies are identified with
a name to make it possible to have more than one.
.. warning::
Read this section and the on-screen instructions carefully; failure to do so
could result in destroying the cluster's active configuration!
.. topic:: Creating and displaying the active sandbox
.. code-block:: none
# crm_shadow --create test
Setting up shadow instance
Type Ctrl-D to exit the crm_shadow shell
shadow[test]:
shadow[test] # crm_shadow --which
test
From this point on, all cluster commands will automatically use the shadow copy
instead of talking to the cluster's active configuration. Once you have
finished experimenting, you can either make the changes active via the
``--commit`` option, or discard them using the ``--delete`` option. Again, be
sure to follow the on-screen instructions carefully!
For a full list of ``crm_shadow`` options and commands, invoke it with the
``--help`` option.
.. topic:: Use sandbox to make multiple changes all at once, discard them, and verify real configuration is untouched
.. code-block:: none
shadow[test] # crm_failcount -r rsc_c001n01 -G
scope=status name=fail-count-rsc_c001n01 value=0
shadow[test] # crm_standby --node c001n02 -v on
shadow[test] # crm_standby --node c001n02 -G
scope=nodes name=standby value=on
shadow[test] # cibadmin --erase --force
shadow[test] # cibadmin --query
shadow[test] # crm_shadow --delete test --force
Now type Ctrl-D to exit the crm_shadow shell
shadow[test] # exit
# crm_shadow --which
No active shadow configuration defined
# cibadmin -Q
See the next section, :ref:`crm_simulate`, for how to test your changes before
committing them to the live cluster.
.. index::
single: crm_simulate
single: command-line tool; crm_simulate
.. _crm_simulate:
Simulate Cluster Activity with crm_simulate
###########################################
The command-line tool ``crm_simulate`` shows the results of the same logic
the cluster itself uses to respond to a particular cluster configuration and
status.
As always, the man page is the primary documentation, and should be consulted
for further details. This section aims for a better conceptual explanation and
practical examples.
Replaying cluster decision-making logic
_______________________________________
At any given time, one node in a Pacemaker cluster will be elected DC, and that
node will run Pacemaker's scheduler to make decisions.
Each time decisions need to be made (a "transition"), the DC will have log
messages like "Calculated transition ... saving inputs in ..." with a file
name. You can grab the named file and replay the cluster logic to see why
particular decisions were made. The file contains the live cluster
configuration at that moment, so you can also look at it directly to see the
value of node attributes, etc., at that time.
The simplest usage is (replacing $FILENAME with the actual file name):
.. topic:: Simulate cluster response to a given CIB
.. code-block:: none
# crm_simulate --simulate --xml-file $FILENAME
That will show the cluster state when the process started, the actions that
need to be taken ("Transition Summary"), and the resulting cluster state if the
actions succeed. Most actions will have a brief description of why they were
required.
The transition inputs may be compressed. ``crm_simulate`` can handle these
compressed files directly, though if you want to edit the file, you'll need to
uncompress it first.
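For example, the saved inputs are typically bzip2-compressed files under a
directory such as ``/var/lib/pacemaker/pengine`` (the exact path and file name
vary by system and transition):

.. code-block:: none

   # cp /var/lib/pacemaker/pengine/pe-input-19.bz2 /tmp/
   # bunzip2 /tmp/pe-input-19.bz2
   # crm_simulate --simulate --xml-file /tmp/pe-input-19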
You can do the same simulation for the live cluster configuration at the
current moment. This is useful mainly when using ``crm_shadow`` to create a
sandbox version of the CIB; the ``--live-check`` option will use the shadow CIB
if one is in effect.
.. topic:: Simulate cluster response to current live CIB or shadow CIB
.. code-block:: none
# crm_simulate --simulate --live-check
Why decisions were made
_______________________
To get further insight into the "why", it gets user-unfriendly very quickly. If
you add the ``--show-scores`` option, you will also see all the scores that
went into the decision-making. The node with the highest cumulative score for a
resource will run it. You can look for ``-INFINITY`` scores in particular to
see where complete bans came into effect.
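For example, to replay a saved transition input with the scores included:

.. code-block:: none

   # crm_simulate --simulate --xml-file $FILENAME --show-scores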
You can also add ``-VVVV`` to get more detailed messages about what's happening
under the hood. You can add up to two more V's even, but that's usually useful
only if you're a masochist or tracing through the source code.
Visualizing the action sequence
_______________________________
Another handy feature is the ability to generate a visual graph of the actions
needed, using the ``--save-dotfile`` option. This relies on the separate
Graphviz [#]_ project.
.. topic:: Generate a visual graph of cluster actions from a saved CIB
.. code-block:: none
# crm_simulate --simulate --xml-file $FILENAME --save-dotfile $FILENAME.dot
# dot $FILENAME.dot -Tsvg > $FILENAME.svg
``$FILENAME.dot`` will contain a GraphViz representation of the cluster's
response to your changes, including all actions with their ordering
dependencies.
``$FILENAME.svg`` will be the same information in a standard graphical format
that you can view in your browser or other app of choice. You could, of course,
use other ``dot`` options to generate other formats.
How to interpret the graphical output:
* Bubbles indicate actions, and arrows indicate ordering dependencies
* Resource actions have text of the form
``<rsc>_<action>_<interval> <node>`` indicating that the
specified action will be executed for the specified resource on the
specified node, once if interval is 0 or at specified recurring interval
otherwise
* Actions with black text will be sent to the executor (that is, the
appropriate agent will be invoked)
* Actions with orange text are "pseudo" actions that the cluster uses
internally for ordering but require no real activity
* Actions with a solid green border are part of the transition (that is, the
cluster will attempt to execute them in the given order -- though a
transition can be interrupted by action failure or new events)
* Dashed arrows indicate dependencies that are not present in the transition
graph
* Actions with a dashed border will not be executed. If the dashed border is
blue, the cluster does not feel the action needs to be executed. If the
dashed border is red, the cluster would like to execute the action but
cannot. Any actions depending on an action with a dashed border will not be
able to execute.
* Loops should not happen, and should be reported as a bug if found.
.. topic:: Small Cluster Transition
.. image:: ../shared/images/Policy-Engine-small.png
:alt: An example transition graph as represented by Graphviz
:align: center
In the above example, it appears that a new node, ``pcmk-2``, has come online
and that the cluster is checking to make sure ``rsc1``, ``rsc2`` and ``rsc3``
are not already running there (indicated by the ``rscN_monitor_0`` entries).
Once it did that, and assuming the resources were not active there, it would
have liked to stop ``rsc1`` and ``rsc2`` on ``pcmk-1`` and move them to
``pcmk-2``. However, there appears to be some problem and the cluster cannot or
is not permitted to perform the stop actions, which implies it also cannot
perform the start actions. For some reason, the cluster does not want to start
``rsc3`` anywhere.
.. topic:: Complex Cluster Transition
.. image:: ../shared/images/Policy-Engine-big.png
:alt: Complex transition graph that you're not expected to be able to read
:align: center
What-if scenarios
_________________
You can make changes to the saved or shadow CIB and simulate it again, to see
how Pacemaker would react differently. You can edit the XML by hand, use
command-line tools such as ``cibadmin`` with either a shadow CIB or the
``CIB_file`` environment variable set to the filename, or use higher-level tool
support (see the man pages of the specific tool you're using for how to perform
actions on a saved CIB file rather than the live CIB).
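One possible edit-and-replay cycle using ``CIB_file`` with a saved input might
look like this (a sketch; the scope, editor, and temporary file name are up to
you):

.. code-block:: none

   # CIB_file=$FILENAME cibadmin --query --scope constraints > constraints.xml
   # vi constraints.xml
   # CIB_file=$FILENAME cibadmin --replace --scope constraints --xml-file constraints.xml
   # crm_simulate --simulate --xml-file $FILENAME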
You can also inject node failures and/or action failures into the simulation;
see the ``crm_simulate`` man page for more details.
This capability is useful when using a shadow CIB to edit the configuration.
Before committing the changes to the live cluster with ``crm_shadow --commit``,
you can use ``crm_simulate`` to see how the cluster will react to the changes.
.. _crm_attribute:
.. index::
single: attrd_updater
single: command-line tool; attrd_updater
single: crm_attribute
single: command-line tool; crm_attribute
Manage Node Attributes, Cluster Options and Defaults with crm_attribute and attrd_updater
#########################################################################################
``crm_attribute`` and ``attrd_updater`` are confusingly similar tools with subtle
differences.
``attrd_updater`` can query and update node attributes. ``crm_attribute`` can query
and update not only node attributes, but also cluster options, resource
defaults, and operation defaults.
To understand the differences, it helps to understand the various types of node
attribute.
.. table:: **Types of Node Attributes**
+ :widths: 20 16 16 16 16 16
+-----------+----------+-------------------+------------------+----------------+----------------+
| Type | Recorded | Recorded in | Survive full | Manageable by | Manageable by |
| | in CIB? | attribute manager | cluster restart? | crm_attribute? | attrd_updater? |
| | | memory? | | | |
+===========+==========+===================+==================+================+================+
| permanent | yes | no | yes | yes | no |
+-----------+----------+-------------------+------------------+----------------+----------------+
| transient | yes | yes | no | yes | yes |
+-----------+----------+-------------------+------------------+----------------+----------------+
| private | no | yes | no | no | yes |
+-----------+----------+-------------------+------------------+----------------+----------------+
As you can see from the table above, ``crm_attribute`` can manage permanent and
transient node attributes, while ``attrd_updater`` can manage transient and
private node attributes.
The difference between the two tools lies mainly in *how* they update node
attributes: ``attrd_updater`` always contacts the Pacemaker attribute manager
directly, while ``crm_attribute`` will contact the attribute manager only for
transient node attributes, and will instead modify the CIB directly for
permanent node attributes (and for transient node attributes when unable to
contact the attribute manager).
By contacting the attribute manager directly, ``attrd_updater`` can change
an attribute's "dampening" (whether changes are immediately flushed to the CIB
or after a specified amount of time, to minimize disk writes for frequent
changes), set private node attributes (which are never written to the CIB), and
set attributes for nodes that don't yet exist.
By modifying the CIB directly, ``crm_attribute`` can set permanent node
attributes (which are only in the CIB and not managed by the attribute
manager), and can be used with saved CIB files and shadow CIBs.
No matter how a transient node attribute is set, it is synchronized between
the CIB and the attribute manager on all nodes.
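As a quick illustration of the difference (the node name and attribute names
below are examples), the first command sets a permanent node attribute by
modifying the CIB, while the second sets a transient attribute through the
attribute manager:

.. code-block:: none

   # crm_attribute --node node1 --name kernel --update "$(uname -r)"
   # attrd_updater --node node1 --name site --update tokyo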
.. index::
single: crm_failcount
single: command-line tool; crm_failcount
single: crm_node
single: command-line tool; crm_node
single: crm_report
single: command-line tool; crm_report
single: crm_standby
single: command-line tool; crm_standby
single: crm_verify
single: command-line tool; crm_verify
single: stonith_admin
single: command-line tool; stonith_admin
Other Commonly Used Tools
#########################
Other command-line tools include:
* ``crm_failcount``: query or delete resource fail counts
* ``crm_node``: manage cluster nodes
* ``crm_report``: generate a detailed cluster report for bug submissions
* ``crm_resource``: manage cluster resources
* ``crm_standby``: manage standby status of nodes
* ``crm_verify``: validate a CIB
* ``stonith_admin``: manage fencing devices
See the manual pages for details.
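A few quick illustrations (the resource and node names are examples):

.. code-block:: none

   # crm_verify --live-check
   # crm_failcount -r myApp -G
   # crm_standby --node node3 -G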
.. rubric:: Footnotes
.. [#] Graph visualization software. See http://www.graphviz.org/ for details.
diff --git a/doc/sphinx/Pacemaker_Administration/upgrading.rst b/doc/sphinx/Pacemaker_Administration/upgrading.rst
index b23c65ea89..00112c4b57 100644
--- a/doc/sphinx/Pacemaker_Administration/upgrading.rst
+++ b/doc/sphinx/Pacemaker_Administration/upgrading.rst
@@ -1,565 +1,566 @@
.. index:: upgrade
Upgrading a Pacemaker Cluster
-----------------------------
.. index:: version
Pacemaker Versioning
####################
Pacemaker has an overall release version, plus separate version numbers for
certain internal components.
.. index::
single: version; release
* **Pacemaker release version:** This version consists of three numbers
(*x.y.z*).
The major version number (the *x* in *x.y.z*) increases when at least some
rolling upgrades are not possible from the previous major version. For example,
a rolling upgrade from 1.0.8 to 1.1.15 should always be supported, but a
rolling upgrade from 1.0.8 to 2.0.0 may not be possible.
The minor version (the *y* in *x.y.z*) increases when there are significant
changes in cluster default behavior, tool behavior, and/or the API interface
(for software that utilizes Pacemaker libraries). The main benefit is to alert
you to pay closer attention to the release notes, to see if you might be
affected.
The release counter (the *z* in *x.y.z*) is increased with all public releases
of Pacemaker, which typically include both bug fixes and new features.
.. index::
single: feature set
single: version; feature set
* **CRM feature set:** This version number applies to the communication between
full cluster nodes, and is used to avoid problems in mixed-version clusters.
The major version number increases when nodes with different versions would not
work (rolling upgrades are not allowed). The minor version number increases
when mixed-version clusters are allowed only during rolling upgrades. The
minor-minor version number is ignored, but allows resource agents to detect
cluster support for various features. [#]_
Pacemaker ensures that the longest-running node is the cluster's DC. This
ensures new features are not enabled until all nodes are upgraded to support
them.
.. index::
single: version; Pacemaker Remote protocol
* **Pacemaker Remote protocol version:** This version applies to communication
between a Pacemaker Remote node and the cluster. It increases when an older
cluster node would have problems hosting the connection to a newer
Pacemaker Remote node. To avoid these problems, Pacemaker Remote nodes will
accept connections only from cluster nodes with the same or newer
Pacemaker Remote protocol version.
Unlike with CRM feature set differences between full cluster nodes,
mixed Pacemaker Remote protocol versions between Pacemaker Remote nodes and
full cluster nodes are fine, as long as the Pacemaker Remote nodes have the
older version. This can be useful, for example, to host a legacy application
in an older operating system version used as a Pacemaker Remote node.
.. index::
single: version; XML schema
* **XML schema version:** Pacemaker's configuration syntax -- what's allowed in
the Cluster Information Base (CIB) -- has its own version. This allows
the configuration syntax to evolve over time while still allowing clusters
with older configurations to work without change.
.. index::
single: upgrade; methods
Upgrading Cluster Software
##########################
There are three approaches to upgrading a cluster, each with advantages and
disadvantages.
.. table:: **Upgrade Methods**
+ :widths: 16 14 14 14 14 14 14
+---------------------------------------------------+----------+----------+--------+---------+----------+----------+
| Method | Available| Can be | Service| Service | Exercises| Allows |
| | between | used with| outage | recovery| failover | change of|
| | all | Pacemaker| during | during | logic | messaging|
| | versions | Remote | upgrade| upgrade | | layer |
| | | nodes | | | | [#]_ |
+===================================================+==========+==========+========+=========+==========+==========+
| Complete cluster shutdown | yes | yes | always | N/A | no | yes |
+---------------------------------------------------+----------+----------+--------+---------+----------+----------+
| Rolling (node by node) | no | yes | always | yes | yes | no |
| | | | [#]_ | | | |
+---------------------------------------------------+----------+----------+--------+---------+----------+----------+
| Detach and reattach | yes | no | only | no | no | yes |
| | | | due to | | | |
| | | | failure| | | |
+---------------------------------------------------+----------+----------+--------+---------+----------+----------+
.. index::
single: upgrade; shutdown
Complete Cluster Shutdown
_________________________
In this scenario, one shuts down all cluster nodes and resources,
then upgrades all the nodes before restarting the cluster.
#. On each node:
a. Shut down the cluster software (pacemaker and the messaging layer).
#. Upgrade the Pacemaker software. This may also include upgrading the
messaging layer and/or the underlying operating system.
#. Check the configuration with the ``crm_verify`` tool.
#. On each node:
a. Start the cluster software.
Currently, only Corosync version 2 and greater is supported as the cluster
layer, but if another stack is supported in the future, the new stack does not
need to be the same one that was used before the upgrade.
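On a host using systemd with Corosync as the messaging layer (a sketch; the
service unit names may differ on your distribution), the shutdown step above
might be the first command below, with the remaining commands used to start the
cluster software again after the upgrade:

.. code-block:: none

   # systemctl stop pacemaker corosync
   # systemctl start corosync
   # systemctl start pacemaker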
One variation of this approach is to build a new cluster on new hosts.
This allows the new version to be tested beforehand, and minimizes downtime by
having the new nodes ready to be placed in production as soon as the old nodes
are shut down.
.. index::
single: upgrade; rolling upgrade
Rolling (node by node)
______________________
In this scenario, each node is removed from the cluster, upgraded, and then
brought back online, until all nodes are running the newest version.
Special considerations when planning a rolling upgrade:
* If you plan to upgrade other cluster software -- such as the messaging layer --
at the same time, consult that software's documentation for its compatibility
with a rolling upgrade.
* If the major version number is changing in the Pacemaker version you are
upgrading to, a rolling upgrade may not be possible. Read the new version's
release notes (as well as the information here) for what limitations may exist.
* If the CRM feature set is changing in the Pacemaker version you are upgrading
to, you should run a mixed-version cluster only during a small rolling
upgrade window. If one of the older nodes drops out of the cluster for any
reason, it will not be able to rejoin until it is upgraded.
* If the Pacemaker Remote protocol version is changing, all cluster nodes
should be upgraded before upgrading any Pacemaker Remote nodes.
See the
`Pacemaker release calendar
`_
on the ClusterLabs wiki to figure out whether the CRM feature set and/or
Pacemaker Remote protocol version changed between the Pacemaker release versions
in your rolling upgrade.
To perform a rolling upgrade, on each node in turn:
#. Put the node into standby mode, and wait for any active resources
to be moved cleanly to another node. (This step is optional, but
allows you to deal with any resource issues before the upgrade.)
#. Shut down Pacemaker or ``pacemaker-remoted``.
#. If a cluster node, shut down the messaging layer.
#. Upgrade the Pacemaker software. This may also include upgrading the
messaging layer and/or the underlying operating system.
#. If this is the first node to be upgraded, check the configuration
with the ``crm_verify`` tool.
#. If a cluster node, start the messaging layer.
This must be the same messaging layer (currently only Corosync version 2 and
greater is supported) that the rest of the cluster is using.
#. Start Pacemaker or ``pacemaker-remoted``.
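Putting the steps above together for a full cluster node on a systemd-based
system (the node name and service unit names are examples, and your package
upgrade mechanism will differ), the sequence might look like:

.. code-block:: none

   # crm_standby --node node1 -v on
   # systemctl stop pacemaker corosync
   (upgrade the Pacemaker packages, messaging layer, and/or OS as needed)
   # systemctl start corosync
   # systemctl start pacemaker
   # crm_standby --node node1 -v off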
.. note::
Even if a rolling upgrade from the current version of the cluster to the
newest version is not directly possible, it may be possible to perform a
rolling upgrade in multiple steps, by upgrading to an intermediate version
first.
The following table lists compatible versions for all other nodes in the cluster
when upgrading a cluster node.
.. list-table:: **Version Compatibility for Cluster Nodes**
:class: longtable
- :widths: 1 1
+ :widths: 50 50
:header-rows: 1
* - Version Being Installed
- Minimum Compatible Version
* - Pacemaker 3.y.z
- Pacemaker 2.0.0
* - Pacemaker 2.y.z
- Pacemaker 1.1.11 [#]_
* - Pacemaker 1.y.z
- Pacemaker 1.0.0
* - Pacemaker 0.6.z to 0.7.z
- Pacemaker 0.6.0
When upgrading a Pacemaker Remote node, all cluster nodes must be running at
least the minimum version listed in the table below.
.. list-table:: **Cluster Node Version Compatibility for Pacemaker Remote Nodes**
:class: longtable
- :widths: 1 1
+ :widths: 50 50
:header-rows: 1
* - Pacemaker Remote Version
- Minimum Cluster Node Version
* - Pacemaker 3.y.z
- Pacemaker 2.0.0
* - Pacemaker 1.1.9 to 2.1.z
- Pacemaker 1.1.9 [#]_
.. index::
single: upgrade; detach and reattach
Detach and Reattach
___________________
The reattach method is a variant of a complete cluster shutdown, where the
resources are left active and get re-detected when the cluster is restarted.
This method may not be used if the cluster contains any Pacemaker Remote nodes.
#. Tell the cluster to stop managing services. This is required to allow the
services to remain active after the cluster shuts down.
.. code-block:: none
# crm_attribute --name maintenance-mode --update true
#. On each node, shut down the cluster software (pacemaker and the messaging
layer), and upgrade the Pacemaker software. This may also include upgrading
the messaging layer. While the underlying operating system may be upgraded
at the same time, that will be more likely to cause outages in the detached
services (certainly, if a reboot is required).
#. Check the configuration with the ``crm_verify`` tool.
#. On each node, start the cluster software.
Currently, only Corosync version 2 and greater is supported as the cluster
layer, but if another stack is supported in the future, the new stack does not
need to be the same one that was used before the upgrade.
#. Verify that the cluster re-detected all resources correctly.
#. Allow the cluster to resume managing resources again:
.. code-block:: none
# crm_attribute --name maintenance-mode --delete
.. note::
While the goal of the detach-and-reattach method is to avoid disturbing
running services, resources may still move after the upgrade if any
resource's location is governed by a rule based on transient node
attributes. Transient node attributes are erased when the node leaves the
cluster. A common example is using the ``ocf:pacemaker:ping`` resource to
set a node attribute used to locate other resources.
.. index::
pair: upgrade; CIB
Upgrading the Configuration
###########################
The CIB schema version can change from one Pacemaker version to another.
After cluster software is upgraded, the cluster will continue to use the older
schema version that it was previously using. This can be useful, for example,
when administrators have written tools that modify the configuration, and are
based on the older syntax. [#]_
However, when using an older syntax, new features may be unavailable, and there
is a performance impact, since the cluster must do a non-persistent
configuration upgrade before each transition. So while using the old syntax is
possible, it is not advisable to continue using it indefinitely.
Even if you wish to continue using the old syntax, it is a good idea to
follow the upgrade procedure outlined below, except for the last step, to ensure
that the new software has no problems with your existing configuration (since it
will perform much the same task internally).
If you are brave, it is sufficient simply to run ``cibadmin --upgrade``.
A more cautious approach would proceed like this:
#. Create a shadow copy of the configuration. The later commands will
automatically operate on this copy, rather than the live configuration.
.. code-block:: none
# crm_shadow --create shadow
.. index::
single: configuration; verify
#. Verify the configuration is valid with the new software (which may be
stricter about syntax mistakes, or may have dropped support for deprecated
features):
.. code-block:: none
# crm_verify --live-check
#. Fix any errors or warnings.
#. Perform the upgrade:
.. code-block:: none
# cibadmin --upgrade
#. If this step fails, there are three main possibilities:
a. The configuration was not valid to start with (did you do steps 2 and
3?).
#. The transformation failed; `report a bug `_.
#. The transformation was successful but produced an invalid result.
If the result of the transformation is invalid, you may see a number of
errors from the validation library. If these are not helpful, try the manual
upgrade procedure described below.
#. Check the changes:
.. code-block:: none
# crm_shadow --diff
If at this point there is anything about the upgrade that you wish to
fine-tune (for example, to change some of the automatic IDs), now is the
time to do so:
.. code-block:: none
# crm_shadow --edit
This will open the configuration in your favorite editor (whichever is
specified by the standard ``$EDITOR`` environment variable).
#. Preview how the cluster will react:
.. code-block:: none
# crm_simulate --live-check --save-dotfile shadow.dot -S
# dot -Tsvg shadow.dot -o shadow.svg
You can then view shadow.svg with any compatible image viewer or web
browser. Verify that either no resource actions will occur or that you are
happy with any that are scheduled. If the output contains actions you do
not expect (possibly due to changes to the score calculations), you may need
to make further manual changes. See :ref:`crm_simulate` for further details
on how to interpret the output of ``crm_simulate`` and ``dot``.
#. Upload the changes:
.. code-block:: none
# crm_shadow --commit shadow --force
In the unlikely event this step fails, please report a bug.
.. note::
It is also possible to perform the configuration upgrade steps manually:
#. Locate the ``upgrade*.xsl`` conversion scripts provided with the source
code. These will often be installed in a location such as
``/usr/share/pacemaker``, or may be obtained from the
`source repository `_.
#. Run the conversion scripts that apply to your older version, for example:
.. code-block:: none
# xsltproc /path/to/upgrade06.xsl config06.xml > config10.xml
#. Locate the ``pacemaker.rng`` script (from the same location as the xsl
files).
#. Check the XML validity:
.. code-block:: none
# xmllint --relaxng /path/to/pacemaker.rng config10.xml
The advantage of this method is that it can be performed without the cluster
running, and any validation errors are often more informative.
What Changed in 2.1
###################
The Pacemaker 2.1 release is fully backward-compatible in both the CIB XML and
the C API. Highlights:
* Pacemaker now supports the **OCF Resource Agent API version 1.1**.
Most notably, the ``Master`` and ``Slave`` role names have been renamed to
``Promoted`` and ``Unpromoted``.
* Pacemaker now supports colocations where the dependent resource does not
affect the primary resource's placement (via a new ``influence`` colocation
constraint option and ``critical`` resource meta-attribute). This is intended
for cases where a less-important resource must be colocated with an essential
resource, but it is preferred to leave the less-important resource stopped if
it fails, rather than move both resources.
* If Pacemaker is built with libqb 2.0 or later, the detail log will use
**millisecond-resolution timestamps**.
* In addition to crm_mon and stonith_admin, the crmadmin, crm_resource,
crm_simulate, and crm_verify commands now support the ``--output-as`` and
``--output-to`` options, including **XML output** (which scripts and
higher-level tools are strongly recommended to use instead of trying to parse
the text output, which may change from release to release).
For a detailed list of changes, see the release notes and
`Pacemaker 2.1 Changes
`_
on the ClusterLabs wiki.
What Changed in 2.0
###################
The main goal of the 2.0 release was to remove support for deprecated syntax,
along with some small changes in default configuration behavior and tool
behavior. Highlights:
* Only Corosync version 2 and greater is now supported as the underlying
cluster layer. Support for Heartbeat and Corosync 1 (including CMAN) is
removed.
* The Pacemaker detail log file is now stored in
``/var/log/pacemaker/pacemaker.log`` by default.
* The record-pending cluster property now defaults to true, which
allows status tools such as crm_mon to show operations that are in
progress.
* Support for a number of deprecated build options, environment variables,
and configuration settings has been removed.
* The ``master`` tag has been deprecated in favor of using the ``clone`` tag
with the new ``promotable`` meta-attribute set to ``true``. "Master/slave"
clone resources are now referred to as "promotable" clone resources.
* The public API for Pacemaker libraries that software applications can use
has changed significantly.
For a detailed list of changes, see the release notes and
`Pacemaker 2.0 Changes
`_
on the ClusterLabs wiki.
What Changed in 1.0
###################
New
___
* Failure timeouts.
* New section for resource and operation defaults.
* Tool for making offline configuration changes.
* ``Rules``, ``instance_attributes``, ``meta_attributes`` and sets of
operations can be defined once and referenced in multiple places.
* The CIB now accepts XPath-based create/modify/delete operations. See
``cibadmin --help``.
* Multi-dimensional colocation and ordering constraints.
* The ability to connect to the CIB from non-cluster machines.
* Allow recurring actions to be triggered at known times.
Changed
_______
* Syntax
* All resource and cluster options now use dashes (-) instead of underscores
(_)
* ``master_slave`` was renamed to ``master``
* The ``attributes`` container tag was removed
* The operation field ``pre-req`` has been renamed ``requires``
* All operations must have an ``interval``, ``start``/``stop`` must have it
set to zero
* The ``stonith-enabled`` option now defaults to true.
* The cluster will refuse to start resources if ``stonith-enabled`` is true (or
unset) and no STONITH resources have been defined
* The attributes of colocation and ordering constraints were renamed for
clarity.
* ``resource-failure-stickiness`` has been replaced by ``migration-threshold``.
* The parameters for command-line tools have been made consistent
* Switched to 'RelaxNG' schema validation and 'libxml2' parser
* id fields are now XML IDs which have the following limitations:
* id's cannot contain colons (:)
* id's cannot begin with a number
* id's must be globally unique (not just unique for that tag)
* Some fields (such as those in constraints that refer to resources) are
IDREFs.
This means that they must reference existing resources or objects in
order for the configuration to be valid. Removing an object which is
referenced elsewhere will therefore fail.
* The CIB representation, from which a MD5 digest is calculated to verify
CIBs on the nodes, has changed.
This means that every CIB update will require a full refresh on any
upgraded nodes until the cluster is fully upgraded to 1.0. This will result
in significant performance degradation and it is therefore highly
inadvisable to run a mixed 1.0/0.6 cluster for any longer than absolutely
necessary.
* Ping node information no longer needs to be added to ``ha.cf``. Simply
include the lists of hosts in your ping resource(s).
Removed
_______
* Syntax
* It is no longer possible to set resource meta options as top-level
attributes. Use meta-attributes instead.
* Resource and operation defaults are no longer read from ``crm_config``.
.. rubric:: Footnotes
.. [#] Before CRM feature set 3.1.0 (Pacemaker 2.0.0), the minor-minor version
number was treated the same as the minor version.
.. [#] Currently, Corosync version 2 and greater is the only supported cluster
stack, but other stacks have been supported by past versions, and may be
supported by future versions.
.. [#] Any active resources will be moved off the node being upgraded, so there
will be at least a brief outage unless all resources can be migrated
"live".
.. [#] Rolling upgrades from Pacemaker 1.1.z to 2.y.z are possible only if the
cluster uses corosync version 2 or greater as its messaging layer, and
the Cluster Information Base (CIB) uses schema 1.0 or higher in its
``validate-with`` property.
.. [#] Pacemaker Remote versions 1.1.15 through 1.1.17 require cluster nodes to
be at least version 1.1.15. Version 1.1.15 introduced an accidental
remote protocol version bump, breaking rolling upgrade compatibility with
older versions. This was fixed in 1.1.18.
.. [#] As of Pacemaker 2.0.0, only schema versions pacemaker-1.0 and higher
are supported (excluding pacemaker-1.1, which was a special case).