diff --git a/doc/Pacemaker_Explained/en-US/Ch-Resources.xml b/doc/Pacemaker_Explained/en-US/Ch-Resources.xml index b867321922..3ecff9a33d 100644 --- a/doc/Pacemaker_Explained/en-US/Ch-Resources.xml +++ b/doc/Pacemaker_Explained/en-US/Ch-Resources.xml @@ -1,503 +1,503 @@ Cluster Resources
What is a Cluster Resource The role of a resource agent is to abstract the service it provides and present a consistent view to the cluster, which allows the cluster to be agnostic about the resources it manages. The cluster doesn't need to understand how the resource works because it relies on the resource agent to do the right thing when given a start, stop or monitor command. For this reason, it is crucial that resource agents are well tested. Typically, resource agents come in the form of shell scripts; however, they can be written using any technology (such as C, Python or Perl) that the author is comfortable with.
Supported Resource Classes There are three basic classes of agents supported by Pacemaker. In order of encouraged usage they are:
Open Cluster Framework The OCF Spec (as it relates to resource agents) can be found at: (Note: The Pacemaker implementation has been somewhat extended from the OCF Specs, but none of those changes are incompatible with the original OCF specification.) The OCF Spec is basically an extension of the Linux Standard Base conventions for init scripts to support parameters and to make them self-describing and extensible. OCF specs have strict definitions of what exit codes actions must return. (Note: Included with the cluster is the ocf-tester script, which can be useful in this regard.) The cluster follows these specifications exactly, and exiting with the wrong exit code will cause the cluster to behave in ways you will likely find puzzling and annoying. In particular, the cluster needs to distinguish a completely stopped resource from one which is in some erroneous and indeterminate state. Parameters are passed to the script as environment variables, with the special prefix OCF_RESKEY_. So, if you need to be given a parameter which the user thinks of as ip, it will be passed to the script as OCF_RESKEY_ip. The number and purpose of the parameters is completely arbitrary; however, your script should advertise any that it supports using the meta-data command. - For more information, see and . + For more information, see and .
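The parameter-passing and exit-code conventions described above can be sketched in Python. This is only an illustration, not part of any shipped agent: the helper names `collect_params` and `monitor`, and the idea of tracking state with a file, are hypothetical. The numeric values are those conventionally named OCF_SUCCESS, OCF_ERR_GENERIC and OCF_NOT_RUNNING.

```python
import os

# Exit codes conventionally defined for OCF resource agents.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7

PREFIX = "OCF_RESKEY_"


def collect_params(environ):
    """Return the agent parameters with the OCF_RESKEY_ prefix stripped."""
    return {
        name[len(PREFIX):]: value
        for name, value in environ.items()
        if name.startswith(PREFIX)
    }


def monitor(state_file):
    """Hypothetical monitor action for an agent that tracks state in a file.

    A cleanly stopped resource is reported differently from one in an
    erroneous state, which is the distinction the cluster relies on.
    """
    if os.path.isdir(state_file):
        # Something unexpected is in the way: report an error, not "stopped".
        return OCF_ERR_GENERIC
    if os.path.exists(state_file):
        return OCF_SUCCESS
    return OCF_NOT_RUNNING
```

A real agent would call `collect_params(os.environ)` and pass the result of its action to `sys.exit()`.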
Linux Standard Base LSB resource agents are those found in /etc/init.d. Generally, they are provided by the OS/distribution and, in order to be used with the cluster, must conform to the LSB Spec. The LSB Spec (as it relates to init scripts) can be found at: Many distributions claim LSB compliance but ship with broken init scripts. To see if your init script is LSB-compatible, see the FAQ entry . The most common problems are: not implementing the status operation at all; not observing the correct exit status codes for start/stop/status actions; starting a started resource returns an error (this violates the LSB spec); stopping a stopped resource returns an error (this violates the LSB spec).
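The common problems listed above can be checked mechanically. The sketch below is a hedged illustration rather than an official compliance test: `check_lsb_behaviour` and `FakeService` are hypothetical names, and the service is simulated in memory instead of invoking a real script in /etc/init.d. It assumes the LSB convention that status exits 0 for a running service and 3 for a stopped one.

```python
def check_lsb_behaviour(run_action):
    """Run a start/stop sequence and return a list of problems found.

    run_action(name) must perform the action and return its exit status.
    """
    problems = []
    if run_action("start") != 0:
        problems.append("start failed")
    if run_action("status") != 0:
        problems.append("status of a started service should exit 0")
    if run_action("start") != 0:
        problems.append("starting a started service returned an error")
    if run_action("stop") != 0:
        problems.append("stop failed")
    if run_action("status") != 3:
        problems.append("status of a stopped service should exit 3")
    if run_action("stop") != 0:
        problems.append("stopping a stopped service returned an error")
    return problems


class FakeService:
    """In-memory stand-in for an init script (illustration only)."""

    def __init__(self, errors_on_repeat):
        self.running = False
        # True mimics a broken script that fails when asked to
        # start a started service or stop a stopped one.
        self.errors_on_repeat = errors_on_repeat

    def __call__(self, action):
        if action == "status":
            return 0 if self.running else 3
        wanted = action == "start"
        if self.errors_on_repeat and self.running == wanted:
            return 1
        self.running = wanted
        return 0
```

To test a real script, `run_action` could wrap `subprocess.run` and return its `returncode`.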
Legacy Heartbeat Version 1 of Heartbeat came with its own style of resource agents, and it is highly likely that many people have written their own agents based on its conventions. To enable administrators to continue to use these agents, they are supported by the new cluster manager. For more information, see: The OCF class is the preferred one as it is an industry standard, highly flexible (allowing parameters to be passed to agents in a non-positional manner) and self-describing. There is also an additional class, STONITH, which is used exclusively for fencing-related resources. This is discussed later in .
Properties These values tell the cluster which script to use for the resource, where to find that script and what standards it conforms to.
Properties of a Primitive Resource
Field: Description
id: Your name for the resource.
class: The standard the script conforms to. Allowed values: heartbeat, lsb, ocf, stonith.
type: The name of the Resource Agent you wish to use, e.g. IPaddr or Filesystem.
provider: The OCF spec allows multiple vendors to supply the same Resource Agent. To use the OCF resource agents supplied with Heartbeat, you should specify heartbeat here.
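The properties above lend themselves to a simple sanity check. The following is a minimal sketch with a hypothetical `validate_primitive` helper, not something the cluster tools provide. It assumes that id, class and type are always required and that a provider is required only for the ocf class; those requirements are assumptions made for illustration.

```python
# Allowed values for the class field, as listed in the table above.
ALLOWED_CLASSES = {"heartbeat", "lsb", "ocf", "stonith"}


def validate_primitive(props):
    """Return a list of problems found in a dict of primitive properties."""
    errors = []
    for field in ("id", "class", "type"):
        if not props.get(field):
            errors.append(f"missing required field: {field}")
    resource_class = props.get("class")
    if resource_class and resource_class not in ALLOWED_CLASSES:
        errors.append(f"unknown class: {resource_class}")
    # Assumption: only OCF agents are looked up by provider.
    if resource_class == "ocf" and not props.get("provider"):
        errors.append("ocf resources need a provider")
    return errors
```

For example, `validate_primitive({"id": "Public-IP", "class": "ocf", "type": "IPaddr"})` reports the missing provider.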
Resource definitions can be queried with the crm_resource tool. For example, crm_resource --resource Email --query-xml might produce: An example LSB resource. One of the main drawbacks to LSB resources is that they do not allow any parameters. Or, for an OCF resource: An example OCF resource. Or, finally, for the equivalent legacy Heartbeat resource: An example Heartbeat resource. Heartbeat resources take only ordered and unnamed parameters. The supplied name therefore indicates the order in which they are passed to the script. Only single-digit values are allowed.
Resource Options Options are used by the cluster to decide how your resource should behave and can be easily set using the --meta option of the crm_resource command.
Options for a Primitive Resource
Field (default): Description
priority (0): If not all resources can be active, the cluster will stop lower-priority resources in order to keep higher-priority ones active.
target-role (Started): What state should the cluster attempt to keep this resource in? Allowed values: Stopped - force the resource to be stopped; Started - allow the resource to be started (multi-state resources will not be promoted to master); Master - allow the resource to be started and, if appropriate, promoted.
is-managed (TRUE): Is the cluster allowed to start and stop the resource? Allowed values: true, false.
resource-stickiness (Inherited): How much does the resource prefer to stay where it is? Defaults to the value of resource-stickiness in the rsc_defaults section.
migration-threshold (0, disabled): How many failures should occur for this resource on a node before making the node ineligible to host this resource?
failure-timeout (0, disabled): How many seconds to wait before acting as if the failure had not occurred (and potentially allowing the resource back to the node on which it failed).
multiple-active (stop_start): What should the cluster do if it ever finds the resource active on more than one node? Allowed values: block - mark the resource as unmanaged; stop_only - stop all active instances and leave them that way; stop_start - stop all active instances and start the resource in one location only.
If you performed the following commands on the previous LSB Email resource
crm_resource --meta --resource Email --set-parameter priority --property-value 100
crm_resource --meta --resource Email --set-parameter multiple-active --property-value block
the resulting resource definition would be: An LSB resource with cluster options
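When several options need to be set, the command lines can be generated rather than typed. The sketch below only assembles argument lists using the crm_resource flags shown in the text; the `meta_command` helper is hypothetical, and nothing is executed.

```python
import shlex


def meta_command(resource, option, value):
    """Build the crm_resource argument list that sets one meta option."""
    return [
        "crm_resource", "--meta",
        "--resource", resource,
        "--set-parameter", option,
        "--property-value", str(value),
    ]


# The two options set on the Email resource in the text.
for option, value in (("priority", 100), ("multiple-active", "block")):
    print(shlex.join(meta_command("Email", option, value)))
```

Keeping the command as a list (instead of a single string) means it could be handed to `subprocess.run` without shell-quoting concerns.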
Setting Global Defaults for Resource Options To set a default value for a resource option, simply add it to the rsc_defaults section with crm_attribute. Thus, crm_attribute --type rsc_defaults --attr-name is-managed --attr-value false would prevent the cluster from starting or stopping any of the resources in the configuration (unless of course the individual resources were specifically enabled and had is-managed set to true).
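The precedence described here (a value set on the resource itself wins over rsc_defaults, which in turn wins over the built-in default) can be modelled as a simple lookup chain. This is an illustrative sketch only: `effective_option` is a hypothetical helper, and the built-in values are copied from the options table earlier in this chapter.

```python
# Built-in defaults, taken from the "Options for a Primitive Resource" table.
BUILTIN_DEFAULTS = {
    "priority": "0",
    "target-role": "Started",
    "is-managed": "true",
    "migration-threshold": "0",
    "failure-timeout": "0",
    "multiple-active": "stop_start",
}


def effective_option(name, resource_meta, rsc_defaults):
    """Return the value that applies to one resource, or None if unset."""
    for source in (resource_meta, rsc_defaults, BUILTIN_DEFAULTS):
        if name in source:
            return source[name]
    return None
```

With `rsc_defaults = {"is-managed": "false"}`, a resource with no setting of its own resolves to "false", while one that sets is-managed to true keeps "true".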
Instance Attributes The scripts of some resource classes (LSB not being one of them) can be given parameters which determine how they behave and which instance of a service they control. If your resource agent supports parameters, you can add them with the crm_resource command. For instance, crm_resource --resource Public-IP --set-parameter ip --property-value 1.2.3.4 would create an entry in the resource like this: An example OCF resource with instance attributes. For an OCF resource, the result would be an environment variable called OCF_RESKEY_ip with a value of 1.2.3.4. The list of instance attributes supported by an OCF script can be found by calling the resource script with the meta-data command. The output contains an XML description of all the supported attributes, their purpose and default values. Displaying the metadata for the Dummy resource agent template: export OCF_ROOT=/usr/lib/ocf; $OCF_ROOT/resource.d/pacemaker/Dummy meta-data 1.0 This is a Dummy Resource Agent. It does absolutely nothing except keep track of whether its running or not. Its purpose in life is for testing and to serve as a template for RA writers. Dummy resource agent Location to store the resource state in. State file Dummy attribute that can be changed to cause a reload Dummy attribute that can be changed to cause a reload
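Because the meta-data output is XML, the supported attributes can be extracted programmatically. The sketch below parses a small hand-written sample; the element layout, the parameter names `state` and `fake`, and the default values are assumptions for illustration and are not the verbatim output of the Dummy agent. `list_parameters` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the general shape of agent metadata (assumed).
SAMPLE_METADATA = """\
<resource-agent name="Dummy">
  <version>1.0</version>
  <shortdesc lang="en">Dummy resource agent</shortdesc>
  <parameters>
    <parameter name="state" unique="1">
      <shortdesc lang="en">State file</shortdesc>
      <content type="string" default="/tmp/Dummy.state"/>
    </parameter>
    <parameter name="fake" unique="0">
      <shortdesc lang="en">Dummy attribute that can be changed to cause a reload</shortdesc>
      <content type="string" default="dummy"/>
    </parameter>
  </parameters>
</resource-agent>
"""


def list_parameters(metadata_xml):
    """Map each parameter name to its short description and default."""
    root = ET.fromstring(metadata_xml)
    result = {}
    for param in root.findall("./parameters/parameter"):
        content = param.find("content")
        result[param.get("name")] = {
            "shortdesc": (param.findtext("shortdesc") or "").strip(),
            "default": content.get("default") if content is not None else None,
        }
    return result


for name, info in list_parameters(SAMPLE_METADATA).items():
    print(name, "-", info["shortdesc"], "(default:", info["default"], ")")
```

In practice the XML string would come from running the agent with the meta-data command and capturing its standard output.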
Resource Operations
Monitoring Resources for Failure By default, the cluster will not ensure your resources are still healthy. To instruct the cluster to do this, you need to add a monitor operation to the resource's definition. An OCF resource with a recurring health check
Properties of an Operation
Field: Description
id: Your name for the action. Must be unique.
name: The action to perform. Common values: monitor, start, stop.
interval: How frequently (in seconds) to perform the operation. Default value: 0.
timeout: How long to wait before declaring the action has failed.
requires: What conditions need to be satisfied before this action occurs. Allowed values: nothing - the cluster may start this resource at any time; quorum - the cluster can only start this resource if a majority of the configured nodes are active; fencing - the cluster can only start this resource if a majority of the configured nodes are active and any failed or unknown nodes have been powered off. STONITH resources default to nothing, and all others default to fencing if STONITH is enabled and quorum otherwise.
on-fail: The action to take if this action ever fails. Allowed values: ignore - pretend the resource did not fail; block - don't perform any further operations on the resource; stop - stop the resource and do not start it elsewhere; restart - stop the resource and start it again (possibly on a different node); fence - STONITH the node on which the resource failed; standby - move all resources away from the node on which the resource failed. The default for the stop operation is fence when STONITH is enabled and block otherwise. All other operations default to stop.
enabled: If false, the operation is treated as if it does not exist. Allowed values: true, false.
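The default rules for requires and on-fail stated in the table depend on the operation, the resource class and whether STONITH is enabled. As a small sketch (the function names are hypothetical, and the logic simply restates the table), they can be written as:

```python
def default_requires(resource_class, stonith_enabled):
    """Default for 'requires', as described in the table above."""
    if resource_class == "stonith":
        return "nothing"
    return "fencing" if stonith_enabled else "quorum"


def default_on_fail(op_name, stonith_enabled):
    """Default for 'on-fail', as described in the table above."""
    if op_name == "stop":
        return "fence" if stonith_enabled else "block"
    return "stop"
```

For instance, a failed stop on a cluster without STONITH resolves to block, since the cluster has no safe way to recover the resource elsewhere.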
Setting Global Defaults for Operations To set a default value for an operation option, simply add it to the op_defaults section with crm_attribute. Thus, crm_attribute --type op_defaults --attr-name timeout --attr-value 20s would default each operation's timeout to 20 seconds. If an operation's definition also includes a value for timeout, then that value would be used instead (for that operation only).
When Resources Take a Long Time to Start/Stop There are a number of implicit operations that the cluster will always perform: start, stop and a non-recurring monitor operation (used at startup to check the resource isn't already active). If one of these is taking too long, you can create an entry for it and specify a new timeout value. An OCF resource with custom timeouts for its implicit actions
Multiple Monitor Operations Provided no two operations (for a single resource) have the same name and interval, you can have as many monitor operations as you like. In this way you can do a superficial health check every minute and progressively more intense ones at longer intervals. To tell the resource agent what kind of check to perform, you need to provide each monitor with a different value for a common parameter. The OCF standard creates a special parameter called OCF_CHECK_LEVEL for this purpose and dictates that it is made available to the resource agent without the normal OCF_RESKEY_ prefix. Whatever name you choose, you can specify it by adding an instance_attributes block to the op tag. Note that it is up to each resource agent to look for the parameter and decide how to use it. An OCF resource with two recurring health checks performing different levels of checks
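Two ideas from this section can be sketched briefly: the uniqueness rule on name and interval, and an agent reading OCF_CHECK_LEVEL. Both helpers are hypothetical. The duplicate check compares intervals as plain strings, so it assumes they are written in a consistent form, and treating an unset OCF_CHECK_LEVEL as 0 is likewise an assumption.

```python
from collections import Counter


def duplicate_operations(ops):
    """Return the (name, interval) pairs used by more than one operation."""
    counts = Counter((op["name"], op["interval"]) for op in ops)
    return sorted(key for key, count in counts.items() if count > 1)


def check_level(environ):
    """Read OCF_CHECK_LEVEL, which arrives without the OCF_RESKEY_ prefix."""
    return int(environ.get("OCF_CHECK_LEVEL", "0"))


ops = [
    {"name": "monitor", "interval": "60s"},
    {"name": "monitor", "interval": "300s"},
]
print(duplicate_operations(ops))  # an empty list means the set is acceptable
```

Inside an agent's monitor action, `check_level(os.environ)` would then select how thorough a check to run.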
Disabling a Monitor Operation The easiest way to stop a recurring monitor is to just delete it. However, there can be times when you only want to disable it temporarily. In such cases, simply add enabled="false" to the operation's definition. Example of an OCF resource with a disabled health check. This can be achieved from the command-line by executing cibadmin -M -X '<op id="public-ip-check" enabled="false"/>' Once you've done whatever you needed to do, you can then re-enable it with cibadmin -M -X '<op id="public-ip-check" enabled="true"/>'