The role of a resource agent is to abstract the service it provides and present a consistent view to the cluster, which allows the cluster to be agnostic about the resources it manages.
The cluster doesn't need to understand how the resource works because it relies on the resource agent to do the right thing when given a start, stop or monitor command.
</para>
<para>For this reason it is crucial that resource agents are well tested. </para>
<para>Typically, resource agents come in the form of shell scripts; however, they can be written using any technology (such as C, Python or Perl) that the author is comfortable with.</para>
</section>
<section id="s-resource-supported">
<title>Supported Resource Classes</title>
<para>
There are three basic classes of agents supported by Pacemaker.
In order of encouraged usage they are:
</para>
<section id="s-resource-ocf">
<title>Open Cluster Framework</title>
<para>The OCF Spec (as it relates to resource agents) can be found at: <ulink url="http://www.opencf.org/cgi-bin/viewcvs.cgi/specs/ra/resource-agent-api.txt?rev=HEAD"/>
<footnote><para>Note: The Pacemaker implementation has been somewhat extended from the OCF Specs, but none of those changes are incompatible with the original OCF specification.</para></footnote> and is basically an extension of the Linux Standard Base conventions for init scripts to</para>
<itemizedlist>
<listitem><para>make them self describing, and</para></listitem>
<listitem><para>extensible</para></listitem>
</itemizedlist>
<para>
OCF specs have strict definitions of what exit codes actions must return
<footnote>
<para>Included with the cluster is the ocf-tester script, which can be useful in this regard.</para>
</footnote>.
The cluster follows these specifications exactly, and exiting with the wrong exit code will cause the cluster to behave in ways you will likely find puzzling and annoying.
In particular, the cluster needs to distinguish a completely stopped resource from one which is in some erroneous and indeterminate state.
</para>
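<para>As a rough illustration of the distinction above, the following sketch maps an observed service state to an exit code. The numeric values are the ones the OCF resource agent API defines for success, generic error and "not running"; the <literal>monitor</literal> function and its two inputs are hypothetical and stand in for whatever inspection a real agent would perform.</para>

```python
# Exit codes defined by the OCF resource agent API.
OCF_SUCCESS = 0       # action succeeded; for monitor: the resource is running
OCF_ERR_GENERIC = 1   # something went wrong; the state is indeterminate
OCF_NOT_RUNNING = 7   # the resource is cleanly and completely stopped


def monitor(pid_file_exists, process_alive):
    """Map the observed state of a hypothetical service to an OCF exit code."""
    if not pid_file_exists:
        # No trace of the service at all: it is completely stopped.
        return OCF_NOT_RUNNING
    if process_alive:
        return OCF_SUCCESS
    # A stale PID file with no process behind it: report an error instead
    # of claiming the service is cleanly stopped.
    return OCF_ERR_GENERIC
```

<para>A real agent would finish by exiting with the returned value, so that the cluster can tell a stopped resource from a failed one.</para>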
<para>
Parameters are passed to the script as environment variables, with the special prefix <envar>OCF_RESKEY_</envar>.
So, if you need to be given a parameter which the user thinks of as <literal>ip</literal>, it will be passed to the script as <envar>OCF_RESKEY_ip</envar>.
The number and purpose of the parameters are completely arbitrary; however, your script should advertise any that it supports using the meta-data command.
</para>
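<para>A minimal sketch of how an agent written in Python might collect its parameters is shown here. The helper name <literal>resource_parameters</literal> and the simulated environment are assumptions made for illustration; only the <envar>OCF_RESKEY_</envar> prefix comes from the text above.</para>

```python
import os

OCF_PREFIX = "OCF_RESKEY_"


def resource_parameters(environ=None):
    """Return the resource parameters found in the environment, prefix removed."""
    if environ is None:
        environ = os.environ
    return {
        name[len(OCF_PREFIX):]: value
        for name, value in environ.items()
        if name.startswith(OCF_PREFIX)
    }


# Simulated environment, as the cluster might provide it for a parameter "ip".
example_env = {"OCF_RESKEY_ip": "1.2.3.4", "PATH": "/usr/bin"}
print(resource_parameters(example_env))  # prints {'ip': '1.2.3.4'}
```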
- <para>For more information, see <ulink url="http://wiki.linux-ha.org/OCFResourceAgent"/> and <xref linkend="ap-ocf"/>.</para>
+ <para>For more information, see <ulink url="http://www.linux-ha.org/wiki/OCF_Resource_Agents"/> and <xref linkend="ap-ocf"/>.</para>
</section>
<section id="s-resource-lsb">
<title>Linux Standard Base</title>
<para>
LSB resource agents are those found in <filename>/etc/init.d</filename>.
Generally, they are provided by the OS/distribution and, in order to be used with the cluster, they must conform to the LSB Spec.
</para>
<para>The LSB Spec (as it relates to init scripts) can be found at: <ulink url="http://refspecs.linux-foundation.org/LSB_3.0.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html"/>
</para>
<para>
Many distributions claim LSB compliance but ship with broken init scripts.
To see if your init script is LSB-compatible, see the FAQ entry <xref linkend="ap-lsb"/>.
The most common problems are:
</para>
<itemizedlist>
<listitem><para>Not implementing the status operation at all</para></listitem>
<listitem><para>Not observing the correct exit status codes for start/stop/status actions</para></listitem>
<listitem><para>Starting a started resource returns an error (this violates the LSB spec)</para></listitem>
<listitem><para>Stopping a stopped resource returns an error (this violates the LSB spec)</para></listitem>
</itemizedlist>
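<para>The problems listed above can be sketched as a sequence of checks. In this illustration, <literal>FakeInitScript</literal> is a hypothetical stand-in for a real init script and <literal>lsb_problems</literal> is an assumed helper name; the status codes 0 (running) and 3 (not running) are those given by the LSB specification. It is not a substitute for the procedure in the FAQ entry.</para>

```python
# Exit codes from the LSB specification for init scripts.
LSB_OK = 0              # start/stop succeeded, or status: service is running
LSB_STATUS_STOPPED = 3  # status: service is not running


class FakeInitScript:
    """Hypothetical stand-in for an init script, used only for illustration."""

    def __init__(self, idempotent=True):
        self.running = False
        self.idempotent = idempotent

    def run(self, action):
        """Handle 'start', 'stop' or 'status' and return an exit code."""
        if action == "status":
            return LSB_OK if self.running else LSB_STATUS_STOPPED
        wanted = (action == "start")
        if self.running == wanted and not self.idempotent:
            return 1  # the bug: repeating an action is reported as an error
        self.running = wanted
        return LSB_OK


def lsb_problems(script):
    """Run the actions in order and return a description of every mismatch."""
    checks = [
        ("start", LSB_OK, "start failed"),
        ("status", LSB_OK, "status wrong while running"),
        ("start", LSB_OK, "starting a started resource returned an error"),
        ("stop", LSB_OK, "stop failed"),
        ("status", LSB_STATUS_STOPPED, "status wrong while stopped"),
        ("stop", LSB_OK, "stopping a stopped resource returned an error"),
    ]
    return [message for action, expected, message in checks
            if script.run(action) != expected]
```

<para>Calling <literal>lsb_problems(FakeInitScript(idempotent=False))</literal> reports the two idempotency violations, while a compliant script yields an empty list.</para>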
</section>
<section id="s-resource-heartbeat">
<title>Legacy Heartbeat</title>
<para>
Version 1 of Heartbeat came with its own style of resource agents and it is highly likely that many people have written their own agents based on its conventions.
To enable administrators to continue to use these agents, they are supported by the new cluster manager.
</para>
<para>For more information, see: <ulink url="http://wiki.linux-ha.org/HeartbeatResourceAgent"/></para>
<para>The OCF class is the preferred one, as it is an industry standard, highly flexible (allowing parameters to be passed to agents in a non-positional manner) and self-describing.</para>
<para>
There is also an additional class, STONITH, which is used exclusively for fencing related resources.
This is discussed later in <xref linkend="ch-stonith"/>.
</para>
</section>
</section>
<section id="s-resource-properties">
<title>Properties</title>
<para>These values tell the cluster which script to use for the resource, where to find that script and what standards it conforms to.</para>
<table frame="all">
<title>Properties of a Primitive Resource</title>
<tgroup cols="2">
<thead>
<row>
<entry>Field</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry>id</entry>
<entry>Your name for the resource</entry>
</row>
<row>
<entry>class</entry>
<entry>The standard the script conforms to. Allowed values: heartbeat, lsb, ocf, stonith</entry>
</row>
<row>
<entry>type</entry>
<entry>The name of the resource agent you wish to use, e.g. IPaddr or Filesystem</entry>
</row>
<row>
<entry>provider</entry>
<entry>The OCF spec allows multiple vendors to supply the same resource agent. To use the OCF resource agents supplied with Heartbeat, you should specify heartbeat here.</entry>
</row>
</tbody>
</tgroup>
</table>
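<para>To make the four fields concrete, the following sketch assembles them into an XML element with Python's standard library. The element name <literal>primitive</literal> follows the table title and Pacemaker's configuration format; the function name <literal>make_primitive</literal> and the idea of building the entry by hand are assumptions made purely for illustration.</para>

```python
import xml.etree.ElementTree as ET

# The allowed values for the "class" field, as listed in the table.
ALLOWED_CLASSES = {"heartbeat", "lsb", "ocf", "stonith"}


def make_primitive(rsc_id, rsc_class, rsc_type, provider=None):
    """Build a primitive element from the properties described in the table."""
    if rsc_class not in ALLOWED_CLASSES:
        raise ValueError("unknown resource class: " + rsc_class)
    primitive = ET.Element(
        "primitive", {"id": rsc_id, "class": rsc_class, "type": rsc_type}
    )
    if provider is not None:
        # The provider is described above in the context of the OCF spec.
        primitive.set("provider", provider)
    return primitive


element = make_primitive("Public-IP", "ocf", "IPaddr", "heartbeat")
print(ET.tostring(element, encoding="unicode"))
```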
<para>Resource definitions can be queried with the crm_resource tool. For example</para>
Heartbeat resources take only ordered and unnamed parameters.
The supplied name therefore indicates the order in which they are passed to the script.
Only single digit values are allowed.
</para>
</note>
</section>
<section id="s-resource-options">
<title>Resource Options</title>
<para>Options are used by the cluster to decide how your resource should behave and can be easily set using the <parameter>--meta</parameter> option of the <command>crm_resource</command> command.</para>
<table frame="all">
<title>Options for a Primitive Resource</title>
<tgroup cols="3">
<thead>
<row>
<entry>Field</entry>
<entry>Default</entry>
<entry>Description</entry>
</row>
</thead><tbody><row>
<entry>priority</entry>
<entry>0</entry>
<entry>If not all resources can be active, the cluster will stop lower priority resources in order to keep higher priority ones active.</entry>
</row>
<row>
<entry>target-role</entry>
<entry>Started</entry>
<entry>
<para>What state should the cluster attempt to keep this resource in? Allowed values: </para>
<itemizedlist>
<listitem><para>Stopped - Force the resource to be stopped</para></listitem>
<listitem><para>Started - Allow the resource to be started (in the case of <link linkend="s-resource-multistate">multi-state</link> resources, they will not be promoted to master)</para></listitem>
<listitem><para>Master - Allow the resource to be started and, if appropriate, promoted</para></listitem>
</itemizedlist>
</entry>
</row>
<row>
<entry>is-managed</entry>
<entry>TRUE</entry>
<entry>
Is the cluster allowed to start and stop the resource?
Allowed values: true, false
</entry>
</row>
<row>
<entry>resource-stickiness</entry>
<entry>Inherited</entry>
<entry>
How much does the resource prefer to stay where it is?
Defaults to the value of resource-stickiness in the rsc_defaults section
</entry>
</row>
<row>
<entry>migration-threshold</entry>
<entry>0 (disabled)</entry>
<entry>How many failures may occur for this resource on a node before that node becomes ineligible to host it.</entry>
</row>
<row>
<entry>failure-timeout</entry>
<entry>0 (disabled)</entry>
<entry>How many seconds to wait before acting as if the failure had not occurred (and potentially allowing the resource back to the node on which it failed).</entry>
</row>
<row>
<entry>multiple-active</entry>
<entry>stop_start</entry>
<entry>
<para>What should the cluster do if it ever finds the resource active on more than one node? Allowed values: </para>
<itemizedlist>
<listitem><para>block - mark the resource as unmanaged</para></listitem>
<listitem><para>stop_only - stop all active instances and leave them that way</para></listitem>
<listitem><para>stop_start - stop all active instances and start the resource in one location only</para></listitem>
</itemizedlist>
</entry>
</row>
</tbody>
</tgroup>
</table>
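<para>The interaction between <literal>migration-threshold</literal> and <literal>failure-timeout</literal> can be approximated with a small model. This is a deliberately simplified sketch, not the cluster's actual algorithm: the function name is hypothetical, times are plain seconds, and it assumes that failures expire together once <literal>failure-timeout</literal> seconds have passed since the most recent one.</para>

```python
def node_is_eligible(failure_times, now, migration_threshold=0, failure_timeout=0):
    """Simplified model of whether a node may still host a resource.

    failure_times: times (in seconds) at which the resource failed on the node.
    A value of 0 for either option means the option is disabled.
    """
    if migration_threshold == 0 or not failure_times:
        return True
    if failure_timeout > 0 and now - max(failure_times) >= failure_timeout:
        # Assumption: old failures are treated as if they had not occurred.
        return True
    # The node stays eligible while the failure count is below the threshold.
    return migration_threshold > len(failure_times)
```

<para>With a threshold of 3, three recorded failures make the node ineligible until the timeout, if one is set, has elapsed.</para>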
<para>If you performed the following commands on the previous LSB Email resource</para>
<title>Setting Global Defaults for Resource Options</title>
<para>To set a default value for a resource option, simply add it to the <literal>rsc_defaults</literal> section with <command>crm_attribute</command>. Thus, </para>
<para>would prevent the cluster from starting or stopping any of the resources in the configuration (unless of course the individual resources were specifically enabled and had <literal>is-managed</literal> set to true).</para>
</section>
<section id="s-resource-attributes">
<title>Instance Attributes</title>
<para>The scripts of some resource classes (LSB not being one of them) can be given parameters which determine how they behave and which instance of a service they control.</para>
<para>If your resource agent supports parameters, you can add them with the <command>crm_resource</command> command. For instance</para>
<para><command>crm_resource --resource Public-IP --set-parameter ip --property-value 1.2.3.4</command></para>
<para>would create an entry in the resource like this</para>
<example>
<title>An example OCF resource with instance attributes</title>
<para>For an OCF resource, the result would be an environment variable called <envar>OCF_RESKEY_ip</envar> with a value of 1.2.3.4.</para>
<para>
The list of instance attributes supported by an OCF script can be found by calling the resource script with the <parameter>meta-data</parameter> command.
The output contains an XML description of all the supported attributes, their purpose and default values.
</para>
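<para>The following sketch shows, under stated assumptions, how an agent written in Python might describe its parameters and how a caller could list them. The element names follow the OCF metadata layout as commonly used (<literal>resource-agent</literal>, <literal>parameters</literal>, <literal>parameter</literal>, <literal>shortdesc</literal>, <literal>content</literal>); the description is heavily abbreviated, and the agent name, helper functions and sample parameter are hypothetical.</para>

```python
import xml.etree.ElementTree as ET


def build_metadata(agent_name, parameters):
    """Build an abbreviated OCF-style metadata tree.

    parameters maps a parameter name to (description, type, default).
    """
    root = ET.Element("resource-agent", {"name": agent_name})
    params = ET.SubElement(root, "parameters")
    for name, (description, ptype, default) in parameters.items():
        param = ET.SubElement(params, "parameter", {"name": name})
        ET.SubElement(param, "shortdesc", {"lang": "en"}).text = description
        ET.SubElement(param, "content", {"type": ptype, "default": default})
    return root


def supported_attributes(metadata):
    """Return {name: default} for every parameter the metadata advertises."""
    return {
        param.get("name"): param.find("content").get("default")
        for param in metadata.iter("parameter")
    }


meta = build_metadata("Example", {"ip": ("Address to manage", "string", "")})
print(supported_attributes(meta))  # prints {'ip': ''}
```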
<example>
<title>Displaying the metadata for the Dummy resource agent template</title>
<entry>Your name for the action. Must be unique.</entry>
</row>
<row>
<entry>name</entry>
<entry>The action to perform. Common values: monitor, start, stop</entry>
</row>
<row>
<entry>interval</entry>
<entry>How frequently (in seconds) to perform the operation. Default value: 0</entry>
</row>
<row>
<entry>timeout</entry>
<entry>How long to wait before declaring the action has failed.</entry>
</row>
<row>
<entry>requires</entry>
<entry>
<para>What conditions need to be satisfied before this action occurs. Allowed values: </para>
<itemizedlist>
<listitem><para>nothing - The cluster may start this resource at any time </para></listitem>
<listitem><para>quorum - The cluster can only start this resource if a majority of the configured nodes are active </para></listitem>
<listitem><para>fencing - The cluster can only start this resource if a majority of the configured nodes are active <emphasis>and</emphasis> any failed or unknown nodes have been powered off.</para></listitem>
</itemizedlist>
<para>STONITH resources default to nothing, and all others default to fencing if STONITH is enabled and quorum otherwise.</para>
</entry>
</row>
<row>
<entry>on-fail</entry>
<entry>
<para>The action to take if this action ever fails. Allowed values: </para>
<itemizedlist>
<listitem><para>ignore - Pretend the resource did not fail</para></listitem>
<listitem><para>block - Don't perform any further operations on the resource</para></listitem>
<listitem><para>stop - Stop the resource and do not start it elsewhere</para></listitem>
<listitem><para>restart - Stop the resource and start it again (possibly on a different node) </para></listitem>
<listitem><para>fence - STONITH the node on which the resource failed</para></listitem>
<listitem><para>standby - Move <emphasis>all</emphasis> resources away from the node on which the resource failed</para></listitem>
</itemizedlist>
<para>The default for the stop operation is fence when STONITH is enabled and block otherwise. All other operations default to stop.</para>
</entry>
</row>
<row>
<entry>enabled</entry>
<entry>If false, the operation is treated as if it does not exist. Allowed values: <emphasis>true</emphasis>, false</entry>
</row>
</tbody>
</tgroup>
</table>
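<para>The default rules stated in the table for <literal>requires</literal> and <literal>on-fail</literal> can be written as two small functions. This is only a restatement of the table in code; the function names are hypothetical and the logic is not taken from the cluster's source.</para>

```python
def default_requires(resource_class, stonith_enabled):
    """Default for 'requires', as described in the table."""
    if resource_class == "stonith":
        return "nothing"
    return "fencing" if stonith_enabled else "quorum"


def default_on_fail(operation_name, stonith_enabled):
    """Default for 'on-fail', as described in the table."""
    if operation_name == "stop":
        return "fence" if stonith_enabled else "block"
    return "stop"
```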
</section>
</section>
<section id="s-operation-defaults">
<title>Setting Global Defaults for Operations</title>
<para>To set a default value for an operation option, simply add it to the <literal>op_defaults</literal> section with <command>crm_attribute</command>. Thus, </para>
would default each operation's timeout to 20 seconds.
If an operation's definition also includes a value for <literal>timeout</literal>, then that value would be used instead (for that operation only).
</para>
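<para>The precedence described above amounts to a simple lookup: a value on the operation itself wins, and the default applies otherwise. In this sketch the dictionaries and the function name <literal>effective_option</literal> are illustrative assumptions, and the value format is arbitrary.</para>

```python
def effective_option(op_options, op_defaults, name):
    """Return the operation's own value if present, else the global default."""
    if name in op_options:
        return op_options[name]
    return op_defaults.get(name)


op_defaults = {"timeout": "20s"}
print(effective_option({}, op_defaults, "timeout"))                  # prints 20s
print(effective_option({"timeout": "60s"}, op_defaults, "timeout"))  # prints 60s
```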
<section id="s-operation-timeouts">
<title>When Resources Take a Long Time to Start/Stop</title>
<para>
There are a number of implicit operations that the cluster will always perform - start, stop and a non-recurring monitor operation (used at startup to check the resource isn't already active).
If one of these is taking too long, you can create an entry for it and simply specify a new timeout value.
</para>
<example>
<title>An OCF resource with custom timeouts for its implicit actions</title>
Provided no two operations (for a single resource) have the same name and interval, you can have as many monitor operations as you like.
In this way you can do a superficial health check every minute and progressively more intense ones at higher intervals.
</para>
<para>
To tell the resource agent what kind of check to perform, you need to provide each monitor with a different value for a common parameter.
The OCF standard creates a special parameter called <envar>OCF_CHECK_LEVEL</envar> for this purpose and dictates that it is <emphasis>made available to the resource agent without the normal <envar>OCF_RESKEY_</envar> prefix</emphasis>.
</para>
<para>
Whatever name you choose, you can specify it by adding an instance_attributes block to the op tag.
Note that it is up to each resource agent to look for the parameter and decide how to use it.
</para>
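<para>Both points can be sketched in a few lines: checking that no two operations share a name and interval, and having the agent pick its checks from <envar>OCF_CHECK_LEVEL</envar>. The helper names, the check descriptions and the choice of 10 as the level that triggers the deeper check are assumptions made for illustration; as noted above, each agent decides for itself how to interpret the value.</para>

```python
import os


def duplicate_operations(operations):
    """Return the (name, interval) pairs that occur more than once."""
    seen, duplicates = set(), set()
    for op in operations:
        key = (op["name"], op["interval"])
        if key in seen:
            duplicates.add(key)
        seen.add(key)
    return duplicates


def check_level(environ=None):
    """Read OCF_CHECK_LEVEL (no OCF_RESKEY_ prefix); default to 0 if unset."""
    if environ is None:
        environ = os.environ
    return int(environ.get("OCF_CHECK_LEVEL", "0"))


def monitor_checks(environ=None):
    """Return the list of checks a hypothetical agent would run at this level."""
    checks = ["process is running"]  # superficial check, always performed
    if check_level(environ) >= 10:
        checks.append("service answers a request")  # hypothetical deeper check
    return checks
```

<para>Two monitors with intervals of 60 and 300 seconds pass the uniqueness check, and the second could carry a higher check level to request the deeper test.</para>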
<example>
<title>An OCF resource with two recurring health checks performing different levels of checks</title>