diff --git a/ra/Makefile b/ra/Makefile
new file mode 100644
index 0000000..217b9b6
--- /dev/null
+++ b/ra/Makefile
@@ -0,0 +1,21 @@
+# Basic DocBook mangling makefile.
+#
+#
+
+BASENAME=resource-agent-api
+
+DOCBOOK2HTML=docbook2html -u -d ../docbook-utils.dsl\#html
+
+DOCBOOK2PS=docbook2ps -u -d ../docbook-utils.dsl\#print
+
+%.html: %.xml
+ $(DOCBOOK2HTML) $^
+
+%.ps: %.xml
+ $(DOCBOOK2PS) $^
+
+%.pdf: %.ps
+ ps2pdf $^
+
+clean:
+ rm -f $(BASENAME).pdf $(BASENAME).ps $(BASENAME).html
\ No newline at end of file
diff --git a/ra/resource-agent-api.xml b/ra/resource-agent-api.xml
new file mode 100644
index 0000000..54e831d
--- /dev/null
+++ b/ra/resource-agent-api.xml
@@ -0,0 +1,888 @@
+<?xml version="1.0" ?>
+<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
+ "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
+
+ <article>
+ <articleinfo>
+ <title>Open Clustering Framework Resource Agent API</title>
+ <authorgroup>
+ <editor>
+ <firstname>Lars</firstname>
+ <surname>Marowsky-Brée</surname>
+ <affiliation>
+ <address><email>lmb@suse.de</email></address>
+ </affiliation>
+ </editor>
+ </authorgroup>
+ <releaseinfo>$Id$</releaseinfo>
+ <copyright>
+ <year>2002</year>
+ <year>2003</year>
+ <holder>Lars Marowsky-Brée</holder>
+ </copyright>
+ <legalnotice>
+ <para>
+ Permission is granted to copy, distribute and/or modify this
+ document under the terms of the GNU Free Documentation
+ License, Version 1.2 or any later version published by the
+ Free Software Foundation; with no Invariant Sections, no
+ Front-Cover Texts, and no Back-Cover Texts. A copy of the
+ license can be found at http://www.gnu.org/licenses/fdl.txt.
+ </para>
+ </legalnotice>
+ </articleinfo>
+
+ <section>
+ <title>Abstract</title>
+
+ <para>Resource Agents (RA) are the middle layer between the
+ Resource Manager (RM) and the actual resources being managed. They
+ aim to integrate the resource type with the RM without any
+ modifications to the actual resource provider itself, by
+ encapsulating it carefully and providing generic methods (actions)
+ to operate on it.
+ </para>
+
+ <para>The RAs are necessarily specific to the resource type
+ they operate on; however, there is no reason why they should be
+ specific to a particular RM.
+ </para>
+
+ <para>The API described in this document should be general enough
+ that a compliant Resource Agent can be used by all existing
+ resource managers and switch-over systems that choose to implement
+ this API, either exclusively or in addition to their existing one.
+ </para>
+
+ <section>
+ <title>Scope</title>
+
+ <para>This document describes a common API for the RM to call
+ the RAs so the pool of available RAs can be shared by the
+ different clustering solutions.
+ </para>
+
+ <para> It does NOT define any libraries or helper functions
+ which RAs might share with regard to common functionality like
+ external command execution, cluster logging et cetera, as
+ these are NOT specific to RAs and are defined in the respective
+ standards.
+ </para>
+
+ </section>
+
+ <section>
+ <title>API version described</title>
+
+ <para>
+ This document currently describes version 1.0 of the
+ API. </para>
+
+ </section>
+
+ </section>
+
+ <section>
+ <title>Terms used in this document</title>
+
+ <section>
+ <title>"Resource"</title>
+
+ <para>A single physical or logical entity that provides a
+ service to clients or other resources. For example, a resource
+ can be a single disk volume, a particular network address, or
+ an application such as a web server. A resource is generally
+ available for use over time on two or more nodes in a cluster,
+ although it usually can be allocated to only one node at any
+ given time.
+ </para>
+
+
+ <para>Resources are identified by a name that must be unique
+ to the particular resource type. The name is chosen by
+ the administrator to identify the resource instance and is passed
+ to the RA in a special environment variable.</para>
+
+ <para>A resource may also have instance parameters which
+ provide additional information required for the Resource Agent to
+ control the resource.</para>
+
+ </section>
+
+ <section>
+ <title>"Resource types"</title>
+
+ <para>A resource type represents a set of resources which
+ share a common set of instance parameters and a common set of
+ actions which can be performed on resources of the given
+ type.</para>
+
+ <para>The resource type name is chosen by the provider of the
+ RA.</para>
+ </section>
+
+ <section>
+ <title>"Resource agent"</title>
+
+ <para> An RA provides the actions ("member functions") for a
+ given type of resource; when the RA is supplied with the instance
+ parameters, it controls a specific resource.</para>
+
+ <para> RAs are usually implemented as shell scripts, but the
+ API described here does not require this.
+ </para>
+
+ <para>Although this is somewhat similar to LSB init scripts,
+ there are some differences explained below.</para>
+
+ </section>
+
+ <section>
+ <title>"Instance parameters"</title>
+
+ <para>Instance parameters are the attributes which describe a
+ given resource instance. It is recommended that the
+ implementor minimize the set of instance parameters.</para>
+
+ <para>The meta data allows the RA to flag one or more instance
+ parameters as 'unique'. This is a hint to the RM or higher
+ level configuration tools that the combination of these
+ parameters must be unique to the given resource type.</para>
+
+ <para>An instance parameter has a given name and value. They
+ are both case sensitive and must satisfy the requirements of
+ POSIX environment name/value combinations.</para>
+
+ </section>
+
+ </section>
+
+ <section>
+ <title>API</title>
+
+ <section>
+ <title>API Version Numbers</title>
+
+ <para>The version number is of the form "x.y", where x and y
+ are integers greater than or equal to zero. x is referred
+ to as the "major" number, and y as the "minor" number.
+ </para>
+
+ <para>The major number must be increased if a
+ <emphasis>backwards incompatible</emphasis> change is made to the API. A major number
+ mismatch between the RA and the RM must be reported as an
+ error by both sides.
+ </para>
+
+ <para>The minor number must be increased if <emphasis>any</emphasis> change at
+ all is made to the API. If the major number is increased, the minor
+ number should be reset to zero. The minor number can be used
+ by both sides to see whether a certain additional feature is
+ supported by the other party.
+ </para>
+
+ </section>
+
+ <section>
+ <title>Paths</title>
+
+ <para>The Resource Agents are located in subdirectories under
+ <filename>/usr/ocf/resource.d</filename>.</para>
+
+ <para> The subdirectories allow the installation of multiple
+ RAs for the same type, but from different vendors or package
+ versions.</para>
+
+ <para>The filename within the directories equals the resource
+ type name provided by the RA and may be a link to the real
+ location.</para>
+
+ <para>Example directory structure:
+ <screen>
+ FailSafe -> FailSafe-1.1.0/
+ FailSafe-1.0.4/
+ FailSafe-1.1.0/
+ heartbeat -> heartbeat-1.1.2/
+ heartbeat-1.1.2/
+ heartbeat-1.1.2/IPAddr
+ heartbeat-1.1.2/IP -> IPAddr
+ </screen>
+ </para>
+
+ <para> How the RM chooses an agent for a specific resource type
+ name from the available set is implementation specific.</para>
+
+ </section>
+
+ <section>
+ <title>Execution syntax</title>
+
+ <para>After the RM has identified the executable to call, the RA
+ will be called with the requested action as its sole argument.
+ </para>
+
+ <para>To allow for further extensions, the RA shall ignore all
+ other arguments.
+ </para>
+
+ </section>
+
+ <section>
+ <title>Resource Agent actions</title>
+
+ <para>An RA must be able to perform the following actions on a
+ given resource instance on request by the RM; additional
+ actions may be supported by the script, for example for LSB
+ compliance.
+ </para>
+
+ <para>The actions are all required to be idempotent. Invoking
+ any operation twice - in particular, the start and stop
+ actions - shall succeed and leave the resource instance in the
+ requested state.
+ </para>
+
+ <para>In general, an RA should not assume it is the only RA of
+ its type running at any given time because the RM might start
+ several RA instances for multiple independent resource
+ instances in parallel.
+ </para>
+
+ <para><emphasis>Mandatory</emphasis> actions must be
+ supported; <emphasis>optional</emphasis> operations must be
+ advertised in the meta data if supported. If the RM tries to
+ call an unsupported action, the RA shall return an error as
+ defined below.
+ </para>
+
+ <section>
+ <title><command>start</command></title>
+
+ <para><emphasis>Mandatory</emphasis>.</para>
+
+ <para>This brings the resource instance online and makes it
+ available for use. It should NOT terminate before the
+ resource instance has either been fully started or an error
+ has been encountered.</para>
+
+ <para>It may try to implement recovery actions for certain
+ cases of startup failures.</para>
+
+ <para><command>start</command> must succeed if the
+ resource instance is already running.</para>
+
+ <para><command>start</command> must return an error if
+ the resource instance is not fully started.</para>
+
+ </section>
+
+ <section>
+ <title><command>stop</command></title>
+
+ <para><emphasis>Mandatory</emphasis>.</para>
+
+ <para>This stops the resource instance. After the
+ <command>stop</command> command has completed, no component
+ of the resource shall remain active, and it must be possible
+ to start it on the same node or on another node; otherwise an
+ error must be returned.
+ </para>
+
+ <para> The <command>stop</command> request by the RM
+ includes the authorization to bring down the resource even
+ by force as long as data integrity is maintained; breaking
+ currently active transactions should be avoided, but the
+ request to offline the resource has higher priority than
+ this. If this is not possible, the RA shall return an error
+ to allow higher level recovery.
+ </para>
+
+ <para> The <command>stop</command> action should also
+ perform clean-ups of artifacts like leftover shared memory
+ segments, semaphores, IPC message queues, lock files
+ etc.</para>
+
+ <para> <command>stop</command> must succeed if the resource
+ is already stopped.
+ </para>
+
+ <para> <command>stop</command> must return an error if the
+ resource is not fully stopped.
+ </para>
+
+ </section>
+
+ <section>
+ <title><command>status</command></title>
+
+ <para><emphasis>Mandatory</emphasis>.</para>
+
+ <para>Checks and returns the current status of the resource
+ instance. The thoroughness of the check is
+ influenced by the weight of the check, which is
+ explained in 3.5.3.
+ </para>
+
+ <para>It is accepted practice to have additional instance
+ parameters which are not strictly required to identify the
+ resource instance but are needed to monitor it or customize
+ how intrusive this check is allowed to be.</para>
+
+ <para> Note that <command>status</command> shall also return
+ a well defined error code (see below) for stopped instances,
+ i.e. before <command>start</command> has ever been invoked.
+ </para>
+
+ </section>
+
+ <section>
+ <title><command>recover</command></title>
+
+ <para><emphasis>Optional</emphasis>.</para>
+
+ <para> A special case of the <command>start</command>
+ action, this should try to recover a resource locally.
+ </para>
+
+ <para>It is recommended that this action is not advertised
+ unless it is advantageous to use when compared to a
+ stop/start operation.
+ </para>
+
+ <para>If this is not supported, it may be mapped to a
+ stop/start action by the RM.</para>
+
+ <para>An example includes "recovering" an IP address by
+ moving it to another interface; this is much less costly
+ than initiating a full resource group fail over to another
+ node.
+ </para>
+
+ </section>
+
+ <section>
+ <title><command>reload</command></title>
+
+ <para><emphasis>Optional</emphasis>.</para>
+
+ <para>Notifies the resource instance of a configuration
+ change external to the instance parameters; it should reload
+ the configuration of the resource instance without
+ disrupting the service.
+ </para>
+
+ <para>It is recommended that this action is not advertised
+ unless it is advantageous to use when compared to a
+ stop/start operation.</para>
+
+ <para> If this is not supported, it may be mapped to a
+ stop/start action by the RM.
+ </para>
+
+ </section>
+
+ <section>
+ <title><command>metadata</command></title>
+
+ <para><emphasis>Mandatory</emphasis>.</para>
+
+ <para>Returns the resource agent meta data via stdout.</para>
+
+ </section>
+
+ <section>
+ <title><command>validate-all</command></title>
+
+ <para><emphasis>Optional</emphasis>.</para>
+
+ <para>Validate the instance parameters provided.</para>
+
+ <para> Perform a syntax check and if possible, a semantic
+ check on the instance parameters.</para>
+
+ </section>
+
+ </section>
+
+ <section>
+ <title> Parameter passing</title>
+
+ <para> The instance parameters and some additional attributes
+ are passed in via the environment; this has been chosen
+ because it does not reveal the parameters to an unprivileged
+ user on the same system and environment variables can be
+ easily accessed by all programming languages and shell
+ scripts.</para>
+
+ <para> The entire environment variable name space starting
+ with <varname>OCF_</varname> is considered to be reserved for
+ OCF use.</para>
+
+ <para>See section 4. for a more formal explanation of instance
+ parameters.</para>
+
+ <section>
+ <title>Syntax for instance parameters</title>
+
+ <para>They are directly converted to environment variables;
+ the name is prefixed with <varname>OCF_RESKEY_</varname>.
+ </para>
+
+ <para>The instance parameter <varname>force</varname> with
+ the value <literal>yes</literal> thus becomes:
+ <varname>OCF_RESKEY_force</varname>=<literal>yes</literal>
+ in the environment. </para>
+
+ </section>
+
+ <section>
+ <title> Global OCF attributes</title>
+
+ <para> Currently, the following additional environment
+ variables are defined:
+
+ <variablelist>
+ <varlistentry>
+ <term><varname>OCF_RA_VERSION_MAJOR</varname></term>
+ <term><varname>OCF_RA_VERSION_MINOR</varname></term>
+ <listitem>
+ <para> Version number of the OCF Resource Agent
+ API. If the script does not support this revision,
+ it should report an error.
+ </para>
+
+ <para> See 3.1. for an explanation of the versioning
+ scheme used. The version number is split into two
+ numbers for ease of use in shell scripts.
+ </para>
+
+ <para> These two may be used by the RA to determine
+ whether it is run under an OCF compliant RM.
+ </para>
+
+ <para>
+ Example:
+ <screen> OCF_RA_VERSION_MAJOR=1
+ OCF_RA_VERSION_MINOR=0</screen>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><varname>OCF_ROOT</varname></term>
+ <listitem>
+ <para>The root of the OCF directory
+ hierarchy.</para>
+
+ <para> Example:
+ <screen> OCF_ROOT=/usr/ocf</screen>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><varname>OCF_RESOURCE_INSTANCE</varname></term>
+ <listitem>
+ <para>The name of the resource instance.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term><varname>OCF_RESOURCE_TYPE</varname></term>
+ <listitem>
+ <para> The name of the resource type being operated
+ on.</para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </para>
+ </section>
+
+
+ <section>
+ <title> Action specific extensions</title>
+
+ <para>These environment variables are not required for all
+ actions, but only supported by some.
+ </para>
+
+ <section>
+ <title>Parameters specific to the 'status' action</title>
+
+ <variablelist>
+ <varlistentry>
+ <term><varname>OCF_CHECK_LEVEL</varname></term>
+ <listitem>
+ <informaltable>
+ <tgroup cols="2">
+ <tbody>
+ <row>
+ <entry>0</entry>
+
+ <entry>
+
+ <simpara>The most lightweight check
+ possible, which should not have an impact
+ on the QoS.</simpara>
+
+ <simpara>Example: Check for the existence
+ of the process.</simpara></entry>
+ </row>
+
+ <row>
+ <entry>10</entry>
+
+ <entry>
+
+ <simpara>A medium weight check, expected
+ to be called multiple times per minute,
+ which should not have a noticeable impact
+ on the QoS.</simpara>
+
+ <simpara>Example: Send a request for a
+ static page to a webserver.</simpara>
+ </entry>
+ </row>
+
+ <row>
+ <entry>20</entry>
+
+ <entry>
+
+ <simpara>A heavy weight check, called
+ infrequently, which may impact system or
+ service performance.</simpara>
+
+ <simpara>Example: An internal consistency
+ check to verify service
+ integrity.</simpara>
+ </entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </informaltable>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+
+ <para>The service must remain available during all of these
+ operations.</para>
+
+ <para>All other numbers are reserved.</para>
+
+ <para> It is recommended that if a requested level is not
+ implemented, the RA should perform the next lower level
+ supported.</para>
+
+ </section>
+
+ </section>
+
+ </section>
+
+ <section>
+ <title>Exit status codes</title>
+
+ <para>These exit status codes are the ones documented in the
+ LSB 1.1.0 specification, with additional explanations of how
+ they shall be used by RAs. In general, all non-zero status
+ codes shall indicate failure, in accordance with best current
+ practice.
+ </para>
+
+ <section>
+ <title>All operations but "status"</title>
+ <informaltable>
+ <tgroup cols="2">
+ <tbody>
+ <row>
+ <entry>0</entry>
+
+ <entry>No error, action succeeded completely
+ </entry>
+ </row>
+
+ <row>
+ <entry>1</entry>
+
+ <entry>generic or unspecified error (current
+ practice)</entry>
+
+ </row>
+ <row>
+ <entry>2</entry>
+
+ <entry>invalid or excess argument(s)
+ <simpara>Likely error
+ code for validate-all, if the instance parameters do
+ not validate. Any other action is free to also
+ return this exit status code for this case.</simpara>
+ </entry>
+
+ </row>
+
+ <row>
+ <entry>3</entry>
+
+ <entry>unimplemented feature (for example, "reload")</entry>
+
+ </row>
+ <row>
+ <entry>4</entry>
+
+ <entry>user had insufficient privilege</entry>
+
+ </row>
+ <row>
+ <entry>5</entry>
+
+ <entry>program is not installed</entry>
+
+ </row>
+
+ <row>
+ <entry>6</entry>
+
+ <entry>program is not configured</entry>
+
+ </row>
+
+ <row>
+ <entry>7</entry>
+
+ <entry>program is not running
+
+ <simpara> Note: This is not the error code to be
+ returned by a successful "stop" operation. A
+ successful "stop" operation shall return
+ 0.</simpara>
+ </entry>
+
+ </row>
+ <row>
+ <entry>8-99</entry>
+
+ <entry>reserved for future LSB use</entry>
+
+ </row>
+ <row>
+ <entry>100-149</entry>
+
+ <entry>reserved for distribution use</entry>
+
+ </row>
+ <row>
+ <entry>150-199</entry>
+
+ <entry>reserved for application use</entry>
+
+ </row>
+ <row>
+ <entry>200-254</entry>
+
+ <entry>reserved</entry>
+
+ </row>
+ </tbody>
+ </tgroup>
+ </informaltable>
+
+ </section>
+
+ <section>
+ <title>Exit status codes for "status"</title>
+
+ <para>That the LSB has chosen to deviate from the set of
+ default exit status codes for the "status" operation is
+ unfortunate, but this specification follows this path to
+ stay close to the LSB specification.
+ </para>
+
+ <para>However, a minor deviation from the values given in
+ the LSB specification was made for the values "1" and
+ "2". The existence of a pid file or lock file does not
+ convey any meaning for an RA; these values have been
+ redefined, but still imply errors.</para>
+
+ <informaltable>
+ <tgroup cols="2">
+ <tbody>
+ <row>
+ <entry>0</entry>
+
+ <entry>program is running or service is OK
+ </entry>
+ </row>
+
+ <row>
+ <entry>1</entry>
+
+ <entry>program is dead, hung or crashed. (Generic
+ error)</entry>
+
+ </row>
+ <row>
+ <entry>2</entry>
+
+ <entry>invalid or excess argument(s)
+ </entry>
+
+ </row>
+
+ <row>
+ <entry>3</entry>
+
+ <entry>program is not running
+
+ <simpara>Note: This exit code shall be returned also
+ for cleanly inactive resource
+ instances.</simpara></entry>
+
+ </row>
+ <row>
+ <entry>4</entry>
+
+ <entry>program or service status is unknown</entry>
+
+ </row>
+ <row>
+ <entry>5-99</entry>
+
+ <entry>reserved for future LSB use</entry>
+
+ </row>
+ <row>
+ <entry>100-149</entry>
+
+ <entry>reserved for distribution use</entry>
+
+ </row>
+ <row>
+ <entry>150-199</entry>
+
+ <entry>reserved for application use</entry>
+
+ </row>
+ <row>
+ <entry>200-254</entry>
+
+ <entry>reserved</entry>
+
+ </row>
+ </tbody>
+ </tgroup>
+ </informaltable>
+ </section>
+
+ </section>
+
+ </section>
+
+ <section>
+ <title>Relation to the LSB</title>
+
+ <para>It is required that the current LSB spec is fully
+ supported by the system.</para>
+
+ <para>The API tries to make it possible for an RA to function both
+ as a normal LSB init script and as a cluster-aware RA, but this is
+ not required functionality. The RAs could however use the
+ helper functions defined for LSB init scripts.
+ </para>
+
+ </section>
+
+ <section>
+ <title>RA meta data</title>
+
+ <section>
+ <title>Format</title>
+
+ <para>The API has the following requirements which are not
+ fulfilled by the LSB way of embedding meta data into the
+ beginning of the init scripts:
+
+
+ <itemizedlist>
+ <listitem><simpara>Independent of the language the RA is
+ actually written in,</simpara>
+ </listitem>
+
+ <listitem><simpara>Extensible,</simpara></listitem>
+
+ <listitem><simpara>Structured,</simpara></listitem>
+
+ <listitem><simpara>Easy to parse from a variety of
+ languages.</simpara></listitem>
+
+ </itemizedlist></para>
+
+ <para> This is why the API uses simple XML to describe the RA
+ meta data. The DTD for this API can be found at
+ http://www.opencf.org/standards/ra-api-1.dtd.</para>
+
+ </section>
+
+ <section>
+ <title>Semantics</title>
+
+ <para>An example of a valid meta data output is provided in
+ <filename>ra-metadata-example.xml</filename>.</para>
+
+ </section>
+ </section>
+
+ <appendix label="C">
+ <title>To-do list</title>
+
+ <para>Move the terminology definitions out into a separate
+ document common to all OCF work.
+ </para>
+
+ <para>An interface where the RA asynchronously informs the RM of
+ failures is planned but not defined yet.</para>
+
+ </appendix>
+
+
+ <ackno></ackno>
+ <appendix label="D">
+ <title>Contributors</title>
+
+ <simplelist>
+ <member>James Bottomley
+ <email>James.Bottomley@steeleye.com</email></member>
+
+ <member>Greg Freemyer
+ <email>freemyer@NorcrossGroup.com</email></member>
+
+ <member>Simon Horman
+ <email>horms@verge.net.au</email></member>
+
+ <member>Ragnar Kjørstad
+ <email>linux-ha@ragnark.vestdata.no</email></member>
+
+ <member>Lars Marowsky-Brée
+ <email>lmb@suse.de</email></member>
+
+ <member>Alan Robertson
+ <email>alanr@unix.sh</email></member>
+
+ <member>Yixiong Zou
+ <email>yixiong.zou@intel.com</email></member>
+ </simplelist>
+
+ </appendix>
+
+ <appendix label="E">
+ <title>Change Log</title>
+ <para></para>
+ </appendix>
+
+
+
+
+
+
+
+</article>
\ No newline at end of file
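
Illustration (not part of the patch): the action, parameter-passing, and exit-code rules described in resource-agent-api.xml can be sketched as a small POSIX shell agent. This is a minimal sketch under stated assumptions, not a reference implementation. The "dummy" resource (running while a state file exists), the `statefile` instance parameter, the `RC_*` variable names, and the `ra_*` function names are all hypothetical; only the action names, the `OCF_RESKEY_` prefix, `OCF_RESOURCE_INSTANCE`, and the numeric exit codes come from the document.

```shell
#!/bin/sh
# Hypothetical "dummy" resource agent: the resource counts as running
# while a state file exists. All names below are illustrative assumptions.

RC_OK=0               # success (both exit-code tables)
RC_ERR=1              # generic or unspecified error
RC_UNIMPLEMENTED=3    # unimplemented feature (all actions but "status")
RC_STATUS_STOPPED=3   # "status" table: program is not running

# Instance parameters arrive as OCF_RESKEY_<name>; "statefile" is made up.
state_file() {
    printf '%s\n' "${OCF_RESKEY_statefile:-${TMPDIR:-/tmp}/dummy-${OCF_RESOURCE_INSTANCE:-default}.state}"
}

# start is idempotent: it also succeeds if the resource already runs.
ra_start() {
    : > "$(state_file)" || return "$RC_ERR"
    return "$RC_OK"
}

# stop is idempotent: it also succeeds if the resource is already stopped.
ra_stop() {
    rm -f "$(state_file)" || return "$RC_ERR"
    return "$RC_OK"
}

# status returns a well-defined code even before start was ever invoked.
ra_status() {
    if [ -e "$(state_file)" ]; then
        return "$RC_OK"
    fi
    return "$RC_STATUS_STOPPED"
}

# Placeholder only: a real agent prints XML that conforms to the RA DTD.
ra_metadata() {
    echo '<!-- RA meta data would be printed here -->'
}

# The action is the first argument; all other arguments are ignored.
ra_main() {
    case "${1:-}" in
        start)    ra_start ;;
        stop)     ra_stop ;;
        status)   ra_status ;;
        metadata) ra_metadata ;;
        *)        return "$RC_UNIMPLEMENTED" ;;
    esac
}
```

A real agent would end with `ra_main "$@"` followed by `exit $?`. The optional actions (`recover`, `reload`, `validate-all`) are deliberately left out here, so they fall through to the "unimplemented feature" code.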

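Illustration (not part of the patch): two rules in the XML lend themselves to small helpers, namely treating a major version mismatch as an error and falling back to the next lower supported level when the requested `OCF_CHECK_LEVEL` is not implemented. The sketch below assumes a POSIX shell agent; the function names and `SUPPORTED_MAJOR` are hypothetical, and it further assumes that level 0 is always implemented, which the document does not state.

```shell
#!/bin/sh
# Hypothetical helpers; none of these names come from the specification.

SUPPORTED_MAJOR=1   # major API version this agent claims to implement

# Succeeds only if the RM announced the major version the agent supports.
ra_version_ok() {
    [ "${OCF_RA_VERSION_MAJOR:-}" = "$SUPPORTED_MAJOR" ]
}

# Prints the highest implemented check level not above the requested one.
# $1 is the requested level; the remaining arguments are the levels the
# agent implements. Assumes level 0 is always available as a fallback.
ra_pick_check_level() {
    requested=$1
    shift
    best=0
    for level in "$@"; do
        if [ "$level" -le "$requested" ] && [ "$level" -ge "$best" ]; then
            best=$level
        fi
    done
    printf '%s\n' "$best"
}
```

An agent that implements levels 0 and 10 might call `ra_pick_check_level "${OCF_CHECK_LEVEL:-0}" 0 10` at the top of its `status` action and return an error of its choice when `ra_version_ok` fails.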