diff --git a/doc/Pacemaker_Explained/en-US/Ap-OCF.xml b/doc/Pacemaker_Explained/en-US/Ap-OCF.xml index f86e2d7378..9bfcf639d0 100644 --- a/doc/Pacemaker_Explained/en-US/Ap-OCF.xml +++ b/doc/Pacemaker_Explained/en-US/Ap-OCF.xml @@ -1,209 +1,208 @@ More About OCF Resource Agents
Location of Custom Scripts
OCF Resource Agents are found in /usr/lib/ocf/resource.d/provider. When creating your own agents, you are encouraged to create a new directory under /usr/lib/ocf/resource.d/ so that they are not confused with (or overwritten by) the agents shipped with Heartbeat.
So, for example, if you chose the provider name of bigCorp and wanted a new resource named bigApp, you would create a script called /usr/lib/ocf/resource.d/bigCorp/bigApp and define a resource:
<primitive id="custom-app" class="ocf" provider="bigCorp" type="bigApp"/>
Actions
All OCF Resource Agents are required to implement the following actions:
Required Actions for OCF Agents
Action | Description | Instructions
start | Start the resource | Return 0 on success and an appropriate error code otherwise. Must not report success until the resource is fully active.
stop | Stop the resource | Return 0 on success and an appropriate error code otherwise. Must not report success until the resource is fully stopped.
monitor | Check the resource's state | Exit 0 if the resource is running, 7 if it is stopped, and anything else if it has failed. NOTE: The monitor script should test the state of the resource on the local machine only.
meta-data | Describe the resource | Provide information about this resource as an XML snippet. Exit with 0. NOTE: This is not performed as root.
validate-all | Verify the supplied parameters are correct | Exit with 0 if parameters are valid, 2 if not valid, 6 if resource is not configured.
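The required actions above can be sketched as a small dispatcher. This is a minimal illustration under stated assumptions, not a real agent: agents are commonly shell scripts, Python is used here only for readability, the in-memory `_state` dictionary is a hypothetical stand-in for a real service, and the metadata string is a placeholder rather than a complete OCF metadata document.

```python
# Exit codes taken from the OCF return code table in this appendix.
OCF_SUCCESS = 0
OCF_ERR_UNIMPLEMENTED = 3
OCF_NOT_RUNNING = 7

# Hypothetical stand-in for the state of the managed service.
_state = {"running": False}


def start():
    # A real agent must not return until the resource is fully active.
    _state["running"] = True
    return OCF_SUCCESS


def stop():
    # A real agent must not return until the resource is fully stopped.
    _state["running"] = False
    return OCF_SUCCESS


def monitor():
    # 0 if running, 7 if cleanly stopped; checks local state only.
    return OCF_SUCCESS if _state["running"] else OCF_NOT_RUNNING


def meta_data():
    # Placeholder output; a real agent prints its full XML description.
    print('<resource-agent name="bigApp"/>')
    return OCF_SUCCESS


def validate_all():
    # This sketch has no parameters, so there is nothing to reject.
    return OCF_SUCCESS


ACTIONS = {
    "start": start,
    "stop": stop,
    "monitor": monitor,
    "meta-data": meta_data,
    "validate-all": validate_all,
}


def run(action):
    """Dispatch one action name and return the exit code to use."""
    handler = ACTIONS.get(action)
    if handler is None:
        return OCF_ERR_UNIMPLEMENTED
    return handler()
```

In a real agent, the action name would arrive as the first command-line argument and the value returned by `run` would become the process exit status.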
Additional requirements (not part of the OCF specs) are placed on agents that will be used for advanced concepts like clones and multi-state resources.
Optional Actions for OCF Agents
Action | Description | Instructions
promote | Promote the local instance of a multi-state resource to the master/primary state | Return 0 on success.
demote | Demote the local instance of a multi-state resource to the slave/secondary state | Return 0 on success.
notify | Used by the cluster to send the agent pre- and post-notification events telling the resource what is about to take place or has just taken place | Must not fail. Must exit 0.
- Some actions specified in the OCF specs are not currently used by the cluster + One action specified in the OCF specs is not currently used by the cluster - reload - reload the configuration of the resource instance without disrupting the service recover - a variant of the start action, this should try to recover a resource locally. Remember to use ocf-tester to verify that your new agent complies with the OCF standard properly.
How Does the Cluster Interpret the OCF Return Codes?
The first thing the cluster does is check the return code against the expected result. If the result does not match the expected value, then the operation is considered to have failed and recovery action is initiated.
There are three types of failure recovery:
Types of recovery performed by the cluster
Recovery Type | Description | Action Taken by the Cluster
soft | A transient error occurred | Restart the resource or move it to a new location
hard | A non-transient error that may be specific to the current node occurred | Move the resource elsewhere and prevent it from being retried on the current node
fatal | A non-transient error that will be common to all cluster nodes (i.e. a bad configuration was specified) | Stop the resource and prevent it from being started on any cluster node
Assuming an action is considered to have failed, the following table outlines the different OCF return codes and the type of recovery the cluster will initiate when each is received.
OCF Return Codes and How They are Handled
OCF Return Code | OCF Alias | Description | Recovery Type
0 | OCF_SUCCESS | Success. The command completed successfully. This is the expected result for all start, stop, promote and demote commands. | soft
1 | OCF_ERR_GENERIC | Generic "there was a problem" error code. | soft
2 | OCF_ERR_ARGS | The resource's configuration is not valid on this machine, e.g. it refers to a location/tool not found on the node. | hard
3 | OCF_ERR_UNIMPLEMENTED | The requested action is not implemented. | hard
4 | OCF_ERR_PERM | The resource agent does not have sufficient privileges to complete the task. | hard
5 | OCF_ERR_INSTALLED | The tools required by the resource are not installed on this machine. | hard
6 | OCF_ERR_CONFIGURED | The resource's configuration is invalid, e.g. a required parameter is missing. | fatal
7 | OCF_NOT_RUNNING | The resource is safely stopped. The cluster will not attempt to stop a resource that returns this for any action. | N/A
8 | OCF_RUNNING_MASTER | The resource is running in Master mode. | soft
9 | OCF_FAILED_MASTER | The resource is in Master mode but has failed. The resource will be demoted, stopped and then started (and possibly promoted) again. | soft
other | N/A | Custom error code. | soft
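The return code table can be expressed as a simple lookup. This is an illustrative sketch only: the names `RECOVERY_BY_CODE` and `recovery_type` are hypothetical and are not part of Pacemaker, and the mapping applies only once an action has already been judged to have failed.

```python
# Recovery type per OCF return code, copied from the table above.
# None stands for the "N/A" entry (OCF_NOT_RUNNING).
RECOVERY_BY_CODE = {
    0: "soft",   # OCF_SUCCESS
    1: "soft",   # OCF_ERR_GENERIC
    2: "hard",   # OCF_ERR_ARGS
    3: "hard",   # OCF_ERR_UNIMPLEMENTED
    4: "hard",   # OCF_ERR_PERM
    5: "hard",   # OCF_ERR_INSTALLED
    6: "fatal",  # OCF_ERR_CONFIGURED
    7: None,     # OCF_NOT_RUNNING
    8: "soft",   # OCF_RUNNING_MASTER
    9: "soft",   # OCF_FAILED_MASTER
}


def recovery_type(return_code):
    """Return the recovery type for a failed action's return code.

    Codes not listed in the table are custom error codes, which the
    table treats as soft failures.
    """
    return RECOVERY_BY_CODE.get(return_code, "soft")
```

For example, `recovery_type(6)` gives `"fatal"`, while an unlisted custom code such as `42` falls back to `"soft"`.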
Although counterintuitive, even actions that return 0 (a.k.a. OCF_SUCCESS) can be considered to have failed. This can happen when a resource that is expected to be in the Master state is found running as a Slave, or when a resource is found active on multiple machines.
Exceptions
Non-recurring monitor actions (probes) that find a resource active (or in Master mode) will not result in recovery action unless it is also found active elsewhere.
The recovery action taken when a resource is found active more than once is determined by the multiple-active property of the resource.
Recurring actions that return OCF_ERR_UNIMPLEMENTED do not cause any type of recovery.
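The failure check and two of the exceptions above can be sketched as a single predicate. This is a simplified model under stated assumptions, not the cluster's actual logic: the function name and its parameters are hypothetical, and it does not cover how the multiple-active property selects the recovery action, nor the case of a non-probe action on a resource that is active on several machines.

```python
OCF_SUCCESS = 0
OCF_ERR_UNIMPLEMENTED = 3
OCF_RUNNING_MASTER = 8


def needs_recovery(rc, expected, *, recurring=False, is_probe=False,
                   active_elsewhere=False):
    """Decide whether a result should trigger recovery (simplified).

    rc               -- return code reported by the agent
    expected         -- return code the cluster expected
    recurring        -- True for recurring actions
    is_probe         -- True for non-recurring monitor actions
    active_elsewhere -- True if the resource was also found active
                        on another node (hypothetical input)
    """
    # A probe that finds the resource active (or in Master mode) only
    # matters if the resource is also active somewhere else.
    if is_probe and rc in (OCF_SUCCESS, OCF_RUNNING_MASTER):
        return active_elsewhere
    # Recurring actions that are simply unimplemented are ignored.
    if recurring and rc == OCF_ERR_UNIMPLEMENTED:
        return False
    # Otherwise, any mismatch with the expected result is a failure.
    return rc != expected
```

Keeping the exceptions ahead of the plain comparison mirrors the text: the comparison is the default rule, and the exceptions override it.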