diff --git a/doc/Pacemaker_Explained/en-US/Ap-OCF.txt b/doc/Pacemaker_Explained/en-US/Ap-OCF.txt index a5be906c80..2e02f74f3c 100644 --- a/doc/Pacemaker_Explained/en-US/Ap-OCF.txt +++ b/doc/Pacemaker_Explained/en-US/Ap-OCF.txt @@ -1,268 +1,268 @@ [appendix] [[ap-ocf]] == More About OCF Resource Agents == === Location of Custom Scripts === indexterm:[OCF Resource Agents] OCF Resource Agents are found in '/usr/lib/ocf/resource.d/+provider+'. When creating your own agents, you are encouraged to create a new directory under _/usr/lib/ocf/resource.d/_ so that they are not confused with (or overwritten by) the agents shipped with Heartbeat. So, for example, if you chose the provider name of bigCorp and wanted a new resource named bigApp, you would create a script called _/usr/lib/ocf/resource.d/bigCorp/bigApp_ and define a resource: [source,XML] <primitive id="custom-app" class="ocf" provider="bigCorp" type="bigApp"/> === Actions === All OCF Resource Agents are required to implement the following actions .Required Actions for OCF Agents [width="95%",cols="3m,3,7",options="header",align="center"] |========================================================= |Action |Description |Instructions |start |Start the resource |Return 0 on success and an appropriate error code otherwise. Must not report success until the resource is fully active. indexterm:[start action] indexterm:[action,start] |stop |Stop the resource |Return 0 on success and an appropriate error code otherwise. Must not report success until the resource is fully stopped. indexterm:[stop action] indexterm:[action,stop] |monitor |Check the resource's state |Exit 0 if the resource is running, 7 if it is stopped, and anything else if it is failed. indexterm:[monitor action] indexterm:[action,monitor] NOTE: The monitor script should test the state of the resource on the local machine only. |meta-data |Describe the resource |Provide information about this resource as an XML snippet. Exit with 0. indexterm:[meta-data action] indexterm:[action,meta-data] NOTE: This is *not* performed as root. |validate-all |Verify the supplied parameters indexterm:[validate-all action] indexterm:[action,validate-all] |Exit with 0 if parameters are valid, 2 if not valid, 6 if resource is not configured. |========================================================= Additional requirements (not part of the OCF specs) are placed on agents that will be used for advanced concepts like <<s-resource-clone,clones>> and <<s-resource-multistate,multi-state>> resources. .Optional Actions for OCF Agents [width="95%",cols="2m,6,3",options="header",align="center"] |========================================================= |Action |Description |Instructions |promote |Promote the local instance of a multi-state resource to the master/primary state. indexterm:[promote action] indexterm:[action,promote] |Return 0 on success |demote |Demote the local instance of a multi-state resource to the slave/secondary state. indexterm:[demote action] indexterm:[action,demote] |Return 0 on success |notify |Used by the cluster to send the agent pre and post notification events telling the resource what has happened and will happen. indexterm:[notify action] indexterm:[action,notify] |Must not fail. Must exit with 0 |========================================================= One action specified in the OCF specs is not currently used by the cluster: * +recover+ - a variant of the +start+ action, this should try to recover a resource locally. 
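For reference, the XML returned by a +meta-data+ action follows the OCF resource agent metadata schema: a +resource-agent+ element describing the agent, its parameters, and the actions it supports. The following is only a minimal sketch for the hypothetical +bigApp+ agent from the earlier example; the +config+ parameter and the timeouts are purely illustrative.

.A minimal +meta-data+ sketch for the hypothetical +bigApp+ agent
======
[source,XML]
-------
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="bigApp" version="0.1">
  <version>1.0</version>
  <longdesc lang="en">Manages the (hypothetical) bigApp service.</longdesc>
  <shortdesc lang="en">bigApp resource agent</shortdesc>
  <parameters>
    <!-- Illustrative parameter only; a real agent defines whatever it needs -->
    <parameter name="config" unique="0" required="0">
      <longdesc lang="en">Path to the bigApp configuration file.</longdesc>
      <shortdesc lang="en">Configuration file</shortdesc>
      <content type="string" default="/etc/bigApp.conf"/>
    </parameter>
  </parameters>
  <actions>
    <!-- Every agent must advertise at least the required actions -->
    <action name="start"        timeout="20s"/>
    <action name="stop"         timeout="20s"/>
    <action name="monitor"      timeout="20s" interval="10s"/>
    <action name="validate-all" timeout="20s"/>
    <action name="meta-data"    timeout="5s"/>
  </actions>
</resource-agent>
-------
======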
Remember to use indexterm:[ocf-tester]`ocf-tester` to verify that your new agent complies with the OCF standard properly. === How are OCF Return Codes Interpreted? === The first thing the cluster does is to check the return code against the expected result. If the result does not match the expected value, then the operation is considered to have failed and recovery action is initiated. There are three types of failure recovery: .Types of recovery performed by the cluster [width="95%",cols="1m,4,4",options="header",align="center"] |========================================================= |Type |Description |Action Taken by the Cluster |soft indexterm:[soft error type] indexterm:[error type,soft] |A transient error occurred |Restart the resource or move it to a new location |hard indexterm:[hard error type] indexterm:[error type,hard] |A non-transient error that may be specific to the current node occurred |Move the resource elsewhere and prevent it from being retried on the current node |fatal indexterm:[fatal error type] indexterm:[error type,fatal] |A non-transient error that will be common to all cluster nodes (eg. a bad configuration was specified) |Stop the resource and prevent it from being started on any cluster node |========================================================= Assuming an action is considered to have failed, the following table outlines the different OCF return codes and the type of recovery the cluster will initiate when it is received. [[s-ocf-return-codes]] === OCF Return Codes === .OCF Return Codes and their Recovery Types -[width="95%",cols="1m,5^m,6<,1m",options="header",align="center"] +[width="95%",cols="2m,5^m,6<,1m",options="header",align="center"] |========================================================= |RC |OCF Alias |Description |RT |indexterm:[return code,0]0 |OCF_SUCCESS |Success. The command completed successfully. This is the expected result for all start, stop, promote and demote commands. indexterm:[OCF_SUCCESS] indexterm:[Environment Variable,OCF_SUCCESS] indexterm:[return code,OCF_SUCCESS] |soft |indexterm:[return code,1]1 |OCF_ERR_GENERIC |Generic "there was a problem" error code. indexterm:[OCF_ERR_,GENERIC] indexterm:[Environment Variable,OCF_ERR_,GENERIC] indexterm:[return code,OCF_ERR_,GENERIC] |soft |indexterm:[return code,2]2 |OCF_ERR_ARGS |The resource's configuration is not valid on this machine. Eg. refers to a location/tool not found on the node. indexterm:[OCF_ERR_,ARGS] indexterm:[Environment Variable,OCF_ERR_,ARGS] indexterm:[return code,OCF_ERR_,ARGS] |hard |indexterm:[return code,3]3 |OCF_ERR_UNIMPLEMENTED |The requested action is not implemented. indexterm:[OCF_ERR_,UNIMPLEMENTED] indexterm:[Environment Variable,OCF_ERR_,UNIMPLEMENTED] indexterm:[return code,OCF_ERR_,UNIMPLEMENTED] |hard |indexterm:[return code,4]4 |OCF_ERR_PERM |The resource agent does not have sufficient privileges to complete the task. indexterm:[OCF_ERR_,PERM] indexterm:[Environment Variable,OCF_ERR_,PERM] indexterm:[return code,OCF_ERR_,PERM] |hard |indexterm:[return code,5]5 |OCF_ERR_INSTALLED |The tools required by the resource are not installed on this machine. indexterm:[OCF_ERR_,INSTALLED] indexterm:[Environment Variable,OCF_ERR_,INSTALLED] indexterm:[return code,OCF_ERR_,INSTALLED] |hard |indexterm:[return code,6]6 |OCF_ERR_CONFIGURED |The resource's configuration is invalid. Eg. required parameters are missing. 
indexterm:[OCF_ERR_,CONFIGURED] indexterm:[Environment Variable,OCF_ERR_,CONFIGURED] indexterm:[return code,OCF_ERR_,CONFIGURED] |fatal |indexterm:[return code,7]7 |OCF_NOT_RUNNING |The resource is safely stopped. The cluster will not attempt to stop a resource that returns this for any action. indexterm:[OCF_NOT_RUNNING] indexterm:[Environment Variable,OCF_NOT_RUNNING] indexterm:[return code,OCF_NOT_RUNNING] |N/A |indexterm:[return code,8]8 |OCF_RUNNING_MASTER |The resource is running in +Master+ mode. indexterm:[OCF_RUNNING_MASTER] indexterm:[Environment Variable,OCF_RUNNING_MASTER] indexterm:[return code,OCF_RUNNING_MASTER] |soft |indexterm:[return code,9]9 |OCF_FAILED_MASTER |The resource is in +Master+ mode but has failed. The resource will be demoted, stopped and then started (and possibly promoted) again. indexterm:[OCF_FAILED_MASTER] indexterm:[Environment Variable,OCF_FAILED_MASTER] indexterm:[return code,OCF_FAILED_MASTER] |soft |other |NA |Custom error code. indexterm:[other return codes] indexterm:[return code,other] |soft |========================================================= Although counterintuitive, even actions that return 0 (aka. +OCF_SUCCESS+) can be considered to have failed. === Exceptions === * Non-recurring monitor actions (probes) that find a resource active (or in Master mode) will not result in recovery action unless it is also found active elsewhere * The recovery action taken when a resource is found active more than once is determined by the _multiple-active_ property of the resource * Recurring actions that return +OCF_ERR_UNIMPLEMENTED+ do not cause any type of recovery diff --git a/doc/Pacemaker_Explained/en-US/Ch-Constraints.txt b/doc/Pacemaker_Explained/en-US/Ch-Constraints.txt index 6c53c082d9..582aa00e89 100644 --- a/doc/Pacemaker_Explained/en-US/Ch-Constraints.txt +++ b/doc/Pacemaker_Explained/en-US/Ch-Constraints.txt @@ -1,572 +1,607 @@ = Resource Constraints = == Scores == indexterm:[Resource,Constraints] indexterm:[Constraints,for Resources] Scores of all kinds are integral to how the cluster works. Practically everything from moving a resource to deciding which resource to stop in a degraded cluster is achieved by manipulating scores in some way. Scores are calculated on a per-resource basis and any node with a negative score for a resource can't run that resource. After calculating the scores for a resource, the cluster then chooses the node with the highest one. === Infinity Math === +INFINITY+ is currently defined as 1,000,000 and addition/subtraction with it follows these three basic rules: * Any value + +INFINITY+ = +INFINITY+ * Any value - +INFINITY+ = -+INFINITY+ * +INFINITY+ - +INFINITY+ = -+INFINITY+ == Deciding Which Nodes a Resource Can Run On == +indexterm:[Constraint,Location] +indexterm:[Resource,Location] There are two alternative strategies for specifying which nodes a resources can run on. One way is to say that by default they can run anywhere and then create location constraints for nodes that are not allowed. The other option is to have nodes "opt-in"... to start with nothing able to run anywhere and selectively enable allowed nodes. 
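Whichever strategy you choose, node preferences are expressed with the same +rsc_location+ element; only the scores differ. As a minimal sketch (the resource and node names are just placeholders):

[source,XML]
<rsc_location id="loc-example" rsc="Webserver" node="sles-1" score="200"/>

The fields such a constraint accepts are described below.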
=== Options === .Options for Simple Location Constraints -[width="95%",cols="1m,5<",options="header",align="center"] +[width="95%",cols="2m,5<",options="header",align="center"] |========================================================= |Field |Description |id -indexterm:[id,Constraint Field] -indexterm:[Constraint Field,id] |A unique name for the constraint +indexterm:[id,Location Constraint Field] +indexterm:[Location Constraint,Field,id] -|rsc, -indexterm:[rsc Constraint Field] -indexterm:[Constraint Field,rsc] +|rsc |A resource name +indexterm:[rsc,Location Constraint Field] +indexterm:[Location Constraint,Field,rsc] |node -indexterm:[Node,Constraint Field] -indexterm:[Constraint Field,node] -|A node's uname +|A node's name +indexterm:[Node,Location Constraint Field] +indexterm:[Location Constraint,Field,node] |score -indexterm:[score,Constraint Field] -indexterm:[Constraint Field,score] |Positive values indicate the resource should run on this -. node. Negative values indicate the resource should not run on this - node. Values of +/- +INFINITY+ change "should"/"should not" to + node. Negative values indicate the resource should not run on this + node. + + Values of \+/- +INFINITY+ change "should"/"should not" to "must"/"must not". +indexterm:[score,Location Constraint Field] +indexterm:[Location Constraint,Field,score] |========================================================= === Asymmetrical "Opt-In" Clusters === indexterm:[Asymmetrical Opt-In Clusters] indexterm:[Cluster Type,Asymmetrical Opt-In] To create an opt-in cluster, start by preventing resources from running anywhere by default: [source,C] # crm_attribute --attr-name symmetric-cluster --attr-value false Then start enabling nodes. The following fragment says that the web server prefers +sles-1+, the database prefers +sles-2+ and both can fail over to +sles-3+ if their most preferred node fails. .Example set of opt-in location constraints ====== [source,XML] ------- <constraints> <rsc_location id="loc-1" rsc="Webserver" node="sles-1" score="200"/> <rsc_location id="loc-2" rsc="Webserver" node="sles-3" score="0"/> <rsc_location id="loc-3" rsc="Database" node="sles-2" score="200"/> <rsc_location id="loc-4" rsc="Database" node="sles-3" score="0"/> </constraints> ------- ====== === Symmetrical "Opt-Out" Clusters === indexterm:[Symmetrical Opt-Out Clusters] indexterm:[Cluster Type,Symmetrical Opt-Out] To create an opt-out cluster, start by allowing resources to run anywhere by default: [source,C] # crm_attribute --attr-name symmetric-cluster --attr-value true Then start disabling nodes. The following fragment is the equivalent of the above opt-in configuration. .Example set of opt-out location constraints ====== [source,XML] ------- <constraints> <rsc_location id="loc-1" rsc="Webserver" node="sles-1" score="200"/> <rsc_location id="loc-2-dont-run" rsc="Webserver" node="sles-2" score="-INFINITY"/> <rsc_location id="loc-3-dont-run" rsc="Database" node="sles-1" score="-INFINITY"/> <rsc_location id="loc-4" rsc="Database" node="sles-2" score="200"/> </constraints> ------- ====== Whether you should choose opt-in or opt-out depends both on your personal preference and the make-up of your cluster. If most of your resources can run on most of the nodes, then an opt-out arrangement is likely to result in a simpler configuration. On the other-hand, if most resources can only run on a small subset of nodes an opt-in configuration might be simpler. 
[[node-score-equal]] === What if Two Nodes Have the Same Score === If two nodes have the same score, then the cluster will choose one. This choice may seem random and may not be what was intended, however the cluster was not given enough information to know any better. .Example of two resources that prefer two nodes equally ====== [source,XML] ------- <constraints> <rsc_location id="loc-1" rsc="Webserver" node="sles-1" score="INFINITY"/> <rsc_location id="loc-2" rsc="Webserver" node="sles-2" score="INFINITY"/> <rsc_location id="loc-3" rsc="Database" node="sles-1" score="500"/> <rsc_location id="loc-4" rsc="Database" node="sles-2" score="300"/> <rsc_location id="loc-5" rsc="Database" node="sles-2" score="200"/> </constraints> ------- ====== In the example above, assuming no other constraints and an inactive cluster, Webserver would probably be placed on sles-1 and Database on sles-2. It would likely have placed Webserver based on the node's uname and Database based on the desire to spread the resource load evenly across the cluster. However other factors can also be involved in more complex configurations. [[s-resource-ordering]] == Specifying in which Order Resources Should Start/Stop == +indexterm:[Constraint,Order] +indexterm:[Resource,Startup Order] The way to specify the order in which resources should start is by creating +rsc_order+ constraints. .Properties of an Ordering Constraint -[width="95%",cols="1m,5<",options="header",align="center"] +[width="95%",cols="2m,5<",options="header",align="center"] |========================================================= |Field |Description |id |A unique name for the constraint +indexterm:[id,Ordering Constraint Field] +indexterm:[Ordering Constraint,Field,id] |first |The name of a resource that must be started before the +then+ resource is allowed to. +indexterm:[first,Ordering Constraint Field] +indexterm:[Ordering Constraint,Field,first] |then |The name of a resource. This resource will start after the +first+ resource. +indexterm:[then,Ordering Constraint Field] +indexterm:[Ordering Constraint,Field,then] -|score -|If greater than zero, the constraint is mandatory. Otherwise it is - only a suggestion. Default value: _INFINITY_ +|kind +|How to enforce the constraint. ('Since 1.1.2') + +* Optional - Just a suggestion. Only applies if both resources are + starting/stopping. + +* Mandatory - Always. If 'first' is stopping or cannot be started, + 'then' must be stopped. + +* Serialize - Ensure that no two stop/start actions occur concurrently + for a set of resources. + +indexterm:[kind,Ordering Constraint Field] +indexterm:[Ordering Constraint,Field,kind] |symmetrical |If true, which is the default, stop the resources in the reverse order. Default value: _true_ +indexterm:[symmetrical,Ordering Constraint Field] +indexterm:[Ordering Constraint,Field,symmetrical] |========================================================= === Mandatory Ordering === When the +then+ resource cannot run without the +first+ resource being active, one should use mandatory constraints. To specify a constraint is mandatory, use scores greater than zero. This will ensure that the then resource will react when the first resource changes state. * If the +first+ resource was running and is stopped, the +then+ resource will also be stopped (if it is running). * If the +first+ resource was not running and cannot be started, the +then+ resource will be stopped (if it is running). 
* If the +first+ resource is (re)started while the +then+ resource is running, the +then+ resource will be stopped and restarted. === Advisory Ordering === On the other hand, when +score="0"+ is specified for a constraint, the constraint is considered optional and only has an effect when both resources are stopping and/or starting. Any change in state by the +first+ resource will have no effect on the +then+ resource. .Example of an optional and mandatory ordering constraint ====== [source,XML] ------- <constraints> <rsc_order id="order-1" first="Database" then="Webserver" /> <rsc_order id="order-2" first="IP" then="Webserver" score="0"/> </constraints> ------- ====== Some additional information on ordering constraints can be found in the document http://www.clusterlabs.org/mediawiki/images/d/d6/Ordering_Explained.pdf[Ordering Explained]. [[s-resource-colocation]] == Placing Resources Relative to other Resources == +indexterm:[Constraint,Colocation] +indexterm:[Resource,Location Relative to other Resources] When the location of one resource depends on the location of another one, we call this colocation. There is an important side-effect of creating a colocation constraint between two resources: it affects the order in which resources are assigned to a node. If you think about it, it's somewhat obvious. You can't place A relative to B unless you know where B is. footnote:[ While the human brain is sophisticated enough to read the constraint in any order and choose the correct one depending on the situation, the cluster is not quite so smart. Yet. ] So when you are creating colocation constraints, it is important to consider whether you should colocate A with B or B with A. Another thing to keep in mind is that, assuming A is collocated with B, the cluster will also take into account A's preferences when deciding which node to choose for B. For a detailed look at exactly how this occurs, see the http://www.clusterlabs.org/mediawiki/images/6/61/Colocation_Explained.pdf[Colocation Explained] document. === Options === .Properties of a Collocation Constraint -[width="95%",cols="1m,5<",options="header",align="center"] +[width="95%",cols="2m,5<",options="header",align="center"] |========================================================= |Field |Description |id |A unique name for the constraint. +indexterm:[id,Colocation Constraint Field] +indexterm:[Colocation Constraint,Field,id] |rsc |The colocation source. If the constraint cannot be satisfied, the cluster may decide not to allow the resource to run at all. +indexterm:[rsc,Colocation Constraint Field] +indexterm:[Colocation Constraint,Field,rsc] |with-rsc |The colocation target. The cluster will decide where to put this resource first and then decide where to put the resource in the +rsc+ field. + indexterm:[with-rsc,Colocation Constraint Field] + indexterm:[Colocation Constraint,Field,with-rsc] |score |Positive values indicate the resource should run on the same node. Negative values indicate the resources should not run on the same node. Values of \+/- +INFINITY+ change "should" to "must". + indexterm:[score,Colocation Constraint Field] + indexterm:[Colocation Constraint,Field,score] |========================================================= === Mandatory Placement === Mandatory placement occurs any time the constraint's score is ++INFINITY+ or +-INFINITY+. In such cases, if the constraint can't be satisfied, then the +rsc+ resource is not permitted to run. For +score=INFINITY+, this includes cases where the +with-rsc+ resource is not active. 
If you need +resource1+ to always run on the same machine as +resource2+, you would add the following constraint: .An example colocation constraint [source,XML] <rsc_colocation id="colocate" rsc="resource1" with-rsc="resource2" score="INFINITY"/> Remember, because +INFINITY+ was used, if +resource2+ can't run on any of the cluster nodes (for whatever reason) then +resource1+ will not be allowed to run. Alternatively, you may want the opposite... that +resource1+ cannot run on the same machine as +resource2+. In this case, use +score="-INFINITY"+. .An example anti-colocation constraint [source,XML] <rsc_colocation id="anti-colocate" rsc="resource1" with-rsc="resource2" score="-INFINITY"/> Again, by specifying +-INFINITY+, the constraint is binding. So if the only place left to run is where +resource2+ already is, then +resource1+ may not run anywhere. === Advisory Placement === If mandatory placement is about "must" and "must not", then advisory placement is the "I'd prefer if" alternative. For constraints with scores greater than +-INFINITY+ and less than +INFINITY+, the cluster will try to accommodate your wishes but may ignore them if the alternative is to stop some of the cluster resources. Like in life, where if enough people prefer something it effectively becomes mandatory, advisory colocation constraints can combine with other elements of the configuration to behave as if they were mandatory. .An example advisory-only colocation constraint [source,XML] <rsc_colocation id="colocate-maybe" rsc="resource1" with-rsc="resource2" score="500"/> [[s-resource-sets-ordering]] == Ordering Sets of Resources == A common situation is for an administrator to create a chain of ordered resources, such as: .A chain of ordered resources ====== [source,XML] ------- <constraints> <rsc_order id="order-1" first="A" then="B" /> <rsc_order id="order-2" first="B" then="C" /> <rsc_order id="order-3" first="C" then="D" /> </constraints> ------- ====== == Ordered Set == .Visual representation of the four resources' start order for the above constraints image::images/resource-set.png["Ordered set",width="16cm",height="2.5cm",align="center"] To simplify this situation, there is an alternate format for ordering constraints: .A chain of ordered resources expressed as a set ====== [source,XML] ------- <constraints> <rsc_order id="order-1"> <resource_set id="ordered-set-example" sequential="true"> <resource_ref id="A"/> <resource_ref id="B"/> <resource_ref id="C"/> <resource_ref id="D"/> </resource_set> </rsc_order> </constraints> ------- ====== [NOTE] Resource sets have the same ordering semantics as groups. .A group resource with the equivalent ordering rules ====== [source,XML] ------- <group id="dummy"> <primitive id="A" .../> <primitive id="B" .../> <primitive id="C" .../> <primitive id="D" .../> </group> ------- ====== While the set-based format is not less verbose, it is significantly easier to get right and maintain. It can also be expanded to allow ordered sets of (un)ordered resources. In the example below, +rscA+ and +rscB+ can both start in parallel, as can +rscC+ and +rscD+; however, +rscC+ and +rscD+ can only start once _both_ +rscA+ _and_ +rscB+ are active.
.Ordered sets of unordered resources ====== [source,XML] ------- <constraints> <rsc_order id="order-1"> <resource_set id="ordered-set-1" sequential="false"> <resource_ref id="A"/> <resource_ref id="B"/> </resource_set> <resource_set id="ordered-set-2" sequential="false"> <resource_ref id="C"/> <resource_ref id="D"/> </resource_set> </rsc_order> </constraints> ------- ====== == Two Sets of Unordered Resources == .Visual representation of the start order for two ordered sets of unordered resources image::images/two-sets.png["Two ordered sets",width="13cm",height="7.5cm",align="center"] Of course either set -- or both sets -- of resources can also be internally ordered (by setting +sequential="true"+) and there is no limit to the number of sets that can be specified. .Advanced use of set ordering - Three ordered sets, two of which are internally unordered ====== [source,XML] ------- <constraints> <rsc_order id="order-1"> <resource_set id="ordered-set-1" sequential="false"> <resource_ref id="A"/> <resource_ref id="B"/> </resource_set> <resource_set id="ordered-set-2" sequential="true"> <resource_ref id="C"/> <resource_ref id="D"/> </resource_set> <resource_set id="ordered-set-3" sequential="false"> <resource_ref id="E"/> <resource_ref id="F"/> </resource_set> </rsc_order> </constraints> ------- ====== == Three Resources Sets == .Visual representation of the start order for the three sets defined above image::images/three-sets.png["Three ordered sets",width="16cm",height="7.5cm",align="center"] [[s-resource-sets-collocation]] == Collocating Sets of Resources == Another common situation is for an administrator to create a set of collocated resources. Previously this was possible either by defining a resource group (See <<group-resources>>) which could not always accurately express the design; or by defining each relationship as an individual constraint, causing a constraint explosion as the number of resources and combinations grew. .A chain of collocated resources ====== [source,XML] ------- <constraints> <rsc_colocation id="coloc-1" rsc="B" with-rsc="A" score="INFINITY"/> <rsc_colocation id="coloc-2" rsc="C" with-rsc="B" score="INFINITY"/> <rsc_colocation id="coloc-3" rsc="D" with-rsc="C" score="INFINITY"/> </constraints> ------- ====== To make things easier, we allow an alternate form of colocation constraints using +resource_sets+. Just like the expanded version, a resource that can't be active also prevents any resource that must be collocated with it from being active. For example, if +B was+ not able to run, then both +C (+and by inference +D)+ must also remain stopped. .The equivalent colocation chain expressed using +resource_sets+ ====== [source,XML] ------- <constraints> <rsc_colocation id="coloc-1" score="INFINITY" > <resource_set id="collocated-set-example" sequential="true"> <resource_ref id="A"/> <resource_ref id="B"/> <resource_ref id="C"/> <resource_ref id="D"/> </resource_set> </rsc_colocation> </constraints> ------- ====== [NOTE] Resource sets have the same colocation semantics as groups. .A group resource with the equivalent colocation rules [source,XML] ------- <group id="dummy"> <primitive id="A" .../> <primitive id="B" .../> <primitive id="C" .../> <primitive id="D" .../> </group> ------- This notation can also be used in this context to tell the cluster that a set of resources must all be located with a common peer, but have no dependencies on each other. 
In this scenario, unlike the previous, +B would+ be allowed to remain active even if +A or+ +C+ (or both) were inactive. .Using colocation sets to specify a common peer. ====== [source,XML] ------- <constraints> <rsc_colocation id="coloc-1" score="INFINITY" > <resource_set id="collocated-set-1" sequential="false"> <resource_ref id="A"/> <resource_ref id="B"/> <resource_ref id="C"/> </resource_set> <resource_set id="collocated-set-2" sequential="true"> <resource_ref id="D"/> </resource_set> </rsc_colocation> </constraints> ------- ====== Of course there is no limit to the number and size of the sets used. The only thing that matters is that in order for any member of set N to be active, all the members of set N+1 must also be active (and naturally on the same node); and if a set has +sequential="true"+, then in order for member M to be active, member M+1 must also be active. You can even specify the role in which the members of a set must be in using the set's role attribute. .A colocation chain where the members of the middle set have no inter-dependencies and the last has master status. ====== [source,XML] ------- <constraints> <rsc_colocation id="coloc-1" score="INFINITY" > <resource_set id="collocated-set-1" sequential="true"> <resource_ref id="A"/> <resource_ref id="B"/> </resource_set> <resource_set id="collocated-set-2" sequential="false"> <resource_ref id="C"/> <resource_ref id="D"/> <resource_ref id="E"/> </resource_set> <resource_set id="collocated-set-2" sequential="true" role="Master"> <resource_ref id="F"/> <resource_ref id="G"/> </resource_set> </rsc_colocation> </constraints> ------- ====== == Another Three Resources Sets == .Visual representation of a colocation chain where the members of the middle set have no inter-dependencies image::images/three-sets-complex.png["Colocation chain",width="16cm",height="9cm",align="center"] diff --git a/doc/Pacemaker_Explained/en-US/Ch-Nodes.txt b/doc/Pacemaker_Explained/en-US/Ch-Nodes.txt index ee6261fdf5..73a2ec2c6d 100644 --- a/doc/Pacemaker_Explained/en-US/Ch-Nodes.txt +++ b/doc/Pacemaker_Explained/en-US/Ch-Nodes.txt @@ -1,158 +1,158 @@ = Cluster Nodes = == Defining a Cluster Node == Each node in the cluster will have an entry in the nodes section containing its UUID, uname, and type. .Example cluster node entry ====== [source,XML] <node id="1186dc9a-324d-425a-966e-d757e693dc86" uname="pcmk-1" type="normal"/> ====== In normal circumstances, the admin should let the cluster populate this information automatically from the communications and membership data. However one can use the `crm_uuid` tool to read an existing UUID or define a value before the cluster starts. [[s-node-attributes]] == Describing a Cluster Node == Beyond the basic definition of a node the administrator can also describe the node's attributes, such as how much RAM, disk, what OS or kernel version it has, perhaps even its physical location. This information can then be used by the cluster when deciding where to place resources. For more information on the use of node attributes, -see the section on <<ch-rules>>. +see <<ch-rules>>. Node attributes can be specified ahead of time or populated later, when the cluster is running, using `crm_attribute`. 
Below is what the node's definition would look like if the admin ran the command: .The result of using crm_attribute to specify which kernel pcmk-1 is running ====== [source,C] ------- # crm_attribute --type nodes --node-uname pcmk-1 --attr-name kernel --attr-value `uname -r` ------- [source,XML] ------- <node uname="pcmk-1" type="normal" id="1186dc9a-324d-425a-966e-d757e693dc86"> <instance_attributes id="nodes-1186dc9a-324d-425a-966e-d757e693dc86"> <nvpair id="kernel-1186dc9a-324d-425a-966e-d757e693dc86" name="kernel" value="2.6.16.46-0.4-default"/> </instance_attributes> </node> ------- ====== A simpler way to determine the current value of an attribute is to use the `crm_attribute` command again: [source,C] # crm_attribute --type nodes --node-uname pcmk-1 --attr-name kernel --get-value By specifying `--type nodes` the admin tells the cluster that this attribute is persistent. There are also transient attributes, kept in the status section, which are "forgotten" whenever the node rejoins the cluster. The cluster uses this area to store a record of how many times a resource has failed on that node, but administrators can also read and write to this section by specifying `--type status`. == Adding a New Cluster Node == === Corosync === Adding a new node is as simple as installing Corosync and Pacemaker, and copying '/etc/corosync/corosync.conf' and '/etc/ais/authkey' (if it exists) from an existing node. You may need to modify the +mcastaddr+ option to match the new node's IP address. If a log message containing "Invalid digest" appears from Corosync, the keys are not consistent between the machines. === Heartbeat === Provided you specified +autojoin any+ in 'ha.cf', adding a new node is as simple as installing heartbeat and copying 'ha.cf' and 'authkeys' from an existing node. If you don't want to use +autojoin+, then after setting up 'ha.cf' and 'authkeys', you must use the `hb_addnode` command before starting the new node. == Removing a Cluster Node == === Corosync === Because the messaging and membership layers are the authoritative source for cluster nodes, deleting them from the CIB is not a reliable solution. First one must arrange for corosync to forget about the node (_pcmk-1_ in the example below). On the host to be removed: . Find and record the node's Corosync id: `crm_node -i` . Stop the cluster: `/etc/init.d/corosync stop` Next, from one of the remaining active cluster nodes: . Tell the cluster to forget about the removed host: + [source,C] # crm_node -R $COROSYNC_ID + . Only now is it safe to delete the node from the CIB with: + [source,C] # cibadmin --delete --obj_type nodes --crm_xml '<node uname="_pcmk-1_"/>' # cibadmin --delete --obj_type status --crm_xml '<node_state uname="_pcmk-1_"/>' === Heartbeat === Because the messaging and membership layers are the authoritative source for cluster nodes, deleting them from the CIB is not a reliable solution. First one must arrange for heartbeat to forget about the node (pcmk-1 in the example below). To do this, shut down heartbeat on the node and then, from one of the remaining active cluster nodes, run: [source,C] # hb_delnode pcmk-1 Only then is it safe to delete the node from the CIB with: [source,C] ----- # cibadmin --delete --obj_type nodes --crm_xml '<node uname="pcmk-1"/>' # cibadmin --delete --obj_type status --crm_xml '<node_state uname="pcmk-1"/>' ----- == Replacing a Cluster Node == === Corosync === The five-step guide to replacing an existing cluster node: . Make sure the old node is completely stopped .
Give the new machine the same hostname and IP address as the old one . Install the cluster software :-) . Copy '/etc/corosync/corosync.conf' and '/etc/ais/authkey' (if it exists) to the new node . Start the new cluster node If a log message containing "Invalid digest" appears from Corosync, the keys are not consistent between the machines. === Heartbeat === The seven-step guide to replacing an existing cluster node: . Make sure the old node is completely stopped . Give the new machine the same hostname as the old one . Go to an active cluster node and look up the UUID for the old node in '/var/lib/heartbeat/hostcache' . Install the cluster software . Copy 'ha.cf' and 'authkeys' to the new node . On the new node, populate its UUID using `crm_uuid -w` and the UUID from step 3 . Start the new cluster node diff --git a/doc/Pacemaker_Explained/en-US/Ch-Options.txt b/doc/Pacemaker_Explained/en-US/Ch-Options.txt index d40a09b812..1f27293502 100644 --- a/doc/Pacemaker_Explained/en-US/Ch-Options.txt +++ b/doc/Pacemaker_Explained/en-US/Ch-Options.txt @@ -1,285 +1,286 @@ = Cluster Options = == Special Options == indexterm:[Special Cluster Options] indexterm:[Cluster Options,Special Options] The reason for these fields to be placed at the top level instead of with the rest of cluster options is simply a matter of parsing. These options are used by the configuration database which is, by design, mostly ignorant of the content it holds. So the decision was made to place them in an easy-to-find location. == Configuration Version == indexterm:[Configuration Version, Cluster Option] indexterm:[Cluster Options,Configuration Version] When a node joins the cluster, the cluster will perform a check to see who has the best configuration based on the fields below. It then asks the node with the highest (+admin_epoch+, +epoch+, +num_updates+) tuple to replace the configuration on all the nodes - which makes setting them, and setting them correctly, very important. .Configuration Version Properties -[width="95%",cols="1m,5<",options="header",align="center"] +[width="95%",cols="2m,5<",options="header",align="center"] |========================================================= |Field |Description | admin_epoch | indexterm:[admin_epoch Cluster Option] indexterm:[Cluster Options,admin_epoch] Never modified by the cluster. Use this to make the configurations on any inactive nodes obsolete. _Never set this value to zero_; in such cases the cluster cannot tell the difference between your configuration and the "empty" one used when nothing is found on disk. | epoch | indexterm:[epoch Cluster Option] indexterm:[Cluster Options,epoch] Incremented every time the configuration is updated (usually by the admin) | num_updates | indexterm:[num_updates Cluster Option] indexterm:[Cluster Options,num_updates] Incremented every time the configuration or status is updated (usually by the cluster) |========================================================= == Other Fields == .Properties Controlling Validation -[width="95%",cols="1m,5<",options="header",align="center"] +[width="95%",cols="2m,5<",options="header",align="center"] |========================================================= |Field |Description | validate-with | indexterm:[validate-with Cluster Option] indexterm:[Cluster Options,validate-with] Determines the type of validation being done on the configuration. If set to "none", the cluster will not verify that updates conform to the DTD (nor reject ones that don't).
This option can be useful when operating a mixed version cluster during an upgrade. |========================================================= == Fields Maintained by the Cluster == .Properties Maintained by the Cluster -[width="95%",cols="1m,5<",options="header",align="center"] +[width="95%",cols="2m,5<",options="header",align="center"] |========================================================= |Field |Description |cib-last-written | indexterm:[cib-last-written Cluster Fields] indexterm:[Cluster Fields,cib-last-written] Indicates when the configuration was last written to disk. Informational purposes only. |dc-uuid | indexterm:[dc-uuid Cluster Fields] indexterm:[Cluster Fields,dc-uuid] Indicates which cluster node is the current leader. Used by the cluster when placing resources and determining the order of some events. |have-quorum | indexterm:[have-quorum Cluster Fields] indexterm:[Cluster Fields,have-quorum] Indicates if the cluster has quorum. If false, this may mean that the cluster cannot start resources or fence other nodes. See +no-quorum-policy+ below. |========================================================= Note that although these fields can be written to by the admin, in most cases the cluster will overwrite any values specified by the admin with the "correct" ones. To change the +admin_epoch+, for example, one would use: -`cibadmin --modify --crm_xml ‘<cib admin_epoch="42"/>'` +[source,C] +# cibadmin --modify --crm_xml ‘<cib admin_epoch="42"/>' A complete set of fields will look something like this: .An example of the fields set for a cib object ====== [source,XML] ------- <cib have-quorum="true" validate-with="pacemaker-1.0" admin_epoch="1" epoch="12" num_updates="65" dc-uuid="ea7d39f4-3b94-4cfa-ba7a-952956daabee"> ------- ====== == Cluster Options == Cluster options, as you might expect, control how the cluster behaves when confronted with certain situations. They are grouped into sets and, in advanced configurations, there may be more than one. footnote:[This will be described later in the section on <<ch-rules>> where we will show how to have the cluster use different sets of options during working hours (when downtime is usually to be avoided at all costs) than it does during the weekends (when resources can be moved to the their preferred hosts without bothering end users)] For now we will describe the simple case where each option is present at most once. == Available Cluster Options == .Cluster Options -[width="95%",cols="5m,2m,13",options="header",align="center"] +[width="95%",cols="5m,2,11<",options="header",align="center"] |========================================================= |Option |Default |Description | batch-limit | 30 | indexterm:[batch-limit Cluster Options] indexterm:[Cluster Options,batch-limit] The number of jobs that the TE is allowed to execute in parallel. The "correct" value will depend on the speed and load of your network and cluster nodes. | migration-limit | -1 (unlimited) | indexterm:[migration-limit Cluster Options] indexterm:[Cluster Options,migration-limit] The number of migration jobs that the TE is allowed to execute in parallel on a node. | no-quorum-policy | stop | indexterm:[no-quorum-policy Cluster Options] indexterm:[Cluster Options,no-quorum-policy] What to do when the cluster does not have quorum. 
Allowed values: * ignore - continue all resource management * freeze - continue resource management, but don't recover resources from nodes not in the affected partition * stop - stop all resources in the affected cluster partition * suicide - fence all nodes in the affected cluster partition | symmetric-cluster | TRUE | indexterm:[symmetric-cluster Cluster Options] indexterm:[Cluster Options,symmetric-cluster] Can all resources run on any node by default? | stonith-enabled | TRUE | indexterm:[stonith-enabled Cluster Options] indexterm:[Cluster Options,stonith-enabled] Should failed nodes and nodes with resources that can't be stopped be shot? If you value your data, set up a STONITH device and enable this. If true, or unset, the cluster will refuse to start resources unless one or more STONITH resources have been configured also. | stonith-action | reboot | indexterm:[stonith-action Cluster Options] indexterm:[Cluster Options,stonith-action] Action to send to STONITH device. Allowed values: reboot, off. The value 'poweroff' is also allowed, but is only used for legacy devices. | cluster-delay | 60s | indexterm:[cluster-delay Cluster Options] indexterm:[Cluster Options,cluster-delay] Round trip delay over the network (excluding action execution). The "correct" value will depend on the speed and load of your network and cluster nodes. | stop-orphan-resources | TRUE | indexterm:[stop-orphan-resources Cluster Options] indexterm:[Cluster Options,stop-orphan-resources] Should deleted resources be stopped? | stop-orphan-actions | TRUE | indexterm:[stop-orphan-actions Cluster Options] indexterm:[Cluster Options,stop-orphan-actions] Should deleted actions be cancelled? | start-failure-is-fatal | TRUE | indexterm:[start-failure-is-fatal Cluster Options] indexterm:[Cluster Options,start-failure-is-fatal] When set to FALSE, the cluster will instead use the resource's +failcount+ and value for +resource-failure-stickiness+. | pe-error-series-max | -1 (all) | indexterm:[pe-error-series-max Cluster Options] indexterm:[Cluster Options,pe-error-series-max] The number of PE inputs resulting in ERRORs to save. Used when reporting problems. | pe-warn-series-max | -1 (all) | indexterm:[pe-warn-series-max Cluster Options] indexterm:[Cluster Options,pe-warn-series-max] The number of PE inputs resulting in WARNINGs to save. Used when reporting problems. | pe-input-series-max | -1 (all) | indexterm:[pe-input-series-max Cluster Options] indexterm:[Cluster Options,pe-input-series-max] The number of "normal" PE inputs to save. Used when reporting problems. |========================================================= You can always obtain an up-to-date list of cluster options, including their default values, by running the `pengine metadata` command. == Querying and Setting Cluster Options == indexterm:[Querying Cluster Options] indexterm:[Setting Cluster Options] indexterm:[Cluster Options,Querying] indexterm:[Cluster Options,Setting] Cluster options can be queried and modified using the `crm_attribute` tool. 
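Under the hood, cluster options are stored as +nvpair+ entries inside a +cluster_property_set+ in the CIB's +crm_config+ section (typically the set named +cib-bootstrap-options+, as seen in the multiple-attribute example later in this section). A sketch, with illustrative values only:

[source,XML]
-------
<crm_config>
  <cluster_property_set id="cib-bootstrap-options">
    <nvpair id="cib-bootstrap-options-cluster-delay" name="cluster-delay" value="60s"/>
    <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="stop"/>
  </cluster_property_set>
</crm_config>
-------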
To get the current value of +cluster-delay+, simply use: [source,C] # crm_attribute --attr-name cluster-delay --get-value which is more simply written as [source,C] # crm_attribute --get-value -n cluster-delay If a value is found, you'll see a result like this: [source,C] # crm_attribute --get-value -n cluster-delay name=cluster-delay value=60s However, if no value is found, the tool will display an error: [source,C] # crm_attribute --get-value -n clusta-deway` name=clusta-deway value=(null) Error performing operation: The object/attribute does not exist To use a different value, eg. +30+, simply run: [source,C] # crm_attribute --attr-name cluster-delay --attr-value 30s To go back to the cluster's default value you can delete the value, for example with this command: [source,C] # crm_attribute --attr-name cluster-delay --delete-attr == When Options are Listed More Than Once == If you ever see something like the following, it means that the option you're modifying is present more than once. .Deleting an option that is listed twice ======= [source,C] ------ # crm_attribute --attr-name batch-limit --delete-attr Multiple attributes match name=batch-limit in crm_config: Value: 50 (set=cib-bootstrap-options, id=cib-bootstrap-options-batch-limit) Value: 100 (set=custom, id=custom-batch-limit) Please choose from one of the matches above and supply the 'id' with --attr-id ------- ======= In such cases follow the on-screen instructions to perform the requested action. To determine which value is currently being used by -the cluster, please refer to the section on <<ch-rules>>. +the cluster, please refer to <<ch-rules>>. diff --git a/doc/Pacemaker_Explained/en-US/Ch-Rules.txt b/doc/Pacemaker_Explained/en-US/Ch-Rules.txt index 06c58ed3a6..f9394adf98 100644 --- a/doc/Pacemaker_Explained/en-US/Ch-Rules.txt +++ b/doc/Pacemaker_Explained/en-US/Ch-Rules.txt @@ -1,546 +1,546 @@ = Rules = [[ch-rules]] Rules can be used to make your configuration more dynamic. One common example is to set one value for +resource-stickiness+ during working hours, to prevent resources from being moved back to their most preferred location, and another on weekends when no-one is around to notice an outage. Another use of rules might be to assign machines to different processing groups (using a node attribute) based on time and to then use that attribute when creating location constraints. Each rule can contain a number of expressions, date-expressions and even other rules. The results of the expressions are combined based on the rule's +boolean-op+ field to determine if the rule ultimately evaluates to +true+ or +false+. What happens next depends on the context in which the rule is being used. .Properties of a Rule -[width="95%",cols="1m,5<",options="header",align="center"] +[width="95%",cols="2m,5<",options="header",align="center"] |========================================================= |Field |Description |role indexterm:[role Rule Property] indexterm:[Rule,Properties,role] |Limits the rule to apply only when the resource is in that role. Allowed values: _Started_, +Slave,+ and +Master+. NOTE: A rule with +role="Master"+ can not determine the initial location of a clone instance. It will only affect which of the active instances will be promoted. |score indexterm:[score,Rule Property] indexterm:[Rule,Properties,score] |The score to apply if the rule evaluates to +true+. Limited to use in rules that are part of location constraints. 
|score-attribute indexterm:[score-attribute Rule Property] indexterm:[Rule,Properties,score-attribute] |The node attribute to look up and use as a score if the rule evaluates to +true+. Limited to use in rules that are part of location constraints. |boolean-op indexterm:[boolean-op Rule Property] indexterm:[Rule,Properties,boolean-op] |How to combine the result of multiple expression objects. Allowed values: _and_ and +or+. |========================================================= == Node Attribute Expressions == indexterm:[Node,Attribute Expressions] Expression objects are used to control a resource based on the attributes defined by a node or nodes. In addition to any attributes added by the administrator, each node has a built-in node attribute called +#uname+ that can also be used. .Properties of an Expression -[width="95%",cols="1m,5<",options="header",align="center"] +[width="95%",cols="2m,5<",options="header",align="center"] |========================================================= |Field |Description |value indexterm:[value Expression Property] indexterm:[Expression Properties,value] |User supplied value for comparison |attribute indexterm:[attribute Expression Property] indexterm:[Expression Properties,attribute] |The node attribute to test |type indexterm:[type,Expression Property] indexterm:[Expression Properties,type] |Determines how the value(s) should be tested. Allowed values: _string_, +integer+, +version+ |operation indexterm:[operation Expression Property] indexterm:[Expression Properties,operation] |The comparison to perform. Allowed values: * +lt+ - True if the node attribute's value is less than +value+ * +gt+ - True if the node attribute's value is greater than +value+ * +lte+ - True if the node attribute's value is less than or equal to +value+ * +gte+ - True if the node attribute's value is greater than or equal to +value+ * +eq+ - True if the node attribute's value is equal to +value+ * +ne+ - True if the node attribute's value is not equal to +value+ * +defined+ - True if the node has the named attribute * +not_defined+ - True if the node does not have the named attribute |========================================================= == Time/Date Based Expressions == indexterm:[Time Based Expressions] indexterm:[Expression,Time/Date Based] As the name suggests, +date_expressions+ are used to control a resource or cluster option based on the current date/time. They can contain an optional +date_spec+ and/or +duration+ object depending on the context. .Properties of a Date Expression -[width="95%",cols="1m,5<",options="header",align="center"] +[width="95%",cols="2m,5<",options="header",align="center"] |========================================================= |Field |Description |start |A date/time conforming to the ISO8601 specification. |end |A date/time conforming to the ISO8601 specification. Can be inferred by supplying a value for +start+ and a +duration+. |operation |Compares the current date/time with the start and/or end date, depending on the context. Allowed values: * +gt+ - True if the current date/time is after +start+ * +lt+ - True if the current date/time is before +end+ * +in-range+ - True if the current date/time is after +start+ and before +end+ * +date-spec+ - performs a cron-like comparison to the current date/time |========================================================= [NOTE] ====== Because the comparisons (except for +date_spec+) include the time, the +eq+, +neq+, +gte+ and +lte+ operators have not been implemented. 
====== === Date Specifications === indexterm:[Date Specifications] +date_spec+ objects are used to create cron-like expressions relating to time. Each field can contain a single number or a single range. Instead of defaulting to zero, any field not supplied is ignored. For example, +monthdays="1"+ matches the first day of every month and +hours="09-17"+ matches the hours between 9am and 5pm (inclusive). However, at this time one cannot specify +weekdays="1,2"+ or +weekdays="1-2,5-6"+ since they contain multiple ranges. Depending on demand, this may be implemented in a future release. .Properties of a Date Spec -[width="95%",cols="1m,5<",options="header",align="center"] +[width="95%",cols="2m,5<",options="header",align="center"] |========================================================= |Field |Description |id indexterm:[id,Date Spec Property] indexterm:[Date Spec Properties,id] |A unique name for the date |hours indexterm:[hours Date Spec Property] indexterm:[Date Spec Properties,hours] |Allowed values: 0-23 |monthdays indexterm:[monthdays Date Spec Property] indexterm:[Date Spec Properties,monthdays] |Allowed values: 0-31 (depending on month and year) |weekdays indexterm:[weekdays Date Spec Property] indexterm:[Date Spec Properties,weekdays] |Allowed values: 1-7 (1=Monday, 7=Sunday) |yeardays indexterm:[yeardays Date Spec Property] indexterm:[Date Spec Properties,yeardays] |Allowed values: 1-366 (depending on the year) |months indexterm:[months Date Spec Property] indexterm:[Date Spec Properties,months] |Allowed values: 1-12 |weeks indexterm:[weeks Date Spec Property] indexterm:[Date Spec Properties,weeks] |Allowed values: 1-53 (depending on weekyear) |years indexterm:[years Date Spec Property] indexterm:[Date Spec Properties,years] |Year according the Gregorian calendar |weekyears indexterm:[weekyears Date Spec Property] indexterm:[Date Spec Properties,weekyears] |May differ from Gregorian years; Eg. +2005-001 Ordinal+ is also +2005-01-01 Gregorian+ is also +2004-W53-6 Weekly+ |moon indexterm:[moon Date Spec Property] indexterm:[Date Spec Properties,moon] |Allowed values: 0-7 (0 is new, 4 is full moon). Seriously, you can use this. This was implemented to demonstrate the ease with which new comparisons could be added. |========================================================= === Durations === indexterm:[Durations Expressions] indexterm:[Expressions,Durations] Durations are used to calculate a value for +end+ when one is not supplied to in_range operations. They contain the same fields as +date_spec+ objects but without the limitations (ie. you can have a duration of 19 months). Like +date_specs+, any field not supplied is ignored. == Sample Time Based Expressions == A small sample of how time based expressions can be used. 
//// On older versions of asciidoc, the [source] directive makes the title dissappear //// .True if now is any time in the year 2005 ==== [source,XML] ---- <rule id="rule1"> <date_expression id="date_expr1" start="2005-001" operation="in_range"> <duration years="1"/> </date_expression> </rule> ---- ==== .Equivalent expression ==== [source,XML] ---- <rule id="rule2"> <date_expression id="date_expr2" operation="date_spec"> <date_spec years="2005"/> </date_expression> </rule> ---- ==== .9am-5pm, Mon-Friday ==== [source,XML] ------- <rule id="rule3"> <date_expression id="date_expr3" operation="date_spec"> <date_spec hours="9-16" days="1-5"/> </date_expression> </rule> ------- ==== Please note that the +16+ matches up to +16:59:59+, as the numeric value (hour) still matches! .9am-6pm, Mon-Friday, or all day saturday ==== [source,XML] ------- <rule id="rule4" boolean_op="or"> <date_expression id="date_expr4-1" operation="date_spec"> <date_spec hours="9-16" days="1-5"/> </date_expression> <date_expression id="date_expr4-2" operation="date_spec"> <date_spec days="6"/> </date_expression> </rule> ------- ==== .9am-5pm or 9pm-12pm, Mon-Friday ==== [source,XML] ------- <rule id="rule5" boolean_op="and"> <rule id="rule5-nested1" boolean_op="or"> <date_expression id="date_expr5-1" operation="date_spec"> <date_spec hours="9-16"/> </date_expression> <date_expression id="date_expr5-2" operation="date_spec"> <date_spec hours="21-23"/> </date_expression> </rule> <date_expression id="date_expr5-3" operation="date_spec"> <date_spec days="1-5"/> </date_expression> </rule> ------- ==== .Mondays in March 2005 ==== [source,XML] ------- <rule id="rule6" boolean_op="and"> <date_expression id="date_expr6-1" operation="date_spec"> <date_spec weekdays="1"/> </date_expression> <date_expression id="date_expr6-2" operation="in_range" start="2005-03-01" end="2005-04-01"/> </rule> ------- ==== [NOTE] ====== Because no time is specified, 00:00:00 is implied. This means that the range includes all of 2005-03-01 but none of 2005-04-01. You may wish to write +end="2005-03-31T23:59:59"+ to avoid confusion. ====== .A full moon on Friday the 13th ===== [source,XML] ------- <rule id="rule7" boolean_op="and"> <date_expression id="date_expr7" operation="date_spec"> <date_spec weekdays="5" monthdays="13" moon="4"/> </date_expression> </rule> ------- ===== == Using Rules to Determine Resource Location == indexterm:[Rule,Determine Resource Location] indexterm:[Resource,Location, Determine by Rules] If the constraint's outer-most rule evaluates to +false+, the cluster treats the constraint as if it was not there. When the rule evaluates to +true+, the node's preference for running the resource is updated with the score associated with the rule. If this sounds familiar, its because you have been using a simplified syntax for location constraint rules already. 
Consider the following location constraint: .Prevent myApacheRsc from running on c001n03 ===== [source,XML] ------- <rsc_location id="dont-run-apache-on-c001n03" rsc="myApacheRsc" score="-INFINITY" node="c001n03"/> ------- ===== This constraint can be more verbosely written as: .Prevent myApacheRsc from running on c001n03 - expanded version ===== [source,XML] ------- <rsc_location id="dont-run-apache-on-c001n03" rsc="myApacheRsc"> <rule id="dont-run-apache-rule" score="-INFINITY"> <expression id="dont-run-apache-expr" attribute="#uname" operation="eq" value="c00n03"/> </rule> </rsc_location> ------- ===== The advantage of using the expanded form is that one can then add extra clauses to the rule, such as limiting the rule such that it only applies during certain times of the day or days of the week (this is discussed in subsequent sections). It also allows us to match on node properties other than its name. If we rated each machine's CPU power such that the cluster had the following nodes section: .A sample nodes section for use with score-attribute ===== [source,XML] ------- <nodes> <node id="uuid1" uname="c001n01" type="normal"> <instance_attributes id="uuid1-custom_attrs"> <nvpair id="uuid1-cpu_mips" name="cpu_mips" value="1234"/> </instance_attributes> </node> <node id="uuid2" uname="c001n02" type="normal"> <instance_attributes id="uuid2-custom_attrs"> <nvpair id="uuid2-cpu_mips" name="cpu_mips" value="5678"/> </instance_attributes> </node> </nodes> ------- ===== then we could prevent resources from running on underpowered machines with the rule [source,XML] ------- <rule id="need-more-power-rule" score="-INFINITY"> <expression id=" need-more-power-expr" attribute="cpu_mips" operation="lt" value="3000"/> </rule> ------- === Using +score-attribute+ Instead of +score+ === When using +score-attribute+ instead of +score+, each node matched by the rule has its score adjusted differently, according to its value for the named node attribute. Thus, in the previous example, if a rule used +score-attribute="cpu_mips"+, +c001n01+ would have its preference to run the resource increased by +1234+ whereas +c001n02+ would have its preference increased by +5678+. == Using Rules to Control Resource Options == Often some cluster nodes will be different from their peers; sometimes these differences (the location of a binary or the names of network interfaces) require resources to be configured differently depending on the machine they're hosted on. By defining multiple +instance_attributes+ objects for the resource and adding a rule to each, we can easily handle these special cases. In the example below, +mySpecialRsc+ will use eth1 and port 9999 when run on +node1+, eth2 and port 8888 on +node2+ and default to eth0 and port 9999 for all other nodes. 
.Defining different resource options based on the node name ===== [source,XML] ------- <primitive id="mySpecialRsc" class="ocf" type="Special" provider="me"> <instance_attributes id="special-node1" score="3"> <rule id="node1-special-case" score="INFINITY" > <expression id="node1-special-case-expr" attribute="#uname" operation="eq" value="node1"/> </rule> <nvpair id="node1-interface" name="interface" value="eth1"/> </instance_attributes> <instance_attributes id="special-node2" score="2" > <rule id="node2-special-case" score="INFINITY"> <expression id="node2-special-case-expr" attribute="#uname" operation="eq" value="node2"/> </rule> <nvpair id="node2-interface" name="interface" value="eth2"/> <nvpair id="node2-port" name="port" value="8888"/> </instance_attributes> <instance_attributes id="defaults" score="1" > <nvpair id="default-interface" name="interface" value="eth0"/> <nvpair id="default-port" name="port" value="9999"/> </instance_attributes> </primitive> ------- ===== The order in which +instance_attributes+ objects are evaluated is determined by their score (highest to lowest). If not supplied, score defaults to zero and objects with an equal score are processed in listed order. If the +instance_attributes+ object does not have a +rule+ or has a +rule+ that evaluates to +true+, then for any parameter the resource does not yet have a value for, the resource will use the parameter values defined by the +instance_attributes+ object. == Using Rules to Control Cluster Options == indexterm:[Rule,Controlling Cluster Options] indexterm:[Cluster Options,Controlled by Rules] Controlling cluster options is achieved in much the same manner as specifying different resource options on different nodes. The difference is that because they are cluster options, one cannot (or should not, because they won't work) use attribute based expressions. The following example illustrates how to set a different +resource-stickiness+ value during and outside of work hours. This allows resources to automatically move back to their most preferred hosts, but at a time that (in theory) does not interfere with business activities. .Change +resource-stickiness+ during working hours ===== [source,XML] ------- <rsc_defaults> <meta_attributes id="core-hours" score="2"> <rule id="core-hour-rule" score="0"> <date_expression id="nine-to-five-Mon-to-Fri" operation="date_spec"> <date_spec id="nine-to-five-Mon-to-Fri-spec" hours="9-16" weekdays="1-5"/> </date_expression> </rule> <nvpair id="core-stickiness" name="resource-stickiness" value="INFINITY"/> </meta_attributes> <meta_attributes id="after-hours" score="1" > <nvpair id="after-stickiness" name="resource-stickiness" value="0"/> </meta_attributes> </rsc_defaults> ------- ===== [[s-rules-recheck]] == Ensuring Time Based Rules Take Effect == A Pacemaker cluster is an event driven system. As such, it won't recalculate the best place for resources to run in unless something (like a resource failure or configuration change) happens. This can mean that a location constraint that only allows resource X to run between 9am and 5pm is not enforced. If you rely on time based rules, it is essential that you set the +cluster-recheck-interval+ option. This tells the cluster to periodically recalculate the ideal state of the cluster. For example, if you set +cluster-recheck-interval=5m+, then sometime between 9:00 and 9:05 the cluster would notice that it needs to start resource X, and between 17:00 and 17:05 it would realize that X needed to be stopped. 
Note that the timing of the actual start and stop actions depends on what else needs to be performed first.
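Since +cluster-recheck-interval+ is an ordinary cluster option, it can be set like any other option. As a sketch, expressed directly as CIB XML and using the same five-minute value as the example above:

[source,XML]
-------
<crm_config>
  <cluster_property_set id="cib-bootstrap-options">
    <nvpair id="cib-bootstrap-options-cluster-recheck-interval"
            name="cluster-recheck-interval" value="5m"/>
  </cluster_property_set>
</crm_config>
-------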