diff --git a/doc/Pacemaker_Explained/en-US/Ch-Constraints.txt b/doc/Pacemaker_Explained/en-US/Ch-Constraints.txt index ff5db377e9..15944d1094 100644 --- a/doc/Pacemaker_Explained/en-US/Ch-Constraints.txt +++ b/doc/Pacemaker_Explained/en-US/Ch-Constraints.txt @@ -1,825 +1,825 @@ = Resource Constraints = indexterm:[Resource,Constraints] == Scores == Scores of all kinds are integral to how the cluster works. Practically everything from moving a resource to deciding which resource to stop in a degraded cluster is achieved by manipulating scores in some way. Scores are calculated per resource and node. Any node with a negative score for a resource can't run that resource. The cluster places a resource on the node with the highest score for it. === Infinity Math === Pacemaker implements +INFINITY+ (or equivalently, ++INFINITY+) internally as a score of 1,000,000. Addition and subtraction with it follow these three basic rules: * Any value + +INFINITY+ = +INFINITY+ * Any value - +INFINITY+ = +-INFINITY+ * +INFINITY+ - +INFINITY+ = +-INFINITY+ [NOTE] ====== What if you want to use a score higher than 1,000,000? Typically this possibility arises when someone wants to base the score on some external metric that might go above 1,000,000. The short answer is you can't. The long answer is it is sometimes possible work around this limitation creatively. You may be able to set the score to some computed value based on the external metric rather than use the metric directly. For nodes, you can store the metric as a node attribute, and query the attribute when computing the score (possibly as part of a custom resource agent). ====== == Deciding Which Nodes a Resource Can Run On == indexterm:[Location Constraints] indexterm:[Resource,Constraints,Location] 'Location constraints' tell the cluster which nodes a resource can run on. There are two alternative strategies. One way is to say that, by default, resources can run anywhere, and then the location constraints specify nodes that are not allowed (an 'opt-out' cluster). The other way is to start with nothing able to run anywhere, and use location constraints to selectively enable allowed nodes (an 'opt-in' cluster). Whether you should choose opt-in or opt-out depends on your personal preference and the make-up of your cluster. If most of your resources can run on most of the nodes, then an opt-out arrangement is likely to result in a simpler configuration. On the other-hand, if most resources can only run on a small subset of nodes, an opt-in configuration might be simpler. === Location Properties === .Properties of a rsc_location Constraint [width="95%",cols="2m,1,5 ------- ====== === Symmetrical "Opt-Out" Clusters === indexterm:[Symmetrical Opt-Out Clusters] indexterm:[Cluster Type,Symmetrical Opt-Out] To create an opt-out cluster, start by allowing resources to run anywhere by default: ---- # crm_attribute --name symmetric-cluster --update true ---- Then start disabling nodes. The following fragment is the equivalent of the above opt-in configuration. .Opt-out location constraints for two resources ====== [source,XML] ------- ------- ====== [[node-score-equal]] === What if Two Nodes Have the Same Score === If two nodes have the same score, then the cluster will choose one. This choice may seem random and may not be what was intended, however the cluster was not given enough information to know any better. 
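To illustrate, a set of location constraints along these lines would give each resource an equal preference for two nodes. This is only a sketch: the resource names (+Webserver+, +Database+), node names (+sles-1+, +sles-2+) and +id+ values are illustrative, chosen to match the discussion below.

[source,XML]
-------
<constraints>
    <rsc_location id="loc-1" rsc="Webserver" node="sles-1" score="INFINITY"/>
    <rsc_location id="loc-2" rsc="Webserver" node="sles-2" score="INFINITY"/>
    <rsc_location id="loc-3" rsc="Database" node="sles-1" score="INFINITY"/>
    <rsc_location id="loc-4" rsc="Database" node="sles-2" score="INFINITY"/>
</constraints>
-------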
.Constraints where a resource prefers two nodes equally ====== [source,XML] ------- ------- ====== In the example above, assuming no other constraints and an inactive cluster, +Webserver+ would probably be placed on +sles-1+ and +Database+ on +sles-2+. It would likely have placed +Webserver+ based on the node's uname and +Database+ based on the desire to spread the resource load evenly across the cluster. However other factors can also be involved in more complex configurations. [[s-resource-ordering]] == Specifying the Order in which Resources Should Start/Stop == indexterm:[Resource,Constraints,Ordering] indexterm:[Resource,Start Order] indexterm:[Ordering Constraints] 'Ordering constraints' tell the cluster the order in which resources should start. [IMPORTANT] ==== Ordering constraints affect 'only' the ordering of resources; they do 'not' require that the resources be placed on the same node. If you want resources to be started on the same node 'and' in a specific order, you need both an ordering constraint 'and' a colocation constraint (see <>), or alternatively, a group (see <>). ==== === Ordering Properties === .Properties of a rsc_order Constraint [width="95%",cols="1m,1,4> resources. === Optional and mandatory ordering === Here is an example of ordering constraints where +Database+ 'must' start before +Webserver+, and +IP+ 'should' start before +Webserver+ if they both need to be started: .Optional and mandatory ordering constraints ====== [source,XML] ------- ------- ====== Because the above example lets +symmetrical+ default to TRUE, +Webserver+ must be stopped before +Database+ can be stopped, and +Webserver+ should be stopped before +IP+ if they both need to be stopped. [[s-resource-colocation]] == Placing Resources Relative to other Resources == indexterm:[Resource,Constraints,Colocation] indexterm:[Resource,Location Relative to other Resources] 'Colocation constraints' tell the cluster that the location of one resource depends on the location of another one. Colocation has an important side-effect: it affects the order in which resources are assigned to a node. Think about it: You can't place A relative to B unless you know where B is. footnote:[ While the human brain is sophisticated enough to read the constraint in any order and choose the correct one depending on the situation, the cluster is not quite so smart. Yet. ] So when you are creating colocation constraints, it is important to consider whether you should colocate A with B, or B with A. Another thing to keep in mind is that, assuming A is colocated with B, the cluster will take into account A's preferences when deciding which node to choose for B. For a detailed look at exactly how this occurs, see http://clusterlabs.org/doc/Colocation_Explained.pdf[Colocation Explained]. [IMPORTANT] ==== Colocation constraints affect 'only' the placement of resources; they do 'not' require that the resources be started in a particular order. If you want resources to be started on the same node 'and' in a specific order, you need both an ordering constraint (see <>) 'and' a colocation constraint, or alternatively, a group (see <>). ==== === Colocation Properties === .Properties of a rsc_colocation Constraint [width="95%",cols="2m,5<",options="header",align="center"] |========================================================= |Field |Description |id |A unique name for the constraint. 
indexterm:[id,Colocation Constraints] indexterm:[Constraints,Colocation,id] |rsc |The name of a resource that should be located relative to +with-rsc+. indexterm:[rsc,Colocation Constraints] indexterm:[Constraints,Colocation,rsc] |with-rsc |The name of the resource used as the colocation target. The cluster will decide where to put this resource first and then decide where to put +rsc+. indexterm:[with-rsc,Colocation Constraints] indexterm:[Constraints,Colocation,with-rsc] |score |Positive values indicate the resources should run on the same node. Negative values indicate the resources should run on different nodes. Values of \+/- +INFINITY+ change "should" to "must". indexterm:[score,Colocation Constraints] indexterm:[Constraints,Colocation,score] |========================================================= === Mandatory Placement === Mandatory placement occurs when the constraint's score is ++INFINITY+ or +-INFINITY+. In such cases, if the constraint can't be satisfied, then the +rsc+ resource is not permitted to run. For +score=INFINITY+, this includes cases where the +with-rsc+ resource is not active. If you need resource +A+ to always run on the same machine as resource +B+, you would add the following constraint: .Mandatory colocation constraint for two resources ==== [source,XML] ==== Remember, because +INFINITY+ was used, if +B+ can't run on any of the cluster nodes (for whatever reason) then +A+ will not be allowed to run. Whether +A+ is running or not has no effect on +B+. Alternatively, you may want the opposite -- that +A+ 'cannot' run on the same machine as +B+. In this case, use +score="-INFINITY"+. .Mandatory anti-colocation constraint for two resources ==== [source,XML] ==== Again, by specifying +-INFINITY+, the constraint is binding. So if the only place left to run is where +B+ already is, then +A+ may not run anywhere. As with +INFINITY+, +B+ can run even if +A+ is stopped. However, in this case +A+ also can run if +B+ is stopped, because it still meets the constraint of +A+ and +B+ not running on the same node. === Advisory Placement === If mandatory placement is about "must" and "must not", then advisory placement is the "I'd prefer if" alternative. For constraints with scores greater than +-INFINITY+ and less than +INFINITY+, the cluster will try to accommodate your wishes but may ignore them if the alternative is to stop some of the cluster resources. As in life, where if enough people prefer something it effectively becomes mandatory, advisory colocation constraints can combine with other elements of the configuration to behave as if they were mandatory. .Advisory colocation constraint for two resources ==== [source,XML] ==== [[s-resource-sets]] == Resource Sets == 'Resource sets' allow multiple resources to be affected by a single constraint. .A set of 3 resources ==== [source,XML] ---- ---- ==== Resource sets are valid inside +rsc_location+, +rsc_order+ (see <>), +rsc_colocation+ (see <>), and +rsc_ticket+ (see <>) constraints. A resource set has a number of properties that can be set, though not all have an effect in all contexts. 
.Properties of a resource_set [width="95%",cols="2m,1,5 ------- ====== .Visual representation of the four resources' start order for the above constraints image::images/resource-set.png["Ordered set",width="16cm",height="2.5cm",align="center"] === Ordered Set === To simplify this situation, resource sets (see <>) can be used within ordering constraints: .A chain of ordered resources expressed as a set ====== [source,XML] ------- ------- ====== While the set-based format is not less verbose, it is significantly easier to get right and maintain. [IMPORTANT] ========= If you use a higher-level tool, pay attention to how it exposes this functionality. Depending on the tool, creating a set +A B+ may be equivalent to +A then B+, or +B then A+. ========= === Ordering Multiple Sets === The syntax can be expanded to allow ordered sets of (un)ordered resources. In the example below, +A+ and +B+ can both start in parallel, as can +C+ and +D+, however +C+ and +D+ can only start once _both_ +A+ _and_ +B+ are active. .Ordered sets of unordered resources ====== [source,XML] ------- ------- ====== .Visual representation of the start order for two ordered sets of unordered resources image::images/two-sets.png["Two ordered sets",width="13cm",height="7.5cm",align="center"] Of course either set -- or both sets -- of resources can also be internally ordered (by setting +sequential="true"+) and there is no limit to the number of sets that can be specified. .Advanced use of set ordering - Three ordered sets, two of which are internally unordered ====== [source,XML] ------- ------- ====== .Visual representation of the start order for the three sets defined above image::images/three-sets.png["Three ordered sets",width="16cm",height="7.5cm",align="center"] [IMPORTANT] ==== An ordered set with +sequential=false+ makes sense only if there is another set in the constraint. Otherwise, the constraint has no effect. ==== === Resource Set OR Logic === The unordered set logic discussed so far has all been "AND" logic. To illustrate this take the 3 resource set figure in the previous section. Those sets can be expressed, +(A and B) then \(C) then (D) then (E and F)+. Say for example we want to change the first set, +(A and B)+, to use "OR" logic so the sets look like this: +(A or B) then \(C) then (D) then (E and F)+. This functionality can be achieved through the use of the +require-all+ option. This option defaults to TRUE which is why the "AND" logic is used by default. Setting +require-all=false+ means only one resource in the set needs to be started before continuing on to the next set. .Resource Set "OR" logic: Three ordered sets, where the first set is internally unordered with "OR" logic ====== [source,XML] ------- ------- ====== [IMPORTANT] ==== An ordered set with +require-all=false+ makes sense only in conjunction with +sequential=false+. Think of it like this: +sequential=false+ modifies the set to be an unordered set using "AND" logic by default, and adding +require-all=false+ flips the unordered set's "AND" logic to "OR" logic. ==== [[s-resource-sets-colocation]] == Colocating Sets of Resources == Another common situation is for an administrator to create a set of colocated resources. One way to do this would be to define a resource group (see <>), but that cannot always accurately express the desired state. Another way would be to define each relationship as an individual constraint, but that causes a constraint explosion as the number of resources and combinations grow. 
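To see why, consider the individual-constraint approach sketched below, in which every pairwise relationship becomes its own +rsc_colocation+ element (the resource names +A+ through +D+ and the +id+ values are placeholders). Each additional resource adds yet another constraint to maintain.

[source,XML]
-------
<constraints>
    <rsc_colocation id="coloc-1" rsc="D" with-rsc="C" score="INFINITY"/>
    <rsc_colocation id="coloc-2" rsc="C" with-rsc="B" score="INFINITY"/>
    <rsc_colocation id="coloc-3" rsc="B" with-rsc="A" score="INFINITY"/>
</constraints>
-------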
An example of this approach: .Chain of colocated resources ====== [source,XML] ------- ------- ====== To make things easier, resource sets (see <>) can be used within colocation constraints. As with the chained version, a resource that can't be active prevents any resource that must be colocated with it from being active. For example, if +B+ is not able to run, then both +C+ and by inference +D+ must also remain stopped. Here is an example +resource_set+: .Equivalent colocation chain expressed using +resource_set+ ====== [source,XML] ------- ------- ====== [IMPORTANT] ========= If you use a higher-level tool, pay attention to how it exposes this functionality. Depending on the tool, creating a set +A B+ may be equivalent to +A with B+, or +B with A+. ========= This notation can also be used to tell the cluster that a set of resources must all be located with a common peer, but have no dependencies on each other. In this scenario, unlike the previous, +B+ 'would' be allowed to remain active even if +A+ or +C+ (or both) were inactive. .Using colocated sets to specify a common peer ====== [source,XML] ------- ------- ====== [IMPORTANT] ==== A colocated set with +sequential=false+ makes sense only if there is another set in the constraint. Otherwise, the constraint has no effect. ==== There is no inherent limit to the number and size of the sets used. The only thing that matters is that in order for any member of one set in the constraint to be active, all members of sets listed after it must also be active (and naturally on the same node); and if a set has +sequential="true"+, then in order for one member of that set to be active, all members listed after it must also be active. You can even specify the role in which the members of a set must be in using the set's +role+ attribute. .A colocation chain where the members of the middle set have no interdependencies and the last has master status. ====== [source,XML] ------- ------- ====== .Visual representation of a colocation chain where the members of the middle set have no inter-dependencies image::images/three-sets-complex.png["Colocation chain",width="16cm",height="9cm",align="center"] [NOTE] ==== Unlike ordered sets, colocated sets do not use the +require-all+ option. ==== diff --git a/doc/Pacemaker_Explained/en-US/Ch-Resources.txt b/doc/Pacemaker_Explained/en-US/Ch-Resources.txt index ad60b1916b..2d988f09bf 100644 --- a/doc/Pacemaker_Explained/en-US/Ch-Resources.txt +++ b/doc/Pacemaker_Explained/en-US/Ch-Resources.txt @@ -1,844 +1,846 @@ = Cluster Resources = == What is a Cluster Resource? == indexterm:[Resource] A resource is a service made highly available by a cluster. The simplest type of resource, a 'primitive' resource, is described in this chapter. More complex forms, such as groups and clones, are described in later chapters. Every primitive resource has a 'resource agent'. A resource agent is an external program that abstracts the service it provides and present a consistent view to the cluster. This allows the cluster to be agnostic about the resources it manages. The cluster doesn't need to understand how the resource works because it relies on the resource agent to do the right thing when given a `start`, `stop` or `monitor` command. For this reason, it is crucial that resource agents are well-tested. Typically, resource agents come in the form of shell scripts. However, they can be written using any technology (such as C, Python or Perl) that the author is comfortable with. 
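For orientation, a primitive resource is declared by naming the resource agent that manages it, along with any parameters the agent needs. The fragment below is only a sketch (the resource +id+, agent and parameter value are illustrative); the individual properties are described in the rest of this chapter.

[source,XML]
-------
<primitive id="Public-IP" class="ocf" provider="heartbeat" type="IPaddr2">
    <instance_attributes id="Public-IP-params">
        <nvpair id="Public-IP-ip" name="ip" value="192.0.2.2"/>
    </instance_attributes>
</primitive>
-------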
[[s-resource-supported]] == Resource Classes == indexterm:[Resource,class] Pacemaker supports several classes of agents: * OCF * LSB * Upstart * Systemd * Service * Fencing * Nagios Plugins === Open Cluster Framework === indexterm:[Resource,OCF] indexterm:[OCF,Resources] indexterm:[Open Cluster Framework,Resources] The OCF standard footnote:[See http://www.opencf.org/cgi-bin/viewcvs.cgi/specs/ra/resource-agent-api.txt?rev=HEAD -- at least as it relates to resource agents. The Pacemaker implementation has been somewhat extended from the OCF specs, but none of those changes are incompatible with the original OCF specification.] is basically an extension of the Linux Standard Base conventions for init scripts to: * support parameters, * make them self-describing, and * make them extensible OCF specs have strict definitions of the exit codes that actions must return. footnote:[ The resource-agents source code includes the `ocf-tester` script, which can be useful in this regard. ] The cluster follows these specifications exactly, and giving the wrong exit code will cause the cluster to behave in ways you will likely find puzzling and annoying. In particular, the cluster needs to distinguish a completely stopped resource from one which is in some erroneous and indeterminate state. Parameters are passed to the resource agent as environment variables, with the special prefix +OCF_RESKEY_+. So, a parameter which the user thinks of as +ip+ will be passed to the resource agent as +OCF_RESKEY_ip+. The number and purpose of the parameters is left to the resource agent; however, the resource agent should use the `meta-data` command to advertise any that it supports. The OCF class is the most preferred as it is an industry standard, highly flexible (allowing parameters to be passed to agents in a non-positional manner) and self-describing. For more information, see the http://www.linux-ha.org/wiki/OCF_Resource_Agents[reference] and <>. === Linux Standard Base === indexterm:[Resource,LSB] indexterm:[LSB,Resources] indexterm:[Linux Standard Base,Resources] LSB resource agents are those found in +/etc/init.d+. Generally, they are provided by the OS distribution and, in order to be used with the cluster, they must conform to the LSB Spec. footnote:[ See http://refspecs.linux-foundation.org/LSB_3.0.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html for the LSB Spec as it relates to init scripts. ] [WARNING] ==== Many distributions claim LSB compliance but ship with broken init scripts. For details on how to check whether your init script is LSB-compatible, see <>. Common problematic violations of the LSB standard include: * Not implementing the status operation at all * Not observing the correct exit status codes for `start/stop/status` actions * Starting a started resource returns an error * Stopping a stopped resource returns an error ==== [IMPORTANT] ==== Remember to make sure the computer is _not_ configured to start any services at boot time -- that should be controlled by the cluster. ==== === Systemd === indexterm:[Resource,Systemd] indexterm:[Systemd,Resources] Some newer distributions have replaced the old http://en.wikipedia.org/wiki/Init#SysV-style["SysV"] style of initialization daemons and scripts with an alternative called http://www.freedesktop.org/wiki/Software/systemd[Systemd]. Pacemaker is able to manage these services _if they are present_. Instead of init scripts, systemd has 'unit files'. 
Generally, the services (unit files) are provided by the OS distribution, but there are online guides for converting from init scripts. footnote:[For example, http://0pointer.de/blog/projects/systemd-for-admins-3.html] [IMPORTANT] ==== Remember to make sure the computer is _not_ configured to start any services at boot time -- that should be controlled by the cluster. ==== === Upstart === indexterm:[Resource,Upstart] indexterm:[Upstart,Resources] Some newer distributions have replaced the old http://en.wikipedia.org/wiki/Init#SysV-style["SysV"] style of initialization daemons (and scripts) with an alternative called http://upstart.ubuntu.com/[Upstart]. Pacemaker is able to manage these services _if they are present_. Instead of init scripts, upstart has 'jobs'. Generally, the services (jobs) are provided by the OS distribution. [IMPORTANT] ==== Remember to make sure the computer is _not_ configured to start any services at boot time -- that should be controlled by the cluster. ==== === System Services === indexterm:[Resource,System Services] indexterm:[System Service,Resources] Since there are various types of system services (+systemd+, +upstart+, and +lsb+), Pacemaker supports a special +service+ alias which intelligently figures out which one applies to a given cluster node. This is particularly useful when the cluster contains a mix of +systemd+, +upstart+, and +lsb+. In order, Pacemaker will try to find the named service as: . an LSB init script . a Systemd unit file . an Upstart job === STONITH === indexterm:[Resource,STONITH] indexterm:[STONITH,Resources] The STONITH class is used exclusively for fencing-related resources. This is discussed later in <>. === Nagios Plugins === indexterm:[Resource,Nagios Plugins] indexterm:[Nagios Plugins,Resources] Nagios Plugins footnote:[The project has two independent forks, hosted at https://www.nagios-plugins.org/ and https://www.monitoring-plugins.org/. Output from both projects' plugins is similar, so plugins from either project can be used with pacemaker.] allow us to monitor services on remote hosts. Pacemaker is able to do remote monitoring with the plugins _if they are present_. A common use case is to configure them as resources belonging to a resource container (usually a virtual machine), and the container will be restarted if any of them has failed. Another use is to configure them as ordinary resources to be used for monitoring hosts or services via the network. The supported parameters are same as the long options of the plugin. [[primitive-resource]] == Resource Properties == These values tell the cluster which resource agent to use for the resource, where to find that resource agent and what standards it conforms to. .Properties of a Primitive Resource [width="95%",cols="1m,6<",options="header",align="center"] |========================================================= |Field |Description |id |Your name for the resource indexterm:[id,Resource] indexterm:[Resource,Property,id] |class |The standard the resource agent conforms to. Allowed values: +lsb+, +nagios+, +ocf+, +service+, +stonith+, +systemd+, +upstart+ indexterm:[class,Resource] indexterm:[Resource,Property,class] |type |The name of the Resource Agent you wish to use. E.g. +IPaddr+ or +Filesystem+ indexterm:[type,Resource] indexterm:[Resource,Property,type] |provider |The OCF spec allows multiple vendors to supply the same resource agent. To use the OCF resource agents supplied by the Heartbeat project, you would specify +heartbeat+ here. 
indexterm:[provider,Resource] indexterm:[Resource,Property,provider] |========================================================= The XML definition of a resource can be queried with the `crm_resource` tool. For example: ---- # crm_resource --resource Email --query-xml ---- might produce: .A system resource definition ===== [source,XML] ===== [NOTE] ===== One of the main drawbacks to system services (LSB, systemd or Upstart) resources is that they do not allow any parameters! ===== //// See https://tools.ietf.org/html/rfc5737 for choice of example IP address //// .An OCF resource definition ===== [source,XML] ------- ------- ===== [[s-resource-options]] == Resource Options == Resources have two types of options: 'meta-attributes' and 'instance attributes'. Meta-attributes apply to any type of resource, while instance attributes are specific to each resource agent. === Resource Meta-Attributes === Meta-attributes are used by the cluster to decide how a resource should behave and can be easily set using the `--meta` option of the `crm_resource` command. .Meta-attributes of a Primitive Resource [width="95%",cols="2m,2,5> resources, promoted to master if appropriate) * +Slave:+ Allow the resource to be started, but only in Slave mode if the resource is <> * +Master:+ Equivalent to +Started+ indexterm:[target-role,Resource Option] indexterm:[Resource,Option,target-role] |is-managed |TRUE |Is the cluster allowed to start and stop the resource? Allowed values: +true+, +false+ indexterm:[is-managed,Resource Option] indexterm:[Resource,Option,is-managed] |resource-stickiness |value of +resource-stickiness+ in the +rsc_defaults+ section |How much does the resource prefer to stay where it is? indexterm:[resource-stickiness,Resource Option] indexterm:[Resource,Option,resource-stickiness] |requires |fencing (unless +stonith-enabled+ is +false+ or +class+ is +stonith+, in which case it defaults to quorum) -|Conditions under which the resource can be started ('Since 1.1.8') +|Conditions under which the resource can be started '(since 1.1.8)' Allowed values: * +nothing:+ can always be started * +quorum:+ The cluster can only start this resource if a majority of the configured nodes are active * +fencing:+ The cluster can only start this resource if a majority of the configured nodes are active _and_ any failed or unknown nodes have been powered off -* +unfencing:+ The cluster can only start this resource if a majority +* +unfencing:+ + The cluster can only start this resource if a majority of the configured nodes are active _and_ any failed or unknown nodes have been powered off _and_ only on nodes that have been 'unfenced' + '(since 1.1.9)' indexterm:[requires,Resource Option] indexterm:[Resource,Option,requires] |migration-threshold |INFINITY |How many failures may occur for this resource on a node, before this node is marked ineligible to host this resource. A value of 0 indicates that this feature is disabled (the node will never be marked ineligible); by constrast, the cluster treats INFINITY (the default) as a very large but finite number. This option has an effect only if the failed operation has on-fail=restart (the default), and additionally for failed start operations, if the cluster property start-failure-is-fatal is false. indexterm:[migration-threshold,Resource Option] indexterm:[Resource,Option,migration-threshold] |failure-timeout |0 |How many seconds to wait before acting as if the failure had not occurred, and potentially allowing the resource back to the node on which it failed. 
A value of 0 indicates that this feature is disabled. As with any time-based actions, this is not guaranteed to be checked more frequently than the value of +cluster-recheck-interval+ (see <>). indexterm:[failure-timeout,Resource Option] indexterm:[Resource,Option,failure-timeout] |multiple-active |stop_start |What should the cluster do if it ever finds the resource active on more than one node? Allowed values: * +block:+ mark the resource as unmanaged * +stop_only:+ stop all active instances and leave them that way * +stop_start:+ stop all active instances and start the resource in one location only indexterm:[multiple-active,Resource Option] indexterm:[Resource,Option,multiple-active] |remote-node | |The name of the remote-node this resource defines. This both enables the resource as a remote-node and defines the unique name used to identify the remote-node. If no other parameters are set, this value will also be assumed as the hostname to connect to at the port specified by +remote-port+. +WARNING:+ This value cannot overlap with any resource or node IDs. If not specified, this feature is disabled. |remote-port |3121 |Port to use for the guest connection to pacemaker_remote |remote-addr |value of +remote-node+ |The IP address or hostname to connect to if remote-node's name is not the hostname of the guest. |+remote-connect-timeout+ |60s |How long before a pending guest connection will time out. |========================================================= [NOTE] ==== Support for remote nodes was added in pacemaker 1.1.10. If you are using an earlier version, options related to remote nodes will not be available. ==== As an example of setting resource options, if you performed the following commands on an LSB Email resource: ------- # crm_resource --meta --resource Email --set-parameter priority --parameter-value 100 # crm_resource -m -r Email -p multiple-active -v block ------- the resulting resource definition might be: .An LSB resource with cluster options ===== [source,XML] ------- ------- ===== [[s-resource-defaults]] === Setting Global Defaults for Resource Meta-Attributes === To set a default value for a resource option, add it to the +rsc_defaults+ section with `crm_attribute`. For example, ---- # crm_attribute --type rsc_defaults --name is-managed --update false ---- would prevent the cluster from starting or stopping any of the resources in the configuration (unless of course the individual resources were specifically enabled by having their +is-managed+ set to +true+). === Resource Instance Attributes === The resource agents of some resource classes (lsb, systemd and upstart 'not' among them) can be given parameters which determine how they behave and which instance of a service they control. If your resource agent supports parameters, you can add them with the `crm_resource` command. For example, ---- # crm_resource --resource Public-IP --set-parameter ip --parameter-value 192.0.2.2 ---- would create an entry in the resource like this: .An example OCF resource with instance attributes ===== [source,XML] ------- ------- ===== For an OCF resource, the result would be an environment variable called +OCF_RESKEY_ip+ with a value of +192.0.2.2+. The list of instance attributes supported by an OCF resource agent can be found by calling the resource agent with the `meta-data` command. The output contains an XML description of all the supported attributes, their purpose and default values. 
.Displaying the metadata for the Dummy resource agent template ===== ---- # export OCF_ROOT=/usr/lib/ocf # $OCF_ROOT/resource.d/pacemaker/Dummy meta-data ---- [source,XML] ------- 1.0 This is a Dummy Resource Agent. It does absolutely nothing except keep track of whether its running or not. Its purpose in life is for testing and to serve as a template for RA writers. NB: Please pay attention to the timeouts specified in the actions section below. They should be meaningful for the kind of resource the agent manages. They should be the minimum advised timeouts, but they shouldn't/cannot cover _all_ possible resource instances. So, try to be neither overly generous nor too stingy, but moderate. The minimum timeouts should never be below 10 seconds. Example stateless resource agent Location to store the resource state in. State file Fake attribute that can be changed to cause a reload Fake attribute that can be changed to cause a reload Number of seconds to sleep during operations. This can be used to test how the cluster reacts to operation timeouts. Operation sleep duration in seconds. ------- ===== == Resource Operations == indexterm:[Resource,Action] 'Operations' are actions the cluster can perform on a resource by calling the resource agent. Resource agents must support certain common operations such as start, stop and monitor, and may implement any others. Some operations are generated by the cluster itself, for example, stopping and starting resources as needed. You can configure operations in the cluster configuration. As an example, by default the cluster will 'not' ensure your resources stay healthy once they are started. footnote:[Currently, anyway. Automatic monitoring operations may be added in a future version of Pacemaker.] To instruct the cluster to do this, you need to add a +monitor+ operation to the resource's definition. .An OCF resource with a recurring health check ===== [source,XML] ------- ------- ===== .Properties of an Operation [width="95%",cols="2m,3,6>. indexterm:[interval,Action Property] indexterm:[Action,Property,interval] |timeout | |How long to wait before declaring the action has failed indexterm:[timeout,Action Property] indexterm:[Action,Property,timeout] |on-fail |restart '(except for stop operations, which default to' fence 'when STONITH is enabled and' block 'otherwise)' |The action to take if this action ever fails. Allowed values: * +ignore:+ Pretend the resource did not fail. * +block:+ Don't perform any further operations on the resource. * +stop:+ Stop the resource and do not start it elsewhere. * +restart:+ Stop the resource and start it again (possibly on a different node). * +fence:+ STONITH the node on which the resource failed. * +standby:+ Move _all_ resources away from the node on which the resource failed. indexterm:[on-fail,Action Property] indexterm:[Action,Property,on-fail] |enabled |TRUE |If +false+, ignore this operation definition. This is typically used to pause a particular recurring monitor operation; for instance, it can complement the respective resource being unmanaged (+is-managed=false+), as this alone will <>. Disabling the operation does not suppress all actions of the given type. Allowed values: +true+, +false+. indexterm:[enabled,Action Property] indexterm:[Action,Property,enabled] |record-pending | |If +true+, the intention to perform the operation is recorded so that GUIs and CLI tools can indicate that an operation is in progress. This is best set as an 'operation default' (see next section). 
Allowed values: +true+, +false+. indexterm:[enabled,Action Property] indexterm:[Action,Property,enabled] |role | |Run the operation only on node(s) that the cluster thinks should be in the specified role. This only makes sense for recurring monitor operations. Allowed (case-sensitive) values: +Stopped+, +Started+, and in the case of <> resources, +Slave+ and +Master+. indexterm:[role,Action Property] indexterm:[Action,Property,role] |========================================================= [[s-resource-monitoring]] === Monitoring Resources for Failure === When Pacemaker first starts a resource, it runs one-time monitor operations (referred to as 'probes') to ensure the resource is running where it's supposed to be, and not running where it's not supposed to be. (This behavior can be affected by the +resource-discovery+ location constraint property.) Other than those initial probes, Pacemaker will not (by default) check that the resource continues to stay healthy. As in the example above, you must configure monitor operations explicitly to perform these checks. By default, a monitor operation will ensure that the resource is running where it is supposed to. The +target-role+ property can be used for further checking. For example, if a resource has one monitor operation with +interval=10 role=Started+ and a second monitor operation with +interval=11 role=Stopped+, the cluster will run the first monitor on any nodes it thinks 'should' be running the resource, and the second monitor on any nodes that it thinks 'should not' be running the resource (for the truly paranoid, who want to know when an administrator manually starts a service by mistake). [[s-monitoring-unmanaged]] === Monitoring Resources When Administration is Disabled === Recurring monitor operations behave differently under various administrative settings: * When a resource is unmanaged (by setting +is-managed=false+): No monitors will be stopped. + If the unmanaged resource is stopped on a node where the cluster thinks it should be running, the cluster will detect and report that it is not, but it will not consider the monitor failed, and will not try to start the resource until it is managed again. + Starting the unmanaged resource on a different node is strongly discouraged and will at least cause the cluster to consider the resource failed, and may require the resource's +target-role+ to be set to +Stopped+ then +Started+ to be recovered. * When a node is put into standby: All resources will be moved away from the node, and all monitor operations will be stopped on the node, except those with +role=Stopped+. Monitor operations with +role=Stopped+ will be started on the node if appropriate. * When the cluster is put into maintenance mode: All resources will be marked as unmanaged. All monitor operations will be stopped, except those with +role=Stopped+. As with single unmanaged resources, starting a resource on a node other than where the cluster expects it to be will cause problems. [[s-operation-defaults]] === Setting Global Defaults for Operations === You can change the global default values for operation properties in a given cluster. These are defined in an +op_defaults+ section of the CIB's +configuration+ section, and can be set with `crm_attribute`. For example, ---- # crm_attribute --type op_defaults --name timeout --update 20s ---- would default each operation's +timeout+ to 20 seconds. If an operation's definition also includes a value for +timeout+, then that value would be used for that operation instead. 
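In the CIB, such a default is stored as an +nvpair+ within the +op_defaults+ section, roughly as in the following sketch (the +id+ values are illustrative):

[source,XML]
-------
<op_defaults>
    <meta_attributes id="op-defaults-options">
        <nvpair id="op-defaults-timeout" name="timeout" value="20s"/>
    </meta_attributes>
</op_defaults>
-------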
=== When Implicit Operations Take a Long Time === The cluster will always perform a number of implicit operations: +start+, +stop+ and a non-recurring +monitor+ operation used at startup to check whether the resource is already active. If one of these is taking too long, then you can create an entry for them and specify a longer timeout. .An OCF resource with custom timeouts for its implicit actions ===== [source,XML] ------- ------- ===== === Multiple Monitor Operations === Provided no two operations (for a single resource) have the same name and interval, you can have as many monitor operations as you like. In this way, you can do a superficial health check every minute and progressively more intense ones at higher intervals. To tell the resource agent what kind of check to perform, you need to provide each monitor with a different value for a common parameter. The OCF standard creates a special parameter called +OCF_CHECK_LEVEL+ for this purpose and dictates that it is "made available to the resource agent without the normal +OCF_RESKEY+ prefix". Whatever name you choose, you can specify it by adding an +instance_attributes+ block to the +op+ tag. It is up to each resource agent to look for the parameter and decide how to use it. .An OCF resource with two recurring health checks, performing different levels of checks specified via +OCF_CHECK_LEVEL+. ===== [source,XML] ------- ------- ===== === Disabling a Monitor Operation === The easiest way to stop a recurring monitor is to just delete it. However, there can be times when you only want to disable it temporarily. In such cases, simply add +enabled="false"+ to the operation's definition. .Example of an OCF resource with a disabled health check ===== [source,XML] ------- ------- ===== This can be achieved from the command line by executing: ---- # cibadmin --modify --xml-text '' ---- Once you've done whatever you needed to do, you can then re-enable it with ---- # cibadmin --modify --xml-text '' ---- diff --git a/doc/Pacemaker_Explained/en-US/Ch-Stonith.txt b/doc/Pacemaker_Explained/en-US/Ch-Stonith.txt index d2880e0843..7e7cb58c5d 100644 --- a/doc/Pacemaker_Explained/en-US/Ch-Stonith.txt +++ b/doc/Pacemaker_Explained/en-US/Ch-Stonith.txt @@ -1,901 +1,907 @@ = STONITH = //// We prefer [[ch-stonith]], but older versions of asciidoc don't deal well with that construct for chapter headings //// anchor:ch-stonith[Chapter 13, STONITH] indexterm:[STONITH, Configuration] == What Is STONITH? == STONITH (an acronym for "Shoot The Other Node In The Head"), also called 'fencing', protects your data from being corrupted by rogue nodes or concurrent access. Just because a node is unresponsive, this doesn't mean it isn't accessing your data. The only way to be 100% sure that your data is safe, is to use STONITH so we can be certain that the node is truly offline, before allowing the data to be accessed from another node. STONITH also has a role to play in the event that a clustered service cannot be stopped. In this case, the cluster uses STONITH to force the whole node offline, thereby making it safe to start the service elsewhere. == What STONITH Device Should You Use? == It is crucial that the STONITH device can allow the cluster to differentiate between a node failure and a network one. The biggest mistake people make in choosing a STONITH device is to use a remote power switch (such as many on-board IPMI controllers) that shares power with the node it controls. 
In such cases, the cluster cannot be sure if the node is really offline, or active and suffering from a network fault. Likewise, any device that relies on the machine being active (such as SSH-based "devices" used during testing) are inappropriate. == Special Treatment of STONITH Resources == STONITH resources are somewhat special in Pacemaker. STONITH may be initiated by pacemaker or by other parts of the cluster (such as resources like DRBD or DLM). To accommodate this, pacemaker does not require the STONITH resource to be in the 'started' state in order to be used, thus allowing reliable use of STONITH devices in such a case. [NOTE] ==== In pacemaker versions 1.1.9 and earlier, this feature either did not exist or did not work well. Only "running" STONITH resources could be used by Pacemaker for fencing, and if another component tried to fence a node while Pacemaker was moving STONITH resources, the fencing could fail. ==== All nodes have access to STONITH devices' definitions and instantiate them on-the-fly when needed, but preference is given to 'verified' instances, which are the ones that are 'started' according to the cluster's knowledge. In the case of a cluster split, the partition with a verified instance will have a slight advantage, because the STONITH daemon in the other partition will have to hear from all its current peers before choosing a node to perform the fencing. Fencing resources do work the same as regular resources in some respects: * +target-role+ can be used to enable or disable the resource * Location constraints can be used to prevent a specific node from using the resource [IMPORTANT] =========== Currently there is a limitation that fencing resources may only have one set of meta-attributes and one set of instance attributes. This can be revisited if it becomes a significant limitation for people. =========== See the table below or run `man stonithd` to see special instance attributes that may be set for any fencing resource, regardless of fence agent. .Properties of Fencing Resources [width="95%",cols="5m,2,3,10 ---- ==== Based on that, we would create a STONITH resource fragment that might look like this: .An IPMI-based STONITH Resource ==== [source,XML] ---- ---- ==== Finally, we need to enable STONITH: ---- # crm_attribute -t crm_config -n stonith-enabled -v true ---- == Advanced STONITH Configurations == Some people consider that having one fencing device is a single point of failure footnote:[Not true, since a node or resource must fail before fencing even has a chance to]; others prefer removing the node from the storage and network instead of turning it off. Whatever the reason, Pacemaker supports fencing nodes with multiple devices through a feature called 'fencing topologies'. Simply create the individual devices as you normally would, then define one or more +fencing-level+ entries in the +fencing-topology+ section of the configuration. * Each fencing level is attempted in order of ascending +index+. Allowed indexes are 0 to 9. * If a device fails, processing terminates for the current level. No further devices in that level are exercised, and the next level is attempted instead. * If the operation succeeds for all the listed devices in a level, the level is deemed to have passed. * The operation is finished when a level has passed (success), or all levels have been attempted (failed). * If the operation failed, the next step is determined by the Policy Engine and/or `crmd`. 
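As a rough sketch of what such a configuration can look like, two levels for a single node might be defined as follows (the node name, device IDs and +id+ values are illustrative; the attributes used here are described in the table below):

[source,XML]
-------
<fencing-topology>
    <fencing-level id="f-node1-ipmi" target="node1" index="1" devices="ipmi-node1"/>
    <fencing-level id="f-node1-pdu" target="node1" index="2" devices="pdu-a,pdu-b"/>
</fencing-topology>
-------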
Some possible uses of topologies include: * Try poison-pill and fail back to power * Try disk and network, and fall back to power if either fails * Initiate a kdump and then poweroff the node .Properties of Fencing Levels [width="95%",cols="1m,3<",options="header",align="center"] |========================================================= |Field |Description |id |A unique name for the level indexterm:[id,fencing-level] indexterm:[Fencing,fencing-level,id] |target |The name of a single node to which this level applies indexterm:[target,fencing-level] indexterm:[Fencing,fencing-level,target] |target-pattern |A regular expression matching the names of nodes to which this level applies -'(since 1.1.14)' + '(since 1.1.14)' indexterm:[target-pattern,fencing-level] indexterm:[Fencing,fencing-level,target-pattern] |target-attribute -|The name of a node attribute that is set for nodes to which this level applies -'(since 1.1.14)' +|The name of a node attribute that is set (to +target-value+) for nodes to + which this level applies '(since 1.1.14)' + indexterm:[target-attribute,fencing-level] + indexterm:[Fencing,fencing-level,target-attribute] + +|target-value +|The node attribute value (of +target-attribute+) that is set for nodes to + which this level applies '(since 1.1.14)' indexterm:[target-attribute,fencing-level] indexterm:[Fencing,fencing-level,target-attribute] |index |The order in which to attempt the levels. Levels are attempted in ascending order 'until one succeeds'. indexterm:[index,fencing-level] indexterm:[Fencing,fencing-level,index] |devices |A comma-separated list of devices that must all be tried for this level indexterm:[devices,fencing-level] indexterm:[Fencing,fencing-level,devices] |========================================================= .Fencing topology with different devices for different nodes ==== [source,XML] ---- ... ... ---- ==== === Example Dual-Layer, Dual-Device Fencing Topologies === The following example illustrates an advanced use of +fencing-topology+ in a cluster with the following properties: * 3 nodes (2 active prod-mysql nodes, 1 prod_mysql-rep in standby for quorum purposes) * the active nodes have an IPMI-controlled power board reached at 192.0.2.1 and 192.0.2.2 * the active nodes also have two independent PSUs (Power Supply Units) connected to two independent PDUs (Power Distribution Units) reached at 198.51.100.1 (port 10 and port 11) and 203.0.113.1 (port 10 and port 11) * the first fencing method uses the `fence_ipmi` agent * the second fencing method uses the `fence_apc_snmp` agent targetting 2 fencing devices (one per PSU, either port 10 or 11) * fencing is only implemented for the active nodes and has location constraints * fencing topology is set to try IPMI fencing first then default to a "sure-kill" dual PDU fencing In a normal failure scenario, STONITH will first select +fence_ipmi+ to try to kill the faulty node. Using a fencing topology, if that first method fails, STONITH will then move on to selecting +fence_apc_snmp+ twice: * once for the first PDU * again for the second PDU The fence action is considered successful only if both PDUs report the required status. If any of them fails, STONITH loops back to the first fencing method, +fence_ipmi+, and so on until the node is fenced or fencing action is cancelled. 
.First fencing method: single IPMI device Each cluster node has it own dedicated IPMI channel that can be called for fencing using the following primitives: [source,XML] ---- ---- .Second fencing method: dual PDU devices Each cluster node also has two distinct power channels controlled by two distinct PDUs. That means a total of 4 fencing devices configured as follows: - Node 1, PDU 1, PSU 1 @ port 10 - Node 1, PDU 2, PSU 2 @ port 10 - Node 2, PDU 1, PSU 1 @ port 11 - Node 2, PDU 2, PSU 2 @ port 11 The matching fencing agents are configured as follows: [source,XML] ---- ---- .Location Constraints To prevent STONITH from trying to run a fencing agent on the same node it is supposed to fence, constraints are placed on all the fencing primitives: [source,XML] ---- ---- .Fencing topology Now that all the fencing resources are defined, it's time to create the right topology. We want to first fence using IPMI and if that does not work, fence both PDUs to effectively and surely kill the node. [source,XML] ---- ---- Please note, in +fencing-topology+, the lowest +index+ value determines the priority of the first fencing method. .Final configuration Put together, the configuration looks like this: [source,XML] ---- ... ... ---- == Remapping Reboots == When the cluster needs to reboot a node, whether because +stonith-action+ is +reboot+ or because a reboot was manually requested (such as by `stonith_admin --reboot`), it will remap that to other commands in two cases: . If the chosen fencing device does not support the +reboot+ command, the cluster will ask it to perform +off+ instead. . If a fencing topology level with multiple devices must be executed, the cluster will ask all the devices to perform +off+, then ask the devices to perform +on+. To understand the second case, consider the example of a node with redundant power supplies connected to intelligent power switches. Rebooting one switch and then the other would have no effect on the node. Turning both switches off, and then on, actually reboots the node. In such a case, the fencing operation will be treated as successful as long as the +off+ commands succeed, because then it is safe for the cluster to recover any resources that were on the node. Timeouts and errors in the +on+ phase will be logged but ignored. When a reboot operation is remapped, any action-specific timeout for the remapped action will be used (for example, +pcmk_off_timeout+ will be used when executing the +off+ command, not +pcmk_reboot_timeout+). [NOTE] ==== In Pacemaker versions 1.1.13 and earlier, reboots will not be remapped in the second case. To achieve the same effect, separate fencing devices for off and on actions must be configured. ==== diff --git a/extra/resources/docker-wrapper b/extra/resources/docker-wrapper index 7b3cad1f20..e12559cf52 100755 --- a/extra/resources/docker-wrapper +++ b/extra/resources/docker-wrapper @@ -1,536 +1,536 @@ #!/bin/bash # # Copyright (c) 2015 David Vossel # All Rights Reserved. # # This program is free software; you can redistribute it and/or modify # it under the terms of version 2 of the GNU General Public License as # published by the Free Software Foundation. # # This program is distributed in the hope that it would be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. # # Further, this software is distributed without any warranty that it is # free of the rightful claim of any third person regarding infringement # or the like. 
Any license provided herein, whether implied or # otherwise, applies only to this software file. Patent licenses, if # any, provided herein do not apply to combinations of this program with # other software, or any other product whatsoever. # # You should have received a copy of the GNU General Public License # along with this program; if not, write the Free Software Foundation, # Inc., 59 Temple Place - Suite 330, Boston MA 02111-1307, USA. # ####################################################################### # Initialization: : ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat} . ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs ####################################################################### CONF_PREFIX="pcmk_docker" meta_data() { cat < 1.0 Docker technology wrapper for pacemaker remote. docker wrapper Docker image to run resources within docker image Give resources within container access to cluster resources such as the CIB and the ability to manage cluster attributes. NOTE: Do not confuse this with the docker run command's '--priviledged' option which gives a container permission to access system devices. To toggle the docker run option, - set --priviledged=true as part of the ${CONF_PREFIS}_run_opts + set --priviledged=true as part of the ${CONF_PREFIX}_run_opts arguments. The ${CONF_PREFIX}_privileged option only pertains to whether or not the container has access to the cluster's CIB or not. Some multistate resources need to be able to write values to the cib, which would require enabling ${CONF_PREFIX}_privileged is privileged Add options to be appended to the 'docker run' command which is used when creating the container during the start action. This option allows users to do things such as setting a custom entry point and injecting environment variables into the newly created container. Note the '-d' option is supplied regardless of this value to force containers to run in the background. NOTE: Do not explicitly specify the --name argument in the run_opts. This agent will set --name using the resource's instance name run options Allow the container to be reused after stopping the container. By default containers are removed after stop. With the reuse option containers will persist after the container stops. reuse container END } ####################################################################### CLIENT="/usr/libexec/pacemaker/lrmd_internal_ctl" DOCKER_AGENT="/usr/lib/ocf/resource.d/heartbeat/docker" KEY_VAL_STR="" PROVIDER=$OCF_RESKEY_CRM_meta_provider CLASS=$OCF_RESKEY_CRM_meta_class TYPE=$OCF_RESKEY_CRM_meta_type CONTAINER=$OCF_RESKEY_CRM_meta_isolation_instance if [ -z "$CONTAINER" ]; then CONTAINER=$OCF_RESOURCE_INSTANCE fi RSC_STATE_DIR="${HA_RSCTMP}/docker-wrapper/${CONTAINER}-data/" RSC_STATE_FILE="$RSC_STATE_DIR/$OCF_RESOURCE_INSTANCE.state" CONNECTION_FAILURE=0 HOST_LOG_DIR="${HA_RSCTMP}/docker-wrapper/${CONTAINER}-logs" HOST_LOG_FILE="$HOST_LOG_DIR/pacemaker.log" GUEST_LOG_DIR="/var/log/pcmk" GUEST_LOG_FILE="$GUEST_LOG_DIR/pacemaker.log" pcmk_docker_wrapper_usage() { cat < $RSC_STATE_FILE fi } clear_state_file() { if [ -f "$RSC_STATE_FILE" ]; then rm -f $RSC_STATE_FILE fi } clear_state_dir() { [ -d "$RSC_STATE_DIR" ] || return 0 rm -rf $RSC_STATE_DIR } num_active_resources() { local count [ -d "$RSC_STATE_DIR" ] || return 0 count="$(ls $RSC_STATE_DIR | wc -w)" if [ $? 
-ne 0 ] || [ -z "$count" ]; then return 0 fi return $count } random_port() { local port=$(python -c 'import socket; s=socket.socket(); s.bind(("localhost", 0)); print(s.getsockname()[1]); s.close()') if [ $? -eq 0 ] && [ -n "$port" ]; then echo "$port" fi } get_active_port() { PORT="$(docker port $CONTAINER 3121 | awk -F: '{ print $2 }')" } # separate docker args from ocf resource args. separate_args() { local env key value # write out arguments to key value string for ocf agent while read -r line; do key="$(echo $line | awk -F= '{print $1}' | sed 's/^OCF_RESKEY_//g')" val="$(echo $line | awk -F= '{print $2}')" KEY_VAL_STR="$KEY_VAL_STR -k '$key' -v '$val'" done < <(printenv | grep "^OCF.*" | grep -v "^OCF_RESKEY_${CONF_PREFIX}_.*") # sanitize args for DOCKER agent's consumption while read -r line; do env="$(echo $line | awk -F= '{print $1}')" val="$(echo $line | awk -F= '{print $2}')" key="$(echo "$env" | sed "s/^OCF_RESKEY_${CONF_PREFIX}/OCF_RESKEY/g")" export $key="$val" done < <(printenv | grep "^OCF_RESKEY_${CONF_PREFIX}_.*") if ocf_is_true $OCF_RESKEY_privileged ; then export OCF_RESKEY_run_cmd="/usr/sbin/pacemaker_remoted" # on start set random port to run_opts # write port to state file... or potentially get from ps? maybe docker info or inspect as well? else export OCF_RESKEY_run_cmd="/usr/libexec/pacemaker/lrmd" fi export OCF_RESKEY_name="$CONTAINER" } monitor_container() { local rc $DOCKER_AGENT monitor rc=$? if [ $rc -ne $OCF_SUCCESS ]; then clear_state_dir return $rc fi poke_remote rc=$? if [ $rc -ne $OCF_SUCCESS ]; then # container is up without an active daemon. this is bad ocf_log err "Container, $CONTAINER, is active without a responsive pacemaker_remote instance" CONNECTION_FAILURE=1 return $OCF_ERR_GENERIC fi CONNECTION_FAILURE=0 return $rc } pcmk_docker_wrapper_monitor() { local rc monitor_container rc=$? if [ $rc -ne $OCF_SUCCESS ]; then return $rc fi client_action "monitor" rc=$? if [ $rc -eq $OCF_SUCCESS ] || [ $rc -eq $OCF_RUNNING_MASTER ]; then write_state_file else clear_state_file fi return $rc } pcmk_docker_wrapper_generic_action() { local rc monitor_container rc=$? if [ $? -ne $OCF_SUCCESS ]; then return $rc fi client_action "$1" } client_action() { local action=$1 local agent_type="-T $TYPE -C $CLASS" local rc=0 if [ -n "$PROVIDER" ]; then agent_type="$agent_type -P $PROVIDER" fi if ocf_is_true $OCF_RESKEY_privileged ; then if [ -z "$PORT" ]; then get_active_port fi export PCMK_logfile=$HOST_LOG_FILE ocf_log info "$CLIENT -c 'exec' -S '127.0.0.1' -p '$PORT' -a '$action' -r '$OCF_RESOURCE_INSTANCE' -n '$CONTAINER' '$agent_type' $KEY_VAL_STR " eval $CLIENT -c 'exec' -S '127.0.0.1' -p '$PORT' -a '$action' -r '$OCF_RESOURCE_INSTANCE' -n '$CONTAINER' '$agent_type' $KEY_VAL_STR else export PCMK_logfile=$GUEST_LOG_FILE ocf_log info "$CLIENT -c \"exec\" -a $action -r \"$OCF_RESOURCE_INSTANCE\" $agent_type $KEY_VAL_STR" echo "$CLIENT -c \"exec\" -a $action -r \"$OCF_RESOURCE_INSTANCE\" $agent_type $KEY_VAL_STR " | nsenter --target $(docker inspect --format {{.State.Pid}} ${CONTAINER}) --mount --uts --ipc --net --pid fi rc=$? 
ocf_log debug "Client action $action with result $rc" return $rc } poke_remote() { # verifies daemon in container is active if ocf_is_true $OCF_RESKEY_privileged ; then get_active_port ocf_log info "Attempting to contect $CONTAINER on port $PORT" $CLIENT -c "poke" -S "127.0.0.1" -p $PORT -n $CONTAINER fi # no op for non privileged containers since we handed the # client monitor action as the monitor_cmd for the docker agent } start_container() { local rc monitor_container rc=$? if [ $rc -eq $OCF_SUCCESS ]; then return $rc fi mkdir -p $HOST_LOG_DIR export OCF_RESKEY_run_opts="-e PCMK_logfile=$GUEST_LOG_FILE $OCF_RESKEY_run_opts" export OCF_RESKEY_run_opts="-v $HOST_LOG_DIR:$GUEST_LOG_DIR $OCF_RESKEY_run_opts" if ocf_is_true $OCF_RESKEY_privileged ; then if ! [ -f "/etc/pacemaker/authkey" ]; then # generate an authkey if it doesn't exist. mkdir -p /etc/pacemaker/ dd if=/dev/urandom of=/etc/pacemaker/authkey bs=4096 count=1 > /dev/null 2>&1 chmod 600 /etc/pacemaker/authkey fi PORT=$(random_port) if [ -z "$PORT" ]; then ocf_exit_reason "Unable to assign random port for pacemaker remote" return $OCF_ERR_GENERIC fi export OCF_RESKEY_run_opts="-p 127.0.0.1:${PORT}:3121 $OCF_RESKEY_run_opts" export OCF_RESKEY_run_opts="-v /etc/pacemaker/authkey:/etc/pacemaker/authkey $OCF_RESKEY_run_opts" ocf_log debug "using privileged mode: run_opts=$OCF_RESKEY_run_opts" else export OCF_RESKEY_monitor_cmd="$CLIENT -c poke" fi $DOCKER_AGENT start rc=$? if [ $rc -ne $OCF_SUCCESS ]; then docker ps > /dev/null 2>&1 if [ $? -ne 0 ]; then ocf_exit_reason "docker daemon is inactive." fi return $rc fi monitor_container } pcmk_docker_wrapper_start() { local rc start_container rc=$? if [ $rc -ne $OCF_SUCCESS ]; then return $rc fi client_action "start" rc=$? if [ $? -ne "$OCF_SUCCESS" ]; then ocf_exit_reason "Failed to start agent within container" return $rc fi pcmk_docker_wrapper_monitor rc=$? if [ $rc -eq $OCF_SUCCESS ]; then ocf_log notice "$OCF_RESOURCE_INSTANCE started successfully. Container's logfile can be found at $HOST_LOG_FILE" fi return $rc } stop_container() { local rc local count num_active_resources count=$? if [ $count -ne 0 ]; then ocf_log err "Failed to stop agent within container. Killing container $CONTAINER with $count active resources" fi $DOCKER_AGENT "stop" rc=$? if [ $rc -ne $OCF_SUCCESS ]; then ocf_exit_reason "Docker container failed to stop" return $rc fi clear_state_dir return $rc } stop_resource() { local rc client_action "stop" rc=$? if [ $? -ne "$OCF_SUCCESS" ]; then export OCF_RESKEY_force_stop="true" kill_now=1 else clear_state_file fi } pcmk_docker_wrapper_stop() { local rc local kill_now=0 local all_stopped=0 pcmk_docker_wrapper_monitor rc=$? if [ $rc -eq $OCF_NOT_RUNNING ]; then rc=$OCF_SUCCESS num_active_resources if [ $? -eq 0 ]; then # stop container if no more resources are running ocf_log info "Gracefully stopping container $CONTAINER because no resources are left running." stop_container rc=$? fi return $rc fi # if we can't talk to the remote daemon but the container is # active, we have to force kill the container. if [ $CONNECTION_FAILURE -eq 1 ]; then export OCF_RESKEY_force_kill="true" stop_container return $? fi # If we've gotten this far, the container is up, and we # need to gracefully stop a resource within the container. client_action "stop" rc=$? if [ $? -ne "$OCF_SUCCESS" ]; then export OCF_RESKEY_force_stop="true" # force kill the container if we fail to stop a resource. stop_container rc=$? else clear_state_file num_active_resources if [ $? 
        if [ $? -eq 0 ]; then
            # stop container if no more resources are running
            ocf_log info "Gracefully stopping container $CONTAINER because last resource has stopped"
            stop_container
            rc=$?
        fi
    fi

    return $rc
}

pcmk_docker_wrapper_validate() {
    check_binary docker

    if [ -z "$CLASS" ] || [ -z "$TYPE" ]; then
        ocf_exit_reason "Update pacemaker to a version that supports container wrappers."
        return $OCF_ERR_CONFIGURED
    fi

    if ! [ -f "$DOCKER_AGENT" ]; then
        ocf_exit_reason "Requires $DOCKER_AGENT to be installed. Update the resource-agents package."
        return $OCF_ERR_INSTALLED
    fi

    $DOCKER_AGENT validate-all
    return $?
}

case $__OCF_ACTION in
    meta-data)
        meta_data
        exit $OCF_SUCCESS
        ;;
    usage|help)
        pcmk_docker_wrapper_usage
        exit $OCF_SUCCESS
        ;;
esac

separate_args
pcmk_docker_wrapper_validate
rc=$?
if [ $rc -ne 0 ]; then
    case $__OCF_ACTION in
        stop)    exit $OCF_SUCCESS;;
        monitor) exit $OCF_NOT_RUNNING;;
        *)       exit $rc;;
    esac
fi

case $__OCF_ACTION in
    start)
        pcmk_docker_wrapper_start;;
    stop)
        pcmk_docker_wrapper_stop;;
    monitor|status)
        pcmk_docker_wrapper_monitor;;
    reload|promote|demote|notify)
        pcmk_docker_wrapper_generic_action $__OCF_ACTION;;
    validate-all)
        pcmk_docker_wrapper_validate;;
    *)
        pcmk_docker_wrapper_usage
        exit $OCF_ERR_UNIMPLEMENTED
        ;;
esac
rc=$?
ocf_log debug "Docker-wrapper ${OCF_RESOURCE_INSTANCE} $__OCF_ACTION : $rc"
exit $rc

diff --git a/xml/Readme.md b/xml/Readme.md
index b78e13d26b..88d9137828 100644
--- a/xml/Readme.md
+++ b/xml/Readme.md
@@ -1,53 +1,84 @@
# Schema Reference

Besides the version of Pacemaker itself, the XML schema of the Pacemaker
configuration has its own version.

## Versioned Schema Evolution

A versioned schema offers transparent backward/forward compatibility.

- It reflects the timeline of schema-backed features (introduction,
  changes to the syntax, possibly deprecation) through the versioned
  stable schema increments, while keeping schema versions used by default
  by older Pacemaker versions untouched.

- Pacemaker internally uses the latest stable schema version, and relies on
  supplemental transformations to promote cluster configurations based on
  older, incompatible schema versions into the desired form.

- It allows experimental features with a possibly unstable configuration
  interface to be developed using the special `next` version of the schema.

## Mapping Pacemaker Versions to Schema Versions

| Pacemaker | Latest Schema | Changed
| --------- | ------------- | ----------------------------------------------
| `1.1.15`  | `2.5`         | `alerts`
| `1.1.14`  | `2.4`         | `fencing`
| `1.1.13`  | `2.3`         | `constraints`
| `1.1.12`  | `2.0`         | `nodes`, `nvset`, `resources`, `tags` + `acls`
| `1.1.8`+  | `1.2`         |

# Updating schema files #

## Experimental features ##

Experimental features go into `${base}-next.rng`.

Create it from the most recent `${base}-${X}.${Y}.rng` if it does not
already exist.

## Stable features ##

The current stable version is determined at runtime when
__xml_build_schema_list() interrogates the CRM_DTD_DIRECTORY.

It will have the form `pacemaker-${X}.${Y}` and the highest `${X}.${Y}` wins.

### Simple Additions

When the new syntax is a simple addition to the previous one, create a
new entry with `${Y} = ${Yold} + 1`.

### Feature Removal or otherwise Incompatible Changes

When the new syntax is not a simple addition to the previous one, create a
new entry with `${X} = ${Xold} + 1` and `${Y} = 0`.

An XSLT file is also required that converts an old syntax to the new one,
and it must be named `upgrade-${Xold}.${Yold}.xsl`.
See `xml/upgrade06.xsl` for an example.
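As a rough illustration (the schema versions and file names below are
hypothetical, and the tools are the standard libxslt/libxml2 command-line
utilities rather than anything shipped by Pacemaker itself), such a transform
can be exercised by hand before wiring it into the build:

    # Suppose an incompatible change bumped pacemaker-2.5 to pacemaker-3.0,
    # so the transform is named upgrade-2.5.xsl.
    xsltproc upgrade-2.5.xsl old-cib.xml > new-cib.xml        # rewrite the old syntax
    xmllint --relaxng pacemaker-3.0.rng --noout new-cib.xml   # should now validate against the new schema
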
### General Procedure

1. Copy the most recent version of `${base}-*.rng` to `${base}-${X}.${Y}.rng`
1. Commit the copy, e.g. `"Clone the latest ${base} schema in preparation for changes"`.
   This way the actual change will be obvious in the commit history.
1. Modify `${base}-${X}.${Y}.rng` as required
1. Add an XSLT file if required and update `xslt_SCRIPTS` in `xml/Makefile.am`
1. Commit

## Admin Tasks

New features will not be available until the admin (a rough command sketch
follows at the end of these notes):

1. Updates all the nodes
1. Runs the equivalent of `cibadmin --upgrade`

## Random Notes

From the source directory, run `make -C xml diff` to see the changes in the
current schema (compared to the previous ones) and also the pending changes
in `pacemaker-next`.

Alternatively, if the intention is to grok the overall historical schema
evolution, use `make -C xml fulldiff`.
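For reference, a minimal sketch of what the admin tasks above look like on a
live cluster; command options can vary between Pacemaker versions, so treat
this as an outline rather than an exact recipe:

    # after every node has been updated:
    cibadmin --upgrade              # rewrite the live CIB using the newest installed schema
    cibadmin --query | head -n 1    # the cib element's validate-with should now name that schema
    crm_verify --live-check         # sanity-check the upgraded configuration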