diff --git a/doc/Pacemaker_Explained/en-US/Ch-Nodes.txt b/doc/Pacemaker_Explained/en-US/Ch-Nodes.txt index 511b87986e..761161d805 100644 --- a/doc/Pacemaker_Explained/en-US/Ch-Nodes.txt +++ b/doc/Pacemaker_Explained/en-US/Ch-Nodes.txt @@ -1,87 +1,206 @@ :compat-mode: legacy = Cluster Nodes = == Defining a Cluster Node == Each node in the cluster will have an entry in the nodes section containing its UUID, uname, and type. .Example Corosync cluster node entry ====== [source,XML] ====== In normal circumstances, the admin should let the cluster populate this information automatically from the communications and membership data. [[s-node-name]] == Where Pacemaker Gets the Node Name == Traditionally, Pacemaker required nodes to be referred to by the value returned by `uname -n`. This can be problematic for services that require the `uname -n` to be a specific value (e.g. for a licence file). This requirement has been relaxed for clusters using Corosync 2.0 or later. The name Pacemaker uses is: . The value stored in +corosync.conf+ under *ring0_addr* in the *nodelist*, if it does not contain an IP address; otherwise . The value stored in +corosync.conf+ under *name* in the *nodelist*; otherwise . The value of `uname -n` Pacemaker provides the `crm_node -n` command which displays the name used by a running cluster. If a Corosync *nodelist* is used, `crm_node --name-for-id` pass:[number] is also available to display the name used by the node with the corosync *nodeid* of pass:[number], for example: `crm_node --name-for-id 2`. [[s-node-attributes]] == Node Attributes == indexterm:[Node,attribute] -'Node attributes' are a special type of option (name-value pair) that -applies to a node object. +Pacemaker allows node-specific values to be specified using 'node attributes'. +A node attribute has a name, and may have a distinct value for each node. -Beyond the basic definition of a node, the administrator can -describe the node's attributes, such as how much RAM, disk, what OS or -kernel version it has, perhaps even its physical location. This -information can then be used by the cluster when deciding where to -place resources. For more information on the use of node attributes, -see <>. +While certain node attributes have specific meanings to the cluster, they are +mainly intended to allow administrators and resource agents to track any +information desired. -Node attributes can be specified ahead of time or populated later, -when the cluster is running, using `crm_attribute`. +For example, an administrator might choose to define node attributes for how +much RAM and disk space each node has, which OS each uses, or which server room +rack each node is in. -Below is what the node's definition would look like if the admin ran the command: +Users can configure <> that use node attributes to affect +where resources are placed. + +=== Setting and querying node attributes === + +Node attributes can be set and queried using the `crm_attribute` and +`attrd_updater` commands, so that the user does not have to deal with XML +configuration directly. + +Here is an example of what XML configuration would be generated if an +administrator ran this command: .Result of using crm_attribute to specify which kernel pcmk-1 is running ====== ------- # crm_attribute --type nodes --node pcmk-1 --name kernel --update $(uname -r) ------- [source,XML] ------- - - - + + + ------- ====== -Rather than having to read the XML, a simpler way to determine the current -value of an attribute is to use `crm_attribute` again: +To read back the value that was just set: ---- # crm_attribute --type nodes --node pcmk-1 --name kernel --query -scope=nodes name=kernel value=3.10.0-123.13.2.el7.x86_64 +scope=nodes name=kernel value=3.10.0-862.14.4.el7.x86_64 ---- By specifying `--type nodes` the admin tells the cluster that this -attribute is persistent. There are also transient attributes which -are kept in the status section which are "forgotten" whenever the node -rejoins the cluster. The cluster uses this area to store a record of -how many times a resource has failed on that node, but administrators -can also read and write to this section by specifying `--type status`. +attribute is persistent across reboots. There are also transient attributes +which are kept in the status section and are "forgotten" whenever the node +leaves the cluster. Administrators can use this section by specifying +`--type status`. + +=== Special node attributes === + +Certain node attributes have special meaning to the cluster. + +Node attribute names beginning with # are considered reserved for these +special attributes. Some special attributes do not start with #, for +historical reasons. + +Certain special attributes are set automatically by the cluster, should never +be modified directly, and can be used only within <>; +these are listed under <>. + +For true/false values, the cluster considers a value of "1", "y", "yes", "on", +or "true" (case-insensitively) to be true, "0", "n", "no", "off", "false", or +unset to be false, and anything else to be an error. + +.Node attributes with special significance +[width="95%",cols="2m,<5",options="header",align="center"] +|==== +|Name |Description + +| fail-count-* +| Attributes whose names start with +fail-count-+ are managed by the cluster + to track how many times particular resource operations have failed on this + node. These should be queried and cleared via the `crm_failcount` or + `crm_resource --cleanup` commands rather than directly. +indexterm:[Node,attribute,fail-count-] +indexterm:[fail-count-,Node attribute] + +| last-failure-* +| Attributes whose names start with +last-failure-+ are managed by the cluster + to track when particular resource operations have most recently failed on + this node. These should be cleared via the `crm_failcount` or + `crm_resource --cleanup` commands rather than directly. +indexterm:[Node,attribute,last-failure-] +indexterm:[last-failure-,Node attribute] + +| maintenance +| Similar to the +maintenance-mode+ <>, but for + a single node. If true, resources will not be started or stopped on the node, + resources and individual clone instances running on the node will become + unmanaged, and any recurring operations for those will be cancelled. +indexterm:[Node,attribute,maintenance] +indexterm:[maintenance,Node attribute] + +| probe_complete +| This is managed by the cluster to detect when nodes need to be reprobed, and + should never be used directly. +indexterm:[Node,attribute,probe_complete] +indexterm:[probe_complete,Node attribute] + +| resource-discovery-enabled +| If the node is a remote node, fencing is enabled, and this attribute is + explicitly set to false (unset means true in this case), resource discovery + (probes) will not be done on this node. This is highly discouraged; the + +resource-discovery+ location constraint property is preferred for this + purpose. +indexterm:[Node,attribute,resource-discovery-enabled] +indexterm:[resource-discovery-enabled,Node attribute] + +| shutdown +| This is managed by the cluster to orchestrate the shutdown of a node, + and should never be used directly. +indexterm:[Node,attribute,shutdown] +indexterm:[shutdown,Node attribute] + +| site-name +| If set, this will be used as the value of the +#site-name+ node attribute + used in rules. (If not set, the value of the +cluster-name+ cluster option + will be used as +#site-name+ instead.) +indexterm:[Node,attribute,site-name] +indexterm:[site-name,Node attribute] + +| standby +| If true, the node is in standby mode. This is typically set and queried via + the `crm_standby` command rather than directly. +indexterm:[Node,attribute,standby] +indexterm:[standby,Node attribute] + +| terminate +| If the value is true or begins with any nonzero number, the node will be + fenced. This is typically set by tools rather than directly. +indexterm:[Node,attribute,terminate] +indexterm:[terminate,Node attribute] + +| #digests-* +| Attributes whose names start with +#digests-+ are managed by the cluster to + detect when <> needs to be redone, and should never be + used directly. +indexterm:[Node,attribute,#digests-] +indexterm:[#digests-,Node attribute] + +| #node-unfenced +| When the node was last unfenced (as seconds since the epoch). This is managed + by the cluster and should never be used directly. +indexterm:[Node,attribute,#node-unfenced] +indexterm:[#node-unfenced,Node attribute] + +|==== + +[WARNING] +==== +Restarting pacemaker on a node that is in single-node maintenance mode will +likely lead to undesirable effects. If +maintenance+ is set as a transient +attribute, it will be erased when pacemaker is stopped, which will immediately +take the node out of maintenance mode and likely get it fenced. Even if +permanent, if pacemaker is restarted, any resources active on the node will +have their local history erased when the node rejoins, so the cluster will no +longer consider them running on the node and thus will consider them managed +again, leading them to be started elsewhere. This behavior might be improved +in a future release. +==== diff --git a/doc/Pacemaker_Explained/en-US/Ch-Rules.txt b/doc/Pacemaker_Explained/en-US/Ch-Rules.txt index 37617f04c5..3e56b19b89 100644 --- a/doc/Pacemaker_Explained/en-US/Ch-Rules.txt +++ b/doc/Pacemaker_Explained/en-US/Ch-Rules.txt @@ -1,643 +1,644 @@ :compat-mode: legacy = Rules = //// We prefer [[ch-rules]], but older versions of asciidoc don't deal well with that construct for chapter headings //// anchor:ch-rules[Chapter 8, Rules] indexterm:[Resource,Constraint,Rule] Rules can be used to make your configuration more dynamic. One common example is to set one value for +resource-stickiness+ during working hours, to prevent resources from being moved back to their most preferred location, and another on weekends when no-one is around to notice an outage. Another use of rules might be to assign machines to different processing groups (using a node attribute) based on time and to then use that attribute when creating location constraints. Each rule can contain a number of expressions, date-expressions and even other rules. The results of the expressions are combined based on the rule's +boolean-op+ field to determine if the rule ultimately evaluates to +true+ or +false+. What happens next depends on the context in which the rule is being used. == Rule Properties == .Properties of a Rule [width="95%",cols="2m,1,<5",options="header",align="center"] |========================================================= |Field |Default |Description |id | |A unique name for the rule (required) indexterm:[id,Constraint Rule] indexterm:[Constraint,Rule,id] |role |+Started+ |Limits the rule to apply only when the resource is in the specified role. Allowed values are +Started+, +Slave+, and +Master+. A rule with +role="Master"+ cannot determine the initial location of a clone instance and will only affect which of the active instances will be promoted. indexterm:[role,Constraint Rule] indexterm:[Constraint,Rule,role] |score | |The score to apply if the rule evaluates to +true+. Limited to use in rules that are part of location constraints. indexterm:[score,Constraint Rule] indexterm:[Constraint,Rule,score] |score-attribute | |The node attribute to look up and use as a score if the rule evaluates to +true+. Limited to use in rules that are part of location constraints. indexterm:[score-attribute,Constraint Rule] indexterm:[Constraint,Rule,score-attribute] |boolean-op |+and+ |How to combine the result of multiple expression objects. Allowed values are +and+ and +or+. indexterm:[boolean-op,Constraint Rule] indexterm:[Constraint,Rule,boolean-op] |========================================================= == Node Attribute Expressions == +[[node-attribute-expressions]] indexterm:[Resource,Constraint,Attribute Expression] Expression objects are used to control a resource based on the attributes defined by a node or nodes. .Properties of an Expression [width="95%",cols="2m,1,<5",options="header",align="center"] |========================================================= |Field |Default |Description |id | |A unique name for the expression (required) indexterm:[id,Constraint Expression] indexterm:[Constraint,Attribute Expression,id] |attribute | |The node attribute to test (required) indexterm:[attribute,Constraint Expression] indexterm:[Constraint,Attribute Expression,attribute] |type |+string+ |Determines how the value(s) should be tested. Allowed values are +string+, +integer+, and +version+. indexterm:[type,Constraint Expression] indexterm:[Constraint,Attribute Expression,type] |operation | a|The comparison to perform (required). Allowed values: * +lt:+ True if the value of the node's +attribute+ is less than +value+ * +gt:+ True if the value of the node's +attribute+ is greater than +value+ * +lte:+ True if the value of the node's +attribute+ is less than or equal to +value+ * +gte:+ True if the value of the node's +attribute+ is greater than or equal to +value+ * +eq:+ True if the value of the node's +attribute+ is equal to +value+ * +ne:+ True if the value of the node's +attribute+ is not equal to +value+ * +defined:+ True if the node has the named attribute * +not_defined:+ True if the node does not have the named attribute indexterm:[operation,Constraint Expression] indexterm:[Constraint,Attribute Expression,operation] |value | |User-supplied value for comparison (required) indexterm:[value,Constraint Expression] indexterm:[Constraint,Attribute Expression,value] |value-source |+literal+ a|How the +value+ is derived. Allowed values: * +literal+: +value+ is a literal string to compare against * +param+: +value+ is the name of a resource parameter to compare against (only valid in location constraints) * +meta+: +value+ is the name of a resource meta-attribute to compare against (only valid in location constraints) indexterm:[value,Constraint Expression] indexterm:[Constraint,Attribute Expression,value] |========================================================= In addition to any attributes added by the administrator, the cluster defines special, built-in node attributes for each node that can also be used. .Built-in node attributes [width="95%",cols="1m,<5",options="header",align="center"] |========================================================= |Name |Value |#uname |Node <> |#id |Node ID |#kind |Node type. Possible values are +cluster+, +remote+, and +container+. Kind is +remote+ for Pacemaker Remote nodes created with the +ocf:pacemaker:remote+ resource, and +container+ for Pacemaker Remote guest nodes and bundle nodes |#is_dc |"true" if this node is a Designated Controller (DC), "false" otherwise |#cluster-name |The value of the +cluster-name+ cluster property, if set |#site-name |The value of the +site-name+ node attribute, if set, otherwise identical to +#cluster-name+ |#role a|The role the relevant promotable clone resource has on this node. Valid only within a rule for a location constraint for a promotable clone resource. //// // if uncommenting, put a pipe in front of first two lines #ra-version The installed version of the resource agent on the node, as defined by the +version+ attribute of the +resource-agent+ tag in the agent's metadata. Valid only within rules controlling resource options. This can be useful during rolling upgrades of a backward-incompatible resource agent. '(coming in x.x.x)' //// |========================================================= == Time- and Date-Based Expressions == indexterm:[Time Based Expressions] indexterm:[Resource,Constraint,Date/Time Expression] As the name suggests, +date_expressions+ are used to control a resource or cluster option based on the current date/time. They may contain an optional +date_spec+ and/or +duration+ object depending on the context. .Properties of a Date Expression [width="95%",cols="2m,<5",options="header",align="center"] |========================================================= |Field |Description |start |A date/time conforming to the http://en.wikipedia.org/wiki/ISO_8601[ISO8601] specification. indexterm:[start,Constraint Expression] indexterm:[Constraint,Date/Time Expression,start] |end |A date/time conforming to the http://en.wikipedia.org/wiki/ISO_8601[ISO8601] specification. Can be inferred by supplying a value for +start+ and a +duration+. indexterm:[end,Constraint Expression] indexterm:[Constraint,Date/Time Expression,end] |operation a|Compares the current date/time with the start and/or end date, depending on the context. Allowed values: * +gt:+ True if the current date/time is after +start+ * +lt:+ True if the current date/time is before +end+ * +in_range:+ True if the current date/time is after +start+ and before +end+ * +date_spec:+ True if the current date/time matches a +date_spec+ object (described below) indexterm:[operation,Constraint Expression] indexterm:[Constraint,Date/Time Expression,operation] |========================================================= [NOTE] ====== As these comparisons (except for +date_spec+) include the time, the +eq+, +neq+, +gte+ and +lte+ operators have not been implemented since they would only be valid for a single second. ====== === Date Specifications === indexterm:[Date Specification] indexterm:[Resource,Constraint,Date Specification] +date_spec+ objects are used to create cron-like expressions relating to time. Each field can contain a single number or a single range. Instead of defaulting to zero, any field not supplied is ignored. For example, +monthdays="1"+ matches the first day of every month and +hours="09-17"+ matches the hours between 9am and 5pm (inclusive). At this time, multiple ranges (e.g. +weekdays="1,2"+ or +weekdays="1-2,5-6"+) are not supported; depending on demand, this might be implemented in a future release. .Properties of a Date Specification [width="95%",cols="2m,<5",options="header",align="center"] |========================================================= |Field |Description |id |A unique name for the object indexterm:[id,Date Specification] indexterm:[Constraint,Date Specification,id] |hours |Allowed values: 0-23 indexterm:[hours,Date Specification] indexterm:[Constraint,Date Specification,hours] |monthdays |Allowed values: 1-31 (depending on month and year) indexterm:[monthdays,Date Specification] indexterm:[Constraint,Date Specification,monthdays] |weekdays |Allowed values: 1-7 (1=Monday, 7=Sunday) indexterm:[weekdays,Date Specification] indexterm:[Constraint,Date Specification,weekdays] |yeardays |Allowed values: 1-366 (depending on the year) indexterm:[yeardays,Date Specification] indexterm:[Constraint,Date Specification,yeardays] |months |Allowed values: 1-12 indexterm:[months,Date Specification] indexterm:[Constraint,Date Specification,months] |weeks |Allowed values: 1-53 (depending on weekyear) indexterm:[weeks,Date Specification] indexterm:[Constraint,Date Specification,weeks] |years |Year according to the Gregorian calendar indexterm:[years,Date Specification] indexterm:[Constraint,Date Specification,years] |weekyears |Year in which the week started; e.g. 1 January 2005 can be specified as '2005-001 Ordinal', '2005-01-01 Gregorian' or '2004-W53-6 Weekly' and thus would match +years="2005"+ or +weekyears="2004"+ indexterm:[weekyears,Date Specification] indexterm:[Constraint,Date Specification,weekyears] |moon |Allowed values are 0-7 (0 is new, 4 is full moon). Seriously, you can use this. This was implemented to demonstrate the ease with which new comparisons could be added. indexterm:[moon,Date Specification] indexterm:[Constraint,Date Specification,moon] |========================================================= === Durations === indexterm:[Duration] indexterm:[Resource,Constraint,Duration] Durations are used to calculate a value for +end+ when one is not supplied to +in_range+ operations. They contain the same fields as +date_spec+ objects but without the limitations (e.g. you can have a duration of 19 months). As with +date_specs+, any field not supplied is ignored. === Sample Time-Based Expressions === A small sample of how time-based expressions can be used: //// On older versions of asciidoc, the [source] directive makes the title disappear //// .True if now is any time in the year 2005 ==== [source,XML] ---- ---- ==== .Equivalent expression ==== [source,XML] ---- ---- ==== .9am-5pm Monday-Friday ==== [source,XML] ------- ------- ==== Please note that the +16+ matches up to +16:59:59+, as the numeric value (hour) still matches! .9am-6pm Monday through Friday or anytime Saturday ==== [source,XML] ------- ------- ==== .9am-5pm or 9pm-12am Monday through Friday ==== [source,XML] ------- ------- ==== .Mondays in March 2005 ==== [source,XML] ------- ------- ==== [NOTE] ====== Because no time is specified with the above dates, 00:00:00 is implied. This means that the range includes all of 2005-03-01 but none of 2005-04-01. You may wish to write +end="2005-03-31T23:59:59"+ to avoid confusion. ====== .A full moon on Friday the 13th ===== [source,XML] ------- ------- ===== == Using Rules to Determine Resource Location == indexterm:[Rule,Determine Resource Location] indexterm:[Resource,Location,Determine by Rules] A location constraint may contain rules. When the constraint's outermost rule evaluates to +false+, the cluster treats the constraint as if it were not there. When the rule evaluates to +true+, the node's preference for running the resource is updated with the score associated with the rule. If this sounds familiar, it is because you have been using a simplified syntax for location constraint rules already. Consider the following location constraint: .Prevent myApacheRsc from running on c001n03 ===== [source,XML] ------- ------- ===== This constraint can be more verbosely written as: .Prevent myApacheRsc from running on c001n03 - expanded version ===== [source,XML] ------- ------- ===== The advantage of using the expanded form is that one can then add extra clauses to the rule, such as limiting the rule such that it only applies during certain times of the day or days of the week. === Location Rules Based on Other Node Properties === The expanded form allows us to match on node properties other than its name. If we rated each machine's CPU power such that the cluster had the following nodes section: .A sample nodes section for use with score-attribute ===== [source,XML] ------- ------- ===== then we could prevent resources from running on underpowered machines with this rule: [source,XML] ------- ------- === Using +score-attribute+ Instead of +score+ === When using +score-attribute+ instead of +score+, each node matched by the rule has its score adjusted differently, according to its value for the named node attribute. Thus, in the previous example, if a rule used +score-attribute="cpu_mips"+, +c001n01+ would have its preference to run the resource increased by +1234+ whereas +c001n02+ would have its preference increased by +5678+. == Using Rules to Control Resource Options == Often some cluster nodes will be different from their peers. Sometimes, these differences -- e.g. the location of a binary or the names of network interfaces -- require resources to be configured differently depending on the machine they're hosted on. By defining multiple +instance_attributes+ objects for the resource and adding a rule to each, we can easily handle these special cases. In the example below, +mySpecialRsc+ will use eth1 and port 9999 when run on +node1+, eth2 and port 8888 on +node2+ and default to eth0 and port 9999 for all other nodes. .Defining different resource options based on the node name ===== [source,XML] ------- ------- ===== The order in which +instance_attributes+ objects are evaluated is determined by their score (highest to lowest). If not supplied, score defaults to zero, and objects with an equal score are processed in listed order. If the +instance_attributes+ object has no rule or a +rule+ that evaluates to +true+, then for any parameter the resource does not yet have a value for, the resource will use the parameter values defined by the +instance_attributes+. For example, given the configuration above, if the resource is placed on node1: . +special-node1+ has the highest score (3) and so is evaluated first; its rule evaluates to +true+, so +interface+ is set to +eth1+. . +special-node2+ is evaluated next with score 2, but its rule evaluates to +false+, so it is ignored. . +defaults+ is evaluated last with score 1, and has no rule, so its values are examined; +interface+ is already defined, so the value here is not used, but +port+ is not yet defined, so +port+ is set to +9999+. == Using Rules to Control Cluster Options == indexterm:[Rule,Controlling Cluster Options] indexterm:[Cluster,Setting Options with Rules] Controlling cluster options is achieved in much the same manner as specifying different resource options on different nodes. The difference is that because they are cluster options, one cannot (or should not, because they won't work) use attribute-based expressions. The following example illustrates how to set a different +resource-stickiness+ value during and outside work hours. This allows resources to automatically move back to their most preferred hosts, but at a time that (in theory) does not interfere with business activities. .Change +resource-stickiness+ during working hours ===== [source,XML] ------- ------- ===== [[s-rules-recheck]] == Ensuring Time-Based Rules Take Effect == A Pacemaker cluster is an event-driven system. As such, it won't recalculate the best place for resources to run unless something (like a resource failure or configuration change) happens. This can mean that a location constraint that only allows resource X to run between 9am and 5pm is not enforced. If you rely on time-based rules, the +cluster-recheck-interval+ cluster option (which defaults to 15 minutes) is essential. This tells the cluster to periodically recalculate the ideal state of the cluster. For example, if you set +cluster-recheck-interval="5m"+, then sometime between 09:00 and 09:05 the cluster would notice that it needs to start resource X, and between 17:00 and 17:05 it would realize that X needed to be stopped. The timing of the actual start and stop actions depends on what other actions the cluster may need to perform first.