diff --git a/doc/Pacemaker_Explained/en-US/Ch-Options.txt b/doc/Pacemaker_Explained/en-US/Ch-Options.txt index 23c86ebc72..7b7d2db0b2 100644 --- a/doc/Pacemaker_Explained/en-US/Ch-Options.txt +++ b/doc/Pacemaker_Explained/en-US/Ch-Options.txt @@ -1,291 +1,387 @@ = Cluster Options = == Special Options == The reason for these fields to be placed at the top level instead of with the rest of cluster options is simply a matter of parsing. These options are used by the configuration database which is, by design, mostly ignorant of the content it holds. So the decision was made to place them in an easy to find location. == Configuration Version == indexterm:[Configuration Version,Cluster] indexterm:[Cluster,Option,Configuration Version] When a node joins the cluster, the cluster will perform a check to see who has the best configuration based on the fields below. It then asks the node with the highest (+admin_epoch+, +epoch+, +num_updates+) tuple to replace the configuration on all the nodes - which makes setting them, and setting them correctly, very important. .Configuration Version Properties [width="95%",cols="2m,5<",options="header",align="center"] |========================================================= |Field |Description | admin_epoch | indexterm:[admin_epoch,Cluster Option] indexterm:[Cluster,Option,admin_epoch] Never modified by the cluster. Use this to make the configurations on any inactive nodes obsolete. _Never set this value to zero_, in such cases the cluster cannot tell the difference between your configuration and the "empty" one used when nothing is found on disk. | epoch | indexterm:[epoch,Cluster Option] indexterm:[Cluster,Option,epoch] Incremented every time the configuration is updated (usually by the admin) | num_updates | indexterm:[num_updates,Cluster Option] indexterm:[Cluster,Option,num_updates] Incremented every time the configuration or status is updated (usually by the cluster) |========================================================= == Other Fields == .Properties Controlling Validation [width="95%",cols="2m,5<",options="header",align="center"] |========================================================= |Field |Description | validate-with | indexterm:[validate-with,Cluster Option] indexterm:[Cluster,Option,validate-with] Determines the type of validation being done on the configuration. If set to "none", the cluster will not verify that updates conform to the DTD (nor reject ones that don't). This option can be useful when operating a mixed version cluster during an upgrade. |========================================================= == Fields Maintained by the Cluster == .Properties Maintained by the Cluster [width="95%",cols="2m,5<",options="header",align="center"] |========================================================= |Field |Description |cib-last-written | indexterm:[cib-last-written,Cluster Property] indexterm:[Cluster,Property,cib-last-written] Indicates when the configuration was last written to disk. Informational purposes only. |dc-uuid | indexterm:[dc-uuid,Cluster Property] indexterm:[Cluster,Property,dc-uuid] Indicates which cluster node is the current leader. Used by the cluster when placing resources and determining the order of some events. |have-quorum | indexterm:[have-quorum,Cluster Property] indexterm:[Cluster,Property,have-quorum] Indicates if the cluster has quorum. If false, this may mean that the cluster cannot start resources or fence other nodes. See +no-quorum-policy+ below. +| dc-version | +indexterm:[dc-version,Cluster Peroperty] +indexterm:[Cluster,Peroperty,dc-version] +Version of Pacemaker on the cluster's DC. + +Often includes the hash which identifies the exact Git changeset it +was built from. Used for diagnostic purposes. + +| cluster-infrastructure | +indexterm:[cluster-infrastructure,Cluster Peroperty] +indexterm:[Cluster,Peroperty,cluster-infrastructure] +The messaging stack on which Pacemaker is currently running. +Used for informational and diagnostic purposes. + +| expected-quorum-votes | +indexterm:[expected-quorum-votes,Cluster Peroperty] +indexterm:[Cluster,Peroperty,expected-quorum-votes] +The number of nodes expected to be in the cluster + +Used to calculate quorum in Corosync 1.x (not CMAN) based clusters. + |========================================================= Note that although these fields can be written to by the admin, in most cases the cluster will overwrite any values specified by the admin with the "correct" ones. To change the +admin_epoch+, for example, one would use: [source,C] # cibadmin --modify --crm_xml '' A complete set of fields will look something like this: .An example of the fields set for a cib object ====== [source,XML] ------- ------- ====== == Cluster Options == Cluster options, as you might expect, control how the cluster behaves when confronted with certain situations. They are grouped into sets and, in advanced configurations, there may be more than one. footnote:[This will be described later in the section on <> where we will show how to have the cluster use different sets of options during working hours (when downtime is usually to be avoided at all costs) than it does during the weekends (when resources can be moved to the their preferred hosts without bothering end users)] For now we will describe the simple case where each option is present at most once. == Available Cluster Options == .Cluster Options [width="95%",cols="5m,2,11<",options="header",align="center"] |========================================================= |Option |Default |Description | batch-limit | 30 | indexterm:[batch-limit,Cluster Option] indexterm:[Cluster,Option,batch-limit] The number of jobs that the TE is allowed to execute in parallel. The "correct" value will depend on the speed and load of your network and cluster nodes. | migration-limit | -1 (unlimited) | indexterm:[migration-limit,Cluster Option] indexterm:[Cluster,Option,migration-limit] The number of migration jobs that the TE is allowed to execute in parallel on a node. | no-quorum-policy | stop | indexterm:[no-quorum-policy,Cluster Option] indexterm:[Cluster,Option,no-quorum-policy] What to do when the cluster does not have quorum. Allowed values: * ignore - continue all resource management * freeze - continue resource management, but don't recover resources from nodes not in the affected partition * stop - stop all resources in the affected cluster partition * suicide - fence all nodes in the affected cluster partition | symmetric-cluster | TRUE | indexterm:[symmetric-cluster,Cluster Option] indexterm:[Cluster,Option,symmetric-cluster] Can all resources run on any node by default? | stonith-enabled | TRUE | indexterm:[stonith-enabled,Cluster Option] indexterm:[Cluster,Option,stonith-enabled] Should failed nodes and nodes with resources that can't be stopped be shot? If you value your data, set up a STONITH device and enable this. If true, or unset, the cluster will refuse to start resources unless one or more STONITH resources have been configured also. | stonith-action | reboot | indexterm:[stonith-action,Cluster Option] indexterm:[Cluster,Option,stonith-action] Action to send to STONITH device. Allowed values: reboot, off. The value 'poweroff' is also allowed, but is only used for legacy devices. | cluster-delay | 60s | indexterm:[cluster-delay,Cluster Option] indexterm:[Cluster,Option,cluster-delay] Round trip delay over the network (excluding action execution). The "correct" value will depend on the speed and load of your network and cluster nodes. | stop-all-resources | FALSE | indexterm:[stop-all-resources,Cluster Option] indexterm:[Cluster,Option,stop-all-resources] Should the cluster stop all stop | resources-orphan-resources | TRUE | indexterm:[stop-orphan-resources,Cluster Option] indexterm:[Cluster,Option,stop-orphan-resources] Should deleted resources be stopped? | stop-orphan-actions | TRUE | indexterm:[stop-orphan-actions,Cluster Option] indexterm:[Cluster,Option,stop-orphan-actions] Should deleted actions be cancelled? | start-failure-is-fatal | TRUE | indexterm:[start-failure-is-fatal,Cluster Option] indexterm:[Cluster,Option,start-failure-is-fatal] When set to FALSE, the cluster will instead use the resource's +failcount+ and value for +resource-failure-stickiness+. | pe-error-series-max | -1 (all) | indexterm:[pe-error-series-max,Cluster Option] indexterm:[Cluster,Option,pe-error-series-max] The number of PE inputs resulting in ERRORs to save. Used when reporting problems. | pe-warn-series-max | -1 (all) | indexterm:[pe-warn-series-max,Cluster Option] indexterm:[Cluster,Option,pe-warn-series-max] The number of PE inputs resulting in WARNINGs to save. Used when reporting problems. | pe-input-series-max | -1 (all) | indexterm:[pe-input-series-max,Cluster Option] indexterm:[Cluster,Option,pe-input-series-max] The number of "normal" PE inputs to save. Used when reporting problems. +|default-resource-stickiness | 0 | +indexterm:[default-resource-stickiness,Cluster Option] +indexterm:[Cluster,Option,default-resource-stickiness] ++Deprecated:+ See <> instead + +| is-managed-default | TRUE | +indexterm:[is-managed-default,Cluster Option] +indexterm:[Cluster,Option,is-managed-default] ++Deprecated:+ See <> instead + +| maintenance-mode | FALSE | +indexterm:[maintenance-mode,Cluster Option] +indexterm:[Cluster,Option,maintenance-mode] +Should the cluster monitor resources and start/stop them as required + +| stonith-timeout | 60s | +indexterm:[stonith-timeout,Cluster Option] +indexterm:[Cluster,Option,stonith-timeout] +How long to wait for the STONITH action to complete + +| default-action-timeout | 20s | +indexterm:[default-action-timeout,Cluster Option] +indexterm:[Cluster,Option,default-action-timeout] ++Deprecated:+ See <> instead + +| dc-deadtime | 20s | +indexterm:[dc-deadtime,Cluster Option] +indexterm:[Cluster,Option,dc-deadtime] +How long to wait for a response from other nodes during startup. + +The "correct" value will depend on the speed/load of your network and the type of switches used. + +| cluster-recheck-interval | 15min | +indexterm:[cluster-recheck-interval,Cluster Option] +indexterm:[Cluster,Option,cluster-recheck-interval] +Polling interval for time based changes to options, resource parameters and constraints. + +The Cluster is primarily event driven, however the configuration can have elements that change based on time. To ensure these changes take effect, we can optionally poll the cluster's status for changes. + +Allowed values: Zero disables polling. Positive values are an interval in seconds (unless other SI units are specified. eg. 5min) + +| election-timeout | 2min | +indexterm:[election-timeout,Cluster Option] +indexterm:[Cluster,Option,election-timeout] ++Advanced Use Only+ + +If need to adjust this value, it probably indicates the presence of a bug. + +| shutdown-escalation | 20min | +indexterm:[shutdown-escalation,Cluster Option] +indexterm:[Cluster,Option,shutdown-escalation] ++Advanced Use Only+ + +If need to adjust this value, it probably indicates the presence of a bug. + +| crmd-integration-timeout | 3min | +indexterm:[crmd-integration-timeout,Cluster Option] +indexterm:[Cluster,Option,crmd-integration-timeout] ++Advanced Use Only+ + +If need to adjust this value, it probably indicates the presence of a bug. + +| crmd-finalization-timeout | 30min | +indexterm:[crmd-finalization-timeout,Cluster Option] +indexterm:[Cluster,Option,crmd-finalization-timeout] ++Advanced Use Only+ + +If need to adjust this value, it probably indicates the presence of a bug. + +| crmd-transition-delay | | +indexterm:[crmd-transition-delay,Cluster Option] +indexterm:[Cluster,Option,crmd-transition-delay] ++Advanced Use Only+ Enabling this option will slow down cluster recovery under all conditions. + +Delay cluster recovery for the configured interval to allow for additional/related events to occur. Useful if your configuration is sensitive to the order in which ping updates arrive. + |========================================================= You can always obtain an up-to-date list of cluster options, including -their default values, by running the `pengine -metadata` command. +their default values, by running the `man pengine` and `man crmd` commands. == Querying and Setting Cluster Options == indexterm:[Querying,Cluster Option] indexterm:[Setting,Cluster Option] indexterm:[Cluster,Querying Options] indexterm:[Cluster,Setting Options] Cluster options can be queried and modified using the `crm_attribute` tool. To get the current value of +cluster-delay+, simply use: [source,C] # crm_attribute --attr-name cluster-delay --get-value which is more simply written as [source,C] # crm_attribute --get-value -n cluster-delay If a value is found, you'll see a result like this: [source,C] # crm_attribute --get-value -n cluster-delay name=cluster-delay value=60s However, if no value is found, the tool will display an error: [source,C] # crm_attribute --get-value -n clusta-deway` name=clusta-deway value=(null) Error performing operation: The object/attribute does not exist To use a different value, eg. +30+, simply run: [source,C] # crm_attribute --attr-name cluster-delay --attr-value 30s To go back to the cluster's default value you can delete the value, for example with this command: [source,C] # crm_attribute --attr-name cluster-delay --delete-attr == When Options are Listed More Than Once == If you ever see something like the following, it means that the option you're modifying is present more than once. .Deleting an option that is listed twice ======= [source,C] ------ # crm_attribute --attr-name batch-limit --delete-attr Multiple attributes match name=batch-limit in crm_config: Value: 50 (set=cib-bootstrap-options, id=cib-bootstrap-options-batch-limit) Value: 100 (set=custom, id=custom-batch-limit) Please choose from one of the matches above and supply the 'id' with --attr-id ------- ======= In such cases follow the on-screen instructions to perform the requested action. To determine which value is currently being used by the cluster, please refer to <>.