diff --git a/doc/Pacemaker_Explained/en-US/Ap-FAQ.txt b/doc/Pacemaker_Explained/en-US/Ap-FAQ.txt index 1a6beb9af7..2e4228f541 100644 --- a/doc/Pacemaker_Explained/en-US/Ap-FAQ.txt +++ b/doc/Pacemaker_Explained/en-US/Ap-FAQ.txt @@ -1,59 +1,59 @@ [appendix] [[ap-faq]] == FAQ == [qanda] Why is the Project Called Pacemaker?:: indexterm:[Pacemaker] First of all, the reason it's not called the CRM is because of the abundance of terms footnote:[http://en.wikipedia.org/wiki/CRM] that are commonly abbreviated to those three letters. The Pacemaker name came from Kham, footnote:[http://khamsouk.souvanlasy.com/] a good friend of Pacemaker developer Andrew Beekhof's, and was originally used by a Java GUI that Beekhof was prototyping in early 2007. Alas, other commitments prevented the GUI from progressing much and, when it came time to choose a name for this project, Lars Marowsky-Bree suggested it was an even better fit for an independent CRM. The idea stems from the analogy between the role of this software and that of the little device that keeps the human heart pumping. Pacemaker monitors the cluster and intervenes when necessary to ensure the smooth operation of the services it provides. There were a number of other names (and acronyms) tossed around, but suffice to say "Pacemaker" was the best. Why was the Pacemaker Project Created?:: Pacemaker was spun off from an earlier project called http://linux-ha.org/[Heartbeat], which combined a cluster layer and a cluster resource manager. The CRM was made into its own project, Pacemaker, in order to: * support both the Corosync and Heartbeat cluster stacks equally (Heartbeat support was dropped in Pacemaker 2.0, as the project had faded out by then) * decouple the release cycles of the cluster layer and the cluster resource manager at very different stages of their life-cycles * foster clearer package boundaries, thus leading to better and more stable interfaces What Messaging Layers are Supported?:: indexterm:[Messaging Layers] - * http://www.corosync.org/[Corosync] version 2 + * http://www.corosync.org/[Corosync] version 2 and greater * Historically, Pacemaker 1 also supported Corosync version 1 (with either CMAN or a pacemaker plugin) and Heartbeat. Support for these legacy stacks was dropped with Pacemaker 2.0. Where Can I Get Pre-built Packages?:: Most major Linux distributions have pacemaker packages in their standard package repositories. See the http://clusterlabs.org/wiki/Install[Install wiki page] for details. What Versions of Pacemaker Are Supported?:: Some Linux distributions (such as Red Hat Enterprise Linux and SUSE Linux Enterprise) offer technical support for their customers; contact them for details of such support. For help within the community (mailing lists, IRC, etc.) from Pacemaker developers and users, refer to the http://clusterlabs.org/wiki/Releases[Releases wiki page] for an up-to-date list of versions considered to be supported by the project. When seeking assistance, please try to ensure you have one of these versions. diff --git a/doc/Pacemaker_Explained/en-US/Ap-Install.txt b/doc/Pacemaker_Explained/en-US/Ap-Install.txt index 25cb889319..7de68077cf 100644 --- a/doc/Pacemaker_Explained/en-US/Ap-Install.txt +++ b/doc/Pacemaker_Explained/en-US/Ap-Install.txt @@ -1,107 +1,107 @@ [appendix] == Installing == === Installing the Software === Most major Linux distributions have pacemaker packages in their standard package repositories, or the software can be built from source code. 
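For example, on many distributions the stack can be installed directly from those repositories. This is only a sketch; package names and the package manager vary by distribution and release, and some distributions ship the cluster packages in an add-on repository: ---- # dnf install pacemaker corosync pcs      # RPM-based distributions (assumed package names) # apt-get install pacemaker corosync      # Debian-based distributions (assumed package names) ----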
See the http://clusterlabs.org/wiki/Install[Install wiki page] for details. === Enabling Pacemaker === -==== Enabling Pacemaker For Corosync 2._x_ ==== +==== Enabling Pacemaker For Corosync version 2 and greater ==== High-level cluster management tools are available that can configure corosync for you. This document focuses on the lower-level details if you want to configure corosync yourself. Corosync configuration is normally located in +/etc/corosync/corosync.conf+. -.Corosync 2._x_ configuration file for two nodes *myhost1* and *myhost2* +.Corosync configuration file for two nodes *myhost1* and *myhost2* ==== ---- totem { version: 2 secauth: off cluster_name: mycluster transport: udpu } nodelist { node { ring0_addr: myhost1 nodeid: 1 } node { ring0_addr: myhost2 nodeid: 2 } } quorum { provider: corosync_votequorum two_node: 1 } logging { to_syslog: yes } ---- ==== -.Corosync 2._x_ configuration file for three nodes *myhost1*, *myhost2* and *myhost3* +.Corosync configuration file for three nodes *myhost1*, *myhost2* and *myhost3* ==== ---- totem { version: 2 secauth: off cluster_name: mycluster transport: udpu } nodelist { node { ring0_addr: myhost1 nodeid: 1 } node { ring0_addr: myhost2 nodeid: 2 } node { ring0_addr: myhost3 nodeid: 3 } } quorum { provider: corosync_votequorum } logging { to_syslog: yes } ---- ==== In the above examples, the +totem+ section defines what protocol version and options (including encryption) to use, footnote:[ Please consult the Corosync website (http://www.corosync.org/) and documentation for details on enabling encryption and peer authentication for the cluster. ] and gives the cluster a unique name (+mycluster+ in these examples). The +node+ section lists the nodes in this cluster. (See <> for how this affects pacemaker.) The +quorum+ section defines how the cluster uses quorum. The important thing is that two-node clusters must be handled specially, so +two_node: 1+ must be defined for two-node clusters (and only for two-node clusters). The +logging+ section should be self-explanatory. diff --git a/doc/Pacemaker_Explained/en-US/Ap-Upgrade.txt b/doc/Pacemaker_Explained/en-US/Ap-Upgrade.txt index 367cf49789..570410b520 100644 --- a/doc/Pacemaker_Explained/en-US/Ap-Upgrade.txt +++ b/doc/Pacemaker_Explained/en-US/Ap-Upgrade.txt @@ -1,443 +1,445 @@ [appendix] == Upgrading == [[ap-upgrade]] === Pacemaker Versioning === Pacemaker has an overall release version, plus separate version numbers for certain internal components. * *Pacemaker release version:* This version consists of three numbers (_x.y.z_). + The major version number (the _x_ in _x.y.z_) increases when at least some rolling upgrades are not possible from the previous major version. For example, a rolling upgrade from 1.0.8 to 1.1.15 should always be supported, but a rolling upgrade from 1.0.8 to 2.0.0 may not be possible. + The minor version (the _y_ in _x.y.z_) increases when there are significant changes in cluster default behavior, tool behavior, and/or the API interface (for software that utilizes Pacemaker libraries). The main benefit is to alert you to pay closer attention to the release notes, to see if you might be affected. + The release counter (the _z_ in _x.y.z_) is increased with all public releases of Pacemaker, which typically include both bug fixes and new features. * *CRM feature set:* This version number applies to the communication between full cluster nodes. 
+ It increases when a cluster node running the older version would have problems if the cluster's Designated Controller (DC) has the newer version. To avoid these problems, Pacemaker ensures that the longest-running node is the DC, and that nodes with an older feature set cannot join the cluster. * *LRMD protocol version:* This version applies to communication between a Pacemaker Remote node and the cluster. It increases when an older cluster node would have problems hosting the connection to a newer Pacemaker Remote node. To avoid these problems, Pacemaker Remote nodes will accept connections only from cluster nodes with the same or newer LRMD protocol version. + Unlike with CRM feature set differences between full cluster nodes, mixed LRMD protocol versions between Pacemaker Remote nodes and full cluster nodes are fine, as long as the Pacemaker Remote nodes have the older version. This can be useful, for example, to host a legacy application in an older operating system version used as a Pacemaker Remote node. * *XML schema version:* Pacemaker’s configuration syntax — what's allowed in the Cluster Information Base (CIB) — has its own version. This allows the configuration syntax to evolve over time while still allowing clusters with older configurations to work without change. === Upgrading Cluster Software === There are three approaches to upgrading a cluster, each with advantages and disadvantages. .Upgrade Methods [width="95%",cols="s,6*",options="header",align="center"] |========================================================= |Method |Available between all versions |Can be used with Pacemaker Remote nodes |Service outage during upgrade |Service recovery during upgrade |Exercises failover logic |Allows change of messaging layer indexterm:[Cluster,switching between stacks] indexterm:[Changing cluster stack] -footnote:[Currently, Corosync 2 is the only supported cluster stack, but other -stacks have been supported by past versions, and may be supported by future -versions.] +footnote:[Currently, Corosync version 2 and greater is the only supported +cluster stack, but other stacks have been supported by past versions, and may +be supported by future versions.] |Complete cluster shutdown indexterm:[upgrade,shutdown] indexterm:[shutdown upgrade] |yes |yes |always |N/A |no |yes |Rolling (node by node) indexterm:[upgrade,rolling] indexterm:[rolling upgrade] |no |yes |always footnote:[Any active resources will be moved off the node being upgraded, so there will be at least a brief outage unless all resources can be migrated "live".] |yes |yes |no |Detach and reattach indexterm:[upgrade,reattach] indexterm:[reattach upgrade] |yes |no |only due to failure |no |no |yes |========================================================= ==== Complete Cluster Shutdown ==== In this scenario, one shuts down all cluster nodes and resources, then upgrades all the nodes before restarting the cluster. . On each node: .. Shut down the cluster software (pacemaker and the messaging layer). .. Upgrade the Pacemaker software. This may also include upgrading the messaging layer and/or the underlying operating system. .. Check the configuration with the `crm_verify` tool. . On each node: .. Start the cluster software. - Currently, only Corosync version 2 is supported as the cluster layer, - but if another stack is supported in the future, the stack does not need to - be the same one before the upgrade.
+ Currently, only Corosync version 2 and greater is supported as the cluster + layer, but if another stack is supported in the future, the stack does not + need to be the same one before the upgrade. One variation of this approach is to build a new cluster on new hosts. This allows the new version to be tested beforehand, and minimizes downtime by having the new nodes ready to be placed in production as soon as the old nodes are shut down. ==== Rolling (node by node) ==== In this scenario, each node is removed from the cluster, upgraded, and then brought back online, until all nodes are running the newest version. Special considerations when planning a rolling upgrade: * If you plan to upgrade other cluster software -- such as the messaging layer -- at the same time, consult that software's documentation for its compatibility with a rolling upgrade. * If the major version number is changing in the Pacemaker version you are upgrading to, a rolling upgrade may not be possible. Read the new version's release notes (as well as the information here) for what limitations may exist. * If the CRM feature set is changing in the Pacemaker version you are upgrading to, you should run a mixed-version cluster only during a small rolling upgrade window. If one of the older nodes drops out of the cluster for any reason, it will not be able to rejoin until it is upgraded. * If the LRMD protocol version is changing, all cluster nodes should be upgraded before upgrading any Pacemaker Remote nodes. See the ClusterLabs wiki's http://clusterlabs.org/wiki/ReleaseCalendar[Release Calendar] to figure out whether the CRM feature set and/or LRMD protocol version changed between the Pacemaker release versions in your rolling upgrade. To perform a rolling upgrade, on each node in turn: . Put the node into standby mode, and wait for any active resources to be moved cleanly to another node. (This step is optional, but allows you to deal with any resource issues before the upgrade.) . Shut down the cluster software (pacemaker and the messaging layer) on the node. . Upgrade the Pacemaker software. This may also include upgrading the messaging layer and/or the underlying operating system. . If this is the first node to be upgraded, check the configuration with the `crm_verify` tool. . Start the messaging layer. - This must be the same messaging layer (currently only Corosync 2 is - supported) that the rest of the cluster is using. + This must be the same messaging layer (currently only Corosync version 2 and + greater is supported) that the rest of the cluster is using. [NOTE] ==== Even if a rolling upgrade from the current version of the cluster to the newest version is not directly possible, it may be possible to perform a rolling upgrade in multiple steps, by upgrading to an intermediate version first. .Version Compatibility Table [width="95%",cols="2*",options="header",align="center"] |========================================================= |Version being Installed |Oldest Compatible Version |Pacemaker 2.y.z |Pacemaker 1.1.11 footnote:[Rolling upgrades from Pacemaker 1.1.z to 2.y.z are possible only if -the cluster uses corosync 2 as its messaging layer, and the Cluster Information -Base (CIB) uses schema 1.0 or higher in its validate-with property.] +the cluster uses corosync version 2 or greater as its messaging layer, and the +Cluster Information Base (CIB) uses schema 1.0 or higher in its validate-with +property.]
|Pacemaker 1.y.z |Pacemaker 1.0.0 |Pacemaker 0.7.z |Pacemaker 0.6.z |========================================================= ==== ==== Detach and Reattach ==== The reattach method is a variant of a complete cluster shutdown, where the resources are left active and get re-detected when the cluster is restarted. This method may not be used if the cluster contains any Pacemaker Remote nodes. . Tell the cluster to stop managing services. This is required to allow the services to remain active after the cluster shuts down. + ---- # crm_attribute --name maintenance-mode --update true ---- . On each node, shut down the cluster software (pacemaker and the messaging layer), and upgrade the Pacemaker software. This may also include upgrading the messaging layer. While the underlying operating system may be upgraded at the same time, doing so is more likely to cause outages in the detached services (certainly if a reboot is required). . Check the configuration with the `crm_verify` tool. . On each node, start the cluster software. - Currently, only Corosync version 2 is supported as the cluster layer, - but if another stack is supported in the future, the stack does not need to - be the same one before the upgrade. + Currently, only Corosync version 2 and greater is supported as the cluster + layer, but if another stack is supported in the future, the stack does not + need to be the same one before the upgrade. . Verify that the cluster re-detected all resources correctly. . Allow the cluster to resume managing resources: + ---- # crm_attribute --name maintenance-mode --delete ---- === Upgrading the Configuration === indexterm:[upgrade,Configuration] indexterm:[Configuration,upgrading] The CIB schema version can change from one Pacemaker version to another. After cluster software is upgraded, the cluster will continue to use the older schema version that it was previously using. This can be useful, for example, when administrators have written tools that modify the configuration and rely on the older syntax. footnote:[As of Pacemaker 2.0.0, only schema versions pacemaker-1.0 and higher are supported (excluding pacemaker-1.1, which was an experimental schema now known as pacemaker-next).] However, when using an older syntax, new features may be unavailable, and there is a performance impact, since the cluster must do a non-persistent configuration upgrade before each transition. So while using the old syntax is possible, it is not advisable to continue using it indefinitely. Even if you wish to continue using the old syntax, it is a good idea to follow the upgrade procedure outlined below, except for the last step, to ensure that the new software has no problems with your existing configuration (since it will perform much the same task internally). If you are brave, it is sufficient simply to run `cibadmin --upgrade`. A more cautious approach would proceed like this: . Create a shadow copy of the configuration. The later commands will automatically operate on this copy, rather than the live configuration. + ----- # crm_shadow --create shadow ----- . Verify the configuration is valid with the new software (which may be stricter about syntax mistakes, or may have dropped support for deprecated features): indexterm:[Configuration,verify] indexterm:[verify,Configuration] + ----- # crm_verify --live-check ----- . Fix any errors or warnings. . Perform the upgrade: + ----- # cibadmin --upgrade ----- . If this step fails, there are three main possibilities: ..
The configuration was not valid to start with (did you do steps 2 and 3?). .. The transformation failed - http://bugs.clusterlabs.org/[report a bug] or mailto:users@clusterlabs.org?subject=Transformation%20failed%20during%20upgrade[email the project]. .. The transformation was successful but produced an invalid result. + If the result of the transformation is invalid, you may see a number of errors from the validation library. If these are not helpful, visit the http://clusterlabs.org/wiki/Validation_FAQ[Validation FAQ wiki page] and/or try the manual upgrade procedure described below. + . Check the changes: + ----- # crm_shadow --diff ----- + If at this point there is anything about the upgrade that you wish to fine-tune (for example, to change some of the automatic IDs), now is the time to do so: + ----- # crm_shadow --edit ----- + This will open the configuration in your favorite editor (whichever is specified by the standard *$EDITOR* environment variable). + . Preview how the cluster will react: + ------ # crm_simulate --live-check --save-dotfile shadow.dot -S # dot -Tsvg shadow.dot -o shadow.svg ------ + Verify that either no resource actions will occur or that you are happy with any that are scheduled. If the output contains actions you do not expect (possibly due to changes to the score calculations), you may need to make further manual changes. See <> for further details on how to interpret the output of `crm_simulate` and Graphviz. + . Upload the changes: + ----- # crm_shadow --commit shadow --force ----- + In the unlikely event this step fails, please report a bug. [NOTE] ==== indexterm:[Configuration,upgrade manually] It is also possible to perform the configuration upgrade steps manually: . Locate the +upgrade*.xsl+ conversion scripts provided with the source code. These will often be installed in a location such as +/usr/share/pacemaker+, or may be obtained from the https://github.com/ClusterLabs/pacemaker/tree/master/xml[source repository]. . Run the conversion scripts that apply to your older version, for example: indexterm:[XML,convert] + ----- # xsltproc /path/to/upgrade06.xsl config06.xml > config10.xml ----- + . Locate the +pacemaker.rng+ schema (from the same location as the xsl files). . Check the XML validity: indexterm:[validate configuration]indexterm:[Configuration,validate XML] + ---- # xmllint --relaxng /path/to/pacemaker.rng config10.xml ---- The advantage of this method is that it can be performed without the cluster running, and any validation errors are often more informative. ==== === What Changed in 2.0 === The main goal of the 2.0 release was to remove support for deprecated syntax, along with some small changes in default configuration behavior and tool behavior. Highlights: -* Only Corosync version 2 is now supported as the underlying cluster - layer. Support for Heartbeat and Corosync 1 (including CMAN) is removed. +* Only Corosync version 2 and greater is now supported as the underlying + cluster layer. Support for Heartbeat and Corosync 1 (including CMAN) is + removed. * The Pacemaker detail log file is now stored in /var/log/pacemaker/pacemaker.log by default. * The record-pending operation option now defaults to true, which allows status tools such as crm_mon to show operations that are in progress. * Support for a number of deprecated build options, environment variables, and configuration settings has been removed. * The public API for Pacemaker libraries that software applications can use has changed significantly.
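After upgrading, a quick way to confirm the points above is to check the running component versions and the new default log location from the shell. This is only a sketch; it assumes default packaging and paths, which can vary by distribution: ---- # corosync -v                              # messaging layer must be Corosync 2 or greater # pacemakerd --version                     # installed Pacemaker release # ls -l /var/log/pacemaker/pacemaker.log   # default detail log location in 2.0 ----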
For a detailed list of changes, see the release notes and the https://wiki.clusterlabs.org/wiki/Pacemaker_2.0_Changes[Pacemaker 2.0 Changes] page on the ClusterLabs wiki. === What Changed in 1.0 === ==== New ==== * Failure timeouts. See <> * New section for resource and operation defaults. See <> and <> * Tool for making offline configuration changes. See <> * +Rules, instance_attributes, meta_attributes+ and sets of operations can be defined once and referenced in multiple places. See <> * The CIB now accepts XPath-based create/modify/delete operations. See the pass:[cibadmin] help text. * Multi-dimensional colocation and ordering constraints. See <> and <> * The ability to connect to the CIB from non-cluster machines. See <> * Allow recurring actions to be triggered at known times. See <> ==== Changed ==== * Syntax ** All resource and cluster options now use dashes (-) instead of underscores (_) ** +master_slave+ was renamed to +master+ ** The +attributes+ container tag was removed ** The operation field +pre-req+ has been renamed +requires+ ** All operations must have an +interval+; +start+/+stop+ must have it set to zero * The +stonith-enabled+ option now defaults to true. * The cluster will refuse to start resources if +stonith-enabled+ is true (or unset) and no STONITH resources have been defined * The attributes of colocation and ordering constraints were renamed for clarity. See <> and <> * +resource-failure-stickiness+ has been replaced by +migration-threshold+. See <> * The parameters for command-line tools have been made consistent * Switched to 'RelaxNG' schema validation and 'libxml2' parser ** id fields are now XML IDs which have the following limitations: *** ids cannot contain colons (:) *** ids cannot begin with a number *** ids must be globally unique (not just unique for that tag) ** Some fields (such as those in constraints that refer to resources) are IDREFs. + This means that they must reference existing resources or objects in order for the configuration to be valid. Removing an object which is referenced elsewhere will therefore fail. + ** The CIB representation, from which an MD5 digest is calculated to verify CIBs on the nodes, has changed. + This means that every CIB update will require a full refresh on any upgraded nodes until the cluster is fully upgraded to 1.0. This will result in significant performance degradation and it is therefore highly inadvisable to run a mixed 1.0/0.6 cluster for any longer than absolutely necessary. + * Ping node information no longer needs to be added to _ha.cf_. + Simply include the lists of hosts in your ping resource(s). ==== Removed ==== * Syntax ** It is no longer possible to set resource meta options as top-level attributes. Use meta attributes instead. ** Resource and operation defaults are no longer read from +crm_config+. See <> and <> instead.
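Since resource and operation defaults now live in their own CIB sections rather than +crm_config+, they can be set with `crm_attribute` using the corresponding type. This is a minimal sketch; the option names and values shown (+resource-stickiness+, +timeout+) are illustrative only: ---- # crm_attribute --type rsc_defaults --name resource-stickiness --update 100 # crm_attribute --type op_defaults --name timeout --update 20s ----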