diff --git a/doc/Pacemaker_Explained/en-US/Ap-Upgrade.txt b/doc/Pacemaker_Explained/en-US/Ap-Upgrade.txt index 34632d7e5b..f36f8159a3 100644 --- a/doc/Pacemaker_Explained/en-US/Ap-Upgrade.txt +++ b/doc/Pacemaker_Explained/en-US/Ap-Upgrade.txt @@ -1,386 +1,420 @@ [appendix] -== Upgrading a Pacemaker Cluster == +== Upgrading == [[ap-upgrade]] === Upgrading Cluster Software === -There will always be an upgrade path from any pacemaker 1._x_ -release to any other 1._y_ release. - -Consult the documentation for your messaging layer -(Heartbeat or Corosync) to see whether upgrading them to a -newer version is also supported. - -There are three approaches to upgrading your cluster software: - -* Complete Cluster Shutdown -* Rolling (node by node) -* Disconnect and Reattach - -Each method has advantages and disadvantages, some of which are listed -in the table below, and you should choose the one most appropriate to -your needs. +There are three approaches to upgrading a cluster, each with advantages and +disadvantages. .Upgrade Methods -[width="95%",cols="6*",options="header",align="center"] +[width="95%",cols="s,6*",options="header",align="center"] |========================================================= -|Type -|Available between all software versions -|Service Outage During Upgrade -|Service Recovery During Upgrade -|Exercises Failover Logic/Configuration -|Allows change of cluster stack type -indexterm:[Cluster,Switching between Stacks] -indexterm:[Changing Cluster Stack] +|Method +|Available between all versions +|Can be used with Pacemaker Remote nodes +|Service outage during upgrade +|Service recovery during upgrade +|Exercises failover logic +|Allows change of messaging layer +indexterm:[Cluster,switching between stacks] +indexterm:[Changing cluster stack] footnote:[For example, switching from Heartbeat to Corosync.] -|Shutdown -indexterm:[Upgrade,Shutdown] -indexterm:[Shutdown Upgrade] +|Complete cluster shutdown +indexterm:[upgrade,shutdown] +indexterm:[shutdown upgrade] +|yes |yes |always |N/A |no |yes -|Rolling -indexterm:[Upgrade,Rolling] -indexterm:[Rolling Upgrade] +|Rolling (node by node) +indexterm:[upgrade,rolling] +indexterm:[rolling upgrade] |no +|yes |always |yes |yes |no -|Reattach -indexterm:[Upgrade,Reattach] -indexterm:[Reattach Upgrade] +|Detach and reattach +indexterm:[upgrade,reattach] +indexterm:[reattach upgrade] |yes +|no |only due to failure |no |no |yes |========================================================= ==== Complete Cluster Shutdown ==== In this scenario, one shuts down all cluster nodes and resources, then upgrades all the nodes before restarting the cluster. . On each node: .. Shutdown the cluster software (pacemaker and the messaging layer). .. Upgrade the Pacemaker software. This may also include upgrading the messaging layer and/or the underlying operating system. -.. Check the configuration manually or with the `crm_verify` tool if available. +.. Check the configuration with the `crm_verify` tool. . On each node: .. Start the cluster software. The messaging layer can be either Corosync or Heartbeat and does not need to be the same one before the upgrade. +One variation of this approach is to build a new cluster on new hosts. +This allows the new version to be tested beforehand, and minimizes downtime by +having the new nodes ready to be placed in production as soon as the old nodes +are shut down. + ==== Rolling (node by node) ==== -In this scenario, each node is removed from the cluster, upgraded and then -brought back online until all nodes are running the newest version. +In this scenario, each node is removed from the cluster, upgraded, and then +brought back online, until all nodes are running the newest version. + +If you plan to upgrade other cluster software -- such as the messaging layer -- +at the same time, consult that software's documentation for its compatibility +with a rolling upgrade. -Rolling upgrades should always be possible for pacemaker versions -1.0.0 and later. +Pacemaker has three version numbers that affect rolling upgrades: + +* *Pacemaker release version:* Rolling upgrades are possible as long as the + major version number (the _x_ in _x.y.z_) stays the same. For example, + a rolling upgrade may be done from 1.0.8 to 1.1.15, but not from + 0.6.7 to 1.0.0. + +* *CRM feature set:* This version number applies to the communication between + full cluster nodes. ++ +It increases when a cluster node running the older version would have +problems if the cluster's Designated Controller (DC) has the newer version. +To avoid these problems, Pacemaker ensures that the longest-running node is the +DC, and that nodes with an older feature set cannot join the cluster. ++ +Therefore, if the CRM feature set is changing in the Pacemaker version you +are upgrading to, you should run a mixed-version cluster only during a small +rolling upgrade window. Otherwise, if one of the older nodes drops out of the +cluster for any reason, it will not be able to rejoin until it is upgraded. + +* *LRMD protocol version:* This version number applies to communication between a + Pacemaker Remote node and the cluster. It increases when an older cluster + node would have problems hosting the connection to a newer Pacemaker Remote + node. To avoid these problems, Pacemaker Remote nodes will accept connections + only from cluster nodes with the same or newer LRMD protocol version. ++ +For rolling upgrades, this means that all cluster nodes should be upgraded +before upgrading any Pacemaker Remote nodes. ++ +Unlike with CRM feature set differences between full cluster nodes, +mixed LRMD protocol versions between Pacemaker Remote nodes and full cluster +nodes are fine, as long as the Pacemaker Remote nodes have the older version. +This can be useful, for example, to host a legacy application in an +older operating system version used as a Pacemaker Remote node. + +See the ClusterLabs wiki's +http://clusterlabs.org/wiki/ReleaseCalendar[Release Calendar] to figure out +whether the CRM feature set and/or LRMD protocol version changed between the +the Pacemaker release versions in your rolling upgrade. + +[WARNING] +==== +The interpretation of the LRMD protocol version changed in Pacemaker 1.1.15. +If you are planning a rolling upgrade from an earlier Pacemaker version to +Pacemaker 1.1.15 or later involving Pacemaker Remote nodes, you will need to +take special precautions to avoid problems. See +http://clusterlabs.org/wiki/Upgrading_to_Pacemaker_1.1.15_or_later_from_an_earlier_version[Upgrading +to Pacemaker 1.1.15 or later from an earlier version] on the ClusterLabs wiki. +==== -On each node: +To perform a rolling upgrade, on each node in turn: . Put the node into standby mode, and wait for any active resources - to be moved cleanly to another node. + to be moved cleanly to another node. (This step is optional, but + allows you to deal with any resource issues before the upgrade.) . Shutdown the cluster software (pacemaker and the messaging layer) on the node. . Upgrade the Pacemaker software. This may also include upgrading the messaging layer and/or the underlying operating system. -. If this is the first node to be upgraded, check the configuration manually - or with the `crm_verify` tool if available. +. If this is the first node to be upgraded, check the configuration + with the `crm_verify` tool. . Start the messaging layer. This must be the same messaging layer (Corosync or Heartbeat) - that the rest of the cluster is using. Upgrading the messaging layer - may also be possible; consult the documentation for those - projects to see whether the two versions will be compatible. + that the rest of the cluster is using. [NOTE] ==== Rolling upgrades were not always possible with older heartbeat and -pacemaker versions. The table below shows which versions were -compatible during rolling upgrades. Rolling upgrades that cross compatibility -boundaries must be performed in multiple steps (for example, -upgrading heartbeat 2.0.6 to heartbeat 2.1.3, and then upgrading again -to pacemaker 0.6.6). Rolling upgrades from pacemaker 0._x_ to 1._y_ are not -possible. +pacemaker versions. Rolling upgrades that cross compatibility +boundaries listed in the following table must be performed in multiple steps. .Version Compatibility Table [width="95%",cols="2*",options="header",align="center"] |========================================================= |Version being Installed |Oldest Compatible Version -|Pacemaker 1.0.x +|Pacemaker 1.x.y |Pacemaker 1.0.0 |Pacemaker 0.7.x |Pacemaker 0.6 or Heartbeat 2.1.3 |Pacemaker 0.6.x |Heartbeat 2.0.8 |Heartbeat 2.1.3 (or less) |Heartbeat 2.0.4 |Heartbeat 2.0.4 (or less) |Heartbeat 2.0.0 |Heartbeat 2.0.0 |None. Use an alternate upgrade strategy. |========================================================= ==== -==== Disconnect and Reattach ==== +==== Detach and Reattach ==== The reattach method is a variant of a complete cluster shutdown, where the resources are left active and get re-detected when the cluster is restarted. +This method may not be used if the cluster contains any Pacemaker Remote nodes. + . Tell the cluster to stop managing services. This is required to allow the services to remain active after the cluster shuts down. + ---- -# crm_attribute -t crm_config -n is-managed-default -v false +# crm_attribute --type rsc_defaults --name is-managed --update false ---- . For any resource that has a value for +is-managed+, make sure it is set to +false+ so that the cluster will not stop it (replacing $rsc_id appropriately): + ---- # crm_resource -t primitive -r $rsc_id -p is-managed -v false ---- -. On each node: -.. Shutdown the cluster software (pacemaker and the messaging layer). -.. Upgrade the Pacemaker software. This may also include upgrading the - messaging layer and/or the underlying operating system. -. Check the configuration manually or with the `crm_verify` tool if available. -. On each node: -.. Start the cluster software. The messaging layer can be either Corosync or - Heartbeat and does not need to be the same one as before the upgrade. - +. On each node, shutdown the cluster software (pacemaker and the messaging + layer), and upgrade the Pacemaker software. This may also include upgrading + the messaging layer. While the underlying operating system may be upgraded + at the same time, that will be more likely to cause outages in the detached + services (certainly, if a reboot is required). +. Check the configuration with the `crm_verify` tool. +. On each node, start the cluster software. The messaging layer can be either + Corosync or Heartbeat and does not need to be the same one as before the + upgrade. . Verify that the cluster re-detected all resources correctly. . Allow the cluster to resume managing resources again: + ---- -# crm_attribute -t crm_config -n is-managed-default -v true +# crm_attribute --type rsc_defaults --name is-managed --update true ---- . For any resource that has a value for +is-managed+, reset it to - +true+ (so the cluster can recover the service if it fails) if - desired: + +true+ if desired, to allow the cluster can recover the service if it fails: + ---- # crm_resource -t primitive -r $rsc_id -p is-managed -v true ---- -[NOTE] -The oldest version of the CRM to support this upgrade type was in Heartbeat 2.0.4. - [IMPORTANT] =========== Always check your existing configuration is still compatible with the version you are installing before starting the cluster. =========== === Upgrading the Configuration === -This process was originally written for the upgrade from 0.6.'x' to 1.'y', -but the concepts should apply for any upgrade involving a change in -the XML schema version. +indexterm:[upgrade,Configuration] +indexterm:[Configuration,upgrading] -indexterm:[Upgrading the Configuration] -indexterm:[Configuration,Upgrading] +Pacemaker's configuration -- the Configuration Information Base (CIB) -- has +its own XML schema version, independent of the Pacemaker software version. -==== Perform the upgrade ==== +After cluster software is upgraded, the cluster will continue to use +the older schema version that it was previously using. This can be useful, for +example, when administrators have written tools that modify the configuration, +and are based on the older syntax. -===== Upgrade the software ===== +However, when using an older syntax, new features may be unavailable, and there +is a performance impact, since the cluster must do a non-persistent +configuration upgrade before each transition. So while using the old syntax is +possible, it is not advisable to continue using it indefinitely. -Refer to the appendix: <> +Even if you wish to continue using the old syntax, it is a good idea to +follow the upgrade procedure outlined below, except for the last step, to ensure +that the new software has no problems with your existing configuration (since it +will perform much the same task internally). -===== Upgrade the Configuration ===== +If you are brave, it is sufficient simply to run `cibadmin --upgrade`. -As XML is not the friendliest of languages, it is common for cluster -administrators to have scripted some of their activities. In such -cases, it is likely that those scripts will not work with the new XML -syntax. +A more cautious approach would proceed like this: -In order to support such environments, it is actually possible to -continue using the old XML syntax. - -The downside is, however, that not all the new features will be -available and there is a performance impact since the cluster must do -a non-persistent configuration upgrade before each transition. So -while using the old syntax is possible, it is not advisable to -continue using it indefinitely. - -Even if you wish to continue using the old syntax, it is advisable to -follow the upgrade procedure (except for the last step) to ensure that the -cluster is able to use your existing configuration (since it will perform much -the same task internally). - -. Create a shadow copy to work with +. Create a shadow copy of the configuration. The later commands will automatically + operate on this copy, rather than the live configuration. + ----- -# crm_shadow --create upgrade06 +# crm_shadow --create shadow ----- -. Verify the configuration is valid indexterm:[Configuration,Verify]indexterm:[Verify,Configuration] +. Verify the configuration is valid with the new software (which may be + stricter about syntax mistakes, or may have dropped support for deprecated + features): +indexterm:[Configuration,verify] +indexterm:[verify,Configuration] + ----- # crm_verify --live-check ----- -. Fix any errors or warnings +. Fix any errors or warnings. . Perform the upgrade: + ----- # cibadmin --upgrade ----- . If this step fails, there are three main possibilities: -.. The configuration was not valid to start with - go back to step 2 -.. The transformation failed - report a bug or mailto:pacemaker@oss.clusterlabs.org?subject=Transformation%20failed%20during%20upgrade[email the project] -.. The transformation was successful but produced an invalid result footnote:[ -The most common reason is ID values being repeated or invalid. Pacemaker 1.0 is much stricter regarding this type of validation. -] +.. The configuration was not valid to start with (did you do steps 2 and 3?). +.. The transformation failed - http://bugs.clusterlabs.org/[report a bug] or + mailto:users@clusterlabs.org?subject=Transformation%20failed%20during%20upgrade[email the project]. +.. The transformation was successful but produced an invalid result. + If the result of the transformation is invalid, you may see a number of errors from the validation library. If these are not helpful, visit the http://clusterlabs.org/wiki/Validation_FAQ[Validation FAQ wiki page] and/or try -the procedure described below under <> +the procedure described below under <>. + -. Check the changes +. Check the changes: + ----- # crm_shadow --diff ----- + -If at this point there is anything about the upgrade that you wish to fine-tune (for example, to change some of the automatic IDs) now is the time to do so. Since the shadow configuration is not in use by the cluster, it is safe to edit the file manually: +If at this point there is anything about the upgrade that you wish to fine-tune +(for example, to change some of the automatic IDs), now is the time to do so: + ----- # crm_shadow --edit ----- + This will open the configuration in your favorite editor (whichever is -specified by the standard *$EDITOR* environment variable) +specified by the standard *$EDITOR* environment variable). + . Preview how the cluster will react: + ------ -# crm_simulate --live-check --save-dotfile upgrade06.dot -S -# graphviz upgrade06.dot +# crm_simulate --live-check --save-dotfile shadow.dot -S +# graphviz shadow.dot ------ + Verify that either no resource actions will occur or that you are happy with any that are scheduled. If the output contains actions you do not expect (possibly due to changes to the score calculations), you may need to make further manual changes. See <> for further details on how to interpret the output of `crm_simulate` and `graphviz`. + -. Upload the changes +. Upload the changes: + ----- -# crm_shadow --commit upgrade06 --force +# crm_shadow --commit shadow --force ----- + In the unlikely event this step fails, please report a bug. +[NOTE] +==== [[s-upgrade-config-manual]] -===== Manually Upgrading the Configuration ===== - -indexterm:[Configuration,Upgrade manually] +indexterm:[Configuration,upgrade manually] It is also possible to perform the configuration upgrade steps manually: -. Locate the +upgrade06.xsl+ conversion script provided with the source code - (the https://github.com/ClusterLabs/pacemaker/tree/master/xml/upgrade06.xsl[latest version] is available via - git). +. Locate the +upgrade*.xsl+ conversion scripts provided with the source code. These will often + be installed in a location such as +/usr/share/pacemaker+, or may be obtained from + the https://github.com/ClusterLabs/pacemaker/tree/master/xml[source repository]. -. Convert the XML blob: indexterm:[XML,Convert] +. Run the conversion scripts that apply to your older version, for example: + indexterm:[XML,convert] + ----- # xsltproc /path/to/upgrade06.xsl config06.xml > config10.xml ----- + -. Locate the +pacemaker.rng+ script. -. Check the XML validity: indexterm:[Validate Configuration]indexterm:[Configuration,Validate XML] +. Locate the +pacemaker.rng+ script (from the same location as the xsl files). +. Check the XML validity: indexterm:[validate configuration]indexterm:[Configuration,validate XML] + ---- # xmllint --relaxng /path/to/pacemaker.rng config10.xml ---- The advantage of this method is that it can be performed without the -cluster running and any validation errors should be more informative -(despite being generated by the same library!) since they include line -numbers. - +cluster running, and any validation errors are often more informative. +==== === What Changed in 1.0 === ==== New ==== * Failure timeouts. See <> * New section for resource and operation defaults. See <> and <> * Tool for making offline configuration changes. See <> * +Rules, instance_attributes, meta_attributes+ and sets of operations can be defined once and referenced in multiple places. See <> * The CIB now accepts XPath-based create/modify/delete operations. See the pass:[cibadmin] help text. * Multi-dimensional colocation and ordering constraints. See <> and <> * The ability to connect to the CIB from non-cluster machines. See <> * Allow recurring actions to be triggered at known times. See <> ==== Changed ==== * Syntax ** All resource and cluster options now use dashes (-) instead of underscores (_) ** +master_slave+ was renamed to +master+ ** The +attributes+ container tag was removed ** The operation field +pre-req+ has been renamed +requires+ ** All operations must have an +interval+, +start+/+stop+ must have it set to zero * The +stonith-enabled+ option now defaults to true. * The cluster will refuse to start resources if +stonith-enabled+ is true (or unset) and no STONITH resources have been defined * The attributes of colocation and ordering constraints were renamed for clarity. See <> and <> * +resource-failure-stickiness+ has been replaced by +migration-threshold+. See <> * The parameters for command-line tools have been made consistent * Switched to 'RelaxNG' schema validation and 'libxml2' parser ** id fields are now XML IDs which have the following limitations: *** id's cannot contain colons (:) *** id's cannot begin with a number *** id's must be globally unique (not just unique for that tag) ** Some fields (such as those in constraints that refer to resources) are IDREFs. + This means that they must reference existing resources or objects in order for the configuration to be valid. Removing an object which is referenced elsewhere will therefore fail. + ** The CIB representation, from which a MD5 digest is calculated to verify CIBs on the nodes, has changed. + This means that every CIB update will require a full refresh on any upgraded nodes until the cluster is fully upgraded to 1.0. This will result in significant performance degradation and it is therefore highly inadvisable to run a mixed 1.0/0.6 cluster for any longer than absolutely necessary. + * Ping node information no longer needs to be added to _ha.cf_. + Simply include the lists of hosts in your ping resource(s). ==== Removed ==== * Syntax ** It is no longer possible to set resource meta options as top-level attributes. Use meta attributes instead. ** Resource and operation defaults are no longer read from +crm_config+. See <> and <> instead.