diff --git a/doc/Pacemaker_Explained/en-US/Author_Group.xml b/doc/Pacemaker_Explained/en-US/Author_Group.xml
index 70fe29437e..ce639caf4c 100644
--- a/doc/Pacemaker_Explained/en-US/Author_Group.xml
+++ b/doc/Pacemaker_Explained/en-US/Author_Group.xml
@@ -1,35 +1,44 @@
AndrewBeekhof Red Hat Primary author andrew@beekhof.net
PhilippMarek LINBit Style and formatting updates. Indexing. philipp.marek@linbit.com
TanjaRoth SUSE Utilization chapter
+ Multi-Site Clusters chapter
troth@suse.com
+
+ LarsMarowsky-Bree
+ SUSE
+ Multi-Site Clusters chapter
+ lmb@suse.com
+
YanGao SUSE Utilization chapter
+ Multi-Site Clusters chapter
ygao@suse.com
ThomasSchraitle SUSE Utilization chapter
+ Multi-Site Clusters chapter
toms@suse.com
diff --git a/doc/Pacemaker_Explained/en-US/Ch-Multi-site-Clusters.txt b/doc/Pacemaker_Explained/en-US/Ch-Multi-site-Clusters.txt
new file mode 100644
index 0000000000..b691f41a75
--- /dev/null
+++ b/doc/Pacemaker_Explained/en-US/Ch-Multi-site-Clusters.txt
@@ -0,0 +1,319 @@
+= Multi-Site Clusters and Tickets =
+
+== Abstract ==
+Apart from local clusters, Pacemaker also supports multi-site clusters.
+That means you can have multiple, geographically dispersed sites, each with a
+local cluster. Failover between these clusters can be coordinated
+by a higher-level entity, the so-called `CTR (Cluster Ticket Registry)`.
+
+
+== Challenges for Multi-Site Clusters ==
+
+Typically, multi-site environments are too far apart to support
+synchronous communication between the sites and synchronous data
+replication. That leads to the following challenges:
+
+- How to make sure that a cluster site is up and running?
+
+- How to make sure that resources are only started once?
+
+- How to make sure that quorum can be reached between the different
+sites and a split-brain scenario can be avoided?
+
+- How to manage failover between the sites?
+
+- How to deal with high latency in case of resources that need to be
+stopped?
+
+In the following sections, learn how to meet these challenges.
+
+
+== Conceptual Overview ==
+
+Multi-site clusters can be considered “overlay” clusters where
+each cluster site corresponds to a cluster node in a traditional cluster.
+The overlay cluster can be managed by a `CTR (Cluster Ticket Registry)`
+mechanism. It guarantees that the cluster resources will be highly
+available across different cluster sites. This is achieved by using
+so-called `tickets`, which are treated as failover domains between cluster
+sites in case a site goes down.
+
+The following subsections explain the individual components and mechanisms
+that were introduced for multi-site clusters in more detail.
+
+
+=== Components and Concepts ===
+
+==== Ticket ====
+
+"Tickets" are, essentially, cluster-wide attributes. A ticket grants the
+right to run certain resources on a specific cluster site. Resources can
+be bound to a certain ticket by `rsc_ticket` dependencies. Only if the
+ticket is available at a site are the respective resources started;
+conversely, if the ticket is revoked, the resources depending on that
+ticket must be stopped.
+
+A ticket is thus similar to a 'site quorum', i.e. the permission to
+manage/own the resources associated with that site.
+
+(One can also think of the current `have-quorum` flag as a special, cluster-wide
+ticket that is granted in case of node majority.)
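+
+For example, a resource can be bound to a ticket with a `rsc_ticket`
+constraint like the following sketch. The names `rsc1` and `ticketA` are the
+illustrative ones used throughout this chapter, and the constraint ID here is
+only an example; the full syntax is described in the Configuring Ticket
+Dependencies section below:
+
+[source,XML]
+-------
+<!-- illustrative sketch: stop rsc1 at this site when ticketA is revoked -->
+<rsc_ticket id="rsc1-dep-ticketA" rsc="rsc1" ticket="ticketA" loss-policy="stop"/>
+-------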
+
+These tickets can be granted/revoked either manually by administrators
+(which could be the default for classic enterprise clusters), or via
+the automated `CTR` mechanism described further below.
+
+A ticket can only be owned by one site at a time. Initially, none
+of the sites has a ticket; each ticket must be granted once by the cluster
+administrator.
+
+The presence or absence of tickets for a site is stored in the CIB as part
+of the cluster status. With regard to a certain ticket, there are only two
+states for a site: `true` (the site has the ticket) or `false` (the site does
+not have the ticket). The absence of a certain ticket (during the initial
+state of the multi-site cluster) is also reflected by the value `false`.
+
+
+==== Dead Man Dependency ====
+
+A site can only activate resources safely if it can be sure that the
+other site has deactivated them. However, after a ticket is revoked, it can
+take a long time until all resources depending on that ticket are stopped
+"cleanly", especially in the case of cascaded resources. To cut that process
+short, the concept of a `Dead Man Dependency` was introduced:
+
+- If the ticket is revoked from a site, the nodes that are hosting
+dependent resources are fenced. This considerably speeds up the recovery
+process of the cluster and makes sure that resources can be migrated more
+quickly.
+
+This can be configured by specifying a `loss-policy="fence"` in
+`rsc_ticket` constraints.
+
+
+==== CTR (Cluster Ticket Registry) ====
+
+This is for those scenarios where ticket management is supposed to
+be automatic (instead of the administrator revoking the ticket somewhere,
+waiting for everything to stop, and then granting it on the desired site).
+
+A `CTR` is a network daemon that handles granting,
+revoking, and timing out "tickets". The participating clusters run
+these daemons, which connect to each other, exchange information about
+their connectivity details, and vote on which site gets which ticket(s).
+
+A ticket is only granted to a site once the CTR can be sure that it
+has been relinquished by the previous owner; in most scenarios this
+needs to be implemented via a timer. If a site loses connection
+to its peers, its tickets time out and recovery occurs. After the
+connection timeout plus the recovery timeout has passed, the other sites
+are allowed to re-acquire the ticket and start the resources again.
+
+This can also be thought of as a "quorum server", except that it manages
+not a single quorum ticket, but several.
+
+
+==== Configuration Replication ====
+
+As usual, the CIB is synchronized within each cluster, but it is not synchronized
+across cluster sites of a multi-site cluster. You have to configure the resources
+that will be highly available across the multi-site cluster for every site
+accordingly.
+
+
+== Configuring Ticket Dependencies ==
+
+The `rsc_ticket` constraint lets you specify the resources that depend on a
+certain ticket. Together with the constraint, you can set a `loss-policy` that
+defines what should happen to the respective resources if the ticket is revoked.
+
+The attribute `loss-policy` can have the following values:
+
+fence:: Fence the nodes that are running the relevant resources.
+
+stop:: Stop the relevant resources.
+
+freeze:: Do nothing to the relevant resources.
+
+demote:: Demote relevant resources that are running in master mode to slave mode.
+
+
+An example of a `rsc_ticket` constraint:
+
+[source,XML]
+-------
+<rsc_ticket id="rsc1-req-ticketA" rsc="rsc1" ticket="ticketA" loss-policy="fence"/>
+-------
+
+This creates a constraint with the ID `rsc1-req-ticketA`. It defines that the
+resource `rsc1` depends on `ticketA` and that the node running the resource should
+be fenced if `ticketA` is revoked.
+
+If resource `rsc1` is a multi-state resource that can run in master or
+slave mode, you may want only `rsc1`'s master mode to
+depend on `ticketA`. With the following configuration, `rsc1` will be
+demoted to slave mode if `ticketA` is revoked:
+
+[source,XML]
+-------
+<rsc_ticket id="rsc1-req-ticketA-master" rsc="rsc1" rsc-role="Master" ticket="ticketA" loss-policy="demote"/>
+-------
+
+If you want other resources to depend on further tickets, create as many
+constraints as necessary with `rsc_ticket`.
+
+
+`rsc_ticket` also supports resource sets, so you can easily list all the
+resources that depend on a certain ticket in a single `rsc_ticket` constraint.
+For example:
+
+[source,XML]
+-------
+<rsc_ticket id="resources-dep-ticketA" ticket="ticketA" loss-policy="fence">
+  <resource_set id="resources-dep-ticketA-0" role="Started">
+    <resource_ref id="rsc1"/>
+    <resource_ref id="group1"/>
+    <resource_ref id="clone1"/>
+  </resource_set>
+  <resource_set id="resources-dep-ticketA-1" role="Master">
+    <resource_ref id="ms1"/>
+  </resource_set>
+</rsc_ticket>
+-------
+
+
+== Managing Multi-Site Clusters ==
+
+=== Managing Tickets Manually ===
+
+You can grant tickets to sites or revoke them from sites manually.
+However, if you want to re-distribute a ticket, you should wait for
+the dependent resources to stop cleanly at the previous site before you
+grant the ticket to the desired site.
+
+Use the `crm_ticket` command line tool to grant, revoke, or query tickets.
+
+To grant a ticket to this site:
+[source,Bash]
+-------
+# crm_ticket -t ticketA -v true
+-------
+
+To revoke a ticket from this site:
+[source,Bash]
+-------
+# crm_ticket -t ticketA -v false
+-------
+
+To query whether the specified ticket is granted to this site:
+[source,Bash]
+-------
+# crm_ticket -t ticketA -G
+-------
+
+To query when the specified ticket was last granted to this site:
+[source,Bash]
+-------
+# crm_ticket -t ticketA -T
+-------
+
+[IMPORTANT]
+====
+If you are managing tickets manually, use the `crm_ticket` command with
+great care, as it cannot verify whether the same ticket is already
+granted elsewhere.
+====
+
+
+=== Managing Tickets via a Cluster Ticket Registry ===
+
+==== Booth ====
+Booth is an implementation of a `Cluster Ticket Registry`, also called a
+`Cluster Ticket Manager`.
+
+Booth is the instance that manages ticket distribution, and thus
+the failover process, between the sites of a multi-site cluster. Each of
+the participating clusters and arbitrators runs a service, `boothd`.
+It connects to the booth daemons running at the other sites and
+exchanges connectivity details. Once a ticket is granted to a site, the
+booth mechanism manages the ticket automatically: if the site that
+holds the ticket is out of service, the booth daemons vote on which
+of the other sites will get the ticket. To protect against brief
+connection failures, sites that lose the vote (either explicitly or
+implicitly by being disconnected from the voting body) need to
+relinquish the ticket after a time-out. This makes sure that a
+ticket is only re-distributed after it has been relinquished by the
+previous site. The resources that depend on that ticket will fail over
+to the new site holding the ticket. The nodes that previously ran the
+resources will be treated according to the `loss-policy` you set
+within the `rsc_ticket` constraint.
+
+Before booth can manage a certain ticket within the multi-site cluster,
+you initially need to grant it to a site manually via the `booth client` command.
+After you have initially granted a ticket to a site, the booth mechanism
+will take over and manage the ticket automatically.
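+
+For example, an initial grant of `ticketA` to the site with the address
+`192.168.201.100` could look like the following sketch. Both the address and
+the option syntax are illustrative; the exact `booth client` invocation can
+differ between booth versions, so check the booth documentation for your
+release:
+
+[source,Bash]
+-------
+# booth client grant -t ticketA -s 192.168.201.100   # illustrative; adjust to your booth version
+-------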
+
+[IMPORTANT]
+====
+The `booth client` command line tool can be used to grant, list, or
+revoke tickets. The `booth client` commands work on any machine where
+the booth daemon is running.
+
+If you are managing tickets via `Booth`, only use the `booth client` command
+for manual intervention, instead of `crm_ticket`. This makes sure the same
+ticket will only be owned by one cluster site at a time.
+====
+
+Booth includes an implementation of the
+http://en.wikipedia.org/wiki/Paxos_algorithm['Paxos'] and 'Paxos Lease'
+algorithms, which guarantee distributed consensus among different
+cluster sites.
+
+[TIP]
+====
+`Arbitrator`
+
+Each site runs one booth instance that is responsible for communicating
+with the other sites. If you have a setup with an even number of sites,
+you need an additional instance to reach consensus about decisions such
+as failover of resources across sites. In this case, add one or more
+arbitrators running at additional sites. Arbitrators are single machines
+that run a booth instance in a special mode. As all booth instances
+communicate with each other, arbitrators help to make more reliable
+decisions about granting or revoking tickets.
+
+An arbitrator is especially important for a two-site scenario: for example,
+if site `A` can no longer communicate with site `B`, there are two possible
+causes for that:
+
+- A network failure between `A` and `B`.
+
+- Site `B` is down.
+
+However, if site `C` (the arbitrator) can still communicate with site `B`,
+site `B` must still be up and running.
+====
+
+===== Requirements =====
+
+- All clusters that will be part of the multi-site cluster must be based on Pacemaker.
+
+- Booth must be installed on all cluster nodes and on all arbitrators that will
+be part of the multi-site cluster.
+
+The most common scenario is probably a multi-site cluster with two sites and a
+single arbitrator on a third site. However, technically, there are no limitations
+with regard to the number of sites and the number of arbitrators involved.
+
+Nodes belonging to the same cluster site should be synchronized via NTP. However,
+time synchronization is not required between the individual cluster sites.
+
+
+== For more information ==
+
+`Multi-site Clusters`
+http://doc.opensuse.org/products/draft/SLE-HA/SLE-ha-guide_sd_draft/cha.ha.geo.html
+
+`Booth`
+https://github.com/ClusterLabs/booth
diff --git a/doc/Pacemaker_Explained/en-US/Pacemaker_Explained.xml b/doc/Pacemaker_Explained/en-US/Pacemaker_Explained.xml
index 27da3ee8c6..9c3009dc55 100644
--- a/doc/Pacemaker_Explained/en-US/Pacemaker_Explained.xml
+++ b/doc/Pacemaker_Explained/en-US/Pacemaker_Explained.xml
@@ -1,49 +1,50 @@
Receiving Notification for Cluster Events
Configuring Email Notifications
Configuring SNMP Notifications
+
Further Reading
Project Website
Project Documentation
A comprehensive guide to cluster commands has been written by Novell
Heartbeat configuration:
Corosync Configuration: