msgid "Apart from local clusters, Pacemaker also supports multi-site clusters. That means you can have multiple, geographically dispersed sites, each with a local cluster. Failover between these clusters can be coordinated manually by the administrator, or automatically by a higher-level entity called a <emphasis>Cluster Ticket Registry (CTR)</emphasis>."
msgstr ""
#. Tag: title
#, no-c-format
msgid "Challenges for Multi-Site Clusters"
msgstr ""
#. Tag: para
#, no-c-format
msgid "Typically, multi-site environments are too far apart to support synchronous communication and data replication between the sites. That leads to significant challenges:"
msgstr ""
#. Tag: para
#, no-c-format
msgid "How do we make sure that a cluster site is up and running?"
msgstr ""
#. Tag: para
#, no-c-format
msgid "How do we make sure that resources are only started once?"
msgstr ""
#. Tag: para
#, no-c-format
msgid "How do we make sure that quorum can be reached between the different sites and a split-brain scenario avoided?"
msgstr ""
#. Tag: para
#, no-c-format
msgid "How do we manage failover between sites?"
msgstr ""
#. Tag: para
#, no-c-format
msgid "How do we deal with high latency in case of resources that need to be stopped?"
msgstr ""
#. Tag: para
#, no-c-format
msgid "The following sections explain how to meet these challenges."
msgstr ""
#. Tag: title
#, no-c-format
msgid "Conceptual Overview"
msgstr ""
#. Tag: para
#, no-c-format
msgid "Multi-site clusters can be considered “overlay” clusters where each cluster site corresponds to a cluster node in a traditional cluster. The overlay cluster can be managed by a CTR in order to guarantee that any cluster resource will be active on no more than one cluster site. This is achieved by using <emphasis>tickets</emphasis> that are treated as a failover domain between cluster sites, in case a site goes down."
msgstr ""
#. Tag: para
#, no-c-format
msgid "The following sections explain the individual components and mechanisms that were introduced for multi-site clusters in more detail."
msgstr ""
#. Tag: title
#, no-c-format
msgid "Ticket"
msgstr ""
#. Tag: para
#, no-c-format
msgid "Tickets are, essentially, cluster-wide attributes. A ticket grants the right to run certain resources on a specific cluster site. Resources can be bound to a certain ticket by <literal>rsc_ticket</literal> constraints. Only if the ticket is available at a site can the respective resources be started there. Vice versa, if the ticket is revoked, the resources depending on that ticket must be stopped."
msgstr ""
#. Tag: para
#, no-c-format
msgid "The ticket thus is similar to a <emphasis>site quorum</emphasis>, i.e. the permission to manage/own resources associated with that site. (One can also think of the current <literal>have-quorum</literal> flag as a special, cluster-wide ticket that is granted in case of node majority.)"
msgstr ""
#. Tag: para
#, no-c-format
msgid "Tickets can be granted and revoked either manually by administrators (which could be the default for classic enterprise clusters), or via the automated CTR mechanism described below."
msgstr ""
#. Tag: para
#, no-c-format
msgid "A ticket can only be owned by one site at a time. Initially, none of the sites has a ticket. Each ticket must be granted once by the cluster administrator."
msgstr ""
#. Tag: para
#, no-c-format
msgid "The presence or absence of tickets for a site is stored in the CIB as a cluster status. With regard to a certain ticket, there are only two states for a site: <literal>true</literal> (the site has the ticket) or <literal>false</literal> (the site does not have the ticket). The absence of a certain ticket (during the initial state of the multi-site cluster) is the same as the value <literal>false</literal>."
msgstr ""
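#. Tag: para
#, no-c-format
msgid "As an illustration only, a granted ticket might appear in the status section of the CIB roughly as shown here. The element and attribute names (<literal>tickets</literal>, <literal>ticket_state</literal>, <literal>granted</literal>) are assumptions based on recent Pacemaker versions and may differ in yours; <literal>ticketA</literal> is a placeholder name."
msgstr ""
#. Tag: programlisting
#, no-c-format
msgid ""
"&lt;!-- Sketch only: assumed layout of ticket state in the CIB status section --&gt;\n"
"&lt;tickets&gt;\n"
"  &lt;ticket_state id=\"ticketA\" granted=\"true\"/&gt;\n"
"&lt;/tickets&gt;"
msgstr ""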
#. Tag: title
#, no-c-format
msgid "Dead Man Dependency"
msgstr ""
#. Tag: para
#, no-c-format
msgid "A site can only activate resources safely if it can be sure that the other site has deactivated them. However, after a ticket is revoked, it can take a long time until all resources depending on that ticket are stopped \"cleanly\", especially in case of cascaded resources. To cut that process short, the concept of a <emphasis>Dead Man Dependency</emphasis> was introduced."
msgstr ""
#. Tag: para
#, no-c-format
msgid "If a dead man dependency is in force and a ticket is revoked from a site, the nodes that are hosting dependent resources are fenced. This considerably speeds up the recovery process of the cluster and makes sure that resources can be migrated more quickly."
msgstr ""
#. Tag: para
#, no-c-format
msgid "This can be configured by specifying a <literal>loss-policy=\"fence\"</literal> in <literal>rsc_ticket</literal> constraints."
msgstr ""
#. Tag: title
#, no-c-format
msgid "Cluster Ticket Registry"
msgstr ""
#. Tag: para
#, no-c-format
msgid "A CTR is a coordinated group of network daemons that automatically handles granting, revoking, and timing out tickets (instead of the administrator revoking the ticket somewhere, waiting for everything to stop, and then granting it on the desired site)."
msgstr ""
#. Tag: para
#, no-c-format
msgid "Pacemaker does not implement its own CTR, but interoperates with external software designed for that purpose (similar to how resource and fencing agents are not directly part of Pacemaker)."
msgstr ""
#. Tag: para
#, no-c-format
msgid "Participating clusters run the CTR daemons, which connect to each other, exchange information about their connectivity, and vote on which sites get which tickets."
msgstr ""
#. Tag: para
#, no-c-format
msgid "A ticket is granted to a site only once the CTR is sure that the ticket has been relinquished by the previous owner, implemented via a timer in most scenarios. If a site loses connection to its peers, its tickets time out and recovery occurs. After the connection timeout plus the recovery timeout has passed, the other sites are allowed to re-acquire the ticket and start the resources again."
msgstr ""
#. Tag: para
#, no-c-format
msgid "This can also be thought of as a \"quorum server\", except that it is not a single quorum ticket, but several."
msgstr ""
#. Tag: title
#, no-c-format
msgid "Configuration Replication"
msgstr ""
#. Tag: para
#, no-c-format
msgid "As usual, the CIB is synchronized within each cluster, but it is <emphasis>not</emphasis> synchronized across cluster sites of a multi-site cluster. You must therefore configure the resources that will be highly available across the multi-site cluster separately at every site."
msgstr ""
#. Tag: title
#, no-c-format
msgid "Configuring Ticket Dependencies"
msgstr ""
#. Tag: para
#, no-c-format
msgid "The <literal>rsc_ticket</literal> constraint lets you specify the resources depending on a certain ticket. Together with the constraint, you can set a <literal>loss-policy</literal> that defines what should happen to the respective resources if the ticket is revoked."
msgstr ""
#. Tag: para
#, no-c-format
msgid "The attribute <literal>loss-policy</literal> can have the following values:"
msgstr ""
#. Tag: para
#, no-c-format
msgid "<literal>fence:</literal> Fence the nodes that are running the relevant resources."
msgstr ""
#. Tag: para
#, no-c-format
msgid "<literal>stop:</literal> Stop the relevant resources."
msgstr ""
#. Tag: para
#, no-c-format
msgid "<literal>freeze:</literal> Do nothing to the relevant resources."
msgstr ""
#. Tag: para
#, no-c-format
msgid "<literal>demote:</literal> Demote relevant resources that are running in master mode to slave mode."
msgstr ""
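#. Tag: para
#, no-c-format
msgid "As a sketch of how these values are used, the following constraint makes <literal>rsc1</literal> depend on <literal>ticketA</literal> and sets <literal>loss-policy</literal> to <literal>fence</literal>. The resource name, ticket name, and constraint ID are placeholders."
msgstr ""
#. Tag: programlisting
#, no-c-format
msgid "&lt;rsc_ticket id=\"rsc1-req-ticketA\" rsc=\"rsc1\" ticket=\"ticketA\" loss-policy=\"fence\"/&gt;"
msgstr ""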
#. Tag: title
#, no-c-format
msgid "Constraint that fences node if <literal>ticketA</literal> is revoked"
msgstr ""
#. Tag: para
#, no-c-format
msgid "The example above creates a constraint with the ID <literal>rsc1-req-ticketA</literal>. It defines that the resource <literal>rsc1</literal> depends on <literal>ticketA</literal> and that the node running the resource should be fenced if <literal>ticketA</literal> is revoked."
msgstr ""
#. Tag: para
#, no-c-format
msgid "If resource <literal>rsc1</literal> were a multi-state resource (i.e. it could run in master or slave mode), you might want to configure that only master mode depends on <literal>ticketA</literal>. With the following configuration, <literal>rsc1</literal> will be demoted to slave mode if <literal>ticketA</literal> is revoked:"
msgstr ""
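#. Tag: para
#, no-c-format
msgid "A minimal sketch of such a configuration is shown here, assuming <literal>rsc1</literal> is a multi-state resource. The <literal>rsc-role</literal> attribute restricts the dependency to master mode; the constraint ID is a placeholder."
msgstr ""
#. Tag: programlisting
#, no-c-format
msgid "&lt;rsc_ticket id=\"rsc1-req-ticketA\" rsc=\"rsc1\" rsc-role=\"Master\" ticket=\"ticketA\" loss-policy=\"demote\"/&gt;"
msgstr ""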
#. Tag: title
#, no-c-format
msgid "Constraint that demotes <literal>rsc1</literal> if <literal>ticketA</literal> is revoked"
msgstr ""
#. Tag: para
#, no-c-format
msgid "You can create multiple <literal>rsc_ticket</literal> constraints to let multiple resources depend on the same ticket. However, <literal>rsc_ticket</literal> also supports resource sets (see <xref linkend=\"s-resource-sets\" />), so you can list all the resources in one <literal>rsc_ticket</literal> constraint instead."
msgstr ""
#. Tag: para
#, no-c-format
msgid "With two resource sets, for example, you can list resources with different roles in a single <literal>rsc_ticket</literal> constraint. There is no dependency between the two resource sets, and there is no dependency among the resources within a resource set. Each of the resources simply depends on <literal>ticketA</literal>."
msgstr ""
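#. Tag: para
#, no-c-format
msgid "A constraint with two resource sets could be sketched as follows. The resource names <literal>group1</literal>, <literal>clone1</literal>, and <literal>ms1</literal>, as well as all IDs, are hypothetical and only serve to illustrate the layout."
msgstr ""
#. Tag: programlisting
#, no-c-format
msgid ""
"&lt;rsc_ticket id=\"resources-dep-ticketA\" ticket=\"ticketA\" loss-policy=\"fence\"&gt;\n"
"  &lt;!-- Resources that depend on the ticket in started role --&gt;\n"
"  &lt;resource_set id=\"resources-dep-ticketA-0\" role=\"Started\"&gt;\n"
"    &lt;resource_ref id=\"rsc1\"/&gt;\n"
"    &lt;resource_ref id=\"group1\"/&gt;\n"
"    &lt;resource_ref id=\"clone1\"/&gt;\n"
"  &lt;/resource_set&gt;\n"
"  &lt;!-- Resources for which only master mode depends on the ticket --&gt;\n"
"  &lt;resource_set id=\"resources-dep-ticketA-1\" role=\"Master\"&gt;\n"
"    &lt;resource_ref id=\"ms1\"/&gt;\n"
"  &lt;/resource_set&gt;\n"
"&lt;/rsc_ticket&gt;"
msgstr ""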
#. Tag: para
#, no-c-format
msgid "Referencing resource templates in <literal>rsc_ticket</literal> constraints, and even referencing them within resource sets, is also supported."
msgstr ""
#. Tag: para
#, no-c-format
msgid "If you want other resources to depend on further tickets, create as many constraints as necessary with <literal>rsc_ticket</literal>."
msgstr ""
#. Tag: title
#, no-c-format
msgid "Managing Multi-Site Clusters"
msgstr ""
#. Tag: title
#, no-c-format
msgid "Granting and Revoking Tickets Manually"
msgstr ""
#. Tag: para
#, no-c-format
msgid "You can grant tickets to sites or revoke them from sites manually. If you want to re-distribute a ticket, you should wait for the dependent resources to stop cleanly at the previous site before you grant the ticket to the new site."
msgstr ""
#. Tag: para
#, no-c-format
msgid "Use the <literal>crm_ticket</literal> command line tool to grant and revoke tickets."
msgstr ""
#. Tag: para
#, no-c-format
msgid "To grant a ticket to this site:"
msgstr ""
#. Tag: screen
#, no-c-format
msgid "# crm_ticket --ticket ticketA --grant"
msgstr ""
#. Tag: para
#, no-c-format
msgid "To revoke a ticket from this site:"
msgstr ""
#. Tag: screen
#, no-c-format
msgid "# crm_ticket --ticket ticketA --revoke"
msgstr ""
#. Tag: para
#, no-c-format
msgid "If you are managing tickets manually, use the <literal>crm_ticket</literal> command with great care, because it cannot check whether the same ticket is already granted elsewhere."
msgstr ""
#. Tag: title
#, no-c-format
msgid "Granting and Revoking Tickets via a Cluster Ticket Registry"
msgstr ""
#. Tag: para
#, no-c-format
msgid "We will use <ulink url=\"https://github.com/ClusterLabs/booth\">Booth</ulink> here as an example of software that can be used with Pacemaker as a Cluster Ticket Registry. Booth implements the <ulink url=\"http://en.wikipedia.org/wiki/Raft_%28computer_science%29\">Raft</ulink> algorithm to reach distributed consensus among the different cluster sites, and manages the ticket distribution (and thus the failover process between sites)."
msgstr ""
#. Tag: para
#, no-c-format
msgid "Each of the participating clusters and <emphasis>arbitrators</emphasis> runs the Booth daemon <literal>boothd</literal>."
msgstr ""
#. Tag: para
#, no-c-format
msgid "An <emphasis>arbitrator</emphasis> is the multi-site equivalent of a quorum-only node in a local cluster. If you have a setup with an even number of sites, you need an additional instance to reach consensus about decisions such as failover of resources across sites. In this case, add one or more arbitrators running at additional sites. Arbitrators are single machines that run a Booth instance in a special mode. An arbitrator is especially important for a two-site scenario, because otherwise a site has no way to distinguish between a failure of the network link to the other site and a failure of the other site itself."
msgstr ""
#. Tag: para
#, no-c-format
msgid "The most common multi-site scenario is probably a multi-site cluster with two sites and a single arbitrator on a third site. However, technically, there are no limitations with regard to the number of sites and the number of arbitrators involved."
msgstr ""
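#. Tag: para
#, no-c-format
msgid "For the scenario with two sites and one arbitrator, a Booth configuration file might look roughly like the sketch below. The key names follow the Booth documentation as far as we are aware, but should be verified against the installed version. The file path, the IP addresses (taken from the documentation address range), and the ticket name are placeholders."
msgstr ""
#. Tag: screen
#, no-c-format
msgid ""
"# Sketch only; assumed location: /etc/booth/booth.conf\n"
"# The same file is expected on every site and on the arbitrator.\n"
"transport = UDP\n"
"port = 9929\n"
"arbitrator = 192.0.2.30\n"
"site = 192.0.2.10\n"
"site = 192.0.2.20\n"
"ticket = \"ticketA\""
msgstr ""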
#. Tag: para
#, no-c-format
msgid "<literal>boothd</literal> at each site connects to its peers running at the other sites and exchanges connectivity details. Once a ticket is granted to a site, the Booth mechanism will manage the ticket automatically: If the site which holds the ticket is out of service, the Booth daemons will vote on which of the other sites will get the ticket. To protect against brief connection failures, sites that lose the vote (either explicitly or implicitly by being disconnected from the voting body) need to relinquish the ticket after a time-out. This ensures that a ticket will only be re-distributed after it has been relinquished by the previous site. The resources that depend on that ticket will fail over to the new site holding the ticket. The nodes that ran the resources before will be treated according to the <literal>loss-policy</literal> you set within the <literal>rsc_ticket</literal> constraint."
msgstr ""
#. Tag: para
#, no-c-format
msgid "Before Booth can manage a certain ticket within the multi-site cluster, you initially need to grant it to a site manually via the <literal>booth</literal> command-line tool. After you have initially granted a ticket to a site, <literal>boothd</literal> will take over and manage the ticket automatically."
msgstr ""
#. Tag: para
#, no-c-format
msgid "The <literal>booth</literal> command-line tool can be used to grant, list, or revoke tickets and can be run on any machine where <literal>boothd</literal> is running. If you are managing tickets via Booth, use only <literal>booth</literal> for manual intervention, not <literal>crm_ticket</literal>. That ensures the same ticket will only be owned by one cluster site at a time."
msgstr ""
#. Tag: title
#, no-c-format
msgid "Booth Requirements"
msgstr ""
#. Tag: para
#, no-c-format
msgid "All clusters that will be part of the multi-site cluster must be based on Pacemaker."
msgstr ""
#. Tag: para
#, no-c-format
msgid "Booth must be installed on all cluster nodes and on all arbitrators that will be part of the multi-site cluster."
msgstr ""
#. Tag: para
#, no-c-format
msgid "Nodes belonging to the same cluster site should be synchronized via NTP. However, time synchronization is not required between the individual cluster sites."
msgstr ""
#. Tag: title
#, no-c-format
msgid "General Management of Tickets"
msgstr ""
#. Tag: para
#, no-c-format
msgid "Display information about tickets:"
msgstr ""
#. Tag: screen
#, no-c-format
msgid "# crm_ticket --info"
msgstr ""
#. Tag: para
#, no-c-format
msgid "Or you can monitor them with:"
msgstr ""
#. Tag: screen
#, no-c-format
msgid "# crm_mon --tickets"
msgstr ""
#. Tag: para
#, no-c-format
msgid "Display the <literal>rsc_ticket</literal> constraints that apply to a ticket:"
msgstr ""
#. Tag: para
#, no-c-format
msgid "When you want to do maintenance or manual switch-over of a ticket, revoking the ticket would trigger the loss policies. If <literal>loss-policy=\"fence\"</literal>, the dependent resources could not be gracefully stopped/demoted, and other unrelated resources could even be affected."
msgstr ""
#. Tag: para
#, no-c-format
msgid "The proper way is to put the ticket in <emphasis>standby</emphasis> first with:"
msgstr ""
#. Tag: screen
#, no-c-format
msgid "# crm_ticket --ticket ticketA --standby"
msgstr ""
#. Tag: para
#, no-c-format
msgid "Then the dependent resources will be stopped or demoted gracefully without triggering the loss policies."
msgstr ""
#. Tag: para
#, no-c-format
msgid "If you have finished the maintenance and want to activate the ticket again, you can run:"