diff --git a/src/_layouts/home.html b/src/_layouts/home.html
index e0f65a7..6db2f10 100644
--- a/src/_layouts/home.html
+++ b/src/_layouts/home.html
@@ -1,207 +1,207 @@
---
layout: clusterlabs
---

Quick Overview

{% image Deploy-small.png %}

Deploy

We support many deployment scenarios, from the simplest 2-node standby cluster to a 32-node active/active configuration. We can also dramatically reduce hardware costs by allowing several active/passive clusters to be combined and share a common backup node.

{% image Monitor-small.png %}

Monitor

We monitor the system for both hardware and software failures. In the event of a failure, we will automatically recover your application and make sure it is available from one of the remaining machines in the cluster.

{% image Recover-small.png %}

Recover

After a failure, we use advanced algorithms to quickly determine the optimum locations for services based on relative node preferences and/or requirements to run with other cluster services (we call these "constraints").

Why clusters

At its core, a cluster is a distributed finite state machine capable of co-ordinating the startup and recovery of inter-related services across a set of machines.

System HA is possible without a cluster manager, but you will save yourself many headaches by using one anyway

Even a distributed and/or replicated application that is able to survive the failure of one or more components can benefit from a higher-level cluster:

While SysV init replacements like systemd can provide deterministic recovery of a complex stack of services, the recovery is limited to one machine and lacks the context of what is happening on other machines - context that is crucial for telling the difference between a local failure, a clean startup, and recovery after a total site failure.

Features

The ClusterLabs stack, incorporating Corosync and Pacemaker, defines an open-source, high-availability cluster offering suitable for both small and large deployments.

Components

"The definitive open-source high-availability stack for the Linux platform builds upon the Pacemaker cluster resource manager."
-- LINUX Journal, "Ahead of the Pack: the Pacemaker High-Availability Stack"

A Pacemaker stack is built on five core components:

We describe each of these in more detail, as well as other optional components such as CLIs and GUIs.

Background

Pacemaker has been around since 2004 and is primarily a collaborative effort between Red Hat and SUSE; however, we also receive considerable help and support from the folks at LinBit and the community in general.

Corosync also began life in 2004 but was then part of the OpenAIS project. It is primarily a Red Hat initiative, with considerable help and support from the folks in the community.

The core ClusterLabs team is made up of full-time developers from Australia, Austria, Canada, China, Czech Republic, England, Germany, Sweden and the USA. Contributions to the code or documentation are always welcome.

The ClusterLabs stack ships with most modern enterprise distributions and has been deployed in many critical environments, including at Deutsche Flugsicherung GmbH (DFS), which uses Pacemaker to ensure its air traffic control systems are always available.

diff --git a/src/pacemaker/index.html b/src/pacemaker/index.html
index 522a09f..0926048 100644
--- a/src/pacemaker/index.html
+++ b/src/pacemaker/index.html
@@ -1,82 +1,82 @@
---
layout: default
title: Pacemaker
---
"The definitive open-source high-availability stack for the Linux platform builds upon the Pacemaker cluster resource manager." -- LINUX Journal, "Ahead of the Pack: the Pacemaker High-Availability Stack"

Features

Background

Black Duck Open Hub project report for pacemaker

Pacemaker has been around since 2004 and is a collaborative effort by the ClusterLabs community, including full-time developers with Red Hat and SuSE.

Pacemaker ships with most modern Linux distributions and has been deployed in many critical environments, including at Deutsche Flugsicherung GmbH (DFS), which uses Pacemaker to ensure its air traffic control systems are always available.

Andrew Beekhof was Pacemaker's original author and long-time project lead. The current project lead is Ken Gaillot.

diff --git a/src/quickstart-redhat-6.html b/src/quickstart-redhat-6.html
index d62f038..d89510a 100644
--- a/src/quickstart-redhat-6.html
+++ b/src/quickstart-redhat-6.html
@@ -1,199 +1,198 @@
---
layout: pacemaker
title: RHEL 6 Quickstart
---
{% include quickstart-common.html %}

RHEL 6.4 onwards

Install

Pacemaker ships as part of the Red Hat High Availability Add-on. The easiest way to try it out on RHEL is to install it from the Scientific Linux or CentOS repositories.

If you are already running CentOS or Scientific Linux, you can skip this step. Otherwise, to teach the machine where to find the CentOS packages, run:

[ALL] # cat <<'EOF' > /etc/yum.repos.d/centos.repo
[centos-6-base]
name=CentOS-$releasever - Base
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os
#baseurl=http://mirror.centos.org/centos/$releasever/os/$basearch/
enabled=1
EOF

Next we use yum to install pacemaker and the other packages we will need:

[ALL] # yum install pacemaker cman pcs ccs resource-agents

Configure Cluster Membership and Messaging

The supported stack on RHEL 6 is based on CMAN, so that's what Pacemaker uses too.

We now create a CMAN cluster and populate it with some nodes. Note that the name cannot exceed 15 characters (we'll use 'pacemaker1').

[ONE] # ccs -f /etc/cluster/cluster.conf --createcluster pacemaker1
[ONE] # ccs -f /etc/cluster/cluster.conf --addnode node1
[ONE] # ccs -f /etc/cluster/cluster.conf --addnode node2

Next we need to teach CMAN how to send its fencing requests to Pacemaker. We do this regardless of whether or not fencing is enabled within Pacemaker.

[ONE] # ccs -f /etc/cluster/cluster.conf --addfencedev pcmk agent=fence_pcmk
[ONE] # ccs -f /etc/cluster/cluster.conf --addmethod pcmk-redirect node1
[ONE] # ccs -f /etc/cluster/cluster.conf --addmethod pcmk-redirect node2
[ONE] # ccs -f /etc/cluster/cluster.conf --addfenceinst pcmk node1 pcmk-redirect port=node1
[ONE] # ccs -f /etc/cluster/cluster.conf --addfenceinst pcmk node2 pcmk-redirect port=node2

Now copy /etc/cluster/cluster.conf to all the other nodes that will be part of the cluster.
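For example, assuming the other cluster node is reachable as node2 over SSH (substitute your own hostnames), the file can be pushed with scp:

[ONE] # scp /etc/cluster/cluster.conf node2:/etc/cluster/cluster.conf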

Start the Cluster

CMAN was originally written for rgmanager and assumes the cluster should not start until the node has quorum, so before we try to start the cluster, we need to disable this behavior:

[ALL] # echo "CMAN_QUORUM_TIMEOUT=0" >> /etc/sysconfig/cman

Now, on each machine, run:

[ALL] # service cman start
[ALL] # service pacemaker start

A note for users of prior RHEL versions

The original cluster shell (crmsh) is no longer available on RHEL. To help people make the transition, there is a quick reference guide for those wanting to know the pcs equivalents of various crmsh commands.
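As a rough illustration (the quick reference guide covers many more), here are a few common crmsh commands alongside their approximate pcs counterparts:

crm status                                    =>  pcs status
crm configure show                            =>  pcs config
crm configure property stonith-enabled=false  =>  pcs property set stonith-enabled=false
crm resource stop my_first_svc                =>  pcs resource disable my_first_svc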

Set Cluster Options

With so many devices and possible topologies, it is nearly impossible to cover fencing in a document like this. For now, we will disable it.

[ONE] # pcs property set stonith-enabled=false

One of the most common ways to deploy Pacemaker is in a 2-node configuration. However, quorum as a concept makes no sense in this scenario (you only have quorum when more than half of the nodes are available), so we'll disable it too.

[ONE] # pcs property set no-quorum-policy=ignore

For demonstration purposes, we will force the cluster to move services after a single failure:

[ONE] # pcs resource defaults migration-threshold=1

Add a Resource

Let's add a cluster service. To keep things easy, we'll choose one that doesn't require any configuration and works everywhere. Here's the command:

[ONE] # pcs resource create my_first_svc Dummy op monitor interval=120s

"my_first_svc" is the name the service will be known as.

"ocf:pacemaker:Dummy" tells Pacemaker which script to use (Dummy - an agent that's useful as a template and for guides like this one), which namespace it is in (pacemaker) and what - standard it conforms to - (OCF). + standard it conforms to (OCF).

"op monitor interval=120s" tells Pacemaker to check the health of this service every 2 minutes by calling the agent's monitor action.

You should now be able to see the service running using:

[ONE] # pcs status

or

[ONE] # crm_mon -1

Simulate a Service Failure

We can simulate an error by telling the service to stop directly (without telling the cluster):

[ONE] # crm_resource --resource my_first_svc --force-stop

If you now run crm_mon in interactive mode (the default), you should see (within the monitor interval of 2 minutes) the cluster notice that my_first_svc failed and move it to another node.
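For example, you can leave the monitor running in one terminal while triggering the failure from another, and press Ctrl-C to exit when you are done:

[ONE] # crm_mon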

Next Steps

diff --git a/src/quickstart-redhat.html b/src/quickstart-redhat.html
index e629b56..76c3f51 100644
--- a/src/quickstart-redhat.html
+++ b/src/quickstart-redhat.html
@@ -1,168 +1,167 @@
---
layout: pacemaker
title: RHEL 7 Quickstart
---
{% include quickstart-common.html %}

RHEL 7

Install

Pacemaker ships as part of the Red Hat High Availability Add-on. The easiest way to try it out on RHEL is to install it from the Scientific Linux or CentOS repositories.

If you are already running CentOS or Scientific Linux, you can skip this step. Otherwise, to teach the machine where to find the CentOS packages, run:

[ALL] # cat <<'EOF' > /etc/yum.repos.d/centos.repo
[centos-7-base]
name=CentOS-$releasever - Base
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os
#baseurl=http://mirror.centos.org/centos/$releasever/os/$basearch/
enabled=1
EOF

Next we use yum to install pacemaker and the other packages we will need:

[ALL] # yum install pacemaker pcs resource-agents

Create the Cluster

The supported stack on RHEL 7 is based on Corosync 2, so that's what Pacemaker uses too.

First, make sure that the pcs daemon is running on every node:

[ALL] # systemctl start pcsd.service
[ALL] # systemctl enable pcsd.service

Then we set up the authentication needed for pcs.

[ALL] # echo CHANGEME | passwd --stdin hacluster
[ONE] # pcs cluster auth node1 node2 -u hacluster -p CHANGEME --force

We now create a cluster and populate it with some nodes. Note that the name cannot exceed 15 characters (we'll use 'pacemaker1').

[ONE] # pcs cluster setup --force --name pacemaker1 node1 node2

Start the Cluster

[ONE] # pcs cluster start --all
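Before continuing, it is worth confirming that both nodes started and joined the cluster; for example:

[ONE] # pcs cluster status
[ONE] # pcs status corosync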

Set Cluster Options

With so many devices and possible topologies, it is nearly impossible to cover fencing in a document like this. For now, we will disable it.

[ONE] # pcs property set stonith-enabled=false

One of the most common ways to deploy Pacemaker is in a 2-node configuration. However, quorum as a concept makes no sense in this scenario (you only have quorum when more than half of the nodes are available), so we'll disable it too.

[ONE] # pcs property set no-quorum-policy=ignore

For demonstration purposes, we will force the cluster to move services after a single failure:

[ONE] # pcs resource defaults migration-threshold=1

Add a Resource

Let's add a cluster service. To keep things easy, we'll choose one that doesn't require any configuration and works everywhere. Here's the command:

[ONE] # pcs resource create my_first_svc Dummy op monitor interval=120s

"my_first_svc" is the name the service will be known as.

"ocf:pacemaker:Dummy" tells Pacemaker which script to use (Dummy - an agent that's useful as a template and for guides like this one), which namespace it is in (pacemaker) and what - standard it conforms to - (OCF). + standard it conforms to (OCF).

"op monitor interval=120s" tells Pacemaker to check the health of this service every 2 minutes by calling the agent's monitor action.

You should now be able to see the service running using:

[ONE] # pcs status

or

[ONE] # crm_mon -1

Simulate a Service Failure

We can simulate an error by telling the service to stop directly (without telling the cluster):

[ONE] # crm_resource --resource my_first_svc --force-stop

If you now run crm_mon in interactive mode (the default), you should see (within the monitor interval of 2 minutes) the cluster notice that my_first_svc failed and move it to another node.
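Once you have seen the failover, the recorded failure can be cleared so the cluster forgets about it; with pcs this should be something like:

[ONE] # pcs resource cleanup my_first_svc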

Next Steps

diff --git a/src/quickstart-suse.html b/src/quickstart-suse.html
index f8c6844..3c9af8c 100644
--- a/src/quickstart-suse.html
+++ b/src/quickstart-suse.html
@@ -1,131 +1,130 @@
---
layout: pacemaker
title: SLES 12 Quickstart
---
{% include quickstart-common.html %}

SLES 12

Install

Pacemaker ships as part of the SUSE High Availability Extension. To install, follow the provided documentation. It is also available in openSUSE Leap and openSUSE Tumbleweed.

Create the Cluster

The supported stack on SLES12 is based on Corosync 2.x.

To get started, install the cluster stack on all nodes.

[ALL] # zypper install ha-cluster-bootstrap

First we initialize the cluster on the first machine (node1):

[ONE] # ha-cluster-init

Now we can join the cluster from the second machine (node2):

[TWO] # ha-cluster-join -c node1

These two steps create and start a basic cluster together with the HAWK web interface. If given additional arguments, ha-cluster-init can also configure STONITH, OCFS2 and an administration IP address as part of initial configuration. It is also possible to choose whether to use multicast or unicast for corosync communication.

For more details on ha-cluster-init, see the output of ha-cluster-init --help.
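Once both nodes have joined, you can inspect what the bootstrap scripts created using the standard crmsh commands, for example:

[ONE] # crm status
[ONE] # crm configure show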

Set Cluster Options

For demonstration purposes, we will force the cluster to move services after a single failure:

[ONE] # crm configure property migration-threshold=1

Add a Resource

Let's add a cluster service. To keep things easy, we'll choose one that doesn't require any configuration and works everywhere. Here's the command:

[ONE] # crm configure primitive my_first_svc Dummy op monitor interval=120s

"my_first_svc" is the name the service will be known as.

"Dummy" tells Pacemaker which script to use (Dummy - an agent that's useful as a template and for guides like this one), which namespace it is in (pacemaker) and what - standard it conforms to - (OCF). + standard it conforms to (OCF).

"op monitor interval=120s" tells Pacemaker to check the health of this service every 2 minutes by calling the agent's monitor action.

You should now be able to see the service running using:

[ONE] # crm status

Simulate a Service Failure

We can simulate an error by telling the service to stop directly (without telling the cluster):

[ONE] # crm_resource --resource my_first_svc --force-stop

If you now run crm_mon in interactive mode (the default), you should see (within the monitor interval of 2 minutes) the cluster notice that my_first_svc failed and move it to another node.

You can also watch the transition from the HAWK dashboard, by going to https://node1:7630.
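Once you have observed the failover, the recorded failure can be cleared with crmsh, for example:

[ONE] # crm resource cleanup my_first_svc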

Next Steps