diff --git a/src/_layouts/home.html b/src/_layouts/home.html
index e0f65a7..6db2f10 100644
--- a/src/_layouts/home.html
+++ b/src/_layouts/home.html
@@ -1,207 +1,207 @@
---
layout: clusterlabs
---
The ClusterLabs stack unifies a large group of Open Source projects related to
High Availability into a cluster offering suitable
for both small and large deployments. Together,
Corosync,
Pacemaker,
DRBD,
ScanCore,
and many other projects have been enabling detection and recovery of
machine and application-level failures in production clusters since
1999. The ClusterLabs stack supports practically any redundancy
configuration imaginable.
{% image clusterlabs3.svg %}
{% image Deploy-small.png %}
Deploy
We support many deployment scenarios, from the simplest
2-node standby cluster to a 32-node active/active
configuration.
We can also dramatically reduce hardware costs by allowing
several active/passive clusters to be combined and share a common
backup node.
{% image Monitor-small.png %}
Monitor
We monitor the system for both hardware and software failures.
In the event of a failure, we will automatically recover
your application and make sure it is available from one
of the remaining machines in the cluster.
{% image Recover-small.png %}
Recover
After a failure, we use advanced algorithms to quickly
determine the optimum locations for services based on
relative node preferences and/or requirements to run with
other cluster services (we call these "constraints").
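For a flavour of what constraints look like in practice, here is a minimal sketch using the pcs command-line shell; the resource names web-server and virtual-ip are hypothetical and used only for illustration:
# pcs constraint location web-server prefers node1=100
# pcs constraint colocation add web-server with virtual-ip
# pcs constraint order virtual-ip then web-server
The first expresses a node preference, the second a requirement to run alongside another service, and the third a startup ordering.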
At its core, a cluster is a distributed finite state
machine capable of co-ordinating the startup and recovery
of inter-related services across a set of machines.
System HA is possible without a cluster manager, but using one saves you many headaches.
Even a distributed and/or replicated application that is
able to survive the failure of one or more components can
benefit from a higher level cluster:
- awareness of other applications in the stack
- a shared quorum implementation and calculation
- data integrity through fencing (a non-responsive process does not imply it is not doing anything)
- automated recovery of instances to ensure capacity
While SysV init replacements like systemd can provide
deterministic recovery of a complex stack of services, the
recovery is limited to one machine and lacks the context
of what is happening on other machines - context that is
crucial for distinguishing between a local failure, a clean
startup, and recovery after a total site failure.
The ClusterLabs stack, incorporating Corosync
and Pacemaker, defines
an Open Source,
High Availability
cluster
offering suitable for both small and large deployments.
- Detection and recovery of machine and application-level failures
- Supports practically any redundancy configuration
- Supports both quorate and resource-driven clusters
- - Configurable strategies for dealing with quorum loss (when multiple machines fail)
- - Supports application startup/shutdown ordering, regardless of which machine(s) the applications are on
- - Supports applications that must/must-not run on the same machine
- - Supports applications which need to be active on multiple machines
- - Supports applications with dual roles (promoted and unpromoted)
- - Provably correct response to any failure or cluster state.
- The cluster's response to any stimuli can be tested offline before the condition exists
+ - Configurable strategies for dealing with quorum loss (when multiple machines fail)
+ - Supports application startup/shutdown ordering, without requiring the applications to run on the same node
+ - Supports applications that must or must not run on the same node
+ - Supports applications which need to be active on multiple nodes
+ - Supports applications with dual roles (promoted and unpromoted)
+ - Provably correct response to any failure or cluster state. The cluster's
+ response to any stimuli can be tested offline before the condition exists
"The definitive open-source high-availability stack for the Linux
platform builds upon the Pacemaker cluster resource manager."
-- LINUX Journal,
"Ahead
of the Pack: the Pacemaker High-Availability Stack"
A Pacemaker stack is built on five core components:
- libQB - core services (logging, IPC, etc)
- Corosync - Membership, messaging and quorum
- Resource agents - A collection of scripts that interact with the underlying services managed by the cluster
- Fencing agents - A collection of scripts that interact with network power switches and SAN devices to isolate cluster members
- Pacemaker itself
We describe each of these in more detail, as well as other optional components such as CLIs and GUIs.
Pacemaker has been around
since 2004
and is primarily a collaborative effort
between Red Hat
and SUSE; however, we also
receive considerable help and support from the folks
at LinBit and the community in
general.
Corosync also began life in 2004
but was then part of the OpenAIS project.
It is primarily a Red Hat initiative,
with considerable help and support from the folks in the community.
The core ClusterLabs team is made up of full-time
developers from Australia, Austria, Canada, China, Czech
Republic, England, Germany, Sweden and the USA. Contributions to
the code or documentation are always welcome.
The ClusterLabs stack ships with most modern enterprise
distributions and has been deployed in many critical
environments including Deutsche Flugsicherung GmbH
(DFS)
which uses Pacemaker to ensure
its air
traffic control systems are always available.
diff --git a/src/pacemaker/index.html b/src/pacemaker/index.html
index 522a09f..0926048 100644
--- a/src/pacemaker/index.html
+++ b/src/pacemaker/index.html
@@ -1,82 +1,82 @@
---
layout: default
title: Pacemaker
---
"The definitive open-source high-availability stack for the Linux
platform builds upon the Pacemaker cluster resource manager."
-- LINUX Journal,
"Ahead
of the Pack: the Pacemaker High-Availability Stack"
Features
- Detection and recovery of machine and application-level failures
- Supports practically any redundancy configuration
- Supports both quorate and resource-driven clusters
- - Configurable strategies for dealing with quorum loss (when multiple machines fail)
- - Supports application startup/shutdown ordering, regardless machine(s) the applications are on
- - Supports applications that must/must-not run on the same machine
- - Supports applications which need to be active on multiple machines
- - Supports applications with dual roles (promoted and unpromoted)
- - Provably correct response to any failure or cluster state. The
- cluster's response to any stimuli can be tested offline
- before the condition exists
+ - Configurable strategies for dealing with quorum loss (when multiple nodes fail)
+ - Supports application startup/shutdown ordering, without requiring the applications to run on the same node
+ - Supports applications that must or must not run on the same node
+ - Supports applications which need to be active on multiple nodes
+ - Supports applications with dual roles (promoted and unpromoted)
+ - Provably correct response to any failure or cluster state. The cluster's
+ response to any stimuli can be tested offline before the condition exists (see the sketch below)
+
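As a concrete illustration of that last point (a sketch, not part of the original page): a copy of the cluster's configuration and status (the CIB) can be saved with cibadmin and replayed through crm_simulate to preview how the cluster would react; crm_simulate can additionally inject node or operation failures into the simulation.
# cibadmin --query > /tmp/cib.xml
# crm_simulate --xml-file /tmp/cib.xml --simulate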
Background
Pacemaker has been around since
2004
and is a collaborative effort by the ClusterLabs community, including
full-time developers with
Red Hat
and SUSE.
Pacemaker ships with most modern Linux distributions and has been
deployed in many critical environments including Deutsche
Flugsicherung GmbH
(DFS)
which uses Pacemaker to ensure
its air traffic
control systems are always available.
Andrew Beekhof was
Pacemaker's original author and long-time project lead. The current
project lead is Ken Gaillot.
diff --git a/src/quickstart-redhat-6.html b/src/quickstart-redhat-6.html
index d62f038..d89510a 100644
--- a/src/quickstart-redhat-6.html
+++ b/src/quickstart-redhat-6.html
@@ -1,199 +1,198 @@
---
layout: pacemaker
title: RHEL 6 Quickstart
---
{% include quickstart-common.html %}
RHEL 6.4 onwards
Install
Pacemaker ships as part of the Red Hat
High Availability Add-on.
The easiest way to try it out on RHEL is to install it from the
Scientific Linux
or CentOS repositories.
If you are already running CentOS or Scientific Linux, you can skip this step. Otherwise, to teach the machine where to find the CentOS packages, run:
[ALL] # cat <<'EOF' > /etc/yum.repos.d/centos.repo
[centos-6-base]
name=CentOS-$releasever - Base
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os
#baseurl=http://mirror.centos.org/centos/$releasever/os/$basearch/
enabled=1
EOF
Next we use yum to install pacemaker and the other
packages we will need:
[ALL] # yum install pacemaker cman pcs ccs resource-agents
Configure Cluster Membership and Messaging
The supported stack on RHEL6 is based on CMAN, so that's
what Pacemaker uses too.
We now create a CMAN cluster and populate it with some
nodes. Note that the name cannot exceed 15 characters
(we'll use 'pacemaker1').
[ONE] # ccs -f /etc/cluster/cluster.conf --createcluster pacemaker1
[ONE] # ccs -f /etc/cluster/cluster.conf --addnode node1
[ONE] # ccs -f /etc/cluster/cluster.conf --addnode node2
Next we need to teach CMAN how to send its fencing
requests to Pacemaker. We do this regardless of whether
or not fencing is enabled within Pacemaker.
[ONE] # ccs -f /etc/cluster/cluster.conf --addfencedev pcmk agent=fence_pcmk
[ONE] # ccs -f /etc/cluster/cluster.conf --addmethod pcmk-redirect node1
[ONE] # ccs -f /etc/cluster/cluster.conf --addmethod pcmk-redirect node2
[ONE] # ccs -f /etc/cluster/cluster.conf --addfenceinst pcmk node1 pcmk-redirect port=node1
[ONE] # ccs -f /etc/cluster/cluster.conf --addfenceinst pcmk node2 pcmk-redirect port=node2
Now copy /etc/cluster/cluster.conf to all
the other nodes that will be part of the cluster.
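For example, assuming root SSH access between the nodes (an assumption for this sketch, not something this guide configures), you could copy the file with scp:
[ONE] # scp /etc/cluster/cluster.conf node2:/etc/cluster/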
Start the Cluster
CMAN was originally written for rgmanager and assumes the
cluster should not start until the node has
quorum,
so before we try to start the cluster, we need to disable
this behavior:
[ALL] # echo "CMAN_QUORUM_TIMEOUT=0" >> /etc/sysconfig/cman
Now, on each machine, run:
[ALL] # service cman start
[ALL] # service pacemaker start
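As a quick sanity check (an optional aside, using the same monitoring tool introduced later in this guide), you can confirm that both nodes have come online:
[ONE] # crm_mon -1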
A note for users of prior RHEL versions
The original cluster shell (crmsh)
is no
longer available on RHEL. To help people make the
transition there is
- a
+ a
quick reference guide for those wanting to know what
the pcs equivalent is for various crmsh commands.
Set Cluster Options
With so many devices and possible topologies, it is nearly
impossible to include Fencing in a document like this.
For now we will disable it.
[ONE] # pcs property set stonith-enabled=false
One of the most common ways to deploy Pacemaker is in a
2-node configuration. However, quorum as a concept makes
no sense in this scenario (because you only have it when
more than half the nodes are available), so we'll disable
it too.
[ONE] # pcs property set no-quorum-policy=ignore
For demonstration purposes, we will force the cluster to
move services after a single failure:
[ONE] # pcs resource defaults migration-threshold=1
Add a Resource
Let's add a cluster service. We'll choose one that doesn't
require any configuration and works everywhere, to make
things easy. Here's the command:
[ONE] # pcs resource create my_first_svc Dummy op monitor interval=120s
"my_first_svc" is the name the service
will be known as.
"ocf:pacemaker:Dummy" tells Pacemaker
which script to use
(Dummy
- an agent that's useful as a template and for guides like
this one), which namespace it is in (pacemaker) and what
- standard it conforms to
- (OCF).
+ standard it conforms to (OCF).
"op monitor interval=120s" tells Pacemaker to
check the health of this service every 2 minutes by
calling the agent's monitor action.
You should now be able to see the service running using:
[ONE] # pcs status
or
[ONE] # crm_mon -1
Simulate a Service Failure
We can simulate an error by telling the service to stop
directly (without telling the cluster):
[ONE] # crm_resource --resource my_first_svc --force-stop
If you now run crm_mon in interactive
mode (the default), you should see (within the monitor
interval - 2 minutes) the cluster notice
that my_first_svc failed and move it to
another node.
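Once you have watched the recovery happen, you may want to clear the recorded failure so the resource's history is clean again. This is a suggested follow-up rather than part of the original walkthrough; crm_resource can do it directly:
[ONE] # crm_resource --resource my_first_svc --cleanup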
Next Steps
diff --git a/src/quickstart-redhat.html b/src/quickstart-redhat.html
index e629b56..76c3f51 100644
--- a/src/quickstart-redhat.html
+++ b/src/quickstart-redhat.html
@@ -1,168 +1,167 @@
---
layout: pacemaker
title: RHEL 7 Quickstart
---
{% include quickstart-common.html %}
RHEL 7
Install
Pacemaker ships as part of the Red Hat
High Availability Add-on.
The easiest way to try it out on RHEL is to install it from the
Scientific Linux
or CentOS repositories.
If you are already running CentOS or Scientific Linux, you can skip this step. Otherwise, to teach the machine where to find the CentOS packages, run:
[ALL] # cat <<'EOF' > /etc/yum.repos.d/centos.repo
[centos-7-base]
name=CentOS-$releasever - Base
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os
#baseurl=http://mirror.centos.org/centos/$releasever/os/$basearch/
enabled=1
EOF
Next we use yum to install pacemaker and the other
packages we will need:
[ALL] # yum install pacemaker pcs resource-agents
Create the Cluster
The supported stack on RHEL7 is based on Corosync 2, so that's
what Pacemaker uses too.
First, make sure that the pcs daemon is running on every node:
[ALL] # systemctl start pcsd.service
[ALL] # systemctl enable pcsd.service
Then we set up the authentication needed for pcs.
[ALL] # echo CHANGEME | passwd --stdin hacluster
[ONE] # pcs cluster auth node1 node2 -u hacluster -p CHANGEME --force
We now create a cluster and populate it with some nodes.
Note that the name cannot exceed 15 characters (we'll use
'pacemaker1').
[ONE] # pcs cluster setup --force --name pacemaker1 node1 node2
Start the Cluster
[ONE] # pcs cluster start --all
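Once the cluster has started on both nodes, a quick status check (an optional aside; the same command is used again at the end of this guide) should show both nodes online:
[ONE] # pcs status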
Set Cluster Options
With so many devices and possible topologies, it is nearly
impossible to include Fencing in a document like this.
For now we will disable it.
[ONE] # pcs property set stonith-enabled=false
One of the most common ways to deploy Pacemaker is in a
2-node configuration. However, quorum as a concept makes
no sense in this scenario (because you only have it when
more than half the nodes are available), so we'll disable
it too.
[ONE] # pcs property set no-quorum-policy=ignore
For demonstration purposes, we will force the cluster to
move services after a single failure:
[ONE] # pcs resource defaults migration-threshold=1
Add a Resource
Let's add a cluster service. We'll choose one that doesn't
require any configuration and works everywhere, to make
things easy. Here's the command:
[ONE] # pcs resource create my_first_svc Dummy op monitor interval=120s
"my_first_svc" is the name the service
will be known as.
"ocf:pacemaker:Dummy" tells Pacemaker
which script to use
(Dummy
- an agent that's useful as a template and for guides like
this one), which namespace it is in (pacemaker) and what
- standard it conforms to
- (OCF).
+ standard it conforms to (OCF).
"op monitor interval=120s" tells Pacemaker to
check the health of this service every 2 minutes by
calling the agent's monitor action.
You should now be able to see the service running using:
[ONE] # pcs status
or
[ONE] # crm_mon -1
Simulate a Service Failure
We can simulate an error by telling the service to stop
directly (without telling the cluster):
[ONE] # crm_resource --resource my_first_svc --force-stop
If you now run crm_mon in interactive
mode (the default), you should see (within the monitor
interval of 2 minutes) the cluster notice
that my_first_svc failed and move it to
another node.
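After observing the recovery, you can clear the recorded failure so the resource's history is clean again. This is a suggested follow-up rather than part of the original walkthrough:
[ONE] # pcs resource cleanup my_first_svc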
Next Steps
diff --git a/src/quickstart-suse.html b/src/quickstart-suse.html
index f8c6844..3c9af8c 100644
--- a/src/quickstart-suse.html
+++ b/src/quickstart-suse.html
@@ -1,131 +1,130 @@
---
layout: pacemaker
title: SLES 12 Quickstart
---
{% include quickstart-common.html %}
SLES 12
Install
Pacemaker ships as part of the
SUSE High
Availability Extension. To install, follow the provided
documentation. It is also available in openSUSE Leap and openSUSE
Tumbleweed.
Create the Cluster
The supported stack on SLES12 is based on Corosync 2.x.
To get started, install the cluster stack on all nodes.
[ALL] # zypper install ha-cluster-bootstrap
First we initialize the cluster on the first machine (node1):
[ONE] # ha-cluster-init
Now we can join the cluster from the second machine (node2):
[TWO] # ha-cluster-join -c node1
These two steps create and start a basic cluster together with the
HAWK web interface. If given
additional arguments, ha-cluster-init can also configure
STONITH, OCFS2 and an administration IP address as part of initial
configuration. It is also possible to choose whether to use multicast
or unicast for corosync communication.
For more details on ha-cluster-init, see the output of
ha-cluster-init --help.
Set Cluster Options
For demonstration purposes, we will force the cluster to
move services after a single failure:
[ONE] # crm configure property migration-threshold=1
Add a Resource
Let's add a cluster service. We'll choose one that doesn't
require any configuration and works everywhere, to make
things easy. Here's the command:
[ONE] # crm configure primitive my_first_svc Dummy op monitor interval=120s
"my_first_svc" is the name the service
will be known as.
"Dummy" tells Pacemaker
which script to use
(Dummy
- an agent that's useful as a template and for guides like
this one), which namespace it is in (pacemaker) and what
- standard it conforms to
- (OCF).
+ standard it conforms to (OCF).
"op monitor interval=120s" tells Pacemaker to
check the health of this service every 2 minutes by
calling the agent's monitor action.
You should now be able to see the service running using:
[ONE] # crm status
Simulate a Service Failure
We can simulate an error by telling the service to stop
directly (without telling the cluster):
[ONE] # crm_resource --resource my_first_svc --force-stop
If you now run crm_mon in interactive
mode (the default), you should see (within the monitor
interval - 2 minutes) the cluster notice
that my_first_svc failed and move it to
another node.
You can also watch the transition from the HAWK dashboard, by going
to https://node1:7630.
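After observing the recovery, you can clear the recorded failure with crmsh so the resource's history is clean again (a suggested follow-up, not part of the original walkthrough):
[ONE] # crm resource cleanup my_first_svc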
Next Steps