Supports applications with multiple modes (e.g., master/slave)
Provably correct response to any failure or cluster state. The
cluster's response to any stimulus can be tested offline
before the condition exists
Background
Pacemaker has been around since 2004 and is primarily a
collaborative effort between Red Hat and SUSE. However, we also
receive considerable help and support from the folks at LINBIT
and the community in general.
The core Pacemaker team is made up of full-time developers from
Australia, the Czech Republic, the USA, and Germany. Contributions to the code or
documentation are always welcome.
Pacemaker ships with most modern Linux distributions and has been
deployed in many critical environments, including Deutsche
Flugsicherung GmbH (DFS), which uses Pacemaker to ensure its air
traffic control systems are always available.
Currently Andrew Beekhof is
the project lead for Pacemaker.
Contact
Stay up to date with the ClusterLabs community by subscribing to our
mailing lists or by following the project development on GitHub.
diff --git a/src/_config.yml b/src/_config.yml
index ef03b00..3a07a41 100644
--- a/src/_config.yml
+++ b/src/_config.yml
@@ -1,52 +1,52 @@
# Welcome to Jekyll!
#
# This config file is meant for settings that affect your whole blog, values
# which you are expected to set up once and rarely edit after that. If you find
# yourself editing this file very often, consider using Jekyll's data files
# feature for the data you need to update frequently.
#
# For technical reasons, this file is *NOT* reloaded automatically when you use
# 'bundle exec jekyll serve'. If you change this file, please restart the server process.
# Site settings
# These are used to personalize your new site. If you look in the HTML files,
# you will see them accessed via {{ site.title }}, {{ site.email }}, and so on.
# You can create any custom variable you would like, and they will be accessible
# in the templates via {{ site.myvariable }}.
title: ClusterLabs
email: andrew@beekhof.net
description: Community hub for open-source high-availability software
-url: http://www.clusterlabs.org/
+url: https://www.clusterlabs.org/
google_analytics: UA-8156370-1
# Build settings
theme: minima
destination: ../html
gems:
- jekyll-assets
- font-awesome-sass
include:
- doc
- pacemaker
- polls
exclude:
- Gemfile
- Gemfile.lock
- LICENSE.theme
# All content generated outside of jekyll, or not yet converted to jekyll,
# must be listed here, or jekyll will erase it when building the site.
# Though not documented as such, the values here function as prefix matches.
keep_files:
- images
- pacemaker/abi
- pacemaker/doc
- pacemaker/doxygen
- pacemaker/global
- pacemaker/man
- Pictures
- rpm-test
- rpm-test-next
- rpm-test-rhel
diff --git a/src/_includes/sidebar.html b/src/_includes/sidebar.html
index e633082..2a561f1 100644
--- a/src/_includes/sidebar.html
+++ b/src/_includes/sidebar.html
@@ -1,49 +1,49 @@
The ClusterLabs stack unifies a large group of Open Source projects related to
High Availability into a cluster offering suitable for both small and large
deployments. Together, Corosync, Pacemaker, DRBD, ScanCore, and many other
projects have been enabling detection and recovery of machine and
application-level failures in production clusters since 1999. The ClusterLabs
stack supports practically any redundancy configuration imaginable.
We support many deployment scenarios, from the simplest
2-node standby cluster to a 32-node active/active
configuration.
We can also dramatically reduce hardware costs by allowing
several active/passive clusters to be combined and share a common
backup node.
We monitor the system for both hardware and software failures.
In the event of a failure, we will automatically recover
your application and make sure it is available from one
of the remaining machines in the cluster.
After a failure, we use advanced algorithms to quickly
determine the optimum locations for services based on
relative node preferences and/or requirements to run with
other cluster services (we call these "constraints").
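As a rough illustration (using pcs, with two hypothetical resources named
ClusterIP and WebSite), such constraints can be expressed with commands like:
# pcs constraint colocation add WebSite with ClusterIP INFINITY
# pcs constraint order ClusterIP then WebSite
# pcs constraint location WebSite prefers node1=50
The exact syntax depends on the configuration tool you use; crmsh offers
equivalent commands.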
Why clusters
At its core, a cluster is a distributed finite state
machine capable of co-ordinating the startup and recovery
of inter-related services across a set of machines.
System HA is possible without a cluster manager, but you save many headaches using one anyway
Even a distributed and/or replicated application that is
able to survive the failure of one or more components can
benefit from a higher level cluster:
data integrity through fencing (a non-responsive process does not imply it is not doing anything)
automated recovery of instances to ensure capacity
While SYS-V init replacements like systemd can provide
deterministic recovery of a complex stack of services, the
recovery is limited to one machine and lacks the context
of what is happening on other machines - context that is
crucial to determine the difference between a local
failure, clean startup or recovery after a total site
failure.
A Pacemaker stack is built on five core components:
libQB - core services (logging, IPC, etc.)
Corosync - Membership, messaging and quorum
Resource agents - A collection of scripts that interact with the underlying services managed by the cluster
Fencing agents - A collection of scripts that interact with network power switches and SAN devices to isolate cluster members
Pacemaker itself
We describe each of these in more detail below, as well as other optional components such as CLIs and GUIs.
Background
Pacemaker has been around since 2004 and is primarily a
collaborative effort between Red Hat and SUSE. However, we also
receive considerable help and support from the folks at LINBIT
and the community in general.
"Pacemaker cluster stack is the state-of-the-art high availability
and load balancing stack for the Linux platform."
-- OpenStack
documentation
Corosync also began life in 2004, but was then part of the
OpenAIS project. It is primarily a Red Hat initiative, with
considerable help and support from the folks in the community.
The core ClusterLabs team is made up of full-time developers from
Australia, Austria, Canada, China, the Czech Republic, England,
Germany, Sweden, and the USA. Contributions to the code or
documentation are always welcome.
The ClusterLabs stack ships with most modern enterprise
distributions and has been deployed in many critical environments,
including Deutsche Flugsicherung GmbH (DFS), which uses Pacemaker
to ensure its air traffic control systems are always available.
At its core, Pacemaker is a distributed finite state
machine capable of co-ordinating the startup and recovery
of inter-related services across a set of machines.
Pacemaker understands many different resource types (OCF,
SYSV, systemd) and can accurately model the relationships
between them (colocation, ordering).
It can even use technology such
as Docker to
automatically isolate the resources managed by the
cluster.
Corosync APIs provide membership (a list of peers),
messaging (the ability to talk to processes on those
peers), and quorum (do we have a majority) capabilities to
projects such as Apache Qpid and Pacemaker.
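As a quick sketch, the membership and quorum information Corosync
exposes can be inspected on a running node with its bundled tools
(output format varies by Corosync version):
# corosync-quorumtool -s    (votes, quorum state and current members)
# corosync-cfgtool -s       (status of the cluster communication rings)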
libqb is a library whose primary purpose is to provide
high-performance, reusable client-server features. It provides
high-performance logging, tracing, IPC, and poll. The initial
features of libqb come from the parts of corosync that were
thought to be useful to other projects.
Resource agents are the abstraction that allows Pacemaker
to manage services it knows nothing about. They contain
the logic for what to do when the cluster wishes to start,
stop or check the health of a service.
This particular set of agents conform to the Open Cluster
Framework (OCF) specification.
A guide
to writing agents is also available.
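To give a feel for the shape of an agent, here is a deliberately
minimal sketch (not a real ClusterLabs agent; the service name and
state file are placeholders). Production agents also implement the
meta-data and validate-all actions and follow the full OCF
exit-code conventions:
#!/bin/sh
# Minimal OCF-style agent sketch; "myservice" is purely illustrative.
STATE=/var/run/myservice.state
case "$1" in
  start)   touch "$STATE"; exit 0 ;;                # 0 = OCF_SUCCESS
  stop)    rm -f "$STATE"; exit 0 ;;                # 0 = OCF_SUCCESS
  monitor) [ -f "$STATE" ] && exit 0 || exit 7 ;;   # 7 = OCF_NOT_RUNNING
  *)       exit 3 ;;                                # 3 = OCF_ERR_UNIMPLEMENTED
esac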
Fence agents are the abstraction that allows Pacemaker to
isolate badly behaving nodes. They achieve this by either
powering off the node or disabling its access to the
network and/or shared storage.
Many types of network power switches exist and you will
want to choose the one(s) that match your hardware.
Please be aware that some (ones that don't lose power
when the machine goes down) are better than others.
Agents are generally expected to expose OCF-compliant
metadata.
The original documentation that sparked a lot of this
work. Mostly we only use the "RA" specification. Efforts
are underway to revive the process for updating and
modernizing the spec.
Configuration Tools
Pacemaker's internal configuration format is XML, which is
great for machines but terrible for humans.
The community's best minds have created GUIs and Shells to
hide the XML and allow the configuration to be viewed and
updated in a more human friendly format.
The original configuration shell for Pacemaker. Written
and actively maintained by SUSE, it may be used either as an
interactive shell with tab completion, for single commands
directly on the shell's command line, or as a batch-mode
scripting tool. Documentation for crmsh can be found here.
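For instance, a single resource can be added from the interactive
shell or straight from the command line (a sketch; the resource
name and IP address below are only placeholders):
# crm configure primitive my_ip ocf:heartbeat:IPaddr2 params ip=192.168.122.100 op monitor interval=30s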
An alternate vision for a full cluster lifecycle
configuration shell and web based GUI. Handles everything
from cluster installation through to resource
configuration and status.
The Linux Cluster Management Console (LCMC) is a GUI with
an innovative approach for representing the status of and
relationships between cluster services. It uses SSH to
let you install, configure and manage clusters from your
desktop.
The Booth cluster ticket manager extends Pacemaker to
support geographically distributed clustering. It does
this by managing the granting and revoking of 'tickets',
which authorize one of the cluster sites, potentially
located in geographically dispersed locations, to run
certain resources.
SBD provides a node fencing mechanism through the
exchange of messages via shared block storage, such as a
SAN, iSCSI, or FCoE. This isolates the fencing
mechanism from changes in firmware version or dependencies on
specific firmware controllers, and it can be used as a STONITH
mechanism in all configurations that have reliable shared
storage. It can also be used as a pure watchdog-based fencing
mechanism.
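Very roughly, and assuming a dedicated shared device (the device
path below is only a placeholder), setup involves initializing the
device and then pointing the sbd daemon at it:
# sbd -d /dev/disk/by-id/my-shared-disk create
# sbd -d /dev/disk/by-id/my-shared-disk list
Consult the sbd documentation for your distribution for the exact
configuration steps.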
The Corosync Cluster Engine is a Group Communication System with additional features for implementing high availability within applications.
Corosync is used as a High Availability framework by projects such as Apache Qpid and Pacemaker.
{% img stars.jpg %}
Virtual synchrony
A closed process group communication model with virtual synchrony guarantees for creating replicated state machines.
Availability
A simple availability manager that restarts the application process when it has failed.
Information
A configuration and statistics in-memory database that provides the ability to set, retrieve, and receive change notifications of information.
Quorum
A quorum system that notifies applications when quorum is achieved or lost.
A: Pacemaker ships as part of most modern
distributions, so you can usually just launch your
favorite package manager on:
openSUSE and SUSE Linux Enterprise Server (SLES)
Fedora and derivatives such as Red Hat Enterprise Linux (RHEL)
and CentOS
Debian and derivatives such as Ubuntu (with the exception of
Debian 8 "jessie", for which see the Debian-HA team for
details)
Gentoo
If all else fails, you can try installing from source.
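For example (package names can differ slightly between releases):
# yum install pacemaker pcs        (RHEL, CentOS, Fedora; dnf on newer Fedora)
# zypper install pacemaker crmsh   (openSUSE / SLES)
# apt-get install pacemaker crmsh  (Debian / Ubuntu)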
Q: Is there any documentation?
A: Yes. You can find the set relevant to
your version in our documentation
index.
Q: Where should I ask questions?
A: Often basic questions can be answered
on irc,
but sending them to the
mailing list is
always a good idea so that everyone can benefit from the
answer.
Q: Do I need shared storage?
A: No. We can help manage it if you have
some, but Pacemaker itself has no need for shared storage.
Q: Which cluster filesystems does Pacemaker support?
A: Pacemaker supports the
popular OCFS2 and GFS2
filesystems. As you'd expect, you can use them on top of
real disks or network block devices like DRBD.
Q: What kind of applications can I manage with Pacemaker?
A: Pacemaker is application agnostic, meaning
anything that can be scripted can be made highly available
- provided the script conforms to one of the supported
standards:
LSB,
OCF,
Systemd,
or Upstart.
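In practice this means, for example, that an OCF agent and a
systemd unit can both be managed as cluster resources (a sketch
using pcs; the resource names and unit are placeholders):
# pcs resource create my_ip ocf:heartbeat:IPaddr2 ip=192.168.122.100
# pcs resource create my_web systemd:httpd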
Q: Can I use Pacemaker with Heartbeat?
A: Yes. Pacemaker started off life as part
of the Heartbeat project and continues to support it as an
alternative to Corosync. See
this documentation for more details.
Q: Can I use Pacemaker with CMAN?
A: Yes. Pacemaker added support
for CMAN
v3 in version 1.1.5 to better integrate with distros
that have traditionally shipped and/or supported the RHCS
cluster stack instead of Pacemaker. This is particularly
relevant for those looking to use GFS2 or OCFS2. See
the documentation
for more details.
Q: Can I use Pacemaker with Corosync 1.x?
A: Yes. You will need to configure
Corosync to load Pacemaker's custom plugin to provide the
membership and quorum information we require. See
the documentation for more details.
Q: Can I use Pacemaker with Corosync 2.x?
A: Yes. Pacemaker can obtain the
membership and quorum information it requires directly from
Corosync in this configuration. See
the documentation for more details.
Q: Do I need a fencing device?
A: Yes. Fencing is the only 100% reliable
way to ensure the integrity of your data and that
applications are only active on one host. Although
Pacemaker is technically able to function without fencing,
there are good reasons why SUSE and Red Hat will not support
such a configuration.
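As a hedged illustration, an IPMI-based fencing device might be
configured along these lines (the agent's parameter names vary
with the fence-agents version installed, and the address and
credentials are placeholders):
# pcs stonith create ipmi-node1 fence_ipmilan pcmk_host_list=node1 ipaddr=10.0.0.1 login=admin passwd=secret
# pcs property set stonith-enabled=true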
Q: Do I need to know XML to configure Pacemaker?
A: No. Although Pacemaker uses XML as its
native configuration format, there
exist 2 CLIs and at least 4 GUIs
that present the configuration in a human friendly format.
Q: How do I synchronize the cluster configuration?
A: Any changes to Pacemaker's
configuration are automatically replicated to other
machines. The configuration is also versioned, so any
offline machines will be updated when they return.
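You can see this versioning at work by querying the CIB on any
node (a sketch; the counters live in the CIB's root element):
# cibadmin --query | head -n 1    (shows the admin_epoch, epoch and num_updates counters)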
Q: Should I choose pcs or crmsh?
A: Arguably the best advice is to use
whichever one comes with your distro. This is the one
that will be tailored to that environment, receive regular
bugfixes, and be featured in the documentation.
Of course, for years people have been side-loading all of
Pacemaker onto enterprise distros that didn't ship it, so
doing the same for just a configuration tool should be
easy if your favorite distro does not ship your favorite
tool.
You can stay up to date with the Pacemaker project by subscribing to our
news and/or site updates feeds.
A good first step is always to check out
the FAQ
and documentation. Otherwise, many
members of the community hang out
on irc
and are happy to answer questions. We are spread out over
many timezones though (and have day jobs), so you may need
to be patient when waiting for a reply.
Extended or complex issues might be better sent to the
relevant mailing list(s)
(you'll need to subscribe in order to send messages).
Don't worry if you pick the wrong one, many of us are on
multiple lists and someone will suggest a more appropriate
forum if necessary.
People new to the project, or Open Source generally, are
encouraged to
read Getting
Answers by Mike Ash from Rogue Amoeba. It provides
some very good tips on effective communication with groups
such as this one. Following the advice it contains will
greatly increase the chance of a quick and helpful reply.
Bugs and other problems can also be reported
via Bugzilla.
The development of most of the ClusterLabs-related projects takes place as part of
the ClusterLabs organization at GitHub,
and the source code and issue trackers for these projects can be found there.
Providing Help
If you find this project useful, you may want to consider supporting its future development.
There are a number of ways to support the project (in no particular order):
If you're looking for external hosting, consider
using Linode.
Signing up for a new Linode with the referral code
75cc67af7ebaa39a56b66771a5b98501c643d312 provides
credits towards the hosting of this site, the code
repositories and mailing lists.
Heinlein Support offers
training
on Linux Clusters. Topics covered: heartbeat, openais,
pacemaker, DRBD, cluster filesystems, shared data and
the setup and integration of Linux Virtual Server (LVS)
into a high-availability cluster.
LINBIT provides global support
for DRBD, Linux-HA, Pacemaker and other HA-software
suites. Philipp Reisner and Lars Ellenberg, the authors
of DRBD, oversee LINBIT's Professional Services. In addition,
they offer training services, certification, consultancy, and
turnkey solutions around DRBD and Pacemaker.
B1 Systems provides support (troubleshooting, maintenance,
debugging, ...), consulting and training for Linux clusters,
load balancing, storage clusters, virtual system clusters and
high availability. This includes Pacemaker, Heartbeat and LVS
as well as various cluster filesystems (OCFS2, GPFS, GFS, ...).
Gurulabs offers
training
in the US on Linux Clusters. Topics covered: heartbeat,
openais, pacemaker, DRBD, cluster filesystems, shared
data and the setup and integration of Linux Virtual
Server (LVS) into the cluster.
Does your company provide Pacemaker training or
support? Let
us know!
Next we need to teach CMAN how to send its fencing
requests to Pacemaker. We do this regardless of whether
or not fencing is enabled within Pacemaker.
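One common approach on CMAN-based clusters uses the fence_pcmk
redirect agent, roughly as follows (node names are placeholders;
repeat the last two commands for each cluster node):
[ONE] # ccs -f /etc/cluster/cluster.conf --addfencedev pcmk agent=fence_pcmk
[ONE] # ccs -f /etc/cluster/cluster.conf --addmethod pcmk-redirect node1
[ONE] # ccs -f /etc/cluster/cluster.conf --addfenceinst pcmk node1 pcmk-redirect port=node1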
Now copy /etc/cluster/cluster.conf to all
the other nodes that will be part of the cluster.
Start the Cluster
CMAN was originally written for rgmanager and assumes the
cluster should not start until the node has quorum,
so before we try to start the cluster, we need to disable
this behavior:
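A common way to do this (a sketch based on the RHEL 6 packaging)
is to set the quorum timeout to zero on every node:
[ALL] # echo "CMAN_QUORUM_TIMEOUT=0" >> /etc/sysconfig/cman
With that in place, start the cluster services: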
[ALL] # service cman start
[ALL] # service pacemaker start
A note for users of prior RHEL versions
The original cluster shell (crmsh)
is no
longer available on RHEL. To help people make the
transition there is
a
quick reference guide for those wanting to know what
the pcs equivalent is for various crmsh commands.
Set Cluster Options
With so many devices and possible topologies, it is nearly
impossible to include Fencing in a document like this.
For now we will disable it.
[ONE] # pcs property set stonith-enabled=false
One of the most common ways to deploy Pacemaker is in a
2-node configuration. However quorum as a concept makes
no sense in this scenario (because you only have it when
more than half the nodes are available), so we'll disable
it too.
[ONE] # pcs property set no-quorum-policy=ignore
For demonstration purposes, we will force the cluster to
move services after a single failure:
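One way to do that is to set a default migration threshold of a
single failure (a sketch; newer pcs releases use 'pcs resource
defaults update' for the same setting):
[ONE] # pcs resource defaults migration-threshold=1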
Let's add a cluster service. We'll choose one that doesn't
require any configuration and works everywhere, to make
things easy. Here's the command:
[ONE] # pcs resource create my_first_svc Dummy op monitor interval=120s
"my_first_svc" is the name the service
will be known as.
"ocf:pacemaker:Dummy" tells Pacemaker
which script to use
(Dummy
- an agent that's useful as a template and for guides like
this one), which namespace it is in (pacemaker) and what
standard it conforms to
(OCF).
"op monitor interval=120s" tells Pacemaker to
check the health of this service every 2 minutes by
calling the agent's monitor action.
You should now be able to see the service running using:
[ONE] # pcs status
or
[ONE] # crm_mon -1
Simulate a Service Failure
We can simulate an error by telling the service to stop
directly (without telling the cluster):
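For example, using the Dummy resource created above:
[ONE] # crm_resource --resource my_first_svc --force-stop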
If you now run crm_mon in interactive
mode (the default), you should see (within the monitor
interval - 2 minutes) the cluster notice
that my_first_svc failed and move it to
another node.