The ClusterLabs stack unifies a large group of Open Source projects related to
High Availability into a cluster offering suitable for both small and large
deployments. Together, Corosync, Pacemaker, DRBD, ScanCore, and many other
projects have been enabling detection and recovery of machine and
application-level failures in production clusters since 1999. The ClusterLabs
stack supports practically any redundancy configuration imaginable.
We support many deployment scenarios, from the simplest
2-node standby cluster to a 32-node active/active
configuration.
We can also dramatically reduce hardware costs by allowing
several active/passive clusters to be combined and share a common
backup node.
We monitor the system for both hardware and software failures.
In the event of a failure, we will automatically recover
your application and make sure it is available from one
of the remaining machines in the cluster.
After a failure, we use advanced algorithms to quickly
determine the optimum locations for services based on
relative node preferences and/or requirements to run with
other cluster services (we call these "constraints").
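To illustrate how node preferences and constraints can combine into a placement decision, here is a toy sketch in Python. It is not Pacemaker's actual scheduler: the function name `place_services`, the simple additive scoring, and the node and service names are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def place_services(
    nodes: List[str],
    preferences: Dict[str, Dict[str, int]],
    colocations: Dict[Tuple[str, str], int],
    order: List[str],
) -> Dict[str, str]:
    """Greedily pick a node for each service (hypothetical model).

    preferences[service][node] is a relative node preference score.
    colocations[(dependent, target)] is a bonus added to whichever node
    already hosts `target` when placing `dependent`.
    """
    if not nodes:
        raise ValueError("at least one surviving node is required")
    placement: Dict[str, str] = {}
    for service in order:
        # Start from the per-node preference, defaulting to 0.
        scores = {n: preferences.get(service, {}).get(n, 0) for n in nodes}
        # Reward running next to services that were already placed.
        for (dependent, target), weight in colocations.items():
            if dependent == service and target in placement:
                scores[placement[target]] += weight
        # Ties are broken by the order of `nodes`.
        placement[service] = max(nodes, key=lambda n: scores[n])
    return placement


preferences = {"db": {"node1": 100, "node2": 50}, "web": {"node3": 20}}
colocations = {("web", "db"): 200}

# All nodes healthy, then the same decision after node1 has failed.
print(place_services(["node1", "node2", "node3"], preferences, colocations, ["db", "web"]))
print(place_services(["node2", "node3"], preferences, colocations, ["db", "web"]))
```

With all three nodes available, both services land on node1. Once node1 is removed from the list, the database moves to node2 and the web service follows it, because the colocation bonus outweighs its mild preference for node3.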
Why clusters
At its core, a cluster is a distributed finite state
machine capable of co-ordinating the startup and recovery
of inter-related services across a set of machines.
System HA is possible without a cluster manager, but using one saves you many headaches.
Even a distributed and/or replicated application that is
able to survive the failure of one or more components can
benefit from a higher-level cluster:
data integrity through fencing (an unresponsive process is not necessarily an idle one)
automated recovery of instances to ensure capacity
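The reasoning behind fencing can be sketched as a small decision rule: services are only recovered from a node once that node is confirmed isolated, because a node that has merely gone silent may still be running them. The names below (`NodeState`, `recovery_plan`) are hypothetical and do not correspond to any Pacemaker API; this is a minimal illustration of the idea.

```python
from enum import Enum
from typing import Dict, List, Tuple


class NodeState(Enum):
    ONLINE = "online"
    UNRESPONSIVE = "unresponsive"  # silent, but possibly still running services
    FENCED = "fenced"              # confirmed isolated or powered off


def recovery_plan(
    node_states: Dict[str, NodeState],
    services_by_node: Dict[str, List[str]],
) -> Tuple[List[str], List[str]]:
    """Return (nodes to fence, services that are safe to recover elsewhere)."""
    to_fence: List[str] = []
    to_recover: List[str] = []
    for node, state in node_states.items():
        if state is NodeState.UNRESPONSIVE:
            # Do not touch its services yet: isolate the node first.
            to_fence.append(node)
        elif state is NodeState.FENCED:
            # The node can no longer run anything, so recovery is safe.
            to_recover.extend(services_by_node.get(node, []))
    return to_fence, to_recover


states = {
    "node1": NodeState.ONLINE,
    "node2": NodeState.UNRESPONSIVE,
    "node3": NodeState.FENCED,
}
services = {"node2": ["db"], "node3": ["web"]}
print(recovery_plan(states, services))
```

In this example the database on node2 is deliberately left alone until node2 has been fenced, while the web service from the already fenced node3 can be restarted elsewhere.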
While SysV init replacements like systemd can provide
deterministic recovery of a complex stack of services, the
recovery is limited to one machine and lacks the context
of what is happening on other machines. That context is
crucial for telling apart a local failure, a clean startup,
and recovery after a total site failure.
A Pacemaker stack is built on five core components:
libQB - core services (logging, IPC, etc.)
Corosync - Membership, messaging and quorum
Resource agents - A collection of scripts that interact with the underlying services managed by the cluster
Fencing agents - A collection of scripts that interact with network power switches and SAN devices to isolate cluster members
Pacemaker itself
We describe each of these in more detail as well as other optional components such as CLIs and GUIs.
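As a rough illustration of what "quorum" means in the list above, the following sketch applies a plain majority rule to vote counts. It is a simplified model rather than Corosync's actual interface; the function name `has_quorum` and its parameters are assumptions for this example.

```python
def has_quorum(votes_present: int, expected_votes: int) -> bool:
    """Return True if the visible members hold a strict majority of votes."""
    if expected_votes <= 0:
        raise ValueError("expected_votes must be positive")
    if not 0 <= votes_present <= expected_votes:
        raise ValueError("votes_present must be between 0 and expected_votes")
    # A strict majority guarantees that two disjoint partitions can never
    # both believe they are entitled to run services.
    return votes_present > expected_votes // 2


for present, expected in [(2, 3), (1, 3), (2, 4), (3, 4), (1, 2)]:
    print(present, expected, has_quorum(present, expected))
```

Note that under this rule a two-node cluster loses quorum as soon as either node fails, which is why two-node deployments typically need special handling and rely heavily on fencing.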
Background
Pacemaker has been around since 2004 and is primarily a collaborative
effort between Red Hat and SUSE. We also receive considerable help and
support from the folks at LINBIT and the community in general.
Corosync also began life in 2004
but was then part of the OpenAIS project.
It is primarily a Red Hat initiative,
with considerable help and support from the folks in the community.
The core ClusterLabs team is made up of full-time
developers from Australia, Austria, Canada, China, Czech
Republic, England, Germany, Sweden and the USA. Contributions to
the code or documentation are always welcome.
The ClusterLabs stack ships with most modern enterprise
distributions and has been deployed in many critical
environments, including Deutsche Flugsicherung GmbH (DFS),
which uses Pacemaker to ensure its air traffic control
systems are always available.
A good first step is always to check out the FAQ and documentation.
Otherwise, many members of the community hang out on IRC
and are happy to answer questions. We are spread out over
many time zones, though (and have day jobs), so you may need
to be patient when waiting for a reply.
Extended or complex issues might be better sent to the
relevant mailing list
(you'll need to subscribe in order to send messages).
People new to the project, or Open Source generally, are
encouraged to
read Getting
Answers by Mike Ash from Rogue Amoeba. It provides
some very good tips on effective communication with groups
such as this one. Following the advice it contains will
greatly increase the chance of a quick and helpful reply.
Bugs and other problems can also be reported
via Bugzilla.
The development of most of the ClusterLabs-related projects takes place as part of
the ClusterLabs organization at GitHub,
and the source code and issue trackers for these projects can be found there.
Providing Help
If you find this project useful, you may want to consider supporting its future development.
There are a number of ways to support the project (in no particular order):
LINBIT provides global support
for DRBD, Linux-HA, Pacemaker and other HA-software
suites. Philipp Reisner and Lars Ellenberg, the authors
of DRBD, oversee
LINBIT's Professional Services. In addition, they offer training services, certification,
consultancy, and turnkey solutions around DRBD and Pacemaker.
Alteeve is a software
and systems design company specializing in server uptime and
operational continuity. Their Anvil! product offers an all-in-one
supported clustering solution using ClusterLabs software.
B1 Systems
provides support (troubleshooting, maintenance,
debugging, ...), consulting and training for Linux
clusters, load balancing, storage clusters, virtual
system clusters and high availability. This includes
Pacemaker, Heartbeat and LVS as well as various cluster
filesystems (OCFS2, GPFS, GFS, ...).
Most of the documentation listed here was generated from the Pacemaker
sources.
Where to Start
If you're new to Pacemaker or clustering in general, the best place to
start is Clusters from Scratch, which walks you step-by-step through
the installation and configuration of a high-availability cluster with
Pacemaker. It even makes common configuration mistakes so that it can
demonstrate how to fix them.
On the other hand, if you're looking for an exhaustive reference of all
of Pacemaker's options and features, try Pacemaker Explained. It's
dry, but should have the answers you're looking for.
There is also a project wiki
with examples, how-to guides, and other information that doesn't make it
into the manuals.
Provably correct response to any failure or cluster state. The cluster's
response to any stimulus can be tested offline before the condition exists.
Background
Pacemaker has been around since 2004 and is a collaborative effort by the
ClusterLabs community, including full-time developers with Red Hat and SUSE.
Pacemaker ships with most modern Linux distributions and has been
deployed in many critical environments, including Deutsche
Flugsicherung GmbH (DFS), which uses Pacemaker to ensure its air traffic
control systems are always available.
Andrew Beekhof was
Pacemaker's original author and long-time project lead. The current
project lead is Ken Gaillot.