From the HA Trenches
This is a place to collect some example success stories and failure post-mortems.
CGGVeritas (2009)
CGGVeritas, a global provider of geophysical services and equipment, set up several clusters to provide seismic data to users. They attached 37 TB JBODs to each node of the cluster, giving a total of 72 TB of XFS filesystem on each node. Ten of these clusters run Linux-HA version 2.1.3 (the equivalent of Heartbeat 2.99 + Pacemaker 0.6) and export the data over NFS in an active/active setup.
Each node of the clusters has 16 GB of RAM, a 10 Gbit network interface toward the clients, and a 4 Gbit HBA for the direct-attached storage. Each cluster serves more than 500 clients. The systems went into production in 2006.
Minor hiccups caused by file system corruption were resolved after a failover and reboot of the node. Special hint: the admins set up a unique fsid for the NFS exports; otherwise the clients might get confused.
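The fsid hint can be illustrated with a small sketch. It is not the CGGVeritas configuration: the export paths, client pattern, and fsid numbers below are hypothetical, and the helper name build_exports is made up for illustration. It only shows the idea of giving every exported file system an explicit fsid (a standard option in /etc/exports) and refusing to generate a configuration in which two exports share one.

```python
def build_exports(exports, clients="*", options="rw,sync"):
    """Return /etc/exports lines for (path, fsid) pairs.

    Raises ValueError if the same fsid is assigned to two paths,
    since duplicate fsids are what the hint warns against.
    """
    seen = {}   # fsid -> path that already uses it
    lines = []
    for path, fsid in exports:
        if fsid in seen:
            raise ValueError(
                f"fsid {fsid} used for both {seen[fsid]} and {path}"
            )
        seen[fsid] = path
        lines.append(f"{path} {clients}({options},fsid={fsid})")
    return lines


if __name__ == "__main__":
    # Hypothetical export paths and fsid values, for illustration only.
    for line in build_exports([("/export/seismic1", 101),
                               ("/export/seismic2", 102)]):
        print(line)
```

In a failover setup, one would presumably also keep the fsid of a given file system identical on whichever node currently exports it, so that client file handles stay valid; that is an assumption here, not something the story states.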
Thanks to Sachin Patel for this story.
Heilig-Geist-Hospital, Bingen (2009)
The Heilig-Geist-Hospital in Bingen on the Rhine uses a highly available clustered firewall with state synchronization to separate several internal networks from each other. One of their applications is PACS (Picture Archiving and Communication System) for their central radiography laboratories. All departments use a terminal session to access the data. If an error occurs, the cluster fails over. Since the connection tables of the firewalls are synced, the user experiences a small delay on the line but can go on working after about 3 seconds.
System: two ordinary PCs running Debian lenny, with Pacemaker and fwbuilder to manage the setup. They use about 20 different VLANs and also some routing controlled by the cluster. A HOWTO for setting up the HA firewall is available here.
Thanks to Matthias Thiele for this story.
GupShup, Free Group SMS (2009)
GupShup is India’s largest social messaging platform. Based in Mumbai, it is a mobile group SMS service that allows users to create mobile communities and broadcast messages to them. GupShup is growing rapidly, with thousands of groups on topics such as finance, entertainment, lifestyle, health, sports and technology.
The cluster, two Ubuntu 8.04 servers configured with Linux-HA version 2.1.3-2 (the equivalent of Heartbeat 2.99 + Pacemaker 0.6), runs a Shorewall firewall in an active/active configuration. Each node of the cluster has 4 GB of RAM and a 250 GB hard drive and serves more than 12 million outgoing SMS daily at a rate of 150 SMS/sec.
Thanks to Kaushal Shriyan for this story.
GitHub (2012 incidents)
GoCardless (2017)
- GoCardless/2017: Postgres, Pacemaker, default-resource-stickiness, partial resource crash (via Adam Spiers on the users ML)
Press
- An article in Linux Technical Review gives an overview all the way from Heartbeat to Pacemaker with OpenAIS or Corosync. The article is in German and requires a subscription.
- The German book "Clusterbau", published by O'Reilly, describes Pacemaker, OpenAIS, Corosync and LVS. It explains how to set up clusters from the basics and includes many useful examples.
- Pacemaker was number 6 on ZDNet's list of 10 Open Source Projects Worth Checking Out (Dec 2009)
Miscellaneous links
These are all old, but some may still be of value.
- Step-by-step clustering guides from linode.com for Ubuntu, Debian, and Fedora. Guides include basic IP failover, DRBD and web applications.
- Tips on avoiding STONITH Death-matches
- Setup details for common Pacemaker use cases on Ubuntu
- Setup guide for Pacemaker and OpenNebula (DRBD, MySQL, LVM)
- Pacemaker project statistics
- Using Pacemaker with Lustre
- Options for clustering MySQL (slidedeck)
- High Availability in 37 Easy Steps (slidedeck) (audio: ogg, mp3) (also available on slideshare.net)
- MySQL with Pacemaker (Linbit webinar, requires registration)
- RabbitMQ - High Availability with Pacemaker and DRBD
- Making OpenNMS highly available with Pacemaker
- Nice walkthrough of Xen+DRBD on Debian
- Evoluzione dell’alta affidabilità su Linux ("Evolution of high availability on Linux", an Italian article series from miamammausalinux.org)
Last Author: kgaillot
Last Edited: Jan 22 2024, 1:23 PM