diff --git a/doc/Clusters_from_Scratch/en-US/Ch-Active-Active.txt b/doc/Clusters_from_Scratch/en-US/Ch-Active-Active.txt index 6d5f03fd05..2fa311a314 100644 --- a/doc/Clusters_from_Scratch/en-US/Ch-Active-Active.txt +++ b/doc/Clusters_from_Scratch/en-US/Ch-Active-Active.txt @@ -1,537 +1,537 @@ = Conversion to Active/Active = == Requirements == The primary requirement for an Active/Active cluster is that the data required for your services is available, simultaneously, on both machines. Pacemaker makes no requirement on how this is achieved, you could use a SAN if you had one available, however since DRBD supports multiple Primaries, we can also use that. The only hitch is that we need to use a cluster-aware filesystem. The one we used earlier with DRBD, ext4, is not one of those. Both OCFS2 and GFS2 are supported, however here we will use GFS2 which comes with Fedora 17. === Installing the required Software === [source,Bash] ----- # yum install -y gfs2-utils dlm kernel-modules-extra Loaded plugins: langpacks, presto, refresh-packagekit Resolving Dependencies --> Running transaction check ---> Package dlm.x86_64 0:3.99.4-1.fc17 will be installed ---> Package gfs2-utils.x86_64 0:3.1.4-3.fc17 will be installed ---> Package kernel-modules-extra.x86_64 0:3.4.4-3.fc17 will be installed --> Finished Dependency Resolution Dependencies Resolved ================================================================================ Package Arch Version Repository Size ================================================================================ Installing: dlm x86_64 3.99.4-1.fc17 updates 83 k gfs2-utils x86_64 3.1.4-3.fc17 fedora 214 k kernel-modules-extra x86_64 3.4.4-3.fc17 updates 1.7 M Transaction Summary ================================================================================ Install 3 Packages Total download size: 1.9 M Installed size: 7.7 M Downloading Packages: (1/3): dlm-3.99.4-1.fc17.x86_64.rpm | 83 kB 00:00 (2/3): gfs2-utils-3.1.4-3.fc17.x86_64.rpm | 214 kB 00:00 (3/3): kernel-modules-extra-3.4.4-3.fc17.x86_64.rpm | 1.7 MB 00:01 --------------------------------------------------------------------------------- + ------------------------------------------------------------------------------- Total 615 kB/s | 1.9 MB 00:03 Running Transaction Check Running Transaction Test Transaction Test Succeeded Running Transaction Installing : kernel-modules-extra-3.4.4-3.fc17.x86_64 1/3 Installing : gfs2-utils-3.1.4-3.fc17.x86_64 2/3 Installing : dlm-3.99.4-1.fc17.x86_64 3/3 Verifying : dlm-3.99.4-1.fc17.x86_64 1/3 Verifying : gfs2-utils-3.1.4-3.fc17.x86_64 2/3 Verifying : kernel-modules-extra-3.4.4-3.fc17.x86_64 3/3 Installed: dlm.x86_64 0:3.99.4-1.fc17 gfs2-utils.x86_64 0:3.1.4-3.fc17 kernel-modules-extra.x86_64 0:3.4.4-3.fc17 Complete! ----- == Create a GFS2 Filesystem == [[GFS2_prep]] === Preparation === Before we do anything to the existing partition, we need to make sure it is unmounted. We do this by telling the cluster to stop the WebFS resource. This will ensure that other resources (in our case, Apache) using WebFS are not only stopped, but stopped in the correct order. [source,Bash] ----- # crm resource stop WebFS # crm_mon -1 ============ Last updated: Tue Apr 3 14:07:36 2012 Last change: Tue Apr 3 14:07:15 2012 via cibadmin on pcmk-1 Stack: corosync Current DC: pcmk-1 (1702537408) - partition with quorum Version: 1.1.7-2.fc17-ee0730e13d124c3d58f00016c3376a1de5323cff 2 Nodes configured, unknown expected votes 5 Resources configured. ============ Online: [ pcmk-1 pcmk-2 ] ClusterIP (ocf::heartbeat:IPaddr2): Started pcmk-2 Master/Slave Set: WebDataClone [WebData] Masters: [ pcmk-2 ] Slaves: [ pcmk-1 ] ----- [NOTE] ======= Note that both Apache and WebFS have been stopped. ======= === Create and Populate an GFS2 Partition === Now that the cluster stack and integration pieces are running smoothly, we can create an GFS2 partition. [WARNING] ========= This will erase all previous content stored on the DRBD device. Ensure you have a copy of any important data. ========= We need to specify a number of additional parameters when creating a GFS2 partition. First we must use the -p option to specify that we want to use the the Kernel's DLM. Next we use -j to indicate that it should reserve enough space for two journals (one per node accessing the filesystem). Lastly, we use -t to specify the lock table name. The format for this field is +clustername:fsname+. For the +fsname+, we need to use the same value as specified in 'cluster.conf' for +cluster_name+. Just pick something unique and descriptive and add somewhere inside the +totem+ block. For example: ..... totem { version: 2 # cypto_cipher and crypto_hash: Used for mutual node authentication. # If you choose to enable this, then do remember to create a shared # secret with "corosync-keygen". crypto_cipher: none crypto_hash: none cluster_name: webtest ... ..... [IMPORTANT] =========== Do this on each node in the cluster and be sure to restart them before continuing. =========== [IMPORTANT] =========== We must run the next command on whichever node last had '/dev/drbd' mounted. Otherwise you will receive the message: ----- /dev/drbd1: Read-only file system ----- =========== [source,Bash] ----- # ssh pcmk-2 -- mkfs.gfs2 -p lock_dlm -j 2 -t webtest:web /dev/drbd1 This will destroy any data on /dev/drbd1. It appears to contain: Linux rev 1.0 ext4 filesystem data, UUID=dc45fff3-c47a-4db2-96f7-a8049a323fe4 (extents) (large files) (huge files) Are you sure you want to proceed? [y/n]y Device: /dev/drbd1 Blocksize: 4096 Device Size 0.97 GB (253935 blocks) Filesystem Size: 0.97 GB (253932 blocks) Journals: 2 Resource Groups: 4 Locking Protocol: "lock_dlm" Lock Table: "webtest" UUID: ed293a02-9eee-3fa3-ed1c-435ef1fd0116 ----- [source,Bash] ----- # crm crm(live)# cib new dlm INFO: dlm shadow CIB created crm(dlm)# configure primitive dlm ocf:pacemaker:controld \ op monitor interval=60s crm(dlm)# configure clone dlm_clone dlm meta clone-max=2 clone-node-max=1 crm(dlm)# configure show node $id="1702537408" pcmk-1 \ attributes standby="off" node $id="1719314624" pcmk-2 primitive ClusterIP ocf:heartbeat:IPaddr2 \ params ip="192.168.122.120" cidr_netmask="32" \ op monitor interval="30s" primitive WebData ocf:linbit:drbd \ params drbd_resource="wwwdata" \ op monitor interval="60s" primitive WebFS ocf:heartbeat:Filesystem \ params device="/dev/drbd/by-res/wwwdata" directory="/var/www/html" fstype="ext4" \ meta target-role="Stopped" primitive WebSite ocf:heartbeat:apache \ params configfile="/etc/httpd/conf/httpd.conf" \ op monitor interval="1min" primitive dlm ocf:pacemaker:controld \ op monitor interval="60s" ms WebDataClone WebData \ meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" clone dlm_clone dlm \ meta clone-max="2" clone-node-max="1" location prefer-pcmk-1 WebSite 50: pcmk-1 colocation WebSite-with-WebFS inf: WebSite WebFS colocation fs_on_drbd inf: WebFS WebDataClone:Master colocation website-with-ip inf: WebSite ClusterIP order WebFS-after-WebData inf: WebDataClone:promote WebFS:start order WebSite-after-WebFS inf: WebFS WebSite order apache-after-ip inf: ClusterIP WebSite property $id="cib-bootstrap-options" \ dc-version="1.1.7-2.fc17-ee0730e13d124c3d58f00016c3376a1de5323cff" \ cluster-infrastructure="corosync" \ stonith-enabled="false" \ no-quorum-policy="ignore" \ last-lrm-refresh="1333446866" rsc_defaults $id="rsc-options" \ resource-stickiness="100" op_defaults $id="op-options" \ timeout="240s" crm(dlm)# cib commit dlm INFO: commited 'dlm' shadow CIB to the cluster crm(dlm)# quit bye # crm_mon -1 ============ Last updated: Wed Apr 4 01:15:11 2012 Last change: Wed Apr 4 00:50:11 2012 via crmd on pcmk-1 Stack: corosync Current DC: pcmk-1 (1702537408) - partition with quorum Version: 1.1.7-2.fc17-ee0730e13d124c3d58f00016c3376a1de5323cff 2 Nodes configured, unknown expected votes 7 Resources configured. ============ Online: [ pcmk-1 pcmk-2 ] ClusterIP (ocf::heartbeat:IPaddr2): Started pcmk-1 Master/Slave Set: WebDataClone [WebData] Masters: [ pcmk-1 ] Slaves: [ pcmk-2 ] Clone Set: dlm_clone [dlm] Started: [ pcmk-1 pcmk-2 ] ----- Then (re)populate the new filesystem with data (web pages). For now we'll create another variation on our home page. [source,Bash] ----- # mount /dev/drbd1 /mnt/ # cat <<-END >/mnt/index.html My Test Site - GFS2 END # umount /dev/drbd1 # drbdadm verify wwwdata# ----- == Reconfigure the Cluster for GFS2 == [source,Bash] ----- # crm crm(live) # cib new GFS2 INFO: GFS2 shadow CIB created crm(GFS2) # configure delete WebFS crm(GFS2) # configure primitive WebFS ocf:heartbeat:Filesystem params device="/dev/drbd/by-res/wwwdata" directory="/var/www/html" fstype="gfs2" ----- Now that we've recreated the resource, we also need to recreate all the constraints that used it. This is because the shell will automatically remove any constraints that referenced WebFS. [source,Bash] ----- crm(GFS2) # configure colocation WebSite-with-WebFS inf: WebSite WebFS crm(GFS2) # configure colocation fs_on_drbd inf: WebFS WebDataClone:Master crm(GFS2) # configure order WebFS-after-WebData inf: WebDataClone:promote WebFS:start crm(GFS2) # configure order WebSite-after-WebFS inf: WebFS WebSite crm(GFS2) # configure show node pcmk-1 node pcmk-2 primitive WebData ocf:linbit:drbd \ params drbd_resource="wwwdata" \ op monitor interval="60s" primitive WebFS ocf:heartbeat:Filesystem \ params device="/dev/drbd/by-res/wwwdata" directory="/var/www/html" fstype="gfs2" primitive WebSite ocf:heartbeat:apache \ params configfile="/etc/httpd/conf/httpd.conf" \ op monitor interval="1min" primitive ClusterIP ocf:heartbeat:IPaddr2 \ params ip="192.168.122.101" cidr_netmask="32" \ op monitor interval="30s" ms WebDataClone WebData \ meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" colocation WebSite-with-WebFS inf: WebSite WebFS colocation fs_on_drbd inf: WebFS WebDataClone:Master colocation website-with-ip inf: WebSite ClusterIP order WebFS-after-WebData inf: WebDataClone:promote WebFS:start order WebSite-after-WebFS inf: WebFS WebSite order apache-after-ip inf: ClusterIP WebSite property $id="cib-bootstrap-options" \ dc-version="1.1.5-bdd89e69ba545404d02445be1f3d72e6a203ba2f" \ cluster-infrastructure="openais" \ expected-quorum-votes="2" \ stonith-enabled="false" \ no-quorum-policy="ignore" rsc_defaults $id="rsc-options" \ resource-stickiness="100" ----- Review the configuration before uploading it to the cluster, quitting the shell and watching the cluster's response [source,Bash] ----- crm(GFS2) # cib commit GFS2 INFO: commited 'GFS2' shadow CIB to the cluster crm(GFS2) # quit bye # crm_mon ============ Last updated: Thu Sep 3 20:49:54 2009 Stack: openais Current DC: pcmk-2 - partition with quorum Version: 1.1.5-bdd89e69ba545404d02445be1f3d72e6a203ba2f 2 Nodes configured, 2 expected votes 6 Resources configured. ============ Online: [ pcmk-1 pcmk-2 ] WebSite (ocf::heartbeat:apache): Started pcmk-2 Master/Slave Set: WebDataClone Masters: [ pcmk-1 ] Slaves: [ pcmk-2 ] ClusterIP (ocf::heartbeat:IPaddr): Started pcmk-2WebFS (ocf::heartbeat:Filesystem): Started pcmk-1 ----- == Reconfigure Pacemaker for Active/Active == Almost everything is in place. Recent versions of DRBD are capable of operating in Primary/Primary mode and the filesystem we're using is cluster aware. All we need to do now is reconfigure the cluster to take advantage of this. This will involve a number of changes, so we'll again use interactive mode. [source,Bash] ----- # crm # cib new active ----- There's no point making the services active on both locations if we can't reach them, so lets first clone the IP address. Cloned IPaddr2 resources use an iptables rule to ensure that each request only gets processed by one of the two clone instances. The additional meta options tell the cluster how many instances of the clone we want (one "request bucket" for each node) and that if all other nodes fail, then the remaining node should hold all of them. Otherwise the requests would be simply discarded. [source,Bash] ----- # configure clone WebIP ClusterIP \ meta globally-unique="true" clone-max="2" clone-node-max="2" ----- Now we must tell the ClusterIP how to decide which requests are processed by which hosts. To do this we must specify the clusterip_hash parameter. Open the ClusterIP resource [source,Bash] ----- # configure edit ClusterIP ----- And add the following to the params line ..... clusterip_hash="sourceip" ..... So that the complete definition looks like: ..... primitive ClusterIP ocf:heartbeat:IPaddr2 \ params ip="192.168.122.101" cidr_netmask="32" clusterip_hash="sourceip" \ op monitor interval="30s" ..... Here is the full transcript [source,Bash] ----- # crm crm(live) # cib new active INFO: active shadow CIB created crm(active) # configure clone WebIP ClusterIP \ meta globally-unique="true" clone-max="2" clone-node-max="2" crm(active) # configure shownode pcmk-1 node pcmk-2 primitive WebData ocf:linbit:drbd \ params drbd_resource="wwwdata" \ op monitor interval="60s" primitive WebFS ocf:heartbeat:Filesystem \ params device="/dev/drbd/by-res/wwwdata" directory="/var/www/html" fstype="gfs2" primitive WebSite ocf:heartbeat:apache \ params configfile="/etc/httpd/conf/httpd.conf" \ op monitor interval="1min" primitive ClusterIP ocf:heartbeat:IPaddr2 \ params ip="192.168.122.101" cidr_netmask="32" clusterip_hash="sourceip" \ op monitor interval="30s" ms WebDataClone WebData \ meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" clone WebIP ClusterIP \ meta globally-unique="true" clone-max="2" clone-node-max="2" colocation WebSite-with-WebFS inf: WebSite WebFS colocation fs_on_drbd inf: WebFS WebDataClone:Master colocation website-with-ip inf: WebSite WebIPorder WebFS-after-WebData inf: WebDataClone:promote WebFS:start order WebSite-after-WebFS inf: WebFS WebSiteorder apache-after-ip inf: WebIP WebSite property $id="cib-bootstrap-options" \ dc-version="1.1.5-bdd89e69ba545404d02445be1f3d72e6a203ba2f" \ cluster-infrastructure="openais" \ expected-quorum-votes="2" \ stonith-enabled="false" \ no-quorum-policy="ignore" rsc_defaults $id="rsc-options" \ resource-stickiness="100" ----- Notice how any constraints that referenced ClusterIP have been updated to use WebIP instead. This is an additional benefit of using the crm shell. Next we need to convert the filesystem and Apache resources into clones. Again, the shell will automatically update any relevant constraints. [source,Bash] ----- crm(active) # configure clone WebFSClone WebFS crm(active) # configure clone WebSiteClone WebSite ----- The last step is to tell the cluster that it is now allowed to promote both instances to be Primary (aka. Master). [source,Bash] ----- crm(active) # configure edit WebDataClone ----- Change master-max to 2 [source,Bash] ----- crm(active) # configure show node pcmk-1 node pcmk-2 primitive WebData ocf:linbit:drbd \ params drbd_resource="wwwdata" \ op monitor interval="60s" primitive WebFS ocf:heartbeat:Filesystem \ params device="/dev/drbd/by-res/wwwdata" directory="/var/www/html" fstype="gfs2" primitive WebSite ocf:heartbeat:apache \ params configfile="/etc/httpd/conf/httpd.conf" \ op monitor interval="1min" primitive ClusterIP ocf:heartbeat:IPaddr2 \ params ip="192.168.122.101" cidr_netmask="32" clusterip_hash="sourceip" \ op monitor interval="30s" ms WebDataClone WebData \ meta master-max="2" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" clone WebFSClone WebFSclone WebIP ClusterIP \ meta globally-unique="true" clone-max="2" clone-node-max="2" clone WebSiteClone WebSitecolocation WebSite-with-WebFS inf: WebSiteClone WebFSClone colocation fs_on_drbd inf: WebFSClone WebDataClone:Master colocation website-with-ip inf: WebSiteClone WebIP order WebFS-after-WebData inf: WebDataClone:promote WebFSClone:start order WebSite-after-WebFS inf: WebFSClone WebSiteClone order apache-after-ip inf: WebIP WebSiteClone property $id="cib-bootstrap-options" \ dc-version="1.1.5-bdd89e69ba545404d02445be1f3d72e6a203ba2f" \ cluster-infrastructure="openais" \ expected-quorum-votes="2" \ stonith-enabled="false" \ no-quorum-policy="ignore" rsc_defaults $id="rsc-options" \ resource-stickiness="100" ----- Review the configuration before uploading it to the cluster, quitting the shell and watching the cluster's response [source,Bash] ----- crm(active) # cib commit active INFO: commited 'active' shadow CIB to the cluster crm(active) # quit bye # crm_mon ============ Last updated: Thu Sep 3 21:37:27 2009 Stack: openais Current DC: pcmk-2 - partition with quorum Version: 1.1.5-bdd89e69ba545404d02445be1f3d72e6a203ba2f 2 Nodes configured, 2 expected votes 6 Resources configured. ============ Online: [ pcmk-1 pcmk-2 ] Master/Slave Set: WebDataClone Masters: [ pcmk-1 pcmk-2 ] Clone Set: WebIP Started: [ pcmk-1 pcmk-2 ] Clone Set: WebFSClone Started: [ pcmk-1 pcmk-2 ] Clone Set: WebSiteClone Started: [ pcmk-1 pcmk-2 ] ----- === Testing Recovery === [NOTE] ======= TODO: Put one node into standby to demonstrate failover =======