diff --git a/doc/Clusters_from_Scratch/en-US/Ch-Active-Active.txt b/doc/Clusters_from_Scratch/en-US/Ch-Active-Active.txt index c8e9baf38d..571d9ed5a0 100644 --- a/doc/Clusters_from_Scratch/en-US/Ch-Active-Active.txt +++ b/doc/Clusters_from_Scratch/en-US/Ch-Active-Active.txt @@ -1,754 +1,755 @@ = Conversion to Active/Active = == Requirements == The primary requirement for an Active/Active cluster is that the data required for your services is available, simultaneously, on both machines. Pacemaker makes no requirement on how this is achieved, you could use a SAN if you had one available, however since DRBD supports multiple Primaries, we can also use that. The only hitch is that we need to use a cluster-aware filesystem. The one we used earlier with DRBD, ext4, is not one of those. Both OCFS2 and GFS2 are supported, however here we will use GFS2 which comes with Fedora 17. === Installing the required Software === [source,C] # yum install -y gfs2-utils dlm kernel-modules-extra ..... Loaded plugins: langpacks, presto, refresh-packagekit Resolving Dependencies --> Running transaction check ---> Package dlm.x86_64 0:3.99.4-1.fc17 will be installed ---> Package gfs2-utils.x86_64 0:3.1.4-3.fc17 will be installed ---> Package kernel-modules-extra.x86_64 0:3.4.4-3.fc17 will be installed --> Finished Dependency Resolution Dependencies Resolved ================================================================================ Package Arch Version Repository Size ================================================================================ Installing: dlm x86_64 3.99.4-1.fc17 updates 83 k gfs2-utils x86_64 3.1.4-3.fc17 fedora 214 k kernel-modules-extra x86_64 3.4.4-3.fc17 updates 1.7 M Transaction Summary ================================================================================ Install 3 Packages Total download size: 1.9 M Installed size: 7.7 M Downloading Packages: (1/3): dlm-3.99.4-1.fc17.x86_64.rpm | 83 kB 00:00 (2/3): gfs2-utils-3.1.4-3.fc17.x86_64.rpm | 214 kB 00:00 (3/3): kernel-modules-extra-3.4.4-3.fc17.x86_64.rpm | 1.7 MB 00:01 -------------------------------------------------------------------------------- Total 615 kB/s | 1.9 MB 00:03 Running Transaction Check Running Transaction Test Transaction Test Succeeded Running Transaction Installing : kernel-modules-extra-3.4.4-3.fc17.x86_64 1/3 Installing : gfs2-utils-3.1.4-3.fc17.x86_64 2/3 Installing : dlm-3.99.4-1.fc17.x86_64 3/3 Verifying : dlm-3.99.4-1.fc17.x86_64 1/3 Verifying : gfs2-utils-3.1.4-3.fc17.x86_64 2/3 Verifying : kernel-modules-extra-3.4.4-3.fc17.x86_64 3/3 Installed: dlm.x86_64 0:3.99.4-1.fc17 gfs2-utils.x86_64 0:3.1.4-3.fc17 kernel-modules-extra.x86_64 0:3.4.4-3.fc17 Complete! ..... == Create a GFS2 Filesystem == [[GFS2_prep]] === Preparation === Before we do anything to the existing partition, we need to make sure it is unmounted. We do this by telling the cluster to stop the WebFS resource. This will ensure that other resources (in our case, Apache) using WebFS are not only stopped, but stopped in the correct order. 
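Before we stop anything, it is worth a quick sanity check on both nodes that the GFS2 and DLM packages installed above are actually usable. The commands below are only a suggested check, not part of the cluster procedure itself, and assume the stock Fedora 17 packaging (the GFS2 module is the reason kernel-modules-extra was installed).

[source,Bash]
----
# Load the GFS2 kernel module and confirm it registered
modprobe gfs2
lsmod | grep gfs2

# Confirm the userland pieces are installed
rpm -q gfs2-utils dlm
----
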
ifdef::pcs[] [source,C] ---- # pcs resource stop WebFS # pcs resource ClusterIP (ocf::heartbeat:IPaddr2) Started WebSite (ocf::heartbeat:apache) Stopped Master/Slave Set: WebDataClone [WebData] Masters: [ pcmk-2 ] Slaves: [ pcmk-1 ] WebFS (ocf::heartbeat:Filesystem) Stopped ---- endif::[] ifdef::crm[] [source,C] ----- # crm resource stop WebFS # crm_mon -1 ============ Last updated: Tue Apr 3 14:07:36 2012 Last change: Tue Apr 3 14:07:15 2012 via cibadmin on pcmk-1 Stack: corosync Current DC: pcmk-1 (1702537408) - partition with quorum Version: 1.1.7-2.fc17-ee0730e13d124c3d58f00016c3376a1de5323cff 2 Nodes configured, unknown expected votes 5 Resources configured. ============ Online: [ pcmk-1 pcmk-2 ] ClusterIP (ocf::heartbeat:IPaddr2): Started pcmk-2 Master/Slave Set: WebDataClone [WebData] Masters: [ pcmk-2 ] Slaves: [ pcmk-1 ] ----- endif::[] [NOTE] ======= Note that both Apache and WebFS have been stopped. ======= === Create and Populate an GFS2 Partition === Now that the cluster stack and integration pieces are running smoothly, we can create an GFS2 partition. [WARNING] ========= This will erase all previous content stored on the DRBD device. Ensure you have a copy of any important data. ========= We need to specify a number of additional parameters when creating a GFS2 partition. First we must use the -p option to specify that we want to use the the Kernel's DLM. Next we use -j to indicate that it should reserve enough space for two journals (one per node accessing the filesystem). ifdef::pcs[] Lastly, we use -t to specify the lock table name. The format for this field is +clustername:fsname+. For the +fsname+, we need to use the same value as specified in 'corosync.conf' for +cluster_name+. If you setup corosync with the same cluster name we used in this tutorial, cluster name will be 'mycluster'. If you are unsure what your cluster name is, open up /etc/corosync/corosync.conf, or execute the command 'pcs cluster corosync pcmk-1' to view the corosync config. The cluster name will be in the +totem+ block. endif::[] ifdef::crm[] Lastly, we use -t to specify the lock table name. The format for this field is +clustername:fsname+. For the +fsname+, we need to use the same value as specified in 'corosync.conf' for +cluster_name+. Just pick something unique and descriptive and add somewhere inside the +totem+ block. For example: ..... totem { version: 2 # cypto_cipher and crypto_hash: Used for mutual node authentication. # If you choose to enable this, then do remember to create a shared # secret with "corosync-keygen". crypto_cipher: none crypto_hash: none cluster_name: mycluster ... ..... [IMPORTANT] =========== Do this on each node in the cluster and be sure to restart them before continuing. =========== endif::[] [IMPORTANT] =========== We must run the next command on whichever node last had '/dev/drbd' mounted. Otherwise you will receive the message: ----- /dev/drbd1: Read-only file system ----- =========== [source,C] ----- # ssh pcmk-2 -- mkfs.gfs2 -p lock_dlm -j 2 -t mycluster:web /dev/drbd1 This will destroy any data on /dev/drbd1. It appears to contain: Linux rev 1.0 ext4 filesystem data, UUID=dc45fff3-c47a-4db2-96f7-a8049a323fe4 (extents) (large files) (huge files) Are you sure you want to proceed? 
[y/n]y Device: /dev/drbd1 Blocksize: 4096 Device Size 0.97 GB (253935 blocks) Filesystem Size: 0.97 GB (253932 blocks) Journals: 2 Resource Groups: 4 Locking Protocol: "lock_dlm" Lock Table: "mycluster" UUID: ed293a02-9eee-3fa3-ed1c-435ef1fd0116 ----- ifdef::pcs[] [source,C] ---- # pcs cluster cib dlm_cfg # pcs -f dlm_cfg resource create dlm ocf:pacemaker:controld op monitor interval=60s # pcs -f dlm_cfg resource clone dlm clone-max=2 clone-node-max=1 # pcs -f dlm_cfg resource show ClusterIP (ocf::heartbeat:IPaddr2) Started WebSite (ocf::heartbeat:apache) Stopped Master/Slave Set: WebDataClone [WebData] Masters: [ pcmk-2 ] Slaves: [ pcmk-1 ] WebFS (ocf::heartbeat:Filesystem) Stopped Clone Set: dlm-clone [dlm] Stopped: [ dlm:0 dlm:1 ] # pcs cluster push cib dlm_cfg CIB updated # pcs status Last updated: Fri Sep 14 12:54:50 2012 Last change: Fri Sep 14 12:54:43 2012 via cibadmin on pcmk-1 Stack: corosync Current DC: pcmk-1 (1) - partition with quorum Version: 1.1.8-1.el7-60a19ed12fdb4d5c6a6b6767f52e5391e447fec0 2 Nodes configured, unknown expected votes 7 Resources configured. Online: [ pcmk-1 pcmk-2 ] Full list of resources: ClusterIP (ocf::heartbeat:IPaddr2): Started pcmk-2 WebSite (ocf::heartbeat:apache): Stopped Master/Slave Set: WebDataClone [WebData] Masters: [ pcmk-2 ] Slaves: [ pcmk-1 ] WebFS (ocf::heartbeat:Filesystem): Stopped Clone Set: dlm-clone [dlm] Started: [ pcmk-1 pcmk-2 ] ---- endif::[] ifdef::crm[] [source,C] ----- # crm crm(live)# cib new dlm INFO: dlm shadow CIB created crm(dlm)# configure primitive dlm ocf:pacemaker:controld \ op monitor interval=60s crm(dlm)# configure clone dlm_clone dlm meta clone-max=2 clone-node-max=1 crm(dlm)# configure show node $id="1702537408" pcmk-1 \ attributes standby="off" node $id="1719314624" pcmk-2 primitive ClusterIP ocf:heartbeat:IPaddr2 \ params ip="192.168.122.120" cidr_netmask="32" \ op monitor interval="30s" primitive WebData ocf:linbit:drbd \ params drbd_resource="wwwdata" \ op monitor interval="60s" primitive WebFS ocf:heartbeat:Filesystem \ params device="/dev/drbd/by-res/wwwdata" directory="/var/www/html" fstype="ext4" \ meta target-role="Stopped" primitive WebSite ocf:heartbeat:apache \ params configfile="/etc/httpd/conf/httpd.conf" \ op monitor interval="1min" primitive dlm ocf:pacemaker:controld \ op monitor interval="60s" ms WebDataClone WebData \ meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" clone dlm_clone dlm \ meta clone-max="2" clone-node-max="1" location prefer-pcmk-1 WebSite 50: pcmk-1 colocation WebSite-with-WebFS inf: WebSite WebFS colocation fs_on_drbd inf: WebFS WebDataClone:Master colocation website-with-ip inf: WebSite ClusterIP order WebFS-after-WebData inf: WebDataClone:promote WebFS:start order WebSite-after-WebFS inf: WebFS WebSite order apache-after-ip inf: ClusterIP WebSite property $id="cib-bootstrap-options" \ dc-version="1.1.7-2.fc17-ee0730e13d124c3d58f00016c3376a1de5323cff" \ cluster-infrastructure="corosync" \ stonith-enabled="false" \ no-quorum-policy="ignore" \ last-lrm-refresh="1333446866" rsc_defaults $id="rsc-options" \ resource-stickiness="100" op_defaults $id="op-options" \ timeout="240s" crm(dlm)# cib commit dlm INFO: commited 'dlm' shadow CIB to the cluster crm(dlm)# quit bye # crm_mon -1 ============ Last updated: Wed Apr 4 01:15:11 2012 Last change: Wed Apr 4 00:50:11 2012 via crmd on pcmk-1 Stack: corosync Current DC: pcmk-1 (1702537408) - partition with quorum Version: 1.1.7-2.fc17-ee0730e13d124c3d58f00016c3376a1de5323cff 2 Nodes configured, 
unknown expected votes 7 Resources configured. ============ Online: [ pcmk-1 pcmk-2 ] ClusterIP (ocf::heartbeat:IPaddr2): Started pcmk-1 Master/Slave Set: WebDataClone [WebData] Masters: [ pcmk-1 ] Slaves: [ pcmk-2 ] Clone Set: dlm_clone [dlm] Started: [ pcmk-1 pcmk-2 ] ----- endif::[] Then (re)populate the new filesystem with data (web pages). For now we'll create another variation on our home page. [source,C] ----- # mount /dev/drbd1 /mnt/ # cat <<-END >/mnt/index.html My Test Site - GFS2 END # umount /dev/drbd1 # drbdadm verify wwwdata# ----- == Reconfigure the Cluster for GFS2 == ifdef::pcs[] With the WebFS resource stopped, lets update the configuration. [source,C] ---- # pcs resource show WebFS Resource: WebFS device: /dev/drbd/by-res/wwwdata directory: /var/www/html fstype: ext4 target-role: Stopped ---- The fstype option needs to be updated to gfs2 instead of ext4. [source,C] ---- # pcs resource update WebFS fstype=gfs2 # pcs resource show WebFS Resource: WebFS device: /dev/drbd/by-res/wwwdata directory: /var/www/html fstype: gfs2 target-role: Stopped CIB updated ---- endif::[] ifdef::crm[] [source,C] ----- # crm crm(live) # cib new GFS2 INFO: GFS2 shadow CIB created crm(GFS2) # configure delete WebFS crm(GFS2) # configure primitive WebFS ocf:heartbeat:Filesystem params device="/dev/drbd/by-res/wwwdata" directory="/var/www/html" fstype="gfs2" ----- Now that we've recreated the resource, we also need to recreate all the constraints that used it. This is because the shell will automatically remove any constraints that referenced WebFS. [source,C] ----- crm(GFS2) # configure colocation WebSite-with-WebFS inf: WebSite WebFS crm(GFS2) # configure colocation fs_on_drbd inf: WebFS WebDataClone:Master crm(GFS2) # configure order WebFS-after-WebData inf: WebDataClone:promote WebFS:start crm(GFS2) # configure order WebSite-after-WebFS inf: WebFS WebSite crm(GFS2) # configure show node pcmk-1 node pcmk-2 primitive WebData ocf:linbit:drbd \ params drbd_resource="wwwdata" \ op monitor interval="60s" primitive WebFS ocf:heartbeat:Filesystem \ params device="/dev/drbd/by-res/wwwdata" directory="/var/www/html" fstype="gfs2" primitive WebSite ocf:heartbeat:apache \ params configfile="/etc/httpd/conf/httpd.conf" \ op monitor interval="1min" primitive ClusterIP ocf:heartbeat:IPaddr2 \ params ip="192.168.122.101" cidr_netmask="32" \ op monitor interval="30s" ms WebDataClone WebData \ meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" colocation WebSite-with-WebFS inf: WebSite WebFS colocation fs_on_drbd inf: WebFS WebDataClone:Master colocation website-with-ip inf: WebSite ClusterIP order WebFS-after-WebData inf: WebDataClone:promote WebFS:start order WebSite-after-WebFS inf: WebFS WebSite order apache-after-ip inf: ClusterIP WebSite property $id="cib-bootstrap-options" \ dc-version="1.1.5-bdd89e69ba545404d02445be1f3d72e6a203ba2f" \ cluster-infrastructure="openais" \ expected-quorum-votes="2" \ stonith-enabled="false" \ no-quorum-policy="ignore" rsc_defaults $id="rsc-options" \ resource-stickiness="100" ----- Review the configuration before uploading it to the cluster, quitting the shell and watching the cluster's response [source,C] ----- crm(GFS2) # cib commit GFS2 INFO: commited 'GFS2' shadow CIB to the cluster crm(GFS2) # quit bye # crm_mon ============ Last updated: Thu Sep 3 20:49:54 2009 Stack: openais Current DC: pcmk-2 - partition with quorum Version: 1.1.5-bdd89e69ba545404d02445be1f3d72e6a203ba2f 2 Nodes configured, 2 expected votes 6 Resources configured. 
============ Online: [ pcmk-1 pcmk-2 ] WebSite (ocf::heartbeat:apache): Started pcmk-2 Master/Slave Set: WebDataClone Masters: [ pcmk-1 ] Slaves: [ pcmk-2 ] ClusterIP (ocf::heartbeat:IPaddr): Started pcmk-2WebFS (ocf::heartbeat:Filesystem): Started pcmk-1 ----- endif::[] == Reconfigure Pacemaker for Active/Active == Almost everything is in place. Recent versions of DRBD are capable of operating in Primary/Primary mode and the filesystem we're using is cluster aware. All we need to do now is reconfigure the cluster to take advantage of this. ifdef::pcs[] This will involve a number of changes, so we'll want work with a local cib file. [source,C] ---- # pcs cluster cib active_cfg ---- endif::[] ifdef::crm[] This will involve a number of changes, so we'll again use interactive mode. [source,C] ----- # crm # cib new active ----- endif::[] There's no point making the services active on both locations if we can't reach them, so lets first clone the IP address. Cloned IPaddr2 resources use an iptables rule to ensure that each request only gets processed by one of the two clone instances. The additional meta options tell the cluster how many instances of the clone we want (one "request bucket" for each node) and that if all other nodes fail, then the remaining node should hold all of them. Otherwise the requests would be simply discarded. ifdef::pcs[] ---- # pcs -f active_cfg resource clone ClusterIP \ globally-unique=true clone-max=2 clone-node-max=2 ---- Notice when the ClusterIP becomes a clone, the constraints referencing ClusterIP now reference the clone. This is done automatically by pcs. +endif::[] ifdef::pcs[] [source,C] ---- # pcs -f active_cfg constraint Location Constraints: Ordering Constraints: start ClusterIP-clone then start WebSite WebFS then WebSite promote WebDataClone then start WebFS Colocation Constraints: WebSite with ClusterIP-clone WebFS with WebDataClone (with-rsc-role:Master) WebSite with WebFS ---- endif::[] ifdef::crm[] [source,C] ----- # configure clone WebIP ClusterIP \ meta globally-unique="true" clone-max="2" clone-node-max="2" ----- endif::[] Now we must tell the ClusterIP how to decide which requests are processed by which hosts. To do this we must specify the clusterip_hash parameter. ifdef::pcs[] [source,C] ---- # pcs -f active_cfg resource update ClusterIP clusterip_hash=sourceip ---- endif::[] ifdef::crm[] Open the ClusterIP resource [source,C] ----- # configure edit ClusterIP ----- And add the following to the params line ..... clusterip_hash="sourceip" ..... So that the complete definition looks like: ..... primitive ClusterIP ocf:heartbeat:IPaddr2 \ params ip="192.168.122.101" cidr_netmask="32" clusterip_hash="sourceip" \ op monitor interval="30s" ..... 
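Once the cloned address is running on both nodes, the effect of +clusterip_hash+ can be seen directly: the IPaddr2 agent implements the load-sharing with the kernel's CLUSTERIP iptables target. A quick, optional way to confirm the rule is in place (the exact output varies with your iptables version):

[source,Bash]
----
# Each node should show a CLUSTERIP rule for the shared address,
# including the hash mode and the clone instance(s) it answers for
iptables -L INPUT -n | grep 192.168.122.101
----
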
Here is the full transcript [source,C] ----- # crm crm(live) # cib new active INFO: active shadow CIB created crm(active) # configure clone WebIP ClusterIP \ meta globally-unique="true" clone-max="2" clone-node-max="2" crm(active) # configure shownode pcmk-1 node pcmk-2 primitive WebData ocf:linbit:drbd \ params drbd_resource="wwwdata" \ op monitor interval="60s" primitive WebFS ocf:heartbeat:Filesystem \ params device="/dev/drbd/by-res/wwwdata" directory="/var/www/html" fstype="gfs2" primitive WebSite ocf:heartbeat:apache \ params configfile="/etc/httpd/conf/httpd.conf" \ op monitor interval="1min" primitive ClusterIP ocf:heartbeat:IPaddr2 \ params ip="192.168.122.101" cidr_netmask="32" clusterip_hash="sourceip" \ op monitor interval="30s" ms WebDataClone WebData \ meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" clone WebIP ClusterIP \ meta globally-unique="true" clone-max="2" clone-node-max="2" colocation WebSite-with-WebFS inf: WebSite WebFS colocation fs_on_drbd inf: WebFS WebDataClone:Master colocation website-with-ip inf: WebSite WebIPorder WebFS-after-WebData inf: WebDataClone:promote WebFS:start order WebSite-after-WebFS inf: WebFS WebSiteorder apache-after-ip inf: WebIP WebSite property $id="cib-bootstrap-options" \ dc-version="1.1.5-bdd89e69ba545404d02445be1f3d72e6a203ba2f" \ cluster-infrastructure="openais" \ expected-quorum-votes="2" \ stonith-enabled="false" \ no-quorum-policy="ignore" rsc_defaults $id="rsc-options" \ resource-stickiness="100" ----- Notice how any constraints that referenced ClusterIP have been updated to use WebIP instead. This is an additional benefit of using the crm shell. endif::[] Next we need to convert the filesystem and Apache resources into clones. ifdef::pcs[] Notice how pcs automatically updates the relevant constraints again. [source,C] ---- # pcs -f active_cfg resource clone WebFS # pcs -f active_cfg resource clone WebSite # pcs -f active_cfg constraint Location Constraints: Ordering Constraints: start ClusterIP-clone then start WebSite-clone WebFS-clone then WebSite-clone promote WebDataClone then start WebFS-clone Colocation Constraints: WebSite-clone with ClusterIP-clone WebFS-clone with WebDataClone (with-rsc-role:Master) WebSite-clone with WebFS-clone ---- endif::[] ifdef::crm[] Again, the shell will automatically update any relevant constraints. [source,C] ----- crm(active) # configure clone WebFSClone WebFS crm(active) # configure clone WebSiteClone WebSite ----- endif::[] The last step is to tell the cluster that it is now allowed to promote both instances to be Primary (aka. Master). 
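Keep in mind that the cluster can only promote both nodes if DRBD itself permits dual-primary operation. If the +wwwdata+ resource was not already created with this enabled, add something like the following to its definition on both nodes and run +drbdadm adjust wwwdata+ (a sketch using DRBD 8.4 syntax; with 8.3 the option takes no value):

.....
resource wwwdata {
  net {
    allow-two-primaries yes;
  }
  ...
}
.....
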
ifdef::pcs[] [source,C] ----- # pcs -f active_cfg resource update WebDataClone master-max=2 ----- endif::[] ifdef::crm[] [source,C] ----- crm(active) # configure edit WebDataClone ----- Change master-max to 2 [source,C] ----- crm(active) # configure show node pcmk-1 node pcmk-2 primitive WebData ocf:linbit:drbd \ params drbd_resource="wwwdata" \ op monitor interval="60s" primitive WebFS ocf:heartbeat:Filesystem \ params device="/dev/drbd/by-res/wwwdata" directory="/var/www/html" fstype="gfs2" primitive WebSite ocf:heartbeat:apache \ params configfile="/etc/httpd/conf/httpd.conf" \ op monitor interval="1min" primitive ClusterIP ocf:heartbeat:IPaddr2 \ params ip="192.168.122.101" cidr_netmask="32" clusterip_hash="sourceip" \ op monitor interval="30s" ms WebDataClone WebData \ meta master-max="2" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" clone WebFSClone WebFSclone WebIP ClusterIP \ meta globally-unique="true" clone-max="2" clone-node-max="2" clone WebSiteClone WebSitecolocation WebSite-with-WebFS inf: WebSiteClone WebFSClone colocation fs_on_drbd inf: WebFSClone WebDataClone:Master colocation website-with-ip inf: WebSiteClone WebIP order WebFS-after-WebData inf: WebDataClone:promote WebFSClone:start order WebSite-after-WebFS inf: WebFSClone WebSiteClone order apache-after-ip inf: WebIP WebSiteClone property $id="cib-bootstrap-options" \ dc-version="1.1.5-bdd89e69ba545404d02445be1f3d72e6a203ba2f" \ cluster-infrastructure="openais" \ expected-quorum-votes="2" \ stonith-enabled="false" \ no-quorum-policy="ignore" rsc_defaults $id="rsc-options" \ resource-stickiness="100" ----- endif::[] Review the configuration before uploading it to the cluster, quitting the shell and watching the cluster's response ifdef::pcs[] [source,C] ----- # pcs cluster push cib active_cfg # pcs resource start WebFS ----- After all the processes are started the status should look similar to this. [source,C] ----- # pcs resource Master/Slave Set: WebDataClone [WebData] Masters: [ pcmk-2 pcmk-1 ] Clone Set: dlm-clone [dlm] Started: [ pcmk-2 pcmk-1 ] Clone Set: ClusterIP-clone [ClusterIP] (unique) ClusterIP:0 (ocf::heartbeat:IPaddr2) Started ClusterIP:1 (ocf::heartbeat:IPaddr2) Started Clone Set: WebFS-clone [WebFS] Started: [ pcmk-1 pcmk-2 ] Clone Set: WebSite-clone [WebSite] Started: [ pcmk-1 pcmk-2 ] ----- endif::[] ifdef::crm[] [source,C] ----- crm(active) # cib commit active INFO: commited 'active' shadow CIB to the cluster crm(active) # quit bye # crm_mon ============ Last updated: Thu Sep 3 21:37:27 2009 Stack: openais Current DC: pcmk-2 - partition with quorum Version: 1.1.5-bdd89e69ba545404d02445be1f3d72e6a203ba2f 2 Nodes configured, 2 expected votes 6 Resources configured. 
============ Online: [ pcmk-1 pcmk-2 ] Master/Slave Set: WebDataClone Masters: [ pcmk-1 pcmk-2 ] Clone Set: WebIP Started: [ pcmk-1 pcmk-2 ] Clone Set: WebFSClone Started: [ pcmk-1 pcmk-2 ] Clone Set: WebSiteClone Started: [ pcmk-1 pcmk-2 ] Clone Set: dlm_clone Started: [ pcmk-1 pcmk-2 ] ----- endif::[] === Testing Recovery === [NOTE] ======= TODO: Put one node into standby to demonstrate failover ======= diff --git a/doc/Clusters_from_Scratch/en-US/Ch-Stonith.txt b/doc/Clusters_from_Scratch/en-US/Ch-Stonith.txt index 695deea00b..dc37e905ee 100644 --- a/doc/Clusters_from_Scratch/en-US/Ch-Stonith.txt +++ b/doc/Clusters_from_Scratch/en-US/Ch-Stonith.txt @@ -1,307 +1,308 @@ = Configure STONITH = == What Is STONITH == STONITH is an acronym for Shoot-The-Other-Node-In-The-Head and it protects your data from being corrupted by rogue nodes or concurrent access. Just because a node is unresponsive, this doesn't mean it isn't accessing your data. The only way to be 100% sure that your data is safe, is to use STONITH so we can be certain that the node is truly offline, before allowing the data to be accessed from another node. STONITH also has a role to play in the event that a clustered service cannot be stopped. In this case, the cluster uses STONITH to force the whole node offline, thereby making it safe to start the service elsewhere. == What STONITH Device Should You Use == It is crucial that the STONITH device can allow the cluster to differentiate between a node failure and a network one. The biggest mistake people make in choosing a STONITH device is to use remote power switch (such as many on-board IMPI controllers) that shares power with the node it controls. In such cases, the cluster cannot be sure if the node is really offline, or active and suffering from a network fault. Likewise, any device that relies on the machine being active (such as SSH-based "devices" used during testing) are inappropriate. == Configuring STONITH == ifdef::pcs[] . Find the correct driver: +pcs stonith list+ . Find the parameters associated with the device: +pcs stonith describe + . Create a local config to make changes to +pcs cluster cib stonith_cfg+ . Create the fencing resource using +pcs -f stonith_cfg stonith create [stonith device options]+ . Set stonith-enable to true. +pcs -f stonith_cfg property set stonith-enabled=true+ endif::[] ifdef::crm[] . Find the correct driver: +stonith_admin --list-installed+ . Since every device is different, the parameters needed to configure it will vary. To find out the parameters associated with the device, run: +stonith_admin --metadata --agent type+ The output should be XML formatted text containing additional parameter descriptions. We will endevor to make the output more friendly in a later version. . Enter the shell crm Create an editable copy of the existing configuration +cib new stonith+ Create a fencing resource containing a primitive resource with a class of stonith, a type of type and a parameter for each of the values returned in step 2: +configure primitive ...+ endif::[] . If the device does not know how to fence nodes based on their uname, you may also need to set the special +pcmk_host_map+ parameter. See +man stonithd+ for details. . If the device does not support the list command, you may also need to set the special +pcmk_host_list+ and/or +pcmk_host_check+ parameters. See +man stonithd+ for details. . If the device does not expect the victim to be specified with the port parameter, you may also need to set the special +pcmk_host_argument+ parameter. 
See +man stonithd+ for details. ifdef::crm[] . Upload it into the CIB from the shell: +cib commit stonith+ endif::[] ifdef::pcs[] . Commit the new configuration. +pcs cluster push cib stonith_cfg+ endif::[] . Once the stonith resource is running, you can test it by executing: +stonith_admin --reboot nodename+. Although you might want to stop the cluster on that machine first. == Example == Assuming we have an chassis containing four nodes and an IPMI device active on 10.0.0.1, then we would chose the fence_ipmilan driver in step 2 and obtain the following list of parameters .Obtaining a list of STONITH Parameters ifdef::pcs[] [source,C] ---- # pcs stonith describe fence_ipmilan Stonith options for: fence_ipmilan auth: IPMI Lan Auth type (md5, password, or none) ipaddr: IPMI Lan IP to talk to passwd: Password (if required) to control power on IPMI device passwd_script: Script to retrieve password (if required) lanplus: Use Lanplus login: Username/Login (if required) to control power on IPMI device action: Operation to perform. Valid operations: on, off, reboot, status, list, diag, monitor or metadata timeout: Timeout (sec) for IPMI operation cipher: Ciphersuite to use (same as ipmitool -C parameter) method: Method to fence (onoff or cycle) power_wait: Wait X seconds after on/off operation delay: Wait X seconds before fencing is started privlvl: Privilege level on IPMI device verbose: Verbose mode ---- endif::[] ifdef::crm[] [source,C] ---- # stonith_admin --metadata -a fence_ipmilan ---- [source,XML] ---- fence_ipmilan is an I/O Fencing agent which can be used with machines controlled by IPMI. This agent calls support software using ipmitool (http://ipmitool.sf.net/). To use fence_ipmilan with HP iLO 3 you have to enable lanplus option (lanplus / -P) and increase wait after operation to 4 seconds (power_wait=4 / -T 4) IPMI Lan Auth type (md5, password, or none) IPMI Lan IP to talk to Password (if required) to control power on IPMI device Script to retrieve password (if required) Use Lanplus Username/Login (if required) to control power on IPMI device Operation to perform. Valid operations: on, off, reboot, status, list, diag, monitor or metadata Timeout (sec) for IPMI operation Ciphersuite to use (same as ipmitool -C parameter) Method to fence (onoff or cycle) Wait X seconds after on/off operation Wait X seconds before fencing is started Verbose mode ---- endif::[] from which we would create a STONITH resource fragment that might look like this .Sample STONITH Resource ifdef::pcs[] ---- # pcs cluster cib stonith_cfg # pcs -f stonith_cfg stonith create impi-fencing fence_ipmilan \ pcmk_host_list="pcmk-1 pcmk-2" ipaddr=10.0.0.1 login=testuser \ passwd=acd123 op monitor interval=60s ---- [source,C] ---- # pcs -f stonith_cfg stonith impi-fencing (stonith:fence_ipmilan) Stopped ---- endif::[] ifdef::crm[] [source,C] ---- # crm crm(live)# cib new stonith INFO: stonith shadow CIB created crm(stonith)# configure primitive impi-fencing stonith::fence_ipmilan \ params pcmk_host_list="pcmk-1 pcmk-2" ipaddr=10.0.0.1 login=testuser passwd=abc123 \ op monitor interval="60s" ---- endif::[] And finally, since we disabled it earlier, we need to re-enable STONITH. At this point we should have the following configuration. 
ifdef::pcs[] [source,C] ---- # pcs -f stonith_cfg property set stonith-enabled=true # pcs -f stonith_cfg property dc-version: 1.1.8-1.el7-60a19ed12fdb4d5c6a6b6767f52e5391e447fec0 cluster-infrastructure: corosync no-quorum-policy: ignore stonith-enabled: true ---- +endif::[] Now push the configuration into the cluster. ifdef::pcs[] [source,C] ---- # pcs cluster push cib stonith_cfg ---- endif::[] ifdef::crm[] [source,C] ---- crm(stonith)# configure property stonith-enabled="true" crm(stonith)# configure shownode pcmk-1 node pcmk-2 primitive WebData ocf:linbit:drbd \ params drbd_resource="wwwdata" \ op monitor interval="60s" primitive WebFS ocf:heartbeat:Filesystem \ params device="/dev/drbd/by-res/wwwdata" directory="/var/www/html" fstype="gfs2" primitive WebSite ocf:heartbeat:apache \ params configfile="/etc/httpd/conf/httpd.conf" \ op monitor interval="1min" primitive ClusterIP ocf:heartbeat:IPaddr2 \ params ip="192.168.122.101" cidr_netmask="32" clusterip_hash="sourceip" \ op monitor interval="30s"primitive ipmi-fencing stonith::fence_ipmilan \ params pcmk_host_list="pcmk-1 pcmk-2" ipaddr=10.0.0.1 login=testuser passwd=abc123 \ op monitor interval="60s"ms WebDataClone WebData \ meta master-max="2" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" clone WebFSClone WebFS clone WebIP ClusterIP \ meta globally-unique="true" clone-max="2" clone-node-max="2" clone WebSiteClone WebSite colocation WebSite-with-WebFS inf: WebSiteClone WebFSClone colocation fs_on_drbd inf: WebFSClone WebDataClone:Master colocation website-with-ip inf: WebSiteClone WebIP order WebFS-after-WebData inf: WebDataClone:promote WebFSClone:start order WebSite-after-WebFS inf: WebFSClone WebSiteClone order apache-after-ip inf: WebIP WebSiteClone property $id="cib-bootstrap-options" \ dc-version="1.1.5-bdd89e69ba545404d02445be1f3d72e6a203ba2f" \ cluster-infrastructure="openais" \ expected-quorum-votes="2" \ stonith-enabled="true" \ no-quorum-policy="ignore" rsc_defaults $id="rsc-options" \ resource-stickiness="100" crm(stonith)# cib commit stonithINFO: commited 'stonith' shadow CIB to the cluster crm(stonith)# quit bye ---- endif::[] diff --git a/doc/Pacemaker_Explained/en-US/Ch-Basics.txt b/doc/Pacemaker_Explained/en-US/Ch-Basics.txt index 63681c64f2..57c0167424 100644 --- a/doc/Pacemaker_Explained/en-US/Ch-Basics.txt +++ b/doc/Pacemaker_Explained/en-US/Ch-Basics.txt @@ -1,368 +1,368 @@ = Configuration Basics = == Configuration Layout == The cluster is written using XML notation and divided into two main sections: configuration and status. The status section contains the history of each resource on each node and based on this data, the cluster can construct the complete current state of the cluster. The authoritative source for the status section is the local resource manager (lrmd) process on each cluster node and the cluster will occasionally repopulate the entire section. For this reason it is never written to disk and administrators are advised against modifying it in any way. The configuration section contains the more traditional information like cluster options, lists of resources and indications of where they should be placed. The configuration section is the primary focus of this document. 
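The split between the two sections is easy to see with the +cibadmin+ tool (covered in more detail later in this chapter); the scope names used below are the same ones referenced elsewhere in this document:

[source,Bash]
----
# The status section - maintained by the cluster itself, never edit it
cibadmin --query --obj_type status

# One part of the configuration section - the part administrators maintain
cibadmin --query --obj_type crm_config
----
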
The configuration section itself is divided into four parts: * Configuration options (called +crm_config+) * Nodes * Resources * Resource relationships (called +constraints+) .An empty configuration ====== [source,XML] ------- ------- ====== == The Current State of the Cluster == Before one starts to configure a cluster, it is worth explaining how to view the finished product. For this purpose we have created the `crm_mon` utility that will display the current state of an active cluster. It can show the cluster status by node or by resource and can be used in either single-shot or dynamically-updating mode. There are also modes for displaying a list of the operations performed (grouped by node and resource) as well as information about failures. Using this tool, you can examine the state of the cluster for irregularities and see how it responds when you cause or simulate failures. Details on all the available options can be obtained using the `crm_mon --help` command. .Sample output from crm_mon ====== ------- ============ Last updated: Fri Nov 23 15:26:13 2007 Current DC: sles-3 (2298606a-6a8c-499a-9d25-76242f7006ec) 3 Nodes configured. 5 Resources configured. ============ Node: sles-1 (1186dc9a-324d-425a-966e-d757e693dc86): online 192.168.100.181 (heartbeat::ocf:IPaddr): Started sles-1 192.168.100.182 (heartbeat:IPaddr): Started sles-1 192.168.100.183 (heartbeat::ocf:IPaddr): Started sles-1 rsc_sles-1 (heartbeat::ocf:IPaddr): Started sles-1 child_DoFencing:2 (stonith:external/vmware): Started sles-1 Node: sles-2 (02fb99a8-e30e-482f-b3ad-0fb3ce27d088): standby Node: sles-3 (2298606a-6a8c-499a-9d25-76242f7006ec): online rsc_sles-2 (heartbeat::ocf:IPaddr): Started sles-3 rsc_sles-3 (heartbeat::ocf:IPaddr): Started sles-3 child_DoFencing:0 (stonith:external/vmware): Started sles-3 ------- ====== .Sample output from crm_mon -n ====== ------- ============ Last updated: Fri Nov 23 15:26:13 2007 Current DC: sles-3 (2298606a-6a8c-499a-9d25-76242f7006ec) 3 Nodes configured. 5 Resources configured. ============ Node: sles-1 (1186dc9a-324d-425a-966e-d757e693dc86): online Node: sles-2 (02fb99a8-e30e-482f-b3ad-0fb3ce27d088): standby Node: sles-3 (2298606a-6a8c-499a-9d25-76242f7006ec): online Resource Group: group-1 192.168.100.181 (heartbeat::ocf:IPaddr): Started sles-1 192.168.100.182 (heartbeat:IPaddr): Started sles-1 192.168.100.183 (heartbeat::ocf:IPaddr): Started sles-1 rsc_sles-1 (heartbeat::ocf:IPaddr): Started sles-1 rsc_sles-2 (heartbeat::ocf:IPaddr): Started sles-3 rsc_sles-3 (heartbeat::ocf:IPaddr): Started sles-3 Clone Set: DoFencing child_DoFencing:0 (stonith:external/vmware): Started sles-3 child_DoFencing:1 (stonith:external/vmware): Stopped child_DoFencing:2 (stonith:external/vmware): Started sles-1 ------- ====== The DC (Designated Controller) node is where all the decisions are made and if the current DC fails a new one is elected from the remaining cluster nodes. The choice of DC is of no significance to an administrator beyond the fact that its logs will generally be more interesting. == How Should the Configuration be Updated? == There are three basic rules for updating the cluster configuration: * Rule 1 - Never edit the cib.xml file manually. Ever. I'm not making this up. * Rule 2 - Read Rule 1 again. * Rule 3 - The cluster will notice if you ignored rules 1 & 2 and refuse to use the configuration. Now that it is clear how NOT to update the configuration, we can begin to explain how you should. 
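A useful companion to those rules is +crm_verify+, which checks a configuration using the same logic as the cluster; if you ever suspect the configuration has been mangled, it will tell you before the cluster does. For example:

[source,Bash]
----
# Check the configuration the cluster is currently using; -V prints
# details of any errors or warnings found
crm_verify --live-check -V
----
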
The most powerful tool for modifying the configuration is the +cibadmin+ command which talks to a running cluster. With +cibadmin+, the user can query, add, remove, update or replace any part of the configuration; all changes take effect immediately, so there is no need to perform a reload-like operation. The simplest way of using cibadmin is to use it to save the current configuration to a temporary file, edit that file with your favorite text or XML editor and then upload the revised configuration. .Safely using an editor to modify the cluster configuration ====== [source,C] -------- # cibadmin --query > tmp.xml # vi tmp.xml # cibadmin --replace --xml-file tmp.xml -------- ====== Some of the better XML editors can make use of a Relax NG schema to help make sure any changes you make are valid. The schema describing the configuration can normally be found in '/usr/lib/heartbeat/pacemaker.rng' on most systems. If you only wanted to modify the resources section, you could instead do .Safely using an editor to modify a subsection of the cluster configuration ====== [source,C] -------- # cibadmin --query --obj_type resources > tmp.xml # vi tmp.xml # cibadmin --replace --obj_type resources --xml-file tmp.xml -------- ====== to avoid modifying any other part of the configuration. == Quickly Deleting Part of the Configuration == Identify the object you wish to delete. Eg. run .Searching for STONITH related configuration items ====== [source,C] # cibadmin -Q | grep stonith [source,XML] -------- -------- ====== Next identify the resource's tag name and id (in this case we'll choose +primitive+ and +child_DoFencing+). Then simply execute: [source,C] # cibadmin --delete --crm_xml '<primitive id="child_DoFencing"/>' == Updating the Configuration Without Using XML == Some common tasks can also be performed with one of the higher level tools that avoid the need to read or edit XML. To enable stonith for example, one could run: [source,C] # crm_attribute --attr-name stonith-enabled --attr-value true Or, to see if +somenode+ is allowed to run resources, there is: [source,C] # crm_standby --get-value --node-uname somenode Or, to find the current location of +my-test-rsc+, one can use: [source,C] # crm_resource --locate --resource my-test-rsc [[s-config-sandboxes]] == Making Configuration Changes in a Sandbox == Often it is desirable to preview the effects of a series of changes before updating the configuration atomically. For this purpose we have created `crm_shadow` which creates a "shadow" copy of the configuration and arranges for all the command line tools to use it. To begin, simply invoke `crm_shadow` and give it the name of a configuration to create footnote:[Shadow copies are identified with a name, making it possible to have more than one.] ; be sure to follow the simple on-screen instructions. WARNING: Read the above carefully, failure to do so could result in you destroying the cluster's active configuration! .Creating and displaying the active sandbox -[source,Bash] ====== +[source,Bash] -------- # crm_shadow --create test Setting up shadow instance Type Ctrl-D to exit the crm_shadow shell shadow[test]: shadow[test] # crm_shadow --which test -------- ====== From this point on, all cluster commands will automatically use the shadow copy instead of talking to the cluster's active configuration. Once you have finished experimenting, you can either commit the changes, or discard them as shown below. Again, be sure to follow the on-screen instructions carefully. 
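In short, before deciding you can review what actually differs from the live configuration and then either keep or drop it; a sketch, assuming the usual +--diff+, +--commit+ and +--delete+ options are present in your version (the full worked example follows below):

[source,Bash]
----
shadow[test] # crm_shadow --diff           # show changes relative to the live CIB
shadow[test] # crm_shadow --commit test    # apply the changes, or...
shadow[test] # crm_shadow --delete test    # ...throw them away
----
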
For a full list of `crm_shadow` options and commands, invoke it with the --help option. .Using a sandbox to make multiple changes atomically ====== [source,Bash] -------- shadow[test] # crm_failcount -G -r rsc_c001n01 name=fail-count-rsc_c001n01 value=0 shadow[test] # crm_standby -v on -n c001n02 shadow[test] # crm_standby -G -n c001n02 name=c001n02 scope=nodes value=on shadow[test] # cibadmin --erase --force shadow[test] # cibadmin --query shadow[test] # crm_shadow --delete test --force Now type Ctrl-D to exit the crm_shadow shell shadow[test] # exit # crm_shadow --which No shadow instance provided # cibadmin -Q -------- ====== Making changes in a sandbox and verifying the real configuration is untouched [[s-config-testing-changes]] == Testing Your Configuration Changes == We saw previously how to make a series of changes to a "shadow" copy of the configuration. Before loading the changes back into the cluster (eg. `crm_shadow --commit mytest --force`), it is often advisable to simulate the effect of the changes with +crm_simulate+, eg. [source,C] # crm_simulate --live-check -VVVVV --save-graph tmp.graph --save-dotfile tmp.dot The tool uses the same library as the live cluster to show what it would have done given the supplied input. It's output, in addition to a significant amount of logging, is stored in two files +tmp.graph+ and +tmp.dot+, both are representations of the same thing -- the cluster's response to your changes. In the graph file is stored the complete transition, containing a list of all the actions, their parameters and their pre-requisites. Because the transition graph is not terribly easy to read, the tool also generates a Graphviz dot-file representing the same information. == Interpreting the Graphviz output == * Arrows indicate ordering dependencies * Dashed-arrows indicate dependencies that are not present in the transition graph * Actions with a dashed border of any color do not form part of the transition graph * Actions with a green border form part of the transition graph * Actions with a red border are ones the cluster would like to execute but cannot run * Actions with a blue border are ones the cluster does not feel need to be executed * Actions with orange text are pseudo/pretend actions that the cluster uses to simplify the graph * Actions with black text are sent to the LRM * Resource actions have text of the form pass:[rsc]_pass:[action]_pass:[interval] pass:[node] * Any action depending on an action with a red border will not be able to execute. * Loops are _really_ bad. Please report them to the development team. === Small Cluster Transition === image::images/Policy-Engine-small.png["An example transition graph as represented by Graphviz",width="16cm",height="6cm",align="center"] In the above example, it appears that a new node, +node2+, has come online and that the cluster is checking to make sure +rsc1+, +rsc2+ and +rsc3+ are not already running there (Indicated by the +*_monitor_0+ entries). Once it did that, and assuming the resources were not active there, it would have liked to stop +rsc1+ and +rsc2+ on +node1+ and move them to +node2+. However, there appears to be some problem and the cluster cannot or is not permitted to perform the stop actions which implies it also cannot perform the start actions. For some reason the cluster does not want to start +rsc3+ anywhere. For information on the options supported by `crm_simulate`, use the `--help` option. 
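To actually look at the graph, render the dot-file with any Graphviz tool, for example (assuming the graphviz package is installed):

[source,Bash]
----
# Convert the simulated transition into an image
dot -Tsvg tmp.dot -o tmp.svg
----
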
=== Complex Cluster Transition === image::images/Policy-Engine-big.png["Another, slightly more complex, transition graph that you're not expected to be able to read",width="16cm",height="20cm",align="center"] == Do I Need to Update the Configuration on all Cluster Nodes? == No. Any changes are immediately synchronized to the other active members of the cluster. To reduce bandwidth, the cluster only broadcasts the incremental updates that result from your changes and uses MD5 checksums to ensure that each copy is completely consistent. diff --git a/doc/Pacemaker_Explained/en-US/Ch-Stonith.txt b/doc/Pacemaker_Explained/en-US/Ch-Stonith.txt index e259ee24ec..4c831db8a7 100644 --- a/doc/Pacemaker_Explained/en-US/Ch-Stonith.txt +++ b/doc/Pacemaker_Explained/en-US/Ch-Stonith.txt @@ -1,307 +1,308 @@ [[ch-stonith]] = Configure STONITH = == What Is STONITH == STONITH is an acronym for Shoot-The-Other-Node-In-The-Head and it protects your data from being corrupted by rogue nodes or concurrent access. Just because a node is unresponsive, this doesn't mean it isn't accessing your data. The only way to be 100% sure that your data is safe, is to use STONITH so we can be certain that the node is truly offline, before allowing the data to be accessed from another node. STONITH also has a role to play in the event that a clustered service cannot be stopped. In this case, the cluster uses STONITH to force the whole node offline, thereby making it safe to start the service elsewhere. == What STONITH Device Should You Use == It is crucial that the STONITH device can allow the cluster to differentiate between a node failure and a network one. The biggest mistake people make in choosing a STONITH device is to use remote power switch (such as many on-board IMPI controllers) that shares power with the node it controls. In such cases, the cluster cannot be sure if the node is really offline, or active and suffering from a network fault. Likewise, any device that relies on the machine being active (such as SSH-based "devices" used during testing) are inappropriate. == Configuring STONITH == ifdef::pcs[] . Find the correct driver: +pcs stonith list+ . Find the parameters associated with the device: +pcs stonith describe + . Create a local config to make changes to +pcs cluster cib stonith_cfg+ . Create the fencing resource using +pcs -f stonith_cfg stonith create [stonith device options]+ . Set stonith-enable to true. +pcs -f stonith_cfg property set stonith-enabled=true+ -endif::[] +endif::pcs[] ifdef::crm[] . Find the correct driver: +stonith_admin --list-installed+ . Since every device is different, the parameters needed to configure it will vary. To find out the parameters associated with the device, run: +stonith_admin --metadata --agent type+ The output should be XML formatted text containing additional parameter descriptions. We will endevor to make the output more friendly in a later version. . Enter the shell crm Create an editable copy of the existing configuration +cib new stonith+ Create a fencing resource containing a primitive resource with a class of stonith, a type of type and a parameter for each of the values returned in step 2: +configure primitive ...+ -endif::[] +endif::crm[] . If the device does not know how to fence nodes based on their uname, you may also need to set the special +pcmk_host_map+ parameter. See +man stonithd+ for details. . If the device does not support the list command, you may also need to set the special +pcmk_host_list+ and/or +pcmk_host_check+ parameters. 
See +man stonithd+ for details. . If the device does not expect the victim to be specified with the port parameter, you may also need to set the special +pcmk_host_argument+ parameter. See +man stonithd+ for details. ifdef::crm[] . Upload it into the CIB from the shell: +cib commit stonith+ -endif::[] +endif::crm[] ifdef::pcs[] . Commit the new configuration. +pcs cluster push cib stonith_cfg+ -endif::[] +endif::pcs[] . Once the stonith resource is running, you can test it by executing: +stonith_admin --reboot nodename+. Although you might want to stop the cluster on that machine first. == Example == Assuming we have an chassis containing four nodes and an IPMI device active on 10.0.0.1, then we would chose the fence_ipmilan driver in step 2 and obtain the following list of parameters .Obtaining a list of STONITH Parameters ifdef::pcs[] [source,Bash] ---- # pcs stonith describe fence_ipmilan Stonith options for: fence_ipmilan auth: IPMI Lan Auth type (md5, password, or none) ipaddr: IPMI Lan IP to talk to passwd: Password (if required) to control power on IPMI device passwd_script: Script to retrieve password (if required) lanplus: Use Lanplus login: Username/Login (if required) to control power on IPMI device action: Operation to perform. Valid operations: on, off, reboot, status, list, diag, monitor or metadata timeout: Timeout (sec) for IPMI operation cipher: Ciphersuite to use (same as ipmitool -C parameter) method: Method to fence (onoff or cycle) power_wait: Wait X seconds after on/off operation delay: Wait X seconds before fencing is started privlvl: Privilege level on IPMI device verbose: Verbose mode ---- -endif::[] +endif::pcs[] ifdef::crm[] [source,C] ---- # stonith_admin --metadata -a fence_ipmilan ---- [source,XML] ---- fence_ipmilan is an I/O Fencing agent which can be used with machines controlled by IPMI. This agent calls support software using ipmitool (http://ipmitool.sf.net/). To use fence_ipmilan with HP iLO 3 you have to enable lanplus option (lanplus / -P) and increase wait after operation to 4 seconds (power_wait=4 / -T 4) IPMI Lan Auth type (md5, password, or none) IPMI Lan IP to talk to Password (if required) to control power on IPMI device Script to retrieve password (if required) Use Lanplus Username/Login (if required) to control power on IPMI device Operation to perform. Valid operations: on, off, reboot, status, list, diag, monitor or metadata Timeout (sec) for IPMI operation Ciphersuite to use (same as ipmitool -C parameter) Method to fence (onoff or cycle) Wait X seconds after on/off operation Wait X seconds before fencing is started Verbose mode ---- -endif::[] +endif::crm[] from which we would create a STONITH resource fragment that might look like this .Sample STONITH Resource ifdef::pcs[] [source,Bash] ---- # pcs cluster cib stonith_cfg # pcs -f stonith_cfg stonith create impi-fencing fence_ipmilan \ pcmk_host_list="pcmk-1 pcmk-2" ipaddr=10.0.0.1 login=testuser \ passwd=acd123 op monitor interval=60s # pcs -f stonith_cfg stonith impi-fencing (stonith:fence_ipmilan) Stopped ---- -endif::[] +endif::pcs[] ifdef::crm[] [source,Bash] ---- # crm crm(live)# cib new stonith INFO: stonith shadow CIB created crm(stonith)# configure primitive impi-fencing stonith::fence_ipmilan \ params pcmk_host_list="pcmk-1 pcmk-2" ipaddr=10.0.0.1 login=testuser passwd=abc123 \ op monitor interval="60s" ---- -endif::[] +endif::crm[] And finally, since we disabled it earlier, we need to re-enable STONITH. At this point we should have the following configuration. 
ifdef::pcs[] [source,Bash] ---- # pcs -f stonith_cfg property set stonith-enabled=true # pcs -f stonith_cfg property dc-version: 1.1.8-1.el7-60a19ed12fdb4d5c6a6b6767f52e5391e447fec0 cluster-infrastructure: corosync no-quorum-policy: ignore stonith-enabled: true ---- +endif::pcs[] Now push the configuration into the cluster. ifdef::pcs[] [source,C] ---- # pcs cluster push cib stonith_cfg ---- -endif::[] +endif::pcs[] ifdef::crm[] [source,Bash] ---- crm(stonith)# configure property stonith-enabled="true" crm(stonith)# configure shownode pcmk-1 node pcmk-2 primitive WebData ocf:linbit:drbd \ params drbd_resource="wwwdata" \ op monitor interval="60s" primitive WebFS ocf:heartbeat:Filesystem \ params device="/dev/drbd/by-res/wwwdata" directory="/var/www/html" fstype="gfs2" primitive WebSite ocf:heartbeat:apache \ params configfile="/etc/httpd/conf/httpd.conf" \ op monitor interval="1min" primitive ClusterIP ocf:heartbeat:IPaddr2 \ params ip="192.168.122.101" cidr_netmask="32" clusterip_hash="sourceip" \ op monitor interval="30s"primitive ipmi-fencing stonith::fence_ipmilan \ params pcmk_host_list="pcmk-1 pcmk-2" ipaddr=10.0.0.1 login=testuser passwd=abc123 \ op monitor interval="60s"ms WebDataClone WebData \ meta master-max="2" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" clone WebFSClone WebFS clone WebIP ClusterIP \ meta globally-unique="true" clone-max="2" clone-node-max="2" clone WebSiteClone WebSite colocation WebSite-with-WebFS inf: WebSiteClone WebFSClone colocation fs_on_drbd inf: WebFSClone WebDataClone:Master colocation website-with-ip inf: WebSiteClone WebIP order WebFS-after-WebData inf: WebDataClone:promote WebFSClone:start order WebSite-after-WebFS inf: WebFSClone WebSiteClone order apache-after-ip inf: WebIP WebSiteClone property $id="cib-bootstrap-options" \ dc-version="1.1.5-bdd89e69ba545404d02445be1f3d72e6a203ba2f" \ cluster-infrastructure="openais" \ expected-quorum-votes="2" \ stonith-enabled="true" \ no-quorum-policy="ignore" rsc_defaults $id="rsc-options" \ resource-stickiness="100" crm(stonith)# cib commit stonithINFO: commited 'stonith' shadow CIB to the cluster crm(stonith)# quit bye ---- -endif::[] +endif::crm[]
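
Once the configuration with +stonith-enabled=true+ is in place, confirm that the fencing resource has actually started, and consider testing it against a node you can afford to reboot (as noted earlier, you may want to stop the cluster on that machine first):

[source,Bash]
----
# The fencing resource should be running on one of the nodes
crm_mon -1 | grep fencing

# Tell the cluster to fence pcmk-2 and watch the node reboot
stonith_admin --reboot pcmk-2
----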