diff --git a/doc/Clusters_from_Scratch/en-US/Ch-Active-Passive.txt b/doc/Clusters_from_Scratch/en-US/Ch-Active-Passive.txt
index 95e01eae88..7375dc8850 100644
--- a/doc/Clusters_from_Scratch/en-US/Ch-Active-Passive.txt
+++ b/doc/Clusters_from_Scratch/en-US/Ch-Active-Passive.txt
@@ -1,669 +1,669 @@
= Creating an Active/Passive Cluster =
== Exploring the Existing Configuration ==
When Pacemaker starts up, it automatically records the number and details
of the nodes in the cluster as well as which stack is being used and the
version of Pacemaker being used.
This is what the base configuration should look like.
ifdef::pcs[]
[source,Bash]
----
# pcs status
Last updated: Fri Sep 14 10:12:01 2012
Last change: Fri Sep 14 09:51:55 2012 via crmd on pcmk-2
Stack: corosync
Current DC: pcmk-1 (1) - partition with quorum
Version: 1.1.8-1.el7-60a19ed12fdb4d5c6a6b6767f52e5391e447fec0
2 Nodes configured, unknown expected votes
0 Resources configured.
Online: [ pcmk-1 pcmk-2 ]
Full list of resources:
----
endif::[]
ifdef::crm[]
[source,Bash]
----
# crm configure show
node $id="1702537408" pcmk-1
node $id="1719314624" pcmk-2
property $id="cib-bootstrap-options" \
dc-version="1.1.7-2.fc17-ee0730e13d124c3d58f00016c3376a1de5323cff" \
cluster-infrastructure="corosync"
----
endif::[]
-For those that are not of afraid of XML, you can see the raw
-configuration by appending "xml" to the previous command.
+For those that are not of afraid of XML, you can see the raw cluster
+configuration and status by using the +pcs cluster cib+ command.
.The last XML you'll see in this document
ifdef::pcs[]
[source,Bash]
----
# pcs cluster cib
----
endif::[]
ifdef::crm[]
[source,Bash]
----
# crm configure show xml
----
endif::[]
Before we make any changes, its a good idea to check the validity of
the configuration.
[source,Bash]
----
# crm_verify -L -V
error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
-V may provide more details
----
As you can see, the tool has found some errors.
In order to guarantee the safety of your data
footnote:[If the data is corrupt, there is little point in continuing to make it available]
, the default for STONITH
footnote:[A common node fencing mechanism. Used to ensure data integrity by powering off "bad" nodes]
in Pacemaker is +enabled+. However it also knows when no STONITH configuration has been
supplied and reports this as a problem (since the cluster would not be
able to make progress if a situation requiring node fencing arose).
For now, we will disable this feature and configure it later in the
Configuring STONITH section. It is important to note that the use of
STONITH is highly encouraged, turning it off tells the cluster to
simply pretend that failed nodes are safely powered off. Some vendors
will even refuse to support clusters that have it disabled.
To disable STONITH, we set the _stonith-enabled_ cluster option to
false.
ifdef::pcs[]
[source,Bash]
----
# pcs property set stonith-enabled=false
# crm_verify -L
----
endif::[]
ifdef::crm[]
[source,Bash]
----
# crm configure property stonith-enabled=false
# crm_verify -L
----
endif::[]
With the new cluster option set, the configuration is now valid.
[WARNING]
=========
The use of stonith-enabled=false is completely inappropriate for a
production cluster. We use it here to defer the discussion of its
configuration which can differ widely from one installation to the
next. See <<_what_is_stonith>> for information on why STONITH is important
and details on how to configure it.
=========
== Adding a Resource ==
The first thing we should do is configure an IP address. Regardless of
where the cluster service(s) are running, we need a consistent address
to contact them on. Here I will choose and add 192.168.122.120 as the
floating address, give it the imaginative name ClusterIP and tell the
cluster to check that its running every 30 seconds.
[IMPORTANT]
===========
The chosen address must not be one already associated with
a physical node
===========
ifdef::pcs[]
[source,Bash]
----
# pcs resource create ClusterIP ocf:heartbeat:IPaddr2 \
ip=192.168.0.120 cidr_netmask=32 op monitor interval=30s
----
endif::[]
ifdef::crm[]
[source,Bash]
----
# crm configure primitive ClusterIP ocf:heartbeat:IPaddr2 \
params ip=192.168.122.120 cidr_netmask=32 \
op monitor interval=30s
----
endif::[]
The other important piece of information here is ocf:heartbeat:IPaddr2.
This tells Pacemaker three things about the resource you want to
add. The first field, ocf, is the standard to which the resource
script conforms to and where to find it. The second field is specific
to OCF resources and tells the cluster which namespace to find the
resource script in, in this case heartbeat. The last field indicates
the name of the resource script.
ifdef::pcs[]
To obtain a list of the available resource standards (the ocf part of
ocf:heartbeat:IPaddr2), run
[source,Bash]
----
# pcs resource standards
ocf
lsb
service
systemd
stonith
----
To obtain a list of the available ocf resource providers (the heartbeat
part of ocf:heartbeat:IPaddr2), run
[source,Bash]
----
# pcs resource providers
heartbeat
linbit
pacemaker
redhat
----
Finally, if you want to see all the resource agents available for
a specific ocf provider (the IPaddr2 part of ocf:heartbeat:IPaddr2), run
[source,Bash]
----
# pcs resource agents ocf:heartbeat
AoEtarget
AudibleAlarm
CTDB
ClusterMon
Delay
Dummy
.
. (skipping lots of resources to save space)
.
IPaddr2
.
.
.
symlink
syslog-ng
tomcat
vmware
----
endif::[]
ifdef::crm[]
To obtain a list of the available resource classes, run
[source,Bash]
----
# crm ra classes
heartbeat
lsb
ocf / heartbeat pacemaker
stonith
----
To then find all the OCF resource agents provided by Pacemaker and
Heartbeat, run
[source,Bash]
----
# crm ra list ocf pacemaker
ClusterMon Dummy HealthCPU HealthSMART Stateful SysInfo
SystemHealth controld o2cb ping pingd
# crm ra list ocf heartbeat
AoEtarget AudibleAlarm CTDB ClusterMon
Delay Dummy EvmsSCC Evmsd
Filesystem ICP IPaddr IPaddr2
IPsrcaddr IPv6addr LVM LinuxSCSI
MailTo ManageRAID ManageVE Pure-FTPd
Raid1 Route SAPDatabase SAPInstance
SendArp ServeRAID SphinxSearchDaemon Squid
Stateful SysInfo VIPArip VirtualDomain
WAS WAS6 WinPopup Xen
Xinetd anything apache conntrackd
db2 drbd eDir88 ethmonitor
exportfs fio iSCSILogicalUnit iSCSITarget
ids iscsi jboss ldirectord
lxc mysql mysql-proxy nfsserver
nginx oracle oralsnr pgsql
pingd portblock postfix proftpd
rsyncd scsi2reservation sfex symlink
syslog-ng tomcat vmware
----
endif::[]
Now verify that the IP resource has been added and display the cluster's
status to see that it is now active.
ifdef::pcs[]
[source,Bash]
----
# pcs status
Last updated: Fri Sep 14 10:17:00 2012
Last change: Fri Sep 14 10:15:48 2012 via cibadmin on pcmk-1
Stack: corosync
Current DC: pcmk-1 (1) - partition with quorum
Version: 1.1.8-1.el7-60a19ed12fdb4d5c6a6b6767f52e5391e447fec0
2 Nodes configured, unknown expected votes
1 Resources configured.
Online: [ pcmk-1 pcmk-2 ]
Full list of resources:
ClusterIP (ocf::heartbeat:IPaddr2): Started pcmk-1
----
endif::[]
ifdef::crm[]
[source,Bash]
----
# crm configure show
node $id="1702537408" pcmk-1
node $id="1719314624" pcmk-2
primitive ClusterIP ocf:heartbeat:IPaddr2 \
params ip="192.168.122.120" cidr_netmask="32" \
op monitor interval="30s"
property $id="cib-bootstrap-options" \
dc-version="1.1.7-2.fc17-ee0730e13d124c3d58f00016c3376a1de5323cff" \
cluster-infrastructure="corosync" \
stonith-enabled="false"
# crm_mon -1
============
Last updated: Tue Apr 3 09:56:50 2012
Last change: Tue Apr 3 09:54:37 2012 via cibadmin on pcmk-1
Stack: corosync
Current DC: pcmk-1 (1702537408) - partition with quorum
Version: 1.1.7-2.fc17-ee0730e13d124c3d58f00016c3376a1de5323cff
2 Nodes configured, unknown expected votes
1 Resources configured.
============
Online: [ pcmk-1 pcmk-2 ]
ClusterIP (ocf::heartbeat:IPaddr2): Started pcmk-1
----
endif::[]
== Perform a Failover ==
Being a high-availability cluster, we should test failover of our new
resource before moving on.
First, find the node on which the IP address is running.
ifdef::pcs[]
[source,Bash]
----
# pcs status
Last updated: Fri Sep 14 10:17:00 2012
Last change: Fri Sep 14 10:15:48 2012 via cibadmin on pcmk-1
Stack: corosync
Current DC: pcmk-1 (1) - partition with quorum
Version: 1.1.8-1.el7-60a19ed12fdb4d5c6a6b6767f52e5391e447fec0
2 Nodes configured, unknown expected votes
1 Resources configured.
Online: [ pcmk-1 pcmk-2 ]
Full list of resources:
ClusterIP (ocf::heartbeat:IPaddr2): Started pcmk-1
----
endif::[]
ifdef::crm[]
[source,Bash]
----
# crm resource status ClusterIP
resource ClusterIP is running on: pcmk-1
----
endif::[]
Shut down Pacemaker and Corosync on that machine.
ifdef::pcs[]
[source,Bash]
----
#pcs cluster stop pcmk-1
Stopping Cluster...
----
Once Corosync is no longer running, go to the other node and check the
cluster status.
[source,Bash]
----
# pcs status
Last updated: Fri Sep 14 10:31:01 2012
Last change: Fri Sep 14 10:15:48 2012 via cibadmin on pcmk-1
Stack: corosync
Current DC: pcmk-2 (2) - partition WITHOUT quorum
Version: 1.1.8-1.el7-60a19ed12fdb4d5c6a6b6767f52e5391e447fec0
2 Nodes configured, unknown expected votes
1 Resources configured.
Online: [ pcmk-2 ]
OFFLINE: [ pcmk-1 ]
Full list of resources:
ClusterIP (ocf::heartbeat:IPaddr2): Stopped
----
endif::[]
ifdef::crm[]
[source,Bash]
----
# ssh pcmk-1 -- service pacemaker stop
# ssh pcmk-1 -- service corosync stop
----
Once Corosync is no longer running, go to the other node and check the
cluster status with crm_mon.
[source,Bash]
----
# crm_mon -1
============
Last updated: Tue Apr 3 10:01:28 2012
Last change: Tue Apr 3 09:54:39 2012 via cibadmin on pcmk-1
Stack: corosync
Current DC: pcmk-2 (1719314624) - partition WITHOUT quorum
Version: 1.1.7-2.fc17-ee0730e13d124c3d58f00016c3376a1de5323cff
2 Nodes configured, unknown expected votes
1 Resources configured.
============
Online: [ pcmk-2 ]
OFFLINE: [ pcmk-1 ]
----
endif::[]
There are three things to notice about the cluster's current
state. The first is that, as expected, +pcmk-1+ is now offline. However
we can also see that +ClusterIP+ isn't running anywhere!
=== Quorum and Two-Node Clusters ===
This is because the cluster no longer has quorum, as can be seen by
the text "partition WITHOUT quorum" in the status output. In order
to reduce the possibility of data corruption, Pacemaker's default
behavior is to stop all resources if the cluster does not have quorum.
A cluster is said to have quorum when more than half the known or
expected nodes are online, or for the mathematically inclined,
whenever the following equation is true:
....
total_nodes < 2 * active_nodes
....
Therefore a two-node cluster only has quorum when both nodes are
running, which is no longer the case for our cluster. This would
normally make the creation of a two-node cluster pointless
footnote:[Actually some would argue that two-node clusters are always pointless, but that is an argument for another time]
, however it is possible to control how Pacemaker behaves when quorum
is lost. In particular, we can tell the cluster to simply ignore
quorum altogether.
ifdef::pcs[]
[source,Bash]
----
# pcs property set no-quorum-policy=ignore
# pcs property
dc-version: 1.1.8-1.el7-60a19ed12fdb4d5c6a6b6767f52e5391e447fec0
cluster-infrastructure: corosync
stonith-enabled: false
no-quorum-policy: ignore
----
endif::[]
ifdef::crm[]
[source,Bash]
----
# crm configure property no-quorum-policy=ignore
# crm configure show
node $id="1702537408" pcmk-1
node $id="1719314624" pcmk-2
primitive ClusterIP ocf:heartbeat:IPaddr2 \
params ip="192.168.122.120" cidr_netmask="32" \
op monitor interval="30s"
property $id="cib-bootstrap-options" \
dc-version="1.1.7-2.fc17-ee0730e13d124c3d58f00016c3376a1de5323cff" \
cluster-infrastructure="corosync" \
stonith-enabled="false" \
no-quorum-policy="ignore"
----
endif::[]
After a few moments, the cluster will start the IP address on the
remaining node. Note that the cluster still does not have quorum.
ifdef::pcs[]
[source,Bash]
----
# pcs status
Last updated: Fri Sep 14 10:38:11 2012
Last change: Fri Sep 14 10:37:53 2012 via cibadmin on pcmk-2
Stack: corosync
Current DC: pcmk-2 (2) - partition WITHOUT quorum
Version: 1.1.8-1.el7-60a19ed12fdb4d5c6a6b6767f52e5391e447fec0
2 Nodes configured, unknown expected votes
1 Resources configured.
Online: [ pcmk-2 ]
OFFLINE: [ pcmk-1 ]
Full list of resources:
ClusterIP (ocf::heartbeat:IPaddr2): Started pcmk-2
----
endif::[]
ifdef::crm[]
[source,Bash]
----
# crm_mon -1
============
Last updated: Tue Apr 3 10:02:46 2012
Last change: Tue Apr 3 10:02:08 2012 via cibadmin on pcmk-2
Stack: corosync
Current DC: pcmk-2 (1719314624) - partition WITHOUT quorum
Version: 1.1.7-2.fc17-ee0730e13d124c3d58f00016c3376a1de5323cff
2 Nodes configured, unknown expected votes
1 Resources configured.
============
Online: [ pcmk-2 ]
OFFLINE: [ pcmk-1 ]
ClusterIP (ocf::heartbeat:IPaddr2): Started pcmk-2
----
endif::[]
Now simulate node recovery by restarting the cluster stack on +pcmk-1+ and
check the cluster's status. Note, if you get an authentication error with
the 'pcs cluster start pcmk-1' command, you must authenticate on the node
using the 'pcs cluster auth pcmk pcmk-1 pcmk-2' command discussed earlier.
ifdef::pcs[]
[source,Bash]
----
# pcs cluster start pcmk-1
Starting Cluster...
# pcs status
Last updated: Fri Sep 14 10:42:56 2012
Last change: Fri Sep 14 10:37:53 2012 via cibadmin on pcmk-2
Stack: corosync
Current DC: pcmk-2 (2) - partition with quorum
Version: 1.1.8-1.el7-60a19ed12fdb4d5c6a6b6767f52e5391e447fec0
2 Nodes configured, unknown expected votes
1 Resources configured.
Online: [ pcmk-1 pcmk-2 ]
Full list of resources:
ClusterIP (ocf::heartbeat:IPaddr2): Started pcmk-2
----
endif::[]
ifdef::crm[]
[source,Bash]
----
# service corosync start
Starting Corosync Cluster Engine (corosync): [ OK ]
# service pacemaker start
Starting Pacemaker Cluster Manager: [ OK ]
# crm_mon
============
Last updated: Fri Aug 28 15:32:13 2009
Stack: openais
Current DC: pcmk-2 - partition with quorum
Version: 1.1.5-bdd89e69ba545404d02445be1f3d72e6a203ba2f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ pcmk-1 pcmk-2 ]
ClusterIP (ocf::heartbeat:IPaddr): Started pcmk-2
----
endif::[]
[NOTE]
======
In the dark days, the cluster may have moved the IP back to its
original location (+pcmk-1+). Usually this is no longer the case.
======
=== Prevent Resources from Moving after Recovery ===
In most circumstances, it is highly desirable to prevent healthy
resources from being moved around the cluster. Moving resources almost
always requires a period of downtime. For complex services like Oracle
databases, this period can be quite long.
To address this, Pacemaker has the concept of resource stickiness
which controls how much a service prefers to stay running where it
is. You may like to think of it as the "cost" of any downtime. By
default, Pacemaker assumes there is zero cost associated with moving
resources and will do so to achieve "optimal"
footnote:[It should be noted that Pacemaker's definition of
optimal may not always agree with that of a human's. The order in which
Pacemaker processes lists of resources and nodes creates implicit
preferences in situations where the administrator has not explicitly
specified them]
resource placement. We can specify a different stickiness for every
resource, but it is often sufficient to change the default.
ifdef::pcs[]
[source,Bash]
----
# pcs resource rsc defaults resource-stickiness=100
# pcs resource rsc defaults
resource-stickiness: 100
----
endif::[]
ifdef::crm[]
[source,Bash]
----
# crm configure rsc_defaults resource-stickiness=100
# crm configure show
node $id="1702537408" pcmk-1
node $id="1719314624" pcmk-2
primitive ClusterIP ocf:heartbeat:IPaddr2 \
params ip="192.168.122.120" cidr_netmask="32" \
op monitor interval="30s"
property $id="cib-bootstrap-options" \
dc-version="1.1.7-2.fc17-ee0730e13d124c3d58f00016c3376a1de5323cff" \
cluster-infrastructure="corosync" \
stonith-enabled="false" \
no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
resource-stickiness="100"
----
endif::[]