diff --git a/doc/Clusters_from_Scratch/en-US/Ap-Configuration.txt b/doc/Clusters_from_Scratch/en-US/Ap-Configuration.txt
index 6dc987c24c..04d57cd4d1 100644
--- a/doc/Clusters_from_Scratch/en-US/Ap-Configuration.txt
+++ b/doc/Clusters_from_Scratch/en-US/Ap-Configuration.txt
@@ -1,450 +1,451 @@
+:compat-mode: legacy
[appendix]
== Configuration Recap ==
=== Final Cluster Configuration ===
----
[root@pcmk-1 ~]# pcs resource
Master/Slave Set: WebDataClone [WebData]
Masters: [ pcmk-1 pcmk-2 ]
Clone Set: dlm-clone [dlm]
Started: [ pcmk-1 pcmk-2 ]
Clone Set: ClusterIP-clone [ClusterIP] (unique)
ClusterIP:0 (ocf::heartbeat:IPaddr2): Started
ClusterIP:1 (ocf::heartbeat:IPaddr2): Started
Clone Set: WebFS-clone [WebFS]
Started: [ pcmk-1 pcmk-2 ]
Clone Set: WebSite-clone [WebSite]
Started: [ pcmk-1 pcmk-2 ]
----
----
[root@pcmk-1 ~]# pcs resource op defaults
timeout: 240s
----
----
[root@pcmk-1 ~]# pcs stonith
 ipmi-fencing	(stonith:fence_ipmilan) Started
----
----
[root@pcmk-1 ~]# pcs constraint
Location Constraints:
Ordering Constraints:
start ClusterIP-clone then start WebSite-clone (kind:Mandatory)
promote WebDataClone then start WebFS-clone (kind:Mandatory)
start WebFS-clone then start WebSite-clone (kind:Mandatory)
start dlm-clone then start WebFS-clone (kind:Mandatory)
Colocation Constraints:
WebSite-clone with ClusterIP-clone (score:INFINITY)
WebFS-clone with WebDataClone (score:INFINITY) (with-rsc-role:Master)
WebSite-clone with WebFS-clone (score:INFINITY)
WebFS-clone with dlm-clone (score:INFINITY)
Ticket Constraints:
----
----
[root@pcmk-1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: pcmk-1 (version 1.1.16-12.el7_4.5-94ff4df) - partition with quorum
Last updated: Fri Jan 12 12:05:37 2018
Last change: Fri Jan 12 11:49:29 2018
2 nodes configured
11 resources configured
Online: [ pcmk-1 pcmk-2 ]
Full list of resources:
 ipmi-fencing	(stonith:fence_ipmilan): Started pcmk-1
Master/Slave Set: WebDataClone [WebData]
Masters: [ pcmk-1 pcmk-2 ]
Clone Set: dlm-clone [dlm]
Started: [ pcmk-1 pcmk-2 ]
Clone Set: ClusterIP-clone [ClusterIP] (unique)
ClusterIP:0 (ocf::heartbeat:IPaddr2): Started pcmk-2
ClusterIP:1 (ocf::heartbeat:IPaddr2): Started pcmk-1
Clone Set: WebFS-clone [WebFS]
Started: [ pcmk-1 pcmk-2 ]
Clone Set: WebSite-clone [WebSite]
Started: [ pcmk-1 pcmk-2 ]
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
----
----
[root@pcmk-1 ~]# pcs cluster cib
----
[source,XML]
----
----
=== Node List ===
----
[root@pcmk-1 ~]# pcs status nodes
Pacemaker Nodes:
Online: pcmk-1 pcmk-2
Standby:
Offline:
----
=== Cluster Options ===
----
[root@pcmk-1 ~]# pcs property
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: mycluster
dc-version: 1.1.16-12.el7_4.5-94ff4df
have-watchdog: false
last-lrm-refresh: 1439569053
stonith-enabled: true
----
The output shows state information automatically obtained about the cluster, including:
* *cluster-infrastructure* - the cluster communications layer in use
* *cluster-name* - the cluster name chosen by the administrator when the cluster was created
* *dc-version* - the version (including upstream source-code hash) of Pacemaker used on the Designated Controller
The output also shows options set by the administrator that control the way the cluster operates, including:
* *stonith-enabled=true* - whether the cluster is allowed to use STONITH resources
=== Resources ===
==== Default Options ====
----
[root@pcmk-1 ~]# pcs resource defaults
resource-stickiness: 100
----
This shows cluster option defaults that apply to every resource that does not
explicitly set the option itself. Above:
* *resource-stickiness* - Specify the aversion to moving healthy resources to other machines
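An individual resource can override this default through its meta attributes; for example (illustrative only, not part of the configuration built in this guide):
----
[root@pcmk-1 ~]# pcs resource meta WebSite resource-stickiness=200
----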
==== Fencing ====
----
[root@pcmk-1 ~]# pcs stonith show
ipmi-fencing (stonith:fence_ipmilan) Started
[root@pcmk-1 ~]# pcs stonith show ipmi-fencing
Resource: ipmi-fencing (class=stonith type=fence_ipmilan)
Attributes: ipaddr="10.0.0.1" login="testuser" passwd="acd123" pcmk_host_list="pcmk-1 pcmk-2"
Operations: monitor interval=60s (fence-monitor-interval-60s)
----
==== Service Address ====
Users of the services provided by the cluster require an unchanging
address with which to access it. Additionally, we cloned the address so
it will be active on both nodes. An iptables rule (created as part of the
resource agent) is used to ensure that each request only gets processed by one
of the two clone instances. The additional meta options tell the cluster
that we want two instances of the clone (one "request bucket" for each
node) and that if one node fails, then the remaining node should hold
both.
----
[root@pcmk-1 ~]# pcs resource show ClusterIP-clone
Clone: ClusterIP-clone
Meta Attrs: clone-max=2 clone-node-max=2 globally-unique=true
Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
Attributes: ip=192.168.122.120 cidr_netmask=32 clusterip_hash=sourceip
Operations: start interval=0s timeout=20s (ClusterIP-start-timeout-20s)
stop interval=0s timeout=20s (ClusterIP-stop-timeout-20s)
monitor interval=30s (ClusterIP-monitor-interval-30s)
----
==== DRBD - Shared Storage ====
Here, we define the DRBD service and specify which DRBD resource (from
/etc/drbd.d/*.res) it should manage. We make it a promotable clone resource and, in
order to have an active/active setup, allow both instances to be promoted to master
at the same time. We also set the notify option so that the
cluster will tell the DRBD agent when its peer changes state.
----
[root@pcmk-1 ~]# pcs resource show WebDataClone
Master: WebDataClone
Meta Attrs: master-max=2 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
Resource: WebData (class=ocf provider=linbit type=drbd)
Attributes: drbd_resource=wwwdata
Operations: start interval=0s timeout=240 (WebData-start-timeout-240)
promote interval=0s timeout=90 (WebData-promote-timeout-90)
demote interval=0s timeout=90 (WebData-demote-timeout-90)
stop interval=0s timeout=100 (WebData-stop-timeout-100)
monitor interval=60s (WebData-monitor-interval-60s)
[root@pcmk-1 ~]# pcs constraint ref WebDataClone
Resource: WebDataClone
colocation-WebFS-WebDataClone-INFINITY
order-WebDataClone-WebFS-mandatory
----
==== Cluster Filesystem ====
The cluster filesystem ensures that files are read and written correctly.
We need to specify the block device (provided by DRBD), where we want it
mounted and that we are using GFS2. Again, it is a clone because it is
intended to be active on both nodes. The additional constraints ensure
that it can only be started on nodes with active DLM and DRBD instances.
----
[root@pcmk-1 ~]# pcs resource show WebFS-clone
Clone: WebFS-clone
Resource: WebFS (class=ocf provider=heartbeat type=Filesystem)
Attributes: device=/dev/drbd1 directory=/var/www/html fstype=gfs2
Operations: start interval=0s timeout=60 (WebFS-start-timeout-60)
stop interval=0s timeout=60 (WebFS-stop-timeout-60)
monitor interval=20 timeout=40 (WebFS-monitor-interval-20)
[root@pcmk-1 ~]# pcs constraint ref WebFS-clone
Resource: WebFS-clone
colocation-WebFS-WebDataClone-INFINITY
colocation-WebSite-WebFS-INFINITY
colocation-WebFS-clone-dlm-clone-INFINITY
order-WebDataClone-WebFS-mandatory
order-WebFS-WebSite-mandatory
order-dlm-clone-WebFS-clone-mandatory
----
==== Apache ====
Lastly, we have the actual service, Apache. We need only tell the cluster
where to find its main configuration file and restrict it to running on
nodes that have the required filesystem mounted and the IP address active.
----
[root@pcmk-1 ~]# pcs resource show WebSite-clone
Clone: WebSite-clone
Resource: WebSite (class=ocf provider=heartbeat type=apache)
Attributes: configfile=/etc/httpd/conf/httpd.conf statusurl=http://localhost/server-status
Operations: start interval=0s timeout=40s (WebSite-start-timeout-40s)
stop interval=0s timeout=60s (WebSite-stop-timeout-60s)
monitor interval=1min (WebSite-monitor-interval-1min)
[root@pcmk-1 ~]# pcs constraint ref WebSite-clone
Resource: WebSite-clone
colocation-WebSite-ClusterIP-INFINITY
colocation-WebSite-WebFS-INFINITY
order-ClusterIP-WebSite-mandatory
order-WebFS-WebSite-mandatory
----
diff --git a/doc/Clusters_from_Scratch/en-US/Ap-Corosync-Conf.txt b/doc/Clusters_from_Scratch/en-US/Ap-Corosync-Conf.txt
index 87f4042a85..a00e9a2e5a 100644
--- a/doc/Clusters_from_Scratch/en-US/Ap-Corosync-Conf.txt
+++ b/doc/Clusters_from_Scratch/en-US/Ap-Corosync-Conf.txt
@@ -1,33 +1,34 @@
+:compat-mode: legacy
[appendix]
[[ap-corosync-conf]]
== Sample Corosync Configuration ==
.Sample +corosync.conf+ for two-node cluster created by `pcs`.
.....
totem {
version: 2
secauth: off
cluster_name: mycluster
transport: udpu
}
nodelist {
node {
ring0_addr: pcmk-1
nodeid: 1
}
node {
ring0_addr: pcmk-2
nodeid: 2
}
}
quorum {
provider: corosync_votequorum
two_node: 1
}
logging {
to_syslog: yes
}
.....
diff --git a/doc/Clusters_from_Scratch/en-US/Ap-Reading.txt b/doc/Clusters_from_Scratch/en-US/Ap-Reading.txt
index 3b9367418d..eac4ad3b37 100644
--- a/doc/Clusters_from_Scratch/en-US/Ap-Reading.txt
+++ b/doc/Clusters_from_Scratch/en-US/Ap-Reading.txt
@@ -1,12 +1,13 @@
+:compat-mode: legacy
[appendix]
== Further Reading ==
- Project Website
http://www.clusterlabs.org/
- SuSE has a comprehensive guide to cluster commands (though using the `crmsh` command-line
shell rather than `pcs`) at:
https://www.suse.com/documentation/sle_ha/book_sleha/data/book_sleha.html
- Corosync
http://www.corosync.org/
diff --git a/doc/Clusters_from_Scratch/en-US/Ch-Active-Active.txt b/doc/Clusters_from_Scratch/en-US/Ch-Active-Active.txt
index deecca3b43..a88643e887 100644
--- a/doc/Clusters_from_Scratch/en-US/Ch-Active-Active.txt
+++ b/doc/Clusters_from_Scratch/en-US/Ch-Active-Active.txt
@@ -1,374 +1,375 @@
+:compat-mode: legacy
= Convert Cluster to Active/Active =
The primary requirement for an Active/Active cluster is that the data
required for your services is available, simultaneously, on both
machines. Pacemaker makes no requirement on how this is achieved; you
could use a SAN if you had one available, but since DRBD supports
multiple Primaries, we can continue to use it here.
== Install Cluster Filesystem Software ==
The only hitch is that we need to use a cluster-aware filesystem. The
one we used earlier with DRBD, xfs, is not one of those. Both OCFS2
and GFS2 are supported; here, we will use GFS2.
On both nodes, install the GFS2 command-line utilities and the
Distributed Lock Manager (DLM) required by cluster filesystems:
----
# yum install -y gfs2-utils dlm
----
== Configure the Cluster for the DLM ==
The DLM needs to run on both nodes, so we'll start by creating a resource for
it (using the *ocf:pacemaker:controld* resource script), and clone it:
----
[root@pcmk-1 ~]# pcs cluster cib dlm_cfg
[root@pcmk-1 ~]# pcs -f dlm_cfg resource create dlm ocf:pacemaker:controld op monitor interval=60s
[root@pcmk-1 ~]# pcs -f dlm_cfg resource clone dlm clone-max=2 clone-node-max=1
[root@pcmk-1 ~]# pcs -f dlm_cfg resource show
ClusterIP (ocf::heartbeat:IPaddr2): Started
WebSite (ocf::heartbeat:apache): Started
Master/Slave Set: WebDataClone [WebData]
Masters: [ pcmk-2 ]
Slaves: [ pcmk-1 ]
WebFS (ocf::heartbeat:Filesystem): Started
Clone Set: dlm-clone [dlm]
Stopped: [ pcmk-1 pcmk-2 ]
----
Activate our new configuration, and see how the cluster responds:
----
[root@pcmk-1 ~]# pcs cluster cib-push dlm_cfg
CIB updated
[root@pcmk-1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: pcmk-1 (version 1.1.16-12.el7_4.5-94ff4df) - partition with quorum
Last updated: Fri Jan 12 11:19:36 2018
Last change: Fri Jan 12 11:19:28 2018
2 nodes configured
8 resources configured
Online: [ pcmk-1 pcmk-2 ]
Full list of resources:
ClusterIP (ocf::heartbeat:IPaddr2): Started pcmk-2
WebSite (ocf::heartbeat:apache): Started pcmk-2
Master/Slave Set: WebDataClone [WebData]
Masters: [ pcmk-2 ]
Slaves: [ pcmk-1 ]
WebFS (ocf::heartbeat:Filesystem): Started pcmk-2
ipmi-fencing (stonith:fence_ipmilan): Started pcmk-1
Clone Set: dlm-clone [dlm]
Started: [ pcmk-1 pcmk-2 ]
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
----
[[GFS2_prep]]
== Create and Populate GFS2 Filesystem ==
Before we do anything to the existing partition, we need to make sure it
is unmounted. We do this by telling the cluster to stop the WebFS resource.
This will ensure that other resources (in our case, Apache) using WebFS
are not only stopped, but stopped in the correct order.
----
[root@pcmk-1 ~]# pcs resource disable WebFS
[root@pcmk-1 ~]# pcs resource
ClusterIP (ocf::heartbeat:IPaddr2): Started
WebSite (ocf::heartbeat:apache): Stopped
Master/Slave Set: WebDataClone [WebData]
Masters: [ pcmk-2 ]
Slaves: [ pcmk-1 ]
WebFS (ocf::heartbeat:Filesystem): Stopped
Clone Set: dlm-clone [dlm]
Started: [ pcmk-1 pcmk-2 ]
----
You can see that both Apache and WebFS have been stopped,
and that *pcmk-2* is the current master for the DRBD device.
Now we can create a new GFS2 filesystem on the DRBD device.
[WARNING]
=========
This will erase all previous content stored on the DRBD device. Ensure
you have a copy of any important data.
=========
[IMPORTANT]
===========
Run the next command on whichever node has the DRBD Primary role.
Otherwise, you will receive the message:
-----
/dev/drbd1: Read-only file system
-----
===========
-----
[root@pcmk-2 ~]# mkfs.gfs2 -p lock_dlm -j 2 -t mycluster:web /dev/drbd1
It appears to contain an existing filesystem (xfs)
This will destroy any data on /dev/drbd1
Are you sure you want to proceed? [y/n]y
Device: /dev/drbd1
Block size: 4096
Device size: 1.00 GB (262127 blocks)
Filesystem size: 1.00 GB (262126 blocks)
Journals: 2
Resource groups: 5
Locking protocol: "lock_dlm"
Lock table: "mycluster:web"
UUID: 9a72c488-d8a7-24c9-ceee-add7a8ca52c2
-----
The `mkfs.gfs2` command required a number of additional parameters:
* `-p lock_dlm` specifies that we want to use the
kernel's DLM.
* `-j 2` indicates that the filesystem should reserve enough
space for two journals (one for each node that will access the filesystem).
* `-t mycluster:web` specifies the lock table name. The format for
this field is +pass:[clustername:fsname]+. For
+pass:[clustername]+, we need to use the same
value we specified originally with `pcs cluster setup --name` (which is also
the value of *cluster_name* in +/etc/corosync/corosync.conf+).
If you are unsure what your cluster name is, you can look in
+/etc/corosync/corosync.conf+ or execute the command
`pcs cluster corosync pcmk-1 | grep cluster_name`.
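For example, with the configuration used in this guide, checking the local file directly should show something like:
----
[root@pcmk-1 ~]# grep cluster_name /etc/corosync/corosync.conf
cluster_name: mycluster
----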
Now we can (re-)populate the new filesystem with data
(web pages). We'll create yet another variation on our home page.
-----
[root@pcmk-2 ~]# mount /dev/drbd1 /mnt
[root@pcmk-2 ~]# cat <<-END >/mnt/index.html
 <html>
 <body>My Test Site - GFS2</body>
 </html>
END
[root@pcmk-2 ~]# chcon -R --reference=/var/www/html /mnt
[root@pcmk-2 ~]# umount /dev/drbd1
[root@pcmk-2 ~]# drbdadm verify wwwdata
-----
== Reconfigure the Cluster for GFS2 ==
With the WebFS resource stopped, let's update the configuration.
----
[root@pcmk-1 ~]# pcs resource show WebFS
Resource: WebFS (class=ocf provider=heartbeat type=Filesystem)
Attributes: device=/dev/drbd1 directory=/var/www/html fstype=xfs
Meta Attrs: target-role=Stopped
Operations: start interval=0s timeout=60 (WebFS-start-timeout-60)
stop interval=0s timeout=60 (WebFS-stop-timeout-60)
monitor interval=20 timeout=40 (WebFS-monitor-interval-20)
----
The fstype option needs to be updated to *gfs2* instead of *xfs*.
----
[root@pcmk-1 ~]# pcs resource update WebFS fstype=gfs2
[root@pcmk-1 ~]# pcs resource show WebFS
Resource: WebFS (class=ocf provider=heartbeat type=Filesystem)
Attributes: device=/dev/drbd1 directory=/var/www/html fstype=gfs2
Meta Attrs: target-role=Stopped
Operations: start interval=0s timeout=60 (WebFS-start-timeout-60)
stop interval=0s timeout=60 (WebFS-stop-timeout-60)
monitor interval=20 timeout=40 (WebFS-monitor-interval-20)
----
GFS2 requires that DLM be running, so we also need to set up new colocation
and ordering constraints for it:
----
[root@pcmk-1 ~]# pcs constraint colocation add WebFS with dlm-clone INFINITY
[root@pcmk-1 ~]# pcs constraint order dlm-clone then WebFS
Adding dlm-clone WebFS (kind: Mandatory) (Options: first-action=start then-action=start)
----
== Clone the IP address ==
There's no point making the services active on both locations if we can't
reach them both, so let's clone the IP address.
The *IPaddr2* resource agent has built-in intelligence for when it is configured
as a clone. It will utilize a multicast MAC address to have the local switch
send the relevant packets to all nodes in the cluster, together with *iptables
clusterip* rules on the nodes so that any given packet will be grabbed by
exactly one node. This will give us a simple but effective form of
load-balancing requests between our two nodes.
Let's start a new config, and clone our IP:
----
[root@pcmk-1 ~]# pcs cluster cib loadbalance_cfg
[root@pcmk-1 ~]# pcs -f loadbalance_cfg resource clone ClusterIP \
clone-max=2 clone-node-max=2 globally-unique=true
----
* `clone-max=2` tells the resource agent to split packets this many ways. This
should equal the number of nodes that can host the IP.
* `clone-node-max=2` says that one node can run up to 2 instances
of the clone. This should also equal the number of nodes that can
host the IP, so that if any node goes down, another node can take over
the failed node's "request bucket". Otherwise, requests intended for
the failed node would be discarded.
* `globally-unique=true` tells the cluster that one clone isn't identical
to another (each handles a different "bucket"). This also tells the resource
agent to insert *iptables* rules so each host only processes packets in its
bucket(s).
Notice that when the ClusterIP becomes a clone, the constraints
referencing ClusterIP now reference the clone. This is
done automatically by pcs.
----
[root@pcmk-1 ~]# pcs -f loadbalance_cfg constraint
Location Constraints:
Ordering Constraints:
start ClusterIP-clone then start WebSite (kind:Mandatory)
promote WebDataClone then start WebFS (kind:Mandatory)
start WebFS then start WebSite (kind:Mandatory)
start dlm-clone then start WebFS (kind:Mandatory)
Colocation Constraints:
WebSite with ClusterIP-clone (score:INFINITY)
WebFS with WebDataClone (score:INFINITY) (with-rsc-role:Master)
WebSite with WebFS (score:INFINITY)
WebFS with dlm-clone (score:INFINITY)
Ticket Constraints:
----
Now we must tell the resource how to decide which requests are
processed by which hosts. To do this, we specify the *clusterip_hash* parameter.
The value of *sourceip* means that the source IP address of incoming packets
will be hashed; each node will process a certain range of hashes.
----
[root@pcmk-1 ~]# pcs -f loadbalance_cfg resource update ClusterIP clusterip_hash=sourceip
----
Load our configuration to the cluster, and see how it responds.
-----
[root@pcmk-1 ~]# pcs cluster cib-push loadbalance_cfg
CIB updated
[root@pcmk-1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: pcmk-1 (version 1.1.16-12.el7_4.5-94ff4df) - partition with quorum
Last updated: Fri Jan 12 11:32:07 2018
Last change: Fri Jan 12 11:32:04 2018
2 nodes configured
9 resources configured
Online: [ pcmk-1 pcmk-2 ]
Full list of resources:
WebSite (ocf::heartbeat:apache): Stopped
Master/Slave Set: WebDataClone [WebData]
Masters: [ pcmk-1 ]
Slaves: [ pcmk-2 ]
WebFS (ocf::heartbeat:Filesystem): Stopped
ipmi-fencing (stonith:fence_ipmilan): Started pcmk-1
Clone Set: dlm-clone [dlm]
Started: [ pcmk-1 pcmk-2 ]
Clone Set: ClusterIP-clone [ClusterIP] (unique)
ClusterIP:0 (ocf::heartbeat:IPaddr2): Started pcmk-1
ClusterIP:1 (ocf::heartbeat:IPaddr2): Started pcmk-2
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
-----
If desired, you can demonstrate that all request buckets are working
by using a tool such as `arping` from several source hosts
to see which host responds to each.
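You can also confirm that the *IPaddr2* agent installed its *CLUSTERIP* iptables rule on each node (a quick sanity check; the exact rule details will vary):
----
[root@pcmk-1 ~]# iptables -n -L INPUT | grep CLUSTERIP
----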
== Clone the Filesystem and Apache Resources ==
Now that we have a cluster filesystem ready to go,
and our nodes can load-balance requests to a shared IP address,
we can configure the cluster so both nodes mount the filesystem
and respond to web requests.
Clone the filesystem and Apache resources in a new configuration.
Notice how pcs automatically updates the relevant constraints again.
----
[root@pcmk-1 ~]# pcs cluster cib active_cfg
[root@pcmk-1 ~]# pcs -f active_cfg resource clone WebFS
[root@pcmk-1 ~]# pcs -f active_cfg resource clone WebSite
[root@pcmk-1 ~]# pcs -f active_cfg constraint
Location Constraints:
Ordering Constraints:
start ClusterIP-clone then start WebSite-clone (kind:Mandatory)
promote WebDataClone then start WebFS-clone (kind:Mandatory)
start WebFS-clone then start WebSite-clone (kind:Mandatory)
start dlm-clone then start WebFS-clone (kind:Mandatory)
Colocation Constraints:
WebSite-clone with ClusterIP-clone (score:INFINITY)
WebFS-clone with WebDataClone (score:INFINITY) (with-rsc-role:Master)
WebSite-clone with WebFS-clone (score:INFINITY)
WebFS-clone with dlm-clone (score:INFINITY)
Ticket Constraints:
----
Tell the cluster that it is now allowed to promote both instances to be DRBD
Primary (aka. master).
-----
[root@pcmk-1 ~]# pcs -f active_cfg resource update WebDataClone master-max=2
-----
Finally, load our configuration to the cluster, and re-enable the WebFS resource
(which we disabled earlier).
-----
[root@pcmk-1 ~]# pcs cluster cib-push active_cfg
CIB updated
[root@pcmk-1 ~]# pcs resource enable WebFS
-----
After all the processes are started, the status should look similar to this.
-----
[root@pcmk-1 ~]# pcs resource
Master/Slave Set: WebDataClone [WebData]
Masters: [ pcmk-1 pcmk-2 ]
Clone Set: dlm-clone [dlm]
Started: [ pcmk-1 pcmk-2 ]
Clone Set: ClusterIP-clone [ClusterIP] (unique)
ClusterIP:0 (ocf::heartbeat:IPaddr2): Started
ClusterIP:1 (ocf::heartbeat:IPaddr2): Started
Clone Set: WebFS-clone [WebFS]
Started: [ pcmk-1 pcmk-2 ]
Clone Set: WebSite-clone [WebSite]
Started: [ pcmk-1 pcmk-2 ]
-----
== Test Failover ==
Testing failover is left as an exercise for the reader.
For example, you can put one node into standby mode,
use `pcs status` to confirm that its ClusterIP clone was
moved to the other node, and use `arping` to verify that
packets are not being lost from any source host.
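One possible sequence is sketched below (assuming `pcs cluster standby` is available in your pcs version; newer versions use `pcs node standby` instead). Run `arping` against 192.168.122.120 from a separate client host before and after the standby to confirm no requests are lost.
----
[root@pcmk-1 ~]# pcs cluster standby pcmk-2
[root@pcmk-1 ~]# pcs status
[root@pcmk-1 ~]# pcs cluster unstandby pcmk-2
----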
[NOTE]
====
You may find that when a failed node rejoins the cluster,
both ClusterIP clones stay on one node, due to the
resource stickiness. While this works fine, it effectively eliminates
load-balancing and returns the cluster to an active-passive setup again.
You can avoid this by disabling stickiness for the IP address resource:
----
[root@pcmk-1 ~]# pcs resource meta ClusterIP resource-stickiness=0
----
====
diff --git a/doc/Clusters_from_Scratch/en-US/Ch-Active-Passive.txt b/doc/Clusters_from_Scratch/en-US/Ch-Active-Passive.txt
index bb3586ab7d..31e9eac2ef 100644
--- a/doc/Clusters_from_Scratch/en-US/Ch-Active-Passive.txt
+++ b/doc/Clusters_from_Scratch/en-US/Ch-Active-Passive.txt
@@ -1,391 +1,392 @@
+:compat-mode: legacy
= Create an Active/Passive Cluster =
== Explore the Existing Configuration ==
When Pacemaker starts up, it automatically records the number and details
of the nodes in the cluster, as well as which stack is being used and the
version of Pacemaker being used.
The first few lines of output should look like this:
----
[root@pcmk-1 ~]# pcs status
Cluster name: mycluster
WARNING: no stonith devices and stonith-enabled is not false
Stack: corosync
Current DC: pcmk-2 (version 1.1.16-12.el7_4.5-94ff4df) - partition with quorum
Last updated: Fri Jan 12 16:15:29 2018
Last change: Fri Jan 12 15:49:47 2018
2 nodes configured
0 resources configured
Online: [ pcmk-1 pcmk-2 ]
----
For those who are not afraid of XML, you can see the raw cluster
configuration and status by using the `pcs cluster cib` command.
.The last XML you'll see in this document
======
----
[root@pcmk-1 ~]# pcs cluster cib
----
[source,XML]
----
----
======
Before we make any changes, it's a good idea to check the validity of
the configuration.
----
[root@pcmk-1 ~]# crm_verify -L -V
error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
----
As you can see, the tool has found some errors.
In order to guarantee the safety of your data,
footnote:[If the data is corrupt, there is little point in continuing to make it available]
the default for STONITH
footnote:[A common node fencing mechanism. Used to ensure data integrity by powering off "bad" nodes]
in Pacemaker is *enabled*. However, it also knows when no STONITH configuration has been
supplied and reports this as a problem (since the cluster would not be
able to make progress if a situation requiring node fencing arose).
We will disable this feature for now and configure it later.
To disable STONITH, set the *stonith-enabled* cluster option to
false:
----
[root@pcmk-1 ~]# pcs property set stonith-enabled=false
[root@pcmk-1 ~]# crm_verify -L
----
With the new cluster option set, the configuration is now valid.
[WARNING]
=========
The use of `stonith-enabled=false` is completely inappropriate for a
production cluster. It tells the cluster to simply pretend that failed nodes
are safely powered off. Some vendors will refuse to support clusters that have
STONITH disabled.
We disable STONITH here only to defer the discussion of its
configuration, which can differ widely from one installation to the
next. See <<_what_is_stonith>> for information on why STONITH is important
and details on how to configure it.
=========
== Add a Resource ==
Our first resource will be a unique IP address that the cluster can bring up on
either node. Regardless of where any cluster service(s) are running, end
users need a consistent address to contact them on. Here, I will choose
192.168.122.120 as the floating address, give it the imaginative name ClusterIP
and tell the cluster to check whether it is running every 30 seconds.
[WARNING]
===========
The chosen address must not already be in use on the network.
Do not reuse an IP address one of the nodes already has configured.
===========
----
[root@pcmk-1 ~]# pcs resource create ClusterIP ocf:heartbeat:IPaddr2 \
ip=192.168.122.120 cidr_netmask=32 op monitor interval=30s
----
Another important piece of information here is *ocf:heartbeat:IPaddr2*.
This tells Pacemaker three things about the resource you want to add:
* The first field (*ocf* in this case) is the standard to which the resource
script conforms and where to find it.
* The second field (*heartbeat* in this case) is standard-specific; for OCF
resources, it tells the cluster which OCF namespace the resource script is in.
* The third field (*IPaddr2* in this case) is the name of the resource script.
To obtain a list of the available resource standards (the *ocf* part of
*ocf:heartbeat:IPaddr2*), run:
----
[root@pcmk-1 ~]# pcs resource standards
lsb
ocf
service
systemd
----
To obtain a list of the available OCF resource providers (the *heartbeat*
part of *ocf:heartbeat:IPaddr2*), run:
----
[root@pcmk-1 ~]# pcs resource providers
heartbeat
openstack
pacemaker
----
Finally, if you want to see all the resource agents available for
a specific OCF provider (the *IPaddr2* part of *ocf:heartbeat:IPaddr2*), run:
----
[root@pcmk-1 ~]# pcs resource agents ocf:heartbeat
apache
clvm
conntrackd
CTDB
db2
Delay
.
. (skipping lots of resources to save space)
.
symlink
tomcat
VirtualDomain
Xinetd
----
Now, verify that the IP resource has been added, and display the cluster's
status to see that it is now active:
----
[root@pcmk-1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: pcmk-1 (version 1.1.16-12.el7_4.5-94ff4df) - partition with quorum
Last updated: Fri Jan 12 17:44:40 2018
Last change: Fri Jan 12 17:44:26 2018
2 nodes configured
1 resources configured
Online: [ pcmk-1 pcmk-2 ]
Full list of resources:
ClusterIP (ocf::heartbeat:IPaddr2): Started pcmk-1
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
----
== Perform a Failover ==
Since our ultimate goal is high availability, we should test failover of
our new resource before moving on.
First, find the node on which the IP address is running.
----
[root@pcmk-1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: pcmk-1 (version 1.1.16-12.el7_4.5-94ff4df) - partition with quorum
Last updated: Fri Jan 12 17:44:40 2018
Last change: Fri Jan 12 17:44:26 2018
2 nodes configured
1 resources configured
Online: [ pcmk-1 pcmk-2 ]
Full list of resources:
ClusterIP (ocf::heartbeat:IPaddr2): Started pcmk-1
----
You can see that the status of the *ClusterIP* resource
is *Started* on a particular node (in this example, *pcmk-1*).
Shut down Pacemaker and Corosync on that machine to trigger a failover.
----
[root@pcmk-1 ~]# pcs cluster stop pcmk-1
Stopping Cluster (pacemaker)...
Stopping Cluster (corosync)...
----
[NOTE]
======
A cluster command such as +pcs cluster stop pass:[nodename]+ can be run
from any node in the cluster, not just the affected node.
======
Verify that pacemaker and corosync are no longer running:
----
[root@pcmk-1 ~]# pcs status
Error: cluster is not currently running on this node
----
Go to the other node, and check the cluster status.
----
[root@pcmk-2 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: pcmk-2 (version 1.1.16-12.el7_4.5-94ff4df) - partition with quorum
Last updated: Fri Jan 12 18:30:56 2018
Last change: Fri Jan 12 17:44:26 2018
2 nodes configured
1 resources configured
Online: [ pcmk-2 ]
OFFLINE: [ pcmk-1 ]
Full list of resources:
ClusterIP (ocf::heartbeat:IPaddr2): Started pcmk-2
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
----
Notice that *pcmk-1* is *OFFLINE* for cluster purposes (its *pcsd* is still
active, allowing it to receive `pcs` commands, but it is not participating in
the cluster).
Also notice that *ClusterIP* is now running on *pcmk-2* -- failover happened
automatically, and no errors are reported.
[IMPORTANT]
.Quorum
====
If a cluster splits into two (or more) groups of nodes that can no longer
communicate with each other (aka. _partitions_), _quorum_ is used to prevent
resources from starting on more nodes than desired, which would risk
data corruption.
A cluster has quorum when more than half of all known nodes are online in
the same partition, or for the mathematically inclined, whenever the following
equation is true:
....
total_nodes < 2 * active_nodes
....
For example, if a 5-node cluster split into 3- and 2-node partitions,
the 3-node partition would have quorum and could continue serving resources.
If a 6-node cluster split into two 3-node partitions, neither partition
would have quorum; pacemaker's default behavior in such cases is to
stop all resources, in order to prevent data corruption.
Two-node clusters are a special case. By the above definition,
a two-node cluster would only have quorum when both nodes are
running. This would make the creation of a two-node cluster pointless,
footnote:[Some would argue that two-node clusters are always pointless, but that is an argument for another time]
but corosync has the ability to treat two-node clusters as if only one node
is required for quorum.
The `pcs cluster setup` command will automatically configure *two_node: 1*
in +corosync.conf+, so a two-node cluster will "just work".
If you are using a different cluster shell, you will have to configure
+corosync.conf+ appropriately yourself.
====
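For reference, the +quorum+ stanza that `pcs cluster setup` generates in +/etc/corosync/corosync.conf+ (also shown in <<ap-corosync-conf>>) looks like this:
....
quorum {
    provider: corosync_votequorum
    two_node: 1
}
....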
Now, simulate node recovery by restarting the cluster stack on *pcmk-1*, and
check the cluster's status. (It may take a little while before the cluster
gets going on the node, but it eventually will look like the below.)
----
[root@pcmk-1 ~]# pcs cluster start pcmk-1
pcmk-1: Starting Cluster...
[root@pcmk-1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: pcmk-2 (version 1.1.16-12.el7_4.5-94ff4df) - partition with quorum
Last updated: Fri Jan 12 18:50:11 2018
Last change: Fri Jan 12 17:44:26 2018
2 nodes configured
1 resources configured
Online: [ pcmk-1 pcmk-2 ]
Full list of resources:
ClusterIP (ocf::heartbeat:IPaddr2): Started pcmk-2
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
----
== Prevent Resources from Moving after Recovery ==
In most circumstances, it is highly desirable to prevent healthy
resources from being moved around the cluster. Moving resources almost
always requires a period of downtime. For complex services such as
databases, this period can be quite long.
To address this, Pacemaker has the concept of resource _stickiness_,
which controls how strongly a service prefers to stay running where it
is. You may like to think of it as the "cost" of any downtime. By
default, Pacemaker assumes there is zero cost associated with moving
resources and will do so to achieve "optimal"
footnote:[Pacemaker's definition of optimal may not always agree with that of a
human. The order in which Pacemaker processes lists of resources and nodes
creates implicit preferences in situations where the administrator has not
explicitly specified them.]
resource placement. We can specify a different stickiness for every
resource, but it is often sufficient to change the default.
----
[root@pcmk-1 ~]# pcs resource defaults resource-stickiness=100
[root@pcmk-1 ~]# pcs resource defaults
resource-stickiness: 100
----
diff --git a/doc/Clusters_from_Scratch/en-US/Ch-Apache.txt b/doc/Clusters_from_Scratch/en-US/Ch-Apache.txt
index f460015de3..5d73526b83 100644
--- a/doc/Clusters_from_Scratch/en-US/Ch-Apache.txt
+++ b/doc/Clusters_from_Scratch/en-US/Ch-Apache.txt
@@ -1,415 +1,416 @@
+:compat-mode: legacy
= Add Apache HTTP Server as a Cluster Service =
indexterm:[Apache HTTP Server]
Now that we have a basic but functional active/passive two-node cluster,
we're ready to add some real services. We're going to start with
Apache HTTP Server because it is a feature of many clusters and relatively
simple to configure.
== Install Apache ==
Before continuing, we need to make sure Apache is installed on both
hosts. We also need the wget tool in order for the cluster to be able to check
the status of the Apache server.
----
# yum install -y httpd wget
# firewall-cmd --permanent --add-service=http
# firewall-cmd --reload
----
[IMPORTANT]
====
Do *not* enable the httpd service. Services that are intended to
be managed via the cluster software should never be managed by the OS.
It is often useful, however, to manually start the service, verify that
it works, then stop it again, before adding it to the cluster. This
allows you to resolve any non-cluster-related problems before continuing.
Since this is a simple example, we'll skip that step here.
====
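If you do choose to perform that manual check, a minimal sketch would be (run on each node; `httpd` should remain disabled afterwards):
----
# systemctl start httpd
# systemctl status httpd
# systemctl stop httpd
# systemctl is-enabled httpd
----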
== Create Website Documents ==
We need to create a page for Apache to serve. On &DISTRO; &DISTRO_VERSION;, the
default Apache document root is /var/www/html, so we'll create an index file
there. For the moment, we will simplify things by serving a static site
and manually synchronizing the data between the two nodes, so run this command
on both nodes:
-----
# cat <<-END >/var/www/html/index.html
 <html>
 <body>My Test Site - $(hostname)</body>
 </html>
END
-----
== Enable the Apache status URL ==
indexterm:[Apache HTTP Server,/server-status]
In order to monitor the health of your Apache instance, and recover it if
it fails, the resource agent used by Pacemaker assumes the server-status
URL is available. On both nodes, enable the URL with:
----
# cat <<-END >/etc/httpd/conf.d/status.conf
 <Location /server-status>
    SetHandler server-status
    Require local
 </Location>
END
----
[NOTE]
======
If you are using a different operating system, server-status may already be
enabled or may be configurable in a different location. If you are using
a version of Apache HTTP Server less than 2.4, the syntax will be different.
======
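For example, on Apache HTTP Server 2.2, an equivalent +status.conf+ would use the older access-control directives (a sketch):
----
<Location /server-status>
   SetHandler server-status
   Order deny,allow
   Deny from all
   Allow from 127.0.0.1
</Location>
----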
== Configure the Cluster ==
indexterm:[Apache HTTP Server,Apache resource configuration]
At this point, Apache is ready to go, and all that needs to be done is to
add it to the cluster. Let's call the resource WebSite. We need to use
an OCF resource script called apache in the heartbeat namespace.
footnote:[Compare the key used here, *ocf:heartbeat:apache*, with the one we
used earlier for the IP address, *ocf:heartbeat:IPaddr2*]
The script's only required parameter is the path to the main Apache
configuration file, and we'll tell the cluster to check once a
minute that Apache is still running.
----
[root@pcmk-1 ~]# pcs resource create WebSite ocf:heartbeat:apache \
configfile=/etc/httpd/conf/httpd.conf \
statusurl="http://localhost/server-status" \
op monitor interval=1min
----
By default, the operation timeout for all resources' start, stop, and monitor
operations is 20 seconds. In many cases, this timeout period is less than
a particular resource's advised timeout period. For the purposes of this
tutorial, we will adjust the global operation timeout default to 240 seconds.
----
[root@pcmk-1 ~]# pcs resource op defaults timeout=240s
[root@pcmk-1 ~]# pcs resource op defaults
timeout: 240s
----
[NOTE]
======
In a production cluster, it is usually better to adjust each resource's
start, stop, and monitor timeouts to values that are appropriate to
the behavior observed in your environment, rather than adjust
the global default.
======
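For example, to adjust WebSite's own timeouts instead of the global default (we do not do this here; the values shown simply match the operation timeouts that appear in the final configuration recap), something like the following could be used:
----
[root@pcmk-1 ~]# pcs resource update WebSite op start timeout=40s op stop timeout=60s
----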
After a short delay, we should see the cluster start Apache.
-----
[root@pcmk-1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: pcmk-2 (version 1.1.16-12.el7_4.5-94ff4df) - partition with quorum
Last updated: Fri Jan 12 12:40:41 2018
Last change: Fri Jan 12 12:40:05 2018
2 nodes configured
2 resources configured
Online: [ pcmk-1 pcmk-2 ]
Full list of resources:
ClusterIP (ocf::heartbeat:IPaddr2): Started pcmk-2
WebSite (ocf::heartbeat:apache): Started pcmk-1
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
-----
Wait a moment, the WebSite resource isn't running on the same host as our
IP address!
[NOTE]
======
If, in the `pcs status` output, you see the WebSite resource has
failed to start, then you've likely not enabled the status URL correctly.
You can check whether this is the problem by running:
....
wget -O - http://localhost/server-status
....
If you see *Not Found* or *Forbidden* in the output, then this is likely the
problem. Ensure that the *<Location /server-status>* block is correct.
======
== Ensure Resources Run on the Same Host ==
To reduce the load on any one machine, Pacemaker will generally try to
spread the configured resources across the cluster nodes. However, we
can tell the cluster that two resources are related and need to run on
the same host (or not at all). Here, we instruct the cluster that
WebSite can only run on the host that ClusterIP is active on.
To achieve this, we use a _colocation constraint_ that indicates it is
mandatory for WebSite to run on the same node as ClusterIP. The
"mandatory" part of the colocation constraint is indicated by using a
score of INFINITY. The INFINITY score also means that if ClusterIP is not
active anywhere, WebSite will not be permitted to run.
[NOTE]
=======
If ClusterIP is not active anywhere, WebSite will not be permitted to run
anywhere.
=======
[IMPORTANT]
===========
Colocation constraints are "directional", in that they imply certain
things about the order in which the two resources will have a location
chosen. In this case, we're saying that *WebSite* needs to be placed on the
same machine as *ClusterIP*, which implies that the cluster must know the
location of *ClusterIP* before choosing a location for *WebSite*.
===========
-----
[root@pcmk-1 ~]# pcs constraint colocation add WebSite with ClusterIP INFINITY
[root@pcmk-1 ~]# pcs constraint
Location Constraints:
Ordering Constraints:
Colocation Constraints:
WebSite with ClusterIP (score:INFINITY)
Ticket Constraints:
[root@pcmk-1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: pcmk-2 (version 1.1.16-12.el7_4.5-94ff4df) - partition with quorum
Last updated: Fri Jan 12 13:57:58 2018
Last change: Fri Jan 12 13:57:22 2018
2 nodes configured
2 resources configured
Online: [ pcmk-1 pcmk-2 ]
Full list of resources:
ClusterIP (ocf::heartbeat:IPaddr2): Started pcmk-2
WebSite (ocf::heartbeat:apache): Started pcmk-2
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
-----
== Ensure Resources Start and Stop in Order ==
Like many services, Apache can be configured to bind to specific
IP addresses on a host or to the wildcard IP address. If Apache
binds to the wildcard, it doesn't matter whether an IP address
is added before or after Apache starts; Apache will respond on
that IP just the same. However, if Apache binds only to certain IP
address(es), the order matters: If the address is added after Apache
starts, Apache won't respond on that address.
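For illustration only (this guide keeps Apache's default wildcard binding), binding Apache to just the floating address would use a directive like this in +httpd.conf+:
----
Listen 192.168.122.120:80
----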
To be sure our WebSite responds regardless of Apache's address configuration,
we need to make sure ClusterIP not only runs on the same node,
but starts before WebSite. A colocation constraint only ensures the
resources run together, not the order in which they are started and stopped.
We do this by adding an ordering constraint. By default, all order constraints
are mandatory, which means that the recovery of ClusterIP will also trigger the
recovery of WebSite.
-----
[root@pcmk-1 ~]# pcs constraint order ClusterIP then WebSite
Adding ClusterIP WebSite (kind: Mandatory) (Options: first-action=start then-action=start)
[root@pcmk-1 ~]# pcs constraint
Location Constraints:
Ordering Constraints:
start ClusterIP then start WebSite (kind:Mandatory)
Colocation Constraints:
WebSite with ClusterIP (score:INFINITY)
Ticket Constraints:
-----
== Prefer One Node Over Another ==
Pacemaker does not rely on any sort of hardware symmetry between nodes,
so it may well be that one machine is more powerful than the other. In
such cases, it makes sense to host the resources on the more powerful node if
it is available. To do this, we create a location constraint.
In the location constraint below, we are saying the WebSite resource
prefers the node pcmk-1 with a score of 50. Here, the score indicates
how badly we'd like the resource to run at this location.
-----
[root@pcmk-1 ~]# pcs constraint location WebSite prefers pcmk-1=50
[root@pcmk-1 ~]# pcs constraint
Location Constraints:
Resource: WebSite
Enabled on: pcmk-1 (score:50)
Ordering Constraints:
start ClusterIP then start WebSite (kind:Mandatory)
Colocation Constraints:
WebSite with ClusterIP (score:INFINITY)
Ticket Constraints:
[root@pcmk-1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: pcmk-2 (version 1.1.16-12.el7_4.5-94ff4df) - partition with quorum
Last updated: Fri Jan 12 14:11:49 2018
Last change: Fri Jan 12 14:11:20 2018
2 nodes configured
2 resources configured
Online: [ pcmk-1 pcmk-2 ]
Full list of resources:
ClusterIP (ocf::heartbeat:IPaddr2): Started pcmk-2
WebSite (ocf::heartbeat:apache): Started pcmk-2
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
-----
Wait a minute, the resources are still on pcmk-2!
Even though WebSite now prefers to run on pcmk-1, that preference is
(intentionally) less than the resource stickiness (how much we
preferred not to have unnecessary downtime).
To see the current placement scores, you can use a tool called crm_simulate.
----
[root@pcmk-1 ~]# crm_simulate -sL
Current cluster status:
Online: [ pcmk-1 pcmk-2 ]
ClusterIP (ocf::heartbeat:IPaddr2): Started pcmk-2
WebSite (ocf::heartbeat:apache): Started pcmk-2
Allocation scores:
native_color: ClusterIP allocation score on pcmk-1: 50
native_color: ClusterIP allocation score on pcmk-2: 200
native_color: WebSite allocation score on pcmk-1: -INFINITY
native_color: WebSite allocation score on pcmk-2: 100
Transition Summary:
----
== Move Resources Manually ==
There are always times when an administrator needs to override the
cluster and force resources to move to a specific location. In this example,
we will force the WebSite to move to pcmk-1 by
updating our previous location constraint with a score of INFINITY.
-----
[root@pcmk-1 ~]# pcs constraint location WebSite prefers pcmk-1=INFINITY
[root@pcmk-1 ~]# pcs constraint
Location Constraints:
Resource: WebSite
Enabled on: pcmk-1 (score:INFINITY)
Ordering Constraints:
start ClusterIP then start WebSite (kind:Mandatory)
Colocation Constraints:
WebSite with ClusterIP (score:INFINITY)
Ticket Constraints:
[root@pcmk-1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: pcmk-2 (version 1.1.16-12.el7_4.5-94ff4df) - partition with quorum
Last updated: Fri Jan 12 14:19:34 2018
Last change: Fri Jan 12 14:18:37 2018
2 nodes configured
2 resources configured
Online: [ pcmk-1 pcmk-2 ]
Full list of resources:
ClusterIP (ocf::heartbeat:IPaddr2): Started pcmk-1
WebSite (ocf::heartbeat:apache): Started pcmk-1
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
-----
Once we've finished whatever activity required us to move the
resources to pcmk-1 (in our case nothing), we can then allow the cluster
to resume normal operation by removing the new constraint. Since we previously
configured a default stickiness, the resources will remain on pcmk-1.
First, use the `--full` option to get the constraint's ID:
-----
[root@pcmk-1 ~]# pcs constraint --full
Location Constraints:
Resource: WebSite
Enabled on: pcmk-1 (score:INFINITY) (id:location-WebSite-pcmk-1-INFINITY)
Ordering Constraints:
start ClusterIP then start WebSite (kind:Mandatory) (id:order-ClusterIP-WebSite-mandatory)
Colocation Constraints:
WebSite with ClusterIP (score:INFINITY) (id:colocation-WebSite-ClusterIP-INFINITY)
Ticket Constraints:
-----
Then remove the desired constraint using its ID:
-----
[root@pcmk-1 ~]# pcs constraint remove location-WebSite-pcmk-1-INFINITY
[root@pcmk-1 ~]# pcs constraint
Location Constraints:
Ordering Constraints:
start ClusterIP then start WebSite (kind:Mandatory)
Colocation Constraints:
WebSite with ClusterIP (score:INFINITY)
Ticket Constraints:
-----
Note that the location constraint is now gone. If we check the cluster
status, we can also see that (as expected) the resources are still active
on pcmk-1.
-----
# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: pcmk-2 (version 1.1.16-12.el7_4.5-94ff4df) - partition with quorum
Last updated: Fri Jan 12 14:25:21 2018
Last change: Fri Jan 12 14:24:29 2018
2 nodes configured
2 resources configured
Online: [ pcmk-1 pcmk-2 ]
Full list of resources:
ClusterIP (ocf::heartbeat:IPaddr2): Started pcmk-1
WebSite (ocf::heartbeat:apache): Started pcmk-1
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
-----
diff --git a/doc/Clusters_from_Scratch/en-US/Ch-Installation.txt b/doc/Clusters_from_Scratch/en-US/Ch-Installation.txt
index 974b8ff331..98d8f93bed 100644
--- a/doc/Clusters_from_Scratch/en-US/Ch-Installation.txt
+++ b/doc/Clusters_from_Scratch/en-US/Ch-Installation.txt
@@ -1,489 +1,490 @@
+:compat-mode: legacy
= Installation =
== Install &DISTRO; &DISTRO_VERSION; ==
=== Boot the Install Image ===
Download the 4GB
http://isoredirect.centos.org/centos/7/isos/x86_64/CentOS-7-x86_64-DVD-1708.iso[&DISTRO;
&DISTRO_VERSION; DVD ISO]. Use the image to boot a virtual machine, or
burn it to a DVD or USB drive and boot a physical server from that.
After starting the installation, select your language and keyboard layout at
the welcome screen.
.&DISTRO; &DISTRO_VERSION; Installation Welcome Screen
image::images/Welcome.png["Welcome to &DISTRO; &DISTRO_VERSION;",align="center",scaledwidth="100%"]
=== Installation Options ===
At this point, you get a chance to tweak the default installation options.
.&DISTRO; &DISTRO_VERSION; Installation Summary Screen
image::images/Installer.png["&DISTRO; &DISTRO_VERSION; Installation Summary",align="center",scaledwidth="100%"]
Ignore the *SOFTWARE SELECTION* section (try saying that 10 times quickly). The
*Infrastructure Server* environment does have add-ons with much of the software
we need, but we will leave it as a *Minimal Install* here, so that we can see
exactly what software is required later.
=== Configure Network ===
In the *NETWORK & HOSTNAME* section:
- Edit *Host Name:* as desired. For this example, we will use
*pcmk-1.localdomain*.
- Select your network device, press *Configure...*, and manually assign a fixed
IP address. For this example, we'll use 192.168.122.101 under *IPv4 Settings*
(with an appropriate netmask, gateway and DNS server).
- Flip the switch to turn your network device on.
[IMPORTANT]
===========
Do not accept the default network settings.
Cluster machines should never obtain an IP address via DHCP, because
DHCP's periodic address renewal will interfere with corosync.
===========
=== Configure Disk ===
By default, the installer's automatic partitioning will use LVM (which allows
us to dynamically change the amount of space allocated to a given partition).
However, it allocates all free space to the +/+ (aka. *root*) partition, which
cannot be reduced in size later (dynamic increases are fine).
In order to follow the DRBD and GFS2 portions of this guide, we need to reserve
space on each machine for a replicated volume.
Enter the *INSTALLATION DESTINATION* section, ensure the hard drive you want to
install to is selected, select *I will configure partitioning*, and press *Done*.
In the *MANUAL PARTITIONING* screen that comes next, click the option to create
mountpoints automatically. Select the +/+ mountpoint, and reduce the desired
capacity by 1GiB or so. Select *Modify...* by the volume group name, and change
the *Size policy:* to *As large as possible*, to make the reclaimed space
available inside the LVM volume group. We'll add the additional volume later.
=== Configure Time Synchronization ===
It is highly recommended to enable NTP on your cluster nodes. Doing so
ensures all nodes agree on the current time and makes reading log files
significantly easier.
&DISTRO; will enable NTP automatically. If you want to change any time-related
settings (such as time zone or NTP server), you can do this in the
*TIME & DATE* section.
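Once the installed system is up, you can verify that time synchronization is working with commands such as these (a sketch; chronyd is the default NTP daemon on &DISTRO; &DISTRO_VERSION;):
----
[root@pcmk-1 ~]# timedatectl | grep NTP
[root@pcmk-1 ~]# chronyc sources
----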
=== Finish Install ===
Select *Begin Installation*. Once it completes, set a root password, and reboot
as instructed. For the purposes of this document, it is not necessary to create
any additional users. After the node reboots, you'll see a login prompt on
the console. Login using *root* and the password you created earlier.
.&DISTRO; &DISTRO_VERSION; Console Prompt
image::images/Console.png["&DISTRO; &DISTRO_VERSION; Console",align="center",scaledwidth="100%"]
[NOTE]
======
From here on, we're going to be working exclusively from the terminal.
======
== Configure the OS ==
=== Verify Networking ===
Ensure that the machine has the static IP address you configured earlier.
-----
[root@pcmk-1 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 52:54:00:d7:d6:08 brd ff:ff:ff:ff:ff:ff
inet 192.168.122.101/24 brd 192.168.122.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::5054:ff:fed7:d608/64 scope link
valid_lft forever preferred_lft forever
-----
[NOTE]
=====
If you ever need to change the node's IP address from the command line, follow
these instructions, replacing *${device}* with the name of your network device:
....
[root@pcmk-1 ~]# vi /etc/sysconfig/network-scripts/ifcfg-${device} # manually edit as desired
[root@pcmk-1 ~]# nmcli dev disconnect ${device}
[root@pcmk-1 ~]# nmcli con reload ${device}
[root@pcmk-1 ~]# nmcli con up ${device}
....
This makes *NetworkManager* aware that a change was made on the config file.
=====
Next, ensure that the routes are as expected:
-----
[root@pcmk-1 ~]# ip route
default via 192.168.122.1 dev eth0 proto static metric 100
192.168.122.0/24 dev eth0 proto kernel scope link src 192.168.122.101 metric 100
-----
If there is no line beginning with *default via*, then you may need to add a line such as
[source,Bash]
GATEWAY="192.168.122.1"
to the device configuration using the same process as described above for
changing the IP address.
Now, check for connectivity to the outside world. Start small by
testing whether we can reach the gateway we configured.
-----
[root@pcmk-1 ~]# ping -c 1 192.168.122.1
PING 192.168.122.1 (192.168.122.1) 56(84) bytes of data.
64 bytes from 192.168.122.1: icmp_req=1 ttl=64 time=0.249 ms
--- 192.168.122.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.249/0.249/0.249/0.000 ms
-----
Now try something external; choose a location you know should be available.
-----
[root@pcmk-1 ~]# ping -c 1 www.google.com
PING www.l.google.com (173.194.72.106) 56(84) bytes of data.
64 bytes from tf-in-f106.1e100.net (173.194.72.106): icmp_req=1 ttl=41 time=167 ms
--- www.l.google.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 167.618/167.618/167.618/0.000 ms
-----
=== Login Remotely ===
The console isn't a very friendly place to work from, so we will now
switch to accessing the machine remotely via SSH where we can
use copy and paste, etc.
From another host, check whether we can see the new host at all:
-----
beekhof@f16 ~ # ping -c 1 192.168.122.101
PING 192.168.122.101 (192.168.122.101) 56(84) bytes of data.
64 bytes from 192.168.122.101: icmp_req=1 ttl=64 time=1.01 ms
--- 192.168.122.101 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.012/1.012/1.012/0.000 ms
-----
Next, login as root via SSH.
-----
beekhof@f16 ~ # ssh -l root 192.168.122.101
The authenticity of host '192.168.122.101 (192.168.122.101)' can't be established.
ECDSA key fingerprint is 6e:b7:8f:e2:4c:94:43:54:a8:53:cc:20:0f:29:a4:e0.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.122.101' (ECDSA) to the list of known hosts.
root@192.168.122.101's password:
Last login: Tue Aug 11 13:14:39 2015
[root@pcmk-1 ~]#
-----
=== Apply Updates ===
Apply any package updates released since your installation image was created:
----
[root@pcmk-1 ~]# yum update
----
=== Use Short Node Names ===
During installation, we filled in the machine's fully qualified domain
name (FQDN), which can be rather long when it appears in cluster logs and
status output. See for yourself how the machine identifies itself:
(((Nodes, short name)))
----
[root@pcmk-1 ~]# uname -n
pcmk-1.localdomain
----
(((Nodes, Domain name (Query))))
We can use the `hostnamectl` tool to strip off the domain name:
----
[root@pcmk-1 ~]# hostnamectl set-hostname $(uname -n | sed s/\\..*//)
----
(((Nodes, Domain name (Remove from host name))))
Now, check that the machine is using the correct name:
----
[root@pcmk-1 ~]# uname -n
pcmk-1
----
== Repeat for Second Node ==
Repeat the Installation steps so far, so that you have two
nodes ready to have the cluster software installed.
For the purposes of this document, the additional node is called
pcmk-2 with address 192.168.122.102.
== Configure Communication Between Nodes ==
=== Configure Host Name Resolution ===
Confirm that you can communicate between the two new nodes:
----
[root@pcmk-1 ~]# ping -c 3 192.168.122.102
PING 192.168.122.102 (192.168.122.102) 56(84) bytes of data.
64 bytes from 192.168.122.102: icmp_seq=1 ttl=64 time=0.343 ms
64 bytes from 192.168.122.102: icmp_seq=2 ttl=64 time=0.402 ms
64 bytes from 192.168.122.102: icmp_seq=3 ttl=64 time=0.558 ms
--- 192.168.122.102 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.343/0.434/0.558/0.092 ms
----
Now we need to make sure we can communicate with the machines by their
name. If you have a DNS server, add additional entries for the two
machines. Otherwise, you'll need to add the machines to +/etc/hosts+
on both nodes. Below are the entries for my cluster nodes:
----
[root@pcmk-1 ~]# grep pcmk /etc/hosts
192.168.122.101 pcmk-1.clusterlabs.org pcmk-1
192.168.122.102 pcmk-2.clusterlabs.org pcmk-2
----
We can now verify the setup by again using ping:
----
[root@pcmk-1 ~]# ping -c 3 pcmk-2
PING pcmk-2.clusterlabs.org (192.168.122.102) 56(84) bytes of data.
64 bytes from pcmk-2.clusterlabs.org (192.168.122.102): icmp_seq=1 ttl=64 time=0.164 ms
64 bytes from pcmk-2.clusterlabs.org (192.168.122.102): icmp_seq=2 ttl=64 time=0.475 ms
64 bytes from pcmk-2.clusterlabs.org (192.168.122.102): icmp_seq=3 ttl=64 time=0.186 ms
--- pcmk-2.clusterlabs.org ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 0.164/0.275/0.475/0.141 ms
----
=== Configure SSH ===
SSH is a convenient and secure way to copy files and perform commands
remotely. For the purposes of this guide, we will create a key without a
password (using the -N option) so that we can perform remote actions
without being prompted.
(((SSH)))
[WARNING]
=========
Unprotected SSH keys (those without a password) are not recommended for servers exposed to the outside world.
We use them here only to simplify the demo.
=========
Create a new key and allow anyone with that key to log in:
.Creating and Activating a new SSH Key
----
[root@pcmk-1 ~]# ssh-keygen -t dsa -f ~/.ssh/id_dsa -N ""
Generating public/private dsa key pair.
Your identification has been saved in /root/.ssh/id_dsa.
Your public key has been saved in /root/.ssh/id_dsa.pub.
The key fingerprint is:
91:09:5c:82:5a:6a:50:08:4e:b2:0c:62:de:cc:74:44 root@pcmk-1.clusterlabs.org
The key's randomart image is:
+--[ DSA 1024]----+
|==.ooEo.. |
|X O + .o o |
| * A + |
| + . |
| . S |
| |
| |
| |
| |
+-----------------+
[root@pcmk-1 ~]# cp ~/.ssh/id_dsa.pub ~/.ssh/authorized_keys
----
(((Creating and Activating a new SSH Key)))
Install the key on the other node:
----
[root@pcmk-1 ~]# scp -r ~/.ssh pcmk-2:
The authenticity of host 'pcmk-2 (192.168.122.102)' can't be established.
ECDSA key fingerprint is a4:f5:b2:34:9d:86:2b:34:a2:87:37:b9:ca:68:52:ec.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'pcmk-2,192.168.122.102' (ECDSA) to the list of known hosts.
root@pcmk-2's password:
id_dsa.pub 100% 616 0.6KB/s 00:00
id_dsa 100% 672 0.7KB/s 00:00
known_hosts 100% 400 0.4KB/s 00:00
authorized_keys 100% 616 0.6KB/s 00:00
----
Test that you can now run commands remotely, without being prompted:
----
[root@pcmk-1 ~]# ssh pcmk-2 -- uname -n
pcmk-2
----
== Install the Cluster Software ==
Fire up a shell on both nodes and run the following to install pacemaker, and while
we're at it, some command-line tools to make our lives easier:
----
# yum install -y pacemaker pcs psmisc policycoreutils-python
----
[IMPORTANT]
===========
This document will show commands that need to be executed on both nodes
with a simple `#` prompt. Be sure to run them on each node individually.
===========
[NOTE]
===========
This document uses `pcs` for cluster management. Other alternatives,
such as `crmsh`, are available, but their syntax
will differ from the examples used here.
===========
== Configure the Cluster Software ==
=== Allow cluster services through firewall ===
On each node, allow cluster-related services through the local firewall:
----
# firewall-cmd --permanent --add-service=high-availability
success
# firewall-cmd --reload
success
----
[NOTE]
======
If you are using iptables directly, or some other firewall solution besides
firewalld, simply open the following ports, which can be used by various
clustering components: TCP ports 2224, 3121, and 21064, and UDP port 5405.
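For example, if you are managing iptables by hand, rules along these lines would
open the required ports (they are not persistent across reboots unless you save
them with your distribution's usual mechanism):
----
# iptables -I INPUT -p tcp -m multiport --dports 2224,3121,21064 -j ACCEPT
# iptables -I INPUT -p udp --dport 5405 -j ACCEPT
----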
If you run into any problems during testing, you might want to disable
the firewall and SELinux entirely until you have everything working.
This may create significant security issues and should not be performed on
machines that will be exposed to the outside world, but may be appropriate
during development and testing on a protected host.
To disable security measures:
----
[root@pcmk-1 ~]# setenforce 0
[root@pcmk-1 ~]# sed -i.bak "s/SELINUX=enforcing/SELINUX=permissive/g" /etc/selinux/config
[root@pcmk-1 ~]# systemctl mask firewalld.service
[root@pcmk-1 ~]# systemctl stop firewalld.service
[root@pcmk-1 ~]# iptables --flush
----
======
=== Enable pcs Daemon ===
Before the cluster can be configured, the pcs daemon must be started and enabled
to start at boot time on each node. This daemon works with the pcs command-line interface
to manage synchronizing the corosync configuration across all nodes in the cluster.
Start and enable the daemon by issuing the following commands on each node:
----
# systemctl start pcsd.service
# systemctl enable pcsd.service
ln -s '/usr/lib/systemd/system/pcsd.service' '/etc/systemd/system/multi-user.target.wants/pcsd.service'
----
The installed packages will create a *hacluster* user with a disabled password.
While this is fine for running `pcs` commands locally,
the account needs a login password in order to perform such tasks as syncing
the corosync configuration, or starting and stopping the cluster on other nodes.
This tutorial will make use of such commands,
so now we will set a password for the *hacluster* user, using the same password
on both nodes:
----
# passwd hacluster
Changing password for user hacluster.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
----
[NOTE]
===========
Alternatively, to script this process or set the password on a
different machine from the one you're logged into, you can use
the `--stdin` option for `passwd`:
----
[root@pcmk-1 ~]# ssh pcmk-2 -- 'echo mysupersecretpassword | passwd --stdin hacluster'
----
===========
=== Configure Corosync ===
On either node, use `pcs cluster auth` to authenticate as the *hacluster* user:
----
[root@pcmk-1 ~]# pcs cluster auth pcmk-1 pcmk-2
Username: hacluster
Password:
pcmk-1: Authorized
pcmk-2: Authorized
----
Next, use `pcs cluster setup` on the same node to generate and synchronize the
corosync configuration:
----
[root@pcmk-1 ~]# pcs cluster setup --name mycluster pcmk-1 pcmk-2
Shutting down pacemaker/corosync services...
Redirecting to /bin/systemctl stop pacemaker.service
Redirecting to /bin/systemctl stop corosync.service
Killing any remaining services...
Removing all cluster configuration files...
pcmk-1: Succeeded
pcmk-2: Succeeded
----
If you received an authorization error for either of those commands, make
sure you configured the *hacluster* user account on each node
with the same password.
[NOTE]
======
If you are not using `pcs` for cluster administration,
follow whatever procedures are appropriate for your tools
to create a corosync.conf and copy it to all nodes.
The `pcs` command will configure corosync to use UDP unicast transport; if you
choose to use multicast instead, choose a multicast address carefully.
footnote:[For some subtle issues, see
http://web.archive.org/web/20101211210054/http://29west.com/docs/THPM/multicast-address-assignment.html[Topics
in High-Performance Messaging: Multicast Address Assignment] or the more detailed treatment in
https://www.cisco.com/c/dam/en/us/support/docs/ip/ip-multicast/ipmlt_wp.pdf[Cisco's
Guidelines for Enterprise IP Multicast Address Allocation].]
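For illustration only, the +totem+ section of a multicast-based configuration
might look something like the following; the bind network and multicast address
shown here are placeholders you would choose for your own environment:
----
totem {
    version: 2
    cluster_name: mycluster
    transport: udp
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.122.0
        mcastaddr: 239.255.1.1
        mcastport: 5405
    }
}
----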
======
The final corosync.conf configuration on each node should look
something like the sample in the Sample Corosync Configuration appendix.
diff --git a/doc/Clusters_from_Scratch/en-US/Ch-Intro.txt b/doc/Clusters_from_Scratch/en-US/Ch-Intro.txt
index d8582b77e6..60ca19e900 100644
--- a/doc/Clusters_from_Scratch/en-US/Ch-Intro.txt
+++ b/doc/Clusters_from_Scratch/en-US/Ch-Intro.txt
@@ -1,27 +1,28 @@
+:compat-mode: legacy
= Read-Me-First =
== The Scope of this Document ==
Computer clusters can be used to provide highly available services or
resources. The redundancy of multiple machines is used to guard
against failures of many types.
This document will walk through the installation and setup of simple
clusters using the &DISTRO; distribution, version &DISTRO_VERSION;.
The clusters described here will use Pacemaker and Corosync to provide
resource management and messaging. Required packages and modifications
to their configuration files are described along with the use of the
Pacemaker command line tool for generating the XML used for cluster
control.
Pacemaker is a central component and provides the resource management
required in these systems. This management includes detecting and
recovering from the failure of various nodes, resources and services
under its control.
When more in-depth information is required, and for real-world usage,
please refer to the
https://www.clusterlabs.org/pacemaker/doc/[Pacemaker Explained] manual.
include::../../shared/en-US/pacemaker-intro.txt[]
diff --git a/doc/Clusters_from_Scratch/en-US/Ch-Shared-Storage.txt b/doc/Clusters_from_Scratch/en-US/Ch-Shared-Storage.txt
index d756fa2d63..2481bad389 100644
--- a/doc/Clusters_from_Scratch/en-US/Ch-Shared-Storage.txt
+++ b/doc/Clusters_from_Scratch/en-US/Ch-Shared-Storage.txt
@@ -1,529 +1,530 @@
+:compat-mode: legacy
= Replicate Storage Using DRBD =
Even if you're serving up static websites, having to manually synchronize
the contents of that website to all the machines in the cluster is not
ideal. For dynamic websites, such as a wiki, it's not even an option. Not
everyone can afford network-attached storage, but somehow the data needs
to be kept in sync.
Enter DRBD, which can be thought of as network-based RAID-1.
footnote:[See http://www.drbd.org/ for details.]
== Install the DRBD Packages ==
DRBD itself is included in the upstream kernel,footnote:[Since version 2.6.33]
but we do need some utilities to use it effectively.
CentOS does not ship these utilities, so we need to enable a third-party
repository to get them. Supported packages for many OSes are available from
DRBD's maker http://www.linbit.com/[LINBIT], but here we'll use the free
http://elrepo.org/[ELRepo] repository.
On both nodes, import the ELRepo package signing key, and enable the
repository:
----
# rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
# rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
----
Now, we can install the DRBD kernel module and utilities:
----
# yum install -y kmod-drbd84 drbd84-utils
----
DRBD will not be able to run under the default SELinux security policies.
If you are familiar with SELinux, you can modify the policies in a more
fine-grained manner, but here we will simply exempt DRBD processes from SELinux
control:
----
# semanage permissive -a drbd_t
----
We will configure DRBD to use port 7789, so allow that port from each host to
the other:
----
[root@pcmk-1 ~]# firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="192.168.122.102" port port="7789" protocol="tcp" accept'
success
[root@pcmk-1 ~]# firewall-cmd --reload
success
----
----
[root@pcmk-2 ~]# firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="192.168.122.101" port port="7789" protocol="tcp" accept'
success
[root@pcmk-2 ~]# firewall-cmd --reload
success
----
[NOTE]
======
In this example, we have only two nodes, and all network traffic is on the same LAN.
In production, it is recommended to use a dedicated, isolated network for cluster-related traffic,
so the firewall configuration would likely be different; one approach would be to
add the dedicated network interfaces to the trusted zone.
======
== Allocate a Disk Volume for DRBD ==
DRBD will need its own block device on each node. This can be
a physical disk partition or logical volume, of whatever size
you need for your data. For this document, we will use a
1GiB logical volume, which is more than sufficient for a single HTML file and
(later) GFS2 metadata.
----
[root@pcmk-1 ~]# vgdisplay | grep -e Name -e Free
VG Name centos_pcmk-1
Free PE / Size 382 / 1.49 GiB
[root@pcmk-1 ~]# lvcreate --name drbd-demo --size 1G centos_pcmk-1
Logical volume "drbd-demo" created
[root@pcmk-1 ~]# lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
drbd-demo centos_pcmk-1 -wi-a----- 1.00g
root centos_pcmk-1 -wi-ao---- 5.00g
swap centos_pcmk-1 -wi-ao---- 1.00g
----
Repeat for the second node, making sure to use the same size:
----
[root@pcmk-1 ~]# ssh pcmk-2 -- lvcreate --name drbd-demo --size 1G centos_pcmk-2
Logical volume "drbd-demo" created
----
== Configure DRBD ==
There is no series of commands for building a DRBD configuration, so simply
run this on both nodes to use this sample configuration:
----
# cat <<END >/etc/drbd.d/wwwdata.res
resource wwwdata {
 protocol C;
 meta-disk internal;
 device /dev/drbd1;
 syncer {
  verify-alg sha1;
 }
 net {
  allow-two-primaries;
 }
 on pcmk-1 {
  disk   /dev/centos_pcmk-1/drbd-demo;
  address  192.168.122.101:7789;
 }
 on pcmk-2 {
  disk   /dev/centos_pcmk-2/drbd-demo;
  address  192.168.122.102:7789;
 }
}
END
----
[IMPORTANT]
=========
Edit the file to use the hostnames, IP addresses and logical volume paths
of your nodes if they differ from the ones used in this guide.
=========
[NOTE]
=======
Detailed information on the directives used in this configuration (and
other alternatives) is available at
http://www.drbd.org/users-guide/ch-configure.html
The *allow-two-primaries* option would not normally be used in
an active/passive cluster. We are adding it here for the convenience
of changing to an active/active cluster later.
=======
== Initialize DRBD ==
With the configuration in place, we can now get DRBD running.
These commands create the local metadata for the DRBD resource,
ensure the DRBD kernel module is loaded, and bring up the DRBD resource.
Run them on one node:
----
[root@pcmk-1 ~]# drbdadm create-md wwwdata
initializing activity log
NOT initializing bitmap
Writing meta data...
New drbd meta data block successfully created.
[root@pcmk-1 ~]# modprobe drbd
[root@pcmk-1 ~]# drbdadm up wwwdata
----
We can confirm DRBD's status on this node:
----
[root@pcmk-1 ~]# cat /proc/drbd
version: 8.4.6 (api:1/proto:86-101)
GIT-hash: 833d830e0152d1e457fa7856e71e11248ccf3f70 build by phil@Build64R7, 2015-04-10 05:13:52
1: cs:WFConnection ro:Secondary/Unknown ds:Inconsistent/DUnknown C r----s
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:1048508
----
Because we have not yet initialized the data, this node's data
is marked as *Inconsistent*. Because we have not yet initialized
the second node, the local state is *WFConnection* (waiting for connection),
and the partner node's status is marked as *Unknown*.
Now, repeat the above commands on the second node. This time,
when we check the status, it shows:
----
[root@pcmk-2 ~]# cat /proc/drbd
version: 8.4.6 (api:1/proto:86-101)
GIT-hash: 833d830e0152d1e457fa7856e71e11248ccf3f70 build by phil@Build64R7, 2015-04-10 05:13:52
1: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:1048508
----
You can see the state has changed to *Connected*, meaning the two DRBD nodes
are communicating properly, and both nodes are in *Secondary* role
with *Inconsistent* data.
To make the data consistent, we need to tell DRBD which node should be
considered to have the correct data. In this case, since we are creating
a new resource, both have garbage, so we'll just pick pcmk-1
and run this command on it:
----
[root@pcmk-1 ~]# drbdadm primary --force wwwdata
----
[NOTE]
======
If you are using a different version of DRBD, the required syntax may be different.
See the documentation for your version for how to perform these commands.
======
If we check the status immediately, we'll see something like this:
----
[root@pcmk-1 ~]# cat /proc/drbd
version: 8.4.6 (api:1/proto:86-101)
GIT-hash: 833d830e0152d1e457fa7856e71e11248ccf3f70 build by phil@Build64R7, 2015-04-10 05:13:52
1: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
ns:2872 nr:0 dw:0 dr:3784 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:1045636
[>....................] sync'ed: 0.4% (1045636/1048508)K
finish: 0:10:53 speed: 1,436 (1,436) K/sec
----
We can see that this node has the *Primary* role, the partner node has
the *Secondary* role, this node's data is now considered *UpToDate*,
the partner node's data is still *Inconsistent*, and a progress bar
shows how far along the partner node is in synchronizing the data.
After a while, the sync should finish, and you'll see something like:
----
[root@pcmk-1 ~]# cat /proc/drbd
version: 8.4.6 (api:1/proto:86-101)
GIT-hash: 833d830e0152d1e457fa7856e71e11248ccf3f70 build by phil@Build64R7, 2015-04-10 05:13:52
1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
ns:1048508 nr:0 dw:0 dr:1049420 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
----
Both sets of data are now *UpToDate*, and we can proceed to creating
and populating a filesystem for our WebSite resource's documents.
== Populate the DRBD Disk ==
On the node with the primary role (pcmk-1 in this example),
create a filesystem on the DRBD device:
----
[root@pcmk-1 ~]# mkfs.xfs /dev/drbd1
meta-data=/dev/drbd1 isize=256 agcount=4, agsize=65532 blks
= sectsz=512 attr=2, projid32bit=1
= crc=0 finobt=0
data = bsize=4096 blocks=262127, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=0
log =internal log bsize=4096 blocks=853, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
----
[NOTE]
====
In this example, we create an xfs filesystem with no special options.
In a production environment, you should choose a filesystem type and
options that are suitable for your application.
====
Mount the newly created filesystem, populate it with our web document,
give it the same SELinux policy as the web document root,
then unmount it (the cluster will handle mounting and unmounting it later):
----
[root@pcmk-1 ~]# mount /dev/drbd1 /mnt
[root@pcmk-1 ~]# cat <<-END >/mnt/index.html
<html>
 <body>My Test Site - DRBD</body>
</html>
END
[root@pcmk-1 ~]# chcon -R --reference=/var/www/html /mnt
[root@pcmk-1 ~]# umount /dev/drbd1
----
== Configure the Cluster for the DRBD device ==
One handy feature `pcs` has is the ability to queue up several changes
into a file and commit those changes all at once. To do this, start by
populating the file with the current raw XML config from the CIB.
----
[root@pcmk-1 ~]# pcs cluster cib drbd_cfg
----
Using the `pcs -f` option, make changes to the configuration saved
in the +drbd_cfg+ file. These changes will not be seen by the cluster until
the +drbd_cfg+ file is pushed into the live cluster's CIB later.
Here, we create a cluster resource for the DRBD device, and an additional _clone_
resource to allow the resource to run on both nodes at the same time.
----
[root@pcmk-1 ~]# pcs -f drbd_cfg resource create WebData ocf:linbit:drbd \
drbd_resource=wwwdata op monitor interval=60s
[root@pcmk-1 ~]# pcs -f drbd_cfg resource master WebDataClone WebData \
master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 \
notify=true
[root@pcmk-1 ~]# pcs -f drbd_cfg resource show
ClusterIP (ocf::heartbeat:IPaddr2): Started
WebSite (ocf::heartbeat:apache): Started
Master/Slave Set: WebDataClone [WebData]
Stopped: [ pcmk-1 pcmk-2 ]
----
After you are satisfied with all the changes, you can commit
them all at once by pushing the drbd_cfg file into the live CIB.
----
[root@pcmk-1 ~]# pcs cluster cib-push drbd_cfg
CIB updated
----
Let's see what the cluster did with the new configuration:
----
[root@pcmk-1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: pcmk-1 (version 1.1.16-12.el7_4.5-94ff4df) - partition with quorum
Last updated: Fri Jan 12 09:29:41 2018
Last change: Fri Jan 12 09:29:25 2018
2 nodes configured
4 resources configured
Online: [ pcmk-1 pcmk-2 ]
Full list of resources:
ClusterIP (ocf::heartbeat:IPaddr2): Started pcmk-1
WebSite (ocf::heartbeat:apache): Started pcmk-1
Master/Slave Set: WebDataClone [WebData]
Masters: [ pcmk-1 ]
Slaves: [ pcmk-2 ]
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
----
We can see that *WebDataClone* (our DRBD device) is running as master (DRBD's
primary role) on *pcmk-1* and slave (DRBD's secondary role) on *pcmk-2*.
[IMPORTANT]
====
The resource agent should load the DRBD module when needed if it's not already
loaded. If that does not happen, configure your operating system to load the
module at boot time. For &DISTRO; &DISTRO_VERSION;, you would run this on both
nodes:
----
# echo drbd >/etc/modules-load.d/drbd.conf
----
====
== Configure the Cluster for the Filesystem ==
Now that we have a working DRBD device, we need to mount its filesystem.
In addition to defining the filesystem, we also need to
tell the cluster where it can be located (only on the DRBD Primary)
and when it is allowed to start (after the Primary was promoted).
We are going to take a shortcut when creating the resource this time.
Instead of explicitly saying we want the *ocf:heartbeat:Filesystem* script, we
are only going to ask for *Filesystem*. We can do this because we know there is only
one resource script named *Filesystem* available to pacemaker, and that pcs is smart
enough to fill in the *ocf:heartbeat:* portion for us correctly in the configuration.
If there were multiple *Filesystem* scripts from different OCF providers, we would need
to specify the exact one we wanted.
Once again, we will queue our changes to a file and then push the
new configuration to the cluster as the final step.
----
[root@pcmk-1 ~]# pcs cluster cib fs_cfg
[root@pcmk-1 ~]# pcs -f fs_cfg resource create WebFS Filesystem \
device="/dev/drbd1" directory="/var/www/html" fstype="xfs"
[root@pcmk-1 ~]# pcs -f fs_cfg constraint colocation add WebFS with WebDataClone INFINITY with-rsc-role=Master
[root@pcmk-1 ~]# pcs -f fs_cfg constraint order promote WebDataClone then start WebFS
Adding WebDataClone WebFS (kind: Mandatory) (Options: first-action=promote then-action=start)
----
We also need to tell the cluster that Apache needs to run on the same
machine as the filesystem and that it must be active before Apache can
start.
----
[root@pcmk-1 ~]# pcs -f fs_cfg constraint colocation add WebSite with WebFS INFINITY
[root@pcmk-1 ~]# pcs -f fs_cfg constraint order WebFS then WebSite
Adding WebFS WebSite (kind: Mandatory) (Options: first-action=start then-action=start)
----
Review the updated configuration.
----
[root@pcmk-1 ~]# pcs -f fs_cfg constraint
Location Constraints:
Ordering Constraints:
start ClusterIP then start WebSite (kind:Mandatory)
promote WebDataClone then start WebFS (kind:Mandatory)
start WebFS then start WebSite (kind:Mandatory)
Colocation Constraints:
WebSite with ClusterIP (score:INFINITY)
WebFS with WebDataClone (score:INFINITY) (with-rsc-role:Master)
WebSite with WebFS (score:INFINITY)
Ticket Constraints:
----
----
[root@pcmk-1 ~]# pcs -f fs_cfg resource show
ClusterIP (ocf::heartbeat:IPaddr2): Started
WebSite (ocf::heartbeat:apache): Started
Master/Slave Set: WebDataClone [WebData]
Masters: [ pcmk-1 ]
Slaves: [ pcmk-2 ]
WebFS (ocf::heartbeat:Filesystem): Stopped
----
After reviewing the new configuration, upload it and watch the
cluster put it into effect.
----
[root@pcmk-1 ~]# pcs cluster cib-push fs_cfg
[root@pcmk-1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: pcmk-1 (version 1.1.16-12.el7_4.5-94ff4df) - partition with quorum
Last updated: Fri Jan 12 09:34:11 2018
Last change: Fri Jan 12 09:34:09 2018
2 nodes configured
5 resources configured
Online: [ pcmk-1 pcmk-2 ]
Full list of resources:
ClusterIP (ocf::heartbeat:IPaddr2): Started pcmk-1
WebSite (ocf::heartbeat:apache): Started pcmk-1
Master/Slave Set: WebDataClone [WebData]
Masters: [ pcmk-1 ]
Slaves: [ pcmk-2 ]
WebFS (ocf::heartbeat:Filesystem): Started pcmk-1
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
----
== Test Cluster Failover ==
Previously, we used `pcs cluster stop pcmk-1` to stop all cluster
services on *pcmk-1*, failing over the cluster resources, but there is another
way to safely simulate node failure.
We can put the node into _standby mode_. Nodes in this state continue to
run corosync and pacemaker but are not allowed to run resources. Any resources
found active there will be moved elsewhere. This feature can be particularly
useful when performing system administration tasks such as updating packages
used by cluster resources.
Put the active node into standby mode, and observe the cluster move all
the resources to the other node. The node's status will
change to indicate that it can no longer host resources.
----
[root@pcmk-1 ~]# pcs cluster standby pcmk-1
[root@pcmk-1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: pcmk-1 (version 1.1.16-12.el7_4.5-94ff4df) - partition with quorum
Last updated: Fri Jan 12 09:36:49 2018
Last change: Fri Jan 12 09:36:43 2018
2 nodes configured
5 resources configured
Node pcmk-1 (1): standby
Online: [ pcmk-2 ]
Full list of resources:
ClusterIP (ocf::heartbeat:IPaddr2): Started pcmk-2
WebSite (ocf::heartbeat:apache): Started pcmk-2
Master/Slave Set: WebDataClone [WebData]
Masters: [ pcmk-2 ]
Stopped: [ pcmk-1 ]
WebFS (ocf::heartbeat:Filesystem): Started pcmk-2
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
----
Once we've done everything we needed to on pcmk-1 (in this case nothing,
we just wanted to see the resources move), we can allow the node to be a
full cluster member again.
----
[root@pcmk-1 ~]# pcs cluster unstandby pcmk-1
[root@pcmk-1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: pcmk-1 (version 1.1.16-12.el7_4.5-94ff4df) - partition with quorum
Last updated: Fri Jan 12 09:38:02 2018
Last change: Fri Jan 12 09:37:56 2018
2 nodes configured
5 resources configured
Online: [ pcmk-1 pcmk-2 ]
Full list of resources:
ClusterIP (ocf::heartbeat:IPaddr2): Started pcmk-2
WebSite (ocf::heartbeat:apache): Started pcmk-2
Master/Slave Set: WebDataClone [WebData]
Masters: [ pcmk-2 ]
Slaves: [ pcmk-1 ]
WebFS (ocf::heartbeat:Filesystem): Started pcmk-2
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
----
Notice that *pcmk-1* is back to the *Online* state, and that the cluster resources
stay where they are due to our resource stickiness settings configured earlier.
diff --git a/doc/Clusters_from_Scratch/en-US/Ch-Stonith.txt b/doc/Clusters_from_Scratch/en-US/Ch-Stonith.txt
index baaebeff52..51eb5a1a1a 100644
--- a/doc/Clusters_from_Scratch/en-US/Ch-Stonith.txt
+++ b/doc/Clusters_from_Scratch/en-US/Ch-Stonith.txt
@@ -1,165 +1,166 @@
+:compat-mode: legacy
= Configure STONITH =
== What is STONITH? ==
STONITH (Shoot The Other Node In The Head aka. fencing) protects your data from
being corrupted by rogue nodes or unintended concurrent access.
Just because a node is unresponsive doesn't mean it has stopped
accessing your data. The only way to be 100% sure that your data is
safe, is to use STONITH to ensure that the node is truly
offline before allowing the data to be accessed from another node.
STONITH also has a role to play in the event that a clustered service
cannot be stopped. In this case, the cluster uses STONITH to force the
whole node offline, thereby making it safe to start the service
elsewhere.
== Choose a STONITH Device ==
It is crucial that your STONITH device can allow the cluster to
differentiate between a node failure and a network failure.
A common mistake people make when choosing a STONITH device is to use a remote
power switch (such as many on-board IPMI controllers) that shares power with
the node it controls. If the power fails in such a case, the cluster cannot be
sure whether the node is really offline, or active and suffering from a network
fault, so the cluster will stop all resources to avoid a possible split-brain
situation.
Likewise, any device that relies on the machine being active (such as
SSH-based "devices" sometimes used during testing) is inappropriate.
== Configure the Cluster for STONITH ==
. Install the STONITH agent(s). To see what packages are available, run `yum
search fence-`. Be sure to install the package(s) on all cluster nodes.
. Configure the STONITH device itself to be able to fence your nodes and accept
fencing requests. This includes any necessary configuration on the device and
on the nodes, and any firewall or SELinux changes needed. Test the
communication between the device and your nodes.
. Find the correct STONITH agent script: `pcs stonith list`
. Find the parameters associated with the device: +pcs stonith describe pass:[agent_name]+
. Create a local copy of the CIB: `pcs cluster cib stonith_cfg`
. Create the fencing resource: +pcs -f stonith_cfg stonith create pass:[stonith_id
stonith_device_type [stonith_device_options]]+
+
Any flags that do not take arguments, such as +--ssl+, should be passed as +ssl=1+.
. Enable STONITH in the cluster: `pcs -f stonith_cfg property set stonith-enabled=true`
. If the device does not know how to fence nodes based on their uname,
you may also need to set the special *pcmk_host_map* parameter. See
`man pacemaker-fenced` for details.
. If the device does not support the *list* command, you may also need
to set the special *pcmk_host_list* and/or *pcmk_host_check*
parameters. See `man pacemaker-fenced` for details.
. If the device does not expect the victim to be specified with the
*port* parameter, you may also need to set the special
*pcmk_host_argument* parameter. See `man pacemaker-fenced` for details.
. Commit the new configuration: `pcs cluster cib-push stonith_cfg`
. Once the STONITH resource is running, test it (you might want to stop
the cluster on that machine first): +stonith_admin --reboot pass:[nodename]+
== Example ==
For this example, assume we have a chassis containing four nodes
and an IPMI device active on 10.0.0.1. Following the steps above
would go something like this:
Step 1: Install the *fence-agents-ipmilan* package on both nodes.
Step 2: Configure the IP address, authentication credentials, etc. in the IPMI device itself.
Step 3: Choose the *fence_ipmilan* STONITH agent.
Step 4: Obtain the agent's possible parameters:
----
[root@pcmk-1 ~]# pcs stonith describe fence_ipmilan
fence_ipmilan - Fence agent for IPMI
fence_ipmilan is an I/O Fencing agentwhich can be used with machines controlled by IPMI.This agent calls support software ipmitool (http://ipmitool.sf.net/). WARNING! This fence agent might report success before the node is powered off. You should use -m/method onoff if your fence device works correctly with that option.
Stonith options:
ipport: TCP/UDP port to use for connection with device
port: IP address or hostname of fencing device (together with --port-as-ip)
inet6_only: Forces agent to use IPv6 addresses only
ipaddr: IP Address or Hostname
passwd_script: Script to retrieve password
method: Method to fence (onoff|cycle)
inet4_only: Forces agent to use IPv4 addresses only
passwd: Login password or passphrase
lanplus: Use Lanplus to improve security of connection
auth: IPMI Lan Auth type.
action: Fencing Action WARNING: specifying 'action' is deprecated and not necessary with current Pacemaker versions.
cipher: Ciphersuite to use (same as ipmitool -C parameter)
target: Bridge IPMI requests to the remote target address
privlvl: Privilege level on IPMI device
timeout: Timeout (sec) for IPMI operation
login: Login Name
power_wait: Wait X seconds after issuing ON/OFF
login_timeout: Wait X seconds for cmd prompt after login
delay: Wait X seconds before fencing is started
power_timeout: Test X seconds for status change after ON/OFF
ipmitool_path: Path to ipmitool binary
shell_timeout: Wait X seconds for cmd prompt after issuing command
port_as_ip: Make "port/plug" to be an alias to IP address
retry_on: Count of attempts to retry power on
sudo: Use sudo (without password) when calling 3rd party sotfware.
priority: The priority of the stonith resource. Devices are tried in order of highest priority to lowest.
pcmk_host_map: A mapping of host names to ports numbers for devices that do not support host names. Eg. node1:1;node2:2,3 would tell the cluster to use port 1 for node1 and ports
2 and 3 for node2
pcmk_host_list: A list of machines controlled by this device (Optional unless pcmk_host_check=static-list).
pcmk_host_check: How to determine which machines are controlled by the device. Allowed values: dynamic-list (query the device), static-list (check the pcmk_host_list attribute),
none (assume every device can fence every machine)
pcmk_delay_max: Enable random delay for stonith actions and specify the maximum of random delay This prevents double fencing when using slow devices such as sbd. Use this to
enable random delay for stonith actions and specify the maximum of random delay.
pcmk_action_limit: The maximum number of actions can be performed in parallel on this device Cluster property concurrent-fencing=true needs to be configured first. Then use this
to specify the maximum number of actions can be performed in parallel on this device. -1 is unlimited.
Default operations:
monitor: interval=60s
----
Step 5: `pcs cluster cib stonith_cfg`
Step 6: Here are example parameters for creating our STONITH resource:
----
[root@pcmk-1 ~]# pcs -f stonith_cfg stonith create ipmi-fencing fence_ipmilan \
pcmk_host_list="pcmk-1 pcmk-2" ipaddr=10.0.0.1 login=testuser \
passwd=acd123 op monitor interval=60s
[root@pcmk-1 ~]# pcs -f stonith_cfg stonith
ipmi-fencing (stonith:fence_ipmilan): Stopped
----
Steps 7-10: Enable STONITH in the cluster:
----
[root@pcmk-1 ~]# pcs -f stonith_cfg property set stonith-enabled=true
[root@pcmk-1 ~]# pcs -f stonith_cfg property
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: mycluster
dc-version: 1.1.16-12.el7_4.5-94ff4df
have-watchdog: false
stonith-enabled: true
----
Step 11: `pcs cluster cib-push stonith_cfg`
Step 12: Test:
----
[root@pcmk-1 ~]# pcs cluster stop pcmk-2
[root@pcmk-1 ~]# stonith_admin --reboot pcmk-2
----
After a successful test, log in to any rebooted nodes, and start the cluster
(with `pcs cluster start`).
diff --git a/doc/Clusters_from_Scratch/en-US/Ch-Tools.txt b/doc/Clusters_from_Scratch/en-US/Ch-Tools.txt
index fda3476caa..c396c0010f 100644
--- a/doc/Clusters_from_Scratch/en-US/Ch-Tools.txt
+++ b/doc/Clusters_from_Scratch/en-US/Ch-Tools.txt
@@ -1,131 +1,132 @@
+:compat-mode: legacy
= Pacemaker Tools =
== Simplify administration using a cluster shell ==
In the dark past, configuring Pacemaker required the administrator to
read and write XML. In true UNIX style, there were also a number of
different commands that specialized in different aspects of querying
and updating the cluster.
All of that has been greatly simplified with the creation of unified
command-line shells (and GUIs) that hide all the messy XML
scaffolding.
These shells take all the individual aspects required for managing and
configuring a cluster, and pack them into one simple-to-use command
line tool.
They even allow you to queue up several changes and commit
them all at once.
Two popular command-line shells are `pcs` and
`crmsh`. This edition of Clusters from Scratch is based on `pcs`.
[NOTE]
===========
The two shells share many concepts but the scope, layout and syntax
do differ, so make sure you read the version of this guide that
corresponds to the software installed on your system.
===========
== Explore pcs ==
Start by taking some time to familiarize yourself with
what `pcs` can do.
----
[root@pcmk-1 ~]# pcs
Usage: pcs [-f file] [-h] [commands]...
Control and configure pacemaker and corosync.
Options:
-h, --help Display usage and exit.
-f file Perform actions on file instead of active CIB.
--debug Print all network traffic and external commands run.
--version Print pcs version information.
--request-timeout Timeout for each outgoing request to another node in
seconds. Default is 60s.
Commands:
cluster Configure cluster options and nodes.
resource Manage cluster resources.
stonith Manage fence devices.
constraint Manage resource constraints.
property Manage pacemaker properties.
acl Manage pacemaker access control lists.
qdevice Manage quorum device provider on the local host.
quorum Manage cluster quorum settings.
booth Manage booth (cluster ticket manager).
status View cluster status.
config View and manage cluster configuration.
pcsd Manage pcs daemon.
node Manage cluster nodes.
alert Manage pacemaker alerts.
----
As you can see, the different aspects of cluster management are separated
into categories. To discover the functionality available in each of these
categories, one can issue the command +pcs pass:[category] help+. Below
is an example of all the options available under the status category.
----
[root@pcmk-1 ~]# pcs status help
Usage: pcs status [commands]...
View current cluster and resource status
Commands:
[status] [--full | --hide-inactive]
View all information about the cluster and resources (--full provides
more details, --hide-inactive hides inactive resources).
resources [<resource id> | --full | --groups | --hide-inactive]
Show all currently configured resources or if a resource is specified
show the options for the configured resource. If --full is specified,
all configured resource options will be displayed. If --groups is
specified, only show groups (and their resources). If --hide-inactive
is specified, only show active resources.
groups
View currently configured groups and their resources.
cluster
View current cluster status.
corosync
View current membership information as seen by corosync.
quorum
View current quorum status.
qdevice [--full] [<cluster name>]
Show runtime status of specified model of quorum device provider. Using
--full will give more detailed output. If <cluster name> is specified,
only information about the specified cluster will be displayed.
nodes [corosync | both | config]
View current status of nodes from pacemaker. If 'corosync' is
specified, view current status of nodes from corosync instead. If
'both' is specified, view current status of nodes from both corosync &
pacemaker. If 'config' is specified, print nodes from corosync &
pacemaker configuration.
pcsd [<node>]...
Show current status of pcsd on nodes specified, or on all nodes
configured in the local cluster if no nodes are specified.
xml
View xml version of status (output from crm_mon -r -1 -X).
----
Additionally, if you are interested in the version and
supported cluster stack(s) available with your Pacemaker
installation, run:
----
[root@pcmk-1 ~]# pacemakerd --features
Pacemaker 1.1.16-12.el7_4.5 (Build: 94ff4df)
Supporting v3.0.12: generated-manpages agent-manpages ncurses libqb-logging libqb-ipc systemd nagios corosync-native atomic-attrd acls
----
diff --git a/doc/Clusters_from_Scratch/en-US/Ch-Verification.txt b/doc/Clusters_from_Scratch/en-US/Ch-Verification.txt
index b13f228754..19fcdf172e 100644
--- a/doc/Clusters_from_Scratch/en-US/Ch-Verification.txt
+++ b/doc/Clusters_from_Scratch/en-US/Ch-Verification.txt
@@ -1,147 +1,148 @@
+:compat-mode: legacy
= Start and Verify Cluster =
== Start the Cluster ==
Now that corosync is configured, it is time to start the cluster.
The command below will start corosync and pacemaker on both nodes
in the cluster. If you are issuing the start command from a different
node than the one you ran the `pcs cluster auth` command on earlier, you
must authenticate on the current node you are logged into before you will
be allowed to start the cluster.
----
[root@pcmk-1 ~]# pcs cluster start --all
pcmk-1: Starting Cluster...
pcmk-2: Starting Cluster...
----
[NOTE]
======
An alternative to using the `pcs cluster start --all` command
is to issue either of the below command sequences on each node in the
cluster separately:
----
# pcs cluster start
Starting Cluster...
----
or
----
# systemctl start corosync.service
# systemctl start pacemaker.service
----
======
[IMPORTANT]
====
In this example, we are not enabling the corosync and pacemaker services
to start at boot. If a cluster node fails or is rebooted, you will need to run
+pcs cluster start pass:[nodename]+ (or `--all`) to start the cluster on it.
While you could enable the services to start at boot, requiring a manual
start of cluster services gives you the opportunity to do a post-mortem investigation
of a node failure before returning it to the cluster.
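If you later decide that automatic startup is appropriate for your environment,
something along these lines, run once from any node, would enable it:
----
[root@pcmk-1 ~]# pcs cluster enable --all
----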
====
== Verify Corosync Installation ==
First, use `corosync-cfgtool` to check whether cluster communication is happy:
----
[root@pcmk-1 ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
id = 192.168.122.101
status = ring 0 active with no faults
----
We can see here that everything appears normal with our fixed IP
address (not a 127.0.0.x loopback address) listed as the *id*, and *no
faults* for the status.
If you see something different, you might want to start by checking
the node's network, firewall and selinux configurations.
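As a starting point, commands along these lines can help narrow down where the
problem lies (the exact firewall and SELinux tooling may differ on your platform):
----
[root@pcmk-1 ~]# ip addr show
[root@pcmk-1 ~]# firewall-cmd --list-services
[root@pcmk-1 ~]# getenforce
----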
Next, check the membership and quorum APIs:
----
[root@pcmk-1 ~]# corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.168.122.101)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(192.168.122.102)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 2
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
[root@pcmk-1 ~]# pcs status corosync
Membership information
--------------------------
Nodeid Votes Name
1 1 pcmk-1 (local)
2 1 pcmk-2
----
You should see both nodes have joined the cluster.
== Verify Pacemaker Installation ==
Now that we have confirmed that Corosync is functional, we can check
the rest of the stack. Pacemaker has already been started, so verify
the necessary processes are running:
----
[root@pcmk-1 ~]# ps axf
PID TTY STAT TIME COMMAND
2 ? S 0:00 [kthreadd]
...lots of processes...
1362 ? Ssl 0:35 corosync
1379 ? Ss 0:00 /usr/sbin/pacemakerd -f
1380 ? Ss 0:00 \_ /usr/libexec/pacemaker/pacemaker-based
1381 ? Ss 0:00 \_ /usr/libexec/pacemaker/pacemaker-fenced
1382 ? Ss 0:00 \_ /usr/libexec/pacemaker/pacemaker-execd
1383 ? Ss 0:00 \_ /usr/libexec/pacemaker/pacemaker-attrd
1384 ? Ss 0:00 \_ /usr/libexec/pacemaker/pacemaker-schedulerd
1385 ? Ss 0:00 \_ /usr/libexec/pacemaker/pacemaker-controld
----
If that looks OK, check the `pcs status` output:
----
[root@pcmk-1 ~]# pcs status
Cluster name: mycluster
WARNING: no stonith devices and stonith-enabled is not false
Stack: corosync
Current DC: pcmk-2 (version 1.1.16-12.el7_4.5-94ff4df) - partition with quorum
Last updated: Fri Jan 12 16:15:29 2018
Last change: Fri Jan 12 15:49:47 2018
2 nodes configured
0 resources configured
Online: [ pcmk-1 pcmk-2 ]
No active resources
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
----
Finally, ensure there are no startup errors (aside from messages relating
to not having STONITH configured, which are OK at this point):
----
[root@pcmk-1 ~]# journalctl | grep -i error
----
[NOTE]
======
Other operating systems may report startup errors in other locations,
for example +/var/log/messages+.
======
Repeat these checks on the other node. The results should be the same.
diff --git a/doc/Pacemaker_Administration/en-US/Ch-Agents.txt b/doc/Pacemaker_Administration/en-US/Ch-Agents.txt
index 1cb2e252a3..c5afcb6b4a 100644
--- a/doc/Pacemaker_Administration/en-US/Ch-Agents.txt
+++ b/doc/Pacemaker_Administration/en-US/Ch-Agents.txt
@@ -1,337 +1,338 @@
+:compat-mode: legacy
= Resource Agents =
== OCF Resource Agents ==
=== Location of Custom Scripts ===
indexterm:[OCF Resource Agents]
OCF Resource Agents are found in +/usr/lib/ocf/resource.d/pass:[provider]+
When creating your own agents, you are encouraged to create a new
directory under +/usr/lib/ocf/resource.d/+ so that they are not
confused with (or overwritten by) the agents shipped by existing providers.
So, for example, if you choose the provider name of bigCorp and want
a new resource named bigApp, you would create a resource agent called
+/usr/lib/ocf/resource.d/bigCorp/bigApp+ and define a resource:
[source,XML]
----
<primitive id="custom-app" class="ocf" provider="bigCorp" type="bigApp"/>
----
=== Actions ===
All OCF resource agents are required to implement the following actions.
.Required Actions for OCF Agents
[width="95%",cols="3m,3,7",options="header",align="center"]
|=========================================================
|Action
|Description
|Instructions
|start
|Start the resource
|Return 0 on success and an appropriate error code otherwise. Must not
report success until the resource is fully active.
indexterm:[start,OCF Action]
indexterm:[OCF,Action,start]
|stop
|Stop the resource
|Return 0 on success and an appropriate error code otherwise. Must not
report success until the resource is fully stopped.
indexterm:[stop,OCF Action]
indexterm:[OCF,Action,stop]
|monitor
|Check the resource's state
|Exit 0 if the resource is running, 7 if it is stopped, and anything
else if it is failed.
indexterm:[monitor,OCF Action]
indexterm:[OCF,Action,monitor]
NOTE: The monitor script should test the state of the resource on the local machine only.
|meta-data
|Describe the resource
|Provide information about this resource as an XML snippet. Exit with 0.
indexterm:[meta-data,OCF Action]
indexterm:[OCF,Action,meta-data]
NOTE: This is _not_ performed as root.
|validate-all
|Verify the supplied parameters
|Return 0 if parameters are valid, 2 if not valid, and 6 if resource is not configured.
indexterm:[validate-all,OCF Action]
indexterm:[OCF,Action,validate-all]
|=========================================================
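To make the contract above concrete, here is a deliberately minimal, purely
illustrative shell skeleton of an agent. The resource it "manages" (a hypothetical
bigApp marker file) is invented for the example; a real agent would also emit
proper XML metadata and source the OCF shell functions shipped with Pacemaker.
----
#!/bin/sh
# Hypothetical skeleton showing the required OCF actions.
STATE=/var/run/bigApp.active

case "$1" in
  start)
    # Must not report success until the resource is fully active.
    touch "$STATE" || exit 1
    exit 0 ;;
  stop)
    # Must not report success until the resource is fully stopped.
    rm -f "$STATE"
    exit 0 ;;
  monitor)
    # 0 = running, 7 = cleanly stopped, anything else = failed.
    [ -f "$STATE" ] && exit 0
    exit 7 ;;
  meta-data)
    # A real agent prints its XML metadata here and exits 0.
    exit 0 ;;
  validate-all)
    # Check the supplied parameters; 0 = valid.
    exit 0 ;;
  *)
    # Unimplemented action: OCF_ERR_UNIMPLEMENTED.
    exit 3 ;;
esac
----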
Additional requirements (not part of the OCF specification) are placed on
agents that will be used for advanced concepts such as clone resources.
.Optional Actions for OCF Resource Agents
[width="95%",cols="2m,6,3",options="header",align="center"]
|=========================================================
|Action
|Description
|Instructions
|promote
|Promote the local instance of a promotable clone resource to the master (primary) state.
|Return 0 on success
indexterm:[promote,OCF Action]
indexterm:[OCF,Action,promote]
|demote
|Demote the local instance of a promotable clone resource to the slave (secondary) state.
|Return 0 on success
indexterm:[demote,OCF Action]
indexterm:[OCF,Action,demote]
|notify
|Used by the cluster to send the agent pre- and post-notification
events telling the resource what has happened and will happen.
|Must not fail. Must exit with 0
indexterm:[notify,OCF Action]
indexterm:[OCF,Action,notify]
|=========================================================
One action specified in the OCF specs, +recover+, is not currently used by the
cluster. It is intended to be a variant of the +start+ action that tries to
recover a resource locally.
[IMPORTANT]
====
If you create a new OCF resource agent, use indexterm:[ocf-tester]`ocf-tester`
to verify that the agent complies with the OCF standard properly.
====
=== How are OCF Return Codes Interpreted? ===
The first thing the cluster does is to check the return code against
the expected result. If the result does not match the expected value,
then the operation is considered to have failed, and recovery action is
initiated.
There are three types of failure recovery:
.Types of recovery performed by the cluster
[width="95%",cols="1m,4,4",options="header",align="center"]
|=========================================================
|Type
|Description
|Action Taken by the Cluster
|soft
|A transient error occurred
|Restart the resource or move it to a new location
indexterm:[soft,OCF error]
indexterm:[OCF,error,soft]
|hard
|A non-transient error that may be specific to the current node occurred
|Move the resource elsewhere and prevent it from being retried on the current node
indexterm:[hard,OCF error]
indexterm:[OCF,error,hard]
|fatal
|A non-transient error that will be common to all cluster nodes (e.g. a bad configuration was specified)
|Stop the resource and prevent it from being started on any cluster node
indexterm:[fatal,OCF error]
indexterm:[OCF,error,fatal]
|=========================================================
[[s-ocf-return-codes]]
=== OCF Return Codes ===
The following table outlines the different OCF return codes and the type of
recovery the cluster will initiate when a failure code is received.
Although counterintuitive, even actions that return 0
(aka. +OCF_SUCCESS+) can be considered to have failed, if 0 was not
the expected return value.
.OCF Return Codes and their Recovery Types
[width="95%",cols="1m,<4m,<6,1m",options="header",align="center"]
|=========================================================
|RC
|OCF Alias
|Description
|RT
|0
|OCF_SUCCESS
|Success. The command completed successfully. This is the expected result for all start, stop, promote and demote commands.
indexterm:[Return Code,OCF_SUCCESS]
indexterm:[Return Code,0,OCF_SUCCESS]
|soft
|1
|OCF_ERR_GENERIC
|Generic "there was a problem" error code.
indexterm:[Return Code,OCF_ERR_GENERIC]
indexterm:[Return Code,1,OCF_ERR_GENERIC]
|soft
|2
|OCF_ERR_ARGS
|The resource's configuration is not valid on this machine. E.g. it refers to a location not found on the node.
indexterm:[Return Code,OCF_ERR_ARGS]
indexterm:[Return Code,2,OCF_ERR_ARGS]
|hard
|3
|OCF_ERR_UNIMPLEMENTED
|The requested action is not implemented.
indexterm:[Return Code,OCF_ERR_UNIMPLEMENTED]
indexterm:[Return Code,3,OCF_ERR_UNIMPLEMENTED]
|hard
|4
|OCF_ERR_PERM
|The resource agent does not have sufficient privileges to complete the task.
indexterm:[Return Code,OCF_ERR_PERM]
indexterm:[Return Code,4,OCF_ERR_PERM]
|hard
|5
|OCF_ERR_INSTALLED
|The tools required by the resource are not installed on this machine.
indexterm:[Return Code,OCF_ERR_INSTALLED]
indexterm:[Return Code,5,OCF_ERR_INSTALLED]
|hard
|6
|OCF_ERR_CONFIGURED
|The resource's configuration is invalid. E.g. required parameters are missing.
indexterm:[Return Code,OCF_ERR_CONFIGURED]
indexterm:[Return Code,6,OCF_ERR_CONFIGURED]
|fatal
|7
|OCF_NOT_RUNNING
|The resource is safely stopped. The cluster will not attempt to stop a resource that returns this for any action.
indexterm:[Return Code,OCF_NOT_RUNNING]
indexterm:[Return Code,7,OCF_NOT_RUNNING]
|N/A
|8
|OCF_RUNNING_MASTER
|The resource is running in master mode.
indexterm:[Return Code,OCF_RUNNING_MASTER]
indexterm:[Return Code,8,OCF_RUNNING_MASTER]
|soft
|9
|OCF_FAILED_MASTER
|The resource is in master mode but has failed. The resource will be demoted,
stopped and then started (and possibly promoted) again.
indexterm:[Return Code,OCF_FAILED_MASTER]
indexterm:[Return Code,9,OCF_FAILED_MASTER]
|soft
|other
|N/A
|Custom error code.
indexterm:[Return Code,other]
|soft
|=========================================================
Exceptions to the recovery handling described above:
* Probes (non-recurring monitor actions) that find a resource active
(or in master mode) will not result in recovery action unless it is
also found active elsewhere.
* The recovery action taken when a resource is found active more than
once is determined by the resource's +multiple-active+ property.
* Recurring actions that return +OCF_ERR_UNIMPLEMENTED+
do not cause any type of recovery.
== Init Script LSB Compliance ==
The relevant part of the
http://refspecs.linuxfoundation.org/lsb.shtml[LSB specifications]
includes a description of all the return codes listed here.
Assuming `some_service` is configured correctly and currently
inactive, the following sequence will help you determine if it is
LSB-compatible:
. Start (stopped):
+
----
# /etc/init.d/some_service start ; echo "result: $?"
----
+
.. Did the service start?
.. Did the command print *result: 0* (in addition to its usual output)?
+
. Status (running):
+
----
# /etc/init.d/some_service status ; echo "result: $?"
----
+
.. Did the script accept the command?
.. Did the script indicate the service was running?
.. Did the command print *result: 0* (in addition to its usual output)?
+
. Start (running):
+
----
# /etc/init.d/some_service start ; echo "result: $?"
----
+
.. Is the service still running?
.. Did the command print *result: 0* (in addition to its usual output)?
+
. Stop (running):
+
----
# /etc/init.d/some_service stop ; echo "result: $?"
----
+
.. Was the service stopped?
.. Did the command print *result: 0* (in addition to its usual output)?
+
. Status (stopped):
+
----
# /etc/init.d/some_service status ; echo "result: $?"
----
+
.. Did the script accept the command?
.. Did the script indicate the service was not running?
.. Did the command print *result: 3* (in addition to its usual output)?
+
. Stop (stopped):
+
----
# /etc/init.d/some_service stop ; echo "result: $?"
----
+
.. Is the service still stopped?
.. Did the command print *result: 0* (in addition to its usual output)?
+
. Status (failed):
+
.. This step is not readily testable and relies on manual inspection of the script.
+
The script can use one of the error codes (other than 3) listed in the
LSB spec to indicate that it is active but failed. This tells the
cluster that before moving the resource to another node, it needs to
stop it on the existing one first.
If the answer to any of the above questions is no, then the script is
not LSB-compliant. Your options are then to either fix the script or
write an OCF agent based on the existing script.
diff --git a/doc/Pacemaker_Administration/en-US/Ch-Cluster.txt b/doc/Pacemaker_Administration/en-US/Ch-Cluster.txt
index 3a14d7cdf3..c346d1ab7f 100644
--- a/doc/Pacemaker_Administration/en-US/Ch-Cluster.txt
+++ b/doc/Pacemaker_Administration/en-US/Ch-Cluster.txt
@@ -1,58 +1,59 @@
+:compat-mode: legacy
= The Cluster Layer =
== Pacemaker and the Cluster Layer ==
Pacemaker utilizes an underlying cluster layer for two purposes:
* obtaining quorum
* messaging between nodes
Currently, only Corosync 2 and later is supported for this layer.
== Managing Nodes in a Corosync-Based Cluster ==
=== Adding a New Corosync Node ===
indexterm:[Corosync,Add Cluster Node]
indexterm:[Add Cluster Node,Corosync]
To add a new node:
. Install Corosync and Pacemaker on the new host.
. Copy +/etc/corosync/corosync.conf+ and +/etc/corosync/authkey+ (if it exists)
from an existing node, as shown in the example after this list. You may need to
modify the *mcastaddr* option to match the new node's IP address.
. Start the cluster software on the new host. If a log message containing
"Invalid digest" appears from Corosync, the keys are not consistent between
the machines.
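For example, assuming the new host is reachable as *new-node* (a placeholder name)
and already has the cluster packages installed, the copy in the second step might
be done from an existing node like this:
----
# scp /etc/corosync/corosync.conf /etc/corosync/authkey new-node:/etc/corosync/
----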
=== Removing a Corosync Node ===
indexterm:[Corosync,Remove Cluster Node]
indexterm:[Remove Cluster Node,Corosync]
Because the messaging and membership layers are the authoritative
source for cluster nodes, deleting them from the CIB is not a complete
solution. First, one must arrange for corosync to forget about the
node (*pcmk-1* in the example below).
. Stop the cluster on the host to be removed. How to do this will vary with
your operating system and installed versions of cluster software, for example,
`pcs cluster stop` if you are using pcs for cluster management.
. From one of the remaining active cluster nodes, tell Pacemaker to forget
about the removed host, which will also delete the node from the CIB:
+
----
# crm_node -R pcmk-1
----
=== Replacing a Corosync Node ===
indexterm:[Corosync,Replace Cluster Node]
indexterm:[Replace Cluster Node,Corosync]
To replace an existing cluster node:
. Make sure the old node is completely stopped.
. Give the new machine the same hostname and IP address as the old one.
. Follow the procedure above for adding a node.
diff --git a/doc/Pacemaker_Administration/en-US/Ch-Configuring.txt b/doc/Pacemaker_Administration/en-US/Ch-Configuring.txt
index 473e5b5299..5ca9dfc32e 100644
--- a/doc/Pacemaker_Administration/en-US/Ch-Configuring.txt
+++ b/doc/Pacemaker_Administration/en-US/Ch-Configuring.txt
@@ -1,435 +1,436 @@
+:compat-mode: legacy
= Configuring Pacemaker =
== How Should the Configuration be Updated? ==
There are three basic rules for updating the cluster configuration:
* Rule 1 - Never edit the +cib.xml+ file manually. Ever. I'm not making this up.
* Rule 2 - Read Rule 1 again.
* Rule 3 - The cluster will notice if you ignored rules 1 & 2 and refuse to use the configuration.
Now that it is clear how 'not' to update the configuration, we can begin
to explain how you 'should'.
=== Editing the CIB Using XML ===
The most powerful tool for modifying the configuration is the
+cibadmin+ command. With +cibadmin+, you can query, add, remove, update
or replace any part of the configuration. All changes take effect immediately,
so there is no need to perform a reload-like operation.
The simplest way of using `cibadmin` is to use it to save the current
configuration to a temporary file, edit that file with your favorite
text or XML editor, and then upload the revised configuration. footnote:[This
process might appear to risk overwriting changes that happen after the initial
cibadmin call, but pacemaker will reject any update that is "too old". If the
CIB is updated in some other fashion after the initial cibadmin, the second
cibadmin will be rejected because the version number will be too low.]
.Safely using an editor to modify the cluster configuration
======
--------
# cibadmin --query > tmp.xml
# vi tmp.xml
# cibadmin --replace --xml-file tmp.xml
--------
======
Some of the better XML editors can make use of a Relax NG schema to
help make sure any changes you make are valid. The schema describing
the configuration can be found in +pacemaker.rng+, which may be
deployed in a location such as +/usr/share/pacemaker+ or
+/usr/lib/heartbeat+ depending on your operating system and how you
installed the software.
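For instance, assuming the schema is installed under +/usr/share/pacemaker+, a
saved copy of the configuration could be checked from the command line with
`xmllint`:
----
# xmllint --relaxng /usr/share/pacemaker/pacemaker.rng tmp.xml
----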
If you want to modify just one section of the configuration, you can
query and replace just that section to avoid modifying any others.
.Safely using an editor to modify only the resources section
======
--------
# cibadmin --query --scope resources > tmp.xml
# vi tmp.xml
# cibadmin --replace --scope resources --xml-file tmp.xml
--------
======
=== Quickly Deleting Part of the Configuration ===
Identify the object you wish to delete by XML tag and id. For example,
you might search the CIB for all STONITH-related configuration:
.Searching for STONITH-related configuration items
======
----
# cibadmin -Q | grep stonith
----
======
If you wanted to delete the +primitive+ tag with id +child_DoFencing+,
you would run:
----
# cibadmin --delete --xml-text '<primitive id="child_DoFencing"/>'
----
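Alternatively, recent versions of `cibadmin` can address the object with an
XPath expression instead of an XML snippet (the CIB accepts XPath-based
operations); a sketch equivalent to the command above:
----
# cibadmin --delete --xpath '//primitive[@id="child_DoFencing"]'
----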
=== Updating the Configuration Without Using XML ===
Most tasks can be performed with one of the other command-line
tools provided with pacemaker, avoiding the need to read or edit XML.
To enable STONITH for example, one could run:
----
# crm_attribute --name stonith-enabled --update 1
----
Or, to check whether *somenode* is allowed to run resources, there is:
----
# crm_standby --query --node somenode
----
Or, to find the current location of *my-test-rsc*, one can use:
----
# crm_resource --locate --resource my-test-rsc
----
Examples of using these tools for specific cases will be given throughout this
document where appropriate.
[[s-config-sandboxes]]
== Making Configuration Changes in a Sandbox ==
Often it is desirable to preview the effects of a series of changes
before updating the configuration all at once. For this purpose, we
have created `crm_shadow` which creates a
"shadow" copy of the configuration and arranges for all the command
line tools to use it.
To begin, simply invoke `crm_shadow --create` with
the name of a configuration to create footnote:[Shadow copies are
identified with a name, making it possible to have more than one.],
and follow the simple on-screen instructions.
[WARNING]
====
Read this section and the on-screen instructions carefully; failure to do so could
result in destroying the cluster's active configuration!
====
.Creating and displaying the active sandbox
======
----
# crm_shadow --create test
Setting up shadow instance
Type Ctrl-D to exit the crm_shadow shell
shadow[test]:
shadow[test] # crm_shadow --which
test
----
======
From this point on, all cluster commands will automatically use the
shadow copy instead of talking to the cluster's active configuration.
Once you have finished experimenting, you can either make the
changes active via the `--commit` option, or discard them using the `--delete`
option. Again, be sure to follow the on-screen instructions carefully!
For a full list of `crm_shadow` options and
commands, invoke it with the `--help` option.
.Use sandbox to make multiple changes all at once, discard them, and verify real configuration is untouched
======
----
shadow[test] # crm_failcount -r rsc_c001n01 -G
scope=status name=fail-count-rsc_c001n01 value=0
shadow[test] # crm_standby --node c001n02 -v on
shadow[test] # crm_standby --node c001n02 -G
scope=nodes name=standby value=on
shadow[test] # cibadmin --erase --force
shadow[test] # cibadmin --query
shadow[test] # crm_shadow --delete test --force
Now type Ctrl-D to exit the crm_shadow shell
shadow[test] # exit
# crm_shadow --which
No active shadow configuration defined
# cibadmin -Q
----
======
[[s-config-testing-changes]]
== Testing Your Configuration Changes ==
We saw previously how to make a series of changes to a "shadow" copy
of the configuration. Before loading the changes back into the
cluster (e.g. `crm_shadow --commit mytest --force`), it is often
advisable to simulate the effect of the changes with +crm_simulate+.
For example:
----
# crm_simulate --live-check -VVVVV --save-graph tmp.graph --save-dotfile tmp.dot
----
This tool uses the same library as the live cluster to show what it
would have done given the supplied input. Its output, in addition to
a significant amount of logging, is stored in two files +tmp.graph+
and +tmp.dot+. Both files are representations of the same thing: the
cluster's response to your changes.
The graph file stores the complete transition from the existing cluster state
to your desired new state, containing a list of all the actions, their
parameters and their pre-requisites. Because the transition graph is not
terribly easy to read, the tool also generates a Graphviz
footnote:[Graph visualization software. See http://www.graphviz.org/ for details.]
dot-file representing the same information.
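If Graphviz is installed, the dot file can be rendered to an image with the
standard Graphviz tools, for example:
----
# dot -Tsvg tmp.dot -o tmp.svg
----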
For information on the options supported by `crm_simulate`, use
its `--help` option.
.Interpreting the Graphviz output
* Arrows indicate ordering dependencies
* Dashed arrows indicate dependencies that are not present in the transition graph
* Actions with a dashed border of any color do not form part of the transition graph
* Actions with a green border form part of the transition graph
* Actions with a red border are ones the cluster would like to execute but cannot run
* Actions with a blue border are ones the cluster does not feel need to be executed
* Actions with orange text are pseudo/pretend actions that the cluster uses to simplify the graph
* Actions with black text are sent to the LRM
* Resource actions have text of the form 'rsc'_'action'_'interval' 'node'
* Any action depending on an action with a red border will not be able to execute.
* Loops are _really_ bad. Please report them to the development team.
=== Small Cluster Transition ===
image::images/Policy-Engine-small.png["An example transition graph as represented by Graphviz",width="16cm",height="6cm",align="center"]
In the above example, it appears that a new node, *pcmk-2*, has come
online and that the cluster is checking to make sure *rsc1*, *rsc2*
and *rsc3* are not already running there (indicated by the
*rscN_monitor_0* entries). Once it has done that, and assuming the resources
are not active there, it would like to stop *rsc1* and *rsc2*
on *pcmk-1* and move them to *pcmk-2*. However, there appears to be
some problem, and the cluster cannot or is not permitted to perform the
stop actions, which implies it also cannot perform the start actions.
For some reason, the cluster does not want to start *rsc3* anywhere.
=== Complex Cluster Transition ===
image::images/Policy-Engine-big.png["Another, slightly more complex, transition graph that you're not expected to be able to read",width="16cm",height="20cm",align="center"]
== Do I Need to Update the Configuration on All Cluster Nodes? ==
No. Any changes are immediately synchronized to the other active
members of the cluster.
To reduce bandwidth, the cluster only broadcasts the incremental
updates that result from your changes and uses MD5 checksums to ensure
that each copy is completely consistent.
== Working with CIB Properties ==
Although these fields can be written to by the user, in
most cases the cluster will overwrite any values specified by the
user with the "correct" ones.
To change the ones that can be specified by the user,
for example +admin_epoch+, one should use:
----
# cibadmin --modify --xml-text '<cib admin_epoch="42"/>'
----
A complete set of CIB properties will look something like this:
.Attributes set for a cib object
======
[source,XML]
-------
-------
======
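Because these properties live on the top-level +cib+ tag, a quick way to see
their current values is to query just the first line of the CIB (the exact
attributes present will vary by cluster and version):
----
# cibadmin --query | head -n 1
----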
== Querying and Setting Cluster Options ==
indexterm:[Querying,Cluster Option]
indexterm:[Setting,Cluster Option]
indexterm:[Cluster,Querying Options]
indexterm:[Cluster,Setting Options]
Cluster options can be queried and modified using the `crm_attribute` tool. To
get the current value of +cluster-delay+, you can run:
----
# crm_attribute --query --name cluster-delay
----
which is more simply written as
----
# crm_attribute -G -n cluster-delay
----
If a value is found, you'll see a result like this:
----
# crm_attribute -G -n cluster-delay
scope=crm_config name=cluster-delay value=60s
----
If no value is found, the tool will display an error:
----
# crm_attribute -G -n clusta-deway
scope=crm_config name=clusta-deway value=(null)
Error performing operation: No such device or address
----
To use a different value (for example, 30 seconds), simply run:
----
# crm_attribute --name cluster-delay --update 30s
----
To go back to the cluster's default value, you can delete the value, for example:
----
# crm_attribute --name cluster-delay --delete
Deleted crm_config option: id=cib-bootstrap-options-cluster-delay name=cluster-delay
----
=== When Options are Listed More Than Once ===
If you ever see something like the following, it means that the option you're modifying is present more than once.
.Deleting an option that is listed twice
=======
------
# crm_attribute --name batch-limit --delete
Multiple attributes match name=batch-limit in crm_config:
Value: 50 (set=cib-bootstrap-options, id=cib-bootstrap-options-batch-limit)
Value: 100 (set=custom, id=custom-batch-limit)
Please choose from one of the matches above and supply the 'id' with --id
-------
=======
In such cases, follow the on-screen instructions to perform the
requested action. To determine which value is currently being used by
the cluster, refer to the 'Rules' chapter of 'Pacemaker Explained'.
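For example, following the hint above and using the ids from the listing, the
duplicate defined in the +custom+ set could be removed with:
----
# crm_attribute --name batch-limit --delete --id custom-batch-limit
----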
[[s-remote-connection]]
== Connecting from a Remote Machine ==
indexterm:[Cluster,Remote connection]
indexterm:[Cluster,Remote administration]
Provided Pacemaker is installed on a machine, it is possible to
connect to the cluster even if the machine itself is not in the same
cluster. To do this, one simply sets up a number of environment
variables and runs the same commands as when working on a cluster
node.
.Environment Variables Used to Connect to Remote Instances of the CIB
[width="95%",cols="1m,1,<3",options="header",align="center"]
|=========================================================
|Environment Variable
|Default
|Description
|CIB_user
|$USER
|The user to connect as. Needs to be part of the +haclient+ group on
the target host.
indexterm:[Environment Variable,CIB_user]
|CIB_passwd
|
|The user's password. Read from the command line if unset.
indexterm:[Environment Variable,CIB_passwd]
|CIB_server
|localhost
|The host to contact
indexterm:[Environment Variable,CIB_server]
|CIB_port
|
|The port on which to contact the server; required.
indexterm:[Environment Variable,CIB_port]
|CIB_encrypted
|TRUE
|Whether to encrypt network traffic
indexterm:[Environment Variable,CIB_encrypted]
|=========================================================
So, if *c001n01* is an active cluster node and is listening on port 1234
for connections, and *someuser* is a member of the *haclient* group,
then the following would prompt for *someuser*'s password and return
the cluster's current configuration:
----
# export CIB_port=1234; export CIB_server=c001n01; export CIB_user=someuser;
# cibadmin -Q
----
For security reasons, the cluster does not listen for remote
connections by default. If you wish to allow remote access, you need
to set the +remote-tls-port+ (encrypted) or +remote-clear-port+
(unencrypted) CIB properties (i.e., those kept in the +cib+ tag, like
+num_updates+ and +epoch+).
.Extra top-level CIB properties for remote access
[width="95%",cols="1m,1,<3",options="header",align="center"]
|=========================================================
|Field
|Default
|Description
|remote-tls-port
|_none_
|Listen for encrypted remote connections on this port.
indexterm:[remote-tls-port,Remote Connection Option]
indexterm:[Remote Connection,Option,remote-tls-port]
|remote-clear-port
|_none_
|Listen for plaintext remote connections on this port.
indexterm:[remote-clear-port,Remote Connection Option]
indexterm:[Remote Connection,Option,remote-clear-port]
|=========================================================
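For example, to accept encrypted remote connections on TCP port 1234 (matching
the example port above), one could set the corresponding property directly on
the +cib+ tag:
----
# cibadmin --modify --xml-text '<cib remote-tls-port="1234"/>'
----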
diff --git a/doc/Pacemaker_Administration/en-US/Ch-Installing.txt b/doc/Pacemaker_Administration/en-US/Ch-Installing.txt
index dd227b32d8..75aa566c2d 100644
--- a/doc/Pacemaker_Administration/en-US/Ch-Installing.txt
+++ b/doc/Pacemaker_Administration/en-US/Ch-Installing.txt
@@ -1,104 +1,105 @@
+:compat-mode: legacy
= Installing Cluster Software =
== Installing the Software ==
Most major Linux distributions have pacemaker packages in their standard
package repositories, or the software can be built from source code.
See the http://clusterlabs.org/wiki/Install[Install wiki page] for details.
== Enabling Pacemaker ==
=== Enabling Pacemaker For Corosync version 2 and greater ===
High-level cluster management tools are available that can configure
corosync for you. This document focuses on the lower-level details
if you want to configure corosync yourself.
Corosync configuration is normally located in
+/etc/corosync/corosync.conf+.
.Corosync configuration file for two nodes *myhost1* and *myhost2*
====
----
totem {
version: 2
secauth: off
cluster_name: mycluster
transport: udpu
}
nodelist {
node {
ring0_addr: myhost1
nodeid: 1
}
node {
ring0_addr: myhost2
nodeid: 2
}
}
quorum {
provider: corosync_votequorum
two_node: 1
}
logging {
to_syslog: yes
}
----
====
.Corosync configuration file for three nodes *myhost1*, *myhost2* and *myhost3*
====
----
totem {
version: 2
secauth: off
cluster_name: mycluster
transport: udpu
}
nodelist {
node {
ring0_addr: myhost1
nodeid: 1
}
node {
ring0_addr: myhost2
nodeid: 2
}
node {
ring0_addr: myhost3
nodeid: 3
}
}
quorum {
provider: corosync_votequorum
}
logging {
to_syslog: yes
}
----
====
In the above examples, the +totem+ section defines what protocol version and
options (including encryption) to use,
footnote:[
Please consult the Corosync website (http://www.corosync.org/) and
documentation for details on enabling encryption and peer authentication for
the cluster.
]
and gives the cluster a unique name (+mycluster+ in these examples).
The +nodelist+ section lists the nodes in this cluster.
The +quorum+ section defines how the cluster uses quorum.
The important thing is that two-node clusters must be handled specially,
so +two_node: 1+ must be defined for two-node clusters (and only for two-node
clusters).
The +logging+ section should be self-explanatory.
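The configuration file should normally be identical on every node. As a minimal
sketch, assuming a systemd-based system and that the file was edited on
*myhost1*, copy it to the other node(s) and then start the cluster services on
each node:
----
# scp /etc/corosync/corosync.conf myhost2:/etc/corosync/corosync.conf
# systemctl start corosync pacemaker
----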
diff --git a/doc/Pacemaker_Administration/en-US/Ch-Intro.txt b/doc/Pacemaker_Administration/en-US/Ch-Intro.txt
index 60b750761c..2686733e2c 100644
--- a/doc/Pacemaker_Administration/en-US/Ch-Intro.txt
+++ b/doc/Pacemaker_Administration/en-US/Ch-Intro.txt
@@ -1,19 +1,20 @@
+:compat-mode: legacy
= Read-Me-First =
== The Scope of this Document ==
The purpose of this document is to help system administrators learn how to
manage a Pacemaker cluster.
System administrators may be interested in other parts of the
https://www.clusterlabs.org/pacemaker/doc/[Pacemaker documentation set],
such as 'Clusters from Scratch', a step-by-step guide to setting up an
example cluster, and 'Pacemaker Explained', an exhaustive reference for
cluster configuration.
Multiple higher-level tools (both command-line and GUI) are available to
simplify cluster management. However, this document focuses on the lower-level
command-line tools that come with Pacemaker itself. The concepts are applicable
to the higher-level tools, though the syntax would differ.
include::../../shared/en-US/pacemaker-intro.txt[]
diff --git a/doc/Pacemaker_Administration/en-US/Ch-Monitoring.txt b/doc/Pacemaker_Administration/en-US/Ch-Monitoring.txt
index b9edabae2a..9792d5ceff 100644
--- a/doc/Pacemaker_Administration/en-US/Ch-Monitoring.txt
+++ b/doc/Pacemaker_Administration/en-US/Ch-Monitoring.txt
@@ -1,60 +1,61 @@
+:compat-mode: legacy
= Monitoring a Pacemaker Cluster =
== Using crm_mon ==
The `crm_mon` utility displays the current state of an active cluster. It can
show the cluster status organized by node or by resource, and can be used in
either single-shot or dynamically updating mode. It can also display operations
performed and information about failures.
Using this tool, you can examine the state of the cluster for irregularities,
and see how it responds when you cause or simulate failures.
See the manual page or the output of `crm_mon --help` for a full description of
its many options.
.Sample output from crm_mon -1
======
-------
Stack: corosync
Current DC: node2 (version 2.0.0-1) - partition with quorum
Last updated: Mon Jan 29 12:18:42 2018
Last change: Mon Jan 29 12:18:40 2018 by root via crm_attribute on node3
5 nodes configured
2 resources configured
Online: [ node1 node2 node3 node4 node5 ]
Active resources:
Fencing (stonith:fence_xvm): Started node1
IP (ocf:heartbeat:IPaddr2): Started node2
-------
======
.Sample output from crm_mon -n -1
======
-------
Stack: corosync
Current DC: node2 (version 2.0.0-1) - partition with quorum
Last updated: Mon Jan 29 12:21:48 2018
Last change: Mon Jan 29 12:18:40 2018 by root via crm_attribute on node3
5 nodes configured
2 resources configured
Node node1: online
Fencing (stonith:fence_xvm): Started
Node node2: online
IP (ocf:heartbeat:IPaddr2): Started
Node node3: online
Node node4: online
Node node5: online
-------
======
As mentioned in an earlier chapter, the DC is the node where decisions are
made. The cluster elects a node to be DC as needed. The only significance of
the choice of DC to an administrator is the fact that its logs will have the
most information about why decisions were made.
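For example, the following commands give a one-shot status view that includes
resource fail counts, and a continuously updating console display,
respectively:
----
# crm_mon -1 --failcounts
# crm_mon
----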
diff --git a/doc/Pacemaker_Administration/en-US/Ch-Upgrading.txt b/doc/Pacemaker_Administration/en-US/Ch-Upgrading.txt
index e6c7ecc38a..166a98c4f7 100644
--- a/doc/Pacemaker_Administration/en-US/Ch-Upgrading.txt
+++ b/doc/Pacemaker_Administration/en-US/Ch-Upgrading.txt
@@ -1,454 +1,455 @@
+:compat-mode: legacy
= Upgrading a Pacemaker Cluster =
== Pacemaker Versioning ==
Pacemaker has an overall release version, plus separate version numbers for
certain internal components.
* *Pacemaker release version:* This version consists of three numbers
(_x.y.z_).
+
The major version number (the _x_ in _x.y.z_) increases when at least some
rolling upgrades are not possible from the previous major version. For example,
a rolling upgrade from 1.0.8 to 1.1.15 should always be supported, but a
rolling upgrade from 1.0.8 to 2.0.0 may not be possible.
+
The minor version (the _y_ in _x.y.z_) increases when there are significant
changes in cluster default behavior, tool behavior, and/or the API interface
(for software that utilizes Pacemaker libraries). The main benefit is to alert
you to pay closer attention to the release notes, to see if you might be
affected.
+
The release counter (the _z_ in _x.y.z_) is increased with all public releases
of Pacemaker, which typically include both bug fixes and new features.
* *CRM feature set:* This version number applies to the communication between
full cluster nodes, and is used to avoid problems in mixed-version clusters.
+
The major version number increases when nodes with different versions would not
work (rolling upgrades are not allowed). The minor version number increases
when mixed-version clusters are allowed only during rolling upgrades. The
minor-minor version number is ignored, but allows resource agents to detect
cluster support for various features. footnote:[
Before CRM feature set 3.1.0 (Pacemaker 2.0.0), the minor-minor
version number was treated the same as the minor version.
]
+
Pacemaker ensures that the longest-running node is the cluster's DC. This
ensures new features are not enabled until all nodes are upgraded to support
them.
* *LRMD protocol version:* This version applies to communication between a
Pacemaker Remote node and the cluster. It increases when an older cluster
node would have problems hosting the connection to a newer Pacemaker Remote
node. To avoid these problems, Pacemaker Remote nodes will accept connections
only from cluster nodes with the same or newer LRMD protocol version.
+
Unlike with CRM feature set differences between full cluster nodes,
mixed LRMD protocol versions between Pacemaker Remote nodes and full cluster
nodes are fine, as long as the Pacemaker Remote nodes have the older version.
This can be useful, for example, to host a legacy application in an
older operating system version used as a Pacemaker Remote node.
* *XML schema version:* Pacemaker’s configuration syntax — what's allowed in
the Configuration Information Base (CIB) — has its own version. This allows
the configuration syntax to evolve over time while still allowing clusters
with older configurations to work without change.
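Before planning an upgrade, it can be useful to check which of these versions a
node is actually running; for example (a minimal sketch):
----
# pacemakerd --version
# cibadmin --query | grep validate-with
----
The first command reports the installed Pacemaker release, and the second shows
which CIB schema version the configuration currently uses.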
== Upgrading Cluster Software ==
There are three approaches to upgrading a cluster, each with advantages and
disadvantages.
.Upgrade Methods
[width="95%",cols="s,6*",options="header",align="center"]
|=========================================================
|Method
|Available between all versions
|Can be used with Pacemaker Remote nodes
|Service outage during upgrade
|Service recovery during upgrade
|Exercises failover logic
|Allows change of messaging layer
indexterm:[Cluster,switching between stacks]
indexterm:[Changing cluster stack]
footnote:[Currently, Corosync version 2 and greater is the only supported
cluster stack, but other stacks have been supported by past versions, and may
be supported by future versions.]
|Complete cluster shutdown
indexterm:[upgrade,shutdown]
indexterm:[shutdown upgrade]
|yes
|yes
|always
|N/A
|no
|yes
|Rolling (node by node)
indexterm:[upgrade,rolling]
indexterm:[rolling upgrade]
|no
|yes
|always
footnote:[Any active resources will be moved off the node being upgraded,
so there will be at least a brief outage unless all resources can be
migrated "live".]
|yes
|yes
|no
|Detach and reattach
indexterm:[upgrade,reattach]
indexterm:[reattach upgrade]
|yes
|no
|only due to failure
|no
|no
|yes
|=========================================================
=== Complete Cluster Shutdown ===
In this scenario, one shuts down all cluster nodes and resources,
then upgrades all the nodes before restarting the cluster.
. On each node:
.. Shut down the cluster software (pacemaker and the messaging layer).
.. Upgrade the Pacemaker software. This may also include upgrading the
messaging layer and/or the underlying operating system.
.. Check the configuration with the `crm_verify` tool.
. On each node:
.. Start the cluster software.
Currently, only Corosync version 2 and greater is supported as the cluster
layer, but if another stack is supported in the future, the stack does not
need to be the same one as before the upgrade.
One variation of this approach is to build a new cluster on new hosts.
This allows the new version to be tested beforehand, and minimizes downtime by
having the new nodes ready to be placed in production as soon as the old nodes
are shut down.
=== Rolling (node by node) ===
In this scenario, each node is removed from the cluster, upgraded, and then
brought back online, until all nodes are running the newest version.
Special considerations when planning a rolling upgrade:
* If you plan to upgrade other cluster software -- such as the messaging layer --
at the same time, consult that software's documentation for its compatibility
with a rolling upgrade.
* If the major version number is changing in the Pacemaker version you are
upgrading to, a rolling upgrade may not be possible. Read the new version's
release notes (as well the information here) for what limitations may exist.
* If the CRM feature set is changing in the Pacemaker version you are upgrading
to, you should run a mixed-version cluster only during a small rolling
upgrade window. If one of the older nodes drops out of the cluster for any
reason, it will not be able to rejoin until it is upgraded.
* If the LRMD protocol version is changing, all cluster nodes should be
upgraded before upgrading any Pacemaker Remote nodes.
See the ClusterLabs wiki's
http://clusterlabs.org/wiki/ReleaseCalendar[Release Calendar] to figure out
whether the CRM feature set and/or LRMD protocol version changed between the
Pacemaker release versions in your rolling upgrade.
To perform a rolling upgrade, on each node in turn (a condensed command sketch
follows this list):
. Put the node into standby mode, and wait for any active resources
to be moved cleanly to another node. (This step is optional, but
allows you to deal with any resource issues before the upgrade.)
. Shut down the cluster software (pacemaker and the messaging layer) on the node.
. Upgrade the Pacemaker software. This may also include upgrading the
messaging layer and/or the underlying operating system.
. If this is the first node to be upgraded, check the configuration
with the `crm_verify` tool.
. Start the messaging layer.
This must be the same messaging layer (currently only Corosync version 2 and
greater is supported) that the rest of the cluster is using.
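Condensed into commands, one iteration might look like the following sketch,
run on (or against) the node being upgraded (*pcmk-1* here); the package
manager and package names are examples and will vary by distribution. The first
and last commands put the node into and out of standby; the middle three stop
the cluster software, upgrade it, and rejoin the cluster:
----
# crm_standby --node pcmk-1 -v on
# pcs cluster stop
# yum update pacemaker corosync
# pcs cluster start
# crm_standby --node pcmk-1 -v off
----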
[NOTE]
====
Even if a rolling upgrade from the current version of the cluster to the newest
version is not directly possible, it may be possible to perform a rolling
upgrade in multiple steps, by upgrading to an intermediate version first.
.Version Compatibility Table
[width="95%",cols="2*",options="header",align="center"]
|=========================================================
|Version being Installed
|Oldest Compatible Version
|Pacemaker 2.y.z
|Pacemaker 1.1.11
footnote:[Rolling upgrades from Pacemaker 1.1.z to 2.y.z are possible only if
the cluster uses corosync version 2 or greater as its messaging layer, and the
Cluster Information Base (CIB) uses schema 1.0 or higher in its validate-with
property.]
|Pacemaker 1.y.z
|Pacemaker 1.0.0
|Pacemaker 0.7.z
|Pacemaker 0.6.z
|=========================================================
====
=== Detach and Reattach ===
The reattach method is a variant of a complete cluster shutdown, where the
resources are left active and get re-detected when the cluster is restarted.
This method may not be used if the cluster contains any Pacemaker Remote nodes.
. Tell the cluster to stop managing services. This is required to allow the
services to remain active after the cluster shuts down.
+
----
# crm_attribute --name maintenance-mode --update true
----
. On each node, shut down the cluster software (pacemaker and the messaging
layer), and upgrade the Pacemaker software. This may also include upgrading
the messaging layer. While the underlying operating system may be upgraded
at the same time, that will be more likely to cause outages in the detached
services (certainly, if a reboot is required).
. Check the configuration with the `crm_verify` tool.
. On each node, start the cluster software.
Currently, only Corosync version 2 and greater is supported as the cluster
layer, but if another stack is supported in the future, the stack does not
need to be the same one as before the upgrade.
. Verify that the cluster re-detected all resources correctly.
. Allow the cluster to resume managing resources again:
+
----
# crm_attribute --name maintenance-mode --delete
----
== Upgrading the Configuration ==
indexterm:[upgrade,Configuration]
indexterm:[Configuration,upgrading]
The CIB schema version can change from one Pacemaker version to another.
After cluster software is upgraded, the cluster will continue to use
the older schema version that it was previously using. This can be useful, for
example, when administrators have written tools that modify the configuration,
and are based on the older syntax.
footnote:[As of Pacemaker 2.0.0, only schema versions pacemaker-1.0 and higher
are supported (excluding pacemaker-1.1, which was an experimental schema
now known as pacemaker-next).]
However, when using an older syntax, new features may be unavailable, and there
is a performance impact, since the cluster must do a non-persistent
configuration upgrade before each transition. So while using the old syntax is
possible, it is not advisable to continue using it indefinitely.
Even if you wish to continue using the old syntax, it is a good idea to
follow the upgrade procedure outlined below, except for the last step, to ensure
that the new software has no problems with your existing configuration (since it
will perform much the same task internally).
If you are brave, it is sufficient simply to run `cibadmin --upgrade`.
A more cautious approach would proceed like this:
. Create a shadow copy of the configuration. The later commands will automatically
operate on this copy, rather than the live configuration.
+
-----
# crm_shadow --create shadow
-----
. Verify the configuration is valid with the new software (which may be
stricter about syntax mistakes, or may have dropped support for deprecated
features):
indexterm:[Configuration,verify]
indexterm:[verify,Configuration]
+
-----
# crm_verify --live-check
-----
. Fix any errors or warnings.
. Perform the upgrade:
+
-----
# cibadmin --upgrade
-----
. If this step fails, there are three main possibilities:
.. The configuration was not valid to start with (did you do steps 2 and 3?).
.. The transformation failed - http://bugs.clusterlabs.org/[report a bug] or
mailto:users@clusterlabs.org?subject=Transformation%20failed%20during%20upgrade[email the project].
.. The transformation was successful but produced an invalid result.
+
If the result of the transformation is invalid, you may see a number of errors
from the validation library. If these are not helpful, visit the
http://clusterlabs.org/wiki/Validation_FAQ[Validation FAQ wiki page] and/or try
the manual upgrade procedure described below.
+
. Check the changes:
+
-----
# crm_shadow --diff
-----
+
If at this point there is anything about the upgrade that you wish to fine-tune
(for example, to change some of the automatic IDs), now is the time to do so:
+
-----
# crm_shadow --edit
-----
+
This will open the configuration in your favorite editor (whichever is
specified by the standard *$EDITOR* environment variable).
+
. Preview how the cluster will react:
+
------
# crm_simulate --live-check --save-dotfile shadow.dot -S
# dot -Tsvg shadow.dot -o shadow.svg
------
+
Verify that either no resource actions will occur or that you are
happy with any that are scheduled. If the output contains actions you
do not expect (possibly due to changes to the score calculations), you
may need to make further manual changes. See
<<s-config-testing-changes>> for further details on how to interpret
the output of `crm_simulate` and Graphviz.
+
. Upload the changes:
+
-----
# crm_shadow --commit shadow --force
-----
+
In the unlikely event this step fails, please report a bug.
[NOTE]
====
indexterm:[Configuration,upgrade manually]
It is also possible to perform the configuration upgrade steps manually:
. Locate the +upgrade*.xsl+ conversion scripts provided with the source code. These will often
be installed in a location such as +/usr/share/pacemaker+, or may be obtained from
the https://github.com/ClusterLabs/pacemaker/tree/master/xml[source repository].
. Run the conversion scripts that apply to your older version, for example:
indexterm:[XML,convert]
+
-----
# xsltproc /path/to/upgrade06.xsl config06.xml > config10.xml
-----
+
. Locate the +pacemaker.rng+ schema (in the same location as the xsl files).
. Check the XML validity: indexterm:[validate configuration]indexterm:[Configuration,validate XML]
+
----
# xmllint --relaxng /path/to/pacemaker.rng config10.xml
----
The advantage of this method is that it can be performed without the
cluster running, and any validation errors are often more informative.
====
== What Changed in 2.0 ==
The main goal of the 2.0 release was to remove support for deprecated syntax,
along with some small changes in default configuration behavior and tool
behavior. Highlights:
* Only Corosync version 2 and greater is now supported as the underlying
cluster layer. Support for Heartbeat and Corosync 1 (including CMAN) is
removed.
* The Pacemaker detail log file is now stored in
/var/log/pacemaker/pacemaker.log by default.
* The record-pending cluster property now defaults to true, which
allows status tools such as crm_mon to show operations that are in
progress.
* Support for a number of deprecated build options, environment variables,
and configuration settings has been removed.
* The +master+ tag has been deprecated in favor of using a +clone+ tag with the
new +promotable+ meta-attribute set to +true+. "Master/slave" clone resources
are now referred to as "promotable" clone resources, though it will take
longer for the full terminology change to be completed.
* The public API for Pacemaker libraries that software applications can use
has changed significantly.
For a detailed list of changes, see the release notes and the
https://wiki.clusterlabs.org/wiki/Pacemaker_2.0_Changes[Pacemaker 2.0 Changes]
page on the ClusterLabs wiki.
== What Changed in 1.0 ==
=== New ===
* Failure timeouts.
* New section for resource and operation defaults.
* Tool for making offline configuration changes.
* +Rules, instance_attributes, meta_attributes+ and sets of operations can be defined once and referenced in multiple places.
* The CIB now accepts XPath-based create/modify/delete operations. See the `cibadmin` help text.
* Multi-dimensional colocation and ordering constraints.
* The ability to connect to the CIB from non-cluster machines.
* Allow recurring actions to be triggered at known times.
=== Changed ===
* Syntax
** All resource and cluster options now use dashes (-) instead of underscores (_)
** +master_slave+ was renamed to +master+
** The +attributes+ container tag was removed
** The operation field +pre-req+ has been renamed +requires+
** All operations must have an +interval+, +start+/+stop+ must have it set to zero
* The +stonith-enabled+ option now defaults to true.
* The cluster will refuse to start resources if +stonith-enabled+ is true (or unset) and no STONITH resources have been defined
* The attributes of colocation and ordering constraints were renamed for clarity.
* +resource-failure-stickiness+ has been replaced by +migration-threshold+.
* The parameters for command-line tools have been made consistent
* Switched to 'RelaxNG' schema validation and 'libxml2' parser
** id fields are now XML IDs which have the following limitations:
*** id's cannot contain colons (:)
*** id's cannot begin with a number
*** id's must be globally unique (not just unique for that tag)
** Some fields (such as those in constraints that refer to resources) are IDREFs.
+
This means that they must reference existing resources or objects in
order for the configuration to be valid. Removing an object which is
referenced elsewhere will therefore fail.
+
** The CIB representation, from which a MD5 digest is calculated to verify CIBs on the nodes, has changed.
+
This means that every CIB update will require a full refresh on any
upgraded nodes until the cluster is fully upgraded to 1.0. This will
result in significant performance degradation and it is therefore
highly inadvisable to run a mixed 1.0/0.6 cluster for any longer than
absolutely necessary.
+
* Ping node information no longer needs to be added to _ha.cf_.
+
Simply include the lists of hosts in your ping resource(s).
=== Removed ===
* Syntax
** It is no longer possible to set resource meta options as top-level
attributes. Use meta attributes instead.
** Resource and operation defaults are no longer read from
+crm_config+.
diff --git a/doc/Pacemaker_Development/en-US/Ch-Coding.txt b/doc/Pacemaker_Development/en-US/Ch-Coding.txt
index ecb228ae39..c0bfde984c 100644
--- a/doc/Pacemaker_Development/en-US/Ch-Coding.txt
+++ b/doc/Pacemaker_Development/en-US/Ch-Coding.txt
@@ -1,198 +1,199 @@
+:compat-mode: legacy
= C Coding Guidelines =
////
We prefer [[ch-NAME]], but older versions of asciidoc don't deal well
with that construct for chapter headings
////
anchor:ch-c-coding[Chapter 2, C Coding Guidelines]
== C Boilerplate ==
indexterm:[C,boilerplate]
indexterm:[licensing,C boilerplate]
Every C file should start like this:
====
[source,C]
----
/*
 * Copyright <YYYY[-YYYY]> Andrew Beekhof <andrew@beekhof.net>
 *
 * This source code is licensed under <LICENSE> WITHOUT ANY WARRANTY.
 */
----
====
+<YYYY>+ is the year the code was 'originally' created.
footnote:[
See the U.S. Copyright Office's https://www.copyright.gov/comp3/["Compendium
of U.S. Copyright Office Practices"], particularly "Chapter 2200: Notice of
Copyright", sections 2205.1(A) and 2205.1(F), or
https://techwhirl.com/updating-copyright-notices/["Updating Copyright
Notices"] for a more readable summary.
]
If the code is modified in later years, add +-YYYY+ with the most recent year
of modification.
+<LICENSE>+ should follow the policy set forth in the
https://github.com/ClusterLabs/pacemaker/blob/master/COPYING[+COPYING+] file,
generally one of "GNU General Public License version 2 or later (GPLv2+)"
or "GNU Lesser General Public License version 2.1 or later (LGPLv2.1+)".
== Formatting ==
=== Whitespace ===
indexterm:[C,whitespace]
- Indentation must be 4 spaces, no tabs.
- Do not leave trailing whitespace.
=== Line Length ===
- Lines should be no longer than 80 characters unless limiting line length
significantly impacts readability.
=== Pointers ===
indexterm:[C,pointers]
- The +*+ goes by the variable name, not the type:
====
[source,C]
----
char *foo;
----
====
- Use a space before the +*+ and after the closing parenthesis in a cast:
====
[source,C]
----
char *foo = (char *) bar;
----
====
=== Functions ===
indexterm:[C,functions]
- In the function definition, put the return type on its own line, and place
the opening brace by itself on a line:
====
[source,C]
----
static int
foo(void)
{
----
====
- For functions with enough arguments that they must break to the next line,
align arguments with the first argument:
====
[source,C]
----
static int
function_name(int bar, const char *a, const char *b,
const char *c, const char *d)
{
----
====
- If a function name gets really long, start the arguments on their own line
with 8 spaces of indentation:
====
[source,C]
----
static int
really_really_long_function_name_this_is_getting_silly_now(
int bar, const char *a, const char *b,
const char *c, const char *d)
{
----
====
=== Control Statements (if, else, while, for, switch) ===
- The keyword is followed by one space, then left parenthesis without space,
condition, right parenthesis, space, opening bracket on the same line.
+else+ and +else if+ are on the same line with the ending brace and opening
brace, separated by a space:
====
[source,C]
----
if (condition1) {
statement1;
} else if (condition2) {
statement2;
} else {
statement3;
}
----
====
- In a +switch+ statement, +case+ is indented one level, and the body of each
+case+ is indented by another level. The opening brace is on the same line as
+switch+.
====
[source,C]
----
switch (expression) {
case 0:
command1;
break;
case 1:
command2;
break;
default:
command3;
}
----
====
=== Operators ===
indexterm:[C,operators]
- Operators have spaces from both sides. Do not rely on operator precedence;
use parentheses when mixing operators with different priority.
- No space is used after opening parenthesis and before closing parenthesis.
====
[source,C]
----
x = a + b - (c * d);
----
====
== Naming Conventions ==
indexterm:[C,naming]
- Any exposed symbols in libraries (non-+static+ function names, type names,
etc.) must begin with a prefix appropriate to the library, for example,
+crm_+, +pe_+, +st_+, +lrm_+.
== vim Settings ==
indexterm:[vim]
Developers who use +vim+ to edit source code can add the following settings to
their +~/.vimrc+ file to follow Pacemaker C coding guidelines:
----
" follow Pacemaker coding guidelines when editing C source code files
filetype plugin indent on
au FileType c setlocal expandtab tabstop=4 softtabstop=4 shiftwidth=4 textwidth=80
autocmd BufNewFile,BufRead *.h set filetype=c
let c_space_errors = 1
----
diff --git a/doc/Pacemaker_Development/en-US/Ch-FAQ.txt b/doc/Pacemaker_Development/en-US/Ch-FAQ.txt
index 065ba04d94..26490e5a84 100644
--- a/doc/Pacemaker_Development/en-US/Ch-FAQ.txt
+++ b/doc/Pacemaker_Development/en-US/Ch-FAQ.txt
@@ -1,112 +1,113 @@
+:compat-mode: legacy
= Frequently Asked Questions =
[qanda]
Who is this document intended for?::
Anyone who wishes to read and/or edit the Pacemaker source code.
Casual contributors should feel free to read just this FAQ, and
consult other chapters as needed.
Where is the source code for Pacemaker?::
indexterm:[downloads]
indexterm:[source code]
indexterm:[git,GitHub]
The https://github.com/ClusterLabs/pacemaker[source code for Pacemaker] is
kept on https://github.com/[GitHub], as are all software projects under the
https://github.com/ClusterLabs[ClusterLabs] umbrella. Pacemaker uses
https://git-scm.com/[Git] for source code management. If you are a Git newbie,
the http://schacon.github.io/git/gittutorial.html[gittutorial(7) man page]
is an excellent starting point. If you're familiar with using Git from the
command line, you can create a local copy of the Pacemaker source code with:
`git clone https://github.com/ClusterLabs/pacemaker.git pacemaker`
What are the different Git branches and repositories used for?::
indexterm:[branches]
* The https://github.com/ClusterLabs/pacemaker/tree/master[master branch]
is the primary branch used for development.
* The https://github.com/ClusterLabs/pacemaker/tree/1.1[1.1 branch] contains
the latest official release, and normally does not receive any changes.
During the release cycle, it will contain release candidates for the
next official release, and will receive only bug fixes.
* The https://github.com/ClusterLabs/pacemaker-1.0[1.0 repository] is a
frozen snapshot of the 1.0 release series, and is no longer developed.
* Messages will be posted to the
http://clusterlabs.org/mailman/listinfo/developers[developers@clusterlabs.org]
mailing list during the release cycle, with instructions about which
branches to use when submitting requests.
How do I build from the source code?::
See https://github.com/ClusterLabs/pacemaker/blob/master/INSTALL.md[INSTALL.md]
in the main checkout directory.
What coding style should I follow?::
You'll be mostly fine if you simply follow the example of existing code.
When unsure, see the relevant chapter of this document for language-specific
recommendations. Pacemaker has grown and evolved organically over many years,
so you will see much code that doesn't conform to the current guidelines. We
discourage making changes solely to bring code into conformance, as any change
requires developer time for review and opens the possibility of adding bugs.
However, new code should follow the guidelines, and it is fine to bring lines
of older code into conformance when modifying that code for other reasons.
How should I format my Git commit messages?::
indexterm:[git,commit messages]
See existing examples in the git log. The first line should look like
+change-type: affected-code: explanation+ where +change-type+ can be
+Fix+ or +Bug+ for most bug fixes, +Feature+ for new features, +Log+ for
changes to log messages or handling, +Doc+ for changes to documentation or
comments, or +Test+ for changes in CTS and regression tests. You will
sometimes see +Low+, +Med+ (or +Mid+) and +High+ used instead for bug fixes,
to indicate the severity. The important thing is that only commits with
+Feature+, +Fix+, +Bug+, or +High+ will automatically be included in the
change log for the next release. The +affected-code+ is the name of the
component(s) being changed, for example, +pacemaker-controld+ or
+libcrmcommon+ (it's more free-form, so don't sweat getting it exact). The
+explanation+ briefly describes the change. The git project recommends the
entire summary line stay under 50 characters, but more is fine if needed for
clarity. Except for the most simple and obvious of changes, the summary should
be followed by a blank line and then a longer explanation of 'why' the change
was made.
How can I test my changes?::
Most importantly, Pacemaker has regression tests for most major components;
these will automatically be run for any pull requests submitted through
GitHub. Additionally, Pacemaker's Cluster Test Suite (CTS) can be used to set
up a test cluster and run a wide variety of complex tests. This document will
have more detail on testing in the future.
What is Pacemaker's license?::
indexterm:[licensing]
Except where noted otherwise in the file itself, the source code for all
Pacemaker programs is licensed under version 2 or later of the GNU General
Public License (https://www.gnu.org/licenses/gpl-2.0.html[GPLv2+]), its
headers and libraries under version 2.1 or later of the less restrictive
GNU Lesser General Public License
(https://www.gnu.org/licenses/lgpl-2.1.html[LGPLv2.1+]),
its documentation under version 4.0 or later of the
Creative Commons Attribution-ShareAlike International Public License
(https://creativecommons.org/licenses/by-sa/4.0/legalcode[CC-BY-SA]),
and its init scripts under the
https://opensource.org/licenses/BSD-3-Clause[Revised BSD] license. If you find
any deviations from this policy, or wish to inquire about alternate licensing
arrangements, please e-mail mailto:andrew@beekhof.net[andrew@beekhof.net].
Licensing issues are also discussed on the
http://clusterlabs.org/wiki/License[ClusterLabs wiki].
How can I contribute my changes to the project?::
Contributions of bug fixes or new features are very much appreciated!
Patches can be submitted as
https://help.github.com/articles/using-pull-requests/[pull requests]
via GitHub (the preferred method, due to its excellent
https://github.com/features/[features]), or e-mailed to the
http://clusterlabs.org/mailman/listinfo/developers[developers@clusterlabs.org]
mailing list as an attachment in a format Git can import.
What if I still have questions?::
indexterm:[mailing lists]
Ask on the
http://clusterlabs.org/mailman/listinfo/developers[developers@clusterlabs.org]
mailing list for development-related questions, or on the
http://clusterlabs.org/mailman/listinfo/users[users@clusterlabs.org]
mailing list for general questions about using Pacemaker.
Developers often also hang out on http://freenode.net/[freenode's]
#clusterlabs IRC channel.
diff --git a/doc/Pacemaker_Development/en-US/Ch-Python.txt b/doc/Pacemaker_Development/en-US/Ch-Python.txt
index f372dd87d8..bd450fc3c6 100644
--- a/doc/Pacemaker_Development/en-US/Ch-Python.txt
+++ b/doc/Pacemaker_Development/en-US/Ch-Python.txt
@@ -1,154 +1,155 @@
+:compat-mode: legacy
= Python Coding Guidelines =
////
We prefer [[ch-NAME]], but older versions of asciidoc don't deal well
with that construct for chapter headings
////
anchor:ch-python-coding[Chapter 3, Python Coding Guidelines]
[[s-python-boilerplate]]
== Python Boilerplate ==
indexterm:[Python,boilerplate]
indexterm:[licensing,Python boilerplate]
If a Python file is meant to be executed (as opposed to imported), it should
have a +.in+ extension, and its first line should be:
====
----
#!@PYTHON@
----
====
which will be replaced with the appropriate python executable when Pacemaker is
built. To make that happen, add an AC_CONFIG_FILES() line to configure.ac, and
add the file name without .in to .gitignore (see existing examples).
After the above line if any, every Python file should start like this:
====
[source,Python]
----
"""
"""
# Pacemaker targets compatibility with Python 2.7 and 3.2+
from __future__ import print_function, unicode_literals, absolute_import, division
__copyright__ = "Copyright Andrew Beekhof "
__license__ = " WITHOUT ANY WARRANTY"
----
====
If the file is meant to be directly executed, the first line (the shebang)
should be +#!/usr/bin/python+. If it is meant to be imported, omit this line.
+<BRIEF DESCRIPTION>+ is obviously a brief description of the file's
purpose. The string may contain any other information typically used in
a Python file https://www.python.org/dev/peps/pep-0257/[docstring].
The +import+ statement is discussed further in <<s-python-future-imports>>.
+<YYYY>+ is the year the code was 'originally' created.
footnote:[
See the U.S. Copyright Office's https://www.copyright.gov/comp3/["Compendium
of U.S. Copyright Office Practices"], particularly "Chapter 2200: Notice of
Copyright", sections 2205.1(A) and 2205.1(F), or
https://techwhirl.com/updating-copyright-notices/["Updating Copyright
Notices"] for a more readable summary.
]
If the code is modified in later years, add +-YYYY+ with the most recent year
of modification.
+<LICENSE>+ should follow the policy set forth in the
https://github.com/ClusterLabs/pacemaker/blob/master/COPYING[+COPYING+] file,
generally one of "GNU General Public License version 2 or later (GPLv2+)"
or "GNU Lesser General Public License version 2.1 or later (LGPLv2.1+)".
== Python Compatibility ==
indexterm:[Python,2]
indexterm:[Python,3]
indexterm:[Python,versions]
Pacemaker targets compatibility with Python 2.7, and Python 3.2 and
later. These versions have added features to be more compatible with each
other, allowing us to support both the 2 and 3 series with the same code. It is
a good idea to test any changes with both Python 2 and 3.
[[s-python-future-imports]]
=== Python Future Imports ===
The future imports used in <<s-python-boilerplate>> mean:
* All print statements must use parentheses, and printing without a newline
is accomplished with the +end=' '+ parameter rather than a trailing comma.
* All string literals will be treated as Unicode (the +u+ prefix is
unnecessary, and must not be used, because it is not available in Python 3.2).
* Local modules must be imported using +from . import+ (rather than just
+import+). To import one item from a local module, use
+from .modulename import+ (rather than +from modulename import+).
* Division using +/+ will always return a floating-point result (use +//+ if
you want the integer floor instead).
=== Other Python Compatibility Requirements ===
* When specifying an exception variable, always use +as+ instead of a comma
(e.g. +except Exception as e+ or +except (TypeError, IOError) as e+).
Use +e.args+ to access the error arguments (instead of iterating over or
subscripting +e+).
* Use +in+ (not +has_key()+) to determine if a dictionary has a particular key.
* Always use the I/O functions from the +io+ module rather than the native
I/O functions (e.g. +io.open()+ rather than +open()+).
* When opening a file, always use the +t+ (text) or +b+ (binary) mode flag.
* When creating classes, always specify a parent class to ensure that it is a
"new-style" class (e.g. +class Foo(object):+ rather than +class Foo:+)
* Be aware of the bytes type added in Python 3. Many places where strings are
used in Python 2 use bytes or bytearrays in Python 3 (for example, the pipes
used with +subprocess.Popen()+). Code should handle both possibilities.
* Be aware that the +items()+, +keys()+, and +values()+ methods of dictionaries
return lists in Python 2 and views in Python 3. In many cases, no special
handling is required, but if the code needs to use list methods on the
result, cast the result to list first.
* Do not raise or catch strings as exceptions (e.g. +raise "Bad thing"+).
* Do not use the +cmp+ parameter of sorting functions (use +key+ instead, if
needed) or the +$$__cmp__()$$+ method of classes (implement rich comparison
methods such as +$$__lt__()$$+ instead, if needed).
* Do not use the +buffer+ type.
* Do not use features not available in all targeted Python versions. Common
examples include:
** The +html+, +ipaddress+, and +UserDict+ modules
** The +subprocess.run()+ function
** The +subprocess.DEVNULL+ constant
** +subprocess+ module-specific exceptions
=== Python Usages to Avoid ===
Avoid the following if possible, otherwise research the compatibility issues
involved (hacky workarounds are often available):
* long integers
* octal integer literals
* mixed binary and string data in one data file or variable
* metaclasses
* +locale.strcoll+ and +locale.strxfrm+
* the +configparser+ and +ConfigParser+ modules
* importing compatibility modules such as +six+ (so we don't have
to add them to Pacemaker's dependencies)
== Formatting Python Code ==
indexterm:[Python,formatting]
* Indentation must be 4 spaces, no tabs.
* Do not leave trailing whitespace.
* Lines should be no longer than 80 characters unless limiting line length
significantly impacts readability. For Python, this limitation is
flexible since breaking a line often impacts readability, but
definitely keep it under 120 characters.
* Where not conflicting with this style guide, it is recommended (but not
required) to follow https://www.python.org/dev/peps/pep-0008/[PEP 8].
* It is recommended (but not required) to format Python code such that
`pylint --disable=line-too-long,too-many-lines,too-many-instance-attributes,too-many-arguments,too-many-statements`
produces minimal complaints (even better if you don't need to disable all
those checks).
diff --git a/doc/Pacemaker_Explained/en-US/Ap-FAQ.txt b/doc/Pacemaker_Explained/en-US/Ap-FAQ.txt
index 2e4228f541..b89bf4af04 100644
--- a/doc/Pacemaker_Explained/en-US/Ap-FAQ.txt
+++ b/doc/Pacemaker_Explained/en-US/Ap-FAQ.txt
@@ -1,59 +1,60 @@
+:compat-mode: legacy
[appendix]
[[ap-faq]]
== FAQ ==
[qanda]
Why is the Project Called Pacemaker?::
indexterm:[Pacemaker]
First of all, the reason it's not called the CRM is because of the abundance
of terms footnote:[http://en.wikipedia.org/wiki/CRM] that are commonly
abbreviated to those three letters. The Pacemaker name came from Kham,
footnote:[http://khamsouk.souvanlasy.com/] a good friend of Pacemaker
developer Andrew Beekhof's, and was originally used by a Java GUI that Beekhof
was prototyping in early 2007. Alas, other commitments prevented the GUI from
progressing much and, when it came time to choose a name for this project,
Lars Marowsky-Bree suggested it was an even better fit for an independent CRM.
The idea stems from the analogy between the role of this software and that of
the little device that keeps the human heart pumping. Pacemaker monitors the
cluster and intervenes when necessary to ensure the smooth operation of the
services it provides.
There were a number of other names (and acronyms) tossed around, but suffice it to
say "Pacemaker" was the best.
Why was the Pacemaker Project Created?::
Pacemaker was spun off from an earlier project called
http://linux-ha.org/[Heartbeat], which combined a cluster layer and a cluster
resource manager. The CRM was made into its own project, Pacemaker, in order to:
* support both the Corosync and Heartbeat cluster stacks equally (Heartbeat
support was dropped in Pacemaker 2.0, as the project had faded out by then)
* decouple the release cycles of the cluster layer and the cluster resource
manager, which were at very different stages of their life-cycles
* foster clearer package boundaries, thus leading to better and more stable interfaces
What Messaging Layers are Supported?::
indexterm:[Messaging Layers]
* http://www.corosync.org/[Corosync] version 2 and greater
* Historically, Pacemaker 1 also supported Corosync version 1 (with either
CMAN or a pacemaker plugin) and Heartbeat. Support for these legacy stacks
was dropped with Pacemaker 2.0.
Where Can I Get Pre-built Packages?::
Most major Linux distributions have pacemaker packages in their standard
package repositories. See the http://clusterlabs.org/wiki/Install[Install wiki
page] for details.
What Versions of Pacemaker Are Supported?::
Some Linux distributions (such as Red Hat Enterprise Linux and SUSE Linux
Enterprise) offer technical support for their customers; contact them
for details of such support.
For help within the community (mailing lists, IRC, etc.) from Pacemaker developers
and users, refer to the http://clusterlabs.org/wiki/Releases[Releases wiki page]
for an up-to-date list of versions considered to be supported by the project.
When seeking assistance, please try to ensure you have one of these versions.
diff --git a/doc/Pacemaker_Explained/en-US/Ap-Samples.txt b/doc/Pacemaker_Explained/en-US/Ap-Samples.txt
index 4494c18d55..f1dadec145 100644
--- a/doc/Pacemaker_Explained/en-US/Ap-Samples.txt
+++ b/doc/Pacemaker_Explained/en-US/Ap-Samples.txt
@@ -1,152 +1,153 @@
+:compat-mode: legacy
[appendix]
== Sample Configurations ==
=== Empty ===
.An Empty Configuration
=======
[source,XML]
-------
-------
=======
=== Simple ===
.A simple configuration with two nodes, some cluster options and a resource
=======
[source,XML]
-------
-------
=======
In the above example, we have one resource (an IP address) that we check
every five minutes and will run on host +c001n01+ until either the
resource fails 10 times or the host shuts down.
=== Advanced Configuration ===
.An advanced configuration with groups, clones and STONITH
=======
[source,XML]
-------
-------
=======
diff --git a/doc/Pacemaker_Explained/en-US/Ch-Advanced-Options.txt b/doc/Pacemaker_Explained/en-US/Ch-Advanced-Options.txt
index c662c60a49..d0aba3914f 100644
--- a/doc/Pacemaker_Explained/en-US/Ch-Advanced-Options.txt
+++ b/doc/Pacemaker_Explained/en-US/Ch-Advanced-Options.txt
@@ -1,728 +1,729 @@
+:compat-mode: legacy
= Advanced Configuration =
[[s-recurring-start]]
== Specifying When Recurring Actions are Performed ==
By default, recurring actions are scheduled relative to when the
resource started. So if your resource was last started at 14:32 and
you have a backup set to be performed every 24 hours, then the backup
will always run in the middle of the business day -- hardly
desirable.
To specify a date and time that the operation should be relative to, set
the operation's +interval-origin+. The cluster uses this point to
calculate the correct +start-delay+ such that the operation will occur
at _origin + (interval * N)_.
So, if the operation's interval is 24h, its interval-origin is set to
02:00 and it is currently 14:32, then the cluster would initiate
the operation with a start delay of 11 hours and 28 minutes. If the
resource is moved to another node before 2am, then the operation is
cancelled.
The value specified for +interval+ and +interval-origin+ can be any
date/time conforming to the
http://en.wikipedia.org/wiki/ISO_8601[ISO8601 standard]. By way of
example, to specify an operation that would run on the first Monday of
2009 and every Monday after that, you would add:
.Specifying a Base for Recurring Action Intervals
=====
[source,XML]
=====
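As a rough sketch (the operation name and id are illustrative; the +op+ element
belongs inside the resource's +operations+ section), such an operation
definition might look like:
[source,XML]
-------
<op id="my-weekly-action" name="custom-action" interval="P7D" interval-origin="2009-W01-1"/>
-------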
[[s-failure-handling]]
== Handling Resource Failure ==
By default, Pacemaker will attempt to recover failed resources by restarting
them. However, failure recovery is highly configurable.
=== Failure Counts ===
Pacemaker tracks resource failures for each combination of node, resource, and
operation (start, stop, monitor, etc.).
You can query the fail count for a particular node, resource, and/or operation
using the `crm_failcount` command. For example, to see how many times the
10-second monitor for +myrsc+ has failed on +node1+, run:
----
# crm_failcount --query -r myrsc -N node1 -n monitor -I 10s
----
If you omit the node, `crm_failcount` will use the local node. If you omit the
operation and interval, `crm_failcount` will display the sum of the fail counts
for all operations on the resource.
You can use `crm_resource --cleanup` or `crm_failcount --delete` to clear
fail counts. For example, to clear the above monitor failures, run:
----
# crm_resource --cleanup -r myrsc -N node1 -n monitor -I 10s
----
If you omit the resource, `crm_resource --cleanup` will clear failures for all
resources. If you omit the node, it will clear failures on all nodes. If you
omit the operation and interval, it will clear the failures for all operations
on the resource.
[NOTE]
====
Even when cleaning up only a single operation, all failed operations will
disappear from the status display. This allows us to trigger a re-check of the
resource's current status.
====
Higher-level tools may provide other commands for querying and clearing
fail counts.
The `crm_mon` tool shows the current cluster status, including any failed
operations. To see the current fail counts for any failed resources, call
`crm_mon` with the `--failcounts` option. This shows the fail counts per
resource (that is, the sum of any operation fail counts for the resource).
=== Failure Response ===
Normally, if a running resource fails, pacemaker will try to stop it and start
it again. Pacemaker will choose the best location to start it each time, which
may be the same node that it failed on.
However, if a resource fails repeatedly, it is possible that there is an
underlying problem on that node, and you might desire trying a different node
in such a case. Pacemaker allows you to set your preference via the
+migration-threshold+ resource meta-attribute.
footnote:[
The naming of this option was perhaps unfortunate as it is easily
confused with live migration, the process of moving a resource from
one node to another without stopping it. Xen virtual guests are the
most common example of resources that can be migrated in this manner.
]
If you define +migration-threshold=pass:[N]+ for a
resource, it will be banned from the original node after 'N' failures.
[NOTE]
====
The +migration-threshold+ is per 'resource', even though fail counts are
tracked per 'operation'. The operation fail counts are added together
to compare against the +migration-threshold+.
====
By default, fail counts remain until manually cleared by an administrator
using `crm_resource --cleanup` or `crm_failcount --delete` (hopefully after
first fixing the failure's cause). It is possible to have fail counts expire
automatically by setting the +failure-timeout+ resource meta-attribute.
[IMPORTANT]
====
A successful operation does not clear past failures. If a recurring monitor
operation fails once, succeeds many times, then fails again days later, its
fail count is 2. Fail counts are cleared only by manual intervention or
failure timeout.
====
For example, a setting of +migration-threshold=2+ and +failure-timeout=60s+
would cause the resource to move to a new node after 2 failures, and
allow it to move back (depending on stickiness and constraint scores) after one
minute.
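As a sketch of how these meta-attributes might be set on a resource (the
resource name, agent, and ids are assumed):
[source,XML]
-------
<primitive id="myrsc" class="ocf" provider="pacemaker" type="Dummy">
  <meta_attributes id="myrsc-meta_attributes">
    <!-- ban the resource from a node after 2 failures there -->
    <nvpair id="myrsc-migration-threshold" name="migration-threshold" value="2"/>
    <!-- expire fail counts after one minute without new failures -->
    <nvpair id="myrsc-failure-timeout" name="failure-timeout" value="60s"/>
  </meta_attributes>
</primitive>
-------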
[NOTE]
====
+failure-timeout+ is measured since the most recent failure. That is, older
failures do not individually time out and lower the fail count. Instead, all
failures are timed out simultaneously (and the fail count is reset to 0) if
there is no new failure for the timeout period.
====
There are two exceptions to the migration threshold concept:
when a resource either fails to start or fails to stop.
If the cluster property +start-failure-is-fatal+ is set to +true+ (which is the
default), start failures cause the fail count to be set to +INFINITY+ and thus
always cause the resource to move immediately.
Stop failures are slightly different and crucial. If a resource fails
to stop and STONITH is enabled, then the cluster will fence the node
in order to be able to start the resource elsewhere. If STONITH is
not enabled, then the cluster has no way to continue and will not try
to start the resource elsewhere, but will try to stop it again after
the failure timeout.
[IMPORTANT]
Please read <> to understand how timeouts work
before configuring a +failure-timeout+.
== Moving Resources ==
indexterm:[Moving,Resources]
indexterm:[Resource,Moving]
=== Moving Resources Manually ===
There are primarily two occasions when you would want to move a
resource from its current location: when the whole node is under
maintenance, and when a single resource needs to be moved.
==== Standby Mode ====
Since everything eventually comes down to a score, you could create
constraints for every resource to prevent them from running on one
node. While pacemaker configuration can seem convoluted at times, not even
we would require this of administrators.
Instead, one can set a special node attribute which tells the cluster
"don't let anything run here". There is even a helpful tool to help
query and set it, called `crm_standby`. To check the standby status
of the current machine, run:
----
# crm_standby -G
----
A value of +on+ indicates that the node is _not_ able to host any
resources, while a value of +off+ says that it _can_.
You can also check the status of other nodes in the cluster by
specifying the `--node` option:
----
# crm_standby -G --node sles-2
----
To change the current node's standby status, use `-v` instead of `-G`:
----
# crm_standby -v on
----
Again, you can change another host's value by supplying a hostname with `--node`.
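For example, to put a node named +sles-2+ into standby from any cluster node:
----
# crm_standby -v on --node sles-2
----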
==== Moving One Resource ====
When only one resource is required to move, we could do this by creating
location constraints. However, once again we provide a user-friendly
shortcut as part of the `crm_resource` command, which creates and
modifies the extra constraints for you. If +Email+ were running on
+sles-1+ and you wanted it moved to a specific location, the command
would look something like:
----
# crm_resource -M -r Email -H sles-2
----
Behind the scenes, the tool will create the following location constraint:
[source,XML]
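-------
<!-- a sketch of the generated constraint; the exact id chosen by crm_resource may differ -->
<rsc_location id="cli-prefer-Email" rsc="Email" node="sles-2" score="INFINITY"/>
-------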
It is important to note that subsequent invocations of `crm_resource
-M` are not cumulative. So, if you ran these commands
----
# crm_resource -M -r Email -H sles-2
# crm_resource -M -r Email -H sles-3
----
then it is as if you had never performed the first command.
To allow the resource to move back again, use:
----
# crm_resource -U -r Email
----
Note the use of the word _allow_. The resource can move back to its
original location but, depending on +resource-stickiness+, it might
stay where it is. To be absolutely certain that it moves back to
+sles-1+, move it there before issuing the call to `crm_resource -U`:
----
# crm_resource -M -r Email -H sles-1
# crm_resource -U -r Email
----
Alternatively, if you only care that the resource should be moved from
its current location, try:
----
# crm_resource -B -r Email
----
This will instead create a negative constraint, like:
[source,XML]
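-------
<!-- a sketch of the generated constraint; the exact id chosen by crm_resource may differ -->
<rsc_location id="cli-ban-Email-on-sles-1" rsc="Email" node="sles-1" score="-INFINITY"/>
-------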
This will achieve the desired effect, but will also have long-term
consequences. As the tool will warn you, the creation of a
+-INFINITY+ constraint will prevent the resource from running on that
node until `crm_resource -U` is used. This includes the situation
where every other cluster node is no longer available!
In some cases, such as when +resource-stickiness+ is set to
+INFINITY+, it is possible that you will end up with the problem
described in <>. The tool can detect
some of these cases and deals with them by creating both
positive and negative constraints. E.g.
+Email+ prefers +sles-1+ with a score of +-INFINITY+
+Email+ prefers +sles-2+ with a score of +INFINITY+
which has the same long-term consequences as discussed earlier.
=== Moving Resources Due to Connectivity Changes ===
You can configure the cluster to move resources when external connectivity is
lost in two steps.
==== Tell Pacemaker to Monitor Connectivity ====
First, add an *ocf:pacemaker:ping* resource to the cluster. The
*ping* resource uses the system utility of the same name to test whether a
list of machines (specified by DNS hostname or IPv4/IPv6 address) is
reachable, and uses the results to maintain a node attribute called +pingd+
by default.
footnote:[
The attribute name is customizable, in order to allow multiple ping groups to be defined.
]
[NOTE]
===========
Older versions of Pacemaker used a different agent *ocf:pacemaker:pingd* which
is now deprecated in favor of *ping*. If your version of Pacemaker does not
contain the *ping* resource agent, download the latest version from
https://github.com/ClusterLabs/pacemaker/tree/master/extra/resources/ping
===========
Normally, the ping resource should run on all cluster nodes, which means that
you'll need to create a clone. A template for this can be found below
along with a description of the most interesting parameters.
.Common Options for a 'ping' Resource
[width="95%",cols="1m,<4",options="header",align="center"]
|=========================================================
|Field
|Description
|dampen
|The time to wait (dampening) for further changes to occur. Use this
to prevent a resource from bouncing around the cluster when cluster
nodes notice the loss of connectivity at slightly different times.
indexterm:[dampen,Ping Resource Option]
indexterm:[Ping Resource,Option,dampen]
|multiplier
|The number of connected ping nodes gets multiplied by this value to
get a score. Useful when there are multiple ping nodes configured.
indexterm:[multiplier,Ping Resource Option]
indexterm:[Ping Resource,Option,multiplier]
|host_list
|The machines to contact in order to determine the current
connectivity status. Allowed values include resolvable DNS host
names, IPv4 and IPv6 addresses.
indexterm:[host_list,Ping Resource Option]
indexterm:[Ping Resource,Option,host_list]
|=========================================================
.An example ping cluster resource that checks node connectivity once every minute
=====
[source,XML]
------------
------------
=====
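A rough sketch of such a cloned ping resource (the ids and host list are
illustrative):
[source,XML]
-------
<clone id="Connected">
  <primitive id="ping" class="ocf" provider="pacemaker" type="ping">
    <instance_attributes id="ping-attrs">
      <nvpair id="ping-dampen" name="dampen" value="5s"/>
      <nvpair id="ping-multiplier" name="multiplier" value="1000"/>
      <nvpair id="ping-hosts" name="host_list" value="192.0.2.1 192.0.2.2"/>
    </instance_attributes>
    <operations>
      <op id="ping-monitor-60s" name="monitor" interval="60s"/>
    </operations>
  </primitive>
</clone>
-------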
[IMPORTANT]
===========
You're only half done. The next section deals with telling Pacemaker
how to deal with the connectivity status that +ocf:pacemaker:ping+ is
recording.
===========
==== Tell Pacemaker How to Interpret the Connectivity Data ====
[IMPORTANT]
======
Before attempting the following, make sure you understand
<>.
======
There are a number of ways to use the connectivity data.
The most common setup is for people to have a single ping
target (e.g. the service network's default gateway), to prevent the cluster
from running a resource on any unconnected node.
.Don't run a resource on unconnected nodes
=====
[source,XML]
-------
-------
=====
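A sketch of such a constraint (the resource name +Webserver+ and the ids are
assumed), banning any node where the +pingd+ attribute is undefined or 0:
[source,XML]
-------
<rsc_location id="Webserver-no-connectivity" rsc="Webserver">
  <rule id="Webserver-no-connectivity-rule" score="-INFINITY" boolean-op="or">
    <expression id="Webserver-pingd-undefined" attribute="pingd" operation="not_defined"/>
    <expression id="Webserver-pingd-zero" attribute="pingd" operation="lte" value="0"/>
  </rule>
</rsc_location>
-------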
A more complex setup is to have a number of ping targets configured.
You can require the cluster to only run resources on nodes that can
connect to all (or a minimum subset) of them.
.Run only on nodes connected to three or more ping targets.
=====
[source,XML]
-------
...
...
...
-------
=====
Alternatively, you can tell the cluster only to _prefer_ nodes with the best
connectivity. Just be sure to set +multiplier+ to a value higher than
that of +resource-stickiness+ (and don't set either of them to
+INFINITY+).
.Prefer the node with the most connected ping nodes
=====
[source,XML]
-------
-------
=====
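A sketch of such a constraint, using the +pingd+ attribute's value itself as
the node's score (the resource name and ids are assumed):
[source,XML]
-------
<rsc_location id="Webserver-connectivity" rsc="Webserver">
  <rule id="Webserver-connectivity-rule" score-attribute="pingd">
    <expression id="Webserver-pingd-defined" attribute="pingd" operation="defined"/>
  </rule>
</rsc_location>
-------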
It is perhaps easier to think of this in terms of the simple
constraints that the cluster translates it into. For example, if
*sles-1* is connected to all five ping nodes but *sles-2* is only
connected to two, then it would be as if you instead had the following
constraints in your configuration:
.How the cluster translates the above location constraint
=====
[source,XML]
-------
-------
=====
The advantage is that you don't have to manually update any
constraints whenever your network connectivity changes.
You can also combine the concepts above into something even more
complex. The example below shows how you can prefer the node with the
most connected ping nodes provided they have connectivity to at least
three (again assuming that +multiplier+ is set to 1000).
.A more complex example of choosing a location based on connectivity
=====
[source,XML]
-------
-------
=====
[[s-migrating-resources]]
=== Migrating Resources ===
Normally, when the cluster needs to move a resource, it fully restarts
the resource (i.e. stops the resource on the current node
and starts it on the new node).
However, some types of resources, such as Xen virtual guests, are able to move to
another location without loss of state (often referred to as live migration
or hot migration). In pacemaker, this is called resource migration.
Pacemaker can be configured to migrate a resource when moving it,
rather than restarting it.
Not all resources are able to migrate (see the Migration Checklist
below), and those that can won't do so in all situations.
Conceptually, there are two requirements from which the other
prerequisites follow:
* The resource must be active and healthy at the old location; and
* everything required for the resource to run must be available on
both the old and new locations.
The cluster is able to accommodate both 'push' and 'pull' migration models
by requiring the resource agent to support two special actions:
+migrate_to+ (performed on the current location) and +migrate_from+
(performed on the destination).
In push migration, the process on the current location transfers the
resource to the new location where it is later activated. In this
scenario, most of the work would be done in the +migrate_to+ action
and, if anything, the activation would occur during +migrate_from+.
Conversely for pull, the +migrate_to+ action is practically empty and
+migrate_from+ does most of the work, extracting the relevant resource
state from the old location and activating it.
There is no wrong or right way for a resource agent to implement migration,
as long as it works.
.Migration Checklist
* The resource may not be a clone.
* The resource must use an OCF style agent.
* The resource must not be in a failed or degraded state.
* The resource agent must support +migrate_to+ and
+migrate_from+ actions, and advertise them in its metadata.
* The resource must have the +allow-migrate+ meta-attribute set to
+true+ (which is not the default); see the sketch after this list.
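A sketch of enabling migration on a resource (the resource name and parameter
values are assumed; +ocf:heartbeat:VirtualDomain+ is one agent that supports
the migration actions):
[source,XML]
-------
<primitive id="myVM" class="ocf" provider="heartbeat" type="VirtualDomain">
  <instance_attributes id="myVM-params">
    <nvpair id="myVM-config" name="config" value="/etc/libvirt/qemu/myVM.xml"/>
  </instance_attributes>
  <meta_attributes id="myVM-meta_attributes">
    <nvpair id="myVM-allow-migrate" name="allow-migrate" value="true"/>
  </meta_attributes>
</primitive>
-------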
If an otherwise migratable resource depends on another resource
via an ordering constraint, there are special situations in which it will be
restarted rather than migrated.
For example, if the resource depends on a clone, and at the time the resource
needs to be moved, the clone has instances that are stopping and instances
that are starting, then the resource will be restarted. The scheduler is not
yet able to model this situation correctly and so takes the safer (if less
optimal) path.
Also, if a migratable resource depends on a non-migratable resource, and both
need to be moved, the migratable resource will be restarted.
[[s-node-health]]
== Tracking Node Health ==
A node may be functioning adequately as far as cluster membership is concerned,
and yet be "unhealthy" in some respect that makes it an undesirable location
for resources. For example, a disk drive may be reporting SMART errors, or the
CPU may be highly loaded.
Pacemaker offers a way to automatically move resources off unhealthy nodes.
=== Node Health Attributes ===
Pacemaker will treat any node attribute whose name starts with +#health+ as an
indicator of node health. Node health attributes may have one of the following
values:
.Allowed Values for Node Health Attributes
[width="95%",cols="1,<3",options="header",align="center"]
|=========================================================
|Value
|Intended significance
|+red+
|This indicator is unhealthy
indexterm:[Node health,red]
|+yellow+
|This indicator is becoming unhealthy
indexterm:[Node health,yellow]
|+green+
|This indicator is healthy
indexterm:[Node health,green]
|'integer'
|A numeric score to apply to all resources on this node
(0 or positive is healthy, negative is unhealthy)
indexterm:[Node health,score]
|=========================================================
=== Node Health Strategy ===
Pacemaker assigns a node health score to each node, as the sum of the values of
all its node health attributes. This score will be used as a location
constraint applied to this node for all resources.
The +node-health-strategy+ cluster option controls how Pacemaker responds to
changes in node health attributes, and how it translates +red+, +yellow+, and
+green+ to scores.
Allowed values are:
.Node Health Strategies
[width="95%",cols="1m,<3",options="header",align="center"]
|=========================================================
|Value
|Effect
|none
|Do not track node health attributes at all.
indexterm:[Node health,none]
|migrate-on-red
|Assign the value of +-INFINITY+ to +red+, and 0 to +yellow+ and +green+.
This will cause all resources to move off the node if any attribute is +red+.
indexterm:[Node health,migrate-on-red]
|only-green
|Assign the value of +-INFINITY+ to +red+ and +yellow+, and 0 to +green+.
This will cause all resources to move off the node if any attribute is +red+
or +yellow+.
indexterm:[Node health,only-green]
|progressive
|Assign the value of the +node-health-red+ cluster option to +red+, the value
of +node-health-yellow+ to +yellow+, and the value of +node-health-green+ to
+green+. Each node is additionally assigned a score of +node-health-base+
(this allows resources to start even if some attributes are +yellow+). This
strategy gives the administrator finer control over how important each value
is.
indexterm:[Node health,progressive]
|custom
|Track node health attributes using the same values as +progressive+ for
+red+, +yellow+, and +green+, but do not take them into account.
The administrator is expected to implement a policy by defining rules
(see <>) referencing node health attributes.
indexterm:[Node health,custom]
|=========================================================
=== Measuring Node Health ===
Since Pacemaker calculates node health based on node attributes,
any method that sets node attributes may be used to measure node
health. The most common ways are resource agents or separate daemons.
Pacemaker provides examples that can be used directly or as a basis for
custom code. The +ocf:pacemaker:HealthCPU+ and +ocf:pacemaker:HealthSMART+
resource agents set node health attributes based on CPU and disk parameters.
The +ipmiservicelogd+ daemon sets node health attributes based on IPMI
values (the +ocf:pacemaker:SystemHealth+ resource agent can be used to manage
the daemon as a cluster resource).
== Reloading Services After a Definition Change ==
The cluster automatically detects changes to the definition of
services it manages. The normal response is to stop the
service (using the old definition) and start it again (with the new
definition). This works well, but some services are smarter and can
be told to use a new set of options without restarting.
To take advantage of this capability, the resource agent must:
. Accept the +reload+ operation and perform any required actions.
_The actions here depend completely on your application!_
+
.The DRBD agent's logic for supporting +reload+
=====
[source,Bash]
-------
case $1 in
start)
drbd_start
;;
stop)
drbd_stop
;;
reload)
drbd_reload
;;
monitor)
drbd_monitor
;;
*)
drbd_usage
exit $OCF_ERR_UNIMPLEMENTED
;;
esac
exit $?
-------
=====
. Advertise the +reload+ operation in the +actions+ section of its metadata
+
.The DRBD Agent Advertising Support for the +reload+ Operation
=====
[source,XML]
-------
1.1
Master/Slave OCF Resource Agent for DRBD
...
-------
=====
. Advertise one or more parameters that can take effect using +reload+.
+
Any parameter with its +unique+ attribute set to 0 is eligible to be used in this way.
+
.Parameter that can be changed using reload
=====
[source,XML]
-------
Full path to the drbd.conf file.Path to drbd.conf
-------
=====
Once these requirements are satisfied, the cluster will automatically
know to reload the resource (instead of restarting) when a non-unique
field changes.
[NOTE]
======
Metadata will not be re-read unless the resource needs to be started. This may
mean that the resource will be restarted the first time, even though you
changed a parameter with +unique=0+.
======
[NOTE]
======
If both a unique and non-unique field are changed simultaneously, the
resource will still be restarted.
======
diff --git a/doc/Pacemaker_Explained/en-US/Ch-Advanced-Resources.txt b/doc/Pacemaker_Explained/en-US/Ch-Advanced-Resources.txt
index 4c401d1dd1..c41be61a6f 100644
--- a/doc/Pacemaker_Explained/en-US/Ch-Advanced-Resources.txt
+++ b/doc/Pacemaker_Explained/en-US/Ch-Advanced-Resources.txt
@@ -1,1454 +1,1455 @@
+:compat-mode: legacy
= Advanced Resource Types =
[[group-resources]]
== Groups - A Syntactic Shortcut ==
indexterm:[Group Resources]
indexterm:[Resource,Groups]
One of the most common elements of a cluster is a set of resources
that need to be located together, start sequentially, and stop in the
reverse order. To simplify this configuration, we support the concept
of groups.
.A group of two primitive resources
======
[source,XML]
-------
-------
======
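A rough sketch of such a group (the resource names match those discussed
below; the agents and attribute values are illustrative):
[source,XML]
-------
<group id="shortcut">
  <primitive id="Public-IP" class="ocf" provider="heartbeat" type="IPaddr2">
    <instance_attributes id="Public-IP-params">
      <nvpair id="Public-IP-ip" name="ip" value="192.0.2.2"/>
    </instance_attributes>
  </primitive>
  <primitive id="Email" class="lsb" type="exim"/>
</group>
-------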
Although the example above contains only two resources, there is no
limit to the number of resources a group can contain. The example is
also sufficient to explain the fundamental properties of a group:
* Resources are started in the order they appear in (+Public-IP+
first, then +Email+)
* Resources are stopped in the reverse order to which they appear in
(+Email+ first, then +Public-IP+)
If a resource in the group can't run anywhere, then nothing listed after it
is allowed to run, either.
* If +Public-IP+ can't run anywhere, neither can +Email+;
* but if +Email+ can't run anywhere, this does not affect +Public-IP+
in any way
The group above is logically equivalent to writing:
.How the cluster sees a group resource
======
[source,XML]
-------
-------
======
Obviously as the group grows bigger, the reduced configuration effort
can become significant.
Another (typical) example of a group is a DRBD volume, the filesystem
mount, an IP address, and an application that uses them.
=== Group Properties ===
.Properties of a Group Resource
[width="95%",cols="3m,<5",options="header",align="center"]
|=========================================================
|Field
|Description
|id
|A unique name for the group
indexterm:[id,Group Resource Property]
indexterm:[Resource,Group Property,id]
|=========================================================
=== Group Options ===
Groups inherit the +priority+, +target-role+, and +is-managed+ properties
from primitive resources. See <> for information about
those properties.
=== Group Instance Attributes ===
Groups have no instance attributes. However, any that are set for the group
object will be inherited by the group's children.
=== Group Contents ===
Groups may only contain a collection of cluster resources (see
<>). To refer to a child of a group resource, just use
the child's +id+ instead of the group's.
=== Group Constraints ===
Although it is possible to reference a group's children in
constraints, it is usually preferable to reference the group itself.
.Some constraints involving groups
======
[source,XML]
-------
-------
======
=== Group Stickiness ===
indexterm:[resource-stickiness,Groups]
Stickiness, the measure of how much a resource wants to stay where it
is, is additive in groups. Every active resource of the group will
contribute its stickiness value to the group's total. So if the
default +resource-stickiness+ is 100, and a group has seven members,
five of which are active, then the group as a whole will prefer its
current location with a score of 500.
[[s-resource-clone]]
== Clones - Resources That Can Have Multiple Active Instances ==
indexterm:[Clone Resources]
indexterm:[Resource,Clones]
'Clone' resources are resources that can have more than one copy active at the
same time. This allows you, for example, to run a copy of a daemon on every
node. You can clone any primitive or group resource.
footnote:[
Of course, the service must support running multiple instances.
]
=== Anonymous versus Unique Clones ===
A clone resource is configured to be either 'anonymous' or 'globally unique'.
Anonymous clones are the simplest. These behave completely identically
everywhere they are running. Because of this, there can be only one instance of
an anonymous clone active per node.
The instances of globally unique clones are distinct entities. All instances
are launched identically, but one instance of the clone is not identical to any
other instance, whether running on the same node or a different node. As an
example, a cloned IP address can use special kernel functionality such that
each instance handles a subset of requests for the same IP address.
[[s-resource-promotable]]
=== Promotable clones ===
indexterm:[Promotable Clone Resources]
indexterm:[Resource,Promotable]
If a clone is 'promotable', its instances can perform a special role that
Pacemaker will manage via the +promote+ and +demote+ actions of the resource
agent.
Services that support such a special role have various terms for the special
role and the default role: primary and secondary, master and replica,
controller and worker, etc. Pacemaker uses the terms 'master' and 'slave',
footnote:[
These are historical terms that will eventually be replaced, but the extensive
use of them and the need for backward compatibility makes it a long process.
You may see examples using a +master+ tag instead of a +clone+ tag with the
+promotable+ meta-attribute set to +true+; the +master+ tag is supported, but
deprecated, and will be removed in a future version. You may also see such
services referred to as 'multi-state' or 'stateful'; these mean the same thing
as 'promotable'.
]
but is agnostic to what the service calls them or what they do.
All that Pacemaker cares about is that an instance comes up in the default role
when started, and the resource agent supports the +promote+ and +demote+ actions
to manage entering and exiting the special role.
=== Clone Properties ===
.Properties of a Clone Resource
[width="95%",cols="3m,<5",options="header",align="center"]
|=========================================================
|Field
|Description
|id
|A unique name for the clone
indexterm:[id,Clone Property]
indexterm:[Clone,Property,id]
|=========================================================
=== Clone Options ===
<> inherited from primitive resources:
+priority, target-role, is-managed+
.Clone-specific configuration options
[width="95%",cols="1m,1,<3",options="header",align="center"]
|=========================================================
|Field
|Default
|Description
|globally-unique
|false
|If +true+, each clone instance performs a distinct function
indexterm:[globally-unique,Clone Option]
indexterm:[Clone,Option,globally-unique]
|clone-max
|number of nodes in cluster
|The maximum number of clone instances that can be started across the entire
cluster
indexterm:[clone-max,Clone Option]
indexterm:[Clone,Option,clone-max]
|clone-node-max
|1
|If +globally-unique+ is +true+, the maximum number of clone instances that can
be started on a single node
indexterm:[clone-node-max,Clone Option]
indexterm:[Clone,Option,clone-node-max]
|clone-min
|0
|Require at least this number of clone instances to be runnable before allowing
resources depending on the clone to be runnable. A value of 0 means require
all clone instances to be runnable.
indexterm:[clone-min,Clone Option]
indexterm:[Clone,Option,clone-min]
|notify
|false
|Call the resource agent's +notify+ action for all active instances, before and
after starting or stopping any clone instance. The resource agent must support
this action. Allowed values: +false+, +true+
indexterm:[notify,Clone Option]
indexterm:[Clone,Option,notify]
|ordered
|false
|If +true+, clone instances must be started sequentially instead of in parallel.
Allowed values: +false+, +true+
indexterm:[ordered,Clone Option]
indexterm:[Clone,Option,ordered]
|interleave
|false
|When this clone is ordered relative to another clone, if this option is
+false+ (the default), the ordering is relative to 'all' instances of the
other clone, whereas if this option is +true+, the ordering is relative only
to instances on the same node.
Allowed values: +false+, +true+
indexterm:[interleave,Clone Option]
indexterm:[Clone,Option,interleave]
|promotable
|false
|If +true+, clone instances can perform a special role that Pacemaker will
manage via the resource agent's +promote+ and +demote+ actions. The resource
agent must support these actions.
Allowed values: +false+, +true+
indexterm:[promotable,Clone Option]
indexterm:[Clone,Option,promotable]
|promoted-max
|1
|If +promotable+ is +true+, the number of instances that can be promoted at one
time across the entire cluster
indexterm:[promoted-max,Clone Option]
indexterm:[Clone,Option,promoted-max]
|promoted-node-max
|1
|If +promotable+ is +true+ and +globally-unique+ is +false+, the number of
clone instances that can be promoted at one time on a single node
indexterm:[promoted-node-max,Clone Option]
indexterm:[Clone,Option,promoted-node-max]
|=========================================================
For backward compatibility, +master-max+ and +master-node-max+ are accepted as
aliases for +promoted-max+ and +promoted-node-max+, but are deprecated since
2.0.0, and support for them will be removed in a future version.
=== Clone Contents ===
Clones must contain exactly one primitive or group resource.
.A clone that runs a web server on all nodes
====
[source,XML]
----
----
====
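A rough sketch of such a clone (the agent and ids are assumed):
[source,XML]
-------
<clone id="apache-clone">
  <primitive id="apache" class="ocf" provider="heartbeat" type="apache">
    <operations>
      <op id="apache-monitor" name="monitor" interval="30s"/>
    </operations>
  </primitive>
</clone>
-------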
[WARNING]
You should never reference the name of a clone's child (the primitive or group
resource being cloned). If you think you need to do this, you probably need to
re-evaluate your design.
=== Clone Instance Attributes ===
Clones have no instance attributes; however, any that are set here will be
inherited by the clone's child.
=== Clone Constraints ===
In most cases, a clone will have a single instance on each active cluster
node. If this is not the case, you can indicate which nodes the
cluster should preferentially assign copies to with resource location
constraints. These constraints are written no differently from those
for primitive resources except that the clone's +id+ is used.
.Some constraints involving clones
======
[source,XML]
-------
-------
======
Ordering constraints behave slightly differently for clones. In the
example above, +apache-stats+ will wait until all copies of +apache-clone+
that need to be started have done so before being started itself.
Only if _no_ copies can be started will +apache-stats+ be prevented
from being active. Additionally, the clone will wait for
+apache-stats+ to be stopped before stopping itself.
Colocation of a primitive or group resource with a clone means that
the resource can run on any node with an active instance of the clone.
The cluster will choose an instance based on where the clone is running and
the resource's own location preferences.
Colocation between clones is also possible. If one clone +A+ is colocated
with another clone +B+, the set of allowed locations for +A+ is limited to
nodes on which +B+ is (or will be) active. Placement is then performed
normally.
==== Promotable Clone Constraints ====
For promotable clone resources, the +first-action+ and/or +then-action+ fields
for ordering constraints may be set to +promote+ or +demote+ to constrain the
master role, and colocation constraints may contain +rsc-role+ and/or
+with-rsc-role+ fields.
.Additional colocation constraint options for promotable clone resources
[width="95%",cols="1m,1,<3",options="header",align="center"]
|=========================================================
|Field
|Default
|Description
|rsc-role
|Started
|An additional attribute of colocation constraints that specifies the
role that +rsc+ must be in. Allowed values: +Started+, +Master+,
+Slave+.
indexterm:[rsc-role,Ordering Constraints]
indexterm:[Constraints,Ordering,rsc-role]
|with-rsc-role
|Started
|An additional attribute of colocation constraints that specifies the
role that +with-rsc+ must be in. Allowed values: +Started+,
+Master+, +Slave+.
indexterm:[with-rsc-role,Ordering Constraints]
indexterm:[Constraints,Ordering,with-rsc-role]
|=========================================================
.Constraints involving promotable clone resources
======
[source,XML]
-------
-------
======
In the example above, +myApp+ will wait until one of the database
copies has been started and promoted to master before being started
itself on the same node. Only if no copies can be promoted will +myApp+ be
prevented from being active. Additionally, the cluster will wait for
+myApp+ to be stopped before demoting the database.
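A sketch of such constraints (the resource names +database+ and +myApp+ follow
the discussion above; the ids are assumed):
[source,XML]
-------
<rsc_order id="database-before-myApp" first="database" first-action="promote"
           then="myApp" then-action="start"/>
<rsc_colocation id="myApp-with-database-master" rsc="myApp"
                with-rsc="database" with-rsc-role="Master" score="INFINITY"/>
-------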
Colocation of a primitive or group resource with a promotable clone
resource means that it can run on any node with an active instance of
the promotable clone resource that has the specified role (+master+ or
+slave+). In the example above, the cluster will choose a location based on
where database is running as a +master+, and if there are multiple
+master+ instances it will also factor in +myApp+'s own location
preferences when deciding which location to choose.
Colocation with regular clones and other promotable clone resources is also
possible. In such cases, the set of allowed locations for the +rsc+
clone is (after role filtering) limited to nodes on which the
+with-rsc+ promotable clone resource is (or will be) in the specified role.
Placement is then performed as normal.
==== Using Promotable Clone Resources in Colocation Sets ====
.Additional colocation set options relevant to promotable clone resources
[width="95%",cols="1m,1,<6",options="header",align="center"]
|=========================================================
|Field
|Default
|Description
|role
|Started
|The role that 'all members' of the set must be in. Allowed values: +Started+, +Master+,
+Slave+.
indexterm:[role,Ordering Constraints]
indexterm:[Constraints,Ordering,role]
|=========================================================
In the following example +B+'s master must be located on the same node as +A+'s master.
Additionally, resources +C+ and +D+ must be located on the same node as +A+'s
and +B+'s masters.
.Colocate C and D with A's and B's master instances
======
[source,XML]
-------
-------
======
==== Using Promotable Clone Resources in Ordered Sets ====
.Additional ordered set options relevant to promotable clone resources
[width="95%",cols="1m,1,<3",options="header",align="center"]
|=========================================================
|Field
|Default
|Description
|action
|value of +first-action+
|An additional attribute of ordering constraint sets that specifies the
action that applies to 'all members' of the set. Allowed
values: +start+, +stop+, +promote+, +demote+.
indexterm:[action,Ordering Constraints]
indexterm:[Constraints,Ordering,action]
|=========================================================
.Start C and D after first promoting A and B
======
[source,XML]
-------
-------
======
In the above example, +B+ cannot be promoted to a master role until +A+ has
been promoted. Additionally, resources +C+ and +D+ must wait until +A+ and +B+
have been promoted before they can start.
[[s-clone-stickiness]]
=== Clone Stickiness ===
indexterm:[resource-stickiness,Clones]
To achieve a stable allocation pattern, clones are slightly sticky by
default. If no value for +resource-stickiness+ is provided, the clone
will use a value of 1. Being a small value, it causes minimal
disturbance to the score calculations of other resources but is enough
to prevent Pacemaker from needlessly moving copies around the cluster.
[NOTE]
====
For globally unique clones, this may result in multiple instances of the
clone staying on a single node, even after another eligible node becomes
active (for example, after being put into standby mode then made active again).
If you do not want this behavior, specify a +resource-stickiness+ of 0
for the clone temporarily and let the cluster adjust, then set it back
to 1 if you want the default behavior to apply again.
====
=== Clone Resource Agent Requirements ===
Any resource can be used as an anonymous clone, as it requires no
additional support from the resource agent. Whether it makes sense to
do so depends on your resource and its resource agent.
==== Resource Agent Requirements for Globally Unique Clones ====
Globally unique clones require additional support in the resource agent. In
particular, it must only respond with +$\{OCF_SUCCESS}+ if the node has that
exact instance active. All other probes for instances of the clone should
result in +$\{OCF_NOT_RUNNING}+ (or one of the other OCF error codes if
they are failed).
Individual instances of a clone are identified by appending a colon and a
numerical offset, e.g. +apache:2+.
Resource agents can find out how many copies there are by examining
the +OCF_RESKEY_CRM_meta_clone_max+ environment variable and which
instance it is by examining +OCF_RESKEY_CRM_meta_clone+.
The resource agent must not make any assumptions (based on
+OCF_RESKEY_CRM_meta_clone+) about which numerical instances are active. In
particular, the list of active copies will not always be an unbroken
sequence, nor always start at 0.
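A sketch of how a resource agent might consult these variables (illustrative
shell only):
[source,Bash]
-------
# Which instance of the clone is this, and how many can exist cluster-wide?
my_instance="${OCF_RESKEY_CRM_meta_clone:-0}"
total_instances="${OCF_RESKEY_CRM_meta_clone_max:-1}"
echo "running as clone instance ${my_instance} of ${total_instances}"
-------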
==== Resource Agent Requirements for Promotable Clones ====
Promotable clone resources require two extra actions, +demote+ and +promote+,
which are responsible for changing the state of the resource. Like +start+ and
+stop+, they should return +$\{OCF_SUCCESS}+ if they completed successfully or
a relevant error code if they did not.
The states can mean whatever you wish, but when the resource is
started, it must come up in the mode called +slave+. From there the
cluster will decide which instances to promote to +master+.
In addition to the clone requirements for monitor actions, agents must
also _accurately_ report which state they are in. The cluster relies
on the agent to report its status (including role) accurately and does
not indicate to the agent what role it currently believes it to be in.
.Role implications of OCF return codes
[width="95%",cols="1,<1",options="header",align="center"]
|=========================================================
|Monitor Return Code
|Description
|OCF_NOT_RUNNING
|Stopped
indexterm:[Return Code,OCF_NOT_RUNNING]
|OCF_SUCCESS
|Running (Slave)
indexterm:[Return Code,OCF_SUCCESS]
|OCF_RUNNING_MASTER
|Running (Master)
indexterm:[Return Code,OCF_RUNNING_MASTER]
|OCF_FAILED_MASTER
|Failed (Master)
indexterm:[Return Code,OCF_FAILED_MASTER]
|Other
|Failed (Slave)
|=========================================================
==== Clone Notifications ====
If the clone has the +notify+ meta-attribute set to +true+, and the resource
agent supports the +notify+ action, Pacemaker will call the action when
appropriate, passing a number of extra variables which, when combined with
additional context, can be used to calculate the current state of the cluster
and what is about to happen to it.
.Environment variables supplied with Clone notify actions
[width="95%",cols="5,<3",options="header",align="center"]
|=========================================================
|Variable
|Description
|OCF_RESKEY_CRM_meta_notify_type
|Allowed values: +pre+, +post+
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,type]
indexterm:[type,Notification Environment Variable]
|OCF_RESKEY_CRM_meta_notify_operation
|Allowed values: +start+, +stop+
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,operation]
indexterm:[operation,Notification Environment Variable]
|OCF_RESKEY_CRM_meta_notify_start_resource
|Resources to be started
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,start_resource]
indexterm:[start_resource,Notification Environment Variable]
|OCF_RESKEY_CRM_meta_notify_stop_resource
|Resources to be stopped
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,stop_resource]
indexterm:[stop_resource,Notification Environment Variable]
|OCF_RESKEY_CRM_meta_notify_active_resource
|Resources that are running
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,active_resource]
indexterm:[active_resource,Notification Environment Variable]
|OCF_RESKEY_CRM_meta_notify_inactive_resource
|Resources that are not running
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,inactive_resource]
indexterm:[inactive_resource,Notification Environment Variable]
|OCF_RESKEY_CRM_meta_notify_start_uname
|Nodes on which resources will be started
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,start_uname]
indexterm:[start_uname,Notification Environment Variable]
|OCF_RESKEY_CRM_meta_notify_stop_uname
|Nodes on which resources will be stopped
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,stop_uname]
indexterm:[stop_uname,Notification Environment Variable]
|OCF_RESKEY_CRM_meta_notify_active_uname
|Nodes on which resources are running
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,active_uname]
indexterm:[active_uname,Notification Environment Variable]
|=========================================================
The variables come in pairs, such as
+OCF_RESKEY_CRM_meta_notify_start_resource+ and
+OCF_RESKEY_CRM_meta_notify_start_uname+ and should be treated as an
array of whitespace-separated elements.
+OCF_RESKEY_CRM_meta_notify_inactive_resource+ is an exception as the
matching +uname+ variable does not exist since inactive resources
are not running on any node.
Thus in order to indicate that +clone:0+ will be started on +sles-1+,
+clone:2+ will be started on +sles-3+, and +clone:3+ will be started
on +sles-2+, the cluster would set
.Notification variables
======
[source,Bash]
-------
OCF_RESKEY_CRM_meta_notify_start_resource="clone:0 clone:2 clone:3"
OCF_RESKEY_CRM_meta_notify_start_uname="sles-1 sles-3 sles-2"
-------
======
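A sketch of how an agent's +notify+ action might walk such a pair of variables
(shell handling only; error checking omitted):
[source,Bash]
-------
# Pair up the resources to be started with the nodes they will start on
read -r -a start_rscs  <<< "${OCF_RESKEY_CRM_meta_notify_start_resource}"
read -r -a start_nodes <<< "${OCF_RESKEY_CRM_meta_notify_start_uname}"
for i in "${!start_rscs[@]}"; do
    echo "${start_rscs[$i]} will be started on ${start_nodes[$i]}"
done
-------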
==== Interpretation of Notification Variables ====
.Pre-notification (stop):
* Active resources: +$OCF_RESKEY_CRM_meta_notify_active_resource+
* Inactive resources: +$OCF_RESKEY_CRM_meta_notify_inactive_resource+
* Resources to be started: +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Resources to be stopped: +$OCF_RESKEY_CRM_meta_notify_stop_resource+
.Post-notification (stop) / Pre-notification (start):
* Active resources
** +$OCF_RESKEY_CRM_meta_notify_active_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_stop_resource+
* Inactive resources
** +$OCF_RESKEY_CRM_meta_notify_inactive_resource+
** plus +$OCF_RESKEY_CRM_meta_notify_stop_resource+
* Resources that were started: +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Resources that were stopped: +$OCF_RESKEY_CRM_meta_notify_stop_resource+
.Post-notification (start):
* Active resources:
** +$OCF_RESKEY_CRM_meta_notify_active_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_stop_resource+
** plus +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Inactive resources:
** +$OCF_RESKEY_CRM_meta_notify_inactive_resource+
** plus +$OCF_RESKEY_CRM_meta_notify_stop_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Resources that were started: +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Resources that were stopped: +$OCF_RESKEY_CRM_meta_notify_stop_resource+
==== Extra Notifications for Promotable Clones ====
.Extra environment variables supplied for promotable clones
[width="95%",cols="5,<3",options="header",align="center"]
|=========================================================
|Variable
|Description
|_OCF_RESKEY_CRM_meta_notify_master_resource_
|Resources that are running in +Master+ mode
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,master_resource]
indexterm:[master_resource,Notification Environment Variable]
|_OCF_RESKEY_CRM_meta_notify_slave_resource_
|Resources that are running in +Slave+ mode
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,slave_resource]
indexterm:[slave_resource,Notification Environment Variable]
|_OCF_RESKEY_CRM_meta_notify_promote_resource_
|Resources to be promoted
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,promote_resource]
indexterm:[promote_resource,Notification Environment Variable]
|_OCF_RESKEY_CRM_meta_notify_demote_resource_
|Resources to be demoted
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,demote_resource]
indexterm:[demote_resource,Notification Environment Variable]
|_OCF_RESKEY_CRM_meta_notify_promote_uname_
|Nodes on which resources will be promoted
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,promote_uname]
indexterm:[promote_uname,Notification Environment Variable]
|_OCF_RESKEY_CRM_meta_notify_demote_uname_
|Nodes on which resources will be demoted
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,demote_uname]
indexterm:[demote_uname,Notification Environment Variable]
|_OCF_RESKEY_CRM_meta_notify_master_uname_
|Nodes on which resources are running in +Master+ mode
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,master_uname]
indexterm:[master_uname,Notification Environment Variable]
|_OCF_RESKEY_CRM_meta_notify_slave_uname_
|Nodes on which resources are running in +Slave+ mode
indexterm:[Environment Variable,OCF_RESKEY_CRM_meta_notify_,slave_uname]
indexterm:[slave_uname,Notification Environment Variable]
|=========================================================
==== Interpretation of Promotable Notification Variables ====
.Pre-notification (demote):
* +Active+ resources: +$OCF_RESKEY_CRM_meta_notify_active_resource+
* +Master+ resources: +$OCF_RESKEY_CRM_meta_notify_master_resource+
* +Slave+ resources: +$OCF_RESKEY_CRM_meta_notify_slave_resource+
* Inactive resources: +$OCF_RESKEY_CRM_meta_notify_inactive_resource+
* Resources to be started: +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Resources to be promoted: +$OCF_RESKEY_CRM_meta_notify_promote_resource+
* Resources to be demoted: +$OCF_RESKEY_CRM_meta_notify_demote_resource+
* Resources to be stopped: +$OCF_RESKEY_CRM_meta_notify_stop_resource+
.Post-notification (demote) / Pre-notification (stop):
* +Active+ resources: +$OCF_RESKEY_CRM_meta_notify_active_resource+
* +Master+ resources:
** +$OCF_RESKEY_CRM_meta_notify_master_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_demote_resource+
* +Slave+ resources: +$OCF_RESKEY_CRM_meta_notify_slave_resource+
* Inactive resources: +$OCF_RESKEY_CRM_meta_notify_inactive_resource+
* Resources to be started: +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Resources to be promoted: +$OCF_RESKEY_CRM_meta_notify_promote_resource+
* Resources to be demoted: +$OCF_RESKEY_CRM_meta_notify_demote_resource+
* Resources to be stopped: +$OCF_RESKEY_CRM_meta_notify_stop_resource+
* Resources that were demoted: +$OCF_RESKEY_CRM_meta_notify_demote_resource+
.Post-notification (stop) / Pre-notification (start)
* +Active+ resources:
** +$OCF_RESKEY_CRM_meta_notify_active_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_stop_resource+
* +Master+ resources:
** +$OCF_RESKEY_CRM_meta_notify_master_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_demote_resource+
* +Slave+ resources:
** +$OCF_RESKEY_CRM_meta_notify_slave_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_stop_resource+
* Inactive resources:
** +$OCF_RESKEY_CRM_meta_notify_inactive_resource+
** plus +$OCF_RESKEY_CRM_meta_notify_stop_resource+
* Resources to be started: +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Resources to be promoted: +$OCF_RESKEY_CRM_meta_notify_promote_resource+
* Resources to be demoted: +$OCF_RESKEY_CRM_meta_notify_demote_resource+
* Resources to be stopped: +$OCF_RESKEY_CRM_meta_notify_stop_resource+
* Resources that were demoted: +$OCF_RESKEY_CRM_meta_notify_demote_resource+
* Resources that were stopped: +$OCF_RESKEY_CRM_meta_notify_stop_resource+
.Post-notification (start) / Pre-notification (promote)
* +Active+ resources:
** +$OCF_RESKEY_CRM_meta_notify_active_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_stop_resource+
** plus +$OCF_RESKEY_CRM_meta_notify_start_resource+
* +Master+ resources:
** +$OCF_RESKEY_CRM_meta_notify_master_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_demote_resource+
* +Slave+ resources:
** +$OCF_RESKEY_CRM_meta_notify_slave_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_stop_resource+
** plus +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Inactive resources:
** +$OCF_RESKEY_CRM_meta_notify_inactive_resource+
** plus +$OCF_RESKEY_CRM_meta_notify_stop_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Resources to be started: +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Resources to be promoted: +$OCF_RESKEY_CRM_meta_notify_promote_resource+
* Resources to be demoted: +$OCF_RESKEY_CRM_meta_notify_demote_resource+
* Resources to be stopped: +$OCF_RESKEY_CRM_meta_notify_stop_resource+
* Resources that were started: +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Resources that were demoted: +$OCF_RESKEY_CRM_meta_notify_demote_resource+
* Resources that were stopped: +$OCF_RESKEY_CRM_meta_notify_stop_resource+
.Post-notification (promote)
* +Active+ resources:
** +$OCF_RESKEY_CRM_meta_notify_active_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_stop_resource+
** plus +$OCF_RESKEY_CRM_meta_notify_start_resource+
* +Master+ resources:
** +$OCF_RESKEY_CRM_meta_notify_master_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_demote_resource+
** plus +$OCF_RESKEY_CRM_meta_notify_promote_resource+
* +Slave+ resources:
** +$OCF_RESKEY_CRM_meta_notify_slave_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_stop_resource+
** plus +$OCF_RESKEY_CRM_meta_notify_start_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_promote_resource+
* Inactive resources:
** +$OCF_RESKEY_CRM_meta_notify_inactive_resource+
** plus +$OCF_RESKEY_CRM_meta_notify_stop_resource+
** minus +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Resources to be started: +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Resources to be promoted: +$OCF_RESKEY_CRM_meta_notify_promote_resource+
* Resources to be demoted: +$OCF_RESKEY_CRM_meta_notify_demote_resource+
* Resources to be stopped: +$OCF_RESKEY_CRM_meta_notify_stop_resource+
* Resources that were started: +$OCF_RESKEY_CRM_meta_notify_start_resource+
* Resources that were promoted: +$OCF_RESKEY_CRM_meta_notify_promote_resource+
* Resources that were demoted: +$OCF_RESKEY_CRM_meta_notify_demote_resource+
* Resources that were stopped: +$OCF_RESKEY_CRM_meta_notify_stop_resource+
=== Monitoring Promotable Clone Resources ===
The usual monitor actions are insufficient to monitor a promotable clone
resource, because Pacemaker needs to verify not only that the resource is
active, but also that its actual role matches its intended one.
Define two monitoring actions: the usual one will cover the slave role,
and an additional one with +role="master"+ will cover the master role.
.Monitoring both states of a promotable clone resource
======
[source,XML]
-------
-------
======
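A sketch of such a pair of monitor operations (the agent and intervals are
illustrative; note that the two intervals differ):
[source,XML]
-------
<primitive id="myPromotable" class="ocf" provider="pacemaker" type="Stateful">
  <operations>
    <op id="myPromotable-monitor-slave" name="monitor" interval="11s"/>
    <op id="myPromotable-monitor-master" name="monitor" interval="10s" role="Master"/>
  </operations>
</primitive>
-------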
[IMPORTANT]
===========
It is crucial that _every_ monitor operation has a different interval!
Pacemaker currently differentiates between operations
only by resource and interval; so if (for example) a promotable clone resource
had the same monitor interval for both roles, Pacemaker would ignore the
role when checking the status -- which would cause unexpected return
codes, and therefore unnecessary complications.
===========
[[s-promotion-scores]]
=== Determining Which Instance is Promoted ===
Pacemaker can choose a promotable clone instance to be promoted in one of two
ways:
* Promotion scores: These are node attributes set via the `crm_master` utility,
which generally would be called by the resource agent's start action if it
supports promotable clones. This tool automatically detects both the resource
and host, and should be used to set a preference for being promoted. Based on
this, +promoted-max+, and +promoted-node-max+, the instance(s) with the
highest preference will be promoted.
* Constraints: Location constraints can indicate which nodes are most preferred
as masters.
.Explicitly preferring node1 to be promoted to master
======
[source,XML]
-------
-------
======
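A sketch of such a location constraint (the resource name and ids are
assumed), giving +node1+ a positive preference for the master role:
[source,XML]
-------
<rsc_location id="myPromotable-master-on-node1" rsc="myPromotable">
  <rule id="myPromotable-master-rule" role="Master" score="100">
    <expression id="myPromotable-master-exp" attribute="#uname" operation="eq" value="node1"/>
  </rule>
</rsc_location>
-------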
[[s-resource-bundle]]
== Bundles - Isolated Environments ==
indexterm:[bundle]
indexterm:[Resource,bundle]
indexterm:[Docker,bundle]
indexterm:[rkt,bundle]
Pacemaker supports a special syntax for launching a
https://en.wikipedia.org/wiki/Operating-system-level_virtualization[container]
with any infrastructure it requires: the 'bundle'.
Pacemaker bundles support https://www.docker.com/[Docker] and
https://coreos.com/rkt/[rkt] container technologies.
footnote:[Docker is a trademark of Docker, Inc. No endorsement by or
association with Docker, Inc. is implied.]
.A bundle for a containerized web server
====
[source,XML]
----
----
====
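A rough sketch of such a bundle (the image name, addresses, and paths are
illustrative):
[source,XML]
-------
<bundle id="httpd-bundle">
  <docker image="pcmk:httpd" replicas="3"/>
  <network ip-range-start="192.0.2.131" host-netmask="24">
    <port-mapping id="httpd-port" port="80"/>
  </network>
  <storage>
    <storage-mapping id="httpd-root" source-dir="/srv/html"
                     target-dir="/var/www/html" options="rw"/>
  </storage>
  <primitive id="httpd" class="ocf" provider="heartbeat" type="apache"/>
</bundle>
-------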
=== Bundle Properties ===
.Properties of a Bundle
[width="95%",cols="3m,<5",options="header",align="center"]
|=========================================================
|Field
|Description
|id
|A unique name for the bundle (required)
indexterm:[id,bundle]
indexterm:[bundle,Property,id]
|description
|Arbitrary text (not used by Pacemaker)
indexterm:[description,bundle]
indexterm:[bundle,Property,description]
|=========================================================
A bundle must contain exactly one +docker+ or +rkt+ element.
=== Docker Properties ===
Before configuring a Docker bundle in Pacemaker, the user must install Docker
and supply a fully configured Docker image on every node allowed to run the
bundle.
Pacemaker will create an implicit +ocf:heartbeat:docker+ resource to manage
a bundle's Docker container. The user must ensure that resource agent is
installed on every node allowed to run the bundle.
.Properties of a Bundle's Docker Element
[width="95%",cols="3m,4,<5",options="header",align="center"]
|=========================================================
|Field
|Default
|Description
|image
|
|Docker image tag (required)
indexterm:[image,Docker]
indexterm:[Docker,Property,image]
|replicas
|Value of +promoted-max+ if that is positive, else 1
|A positive integer specifying the number of container instances to launch
indexterm:[replicas,Docker]
indexterm:[Docker,Property,replicas]
|replicas-per-host
|1
|A positive integer specifying the number of container instances allowed to run
on a single node
indexterm:[replicas-per-host,Docker]
indexterm:[Docker,Property,replicas-per-host]
|promoted-max
|0
|A non-negative integer that, if positive, indicates that the containerized
service should be treated as a promotable service, with this many replicas
allowed to run the service in the master role
indexterm:[promoted-max,Docker]
indexterm:[Docker,Property,promoted-max]
|network
|
|If specified, this will be passed to +docker run+ as the
https://docs.docker.com/engine/reference/run/#network-settings[network setting]
for the Docker container.
indexterm:[network,Docker]
indexterm:[Docker,Property,network]
|run-command
|`/usr/sbin/pacemaker-remoted` if bundle contains a +primitive+, otherwise none
|This command will be run inside the container when launching it ("PID 1"). If
the bundle contains a +primitive+, this command 'must' start pacemaker-remoted
(but could, for example, be a script that does other stuff, too). If the
container image has a pre-2.0.0 version of Pacemaker, set this to
+/usr/sbin/pacemaker_remoted+ (note the underbar instead of dash).
indexterm:[run-command,Docker]
indexterm:[Docker,Property,run-command]
|options
|
|Extra command-line options to pass to `docker run`
indexterm:[options,Docker]
indexterm:[Docker,Property,options]
|=========================================================
For backward compatibility, +masters+ is accepted as an alias for
+promoted-max+, but is deprecated since 2.0.0, and support for it will be
removed in a future version.
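For reference, a bundle's ++<docker>++ element might look like the following
sketch; the image tag, replica counts, and extra option shown are illustrative
placeholders rather than values required by Pacemaker.

.Example docker element (illustrative sketch)
====
[source,XML]
----
<!-- Sketch only: the image tag, counts, and options are placeholders -->
<docker image="example.com/httpd:latest" replicas="2" replicas-per-host="1"
        options="--log-driver=journald"/>
----
====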
=== rkt Properties ===
Before configuring a rkt bundle in Pacemaker, the user must install rkt
and supply a fully configured container image on every node allowed to run the
bundle.
Pacemaker will create an implicit +ocf:heartbeat:rkt+ resource to manage
a bundle's rkt container. The user must ensure that resource agent is
installed on every node allowed to run the bundle.
.Properties of a Bundle's rkt Element
[width="95%",cols="3m,4,<5",options="header",align="center"]
|=========================================================
|Field
|Default
|Description
|image
|
|Container image tag (required)
indexterm:[image,rkt]
indexterm:[rkt,Property,image]
|replicas
|Value of +promoted-max+ if that is positive, else 1
|A positive integer specifying the number of container instances to launch
indexterm:[replicas,rkt]
indexterm:[rkt,Property,replicas]
|replicas-per-host
|1
|A positive integer specifying the number of container instances allowed to run
on a single node
indexterm:[replicas-per-host,rkt]
indexterm:[rkt,Property,replicas-per-host]
|promoted-max
|0
|A non-negative integer that, if positive, indicates that the containerized
service should be treated as a promotable service, with this many replicas
allowed to run the service in the master role
indexterm:[promoted-max,rkt]
indexterm:[rkt,Property,promoted-max]
|network
|
|If specified, this will be passed to +rkt run+ as the
network setting for the rkt container.
indexterm:[network,rkt]
indexterm:[rkt,Property,network]
|run-command
|`/usr/sbin/pacemaker-remoted` if the bundle contains a +primitive+, otherwise none
|This command will be run inside the container when launching it ("PID 1"). If
the bundle contains a +primitive+, this command 'must' start pacemaker-remoted
(but could, for example, be a script that does other stuff, too). If the
container image has a pre-2.0.0 version of Pacemaker, set this to
+/usr/sbin/pacemaker_remoted+ (note the underbar instead of dash).
indexterm:[run-command,rkt]
indexterm:[rkt,Property,run-command]
|options
|
|Extra command-line options to pass to `rkt run`
indexterm:[options,rkt]
indexterm:[rkt,Property,options]
|=========================================================
For backward compatibility, +masters+ is accepted as an alias for
+promoted-max+, but is deprecated since 2.0.0, and support for it will be
removed in a future version.
=== Bundle Network Properties ===
A bundle may optionally contain one ++<network>++ element.
indexterm:[bundle,network]
.Properties of a Bundle's Network Element
[width="95%",cols="2m,1,<4",options="header",align="center"]
|=========================================================
|Field
|Default
|Description
|add-host
|TRUE
|If TRUE, and +ip-range-start+ is used, Pacemaker will automatically ensure
that +/etc/hosts+ inside the containers has entries for each
<<s-resource-bundle-note-replica-names,replica name>> and its assigned IP.
indexterm:[add-host,network]
indexterm:[network,Property,add-host]
|ip-range-start
|
|If specified, Pacemaker will create an implicit +ocf:heartbeat:IPaddr2+
resource for each container instance, starting with this IP address,
using up to +replicas+ sequential addresses. These addresses can be used
from the host's network to reach the service inside the container, though
the address is not visible within the container itself. Only IPv4 addresses are
currently supported.
indexterm:[ip-range-start,network]
indexterm:[network,Property,ip-range-start]
|host-netmask
|32
|If +ip-range-start+ is specified, the IP addresses are created with this
CIDR netmask (as a number of bits).
indexterm:[host-netmask,network]
indexterm:[network,Property,host-netmask]
|host-interface
|
|If +ip-range-start+ is specified, the IP addresses are created on this
host interface (by default, it will be determined from the IP address).
indexterm:[host-interface,network]
indexterm:[network,Property,host-interface]
|control-port
|3121
|If the bundle contains a +primitive+, the cluster will use this integer TCP
port for communication with Pacemaker Remote inside the container. Changing
this is useful when the container is unable to listen on the default port,
for example, when the container uses the host's network rather than
+ip-range-start+ (in which case +replicas-per-host+ must be 1), or when the
bundle may run on a Pacemaker Remote node that is already listening on the
default port. Any PCMK_remote_port environment variable set on the host or in
the container is ignored for bundle connections.
indexterm:[control-port,network]
indexterm:[network,Property,control-port]
|=========================================================
[[s-resource-bundle-note-replica-names]]
[NOTE]
====
Replicas are named by the bundle id plus a dash and an integer counter starting
with zero. For example, if a bundle named +httpd-bundle+ has +replicas=2+, its
containers will be named +httpd-bundle-0+ and +httpd-bundle-1+.
====
Additionally, a ++<network>++ element may optionally contain one or more
++<port-mapping>++ elements.
indexterm:[bundle,network,port-mapping]
.Properties of a Bundle's Port-Mapping Element
[width="95%",cols="2m,1,<4",options="header",align="center"]
|=========================================================
|Field
|Default
|Description
|id
|
|A unique name for the port mapping (required)
indexterm:[id,port-mapping]
indexterm:[port-mapping,Property,id]
|port
|
|If this is specified, connections to this TCP port number on the host network
(on the container's assigned IP address, if +ip-range-start+ is specified)
will be forwarded to the container network. Exactly one of +port+ or +range+
must be specified in a +port-mapping+.
indexterm:[port,port-mapping]
indexterm:[port-mapping,Property,port]
|internal-port
|value of +port+
|If +port+ and this are specified, connections to +port+ on the host's network
will be forwarded to this port on the container network.
indexterm:[internal-port,port-mapping]
indexterm:[port-mapping,Property,internal-port]
|range
|
|If this is specified, connections to these TCP port numbers (expressed as
'first_port'-'last_port') on the host network (on the container's assigned IP
address, if +ip-range-start+ is specified) will be forwarded to the same ports
in the container network. Exactly one of +port+ or +range+ must be specified
in a +port-mapping+.
indexterm:[range,port-mapping]
indexterm:[port-mapping,Property,range]
|=========================================================
[NOTE]
====
If the bundle contains a +primitive+, Pacemaker will automatically map the
+control-port+, so it is not necessary to specify that port in a
+port-mapping+.
====
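As a sketch, a ++<network>++ element that assigns per-replica IP addresses and
forwards the web port might look like the following; the addresses, interface
name, and IDs are illustrative.

.Example network element with a port mapping (illustrative sketch)
====
[source,XML]
----
<!-- Sketch only: addresses, interface name, and ids are placeholders -->
<network ip-range-start="192.168.122.131" host-netmask="24" host-interface="eth0">
   <port-mapping id="httpd-port" port="80"/>
</network>
----
====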
=== Bundle Storage Properties ===
A bundle may optionally contain one ++<storage>++ element. A ++<storage>++ element
has no properties of its own, but may contain one or more ++<storage-mapping>++
elements.
indexterm:[bundle,storage,storage-mapping]
.Properties of a Bundle's Storage-Mapping Element
[width="95%",cols="2m,1,<4",options="header",align="center"]
|=========================================================
|Field
|Default
|Description
|id
|
|A unique name for the storage mapping (required)
indexterm:[id,storage-mapping]
indexterm:[storage-mapping,Property,id]
|source-dir
|
|The absolute path on the host's filesystem that will be mapped into the
container. Exactly one of +source-dir+ and +source-dir-root+ must be specified
in a +storage-mapping+.
indexterm:[source-dir,storage-mapping]
indexterm:[storage-mapping,Property,source-dir]
|source-dir-root
|
|The start of a path on the host's filesystem that will be mapped into the
container, using a different subdirectory on the host for each container
instance. The subdirectory will be named the same as the
<<s-resource-bundle-note-replica-names,replica name>>.
Exactly one of +source-dir+ and +source-dir-root+ must be specified in a
+storage-mapping+.
indexterm:[source-dir-root,storage-mapping]
indexterm:[storage-mapping,Property,source-dir-root]
|target-dir
|
|The path name within the container where the host storage will be mapped
(required)
indexterm:[target-dir,storage-mapping]
indexterm:[storage-mapping,Property,target-dir]
|options
|
|File system mount options to use when mapping the storage
indexterm:[options,storage-mapping]
indexterm:[storage-mapping,Property,options]
|=========================================================
[NOTE]
====
Pacemaker does not define the behavior if the source directory does not already
exist on the host. However, it is expected that the container technology and/or
its resource agent will create the source directory in that case.
====
[NOTE]
====
If the bundle contains a +primitive+,
Pacemaker will automatically map the equivalent of
+source-dir=/etc/pacemaker/authkey target-dir=/etc/pacemaker/authkey+
and +source-dir-root=/var/log/pacemaker/bundles target-dir=/var/log+ into the
container, so it is not necessary to specify those paths in a
+storage-mapping+.
====
[IMPORTANT]
====
The +PCMK_authkey_location+ environment variable must not be set to anything
other than the default of `/etc/pacemaker/authkey` on any node in the cluster.
====
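For illustration, a ++<storage>++ element that maps one fixed host directory and
one per-replica directory into the container might look like the following; the
paths and IDs are placeholders.

.Example storage element (illustrative sketch)
====
[source,XML]
----
<!-- Sketch only: paths and ids are placeholders -->
<storage>
   <storage-mapping id="httpd-root" source-dir="/srv/html"
                    target-dir="/var/www/html" options="rw"/>
   <storage-mapping id="httpd-logs" source-dir-root="/var/log/pacemaker/bundles"
                    target-dir="/etc/httpd/logs" options="rw"/>
</storage>
----
====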
=== Bundle Primitive ===
A bundle may optionally contain one ++<primitive>++ resource
(see <>). The primitive may have operations,
instance attributes and meta-attributes defined, as usual.
If a bundle contains a primitive resource, the container image must include
the Pacemaker Remote daemon, and at least one of +ip-range-start+ or
+control-port+ must be configured in the bundle. Pacemaker will create an
implicit +ocf:pacemaker:remote+ resource for the connection, launch
Pacemaker Remote within the container, and monitor and manage the primitive
resource via Pacemaker Remote.
If the bundle has more than one container instance (replica), the primitive
resource will function as an implicit clone (see <>) --
a promotable clone if the bundle has +promoted-max+ greater than zero
(see <>).
[IMPORTANT]
====
Containers in bundles with a +primitive+ must have an accessible networking
environment, so that Pacemaker on the cluster nodes can contact
Pacemaker Remote inside the container. For example, the Docker option
`--net=none` should not be used with a +primitive+. The default (using a
distinct network space inside the container) works in combination with
+ip-range-start+. If the Docker option `--net=host` is used (making the
container share the host's network space), a unique +control-port+ should be
specified for each bundle. Any firewall must allow access to the
+control-port+.
====
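As a sketch, a bundle whose containers share the host's network space, and which
therefore specifies a unique +control-port+, might look like the following; the
image, IDs, resource agent, and port number are hypothetical.

.Example bundle using the host's network (illustrative sketch)
====
[source,XML]
----
<!-- Sketch only: image, ids, agent, and port number are placeholders -->
<bundle id="db-bundle">
   <docker image="example.com/db:latest" replicas="2" options="--net=host"/>
   <network control-port="61342"/>
   <primitive id="db" class="ocf" provider="pacemaker" type="Dummy"/>
</bundle>
----
====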
[[s-bundle-attributes]]
=== Bundle Node Attributes ===
If the bundle has a +primitive+, the primitive's resource agent may want to set
node attributes such as promotion scores. However, with
containers, it is not apparent which node should get the attribute.
If the container uses shared storage that is the same no matter which node the
container is hosted on, then it is appropriate to use the promotion score on the
bundle node itself.
On the other hand, if the container uses storage exported from the underlying host,
then it may be more appropriate to use the promotion score on the underlying host.
Since this depends on the particular situation, the
+container-attribute-target+ resource meta-attribute allows the user to specify
which approach to use. If it is set to +host+, then user-defined node attributes
will be checked on the underlying host. If it is anything else, the local node
(in this case the bundle node) is used as usual.
This only applies to user-defined attributes; the cluster will always check the
local node for cluster-defined attributes such as +#uname+.
If +container-attribute-target+ is +host+, the cluster will pass additional
environment variables to the primitive's resource agent that allow it to set
node attributes appropriately: +CRM_meta_container_attribute_target+ (identical
to the meta-attribute value) and +CRM_meta_physical_host+ (the name of the
underlying host).
[NOTE]
====
When called by a resource agent, the attrd_updater and crm_attribute commands
will automatically check those environment variables and set attributes
appropriately.
====
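For example, +container-attribute-target+ can be set like any other bundle
meta-attribute; a sketch of the relevant fragment inside a ++<bundle>++ element
(the IDs are illustrative) is:

.Setting container-attribute-target on a bundle (illustrative sketch)
====
[source,XML]
----
<!-- Sketch only: ids are placeholders; this fragment goes inside the bundle element -->
<meta_attributes id="httpd-bundle-meta_attributes">
   <nvpair id="httpd-bundle-meta_attributes-target"
           name="container-attribute-target" value="host"/>
</meta_attributes>
----
====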
=== Bundle Meta-Attributes ===
Any meta-attribute set on a bundle will be inherited by the bundle's
primitive and any resources implicitly created by Pacemaker for the bundle.
This includes options such as +priority+, +target-role+, and +is-managed+. See
<> for more information.
=== Limitations of Bundles ===
Restarting Pacemaker while a bundle is unmanaged or the cluster is in
maintenance mode may cause the bundle to fail.
Bundles may not be explicitly cloned or included in groups. This includes the
bundle's primitive and any resources implicitly created by Pacemaker for the
bundle. (If +replicas+ is greater than 1, the bundle will behave like a clone
implicitly.)
Bundles do not have instance attributes, utilization attributes, or operations,
though a bundle's primitive may have them.
A bundle with a primitive can run on a Pacemaker Remote node only if the bundle
uses a distinct +control-port+.
diff --git a/doc/Pacemaker_Explained/en-US/Ch-Alerts.txt b/doc/Pacemaker_Explained/en-US/Ch-Alerts.txt
index 34daeece5f..34efbb284b 100644
--- a/doc/Pacemaker_Explained/en-US/Ch-Alerts.txt
+++ b/doc/Pacemaker_Explained/en-US/Ch-Alerts.txt
@@ -1,423 +1,424 @@
+:compat-mode: legacy
= Alerts =
////
We prefer [[ch-alerts]], but older versions of asciidoc don't deal well
with that construct for chapter headings
////
anchor:ch-alerts[Chapter 7, Alerts]
indexterm:[Resource,Alerts]
'Alerts' may be configured to take some external action when a cluster event
occurs (node failure, resource starting or stopping, etc.).
== Alert Agents ==
As with resource agents, the cluster calls an external program (an
'alert agent') to handle alerts. The cluster passes information about the event
to the agent via environment variables. Agents can do anything
desired with this information (send an e-mail, log to a file,
update a monitoring system, etc.).
.Simple alert configuration
=====
[source,XML]
-----
<!-- Sketch of the example described below; the id and path prefix are illustrative -->
<configuration>
   <alerts>
      <alert id="my-alert" path="/path/to/my-script.sh" />
   </alerts>
</configuration>
-----
=====
In the example above, the cluster will call +my-script.sh+ for each event.
Multiple alert agents may be configured; the cluster will call all of them for
each event.
Alert agents will be called only on cluster nodes. They will be called for
events involving Pacemaker Remote nodes, but they will never be called _on_
those nodes.
== Alert Recipients ==
Usually, alerts are directed toward a recipient. Thus, each alert may be
additionally configured with one or more recipients.
The cluster will call the agent separately for each recipient.
.Alert configuration with recipient
=====
[source,XML]
-----
<!-- Sketch of the example described below; ids and path prefix are illustrative -->
<configuration>
   <alerts>
      <alert id="my-alert" path="/path/to/my-script.sh">
         <recipient id="my-alert-recipient" value="some-address"/>
      </alert>
   </alerts>
</configuration>
-----
=====
In the above example, the cluster will call +my-script.sh+ for each event,
passing the recipient +some-address+ as an environment variable.
The recipient may be anything the alert agent can recognize --
an IP address, an e-mail address, a file name, whatever the particular
agent supports.
== Alert Meta-Attributes ==
As with resource agents, meta-attributes can be configured for alert agents
to affect how Pacemaker calls them.
.Meta-Attributes of an Alert
[width="95%",cols="m,1,<2",options="header",align="center"]
|=========================================================
|Meta-Attribute
|Default
|Description
|timestamp-format
|%H:%M:%S.%06N
|Format the cluster will use when sending the event's timestamp to the agent.
This is a string as used with the `date(1)` command.
indexterm:[Alert,Option,timestamp-format]
|timeout
|30s
|If the alert agent does not complete within this amount of time, it will be
terminated.
indexterm:[Alert,Option,timeout]
|=========================================================
Meta-attributes can be configured per alert agent and/or per recipient.
.Alert configuration with meta-attributes
=====
[source,XML]
-----
<!-- Sketch of the example described below; ids and path prefix are illustrative -->
<configuration>
   <alerts>
      <alert id="my-alert" path="/path/to/my-script.sh">
         <meta_attributes id="my-alert-attributes">
            <nvpair id="my-alert-attributes-timeout" name="timeout" value="15s"/>
         </meta_attributes>
         <recipient id="my-alert-recipient1" value="someuser@example.com">
            <meta_attributes id="my-alert-recipient1-attributes">
               <nvpair id="my-alert-recipient1-timestamp-format"
                       name="timestamp-format" value="%D %H:%M"/>
            </meta_attributes>
         </recipient>
         <recipient id="my-alert-recipient2" value="otheruser@example.com">
            <meta_attributes id="my-alert-recipient2-attributes">
               <nvpair id="my-alert-recipient2-timestamp-format"
                       name="timestamp-format" value="%c"/>
            </meta_attributes>
         </recipient>
      </alert>
   </alerts>
</configuration>
-----
=====
In the above example, +my-script.sh+ will be called twice for each event,
with each call using a 15-second timeout. One call will be passed the recipient
+someuser@example.com+ and a timestamp in the format +%D %H:%M+, while the
other call will be passed the recipient +otheruser@example.com+ and a timestamp
in the format +%c+.
== Alert Instance Attributes ==
As with resource agents, agent-specific configuration values may be configured
as instance attributes. These will be passed to the agent as additional
environment variables. The number, names and allowed values of these
instance attributes are completely up to the particular agent.
.Alert configuration with instance attributes
=====
[source,XML]
-----
<!-- Sketch only: the "debug" attribute name is hypothetical; valid names depend on the agent -->
<configuration>
   <alerts>
      <alert id="my-alert" path="/path/to/my-script.sh">
         <instance_attributes id="my-alert-options">
            <nvpair id="my-alert-options-debug" name="debug" value="false"/>
         </instance_attributes>
         <recipient id="my-alert-recipient" value="some-address"/>
      </alert>
   </alerts>
</configuration>
-----
=====
== Alert Filters ==
By default, an alert agent will be called for node events, fencing events, and
resource events. An agent may choose to ignore certain types of events, but
there is still the overhead of calling it for those events. To eliminate that
overhead, you may select which types of events the agent should receive.
.Alert configuration to receive only node events and fencing events
=====
[source,XML]
-----
<!-- Sketch of the example described below; ids and path prefix are illustrative -->
<configuration>
   <alerts>
      <alert id="my-alert" path="/path/to/my-script.sh">
         <select>
            <select_nodes />
            <select_fencing />
         </select>
         <recipient id="my-alert-recipient" value="some-address"/>
      </alert>
   </alerts>
</configuration>
-----
=====
The possible options within +