diff --git a/doc/sphinx/Clusters_from_Scratch/installation.rst b/doc/sphinx/Clusters_from_Scratch/installation.rst
index db61a49b74..6a0d4698cd 100644
--- a/doc/sphinx/Clusters_from_Scratch/installation.rst
+++ b/doc/sphinx/Clusters_from_Scratch/installation.rst
@@ -1,437 +1,416 @@
Installation
------------
Install |CFS_DISTRO| |CFS_DISTRO_VER|
################################################################################################
Boot the Install Image
______________________
Download the latest |CFS_DISTRO| |CFS_DISTRO_VER| DVD ISO by navigating to
the `CentOS Mirrors List `_,
selecting a download mirror which is close to you, and finally selecting the
.iso file that has "dvd" in its name.
Use the image to boot a virtual machine, or burn it to a DVD or USB drive and
boot a physical server from that.
After starting the installation, select your language and keyboard layout at
the welcome screen.
.. figure:: images/WelcomeToCentos.png
- :scale: 80%
- :width: 1024
- :height: 800
:align: center
:alt: Installation Welcome Screen
|CFS_DISTRO| |CFS_DISTRO_VER| Installation Welcome Screen
Installation Options
____________________
At this point, you get a chance to tweak the default installation options.
.. figure:: images/InstallationSummary.png
- :scale: 80%
- :width: 1024
- :height: 800
:align: center
:alt: Installation Summary Screen
|CFS_DISTRO| |CFS_DISTRO_VER| Installation Summary Screen
Click on the **SOFTWARE SELECTION** section (try saying that 10 times quickly). The
default environment, **Server with GUI**, does have add-ons with much of the software
we need, but we will change the environment to a **Minimal Install** here, so that we
can see exactly what software is required later, and press **Done**.
.. figure:: images/SoftwareSelection.png
- :scale: 80%
- :width: 1024
- :height: 800
:align: center
:alt: Software Selection Screen
|CFS_DISTRO| |CFS_DISTRO_VER| Software Selection Screen
Configure Network
_________________
In the **NETWORK & HOSTNAME** section:
- Edit **Host Name:** as desired. For this example, we will use
**pcmk-1.localdomain** and then press **Apply**.
- Select your network device, press **Configure...**, and use the **Manual** method to
assign a fixed IP address. For this example, we'll use 192.168.122.101 under
**IPv4 Settings** (with an appropriate netmask, gateway and DNS server).
- Press **Save**.
- Flip the switch to turn your network device on, and press **Done**.
.. figure:: images/NetworkAndHostName.png
- :scale: 80%
- :width: 1024
- :height: 800
:align: center
:alt: Editing network settings
|CFS_DISTRO| |CFS_DISTRO_VER| Network Interface Screen
.. IMPORTANT::
Do not accept the default network settings.
Cluster machines should never obtain an IP address via DHCP, because
DHCP's periodic address renewal will interfere with corosync.
Configure Disk
______________
By default, the installer's automatic partitioning will use LVM (which allows
us to dynamically change the amount of space allocated to a given partition).
However, it allocates all free space to the ``/`` (a.k.a. **root**) partition, which
cannot be reduced in size later (dynamic increases are fine).
In order to follow the DRBD and GFS2 portions of this guide, we need to reserve
space on each machine for a replicated volume.
Enter the **INSTALLATION DESTINATION** section, ensure the hard drive you want to
install to is selected, select **Custom** to be the **Storage Configuration**, and
press **Done**.
In the **MANUAL PARTITIONING** screen that comes next, click the option to create
mountpoints automatically. Select the ``/`` mountpoint, and reduce the desired
capacity by 3GiB or so. Select **Modify...** by the volume group name, and change
the **Size policy:** to **As large as possible**, to make the reclaimed space
available inside the LVM volume group. We'll add the additional volume later.
.. figure:: images/ManualPartitioning.png
- :scale: 80%
- :width: 1024
- :height: 800
:align: center
:alt: Manual Partitioning Screen
|CFS_DISTRO| |CFS_DISTRO_VER| Manual Partitioning Screen
Press **Done**, then **Accept changes**.
Configure Time Synchronization
______________________________
It is highly recommended to enable NTP on your cluster nodes. Doing so
ensures all nodes agree on the current time and makes reading log files
significantly easier.
|CFS_DISTRO| will enable NTP automatically. If you want to change any time-related
settings (such as time zone or NTP server), you can do this in the
**TIME & DATE** section.
Root Password
______________________________
In order to continue to the next step, a **Root Password** must be set.
.. figure:: images/RootPassword.png
- :scale: 80%
- :width: 1024
- :height: 800
:align: center
:alt: Root Password Screen
|CFS_DISTRO| |CFS_DISTRO_VER| Root Password Screen
Press **Done** (depending on the password you chose, you may need to do so twice).
Finish Install
______________
Select **Begin Installation**. Once it completes, **Reboot System**
as instructed. After the node reboots, you'll see a login prompt on
the console. Log in using **root** and the password you created earlier.
.. figure:: images/ConsolePrompt.png
- :scale: 80%
- :width: 1024
- :height: 768
:align: center
:alt: Console Prompt
|CFS_DISTRO| |CFS_DISTRO_VER| Console Prompt
.. NOTE::
From here on, we're going to be working exclusively from the terminal.
Configure the OS
################
Verify Networking
_________________
Ensure that the machine has the static IP address you configured earlier.
.. code-block:: none
[root@pcmk-1 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 52:54:00:32:cf:a9 brd ff:ff:ff:ff:ff:ff
inet 192.168.122.101/24 brd 192.168.122.255 scope global noprefixroute enp1s0
valid_lft forever preferred_lft forever
inet6 fe80::c3e1:3ba:959:fa96/64 scope link noprefixroute
valid_lft forever preferred_lft forever
.. NOTE::
If you ever need to change the node's IP address from the command line, follow
these instructions, replacing **${device}** with the name of your network device:
.. code-block:: none
[root@pcmk-1 ~]# vi /etc/sysconfig/network-scripts/ifcfg-${device} # manually edit as desired
[root@pcmk-1 ~]# nmcli dev disconnect ${device}
[root@pcmk-1 ~]# nmcli con reload ${device}
[root@pcmk-1 ~]# nmcli con up ${device}
This makes **NetworkManager** aware that a change was made to the config file.
Next, ensure that the routes are as expected:
.. code-block:: none
[root@pcmk-1 ~]# ip route
default via 192.168.122.1 dev enp1s0 proto static metric 100
192.168.122.0/24 dev enp1s0 proto kernel scope link src 192.168.122.101 metric 100
If there is no line beginning with **default via**, then you may need to add a line such as
``GATEWAY="192.168.122.1"``
to the device configuration using the same process as described above for
changing the IP address.
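As a rough sketch, the default-route check can also be scripted. The helper name ``has_default_route`` is illustrative and not part of any tool. The sample text is copied from the output above so that the snippet runs anywhere; on a real node you would pipe ``ip route`` into the function instead.

```shell
#!/bin/sh
# has_default_route: hypothetical helper, not part of iproute2.
# Reads a routing table on stdin and succeeds if any line starts
# with "default via".
has_default_route() {
    grep -q '^default via '
}

# Sample routing table copied from the output above (assumption: your
# gateway address and device name will differ).
sample_routes='default via 192.168.122.1 dev enp1s0 proto static metric 100
192.168.122.0/24 dev enp1s0 proto kernel scope link src 192.168.122.101 metric 100'

if printf '%s\n' "$sample_routes" | has_default_route; then
    echo "default route present"
else
    echo "no default route; add a GATEWAY line to the device configuration"
fi
```

On a node, ``ip route | has_default_route`` would perform the same test against the live routing table.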
Now, check for connectivity to the outside world. Start small by
testing whether we can reach the gateway we configured.
.. code-block:: none
[root@pcmk-1 ~]# ping -c 1 192.168.122.1
PING 192.168.122.1 (192.168.122.1) 56(84) bytes of data.
64 bytes from 192.168.122.1: icmp_seq=1 ttl=64 time=0.492 ms
--- 192.168.122.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.492/0.492/0.492/0.000 ms
Now try something external; choose a location you know should be available.
.. code-block:: none
[root@pcmk-1 ~]# ping -c 1 www.clusterlabs.org
PING mx1.clusterlabs.org (95.217.104.78) 56(84) bytes of data.
64 bytes from mx1.clusterlabs.org (95.217.104.78): icmp_seq=1 ttl=54 time=134 ms
--- mx1.clusterlabs.org ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 133.987/133.987/133.987/0.000 ms
Login Remotely
______________
The console isn't a very friendly place to work from, so we will now
switch to accessing the machine remotely via SSH where we can
use copy and paste, etc.
From another host, check whether we can see the new host at all:
.. code-block:: none
[gchin@gchin ~]$ ping -c 1 192.168.122.101
PING 192.168.122.101 (192.168.122.101) 56(84) bytes of data.
64 bytes from 192.168.122.101: icmp_seq=1 ttl=64 time=0.344 ms
--- 192.168.122.101 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.344/0.344/0.344/0.000 ms
Next, log in as root via SSH.
.. code-block:: none
[gchin@gchin ~]$ ssh root@192.168.122.101
The authenticity of host '192.168.122.101 (192.168.122.101)' can't be established.
ECDSA key fingerprint is SHA256:NBvcRrPDLIt39Rf0Tz4/f2Rd/FA5wUiDOd9bZ9QWWjo.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '192.168.122.101' (ECDSA) to the list of known hosts.
root@192.168.122.101's password:
Last login: Tue Jan 10 20:46:30 2021
[root@pcmk-1 ~]#
Apply Updates
_____________
Apply any package updates released since your installation image was created:
.. code-block:: none
[root@pcmk-1 ~]# yum update
.. index::
single: node; short name
Use Short Node Names
____________________
During installation, we filled in the machine's fully qualified domain
name (FQDN), which can be rather long when it appears in cluster logs and
status output. See for yourself how the machine identifies itself:
.. code-block:: none
[root@pcmk-1 ~]# uname -n
pcmk-1.localdomain
We can use the ``hostnamectl`` tool to strip off the domain name:
.. code-block:: none
[root@pcmk-1 ~]# hostnamectl set-hostname $(uname -n | sed s/\\..*//)
Now, check that the machine is using the correct name:
.. code-block:: none
[root@pcmk-1 ~]# uname -n
pcmk-1
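If you prefer to avoid the escaping that the ``sed`` expression needs, the same trimming can be sketched with POSIX parameter expansion. The function name ``short_name`` is made up for this example.

```shell
#!/bin/sh
# short_name: hypothetical helper that strips everything from the first
# dot onward, using POSIX parameter expansion instead of sed.
short_name() {
    printf '%s\n' "${1%%.*}"
}

short_name pcmk-1.localdomain   # prints "pcmk-1"

# On a node you could then apply it, for example:
#   hostnamectl set-hostname "$(short_name "$(uname -n)")"
```

A name that contains no dot is returned unchanged, so running this on a node that already uses a short name is harmless.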
You may want to reboot to ensure all updates take effect.
Repeat for Second Node
######################
Repeat the Installation steps so far, so that you have two
nodes ready to have the cluster software installed.
For the purposes of this document, the additional node is called
pcmk-2 with address 192.168.122.102.
Configure Communication Between Nodes
#####################################
Configure Host Name Resolution
______________________________
Confirm that you can communicate between the two new nodes:
.. code-block:: none
[root@pcmk-1 ~]# ping -c 3 192.168.122.102
PING 192.168.122.102 (192.168.122.102) 56(84) bytes of data.
64 bytes from 192.168.122.102: icmp_seq=1 ttl=64 time=1.22 ms
64 bytes from 192.168.122.102: icmp_seq=2 ttl=64 time=0.795 ms
64 bytes from 192.168.122.102: icmp_seq=3 ttl=64 time=0.751 ms
--- 192.168.122.102 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2054ms
rtt min/avg/max/mdev = 0.751/0.923/1.224/0.214 ms
Now we need to make sure we can communicate with the machines by their
name. If you have a DNS server, add additional entries for the two
machines. Otherwise, you'll need to add the machines to ``/etc/hosts``
on both nodes. Below are the entries for my cluster nodes:
.. code-block:: none
[root@pcmk-1 ~]# grep pcmk /etc/hosts
192.168.122.101 pcmk-1.clusterlabs.org pcmk-1
192.168.122.102 pcmk-2.clusterlabs.org pcmk-2
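Before relying on name resolution, it can help to confirm that each node name really is listed. The following is a minimal sketch; ``hosts_lookup`` is a hypothetical helper, and it works on a scratch copy of the entries above rather than the real ``/etc/hosts``.

```shell
#!/bin/sh
# hosts_lookup: hypothetical helper. Prints the address of the first
# entry in hosts file $1 that lists name $2 (as canonical name or alias),
# and fails if there is none.
hosts_lookup() {
    awk -v name="$2" '
        { sub(/#.*/, "") }                      # drop comments
        { for (i = 2; i <= NF; i++)
              if ($i == name) { print $1; found = 1; exit } }
        END { exit !found }
    ' "$1"
}

# Demonstrate against a scratch copy of the entries shown above.
scratch=$(mktemp) || exit 1
cat > "$scratch" <<'EOF'
192.168.122.101 pcmk-1.clusterlabs.org pcmk-1
192.168.122.102 pcmk-2.clusterlabs.org pcmk-2
EOF

for node in pcmk-1 pcmk-2; do
    if addr=$(hosts_lookup "$scratch" "$node"); then
        echo "$node -> $addr"
    else
        echo "$node is missing from the hosts file"
    fi
done
rm -f "$scratch"
```

On a real node, pass ``/etc/hosts`` as the first argument and run the check on both machines.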
We can now verify the setup by again using ping:
.. code-block:: none
[root@pcmk-1 ~]# ping -c 3 pcmk-2
PING pcmk-2.clusterlabs.org (192.168.122.102) 56(84) bytes of data.
64 bytes from pcmk-2.clusterlabs.org (192.168.122.102): icmp_seq=1 ttl=64 time=0.295 ms
64 bytes from pcmk-2.clusterlabs.org (192.168.122.102): icmp_seq=2 ttl=64 time=0.616 ms
64 bytes from pcmk-2.clusterlabs.org (192.168.122.102): icmp_seq=3 ttl=64 time=0.809 ms
--- pcmk-2.clusterlabs.org ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2043ms
rtt min/avg/max/mdev = 0.295/0.573/0.809/0.212 ms
.. index:: SSH
Configure SSH
_____________
SSH is a convenient and secure way to copy files and perform commands
remotely. For the purposes of this guide, we will create a key without a
password (using the ``-N`` option) so that we can perform remote actions
without being prompted.
.. WARNING::
Unprotected SSH keys (those without a password) are not recommended for
servers exposed to the outside world. We use them here only to simplify
the demo.
Create a new key and allow anyone with that key to log in:
.. index::
single: SSH; key
.. topic:: Creating and Activating a new SSH Key
.. code-block:: none
[root@pcmk-1 ~]# ssh-keygen -t dsa -f ~/.ssh/id_dsa -N ""
Generating public/private dsa key pair.
Created directory '/root/.ssh'.
Your identification has been saved in /root/.ssh/id_dsa.
Your public key has been saved in /root/.ssh/id_dsa.pub.
The key fingerprint is:
SHA256:ehR595AVLAVpvFgqYXiayds2qx8emkvnHmfQZMTZ4jM root@pcmk-1
The key's randomart image is:
+---[DSA 1024]----+
| . ..+.=+. |
| . +o+ Bo. |
| . *oo+*+o |
| = .*E..o |
| oS..o . |
| .o+. |
| o.*oo |
| . B.* |
| === |
+----[SHA256]-----+
[root@pcmk-1 ~]# cp ~/.ssh/id_dsa.pub ~/.ssh/authorized_keys
Install the key on the other node:
.. code-block:: none
[root@pcmk-1 ~]# scp -r ~/.ssh pcmk-2:
The authenticity of host 'pcmk-2 (192.168.122.102)' can't be established.
ECDSA key fingerprint is SHA256:FQ4sVubTiHdQ6IetbN96fixoTVx/LuQUV8qoyiywnfs.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'pcmk-2,192.168.122.102' (ECDSA) to the list of known hosts.
root@pcmk-2's password:
id_dsa 100% 1385 1.6MB/s 00:00
id_dsa.pub 100% 601 1.0MB/s 00:00
authorized_keys 100% 601 1.3MB/s 00:00
known_hosts 100% 184 389.2KB/s 00:00
Test that you can now run commands remotely, without being prompted:
.. code-block:: none
[root@pcmk-1 ~]# ssh pcmk-2 -- uname -n
pcmk-2
diff --git a/doc/sphinx/Pacemaker_Administration/tools.rst b/doc/sphinx/Pacemaker_Administration/tools.rst
index 16353216b8..e85edee403 100644
--- a/doc/sphinx/Pacemaker_Administration/tools.rst
+++ b/doc/sphinx/Pacemaker_Administration/tools.rst
@@ -1,568 +1,562 @@
.. index:: command-line tool
Using Pacemaker Command-Line Tools
----------------------------------
.. index::
single: command-line tool; output format
.. _cmdline_output:
Controlling Command Line Output
###############################
Some of the pacemaker command line utilities have been converted to a new
output system. Among these tools are ``crm_mon`` and ``stonith_admin``. This
is an ongoing project, and more tools will be converted over time. This system
lets you control the formatting of output with ``--output-as=`` and the
destination of output with ``--output-to=``.
The available formats vary by tool, but at least plain text and XML are
supported by all tools that use the new system. The default format is plain
text. The default destination is stdout but can be redirected to any file.
Some formats support command line options for changing the style of the output.
For instance:
.. code-block:: none
# crm_mon --help-output
Usage:
crm_mon [OPTION?]
Provides a summary of cluster's current state.
Outputs varying levels of detail in a number of different formats.
Output Options:
--output-as=FORMAT Specify output format as one of: console (default), html, text, xml
--output-to=DEST Specify file name for output (or "-" for stdout)
--html-cgi Add text needed to use output in a CGI program
--html-stylesheet=URI Link to an external CSS stylesheet
--html-title=TITLE Page title
--text-fancy Use more highly formatted output
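To illustrate how the two options combine, here is a small sketch that assembles a ``crm_mon`` command line. The helper ``crm_mon_cmd`` is hypothetical, and the list of accepted formats is taken from the help text above, so it may differ in your version. The helper only prints the command and does not run it.

```shell
#!/bin/sh
# crm_mon_cmd: hypothetical helper that assembles a crm_mon command line
# from a format and an optional destination ("-" means stdout). It only
# prints the command; it does not run crm_mon.
crm_mon_cmd() {
    format=$1
    dest=${2:--}
    case $format in
        console|html|text|xml) ;;
        *) echo "unsupported format: $format" >&2; return 1 ;;
    esac
    printf 'crm_mon --output-as=%s --output-to=%s\n' "$format" "$dest"
}

crm_mon_cmd xml /tmp/cluster-status.xml
crm_mon_cmd text
```

The first call prints a command that would write XML to a file (the path is an arbitrary example); the second falls back to ``-``, which the help text documents as stdout.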
.. index::
single: crm_mon
single: command-line tool; crm_mon
.. _crm_mon:
Monitor a Cluster with crm_mon
##############################
The ``crm_mon`` utility displays the current state of an active cluster. It can
show the cluster status organized by node or by resource, and can be used in
either single-shot or dynamically updating mode. It can also display operations
performed and information about failures.
Using this tool, you can examine the state of the cluster for irregularities,
and see how it responds when you cause or simulate failures.
See the manual page or the output of ``crm_mon --help`` for a full description
of its many options.
.. topic:: Sample output from crm_mon -1
.. code-block:: none
Cluster Summary:
* Stack: corosync
* Current DC: node2 (version 2.0.0-1) - partition with quorum
* Last updated: Mon Jan 29 12:18:42 2018
* Last change: Mon Jan 29 12:18:40 2018 by root via crm_attribute on node3
* 5 nodes configured
* 2 resources configured
Node List:
* Online: [ node1 node2 node3 node4 node5 ]
* Active resources:
* Fencing (stonith:fence_xvm): Started node1
* IP (ocf:heartbeat:IPaddr2): Started node2
.. topic:: Sample output from crm_mon -n -1
.. code-block:: none
Cluster Summary:
* Stack: corosync
* Current DC: node2 (version 2.0.0-1) - partition with quorum
* Last updated: Mon Jan 29 12:21:48 2018
* Last change: Mon Jan 29 12:18:40 2018 by root via crm_attribute on node3
* 5 nodes configured
* 2 resources configured
* Node List:
* Node node1: online
* Fencing (stonith:fence_xvm): Started
* Node node2: online
* IP (ocf:heartbeat:IPaddr2): Started
* Node node3: online
* Node node4: online
* Node node5: online
As mentioned in an earlier chapter, the DC is the node where decisions are
made. The cluster elects a node to be DC as needed. The only significance of
the choice of DC to an administrator is the fact that its logs will have the
most information about why decisions were made.
.. index::
pair: crm_mon; CSS
.. _crm_mon_css:
Styling crm_mon HTML output
___________________________
Various parts of ``crm_mon``'s HTML output have a CSS class associated with
them. Not everything does, but some of the most interesting portions do. In
the following example, the status of each node has an ``online`` class and the
details of each resource have an ``rsc-ok`` class.
.. code-block:: html
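<h2>Node List</h2>
<ul>
<li>
<span>Node: cluster01</span><span class="online"> online</span>
</li>
<li><ul><li><span class="rsc-ok">ping (ocf::pacemaker:ping): Started</span></li></ul></li>
<li>
<span>Node: cluster02</span><span class="online"> online</span>
</li>
<li><ul><li><span class="rsc-ok">ping (ocf::pacemaker:ping): Started</span></li></ul></li>
</ul>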
By default, a stylesheet for styling these classes is included in the head of
the HTML output. The portions of this stylesheet that apply to the above
example are:
.. code-block:: css
<style>
.online { color: green }
.rsc-ok { color: green }
</style>
If you want to override some or all of the styling, simply create your own
stylesheet, place it on a web server, and pass ``--html-stylesheet=<URL>``
to ``crm_mon``. The link is added after the default stylesheet, so your
changes take precedence. You don't need to duplicate the entire default.
Only include what you want to change.
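As a sketch of that workflow, the script below writes a two-rule override file and prints a matching ``crm_mon`` invocation. The class names come from the example above; the colors, the file name, and the ``www.example.com`` URL are arbitrary placeholders, and the command is printed rather than executed.

```shell
#!/bin/sh
# Write a small override stylesheet. The class names come from the example
# above; the colors and file name are arbitrary choices for illustration.
workdir=$(mktemp -d) || exit 1
css_file=$workdir/crm_mon-custom.css
cat > "$css_file" <<'EOF'
.online { color: darkgreen; font-weight: bold }
.rsc-ok { color: navy }
EOF

# Hypothetical URL: wherever your web server ends up serving the file.
css_url=http://www.example.com/crm_mon-custom.css

echo "stylesheet written to $css_file"
echo "crm_mon --output-as=html --output-to=status.html --html-stylesheet=$css_url"
```

Because only two classes are overridden, every other class keeps the styling from the default stylesheet.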
.. index::
single: cibadmin
single: command-line tool; cibadmin
.. _cibadmin:
Edit the CIB XML with cibadmin
##############################
The most flexible tool for modifying the configuration is Pacemaker's
``cibadmin`` command. With ``cibadmin``, you can query, add, remove, update
or replace any part of the configuration. All changes take effect immediately,
so there is no need to perform a reload-like operation.
The simplest way of using ``cibadmin`` is to use it to save the current
configuration to a temporary file, edit that file with your favorite
text or XML editor, and then upload the revised configuration.
.. topic:: Safely using an editor to modify the cluster configuration
.. code-block:: none
# cibadmin --query > tmp.xml
# vi tmp.xml
# cibadmin --replace --xml-file tmp.xml
Some of the better XML editors can make use of a RELAX NG schema to
help make sure any changes you make are valid. The schema describing
the configuration can be found in ``pacemaker.rng``, which may be
deployed in a location such as ``/usr/share/pacemaker`` depending on your
operating system distribution and how you installed the software.
If you want to modify just one section of the configuration, you can
query and replace just that section to avoid modifying any others.
.. topic:: Safely using an editor to modify only the resources section
.. code-block:: none
# cibadmin --query --scope resources > tmp.xml
# vi tmp.xml
# cibadmin --replace --scope resources --xml-file tmp.xml
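The query, edit, and replace steps can be wrapped so that nothing is uploaded when the editor is closed without changes. This is only a sketch under that assumption: ``cib_changed`` and ``edit_cib_scope`` are hypothetical names, and the wrapper uses just the ``cibadmin`` options shown above.

```shell
#!/bin/sh
# cib_changed: succeed if the edited copy ($2) differs from the original ($1).
cib_changed() {
    ! cmp -s "$1" "$2"
}

# edit_cib_scope: hypothetical wrapper around the query/edit/replace steps
# above. It keeps the original for comparison and uploads only on a change.
edit_cib_scope() {
    scope=$1
    workdir=$(mktemp -d) || return 1
    cibadmin --query --scope "$scope" > "$workdir/orig.xml" || return 1
    cp "$workdir/orig.xml" "$workdir/edit.xml" || return 1
    "${EDITOR:-vi}" "$workdir/edit.xml" || return 1
    if cib_changed "$workdir/orig.xml" "$workdir/edit.xml"; then
        cibadmin --replace --scope "$scope" --xml-file "$workdir/edit.xml"
    else
        echo "no changes made; live CIB left untouched"
    fi
}

# Nothing runs until you call the wrapper, for example:
#   edit_cib_scope resources
```

Keeping ``orig.xml`` around also gives you a copy of the section as it was before the edit, should you need to compare or restore it.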
To quickly delete a part of the configuration, identify the object you wish to
delete by XML tag and id. For example, you might search the CIB for all
STONITH-related configuration:
.. topic:: Searching for STONITH-related configuration items
.. code-block:: none
# cibadmin --query | grep stonith
If you wanted to delete the ``primitive`` tag with id ``child_DoFencing``,
you would run:
.. code-block:: none
# cibadmin --delete --xml-text '<primitive id="child_DoFencing"/>'
See the cibadmin man page for more options.
.. warning::
Never edit the live ``cib.xml`` file directly. Pacemaker will detect such
changes and refuse to use the configuration.
.. index::
single: crm_shadow
single: command-line tool; crm_shadow
.. _crm_shadow:
Batch Configuration Changes with crm_shadow
###########################################
Often, it is desirable to preview the effects of a series of configuration
changes before updating the live configuration all at once. For this purpose,
``crm_shadow`` creates a "shadow" copy of the configuration and arranges for
all the command-line tools to use it.
To begin, simply invoke ``crm_shadow --create`` with a name of your choice,
and follow the simple on-screen instructions. Shadow copies are identified with
a name to make it possible to have more than one.
.. warning::
Read this section and the on-screen instructions carefully; failure to do so
could result in destroying the cluster's active configuration!
.. topic:: Creating and displaying the active sandbox
.. code-block:: none
# crm_shadow --create test
Setting up shadow instance
Type Ctrl-D to exit the crm_shadow shell
shadow[test]:
shadow[test] # crm_shadow --which
test
From this point on, all cluster commands will automatically use the shadow copy
instead of talking to the cluster's active configuration. Once you have
finished experimenting, you can either make the changes active via the
``--commit`` option, or discard them using the ``--delete`` option. Again, be
sure to follow the on-screen instructions carefully!
For a full list of ``crm_shadow`` options and commands, invoke it with the
``--help`` option.
.. topic:: Use sandbox to make multiple changes all at once, discard them, and verify real configuration is untouched
.. code-block:: none
shadow[test] # crm_failcount -r rsc_c001n01 -G
scope=status name=fail-count-rsc_c001n01 value=0
shadow[test] # crm_standby --node c001n02 -v on
shadow[test] # crm_standby --node c001n02 -G
scope=nodes name=standby value=on
shadow[test] # cibadmin --erase --force
shadow[test] # cibadmin --query
shadow[test] # crm_shadow --delete test --force
Now type Ctrl-D to exit the crm_shadow shell
shadow[test] # exit
# crm_shadow --which
No active shadow configuration defined
# cibadmin -Q
See the next section, :ref:`crm_simulate`, for how to test your changes before
committing them to the live cluster.
.. index::
single: crm_simulate
single: command-line tool; crm_simulate
.. _crm_simulate:
Simulate Cluster Activity with crm_simulate
###########################################
The command-line tool ``crm_simulate`` shows the results of the same logic
the cluster itself uses to respond to a particular cluster configuration and
status.
As always, the man page is the primary documentation, and should be consulted
for further details. This section aims for a better conceptual explanation and
practical examples.
Replaying cluster decision-making logic
_______________________________________
At any given time, one node in a Pacemaker cluster will be elected DC, and that
node will run Pacemaker's scheduler to make decisions.
Each time decisions need to be made (a "transition"), the DC will have log
messages like "Calculated transition ... saving inputs in ..." with a file
name. You can grab the named file and replay the cluster logic to see why
particular decisions were made. The file contains the live cluster
configuration at that moment, so you can also look at it directly to see the
value of node attributes, etc., at that time.
The simplest usage is (replacing $FILENAME with the actual file name):
.. topic:: Simulate cluster response to a given CIB
.. code-block:: none
# crm_simulate --simulate --xml-file $FILENAME
That will show the cluster state when the process started, the actions that
need to be taken ("Transition Summary"), and the resulting cluster state if the
actions succeed. Most actions will have a brief description of why they were
required.
The transition inputs may be compressed. ``crm_simulate`` can handle these
compressed files directly, though if you want to edit the file, you'll need to
uncompress it first.
You can do the same simulation for the live cluster configuration at the
current moment. This is useful mainly when using ``crm_shadow`` to create a
sandbox version of the CIB; the ``--live-check`` option will use the shadow CIB
if one is in effect.
.. topic:: Simulate cluster response to current live CIB or shadow CIB
.. code-block:: none
# crm_simulate --simulate --live-check
Why decisions were made
_______________________
Digging further into the "why" gets user-unfriendly very quickly. If
you add the ``--show-scores`` option, you will also see all the scores that
went into the decision-making. The node with the highest cumulative score for a
resource will run it. You can look for ``-INFINITY`` scores in particular to
see where complete bans came into effect.
You can also add ``-VVVV`` to get more detailed messages about what's happening
under the hood. You can even add up to two more V's, but that's usually useful
only if you're a masochist or tracing through the source code.
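Since score output can be long, a small filter helps when hunting for complete bans. The sketch below assumes nothing about the score line format beyond the literal text ``-INFINITY``; ``only_bans`` is a made-up name and the sample input is a stand-in, not real ``crm_simulate`` output.

```shell
#!/bin/sh
# only_bans: hypothetical filter that keeps lines mentioning -INFINITY.
# -F matches the text literally; -e stops grep from reading the leading
# dash as an option.
only_bans() {
    grep -F -e '-INFINITY'
}

# On a cluster node, with FILENAME set to a saved transition input:
#   crm_simulate --simulate --xml-file "$FILENAME" --show-scores | only_bans

# Stand-in input (made up, not real crm_simulate output) so the filter
# can be tried anywhere.
printf '%s\n' 'example score line A: 100' 'example score line B: -INFINITY' | only_bans
```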
Visualizing the action sequence
_______________________________
Another handy feature is the ability to generate a visual graph of the actions
needed, using the ``--dot-file`` option. This relies on the separate
Graphviz [#]_ project.
.. topic:: Generate a visual graph of cluster actions from a saved CIB
.. code-block:: none
# crm_simulate --simulate --xml-file $FILENAME --dot-file $FILENAME.dot
# dot $FILENAME.dot -Tsvg > $FILENAME.svg
``$FILENAME.dot`` will contain a Graphviz representation of the cluster's
response to your changes, including all actions with their ordering
dependencies.
``$FILENAME.svg`` will be the same information in a standard graphical format
that you can view in your browser or other app of choice. You could, of course,
use other ``dot`` options to generate other formats.
How to interpret the graphical output:
* Bubbles indicate actions, and arrows indicate ordering dependencies
* Resource actions have text of the form
``<RESOURCE>_<ACTION>_<INTERVAL_IN_MS> <NODE>`` indicating that the
specified action will be executed for the specified resource on the
specified node, once if interval is 0 or at specified recurring interval
otherwise
* Actions with black text will be sent to the executor (that is, the
appropriate agent will be invoked)
* Actions with orange text are "pseudo" actions that the cluster uses
internally for ordering but require no real activity
* Actions with a solid green border are part of the transition (that is, the
cluster will attempt to execute them in the given order -- though a
transition can be interrupted by action failure or new events)
* Dashed arrows indicate dependencies that are not present in the transition
graph
* Actions with a dashed border will not be executed. If the dashed border is
blue, the cluster does not feel the action needs to be executed. If the
dashed border is red, the cluster would like to execute the action but
cannot. Any actions depending on an action with a dashed border will not be
able to execute.
* Loops should not happen, and should be reported as a bug if found.
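To make the resource action label format concrete, here is a sketch that splits a label such as ``rsc1_monitor_0 pcmk-2`` into its parts. It assumes the simple resource, action, interval, node layout described in the list above; ``parse_action_label`` is a hypothetical helper, and labels for other kinds of actions may not follow this layout.

```shell
#!/bin/sh
# parse_action_label: hypothetical helper. Splits "KEY NODE", where KEY is
# resource_action_interval, working from the right so that underscores in
# the resource name are preserved.
parse_action_label() {
    label=$1
    node=${label#* }          # text after the first space
    key=${label%% *}          # text before the first space
    interval=${key##*_}       # last underscore-separated field
    rest=${key%_*}
    action=${rest##*_}        # second-to-last field
    resource=${rest%_*}       # everything before that
    printf 'resource=%s action=%s interval=%s node=%s\n' \
        "$resource" "$action" "$interval" "$node"
}

parse_action_label 'rsc1_monitor_0 pcmk-2'
# prints: resource=rsc1 action=monitor interval=0 node=pcmk-2
```

An interval of 0 means the action runs once; any other value is the recurring interval.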
.. topic:: Small Cluster Transition
.. image:: ../shared/images/Policy-Engine-small.png
:alt: An example transition graph as represented by Graphviz
- :height: 325
- :width: 1161
- :scale: 75 %
:align: center
In the above example, it appears that a new node, ``pcmk-2``, has come online
and that the cluster is checking to make sure ``rsc1``, ``rsc2`` and ``rsc3``
are not already running there (indicated by the ``rscN_monitor_0`` entries).
Once it did that, and assuming the resources were not active there, it would
have liked to stop ``rsc1`` and ``rsc2`` on ``pcmk-1`` and move them to
``pcmk-2``. However, there appears to be some problem and the cluster cannot or
is not permitted to perform the stop actions, which implies it also cannot
perform the start actions. For some reason, the cluster does not want to start
``rsc3`` anywhere.
.. topic:: Complex Cluster Transition
.. image:: ../shared/images/Policy-Engine-big.png
:alt: Complex transition graph that you're not expected to be able to read
- :width: 1455
- :height: 1945
- :scale: 75 %
:align: center
What-if scenarios
_________________
You can make changes to the saved or shadow CIB and simulate it again, to see
how Pacemaker would react differently. You can edit the XML by hand, use
command-line tools such as ``cibadmin`` with either a shadow CIB or the
``CIB_file`` environment variable set to the filename, or use higher-level tool
support (see the man pages of the specific tool you're using for how to perform
actions on a saved CIB file rather than the live CIB).
You can also inject node failures and/or action failures into the simulation;
see the ``crm_simulate`` man page for more details.
This capability is useful when using a shadow CIB to edit the configuration.
Before committing the changes to the live cluster with ``crm_shadow --commit``,
you can use ``crm_simulate`` to see how the cluster will react to the changes.
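For the saved-file variant of this workflow, one possible sketch is shown below. It edits a scratch copy so that the saved transition input is never modified. The helper names are hypothetical, and the ``.bz2`` handling is an assumption about how your inputs are compressed.

```shell
#!/bin/sh
# working_copy: hypothetical helper. Copies a saved transition input to a
# scratch file, decompressing it first when the name ends in .bz2
# (assumption: inputs are bzip2-compressed), and prints the scratch path.
working_copy() {
    src=$1
    copy=$(mktemp) || return 1
    case $src in
        *.bz2) bzip2 -dc "$src" > "$copy" || return 1 ;;
        *)     cp "$src" "$copy" || return 1 ;;
    esac
    printf '%s\n' "$copy"
}

# what_if: edit a scratch copy of the saved CIB, then simulate the result.
# The original file is never modified.
what_if() {
    copy=$(working_copy "$1") || return 1
    "${EDITOR:-vi}" "$copy" || return 1
    crm_simulate --simulate --xml-file "$copy"
}

# Nothing runs until you call it, for example:
#   what_if "$FILENAME"
```

Instead of editing by hand, you could point ``cibadmin`` at the scratch copy through the ``CIB_file`` environment variable, as described above.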
.. _crm_attribute:
.. index::
single: attrd_updater
single: command-line tool; attrd_updater
single: crm_attribute
single: command-line tool; crm_attribute
Manage Node Attributes, Cluster Options and Defaults with crm_attribute and attrd_updater
#########################################################################################
``crm_attribute`` and ``attrd_updater`` are confusingly similar tools with subtle
differences.
``attrd_updater`` can query and update node attributes. ``crm_attribute`` can query
and update not only node attributes, but also cluster options, resource
defaults, and operation defaults.
To understand the differences, it helps to understand the various types of node
attribute.
.. table:: **Types of Node Attributes**
+-----------+----------+-------------------+------------------+----------------+----------------+
| Type | Recorded | Recorded in | Survive full | Manageable by | Manageable by |
| | in CIB? | attribute manager | cluster restart? | crm_attribute? | attrd_updater? |
| | | memory? | | | |
+===========+==========+===================+==================+================+================+
| permanent | yes | no | yes | yes | no |
+-----------+----------+-------------------+------------------+----------------+----------------+
| transient | yes | yes | no | yes | yes |
+-----------+----------+-------------------+------------------+----------------+----------------+
| private | no | yes | no | no | yes |
+-----------+----------+-------------------+------------------+----------------+----------------+
As you can see from the table above, ``crm_attribute`` can manage permanent and
transient node attributes, while ``attrd_updater`` can manage transient and
private node attributes.
The difference between the two tools lies mainly in *how* they update node
attributes: ``attrd_updater`` always contacts the Pacemaker attribute manager
directly, while ``crm_attribute`` will contact the attribute manager only for
transient node attributes, and will instead modify the CIB directly for
permanent node attributes (and for transient node attributes when unable to
contact the attribute manager).
By contacting the attribute manager directly, ``attrd_updater`` can change
an attribute's "dampening" (whether changes are immediately flushed to the CIB
or after a specified amount of time, to minimize disk writes for frequent
changes), set private node attributes (which are never written to the CIB), and
set attributes for nodes that don't yet exist.
By modifying the CIB directly, ``crm_attribute`` can set permanent node
attributes (which are only in the CIB and not managed by the attribute
manager), and can be used with saved CIB files and shadow CIBs.
Regardless of how a transient node attribute is set, it is synchronized between
the CIB and the attribute manager, on all nodes.
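As a compact restatement of the table above, the following minimal Python
sketch encodes each attribute type as a row and looks up which tool can manage
it. The names ``AttrType``, ``NODE_ATTR_TYPES`` and ``tools_for`` are
hypothetical and exist only for this illustration; they are not part of
Pacemaker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttrType:
    """One row of the "Types of Node Attributes" table (illustrative only)."""
    in_cib: bool            # recorded in the CIB?
    in_attrd_memory: bool   # recorded in attribute manager memory?
    survives_restart: bool  # survives a full cluster restart?
    crm_attribute: bool     # manageable by crm_attribute?
    attrd_updater: bool     # manageable by attrd_updater?


# Values copied from the table; the structure itself is an assumption.
NODE_ATTR_TYPES = {
    "permanent": AttrType(True, False, True, True, False),
    "transient": AttrType(True, True, False, True, True),
    "private": AttrType(False, True, False, False, True),
}


def tools_for(attr_type):
    """Return the names of the tools that can manage the given attribute type."""
    row = NODE_ATTR_TYPES[attr_type]
    tools = []
    if row.crm_attribute:
        tools.append("crm_attribute")
    if row.attrd_updater:
        tools.append("attrd_updater")
    return tools


for name in NODE_ATTR_TYPES:
    print(f"{name}: {', '.join(tools_for(name))}")
```

Only transient attributes can be managed by both tools, which is why the
choice between them matters mainly for that type.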
.. index::
single: crm_failcount
single: command-line tool; crm_failcount
single: crm_node
single: command-line tool; crm_node
single: crm_report
single: command-line tool; crm_report
single: crm_standby
single: command-line tool; crm_standby
single: crm_verify
single: command-line tool; crm_verify
single: stonith_admin
single: command-line tool; stonith_admin
Other Commonly Used Tools
#########################
Other command-line tools include:
* ``crm_failcount``: query or delete resource fail counts
* ``crm_node``: manage cluster nodes
* ``crm_report``: generate a detailed cluster report for bug submissions
* ``crm_resource``: manage cluster resources
* ``crm_standby``: manage standby status of nodes
* ``crm_verify``: validate a CIB
* ``stonith_admin``: manage fencing devices
See the manual pages for details.
.. rubric:: Footnotes
.. [#] Graph visualization software. See http://www.graphviz.org/ for details.
diff --git a/doc/sphinx/Pacemaker_Remote/intro.rst b/doc/sphinx/Pacemaker_Remote/intro.rst
index 361d4fb82d..9c5dab81a0 100644
--- a/doc/sphinx/Pacemaker_Remote/intro.rst
+++ b/doc/sphinx/Pacemaker_Remote/intro.rst
@@ -1,190 +1,186 @@
Scaling a Pacemaker Cluster
---------------------------
Overview
########
In a basic Pacemaker high-availability cluster [#]_, each node runs the full
cluster stack of Corosync and all Pacemaker components. This allows great
flexibility but limits scalability to around 16 nodes.
To allow for scalability to dozens or even hundreds of nodes, Pacemaker
allows nodes not running the full cluster stack to integrate into the cluster
and have the cluster manage their resources as if they were a cluster node.
Terms
#####
.. index::
single: cluster node
single: node; cluster node
**cluster node**
A node running the full high-availability stack of Corosync and all
Pacemaker components. Cluster nodes may run cluster resources, run
all Pacemaker command-line tools (``crm_mon``, ``crm_resource`` and so on),
execute fencing actions, count toward cluster quorum, and serve as the
cluster's Designated Controller (DC).
.. index:: pacemaker_remoted
**pacemaker_remoted**
A small service daemon that allows a host to be used as a Pacemaker node
without running the full cluster stack. Nodes running ``pacemaker_remoted``
may run cluster resources and most command-line tools, but cannot perform
other functions of full cluster nodes such as fencing execution, quorum
voting, or DC eligibility. The ``pacemaker_remoted`` daemon is an enhanced
version of Pacemaker's local resource management daemon (LRMD).
.. index::
single: remote node
single: node; remote node
**pacemaker_remote**
The name of the systemd service that manages ``pacemaker_remoted``.
**Pacemaker Remote**
A way to refer to the general technology implementing nodes running
``pacemaker_remoted``, including the cluster-side implementation
and the communication protocol between them.
**remote node**
A physical host running ``pacemaker_remoted``. Remote nodes have a special
resource that manages communication with the cluster. This is sometimes
referred to as the *bare metal* case.
.. index::
single: guest node
single: node; guest node
**guest node**
A virtual host running ``pacemaker_remoted``. Guest nodes differ from remote
nodes mainly in that the guest node is itself a resource that the cluster
manages.
.. NOTE::
*Remote* in this document means that the node is not part of the underlying
Corosync cluster. It has nothing to do with physical proximity. Remote nodes
and guest nodes are subject to the same latency requirements as cluster nodes,
which means they are typically in the same data center.
.. NOTE::
It is important to distinguish the various roles a virtual machine can serve
in Pacemaker clusters:
* A virtual machine can run the full cluster stack, in which case it is a
cluster node and is not itself managed by the cluster.
* A virtual machine can be managed by the cluster as a resource, without the
cluster having any awareness of the services running inside the virtual
machine. The virtual machine is *opaque* to the cluster.
* A virtual machine can be a cluster resource, and run ``pacemaker_remoted``
to make it a guest node, allowing the cluster to manage services
inside it. The virtual machine is *transparent* to the cluster.
.. index::
single: virtual machine; as guest node
Guest Nodes
###########
**"I want a Pacemaker cluster to manage virtual machine resources, but I also
want Pacemaker to be able to manage the resources that live within those
virtual machines."**
Without ``pacemaker_remoted``, the possibilities for implementing the above use
case have significant limitations:
* The cluster stack could be run on the physical hosts only, which loses the
ability to monitor resources within the guests.
* A separate cluster could be on the virtual guests, which quickly hits
scalability issues.
* The cluster stack could be run on the guests using the same cluster as the
physical hosts, which also hits scalability issues and complicates fencing.
With ``pacemaker_remoted``:
* The physical hosts are cluster nodes (running the full cluster stack).
* The virtual machines are guest nodes (running ``pacemaker_remoted``).
Nearly zero configuration is required on the virtual machine.
* The cluster stack on the cluster nodes launches the virtual machines and
immediately connects to ``pacemaker_remoted`` on them, allowing the
virtual machines to integrate into the cluster.
The key difference here between the guest nodes and the cluster nodes is that
the guest nodes do not run the cluster stack. This means they will never become
the DC, initiate fencing actions or participate in quorum voting.
On the other hand, this also means that they are not bound to the scalability
limits associated with the cluster stack (no 16-node Corosync member limits to
deal with). Guest nodes cannot scale indefinitely, but they are known to scale
horizontally much further than cluster nodes.
Other than the quorum limitation, these guest nodes behave just like cluster
nodes with respect to resource management. The cluster is fully capable of
managing and monitoring resources on each guest node. You can build constraints
against guest nodes, put them in standby, or do whatever else you'd expect to
be able to do with cluster nodes. They even show up in ``crm_mon`` output as
nodes.
To solidify the concept, below is an example that is very similar to an actual
deployment we test in our developer environment to verify guest node scalability:
* 16 cluster nodes running the full Corosync + Pacemaker stack
* 64 Pacemaker-managed virtual machine resources running ``pacemaker_remoted``
configured as guest nodes
* 64 Pacemaker-managed webserver and database resources configured to run on
the 64 guest nodes
With this deployment, you would have 64 webservers and databases running on 64
virtual machines on 16 hardware nodes, all of which are managed and monitored by
the same Pacemaker deployment. It is known that ``pacemaker_remoted`` can scale
to these lengths and possibly much further depending on the specific scenario.
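The arithmetic behind this example can be sketched as follows. The sketch
assumes a simple majority quorum with one vote per cluster node, which is a
simplification of what a real Corosync configuration may use; the function
names ``quorum_threshold`` and ``summarize`` are hypothetical. The point it
illustrates is that guest nodes add capacity for running resources without
adding Corosync members or quorum votes.

```python
def quorum_threshold(votes):
    """Votes needed for a simple majority (assumed: one vote per cluster node)."""
    return votes // 2 + 1


def summarize(cluster_nodes, guest_nodes):
    """Summarize membership for a deployment of cluster nodes plus guest nodes."""
    return {
        # Only full cluster nodes join the Corosync membership.
        "corosync_members": cluster_nodes,
        # Guest nodes do not vote, so the threshold depends on cluster nodes only.
        "quorum_threshold": quorum_threshold(cluster_nodes),
        # Both kinds of node can run cluster resources.
        "nodes_running_resources": cluster_nodes + guest_nodes,
    }


summary = summarize(cluster_nodes=16, guest_nodes=64)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Under these assumptions, the deployment above has 80 nodes able to run
resources while the membership layer still sees only 16 members.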
Remote Nodes
############
**"I want my traditional high-availability cluster to scale beyond the limits
imposed by the Corosync messaging layer."**
Ultimately, the primary advantage of remote nodes over cluster nodes is
scalability. Remote nodes may also serve a purpose in other use cases, such as
geographically distributed HA clusters, but those use cases are not well
understood at this point.
Like guest nodes, remote nodes will never become the DC, initiate
fencing actions or participate in quorum voting.
That is not to say, however, that fencing of a remote node works any
differently than that of a cluster node. The Pacemaker scheduler
understands how to fence remote nodes. As long as a fencing device exists, the
cluster is capable of ensuring remote nodes are fenced in the exact same way as
cluster nodes.
Expanding the Cluster Stack
###########################
With ``pacemaker_remoted``, the traditional view of the high-availability stack
can be expanded to include a new layer:
Traditional HA Stack
____________________
.. image:: images/pcmk-ha-cluster-stack.png
- :width: 17cm
- :height: 9cm
:alt: Traditional Pacemaker+Corosync Stack
:align: center
HA Stack With Guest Nodes
_________________________
.. image:: images/pcmk-ha-remote-stack.png
- :width: 20cm
- :height: 10cm
:alt: Pacemaker+Corosync Stack with pacemaker_remoted
:align: center
.. [#] See the ``_ Pacemaker documentation,
especially *Clusters From Scratch* and *Pacemaker Explained*.
diff --git a/doc/sphinx/shared/pacemaker-intro.rst b/doc/sphinx/shared/pacemaker-intro.rst
index c7aaeab86d..3473636843 100644
--- a/doc/sphinx/shared/pacemaker-intro.rst
+++ b/doc/sphinx/shared/pacemaker-intro.rst
@@ -1,201 +1,196 @@
What Is Pacemaker?
####################
Pacemaker is a high-availability *cluster resource manager* -- software that
runs on a set of hosts (a *cluster* of *nodes*) in order to preserve integrity
and minimize downtime of desired services (*resources*). [#]_ It is maintained
by the `ClusterLabs `_ community.
Pacemaker's key features include:
* Detection of and recovery from node- and service-level failures
* Ability to ensure data integrity by fencing faulty nodes
* Support for one or more nodes per cluster
* Support for multiple resource interface standards (anything that can be
scripted can be clustered)
* Support (but no requirement) for shared storage
* Support for practically any redundancy configuration (active/passive, N+1,
etc.)
* Automatically replicated configuration that can be updated from any node
* Ability to specify cluster-wide relationships between services,
such as ordering, colocation and anti-colocation
* Support for advanced service types, such as *clones* (services that need to
be active on multiple nodes), *promotable clones* (clones that can run in
one of two roles), and containerized services
* Unified, scriptable cluster management tools
.. note:: **Fencing**
*Fencing*, also known as *STONITH* (an acronym for Shoot The Other Node In
The Head), is the ability to ensure that it is not possible for a node to be
running a service. This is accomplished via *fence devices* such as
intelligent power switches that cut power to the target, or intelligent
network switches that cut the target's access to the local network.
Pacemaker represents fence devices as a special class of resource.
A cluster cannot safely recover from certain failure conditions, such as an
unresponsive node, without fencing.
Cluster Architecture
____________________
At a high level, a cluster can be viewed as having these parts (which together
are often referred to as the *cluster stack*):
* **Resources:** These are the reason for the cluster's being -- the services
that need to be kept highly available.
* **Resource agents:** These are scripts or operating system components that
start, stop, and monitor resources, given a set of resource parameters.
These provide a uniform interface between Pacemaker and the managed
services.
* **Fence agents:** These are scripts that execute node fencing actions,
given a target and fence device parameters.
* **Cluster membership layer:** This component provides reliable messaging,
membership, and quorum information about the cluster. Currently, Pacemaker
supports `Corosync `_ as this layer.
* **Cluster resource manager:** Pacemaker provides the brain that processes
and reacts to events that occur in the cluster. These events may include
nodes joining or leaving the cluster; resource events caused by failures,
maintenance, or scheduled activities; and other administrative actions.
To achieve the desired availability, Pacemaker may start and stop resources
and fence nodes.
* **Cluster tools:** These provide an interface for users to interact with the
cluster. Various command-line and graphical (GUI) interfaces are available.
Most managed services are not, themselves, cluster-aware. However, many popular
open-source cluster filesystems make use of a common *Distributed Lock
Manager* (DLM), which makes direct use of Corosync for its messaging and
membership capabilities and Pacemaker for the ability to fence nodes.
.. image:: ../shared/images/pcmk-stack.png
:alt: Example cluster stack
- :scale: 75 %
:align: center
Pacemaker Architecture
______________________
Pacemaker itself is composed of multiple daemons that work together:
* pacemakerd
* pacemaker-attrd
* pacemaker-based
* pacemaker-controld
* pacemaker-execd
* pacemaker-fenced
* pacemaker-schedulerd
.. image:: ../shared/images/pcmk-internals.png
:alt: Pacemaker software components
- :scale: 65 %
:align: center
The Pacemaker master process (pacemakerd) spawns all the other daemons, and
respawns them if they unexpectedly exit.
The *Cluster Information Base* (CIB) is an
`XML `_ representation of the cluster's
configuration and the state of all nodes and resources. The *CIB manager*
(pacemaker-based) keeps the CIB synchronized across the cluster, and handles
requests to modify it.
The *attribute manager* (pacemaker-attrd) maintains a database of attributes
for all nodes, keeps it synchronized across the cluster, and handles requests
to modify them. These attributes are usually recorded in the CIB.
Given a snapshot of the CIB as input, the *scheduler* (pacemaker-schedulerd)
determines what actions are necessary to achieve the desired state of the
cluster.
The *local executor* (pacemaker-execd) handles requests to execute
resource agents on the local cluster node, and returns the result.
The *fencer* (pacemaker-fenced) handles requests to fence nodes. Given a target
node, the fencer decides which cluster node(s) should execute which fencing
device(s), calls the necessary fencing agents (either directly, or via
requests to the fencer peers on other nodes), and returns the result.
The *controller* (pacemaker-controld) is Pacemaker's coordinator, maintaining a
consistent view of the cluster membership and orchestrating all the other
components.
Pacemaker centralizes cluster decision-making by electing one of the controller
instances as the *Designated Controller* (DC). Should the elected DC process
(or the node it is on) fail, a new one is quickly established. The DC responds
to cluster events by taking a current snapshot of the CIB, feeding it to the
scheduler, then asking the executors (either directly on the local node, or via
requests to controller peers on other nodes) and the fencer to execute any
necessary actions.
.. note:: **Old daemon names**
The Pacemaker daemons were renamed in version 2.0. You may still find
references to the old names, especially in documentation targeted to
version 1.1.
.. table::
+-------------------+---------------------+
| Old name | New name |
+===================+=====================+
| attrd | pacemaker-attrd |
+-------------------+---------------------+
| cib | pacemaker-based |
+-------------------+---------------------+
| crmd | pacemaker-controld |
+-------------------+---------------------+
| lrmd | pacemaker-execd |
+-------------------+---------------------+
| stonithd | pacemaker-fenced |
+-------------------+---------------------+
| pacemaker_remoted | pacemaker-remoted |
+-------------------+---------------------+
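When reading older logs or documentation, the mapping in the table above can be
applied mechanically. The following minimal Python sketch is one way to do so;
the helper ``modernize`` and the dictionary ``OLD_TO_NEW`` are hypothetical
names introduced for this illustration and are not part of Pacemaker. Matching
is case-sensitive and only replaces whole names, so ``CIB`` (the data store)
and names such as ``attrd_updater`` are left alone.

```python
import re

# Old daemon name -> new daemon name, copied from the table above.
OLD_TO_NEW = {
    "attrd": "pacemaker-attrd",
    "cib": "pacemaker-based",
    "crmd": "pacemaker-controld",
    "lrmd": "pacemaker-execd",
    "stonithd": "pacemaker-fenced",
    "pacemaker_remoted": "pacemaker-remoted",
}

# Match an old name only when it is not part of a longer identifier,
# so "pacemaker-attrd" and "attrd_updater" are not rewritten.
_PATTERN = re.compile(
    r"(?<![\w-])("
    + "|".join(re.escape(name) for name in sorted(OLD_TO_NEW, key=len, reverse=True))
    + r")(?![\w-])"
)


def modernize(text):
    """Replace old (1.1-era) daemon names in text with their 2.0 names."""
    return _PATTERN.sub(lambda match: OLD_TO_NEW[match.group(1)], text)


print(modernize("crmd asked lrmd to start the resource"))
```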
Node Redundancy Designs
_______________________
Pacemaker supports practically any `node redundancy configuration
`_
including *Active/Active*, *Active/Passive*, *N+1*, *N+M*, *N-to-1* and
*N-to-N*.
Active/passive clusters with two (or more) nodes using Pacemaker and
`DRBD `_ are
a cost-effective high-availability solution for many situations. One of the
nodes provides the desired services, and if it fails, the other node takes
over.
.. image:: ../shared/images/pcmk-active-passive.png
:alt: Active/Passive Redundancy
:align: center
- :scale: 75 %
Pacemaker also supports multiple nodes in a shared-failover design, reducing
hardware costs by allowing several active/passive clusters to be combined and
share a common backup node.
.. image:: ../shared/images/pcmk-shared-failover.png
:alt: Shared Failover
:align: center
- :scale: 75 %
When shared storage is available, every node can potentially be used for
failover. Pacemaker can even run multiple copies of services to spread out the
workload. This is sometimes called *N-to-N* redundancy.
.. image:: ../shared/images/pcmk-active-active.png
:alt: N to N Redundancy
:align: center
- :scale: 75 %
.. rubric:: Footnotes
.. [#] *Cluster* is sometimes used in other contexts to refer to hosts grouped
together for other purposes, such as high-performance computing (HPC),
but Pacemaker is not intended for those purposes.