diff --git a/INSTALL.md b/INSTALL.md index c4af9f7217..d25d4b04d9 100644 --- a/INSTALL.md +++ b/INSTALL.md @@ -1,79 +1,79 @@ # How to Install Pacemaker ## Build Dependencies | Version | Fedora-based | Suse-based | Debian-based | |:---------------:|:------------------:|:------------------:|:--------------:| | 1.11 or later | automake | automake | automake | | 2.64 or later | autoconf | autoconf | autoconf | | | libtool | libtool | libtool | | | libtool-ltdl-devel | | libltdl-dev | | | libuuid-devel | libuuid-devel | uuid-dev | | | pkgconfig | pkgconfig | pkg-config | | 2.32.0 or later | glib2-devel | glib2-devel | libglib2.0-dev | | | libxml2-devel | libxml2-devel | libxml2-dev | | | libxslt-devel | libxslt-devel | libxslt-dev | | | bzip2-devel | libbz2-devel | libbz2-dev | | | libqb-devel | libqb-devel | libqb-dev | | 3.2 or later | python3 | python3 | python3 | Also: GNU make ### Cluster Stack Dependencies *Only corosync is currently supported* | Version | Fedora-based | Suse-based | Debian-based | |:---------------:|:------------------:|:------------------:|:--------------:| | 2.0.0 or later | corosynclib | libcorosync | corosync | | 2.0.0 or later | corosynclib-devel | libcorosync-devel | | | | | | libcfg-dev | | | | | libcpg-dev | | | | | libcmap-dev | | | | | libquorum-dev | ### Optional Build Dependencies | Feature Enabled | Version | Fedora-based | Suse-based | Debian-based | |:-----------------------------------------------:|:--------------:|:-----------------------:|:-----------------------:|:-----------------------:| | Pacemaker Remote and encrypted remote CIB admin | 2.1.7 or later | gnutls-devel | libgnutls-devel | libgnutls-dev | | encrypted remote CIB admin | | pam-devel | pam-devel | libpam0g-dev | | interactive crm_mon | | ncurses-devel | ncurses-devel | ncurses-dev | | systemd support | | systemd-devel | systemd-devel | libsystemd-dev | | systemd/upstart resource support | | dbus-devel | dbus-devel | libdbus-1-dev | | Linux-HA style fencing agents | | cluster-glue-libs-devel | libglue-devel | cluster-glue-dev | | documentation | | asciidoc or asciidoctor | asciidoc or asciidoctor | asciidoc or asciidoctor | | documentation | | help2man | help2man | help2man | | documentation | | inkscape | inkscape | inkscape | | documentation | | docbook-style-xsl | docbook-xsl-stylesheets | docbook-xsl | | documentation | | python3-sphinx | python3-sphinx | python3-sphinx | -| documentation (PDF) | | texlive, texlive-titlesec, texlive-framed, texlive-threeparttable texlive-wrapfig texlive-multirow | texlive, texlive-latex | texlive, texlive-latex-extra | +| documentation (PDF) | | latexmk texlive texlive-capt-of texlive-collection-xetex texlive-fncychap texlive-framed texlive-multirow texlive-needspace texlive-tabulary texlive-titlesec texlive-threeparttable texlive-upquote texlive-wrapfig texlive-xetex | texlive texlive-latex | texlive texlive-latex-extra | | RPM packages via "make rpm" | 4.11 or later | rpm | rpm | (n/a) | ## Optional testing dependencies * procps and psmisc (if running cts-exec, cts-fencing, or CTS) * valgrind (if running CTS valgrind tests) * python3-systemd (if using CTS on cluster nodes running systemd) * rsync (if running CTS container tests) * libvirt-daemon-driver-lxc (if running CTS container tests) * libvirt-daemon-lxc (if running CTS container tests) * libvirt-login-shell (if running CTS container tests) * nmap (if not specifying an IP address base) * oprofile (if running CTS profiling tests) * dlm (to log DLM debugging info after CTS tests) ## Simple install 
$ make && sudo make install If GNU make is not your default make, use "gmake" instead. ## Detailed install First, browse the build options that are available: $ ./autogen.sh $ ./configure --help Re-run ./configure with any options you want, then proceed with the simple method. diff --git a/doc/sphinx/Clusters_from_Scratch/cluster-setup.rst b/doc/sphinx/Clusters_from_Scratch/cluster-setup.rst index 56939c00e1..8aeb39cfb0 100644 --- a/doc/sphinx/Clusters_from_Scratch/cluster-setup.rst +++ b/doc/sphinx/Clusters_from_Scratch/cluster-setup.rst @@ -1,299 +1,295 @@ Set up a Cluster ---------------- Simplify Administration With a Cluster Shell ############################################ In the dark past, configuring Pacemaker required the administrator to read and write XML. In true UNIX style, there were also a number of different commands that specialized in different aspects of querying and updating the cluster. In addition, the various components of the cluster stack (corosync, pacemaker, etc.) had to be configured separately, with different configuration tools and formats. All of that has been greatly simplified with the creation of higher-level tools, whether command-line or GUIs, that hide all the mess underneath. Command-line cluster shells take all the individual aspects required for managing and configuring a cluster, and pack them into one simple-to-use command-line tool. They even allow you to queue up several changes at once and commit them all at once. Two popular command-line shells are ``pcs`` and ``crmsh``. Clusters from Scratch is based on ``pcs`` because it comes with CentOS, but both have similar functionality. Choosing a shell or GUI is a matter of personal preference and what comes with (and perhaps is supported by) your choice of operating system. Install the Cluster Software ############################ Fire up a shell on both nodes and run the following to activate the High Availability repo. .. code-block:: none # dnf config-manager --set-enabled ha .. IMPORTANT:: This document will show commands that need to be executed on both nodes with a simple ``#`` prompt. Be sure to run them on each node individually. Now, we'll install pacemaker, pcs, and some other command-line tools that will make our lives easier: .. code-block:: none # yum install -y pacemaker pcs psmisc policycoreutils-python3 .. NOTE:: This document uses ``pcs`` for cluster management. Other alternatives, such as ``crmsh``, are available, but their syntax will differ from the examples used here. Configure the Cluster Software ############################## .. index:: single: firewall Allow cluster services through firewall _______________________________________ On each node, allow cluster-related services through the local firewall: .. code-block:: none # firewall-cmd --permanent --add-service=high-availability success # firewall-cmd --reload success .. NOTE :: If you are using iptables directly, or some other firewall solution besides firewalld, simply open the following ports, which can be used by various clustering components: TCP ports 2224, 3121, and 21064, and UDP port 5405. If you run into any problems during testing, you might want to disable the firewall and SELinux entirely until you have everything working. This may create significant security issues and should not be performed on machines that will be exposed to the outside world, but may be appropriate during development and testing on a protected host. To disable security measures: .. 
code-block:: none [root@pcmk-1 ~]# setenforce 0 [root@pcmk-1 ~]# sed -i.bak "s/SELINUX=enforcing/SELINUX=permissive/g" /etc/selinux/config [root@pcmk-1 ~]# systemctl mask firewalld.service [root@pcmk-1 ~]# systemctl stop firewalld.service [root@pcmk-1 ~]# iptables --flush Enable pcs Daemon _________________ Before the cluster can be configured, the pcs daemon must be started and enabled to start at boot time on each node. This daemon works with the pcs command-line interface to manage synchronizing the corosync configuration across all nodes in the cluster. Start and enable the daemon by issuing the following commands on each node: .. code-block:: none # systemctl start pcsd.service # systemctl enable pcsd.service Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service. The installed packages will create a **hacluster** user with a disabled password. While this is fine for running ``pcs`` commands locally, the account needs a login password in order to perform such tasks as syncing the corosync configuration, or starting and stopping the cluster on other nodes. This tutorial will make use of such commands, so now we will set a password for the **hacluster** user, using the same password on both nodes: .. code-block:: none # passwd hacluster Changing password for user hacluster. New password: Retype new password: passwd: all authentication tokens updated successfully. .. NOTE:: Alternatively, to script this process or set the password on a different machine from the one you're logged into, you can use the ``--stdin`` option for ``passwd``: .. code-block:: none [root@pcmk-1 ~]# ssh pcmk-2 -- 'echo mysupersecretpassword | passwd --stdin hacluster' Configure Corosync __________________ On either node, use ``pcs host auth`` to authenticate as the **hacluster** user: .. code-block:: none [root@pcmk-1 ~]# pcs host auth pcmk-1 pcmk-2 Username: hacluster Password: pcmk-2: Authorized pcmk-1: Authorized Next, use ``pcs cluster setup`` on the same node to generate and synchronize the corosync configuration: .. code-block:: none [root@pcmk-1 ~]# pcs cluster setup mycluster pcmk-1 pcmk-2 No addresses specified for host 'pcmk-1', using 'pcmk-1' No addresses specified for host 'pcmk-2', using 'pcmk-2' Destroying cluster on hosts: 'pcmk-1', 'pcmk-2'... pcmk-2: Successfully destroyed cluster pcmk-1: Successfully destroyed cluster Requesting remove 'pcsd settings' from 'pcmk-1', 'pcmk-2' pcmk-1: successful removal of the file 'pcsd settings' pcmk-2: successful removal of the file 'pcsd settings' Sending 'corosync authkey', 'pacemaker authkey' to 'pcmk-1', 'pcmk-2' pcmk-1: successful distribution of the file 'corosync authkey' pcmk-1: successful distribution of the file 'pacemaker authkey' pcmk-2: successful distribution of the file 'corosync authkey' pcmk-2: successful distribution of the file 'pacemaker authkey' Sending 'corosync.conf' to 'pcmk-1', 'pcmk-2' pcmk-1: successful distribution of the file 'corosync.conf' pcmk-2: successful distribution of the file 'corosync.conf' Cluster has been successfully set up. If you received an authorization error for either of those commands, make sure you configured the **hacluster** user account on each node with the same password. The final corosync.conf configuration on each node should look something like the sample in :ref:`sample-corosync-configuration`. Explore pcs ########### Start by taking some time to familiarize yourself with what ``pcs`` can do. .. 
code-block:: none [root@pcmk-1 ~]# pcs Usage: pcs [-f file] [-h] [commands]... Control and configure pacemaker and corosync. Options: -h, --help Display usage and exit. -f file Perform actions on file instead of active CIB. Commands supporting the option use the initial state of the specified file as their input and then overwrite the file with the state reflecting the requested operation(s). A few commands only use the specified file in read-only mode since their effect is not a CIB modification. --debug Print all network traffic and external commands run. --version Print pcs version information. List pcs capabilities if --full is specified. --request-timeout Timeout for each outgoing request to another node in seconds. Default is 60s. --force Override checks and errors, the exact behavior depends on the command. WARNING: Using the --force option is strongly discouraged unless you know what you are doing. Commands: cluster Configure cluster options and nodes. resource Manage cluster resources. stonith Manage fence devices. constraint Manage resource constraints. property Manage pacemaker properties. acl Manage pacemaker access control lists. qdevice Manage quorum device provider on the local host. quorum Manage cluster quorum settings. booth Manage booth (cluster ticket manager). status View cluster status. config View and manage cluster configuration. pcsd Manage pcs daemon. host Manage hosts known to pcs/pcsd. node Manage cluster nodes. alert Manage pacemaker alerts. client Manage pcsd client configuration. dr Manage disaster recovery configuration. tag Manage pacemaker tags. As you can see, the different aspects of cluster management are separated into categories. To discover the functionality available in each of these categories, one can issue the command ``pcs help``. Below is an example of all the options available under the status category. .. code-block:: none [root@pcmk-1 ~]# pcs status help Usage: pcs status [commands]... View current cluster and resource status Commands: [status] [--full] [--hide-inactive] View all information about the cluster and resources (--full provides more details, --hide-inactive hides inactive resources). resources [--hide-inactive] Show status of all currently configured resources. If --hide-inactive is specified, only show active resources. cluster View current cluster status. corosync View current membership information as seen by corosync. quorum View current quorum status. qdevice [--full] [] Show runtime status of specified model of quorum device provider. Using --full will give more detailed output. If is specified, only information about the specified cluster will be displayed. booth Print current status of booth on the local node. nodes [corosync | both | config] View current status of nodes from pacemaker. If 'corosync' is specified, view current status of nodes from corosync instead. If 'both' is specified, view current status of nodes from both corosync & pacemaker. If 'config' is specified, print nodes from corosync & pacemaker configuration. pcsd []... Show current status of pcsd on nodes specified, or on all nodes configured in the local cluster if no nodes are specified. xml View xml version of status (output from crm_mon -r -1 -X). Additionally, if you are interested in the version and supported cluster stack(s) available with your Pacemaker installation, run: .. 
code-block:: none [root@pcmk-1 ~]# pacemakerd --features Pacemaker 2.0.5-4.el8 (Build: ba59be7122) Supporting v3.6.1: generated-manpages agent-manpages ncurses libqb-logging libqb-ipc systemd nagios corosync-native atomic-attrd acls cibsecrets - - -.. [#] For some subtle issues, see `Topics in High-Performance Messaging: Multicast Address Assignment `_ - or the more detailed treatment in `Cisco's Guidelines for Enterprise IP Multicast Address Allocation `_. diff --git a/doc/sphinx/Clusters_from_Scratch/installation.rst b/doc/sphinx/Clusters_from_Scratch/installation.rst index 2827cda52a..6a0d4698cd 100644 --- a/doc/sphinx/Clusters_from_Scratch/installation.rst +++ b/doc/sphinx/Clusters_from_Scratch/installation.rst @@ -1,436 +1,416 @@ Installation ------------ Install |CFS_DISTRO| |CFS_DISTRO_VER| ################################################################################################ Boot the Install Image ______________________ Download the latest |CFS_DISTRO| |CFS_DISTRO_VER| DVD ISO by navigating to the `CentOS Mirrors List `_, selecting a download mirror which is close to you, and finally selecting the .iso file that has "dvd" in its name. Use the image to boot a virtual machine, or burn it to a DVD or USB drive and boot a physical server from that. After starting the installation, select your language and keyboard layout at the welcome screen. .. figure:: images/WelcomeToCentos.png - :scale: 80% - :width: 1024 - :height: 800 :align: center :alt: Installation Welcome Screen |CFS_DISTRO| |CFS_DISTRO_VER| Installation Welcome Screen Installation Options ____________________ At this point, you get a chance to tweak the default installation options. .. figure:: images/InstallationSummary.png - :scale: 80% - :width: 1024 - :height: 800 :align: center :alt: Installation Summary Screen |CFS_DISTRO| |CFS_DISTRO_VER| Installation Summary Screen Click on the **SOFTWARE SELECTION** section (try saying that 10 times quickly). The default environment, **Server with GUI**, does have add-ons with much of the software we need, but we will change the environment to a **Minimal Install** here, so that we can see exactly what software is required later, and press **Done**. .. figure:: images/SoftwareSelection.png - :scale: 80% - :width: 1024 - :height: 800 :align: center :alt: Software Selection Screen |CFS_DISTRO| |CFS_DISTRO_VER| Software Selection Screen Configure Network _________________ In the **NETWORK & HOSTNAME** section: - Edit **Host Name:** as desired. For this example, we will use **pcmk-1.localdomain** and then press **Apply**. - Select your network device, press **Configure...**, and use the **Manual** method to assign a fixed IP address. For this example, we'll use 192.168.122.101 under **IPv4 Settings** (with an appropriate netmask, gateway and DNS server). - Press **Save**. - Flip the switch to turn your network device on, and press **Done**. .. figure:: images/NetworkAndHostName.png - :scale: 80% - :width: 1024 - :height: 800 :align: center :alt: Editing network settings |CFS_DISTRO| |CFS_DISTRO_VER| Network Interface Screen .. IMPORTANT:: Do not accept the default network settings. Cluster machines should never obtain an IP address via DHCP, because DHCP's periodic address renewal will interfere with corosync. Configure Disk ______________ By default, the installer's automatic partitioning will use LVM (which allows us to dynamically change the amount of space allocated to a given partition). However, it allocates all free space to the ``/`` (aka. 
**root**) partition, which cannot be reduced in size later (dynamic increases are fine). In order to follow the DRBD and GFS2 portions of this guide, we need to reserve space on each machine for a replicated volume. Enter the **INSTALLATION DESTINATION** section, ensure the hard drive you want to install to is selected, select **Custom** to be the **Storage Configuration**, and press **Done**. In the **MANUAL PARTITIONING** screen that comes next, click the option to create mountpoints automatically. Select the ``/`` mountpoint, and reduce the desired capacity by 3GiB or so. Select **Modify...** by the volume group name, and change the **Size policy:** to **As large as possible**, to make the reclaimed space available inside the LVM volume group. We'll add the additional volume later. .. figure:: images/ManualPartitioning.png - :scale: 80% - :width: 1024 - :height: 800 :align: center :alt: Manual Partitioning Screen |CFS_DISTRO| |CFS_DISTRO_VER| Manual Partitioning Screen Press **Done**, then **Accept changes**. Configure Time Synchronization ______________________________ It is highly recommended to enable NTP on your cluster nodes. Doing so ensures all nodes agree on the current time and makes reading log files significantly easier. |CFS_DISTRO| will enable NTP automatically. If you want to change any time-related settings (such as time zone or NTP server), you can do this in the **TIME & DATE** section. Root Password ______________________________ In order to continue to the next step, a **Root Password** must be set. .. figure:: images/RootPassword.png - :scale: 80% - :width: 1024 - :height: 800 :align: center :alt: Root Password Screen |CFS_DISTRO| |CFS_DISTRO_VER| Root Password Screen Press **Done** (depending on the password you chose, you may need to do so twice). Finish Install ______________ Select **Begin Installation**. Once it completes, **Reboot System** as instructed. After the node reboots, you'll see a login prompt on the console. Login using **root** and the password you created earlier. .. figure:: images/ConsolePrompt.png - :scale: 80% - :width: 1024 - :height: 768 :align: center :alt: Console Prompt |CFS_DISTRO| |CFS_DISTRO_VER| Console Prompt .. NOTE:: From here on, we're going to be working exclusively from the terminal. Configure the OS ################ Verify Networking _________________ Ensure that the machine has the static IP address you configured earlier. .. code-block:: none + [root@pcmk-1 ~]# ip addr 1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: enp1s0: mtu 1500 qdisc fq_codel state UP group default qlen 1000 link/ether 52:54:00:32:cf:a9 brd ff:ff:ff:ff:ff:ff inet 192.168.122.101/24 brd 192.168.122.255 scope global noprefixroute enp1s0 valid_lft forever preferred_lft forever inet6 fe80::c3e1:3ba:959:fa96/64 scope link noprefixroute valid_lft forever preferred_lft forever .. NOTE:: If you ever need to change the node's IP address from the command line, follow these instructions, replacing **${device}** with the name of your network device: .. 
code-block:: none [root@pcmk-1 ~]# vi /etc/sysconfig/network-scripts/ifcfg-${device} # manually edit as desired [root@pcmk-1 ~]# nmcli dev disconnect ${device} [root@pcmk-1 ~]# nmcli con reload ${device} [root@pcmk-1 ~]# nmcli con up ${device} This makes **NetworkManager** aware that a change was made on the config file. Next, ensure that the routes are as expected: .. code-block:: none [root@pcmk-1 ~]# ip route default via 192.168.122.1 dev enp1s0 proto static metric 100 192.168.122.0/24 dev enp1s0 proto kernel scope link src 192.168.122.101 metric 100 If there is no line beginning with **default via**, then you may need to add a line such as ``GATEWAY="192.168.122.1"`` to the device configuration using the same process as described above for changing the IP address. Now, check for connectivity to the outside world. Start small by testing whether we can reach the gateway we configured. .. code-block:: none [root@pcmk-1 ~]# ping -c 1 192.168.122.1 PING 192.168.122.1 (192.168.122.1) 56(84) bytes of data. 64 bytes from 192.168.122.1: icmp_seq=1 ttl=64 time=0.492 ms --- 192.168.122.1 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 0.492/0.492/0.492/0.000 ms Now try something external; choose a location you know should be available. .. code-block:: none [root@pcmk-1 ~]# ping -c 1 www.clusterlabs.org PING mx1.clusterlabs.org (95.217.104.78) 56(84) bytes of data. 64 bytes from mx1.clusterlabs.org (95.217.104.78): icmp_seq=1 ttl=54 time=134 ms --- mx1.clusterlabs.org ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 133.987/133.987/133.987/0.000 ms Login Remotely ______________ The console isn't a very friendly place to work from, so we will now switch to accessing the machine remotely via SSH where we can use copy and paste, etc. From another host, check whether we can see the new host at all: .. code-block:: none [gchin@gchin ~]$ ping -c 1 192.168.122.101 PING 192.168.122.101 (192.168.122.101) 56(84) bytes of data. 64 bytes from 192.168.122.101: icmp_seq=1 ttl=64 time=0.344 ms --- 192.168.122.101 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 0.344/0.344/0.344/0.000 ms Next, login as root via SSH. .. code-block:: none [gchin@gchin ~]$ ssh root@192.168.122.101 The authenticity of host '192.168.122.101 (192.168.122.101)' can't be established. ECDSA key fingerprint is SHA256:NBvcRrPDLIt39Rf0Tz4/f2Rd/FA5wUiDOd9bZ9QWWjo. Are you sure you want to continue connecting (yes/no/[fingerprint])? yes Warning: Permanently added '192.168.122.101' (ECDSA) to the list of known hosts. root@192.168.122.101's password: Last login: Tue Jan 10 20:46:30 2021 [root@pcmk-1 ~]# Apply Updates _____________ Apply any package updates released since your installation image was created: .. code-block:: none [root@pcmk-1 ~]# yum update .. index:: single: node; short name Use Short Node Names ____________________ During installation, we filled in the machine's fully qualified domain name (FQDN), which can be rather long when it appears in cluster logs and status output. See for yourself how the machine identifies itself: .. code-block:: none [root@pcmk-1 ~]# uname -n pcmk-1.localdomain We can use the `hostnamectl` tool to strip off the domain name: .. code-block:: none [root@pcmk-1 ~]# hostnamectl set-hostname $(uname -n | sed s/\\..*//) Now, check that the machine is using the correct name: .. 
code-block:: none [root@pcmk-1 ~]# uname -n pcmk-1 You may want to reboot to ensure all updates take effect. Repeat for Second Node ###################### Repeat the Installation steps so far, so that you have two nodes ready to have the cluster software installed. For the purposes of this document, the additional node is called pcmk-2 with address 192.168.122.102. Configure Communication Between Nodes ##################################### Configure Host Name Resolution ______________________________ Confirm that you can communicate between the two new nodes: .. code-block:: none [root@pcmk-1 ~]# ping -c 3 192.168.122.102 PING 192.168.122.102 (192.168.122.102) 56(84) bytes of data. 64 bytes from 192.168.122.102: icmp_seq=1 ttl=64 time=1.22 ms 64 bytes from 192.168.122.102: icmp_seq=2 ttl=64 time=0.795 ms 64 bytes from 192.168.122.102: icmp_seq=3 ttl=64 time=0.751 ms --- 192.168.122.102 ping statistics --- 3 packets transmitted, 3 received, 0% packet loss, time 2054ms rtt min/avg/max/mdev = 0.751/0.923/1.224/0.214 ms Now we need to make sure we can communicate with the machines by their name. If you have a DNS server, add additional entries for the two machines. Otherwise, you'll need to add the machines to ``/etc/hosts`` on both nodes. Below are the entries for my cluster nodes: .. code-block:: none [root@pcmk-1 ~]# grep pcmk /etc/hosts 192.168.122.101 pcmk-1.clusterlabs.org pcmk-1 192.168.122.102 pcmk-2.clusterlabs.org pcmk-2 We can now verify the setup by again using ping: .. code-block:: none [root@pcmk-1 ~]# ping -c 3 pcmk-2 PING pcmk-2.clusterlabs.org (192.168.122.102) 56(84) bytes of data. 64 bytes from pcmk-2.clusterlabs.org (192.168.122.102): icmp_seq=1 ttl=64 time=0.295 ms 64 bytes from pcmk-2.clusterlabs.org (192.168.122.102): icmp_seq=2 ttl=64 time=0.616 ms 64 bytes from pcmk-2.clusterlabs.org (192.168.122.102): icmp_seq=3 ttl=64 time=0.809 ms --- pcmk-2.clusterlabs.org ping statistics --- 3 packets transmitted, 3 received, 0% packet loss, time 2043ms rtt min/avg/max/mdev = 0.295/0.573/0.809/0.212 ms .. index:: SSH Configure SSH _____________ SSH is a convenient and secure way to copy files and perform commands remotely. For the purposes of this guide, we will create a key without a password (using the -N option) so that we can perform remote actions without being prompted. .. WARNING:: Unprotected SSH keys (those without a password) are not recommended for servers exposed to the outside world. We use them here only to simplify the demo. Create a new key and allow anyone with that key to log in: .. index:: single: SSH; key -.Creating and Activating a new SSH Key - -.. code-block:: none - - [root@pcmk-1 ~]# ssh-keygen -t dsa -f ~/.ssh/id_dsa -N "" - Generating public/private dsa key pair. - Created directory '/root/.ssh'. - Your identification has been saved in /root/.ssh/id_dsa. - Your public key has been saved in /root/.ssh/id_dsa.pub. - The key fingerprint is: - SHA256:ehR595AVLAVpvFgqYXiayds2qx8emkvnHmfQZMTZ4jM root@pcmk-1 - The key's randomart image is: - +---[DSA 1024]----+ - | . ..+.=+. | - | . +o+ Bo. | - | . *oo+*+o | - | = .*E..o | - | oS..o . | - | .o+. | - | o.*oo | - | . B.* | - | === | - +----[SHA256]-----+ - [root@pcmk-1 ~]# cp ~/.ssh/id_dsa.pub ~/.ssh/authorized_keys +.. topic:: Creating and Activating a new SSH Key + + .. code-block:: none + + [root@pcmk-1 ~]# ssh-keygen -t dsa -f ~/.ssh/id_dsa -N "" + Generating public/private dsa key pair. + Created directory '/root/.ssh'. + Your identification has been saved in /root/.ssh/id_dsa. 
+ Your public key has been saved in /root/.ssh/id_dsa.pub. + The key fingerprint is: + SHA256:ehR595AVLAVpvFgqYXiayds2qx8emkvnHmfQZMTZ4jM root@pcmk-1 + The key's randomart image is: + +---[DSA 1024]----+ + | . ..+.=+. | + | . +o+ Bo. | + | . *oo+*+o | + | = .*E..o | + | oS..o . | + | .o+. | + | o.*oo | + | . B.* | + | === | + +----[SHA256]-----+ + [root@pcmk-1 ~]# cp ~/.ssh/id_dsa.pub ~/.ssh/authorized_keys Install the key on the other node: .. code-block:: none [root@pcmk-1 ~]# scp -r ~/.ssh pcmk-2: The authenticity of host 'pcmk-2 (192.168.122.102)' can't be established. ECDSA key fingerprint is SHA256:FQ4sVubTiHdQ6IetbN96fixoTVx/LuQUV8qoyiywnfs. Are you sure you want to continue connecting (yes/no/[fingerprint])? yes Warning: Permanently added 'pcmk-2,192.168.122.102' (ECDSA) to the list of known hosts. root@pcmk-2's password: id_dsa 100% 1385 1.6MB/s 00:00 id_dsa.pub 100% 601 1.0MB/s 00:00 authorized_keys 100% 601 1.3MB/s 00:00 known_hosts 100% 184 389.2KB/s 00:00 Test that you can now run commands remotely, without being prompted: .. code-block:: none [root@pcmk-1 ~]# ssh pcmk-2 -- uname -n root@pcmk-2's password: pcmk-2 diff --git a/doc/sphinx/Makefile.am b/doc/sphinx/Makefile.am index 3359b252f5..afaf99c1bd 100644 --- a/doc/sphinx/Makefile.am +++ b/doc/sphinx/Makefile.am @@ -1,192 +1,193 @@ # -# Copyright 2003-2020 the Pacemaker project contributors +# Copyright 2003-2021 the Pacemaker project contributors # # The version control history for this file may have further details. # # This source code is licensed under the GNU General Public License version 2 # or later (GPLv2+) WITHOUT ANY WARRANTY. # include $(top_srcdir)/mk/common.mk # Things you might want to override on the command line # Books to generate BOOKS ?= Clusters_from_Scratch \ Pacemaker_Administration \ Pacemaker_Development \ Pacemaker_Explained \ Pacemaker_Remote # Output formats to generate. Possible values: # html (multiple HTML files) # dirhtml (HTML files named index.html in multiple directories) # singlehtml (a single large HTML file) # text # pdf # epub # latex # linkcheck (not actually a format; check validity of external links) # # The results will end up in /_build/ BOOK_FORMATS ?= singlehtml -# Set to "a4" or "letter" if building latex format -PAPER ?= letter +# Set to "a4paper" or "letterpaper" if building latex format +PAPER ?= letterpaper # Additional options for sphinx-build SPHINXFLAGS ?= # toplevel rsync destination for www targets (without trailing slash) RSYNC_DEST ?= root@www.clusterlabs.org:/var/www/html # End of useful overrides # Example scheduler transition graphs # @TODO The original CIB XML for these is long lost. Ideally, we would recreate # something similar and keep those here instead of the DOTs (or use a couple of # scheduler regression test inputs instead), then regenerate the SVG # equivalents using crm_simulate and dot when making a release. DOTS = $(wildcard shared/images/*.dot) # Vector sources for generated PNGs (including SVG equivalents of DOTS, created # manually using dot) SVGS = $(wildcard shared/images/pcmk-*.svg) $(DOTS:%.dot=%.svg) # PNG images generated from SVGS # # These will not be accessible in a VPATH build, which will generate warnings # when building the documentation, but the make will still succeed. It is # nontrivial to get them working for VPATH builds and not worth the effort. 
PNGS_GENERATED = $(SVGS:%.svg=%.png) # Original PNG image sources PNGS_Clusters_from_Scratch = $(wildcard Clusters_from_Scratch/images/*.png) PNGS_Pacemaker_Explained = $(wildcard Pacemaker_Explained/images/*.png) PNGS_Pacemaker_Remote = $(wildcard Pacemaker_Remote/images/*.png) STATIC_FILES = $(wildcard _static/*.css) EXTRA_DIST = $(wildcard */*.rst) $(DOTS) $(SVGS) \ $(PNGS_Clusters_from_Scratch) \ $(PNGS_Pacemaker_Explained) \ $(PNGS_Pacemaker_Remote) \ $(STATIC_FILES) \ conf.py.in # recursive, preserve symlinks/permissions/times, verbose, compress, # don't cross filesystems, sparse, show progress RSYNC_OPTS = -rlptvzxS --progress BOOK_RSYNC_DEST = $(RSYNC_DEST)/$(PACKAGE)/doc/$(PACKAGE_SERIES) TAG ?= $(shell [ -n "`git tag --points-at HEAD | head -1`" ] \ && ( git tag --points-at HEAD | head -1 ) \ || git log --pretty=format:Pacemaker-$(VERSION)-%h -n 1 HEAD) BOOK = none DEPS_intro = shared/pacemaker-intro.rst $(PNGS_GENERATED) DEPS_Clusters_from_Scratch = $(DEPS_intro) $(PNGS_Clusters_from_Scratch) DEPS_Pacemaker_Administration = $(DEPS_intro) DEPS_Pacemaker_Development = DEPS_Pacemaker_Explained = $(DEPS_intro) $(PNGS_Pacemaker_Explained) DEPS_Pacemaker_Remote = $(PNGS_Pacemaker_Remote) if BUILD_SPHINX_DOCS INKSCAPE_CMD = $(INKSCAPE) --export-dpi=90 -C # Pattern rule to generate PNGs from SVGs # (--export-png works with Inkscape <1.0, --export-filename with >=1.0; # create the destination directory in case this is a VPATH build) %.png: %.svg $(AM_V_at)-$(MKDIR_P) "$(shell dirname "$@")" $(AM_V_GEN) { \ $(INKSCAPE_CMD) --export-png="$@" "$<" 2>/dev/null \ || $(INKSCAPE_CMD) --export-filename="$@" "$<"; \ } $(PCMK_quiet) # Create a book's Sphinx configuration. # Create the book directory in case this is a VPATH build. $(BOOKS:%=%/conf.py): conf.py.in $(AM_V_at)-$(MKDIR_P) "$(@:%/conf.py=%)" $(AM_V_GEN)sed \ -e 's/%VERSION%/$(VERSION)/g' \ -e 's/%BOOK_ID%/$(@:%/conf.py=%)/g' \ -e 's/%BOOK_TITLE%/$(subst _, ,$(@:%/conf.py=%))/g' \ -e 's#%SRC_DIR%#$(abs_srcdir)#g' \ $(<) > "$@" $(BOOK)/_build: $(STATIC_FILES) $(BOOK)/conf.py $(DEPS_$(BOOK)) $(wildcard $(srcdir)/$(BOOK)/*.rst) @echo 'Building "$(subst _, ,$(BOOK))" because of $?' 
$(PCMK_quiet) $(AM_V_at)rm -rf "$@" $(AM_V_BOOK)for format in $(BOOK_FORMATS); do \ echo -e "\n * Building $$format" $(PCMK_quiet); \ doctrees="doctrees"; \ real_format="$$format"; \ case "$$format" in \ pdf) real_format="latex" ;; \ gettext) doctrees="gettext-doctrees" ;; \ esac; \ $(SPHINX) -b "$$real_format" -d "$@/$$doctrees" \ -c "$(builddir)/$(BOOK)" \ - -D latex_paper_size=$(PAPER) $(SPHINXFLAGS) \ + -D latex_elements.papersize=$(PAPER) \ + $(SPHINXFLAGS) \ "$(srcdir)/$(BOOK)" "$@/$$format" \ $(PCMK_quiet); \ if [ "$$format" = "pdf" ]; then \ $(MAKE) $(AM_MAKEFLAGS) -C "$@/$$format" \ all-pdf; \ fi; \ done endif .PHONY: books-upload books-upload: all if BUILD_SPHINX_DOCS @echo "Uploading $(PACKAGE_SERIES) documentation set" @for book in $(BOOKS); do \ echo " * $$book"; \ buildfile="$$book/_build/build-$(PACKAGE_SERIES).txt"; \ echo "Generated on `date --utc` from version $(TAG)" \ > "$$buildfile"; \ rsync $(RSYNC_OPTS) "$$buildfile" \ $(BOOK_FORMATS:%=$$book/_build/%) \ "$(BOOK_RSYNC_DEST)/$$book/"; \ done all-local: @for book in $(BOOKS); do \ $(MAKE) $(AM_MAKEFLAGS) BOOK=$$book \ PAPER="$(PAPER)" SPHINXFLAGS="$(SPHINXFLAGS)" \ BOOK_FORMATS="$(BOOK_FORMATS)" $$book/_build; \ done install-data-local: all-local $(AM_V_AT)for book in $(BOOKS); do \ for format in $(BOOK_FORMATS); do \ formatdir="$$book/_build/$$format"; \ for f in `find "$$formatdir" -print`; do \ dname="`echo $$f | sed s:_build/::`"; \ dloc="$(DESTDIR)/$(docdir)/$$dname"; \ if [ -d "$$f" ]; then \ $(INSTALL) -d -m 755 "$$dloc"; \ else \ $(INSTALL_DATA) "$$f" "$$dloc"; \ fi \ done; \ done; \ done uninstall-local: $(AM_V_AT)for book in $(BOOKS); do \ rm -rf "$(DESTDIR)/$(docdir)/$$book"; \ done endif clean-local: $(AM_V_at)-rm -rf \ $(BOOKS:%="$(builddir)/%/_build") \ $(BOOKS:%="$(builddir)/%/conf.py") \ $(PNGS_GENERATED) diff --git a/doc/sphinx/Pacemaker_Administration/tools.rst b/doc/sphinx/Pacemaker_Administration/tools.rst index 5899467e91..e85edee403 100644 --- a/doc/sphinx/Pacemaker_Administration/tools.rst +++ b/doc/sphinx/Pacemaker_Administration/tools.rst @@ -1,570 +1,562 @@ .. index:: command-line tool Using Pacemaker Command-Line Tools ---------------------------------- .. index:: single: command-line tool; output format .. _cmdline_output: Controlling Command Line Output ############################### Some of the pacemaker command line utilities have been converted to a new output system. Among these tools are ``crm_mon`` and ``stonith_admin``. This is an ongoing project, and more tools will be converted over time. This system lets you control the formatting of output with ``--output-as=`` and the destination of output with ``--output-to=``. The available formats vary by tool, but at least plain text and XML are supported by all tools that use the new system. The default format is plain text. The default destination is stdout but can be redirected to any file. Some formats support command line options for changing the style of the output. For instance: .. code-block:: none # crm_mon --help-output Usage: crm_mon [OPTION?] Provides a summary of cluster's current state. Outputs varying levels of detail in a number of different formats. Output Options: --output-as=FORMAT Specify output format as one of: console (default), html, text, xml --output-to=DEST Specify file name for output (or "-" for stdout) --html-cgi Add text needed to use output in a CGI program --html-stylesheet=URI Link to an external CSS stylesheet --html-title=TITLE Page title --text-fancy Use more highly formatted output .. 
index:: single: crm_mon single: command-line tool; crm_mon .. _crm_mon: Monitor a Cluster with crm_mon ############################## The ``crm_mon`` utility displays the current state of an active cluster. It can show the cluster status organized by node or by resource, and can be used in either single-shot or dynamically updating mode. It can also display operations performed and information about failures. Using this tool, you can examine the state of the cluster for irregularities, and see how it responds when you cause or simulate failures. See the manual page or the output of ``crm_mon --help`` for a full description of its many options. .. topic:: Sample output from crm_mon -1 .. code-block:: none Cluster Summary: * Stack: corosync * Current DC: node2 (version 2.0.0-1) - partition with quorum * Last updated: Mon Jan 29 12:18:42 2018 * Last change: Mon Jan 29 12:18:40 2018 by root via crm_attribute on node3 * 5 nodes configured * 2 resources configured Node List: * Online: [ node1 node2 node3 node4 node5 ] * Active resources: * Fencing (stonith:fence_xvm): Started node1 * IP (ocf:heartbeat:IPaddr2): Started node2 .. topic:: Sample output from crm_mon -n -1 .. code-block:: none Cluster Summary: * Stack: corosync * Current DC: node2 (version 2.0.0-1) - partition with quorum * Last updated: Mon Jan 29 12:21:48 2018 * Last change: Mon Jan 29 12:18:40 2018 by root via crm_attribute on node3 * 5 nodes configured * 2 resources configured * Node List: * Node node1: online * Fencing (stonith:fence_xvm): Started * Node node2: online * IP (ocf:heartbeat:IPaddr2): Started * Node node3: online * Node node4: online * Node node5: online As mentioned in an earlier chapter, the DC is the node where decisions are made. The cluster elects a node to be DC as needed. The only significance of the choice of DC to an administrator is the fact that its logs will have the most information about why decisions were made. .. index:: pair: crm_mon; CSS .. _crm_mon_css: Styling crm_mon HTML output ___________________________ Various parts of ``crm_mon``'s HTML output have a CSS class associated with them. Not everything does, but some of the most interesting portions do. In the following example, the status of each node has an ``online`` class and the details of each resource have an ``rsc-ok`` class. .. code-block:: html

   <h2>Node List</h2>
   <ul>
   <li>
   <span>Node: cluster01</span>
   <span class="online"> online</span>
   </li>
   <li><ul><li><span class="rsc-ok">ping (ocf::pacemaker:ping): Started</span></li></ul></li>
   <li>
   <span>Node: cluster02</span>
   <span class="online"> online</span>
   </li>
   <li><ul><li><span class="rsc-ok">ping (ocf::pacemaker:ping): Started</span></li></ul></li>
   </ul>
By default, a stylesheet for styling these classes is included in the head of the HTML output. The relevant portions of this stylesheet that would be used in the above example is: .. code-block:: css If you want to override some or all of the styling, simply create your own stylesheet, place it on a web server, and pass ``--html-stylesheet=`` to ``crm_mon``. The link is added after the default stylesheet, so your changes take precedence. You don't need to duplicate the entire default. Only include what you want to change. .. index:: single: cibadmin single: command-line tool; cibadmin .. _cibadmin: Edit the CIB XML with cibadmin ############################## The most flexible tool for modifying the configuration is Pacemaker's ``cibadmin`` command. With ``cibadmin``, you can query, add, remove, update or replace any part of the configuration. All changes take effect immediately, so there is no need to perform a reload-like operation. The simplest way of using ``cibadmin`` is to use it to save the current configuration to a temporary file, edit that file with your favorite text or XML editor, and then upload the revised configuration. .. topic:: Safely using an editor to modify the cluster configuration .. code-block:: none # cibadmin --query > tmp.xml # vi tmp.xml # cibadmin --replace --xml-file tmp.xml Some of the better XML editors can make use of a RELAX NG schema to help make sure any changes you make are valid. The schema describing the configuration can be found in ``pacemaker.rng``, which may be deployed in a location such as ``/usr/share/pacemaker`` depending on your operating system distribution and how you installed the software. If you want to modify just one section of the configuration, you can query and replace just that section to avoid modifying any others. .. topic:: Safely using an editor to modify only the resources section .. code-block:: none # cibadmin --query --scope resources > tmp.xml # vi tmp.xml # cibadmin --replace --scope resources --xml-file tmp.xml To quickly delete a part of the configuration, identify the object you wish to delete by XML tag and id. For example, you might search the CIB for all STONITH-related configuration: .. topic:: Searching for STONITH-related configuration items .. code-block:: none # cibadmin --query | grep stonith If you wanted to delete the ``primitive`` tag with id ``child_DoFencing``, you would run: .. code-block:: none # cibadmin --delete --xml-text '' See the cibadmin man page for more options. .. warning:: Never edit the live ``cib.xml`` file directly. Pacemaker will detect such changes and refuse to use the configuration. .. index:: single: crm_shadow single: command-line tool; crm_shadow .. _crm_shadow: Batch Configuration Changes with crm_shadow ########################################### Often, it is desirable to preview the effects of a series of configuration changes before updating the live configuration all at once. For this purpose, ``crm_shadow`` creates a "shadow" copy of the configuration and arranges for all the command-line tools to use it. To begin, simply invoke ``crm_shadow --create`` with a name of your choice, and follow the simple on-screen instructions. Shadow copies are identified with a name to make it possible to have more than one. .. warning:: Read this section and the on-screen instructions carefully; failure to do so could result in destroying the cluster's active configuration! .. topic:: Creating and displaying the active sandbox .. 
code-block:: none # crm_shadow --create test Setting up shadow instance Type Ctrl-D to exit the crm_shadow shell shadow[test]: shadow[test] # crm_shadow --which test From this point on, all cluster commands will automatically use the shadow copy instead of talking to the cluster's active configuration. Once you have finished experimenting, you can either make the changes active via the ``--commit`` option, or discard them using the ``--delete`` option. Again, be sure to follow the on-screen instructions carefully! For a full list of ``crm_shadow`` options and commands, invoke it with the ``--help`` option. .. topic:: Use sandbox to make multiple changes all at once, discard them, and verify real configuration is untouched .. code-block:: none shadow[test] # crm_failcount -r rsc_c001n01 -G scope=status name=fail-count-rsc_c001n01 value=0 shadow[test] # crm_standby --node c001n02 -v on shadow[test] # crm_standby --node c001n02 -G scope=nodes name=standby value=on shadow[test] # cibadmin --erase --force shadow[test] # cibadmin --query shadow[test] # crm_shadow --delete test --force Now type Ctrl-D to exit the crm_shadow shell shadow[test] # exit # crm_shadow --which No active shadow configuration defined # cibadmin -Q See the next section, :ref:`crm_simulate`, for how to test your changes before committing them to the live cluster. .. index:: single: crm_simulate single: command-line tool; crm_simulate .. _crm_simulate: Simulate Cluster Activity with crm_simulate ########################################### The command-line tool `crm_simulate` shows the results of the same logic the cluster itself uses to respond to a particular cluster configuration and status. As always, the man page is the primary documentation, and should be consulted for further details. This section aims for a better conceptual explanation and practical examples. Replaying cluster decision-making logic _______________________________________ At any given time, one node in a Pacemaker cluster will be elected DC, and that node will run Pacemaker's scheduler to make decisions. Each time decisions need to be made (a "transition"), the DC will have log messages like "Calculated transition ... saving inputs in ..." with a file name. You can grab the named file and replay the cluster logic to see why particular decisions were made. The file contains the live cluster configuration at that moment, so you can also look at it directly to see the value of node attributes, etc., at that time. The simplest usage is (replacing $FILENAME with the actual file name): .. topic:: Simulate cluster response to a given CIB .. code-block:: none # crm_simulate --simulate --xml-file $FILENAME That will show the cluster state when the process started, the actions that need to be taken ("Transition Summary"), and the resulting cluster state if the actions succeed. Most actions will have a brief description of why they were required. The transition inputs may be compressed. ``crm_simulate`` can handle these compressed files directly, though if you want to edit the file, you'll need to uncompress it first. You can do the same simulation for the live cluster configuration at the current moment. This is useful mainly when using ``crm_shadow`` to create a sandbox version of the CIB; the ``--live-check`` option will use the shadow CIB if one is in effect. .. topic:: Simulate cluster response to current live CIB or shadow CIB .. 
code-block:: none # crm_simulate --simulate --live-check Why decisions were made _______________________ To get further insight into the "why", it gets user-unfriendly very quickly. If you add the ``--show-scores`` option, you will also see all the scores that went into the decision-making. The node with the highest cumulative score for a resource will run it. You can look for ``-INFINITY`` scores in particular to see where complete bans came into effect. You can also add ``-VVVV`` to get more detailed messages about what's happening under the hood. You can add up to two more V's even, but that's usually useful only if you're a masochist or tracing through the source code. Visualizing the action sequence _______________________________ Another handy feature is the ability to generate a visual graph of the actions needed, using the ``--dot-file`` option. This relies on the separate Graphviz [#]_ project. .. topic:: Generate a visual graph of cluster actions from a saved CIB .. code-block:: none # crm_simulate --simulate --xml-file $FILENAME --dot-file $FILENAME.dot # dot $FILENAME.dot -Tsvg > $FILENAME.svg ``$FILENAME.dot`` will contain a GraphViz representation of the cluster's response to your changes, including all actions with their ordering dependencies. ``$FILENAME.svg`` will be the same information in a standard graphical format that you can view in your browser or other app of choice. You could, of course, use other ``dot`` options to generate other formats. How to interpret the graphical output: * Bubbles indicate actions, and arrows indicate ordering dependencies * Resource actions have text of the form ``__ `` indicating that the specified action will be executed for the specified resource on the specified node, once if interval is 0 or at specified recurring interval otherwise * Actions with black text will be sent to the executor (that is, the appropriate agent will be invoked) * Actions with orange text are "pseudo" actions that the cluster uses internally for ordering but require no real activity * Actions with a solid green border are part of the transition (that is, the cluster will attempt to execute them in the given order -- though a transition can be interrupted by action failure or new events) * Dashed arrows indicate dependencies that are not present in the transition graph * Actions with a dashed border will not be executed. If the dashed border is blue, the cluster does not feel the action needs to be executed. If the dashed border is red, the cluster would like to execute the action but cannot. Any actions depending on an action with a dashed border will not be able to execute. * Loops should not happen, and should be reported as a bug if found. .. topic:: Small Cluster Transition .. image:: ../shared/images/Policy-Engine-small.png :alt: An example transition graph as represented by Graphviz - :height: 325 - :width: 1161 - :scale: 75 % :align: center In the above example, it appears that a new node, ``pcmk-2``, has come online and that the cluster is checking to make sure ``rsc1``, ``rsc2`` and ``rsc3`` are not already running there (indicated by the ``rscN_monitor_0`` entries). Once it did that, and assuming the resources were not active there, it would have liked to stop ``rsc1`` and ``rsc2`` on ``pcmk-1`` and move them to ``pcmk-2``. However, there appears to be some problem and the cluster cannot or is not permitted to perform the stop actions which implies it also cannot perform the start actions. 
For some reason, the cluster does not want to start ``rsc3`` anywhere. .. topic:: Complex Cluster Transition .. image:: ../shared/images/Policy-Engine-big.png :alt: Complex transition graph that you're not expected to be able to read - :width: 1455 - :height: 1945 - :scale: 75 % :align: center What-if scenarios _________________ You can make changes to the saved or shadow CIB and simulate it again, to see how Pacemaker would react differently. You can edit the XML by hand, use command-line tools such as ``cibadmin`` with either a shadow CIB or the ``CIB_file`` environment variable set to the filename, or use higher-level tool support (see the man pages of the specific tool you're using for how to perform actions on a saved CIB file rather than the live CIB). You can also inject node failures and/or action failures into the simulation; see the ``crm_simulate`` man page for more details. This capability is useful when using a shadow CIB to edit the configuration. Before committing the changes to the live cluster with ``crm_shadow --commit``, you can use ``crm_simulate`` to see how the cluster will react to the changes. -.. _attrd_updater: - .. _crm_attribute: .. index:: single: attrd_updater single: command-line tool; attrd_updater single: crm_attribute single: command-line tool; crm_attribute Manage Node Attributes, Cluster Options and Defaults with crm_attribute and attrd_updater ######################################################################################### ``crm_attribute`` and ``attrd_updater`` are confusingly similar tools with subtle differences. ``attrd_updater`` can query and update node attributes. ``crm_attribute`` can query and update not only node attributes, but also cluster options, resource defaults, and operation defaults. To understand the differences, it helps to understand the various types of node attribute. .. table:: **Types of Node Attributes** +-----------+----------+-------------------+------------------+----------------+----------------+ | Type | Recorded | Recorded in | Survive full | Manageable by | Manageable by | | | in CIB? | attribute manager | cluster restart? | crm_attribute? | attrd_updater? | | | | memory? | | | | +===========+==========+===================+==================+================+================+ | permanent | yes | no | yes | yes | no | +-----------+----------+-------------------+------------------+----------------+----------------+ | transient | yes | yes | no | yes | yes | +-----------+----------+-------------------+------------------+----------------+----------------+ | private | no | yes | no | no | yes | +-----------+----------+-------------------+------------------+----------------+----------------+ As you can see from the table above, ``crm_attribute`` can manage permanent and transient node attributes, while ``attrd_updater`` can manage transient and private node attributes. The difference between the two tools lies mainly in *how* they update node attributes: ``attrd_updater`` always contacts the Pacemaker attribute manager directly, while ``crm_attribute`` will contact the attribute manager only for transient node attributes, and will instead modify the CIB directly for permanent node attributes (and for transient node attributes when unable to contact the attribute manager). 
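For a quick feel for the overlap, here is a sketch (the attribute name ``demo-attr`` is invented purely for illustration) of setting and querying the same transient node attribute with each tool:

.. topic:: Setting and querying a transient node attribute with either tool

   .. code-block:: none

      # attrd_updater --node pcmk-1 --name demo-attr --update 1
      # attrd_updater --node pcmk-1 --name demo-attr --query
      # crm_attribute --type status --node pcmk-1 --name demo-attr --update 1
      # crm_attribute --type status --node pcmk-1 --name demo-attr --query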
By contacting the attribute manager directly, ``attrd_updater`` can change an attribute's "dampening" (whether changes are immediately flushed to the CIB or after a specified amount of time, to minimize disk writes for frequent changes), set private node attributes (which are never written to the CIB), and set attributes for nodes that don't yet exist. By modifying the CIB directly, ``crm_attribute`` can set permanent node attributes (which are only in the CIB and not managed by the attribute manager), and can be used with saved CIB files and shadow CIBs. However a transient node attribute is set, it is synchronized between the CIB and the attribute manager, on all nodes. .. index:: single: crm_failcount single: command-line tool; crm_failcount single: crm_node single: command-line tool; crm_node single: crm_report single: command-line tool; crm_report single: crm_standby single: command-line tool; crm_standby single: crm_verify single: command-line tool; crm_verify single: stonith_admin single: command-line tool; stonith_admin Other Commonly Used Tools ######################### Other command-line tools include: * ``crm_failcount``: query or delete resource fail counts * ``crm_node``: manage cluster nodes * ``crm_report``: generate a detailed cluster report for bug submissions * ``crm_resource``: manage cluster resources * ``crm_standby``: manage standby status of nodes * ``crm_verify``: validate a CIB * ``stonith_admin``: manage fencing devices See the manual pages for details. .. rubric:: Footnotes .. [#] Graph visualization software. See http://www.graphviz.org/ for details. diff --git a/doc/sphinx/Pacemaker_Explained/nodes.rst b/doc/sphinx/Pacemaker_Explained/nodes.rst index 84069ea931..25f2c2129f 100644 --- a/doc/sphinx/Pacemaker_Explained/nodes.rst +++ b/doc/sphinx/Pacemaker_Explained/nodes.rst @@ -1,247 +1,246 @@ Cluster Nodes ------------- Defining a Cluster Node _______________________ Each cluster node will have an entry in the ``nodes`` section containing at least an ID and a name. A cluster node's ID is defined by the cluster layer (Corosync). .. topic:: **Example Corosync cluster node entry** .. code-block:: xml In normal circumstances, the admin should let the cluster populate this information automatically from the cluster layer. .. _node_name: Where Pacemaker Gets the Node Name ################################## The name that Pacemaker uses for a node in the configuration does not have to be the same as its local hostname. Pacemaker uses the following for a Corosync node's name, in order of most preferred first: * The value of ``name`` in the ``nodelist`` section of ``corosync.conf`` * The value of ``ring0_addr`` in the ``nodelist`` section of ``corosync.conf`` * The local hostname (value of ``uname -n``) If the cluster is running, the ``crm_node -n`` command will display the local node's name as used by the cluster. If a Corosync ``nodelist`` is used, ``crm_node --name-for-id`` with a Corosync node ID will display the name used by the node with the given Corosync ``nodeid``, for example: .. code-block:: none crm_node --name-for-id 2 .. index:: single: node; attribute single: node attribute .. _node_attributes: Node Attributes _______________ Pacemaker allows node-specific values to be specified using *node attributes*. A node attribute has a name, and may have a distinct value for each node. Node attributes come in two types, *permanent* and *transient*. 
Permanent node attributes are kept within the ``node`` entry, and keep their values even if the cluster restarts on a node. Transient node attributes are kept in the CIB's ``status`` section, and go away when the cluster stops on the node. While certain node attributes have specific meanings to the cluster, they are mainly intended to allow administrators and resource agents to track any information desired. For example, an administrator might choose to define node attributes for how much RAM and disk space each node has, which OS each uses, or which server room rack each node is in. Users can configure :ref:`rules` that use node attributes to affect where resources are placed. Setting and querying node attributes #################################### Node attributes can be set and queried using the ``crm_attribute`` and ``attrd_updater`` commands, so that the user does not have to deal with XML configuration directly. Here is an example command to set a permanent node attribute, and the XML configuration that would be generated: .. topic:: **Result of using crm_attribute to specify which kernel pcmk-1 is running** .. code-block:: none # crm_attribute --type nodes --node pcmk-1 --name kernel --update $(uname -r) .. code-block:: xml To read back the value that was just set: .. code-block:: none # crm_attribute --type nodes --node pcmk-1 --name kernel --query scope=nodes name=kernel value=3.10.0-862.14.4.el7.x86_64 The ``--type nodes`` indicates that this is a permanent node attribute; ``--type status`` would indicate a transient node attribute. Special node attributes ####################### Certain node attributes have special meaning to the cluster. Node attribute names beginning with ``#`` are considered reserved for these special attributes. Some special attributes do not start with ``#``, for historical reasons. Certain special attributes are set automatically by the cluster, should never be modified directly, and can be used only within :ref:`rules`; these are listed under :ref:`built-in node attributes `. For true/false values, the cluster considers a value of "1", "y", "yes", "on", or "true" (case-insensitively) to be true, "0", "n", "no", "off", "false", or unset to be false, and anything else to be an error. .. table:: **Node attributes with special significance** +----------------------------+-----------------------------------------------------+ | Name | Description | +============================+=====================================================+ | fail-count-* | .. index:: | | | pair: node attribute; fail-count | | | | | | Attributes whose names start with | | | ``fail-count-`` are managed by the cluster | | | to track how many times particular resource | | | operations have failed on this node. These | | | should be queried and cleared via the | | | ``crm_failcount`` or | | | ``crm_resource --cleanup`` commands rather | | | than directly. | +----------------------------+-----------------------------------------------------+ | last-failure-* | .. index:: | | | pair: node attribute; last-failure | | | | | | Attributes whose names start with | | | ``last-failure-`` are managed by the cluster | | | to track when particular resource operations | | | have most recently failed on this node. | | | These should be cleared via the | | | ``crm_failcount`` or | | | ``crm_resource --cleanup`` commands rather | | | than directly. | +----------------------------+-----------------------------------------------------+ | maintenance | .. 
index:: | | | pair: node attribute; maintenance | | | | | | Similar to the ``maintenance-mode`` | | | :ref:`cluster option `, but | | | for a single node. If true, resources will | | | not be started or stopped on the node, | | | resources and individual clone instances | | | running on the node will become unmanaged, | | | and any recurring operations for those will | | | be cancelled. | | | | - | | .. warning:: | - | | Restarting pacemaker on a node that is in | - | | single-node maintenance mode will likely | - | | lead to undesirable effects. If | - | | ``maintenance`` is set as a transient | - | | attribute, it will be erased when | - | | Pacemaker is stopped, which will | - | | immediately take the node out of | - | | maintenance mode and likely get it | - | | fenced. Even if permanent, if Pacemaker | - | | is restarted, any resources active on the | - | | node will have their local history erased | - | | when the node rejoins, so the cluster | - | | will no longer consider them running on | - | | the node and thus will consider them | - | | managed again, leading them to be started | - | | elsewhere. This behavior might be | - | | improved in a future release. | + | | **Warning:** Restarting pacemaker on a node that is | + | | in single-node maintenance mode will likely | + | | lead to undesirable effects. If | + | | ``maintenance`` is set as a transient | + | | attribute, it will be erased when | + | | Pacemaker is stopped, which will | + | | immediately take the node out of | + | | maintenance mode and likely get it | + | | fenced. Even if permanent, if Pacemaker | + | | is restarted, any resources active on the | + | | node will have their local history erased | + | | when the node rejoins, so the cluster | + | | will no longer consider them running on | + | | the node and thus will consider them | + | | managed again, leading them to be started | + | | elsewhere. This behavior might be | + | | improved in a future release. | +----------------------------+-----------------------------------------------------+ | probe_complete | .. index:: | | | pair: node attribute; probe_complete | | | | | | This is managed by the cluster to detect | | | when nodes need to be reprobed, and should | | | never be used directly. | +----------------------------+-----------------------------------------------------+ | resource-discovery-enabled | .. index:: | | | pair: node attribute; resource-discovery-enabled | | | | | | If the node is a remote node, fencing is enabled, | | | and this attribute is explicitly set to false | | | (unset means true in this case), resource discovery | | | (probes) will not be done on this node. This is | | | highly discouraged; the ``resource-discovery`` | | | location constraint property is preferred for this | | | purpose. | +----------------------------+-----------------------------------------------------+ | shutdown | .. index:: | | | pair: node attribute; shutdown | | | | | | This is managed by the cluster to orchestrate the | | | shutdown of a node, and should never be used | | | directly. | +----------------------------+-----------------------------------------------------+ | site-name | .. index:: | | | pair: node attribute; site-name | | | | | | If set, this will be used as the value of the | | | ``#site-name`` node attribute used in rules. (If | | | not set, the value of the ``cluster-name`` cluster | | | option will be used as ``#site-name`` instead.) | +----------------------------+-----------------------------------------------------+ | standby | .. 
index:: | | | pair: node attribute; standby | | | | | | If true, the node is in standby mode. This is | | | typically set and queried via the ``crm_standby`` | | | command rather than directly. | +----------------------------+-----------------------------------------------------+ | terminate | .. index:: | | | pair: node attribute; terminate | | | | | | If the value is true or begins with any nonzero | | | number, the node will be fenced. This is typically | | | set by tools rather than directly. | +----------------------------+-----------------------------------------------------+ | #digests-* | .. index:: | | | pair: node attribute; #digests | | | | | | Attributes whose names start with ``#digests-`` are | | | managed by the cluster to detect when | | | :ref:`unfencing` needs to be redone, and should | | | never be used directly. | +----------------------------+-----------------------------------------------------+ | #node-unfenced | .. index:: | | | pair: node attribute; #node-unfenced | | | | | | When the node was last unfenced (as seconds since | | | the epoch). This is managed by the cluster and | | | should never be used directly. | +----------------------------+-----------------------------------------------------+ diff --git a/doc/sphinx/Pacemaker_Explained/options.rst b/doc/sphinx/Pacemaker_Explained/options.rst index 2e21bb6180..cbb849f991 100644 --- a/doc/sphinx/Pacemaker_Explained/options.rst +++ b/doc/sphinx/Pacemaker_Explained/options.rst @@ -1,614 +1,611 @@ Cluster-Wide Configuration -------------------------- .. index:: pair: XML element; cib pair: XML element; configuration Configuration Layout #################### The cluster is defined by the Cluster Information Base (CIB), which uses XML notation. The simplest CIB, an empty one, looks like this: .. topic:: An empty configuration .. code-block:: xml The empty configuration above contains the major sections that make up a CIB: * ``cib``: The entire CIB is enclosed with a ``cib`` element. Certain fundamental settings are defined as attributes of this element. * ``configuration``: This section -- the primary focus of this document -- contains traditional configuration information such as what resources the cluster serves and the relationships among them. * ``crm_config``: cluster-wide configuration options * ``nodes``: the machines that host the cluster * ``resources``: the services run by the cluster * ``constraints``: indications of how resources should be placed * ``status``: This section contains the history of each resource on each node. Based on this data, the cluster can construct the complete current state of the cluster. The authoritative source for this section is the local executor (pacemaker-execd process) on each cluster node, and the cluster will occasionally repopulate the entire section. For this reason, it is never written to disk, and administrators are advised against modifying it in any way. In this document, configuration settings will be described as properties or options based on how they are defined in the CIB: * Properties are XML attributes of an XML element. * Options are name-value pairs expressed as ``nvpair`` child elements of an XML element. Normally, you will use command-line tools that abstract the XML, so the distinction will be unimportant; both properties and options are cluster settings you can tweak. 
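As a quick illustration of the distinction, the following commands change the cluster in two different ways: the first modifies a *property* (an XML attribute of the ``cib`` element itself), while the second sets an *option* (an ``nvpair`` within ``crm_config``). The values shown are examples only, not recommendations:

.. code-block:: none

   # Change a CIB property (an XML attribute of the cib element itself)
   cibadmin --modify --xml-text '<cib admin_epoch="42"/>'

   # Change a cluster option (an nvpair within the crm_config section)
   crm_attribute --type crm_config --name maintenance-mode --update false

Either way, the command-line tools generate the underlying XML for you, so the property/option distinction rarely matters in practice.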
CIB Properties ############## Certain settings are defined by CIB properties (that is, attributes of the ``cib`` tag) rather than with the rest of the cluster configuration in the ``configuration`` section. The reason is simply a matter of parsing. These options are used by the configuration database which is, by design, mostly ignorant of the content it holds. So the decision was made to place them in an easy-to-find location. .. table:: **CIB Properties** +------------------+-----------------------------------------------------------+ | Attribute | Description | +==================+===========================================================+ | admin_epoch | .. index:: | | | pair: admin_epoch; cib | | | | | | When a node joins the cluster, the cluster performs a | | | check to see which node has the best configuration. It | | | asks the node with the highest (``admin_epoch``, | | | ``epoch``, ``num_updates``) tuple to replace the | | | configuration on all the nodes -- which makes setting | | | them, and setting them correctly, very important. | | | ``admin_epoch`` is never modified by the cluster; you can | | | use this to make the configurations on any inactive nodes | | | obsolete. | | | | - | | .. warning:: | - | | Never set this value to zero. In such cases, the | - | | cluster cannot tell the difference between your | - | | configuration and the "empty" one used when nothing is | - | | found on disk. | + | | **Warning:** Never set this value to zero. In such cases, | + | | the cluster cannot tell the difference between your | + | | configuration and the "empty" one used when nothing is | + | | found on disk. | +------------------+-----------------------------------------------------------+ | epoch | .. index:: | | | pair: epoch; cib | | | | | | The cluster increments this every time the configuration | | | is updated (usually by the administrator). | +------------------+-----------------------------------------------------------+ | num_updates | .. index:: | | | pair: num_updates; cib | | | | | | The cluster increments this every time the configuration | | | or status is updated (usually by the cluster) and resets | | | it to 0 when epoch changes. | +------------------+-----------------------------------------------------------+ | validate-with | .. index:: | | | pair: validate-with; cib | | | | | | Determines the type of XML validation that will be done | | | on the configuration. If set to ``none``, the cluster | | | will not verify that updates conform to the DTD (nor | | | reject ones that don't). | +------------------+-----------------------------------------------------------+ | cib-last-written | .. index:: | | | pair: cib-last-written; cib | | | | | | Indicates when the configuration was last written to | | | disk. Maintained by the cluster; for informational | | | purposes only. | +------------------+-----------------------------------------------------------+ | have-quorum | .. index:: | | | pair: have-quorum; cib | | | | | | Indicates if the cluster has quorum. If false, this may | | | mean that the cluster cannot start resources or fence | | | other nodes (see ``no-quorum-policy`` below). Maintained | | | by the cluster. | +------------------+-----------------------------------------------------------+ | dc-uuid | .. index:: | | | pair: dc-uuid; cib | | | | | | Indicates which cluster node is the current leader. Used | | | by the cluster when placing resources and determining the | | | order of some events. Maintained by the cluster. 
| +------------------+-----------------------------------------------------------+ .. _cluster_options: Cluster Options ############### Cluster options, as you might expect, control how the cluster behaves when confronted with various situations. They are grouped into sets within the ``crm_config`` section. In advanced configurations, there may be more than one set. (This will be described later in the chapter on :ref:`rules` where we will show how to have the cluster use different sets of options during working hours than during weekends.) For now, we will describe the simple case where each option is present at most once. You can obtain an up-to-date list of cluster options, including their default values, by running the ``man pacemaker-schedulerd`` and ``man pacemaker-controld`` commands. .. table:: **Cluster Options** +---------------------------+---------+----------------------------------------------------+ | Option | Default | Description | +===========================+=========+====================================================+ | cluster-name | | .. index:: | | | | pair: cluster option; cluster-name | | | | | | | | An (optional) name for the cluster as a whole. | | | | This is mostly for users' convenience for use | | | | as desired in administration, but this can be | | | | used in the Pacemaker configuration in | | | | :ref:`rules` (as the ``#cluster-name`` | | | | :ref:`node attribute | | | | `. It may | | | | also be used by higher-level tools when | | | | displaying cluster information, and by | | | | certain resource agents (for example, the | | | | ``ocf:heartbeat:GFS2`` agent stores the | | | | cluster name in filesystem meta-data). | +---------------------------+---------+----------------------------------------------------+ | dc-version | | .. index:: | | | | pair: cluster option; dc-version | | | | | | | | Version of Pacemaker on the cluster's DC. | | | | Determined automatically by the cluster. Often | | | | includes the hash which identifies the exact | | | | Git changeset it was built from. Used for | | | | diagnostic purposes. | +---------------------------+---------+----------------------------------------------------+ | cluster-infrastructure | | .. index:: | | | | pair: cluster option; cluster-infrastructure | | | | | | | | The messaging stack on which Pacemaker is | | | | currently running. Determined automatically by | | | | the cluster. Used for informational and | | | | diagnostic purposes. | +---------------------------+---------+----------------------------------------------------+ | no-quorum-policy | stop | .. index:: | | | | pair: cluster option; no-quorum-policy | | | | | | | | What to do when the cluster does not have | | | | quorum. Allowed values: | | | | | | | | * ``ignore:`` continue all resource management | | | | * ``freeze:`` continue resource management, but | | | | don't recover resources from nodes not in the | | | | affected partition | | | | * ``stop:`` stop all resources in the affected | | | | cluster partition | | | | * ``demote:`` demote promotable resources and | | | | stop all other resources in the affected | | | | cluster partition *(since 2.0.5)* | | | | * ``suicide:`` fence all nodes in the affected | | | | cluster partition | +---------------------------+---------+----------------------------------------------------+ | batch-limit | 0 | .. index:: | | | | pair: cluster option; batch-limit | | | | | | | | The maximum number of actions that the cluster | | | | may execute in parallel across all nodes. 
The | | | | "correct" value will depend on the speed and | | | | load of your network and cluster nodes. If zero, | | | | the cluster will impose a dynamically calculated | | | | limit only when any node has high load. | +---------------------------+---------+----------------------------------------------------+ | migration-limit | -1 | .. index:: | | | | pair: cluster option; migration-limit | | | | | | | | The number of | | | | :ref:`live migration ` actions | | | | that the cluster is allowed to execute in | | | | parallel on a node. A value of -1 means | | | | unlimited. | +---------------------------+---------+----------------------------------------------------+ | symmetric-cluster | true | .. index:: | | | | pair: cluster option; symmetric-cluster | | | | | | | | Whether resources can run on any node by default | | | | (if false, a resource is allowed to run on a | | | | node only if a | | | | :ref:`location constraint ` | | | | enables it) | +---------------------------+---------+----------------------------------------------------+ | stop-all-resources | false | .. index:: | | | | pair: cluster option; stop-all-resources | | | | | | | | Whether all resources should be disallowed from | | | | running (can be useful during maintenance) | +---------------------------+---------+----------------------------------------------------+ | stop-orphan-resources | true | .. index:: | | | | pair: cluster option; stop-orphan-resources | | | | | | | | Whether resources that have been deleted from | | | | the configuration should be stopped. This value | | | | takes precedence over ``is-managed`` (that is, | | | | even unmanaged resources will be stopped when | | | | orphaned if this value is ``true`` | +---------------------------+---------+----------------------------------------------------+ | stop-orphan-actions | true | .. index:: | | | | pair: cluster option; stop-orphan-actions | | | | | | | | Whether recurring :ref:`operations ` | | | | that have been deleted from the configuration | | | | should be cancelled | +---------------------------+---------+----------------------------------------------------+ | start-failure-is-fatal | true | .. index:: | | | | pair: cluster option; start-failure-is-fatal | | | | | | | | Whether a failure to start a resource on a | | | | particular node prevents further start attempts | | | | on that node? If ``false``, the cluster will | | | | decide whether the node is still eligible based | | | | on the resource's current failure count and | | | | :ref:`migration-threshold `. | +---------------------------+---------+----------------------------------------------------+ | enable-startup-probes | true | .. index:: | | | | pair: cluster option; enable-startup-probes | | | | | | | | Whether the cluster should check the | | | | pre-existing state of resources when the cluster | | | | starts | +---------------------------+---------+----------------------------------------------------+ | maintenance-mode | false | .. index:: | | | | pair: cluster option; maintenance-mode | | | | | | | | Whether the cluster should refrain from | | | | monitoring, starting and stopping resources | +---------------------------+---------+----------------------------------------------------+ | stonith-enabled | true | .. index:: | | | | pair: cluster option; stonith-enabled | | | | | | | | Whether the cluster is allowed to fence nodes | | | | (for example, failed nodes and nodes with | | | | resources that can't be stopped. 
| | | | | | | | If true, at least one fence device must be | | | | configured before resources are allowed to run. | | | | | | | | If false, unresponsive nodes are immediately | | | | assumed to be running no resources, and resource | | | | recovery on online nodes starts without any | | | | further protection (which can mean *data loss* | | | | if the unresponsive node still accesses shared | | | | storage, for example). See also the | | | | :ref:`requires ` resource | | | | meta-attribute. | +---------------------------+---------+----------------------------------------------------+ | stonith-action | reboot | .. index:: | | | | pair: cluster option; stonith-action | | | | | | | | Action the cluster should send to the fence agent | | | | when a node must be fenced. Allowed values are | | | | ``reboot``, ``off``, and (for legacy agents only) | | | | ``poweroff``. | +---------------------------+---------+----------------------------------------------------+ | stonith-timeout | 60s | .. index:: | | | | pair: cluster option; stonith-timeout | | | | | | | | How long to wait for ``on``, ``off``, and | | | | ``reboot`` fence actions to complete by default. | +---------------------------+---------+----------------------------------------------------+ | stonith-max-attempts | 10 | .. index:: | | | | pair: cluster option; stonith-max-attempts | | | | | | | | How many times fencing can fail for a target | | | | before the cluster will no longer immediately | | | | re-attempt it. | +---------------------------+---------+----------------------------------------------------+ | stonith-watchdog-timeout | 0 | .. index:: | | | | pair: cluster option; stonith-watchdog-timeout | | | | | | | | If nonzero, and the cluster detects | | | | ``have-watchdog`` as ``true``, then watchdog-based | | | | self-fencing will be performed via SBD when | | | | fencing is required, without requiring a fencing | | | | resource explicitly configured. | | | | | | | | If this is set to a positive value, unseen nodes | | | | are assumed to self-fence within this much time. | | | | | - | | | .. warning:: | - | | | It must be ensured that this value is larger | - | | | than the ``SBD_WATCHDOG_TIMEOUT`` environment | - | | | variable on all nodes. Pacemaker verifies the | - | | | settings individually on all nodes and prevents | - | | | startup or shuts down if configured wrongly on | - | | | the fly. It is strongly recommended that | - | | | ``SBD_WATCHDOG_TIMEOUT`` be set to the same | - | | | value on all nodes. | + | | | **Warning:** It must be ensured that this value is | + | | | larger than the ``SBD_WATCHDOG_TIMEOUT`` | + | | | environment variable on all nodes. Pacemaker | + | | | verifies the settings individually on all nodes | + | | | and prevents startup or shuts down if configured | + | | | wrongly on the fly. It is strongly recommended | + | | | that ``SBD_WATCHDOG_TIMEOUT`` be set to the same | + | | | value on all nodes. | | | | | | | | If this is set to a negative value, and | | | | ``SBD_WATCHDOG_TIMEOUT`` is set, twice that value | | | | will be used. | | | | | - | | | .. warning:: | - | | | In this case, it is essential (and currently | - | | | not verified by pacemaker) that | - | | | ``SBD_WATCHDOG_TIMEOUT`` is set to the same | - | | | value on all nodes. | + | | | **Warning:** In this case, it is essential (and | + | | | currently not verified by pacemaker) that | + | | | ``SBD_WATCHDOG_TIMEOUT`` is set to the same | + | | | value on all nodes. 
| +---------------------------+---------+----------------------------------------------------+ | concurrent-fencing | false | .. index:: | | | | pair: cluster option; concurrent-fencing | | | | | | | | Whether the cluster is allowed to initiate multiple| | | | fence actions concurrently | +---------------------------+---------+----------------------------------------------------+ | fence-reaction | stop | .. index:: | | | | pair: cluster option; fence-reaction | | | | | | | | How should a cluster node react if notified of its | | | | own fencing? A cluster node may receive | | | | notification of its own fencing if fencing is | | | | misconfigured, or if fabric fencing is in use that | | | | doesn't cut cluster communication. Allowed values | | | | are ``stop`` to attempt to immediately stop | | | | pacemaker and stay stopped, or ``panic`` to | | | | attempt to immediately reboot the local node, | | | | falling back to stop on failure. The default is | | | | likely to be changed to ``panic`` in a future | | | | release. *(since 2.0.3)* | +---------------------------+---------+----------------------------------------------------+ | priority-fencing-delay | 0 | .. index:: | | | | pair: cluster option; priority-fencing-delay | | | | | | | | Apply this delay to any fencing targeting the lost | | | | nodes with the highest total resource priority in | | | | case we don't have the majority of the nodes in | | | | our cluster partition, so that the more | | | | significant nodes potentially win any fencing | | | | match (especially meaningful in a split-brain of a | | | | 2-node cluster). A promoted resource instance | | | | takes the resource's priority plus 1 if the | | | | resource's priority is not 0. Any static or random | | | | delays introduced by ``pcmk_delay_base`` and | | | | ``pcmk_delay_max`` configured for the | | | | corresponding fencing resources will be added to | | | | this delay. This delay should be significantly | | | | greater than (safely twice) the maximum delay from | | | | those parameters. *(since 2.0.4)* | +---------------------------+---------+----------------------------------------------------+ | cluster-delay | 60s | .. index:: | | | | pair: cluster option; cluster-delay | | | | | | | | Estimated maximum round-trip delay over the | | | | network (excluding action execution). If the DC | | | | requires an action to be executed on another node, | | | | it will consider the action failed if it does not | | | | get a response from the other node in this time | | | | (after considering the action's own timeout). The | | | | "correct" value will depend on the speed and load | | | | of your network and cluster nodes. | +---------------------------+---------+----------------------------------------------------+ | dc-deadtime | 20s | .. index:: | | | | pair: cluster option; dc-deadtime | | | | | | | | How long to wait for a response from other nodes | | | | during startup. The "correct" value will depend on | | | | the speed/load of your network and the type of | | | | switches used. | +---------------------------+---------+----------------------------------------------------+ | cluster-ipc-limit | 500 | .. index:: | | | | pair: cluster option; cluster-ipc-limit | | | | | | | | The maximum IPC message backlog before one cluster | | | | daemon will disconnect another. This is of use in | | | | large clusters, for which a good value is the | | | | number of resources in the cluster multiplied by | | | | the number of nodes. The default of 500 is also | | | | the minimum. 
Raise this if you see | | | | "Evicting client" messages for cluster daemon PIDs | | | | in the logs. | +---------------------------+---------+----------------------------------------------------+ | pe-error-series-max | -1 | .. index:: | | | | pair: cluster option; pe-error-series-max | | | | | | | | The number of scheduler inputs resulting in errors | | | | to save. Used when reporting problems. A value of | | | | -1 means unlimited (report all). | +---------------------------+---------+----------------------------------------------------+ | pe-warn-series-max | -1 | .. index:: | | | | pair: cluster option; pe-warn-series-max | | | | | | | | The number of scheduler inputs resulting in | | | | warnings to save. Used when reporting problems. A | | | | value of -1 means unlimited (report all). | +---------------------------+---------+----------------------------------------------------+ | pe-input-series-max | -1 | .. index:: | | | | pair: cluster option; pe-input-series-max | | | | | | | | The number of "normal" scheduler inputs to save. | | | | Used when reporting problems. A value of -1 means | | | | unlimited (report all). | +---------------------------+---------+----------------------------------------------------+ | enable-acl | false | .. index:: | | | | pair: cluster option; enable-acl | | | | | | | | Whether :ref:`acl` should be used to authorize | | | | modifications to the CIB | +---------------------------+---------+----------------------------------------------------+ | placement-strategy | default | .. index:: | | | | pair: cluster option; placement-strategy | | | | | | | | How the cluster should allocate resources to nodes | | | | (see :ref:`utilization`). Allowed values are | | | | ``default``, ``utilization``, ``balanced``, and | | | | ``minimal``. | +---------------------------+---------+----------------------------------------------------+ | node-health-strategy | none | .. index:: | | | | pair: cluster option; node-health-strategy | | | | | | | | How the cluster should react to node health | | | | attributes (see :ref:`node-health`). Allowed values| | | | are ``none``, ``migrate-on-red``, ``only-green``, | | | | ``progressive``, and ``custom``. | +---------------------------+---------+----------------------------------------------------+ | node-health-base | 0 | .. index:: | | | | pair: cluster option; node-health-base | | | | | | | | The base health score assigned to a node. Only | | | | used when ``node-health-strategy`` is | | | | ``progressive``. | +---------------------------+---------+----------------------------------------------------+ | node-health-green | 0 | .. index:: | | | | pair: cluster option; node-health-green | | | | | | | | The score to use for a node health attribute whose | | | | value is ``green``. Only used when | | | | ``node-health-strategy`` is ``progressive`` or | | | | ``custom``. | +---------------------------+---------+----------------------------------------------------+ | node-health-yellow | 0 | .. index:: | | | | pair: cluster option; node-health-yellow | | | | | | | | The score to use for a node health attribute whose | | | | value is ``yellow``. Only used when | | | | ``node-health-strategy`` is ``progressive`` or | | | | ``custom``. | +---------------------------+---------+----------------------------------------------------+ | node-health-red | 0 | .. index:: | | | | pair: cluster option; node-health-red | | | | | | | | The score to use for a node health attribute whose | | | | value is ``red``. 
Only used when | | | | ``node-health-strategy`` is ``progressive`` or | | | | ``custom``. | +---------------------------+---------+----------------------------------------------------+ | cluster-recheck-interval | 15min | .. index:: | | | | pair: cluster option; cluster-recheck-interval | | | | | | | | Pacemaker is primarily event-driven, and looks | | | | ahead to know when to recheck the cluster for | | | | failure timeouts and most time-based rules | | | | *(since 2.0.3)*. However, it will also recheck the | | | | cluster after this amount of inactivity. This has | | | | two goals: rules with ``date_spec`` are only | | | | guaranteed to be checked this often, and it also | | | | serves as a fail-safe for some kinds of scheduler | | | | bugs. A value of 0 disables this polling; positive | | | | values are a time interval. | +---------------------------+---------+----------------------------------------------------+ | shutdown-lock | false | .. index:: | | | | pair: cluster option; shutdown-lock | | | | | | | | The default of false allows active resources to be | | | | recovered elsewhere when their node is cleanly | | | | shut down, which is what the vast majority of | | | | users will want. However, some users prefer to | | | | make resources highly available only for failures, | | | | with no recovery for clean shutdowns. If this | | | | option is true, resources active on a node when it | | | | is cleanly shut down are kept "locked" to that | | | | node (not allowed to run elsewhere) until they | | | | start again on that node after it rejoins (or for | | | | at most ``shutdown-lock-limit``, if set). Stonith | | | | resources and Pacemaker Remote connections are | | | | never locked. Clone and bundle instances and the | | | | master role of promotable clones are currently | | | | never locked, though support could be added in a | | | | future release. Locks may be manually cleared | | | | using the ``--refresh`` option of ``crm_resource`` | | | | (both the resource and node must be specified; | | | | this works with remote nodes if their connection | | | | resource's ``target-role`` is set to ``Stopped``, | | | | but not if Pacemaker Remote is stopped on the | | | | remote node without disabling the connection | | | | resource). *(since 2.0.4)* | +---------------------------+---------+----------------------------------------------------+ | shutdown-lock-limit | 0 | .. index:: | | | | pair: cluster option; shutdown-lock-limit | | | | | | | | If ``shutdown-lock`` is true, and this is set to a | | | | nonzero time duration, locked resources will be | | | | allowed to start after this much time has passed | | | | since the node shutdown was initiated, even if the | | | | node has not rejoined. (This works with remote | | | | nodes only if their connection resource's | | | | ``target-role`` is set to ``Stopped``.) | | | | *(since 2.0.4)* | +---------------------------+---------+----------------------------------------------------+ | remove-after-stop | false | .. index:: | | | | pair: cluster option; remove-after-stop | | | | | | | | *Deprecated* Should the cluster remove | | | | resources from Pacemaker's executor after they are | | | | stopped? Values other than the default are, at | | | | best, poorly tested and potentially dangerous. | | | | This option is deprecated and will be removed in a | | | | future release. | +---------------------------+---------+----------------------------------------------------+ | startup-fencing | true | .. 
index:: | | | | pair: cluster option; startup-fencing | | | | | | | | *Advanced Use Only:* Should the cluster fence | | | | unseen nodes at start-up? Setting this to false is | | | | unsafe, because the unseen nodes could be active | | | | and running resources but unreachable. | +---------------------------+---------+----------------------------------------------------+ | election-timeout | 2min | .. index:: | | | | pair: cluster option; election-timeout | | | | | | | | *Advanced Use Only:* If you need to adjust this | | | | value, it probably indicates the presence of a bug.| +---------------------------+---------+----------------------------------------------------+ | shutdown-escalation | 20min | .. index:: | | | | pair: cluster option; shutdown-escalation | | | | | | | | *Advanced Use Only:* If you need to adjust this | | | | value, it probably indicates the presence of a bug.| +---------------------------+---------+----------------------------------------------------+ | join-integration-timeout | 3min | .. index:: | | | | pair: cluster option; join-integration-timeout | | | | | | | | *Advanced Use Only:* If you need to adjust this | | | | value, it probably indicates the presence of a bug.| +---------------------------+---------+----------------------------------------------------+ | join-finalization-timeout | 30min | .. index:: | | | | pair: cluster option; join-finalization-timeout | | | | | | | | *Advanced Use Only:* If you need to adjust this | | | | value, it probably indicates the presence of a bug.| +---------------------------+---------+----------------------------------------------------+ | transition-delay | 0s | .. index:: | | | | pair: cluster option; transition-delay | | | | | | | | *Advanced Use Only:* Delay cluster recovery for | | | | the configured interval to allow for additional or | | | | related events to occur. This can be useful if | | | | your configuration is sensitive to the order in | | | | which ping updates arrive. Enabling this option | | | | will slow down cluster recovery under all | | | | conditions. | +---------------------------+---------+----------------------------------------------------+ diff --git a/doc/sphinx/Pacemaker_Remote/intro.rst b/doc/sphinx/Pacemaker_Remote/intro.rst index 361d4fb82d..9c5dab81a0 100644 --- a/doc/sphinx/Pacemaker_Remote/intro.rst +++ b/doc/sphinx/Pacemaker_Remote/intro.rst @@ -1,190 +1,186 @@ Scaling a Pacemaker Cluster --------------------------- Overview ######## In a basic Pacemaker high-availability cluster [#]_ each node runs the full cluster stack of Corosync and all Pacemaker components. This allows great flexibility but limits scalability to around 16 nodes. To allow for scalability to dozens or even hundreds of nodes, Pacemaker allows nodes not running the full cluster stack to integrate into the cluster and have the cluster manage their resources as if they were a cluster node. Terms ##### .. index:: single: cluster node single: node; cluster node **cluster node** A node running the full high-availability stack of corosync and all Pacemaker components. Cluster nodes may run cluster resources, run all Pacemaker command-line tools (``crm_mon``, ``crm_resource`` and so on), execute fencing actions, count toward cluster quorum, and serve as the cluster's Designated Controller (DC). .. index:: pacemaker_remoted **pacemaker_remoted** A small service daemon that allows a host to be used as a Pacemaker node without running the full cluster stack. 
Nodes running ``pacemaker_remoted`` may run cluster resources and most command-line tools, but cannot perform other functions of full cluster nodes such as fencing execution, quorum voting, or DC eligibility. The ``pacemaker_remoted`` daemon is an enhanced version of Pacemaker's local resource management daemon (LRMD). .. index:: single: remote node single: node; remote node **pacemaker_remote** The name of the systemd service that manages ``pacemaker_remoted`` **Pacemaker Remote** A way to refer to the general technology implementing nodes running ``pacemaker_remoted``, including the cluster-side implementation and the communication protocol between them. **remote node** A physical host running ``pacemaker_remoted``. Remote nodes have a special resource that manages communication with the cluster. This is sometimes referred to as the *bare metal* case. .. index:: single: guest node single: node; guest node **guest node** A virtual host running ``pacemaker_remoted``. Guest nodes differ from remote nodes mainly in that the guest node is itself a resource that the cluster manages. .. NOTE:: *Remote* in this document refers to the node not being a part of the underlying corosync cluster. It has nothing to do with physical proximity. Remote nodes and guest nodes are subject to the same latency requirements as cluster nodes, which means they are typically in the same data center. .. NOTE:: It is important to distinguish the various roles a virtual machine can serve in Pacemaker clusters: * A virtual machine can run the full cluster stack, in which case it is a cluster node and is not itself managed by the cluster. * A virtual machine can be managed by the cluster as a resource, without the cluster having any awareness of the services running inside the virtual machine. The virtual machine is *opaque* to the cluster. * A virtual machine can be a cluster resource, and run ``pacemaker_remoted`` to make it a guest node, allowing the cluster to manage services inside it. The virtual machine is *transparent* to the cluster. .. index:: single: virtual machine; as guest node Guest Nodes ########### **"I want a Pacemaker cluster to manage virtual machine resources, but I also want Pacemaker to be able to manage the resources that live within those virtual machines."** Without ``pacemaker_remoted``, the possibilities for implementing the above use case have significant limitations: * The cluster stack could be run on the physical hosts only, which loses the ability to monitor resources within the guests. * A separate cluster could be on the virtual guests, which quickly hits scalability issues. * The cluster stack could be run on the guests using the same cluster as the physical hosts, which also hits scalability issues and complicates fencing. With ``pacemaker_remoted``: * The physical hosts are cluster nodes (running the full cluster stack). * The virtual machines are guest nodes (running ``pacemaker_remoted``). Nearly zero configuration is required on the virtual machine. * The cluster stack on the cluster nodes launches the virtual machines and immediately connects to ``pacemaker_remoted`` on them, allowing the virtual machines to integrate into the cluster. The key difference here between the guest nodes and the cluster nodes is that the guest nodes do not run the cluster stack. This means they will never become the DC, initiate fencing actions or participate in quorum voting. 
On the other hand, this also means that they are not bound to the scalability limits associated with the cluster stack (no 16-node corosync member limits to deal with). That isn't to say that guest nodes can scale indefinitely, but it is known that guest nodes scale horizontally much further than cluster nodes. Other than the quorum limitation, these guest nodes behave just like cluster nodes with respect to resource management. The cluster is fully capable of managing and monitoring resources on each guest node. You can build constraints against guest nodes, put them in standby, or do whatever else you'd expect to be able to do with cluster nodes. They even show up in ``crm_mon`` output as nodes. To solidify the concept, below is an example that is very similar to an actual deployment we test in our developer environment to verify guest node scalability: * 16 cluster nodes running the full Corosync + Pacemaker stack * 64 Pacemaker-managed virtual machine resources running ``pacemaker_remoted`` configured as guest nodes * 64 Pacemaker-managed webserver and database resources configured to run on the 64 guest nodes With this deployment, you would have 64 webservers and databases running on 64 virtual machines on 16 hardware nodes, all of which are managed and monitored by the same Pacemaker deployment. It is known that ``pacemaker_remoted`` can scale to these lengths and possibly much further depending on the specific scenario. Remote Nodes ############ **"I want my traditional high-availability cluster to scale beyond the limits imposed by the corosync messaging layer."** Ultimately, the primary advantage of remote nodes over cluster nodes is scalability. There are likely some other use cases related to geographically distributed HA clusters that remote nodes may serve a purpose in, but those use cases are not well understood at this point. Like guest nodes, remote nodes will never become the DC, initiate fencing actions or participate in quorum voting. That is not to say, however, that fencing of a remote node works any differently than that of a cluster node. The Pacemaker scheduler understands how to fence remote nodes. As long as a fencing device exists, the cluster is capable of ensuring remote nodes are fenced in the exact same way as cluster nodes. Expanding the Cluster Stack ########################### With ``pacemaker_remoted``, the traditional view of the high-availability stack can be expanded to include a new layer: Traditional HA Stack ____________________ .. image:: images/pcmk-ha-cluster-stack.png - :width: 17cm - :height: 9cm :alt: Traditional Pacemaker+Corosync Stack :align: center HA Stack With Guest Nodes _________________________ .. image:: images/pcmk-ha-remote-stack.png - :width: 20cm - :height: 10cm :alt: Pacemaker+Corosync Stack with pacemaker_remoted :align: center .. [#] See the ``_ Pacemaker documentation, especially *Clusters From Scratch* and *Pacemaker Explained*. diff --git a/doc/sphinx/conf.py.in b/doc/sphinx/conf.py.in index 0c2615a1cb..91c40b0404 100644 --- a/doc/sphinx/conf.py.in +++ b/doc/sphinx/conf.py.in @@ -1,314 +1,316 @@ """ Sphinx configuration for Pacemaker documentation """ __copyright__ = "Copyright 2020 the Pacemaker project contributors" __license__ = "GNU General Public License version 2 or later (GPLv2+) WITHOUT ANY WARRANTY" # This file is execfile()d with the current directory set to its containing dir. # # Note that not all possible configuration values are present in this # autogenerated file. 
# # All configuration values have a default; values that are commented out # serve to show the default. import datetime import os import sys # Variables that can be used later in this file authors = "the Pacemaker project contributors" year = datetime.datetime.now().year doc_license = "Creative Commons Attribution-ShareAlike International Public License" doc_license += " version 4.0 or later (CC-BY-SA v4.0+)" # rST markup to insert at beginning of every document; mainly used for # # .. || replace:: # # where occurrences of || in the rST will be substituted with rst_prolog=""" .. |CFS_DISTRO| replace:: CentOS Stream .. |CFS_DISTRO_VER| replace:: 8 .. |REMOTE_DISTRO| replace:: CentOS .. |REMOTE_DISTRO_VER| replace:: 7.4 """ # If extensions (or modules to document with autodoc) are in another directory, # add these directories to sys.path here. If the directory is relative to the # documentation root, use os.path.abspath to make it absolute, like shown here. #sys.path.insert(0, os.path.abspath('.')) # -- General configuration ----------------------------------------------------- # If your documentation needs a minimal Sphinx version, state it here. needs_sphinx = '1.0' # Add any Sphinx extension module names here, as strings. They can be extensions # coming with Sphinx (named 'sphinx.ext.*') or your custom ones. extensions = [] # Add any paths that contain templates here, relative to this directory. templates_path = ['_templates'] # The suffix of source filenames. source_suffix = '.rst' # The encoding of source files. #source_encoding = 'utf-8-sig' # The master toctree document. master_doc = 'index' # General information about the project. project = '%BOOK_ID%' copyright = "2009-%s %s. Released under the terms of the %s" % (year, authors, doc_license) # The version info for the project you're documenting, acts as replacement for # |version| and |release|, also used in various other places throughout the # built documents. # # The full version, including alpha/beta/rc tags. release = '%VERSION%' # The short X.Y version. version = release.rsplit('.', 1)[0] # The language for content autogenerated by Sphinx. Refer to documentation # for a list of supported languages. #language = None # There are two options for replacing |today|: either, you set today to some # non-false value, then it is used: #today = '' # Else, today_fmt is used as the format for a strftime call. #today_fmt = '%B %d, %Y' # List of patterns, relative to source directory, that match files and # directories to ignore when looking for source files. exclude_patterns = ['_build'] # The reST default role (used for this markup: `text`) to use for all documents. #default_role = None # If true, '()' will be appended to :func: etc. cross-reference text. #add_function_parentheses = True # If true, the current module name will be prepended to all description # unit titles (such as .. function::). #add_module_names = True # If true, sectionauthor and moduleauthor directives will be shown in the # output. They are ignored by default. #show_authors = False # The name of the Pygments (syntax highlighting) style to use. pygments_style = 'vs' # A list of ignored prefixes for module index sorting. #modindex_common_prefix = [] # -- Options for HTML output --------------------------------------------------- # The theme to use for HTML and HTML Help pages. See the documentation for # a list of builtin themes. html_theme = 'pyramid' # Theme options are theme-specific and customize the look and feel of a theme # further. 
For a list of options available for each theme, see the # documentation. #html_theme_options = {} # Add any paths that contain custom themes here, relative to this directory. #html_theme_path = [] html_style = 'pacemaker.css' # The name for this set of Sphinx documents. If None, it defaults to # " v documentation". html_title = "%BOOK_TITLE%" # A shorter title for the navigation bar. Default is the same as html_title. #html_short_title = None # The name of an image file (relative to this directory) to place at the top # of the sidebar. #html_logo = None # The name of an image file (within the static path) to use as favicon of the # docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32 # pixels large. #html_favicon = None # Add any paths that contain custom static files (such as style sheets) here, # relative to this directory. They are copied after the builtin static files, # so a file named "default.css" will overwrite the builtin "default.css". html_static_path = [ '%SRC_DIR%/_static' ] # If not '', a 'Last updated on:' timestamp is inserted at every page bottom, # using the given strftime format. #html_last_updated_fmt = '%b %d, %Y' # If true, SmartyPants will be used to convert quotes and dashes to # typographically correct entities. #html_use_smartypants = True # Custom sidebar templates, maps document names to template names. #html_sidebars = {} # Additional templates that should be rendered to pages, maps page names to # template names. #html_additional_pages = {} # If false, no module index is generated. #html_domain_indices = True # If false, no index is generated. #html_use_index = True # If true, the index is split into individual pages for each letter. #html_split_index = False # If true, links to the reST sources are added to the pages. #html_show_sourcelink = True # If true, "Created using Sphinx" is shown in the HTML footer. Default is True. #html_show_sphinx = True # If true, "(C) Copyright ..." is shown in the HTML footer. Default is True. #html_show_copyright = True # If true, an OpenSearch description file will be output, and all pages will # contain a tag referring to it. The value of this option must be the # base URL from which the finished HTML is served. #html_use_opensearch = '' # This is the file name suffix for HTML files (e.g. ".xhtml"). #html_file_suffix = None # Output file base name for HTML help builder. htmlhelp_basename = 'Pacemakerdoc' # -- Options for LaTeX output -------------------------------------------------- +latex_engine = "xelatex" + latex_elements = { # The paper size ('letterpaper' or 'a4paper'). #'papersize': 'letterpaper', # The font size ('10pt', '11pt' or '12pt'). #'pointsize': '10pt', # Additional stuff for the LaTeX preamble. #'preamble': '', } # Grouping the document tree into LaTeX files. List of tuples # (source start file, target name, title, author, documentclass [howto/manual]). latex_documents = [ ('index', '%BOOK_ID%.tex', '%BOOK_TITLE%', authors, 'manual'), ] # The name of an image file (relative to this directory) to place at the top of # the title page. #latex_logo = None # For "manual" documents, if this is true, then toplevel headings are parts, # not chapters. #latex_use_parts = False # If true, show page references after internal links. #latex_show_pagerefs = False # If true, show URL addresses after external links. #latex_show_urls = False # Documents to append as an appendix to all manuals. #latex_appendices = [] # If false, no module index is generated. 
#latex_domain_indices = True # -- Options for manual page output -------------------------------------------- # One entry per manual page. List of tuples # (source start file, name, description, authors, manual section). man_pages = [ ('index', '%BOOK_ID%', 'Part of the Pacemaker documentation set', [authors], 8) ] # If true, show URL addresses after external links. #man_show_urls = False # -- Options for Texinfo output ------------------------------------------------ # Grouping the document tree into Texinfo files. List of tuples # (source start file, target name, title, author, # dir menu entry, description, category) texinfo_documents = [ ('index', '%BOOK_ID%', '%BOOK_TITLE%', authors, '%BOOK_TITLE%', 'Pacemaker is an advanced, scalable high-availability cluster resource manager.', 'Miscellaneous'), ] # Documents to append as an appendix to all manuals. #texinfo_appendices = [] # If false, no module index is generated. #texinfo_domain_indices = True # How to display URL addresses: 'footnote', 'no', or 'inline'. #texinfo_show_urls = 'footnote' # -- Options for Epub output --------------------------------------------------- # Bibliographic Dublin Core info. epub_title = '%BOOK_TITLE%' epub_author = authors epub_publisher = 'ClusterLabs.org' epub_copyright = copyright # The language of the text. It defaults to the language option # or en if the language is not set. #epub_language = '' # The scheme of the identifier. Typical schemes are ISBN or URL. epub_scheme = 'URL' # The unique identifier of the text. This can be a ISBN number # or the project homepage. epub_identifier = 'http://www.clusterlabs.org/pacemaker/doc/2.0/%BOOK_ID%/epub/%BOOK_ID%.epub' # A unique identification for the text. epub_uid = 'ClusterLabs.org-Pacemaker-%BOOK_ID%' # A tuple containing the cover image and cover page html template filenames. #epub_cover = () # HTML files that should be inserted before the pages created by sphinx. # The format is a list of tuples containing the path and title. #epub_pre_files = [] # HTML files that should be inserted after the pages created by sphinx. # The format is a list of tuples containing the path and title. #epub_post_files = [] # A list of files that should not be packed into the epub file. epub_exclude_files = [ '_static/doctools.js', '_static/jquery.js', '_static/searchtools.js', '_static/underscore.js', '_static/basic.css', '_static/websupport.js', 'search.html', ] # The depth of the table of contents in toc.ncx. #epub_tocdepth = 3 # Allow duplicate toc entries. #epub_tocdup = True diff --git a/doc/sphinx/shared/pacemaker-intro.rst b/doc/sphinx/shared/pacemaker-intro.rst index a56e53fed7..3473636843 100644 --- a/doc/sphinx/shared/pacemaker-intro.rst +++ b/doc/sphinx/shared/pacemaker-intro.rst @@ -1,201 +1,196 @@ What Is Pacemaker? #################### Pacemaker is a high-availability *cluster resource manager* -- software that runs on a set of hosts (a *cluster* of *nodes*) in order to preserve integrity and minimize downtime of desired services (*resources*). [#]_ It is maintained by the `ClusterLabs `_ community. Pacemaker's key features include: * Detection of and recovery from node- and service-level failures * Ability to ensure data integrity by fencing faulty nodes * Support for one or more nodes per cluster * Support for multiple resource interface standards (anything that can be scripted can be clustered) * Support (but no requirement) for shared storage * Support for practically any redundancy configuration (active/passive, N+1, etc.) 
* Automatically replicated configuration that can be updated from any node * Ability to specify cluster-wide relationships between services, such as ordering, colocation and anti-colocation * Support for advanced service types, such as *clones* (services that need to be active on multiple nodes), *promotable clones* (clones that can run in one of two roles), and containerized services * Unified, scriptable cluster management tools -.. note:: Fencing +.. note:: **Fencing** *Fencing*, also known as *STONITH* (an acronym for Shoot The Other Node In The Head), is the ability to ensure that it is not possible for a node to be running a service. This is accomplished via *fence devices* such as intelligent power switches that cut power to the target, or intelligent network switches that cut the target's access to the local network. Pacemaker represents fence devices as a special class of resource. A cluster cannot safely recover from certain failure conditions, such as an unresponsive node, without fencing. Cluster Architecture ____________________ At a high level, a cluster can be viewed as having these parts (which together are often referred to as the *cluster stack*): * **Resources:** These are the reason for the cluster's being -- the services that need to be kept highly available. * **Resource agents:** These are scripts or operating system components that start, stop, and monitor resources, given a set of resource parameters. These provide a uniform interface between Pacemaker and the managed services. * **Fence agents:** These are scripts that execute node fencing actions, given a target and fence device parameters. * **Cluster membership layer:** This component provides reliable messaging, membership, and quorum information about the cluster. Currently, Pacemaker supports `Corosync `_ as this layer. * **Cluster resource manager:** Pacemaker provides the brain that processes and reacts to events that occur in the cluster. These events may include nodes joining or leaving the cluster; resource events caused by failures, maintenance, or scheduled activities; and other administrative actions. To achieve the desired availability, Pacemaker may start and stop resources and fence nodes. * **Cluster tools:** These provide an interface for users to interact with the cluster. Various command-line and graphical (GUI) interfaces are available. Most managed services are not, themselves, cluster-aware. However, many popular open-source cluster filesystems make use of a common *Distributed Lock Manager* (DLM), which makes direct use of Corosync for its messaging and membership capabilities and Pacemaker for the ability to fence nodes. .. image:: ../shared/images/pcmk-stack.png :alt: Example cluster stack - :scale: 75 % :align: center Pacemaker Architecture ______________________ Pacemaker itself is composed of multiple daemons that work together: * pacemakerd * pacemaker-attrd * pacemaker-based * pacemaker-controld * pacemaker-execd * pacemaker-fenced * pacemaker-schedulerd .. image:: ../shared/images/pcmk-internals.png :alt: Pacemaker software components - :scale: 65 % :align: center The Pacemaker master process (pacemakerd) spawns all the other daemons, and respawns them if they unexpectedly exit. The *Cluster Information Base* (CIB) is an `XML `_ representation of the cluster's configuration and the state of all nodes and resources. The *CIB manager* (pacemaker-based) keeps the CIB synchronized across the cluster, and handles requests to modify it. 
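In day-to-day administration, the CIB manager is usually reached indirectly through higher-level tools such as ``crm_attribute`` or ``crm_resource``, but the CIB can also be inspected directly with ``cibadmin``. A minimal sketch (the section name used here is just an example):

.. code-block:: none

   # Dump the entire CIB as XML
   cibadmin --query

   # Limit the output to one section of the configuration
   cibadmin --query --scope nodes

Requests like these are serviced by pacemaker-based on the local node, which keeps its copy of the CIB consistent with the copies held by the other cluster nodes.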
The *attribute manager* (pacemaker-attrd) maintains a database of attributes for all nodes, keeps it synchronized across the cluster, and handles requests to modify them. These attributes are usually recorded in the CIB. Given a snapshot of the CIB as input, the *scheduler* (pacemaker-schedulerd) determines what actions are necessary to achieve the desired state of the cluster. The *local executor* (pacemaker-execd) handles requests to execute resource agents on the local cluster node, and returns the result. The *fencer* (pacemaker-fenced) handles requests to fence nodes. Given a target node, the fencer decides which cluster node(s) should execute which fencing device(s), and calls the necessary fencing agents (either directly, or via requests to the fencer peers on other nodes), and returns the result. The *controller* (pacemaker-controld) is Pacemaker's coordinator, maintaining a consistent view of the cluster membership and orchestrating all the other components. Pacemaker centralizes cluster decision-making by electing one of the controller instances as the 'Designated Controller' ('DC'). Should the elected DC process (or the node it is on) fail, a new one is quickly established. The DC responds to cluster events by taking a current snapshot of the CIB, feeding it to the scheduler, then asking the executors (either directly on the local node, or via requests to controller peers on other nodes) and the fencer to execute any necessary actions. .. note:: **Old daemon names** The Pacemaker daemons were renamed in version 2.0. You may still find references to the old names, especially in documentation targeted to version 1.1. .. table:: +-------------------+---------------------+ | Old name | New name | +===================+=====================+ | attrd | pacemaker-attrd | +-------------------+---------------------+ | cib | pacemaker-based | +-------------------+---------------------+ | crmd | pacemaker-controld | +-------------------+---------------------+ | lrmd | pacemaker-execd | +-------------------+---------------------+ | stonithd | pacemaker-fenced | +-------------------+---------------------+ | pacemaker_remoted | pacemaker-remoted | +-------------------+---------------------+ Node Redundancy Designs _______________________ Pacemaker supports practically any `node redundancy configuration `_ including *Active/Active*, *Active/Passive*, *N+1*, *N+M*, *N-to-1* and *N-to-N*. Active/passive clusters with two (or more) nodes using Pacemaker and `DRBD `_ are a cost-effective high-availability solution for many situations. One of the nodes provides the desired services, and if it fails, the other node takes over. .. image:: ../shared/images/pcmk-active-passive.png :alt: Active/Passive Redundancy :align: center - :scale: 75 % Pacemaker also supports multiple nodes in a shared-failover design, reducing hardware costs by allowing several active/passive clusters to be combined and share a common backup node. .. image:: ../shared/images/pcmk-shared-failover.png :alt: Shared Failover :align: center - :scale: 75 % When shared storage is available, every node can potentially be used for failover. Pacemaker can even run multiple copies of services to spread out the workload. This is sometimes called N to N Redundancy. .. image:: ../shared/images/pcmk-active-active.png :alt: N to N Redundancy :align: center - :scale: 75 % .. rubric:: Footnotes .. 
[#] *Cluster* is sometimes used in other contexts to refer to hosts grouped together for other purposes, such as high-performance computing (HPC), but Pacemaker is not intended for those purposes.