diff --git a/doc/sphinx/Clusters_from_Scratch/verification.rst b/doc/sphinx/Clusters_from_Scratch/verification.rst
index af1438f290..28e6036c1a 100644
--- a/doc/sphinx/Clusters_from_Scratch/verification.rst
+++ b/doc/sphinx/Clusters_from_Scratch/verification.rst
@@ -1,216 +1,213 @@
Start and Verify Cluster
------------------------

Start the Cluster
#################

Now that corosync is configured, it is time to start the cluster.
The command below will start corosync and pacemaker on both nodes
-in the cluster. If you are issuing the start command from a different
-node than the one you ran the ``pcs host auth`` command on earlier, you
-must authenticate on the current node you are logged into before you will
-be allowed to start the cluster.
+in the cluster.

.. code-block:: none

    [root@pcmk-1 ~]# pcs cluster start --all
    pcmk-1: Starting Cluster...
    pcmk-2: Starting Cluster...

.. NOTE::

    An alternative to using the ``pcs cluster start --all`` command
    is to issue either of the below command sequences on each node in
    the cluster separately:

    .. code-block:: none

        # pcs cluster start
        Starting Cluster...

    or

    .. code-block:: none

        # systemctl start corosync.service
        # systemctl start pacemaker.service

.. IMPORTANT::

    In this example, we are not enabling the corosync and pacemaker services
    to start at boot. If a cluster node fails or is rebooted, you will need to
    run ``pcs cluster start [<NODENAME> | --all]`` to start the cluster on it.
    While you can enable the services to start at boot (for example, using
    ``pcs cluster enable [<NODENAME> | --all]``), requiring a manual start of
    cluster services gives you the opportunity to do a post-mortem
    investigation of a node failure before returning it to the cluster.

Verify Corosync Installation
############################

First, use ``corosync-cfgtool`` to check whether cluster communication is
happy:

.. code-block:: none

    [root@pcmk-1 ~]# corosync-cfgtool -s
    Printing link status.
    Local node ID 1
    LINK ID 0
            addr    = 192.168.122.101
            status:
                    nodeid  1:      localhost
                    nodeid  2:      connected

We can see here that everything appears normal with our fixed IP address (not
a 127.0.0.x loopback address) listed as the **addr**, and **localhost** and
**connected** for the statuses of nodeid 1 and nodeid 2, respectively.

If you see something different, you might want to start by checking the node's
network, firewall, and SELinux configurations.
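For instance, a quick first pass might look something like the sketch below.
This is only a rough outline, and it assumes a firewalld- and SELinux-based
host like the ones used in this guide; adapt the commands to your own
distribution and firewall tooling.

.. code-block:: none

    [root@pcmk-1 ~]# ip addr show
    [root@pcmk-1 ~]# firewall-cmd --list-services
    [root@pcmk-1 ~]# getenforce

Look for the node's fixed IP address in the ``ip addr`` output, confirm that
the firewall allows the cluster's traffic (for example, the
``high-availability`` firewalld service), and check whether SELinux is
enforcing a policy that might interfere.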
Next, check the membership and quorum APIs:

.. code-block:: none

    [root@pcmk-1 ~]# corosync-cmapctl | grep members
    runtime.members.1.config_version (u64) = 0
    runtime.members.1.ip (str) = r(0) ip(192.168.122.101)
    runtime.members.1.join_count (u32) = 1
    runtime.members.1.status (str) = joined
    runtime.members.2.config_version (u64) = 0
    runtime.members.2.ip (str) = r(0) ip(192.168.122.102)
    runtime.members.2.join_count (u32) = 1
    runtime.members.2.status (str) = joined

    [root@pcmk-1 ~]# pcs status corosync

    Membership information
    ----------------------
        Nodeid      Votes Name
             1          1 pcmk-1 (local)
             2          1 pcmk-2

You should see both nodes have joined the cluster.

Verify Pacemaker Installation
#############################

Now that we have confirmed that Corosync is functional, we can check the rest
of the stack. Pacemaker has already been started, so verify the necessary
processes are running:

.. code-block:: none

    [root@pcmk-1 ~]# ps axf
      PID TTY      STAT   TIME COMMAND
        2 ?        S      0:00 [kthreadd]
    ...lots of processes...
    17121 ?        SLsl   0:01 /usr/sbin/corosync -f
    17133 ?        Ss     0:00 /usr/sbin/pacemakerd
    17134 ?        Ss     0:00  \_ /usr/libexec/pacemaker/pacemaker-based
    17135 ?        Ss     0:00  \_ /usr/libexec/pacemaker/pacemaker-fenced
    17136 ?        Ss     0:00  \_ /usr/libexec/pacemaker/pacemaker-execd
    17137 ?        Ss     0:00  \_ /usr/libexec/pacemaker/pacemaker-attrd
    17138 ?        Ss     0:00  \_ /usr/libexec/pacemaker/pacemaker-schedulerd
    17139 ?        Ss     0:00  \_ /usr/libexec/pacemaker/pacemaker-controld

If that looks OK, check the ``pcs status`` output:

.. code-block:: none

    [root@pcmk-1 ~]# pcs status
    Cluster name: mycluster

    WARNINGS:
    No stonith devices and stonith-enabled is not false

    Cluster Summary:
      * Stack: corosync
      * Current DC: pcmk-2 (version 2.0.5-4.el8-ba59be7122) - partition with quorum
      * Last updated: Wed Jan 20 07:54:02 2021
      * Last change:  Wed Jan 20 07:48:25 2021 by hacluster via crmd on pcmk-2
      * 2 nodes configured
      * 0 resource instances configured

    Node List:
      * Online: [ pcmk-1 pcmk-2 ]

    Full List of Resources:
      * No resources

    Daemon Status:
      corosync: active/disabled
      pacemaker: active/disabled
      pcsd: active/enabled

Finally, ensure there are no start-up errors from corosync or pacemaker (aside
from messages relating to not having STONITH configured, which are OK at this
point):

.. code-block:: none

    [root@pcmk-1 ~]# journalctl -b | grep -i error

.. NOTE::

    Other operating systems may report startup errors in other locations
    (for example, ``/var/log/messages``).

Repeat these checks on the other node. The results should be the same.

Explore the Existing Configuration
##################################

For those who are not afraid of XML, you can see the raw cluster configuration
and status by using the ``pcs cluster cib`` command.

.. topic:: The last XML you'll see in this document

    .. code-block:: none

        [root@pcmk-1 ~]# pcs cluster cib

    .. code-block:: xml

Before we make any changes, it's a good idea to check the validity of the
configuration.

.. code-block:: none

    [root@pcmk-1 ~]# crm_verify -L -V
       error: unpack_resources:  Resource start-up disabled since no STONITH resources have been defined
       error: unpack_resources:  Either configure some or disable STONITH with the stonith-enabled option
       error: unpack_resources:  NOTE: Clusters with shared data need STONITH to ensure data integrity
    Errors found during check: config not valid

As you can see, the tool has found some errors. The cluster will not start any
resources until we configure STONITH.
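If you want to validate a configuration without touching the live cluster,
``crm_verify`` can also check a saved copy of the CIB. A minimal sketch is
shown below; the file name is only an example, and the same STONITH errors
will be reported until fencing is configured.

.. code-block:: none

    [root@pcmk-1 ~]# pcs cluster cib > /tmp/original-cib.xml
    [root@pcmk-1 ~]# crm_verify --xml-file /tmp/original-cib.xml --verbose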