diff --git a/doc/sphinx/Clusters_from_Scratch/ap-configuration.rst b/doc/sphinx/Clusters_from_Scratch/ap-configuration.rst
index 8beb1dd1d0..f899c88d95 100644
--- a/doc/sphinx/Clusters_from_Scratch/ap-configuration.rst
+++ b/doc/sphinx/Clusters_from_Scratch/ap-configuration.rst
@@ -1,372 +1,372 @@
Configuration Recap
-------------------

Final Cluster Configuration
###########################

::

    [root@pcmk-1 ~]# pcs resource
     Master/Slave Set: WebDataClone [WebData]
         Masters: [ pcmk-1 pcmk-2 ]
     Clone Set: dlm-clone [dlm]
         Started: [ pcmk-1 pcmk-2 ]
     ClusterIP   (ocf::heartbeat:IPaddr2):   Started pcmk-1
     Clone Set: WebFS-clone [WebFS]
         Started: [ pcmk-1 pcmk-2 ]
     WebSite     (ocf::heartbeat:apache):    Started pcmk-1

::

    [root@pcmk-1 ~]# pcs resource op defaults
    timeout: 240s

::

    [root@pcmk-1 ~]# pcs stonith
     ipmi-fencing   (stonith:fence_ipmilan):    Started pcmk-1

::

    [root@pcmk-1 ~]# pcs constraint
    Location Constraints:
    Ordering Constraints:
      start ClusterIP then start WebSite (kind:Mandatory)
      promote WebDataClone then start WebFS-clone (kind:Mandatory)
      start WebFS-clone then start WebSite (kind:Mandatory)
      start dlm-clone then start WebFS-clone (kind:Mandatory)
    Colocation Constraints:
      WebSite with ClusterIP (score:INFINITY)
      WebFS-clone with WebDataClone (score:INFINITY) (with-rsc-role:Master)
      WebSite with WebFS-clone (score:INFINITY)
      WebFS-clone with dlm-clone (score:INFINITY)
    Ticket Constraints:

::

    [root@pcmk-1 ~]# pcs status
    Cluster name: mycluster
    Stack: corosync
    Current DC: pcmk-1 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
    Last updated: Tue Sep 11 10:41:53 2018
    Last change: Tue Sep 11 10:40:16 2018 by root via cibadmin on pcmk-1

    2 nodes configured
    11 resources configured

    Online: [ pcmk-1 pcmk-2 ]

    Full list of resources:

     ipmi-fencing   (stonith:fence_ipmilan):    Started pcmk-1
     Master/Slave Set: WebDataClone [WebData]
         Masters: [ pcmk-1 pcmk-2 ]
     Clone Set: dlm-clone [dlm]
         Started: [ pcmk-1 pcmk-2 ]
     ClusterIP   (ocf::heartbeat:IPaddr2):   Started pcmk-1
     Clone Set: WebFS-clone [WebFS]
         Started: [ pcmk-1 pcmk-2 ]
     WebSite     (ocf::heartbeat:apache):    Started pcmk-1

    Daemon Status:
      corosync: active/disabled
      pacemaker: active/disabled
      pcsd: active/enabled

::

    [root@pcmk-1 ~]# pcs cluster cib --config

-.. code:: xml
+.. code-block:: xml
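
The raw configuration shown by ``pcs cluster cib`` can also be saved to a
file, edited offline with ``pcs -f``, and pushed back to the cluster in one
step. This is only a sketch of that workflow; the file name ``recap.xml`` and
the example default shown here are illustrative, not part of the guide's
configuration:

::

    [root@pcmk-1 ~]# pcs cluster cib recap.xml
    [root@pcmk-1 ~]# pcs -f recap.xml resource defaults resource-stickiness=100
    [root@pcmk-1 ~]# pcs cluster cib-push recap.xml

Batching changes through a file like this is useful when several related
changes need to be applied to the cluster together.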

Node List
#########

::

    [root@pcmk-1 ~]# pcs status nodes
    Pacemaker Nodes:
     Online: pcmk-1 pcmk-2
     Standby:
     Maintenance:
     Offline:
    Pacemaker Remote Nodes:
     Online:
     Standby:
     Maintenance:
     Offline:

Cluster Options
###############

::

    [root@pcmk-1 ~]# pcs property
    Cluster Properties:
     cluster-infrastructure: corosync
     cluster-name: mycluster
     dc-version: 1.1.18-11.el7_5.3-2b07d5c5a9
     have-watchdog: false
     last-lrm-refresh: 1536679009
     stonith-enabled: true

The output shows state information automatically obtained about the cluster,
including:

* **cluster-infrastructure** - the cluster communications layer in use
* **cluster-name** - the cluster name chosen by the administrator when the
  cluster was created
* **dc-version** - the version (including upstream source-code hash) of
  Pacemaker used on the Designated Controller, which is the node elected to
  determine what actions are needed when events occur

The output also shows options set by the administrator that control the way
the cluster operates, including:

* **stonith-enabled=true** - whether the cluster is allowed to use STONITH
  resources

Resources
#########

Default Options
_______________

::

    [root@pcmk-1 ~]# pcs resource defaults
    resource-stickiness: 100

This shows cluster option defaults that apply to every resource that does not
explicitly set the option itself. Above:

* **resource-stickiness** - specifies how strongly the cluster should avoid
  moving healthy resources to other machines

Fencing
_______

::

    [root@pcmk-1 ~]# pcs stonith show
     ipmi-fencing   (stonith:fence_ipmilan):    Started pcmk-1
    [root@pcmk-1 ~]# pcs stonith show ipmi-fencing
     Resource: ipmi-fencing (class=stonith type=fence_ipmilan)
      Attributes: ipaddr="10.0.0.1" login="testuser" passwd="acd123" pcmk_host_list="pcmk-1 pcmk-2"
      Operations: monitor interval=60s (fence-monitor-interval-60s)

Service Address
_______________

Users of the services provided by the cluster require an unchanging address
with which to access it.

::

    [root@pcmk-1 ~]# pcs resource show ClusterIP
     Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
      Attributes: cidr_netmask=24 ip=192.168.122.120 clusterip_hash=sourceip
      Meta Attrs: resource-stickiness=0
      Operations: monitor interval=30s (ClusterIP-monitor-interval-30s)
                  start interval=0s timeout=20s (ClusterIP-start-interval-0s)
                  stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)

DRBD - Shared Storage
_____________________

Here, we define the DRBD service and specify which DRBD resource (from
/etc/drbd.d/\*.res) it should manage. We make it a master clone resource and,
in order to have an active/active setup, allow both instances to be promoted
to master at the same time. We also set the notify option so that the cluster
will tell the DRBD agent when its peer changes state.
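
For reference, a master/slave resource with the meta attributes shown below
could be created with pcs commands along these lines. This is only a sketch;
the resource and meta-attribute names are taken from the output that follows,
and the exact commands used by the guide may differ:

::

    # pcs resource create WebData ocf:linbit:drbd drbd_resource=wwwdata \
        op monitor interval=60s
    # pcs resource master WebDataClone WebData \
        master-max=2 master-node-max=1 clone-max=2 clone-node-max=1 notify=true

The resulting master/slave resource, and the constraints that reference it,
look like this: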

::

    [root@pcmk-1 ~]# pcs resource show WebDataClone
     Master: WebDataClone
      Meta Attrs: master-node-max=1 clone-max=2 notify=true master-max=2 clone-node-max=1
      Resource: WebData (class=ocf provider=linbit type=drbd)
       Attributes: drbd_resource=wwwdata
       Operations: demote interval=0s timeout=90 (WebData-demote-interval-0s)
                   monitor interval=60s (WebData-monitor-interval-60s)
                   notify interval=0s timeout=90 (WebData-notify-interval-0s)
                   promote interval=0s timeout=90 (WebData-promote-interval-0s)
                   reload interval=0s timeout=30 (WebData-reload-interval-0s)
                   start interval=0s timeout=240 (WebData-start-interval-0s)
                   stop interval=0s timeout=100 (WebData-stop-interval-0s)
    [root@pcmk-1 ~]# pcs constraint ref WebDataClone
    Resource: WebDataClone
      colocation-WebFS-WebDataClone-INFINITY
      order-WebDataClone-WebFS-mandatory

Cluster Filesystem
__________________

The cluster filesystem ensures that files are read and written correctly. We
need to specify the block device (provided by DRBD), where we want it mounted,
and that we are using GFS2. Again, it is a clone because it is intended to be
active on both nodes. The additional constraints ensure that it can only be
started on nodes with active DLM and DRBD instances.

::

    [root@pcmk-1 ~]# pcs resource show WebFS-clone
     Clone: WebFS-clone
      Resource: WebFS (class=ocf provider=heartbeat type=Filesystem)
       Attributes: device=/dev/drbd1 directory=/var/www/html fstype=gfs2
       Operations: monitor interval=20 timeout=40 (WebFS-monitor-interval-20)
                   notify interval=0s timeout=60 (WebFS-notify-interval-0s)
                   start interval=0s timeout=60 (WebFS-start-interval-0s)
                   stop interval=0s timeout=60 (WebFS-stop-interval-0s)
    [root@pcmk-1 ~]# pcs constraint ref WebFS-clone
    Resource: WebFS-clone
      colocation-WebFS-WebDataClone-INFINITY
      colocation-WebSite-WebFS-INFINITY
      colocation-WebFS-dlm-clone-INFINITY
      order-WebDataClone-WebFS-mandatory
      order-WebFS-WebSite-mandatory
      order-dlm-clone-WebFS-mandatory

Apache
______

Lastly, we have the actual service, Apache. We need only tell the cluster
where to find its main configuration file and restrict it to running on a node
that has the required filesystem mounted and the IP address active.

::

    [root@pcmk-1 ~]# pcs resource show WebSite
     Resource: WebSite (class=ocf provider=heartbeat type=apache)
      Attributes: configfile=/etc/httpd/conf/httpd.conf statusurl=http://localhost/server-status
      Operations: monitor interval=1min (WebSite-monitor-interval-1min)
                  start interval=0s timeout=40s (WebSite-start-interval-0s)
                  stop interval=0s timeout=60s (WebSite-stop-interval-0s)
    [root@pcmk-1 ~]# pcs constraint ref WebSite
    Resource: WebSite
      colocation-WebSite-ClusterIP-INFINITY
      colocation-WebSite-WebFS-INFINITY
      order-ClusterIP-WebSite-mandatory
      order-WebFS-WebSite-mandatory
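
The colocation and ordering constraints referenced above could have been
created with commands along these lines. This sketch shows only the
WebSite-related constraints, using the resource names from the output above:

::

    # pcs constraint colocation add WebSite with ClusterIP INFINITY
    # pcs constraint order ClusterIP then WebSite
    # pcs constraint colocation add WebSite with WebFS-clone INFINITY
    # pcs constraint order WebFS-clone then WebSite

Constraints created this way are given generated IDs such as
``colocation-WebSite-ClusterIP-INFINITY``, which is what ``pcs constraint ref``
lists.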

diff --git a/doc/sphinx/Clusters_from_Scratch/verification.rst b/doc/sphinx/Clusters_from_Scratch/verification.rst
index 0d40792e12..f42deac924 100644
--- a/doc/sphinx/Clusters_from_Scratch/verification.rst
+++ b/doc/sphinx/Clusters_from_Scratch/verification.rst
@@ -1,211 +1,211 @@
Start and Verify Cluster
------------------------

Start the Cluster
#################

Now that corosync is configured, it is time to start the cluster. The command
below will start corosync and pacemaker on both nodes in the cluster. If you
are issuing the start command from a different node than the one you ran the
``pcs cluster auth`` command on earlier, you must authenticate on the current
node you are logged into before you will be allowed to start the cluster.

::

    [root@pcmk-1 ~]# pcs cluster start --all
    pcmk-1: Starting Cluster...
    pcmk-2: Starting Cluster...

.. NOTE::

    An alternative to using the ``pcs cluster start --all`` command
    is to issue either of the below command sequences on each node in
    the cluster separately:

    ::

        # pcs cluster start
        Starting Cluster...

    or

    ::

        # systemctl start corosync.service
        # systemctl start pacemaker.service

.. IMPORTANT::

    In this example, we are not enabling the corosync and pacemaker services
    to start at boot. If a cluster node fails or is rebooted, you will need to
    run ``pcs cluster start <NODENAME>`` (or ``--all``) to start the cluster
    on it. While you could enable the services to start at boot, requiring a
    manual start of cluster services gives you the opportunity to do a
    post-mortem investigation of a node failure before returning it to the
    cluster.
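
If you later decide you do want the cluster services to come up automatically
at boot, either of the following would do it. This is a sketch: run the
``pcs`` form once from any node, or the ``systemctl`` form on each node:

::

    # pcs cluster enable --all

or

::

    # systemctl enable corosync.service pacemaker.service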

Verify Corosync Installation
############################

First, use ``corosync-cfgtool`` to check whether cluster communication is
happy:

::

    [root@pcmk-1 ~]# corosync-cfgtool -s
    Printing ring status.
    Local node ID 1
    RING ID 0
            id      = 192.168.122.101
            status  = ring 0 active with no faults

We can see here that everything appears normal with our fixed IP address (not
a 127.0.0.x loopback address) listed as the **id**, and **no faults** for the
status.

If you see something different, you might want to start by checking the
node's network, firewall and SELinux configurations.

Next, check the membership and quorum APIs:

::

    [root@pcmk-1 ~]# corosync-cmapctl | grep members
    runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
    runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.168.122.101)
    runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
    runtime.totem.pg.mrp.srp.members.1.status (str) = joined
    runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
    runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(192.168.122.102)
    runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
    runtime.totem.pg.mrp.srp.members.2.status (str) = joined

    [root@pcmk-1 ~]# pcs status corosync

    Membership information
    ----------------------
        Nodeid      Votes Name
             1          1 pcmk-1 (local)
             2          1 pcmk-2

You should see both nodes have joined the cluster.

Verify Pacemaker Installation
#############################

Now that we have confirmed that Corosync is functional, we can check the rest
of the stack. Pacemaker has already been started, so verify the necessary
processes are running:

::

    [root@pcmk-1 ~]# ps axf
      PID TTY      STAT   TIME COMMAND
        2 ?        S      0:00 [kthreadd]
    ...lots of processes...
    11635 ?        SLsl   0:03 corosync
    11642 ?        Ss     0:00 /usr/sbin/pacemakerd -f
    11643 ?        Ss     0:00  \_ /usr/libexec/pacemaker/cib
    11644 ?        Ss     0:00  \_ /usr/libexec/pacemaker/stonithd
    11645 ?        Ss     0:00  \_ /usr/libexec/pacemaker/lrmd
    11646 ?        Ss     0:00  \_ /usr/libexec/pacemaker/attrd
    11647 ?        Ss     0:00  \_ /usr/libexec/pacemaker/pengine
    11648 ?        Ss     0:00  \_ /usr/libexec/pacemaker/crmd

If that looks OK, check the ``pcs status`` output:

::

    [root@pcmk-1 ~]# pcs status
    Cluster name: mycluster
    WARNING: no stonith devices and stonith-enabled is not false
    Stack: corosync
    Current DC: pcmk-2 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
    Last updated: Mon Sep 10 16:37:34 2018
    Last change: Mon Sep 10 16:30:53 2018 by hacluster via crmd on pcmk-2

    2 nodes configured
    0 resources configured

    Online: [ pcmk-1 pcmk-2 ]

    No resources

    Daemon Status:
      corosync: active/disabled
      pacemaker: active/disabled
      pcsd: active/enabled

Finally, ensure there are no start-up errors from corosync or pacemaker (aside
from messages relating to not having STONITH configured, which are OK at this
point):

::

    [root@pcmk-1 ~]# journalctl -b | grep -i error

.. NOTE::

    Other operating systems may report startup errors in other locations,
    for example ``/var/log/messages``.

Repeat these checks on the other node. The results should be the same.

Explore the Existing Configuration
##################################

For those who are not afraid of XML, you can see the raw cluster configuration
and status by using the ``pcs cluster cib`` command.

.. topic:: The last XML you'll see in this document

    ::

        [root@pcmk-1 ~]# pcs cluster cib

-    .. code:: xml
+    .. code-block:: xml

Before we make any changes, it's a good idea to check the validity of the
configuration.

::

    [root@pcmk-1 ~]# crm_verify -L -V
       error: unpack_resources:  Resource start-up disabled since no STONITH resources have been defined
       error: unpack_resources:  Either configure some or disable STONITH with the stonith-enabled option
       error: unpack_resources:  NOTE: Clusters with shared data need STONITH to ensure data integrity
    Errors found during check: config not valid

As you can see, the tool has found some errors. The cluster will not start any
resources until we configure STONITH.
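
As a preview of that step, a STONITH device like the IPMI-based one shown in
the configuration recap earlier in this document could be created with a
command along these lines. This is only a sketch: the IP address and
credentials are the example values from that recap, not something to copy
verbatim:

::

    # pcs stonith create ipmi-fencing fence_ipmilan \
        pcmk_host_list="pcmk-1 pcmk-2" ipaddr=10.0.0.1 \
        login=testuser passwd=acd123 \
        op monitor interval=60s

Once a fencing device is configured and ``stonith-enabled`` remains ``true``,
``crm_verify -L`` should no longer report these errors.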