diff --git a/doc/Clusters_from_Scratch/en-US/Ch-Active-Passive.txt b/doc/Clusters_from_Scratch/en-US/Ch-Active-Passive.txt
index 0cd3463113..f75cb34e2b 100644
--- a/doc/Clusters_from_Scratch/en-US/Ch-Active-Passive.txt
+++ b/doc/Clusters_from_Scratch/en-US/Ch-Active-Passive.txt
@@ -1,377 +1,268 @@
:compat-mode: legacy
= Create an Active/Passive Cluster =
-== Explore the Existing Configuration ==
-
-When Pacemaker starts up, it automatically records the number and details
-of the nodes in the cluster, as well as which stack is being used and the
-version of Pacemaker being used.
-
-The first few lines of output should look like this:
-
-----
-[root@pcmk-1 ~]# pcs status
-Cluster name: mycluster
-WARNING: no stonith devices and stonith-enabled is not false
-Stack: corosync
-Current DC: pcmk-2 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
-Last updated: Mon Sep 10 16:41:46 2018
-Last change: Mon Sep 10 16:30:53 2018 by hacluster via crmd on pcmk-2
-
-2 nodes configured
-0 resources configured
-
-Online: [ pcmk-1 pcmk-2 ]
-----
-
-For those who are not of afraid of XML, you can see the raw cluster
-configuration and status by using the `pcs cluster cib` command.
-
-.The last XML you'll see in this document
-======
-----
-[root@pcmk-1 ~]# pcs cluster cib
-----
-[source,XML]
-----
-<cib crm_feature_set="3.0.14" validate-with="pacemaker-2.10" epoch="5" num_updates="4" admin_epoch="0" cib-last-written="Mon Sep 10 16:30:53 2018" update-origin="pcmk-2" update-client="crmd" update-user="hacluster" have-quorum="1" dc-uuid="2">
- <configuration>
- <crm_config>
- <cluster_property_set id="cib-bootstrap-options">
- <nvpair id="cib-bootstrap-options-have-watchdog" name="have-watchdog" value="false"/>
- <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.18-11.el7_5.3-2b07d5c5a9"/>
- <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
- <nvpair id="cib-bootstrap-options-cluster-name" name="cluster-name" value="mycluster"/>
- </cluster_property_set>
- </crm_config>
- <nodes>
- <node id="1" uname="pcmk-1"/>
- <node id="2" uname="pcmk-2"/>
- </nodes>
- <resources/>
- <constraints/>
- </configuration>
- <status>
- <node_state id="1" uname="pcmk-1" in_ccm="true" crmd="online" crm-debug-origin="do_state_transition" join="member" expected="member">
- <lrm id="1">
- <lrm_resources/>
- </lrm>
- </node_state>
- <node_state id="2" uname="pcmk-2" in_ccm="true" crmd="online" crm-debug-origin="do_state_transition" join="member" expected="member">
- <lrm id="2">
- <lrm_resources/>
- </lrm>
- </node_state>
- </status>
-</cib>
-----
-======
-
-Before we make any changes, it's a good idea to check the validity of
-the configuration.
-
-----
-[root@pcmk-1 ~]# crm_verify -L -V
- error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
- error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
- error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
-Errors found during check: config not valid
-----
-
-As you can see, the tool has found some errors.
-
-In order to guarantee the safety of your data,
-footnote:[If the data is corrupt, there is little point in continuing to make it available]
-fencing (also called STONITH) is enabled by default. However, it also knows
-when no STONITH configuration has been supplied and reports this as a problem
-(since the cluster will not be able to make progress if a situation requiring
-node fencing arises).
-
-We will disable this feature for now and configure it later.
-
-To disable STONITH, set the *stonith-enabled* cluster option to
-false:
-
-----
-[root@pcmk-1 ~]# pcs property set stonith-enabled=false
-[root@pcmk-1 ~]# crm_verify -L
-----
-
-With the new cluster option set, the configuration is now valid.
-
-[WARNING]
-=========
-The use of `stonith-enabled=false` is completely inappropriate for a
-production cluster. It tells the cluster to simply pretend that failed nodes
-are safely powered off. Some vendors will refuse to support clusters that have
-STONITH disabled. We disable STONITH here only to defer the discussion of its
-configuration, which can differ widely from one installation to the
-next. See <<_what_is_stonith>> for information on why STONITH is important
-and details on how to configure it.
-=========
-
== Add a Resource ==
Our first resource will be a unique IP address that the cluster can bring up on
either node. Regardless of where any cluster service(s) are running, end
users need a consistent address to contact them on. Here, I will choose
192.168.122.120 as the floating address, give it the imaginative name ClusterIP
and tell the cluster to check whether it is running every 30 seconds.
[WARNING]
===========
The chosen address must not already be in use on the network.
Do not reuse an IP address one of the nodes already has configured.
===========
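A quick sanity check before creating the resource is to ping the candidate
address and make sure nothing answers (assuming ICMP is not blocked on your
network):
----
[root@pcmk-1 ~]# ping -c 3 192.168.122.120
----
If nothing replies, go ahead and create the resource: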
----
[root@pcmk-1 ~]# pcs resource create ClusterIP ocf:heartbeat:IPaddr2 \
ip=192.168.122.120 cidr_netmask=24 op monitor interval=30s
----
Another important piece of information here is *ocf:heartbeat:IPaddr2*.
This tells Pacemaker three things about the resource you want to add:
* The first field (*ocf* in this case) is the standard to which the resource
script conforms and where to find it.
* The second field (*heartbeat* in this case) is standard-specific; for OCF
resources, it tells the cluster which OCF namespace the resource script is in.
* The third field (*IPaddr2* in this case) is the name of the resource script.
To obtain a list of the available resource standards (the *ocf* part of
*ocf:heartbeat:IPaddr2*), run:
----
[root@pcmk-1 ~]# pcs resource standards
lsb
ocf
service
systemd
----
To obtain a list of the available OCF resource providers (the *heartbeat*
part of *ocf:heartbeat:IPaddr2*), run:
----
[root@pcmk-1 ~]# pcs resource providers
heartbeat
openstack
pacemaker
----
Finally, if you want to see all the resource agents available for
a specific OCF provider (the *IPaddr2* part of *ocf:heartbeat:IPaddr2*), run:
----
[root@pcmk-1 ~]# pcs resource agents ocf:heartbeat
apache
aws-vpc-move-ip
awseip
awsvip
azure-lb
clvm
.
. (skipping lots of resources to save space)
.
symlink
tomcat
VirtualDomain
Xinetd
----
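Once you know an agent's full name, you can also ask `pcs` for the parameters
it accepts (output omitted here, as it is fairly long):
----
[root@pcmk-1 ~]# pcs resource describe ocf:heartbeat:IPaddr2
----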
Now, verify that the IP resource has been added, and display the cluster's
status to see that it is now active:
----
[root@pcmk-1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: pcmk-2 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Mon Sep 10 16:55:26 2018
Last change: Mon Sep 10 16:53:42 2018 by root via cibadmin on pcmk-1
2 nodes configured
1 resource configured
Online: [ pcmk-1 pcmk-2 ]
Full list of resources:
ClusterIP (ocf::heartbeat:IPaddr2): Started pcmk-1
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
----
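As an additional, OS-level check, the floating address should now be visible
on one of the active node's network interfaces (the interface name will vary
with your environment):
----
[root@pcmk-1 ~]# ip -o addr show | grep 192.168.122.120
----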
== Perform a Failover ==
Since our ultimate goal is high availability, we should test failover of
our new resource before moving on.
First, find the node on which the IP address is running.
----
[root@pcmk-1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: pcmk-2 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Mon Sep 10 16:55:26 2018
Last change: Mon Sep 10 16:53:42 2018 by root via cibadmin on pcmk-1
2 nodes configured
1 resource configured
Online: [ pcmk-1 pcmk-2 ]
Full list of resources:
ClusterIP (ocf::heartbeat:IPaddr2): Started pcmk-1
----
You can see that the status of the *ClusterIP* resource
is *Started* on a particular node (in this example, *pcmk-1*).
Shut down Pacemaker and Corosync on that machine to trigger a failover.
----
[root@pcmk-1 ~]# pcs cluster stop pcmk-1
Stopping Cluster (pacemaker)...
Stopping Cluster (corosync)...
----
[NOTE]
======
A cluster command such as +pcs cluster stop pass:[<replaceable>nodename</replaceable>]+ can be run
from any node in the cluster, not just the affected node.
======
Verify that pacemaker and corosync are no longer running:
----
[root@pcmk-1 ~]# pcs status
Error: cluster is not currently running on this node
----
Go to the other node, and check the cluster status.
----
[root@pcmk-2 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: pcmk-2 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Mon Sep 10 16:57:22 2018
Last change: Mon Sep 10 16:53:42 2018 by root via cibadmin on pcmk-1
2 nodes configured
1 resource configured
Online: [ pcmk-2 ]
OFFLINE: [ pcmk-1 ]
Full list of resources:
ClusterIP (ocf::heartbeat:IPaddr2): Started pcmk-2
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
----
Notice that *pcmk-1* is *OFFLINE* for cluster purposes (its *pcsd* is still
active, allowing it to receive `pcs` commands, but it is not participating in
the cluster).
Also notice that *ClusterIP* is now running on *pcmk-2* -- failover happened
automatically, and no errors are reported.
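Stopping the whole cluster stack is a realistic failure test. If you only want
to relocate a single resource without taking a node down, a lighter-weight
option (with a reasonably recent pcs) is `pcs resource move`, which works by
adding a temporary location constraint that you should clear once you are done:
----
[root@pcmk-1 ~]# pcs resource move ClusterIP pcmk-2
[root@pcmk-1 ~]# pcs resource clear ClusterIP
----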
[IMPORTANT]
.Quorum
====
If a cluster splits into two (or more) groups of nodes that can no longer
communicate with each other (aka. _partitions_), _quorum_ is used to prevent
resources from starting on more nodes than desired, which would risk
data corruption.
A cluster has quorum when more than half of all known nodes are online in
the same partition, or for the mathematically inclined, whenever the following
equation is true:
....
total_nodes < 2 * active_nodes
....
For example, if a 5-node cluster split into 3- and 2-node partitions,
the 3-node partition would have quorum and could continue serving resources.
If a 6-node cluster split into two 3-node partitions, neither partition
would have quorum; pacemaker's default behavior in such cases is to
stop all resources, in order to prevent data corruption.
Two-node clusters are a special case. By the above definition,
a two-node cluster would only have quorum when both nodes are
running. This would make the creation of a two-node cluster pointless,
but corosync has the ability to treat two-node clusters as if only one node
is required for quorum.
The `pcs cluster setup` command will automatically configure *two_node: 1*
in +corosync.conf+, so a two-node cluster will "just work".
If you are using a different cluster shell, you will have to configure
+corosync.conf+ appropriately yourself.
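For reference, the quorum section that `pcs cluster setup` writes to
+corosync.conf+ on a two-node cluster looks roughly like this (details vary
between corosync versions):
----
quorum {
    provider: corosync_votequorum
    two_node: 1
}
----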
====
Now, simulate node recovery by restarting the cluster stack on *pcmk-1*, and
check the cluster's status. (It may take a little while before the cluster
gets going on the node, but it eventually will look like the below.)
----
[root@pcmk-1 ~]# pcs cluster start pcmk-1
pcmk-1: Starting Cluster...
[root@pcmk-1 ~]# pcs status
Cluster name: mycluster
Stack: corosync
Current DC: pcmk-2 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Mon Sep 10 17:00:04 2018
Last change: Mon Sep 10 16:53:42 2018 by root via cibadmin on pcmk-1
2 nodes configured
1 resource configured
Online: [ pcmk-1 pcmk-2 ]
Full list of resources:
ClusterIP (ocf::heartbeat:IPaddr2): Started pcmk-2
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
----
== Prevent Resources from Moving after Recovery ==
In most circumstances, it is highly desirable to prevent healthy
resources from being moved around the cluster. Moving resources almost
always requires a period of downtime. For complex services such as
databases, this period can be quite long.
To address this, Pacemaker has the concept of resource _stickiness_,
which controls how strongly a service prefers to stay running where it
is. You may like to think of it as the "cost" of any downtime. By
default, Pacemaker assumes there is zero cost associated with moving
resources and will do so to achieve "optimal"
footnote:[Pacemaker's definition of optimal may not always agree with that of a
human. The order in which Pacemaker processes lists of resources and nodes
creates implicit preferences in situations where the administrator has not
explicitly specified them.]
resource placement. We can specify a different stickiness for every
resource, but it is often sufficient to change the default.
----
[root@pcmk-1 ~]# pcs resource defaults resource-stickiness=100
Warning: Defaults do not apply to resources which override them with their own defined values
[root@pcmk-1 ~]# pcs resource defaults
resource-stickiness: 100
----
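If you later want a different stickiness for one particular resource, you can
set it as a meta-attribute on that resource instead of (or in addition to) the
default. A sketch using the *ClusterIP* resource from earlier:
----
[root@pcmk-1 ~]# pcs resource meta ClusterIP resource-stickiness=200
[root@pcmk-1 ~]# pcs resource show ClusterIP
----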
diff --git a/doc/Clusters_from_Scratch/en-US/Ch-Stonith.txt b/doc/Clusters_from_Scratch/en-US/Ch-Fencing.txt
similarity index 66%
rename from doc/Clusters_from_Scratch/en-US/Ch-Stonith.txt
rename to doc/Clusters_from_Scratch/en-US/Ch-Fencing.txt
index e25735440f..6987c69460 100644
--- a/doc/Clusters_from_Scratch/en-US/Ch-Stonith.txt
+++ b/doc/Clusters_from_Scratch/en-US/Ch-Fencing.txt
@@ -1,170 +1,210 @@
:compat-mode: legacy
-= Configure STONITH =
+= Configure Fencing =
-== What is STONITH? ==
+== What is Fencing? ==
-STONITH (Shoot The Other Node In The Head aka. fencing) protects your data from
-being corrupted by rogue nodes or unintended concurrent access.
+Fencing protects your data from being corrupted, and your application from
+becoming unavailable, due to unintended concurrent access by rogue nodes.
Just because a node is unresponsive doesn't mean it has stopped
accessing your data. The only way to be 100% sure that your data is
-safe, is to use STONITH to ensure that the node is truly
+safe, is to use fencing to ensure that the node is truly
offline before allowing the data to be accessed from another node.
-STONITH also has a role to play in the event that a clustered service
-cannot be stopped. In this case, the cluster uses STONITH to force the
+Fencing also has a role to play in the event that a clustered service
+cannot be stopped. In this case, the cluster uses fencing to force the
whole node offline, thereby making it safe to start the service
elsewhere.
-== Choose a STONITH Device ==
+Fencing is also known as STONITH, an acronym for "Shoot The Other Node In The
+Head", since the most popular form of fencing is cutting a host's power.
-It is crucial that your STONITH device can allow the cluster to
-differentiate between a node failure and a network failure.
+In order to guarantee the safety of your data,
+footnote:[If the data is corrupt, there is little point in continuing to make it available]
+fencing is enabled by default.
-A common mistake people make when choosing a STONITH device is to use a remote
-power switch (such as many on-board IPMI controllers) that shares power with
-the node it controls. If the power fails in such a case, the cluster cannot be
-sure whether the node is really offline, or active and suffering from a network
-fault, so the cluster will stop all resources to avoid a possible split-brain
-situation.
+[NOTE]
+====
+It is possible to tell the cluster not to use fencing, by setting the
+*stonith-enabled* cluster option to false:
+----
+[root@pcmk-1 ~]# pcs property set stonith-enabled=false
+[root@pcmk-1 ~]# crm_verify -L
+----
+
+However, this is completely inappropriate for a production cluster. It tells
+the cluster to simply pretend that failed nodes are safely powered off. Some
+vendors will refuse to support clusters that have fencing disabled. Even
+disabling it for a test cluster means you won't be able to test real failure
+scenarios.
+====
+
+== Choose a Fence Device ==
+
+The two broad categories of fence device are power fencing, which cuts off
+power to the target, and fabric fencing, which cuts off the target's access to
+some critical resource, such as a shared disk or access to the local network.
+
+Power fencing devices include:
+
+* Intelligent power switches
+* IPMI
+* Hardware watchdog device (alone, or in combination with shared storage used
+ as a "poison pill" mechanism)
+
+Fabric fencing devices include:
+
+* Shared storage that can be cut off for a target host by another host (for
+ example, an external storage device that supports SCSI-3 persistent
+ reservations)
+* Intelligent network switches
+
+Using IPMI as a power fencing device may seem like a good choice. However,
+if the IPMI shares power and/or network access with the host (such as most
+onboard IPMI controllers), a power or network failure will cause both the
+host and its fencing device to fail. The cluster will be unable to recover,
+and must stop all resources to avoid a possible split-brain situation.
Likewise, any device that relies on the machine being active (such as
-SSH-based "devices" sometimes used during testing) is inappropriate.
+SSH-based "devices" sometimes used during testing) is inappropriate,
+because fencing will be required when the node is completely unresponsive.
-== Configure the Cluster for STONITH ==
+== Configure the Cluster for Fencing ==
-. Install the STONITH agent(s). To see what packages are available, run `yum
+. Install the fence agent(s). To see what packages are available, run `yum
search fence-`. Be sure to install the package(s) on all cluster nodes.
-. Configure the STONITH device itself to be able to fence your nodes and accept
+. Configure the fence device itself to be able to fence your nodes and accept
fencing requests. This includes any necessary configuration on the device and
on the nodes, and any firewall or SELinux changes needed. Test the
communication between the device and your nodes.
-. Find the correct STONITH agent script: `pcs stonith list`
+. Find the name of the correct fence agent: `pcs stonith list`
-. Find the parameters associated with the device: +pcs stonith describe pass:[<replaceable>agent_name</replaceable>]+
+. Find the parameters associated with the device:
+ +pcs stonith describe pass:[<replaceable>agent_name</replaceable>]+
. Create a local copy of the CIB: `pcs cluster cib stonith_cfg`
. Create the fencing resource: +pcs -f stonith_cfg stonith create pass:[<replaceable>stonith_id
stonith_device_type &#91;stonith_device_options&#93;</replaceable>]+
+
Any flags that do not take arguments, such as +--ssl+, should be passed as +ssl=1+.
-. Enable STONITH in the cluster: `pcs -f stonith_cfg property set stonith-enabled=true`
+. Enable fencing in the cluster: `pcs -f stonith_cfg property set stonith-enabled=true`
-. If the device does not know how to fence nodes based on their uname,
- you may also need to set the special *pcmk_host_map* parameter. See
+. If the device does not know how to fence nodes based on their cluster node
+ name, you may also need to set the special *pcmk_host_map* parameter. See
`man pacemaker-fenced` for details.
. If the device does not support the *list* command, you may also need
to set the special *pcmk_host_list* and/or *pcmk_host_check*
parameters. See `man pacemaker-fenced` for details.
. If the device does not expect the victim to be specified with the
*port* parameter, you may also need to set the special
*pcmk_host_argument* parameter. See `man pacemaker-fenced` for details.
. Commit the new configuration: `pcs cluster cib-push stonith_cfg`
-. Once the STONITH resource is running, test it (you might want to stop
- the cluster on that machine first): +stonith_admin --reboot pass:[<replaceable>nodename</replaceable>]+
+. Once the fence device resource is running, test it (you might want to stop
+ the cluster on that machine first):
+ +stonith_admin --reboot pass:[<replaceable>nodename</replaceable>]+
== Example ==
For this example, assume we have a chassis containing four nodes
-and an IPMI device active on 10.0.0.1. Following the steps above
-would go something like this:
+and a separately powered IPMI device active on 10.0.0.1. Following the steps
+above would go something like this:
Step 1: Install the *fence-agents-ipmilan* package on both nodes.
Step 2: Configure the IP address, authentication credentials, etc. in the IPMI device itself.
Step 3: Choose the *fence_ipmilan* STONITH agent.
Step 4: Obtain the agent's possible parameters:
----
[root@pcmk-1 ~]# pcs stonith describe fence_ipmilan
fence_ipmilan - Fence agent for IPMI
fence_ipmilan is an I/O Fencing agentwhich can be used with machines controlled by IPMI.This agent calls support software ipmitool (http://ipmitool.sf.net/). WARNING! This fence agent might report success before the node is powered off. You should use -m/method onoff if your fence device works correctly with that option.
Stonith options:
ipport: TCP/UDP port to use for connection with device
hexadecimal_kg: Hexadecimal-encoded Kg key for IPMIv2 authentication
port: IP address or hostname of fencing device (together with --port-as-ip)
inet6_only: Forces agent to use IPv6 addresses only
ipaddr: IP Address or Hostname
passwd_script: Script to retrieve password
method: Method to fence (onoff|cycle)
inet4_only: Forces agent to use IPv4 addresses only
passwd: Login password or passphrase
lanplus: Use Lanplus to improve security of connection
auth: IPMI Lan Auth type.
cipher: Ciphersuite to use (same as ipmitool -C parameter)
target: Bridge IPMI requests to the remote target address
privlvl: Privilege level on IPMI device
timeout: Timeout (sec) for IPMI operation
login: Login Name
verbose: Verbose mode
debug: Write debug information to given file
power_wait: Wait X seconds after issuing ON/OFF
login_timeout: Wait X seconds for cmd prompt after login
delay: Wait X seconds before fencing is started
power_timeout: Test X seconds for status change after ON/OFF
ipmitool_path: Path to ipmitool binary
shell_timeout: Wait X seconds for cmd prompt after issuing command
port_as_ip: Make "port/plug" to be an alias to IP address
retry_on: Count of attempts to retry power on
sudo: Use sudo (without password) when calling 3rd party sotfware.
priority: The priority of the stonith resource. Devices are tried in order of highest priority to lowest.
pcmk_host_map: A mapping of host names to ports numbers for devices that do not support host names. Eg. node1:1;node2:2,3 would tell the cluster to use port 1 for node1 and ports 2 and
3 for node2
pcmk_host_list: A list of machines controlled by this device (Optional unless pcmk_host_check=static-list).
pcmk_host_check: How to determine which machines are controlled by the device. Allowed values: dynamic-list (query the device), static-list (check the pcmk_host_list attribute), none
(assume every device can fence every machine)
pcmk_delay_max: Enable a random delay for stonith actions and specify the maximum of random delay. This prevents double fencing when using slow devices such as sbd. Use this to enable a
random delay for stonith actions. The overall delay is derived from this random delay value adding a static delay so that the sum is kept below the maximum delay.
pcmk_delay_base: Enable a base delay for stonith actions and specify base delay value. This prevents double fencing when different delays are configured on the nodes. Use this to enable
a static delay for stonith actions. The overall delay is derived from a random delay value adding this static delay so that the sum is kept below the maximum delay.
pcmk_action_limit: The maximum number of actions can be performed in parallel on this device Pengine property concurrent-fencing=true needs to be configured first. Then use this to
specify the maximum number of actions can be performed in parallel on this device. -1 is unlimited.
Default operations:
monitor: interval=60s
----
Step 5: `pcs cluster cib stonith_cfg`
-Step 6: Here are example parameters for creating our STONITH resource:
+Step 6: Here are example parameters for creating our fence device resource:
----
[root@pcmk-1 ~]# pcs -f stonith_cfg stonith create ipmi-fencing fence_ipmilan \
pcmk_host_list="pcmk-1 pcmk-2" ipaddr=10.0.0.1 login=testuser \
passwd=acd123 op monitor interval=60s
[root@pcmk-1 ~]# pcs -f stonith_cfg stonith
ipmi-fencing (stonith:fence_ipmilan): Stopped
----
-Steps 7-10: Enable STONITH in the cluster:
+Steps 7-10: Enable fencing in the cluster:
----
[root@pcmk-1 ~]# pcs -f stonith_cfg property set stonith-enabled=true
[root@pcmk-1 ~]# pcs -f stonith_cfg property
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: mycluster
dc-version: 1.1.18-11.el7_5.3-2b07d5c5a9
have-watchdog: false
stonith-enabled: true
----
Step 11: `pcs cluster cib-push stonith_cfg --config`
Step 12: Test:
----
[root@pcmk-1 ~]# pcs cluster stop pcmk-2
[root@pcmk-1 ~]# stonith_admin --reboot pcmk-2
----
After a successful test, login to any rebooted nodes, and start the cluster
(with `pcs cluster start`).
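Once the rebooted node has rejoined, you can confirm that the fence device
itself is running somewhere in the cluster. The expected output is sketched
below; your cluster may start it on a different node:
----
[root@pcmk-1 ~]# pcs stonith
 ipmi-fencing   (stonith:fence_ipmilan):        Started pcmk-1
----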
diff --git a/doc/Clusters_from_Scratch/en-US/Ch-Verification.txt b/doc/Clusters_from_Scratch/en-US/Ch-Verification.txt
index d4762e34fa..e21688dbc0 100644
--- a/doc/Clusters_from_Scratch/en-US/Ch-Verification.txt
+++ b/doc/Clusters_from_Scratch/en-US/Ch-Verification.txt
@@ -1,151 +1,210 @@
:compat-mode: legacy
= Start and Verify Cluster =
== Start the Cluster ==
Now that corosync is configured, it is time to start the cluster.
The command below will start corosync and pacemaker on both nodes
in the cluster. If you are issuing the start command from a different
node than the one you ran the `pcs cluster auth` command on earlier, you
must authenticate on the current node you are logged into before you will
be allowed to start the cluster.
----
[root@pcmk-1 ~]# pcs cluster start --all
pcmk-1: Starting Cluster...
pcmk-2: Starting Cluster...
----
[NOTE]
======
An alternative to using the `pcs cluster start --all` command
is to issue either of the below command sequences on each node in the
cluster separately:
----
# pcs cluster start
Starting Cluster...
----
or
----
# systemctl start corosync.service
# systemctl start pacemaker.service
----
======
[IMPORTANT]
====
In this example, we are not enabling the corosync and pacemaker services
to start at boot. If a cluster node fails or is rebooted, you will need to run
+pcs cluster start pass:[<replaceable>nodename</replaceable>]+ (or `--all`) to start the cluster on it.
While you could enable the services to start at boot, requiring a manual
start of cluster services gives you the opportunity to do a post-mortem investigation
of a node failure before returning it to the cluster.
====
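If you do decide later that you want the cluster services to start at boot,
one straightforward way is to enable them on every node:
----
[root@pcmk-1 ~]# pcs cluster enable --all
----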
== Verify Corosync Installation ==
First, use `corosync-cfgtool` to check whether cluster communication is happy:
----
[root@pcmk-1 ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
id = 192.168.122.101
status = ring 0 active with no faults
----
We can see here that everything appears normal with our fixed IP
address (not a 127.0.0.x loopback address) listed as the *id*, and *no
faults* for the status.
If you see something different, you might want to start by checking
the node's network, firewall and SELinux configurations.
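On CentOS/RHEL hosts, a couple of quick checks along those lines might look
like this (assuming firewalld and the stock SELinux tooling; cluster traffic
is typically allowed through firewalld via the *high-availability* service):
----
[root@pcmk-1 ~]# firewall-cmd --list-services
[root@pcmk-1 ~]# getenforce
----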
Next, check the membership and quorum APIs:
----
[root@pcmk-1 ~]# corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.168.122.101)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(192.168.122.102)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
[root@pcmk-1 ~]# pcs status corosync
Membership information
\----------------------
Nodeid Votes Name
1 1 pcmk-1 (local)
2 1 pcmk-2
----
You should see both nodes have joined the cluster.
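Another way to look at the quorum state, complementary to the output above, is
`corosync-quorumtool`; its exact fields differ between corosync versions, so
the output is omitted here:
----
[root@pcmk-1 ~]# corosync-quorumtool -s
----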
== Verify Pacemaker Installation ==
Now that we have confirmed that Corosync is functional, we can check
the rest of the stack. Pacemaker has already been started, so verify
the necessary processes are running:
----
[root@pcmk-1 ~]# ps axf
PID TTY STAT TIME COMMAND
2 ? S 0:00 [kthreadd]
...lots of processes...
11635 ? SLsl 0:03 corosync
11642 ? Ss 0:00 /usr/sbin/pacemakerd -f
11643 ? Ss 0:00 \_ /usr/libexec/pacemaker/cib
11644 ? Ss 0:00 \_ /usr/libexec/pacemaker/stonithd
11645 ? Ss 0:00 \_ /usr/libexec/pacemaker/lrmd
11646 ? Ss 0:00 \_ /usr/libexec/pacemaker/attrd
11647 ? Ss 0:00 \_ /usr/libexec/pacemaker/pengine
11648 ? Ss 0:00 \_ /usr/libexec/pacemaker/crmd
----
If that looks OK, check the `pcs status` output:
----
[root@pcmk-1 ~]# pcs status
Cluster name: mycluster
WARNING: no stonith devices and stonith-enabled is not false
Stack: corosync
Current DC: pcmk-2 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Mon Sep 10 16:37:34 2018
Last change: Mon Sep 10 16:30:53 2018 by hacluster via crmd on pcmk-2
2 nodes configured
0 resources configured
Online: [ pcmk-1 pcmk-2 ]
No resources
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
----
Finally, ensure there are no start-up errors from corosync or pacemaker (aside
from messages relating to not having STONITH configured, which are OK at this
point):
----
[root@pcmk-1 ~]# journalctl -b | grep -i error
----
[NOTE]
======
Other operating systems may report startup errors in other locations,
for example +/var/log/messages+.
======
Repeat these checks on the other node. The results should be the same.
+
+== Explore the Existing Configuration ==
+
+For those who are not afraid of XML, you can see the raw cluster
+configuration and status by using the `pcs cluster cib` command.
+
+.The last XML you'll see in this document
+======
+----
+[root@pcmk-1 ~]# pcs cluster cib
+----
+[source,XML]
+----
+<cib crm_feature_set="3.0.14" validate-with="pacemaker-2.10" epoch="5" num_updates="4" admin_epoch="0" cib-last-written="Mon Sep 10 16:30:53 2018" update-origin="pcmk-2" update-client="crmd" update-user="hacluster" have-quorum="1" dc-uuid="2">
+ <configuration>
+ <crm_config>
+ <cluster_property_set id="cib-bootstrap-options">
+ <nvpair id="cib-bootstrap-options-have-watchdog" name="have-watchdog" value="false"/>
+ <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.18-11.el7_5.3-2b07d5c5a9"/>
+ <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
+ <nvpair id="cib-bootstrap-options-cluster-name" name="cluster-name" value="mycluster"/>
+ </cluster_property_set>
+ </crm_config>
+ <nodes>
+ <node id="1" uname="pcmk-1"/>
+ <node id="2" uname="pcmk-2"/>
+ </nodes>
+ <resources/>
+ <constraints/>
+ </configuration>
+ <status>
+ <node_state id="1" uname="pcmk-1" in_ccm="true" crmd="online" crm-debug-origin="do_state_transition" join="member" expected="member">
+ <lrm id="1">
+ <lrm_resources/>
+ </lrm>
+ </node_state>
+ <node_state id="2" uname="pcmk-2" in_ccm="true" crmd="online" crm-debug-origin="do_state_transition" join="member" expected="member">
+ <lrm id="2">
+ <lrm_resources/>
+ </lrm>
+ </node_state>
+ </status>
+</cib>
+----
+======
+
+Before we make any changes, it's a good idea to check the validity of
+the configuration.
+
+----
+[root@pcmk-1 ~]# crm_verify -L -V
+ error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
+ error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
+ error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
+Errors found during check: config not valid
+----
+
+As you can see, the tool has found some errors. The cluster will not start any
+resources until we configure STONITH.
diff --git a/doc/Clusters_from_Scratch/en-US/Clusters_from_Scratch.xml b/doc/Clusters_from_Scratch/en-US/Clusters_from_Scratch.xml
index 7893d91adc..d69f167134 100644
--- a/doc/Clusters_from_Scratch/en-US/Clusters_from_Scratch.xml
+++ b/doc/Clusters_from_Scratch/en-US/Clusters_from_Scratch.xml
@@ -1,24 +1,24 @@
<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
<!ENTITY % BOOK_ENTITIES SYSTEM "Clusters_from_Scratch.ent">
%BOOK_ENTITIES;
]>
<book>
<xi:include href="Book_Info.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include href="Preface.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include href="Ch-Intro.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include href="Ch-Installation.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include href="Ch-Cluster-Setup.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include href="Ch-Verification.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
+ <xi:include href="Ch-Fencing.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include href="Ch-Active-Passive.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include href="Ch-Apache.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include href="Ch-Shared-Storage.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
- <xi:include href="Ch-Stonith.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include href="Ch-Active-Active.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include href="Ap-Configuration.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include href="Ap-Corosync-Conf.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include href="Ap-Reading.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include href="Revision_History.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
<index />
</book>
