
diff --git a/doc/Pacemaker_Explained/en-US/Book_Info.xml b/doc/Pacemaker_Explained/en-US/Book_Info.xml
index bce0089524..c189d07a6c 100644
--- a/doc/Pacemaker_Explained/en-US/Book_Info.xml
+++ b/doc/Pacemaker_Explained/en-US/Book_Info.xml
@@ -1,35 +1,35 @@
<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE bookinfo PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
]>
<bookinfo>
<title>Configuration Explained</title>
<subtitle>An A-Z guide to Pacemaker's Configuration Options</subtitle>
<productname>Pacemaker</productname>
<productnumber>1.1</productnumber>
<!--
EDITION-PUBSNUMBER should match REVNUMBER in Revision_History.xml.
Increment EDITION when the syntax of the documented software
changes (pacemaker), and PUBSNUMBER for
simple textual changes (corrections, translations, etc.).
-->
- <edition>6</edition>
+ <edition>7</edition>
<pubsnumber>0</pubsnumber>
<abstract>
<para>
The purpose of this document is to definitively explain the concepts used to configure Pacemaker.
To achieve this, it will focus exclusively on the XML syntax used to configure Pacemaker's
Cluster Information Base (CIB).
</para>
</abstract>
<corpauthor>
<inlinemediaobject>
<imageobject>
<imagedata fileref="Common_Content/images/title_logo.svg" format="SVG"/>
</imageobject>
</inlinemediaobject>
</corpauthor>
<xi:include href="Common_Content/Legal_Notice.xml" xmlns:xi="http://www.w3.org/2001/XInclude">
</xi:include>
<xi:include href="Author_Group.xml" xmlns:xi="http://www.w3.org/2001/XInclude">
</xi:include>
</bookinfo>
diff --git a/doc/Pacemaker_Explained/en-US/Ch-Options.txt b/doc/Pacemaker_Explained/en-US/Ch-Options.txt
index 0c1a2e7620..a2fbfe2473 100644
--- a/doc/Pacemaker_Explained/en-US/Ch-Options.txt
+++ b/doc/Pacemaker_Explained/en-US/Ch-Options.txt
@@ -1,404 +1,409 @@
= Cluster-Wide Configuration =
== CIB Properties ==
Certain settings are defined by CIB properties (that is, attributes of the
+cib+ tag) rather than with the rest of the cluster configuration in the
+configuration+ section.
The reason is simply a matter of parsing. These options are used by the
configuration database which is, by design, mostly ignorant of the content it
holds. So the decision was made to place them in an easy-to-find location.
.CIB Properties
[width="95%",cols="2m,5<",options="header",align="center"]
|=========================================================
|Field |Description
| admin_epoch |
indexterm:[Configuration Version,Cluster]
indexterm:[Cluster,Option,Configuration Version]
indexterm:[admin_epoch,Cluster Option]
indexterm:[Cluster,Option,admin_epoch]
When a node joins the cluster, the cluster performs a check to see
which node has the best configuration. It asks the node with the highest
(+admin_epoch+, +epoch+, +num_updates+) tuple to replace the configuration on
all the nodes -- which makes setting them, and setting them correctly, very
important. +admin_epoch+ is never modified by the cluster; you can use this
to make the configurations on any inactive nodes obsolete. _Never set this
value to zero_. In such cases, the cluster cannot tell the difference between
your configuration and the "empty" one used when nothing is found on disk.
| epoch |
indexterm:[epoch,Cluster Option]
indexterm:[Cluster,Option,epoch]
The cluster increments this every time the configuration is updated (usually by
the administrator).
| num_updates |
indexterm:[num_updates,Cluster Option]
indexterm:[Cluster,Option,num_updates]
The cluster increments this every time the configuration or status is updated
(usually by the cluster) and resets it to 0 when epoch changes.
| validate-with |
indexterm:[validate-with,Cluster Option]
indexterm:[Cluster,Option,validate-with]
Determines the type of XML validation that will be done on the configuration.
If set to +none+, the cluster will not verify that updates conform to the
DTD (nor reject ones that don't). This option can be useful when
operating a mixed-version cluster during an upgrade.
|cib-last-written |
indexterm:[cib-last-written,Cluster Property]
indexterm:[Cluster,Property,cib-last-written]
Indicates when the configuration was last written to disk. Maintained by the
cluster; for informational purposes only.
|have-quorum |
indexterm:[have-quorum,Cluster Property]
indexterm:[Cluster,Property,have-quorum]
Indicates if the cluster has quorum. If false, this may mean that the
cluster cannot start resources or fence other nodes (see
+no-quorum-policy+ below). Maintained by the cluster.
|dc-uuid |
indexterm:[dc-uuid,Cluster Property]
indexterm:[Cluster,Property,dc-uuid]
Indicates which cluster node is the current leader. Used by the
cluster when placing resources and determining the order of some
events. Maintained by the cluster.
|=========================================================
=== Working with CIB Properties ===
Although these fields can be written to by the user, in
most cases the cluster will overwrite any user-specified values
with the "correct" ones.
To change one of the properties that the user may legitimately specify,
for example +admin_epoch+, one should use:
----
# cibadmin --modify --xml-text '<cib admin_epoch="42"/>'
----
A complete set of CIB properties will look something like this:
.Attributes set for a cib object
======
[source,XML]
-------
<cib crm_feature_set="3.0.7" validate-with="pacemaker-1.2"
admin_epoch="42" epoch="116" num_updates="1"
cib-last-written="Mon Jan 12 15:46:39 2015" update-origin="rhel7-1"
update-client="crm_attribute" have-quorum="1" dc-uuid="1">
-------
======
== Cluster Options ==
Cluster options, as you might expect, control how the cluster behaves
when confronted with certain situations.
They are grouped into sets within the +crm_config+ section, and, in advanced
configurations, there may be more than one set. (This will be described later
in the section on <<ch-rules>> where we will show how to have the cluster use
different sets of options during working hours than during weekends.) For now,
we will describe the simple case where each option is present at most once.
You can obtain an up-to-date list of cluster options, including
their default values, by running the `man pengine` and `man crmd` commands.
.Cluster Options
[width="95%",cols="5m,2,11<a",options="header",align="center"]
|=========================================================
|Option |Default |Description
| dc-version | |
indexterm:[dc-version,Cluster Property]
indexterm:[Cluster,Property,dc-version]
Version of Pacemaker on the cluster's DC.
Determined automatically by the cluster.
Often includes the hash which identifies the exact Git changeset it was built
from. Used for diagnostic purposes.
| cluster-infrastructure | |
indexterm:[cluster-infrastructure,Cluster Property]
indexterm:[Cluster,Property,cluster-infrastructure]
The messaging stack on which Pacemaker is currently running.
Determined automatically by the cluster.
Used for informational and diagnostic purposes.
| expected-quorum-votes | |
indexterm:[expected-quorum-votes,Cluster Property]
indexterm:[Cluster,Property,expected-quorum-votes]
The number of nodes expected to be in the cluster.
Determined automatically by the cluster.
Used to calculate quorum in clusters that use Corosync 1.x without CMAN
as the messaging layer.
| no-quorum-policy | stop |
indexterm:[no-quorum-policy,Cluster Option]
indexterm:[Cluster,Option,no-quorum-policy]
What to do when the cluster does not have quorum. Allowed values:
* +ignore:+ continue all resource management
* +freeze:+ continue resource management, but don't recover resources from nodes not in the affected partition
* +stop:+ stop all resources in the affected cluster partition
* +suicide:+ fence all nodes in the affected cluster partition
| batch-limit | 30 |
indexterm:[batch-limit,Cluster Option]
indexterm:[Cluster,Option,batch-limit]
The number of jobs that the Transition Engine (TE) is allowed to execute in
parallel. The TE is the logic in pacemaker's CRMd that executes the actions
determined by the Policy Engine (PE). The "correct" value will depend on the
speed and load of your network and cluster nodes.
| migration-limit | -1 |
indexterm:[migration-limit,Cluster Option]
indexterm:[Cluster,Option,migration-limit]
The number of migration jobs that the TE is allowed to execute in
parallel on a node. A value of -1 means unlimited.
| symmetric-cluster | TRUE |
indexterm:[symmetric-cluster,Cluster Option]
indexterm:[Cluster,Option,symmetric-cluster]
Can all resources run on any node by default?
| stop-all-resources | FALSE |
indexterm:[stop-all-resources,Cluster Option]
indexterm:[Cluster,Option,stop-all-resources]
Should the cluster stop all resources?
| stop-orphan-resources | TRUE |
indexterm:[stop-orphan-resources,Cluster Option]
indexterm:[Cluster,Option,stop-orphan-resources]
Should deleted resources be stopped?
| stop-orphan-actions | TRUE |
indexterm:[stop-orphan-actions,Cluster Option]
indexterm:[Cluster,Option,stop-orphan-actions]
Should deleted actions be cancelled?
| start-failure-is-fatal | TRUE |
indexterm:[start-failure-is-fatal,Cluster Option]
indexterm:[Cluster,Option,start-failure-is-fatal]
Should a failure to start a resource on a particular node prevent further start
attempts on that node? If FALSE, the cluster will decide whether to try
starting on the same node again based on the resource's current failure count
and +migration-threshold+ (see <<s-failure-migration>>).
| enable-startup-probes | TRUE |
indexterm:[enable-startup-probes,Cluster Option]
indexterm:[Cluster,Option,enable-startup-probes]
Should the cluster check for active resources during startup?
| maintenance-mode | FALSE |
indexterm:[maintenance-mode,Cluster Option]
indexterm:[Cluster,Option,maintenance-mode]
Should the cluster refrain from monitoring, starting and stopping resources?
| stonith-enabled | TRUE |
indexterm:[stonith-enabled,Cluster Option]
indexterm:[Cluster,Option,stonith-enabled]
Should failed nodes and nodes with resources that can't be stopped be
shot? If you value your data, set up a STONITH device and enable this.
If true, or unset, the cluster will refuse to start resources unless
one or more STONITH resources have been configured.
If false, unresponsive nodes are immediately assumed to be running no
resources, and resource takeover to online nodes starts without any
further protection (which means _data loss_ if the unresponsive node
still accesses shared storage, for example). See also the +requires+
meta-attribute in <<s-resource-options>>.
| stonith-action | reboot |
indexterm:[stonith-action,Cluster Option]
indexterm:[Cluster,Option,stonith-action]
Action to send to STONITH device. Allowed values are +reboot+ and +off+.
The value +poweroff+ is also allowed, but is only used for
legacy devices.
| stonith-timeout | 60s |
indexterm:[stonith-timeout,Cluster Option]
indexterm:[Cluster,Option,stonith-timeout]
How long to wait for STONITH actions (reboot, on, off) to complete
+| concurrent-fencing | FALSE |
+indexterm:[concurrent-fencing,Cluster Option]
+indexterm:[Cluster,Option,concurrent-fencing]
+Is the cluster allowed to initiate multiple fence actions concurrently?
+
| cluster-delay | 60s |
indexterm:[cluster-delay,Cluster Option]
indexterm:[Cluster,Option,cluster-delay]
Estimated maximum round-trip delay over the network (excluding action
execution). If the TE requires an action to be executed on another node,
it will consider the action failed if it does not get a response
from the other node in this time (after considering the action's
own timeout). The "correct" value will depend on the speed and load of your
network and cluster nodes.
| dc-deadtime | 20s |
indexterm:[dc-deadtime,Cluster Option]
indexterm:[Cluster,Option,dc-deadtime]
How long to wait for a response from other nodes during startup.
The "correct" value will depend on the speed/load of your network and the type of switches used.
| cluster-recheck-interval | 15min |
indexterm:[cluster-recheck-interval,Cluster Option]
indexterm:[Cluster,Option,cluster-recheck-interval]
Polling interval for time-based changes to options, resource parameters and constraints.
The cluster is primarily event-driven, but your configuration can have
elements that take effect based on the time of day. To ensure these changes
take effect, we can optionally poll the cluster's status for changes. A value
of 0 disables polling. Positive values are an interval (in seconds unless other
SI units are specified, e.g. 5min).
| pe-error-series-max | -1 |
indexterm:[pe-error-series-max,Cluster Option]
indexterm:[Cluster,Option,pe-error-series-max]
The number of PE inputs resulting in ERRORs to save. Used when reporting problems.
A value of -1 means unlimited (report all).
| pe-warn-series-max | -1 |
indexterm:[pe-warn-series-max,Cluster Option]
indexterm:[Cluster,Option,pe-warn-series-max]
The number of PE inputs resulting in WARNINGs to save. Used when reporting problems.
A value of -1 means unlimited (report all).
| pe-input-series-max | -1 |
indexterm:[pe-input-series-max,Cluster Option]
indexterm:[Cluster,Option,pe-input-series-max]
The number of "normal" PE inputs to save. Used when reporting problems.
A value of -1 means unlimited (report all).
| remove-after-stop | FALSE |
indexterm:[remove-after-stop,Cluster Option]
indexterm:[Cluster,Option,remove-after-stop]
_Advanced Use Only:_ Should the cluster remove resources from the LRM after
they are stopped? Values other than the default are, at best, poorly tested and
potentially dangerous.
| startup-fencing | TRUE |
indexterm:[startup-fencing,Cluster Option]
indexterm:[Cluster,Option,startup-fencing]
_Advanced Use Only:_ Should the cluster shoot unseen nodes?
Not using the default is very unsafe!
| election-timeout | 2min |
indexterm:[election-timeout,Cluster Option]
indexterm:[Cluster,Option,election-timeout]
_Advanced Use Only:_ If you need to adjust this value, it probably indicates
the presence of a bug.
| shutdown-escalation | 20min |
indexterm:[shutdown-escalation,Cluster Option]
indexterm:[Cluster,Option,shutdown-escalation]
_Advanced Use Only:_ If you need to adjust this value, it probably indicates
the presence of a bug.
| crmd-integration-timeout | 3min |
indexterm:[crmd-integration-timeout,Cluster Option]
indexterm:[Cluster,Option,crmd-integration-timeout]
_Advanced Use Only:_ If you need to adjust this value, it probably indicates
the presence of a bug.
| crmd-finalization-timeout | 30min |
indexterm:[crmd-finalization-timeout,Cluster Option]
indexterm:[Cluster,Option,crmd-finalization-timeout]
_Advanced Use Only:_ If you need to adjust this value, it probably indicates
the presence of a bug.
| crmd-transition-delay | 0s |
indexterm:[crmd-transition-delay,Cluster Option]
indexterm:[Cluster,Option,crmd-transition-delay]
_Advanced Use Only:_ Delay cluster recovery for the configured interval to
allow for additional/related events to occur. Useful if your configuration is
sensitive to the order in which ping updates arrive.
Enabling this option will slow down cluster recovery under
all conditions.
|default-resource-stickiness | 0 |
indexterm:[default-resource-stickiness,Cluster Option]
indexterm:[Cluster,Option,default-resource-stickiness]
_Deprecated:_ See <<s-resource-defaults>> instead
| is-managed-default | TRUE |
indexterm:[is-managed-default,Cluster Option]
indexterm:[Cluster,Option,is-managed-default]
_Deprecated:_ See <<s-resource-defaults>> instead
| default-action-timeout | 20s |
indexterm:[default-action-timeout,Cluster Option]
indexterm:[Cluster,Option,default-action-timeout]
_Deprecated:_ See <<s-operation-defaults>> instead
|=========================================================
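Cluster options are stored as +nvpair+ elements inside a
+cluster_property_set+ within the +crm_config+ section. A minimal sketch
(the +id+ values are illustrative) might look like:

.Cluster options set in the +crm_config+ section
======
[source,XML]
-------
<crm_config>
  <cluster_property_set id="cib-bootstrap-options">
    <nvpair id="cib-bootstrap-options-no-quorum-policy"
            name="no-quorum-policy" value="stop"/>
    <nvpair id="cib-bootstrap-options-stonith-enabled"
            name="stonith-enabled" value="true"/>
  </cluster_property_set>
</crm_config>
-------
======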
=== Querying and Setting Cluster Options ===
indexterm:[Querying,Cluster Option]
indexterm:[Setting,Cluster Option]
indexterm:[Cluster,Querying Options]
indexterm:[Cluster,Setting Options]
Cluster options can be queried and modified using the `crm_attribute` tool. To
get the current value of +cluster-delay+, you can run:
----
# crm_attribute --query --name cluster-delay
----
which is more simply written as
----
# crm_attribute -G -n cluster-delay
----
If a value is found, you'll see a result like this:
----
# crm_attribute -G -n cluster-delay
scope=crm_config name=cluster-delay value=60s
----
If no value is found, the tool will display an error:
----
# crm_attribute -G -n clusta-deway
scope=crm_config name=clusta-deway value=(null)
Error performing operation: No such device or address
----
To use a different value (for example, 30 seconds), simply run:
----
# crm_attribute --name cluster-delay --update 30s
----
To go back to the cluster's default value, you can delete the value, for example:
----
# crm_attribute --name cluster-delay --delete
Deleted crm_config option: id=cib-bootstrap-options-cluster-delay name=cluster-delay
----
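Under the hood, `crm_attribute` creates, updates or deletes an +nvpair+ in
the +crm_config+ section. For example, the update above results in an entry
like the following sketch (the +id+ follows the convention visible in the
delete output):

[source,XML]
----
<nvpair id="cib-bootstrap-options-cluster-delay" name="cluster-delay" value="30s"/>
----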
=== When Options are Listed More Than Once ===
If you ever see something like the following, it means that the option you're modifying is present more than once.
.Deleting an option that is listed twice
=======
------
# crm_attribute --name batch-limit --delete
Multiple attributes match name=batch-limit in crm_config:
Value: 50 (set=cib-bootstrap-options, id=cib-bootstrap-options-batch-limit)
Value: 100 (set=custom, id=custom-batch-limit)
Please choose from one of the matches above and supply the 'id' with --id
------
=======
In such cases, follow the on-screen instructions to perform the
requested action. To determine which value is currently being used by
the cluster, refer to <<ch-rules>>.
diff --git a/doc/Pacemaker_Explained/en-US/Ch-Stonith.txt b/doc/Pacemaker_Explained/en-US/Ch-Stonith.txt
index a5bcf0dcfa..d2880e0843 100644
--- a/doc/Pacemaker_Explained/en-US/Ch-Stonith.txt
+++ b/doc/Pacemaker_Explained/en-US/Ch-Stonith.txt
@@ -1,892 +1,901 @@
= STONITH =
////
We prefer [[ch-stonith]], but older versions of asciidoc don't deal well
with that construct for chapter headings
////
anchor:ch-stonith[Chapter 13, STONITH]
indexterm:[STONITH, Configuration]
== What Is STONITH? ==
STONITH (an acronym for "Shoot The Other Node In The Head"), also called
'fencing', protects your data from being corrupted by rogue nodes or concurrent
access.
Just because a node is unresponsive doesn't mean it has stopped
accessing your data. The only way to be 100% sure that your data is
safe is to use STONITH, so we can be certain that the node is truly
offline before allowing the data to be accessed from another node.
STONITH also has a role to play in the event that a clustered service
cannot be stopped. In this case, the cluster uses STONITH to force the
whole node offline, thereby making it safe to start the service
elsewhere.
== What STONITH Device Should You Use? ==
It is crucial that the STONITH device allows the cluster to
differentiate between a node failure and a network failure.
The biggest mistake people make in choosing a STONITH device is to
use a remote power switch (such as many on-board IPMI controllers) that
shares power with the node it controls. In such cases, the cluster
cannot be sure if the node is really offline, or active and suffering
from a network fault.
Likewise, any device that relies on the machine being active (such as
SSH-based "devices" used during testing) is inappropriate.
== Special Treatment of STONITH Resources ==
STONITH resources are somewhat special in Pacemaker.
STONITH may be initiated by pacemaker or by other parts of the cluster
(such as resources like DRBD or DLM). To accommodate this, pacemaker
does not require the STONITH resource to be in the 'started' state
in order to be used, thus allowing reliable use of STONITH devices in such a
case.
[NOTE]
====
In pacemaker versions 1.1.9 and earlier, this feature either did not exist or
did not work well. Only "running" STONITH resources could be used by Pacemaker
for fencing, and if another component tried to fence a node while Pacemaker was
moving STONITH resources, the fencing could fail.
====
All nodes have access to STONITH devices' definitions and instantiate them
on-the-fly when needed, but preference is given to 'verified' instances, which
are the ones that are 'started' according to the cluster's knowledge.
In the case of a cluster split, the partition with a verified instance
will have a slight advantage, because the STONITH daemon in the other partition
will have to hear from all its current peers before choosing a node to
perform the fencing.
Fencing resources work the same as regular resources in some respects:
* +target-role+ can be used to enable or disable the resource
* Location constraints can be used to prevent a specific node from using the resource
[IMPORTANT]
===========
Currently there is a limitation that fencing resources may only have
one set of meta-attributes and one set of instance attributes. This
can be revisited if it becomes a significant limitation for people.
===========
See the table below or run `man stonithd` to see special instance attributes
that may be set for any fencing resource, regardless of fence agent.
.Properties of Fencing Resources
[width="95%",cols="5m,2,3,10<a",options="header",align="center"]
|=========================================================
|Field
|Type
|Default
|Description
|stonith-timeout
|NA
|NA
|Older versions used this to override the default period to wait for a STONITH (reboot, on, off) action to complete for this device.
It has been replaced by the +pcmk_reboot_timeout+ and +pcmk_off_timeout+ properties.
indexterm:[stonith-timeout,Fencing]
indexterm:[Fencing,Property,stonith-timeout]
|priority
|integer
|0
|The priority of the STONITH resource. Devices are tried in order of highest priority to lowest.
indexterm:[priority,Fencing]
indexterm:[Fencing,Property,priority]
|pcmk_host_map
|string
|
|A mapping of host names to port numbers for devices that do not support host names.
Example: +node1:1;node2:2,3+ tells the cluster to use port 1 for
*node1* and ports 2 and 3 for *node2*.
indexterm:[pcmk_host_map,Fencing]
indexterm:[Fencing,Property,pcmk_host_map]
|pcmk_host_list
|string
|
|A list of machines controlled by this device (optional unless
+pcmk_host_check+ is +static-list+).
indexterm:[pcmk_host_list,Fencing]
indexterm:[Fencing,Property,pcmk_host_list]
|pcmk_host_check
|string
|dynamic-list
|How to determine which machines are controlled by the device.
Allowed values:
* +dynamic-list:+ query the device
* +static-list:+ check the +pcmk_host_list+ attribute
* +none:+ assume every device can fence every machine
indexterm:[pcmk_host_check,Fencing]
indexterm:[Fencing,Property,pcmk_host_check]
|pcmk_delay_max
|time
|0s
|Enable a random delay of up to the time specified before executing stonith
actions. This is sometimes used in two-node clusters to ensure that the
nodes don't fence each other at the same time.
indexterm:[pcmk_delay_max,Fencing]
indexterm:[Fencing,Property,pcmk_delay_max]
+|pcmk_action_limit
+|integer
+|1
+|The maximum number of actions that can be performed in parallel on this
+ device, if the cluster option +concurrent-fencing+ is +true+. -1 is unlimited.
+
+indexterm:[pcmk_action_limit,Fencing]
+indexterm:[Fencing,Property,pcmk_action_limit]
+
|pcmk_host_argument
|string
|port
|'Advanced use only.' Which parameter should be supplied to the resource agent
to identify the node to be fenced. Some devices do not support the standard
+port+ parameter or may provide additional ones. Use this to specify an
alternate, device-specific parameter. A value of +none+ tells the
cluster not to supply any additional parameters.
indexterm:[pcmk_host_argument,Fencing]
indexterm:[Fencing,Property,pcmk_host_argument]
|pcmk_reboot_action
|string
|reboot
|'Advanced use only.' The command to send to the resource agent in order to
reboot a node. Some devices do not support the standard commands or may provide
additional ones. Use this to specify an alternate, device-specific command.
indexterm:[pcmk_reboot_action,Fencing]
indexterm:[Fencing,Property,pcmk_reboot_action]
|pcmk_reboot_timeout
|time
|60s
|'Advanced use only.' Specify an alternate timeout to use for `reboot` actions
instead of the value of +stonith-timeout+. Some devices need much more or less
time to complete than normal. Use this to specify an alternate, device-specific
timeout.
indexterm:[pcmk_reboot_timeout,Fencing]
indexterm:[Fencing,Property,pcmk_reboot_timeout]
indexterm:[stonith-timeout,Fencing]
indexterm:[Fencing,Property,stonith-timeout]
|pcmk_reboot_retries
|integer
|2
|'Advanced use only.' The maximum number of times to retry the `reboot` command
within the timeout period. Some devices do not support multiple connections, and
operations may fail if the device is busy with another task, so Pacemaker will
automatically retry the operation, if there is time remaining. Use this option
to alter the number of times Pacemaker retries before giving up.
indexterm:[pcmk_reboot_retries,Fencing]
indexterm:[Fencing,Property,pcmk_reboot_retries]
|pcmk_off_action
|string
|off
|'Advanced use only.' The command to send to the resource agent in order to
shut down a node. Some devices do not support the standard commands or may provide
additional ones. Use this to specify an alternate, device-specific command.
indexterm:[pcmk_off_action,Fencing]
indexterm:[Fencing,Property,pcmk_off_action]
|pcmk_off_timeout
|time
|60s
|'Advanced use only.' Specify an alternate timeout to use for `off` actions
instead of the value of +stonith-timeout+. Some devices need much more or less
time to complete than normal. Use this to specify an alternate, device-specific
timeout.
indexterm:[pcmk_off_timeout,Fencing]
indexterm:[Fencing,Property,pcmk_off_timeout]
indexterm:[stonith-timeout,Fencing]
indexterm:[Fencing,Property,stonith-timeout]
|pcmk_off_retries
|integer
|2
|'Advanced use only.' The maximum number of times to retry the `off` command
within the timeout period. Some devices do not support multiple connections, and
operations may fail if the device is busy with another task, so Pacemaker will
automatically retry the operation, if there is time remaining. Use this option
to alter the number of times Pacemaker retries before giving up.
indexterm:[pcmk_off_retries,Fencing]
indexterm:[Fencing,Property,pcmk_off_retries]
|pcmk_list_action
|string
|list
|'Advanced use only.' The command to send to the resource agent in order to
list nodes. Some devices do not support the standard commands or may provide
additional ones. Use this to specify an alternate, device-specific command.
indexterm:[pcmk_list_action,Fencing]
indexterm:[Fencing,Property,pcmk_list_action]
|pcmk_list_timeout
|time
|60s
|'Advanced use only.' Specify an alternate timeout to use for `list` actions
instead of the value of +stonith-timeout+. Some devices need much more or less
time to complete than normal. Use this to specify an alternate, device-specific
timeout.
indexterm:[pcmk_list_timeout,Fencing]
indexterm:[Fencing,Property,pcmk_list_timeout]
|pcmk_list_retries
|integer
|2
|'Advanced use only.' The maximum number of times to retry the `list` command
within the timeout period. Some devices do not support multiple connections, and
operations may fail if the device is busy with another task, so Pacemaker will
automatically retry the operation, if there is time remaining. Use this option
to alter the number of times Pacemaker retries before giving up.
indexterm:[pcmk_list_retries,Fencing]
indexterm:[Fencing,Property,pcmk_list_retries]
|pcmk_monitor_action
|string
|monitor
|'Advanced use only.' The command to send to the resource agent in order to
report extended status. Some devices do not support the standard commands or may provide
additional ones. Use this to specify an alternate, device-specific command.
indexterm:[pcmk_monitor_action,Fencing]
indexterm:[Fencing,Property,pcmk_monitor_action]
|pcmk_monitor_timeout
|time
|60s
|'Advanced use only.' Specify an alternate timeout to use for `monitor` actions
instead of the value of +stonith-timeout+. Some devices need much more or less
time to complete than normal. Use this to specify an alternate, device-specific
timeout.
indexterm:[pcmk_monitor_timeout,Fencing]
indexterm:[Fencing,Property,pcmk_monitor_timeout]
|pcmk_monitor_retries
|integer
|2
|'Advanced use only.' The maximum number of times to retry the `monitor` command
within the timeout period. Some devices do not support multiple connections, and
operations may fail if the device is busy with another task, so Pacemaker will
automatically retry the operation, if there is time remaining. Use this option
to alter the number of times Pacemaker retries before giving up.
indexterm:[pcmk_monitor_retries,Fencing]
indexterm:[Fencing,Property,pcmk_monitor_retries]
|pcmk_status_action
|string
|status
|'Advanced use only.' The command to send to the resource agent in order to
report status. Some devices do not support the standard commands or may provide
additional ones. Use this to specify an alternate, device-specific command.
indexterm:[pcmk_status_action,Fencing]
indexterm:[Fencing,Property,pcmk_status_action]
|pcmk_status_timeout
|time
|60s
|'Advanced use only.' Specify an alternate timeout to use for `status` actions
instead of the value of +stonith-timeout+. Some devices need much more or less
time to complete than normal. Use this to specify an alternate, device-specific
timeout.
indexterm:[pcmk_status_timeout,Fencing]
indexterm:[Fencing,Property,pcmk_status_timeout]
|pcmk_status_retries
|integer
|2
|'Advanced use only.' The maximum number of times to retry the `status` command
within the timeout period. Some devices do not support multiple connections, and
operations may fail if the device is busy with another task, so Pacemaker will
automatically retry the operation, if there is time remaining. Use this option
to alter the number of times Pacemaker retries before giving up.
indexterm:[pcmk_status_retries,Fencing]
indexterm:[Fencing,Property,pcmk_status_retries]
|=========================================================
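As an illustration of how these properties combine, a fencing resource for
an IPMI-controlled device might be configured as in the following sketch
(the resource +id+, node names and values are examples only):

.A fencing resource using special instance attributes
======
[source,XML]
-------
<primitive id="Fencing" class="stonith" type="fence_ipmilan">
  <instance_attributes id="Fencing-params">
    <nvpair id="Fencing-ipaddr" name="ipaddr" value="192.0.2.1"/>
    <nvpair id="Fencing-pcmk_host_list" name="pcmk_host_list" value="node1 node2"/>
    <nvpair id="Fencing-pcmk_host_check" name="pcmk_host_check" value="static-list"/>
    <nvpair id="Fencing-pcmk_delay_max" name="pcmk_delay_max" value="5s"/>
  </instance_attributes>
</primitive>
-------
======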
== Configuring STONITH ==
[NOTE]
===========
Higher-level configuration shells include functionality to simplify the
process below, particularly the step for deciding which parameters are
required. However, since this document deals only with core
components, you should refer to the STONITH chapter of the
http://www.clusterlabs.org/doc/[Clusters from Scratch] guide for those details.
===========
. Find the correct driver:
+
----
# stonith_admin --list-installed
----
. Find the required parameters associated with the device
(replacing $AGENT_NAME with the name obtained from the previous step):
+
----
# stonith_admin --metadata --agent $AGENT_NAME
----
. Create a file called +stonith.xml+ containing a primitive resource
with a class of +stonith+, a type equal to the agent name obtained earlier,
and a parameter for each of the values returned in the previous step.
. If the device does not know how to fence nodes based on their uname,
you may also need to set the special +pcmk_host_map+ parameter. See
`man stonithd` for details.
. If the device does not support the `list` command, you may also need
to set the special +pcmk_host_list+ and/or +pcmk_host_check+
parameters. See `man stonithd` for details.
. If the device does not expect the victim to be specified with the
`port` parameter, you may also need to set the special
+pcmk_host_argument+ parameter. See `man stonithd` for details.
. Upload it into the CIB using cibadmin:
+
----
# cibadmin -C -o resources --xml-file stonith.xml
----
. Set +stonith-enabled+ to true:
+
----
# crm_attribute -t crm_config -n stonith-enabled -v true
----
. Once the stonith resource is running, you can test it by executing the
following (although you might want to stop the cluster on that machine
first):
+
----
# stonith_admin --reboot nodename
----
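To make step 3 concrete, a minimal +stonith.xml+ might look like the
following sketch (the agent name and parameter values are illustrative;
use the parameter names reported by the metadata for your device):

[source,XML]
----
<primitive id="my-fencing" class="stonith" type="fence_ipmilan">
  <instance_attributes id="my-fencing-params">
    <nvpair id="my-fencing-ipaddr" name="ipaddr" value="192.0.2.1"/>
    <nvpair id="my-fencing-login" name="login" value="testuser"/>
    <nvpair id="my-fencing-passwd" name="passwd" value="abc123"/>
  </instance_attributes>
</primitive>
----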
=== Example STONITH Configuration ===
Assume we have a chassis containing four nodes and an IPMI device
active on 192.0.2.1. We would choose the `fence_ipmilan` driver,
and obtain the following list of parameters:
.Obtaining a list of STONITH Parameters
====
----
# stonith_admin --metadata -a fence_ipmilan
----
[source,XML]
----
<resource-agent name="fence_ipmilan" shortdesc="Fence agent for IPMI over LAN">
<symlink name="fence_ilo3" shortdesc="Fence agent for HP iLO3"/>
<symlink name="fence_ilo4" shortdesc="Fence agent for HP iLO4"/>
<symlink name="fence_idrac" shortdesc="Fence agent for Dell iDRAC"/>
<symlink name="fence_imm" shortdesc="Fence agent for IBM Integrated Management Module"/>
<longdesc>
</longdesc>
<vendor-url>
</vendor-url>
<parameters>
<parameter name="auth" unique="0" required="0">
<getopt mixed="-A"/>
<content type="string"/>
<shortdesc lang="en">
</shortdesc>
</parameter>
<parameter name="ipaddr" unique="0" required="1">
<getopt mixed="-a"/>
<content type="string"/>
<shortdesc lang="en">
</shortdesc>
</parameter>
<parameter name="passwd" unique="0" required="0">
<getopt mixed="-p"/>
<content type="string"/>
<shortdesc lang="en">
</shortdesc>
</parameter>
<parameter name="passwd_script" unique="0" required="0">
<getopt mixed="-S"/>
<content type="string"/>
<shortdesc lang="en">
</shortdesc>
</parameter>
<parameter name="lanplus" unique="0" required="0">
<getopt mixed="-P"/>
<content type="boolean"/>
<shortdesc lang="en">
</shortdesc>
</parameter>
<parameter name="login" unique="0" required="0">
<getopt mixed="-l"/>
<content type="string"/>
<shortdesc lang="en">
</shortdesc>
</parameter>
<parameter name="action" unique="0" required="0">
<getopt mixed="-o"/>
<content type="string" default="reboot"/>
<shortdesc lang="en">
</shortdesc>
</parameter>
<parameter name="timeout" unique="0" required="0">
<getopt mixed="-t"/>
<content type="string"/>
<shortdesc lang="en">
</shortdesc>
</parameter>
<parameter name="cipher" unique="0" required="0">
<getopt mixed="-C"/>
<content type="string"/>
<shortdesc lang="en">
</shortdesc>
</parameter>
<parameter name="method" unique="0" required="0">
<getopt mixed="-M"/>
<content type="string" default="onoff"/>
<shortdesc lang="en">
</shortdesc>
</parameter>
<parameter name="power_wait" unique="0" required="0">
<getopt mixed="-T"/>
<content type="string" default="2"/>
<shortdesc lang="en">
</shortdesc>
</parameter>
<parameter name="delay" unique="0" required="0">
<getopt mixed="-f"/>
<content type="string"/>
<shortdesc lang="en">
</shortdesc>
</parameter>
<parameter name="privlvl" unique="0" required="0">
<getopt mixed="-L"/>
<content type="string"/>
<shortdesc lang="en">
</shortdesc>
</parameter>
<parameter name="verbose" unique="0" required="0">
<getopt mixed="-v"/>
<content type="boolean"/>
<shortdesc lang="en">
</shortdesc>
</parameter>
</parameters>
<actions>
<action name="on"/>
<action name="off"/>
<action name="reboot"/>
<action name="status"/>
<action name="diag"/>
<action name="list"/>
<action name="monitor"/>
<action name="metadata"/>
<action name="stop" timeout="20s"/>
<action name="start" timeout="20s"/>
</actions>
</resource-agent>
----
====
Based on that, we would create a STONITH resource fragment that might look
like this:
.An IPMI-based STONITH Resource
====
[source,XML]
----
<primitive id="Fencing" class="stonith" type="fence_ipmilan" >
<instance_attributes id="Fencing-params" >
<nvpair id="Fencing-passwd" name="passwd" value="abc123" />
<nvpair id="Fencing-login" name="login" value="testuser" />
<nvpair id="Fencing-ipaddr" name="ipaddr" value="192.0.2.1" />
<nvpair id="Fencing-pcmk_host_list" name="pcmk_host_list" value="pcmk-1 pcmk-2" />
</instance_attributes>
<operations >
<op id="Fencing-monitor-10m" interval="10m" name="monitor" timeout="300s" />
</operations>
</primitive>
----
====
Finally, we need to enable STONITH:
----
# crm_attribute -t crm_config -n stonith-enabled -v true
----
== Advanced STONITH Configurations ==
Some people consider that having one fencing device is a single point
of failure footnote:[Not true, since a node or resource must fail
before fencing even has a chance to]; others prefer removing the node
from the storage and network instead of turning it off.
Whatever the reason, Pacemaker supports fencing nodes with multiple
devices through a feature called 'fencing topologies'.
Simply create the individual devices as you normally would, then
define one or more +fencing-level+ entries in the +fencing-topology+ section of
the configuration.
* Each fencing level is attempted in order of ascending +index+. Allowed
indexes are 0 to 9.
* If a device fails, processing terminates for the current level.
No further devices in that level are exercised, and the next level is attempted instead.
* If the operation succeeds for all the listed devices in a level, the level is deemed to have passed.
* The operation is finished when a level has passed (success), or all levels have been attempted (failed).
* If the operation failed, the next step is determined by the Policy Engine and/or `crmd`.
Some possible uses of topologies include:
* Try poison-pill and fail back to power
* Try disk and network, and fall back to power if either fails
* Initiate a kdump and then power off the node
.Properties of Fencing Levels
[width="95%",cols="1m,3<",options="header",align="center"]
|=========================================================
|Field
|Description
|id
|A unique name for the level
indexterm:[id,fencing-level]
indexterm:[Fencing,fencing-level,id]
|target
|The name of a single node to which this level applies
indexterm:[target,fencing-level]
indexterm:[Fencing,fencing-level,target]
|target-pattern
|A regular expression matching the names of nodes to which this level applies
'(since 1.1.14)'
indexterm:[target-pattern,fencing-level]
indexterm:[Fencing,fencing-level,target-pattern]
|target-attribute
|The name of a node attribute that is set for nodes to which this level applies
'(since 1.1.14)'
indexterm:[target-attribute,fencing-level]
indexterm:[Fencing,fencing-level,target-attribute]
|index
|The order in which to attempt the levels.
Levels are attempted in ascending order 'until one succeeds'.
indexterm:[index,fencing-level]
indexterm:[Fencing,fencing-level,index]
|devices
|A comma-separated list of devices that must all be tried for this level
indexterm:[devices,fencing-level]
indexterm:[Fencing,fencing-level,devices]
|=========================================================
.Fencing topology with different devices for different nodes
====
[source,XML]
----
<cib crm_feature_set="3.0.6" validate-with="pacemaker-1.2" admin_epoch="1" epoch="0" num_updates="0">
<configuration>
...
<fencing-topology>
<!-- For pcmk-1, try poison-pill and fail back to power -->
<fencing-level id="f-p1.1" target="pcmk-1" index="1" devices="poison-pill"/>
<fencing-level id="f-p1.2" target="pcmk-1" index="2" devices="power"/>
<!-- For pcmk-2, try disk and network, and fail back to power -->
<fencing-level id="f-p2.1" target="pcmk-2" index="1" devices="disk,network"/>
<fencing-level id="f-p2.2" target="pcmk-2" index="2" devices="power"/>
</fencing-topology>
...
</configuration>
<status/>
</cib>
----
====
=== Example Dual-Layer, Dual-Device Fencing Topologies ===
The following example illustrates an advanced use of +fencing-topology+ in a cluster with the following properties:
* 3 nodes (2 active prod-mysql nodes, and 1 prod-mysql-rep1 node in standby for quorum purposes)
* the active nodes have an IPMI-controlled power board reached at 192.0.2.1 and 192.0.2.2
* the active nodes also have two independent PSUs (Power Supply Units)
connected to two independent PDUs (Power Distribution Units) reached at
198.51.100.1 (port 10 and port 11) and 203.0.113.1 (port 10 and port 11)
* the first fencing method uses the `fence_ipmi` agent
* the second fencing method uses the `fence_apc_snmp` agent, targeting two fencing devices (one per PSU, either port 10 or port 11)
* fencing is only implemented for the active nodes and has location constraints
* the fencing topology is set to try IPMI fencing first, then fall back to a "sure-kill" dual-PDU fencing
In a normal failure scenario, STONITH will first select +fence_ipmi+ to try to kill the faulty node.
Using a fencing topology, if that first method fails, STONITH will then move on to selecting +fence_apc_snmp+ twice:
* once for the first PDU
* again for the second PDU
The fence action is considered successful only if both PDUs report the required status. If either fails, STONITH loops back to the first fencing method, +fence_ipmi+, and so on, until the node is fenced or the fencing action is cancelled.
.First fencing method: single IPMI device
Each cluster node has its own dedicated IPMI channel that can be called for fencing using the following primitives:
[source,XML]
----
<primitive class="stonith" id="fence_prod-mysql1_ipmi" type="fence_ipmilan">
<instance_attributes id="fence_prod-mysql1_ipmi-instance_attributes">
<nvpair id="fence_prod-mysql1_ipmi-instance_attributes-ipaddr" name="ipaddr" value="192.0.2.1"/>
<nvpair id="fence_prod-mysql1_ipmi-instance_attributes-action" name="action" value="off"/>
<nvpair id="fence_prod-mysql1_ipmi-instance_attributes-login" name="login" value="fencing"/>
<nvpair id="fence_prod-mysql1_ipmi-instance_attributes-passwd" name="passwd" value="finishme"/>
<nvpair id="fence_prod-mysql1_ipmi-instance_attributes-verbose" name="verbose" value="true"/>
<nvpair id="fence_prod-mysql1_ipmi-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="prod-mysql1"/>
<nvpair id="fence_prod-mysql1_ipmi-instance_attributes-lanplus" name="lanplus" value="true"/>
</instance_attributes>
</primitive>
<primitive class="stonith" id="fence_prod-mysql2_ipmi" type="fence_ipmilan">
<instance_attributes id="fence_prod-mysql2_ipmi-instance_attributes">
<nvpair id="fence_prod-mysql2_ipmi-instance_attributes-ipaddr" name="ipaddr" value="192.0.2.2"/>
<nvpair id="fence_prod-mysql2_ipmi-instance_attributes-action" name="action" value="off"/>
<nvpair id="fence_prod-mysql2_ipmi-instance_attributes-login" name="login" value="fencing"/>
<nvpair id="fence_prod-mysql2_ipmi-instance_attributes-passwd" name="passwd" value="finishme"/>
<nvpair id="fence_prod-mysql2_ipmi-instance_attributes-verbose" name="verbose" value="true"/>
<nvpair id="fence_prod-mysql2_ipmi-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="prod-mysql2"/>
<nvpair id="fence_prod-mysql2_ipmi-instance_attributes-lanplus" name="lanplus" value="true"/>
</instance_attributes>
</primitive>
----
.Second fencing method: dual PDU devices
Each cluster node also has two distinct power channels controlled by two
distinct PDUs. That means a total of 4 fencing devices configured as follows:
- Node 1, PDU 1, PSU 1 @ port 10
- Node 1, PDU 2, PSU 2 @ port 10
- Node 2, PDU 1, PSU 1 @ port 11
- Node 2, PDU 2, PSU 2 @ port 11
The matching fencing agents are configured as follows:
[source,XML]
----
<primitive class="stonith" id="fence_prod-mysql1_apc1" type="fence_apc_snmp">
<instance_attributes id="fence_prod-mysql1_apc1-instance_attributes">
<nvpair id="fence_prod-mysql1_apc1-instance_attributes-ipaddr" name="ipaddr" value="198.51.100.1"/>
<nvpair id="fence_prod-mysql1_apc1-instance_attributes-action" name="action" value="off"/>
<nvpair id="fence_prod-mysql1_apc1-instance_attributes-port" name="port" value="10"/>
<nvpair id="fence_prod-mysql1_apc1-instance_attributes-login" name="login" value="fencing"/>
<nvpair id="fence_prod-mysql1_apc1-instance_attributes-passwd" name="passwd" value="fencing"/>
<nvpair id="fence_prod-mysql1_apc1-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="prod-mysql1"/>
</instance_attributes>
</primitive>
<primitive class="stonith" id="fence_prod-mysql1_apc2" type="fence_apc_snmp">
<instance_attributes id="fence_prod-mysql1_apc2-instance_attributes">
<nvpair id="fence_prod-mysql1_apc2-instance_attributes-ipaddr" name="ipaddr" value="203.0.113.1"/>
<nvpair id="fence_prod-mysql1_apc2-instance_attributes-action" name="action" value="off"/>
<nvpair id="fence_prod-mysql1_apc2-instance_attributes-port" name="port" value="10"/>
<nvpair id="fence_prod-mysql1_apc2-instance_attributes-login" name="login" value="fencing"/>
<nvpair id="fence_prod-mysql1_apc2-instance_attributes-passwd" name="passwd" value="fencing"/>
<nvpair id="fence_prod-mysql1_apc2-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="prod-mysql1"/>
</instance_attributes>
</primitive>
<primitive class="stonith" id="fence_prod-mysql2_apc1" type="fence_apc_snmp">
<instance_attributes id="fence_prod-mysql2_apc1-instance_attributes">
<nvpair id="fence_prod-mysql2_apc1-instance_attributes-ipaddr" name="ipaddr" value="198.51.100.1"/>
<nvpair id="fence_prod-mysql2_apc1-instance_attributes-action" name="action" value="off"/>
<nvpair id="fence_prod-mysql2_apc1-instance_attributes-port" name="port" value="11"/>
<nvpair id="fence_prod-mysql2_apc1-instance_attributes-login" name="login" value="fencing"/>
<nvpair id="fence_prod-mysql2_apc1-instance_attributes-passwd" name="passwd" value="fencing"/>
<nvpair id="fence_prod-mysql2_apc1-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="prod-mysql2"/>
</instance_attributes>
</primitive>
<primitive class="stonith" id="fence_prod-mysql2_apc2" type="fence_apc_snmp">
<instance_attributes id="fence_prod-mysql2_apc2-instance_attributes">
<nvpair id="fence_prod-mysql2_apc2-instance_attributes-ipaddr" name="ipaddr" value="203.0.113.1"/>
<nvpair id="fence_prod-mysql2_apc2-instance_attributes-action" name="action" value="off"/>
<nvpair id="fence_prod-mysql2_apc2-instance_attributes-port" name="port" value="11"/>
<nvpair id="fence_prod-mysql2_apc2-instance_attributes-login" name="login" value="fencing"/>
<nvpair id="fence_prod-mysql2_apc2-instance_attributes-passwd" name="passwd" value="fencing"/>
<nvpair id="fence_prod-mysql2_apc2-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="prod-mysql2"/>
</instance_attributes>
</primitive>
----
.Location Constraints
To prevent STONITH from trying to run a fencing agent on the same node it is
supposed to fence, constraints are placed on all the fencing primitives:
[source,XML]
----
<constraints>
<rsc_location id="l_fence_prod-mysql1_ipmi" node="prod-mysql1" rsc="fence_prod-mysql1_ipmi" score="-INFINITY"/>
<rsc_location id="l_fence_prod-mysql2_ipmi" node="prod-mysql2" rsc="fence_prod-mysql2_ipmi" score="-INFINITY"/>
<rsc_location id="l_fence_prod-mysql1_apc2" node="prod-mysql1" rsc="fence_prod-mysql1_apc2" score="-INFINITY"/>
<rsc_location id="l_fence_prod-mysql1_apc1" node="prod-mysql1" rsc="fence_prod-mysql1_apc1" score="-INFINITY"/>
<rsc_location id="l_fence_prod-mysql2_apc1" node="prod-mysql2" rsc="fence_prod-mysql2_apc1" score="-INFINITY"/>
<rsc_location id="l_fence_prod-mysql2_apc2" node="prod-mysql2" rsc="fence_prod-mysql2_apc2" score="-INFINITY"/>
</constraints>
----
.Fencing topology
Now that all the fencing resources are defined, it's time to create the right topology.
We want to fence using IPMI first, and if that does not work, fence both PDUs to reliably kill the node.
[source,XML]
----
<fencing-topology>
<fencing-level devices="fence_prod-mysql1_ipmi" id="fencing-2" index="1" target="prod-mysql1"/>
<fencing-level devices="fence_prod-mysql1_apc1,fence_prod-mysql1_apc2" id="fencing-3" index="2" target="prod-mysql1"/>
<fencing-level devices="fence_prod-mysql2_ipmi" id="fencing-0" index="1" target="prod-mysql2"/>
<fencing-level devices="fence_prod-mysql2_apc1,fence_prod-mysql2_apc2" id="fencing-1" index="2" target="prod-mysql2"/>
</fencing-topology>
----
Please note that in +fencing-topology+, the lowest +index+ value determines the priority of the first fencing method.
.Final configuration
Put together, the configuration looks like this:
[source,XML]
----
<cib admin_epoch="0" crm_feature_set="3.0.7" epoch="292" have-quorum="1" num_updates="29" validate-with="pacemaker-1.2">
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="true"/>
<nvpair id="cib-bootstrap-options-stonith-action" name="stonith-action" value="off"/>
<nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="3"/>
...
</cluster_property_set>
</crm_config>
<nodes>
<node id="prod-mysql1" uname="prod-mysql1"/>
<node id="prod-mysql2" uname="prod-mysql2"/>
<node id="prod-mysql-rep1" uname="prod-mysql-rep1">
<instance_attributes id="prod-mysql-rep1">
<nvpair id="prod-mysql-rep1-standby" name="standby" value="on"/>
</instance_attributes>
</node>
</nodes>
<resources>
<primitive class="stonith" id="fence_prod-mysql1_ipmi" type="fence_ipmilan">
<instance_attributes id="fence_prod-mysql1_ipmi-instance_attributes">
<nvpair id="fence_prod-mysql1_ipmi-instance_attributes-ipaddr" name="ipaddr" value="192.0.2.1"/>
<nvpair id="fence_prod-mysql1_ipmi-instance_attributes-action" name="action" value="off"/>
<nvpair id="fence_prod-mysql1_ipmi-instance_attributes-login" name="login" value="fencing"/>
<nvpair id="fence_prod-mysql1_ipmi-instance_attributes-passwd" name="passwd" value="finishme"/>
<nvpair id="fence_prod-mysql1_ipmi-instance_attributes-verbose" name="verbose" value="true"/>
<nvpair id="fence_prod-mysql1_ipmi-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="prod-mysql1"/>
<nvpair id="fence_prod-mysql1_ipmi-instance_attributes-lanplus" name="lanplus" value="true"/>
</instance_attributes>
</primitive>
<primitive class="stonith" id="fence_prod-mysql2_ipmi" type="fence_ipmilan">
<instance_attributes id="fence_prod-mysql2_ipmi-instance_attributes">
<nvpair id="fence_prod-mysql2_ipmi-instance_attributes-ipaddr" name="ipaddr" value="192.0.2.2"/>
<nvpair id="fence_prod-mysql2_ipmi-instance_attributes-action" name="action" value="off"/>
<nvpair id="fence_prod-mysql2_ipmi-instance_attributes-login" name="login" value="fencing"/>
<nvpair id="fence_prod-mysql2_ipmi-instance_attributes-passwd" name="passwd" value="finishme"/>
<nvpair id="fence_prod-mysql2_ipmi-instance_attributes-verbose" name="verbose" value="true"/>
<nvpair id="fence_prod-mysql2_ipmi-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="prod-mysql2"/>
<nvpair id="fence_prod-mysql2_ipmi-instance_attributes-lanplus" name="lanplus" value="true"/>
</instance_attributes>
</primitive>
<primitive class="stonith" id="fence_prod-mysql1_apc1" type="fence_apc_snmp">
<instance_attributes id="fence_prod-mysql1_apc1-instance_attributes">
<nvpair id="fence_prod-mysql1_apc1-instance_attributes-ipaddr" name="ipaddr" value="198.51.100.1"/>
<nvpair id="fence_prod-mysql1_apc1-instance_attributes-action" name="action" value="off"/>
<nvpair id="fence_prod-mysql1_apc1-instance_attributes-port" name="port" value="10"/>
<nvpair id="fence_prod-mysql1_apc1-instance_attributes-login" name="login" value="fencing"/>
<nvpair id="fence_prod-mysql1_apc1-instance_attributes-passwd" name="passwd" value="fencing"/>
<nvpair id="fence_prod-mysql1_apc1-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="prod-mysql1"/>
</instance_attributes>
</primitive>
<primitive class="stonith" id="fence_prod-mysql1_apc2" type="fence_apc_snmp">
<instance_attributes id="fence_prod-mysql1_apc2-instance_attributes">
<nvpair id="fence_prod-mysql1_apc2-instance_attributes-ipaddr" name="ipaddr" value="203.0.113.1"/>
<nvpair id="fence_prod-mysql1_apc2-instance_attributes-action" name="action" value="off"/>
<nvpair id="fence_prod-mysql1_apc2-instance_attributes-port" name="port" value="10"/>
<nvpair id="fence_prod-mysql1_apc2-instance_attributes-login" name="login" value="fencing"/>
<nvpair id="fence_prod-mysql1_apc2-instance_attributes-passwd" name="passwd" value="fencing"/>
<nvpair id="fence_prod-mysql1_apc2-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="prod-mysql1"/>
</instance_attributes>
</primitive>
<primitive class="stonith" id="fence_prod-mysql2_apc1" type="fence_apc_snmp">
<instance_attributes id="fence_prod-mysql2_apc1-instance_attributes">
<nvpair id="fence_prod-mysql2_apc1-instance_attributes-ipaddr" name="ipaddr" value="198.51.100.1"/>
<nvpair id="fence_prod-mysql2_apc1-instance_attributes-action" name="action" value="off"/>
<nvpair id="fence_prod-mysql2_apc1-instance_attributes-port" name="port" value="11"/>
<nvpair id="fence_prod-mysql2_apc1-instance_attributes-login" name="login" value="fencing"/>
<nvpair id="fence_prod-mysql2_apc1-instance_attributes-passwd" name="passwd" value="fencing"/>
<nvpair id="fence_prod-mysql2_apc1-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="prod-mysql2"/>
</instance_attributes>
</primitive>
<primitive class="stonith" id="fence_prod-mysql2_apc2" type="fence_apc_snmp">
<instance_attributes id="fence_prod-mysql2_apc2-instance_attributes">
<nvpair id="fence_prod-mysql2_apc2-instance_attributes-ipaddr" name="ipaddr" value="203.0.113.1"/>
<nvpair id="fence_prod-mysql2_apc2-instance_attributes-action" name="action" value="off"/>
<nvpair id="fence_prod-mysql2_apc2-instance_attributes-port" name="port" value="11"/>
<nvpair id="fence_prod-mysql2_apc2-instance_attributes-login" name="login" value="fencing"/>
<nvpair id="fence_prod-mysql2_apc2-instance_attributes-passwd" name="passwd" value="fencing"/>
<nvpair id="fence_prod-mysql2_apc2-instance_attributes-pcmk_host_list" name="pcmk_host_list" value="prod-mysql2"/>
</instance_attributes>
</primitive>
</resources>
<constraints>
<rsc_location id="l_fence_prod-mysql1_ipmi" node="prod-mysql1" rsc="fence_prod-mysql1_ipmi" score="-INFINITY"/>
<rsc_location id="l_fence_prod-mysql2_ipmi" node="prod-mysql2" rsc="fence_prod-mysql2_ipmi" score="-INFINITY"/>
<rsc_location id="l_fence_prod-mysql1_apc2" node="prod-mysql1" rsc="fence_prod-mysql1_apc2" score="-INFINITY"/>
<rsc_location id="l_fence_prod-mysql1_apc1" node="prod-mysql1" rsc="fence_prod-mysql1_apc1" score="-INFINITY"/>
<rsc_location id="l_fence_prod-mysql2_apc1" node="prod-mysql2" rsc="fence_prod-mysql2_apc1" score="-INFINITY"/>
<rsc_location id="l_fence_prod-mysql2_apc2" node="prod-mysql2" rsc="fence_prod-mysql2_apc2" score="-INFINITY"/>
</constraints>
<fencing-topology>
<fencing-level devices="fence_prod-mysql1_ipmi" id="fencing-2" index="1" target="prod-mysql1"/>
<fencing-level devices="fence_prod-mysql1_apc1,fence_prod-mysql1_apc2" id="fencing-3" index="2" target="prod-mysql1"/>
<fencing-level devices="fence_prod-mysql2_ipmi" id="fencing-0" index="1" target="prod-mysql2"/>
<fencing-level devices="fence_prod-mysql2_apc1,fence_prod-mysql2_apc2" id="fencing-1" index="2" target="prod-mysql2"/>
</fencing-topology>
...
</configuration>
</cib>
----
== Remapping Reboots ==
When the cluster needs to reboot a node, whether because +stonith-action+ is +reboot+ or because
a reboot was manually requested (such as by `stonith_admin --reboot`), it will remap that to
other commands in two cases:
. If the chosen fencing device does not support the +reboot+ command, the cluster
will ask it to perform +off+ instead.
. If a fencing topology level with multiple devices must be executed, the cluster
will ask all the devices to perform +off+, then ask the devices to perform +on+.
To understand the second case, consider the example of a node with redundant
power supplies connected to intelligent power switches. Rebooting one switch
and then the other would have no effect on the node. Turning both switches off,
and then on, actually reboots the node.
In such a case, the fencing operation will be treated as successful as long as
the +off+ commands succeed, because then it is safe for the cluster to recover
any resources that were on the node. Timeouts and errors in the +on+ phase will
be logged but ignored.
When a reboot operation is remapped, any action-specific timeout for the
remapped action will be used (for example, +pcmk_off_timeout+ will be used when
executing the +off+ command, not +pcmk_reboot_timeout+).
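As a sketch (the device type and timeout values here are illustrative assumptions only), such action-specific timeouts are set as instance attributes of the fencing resource:

[source,XML]
----
<primitive id="power-fencing" class="stonith" type="fence_apc_snmp">
  <instance_attributes id="power-fencing-params">
    <!-- used when a remapped (or explicit) off command is executed -->
    <nvpair id="power-fencing-off-timeout" name="pcmk_off_timeout" value="120s"/>
    <!-- used only when the device itself performs a reboot -->
    <nvpair id="power-fencing-reboot-timeout" name="pcmk_reboot_timeout" value="60s"/>
  </instance_attributes>
</primitive>
----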
[NOTE]
====
In Pacemaker versions 1.1.13 and earlier, reboots will not be remapped in the
second case. To achieve the same effect, separate fencing devices for off and
on actions must be configured.
====
diff --git a/doc/Pacemaker_Explained/en-US/Revision_History.xml b/doc/Pacemaker_Explained/en-US/Revision_History.xml
index 33010d5c0e..4bd3485d26 100644
--- a/doc/Pacemaker_Explained/en-US/Revision_History.xml
+++ b/doc/Pacemaker_Explained/en-US/Revision_History.xml
@@ -1,72 +1,84 @@
<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE appendix PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
]>
<appendix>
<!-- see comment in Book_Info.xml for revision numbering -->
<title>Revision History</title>
<simpara>
<revhistory>
<revision>
<revnumber>1-0</revnumber>
<date>19 Oct 2009</date>
<author><firstname>Andrew</firstname><surname>Beekhof</surname><email>andrew@beekhof.net</email></author>
<revdescription><simplelist><member>Import from Pages.app</member></simplelist></revdescription>
</revision>
<revision>
<revnumber>2-0</revnumber>
<date>26 Oct 2009</date>
<author><firstname>Andrew</firstname><surname>Beekhof</surname><email>andrew@beekhof.net</email></author>
<revdescription><simplelist><member>Cleanup and reformatting of docbook xml complete</member></simplelist></revdescription>
</revision>
<revision>
<revnumber>3-0</revnumber>
<date>Tue Nov 12 2009</date>
<author><firstname>Andrew</firstname><surname>Beekhof</surname><email>andrew@beekhof.net</email></author>
<revdescription>
<simplelist>
<member>Split book into chapters and pass validation</member>
<member>Re-organize book for use with <ulink url="https://fedorahosted.org/publican/">Publican</ulink></member>
</simplelist>
</revdescription>
</revision>
<revision>
<revnumber>4-0</revnumber>
<date>Mon Oct 8 2012</date>
<author><firstname>Andrew</firstname><surname>Beekhof</surname><email>andrew@beekhof.net</email></author>
<revdescription>
<simplelist>
<member>
Converted to <ulink url="http://www.methods.co.nz/asciidoc">asciidoc</ulink>
(which is converted to docbook for use with
<ulink url="https://fedorahosted.org/publican/">Publican</ulink>)
</member>
</simplelist>
</revdescription>
</revision>
<revision>
<revnumber>5-0</revnumber>
<date>Mon Feb 23 2015</date>
<author><firstname>Ken</firstname><surname>Gaillot</surname><email>kgaillot@redhat.com</email></author>
<revdescription>
<simplelist>
<member>
Update for clarity, stylistic consistency and current command-line syntax
</member>
</simplelist>
</revdescription>
</revision>
<revision>
<revnumber>6-0</revnumber>
<date>Tue Dec 8 2015</date>
<author><firstname>Ken</firstname><surname>Gaillot</surname><email>kgaillot@redhat.com</email></author>
<revdescription>
<simplelist>
<member>
Update for Pacemaker 1.1.14
</member>
</simplelist>
</revdescription>
</revision>
+ <revision>
+ <revnumber>7-0</revnumber>
+ <date>Tue May 3 2016</date>
+ <author><firstname>Ken</firstname><surname>Gaillot</surname><email>kgaillot@redhat.com</email></author>
+ <revdescription>
+ <simplelist>
+ <member>
+ Update for Pacemaker 1.1.15
+ </member>
+ </simplelist>
+ </revdescription>
+ </revision>
</revhistory>
</simpara>
</appendix>
diff --git a/doc/Pacemaker_Remote/en-US/Book_Info.xml b/doc/Pacemaker_Remote/en-US/Book_Info.xml
index 12e1ab891d..1e3675b9d1 100644
--- a/doc/Pacemaker_Remote/en-US/Book_Info.xml
+++ b/doc/Pacemaker_Remote/en-US/Book_Info.xml
@@ -1,75 +1,75 @@
<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE bookinfo PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
<!ENTITY % BOOK_ENTITIES SYSTEM "Pacemaker_Remote.ent">
%BOOK_ENTITIES;
]>
<bookinfo id="book-Pacemaker_Remote-Pacemaker_Remote">
<title>Pacemaker Remote</title>
<subtitle>Scaling High Availability Clusters</subtitle>
<!--
EDITION-PUBSNUMBER should match REVNUMBER in Revision_History.xml.
Increment EDITION when the syntax of the documented software
changes (OS, pacemaker, corosync, pcs), and PUBSNUMBER for
simple textual changes (corrections, translations, etc.).
-->
- <edition>5</edition>
+ <edition>6</edition>
<pubsnumber>0</pubsnumber>
<abstract>
<para>
This document serves as both a reference and a deployment guide for the Pacemaker Remote service.
</para>
<para>
The example commands in this document will use:
<orderedlist>
<listitem>
<para>
&DISTRO; &DISTRO_VERSION; as the host operating system
</para>
</listitem>
<listitem>
<para>
Pacemaker Remote to perform resource management within guest nodes and remote nodes
</para>
</listitem>
<listitem>
<para>
KVM for virtualization
</para>
</listitem>
<listitem>
<para>
libvirt to manage guest nodes
</para>
</listitem>
<listitem>
<para>
Corosync to provide messaging and membership services on cluster nodes
</para>
</listitem>
<listitem>
<para>
Pacemaker to perform resource management on cluster nodes
</para>
</listitem>
<listitem>
<para>
pcs as the cluster configuration toolset
</para>
</listitem>
</orderedlist>
The concepts are the same for other distributions,
virtualization platforms, toolsets, and messaging
layers, and should be easily adaptable.
</para>
</abstract>
<corpauthor>
<inlinemediaobject>
<imageobject>
<imagedata fileref="Common_Content/images/title_logo.svg" format="SVG" />
</imageobject>
</inlinemediaobject>
</corpauthor>
<xi:include href="Common_Content/Legal_Notice.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include href="Author_Group.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
</bookinfo>
diff --git a/doc/Pacemaker_Remote/en-US/Ch-Alternatives.txt b/doc/Pacemaker_Remote/en-US/Ch-Alternatives.txt
index d2fd9f42fd..7cf45ab423 100644
--- a/doc/Pacemaker_Remote/en-US/Ch-Alternatives.txt
+++ b/doc/Pacemaker_Remote/en-US/Ch-Alternatives.txt
@@ -1,77 +1,78 @@
= Alternative Configurations =
These alternative configurations may be appropriate in limited cases, such as a
test cluster, but are not the best method in most situations. They are
presented here for completeness, and as examples of pacemaker's flexibility
in adapting to your needs.
== Virtual Machines as Cluster Nodes ==
The preferred use of virtual machines in a pacemaker cluster is as a
cluster resource, whether opaque or as a guest node. However, it is
possible to run the full cluster stack on a virtual node instead.
This is commonly used to set up test environments; a single physical host
(that does not participate in the cluster) runs two or more virtual machines,
all running the full cluster stack. This can be used to simulate a
larger cluster for testing purposes.
In a production environment, fencing becomes more complicated, especially
if the underlying hosts run any services besides the clustered VMs.
If the VMs are not guaranteed a minimum amount of host resources,
CPU and I/O contention can cause timing issues for cluster components.
Another situation where this approach is sometimes used is when
the cluster owner leases the VMs from a provider and does not have
direct access to the underlying host. The main concerns in this case
are proper fencing (usually via a custom resource agent that communicates
with the provider's APIs) and maintaining a static IP address between reboots,
as well as resource contention issues.
== Virtual Machines as Remote Nodes ==
Virtual machines may be configured following the process for remote nodes
rather than guest nodes (i.e., using an *ocf:pacemaker:remote* resource
rather than letting the cluster manage the VM directly).
This is mainly useful in testing, to use a single physical host to simulate a
larger cluster involving remote nodes. Pacemaker's Cluster Test Suite (CTS)
uses this approach to test remote node functionality.
== Containers as Guest Nodes ==
Containers,footnote:[https://en.wikipedia.org/wiki/Operating-system-level_virtualization]
and in particular Linux containers (LXC) and Docker, have become a popular
method of isolating services in a resource-efficient manner.
The preferred means of integrating containers into Pacemaker is as a
cluster resource, whether opaque or using Pacemaker's built-in
resource isolation support.footnote:[Documentation for this support is planned
but not yet available.]
However, it is possible to run `pacemaker_remote` inside a container,
following the process for guest nodes. This is not recommended but can
be useful, for example, in testing scenarios, to simulate a large number of
guest nodes.
The configuration process is very similar to that described for guest nodes
using virtual machines. Key differences:
* The underlying host must install the libvirt driver for the desired container
technology -- for example, the +libvirt-daemon-lxc+ package to get the
http://libvirt.org/drvlxc.html[libvirt-lxc] driver for LXC containers.
* Libvirt XML definitions must be generated for the containers. The
- +pacemaker-cts+ package includes a helpful script for this purpose,
+ +pacemaker-cts+ package includes a script for this purpose,
+/usr/share/pacemaker/tests/cts/lxc_autogen.sh+. Run it with the
- `--help` option for details on how to use it. Of course, you can create
- XML definitions manually, following the appropriate libvirt driver
- documentation.
+ `--help` option for details on how to use it. It is intended for testing
+ purposes only, and hardcodes various parameters that would need to be set
+ appropriately in real usage. Of course, you can create XML definitions
+ manually, following the appropriate libvirt driver documentation.
* To share the authentication key, either share the host's +/etc/pacemaker+
directory with the container, or copy the key into the container's
filesystem.
* The *VirtualDomain* resource for a container will need
*force_stop="true"* and an appropriate hypervisor option,
for example *hypervisor="lxc:///"* for LXC containers.
diff --git a/doc/Pacemaker_Remote/en-US/Ch-Baremetal-Tutorial.txt b/doc/Pacemaker_Remote/en-US/Ch-Baremetal-Tutorial.txt
index c187b2536f..f866c9a944 100644
--- a/doc/Pacemaker_Remote/en-US/Ch-Baremetal-Tutorial.txt
+++ b/doc/Pacemaker_Remote/en-US/Ch-Baremetal-Tutorial.txt
@@ -1,306 +1,310 @@
= Remote Node Walk-through =
*What this tutorial is:* An in-depth walk-through of how to get Pacemaker to
integrate a remote node into the cluster as a node capable of running cluster
resources.
*What this tutorial is not:* A realistic deployment scenario. The steps shown
here are meant to get users familiar with the concept of remote nodes as
quickly as possible.
This tutorial requires three machines: two to act as cluster nodes, and
a third to act as the remote node.
== Configure Remote Node ==
=== Configure Firewall on Remote Node ===
Allow cluster-related services through the local firewall:
----
# firewall-cmd --permanent --add-service=high-availability
success
# firewall-cmd --reload
success
----
[NOTE]
======
If you are using iptables directly, or some other firewall solution besides
firewalld, simply open the following ports, which can be used by various
clustering components: TCP ports 2224, 3121, and 21064, and UDP port 5405.
If you run into any problems during testing, you might want to disable
the firewall and SELinux entirely until you have everything working.
This may create significant security issues and should not be performed on
machines that will be exposed to the outside world, but may be appropriate
during development and testing on a protected host.
To disable security measures:
----
# setenforce 0
# sed -i.bak "s/SELINUX=enforcing/SELINUX=permissive/g" /etc/selinux/config
# systemctl disable firewalld.service
# systemctl stop firewalld.service
# iptables --flush
----
======
=== Configure pacemaker_remote on Remote Node ===
Install the pacemaker_remote daemon on the remote node.
----
# yum install -y pacemaker-remote resource-agents pcs
----
Create a location for the shared authentication key:
----
# mkdir -p --mode=0750 /etc/pacemaker
# chgrp haclient /etc/pacemaker
----
All nodes (both cluster nodes and remote nodes) must have the same
authentication key installed for the communication to work correctly.
If you already have a key on an existing node, copy it to the new
remote node. Otherwise, create a new key, for example:
----
# dd if=/dev/urandom of=/etc/pacemaker/authkey bs=4096 count=1
----
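The key-creation steps above can be sketched as a small idempotent script. This is a hypothetical helper, not part of Pacemaker; it uses a scratch directory so it can be tried without root, whereas real usage would target +/etc/pacemaker+ and also `chgrp` the directory to *haclient*:

```shell
# Sketch: create the shared authentication key only if one does not
# already exist. PCMK_DIR stands in for /etc/pacemaker in real usage.
PCMK_DIR="$(mktemp -d)/pacemaker"
mkdir -p --mode=0750 "$PCMK_DIR"
if [ ! -f "$PCMK_DIR/authkey" ]; then
    dd if=/dev/urandom of="$PCMK_DIR/authkey" bs=4096 count=1 2>/dev/null
    chmod 640 "$PCMK_DIR/authkey"
fi
stat -c '%s' "$PCMK_DIR/authkey"   # prints 4096
```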
Now start and enable the pacemaker_remote daemon on the remote node.
----
# systemctl enable pacemaker_remote.service
# systemctl start pacemaker_remote.service
----
Verify the start was successful.
----
# systemctl status pacemaker_remote
pacemaker_remote.service - Pacemaker Remote Service
Loaded: loaded (/usr/lib/systemd/system/pacemaker_remote.service; enabled)
Active: active (running) since Fri 2015-08-21 15:21:20 CDT; 20s ago
Main PID: 21273 (pacemaker_remot)
CGroup: /system.slice/pacemaker_remote.service
└─21273 /usr/sbin/pacemaker_remoted
Aug 21 15:21:20 remote1 systemd[1]: Starting Pacemaker Remote Service...
Aug 21 15:21:20 remote1 systemd[1]: Started Pacemaker Remote Service.
Aug 21 15:21:20 remote1 pacemaker_remoted[21273]: notice: crm_add_logfile: Additional logging available in /var/log/pacemaker.log
Aug 21 15:21:20 remote1 pacemaker_remoted[21273]: notice: lrmd_init_remote_tls_server: Starting a tls listener on port 3121.
Aug 21 15:21:20 remote1 pacemaker_remoted[21273]: notice: bind_and_listen: Listening on address ::
----
== Verify Connection to Remote Node ==
Before moving forward, it's worth verifying that the cluster nodes
can contact the remote node on port 3121. Here's a trick you can use.
Connect using ssh from each of the cluster nodes. The connection will get
destroyed, but how it is destroyed tells you whether it worked or not.
First, add the remote node's hostname (we're using *remote1* in this tutorial)
to the cluster nodes' +/etc/hosts+ files if you haven't already. This
is required unless you have DNS set up in a way where remote1's address can be
discovered.
Execute the following on each cluster node, replacing the IP address with the
actual IP address of the remote node.
----
# cat << END >> /etc/hosts
192.168.122.10 remote1
END
----
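The append above can be made idempotent with a small helper, so that re-running the setup does not duplicate the entry. This is a hypothetical sketch using a temporary file in place of +/etc/hosts+:

```shell
# Sketch: append an address/hostname pair only if the hostname is
# not already present. HOSTS_FILE stands in for /etc/hosts.
HOSTS_FILE="$(mktemp)"          # real usage: HOSTS_FILE=/etc/hosts
add_host_entry() {
    # $1 = IP address, $2 = hostname; append only if hostname absent
    grep -qw "$2" "$HOSTS_FILE" || printf '%s %s\n' "$1" "$2" >> "$HOSTS_FILE"
}
add_host_entry 192.168.122.10 remote1
add_host_entry 192.168.122.10 remote1   # second call changes nothing
grep -c remote1 "$HOSTS_FILE"           # prints 1
```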
If running the ssh command on one of the cluster nodes results in this
-output before disconnecting, the connection works.
+output before disconnecting, the connection works:
----
# ssh -p 3121 remote1
ssh_exchange_identification: read: Connection reset by peer
----
-If you see this, the connection is not working.
+If you see one of these, the connection is not working:
----
# ssh -p 3121 remote1
ssh: connect to host remote1 port 3121: No route to host
----
+----
+# ssh -p 3121 remote1
+ssh: connect to host remote1 port 3121: Connection refused
+----
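If ssh is unavailable, a hypothetical bash helper can perform a similar reachability check using bash's +/dev/tcp+ feature. Note that this only tells you whether the port accepts TCP connections, not that pacemaker_remote is healthy:

```shell
# Sketch: probe a TCP port from a cluster node. Host and port below
# follow the tutorial's example values.
probe_port() {
    # $1 = host, $2 = port; prints "open" if a TCP connection succeeds
    if timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
        echo open
    else
        echo closed
    fi
}
probe_port remote1 3121   # "open" if pacemaker_remote is listening
```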
Once you can successfully connect to the remote node from both
cluster nodes, move on to setting up Pacemaker on the cluster nodes.
== Configure Cluster Nodes ==
=== Configure Firewall on Cluster Nodes ===
On each cluster node, allow cluster-related services through the local
firewall, following the same procedure as in <<_configure_firewall_on_remote_node>>.
=== Install Pacemaker on Cluster Nodes ===
On the two cluster nodes, install the following packages.
----
# yum install -y pacemaker corosync pcs resource-agents
----
=== Copy Authentication Key to Cluster Nodes ===
Create a location for the shared authentication key,
and copy it from any existing node:
----
# mkdir -p --mode=0750 /etc/pacemaker
# chgrp haclient /etc/pacemaker
# scp remote1:/etc/pacemaker/authkey /etc/pacemaker/authkey
----
=== Configure Corosync on Cluster Nodes ===
Corosync handles Pacemaker's cluster membership and messaging. The corosync
config file is located in +/etc/corosync/corosync.conf+. That config file must be
initialized with information about the two cluster nodes before pacemaker can
start.
To initialize the corosync config file, execute the following pcs command on
both nodes, replacing the angle-bracket placeholders with your nodes' information.
----
# pcs cluster setup --force --local --name mycluster <node1 ip or hostname> <node2 ip or hostname>
----
=== Start Pacemaker on Cluster Nodes ===
Start the cluster stack on both cluster nodes using the following command.
----
# pcs cluster start
----
Verify corosync membership
....
# pcs status corosync
Membership information
----------------------
Nodeid Votes Name
1 1 node1 (local)
....
Verify Pacemaker status. At first, the `pcs cluster status` output will look
like this.
----
# pcs status
Cluster name: mycluster
Last updated: Fri Aug 21 16:14:05 2015
Last change: Fri Aug 21 14:02:14 2015
Stack: corosync
Current DC: NONE
Version: 1.1.12-a14efad
1 Nodes configured, unknown expected votes
0 Resources configured
----
After about a minute, you should see your two cluster nodes come online.
----
# pcs status
Cluster name: mycluster
Last updated: Fri Aug 21 16:16:32 2015
Last change: Fri Aug 21 14:02:14 2015
Stack: corosync
Current DC: node1 (1) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
0 Resources configured
Online: [ node1 node2 ]
----
For the sake of this tutorial, we are going to disable stonith to avoid having to cover fencing device configuration.
----
# pcs property set stonith-enabled=false
----
== Integrate Remote Node into Cluster ==
Integrating a remote node into the cluster is achieved through the
creation of a remote node connection resource. The remote node connection
resource both establishes the connection to the remote node and defines that
the remote node exists. Note that this resource is actually internal to
Pacemaker's crmd component. A metadata file describing the available options
can be found at +/usr/lib/ocf/resource.d/pacemaker/remote+, but there is no
actual *ocf:pacemaker:remote* resource agent script that performs any work.
Define the remote node connection resource to our remote node,
*remote1*, using the following command on any cluster node.
----
# pcs resource create remote1 ocf:pacemaker:remote
----
That's it. After a moment you should see the remote node come online.
----
Cluster name: mycluster
Last updated: Fri Aug 21 17:13:09 2015
Last change: Fri Aug 21 17:02:02 2015
Stack: corosync
Current DC: node1 (1) - partition with quorum
Version: 1.1.12-a14efad
3 Nodes configured
1 Resources configured
Online: [ node1 node2 ]
RemoteOnline: [ remote1 ]
Full list of resources:
remote1 (ocf::pacemaker:remote): Started node1
PCSD Status:
node1: Online
node2: Online
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
----
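For reference, the connection resource created above corresponds to a CIB primitive along these lines. This is an illustrative sketch: the ids and the monitor operation shown are assumptions, and the defaults your pcs version applies may differ.

```xml
<primitive id="remote1" class="ocf" provider="pacemaker" type="remote">
  <!-- pcs typically adds a recurring monitor; the interval is illustrative -->
  <operations>
    <op id="remote1-monitor-interval-60s" interval="60s" name="monitor"/>
  </operations>
</primitive>
```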
== Starting Resources on Remote Node ==
Once the remote node is integrated into the cluster, starting resources on a
remote node is the exact same as on cluster nodes. Refer to the
http://clusterlabs.org/doc/['Clusters from Scratch'] document for examples of
resource creation.
[WARNING]
=========
Never involve a remote node connection resource in a resource group,
colocation constraint, or order constraint.
=========
== Fencing Remote Nodes ==
Remote nodes are fenced the same way as cluster nodes. No special
considerations are required. Configure fencing resources for use with
remote nodes the same as you would with cluster nodes.
Note, however, that remote nodes can never 'initiate' a fencing action. Only
cluster nodes are capable of actually executing a fencing operation against
another node.
== Accessing Cluster Tools from a Remote Node ==
Besides allowing the cluster to manage resources on a remote node,
pacemaker_remote has one other trick. The pacemaker_remote daemon allows
nearly all the pacemaker tools (`crm_resource`, `crm_mon`, `crm_attribute`,
`crm_master`, etc.) to work on remote nodes natively.
Try it: Run `crm_mon` on the remote node after pacemaker has
integrated it into the cluster. These tools just work. This means resource
agents that need access to tools such as `crm_master` (for example,
master/slave resources) work seamlessly on remote nodes.
Higher-level command shells such as `pcs` may have partial support
on remote nodes, but it is recommended to run them from a cluster node.
diff --git a/doc/Pacemaker_Remote/en-US/Ch-Intro.txt b/doc/Pacemaker_Remote/en-US/Ch-Intro.txt
index 9edf054a69..416c19d880 100644
--- a/doc/Pacemaker_Remote/en-US/Ch-Intro.txt
+++ b/doc/Pacemaker_Remote/en-US/Ch-Intro.txt
@@ -1,198 +1,204 @@
= Scaling a Pacemaker Cluster =
== Overview ==
In a basic Pacemaker high-availability
cluster,footnote:[See the http://www.clusterlabs.org/doc/[Pacemaker
documentation], especially 'Clusters From Scratch' and 'Pacemaker Explained',
for basic information about high-availability using Pacemaker]
each node runs the full cluster stack of corosync and all Pacemaker components.
This allows great flexibility but limits scalability to around 16 nodes.
To allow for scalability to dozens or even hundreds of nodes, Pacemaker
allows nodes not running the full cluster stack to integrate into the cluster
and have the cluster manage their resources as if they were a cluster node.
== Terms ==
cluster node::
A node running the full high-availability stack of corosync and all
Pacemaker components. Cluster nodes may run cluster resources, run
all Pacemaker command-line tools (`crm_mon`, `crm_resource` and so on),
execute fencing actions, count toward cluster quorum, and serve as the
cluster's Designated Controller (DC).
(((cluster node)))
(((node,cluster node)))
pacemaker_remote::
A small service daemon that allows a host to be used as a Pacemaker node
without running the full cluster stack. Nodes running pacemaker_remote
may run cluster resources and most command-line tools, but cannot perform
other functions of full cluster nodes such as fencing execution, quorum
voting or DC eligibility. The pacemaker_remote daemon is an enhanced
version of Pacemaker's local resource management daemon (LRMD).
(((pacemaker_remote)))
remote node::
A physical host running pacemaker_remote. Remote nodes have a special
resource that manages communication with the cluster. This is sometimes
referred to as the 'baremetal' case.
(((remote node)))
(((node,remote node)))
guest node::
A virtual host running pacemaker_remote. Guest nodes differ from remote
nodes mainly in that the guest node is itself a resource that the cluster
manages.
(((guest node)))
(((node,guest node)))
[NOTE]
======
'Remote' in this document refers to the node not being a part of the underlying
corosync cluster. It has nothing to do with physical proximity. Remote nodes
and guest nodes are subject to the same latency requirements as cluster nodes,
which means they are typically in the same data center.
======
[NOTE]
======
It is important to distinguish the various roles a virtual machine can serve
in Pacemaker clusters:
* A virtual machine can run the full cluster stack, in which case it is a
cluster node and is not itself managed by the cluster.
* A virtual machine can be managed by the cluster as a resource, without the
cluster having any awareness of the services running inside the virtual
machine. The virtual machine is 'opaque' to the cluster.
* A virtual machine can be a cluster resource, and run pacemaker_remote
to make it a guest node, allowing the cluster to manage services
inside it. The virtual machine is 'transparent' to the cluster.
======
== Support in Pacemaker Versions ==
It is recommended to run Pacemaker 1.1.12 or later when using pacemaker_remote
due to important bug fixes. An overview of changes in pacemaker_remote
capability by version:
+.1.1.15
+* If pacemaker_remote is stopped on an active node, it will wait for the
+ cluster to migrate all resources off before exiting, rather than exit
+ immediately and get fenced.
+* Bug fixes
+
.1.1.14
* Resources that create guest nodes can be included in groups
* reconnect_interval option for remote nodes
* Bug fixes, including a memory leak
.1.1.13
* Support for maintenance mode
* Remote nodes can recover without being fenced when the cluster node
hosting their connection fails
* Running pacemaker_remote within LXC environments is deprecated due to
newly added Pacemaker support for isolated resources
* Bug fixes
.1.1.12
* Support for permanent node attributes
* Support for migration
* Bug fixes
.1.1.11
* Support for IPv6
* Support for remote nodes
* Support for transient node attributes
* Support for clusters with mixed endian architectures
* Bug fixes
.1.1.10
* Bug fixes
.1.1.9
* Initial version to include pacemaker_remote
* Limited to guest nodes in KVM/LXC environments using only IPv4;
all nodes' architectures must have the same endianness
== Guest Nodes ==
(((guest node)))
(((node,guest node)))
*"I want a Pacemaker cluster to manage virtual machine resources, but I also
want Pacemaker to be able to manage the resources that live within those
virtual machines."*
Without pacemaker_remote, the possibilities for implementing the above use case
have significant limitations:
* The cluster stack could be run on the physical hosts only, which loses the
ability to monitor resources within the guests.
* A separate cluster could be on the virtual guests, which quickly hits
scalability issues.
* The cluster stack could be run on the guests using the same cluster as the
physical hosts, which also hits scalability issues and complicates fencing.
With pacemaker_remote:
* The physical hosts are cluster nodes (running the full cluster stack).
* The virtual machines are guest nodes (running the pacemaker_remote service).
Nearly zero configuration is required on the virtual machine.
* The cluster stack on the cluster nodes launches the virtual machines and
immediately connects to the pacemaker_remote service on them, allowing the
virtual machines to integrate into the cluster.
The key difference here between the guest nodes and the cluster nodes is that
the guest nodes do not run the cluster stack. This means they will never become
the DC, initiate fencing actions or participate in quorum voting.
On the other hand, this also means that they are not bound to the scalability
limits associated with the cluster stack (no 16-node corosync member limits to
deal with). That isn't to say that guest nodes can scale indefinitely, but it
is known that guest nodes scale horizontally much further than cluster nodes.
Other than the quorum limitation, these guest nodes behave just like cluster
nodes with respect to resource management. The cluster is fully capable of
managing and monitoring resources on each guest node. You can build constraints
against guest nodes, put them in standby, or do whatever else you'd expect to
be able to do with cluster nodes. They even show up in `crm_mon` output as
nodes.
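As an illustrative sketch, a location constraint preferring a guest node uses exactly the same CIB syntax as one for a cluster node (the resource and node names here are hypothetical):

```xml
<rsc_location id="loc-webserver-prefers-guest1" rsc="webserver"
              node="guest1" score="INFINITY"/>
```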
To solidify the concept, below is an example that is very similar to an actual
deployment we test in our developer environment to verify guest node scalability:
* 16 cluster nodes running the full corosync + pacemaker stack
* 64 Pacemaker-managed virtual machine resources running pacemaker_remote configured as guest nodes
* 64 Pacemaker-managed webserver and database resources configured to run on the 64 guest nodes
With this deployment, you would have 64 webservers and databases running on 64
virtual machines on 16 hardware nodes, all of which are managed and monitored by
the same Pacemaker deployment. It is known that pacemaker_remote can scale to
this scale, and possibly much further, depending on the specific scenario.
== Remote Nodes ==
(((remote node)))
(((node,remote node)))
*"I want my traditional high-availability cluster to scale beyond the limits
imposed by the corosync messaging layer."*
Ultimately, the primary advantage of remote nodes over cluster nodes is
scalability. Remote nodes may also serve a purpose in other use cases, such as
geographically distributed HA clusters, but those use cases are not well
understood at this point.
Like guest nodes, remote nodes will never become the DC, initiate
fencing actions or participate in quorum voting.
That is not to say, however, that fencing of a remote node works any
differently than that of a cluster node. The Pacemaker policy engine
understands how to fence remote nodes. As long as a fencing device exists, the
cluster is capable of ensuring remote nodes are fenced in the exact same way as
cluster nodes.
== Expanding the Cluster Stack ==
With pacemaker_remote, the traditional view of the high-availability stack can
be expanded to include a new layer:
.Traditional HA Stack
image::images/pcmk-ha-cluster-stack.png["Traditional Pacemaker+Corosync Stack",width="17cm",height="9cm",align="center"]
.HA Stack With Guest Nodes
image::images/pcmk-ha-remote-stack.png["Pacemaker+Corosync Stack With pacemaker_remote",width="20cm",height="10cm",align="center"]
diff --git a/doc/Pacemaker_Remote/en-US/Ch-KVM-Tutorial.txt b/doc/Pacemaker_Remote/en-US/Ch-KVM-Tutorial.txt
index 72a9076592..7f09598e31 100644
--- a/doc/Pacemaker_Remote/en-US/Ch-KVM-Tutorial.txt
+++ b/doc/Pacemaker_Remote/en-US/Ch-KVM-Tutorial.txt
@@ -1,583 +1,583 @@
= Guest Node Walk-through =
*What this tutorial is:* An in-depth walk-through of how to get Pacemaker to
manage a KVM guest instance and integrate that guest into the cluster as a
guest node.
*What this tutorial is not:* A realistic deployment scenario. The steps shown
here are meant to get users familiar with the concept of guest nodes as quickly
as possible.
== Configure the Physical Host ==
[NOTE]
======
For this example, we will use a single physical host named *example-host*.
A production cluster would likely have multiple physical hosts, in which case
you would run the commands here on each one, unless noted otherwise.
======
=== Configure Firewall on Host ===
On the physical host, allow cluster-related services through the local firewall:
----
# firewall-cmd --permanent --add-service=high-availability
success
# firewall-cmd --reload
success
----
[NOTE]
======
If you are using iptables directly, or some other firewall solution besides
firewalld, simply open the following ports, which can be used by various
clustering components: TCP ports 2224, 3121, and 21064, and UDP port 5405.
If you run into any problems during testing, you might want to disable
the firewall and SELinux entirely until you have everything working.
This may create significant security issues and should not be performed on
machines that will be exposed to the outside world, but may be appropriate
during development and testing on a protected host.
To disable security measures:
----
[root@pcmk-1 ~]# setenforce 0
[root@pcmk-1 ~]# sed -i.bak "s/SELINUX=enforcing/SELINUX=permissive/g" /etc/selinux/config
[root@pcmk-1 ~]# systemctl disable firewalld.service
[root@pcmk-1 ~]# systemctl stop firewalld.service
[root@pcmk-1 ~]# iptables --flush
----
======
=== Install Cluster Software ===
----
# yum install -y pacemaker corosync pcs resource-agents
----
=== Configure Corosync ===
Corosync handles pacemaker's cluster membership and messaging. The corosync
config file is located in +/etc/corosync/corosync.conf+. That config file must
be initialized with information about the cluster nodes before pacemaker can
start.
To initialize the corosync config file, execute the following `pcs` command,
replacing the cluster name and hostname as desired:
----
# pcs cluster setup --force --local --name mycluster example-host
----
[NOTE]
======
If you have multiple physical hosts, you would execute the setup command on
only one host, but list all of them at the end of the command.
======
=== Configure Pacemaker for Remote Node Communication ===
Create a place to hold an authentication key for use with pacemaker_remote:
----
# mkdir -p --mode=0750 /etc/pacemaker
# chgrp haclient /etc/pacemaker
----
Generate a key:
----
# dd if=/dev/urandom of=/etc/pacemaker/authkey bs=4096 count=1
----
[NOTE]
======
If you have multiple physical hosts, you would generate the key on only one
host, and copy it to the same location on all hosts.
======
=== Verify Cluster Software ===
Start the cluster
----
# pcs cluster start
----
Verify corosync membership
....
# pcs status corosync
Membership information
----------------------
Nodeid Votes Name
1 1 example-host (local)
....
Verify pacemaker status. At first, the output will look like this:
----
# pcs status
Cluster name: mycluster
WARNING: no stonith devices and stonith-enabled is not false
Last updated: Fri Oct 9 15:18:32 2015 Last change: Fri Oct 9 12:42:21 2015 by root via cibadmin on example-host
Stack: corosync
Current DC: NONE
1 node and 0 resources configured
Node example-host: UNCLEAN (offline)
Full list of resources:
PCSD Status:
example-host: Online
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
----
After a short amount of time, you should see your host as a single node in the
cluster:
----
# pcs status
Cluster name: mycluster
WARNING: no stonith devices and stonith-enabled is not false
Last updated: Fri Oct 9 15:20:05 2015 Last change: Fri Oct 9 12:42:21 2015 by root via cibadmin on example-host
Stack: corosync
Current DC: example-host (version 1.1.13-a14efad) - partition WITHOUT quorum
1 node and 0 resources configured
Online: [ example-host ]
Full list of resources:
PCSD Status:
example-host: Online
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
----
=== Disable STONITH and Quorum ===
Now, enable the cluster to work without quorum or stonith. This is required
for the sake of getting this tutorial to work with a single cluster node.
----
# pcs property set stonith-enabled=false
# pcs property set no-quorum-policy=ignore
----
[WARNING]
=========
The use of `stonith-enabled=false` is completely inappropriate for a production
cluster. It tells the cluster to simply pretend that failed nodes are safely
powered off. Some vendors will refuse to support clusters that have STONITH
disabled. We disable STONITH here only to focus the discussion on
pacemaker_remote, and to be able to use a single physical host in the example.
=========
Now, the status output should look similar to this:
----
# pcs status
Cluster name: mycluster
Last updated: Fri Oct 9 15:22:49 2015 Last change: Fri Oct 9 15:22:46 2015 by root via cibadmin on example-host
Stack: corosync
Current DC: example-host (version 1.1.13-a14efad) - partition with quorum
1 node and 0 resources configured
Online: [ example-host ]
Full list of resources:
PCSD Status:
example-host: Online
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
----
Go ahead and stop the cluster for now after verifying everything is in order.
----
# pcs cluster stop --force
----
=== Install Virtualization Software ===
----
# yum install -y kvm libvirt qemu-system qemu-kvm bridge-utils virt-manager
# systemctl enable libvirtd.service
----
Reboot the host.
[NOTE]
======
While KVM is used in this example, any virtualization platform with a Pacemaker
resource agent can be used to create a guest node. The resource agent needs
only to support the usual commands (start, stop, etc.); Pacemaker implements the
*remote-node* meta-attribute, independent of the agent.
======
== Configure the KVM guest ==
=== Create Guest ===
We will not outline here the installation steps required to create a KVM
guest. There are plenty of tutorials available elsewhere that do that.
Just be sure to configure the guest with a hostname and a static IP address
(as an example here, we will use guest1 and 192.168.122.10).
=== Configure Firewall on Guest ===
On each guest, allow cluster-related services through the local firewall,
following the same procedure as in <<_configure_firewall_on_host>>.
=== Verify Connectivity ===
At this point, you should be able to ping and ssh into guests from hosts, and
vice versa.
=== Configure pacemaker_remote ===
Install pacemaker_remote, and enable it to run at start-up. Here, we also
install the pacemaker package; it is not required, but it contains the dummy
resource agent that we will use later for testing.
----
# yum install -y pacemaker pacemaker-remote resource-agents
# systemctl enable pacemaker_remote.service
----
Copy the authentication key from a host:
----
# mkdir -p --mode=0750 /etc/pacemaker
# chgrp haclient /etc/pacemaker
# scp root@example-host:/etc/pacemaker/authkey /etc/pacemaker
----
Start pacemaker_remote, and verify the start was successful:
----
# systemctl start pacemaker_remote
# systemctl status pacemaker_remote
pacemaker_remote.service - Pacemaker Remote Service
Loaded: loaded (/usr/lib/systemd/system/pacemaker_remote.service; enabled)
Active: active (running) since Thu 2013-03-14 18:24:04 EDT; 2min 8s ago
Main PID: 1233 (pacemaker_remot)
CGroup: name=systemd:/system/pacemaker_remote.service
└─1233 /usr/sbin/pacemaker_remoted
Mar 14 18:24:04 guest1 systemd[1]: Starting Pacemaker Remote Service...
Mar 14 18:24:04 guest1 systemd[1]: Started Pacemaker Remote Service.
Mar 14 18:24:04 guest1 pacemaker_remoted[1233]: notice: lrmd_init_remote_tls_server: Starting a tls listener on port 3121.
----
=== Verify Host Connection to Guest ===
Before moving forward, it's worth verifying that the host can contact the guest
on port 3121. Here's a trick you can use. Connect using ssh from the host. The
connection will get destroyed, but how it is destroyed tells you whether it
worked or not.
First add guest1 to the host machine's +/etc/hosts+ file if you haven't
already. This is required unless you have DNS set up in a way where guest1's
address can be discovered.
----
# cat << END >> /etc/hosts
192.168.122.10 guest1
END
----
If running the ssh command on one of the cluster nodes results in this
-output before disconnecting, the connection works.
+output before disconnecting, the connection works:
----
# ssh -p 3121 guest1
ssh_exchange_identification: read: Connection reset by peer
----
-If you see one of these, the connection is not working.
+If you see one of these, the connection is not working:
----
# ssh -p 3121 guest1
ssh: connect to host guest1 port 3121: No route to host
----
----
# ssh -p 3121 guest1
ssh: connect to host guest1 port 3121: Connection refused
----
Once you can successfully connect to the guest from the host, shut down the guest. Pacemaker will be managing the virtual machine from this point forward.
== Integrate Guest into Cluster ==
Now the fun part, integrating the virtual machine you've just created into the cluster. It is incredibly simple.
=== Start the Cluster ===
On the host, start pacemaker.
----
# pcs cluster start
----
Wait for the host to become the DC. The output of `pcs status` should look
as it did in <<_disable_stonith_and_quorum>>.
=== Integrate as Guest Node ===
If you didn't already do this earlier in the verify host to guest connection
section, add the KVM guest's IP address to the host's +/etc/hosts+ file so we
can connect by hostname. For this example:
----
# cat << END >> /etc/hosts
192.168.122.10 guest1
END
----
We will use the *VirtualDomain* resource agent for the management of the
virtual machine. This agent requires the virtual machine's XML config to be
dumped to a file on disk. To do this, pick out the name of the virtual machine
you just created from the output of this list.
....
# virsh list --all
Id Name State
----------------------------------------------------
- guest1 shut off
....
In my case, I named it guest1. Dump the XML to a file somewhere on the host using the following command.
----
# virsh dumpxml guest1 > /etc/pacemaker/guest1.xml
----
Now just register the resource with pacemaker and you're set!
----
# pcs resource create vm-guest1 VirtualDomain hypervisor="qemu:///system" \
config="/etc/pacemaker/guest1.xml" meta remote-node=guest1
----
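For reference, the command above corresponds to CIB XML roughly like the following. The nvpair ids are illustrative assumptions; the important parts are the instance attributes and the *remote-node* meta-attribute that makes the VM a guest node.

```xml
<primitive id="vm-guest1" class="ocf" provider="heartbeat" type="VirtualDomain">
  <instance_attributes id="vm-guest1-instance_attributes">
    <nvpair id="vm-guest1-hypervisor" name="hypervisor" value="qemu:///system"/>
    <nvpair id="vm-guest1-config" name="config" value="/etc/pacemaker/guest1.xml"/>
  </instance_attributes>
  <meta_attributes id="vm-guest1-meta_attributes">
    <nvpair id="vm-guest1-remote-node" name="remote-node" value="guest1"/>
  </meta_attributes>
</primitive>
```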
[NOTE]
======
This example puts the guest XML under /etc/pacemaker because the
permissions and SELinux labeling should not need any changes.
If you run into trouble with this or any step, try disabling SELinux
with `setenforce 0`. If it works after that, see SELinux documentation
for how to troubleshoot, if you wish to reenable SELinux.
======
[NOTE]
======
Pacemaker will automatically monitor pacemaker_remote connections for failure,
so it is not necessary to create a recurring monitor on the VirtualDomain
resource.
======
Once the *vm-guest1* resource is started you will see *guest1* appear in the
`pcs status` output as a node. The final `pcs status` output should look
something like this.
----
# pcs status
Cluster name: mycluster
Last updated: Fri Oct 9 18:00:45 2015 Last change: Fri Oct 9 17:53:44 2015 by root via crm_resource on example-host
Stack: corosync
Current DC: example-host (version 1.1.13-a14efad) - partition with quorum
2 nodes and 2 resources configured
Online: [ example-host ]
GuestOnline: [ guest1@example-host ]
Full list of resources:
vm-guest1 (ocf::heartbeat:VirtualDomain): Started example-host
PCSD Status:
example-host: Online
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
----
=== Starting Resources on KVM Guest ===
The commands below demonstrate how resources can be executed on both the
guest node and the cluster node.
Create a few Dummy resources. Dummy resources are real resource agents used just for testing purposes. They actually execute on the host they are assigned to, just as an apache server or database would, except their execution simply means a file was created. When the resource is stopped, the file it created is removed.
----
# pcs resource create FAKE1 ocf:pacemaker:Dummy
# pcs resource create FAKE2 ocf:pacemaker:Dummy
# pcs resource create FAKE3 ocf:pacemaker:Dummy
# pcs resource create FAKE4 ocf:pacemaker:Dummy
# pcs resource create FAKE5 ocf:pacemaker:Dummy
----
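The behavior of the Dummy agent described above can be sketched in a few lines
of shell. This is a simplified illustration, not the real agent: the state-file
path below is hypothetical (the actual agent derives its default from the
resource ID, e.g. something like +/var/run/Dummy-FAKE1.state+).

```shell
#!/bin/sh
# Minimal sketch of what ocf:pacemaker:Dummy does on start/stop/monitor.
# Hypothetical path; the real agent computes a default from the resource ID.
STATE=/tmp/Dummy-FAKE1.state

dummy_start()   { touch "$STATE"; }
dummy_stop()    { rm -f "$STATE"; }
dummy_monitor() { if [ -f "$STATE" ]; then echo running; else echo stopped; fi; }

dummy_start
dummy_monitor
dummy_stop
dummy_monitor
```

Running the sketch prints +running+ then +stopped+, mirroring the monitor
results Pacemaker would see before and after stopping the resource.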
Now check your `pcs status` output. In the resource section, you should see
something like the following, where some of the resources started on the
cluster node, and some started on the guest node.
----
Full list of resources:
vm-guest1 (ocf::heartbeat:VirtualDomain): Started example-host
FAKE1 (ocf::pacemaker:Dummy): Started guest1
FAKE2 (ocf::pacemaker:Dummy): Started guest1
FAKE3 (ocf::pacemaker:Dummy): Started example-host
FAKE4 (ocf::pacemaker:Dummy): Started guest1
FAKE5 (ocf::pacemaker:Dummy): Started example-host
----
The guest node, *guest1*, reacts just like any other node in the cluster. For
example, pick out a resource that is running on your cluster node. For my
purposes, I am picking FAKE3 from the output above. We can force FAKE3 to run
on *guest1* in the exact same way we would any other node.
----
# pcs constraint location FAKE3 prefers guest1
----
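Behind the scenes, this command adds a simple location constraint to the CIB.
It should look roughly like the sketch below (the +id+ is generated by `pcs`
and may differ):

----
<rsc_location id="location-FAKE3-guest1-INFINITY" rsc="FAKE3" node="guest1" score="INFINITY"/>
----

An +INFINITY+ score means FAKE3 will always run on *guest1* when that node is
available.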
Now, looking at the bottom of the `pcs status` output you'll see FAKE3 is on
*guest1*.
----
Full list of resources:
vm-guest1 (ocf::heartbeat:VirtualDomain): Started example-host
FAKE1 (ocf::pacemaker:Dummy): Started guest1
FAKE2 (ocf::pacemaker:Dummy): Started guest1
FAKE3 (ocf::pacemaker:Dummy): Started guest1
FAKE4 (ocf::pacemaker:Dummy): Started example-host
FAKE5 (ocf::pacemaker:Dummy): Started example-host
----
=== Testing Recovery and Fencing ===
Pacemaker's policy engine is smart enough to know that fencing guest nodes
associated with a virtual machine means shutting off/rebooting the virtual
machine. No special configuration is necessary to make this happen. If you
are interested in testing this functionality out, try stopping the guest's
pacemaker_remote daemon. This is the equivalent of abruptly terminating a
cluster node's corosync membership without properly shutting it down.
SSH into the guest and run this command.
----
# kill -9 `pidof pacemaker_remoted`
----
Within a few seconds, your `pcs status` output will show a monitor failure,
and the *guest1* node will not be shown while it is being recovered.
----
# pcs status
Cluster name: mycluster
Last updated: Fri Oct 9 18:08:35 2015 Last change: Fri Oct 9 18:07:00 2015 by root via cibadmin on example-host
Stack: corosync
Current DC: example-host (version 1.1.13-a14efad) - partition with quorum
2 nodes and 7 resources configured
Online: [ example-host ]
Full list of resources:
vm-guest1 (ocf::heartbeat:VirtualDomain): Started example-host
FAKE1 (ocf::pacemaker:Dummy): Stopped
FAKE2 (ocf::pacemaker:Dummy): Stopped
FAKE3 (ocf::pacemaker:Dummy): Stopped
FAKE4 (ocf::pacemaker:Dummy): Started example-host
FAKE5 (ocf::pacemaker:Dummy): Started example-host
Failed Actions:
* guest1_monitor_30000 on example-host 'unknown error' (1): call=8, status=Error, exitreason='none',
last-rc-change='Fri Oct 9 18:08:29 2015', queued=0ms, exec=0ms
PCSD Status:
example-host: Online
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
----
[NOTE]
======
A guest node involves two resources: the one you explicitly configured creates the guest,
and Pacemaker creates an implicit resource for the pacemaker_remote connection, which
will be named the same as the value of the *remote-node* attribute of the explicit resource.
When we killed pacemaker_remote, it is the implicit resource that failed, which is why
the failed action starts with *guest1* and not *vm-guest1*.
======
Once recovery of the guest is complete, you'll see it automatically get
re-integrated into the cluster. The final `pcs status` output should look
something like this.
----
Cluster name: mycluster
Last updated: Fri Oct 9 18:18:30 2015 Last change: Fri Oct 9 18:07:00 2015 by root via cibadmin on example-host
Stack: corosync
Current DC: example-host (version 1.1.13-a14efad) - partition with quorum
2 nodes and 7 resources configured
Online: [ example-host ]
GuestOnline: [ guest1@example-host ]
Full list of resources:
vm-guest1 (ocf::heartbeat:VirtualDomain): Started example-host
FAKE1 (ocf::pacemaker:Dummy): Started guest1
FAKE2 (ocf::pacemaker:Dummy): Started guest1
FAKE3 (ocf::pacemaker:Dummy): Started guest1
FAKE4 (ocf::pacemaker:Dummy): Started example-host
FAKE5 (ocf::pacemaker:Dummy): Started example-host
Failed Actions:
* guest1_monitor_30000 on example-host 'unknown error' (1): call=8, status=Error, exitreason='none',
last-rc-change='Fri Oct 9 18:08:29 2015', queued=0ms, exec=0ms
PCSD Status:
example-host: Online
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
----
Normally, once you've investigated and addressed a failed action, you can clear the
failure. However, Pacemaker does not yet support cleanup for the implicitly
created connection resource while the explicit resource is active. If you want
to clear the failed action from the status output, stop the guest resource before
clearing it. For example:
----
# pcs resource disable vm-guest1 --wait
# pcs resource cleanup guest1
# pcs resource enable vm-guest1
----
=== Accessing Cluster Tools from Guest Node ===
Besides allowing the cluster to manage resources on a guest node,
pacemaker_remote has one other trick. The pacemaker_remote daemon allows
nearly all the pacemaker tools (`crm_resource`, `crm_mon`, `crm_attribute`,
`crm_master`, etc.) to work on guest nodes natively.
Try it: Run `crm_mon` on the guest after pacemaker has
integrated the guest node into the cluster. These tools just work. This
means resource agents, such as those implementing master/slave resources, that
need access to tools like `crm_master` work seamlessly on guest nodes.
Higher-level command shells such as `pcs` may have partial support
on guest nodes, but it is recommended to run them from a cluster node.
diff --git a/doc/Pacemaker_Remote/en-US/Revision_History.xml b/doc/Pacemaker_Remote/en-US/Revision_History.xml
index 1954f14d96..b3d1fd285d 100644
--- a/doc/Pacemaker_Remote/en-US/Revision_History.xml
+++ b/doc/Pacemaker_Remote/en-US/Revision_History.xml
@@ -1,42 +1,49 @@
<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE appendix PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
<!ENTITY % BOOK_ENTITIES SYSTEM "Pacemaker_Remote.ent">
%BOOK_ENTITIES;
]>
<appendix id="appe-Pacemaker_Remote-Revision_History">
+ <!-- see comment in Book_Info.xml for revision numbering -->
<title>Revision History</title>
<simpara>
<revhistory>
<revision>
<revnumber>1-0</revnumber>
<date>Tue Mar 19 2013</date>
<author><firstname>David</firstname><surname>Vossel</surname><email>davidvossel@gmail.com</email></author>
<revdescription><simplelist><member>Import from Pages.app</member></simplelist></revdescription>
</revision>
<revision>
<revnumber>2-0</revnumber>
<date>Tue May 13 2013</date>
<author><firstname>David</firstname><surname>Vossel</surname><email>davidvossel@gmail.com</email></author>
<revdescription><simplelist><member>Added Future Features Section</member></simplelist></revdescription>
</revision>
<revision>
<revnumber>3-0</revnumber>
<date>Fri Oct 18 2013</date>
<author><firstname>David</firstname><surname>Vossel</surname><email>davidvossel@gmail.com</email></author>
<revdescription><simplelist><member>Added Baremetal remote-node feature documentation</member></simplelist></revdescription>
</revision>
<revision>
<revnumber>4-0</revnumber>
<date>Tue Aug 25 2015</date>
<author><firstname>Ken</firstname><surname>Gaillot</surname><email>kgaillot@redhat.com</email></author>
<revdescription><simplelist><member>Targeted CentOS 7.1 and Pacemaker 1.1.12+, updated for current terminology and practice</member></simplelist></revdescription>
</revision>
<revision>
<revnumber>5-0</revnumber>
<date>Tue Dec 8 2015</date>
<author><firstname>Ken</firstname><surname>Gaillot</surname><email>kgaillot@redhat.com</email></author>
<revdescription><simplelist><member>Updated for Pacemaker 1.1.14</member></simplelist></revdescription>
</revision>
+ <revision>
+ <revnumber>6-0</revnumber>
+ <date>Tue May 3 2016</date>
+ <author><firstname>Ken</firstname><surname>Gaillot</surname><email>kgaillot@redhat.com</email></author>
+ <revdescription><simplelist><member>Updated for Pacemaker 1.1.15</member></simplelist></revdescription>
+ </revision>
</revhistory>
</simpara>
</appendix>
