diff --git a/cts/cli/regression.daemons.exp b/cts/cli/regression.daemons.exp
index c9843a017e..704354e02c 100644
--- a/cts/cli/regression.daemons.exp
+++ b/cts/cli/regression.daemons.exp
@@ -1,751 +1,751 @@
=#=#=#= Begin test: Get CIB manager metadata =#=#=#=
1.1
Cluster options used by Pacemaker's Cluster Information Base manager
Cluster Information Base manager options
Enable Access Control Lists (ACLs) for the CIB
Enable Access Control Lists (ACLs) for the CIB
Raise this if log has "Evicting client" messages for cluster daemon PIDs (a good value is the number of resources in the cluster multiplied by the number of nodes).
Maximum IPC message backlog before disconnecting a cluster daemon
=#=#=#= End test: Get CIB manager metadata - OK (0) =#=#=#=
* Passed: pacemaker-based - Get CIB manager metadata
=#=#=#= Begin test: Get controller metadata =#=#=#=
1.1
Cluster options used by Pacemaker's controller
Pacemaker controller options
Includes a hash which identifies the exact revision the code was built from. Used for diagnostic purposes.
Pacemaker version on cluster node elected Designated Controller (DC)
Used for informational and diagnostic purposes.
The messaging layer on which Pacemaker is currently running
This optional value is mostly for users' convenience as desired in administration, but may also be used in Pacemaker configuration rules via the #cluster-name node attribute, and by higher-level tools and resource agents.
An arbitrary name for the cluster
The optimal value will depend on the speed and load of your network and the type of switches used.
How long to wait for a response from other nodes during start-up
Pacemaker is primarily event-driven, and looks ahead to know when to recheck cluster state for failure-timeout settings and most time-based rules. However, it will also recheck the cluster after this amount of inactivity, to evaluate rules with date specifications and serve as a fail-safe for certain types of scheduler bugs. A value of 0 disables polling. A positive value sets an interval in seconds, unless other units are specified (for example, "5min").
Polling interval to recheck cluster state and evaluate rules with date specifications
A cluster node may receive notification of a "succeeded" fencing that targeted it if fencing is misconfigured, or if fabric fencing is in use that doesn't cut cluster communication. Use "stop" to attempt to immediately stop Pacemaker and stay stopped, or "panic" to attempt to immediately reboot the local node, falling back to stop on failure.
How a cluster node should react if notified of its own fencing
Declare an election failed if it is not decided within this much time. If you need to adjust this value, it probably indicates the presence of a bug.
Declare an election failed if it is not decided within this much time. If you need to adjust this value, it probably indicates the presence of a bug.
Exit immediately if shutdown does not complete within this much time. If you need to adjust this value, it probably indicates the presence of a bug.
Exit immediately if shutdown does not complete within this much time. If you need to adjust this value, it probably indicates the presence of a bug.
If you need to adjust this value, it probably indicates the presence of a bug.
If you need to adjust this value, it probably indicates the presence of a bug.
If you need to adjust this value, it probably indicates the presence of a bug.
If you need to adjust this value, it probably indicates the presence of a bug.
Delay cluster recovery for this much time to allow for additional events to occur. Useful if your configuration is sensitive to the order in which ping updates arrive.
Enabling this option will slow down cluster recovery under all conditions
If this is set to a positive value, lost nodes are assumed to achieve self-fencing using watchdog-based SBD within this much time. This does not require a fencing resource to be explicitly configured, though a fence_watchdog resource can be configured, to limit use to specific nodes. If this is set to 0 (the default), the cluster will never assume watchdog-based self-fencing. If this is set to a negative value, the cluster will use twice the local value of the `SBD_WATCHDOG_TIMEOUT` environment variable if that is positive, or otherwise treat this as 0. WARNING: When used, this timeout must be larger than `SBD_WATCHDOG_TIMEOUT` on all nodes that use watchdog-based SBD, and Pacemaker will refuse to start on any of those nodes where this is not true for the local value or SBD is not active. When this is set to a negative value, `SBD_WATCHDOG_TIMEOUT` must be set to the same value on all nodes that use SBD, otherwise data corruption or loss could occur.
How long before nodes can be assumed to be safely down when watchdog-based self-fencing via SBD is in use
How many times fencing can fail before it will no longer be immediately re-attempted on a target
How many times fencing can fail before it will no longer be immediately re-attempted on a target
The cluster will slow down its recovery process when the amount of system resources used (currently CPU) approaches this limit
Maximum amount of system load that should be used by cluster nodes
Maximum number of jobs that can be scheduled per node (defaults to 2x cores)
Maximum number of jobs that can be scheduled per node (defaults to 2x cores)
=#=#=#= End test: Get controller metadata - OK (0) =#=#=#=
* Passed: pacemaker-controld - Get controller metadata
=#=#=#= Begin test: Get fencer metadata =#=#=#=
1.1
Instance attributes available for all "stonith"-class resources and used by Pacemaker's fence daemon, formerly known as stonithd
Instance attributes available for all "stonith"-class resources
Some devices do not support the standard 'port' parameter or may provide additional ones. Use this to specify an alternate, device-specific, parameter that should indicate the machine to be fenced. A value of "none" can be used to tell the cluster not to supply any additional parameters.
An alternate parameter to supply instead of 'port'
For example, "node1:1;node2:2,3" would tell the cluster to use port 1 for node1 and ports 2 and 3 for node2.
A mapping of node names to port numbers for devices that do not support node names.
- For example, "node1,node2,node3".
+ Comma-separated list of nodes that can be targeted by this device (for example, "node1,node2,node3"). If pcmk_host_check is "static-list", either this or pcmk_host_map must be set.
- A list of nodes that can be targeted by this device (optional unless pcmk_host_list="static-list")
+ Nodes targeted by this device
Use "dynamic-list" to query the device via the 'list' command; "static-list" to check the pcmk_host_list attribute; "status" to query the device via the 'status' command; or "none" to assume every device can fence every node. The default value is "static-list" if pcmk_host_map or pcmk_host_list is set; otherwise "dynamic-list" if the device supports the list operation; otherwise "status" if the device supports the status operation; otherwise "none"
How to determine which nodes can be targeted by the device
Enable a delay of no more than the time specified before executing fencing actions. Pacemaker derives the overall delay by taking the value of pcmk_delay_base and adding a random delay value such that the sum is kept below this maximum.
Enable a delay of no more than the time specified before executing fencing actions.
This enables a static delay for fencing actions, which can help avoid "death matches" where two nodes try to fence each other at the same time. If pcmk_delay_max is also used, a random delay will be added such that the total delay is kept below that value. This can be set to a single time value to apply to any node targeted by this device (useful if a separate device is configured for each target), or to a node map (for example, "node1:1s;node2:5") to set a different value for each target.
Enable a base delay for fencing actions and specify base delay value.
Cluster property concurrent-fencing="true" needs to be configured first. Then use this to specify the maximum number of actions can be performed in parallel on this device. A value of -1 means an unlimited number of actions can be performed in parallel.
The maximum number of actions can be performed in parallel on this device
Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific, command that implements the 'reboot' action.
An alternate command to run instead of 'reboot'
Some devices need much more/less time to complete than normal. Use this to specify an alternate, device-specific, timeout for 'reboot' actions.
Specify an alternate timeout to use for 'reboot' actions instead of stonith-timeout
Some devices do not support multiple connections. Operations may "fail" if the device is busy with another task. In that case, Pacemaker will automatically retry the operation if there is time remaining. Use this option to alter the number of times Pacemaker tries a 'reboot' action before giving up.
The maximum number of times to try the 'reboot' command within the timeout period
Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific, command that implements the 'off' action.
An alternate command to run instead of 'off'
Some devices need much more/less time to complete than normal. Use this to specify an alternate, device-specific, timeout for 'off' actions.
Specify an alternate timeout to use for 'off' actions instead of stonith-timeout
Some devices do not support multiple connections. Operations may "fail" if the device is busy with another task. In that case, Pacemaker will automatically retry the operation if there is time remaining. Use this option to alter the number of times Pacemaker tries a 'off' action before giving up.
The maximum number of times to try the 'off' command within the timeout period
Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific, command that implements the 'on' action.
An alternate command to run instead of 'on'
Some devices need much more/less time to complete than normal. Use this to specify an alternate, device-specific, timeout for 'on' actions.
Specify an alternate timeout to use for 'on' actions instead of stonith-timeout
Some devices do not support multiple connections. Operations may "fail" if the device is busy with another task. In that case, Pacemaker will automatically retry the operation if there is time remaining. Use this option to alter the number of times Pacemaker tries a 'on' action before giving up.
The maximum number of times to try the 'on' command within the timeout period
Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific, command that implements the 'list' action.
An alternate command to run instead of 'list'
Some devices need much more/less time to complete than normal. Use this to specify an alternate, device-specific, timeout for 'list' actions.
Specify an alternate timeout to use for 'list' actions instead of stonith-timeout
Some devices do not support multiple connections. Operations may "fail" if the device is busy with another task. In that case, Pacemaker will automatically retry the operation if there is time remaining. Use this option to alter the number of times Pacemaker tries a 'list' action before giving up.
The maximum number of times to try the 'list' command within the timeout period
Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific, command that implements the 'monitor' action.
An alternate command to run instead of 'monitor'
Some devices need much more/less time to complete than normal. Use this to specify an alternate, device-specific, timeout for 'monitor' actions.
Specify an alternate timeout to use for 'monitor' actions instead of stonith-timeout
Some devices do not support multiple connections. Operations may "fail" if the device is busy with another task. In that case, Pacemaker will automatically retry the operation if there is time remaining. Use this option to alter the number of times Pacemaker tries a 'monitor' action before giving up.
The maximum number of times to try the 'monitor' command within the timeout period
Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific, command that implements the 'status' action.
An alternate command to run instead of 'status'
Some devices need much more/less time to complete than normal. Use this to specify an alternate, device-specific, timeout for 'status' actions.
Specify an alternate timeout to use for 'status' actions instead of stonith-timeout
Some devices do not support multiple connections. Operations may "fail" if the device is busy with another task. In that case, Pacemaker will automatically retry the operation if there is time remaining. Use this option to alter the number of times Pacemaker tries a 'status' action before giving up.
The maximum number of times to try the 'status' command within the timeout period
=#=#=#= End test: Get fencer metadata - OK (0) =#=#=#=
* Passed: pacemaker-fenced - Get fencer metadata
=#=#=#= Begin test: Get scheduler metadata =#=#=#=
1.1
Cluster options used by Pacemaker's scheduler
Pacemaker scheduler options
What to do when the cluster does not have quorum
What to do when the cluster does not have quorum
When true, resources active on a node when it is cleanly shut down are kept "locked" to that node (not allowed to run elsewhere) until they start again on that node after it rejoins (or for at most shutdown-lock-limit, if set). Stonith resources and Pacemaker Remote connections are never locked. Clone and bundle instances and the promoted role of promotable clones are currently never locked, though support could be added in a future release.
Whether to lock resources to a cleanly shut down node
If shutdown-lock is true and this is set to a nonzero time duration, shutdown locks will expire after this much time has passed since the shutdown was initiated, even if the node has not rejoined.
Do not lock resources to a cleanly shut down node longer than this
Whether resources can run on any node by default
Whether resources can run on any node by default
Whether the cluster should refrain from monitoring, starting, and stopping resources
Whether the cluster should refrain from monitoring, starting, and stopping resources
When true, the cluster will immediately ban a resource from a node if it fails to start there. When false, the cluster will instead check the resource's fail count against its migration-threshold.
Whether a start failure should prevent a resource from being recovered on the same node
Whether the cluster should check for active resources during start-up
Whether the cluster should check for active resources during start-up
If false, unresponsive nodes are immediately assumed to be harmless, and resources that were active on them may be recovered elsewhere. This can result in a "split-brain" situation, potentially leading to data loss and/or service unavailability.
Whether nodes may be fenced as part of recovery
Action to send to fence device when a node needs to be fenced ("poweroff" is a deprecated alias for "off")
Action to send to fence device when a node needs to be fenced ("poweroff" is a deprecated alias for "off")
How long to wait for on, off, and reboot fence actions to complete by default
How long to wait for on, off, and reboot fence actions to complete by default
This is set automatically by the cluster according to whether SBD is detected to be in use. User-configured values are ignored. The value `true` is meaningful if diskless SBD is used and `stonith-watchdog-timeout` is nonzero. In that case, if fencing is required, watchdog-based self-fencing will be performed via SBD without requiring a fencing resource explicitly configured.
Whether watchdog integration is enabled
Allow performing fencing operations in parallel
Allow performing fencing operations in parallel
Setting this to false may lead to a "split-brain" situation, potentially leading to data loss and/or service unavailability.
Whether to fence unseen nodes at start-up
Apply specified delay for the fencings that are targeting the lost nodes with the highest total resource priority in case we don't have the majority of the nodes in our cluster partition, so that the more significant nodes potentially win any fencing match, which is especially meaningful under split-brain of 2-node cluster. A promoted resource instance takes the base priority + 1 on calculation if the base priority is not 0. Any static/random delays that are introduced by `pcmk_delay_base/max` configured for the corresponding fencing resources will be added to this delay. This delay should be significantly greater than, safely twice, the maximum `pcmk_delay_base/max`. By default, priority fencing delay is disabled.
Apply fencing delay targeting the lost nodes with the highest total resource priority
Fence nodes that do not join the controller process group within this much time after joining the cluster, to allow the cluster to continue managing resources. A value of 0 means never fence pending nodes. Setting the value to 2h means fence nodes after 2 hours.
How long to wait for a node that has joined the cluster to join the controller process group
The node elected Designated Controller (DC) will consider an action failed if it does not get a response from the node executing the action within this time (after considering the action's own timeout). The "correct" value will depend on the speed and load of your network and cluster nodes.
Maximum time for node-to-node communication
The "correct" value will depend on the speed and load of your network and cluster nodes. If set to 0, the cluster will impose a dynamically calculated limit when any node has a high load.
Maximum number of jobs that the cluster may execute in parallel across all nodes
The number of live migration actions that the cluster is allowed to execute in parallel on a node (-1 means no limit)
The number of live migration actions that the cluster is allowed to execute in parallel on a node (-1 means no limit)
Whether the cluster should stop all active resources
Whether the cluster should stop all active resources
Whether to stop resources that were removed from the configuration
Whether to stop resources that were removed from the configuration
Whether to cancel recurring actions removed from the configuration
Whether to cancel recurring actions removed from the configuration
Values other than default are poorly tested and potentially dangerous.
Whether to remove stopped resources from the executor
Zero to disable, -1 to store unlimited.
The number of scheduler inputs resulting in errors to save
Zero to disable, -1 to store unlimited.
The number of scheduler inputs resulting in warnings to save
Zero to disable, -1 to store unlimited.
The number of scheduler inputs without errors or warnings to save
Requires external entities to create node attributes (named with the prefix "#health") with values "red", "yellow", or "green".
How cluster should react to node health attributes
Only used when "node-health-strategy" is set to "progressive".
Base health score assigned to a node
Only used when "node-health-strategy" is set to "custom" or "progressive".
The score to use for a node health attribute whose value is "green"
Only used when "node-health-strategy" is set to "custom" or "progressive".
The score to use for a node health attribute whose value is "yellow"
Only used when "node-health-strategy" is set to "custom" or "progressive".
The score to use for a node health attribute whose value is "red"
How the cluster should allocate resources to nodes
How the cluster should allocate resources to nodes
=#=#=#= End test: Get scheduler metadata - OK (0) =#=#=#=
* Passed: pacemaker-schedulerd - Get scheduler metadata
diff --git a/daemons/fenced/pacemaker-fenced.c b/daemons/fenced/pacemaker-fenced.c
index 1b95136586..e71369d4ed 100644
--- a/daemons/fenced/pacemaker-fenced.c
+++ b/daemons/fenced/pacemaker-fenced.c
@@ -1,976 +1,977 @@
/*
* Copyright 2009-2024 the Pacemaker project contributors
*
* The version control history for this file may have further details.
*
* This source code is licensed under the GNU General Public License version 2
* or later (GPLv2+) WITHOUT ANY WARRANTY.
*/
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include <inttypes.h> // PRIu32, PRIx32
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#define SUMMARY "daemon for executing fencing devices in a Pacemaker cluster"
char *stonith_our_uname = NULL;
long long stonith_watchdog_timeout_ms = 0;
GList *stonith_watchdog_targets = NULL;
static GMainLoop *mainloop = NULL;
gboolean stand_alone = FALSE;
gboolean stonith_shutdown_flag = FALSE;
static qb_ipcs_service_t *ipcs = NULL;
static pcmk__output_t *out = NULL;
pcmk__supported_format_t formats[] = {
PCMK__SUPPORTED_FORMAT_NONE,
PCMK__SUPPORTED_FORMAT_TEXT,
PCMK__SUPPORTED_FORMAT_XML,
{ NULL, NULL, NULL }
};
static struct {
bool no_cib_connect;
gchar **log_files;
} options;
crm_exit_t exit_code = CRM_EX_OK;
static void stonith_cleanup(void);
static int32_t
st_ipc_accept(qb_ipcs_connection_t * c, uid_t uid, gid_t gid)
{
if (stonith_shutdown_flag) {
crm_info("Ignoring new client [%d] during shutdown",
pcmk__client_pid(c));
return -ECONNREFUSED;
}
if (pcmk__new_client(c, uid, gid) == NULL) {
return -ENOMEM;
}
return 0;
}
/* Exit code means? */
static int32_t
st_ipc_dispatch(qb_ipcs_connection_t * qbc, void *data, size_t size)
{
uint32_t id = 0;
uint32_t flags = 0;
int call_options = 0;
xmlNode *request = NULL;
pcmk__client_t *c = pcmk__find_client(qbc);
const char *op = NULL;
if (c == NULL) {
crm_info("Invalid client: %p", qbc);
return 0;
}
request = pcmk__client_data2xml(c, data, &id, &flags);
if (request == NULL) {
pcmk__ipc_send_ack(c, id, flags, PCMK__XE_NACK, NULL, CRM_EX_PROTOCOL);
return 0;
}
op = crm_element_value(request, PCMK__XA_CRM_TASK);
if(pcmk__str_eq(op, CRM_OP_RM_NODE_CACHE, pcmk__str_casei)) {
crm_xml_add(request, PCMK__XA_T, PCMK__VALUE_STONITH_NG);
crm_xml_add(request, PCMK__XA_ST_OP, op);
crm_xml_add(request, PCMK__XA_ST_CLIENTID, c->id);
crm_xml_add(request, PCMK__XA_ST_CLIENTNAME, pcmk__client_name(c));
crm_xml_add(request, PCMK__XA_ST_CLIENTNODE, stonith_our_uname);
send_cluster_message(NULL, crm_msg_stonith_ng, request, FALSE);
free_xml(request);
return 0;
}
if (c->name == NULL) {
const char *value = crm_element_value(request, PCMK__XA_ST_CLIENTNAME);
c->name = crm_strdup_printf("%s.%u", pcmk__s(value, "unknown"), c->pid);
}
crm_element_value_int(request, PCMK__XA_ST_CALLOPT, &call_options);
crm_trace("Flags %#08" PRIx32 "/%#08x for command %" PRIu32
" from client %s", flags, call_options, id, pcmk__client_name(c));
if (pcmk_is_set(call_options, st_opt_sync_call)) {
CRM_ASSERT(flags & crm_ipc_client_response);
CRM_LOG_ASSERT(c->request_id == 0); /* This means the client has two synchronous events in-flight */
c->request_id = id; /* Reply only to the last one */
}
crm_xml_add(request, PCMK__XA_ST_CLIENTID, c->id);
crm_xml_add(request, PCMK__XA_ST_CLIENTNAME, pcmk__client_name(c));
crm_xml_add(request, PCMK__XA_ST_CLIENTNODE, stonith_our_uname);
crm_log_xml_trace(request, "ipc-received");
stonith_command(c, id, flags, request, NULL);
free_xml(request);
return 0;
}
/* Error code means? */
static int32_t
st_ipc_closed(qb_ipcs_connection_t * c)
{
pcmk__client_t *client = pcmk__find_client(c);
if (client == NULL) {
return 0;
}
crm_trace("Connection %p closed", c);
pcmk__free_client(client);
/* 0 means: yes, go ahead and destroy the connection */
return 0;
}
static void
st_ipc_destroy(qb_ipcs_connection_t * c)
{
crm_trace("Connection %p destroyed", c);
st_ipc_closed(c);
}
static void
stonith_peer_callback(xmlNode * msg, void *private_data)
{
const char *remote_peer = crm_element_value(msg, PCMK__XA_SRC);
const char *op = crm_element_value(msg, PCMK__XA_ST_OP);
if (pcmk__str_eq(op, STONITH_OP_POKE, pcmk__str_none)) {
return;
}
crm_log_xml_trace(msg, "Peer[inbound]");
stonith_command(NULL, 0, 0, msg, remote_peer);
}
#if SUPPORT_COROSYNC
static void
stonith_peer_ais_callback(cpg_handle_t handle,
const struct cpg_name *groupName,
uint32_t nodeid, uint32_t pid, void *msg, size_t msg_len)
{
uint32_t kind = 0;
xmlNode *xml = NULL;
const char *from = NULL;
char *data = pcmk_message_common_cs(handle, nodeid, pid, msg, &kind, &from);
if(data == NULL) {
return;
}
if (kind == crm_class_cluster) {
xml = string2xml(data);
if (xml == NULL) {
crm_err("Invalid XML: '%.120s'", data);
free(data);
return;
}
crm_xml_add(xml, PCMK__XA_SRC, from);
stonith_peer_callback(xml, NULL);
}
free_xml(xml);
free(data);
return;
}
static void
stonith_peer_cs_destroy(gpointer user_data)
{
crm_crit("Lost connection to cluster layer, shutting down");
stonith_shutdown(0);
}
#endif
void
do_local_reply(const xmlNode *notify_src, pcmk__client_t *client,
int call_options)
{
/* send callback to originating child */
int local_rc = pcmk_rc_ok;
int rid = 0;
uint32_t ipc_flags = crm_ipc_server_event;
if (pcmk_is_set(call_options, st_opt_sync_call)) {
CRM_LOG_ASSERT(client->request_id);
rid = client->request_id;
client->request_id = 0;
ipc_flags = crm_ipc_flags_none;
}
local_rc = pcmk__ipc_send_xml(client, rid, notify_src, ipc_flags);
if (local_rc == pcmk_rc_ok) {
crm_trace("Sent response %d to client %s",
rid, pcmk__client_name(client));
} else {
crm_warn("%synchronous reply to client %s failed: %s",
(pcmk_is_set(call_options, st_opt_sync_call)? "S" : "As"),
pcmk__client_name(client), pcmk_rc_str(local_rc));
}
}
uint64_t
get_stonith_flag(const char *name)
{
if (pcmk__str_eq(name, T_STONITH_NOTIFY_FENCE, pcmk__str_casei)) {
return st_callback_notify_fence;
} else if (pcmk__str_eq(name, STONITH_OP_DEVICE_ADD, pcmk__str_casei)) {
return st_callback_device_add;
} else if (pcmk__str_eq(name, STONITH_OP_DEVICE_DEL, pcmk__str_casei)) {
return st_callback_device_del;
} else if (pcmk__str_eq(name, T_STONITH_NOTIFY_HISTORY, pcmk__str_casei)) {
return st_callback_notify_history;
} else if (pcmk__str_eq(name, T_STONITH_NOTIFY_HISTORY_SYNCED, pcmk__str_casei)) {
return st_callback_notify_history_synced;
}
return st_callback_unknown;
}
static void
stonith_notify_client(gpointer key, gpointer value, gpointer user_data)
{
const xmlNode *update_msg = user_data;
pcmk__client_t *client = value;
const char *type = NULL;
CRM_CHECK(client != NULL, return);
CRM_CHECK(update_msg != NULL, return);
type = crm_element_value(update_msg, PCMK__XA_SUBT);
CRM_CHECK(type != NULL, crm_log_xml_err(update_msg, "notify"); return);
if (client->ipcs == NULL) {
crm_trace("Skipping client with NULL channel");
return;
}
if (pcmk_is_set(client->flags, get_stonith_flag(type))) {
int rc = pcmk__ipc_send_xml(client, 0, update_msg,
crm_ipc_server_event);
if (rc != pcmk_rc_ok) {
crm_warn("%s notification of client %s failed: %s "
CRM_XS " id=%.8s rc=%d", type, pcmk__client_name(client),
pcmk_rc_str(rc), client->id, rc);
} else {
crm_trace("Sent %s notification to client %s",
type, pcmk__client_name(client));
}
}
}
void
do_stonith_async_timeout_update(const char *client_id, const char *call_id, int timeout)
{
pcmk__client_t *client = NULL;
xmlNode *notify_data = NULL;
if (!timeout || !call_id || !client_id) {
return;
}
client = pcmk__find_client_by_id(client_id);
if (!client) {
return;
}
notify_data = create_xml_node(NULL, PCMK__XE_ST_ASYNC_TIMEOUT_VALUE);
crm_xml_add(notify_data, PCMK__XA_T, PCMK__VALUE_ST_ASYNC_TIMEOUT_VALUE);
crm_xml_add(notify_data, PCMK__XA_ST_CALLID, call_id);
crm_xml_add_int(notify_data, PCMK__XA_ST_TIMEOUT, timeout);
crm_trace("timeout update is %d for client %s and call id %s", timeout, client_id, call_id);
if (client) {
pcmk__ipc_send_xml(client, 0, notify_data, crm_ipc_server_event);
}
free_xml(notify_data);
}
/*!
* \internal
* \brief Notify relevant IPC clients of a fencing operation result
*
* \param[in] type Notification type
* \param[in] result Result of fencing operation (assume success if NULL)
* \param[in] data If not NULL, add to notification as call data
*/
void
fenced_send_notification(const char *type, const pcmk__action_result_t *result,
xmlNode *data)
{
/* TODO: Standardize the contents of data */
xmlNode *update_msg = create_xml_node(NULL, PCMK__XE_NOTIFY);
CRM_LOG_ASSERT(type != NULL);
crm_xml_add(update_msg, PCMK__XA_T, PCMK__VALUE_ST_NOTIFY);
crm_xml_add(update_msg, PCMK__XA_SUBT, type);
crm_xml_add(update_msg, PCMK__XA_ST_OP, type);
stonith__xe_set_result(update_msg, result);
if (data != NULL) {
add_message_xml(update_msg, PCMK__XA_ST_CALLDATA, data);
}
crm_trace("Notifying clients");
pcmk__foreach_ipc_client(stonith_notify_client, update_msg);
free_xml(update_msg);
crm_trace("Notify complete");
}
/*!
* \internal
* \brief Send notifications for a configuration change to subscribed clients
*
* \param[in] op Notification type (\c STONITH_OP_DEVICE_ADD,
* \c STONITH_OP_DEVICE_DEL, \c STONITH_OP_LEVEL_ADD, or
* \c STONITH_OP_LEVEL_DEL)
* \param[in] result Operation result
* \param[in] desc Description of what changed (either device ID or string
* representation of level
* ([]))
*/
void
fenced_send_config_notification(const char *op,
const pcmk__action_result_t *result,
const char *desc)
{
xmlNode *notify_data = create_xml_node(NULL, op);
CRM_CHECK(notify_data != NULL, return);
crm_xml_add(notify_data, PCMK__XA_ST_DEVICE_ID, desc);
fenced_send_notification(op, result, notify_data);
free_xml(notify_data);
}
/*!
* \internal
* \brief Check whether a node does watchdog-fencing
*
* \param[in] node Name of node to check
*
* \return TRUE if node found in stonith_watchdog_targets
* or stonith_watchdog_targets is empty indicating
* all nodes are doing watchdog-fencing
*/
gboolean
node_does_watchdog_fencing(const char *node)
{
return ((stonith_watchdog_targets == NULL) ||
pcmk__str_in_list(node, stonith_watchdog_targets, pcmk__str_casei));
}
void
stonith_shutdown(int nsig)
{
crm_info("Terminating with %d clients", pcmk__ipc_client_count());
stonith_shutdown_flag = TRUE;
if (mainloop != NULL && g_main_loop_is_running(mainloop)) {
g_main_loop_quit(mainloop);
}
}
static void
stonith_cleanup(void)
{
fenced_cib_cleanup();
if (ipcs) {
qb_ipcs_destroy(ipcs);
}
crm_peer_destroy();
pcmk__client_cleanup();
free_stonith_remote_op_list();
free_topology_list();
free_device_list();
free_metadata_cache();
fenced_unregister_handlers();
free(stonith_our_uname);
stonith_our_uname = NULL;
}
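/*!
* \internal
* \brief Handle the --stand-alone-w-cpg option
*
* Connect to the cluster layer as usual, but skip the CIB connection
* (intended for use in regression testing only)
*/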
static gboolean
stand_alone_cpg_cb(const gchar *option_name, const gchar *optarg, gpointer data,
GError **error)
{
stand_alone = FALSE;
options.no_cib_connect = true;
return TRUE;
}
struct qb_ipcs_service_handlers ipc_callbacks = {
.connection_accept = st_ipc_accept,
.connection_created = NULL,
.msg_process = st_ipc_dispatch,
.connection_closed = st_ipc_closed,
.connection_destroyed = st_ipc_destroy
};
/*!
* \internal
* \brief Callback for peer status changes
*
* \param[in] type What changed
* \param[in] node What peer had the change
* \param[in] data Previous value of what changed
*/
static void
st_peer_update_callback(enum crm_status_type type, crm_node_t * node, const void *data)
{
if ((type != crm_status_processes)
&& !pcmk_is_set(node->flags, crm_remote_node)) {
/*
* This is a hack until we can send to a nodeid and/or we fix node name lookups
* These messages are ignored in stonith_peer_callback()
*/
xmlNode *query = create_xml_node(NULL, PCMK__XE_STONITH_COMMAND);
crm_xml_add(query, PCMK__XA_T, PCMK__VALUE_STONITH_NG);
crm_xml_add(query, PCMK__XA_ST_OP, STONITH_OP_POKE);
crm_debug("Broadcasting our uname because of node %u", node->id);
send_cluster_message(NULL, crm_msg_stonith_ng, query, FALSE);
free_xml(query);
}
}
static pcmk__cluster_option_t fencer_options[] = {
/* name, old name, type, allowed values,
* default value, validator,
* flags,
* short description,
* long description
*/
{
PCMK_STONITH_HOST_ARGUMENT, NULL, "string", NULL,
"port", NULL,
pcmk__opt_advanced,
N_("An alternate parameter to supply instead of 'port'"),
N_("Some devices do not support the standard 'port' parameter or may "
"provide additional ones. Use this to specify an alternate, device-"
"specific, parameter that should indicate the machine to be "
"fenced. A value of \"none\" can be used to tell the cluster not "
"to supply any additional parameters."),
},
{
PCMK_STONITH_HOST_MAP, NULL, "string", NULL,
NULL, NULL,
pcmk__opt_none,
N_("A mapping of node names to port numbers for devices that do not "
"support node names."),
N_("For example, \"node1:1;node2:2,3\" would tell the cluster to use "
"port 1 for node1 and ports 2 and 3 for node2."),
},
{
PCMK_STONITH_HOST_LIST, NULL, "string", NULL,
NULL, NULL,
pcmk__opt_none,
- N_("A list of nodes that can be targeted by this device (optional "
- "unless pcmk_host_list=\"static-list\")"),
- N_("For example, \"node1,node2,node3\"."),
+ N_("Nodes targeted by this device"),
+ N_("Comma-separated list of nodes that can be targeted by this device "
+ "(for example, \"node1,node2,node3\"). If pcmk_host_check is "
+ "\"static-list\", either this or pcmk_host_map must be set."),
},
{
PCMK_STONITH_HOST_CHECK, NULL, "select",
"dynamic-list, static-list, status, none",
NULL, NULL,
pcmk__opt_none,
N_("How to determine which nodes can be targeted by the device"),
N_("Use \"dynamic-list\" to query the device via the 'list' command; "
"\"static-list\" to check the pcmk_host_list attribute; "
"\"status\" to query the device via the 'status' command; or "
"\"none\" to assume every device can fence every node. "
"The default value is \"static-list\" if pcmk_host_map or "
"pcmk_host_list is set; otherwise \"dynamic-list\" if the device "
"supports the list operation; otherwise \"status\" if the device "
"supports the status operation; otherwise \"none\""),
},
{
PCMK_STONITH_DELAY_MAX, NULL, "time", NULL,
"0s", NULL,
pcmk__opt_none,
N_("Enable a delay of no more than the time specified before executing "
"fencing actions."),
N_("Enable a delay of no more than the time specified before executing "
"fencing actions. Pacemaker derives the overall delay by taking "
"the value of pcmk_delay_base and adding a random delay value such "
"that the sum is kept below this maximum."),
},
{
PCMK_STONITH_DELAY_BASE, NULL, "string", NULL,
"0s", NULL,
pcmk__opt_none,
N_("Enable a base delay for fencing actions and specify base delay "
"value."),
N_("This enables a static delay for fencing actions, which can help "
"avoid \"death matches\" where two nodes try to fence each other "
"at the same time. If pcmk_delay_max is also used, a random delay "
"will be added such that the total delay is kept below that value. "
"This can be set to a single time value to apply to any node "
"targeted by this device (useful if a separate device is "
"configured for each target), or to a node map (for example, "
"\"node1:1s;node2:5\") to set a different value for each target."),
},
{
PCMK_STONITH_ACTION_LIMIT, NULL, "integer", NULL,
"1", NULL,
pcmk__opt_none,
N_("The maximum number of actions can be performed in parallel on this "
"device"),
N_("Cluster property concurrent-fencing=\"true\" needs to be "
"configured first. Then use this to specify the maximum number of "
"actions can be performed in parallel on this device. A value of "
"-1 means an unlimited number of actions can be performed in "
"parallel."),
},
{
"pcmk_reboot_action", NULL, "string", NULL,
PCMK_ACTION_REBOOT, NULL,
pcmk__opt_advanced,
N_("An alternate command to run instead of 'reboot'"),
N_("Some devices do not support the standard commands or may provide "
"additional ones. Use this to specify an alternate, device-"
"specific, command that implements the 'reboot' action."),
},
{
"pcmk_reboot_timeout", NULL, "time", NULL,
"60s", NULL,
pcmk__opt_advanced,
N_("Specify an alternate timeout to use for 'reboot' actions instead "
"of stonith-timeout"),
N_("Some devices need much more/less time to complete than normal. "
"Use this to specify an alternate, device-specific, timeout for "
"'reboot' actions."),
},
{
"pcmk_reboot_retries", NULL, "integer", NULL,
"2", NULL,
pcmk__opt_advanced,
N_("The maximum number of times to try the 'reboot' command within the "
"timeout period"),
N_("Some devices do not support multiple connections. Operations may "
"\"fail\" if the device is busy with another task. In that case, "
"Pacemaker will automatically retry the operation if there is time "
"remaining. Use this option to alter the number of times Pacemaker "
"tries a 'reboot' action before giving up."),
},
{
"pcmk_off_action", NULL, "string", NULL,
PCMK_ACTION_OFF, NULL,
pcmk__opt_advanced,
N_("An alternate command to run instead of 'off'"),
N_("Some devices do not support the standard commands or may provide "
"additional ones. Use this to specify an alternate, device-"
"specific, command that implements the 'off' action."),
},
{
"pcmk_off_timeout", NULL, "time", NULL,
"60s", NULL,
pcmk__opt_advanced,
N_("Specify an alternate timeout to use for 'off' actions instead of "
"stonith-timeout"),
N_("Some devices need much more/less time to complete than normal. "
"Use this to specify an alternate, device-specific, timeout for "
"'off' actions."),
},
{
"pcmk_off_retries", NULL, "integer", NULL,
"2", NULL,
pcmk__opt_advanced,
N_("The maximum number of times to try the 'off' command within the "
"timeout period"),
N_("Some devices do not support multiple connections. Operations may "
"\"fail\" if the device is busy with another task. In that case, "
"Pacemaker will automatically retry the operation if there is time "
"remaining. Use this option to alter the number of times Pacemaker "
"tries a 'off' action before giving up."),
},
{
"pcmk_on_action", NULL, "string", NULL,
PCMK_ACTION_ON, NULL,
pcmk__opt_advanced,
N_("An alternate command to run instead of 'on'"),
N_("Some devices do not support the standard commands or may provide "
"additional ones. Use this to specify an alternate, device-"
"specific, command that implements the 'on' action."),
},
{
"pcmk_on_timeout", NULL, "time", NULL,
"60s", NULL,
pcmk__opt_advanced,
N_("Specify an alternate timeout to use for 'on' actions instead of "
"stonith-timeout"),
N_("Some devices need much more/less time to complete than normal. "
"Use this to specify an alternate, device-specific, timeout for "
"'on' actions."),
},
{
"pcmk_on_retries", NULL, "integer", NULL,
"2", NULL,
pcmk__opt_advanced,
N_("The maximum number of times to try the 'on' command within the "
"timeout period"),
N_("Some devices do not support multiple connections. Operations may "
"\"fail\" if the device is busy with another task. In that case, "
"Pacemaker will automatically retry the operation if there is time "
"remaining. Use this option to alter the number of times Pacemaker "
"tries a 'on' action before giving up."),
},
{
"pcmk_list_action", NULL, "string", NULL,
PCMK_ACTION_LIST, NULL,
pcmk__opt_advanced,
N_("An alternate command to run instead of 'list'"),
N_("Some devices do not support the standard commands or may provide "
"additional ones. Use this to specify an alternate, device-"
"specific, command that implements the 'list' action."),
},
{
"pcmk_list_timeout", NULL, "time", NULL,
"60s", NULL,
pcmk__opt_advanced,
N_("Specify an alternate timeout to use for 'list' actions instead of "
"stonith-timeout"),
N_("Some devices need much more/less time to complete than normal. "
"Use this to specify an alternate, device-specific, timeout for "
"'list' actions."),
},
{
"pcmk_list_retries", NULL, "integer", NULL,
"2", NULL,
pcmk__opt_advanced,
N_("The maximum number of times to try the 'list' command within the "
"timeout period"),
N_("Some devices do not support multiple connections. Operations may "
"\"fail\" if the device is busy with another task. In that case, "
"Pacemaker will automatically retry the operation if there is time "
"remaining. Use this option to alter the number of times Pacemaker "
"tries a 'list' action before giving up."),
},
{
"pcmk_monitor_action", NULL, "string", NULL,
PCMK_ACTION_MONITOR, NULL,
pcmk__opt_advanced,
N_("An alternate command to run instead of 'monitor'"),
N_("Some devices do not support the standard commands or may provide "
"additional ones. Use this to specify an alternate, device-"
"specific, command that implements the 'monitor' action."),
},
{
"pcmk_monitor_timeout", NULL, "time", NULL,
"60s", NULL,
pcmk__opt_advanced,
N_("Specify an alternate timeout to use for 'monitor' actions instead "
"of stonith-timeout"),
N_("Some devices need much more/less time to complete than normal. "
"Use this to specify an alternate, device-specific, timeout for "
"'monitor' actions."),
},
{
"pcmk_monitor_retries", NULL, "integer", NULL,
"2", NULL,
pcmk__opt_advanced,
N_("The maximum number of times to try the 'monitor' command within "
"the timeout period"),
N_("Some devices do not support multiple connections. Operations may "
"\"fail\" if the device is busy with another task. In that case, "
"Pacemaker will automatically retry the operation if there is time "
"remaining. Use this option to alter the number of times Pacemaker "
"tries a 'monitor' action before giving up."),
},
{
"pcmk_status_action", NULL, "string", NULL,
PCMK_ACTION_STATUS, NULL,
pcmk__opt_advanced,
N_("An alternate command to run instead of 'status'"),
N_("Some devices do not support the standard commands or may provide "
"additional ones. Use this to specify an alternate, device-"
"specific, command that implements the 'status' action."),
},
{
"pcmk_status_timeout", NULL, "time", NULL,
"60s", NULL,
pcmk__opt_advanced,
N_("Specify an alternate timeout to use for 'status' actions instead "
"of stonith-timeout"),
N_("Some devices need much more/less time to complete than normal. "
"Use this to specify an alternate, device-specific, timeout for "
"'status' actions."),
},
{
"pcmk_status_retries", NULL, "integer", NULL,
"2", NULL,
pcmk__opt_advanced,
N_("The maximum number of times to try the 'status' command within "
"the timeout period"),
N_("Some devices do not support multiple connections. Operations may "
"\"fail\" if the device is busy with another task. In that case, "
"Pacemaker will automatically retry the operation if there is time "
"remaining. Use this option to alter the number of times Pacemaker "
"tries a 'status' action before giving up."),
},
{ NULL, },
};
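/*!
* \internal
* \brief Output the fencer's option metadata as OCF-style XML
*
* Renders fencer_options via a temporary XML output object, extracts the
* resource-agent element, and prints it on the main output object.
*
* \return Standard Pacemaker return code
*/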
static int
fencer_metadata(void)
{
// @TODO Use pcmk__daemon_metadata when fencer_options moves to options.c
const char *name = "pacemaker-fenced";
const char *desc_short = N_("Instance attributes available for all "
"\"stonith\"-class resources");
const char *desc_long = N_("Instance attributes available for all "
"\"stonith\"-class resources and used by "
"Pacemaker's fence daemon, formerly known as "
"stonithd");
pcmk__output_t *tmp_out = NULL;
xmlNode *top = NULL;
const xmlNode *metadata = NULL;
char *metadata_s = NULL;
int rc = pcmk__output_new(&tmp_out, "xml", "/dev/null", NULL);
if (rc != pcmk_rc_ok) {
return rc;
}
out->message(tmp_out, "option-list", name, desc_short, desc_long,
pcmk__opt_none, fencer_options, true);
tmp_out->finish(tmp_out, CRM_EX_OK, false, (void **) &top);
metadata = first_named_child(top, PCMK_XE_RESOURCE_AGENT);
metadata_s = dump_xml_formatted_with_text(metadata);
out->output_xml(out, PCMK_XE_METADATA, metadata_s);
pcmk__output_free(tmp_out);
free_xml(top);
free(metadata_s);
return pcmk_rc_ok;
}
static GOptionEntry entries[] = {
{ "stand-alone", 's', G_OPTION_FLAG_NONE, G_OPTION_ARG_NONE, &stand_alone,
N_("Deprecated (will be removed in a future release)"), NULL },
{ "stand-alone-w-cpg", 'c', G_OPTION_FLAG_NO_ARG, G_OPTION_ARG_CALLBACK,
stand_alone_cpg_cb, N_("Intended for use in regression testing only"), NULL },
{ "logfile", 'l', G_OPTION_FLAG_NONE, G_OPTION_ARG_FILENAME_ARRAY,
&options.log_files, N_("Send logs to the additional named logfile"), NULL },
{ NULL }
};
static GOptionContext *
build_arg_context(pcmk__common_args_t *args, GOptionGroup **group)
{
GOptionContext *context = NULL;
context = pcmk__build_arg_context(args, "text (default), xml", group,
"[metadata]");
pcmk__add_main_args(context, entries);
return context;
}
int
main(int argc, char **argv)
{
int rc = pcmk_rc_ok;
crm_cluster_t *cluster = NULL;
crm_ipc_t *old_instance = NULL;
GError *error = NULL;
GOptionGroup *output_group = NULL;
pcmk__common_args_t *args = pcmk__new_common_args(SUMMARY);
gchar **processed_args = pcmk__cmdline_preproc(argv, "l");
GOptionContext *context = build_arg_context(args, &output_group);
crm_log_preinit(NULL, argc, argv);
pcmk__register_formats(output_group, formats);
if (!g_option_context_parse_strv(context, &processed_args, &error)) {
exit_code = CRM_EX_USAGE;
goto done;
}
rc = pcmk__output_new(&out, args->output_ty, args->output_dest, argv);
if (rc != pcmk_rc_ok) {
exit_code = CRM_EX_ERROR;
g_set_error(&error, PCMK__EXITC_ERROR, exit_code,
"Error creating output format %s: %s",
args->output_ty, pcmk_rc_str(rc));
goto done;
}
if (args->version) {
out->version(out, false);
goto done;
}
if ((g_strv_length(processed_args) >= 2)
&& pcmk__str_eq(processed_args[1], "metadata", pcmk__str_none)) {
rc = fencer_metadata();
if (rc != pcmk_rc_ok) {
exit_code = CRM_EX_FATAL;
g_set_error(&error, PCMK__EXITC_ERROR, exit_code,
"Unable to display metadata: %s", pcmk_rc_str(rc));
}
goto done;
}
// Open additional log files
pcmk__add_logfiles(options.log_files, out);
crm_log_init(NULL, LOG_INFO + args->verbosity, TRUE,
(args->verbosity > 0), argc, argv, FALSE);
crm_notice("Starting Pacemaker fencer");
old_instance = crm_ipc_new("stonith-ng", 0);
if (old_instance == NULL) {
/* crm_ipc_new() will have already logged an error message with
* crm_err()
*/
exit_code = CRM_EX_FATAL;
goto done;
}
if (pcmk__connect_generic_ipc(old_instance) == pcmk_rc_ok) {
// IPC endpoint already up
crm_ipc_close(old_instance);
crm_ipc_destroy(old_instance);
crm_err("pacemaker-fenced is already active, aborting startup");
goto done;
} else {
// Not up or not authentic, we'll proceed either way
crm_ipc_destroy(old_instance);
old_instance = NULL;
}
mainloop_add_signal(SIGTERM, stonith_shutdown);
crm_peer_init();
rc = fenced_scheduler_init();
if (rc != pcmk_rc_ok) {
exit_code = CRM_EX_FATAL;
g_set_error(&error, PCMK__EXITC_ERROR, exit_code,
"Error initializing scheduler data: %s", pcmk_rc_str(rc));
goto done;
}
cluster = pcmk_cluster_new();
if (!stand_alone) {
#if SUPPORT_COROSYNC
if (is_corosync_cluster()) {
cluster->destroy = stonith_peer_cs_destroy;
cluster->cpg.cpg_deliver_fn = stonith_peer_ais_callback;
cluster->cpg.cpg_confchg_fn = pcmk_cpg_membership;
}
#endif // SUPPORT_COROSYNC
crm_set_status_callback(&st_peer_update_callback);
if (crm_cluster_connect(cluster) == FALSE) {
exit_code = CRM_EX_FATAL;
crm_crit("Cannot sign in to the cluster... terminating");
goto done;
}
pcmk__str_update(&stonith_our_uname, cluster->uname);
if (!options.no_cib_connect) {
setup_cib();
}
} else {
pcmk__str_update(&stonith_our_uname, "localhost");
crm_warn("Stand-alone mode is deprecated and will be removed "
"in a future release");
}
init_device_list();
init_topology_list();
pcmk__serve_fenced_ipc(&ipcs, &ipc_callbacks);
// Create the mainloop and run it...
mainloop = g_main_loop_new(NULL, FALSE);
crm_notice("Pacemaker fencer successfully started and accepting connections");
g_main_loop_run(mainloop);
done:
g_strfreev(processed_args);
pcmk__free_arg_context(context);
g_strfreev(options.log_files);
stonith_cleanup();
pcmk_cluster_free(cluster);
fenced_scheduler_cleanup();
pcmk__output_and_clear_error(&error, out);
if (out != NULL) {
out->finish(out, exit_code, true, NULL);
pcmk__output_free(out);
}
pcmk__unregister_formats();
crm_exit(exit_code);
}
diff --git a/doc/sphinx/Pacemaker_Explained/fencing.rst b/doc/sphinx/Pacemaker_Explained/fencing.rst
index 109b4da604..f30ee4616a 100644
--- a/doc/sphinx/Pacemaker_Explained/fencing.rst
+++ b/doc/sphinx/Pacemaker_Explained/fencing.rst
@@ -1,1298 +1,1295 @@
.. index::
   single: fencing
   single: STONITH
.. _fencing:
Fencing
-------
What Is Fencing?
################
*Fencing* is the ability to make a node unable to run resources, even when that
node is unresponsive to cluster commands.
Fencing is also known as *STONITH*, an acronym for "Shoot The Other Node In The
Head", since the most common fencing method is cutting power to the node.
Another method is "fabric fencing", cutting the node's access to some
capability required to run resources (such as network access or a shared disk).
.. index::
   single: fencing; why necessary
Why Is Fencing Necessary?
#########################
Fencing protects your data from being corrupted by malfunctioning nodes or
unintentional concurrent access to shared resources.
Fencing protects against the "split brain" failure scenario, where cluster
nodes have lost the ability to reliably communicate with each other but are
still able to run resources. If the cluster just assumed that uncommunicative
nodes were down, then multiple instances of a resource could be started on
different nodes.
The effect of split brain depends on the resource type. For example, an IP
address brought up on two hosts on a network will cause packets to randomly be
sent to one or the other host, rendering the IP useless. For a database or
clustered file system, the effect could be much more severe, causing data
corruption or divergence.
Fencing is also used when a resource cannot otherwise be stopped. If a
resource fails to stop on a node, it cannot be started on a different node
without risking the same type of conflict as split-brain. Fencing the
original node ensures the resource can be safely started elsewhere.
Users may also configure the ``on-fail`` property of :ref:`operation` or the
``loss-policy`` property of
:ref:`ticket constraints <ticket-constraints>` to ``fence``, in which
case the cluster will fence the resource's node if the operation fails or the
ticket is lost.
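For example, this operation definition (the resource and IDs are hypothetical)
asks the cluster to fence the node if the resource fails to stop:

.. code-block:: xml

   <op id="DB-stop" name="stop" interval="0s" timeout="60s" on-fail="fence"/>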
.. index::
   single: fencing; device
Fence Devices
#############
A *fence device* or *fencing device* is a special type of resource that
provides the means to fence a node.
Examples of fencing devices include intelligent power switches and IPMI devices
that accept SNMP commands to cut power to a node, and iSCSI controllers that
allow SCSI reservations to be used to cut a node's access to a shared disk.
Since fencing devices will be used to recover from loss of networking
connectivity to other nodes, it is essential that they do not rely on the same
network as the cluster itself, otherwise that network becomes a single point of
failure.
Since loss of a node due to power outage is indistinguishable from loss of
network connectivity to that node, it is also essential that at least one fence
device for a node does not share power with that node. For example, an on-board
IPMI controller that shares power with its host should not be used as the sole
fencing device for that host.
Since fencing is used to isolate malfunctioning nodes, no fence device should
rely on its target functioning properly. This includes, for example, devices
that ssh into a node and issue a shutdown command (such devices might be
suitable for testing, but never for production).
.. index::
   single: fencing; agent
Fence Agents
############
A *fence agent* or *fencing agent* is a ``stonith``-class resource agent.
The fence agent standard provides commands (such as ``off`` and ``reboot``)
that the cluster can use to fence nodes. As with other resource agent classes,
this allows a layer of abstraction so that Pacemaker doesn't need any knowledge
about specific fencing technologies -- that knowledge is isolated in the agent.
Pacemaker supports two fence agent standards, both inherited from
no-longer-active projects:
* Red Hat Cluster Suite (RHCS) style: These are typically installed in
  ``/usr/sbin`` with names starting with ``fence_``.

* Linux-HA style: These typically have names starting with ``external/``.
  Pacemaker can support these agents using the **fence_legacy** RHCS-style
  agent as a wrapper, *if* support was enabled when Pacemaker was built, which
  requires the ``cluster-glue`` library.
When a Fence Device Can Be Used
###############################
Fencing devices do not actually "run" like most services. Typically, they just
provide an interface for sending commands to an external device.
Additionally, fencing may be initiated by Pacemaker, by other cluster-aware
software such as DRBD or DLM, or manually by an administrator, at any point in
the cluster life cycle, including before any resources have been started.
To accommodate this, Pacemaker does not require the fence device resource to be
"started" in order to be used. Whether a fence device is started or not
determines whether a node runs any recurring monitor for the device, and gives
the node a slight preference for being chosen to execute fencing using that
device.
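As a minimal sketch (the device ID, agent, and parameter values here are
hypothetical), a fence device is configured like any other resource; starting
it merely enables the recurring monitor:

.. code-block:: xml

   <primitive id="Fencing" class="stonith" type="fence_ipmilan">
     <instance_attributes id="Fencing-params">
       <nvpair id="Fencing-ip" name="ip" value="192.0.2.1"/>
       <nvpair id="Fencing-host-list" name="pcmk_host_list" value="node1"/>
     </instance_attributes>
     <operations>
       <op id="Fencing-monitor-10m" name="monitor" interval="10m"/>
     </operations>
   </primitive>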
By default, any node can execute any fencing device. If a fence device is
disabled by setting its ``target-role`` to ``Stopped``, then no node can use
that device. If a location constraint with a negative score prevents a specific
node from "running" a fence device, then that node will never be chosen to
execute fencing using the device. A node may fence itself, but the cluster will
choose that only if no other nodes can do the fencing.
A common configuration scenario is to have one fence device per target node.
In such a case, users often configure anti-location constraints so that
the target node does not monitor its own device.
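For example, with a hypothetical per-target device named ``fence-node1``, an
anti-location constraint like the following keeps the device (and its
recurring monitor) off the node it fences:

.. code-block:: xml

   <rsc_location id="ban-fence-node1" rsc="fence-node1"
                 node="node1" score="-INFINITY"/>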
Limitations of Fencing Resources
################################
Fencing resources have certain limitations that other resource classes don't:
* They may have only one set of meta-attributes and one set of instance
  attributes.

* If :ref:`rules` are used to determine fencing resource options, these
  might be evaluated only when first read, meaning that later changes to the
  rules will have no effect. Therefore, it is better to avoid confusion and not
  use rules at all with fencing resources.
These limitations could be revisited if there is sufficient user demand.
.. index::
   single: fencing; special instance attributes
.. _fencing-attributes:
Special Meta-Attributes for Fencing Resources
#############################################
The table below lists special resource meta-attributes that may be set for any
fencing resource.
.. table:: **Additional Properties of Fencing Resources**
   :widths: 2 1 2 4

   +----------------------+---------+--------------------+----------------------------------------+
   | Field                | Type    | Default            | Description                            |
   +======================+=========+====================+========================================+
   | provides             | string  |                    | .. index::                             |
   |                      |         |                    |    single: provides                    |
   |                      |         |                    |                                        |
   |                      |         |                    | Any special capability provided by the |
   |                      |         |                    | fence device. Currently, only one such |
   |                      |         |                    | capability is meaningful:              |
   |                      |         |                    | :ref:`unfencing <unfencing>`.          |
   +----------------------+---------+--------------------+----------------------------------------+
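For instance, a device that supports unfencing would advertise it via an
ordinary meta-attribute (the resource ID below is hypothetical; ``fence_scsi``
is a typical agent for this):

.. code-block:: xml

   <primitive id="fence-scsi" class="stonith" type="fence_scsi">
     <meta_attributes id="fence-scsi-meta">
       <nvpair id="fence-scsi-provides" name="provides" value="unfencing"/>
     </meta_attributes>
   </primitive>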
Special Instance Attributes for Fencing Resources
#################################################
The table below lists special instance attributes that may be set for any
fencing resource (*not* meta-attributes, even though they are interpreted by
Pacemaker rather than the fence agent). These are also listed in the man page
for ``pacemaker-fenced``.
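As an illustration (hypothetical device ID and agent), these attributes are
set as ordinary instance attributes, here using the ``node1:1;node2:2,3``
mapping format described below:

.. code-block:: xml

   <primitive id="apc-switch" class="stonith" type="fence_apc_snmp">
     <instance_attributes id="apc-switch-params">
       <nvpair id="apc-switch-host-map" name="pcmk_host_map"
               value="node1:1;node2:2,3"/>
     </instance_attributes>
   </primitive>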
.. Not_Yet_Implemented:

   +----------------------+---------+--------------------+----------------------------------------+
   | priority             | integer | 0                  | .. index::                             |
   |                      |         |                    |    single: priority                    |
   |                      |         |                    |                                        |
   |                      |         |                    | The priority of the fence device.      |
   |                      |         |                    | Devices are tried in order of highest  |
   |                      |         |                    | priority to lowest.                    |
   +----------------------+---------+--------------------+----------------------------------------+
-.. table:: **Additional Properties of Fencing Resources**
+.. list-table:: **Additional Properties of Fencing Resources**
   :class: longtable
   :widths: 2 1 2 4
-
-   +----------------------+---------+--------------------+----------------------------------------+
-   | Field                | Type    | Default            | Description                            |
-   +======================+=========+====================+========================================+
-   | stonith-timeout      | time    |                    | .. index::                             |
-   |                      |         |                    |    single: stonith-timeout             |
-   |                      |         |                    |                                        |
-   |                      |         |                    | This is not used by Pacemaker (see the |
-   |                      |         |                    | ``pcmk_reboot_timeout``,               |
-   |                      |         |                    | ``pcmk_off_timeout``, etc. properties  |
-   |                      |         |                    | instead), but it may be used by        |
-   |                      |         |                    | Linux-HA fence agents.                 |
-   +----------------------+---------+--------------------+----------------------------------------+
-   | pcmk_host_map        | string  |                    | .. index::                             |
-   |                      |         |                    |    single: pcmk_host_map               |
-   |                      |         |                    |                                        |
-   |                      |         |                    | A mapping of node names to ports       |
-   |                      |         |                    | for devices that do not understand     |
-   |                      |         |                    | the node names.                        |
-   |                      |         |                    |                                        |
-   |                      |         |                    | Example: ``node1:1;node2:2,3`` tells   |
-   |                      |         |                    | the cluster to use port 1 for          |
-   |                      |         |                    | ``node1`` and ports 2 and 3 for        |
-   |                      |         |                    | ``node2``. If ``pcmk_host_check`` is   |
-   |                      |         |                    | explicitly set to ``static-list``,     |
-   |                      |         |                    | either this or ``pcmk_host_list`` must |
-   |                      |         |                    | be set. The port portion of the map    |
-   |                      |         |                    | may contain special characters such as |
-   |                      |         |                    | spaces if preceded by a backslash      |
-   |                      |         |                    | *(since 2.1.2)*.                       |
-   +----------------------+---------+--------------------+----------------------------------------+
-   | pcmk_host_list       | string  |                    | .. index::                             |
-   |                      |         |                    |    single: pcmk_host_list              |
-   |                      |         |                    |                                        |
-   |                      |         |                    | A list of machines controlled by this  |
-   |                      |         |                    | device. If ``pcmk_host_check`` is      |
-   |                      |         |                    | explicitly set to ``static-list``,     |
-   |                      |         |                    | either this or ``pcmk_host_map`` must  |
-   |                      |         |                    | be set.                                |
-   +----------------------+---------+--------------------+----------------------------------------+
-   | pcmk_host_check      | string  | Value appropriate  | .. index::                             |
-   |                      |         | to other           |    single: pcmk_host_check             |
-   |                      |         | parameters (see    |                                        |
-   |                      |         | "Default Check     | The method Pacemaker should use to     |
-   |                      |         | Type" below)       | determine which nodes can be targeted  |
-   |                      |         |                    | by this device. Allowed values:        |
-   |                      |         |                    |                                        |
-   |                      |         |                    | * ``static-list:`` targets are listed  |
-   |                      |         |                    |   in the ``pcmk_host_list`` or         |
-   |                      |         |                    |   ``pcmk_host_map`` attribute          |
-   |                      |         |                    | * ``dynamic-list:`` query the device   |
-   |                      |         |                    |   via the agent's ``list`` action      |
-   |                      |         |                    | * ``status:`` query the device via the |
-   |                      |         |                    |   agent's ``status`` action            |
-   |                      |         |                    | * ``none:`` assume the device can      |
-   |                      |         |                    |   fence any node                       |
-   +----------------------+---------+--------------------+----------------------------------------+
-   | pcmk_delay_max       | time    | 0s                 | .. index::                             |
-   |                      |         |                    |    single: pcmk_delay_max              |
-   |                      |         |                    |                                        |
-   |                      |         |                    | Enable a delay of no more than the     |
-   |                      |         |                    | time specified before executing        |
-   |                      |         |                    | fencing actions. Pacemaker derives the |
-   |                      |         |                    | overall delay by taking the value of   |
-   |                      |         |                    | pcmk_delay_base and adding a random    |
-   |                      |         |                    | delay value such that the sum is kept  |
-   |                      |         |                    | below this maximum. This is sometimes  |
-   |                      |         |                    | used in two-node clusters to ensure    |
-   |                      |         |                    | that the nodes don't fence each other  |
-   |                      |         |                    | at the same time.                      |
-   +----------------------+---------+--------------------+----------------------------------------+
-   | pcmk_delay_base      | time    | 0s                 | .. index::                             |
-   |                      |         |                    |    single: pcmk_delay_base             |
-   |                      |         |                    |                                        |
-   |                      |         |                    | Enable a static delay before executing |
-   |                      |         |                    | fencing actions. This can be used, for |
-   |                      |         |                    | example, in two-node clusters to       |
-   |                      |         |                    | ensure that the nodes don't fence each |
-   |                      |         |                    | other, by having separate fencing      |
-   |                      |         |                    | resources with different values. The   |
-   |                      |         |                    | node that is fenced with the shorter   |
-   |                      |         |                    | delay will lose a fencing race. The    |
-   |                      |         |                    | overall delay introduced by pacemaker  |
-   |                      |         |                    | is derived from this value plus a      |
-   |                      |         |                    | random delay such that the sum is kept |
-   |                      |         |                    | below the maximum delay. A single      |
-   |                      |         |                    | device can have different delays per   |
-   |                      |         |                    | node using a host map *(since 2.1.2)*, |
-   |                      |         |                    | for example ``node1:0s;node2:5s.``     |
-   +----------------------+---------+--------------------+----------------------------------------+
-   | pcmk_action_limit    | integer | 1                  | .. index::                             |
-   |                      |         |                    |    single: pcmk_action_limit           |
-   |                      |         |                    |                                        |
-   |                      |         |                    | The maximum number of actions that can |
- | | | | be performed in parallel on this |
- | | | | device. A value of -1 means unlimited. |
- | | | | Node fencing actions initiated by the |
- | | | | cluster (as opposed to an administrator|
- | | | | running the ``stonith_admin`` tool or |
- | | | | the fencer running recurring device |
- | | | | monitors and ``status`` and ``list`` |
- | | | | commands) are additionally subject to |
- | | | | the ``concurrent-fencing`` cluster |
- | | | | property. |
- +----------------------+---------+--------------------+----------------------------------------+
- | pcmk_host_argument | string | ``port`` otherwise | .. index:: |
- | | | ``plug`` if | single: pcmk_host_argument |
- | | | supported | |
- | | | according to the | *Advanced use only.* Which parameter |
- | | | metadata of the | should be supplied to the fence agent |
- | | | fence agent | to identify the node to be fenced. |
- | | | | Some devices support neither the |
- | | | | standard ``plug`` nor the deprecated |
- | | | | ``port`` parameter, or may provide |
- | | | | additional ones. Use this to specify |
- | | | | an alternate, device-specific |
- | | | | parameter. A value of ``none`` tells |
- | | | | the cluster not to supply any |
- | | | | additional parameters. |
- +----------------------+---------+--------------------+----------------------------------------+
- | pcmk_reboot_action | string | reboot | .. index:: |
- | | | | single: pcmk_reboot_action |
- | | | | |
- | | | | *Advanced use only.* The command to |
- | | | | send to the resource agent in order to |
- | | | | reboot a node. Some devices do not |
- | | | | support the standard commands or may |
- | | | | provide additional ones. Use this to |
- | | | | specify an alternate, device-specific |
- | | | | command. |
- +----------------------+---------+--------------------+----------------------------------------+
- | pcmk_reboot_timeout | time | 60s | .. index:: |
- | | | | single: pcmk_reboot_timeout |
- | | | | |
- | | | | *Advanced use only.* Specify an |
- | | | | alternate timeout to use for |
- | | | | ``reboot`` actions instead of the |
- | | | | value of ``stonith-timeout``. Some |
- | | | | devices need much more or less time to |
- | | | | complete than normal. Use this to |
- | | | | specify an alternate, device-specific |
- | | | | timeout. |
- +----------------------+---------+--------------------+----------------------------------------+
- | pcmk_reboot_retries | integer | 2 | .. index:: |
- | | | | single: pcmk_reboot_retries |
- | | | | |
- | | | | *Advanced use only.* The maximum |
- | | | | number of times to retry the |
- | | | | ``reboot`` command within the timeout |
- | | | | period. Some devices do not support |
- | | | | multiple connections, and operations |
- | | | | may fail if the device is busy with |
- | | | | another task, so Pacemaker will |
- | | | | automatically retry the operation, if |
- | | | | there is time remaining. Use this |
- | | | | option to alter the number of times |
- | | | | Pacemaker retries before giving up. |
- +----------------------+---------+--------------------+----------------------------------------+
- | pcmk_off_action | string | off | .. index:: |
- | | | | single: pcmk_off_action |
- | | | | |
- | | | | *Advanced use only.* The command to |
- | | | | send to the resource agent in order to |
- | | | | shut down a node. Some devices do not |
- | | | | support the standard commands or may |
- | | | | provide additional ones. Use this to |
- | | | | specify an alternate, device-specific |
- | | | | command. |
- +----------------------+---------+--------------------+----------------------------------------+
- | pcmk_off_timeout | time | 60s | .. index:: |
- | | | | single: pcmk_off_timeout |
- | | | | |
- | | | | *Advanced use only.* Specify an |
- | | | | alternate timeout to use for |
- | | | | ``off`` actions instead of the |
- | | | | value of ``stonith-timeout``. Some |
- | | | | devices need much more or less time to |
- | | | | complete than normal. Use this to |
- | | | | specify an alternate, device-specific |
- | | | | timeout. |
- +----------------------+---------+--------------------+----------------------------------------+
- | pcmk_off_retries | integer | 2 | .. index:: |
- | | | | single: pcmk_off_retries |
- | | | | |
- | | | | *Advanced use only.* The maximum |
- | | | | number of times to retry the |
- | | | | ``off`` command within the timeout |
- | | | | period. Some devices do not support |
- | | | | multiple connections, and operations |
- | | | | may fail if the device is busy with |
- | | | | another task, so Pacemaker will |
- | | | | automatically retry the operation, if |
- | | | | there is time remaining. Use this |
- | | | | option to alter the number of times |
- | | | | Pacemaker retries before giving up. |
- +----------------------+---------+--------------------+----------------------------------------+
- | pcmk_list_action | string | list | .. index:: |
- | | | | single: pcmk_list_action |
- | | | | |
- | | | | *Advanced use only.* The command to |
- | | | | send to the resource agent in order to |
- | | | | list nodes. Some devices do not |
- | | | | support the standard commands or may |
- | | | | provide additional ones. Use this to |
- | | | | specify an alternate, device-specific |
- | | | | command. |
- +----------------------+---------+--------------------+----------------------------------------+
- | pcmk_list_timeout | time | 60s | .. index:: |
- | | | | single: pcmk_list_timeout |
- | | | | |
- | | | | *Advanced use only.* Specify an |
- | | | | alternate timeout to use for |
- | | | | ``list`` actions instead of the |
- | | | | value of ``stonith-timeout``. Some |
- | | | | devices need much more or less time to |
- | | | | complete than normal. Use this to |
- | | | | specify an alternate, device-specific |
- | | | | timeout. |
- +----------------------+---------+--------------------+----------------------------------------+
- | pcmk_list_retries | integer | 2 | .. index:: |
- | | | | single: pcmk_list_retries |
- | | | | |
- | | | | *Advanced use only.* The maximum |
- | | | | number of times to retry the |
- | | | | ``list`` command within the timeout |
- | | | | period. Some devices do not support |
- | | | | multiple connections, and operations |
- | | | | may fail if the device is busy with |
- | | | | another task, so Pacemaker will |
- | | | | automatically retry the operation, if |
- | | | | there is time remaining. Use this |
- | | | | option to alter the number of times |
- | | | | Pacemaker retries before giving up. |
- +----------------------+---------+--------------------+----------------------------------------+
- | pcmk_monitor_action | string | monitor | .. index:: |
- | | | | single: pcmk_monitor_action |
- | | | | |
- | | | | *Advanced use only.* The command to |
- | | | | send to the resource agent in order to |
- | | | | report extended status. Some devices do|
- | | | | not support the standard commands or |
- | | | | may provide additional ones. Use this |
- | | | | to specify an alternate, |
- | | | | device-specific command. |
- +----------------------+---------+--------------------+----------------------------------------+
- | pcmk_monitor_timeout | time | 60s | .. index:: |
- | | | | single: pcmk_monitor_timeout |
- | | | | |
- | | | | *Advanced use only.* Specify an |
- | | | | alternate timeout to use for |
- | | | | ``monitor`` actions instead of the |
- | | | | value of ``stonith-timeout``. Some |
- | | | | devices need much more or less time to |
- | | | | complete than normal. Use this to |
- | | | | specify an alternate, device-specific |
- | | | | timeout. |
- +----------------------+---------+--------------------+----------------------------------------+
- | pcmk_monitor_retries | integer | 2 | .. index:: |
- | | | | single: pcmk_monitor_retries |
- | | | | |
- | | | | *Advanced use only.* The maximum |
- | | | | number of times to retry the |
- | | | | ``monitor`` command within the timeout |
- | | | | period. Some devices do not support |
- | | | | multiple connections, and operations |
- | | | | may fail if the device is busy with |
- | | | | another task, so Pacemaker will |
- | | | | automatically retry the operation, if |
- | | | | there is time remaining. Use this |
- | | | | option to alter the number of times |
- | | | | Pacemaker retries before giving up. |
- +----------------------+---------+--------------------+----------------------------------------+
- | pcmk_status_action | string | status | .. index:: |
- | | | | single: pcmk_status_action |
- | | | | |
- | | | | *Advanced use only.* The command to |
- | | | | send to the resource agent in order to |
- | | | | report status. Some devices do |
- | | | | not support the standard commands or |
- | | | | may provide additional ones. Use this |
- | | | | to specify an alternate, |
- | | | | device-specific command. |
- +----------------------+---------+--------------------+----------------------------------------+
- | pcmk_status_timeout | time | 60s | .. index:: |
- | | | | single: pcmk_status_timeout |
- | | | | |
- | | | | *Advanced use only.* Specify an |
- | | | | alternate timeout to use for |
- | | | | ``status`` actions instead of the |
- | | | | value of ``stonith-timeout``. Some |
- | | | | devices need much more or less time to |
- | | | | complete than normal. Use this to |
- | | | | specify an alternate, device-specific |
- | | | | timeout. |
- +----------------------+---------+--------------------+----------------------------------------+
- | pcmk_status_retries | integer | 2 | .. index:: |
- | | | | single: pcmk_status_retries |
- | | | | |
- | | | | *Advanced use only.* The maximum |
- | | | | number of times to retry the |
- | | | | ``status`` command within the timeout |
- | | | | period. Some devices do not support |
- | | | | multiple connections, and operations |
- | | | | may fail if the device is busy with |
- | | | | another task, so Pacemaker will |
- | | | | automatically retry the operation, if |
- | | | | there is time remaining. Use this |
- | | | | option to alter the number of times |
- | | | | Pacemaker retries before giving up. |
- +----------------------+---------+--------------------+----------------------------------------+
+ :header-rows: 1
+
+ * - Name
+ - Type
+ - Default
+ - Description
+ * - .. _primitive_stonith_timeout:
+
+ .. index::
+ single: stonith-timeout (primitive instance attribute)
+
+ stonith-timeout
+     - :ref:`timeout <timeout>`
+ -
+ - This is not used by Pacemaker (see the ``pcmk_reboot_timeout``,
+ ``pcmk_off_timeout``, etc., properties instead), but it may be used by
+ Linux-HA fence agents.
+ * - .. _pcmk_host_map:
+
+       .. index::
+          single: pcmk_host_map
+
+       pcmk_host_map
+     - :ref:`text <text>`
+     -
+     - A mapping of node names to ports for devices that do not understand the
+       node names. For example, ``node1:1;node2:2,3`` tells the cluster to use
+       port 1 for ``node1`` and ports 2 and 3 for ``node2``. If
+       ``pcmk_host_check`` is explicitly set to ``static-list``, either this or
+       ``pcmk_host_list`` must be set. The port portion of the map may contain
+       special characters such as spaces if preceded by a backslash *(since
+       2.1.2)*. See the configuration sketch after this table.
+ * - .. _pcmk_host_list:
+
+       .. index::
+          single: pcmk_host_list
+
+       pcmk_host_list
+     - :ref:`text <text>`
+     -
+     - A comma-separated list of nodes that can be targeted by this device
+       (for example, ``node1,node2,node3``). If ``pcmk_host_check`` is
+       explicitly set to ``static-list``, either this or ``pcmk_host_map``
+       must be set.
+ * - .. _pcmk_host_check:
+
+       .. index::
+          single: pcmk_host_check
+
+       pcmk_host_check
+     - :ref:`text <text>`
+ - See :ref:`pcmk_host_check_default`
+ - The method Pacemaker should use to determine which nodes can be targeted
+ by this device. Allowed values:
+
+ * ``static-list:`` targets are listed in the ``pcmk_host_list`` or ``pcmk_host_map`` attribute
+ * ``dynamic-list:`` query the device via the agent's ``list`` action
+ * ``status:`` query the device via the agent's ``status`` action
+ * ``none:`` assume the device can fence any node
+ * - .. _pcmk_delay_max:
+
+       .. index::
+          single: pcmk_delay_max
+
+       pcmk_delay_max
+     - :ref:`duration <duration>`
+     - 0s
+     - Enable a delay of no more than the time specified before executing
+       fencing actions. Pacemaker derives the overall delay by taking the
+       value of ``pcmk_delay_base`` and adding a random delay value such that
+       the sum is kept below this maximum. This is sometimes used in two-node
+       clusters to ensure that the nodes don't fence each other at the same
+       time.
+ * - .. _pcmk_delay_base:
+
+       .. index::
+          single: pcmk_delay_base
+
+       pcmk_delay_base
+     - :ref:`duration <duration>`
+     - 0s
+     - Enable a static delay before executing fencing actions. This can be
+       used, for example, in two-node clusters to ensure that the nodes don't
+       fence each other, by having separate fencing resources with different
+       values. The node that is fenced with the shorter delay will lose a
+       fencing race. The overall delay introduced by Pacemaker is derived from
+       this value plus a random delay such that the sum is kept below the
+       maximum delay. A single device can have different delays per node using
+       a host map *(since 2.1.2)*, for example ``node1:0s;node2:5s``.
+ * - .. _pcmk_action_limit:
+
+       .. index::
+          single: pcmk_action_limit
+
+       pcmk_action_limit
+     - :ref:`integer <integer>`
+ - 1
+ - The maximum number of actions that can be performed in parallel on this
+ device. A value of -1 means unlimited. Node fencing actions initiated by
+ the cluster (as opposed to an administrator running the
+ ``stonith_admin`` tool or the fencer running recurring device monitors
+ and ``status`` and ``list`` commands) are additionally subject to the
+ ``concurrent-fencing`` cluster property.
+ * - .. _pcmk_host_argument:
+
+       .. index::
+          single: pcmk_host_argument
+
+       pcmk_host_argument
+     - :ref:`text <text>`
+     - ``port`` otherwise ``plug`` if supported according to the metadata of
+       the fence agent
+ - *Advanced use only.* Which parameter should be supplied to the fence
+ agent to identify the node to be fenced. Some devices support neither
+ the standard ``plug`` nor the deprecated ``port`` parameter, or may
+ provide additional ones. Use this to specify an alternate,
+ device-specific parameter. A value of ``none`` tells the cluster not to
+ supply any additional parameters.
+ * - .. _pcmk_reboot_action:
+
+       .. index::
+          single: pcmk_reboot_action
+
+       pcmk_reboot_action
+     - :ref:`text <text>`
+     - ``reboot``
+     - *Advanced use only.* The command to send to the fence agent in order
+       to reboot a node. Some devices do not support the standard commands or
+       may provide additional ones. Use this to specify an alternate,
+       device-specific command.
+ * - .. _pcmk_reboot_timeout:
+
+       .. index::
+          single: pcmk_reboot_timeout
+
+       pcmk_reboot_timeout
+     - :ref:`integer <integer>`
+ - 60
+ - *Advanced use only.* Specify an alternate timeout (in seconds) to use
+ for ``reboot`` actions instead of the value of ``stonith-timeout``. Some
+ devices need much more or less time to complete than normal. Use this to
+ specify an alternate, device-specific timeout.
+ * - .. _pcmk_reboot_retries:
+
+       .. index::
+          single: pcmk_reboot_retries
+
+       pcmk_reboot_retries
+     - :ref:`integer <integer>`
+ - 2
+ - *Advanced use only.* The maximum number of times to retry the ``reboot``
+ command within the timeout period. Some devices do not support multiple
+ connections, and operations may fail if the device is busy with another
+ task, so Pacemaker will automatically retry the operation, if there is
+ time remaining. Use this option to alter the number of times Pacemaker
+ retries before giving up.
+ * - .. _pcmk_off_action:
+
+       .. index::
+          single: pcmk_off_action
+
+       pcmk_off_action
+     - :ref:`text <text>`
+     - ``off``
+     - *Advanced use only.* The command to send to the fence agent in order
+       to shut down a node. Some devices do not support the standard commands
+       or may provide additional ones. Use this to specify an alternate,
+       device-specific command.
+ * - .. _pcmk_off_timeout:
+
+       .. index::
+          single: pcmk_off_timeout
+
+       pcmk_off_timeout
+     - :ref:`integer <integer>`
+ - 60
+ - *Advanced use only.* Specify an alternate timeout (in seconds) to use
+ for ``off`` actions instead of the value of ``stonith-timeout``. Some
+ devices need much more or less time to complete than normal. Use this to
+ specify an alternate, device-specific timeout.
+ * - .. _pcmk_off_retries:
+
+       .. index::
+          single: pcmk_off_retries
+
+       pcmk_off_retries
+     - :ref:`integer <integer>`
+ - 2
+ - *Advanced use only.* The maximum number of times to retry the ``off``
+ command within the timeout period. Some devices do not support multiple
+ connections, and operations may fail if the device is busy with another
+ task, so Pacemaker will automatically retry the operation, if there is
+ time remaining. Use this option to alter the number of times Pacemaker
+ retries before giving up.
+ * - .. _pcmk_list_action:
+
+       .. index::
+          single: pcmk_list_action
+
+       pcmk_list_action
+     - :ref:`text <text>`
+     - ``list``
+     - *Advanced use only.* The command to send to the fence agent in order
+       to list nodes. Some devices do not support the standard commands or may
+       provide additional ones. Use this to specify an alternate,
+       device-specific command.
+ * - .. _pcmk_list_timeout:
+
+       .. index::
+          single: pcmk_list_timeout
+
+       pcmk_list_timeout
+     - :ref:`integer <integer>`
+ - 60
+ - *Advanced use only.* Specify an alternate timeout (in seconds) to use
+ for ``list`` actions instead of the value of ``stonith-timeout``. Some
+ devices need much more or less time to complete than normal. Use this to
+ specify an alternate, device-specific timeout.
+ * - .. _pcmk_list_retries:
+
+       .. index::
+          single: pcmk_list_retries
+
+       pcmk_list_retries
+     - :ref:`integer <integer>`
+ - 2
+ - *Advanced use only.* The maximum number of times to retry the ``list``
+ command within the timeout period. Some devices do not support multiple
+ connections, and operations may fail if the device is busy with another
+ task, so Pacemaker will automatically retry the operation, if there is
+ time remaining. Use this option to alter the number of times Pacemaker
+ retries before giving up.
+ * - .. _pcmk_monitor_action:
+
+       .. index::
+          single: pcmk_monitor_action
+
+       pcmk_monitor_action
+     - :ref:`text <text>`
+     - ``monitor``
+     - *Advanced use only.* The command to send to the fence agent in order
+       to report extended status. Some devices do not support the standard
+       commands or may provide additional ones. Use this to specify an
+       alternate, device-specific command.
+ * - .. _pcmk_monitor_timeout:
+
+       .. index::
+          single: pcmk_monitor_timeout
+
+       pcmk_monitor_timeout
+     - :ref:`integer <integer>`
+ - 60
+ - *Advanced use only.* Specify an alternate timeout (in seconds) to use
+ for ``monitor`` actions instead of the value of ``stonith-timeout``. Some
+ devices need much more or less time to complete than normal. Use this to
+ specify an alternate, device-specific timeout.
+ * - .. _pcmk_monitor_retries:
+
+       .. index::
+          single: pcmk_monitor_retries
+
+       pcmk_monitor_retries
+     - :ref:`integer <integer>`
+ - 2
+ - *Advanced use only.* The maximum number of times to retry the ``monitor``
+ command within the timeout period. Some devices do not support multiple
+ connections, and operations may fail if the device is busy with another
+ task, so Pacemaker will automatically retry the operation, if there is
+ time remaining. Use this option to alter the number of times Pacemaker
+ retries before giving up.
+ * - .. _pcmk_status_action:
+
+       .. index::
+          single: pcmk_status_action
+
+       pcmk_status_action
+     - :ref:`text <text>`
+     - ``status``
+     - *Advanced use only.* The command to send to the fence agent in order
+       to report status. Some devices do not support the standard commands or
+       may provide additional ones. Use this to specify an alternate,
+       device-specific command.
+ * - .. _pcmk_status_timeout:
+
+       .. index::
+          single: pcmk_status_timeout
+
+       pcmk_status_timeout
+     - :ref:`integer <integer>`
+ - 60
+ - *Advanced use only.* Specify an alternate timeout (in seconds) to use
+ for ``status`` actions instead of the value of ``stonith-timeout``. Some
+ devices need much more or less time to complete than normal. Use this to
+ specify an alternate, device-specific timeout.
+ * - .. _pcmk_status_retries:
+
+       .. index::
+          single: pcmk_status_retries
+
+       pcmk_status_retries
+     - :ref:`integer <integer>`
+ - 2
+ - *Advanced use only.* The maximum number of times to retry the ``status``
+ command within the timeout period. Some devices do not support multiple
+ connections, and operations may fail if the device is busy with another
+ task, so Pacemaker will automatically retry the operation, if there is
+ time remaining. Use this option to alter the number of times Pacemaker
+ retries before giving up.
+
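+As a combined example (a sketch; resource IDs and values are illustrative), a
+PDU device serving two nodes might map node names to its ports and stagger
+per-node delays like this:
+
+.. code-block:: xml
+
+   <primitive id="fence-pdu" class="stonith" type="fence_apc_snmp">
+     <instance_attributes id="fence-pdu-params">
+       <nvpair id="fence-pdu-ip" name="ip" value="203.0.113.10"/>
+       <nvpair id="fence-pdu-map" name="pcmk_host_map" value="node1:1;node2:2,3"/>
+       <nvpair id="fence-pdu-delay" name="pcmk_delay_base" value="node1:0s;node2:5s"/>
+     </instance_attributes>
+   </primitive>
+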
+.. _pcmk_host_check_default:
Default Check Type
##################
If the user does not explicitly configure ``pcmk_host_check`` for a fence
device, a default value appropriate to other configured parameters will be
used:
* If either ``pcmk_host_list`` or ``pcmk_host_map`` is configured,
``static-list`` will be used;
* otherwise, if the fence device supports the ``list`` action, and the first
attempt at using ``list`` succeeds, ``dynamic-list`` will be used;
* otherwise, if the fence device supports the ``status`` action, ``status``
will be used;
* otherwise, ``none`` will be used.
.. index::
single: unfencing
single: fencing; unfencing
.. _unfencing:
Unfencing
#########
With fabric fencing (such as cutting network or shared disk access rather than
power), it is expected that the cluster will fence the node, and then a system
administrator must manually investigate what went wrong, correct any issues
found, then reboot (or restart the cluster services on) the node.
Once the node reboots and rejoins the cluster, some fabric fencing devices
require an explicit command to restore the node's access. This capability is
called *unfencing* and is typically implemented as the fence agent's ``on``
command.
If any cluster resource has ``requires`` set to ``unfencing``, then that
resource will not be probed or started on a node until that node has been
unfenced.
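For example (a sketch with illustrative IDs), a ``fence_scsi`` device can be
marked as providing unfencing, and a resource can declare that it requires it:

.. code-block:: xml

   <primitive id="scsi-fencing" class="stonith" type="fence_scsi">
     <meta_attributes id="scsi-fencing-meta">
       <nvpair id="scsi-fencing-provides" name="provides" value="unfencing"/>
     </meta_attributes>
   </primitive>

   <primitive id="shared-fs" class="ocf" provider="heartbeat" type="Filesystem">
     <!-- device, directory, and fstype instance attributes omitted -->
     <meta_attributes id="shared-fs-meta">
       <nvpair id="shared-fs-requires" name="requires" value="unfencing"/>
     </meta_attributes>
   </primitive>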
Fencing and Quorum
##################
In general, a cluster partition may execute fencing only if the partition has
quorum, and the ``stonith-enabled`` cluster property is set to true. However,
there are exceptions:
* The requirements apply only to fencing initiated by Pacemaker. If an
administrator initiates fencing using the ``stonith_admin`` command, or an
external application such as DLM initiates fencing using Pacemaker's C API,
the requirements do not apply.
* A cluster partition without quorum is allowed to fence any active member of
that partition. As a corollary, this allows a ``no-quorum-policy`` of
``suicide`` to work.
* If the ``no-quorum-policy`` cluster property is set to ``ignore``, then
quorum is not required to execute fencing of any node.
Fencing Timeouts
################
Fencing timeouts are complicated, since a single fencing operation can involve
many steps, each of which may have a separate timeout.
Fencing may be initiated in one of several ways:
* An administrator may initiate fencing using the ``stonith_admin`` tool,
which has a ``--timeout`` option (defaulting to 2 minutes) that will be used
as the fence operation timeout.
* An external application such as DLM may initiate fencing using the Pacemaker
C API. The application will specify the fence operation timeout in this case,
which might or might not be configurable by the user.
* The cluster may initiate fencing itself. In this case, the
``stonith-timeout`` cluster property (defaulting to 1 minute) will be used as
the fence operation timeout.
However fencing is initiated, the initiator contacts Pacemaker's fencer
(``pacemaker-fenced``) to request fencing. This connection and request have
their own timeout, separate from the fencing operation timeout, but they
usually complete very quickly.
The fencer will contact all fencers in the cluster to ask what devices they
have available to fence the target node. The fence operation timeout will be
used as the timeout for each of these queries.
Once a fencing device has been selected, the fencer will check whether any
action-specific timeout has been configured for the device, to use instead of
the fence operation timeout. For example, if ``stonith-timeout`` is 60 seconds,
but the fencing device has ``pcmk_reboot_timeout`` configured as 90 seconds,
then a timeout of 90 seconds will be used for reboot actions using that device.
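Continuing that example, the override is just an instance attribute of the
device (a sketch; IDs illustrative):

.. code-block:: xml

   <primitive id="fence-ipmi" class="stonith" type="fence_ipmilan">
     <instance_attributes id="fence-ipmi-params">
       <!-- reboot actions get 90 seconds instead of stonith-timeout's 60 -->
       <nvpair id="fence-ipmi-reboot-timeout" name="pcmk_reboot_timeout" value="90"/>
     </instance_attributes>
   </primitive>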
A device may have retries configured, in which case the timeout applies across
all attempts. For example, if a device has ``pcmk_reboot_retries`` configured
as 2, and the first reboot attempt fails, the second attempt will only have
whatever time is remaining in the action timeout after subtracting how much
time the first attempt used. This means that if the first attempt fails due to
using the entire timeout, no further attempts will be made. There is currently
no way to configure a per-attempt timeout.
If more than one device is required to fence a target, whether due to failure
of the first device or a fencing topology with multiple devices configured for
the target, each device will have its own separate action timeout.
For all of the above timeouts, the fencer will generally multiply the
configured value by 1.2 to get the actual value to use, to account for time
needed by the fencer's own processing (for example, a configured timeout of 60
seconds becomes an effective 72 seconds).
Separate from the fencer's timeouts, some fence agents have internal timeouts
for individual steps of their fencing process. These agents often have
parameters to configure these timeouts, such as ``login-timeout``,
``shell-timeout``, or ``power-timeout``. Many such agents also have a
``disable-timeout`` parameter to ignore their internal timeouts and just let
Pacemaker handle the timeout. This causes a difference in retry behavior.
If ``disable-timeout`` is not set, and the agent hits one of its internal
timeouts, it will report that as a failure to Pacemaker, which can then retry.
If ``disable-timeout`` is set, and Pacemaker hits a timeout for the agent, then
there will be no time remaining, and no retry will be done.
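For example, if an agent's metadata exposes this as a ``disable_timeout``
parameter (parameter names vary by agent; check the output of
``stonith_admin --metadata``), it could be set as a device instance attribute:

.. code-block:: xml

   <nvpair id="fence-ipmi-disable-timeout" name="disable_timeout" value="true"/>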
Fence Devices Dependent on Other Resources
##########################################
In some cases, a fence device may require some other cluster resource (such as
an IP address) to be active in order to function properly.
This is obviously undesirable in general: fencing may be required when the
depended-on resource is not active, or fencing may be required because the node
running the depended-on resource is no longer responding.
However, this may be acceptable under certain conditions:
* The dependent fence device should not be able to target any node that is
allowed to run the depended-on resource.
* The depended-on resource should not be disabled during production operation.
* The ``concurrent-fencing`` cluster property should be set to ``true``.
Otherwise, if both the node running the depended-on resource and some node
targeted by the dependent fence device need to be fenced, the fencing of the
node running the depended-on resource might be ordered first, making the
second fencing impossible and blocking further recovery. With concurrent
fencing, the dependent fence device might fail at first due to the
depended-on resource being unavailable, but it will be retried and eventually
succeed once the resource is brought back up.
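As a sketch of constraints supporting such a setup (resource and node names
are illustrative), suppose fence device ``fence-dev`` needs the floating
address ``fence-ip`` and targets only ``node1``:

.. code-block:: xml

   <constraints>
     <!-- keep the depended-on IP (and, via the colocation below, the fence
          device) away from the node the fence device targets -->
     <rsc_location id="fence-ip-avoids-node1" rsc="fence-ip" node="node1" score="-INFINITY"/>
     <!-- run the fence device only where its required IP is active -->
     <rsc_colocation id="fence-dev-with-ip" rsc="fence-dev" with-rsc="fence-ip" score="INFINITY"/>
   </constraints>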
Even under those conditions, there is one unlikely problem scenario. The DC
always schedules fencing of itself after any other fencing needed, to avoid
unnecessary repeated DC elections. If the dependent fence device targets the
DC, and both the DC and a different node running the depended-on resource need
to be fenced, the DC fencing will always fail and block further recovery. Note,
however, that losing a DC node entirely causes some other node to become DC and
schedule the fencing, so this is only a risk when a stop or other operation
with ``on-fail`` set to ``fencing`` fails on the DC.
.. index::
single: fencing; configuration
Configuring Fencing
###################
Higher-level tools can provide simpler interfaces to this process, but using
Pacemaker command-line tools, this is how you could configure a fence device.
#. Find the correct driver:
.. code-block:: none
# stonith_admin --list-installed
.. note::
You may have to install packages to make fence agents available on your
host. Searching your available packages for ``fence-`` is usually
helpful. Ensure the packages providing the fence agents you require are
installed on every cluster node.
#. Find the required parameters associated with the device
(replacing ``$AGENT_NAME`` with the name obtained from the previous step):
.. code-block:: none
# stonith_admin --metadata --agent $AGENT_NAME
#. Create a file called ``stonith.xml`` containing a primitive resource
with a class of ``stonith``, a type equal to the agent name obtained earlier,
and a parameter for each of the values returned in the previous step.
#. If the device does not know how to fence nodes based on their uname,
you may also need to set the special ``pcmk_host_map`` parameter. See
:ref:`fencing-attributes` for details.
#. If the device does not support the ``list`` command, you may also need
to set the special ``pcmk_host_list`` and/or ``pcmk_host_check``
parameters. See :ref:`fencing-attributes` for details.
#. If the device does not expect the target to be specified with the
``port`` parameter, you may also need to set the special
``pcmk_host_argument`` parameter. See :ref:`fencing-attributes` for details.
#. Upload it into the CIB using cibadmin:
.. code-block:: none
# cibadmin --create --scope resources --xml-file stonith.xml
#. Set ``stonith-enabled`` to true:
.. code-block:: none
# crm_attribute --type crm_config --name stonith-enabled --update true
#. Once the stonith resource is running, you can test it by executing the
following, replacing ``$NODE_NAME`` with the name of the node to fence
(although you might want to stop the cluster on that machine first):
.. code-block:: none
# stonith_admin --reboot $NODE_NAME
Example Fencing Configuration
_____________________________
For this example, we assume we have a cluster node, ``pcmk-1``, whose IPMI
controller is reachable at the IP address 192.0.2.1. The IPMI controller uses
the username ``testuser`` and the password ``abc123``.
#. Looking at what's installed, we may see a variety of available agents:
.. code-block:: none
# stonith_admin --list-installed
.. code-block:: none
(... some output omitted ...)
fence_idrac
fence_ilo3
fence_ilo4
fence_ilo5
fence_imm
fence_ipmilan
(... some output omitted ...)
Perhaps after reading some man pages and doing some Internet searches, we
might decide ``fence_ipmilan`` is our best choice.
#. Next, we would check what parameters ``fence_ipmilan`` provides:
.. code-block:: none
# stonith_admin --metadata -a fence_ipmilan
.. code-block:: xml
      <resource-agent name="fence_ipmilan" shortdesc="Fence agent for IPMI">
        <!-- Full XML metadata omitted for brevity. It describes fence_ipmilan
             as an I/O fencing agent for machines controlled by IPMI that
             calls the ipmitool support software (http://ipmitool.sf.net/),
             warns that the agent might report success before the node is
             powered off (recommending -m/method onoff where the device
             supports it), and lists the agent's parameters: fencing action,
             authentication type, ciphersuite, Kg key, IP address or hostname
             of the fencing device, TCP/UDP port, lanplus, login name,
             password (or password script), privilege level, and assorted
             delay, timeout, and retry settings. -->
      </resource-agent>
Once we've decided what parameter values we think we need, it is a good idea
to run the fence agent's status action manually, to verify that our values
work correctly:
.. code-block:: none
# fence_ipmilan --lanplus -a 192.0.2.1 -l testuser -p abc123 -o status
Chassis Power is on
#. Based on that, we might create a fencing resource configuration like this in
``stonith.xml`` (or any file name, just use the same name with ``cibadmin``
later):
.. code-block:: xml

   <primitive id="Fencing" class="stonith" type="fence_ipmilan">
     <instance_attributes id="Fencing-params">
       <nvpair id="Fencing-lanplus" name="lanplus" value="1"/>
       <nvpair id="Fencing-ip" name="ip" value="192.0.2.1"/>
       <nvpair id="Fencing-username" name="username" value="testuser"/>
       <nvpair id="Fencing-password" name="password" value="abc123"/>
     </instance_attributes>
     <operations>
       <op id="Fencing-monitor-10m" interval="10m" name="monitor" timeout="300s"/>
     </operations>
   </primitive>
.. note::
Even though the man page shows that the ``action`` parameter is
supported, we do not provide that in the resource configuration.
Pacemaker will supply an appropriate action whenever the fence device
must be used.
#. In this case, we don't need to configure ``pcmk_host_map`` because
``fence_ipmilan`` ignores the target node name and instead uses its
``ip`` parameter to know how to contact the IPMI controller.
#. We do need to let Pacemaker know which cluster node can be fenced by this
device, since ``fence_ipmilan`` doesn't support the ``list`` action. Add
a line like this to the agent's instance attributes:
.. code-block:: xml

   <nvpair id="Fencing-pcmk_host_list" name="pcmk_host_list" value="pcmk-1"/>
#. We don't need to configure ``pcmk_host_argument`` since ``ip`` is all the
fence agent needs (it ignores the target name).
#. Make the configuration active:
.. code-block:: none
# cibadmin --create --scope resources --xml-file stonith.xml
#. Set ``stonith-enabled`` to true (this only has to be done once):
.. code-block:: none
# crm_attribute --type crm_config --name stonith-enabled --update true
#. Since our cluster is still in testing, we can reboot ``pcmk-1`` without
bothering anyone, so we'll test our fencing configuration by running this
from one of the other cluster nodes:
.. code-block:: none
# stonith_admin --reboot pcmk-1
Then we will verify that the node did, in fact, reboot.
We can repeat that process to create a separate fencing resource for each node.
With some other fence device types, a single fencing resource can be used for
all nodes. In fact, we could do that with ``fence_ipmilan``, using the
``port-as-ip`` parameter along with ``pcmk_host_map``; a sketch of that
approach follows. Either approach is fine.
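Here is a sketch of that single-device approach (assuming the agent exposes a
``port_as_ip`` parameter in its metadata; IDs are illustrative):

.. code-block:: xml

   <primitive id="Fencing-all" class="stonith" type="fence_ipmilan">
     <instance_attributes id="Fencing-all-params">
       <nvpair id="Fencing-all-lanplus" name="lanplus" value="1"/>
       <nvpair id="Fencing-all-username" name="username" value="testuser"/>
       <nvpair id="Fencing-all-password" name="password" value="abc123"/>
       <nvpair id="Fencing-all-port-as-ip" name="port_as_ip" value="1"/>
       <nvpair id="Fencing-all-map" name="pcmk_host_map"
               value="pcmk-1:192.0.2.1;pcmk-2:192.0.2.2"/>
     </instance_attributes>
   </primitive>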
.. index::
single: fencing; topology
single: fencing-topology
single: fencing-level
Fencing Topologies
##################
Pacemaker supports fencing nodes with multiple devices through a feature called
*fencing topologies*. Fencing topologies may be used to provide alternative
devices in case one fails, or to require multiple devices to all be executed
successfully in order to consider the node successfully fenced, or even a
combination of the two.
Create the individual devices as you normally would, then define one or more
``fencing-level`` entries in the ``fencing-topology`` section of the
configuration.
* Each fencing level is attempted in order of ascending ``index``. Allowed
values are 1 through 9.
* If a device fails, processing terminates for the current level. No further
devices in that level are exercised, and the next level is attempted instead.
* If the operation succeeds for all the listed devices in a level, the level is
deemed to have passed.
* The operation is finished when a level has passed (success), or all levels
have been attempted (failed).
* If the operation failed, the next step is determined by the scheduler and/or
the controller.
Some possible uses of topologies include:
* Try on-board IPMI, then an intelligent power switch if that fails
* Try fabric fencing of both disk and network, then fall back to power fencing
if either fails
* Wait up to a certain time for a kernel dump to complete, then cut power to
the node
.. table:: **Attributes of a fencing-level Element**
:class: longtable
:widths: 1 4
+------------------+-----------------------------------------------------------------------------------------+
| Attribute | Description |
+==================+=========================================================================================+
| id | .. index:: |
| | pair: fencing-level; id |
| | |
| | A unique name for this element (required) |
+------------------+-----------------------------------------------------------------------------------------+
| target | .. index:: |
| | pair: fencing-level; target |
| | |
| | The name of a single node to which this level applies |
+------------------+-----------------------------------------------------------------------------------------+
| target-pattern | .. index:: |
| | pair: fencing-level; target-pattern |
| | |
   |                  | An extended regular expression (as defined in POSIX)                                   |
   |                  | matching the names of nodes to which this level applies                                |
+------------------+-----------------------------------------------------------------------------------------+
| target-attribute | .. index:: |
| | pair: fencing-level; target-attribute |
| | |
| | The name of a node attribute that is set (to ``target-value``) for nodes to which this |
| | level applies |
+------------------+-----------------------------------------------------------------------------------------+
| target-value | .. index:: |
| | pair: fencing-level; target-value |
| | |
| | The node attribute value (of ``target-attribute``) that is set for nodes to which this |
| | level applies |
+------------------+-----------------------------------------------------------------------------------------+
| index | .. index:: |
| | pair: fencing-level; index |
| | |
| | The order in which to attempt the levels. Levels are attempted in ascending order |
| | *until one succeeds*. Valid values are 1 through 9. |
+------------------+-----------------------------------------------------------------------------------------+
| devices | .. index:: |
| | pair: fencing-level; devices |
| | |
| | A comma-separated list of devices that must all be tried for this level |
+------------------+-----------------------------------------------------------------------------------------+
.. note:: **Fencing topology with different devices for different nodes**
   A sketch, with illustrative device and node names:

   .. code-block:: xml

      <cib>
        <configuration>
          ...
          <fencing-topology>
            <!-- node1 first tries its dedicated device, then the shared PDUs -->
            <fencing-level id="f-node1.1" target="node1" index="1" devices="ipmi-node1"/>
            <fencing-level id="f-node1.2" target="node1" index="2" devices="pdu1,pdu2"/>
            <!-- node2 uses a different dedicated device -->
            <fencing-level id="f-node2.1" target="node2" index="1" devices="ipmi-node2"/>
            <fencing-level id="f-node2.2" target="node2" index="2" devices="pdu1,pdu2"/>
          </fencing-topology>
          ...
        </configuration>
      </cib>
Example Dual-Layer, Dual-Device Fencing Topologies
__________________________________________________
The following example illustrates an advanced use of ``fencing-topology`` in a
cluster with the following properties:
* 2 nodes (prod-mysql1 and prod-mysql2)
* the nodes have IPMI controllers reachable at 192.0.2.1 and 192.0.2.2
* the nodes each have two independent Power Supply Units (PSUs) connected to
two independent Power Distribution Units (PDUs) reachable at 198.51.100.1
(port 10 and port 11) and 203.0.113.1 (port 10 and port 11)
* fencing via the IPMI controller uses the ``fence_ipmilan`` agent (1 fence device
per controller, with each device targeting a separate node)
* fencing via the PDUs uses the ``fence_apc_snmp`` agent (1 fence device per
PDU, with both devices targeting both nodes)
* a random delay is used to lessen the chance of a "death match"
* fencing topology is set to try IPMI fencing first then dual PDU fencing if
that fails
In a node failure scenario, Pacemaker will first select ``fence_ipmilan`` to
try to kill the faulty node. Using the fencing topology, if that method fails,
it will then move on to selecting ``fence_apc_snmp`` twice (once for the first
PDU, then again for the second PDU).
The fence action is considered successful only if both PDUs report the
required status. If either of them fails, fencing loops back to the first
fencing method, ``fence_ipmilan``, and so on, until the node is fenced or the
fencing operation is cancelled.
.. note:: **First fencing method: single IPMI device per target**
   Each cluster node has its own dedicated IPMI controller that can be
   contacted for fencing using the following primitives (a sketch; resource
   IDs and credentials are illustrative):

   .. code-block:: xml

      <primitive id="fence_prod-mysql1_ipmi" class="stonith" type="fence_ipmilan">
        <instance_attributes id="fence_prod-mysql1_ipmi-attrs">
          <nvpair id="fence_prod-mysql1_ipmi-ip" name="ip" value="192.0.2.1"/>
          <nvpair id="fence_prod-mysql1_ipmi-username" name="username" value="fencing"/>
          <nvpair id="fence_prod-mysql1_ipmi-password" name="password" value="fencing"/>
          <nvpair id="fence_prod-mysql1_ipmi-lanplus" name="lanplus" value="1"/>
          <nvpair id="fence_prod-mysql1_ipmi-hosts" name="pcmk_host_list" value="prod-mysql1"/>
          <nvpair id="fence_prod-mysql1_ipmi-delay" name="pcmk_delay_max" value="8s"/>
        </instance_attributes>
      </primitive>

      <primitive id="fence_prod-mysql2_ipmi" class="stonith" type="fence_ipmilan">
        <instance_attributes id="fence_prod-mysql2_ipmi-attrs">
          <nvpair id="fence_prod-mysql2_ipmi-ip" name="ip" value="192.0.2.2"/>
          <nvpair id="fence_prod-mysql2_ipmi-username" name="username" value="fencing"/>
          <nvpair id="fence_prod-mysql2_ipmi-password" name="password" value="fencing"/>
          <nvpair id="fence_prod-mysql2_ipmi-lanplus" name="lanplus" value="1"/>
          <nvpair id="fence_prod-mysql2_ipmi-hosts" name="pcmk_host_list" value="prod-mysql2"/>
          <nvpair id="fence_prod-mysql2_ipmi-delay" name="pcmk_delay_max" value="8s"/>
        </instance_attributes>
      </primitive>
.. note:: **Second fencing method: dual PDU devices**
Each cluster node also has 2 distinct power supplies controlled by 2
distinct PDUs:
* Node 1: PDU 1 port 10 and PDU 2 port 10
* Node 2: PDU 1 port 11 and PDU 2 port 11
The matching fencing agents are configured as follows:
   .. code-block:: xml

      <!-- IDs are illustrative; SNMP credentials omitted -->
      <primitive id="fence_pdu1" class="stonith" type="fence_apc_snmp">
        <instance_attributes id="fence_pdu1-attrs">
          <nvpair id="fence_pdu1-ip" name="ip" value="198.51.100.1"/>
          <nvpair id="fence_pdu1-map" name="pcmk_host_map"
                  value="prod-mysql1:10;prod-mysql2:11"/>
        </instance_attributes>
      </primitive>

      <primitive id="fence_pdu2" class="stonith" type="fence_apc_snmp">
        <instance_attributes id="fence_pdu2-attrs">
          <nvpair id="fence_pdu2-ip" name="ip" value="203.0.113.1"/>
          <nvpair id="fence_pdu2-map" name="pcmk_host_map"
                  value="prod-mysql1:10;prod-mysql2:11"/>
        </instance_attributes>
      </primitive>
.. note:: **Fencing topology**
Now that all the fencing resources are defined, it's time to create the
right topology. We want to first fence using IPMI and if that does not work,
fence both PDUs to effectively and surely kill the node.
   .. code-block:: xml

      <fencing-topology>
        <fencing-level id="fl-mysql1-ipmi" target="prod-mysql1" index="1"
                       devices="fence_prod-mysql1_ipmi"/>
        <fencing-level id="fl-mysql1-pdus" target="prod-mysql1" index="2"
                       devices="fence_pdu1,fence_pdu2"/>
        <fencing-level id="fl-mysql2-ipmi" target="prod-mysql2" index="1"
                       devices="fence_prod-mysql2_ipmi"/>
        <fencing-level id="fl-mysql2-pdus" target="prod-mysql2" index="2"
                       devices="fence_pdu1,fence_pdu2"/>
      </fencing-topology>
In ``fencing-topology``, the lowest ``index`` value for a target determines
its first fencing method.
Remapping Reboots
#################
When the cluster needs to reboot a node, whether because ``stonith-action`` is
``reboot`` or because a reboot was requested externally (such as by
``stonith_admin --reboot``), it will remap that to other commands in two cases:
* If the chosen fencing device does not support the ``reboot`` command, the
cluster will ask it to perform ``off`` instead.
* If a fencing topology level with multiple devices must be executed, the
cluster will ask all the devices to perform ``off``, then ask the devices to
perform ``on``.
To understand the second case, consider the example of a node with redundant
power supplies connected to intelligent power switches. Rebooting one switch
and then the other would have no effect on the node. Turning both switches off,
and then on, actually reboots the node.
In such a case, the fencing operation will be treated as successful as long as
the ``off`` commands succeed, because then it is safe for the cluster to
recover any resources that were on the node. Timeouts and errors in the ``on``
phase will be logged but ignored.
When a reboot operation is remapped, any action-specific timeout for the
remapped action will be used (for example, ``pcmk_off_timeout`` will be used
when executing the ``off`` command, not ``pcmk_reboot_timeout``).