diff --git a/cts/cli/regression.daemons.exp b/cts/cli/regression.daemons.exp
index 98b83b6267..b34fba8070 100644
--- a/cts/cli/regression.daemons.exp
+++ b/cts/cli/regression.daemons.exp
@@ -1,456 +1,456 @@
=#=#=#= Begin test: Get CIB manager metadata =#=#=#=
1.1Cluster options used by Pacemaker's Cluster Information Base managerCluster Information Base manager optionsEnable Access Control Lists (ACLs) for the CIBEnable Access Control Lists (ACLs) for the CIBRaise this if log has "Evicting client" messages for cluster daemon PIDs (a good value is the number of resources in the cluster multiplied by the number of nodes).Maximum IPC message backlog before disconnecting a cluster daemon
=#=#=#= End test: Get CIB manager metadata - OK (0) =#=#=#=
* Passed: pacemaker-based - Get CIB manager metadata
=#=#=#= Begin test: Get controller metadata =#=#=#=
1.1Cluster options used by Pacemaker's controllerPacemaker controller optionsIncludes a hash which identifies the exact changeset the code was built from. Used for diagnostic purposes.Pacemaker version on cluster node elected Designated Controller (DC)Used for informational and diagnostic purposes.The messaging stack on which Pacemaker is currently runningThis optional value is mostly for users' convenience as desired in administration, but may also be used in Pacemaker configuration rules via the #cluster-name node attribute, and by higher-level tools and resource agents.An arbitrary name for the clusterThe optimal value will depend on the speed and load of your network and the type of switches used.How long to wait for a response from other nodes during start-upPacemaker is primarily event-driven, and looks ahead to know when to recheck cluster state for failure timeouts and most time-based rules. However, it will also recheck the cluster after this amount of inactivity, to evaluate rules with date specifications and serve as a fail-safe for certain types of scheduler bugs. Allowed values: Zero disables polling, while positive values are an interval in seconds(unless other units are specified, for example "5min")Polling interval to recheck cluster state and evaluate rules with date specificationsThe cluster will slow down its recovery process when the amount of system resources used (currently CPU) approaches this limitMaximum amount of system load that should be used by cluster nodesMaximum number of jobs that can be scheduled per node (defaults to 2x cores)Maximum number of jobs that can be scheduled per node (defaults to 2x cores)A cluster node may receive notification of its own fencing if fencing is misconfigured, or if fabric fencing is in use that doesn't cut cluster communication. Allowed values are "stop" to attempt to immediately stop Pacemaker and stay stopped, or "panic" to attempt to immediately reboot the local node, falling back to stop on failure.How a cluster node should react if notified of its own fencingDeclare an election failed if it is not decided within this much time. If you need to adjust this value, it probably indicates the presence of a bug.*** Advanced Use Only ***Exit immediately if shutdown does not complete within this much time. If you need to adjust this value, it probably indicates the presence of a bug.*** Advanced Use Only ***If you need to adjust this value, it probably indicates the presence of a bug.*** Advanced Use Only ***If you need to adjust this value, it probably indicates the presence of a bug.*** Advanced Use Only ***Delay cluster recovery for this much time to allow for additional events to occur. Useful if your configuration is sensitive to the order in which ping updates arrive.*** Advanced Use Only *** Enabling this option will slow down cluster recovery under all conditionsIf this is set to a positive value, lost nodes are assumed to self-fence using watchdog-based SBD within this much time. This does not require a fencing resource to be explicitly configured, though a fence_watchdog resource can be configured, to limit use to specific nodes. If this is set to 0 (the default), the cluster will never assume watchdog-based self-fencing. If this is set to a negative value, the cluster will use twice the local value of the `SBD_WATCHDOG_TIMEOUT` environment variable if that is positive, or otherwise treat this as 0. 
WARNING: When used, this timeout must be larger than `SBD_WATCHDOG_TIMEOUT` on all nodes that use watchdog-based SBD, and Pacemaker will refuse to start on any of those nodes where this is not true for the local value or SBD is not active. When this is set to a negative value, `SBD_WATCHDOG_TIMEOUT` must be set to the same value on all nodes that use SBD, otherwise data corruption or loss could occur.How long before nodes can be assumed to be safely down when watchdog-based self-fencing via SBD is in useHow many times fencing can fail before it will no longer be immediately re-attempted on a targetHow many times fencing can fail before it will no longer be immediately re-attempted on a targetWhat to do when the cluster does not have quorum Allowed values: stop, freeze, ignore, demote, suicideWhat to do when the cluster does not have quorumWhen true, resources active on a node when it is cleanly shut down are kept "locked" to that node (not allowed to run elsewhere) until they start again on that node after it rejoins (or for at most shutdown-lock-limit, if set). Stonith resources and Pacemaker Remote connections are never locked. Clone and bundle instances and the promoted role of promotable clones are currently never locked, though support could be added in a future release.Whether to lock resources to a cleanly shut down nodeIf shutdown-lock is true and this is set to a nonzero time duration, shutdown locks will expire after this much time has passed since the shutdown was initiated, even if the node has not rejoined.Do not lock resources to a cleanly shut down node longer than this
- Fence nodes that do not join the controller process group within this much time after joining the cluster, to allow the cluster to continue managing resources. A value of 0 means never fence pending nodes.
+ Fence nodes that do not join the controller process group within this much time after joining the cluster, to allow the cluster to continue managing resources. A value of 0 means never fence pending nodes. Setting the value to 2h means fence nodes after 2 hours.How long to wait for a node that has joined the cluster to join the controller process group
=#=#=#= End test: Get controller metadata - OK (0) =#=#=#=
* Passed: pacemaker-controld - Get controller metadata
=#=#=#= Begin test: Get fencer metadata =#=#=#=
1.1Instance attributes available for all "stonith"-class resources and used by Pacemaker's fence daemon, formerly known as stonithdInstance attributes available for all "stonith"-class resourcessome devices do not support the standard 'port' parameter or may provide additional ones. Use this to specify an alternate, device-specific, parameter that should indicate the machine to be fenced. A value of none can be used to tell the cluster not to supply any additional parameters.Advanced use only: An alternate parameter to supply instead of 'port'Eg. node1:1;node2:2,3 would tell the cluster to use port 1 for node1 and ports 2 and 3 for node2A mapping of host names to ports numbers for devices that do not support host names.A list of machines controlled by this device (Optional unless pcmk_host_list=static-list)Eg. node1,node2,node3Allowed values: dynamic-list (query the device via the 'list' command), static-list (check the pcmk_host_list attribute), status (query the device via the 'status' command), none (assume every device can fence every machine)How to determine which machines are controlled by the device.Enable a delay of no more than the time specified before executing fencing actions. Pacemaker derives the overall delay by taking the value of pcmk_delay_base and adding a random delay value such that the sum is kept below this maximum.Enable a base delay for fencing actions and specify base delay value.This enables a static delay for fencing actions, which can help avoid "death matches" where two nodes try to fence each other at the same time. If pcmk_delay_max is also used, a random delay will be added such that the total delay is kept below that value.This can be set to a single time value to apply to any node targeted by this device (useful if a separate device is configured for each target), or to a node map (for example, "node1:1s;node2:5") to set a different value per target.Enable a base delay for fencing actions and specify base delay value.Cluster property concurrent-fencing=true needs to be configured first.Then use this to specify the maximum number of actions can be performed in parallel on this device. -1 is unlimited.The maximum number of actions can be performed in parallel on this deviceSome devices do not support the standard commands or may provide additional ones.\nUse this to specify an alternate, device-specific, command that implements the 'reboot' action.Advanced use only: An alternate command to run instead of 'reboot'Some devices need much more/less time to complete than normal.Use this to specify an alternate, device-specific, timeout for 'reboot' actions.Advanced use only: Specify an alternate timeout to use for reboot actions instead of stonith-timeoutSome devices do not support multiple connections. Operations may 'fail' if the device is busy with another task so Pacemaker will automatically retry the operation, if there is time remaining. 
Use this option to alter the number of times Pacemaker retries 'reboot' actions before giving up.Advanced use only: The maximum number of times to retry the 'reboot' command within the timeout periodSome devices do not support the standard commands or may provide additional ones.Use this to specify an alternate, device-specific, command that implements the 'off' action.Advanced use only: An alternate command to run instead of 'off'Some devices need much more/less time to complete than normal.Use this to specify an alternate, device-specific, timeout for 'off' actions.Advanced use only: Specify an alternate timeout to use for off actions instead of stonith-timeoutSome devices do not support multiple connections. Operations may 'fail' if the device is busy with another task so Pacemaker will automatically retry the operation, if there is time remaining. Use this option to alter the number of times Pacemaker retries 'off' actions before giving up.Advanced use only: The maximum number of times to retry the 'off' command within the timeout periodSome devices do not support the standard commands or may provide additional ones.Use this to specify an alternate, device-specific, command that implements the 'on' action.Advanced use only: An alternate command to run instead of 'on'Some devices need much more/less time to complete than normal.Use this to specify an alternate, device-specific, timeout for 'on' actions.Advanced use only: Specify an alternate timeout to use for on actions instead of stonith-timeoutSome devices do not support multiple connections. Operations may 'fail' if the device is busy with another task so Pacemaker will automatically retry the operation, if there is time remaining. Use this option to alter the number of times Pacemaker retries 'on' actions before giving up.Advanced use only: The maximum number of times to retry the 'on' command within the timeout periodSome devices do not support the standard commands or may provide additional ones.Use this to specify an alternate, device-specific, command that implements the 'list' action.Advanced use only: An alternate command to run instead of 'list'Some devices need much more/less time to complete than normal.Use this to specify an alternate, device-specific, timeout for 'list' actions.Advanced use only: Specify an alternate timeout to use for list actions instead of stonith-timeoutSome devices do not support multiple connections. Operations may 'fail' if the device is busy with another task so Pacemaker will automatically retry the operation, if there is time remaining. Use this option to alter the number of times Pacemaker retries 'list' actions before giving up.Advanced use only: The maximum number of times to retry the 'list' command within the timeout periodSome devices do not support the standard commands or may provide additional ones.Use this to specify an alternate, device-specific, command that implements the 'monitor' action.Advanced use only: An alternate command to run instead of 'monitor'Some devices need much more/less time to complete than normal.\nUse this to specify an alternate, device-specific, timeout for 'monitor' actions.Advanced use only: Specify an alternate timeout to use for monitor actions instead of stonith-timeoutSome devices do not support multiple connections. Operations may 'fail' if the device is busy with another task so Pacemaker will automatically retry the operation, if there is time remaining. 
Use this option to alter the number of times Pacemaker retries 'monitor' actions before giving up.Advanced use only: The maximum number of times to retry the 'monitor' command within the timeout periodSome devices do not support the standard commands or may provide additional ones.Use this to specify an alternate, device-specific, command that implements the 'status' action.Advanced use only: An alternate command to run instead of 'status'Some devices need much more/less time to complete than normal.Use this to specify an alternate, device-specific, timeout for 'status' actions.Advanced use only: Specify an alternate timeout to use for status actions instead of stonith-timeoutSome devices do not support multiple connections. Operations may 'fail' if the device is busy with another task so Pacemaker will automatically retry the operation, if there is time remaining. Use this option to alter the number of times Pacemaker retries 'status' actions before giving up.Advanced use only: The maximum number of times to retry the 'status' command within the timeout period
=#=#=#= End test: Get fencer metadata - OK (0) =#=#=#=
* Passed: pacemaker-fenced - Get fencer metadata
=#=#=#= Begin test: Get scheduler metadata =#=#=#=
1.1Cluster options used by Pacemaker's schedulerPacemaker scheduler optionsWhat to do when the cluster does not have quorum Allowed values: stop, freeze, ignore, demote, suicideWhat to do when the cluster does not have quorumWhether resources can run on any node by defaultWhether resources can run on any node by defaultWhether the cluster should refrain from monitoring, starting, and stopping resourcesWhether the cluster should refrain from monitoring, starting, and stopping resourcesWhen true, the cluster will immediately ban a resource from a node if it fails to start there. When false, the cluster will instead check the resource's fail count against its migration-threshold.Whether a start failure should prevent a resource from being recovered on the same nodeWhether the cluster should check for active resources during start-upWhether the cluster should check for active resources during start-upWhen true, resources active on a node when it is cleanly shut down are kept "locked" to that node (not allowed to run elsewhere) until they start again on that node after it rejoins (or for at most shutdown-lock-limit, if set). Stonith resources and Pacemaker Remote connections are never locked. Clone and bundle instances and the promoted role of promotable clones are currently never locked, though support could be added in a future release.Whether to lock resources to a cleanly shut down nodeIf shutdown-lock is true and this is set to a nonzero time duration, shutdown locks will expire after this much time has passed since the shutdown was initiated, even if the node has not rejoined.Do not lock resources to a cleanly shut down node longer than thisIf false, unresponsive nodes are immediately assumed to be harmless, and resources that were active on them may be recovered elsewhere. This can result in a "split-brain" situation, potentially leading to data loss and/or service unavailability.*** Advanced Use Only *** Whether nodes may be fenced as part of recoveryAction to send to fence device when a node needs to be fenced ("poweroff" is a deprecated alias for "off") Allowed values: reboot, off, poweroffAction to send to fence device when a node needs to be fenced ("poweroff" is a deprecated alias for "off")This value is not used by Pacemaker, but is kept for backward compatibility, and certain legacy fence agents might use it.*** Advanced Use Only *** Unused by PacemakerThis is set automatically by the cluster according to whether SBD is detected to be in use. User-configured values are ignored. The value `true` is meaningful if diskless SBD is used and `stonith-watchdog-timeout` is nonzero. In that case, if fencing is required, watchdog-based self-fencing will be performed via SBD without requiring a fencing resource explicitly configured.Whether watchdog integration is enabledAllow performing fencing operations in parallelAllow performing fencing operations in parallelSetting this to false may lead to a "split-brain" situation,potentially leading to data loss and/or service unavailability.*** Advanced Use Only *** Whether to fence unseen nodes at start-upApply specified delay for the fencings that are targeting the lost nodes with the highest total resource priority in case we don't have the majority of the nodes in our cluster partition, so that the more significant nodes potentially win any fencing match, which is especially meaningful under split-brain of 2-node cluster. A promoted resource instance takes the base priority + 1 on calculation if the base priority is not 0. 
Any static/random delays that are introduced by `pcmk_delay_base/max` configured for the corresponding fencing resources will be added to this delay. This delay should be significantly greater than, safely twice, the maximum `pcmk_delay_base/max`. By default, priority fencing delay is disabled.Apply fencing delay targeting the lost nodes with the highest total resource priority
- Fence nodes that do not join the controller process group within this much time after joining the cluster, to allow the cluster to continue managing resources. A value of 0 means never fence pending nodes.
+ Fence nodes that do not join the controller process group within this much time after joining the cluster, to allow the cluster to continue managing resources. A value of 0 means never fence pending nodes. Setting the value to 2h means fence nodes after 2 hours.How long to wait for a node that has joined the cluster to join the controller process groupThe node elected Designated Controller (DC) will consider an action failed if it does not get a response from the node executing the action within this time (after considering the action's own timeout). The "correct" value will depend on the speed and load of your network and cluster nodes.Maximum time for node-to-node communicationThe "correct" value will depend on the speed and load of your network and cluster nodes. If set to 0, the cluster will impose a dynamically calculated limit when any node has a high load.Maximum number of jobs that the cluster may execute in parallel across all nodesThe number of live migration actions that the cluster is allowed to execute in parallel on a node (-1 means no limit)The number of live migration actions that the cluster is allowed to execute in parallel on a node (-1 means no limit)Whether the cluster should stop all active resourcesWhether the cluster should stop all active resourcesWhether to stop resources that were removed from the configurationWhether to stop resources that were removed from the configurationWhether to cancel recurring actions removed from the configurationWhether to cancel recurring actions removed from the configurationValues other than default are poorly tested and potentially dangerous. This option will be removed in a future release.*** Deprecated *** Whether to remove stopped resources from the executorZero to disable, -1 to store unlimited.The number of scheduler inputs resulting in errors to saveZero to disable, -1 to store unlimited.The number of scheduler inputs resulting in warnings to saveZero to disable, -1 to store unlimited.The number of scheduler inputs without errors or warnings to saveRequires external entities to create node attributes (named with the prefix "#health") with values "red", "yellow", or "green". Allowed values: none, migrate-on-red, only-green, progressive, customHow cluster should react to node health attributesOnly used when "node-health-strategy" is set to "progressive".Base health score assigned to a nodeOnly used when "node-health-strategy" is set to "custom" or "progressive".The score to use for a node health attribute whose value is "green"Only used when "node-health-strategy" is set to "custom" or "progressive".The score to use for a node health attribute whose value is "yellow"Only used when "node-health-strategy" is set to "custom" or "progressive".The score to use for a node health attribute whose value is "red"How the cluster should allocate resources to nodes Allowed values: default, utilization, minimal, balancedHow the cluster should allocate resources to nodes
=#=#=#= End test: Get scheduler metadata - OK (0) =#=#=#=
* Passed: pacemaker-schedulerd - Get scheduler metadata
diff --git a/daemons/controld/controld_control.c b/daemons/controld/controld_control.c
index 540b3ff9f8..644d686bb1 100644
--- a/daemons/controld/controld_control.c
+++ b/daemons/controld/controld_control.c
@@ -1,863 +1,864 @@
/*
* Copyright 2004-2023 the Pacemaker project contributors
*
* The version control history for this file may have further details.
*
* This source code is licensed under the GNU General Public License version 2
* or later (GPLv2+) WITHOUT ANY WARRANTY.
*/
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
static qb_ipcs_service_t *ipcs = NULL;
static crm_trigger_t *config_read_trigger = NULL;
#if SUPPORT_COROSYNC
extern gboolean crm_connect_corosync(crm_cluster_t * cluster);
#endif
void crm_shutdown(int nsig);
static gboolean crm_read_options(gpointer user_data);
/* A_HA_CONNECT */
void
do_ha_control(long long action,
enum crmd_fsa_cause cause,
enum crmd_fsa_state cur_state,
enum crmd_fsa_input current_input, fsa_data_t * msg_data)
{
gboolean registered = FALSE;
static crm_cluster_t *cluster = NULL;
if (cluster == NULL) {
cluster = pcmk_cluster_new();
}
if (action & A_HA_DISCONNECT) {
crm_cluster_disconnect(cluster);
crm_info("Disconnected from the cluster");
controld_set_fsa_input_flags(R_HA_DISCONNECTED);
}
if (action & A_HA_CONNECT) {
crm_set_status_callback(&peer_update_callback);
crm_set_autoreap(FALSE);
#if SUPPORT_COROSYNC
if (is_corosync_cluster()) {
registered = crm_connect_corosync(cluster);
}
#endif // SUPPORT_COROSYNC
if (registered) {
controld_election_init(cluster->uname);
controld_globals.our_nodename = cluster->uname;
controld_globals.our_uuid = cluster->uuid;
if(cluster->uuid == NULL) {
crm_err("Could not obtain local uuid");
registered = FALSE;
}
}
if (!registered) {
controld_set_fsa_input_flags(R_HA_DISCONNECTED);
register_fsa_error(C_FSA_INTERNAL, I_ERROR, NULL);
return;
}
populate_cib_nodes(node_update_none, __func__);
controld_clear_fsa_input_flags(R_HA_DISCONNECTED);
crm_info("Connected to the cluster");
}
if (action & ~(A_HA_CONNECT | A_HA_DISCONNECT)) {
crm_err("Unexpected action %s in %s", fsa_action2string(action),
__func__);
}
}
/* A_SHUTDOWN */
void
do_shutdown(long long action,
enum crmd_fsa_cause cause,
enum crmd_fsa_state cur_state, enum crmd_fsa_input current_input, fsa_data_t * msg_data)
{
/* just in case */
controld_set_fsa_input_flags(R_SHUTDOWN);
controld_disconnect_fencer(FALSE);
}
/* A_SHUTDOWN_REQ */
void
do_shutdown_req(long long action,
enum crmd_fsa_cause cause,
enum crmd_fsa_state cur_state,
enum crmd_fsa_input current_input, fsa_data_t * msg_data)
{
xmlNode *msg = NULL;
controld_set_fsa_input_flags(R_SHUTDOWN);
//controld_set_fsa_input_flags(R_STAYDOWN);
crm_info("Sending shutdown request to all peers (DC is %s)",
pcmk__s(controld_globals.dc_name, "not set"));
msg = create_request(CRM_OP_SHUTDOWN_REQ, NULL, NULL, CRM_SYSTEM_CRMD, CRM_SYSTEM_CRMD, NULL);
if (send_cluster_message(NULL, crm_msg_crmd, msg, TRUE) == FALSE) {
register_fsa_error(C_FSA_INTERNAL, I_ERROR, NULL);
}
free_xml(msg);
}
void
crmd_fast_exit(crm_exit_t exit_code)
{
if (pcmk_is_set(controld_globals.fsa_input_register, R_STAYDOWN)) {
crm_warn("Inhibiting respawn "CRM_XS" remapping exit code %d to %d",
exit_code, CRM_EX_FATAL);
exit_code = CRM_EX_FATAL;
} else if ((exit_code == CRM_EX_OK)
&& pcmk_is_set(controld_globals.fsa_input_register,
R_IN_RECOVERY)) {
crm_err("Could not recover from internal error");
exit_code = CRM_EX_ERROR;
}
if (controld_globals.logger_out != NULL) {
controld_globals.logger_out->finish(controld_globals.logger_out,
exit_code, true, NULL);
pcmk__output_free(controld_globals.logger_out);
controld_globals.logger_out = NULL;
}
crm_exit(exit_code);
}
crm_exit_t
crmd_exit(crm_exit_t exit_code)
{
GMainLoop *mloop = controld_globals.mainloop;
static bool in_progress = FALSE;
if (in_progress && (exit_code == CRM_EX_OK)) {
crm_debug("Exit is already in progress");
return exit_code;
} else if(in_progress) {
crm_notice("Error during shutdown process, exiting now with status %d (%s)",
exit_code, crm_exit_str(exit_code));
crm_write_blackbox(SIGTRAP, NULL);
crmd_fast_exit(exit_code);
}
in_progress = TRUE;
crm_trace("Preparing to exit with status %d (%s)",
exit_code, crm_exit_str(exit_code));
/* Suppress secondary errors resulting from us disconnecting everything */
controld_set_fsa_input_flags(R_HA_DISCONNECTED);
/* Close all IPC servers and clients to ensure any and all shared memory files are cleaned up */
if(ipcs) {
crm_trace("Closing IPC server");
mainloop_del_ipc_server(ipcs);
ipcs = NULL;
}
controld_close_attrd_ipc();
controld_shutdown_schedulerd_ipc();
controld_disconnect_fencer(TRUE);
if ((exit_code == CRM_EX_OK) && (controld_globals.mainloop == NULL)) {
crm_debug("No mainloop detected");
exit_code = CRM_EX_ERROR;
}
/* On an error, just get out.
*
* Otherwise, make the effort to have mainloop exit gracefully so
* that it (mostly) cleans up after itself and valgrind has less
* to report on - allowing real errors to stand out
*/
if (exit_code != CRM_EX_OK) {
crm_notice("Forcing immediate exit with status %d (%s)",
exit_code, crm_exit_str(exit_code));
crm_write_blackbox(SIGTRAP, NULL);
crmd_fast_exit(exit_code);
}
/* Clean up as much memory as possible for valgrind */
for (GList *iter = controld_globals.fsa_message_queue; iter != NULL;
iter = iter->next) {
fsa_data_t *fsa_data = (fsa_data_t *) iter->data;
crm_info("Dropping %s: [ state=%s cause=%s origin=%s ]",
fsa_input2string(fsa_data->fsa_input),
fsa_state2string(controld_globals.fsa_state),
fsa_cause2string(fsa_data->fsa_cause), fsa_data->origin);
delete_fsa_input(fsa_data);
}
controld_clear_fsa_input_flags(R_MEMBERSHIP);
g_list_free(controld_globals.fsa_message_queue);
controld_globals.fsa_message_queue = NULL;
controld_free_node_pending_timers();
controld_election_fini();
/* Tear down the CIB manager connection, but don't free it yet -- it could
* be used when we drain the mainloop later.
*/
controld_disconnect_cib_manager();
verify_stopped(controld_globals.fsa_state, LOG_WARNING);
controld_clear_fsa_input_flags(R_LRM_CONNECTED);
lrm_state_destroy_all();
mainloop_destroy_trigger(config_read_trigger);
config_read_trigger = NULL;
controld_destroy_fsa_trigger();
controld_destroy_transition_trigger();
pcmk__client_cleanup();
crm_peer_destroy();
controld_free_fsa_timers();
te_cleanup_stonith_history_sync(NULL, TRUE);
controld_free_sched_timer();
free(controld_globals.our_nodename);
controld_globals.our_nodename = NULL;
free(controld_globals.our_uuid);
controld_globals.our_uuid = NULL;
free(controld_globals.dc_name);
controld_globals.dc_name = NULL;
free(controld_globals.dc_version);
controld_globals.dc_version = NULL;
free(controld_globals.cluster_name);
controld_globals.cluster_name = NULL;
free(controld_globals.te_uuid);
controld_globals.te_uuid = NULL;
free_max_generation();
controld_destroy_failed_sync_table();
controld_destroy_outside_events_table();
mainloop_destroy_signal(SIGPIPE);
mainloop_destroy_signal(SIGUSR1);
mainloop_destroy_signal(SIGTERM);
mainloop_destroy_signal(SIGTRAP);
/* leave SIGCHLD engaged as we might still want to drain some service-actions */
if (mloop) {
GMainContext *ctx = g_main_loop_get_context(controld_globals.mainloop);
/* Don't re-enter this block */
controld_globals.mainloop = NULL;
/* no signals on final draining anymore */
mainloop_destroy_signal(SIGCHLD);
crm_trace("Draining mainloop %d %d", g_main_loop_is_running(mloop), g_main_context_pending(ctx));
{
int lpc = 0;
while((g_main_context_pending(ctx) && lpc < 10)) {
lpc++;
crm_trace("Iteration %d", lpc);
g_main_context_dispatch(ctx);
}
}
crm_trace("Closing mainloop %d %d", g_main_loop_is_running(mloop), g_main_context_pending(ctx));
g_main_loop_quit(mloop);
/* Won't do anything yet, since we're inside it now */
g_main_loop_unref(mloop);
} else {
mainloop_destroy_signal(SIGCHLD);
}
cib_delete(controld_globals.cib_conn);
controld_globals.cib_conn = NULL;
throttle_fini();
/* Graceful */
crm_trace("Done preparing for exit with status %d (%s)",
exit_code, crm_exit_str(exit_code));
return exit_code;
}
/* A_EXIT_0, A_EXIT_1 */
void
do_exit(long long action,
enum crmd_fsa_cause cause,
enum crmd_fsa_state cur_state, enum crmd_fsa_input current_input, fsa_data_t * msg_data)
{
crm_exit_t exit_code = CRM_EX_OK;
if (pcmk_is_set(action, A_EXIT_1)) {
exit_code = CRM_EX_ERROR;
crm_err("Exiting now due to errors");
}
verify_stopped(cur_state, LOG_ERR);
crmd_exit(exit_code);
}
static void sigpipe_ignore(int nsig) { return; }
/* A_STARTUP */
void
do_startup(long long action,
enum crmd_fsa_cause cause,
enum crmd_fsa_state cur_state, enum crmd_fsa_input current_input, fsa_data_t * msg_data)
{
crm_debug("Registering Signal Handlers");
mainloop_add_signal(SIGTERM, crm_shutdown);
mainloop_add_signal(SIGPIPE, sigpipe_ignore);
config_read_trigger = mainloop_add_trigger(G_PRIORITY_HIGH,
crm_read_options, NULL);
controld_init_fsa_trigger();
controld_init_transition_trigger();
crm_debug("Creating CIB manager and executor objects");
controld_globals.cib_conn = cib_new();
lrm_state_init_local();
if (controld_init_fsa_timers() == FALSE) {
register_fsa_error(C_FSA_INTERNAL, I_ERROR, NULL);
}
}
// \return libqb error code (0 on success, -errno on error)
static int32_t
accept_controller_client(qb_ipcs_connection_t *c, uid_t uid, gid_t gid)
{
crm_trace("Accepting new IPC client connection");
if (pcmk__new_client(c, uid, gid) == NULL) {
return -EIO;
}
return 0;
}
// \return libqb error code (0 on success, -errno on error)
static int32_t
dispatch_controller_ipc(qb_ipcs_connection_t * c, void *data, size_t size)
{
uint32_t id = 0;
uint32_t flags = 0;
pcmk__client_t *client = pcmk__find_client(c);
xmlNode *msg = pcmk__client_data2xml(client, data, &id, &flags);
if (msg == NULL) {
pcmk__ipc_send_ack(client, id, flags, "ack", NULL, CRM_EX_PROTOCOL);
return 0;
}
pcmk__ipc_send_ack(client, id, flags, "ack", NULL, CRM_EX_INDETERMINATE);
CRM_ASSERT(client->user != NULL);
pcmk__update_acl_user(msg, F_CRM_USER, client->user);
crm_xml_add(msg, F_CRM_SYS_FROM, client->id);
if (controld_authorize_ipc_message(msg, client, NULL)) {
crm_trace("Processing IPC message from client %s",
pcmk__client_name(client));
route_message(C_IPC_MESSAGE, msg);
}
controld_trigger_fsa();
free_xml(msg);
return 0;
}
static int32_t
ipc_client_disconnected(qb_ipcs_connection_t *c)
{
pcmk__client_t *client = pcmk__find_client(c);
if (client) {
crm_trace("Disconnecting %sregistered client %s (%p/%p)",
(client->userdata? "" : "un"), pcmk__client_name(client),
c, client);
free(client->userdata);
pcmk__free_client(client);
controld_trigger_fsa();
}
return 0;
}
static void
ipc_connection_destroyed(qb_ipcs_connection_t *c)
{
crm_trace("Connection %p", c);
ipc_client_disconnected(c);
}
/* A_STOP */
void
do_stop(long long action,
enum crmd_fsa_cause cause,
enum crmd_fsa_state cur_state, enum crmd_fsa_input current_input, fsa_data_t * msg_data)
{
crm_trace("Closing IPC server");
mainloop_del_ipc_server(ipcs); ipcs = NULL;
register_fsa_input(C_FSA_INTERNAL, I_TERMINATE, NULL);
}
/* A_STARTED */
void
do_started(long long action,
enum crmd_fsa_cause cause,
enum crmd_fsa_state cur_state, enum crmd_fsa_input current_input, fsa_data_t * msg_data)
{
static struct qb_ipcs_service_handlers crmd_callbacks = {
.connection_accept = accept_controller_client,
.connection_created = NULL,
.msg_process = dispatch_controller_ipc,
.connection_closed = ipc_client_disconnected,
.connection_destroyed = ipc_connection_destroyed
};
if (cur_state != S_STARTING) {
crm_err("Start cancelled... %s", fsa_state2string(cur_state));
return;
} else if (!pcmk_is_set(controld_globals.fsa_input_register,
R_MEMBERSHIP)) {
crm_info("Delaying start, no membership data (%.16llx)", R_MEMBERSHIP);
crmd_fsa_stall(TRUE);
return;
} else if (!pcmk_is_set(controld_globals.fsa_input_register,
R_LRM_CONNECTED)) {
crm_info("Delaying start, not connected to executor (%.16llx)", R_LRM_CONNECTED);
crmd_fsa_stall(TRUE);
return;
} else if (!pcmk_is_set(controld_globals.fsa_input_register,
R_CIB_CONNECTED)) {
crm_info("Delaying start, CIB not connected (%.16llx)", R_CIB_CONNECTED);
crmd_fsa_stall(TRUE);
return;
} else if (!pcmk_is_set(controld_globals.fsa_input_register,
R_READ_CONFIG)) {
crm_info("Delaying start, Config not read (%.16llx)", R_READ_CONFIG);
crmd_fsa_stall(TRUE);
return;
} else if (!pcmk_is_set(controld_globals.fsa_input_register, R_PEER_DATA)) {
crm_info("Delaying start, No peer data (%.16llx)", R_PEER_DATA);
crmd_fsa_stall(TRUE);
return;
}
crm_debug("Init server comms");
ipcs = pcmk__serve_controld_ipc(&crmd_callbacks);
if (ipcs == NULL) {
crm_err("Failed to create IPC server: shutting down and inhibiting respawn");
register_fsa_error(C_FSA_INTERNAL, I_ERROR, NULL);
} else {
crm_notice("Pacemaker controller successfully started and accepting connections");
}
controld_set_fsa_input_flags(R_ST_REQUIRED);
controld_timer_fencer_connect(GINT_TO_POINTER(TRUE));
controld_clear_fsa_input_flags(R_STARTING);
register_fsa_input(msg_data->fsa_cause, I_PENDING, NULL);
}
/* A_RECOVER */
void
do_recover(long long action,
enum crmd_fsa_cause cause,
enum crmd_fsa_state cur_state, enum crmd_fsa_input current_input, fsa_data_t * msg_data)
{
controld_set_fsa_input_flags(R_IN_RECOVERY);
crm_warn("Fast-tracking shutdown in response to errors");
register_fsa_input(C_FSA_INTERNAL, I_TERMINATE, NULL);
}
static pcmk__cluster_option_t controller_options[] = {
/* name, old name, type, allowed values,
* default value, validator,
* short description,
* long description
*/
{
"dc-version", NULL, "string", NULL, PCMK__VALUE_NONE, NULL,
N_("Pacemaker version on cluster node elected Designated Controller (DC)"),
N_("Includes a hash which identifies the exact changeset the code was "
"built from. Used for diagnostic purposes.")
},
{
"cluster-infrastructure", NULL, "string", NULL, "corosync", NULL,
N_("The messaging stack on which Pacemaker is currently running"),
N_("Used for informational and diagnostic purposes.")
},
{
"cluster-name", NULL, "string", NULL, NULL, NULL,
N_("An arbitrary name for the cluster"),
N_("This optional value is mostly for users' convenience as desired "
"in administration, but may also be used in Pacemaker "
"configuration rules via the #cluster-name node attribute, and "
"by higher-level tools and resource agents.")
},
{
XML_CONFIG_ATTR_DC_DEADTIME, NULL, "time",
NULL, "20s", pcmk__valid_interval_spec,
N_("How long to wait for a response from other nodes during start-up"),
N_("The optimal value will depend on the speed and load of your network "
"and the type of switches used.")
},
{
XML_CONFIG_ATTR_RECHECK, NULL, "time",
N_("Zero disables polling, while positive values are an interval in seconds"
"(unless other units are specified, for example \"5min\")"),
"15min", pcmk__valid_interval_spec,
N_("Polling interval to recheck cluster state and evaluate rules "
"with date specifications"),
N_("Pacemaker is primarily event-driven, and looks ahead to know when to "
"recheck cluster state for failure timeouts and most time-based "
"rules. However, it will also recheck the cluster after this "
"amount of inactivity, to evaluate rules with date specifications "
"and serve as a fail-safe for certain types of scheduler bugs.")
},
{
"load-threshold", NULL, "percentage", NULL,
"80%", pcmk__valid_percentage,
N_("Maximum amount of system load that should be used by cluster nodes"),
N_("The cluster will slow down its recovery process when the amount of "
"system resources used (currently CPU) approaches this limit"),
},
{
"node-action-limit", NULL, "integer", NULL,
"0", pcmk__valid_number,
N_("Maximum number of jobs that can be scheduled per node "
"(defaults to 2x cores)")
},
{ XML_CONFIG_ATTR_FENCE_REACTION, NULL, "string", NULL, "stop", NULL,
N_("How a cluster node should react if notified of its own fencing"),
N_("A cluster node may receive notification of its own fencing if fencing "
"is misconfigured, or if fabric fencing is in use that doesn't cut "
"cluster communication. Allowed values are \"stop\" to attempt to "
"immediately stop Pacemaker and stay stopped, or \"panic\" to attempt "
"to immediately reboot the local node, falling back to stop on failure.")
},
{
XML_CONFIG_ATTR_ELECTION_FAIL, NULL, "time", NULL,
"2min", pcmk__valid_interval_spec,
"*** Advanced Use Only ***",
N_("Declare an election failed if it is not decided within this much "
"time. If you need to adjust this value, it probably indicates "
"the presence of a bug.")
},
{
XML_CONFIG_ATTR_FORCE_QUIT, NULL, "time", NULL,
"20min", pcmk__valid_interval_spec,
"*** Advanced Use Only ***",
N_("Exit immediately if shutdown does not complete within this much "
"time. If you need to adjust this value, it probably indicates "
"the presence of a bug.")
},
{
"join-integration-timeout", "crmd-integration-timeout", "time", NULL,
"3min", pcmk__valid_interval_spec,
"*** Advanced Use Only ***",
N_("If you need to adjust this value, it probably indicates "
"the presence of a bug.")
},
{
"join-finalization-timeout", "crmd-finalization-timeout", "time", NULL,
"30min", pcmk__valid_interval_spec,
"*** Advanced Use Only ***",
N_("If you need to adjust this value, it probably indicates "
"the presence of a bug.")
},
{
"transition-delay", "crmd-transition-delay", "time", NULL,
"0s", pcmk__valid_interval_spec,
N_("*** Advanced Use Only *** Enabling this option will slow down "
"cluster recovery under all conditions"),
N_("Delay cluster recovery for this much time to allow for additional "
"events to occur. Useful if your configuration is sensitive to "
"the order in which ping updates arrive.")
},
{
"stonith-watchdog-timeout", NULL, "time", NULL,
"0", controld_verify_stonith_watchdog_timeout,
N_("How long before nodes can be assumed to be safely down when "
"watchdog-based self-fencing via SBD is in use"),
N_("If this is set to a positive value, lost nodes are assumed to "
"self-fence using watchdog-based SBD within this much time. This "
"does not require a fencing resource to be explicitly configured, "
"though a fence_watchdog resource can be configured, to limit use "
"to specific nodes. If this is set to 0 (the default), the cluster "
"will never assume watchdog-based self-fencing. If this is set to a "
"negative value, the cluster will use twice the local value of the "
"`SBD_WATCHDOG_TIMEOUT` environment variable if that is positive, "
"or otherwise treat this as 0. WARNING: When used, this timeout "
"must be larger than `SBD_WATCHDOG_TIMEOUT` on all nodes that use "
"watchdog-based SBD, and Pacemaker will refuse to start on any of "
"those nodes where this is not true for the local value or SBD is "
"not active. When this is set to a negative value, "
"`SBD_WATCHDOG_TIMEOUT` must be set to the same value on all nodes "
"that use SBD, otherwise data corruption or loss could occur.")
},
{
"stonith-max-attempts", NULL, "integer", NULL,
"10", pcmk__valid_positive_number,
N_("How many times fencing can fail before it will no longer be "
"immediately re-attempted on a target")
},
// Already documented in libpe_status (other values must be kept identical)
{
"no-quorum-policy", NULL, "select",
"stop, freeze, ignore, demote, suicide", "stop", pcmk__valid_quorum,
N_("What to do when the cluster does not have quorum"), NULL
},
{
XML_CONFIG_ATTR_SHUTDOWN_LOCK, NULL, "boolean", NULL,
"false", pcmk__valid_boolean,
N_("Whether to lock resources to a cleanly shut down node"),
N_("When true, resources active on a node when it is cleanly shut down "
"are kept \"locked\" to that node (not allowed to run elsewhere) "
"until they start again on that node after it rejoins (or for at "
"most shutdown-lock-limit, if set). Stonith resources and "
"Pacemaker Remote connections are never locked. Clone and bundle "
"instances and the promoted role of promotable clones are "
"currently never locked, though support could be added in a future "
"release.")
},
{
XML_CONFIG_ATTR_SHUTDOWN_LOCK_LIMIT, NULL, "time", NULL,
"0", pcmk__valid_interval_spec,
N_("Do not lock resources to a cleanly shut down node longer than "
"this"),
N_("If shutdown-lock is true and this is set to a nonzero time "
"duration, shutdown locks will expire after this much time has "
"passed since the shutdown was initiated, even if the node has not "
"rejoined.")
},
{
XML_CONFIG_ATTR_NODE_PENDING_TIMEOUT, NULL, "time", NULL,
- "2h", pcmk__valid_interval_spec,
+ "0", pcmk__valid_interval_spec,
N_("How long to wait for a node that has joined the cluster to join "
"the controller process group"),
N_("Fence nodes that do not join the controller process group within "
"this much time after joining the cluster, to allow the cluster "
- "to continue managing resources. A value of 0 means never fence "
- "pending nodes.")
+ "to continue managing resources. A value of 0 means never fence "
+ "pending nodes. Setting the value to 2h means fence nodes after "
+ "2 hours.")
},
};
void
crmd_metadata(void)
{
const char *desc_short = "Pacemaker controller options";
const char *desc_long = "Cluster options used by Pacemaker's controller";
gchar *s = pcmk__format_option_metadata("pacemaker-controld", desc_short,
desc_long, controller_options,
PCMK__NELEM(controller_options));
printf("%s", s);
g_free(s);
}
static void
config_query_callback(xmlNode * msg, int call_id, int rc, xmlNode * output, void *user_data)
{
const char *value = NULL;
GHashTable *config_hash = NULL;
crm_time_t *now = crm_time_new(NULL);
xmlNode *crmconfig = NULL;
xmlNode *alerts = NULL;
if (rc != pcmk_ok) {
fsa_data_t *msg_data = NULL;
crm_err("Local CIB query resulted in an error: %s", pcmk_strerror(rc));
register_fsa_error(C_FSA_INTERNAL, I_ERROR, NULL);
if (rc == -EACCES || rc == -pcmk_err_schema_validation) {
crm_err("The cluster is mis-configured - shutting down and staying down");
controld_set_fsa_input_flags(R_STAYDOWN);
}
goto bail;
}
crmconfig = output;
if ((crmconfig != NULL)
&& !pcmk__xe_is(crmconfig, XML_CIB_TAG_CRMCONFIG)) {
crmconfig = first_named_child(crmconfig, XML_CIB_TAG_CRMCONFIG);
}
if (!crmconfig) {
fsa_data_t *msg_data = NULL;
crm_err("Local CIB query for " XML_CIB_TAG_CRMCONFIG " section failed");
register_fsa_error(C_FSA_INTERNAL, I_ERROR, NULL);
goto bail;
}
crm_debug("Call %d : Parsing CIB options", call_id);
config_hash = pcmk__strkey_table(free, free);
pe_unpack_nvpairs(crmconfig, crmconfig, XML_CIB_TAG_PROPSET, NULL,
config_hash, CIB_OPTIONS_FIRST, FALSE, now, NULL);
// Validate all options, and use defaults if not already present in hash
pcmk__validate_cluster_options(config_hash, controller_options,
PCMK__NELEM(controller_options));
value = g_hash_table_lookup(config_hash, "no-quorum-policy");
if (pcmk__str_eq(value, "suicide", pcmk__str_casei) && pcmk__locate_sbd()) {
controld_set_global_flags(controld_no_quorum_suicide);
}
value = g_hash_table_lookup(config_hash, XML_CONFIG_ATTR_SHUTDOWN_LOCK);
if (crm_is_true(value)) {
controld_set_global_flags(controld_shutdown_lock_enabled);
} else {
controld_clear_global_flags(controld_shutdown_lock_enabled);
}
value = g_hash_table_lookup(config_hash,
XML_CONFIG_ATTR_SHUTDOWN_LOCK_LIMIT);
controld_globals.shutdown_lock_limit = crm_parse_interval_spec(value)
/ 1000;
value = g_hash_table_lookup(config_hash,
XML_CONFIG_ATTR_NODE_PENDING_TIMEOUT);
controld_globals.node_pending_timeout = crm_parse_interval_spec(value) / 1000;
value = g_hash_table_lookup(config_hash, "cluster-name");
pcmk__str_update(&(controld_globals.cluster_name), value);
// Let subcomponents initialize their own static variables
controld_configure_election(config_hash);
controld_configure_fencing(config_hash);
controld_configure_fsa_timers(config_hash);
controld_configure_throttle(config_hash);
alerts = first_named_child(output, XML_CIB_TAG_ALERTS);
crmd_unpack_alerts(alerts);
controld_set_fsa_input_flags(R_READ_CONFIG);
controld_trigger_fsa();
g_hash_table_destroy(config_hash);
bail:
crm_time_free(now);
}
/*!
* \internal
* \brief Trigger read and processing of the configuration
*
* \param[in] fn Calling function name
* \param[in] line Line number where call occurred
*/
void
controld_trigger_config_as(const char *fn, int line)
{
if (config_read_trigger != NULL) {
crm_trace("%s:%d - Triggered config processing", fn, line);
mainloop_set_trigger(config_read_trigger);
}
}
gboolean
crm_read_options(gpointer user_data)
{
cib_t *cib_conn = controld_globals.cib_conn;
int call_id = cib_conn->cmds->query(cib_conn,
"//" XML_CIB_TAG_CRMCONFIG
" | //" XML_CIB_TAG_ALERTS,
NULL, cib_xpath|cib_scope_local);
fsa_register_cib_callback(call_id, NULL, config_query_callback);
crm_trace("Querying the CIB... call %d", call_id);
return TRUE;
}
/* A_READCONFIG */
void
do_read_config(long long action,
enum crmd_fsa_cause cause,
enum crmd_fsa_state cur_state,
enum crmd_fsa_input current_input, fsa_data_t * msg_data)
{
throttle_init();
controld_trigger_config();
}
void
crm_shutdown(int nsig)
{
const char *value = NULL;
guint default_period_ms = 0;
if ((controld_globals.mainloop == NULL)
|| !g_main_loop_is_running(controld_globals.mainloop)) {
crmd_exit(CRM_EX_OK);
return;
}
if (pcmk_is_set(controld_globals.fsa_input_register, R_SHUTDOWN)) {
crm_err("Escalating shutdown");
register_fsa_input_before(C_SHUTDOWN, I_ERROR, NULL);
return;
}
controld_set_fsa_input_flags(R_SHUTDOWN);
register_fsa_input(C_SHUTDOWN, I_SHUTDOWN, NULL);
/* If shutdown timer doesn't have a period set, use the default
*
* @TODO: Evaluate whether this is still necessary. As long as
* config_query_callback() has been run at least once, it doesn't look like
* anything could have changed the timer period since then.
*/
value = pcmk__cluster_option(NULL, controller_options,
PCMK__NELEM(controller_options),
XML_CONFIG_ATTR_FORCE_QUIT);
default_period_ms = crm_parse_interval_spec(value);
controld_shutdown_start_countdown(default_period_ms);
}
diff --git a/doc/sphinx/Pacemaker_Explained/options.rst b/doc/sphinx/Pacemaker_Explained/options.rst
index d38a2ab892..77bd7e65bc 100644
--- a/doc/sphinx/Pacemaker_Explained/options.rst
+++ b/doc/sphinx/Pacemaker_Explained/options.rst
@@ -1,921 +1,921 @@
Cluster-Wide Configuration
--------------------------
.. index::
pair: XML element; cib
pair: XML element; configuration
Configuration Layout
####################
The cluster is defined by the Cluster Information Base (CIB), which uses XML
notation. The simplest CIB, an empty one, looks like this:
.. topic:: An empty configuration
.. code-block:: xml
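
   <!-- Illustrative reconstruction of an empty CIB; the exact schema and
        feature-set versions shown here are placeholders -->
   <cib crm_feature_set="3.6.0" validate-with="pacemaker-3.9" epoch="1"
        num_updates="0" admin_epoch="0">
     <configuration>
       <crm_config/>
       <nodes/>
       <resources/>
       <constraints/>
     </configuration>
     <status/>
   </cib>
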
The empty configuration above contains the major sections that make up a CIB:
* ``cib``: The entire CIB is enclosed with a ``cib`` element. Certain
fundamental settings are defined as attributes of this element.
* ``configuration``: This section -- the primary focus of this document --
contains traditional configuration information such as what resources the
cluster serves and the relationships among them.
* ``crm_config``: cluster-wide configuration options
* ``nodes``: the machines that host the cluster
* ``resources``: the services run by the cluster
* ``constraints``: indications of how resources should be placed
* ``status``: This section contains the history of each resource on each
node. Based on this data, the cluster can construct the complete current
state of the cluster. The authoritative source for this section is the
local executor (pacemaker-execd process) on each cluster node, and the
cluster will occasionally repopulate the entire section. For this reason,
it is never written to disk, and administrators are advised against
modifying it in any way.
In this document, configuration settings will be described as properties or
options based on how they are defined in the CIB:
* Properties are XML attributes of an XML element.
* Options are name-value pairs expressed as ``nvpair`` child elements of an XML
element.
Normally, you will use command-line tools that abstract the XML, so the
distinction will be unimportant; both properties and options are cluster
settings you can tweak.
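For example, in the sketch below (the ``id`` values and cluster name are purely
illustrative), ``validate-with`` is a property because it is an attribute of the
``cib`` element, while ``cluster-name`` is an option because it is an ``nvpair``
inside a ``cluster_property_set`` in the ``crm_config`` section:

.. code-block:: xml

   <cib validate-with="pacemaker-3.9" epoch="5" num_updates="0" admin_epoch="0">
     <configuration>
       <crm_config>
         <cluster_property_set id="cib-bootstrap-options">
           <!-- an option: a name/value pair -->
           <nvpair id="opt-cluster-name" name="cluster-name" value="mycluster"/>
         </cluster_property_set>
       </crm_config>
       <nodes/>
       <resources/>
       <constraints/>
     </configuration>
     <status/>
   </cib>
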
Configuration Value Types
#########################
Throughout this document, configuration values will be designated as having one
of the following types:
.. list-table:: **Configuration Value Types**
:class: longtable
:widths: 1 3
:header-rows: 1
* - Type
- Description
* - .. _boolean:
.. index::
pair: type; boolean
boolean
- Case-insensitive text value where ``1``, ``yes``, ``y``, ``on``,
and ``true`` evaluate as true and ``0``, ``no``, ``n``, ``off``,
``false``, and unset evaluate as false
* - .. _date_time:
.. index::
pair: type; date/time
date/time
- Textual timestamp like ``Sat Dec 21 11:47:45 2013``
* - .. _duration:
.. index::
pair: type; duration
duration
- A time duration, specified either like a :ref:`timeout <timeout>` or an
`ISO 8601 duration `_.
A duration may be up to approximately 49 days but is intended for much
smaller time periods.
* - .. _enumeration:
.. index::
pair: type; enumeration
enumeration
- Text that must be one of a set of defined values (which will be listed
in the description)
* - .. _integer:
.. index::
pair: type; integer
integer
- 32-bit signed integer value (-2,147,483,648 to 2,147,483,647)
* - .. _nonnegative_integer:
.. index::
pair: type; nonnegative integer
nonnegative integer
- 32-bit nonnegative integer value (0 to 2,147,483,647)
* - .. _port:
.. index::
pair: type; port
port
- Integer TCP port number (0 to 65535)
* - .. _score:
.. index::
pair: type; score
score
- A Pacemaker score can be an integer between -1,000,000 and 1,000,000, or
a string alias: ``INFINITY`` or ``+INFINITY`` is equivalent to
1,000,000, ``-INFINITY`` is equivalent to -1,000,000, and ``red``,
``yellow``, and ``green`` are equivalent to integers as described in
:ref:`node-health`.
* - .. _text:
.. index::
pair: type; text
text
- A text string
* - .. _timeout:
.. index::
pair: type; timeout
timeout
- A time duration, specified as a bare number (in which case it is
considered to be in seconds) or a number with a unit (``ms`` or ``msec``
for milliseconds, ``us`` or ``usec`` for microseconds, ``s`` or ``sec``
for seconds, ``m`` or ``min`` for minutes, ``h`` or ``hr`` for hours)
optionally with whitespace before and/or after the number.
* - .. _version:
.. index::
pair: type; version
version
- Version number (any combination of alphanumeric characters, dots, and
dashes, starting with a number).
Scores
______
Scores are integral to how Pacemaker works. Practically everything from moving
a resource to deciding which resource to stop in a degraded cluster is achieved
by manipulating scores in some way.
Scores are calculated per resource and node. Any node with a negative score for
a resource can't run that resource. The cluster places a resource on the node
with the highest score for it.
Score addition and subtraction follow these rules:
* Any value (including ``INFINITY``) - ``INFINITY`` = ``-INFINITY``
* ``INFINITY`` + any value other than ``-INFINITY`` = ``INFINITY``
.. note::
What if you want to use a score higher than 1,000,000? Typically this possibility
arises when someone wants to base the score on some external metric that might
go above 1,000,000.
The short answer is you can't.
The long answer is it is sometimes possible to work around this limitation
creatively. You may be able to set the score to some computed value based on
the external metric rather than use the metric directly. For nodes, you can
store the metric as a node attribute, and query the attribute when computing
the score (possibly as part of a custom resource agent).
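As a rough sketch of that workaround (the resource name, attribute name, and
``id`` values here are hypothetical), a location constraint rule can take its
score directly from a node attribute via ``score-attribute``, so an external
tool only needs to keep that attribute up to date:

.. code-block:: xml

   <rsc_location id="loc-by-metric" rsc="WebServer">
     <!-- the rule's score is read from the "capacity-score" node attribute -->
     <rule id="loc-by-metric-rule" score-attribute="capacity-score">
       <expression id="loc-by-metric-expr" attribute="capacity-score"
                   operation="defined"/>
     </rule>
   </rsc_location>
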
CIB Properties
##############
Certain settings are defined by CIB properties (that is, attributes of the
``cib`` tag) rather than with the rest of the cluster configuration in the
``configuration`` section.
The reason is simply a matter of parsing. These options are used by the
configuration database which is, by design, mostly ignorant of the content it
holds. So the decision was made to place them in an easy-to-find location.
.. list-table:: **CIB Properties**
:class: longtable
:widths: 2 2 2 5
:header-rows: 1
* - Name
- Type
- Default
- Description
* - .. _admin_epoch:
.. index::
pair: admin_epoch; cib
admin_epoch
- :ref:`nonnegative integer <nonnegative_integer>`
- 0
- When a node joins the cluster, the cluster asks the node with the
highest (``admin_epoch``, ``epoch``, ``num_updates``) tuple to replace
the configuration on all the nodes -- which makes setting them correctly
very important. ``admin_epoch`` is never modified by the cluster; you
can use this to make the configurations on any inactive nodes obsolete.
* - .. _epoch:
.. index::
pair: epoch; cib
epoch
- :ref:`nonnegative integer <nonnegative_integer>`
- 0
- The cluster increments this every time the CIB's configuration section
is updated.
* - .. _num_updates:
.. index::
pair: num_updates; cib
num_updates
- :ref:`nonnegative integer <nonnegative_integer>`
- 0
- The cluster increments this every time the CIB's configuration or status
sections are updated, and resets it to 0 when epoch changes.
* - .. _validate_with:
.. index::
pair: validate-with; cib
validate-with
- :ref:`enumeration <enumeration>`
-
- Determines the type of XML validation that will be done on the
configuration. Allowed values are ``none`` (in which case the cluster
will not require that updates conform to expected syntax) and the base
names of schema files installed on the local machine (for example,
"pacemaker-3.9")
* - .. _remote_tls_port:
.. index::
pair: remote-tls-port; cib
remote-tls-port
- :ref:`port <port>`
-
- If set, the CIB manager will listen for anonymously encrypted remote
connections on this port, to allow CIB administration from hosts not in
the cluster. No key is used, so this should be used only on a protected
network where man-in-the-middle attacks can be avoided.
* - .. _remote_clear_port:
.. index::
pair: remote-clear-port; cib
remote-clear-port
- :ref:`port <port>`
-
- If set to a TCP port number, the CIB manager will listen for remote
connections on this port, to allow for CIB administration from hosts not
in the cluster. No encryption is used, so this should be used only on a
protected network.
* - .. _cib_last_written:
.. index::
pair: cib-last-written; cib
cib-last-written
- :ref:`date/time <date_time>`
-
- Indicates when the configuration was last written to disk. Maintained by
the cluster; for informational purposes only.
* - .. _have_quorum:
.. index::
pair: have-quorum; cib
have-quorum
- :ref:`boolean <boolean>`
-
- Indicates whether the cluster has quorum. If false, the cluster's
response is determined by ``no-quorum-policy`` (see below). Maintained
by the cluster.
* - .. _dc_uuid:
.. index::
pair: dc-uuid; cib
dc-uuid
- :ref:`text <text>`
-
- Node ID of the cluster's current designated controller (DC). Used and
maintained by the cluster.
.. _cluster_options:
Cluster Options
###############
Cluster options, as you might expect, control how the cluster behaves when
confronted with various situations.
They are grouped into sets within the ``crm_config`` section. In advanced
configurations, there may be more than one set. (This will be described later
in the chapter on :ref:`rules` where we will show how to have the cluster use
different sets of options during working hours than during weekends.) For now,
we will describe the simple case where each option is present at most once.
You can obtain an up-to-date list of cluster options, including their default
values, by running the ``man pacemaker-schedulerd`` and
``man pacemaker-controld`` commands.
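For reference, cluster options are stored as ``nvpair`` elements within a
``cluster_property_set`` in the ``crm_config`` section, for example (the ``id``
values and option values below are only illustrative):

.. code-block:: xml

   <crm_config>
     <cluster_property_set id="cib-bootstrap-options">
       <nvpair id="opt-no-quorum-policy" name="no-quorum-policy" value="stop"/>
       <!-- overrides the default node-pending-timeout of 0 (never fence pending nodes) -->
       <nvpair id="opt-node-pending-timeout" name="node-pending-timeout" value="2h"/>
     </cluster_property_set>
   </crm_config>
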
.. list-table:: **Cluster Options**
:class: longtable
:widths: 2 2 2 5
:header-rows: 1
* - Name
- Type
- Default
- Description
* - .. _cluster_name:
.. index::
pair: cluster option; cluster-name
cluster-name
- :ref:`text `
-
- An (optional) name for the cluster as a whole. This is mostly for users'
convenience for use as desired in administration, but can be used in the
Pacemaker configuration in :ref:`rules` (as the ``#cluster-name``
:ref:`node attribute `). It may also
be used by higher-level tools when displaying cluster information, and
by certain resource agents (for example, the ``ocf:heartbeat:GFS2``
agent stores the cluster name in filesystem meta-data).
* - .. _dc_version:
.. index::
pair: cluster option; dc-version
dc-version
- :ref:`version `
- *detected*
- Version of Pacemaker on the cluster's designated controller (DC).
Maintained by the cluster, and intended for diagnostic purposes.
* - .. _cluster_infrastructure:
.. index::
pair: cluster option; cluster-infrastructure
cluster-infrastructure
- :ref:`text `
- *detected*
- The messaging layer with which Pacemaker is currently running.
Maintained by the cluster, and intended for informational and diagnostic
purposes.
* - .. _no_quorum_policy:
.. index::
pair: cluster option; no-quorum-policy
no-quorum-policy
- :ref:`enumeration `
- stop
- What to do when the cluster does not have quorum. Allowed values:
* ``ignore:`` continue all resource management
* ``freeze:`` continue resource management, but don't recover resources
from nodes not in the affected partition
* ``stop:`` stop all resources in the affected cluster partition
* ``demote:`` demote promotable resources and stop all other resources
in the affected cluster partition *(since 2.0.5)*
* ``suicide:`` fence all nodes in the affected cluster partition
* - .. _batch_limit:
.. index::
pair: cluster option; batch-limit
batch-limit
- :ref:`integer `
- 0
- The maximum number of actions that the cluster may execute in parallel
across all nodes. The ideal value will depend on the speed and load
of your network and cluster nodes. If zero, the cluster will impose a
dynamically calculated limit only when any node has high load. If -1,
the cluster will not impose any limit.
* - .. _migration_limit:
.. index::
pair: cluster option; migration-limit
migration-limit
- :ref:`integer `
- -1
- The number of :ref:`live migration ` actions that the
cluster is allowed to execute in parallel on a node. A value of -1 means
unlimited.
* - .. _symmetric_cluster:
.. index::
pair: cluster option; symmetric-cluster
symmetric-cluster
- :ref:`boolean `
- true
- If true, resources can run on any node by default. If false, a resource
is allowed to run on a node only if a
:ref:`location constraint ` enables it.
* - .. _stop_all_resources:
.. index::
pair: cluster option; stop-all-resources
stop-all-resources
- :ref:`boolean `
- false
- Whether all resources should be disallowed from running (can be useful
during maintenance or troubleshooting)
* - .. _stop_orphan_resources:
.. index::
pair: cluster option; stop-orphan-resources
stop-orphan-resources
- :ref:`boolean `
- true
- Whether resources that have been deleted from the configuration should
be stopped. This value takes precedence over
:ref:`is-managed ` (that is, even unmanaged resources will
be stopped when orphaned if this value is ``true``).
* - .. _stop_orphan_actions:
.. index::
pair: cluster option; stop-orphan-actions
stop-orphan-actions
- :ref:`boolean `
- true
- Whether recurring :ref:`operations ` that have been deleted
from the configuration should be cancelled
* - .. _start_failure_is_fatal:
.. index::
pair: cluster option; start-failure-is-fatal
start-failure-is-fatal
- :ref:`boolean `
- true
- Whether a failure to start a resource on a particular node prevents
further start attempts on that node. If ``false``, the cluster will
decide whether the node is still eligible based on the resource's
current failure count and ``migration-threshold``.
* - .. _enable_startup_probes:
.. index::
pair: cluster option; enable-startup-probes
enable-startup-probes
- :ref:`boolean `
- true
- Whether the cluster should check the pre-existing state of resources
when the cluster starts
* - .. _maintenance_mode:
.. index::
pair: cluster option; maintenance-mode
maintenance-mode
- :ref:`boolean `
- false
- If true, the cluster will not start or stop any resource in the cluster,
and any recurring operations (except those specifying ``role`` as
``Stopped``) will be paused. If true, this overrides the
:ref:`maintenance ` node attribute,
:ref:`is-managed ` and :ref:`maintenance `
resource meta-attributes, and :ref:`enabled ` operation
meta-attribute.
* - .. _stonith_enabled:
.. index::
pair: cluster option; stonith-enabled
stonith-enabled
- :ref:`boolean `
- true
- Whether the cluster is allowed to fence nodes (for example, failed nodes
and nodes with resources that can't be stopped).
If true, at least one fence device must be configured before resources
are allowed to run.
If false, unresponsive nodes are immediately assumed to be running no
resources, and resource recovery on online nodes starts without any
further protection (which can mean *data loss* if the unresponsive node
still accesses shared storage, for example). See also the
:ref:`requires ` resource meta-attribute.
* - .. _stonith_action:
.. index::
pair: cluster option; stonith-action
stonith-action
- :ref:`enumeration `
- reboot
- Action the cluster should send to the fence agent when a node must be
fenced. Allowed values are ``reboot``, ``off``, and (for legacy agents
only) ``poweroff``.
* - .. _stonith_timeout:
.. index::
pair: cluster option; stonith-timeout
stonith-timeout
- :ref:`duration `
- 60s
- How long to wait for ``on``, ``off``, and ``reboot`` fence actions to
complete by default.
* - .. _stonith_max_attempts:
.. index::
pair: cluster option; stonith-max-attempts
stonith-max-attempts
- :ref:`score `
- 10
- How many times fencing can fail for a target before the cluster will no
longer immediately re-attempt it. Any value below 1 will be ignored, and
the default will be used instead.
* - .. _stonith_watchdog_timeout:
.. index::
pair: cluster option; stonith-watchdog-timeout
stonith-watchdog-timeout
- :ref:`timeout `
- 0
- If nonzero, and the cluster detects ``have-watchdog`` as ``true``, then
watchdog-based self-fencing will be performed via SBD when fencing is
required, without requiring a fencing resource explicitly configured.
If this is set to a positive value, unseen nodes are assumed to
self-fence within this much time.
**Warning:** It must be ensured that this value is larger than the
``SBD_WATCHDOG_TIMEOUT`` environment variable on all nodes. Pacemaker
verifies this setting individually on each node, and will refuse to
start (or will shut down at runtime) if it detects a misconfiguration.
It is strongly recommended that ``SBD_WATCHDOG_TIMEOUT`` be set to the
same value on all nodes.
If this is set to a negative value, and ``SBD_WATCHDOG_TIMEOUT`` is set,
twice that value will be used.
**Warning:** In this case, it is essential (and currently not verified
by pacemaker) that ``SBD_WATCHDOG_TIMEOUT`` is set to the same value on
all nodes.
* - .. _concurrent-fencing:
.. index::
pair: cluster option; concurrent-fencing
concurrent-fencing
- :ref:`boolean `
- false
- Whether the cluster is allowed to initiate multiple fence actions
concurrently. Fence actions initiated externally, such as via the
``stonith_admin`` tool or an application such as DLM, or by the fencer
itself such as recurring device monitors and ``status`` and ``list``
commands, are not limited by this option.
* - .. _fence_reaction:
.. index::
pair: cluster option; fence-reaction
fence-reaction
- :ref:`enumeration `
- stop
- How should a cluster node react if notified of its own fencing? A
cluster node may receive notification of its own fencing if fencing is
misconfigured, or if fabric fencing is in use that doesn't cut cluster
communication. Allowed values are ``stop`` to attempt to immediately
stop Pacemaker and stay stopped, or ``panic`` to attempt to immediately
reboot the local node, falling back to stop on failure. The default is
likely to be changed to ``panic`` in a future release. *(since 2.0.3)*
* - .. _priority_fencing_delay:
.. index::
pair: cluster option; priority-fencing-delay
priority-fencing-delay
- :ref:`duration `
- 0
- Apply this delay to any fencing targeting the lost nodes with the
highest total resource priority when our cluster partition does not hold
a majority of the nodes, so that the more significant nodes are more
likely to win any fencing match (especially meaningful in a split-brain
of a two-node cluster). A promoted resource instance takes the
resource's priority plus 1 if the resource's priority is not 0. Any
static or random delays introduced by ``pcmk_delay_base`` and
``pcmk_delay_max`` configured for the corresponding fencing resources
will be added to this delay. This delay should be significantly greater
than (safely twice) the maximum delay from those parameters. *(since
2.0.4)*
* - .. _node_pending_timeout:
.. index::
pair: cluster option; node-pending-timeout
node-pending-timeout
- :ref:`duration `
- - 2h
+ - 0
- Fence nodes that do not join the controller process group within this
much time after joining the cluster, to allow the cluster to continue
- managing resources. A value of 0 means never fence pending nodes.
- *(since 2.1.7)*
+ managing resources. A value of 0 means never fence pending nodes. For
+ example, a value of 2h means fence pending nodes after two hours.
+ *(since 2.1.7)*
* - .. _cluster_delay:
.. index::
pair: cluster option; cluster-delay
cluster-delay
- :ref:`duration `
- 60s
- If the DC requires an action to be executed on another node, it will
consider the action failed if it does not get a response from the other
node within this time (beyond the action's own timeout). The ideal value
will depend on the speed and load of your network and cluster nodes.
* - .. _dc_deadtime:
.. index::
pair: cluster option; dc-deadtime
dc-deadtime
- :ref:`duration `
- 20s
- How long to wait for a response from other nodes when electing a DC. The
ideal value will depend on the speed and load of your network and
cluster nodes.
* - .. _cluster_ipc_limit:
.. index::
pair: cluster option; cluster-ipc-limit
cluster-ipc-limit
- :ref:`nonnegative integer `
- 500
- The maximum IPC message backlog before one cluster daemon will
disconnect another. This is of use in large clusters, for which a good
value is the number of resources in the cluster multiplied by the number
of nodes. The default of 500 is also the minimum. Raise this if you see
"Evicting client" log messages for cluster daemon process IDs.
* - .. _pe_error_series_max:
.. index::
pair: cluster option; pe-error-series-max
pe-error-series-max
- :ref:`integer `
- -1
- The number of scheduler inputs resulting in errors to save. These inputs
can be helpful during troubleshooting and when reporting issues. A
negative value means save all inputs, and 0 means save none.
* - .. _pe_warn_series_max:
.. index::
pair: cluster option; pe-warn-series-max
pe-warn-series-max
- :ref:`integer `
- 5000
- The number of scheduler inputs resulting in warnings to save. These
inputs can be helpful during troubleshooting and when reporting issues.
A negative value means save all inputs, and 0 means save none.
* - .. _pe_input_series_max:
.. index::
pair: cluster option; pe-input-series-max
pe-input-series-max
- :ref:`integer `
- 4000
- The number of "normal" scheduler inputs to save. These inputs can be
helpful during troubleshooting and when reporting issues. A negative
value means save all inputs, and 0 means save none.
* - .. _enable_acl:
.. index::
pair: cluster option; enable-acl
enable-acl
- :ref:`boolean `
- false
- Whether :ref:`access control lists ` should be used to authorize
CIB modifications
* - .. _placement_strategy:
.. index::
pair: cluster option; placement-strategy
placement-strategy
- :ref:`enumeration `
- default
- How the cluster should assign resources to nodes (see
:ref:`utilization`). Allowed values are ``default``, ``utilization``,
``balanced``, and ``minimal``.
* - .. _node_health_strategy:
.. index::
pair: cluster option; node-health-strategy
node-health-strategy
- :ref:`enumeration `
- none
- How the cluster should react to :ref:`node health `
attributes. Allowed values are ``none``, ``migrate-on-red``,
``only-green``, ``progressive``, and ``custom``.
* - .. _node_health_base:
.. index::
pair: cluster option; node-health-base
node-health-base
- :ref:`score `
- 0
- The base health score assigned to a node. Only used when
``node-health-strategy`` is ``progressive``.
* - .. _node_health_green:
.. index::
pair: cluster option; node-health-green
node-health-green
- :ref:`score `
- 0
- The score to use for a node health attribute whose value is ``green``.
Only used when ``node-health-strategy`` is ``progressive`` or
``custom``.
* - .. _node_health_yellow:
.. index::
pair: cluster option; node-health-yellow
node-health-yellow
- :ref:`score `
- 0
- The score to use for a node health attribute whose value is ``yellow``.
Only used when ``node-health-strategy`` is ``progressive`` or
``custom``.
* - .. _node_health_red:
.. index::
pair: cluster option; node-health-red
node-health-red
- :ref:`score `
- 0
- The score to use for a node health attribute whose value is ``red``.
Only used when ``node-health-strategy`` is ``progressive`` or
``custom``.
* - .. _cluster_recheck_interval:
.. index::
pair: cluster option; cluster-recheck-interval
cluster-recheck-interval
- :ref:`duration `
- 15min
- Pacemaker is primarily event-driven, and looks ahead to know when to
recheck the cluster for failure timeouts and most time-based rules
*(since 2.0.3)*. However, it will also recheck the cluster after this
amount of inactivity. This has two goals: rules with ``date_spec`` are
only guaranteed to be checked this often, and it also serves as a
fail-safe for some kinds of scheduler bugs. A value of 0 disables this
polling.
* - .. _shutdown_lock:
.. index::
pair: cluster option; shutdown-lock
shutdown-lock
- :ref:`boolean `
- false
- The default of false allows active resources to be recovered elsewhere
when their node is cleanly shut down, which is what the vast majority of
users will want. However, some users prefer to make resources highly
available only for failures, with no recovery for clean shutdowns. If
this option is true, resources active on a node when it is cleanly shut
down are kept "locked" to that node (not allowed to run elsewhere) until
they start again on that node after it rejoins (or for at most
``shutdown-lock-limit``, if set). Stonith resources and Pacemaker Remote
connections are never locked. Clone and bundle instances and the
promoted role of promotable clones are currently never locked, though
support could be added in a future release. Locks may be manually
cleared using the ``--refresh`` option of ``crm_resource`` (both the
resource and node must be specified; this works with remote nodes if
their connection resource's ``target-role`` is set to ``Stopped``, but
not if Pacemaker Remote is stopped on the remote node without disabling
the connection resource). *(since 2.0.4)*
* - .. _shutdown_lock_limit:
.. index::
pair: cluster option; shutdown-lock-limit
shutdown-lock-limit
- :ref:`duration `
- 0
- If ``shutdown-lock`` is true, and this is set to a nonzero time
duration, locked resources will be allowed to start after this much time
has passed since the node shutdown was initiated, even if the node has
not rejoined. (This works with remote nodes only if their connection
resource's ``target-role`` is set to ``Stopped``.) *(since 2.0.4)*
* - .. _remove_after_stop:
.. index::
pair: cluster option; remove-after-stop
remove-after-stop
- :ref:`boolean `
- false
- *Deprecated* Whether the cluster should remove resources from
Pacemaker's executor after they are stopped. Values other than the
default are, at best, poorly tested and potentially dangerous. This
option is deprecated and will be removed in a future release.
* - .. _startup_fencing:
.. index::
pair: cluster option; startup-fencing
startup-fencing
- :ref:`boolean `
- true
- *Advanced Use Only:* Whether the cluster should fence unseen nodes at
start-up. Setting this to false is unsafe, because the unseen nodes
could be active and running resources but unreachable. ``dc-deadtime``
acts as a grace period before this fencing, since a DC must be elected
to schedule fencing.
* - .. _election_timeout:
.. index::
pair: cluster option; election-timeout
election-timeout
- :ref:`duration `
- 2min
- *Advanced Use Only:* If a winner is not declared within this much time
of starting an election, the node that initiated the election will
declare itself the winner.
* - .. _shutdown_escalation:
.. index::
pair: cluster option; shutdown-escalation
shutdown-escalation
- :ref:`duration `
- 20min
- *Advanced Use Only:* The controller will exit immediately if a shutdown
does not complete within this much time.
* - .. _join_integration_timeout:
.. index::
pair: cluster option; join-integration-timeout
join-integration-timeout
- :ref:`duration `
- 3min
- *Advanced Use Only:* If you need to adjust this value, it probably
indicates the presence of a bug.
* - .. _join_finalization_timeout:
.. index::
pair: cluster option; join-finalization-timeout
join-finalization-timeout
- :ref:`duration `
- 30min
- *Advanced Use Only:* If you need to adjust this value, it probably
indicates the presence of a bug.
* - .. _transition_delay:
.. index::
pair: cluster option; transition-delay
transition-delay
- :ref:`duration `
- 0s
- *Advanced Use Only:* Delay cluster recovery for the configured interval
to allow for additional or related events to occur. This can be useful
if your configuration is sensitive to the order in which ping updates
arrive. Enabling this option will slow down cluster recovery under all
conditions.
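
As noted at the start of this section, advanced configurations may define
more than one ``cluster_property_set`` and use rules to select which one
applies. The following is a rough sketch of that shape only, with purely
illustrative ids and values (see the :ref:`rules` chapter for the actual
rule syntax); here ``batch-limit`` is lowered during weekday working hours
and set back to its default otherwise:

.. code-block:: xml

   <crm_config>
     <cluster_property_set id="working-hours-options" score="2">
       <rule id="working-hours-rule" score="0">
         <date_expression id="working-hours-expr" operation="date_spec">
           <date_spec id="working-hours-spec" hours="9-16" weekdays="1-5"/>
         </date_expression>
       </rule>
       <nvpair id="working-hours-batch-limit" name="batch-limit" value="10"/>
     </cluster_property_set>
     <cluster_property_set id="default-options" score="1">
       <nvpair id="default-batch-limit" name="batch-limit" value="0"/>
     </cluster_property_set>
   </crm_config>
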
diff --git a/lib/pengine/common.c b/lib/pengine/common.c
index 1a20775c23..0fdd5a120b 100644
--- a/lib/pengine/common.c
+++ b/lib/pengine/common.c
@@ -1,627 +1,627 @@
/*
* Copyright 2004-2023 the Pacemaker project contributors
*
* The version control history for this file may have further details.
*
* This source code is licensed under the GNU Lesser General Public License
* version 2.1 or later (LGPLv2.1+) WITHOUT ANY WARRANTY.
*/
#include
#include
#include
#include
#include
#include
#include
#include
gboolean was_processing_error = FALSE;
gboolean was_processing_warning = FALSE;
static bool
check_placement_strategy(const char *value)
{
return pcmk__strcase_any_of(value, "default", "utilization", "minimal",
"balanced", NULL);
}
static pcmk__cluster_option_t pe_opts[] = {
/* name, old name, type, allowed values,
* default value, validator,
* short description,
* long description
*/
{
"no-quorum-policy", NULL, "select", "stop, freeze, ignore, demote, suicide",
"stop", pcmk__valid_quorum,
N_("What to do when the cluster does not have quorum"),
NULL
},
{
"symmetric-cluster", NULL, "boolean", NULL,
"true", pcmk__valid_boolean,
N_("Whether resources can run on any node by default"),
NULL
},
{
"maintenance-mode", NULL, "boolean", NULL,
"false", pcmk__valid_boolean,
N_("Whether the cluster should refrain from monitoring, starting, "
"and stopping resources"),
NULL
},
{
"start-failure-is-fatal", NULL, "boolean", NULL,
"true", pcmk__valid_boolean,
N_("Whether a start failure should prevent a resource from being "
"recovered on the same node"),
N_("When true, the cluster will immediately ban a resource from a node "
"if it fails to start there. When false, the cluster will instead "
"check the resource's fail count against its migration-threshold.")
},
{
"enable-startup-probes", NULL, "boolean", NULL,
"true", pcmk__valid_boolean,
N_("Whether the cluster should check for active resources during start-up"),
NULL
},
{
XML_CONFIG_ATTR_SHUTDOWN_LOCK, NULL, "boolean", NULL,
"false", pcmk__valid_boolean,
N_("Whether to lock resources to a cleanly shut down node"),
N_("When true, resources active on a node when it is cleanly shut down "
"are kept \"locked\" to that node (not allowed to run elsewhere) "
"until they start again on that node after it rejoins (or for at "
"most shutdown-lock-limit, if set). Stonith resources and "
"Pacemaker Remote connections are never locked. Clone and bundle "
"instances and the promoted role of promotable clones are "
"currently never locked, though support could be added in a future "
"release.")
},
{
XML_CONFIG_ATTR_SHUTDOWN_LOCK_LIMIT, NULL, "time", NULL,
"0", pcmk__valid_interval_spec,
N_("Do not lock resources to a cleanly shut down node longer than "
"this"),
N_("If shutdown-lock is true and this is set to a nonzero time "
"duration, shutdown locks will expire after this much time has "
"passed since the shutdown was initiated, even if the node has not "
"rejoined.")
},
// Fencing-related options
{
"stonith-enabled", NULL, "boolean", NULL,
"true", pcmk__valid_boolean,
N_("*** Advanced Use Only *** "
"Whether nodes may be fenced as part of recovery"),
N_("If false, unresponsive nodes are immediately assumed to be harmless, "
"and resources that were active on them may be recovered "
"elsewhere. This can result in a \"split-brain\" situation, "
"potentially leading to data loss and/or service unavailability.")
},
{
"stonith-action", NULL, "select", "reboot, off, poweroff",
PCMK_ACTION_REBOOT, pcmk__is_fencing_action,
N_("Action to send to fence device when a node needs to be fenced "
"(\"poweroff\" is a deprecated alias for \"off\")"),
NULL
},
{
"stonith-timeout", NULL, "time", NULL,
"60s", pcmk__valid_interval_spec,
N_("*** Advanced Use Only *** Unused by Pacemaker"),
N_("This value is not used by Pacemaker, but is kept for backward "
"compatibility, and certain legacy fence agents might use it.")
},
{
XML_ATTR_HAVE_WATCHDOG, NULL, "boolean", NULL,
"false", pcmk__valid_boolean,
N_("Whether watchdog integration is enabled"),
N_("This is set automatically by the cluster according to whether SBD "
"is detected to be in use. User-configured values are ignored. "
"The value `true` is meaningful if diskless SBD is used and "
"`stonith-watchdog-timeout` is nonzero. In that case, if fencing "
"is required, watchdog-based self-fencing will be performed via "
"SBD without requiring a fencing resource explicitly configured.")
},
{
"concurrent-fencing", NULL, "boolean", NULL,
PCMK__CONCURRENT_FENCING_DEFAULT, pcmk__valid_boolean,
N_("Allow performing fencing operations in parallel"),
NULL
},
{
"startup-fencing", NULL, "boolean", NULL,
"true", pcmk__valid_boolean,
N_("*** Advanced Use Only *** Whether to fence unseen nodes at start-up"),
N_("Setting this to false may lead to a \"split-brain\" situation,"
"potentially leading to data loss and/or service unavailability.")
},
{
XML_CONFIG_ATTR_PRIORITY_FENCING_DELAY, NULL, "time", NULL,
"0", pcmk__valid_interval_spec,
N_("Apply fencing delay targeting the lost nodes with the highest total resource priority"),
N_("Apply specified delay for the fencings that are targeting the lost "
"nodes with the highest total resource priority in case we don't "
"have the majority of the nodes in our cluster partition, so that "
"the more significant nodes potentially win any fencing match, "
"which is especially meaningful under split-brain of 2-node "
"cluster. A promoted resource instance takes the base priority + 1 "
"on calculation if the base priority is not 0. Any static/random "
"delays that are introduced by `pcmk_delay_base/max` configured "
"for the corresponding fencing resources will be added to this "
"delay. This delay should be significantly greater than, safely "
"twice, the maximum `pcmk_delay_base/max`. By default, priority "
"fencing delay is disabled.")
},
-
{
XML_CONFIG_ATTR_NODE_PENDING_TIMEOUT, NULL, "time", NULL,
- "2h", pcmk__valid_interval_spec,
+ "0", pcmk__valid_interval_spec,
N_("How long to wait for a node that has joined the cluster to join "
"the controller process group"),
N_("Fence nodes that do not join the controller process group within "
"this much time after joining the cluster, to allow the cluster "
"to continue managing resources. A value of 0 means never fence "
- "pending nodes.")
+ "pending nodes. Setting the value to 2h means fence nodes after "
+ "2 hours.")
},
{
"cluster-delay", NULL, "time", NULL,
"60s", pcmk__valid_interval_spec,
N_("Maximum time for node-to-node communication"),
N_("The node elected Designated Controller (DC) will consider an action "
"failed if it does not get a response from the node executing the "
"action within this time (after considering the action's own "
"timeout). The \"correct\" value will depend on the speed and "
"load of your network and cluster nodes.")
},
{
"batch-limit", NULL, "integer", NULL,
"0", pcmk__valid_number,
N_("Maximum number of jobs that the cluster may execute in parallel "
"across all nodes"),
N_("The \"correct\" value will depend on the speed and load of your "
"network and cluster nodes. If set to 0, the cluster will "
"impose a dynamically calculated limit when any node has a "
"high load.")
},
{
"migration-limit", NULL, "integer", NULL,
"-1", pcmk__valid_number,
N_("The number of live migration actions that the cluster is allowed "
"to execute in parallel on a node (-1 means no limit)")
},
/* Orphans and stopping */
{
"stop-all-resources", NULL, "boolean", NULL,
"false", pcmk__valid_boolean,
N_("Whether the cluster should stop all active resources"),
NULL
},
{
"stop-orphan-resources", NULL, "boolean", NULL,
"true", pcmk__valid_boolean,
N_("Whether to stop resources that were removed from the configuration"),
NULL
},
{
"stop-orphan-actions", NULL, "boolean", NULL,
"true", pcmk__valid_boolean,
N_("Whether to cancel recurring actions removed from the configuration"),
NULL
},
{
"remove-after-stop", NULL, "boolean", NULL,
"false", pcmk__valid_boolean,
N_("*** Deprecated *** Whether to remove stopped resources from "
"the executor"),
N_("Values other than default are poorly tested and potentially dangerous."
" This option will be removed in a future release.")
},
/* Storing inputs */
{
"pe-error-series-max", NULL, "integer", NULL,
"-1", pcmk__valid_number,
N_("The number of scheduler inputs resulting in errors to save"),
N_("Zero to disable, -1 to store unlimited.")
},
{
"pe-warn-series-max", NULL, "integer", NULL,
"5000", pcmk__valid_number,
N_("The number of scheduler inputs resulting in warnings to save"),
N_("Zero to disable, -1 to store unlimited.")
},
{
"pe-input-series-max", NULL, "integer", NULL,
"4000", pcmk__valid_number,
N_("The number of scheduler inputs without errors or warnings to save"),
N_("Zero to disable, -1 to store unlimited.")
},
/* Node health */
{
PCMK__OPT_NODE_HEALTH_STRATEGY, NULL, "select",
PCMK__VALUE_NONE ", " PCMK__VALUE_MIGRATE_ON_RED ", "
PCMK__VALUE_ONLY_GREEN ", " PCMK__VALUE_PROGRESSIVE ", "
PCMK__VALUE_CUSTOM,
PCMK__VALUE_NONE, pcmk__validate_health_strategy,
N_("How cluster should react to node health attributes"),
N_("Requires external entities to create node attributes (named with "
"the prefix \"#health\") with values \"red\", "
"\"yellow\", or \"green\".")
},
{
PCMK__OPT_NODE_HEALTH_BASE, NULL, "integer", NULL,
"0", pcmk__valid_number,
N_("Base health score assigned to a node"),
N_("Only used when \"node-health-strategy\" is set to \"progressive\".")
},
{
PCMK__OPT_NODE_HEALTH_GREEN, NULL, "integer", NULL,
"0", pcmk__valid_number,
N_("The score to use for a node health attribute whose value is \"green\""),
N_("Only used when \"node-health-strategy\" is set to \"custom\" or \"progressive\".")
},
{
PCMK__OPT_NODE_HEALTH_YELLOW, NULL, "integer", NULL,
"0", pcmk__valid_number,
N_("The score to use for a node health attribute whose value is \"yellow\""),
N_("Only used when \"node-health-strategy\" is set to \"custom\" or \"progressive\".")
},
{
PCMK__OPT_NODE_HEALTH_RED, NULL, "integer", NULL,
"-INFINITY", pcmk__valid_number,
N_("The score to use for a node health attribute whose value is \"red\""),
N_("Only used when \"node-health-strategy\" is set to \"custom\" or \"progressive\".")
},
/* Placement strategy */
{
"placement-strategy", NULL, "select",
"default, utilization, minimal, balanced",
"default", check_placement_strategy,
N_("How the cluster should allocate resources to nodes"),
NULL
},
};
void
pe_metadata(pcmk__output_t *out)
{
const char *desc_short = "Pacemaker scheduler options";
const char *desc_long = "Cluster options used by Pacemaker's scheduler";
gchar *s = pcmk__format_option_metadata("pacemaker-schedulerd", desc_short,
desc_long, pe_opts,
PCMK__NELEM(pe_opts));
out->output_xml(out, "metadata", s);
g_free(s);
}
void
verify_pe_options(GHashTable * options)
{
pcmk__validate_cluster_options(options, pe_opts, PCMK__NELEM(pe_opts));
}
const char *
pe_pref(GHashTable * options, const char *name)
{
return pcmk__cluster_option(options, pe_opts, PCMK__NELEM(pe_opts), name);
}
const char *
fail2text(enum action_fail_response fail)
{
const char *result = "";
switch (fail) {
case pcmk_on_fail_ignore:
result = "ignore";
break;
case pcmk_on_fail_demote:
result = "demote";
break;
case pcmk_on_fail_block:
result = "block";
break;
case pcmk_on_fail_restart:
result = "recover";
break;
case pcmk_on_fail_ban:
result = "migrate";
break;
case pcmk_on_fail_stop:
result = "stop";
break;
case pcmk_on_fail_fence_node:
result = "fence";
break;
case pcmk_on_fail_standby_node:
result = "standby";
break;
case pcmk_on_fail_restart_container:
result = "restart-container";
break;
case pcmk_on_fail_reset_remote:
result = "reset-remote";
break;
}
return result;
}
enum action_tasks
text2task(const char *task)
{
if (pcmk__str_eq(task, PCMK_ACTION_STOP, pcmk__str_casei)) {
return pcmk_action_stop;
} else if (pcmk__str_eq(task, PCMK_ACTION_STOPPED, pcmk__str_casei)) {
return pcmk_action_stopped;
} else if (pcmk__str_eq(task, PCMK_ACTION_START, pcmk__str_casei)) {
return pcmk_action_start;
} else if (pcmk__str_eq(task, PCMK_ACTION_RUNNING, pcmk__str_casei)) {
return pcmk_action_started;
} else if (pcmk__str_eq(task, PCMK_ACTION_DO_SHUTDOWN, pcmk__str_casei)) {
return pcmk_action_shutdown;
} else if (pcmk__str_eq(task, PCMK_ACTION_STONITH, pcmk__str_casei)) {
return pcmk_action_fence;
} else if (pcmk__str_eq(task, PCMK_ACTION_MONITOR, pcmk__str_casei)) {
return pcmk_action_monitor;
} else if (pcmk__str_eq(task, PCMK_ACTION_NOTIFY, pcmk__str_casei)) {
return pcmk_action_notify;
} else if (pcmk__str_eq(task, PCMK_ACTION_NOTIFIED, pcmk__str_casei)) {
return pcmk_action_notified;
} else if (pcmk__str_eq(task, PCMK_ACTION_PROMOTE, pcmk__str_casei)) {
return pcmk_action_promote;
} else if (pcmk__str_eq(task, PCMK_ACTION_DEMOTE, pcmk__str_casei)) {
return pcmk_action_demote;
} else if (pcmk__str_eq(task, PCMK_ACTION_PROMOTED, pcmk__str_casei)) {
return pcmk_action_promoted;
} else if (pcmk__str_eq(task, PCMK_ACTION_DEMOTED, pcmk__str_casei)) {
return pcmk_action_demoted;
}
return pcmk_action_unspecified;
}
const char *
task2text(enum action_tasks task)
{
const char *result = "";
switch (task) {
case pcmk_action_unspecified:
result = "no_action";
break;
case pcmk_action_stop:
result = PCMK_ACTION_STOP;
break;
case pcmk_action_stopped:
result = PCMK_ACTION_STOPPED;
break;
case pcmk_action_start:
result = PCMK_ACTION_START;
break;
case pcmk_action_started:
result = PCMK_ACTION_RUNNING;
break;
case pcmk_action_shutdown:
result = PCMK_ACTION_DO_SHUTDOWN;
break;
case pcmk_action_fence:
result = PCMK_ACTION_STONITH;
break;
case pcmk_action_monitor:
result = PCMK_ACTION_MONITOR;
break;
case pcmk_action_notify:
result = PCMK_ACTION_NOTIFY;
break;
case pcmk_action_notified:
result = PCMK_ACTION_NOTIFIED;
break;
case pcmk_action_promote:
result = PCMK_ACTION_PROMOTE;
break;
case pcmk_action_promoted:
result = PCMK_ACTION_PROMOTED;
break;
case pcmk_action_demote:
result = PCMK_ACTION_DEMOTE;
break;
case pcmk_action_demoted:
result = PCMK_ACTION_DEMOTED;
break;
}
return result;
}
const char *
role2text(enum rsc_role_e role)
{
switch (role) {
case pcmk_role_stopped:
return PCMK__ROLE_STOPPED;
case pcmk_role_started:
return PCMK__ROLE_STARTED;
case pcmk_role_unpromoted:
#ifdef PCMK__COMPAT_2_0
return PCMK__ROLE_UNPROMOTED_LEGACY;
#else
return PCMK__ROLE_UNPROMOTED;
#endif
case pcmk_role_promoted:
#ifdef PCMK__COMPAT_2_0
return PCMK__ROLE_PROMOTED_LEGACY;
#else
return PCMK__ROLE_PROMOTED;
#endif
default: // pcmk_role_unknown
return PCMK__ROLE_UNKNOWN;
}
}
enum rsc_role_e
text2role(const char *role)
{
CRM_ASSERT(role != NULL);
if (pcmk__str_eq(role, PCMK__ROLE_STOPPED, pcmk__str_casei)) {
return pcmk_role_stopped;
} else if (pcmk__str_eq(role, PCMK__ROLE_STARTED, pcmk__str_casei)) {
return pcmk_role_started;
} else if (pcmk__strcase_any_of(role, PCMK__ROLE_UNPROMOTED,
PCMK__ROLE_UNPROMOTED_LEGACY, NULL)) {
return pcmk_role_unpromoted;
} else if (pcmk__strcase_any_of(role, PCMK__ROLE_PROMOTED,
PCMK__ROLE_PROMOTED_LEGACY, NULL)) {
return pcmk_role_promoted;
} else if (pcmk__str_eq(role, PCMK__ROLE_UNKNOWN, pcmk__str_casei)) {
return pcmk_role_unknown;
}
crm_err("Unknown role: %s", role);
return pcmk_role_unknown;
}
void
add_hash_param(GHashTable * hash, const char *name, const char *value)
{
CRM_CHECK(hash != NULL, return);
crm_trace("Adding name='%s' value='%s' to hash table",
pcmk__s(name, ""), pcmk__s(value, ""));
if (name == NULL || value == NULL) {
return;
} else if (pcmk__str_eq(value, "#default", pcmk__str_casei)) {
return;
} else if (g_hash_table_lookup(hash, name) == NULL) {
g_hash_table_insert(hash, strdup(name), strdup(value));
}
}
/*!
* \internal
* \brief Look up an attribute value on the appropriate node
*
* If \p node is a guest node and either the \c XML_RSC_ATTR_TARGET meta
* attribute is set to "host" for \p rsc or \p force_host is \c true, query the
* attribute on the node's host. Otherwise, query the attribute on \p node
* itself.
*
* \param[in] node Node to query attribute value on by default
* \param[in] name Name of attribute to query
* \param[in] rsc Resource on whose behalf we're querying
* \param[in] node_type Type of resource location lookup
* \param[in] force_host Force a lookup on the guest node's host, regardless of
* the \c XML_RSC_ATTR_TARGET value
*
* \return Value of the attribute on \p node or on the host of \p node
*
* \note If \p force_host is \c true, \p node \e must be a guest node.
*/
const char *
pe__node_attribute_calculated(const pcmk_node_t *node, const char *name,
const pcmk_resource_t *rsc,
enum pcmk__rsc_node node_type,
bool force_host)
{
// @TODO: Use pe__is_guest_node() after merging libpe_{rules,status}
bool is_guest = (node != NULL)
&& (node->details->type == pcmk_node_variant_remote)
&& (node->details->remote_rsc != NULL)
&& (node->details->remote_rsc->container != NULL);
const char *source = NULL;
const char *node_type_s = NULL;
const char *reason = NULL;
const pcmk_resource_t *container = NULL;
const pcmk_node_t *host = NULL;
CRM_ASSERT((node != NULL) && (name != NULL) && (rsc != NULL)
&& (!force_host || is_guest));
/* Ignore XML_RSC_ATTR_TARGET if node is not a guest node. This represents a
* user configuration error.
*/
source = g_hash_table_lookup(rsc->meta, XML_RSC_ATTR_TARGET);
if (!force_host
&& (!is_guest || !pcmk__str_eq(source, "host", pcmk__str_casei))) {
return g_hash_table_lookup(node->details->attrs, name);
}
container = node->details->remote_rsc->container;
switch (node_type) {
case pcmk__rsc_node_assigned:
node_type_s = "assigned";
host = container->allocated_to;
if (host == NULL) {
reason = "not assigned";
}
break;
case pcmk__rsc_node_current:
node_type_s = "current";
if (container->running_on != NULL) {
host = container->running_on->data;
}
if (host == NULL) {
reason = "inactive";
}
break;
default:
// Add support for other enum pcmk__rsc_node values if needed
CRM_ASSERT(false);
break;
}
if (host != NULL) {
const char *value = g_hash_table_lookup(host->details->attrs, name);
pe_rsc_trace(rsc,
"%s: Value lookup for %s on %s container host %s %s%s",
rsc->id, name, node_type_s, pe__node_name(host),
((value != NULL)? "succeeded: " : "failed"),
pcmk__s(value, ""));
return value;
}
pe_rsc_trace(rsc,
"%s: Not looking for %s on %s container host: %s is %s",
rsc->id, name, node_type_s, container->id, reason);
return NULL;
}
const char *
pe_node_attribute_raw(const pcmk_node_t *node, const char *name)
{
if(node == NULL) {
return NULL;
}
return g_hash_table_lookup(node->details->attrs, name);
}