diff --git a/cts/cli/regression.daemons.exp b/cts/cli/regression.daemons.exp index 66bd7b3aed..7bb5d05bc9 100644 --- a/cts/cli/regression.daemons.exp +++ b/cts/cli/regression.daemons.exp @@ -1,446 +1,451 @@ =#=#=#= Begin test: Get CIB manager metadata =#=#=#= 1.1 Cluster options used by Pacemaker's Cluster Information Base manager Cluster Information Base manager options Enable Access Control Lists (ACLs) for the CIB Enable Access Control Lists (ACLs) for the CIB Raise this if log has "Evicting client" messages for cluster daemon PIDs (a good value is the number of resources in the cluster multiplied by the number of nodes). Maximum IPC message backlog before disconnecting a cluster daemon =#=#=#= End test: Get CIB manager metadata - OK (0) =#=#=#= * Passed: pacemaker-based - Get CIB manager metadata =#=#=#= Begin test: Get controller metadata =#=#=#= 1.1 Cluster options used by Pacemaker's controller Pacemaker controller options Includes a hash which identifies the exact changeset the code was built from. Used for diagnostic purposes. Pacemaker version on cluster node elected Designated Controller (DC) Used for informational and diagnostic purposes. The messaging stack on which Pacemaker is currently running This optional value is mostly for users' convenience as desired in administration, but may also be used in Pacemaker configuration rules via the #cluster-name node attribute, and by higher-level tools and resource agents. An arbitrary name for the cluster The optimal value will depend on the speed and load of your network and the type of switches used. How long to wait for a response from other nodes during start-up Pacemaker is primarily event-driven, and looks ahead to know when to recheck cluster state for failure timeouts and most time-based rules. However, it will also recheck the cluster after this amount of inactivity, to evaluate rules with date specifications and serve as a fail-safe for certain types of scheduler bugs. Allowed values: Zero disables polling, while positive values are an interval in seconds(unless other units are specified, for example "5min") Polling interval to recheck cluster state and evaluate rules with date specifications The cluster will slow down its recovery process when the amount of system resources used (currently CPU) approaches this limit Maximum amount of system load that should be used by cluster nodes Maximum number of jobs that can be scheduled per node (defaults to 2x cores) Maximum number of jobs that can be scheduled per node (defaults to 2x cores) A cluster node may receive notification of its own fencing if fencing is misconfigured, or if fabric fencing is in use that doesn't cut cluster communication. Allowed values are "stop" to attempt to immediately stop Pacemaker and stay stopped, or "panic" to attempt to immediately reboot the local node, falling back to stop on failure. How a cluster node should react if notified of its own fencing Declare an election failed if it is not decided within this much time. If you need to adjust this value, it probably indicates the presence of a bug. *** Advanced Use Only *** Exit immediately if shutdown does not complete within this much time. If you need to adjust this value, it probably indicates the presence of a bug. *** Advanced Use Only *** If you need to adjust this value, it probably indicates the presence of a bug. *** Advanced Use Only *** If you need to adjust this value, it probably indicates the presence of a bug. *** Advanced Use Only *** Delay cluster recovery for this much time to allow for additional events to occur. Useful if your configuration is sensitive to the order in which ping updates arrive. *** Advanced Use Only *** Enabling this option will slow down cluster recovery under all conditions If this is set to a positive value, lost nodes are assumed to self-fence using watchdog-based SBD within this much time. This does not require a fencing resource to be explicitly configured, though a fence_watchdog resource can be configured, to limit use to specific nodes. If this is set to 0 (the default), the cluster will never assume watchdog-based self-fencing. If this is set to a negative value, the cluster will use twice the local value of the `SBD_WATCHDOG_TIMEOUT` environment variable if that is positive, or otherwise treat this as 0. WARNING: When used, this timeout must be larger than `SBD_WATCHDOG_TIMEOUT` on all nodes that use watchdog-based SBD, and Pacemaker will refuse to start on any of those nodes where this is not true for the local value or SBD is not active. When this is set to a negative value, `SBD_WATCHDOG_TIMEOUT` must be set to the same value on all nodes that use SBD, otherwise data corruption or loss could occur. How long before nodes can be assumed to be safely down when watchdog-based self-fencing via SBD is in use How many times fencing can fail before it will no longer be immediately re-attempted on a target How many times fencing can fail before it will no longer be immediately re-attempted on a target What to do when the cluster does not have quorum Allowed values: stop, freeze, ignore, demote, suicide What to do when the cluster does not have quorum When true, resources active on a node when it is cleanly shut down are kept "locked" to that node (not allowed to run elsewhere) until they start again on that node after it rejoins (or for at most shutdown-lock-limit, if set). Stonith resources and Pacemaker Remote connections are never locked. Clone and bundle instances and the promoted role of promotable clones are currently never locked, though support could be added in a future release. Whether to lock resources to a cleanly shut down node If shutdown-lock is true and this is set to a nonzero time duration, shutdown locks will expire after this much time has passed since the shutdown was initiated, even if the node has not rejoined. Do not lock resources to a cleanly shut down node longer than this =#=#=#= End test: Get controller metadata - OK (0) =#=#=#= * Passed: pacemaker-controld - Get controller metadata =#=#=#= Begin test: Get fencer metadata =#=#=#= 1.1 Instance attributes available for all "stonith"-class resources and used by Pacemaker's fence daemon, formerly known as stonithd Instance attributes available for all "stonith"-class resources some devices do not support the standard 'port' parameter or may provide additional ones. Use this to specify an alternate, device-specific, parameter that should indicate the machine to be fenced. A value of none can be used to tell the cluster not to supply any additional parameters. Advanced use only: An alternate parameter to supply instead of 'port' Eg. node1:1;node2:2,3 would tell the cluster to use port 1 for node1 and ports 2 and 3 for node2 A mapping of host names to ports numbers for devices that do not support host names. A list of machines controlled by this device (Optional unless pcmk_host_list=static-list) Eg. node1,node2,node3 Allowed values: dynamic-list (query the device via the 'list' command), static-list (check the pcmk_host_list attribute), status (query the device via the 'status' command), none (assume every device can fence every machine) How to determine which machines are controlled by the device. Enable a delay of no more than the time specified before executing fencing actions. Pacemaker derives the overall delay by taking the value of pcmk_delay_base and adding a random delay value such that the sum is kept below this maximum. Enable a base delay for fencing actions and specify base delay value. This enables a static delay for fencing actions, which can help avoid "death matches" where two nodes try to fence each other at the same time. If pcmk_delay_max is also used, a random delay will be added such that the total delay is kept below that value.This can be set to a single time value to apply to any node targeted by this device (useful if a separate device is configured for each target), or to a node map (for example, "node1:1s;node2:5") to set a different value per target. Enable a base delay for fencing actions and specify base delay value. Cluster property concurrent-fencing=true needs to be configured first.Then use this to specify the maximum number of actions can be performed in parallel on this device. -1 is unlimited. The maximum number of actions can be performed in parallel on this device Some devices do not support the standard commands or may provide additional ones.\nUse this to specify an alternate, device-specific, command that implements the 'reboot' action. Advanced use only: An alternate command to run instead of 'reboot' Some devices need much more/less time to complete than normal.Use this to specify an alternate, device-specific, timeout for 'reboot' actions. Advanced use only: Specify an alternate timeout to use for reboot actions instead of stonith-timeout Some devices do not support multiple connections. Operations may 'fail' if the device is busy with another task so Pacemaker will automatically retry the operation, if there is time remaining. Use this option to alter the number of times Pacemaker retries 'reboot' actions before giving up. Advanced use only: The maximum number of times to retry the 'reboot' command within the timeout period Some devices do not support the standard commands or may provide additional ones.Use this to specify an alternate, device-specific, command that implements the 'off' action. Advanced use only: An alternate command to run instead of 'off' Some devices need much more/less time to complete than normal.Use this to specify an alternate, device-specific, timeout for 'off' actions. Advanced use only: Specify an alternate timeout to use for off actions instead of stonith-timeout Some devices do not support multiple connections. Operations may 'fail' if the device is busy with another task so Pacemaker will automatically retry the operation, if there is time remaining. Use this option to alter the number of times Pacemaker retries 'off' actions before giving up. Advanced use only: The maximum number of times to retry the 'off' command within the timeout period Some devices do not support the standard commands or may provide additional ones.Use this to specify an alternate, device-specific, command that implements the 'on' action. Advanced use only: An alternate command to run instead of 'on' Some devices need much more/less time to complete than normal.Use this to specify an alternate, device-specific, timeout for 'on' actions. Advanced use only: Specify an alternate timeout to use for on actions instead of stonith-timeout Some devices do not support multiple connections. Operations may 'fail' if the device is busy with another task so Pacemaker will automatically retry the operation, if there is time remaining. Use this option to alter the number of times Pacemaker retries 'on' actions before giving up. Advanced use only: The maximum number of times to retry the 'on' command within the timeout period Some devices do not support the standard commands or may provide additional ones.Use this to specify an alternate, device-specific, command that implements the 'list' action. Advanced use only: An alternate command to run instead of 'list' Some devices need much more/less time to complete than normal.Use this to specify an alternate, device-specific, timeout for 'list' actions. Advanced use only: Specify an alternate timeout to use for list actions instead of stonith-timeout Some devices do not support multiple connections. Operations may 'fail' if the device is busy with another task so Pacemaker will automatically retry the operation, if there is time remaining. Use this option to alter the number of times Pacemaker retries 'list' actions before giving up. Advanced use only: The maximum number of times to retry the 'list' command within the timeout period Some devices do not support the standard commands or may provide additional ones.Use this to specify an alternate, device-specific, command that implements the 'monitor' action. Advanced use only: An alternate command to run instead of 'monitor' Some devices need much more/less time to complete than normal.\nUse this to specify an alternate, device-specific, timeout for 'monitor' actions. Advanced use only: Specify an alternate timeout to use for monitor actions instead of stonith-timeout Some devices do not support multiple connections. Operations may 'fail' if the device is busy with another task so Pacemaker will automatically retry the operation, if there is time remaining. Use this option to alter the number of times Pacemaker retries 'monitor' actions before giving up. Advanced use only: The maximum number of times to retry the 'monitor' command within the timeout period Some devices do not support the standard commands or may provide additional ones.Use this to specify an alternate, device-specific, command that implements the 'status' action. Advanced use only: An alternate command to run instead of 'status' Some devices need much more/less time to complete than normal.Use this to specify an alternate, device-specific, timeout for 'status' actions. Advanced use only: Specify an alternate timeout to use for status actions instead of stonith-timeout Some devices do not support multiple connections. Operations may 'fail' if the device is busy with another task so Pacemaker will automatically retry the operation, if there is time remaining. Use this option to alter the number of times Pacemaker retries 'status' actions before giving up. Advanced use only: The maximum number of times to retry the 'status' command within the timeout period =#=#=#= End test: Get fencer metadata - OK (0) =#=#=#= * Passed: pacemaker-fenced - Get fencer metadata =#=#=#= Begin test: Get scheduler metadata =#=#=#= 1.1 Cluster options used by Pacemaker's scheduler Pacemaker scheduler options What to do when the cluster does not have quorum Allowed values: stop, freeze, ignore, demote, suicide What to do when the cluster does not have quorum Whether resources can run on any node by default Whether resources can run on any node by default Whether the cluster should refrain from monitoring, starting, and stopping resources Whether the cluster should refrain from monitoring, starting, and stopping resources When true, the cluster will immediately ban a resource from a node if it fails to start there. When false, the cluster will instead check the resource's fail count against its migration-threshold. Whether a start failure should prevent a resource from being recovered on the same node Whether the cluster should check for active resources during start-up Whether the cluster should check for active resources during start-up When true, resources active on a node when it is cleanly shut down are kept "locked" to that node (not allowed to run elsewhere) until they start again on that node after it rejoins (or for at most shutdown-lock-limit, if set). Stonith resources and Pacemaker Remote connections are never locked. Clone and bundle instances and the promoted role of promotable clones are currently never locked, though support could be added in a future release. Whether to lock resources to a cleanly shut down node If shutdown-lock is true and this is set to a nonzero time duration, shutdown locks will expire after this much time has passed since the shutdown was initiated, even if the node has not rejoined. Do not lock resources to a cleanly shut down node longer than this If false, unresponsive nodes are immediately assumed to be harmless, and resources that were active on them may be recovered elsewhere. This can result in a "split-brain" situation, potentially leading to data loss and/or service unavailability. *** Advanced Use Only *** Whether nodes may be fenced as part of recovery Action to send to fence device when a node needs to be fenced ("poweroff" is a deprecated alias for "off") Allowed values: reboot, off, poweroff Action to send to fence device when a node needs to be fenced ("poweroff" is a deprecated alias for "off") This value is not used by Pacemaker, but is kept for backward compatibility, and certain legacy fence agents might use it. *** Advanced Use Only *** Unused by Pacemaker This is set automatically by the cluster according to whether SBD is detected to be in use. User-configured values are ignored. The value `true` is meaningful if diskless SBD is used and `stonith-watchdog-timeout` is nonzero. In that case, if fencing is required, watchdog-based self-fencing will be performed via SBD without requiring a fencing resource explicitly configured. Whether watchdog integration is enabled Allow performing fencing operations in parallel Allow performing fencing operations in parallel Setting this to false may lead to a "split-brain" situation,potentially leading to data loss and/or service unavailability. *** Advanced Use Only *** Whether to fence unseen nodes at start-up Apply specified delay for the fencings that are targeting the lost nodes with the highest total resource priority in case we don't have the majority of the nodes in our cluster partition, so that the more significant nodes potentially win any fencing match, which is especially meaningful under split-brain of 2-node cluster. A promoted resource instance takes the base priority + 1 on calculation if the base priority is not 0. Any static/random delays that are introduced by `pcmk_delay_base/max` configured for the corresponding fencing resources will be added to this delay. This delay should be significantly greater than, safely twice, the maximum `pcmk_delay_base/max`. By default, priority fencing delay is disabled. Apply fencing delay targeting the lost nodes with the highest total resource priority + + A node that has joined the cluster can be pending on joining the process group. We wait up to this much time for it. If it times out, fencing targeting the node will be issued if enabled. + How long to wait for a node that has joined the cluster to join the process group + + The node elected Designated Controller (DC) will consider an action failed if it does not get a response from the node executing the action within this time (after considering the action's own timeout). The "correct" value will depend on the speed and load of your network and cluster nodes. Maximum time for node-to-node communication The "correct" value will depend on the speed and load of your network and cluster nodes. If set to 0, the cluster will impose a dynamically calculated limit when any node has a high load. Maximum number of jobs that the cluster may execute in parallel across all nodes The number of live migration actions that the cluster is allowed to execute in parallel on a node (-1 means no limit) The number of live migration actions that the cluster is allowed to execute in parallel on a node (-1 means no limit) Whether the cluster should stop all active resources Whether the cluster should stop all active resources Whether to stop resources that were removed from the configuration Whether to stop resources that were removed from the configuration Whether to cancel recurring actions removed from the configuration Whether to cancel recurring actions removed from the configuration Values other than default are poorly tested and potentially dangerous. This option will be removed in a future release. *** Deprecated *** Whether to remove stopped resources from the executor Zero to disable, -1 to store unlimited. The number of scheduler inputs resulting in errors to save Zero to disable, -1 to store unlimited. The number of scheduler inputs resulting in warnings to save Zero to disable, -1 to store unlimited. The number of scheduler inputs without errors or warnings to save Requires external entities to create node attributes (named with the prefix "#health") with values "red", "yellow", or "green". Allowed values: none, migrate-on-red, only-green, progressive, custom How cluster should react to node health attributes Only used when "node-health-strategy" is set to "progressive". Base health score assigned to a node Only used when "node-health-strategy" is set to "custom" or "progressive". The score to use for a node health attribute whose value is "green" Only used when "node-health-strategy" is set to "custom" or "progressive". The score to use for a node health attribute whose value is "yellow" Only used when "node-health-strategy" is set to "custom" or "progressive". The score to use for a node health attribute whose value is "red" How the cluster should allocate resources to nodes Allowed values: default, utilization, minimal, balanced How the cluster should allocate resources to nodes =#=#=#= End test: Get scheduler metadata - OK (0) =#=#=#= * Passed: pacemaker-schedulerd - Get scheduler metadata diff --git a/cts/cli/regression.tools.exp b/cts/cli/regression.tools.exp index be5afa6dc4..a5713e5085 100644 --- a/cts/cli/regression.tools.exp +++ b/cts/cli/regression.tools.exp @@ -1,7900 +1,7900 @@ Created new pacemaker configuration A new shadow instance was created. To begin using it, enter the following into your shell: export CIB_shadow=cts-cli =#=#=#= Begin test: Validate CIB =#=#=#= =#=#=#= Current cib after: Validate CIB =#=#=#= =#=#=#= End test: Validate CIB - OK (0) =#=#=#= * Passed: cibadmin - Validate CIB =#=#=#= Begin test: Query the value of an attribute that does not exist =#=#=#= crm_attribute: Error performing operation: No such device or address =#=#=#= End test: Query the value of an attribute that does not exist - No such object (105) =#=#=#= * Passed: crm_attribute - Query the value of an attribute that does not exist =#=#=#= Begin test: Configure something before erasing =#=#=#= =#=#=#= Current cib after: Configure something before erasing =#=#=#= =#=#=#= End test: Configure something before erasing - OK (0) =#=#=#= * Passed: crm_attribute - Configure something before erasing =#=#=#= Begin test: Require --force for CIB erasure =#=#=#= cibadmin: The supplied command is considered dangerous. To prevent accidental destruction of the cluster, the --force flag is required in order to proceed. =#=#=#= Current cib after: Require --force for CIB erasure =#=#=#= =#=#=#= End test: Require --force for CIB erasure - Operation not safe (107) =#=#=#= * Passed: cibadmin - Require --force for CIB erasure =#=#=#= Begin test: Allow CIB erasure with --force =#=#=#= =#=#=#= End test: Allow CIB erasure with --force - OK (0) =#=#=#= * Passed: cibadmin - Allow CIB erasure with --force =#=#=#= Begin test: Query CIB =#=#=#= =#=#=#= Current cib after: Query CIB =#=#=#= =#=#=#= End test: Query CIB - OK (0) =#=#=#= * Passed: cibadmin - Query CIB =#=#=#= Begin test: Set cluster option =#=#=#= =#=#=#= Current cib after: Set cluster option =#=#=#= =#=#=#= End test: Set cluster option - OK (0) =#=#=#= * Passed: crm_attribute - Set cluster option =#=#=#= Begin test: Query new cluster option =#=#=#= =#=#=#= Current cib after: Query new cluster option =#=#=#= =#=#=#= End test: Query new cluster option - OK (0) =#=#=#= * Passed: cibadmin - Query new cluster option =#=#=#= Begin test: Query cluster options =#=#=#= =#=#=#= Current cib after: Query cluster options =#=#=#= =#=#=#= End test: Query cluster options - OK (0) =#=#=#= * Passed: cibadmin - Query cluster options =#=#=#= Begin test: Set no-quorum policy =#=#=#= =#=#=#= Current cib after: Set no-quorum policy =#=#=#= =#=#=#= End test: Set no-quorum policy - OK (0) =#=#=#= * Passed: crm_attribute - Set no-quorum policy =#=#=#= Begin test: Delete nvpair =#=#=#= =#=#=#= Current cib after: Delete nvpair =#=#=#= =#=#=#= End test: Delete nvpair - OK (0) =#=#=#= * Passed: cibadmin - Delete nvpair =#=#=#= Begin test: Create operation should fail =#=#=#= Call failed: File exists =#=#=#= Current cib after: Create operation should fail =#=#=#= =#=#=#= End test: Create operation should fail - Requested item already exists (108) =#=#=#= * Passed: cibadmin - Create operation should fail =#=#=#= Begin test: Modify cluster options section =#=#=#= =#=#=#= Current cib after: Modify cluster options section =#=#=#= =#=#=#= End test: Modify cluster options section - OK (0) =#=#=#= * Passed: cibadmin - Modify cluster options section =#=#=#= Begin test: Query updated cluster option =#=#=#= =#=#=#= Current cib after: Query updated cluster option =#=#=#= =#=#=#= End test: Query updated cluster option - OK (0) =#=#=#= * Passed: cibadmin - Query updated cluster option =#=#=#= Begin test: Set duplicate cluster option =#=#=#= =#=#=#= Current cib after: Set duplicate cluster option =#=#=#= =#=#=#= End test: Set duplicate cluster option - OK (0) =#=#=#= * Passed: crm_attribute - Set duplicate cluster option =#=#=#= Begin test: Setting multiply defined cluster option should fail =#=#=#= crm_attribute: Please choose from one of the matches below and supply the 'id' with --attr-id Multiple attributes match name=cluster-delay Value: 60s (id=cib-bootstrap-options-cluster-delay) Value: 40s (id=duplicate-cluster-delay) =#=#=#= Current cib after: Setting multiply defined cluster option should fail =#=#=#= =#=#=#= End test: Setting multiply defined cluster option should fail - Multiple items match request (109) =#=#=#= * Passed: crm_attribute - Setting multiply defined cluster option should fail =#=#=#= Begin test: Set cluster option with -s =#=#=#= =#=#=#= Current cib after: Set cluster option with -s =#=#=#= =#=#=#= End test: Set cluster option with -s - OK (0) =#=#=#= * Passed: crm_attribute - Set cluster option with -s =#=#=#= Begin test: Delete cluster option with -i =#=#=#= Deleted crm_config option: id=(null) name=cluster-delay =#=#=#= Current cib after: Delete cluster option with -i =#=#=#= =#=#=#= End test: Delete cluster option with -i - OK (0) =#=#=#= * Passed: crm_attribute - Delete cluster option with -i =#=#=#= Begin test: Create node1 and bring it online =#=#=#= unpack_resources error: Resource start-up disabled since no STONITH resources have been defined unpack_resources error: Either configure some or disable STONITH with the stonith-enabled option unpack_resources error: NOTE: Clusters with shared data need STONITH to ensure data integrity unpack_resources error: Resource start-up disabled since no STONITH resources have been defined unpack_resources error: Either configure some or disable STONITH with the stonith-enabled option unpack_resources error: NOTE: Clusters with shared data need STONITH to ensure data integrity unpack_resources error: Resource start-up disabled since no STONITH resources have been defined unpack_resources error: Either configure some or disable STONITH with the stonith-enabled option unpack_resources error: NOTE: Clusters with shared data need STONITH to ensure data integrity Current cluster status: * Full List of Resources: * No resources Performing Requested Modifications: * Bringing node node1 online Transition Summary: Executing Cluster Transition: Revised Cluster Status: * Node List: * Online: [ node1 ] * Full List of Resources: * No resources =#=#=#= Current cib after: Create node1 and bring it online =#=#=#= =#=#=#= End test: Create node1 and bring it online - OK (0) =#=#=#= * Passed: crm_simulate - Create node1 and bring it online =#=#=#= Begin test: Create node attribute =#=#=#= =#=#=#= Current cib after: Create node attribute =#=#=#= =#=#=#= End test: Create node attribute - OK (0) =#=#=#= * Passed: crm_attribute - Create node attribute =#=#=#= Begin test: Query new node attribute =#=#=#= =#=#=#= Current cib after: Query new node attribute =#=#=#= =#=#=#= End test: Query new node attribute - OK (0) =#=#=#= * Passed: cibadmin - Query new node attribute =#=#=#= Begin test: Create second node attribute =#=#=#= =#=#=#= Current cib after: Create second node attribute =#=#=#= =#=#=#= End test: Create second node attribute - OK (0) =#=#=#= * Passed: crm_attribute - Create second node attribute =#=#=#= Begin test: Query node attributes by pattern =#=#=#= scope=nodes name=ram value=1024M scope=nodes name=rattr value=XYZ =#=#=#= End test: Query node attributes by pattern - OK (0) =#=#=#= * Passed: crm_attribute - Query node attributes by pattern =#=#=#= Begin test: Update node attributes by pattern =#=#=#= =#=#=#= Current cib after: Update node attributes by pattern =#=#=#= =#=#=#= End test: Update node attributes by pattern - OK (0) =#=#=#= * Passed: crm_attribute - Update node attributes by pattern =#=#=#= Begin test: Delete node attributes by pattern =#=#=#= Deleted nodes attribute: id=nodes-node1-rattr name=rattr =#=#=#= Current cib after: Delete node attributes by pattern =#=#=#= =#=#=#= End test: Delete node attributes by pattern - OK (0) =#=#=#= * Passed: crm_attribute - Delete node attributes by pattern =#=#=#= Begin test: Set a transient (fail-count) node attribute =#=#=#= =#=#=#= Current cib after: Set a transient (fail-count) node attribute =#=#=#= =#=#=#= End test: Set a transient (fail-count) node attribute - OK (0) =#=#=#= * Passed: crm_attribute - Set a transient (fail-count) node attribute =#=#=#= Begin test: Query a fail count =#=#=#= scope=status name=fail-count-foo value=3 =#=#=#= Current cib after: Query a fail count =#=#=#= =#=#=#= End test: Query a fail count - OK (0) =#=#=#= * Passed: crm_failcount - Query a fail count =#=#=#= Begin test: Show node attributes with crm_simulate =#=#=#= unpack_resources error: Resource start-up disabled since no STONITH resources have been defined unpack_resources error: Either configure some or disable STONITH with the stonith-enabled option unpack_resources error: NOTE: Clusters with shared data need STONITH to ensure data integrity Current cluster status: * Node List: * Online: [ node1 ] * Full List of Resources: * No resources * Node Attributes: * Node: node1: * ram : 1024M =#=#=#= End test: Show node attributes with crm_simulate - OK (0) =#=#=#= * Passed: crm_simulate - Show node attributes with crm_simulate =#=#=#= Begin test: Set a second transient node attribute =#=#=#= =#=#=#= Current cib after: Set a second transient node attribute =#=#=#= =#=#=#= End test: Set a second transient node attribute - OK (0) =#=#=#= * Passed: crm_attribute - Set a second transient node attribute =#=#=#= Begin test: Query transient node attributes by pattern =#=#=#= scope=status name=fail-count-foo value=3 scope=status name=fail-count-bar value=5 =#=#=#= End test: Query transient node attributes by pattern - OK (0) =#=#=#= * Passed: crm_attribute - Query transient node attributes by pattern =#=#=#= Begin test: Update transient node attributes by pattern =#=#=#= =#=#=#= Current cib after: Update transient node attributes by pattern =#=#=#= =#=#=#= End test: Update transient node attributes by pattern - OK (0) =#=#=#= * Passed: crm_attribute - Update transient node attributes by pattern =#=#=#= Begin test: Delete transient node attributes by pattern =#=#=#= Deleted status attribute: id=status-node1-fail-count-foo name=fail-count-foo Deleted status attribute: id=status-node1-fail-count-bar name=fail-count-bar =#=#=#= Current cib after: Delete transient node attributes by pattern =#=#=#= =#=#=#= End test: Delete transient node attributes by pattern - OK (0) =#=#=#= * Passed: crm_attribute - Delete transient node attributes by pattern =#=#=#= Begin test: crm_attribute given invalid delete usage =#=#=#= crm_attribute: Error: must specify attribute name or pattern to delete =#=#=#= End test: crm_attribute given invalid delete usage - Incorrect usage (64) =#=#=#= * Passed: crm_attribute - crm_attribute given invalid delete usage =#=#=#= Begin test: Set a utilization node attribute =#=#=#= =#=#=#= Current cib after: Set a utilization node attribute =#=#=#= =#=#=#= End test: Set a utilization node attribute - OK (0) =#=#=#= * Passed: crm_attribute - Set a utilization node attribute =#=#=#= Begin test: Query utilization node attribute =#=#=#= scope=nodes name=cpu value=1 =#=#=#= End test: Query utilization node attribute - OK (0) =#=#=#= * Passed: crm_attribute - Query utilization node attribute =#=#=#= Begin test: Digest calculation =#=#=#= Digest: =#=#=#= Current cib after: Digest calculation =#=#=#= =#=#=#= End test: Digest calculation - OK (0) =#=#=#= * Passed: cibadmin - Digest calculation =#=#=#= Begin test: Replace operation should fail =#=#=#= Call failed: Update was older than existing configuration =#=#=#= Current cib after: Replace operation should fail =#=#=#= =#=#=#= End test: Replace operation should fail - Update was older than existing configuration (103) =#=#=#= * Passed: cibadmin - Replace operation should fail =#=#=#= Begin test: Default standby value =#=#=#= scope=status name=standby value=off =#=#=#= Current cib after: Default standby value =#=#=#= =#=#=#= End test: Default standby value - OK (0) =#=#=#= * Passed: crm_standby - Default standby value =#=#=#= Begin test: Set standby status =#=#=#= =#=#=#= Current cib after: Set standby status =#=#=#= =#=#=#= End test: Set standby status - OK (0) =#=#=#= * Passed: crm_standby - Set standby status =#=#=#= Begin test: Query standby value =#=#=#= scope=nodes name=standby value=true =#=#=#= Current cib after: Query standby value =#=#=#= =#=#=#= End test: Query standby value - OK (0) =#=#=#= * Passed: crm_standby - Query standby value =#=#=#= Begin test: Delete standby value =#=#=#= Deleted nodes attribute: id=nodes-node1-standby name=standby =#=#=#= Current cib after: Delete standby value =#=#=#= =#=#=#= End test: Delete standby value - OK (0) =#=#=#= * Passed: crm_standby - Delete standby value =#=#=#= Begin test: Create a resource =#=#=#= =#=#=#= Current cib after: Create a resource =#=#=#= =#=#=#= End test: Create a resource - OK (0) =#=#=#= * Passed: cibadmin - Create a resource =#=#=#= Begin test: crm_resource run with extra arguments =#=#=#= crm_resource: non-option ARGV-elements: [1 of 2] foo [2 of 2] bar =#=#=#= End test: crm_resource run with extra arguments - Incorrect usage (64) =#=#=#= * Passed: crm_resource - crm_resource run with extra arguments =#=#=#= Begin test: crm_resource given both -r and resource config =#=#=#= crm_resource: --resource cannot be used with --class, --agent, and --provider =#=#=#= End test: crm_resource given both -r and resource config - Incorrect usage (64) =#=#=#= * Passed: crm_resource - crm_resource given both -r and resource config =#=#=#= Begin test: crm_resource given resource config with invalid action =#=#=#= crm_resource: --class, --agent, and --provider can only be used with --validate and --force-* =#=#=#= End test: crm_resource given resource config with invalid action - Incorrect usage (64) =#=#=#= * Passed: crm_resource - crm_resource given resource config with invalid action =#=#=#= Begin test: Create a resource meta attribute =#=#=#= unpack_resources error: Resource start-up disabled since no STONITH resources have been defined unpack_resources error: Either configure some or disable STONITH with the stonith-enabled option unpack_resources error: NOTE: Clusters with shared data need STONITH to ensure data integrity Set 'dummy' option: id=dummy-meta_attributes-is-managed set=dummy-meta_attributes name=is-managed value=false =#=#=#= Current cib after: Create a resource meta attribute =#=#=#= =#=#=#= End test: Create a resource meta attribute - OK (0) =#=#=#= * Passed: crm_resource - Create a resource meta attribute =#=#=#= Begin test: Query a resource meta attribute =#=#=#= unpack_resources error: Resource start-up disabled since no STONITH resources have been defined unpack_resources error: Either configure some or disable STONITH with the stonith-enabled option unpack_resources error: NOTE: Clusters with shared data need STONITH to ensure data integrity false =#=#=#= Current cib after: Query a resource meta attribute =#=#=#= =#=#=#= End test: Query a resource meta attribute - OK (0) =#=#=#= * Passed: crm_resource - Query a resource meta attribute =#=#=#= Begin test: Remove a resource meta attribute =#=#=#= unpack_resources error: Resource start-up disabled since no STONITH resources have been defined unpack_resources error: Either configure some or disable STONITH with the stonith-enabled option unpack_resources error: NOTE: Clusters with shared data need STONITH to ensure data integrity Deleted 'dummy' option: id=dummy-meta_attributes-is-managed name=is-managed =#=#=#= Current cib after: Remove a resource meta attribute =#=#=#= =#=#=#= End test: Remove a resource meta attribute - OK (0) =#=#=#= * Passed: crm_resource - Remove a resource meta attribute =#=#=#= Begin test: Create another resource meta attribute =#=#=#= unpack_resources error: Resource start-up disabled since no STONITH resources have been defined unpack_resources error: Either configure some or disable STONITH with the stonith-enabled option unpack_resources error: NOTE: Clusters with shared data need STONITH to ensure data integrity =#=#=#= End test: Create another resource meta attribute - OK (0) =#=#=#= * Passed: crm_resource - Create another resource meta attribute =#=#=#= Begin test: Show why a resource is not running =#=#=#= unpack_resources error: Resource start-up disabled since no STONITH resources have been defined unpack_resources error: Either configure some or disable STONITH with the stonith-enabled option unpack_resources error: NOTE: Clusters with shared data need STONITH to ensure data integrity =#=#=#= End test: Show why a resource is not running - OK (0) =#=#=#= * Passed: crm_resource - Show why a resource is not running =#=#=#= Begin test: Remove another resource meta attribute =#=#=#= unpack_resources error: Resource start-up disabled since no STONITH resources have been defined unpack_resources error: Either configure some or disable STONITH with the stonith-enabled option unpack_resources error: NOTE: Clusters with shared data need STONITH to ensure data integrity =#=#=#= End test: Remove another resource meta attribute - OK (0) =#=#=#= * Passed: crm_resource - Remove another resource meta attribute =#=#=#= Begin test: Get a non-existent attribute from a resource element with output-as=xml =#=#=#= unpack_resources error: Resource start-up disabled since no STONITH resources have been defined unpack_resources error: Either configure some or disable STONITH with the stonith-enabled option unpack_resources error: NOTE: Clusters with shared data need STONITH to ensure data integrity Attribute 'nonexistent' not found for 'dummy' =#=#=#= End test: Get a non-existent attribute from a resource element with output-as=xml - OK (0) =#=#=#= * Passed: crm_resource - Get a non-existent attribute from a resource element with output-as=xml =#=#=#= Begin test: Get a non-existent attribute from a resource element without output-as=xml =#=#=#= unpack_resources error: Resource start-up disabled since no STONITH resources have been defined unpack_resources error: Either configure some or disable STONITH with the stonith-enabled option unpack_resources error: NOTE: Clusters with shared data need STONITH to ensure data integrity Attribute 'nonexistent' not found for 'dummy' =#=#=#= Current cib after: Get a non-existent attribute from a resource element without output-as=xml =#=#=#= =#=#=#= End test: Get a non-existent attribute from a resource element without output-as=xml - OK (0) =#=#=#= * Passed: crm_resource - Get a non-existent attribute from a resource element without output-as=xml =#=#=#= Begin test: Get an existent attribute from a resource element with output-as=xml =#=#=#= unpack_resources error: Resource start-up disabled since no STONITH resources have been defined unpack_resources error: Either configure some or disable STONITH with the stonith-enabled option unpack_resources error: NOTE: Clusters with shared data need STONITH to ensure data integrity ocf =#=#=#= End test: Get an existent attribute from a resource element with output-as=xml - OK (0) =#=#=#= * Passed: crm_resource - Get an existent attribute from a resource element with output-as=xml =#=#=#= Begin test: Get an existent attribute from a resource element without output-as=xml =#=#=#= unpack_resources error: Resource start-up disabled since no STONITH resources have been defined unpack_resources error: Either configure some or disable STONITH with the stonith-enabled option unpack_resources error: NOTE: Clusters with shared data need STONITH to ensure data integrity ocf =#=#=#= Current cib after: Get an existent attribute from a resource element without output-as=xml =#=#=#= =#=#=#= End test: Get an existent attribute from a resource element without output-as=xml - OK (0) =#=#=#= * Passed: crm_resource - Get an existent attribute from a resource element without output-as=xml =#=#=#= Begin test: Set a non-existent attribute for a resource element with output-as=xml =#=#=#= unpack_resources error: Resource start-up disabled since no STONITH resources have been defined unpack_resources error: Either configure some or disable STONITH with the stonith-enabled option unpack_resources error: NOTE: Clusters with shared data need STONITH to ensure data integrity =#=#=#= Current cib after: Set a non-existent attribute for a resource element with output-as=xml =#=#=#= =#=#=#= End test: Set a non-existent attribute for a resource element with output-as=xml - OK (0) =#=#=#= * Passed: crm_resource - Set a non-existent attribute for a resource element with output-as=xml =#=#=#= Begin test: Set an existent attribute for a resource element with output-as=xml =#=#=#= unpack_resources error: Resource start-up disabled since no STONITH resources have been defined unpack_resources error: Either configure some or disable STONITH with the stonith-enabled option unpack_resources error: NOTE: Clusters with shared data need STONITH to ensure data integrity =#=#=#= Current cib after: Set an existent attribute for a resource element with output-as=xml =#=#=#= =#=#=#= End test: Set an existent attribute for a resource element with output-as=xml - OK (0) =#=#=#= * Passed: crm_resource - Set an existent attribute for a resource element with output-as=xml =#=#=#= Begin test: Delete an existent attribute for a resource element with output-as=xml =#=#=#= unpack_resources error: Resource start-up disabled since no STONITH resources have been defined unpack_resources error: Either configure some or disable STONITH with the stonith-enabled option unpack_resources error: NOTE: Clusters with shared data need STONITH to ensure data integrity =#=#=#= Current cib after: Delete an existent attribute for a resource element with output-as=xml =#=#=#= =#=#=#= End test: Delete an existent attribute for a resource element with output-as=xml - OK (0) =#=#=#= * Passed: crm_resource - Delete an existent attribute for a resource element with output-as=xml =#=#=#= Begin test: Delete a non-existent attribute for a resource element with output-as=xml =#=#=#= unpack_resources error: Resource start-up disabled since no STONITH resources have been defined unpack_resources error: Either configure some or disable STONITH with the stonith-enabled option unpack_resources error: NOTE: Clusters with shared data need STONITH to ensure data integrity =#=#=#= Current cib after: Delete a non-existent attribute for a resource element with output-as=xml =#=#=#= =#=#=#= End test: Delete a non-existent attribute for a resource element with output-as=xml - OK (0) =#=#=#= * Passed: crm_resource - Delete a non-existent attribute for a resource element with output-as=xml =#=#=#= Begin test: Set a non-existent attribute for a resource element without output-as=xml =#=#=#= unpack_resources error: Resource start-up disabled since no STONITH resources have been defined unpack_resources error: Either configure some or disable STONITH with the stonith-enabled option unpack_resources error: NOTE: Clusters with shared data need STONITH to ensure data integrity Set attribute: name=description value=test_description =#=#=#= Current cib after: Set a non-existent attribute for a resource element without output-as=xml =#=#=#= =#=#=#= End test: Set a non-existent attribute for a resource element without output-as=xml - OK (0) =#=#=#= * Passed: crm_resource - Set a non-existent attribute for a resource element without output-as=xml =#=#=#= Begin test: Set an existent attribute for a resource element without output-as=xml =#=#=#= unpack_resources error: Resource start-up disabled since no STONITH resources have been defined unpack_resources error: Either configure some or disable STONITH with the stonith-enabled option unpack_resources error: NOTE: Clusters with shared data need STONITH to ensure data integrity Set attribute: name=description value=test_description =#=#=#= Current cib after: Set an existent attribute for a resource element without output-as=xml =#=#=#= =#=#=#= End test: Set an existent attribute for a resource element without output-as=xml - OK (0) =#=#=#= * Passed: crm_resource - Set an existent attribute for a resource element without output-as=xml =#=#=#= Begin test: Delete an existent attribute for a resource element without output-as=xml =#=#=#= unpack_resources error: Resource start-up disabled since no STONITH resources have been defined unpack_resources error: Either configure some or disable STONITH with the stonith-enabled option unpack_resources error: NOTE: Clusters with shared data need STONITH to ensure data integrity Deleted attribute: description =#=#=#= Current cib after: Delete an existent attribute for a resource element without output-as=xml =#=#=#= =#=#=#= End test: Delete an existent attribute for a resource element without output-as=xml - OK (0) =#=#=#= * Passed: crm_resource - Delete an existent attribute for a resource element without output-as=xml =#=#=#= Begin test: Delete a non-existent attribute for a resource element without output-as=xml =#=#=#= unpack_resources error: Resource start-up disabled since no STONITH resources have been defined unpack_resources error: Either configure some or disable STONITH with the stonith-enabled option unpack_resources error: NOTE: Clusters with shared data need STONITH to ensure data integrity Deleted attribute: description =#=#=#= Current cib after: Delete a non-existent attribute for a resource element without output-as=xml =#=#=#= =#=#=#= End test: Delete a non-existent attribute for a resource element without output-as=xml - OK (0) =#=#=#= * Passed: crm_resource - Delete a non-existent attribute for a resource element without output-as=xml =#=#=#= Begin test: Create a resource attribute =#=#=#= unpack_resources error: Resource start-up disabled since no STONITH resources have been defined unpack_resources error: Either configure some or disable STONITH with the stonith-enabled option unpack_resources error: NOTE: Clusters with shared data need STONITH to ensure data integrity Set 'dummy' option: id=dummy-instance_attributes-delay set=dummy-instance_attributes name=delay value=10s =#=#=#= Current cib after: Create a resource attribute =#=#=#= =#=#=#= End test: Create a resource attribute - OK (0) =#=#=#= * Passed: crm_resource - Create a resource attribute =#=#=#= Begin test: List the configured resources =#=#=#= unpack_resources error: Resource start-up disabled since no STONITH resources have been defined unpack_resources error: Either configure some or disable STONITH with the stonith-enabled option unpack_resources error: NOTE: Clusters with shared data need STONITH to ensure data integrity Full List of Resources: * dummy (ocf:pacemaker:Dummy): Stopped =#=#=#= Current cib after: List the configured resources =#=#=#= =#=#=#= End test: List the configured resources - OK (0) =#=#=#= * Passed: crm_resource - List the configured resources =#=#=#= Begin test: List the configured resources in XML =#=#=#= unpack_resources error: Resource start-up disabled since no STONITH resources have been defined unpack_resources error: Either configure some or disable STONITH with the stonith-enabled option unpack_resources error: NOTE: Clusters with shared data need STONITH to ensure data integrity =#=#=#= End test: List the configured resources in XML - OK (0) =#=#=#= * Passed: crm_resource - List the configured resources in XML =#=#=#= Begin test: Implicitly list the configured resources =#=#=#= unpack_resources error: Resource start-up disabled since no STONITH resources have been defined unpack_resources error: Either configure some or disable STONITH with the stonith-enabled option unpack_resources error: NOTE: Clusters with shared data need STONITH to ensure data integrity Full List of Resources: * dummy (ocf:pacemaker:Dummy): Stopped =#=#=#= End test: Implicitly list the configured resources - OK (0) =#=#=#= * Passed: crm_resource - Implicitly list the configured resources =#=#=#= Begin test: List IDs of instantiated resources =#=#=#= unpack_resources error: Resource start-up disabled since no STONITH resources have been defined unpack_resources error: Either configure some or disable STONITH with the stonith-enabled option unpack_resources error: NOTE: Clusters with shared data need STONITH to ensure data integrity dummy =#=#=#= End test: List IDs of instantiated resources - OK (0) =#=#=#= * Passed: crm_resource - List IDs of instantiated resources =#=#=#= Begin test: Show XML configuration of resource =#=#=#= unpack_resources error: Resource start-up disabled since no STONITH resources have been defined unpack_resources error: Either configure some or disable STONITH with the stonith-enabled option unpack_resources error: NOTE: Clusters with shared data need STONITH to ensure data integrity dummy (ocf:pacemaker:Dummy): Stopped Resource XML: =#=#=#= End test: Show XML configuration of resource - OK (0) =#=#=#= * Passed: crm_resource - Show XML configuration of resource =#=#=#= Begin test: Show XML configuration of resource, output as XML =#=#=#= unpack_resources error: Resource start-up disabled since no STONITH resources have been defined unpack_resources error: Either configure some or disable STONITH with the stonith-enabled option unpack_resources error: NOTE: Clusters with shared data need STONITH to ensure data integrity ]]> =#=#=#= End test: Show XML configuration of resource, output as XML - OK (0) =#=#=#= * Passed: crm_resource - Show XML configuration of resource, output as XML =#=#=#= Begin test: Require a destination when migrating a resource that is stopped =#=#=#= unpack_resources error: Resource start-up disabled since no STONITH resources have been defined unpack_resources error: Either configure some or disable STONITH with the stonith-enabled option unpack_resources error: NOTE: Clusters with shared data need STONITH to ensure data integrity crm_resource: Resource 'dummy' not moved: active in 0 locations. To prevent 'dummy' from running on a specific location, specify a node. =#=#=#= Current cib after: Require a destination when migrating a resource that is stopped =#=#=#= =#=#=#= End test: Require a destination when migrating a resource that is stopped - Incorrect usage (64) =#=#=#= * Passed: crm_resource - Require a destination when migrating a resource that is stopped =#=#=#= Begin test: Don't support migration to non-existent locations =#=#=#= unpack_resources error: Resource start-up disabled since no STONITH resources have been defined unpack_resources error: Either configure some or disable STONITH with the stonith-enabled option unpack_resources error: NOTE: Clusters with shared data need STONITH to ensure data integrity crm_resource: Node 'i.do.not.exist' not found Error performing operation: No such object =#=#=#= Current cib after: Don't support migration to non-existent locations =#=#=#= =#=#=#= End test: Don't support migration to non-existent locations - No such object (105) =#=#=#= * Passed: crm_resource - Don't support migration to non-existent locations =#=#=#= Begin test: Create a fencing resource =#=#=#= =#=#=#= Current cib after: Create a fencing resource =#=#=#= =#=#=#= End test: Create a fencing resource - OK (0) =#=#=#= * Passed: cibadmin - Create a fencing resource =#=#=#= Begin test: Bring resources online =#=#=#= Current cluster status: * Node List: * Online: [ node1 ] * Full List of Resources: * dummy (ocf:pacemaker:Dummy): Stopped * Fence (stonith:fence_true): Stopped Transition Summary: * Start dummy ( node1 ) * Start Fence ( node1 ) Executing Cluster Transition: * Resource action: dummy monitor on node1 * Resource action: Fence monitor on node1 * Resource action: dummy start on node1 * Resource action: Fence start on node1 Revised Cluster Status: * Node List: * Online: [ node1 ] * Full List of Resources: * dummy (ocf:pacemaker:Dummy): Started node1 * Fence (stonith:fence_true): Started node1 =#=#=#= Current cib after: Bring resources online =#=#=#= =#=#=#= End test: Bring resources online - OK (0) =#=#=#= * Passed: crm_simulate - Bring resources online =#=#=#= Begin test: Try to move a resource to its existing location =#=#=#= crm_resource: Error performing operation: Requested item already exists =#=#=#= Current cib after: Try to move a resource to its existing location =#=#=#= =#=#=#= End test: Try to move a resource to its existing location - Requested item already exists (108) =#=#=#= * Passed: crm_resource - Try to move a resource to its existing location =#=#=#= Begin test: Try to move a resource that doesn't exist =#=#=#= crm_resource: Resource 'xyz' not found Error performing operation: No such object =#=#=#= End test: Try to move a resource that doesn't exist - No such object (105) =#=#=#= * Passed: crm_resource - Try to move a resource that doesn't exist =#=#=#= Begin test: Move a resource from its existing location =#=#=#= WARNING: Creating rsc_location constraint 'cli-ban-dummy-on-node1' with a score of -INFINITY for resource dummy on node1. This will prevent dummy from running on node1 until the constraint is removed using the clear option or by editing the CIB with an appropriate tool This will be the case even if node1 is the last node in the cluster =#=#=#= Current cib after: Move a resource from its existing location =#=#=#= =#=#=#= End test: Move a resource from its existing location - OK (0) =#=#=#= * Passed: crm_resource - Move a resource from its existing location =#=#=#= Begin test: Clear out constraints generated by --move =#=#=#= Removing constraint: cli-ban-dummy-on-node1 =#=#=#= Current cib after: Clear out constraints generated by --move =#=#=#= =#=#=#= End test: Clear out constraints generated by --move - OK (0) =#=#=#= * Passed: crm_resource - Clear out constraints generated by --move =#=#=#= Begin test: Default ticket granted state =#=#=#= false =#=#=#= Current cib after: Default ticket granted state =#=#=#= =#=#=#= End test: Default ticket granted state - OK (0) =#=#=#= * Passed: crm_ticket - Default ticket granted state =#=#=#= Begin test: Set ticket granted state =#=#=#= =#=#=#= Current cib after: Set ticket granted state =#=#=#= =#=#=#= End test: Set ticket granted state - OK (0) =#=#=#= * Passed: crm_ticket - Set ticket granted state =#=#=#= Begin test: Query ticket granted state =#=#=#= false =#=#=#= Current cib after: Query ticket granted state =#=#=#= =#=#=#= End test: Query ticket granted state - OK (0) =#=#=#= * Passed: crm_ticket - Query ticket granted state =#=#=#= Begin test: Delete ticket granted state =#=#=#= =#=#=#= Current cib after: Delete ticket granted state =#=#=#= =#=#=#= End test: Delete ticket granted state - OK (0) =#=#=#= * Passed: crm_ticket - Delete ticket granted state =#=#=#= Begin test: Make a ticket standby =#=#=#= =#=#=#= Current cib after: Make a ticket standby =#=#=#= =#=#=#= End test: Make a ticket standby - OK (0) =#=#=#= * Passed: crm_ticket - Make a ticket standby =#=#=#= Begin test: Query ticket standby state =#=#=#= true =#=#=#= Current cib after: Query ticket standby state =#=#=#= =#=#=#= End test: Query ticket standby state - OK (0) =#=#=#= * Passed: crm_ticket - Query ticket standby state =#=#=#= Begin test: Activate a ticket =#=#=#= =#=#=#= Current cib after: Activate a ticket =#=#=#= =#=#=#= End test: Activate a ticket - OK (0) =#=#=#= * Passed: crm_ticket - Activate a ticket =#=#=#= Begin test: Delete ticket standby state =#=#=#= =#=#=#= Current cib after: Delete ticket standby state =#=#=#= =#=#=#= End test: Delete ticket standby state - OK (0) =#=#=#= * Passed: crm_ticket - Delete ticket standby state =#=#=#= Begin test: Ban a resource on unknown node =#=#=#= crm_resource: Node 'host1' not found Error performing operation: No such object =#=#=#= Current cib after: Ban a resource on unknown node =#=#=#= =#=#=#= End test: Ban a resource on unknown node - No such object (105) =#=#=#= * Passed: crm_resource - Ban a resource on unknown node =#=#=#= Begin test: Create two more nodes and bring them online =#=#=#= Current cluster status: * Node List: * Online: [ node1 ] * Full List of Resources: * dummy (ocf:pacemaker:Dummy): Started node1 * Fence (stonith:fence_true): Started node1 Performing Requested Modifications: * Bringing node node2 online * Bringing node node3 online Transition Summary: * Move Fence ( node1 -> node2 ) Executing Cluster Transition: * Resource action: dummy monitor on node3 * Resource action: dummy monitor on node2 * Resource action: Fence stop on node1 * Resource action: Fence monitor on node3 * Resource action: Fence monitor on node2 * Resource action: Fence start on node2 Revised Cluster Status: * Node List: * Online: [ node1 node2 node3 ] * Full List of Resources: * dummy (ocf:pacemaker:Dummy): Started node1 * Fence (stonith:fence_true): Started node2 =#=#=#= Current cib after: Create two more nodes and bring them online =#=#=#= =#=#=#= End test: Create two more nodes and bring them online - OK (0) =#=#=#= * Passed: crm_simulate - Create two more nodes and bring them online =#=#=#= Begin test: Ban dummy from node1 =#=#=#= WARNING: Creating rsc_location constraint 'cli-ban-dummy-on-node1' with a score of -INFINITY for resource dummy on node1. This will prevent dummy from running on node1 until the constraint is removed using the clear option or by editing the CIB with an appropriate tool This will be the case even if node1 is the last node in the cluster =#=#=#= Current cib after: Ban dummy from node1 =#=#=#= =#=#=#= End test: Ban dummy from node1 - OK (0) =#=#=#= * Passed: crm_resource - Ban dummy from node1 =#=#=#= Begin test: Show where a resource is running =#=#=#= resource dummy is running on: node1 =#=#=#= End test: Show where a resource is running - OK (0) =#=#=#= * Passed: crm_resource - Show where a resource is running =#=#=#= Begin test: Show constraints on a resource =#=#=#= Locations: * Node node1 (score=-INFINITY, id=cli-ban-dummy-on-node1, rsc=dummy) =#=#=#= End test: Show constraints on a resource - OK (0) =#=#=#= * Passed: crm_resource - Show constraints on a resource =#=#=#= Begin test: Ban dummy from node2 =#=#=#= =#=#=#= Current cib after: Ban dummy from node2 =#=#=#= =#=#=#= End test: Ban dummy from node2 - OK (0) =#=#=#= * Passed: crm_resource - Ban dummy from node2 =#=#=#= Begin test: Relocate resources due to ban =#=#=#= Current cluster status: * Node List: * Online: [ node1 node2 node3 ] * Full List of Resources: * dummy (ocf:pacemaker:Dummy): Started node1 * Fence (stonith:fence_true): Started node2 Transition Summary: * Move dummy ( node1 -> node3 ) Executing Cluster Transition: * Resource action: dummy stop on node1 * Resource action: dummy start on node3 Revised Cluster Status: * Node List: * Online: [ node1 node2 node3 ] * Full List of Resources: * dummy (ocf:pacemaker:Dummy): Started node3 * Fence (stonith:fence_true): Started node2 =#=#=#= Current cib after: Relocate resources due to ban =#=#=#= =#=#=#= End test: Relocate resources due to ban - OK (0) =#=#=#= * Passed: crm_simulate - Relocate resources due to ban =#=#=#= Begin test: Move dummy to node1 =#=#=#= =#=#=#= Current cib after: Move dummy to node1 =#=#=#= =#=#=#= End test: Move dummy to node1 - OK (0) =#=#=#= * Passed: crm_resource - Move dummy to node1 =#=#=#= Begin test: Clear implicit constraints for dummy on node2 =#=#=#= Removing constraint: cli-ban-dummy-on-node2 =#=#=#= Current cib after: Clear implicit constraints for dummy on node2 =#=#=#= =#=#=#= End test: Clear implicit constraints for dummy on node2 - OK (0) =#=#=#= * Passed: crm_resource - Clear implicit constraints for dummy on node2 =#=#=#= Begin test: Drop the status section =#=#=#= =#=#=#= End test: Drop the status section - OK (0) =#=#=#= * Passed: cibadmin - Drop the status section =#=#=#= Begin test: Create a clone =#=#=#= =#=#=#= End test: Create a clone - OK (0) =#=#=#= * Passed: cibadmin - Create a clone =#=#=#= Begin test: Create a resource meta attribute =#=#=#= Performing update of 'is-managed' on 'test-clone', the parent of 'test-primitive' Set 'test-clone' option: id=test-clone-meta_attributes-is-managed set=test-clone-meta_attributes name=is-managed value=false =#=#=#= Current cib after: Create a resource meta attribute =#=#=#= =#=#=#= End test: Create a resource meta attribute - OK (0) =#=#=#= * Passed: crm_resource - Create a resource meta attribute =#=#=#= Begin test: Create a resource meta attribute in the primitive =#=#=#= Set 'test-primitive' option: id=test-primitive-meta_attributes-is-managed set=test-primitive-meta_attributes name=is-managed value=false =#=#=#= Current cib after: Create a resource meta attribute in the primitive =#=#=#= =#=#=#= End test: Create a resource meta attribute in the primitive - OK (0) =#=#=#= * Passed: crm_resource - Create a resource meta attribute in the primitive =#=#=#= Begin test: Update resource meta attribute with duplicates =#=#=#= Multiple attributes match name=is-managed Value: false (id=test-primitive-meta_attributes-is-managed) Value: false (id=test-clone-meta_attributes-is-managed) A value for 'is-managed' already exists in child 'test-primitive', performing update on that instead of 'test-clone' Set 'test-primitive' option: id=test-primitive-meta_attributes-is-managed name=is-managed value=true =#=#=#= Current cib after: Update resource meta attribute with duplicates =#=#=#= =#=#=#= End test: Update resource meta attribute with duplicates - OK (0) =#=#=#= * Passed: crm_resource - Update resource meta attribute with duplicates =#=#=#= Begin test: Update resource meta attribute with duplicates (force clone) =#=#=#= Set 'test-clone' option: id=test-clone-meta_attributes-is-managed name=is-managed value=true =#=#=#= Current cib after: Update resource meta attribute with duplicates (force clone) =#=#=#= =#=#=#= End test: Update resource meta attribute with duplicates (force clone) - OK (0) =#=#=#= * Passed: crm_resource - Update resource meta attribute with duplicates (force clone) =#=#=#= Begin test: Update child resource meta attribute with duplicates =#=#=#= Multiple attributes match name=is-managed Value: true (id=test-primitive-meta_attributes-is-managed) Value: true (id=test-clone-meta_attributes-is-managed) Set 'test-primitive' option: id=test-primitive-meta_attributes-is-managed name=is-managed value=false =#=#=#= Current cib after: Update child resource meta attribute with duplicates =#=#=#= =#=#=#= End test: Update child resource meta attribute with duplicates - OK (0) =#=#=#= * Passed: crm_resource - Update child resource meta attribute with duplicates =#=#=#= Begin test: Delete resource meta attribute with duplicates =#=#=#= Multiple attributes match name=is-managed Value: false (id=test-primitive-meta_attributes-is-managed) Value: true (id=test-clone-meta_attributes-is-managed) A value for 'is-managed' already exists in child 'test-primitive', performing delete on that instead of 'test-clone' Deleted 'test-primitive' option: id=test-primitive-meta_attributes-is-managed name=is-managed =#=#=#= Current cib after: Delete resource meta attribute with duplicates =#=#=#= =#=#=#= End test: Delete resource meta attribute with duplicates - OK (0) =#=#=#= * Passed: crm_resource - Delete resource meta attribute with duplicates =#=#=#= Begin test: Delete resource meta attribute in parent =#=#=#= Performing delete of 'is-managed' on 'test-clone', the parent of 'test-primitive' Deleted 'test-clone' option: id=test-clone-meta_attributes-is-managed name=is-managed =#=#=#= Current cib after: Delete resource meta attribute in parent =#=#=#= =#=#=#= End test: Delete resource meta attribute in parent - OK (0) =#=#=#= * Passed: crm_resource - Delete resource meta attribute in parent =#=#=#= Begin test: Create a resource meta attribute in the primitive =#=#=#= Set 'test-primitive' option: id=test-primitive-meta_attributes-is-managed set=test-primitive-meta_attributes name=is-managed value=false =#=#=#= Current cib after: Create a resource meta attribute in the primitive =#=#=#= =#=#=#= End test: Create a resource meta attribute in the primitive - OK (0) =#=#=#= * Passed: crm_resource - Create a resource meta attribute in the primitive =#=#=#= Begin test: Update existing resource meta attribute =#=#=#= A value for 'is-managed' already exists in child 'test-primitive', performing update on that instead of 'test-clone' Set 'test-primitive' option: id=test-primitive-meta_attributes-is-managed name=is-managed value=true =#=#=#= Current cib after: Update existing resource meta attribute =#=#=#= =#=#=#= End test: Update existing resource meta attribute - OK (0) =#=#=#= * Passed: crm_resource - Update existing resource meta attribute =#=#=#= Begin test: Create a resource meta attribute in the parent =#=#=#= Set 'test-clone' option: id=test-clone-meta_attributes-is-managed set=test-clone-meta_attributes name=is-managed value=true =#=#=#= Current cib after: Create a resource meta attribute in the parent =#=#=#= =#=#=#= End test: Create a resource meta attribute in the parent - OK (0) =#=#=#= * Passed: crm_resource - Create a resource meta attribute in the parent =#=#=#= Begin test: Copy resources =#=#=#= =#=#=#= End test: Copy resources - OK (0) =#=#=#= * Passed: cibadmin - Copy resources =#=#=#= Begin test: Delete resource parent meta attribute (force) =#=#=#= Deleted 'test-clone' option: id=test-clone-meta_attributes-is-managed name=is-managed =#=#=#= Current cib after: Delete resource parent meta attribute (force) =#=#=#= =#=#=#= End test: Delete resource parent meta attribute (force) - OK (0) =#=#=#= * Passed: crm_resource - Delete resource parent meta attribute (force) =#=#=#= Begin test: Restore duplicates =#=#=#= =#=#=#= Current cib after: Restore duplicates =#=#=#= =#=#=#= End test: Restore duplicates - OK (0) =#=#=#= * Passed: cibadmin - Restore duplicates =#=#=#= Begin test: Delete resource child meta attribute =#=#=#= Multiple attributes match name=is-managed Value: true (id=test-primitive-meta_attributes-is-managed) Value: true (id=test-clone-meta_attributes-is-managed) Deleted 'test-primitive' option: id=test-primitive-meta_attributes-is-managed name=is-managed =#=#=#= Current cib after: Delete resource child meta attribute =#=#=#= =#=#=#= End test: Delete resource child meta attribute - OK (0) =#=#=#= * Passed: crm_resource - Delete resource child meta attribute =#=#=#= Begin test: Create the dummy-group resource group =#=#=#= =#=#=#= Current cib after: Create the dummy-group resource group =#=#=#= =#=#=#= End test: Create the dummy-group resource group - OK (0) =#=#=#= * Passed: cibadmin - Create the dummy-group resource group =#=#=#= Begin test: Create a resource meta attribute in dummy1 =#=#=#= Set 'dummy1' option: id=dummy1-meta_attributes-is-managed set=dummy1-meta_attributes name=is-managed value=true =#=#=#= Current cib after: Create a resource meta attribute in dummy1 =#=#=#= =#=#=#= End test: Create a resource meta attribute in dummy1 - OK (0) =#=#=#= * Passed: crm_resource - Create a resource meta attribute in dummy1 =#=#=#= Begin test: Create a resource meta attribute in dummy-group =#=#=#= Set 'dummy1' option: id=dummy1-meta_attributes-is-managed name=is-managed value=false Set 'dummy-group' option: id=dummy-group-meta_attributes-is-managed set=dummy-group-meta_attributes name=is-managed value=false =#=#=#= Current cib after: Create a resource meta attribute in dummy-group =#=#=#= =#=#=#= End test: Create a resource meta attribute in dummy-group - OK (0) =#=#=#= * Passed: crm_resource - Create a resource meta attribute in dummy-group =#=#=#= Begin test: Delete the dummy-group resource group =#=#=#= =#=#=#= Current cib after: Delete the dummy-group resource group =#=#=#= =#=#=#= End test: Delete the dummy-group resource group - OK (0) =#=#=#= * Passed: cibadmin - Delete the dummy-group resource group =#=#=#= Begin test: Specify a lifetime when moving a resource =#=#=#= Migration will take effect until: =#=#=#= Current cib after: Specify a lifetime when moving a resource =#=#=#= =#=#=#= End test: Specify a lifetime when moving a resource - OK (0) =#=#=#= * Passed: crm_resource - Specify a lifetime when moving a resource =#=#=#= Begin test: Try to move a resource previously moved with a lifetime =#=#=#= =#=#=#= Current cib after: Try to move a resource previously moved with a lifetime =#=#=#= =#=#=#= End test: Try to move a resource previously moved with a lifetime - OK (0) =#=#=#= * Passed: crm_resource - Try to move a resource previously moved with a lifetime =#=#=#= Begin test: Ban dummy from node1 for a short time =#=#=#= Migration will take effect until: WARNING: Creating rsc_location constraint 'cli-ban-dummy-on-node1' with a score of -INFINITY for resource dummy on node1. This will prevent dummy from running on node1 until the constraint is removed using the clear option or by editing the CIB with an appropriate tool This will be the case even if node1 is the last node in the cluster =#=#=#= Current cib after: Ban dummy from node1 for a short time =#=#=#= =#=#=#= End test: Ban dummy from node1 for a short time - OK (0) =#=#=#= * Passed: crm_resource - Ban dummy from node1 for a short time =#=#=#= Begin test: Remove expired constraints =#=#=#= Removing constraint: cli-ban-dummy-on-node1 =#=#=#= Current cib after: Remove expired constraints =#=#=#= =#=#=#= End test: Remove expired constraints - OK (0) =#=#=#= * Passed: crm_resource - Remove expired constraints =#=#=#= Begin test: Clear all implicit constraints for dummy =#=#=#= Removing constraint: cli-prefer-dummy =#=#=#= Current cib after: Clear all implicit constraints for dummy =#=#=#= =#=#=#= End test: Clear all implicit constraints for dummy - OK (0) =#=#=#= * Passed: crm_resource - Clear all implicit constraints for dummy =#=#=#= Begin test: Set a node health strategy =#=#=#= =#=#=#= Current cib after: Set a node health strategy =#=#=#= =#=#=#= End test: Set a node health strategy - OK (0) =#=#=#= * Passed: crm_attribute - Set a node health strategy =#=#=#= Begin test: Set a node health attribute =#=#=#= =#=#=#= Current cib after: Set a node health attribute =#=#=#= =#=#=#= End test: Set a node health attribute - OK (0) =#=#=#= * Passed: crm_attribute - Set a node health attribute =#=#=#= Begin test: Show why a resource is not running on an unhealthy node =#=#=#= =#=#=#= End test: Show why a resource is not running on an unhealthy node - OK (0) =#=#=#= * Passed: crm_resource - Show why a resource is not running on an unhealthy node =#=#=#= Begin test: Delete a resource =#=#=#= =#=#=#= Current cib after: Delete a resource =#=#=#= =#=#=#= End test: Delete a resource - OK (0) =#=#=#= * Passed: crm_resource - Delete a resource =#=#=#= Begin test: Create an XML patchset =#=#=#= =#=#=#= End test: Create an XML patchset - Error occurred (1) =#=#=#= * Passed: crm_diff - Create an XML patchset =#=#=#= Begin test: Check locations and constraints for prim1 =#=#=#= =#=#=#= End test: Check locations and constraints for prim1 - OK (0) =#=#=#= * Passed: crm_resource - Check locations and constraints for prim1 =#=#=#= Begin test: Recursively check locations and constraints for prim1 =#=#=#= =#=#=#= End test: Recursively check locations and constraints for prim1 - OK (0) =#=#=#= * Passed: crm_resource - Recursively check locations and constraints for prim1 =#=#=#= Begin test: Check locations and constraints for prim1 in XML =#=#=#= =#=#=#= End test: Check locations and constraints for prim1 in XML - OK (0) =#=#=#= * Passed: crm_resource - Check locations and constraints for prim1 in XML =#=#=#= Begin test: Recursively check locations and constraints for prim1 in XML =#=#=#= =#=#=#= End test: Recursively check locations and constraints for prim1 in XML - OK (0) =#=#=#= * Passed: crm_resource - Recursively check locations and constraints for prim1 in XML =#=#=#= Begin test: Check locations and constraints for prim2 =#=#=#= Locations: * Node cluster01 (score=INFINITY, id=prim2-on-cluster1, rsc=prim2) Resources prim2 is colocated with: * prim3 (score=INFINITY, id=colocation-prim2-prim3-INFINITY) =#=#=#= End test: Check locations and constraints for prim2 - OK (0) =#=#=#= * Passed: crm_resource - Check locations and constraints for prim2 =#=#=#= Begin test: Recursively check locations and constraints for prim2 =#=#=#= Locations: * Node cluster01 (score=INFINITY, id=prim2-on-cluster1, rsc=prim2) Resources prim2 is colocated with: * prim3 (score=INFINITY, id=colocation-prim2-prim3-INFINITY) * Resources prim3 is colocated with: * prim4 (score=INFINITY, id=colocation-prim3-prim4-INFINITY) * Locations: * Node cluster02 (score=INFINITY, id=prim4-on-cluster2, rsc=prim4) * Resources prim4 is colocated with: * prim5 (score=INFINITY, id=colocation-prim4-prim5-INFINITY) =#=#=#= End test: Recursively check locations and constraints for prim2 - OK (0) =#=#=#= * Passed: crm_resource - Recursively check locations and constraints for prim2 =#=#=#= Begin test: Check locations and constraints for prim2 in XML =#=#=#= =#=#=#= End test: Check locations and constraints for prim2 in XML - OK (0) =#=#=#= * Passed: crm_resource - Check locations and constraints for prim2 in XML =#=#=#= Begin test: Recursively check locations and constraints for prim2 in XML =#=#=#= =#=#=#= End test: Recursively check locations and constraints for prim2 in XML - OK (0) =#=#=#= * Passed: crm_resource - Recursively check locations and constraints for prim2 in XML =#=#=#= Begin test: Check locations and constraints for prim3 =#=#=#= Resources colocated with prim3: * prim2 (score=INFINITY, id=colocation-prim2-prim3-INFINITY) * Locations: * Node cluster01 (score=INFINITY, id=prim2-on-cluster1, rsc=prim2) Resources prim3 is colocated with: * prim4 (score=INFINITY, id=colocation-prim3-prim4-INFINITY) * Locations: * Node cluster02 (score=INFINITY, id=prim4-on-cluster2, rsc=prim4) =#=#=#= End test: Check locations and constraints for prim3 - OK (0) =#=#=#= * Passed: crm_resource - Check locations and constraints for prim3 =#=#=#= Begin test: Recursively check locations and constraints for prim3 =#=#=#= Resources colocated with prim3: * prim2 (score=INFINITY, id=colocation-prim2-prim3-INFINITY) * Locations: * Node cluster01 (score=INFINITY, id=prim2-on-cluster1, rsc=prim2) Resources prim3 is colocated with: * prim4 (score=INFINITY, id=colocation-prim3-prim4-INFINITY) * Locations: * Node cluster02 (score=INFINITY, id=prim4-on-cluster2, rsc=prim4) * Resources prim4 is colocated with: * prim5 (score=INFINITY, id=colocation-prim4-prim5-INFINITY) =#=#=#= End test: Recursively check locations and constraints for prim3 - OK (0) =#=#=#= * Passed: crm_resource - Recursively check locations and constraints for prim3 =#=#=#= Begin test: Check locations and constraints for prim3 in XML =#=#=#= =#=#=#= End test: Check locations and constraints for prim3 in XML - OK (0) =#=#=#= * Passed: crm_resource - Check locations and constraints for prim3 in XML =#=#=#= Begin test: Recursively check locations and constraints for prim3 in XML =#=#=#= =#=#=#= End test: Recursively check locations and constraints for prim3 in XML - OK (0) =#=#=#= * Passed: crm_resource - Recursively check locations and constraints for prim3 in XML =#=#=#= Begin test: Check locations and constraints for prim4 =#=#=#= Locations: * Node cluster02 (score=INFINITY, id=prim4-on-cluster2, rsc=prim4) Resources colocated with prim4: * prim10 (score=INFINITY, id=colocation-prim10-prim4-INFINITY) * prim3 (score=INFINITY, id=colocation-prim3-prim4-INFINITY) Resources prim4 is colocated with: * prim5 (score=INFINITY, id=colocation-prim4-prim5-INFINITY) =#=#=#= End test: Check locations and constraints for prim4 - OK (0) =#=#=#= * Passed: crm_resource - Check locations and constraints for prim4 =#=#=#= Begin test: Recursively check locations and constraints for prim4 =#=#=#= Locations: * Node cluster02 (score=INFINITY, id=prim4-on-cluster2, rsc=prim4) Resources colocated with prim4: * prim10 (score=INFINITY, id=colocation-prim10-prim4-INFINITY) * prim3 (score=INFINITY, id=colocation-prim3-prim4-INFINITY) * Resources colocated with prim3: * prim2 (score=INFINITY, id=colocation-prim2-prim3-INFINITY) * Locations: * Node cluster01 (score=INFINITY, id=prim2-on-cluster1, rsc=prim2) Resources prim4 is colocated with: * prim5 (score=INFINITY, id=colocation-prim4-prim5-INFINITY) =#=#=#= End test: Recursively check locations and constraints for prim4 - OK (0) =#=#=#= * Passed: crm_resource - Recursively check locations and constraints for prim4 =#=#=#= Begin test: Check locations and constraints for prim4 in XML =#=#=#= =#=#=#= End test: Check locations and constraints for prim4 in XML - OK (0) =#=#=#= * Passed: crm_resource - Check locations and constraints for prim4 in XML =#=#=#= Begin test: Recursively check locations and constraints for prim4 in XML =#=#=#= =#=#=#= End test: Recursively check locations and constraints for prim4 in XML - OK (0) =#=#=#= * Passed: crm_resource - Recursively check locations and constraints for prim4 in XML =#=#=#= Begin test: Check locations and constraints for prim5 =#=#=#= Resources colocated with prim5: * prim4 (score=INFINITY, id=colocation-prim4-prim5-INFINITY) * Locations: * Node cluster02 (score=INFINITY, id=prim4-on-cluster2, rsc=prim4) =#=#=#= End test: Check locations and constraints for prim5 - OK (0) =#=#=#= * Passed: crm_resource - Check locations and constraints for prim5 =#=#=#= Begin test: Recursively check locations and constraints for prim5 =#=#=#= Resources colocated with prim5: * prim4 (score=INFINITY, id=colocation-prim4-prim5-INFINITY) * Locations: * Node cluster02 (score=INFINITY, id=prim4-on-cluster2, rsc=prim4) * Resources colocated with prim4: * prim10 (score=INFINITY, id=colocation-prim10-prim4-INFINITY) * prim3 (score=INFINITY, id=colocation-prim3-prim4-INFINITY) * Resources colocated with prim3: * prim2 (score=INFINITY, id=colocation-prim2-prim3-INFINITY) * Locations: * Node cluster01 (score=INFINITY, id=prim2-on-cluster1, rsc=prim2) =#=#=#= End test: Recursively check locations and constraints for prim5 - OK (0) =#=#=#= * Passed: crm_resource - Recursively check locations and constraints for prim5 =#=#=#= Begin test: Check locations and constraints for prim5 in XML =#=#=#= =#=#=#= End test: Check locations and constraints for prim5 in XML - OK (0) =#=#=#= * Passed: crm_resource - Check locations and constraints for prim5 in XML =#=#=#= Begin test: Recursively check locations and constraints for prim5 in XML =#=#=#= =#=#=#= End test: Recursively check locations and constraints for prim5 in XML - OK (0) =#=#=#= * Passed: crm_resource - Recursively check locations and constraints for prim5 in XML =#=#=#= Begin test: Check locations and constraints for prim6 =#=#=#= Locations: * Node cluster02 (score=-INFINITY, id=prim6-not-on-cluster2, rsc=prim6) =#=#=#= End test: Check locations and constraints for prim6 - OK (0) =#=#=#= * Passed: crm_resource - Check locations and constraints for prim6 =#=#=#= Begin test: Recursively check locations and constraints for prim6 =#=#=#= Locations: * Node cluster02 (score=-INFINITY, id=prim6-not-on-cluster2, rsc=prim6) =#=#=#= End test: Recursively check locations and constraints for prim6 - OK (0) =#=#=#= * Passed: crm_resource - Recursively check locations and constraints for prim6 =#=#=#= Begin test: Check locations and constraints for prim6 in XML =#=#=#= =#=#=#= End test: Check locations and constraints for prim6 in XML - OK (0) =#=#=#= * Passed: crm_resource - Check locations and constraints for prim6 in XML =#=#=#= Begin test: Recursively check locations and constraints for prim6 in XML =#=#=#= =#=#=#= End test: Recursively check locations and constraints for prim6 in XML - OK (0) =#=#=#= * Passed: crm_resource - Recursively check locations and constraints for prim6 in XML =#=#=#= Begin test: Check locations and constraints for prim7 =#=#=#= Resources prim7 is colocated with: * group (score=INFINITY, id=colocation-prim7-group-INFINITY) =#=#=#= End test: Check locations and constraints for prim7 - OK (0) =#=#=#= * Passed: crm_resource - Check locations and constraints for prim7 =#=#=#= Begin test: Recursively check locations and constraints for prim7 =#=#=#= Resources prim7 is colocated with: * group (score=INFINITY, id=colocation-prim7-group-INFINITY) =#=#=#= End test: Recursively check locations and constraints for prim7 - OK (0) =#=#=#= * Passed: crm_resource - Recursively check locations and constraints for prim7 =#=#=#= Begin test: Check locations and constraints for prim7 in XML =#=#=#= =#=#=#= End test: Check locations and constraints for prim7 in XML - OK (0) =#=#=#= * Passed: crm_resource - Check locations and constraints for prim7 in XML =#=#=#= Begin test: Recursively check locations and constraints for prim7 in XML =#=#=#= =#=#=#= End test: Recursively check locations and constraints for prim7 in XML - OK (0) =#=#=#= * Passed: crm_resource - Recursively check locations and constraints for prim7 in XML =#=#=#= Begin test: Check locations and constraints for prim8 =#=#=#= Resources prim8 is colocated with: * gr2 (score=INFINITY, id=colocation-prim8-gr2-INFINITY) =#=#=#= End test: Check locations and constraints for prim8 - OK (0) =#=#=#= * Passed: crm_resource - Check locations and constraints for prim8 =#=#=#= Begin test: Recursively check locations and constraints for prim8 =#=#=#= Resources prim8 is colocated with: * gr2 (score=INFINITY, id=colocation-prim8-gr2-INFINITY) =#=#=#= End test: Recursively check locations and constraints for prim8 - OK (0) =#=#=#= * Passed: crm_resource - Recursively check locations and constraints for prim8 =#=#=#= Begin test: Check locations and constraints for prim8 in XML =#=#=#= =#=#=#= End test: Check locations and constraints for prim8 in XML - OK (0) =#=#=#= * Passed: crm_resource - Check locations and constraints for prim8 in XML =#=#=#= Begin test: Recursively check locations and constraints for prim8 in XML =#=#=#= =#=#=#= End test: Recursively check locations and constraints for prim8 in XML - OK (0) =#=#=#= * Passed: crm_resource - Recursively check locations and constraints for prim8 in XML =#=#=#= Begin test: Check locations and constraints for prim9 =#=#=#= Resources prim9 is colocated with: * clone (score=INFINITY, id=colocation-prim9-clone-INFINITY) =#=#=#= End test: Check locations and constraints for prim9 - OK (0) =#=#=#= * Passed: crm_resource - Check locations and constraints for prim9 =#=#=#= Begin test: Recursively check locations and constraints for prim9 =#=#=#= Resources prim9 is colocated with: * clone (score=INFINITY, id=colocation-prim9-clone-INFINITY) =#=#=#= End test: Recursively check locations and constraints for prim9 - OK (0) =#=#=#= * Passed: crm_resource - Recursively check locations and constraints for prim9 =#=#=#= Begin test: Check locations and constraints for prim9 in XML =#=#=#= =#=#=#= End test: Check locations and constraints for prim9 in XML - OK (0) =#=#=#= * Passed: crm_resource - Check locations and constraints for prim9 in XML =#=#=#= Begin test: Recursively check locations and constraints for prim9 in XML =#=#=#= =#=#=#= End test: Recursively check locations and constraints for prim9 in XML - OK (0) =#=#=#= * Passed: crm_resource - Recursively check locations and constraints for prim9 in XML =#=#=#= Begin test: Check locations and constraints for prim10 =#=#=#= Resources prim10 is colocated with: * prim4 (score=INFINITY, id=colocation-prim10-prim4-INFINITY) * Locations: * Node cluster02 (score=INFINITY, id=prim4-on-cluster2, rsc=prim4) =#=#=#= End test: Check locations and constraints for prim10 - OK (0) =#=#=#= * Passed: crm_resource - Check locations and constraints for prim10 =#=#=#= Begin test: Recursively check locations and constraints for prim10 =#=#=#= Resources prim10 is colocated with: * prim4 (score=INFINITY, id=colocation-prim10-prim4-INFINITY) * Locations: * Node cluster02 (score=INFINITY, id=prim4-on-cluster2, rsc=prim4) * Resources prim4 is colocated with: * prim5 (score=INFINITY, id=colocation-prim4-prim5-INFINITY) =#=#=#= End test: Recursively check locations and constraints for prim10 - OK (0) =#=#=#= * Passed: crm_resource - Recursively check locations and constraints for prim10 =#=#=#= Begin test: Check locations and constraints for prim10 in XML =#=#=#= =#=#=#= End test: Check locations and constraints for prim10 in XML - OK (0) =#=#=#= * Passed: crm_resource - Check locations and constraints for prim10 in XML =#=#=#= Begin test: Recursively check locations and constraints for prim10 in XML =#=#=#= =#=#=#= End test: Recursively check locations and constraints for prim10 in XML - OK (0) =#=#=#= * Passed: crm_resource - Recursively check locations and constraints for prim10 in XML =#=#=#= Begin test: Check locations and constraints for prim11 =#=#=#= Resources colocated with prim11: * prim13 (score=INFINITY, id=colocation-prim13-prim11-INFINITY) Resources prim11 is colocated with: * prim12 (score=INFINITY, id=colocation-prim11-prim12-INFINITY) =#=#=#= End test: Check locations and constraints for prim11 - OK (0) =#=#=#= * Passed: crm_resource - Check locations and constraints for prim11 =#=#=#= Begin test: Recursively check locations and constraints for prim11 =#=#=#= Resources colocated with prim11: * prim13 (score=INFINITY, id=colocation-prim13-prim11-INFINITY) * Resources colocated with prim13: * prim12 (score=INFINITY, id=colocation-prim12-prim13-INFINITY) * Resources colocated with prim12: * prim11 (id=colocation-prim11-prim12-INFINITY - loop) Resources prim11 is colocated with: * prim12 (score=INFINITY, id=colocation-prim11-prim12-INFINITY) * Resources prim12 is colocated with: * prim13 (score=INFINITY, id=colocation-prim12-prim13-INFINITY) * Resources prim13 is colocated with: * prim11 (id=colocation-prim13-prim11-INFINITY - loop) =#=#=#= End test: Recursively check locations and constraints for prim11 - OK (0) =#=#=#= * Passed: crm_resource - Recursively check locations and constraints for prim11 =#=#=#= Begin test: Check locations and constraints for prim11 in XML =#=#=#= =#=#=#= End test: Check locations and constraints for prim11 in XML - OK (0) =#=#=#= * Passed: crm_resource - Check locations and constraints for prim11 in XML =#=#=#= Begin test: Recursively check locations and constraints for prim11 in XML =#=#=#= =#=#=#= End test: Recursively check locations and constraints for prim11 in XML - OK (0) =#=#=#= * Passed: crm_resource - Recursively check locations and constraints for prim11 in XML =#=#=#= Begin test: Check locations and constraints for prim12 =#=#=#= Resources colocated with prim12: * prim11 (score=INFINITY, id=colocation-prim11-prim12-INFINITY) Resources prim12 is colocated with: * prim13 (score=INFINITY, id=colocation-prim12-prim13-INFINITY) =#=#=#= End test: Check locations and constraints for prim12 - OK (0) =#=#=#= * Passed: crm_resource - Check locations and constraints for prim12 =#=#=#= Begin test: Recursively check locations and constraints for prim12 =#=#=#= Resources colocated with prim12: * prim11 (score=INFINITY, id=colocation-prim11-prim12-INFINITY) * Resources colocated with prim11: * prim13 (score=INFINITY, id=colocation-prim13-prim11-INFINITY) * Resources colocated with prim13: * prim12 (id=colocation-prim12-prim13-INFINITY - loop) Resources prim12 is colocated with: * prim13 (score=INFINITY, id=colocation-prim12-prim13-INFINITY) * Resources prim13 is colocated with: * prim11 (score=INFINITY, id=colocation-prim13-prim11-INFINITY) * Resources prim11 is colocated with: * prim12 (id=colocation-prim11-prim12-INFINITY - loop) =#=#=#= End test: Recursively check locations and constraints for prim12 - OK (0) =#=#=#= * Passed: crm_resource - Recursively check locations and constraints for prim12 =#=#=#= Begin test: Check locations and constraints for prim12 in XML =#=#=#= =#=#=#= End test: Check locations and constraints for prim12 in XML - OK (0) =#=#=#= * Passed: crm_resource - Check locations and constraints for prim12 in XML =#=#=#= Begin test: Recursively check locations and constraints for prim12 in XML =#=#=#= =#=#=#= End test: Recursively check locations and constraints for prim12 in XML - OK (0) =#=#=#= * Passed: crm_resource - Recursively check locations and constraints for prim12 in XML =#=#=#= Begin test: Check locations and constraints for prim13 =#=#=#= Resources colocated with prim13: * prim12 (score=INFINITY, id=colocation-prim12-prim13-INFINITY) Resources prim13 is colocated with: * prim11 (score=INFINITY, id=colocation-prim13-prim11-INFINITY) =#=#=#= End test: Check locations and constraints for prim13 - OK (0) =#=#=#= * Passed: crm_resource - Check locations and constraints for prim13 =#=#=#= Begin test: Recursively check locations and constraints for prim13 =#=#=#= Resources colocated with prim13: * prim12 (score=INFINITY, id=colocation-prim12-prim13-INFINITY) * Resources colocated with prim12: * prim11 (score=INFINITY, id=colocation-prim11-prim12-INFINITY) * Resources colocated with prim11: * prim13 (id=colocation-prim13-prim11-INFINITY - loop) Resources prim13 is colocated with: * prim11 (score=INFINITY, id=colocation-prim13-prim11-INFINITY) * Resources prim11 is colocated with: * prim12 (score=INFINITY, id=colocation-prim11-prim12-INFINITY) * Resources prim12 is colocated with: * prim13 (id=colocation-prim12-prim13-INFINITY - loop) =#=#=#= End test: Recursively check locations and constraints for prim13 - OK (0) =#=#=#= * Passed: crm_resource - Recursively check locations and constraints for prim13 =#=#=#= Begin test: Check locations and constraints for prim13 in XML =#=#=#= =#=#=#= End test: Check locations and constraints for prim13 in XML - OK (0) =#=#=#= * Passed: crm_resource - Check locations and constraints for prim13 in XML =#=#=#= Begin test: Recursively check locations and constraints for prim13 in XML =#=#=#= =#=#=#= End test: Recursively check locations and constraints for prim13 in XML - OK (0) =#=#=#= * Passed: crm_resource - Recursively check locations and constraints for prim13 in XML =#=#=#= Begin test: Check locations and constraints for group =#=#=#= Resources colocated with group: * prim7 (score=INFINITY, id=colocation-prim7-group-INFINITY) =#=#=#= End test: Check locations and constraints for group - OK (0) =#=#=#= * Passed: crm_resource - Check locations and constraints for group =#=#=#= Begin test: Recursively check locations and constraints for group =#=#=#= Resources colocated with group: * prim7 (score=INFINITY, id=colocation-prim7-group-INFINITY) =#=#=#= End test: Recursively check locations and constraints for group - OK (0) =#=#=#= * Passed: crm_resource - Recursively check locations and constraints for group =#=#=#= Begin test: Check locations and constraints for group in XML =#=#=#= =#=#=#= End test: Check locations and constraints for group in XML - OK (0) =#=#=#= * Passed: crm_resource - Check locations and constraints for group in XML =#=#=#= Begin test: Recursively check locations and constraints for group in XML =#=#=#= =#=#=#= End test: Recursively check locations and constraints for group in XML - OK (0) =#=#=#= * Passed: crm_resource - Recursively check locations and constraints for group in XML =#=#=#= Begin test: Check locations and constraints for clone =#=#=#= Resources colocated with clone: * prim9 (score=INFINITY, id=colocation-prim9-clone-INFINITY) =#=#=#= End test: Check locations and constraints for clone - OK (0) =#=#=#= * Passed: crm_resource - Check locations and constraints for clone =#=#=#= Begin test: Recursively check locations and constraints for clone =#=#=#= Resources colocated with clone: * prim9 (score=INFINITY, id=colocation-prim9-clone-INFINITY) =#=#=#= End test: Recursively check locations and constraints for clone - OK (0) =#=#=#= * Passed: crm_resource - Recursively check locations and constraints for clone =#=#=#= Begin test: Check locations and constraints for clone in XML =#=#=#= =#=#=#= End test: Check locations and constraints for clone in XML - OK (0) =#=#=#= * Passed: crm_resource - Check locations and constraints for clone in XML =#=#=#= Begin test: Recursively check locations and constraints for clone in XML =#=#=#= =#=#=#= End test: Recursively check locations and constraints for clone in XML - OK (0) =#=#=#= * Passed: crm_resource - Recursively check locations and constraints for clone in XML =#=#=#= Begin test: Check locations and constraints for group member (referring to group) =#=#=#= Resources colocated with group: * prim7 (score=INFINITY, id=colocation-prim7-group-INFINITY) =#=#=#= End test: Check locations and constraints for group member (referring to group) - OK (0) =#=#=#= * Passed: crm_resource - Check locations and constraints for group member (referring to group) =#=#=#= Begin test: Check locations and constraints for group member (without referring to group) =#=#=#= Resources colocated with gr2: * prim8 (score=INFINITY, id=colocation-prim8-gr2-INFINITY) =#=#=#= End test: Check locations and constraints for group member (without referring to group) - OK (0) =#=#=#= * Passed: crm_resource - Check locations and constraints for group member (without referring to group) A new shadow instance was created. To begin using it, enter the following into your shell: export CIB_shadow=cts-cli =#=#=#= Begin test: Set a meta-attribute for primitive and resources colocated with it =#=#=#= Set 'prim5' option: id=prim5-meta_attributes-target-role set=prim5-meta_attributes name=target-role value=Stopped Set 'prim4' option: id=prim4-meta_attributes-target-role set=prim4-meta_attributes name=target-role value=Stopped Set 'prim10' option: id=prim10-meta_attributes-target-role set=prim10-meta_attributes name=target-role value=Stopped Set 'prim3' option: id=prim3-meta_attributes-target-role set=prim3-meta_attributes name=target-role value=Stopped Set 'prim2' option: id=prim2-meta_attributes-target-role set=prim2-meta_attributes name=target-role value=Stopped =#=#=#= End test: Set a meta-attribute for primitive and resources colocated with it - OK (0) =#=#=#= * Passed: crm_resource - Set a meta-attribute for primitive and resources colocated with it =#=#=#= Begin test: Set a meta-attribute for group and resource colocated with it =#=#=#= Set 'group' option: id=group-meta_attributes-target-role set=group-meta_attributes name=target-role value=Stopped Set 'prim7' option: id=prim7-meta_attributes-target-role set=prim7-meta_attributes name=target-role value=Stopped =#=#=#= End test: Set a meta-attribute for group and resource colocated with it - OK (0) =#=#=#= * Passed: crm_resource - Set a meta-attribute for group and resource colocated with it =#=#=#= Begin test: Set a meta-attribute for clone and resource colocated with it =#=#=#= Set 'clone' option: id=clone-meta_attributes-target-role set=clone-meta_attributes name=target-role value=Stopped Set 'prim9' option: id=prim9-meta_attributes-target-role set=prim9-meta_attributes name=target-role value=Stopped =#=#=#= End test: Set a meta-attribute for clone and resource colocated with it - OK (0) =#=#=#= * Passed: crm_resource - Set a meta-attribute for clone and resource colocated with it =#=#=#= Begin test: Show resource digests =#=#=#= =#=#=#= End test: Show resource digests - OK (0) =#=#=#= * Passed: crm_resource - Show resource digests =#=#=#= Begin test: Show resource digests with overrides =#=#=#= =#=#=#= End test: Show resource digests with overrides - OK (0) =#=#=#= * Passed: crm_resource - Show resource digests with overrides =#=#=#= Begin test: Show resource operations =#=#=#= rsc1 (ocf:pacemaker:Dummy): Started: rsc1_monitor_0 (node=node4, call=136, rc=7, exec=28ms): complete Fencing (stonith:fence_xvm): Started: Fencing_monitor_0 (node=node4, call=5, rc=7, exec=2ms): complete rsc1 (ocf:pacemaker:Dummy): Started: rsc1_monitor_0 (node=node2, call=101, rc=7, exec=45ms): complete Fencing (stonith:fence_xvm): Started: Fencing_monitor_0 (node=node2, call=5, rc=7, exec=4ms): complete Fencing (stonith:fence_xvm): Started: Fencing_monitor_0 (node=node3, call=5, rc=7, exec=24ms): complete rsc1 (ocf:pacemaker:Dummy): Started: rsc1_monitor_0 (node=node5, call=99, rc=193, exec=27ms): pending Fencing (stonith:fence_xvm): Started: Fencing_monitor_0 (node=node5, call=5, rc=7, exec=14ms): complete rsc1 (ocf:pacemaker:Dummy): Started: rsc1_start_0 (node=node1, call=104, rc=0, exec=22ms): complete rsc1 (ocf:pacemaker:Dummy): Started: rsc1_monitor_10000 (node=node1, call=106, rc=0, exec=20ms): complete Fencing (stonith:fence_xvm): Started: Fencing_start_0 (node=node1, call=10, rc=0, exec=59ms): complete Fencing (stonith:fence_xvm): Started: Fencing_monitor_120000 (node=node1, call=12, rc=0, exec=70ms): complete =#=#=#= End test: Show resource operations - OK (0) =#=#=#= * Passed: crm_resource - Show resource operations =#=#=#= Begin test: Show resource operations (XML) =#=#=#= =#=#=#= End test: Show resource operations (XML) - OK (0) =#=#=#= * Passed: crm_resource - Show resource operations (XML) =#=#=#= Begin test: List all nodes =#=#=#= cluster node: overcloud-controller-0 (1) cluster node: overcloud-controller-1 (2) cluster node: overcloud-controller-2 (3) cluster node: overcloud-galera-0 (4) cluster node: overcloud-galera-1 (5) cluster node: overcloud-galera-2 (6) guest node: lxc1 (lxc1) guest node: lxc2 (lxc2) remote node: overcloud-rabbit-0 (overcloud-rabbit-0) remote node: overcloud-rabbit-1 (overcloud-rabbit-1) remote node: overcloud-rabbit-2 (overcloud-rabbit-2) =#=#=#= End test: List all nodes - OK (0) =#=#=#= * Passed: crmadmin - List all nodes =#=#=#= Begin test: Minimally list all nodes =#=#=#= overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 overcloud-galera-0 overcloud-galera-1 overcloud-galera-2 lxc1 lxc2 overcloud-rabbit-0 overcloud-rabbit-1 overcloud-rabbit-2 =#=#=#= End test: Minimally list all nodes - OK (0) =#=#=#= * Passed: crmadmin - Minimally list all nodes =#=#=#= Begin test: List all nodes as bash exports =#=#=#= export overcloud-controller-0=1 export overcloud-controller-1=2 export overcloud-controller-2=3 export overcloud-galera-0=4 export overcloud-galera-1=5 export overcloud-galera-2=6 export lxc1=lxc1 export lxc2=lxc2 export overcloud-rabbit-0=overcloud-rabbit-0 export overcloud-rabbit-1=overcloud-rabbit-1 export overcloud-rabbit-2=overcloud-rabbit-2 =#=#=#= End test: List all nodes as bash exports - OK (0) =#=#=#= * Passed: crmadmin - List all nodes as bash exports =#=#=#= Begin test: List cluster nodes =#=#=#= 6 =#=#=#= End test: List cluster nodes - OK (0) =#=#=#= * Passed: crmadmin - List cluster nodes =#=#=#= Begin test: List guest nodes =#=#=#= 2 =#=#=#= End test: List guest nodes - OK (0) =#=#=#= * Passed: crmadmin - List guest nodes =#=#=#= Begin test: List remote nodes =#=#=#= 3 =#=#=#= End test: List remote nodes - OK (0) =#=#=#= * Passed: crmadmin - List remote nodes =#=#=#= Begin test: List cluster,remote nodes =#=#=#= 9 =#=#=#= End test: List cluster,remote nodes - OK (0) =#=#=#= * Passed: crmadmin - List cluster,remote nodes =#=#=#= Begin test: List guest,remote nodes =#=#=#= 5 =#=#=#= End test: List guest,remote nodes - OK (0) =#=#=#= * Passed: crmadmin - List guest,remote nodes =#=#=#= Begin test: Show allocation scores with crm_simulate =#=#=#= =#=#=#= End test: Show allocation scores with crm_simulate - OK (0) =#=#=#= * Passed: crm_simulate - Show allocation scores with crm_simulate =#=#=#= Begin test: Show utilization with crm_simulate =#=#=#= 4 of 32 resource instances DISABLED and 0 BLOCKED from further action due to failure [ cluster01 cluster02 ] [ httpd-bundle-0 httpd-bundle-1 ] Started: [ cluster01 cluster02 ] Fencing (stonith:fence_xvm): Started cluster01 dummy (ocf:pacemaker:Dummy): Started cluster02 Stopped (disabled): [ cluster01 cluster02 ] inactive-dummy-1 (ocf:pacemaker:Dummy): Stopped (disabled) inactive-dummy-2 (ocf:pacemaker:Dummy): Stopped (disabled) httpd-bundle-0 (192.168.122.131) (ocf:heartbeat:apache): Started cluster01 httpd-bundle-1 (192.168.122.132) (ocf:heartbeat:apache): Started cluster02 httpd-bundle-2 (192.168.122.133) (ocf:heartbeat:apache): Stopped Public-IP (ocf:heartbeat:IPaddr): Started cluster02 Email (lsb:exim): Started cluster02 Started: [ cluster01 cluster02 ] Promoted: [ cluster02 ] Unpromoted: [ cluster01 ] Only 'private' parameters to 60s-interval monitor for dummy on cluster02 changed: 0:0;16:2:0:4a9e64d6-e1dd-4395-917c-1596312eafe4 Original: cluster01 capacity: Original: cluster02 capacity: Original: httpd-bundle-0 capacity: Original: httpd-bundle-1 capacity: Original: httpd-bundle-2 capacity: pcmk__finalize_assignment: ping:0 utilization on cluster02: pcmk__finalize_assignment: ping:1 utilization on cluster01: pcmk__finalize_assignment: Fencing utilization on cluster01: pcmk__finalize_assignment: dummy utilization on cluster02: pcmk__finalize_assignment: httpd-bundle-docker-0 utilization on cluster01: pcmk__finalize_assignment: httpd-bundle-docker-1 utilization on cluster02: pcmk__finalize_assignment: httpd-bundle-ip-192.168.122.131 utilization on cluster01: pcmk__finalize_assignment: httpd-bundle-0 utilization on cluster01: pcmk__finalize_assignment: httpd:0 utilization on httpd-bundle-0: pcmk__finalize_assignment: httpd-bundle-ip-192.168.122.132 utilization on cluster02: pcmk__finalize_assignment: httpd-bundle-1 utilization on cluster02: pcmk__finalize_assignment: httpd:1 utilization on httpd-bundle-1: pcmk__finalize_assignment: httpd-bundle-2 utilization on cluster01: pcmk__finalize_assignment: httpd:2 utilization on httpd-bundle-2: pcmk__finalize_assignment: Public-IP utilization on cluster02: pcmk__finalize_assignment: Email utilization on cluster02: pcmk__finalize_assignment: mysql-proxy:0 utilization on cluster02: pcmk__finalize_assignment: mysql-proxy:1 utilization on cluster01: pcmk__finalize_assignment: promotable-rsc:0 utilization on cluster02: pcmk__finalize_assignment: promotable-rsc:1 utilization on cluster01: Remaining: cluster01 capacity: Remaining: cluster02 capacity: Remaining: httpd-bundle-0 capacity: Remaining: httpd-bundle-1 capacity: Remaining: httpd-bundle-2 capacity: Start httpd-bundle-2 ( cluster01 ) due to unrunnable httpd-bundle-docker-2 start (blocked) Start httpd:2 ( httpd-bundle-2 ) due to unrunnable httpd-bundle-docker-2 start (blocked) =#=#=#= End test: Show utilization with crm_simulate - OK (0) =#=#=#= * Passed: crm_simulate - Show utilization with crm_simulate =#=#=#= Begin test: Simulate injecting a failure =#=#=#= 4 of 32 resource instances DISABLED and 0 BLOCKED from further action due to failure Current cluster status: * Node List: * Online: [ cluster01 cluster02 ] * GuestOnline: [ httpd-bundle-0 httpd-bundle-1 ] * Full List of Resources: * Clone Set: ping-clone [ping]: * Started: [ cluster01 cluster02 ] * Fencing (stonith:fence_xvm): Started cluster01 * dummy (ocf:pacemaker:Dummy): Started cluster02 * Clone Set: inactive-clone [inactive-dhcpd] (disabled): * Stopped (disabled): [ cluster01 cluster02 ] * Resource Group: inactive-group (disabled): * inactive-dummy-1 (ocf:pacemaker:Dummy): Stopped (disabled) * inactive-dummy-2 (ocf:pacemaker:Dummy): Stopped (disabled) * Container bundle set: httpd-bundle [pcmk:http]: * httpd-bundle-0 (192.168.122.131) (ocf:heartbeat:apache): Started cluster01 * httpd-bundle-1 (192.168.122.132) (ocf:heartbeat:apache): Started cluster02 * httpd-bundle-2 (192.168.122.133) (ocf:heartbeat:apache): Stopped * Resource Group: exim-group: * Public-IP (ocf:heartbeat:IPaddr): Started cluster02 * Email (lsb:exim): Started cluster02 * Clone Set: mysql-clone-group [mysql-group]: * Started: [ cluster01 cluster02 ] * Clone Set: promotable-clone [promotable-rsc] (promotable): * Promoted: [ cluster02 ] * Unpromoted: [ cluster01 ] Performing Requested Modifications: * Injecting ping_monitor_10000@cluster02=1 into the configuration * Injecting attribute fail-count-ping#monitor_10000=value++ into /node_state '2' * Injecting attribute last-failure-ping#monitor_10000= into /node_state '2' Transition Summary: * Recover ping:0 ( cluster02 ) * Start httpd-bundle-2 ( cluster01 ) due to unrunnable httpd-bundle-docker-2 start (blocked) * Start httpd:2 ( httpd-bundle-2 ) due to unrunnable httpd-bundle-docker-2 start (blocked) Executing Cluster Transition: * Cluster action: clear_failcount for ping on cluster02 * Pseudo action: ping-clone_stop_0 * Pseudo action: httpd-bundle_start_0 * Resource action: ping stop on cluster02 * Pseudo action: ping-clone_stopped_0 * Pseudo action: ping-clone_start_0 * Pseudo action: httpd-bundle-clone_start_0 * Resource action: ping start on cluster02 * Resource action: ping monitor=10000 on cluster02 * Pseudo action: ping-clone_running_0 * Pseudo action: httpd-bundle-clone_running_0 * Pseudo action: httpd-bundle_running_0 Revised Cluster Status: * Node List: * Online: [ cluster01 cluster02 ] * GuestOnline: [ httpd-bundle-0 httpd-bundle-1 ] * Full List of Resources: * Clone Set: ping-clone [ping]: * Started: [ cluster01 cluster02 ] * Fencing (stonith:fence_xvm): Started cluster01 * dummy (ocf:pacemaker:Dummy): Started cluster02 * Clone Set: inactive-clone [inactive-dhcpd] (disabled): * Stopped (disabled): [ cluster01 cluster02 ] * Resource Group: inactive-group (disabled): * inactive-dummy-1 (ocf:pacemaker:Dummy): Stopped (disabled) * inactive-dummy-2 (ocf:pacemaker:Dummy): Stopped (disabled) * Container bundle set: httpd-bundle [pcmk:http]: * httpd-bundle-0 (192.168.122.131) (ocf:heartbeat:apache): Started cluster01 * httpd-bundle-1 (192.168.122.132) (ocf:heartbeat:apache): Started cluster02 * httpd-bundle-2 (192.168.122.133) (ocf:heartbeat:apache): Stopped * Resource Group: exim-group: * Public-IP (ocf:heartbeat:IPaddr): Started cluster02 * Email (lsb:exim): Started cluster02 * Clone Set: mysql-clone-group [mysql-group]: * Started: [ cluster01 cluster02 ] * Clone Set: promotable-clone [promotable-rsc] (promotable): * Promoted: [ cluster02 ] * Unpromoted: [ cluster01 ] =#=#=#= End test: Simulate injecting a failure - OK (0) =#=#=#= * Passed: crm_simulate - Simulate injecting a failure =#=#=#= Begin test: Simulate bringing a node down =#=#=#= 4 of 32 resource instances DISABLED and 0 BLOCKED from further action due to failure Current cluster status: * Node List: * Online: [ cluster01 cluster02 ] * GuestOnline: [ httpd-bundle-0 httpd-bundle-1 ] * Full List of Resources: * Clone Set: ping-clone [ping]: * Started: [ cluster01 cluster02 ] * Fencing (stonith:fence_xvm): Started cluster01 * dummy (ocf:pacemaker:Dummy): Started cluster02 * Clone Set: inactive-clone [inactive-dhcpd] (disabled): * Stopped (disabled): [ cluster01 cluster02 ] * Resource Group: inactive-group (disabled): * inactive-dummy-1 (ocf:pacemaker:Dummy): Stopped (disabled) * inactive-dummy-2 (ocf:pacemaker:Dummy): Stopped (disabled) * Container bundle set: httpd-bundle [pcmk:http]: * httpd-bundle-0 (192.168.122.131) (ocf:heartbeat:apache): Started cluster01 * httpd-bundle-1 (192.168.122.132) (ocf:heartbeat:apache): Started cluster02 * httpd-bundle-2 (192.168.122.133) (ocf:heartbeat:apache): Stopped * Resource Group: exim-group: * Public-IP (ocf:heartbeat:IPaddr): Started cluster02 * Email (lsb:exim): Started cluster02 * Clone Set: mysql-clone-group [mysql-group]: * Started: [ cluster01 cluster02 ] * Clone Set: promotable-clone [promotable-rsc] (promotable): * Promoted: [ cluster02 ] * Unpromoted: [ cluster01 ] Performing Requested Modifications: * Taking node cluster01 offline Transition Summary: * Fence (off) httpd-bundle-0 (resource: httpd-bundle-docker-0) 'guest is unclean' * Start Fencing ( cluster02 ) * Start httpd-bundle-0 ( cluster02 ) due to unrunnable httpd-bundle-docker-0 start (blocked) * Stop httpd:0 ( httpd-bundle-0 ) due to unrunnable httpd-bundle-docker-0 start * Start httpd-bundle-2 ( cluster02 ) due to unrunnable httpd-bundle-docker-2 start (blocked) * Start httpd:2 ( httpd-bundle-2 ) due to unrunnable httpd-bundle-docker-2 start (blocked) Executing Cluster Transition: * Resource action: Fencing start on cluster02 * Pseudo action: stonith-httpd-bundle-0-off on httpd-bundle-0 * Pseudo action: httpd-bundle_stop_0 * Pseudo action: httpd-bundle_start_0 * Resource action: Fencing monitor=60000 on cluster02 * Pseudo action: httpd-bundle-clone_stop_0 * Pseudo action: httpd_stop_0 * Pseudo action: httpd-bundle-clone_stopped_0 * Pseudo action: httpd-bundle-clone_start_0 * Pseudo action: httpd-bundle_stopped_0 * Pseudo action: httpd-bundle-clone_running_0 * Pseudo action: httpd-bundle_running_0 Revised Cluster Status: * Node List: * Online: [ cluster02 ] * OFFLINE: [ cluster01 ] * GuestOnline: [ httpd-bundle-1 ] * Full List of Resources: * Clone Set: ping-clone [ping]: * Started: [ cluster02 ] * Stopped: [ cluster01 ] * Fencing (stonith:fence_xvm): Started cluster02 * dummy (ocf:pacemaker:Dummy): Started cluster02 * Clone Set: inactive-clone [inactive-dhcpd] (disabled): * Stopped (disabled): [ cluster01 cluster02 ] * Resource Group: inactive-group (disabled): * inactive-dummy-1 (ocf:pacemaker:Dummy): Stopped (disabled) * inactive-dummy-2 (ocf:pacemaker:Dummy): Stopped (disabled) * Container bundle set: httpd-bundle [pcmk:http]: * httpd-bundle-0 (192.168.122.131) (ocf:heartbeat:apache): FAILED * httpd-bundle-1 (192.168.122.132) (ocf:heartbeat:apache): Started cluster02 * httpd-bundle-2 (192.168.122.133) (ocf:heartbeat:apache): Stopped * Resource Group: exim-group: * Public-IP (ocf:heartbeat:IPaddr): Started cluster02 * Email (lsb:exim): Started cluster02 * Clone Set: mysql-clone-group [mysql-group]: * Started: [ cluster02 ] * Stopped: [ cluster01 ] * Clone Set: promotable-clone [promotable-rsc] (promotable): * Promoted: [ cluster02 ] * Stopped: [ cluster01 ] =#=#=#= End test: Simulate bringing a node down - OK (0) =#=#=#= * Passed: crm_simulate - Simulate bringing a node down =#=#=#= Begin test: Simulate a node failing =#=#=#= 4 of 32 resource instances DISABLED and 0 BLOCKED from further action due to failure Current cluster status: * Node List: * Online: [ cluster01 cluster02 ] * GuestOnline: [ httpd-bundle-0 httpd-bundle-1 ] * Full List of Resources: * Clone Set: ping-clone [ping]: * Started: [ cluster01 cluster02 ] * Fencing (stonith:fence_xvm): Started cluster01 * dummy (ocf:pacemaker:Dummy): Started cluster02 * Clone Set: inactive-clone [inactive-dhcpd] (disabled): * Stopped (disabled): [ cluster01 cluster02 ] * Resource Group: inactive-group (disabled): * inactive-dummy-1 (ocf:pacemaker:Dummy): Stopped (disabled) * inactive-dummy-2 (ocf:pacemaker:Dummy): Stopped (disabled) * Container bundle set: httpd-bundle [pcmk:http]: * httpd-bundle-0 (192.168.122.131) (ocf:heartbeat:apache): Started cluster01 * httpd-bundle-1 (192.168.122.132) (ocf:heartbeat:apache): Started cluster02 * httpd-bundle-2 (192.168.122.133) (ocf:heartbeat:apache): Stopped * Resource Group: exim-group: * Public-IP (ocf:heartbeat:IPaddr): Started cluster02 * Email (lsb:exim): Started cluster02 * Clone Set: mysql-clone-group [mysql-group]: * Started: [ cluster01 cluster02 ] * Clone Set: promotable-clone [promotable-rsc] (promotable): * Promoted: [ cluster02 ] * Unpromoted: [ cluster01 ] Performing Requested Modifications: * Failing node cluster02 Transition Summary: * Fence (off) httpd-bundle-1 (resource: httpd-bundle-docker-1) 'guest is unclean' * Fence (reboot) cluster02 'peer is no longer part of the cluster' * Stop ping:0 ( cluster02 ) due to node availability * Stop dummy ( cluster02 ) due to node availability * Stop httpd-bundle-ip-192.168.122.132 ( cluster02 ) due to node availability * Stop httpd-bundle-docker-1 ( cluster02 ) due to node availability * Stop httpd-bundle-1 ( cluster02 ) due to unrunnable httpd-bundle-docker-1 start * Stop httpd:1 ( httpd-bundle-1 ) due to unrunnable httpd-bundle-docker-1 start * Start httpd-bundle-2 ( cluster01 ) due to unrunnable httpd-bundle-docker-2 start (blocked) * Start httpd:2 ( httpd-bundle-2 ) due to unrunnable httpd-bundle-docker-2 start (blocked) * Move Public-IP ( cluster02 -> cluster01 ) * Move Email ( cluster02 -> cluster01 ) * Stop mysql-proxy:0 ( cluster02 ) due to node availability * Stop promotable-rsc:0 ( Promoted cluster02 ) due to node availability Executing Cluster Transition: * Pseudo action: httpd-bundle-1_stop_0 * Pseudo action: promotable-clone_demote_0 * Pseudo action: httpd-bundle_stop_0 * Pseudo action: httpd-bundle_start_0 * Fencing cluster02 (reboot) * Pseudo action: ping-clone_stop_0 * Pseudo action: dummy_stop_0 * Pseudo action: httpd-bundle-docker-1_stop_0 * Pseudo action: exim-group_stop_0 * Pseudo action: Email_stop_0 * Pseudo action: mysql-clone-group_stop_0 * Pseudo action: promotable-rsc_demote_0 * Pseudo action: promotable-clone_demoted_0 * Pseudo action: promotable-clone_stop_0 * Pseudo action: stonith-httpd-bundle-1-off on httpd-bundle-1 * Pseudo action: ping_stop_0 * Pseudo action: ping-clone_stopped_0 * Pseudo action: httpd-bundle-clone_stop_0 * Pseudo action: httpd-bundle-ip-192.168.122.132_stop_0 * Pseudo action: Public-IP_stop_0 * Pseudo action: mysql-group:0_stop_0 * Pseudo action: mysql-proxy_stop_0 * Pseudo action: promotable-rsc_stop_0 * Pseudo action: promotable-clone_stopped_0 * Pseudo action: httpd_stop_0 * Pseudo action: httpd-bundle-clone_stopped_0 * Pseudo action: httpd-bundle-clone_start_0 * Pseudo action: exim-group_stopped_0 * Pseudo action: exim-group_start_0 * Resource action: Public-IP start on cluster01 * Resource action: Email start on cluster01 * Pseudo action: mysql-group:0_stopped_0 * Pseudo action: mysql-clone-group_stopped_0 * Pseudo action: httpd-bundle_stopped_0 * Pseudo action: httpd-bundle-clone_running_0 * Pseudo action: exim-group_running_0 * Pseudo action: httpd-bundle_running_0 Revised Cluster Status: * Node List: * Online: [ cluster01 ] * OFFLINE: [ cluster02 ] * GuestOnline: [ httpd-bundle-0 ] * Full List of Resources: * Clone Set: ping-clone [ping]: * Started: [ cluster01 ] * Stopped: [ cluster02 ] * Fencing (stonith:fence_xvm): Started cluster01 * dummy (ocf:pacemaker:Dummy): Stopped * Clone Set: inactive-clone [inactive-dhcpd] (disabled): * Stopped (disabled): [ cluster01 cluster02 ] * Resource Group: inactive-group (disabled): * inactive-dummy-1 (ocf:pacemaker:Dummy): Stopped (disabled) * inactive-dummy-2 (ocf:pacemaker:Dummy): Stopped (disabled) * Container bundle set: httpd-bundle [pcmk:http]: * httpd-bundle-0 (192.168.122.131) (ocf:heartbeat:apache): Started cluster01 * httpd-bundle-1 (192.168.122.132) (ocf:heartbeat:apache): FAILED * httpd-bundle-2 (192.168.122.133) (ocf:heartbeat:apache): Stopped * Resource Group: exim-group: * Public-IP (ocf:heartbeat:IPaddr): Started cluster01 * Email (lsb:exim): Started cluster01 * Clone Set: mysql-clone-group [mysql-group]: * Started: [ cluster01 ] * Stopped: [ cluster02 ] * Clone Set: promotable-clone [promotable-rsc] (promotable): * Unpromoted: [ cluster01 ] * Stopped: [ cluster02 ] =#=#=#= End test: Simulate a node failing - OK (0) =#=#=#= * Passed: crm_simulate - Simulate a node failing =#=#=#= Begin test: List a promotable clone resource =#=#=#= resource promotable-clone is running on: cluster01 resource promotable-clone is running on: cluster02 Promoted =#=#=#= End test: List a promotable clone resource - OK (0) =#=#=#= * Passed: crm_resource - List a promotable clone resource =#=#=#= Begin test: List the primitive of a promotable clone resource =#=#=#= resource promotable-rsc is running on: cluster01 resource promotable-rsc is running on: cluster02 Promoted =#=#=#= End test: List the primitive of a promotable clone resource - OK (0) =#=#=#= * Passed: crm_resource - List the primitive of a promotable clone resource =#=#=#= Begin test: List a single instance of a promotable clone resource =#=#=#= resource promotable-rsc:0 is running on: cluster02 Promoted =#=#=#= End test: List a single instance of a promotable clone resource - OK (0) =#=#=#= * Passed: crm_resource - List a single instance of a promotable clone resource =#=#=#= Begin test: List another instance of a promotable clone resource =#=#=#= resource promotable-rsc:1 is running on: cluster01 =#=#=#= End test: List another instance of a promotable clone resource - OK (0) =#=#=#= * Passed: crm_resource - List another instance of a promotable clone resource =#=#=#= Begin test: List a promotable clone resource in XML =#=#=#= cluster01 cluster02 =#=#=#= End test: List a promotable clone resource in XML - OK (0) =#=#=#= * Passed: crm_resource - List a promotable clone resource in XML =#=#=#= Begin test: List the primitive of a promotable clone resource in XML =#=#=#= cluster01 cluster02 =#=#=#= End test: List the primitive of a promotable clone resource in XML - OK (0) =#=#=#= * Passed: crm_resource - List the primitive of a promotable clone resource in XML =#=#=#= Begin test: List a single instance of a promotable clone resource in XML =#=#=#= cluster02 =#=#=#= End test: List a single instance of a promotable clone resource in XML - OK (0) =#=#=#= * Passed: crm_resource - List a single instance of a promotable clone resource in XML =#=#=#= Begin test: List another instance of a promotable clone resource in XML =#=#=#= cluster01 =#=#=#= End test: List another instance of a promotable clone resource in XML - OK (0) =#=#=#= * Passed: crm_resource - List another instance of a promotable clone resource in XML =#=#=#= Begin test: Try to move an instance of a cloned resource =#=#=#= crm_resource: Cannot operate on clone resource instance 'promotable-rsc:0' Error performing operation: Invalid parameter =#=#=#= End test: Try to move an instance of a cloned resource - Invalid parameter (2) =#=#=#= * Passed: crm_resource - Try to move an instance of a cloned resource =#=#=#= Begin test: Query a nonexistent promotable score attribute =#=#=#= crm_attribute: Error performing operation: No such device or address =#=#=#= End test: Query a nonexistent promotable score attribute - No such object (105) =#=#=#= * Passed: crm_attribute - Query a nonexistent promotable score attribute =#=#=#= Begin test: Query a nonexistent promotable score attribute (XML) =#=#=#= crm_attribute: Error performing operation: No such device or address =#=#=#= End test: Query a nonexistent promotable score attribute (XML) - No such object (105) =#=#=#= * Passed: crm_attribute - Query a nonexistent promotable score attribute (XML) =#=#=#= Begin test: Delete a nonexistent promotable score attribute =#=#=#= =#=#=#= End test: Delete a nonexistent promotable score attribute - OK (0) =#=#=#= * Passed: crm_attribute - Delete a nonexistent promotable score attribute =#=#=#= Begin test: Delete a nonexistent promotable score attribute (XML) =#=#=#= =#=#=#= End test: Delete a nonexistent promotable score attribute (XML) - OK (0) =#=#=#= * Passed: crm_attribute - Delete a nonexistent promotable score attribute (XML) =#=#=#= Begin test: Query after deleting a nonexistent promotable score attribute =#=#=#= crm_attribute: Error performing operation: No such device or address =#=#=#= End test: Query after deleting a nonexistent promotable score attribute - No such object (105) =#=#=#= * Passed: crm_attribute - Query after deleting a nonexistent promotable score attribute =#=#=#= Begin test: Query after deleting a nonexistent promotable score attribute (XML) =#=#=#= crm_attribute: Error performing operation: No such device or address =#=#=#= End test: Query after deleting a nonexistent promotable score attribute (XML) - No such object (105) =#=#=#= * Passed: crm_attribute - Query after deleting a nonexistent promotable score attribute (XML) =#=#=#= Begin test: Update a nonexistent promotable score attribute =#=#=#= =#=#=#= End test: Update a nonexistent promotable score attribute - OK (0) =#=#=#= * Passed: crm_attribute - Update a nonexistent promotable score attribute =#=#=#= Begin test: Update a nonexistent promotable score attribute (XML) =#=#=#= =#=#=#= End test: Update a nonexistent promotable score attribute (XML) - OK (0) =#=#=#= * Passed: crm_attribute - Update a nonexistent promotable score attribute (XML) =#=#=#= Begin test: Query after updating a nonexistent promotable score attribute =#=#=#= scope=status name=master-promotable-rsc value=1 =#=#=#= End test: Query after updating a nonexistent promotable score attribute - OK (0) =#=#=#= * Passed: crm_attribute - Query after updating a nonexistent promotable score attribute =#=#=#= Begin test: Query after updating a nonexistent promotable score attribute (XML) =#=#=#= =#=#=#= End test: Query after updating a nonexistent promotable score attribute (XML) - OK (0) =#=#=#= * Passed: crm_attribute - Query after updating a nonexistent promotable score attribute (XML) =#=#=#= Begin test: Update an existing promotable score attribute =#=#=#= =#=#=#= End test: Update an existing promotable score attribute - OK (0) =#=#=#= * Passed: crm_attribute - Update an existing promotable score attribute =#=#=#= Begin test: Update an existing promotable score attribute (XML) =#=#=#= =#=#=#= End test: Update an existing promotable score attribute (XML) - OK (0) =#=#=#= * Passed: crm_attribute - Update an existing promotable score attribute (XML) =#=#=#= Begin test: Query after updating an existing promotable score attribute =#=#=#= scope=status name=master-promotable-rsc value=5 =#=#=#= End test: Query after updating an existing promotable score attribute - OK (0) =#=#=#= * Passed: crm_attribute - Query after updating an existing promotable score attribute =#=#=#= Begin test: Query after updating an existing promotable score attribute (XML) =#=#=#= =#=#=#= End test: Query after updating an existing promotable score attribute (XML) - OK (0) =#=#=#= * Passed: crm_attribute - Query after updating an existing promotable score attribute (XML) =#=#=#= Begin test: Delete an existing promotable score attribute =#=#=#= Deleted status attribute: id=status-1-master-promotable-rsc name=master-promotable-rsc =#=#=#= End test: Delete an existing promotable score attribute - OK (0) =#=#=#= * Passed: crm_attribute - Delete an existing promotable score attribute =#=#=#= Begin test: Delete an existing promotable score attribute (XML) =#=#=#= =#=#=#= End test: Delete an existing promotable score attribute (XML) - OK (0) =#=#=#= * Passed: crm_attribute - Delete an existing promotable score attribute (XML) =#=#=#= Begin test: Query after deleting an existing promotable score attribute =#=#=#= crm_attribute: Error performing operation: No such device or address =#=#=#= End test: Query after deleting an existing promotable score attribute - No such object (105) =#=#=#= * Passed: crm_attribute - Query after deleting an existing promotable score attribute =#=#=#= Begin test: Query after deleting an existing promotable score attribute (XML) =#=#=#= crm_attribute: Error performing operation: No such device or address =#=#=#= End test: Query after deleting an existing promotable score attribute (XML) - No such object (105) =#=#=#= * Passed: crm_attribute - Query after deleting an existing promotable score attribute (XML) =#=#=#= Begin test: Update a promotable score attribute to -INFINITY =#=#=#= =#=#=#= End test: Update a promotable score attribute to -INFINITY - OK (0) =#=#=#= * Passed: crm_attribute - Update a promotable score attribute to -INFINITY =#=#=#= Begin test: Update a promotable score attribute to -INFINITY (XML) =#=#=#= =#=#=#= End test: Update a promotable score attribute to -INFINITY (XML) - OK (0) =#=#=#= * Passed: crm_attribute - Update a promotable score attribute to -INFINITY (XML) =#=#=#= Begin test: Query after updating a promotable score attribute to -INFINITY =#=#=#= scope=status name=master-promotable-rsc value=-INFINITY =#=#=#= End test: Query after updating a promotable score attribute to -INFINITY - OK (0) =#=#=#= * Passed: crm_attribute - Query after updating a promotable score attribute to -INFINITY =#=#=#= Begin test: Query after updating a promotable score attribute to -INFINITY (XML) =#=#=#= =#=#=#= End test: Query after updating a promotable score attribute to -INFINITY (XML) - OK (0) =#=#=#= * Passed: crm_attribute - Query after updating a promotable score attribute to -INFINITY (XML) =#=#=#= Begin test: Try OCF_RESOURCE_INSTANCE if -p is specified with an empty string =#=#=#= scope=status name=master-promotable-rsc value=-INFINITY =#=#=#= End test: Try OCF_RESOURCE_INSTANCE if -p is specified with an empty string - OK (0) =#=#=#= * Passed: crm_attribute - Try OCF_RESOURCE_INSTANCE if -p is specified with an empty string =#=#=#= Begin test: Return usage error if both -p and OCF_RESOURCE_INSTANCE are empty strings =#=#=#= crm_attribute: -p/--promotion must be called from an OCF resource agent or with a resource ID specified =#=#=#= End test: Return usage error if both -p and OCF_RESOURCE_INSTANCE are empty strings - Incorrect usage (64) =#=#=#= * Passed: crm_attribute - Return usage error if both -p and OCF_RESOURCE_INSTANCE are empty strings =#=#=#= Begin test: Check that CIB_file="-" works - crm_mon =#=#=#= Cluster Summary: * Stack: corosync * Current DC: cluster02 (version) - partition with quorum * Last updated: * Last change: * 5 nodes configured * 32 resource instances configured (4 DISABLED) Node List: * Online: [ cluster01 cluster02 ] * GuestOnline: [ httpd-bundle-0 httpd-bundle-1 ] Active Resources: * Clone Set: ping-clone [ping]: * Started: [ cluster01 cluster02 ] * Fencing (stonith:fence_xvm): Started cluster01 * dummy (ocf:pacemaker:Dummy): Started cluster02 * Container bundle set: httpd-bundle [pcmk:http]: * httpd-bundle-0 (192.168.122.131) (ocf:heartbeat:apache): Started cluster01 * httpd-bundle-1 (192.168.122.132) (ocf:heartbeat:apache): Started cluster02 * httpd-bundle-2 (192.168.122.133) (ocf:heartbeat:apache): Stopped * Resource Group: exim-group: * Public-IP (ocf:heartbeat:IPaddr): Started cluster02 * Email (lsb:exim): Started cluster02 * Clone Set: mysql-clone-group [mysql-group]: * Started: [ cluster01 cluster02 ] * Clone Set: promotable-clone [promotable-rsc] (promotable): * Promoted: [ cluster02 ] * Unpromoted: [ cluster01 ] =#=#=#= End test: Check that CIB_file="-" works - crm_mon - OK (0) =#=#=#= * Passed: cat - Check that CIB_file="-" works - crm_mon =#=#=#= Begin test: Check that CIB_file="-" works - crm_resource =#=#=#= =#=#=#= End test: Check that CIB_file="-" works - crm_resource - OK (0) =#=#=#= * Passed: cat - Check that CIB_file="-" works - crm_resource =#=#=#= Begin test: Check that CIB_file="-" works - crmadmin =#=#=#= 11 =#=#=#= End test: Check that CIB_file="-" works - crmadmin - OK (0) =#=#=#= * Passed: cat - Check that CIB_file="-" works - crmadmin =#=#=#= Begin test: Get active shadow instance (no active instance) =#=#=#= crm_shadow: No active shadow configuration defined =#=#=#= End test: Get active shadow instance (no active instance) - No such object (105) =#=#=#= * Passed: crm_shadow - Get active shadow instance (no active instance) =#=#=#= Begin test: Get active shadow instance (no active instance) (XML) =#=#=#= crm_shadow: No active shadow configuration defined =#=#=#= End test: Get active shadow instance (no active instance) (XML) - No such object (105) =#=#=#= * Passed: crm_shadow - Get active shadow instance (no active instance) (XML) =#=#=#= Begin test: Get active shadow instance's file name (no active instance) =#=#=#= crm_shadow: No active shadow configuration defined =#=#=#= End test: Get active shadow instance's file name (no active instance) - No such object (105) =#=#=#= * Passed: crm_shadow - Get active shadow instance's file name (no active instance) =#=#=#= Begin test: Get active shadow instance's file name (no active instance) (XML) =#=#=#= crm_shadow: No active shadow configuration defined =#=#=#= End test: Get active shadow instance's file name (no active instance) (XML) - No such object (105) =#=#=#= * Passed: crm_shadow - Get active shadow instance's file name (no active instance) (XML) =#=#=#= Begin test: Get active shadow instance's contents (no active instance) =#=#=#= crm_shadow: No active shadow configuration defined =#=#=#= End test: Get active shadow instance's contents (no active instance) - No such object (105) =#=#=#= * Passed: crm_shadow - Get active shadow instance's contents (no active instance) =#=#=#= Begin test: Get active shadow instance's contents (no active instance) (XML) =#=#=#= crm_shadow: No active shadow configuration defined =#=#=#= End test: Get active shadow instance's contents (no active instance) (XML) - No such object (105) =#=#=#= * Passed: crm_shadow - Get active shadow instance's contents (no active instance) (XML) =#=#=#= Begin test: Get active shadow instance's diff (no active instance) =#=#=#= crm_shadow: No active shadow configuration defined =#=#=#= End test: Get active shadow instance's diff (no active instance) - No such object (105) =#=#=#= * Passed: crm_shadow - Get active shadow instance's diff (no active instance) =#=#=#= Begin test: Get active shadow instance's diff (no active instance) (XML) =#=#=#= crm_shadow: No active shadow configuration defined =#=#=#= End test: Get active shadow instance's diff (no active instance) (XML) - No such object (105) =#=#=#= * Passed: crm_shadow - Get active shadow instance's diff (no active instance) (XML) =#=#=#= Begin test: Create copied shadow instance =#=#=#= A new shadow instance was created. To begin using it, enter the following into your shell: export CIB_shadow=cts-cli =#=#=#= End test: Create copied shadow instance - OK (0) =#=#=#= * Passed: crm_shadow - Create copied shadow instance =#=#=#= Begin test: Create copied shadow instance (XML) =#=#=#= A new shadow instance was created. To begin using it, enter the following into your shell: export CIB_shadow=cts-cli =#=#=#= End test: Create copied shadow instance (XML) - OK (0) =#=#=#= * Passed: crm_shadow - Create copied shadow instance (XML) =#=#=#= Begin test: Get active shadow instance (copied) =#=#=#= cts-cli =#=#=#= End test: Get active shadow instance (copied) - OK (0) =#=#=#= * Passed: crm_shadow - Get active shadow instance (copied) =#=#=#= Begin test: Get active shadow instance (copied) (XML) =#=#=#= =#=#=#= End test: Get active shadow instance (copied) (XML) - OK (0) =#=#=#= * Passed: crm_shadow - Get active shadow instance (copied) (XML) =#=#=#= Begin test: Get active shadow instance's file name (copied) =#=#=#= /tmp/cts-cli.shadow/shadow.cts-cli =#=#=#= End test: Get active shadow instance's file name (copied) - OK (0) =#=#=#= * Passed: crm_shadow - Get active shadow instance's file name (copied) =#=#=#= Begin test: Get active shadow instance's file name (copied) (XML) =#=#=#= =#=#=#= End test: Get active shadow instance's file name (copied) (XML) - OK (0) =#=#=#= * Passed: crm_shadow - Get active shadow instance's file name (copied) (XML) =#=#=#= Begin test: Get active shadow instance's contents (copied) =#=#=#= =#=#=#= End test: Get active shadow instance's contents (copied) - OK (0) =#=#=#= * Passed: crm_shadow - Get active shadow instance's contents (copied) =#=#=#= Begin test: Get active shadow instance's contents (copied) (XML) =#=#=#= ]]> =#=#=#= End test: Get active shadow instance's contents (copied) (XML) - OK (0) =#=#=#= * Passed: crm_shadow - Get active shadow instance's contents (copied) (XML) =#=#=#= Begin test: Get active shadow instance's diff (copied) =#=#=#= =#=#=#= End test: Get active shadow instance's diff (copied) - OK (0) =#=#=#= * Passed: crm_shadow - Get active shadow instance's diff (copied) =#=#=#= Begin test: Get active shadow instance's diff (copied) (XML) =#=#=#= =#=#=#= End test: Get active shadow instance's diff (copied) (XML) - OK (0) =#=#=#= * Passed: crm_shadow - Get active shadow instance's diff (copied) (XML) =#=#=#= Begin test: Get active shadow instance's diff (after changes) =#=#=#= Diff: --- 1.1.173 2 Diff: +++ 1.4.1 (null) -- /cib/configuration/op_defaults + /cib: @epoch=4, @num_updates=1 + /cib/configuration/resources/primitive[@id='dummy']: @description=desc ++ /cib/configuration/resources: ++ /cib/status: =#=#=#= End test: Get active shadow instance's diff (after changes) - Error occurred (1) =#=#=#= * Passed: crm_shadow - Get active shadow instance's diff (after changes) =#=#=#= Begin test: Get active shadow instance's diff (after changes) (XML) =#=#=#= ]]> =#=#=#= End test: Get active shadow instance's diff (after changes) (XML) - Error occurred (1) =#=#=#= * Passed: crm_shadow - Get active shadow instance's diff (after changes) (XML) =#=#=#= Begin test: Commit shadow instance =#=#=#= crm_shadow: The commit command overwrites the active cluster configuration. To prevent accidental destruction of the cluster, the --force flag is required in order to proceed. =#=#=#= End test: Commit shadow instance - Incorrect usage (64) =#=#=#= * Passed: crm_shadow - Commit shadow instance =#=#=#= Begin test: Commit shadow instance (force) =#=#=#= =#=#=#= End test: Commit shadow instance (force) - OK (0) =#=#=#= * Passed: crm_shadow - Commit shadow instance (force) =#=#=#= Begin test: Get active shadow instance's diff (after commit) =#=#=#= Diff: --- 1.2.0 2 Diff: +++ 1.4.1 (null) + /cib: @epoch=4, @num_updates=1 ++ /cib/status: =#=#=#= End test: Get active shadow instance's diff (after commit) - Error occurred (1) =#=#=#= * Passed: crm_shadow - Get active shadow instance's diff (after commit) =#=#=#= Begin test: Commit shadow instance (force) (all) =#=#=#= =#=#=#= End test: Commit shadow instance (force) (all) - OK (0) =#=#=#= * Passed: crm_shadow - Commit shadow instance (force) (all) =#=#=#= Begin test: Get active shadow instance's diff (after commit all) =#=#=#= Diff: --- 1.4.2 2 Diff: +++ 1.4.1 (null) + /cib: @num_updates=1 =#=#=#= End test: Get active shadow instance's diff (after commit all) - Error occurred (1) =#=#=#= * Passed: crm_shadow - Get active shadow instance's diff (after commit all) =#=#=#= Begin test: Commit shadow instance (XML) =#=#=#= crm_shadow: The commit command overwrites the active cluster configuration. To prevent accidental destruction of the cluster, the --force flag is required in order to proceed. =#=#=#= End test: Commit shadow instance (XML) - Incorrect usage (64) =#=#=#= * Passed: crm_shadow - Commit shadow instance (XML) =#=#=#= Begin test: Commit shadow instance (force) (XML) =#=#=#= =#=#=#= End test: Commit shadow instance (force) (XML) - OK (0) =#=#=#= * Passed: crm_shadow - Commit shadow instance (force) (XML) =#=#=#= Begin test: Get active shadow instance's diff (after commit) (XML) =#=#=#= ]]> =#=#=#= End test: Get active shadow instance's diff (after commit) (XML) - Error occurred (1) =#=#=#= * Passed: crm_shadow - Get active shadow instance's diff (after commit) (XML) =#=#=#= Begin test: Commit shadow instance (force) (all) (XML) =#=#=#= =#=#=#= End test: Commit shadow instance (force) (all) (XML) - OK (0) =#=#=#= * Passed: crm_shadow - Commit shadow instance (force) (all) (XML) =#=#=#= Begin test: Get active shadow instance's diff (after commit all) (XML) =#=#=#= ]]> =#=#=#= End test: Get active shadow instance's diff (after commit all) (XML) - Error occurred (1) =#=#=#= * Passed: crm_shadow - Get active shadow instance's diff (after commit all) (XML) =#=#=#= Begin test: Commit shadow instance (no active instance) =#=#=#= crm_shadow: The commit command overwrites the active cluster configuration. To prevent accidental destruction of the cluster, the --force flag is required in order to proceed. =#=#=#= End test: Commit shadow instance (no active instance) - Incorrect usage (64) =#=#=#= * Passed: crm_shadow - Commit shadow instance (no active instance) =#=#=#= Begin test: Commit shadow instance (no active instance) (force) =#=#=#= =#=#=#= End test: Commit shadow instance (no active instance) (force) - OK (0) =#=#=#= * Passed: crm_shadow - Commit shadow instance (no active instance) (force) =#=#=#= Begin test: Commit shadow instance (no active instance) (XML) =#=#=#= crm_shadow: The commit command overwrites the active cluster configuration. To prevent accidental destruction of the cluster, the --force flag is required in order to proceed. =#=#=#= End test: Commit shadow instance (no active instance) (XML) - Incorrect usage (64) =#=#=#= * Passed: crm_shadow - Commit shadow instance (no active instance) (XML) =#=#=#= Begin test: Commit shadow instance (no active instance) (force) (XML) =#=#=#= =#=#=#= End test: Commit shadow instance (no active instance) (force) (XML) - OK (0) =#=#=#= * Passed: crm_shadow - Commit shadow instance (no active instance) (force) (XML) =#=#=#= Begin test: Commit shadow instance (mismatch) =#=#=#= crm_shadow: The commit command overwrites the active cluster configuration. Additionally, the supplied shadow instance (cts-cli) is not the same as the active one (nonexistent_shadow). To prevent accidental destruction of the cluster, the --force flag is required in order to proceed. =#=#=#= End test: Commit shadow instance (mismatch) - Incorrect usage (64) =#=#=#= * Passed: crm_shadow - Commit shadow instance (mismatch) =#=#=#= Begin test: Commit shadow instance (mismatch) (force) =#=#=#= =#=#=#= End test: Commit shadow instance (mismatch) (force) - OK (0) =#=#=#= * Passed: crm_shadow - Commit shadow instance (mismatch) (force) =#=#=#= Begin test: Commit shadow instance (mismatch) (XML) =#=#=#= crm_shadow: The commit command overwrites the active cluster configuration. Additionally, the supplied shadow instance (cts-cli) is not the same as the active one (nonexistent_shadow). To prevent accidental destruction of the cluster, the --force flag is required in order to proceed. =#=#=#= End test: Commit shadow instance (mismatch) (XML) - Incorrect usage (64) =#=#=#= * Passed: crm_shadow - Commit shadow instance (mismatch) (XML) =#=#=#= Begin test: Commit shadow instance (mismatch) (force) (XML) =#=#=#= =#=#=#= End test: Commit shadow instance (mismatch) (force) (XML) - OK (0) =#=#=#= * Passed: crm_shadow - Commit shadow instance (mismatch) (force) (XML) =#=#=#= Begin test: Commit shadow instance (nonexistent shadow file) =#=#=#= crm_shadow: The commit command overwrites the active cluster configuration. To prevent accidental destruction of the cluster, the --force flag is required in order to proceed. =#=#=#= End test: Commit shadow instance (nonexistent shadow file) - Incorrect usage (64) =#=#=#= * Passed: crm_shadow - Commit shadow instance (nonexistent shadow file) =#=#=#= Begin test: Commit shadow instance (nonexistent shadow file) (force) =#=#=#= crm_shadow: Could not access shadow instance 'nonexistent_shadow': No such file or directory =#=#=#= End test: Commit shadow instance (nonexistent shadow file) (force) - No such object (105) =#=#=#= * Passed: crm_shadow - Commit shadow instance (nonexistent shadow file) (force) =#=#=#= Begin test: Get active shadow instance's diff (nonexistent shadow file) =#=#=#= crm_shadow: Could not access shadow instance 'nonexistent_shadow': No such file or directory =#=#=#= End test: Get active shadow instance's diff (nonexistent shadow file) - No such object (105) =#=#=#= * Passed: crm_shadow - Get active shadow instance's diff (nonexistent shadow file) =#=#=#= Begin test: Commit shadow instance (nonexistent shadow file) (XML) =#=#=#= crm_shadow: The commit command overwrites the active cluster configuration. To prevent accidental destruction of the cluster, the --force flag is required in order to proceed. =#=#=#= End test: Commit shadow instance (nonexistent shadow file) (XML) - Incorrect usage (64) =#=#=#= * Passed: crm_shadow - Commit shadow instance (nonexistent shadow file) (XML) =#=#=#= Begin test: Commit shadow instance (nonexistent shadow file) (force) (XML) =#=#=#= crm_shadow: Could not access shadow instance 'nonexistent_shadow': No such file or directory =#=#=#= End test: Commit shadow instance (nonexistent shadow file) (force) (XML) - No such object (105) =#=#=#= * Passed: crm_shadow - Commit shadow instance (nonexistent shadow file) (force) (XML) =#=#=#= Begin test: Get active shadow instance's diff (nonexistent shadow file) (XML) =#=#=#= crm_shadow: Could not access shadow instance 'nonexistent_shadow': No such file or directory =#=#=#= End test: Get active shadow instance's diff (nonexistent shadow file) (XML) - No such object (105) =#=#=#= * Passed: crm_shadow - Get active shadow instance's diff (nonexistent shadow file) (XML) =#=#=#= Begin test: Commit shadow instance (nonexistent CIB file) =#=#=#= crm_shadow: The commit command overwrites the active cluster configuration. To prevent accidental destruction of the cluster, the --force flag is required in order to proceed. =#=#=#= End test: Commit shadow instance (nonexistent CIB file) - Incorrect usage (64) =#=#=#= * Passed: crm_shadow - Commit shadow instance (nonexistent CIB file) =#=#=#= Begin test: Commit shadow instance (nonexistent CIB file) (force) =#=#=#= crm_shadow: Could not connect to CIB: No such device or address =#=#=#= End test: Commit shadow instance (nonexistent CIB file) (force) - No such object (105) =#=#=#= * Passed: crm_shadow - Commit shadow instance (nonexistent CIB file) (force) =#=#=#= Begin test: Get active shadow instance's diff (nonexistent CIB file) =#=#=#= crm_shadow: Could not connect to CIB: No such device or address =#=#=#= End test: Get active shadow instance's diff (nonexistent CIB file) - No such object (105) =#=#=#= * Passed: crm_shadow - Get active shadow instance's diff (nonexistent CIB file) =#=#=#= Begin test: Commit shadow instance (nonexistent CIB file) (XML) =#=#=#= crm_shadow: The commit command overwrites the active cluster configuration. To prevent accidental destruction of the cluster, the --force flag is required in order to proceed. =#=#=#= End test: Commit shadow instance (nonexistent CIB file) (XML) - Incorrect usage (64) =#=#=#= * Passed: crm_shadow - Commit shadow instance (nonexistent CIB file) (XML) =#=#=#= Begin test: Commit shadow instance (nonexistent CIB file) (force) (XML) =#=#=#= crm_shadow: Could not connect to CIB: No such device or address =#=#=#= End test: Commit shadow instance (nonexistent CIB file) (force) (XML) - No such object (105) =#=#=#= * Passed: crm_shadow - Commit shadow instance (nonexistent CIB file) (force) (XML) =#=#=#= Begin test: Get active shadow instance's diff (nonexistent CIB file) (XML) =#=#=#= crm_shadow: Could not connect to CIB: No such device or address =#=#=#= End test: Get active shadow instance's diff (nonexistent CIB file) (XML) - No such object (105) =#=#=#= * Passed: crm_shadow - Get active shadow instance's diff (nonexistent CIB file) (XML) =#=#=#= Begin test: Delete shadow instance =#=#=#= crm_shadow: The delete command removes the specified shadow file. To prevent accidental destruction of the shadow file, the --force flag is required in order to proceed. =#=#=#= End test: Delete shadow instance - Incorrect usage (64) =#=#=#= * Passed: crm_shadow - Delete shadow instance =#=#=#= Begin test: Delete shadow instance (force) =#=#=#= Remember to unset the CIB_shadow variable by entering the following into your shell: unset CIB_shadow =#=#=#= End test: Delete shadow instance (force) - OK (0) =#=#=#= * Passed: crm_shadow - Delete shadow instance (force) A new shadow instance was created. To begin using it, enter the following into your shell: export CIB_shadow=cts-cli =#=#=#= Begin test: Delete shadow instance (XML) =#=#=#= crm_shadow: The delete command removes the specified shadow file. To prevent accidental destruction of the shadow file, the --force flag is required in order to proceed. =#=#=#= End test: Delete shadow instance (XML) - Incorrect usage (64) =#=#=#= * Passed: crm_shadow - Delete shadow instance (XML) =#=#=#= Begin test: Delete shadow instance (force) (XML) =#=#=#= Remember to unset the CIB_shadow variable by entering the following into your shell: unset CIB_shadow =#=#=#= End test: Delete shadow instance (force) (XML) - OK (0) =#=#=#= * Passed: crm_shadow - Delete shadow instance (force) (XML) A new shadow instance was created. To begin using it, enter the following into your shell: export CIB_shadow=cts-cli =#=#=#= Begin test: Delete shadow instance (no active instance) =#=#=#= crm_shadow: The delete command removes the specified shadow file. To prevent accidental destruction of the shadow file, the --force flag is required in order to proceed. =#=#=#= End test: Delete shadow instance (no active instance) - Incorrect usage (64) =#=#=#= * Passed: crm_shadow - Delete shadow instance (no active instance) =#=#=#= Begin test: Delete shadow instance (no active instance) (force) =#=#=#= =#=#=#= End test: Delete shadow instance (no active instance) (force) - OK (0) =#=#=#= * Passed: crm_shadow - Delete shadow instance (no active instance) (force) A new shadow instance was created. To begin using it, enter the following into your shell: export CIB_shadow=cts-cli =#=#=#= Begin test: Delete shadow instance (no active instance) (XML) =#=#=#= crm_shadow: The delete command removes the specified shadow file. To prevent accidental destruction of the shadow file, the --force flag is required in order to proceed. =#=#=#= End test: Delete shadow instance (no active instance) (XML) - Incorrect usage (64) =#=#=#= * Passed: crm_shadow - Delete shadow instance (no active instance) (XML) =#=#=#= Begin test: Delete shadow instance (no active instance) (force) (XML) =#=#=#= =#=#=#= End test: Delete shadow instance (no active instance) (force) (XML) - OK (0) =#=#=#= * Passed: crm_shadow - Delete shadow instance (no active instance) (force) (XML) A new shadow instance was created. To begin using it, enter the following into your shell: export CIB_shadow=cts-cli =#=#=#= Begin test: Delete shadow instance (mismatch) =#=#=#= crm_shadow: The delete command removes the specified shadow file. Additionally, the supplied shadow instance (cts-cli) is not the same as the active one (nonexistent_shadow). To prevent accidental destruction of the shadow file, the --force flag is required in order to proceed. =#=#=#= End test: Delete shadow instance (mismatch) - Incorrect usage (64) =#=#=#= * Passed: crm_shadow - Delete shadow instance (mismatch) =#=#=#= Begin test: Delete shadow instance (mismatch) (force) =#=#=#= =#=#=#= End test: Delete shadow instance (mismatch) (force) - OK (0) =#=#=#= * Passed: crm_shadow - Delete shadow instance (mismatch) (force) A new shadow instance was created. To begin using it, enter the following into your shell: export CIB_shadow=cts-cli =#=#=#= Begin test: Delete shadow instance (mismatch) (XML) =#=#=#= crm_shadow: The delete command removes the specified shadow file. Additionally, the supplied shadow instance (cts-cli) is not the same as the active one (nonexistent_shadow). To prevent accidental destruction of the shadow file, the --force flag is required in order to proceed. =#=#=#= End test: Delete shadow instance (mismatch) (XML) - Incorrect usage (64) =#=#=#= * Passed: crm_shadow - Delete shadow instance (mismatch) (XML) =#=#=#= Begin test: Delete shadow instance (mismatch) (force) (XML) =#=#=#= =#=#=#= End test: Delete shadow instance (mismatch) (force) (XML) - OK (0) =#=#=#= * Passed: crm_shadow - Delete shadow instance (mismatch) (force) (XML) =#=#=#= Begin test: Delete shadow instance (nonexistent shadow file) =#=#=#= crm_shadow: The delete command removes the specified shadow file. To prevent accidental destruction of the shadow file, the --force flag is required in order to proceed. =#=#=#= End test: Delete shadow instance (nonexistent shadow file) - Incorrect usage (64) =#=#=#= * Passed: crm_shadow - Delete shadow instance (nonexistent shadow file) =#=#=#= Begin test: Delete shadow instance (nonexistent shadow file) (force) =#=#=#= Remember to unset the CIB_shadow variable by entering the following into your shell: unset CIB_shadow =#=#=#= End test: Delete shadow instance (nonexistent shadow file) (force) - OK (0) =#=#=#= * Passed: crm_shadow - Delete shadow instance (nonexistent shadow file) (force) =#=#=#= Begin test: Delete shadow instance (nonexistent shadow file) (XML) =#=#=#= crm_shadow: The delete command removes the specified shadow file. To prevent accidental destruction of the shadow file, the --force flag is required in order to proceed. =#=#=#= End test: Delete shadow instance (nonexistent shadow file) (XML) - Incorrect usage (64) =#=#=#= * Passed: crm_shadow - Delete shadow instance (nonexistent shadow file) (XML) =#=#=#= Begin test: Delete shadow instance (nonexistent shadow file) (force) (XML) =#=#=#= Remember to unset the CIB_shadow variable by entering the following into your shell: unset CIB_shadow =#=#=#= End test: Delete shadow instance (nonexistent shadow file) (force) (XML) - OK (0) =#=#=#= * Passed: crm_shadow - Delete shadow instance (nonexistent shadow file) (force) (XML) A new shadow instance was created. To begin using it, enter the following into your shell: export CIB_shadow=cts-cli =#=#=#= Begin test: Delete shadow instance (nonexistent CIB file) =#=#=#= crm_shadow: The delete command removes the specified shadow file. To prevent accidental destruction of the shadow file, the --force flag is required in order to proceed. =#=#=#= End test: Delete shadow instance (nonexistent CIB file) - Incorrect usage (64) =#=#=#= * Passed: crm_shadow - Delete shadow instance (nonexistent CIB file) =#=#=#= Begin test: Delete shadow instance (nonexistent CIB file) (force) =#=#=#= Remember to unset the CIB_shadow variable by entering the following into your shell: unset CIB_shadow =#=#=#= End test: Delete shadow instance (nonexistent CIB file) (force) - OK (0) =#=#=#= * Passed: crm_shadow - Delete shadow instance (nonexistent CIB file) (force) A new shadow instance was created. To begin using it, enter the following into your shell: export CIB_shadow=cts-cli =#=#=#= Begin test: Delete shadow instance (nonexistent CIB file) (XML) =#=#=#= crm_shadow: The delete command removes the specified shadow file. To prevent accidental destruction of the shadow file, the --force flag is required in order to proceed. =#=#=#= End test: Delete shadow instance (nonexistent CIB file) (XML) - Incorrect usage (64) =#=#=#= * Passed: crm_shadow - Delete shadow instance (nonexistent CIB file) (XML) =#=#=#= Begin test: Delete shadow instance (nonexistent CIB file) (force) (XML) =#=#=#= Remember to unset the CIB_shadow variable by entering the following into your shell: unset CIB_shadow =#=#=#= End test: Delete shadow instance (nonexistent CIB file) (force) (XML) - OK (0) =#=#=#= * Passed: crm_shadow - Delete shadow instance (nonexistent CIB file) (force) (XML) =#=#=#= Begin test: Create copied shadow instance (no active instance) =#=#=#= A new shadow instance was created. To begin using it, enter the following into your shell: export CIB_shadow=cts-cli =#=#=#= End test: Create copied shadow instance (no active instance) - OK (0) =#=#=#= * Passed: crm_shadow - Create copied shadow instance (no active instance) =#=#=#= Begin test: Create copied shadow instance (no active instance) (XML) =#=#=#= A new shadow instance was created. To begin using it, enter the following into your shell: export CIB_shadow=cts-cli =#=#=#= End test: Create copied shadow instance (no active instance) (XML) - OK (0) =#=#=#= * Passed: crm_shadow - Create copied shadow instance (no active instance) (XML) =#=#=#= Begin test: Create copied shadow instance (mismatch) =#=#=#= A new shadow instance was created. To begin using it, enter the following into your shell: export CIB_shadow=cts-cli =#=#=#= End test: Create copied shadow instance (mismatch) - OK (0) =#=#=#= * Passed: crm_shadow - Create copied shadow instance (mismatch) =#=#=#= Begin test: Create copied shadow instance (mismatch) (XML) =#=#=#= A new shadow instance was created. To begin using it, enter the following into your shell: export CIB_shadow=cts-cli =#=#=#= End test: Create copied shadow instance (mismatch) (XML) - OK (0) =#=#=#= * Passed: crm_shadow - Create copied shadow instance (mismatch) (XML) =#=#=#= Begin test: Create copied shadow instance (file already exists) =#=#=#= crm_shadow: A shadow instance 'cts-cli' already exists. To prevent accidental destruction of the shadow file, the --force flag is required in order to proceed. =#=#=#= End test: Create copied shadow instance (file already exists) - Cannot create output file (73) =#=#=#= * Passed: crm_shadow - Create copied shadow instance (file already exists) =#=#=#= Begin test: Create copied shadow instance (file already exists) (force) =#=#=#= A new shadow instance was created. To begin using it, enter the following into your shell: export CIB_shadow=cts-cli =#=#=#= End test: Create copied shadow instance (file already exists) (force) - OK (0) =#=#=#= * Passed: crm_shadow - Create copied shadow instance (file already exists) (force) =#=#=#= Begin test: Create copied shadow instance (file already exists) (XML) =#=#=#= crm_shadow: A shadow instance 'cts-cli' already exists. To prevent accidental destruction of the shadow file, the --force flag is required in order to proceed. =#=#=#= End test: Create copied shadow instance (file already exists) (XML) - Cannot create output file (73) =#=#=#= * Passed: crm_shadow - Create copied shadow instance (file already exists) (XML) =#=#=#= Begin test: Create copied shadow instance (file already exists) (force) (XML) =#=#=#= A new shadow instance was created. To begin using it, enter the following into your shell: export CIB_shadow=cts-cli =#=#=#= End test: Create copied shadow instance (file already exists) (force) (XML) - OK (0) =#=#=#= * Passed: crm_shadow - Create copied shadow instance (file already exists) (force) (XML) =#=#=#= Begin test: Create copied shadow instance (nonexistent CIB file) (force) =#=#=#= crm_shadow: Could not connect to CIB: No such device or address =#=#=#= End test: Create copied shadow instance (nonexistent CIB file) (force) - No such object (105) =#=#=#= * Passed: crm_shadow - Create copied shadow instance (nonexistent CIB file) (force) =#=#=#= Begin test: Create copied shadow instance (nonexistent CIB file) (force) (XML) =#=#=#= crm_shadow: Could not connect to CIB: No such device or address =#=#=#= End test: Create copied shadow instance (nonexistent CIB file) (force) (XML) - No such object (105) =#=#=#= * Passed: crm_shadow - Create copied shadow instance (nonexistent CIB file) (force) (XML) =#=#=#= Begin test: Create empty shadow instance =#=#=#= Created new pacemaker configuration A new shadow instance was created. To begin using it, enter the following into your shell: export CIB_shadow=cts-cli =#=#=#= End test: Create empty shadow instance - OK (0) =#=#=#= * Passed: crm_shadow - Create empty shadow instance =#=#=#= Begin test: Create empty shadow instance (XML) =#=#=#= A new shadow instance was created. To begin using it, enter the following into your shell: export CIB_shadow=cts-cli =#=#=#= End test: Create empty shadow instance (XML) - OK (0) =#=#=#= * Passed: crm_shadow - Create empty shadow instance (XML) =#=#=#= Begin test: Create empty shadow instance (no active instance) =#=#=#= Created new pacemaker configuration A new shadow instance was created. To begin using it, enter the following into your shell: export CIB_shadow=cts-cli =#=#=#= End test: Create empty shadow instance (no active instance) - OK (0) =#=#=#= * Passed: crm_shadow - Create empty shadow instance (no active instance) =#=#=#= Begin test: Create empty shadow instance (no active instance) (XML) =#=#=#= A new shadow instance was created. To begin using it, enter the following into your shell: export CIB_shadow=cts-cli =#=#=#= End test: Create empty shadow instance (no active instance) (XML) - OK (0) =#=#=#= * Passed: crm_shadow - Create empty shadow instance (no active instance) (XML) =#=#=#= Begin test: Create empty shadow instance (mismatch) =#=#=#= Created new pacemaker configuration A new shadow instance was created. To begin using it, enter the following into your shell: export CIB_shadow=cts-cli =#=#=#= End test: Create empty shadow instance (mismatch) - OK (0) =#=#=#= * Passed: crm_shadow - Create empty shadow instance (mismatch) =#=#=#= Begin test: Create empty shadow instance (mismatch) (XML) =#=#=#= A new shadow instance was created. To begin using it, enter the following into your shell: export CIB_shadow=cts-cli =#=#=#= End test: Create empty shadow instance (mismatch) (XML) - OK (0) =#=#=#= * Passed: crm_shadow - Create empty shadow instance (mismatch) (XML) =#=#=#= Begin test: Create empty shadow instance (nonexistent CIB file) =#=#=#= Created new pacemaker configuration A new shadow instance was created. To begin using it, enter the following into your shell: export CIB_shadow=cts-cli =#=#=#= End test: Create empty shadow instance (nonexistent CIB file) - OK (0) =#=#=#= * Passed: crm_shadow - Create empty shadow instance (nonexistent CIB file) =#=#=#= Begin test: Create empty shadow instance (nonexistent CIB file) (XML) =#=#=#= A new shadow instance was created. To begin using it, enter the following into your shell: export CIB_shadow=cts-cli =#=#=#= End test: Create empty shadow instance (nonexistent CIB file) (XML) - OK (0) =#=#=#= * Passed: crm_shadow - Create empty shadow instance (nonexistent CIB file) (XML) =#=#=#= Begin test: Create empty shadow instance (file already exists) =#=#=#= crm_shadow: A shadow instance 'cts-cli' already exists. To prevent accidental destruction of the shadow file, the --force flag is required in order to proceed. =#=#=#= End test: Create empty shadow instance (file already exists) - Cannot create output file (73) =#=#=#= * Passed: crm_shadow - Create empty shadow instance (file already exists) =#=#=#= Begin test: Create empty shadow instance (file already exists) (force) =#=#=#= Created new pacemaker configuration A new shadow instance was created. To begin using it, enter the following into your shell: export CIB_shadow=cts-cli =#=#=#= End test: Create empty shadow instance (file already exists) (force) - OK (0) =#=#=#= * Passed: crm_shadow - Create empty shadow instance (file already exists) (force) =#=#=#= Begin test: Create empty shadow instance (file already exists) (XML) =#=#=#= crm_shadow: A shadow instance 'cts-cli' already exists. To prevent accidental destruction of the shadow file, the --force flag is required in order to proceed. =#=#=#= End test: Create empty shadow instance (file already exists) (XML) - Cannot create output file (73) =#=#=#= * Passed: crm_shadow - Create empty shadow instance (file already exists) (XML) =#=#=#= Begin test: Create empty shadow instance (file already exists) (force) (XML) =#=#=#= A new shadow instance was created. To begin using it, enter the following into your shell: export CIB_shadow=cts-cli =#=#=#= End test: Create empty shadow instance (file already exists) (force) (XML) - OK (0) =#=#=#= * Passed: crm_shadow - Create empty shadow instance (file already exists) (force) (XML) =#=#=#= Begin test: Get active shadow instance's contents (empty CIB) =#=#=#= =#=#=#= End test: Get active shadow instance's contents (empty CIB) - OK (0) =#=#=#= * Passed: crm_shadow - Get active shadow instance's contents (empty CIB) =#=#=#= Begin test: Get active shadow instance's contents (empty CIB) (XML) =#=#=#= ]]> =#=#=#= End test: Get active shadow instance's contents (empty CIB) (XML) - OK (0) =#=#=#= * Passed: crm_shadow - Get active shadow instance's contents (empty CIB) (XML) =#=#=#= Begin test: Get active shadow instance's diff (empty CIB) =#=#=#= Diff: --- 1.1.173 2 Diff: +++ 0.1.0 (null) -- /cib/configuration/crm_config/cluster_property_set[@id='cib-bootstrap-options'] -- /cib/configuration/nodes/node[@id='1'] -- /cib/configuration/nodes/node[@id='2'] -- /cib/configuration/resources/clone[@id='ping-clone'] -- /cib/configuration/resources/primitive[@id='Fencing'] -- /cib/configuration/resources/primitive[@id='dummy'] -- /cib/configuration/resources/clone[@id='inactive-clone'] -- /cib/configuration/resources/group[@id='inactive-group'] -- /cib/configuration/resources/bundle[@id='httpd-bundle'] -- /cib/configuration/resources/group[@id='exim-group'] -- /cib/configuration/resources/clone[@id='mysql-clone-group'] -- /cib/configuration/resources/clone[@id='promotable-clone'] -- /cib/configuration/constraints/rsc_location[@id='not-on-cluster1'] -- /cib/configuration/constraints/rsc_location[@id='loc-promotable-clone'] -- /cib/configuration/tags -- /cib/configuration/op_defaults -- /cib/status/node_state[@id='2'] -- /cib/status/node_state[@id='1'] -- /cib/status/node_state[@id='httpd-bundle-0'] -- /cib/status/node_state[@id='httpd-bundle-1'] -+ /cib: @crm_feature_set=3.17.4, @num_updates=0, @admin_epoch=0 ++ /cib: @crm_feature_set=3.18.0, @num_updates=0, @admin_epoch=0 -- /cib: @cib-last-written, @update-origin, @update-client, @update-user, @have-quorum, @dc-uuid =#=#=#= End test: Get active shadow instance's diff (empty CIB) - Error occurred (1) =#=#=#= * Passed: crm_shadow - Get active shadow instance's diff (empty CIB) =#=#=#= Begin test: Get active shadow instance's diff (empty CIB) (XML) =#=#=#= - + ]]> =#=#=#= End test: Get active shadow instance's diff (empty CIB) (XML) - Error occurred (1) =#=#=#= * Passed: crm_shadow - Get active shadow instance's diff (empty CIB) (XML) =#=#=#= Begin test: Reset shadow instance =#=#=#= A new shadow instance was created. To begin using it, enter the following into your shell: export CIB_shadow=cts-cli =#=#=#= End test: Reset shadow instance - OK (0) =#=#=#= * Passed: crm_shadow - Reset shadow instance =#=#=#= Begin test: Get active shadow instance's diff (after reset) =#=#=#= =#=#=#= End test: Get active shadow instance's diff (after reset) - OK (0) =#=#=#= * Passed: crm_shadow - Get active shadow instance's diff (after reset) Created new pacemaker configuration A new shadow instance was created. To begin using it, enter the following into your shell: export CIB_shadow=cts-cli =#=#=#= Begin test: Reset shadow instance (XML) =#=#=#= A new shadow instance was created. To begin using it, enter the following into your shell: export CIB_shadow=cts-cli =#=#=#= End test: Reset shadow instance (XML) - OK (0) =#=#=#= * Passed: crm_shadow - Reset shadow instance (XML) =#=#=#= Begin test: Get active shadow instance's diff (after reset) (XML) =#=#=#= =#=#=#= End test: Get active shadow instance's diff (after reset) (XML) - OK (0) =#=#=#= * Passed: crm_shadow - Get active shadow instance's diff (after reset) (XML) =#=#=#= Begin test: Reset shadow instance (no active instance) =#=#=#= A new shadow instance was created. To begin using it, enter the following into your shell: export CIB_shadow=cts-cli =#=#=#= End test: Reset shadow instance (no active instance) - OK (0) =#=#=#= * Passed: crm_shadow - Reset shadow instance (no active instance) Created new pacemaker configuration A new shadow instance was created. To begin using it, enter the following into your shell: export CIB_shadow=cts-cli =#=#=#= Begin test: Reset shadow instance (no active instance) (XML) =#=#=#= A new shadow instance was created. To begin using it, enter the following into your shell: export CIB_shadow=cts-cli =#=#=#= End test: Reset shadow instance (no active instance) (XML) - OK (0) =#=#=#= * Passed: crm_shadow - Reset shadow instance (no active instance) (XML) =#=#=#= Begin test: Reset shadow instance (mismatch) =#=#=#= crm_shadow: The supplied shadow instance (cts-cli) is not the same as the active one (nonexistent_shadow). To prevent accidental destruction of the shadow file, the --force flag is required in order to proceed. =#=#=#= End test: Reset shadow instance (mismatch) - Incorrect usage (64) =#=#=#= * Passed: crm_shadow - Reset shadow instance (mismatch) =#=#=#= Begin test: Reset shadow instance (mismatch) (force) =#=#=#= A new shadow instance was created. To begin using it, enter the following into your shell: export CIB_shadow=cts-cli =#=#=#= End test: Reset shadow instance (mismatch) (force) - OK (0) =#=#=#= * Passed: crm_shadow - Reset shadow instance (mismatch) (force) Created new pacemaker configuration A new shadow instance was created. To begin using it, enter the following into your shell: export CIB_shadow=cts-cli =#=#=#= Begin test: Reset shadow instance (mismatch) (XML) =#=#=#= crm_shadow: The supplied shadow instance (cts-cli) is not the same as the active one (nonexistent_shadow). To prevent accidental destruction of the shadow file, the --force flag is required in order to proceed. =#=#=#= End test: Reset shadow instance (mismatch) (XML) - Incorrect usage (64) =#=#=#= * Passed: crm_shadow - Reset shadow instance (mismatch) (XML) =#=#=#= Begin test: Reset shadow instance (mismatch) (force) (XML) =#=#=#= A new shadow instance was created. To begin using it, enter the following into your shell: export CIB_shadow=cts-cli =#=#=#= End test: Reset shadow instance (mismatch) (force) (XML) - OK (0) =#=#=#= * Passed: crm_shadow - Reset shadow instance (mismatch) (force) (XML) Created new pacemaker configuration A new shadow instance was created. To begin using it, enter the following into your shell: export CIB_shadow=cts-cli =#=#=#= Begin test: Reset shadow instance (nonexistent CIB file) =#=#=#= crm_shadow: Could not connect to CIB: No such device or address =#=#=#= End test: Reset shadow instance (nonexistent CIB file) - No such object (105) =#=#=#= * Passed: crm_shadow - Reset shadow instance (nonexistent CIB file) =#=#=#= Begin test: Reset shadow instance (nonexistent CIB file) (XML) =#=#=#= crm_shadow: Could not connect to CIB: No such device or address =#=#=#= End test: Reset shadow instance (nonexistent CIB file) (XML) - No such object (105) =#=#=#= * Passed: crm_shadow - Reset shadow instance (nonexistent CIB file) (XML) =#=#=#= Begin test: Reset shadow instance (nonexistent CIB file) (force) =#=#=#= crm_shadow: Could not connect to CIB: No such device or address =#=#=#= End test: Reset shadow instance (nonexistent CIB file) (force) - No such object (105) =#=#=#= * Passed: crm_shadow - Reset shadow instance (nonexistent CIB file) (force) =#=#=#= Begin test: Reset shadow instance (nonexistent CIB file) (force) (XML) =#=#=#= crm_shadow: Could not connect to CIB: No such device or address =#=#=#= End test: Reset shadow instance (nonexistent CIB file) (force) (XML) - No such object (105) =#=#=#= * Passed: crm_shadow - Reset shadow instance (nonexistent CIB file) (force) (XML) =#=#=#= Begin test: Reset shadow instance (nonexistent shadow file) =#=#=#= crm_shadow: Could not access shadow instance 'cts-cli': No such file or directory =#=#=#= End test: Reset shadow instance (nonexistent shadow file) - No such object (105) =#=#=#= * Passed: crm_shadow - Reset shadow instance (nonexistent shadow file) =#=#=#= Begin test: Reset shadow instance (nonexistent shadow file) (force) =#=#=#= A new shadow instance was created. To begin using it, enter the following into your shell: export CIB_shadow=cts-cli =#=#=#= End test: Reset shadow instance (nonexistent shadow file) (force) - OK (0) =#=#=#= * Passed: crm_shadow - Reset shadow instance (nonexistent shadow file) (force) =#=#=#= Begin test: Reset shadow instance (nonexistent shadow file) (XML) =#=#=#= crm_shadow: Could not access shadow instance 'cts-cli': No such file or directory =#=#=#= End test: Reset shadow instance (nonexistent shadow file) (XML) - No such object (105) =#=#=#= * Passed: crm_shadow - Reset shadow instance (nonexistent shadow file) (XML) =#=#=#= Begin test: Reset shadow instance (nonexistent shadow file) (force) (XML) =#=#=#= A new shadow instance was created. To begin using it, enter the following into your shell: export CIB_shadow=cts-cli =#=#=#= End test: Reset shadow instance (nonexistent shadow file) (force) (XML) - OK (0) =#=#=#= * Passed: crm_shadow - Reset shadow instance (nonexistent shadow file) (force) (XML) Created new pacemaker configuration A new shadow instance was created. To begin using it, enter the following into your shell: export CIB_shadow=cts-cli =#=#=#= Begin test: Switch to new shadow instance =#=#=#= To switch to the named shadow instance, enter the following into your shell: export CIB_shadow=cts-cli =#=#=#= End test: Switch to new shadow instance - OK (0) =#=#=#= * Passed: crm_shadow - Switch to new shadow instance =#=#=#= Begin test: Switch to new shadow instance (XML) =#=#=#= To switch to the named shadow instance, enter the following into your shell: export CIB_shadow=cts-cli =#=#=#= End test: Switch to new shadow instance (XML) - OK (0) =#=#=#= * Passed: crm_shadow - Switch to new shadow instance (XML) =#=#=#= Begin test: Switch to nonexistent shadow instance =#=#=#= crm_shadow: Could not access shadow instance 'cts-cli': No such file or directory =#=#=#= End test: Switch to nonexistent shadow instance - No such object (105) =#=#=#= * Passed: crm_shadow - Switch to nonexistent shadow instance =#=#=#= Begin test: Switch to nonexistent shadow instance (force) =#=#=#= crm_shadow: Could not access shadow instance 'cts-cli': No such file or directory =#=#=#= End test: Switch to nonexistent shadow instance (force) - No such object (105) =#=#=#= * Passed: crm_shadow - Switch to nonexistent shadow instance (force) =#=#=#= Begin test: Switch to nonexistent shadow instance (XML) =#=#=#= crm_shadow: Could not access shadow instance 'cts-cli': No such file or directory =#=#=#= End test: Switch to nonexistent shadow instance (XML) - No such object (105) =#=#=#= * Passed: crm_shadow - Switch to nonexistent shadow instance (XML) =#=#=#= Begin test: Switch to nonexistent shadow instance (force) (XML) =#=#=#= crm_shadow: Could not access shadow instance 'cts-cli': No such file or directory =#=#=#= End test: Switch to nonexistent shadow instance (force) (XML) - No such object (105) =#=#=#= * Passed: crm_shadow - Switch to nonexistent shadow instance (force) (XML) diff --git a/cts/cts-scheduler.in b/cts/cts-scheduler.in index ee0cb7b472..e1f24d4c96 100644 --- a/cts/cts-scheduler.in +++ b/cts/cts-scheduler.in @@ -1,1626 +1,1628 @@ #!@PYTHON@ """ Regression tests for Pacemaker's scheduler """ __copyright__ = "Copyright 2004-2023 the Pacemaker project contributors" __license__ = "GNU General Public License version 2 or later (GPLv2+) WITHOUT ANY WARRANTY" import io import os import re import sys import stat import shlex import shutil import argparse import subprocess import platform import tempfile # These imports allow running from a source checkout after running `make`. # Note that while this doesn't necessarily mean it will successfully run tests, # but being able to see --help output can be useful. if os.path.exists("@abs_top_srcdir@/python"): sys.path.insert(0, "@abs_top_srcdir@/python") if os.path.exists("@abs_top_builddir@/python") and "@abs_top_builddir@" != "@abs_top_srcdir@": sys.path.insert(0, "@abs_top_builddir@/python") from pacemaker.buildoptions import BuildOptions from pacemaker.exitstatus import ExitStatus DESC = """Regression tests for Pacemaker's scheduler""" # Each entry in TESTS is a group of tests, where each test consists of a # test base name, test description, and additional test arguments. # Test groups will be separated by newlines in output. TESTS = [ [ [ "simple1", "Offline" ], [ "simple2", "Start" ], [ "simple3", "Start 2" ], [ "simple4", "Start Failed" ], [ "simple6", "Stop Start" ], [ "simple7", "Shutdown" ], #[ "simple8", "Stonith" ], #[ "simple9", "Lower version" ], #[ "simple10", "Higher version" ], [ "simple11", "Priority (ne)" ], [ "simple12", "Priority (eq)" ], [ "simple8", "Stickiness" ], ], [ [ "group1", "Group" ], [ "group2", "Group + Native" ], [ "group3", "Group + Group" ], [ "group4", "Group + Native (nothing)" ], [ "group5", "Group + Native (move)" ], [ "group6", "Group + Group (move)" ], [ "group7", "Group colocation" ], [ "group13", "Group colocation (cant run)" ], [ "group8", "Group anti-colocation" ], [ "group9", "Group recovery" ], [ "group10", "Group partial recovery" ], [ "group11", "Group target_role" ], [ "group14", "Group stop (graph terminated)" ], [ "group15", "Negative group colocation" ], [ "bug-1573", "Partial stop of a group with two children" ], [ "bug-1718", "Mandatory group ordering - Stop group_FUN" ], [ "failed-sticky-group", "Move group on last member failure despite infinite stickiness" ], [ "failed-sticky-anticolocated-group", "Move group on last member failure despite infinite stickiness and optional anti-colocation" ], [ "bug-lf-2619", "Move group on clone failure" ], [ "group-fail", "Ensure stop order is preserved for partially active groups" ], [ "group-unmanaged", "No need to restart r115 because r114 is unmanaged" ], [ "group-unmanaged-stopped", "Make sure r115 is stopped when r114 fails" ], [ "partial-unmanaged-group", "New member in partially unmanaged group" ], [ "group-dependents", "Account for the location preferences of things colocated with a group" ], [ "group-stop-ordering", "Ensure blocked group member stop does not force other member stops" ], [ "colocate-unmanaged-group", "Respect mandatory colocations even if earlier group member is unmanaged" ], ], [ [ "rsc_dep1", "Must not" ], [ "rsc_dep3", "Must" ], [ "rsc_dep5", "Must not 3" ], [ "rsc_dep7", "Must 3" ], [ "rsc_dep10", "Must (but cant)" ], [ "rsc_dep2", "Must (running)" ], [ "rsc_dep8", "Must (running : alt)" ], [ "rsc_dep4", "Must (running + move)" ], [ "asymmetric", "Asymmetric - require explicit location constraints" ], ], [ [ "orphan-0", "Orphan ignore" ], [ "orphan-1", "Orphan stop" ], [ "orphan-2", "Orphan stop, remove failcount" ], ], [ [ "params-0", "Params: No change" ], [ "params-1", "Params: Changed" ], [ "params-2", "Params: Resource definition" ], [ "params-3", "Params: Restart instead of reload if start pending" ], [ "params-4", "Params: Reload" ], [ "params-5", "Params: Restart based on probe digest" ], [ "novell-251689", "Resource definition change + target_role=stopped" ], [ "bug-lf-2106", "Restart all anonymous clone instances after config change" ], [ "params-6", "Params: Detect reload in previously migrated resource" ], [ "nvpair-id-ref", "Support id-ref in nvpair with optional name" ], [ "not-reschedule-unneeded-monitor", "Do not reschedule unneeded monitors while resource definitions have changed" ], [ "reload-becomes-restart", "Cancel reload if restart becomes required" ], [ "restart-with-extra-op-params", "Restart if with extra operation parameters upon changes of any" ], ], [ [ "target-0", "Target Role : baseline" ], [ "target-1", "Target Role : promoted" ], [ "target-2", "Target Role : invalid" ], ], [ [ "base-score", "Set a node's default score for all nodes" ], ], [ [ "date-1", "Dates", [ "-t", "2005-020" ] ], [ "date-2", "Date Spec - Pass", [ "-t", "2005-020T12:30" ] ], [ "date-3", "Date Spec - Fail", [ "-t", "2005-020T11:30" ] ], [ "origin", "Timing of recurring operations", [ "-t", "2014-05-07 00:28:00" ] ], [ "probe-0", "Probe (anon clone)" ], [ "probe-1", "Pending Probe" ], [ "probe-2", "Correctly re-probe cloned groups" ], [ "probe-3", "Probe (pending node)" ], [ "probe-4", "Probe (pending node + stopped resource)" ], [ "probe-pending-node", "Probe (pending node + unmanaged resource)" ], [ "failed-probe-primitive", "Maskable vs. unmaskable probe failures on primitive resources" ], [ "failed-probe-clone", "Maskable vs. unmaskable probe failures on cloned resources" ], [ "expired-failed-probe-primitive", "Maskable, expired probe failure on primitive resources" ], [ "standby", "Standby" ], [ "comments", "Comments" ], ], [ [ "one-or-more-0", "Everything starts" ], [ "one-or-more-1", "Nothing starts because of A" ], [ "one-or-more-2", "D can start because of C" ], [ "one-or-more-3", "D cannot start because of B and C" ], [ "one-or-more-4", "D cannot start because of target-role" ], [ "one-or-more-5", "Start A and F even though C and D are stopped" ], [ "one-or-more-6", "Leave A running even though B is stopped" ], [ "one-or-more-7", "Leave A running even though C is stopped" ], [ "bug-5140-require-all-false", "Allow basegrp:0 to stop" ], [ "clone-require-all-1", "clone B starts node 3 and 4" ], [ "clone-require-all-2", "clone B remains stopped everywhere" ], [ "clone-require-all-3", "clone B stops everywhere because A stops everywhere" ], [ "clone-require-all-4", "clone B remains on node 3 and 4 with only one instance of A remaining" ], [ "clone-require-all-5", "clone B starts on node 1 3 and 4" ], [ "clone-require-all-6", "clone B remains active after shutting down instances of A" ], [ "clone-require-all-7", "clone A and B both start at the same time. all instances of A start before B" ], [ "clone-require-all-no-interleave-1", "C starts everywhere after A and B" ], [ "clone-require-all-no-interleave-2", "C starts on nodes 1, 2, and 4 with only one active instance of B" ], [ "clone-require-all-no-interleave-3", "C remains active when instance of B is stopped on one node and started on another" ], [ "one-or-more-unrunnable-instances", "Avoid dependencies on instances that won't ever be started" ], ], [ [ "location-date-rules-1", "Use location constraints with ineffective date-based rules" ], [ "location-date-rules-2", "Use location constraints with effective date-based rules" ], [ "nvpair-date-rules-1", "Use nvpair blocks with a variety of date-based rules" ], [ "value-source", "Use location constraints with node attribute expressions using value-source" ], [ "rule-dbl-as-auto-number-match", "Floating-point rule values default to number comparison: match" ], [ "rule-dbl-as-auto-number-no-match", "Floating-point rule values default to number comparison: no " "match" ], [ "rule-dbl-as-integer-match", "Floating-point rule values set to integer comparison: match" ], [ "rule-dbl-as-integer-no-match", "Floating-point rule values set to integer comparison: no match" ], [ "rule-dbl-as-number-match", "Floating-point rule values set to number comparison: match" ], [ "rule-dbl-as-number-no-match", "Floating-point rule values set to number comparison: no match" ], [ "rule-dbl-parse-fail-default-str-match", "Floating-point rule values fail to parse, default to string " "comparison: match" ], [ "rule-dbl-parse-fail-default-str-no-match", "Floating-point rule values fail to parse, default to string " "comparison: no match" ], [ "rule-int-as-auto-integer-match", "Integer rule values default to integer comparison: match" ], [ "rule-int-as-auto-integer-no-match", "Integer rule values default to integer comparison: no match" ], [ "rule-int-as-integer-match", "Integer rule values set to integer comparison: match" ], [ "rule-int-as-integer-no-match", "Integer rule values set to integer comparison: no match" ], [ "rule-int-as-number-match", "Integer rule values set to number comparison: match" ], [ "rule-int-as-number-no-match", "Integer rule values set to number comparison: no match" ], [ "rule-int-parse-fail-default-str-match", "Integer rule values fail to parse, default to string " "comparison: match" ], [ "rule-int-parse-fail-default-str-no-match", "Integer rule values fail to parse, default to string " "comparison: no match" ], ], [ [ "order1", "Order start 1" ], [ "order2", "Order start 2" ], [ "order3", "Order stop" ], [ "order4", "Order (multiple)" ], [ "order5", "Order (move)" ], [ "order6", "Order (move w/ restart)" ], [ "order7", "Order (mandatory)" ], [ "order-optional", "Order (score=0)" ], [ "order-required", "Order (score=INFINITY)" ], [ "bug-lf-2171", "Prevent group start when clone is stopped" ], [ "order-clone", "Clone ordering should be able to prevent startup of dependent clones" ], [ "order-sets", "Ordering for resource sets" ], [ "order-serialize", "Serialize resources without inhibiting migration" ], [ "order-serialize-set", "Serialize a set of resources without inhibiting migration" ], [ "clone-order-primitive", "Order clone start after a primitive" ], [ "clone-order-16instances", "Verify ordering of 16 cloned resources" ], [ "order-optional-keyword", "Order (optional keyword)" ], [ "order-mandatory", "Order (mandatory keyword)" ], [ "bug-lf-2493", "Don't imply colocation requirements when applying ordering constraints with clones" ], [ "ordered-set-basic-startup", "Constraint set with default order settings" ], [ "ordered-set-natural", "Allow natural set ordering" ], [ "order-wrong-kind", "Order (error)" ], ], [ [ "coloc-loop", "Colocation - loop" ], [ "coloc-many-one", "Colocation - many-to-one" ], [ "coloc-list", "Colocation - many-to-one with list" ], [ "coloc-group", "Colocation - groups" ], [ "coloc-unpromoted-anti", "Anti-colocation with unpromoted shouldn't prevent promoted colocation" ], [ "coloc-attr", "Colocation based on node attributes" ], [ "coloc-negative-group", "Negative colocation with a group" ], [ "coloc-intra-set", "Intra-set colocation" ], [ "bug-lf-2435", "Colocation sets with a negative score" ], [ "coloc-clone-stays-active", "Ensure clones don't get stopped/demoted because a dependent must stop" ], [ "coloc_fp_logic", "Verify floating point calculations in colocation are working" ], [ "colo_promoted_w_native", "cl#5070 - Verify promotion order is affected when colocating promoted with primitive" ], [ "colo_unpromoted_w_native", "cl#5070 - Verify promotion order is affected when colocating unpromoted with primitive" ], [ "anti-colocation-order", "cl#5187 - Prevent resources in an anti-colocation from even temporarily running on a same node" ], [ "anti-colocation-promoted", "Organize order of actions for promoted resources in anti-colocations" ], [ "anti-colocation-unpromoted", "Organize order of actions for unpromoted resources in anti-colocations" ], [ "group-anticolocation", "Group with failed last member anti-colocated with another group" ], [ "group-colocation-failure", "Group with sole member failed, colocated with another group" ], [ "enforce-colo1", "Always enforce B with A INFINITY" ], [ "complex_enforce_colo", "Always enforce B with A INFINITY. (make sure heat-engine stops)" ], [ "coloc-dependee-should-stay", "Stickiness outweighs group colocation" ], [ "coloc-dependee-should-move", "Group colocation outweighs stickiness" ], [ "colocation-influence", "Respect colocation influence" ], [ "colocation-priority-group", "Apply group colocations in order of primary priority" ], [ "colocation-vs-stickiness", "Group stickiness outweighs anti-colocation score" ], [ "promoted-with-blocked", "Promoted role colocated with a resource with blocked start" ], [ "primitive-with-group-with-clone", "Consider group dependent when colocating with clone" ], [ "primitive-with-group-with-promoted", "Consider group dependent when colocating with promoted role" ], [ "primitive-with-unrunnable-group", "Block primitive colocated with group that can't start", ], ], [ [ "rsc-sets-seq-true", "Resource Sets - sequential=false" ], [ "rsc-sets-seq-false", "Resource Sets - sequential=true" ], [ "rsc-sets-clone", "Resource Sets - Clone" ], [ "rsc-sets-promoted", "Resource Sets - Promoted" ], [ "rsc-sets-clone-1", "Resource Sets - Clone (lf#2404)" ], ], [ [ "attrs1", "string: eq (and)" ], [ "attrs2", "string: lt / gt (and)" ], [ "attrs3", "string: ne (or)" ], [ "attrs4", "string: exists" ], [ "attrs5", "string: not_exists" ], [ "attrs6", "is_dc: true" ], [ "attrs7", "is_dc: false" ], [ "attrs8", "score_attribute" ], [ "per-node-attrs", "Per node resource parameters" ], ], [ [ "mon-rsc-1", "Schedule Monitor - start" ], [ "mon-rsc-2", "Schedule Monitor - move" ], [ "mon-rsc-3", "Schedule Monitor - pending start" ], [ "mon-rsc-4", "Schedule Monitor - move/pending start" ], ], [ [ "rec-rsc-0", "Resource Recover - no start" ], [ "rec-rsc-1", "Resource Recover - start" ], [ "rec-rsc-2", "Resource Recover - monitor" ], [ "rec-rsc-3", "Resource Recover - stop - ignore" ], [ "rec-rsc-4", "Resource Recover - stop - block" ], [ "rec-rsc-5", "Resource Recover - stop - fence" ], [ "rec-rsc-6", "Resource Recover - multiple - restart" ], [ "rec-rsc-7", "Resource Recover - multiple - stop" ], [ "rec-rsc-8", "Resource Recover - multiple - block" ], [ "rec-rsc-9", "Resource Recover - group/group" ], [ "stop-unexpected", "Recover multiply active group with stop_unexpected" ], [ "stop-unexpected-2", "Resource multiply active primitve with stop_unexpected" ], [ "monitor-recovery", "on-fail=block + resource recovery detected by recurring monitor" ], [ "stop-failure-no-quorum", "Stop failure without quorum" ], [ "stop-failure-no-fencing", "Stop failure without fencing available" ], [ "stop-failure-with-fencing", "Stop failure with fencing available" ], [ "multiple-active-block-group", "Support of multiple-active=block for resource groups" ], [ "multiple-monitor-one-failed", "Consider resource failed if any of the configured monitor operations failed" ], ], [ [ "quorum-1", "No quorum - ignore" ], [ "quorum-2", "No quorum - freeze" ], [ "quorum-3", "No quorum - stop" ], [ "quorum-4", "No quorum - start anyway" ], [ "quorum-5", "No quorum - start anyway (group)" ], [ "quorum-6", "No quorum - start anyway (clone)" ], [ "bug-cl-5212", "No promotion with no-quorum-policy=freeze" ], [ "suicide-needed-inquorate", "no-quorum-policy=suicide: suicide necessary" ], [ "suicide-not-needed-initial-quorum", "no-quorum-policy=suicide: suicide not necessary at initial quorum" ], [ "suicide-not-needed-never-quorate", "no-quorum-policy=suicide: suicide not necessary if never quorate" ], [ "suicide-not-needed-quorate", "no-quorum-policy=suicide: suicide necessary if quorate" ], ], [ [ "rec-node-1", "Node Recover - Startup - no fence" ], [ "rec-node-2", "Node Recover - Startup - fence" ], [ "rec-node-3", "Node Recover - HA down - no fence" ], [ "rec-node-4", "Node Recover - HA down - fence" ], [ "rec-node-5", "Node Recover - CRM down - no fence" ], [ "rec-node-6", "Node Recover - CRM down - fence" ], [ "rec-node-7", "Node Recover - no quorum - ignore" ], [ "rec-node-8", "Node Recover - no quorum - freeze" ], [ "rec-node-9", "Node Recover - no quorum - stop" ], [ "rec-node-10", "Node Recover - no quorum - stop w/fence" ], [ "rec-node-11", "Node Recover - CRM down w/ group - fence" ], [ "rec-node-12", "Node Recover - nothing active - fence" ], [ "rec-node-13", "Node Recover - failed resource + shutdown - fence" ], [ "rec-node-15", "Node Recover - unknown lrm section" ], [ "rec-node-14", "Serialize all stonith's" ], ], [ [ "multi1", "Multiple Active (stop/start)" ], ], [ [ "migrate-begin", "Normal migration" ], [ "migrate-success", "Completed migration" ], [ "migrate-partial-1", "Completed migration, missing stop on source" ], [ "migrate-partial-2", "Successful migrate_to only" ], [ "migrate-partial-3", "Successful migrate_to only, target down" ], [ "migrate-partial-4", "Migrate from the correct host after migrate_to+migrate_from" ], [ "bug-5186-partial-migrate", "Handle partial migration when src node loses membership" ], [ "migrate-fail-2", "Failed migrate_from" ], [ "migrate-fail-3", "Failed migrate_from + stop on source" ], [ "migrate-fail-4", "Failed migrate_from + stop on target - ideally we wouldn't need to re-stop on target" ], [ "migrate-fail-5", "Failed migrate_from + stop on source and target" ], [ "migrate-fail-6", "Failed migrate_to" ], [ "migrate-fail-7", "Failed migrate_to + stop on source" ], [ "migrate-fail-8", "Failed migrate_to + stop on target - ideally we wouldn't need to re-stop on target" ], [ "migrate-fail-9", "Failed migrate_to + stop on source and target" ], [ "migration-ping-pong", "Old migrate_to failure + successful migrate_from on same node" ], [ "migrate-stop", "Migration in a stopping stack" ], [ "migrate-start", "Migration in a starting stack" ], [ "migrate-stop_start", "Migration in a restarting stack" ], [ "migrate-stop-complex", "Migration in a complex stopping stack" ], [ "migrate-start-complex", "Migration in a complex starting stack" ], [ "migrate-stop-start-complex", "Migration in a complex moving stack" ], [ "migrate-shutdown", "Order the post-migration 'stop' before node shutdown" ], [ "migrate-1", "Migrate (migrate)" ], [ "migrate-2", "Migrate (stable)" ], [ "migrate-3", "Migrate (failed migrate_to)" ], [ "migrate-4", "Migrate (failed migrate_from)" ], [ "novell-252693", "Migration in a stopping stack" ], [ "novell-252693-2", "Migration in a starting stack" ], [ "novell-252693-3", "Non-Migration in a starting and stopping stack" ], [ "bug-1820", "Migration in a group" ], [ "bug-1820-1", "Non-migration in a group" ], [ "migrate-5", "Primitive migration with a clone" ], [ "migrate-fencing", "Migration after Fencing" ], [ "migrate-both-vms", "Migrate two VMs that have no colocation" ], [ "migration-behind-migrating-remote", "Migrate resource behind migrating remote connection" ], [ "1-a-then-bm-move-b", "Advanced migrate logic. A then B. migrate B" ], [ "2-am-then-b-move-a", "Advanced migrate logic, A then B, migrate A without stopping B" ], [ "3-am-then-bm-both-migrate", "Advanced migrate logic. A then B. migrate both" ], [ "4-am-then-bm-b-not-migratable", "Advanced migrate logic, A then B, B not migratable" ], [ "5-am-then-bm-a-not-migratable", "Advanced migrate logic. A then B. move both, a not migratable" ], [ "6-migrate-group", "Advanced migrate logic, migrate a group" ], [ "7-migrate-group-one-unmigratable", "Advanced migrate logic, migrate group mixed with allow-migrate true/false" ], [ "8-am-then-bm-a-migrating-b-stopping", "Advanced migrate logic, A then B, A migrating, B stopping" ], [ "9-am-then-bm-b-migrating-a-stopping", "Advanced migrate logic, A then B, B migrate, A stopping" ], [ "10-a-then-bm-b-move-a-clone", "Advanced migrate logic, A clone then B, migrate B while stopping A" ], [ "11-a-then-bm-b-move-a-clone-starting", "Advanced migrate logic, A clone then B, B moving while A is start/stopping" ], [ "a-promote-then-b-migrate", "A promote then B start. migrate B" ], [ "a-demote-then-b-migrate", "A demote then B stop. migrate B" ], [ "probe-target-of-failed-migrate_to-1", "Failed migrate_to, target rejoins" ], [ "probe-target-of-failed-migrate_to-2", "Failed migrate_to, target rejoined and probed" ], [ "partial-live-migration-multiple-active", "Prevent running on multiple nodes due to partial live migration" ], [ "migration-intermediary-cleaned", "Probe live-migration intermediary with no history" ], [ "bug-lf-2422", "Dependency on partially active group - stop ocfs:*" ], ], [ [ "clone-anon-probe-1", "Probe the correct (anonymous) clone instance for each node" ], [ "clone-anon-probe-2", "Avoid needless re-probing of anonymous clones" ], [ "clone-anon-failcount", "Merge failcounts for anonymous clones" ], [ "force-anon-clone-max", "Update clone-max properly when forcing a clone to be anonymous" ], [ "anon-instance-pending", "Assign anonymous clone instance numbers properly when action pending" ], [ "inc0", "Incarnation start" ], [ "inc1", "Incarnation start order" ], [ "inc2", "Incarnation silent restart, stop, move" ], [ "inc3", "Inter-incarnation ordering, silent restart, stop, move" ], [ "inc4", "Inter-incarnation ordering, silent restart, stop, move (ordered)" ], [ "inc5", "Inter-incarnation ordering, silent restart, stop, move (restart 1)" ], [ "inc6", "Inter-incarnation ordering, silent restart, stop, move (restart 2)" ], [ "inc7", "Clone colocation" ], [ "inc8", "Clone anti-colocation" ], [ "inc9", "Non-unique clone" ], [ "inc10", "Non-unique clone (stop)" ], [ "inc11", "Primitive colocation with clones" ], [ "inc12", "Clone shutdown" ], [ "cloned-group", "Make sure only the correct number of cloned groups are started" ], [ "cloned-group-stop", "Ensure stopping qpidd also stops glance and cinder" ], [ "clone-no-shuffle", "Don't prioritize allocation of instances that must be moved" ], [ "clone-max-zero", "Orphan processing with clone-max=0" ], [ "clone-anon-dup", "Bug LF#2087 - Correctly parse the state of anonymous clones that are active more than once per node" ], [ "bug-lf-2160", "Don't shuffle clones due to colocation" ], [ "bug-lf-2213", "clone-node-max enforcement for cloned groups" ], [ "bug-lf-2153", "Clone ordering constraints" ], [ "bug-lf-2361", "Ensure clones observe mandatory ordering constraints if the LHS is unrunnable" ], [ "bug-lf-2317", "Avoid needless restart of primitive depending on a clone" ], [ "bug-lf-2453", "Enforce mandatory clone ordering without colocation" ], [ "bug-lf-2508", "Correctly reconstruct the status of anonymous cloned groups" ], [ "bug-lf-2544", "Balanced clone placement" ], [ "bug-lf-2445", "Redistribute clones with node-max > 1 and stickiness = 0" ], [ "bug-lf-2574", "Avoid clone shuffle" ], [ "bug-lf-2581", "Avoid group restart due to unrelated clone (re)start" ], [ "bug-cl-5168", "Don't shuffle clones" ], [ "bug-cl-5170", "Prevent clone from starting with on-fail=block" ], [ "clone-fail-block-colocation", "Move colocated group when failed clone has on-fail=block" ], [ "clone-interleave-1", "Clone-3 cannot start on pcmk-1 due to interleaved ordering (no colocation)" ], [ "clone-interleave-2", "Clone-3 must stop on pcmk-1 due to interleaved ordering (no colocation)" ], [ "clone-interleave-3", "Clone-3 must be recovered on pcmk-1 due to interleaved ordering (no colocation)" ], [ "rebalance-unique-clones", "Rebalance unique clone instances with no stickiness" ], [ "clone-requires-quorum-recovery", "Clone with requires=quorum on failed node needing recovery" ], [ "clone-requires-quorum", "Clone with requires=quorum with presumed-inactive instance on failed node" ], ], [ [ "cloned_start_one", "order first clone then clone... first clone_min=2" ], [ "cloned_start_two", "order first clone then clone... first clone_min=2" ], [ "cloned_stop_one", "order first clone then clone... first clone_min=2" ], [ "cloned_stop_two", "order first clone then clone... first clone_min=2" ], [ "clone_min_interleave_start_one", "order first clone then clone... first clone_min=2 and then has interleave=true" ], [ "clone_min_interleave_start_two", "order first clone then clone... first clone_min=2 and then has interleave=true" ], [ "clone_min_interleave_stop_one", "order first clone then clone... first clone_min=2 and then has interleave=true" ], [ "clone_min_interleave_stop_two", "order first clone then clone... first clone_min=2 and then has interleave=true" ], [ "clone_min_start_one", "order first clone then primitive... first clone_min=2" ], [ "clone_min_start_two", "order first clone then primitive... first clone_min=2" ], [ "clone_min_stop_all", "order first clone then primitive... first clone_min=2" ], [ "clone_min_stop_one", "order first clone then primitive... first clone_min=2" ], [ "clone_min_stop_two", "order first clone then primitive... first clone_min=2" ], ], [ [ "unfence-startup", "Clean unfencing" ], [ "unfence-definition", "Unfencing when the agent changes" ], [ "unfence-parameters", "Unfencing when the agent parameters changes" ], [ "unfence-device", "Unfencing when a cluster has only fence devices" ], ], [ [ "promoted-0", "Stopped -> Unpromoted" ], [ "promoted-1", "Stopped -> Promote" ], [ "promoted-2", "Stopped -> Promote : notify" ], [ "promoted-3", "Stopped -> Promote : promoted location" ], [ "promoted-4", "Started -> Promote : promoted location" ], [ "promoted-5", "Promoted -> Promoted" ], [ "promoted-6", "Promoted -> Promoted (2)" ], [ "promoted-7", "Promoted -> Fenced" ], [ "promoted-8", "Promoted -> Fenced -> Moved" ], [ "promoted-9", "Stopped + Promotable + No quorum" ], [ "promoted-10", "Stopped -> Promotable : notify with monitor" ], [ "promoted-11", "Stopped -> Promote : colocation" ], [ "novell-239082", "Demote/Promote ordering" ], [ "novell-239087", "Stable promoted placement" ], [ "promoted-12", "Promotion based solely on rsc_location constraints" ], [ "promoted-13", "Include preferences of colocated resources when placing promoted" ], [ "promoted-demote", "Ordering when actions depends on demoting an unpromoted resource" ], [ "promoted-ordering", "Prevent resources from starting that need a promoted" ], [ "bug-1765", "Verify promoted-with-promoted colocation does not stop unpromoted instances" ], [ "promoted-group", "Promotion of cloned groups" ], [ "bug-lf-1852", "Don't shuffle promotable instances unnecessarily" ], [ "promoted-failed-demote", "Don't retry failed demote actions" ], [ "promoted-failed-demote-2", "Don't retry failed demote actions (notify=false)" ], [ "promoted-depend", "Ensure resources that depend on promoted instance don't get allocated until that does" ], [ "promoted-reattach", "Re-attach to a running promoted" ], [ "promoted-allow-start", "Don't include promoted score if it would prevent allocation" ], [ "promoted-colocation", "Allow promoted instances placemaker to be influenced by colocation constraints" ], [ "promoted-pseudo", "Make sure promote/demote pseudo actions are created correctly" ], [ "promoted-role", "Prevent target-role from promoting more than promoted-max instances" ], [ "bug-lf-2358", "Anti-colocation of promoted instances" ], [ "promoted-promotion-constraint", "Mandatory promoted colocation constraints" ], [ "unmanaged-promoted", "Ensure role is preserved for unmanaged resources" ], [ "promoted-unmanaged-monitor", "Start correct monitor for unmanaged promoted instances" ], [ "promoted-demote-2", "Demote does not clear past failure" ], [ "promoted-move", "Move promoted based on failure of colocated group" ], [ "promoted-probed-score", "Observe the promotion score of probed resources" ], [ "colocation_constraint_stops_promoted", "cl#5054 - Ensure promoted is demoted when stopped by colocation constraint" ], [ "colocation_constraint_stops_unpromoted", "cl#5054 - Ensure unpromoted is not demoted when stopped by colocation constraint" ], [ "order_constraint_stops_promoted", "cl#5054 - Ensure promoted is demoted when stopped by order constraint" ], [ "order_constraint_stops_unpromoted", "cl#5054 - Ensure unpromoted is not demoted when stopped by order constraint" ], [ "promoted_monitor_restart", "cl#5072 - Ensure promoted monitor operation will start after promotion" ], [ "bug-rh-880249", "Handle replacement of an m/s resource with a primitive" ], [ "bug-5143-ms-shuffle", "Prevent promoted instance shuffling due to promotion score" ], [ "promoted-demote-block", "Block promotion if demote fails with on-fail=block" ], [ "promoted-dependent-ban", "Don't stop instances from being active because a dependent is banned from that host" ], [ "promoted-stop", "Stop instances due to location constraint with role=Started" ], [ "promoted-partially-demoted-group", "Allow partially demoted group to finish demoting" ], [ "bug-cl-5213", "Ensure role colocation with -INFINITY is enforced" ], [ "bug-cl-5219", "Allow unrelated resources with a common colocation target to remain promoted" ], [ "promoted-asymmetrical-order", "Fix the behaviors of multi-state resources with asymmetrical ordering" ], [ "promoted-notify", "Promotion with notifications" ], [ "promoted-score-startup", "Use permanent promoted scores without LRM history" ], [ "failed-demote-recovery", "Recover resource in unpromoted role after demote fails" ], [ "failed-demote-recovery-promoted", "Recover resource in promoted role after demote fails" ], [ "on_fail_demote1", "Recovery with on-fail=\"demote\" on healthy cluster, remote, guest, and bundle nodes" ], [ "on_fail_demote2", "Recovery with on-fail=\"demote\" with promotion on different node" ], [ "on_fail_demote3", "Recovery with on-fail=\"demote\" with no promotion" ], [ "on_fail_demote4", "Recovery with on-fail=\"demote\" on failed cluster, remote, guest, and bundle nodes" ], [ "no_quorum_demote", "Promotable demotion and primitive stop with no-quorum-policy=\"demote\"" ], [ "no-promote-on-unrunnable-guest", "Don't select bundle instance for promotion when container can't run" ], [ "leftover-pending-monitor", "Prevent a leftover pending monitor from causing unexpected stop of other instances" ], ], [ [ "history-1", "Correctly parse stateful-1 resource state" ], ], [ [ "managed-0", "Managed (reference)" ], [ "managed-1", "Not managed - down" ], [ "managed-2", "Not managed - up" ], [ "bug-5028", "Shutdown should block if anything depends on an unmanaged resource" ], [ "bug-5028-detach", "Ensure detach still works" ], [ "bug-5028-bottom", "Ensure shutdown still blocks if the blocked resource is at the bottom of the stack" ], [ "unmanaged-stop-1", "cl#5155 - Block the stop of resources if any depending resource is unmanaged" ], [ "unmanaged-stop-2", "cl#5155 - Block the stop of resources if the first resource in a mandatory stop order is unmanaged" ], [ "unmanaged-stop-3", "cl#5155 - Block the stop of resources if any depending resource in a group is unmanaged" ], [ "unmanaged-stop-4", "cl#5155 - Block the stop of resources if any depending resource in the middle of a group is unmanaged" ], [ "unmanaged-block-restart", "Block restart of resources if any dependent resource in a group is unmanaged" ], ], [ [ "interleave-0", "Interleave (reference)" ], [ "interleave-1", "coloc - not interleaved" ], [ "interleave-2", "coloc - interleaved" ], [ "interleave-3", "coloc - interleaved (2)" ], [ "interleave-pseudo-stop", "Interleaved clone during stonith" ], [ "interleave-stop", "Interleaved clone during stop" ], [ "interleave-restart", "Interleaved clone during dependency restart" ], ], [ [ "notify-0", "Notify reference" ], [ "notify-1", "Notify simple" ], [ "notify-2", "Notify simple, confirm" ], [ "notify-3", "Notify move, confirm" ], [ "novell-239079", "Notification priority" ], #[ "notify-2", "Notify - 764" ], [ "notifs-for-unrunnable", "Don't schedule notifications for an unrunnable action" ], [ "route-remote-notify", "Route remote notify actions through correct cluster node" ], [ "notify-behind-stopping-remote", "Don't schedule notifications behind stopped remote" ], ], [ [ "594", "OSDL #594 - Unrunnable actions scheduled in transition" ], [ "662", "OSDL #662 - Two resources start on one node when incarnation_node_max = 1" ], [ "696", "OSDL #696 - CRM starts stonith RA without monitor" ], [ "726", "OSDL #726 - Attempting to schedule rsc_posic041_monitor_5000 _after_ a stop" ], [ "735", "OSDL #735 - Correctly detect that rsc_hadev1 is stopped on hadev3" ], [ "764", "OSDL #764 - Missing monitor op for DoFencing:child_DoFencing:1" ], [ "797", "OSDL #797 - Assert triggered: task_id_i > max_call_id" ], [ "829", "OSDL #829" ], [ "994", "OSDL #994 - Stopping the last resource in a resource group causes the entire group to be restarted" ], [ "994-2", "OSDL #994 - with a dependent resource" ], [ "1360", "OSDL #1360 - Clone stickiness" ], [ "1484", "OSDL #1484 - on_fail=stop" ], [ "1494", "OSDL #1494 - Clone stability" ], [ "unrunnable-1", "Unrunnable" ], [ "unrunnable-2", "Unrunnable 2" ], [ "stonith-0", "Stonith loop - 1" ], [ "stonith-1", "Stonith loop - 2" ], [ "stonith-2", "Stonith loop - 3" ], [ "stonith-3", "Stonith startup" ], [ "stonith-4", "Stonith node state" ], [ "dc-fence-ordering", "DC needs fencing while other nodes are shutting down" ], [ "bug-1572-1", "Recovery of groups depending on promotable role" ], [ "bug-1572-2", "Recovery of groups depending on promotable role when promoted is not re-promoted" ], [ "bug-1685", "Depends-on-promoted ordering" ], [ "bug-1822", "Don't promote partially active groups" ], [ "bug-pm-11", "New resource added to a m/s group" ], [ "bug-pm-12", "Recover only the failed portion of a cloned group" ], [ "bug-n-387749", "Don't shuffle clone instances" ], [ "bug-n-385265", "Don't ignore the failure stickiness of group children - resource_idvscommon should stay stopped" ], [ "bug-n-385265-2", "Ensure groups are migrated instead of remaining partially active on the current node" ], [ "bug-lf-1920", "Correctly handle probes that find active resources" ], [ "bnc-515172", "Location constraint with multiple expressions" ], [ "colocate-primitive-with-clone", "Optional colocation with a clone" ], [ "use-after-free-merge", "Use-after-free in native_merge_weights" ], [ "bug-lf-2551", "STONITH ordering for stop" ], [ "bug-lf-2606", "Stonith implies demote" ], [ "bug-lf-2474", "Ensure resource op timeout takes precedence over op_defaults" ], [ "bug-suse-707150", "Prevent vm-01 from starting due to colocation/ordering" ], [ "bug-5014-A-start-B-start", "Verify when A starts B starts using symmetrical=false" ], [ "bug-5014-A-stop-B-started", "Verify when A stops B does not stop if it has already started using symmetric=false" ], [ "bug-5014-A-stopped-B-stopped", "Verify when A is stopped and B has not started, B does not start before A using symmetric=false" ], [ "bug-5014-CthenAthenB-C-stopped", "Verify when C then A is symmetrical=true, A then B is symmetric=false, and C is stopped that nothing starts" ], [ "bug-5014-CLONE-A-start-B-start", "Verify when A starts B starts using clone resources with symmetric=false" ], [ "bug-5014-CLONE-A-stop-B-started", "Verify when A stops B does not stop if it has already started using clone resources with symmetric=false" ], [ "bug-5014-GROUP-A-start-B-start", "Verify when A starts B starts when using group resources with symmetric=false" ], [ "bug-5014-GROUP-A-stopped-B-started", "Verify when A stops B does not stop if it has already started using group resources with symmetric=false" ], [ "bug-5014-GROUP-A-stopped-B-stopped", "Verify when A is stopped and B has not started, B does not start before A using group resources with symmetric=false" ], [ "bug-5014-ordered-set-symmetrical-false", "Verify ordered sets work with symmetrical=false" ], [ "bug-5014-ordered-set-symmetrical-true", "Verify ordered sets work with symmetrical=true" ], [ "clbz5007-promotable-colocation", "Verify use of colocation scores other than INFINITY and -INFINITY work on multi-state resources" ], [ "bug-5038", "Prevent restart of anonymous clones when clone-max decreases" ], [ "bug-5025-1", "Automatically clean up failcount after resource config change with reload" ], [ "bug-5025-2", "Make sure clear failcount action isn't set when config does not change" ], [ "bug-5025-3", "Automatically clean up failcount after resource config change with restart" ], [ "bug-5025-4", "Clear failcount when last failure is a start op and rsc attributes changed" ], [ "failcount", "Ensure failcounts are correctly expired" ], [ "failcount-block", "Ensure failcounts are not expired when on-fail=block is present" ], [ "per-op-failcount", "Ensure per-operation failcount is handled and not passed to fence agent" ], [ "on-fail-ignore", "Ensure on-fail=ignore works even beyond migration-threshold" ], [ "monitor-onfail-restart", "bug-5058 - Monitor failure with on-fail set to restart" ], [ "monitor-onfail-stop", "bug-5058 - Monitor failure wiht on-fail set to stop" ], [ "bug-5059", "No need to restart p_stateful1:*" ], [ "bug-5069-op-enabled", "Test on-fail=ignore with failure when monitor is enabled" ], [ "bug-5069-op-disabled", "Test on-fail-ignore with failure when monitor is disabled" ], [ "obsolete-lrm-resource", "cl#5115 - Do not use obsolete lrm_resource sections" ], [ "expire-non-blocked-failure", "Ignore failure-timeout only if the failed operation has on-fail=block" ], [ "asymmetrical-order-move", "Respect asymmetrical ordering when trying to move resources" ], [ "asymmetrical-order-restart", "Respect asymmetrical ordering when restarting dependent resource" ], [ "start-then-stop-with-unfence", "Avoid graph loop with start-then-stop constraint plus unfencing" ], [ "order-expired-failure", "Order failcount cleanup after remote fencing" ], [ "expired-stop-1", "Expired stop failure should not block resource" ], [ "ignore_stonith_rsc_order1", "cl#5056- Ignore order constraint between stonith and non-stonith rsc" ], [ "ignore_stonith_rsc_order2", "cl#5056- Ignore order constraint with group rsc containing mixed stonith and non-stonith" ], [ "ignore_stonith_rsc_order3", "cl#5056- Ignore order constraint, stonith clone and mixed group" ], [ "ignore_stonith_rsc_order4", "cl#5056- Ignore order constraint, stonith clone and clone with nested mixed group" ], [ "honor_stonith_rsc_order1", "cl#5056- Honor order constraint, stonith clone and pure stonith group(single rsc)" ], [ "honor_stonith_rsc_order2", "cl#5056- Honor order constraint, stonith clone and pure stonith group(multiple rsc)" ], [ "honor_stonith_rsc_order3", "cl#5056- Honor order constraint, stonith clones with nested pure stonith group" ], [ "honor_stonith_rsc_order4", "cl#5056- Honor order constraint, between two native stonith rscs" ], [ "multiply-active-stonith", "Multiply active stonith" ], [ "probe-timeout", "cl#5099 - Default probe timeout" ], [ "order-first-probes", "cl#5301 - respect order constraints when relevant resources are being probed" ], [ "concurrent-fencing", "Allow performing fencing operations in parallel" ], [ "priority-fencing-delay", "Delay fencing targeting the more significant node" ], + [ "pending-node-no-uname", "Do not fence a pending node that doesn't have an uname in node state yet" ], + [ "node-pending-timeout", "Fence a pending node that has reached `node-pending-timeout`" ], ], [ [ "systemhealth1", "System Health () #1" ], [ "systemhealth2", "System Health () #2" ], [ "systemhealth3", "System Health () #3" ], [ "systemhealthn1", "System Health (None) #1" ], [ "systemhealthn2", "System Health (None) #2" ], [ "systemhealthn3", "System Health (None) #3" ], [ "systemhealthm1", "System Health (Migrate On Red) #1" ], [ "systemhealthm2", "System Health (Migrate On Red) #2" ], [ "systemhealthm3", "System Health (Migrate On Red) #3" ], [ "systemhealtho1", "System Health (Only Green) #1" ], [ "systemhealtho2", "System Health (Only Green) #2" ], [ "systemhealtho3", "System Health (Only Green) #3" ], [ "systemhealthp1", "System Health (Progessive) #1" ], [ "systemhealthp2", "System Health (Progessive) #2" ], [ "systemhealthp3", "System Health (Progessive) #3" ], [ "allow-unhealthy-nodes", "System Health (migrate-on-red + allow-unhealth-nodes)" ], ], [ [ "utilization", "Placement Strategy - utilization" ], [ "minimal", "Placement Strategy - minimal" ], [ "balanced", "Placement Strategy - balanced" ], ], [ [ "placement-stickiness", "Optimized Placement Strategy - stickiness" ], [ "placement-priority", "Optimized Placement Strategy - priority" ], [ "placement-location", "Optimized Placement Strategy - location" ], [ "placement-capacity", "Optimized Placement Strategy - capacity" ], ], [ [ "utilization-order1", "Utilization Order - Simple" ], [ "utilization-order2", "Utilization Order - Complex" ], [ "utilization-order3", "Utilization Order - Migrate" ], [ "utilization-order4", "Utilization Order - Live Migration (bnc#695440)" ], [ "utilization-complex", "Utilization with complex relationships" ], [ "utilization-shuffle", "Don't displace prmExPostgreSQLDB2 on act2, Start prmExPostgreSQLDB1 on act3" ], [ "load-stopped-loop", "Avoid transition loop due to load_stopped (cl#5044)" ], [ "load-stopped-loop-2", "cl#5235 - Prevent graph loops that can be introduced by load_stopped -> migrate_to ordering" ], ], [ [ "colocated-utilization-primitive-1", "Colocated Utilization - Primitive" ], [ "colocated-utilization-primitive-2", "Colocated Utilization - Choose the most capable node" ], [ "colocated-utilization-group", "Colocated Utilization - Group" ], [ "colocated-utilization-clone", "Colocated Utilization - Clone" ], [ "utilization-check-allowed-nodes", "Only check the capacities of the nodes that can run the resource" ], ], [ [ "reprobe-target_rc", "Ensure correct target_rc for reprobe of inactive resources" ], [ "node-maintenance-1", "cl#5128 - Node maintenance" ], [ "node-maintenance-2", "cl#5128 - Node maintenance (coming out of maintenance mode)" ], [ "shutdown-maintenance-node", "Do not fence a maintenance node if it shuts down cleanly" ], [ "rsc-maintenance", "Per-resource maintenance" ], ], [ [ "not-installed-agent", "The resource agent is missing" ], [ "not-installed-tools", "Something the resource agent needs is missing" ], ], [ [ "stopped-monitor-00", "Stopped Monitor - initial start" ], [ "stopped-monitor-01", "Stopped Monitor - failed started" ], [ "stopped-monitor-02", "Stopped Monitor - started multi-up" ], [ "stopped-monitor-03", "Stopped Monitor - stop started" ], [ "stopped-monitor-04", "Stopped Monitor - failed stop" ], [ "stopped-monitor-05", "Stopped Monitor - start unmanaged" ], [ "stopped-monitor-06", "Stopped Monitor - unmanaged multi-up" ], [ "stopped-monitor-07", "Stopped Monitor - start unmanaged multi-up" ], [ "stopped-monitor-08", "Stopped Monitor - migrate" ], [ "stopped-monitor-09", "Stopped Monitor - unmanage started" ], [ "stopped-monitor-10", "Stopped Monitor - unmanaged started multi-up" ], [ "stopped-monitor-11", "Stopped Monitor - stop unmanaged started" ], [ "stopped-monitor-12", "Stopped Monitor - unmanaged started multi-up (target-role=Stopped)" ], [ "stopped-monitor-20", "Stopped Monitor - initial stop" ], [ "stopped-monitor-21", "Stopped Monitor - stopped single-up" ], [ "stopped-monitor-22", "Stopped Monitor - stopped multi-up" ], [ "stopped-monitor-23", "Stopped Monitor - start stopped" ], [ "stopped-monitor-24", "Stopped Monitor - unmanage stopped" ], [ "stopped-monitor-25", "Stopped Monitor - unmanaged stopped multi-up" ], [ "stopped-monitor-26", "Stopped Monitor - start unmanaged stopped" ], [ "stopped-monitor-27", "Stopped Monitor - unmanaged stopped multi-up (target-role=Started)" ], [ "stopped-monitor-30", "Stopped Monitor - new node started" ], [ "stopped-monitor-31", "Stopped Monitor - new node stopped" ], ], [ # This is a combo test to check: # - probe timeout defaults to the minimum-interval monitor's # - duplicate recurring operations are ignored # - if timeout spec is bad, the default timeout is used # - failure is blocked with on-fail=block even if ISO8601 interval is specified # - started/stopped role monitors are started/stopped on right nodes [ "intervals", "Recurring monitor interval handling" ], ], [ [ "ticket-primitive-1", "Ticket - Primitive (loss-policy=stop, initial)" ], [ "ticket-primitive-2", "Ticket - Primitive (loss-policy=stop, granted)" ], [ "ticket-primitive-3", "Ticket - Primitive (loss-policy-stop, revoked)" ], [ "ticket-primitive-4", "Ticket - Primitive (loss-policy=demote, initial)" ], [ "ticket-primitive-5", "Ticket - Primitive (loss-policy=demote, granted)" ], [ "ticket-primitive-6", "Ticket - Primitive (loss-policy=demote, revoked)" ], [ "ticket-primitive-7", "Ticket - Primitive (loss-policy=fence, initial)" ], [ "ticket-primitive-8", "Ticket - Primitive (loss-policy=fence, granted)" ], [ "ticket-primitive-9", "Ticket - Primitive (loss-policy=fence, revoked)" ], [ "ticket-primitive-10", "Ticket - Primitive (loss-policy=freeze, initial)" ], [ "ticket-primitive-11", "Ticket - Primitive (loss-policy=freeze, granted)" ], [ "ticket-primitive-12", "Ticket - Primitive (loss-policy=freeze, revoked)" ], [ "ticket-primitive-13", "Ticket - Primitive (loss-policy=stop, standby, granted)" ], [ "ticket-primitive-14", "Ticket - Primitive (loss-policy=stop, granted, standby)" ], [ "ticket-primitive-15", "Ticket - Primitive (loss-policy=stop, standby, revoked)" ], [ "ticket-primitive-16", "Ticket - Primitive (loss-policy=demote, standby, granted)" ], [ "ticket-primitive-17", "Ticket - Primitive (loss-policy=demote, granted, standby)" ], [ "ticket-primitive-18", "Ticket - Primitive (loss-policy=demote, standby, revoked)" ], [ "ticket-primitive-19", "Ticket - Primitive (loss-policy=fence, standby, granted)" ], [ "ticket-primitive-20", "Ticket - Primitive (loss-policy=fence, granted, standby)" ], [ "ticket-primitive-21", "Ticket - Primitive (loss-policy=fence, standby, revoked)" ], [ "ticket-primitive-22", "Ticket - Primitive (loss-policy=freeze, standby, granted)" ], [ "ticket-primitive-23", "Ticket - Primitive (loss-policy=freeze, granted, standby)" ], [ "ticket-primitive-24", "Ticket - Primitive (loss-policy=freeze, standby, revoked)" ], ], [ [ "ticket-group-1", "Ticket - Group (loss-policy=stop, initial)" ], [ "ticket-group-2", "Ticket - Group (loss-policy=stop, granted)" ], [ "ticket-group-3", "Ticket - Group (loss-policy-stop, revoked)" ], [ "ticket-group-4", "Ticket - Group (loss-policy=demote, initial)" ], [ "ticket-group-5", "Ticket - Group (loss-policy=demote, granted)" ], [ "ticket-group-6", "Ticket - Group (loss-policy=demote, revoked)" ], [ "ticket-group-7", "Ticket - Group (loss-policy=fence, initial)" ], [ "ticket-group-8", "Ticket - Group (loss-policy=fence, granted)" ], [ "ticket-group-9", "Ticket - Group (loss-policy=fence, revoked)" ], [ "ticket-group-10", "Ticket - Group (loss-policy=freeze, initial)" ], [ "ticket-group-11", "Ticket - Group (loss-policy=freeze, granted)" ], [ "ticket-group-12", "Ticket - Group (loss-policy=freeze, revoked)" ], [ "ticket-group-13", "Ticket - Group (loss-policy=stop, standby, granted)" ], [ "ticket-group-14", "Ticket - Group (loss-policy=stop, granted, standby)" ], [ "ticket-group-15", "Ticket - Group (loss-policy=stop, standby, revoked)" ], [ "ticket-group-16", "Ticket - Group (loss-policy=demote, standby, granted)" ], [ "ticket-group-17", "Ticket - Group (loss-policy=demote, granted, standby)" ], [ "ticket-group-18", "Ticket - Group (loss-policy=demote, standby, revoked)" ], [ "ticket-group-19", "Ticket - Group (loss-policy=fence, standby, granted)" ], [ "ticket-group-20", "Ticket - Group (loss-policy=fence, granted, standby)" ], [ "ticket-group-21", "Ticket - Group (loss-policy=fence, standby, revoked)" ], [ "ticket-group-22", "Ticket - Group (loss-policy=freeze, standby, granted)" ], [ "ticket-group-23", "Ticket - Group (loss-policy=freeze, granted, standby)" ], [ "ticket-group-24", "Ticket - Group (loss-policy=freeze, standby, revoked)" ], ], [ [ "ticket-clone-1", "Ticket - Clone (loss-policy=stop, initial)" ], [ "ticket-clone-2", "Ticket - Clone (loss-policy=stop, granted)" ], [ "ticket-clone-3", "Ticket - Clone (loss-policy-stop, revoked)" ], [ "ticket-clone-4", "Ticket - Clone (loss-policy=demote, initial)" ], [ "ticket-clone-5", "Ticket - Clone (loss-policy=demote, granted)" ], [ "ticket-clone-6", "Ticket - Clone (loss-policy=demote, revoked)" ], [ "ticket-clone-7", "Ticket - Clone (loss-policy=fence, initial)" ], [ "ticket-clone-8", "Ticket - Clone (loss-policy=fence, granted)" ], [ "ticket-clone-9", "Ticket - Clone (loss-policy=fence, revoked)" ], [ "ticket-clone-10", "Ticket - Clone (loss-policy=freeze, initial)" ], [ "ticket-clone-11", "Ticket - Clone (loss-policy=freeze, granted)" ], [ "ticket-clone-12", "Ticket - Clone (loss-policy=freeze, revoked)" ], [ "ticket-clone-13", "Ticket - Clone (loss-policy=stop, standby, granted)" ], [ "ticket-clone-14", "Ticket - Clone (loss-policy=stop, granted, standby)" ], [ "ticket-clone-15", "Ticket - Clone (loss-policy=stop, standby, revoked)" ], [ "ticket-clone-16", "Ticket - Clone (loss-policy=demote, standby, granted)" ], [ "ticket-clone-17", "Ticket - Clone (loss-policy=demote, granted, standby)" ], [ "ticket-clone-18", "Ticket - Clone (loss-policy=demote, standby, revoked)" ], [ "ticket-clone-19", "Ticket - Clone (loss-policy=fence, standby, granted)" ], [ "ticket-clone-20", "Ticket - Clone (loss-policy=fence, granted, standby)" ], [ "ticket-clone-21", "Ticket - Clone (loss-policy=fence, standby, revoked)" ], [ "ticket-clone-22", "Ticket - Clone (loss-policy=freeze, standby, granted)" ], [ "ticket-clone-23", "Ticket - Clone (loss-policy=freeze, granted, standby)" ], [ "ticket-clone-24", "Ticket - Clone (loss-policy=freeze, standby, revoked)" ], ], [ [ "ticket-promoted-1", "Ticket - Promoted (loss-policy=stop, initial)" ], [ "ticket-promoted-2", "Ticket - Promoted (loss-policy=stop, granted)" ], [ "ticket-promoted-3", "Ticket - Promoted (loss-policy-stop, revoked)" ], [ "ticket-promoted-4", "Ticket - Promoted (loss-policy=demote, initial)" ], [ "ticket-promoted-5", "Ticket - Promoted (loss-policy=demote, granted)" ], [ "ticket-promoted-6", "Ticket - Promoted (loss-policy=demote, revoked)" ], [ "ticket-promoted-7", "Ticket - Promoted (loss-policy=fence, initial)" ], [ "ticket-promoted-8", "Ticket - Promoted (loss-policy=fence, granted)" ], [ "ticket-promoted-9", "Ticket - Promoted (loss-policy=fence, revoked)" ], [ "ticket-promoted-10", "Ticket - Promoted (loss-policy=freeze, initial)" ], [ "ticket-promoted-11", "Ticket - Promoted (loss-policy=freeze, granted)" ], [ "ticket-promoted-12", "Ticket - Promoted (loss-policy=freeze, revoked)" ], [ "ticket-promoted-13", "Ticket - Promoted (loss-policy=stop, standby, granted)" ], [ "ticket-promoted-14", "Ticket - Promoted (loss-policy=stop, granted, standby)" ], [ "ticket-promoted-15", "Ticket - Promoted (loss-policy=stop, standby, revoked)" ], [ "ticket-promoted-16", "Ticket - Promoted (loss-policy=demote, standby, granted)" ], [ "ticket-promoted-17", "Ticket - Promoted (loss-policy=demote, granted, standby)" ], [ "ticket-promoted-18", "Ticket - Promoted (loss-policy=demote, standby, revoked)" ], [ "ticket-promoted-19", "Ticket - Promoted (loss-policy=fence, standby, granted)" ], [ "ticket-promoted-20", "Ticket - Promoted (loss-policy=fence, granted, standby)" ], [ "ticket-promoted-21", "Ticket - Promoted (loss-policy=fence, standby, revoked)" ], [ "ticket-promoted-22", "Ticket - Promoted (loss-policy=freeze, standby, granted)" ], [ "ticket-promoted-23", "Ticket - Promoted (loss-policy=freeze, granted, standby)" ], [ "ticket-promoted-24", "Ticket - Promoted (loss-policy=freeze, standby, revoked)" ], ], [ [ "ticket-rsc-sets-1", "Ticket - Resource sets (1 ticket, initial)" ], [ "ticket-rsc-sets-2", "Ticket - Resource sets (1 ticket, granted)" ], [ "ticket-rsc-sets-3", "Ticket - Resource sets (1 ticket, revoked)" ], [ "ticket-rsc-sets-4", "Ticket - Resource sets (2 tickets, initial)" ], [ "ticket-rsc-sets-5", "Ticket - Resource sets (2 tickets, granted)" ], [ "ticket-rsc-sets-6", "Ticket - Resource sets (2 tickets, granted)" ], [ "ticket-rsc-sets-7", "Ticket - Resource sets (2 tickets, revoked)" ], [ "ticket-rsc-sets-8", "Ticket - Resource sets (1 ticket, standby, granted)" ], [ "ticket-rsc-sets-9", "Ticket - Resource sets (1 ticket, granted, standby)" ], [ "ticket-rsc-sets-10", "Ticket - Resource sets (1 ticket, standby, revoked)" ], [ "ticket-rsc-sets-11", "Ticket - Resource sets (2 tickets, standby, granted)" ], [ "ticket-rsc-sets-12", "Ticket - Resource sets (2 tickets, standby, granted)" ], [ "ticket-rsc-sets-13", "Ticket - Resource sets (2 tickets, granted, standby)" ], [ "ticket-rsc-sets-14", "Ticket - Resource sets (2 tickets, standby, revoked)" ], [ "cluster-specific-params", "Cluster-specific instance attributes based on rules" ], [ "site-specific-params", "Site-specific instance attributes based on rules" ], ], [ [ "template-1", "Template - 1" ], [ "template-2", "Template - 2" ], [ "template-3", "Template - 3 (merge operations)" ], [ "template-coloc-1", "Template - Colocation 1" ], [ "template-coloc-2", "Template - Colocation 2" ], [ "template-coloc-3", "Template - Colocation 3" ], [ "template-order-1", "Template - Order 1" ], [ "template-order-2", "Template - Order 2" ], [ "template-order-3", "Template - Order 3" ], [ "template-ticket", "Template - Ticket" ], [ "template-rsc-sets-1", "Template - Resource Sets 1" ], [ "template-rsc-sets-2", "Template - Resource Sets 2" ], [ "template-rsc-sets-3", "Template - Resource Sets 3" ], [ "template-rsc-sets-4", "Template - Resource Sets 4" ], [ "template-clone-primitive", "Cloned primitive from template" ], [ "template-clone-group", "Cloned group from template" ], [ "location-sets-templates", "Resource sets and templates - Location" ], [ "tags-coloc-order-1", "Tags - Colocation and Order (Simple)" ], [ "tags-coloc-order-2", "Tags - Colocation and Order (Resource Sets with Templates)" ], [ "tags-location", "Tags - Location" ], [ "tags-ticket", "Tags - Ticket" ], ], [ [ "container-1", "Container - initial" ], [ "container-2", "Container - monitor failed" ], [ "container-3", "Container - stop failed" ], [ "container-4", "Container - reached migration-threshold" ], [ "container-group-1", "Container in group - initial" ], [ "container-group-2", "Container in group - monitor failed" ], [ "container-group-3", "Container in group - stop failed" ], [ "container-group-4", "Container in group - reached migration-threshold" ], [ "container-is-remote-node", "Place resource within container when container is remote-node" ], [ "bug-rh-1097457", "Kill user defined container/contents ordering" ], [ "bug-cl-5247", "Graph loop when recovering m/s resource in a container" ], [ "bundle-order-startup", "Bundle startup ordering" ], [ "bundle-order-partial-start", "Bundle startup ordering when some dependencies are already running" ], [ "bundle-order-partial-start-2", "Bundle startup ordering when some dependencies and the container are already running" ], [ "bundle-order-stop", "Bundle stop ordering" ], [ "bundle-order-partial-stop", "Bundle startup ordering when some dependencies are already stopped" ], [ "bundle-order-stop-on-remote", "Stop nested resource after bringing up the connection" ], [ "bundle-order-startup-clone", "Prevent startup because bundle isn't promoted" ], [ "bundle-order-startup-clone-2", "Bundle startup with clones" ], [ "bundle-order-stop-clone", "Stop bundle because clone is stopping" ], [ "bundle-interleave-start", "Interleave bundle starts" ], [ "bundle-interleave-promote", "Interleave bundle promotes" ], [ "bundle-nested-colocation", "Colocation of nested connection resources" ], [ "bundle-order-fencing", "Order pseudo bundle fencing after parent node fencing if both are happening" ], [ "bundle-probe-order-1", "order 1" ], [ "bundle-probe-order-2", "order 2" ], [ "bundle-probe-order-3", "order 3" ], [ "bundle-probe-remotes", "Ensure remotes get probed too" ], [ "bundle-replicas-change", "Change bundle from 1 replica to multiple" ], [ "bundle-connection-with-container", "Don't move a container due to connection preferences" ], [ "nested-remote-recovery", "Recover bundle's container hosted on remote node" ], ], [ [ "whitebox-fail1", "Fail whitebox container rsc" ], [ "whitebox-fail2", "Fail cluster connection to guest node" ], [ "whitebox-fail3", "Failed containers should not run nested on remote nodes" ], [ "whitebox-start", "Start whitebox container with resources assigned to it" ], [ "whitebox-stop", "Stop whitebox container with resources assigned to it" ], [ "whitebox-move", "Move whitebox container with resources assigned to it" ], [ "whitebox-asymmetric", "Verify connection rsc opts-in based on container resource" ], [ "whitebox-ms-ordering", "Verify promote/demote can not occur before connection is established" ], [ "whitebox-ms-ordering-move", "Stop/Start cycle within a moving container" ], [ "whitebox-orphaned", "Properly shutdown orphaned whitebox container" ], [ "whitebox-orphan-ms", "Properly tear down orphan ms resources on remote-nodes" ], [ "whitebox-unexpectedly-running", "Recover container nodes the cluster did not start" ], [ "whitebox-migrate1", "Migrate both container and connection resource" ], [ "whitebox-imply-stop-on-fence", "imply stop action on container node rsc when host node is fenced" ], [ "whitebox-nested-group", "Verify guest remote-node works nested in a group" ], [ "guest-node-host-dies", "Verify guest node is recovered if host goes away" ], [ "guest-node-cleanup", "Order guest node connection recovery after container probe" ], [ "guest-host-not-fenceable", "Actions on guest node are unrunnable if host is unclean and cannot be fenced" ], ], [ [ "remote-startup-probes", "Baremetal remote-node startup probes" ], [ "remote-startup", "Startup a newly discovered remote-nodes with no status" ], [ "remote-fence-unclean", "Fence unclean baremetal remote-node" ], [ "remote-fence-unclean2", "Fence baremetal remote-node after cluster node fails and connection can not be recovered" ], [ "remote-fence-unclean-3", "Probe failed remote nodes (triggers fencing)" ], [ "remote-move", "Move remote-node connection resource" ], [ "remote-disable", "Disable a baremetal remote-node" ], [ "remote-probe-disable", "Probe then stop a baremetal remote-node" ], [ "remote-orphaned", "Properly shutdown orphaned connection resource" ], [ "remote-orphaned2", "verify we can handle orphaned remote connections with active resources on the remote" ], [ "remote-recover", "Recover connection resource after cluster-node fails" ], [ "remote-stale-node-entry", "Make sure we properly handle leftover remote-node entries in the node section" ], [ "remote-partial-migrate", "Make sure partial migrations are handled before ops on the remote node" ], [ "remote-partial-migrate2", "Make sure partial migration target is prefered for remote connection" ], [ "remote-recover-fail", "Make sure start failure causes fencing if rsc are active on remote" ], [ "remote-start-fail", "Make sure a start failure does not result in fencing if no active resources are on remote" ], [ "remote-unclean2", "Make monitor failure always results in fencing, even if no rsc are active on remote" ], [ "remote-fence-before-reconnect", "Fence before clearing recurring monitor failure" ], [ "remote-recovery", "Recover remote connections before attempting demotion" ], [ "remote-recover-connection", "Optimistically recovery of only the connection" ], [ "remote-recover-all", "Fencing when the connection has no home" ], [ "remote-recover-no-resources", "Fencing when the connection has no home and no active resources" ], [ "remote-recover-unknown", "Fencing when the connection has no home and the remote has no operation history" ], [ "remote-reconnect-delay", "Waiting for remote reconnect interval to expire" ], [ "remote-connection-unrecoverable", "Remote connection host must be fenced, with connection unrecoverable" ], [ "remote-connection-shutdown", "Remote connection shutdown" ], [ "cancel-behind-moving-remote", "Route recurring monitor cancellations through original node of a moving remote connection" ], ], [ [ "resource-discovery", "Exercises resource-discovery location constraint option" ], [ "rsc-discovery-per-node", "Disable resource discovery per node" ], [ "shutdown-lock", "Ensure shutdown lock works properly" ], [ "shutdown-lock-expiration", "Ensure shutdown lock expiration works properly" ], ], [ [ "op-defaults", "Test op_defaults conditional expressions" ], [ "op-defaults-2", "Test op_defaults AND'ed conditional expressions" ], [ "op-defaults-3", "Test op_defaults precedence" ], [ "rsc-defaults", "Test rsc_defaults conditional expressions" ], [ "rsc-defaults-2", "Test rsc_defaults conditional expressions without type" ], ], [ [ "stop-all-resources", "Test stop-all-resources=true "], ], [ [ "ocf_degraded-remap-ocf_ok", "Test degraded remapped to OK" ], [ "ocf_degraded_promoted-remap-ocf_ok", "Test degraded promoted remapped to OK"], ], ] TESTS_64BIT = [ [ [ "year-2038", "Check handling of timestamps beyond 2038-01-19 03:14:08 UTC" ], ], ] def is_executable(path): """ Check whether a file at a given path is executable. """ try: return os.stat(path)[stat.ST_MODE] & stat.S_IXUSR except OSError: return False def diff(file1, file2, **kwargs): """ Call diff on two files """ return subprocess.call([ "diff", "-u", "-N", "--ignore-all-space", "--ignore-blank-lines", file1, file2 ], **kwargs) def sort_file(filename): """ Sort a file alphabetically """ with io.open(filename, "rt") as f: lines = sorted(f) with io.open(filename, "wt") as f: f.writelines(lines) def remove_files(filenames): """ Remove a list of files """ for filename in filenames: try: os.remove(filename) except OSError: pass def normalize(filename): """ Remove text from a file that isn't important for comparison """ if not hasattr(normalize, "patterns"): normalize.patterns = [ re.compile(r'crm_feature_set="[^"]*"'), re.compile(r'batch-limit="[0-9]*"') ] if os.path.isfile(filename): with io.open(filename, "rt") as f: lines = f.readlines() with io.open(filename, "wt") as f: for line in lines: for pattern in normalize.patterns: line = pattern.sub("", line) f.write(line) def cat(filename, dest=sys.stdout): """ Copy a file to a destination file descriptor """ with io.open(filename, "rt") as f: shutil.copyfileobj(f, dest) class CtsScheduler(object): """ Regression tests for Pacemaker's scheduler """ def _parse_args(self, argv): """ Parse command-line arguments """ parser = argparse.ArgumentParser(description=DESC) parser.add_argument('-V', '--verbose', action='count', help='Display any differences from expected output') parser.add_argument('--run', metavar='TEST', help=('Run only single specified test (any further ' 'arguments will be passed to crm_simulate)')) parser.add_argument('--update', action='store_true', help='Update expected results with actual results') parser.add_argument('-b', '--binary', metavar='PATH', help='Specify path to crm_simulate') parser.add_argument('-i', '--io-dir', metavar='PATH', help='Specify path to regression test data directory') parser.add_argument('-o', '--out-dir', metavar='PATH', help='Specify where intermediate and output files should go') parser.add_argument('-v', '--valgrind', action='store_true', help='Run all commands under valgrind') parser.add_argument('--valgrind-dhat', action='store_true', help='Run all commands under valgrind with heap analyzer') parser.add_argument('--valgrind-skip-output', action='store_true', help='If running under valgrind, do not display output') parser.add_argument('--testcmd-options', metavar='OPTIONS', default='', help='Additional options for command under test') # argparse can't handle "everything after --run TEST", so grab that self.single_test_args = [] narg = 0 for arg in argv: narg = narg + 1 if arg == '--run': (argv, self.single_test_args) = (argv[:narg+1], argv[narg+1:]) break self.args = parser.parse_args(argv[1:]) def _error(self, s): print(" * ERROR: %s" % s) def _failed(self, s): print(" * FAILED: %s" % s) def _get_valgrind_cmd(self): """ Return command arguments needed (or not) to run valgrind """ if self.args.valgrind: os.environ['G_SLICE'] = "always-malloc" return [ "valgrind", "-q", "--gen-suppressions=all", "--time-stamp=yes", "--trace-children=no", "--show-reachable=no", "--leak-check=full", "--num-callers=20", "--suppressions=%s/valgrind-pcmk.suppressions" % (self.test_home) ] if self.args.valgrind_dhat: os.environ['G_SLICE'] = "always-malloc" return [ "valgrind", "--tool=exp-dhat", "--time-stamp=yes", "--trace-children=no", "--show-top-n=100", "--num-callers=4" ] return [] def _get_simulator_cmd(self): """ Locate the simulation binary """ if self.args.binary is None: self.args.binary = BuildOptions._BUILD_DIR + "/tools/crm_simulate" if not is_executable(self.args.binary): self.args.binary = BuildOptions.SBIN_DIR + "/crm_simulate" if not is_executable(self.args.binary): # @TODO it would be more pythonic to raise an exception self._error("Test binary " + self.args.binary + " not found") sys.exit(ExitStatus.NOT_INSTALLED) return [ self.args.binary ] + shlex.split(self.args.testcmd_options) def set_schema_env(self): """ Ensure schema directory environment variable is set, if possible """ try: return os.environ['PCMK_schema_directory'] except KeyError: for d in [ os.path.join(BuildOptions._BUILD_DIR, "xml"), BuildOptions.SCHEMA_DIR ]: if os.path.isdir(d): os.environ['PCMK_schema_directory'] = d return d return None def __init__(self, argv=sys.argv): # Ensure all command output is in portable locale for comparison os.environ['LC_ALL'] = "C" self._parse_args(argv) # Where this executable lives self.test_home = os.path.dirname(os.path.realpath(argv[0])) # Where test data resides if self.args.io_dir is None: self.args.io_dir = os.path.join(self.test_home, "scheduler") self.xml_input_dir = os.path.join(self.args.io_dir, "xml") self.expected_dir = os.path.join(self.args.io_dir, "exp") self.dot_expected_dir = os.path.join(self.args.io_dir, "dot") self.scores_dir = os.path.join(self.args.io_dir, "scores") self.summary_dir = os.path.join(self.args.io_dir, "summary") self.stderr_expected_dir = os.path.join(self.args.io_dir, "stderr") # Create a temporary directory to store diff file self.failed_dir = tempfile.mkdtemp(prefix='cts-scheduler_') # Where to store generated files if self.args.out_dir is None: self.args.out_dir = self.args.io_dir self.failed_filename = os.path.join(self.failed_dir, "test-output.diff") else: self.failed_filename = os.path.join(self.args.out_dir, "test-output.diff") os.environ['CIB_shadow_dir'] = self.args.out_dir self.failed_file = None self.outfile_out_dir = os.path.join(self.args.out_dir, "out") self.dot_out_dir = os.path.join(self.args.out_dir, "dot") self.scores_out_dir = os.path.join(self.args.out_dir, "scores") self.summary_out_dir = os.path.join(self.args.out_dir, "summary") self.stderr_out_dir = os.path.join(self.args.out_dir, "stderr") self.valgrind_out_dir = os.path.join(self.args.out_dir, "valgrind") # Single test mode (if requested) try: # User can give test base name or file name of a test input self.args.run = os.path.splitext(os.path.basename(self.args.run))[0] except (AttributeError, TypeError): pass # --run was not specified self.set_schema_env() # Arguments needed (or not) to run commands self.valgrind_args = self._get_valgrind_cmd() self.simulate_args = self._get_simulator_cmd() # Test counters self.num_failed = 0 self.num_tests = 0 # Ensure that the main output directory exists # We don't want to create it with os.makedirs below if not os.path.isdir(self.args.out_dir): self._error("Output directory missing; can't create output files") sys.exit(ExitStatus.CANTCREAT) # Create output subdirectories if they don't exist try: os.makedirs(self.outfile_out_dir, 0o755, True) os.makedirs(self.dot_out_dir, 0o755, True) os.makedirs(self.scores_out_dir, 0o755, True) os.makedirs(self.summary_out_dir, 0o755, True) os.makedirs(self.stderr_out_dir, 0o755, True) if self.valgrind_args: os.makedirs(self.valgrind_out_dir, 0o755, True) except OSError as ex: self._error("Unable to create output subdirectory: %s" % ex) remove_files([ self.outfile_out_dir, self.dot_out_dir, self.scores_out_dir, self.summary_out_dir, self.stderr_out_dir, ]) sys.exit(ExitStatus.CANTCREAT) def _compare_files(self, filename1, filename2): """ Add any file differences to failed results """ if diff(filename1, filename2, stdout=subprocess.DEVNULL) != 0: diff(filename1, filename2, stdout=self.failed_file, stderr=subprocess.DEVNULL) self.failed_file.write("\n") return True return False def run_one(self, test_name, test_desc, test_args=[]): """ Run one scheduler test """ print(" Test %-25s %s" % ((test_name + ":"), test_desc)) did_fail = False self.num_tests = self.num_tests + 1 # Test inputs input_filename = os.path.join( self.xml_input_dir, "%s.xml" % test_name) expected_filename = os.path.join( self.expected_dir, "%s.exp" % test_name) dot_expected_filename = os.path.join( self.dot_expected_dir, "%s.dot" % test_name) scores_filename = os.path.join( self.scores_dir, "%s.scores" % test_name) summary_filename = os.path.join( self.summary_dir, "%s.summary" % test_name) stderr_expected_filename = os.path.join( self.stderr_expected_dir, "%s.stderr" % test_name) # (Intermediate) test outputs output_filename = os.path.join( self.outfile_out_dir, "%s.out" % test_name) dot_output_filename = os.path.join( self.dot_out_dir, "%s.dot.pe" % test_name) score_output_filename = os.path.join( self.scores_out_dir, "%s.scores.pe" % test_name) summary_output_filename = os.path.join( self.summary_out_dir, "%s.summary.pe" % test_name) stderr_output_filename = os.path.join( self.stderr_out_dir, "%s.stderr.pe" % test_name) valgrind_output_filename = os.path.join( self.valgrind_out_dir, "%s.valgrind" % test_name) # Common arguments for running test test_cmd = [] if self.valgrind_args: test_cmd = self.valgrind_args + [ "--log-file=%s" % valgrind_output_filename ] test_cmd = test_cmd + self.simulate_args # @TODO It would be more pythonic to raise exceptions for errors, # then perhaps it would be nice to make a single-test class # Ensure necessary test inputs exist if not os.path.isfile(input_filename): self._error("No input") self.num_failed = self.num_failed + 1 return ExitStatus.NOINPUT if not self.args.update and not os.path.isfile(expected_filename): self._error("no stored output") return ExitStatus.NOINPUT # Run simulation to generate summary output if self.args.run: # Single test mode test_cmd_full = test_cmd + [ '-x', input_filename, '-S' ] + test_args print(" ".join(test_cmd_full)) else: # @TODO Why isn't test_args added here? test_cmd_full = test_cmd + [ '-x', input_filename, '-S' ] with io.open(summary_output_filename, "wt") as f: simulation = subprocess.Popen(test_cmd_full, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, env=os.environ) # This makes diff happy regardless of --enable-compat-2.0. # Use sed -E to make Linux and BSD special characters more compatible. sed = subprocess.Popen(["sed", "-E", "-e", "s/ocf::/ocf:/g", "-e", r"s/Masters:/Promoted:/", "-e", r"s/Slaves:/Unpromoted:/", "-e", r"s/ Master( |\[|$)/ Promoted\1/", "-e", r"s/ Slave / Unpromoted /", ], stdin=simulation.stdout, stdout=f, stderr=subprocess.STDOUT) simulation.stdout.close() sed.communicate() if self.args.run: cat(summary_output_filename) # Re-run simulation to generate dot, graph, and scores test_cmd_full = test_cmd + [ '-x', input_filename, '-D', dot_output_filename, '-G', output_filename, '-sSQ' ] + test_args with io.open(stderr_output_filename, "wt") as f_stderr, \ io.open(score_output_filename, "wt") as f_score: rc = subprocess.call(test_cmd_full, stdout=f_score, stderr=f_stderr, env=os.environ) # Check for test command failure if rc != ExitStatus.OK: self._failed("Test returned: %d" % rc) did_fail = True print(" ".join(test_cmd_full)) # Check for valgrind errors if self.valgrind_args and not self.args.valgrind_skip_output: if os.stat(valgrind_output_filename).st_size > 0: self._failed("Valgrind reported errors") did_fail = True cat(valgrind_output_filename) remove_files([ valgrind_output_filename ]) # Check for core dump if os.path.isfile("core"): self._failed("Core-file detected: core." + test_name) did_fail = True os.rename("core", "%s/core.%s" % (self.test_home, test_name)) # Check any stderr output if os.path.isfile(stderr_expected_filename): if self._compare_files(stderr_expected_filename, stderr_output_filename): self._failed("stderr changed") did_fail = True elif os.stat(stderr_output_filename).st_size > 0: self._failed("Output was written to stderr") did_fail = True cat(stderr_output_filename) remove_files([ stderr_output_filename ]) # Check whether output graph exists, and normalize it if (not os.path.isfile(output_filename) or os.stat(output_filename).st_size == 0): self._error("No graph produced") did_fail = True self.num_failed = self.num_failed + 1 remove_files([ output_filename ]) return ExitStatus.ERROR normalize(output_filename) # Check whether dot output exists, and sort it if (not os.path.isfile(dot_output_filename) or os.stat(dot_output_filename).st_size == 0): self._error("No dot-file summary produced") did_fail = True self.num_failed = self.num_failed + 1 remove_files([ dot_output_filename, output_filename ]) return ExitStatus.ERROR with io.open(dot_output_filename, "rt") as f: first_line = f.readline() # "digraph" line with opening brace lines = f.readlines() last_line = lines[-1] # closing brace del lines[-1] lines = sorted(set(lines)) # unique sort with io.open(dot_output_filename, "wt") as f: f.write(first_line) f.writelines(lines) f.write(last_line) # Check whether score output exists, and sort it if (not os.path.isfile(score_output_filename) or os.stat(score_output_filename).st_size == 0): self._error("No allocation scores produced") did_fail = True self.num_failed = self.num_failed + 1 remove_files([ score_output_filename, output_filename ]) return ExitStatus.ERROR else: sort_file(score_output_filename) if self.args.update: shutil.copyfile(output_filename, expected_filename) shutil.copyfile(dot_output_filename, dot_expected_filename) shutil.copyfile(score_output_filename, scores_filename) shutil.copyfile(summary_output_filename, summary_filename) print(" Updated expected outputs") if self._compare_files(summary_filename, summary_output_filename): self._failed("summary changed") did_fail = True if self._compare_files(dot_expected_filename, dot_output_filename): self._failed("dot-file summary changed") did_fail = True else: remove_files([ dot_output_filename ]) if self._compare_files(expected_filename, output_filename): self._failed("xml-file changed") did_fail = True if self._compare_files(scores_filename, score_output_filename): self._failed("scores-file changed") did_fail = True remove_files([ output_filename, dot_output_filename, score_output_filename, summary_output_filename]) if did_fail: self.num_failed = self.num_failed + 1 return ExitStatus.ERROR return ExitStatus.OK def run_all(self): """ Run all defined tests """ if platform.architecture()[0] == "64bit": TESTS.extend(TESTS_64BIT) for group in TESTS: for test in group: try: args = test[2] except IndexError: args = [] self.run_one(test[0], test[1], args) print() def _print_summary(self): """ Print a summary of parameters for this test run """ print("Test home is:\t" + self.test_home) print("Test binary is:\t" + self.args.binary) if 'PCMK_schema_directory' in os.environ: print("Schema home is:\t" + os.environ['PCMK_schema_directory']) if self.valgrind_args != []: print("Activating memory testing with valgrind") print() def _test_results(self): if self.num_failed == 0: shutil.rmtree(self.failed_dir) return ExitStatus.OK if os.path.isfile(self.failed_filename) and os.stat(self.failed_filename).st_size != 0: if self.args.verbose: self._error("Results of %d failed tests (out of %d):" % (self.num_failed, self.num_tests)) cat(self.failed_filename) else: self._error("Results of %d failed tests (out of %d) are in %s" % (self.num_failed, self.num_tests, self.failed_filename)) self._error("Use -V to display them after running the tests") else: self._error("%d (of %d) tests failed (no diff results)" % (self.num_failed, self.num_tests)) if os.path.isfile(self.failed_filename): shutil.rmtree(self.failed_dir) return ExitStatus.ERROR def run(self): """ Run test(s) as specified """ # Check for pre-existing core so we don't think it's from us if os.path.exists("core"): self._failed("Can't run with core already present in " + self.test_home) return ExitStatus.OSFILE self._print_summary() # Zero out the error log self.failed_file = io.open(self.failed_filename, "wt") if self.args.run is None: print("Performing the following tests from " + self.args.io_dir) print() self.run_all() print() self.failed_file.close() rc = self._test_results() else: rc = self.run_one(self.args.run, "Single shot", self.single_test_args) self.failed_file.close() if self.num_failed > 0: print("\nFailures:\nThese have also been written to: " + self.failed_filename + "\n") cat(self.failed_filename) shutil.rmtree(self.failed_dir) return rc if __name__ == "__main__": sys.exit(CtsScheduler().run()) # vim: set filetype=python expandtab tabstop=4 softtabstop=4 shiftwidth=4 textwidth=120: diff --git a/cts/scheduler/dot/node-pending-timeout.dot b/cts/scheduler/dot/node-pending-timeout.dot new file mode 100644 index 0000000000..c808f7ea07 --- /dev/null +++ b/cts/scheduler/dot/node-pending-timeout.dot @@ -0,0 +1,7 @@ + digraph "g" { +"st-sbd_monitor_0 node-1" -> "st-sbd_start_0 node-1" [ style = bold] +"st-sbd_monitor_0 node-1" [ style=bold color="green" fontcolor="black"] +"st-sbd_start_0 node-1" [ style=bold color="green" fontcolor="black"] +"stonith 'reboot' node-2" -> "st-sbd_start_0 node-1" [ style = bold] +"stonith 'reboot' node-2" [ style=bold color="green" fontcolor="black"] +} diff --git a/cts/scheduler/dot/pending-node-no-uname.dot b/cts/scheduler/dot/pending-node-no-uname.dot new file mode 100644 index 0000000000..98783ca789 --- /dev/null +++ b/cts/scheduler/dot/pending-node-no-uname.dot @@ -0,0 +1,7 @@ + digraph "g" { +"st-sbd_monitor_0 node-1" -> "st-sbd_start_0 node-1" [ style = dashed] +"st-sbd_monitor_0 node-1" [ style=bold color="green" fontcolor="black"] +"st-sbd_monitor_0 node-2" -> "st-sbd_start_0 node-1" [ style = dashed] +"st-sbd_monitor_0 node-2" [ style=dashed color="red" fontcolor="black"] +"st-sbd_start_0 node-1" [ style=dashed color="red" fontcolor="black"] +} diff --git a/cts/scheduler/exp/node-pending-timeout.exp b/cts/scheduler/exp/node-pending-timeout.exp new file mode 100644 index 0000000000..e94812f031 --- /dev/null +++ b/cts/scheduler/exp/node-pending-timeout.exp @@ -0,0 +1,38 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/cts/scheduler/exp/pending-node-no-uname.exp b/cts/scheduler/exp/pending-node-no-uname.exp new file mode 100644 index 0000000000..2c45756b51 --- /dev/null +++ b/cts/scheduler/exp/pending-node-no-uname.exp @@ -0,0 +1,11 @@ + + + + + + + + + + + diff --git a/cts/scheduler/scores/node-pending-timeout.scores b/cts/scheduler/scores/node-pending-timeout.scores new file mode 100644 index 0000000000..90a7c8bd4e --- /dev/null +++ b/cts/scheduler/scores/node-pending-timeout.scores @@ -0,0 +1,3 @@ + +pcmk__primitive_assign: st-sbd allocation score on node-1: 0 +pcmk__primitive_assign: st-sbd allocation score on node-2: 0 diff --git a/cts/scheduler/scores/pending-node-no-uname.scores b/cts/scheduler/scores/pending-node-no-uname.scores new file mode 100644 index 0000000000..90a7c8bd4e --- /dev/null +++ b/cts/scheduler/scores/pending-node-no-uname.scores @@ -0,0 +1,3 @@ + +pcmk__primitive_assign: st-sbd allocation score on node-1: 0 +pcmk__primitive_assign: st-sbd allocation score on node-2: 0 diff --git a/cts/scheduler/summary/node-pending-timeout.summary b/cts/scheduler/summary/node-pending-timeout.summary new file mode 100644 index 0000000000..0fef9823ea --- /dev/null +++ b/cts/scheduler/summary/node-pending-timeout.summary @@ -0,0 +1,26 @@ +Using the original execution date of: 2023-02-21 12:19:57Z +Current cluster status: + * Node List: + * Node node-2: UNCLEAN (online) + * Online: [ node-1 ] + + * Full List of Resources: + * st-sbd (stonith:external/sbd): Stopped + +Transition Summary: + * Fence (reboot) node-2 'peer pending timed out on joining the process group' + * Start st-sbd ( node-1 ) + +Executing Cluster Transition: + * Resource action: st-sbd monitor on node-1 + * Fencing node-2 (reboot) + * Resource action: st-sbd start on node-1 +Using the original execution date of: 2023-02-21 12:19:57Z + +Revised Cluster Status: + * Node List: + * Online: [ node-1 ] + * OFFLINE: [ node-2 ] + + * Full List of Resources: + * st-sbd (stonith:external/sbd): Started node-1 diff --git a/cts/scheduler/summary/pending-node-no-uname.summary b/cts/scheduler/summary/pending-node-no-uname.summary new file mode 100644 index 0000000000..5f04fc6453 --- /dev/null +++ b/cts/scheduler/summary/pending-node-no-uname.summary @@ -0,0 +1,23 @@ +Using the original execution date of: 2023-02-21 12:19:57Z +Current cluster status: + * Node List: + * Node node-2: pending + * Online: [ node-1 ] + + * Full List of Resources: + * st-sbd (stonith:external/sbd): Stopped + +Transition Summary: + * Start st-sbd ( node-1 ) blocked + +Executing Cluster Transition: + * Resource action: st-sbd monitor on node-1 +Using the original execution date of: 2023-02-21 12:19:57Z + +Revised Cluster Status: + * Node List: + * Node node-2: pending + * Online: [ node-1 ] + + * Full List of Resources: + * st-sbd (stonith:external/sbd): Stopped diff --git a/cts/scheduler/xml/node-pending-timeout.xml b/cts/scheduler/xml/node-pending-timeout.xml new file mode 100644 index 0000000000..b4c3614dea --- /dev/null +++ b/cts/scheduler/xml/node-pending-timeout.xml @@ -0,0 +1,27 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/cts/scheduler/xml/pending-node-no-uname.xml b/cts/scheduler/xml/pending-node-no-uname.xml new file mode 100644 index 0000000000..d1b3664217 --- /dev/null +++ b/cts/scheduler/xml/pending-node-no-uname.xml @@ -0,0 +1,26 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/daemons/attrd/attrd_corosync.c b/daemons/attrd/attrd_corosync.c index ef205e6e13..086ec03d95 100644 --- a/daemons/attrd/attrd_corosync.c +++ b/daemons/attrd/attrd_corosync.c @@ -1,620 +1,620 @@ /* * Copyright 2013-2023 the Pacemaker project contributors * * The version control history for this file may have further details. * * This source code is licensed under the GNU General Public License version 2 * or later (GPLv2+) WITHOUT ANY WARRANTY. */ #include #include #include #include #include #include #include #include #include #include #include #include "pacemaker-attrd.h" extern crm_exit_t attrd_exit_status; static xmlNode * attrd_confirmation(int callid) { xmlNode *node = create_xml_node(NULL, __func__); crm_xml_add(node, F_TYPE, T_ATTRD); crm_xml_add(node, F_ORIG, get_local_node_name()); crm_xml_add(node, PCMK__XA_TASK, PCMK__ATTRD_CMD_CONFIRM); crm_xml_add_int(node, XML_LRM_ATTR_CALLID, callid); return node; } static void attrd_peer_message(crm_node_t *peer, xmlNode *xml) { const char *election_op = crm_element_value(xml, F_CRM_TASK); if (election_op) { attrd_handle_election_op(peer, xml); return; } if (attrd_shutting_down()) { /* If we're shutting down, we want to continue responding to election * ops as long as we're a cluster member (because our vote may be * needed). Ignore all other messages. */ return; } else { pcmk__request_t request = { .ipc_client = NULL, .ipc_id = 0, .ipc_flags = 0, .peer = peer->uname, .xml = xml, .call_options = 0, .result = PCMK__UNKNOWN_RESULT, }; request.op = crm_element_value_copy(request.xml, PCMK__XA_TASK); CRM_CHECK(request.op != NULL, return); attrd_handle_request(&request); /* Having finished handling the request, check to see if the originating * peer requested confirmation. If so, send that confirmation back now. */ if (pcmk__xe_attr_is_true(xml, PCMK__XA_CONFIRM) && !pcmk__str_eq(request.op, PCMK__ATTRD_CMD_CONFIRM, pcmk__str_none)) { int callid = 0; xmlNode *reply = NULL; /* Add the confirmation ID for the message we are confirming to the * response so the originating peer knows what they're a confirmation * for. */ crm_element_value_int(xml, XML_LRM_ATTR_CALLID, &callid); reply = attrd_confirmation(callid); /* And then send the confirmation back to the originating peer. This * ends up right back in this same function (attrd_peer_message) on the * peer where it will have to do something with a PCMK__XA_CONFIRM type * message. */ crm_debug("Sending %s a confirmation", peer->uname); attrd_send_message(peer, reply, false); free_xml(reply); } pcmk__reset_request(&request); } } static void attrd_cpg_dispatch(cpg_handle_t handle, const struct cpg_name *groupName, uint32_t nodeid, uint32_t pid, void *msg, size_t msg_len) { uint32_t kind = 0; xmlNode *xml = NULL; const char *from = NULL; char *data = pcmk_message_common_cs(handle, nodeid, pid, msg, &kind, &from); if(data == NULL) { return; } if (kind == crm_class_cluster) { xml = string2xml(data); } if (xml == NULL) { crm_err("Bad message of class %d received from %s[%u]: '%.120s'", kind, from, nodeid, data); } else { crm_node_t *peer = crm_get_peer(nodeid, from); attrd_peer_message(peer, xml); } free_xml(xml); free(data); } static void attrd_cpg_destroy(gpointer unused) { if (attrd_shutting_down()) { crm_info("Corosync disconnection complete"); } else { crm_crit("Lost connection to cluster layer, shutting down"); attrd_exit_status = CRM_EX_DISCONNECT; attrd_shutdown(0); } } /*! * \internal * \brief Override an attribute sync with a local value * * Broadcast the local node's value for an attribute that's different from the * value provided in a peer's attribute synchronization response. This ensures a * node's values for itself take precedence and all peers are kept in sync. * * \param[in] a Attribute entry to override * * \return Local instance of attribute value */ static attribute_value_t * broadcast_local_value(const attribute_t *a) { attribute_value_t *v = g_hash_table_lookup(a->values, attrd_cluster->uname); xmlNode *sync = create_xml_node(NULL, __func__); crm_xml_add(sync, PCMK__XA_TASK, PCMK__ATTRD_CMD_SYNC_RESPONSE); attrd_add_value_xml(sync, a, v, false); attrd_send_message(NULL, sync, false); free_xml(sync); return v; } /*! * \internal * \brief Ensure a Pacemaker Remote node is in the correct peer cache * * \param[in] node_name Name of Pacemaker Remote node to check */ static void cache_remote_node(const char *node_name) { /* If we previously assumed this node was an unseen cluster node, * remove its entry from the cluster peer cache. */ - crm_node_t *dup = pcmk__search_cluster_node_cache(0, node_name); + crm_node_t *dup = pcmk__search_cluster_node_cache(0, node_name, NULL); if (dup && (dup->uuid == NULL)) { reap_crm_member(0, node_name); } // Ensure node is in the remote peer cache CRM_ASSERT(crm_remote_peer_get(node_name) != NULL); } #define state_text(state) pcmk__s((state), "in unknown state") /*! * \internal * \brief Return host's hash table entry (creating one if needed) * * \param[in,out] values Hash table of values * \param[in] host Name of peer to look up * \param[in] xml XML describing the attribute * * \return Pointer to new or existing hash table entry */ static attribute_value_t * attrd_lookup_or_create_value(GHashTable *values, const char *host, const xmlNode *xml) { attribute_value_t *v = g_hash_table_lookup(values, host); int is_remote = 0; crm_element_value_int(xml, PCMK__XA_ATTR_IS_REMOTE, &is_remote); if (is_remote) { cache_remote_node(host); } if (v == NULL) { v = calloc(1, sizeof(attribute_value_t)); CRM_ASSERT(v != NULL); pcmk__str_update(&v->nodename, host); v->is_remote = is_remote; g_hash_table_replace(values, v->nodename, v); } return(v); } static void attrd_peer_change_cb(enum crm_status_type kind, crm_node_t *peer, const void *data) { bool gone = false; bool is_remote = pcmk_is_set(peer->flags, crm_remote_node); switch (kind) { case crm_status_uname: crm_debug("%s node %s is now %s", (is_remote? "Remote" : "Cluster"), peer->uname, state_text(peer->state)); break; case crm_status_processes: if (!pcmk_is_set(peer->processes, crm_get_cluster_proc())) { gone = true; } crm_debug("Node %s is %s a peer", peer->uname, (gone? "no longer" : "now")); break; case crm_status_nstate: crm_debug("%s node %s is now %s (was %s)", (is_remote? "Remote" : "Cluster"), peer->uname, state_text(peer->state), state_text(data)); if (pcmk__str_eq(peer->state, CRM_NODE_MEMBER, pcmk__str_casei)) { /* If we're the writer, send new peers a list of all attributes * (unless it's a remote node, which doesn't run its own attrd) */ if (attrd_election_won() && !pcmk_is_set(peer->flags, crm_remote_node)) { attrd_peer_sync(peer, NULL); } } else { // Remove all attribute values associated with lost nodes attrd_peer_remove(peer->uname, false, "loss"); gone = true; } break; } // Remove votes from cluster nodes that leave, in case election in progress if (gone && !is_remote) { attrd_remove_voter(peer); attrd_remove_peer_protocol_ver(peer->uname); attrd_do_not_expect_from_peer(peer->uname); // Ensure remote nodes that come up are in the remote node cache } else if (!gone && is_remote) { cache_remote_node(peer->uname); } } static void record_peer_nodeid(attribute_value_t *v, const char *host) { crm_node_t *known_peer = crm_get_peer(v->nodeid, host); crm_trace("Learned %s has node id %s", known_peer->uname, known_peer->uuid); if (attrd_election_won()) { attrd_write_attributes(false, false); } } static void update_attr_on_host(attribute_t *a, const crm_node_t *peer, const xmlNode *xml, const char *attr, const char *value, const char *host, bool filter, int is_force_write) { attribute_value_t *v = NULL; v = attrd_lookup_or_create_value(a->values, host, xml); if (filter && !pcmk__str_eq(v->current, value, pcmk__str_casei) && pcmk__str_eq(host, attrd_cluster->uname, pcmk__str_casei)) { crm_notice("%s[%s]: local value '%s' takes priority over '%s' from %s", attr, host, v->current, value, peer->uname); v = broadcast_local_value(a); } else if (!pcmk__str_eq(v->current, value, pcmk__str_casei)) { crm_notice("Setting %s[%s]%s%s: %s -> %s " CRM_XS " from %s with %s write delay", attr, host, a->set_type ? " in " : "", pcmk__s(a->set_type, ""), pcmk__s(v->current, "(unset)"), pcmk__s(value, "(unset)"), peer->uname, (a->timeout_ms == 0)? "no" : pcmk__readable_interval(a->timeout_ms)); pcmk__str_update(&v->current, value); a->changed = true; if (pcmk__str_eq(host, attrd_cluster->uname, pcmk__str_casei) && pcmk__str_eq(attr, XML_CIB_ATTR_SHUTDOWN, pcmk__str_none)) { if (!pcmk__str_eq(value, "0", pcmk__str_null_matches)) { attrd_set_requesting_shutdown(); } else { attrd_clear_requesting_shutdown(); } } // Write out new value or start dampening timer if (a->timeout_ms && a->timer) { crm_trace("Delayed write out (%dms) for %s", a->timeout_ms, attr); mainloop_timer_start(a->timer); } else { attrd_write_or_elect_attribute(a); } } else { if (is_force_write == 1 && a->timeout_ms && a->timer) { /* Save forced writing and set change flag. */ /* The actual attribute is written by Writer after election. */ crm_trace("Unchanged %s[%s] from %s is %s(Set the forced write flag)", attr, host, peer->uname, value); a->force_write = TRUE; } else { crm_trace("Unchanged %s[%s] from %s is %s", attr, host, peer->uname, value); } } /* Set the seen flag for attribute processing held only in the own node. */ v->seen = TRUE; /* If this is a cluster node whose node ID we are learning, remember it */ if ((v->nodeid == 0) && (v->is_remote == FALSE) && (crm_element_value_int(xml, PCMK__XA_ATTR_NODE_ID, (int*)&v->nodeid) == 0) && (v->nodeid > 0)) { record_peer_nodeid(v, host); } } static void attrd_peer_update_one(const crm_node_t *peer, xmlNode *xml, bool filter) { attribute_t *a = NULL; const char *attr = crm_element_value(xml, PCMK__XA_ATTR_NAME); const char *value = crm_element_value(xml, PCMK__XA_ATTR_VALUE); const char *host = crm_element_value(xml, PCMK__XA_ATTR_NODE_NAME); int is_force_write = 0; if (attr == NULL) { crm_warn("Could not update attribute: peer did not specify name"); return; } crm_element_value_int(xml, PCMK__XA_ATTR_FORCE, &is_force_write); a = attrd_populate_attribute(xml, attr); if (a == NULL) { return; } if (host == NULL) { // If no host was specified, update all hosts GHashTableIter vIter; crm_debug("Setting %s for all hosts to %s", attr, value); xml_remove_prop(xml, PCMK__XA_ATTR_NODE_ID); g_hash_table_iter_init(&vIter, a->values); while (g_hash_table_iter_next(&vIter, (gpointer *) & host, NULL)) { update_attr_on_host(a, peer, xml, attr, value, host, filter, is_force_write); } } else { // Update attribute value for the given host update_attr_on_host(a, peer, xml, attr, value, host, filter, is_force_write); } /* If this is a message from some attrd instance broadcasting its protocol * version, check to see if it's a new minimum version. */ if (pcmk__str_eq(attr, CRM_ATTR_PROTOCOL, pcmk__str_none)) { attrd_update_minimum_protocol_ver(peer->uname, value); } } static void broadcast_unseen_local_values(void) { GHashTableIter aIter; GHashTableIter vIter; attribute_t *a = NULL; attribute_value_t *v = NULL; xmlNode *sync = NULL; g_hash_table_iter_init(&aIter, attributes); while (g_hash_table_iter_next(&aIter, NULL, (gpointer *) & a)) { g_hash_table_iter_init(&vIter, a->values); while (g_hash_table_iter_next(&vIter, NULL, (gpointer *) & v)) { if (!(v->seen) && pcmk__str_eq(v->nodename, attrd_cluster->uname, pcmk__str_casei)) { if (sync == NULL) { sync = create_xml_node(NULL, __func__); crm_xml_add(sync, PCMK__XA_TASK, PCMK__ATTRD_CMD_SYNC_RESPONSE); } attrd_add_value_xml(sync, a, v, a->timeout_ms && a->timer); } } } if (sync != NULL) { crm_debug("Broadcasting local-only values"); attrd_send_message(NULL, sync, false); free_xml(sync); } } int attrd_cluster_connect(void) { attrd_cluster = pcmk_cluster_new(); attrd_cluster->destroy = attrd_cpg_destroy; attrd_cluster->cpg.cpg_deliver_fn = attrd_cpg_dispatch; attrd_cluster->cpg.cpg_confchg_fn = pcmk_cpg_membership; crm_set_status_callback(&attrd_peer_change_cb); if (crm_cluster_connect(attrd_cluster) == FALSE) { crm_err("Cluster connection failed"); return -ENOTCONN; } return pcmk_ok; } void attrd_peer_clear_failure(pcmk__request_t *request) { xmlNode *xml = request->xml; const char *rsc = crm_element_value(xml, PCMK__XA_ATTR_RESOURCE); const char *host = crm_element_value(xml, PCMK__XA_ATTR_NODE_NAME); const char *op = crm_element_value(xml, PCMK__XA_ATTR_OPERATION); const char *interval_spec = crm_element_value(xml, PCMK__XA_ATTR_INTERVAL); guint interval_ms = crm_parse_interval_spec(interval_spec); char *attr = NULL; GHashTableIter iter; regex_t regex; crm_node_t *peer = crm_get_peer(0, request->peer); if (attrd_failure_regex(®ex, rsc, op, interval_ms) != pcmk_ok) { crm_info("Ignoring invalid request to clear failures for %s", pcmk__s(rsc, "all resources")); return; } crm_xml_add(xml, PCMK__XA_TASK, PCMK__ATTRD_CMD_UPDATE); /* Make sure value is not set, so we delete */ if (crm_element_value(xml, PCMK__XA_ATTR_VALUE)) { crm_xml_replace(xml, PCMK__XA_ATTR_VALUE, NULL); } g_hash_table_iter_init(&iter, attributes); while (g_hash_table_iter_next(&iter, (gpointer *) &attr, NULL)) { if (regexec(®ex, attr, 0, NULL, 0) == 0) { crm_trace("Matched %s when clearing %s", attr, pcmk__s(rsc, "all resources")); crm_xml_add(xml, PCMK__XA_ATTR_NAME, attr); attrd_peer_update(peer, xml, host, false); } } regfree(®ex); } /*! * \internal * \brief Load attributes from a peer sync response * * \param[in] peer Peer that sent clear request * \param[in] peer_won Whether peer is the attribute writer * \param[in,out] xml Request XML */ void attrd_peer_sync_response(const crm_node_t *peer, bool peer_won, xmlNode *xml) { crm_info("Processing " PCMK__ATTRD_CMD_SYNC_RESPONSE " from %s", peer->uname); if (peer_won) { /* Initialize the "seen" flag for all attributes to cleared, so we can * detect attributes that local node has but the writer doesn't. */ attrd_clear_value_seen(); } // Process each attribute update in the sync response for (xmlNode *child = pcmk__xml_first_child(xml); child != NULL; child = pcmk__xml_next(child)) { attrd_peer_update(peer, child, crm_element_value(child, PCMK__XA_ATTR_NODE_NAME), true); } if (peer_won) { /* If any attributes are still not marked as seen, the writer doesn't * know about them, so send all peers an update with them. */ broadcast_unseen_local_values(); } } /*! * \internal * \brief Remove all attributes and optionally peer cache entries for a node * * \param[in] host Name of node to purge * \param[in] uncache If true, remove node from peer caches * \param[in] source Who requested removal (only used for logging) */ void attrd_peer_remove(const char *host, bool uncache, const char *source) { attribute_t *a = NULL; GHashTableIter aIter; CRM_CHECK(host != NULL, return); crm_notice("Removing all %s attributes for peer %s", host, source); g_hash_table_iter_init(&aIter, attributes); while (g_hash_table_iter_next(&aIter, NULL, (gpointer *) & a)) { if(g_hash_table_remove(a->values, host)) { crm_debug("Removed %s[%s] for peer %s", a->id, host, source); } } if (uncache) { crm_remote_peer_cache_remove(host); reap_crm_member(0, host); } } void attrd_peer_sync(crm_node_t *peer, xmlNode *xml) { GHashTableIter aIter; GHashTableIter vIter; attribute_t *a = NULL; attribute_value_t *v = NULL; xmlNode *sync = create_xml_node(NULL, __func__); crm_xml_add(sync, PCMK__XA_TASK, PCMK__ATTRD_CMD_SYNC_RESPONSE); g_hash_table_iter_init(&aIter, attributes); while (g_hash_table_iter_next(&aIter, NULL, (gpointer *) & a)) { g_hash_table_iter_init(&vIter, a->values); while (g_hash_table_iter_next(&vIter, NULL, (gpointer *) & v)) { crm_debug("Syncing %s[%s] = %s to %s", a->id, v->nodename, v->current, peer?peer->uname:"everyone"); attrd_add_value_xml(sync, a, v, false); } } crm_debug("Syncing values to %s", peer?peer->uname:"everyone"); attrd_send_message(peer, sync, false); free_xml(sync); } void attrd_peer_update(const crm_node_t *peer, xmlNode *xml, const char *host, bool filter) { bool handle_sync_point = false; if (xml_has_children(xml)) { for (xmlNode *child = first_named_child(xml, XML_ATTR_OP); child != NULL; child = crm_next_same_xml(child)) { attrd_copy_xml_attributes(xml, child); attrd_peer_update_one(peer, child, filter); if (attrd_request_has_sync_point(child)) { handle_sync_point = true; } } } else { attrd_peer_update_one(peer, xml, filter); if (attrd_request_has_sync_point(xml)) { handle_sync_point = true; } } /* If the update XML specified that the client wanted to wait for a sync * point, process that now. */ if (handle_sync_point) { crm_trace("Hit local sync point for attribute update"); attrd_ack_waitlist_clients(attrd_sync_point_local, xml); } } diff --git a/daemons/attrd/attrd_ipc.c b/daemons/attrd/attrd_ipc.c index 9d3dfff23a..8b7040b57f 100644 --- a/daemons/attrd/attrd_ipc.c +++ b/daemons/attrd/attrd_ipc.c @@ -1,628 +1,629 @@ /* * Copyright 2004-2023 the Pacemaker project contributors * * The version control history for this file may have further details. * * This source code is licensed under the GNU General Public License version 2 * or later (GPLv2+) WITHOUT ANY WARRANTY. */ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "pacemaker-attrd.h" static qb_ipcs_service_t *ipcs = NULL; /*! * \internal * \brief Build the XML reply to a client query * * param[in] attr Name of requested attribute * param[in] host Name of requested host (or NULL for all hosts) * * \return New XML reply * \note Caller is responsible for freeing the resulting XML */ static xmlNode *build_query_reply(const char *attr, const char *host) { xmlNode *reply = create_xml_node(NULL, __func__); attribute_t *a; if (reply == NULL) { return NULL; } crm_xml_add(reply, F_TYPE, T_ATTRD); crm_xml_add(reply, F_SUBTYPE, PCMK__ATTRD_CMD_QUERY); crm_xml_add(reply, PCMK__XA_ATTR_VERSION, ATTRD_PROTOCOL_VERSION); /* If desired attribute exists, add its value(s) to the reply */ a = g_hash_table_lookup(attributes, attr); if (a) { attribute_value_t *v; xmlNode *host_value; crm_xml_add(reply, PCMK__XA_ATTR_NAME, attr); /* Allow caller to use "localhost" to refer to local node */ if (pcmk__str_eq(host, "localhost", pcmk__str_casei)) { host = attrd_cluster->uname; crm_trace("Mapped localhost to %s", host); } /* If a specific node was requested, add its value */ if (host) { v = g_hash_table_lookup(a->values, host); host_value = create_xml_node(reply, XML_CIB_TAG_NODE); if (host_value == NULL) { free_xml(reply); return NULL; } pcmk__xe_add_node(host_value, host, 0); crm_xml_add(host_value, PCMK__XA_ATTR_VALUE, (v? v->current : NULL)); /* Otherwise, add all nodes' values */ } else { GHashTableIter iter; g_hash_table_iter_init(&iter, a->values); while (g_hash_table_iter_next(&iter, NULL, (gpointer *) &v)) { host_value = create_xml_node(reply, XML_CIB_TAG_NODE); if (host_value == NULL) { free_xml(reply); return NULL; } pcmk__xe_add_node(host_value, v->nodename, 0); crm_xml_add(host_value, PCMK__XA_ATTR_VALUE, v->current); } } } return reply; } xmlNode * attrd_client_clear_failure(pcmk__request_t *request) { xmlNode *xml = request->xml; const char *rsc, *op, *interval_spec; if (minimum_protocol_version >= 2) { /* Propagate to all peers (including ourselves). * This ends up at attrd_peer_message(). */ attrd_send_message(NULL, xml, false); pcmk__set_result(&request->result, CRM_EX_OK, PCMK_EXEC_DONE, NULL); return NULL; } rsc = crm_element_value(xml, PCMK__XA_ATTR_RESOURCE); op = crm_element_value(xml, PCMK__XA_ATTR_OPERATION); interval_spec = crm_element_value(xml, PCMK__XA_ATTR_INTERVAL); /* Map this to an update */ crm_xml_add(xml, PCMK__XA_TASK, PCMK__ATTRD_CMD_UPDATE); /* Add regular expression matching desired attributes */ if (rsc) { char *pattern; if (op == NULL) { pattern = crm_strdup_printf(ATTRD_RE_CLEAR_ONE, rsc); } else { guint interval_ms = crm_parse_interval_spec(interval_spec); pattern = crm_strdup_printf(ATTRD_RE_CLEAR_OP, rsc, op, interval_ms); } crm_xml_add(xml, PCMK__XA_ATTR_PATTERN, pattern); free(pattern); } else { crm_xml_add(xml, PCMK__XA_ATTR_PATTERN, ATTRD_RE_CLEAR_ALL); } /* Make sure attribute and value are not set, so we delete via regex */ if (crm_element_value(xml, PCMK__XA_ATTR_NAME)) { crm_xml_replace(xml, PCMK__XA_ATTR_NAME, NULL); } if (crm_element_value(xml, PCMK__XA_ATTR_VALUE)) { crm_xml_replace(xml, PCMK__XA_ATTR_VALUE, NULL); } return attrd_client_update(request); } xmlNode * attrd_client_peer_remove(pcmk__request_t *request) { xmlNode *xml = request->xml; // Host and ID are not used in combination, rather host has precedence const char *host = crm_element_value(xml, PCMK__XA_ATTR_NODE_NAME); char *host_alloc = NULL; attrd_send_ack(request->ipc_client, request->ipc_id, request->ipc_flags); if (host == NULL) { int nodeid = 0; crm_element_value_int(xml, PCMK__XA_ATTR_NODE_ID, &nodeid); if (nodeid > 0) { - crm_node_t *node = pcmk__search_cluster_node_cache(nodeid, NULL); + crm_node_t *node = pcmk__search_cluster_node_cache(nodeid, NULL, + NULL); char *host_alloc = NULL; if (node && node->uname) { // Use cached name if available host = node->uname; } else { // Otherwise ask cluster layer host_alloc = get_node_name(nodeid); host = host_alloc; } pcmk__xe_add_node(xml, host, 0); } } if (host) { crm_info("Client %s is requesting all values for %s be removed", pcmk__client_name(request->ipc_client), host); attrd_send_message(NULL, xml, false); /* ends up at attrd_peer_message() */ free(host_alloc); } else { crm_info("Ignoring request by client %s to remove all peer values without specifying peer", pcmk__client_name(request->ipc_client)); } pcmk__set_result(&request->result, CRM_EX_OK, PCMK_EXEC_DONE, NULL); return NULL; } xmlNode * attrd_client_query(pcmk__request_t *request) { xmlNode *query = request->xml; xmlNode *reply = NULL; const char *attr = NULL; crm_debug("Query arrived from %s", pcmk__client_name(request->ipc_client)); /* Request must specify attribute name to query */ attr = crm_element_value(query, PCMK__XA_ATTR_NAME); if (attr == NULL) { pcmk__format_result(&request->result, CRM_EX_ERROR, PCMK_EXEC_ERROR, "Ignoring malformed query from %s (no attribute name given)", pcmk__client_name(request->ipc_client)); return NULL; } /* Build the XML reply */ reply = build_query_reply(attr, crm_element_value(query, PCMK__XA_ATTR_NODE_NAME)); if (reply == NULL) { pcmk__format_result(&request->result, CRM_EX_ERROR, PCMK_EXEC_ERROR, "Could not respond to query from %s: could not create XML reply", pcmk__client_name(request->ipc_client)); return NULL; } else { pcmk__set_result(&request->result, CRM_EX_OK, PCMK_EXEC_DONE, NULL); } request->ipc_client->request_id = 0; return reply; } xmlNode * attrd_client_refresh(pcmk__request_t *request) { crm_info("Updating all attributes"); attrd_send_ack(request->ipc_client, request->ipc_id, request->ipc_flags); attrd_write_attributes(true, true); pcmk__set_result(&request->result, CRM_EX_OK, PCMK_EXEC_DONE, NULL); return NULL; } static void handle_missing_host(xmlNode *xml) { const char *host = crm_element_value(xml, PCMK__XA_ATTR_NODE_NAME); if (host == NULL) { crm_trace("Inferring host"); pcmk__xe_add_node(xml, attrd_cluster->uname, attrd_cluster->nodeid); } } /* Convert a single IPC message with a regex into one with multiple children, one * for each regex match. */ static int expand_regexes(xmlNode *xml, const char *attr, const char *value, const char *regex) { if (attr == NULL && regex) { bool matched = false; GHashTableIter aIter; regex_t r_patt; crm_debug("Setting %s to %s", regex, value); if (regcomp(&r_patt, regex, REG_EXTENDED|REG_NOSUB)) { return EINVAL; } g_hash_table_iter_init(&aIter, attributes); while (g_hash_table_iter_next(&aIter, (gpointer *) & attr, NULL)) { int status = regexec(&r_patt, attr, 0, NULL, 0); if (status == 0) { xmlNode *child = create_xml_node(xml, XML_ATTR_OP); crm_trace("Matched %s with %s", attr, regex); matched = true; /* Copy all the attributes from the parent over, but remove the * regex and replace it with the name. */ attrd_copy_xml_attributes(xml, child); crm_xml_replace(child, PCMK__XA_ATTR_PATTERN, NULL); crm_xml_add(child, PCMK__XA_ATTR_NAME, attr); } } regfree(&r_patt); /* Return a code if we never matched anything. This should not be treated * as an error. It indicates there was a regex, and it was a valid regex, * but simply did not match anything and the caller should not continue * doing any regex-related processing. */ if (!matched) { return pcmk_rc_op_unsatisfied; } } else if (attr == NULL) { return pcmk_rc_bad_nvpair; } return pcmk_rc_ok; } static int handle_regexes(pcmk__request_t *request) { xmlNode *xml = request->xml; int rc = pcmk_rc_ok; const char *attr = crm_element_value(xml, PCMK__XA_ATTR_NAME); const char *value = crm_element_value(xml, PCMK__XA_ATTR_VALUE); const char *regex = crm_element_value(xml, PCMK__XA_ATTR_PATTERN); rc = expand_regexes(xml, attr, value, regex); if (rc == EINVAL) { pcmk__format_result(&request->result, CRM_EX_ERROR, PCMK_EXEC_ERROR, "Bad regex '%s' for update from client %s", regex, pcmk__client_name(request->ipc_client)); } else if (rc == pcmk_rc_bad_nvpair) { crm_err("Update request did not specify attribute or regular expression"); pcmk__format_result(&request->result, CRM_EX_ERROR, PCMK_EXEC_ERROR, "Client %s update request did not specify attribute or regular expression", pcmk__client_name(request->ipc_client)); } return rc; } static int handle_value_expansion(const char **value, xmlNode *xml, const char *op, const char *attr) { attribute_t *a = g_hash_table_lookup(attributes, attr); if (a == NULL && pcmk__str_eq(op, PCMK__ATTRD_CMD_UPDATE_DELAY, pcmk__str_none)) { return EINVAL; } if (*value && attrd_value_needs_expansion(*value)) { int int_value; attribute_value_t *v = NULL; if (a) { const char *host = crm_element_value(xml, PCMK__XA_ATTR_NODE_NAME); v = g_hash_table_lookup(a->values, host); } int_value = attrd_expand_value(*value, (v? v->current : NULL)); crm_info("Expanded %s=%s to %d", attr, *value, int_value); crm_xml_add_int(xml, PCMK__XA_ATTR_VALUE, int_value); /* Replacing the value frees the previous memory, so re-query it */ *value = crm_element_value(xml, PCMK__XA_ATTR_VALUE); } return pcmk_rc_ok; } static void send_update_msg_to_cluster(pcmk__request_t *request, xmlNode *xml) { if (pcmk__str_eq(attrd_request_sync_point(xml), PCMK__VALUE_CLUSTER, pcmk__str_none)) { /* The client is waiting on the cluster-wide sync point. In this case, * the response ACK is not sent until this attrd broadcasts the update * and receives its own confirmation back from all peers. */ attrd_expect_confirmations(request, attrd_cluster_sync_point_update); attrd_send_message(NULL, xml, true); /* ends up at attrd_peer_message() */ } else { /* The client is either waiting on the local sync point or was not * waiting on any sync point at all. For the local sync point, the * response ACK is sent in attrd_peer_update. For clients not * waiting on any sync point, the response ACK is sent in * handle_update_request immediately before this function was called. */ attrd_send_message(NULL, xml, false); /* ends up at attrd_peer_message() */ } } static int send_child_update(xmlNode *child, void *data) { pcmk__request_t *request = (pcmk__request_t *) data; /* Calling pcmk__set_result is handled by one of these calls to * attrd_client_update, so no need to do it again here. */ request->xml = child; attrd_client_update(request); return pcmk_rc_ok; } xmlNode * attrd_client_update(pcmk__request_t *request) { xmlNode *xml = request->xml; const char *attr, *value, *regex; /* If the message has children, that means it is a message from a newer * client that supports sending multiple operations at a time. There are * two ways we can handle that. */ if (xml_has_children(xml)) { if (ATTRD_SUPPORTS_MULTI_MESSAGE(minimum_protocol_version)) { /* First, if all peers support a certain protocol version, we can * just broadcast the big message and they'll handle it. However, * we also need to apply all the transformations in this function * to the children since they don't happen anywhere else. */ for (xmlNode *child = first_named_child(xml, XML_ATTR_OP); child != NULL; child = crm_next_same_xml(child)) { attr = crm_element_value(child, PCMK__XA_ATTR_NAME); value = crm_element_value(child, PCMK__XA_ATTR_VALUE); handle_missing_host(child); if (handle_value_expansion(&value, child, request->op, attr) == EINVAL) { pcmk__format_result(&request->result, CRM_EX_NOSUCH, PCMK_EXEC_ERROR, "Attribute %s does not exist", attr); return NULL; } } send_update_msg_to_cluster(request, xml); pcmk__set_result(&request->result, CRM_EX_OK, PCMK_EXEC_DONE, NULL); } else { /* Save the original xml node pointer so it can be restored after iterating * over all the children. */ xmlNode *orig_xml = request->xml; /* Second, if they do not support that protocol version, split it * up into individual messages and call attrd_client_update on * each one. */ pcmk__xe_foreach_child(xml, XML_ATTR_OP, send_child_update, request); request->xml = orig_xml; } return NULL; } attr = crm_element_value(xml, PCMK__XA_ATTR_NAME); value = crm_element_value(xml, PCMK__XA_ATTR_VALUE); regex = crm_element_value(xml, PCMK__XA_ATTR_PATTERN); if (handle_regexes(request) != pcmk_rc_ok) { /* Error handling was already dealt with in handle_regexes, so just return. */ return NULL; } else if (regex) { /* Recursively call attrd_client_update on the new message with regexes * expanded. If supported by the attribute daemon, this means that all * matches can also be handled atomically. */ return attrd_client_update(request); } handle_missing_host(xml); if (handle_value_expansion(&value, xml, request->op, attr) == EINVAL) { pcmk__format_result(&request->result, CRM_EX_NOSUCH, PCMK_EXEC_ERROR, "Attribute %s does not exist", attr); return NULL; } crm_debug("Broadcasting %s[%s]=%s%s", attr, crm_element_value(xml, PCMK__XA_ATTR_NODE_NAME), value, (attrd_election_won()? " (writer)" : "")); send_update_msg_to_cluster(request, xml); pcmk__set_result(&request->result, CRM_EX_OK, PCMK_EXEC_DONE, NULL); return NULL; } /*! * \internal * \brief Accept a new client IPC connection * * \param[in,out] c New connection * \param[in] uid Client user id * \param[in] gid Client group id * * \return pcmk_ok on success, -errno otherwise */ static int32_t attrd_ipc_accept(qb_ipcs_connection_t *c, uid_t uid, gid_t gid) { crm_trace("New client connection %p", c); if (attrd_shutting_down()) { crm_info("Ignoring new connection from pid %d during shutdown", pcmk__client_pid(c)); return -EPERM; } if (pcmk__new_client(c, uid, gid) == NULL) { return -EIO; } return pcmk_ok; } /*! * \internal * \brief Destroy a client IPC connection * * \param[in] c Connection to destroy * * \return FALSE (i.e. do not re-run this callback) */ static int32_t attrd_ipc_closed(qb_ipcs_connection_t *c) { pcmk__client_t *client = pcmk__find_client(c); if (client == NULL) { crm_trace("Ignoring request to clean up unknown connection %p", c); } else { crm_trace("Cleaning up closed client connection %p", c); /* Remove the client from the sync point waitlist if it's present. */ attrd_remove_client_from_waitlist(client); /* And no longer wait for confirmations from any peers. */ attrd_do_not_wait_for_client(client); pcmk__free_client(client); } return FALSE; } /*! * \internal * \brief Destroy a client IPC connection * * \param[in,out] c Connection to destroy * * \note We handle a destroyed connection the same as a closed one, * but we need a separate handler because the return type is different. */ static void attrd_ipc_destroy(qb_ipcs_connection_t *c) { crm_trace("Destroying client connection %p", c); attrd_ipc_closed(c); } static int32_t attrd_ipc_dispatch(qb_ipcs_connection_t * c, void *data, size_t size) { uint32_t id = 0; uint32_t flags = 0; pcmk__client_t *client = pcmk__find_client(c); xmlNode *xml = NULL; // Sanity-check, and parse XML from IPC data CRM_CHECK((c != NULL) && (client != NULL), return 0); if (data == NULL) { crm_debug("No IPC data from PID %d", pcmk__client_pid(c)); return 0; } xml = pcmk__client_data2xml(client, data, &id, &flags); if (xml == NULL) { crm_debug("Unrecognizable IPC data from PID %d", pcmk__client_pid(c)); pcmk__ipc_send_ack(client, id, flags, "ack", NULL, CRM_EX_PROTOCOL); return 0; } else { pcmk__request_t request = { .ipc_client = client, .ipc_id = id, .ipc_flags = flags, .peer = NULL, .xml = xml, .call_options = 0, .result = PCMK__UNKNOWN_RESULT, }; CRM_ASSERT(client->user != NULL); pcmk__update_acl_user(xml, PCMK__XA_ATTR_USER, client->user); request.op = crm_element_value_copy(request.xml, PCMK__XA_TASK); CRM_CHECK(request.op != NULL, return 0); attrd_handle_request(&request); pcmk__reset_request(&request); } free_xml(xml); return 0; } static struct qb_ipcs_service_handlers ipc_callbacks = { .connection_accept = attrd_ipc_accept, .connection_created = NULL, .msg_process = attrd_ipc_dispatch, .connection_closed = attrd_ipc_closed, .connection_destroyed = attrd_ipc_destroy }; void attrd_ipc_fini(void) { if (ipcs != NULL) { pcmk__drop_all_clients(ipcs); qb_ipcs_destroy(ipcs); ipcs = NULL; } } /*! * \internal * \brief Set up attrd IPC communication */ void attrd_init_ipc(void) { pcmk__serve_attrd_ipc(&ipcs, &ipc_callbacks); } diff --git a/daemons/based/based_messages.c b/daemons/based/based_messages.c index 3c9e68aad3..34a69443a8 100644 --- a/daemons/based/based_messages.c +++ b/daemons/based/based_messages.c @@ -1,582 +1,582 @@ /* * Copyright 2004-2023 the Pacemaker project contributors * * The version control history for this file may have further details. * * This source code is licensed under the GNU General Public License version 2 * or later (GPLv2+) WITHOUT ANY WARRANTY. */ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include /* Maximum number of diffs to ignore while waiting for a resync */ #define MAX_DIFF_RETRY 5 bool based_is_primary = false; xmlNode *the_cib = NULL; int cib_process_shutdown_req(const char *op, int options, const char *section, xmlNode * req, xmlNode * input, xmlNode * existing_cib, xmlNode ** result_cib, xmlNode ** answer) { const char *host = crm_element_value(req, F_ORIG); *answer = NULL; if (crm_element_value(req, F_CIB_ISREPLY) == NULL) { crm_info("Peer %s is requesting to shut down", host); return pcmk_ok; } if (cib_shutdown_flag == FALSE) { crm_err("Peer %s mistakenly thinks we wanted to shut down", host); return -EINVAL; } crm_info("Peer %s has acknowledged our shutdown request", host); terminate_cib(__func__, 0); return pcmk_ok; } // @COMPAT: Remove when PCMK__CIB_REQUEST_NOOP is removed int cib_process_noop(const char *op, int options, const char *section, xmlNode *req, xmlNode *input, xmlNode *existing_cib, xmlNode **result_cib, xmlNode **answer) { crm_trace("Processing \"%s\" event", op); *answer = NULL; return pcmk_ok; } int cib_process_readwrite(const char *op, int options, const char *section, xmlNode * req, xmlNode * input, xmlNode * existing_cib, xmlNode ** result_cib, xmlNode ** answer) { int result = pcmk_ok; crm_trace("Processing \"%s\" event", op); if (pcmk__str_eq(op, PCMK__CIB_REQUEST_IS_PRIMARY, pcmk__str_none)) { if (based_is_primary) { result = pcmk_ok; } else { result = -EPERM; } return result; } if (pcmk__str_eq(op, PCMK__CIB_REQUEST_PRIMARY, pcmk__str_none)) { if (!based_is_primary) { crm_info("We are now in R/W mode"); based_is_primary = true; } else { crm_debug("We are still in R/W mode"); } } else if (based_is_primary) { crm_info("We are now in R/O mode"); based_is_primary = false; } return result; } /* Set to 1 when a sync is requested, incremented when a diff is ignored, * reset to 0 when a sync is received */ static int sync_in_progress = 0; void send_sync_request(const char *host) { xmlNode *sync_me = create_xml_node(NULL, "sync-me"); crm_info("Requesting re-sync from %s", (host? host : "all peers")); sync_in_progress = 1; crm_xml_add(sync_me, F_TYPE, "cib"); crm_xml_add(sync_me, F_CIB_OPERATION, PCMK__CIB_REQUEST_SYNC_TO_ONE); crm_xml_add(sync_me, F_CIB_DELEGATED, stand_alone? "localhost" : crm_cluster->uname); send_cluster_message(host ? crm_get_peer(0, host) : NULL, crm_msg_cib, sync_me, FALSE); free_xml(sync_me); } int cib_process_ping(const char *op, int options, const char *section, xmlNode * req, xmlNode * input, xmlNode * existing_cib, xmlNode ** result_cib, xmlNode ** answer) { const char *host = crm_element_value(req, F_ORIG); const char *seq = crm_element_value(req, F_CIB_PING_ID); char *digest = calculate_xml_versioned_digest(the_cib, FALSE, TRUE, CRM_FEATURE_SET); crm_trace("Processing \"%s\" event %s from %s", op, seq, host); *answer = create_xml_node(NULL, XML_CRM_TAG_PING); crm_xml_add(*answer, XML_ATTR_CRM_VERSION, CRM_FEATURE_SET); crm_xml_add(*answer, XML_ATTR_DIGEST, digest); crm_xml_add(*answer, F_CIB_PING_ID, seq); pcmk__if_tracing( { // Append additional detail so the receiver can log the differences add_message_xml(*answer, F_CIB_CALLDATA, the_cib); }, { // Always include at least the version details const char *tag = TYPE(the_cib); xmlNode *shallow = create_xml_node(NULL, tag); copy_in_properties(shallow, the_cib); add_message_xml(*answer, F_CIB_CALLDATA, shallow); free_xml(shallow); } ); crm_info("Reporting our current digest to %s: %s for %s.%s.%s", host, digest, crm_element_value(existing_cib, XML_ATTR_GENERATION_ADMIN), crm_element_value(existing_cib, XML_ATTR_GENERATION), crm_element_value(existing_cib, XML_ATTR_NUMUPDATES)); free(digest); return pcmk_ok; } int cib_process_sync(const char *op, int options, const char *section, xmlNode * req, xmlNode * input, xmlNode * existing_cib, xmlNode ** result_cib, xmlNode ** answer) { return sync_our_cib(req, TRUE); } int cib_process_upgrade_server(const char *op, int options, const char *section, xmlNode * req, xmlNode * input, xmlNode * existing_cib, xmlNode ** result_cib, xmlNode ** answer) { int rc = pcmk_ok; *answer = NULL; if(crm_element_value(req, F_CIB_SCHEMA_MAX)) { /* The originator of an upgrade request sends it to the DC, without * F_CIB_SCHEMA_MAX. If an upgrade is needed, the DC re-broadcasts the * request with F_CIB_SCHEMA_MAX, and each node performs the upgrade * (and notifies its local clients) here. */ return cib_process_upgrade( op, options, section, req, input, existing_cib, result_cib, answer); } else { int new_version = 0; int current_version = 0; xmlNode *scratch = copy_xml(existing_cib); const char *host = crm_element_value(req, F_ORIG); const char *value = crm_element_value(existing_cib, XML_ATTR_VALIDATION); const char *client_id = crm_element_value(req, F_CIB_CLIENTID); const char *call_opts = crm_element_value(req, F_CIB_CALLOPTS); const char *call_id = crm_element_value(req, F_CIB_CALLID); crm_trace("Processing \"%s\" event", op); if (value != NULL) { current_version = get_schema_version(value); } rc = update_validation(&scratch, &new_version, 0, TRUE, TRUE); if (new_version > current_version) { xmlNode *up = create_xml_node(NULL, __func__); rc = pcmk_ok; crm_notice("Upgrade request from %s verified", host); crm_xml_add(up, F_TYPE, "cib"); crm_xml_add(up, F_CIB_OPERATION, PCMK__CIB_REQUEST_UPGRADE); crm_xml_add(up, F_CIB_SCHEMA_MAX, get_schema_name(new_version)); crm_xml_add(up, F_CIB_DELEGATED, host); crm_xml_add(up, F_CIB_CLIENTID, client_id); crm_xml_add(up, F_CIB_CALLOPTS, call_opts); crm_xml_add(up, F_CIB_CALLID, call_id); if (cib_legacy_mode() && based_is_primary) { rc = cib_process_upgrade( op, options, section, up, input, existing_cib, result_cib, answer); } else { send_cluster_message(NULL, crm_msg_cib, up, FALSE); } free_xml(up); } else if(rc == pcmk_ok) { rc = -pcmk_err_schema_unchanged; } if (rc != pcmk_ok) { // Notify originating peer so it can notify its local clients - crm_node_t *origin = pcmk__search_cluster_node_cache(0, host); + crm_node_t *origin = pcmk__search_cluster_node_cache(0, host, NULL); crm_info("Rejecting upgrade request from %s: %s " CRM_XS " rc=%d peer=%s", host, pcmk_strerror(rc), rc, (origin? origin->uname : "lost")); if (origin) { xmlNode *up = create_xml_node(NULL, __func__); crm_xml_add(up, F_TYPE, "cib"); crm_xml_add(up, F_CIB_OPERATION, PCMK__CIB_REQUEST_UPGRADE); crm_xml_add(up, F_CIB_DELEGATED, host); crm_xml_add(up, F_CIB_ISREPLY, host); crm_xml_add(up, F_CIB_CLIENTID, client_id); crm_xml_add(up, F_CIB_CALLOPTS, call_opts); crm_xml_add(up, F_CIB_CALLID, call_id); crm_xml_add_int(up, F_CIB_UPGRADE_RC, rc); if (send_cluster_message(origin, crm_msg_cib, up, TRUE) == FALSE) { crm_warn("Could not send CIB upgrade result to %s", host); } free_xml(up); } } free_xml(scratch); } return rc; } int cib_process_sync_one(const char *op, int options, const char *section, xmlNode * req, xmlNode * input, xmlNode * existing_cib, xmlNode ** result_cib, xmlNode ** answer) { return sync_our_cib(req, FALSE); } int cib_server_process_diff(const char *op, int options, const char *section, xmlNode * req, xmlNode * input, xmlNode * existing_cib, xmlNode ** result_cib, xmlNode ** answer) { int rc = pcmk_ok; if (sync_in_progress > MAX_DIFF_RETRY) { /* Don't ignore diffs forever; the last request may have been lost. * If the diff fails, we'll ask for another full resync. */ sync_in_progress = 0; } // The primary instance should never ignore a diff if (sync_in_progress && !based_is_primary) { int diff_add_updates = 0; int diff_add_epoch = 0; int diff_add_admin_epoch = 0; int diff_del_updates = 0; int diff_del_epoch = 0; int diff_del_admin_epoch = 0; cib_diff_version_details(input, &diff_add_admin_epoch, &diff_add_epoch, &diff_add_updates, &diff_del_admin_epoch, &diff_del_epoch, &diff_del_updates); sync_in_progress++; crm_notice("Not applying diff %d.%d.%d -> %d.%d.%d (sync in progress)", diff_del_admin_epoch, diff_del_epoch, diff_del_updates, diff_add_admin_epoch, diff_add_epoch, diff_add_updates); return -pcmk_err_diff_resync; } rc = cib_process_diff(op, options, section, req, input, existing_cib, result_cib, answer); crm_trace("result: %s (%d), %s", pcmk_strerror(rc), rc, (based_is_primary? "primary": "secondary")); if ((rc == -pcmk_err_diff_resync) && !based_is_primary) { free_xml(*result_cib); *result_cib = NULL; send_sync_request(NULL); } else if (rc == -pcmk_err_diff_resync) { rc = -pcmk_err_diff_failed; if (options & cib_force_diff) { crm_warn("Not requesting full refresh in R/W mode"); } } else if ((rc != pcmk_ok) && !based_is_primary && cib_legacy_mode()) { crm_warn("Requesting full CIB refresh because update failed: %s" CRM_XS " rc=%d", pcmk_strerror(rc), rc); pcmk__output_set_log_level(logger_out, LOG_INFO); logger_out->message(logger_out, "xml-patchset", input); free_xml(*result_cib); *result_cib = NULL; send_sync_request(NULL); } return rc; } int cib_process_replace_svr(const char *op, int options, const char *section, xmlNode * req, xmlNode * input, xmlNode * existing_cib, xmlNode ** result_cib, xmlNode ** answer) { const char *tag = crm_element_name(input); int rc = cib_process_replace(op, options, section, req, input, existing_cib, result_cib, answer); if (rc == pcmk_ok && pcmk__str_eq(tag, XML_TAG_CIB, pcmk__str_casei)) { sync_in_progress = 0; } return rc; } // @COMPAT: Remove when PCMK__CIB_REQUEST_ABS_DELETE is removed int cib_process_delete_absolute(const char *op, int options, const char *section, xmlNode * req, xmlNode * input, xmlNode * existing_cib, xmlNode ** result_cib, xmlNode ** answer) { return -EINVAL; } static xmlNode * cib_msg_copy(xmlNode *msg) { static const char *field_list[] = { F_XML_TAGNAME, F_TYPE, F_CIB_CLIENTID, F_CIB_CALLOPTS, F_CIB_CALLID, F_CIB_OPERATION, F_CIB_ISREPLY, F_CIB_SECTION, F_CIB_HOST, F_CIB_RC, F_CIB_DELEGATED, F_CIB_OBJID, F_CIB_OBJTYPE, F_CIB_EXISTING, F_CIB_SEENCOUNT, F_CIB_TIMEOUT, F_CIB_GLOBAL_UPDATE, F_CIB_CLIENTNAME, F_CIB_USER, F_CIB_NOTIFY_TYPE, F_CIB_NOTIFY_ACTIVATE }; xmlNode *copy = create_xml_node(NULL, "copy"); CRM_ASSERT(copy != NULL); for (int lpc = 0; lpc < PCMK__NELEM(field_list); lpc++) { const char *field = field_list[lpc]; const char *value = crm_element_value(msg, field); if (value != NULL) { crm_xml_add(copy, field, value); } } return copy; } int sync_our_cib(xmlNode * request, gboolean all) { int result = pcmk_ok; char *digest = NULL; const char *host = crm_element_value(request, F_ORIG); const char *op = crm_element_value(request, F_CIB_OPERATION); xmlNode *replace_request = NULL; CRM_CHECK(the_cib != NULL, return -EINVAL); CRM_CHECK(all || (host != NULL), return -EINVAL); crm_debug("Syncing CIB to %s", all ? "all peers" : host); replace_request = cib_msg_copy(request); if (host != NULL) { crm_xml_add(replace_request, F_CIB_ISREPLY, host); } if (all) { xml_remove_prop(replace_request, F_CIB_HOST); } crm_xml_add(replace_request, F_CIB_OPERATION, PCMK__CIB_REQUEST_REPLACE); crm_xml_add(replace_request, "original_" F_CIB_OPERATION, op); pcmk__xe_set_bool_attr(replace_request, F_CIB_GLOBAL_UPDATE, true); crm_xml_add(replace_request, XML_ATTR_CRM_VERSION, CRM_FEATURE_SET); digest = calculate_xml_versioned_digest(the_cib, FALSE, TRUE, CRM_FEATURE_SET); crm_xml_add(replace_request, XML_ATTR_DIGEST, digest); add_message_xml(replace_request, F_CIB_CALLDATA, the_cib); if (send_cluster_message (all ? NULL : crm_get_peer(0, host), crm_msg_cib, replace_request, FALSE) == FALSE) { result = -ENOTCONN; } free_xml(replace_request); free(digest); return result; } int cib_process_init_transaction(const char *op, int options, const char *section, xmlNode *req, xmlNode *input, xmlNode *existing_cib, xmlNode **result_cib, xmlNode **answer) { const char *client_id = crm_element_value(req, F_CIB_CLIENTID); pcmk__client_t *client = NULL; const char *reason = NULL; int rc = pcmk_ok; if (client_id == NULL) { reason = "No client ID"; rc = -EINVAL; goto done; } client = pcmk__find_client_by_id(client_id); if (client == NULL) { reason = "Client not found"; rc = -ENXIO; goto done; } rc = based_init_transaction(client); if (rc != pcmk_rc_ok) { reason = pcmk_rc_str(rc); } rc = pcmk_rc2legacy(rc); done: if (rc != pcmk_ok) { const char *client_name = crm_element_value(req, F_CIB_CLIENTNAME); crm_err("Could not initiate transaction for client %s (%s): %s", pcmk__s(client_name, "unspecified"), pcmk__s(client_id, "unknown"), pcmk__s(reason, "unknown reason (bug?)")); } return rc; } int cib_process_commit_transaction(const char *op, int options, const char *section, xmlNode *req, xmlNode *input, xmlNode *existing_cib, xmlNode **result_cib, xmlNode **answer) { /* On success, our caller will activate *result_cib locally, trigger a * replace notification if appropriate, and sync *result_cib to all nodes. * On failure, our caller will free *result_cib. */ const char *client_id = crm_element_value(req, F_CIB_CLIENTID); pcmk__client_t *client = NULL; const char *reason = NULL; int rc = pcmk_ok; if (client_id == NULL) { reason = "No client ID"; rc = -EINVAL; goto done; } client = pcmk__find_client_by_id(client_id); if (client == NULL) { reason = "Client not found"; rc = -ENXIO; goto done; } rc = based_commit_transaction(client, result_cib); if (rc != pcmk_rc_ok) { reason = pcmk_rc_str(rc); rc = pcmk_rc2legacy(rc); } done: if (rc != pcmk_ok) { const char *client_name = crm_element_value(req, F_CIB_CLIENTNAME); crm_err("Could not commit transaction for client %s (%s): %s", pcmk__s(client_name, "unspecified"), pcmk__s(client_id, "unknown"), pcmk__s(reason, "unknown reason (bug?)")); } return rc; } int cib_process_discard_transaction(const char *op, int options, const char *section, xmlNode *req, xmlNode *input, xmlNode *existing_cib, xmlNode **result_cib, xmlNode **answer) { const char *client_id = crm_element_value(req, F_CIB_CLIENTID); pcmk__client_t *client = NULL; const char *reason = NULL; int rc = pcmk_ok; if (client_id == NULL) { reason = "No client ID"; rc = -EINVAL; goto done; } client = pcmk__find_client_by_id(client_id); if (client == NULL) { reason = "Client not found"; rc = -ENXIO; goto done; } based_discard_transaction(client); done: if (rc != pcmk_ok) { const char *client_name = crm_element_value(req, F_CIB_CLIENTNAME); crm_err("Could not discard transaction for client %s (%s): %s", pcmk__s(client_name, "unspecified"), pcmk__s(client_id, "unknown"), pcmk__s(reason, "unknown reason (bug?)")); } return rc; } diff --git a/daemons/controld/controld_callbacks.c b/daemons/controld/controld_callbacks.c index d578adc932..e3b68f6906 100644 --- a/daemons/controld/controld_callbacks.c +++ b/daemons/controld/controld_callbacks.c @@ -1,367 +1,369 @@ /* * Copyright 2004-2022 the Pacemaker project contributors * * The version control history for this file may have further details. * * This source code is licensed under the GNU General Public License version 2 * or later (GPLv2+) WITHOUT ANY WARRANTY. */ #include #include #include #include #include #include #include #include #include /* From join_dc... */ extern gboolean check_join_state(enum crmd_fsa_state cur_state, const char *source); void crmd_ha_msg_filter(xmlNode * msg) { if (AM_I_DC) { const char *sys_from = crm_element_value(msg, F_CRM_SYS_FROM); if (pcmk__str_eq(sys_from, CRM_SYSTEM_DC, pcmk__str_casei)) { const char *from = crm_element_value(msg, F_ORIG); if (!pcmk__str_eq(from, controld_globals.our_nodename, pcmk__str_casei)) { int level = LOG_INFO; const char *op = crm_element_value(msg, F_CRM_TASK); /* make sure the election happens NOW */ if (controld_globals.fsa_state != S_ELECTION) { ha_msg_input_t new_input; level = LOG_WARNING; new_input.msg = msg; register_fsa_error_adv(C_FSA_INTERNAL, I_ELECTION, NULL, &new_input, __func__); } do_crm_log(level, "Another DC detected: %s (op=%s)", from, op); goto done; } } } else { const char *sys_to = crm_element_value(msg, F_CRM_SYS_TO); if (pcmk__str_eq(sys_to, CRM_SYSTEM_DC, pcmk__str_casei)) { return; } } /* crm_log_xml_trace(msg, "HA[inbound]"); */ route_message(C_HA_MESSAGE, msg); done: controld_trigger_fsa(); } /*! * \internal * \brief Check whether a node is online * * \param[in] node Node to check * * \retval -1 if completely dead * \retval 0 if partially alive * \retval 1 if completely alive */ static int node_alive(const crm_node_t *node) { if (pcmk_is_set(node->flags, crm_remote_node)) { // Pacemaker Remote nodes can't be partially alive return pcmk__str_eq(node->state, CRM_NODE_MEMBER, pcmk__str_casei) ? 1: -1; } else if (crm_is_peer_active(node)) { // Completely up cluster node: both cluster member and peer return 1; } else if (!pcmk_is_set(node->processes, crm_get_cluster_proc()) && !pcmk__str_eq(node->state, CRM_NODE_MEMBER, pcmk__str_casei)) { // Completely down cluster node: neither cluster member nor peer return -1; } // Partially up cluster node: only cluster member or only peer return 0; } #define state_text(state) ((state)? (const char *)(state) : "in unknown state") void peer_update_callback(enum crm_status_type type, crm_node_t * node, const void *data) { uint32_t old = 0; bool appeared = FALSE; bool is_remote = pcmk_is_set(node->flags, crm_remote_node); + controld_node_pending_timer(node); + /* The controller waits to receive some information from the membership * layer before declaring itself operational. If this is being called for a * cluster node, indicate that we have it. */ if (!is_remote) { controld_set_fsa_input_flags(R_PEER_DATA); } if (type == crm_status_processes && pcmk_is_set(node->processes, crm_get_cluster_proc()) && !AM_I_DC && !is_remote) { /* * This is a hack until we can send to a nodeid and/or we fix node name lookups * These messages are ignored in crmd_ha_msg_filter() */ xmlNode *query = create_request(CRM_OP_HELLO, NULL, NULL, CRM_SYSTEM_CRMD, CRM_SYSTEM_CRMD, NULL); crm_debug("Sending hello to node %u so that it learns our node name", node->id); send_cluster_message(node, crm_msg_crmd, query, FALSE); free_xml(query); } if (node->uname == NULL) { return; } switch (type) { case crm_status_uname: /* If we've never seen the node, then it also won't be in the status section */ crm_info("%s node %s is now %s", (is_remote? "Remote" : "Cluster"), node->uname, state_text(node->state)); return; case crm_status_nstate: /* This callback should not be called unless the state actually * changed, but here's a failsafe just in case. */ CRM_CHECK(!pcmk__str_eq(data, node->state, pcmk__str_casei), return); crm_info("%s node %s is now %s (was %s)", (is_remote? "Remote" : "Cluster"), node->uname, state_text(node->state), state_text(data)); if (pcmk__str_eq(CRM_NODE_MEMBER, node->state, pcmk__str_casei)) { appeared = TRUE; if (!is_remote) { remove_stonith_cleanup(node->uname); } } else { controld_remove_failed_sync_node(node->uname); controld_remove_voter(node->uname); } crmd_alert_node_event(node); break; case crm_status_processes: CRM_CHECK(data != NULL, return); old = *(const uint32_t *)data; appeared = pcmk_is_set(node->processes, crm_get_cluster_proc()); { const char *dc_s = controld_globals.dc_name; if ((dc_s == NULL) && AM_I_DC) { dc_s = "true"; } crm_info("Node %s is %s a peer " CRM_XS " DC=%s old=%#07x new=%#07x", node->uname, (appeared? "now" : "no longer"), pcmk__s(dc_s, ""), old, node->processes); } if (!pcmk_is_set((node->processes ^ old), crm_get_cluster_proc())) { /* Peer status did not change. This should not be possible, * since we don't track process flags other than peer status. */ crm_trace("Process flag %#7x did not change from %#7x to %#7x", crm_get_cluster_proc(), old, node->processes); return; } if (!appeared) { node->peer_lost = time(NULL); controld_remove_failed_sync_node(node->uname); controld_remove_voter(node->uname); } if (!pcmk_is_set(controld_globals.fsa_input_register, R_CIB_CONNECTED)) { crm_trace("Ignoring peer status change because not connected to CIB"); return; } else if (controld_globals.fsa_state == S_STOPPING) { crm_trace("Ignoring peer status change because stopping"); return; } if (!appeared && pcmk__str_eq(node->uname, controld_globals.our_nodename, pcmk__str_casei)) { /* Did we get evicted? */ crm_notice("Our peer connection failed"); register_fsa_input(C_CRMD_STATUS_CALLBACK, I_ERROR, NULL); } else if (pcmk__str_eq(node->uname, controld_globals.dc_name, pcmk__str_casei) && !crm_is_peer_active(node)) { /* Did the DC leave us? */ crm_notice("Our peer on the DC (%s) is dead", controld_globals.dc_name); register_fsa_input(C_CRMD_STATUS_CALLBACK, I_ELECTION, NULL); /* @COMPAT DC < 1.1.13: If a DC shuts down normally, we don't * want to fence it. Newer DCs will send their shutdown request * to all peers, who will update the DC's expected state to * down, thus avoiding fencing. We can safely erase the DC's * transient attributes when it leaves in that case. However, * the only way to avoid fencing older DCs is to leave the * transient attributes intact until it rejoins. */ if (compare_version(controld_globals.dc_version, "3.0.9") > 0) { controld_delete_node_state(node->uname, controld_section_attrs, cib_scope_local); } } else if (AM_I_DC || pcmk_is_set(controld_globals.flags, controld_dc_left) || (controld_globals.dc_name == NULL)) { /* This only needs to be done once, so normally the DC should do * it. However if there is no DC, every node must do it, since * there is no other way to ensure some one node does it. */ if (appeared) { te_trigger_stonith_history_sync(FALSE); } else { controld_delete_node_state(node->uname, controld_section_attrs, cib_scope_local); } } break; } if (AM_I_DC) { xmlNode *update = NULL; int flags = node_update_peer; int alive = node_alive(node); pcmk__graph_action_t *down = match_down_event(node->uuid); crm_trace("Alive=%d, appeared=%d, down=%d", alive, appeared, (down? down->id : -1)); if (appeared && (alive > 0) && !is_remote) { register_fsa_input_before(C_FSA_INTERNAL, I_NODE_JOIN, NULL); } if (down) { const char *task = crm_element_value(down->xml, XML_LRM_ATTR_TASK); if (pcmk__str_eq(task, CRM_OP_FENCE, pcmk__str_casei)) { /* tengine_stonith_callback() confirms fence actions */ crm_trace("Updating CIB %s fencer reported fencing of %s complete", (pcmk_is_set(down->flags, pcmk__graph_action_confirmed)? "after" : "before"), node->uname); } else if (!appeared && pcmk__str_eq(task, CRM_OP_SHUTDOWN, pcmk__str_casei)) { // Shutdown actions are immediately confirmed (i.e. no_wait) if (!is_remote) { flags |= node_update_join | node_update_expected; crmd_peer_down(node, FALSE); check_join_state(controld_globals.fsa_state, __func__); } if (alive >= 0) { crm_info("%s of peer %s is in progress " CRM_XS " action=%d", task, node->uname, down->id); } else { crm_notice("%s of peer %s is complete " CRM_XS " action=%d", task, node->uname, down->id); pcmk__update_graph(controld_globals.transition_graph, down); trigger_graph(); } } else { crm_trace("Node %s is %s, was expected to %s (op %d)", node->uname, ((alive > 0)? "alive" : ((alive < 0)? "dead" : "partially alive")), task, down->id); } } else if (appeared == FALSE) { if ((controld_globals.transition_graph == NULL) || (controld_globals.transition_graph->id == -1)) { crm_info("Stonith/shutdown of node %s is unknown to the " "current DC", node->uname); } else { crm_warn("Stonith/shutdown of node %s was not expected", node->uname); } if (!is_remote) { crm_update_peer_join(__func__, node, crm_join_none); check_join_state(controld_globals.fsa_state, __func__); } abort_transition(INFINITY, pcmk__graph_restart, "Node failure", NULL); fail_incompletable_actions(controld_globals.transition_graph, node->uuid); } else { crm_trace("Node %s came up, was not expected to be down", node->uname); } if (is_remote) { /* A pacemaker_remote node won't have its cluster status updated * in the CIB by membership-layer callbacks, so do it here. */ flags |= node_update_cluster; /* Trigger resource placement on newly integrated nodes */ if (appeared) { abort_transition(INFINITY, pcmk__graph_restart, "Pacemaker Remote node integrated", NULL); } } /* Update the CIB node state */ update = create_node_state_update(node, flags, NULL, __func__); if (update == NULL) { crm_debug("Node state update not yet possible for %s", node->uname); } else { fsa_cib_anon_update(XML_CIB_TAG_STATUS, update); } free_xml(update); } controld_trigger_fsa(); } gboolean crm_fsa_trigger(gpointer user_data) { crm_trace("Invoked (queue len: %d)", g_list_length(controld_globals.fsa_message_queue)); s_crmd_fsa(C_FSA_INTERNAL); crm_trace("Exited (queue len: %d)", g_list_length(controld_globals.fsa_message_queue)); return TRUE; } diff --git a/daemons/controld/controld_control.c b/daemons/controld/controld_control.c index ffc62a01ac..bfcd88f7b7 100644 --- a/daemons/controld/controld_control.c +++ b/daemons/controld/controld_control.c @@ -1,857 +1,862 @@ /* * Copyright 2004-2023 the Pacemaker project contributors * * The version control history for this file may have further details. * * This source code is licensed under the GNU General Public License version 2 * or later (GPLv2+) WITHOUT ANY WARRANTY. */ #include #include #include #include #include #include #include #include #include #include #include static qb_ipcs_service_t *ipcs = NULL; static crm_trigger_t *config_read_trigger = NULL; #if SUPPORT_COROSYNC extern gboolean crm_connect_corosync(crm_cluster_t * cluster); #endif void crm_shutdown(int nsig); static gboolean crm_read_options(gpointer user_data); /* A_HA_CONNECT */ void do_ha_control(long long action, enum crmd_fsa_cause cause, enum crmd_fsa_state cur_state, enum crmd_fsa_input current_input, fsa_data_t * msg_data) { gboolean registered = FALSE; static crm_cluster_t *cluster = NULL; if (cluster == NULL) { cluster = pcmk_cluster_new(); } if (action & A_HA_DISCONNECT) { crm_cluster_disconnect(cluster); crm_info("Disconnected from the cluster"); controld_set_fsa_input_flags(R_HA_DISCONNECTED); } if (action & A_HA_CONNECT) { crm_set_status_callback(&peer_update_callback); crm_set_autoreap(FALSE); #if SUPPORT_COROSYNC if (is_corosync_cluster()) { registered = crm_connect_corosync(cluster); } #endif // SUPPORT_COROSYNC if (registered) { controld_election_init(cluster->uname); controld_globals.our_nodename = cluster->uname; controld_globals.our_uuid = cluster->uuid; if(cluster->uuid == NULL) { crm_err("Could not obtain local uuid"); registered = FALSE; } } if (!registered) { controld_set_fsa_input_flags(R_HA_DISCONNECTED); register_fsa_error(C_FSA_INTERNAL, I_ERROR, NULL); return; } populate_cib_nodes(node_update_none, __func__); controld_clear_fsa_input_flags(R_HA_DISCONNECTED); crm_info("Connected to the cluster"); } if (action & ~(A_HA_CONNECT | A_HA_DISCONNECT)) { crm_err("Unexpected action %s in %s", fsa_action2string(action), __func__); } } /* A_SHUTDOWN */ void do_shutdown(long long action, enum crmd_fsa_cause cause, enum crmd_fsa_state cur_state, enum crmd_fsa_input current_input, fsa_data_t * msg_data) { /* just in case */ controld_set_fsa_input_flags(R_SHUTDOWN); controld_disconnect_fencer(FALSE); } /* A_SHUTDOWN_REQ */ void do_shutdown_req(long long action, enum crmd_fsa_cause cause, enum crmd_fsa_state cur_state, enum crmd_fsa_input current_input, fsa_data_t * msg_data) { xmlNode *msg = NULL; controld_set_fsa_input_flags(R_SHUTDOWN); //controld_set_fsa_input_flags(R_STAYDOWN); crm_info("Sending shutdown request to all peers (DC is %s)", pcmk__s(controld_globals.dc_name, "not set")); msg = create_request(CRM_OP_SHUTDOWN_REQ, NULL, NULL, CRM_SYSTEM_CRMD, CRM_SYSTEM_CRMD, NULL); if (send_cluster_message(NULL, crm_msg_crmd, msg, TRUE) == FALSE) { register_fsa_error(C_FSA_INTERNAL, I_ERROR, NULL); } free_xml(msg); } void crmd_fast_exit(crm_exit_t exit_code) { if (pcmk_is_set(controld_globals.fsa_input_register, R_STAYDOWN)) { crm_warn("Inhibiting respawn "CRM_XS" remapping exit code %d to %d", exit_code, CRM_EX_FATAL); exit_code = CRM_EX_FATAL; } else if ((exit_code == CRM_EX_OK) && pcmk_is_set(controld_globals.fsa_input_register, R_IN_RECOVERY)) { crm_err("Could not recover from internal error"); exit_code = CRM_EX_ERROR; } if (controld_globals.logger_out != NULL) { controld_globals.logger_out->finish(controld_globals.logger_out, exit_code, true, NULL); pcmk__output_free(controld_globals.logger_out); controld_globals.logger_out = NULL; } crm_exit(exit_code); } crm_exit_t crmd_exit(crm_exit_t exit_code) { GMainLoop *mloop = controld_globals.mainloop; static bool in_progress = FALSE; if (in_progress && (exit_code == CRM_EX_OK)) { crm_debug("Exit is already in progress"); return exit_code; } else if(in_progress) { crm_notice("Error during shutdown process, exiting now with status %d (%s)", exit_code, crm_exit_str(exit_code)); crm_write_blackbox(SIGTRAP, NULL); crmd_fast_exit(exit_code); } in_progress = TRUE; crm_trace("Preparing to exit with status %d (%s)", exit_code, crm_exit_str(exit_code)); /* Suppress secondary errors resulting from us disconnecting everything */ controld_set_fsa_input_flags(R_HA_DISCONNECTED); /* Close all IPC servers and clients to ensure any and all shared memory files are cleaned up */ if(ipcs) { crm_trace("Closing IPC server"); mainloop_del_ipc_server(ipcs); ipcs = NULL; } controld_close_attrd_ipc(); controld_shutdown_schedulerd_ipc(); controld_disconnect_fencer(TRUE); if ((exit_code == CRM_EX_OK) && (controld_globals.mainloop == NULL)) { crm_debug("No mainloop detected"); exit_code = CRM_EX_ERROR; } /* On an error, just get out. * * Otherwise, make the effort to have mainloop exit gracefully so * that it (mostly) cleans up after itself and valgrind has less * to report on - allowing real errors stand out */ if (exit_code != CRM_EX_OK) { crm_notice("Forcing immediate exit with status %d (%s)", exit_code, crm_exit_str(exit_code)); crm_write_blackbox(SIGTRAP, NULL); crmd_fast_exit(exit_code); } /* Clean up as much memory as possible for valgrind */ for (GList *iter = controld_globals.fsa_message_queue; iter != NULL; iter = iter->next) { fsa_data_t *fsa_data = (fsa_data_t *) iter->data; crm_info("Dropping %s: [ state=%s cause=%s origin=%s ]", fsa_input2string(fsa_data->fsa_input), fsa_state2string(controld_globals.fsa_state), fsa_cause2string(fsa_data->fsa_cause), fsa_data->origin); delete_fsa_input(fsa_data); } controld_clear_fsa_input_flags(R_MEMBERSHIP); g_list_free(controld_globals.fsa_message_queue); controld_globals.fsa_message_queue = NULL; + controld_free_node_pending_timers(); controld_election_fini(); /* Tear down the CIB manager connection, but don't free it yet -- it could * be used when we drain the mainloop later. */ controld_disconnect_cib_manager(); verify_stopped(controld_globals.fsa_state, LOG_WARNING); controld_clear_fsa_input_flags(R_LRM_CONNECTED); lrm_state_destroy_all(); mainloop_destroy_trigger(config_read_trigger); config_read_trigger = NULL; controld_destroy_fsa_trigger(); controld_destroy_transition_trigger(); pcmk__client_cleanup(); crm_peer_destroy(); controld_free_fsa_timers(); te_cleanup_stonith_history_sync(NULL, TRUE); controld_free_sched_timer(); free(controld_globals.our_nodename); controld_globals.our_nodename = NULL; free(controld_globals.our_uuid); controld_globals.our_uuid = NULL; free(controld_globals.dc_name); controld_globals.dc_name = NULL; free(controld_globals.dc_version); controld_globals.dc_version = NULL; free(controld_globals.cluster_name); controld_globals.cluster_name = NULL; free(controld_globals.te_uuid); controld_globals.te_uuid = NULL; free_max_generation(); controld_destroy_cib_replacements_table(); controld_destroy_failed_sync_table(); controld_destroy_outside_events_table(); mainloop_destroy_signal(SIGPIPE); mainloop_destroy_signal(SIGUSR1); mainloop_destroy_signal(SIGTERM); mainloop_destroy_signal(SIGTRAP); /* leave SIGCHLD engaged as we might still want to drain some service-actions */ if (mloop) { GMainContext *ctx = g_main_loop_get_context(controld_globals.mainloop); /* Don't re-enter this block */ controld_globals.mainloop = NULL; /* no signals on final draining anymore */ mainloop_destroy_signal(SIGCHLD); crm_trace("Draining mainloop %d %d", g_main_loop_is_running(mloop), g_main_context_pending(ctx)); { int lpc = 0; while((g_main_context_pending(ctx) && lpc < 10)) { lpc++; crm_trace("Iteration %d", lpc); g_main_context_dispatch(ctx); } } crm_trace("Closing mainloop %d %d", g_main_loop_is_running(mloop), g_main_context_pending(ctx)); g_main_loop_quit(mloop); /* Won't do anything yet, since we're inside it now */ g_main_loop_unref(mloop); } else { mainloop_destroy_signal(SIGCHLD); } cib_delete(controld_globals.cib_conn); controld_globals.cib_conn = NULL; throttle_fini(); /* Graceful */ crm_trace("Done preparing for exit with status %d (%s)", exit_code, crm_exit_str(exit_code)); return exit_code; } /* A_EXIT_0, A_EXIT_1 */ void do_exit(long long action, enum crmd_fsa_cause cause, enum crmd_fsa_state cur_state, enum crmd_fsa_input current_input, fsa_data_t * msg_data) { crm_exit_t exit_code = CRM_EX_OK; int log_level = LOG_INFO; const char *exit_type = "gracefully"; if (action & A_EXIT_1) { log_level = LOG_ERR; exit_type = "forcefully"; exit_code = CRM_EX_ERROR; } verify_stopped(cur_state, LOG_ERR); do_crm_log(log_level, "Performing %s - %s exiting the controller", fsa_action2string(action), exit_type); crm_info("[%s] stopped (%d)", crm_system_name, exit_code); crmd_exit(exit_code); } static void sigpipe_ignore(int nsig) { return; } /* A_STARTUP */ void do_startup(long long action, enum crmd_fsa_cause cause, enum crmd_fsa_state cur_state, enum crmd_fsa_input current_input, fsa_data_t * msg_data) { crm_debug("Registering Signal Handlers"); mainloop_add_signal(SIGTERM, crm_shutdown); mainloop_add_signal(SIGPIPE, sigpipe_ignore); config_read_trigger = mainloop_add_trigger(G_PRIORITY_HIGH, crm_read_options, NULL); controld_init_fsa_trigger(); controld_init_transition_trigger(); crm_debug("Creating CIB manager and executor objects"); controld_globals.cib_conn = cib_new(); lrm_state_init_local(); if (controld_init_fsa_timers() == FALSE) { register_fsa_error(C_FSA_INTERNAL, I_ERROR, NULL); } } // \return libqb error code (0 on success, -errno on error) static int32_t accept_controller_client(qb_ipcs_connection_t *c, uid_t uid, gid_t gid) { crm_trace("Accepting new IPC client connection"); if (pcmk__new_client(c, uid, gid) == NULL) { return -EIO; } return 0; } // \return libqb error code (0 on success, -errno on error) static int32_t dispatch_controller_ipc(qb_ipcs_connection_t * c, void *data, size_t size) { uint32_t id = 0; uint32_t flags = 0; pcmk__client_t *client = pcmk__find_client(c); xmlNode *msg = pcmk__client_data2xml(client, data, &id, &flags); if (msg == NULL) { pcmk__ipc_send_ack(client, id, flags, "ack", NULL, CRM_EX_PROTOCOL); return 0; } pcmk__ipc_send_ack(client, id, flags, "ack", NULL, CRM_EX_INDETERMINATE); CRM_ASSERT(client->user != NULL); pcmk__update_acl_user(msg, F_CRM_USER, client->user); crm_xml_add(msg, F_CRM_SYS_FROM, client->id); if (controld_authorize_ipc_message(msg, client, NULL)) { crm_trace("Processing IPC message from client %s", pcmk__client_name(client)); route_message(C_IPC_MESSAGE, msg); } controld_trigger_fsa(); free_xml(msg); return 0; } static int32_t ipc_client_disconnected(qb_ipcs_connection_t *c) { pcmk__client_t *client = pcmk__find_client(c); if (client) { crm_trace("Disconnecting %sregistered client %s (%p/%p)", (client->userdata? "" : "un"), pcmk__client_name(client), c, client); free(client->userdata); pcmk__free_client(client); controld_trigger_fsa(); } return 0; } static void ipc_connection_destroyed(qb_ipcs_connection_t *c) { crm_trace("Connection %p", c); ipc_client_disconnected(c); } /* A_STOP */ void do_stop(long long action, enum crmd_fsa_cause cause, enum crmd_fsa_state cur_state, enum crmd_fsa_input current_input, fsa_data_t * msg_data) { crm_trace("Closing IPC server"); mainloop_del_ipc_server(ipcs); ipcs = NULL; register_fsa_input(C_FSA_INTERNAL, I_TERMINATE, NULL); } /* A_STARTED */ void do_started(long long action, enum crmd_fsa_cause cause, enum crmd_fsa_state cur_state, enum crmd_fsa_input current_input, fsa_data_t * msg_data) { static struct qb_ipcs_service_handlers crmd_callbacks = { .connection_accept = accept_controller_client, .connection_created = NULL, .msg_process = dispatch_controller_ipc, .connection_closed = ipc_client_disconnected, .connection_destroyed = ipc_connection_destroyed }; if (cur_state != S_STARTING) { crm_err("Start cancelled... %s", fsa_state2string(cur_state)); return; } else if (!pcmk_is_set(controld_globals.fsa_input_register, R_MEMBERSHIP)) { crm_info("Delaying start, no membership data (%.16llx)", R_MEMBERSHIP); crmd_fsa_stall(TRUE); return; } else if (!pcmk_is_set(controld_globals.fsa_input_register, R_LRM_CONNECTED)) { crm_info("Delaying start, not connected to executor (%.16llx)", R_LRM_CONNECTED); crmd_fsa_stall(TRUE); return; } else if (!pcmk_is_set(controld_globals.fsa_input_register, R_CIB_CONNECTED)) { crm_info("Delaying start, CIB not connected (%.16llx)", R_CIB_CONNECTED); crmd_fsa_stall(TRUE); return; } else if (!pcmk_is_set(controld_globals.fsa_input_register, R_READ_CONFIG)) { crm_info("Delaying start, Config not read (%.16llx)", R_READ_CONFIG); crmd_fsa_stall(TRUE); return; } else if (!pcmk_is_set(controld_globals.fsa_input_register, R_PEER_DATA)) { crm_info("Delaying start, No peer data (%.16llx)", R_PEER_DATA); crmd_fsa_stall(TRUE); return; } crm_debug("Init server comms"); ipcs = pcmk__serve_controld_ipc(&crmd_callbacks); if (ipcs == NULL) { crm_err("Failed to create IPC server: shutting down and inhibiting respawn"); register_fsa_error(C_FSA_INTERNAL, I_ERROR, NULL); } else { crm_notice("Pacemaker controller successfully started and accepting connections"); } controld_trigger_fencer_connect(); controld_clear_fsa_input_flags(R_STARTING); register_fsa_input(msg_data->fsa_cause, I_PENDING, NULL); } /* A_RECOVER */ void do_recover(long long action, enum crmd_fsa_cause cause, enum crmd_fsa_state cur_state, enum crmd_fsa_input current_input, fsa_data_t * msg_data) { controld_set_fsa_input_flags(R_IN_RECOVERY); crm_warn("Fast-tracking shutdown in response to errors"); register_fsa_input(C_FSA_INTERNAL, I_TERMINATE, NULL); } static pcmk__cluster_option_t controller_options[] = { /* name, old name, type, allowed values, * default value, validator, * short description, * long description */ { "dc-version", NULL, "string", NULL, PCMK__VALUE_NONE, NULL, N_("Pacemaker version on cluster node elected Designated Controller (DC)"), N_("Includes a hash which identifies the exact changeset the code was " "built from. Used for diagnostic purposes.") }, { "cluster-infrastructure", NULL, "string", NULL, "corosync", NULL, N_("The messaging stack on which Pacemaker is currently running"), N_("Used for informational and diagnostic purposes.") }, { "cluster-name", NULL, "string", NULL, NULL, NULL, N_("An arbitrary name for the cluster"), N_("This optional value is mostly for users' convenience as desired " "in administration, but may also be used in Pacemaker " "configuration rules via the #cluster-name node attribute, and " "by higher-level tools and resource agents.") }, { XML_CONFIG_ATTR_DC_DEADTIME, NULL, "time", NULL, "20s", pcmk__valid_interval_spec, N_("How long to wait for a response from other nodes during start-up"), N_("The optimal value will depend on the speed and load of your network " "and the type of switches used.") }, { XML_CONFIG_ATTR_RECHECK, NULL, "time", N_("Zero disables polling, while positive values are an interval in seconds" "(unless other units are specified, for example \"5min\")"), "15min", pcmk__valid_interval_spec, N_("Polling interval to recheck cluster state and evaluate rules " "with date specifications"), N_("Pacemaker is primarily event-driven, and looks ahead to know when to " "recheck cluster state for failure timeouts and most time-based " "rules. However, it will also recheck the cluster after this " "amount of inactivity, to evaluate rules with date specifications " "and serve as a fail-safe for certain types of scheduler bugs.") }, { "load-threshold", NULL, "percentage", NULL, "80%", pcmk__valid_percentage, N_("Maximum amount of system load that should be used by cluster nodes"), N_("The cluster will slow down its recovery process when the amount of " "system resources used (currently CPU) approaches this limit"), }, { "node-action-limit", NULL, "integer", NULL, "0", pcmk__valid_number, N_("Maximum number of jobs that can be scheduled per node " "(defaults to 2x cores)") }, { XML_CONFIG_ATTR_FENCE_REACTION, NULL, "string", NULL, "stop", NULL, N_("How a cluster node should react if notified of its own fencing"), N_("A cluster node may receive notification of its own fencing if fencing " "is misconfigured, or if fabric fencing is in use that doesn't cut " "cluster communication. Allowed values are \"stop\" to attempt to " "immediately stop Pacemaker and stay stopped, or \"panic\" to attempt " "to immediately reboot the local node, falling back to stop on failure.") }, { XML_CONFIG_ATTR_ELECTION_FAIL, NULL, "time", NULL, "2min", pcmk__valid_interval_spec, "*** Advanced Use Only ***", N_("Declare an election failed if it is not decided within this much " "time. If you need to adjust this value, it probably indicates " "the presence of a bug.") }, { XML_CONFIG_ATTR_FORCE_QUIT, NULL, "time", NULL, "20min", pcmk__valid_interval_spec, "*** Advanced Use Only ***", N_("Exit immediately if shutdown does not complete within this much " "time. If you need to adjust this value, it probably indicates " "the presence of a bug.") }, { "join-integration-timeout", "crmd-integration-timeout", "time", NULL, "3min", pcmk__valid_interval_spec, "*** Advanced Use Only ***", N_("If you need to adjust this value, it probably indicates " "the presence of a bug.") }, { "join-finalization-timeout", "crmd-finalization-timeout", "time", NULL, "30min", pcmk__valid_interval_spec, "*** Advanced Use Only ***", N_("If you need to adjust this value, it probably indicates " "the presence of a bug.") }, { "transition-delay", "crmd-transition-delay", "time", NULL, "0s", pcmk__valid_interval_spec, N_("*** Advanced Use Only *** Enabling this option will slow down " "cluster recovery under all conditions"), N_("Delay cluster recovery for this much time to allow for additional " "events to occur. Useful if your configuration is sensitive to " "the order in which ping updates arrive.") }, { "stonith-watchdog-timeout", NULL, "time", NULL, "0", controld_verify_stonith_watchdog_timeout, N_("How long before nodes can be assumed to be safely down when " "watchdog-based self-fencing via SBD is in use"), N_("If this is set to a positive value, lost nodes are assumed to " "self-fence using watchdog-based SBD within this much time. This " "does not require a fencing resource to be explicitly configured, " "though a fence_watchdog resource can be configured, to limit use " "to specific nodes. If this is set to 0 (the default), the cluster " "will never assume watchdog-based self-fencing. If this is set to a " "negative value, the cluster will use twice the local value of the " "`SBD_WATCHDOG_TIMEOUT` environment variable if that is positive, " "or otherwise treat this as 0. WARNING: When used, this timeout " "must be larger than `SBD_WATCHDOG_TIMEOUT` on all nodes that use " "watchdog-based SBD, and Pacemaker will refuse to start on any of " "those nodes where this is not true for the local value or SBD is " "not active. When this is set to a negative value, " "`SBD_WATCHDOG_TIMEOUT` must be set to the same value on all nodes " "that use SBD, otherwise data corruption or loss could occur.") }, { "stonith-max-attempts", NULL, "integer", NULL, "10", pcmk__valid_positive_number, N_("How many times fencing can fail before it will no longer be " "immediately re-attempted on a target") }, // Already documented in libpe_status (other values must be kept identical) { "no-quorum-policy", NULL, "select", "stop, freeze, ignore, demote, suicide", "stop", pcmk__valid_quorum, N_("What to do when the cluster does not have quorum"), NULL }, { XML_CONFIG_ATTR_SHUTDOWN_LOCK, NULL, "boolean", NULL, "false", pcmk__valid_boolean, N_("Whether to lock resources to a cleanly shut down node"), N_("When true, resources active on a node when it is cleanly shut down " "are kept \"locked\" to that node (not allowed to run elsewhere) " "until they start again on that node after it rejoins (or for at " "most shutdown-lock-limit, if set). Stonith resources and " "Pacemaker Remote connections are never locked. Clone and bundle " "instances and the promoted role of promotable clones are " "currently never locked, though support could be added in a future " "release.") }, { XML_CONFIG_ATTR_SHUTDOWN_LOCK_LIMIT, NULL, "time", NULL, "0", pcmk__valid_interval_spec, N_("Do not lock resources to a cleanly shut down node longer than " "this"), N_("If shutdown-lock is true and this is set to a nonzero time " "duration, shutdown locks will expire after this much time has " "passed since the shutdown was initiated, even if the node has not " "rejoined.") }, }; void crmd_metadata(void) { const char *desc_short = "Pacemaker controller options"; const char *desc_long = "Cluster options used by Pacemaker's controller"; gchar *s = pcmk__format_option_metadata("pacemaker-controld", desc_short, desc_long, controller_options, PCMK__NELEM(controller_options)); printf("%s", s); g_free(s); } static void config_query_callback(xmlNode * msg, int call_id, int rc, xmlNode * output, void *user_data) { const char *value = NULL; GHashTable *config_hash = NULL; crm_time_t *now = crm_time_new(NULL); xmlNode *crmconfig = NULL; xmlNode *alerts = NULL; if (rc != pcmk_ok) { fsa_data_t *msg_data = NULL; crm_err("Local CIB query resulted in an error: %s", pcmk_strerror(rc)); register_fsa_error(C_FSA_INTERNAL, I_ERROR, NULL); if (rc == -EACCES || rc == -pcmk_err_schema_validation) { crm_err("The cluster is mis-configured - shutting down and staying down"); controld_set_fsa_input_flags(R_STAYDOWN); } goto bail; } crmconfig = output; if ((crmconfig) && (crm_element_name(crmconfig)) && (strcmp(crm_element_name(crmconfig), XML_CIB_TAG_CRMCONFIG) != 0)) { crmconfig = first_named_child(crmconfig, XML_CIB_TAG_CRMCONFIG); } if (!crmconfig) { fsa_data_t *msg_data = NULL; crm_err("Local CIB query for " XML_CIB_TAG_CRMCONFIG " section failed"); register_fsa_error(C_FSA_INTERNAL, I_ERROR, NULL); goto bail; } crm_debug("Call %d : Parsing CIB options", call_id); config_hash = pcmk__strkey_table(free, free); pe_unpack_nvpairs(crmconfig, crmconfig, XML_CIB_TAG_PROPSET, NULL, config_hash, CIB_OPTIONS_FIRST, FALSE, now, NULL); // Validate all options, and use defaults if not already present in hash pcmk__validate_cluster_options(config_hash, controller_options, PCMK__NELEM(controller_options)); value = g_hash_table_lookup(config_hash, "no-quorum-policy"); if (pcmk__str_eq(value, "suicide", pcmk__str_casei) && pcmk__locate_sbd()) { controld_set_global_flags(controld_no_quorum_suicide); } value = g_hash_table_lookup(config_hash, XML_CONFIG_ATTR_SHUTDOWN_LOCK); if (crm_is_true(value)) { controld_set_global_flags(controld_shutdown_lock_enabled); } else { controld_clear_global_flags(controld_shutdown_lock_enabled); } value = g_hash_table_lookup(config_hash, XML_CONFIG_ATTR_SHUTDOWN_LOCK_LIMIT); controld_globals.shutdown_lock_limit = crm_parse_interval_spec(value) / 1000; + value = g_hash_table_lookup(config_hash, + XML_CONFIG_ATTR_NODE_PENDING_TIMEOUT); + controld_globals.node_pending_timeout = crm_parse_interval_spec(value) / 1000; + value = g_hash_table_lookup(config_hash, "cluster-name"); pcmk__str_update(&(controld_globals.cluster_name), value); // Let subcomponents initialize their own static variables controld_configure_election(config_hash); controld_configure_fencing(config_hash); controld_configure_fsa_timers(config_hash); controld_configure_throttle(config_hash); alerts = first_named_child(output, XML_CIB_TAG_ALERTS); crmd_unpack_alerts(alerts); controld_set_fsa_input_flags(R_READ_CONFIG); controld_trigger_fsa(); g_hash_table_destroy(config_hash); bail: crm_time_free(now); } /*! * \internal * \brief Trigger read and processing of the configuration * * \param[in] fn Calling function name * \param[in] line Line number where call occurred */ void controld_trigger_config_as(const char *fn, int line) { if (config_read_trigger != NULL) { crm_trace("%s:%d - Triggered config processing", fn, line); mainloop_set_trigger(config_read_trigger); } } gboolean crm_read_options(gpointer user_data) { cib_t *cib_conn = controld_globals.cib_conn; int call_id = cib_conn->cmds->query(cib_conn, "//" XML_CIB_TAG_CRMCONFIG " | //" XML_CIB_TAG_ALERTS, NULL, cib_xpath|cib_scope_local); fsa_register_cib_callback(call_id, NULL, config_query_callback); crm_trace("Querying the CIB... call %d", call_id); return TRUE; } /* A_READCONFIG */ void do_read_config(long long action, enum crmd_fsa_cause cause, enum crmd_fsa_state cur_state, enum crmd_fsa_input current_input, fsa_data_t * msg_data) { throttle_init(); controld_trigger_config(); } void crm_shutdown(int nsig) { const char *value = NULL; guint default_period_ms = 0; if ((controld_globals.mainloop == NULL) || !g_main_loop_is_running(controld_globals.mainloop)) { crmd_exit(CRM_EX_OK); return; } if (pcmk_is_set(controld_globals.fsa_input_register, R_SHUTDOWN)) { crm_err("Escalating shutdown"); register_fsa_input_before(C_SHUTDOWN, I_ERROR, NULL); return; } controld_set_fsa_input_flags(R_SHUTDOWN); register_fsa_input(C_SHUTDOWN, I_SHUTDOWN, NULL); /* If shutdown timer doesn't have a period set, use the default * * @TODO: Evaluate whether this is still necessary. As long as * config_query_callback() has been run at least once, it doesn't look like * anything could have changed the timer period since then. */ value = pcmk__cluster_option(NULL, controller_options, PCMK__NELEM(controller_options), XML_CONFIG_ATTR_FORCE_QUIT); default_period_ms = crm_parse_interval_spec(value); controld_shutdown_start_countdown(default_period_ms); } diff --git a/daemons/controld/controld_corosync.c b/daemons/controld/controld_corosync.c index 4378b30b58..e7818a5e51 100644 --- a/daemons/controld/controld_corosync.c +++ b/daemons/controld/controld_corosync.c @@ -1,164 +1,165 @@ /* * Copyright 2004-2022 the Pacemaker project contributors * * The version control history for this file may have further details. * * This source code is licensed under the GNU General Public License version 2 * or later (GPLv2+) WITHOUT ANY WARRANTY. */ #include #include #include #include #include #include #include #include #if SUPPORT_COROSYNC extern void post_cache_update(int seq); /* A_HA_CONNECT */ static void crmd_cs_dispatch(cpg_handle_t handle, const struct cpg_name *groupName, uint32_t nodeid, uint32_t pid, void *msg, size_t msg_len) { uint32_t kind = 0; const char *from = NULL; char *data = pcmk_message_common_cs(handle, nodeid, pid, msg, &kind, &from); if(data == NULL) { return; } if (kind == crm_class_cluster) { crm_node_t *peer = NULL; xmlNode *xml = string2xml(data); if (xml == NULL) { crm_err("Could not parse message content (%d): %.100s", kind, data); free(data); return; } crm_xml_add(xml, F_ORIG, from); /* crm_xml_add_int(xml, F_SEQ, wrapper->id); Fake? */ peer = crm_get_peer(0, from); if (!pcmk_is_set(peer->processes, crm_proc_cpg)) { /* If we can still talk to our peer process on that node, * then it must be part of the corosync membership */ crm_warn("Receiving messages from a node we think is dead: %s[%d]", peer->uname, peer->id); crm_update_peer_proc(__func__, peer, crm_proc_cpg, ONLINESTATUS); } crmd_ha_msg_filter(xml); free_xml(xml); } else { crm_err("Invalid message class (%d): %.100s", kind, data); } free(data); } static gboolean crmd_quorum_callback(unsigned long long seq, gboolean quorate) { crm_update_quorum(quorate, FALSE); post_cache_update(seq); return TRUE; } static void crmd_cs_destroy(gpointer user_data) { if (!pcmk_is_set(controld_globals.fsa_input_register, R_HA_DISCONNECTED)) { crm_crit("Lost connection to cluster layer, shutting down"); crmd_exit(CRM_EX_DISCONNECT); } else { crm_info("Corosync connection closed"); } } /*! * \brief Handle a Corosync notification of a CPG configuration change * * \param[in] handle CPG connection * \param[in] cpg_name CPG group name * \param[in] member_list List of current CPG members * \param[in] member_list_entries Number of entries in \p member_list * \param[in] left_list List of CPG members that left * \param[in] left_list_entries Number of entries in \p left_list * \param[in] joined_list List of CPG members that joined * \param[in] joined_list_entries Number of entries in \p joined_list */ static void cpg_membership_callback(cpg_handle_t handle, const struct cpg_name *cpg_name, const struct cpg_address *member_list, size_t member_list_entries, const struct cpg_address *left_list, size_t left_list_entries, const struct cpg_address *joined_list, size_t joined_list_entries) { /* When nodes leave CPG, the DC clears their transient node attributes. * * However if there is no DC, or the DC is among the nodes that left, each * remaining node needs to do the clearing, to ensure it gets done. * Otherwise, the attributes would persist when the nodes rejoin, which * could have serious consequences for unfencing, agents that use attributes * for internal logic, etc. * * Here, we set a global boolean if the DC is among the nodes that left, for * use by the peer callback. */ if (controld_globals.dc_name != NULL) { crm_node_t *peer = NULL; - peer = pcmk__search_cluster_node_cache(0, controld_globals.dc_name); + peer = pcmk__search_cluster_node_cache(0, controld_globals.dc_name, + NULL); if (peer != NULL) { for (int i = 0; i < left_list_entries; ++i) { if (left_list[i].nodeid == peer->id) { controld_set_global_flags(controld_dc_left); break; } } } } // Process the change normally, which will call the peer callback as needed pcmk_cpg_membership(handle, cpg_name, member_list, member_list_entries, left_list, left_list_entries, joined_list, joined_list_entries); controld_clear_global_flags(controld_dc_left); } extern gboolean crm_connect_corosync(crm_cluster_t * cluster); gboolean crm_connect_corosync(crm_cluster_t * cluster) { if (is_corosync_cluster()) { crm_set_status_callback(&peer_update_callback); cluster->cpg.cpg_deliver_fn = crmd_cs_dispatch; cluster->cpg.cpg_confchg_fn = cpg_membership_callback; cluster->destroy = crmd_cs_destroy; if (crm_cluster_connect(cluster)) { pcmk__corosync_quorum_connect(crmd_quorum_callback, crmd_cs_destroy); return TRUE; } } return FALSE; } #endif diff --git a/daemons/controld/controld_fencing.c b/daemons/controld/controld_fencing.c index 89cb61fa3d..bfad0f2521 100644 --- a/daemons/controld/controld_fencing.c +++ b/daemons/controld/controld_fencing.c @@ -1,1108 +1,1111 @@ /* * Copyright 2004-2023 the Pacemaker project contributors * * The version control history for this file may have further details. * * This source code is licensed under the GNU General Public License version 2 * or later (GPLv2+) WITHOUT ANY WARRANTY. */ #include #include #include #include #include #include #include static void tengine_stonith_history_synced(stonith_t *st, stonith_event_t *st_event); /* * stonith failure counting * * We don't want to get stuck in a permanent fencing loop. Keep track of the * number of fencing failures for each target node, and the most we'll restart a * transition for. */ struct st_fail_rec { int count; }; static bool fence_reaction_panic = false; static unsigned long int stonith_max_attempts = 10; static GHashTable *stonith_failures = NULL; /*! * \internal * \brief Update max fencing attempts before giving up * * \param[in] value New max fencing attempts */ static void update_stonith_max_attempts(const char *value) { stonith_max_attempts = char2score(value); if (stonith_max_attempts < 1UL) { stonith_max_attempts = 10UL; } } /*! * \internal * \brief Configure reaction to notification of local node being fenced * * \param[in] reaction_s Reaction type */ static void set_fence_reaction(const char *reaction_s) { if (pcmk__str_eq(reaction_s, "panic", pcmk__str_casei)) { fence_reaction_panic = true; } else { if (!pcmk__str_eq(reaction_s, "stop", pcmk__str_casei)) { crm_warn("Invalid value '%s' for %s, using 'stop'", reaction_s, XML_CONFIG_ATTR_FENCE_REACTION); } fence_reaction_panic = false; } } /*! * \internal * \brief Configure fencing options based on the CIB * * \param[in,out] options Name/value pairs for configured options */ void controld_configure_fencing(GHashTable *options) { const char *value = NULL; value = g_hash_table_lookup(options, XML_CONFIG_ATTR_FENCE_REACTION); set_fence_reaction(value); value = g_hash_table_lookup(options, "stonith-max-attempts"); update_stonith_max_attempts(value); } static gboolean too_many_st_failures(const char *target) { GHashTableIter iter; const char *key = NULL; struct st_fail_rec *value = NULL; if (stonith_failures == NULL) { return FALSE; } if (target == NULL) { g_hash_table_iter_init(&iter, stonith_failures); while (g_hash_table_iter_next(&iter, (gpointer *) &key, (gpointer *) &value)) { if (value->count >= stonith_max_attempts) { target = (const char*)key; goto too_many; } } } else { value = g_hash_table_lookup(stonith_failures, target); if ((value != NULL) && (value->count >= stonith_max_attempts)) { goto too_many; } } return FALSE; too_many: crm_warn("Too many failures (%d) to fence %s, giving up", value->count, target); return TRUE; } /*! * \internal * \brief Reset a stonith fail count * * \param[in] target Name of node to reset, or NULL for all */ void st_fail_count_reset(const char *target) { if (stonith_failures == NULL) { return; } if (target) { struct st_fail_rec *rec = NULL; rec = g_hash_table_lookup(stonith_failures, target); if (rec) { rec->count = 0; } } else { GHashTableIter iter; const char *key = NULL; struct st_fail_rec *rec = NULL; g_hash_table_iter_init(&iter, stonith_failures); while (g_hash_table_iter_next(&iter, (gpointer *) &key, (gpointer *) &rec)) { rec->count = 0; } } } static void st_fail_count_increment(const char *target) { struct st_fail_rec *rec = NULL; if (stonith_failures == NULL) { stonith_failures = pcmk__strkey_table(free, free); } rec = g_hash_table_lookup(stonith_failures, target); if (rec) { rec->count++; } else { rec = malloc(sizeof(struct st_fail_rec)); if(rec == NULL) { return; } rec->count = 1; g_hash_table_insert(stonith_failures, strdup(target), rec); } } /* end stonith fail count functions */ static void cib_fencing_updated(xmlNode *msg, int call_id, int rc, xmlNode *output, void *user_data) { if (rc < pcmk_ok) { crm_err("Fencing update %d for %s: failed - %s (%d)", call_id, (char *)user_data, pcmk_strerror(rc), rc); crm_log_xml_warn(msg, "Failed update"); abort_transition(INFINITY, pcmk__graph_shutdown, "CIB update failed", NULL); } else { crm_info("Fencing update %d for %s: complete", call_id, (char *)user_data); } } static void send_stonith_update(pcmk__graph_action_t *action, const char *target, const char *uuid) { int rc = pcmk_ok; crm_node_t *peer = NULL; /* We (usually) rely on the membership layer to do node_update_cluster, * and the peer status callback to do node_update_peer, because the node * might have already rejoined before we get the stonith result here. */ int flags = node_update_join | node_update_expected; /* zero out the node-status & remove all LRM status info */ xmlNode *node_state = NULL; CRM_CHECK(target != NULL, return); CRM_CHECK(uuid != NULL, return); - /* Make sure the membership and join caches are accurate */ - peer = crm_get_peer_full(0, target, CRM_GET_PEER_ANY); + /* Make sure the membership and join caches are accurate. + * Try getting any existing node cache entry also by node uuid in case it + * doesn't have an uname yet. + */ + peer = pcmk__get_peer_full(0, target, uuid, CRM_GET_PEER_ANY); CRM_CHECK(peer != NULL, return); if (peer->state == NULL) { /* Usually, we rely on the membership layer to update the cluster state * in the CIB. However, if the node has never been seen, do it here, so * the node is not considered unclean. */ flags |= node_update_cluster; } if (peer->uuid == NULL) { crm_info("Recording uuid '%s' for node '%s'", uuid, target); peer->uuid = strdup(uuid); } crmd_peer_down(peer, TRUE); /* Generate a node state update for the CIB */ node_state = create_node_state_update(peer, flags, NULL, __func__); /* we have to mark whether or not remote nodes have already been fenced */ if (peer->flags & crm_remote_node) { char *now_s = pcmk__ttoa(time(NULL)); crm_xml_add(node_state, XML_NODE_IS_FENCED, now_s); free(now_s); } /* Force our known ID */ crm_xml_add(node_state, XML_ATTR_ID, uuid); rc = controld_globals.cib_conn->cmds->modify(controld_globals.cib_conn, XML_CIB_TAG_STATUS, node_state, cib_scope_local |cib_can_create); /* Delay processing the trigger until the update completes */ crm_debug("Sending fencing update %d for %s", rc, target); fsa_register_cib_callback(rc, strdup(target), cib_fencing_updated); // Make sure it sticks /* controld_globals.cib_conn->cmds->bump_epoch(controld_globals.cib_conn, * cib_scope_local); */ controld_delete_node_state(peer->uname, controld_section_all, cib_scope_local); free_xml(node_state); return; } /*! * \internal * \brief Abort transition due to stonith failure * * \param[in] abort_action Whether to restart or stop transition * \param[in] target Don't restart if this (NULL for any) has too many failures * \param[in] reason Log this stonith action XML as abort reason (or NULL) */ static void abort_for_stonith_failure(enum pcmk__graph_next abort_action, const char *target, const xmlNode *reason) { /* If stonith repeatedly fails, we eventually give up on starting a new * transition for that reason. */ if ((abort_action != pcmk__graph_wait) && too_many_st_failures(target)) { abort_action = pcmk__graph_wait; } abort_transition(INFINITY, abort_action, "Stonith failed", reason); } /* * stonith cleanup list * * If the DC is shot, proper notifications might not go out. * The stonith cleanup list allows the cluster to (re-)send * notifications once a new DC is elected. */ static GList *stonith_cleanup_list = NULL; /*! * \internal * \brief Add a node to the stonith cleanup list * * \param[in] target Name of node to add */ void add_stonith_cleanup(const char *target) { stonith_cleanup_list = g_list_append(stonith_cleanup_list, strdup(target)); } /*! * \internal * \brief Remove a node from the stonith cleanup list * * \param[in] Name of node to remove */ void remove_stonith_cleanup(const char *target) { GList *iter = stonith_cleanup_list; while (iter != NULL) { GList *tmp = iter; char *iter_name = tmp->data; iter = iter->next; if (pcmk__str_eq(target, iter_name, pcmk__str_casei)) { crm_trace("Removing %s from the cleanup list", iter_name); stonith_cleanup_list = g_list_delete_link(stonith_cleanup_list, tmp); free(iter_name); } } } /*! * \internal * \brief Purge all entries from the stonith cleanup list */ void purge_stonith_cleanup(void) { if (stonith_cleanup_list) { GList *iter = NULL; for (iter = stonith_cleanup_list; iter != NULL; iter = iter->next) { char *target = iter->data; crm_info("Purging %s from stonith cleanup list", target); free(target); } g_list_free(stonith_cleanup_list); stonith_cleanup_list = NULL; } } /*! * \internal * \brief Send stonith updates for all entries in cleanup list, then purge it */ void execute_stonith_cleanup(void) { GList *iter; for (iter = stonith_cleanup_list; iter != NULL; iter = iter->next) { char *target = iter->data; crm_node_t *target_node = crm_get_peer(0, target); const char *uuid = crm_peer_uuid(target_node); crm_notice("Marking %s, target of a previous stonith action, as clean", target); send_stonith_update(NULL, target, uuid); free(target); } g_list_free(stonith_cleanup_list); stonith_cleanup_list = NULL; } /* end stonith cleanup list functions */ /* stonith API client * * Functions that need to interact directly with the fencer via its API */ static stonith_t *stonith_api = NULL; static crm_trigger_t *stonith_reconnect = NULL; static char *te_client_id = NULL; static gboolean fail_incompletable_stonith(pcmk__graph_t *graph) { GList *lpc = NULL; const char *task = NULL; xmlNode *last_action = NULL; if (graph == NULL) { return FALSE; } for (lpc = graph->synapses; lpc != NULL; lpc = lpc->next) { GList *lpc2 = NULL; pcmk__graph_synapse_t *synapse = (pcmk__graph_synapse_t *) lpc->data; if (pcmk_is_set(synapse->flags, pcmk__synapse_confirmed)) { continue; } for (lpc2 = synapse->actions; lpc2 != NULL; lpc2 = lpc2->next) { pcmk__graph_action_t *action = (pcmk__graph_action_t *) lpc2->data; if ((action->type != pcmk__cluster_graph_action) || pcmk_is_set(action->flags, pcmk__graph_action_confirmed)) { continue; } task = crm_element_value(action->xml, XML_LRM_ATTR_TASK); if (task && pcmk__str_eq(task, CRM_OP_FENCE, pcmk__str_casei)) { pcmk__set_graph_action_flags(action, pcmk__graph_action_failed); last_action = action->xml; pcmk__update_graph(graph, action); crm_notice("Failing action %d (%s): fencer terminated", action->id, ID(action->xml)); } } } if (last_action != NULL) { crm_warn("Fencer failure resulted in unrunnable actions"); abort_for_stonith_failure(pcmk__graph_restart, NULL, last_action); return TRUE; } return FALSE; } static void tengine_stonith_connection_destroy(stonith_t *st, stonith_event_t *e) { te_cleanup_stonith_history_sync(st, FALSE); if (pcmk_is_set(controld_globals.fsa_input_register, R_ST_REQUIRED)) { crm_crit("Fencing daemon connection failed"); mainloop_set_trigger(stonith_reconnect); } else { crm_info("Fencing daemon disconnected"); } if (stonith_api) { /* the client API won't properly reconnect notifications * if they are still in the table - so remove them */ if (stonith_api->state != stonith_disconnected) { stonith_api->cmds->disconnect(st); } stonith_api->cmds->remove_notification(stonith_api, NULL); } if (AM_I_DC) { fail_incompletable_stonith(controld_globals.transition_graph); trigger_graph(); } } /*! * \internal * \brief Handle an event notification from the fencing API * * \param[in] st Fencing API connection (ignored) * \param[in] event Fencing API event notification */ static void handle_fence_notification(stonith_t *st, stonith_event_t *event) { bool succeeded = true; const char *executioner = "the cluster"; const char *client = "a client"; const char *reason = NULL; int exec_status; if (te_client_id == NULL) { te_client_id = crm_strdup_printf("%s.%lu", crm_system_name, (unsigned long) getpid()); } if (event == NULL) { crm_err("Notify data not found"); return; } if (event->executioner != NULL) { executioner = event->executioner; } if (event->client_origin != NULL) { client = event->client_origin; } exec_status = stonith__event_execution_status(event); if ((stonith__event_exit_status(event) != CRM_EX_OK) || (exec_status != PCMK_EXEC_DONE)) { succeeded = false; if (exec_status == PCMK_EXEC_DONE) { exec_status = PCMK_EXEC_ERROR; } } reason = stonith__event_exit_reason(event); crmd_alert_fencing_op(event); if (pcmk__str_eq("on", event->action, pcmk__str_none)) { // Unfencing doesn't need special handling, just a log message if (succeeded) { crm_notice("%s was unfenced by %s at the request of %s@%s", event->target, executioner, client, event->origin); } else { crm_err("Unfencing of %s by %s failed (%s%s%s) with exit status %d", event->target, executioner, pcmk_exec_status_str(exec_status), ((reason == NULL)? "" : ": "), ((reason == NULL)? "" : reason), stonith__event_exit_status(event)); } return; } if (succeeded && pcmk__str_eq(event->target, controld_globals.our_nodename, pcmk__str_casei)) { /* We were notified of our own fencing. Most likely, either fencing was * misconfigured, or fabric fencing that doesn't cut cluster * communication is in use. * * Either way, shutting down the local host is a good idea, to require * administrator intervention. Also, other nodes would otherwise likely * set our status to lost because of the fencing callback and discard * our subsequent election votes as "not part of our cluster". */ crm_crit("We were allegedly just fenced by %s for %s!", executioner, event->origin); // Dumps blackbox if enabled if (fence_reaction_panic) { pcmk__panic(__func__); } else { crm_exit(CRM_EX_FATAL); } return; // Should never get here } /* Update the count of fencing failures for this target, in case we become * DC later. The current DC has already updated its fail count in * tengine_stonith_callback(). */ if (!AM_I_DC) { if (succeeded) { st_fail_count_reset(event->target); } else { st_fail_count_increment(event->target); } } crm_notice("Peer %s was%s terminated (%s) by %s on behalf of %s@%s: " "%s%s%s%s " CRM_XS " event=%s", event->target, (succeeded? "" : " not"), event->action, executioner, client, event->origin, (succeeded? "OK" : pcmk_exec_status_str(exec_status)), ((reason == NULL)? "" : " ("), ((reason == NULL)? "" : reason), ((reason == NULL)? "" : ")"), event->id); if (succeeded) { crm_node_t *peer = pcmk__search_known_node_cache(0, event->target, CRM_GET_PEER_ANY); const char *uuid = NULL; if (peer == NULL) { return; } uuid = crm_peer_uuid(peer); if (AM_I_DC) { /* The DC always sends updates */ send_stonith_update(NULL, event->target, uuid); /* @TODO Ideally, at this point, we'd check whether the fenced node * hosted any guest nodes, and call remote_node_down() for them. * Unfortunately, the controller doesn't have a simple, reliable way * to map hosts to guests. It might be possible to track this in the * peer cache via crm_remote_peer_cache_refresh(). For now, we rely * on the scheduler creating fence pseudo-events for the guests. */ if (!pcmk__str_eq(client, te_client_id, pcmk__str_casei)) { /* Abort the current transition if it wasn't the cluster that * initiated fencing. */ crm_info("External fencing operation from %s fenced %s", client, event->target); abort_transition(INFINITY, pcmk__graph_restart, "External Fencing Operation", NULL); } } else if (pcmk__str_eq(controld_globals.dc_name, event->target, pcmk__str_null_matches|pcmk__str_casei) && !pcmk_is_set(peer->flags, crm_remote_node)) { // Assume the target was our DC if we don't currently have one if (controld_globals.dc_name != NULL) { crm_notice("Fencing target %s was our DC", event->target); } else { crm_notice("Fencing target %s may have been our DC", event->target); } /* Given the CIB resyncing that occurs around elections, * have one node update the CIB now and, if the new DC is different, * have them do so too after the election */ if (pcmk__str_eq(event->executioner, controld_globals.our_nodename, pcmk__str_casei)) { send_stonith_update(NULL, event->target, uuid); } add_stonith_cleanup(event->target); } /* If the target is a remote node, and we host its connection, * immediately fail all monitors so it can be recovered quickly. * The connection won't necessarily drop when a remote node is fenced, * so the failure might not otherwise be detected until the next poke. */ if (pcmk_is_set(peer->flags, crm_remote_node)) { remote_ra_fail(event->target); } crmd_peer_down(peer, TRUE); } } /*! * \brief Connect to fencer * * \param[in] user_data If NULL, retry failures now, otherwise retry in main loop * * \return TRUE * \note If user_data is NULL, this will wait 2s between attempts, for up to * 30 attempts, meaning the controller could be blocked as long as 58s. */ static gboolean te_connect_stonith(gpointer user_data) { int rc = pcmk_ok; if (stonith_api == NULL) { stonith_api = stonith_api_new(); if (stonith_api == NULL) { crm_err("Could not connect to fencer: API memory allocation failed"); return TRUE; } } if (stonith_api->state != stonith_disconnected) { crm_trace("Already connected to fencer, no need to retry"); return TRUE; } if (user_data == NULL) { // Blocking (retry failures now until successful) rc = stonith_api_connect_retry(stonith_api, crm_system_name, 30); if (rc != pcmk_ok) { crm_err("Could not connect to fencer in 30 attempts: %s " CRM_XS " rc=%d", pcmk_strerror(rc), rc); } } else { // Non-blocking (retry failures later in main loop) rc = stonith_api->cmds->connect(stonith_api, crm_system_name, NULL); if (rc != pcmk_ok) { if (pcmk_is_set(controld_globals.fsa_input_register, R_ST_REQUIRED)) { crm_notice("Fencer connection failed (will retry): %s " CRM_XS " rc=%d", pcmk_strerror(rc), rc); mainloop_set_trigger(stonith_reconnect); } else { crm_info("Fencer connection failed (ignoring because no longer required): %s " CRM_XS " rc=%d", pcmk_strerror(rc), rc); } return TRUE; } } if (rc == pcmk_ok) { stonith_api->cmds->register_notification(stonith_api, T_STONITH_NOTIFY_DISCONNECT, tengine_stonith_connection_destroy); stonith_api->cmds->register_notification(stonith_api, T_STONITH_NOTIFY_FENCE, handle_fence_notification); stonith_api->cmds->register_notification(stonith_api, T_STONITH_NOTIFY_HISTORY_SYNCED, tengine_stonith_history_synced); te_trigger_stonith_history_sync(TRUE); crm_notice("Fencer successfully connected"); } return TRUE; } /*! \internal \brief Schedule fencer connection attempt in main loop */ void controld_trigger_fencer_connect(void) { if (stonith_reconnect == NULL) { stonith_reconnect = mainloop_add_trigger(G_PRIORITY_LOW, te_connect_stonith, GINT_TO_POINTER(TRUE)); } controld_set_fsa_input_flags(R_ST_REQUIRED); mainloop_set_trigger(stonith_reconnect); } void controld_disconnect_fencer(bool destroy) { if (stonith_api) { // Prevent fencer connection from coming up again controld_clear_fsa_input_flags(R_ST_REQUIRED); if (stonith_api->state != stonith_disconnected) { stonith_api->cmds->disconnect(stonith_api); } stonith_api->cmds->remove_notification(stonith_api, NULL); } if (destroy) { if (stonith_api) { stonith_api->cmds->free(stonith_api); stonith_api = NULL; } if (stonith_reconnect) { mainloop_destroy_trigger(stonith_reconnect); stonith_reconnect = NULL; } if (te_client_id) { free(te_client_id); te_client_id = NULL; } } } static gboolean do_stonith_history_sync(gpointer user_data) { if (stonith_api && (stonith_api->state != stonith_disconnected)) { stonith_history_t *history = NULL; te_cleanup_stonith_history_sync(stonith_api, FALSE); stonith_api->cmds->history(stonith_api, st_opt_sync_call | st_opt_broadcast, NULL, &history, 5); stonith_history_free(history); return TRUE; } else { crm_info("Skip triggering stonith history-sync as stonith is disconnected"); return FALSE; } } static void tengine_stonith_callback(stonith_t *stonith, stonith_callback_data_t *data) { char *uuid = NULL; int stonith_id = -1; int transition_id = -1; pcmk__graph_action_t *action = NULL; const char *target = NULL; if ((data == NULL) || (data->userdata == NULL)) { crm_err("Ignoring fence operation %d result: " "No transition key given (bug?)", ((data == NULL)? -1 : data->call_id)); return; } if (!AM_I_DC) { const char *reason = stonith__exit_reason(data); if (reason == NULL) { reason = pcmk_exec_status_str(stonith__execution_status(data)); } crm_notice("Result of fence operation %d: %d (%s) " CRM_XS " key=%s", data->call_id, stonith__exit_status(data), reason, (const char *) data->userdata); return; } CRM_CHECK(decode_transition_key(data->userdata, &uuid, &transition_id, &stonith_id, NULL), goto bail); if (controld_globals.transition_graph->complete || (stonith_id < 0) || !pcmk__str_eq(uuid, controld_globals.te_uuid, pcmk__str_none) || (controld_globals.transition_graph->id != transition_id)) { crm_info("Ignoring fence operation %d result: " "Not from current transition " CRM_XS " complete=%s action=%d uuid=%s (vs %s) transition=%d (vs %d)", data->call_id, pcmk__btoa(controld_globals.transition_graph->complete), stonith_id, uuid, controld_globals.te_uuid, transition_id, controld_globals.transition_graph->id); goto bail; } action = controld_get_action(stonith_id); if (action == NULL) { crm_err("Ignoring fence operation %d result: " "Action %d not found in transition graph (bug?) " CRM_XS " uuid=%s transition=%d", data->call_id, stonith_id, uuid, transition_id); goto bail; } target = crm_element_value(action->xml, XML_LRM_ATTR_TARGET); if (target == NULL) { crm_err("Ignoring fence operation %d result: No target given (bug?)", data->call_id); goto bail; } stop_te_timer(action); if (stonith__exit_status(data) == CRM_EX_OK) { const char *uuid = crm_element_value(action->xml, XML_LRM_ATTR_TARGET_UUID); const char *op = crm_meta_value(action->params, "stonith_action"); crm_info("Fence operation %d for %s succeeded", data->call_id, target); if (!(pcmk_is_set(action->flags, pcmk__graph_action_confirmed))) { te_action_confirmed(action, NULL); if (pcmk__str_eq("on", op, pcmk__str_casei)) { const char *value = NULL; char *now = pcmk__ttoa(time(NULL)); gboolean is_remote_node = FALSE; /* This check is not 100% reliable, since this node is not * guaranteed to have the remote node cached. However, it * doesn't have to be reliable, since the attribute manager can * learn a node's "remoteness" by other means sooner or later. * This allows it to learn more quickly if this node does have * the information. */ if (g_hash_table_lookup(crm_remote_peer_cache, uuid) != NULL) { is_remote_node = TRUE; } update_attrd(target, CRM_ATTR_UNFENCED, now, NULL, is_remote_node); free(now); value = crm_meta_value(action->params, XML_OP_ATTR_DIGESTS_ALL); update_attrd(target, CRM_ATTR_DIGESTS_ALL, value, NULL, is_remote_node); value = crm_meta_value(action->params, XML_OP_ATTR_DIGESTS_SECURE); update_attrd(target, CRM_ATTR_DIGESTS_SECURE, value, NULL, is_remote_node); } else if (!(pcmk_is_set(action->flags, pcmk__graph_action_sent_update))) { send_stonith_update(action, target, uuid); pcmk__set_graph_action_flags(action, pcmk__graph_action_sent_update); } } st_fail_count_reset(target); } else { enum pcmk__graph_next abort_action = pcmk__graph_restart; int status = stonith__execution_status(data); const char *reason = stonith__exit_reason(data); if (reason == NULL) { if (status == PCMK_EXEC_DONE) { reason = "Agent returned error"; } else { reason = pcmk_exec_status_str(status); } } pcmk__set_graph_action_flags(action, pcmk__graph_action_failed); /* If no fence devices were available, there's no use in immediately * checking again, so don't start a new transition in that case. */ if (status == PCMK_EXEC_NO_FENCE_DEVICE) { crm_warn("Fence operation %d for %s failed: %s " "(aborting transition and giving up for now)", data->call_id, target, reason); abort_action = pcmk__graph_wait; } else { crm_notice("Fence operation %d for %s failed: %s " "(aborting transition)", data->call_id, target, reason); } /* Increment the fail count now, so abort_for_stonith_failure() can * check it. Non-DC nodes will increment it in * handle_fence_notification(). */ st_fail_count_increment(target); abort_for_stonith_failure(abort_action, target, NULL); } pcmk__update_graph(controld_globals.transition_graph, action); trigger_graph(); bail: free(data->userdata); free(uuid); return; } static int fence_with_delay(const char *target, const char *type, int delay) { uint32_t options = st_opt_none; // Group of enum stonith_call_options int timeout_sec = (int) (controld_globals.transition_graph->stonith_timeout / 1000); if (crmd_join_phase_count(crm_join_confirmed) == 1) { stonith__set_call_options(options, target, st_opt_allow_suicide); } return stonith_api->cmds->fence_with_delay(stonith_api, options, target, type, timeout_sec, 0, delay); } /*! * \internal * \brief Execute a fencing action from a transition graph * * \param[in] graph Transition graph being executed (ignored) * \param[in] action Fencing action to execute * * \return Standard Pacemaker return code */ int controld_execute_fence_action(pcmk__graph_t *graph, pcmk__graph_action_t *action) { int rc = 0; const char *id = ID(action->xml); const char *uuid = crm_element_value(action->xml, XML_LRM_ATTR_TARGET_UUID); const char *target = crm_element_value(action->xml, XML_LRM_ATTR_TARGET); const char *type = crm_meta_value(action->params, "stonith_action"); char *transition_key = NULL; const char *priority_delay = NULL; int delay_i = 0; gboolean invalid_action = FALSE; int stonith_timeout = (int) (controld_globals.transition_graph->stonith_timeout / 1000); CRM_CHECK(id != NULL, invalid_action = TRUE); CRM_CHECK(uuid != NULL, invalid_action = TRUE); CRM_CHECK(type != NULL, invalid_action = TRUE); CRM_CHECK(target != NULL, invalid_action = TRUE); if (invalid_action) { crm_log_xml_warn(action->xml, "BadAction"); return EPROTO; } priority_delay = crm_meta_value(action->params, XML_CONFIG_ATTR_PRIORITY_FENCING_DELAY); crm_notice("Requesting fencing (%s) targeting node %s " CRM_XS " action=%s timeout=%i%s%s", type, target, id, stonith_timeout, priority_delay ? " priority_delay=" : "", priority_delay ? priority_delay : ""); /* Passing NULL means block until we can connect... */ te_connect_stonith(NULL); pcmk__scan_min_int(priority_delay, &delay_i, 0); rc = fence_with_delay(target, type, delay_i); transition_key = pcmk__transition_key(controld_globals.transition_graph->id, action->id, 0, controld_globals.te_uuid), stonith_api->cmds->register_callback(stonith_api, rc, (stonith_timeout + (delay_i > 0 ? delay_i : 0)), st_opt_timeout_updates, transition_key, "tengine_stonith_callback", tengine_stonith_callback); return pcmk_rc_ok; } bool controld_verify_stonith_watchdog_timeout(const char *value) { const char *our_nodename = controld_globals.our_nodename; gboolean rv = TRUE; if (stonith_api && (stonith_api->state != stonith_disconnected) && stonith__watchdog_fencing_enabled_for_node_api(stonith_api, our_nodename)) { rv = pcmk__valid_sbd_timeout(value); } return rv; } /* end stonith API client functions */ /* * stonith history synchronization * * Each node's fencer keeps track of a cluster-wide fencing history. When a node * joins or leaves, we need to synchronize the history across all nodes. */ static crm_trigger_t *stonith_history_sync_trigger = NULL; static mainloop_timer_t *stonith_history_sync_timer_short = NULL; static mainloop_timer_t *stonith_history_sync_timer_long = NULL; void te_cleanup_stonith_history_sync(stonith_t *st, bool free_timers) { if (free_timers) { mainloop_timer_del(stonith_history_sync_timer_short); stonith_history_sync_timer_short = NULL; mainloop_timer_del(stonith_history_sync_timer_long); stonith_history_sync_timer_long = NULL; } else { mainloop_timer_stop(stonith_history_sync_timer_short); mainloop_timer_stop(stonith_history_sync_timer_long); } if (st) { st->cmds->remove_notification(st, T_STONITH_NOTIFY_HISTORY_SYNCED); } } static void tengine_stonith_history_synced(stonith_t *st, stonith_event_t *st_event) { te_cleanup_stonith_history_sync(st, FALSE); crm_debug("Fence-history synced - cancel all timers"); } static gboolean stonith_history_sync_set_trigger(gpointer user_data) { mainloop_set_trigger(stonith_history_sync_trigger); return FALSE; } void te_trigger_stonith_history_sync(bool long_timeout) { /* trigger a sync in 5s to give more nodes the * chance to show up so that we don't create * unnecessary stonith-history-sync traffic * * the long timeout of 30s is there as a fallback * so that after a successful connection to fenced * we will wait for 30s for the DC to trigger a * history-sync * if this doesn't happen we trigger a sync locally * (e.g. fenced segfaults and is restarted by pacemakerd) */ /* as we are finally checking the stonith-connection * in do_stonith_history_sync we should be fine * leaving stonith_history_sync_time & stonith_history_sync_trigger * around */ if (stonith_history_sync_trigger == NULL) { stonith_history_sync_trigger = mainloop_add_trigger(G_PRIORITY_LOW, do_stonith_history_sync, NULL); } if (long_timeout) { if(stonith_history_sync_timer_long == NULL) { stonith_history_sync_timer_long = mainloop_timer_add("history_sync_long", 30000, FALSE, stonith_history_sync_set_trigger, NULL); } crm_info("Fence history will be synchronized cluster-wide within 30 seconds"); mainloop_timer_start(stonith_history_sync_timer_long); } else { if(stonith_history_sync_timer_short == NULL) { stonith_history_sync_timer_short = mainloop_timer_add("history_sync_short", 5000, FALSE, stonith_history_sync_set_trigger, NULL); } crm_info("Fence history will be synchronized cluster-wide within 5 seconds"); mainloop_timer_start(stonith_history_sync_timer_short); } } /* end stonith history synchronization functions */ diff --git a/daemons/controld/controld_globals.h b/daemons/controld/controld_globals.h index eff1607ae3..c8733501bc 100644 --- a/daemons/controld/controld_globals.h +++ b/daemons/controld/controld_globals.h @@ -1,143 +1,146 @@ /* * Copyright 2022-2023 the Pacemaker project contributors * * The version control history for this file may have further details. * * This source code is licensed under the GNU Lesser General Public License * version 2.1 or later (LGPLv2.1+) WITHOUT ANY WARRANTY. */ #ifndef CONTROLD_GLOBALS__H # define CONTROLD_GLOBALS__H #include // pcmk__output_t, etc. #include // uint32_t, uint64_t #include // GList, GMainLoop #include // cib_t #include // pcmk__graph_t #include // enum crmd_fsa_state typedef struct { // Booleans //! Group of \p controld_flags values uint32_t flags; // Controller FSA //! FSA state enum crmd_fsa_state fsa_state; //! FSA actions (group of \p A_* flags) uint64_t fsa_actions; //! FSA input register contents (group of \p R_* flags) uint64_t fsa_input_register; //! FSA message queue GList *fsa_message_queue; // CIB //! Connection to the CIB cib_t *cib_conn; //! CIB connection's client ID const char *cib_client_id; // Scheduler //! Reference of the scheduler request being waited on char *fsa_pe_ref; // Transitioner //! Transitioner UUID char *te_uuid; //! Graph of transition currently being processed pcmk__graph_t *transition_graph; // Logging //! Output object for controller log messages pcmk__output_t *logger_out; // Other //! Cluster name char *cluster_name; //! Designated controller name char *dc_name; //! Designated controller's Pacemaker version char *dc_version; //! Local node's node name char *our_nodename; //! Local node's UUID char *our_uuid; //! Last saved cluster communication layer membership ID unsigned long long membership_id; //! Max lifetime (in seconds) of a resource's shutdown lock to a node guint shutdown_lock_limit; + //! Node pending timeout + guint node_pending_timeout; + //! Main event loop GMainLoop *mainloop; } controld_globals_t; extern controld_globals_t controld_globals; /*! * \internal * \enum controld_flags * \brief Bit flags to store various controller state and configuration info */ enum controld_flags { //! The DC left in a membership change that is being processed controld_dc_left = (1 << 0), //! The FSA is stalled waiting for further input controld_fsa_is_stalled = (1 << 1), //! The local node has been in a quorate partition at some point controld_ever_had_quorum = (1 << 2), //! The local node is currently in a quorate partition controld_has_quorum = (1 << 3), //! Panic the local node if it loses quorum controld_no_quorum_suicide = (1 << 4), //! Lock resources to the local node when it shuts down cleanly controld_shutdown_lock_enabled = (1 << 5), }; # define controld_set_global_flags(flags_to_set) do { \ controld_globals.flags = pcmk__set_flags_as(__func__, __LINE__, \ LOG_TRACE, \ "Global", "controller", \ controld_globals.flags, \ (flags_to_set), \ #flags_to_set); \ } while (0) # define controld_clear_global_flags(flags_to_clear) do { \ controld_globals.flags \ = pcmk__clear_flags_as(__func__, __LINE__, LOG_TRACE, "Global", \ "controller", controld_globals.flags, \ (flags_to_clear), #flags_to_clear); \ } while (0) #endif // ifndef CONTROLD_GLOBALS__H diff --git a/daemons/controld/controld_membership.c b/daemons/controld/controld_membership.c index 1f7e4c0f56..8232ee2062 100644 --- a/daemons/controld/controld_membership.c +++ b/daemons/controld/controld_membership.c @@ -1,457 +1,471 @@ /* * Copyright 2004-2023 the Pacemaker project contributors * * The version control history for this file may have further details. * * This source code is licensed under the GNU General Public License version 2 * or later (GPLv2+) WITHOUT ANY WARRANTY. */ /* put these first so that uuid_t is defined without conflicts */ #include #include #include #include #include #include #include #include void post_cache_update(int instance); extern gboolean check_join_state(enum crmd_fsa_state cur_state, const char *source); static void reap_dead_nodes(gpointer key, gpointer value, gpointer user_data) { crm_node_t *node = value; if (crm_is_peer_active(node) == FALSE) { crm_update_peer_join(__func__, node, crm_join_none); if(node && node->uname) { if (pcmk__str_eq(controld_globals.our_nodename, node->uname, pcmk__str_casei)) { crm_err("We're not part of the cluster anymore"); register_fsa_input(C_FSA_INTERNAL, I_ERROR, NULL); } else if (!AM_I_DC && pcmk__str_eq(node->uname, controld_globals.dc_name, pcmk__str_casei)) { crm_warn("Our DC node (%s) left the cluster", node->uname); register_fsa_input(C_FSA_INTERNAL, I_ELECTION, NULL); } } if ((controld_globals.fsa_state == S_INTEGRATION) || (controld_globals.fsa_state == S_FINALIZE_JOIN)) { check_join_state(controld_globals.fsa_state, __func__); } if ((node != NULL) && (node->uuid != NULL)) { fail_incompletable_actions(controld_globals.transition_graph, node->uuid); } } } void post_cache_update(int instance) { xmlNode *no_op = NULL; crm_peer_seq = instance; crm_debug("Updated cache after membership event %d.", instance); g_hash_table_foreach(crm_peer_cache, reap_dead_nodes, NULL); controld_set_fsa_input_flags(R_MEMBERSHIP); if (AM_I_DC) { populate_cib_nodes(node_update_quick | node_update_cluster | node_update_peer | node_update_expected, __func__); } /* * If we lost nodes, we should re-check the election status * Safe to call outside of an election */ controld_set_fsa_action_flags(A_ELECTION_CHECK); controld_trigger_fsa(); /* Membership changed, remind everyone we're here. * This will aid detection of duplicate DCs */ no_op = create_request(CRM_OP_NOOP, NULL, NULL, CRM_SYSTEM_CRMD, AM_I_DC ? CRM_SYSTEM_DC : CRM_SYSTEM_CRMD, NULL); send_cluster_message(NULL, crm_msg_crmd, no_op, FALSE); free_xml(no_op); } static void crmd_node_update_complete(xmlNode * msg, int call_id, int rc, xmlNode * output, void *user_data) { fsa_data_t *msg_data = NULL; if (rc == pcmk_ok) { crm_trace("Node update %d complete", call_id); } else if(call_id < pcmk_ok) { crm_err("Node update failed: %s (%d)", pcmk_strerror(call_id), call_id); crm_log_xml_debug(msg, "failed"); register_fsa_error(C_FSA_INTERNAL, I_ERROR, NULL); } else { crm_err("Node update %d failed: %s (%d)", call_id, pcmk_strerror(rc), rc); crm_log_xml_debug(msg, "failed"); register_fsa_error(C_FSA_INTERNAL, I_ERROR, NULL); } } /*! * \internal * \brief Create an XML node state tag with updates * * \param[in,out] node Node whose state will be used for update * \param[in] flags Bitmask of node_update_flags indicating what to update * \param[in,out] parent XML node to contain update (or NULL) * \param[in] source Who requested the update (only used for logging) * * \return Pointer to created node state tag */ xmlNode * create_node_state_update(crm_node_t *node, int flags, xmlNode *parent, const char *source) { const char *value = NULL; xmlNode *node_state; if (!node->state) { crm_info("Node update for %s cancelled: no state, not seen yet", node->uname); return NULL; } node_state = create_xml_node(parent, XML_CIB_TAG_STATE); if (pcmk_is_set(node->flags, crm_remote_node)) { pcmk__xe_set_bool_attr(node_state, XML_NODE_IS_REMOTE, true); } set_uuid(node_state, XML_ATTR_ID, node); if (crm_element_value(node_state, XML_ATTR_ID) == NULL) { crm_info("Node update for %s cancelled: no id", node->uname); free_xml(node_state); return NULL; } crm_xml_add(node_state, XML_ATTR_UNAME, node->uname); if ((flags & node_update_cluster) && node->state) { - pcmk__xe_set_bool_attr(node_state, XML_NODE_IN_CLUSTER, - pcmk__str_eq(node->state, CRM_NODE_MEMBER, pcmk__str_casei)); + if (compare_version(controld_globals.dc_version, "3.18.0") >= 0) { + // A value 0 means the node is not a cluster member. + crm_xml_add_ll(node_state, XML_NODE_IN_CLUSTER, node->when_member); + + } else { + pcmk__xe_set_bool_attr(node_state, XML_NODE_IN_CLUSTER, + pcmk__str_eq(node->state, CRM_NODE_MEMBER, + pcmk__str_casei)); + } } if (!pcmk_is_set(node->flags, crm_remote_node)) { if (flags & node_update_peer) { - value = OFFLINESTATUS; - if (pcmk_is_set(node->processes, crm_get_cluster_proc())) { - value = ONLINESTATUS; + if (compare_version(controld_globals.dc_version, "3.18.0") >= 0) { + // A value 0 means the peer is offline in CPG. + crm_xml_add_ll(node_state, XML_NODE_IS_PEER, + node->when_online); + + } else { + value = OFFLINESTATUS; + if (pcmk_is_set(node->processes, crm_get_cluster_proc())) { + value = ONLINESTATUS; + } + crm_xml_add(node_state, XML_NODE_IS_PEER, value); } - crm_xml_add(node_state, XML_NODE_IS_PEER, value); } if (flags & node_update_join) { if (node->join <= crm_join_none) { value = CRMD_JOINSTATE_DOWN; } else { value = CRMD_JOINSTATE_MEMBER; } crm_xml_add(node_state, XML_NODE_JOIN_STATE, value); } if (flags & node_update_expected) { crm_xml_add(node_state, XML_NODE_EXPECTED, node->expected); } } crm_xml_add(node_state, XML_ATTR_ORIGIN, source); return node_state; } static void remove_conflicting_node_callback(xmlNode * msg, int call_id, int rc, xmlNode * output, void *user_data) { char *node_uuid = user_data; do_crm_log_unlikely(rc == 0 ? LOG_DEBUG : LOG_NOTICE, "Deletion of the unknown conflicting node \"%s\": %s (rc=%d)", node_uuid, pcmk_strerror(rc), rc); } static void search_conflicting_node_callback(xmlNode * msg, int call_id, int rc, xmlNode * output, void *user_data) { char *new_node_uuid = user_data; xmlNode *node_xml = NULL; if (rc != pcmk_ok) { if (rc != -ENXIO) { crm_notice("Searching conflicting nodes for %s failed: %s (%d)", new_node_uuid, pcmk_strerror(rc), rc); } return; } else if (output == NULL) { return; } if (pcmk__str_eq(crm_element_name(output), XML_CIB_TAG_NODE, pcmk__str_casei)) { node_xml = output; } else { node_xml = pcmk__xml_first_child(output); } for (; node_xml != NULL; node_xml = pcmk__xml_next(node_xml)) { const char *node_uuid = NULL; const char *node_uname = NULL; GHashTableIter iter; crm_node_t *node = NULL; gboolean known = FALSE; if (!pcmk__str_eq(crm_element_name(node_xml), XML_CIB_TAG_NODE, pcmk__str_casei)) { continue; } node_uuid = crm_element_value(node_xml, XML_ATTR_ID); node_uname = crm_element_value(node_xml, XML_ATTR_UNAME); if (node_uuid == NULL || node_uname == NULL) { continue; } g_hash_table_iter_init(&iter, crm_peer_cache); while (g_hash_table_iter_next(&iter, NULL, (gpointer *) &node)) { if (node->uuid && pcmk__str_eq(node->uuid, node_uuid, pcmk__str_casei) && node->uname && pcmk__str_eq(node->uname, node_uname, pcmk__str_casei)) { known = TRUE; break; } } if (known == FALSE) { cib_t *cib_conn = controld_globals.cib_conn; int delete_call_id = 0; xmlNode *node_state_xml = NULL; crm_notice("Deleting unknown node %s/%s which has conflicting uname with %s", node_uuid, node_uname, new_node_uuid); delete_call_id = cib_conn->cmds->remove(cib_conn, XML_CIB_TAG_NODES, node_xml, cib_scope_local); fsa_register_cib_callback(delete_call_id, strdup(node_uuid), remove_conflicting_node_callback); node_state_xml = create_xml_node(NULL, XML_CIB_TAG_STATE); crm_xml_add(node_state_xml, XML_ATTR_ID, node_uuid); crm_xml_add(node_state_xml, XML_ATTR_UNAME, node_uname); delete_call_id = cib_conn->cmds->remove(cib_conn, XML_CIB_TAG_STATUS, node_state_xml, cib_scope_local); fsa_register_cib_callback(delete_call_id, strdup(node_uuid), remove_conflicting_node_callback); free_xml(node_state_xml); } } } static void node_list_update_callback(xmlNode * msg, int call_id, int rc, xmlNode * output, void *user_data) { fsa_data_t *msg_data = NULL; if(call_id < pcmk_ok) { crm_err("Node list update failed: %s (%d)", pcmk_strerror(call_id), call_id); crm_log_xml_debug(msg, "update:failed"); register_fsa_error(C_FSA_INTERNAL, I_ERROR, NULL); } else if(rc < pcmk_ok) { crm_err("Node update %d failed: %s (%d)", call_id, pcmk_strerror(rc), rc); crm_log_xml_debug(msg, "update:failed"); register_fsa_error(C_FSA_INTERNAL, I_ERROR, NULL); } } void populate_cib_nodes(enum node_update_flags flags, const char *source) { cib_t *cib_conn = controld_globals.cib_conn; int call_id = 0; gboolean from_hashtable = TRUE; xmlNode *node_list = create_xml_node(NULL, XML_CIB_TAG_NODES); #if SUPPORT_COROSYNC if (!pcmk_is_set(flags, node_update_quick) && is_corosync_cluster()) { from_hashtable = pcmk__corosync_add_nodes(node_list); } #endif if (from_hashtable) { GHashTableIter iter; crm_node_t *node = NULL; GString *xpath = NULL; g_hash_table_iter_init(&iter, crm_peer_cache); while (g_hash_table_iter_next(&iter, NULL, (gpointer *) &node)) { xmlNode *new_node = NULL; if ((node->uuid != NULL) && (node->uname != NULL)) { crm_trace("Creating node entry for %s/%s", node->uname, node->uuid); if (xpath == NULL) { xpath = g_string_sized_new(512); } else { g_string_truncate(xpath, 0); } /* We need both to be valid */ new_node = create_xml_node(node_list, XML_CIB_TAG_NODE); crm_xml_add(new_node, XML_ATTR_ID, node->uuid); crm_xml_add(new_node, XML_ATTR_UNAME, node->uname); /* Search and remove unknown nodes with the conflicting uname from CIB */ pcmk__g_strcat(xpath, "/" XML_TAG_CIB "/" XML_CIB_TAG_CONFIGURATION "/" XML_CIB_TAG_NODES "/" XML_CIB_TAG_NODE "[@" XML_ATTR_UNAME "='", node->uname, "']" "[@" XML_ATTR_ID "!='", node->uuid, "']", NULL); call_id = cib_conn->cmds->query(cib_conn, (const char *) xpath->str, NULL, cib_scope_local|cib_xpath); fsa_register_cib_callback(call_id, strdup(node->uuid), search_conflicting_node_callback); } } if (xpath != NULL) { g_string_free(xpath, TRUE); } } crm_trace("Populating section from %s", from_hashtable ? "hashtable" : "cluster"); if ((controld_update_cib(XML_CIB_TAG_NODES, node_list, cib_scope_local, node_list_update_callback) == pcmk_rc_ok) && (crm_peer_cache != NULL) && AM_I_DC) { /* * There is no need to update the local CIB with our values if * we've not seen valid membership data */ GHashTableIter iter; crm_node_t *node = NULL; free_xml(node_list); node_list = create_xml_node(NULL, XML_CIB_TAG_STATUS); g_hash_table_iter_init(&iter, crm_peer_cache); while (g_hash_table_iter_next(&iter, NULL, (gpointer *) &node)) { create_node_state_update(node, flags, node_list, source); } if (crm_remote_peer_cache) { g_hash_table_iter_init(&iter, crm_remote_peer_cache); while (g_hash_table_iter_next(&iter, NULL, (gpointer *) &node)) { create_node_state_update(node, flags, node_list, source); } } controld_update_cib(XML_CIB_TAG_STATUS, node_list, cib_scope_local, crmd_node_update_complete); } free_xml(node_list); } static void cib_quorum_update_complete(xmlNode * msg, int call_id, int rc, xmlNode * output, void *user_data) { fsa_data_t *msg_data = NULL; if (rc == pcmk_ok) { crm_trace("Quorum update %d complete", call_id); } else { crm_err("Quorum update %d failed: %s (%d)", call_id, pcmk_strerror(rc), rc); crm_log_xml_debug(msg, "failed"); register_fsa_error(C_FSA_INTERNAL, I_ERROR, NULL); } } void crm_update_quorum(gboolean quorum, gboolean force_update) { bool has_quorum = pcmk_is_set(controld_globals.flags, controld_has_quorum); if (quorum) { controld_set_global_flags(controld_ever_had_quorum); } else if (pcmk_all_flags_set(controld_globals.flags, controld_ever_had_quorum |controld_no_quorum_suicide)) { pcmk__panic(__func__); } if (AM_I_DC && ((has_quorum && !quorum) || (!has_quorum && quorum) || force_update)) { xmlNode *update = NULL; update = create_xml_node(NULL, XML_TAG_CIB); crm_xml_add_int(update, XML_ATTR_HAVE_QUORUM, quorum); crm_xml_add(update, XML_ATTR_DC_UUID, controld_globals.our_uuid); crm_debug("Updating quorum status to %s", pcmk__btoa(quorum)); controld_update_cib(XML_TAG_CIB, update, cib_scope_local, cib_quorum_update_complete); free_xml(update); /* Quorum changes usually cause a new transition via other activity: * quorum gained via a node joining will abort via the node join, * and quorum lost via a node leaving will usually abort via resource * activity and/or fencing. * * However, it is possible that nothing else causes a transition (e.g. * someone forces quorum via corosync-cmaptcl, or quorum is lost due to * a node in standby shutting down cleanly), so here ensure a new * transition is triggered. */ if (quorum) { /* If quorum was gained, abort after a short delay, in case multiple * nodes are joining around the same time, so the one that brings us * to quorum doesn't cause all the remaining ones to be fenced. */ abort_after_delay(INFINITY, pcmk__graph_restart, "Quorum gained", 5000); } else { abort_transition(INFINITY, pcmk__graph_restart, "Quorum lost", NULL); } } if (quorum) { controld_set_global_flags(controld_has_quorum); } else { controld_clear_global_flags(controld_has_quorum); } } diff --git a/daemons/controld/controld_messages.c b/daemons/controld/controld_messages.c index 54b27ec33d..553dee67cd 100644 --- a/daemons/controld/controld_messages.c +++ b/daemons/controld/controld_messages.c @@ -1,1307 +1,1307 @@ /* * Copyright 2004-2023 the Pacemaker project contributors * * The version control history for this file may have further details. * * This source code is licensed under the GNU General Public License version 2 * or later (GPLv2+) WITHOUT ANY WARRANTY. */ #include #include #include #include #include #include #include #include #include #include #include extern void crm_shutdown(int nsig); static enum crmd_fsa_input handle_message(xmlNode *msg, enum crmd_fsa_cause cause); static void handle_response(xmlNode *stored_msg); static enum crmd_fsa_input handle_request(xmlNode *stored_msg, enum crmd_fsa_cause cause); static enum crmd_fsa_input handle_shutdown_request(xmlNode *stored_msg); static void send_msg_via_ipc(xmlNode * msg, const char *sys); /* debug only, can wrap all it likes */ static int last_data_id = 0; void register_fsa_error_adv(enum crmd_fsa_cause cause, enum crmd_fsa_input input, fsa_data_t * cur_data, void *new_data, const char *raised_from) { /* save the current actions if any */ if (controld_globals.fsa_actions != A_NOTHING) { register_fsa_input_adv(cur_data ? cur_data->fsa_cause : C_FSA_INTERNAL, I_NULL, cur_data ? cur_data->data : NULL, controld_globals.fsa_actions, TRUE, __func__); } /* reset the action list */ crm_info("Resetting the current action list"); fsa_dump_actions(controld_globals.fsa_actions, "Drop"); controld_globals.fsa_actions = A_NOTHING; /* register the error */ register_fsa_input_adv(cause, input, new_data, A_NOTHING, TRUE, raised_from); } void register_fsa_input_adv(enum crmd_fsa_cause cause, enum crmd_fsa_input input, void *data, uint64_t with_actions, gboolean prepend, const char *raised_from) { unsigned old_len = g_list_length(controld_globals.fsa_message_queue); fsa_data_t *fsa_data = NULL; if (raised_from == NULL) { raised_from = ""; } if (input == I_NULL && with_actions == A_NOTHING /* && data == NULL */ ) { /* no point doing anything */ crm_err("Cannot add entry to queue: no input and no action"); return; } if (input == I_WAIT_FOR_EVENT) { controld_set_global_flags(controld_fsa_is_stalled); crm_debug("Stalling the FSA pending further input: source=%s cause=%s data=%p queue=%d", raised_from, fsa_cause2string(cause), data, old_len); if (old_len > 0) { fsa_dump_queue(LOG_TRACE); prepend = FALSE; } if (data == NULL) { controld_set_fsa_action_flags(with_actions); fsa_dump_actions(with_actions, "Restored"); return; } /* Store everything in the new event and reset * controld_globals.fsa_actions */ with_actions |= controld_globals.fsa_actions; controld_globals.fsa_actions = A_NOTHING; } last_data_id++; crm_trace("%s %s FSA input %d (%s) due to %s, %s data", raised_from, (prepend? "prepended" : "appended"), last_data_id, fsa_input2string(input), fsa_cause2string(cause), (data? "with" : "without")); fsa_data = calloc(1, sizeof(fsa_data_t)); fsa_data->id = last_data_id; fsa_data->fsa_input = input; fsa_data->fsa_cause = cause; fsa_data->origin = raised_from; fsa_data->data = NULL; fsa_data->data_type = fsa_dt_none; fsa_data->actions = with_actions; if (with_actions != A_NOTHING) { crm_trace("Adding actions %.16llx to input", (unsigned long long) with_actions); } if (data != NULL) { switch (cause) { case C_FSA_INTERNAL: case C_CRMD_STATUS_CALLBACK: case C_IPC_MESSAGE: case C_HA_MESSAGE: CRM_CHECK(((ha_msg_input_t *) data)->msg != NULL, crm_err("Bogus data from %s", raised_from)); crm_trace("Copying %s data from %s as cluster message data", fsa_cause2string(cause), raised_from); fsa_data->data = copy_ha_msg_input(data); fsa_data->data_type = fsa_dt_ha_msg; break; case C_LRM_OP_CALLBACK: crm_trace("Copying %s data from %s as lrmd_event_data_t", fsa_cause2string(cause), raised_from); fsa_data->data = lrmd_copy_event((lrmd_event_data_t *) data); fsa_data->data_type = fsa_dt_lrm; break; case C_TIMER_POPPED: case C_SHUTDOWN: case C_UNKNOWN: case C_STARTUP: crm_crit("Copying %s data (from %s) is not yet implemented", fsa_cause2string(cause), raised_from); crmd_exit(CRM_EX_SOFTWARE); break; } } /* make sure to free it properly later */ if (prepend) { controld_globals.fsa_message_queue = g_list_prepend(controld_globals.fsa_message_queue, fsa_data); } else { controld_globals.fsa_message_queue = g_list_append(controld_globals.fsa_message_queue, fsa_data); } crm_trace("FSA message queue length is %d", g_list_length(controld_globals.fsa_message_queue)); /* fsa_dump_queue(LOG_TRACE); */ if (old_len == g_list_length(controld_globals.fsa_message_queue)) { crm_err("Couldn't add message to the queue"); } if (input != I_WAIT_FOR_EVENT) { controld_trigger_fsa(); } } void fsa_dump_queue(int log_level) { int offset = 0; for (GList *iter = controld_globals.fsa_message_queue; iter != NULL; iter = iter->next) { fsa_data_t *data = (fsa_data_t *) iter->data; do_crm_log_unlikely(log_level, "queue[%d.%d]: input %s raised by %s(%p.%d)\t(cause=%s)", offset++, data->id, fsa_input2string(data->fsa_input), data->origin, data->data, data->data_type, fsa_cause2string(data->fsa_cause)); } } ha_msg_input_t * copy_ha_msg_input(ha_msg_input_t * orig) { ha_msg_input_t *copy = calloc(1, sizeof(ha_msg_input_t)); CRM_ASSERT(copy != NULL); copy->msg = (orig && orig->msg)? copy_xml(orig->msg) : NULL; copy->xml = get_message_xml(copy->msg, F_CRM_DATA); return copy; } void delete_fsa_input(fsa_data_t * fsa_data) { lrmd_event_data_t *op = NULL; xmlNode *foo = NULL; if (fsa_data == NULL) { return; } crm_trace("About to free %s data", fsa_cause2string(fsa_data->fsa_cause)); if (fsa_data->data != NULL) { switch (fsa_data->data_type) { case fsa_dt_ha_msg: delete_ha_msg_input(fsa_data->data); break; case fsa_dt_xml: foo = fsa_data->data; free_xml(foo); break; case fsa_dt_lrm: op = (lrmd_event_data_t *) fsa_data->data; lrmd_free_event(op); break; case fsa_dt_none: if (fsa_data->data != NULL) { crm_err("Don't know how to free %s data from %s", fsa_cause2string(fsa_data->fsa_cause), fsa_data->origin); crmd_exit(CRM_EX_SOFTWARE); } break; } crm_trace("%s data freed", fsa_cause2string(fsa_data->fsa_cause)); } free(fsa_data); } /* returns the next message */ fsa_data_t * get_message(void) { fsa_data_t *message = (fsa_data_t *) controld_globals.fsa_message_queue->data; controld_globals.fsa_message_queue = g_list_remove(controld_globals.fsa_message_queue, message); crm_trace("Processing input %d", message->id); return message; } void * fsa_typed_data_adv(fsa_data_t * fsa_data, enum fsa_data_type a_type, const char *caller) { void *ret_val = NULL; if (fsa_data == NULL) { crm_err("%s: No FSA data available", caller); } else if (fsa_data->data == NULL) { crm_err("%s: No message data available. Origin: %s", caller, fsa_data->origin); } else if (fsa_data->data_type != a_type) { crm_crit("%s: Message data was the wrong type! %d vs. requested=%d. Origin: %s", caller, fsa_data->data_type, a_type, fsa_data->origin); CRM_ASSERT(fsa_data->data_type == a_type); } else { ret_val = fsa_data->data; } return ret_val; } /* A_MSG_ROUTE */ void do_msg_route(long long action, enum crmd_fsa_cause cause, enum crmd_fsa_state cur_state, enum crmd_fsa_input current_input, fsa_data_t * msg_data) { ha_msg_input_t *input = fsa_typed_data(fsa_dt_ha_msg); route_message(msg_data->fsa_cause, input->msg); } void route_message(enum crmd_fsa_cause cause, xmlNode * input) { ha_msg_input_t fsa_input; enum crmd_fsa_input result = I_NULL; fsa_input.msg = input; CRM_CHECK(cause == C_IPC_MESSAGE || cause == C_HA_MESSAGE, return); /* try passing the buck first */ if (relay_message(input, cause == C_IPC_MESSAGE)) { return; } /* handle locally */ result = handle_message(input, cause); /* done or process later? */ switch (result) { case I_NULL: case I_CIB_OP: case I_ROUTER: case I_NODE_JOIN: case I_JOIN_REQUEST: case I_JOIN_RESULT: break; default: /* Defering local processing of message */ register_fsa_input_later(cause, result, &fsa_input); return; } if (result != I_NULL) { /* add to the front of the queue */ register_fsa_input(cause, result, &fsa_input); } } gboolean relay_message(xmlNode * msg, gboolean originated_locally) { int dest = 1; bool is_for_dc = false; bool is_for_dcib = false; bool is_for_te = false; bool is_for_crm = false; bool is_for_cib = false; bool is_local = false; const char *host_to = crm_element_value(msg, F_CRM_HOST_TO); const char *sys_to = crm_element_value(msg, F_CRM_SYS_TO); const char *sys_from = crm_element_value(msg, F_CRM_SYS_FROM); const char *type = crm_element_value(msg, F_TYPE); const char *task = crm_element_value(msg, F_CRM_TASK); const char *ref = crm_element_value(msg, XML_ATTR_REFERENCE); if (ref == NULL) { ref = "without reference ID"; } if (msg == NULL) { crm_warn("Cannot route empty message"); return TRUE; } else if (pcmk__str_eq(task, CRM_OP_HELLO, pcmk__str_casei)) { crm_trace("No routing needed for hello message %s", ref); return TRUE; } else if (!pcmk__str_eq(type, T_CRM, pcmk__str_casei)) { crm_warn("Received invalid message %s: type '%s' not '" T_CRM "'", ref, pcmk__s(type, "")); crm_log_xml_warn(msg, "[bad message type]"); return TRUE; } else if (sys_to == NULL) { crm_warn("Received invalid message %s: no subsystem", ref); crm_log_xml_warn(msg, "[no subsystem]"); return TRUE; } is_for_dc = (strcasecmp(CRM_SYSTEM_DC, sys_to) == 0); is_for_dcib = (strcasecmp(CRM_SYSTEM_DCIB, sys_to) == 0); is_for_te = (strcasecmp(CRM_SYSTEM_TENGINE, sys_to) == 0); is_for_cib = (strcasecmp(CRM_SYSTEM_CIB, sys_to) == 0); is_for_crm = (strcasecmp(CRM_SYSTEM_CRMD, sys_to) == 0); is_local = false; if (pcmk__str_empty(host_to)) { if (is_for_dc || is_for_te) { is_local = false; } else if (is_for_crm) { if (pcmk__strcase_any_of(task, CRM_OP_NODE_INFO, PCMK__CONTROLD_CMD_NODES, NULL)) { /* Node info requests do not specify a host, which is normally * treated as "all hosts", because the whole point is that the * client may not know the local node name. Always handle these * requests locally. */ is_local = true; } else { is_local = !originated_locally; } } else { is_local = true; } } else if (pcmk__str_eq(controld_globals.our_nodename, host_to, pcmk__str_casei)) { is_local = true; } else if (is_for_crm && pcmk__str_eq(task, CRM_OP_LRM_DELETE, pcmk__str_casei)) { xmlNode *msg_data = get_message_xml(msg, F_CRM_DATA); const char *mode = crm_element_value(msg_data, PCMK__XA_MODE); if (pcmk__str_eq(mode, XML_TAG_CIB, pcmk__str_casei)) { // Local delete of an offline node's resource history is_local = true; } } if (is_for_dc || is_for_dcib || is_for_te) { if (AM_I_DC && is_for_te) { crm_trace("Route message %s locally as transition request", ref); send_msg_via_ipc(msg, sys_to); } else if (AM_I_DC) { crm_trace("Route message %s locally as DC request", ref); return FALSE; // More to be done by caller } else if (originated_locally && !pcmk__strcase_any_of(sys_from, CRM_SYSTEM_PENGINE, CRM_SYSTEM_TENGINE, NULL)) { if (is_corosync_cluster()) { dest = text2msg_type(sys_to); } crm_trace("Relay message %s to DC", ref); send_cluster_message(host_to ? crm_get_peer(0, host_to) : NULL, dest, msg, TRUE); } else { /* Neither the TE nor the scheduler should be sending messages * to DCs on other nodes. By definition, if we are no longer the DC, * then the scheduler's or TE's data should be discarded. */ crm_trace("Discard message %s because we are not DC", ref); } } else if (is_local && (is_for_crm || is_for_cib)) { crm_trace("Route message %s locally as controller request", ref); return FALSE; // More to be done by caller } else if (is_local) { crm_trace("Relay message %s locally to %s", ref, (sys_to? sys_to : "unknown client")); crm_log_xml_trace(msg, "[IPC relay]"); send_msg_via_ipc(msg, sys_to); } else { crm_node_t *node_to = NULL; if (is_corosync_cluster()) { dest = text2msg_type(sys_to); if (dest == crm_msg_none || dest > crm_msg_stonith_ng) { dest = crm_msg_crmd; } } if (host_to) { - node_to = pcmk__search_cluster_node_cache(0, host_to); + node_to = pcmk__search_cluster_node_cache(0, host_to, NULL); if (node_to == NULL) { crm_warn("Cannot route message %s: Unknown node %s", ref, host_to); return TRUE; } crm_trace("Relay message %s to %s", ref, (node_to->uname? node_to->uname : "peer")); } else { crm_trace("Broadcast message %s to all peers", ref); } send_cluster_message(host_to ? node_to : NULL, dest, msg, TRUE); } return TRUE; // No further processing of message is needed } // Return true if field contains a positive integer static bool authorize_version(xmlNode *message_data, const char *field, const char *client_name, const char *ref, const char *uuid) { const char *version = crm_element_value(message_data, field); long long version_num; if ((pcmk__scan_ll(version, &version_num, -1LL) != pcmk_rc_ok) || (version_num < 0LL)) { crm_warn("Rejected IPC hello from %s: '%s' is not a valid protocol %s " CRM_XS " ref=%s uuid=%s", client_name, ((version == NULL)? "" : version), field, (ref? ref : "none"), uuid); return false; } return true; } /*! * \internal * \brief Check whether a client IPC message is acceptable * * If a given client IPC message is a hello, "authorize" it by ensuring it has * valid information such as a protocol version, and return false indicating * that nothing further needs to be done with the message. If the message is not * a hello, just return true to indicate it needs further processing. * * \param[in] client_msg XML of IPC message * \param[in,out] curr_client If IPC is not proxied, client that sent message * \param[in] proxy_session If IPC is proxied, the session ID * * \return true if message needs further processing, false if it doesn't */ bool controld_authorize_ipc_message(const xmlNode *client_msg, pcmk__client_t *curr_client, const char *proxy_session) { xmlNode *message_data = NULL; const char *client_name = NULL; const char *op = crm_element_value(client_msg, F_CRM_TASK); const char *ref = crm_element_value(client_msg, XML_ATTR_REFERENCE); const char *uuid = (curr_client? curr_client->id : proxy_session); if (uuid == NULL) { crm_warn("IPC message from client rejected: No client identifier " CRM_XS " ref=%s", (ref? ref : "none")); goto rejected; } if (!pcmk__str_eq(CRM_OP_HELLO, op, pcmk__str_casei)) { // Only hello messages need to be authorized return true; } message_data = get_message_xml(client_msg, F_CRM_DATA); client_name = crm_element_value(message_data, "client_name"); if (pcmk__str_empty(client_name)) { crm_warn("IPC hello from client rejected: No client name", CRM_XS " ref=%s uuid=%s", (ref? ref : "none"), uuid); goto rejected; } if (!authorize_version(message_data, "major_version", client_name, ref, uuid)) { goto rejected; } if (!authorize_version(message_data, "minor_version", client_name, ref, uuid)) { goto rejected; } crm_trace("Validated IPC hello from client %s", client_name); if (curr_client) { curr_client->userdata = strdup(client_name); } controld_trigger_fsa(); return false; rejected: if (curr_client) { qb_ipcs_disconnect(curr_client->ipcs); } return false; } static enum crmd_fsa_input handle_message(xmlNode *msg, enum crmd_fsa_cause cause) { const char *type = NULL; CRM_CHECK(msg != NULL, return I_NULL); type = crm_element_value(msg, F_CRM_MSG_TYPE); if (pcmk__str_eq(type, XML_ATTR_REQUEST, pcmk__str_none)) { return handle_request(msg, cause); } else if (pcmk__str_eq(type, XML_ATTR_RESPONSE, pcmk__str_none)) { handle_response(msg); return I_NULL; } crm_err("Unknown message type: %s", type); return I_NULL; } static enum crmd_fsa_input handle_failcount_op(xmlNode * stored_msg) { const char *rsc = NULL; const char *uname = NULL; const char *op = NULL; char *interval_spec = NULL; guint interval_ms = 0; gboolean is_remote_node = FALSE; xmlNode *xml_op = get_message_xml(stored_msg, F_CRM_DATA); if (xml_op) { xmlNode *xml_rsc = first_named_child(xml_op, XML_CIB_TAG_RESOURCE); xmlNode *xml_attrs = first_named_child(xml_op, XML_TAG_ATTRS); if (xml_rsc) { rsc = ID(xml_rsc); } if (xml_attrs) { op = crm_element_value(xml_attrs, CRM_META "_" XML_RSC_ATTR_CLEAR_OP); crm_element_value_ms(xml_attrs, CRM_META "_" XML_RSC_ATTR_CLEAR_INTERVAL, &interval_ms); } } uname = crm_element_value(xml_op, XML_LRM_ATTR_TARGET); if ((rsc == NULL) || (uname == NULL)) { crm_log_xml_warn(stored_msg, "invalid failcount op"); return I_NULL; } if (crm_element_value(xml_op, XML_LRM_ATTR_ROUTER_NODE)) { is_remote_node = TRUE; } crm_debug("Clearing failures for %s-interval %s on %s " "from attribute manager, CIB, and executor state", pcmk__readable_interval(interval_ms), rsc, uname); if (interval_ms) { interval_spec = crm_strdup_printf("%ums", interval_ms); } update_attrd_clear_failures(uname, rsc, op, interval_spec, is_remote_node); free(interval_spec); controld_cib_delete_last_failure(rsc, uname, op, interval_ms); lrm_clear_last_failure(rsc, uname, op, interval_ms); return I_NULL; } static enum crmd_fsa_input handle_lrm_delete(xmlNode *stored_msg) { const char *mode = NULL; xmlNode *msg_data = get_message_xml(stored_msg, F_CRM_DATA); CRM_CHECK(msg_data != NULL, return I_NULL); /* CRM_OP_LRM_DELETE has two distinct modes. The default behavior is to * relay the operation to the affected node, which will unregister the * resource from the local executor, clear the resource's history from the * CIB, and do some bookkeeping in the controller. * * However, if the affected node is offline, the client will specify * mode="cib" which means the controller receiving the operation should * clear the resource's history from the CIB and nothing else. This is used * to clear shutdown locks. */ mode = crm_element_value(msg_data, PCMK__XA_MODE); if ((mode == NULL) || strcmp(mode, XML_TAG_CIB)) { // Relay to affected node crm_xml_add(stored_msg, F_CRM_SYS_TO, CRM_SYSTEM_LRMD); return I_ROUTER; } else { // Delete CIB history locally (compare with do_lrm_delete()) const char *from_sys = NULL; const char *user_name = NULL; const char *rsc_id = NULL; const char *node = NULL; xmlNode *rsc_xml = NULL; int rc = pcmk_rc_ok; rsc_xml = first_named_child(msg_data, XML_CIB_TAG_RESOURCE); CRM_CHECK(rsc_xml != NULL, return I_NULL); rsc_id = ID(rsc_xml); from_sys = crm_element_value(stored_msg, F_CRM_SYS_FROM); node = crm_element_value(msg_data, XML_LRM_ATTR_TARGET); user_name = pcmk__update_acl_user(stored_msg, F_CRM_USER, NULL); crm_debug("Handling " CRM_OP_LRM_DELETE " for %s on %s locally%s%s " "(clearing CIB resource history only)", rsc_id, node, (user_name? " for user " : ""), (user_name? user_name : "")); rc = controld_delete_resource_history(rsc_id, node, user_name, cib_dryrun|cib_sync_call); if (rc == pcmk_rc_ok) { rc = controld_delete_resource_history(rsc_id, node, user_name, crmd_cib_smart_opt()); } //Notify client and tengine.(Only notify tengine if mode = "cib" and CRM_OP_LRM_DELETE.) if (from_sys) { lrmd_event_data_t *op = NULL; const char *from_host = crm_element_value(stored_msg, F_CRM_HOST_FROM); const char *transition; if (strcmp(from_sys, CRM_SYSTEM_TENGINE)) { transition = crm_element_value(msg_data, XML_ATTR_TRANSITION_KEY); } else { transition = crm_element_value(stored_msg, XML_ATTR_TRANSITION_KEY); } crm_info("Notifying %s on %s that %s was%s deleted", from_sys, (from_host? from_host : "local node"), rsc_id, ((rc == pcmk_rc_ok)? "" : " not")); op = lrmd_new_event(rsc_id, CRMD_ACTION_DELETE, 0); op->type = lrmd_event_exec_complete; op->user_data = strdup(transition? transition : FAKE_TE_ID); op->params = pcmk__strkey_table(free, free); g_hash_table_insert(op->params, strdup(XML_ATTR_CRM_VERSION), strdup(CRM_FEATURE_SET)); controld_rc2event(op, rc); controld_ack_event_directly(from_host, from_sys, NULL, op, rsc_id); lrmd_free_event(op); controld_trigger_delete_refresh(from_sys, rsc_id); } return I_NULL; } } /*! * \brief Handle a CRM_OP_REMOTE_STATE message by updating remote peer cache * * \param[in] msg Message XML * * \return Next FSA input */ static enum crmd_fsa_input handle_remote_state(const xmlNode *msg) { const char *conn_host = NULL; const char *remote_uname = ID(msg); crm_node_t *remote_peer; bool remote_is_up = false; int rc = pcmk_rc_ok; rc = pcmk__xe_get_bool_attr(msg, XML_NODE_IN_CLUSTER, &remote_is_up); CRM_CHECK(remote_uname && rc == pcmk_rc_ok, return I_NULL); remote_peer = crm_remote_peer_get(remote_uname); CRM_CHECK(remote_peer, return I_NULL); pcmk__update_peer_state(__func__, remote_peer, remote_is_up ? CRM_NODE_MEMBER : CRM_NODE_LOST, 0); conn_host = crm_element_value(msg, PCMK__XA_CONN_HOST); if (conn_host) { pcmk__str_update(&remote_peer->conn_host, conn_host); } else if (remote_peer->conn_host) { free(remote_peer->conn_host); remote_peer->conn_host = NULL; } return I_NULL; } /*! * \brief Handle a CRM_OP_PING message * * \param[in] msg Message XML * * \return Next FSA input */ static enum crmd_fsa_input handle_ping(const xmlNode *msg) { const char *value = NULL; xmlNode *ping = NULL; xmlNode *reply = NULL; // Build reply ping = create_xml_node(NULL, XML_CRM_TAG_PING); value = crm_element_value(msg, F_CRM_SYS_TO); crm_xml_add(ping, XML_PING_ATTR_SYSFROM, value); // Add controller state value = fsa_state2string(controld_globals.fsa_state); crm_xml_add(ping, XML_PING_ATTR_CRMDSTATE, value); crm_notice("Current ping state: %s", value); // CTS needs this // Add controller health // @TODO maybe do some checks to determine meaningful status crm_xml_add(ping, XML_PING_ATTR_STATUS, "ok"); // Send reply reply = create_reply(msg, ping); free_xml(ping); if (reply != NULL) { (void) relay_message(reply, TRUE); free_xml(reply); } // Nothing further to do return I_NULL; } /*! * \brief Handle a PCMK__CONTROLD_CMD_NODES message * * \param[in] request Message XML * * \return Next FSA input */ static enum crmd_fsa_input handle_node_list(const xmlNode *request) { GHashTableIter iter; crm_node_t *node = NULL; xmlNode *reply = NULL; xmlNode *reply_data = NULL; // Create message data for reply reply_data = create_xml_node(NULL, XML_CIB_TAG_NODES); g_hash_table_iter_init(&iter, crm_peer_cache); while (g_hash_table_iter_next(&iter, NULL, (gpointer *) & node)) { xmlNode *xml = create_xml_node(reply_data, XML_CIB_TAG_NODE); crm_xml_add_ll(xml, XML_ATTR_ID, (long long) node->id); // uint32_t crm_xml_add(xml, XML_ATTR_UNAME, node->uname); crm_xml_add(xml, XML_NODE_IN_CLUSTER, node->state); } // Create and send reply reply = create_reply(request, reply_data); free_xml(reply_data); if (reply) { (void) relay_message(reply, TRUE); free_xml(reply); } // Nothing further to do return I_NULL; } /*! * \brief Handle a CRM_OP_NODE_INFO request * * \param[in] msg Message XML * * \return Next FSA input */ static enum crmd_fsa_input handle_node_info_request(const xmlNode *msg) { const char *value = NULL; crm_node_t *node = NULL; int node_id = 0; xmlNode *reply = NULL; xmlNode *reply_data = NULL; // Build reply reply_data = create_xml_node(NULL, XML_CIB_TAG_NODE); crm_xml_add(reply_data, XML_PING_ATTR_SYSFROM, CRM_SYSTEM_CRMD); // Add whether current partition has quorum pcmk__xe_set_bool_attr(reply_data, XML_ATTR_HAVE_QUORUM, pcmk_is_set(controld_globals.flags, controld_has_quorum)); // Check whether client requested node info by ID and/or name crm_element_value_int(msg, XML_ATTR_ID, &node_id); if (node_id < 0) { node_id = 0; } value = crm_element_value(msg, XML_ATTR_UNAME); // Default to local node if none given if ((node_id == 0) && (value == NULL)) { value = controld_globals.our_nodename; } node = pcmk__search_node_caches(node_id, value, CRM_GET_PEER_ANY); if (node) { crm_xml_add(reply_data, XML_ATTR_ID, node->uuid); crm_xml_add(reply_data, XML_ATTR_UNAME, node->uname); crm_xml_add(reply_data, XML_NODE_IS_PEER, node->state); pcmk__xe_set_bool_attr(reply_data, XML_NODE_IS_REMOTE, pcmk_is_set(node->flags, crm_remote_node)); } // Send reply reply = create_reply(msg, reply_data); free_xml(reply_data); if (reply != NULL) { (void) relay_message(reply, TRUE); free_xml(reply); } // Nothing further to do return I_NULL; } static void verify_feature_set(xmlNode *msg) { const char *dc_version = crm_element_value(msg, XML_ATTR_CRM_VERSION); if (dc_version == NULL) { /* All we really know is that the DC feature set is older than 3.1.0, * but that's also all that really matters. */ dc_version = "3.0.14"; } if (feature_set_compatible(dc_version, CRM_FEATURE_SET)) { crm_trace("Local feature set (%s) is compatible with DC's (%s)", CRM_FEATURE_SET, dc_version); } else { crm_err("Local feature set (%s) is incompatible with DC's (%s)", CRM_FEATURE_SET, dc_version); // Nothing is likely to improve without administrator involvement controld_set_fsa_input_flags(R_STAYDOWN); crmd_exit(CRM_EX_FATAL); } } // DC gets own shutdown all-clear static enum crmd_fsa_input handle_shutdown_self_ack(xmlNode *stored_msg) { const char *host_from = crm_element_value(stored_msg, F_CRM_HOST_FROM); if (pcmk_is_set(controld_globals.fsa_input_register, R_SHUTDOWN)) { // The expected case -- we initiated own shutdown sequence crm_info("Shutting down controller"); return I_STOP; } if (pcmk__str_eq(host_from, controld_globals.dc_name, pcmk__str_casei)) { // Must be logic error -- DC confirming its own unrequested shutdown crm_err("Shutting down controller immediately due to " "unexpected shutdown confirmation"); return I_TERMINATE; } if (controld_globals.fsa_state != S_STOPPING) { // Shouldn't happen -- non-DC confirming unrequested shutdown crm_err("Starting new DC election because %s is " "confirming shutdown we did not request", (host_from? host_from : "another node")); return I_ELECTION; } // Shouldn't happen, but we are already stopping anyway crm_debug("Ignoring unexpected shutdown confirmation from %s", (host_from? host_from : "another node")); return I_NULL; } // Non-DC gets shutdown all-clear from DC static enum crmd_fsa_input handle_shutdown_ack(xmlNode *stored_msg) { const char *host_from = crm_element_value(stored_msg, F_CRM_HOST_FROM); if (host_from == NULL) { crm_warn("Ignoring shutdown request without origin specified"); return I_NULL; } if (pcmk__str_eq(host_from, controld_globals.dc_name, pcmk__str_null_matches|pcmk__str_casei)) { if (pcmk_is_set(controld_globals.fsa_input_register, R_SHUTDOWN)) { crm_info("Shutting down controller after confirmation from %s", host_from); } else { crm_err("Shutting down controller after unexpected " "shutdown request from %s", host_from); controld_set_fsa_input_flags(R_STAYDOWN); } return I_STOP; } crm_warn("Ignoring shutdown request from %s because DC is %s", host_from, controld_globals.dc_name); return I_NULL; } static enum crmd_fsa_input handle_request(xmlNode *stored_msg, enum crmd_fsa_cause cause) { xmlNode *msg = NULL; const char *op = crm_element_value(stored_msg, F_CRM_TASK); /* Optimize this for the DC - it has the most to do */ if (op == NULL) { crm_log_xml_warn(stored_msg, "[request without " F_CRM_TASK "]"); return I_NULL; } if (strcmp(op, CRM_OP_SHUTDOWN_REQ) == 0) { const char *from = crm_element_value(stored_msg, F_CRM_HOST_FROM); - crm_node_t *node = pcmk__search_cluster_node_cache(0, from); + crm_node_t *node = pcmk__search_cluster_node_cache(0, from, NULL); pcmk__update_peer_expected(__func__, node, CRMD_JOINSTATE_DOWN); if(AM_I_DC == FALSE) { return I_NULL; /* Done */ } } /*========== DC-Only Actions ==========*/ if (AM_I_DC) { if (strcmp(op, CRM_OP_JOIN_ANNOUNCE) == 0) { return I_NODE_JOIN; } else if (strcmp(op, CRM_OP_JOIN_REQUEST) == 0) { return I_JOIN_REQUEST; } else if (strcmp(op, CRM_OP_JOIN_CONFIRM) == 0) { return I_JOIN_RESULT; } else if (strcmp(op, CRM_OP_SHUTDOWN) == 0) { return handle_shutdown_self_ack(stored_msg); } else if (strcmp(op, CRM_OP_SHUTDOWN_REQ) == 0) { // Another controller wants to shut down its node return handle_shutdown_request(stored_msg); } } /*========== common actions ==========*/ if (strcmp(op, CRM_OP_NOVOTE) == 0) { ha_msg_input_t fsa_input; fsa_input.msg = stored_msg; register_fsa_input_adv(C_HA_MESSAGE, I_NULL, &fsa_input, A_ELECTION_COUNT | A_ELECTION_CHECK, FALSE, __func__); } else if (strcmp(op, CRM_OP_REMOTE_STATE) == 0) { /* a remote connection host is letting us know the node state */ return handle_remote_state(stored_msg); } else if (strcmp(op, CRM_OP_THROTTLE) == 0) { throttle_update(stored_msg); if (AM_I_DC && (controld_globals.transition_graph != NULL) && !controld_globals.transition_graph->complete) { crm_debug("The throttle changed. Trigger a graph."); trigger_graph(); } return I_NULL; } else if (strcmp(op, CRM_OP_CLEAR_FAILCOUNT) == 0) { return handle_failcount_op(stored_msg); } else if (strcmp(op, CRM_OP_VOTE) == 0) { /* count the vote and decide what to do after that */ ha_msg_input_t fsa_input; fsa_input.msg = stored_msg; register_fsa_input_adv(C_HA_MESSAGE, I_NULL, &fsa_input, A_ELECTION_COUNT | A_ELECTION_CHECK, FALSE, __func__); /* Sometimes we _must_ go into S_ELECTION */ if (controld_globals.fsa_state == S_HALT) { crm_debug("Forcing an election from S_HALT"); return I_ELECTION; #if 0 } else if (AM_I_DC) { /* This is the old way of doing things but what is gained? */ return I_ELECTION; #endif } } else if (strcmp(op, CRM_OP_JOIN_OFFER) == 0) { verify_feature_set(stored_msg); crm_debug("Raising I_JOIN_OFFER: join-%s", crm_element_value(stored_msg, F_CRM_JOIN_ID)); return I_JOIN_OFFER; } else if (strcmp(op, CRM_OP_JOIN_ACKNAK) == 0) { crm_debug("Raising I_JOIN_RESULT: join-%s", crm_element_value(stored_msg, F_CRM_JOIN_ID)); return I_JOIN_RESULT; } else if (strcmp(op, CRM_OP_LRM_DELETE) == 0) { return handle_lrm_delete(stored_msg); } else if ((strcmp(op, CRM_OP_LRM_FAIL) == 0) || (strcmp(op, CRM_OP_LRM_REFRESH) == 0) // @COMPAT || (strcmp(op, CRM_OP_REPROBE) == 0)) { crm_xml_add(stored_msg, F_CRM_SYS_TO, CRM_SYSTEM_LRMD); return I_ROUTER; } else if (strcmp(op, CRM_OP_NOOP) == 0) { return I_NULL; } else if (strcmp(op, CRM_OP_LOCAL_SHUTDOWN) == 0) { crm_shutdown(SIGTERM); /*return I_SHUTDOWN; */ return I_NULL; } else if (strcmp(op, CRM_OP_PING) == 0) { return handle_ping(stored_msg); } else if (strcmp(op, CRM_OP_NODE_INFO) == 0) { return handle_node_info_request(stored_msg); } else if (strcmp(op, CRM_OP_RM_NODE_CACHE) == 0) { int id = 0; const char *name = NULL; crm_element_value_int(stored_msg, XML_ATTR_ID, &id); name = crm_element_value(stored_msg, XML_ATTR_UNAME); if(cause == C_IPC_MESSAGE) { msg = create_request(CRM_OP_RM_NODE_CACHE, NULL, NULL, CRM_SYSTEM_CRMD, CRM_SYSTEM_CRMD, NULL); if (send_cluster_message(NULL, crm_msg_crmd, msg, TRUE) == FALSE) { crm_err("Could not instruct peers to remove references to node %s/%u", name, id); } else { crm_notice("Instructing peers to remove references to node %s/%u", name, id); } free_xml(msg); } else { reap_crm_member(id, name); /* If we're forgetting this node, also forget any failures to fence * it, so we don't carry that over to any node added later with the * same name. */ st_fail_count_reset(name); } } else if (strcmp(op, CRM_OP_MAINTENANCE_NODES) == 0) { xmlNode *xml = get_message_xml(stored_msg, F_CRM_DATA); remote_ra_process_maintenance_nodes(xml); } else if (strcmp(op, PCMK__CONTROLD_CMD_NODES) == 0) { return handle_node_list(stored_msg); /*========== (NOT_DC)-Only Actions ==========*/ } else if (!AM_I_DC) { if (strcmp(op, CRM_OP_SHUTDOWN) == 0) { return handle_shutdown_ack(stored_msg); } } else { crm_err("Unexpected request (%s) sent to %s", op, AM_I_DC ? "the DC" : "non-DC node"); crm_log_xml_err(stored_msg, "Unexpected"); } return I_NULL; } static void handle_response(xmlNode *stored_msg) { const char *op = crm_element_value(stored_msg, F_CRM_TASK); if (op == NULL) { crm_log_xml_err(stored_msg, "Bad message"); } else if (AM_I_DC && strcmp(op, CRM_OP_PECALC) == 0) { // Check whether scheduler answer been superseded by subsequent request const char *msg_ref = crm_element_value(stored_msg, XML_ATTR_REFERENCE); if (msg_ref == NULL) { crm_err("%s - Ignoring calculation with no reference", op); } else if (pcmk__str_eq(msg_ref, controld_globals.fsa_pe_ref, pcmk__str_none)) { ha_msg_input_t fsa_input; controld_stop_sched_timer(); fsa_input.msg = stored_msg; register_fsa_input_later(C_IPC_MESSAGE, I_PE_SUCCESS, &fsa_input); } else { crm_info("%s calculation %s is obsolete", op, msg_ref); } } else if (strcmp(op, CRM_OP_VOTE) == 0 || strcmp(op, CRM_OP_SHUTDOWN_REQ) == 0 || strcmp(op, CRM_OP_SHUTDOWN) == 0) { } else { const char *host_from = crm_element_value(stored_msg, F_CRM_HOST_FROM); crm_err("Unexpected response (op=%s, src=%s) sent to the %s", op, host_from, AM_I_DC ? "DC" : "controller"); } } static enum crmd_fsa_input handle_shutdown_request(xmlNode * stored_msg) { /* handle here to avoid potential version issues * where the shutdown message/procedure may have * been changed in later versions. * * This way the DC is always in control of the shutdown */ char *now_s = NULL; const char *host_from = crm_element_value(stored_msg, F_CRM_HOST_FROM); if (host_from == NULL) { /* we're shutting down and the DC */ host_from = controld_globals.our_nodename; } crm_info("Creating shutdown request for %s (state=%s)", host_from, fsa_state2string(controld_globals.fsa_state)); crm_log_xml_trace(stored_msg, "message"); now_s = pcmk__ttoa(time(NULL)); update_attrd(host_from, XML_CIB_ATTR_SHUTDOWN, now_s, NULL, FALSE); free(now_s); /* will be picked up by the TE as long as its running */ return I_NULL; } static void send_msg_via_ipc(xmlNode * msg, const char *sys) { pcmk__client_t *client_channel = NULL; CRM_CHECK(sys != NULL, return); client_channel = pcmk__find_client_by_id(sys); if (crm_element_value(msg, F_CRM_HOST_FROM) == NULL) { crm_xml_add(msg, F_CRM_HOST_FROM, controld_globals.our_nodename); } if (client_channel != NULL) { /* Transient clients such as crmadmin */ pcmk__ipc_send_xml(client_channel, 0, msg, crm_ipc_server_event); } else if (pcmk__str_eq(sys, CRM_SYSTEM_TENGINE, pcmk__str_none)) { xmlNode *data = get_message_xml(msg, F_CRM_DATA); process_te_message(msg, data); } else if (pcmk__str_eq(sys, CRM_SYSTEM_LRMD, pcmk__str_none)) { fsa_data_t fsa_data; ha_msg_input_t fsa_input; fsa_input.msg = msg; fsa_input.xml = get_message_xml(msg, F_CRM_DATA); fsa_data.id = 0; fsa_data.actions = 0; fsa_data.data = &fsa_input; fsa_data.fsa_input = I_MESSAGE; fsa_data.fsa_cause = C_IPC_MESSAGE; fsa_data.origin = __func__; fsa_data.data_type = fsa_dt_ha_msg; do_lrm_invoke(A_LRM_INVOKE, C_IPC_MESSAGE, controld_globals.fsa_state, I_MESSAGE, &fsa_data); } else if (crmd_is_proxy_session(sys)) { crmd_proxy_send(sys, msg); } else { crm_info("Received invalid request: unknown subsystem '%s'", sys); } } void delete_ha_msg_input(ha_msg_input_t * orig) { if (orig == NULL) { return; } free_xml(orig->msg); free(orig); } /*! * \internal * \brief Notify the cluster of a remote node state change * * \param[in] node_name Node's name * \param[in] node_up true if node is up, false if down */ void broadcast_remote_state_message(const char *node_name, bool node_up) { xmlNode *msg = create_request(CRM_OP_REMOTE_STATE, NULL, NULL, CRM_SYSTEM_CRMD, CRM_SYSTEM_CRMD, NULL); crm_info("Notifying cluster of Pacemaker Remote node %s %s", node_name, node_up? "coming up" : "going down"); crm_xml_add(msg, XML_ATTR_ID, node_name); pcmk__xe_set_bool_attr(msg, XML_NODE_IN_CLUSTER, node_up); if (node_up) { crm_xml_add(msg, PCMK__XA_CONN_HOST, controld_globals.our_nodename); } send_cluster_message(NULL, crm_msg_crmd, msg, TRUE); free_xml(msg); } diff --git a/daemons/controld/controld_te_utils.c b/daemons/controld/controld_te_utils.c index ecbc0b2d78..0c1b8acf75 100644 --- a/daemons/controld/controld_te_utils.c +++ b/daemons/controld/controld_te_utils.c @@ -1,367 +1,501 @@ /* * Copyright 2004-2022 the Pacemaker project contributors * * The version control history for this file may have further details. * * This source code is licensed under the GNU General Public License version 2 * or later (GPLv2+) WITHOUT ANY WARRANTY. */ #include #include #include #include #include //! Triggers transition graph processing static crm_trigger_t *transition_trigger = NULL; +static GHashTable *node_pending_timers = NULL; + gboolean stop_te_timer(pcmk__graph_action_t *action) { if (action == NULL) { return FALSE; } if (action->timer != 0) { crm_trace("Stopping action timer"); g_source_remove(action->timer); action->timer = 0; } else { crm_trace("Action timer was already stopped"); return FALSE; } return TRUE; } static gboolean te_graph_trigger(gpointer user_data) { if (controld_globals.transition_graph == NULL) { crm_debug("Nothing to do"); return TRUE; } crm_trace("Invoking graph %d in state %s", controld_globals.transition_graph->id, fsa_state2string(controld_globals.fsa_state)); switch (controld_globals.fsa_state) { case S_STARTING: case S_PENDING: case S_NOT_DC: case S_HALT: case S_ILLEGAL: case S_STOPPING: case S_TERMINATE: return TRUE; default: break; } if (!controld_globals.transition_graph->complete) { enum pcmk__graph_status graph_rc; int orig_limit = controld_globals.transition_graph->batch_limit; int throttled_limit = throttle_get_total_job_limit(orig_limit); controld_globals.transition_graph->batch_limit = throttled_limit; graph_rc = pcmk__execute_graph(controld_globals.transition_graph); controld_globals.transition_graph->batch_limit = orig_limit; if (graph_rc == pcmk__graph_active) { crm_trace("Transition not yet complete"); return TRUE; } else if (graph_rc == pcmk__graph_pending) { crm_trace("Transition not yet complete - no actions fired"); return TRUE; } if (graph_rc != pcmk__graph_complete) { crm_warn("Transition failed: %s", pcmk__graph_status2text(graph_rc)); pcmk__log_graph(LOG_NOTICE, controld_globals.transition_graph); } } crm_debug("Transition %d is now complete", controld_globals.transition_graph->id); controld_globals.transition_graph->complete = true; notify_crmd(controld_globals.transition_graph); return TRUE; } /*! * \internal * \brief Initialize transition trigger */ void controld_init_transition_trigger(void) { transition_trigger = mainloop_add_trigger(G_PRIORITY_LOW, te_graph_trigger, NULL); } /*! * \internal * \brief Destroy transition trigger */ void controld_destroy_transition_trigger(void) { mainloop_destroy_trigger(transition_trigger); transition_trigger = NULL; } void controld_trigger_graph_as(const char *fn, int line) { crm_trace("%s:%d - Triggered graph processing", fn, line); mainloop_set_trigger(transition_trigger); } static struct abort_timer_s { bool aborted; guint id; int priority; enum pcmk__graph_next action; const char *text; } abort_timer = { 0, }; static gboolean abort_timer_popped(gpointer data) { - if (AM_I_DC && (abort_timer.aborted == FALSE)) { - abort_transition(abort_timer.priority, abort_timer.action, - abort_timer.text, NULL); + struct abort_timer_s *abort_timer = (struct abort_timer_s *) data; + + if (AM_I_DC && (abort_timer->aborted == FALSE)) { + abort_transition(abort_timer->priority, abort_timer->action, + abort_timer->text, NULL); } - abort_timer.id = 0; + abort_timer->id = 0; return FALSE; // do not immediately reschedule timer } /*! * \internal * \brief Abort transition after delay, if not already aborted in that time * * \param[in] abort_text Must be literal string */ void abort_after_delay(int abort_priority, enum pcmk__graph_next abort_action, const char *abort_text, guint delay_ms) { if (abort_timer.id) { // Timer already in progress, stop and reschedule g_source_remove(abort_timer.id); } abort_timer.aborted = FALSE; abort_timer.priority = abort_priority; abort_timer.action = abort_action; abort_timer.text = abort_text; - abort_timer.id = g_timeout_add(delay_ms, abort_timer_popped, NULL); + abort_timer.id = g_timeout_add(delay_ms, abort_timer_popped, &abort_timer); +} + +static void +free_node_pending_timer(gpointer data) +{ + struct abort_timer_s *node_pending_timer = (struct abort_timer_s *) data; + + if (node_pending_timer->id != 0) { + g_source_remove(node_pending_timer->id); + node_pending_timer->id = 0; + } + + free(node_pending_timer); +} + +static gboolean +node_pending_timer_popped(gpointer key) +{ + struct abort_timer_s *node_pending_timer = NULL; + + if (node_pending_timers == NULL) { + return FALSE; + } + + node_pending_timer = g_hash_table_lookup(node_pending_timers, key); + if (node_pending_timer == NULL) { + return FALSE; + } + + crm_warn("Node with id '%s' pending timed out (%us) on joining the process " + "group", + (const char *) key, controld_globals.node_pending_timeout); + + abort_timer_popped(node_pending_timer); + + g_hash_table_remove(node_pending_timers, key); + + return FALSE; // do not reschedule timer +} + +static void +init_node_pending_timer(const crm_node_t *node, guint timeout) +{ + struct abort_timer_s *node_pending_timer = NULL; + char *key = NULL; + + if (node->uuid == NULL) { + return; + } + + if (node_pending_timers == NULL) { + node_pending_timers = pcmk__strikey_table(free, + free_node_pending_timer); + + // The timer is somehow already existing + } else if (g_hash_table_lookup(node_pending_timers, node->uuid) != NULL) { + return; + } + + crm_notice("Waiting for pending %s with id '%s' to join the process " + "group (timeout=%us)", + node->uname ? node->uname : "node", node->uuid, + controld_globals.node_pending_timeout); + + node_pending_timer = calloc(1, sizeof(struct abort_timer_s)); + CRM_ASSERT(node_pending_timer != NULL); + + node_pending_timer->aborted = FALSE; + node_pending_timer->priority = INFINITY; + node_pending_timer->action = pcmk__graph_restart; + node_pending_timer->text = "Node pending timed out"; + + key = strdup(node->uuid); + CRM_ASSERT(key != NULL); + + g_hash_table_replace(node_pending_timers, key, node_pending_timer); + + node_pending_timer->id = g_timeout_add_seconds(timeout, + node_pending_timer_popped, + key); + CRM_ASSERT(node_pending_timer->id != 0); +} + +static void +remove_node_pending_timer(const char *node_uuid) +{ + if (node_pending_timers == NULL) { + return; + } + + g_hash_table_remove(node_pending_timers, node_uuid); +} + +void +controld_node_pending_timer(const crm_node_t *node) +{ + long long remaining_timeout = 0; + + /* Node is either not even a cluster member or it's as well online in CPG. + * Free any node pending timer of it. + */ + if (node->when_member <= 0 || node->when_online != 0) { + remove_node_pending_timer(node->uuid); + return; + } + // Node is a cluster member but offline in CPG + + remaining_timeout = node->when_member - time(NULL) + + controld_globals.node_pending_timeout; + + /* It already passed node pending timeout somehow. + * Free any node pending timer of it. + */ + if (remaining_timeout <= 0) { + remove_node_pending_timer(node->uuid); + return; + } + + init_node_pending_timer(node, remaining_timeout); +} + +void +controld_free_node_pending_timers(void) +{ + if (node_pending_timers == NULL) { + return; + } + + g_hash_table_destroy(node_pending_timers); + node_pending_timers = NULL; } static const char * abort2text(enum pcmk__graph_next abort_action) { switch (abort_action) { case pcmk__graph_done: return "done"; case pcmk__graph_wait: return "stop"; case pcmk__graph_restart: return "restart"; case pcmk__graph_shutdown: return "shutdown"; } return "unknown"; } static bool update_abort_priority(pcmk__graph_t *graph, int priority, enum pcmk__graph_next action, const char *abort_reason) { bool change = FALSE; if (graph == NULL) { return change; } if (graph->abort_priority < priority) { crm_debug("Abort priority upgraded from %d to %d", graph->abort_priority, priority); graph->abort_priority = priority; if (graph->abort_reason != NULL) { crm_debug("'%s' abort superseded by %s", graph->abort_reason, abort_reason); } graph->abort_reason = abort_reason; change = TRUE; } if (graph->completion_action < action) { crm_debug("Abort action %s superseded by %s: %s", abort2text(graph->completion_action), abort2text(action), abort_reason); graph->completion_action = action; change = TRUE; } return change; } void abort_transition_graph(int abort_priority, enum pcmk__graph_next abort_action, const char *abort_text, const xmlNode *reason, const char *fn, int line) { int add[] = { 0, 0, 0 }; int del[] = { 0, 0, 0 }; int level = LOG_INFO; const xmlNode *diff = NULL; const xmlNode *change = NULL; CRM_CHECK(controld_globals.transition_graph != NULL, return); switch (controld_globals.fsa_state) { case S_STARTING: case S_PENDING: case S_NOT_DC: case S_HALT: case S_ILLEGAL: case S_STOPPING: case S_TERMINATE: crm_info("Abort %s suppressed: state=%s (%scomplete)", abort_text, fsa_state2string(controld_globals.fsa_state), (controld_globals.transition_graph->complete? "" : "in")); return; default: break; } abort_timer.aborted = TRUE; controld_expect_sched_reply(NULL); if (!controld_globals.transition_graph->complete && update_abort_priority(controld_globals.transition_graph, abort_priority, abort_action, abort_text)) { level = LOG_NOTICE; } if (reason != NULL) { const xmlNode *search = NULL; for(search = reason; search; search = search->parent) { if (pcmk__str_eq(XML_TAG_DIFF, TYPE(search), pcmk__str_casei)) { diff = search; break; } } if(diff) { xml_patch_versions(diff, add, del); for(search = reason; search; search = search->parent) { if (pcmk__str_eq(XML_DIFF_CHANGE, TYPE(search), pcmk__str_casei)) { change = search; break; } } } } if (reason == NULL) { do_crm_log(level, "Transition %d aborted: %s " CRM_XS " source=%s:%d " "complete=%s", controld_globals.transition_graph->id, abort_text, fn, line, pcmk__btoa(controld_globals.transition_graph->complete)); } else if(change == NULL) { GString *local_path = pcmk__element_xpath(reason); CRM_ASSERT(local_path != NULL); do_crm_log(level, "Transition %d aborted by %s.%s: %s " CRM_XS " cib=%d.%d.%d source=%s:%d path=%s complete=%s", controld_globals.transition_graph->id, TYPE(reason), ID(reason), abort_text, add[0], add[1], add[2], fn, line, (const char *) local_path->str, pcmk__btoa(controld_globals.transition_graph->complete)); g_string_free(local_path, TRUE); } else { const char *kind = NULL; const char *op = crm_element_value(change, XML_DIFF_OP); const char *path = crm_element_value(change, XML_DIFF_PATH); if(change == reason) { if(strcmp(op, "create") == 0) { reason = reason->children; } else if(strcmp(op, "modify") == 0) { reason = first_named_child(reason, XML_DIFF_RESULT); if(reason) { reason = reason->children; } } } kind = TYPE(reason); if(strcmp(op, "delete") == 0) { const char *shortpath = strrchr(path, '/'); do_crm_log(level, "Transition %d aborted by deletion of %s: %s " CRM_XS " cib=%d.%d.%d source=%s:%d path=%s complete=%s", controld_globals.transition_graph->id, (shortpath? (shortpath + 1) : path), abort_text, add[0], add[1], add[2], fn, line, path, pcmk__btoa(controld_globals.transition_graph->complete)); } else if (pcmk__str_eq(XML_CIB_TAG_NVPAIR, kind, pcmk__str_none)) { do_crm_log(level, "Transition %d aborted by %s doing %s %s=%s: %s " CRM_XS " cib=%d.%d.%d source=%s:%d path=%s complete=%s", controld_globals.transition_graph->id, crm_element_value(reason, XML_ATTR_ID), op, crm_element_value(reason, XML_NVPAIR_ATTR_NAME), crm_element_value(reason, XML_NVPAIR_ATTR_VALUE), abort_text, add[0], add[1], add[2], fn, line, path, pcmk__btoa(controld_globals.transition_graph->complete)); } else if (pcmk__str_eq(XML_LRM_TAG_RSC_OP, kind, pcmk__str_none)) { const char *magic = crm_element_value(reason, XML_ATTR_TRANSITION_MAGIC); do_crm_log(level, "Transition %d aborted by operation %s '%s' on %s: %s " CRM_XS " magic=%s cib=%d.%d.%d source=%s:%d complete=%s", controld_globals.transition_graph->id, crm_element_value(reason, XML_LRM_ATTR_TASK_KEY), op, crm_element_value(reason, XML_LRM_ATTR_TARGET), abort_text, magic, add[0], add[1], add[2], fn, line, pcmk__btoa(controld_globals.transition_graph->complete)); } else if (pcmk__str_any_of(kind, XML_CIB_TAG_STATE, XML_CIB_TAG_NODE, NULL)) { const char *uname = crm_peer_uname(ID(reason)); do_crm_log(level, "Transition %d aborted by %s '%s' on %s: %s " CRM_XS " cib=%d.%d.%d source=%s:%d complete=%s", controld_globals.transition_graph->id, kind, op, (uname? uname : ID(reason)), abort_text, add[0], add[1], add[2], fn, line, pcmk__btoa(controld_globals.transition_graph->complete)); } else { const char *id = ID(reason); do_crm_log(level, "Transition %d aborted by %s.%s '%s': %s " CRM_XS " cib=%d.%d.%d source=%s:%d path=%s complete=%s", controld_globals.transition_graph->id, TYPE(reason), (id? id : ""), (op? op : "change"), abort_text, add[0], add[1], add[2], fn, line, path, pcmk__btoa(controld_globals.transition_graph->complete)); } } if (controld_globals.transition_graph->complete) { if (controld_get_period_transition_timer() > 0) { controld_stop_transition_timer(); controld_start_transition_timer(); } else { register_fsa_input(C_FSA_INTERNAL, I_PE_CALC, NULL); } return; } trigger_graph(); } diff --git a/daemons/controld/controld_transition.h b/daemons/controld/controld_transition.h index 2da4221901..0655bd9055 100644 --- a/daemons/controld/controld_transition.h +++ b/daemons/controld/controld_transition.h @@ -1,63 +1,65 @@ /* * Copyright 2004-2023 the Pacemaker project contributors * * This source code is licensed under the GNU Lesser General Public License * version 2.1 or later (LGPLv2.1+) WITHOUT ANY WARRANTY. */ #ifndef TENGINE__H # define TENGINE__H # include # include # include # include /* tengine */ pcmk__graph_action_t *match_down_event(const char *target); pcmk__graph_action_t *get_cancel_action(const char *id, const char *node); bool confirm_cancel_action(const char *id, const char *node_id); void controld_record_action_timeout(pcmk__graph_action_t *action); void controld_destroy_outside_events_table(void); void controld_remove_all_outside_events(void); gboolean fail_incompletable_actions(pcmk__graph_t *graph, const char *down_node); void process_graph_event(xmlNode *event, const char *event_node); /* utils */ pcmk__graph_action_t *controld_get_action(int id); gboolean stop_te_timer(pcmk__graph_action_t *action); const char *get_rsc_state(const char *task, enum pcmk_exec_status status); void process_te_message(xmlNode *msg, xmlNode *xml_data); void controld_register_graph_functions(void); void notify_crmd(pcmk__graph_t * graph); void cib_action_updated(xmlNode *msg, int call_id, int rc, xmlNode *output, void *user_data); gboolean action_timer_callback(gpointer data); void te_update_diff(const char *event, xmlNode *msg); void controld_init_transition_trigger(void); void controld_destroy_transition_trigger(void); void controld_trigger_graph_as(const char *fn, int line); void abort_after_delay(int abort_priority, enum pcmk__graph_next abort_action, const char *abort_text, guint delay_ms); +void controld_node_pending_timer(const crm_node_t *node); +void controld_free_node_pending_timers(void); void abort_transition_graph(int abort_priority, enum pcmk__graph_next abort_action, const char *abort_text, const xmlNode *reason, const char *fn, int line); # define trigger_graph() controld_trigger_graph_as(__func__, __LINE__) # define abort_transition(pri, action, text, reason) \ abort_transition_graph(pri, action, text, reason,__func__,__LINE__); void te_action_confirmed(pcmk__graph_action_t *action, pcmk__graph_t *graph); void te_reset_job_counts(void); #endif diff --git a/doc/sphinx/Pacemaker_Explained/options.rst b/doc/sphinx/Pacemaker_Explained/options.rst index ee0511c58e..5d95e4c867 100644 --- a/doc/sphinx/Pacemaker_Explained/options.rst +++ b/doc/sphinx/Pacemaker_Explained/options.rst @@ -1,622 +1,631 @@ Cluster-Wide Configuration -------------------------- .. index:: pair: XML element; cib pair: XML element; configuration Configuration Layout #################### The cluster is defined by the Cluster Information Base (CIB), which uses XML notation. The simplest CIB, an empty one, looks like this: .. topic:: An empty configuration .. code-block:: xml The empty configuration above contains the major sections that make up a CIB: * ``cib``: The entire CIB is enclosed with a ``cib`` element. Certain fundamental settings are defined as attributes of this element. * ``configuration``: This section -- the primary focus of this document -- contains traditional configuration information such as what resources the cluster serves and the relationships among them. * ``crm_config``: cluster-wide configuration options * ``nodes``: the machines that host the cluster * ``resources``: the services run by the cluster * ``constraints``: indications of how resources should be placed * ``status``: This section contains the history of each resource on each node. Based on this data, the cluster can construct the complete current state of the cluster. The authoritative source for this section is the local executor (pacemaker-execd process) on each cluster node, and the cluster will occasionally repopulate the entire section. For this reason, it is never written to disk, and administrators are advised against modifying it in any way. In this document, configuration settings will be described as properties or options based on how they are defined in the CIB: * Properties are XML attributes of an XML element. * Options are name-value pairs expressed as ``nvpair`` child elements of an XML element. Normally, you will use command-line tools that abstract the XML, so the distinction will be unimportant; both properties and options are cluster settings you can tweak. CIB Properties ############## Certain settings are defined by CIB properties (that is, attributes of the ``cib`` tag) rather than with the rest of the cluster configuration in the ``configuration`` section. The reason is simply a matter of parsing. These options are used by the configuration database which is, by design, mostly ignorant of the content it holds. So the decision was made to place them in an easy-to-find location. .. table:: **CIB Properties** :class: longtable :widths: 1 3 +------------------+-----------------------------------------------------------+ | Attribute | Description | +==================+===========================================================+ | admin_epoch | .. index:: | | | pair: admin_epoch; cib | | | | | | When a node joins the cluster, the cluster performs a | | | check to see which node has the best configuration. It | | | asks the node with the highest (``admin_epoch``, | | | ``epoch``, ``num_updates``) tuple to replace the | | | configuration on all the nodes -- which makes setting | | | them, and setting them correctly, very important. | | | ``admin_epoch`` is never modified by the cluster; you can | | | use this to make the configurations on any inactive nodes | | | obsolete. | | | | | | **Warning:** Never set this value to zero. In such cases, | | | the cluster cannot tell the difference between your | | | configuration and the "empty" one used when nothing is | | | found on disk. | +------------------+-----------------------------------------------------------+ | epoch | .. index:: | | | pair: epoch; cib | | | | | | The cluster increments this every time the configuration | | | is updated (usually by the administrator). | +------------------+-----------------------------------------------------------+ | num_updates | .. index:: | | | pair: num_updates; cib | | | | | | The cluster increments this every time the configuration | | | or status is updated (usually by the cluster) and resets | | | it to 0 when epoch changes. | +------------------+-----------------------------------------------------------+ | validate-with | .. index:: | | | pair: validate-with; cib | | | | | | Determines the type of XML validation that will be done | | | on the configuration. If set to ``none``, the cluster | | | will not verify that updates conform to the DTD (nor | | | reject ones that don't). | +------------------+-----------------------------------------------------------+ | cib-last-written | .. index:: | | | pair: cib-last-written; cib | | | | | | Indicates when the configuration was last written to | | | disk. Maintained by the cluster; for informational | | | purposes only. | +------------------+-----------------------------------------------------------+ | have-quorum | .. index:: | | | pair: have-quorum; cib | | | | | | Indicates if the cluster has quorum. If false, this may | | | mean that the cluster cannot start resources or fence | | | other nodes (see ``no-quorum-policy`` below). Maintained | | | by the cluster. | +------------------+-----------------------------------------------------------+ | dc-uuid | .. index:: | | | pair: dc-uuid; cib | | | | | | Indicates which cluster node is the current leader. Used | | | by the cluster when placing resources and determining the | | | order of some events. Maintained by the cluster. | +------------------+-----------------------------------------------------------+ .. _cluster_options: Cluster Options ############### Cluster options, as you might expect, control how the cluster behaves when confronted with various situations. They are grouped into sets within the ``crm_config`` section. In advanced configurations, there may be more than one set. (This will be described later in the chapter on :ref:`rules` where we will show how to have the cluster use different sets of options during working hours than during weekends.) For now, we will describe the simple case where each option is present at most once. You can obtain an up-to-date list of cluster options, including their default values, by running the ``man pacemaker-schedulerd`` and ``man pacemaker-controld`` commands. .. table:: **Cluster Options** :class: longtable :widths: 2 1 4 +---------------------------+---------+----------------------------------------------------+ | Option | Default | Description | +===========================+=========+====================================================+ | cluster-name | | .. index:: | | | | pair: cluster option; cluster-name | | | | | | | | An (optional) name for the cluster as a whole. | | | | This is mostly for users' convenience for use | | | | as desired in administration, but this can be | | | | used in the Pacemaker configuration in | | | | :ref:`rules` (as the ``#cluster-name`` | | | | :ref:`node attribute | | | | `. It may | | | | also be used by higher-level tools when | | | | displaying cluster information, and by | | | | certain resource agents (for example, the | | | | ``ocf:heartbeat:GFS2`` agent stores the | | | | cluster name in filesystem meta-data). | +---------------------------+---------+----------------------------------------------------+ | dc-version | | .. index:: | | | | pair: cluster option; dc-version | | | | | | | | Version of Pacemaker on the cluster's DC. | | | | Determined automatically by the cluster. Often | | | | includes the hash which identifies the exact | | | | Git changeset it was built from. Used for | | | | diagnostic purposes. | +---------------------------+---------+----------------------------------------------------+ | cluster-infrastructure | | .. index:: | | | | pair: cluster option; cluster-infrastructure | | | | | | | | The messaging stack on which Pacemaker is | | | | currently running. Determined automatically by | | | | the cluster. Used for informational and | | | | diagnostic purposes. | +---------------------------+---------+----------------------------------------------------+ | no-quorum-policy | stop | .. index:: | | | | pair: cluster option; no-quorum-policy | | | | | | | | What to do when the cluster does not have | | | | quorum. Allowed values: | | | | | | | | * ``ignore:`` continue all resource management | | | | * ``freeze:`` continue resource management, but | | | | don't recover resources from nodes not in the | | | | affected partition | | | | * ``stop:`` stop all resources in the affected | | | | cluster partition | | | | * ``demote:`` demote promotable resources and | | | | stop all other resources in the affected | | | | cluster partition *(since 2.0.5)* | | | | * ``suicide:`` fence all nodes in the affected | | | | cluster partition | +---------------------------+---------+----------------------------------------------------+ | batch-limit | 0 | .. index:: | | | | pair: cluster option; batch-limit | | | | | | | | The maximum number of actions that the cluster | | | | may execute in parallel across all nodes. The | | | | "correct" value will depend on the speed and | | | | load of your network and cluster nodes. If zero, | | | | the cluster will impose a dynamically calculated | | | | limit only when any node has high load. If -1, the | | | | cluster will not impose any limit. | +---------------------------+---------+----------------------------------------------------+ | migration-limit | -1 | .. index:: | | | | pair: cluster option; migration-limit | | | | | | | | The number of | | | | :ref:`live migration ` actions | | | | that the cluster is allowed to execute in | | | | parallel on a node. A value of -1 means | | | | unlimited. | +---------------------------+---------+----------------------------------------------------+ | symmetric-cluster | true | .. index:: | | | | pair: cluster option; symmetric-cluster | | | | | | | | Whether resources can run on any node by default | | | | (if false, a resource is allowed to run on a | | | | node only if a | | | | :ref:`location constraint ` | | | | enables it) | +---------------------------+---------+----------------------------------------------------+ | stop-all-resources | false | .. index:: | | | | pair: cluster option; stop-all-resources | | | | | | | | Whether all resources should be disallowed from | | | | running (can be useful during maintenance) | +---------------------------+---------+----------------------------------------------------+ | stop-orphan-resources | true | .. index:: | | | | pair: cluster option; stop-orphan-resources | | | | | | | | Whether resources that have been deleted from | | | | the configuration should be stopped. This value | | | | takes precedence over ``is-managed`` (that is, | | | | even unmanaged resources will be stopped when | | | | orphaned if this value is ``true`` | +---------------------------+---------+----------------------------------------------------+ | stop-orphan-actions | true | .. index:: | | | | pair: cluster option; stop-orphan-actions | | | | | | | | Whether recurring :ref:`operations ` | | | | that have been deleted from the configuration | | | | should be cancelled | +---------------------------+---------+----------------------------------------------------+ | start-failure-is-fatal | true | .. index:: | | | | pair: cluster option; start-failure-is-fatal | | | | | | | | Whether a failure to start a resource on a | | | | particular node prevents further start attempts | | | | on that node? If ``false``, the cluster will | | | | decide whether the node is still eligible based | | | | on the resource's current failure count and | | | | :ref:`migration-threshold `. | +---------------------------+---------+----------------------------------------------------+ | enable-startup-probes | true | .. index:: | | | | pair: cluster option; enable-startup-probes | | | | | | | | Whether the cluster should check the | | | | pre-existing state of resources when the cluster | | | | starts | +---------------------------+---------+----------------------------------------------------+ | maintenance-mode | false | .. index:: | | | | pair: cluster option; maintenance-mode | | | | | | | | Whether the cluster should refrain from | | | | monitoring, starting and stopping resources | +---------------------------+---------+----------------------------------------------------+ | stonith-enabled | true | .. index:: | | | | pair: cluster option; stonith-enabled | | | | | | | | Whether the cluster is allowed to fence nodes | | | | (for example, failed nodes and nodes with | | | | resources that can't be stopped. | | | | | | | | If true, at least one fence device must be | | | | configured before resources are allowed to run. | | | | | | | | If false, unresponsive nodes are immediately | | | | assumed to be running no resources, and resource | | | | recovery on online nodes starts without any | | | | further protection (which can mean *data loss* | | | | if the unresponsive node still accesses shared | | | | storage, for example). See also the | | | | :ref:`requires ` resource | | | | meta-attribute. | +---------------------------+---------+----------------------------------------------------+ | stonith-action | reboot | .. index:: | | | | pair: cluster option; stonith-action | | | | | | | | Action the cluster should send to the fence agent | | | | when a node must be fenced. Allowed values are | | | | ``reboot``, ``off``, and (for legacy agents only) | | | | ``poweroff``. | +---------------------------+---------+----------------------------------------------------+ | stonith-timeout | 60s | .. index:: | | | | pair: cluster option; stonith-timeout | | | | | | | | How long to wait for ``on``, ``off``, and | | | | ``reboot`` fence actions to complete by default. | +---------------------------+---------+----------------------------------------------------+ | stonith-max-attempts | 10 | .. index:: | | | | pair: cluster option; stonith-max-attempts | | | | | | | | How many times fencing can fail for a target | | | | before the cluster will no longer immediately | | | | re-attempt it. | +---------------------------+---------+----------------------------------------------------+ | stonith-watchdog-timeout | 0 | .. index:: | | | | pair: cluster option; stonith-watchdog-timeout | | | | | | | | If nonzero, and the cluster detects | | | | ``have-watchdog`` as ``true``, then watchdog-based | | | | self-fencing will be performed via SBD when | | | | fencing is required, without requiring a fencing | | | | resource explicitly configured. | | | | | | | | If this is set to a positive value, unseen nodes | | | | are assumed to self-fence within this much time. | | | | | | | | **Warning:** It must be ensured that this value is | | | | larger than the ``SBD_WATCHDOG_TIMEOUT`` | | | | environment variable on all nodes. Pacemaker | | | | verifies the settings individually on all nodes | | | | and prevents startup or shuts down if configured | | | | wrongly on the fly. It is strongly recommended | | | | that ``SBD_WATCHDOG_TIMEOUT`` be set to the same | | | | value on all nodes. | | | | | | | | If this is set to a negative value, and | | | | ``SBD_WATCHDOG_TIMEOUT`` is set, twice that value | | | | will be used. | | | | | | | | **Warning:** In this case, it is essential (and | | | | currently not verified by pacemaker) that | | | | ``SBD_WATCHDOG_TIMEOUT`` is set to the same | | | | value on all nodes. | +---------------------------+---------+----------------------------------------------------+ | concurrent-fencing | false | .. index:: | | | | pair: cluster option; concurrent-fencing | | | | | | | | Whether the cluster is allowed to initiate | | | | multiple fence actions concurrently. Fence actions | | | | initiated externally, such as via the | | | | ``stonith_admin`` tool or an application such as | | | | DLM, or by the fencer itself such as recurring | | | | device monitors and ``status`` and ``list`` | | | | commands, are not limited by this option. | +---------------------------+---------+----------------------------------------------------+ | fence-reaction | stop | .. index:: | | | | pair: cluster option; fence-reaction | | | | | | | | How should a cluster node react if notified of its | | | | own fencing? A cluster node may receive | | | | notification of its own fencing if fencing is | | | | misconfigured, or if fabric fencing is in use that | | | | doesn't cut cluster communication. Allowed values | | | | are ``stop`` to attempt to immediately stop | | | | pacemaker and stay stopped, or ``panic`` to | | | | attempt to immediately reboot the local node, | | | | falling back to stop on failure. The default is | | | | likely to be changed to ``panic`` in a future | | | | release. *(since 2.0.3)* | +---------------------------+---------+----------------------------------------------------+ | priority-fencing-delay | 0 | .. index:: | | | | pair: cluster option; priority-fencing-delay | | | | | | | | Apply this delay to any fencing targeting the lost | | | | nodes with the highest total resource priority in | | | | case we don't have the majority of the nodes in | | | | our cluster partition, so that the more | | | | significant nodes potentially win any fencing | | | | match (especially meaningful in a split-brain of a | | | | 2-node cluster). A promoted resource instance | | | | takes the resource's priority plus 1 if the | | | | resource's priority is not 0. Any static or random | | | | delays introduced by ``pcmk_delay_base`` and | | | | ``pcmk_delay_max`` configured for the | | | | corresponding fencing resources will be added to | | | | this delay. This delay should be significantly | | | | greater than (safely twice) the maximum delay from | | | | those parameters. *(since 2.0.4)* | +---------------------------+---------+----------------------------------------------------+ + | node-pending-timeout | 10min | .. index:: | + | | | pair: cluster option; node-pending-timeout | + | | | | + | | | A node that has joined the cluster can be pending | + | | | on joining the process group. We wait up to this | + | | | much time for it. If it times out, fencing | + | | | targeting the node will be issued if enabled. | + | | | *(since 2.1.7)* | + +---------------------------+---------+----------------------------------------------------+ | cluster-delay | 60s | .. index:: | | | | pair: cluster option; cluster-delay | | | | | | | | Estimated maximum round-trip delay over the | | | | network (excluding action execution). If the DC | | | | requires an action to be executed on another node, | | | | it will consider the action failed if it does not | | | | get a response from the other node in this time | | | | (after considering the action's own timeout). The | | | | "correct" value will depend on the speed and load | | | | of your network and cluster nodes. | +---------------------------+---------+----------------------------------------------------+ | dc-deadtime | 20s | .. index:: | | | | pair: cluster option; dc-deadtime | | | | | | | | How long to wait for a response from other nodes | | | | during startup. The "correct" value will depend on | | | | the speed/load of your network and the type of | | | | switches used. | +---------------------------+---------+----------------------------------------------------+ | cluster-ipc-limit | 500 | .. index:: | | | | pair: cluster option; cluster-ipc-limit | | | | | | | | The maximum IPC message backlog before one cluster | | | | daemon will disconnect another. This is of use in | | | | large clusters, for which a good value is the | | | | number of resources in the cluster multiplied by | | | | the number of nodes. The default of 500 is also | | | | the minimum. Raise this if you see | | | | "Evicting client" messages for cluster daemon PIDs | | | | in the logs. | +---------------------------+---------+----------------------------------------------------+ | pe-error-series-max | -1 | .. index:: | | | | pair: cluster option; pe-error-series-max | | | | | | | | The number of scheduler inputs resulting in errors | | | | to save. Used when reporting problems. A value of | | | | -1 means unlimited (report all), and 0 means none. | +---------------------------+---------+----------------------------------------------------+ | pe-warn-series-max | 5000 | .. index:: | | | | pair: cluster option; pe-warn-series-max | | | | | | | | The number of scheduler inputs resulting in | | | | warnings to save. Used when reporting problems. A | | | | value of -1 means unlimited (report all), and 0 | | | | means none. | +---------------------------+---------+----------------------------------------------------+ | pe-input-series-max | 4000 | .. index:: | | | | pair: cluster option; pe-input-series-max | | | | | | | | The number of "normal" scheduler inputs to save. | | | | Used when reporting problems. A value of -1 means | | | | unlimited (report all), and 0 means none. | +---------------------------+---------+----------------------------------------------------+ | enable-acl | false | .. index:: | | | | pair: cluster option; enable-acl | | | | | | | | Whether :ref:`acl` should be used to authorize | | | | modifications to the CIB | +---------------------------+---------+----------------------------------------------------+ | placement-strategy | default | .. index:: | | | | pair: cluster option; placement-strategy | | | | | | | | How the cluster should allocate resources to nodes | | | | (see :ref:`utilization`). Allowed values are | | | | ``default``, ``utilization``, ``balanced``, and | | | | ``minimal``. | +---------------------------+---------+----------------------------------------------------+ | node-health-strategy | none | .. index:: | | | | pair: cluster option; node-health-strategy | | | | | | | | How the cluster should react to node health | | | | attributes (see :ref:`node-health`). Allowed values| | | | are ``none``, ``migrate-on-red``, ``only-green``, | | | | ``progressive``, and ``custom``. | +---------------------------+---------+----------------------------------------------------+ | node-health-base | 0 | .. index:: | | | | pair: cluster option; node-health-base | | | | | | | | The base health score assigned to a node. Only | | | | used when ``node-health-strategy`` is | | | | ``progressive``. | +---------------------------+---------+----------------------------------------------------+ | node-health-green | 0 | .. index:: | | | | pair: cluster option; node-health-green | | | | | | | | The score to use for a node health attribute whose | | | | value is ``green``. Only used when | | | | ``node-health-strategy`` is ``progressive`` or | | | | ``custom``. | +---------------------------+---------+----------------------------------------------------+ | node-health-yellow | 0 | .. index:: | | | | pair: cluster option; node-health-yellow | | | | | | | | The score to use for a node health attribute whose | | | | value is ``yellow``. Only used when | | | | ``node-health-strategy`` is ``progressive`` or | | | | ``custom``. | +---------------------------+---------+----------------------------------------------------+ | node-health-red | 0 | .. index:: | | | | pair: cluster option; node-health-red | | | | | | | | The score to use for a node health attribute whose | | | | value is ``red``. Only used when | | | | ``node-health-strategy`` is ``progressive`` or | | | | ``custom``. | +---------------------------+---------+----------------------------------------------------+ | cluster-recheck-interval | 15min | .. index:: | | | | pair: cluster option; cluster-recheck-interval | | | | | | | | Pacemaker is primarily event-driven, and looks | | | | ahead to know when to recheck the cluster for | | | | failure timeouts and most time-based rules | | | | *(since 2.0.3)*. However, it will also recheck the | | | | cluster after this amount of inactivity. This has | | | | two goals: rules with ``date_spec`` are only | | | | guaranteed to be checked this often, and it also | | | | serves as a fail-safe for some kinds of scheduler | | | | bugs. A value of 0 disables this polling; positive | | | | values are a time interval. | +---------------------------+---------+----------------------------------------------------+ | shutdown-lock | false | .. index:: | | | | pair: cluster option; shutdown-lock | | | | | | | | The default of false allows active resources to be | | | | recovered elsewhere when their node is cleanly | | | | shut down, which is what the vast majority of | | | | users will want. However, some users prefer to | | | | make resources highly available only for failures, | | | | with no recovery for clean shutdowns. If this | | | | option is true, resources active on a node when it | | | | is cleanly shut down are kept "locked" to that | | | | node (not allowed to run elsewhere) until they | | | | start again on that node after it rejoins (or for | | | | at most ``shutdown-lock-limit``, if set). Stonith | | | | resources and Pacemaker Remote connections are | | | | never locked. Clone and bundle instances and the | | | | promoted role of promotable clones are currently | | | | never locked, though support could be added in a | | | | future release. Locks may be manually cleared | | | | using the ``--refresh`` option of ``crm_resource`` | | | | (both the resource and node must be specified; | | | | this works with remote nodes if their connection | | | | resource's ``target-role`` is set to ``Stopped``, | | | | but not if Pacemaker Remote is stopped on the | | | | remote node without disabling the connection | | | | resource). *(since 2.0.4)* | +---------------------------+---------+----------------------------------------------------+ | shutdown-lock-limit | 0 | .. index:: | | | | pair: cluster option; shutdown-lock-limit | | | | | | | | If ``shutdown-lock`` is true, and this is set to a | | | | nonzero time duration, locked resources will be | | | | allowed to start after this much time has passed | | | | since the node shutdown was initiated, even if the | | | | node has not rejoined. (This works with remote | | | | nodes only if their connection resource's | | | | ``target-role`` is set to ``Stopped``.) | | | | *(since 2.0.4)* | +---------------------------+---------+----------------------------------------------------+ | remove-after-stop | false | .. index:: | | | | pair: cluster option; remove-after-stop | | | | | | | | *Deprecated* Should the cluster remove | | | | resources from Pacemaker's executor after they are | | | | stopped? Values other than the default are, at | | | | best, poorly tested and potentially dangerous. | | | | This option is deprecated and will be removed in a | | | | future release. | +---------------------------+---------+----------------------------------------------------+ | startup-fencing | true | .. index:: | | | | pair: cluster option; startup-fencing | | | | | | | | *Advanced Use Only:* Should the cluster fence | | | | unseen nodes at start-up? Setting this to false is | | | | unsafe, because the unseen nodes could be active | | | | and running resources but unreachable. | +---------------------------+---------+----------------------------------------------------+ | election-timeout | 2min | .. index:: | | | | pair: cluster option; election-timeout | | | | | | | | *Advanced Use Only:* If you need to adjust this | | | | value, it probably indicates the presence of a bug.| +---------------------------+---------+----------------------------------------------------+ | shutdown-escalation | 20min | .. index:: | | | | pair: cluster option; shutdown-escalation | | | | | | | | *Advanced Use Only:* If you need to adjust this | | | | value, it probably indicates the presence of a bug.| +---------------------------+---------+----------------------------------------------------+ | join-integration-timeout | 3min | .. index:: | | | | pair: cluster option; join-integration-timeout | | | | | | | | *Advanced Use Only:* If you need to adjust this | | | | value, it probably indicates the presence of a bug.| +---------------------------+---------+----------------------------------------------------+ | join-finalization-timeout | 30min | .. index:: | | | | pair: cluster option; join-finalization-timeout | | | | | | | | *Advanced Use Only:* If you need to adjust this | | | | value, it probably indicates the presence of a bug.| +---------------------------+---------+----------------------------------------------------+ | transition-delay | 0s | .. index:: | | | | pair: cluster option; transition-delay | | | | | | | | *Advanced Use Only:* Delay cluster recovery for | | | | the configured interval to allow for additional or | | | | related events to occur. This can be useful if | | | | your configuration is sensitive to the order in | | | | which ping updates arrive. Enabling this option | | | | will slow down cluster recovery under all | | | | conditions. | +---------------------------+---------+----------------------------------------------------+ diff --git a/include/crm/cluster.h b/include/crm/cluster.h index bceb9c2135..dfdbac9cc3 100644 --- a/include/crm/cluster.h +++ b/include/crm/cluster.h @@ -1,236 +1,239 @@ /* * Copyright 2004-2023 the Pacemaker project contributors * * The version control history for this file may have further details. * * This source code is licensed under the GNU Lesser General Public License * version 2.1 or later (LGPLv2.1+) WITHOUT ANY WARRANTY. */ #ifndef PCMK__CRM_CLUSTER__H # define PCMK__CRM_CLUSTER__H # include // uint32_t, uint64_t # include // gboolean, GHashTable # include // xmlNode # include # include #ifdef __cplusplus extern "C" { #endif # if SUPPORT_COROSYNC # include # endif extern gboolean crm_have_quorum; extern GHashTable *crm_peer_cache; extern GHashTable *crm_remote_peer_cache; extern unsigned long long crm_peer_seq; #define CRM_NODE_LOST "lost" #define CRM_NODE_MEMBER "member" enum crm_join_phase { /* @COMPAT: crm_join_nack_quiet can be replaced by crm_node_t:user_data * at a compatibility break. */ //! Not allowed to join, but don't send a nack message crm_join_nack_quiet = -2, crm_join_nack = -1, crm_join_none = 0, crm_join_welcomed = 1, crm_join_integrated = 2, crm_join_finalized = 3, crm_join_confirmed = 4, }; enum crm_node_flags { /* node is not a cluster node and should not be considered for cluster membership */ crm_remote_node = 0x0001, /* node's cache entry is dirty */ crm_node_dirty = 0x0010, }; typedef struct crm_peer_node_s { char *uname; // Node name as known to cluster char *uuid; // Node UUID to ensure uniqueness char *state; // @TODO change to enum uint64_t flags; // Bitmask of crm_node_flags uint64_t last_seen; // Only needed by cluster nodes uint32_t processes; // @TODO most not needed, merge into flags /* @TODO When we can break public API compatibility, we can make the rest of * these members separate structs and use void *cluster_data and * void *user_data here instead, to abstract the cluster layer further. */ // Currently only needed by corosync stack uint32_t id; // Node ID time_t when_lost; // When CPG membership was last lost // Only used by controller enum crm_join_phase join; char *expected; time_t peer_lost; char *conn_host; + + time_t when_member; // Since when node has been a cluster member + time_t when_online; // Since when peer has been online in CPG } crm_node_t; void crm_peer_init(void); void crm_peer_destroy(void); typedef struct crm_cluster_s { char *uuid; char *uname; uint32_t nodeid; void (*destroy) (gpointer); # if SUPPORT_COROSYNC /* @TODO When we can break public API compatibility, make these members a * separate struct and use void *cluster_data here instead, to abstract the * cluster layer further. */ struct cpg_name group; cpg_callbacks_t cpg; cpg_handle_t cpg_handle; # endif } crm_cluster_t; gboolean crm_cluster_connect(crm_cluster_t *cluster); void crm_cluster_disconnect(crm_cluster_t *cluster); crm_cluster_t *pcmk_cluster_new(void); void pcmk_cluster_free(crm_cluster_t *cluster); enum crm_ais_msg_class { crm_class_cluster = 0, }; enum crm_ais_msg_types { crm_msg_none = 0, crm_msg_ais = 1, crm_msg_lrmd = 2, crm_msg_cib = 3, crm_msg_crmd = 4, crm_msg_attrd = 5, crm_msg_stonithd = 6, crm_msg_te = 7, crm_msg_pe = 8, crm_msg_stonith_ng = 9, }; /* used with crm_get_peer_full */ enum crm_get_peer_flags { CRM_GET_PEER_CLUSTER = 0x0001, CRM_GET_PEER_REMOTE = 0x0002, CRM_GET_PEER_ANY = CRM_GET_PEER_CLUSTER|CRM_GET_PEER_REMOTE, }; gboolean send_cluster_message(const crm_node_t *node, enum crm_ais_msg_types service, xmlNode *data, gboolean ordered); int crm_remote_peer_cache_size(void); /* Initialize and refresh the remote peer cache from a cib config */ void crm_remote_peer_cache_refresh(xmlNode *cib); crm_node_t *crm_remote_peer_get(const char *node_name); void crm_remote_peer_cache_remove(const char *node_name); /* allows filtering of remote and cluster nodes using crm_get_peer_flags */ crm_node_t *crm_get_peer_full(unsigned int id, const char *uname, int flags); /* only searches cluster nodes */ crm_node_t *crm_get_peer(unsigned int id, const char *uname); guint crm_active_peers(void); gboolean crm_is_peer_active(const crm_node_t * node); guint reap_crm_member(uint32_t id, const char *name); # if SUPPORT_COROSYNC uint32_t get_local_nodeid(cpg_handle_t handle); gboolean cluster_connect_cpg(crm_cluster_t *cluster); void cluster_disconnect_cpg(crm_cluster_t * cluster); void pcmk_cpg_membership(cpg_handle_t handle, const struct cpg_name *groupName, const struct cpg_address *member_list, size_t member_list_entries, const struct cpg_address *left_list, size_t left_list_entries, const struct cpg_address *joined_list, size_t joined_list_entries); gboolean crm_is_corosync_peer_active(const crm_node_t * node); gboolean send_cluster_text(enum crm_ais_msg_class msg_class, const char *data, gboolean local, const crm_node_t *node, enum crm_ais_msg_types dest); char *pcmk_message_common_cs(cpg_handle_t handle, uint32_t nodeid, uint32_t pid, void *msg, uint32_t *kind, const char **from); # endif const char *crm_peer_uuid(crm_node_t *node); const char *crm_peer_uname(const char *uuid); void set_uuid(xmlNode *xml, const char *attr, crm_node_t *node); enum crm_status_type { crm_status_uname, crm_status_nstate, crm_status_processes, }; enum crm_ais_msg_types text2msg_type(const char *text); void crm_set_status_callback(void (*dispatch) (enum crm_status_type, crm_node_t *, const void *)); void crm_set_autoreap(gboolean autoreap); enum cluster_type_e { pcmk_cluster_unknown = 0x0001, pcmk_cluster_invalid = 0x0002, // 0x0004 was heartbeat // 0x0010 was corosync 1 with plugin pcmk_cluster_corosync = 0x0020, // 0x0040 was corosync 1 with CMAN }; enum cluster_type_e get_cluster_type(void); const char *name_for_cluster_type(enum cluster_type_e type); gboolean is_corosync_cluster(void); const char *get_local_node_name(void); char *get_node_name(uint32_t nodeid); /*! * \brief Get log-friendly string equivalent of a join phase * * \param[in] phase Join phase * * \return Log-friendly string equivalent of \p phase */ static inline const char * crm_join_phase_str(enum crm_join_phase phase) { switch (phase) { case crm_join_nack_quiet: return "nack_quiet"; case crm_join_nack: return "nack"; case crm_join_none: return "none"; case crm_join_welcomed: return "welcomed"; case crm_join_integrated: return "integrated"; case crm_join_finalized: return "finalized"; case crm_join_confirmed: return "confirmed"; default: return "invalid"; } } #if !defined(PCMK_ALLOW_DEPRECATED) || (PCMK_ALLOW_DEPRECATED == 1) #include #endif #ifdef __cplusplus } #endif #endif diff --git a/include/crm/cluster/internal.h b/include/crm/cluster/internal.h index 9bc57c617c..e20ee4c59a 100644 --- a/include/crm/cluster/internal.h +++ b/include/crm/cluster/internal.h @@ -1,133 +1,139 @@ /* * Copyright 2004-2021 the Pacemaker project contributors * * The version control history for this file may have further details. * * This source code is licensed under the GNU Lesser General Public License * version 2.1 or later (LGPLv2.1+) WITHOUT ANY WARRANTY. */ #ifndef CRM_CLUSTER_INTERNAL__H # define CRM_CLUSTER_INTERNAL__H # include // uint32_t, uint64_t # include /* *INDENT-OFF* */ enum crm_proc_flag { crm_proc_none = 0x00000001, // Cluster layers crm_proc_cpg = 0x04000000, // Daemons crm_proc_execd = 0x00000010, crm_proc_based = 0x00000100, crm_proc_controld = 0x00000200, crm_proc_attrd = 0x00001000, crm_proc_schedulerd = 0x00010000, crm_proc_fenced = 0x00100000, }; /* *INDENT-ON* */ /*! * \internal * \brief Return the process bit corresponding to the current cluster stack * * \return Process flag if detectable, otherwise 0 */ static inline uint32_t crm_get_cluster_proc(void) { switch (get_cluster_type()) { case pcmk_cluster_corosync: return crm_proc_cpg; default: break; } return crm_proc_none; } /*! * \internal * \brief Get log-friendly string description of a Corosync return code * * \param[in] error Corosync return code * * \return Log-friendly string description corresponding to \p error */ static inline const char * pcmk__cs_err_str(int error) { # if SUPPORT_COROSYNC switch (error) { case CS_OK: return "OK"; case CS_ERR_LIBRARY: return "Library error"; case CS_ERR_VERSION: return "Version error"; case CS_ERR_INIT: return "Initialization error"; case CS_ERR_TIMEOUT: return "Timeout"; case CS_ERR_TRY_AGAIN: return "Try again"; case CS_ERR_INVALID_PARAM: return "Invalid parameter"; case CS_ERR_NO_MEMORY: return "No memory"; case CS_ERR_BAD_HANDLE: return "Bad handle"; case CS_ERR_BUSY: return "Busy"; case CS_ERR_ACCESS: return "Access error"; case CS_ERR_NOT_EXIST: return "Doesn't exist"; case CS_ERR_NAME_TOO_LONG: return "Name too long"; case CS_ERR_EXIST: return "Exists"; case CS_ERR_NO_SPACE: return "No space"; case CS_ERR_INTERRUPT: return "Interrupt"; case CS_ERR_NAME_NOT_FOUND: return "Name not found"; case CS_ERR_NO_RESOURCES: return "No resources"; case CS_ERR_NOT_SUPPORTED: return "Not supported"; case CS_ERR_BAD_OPERATION: return "Bad operation"; case CS_ERR_FAILED_OPERATION: return "Failed operation"; case CS_ERR_MESSAGE_ERROR: return "Message error"; case CS_ERR_QUEUE_FULL: return "Queue full"; case CS_ERR_QUEUE_NOT_AVAILABLE: return "Queue not available"; case CS_ERR_BAD_FLAGS: return "Bad flags"; case CS_ERR_TOO_BIG: return "Too big"; case CS_ERR_NO_SECTIONS: return "No sections"; } # endif return "Corosync error"; } # if SUPPORT_COROSYNC #if 0 /* This is the new way to do it, but we still support all Corosync 2 versions, * and this isn't always available. A better alternative here would be to check * for support in the configure script and enable this conditionally. */ #define pcmk__init_cmap(handle) cmap_initialize_map((handle), CMAP_MAP_ICMAP) #else #define pcmk__init_cmap(handle) cmap_initialize(handle) #endif char *pcmk__corosync_cluster_name(void); bool pcmk__corosync_add_nodes(xmlNode *xml_parent); # endif crm_node_t *crm_update_peer_proc(const char *source, crm_node_t * peer, uint32_t flag, const char *status); crm_node_t *pcmk__update_peer_state(const char *source, crm_node_t *node, const char *state, uint64_t membership); void pcmk__update_peer_expected(const char *source, crm_node_t *node, const char *expected); void pcmk__reap_unseen_nodes(uint64_t ring_id); void pcmk__corosync_quorum_connect(gboolean (*dispatch)(unsigned long long, gboolean), void (*destroy) (gpointer)); crm_node_t *pcmk__search_node_caches(unsigned int id, const char *uname, uint32_t flags); -crm_node_t *pcmk__search_cluster_node_cache(unsigned int id, const char *uname); +crm_node_t *pcmk__search_cluster_node_cache(unsigned int id, const char *uname, + const char *uuid); void pcmk__refresh_node_caches_from_cib(xmlNode *cib); crm_node_t *pcmk__search_known_node_cache(unsigned int id, const char *uname, uint32_t flags); +crm_node_t *pcmk__get_peer(unsigned int id, const char *uname, + const char *uuid); +crm_node_t *pcmk__get_peer_full(unsigned int id, const char *uname, + const char *uuid, int flags); + #endif diff --git a/include/crm/crm.h b/include/crm/crm.h index e824825e6c..e2e16ef1f4 100644 --- a/include/crm/crm.h +++ b/include/crm/crm.h @@ -1,238 +1,238 @@ /* * Copyright 2004-2023 the Pacemaker project contributors * * The version control history for this file may have further details. * * This source code is licensed under the GNU Lesser General Public License * version 2.1 or later (LGPLv2.1+) WITHOUT ANY WARRANTY. */ #ifndef PCMK__CRM_CRM__H # define PCMK__CRM_CRM__H # include # include # include # include # include # include #ifdef __cplusplus extern "C" { #endif /** * \file * \brief A dumping ground * \ingroup core */ #ifndef PCMK_ALLOW_DEPRECATED /*! * \brief Allow use of deprecated Pacemaker APIs * * By default, external code using Pacemaker headers is allowed to use * deprecated Pacemaker APIs. If PCMK_ALLOW_DEPRECATED is defined to 0 before * including any Pacemaker headers, deprecated APIs will be unusable. It is * strongly recommended to leave this unchanged for production and release * builds, to avoid breakage when users upgrade to new Pacemaker releases that * deprecate more APIs. This should be defined to 0 only for development and * testing builds when desiring to check for usage of currently deprecated APIs. */ #define PCMK_ALLOW_DEPRECATED 1 #endif /*! * The CRM feature set assists with compatibility in mixed-version clusters. * The major version number increases when nodes with different versions * would not work (rolling upgrades are not allowed). The minor version * number increases when mixed-version clusters are allowed only during * rolling upgrades (a node with the oldest feature set will be elected DC). The * minor-minor version number is ignored, but allows resource agents to detect * cluster support for various features. * * The feature set also affects the processing of old saved CIBs (such as for * many scheduler regression tests). * * Particular feature points currently tested by Pacemaker code: * * >2.1: Operation updates include timing data * >=3.0.5: XML v2 digests are created * >=3.0.8: Peers do not need acks for cancellations * >=3.0.9: DC will send its own shutdown request to all peers * XML v2 patchsets are created by default * >=3.0.13: Fail counts include operation name and interval * >=3.2.0: DC supports PCMK_EXEC_INVALID and PCMK_EXEC_NOT_CONNECTED */ -# define CRM_FEATURE_SET "3.17.4" +# define CRM_FEATURE_SET "3.18.0" /* Pacemaker's CPG protocols use fixed-width binary fields for the sender and * recipient of a CPG message. This imposes an arbitrary limit on cluster node * names. */ //! \brief Maximum length of a Corosync cluster node name (in bytes) #define MAX_NAME 256 # define CRM_META "CRM_meta" extern char *crm_system_name; /* *INDENT-OFF* */ // How we represent "infinite" scores # define CRM_SCORE_INFINITY 1000000 # define CRM_INFINITY_S "INFINITY" # define CRM_PLUS_INFINITY_S "+" CRM_INFINITY_S # define CRM_MINUS_INFINITY_S "-" CRM_INFINITY_S /* @COMPAT API < 2.0.0 Deprecated "infinity" aliases * * INFINITY might be defined elsewhere (e.g. math.h), so undefine it first. * This, of course, complicates any attempt to use the other definition in any * code that includes this header. */ # undef INFINITY # define INFINITY_S "INFINITY" # define MINUS_INFINITY_S "-INFINITY" # define INFINITY 1000000 /* Sub-systems */ # define CRM_SYSTEM_DC "dc" #define CRM_SYSTEM_DCIB "dcib" // Primary instance of CIB manager # define CRM_SYSTEM_CIB "cib" # define CRM_SYSTEM_CRMD "crmd" # define CRM_SYSTEM_LRMD "lrmd" # define CRM_SYSTEM_PENGINE "pengine" # define CRM_SYSTEM_TENGINE "tengine" # define CRM_SYSTEM_STONITHD "stonithd" # define CRM_SYSTEM_MCP "pacemakerd" // Names of internally generated node attributes # define CRM_ATTR_UNAME "#uname" # define CRM_ATTR_ID "#id" # define CRM_ATTR_KIND "#kind" # define CRM_ATTR_ROLE "#role" # define CRM_ATTR_IS_DC "#is_dc" # define CRM_ATTR_CLUSTER_NAME "#cluster-name" # define CRM_ATTR_SITE_NAME "#site-name" # define CRM_ATTR_UNFENCED "#node-unfenced" # define CRM_ATTR_DIGESTS_ALL "#digests-all" # define CRM_ATTR_DIGESTS_SECURE "#digests-secure" # define CRM_ATTR_PROTOCOL "#attrd-protocol" # define CRM_ATTR_FEATURE_SET "#feature-set" /* Valid operations */ # define CRM_OP_NOOP "noop" # define CRM_OP_JOIN_ANNOUNCE "join_announce" # define CRM_OP_JOIN_OFFER "join_offer" # define CRM_OP_JOIN_REQUEST "join_request" # define CRM_OP_JOIN_ACKNAK "join_ack_nack" # define CRM_OP_JOIN_CONFIRM "join_confirm" # define CRM_OP_PING "ping" # define CRM_OP_NODE_INFO "node-info" # define CRM_OP_THROTTLE "throttle" # define CRM_OP_VOTE "vote" # define CRM_OP_NOVOTE "no-vote" # define CRM_OP_HELLO "hello" # define CRM_OP_PECALC "pe_calc" # define CRM_OP_QUIT "quit" # define CRM_OP_LOCAL_SHUTDOWN "start_shutdown" # define CRM_OP_SHUTDOWN_REQ "req_shutdown" # define CRM_OP_SHUTDOWN "do_shutdown" # define CRM_OP_FENCE "stonith" # define CRM_OP_REGISTER "register" # define CRM_OP_IPC_FWD "ipc_fwd" # define CRM_OP_INVOKE_LRM "lrm_invoke" # define CRM_OP_LRM_REFRESH "lrm_refresh" //!< Deprecated since 1.1.10 # define CRM_OP_LRM_DELETE "lrm_delete" # define CRM_OP_LRM_FAIL "lrm_fail" # define CRM_OP_PROBED "probe_complete" # define CRM_OP_REPROBE "probe_again" # define CRM_OP_CLEAR_FAILCOUNT "clear_failcount" # define CRM_OP_REMOTE_STATE "remote_state" # define CRM_OP_RELAXED_SET "one-or-more" # define CRM_OP_RELAXED_CLONE "clone-one-or-more" # define CRM_OP_RM_NODE_CACHE "rm_node_cache" # define CRM_OP_MAINTENANCE_NODES "maintenance_nodes" /* Possible cluster membership states */ # define CRMD_JOINSTATE_DOWN "down" # define CRMD_JOINSTATE_PENDING "pending" # define CRMD_JOINSTATE_MEMBER "member" # define CRMD_JOINSTATE_NACK "banned" # define CRMD_ACTION_DELETE "delete" # define CRMD_ACTION_CANCEL "cancel" # define CRMD_ACTION_RELOAD "reload" # define CRMD_ACTION_RELOAD_AGENT "reload-agent" # define CRMD_ACTION_MIGRATE "migrate_to" # define CRMD_ACTION_MIGRATED "migrate_from" # define CRMD_ACTION_START "start" # define CRMD_ACTION_STARTED "running" # define CRMD_ACTION_STOP "stop" # define CRMD_ACTION_STOPPED "stopped" # define CRMD_ACTION_PROMOTE "promote" # define CRMD_ACTION_PROMOTED "promoted" # define CRMD_ACTION_DEMOTE "demote" # define CRMD_ACTION_DEMOTED "demoted" # define CRMD_ACTION_NOTIFY "notify" # define CRMD_ACTION_NOTIFIED "notified" # define CRMD_ACTION_STATUS "monitor" # define CRMD_ACTION_METADATA "meta-data" # define CRMD_METADATA_CALL_TIMEOUT 30000 /* short names */ # define RSC_DELETE CRMD_ACTION_DELETE # define RSC_CANCEL CRMD_ACTION_CANCEL # define RSC_MIGRATE CRMD_ACTION_MIGRATE # define RSC_MIGRATED CRMD_ACTION_MIGRATED # define RSC_START CRMD_ACTION_START # define RSC_STARTED CRMD_ACTION_STARTED # define RSC_STOP CRMD_ACTION_STOP # define RSC_STOPPED CRMD_ACTION_STOPPED # define RSC_PROMOTE CRMD_ACTION_PROMOTE # define RSC_PROMOTED CRMD_ACTION_PROMOTED # define RSC_DEMOTE CRMD_ACTION_DEMOTE # define RSC_DEMOTED CRMD_ACTION_DEMOTED # define RSC_NOTIFY CRMD_ACTION_NOTIFY # define RSC_NOTIFIED CRMD_ACTION_NOTIFIED # define RSC_STATUS CRMD_ACTION_STATUS # define RSC_METADATA CRMD_ACTION_METADATA /* *INDENT-ON* */ # include # include # include static inline const char * crm_action_str(const char *task, guint interval_ms) { if ((task != NULL) && (interval_ms == 0) && (strcasecmp(task, RSC_STATUS) == 0)) { return "probe"; } return task; } #if !defined(PCMK_ALLOW_DEPRECATED) || (PCMK_ALLOW_DEPRECATED == 1) #include #endif #ifdef __cplusplus } #endif #endif diff --git a/include/crm/msg_xml.h b/include/crm/msg_xml.h index e9ebe5ffc7..d9f141f6f1 100644 --- a/include/crm/msg_xml.h +++ b/include/crm/msg_xml.h @@ -1,485 +1,486 @@ /* * Copyright 2004-2023 the Pacemaker project contributors * * The version control history for this file may have further details. * * This source code is licensed under the GNU Lesser General Public License * version 2.1 or later (LGPLv2.1+) WITHOUT ANY WARRANTY. */ #ifndef PCMK__CRM_MSG_XML__H # define PCMK__CRM_MSG_XML__H # include #if !defined(PCMK_ALLOW_DEPRECATED) || (PCMK_ALLOW_DEPRECATED == 1) #include #endif #ifdef __cplusplus extern "C" { #endif /* This file defines constants for various XML syntax (mainly element and * attribute names). * * For consistency, new constants should start with "PCMK_", followed by "XE" * for XML element names, "XA" for XML attribute names, and "META" for meta * attribute names. Old names that don't follow this policy should eventually be * deprecated and replaced with names that do. */ /* * XML elements */ #define PCMK_XE_DATE_EXPRESSION "date_expression" #define PCMK_XE_OP_EXPRESSION "op_expression" /* This has been deprecated as a CIB element (an alias for with * "promotable" set to "true") since 2.0.0. */ #define PCMK_XE_PROMOTABLE_LEGACY "master" #define PCMK_XE_RSC_EXPRESSION "rsc_expression" /* * XML attributes */ /* These have been deprecated as CIB element attributes (aliases for * "promoted-max" and "promoted-node-max") since 2.0.0. */ #define PCMK_XA_PROMOTED_MAX_LEGACY "master-max" #define PCMK_XA_PROMOTED_NODE_MAX_LEGACY "master-node-max" /* * Meta attributes */ #define PCMK_META_ENABLED "enabled" /* * Older constants that don't follow current naming */ # ifndef F_ORIG # define F_ORIG "src" # endif # ifndef F_SEQ # define F_SEQ "seq" # endif # ifndef F_SUBTYPE # define F_SUBTYPE "subt" # endif # ifndef F_TYPE # define F_TYPE "t" # endif # ifndef F_CLIENTNAME # define F_CLIENTNAME "cn" # endif # ifndef F_XML_TAGNAME # define F_XML_TAGNAME "__name__" # endif # ifndef T_CRM # define T_CRM "crmd" # endif # ifndef T_ATTRD # define T_ATTRD "attrd" # endif # define CIB_OPTIONS_FIRST "cib-bootstrap-options" # define F_CRM_DATA "crm_xml" # define F_CRM_TASK "crm_task" # define F_CRM_HOST_TO "crm_host_to" # define F_CRM_MSG_TYPE F_SUBTYPE # define F_CRM_SYS_TO "crm_sys_to" # define F_CRM_SYS_FROM "crm_sys_from" # define F_CRM_HOST_FROM F_ORIG # define F_CRM_REFERENCE XML_ATTR_REFERENCE # define F_CRM_VERSION XML_ATTR_VERSION # define F_CRM_ORIGIN "origin" # define F_CRM_USER "crm_user" # define F_CRM_JOIN_ID "join_id" # define F_CRM_DC_LEAVING "dc-leaving" # define F_CRM_ELECTION_ID "election-id" # define F_CRM_ELECTION_AGE_S "election-age-sec" # define F_CRM_ELECTION_AGE_US "election-age-nano-sec" # define F_CRM_ELECTION_OWNER "election-owner" # define F_CRM_TGRAPH "crm-tgraph-file" # define F_CRM_TGRAPH_INPUT "crm-tgraph-in" # define F_CRM_THROTTLE_MODE "crm-limit-mode" # define F_CRM_THROTTLE_MAX "crm-limit-max" /*---- Common tags/attrs */ # define XML_DIFF_MARKER "__crm_diff_marker__" # define XML_TAG_CIB "cib" # define XML_TAG_FAILED "failed" # define XML_ATTR_CRM_VERSION "crm_feature_set" # define XML_ATTR_DIGEST "digest" # define XML_ATTR_VALIDATION "validate-with" # define XML_ATTR_QUORUM_PANIC "no-quorum-panic" # define XML_ATTR_HAVE_QUORUM "have-quorum" # define XML_ATTR_HAVE_WATCHDOG "have-watchdog" # define XML_ATTR_GENERATION "epoch" # define XML_ATTR_GENERATION_ADMIN "admin_epoch" # define XML_ATTR_NUMUPDATES "num_updates" # define XML_ATTR_TIMEOUT "timeout" # define XML_ATTR_ORIGIN "crm-debug-origin" # define XML_ATTR_TSTAMP "crm-timestamp" # define XML_CIB_ATTR_WRITTEN "cib-last-written" # define XML_ATTR_VERSION "version" # define XML_ATTR_DESC "description" # define XML_ATTR_ID "id" # define XML_ATTR_NAME "name" # define XML_ATTR_IDREF "id-ref" # define XML_ATTR_ID_LONG "long-id" # define XML_ATTR_TYPE "type" # define XML_ATTR_VERBOSE "verbose" # define XML_ATTR_OP "op" # define XML_ATTR_DC_UUID "dc-uuid" # define XML_ATTR_UPDATE_ORIG "update-origin" # define XML_ATTR_UPDATE_CLIENT "update-client" # define XML_ATTR_UPDATE_USER "update-user" # define XML_BOOLEAN_TRUE "true" # define XML_BOOLEAN_FALSE "false" # define XML_BOOLEAN_YES XML_BOOLEAN_TRUE # define XML_BOOLEAN_NO XML_BOOLEAN_FALSE # define XML_TAG_OPTIONS "options" /*---- top level tags/attrs */ # define XML_ATTR_REQUEST "request" # define XML_ATTR_RESPONSE "response" # define XML_ATTR_UNAME "uname" # define XML_ATTR_REFERENCE "reference" # define XML_CRM_TAG_PING "ping_response" # define XML_PING_ATTR_STATUS "result" # define XML_PING_ATTR_SYSFROM "crm_subsystem" # define XML_PING_ATTR_CRMDSTATE "crmd_state" # define XML_PING_ATTR_PACEMAKERDSTATE "pacemakerd_state" # define XML_PING_ATTR_PACEMAKERDSTATE_INIT "init" # define XML_PING_ATTR_PACEMAKERDSTATE_STARTINGDAEMONS "starting_daemons" # define XML_PING_ATTR_PACEMAKERDSTATE_WAITPING "wait_for_ping" # define XML_PING_ATTR_PACEMAKERDSTATE_RUNNING "running" # define XML_PING_ATTR_PACEMAKERDSTATE_SHUTTINGDOWN "shutting_down" # define XML_PING_ATTR_PACEMAKERDSTATE_SHUTDOWNCOMPLETE "shutdown_complete" # define XML_PING_ATTR_PACEMAKERDSTATE_REMOTE "remote" # define XML_FAIL_TAG_CIB "failed_update" # define XML_FAILCIB_ATTR_ID "id" # define XML_FAILCIB_ATTR_OBJTYPE "object_type" # define XML_FAILCIB_ATTR_OP "operation" # define XML_FAILCIB_ATTR_REASON "reason" /*---- CIB specific tags/attrs */ # define XML_CIB_TAG_SECTION_ALL "all" # define XML_CIB_TAG_CONFIGURATION "configuration" # define XML_CIB_TAG_STATUS "status" # define XML_CIB_TAG_RESOURCES "resources" # define XML_CIB_TAG_NODES "nodes" # define XML_CIB_TAG_DOMAINS "domains" # define XML_CIB_TAG_CONSTRAINTS "constraints" # define XML_CIB_TAG_CRMCONFIG "crm_config" # define XML_CIB_TAG_OPCONFIG "op_defaults" # define XML_CIB_TAG_RSCCONFIG "rsc_defaults" # define XML_CIB_TAG_ACLS "acls" # define XML_CIB_TAG_ALERTS "alerts" # define XML_CIB_TAG_ALERT "alert" # define XML_CIB_TAG_ALERT_RECIPIENT "recipient" # define XML_CIB_TAG_ALERT_SELECT "select" # define XML_CIB_TAG_ALERT_ATTRIBUTES "select_attributes" # define XML_CIB_TAG_ALERT_FENCING "select_fencing" # define XML_CIB_TAG_ALERT_NODES "select_nodes" # define XML_CIB_TAG_ALERT_RESOURCES "select_resources" # define XML_CIB_TAG_ALERT_ATTR "attribute" # define XML_CIB_TAG_STATE "node_state" # define XML_CIB_TAG_NODE "node" # define XML_CIB_TAG_NVPAIR "nvpair" # define XML_CIB_TAG_PROPSET "cluster_property_set" # define XML_TAG_ATTR_SETS "instance_attributes" # define XML_TAG_META_SETS "meta_attributes" # define XML_TAG_ATTRS "attributes" # define XML_TAG_PARAMS "parameters" # define XML_TAG_PARAM "param" # define XML_TAG_UTILIZATION "utilization" # define XML_TAG_RESOURCE_REF "resource_ref" # define XML_CIB_TAG_RESOURCE "primitive" # define XML_CIB_TAG_GROUP "group" # define XML_CIB_TAG_INCARNATION "clone" # define XML_CIB_TAG_CONTAINER "bundle" # define XML_CIB_TAG_RSC_TEMPLATE "template" # define XML_RSC_ATTR_TARGET "container-attribute-target" # define XML_RSC_ATTR_RESTART "restart-type" # define XML_RSC_ATTR_ORDERED "ordered" # define XML_RSC_ATTR_INTERLEAVE "interleave" # define XML_RSC_ATTR_INCARNATION "clone" # define XML_RSC_ATTR_INCARNATION_MAX "clone-max" # define XML_RSC_ATTR_INCARNATION_MIN "clone-min" # define XML_RSC_ATTR_INCARNATION_NODEMAX "clone-node-max" # define XML_RSC_ATTR_PROMOTABLE "promotable" # define XML_RSC_ATTR_PROMOTED_MAX "promoted-max" # define XML_RSC_ATTR_PROMOTED_NODEMAX "promoted-node-max" # define XML_RSC_ATTR_MANAGED "is-managed" # define XML_RSC_ATTR_TARGET_ROLE "target-role" # define XML_RSC_ATTR_UNIQUE "globally-unique" # define XML_RSC_ATTR_NOTIFY "notify" # define XML_RSC_ATTR_STICKINESS "resource-stickiness" # define XML_RSC_ATTR_FAIL_STICKINESS "migration-threshold" # define XML_RSC_ATTR_FAIL_TIMEOUT "failure-timeout" # define XML_RSC_ATTR_MULTIPLE "multiple-active" # define XML_RSC_ATTR_REQUIRES "requires" # define XML_RSC_ATTR_CONTAINER "container" # define XML_RSC_ATTR_INTERNAL_RSC "internal_rsc" # define XML_RSC_ATTR_MAINTENANCE "maintenance" # define XML_RSC_ATTR_REMOTE_NODE "remote-node" # define XML_RSC_ATTR_CLEAR_OP "clear_failure_op" # define XML_RSC_ATTR_CLEAR_INTERVAL "clear_failure_interval" # define XML_RSC_ATTR_REMOTE_RA_ADDR "addr" # define XML_RSC_ATTR_REMOTE_RA_SERVER "server" # define XML_RSC_ATTR_REMOTE_RA_PORT "port" # define XML_RSC_ATTR_CRITICAL "critical" # define XML_REMOTE_ATTR_RECONNECT_INTERVAL "reconnect_interval" # define XML_OP_ATTR_ON_FAIL "on-fail" # define XML_OP_ATTR_START_DELAY "start-delay" # define XML_OP_ATTR_ALLOW_MIGRATE "allow-migrate" # define XML_OP_ATTR_ORIGIN "interval-origin" # define XML_OP_ATTR_PENDING "record-pending" # define XML_OP_ATTR_DIGESTS_ALL "digests-all" # define XML_OP_ATTR_DIGESTS_SECURE "digests-secure" # define XML_CIB_TAG_LRM "lrm" # define XML_LRM_TAG_RESOURCES "lrm_resources" # define XML_LRM_TAG_RESOURCE "lrm_resource" # define XML_LRM_TAG_RSC_OP "lrm_rsc_op" # define XML_AGENT_ATTR_CLASS "class" # define XML_AGENT_ATTR_PROVIDER "provider" //! \deprecated Do not use (will be removed in a future release) # define XML_CIB_ATTR_REPLACE "replace" # define XML_CIB_ATTR_SOURCE "source" # define XML_CIB_ATTR_PRIORITY "priority" # define XML_CIB_ATTR_SOURCE "source" # define XML_NODE_JOIN_STATE "join" # define XML_NODE_EXPECTED "expected" # define XML_NODE_IN_CLUSTER "in_ccm" # define XML_NODE_IS_PEER "crmd" # define XML_NODE_IS_REMOTE "remote_node" # define XML_NODE_IS_FENCED "node_fenced" # define XML_NODE_IS_MAINTENANCE "node_in_maintenance" # define XML_CIB_ATTR_SHUTDOWN "shutdown" /* Aside from being an old name for the executor, LRM is a misnomer here because * the controller and scheduler use these to track actions, which are not always * executor operations. */ // XML attribute that takes interval specification (user-facing configuration) # define XML_LRM_ATTR_INTERVAL "interval" // XML attribute that takes interval in milliseconds (daemon APIs) // (identical value as above, but different constant allows clearer code intent) # define XML_LRM_ATTR_INTERVAL_MS XML_LRM_ATTR_INTERVAL # define XML_LRM_ATTR_TASK "operation" # define XML_LRM_ATTR_TASK_KEY "operation_key" # define XML_LRM_ATTR_TARGET "on_node" # define XML_LRM_ATTR_TARGET_UUID "on_node_uuid" /*! Actions to be executed on Pacemaker Remote nodes are routed through the * controller on the cluster node hosting the remote connection. That cluster * node is considered the router node for the action. */ # define XML_LRM_ATTR_ROUTER_NODE "router_node" # define XML_LRM_ATTR_RSCID "rsc-id" # define XML_LRM_ATTR_OPSTATUS "op-status" # define XML_LRM_ATTR_RC "rc-code" # define XML_LRM_ATTR_CALLID "call-id" # define XML_LRM_ATTR_OP_DIGEST "op-digest" # define XML_LRM_ATTR_OP_RESTART "op-force-restart" # define XML_LRM_ATTR_OP_SECURE "op-secure-params" # define XML_LRM_ATTR_RESTART_DIGEST "op-restart-digest" # define XML_LRM_ATTR_SECURE_DIGEST "op-secure-digest" # define XML_LRM_ATTR_EXIT_REASON "exit-reason" # define XML_RSC_OP_LAST_CHANGE "last-rc-change" # define XML_RSC_OP_LAST_RUN "last-run" // deprecated since 2.0.3 # define XML_RSC_OP_T_EXEC "exec-time" # define XML_RSC_OP_T_QUEUE "queue-time" # define XML_LRM_ATTR_MIGRATE_SOURCE "migrate_source" # define XML_LRM_ATTR_MIGRATE_TARGET "migrate_target" # define XML_TAG_GRAPH "transition_graph" # define XML_GRAPH_TAG_RSC_OP "rsc_op" # define XML_GRAPH_TAG_PSEUDO_EVENT "pseudo_event" # define XML_GRAPH_TAG_CRM_EVENT "crm_event" # define XML_GRAPH_TAG_DOWNED "downed" # define XML_GRAPH_TAG_MAINTENANCE "maintenance" # define XML_TAG_RULE "rule" # define XML_RULE_ATTR_SCORE "score" # define XML_RULE_ATTR_SCORE_ATTRIBUTE "score-attribute" # define XML_RULE_ATTR_ROLE "role" # define XML_RULE_ATTR_BOOLEAN_OP "boolean-op" # define XML_TAG_EXPRESSION "expression" # define XML_EXPR_ATTR_ATTRIBUTE "attribute" # define XML_EXPR_ATTR_OPERATION "operation" # define XML_EXPR_ATTR_VALUE "value" # define XML_EXPR_ATTR_TYPE "type" # define XML_EXPR_ATTR_VALUE_SOURCE "value-source" # define XML_CONS_TAG_RSC_DEPEND "rsc_colocation" # define XML_CONS_TAG_RSC_ORDER "rsc_order" # define XML_CONS_TAG_RSC_LOCATION "rsc_location" # define XML_CONS_TAG_RSC_TICKET "rsc_ticket" # define XML_CONS_TAG_RSC_SET "resource_set" # define XML_CONS_ATTR_SYMMETRICAL "symmetrical" # define XML_LOCATION_ATTR_DISCOVERY "resource-discovery" # define XML_COLOC_ATTR_SOURCE "rsc" # define XML_COLOC_ATTR_SOURCE_ROLE "rsc-role" # define XML_COLOC_ATTR_TARGET "with-rsc" # define XML_COLOC_ATTR_TARGET_ROLE "with-rsc-role" # define XML_COLOC_ATTR_NODE_ATTR "node-attribute" # define XML_COLOC_ATTR_INFLUENCE "influence" //! \deprecated Deprecated since 2.1.5 # define XML_COLOC_ATTR_SOURCE_INSTANCE "rsc-instance" //! \deprecated Deprecated since 2.1.5 # define XML_COLOC_ATTR_TARGET_INSTANCE "with-rsc-instance" # define XML_LOC_ATTR_SOURCE "rsc" # define XML_LOC_ATTR_SOURCE_PATTERN "rsc-pattern" # define XML_ORDER_ATTR_FIRST "first" # define XML_ORDER_ATTR_THEN "then" # define XML_ORDER_ATTR_FIRST_ACTION "first-action" # define XML_ORDER_ATTR_THEN_ACTION "then-action" # define XML_ORDER_ATTR_KIND "kind" //! \deprecated Deprecated since 2.1.5 # define XML_ORDER_ATTR_FIRST_INSTANCE "first-instance" //! \deprecated Deprecated since 2.1.5 # define XML_ORDER_ATTR_THEN_INSTANCE "then-instance" # define XML_TICKET_ATTR_TICKET "ticket" # define XML_TICKET_ATTR_LOSS_POLICY "loss-policy" # define XML_NVPAIR_ATTR_NAME "name" # define XML_NVPAIR_ATTR_VALUE "value" # define XML_NODE_ATTR_RSC_DISCOVERY "resource-discovery-enabled" # define XML_CONFIG_ATTR_DC_DEADTIME "dc-deadtime" # define XML_CONFIG_ATTR_ELECTION_FAIL "election-timeout" # define XML_CONFIG_ATTR_FORCE_QUIT "shutdown-escalation" # define XML_CONFIG_ATTR_RECHECK "cluster-recheck-interval" # define XML_CONFIG_ATTR_FENCE_REACTION "fence-reaction" # define XML_CONFIG_ATTR_SHUTDOWN_LOCK "shutdown-lock" # define XML_CONFIG_ATTR_SHUTDOWN_LOCK_LIMIT "shutdown-lock-limit" # define XML_CONFIG_ATTR_PRIORITY_FENCING_DELAY "priority-fencing-delay" +# define XML_CONFIG_ATTR_NODE_PENDING_TIMEOUT "node-pending-timeout" # define XML_ALERT_ATTR_PATH "path" # define XML_ALERT_ATTR_TIMEOUT "timeout" # define XML_ALERT_ATTR_TSTAMP_FORMAT "timestamp-format" # define XML_ALERT_ATTR_REC_VALUE "value" # define XML_CIB_TAG_GENERATION_TUPPLE "generation_tuple" # define XML_ATTR_TRANSITION_MAGIC "transition-magic" # define XML_ATTR_TRANSITION_KEY "transition-key" # define XML_ATTR_TE_NOWAIT "op_no_wait" # define XML_ATTR_TE_TARGET_RC "op_target_rc" # define XML_TAG_TRANSIENT_NODEATTRS "transient_attributes" # define XML_TAG_DIFF_ADDED "diff-added" # define XML_TAG_DIFF_REMOVED "diff-removed" # define XML_ACL_TAG_USER "acl_target" # define XML_ACL_TAG_USERv1 "acl_user" # define XML_ACL_TAG_GROUP "acl_group" # define XML_ACL_TAG_ROLE "acl_role" # define XML_ACL_TAG_PERMISSION "acl_permission" # define XML_ACL_TAG_ROLE_REF "role" # define XML_ACL_TAG_ROLE_REFv1 "role_ref" # define XML_ACL_ATTR_KIND "kind" # define XML_ACL_TAG_READ "read" # define XML_ACL_TAG_WRITE "write" # define XML_ACL_TAG_DENY "deny" # define XML_ACL_ATTR_REF "reference" # define XML_ACL_ATTR_REFv1 "ref" # define XML_ACL_ATTR_TAG "object-type" # define XML_ACL_ATTR_TAGv1 "tag" # define XML_ACL_ATTR_XPATH "xpath" # define XML_ACL_ATTR_ATTRIBUTE "attribute" # define XML_CIB_TAG_TICKETS "tickets" # define XML_CIB_TAG_TICKET_STATE "ticket_state" # define XML_CIB_TAG_TAGS "tags" # define XML_CIB_TAG_TAG "tag" # define XML_CIB_TAG_OBJ_REF "obj_ref" # define XML_TAG_FENCING_TOPOLOGY "fencing-topology" # define XML_TAG_FENCING_LEVEL "fencing-level" # define XML_ATTR_STONITH_INDEX "index" # define XML_ATTR_STONITH_TARGET "target" # define XML_ATTR_STONITH_TARGET_VALUE "target-value" # define XML_ATTR_STONITH_TARGET_PATTERN "target-pattern" # define XML_ATTR_STONITH_TARGET_ATTRIBUTE "target-attribute" # define XML_ATTR_STONITH_DEVICES "devices" # define XML_TAG_DIFF "diff" # define XML_DIFF_VERSION "version" # define XML_DIFF_VSOURCE "source" # define XML_DIFF_VTARGET "target" # define XML_DIFF_CHANGE "change" # define XML_DIFF_LIST "change-list" # define XML_DIFF_ATTR "change-attr" # define XML_DIFF_RESULT "change-result" # define XML_DIFF_OP "operation" # define XML_DIFF_PATH "path" # define XML_DIFF_POSITION "position" # define ID(x) crm_element_value(x, XML_ATTR_ID) # define TYPE(x) crm_element_name(x) #ifdef __cplusplus } #endif #endif diff --git a/include/crm/pengine/pe_types.h b/include/crm/pengine/pe_types.h index 29944a1b3a..935a4aced8 100644 --- a/include/crm/pengine/pe_types.h +++ b/include/crm/pengine/pe_types.h @@ -1,576 +1,577 @@ /* * Copyright 2004-2023 the Pacemaker project contributors * * The version control history for this file may have further details. * * This source code is licensed under the GNU Lesser General Public License * version 2.1 or later (LGPLv2.1+) WITHOUT ANY WARRANTY. */ #ifndef PCMK__CRM_PENGINE_PE_TYPES__H # define PCMK__CRM_PENGINE_PE_TYPES__H # include // bool # include // time_t # include // xmlNode # include // gboolean, guint, GList, GHashTable # include # include #ifdef __cplusplus extern "C" { #endif /*! * \file * \brief Data types for cluster status * \ingroup pengine */ typedef struct pe_node_s pe_node_t; typedef struct pe_action_s pe_action_t; typedef struct pe_resource_s pe_resource_t; typedef struct pe_working_set_s pe_working_set_t; enum pe_obj_types { pe_unknown = -1, pe_native = 0, pe_group = 1, pe_clone = 2, pe_container = 3, }; typedef struct resource_object_functions_s { gboolean (*unpack) (pe_resource_t*, pe_working_set_t*); pe_resource_t *(*find_rsc) (pe_resource_t *parent, const char *search, const pe_node_t *node, int flags); /* parameter result must be free'd */ char *(*parameter) (pe_resource_t*, pe_node_t*, gboolean, const char*, pe_working_set_t*); //! \deprecated will be removed in a future release void (*print) (pe_resource_t*, const char*, long, void*); gboolean (*active) (pe_resource_t*, gboolean); enum rsc_role_e (*state) (const pe_resource_t*, gboolean); pe_node_t *(*location) (const pe_resource_t*, GList**, int); void (*free) (pe_resource_t*); void (*count) (pe_resource_t*); gboolean (*is_filtered) (const pe_resource_t*, GList *, gboolean); /*! * \brief Find a node (and optionally count all) where resource is active * * \param[in] rsc Resource to check * \param[out] count_all If not NULL, set this to count of active nodes * \param[out] count_clean If not NULL, set this to count of clean nodes * * \return A node where the resource is active, preferring the source node * if the resource is involved in a partial migration or a clean, * online node if the resource's "requires" is "quorum" or * "nothing", or NULL if the resource is inactive. */ pe_node_t *(*active_node)(const pe_resource_t *rsc, unsigned int *count_all, unsigned int *count_clean); /*! * \brief Get maximum resource instances per node * * \param[in] rsc Resource to check * * \return Maximum number of \p rsc instances that can be active on one node */ unsigned int (*max_per_node)(const pe_resource_t *rsc); } resource_object_functions_t; typedef struct resource_alloc_functions_s resource_alloc_functions_t; enum pe_quorum_policy { no_quorum_freeze, no_quorum_stop, no_quorum_ignore, no_quorum_suicide, no_quorum_demote }; enum node_type { node_ping, //! \deprecated Do not use node_member, node_remote }; //! \deprecated will be removed in a future release enum pe_restart { pe_restart_restart, //! \deprecated will be removed in a future release pe_restart_ignore //! \deprecated will be removed in a future release }; //! Determine behavior of pe_find_resource_with_flags() enum pe_find { pe_find_renamed = 0x001, //!< match resource ID or LRM history ID pe_find_anon = 0x002, //!< match base name of anonymous clone instances pe_find_clone = 0x004, //!< match only clone instances pe_find_current = 0x008, //!< match resource active on specified node pe_find_inactive = 0x010, //!< match resource not running anywhere pe_find_any = 0x020, //!< match base name of any clone instance }; // @TODO Make these an enum # define pe_flag_have_quorum 0x00000001ULL # define pe_flag_symmetric_cluster 0x00000002ULL # define pe_flag_maintenance_mode 0x00000008ULL # define pe_flag_stonith_enabled 0x00000010ULL # define pe_flag_have_stonith_resource 0x00000020ULL # define pe_flag_enable_unfencing 0x00000040ULL # define pe_flag_concurrent_fencing 0x00000080ULL # define pe_flag_stop_rsc_orphans 0x00000100ULL # define pe_flag_stop_action_orphans 0x00000200ULL # define pe_flag_stop_everything 0x00000400ULL # define pe_flag_start_failure_fatal 0x00001000ULL //! \deprecated # define pe_flag_remove_after_stop 0x00002000ULL # define pe_flag_startup_fencing 0x00004000ULL # define pe_flag_shutdown_lock 0x00008000ULL # define pe_flag_startup_probes 0x00010000ULL # define pe_flag_have_status 0x00020000ULL # define pe_flag_have_remote_nodes 0x00040000ULL # define pe_flag_quick_location 0x00100000ULL # define pe_flag_sanitized 0x00200000ULL //! \deprecated # define pe_flag_stdout 0x00400000ULL //! Don't count total, disabled and blocked resource instances # define pe_flag_no_counts 0x00800000ULL /*! Skip deprecated code that is kept solely for backward API compatibility. * (Internal code should always set this.) */ # define pe_flag_no_compat 0x01000000ULL # define pe_flag_show_scores 0x02000000ULL # define pe_flag_show_utilization 0x04000000ULL /*! * When scheduling, only unpack the CIB (including constraints), calculate * as much cluster status as possible, and apply node health. */ # define pe_flag_check_config 0x08000000ULL struct pe_working_set_s { xmlNode *input; crm_time_t *now; /* options extracted from the input */ char *dc_uuid; pe_node_t *dc_node; const char *stonith_action; const char *placement_strategy; unsigned long long flags; int stonith_timeout; enum pe_quorum_policy no_quorum_policy; GHashTable *config_hash; GHashTable *tickets; // Actions for which there can be only one (e.g. fence nodeX) GHashTable *singletons; GList *nodes; GList *resources; GList *placement_constraints; GList *ordering_constraints; GList *colocation_constraints; GList *ticket_constraints; GList *actions; xmlNode *failed; xmlNode *op_defaults; xmlNode *rsc_defaults; /* stats */ int num_synapse; int max_valid_nodes; //! Deprecated (will be removed in a future release) int order_id; int action_id; /* final output */ xmlNode *graph; GHashTable *template_rsc_sets; const char *localhost; GHashTable *tags; int blocked_resources; int disabled_resources; GList *param_check; // History entries that need to be checked GList *stop_needed; // Containers that need stop actions time_t recheck_by; // Hint to controller to re-run scheduler by this time int ninstances; // Total number of resource instances guint shutdown_lock;// How long (seconds) to lock resources to shutdown node int priority_fencing_delay; // Priority fencing delay void *priv; + guint node_pending_timeout; // Node pending timeout }; enum pe_check_parameters { /* Clear fail count if parameters changed for un-expired start or monitor * last_failure. */ pe_check_last_failure, /* Clear fail count if parameters changed for start, monitor, promote, or * migrate_from actions for active resources. */ pe_check_active, }; struct pe_node_shared_s { const char *id; const char *uname; enum node_type type; /* @TODO convert these flags into a bitfield */ gboolean online; gboolean standby; gboolean standby_onfail; gboolean pending; gboolean unclean; gboolean unseen; gboolean shutdown; gboolean expected_up; gboolean is_dc; gboolean maintenance; gboolean rsc_discovery_enabled; gboolean remote_requires_reset; gboolean remote_was_fenced; gboolean remote_maintenance; /* what the remote-rsc is thinking */ gboolean unpacked; int num_resources; pe_resource_t *remote_rsc; GList *running_rsc; /* pe_resource_t* */ GList *allocated_rsc; /* pe_resource_t* */ GHashTable *attrs; /* char* => char* */ GHashTable *utilization; GHashTable *digest_cache; //!< cache of calculated resource digests int priority; // calculated based on the priority of resources running on the node pe_working_set_t *data_set; //!< Cluster that this node is part of }; struct pe_node_s { int weight; gboolean fixed; //!< \deprecated Will be removed in a future release int count; struct pe_node_shared_s *details; int rsc_discover_mode; }; # define pe_rsc_orphan 0x00000001ULL # define pe_rsc_managed 0x00000002ULL # define pe_rsc_block 0x00000004ULL # define pe_rsc_orphan_container_filler 0x00000008ULL # define pe_rsc_notify 0x00000010ULL # define pe_rsc_unique 0x00000020ULL # define pe_rsc_fence_device 0x00000040ULL # define pe_rsc_promotable 0x00000080ULL # define pe_rsc_provisional 0x00000100ULL # define pe_rsc_allocating 0x00000200ULL # define pe_rsc_merging 0x00000400ULL # define pe_rsc_restarting 0x00000800ULL # define pe_rsc_stop 0x00001000ULL # define pe_rsc_reload 0x00002000ULL # define pe_rsc_allow_remote_remotes 0x00004000ULL # define pe_rsc_critical 0x00008000ULL # define pe_rsc_failed 0x00010000ULL # define pe_rsc_detect_loop 0x00020000ULL # define pe_rsc_runnable 0x00040000ULL # define pe_rsc_start_pending 0x00080000ULL //!< \deprecated Do not use # define pe_rsc_starting 0x00100000ULL //!< \deprecated Do not use # define pe_rsc_stopping 0x00200000ULL # define pe_rsc_stop_unexpected 0x00400000ULL # define pe_rsc_allow_migrate 0x00800000ULL # define pe_rsc_failure_ignored 0x01000000ULL # define pe_rsc_replica_container 0x02000000ULL # define pe_rsc_maintenance 0x04000000ULL # define pe_rsc_is_container 0x08000000ULL # define pe_rsc_needs_quorum 0x10000000ULL # define pe_rsc_needs_fencing 0x20000000ULL # define pe_rsc_needs_unfencing 0x40000000ULL /* *INDENT-OFF* */ enum pe_action_flags { pe_action_pseudo = 0x00001, pe_action_runnable = 0x00002, pe_action_optional = 0x00004, pe_action_print_always = 0x00008, pe_action_have_node_attrs = 0x00010, pe_action_implied_by_stonith = 0x00040, pe_action_migrate_runnable = 0x00080, pe_action_dumped = 0x00100, pe_action_processed = 0x00200, #if !defined(PCMK_ALLOW_DEPRECATED) || (PCMK_ALLOW_DEPRECATED == 1) pe_action_clear = 0x00400, //! \deprecated Unused #endif pe_action_dangle = 0x00800, /* This action requires one or more of its dependencies to be runnable. * We use this to clear the runnable flag before checking dependencies. */ pe_action_requires_any = 0x01000, pe_action_reschedule = 0x02000, pe_action_tracking = 0x04000, pe_action_dedup = 0x08000, //! Internal state tracking when creating graph pe_action_dc = 0x10000, //! Action may run on DC instead of target }; /* *INDENT-ON* */ struct pe_resource_s { char *id; char *clone_name; xmlNode *xml; xmlNode *orig_xml; xmlNode *ops_xml; pe_working_set_t *cluster; pe_resource_t *parent; enum pe_obj_types variant; void *variant_opaque; resource_object_functions_t *fns; resource_alloc_functions_t *cmds; enum rsc_recovery_type recovery_type; enum pe_restart restart_type; //!< \deprecated will be removed in future release int priority; int stickiness; int sort_index; int failure_timeout; int migration_threshold; guint remote_reconnect_ms; char *pending_task; unsigned long long flags; // @TODO merge these into flags gboolean is_remote_node; gboolean exclusive_discover; /* Pay special attention to whether you want to use rsc_cons_lhs and * rsc_cons directly, which include only colocations explicitly involving * this resource, or call libpacemaker's pcmk__with_this_colocations() and * pcmk__this_with_colocations() functions, which may return relevant * colocations involving the resource's ancestors as well. */ //!@{ //! This field should be treated as internal to Pacemaker GList *rsc_cons_lhs; // List of pcmk__colocation_t* GList *rsc_cons; // List of pcmk__colocation_t* GList *rsc_location; // List of pe__location_t* GList *actions; // List of pe_action_t* GList *rsc_tickets; // List of rsc_ticket* //!@} pe_node_t *allocated_to; pe_node_t *partial_migration_target; pe_node_t *partial_migration_source; GList *running_on; /* pe_node_t* */ GHashTable *known_on; /* pe_node_t* */ GHashTable *allowed_nodes; /* pe_node_t* */ enum rsc_role_e role; enum rsc_role_e next_role; GHashTable *meta; GHashTable *parameters; //! \deprecated Use pe_rsc_params() instead GHashTable *utilization; GList *children; /* pe_resource_t* */ GList *dangling_migrations; /* pe_node_t* */ pe_resource_t *container; GList *fillers; // @COMPAT These should be made const at next API compatibility break pe_node_t *pending_node; // Node on which pending_task is happening pe_node_t *lock_node; // Resource is shutdown-locked to this node time_t lock_time; // When shutdown lock started /* Resource parameters may have node-attribute-based rules, which means the * values can vary by node. This table is a cache of parameter name/value * tables for each node (as needed). Use pe_rsc_params() to get the table * for a given node. */ GHashTable *parameter_cache; // Key = node name, value = parameters table }; struct pe_action_s { int id; int priority; pe_resource_t *rsc; pe_node_t *node; xmlNode *op_entry; char *task; char *uuid; char *cancel_task; char *reason; enum pe_action_flags flags; enum rsc_start_requirement needs; enum action_fail_response on_fail; enum rsc_role_e fail_role; GHashTable *meta; GHashTable *extra; /* * These two varables are associated with the constraint logic * that involves first having one or more actions runnable before * then allowing this action to execute. * * These varables are used with features such as 'clone-min' which * requires at minimum X number of cloned instances to be running * before an order dependency can run. Another option that uses * this is 'require-all=false' in ordering constrants. This option * says "only require one instance of a resource to start before * allowing dependencies to start" -- basically, require-all=false is * the same as clone-min=1. */ /* current number of known runnable actions in the before list. */ int runnable_before; /* the number of "before" runnable actions required for this action * to be considered runnable */ int required_runnable_before; GList *actions_before; /* pe_action_wrapper_t* */ GList *actions_after; /* pe_action_wrapper_t* */ /* Some of the above fields could be moved to the details, * except for API backward compatibility. */ void *action_details; // varies by type of action }; typedef struct pe_ticket_s { char *id; gboolean granted; time_t last_granted; gboolean standby; GHashTable *state; } pe_ticket_t; typedef struct pe_tag_s { char *id; GList *refs; } pe_tag_t; //! Internal tracking for transition graph creation enum pe_link_state { pe_link_not_dumped, //! Internal tracking for transition graph creation pe_link_dumped, //! Internal tracking for transition graph creation pe_link_dup, //! \deprecated No longer used by Pacemaker }; enum pe_discover_e { pe_discover_always = 0, pe_discover_never, pe_discover_exclusive, }; /* *INDENT-OFF* */ enum pe_ordering { pe_order_none = 0x0, /* deleted */ pe_order_optional = 0x1, /* pure ordering, nothing implied */ pe_order_apply_first_non_migratable = 0x2, /* Only apply this constraint's ordering if first is not migratable. */ pe_order_implies_first = 0x10, /* If 'then' is required, ensure 'first' is too */ pe_order_implies_then = 0x20, /* If 'first' is required, ensure 'then' is too */ pe_order_promoted_implies_first = 0x40, /* If 'then' is required and then's rsc is promoted, ensure 'first' becomes required too */ /* first requires then to be both runnable and migrate runnable. */ pe_order_implies_first_migratable = 0x80, pe_order_runnable_left = 0x100, /* 'then' requires 'first' to be runnable */ pe_order_pseudo_left = 0x200, /* 'then' can only be pseudo if 'first' is runnable */ pe_order_implies_then_on_node = 0x400, /* If 'first' is required on 'nodeX', * ensure instances of 'then' on 'nodeX' are too. * Only really useful if 'then' is a clone and 'first' is not */ pe_order_probe = 0x800, /* If 'first->rsc' is * - running but about to stop, ignore the constraint * - otherwise, behave as runnable_left */ pe_order_restart = 0x1000, /* 'then' is runnable if 'first' is optional or runnable */ pe_order_stonith_stop = 0x2000, // #endif #ifdef __cplusplus } #endif #endif // PCMK__CRM_PENGINE_PE_TYPES__H diff --git a/lib/cluster/cluster.c b/lib/cluster/cluster.c index 011e0532d3..fce7591eb8 100644 --- a/lib/cluster/cluster.c +++ b/lib/cluster/cluster.c @@ -1,405 +1,405 @@ /* * Copyright 2004-2022 the Pacemaker project contributors * * The version control history for this file may have further details. * * This source code is licensed under the GNU Lesser General Public License * version 2.1 or later (LGPLv2.1+) WITHOUT ANY WARRANTY. */ #include #include #include #include #include #include #include #include #include #include #include #include #include #include "crmcluster_private.h" CRM_TRACE_INIT_DATA(cluster); /*! * \brief Get (and set if needed) a node's UUID * * \param[in,out] peer Node to check * * \return Node UUID of \p peer, or NULL if unknown */ const char * crm_peer_uuid(crm_node_t *peer) { char *uuid = NULL; // Check simple cases first, to avoid any calls that might block if (peer == NULL) { return NULL; } if (peer->uuid != NULL) { return peer->uuid; } switch (get_cluster_type()) { case pcmk_cluster_corosync: #if SUPPORT_COROSYNC uuid = pcmk__corosync_uuid(peer); #endif break; case pcmk_cluster_unknown: case pcmk_cluster_invalid: crm_err("Unsupported cluster type"); break; } peer->uuid = uuid; return peer->uuid; } /*! * \brief Connect to the cluster layer * * \param[in,out] Initialized cluster object to connect * * \return TRUE on success, otherwise FALSE */ gboolean crm_cluster_connect(crm_cluster_t *cluster) { enum cluster_type_e type = get_cluster_type(); crm_notice("Connecting to %s cluster infrastructure", name_for_cluster_type(type)); switch (type) { case pcmk_cluster_corosync: #if SUPPORT_COROSYNC crm_peer_init(); return pcmk__corosync_connect(cluster); #else break; #endif // SUPPORT_COROSYNC default: break; } return FALSE; } /*! * \brief Disconnect from the cluster layer * * \param[in,out] cluster Cluster object to disconnect */ void crm_cluster_disconnect(crm_cluster_t *cluster) { enum cluster_type_e type = get_cluster_type(); crm_info("Disconnecting from %s cluster infrastructure", name_for_cluster_type(type)); switch (type) { case pcmk_cluster_corosync: #if SUPPORT_COROSYNC crm_peer_destroy(); pcmk__corosync_disconnect(cluster); #endif // SUPPORT_COROSYNC break; default: break; } } /*! * \brief Allocate a new \p crm_cluster_t object * * \return A newly allocated \p crm_cluster_t object (guaranteed not \p NULL) * \note The caller is responsible for freeing the return value using * \p pcmk_cluster_free(). */ crm_cluster_t * pcmk_cluster_new(void) { crm_cluster_t *cluster = calloc(1, sizeof(crm_cluster_t)); CRM_ASSERT(cluster != NULL); return cluster; } /*! * \brief Free a \p crm_cluster_t object and its dynamically allocated members * * \param[in,out] cluster Cluster object to free */ void pcmk_cluster_free(crm_cluster_t *cluster) { if (cluster == NULL) { return; } free(cluster->uuid); free(cluster->uname); free(cluster); } /*! * \brief Send an XML message via the cluster messaging layer * * \param[in] node Cluster node to send message to * \param[in] service Message type to use in message host info * \param[in] data XML message to send * \param[in] ordered Ignored for currently supported messaging layers * * \return TRUE on success, otherwise FALSE */ gboolean send_cluster_message(const crm_node_t *node, enum crm_ais_msg_types service, xmlNode *data, gboolean ordered) { switch (get_cluster_type()) { case pcmk_cluster_corosync: #if SUPPORT_COROSYNC return pcmk__cpg_send_xml(data, node, service); #endif break; default: break; } return FALSE; } /*! * \brief Get the local node's name * * \return Local node's name * \note This will fatally exit if local node name cannot be known. */ const char * get_local_node_name(void) { static char *name = NULL; if (name == NULL) { name = get_node_name(0); } return name; } /*! * \brief Get the node name corresponding to a cluster node ID * * \param[in] nodeid Node ID to check (or 0 for local node) * * \return Node name corresponding to \p nodeid * \note This will fatally exit if \p nodeid is 0 and local node name cannot be * known. */ char * get_node_name(uint32_t nodeid) { char *name = NULL; enum cluster_type_e stack = get_cluster_type(); switch (stack) { case pcmk_cluster_corosync: #if SUPPORT_COROSYNC name = pcmk__corosync_name(0, nodeid); break; #endif // SUPPORT_COROSYNC default: crm_err("Unknown cluster type: %s (%d)", name_for_cluster_type(stack), stack); } if ((name == NULL) && (nodeid == 0)) { name = pcmk_hostname(); if (name == NULL) { // @TODO Maybe let the caller decide what to do crm_err("Could not obtain the local %s node name", name_for_cluster_type(stack)); crm_exit(CRM_EX_FATAL); } crm_notice("Defaulting to uname -n for the local %s node name", name_for_cluster_type(stack)); } if (name == NULL) { crm_notice("Could not obtain a node name for %s node with id %u", name_for_cluster_type(stack), nodeid); } return name; } /*! * \brief Get the node name corresponding to a node UUID * * \param[in] uuid UUID of desired node * * \return name of desired node * * \note This relies on the remote peer cache being populated with all * remote nodes in the cluster, so callers should maintain that cache. */ const char * crm_peer_uname(const char *uuid) { GHashTableIter iter; crm_node_t *node = NULL; CRM_CHECK(uuid != NULL, return NULL); /* remote nodes have the same uname and uuid */ if (g_hash_table_lookup(crm_remote_peer_cache, uuid)) { return uuid; } /* avoid blocking calls where possible */ g_hash_table_iter_init(&iter, crm_peer_cache); while (g_hash_table_iter_next(&iter, NULL, (gpointer *) &node)) { if (pcmk__str_eq(node->uuid, uuid, pcmk__str_casei)) { if (node->uname != NULL) { return node->uname; } break; } } node = NULL; if (is_corosync_cluster()) { long long id; if ((pcmk__scan_ll(uuid, &id, 0LL) != pcmk_rc_ok) || (id < 1LL) || (id > UINT32_MAX)) { crm_err("Invalid Corosync node ID '%s'", uuid); return NULL; } - node = pcmk__search_cluster_node_cache((uint32_t) id, NULL); + node = pcmk__search_cluster_node_cache((uint32_t) id, NULL, NULL); if (node != NULL) { crm_info("Setting uuid for node %s[%u] to %s", node->uname, node->id, uuid); node->uuid = strdup(uuid); return node->uname; } return NULL; } return NULL; } /*! * \brief Add a node's UUID as an XML attribute * * \param[in,out] xml XML element to add UUID to * \param[in] attr XML attribute name to set * \param[in,out] node Node whose UUID should be used as attribute value */ void set_uuid(xmlNode *xml, const char *attr, crm_node_t *node) { crm_xml_add(xml, attr, crm_peer_uuid(node)); } /*! * \brief Get a log-friendly string equivalent of a cluster type * * \param[in] type Cluster type * * \return Log-friendly string corresponding to \p type */ const char * name_for_cluster_type(enum cluster_type_e type) { switch (type) { case pcmk_cluster_corosync: return "corosync"; case pcmk_cluster_unknown: return "unknown"; case pcmk_cluster_invalid: return "invalid"; } crm_err("Invalid cluster type: %d", type); return "invalid"; } /*! * \brief Get (and validate) the local cluster type * * \return Local cluster type * \note This will fatally exit if the local cluster type is invalid. */ enum cluster_type_e get_cluster_type(void) { bool detected = false; const char *cluster = NULL; static enum cluster_type_e cluster_type = pcmk_cluster_unknown; /* Return the previous calculation, if any */ if (cluster_type != pcmk_cluster_unknown) { return cluster_type; } cluster = pcmk__env_option(PCMK__ENV_CLUSTER_TYPE); #if SUPPORT_COROSYNC /* If nothing is defined in the environment, try corosync (if supported) */ if (cluster == NULL) { crm_debug("Testing with Corosync"); cluster_type = pcmk__corosync_detect(); if (cluster_type != pcmk_cluster_unknown) { detected = true; goto done; } } #endif /* Something was defined in the environment, test it against what we support */ crm_info("Verifying cluster type: '%s'", ((cluster == NULL)? "-unspecified-" : cluster)); if (cluster == NULL) { #if SUPPORT_COROSYNC } else if (pcmk__str_eq(cluster, "corosync", pcmk__str_casei)) { cluster_type = pcmk_cluster_corosync; #endif } else { cluster_type = pcmk_cluster_invalid; goto done; /* Keep the compiler happy when no stacks are supported */ } done: if (cluster_type == pcmk_cluster_unknown) { crm_notice("Could not determine the current cluster type"); } else if (cluster_type == pcmk_cluster_invalid) { crm_notice("This installation does not support the '%s' cluster infrastructure: terminating.", cluster); crm_exit(CRM_EX_FATAL); } else { crm_info("%s an active '%s' cluster", (detected? "Detected" : "Assuming"), name_for_cluster_type(cluster_type)); } return cluster_type; } /*! * \brief Check whether the local cluster is a Corosync cluster * * \return TRUE if the local cluster is a Corosync cluster, otherwise FALSE */ gboolean is_corosync_cluster(void) { return get_cluster_type() == pcmk_cluster_corosync; } diff --git a/lib/cluster/cpg.c b/lib/cluster/cpg.c index 2af4a5059d..d79ea47cbb 100644 --- a/lib/cluster/cpg.c +++ b/lib/cluster/cpg.c @@ -1,1092 +1,1092 @@ /* * Copyright 2004-2022 the Pacemaker project contributors * * The version control history for this file may have further details. * * This source code is licensed under the GNU Lesser General Public License * version 2.1 or later (LGPLv2.1+) WITHOUT ANY WARRANTY. */ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include /* PCMK__SPECIAL_PID* */ #include "crmcluster_private.h" /* @TODO Once we can update the public API to require crm_cluster_t* in more * functions, we can ditch this in favor of cluster->cpg_handle. */ static cpg_handle_t pcmk_cpg_handle = 0; // @TODO These could be moved to crm_cluster_t* at that time as well static bool cpg_evicted = false; static GList *cs_message_queue = NULL; static int cs_message_timer = 0; struct pcmk__cpg_host_s { uint32_t id; uint32_t pid; gboolean local; enum crm_ais_msg_types type; uint32_t size; char uname[MAX_NAME]; } __attribute__ ((packed)); typedef struct pcmk__cpg_host_s pcmk__cpg_host_t; struct pcmk__cpg_msg_s { struct qb_ipc_response_header header __attribute__ ((aligned(8))); uint32_t id; gboolean is_compressed; pcmk__cpg_host_t host; pcmk__cpg_host_t sender; uint32_t size; uint32_t compressed_size; /* 584 bytes */ char data[0]; } __attribute__ ((packed)); typedef struct pcmk__cpg_msg_s pcmk__cpg_msg_t; static void crm_cs_flush(gpointer data); #define msg_data_len(msg) (msg->is_compressed?msg->compressed_size:msg->size) #define cs_repeat(rc, counter, max, code) do { \ rc = code; \ if ((rc == CS_ERR_TRY_AGAIN) || (rc == CS_ERR_QUEUE_FULL)) { \ counter++; \ crm_debug("Retrying operation after %ds", counter); \ sleep(counter); \ } else { \ break; \ } \ } while (counter < max) /*! * \brief Disconnect from Corosync CPG * * \param[in,out] cluster Cluster to disconnect */ void cluster_disconnect_cpg(crm_cluster_t *cluster) { pcmk_cpg_handle = 0; if (cluster->cpg_handle) { crm_trace("Disconnecting CPG"); cpg_leave(cluster->cpg_handle, &cluster->group); cpg_finalize(cluster->cpg_handle); cluster->cpg_handle = 0; } else { crm_info("No CPG connection"); } } /*! * \brief Get the local Corosync node ID (via CPG) * * \param[in] handle CPG connection to use (or 0 to use new connection) * * \return Corosync ID of local node (or 0 if not known) */ uint32_t get_local_nodeid(cpg_handle_t handle) { cs_error_t rc = CS_OK; int retries = 0; static uint32_t local_nodeid = 0; cpg_handle_t local_handle = handle; cpg_model_v1_data_t cpg_model_info = {CPG_MODEL_V1, NULL, NULL, NULL, 0}; int fd = -1; uid_t found_uid = 0; gid_t found_gid = 0; pid_t found_pid = 0; int rv; if(local_nodeid != 0) { return local_nodeid; } if(handle == 0) { crm_trace("Creating connection"); cs_repeat(rc, retries, 5, cpg_model_initialize(&local_handle, CPG_MODEL_V1, (cpg_model_data_t *)&cpg_model_info, NULL)); if (rc != CS_OK) { crm_err("Could not connect to the CPG API: %s (%d)", cs_strerror(rc), rc); return 0; } rc = cpg_fd_get(local_handle, &fd); if (rc != CS_OK) { crm_err("Could not obtain the CPG API connection: %s (%d)", cs_strerror(rc), rc); goto bail; } /* CPG provider run as root (in given user namespace, anyway)? */ if (!(rv = crm_ipc_is_authentic_process(fd, (uid_t) 0,(gid_t) 0, &found_pid, &found_uid, &found_gid))) { crm_err("CPG provider is not authentic:" " process %lld (uid: %lld, gid: %lld)", (long long) PCMK__SPECIAL_PID_AS_0(found_pid), (long long) found_uid, (long long) found_gid); goto bail; } else if (rv < 0) { crm_err("Could not verify authenticity of CPG provider: %s (%d)", strerror(-rv), -rv); goto bail; } } if (rc == CS_OK) { retries = 0; crm_trace("Performing lookup"); cs_repeat(rc, retries, 5, cpg_local_get(local_handle, &local_nodeid)); } if (rc != CS_OK) { crm_err("Could not get local node id from the CPG API: %s (%d)", pcmk__cs_err_str(rc), rc); } bail: if(handle == 0) { crm_trace("Closing connection"); cpg_finalize(local_handle); } crm_debug("Local nodeid is %u", local_nodeid); return local_nodeid; } /*! * \internal * \brief Callback function for Corosync message queue timer * * \param[in] data CPG handle * * \return FALSE (to indicate to glib that timer should not be removed) */ static gboolean crm_cs_flush_cb(gpointer data) { cs_message_timer = 0; crm_cs_flush(data); return FALSE; } // Send no more than this many CPG messages in one flush #define CS_SEND_MAX 200 /*! * \internal * \brief Send messages in Corosync CPG message queue * * \param[in] data CPG handle */ static void crm_cs_flush(gpointer data) { unsigned int sent = 0; guint queue_len = 0; cs_error_t rc = 0; cpg_handle_t *handle = (cpg_handle_t *) data; if (*handle == 0) { crm_trace("Connection is dead"); return; } queue_len = g_list_length(cs_message_queue); if (((queue_len % 1000) == 0) && (queue_len > 1)) { crm_err("CPG queue has grown to %d", queue_len); } else if (queue_len == CS_SEND_MAX) { crm_warn("CPG queue has grown to %d", queue_len); } if (cs_message_timer != 0) { /* There is already a timer, wait until it goes off */ crm_trace("Timer active %d", cs_message_timer); return; } while ((cs_message_queue != NULL) && (sent < CS_SEND_MAX)) { struct iovec *iov = cs_message_queue->data; rc = cpg_mcast_joined(*handle, CPG_TYPE_AGREED, iov, 1); if (rc != CS_OK) { break; } sent++; crm_trace("CPG message sent, size=%llu", (unsigned long long) iov->iov_len); cs_message_queue = g_list_remove(cs_message_queue, iov); free(iov->iov_base); free(iov); } queue_len -= sent; do_crm_log((queue_len > 5)? LOG_INFO : LOG_TRACE, "Sent %u CPG message%s (%d still queued): %s (rc=%d)", sent, pcmk__plural_s(sent), queue_len, pcmk__cs_err_str(rc), (int) rc); if (cs_message_queue) { uint32_t delay_ms = 100; if (rc != CS_OK) { /* Proportionally more if sending failed but cap at 1s */ delay_ms = QB_MIN(1000, CS_SEND_MAX + (10 * queue_len)); } cs_message_timer = g_timeout_add(delay_ms, crm_cs_flush_cb, data); } } /*! * \internal * \brief Dispatch function for CPG handle * * \param[in,out] user_data Cluster object * * \return 0 on success, -1 on error (per mainloop_io_t interface) */ static int pcmk_cpg_dispatch(gpointer user_data) { cs_error_t rc = CS_OK; crm_cluster_t *cluster = (crm_cluster_t *) user_data; rc = cpg_dispatch(cluster->cpg_handle, CS_DISPATCH_ONE); if (rc != CS_OK) { crm_err("Connection to the CPG API failed: %s (%d)", pcmk__cs_err_str(rc), rc); cpg_finalize(cluster->cpg_handle); cluster->cpg_handle = 0; return -1; } else if (cpg_evicted) { crm_err("Evicted from CPG membership"); return -1; } return 0; } static inline const char * ais_dest(const pcmk__cpg_host_t *host) { if (host->local) { return "local"; } else if (host->size > 0) { return host->uname; } else { return ""; } } static inline const char * msg_type2text(enum crm_ais_msg_types type) { const char *text = "unknown"; switch (type) { case crm_msg_none: text = "unknown"; break; case crm_msg_ais: text = "ais"; break; case crm_msg_cib: text = "cib"; break; case crm_msg_crmd: text = "crmd"; break; case crm_msg_pe: text = "pengine"; break; case crm_msg_te: text = "tengine"; break; case crm_msg_lrmd: text = "lrmd"; break; case crm_msg_attrd: text = "attrd"; break; case crm_msg_stonithd: text = "stonithd"; break; case crm_msg_stonith_ng: text = "stonith-ng"; break; } return text; } /*! * \internal * \brief Check whether a Corosync CPG message is valid * * \param[in] msg Corosync CPG message to check * * \return true if \p msg is valid, otherwise false */ static bool check_message_sanity(const pcmk__cpg_msg_t *msg) { int32_t payload_size = msg->header.size - sizeof(pcmk__cpg_msg_t); if (payload_size < 1) { crm_err("%sCPG message %d from %s invalid: " "Claimed size of %d bytes is too small " CRM_XS " from %s[%u] to %s@%s", (msg->is_compressed? "Compressed " : ""), msg->id, ais_dest(&(msg->sender)), (int) msg->header.size, msg_type2text(msg->sender.type), msg->sender.pid, msg_type2text(msg->host.type), ais_dest(&(msg->host))); return false; } if (msg->header.error != CS_OK) { crm_err("%sCPG message %d from %s invalid: " "Sender indicated error %d " CRM_XS " from %s[%u] to %s@%s", (msg->is_compressed? "Compressed " : ""), msg->id, ais_dest(&(msg->sender)), msg->header.error, msg_type2text(msg->sender.type), msg->sender.pid, msg_type2text(msg->host.type), ais_dest(&(msg->host))); return false; } if (msg_data_len(msg) != payload_size) { crm_err("%sCPG message %d from %s invalid: " "Total size %d inconsistent with payload size %d " CRM_XS " from %s[%u] to %s@%s", (msg->is_compressed? "Compressed " : ""), msg->id, ais_dest(&(msg->sender)), (int) msg->header.size, (int) msg_data_len(msg), msg_type2text(msg->sender.type), msg->sender.pid, msg_type2text(msg->host.type), ais_dest(&(msg->host))); return false; } if (!msg->is_compressed && /* msg->size != (strlen(msg->data) + 1) would be a stronger check, * but checking the last byte or two should be quick */ (((msg->size > 1) && (msg->data[msg->size - 2] == '\0')) || (msg->data[msg->size - 1] != '\0'))) { crm_err("CPG message %d from %s invalid: " "Payload does not end at byte %llu " CRM_XS " from %s[%u] to %s@%s", msg->id, ais_dest(&(msg->sender)), (unsigned long long) msg->size, msg_type2text(msg->sender.type), msg->sender.pid, msg_type2text(msg->host.type), ais_dest(&(msg->host))); return false; } crm_trace("Verified %d-byte %sCPG message %d from %s[%u]@%s to %s@%s", (int) msg->header.size, (msg->is_compressed? "compressed " : ""), msg->id, msg_type2text(msg->sender.type), msg->sender.pid, ais_dest(&(msg->sender)), msg_type2text(msg->host.type), ais_dest(&(msg->host))); return true; } /*! * \brief Extract text data from a Corosync CPG message * * \param[in] handle CPG connection (to get local node ID if not known) * \param[in] nodeid Corosync ID of node that sent message * \param[in] pid Process ID of message sender (for logging only) * \param[in,out] content CPG message * \param[out] kind If not NULL, will be set to CPG header ID * (which should be an enum crm_ais_msg_class value, * currently always crm_class_cluster) * \param[out] from If not NULL, will be set to sender uname * (valid for the lifetime of \p content) * * \return Newly allocated string with message data * \note It is the caller's responsibility to free the return value with free(). */ char * pcmk_message_common_cs(cpg_handle_t handle, uint32_t nodeid, uint32_t pid, void *content, uint32_t *kind, const char **from) { char *data = NULL; pcmk__cpg_msg_t *msg = (pcmk__cpg_msg_t *) content; if(handle) { // Do filtering and field massaging uint32_t local_nodeid = get_local_nodeid(handle); const char *local_name = get_local_node_name(); if (msg->sender.id > 0 && msg->sender.id != nodeid) { crm_err("Nodeid mismatch from %d.%d: claimed nodeid=%u", nodeid, pid, msg->sender.id); return NULL; } else if (msg->host.id != 0 && (local_nodeid != msg->host.id)) { /* Not for us */ crm_trace("Not for us: %u != %u", msg->host.id, local_nodeid); return NULL; } else if (msg->host.size != 0 && !pcmk__str_eq(msg->host.uname, local_name, pcmk__str_casei)) { /* Not for us */ crm_trace("Not for us: %s != %s", msg->host.uname, local_name); return NULL; } msg->sender.id = nodeid; if (msg->sender.size == 0) { crm_node_t *peer = crm_get_peer(nodeid, NULL); if (peer == NULL) { crm_err("Peer with nodeid=%u is unknown", nodeid); } else if (peer->uname == NULL) { crm_err("No uname for peer with nodeid=%u", nodeid); } else { crm_notice("Fixing uname for peer with nodeid=%u", nodeid); msg->sender.size = strlen(peer->uname); memset(msg->sender.uname, 0, MAX_NAME); memcpy(msg->sender.uname, peer->uname, msg->sender.size); } } } crm_trace("Got new%s message (size=%d, %d, %d)", msg->is_compressed ? " compressed" : "", msg_data_len(msg), msg->size, msg->compressed_size); if (kind != NULL) { *kind = msg->header.id; } if (from != NULL) { *from = msg->sender.uname; } if (msg->is_compressed && msg->size > 0) { int rc = BZ_OK; char *uncompressed = NULL; unsigned int new_size = msg->size + 1; if (!check_message_sanity(msg)) { goto badmsg; } crm_trace("Decompressing message data"); uncompressed = calloc(1, new_size); rc = BZ2_bzBuffToBuffDecompress(uncompressed, &new_size, msg->data, msg->compressed_size, 1, 0); if (rc != BZ_OK) { crm_err("Decompression failed: %s " CRM_XS " bzerror=%d", bz2_strerror(rc), rc); free(uncompressed); goto badmsg; } CRM_ASSERT(rc == BZ_OK); CRM_ASSERT(new_size == msg->size); data = uncompressed; } else if (!check_message_sanity(msg)) { goto badmsg; } else { data = strdup(msg->data); } // Is this necessary? crm_get_peer(msg->sender.id, msg->sender.uname); crm_trace("Payload: %.200s", data); return data; badmsg: crm_err("Invalid message (id=%d, dest=%s:%s, from=%s:%s.%d):" " min=%d, total=%d, size=%d, bz2_size=%d", msg->id, ais_dest(&(msg->host)), msg_type2text(msg->host.type), ais_dest(&(msg->sender)), msg_type2text(msg->sender.type), msg->sender.pid, (int)sizeof(pcmk__cpg_msg_t), msg->header.size, msg->size, msg->compressed_size); free(data); return NULL; } /*! * \internal * \brief Compare cpg_address objects by node ID * * \param[in] first First cpg_address structure to compare * \param[in] second Second cpg_address structure to compare * * \return Negative number if first's node ID is lower, * positive number if first's node ID is greater, * or 0 if both node IDs are equal */ static int cmp_member_list_nodeid(const void *first, const void *second) { const struct cpg_address *const a = *((const struct cpg_address **) first), *const b = *((const struct cpg_address **) second); if (a->nodeid < b->nodeid) { return -1; } else if (a->nodeid > b->nodeid) { return 1; } /* don't bother with "reason" nor "pid" */ return 0; } /*! * \internal * \brief Get a readable string equivalent of a cpg_reason_t value * * \param[in] reason CPG reason value * * \return Readable string suitable for logging */ static const char * cpgreason2str(cpg_reason_t reason) { switch (reason) { case CPG_REASON_JOIN: return " via cpg_join"; case CPG_REASON_LEAVE: return " via cpg_leave"; case CPG_REASON_NODEDOWN: return " via cluster exit"; case CPG_REASON_NODEUP: return " via cluster join"; case CPG_REASON_PROCDOWN: return " for unknown reason"; default: break; } return ""; } /*! * \internal * \brief Get a log-friendly node name * * \param[in] peer Node to check * * \return Node's uname, or readable string if not known */ static inline const char * peer_name(const crm_node_t *peer) { if (peer == NULL) { return "unknown node"; } else if (peer->uname == NULL) { return "peer node"; } else { return peer->uname; } } /*! * \internal * \brief Process a CPG peer's leaving the cluster * * \param[in] cpg_group_name CPG group name (for logging) * \param[in] event_counter Event number (for logging) * \param[in] local_nodeid Node ID of local node * \param[in] cpg_peer CPG peer that left * \param[in] sorted_member_list List of remaining members, qsort()-ed by ID * \param[in] member_list_entries Number of entries in \p sorted_member_list */ static void node_left(const char *cpg_group_name, int event_counter, uint32_t local_nodeid, const struct cpg_address *cpg_peer, const struct cpg_address **sorted_member_list, size_t member_list_entries) { crm_node_t *peer = pcmk__search_cluster_node_cache(cpg_peer->nodeid, - NULL); + NULL, NULL); const struct cpg_address **rival = NULL; /* Most CPG-related Pacemaker code assumes that only one process on a node * can be in the process group, but Corosync does not impose this * limitation, and more than one can be a member in practice due to a * daemon attempting to start while another instance is already running. * * Check for any such duplicate instances, because we don't want to process * their leaving as if our actual peer left. If the peer that left still has * an entry in sorted_member_list (with a different PID), we will ignore the * leaving. * * @TODO Track CPG members' PIDs so we can tell exactly who left. */ if (peer != NULL) { rival = bsearch(&cpg_peer, sorted_member_list, member_list_entries, sizeof(const struct cpg_address *), cmp_member_list_nodeid); } if (rival == NULL) { crm_info("Group %s event %d: %s (node %u pid %u) left%s", cpg_group_name, event_counter, peer_name(peer), cpg_peer->nodeid, cpg_peer->pid, cpgreason2str(cpg_peer->reason)); if (peer != NULL) { crm_update_peer_proc(__func__, peer, crm_proc_cpg, OFFLINESTATUS); } } else if (cpg_peer->nodeid == local_nodeid) { crm_warn("Group %s event %d: duplicate local pid %u left%s", cpg_group_name, event_counter, cpg_peer->pid, cpgreason2str(cpg_peer->reason)); } else { crm_warn("Group %s event %d: " "%s (node %u) duplicate pid %u left%s (%u remains)", cpg_group_name, event_counter, peer_name(peer), cpg_peer->nodeid, cpg_peer->pid, cpgreason2str(cpg_peer->reason), (*rival)->pid); } } /*! * \brief Handle a CPG configuration change event * * \param[in] handle CPG connection * \param[in] cpg_name CPG group name * \param[in] member_list List of current CPG members * \param[in] member_list_entries Number of entries in \p member_list * \param[in] left_list List of CPG members that left * \param[in] left_list_entries Number of entries in \p left_list * \param[in] joined_list List of CPG members that joined * \param[in] joined_list_entries Number of entries in \p joined_list */ void pcmk_cpg_membership(cpg_handle_t handle, const struct cpg_name *groupName, const struct cpg_address *member_list, size_t member_list_entries, const struct cpg_address *left_list, size_t left_list_entries, const struct cpg_address *joined_list, size_t joined_list_entries) { int i; gboolean found = FALSE; static int counter = 0; uint32_t local_nodeid = get_local_nodeid(handle); const struct cpg_address **sorted; sorted = malloc(member_list_entries * sizeof(const struct cpg_address *)); CRM_ASSERT(sorted != NULL); for (size_t iter = 0; iter < member_list_entries; iter++) { sorted[iter] = member_list + iter; } /* so that the cross-matching multiply-subscribed nodes is then cheap */ qsort(sorted, member_list_entries, sizeof(const struct cpg_address *), cmp_member_list_nodeid); for (i = 0; i < left_list_entries; i++) { node_left(groupName->value, counter, local_nodeid, &left_list[i], sorted, member_list_entries); } free(sorted); sorted = NULL; for (i = 0; i < joined_list_entries; i++) { crm_info("Group %s event %d: node %u pid %u joined%s", groupName->value, counter, joined_list[i].nodeid, joined_list[i].pid, cpgreason2str(joined_list[i].reason)); } for (i = 0; i < member_list_entries; i++) { crm_node_t *peer = crm_get_peer(member_list[i].nodeid, NULL); if (member_list[i].nodeid == local_nodeid && member_list[i].pid != getpid()) { // See the note in node_left() crm_warn("Group %s event %d: detected duplicate local pid %u", groupName->value, counter, member_list[i].pid); continue; } crm_info("Group %s event %d: %s (node %u pid %u) is member", groupName->value, counter, peer_name(peer), member_list[i].nodeid, member_list[i].pid); /* If the caller left auto-reaping enabled, this will also update the * state to member. */ peer = crm_update_peer_proc(__func__, peer, crm_proc_cpg, ONLINESTATUS); if (peer && peer->state && strcmp(peer->state, CRM_NODE_MEMBER)) { /* The node is a CPG member, but we currently think it's not a * cluster member. This is possible only if auto-reaping was * disabled. The node may be joining, and we happened to get the CPG * notification before the quorum notification; or the node may have * just died, and we are processing its final messages; or a bug * has affected the peer cache. */ time_t now = time(NULL); if (peer->when_lost == 0) { // Track when we first got into this contradictory state peer->when_lost = now; } else if (now > (peer->when_lost + 60)) { // If it persists for more than a minute, update the state crm_warn("Node %u is member of group %s but was believed offline", member_list[i].nodeid, groupName->value); pcmk__update_peer_state(__func__, peer, CRM_NODE_MEMBER, 0); } } if (local_nodeid == member_list[i].nodeid) { found = TRUE; } } if (!found) { crm_err("Local node was evicted from group %s", groupName->value); cpg_evicted = true; } counter++; } /*! * \brief Connect to Corosync CPG * * \param[in,out] cluster Cluster object * * \return TRUE on success, otherwise FALSE */ gboolean cluster_connect_cpg(crm_cluster_t *cluster) { cs_error_t rc; int fd = -1; int retries = 0; uint32_t id = 0; crm_node_t *peer = NULL; cpg_handle_t handle = 0; const char *message_name = pcmk__message_name(crm_system_name); uid_t found_uid = 0; gid_t found_gid = 0; pid_t found_pid = 0; int rv; struct mainloop_fd_callbacks cpg_fd_callbacks = { .dispatch = pcmk_cpg_dispatch, .destroy = cluster->destroy, }; cpg_model_v1_data_t cpg_model_info = { .model = CPG_MODEL_V1, .cpg_deliver_fn = cluster->cpg.cpg_deliver_fn, .cpg_confchg_fn = cluster->cpg.cpg_confchg_fn, .cpg_totem_confchg_fn = NULL, .flags = 0, }; cpg_evicted = false; cluster->group.length = 0; cluster->group.value[0] = 0; /* group.value is char[128] */ strncpy(cluster->group.value, message_name, 127); cluster->group.value[127] = 0; cluster->group.length = 1 + QB_MIN(127, strlen(cluster->group.value)); cs_repeat(rc, retries, 30, cpg_model_initialize(&handle, CPG_MODEL_V1, (cpg_model_data_t *)&cpg_model_info, NULL)); if (rc != CS_OK) { crm_err("Could not connect to the CPG API: %s (%d)", cs_strerror(rc), rc); goto bail; } rc = cpg_fd_get(handle, &fd); if (rc != CS_OK) { crm_err("Could not obtain the CPG API connection: %s (%d)", cs_strerror(rc), rc); goto bail; } /* CPG provider run as root (in given user namespace, anyway)? */ if (!(rv = crm_ipc_is_authentic_process(fd, (uid_t) 0,(gid_t) 0, &found_pid, &found_uid, &found_gid))) { crm_err("CPG provider is not authentic:" " process %lld (uid: %lld, gid: %lld)", (long long) PCMK__SPECIAL_PID_AS_0(found_pid), (long long) found_uid, (long long) found_gid); rc = CS_ERR_ACCESS; goto bail; } else if (rv < 0) { crm_err("Could not verify authenticity of CPG provider: %s (%d)", strerror(-rv), -rv); rc = CS_ERR_ACCESS; goto bail; } id = get_local_nodeid(handle); if (id == 0) { crm_err("Could not get local node id from the CPG API"); goto bail; } cluster->nodeid = id; retries = 0; cs_repeat(rc, retries, 30, cpg_join(handle, &cluster->group)); if (rc != CS_OK) { crm_err("Could not join the CPG group '%s': %d", message_name, rc); goto bail; } pcmk_cpg_handle = handle; cluster->cpg_handle = handle; mainloop_add_fd("corosync-cpg", G_PRIORITY_MEDIUM, fd, cluster, &cpg_fd_callbacks); bail: if (rc != CS_OK) { cpg_finalize(handle); return FALSE; } peer = crm_get_peer(id, NULL); crm_update_peer_proc(__func__, peer, crm_proc_cpg, ONLINESTATUS); return TRUE; } /*! * \internal * \brief Send an XML message via Corosync CPG * * \param[in] msg XML message to send * \param[in] node Cluster node to send message to * \param[in] dest Type of message to send * * \return TRUE on success, otherwise FALSE */ gboolean pcmk__cpg_send_xml(xmlNode *msg, const crm_node_t *node, enum crm_ais_msg_types dest) { gboolean rc = TRUE; char *data = NULL; data = dump_xml_unformatted(msg); rc = send_cluster_text(crm_class_cluster, data, FALSE, node, dest); free(data); return rc; } /*! * \internal * \brief Send string data via Corosync CPG * * \param[in] msg_class Message class (to set as CPG header ID) * \param[in] data Data to send * \param[in] local What to set as host "local" value (which is never used) * \param[in] node Cluster node to send message to * \param[in] dest Type of message to send * * \return TRUE on success, otherwise FALSE */ gboolean send_cluster_text(enum crm_ais_msg_class msg_class, const char *data, gboolean local, const crm_node_t *node, enum crm_ais_msg_types dest) { static int msg_id = 0; static int local_pid = 0; static int local_name_len = 0; static const char *local_name = NULL; char *target = NULL; struct iovec *iov; pcmk__cpg_msg_t *msg = NULL; enum crm_ais_msg_types sender = text2msg_type(crm_system_name); switch (msg_class) { case crm_class_cluster: break; default: crm_err("Invalid message class: %d", msg_class); return FALSE; } CRM_CHECK(dest != crm_msg_ais, return FALSE); if (local_name == NULL) { local_name = get_local_node_name(); } if ((local_name_len == 0) && (local_name != NULL)) { local_name_len = strlen(local_name); } if (data == NULL) { data = ""; } if (local_pid == 0) { local_pid = getpid(); } if (sender == crm_msg_none) { sender = local_pid; } msg = calloc(1, sizeof(pcmk__cpg_msg_t)); msg_id++; msg->id = msg_id; msg->header.id = msg_class; msg->header.error = CS_OK; msg->host.type = dest; msg->host.local = local; if (node) { if (node->uname) { target = strdup(node->uname); msg->host.size = strlen(node->uname); memset(msg->host.uname, 0, MAX_NAME); memcpy(msg->host.uname, node->uname, msg->host.size); } else { target = crm_strdup_printf("%u", node->id); } msg->host.id = node->id; } else { target = strdup("all"); } msg->sender.id = 0; msg->sender.type = sender; msg->sender.pid = local_pid; msg->sender.size = local_name_len; memset(msg->sender.uname, 0, MAX_NAME); if ((local_name != NULL) && (msg->sender.size != 0)) { memcpy(msg->sender.uname, local_name, msg->sender.size); } msg->size = 1 + strlen(data); msg->header.size = sizeof(pcmk__cpg_msg_t) + msg->size; if (msg->size < CRM_BZ2_THRESHOLD) { msg = pcmk__realloc(msg, msg->header.size); memcpy(msg->data, data, msg->size); } else { char *compressed = NULL; unsigned int new_size = 0; char *uncompressed = strdup(data); if (pcmk__compress(uncompressed, (unsigned int) msg->size, 0, &compressed, &new_size) == pcmk_rc_ok) { msg->header.size = sizeof(pcmk__cpg_msg_t) + new_size; msg = pcmk__realloc(msg, msg->header.size); memcpy(msg->data, compressed, new_size); msg->is_compressed = TRUE; msg->compressed_size = new_size; } else { // cppcheck seems not to understand the abort logic in pcmk__realloc // cppcheck-suppress memleak msg = pcmk__realloc(msg, msg->header.size); memcpy(msg->data, data, msg->size); } free(uncompressed); free(compressed); } iov = calloc(1, sizeof(struct iovec)); iov->iov_base = msg; iov->iov_len = msg->header.size; if (msg->compressed_size) { crm_trace("Queueing CPG message %u to %s (%llu bytes, %d bytes compressed payload): %.200s", msg->id, target, (unsigned long long) iov->iov_len, msg->compressed_size, data); } else { crm_trace("Queueing CPG message %u to %s (%llu bytes, %d bytes payload): %.200s", msg->id, target, (unsigned long long) iov->iov_len, msg->size, data); } free(target); cs_message_queue = g_list_append(cs_message_queue, iov); crm_cs_flush(&pcmk_cpg_handle); return TRUE; } /*! * \brief Get the message type equivalent of a string * * \param[in] text String of message type * * \return Message type equivalent of \p text */ enum crm_ais_msg_types text2msg_type(const char *text) { int type = crm_msg_none; CRM_CHECK(text != NULL, return type); text = pcmk__message_name(text); if (pcmk__str_eq(text, "ais", pcmk__str_casei)) { type = crm_msg_ais; } else if (pcmk__str_eq(text, CRM_SYSTEM_CIB, pcmk__str_casei)) { type = crm_msg_cib; } else if (pcmk__strcase_any_of(text, CRM_SYSTEM_CRMD, CRM_SYSTEM_DC, NULL)) { type = crm_msg_crmd; } else if (pcmk__str_eq(text, CRM_SYSTEM_TENGINE, pcmk__str_casei)) { type = crm_msg_te; } else if (pcmk__str_eq(text, CRM_SYSTEM_PENGINE, pcmk__str_casei)) { type = crm_msg_pe; } else if (pcmk__str_eq(text, CRM_SYSTEM_LRMD, pcmk__str_casei)) { type = crm_msg_lrmd; } else if (pcmk__str_eq(text, CRM_SYSTEM_STONITHD, pcmk__str_casei)) { type = crm_msg_stonithd; } else if (pcmk__str_eq(text, "stonith-ng", pcmk__str_casei)) { type = crm_msg_stonith_ng; } else if (pcmk__str_eq(text, "attrd", pcmk__str_casei)) { type = crm_msg_attrd; } else { /* This will normally be a transient client rather than * a cluster daemon. Set the type to the pid of the client */ int scan_rc = sscanf(text, "%d", &type); if (scan_rc != 1 || type <= crm_msg_stonith_ng) { /* Ensure it's sane */ type = crm_msg_none; } } return type; } diff --git a/lib/cluster/membership.c b/lib/cluster/membership.c index 0c54f1938f..ce612c8398 100644 --- a/lib/cluster/membership.c +++ b/lib/cluster/membership.c @@ -1,1301 +1,1365 @@ /* * Copyright 2004-2023 the Pacemaker project contributors * * The version control history for this file may have further details. * * This source code is licensed under the GNU Lesser General Public License * version 2.1 or later (LGPLv2.1+) WITHOUT ANY WARRANTY. */ #include #ifndef _GNU_SOURCE # define _GNU_SOURCE #endif #include #include #include #include #include #include #include #include #include #include #include #include "crmcluster_private.h" /* The peer cache remembers cluster nodes that have been seen. * This is managed mostly automatically by libcluster, based on * cluster membership events. * * Because cluster nodes can have conflicting names or UUIDs, * the hash table key is a uniquely generated ID. */ GHashTable *crm_peer_cache = NULL; /* * The remote peer cache tracks pacemaker_remote nodes. While the * value has the same type as the peer cache's, it is tracked separately for * three reasons: pacemaker_remote nodes can't have conflicting names or UUIDs, * so the name (which is also the UUID) is used as the hash table key; there * is no equivalent of membership events, so management is not automatic; and * most users of the peer cache need to exclude pacemaker_remote nodes. * * That said, using a single cache would be more logical and less error-prone, * so it would be a good idea to merge them one day. * * libcluster provides two avenues for populating the cache: * crm_remote_peer_get() and crm_remote_peer_cache_remove() directly manage it, * while crm_remote_peer_cache_refresh() populates it via the CIB. */ GHashTable *crm_remote_peer_cache = NULL; /* * The known node cache tracks cluster and remote nodes that have been seen in * the CIB. It is useful mainly when a caller needs to know about a node that * may no longer be in the membership, but doesn't want to add the node to the * main peer cache tables. */ static GHashTable *known_node_cache = NULL; unsigned long long crm_peer_seq = 0; gboolean crm_have_quorum = FALSE; static gboolean crm_autoreap = TRUE; // Flag setting and clearing for crm_node_t:flags #define set_peer_flags(peer, flags_to_set) do { \ (peer)->flags = pcmk__set_flags_as(__func__, __LINE__, LOG_TRACE, \ "Peer", (peer)->uname, \ (peer)->flags, (flags_to_set), \ #flags_to_set); \ } while (0) #define clear_peer_flags(peer, flags_to_clear) do { \ (peer)->flags = pcmk__clear_flags_as(__func__, __LINE__, \ LOG_TRACE, \ "Peer", (peer)->uname, \ (peer)->flags, (flags_to_clear), \ #flags_to_clear); \ } while (0) static void update_peer_uname(crm_node_t *node, const char *uname); int crm_remote_peer_cache_size(void) { if (crm_remote_peer_cache == NULL) { return 0; } return g_hash_table_size(crm_remote_peer_cache); } /*! * \brief Get a remote node peer cache entry, creating it if necessary * * \param[in] node_name Name of remote node * * \return Cache entry for node on success, NULL (and set errno) otherwise * * \note When creating a new entry, this will leave the node state undetermined, * so the caller should also call pcmk__update_peer_state() if the state * is known. */ crm_node_t * crm_remote_peer_get(const char *node_name) { crm_node_t *node; if (node_name == NULL) { errno = -EINVAL; return NULL; } /* Return existing cache entry if one exists */ node = g_hash_table_lookup(crm_remote_peer_cache, node_name); if (node) { return node; } /* Allocate a new entry */ node = calloc(1, sizeof(crm_node_t)); if (node == NULL) { return NULL; } /* Populate the essential information */ set_peer_flags(node, crm_remote_node); node->uuid = strdup(node_name); if (node->uuid == NULL) { free(node); errno = -ENOMEM; return NULL; } /* Add the new entry to the cache */ g_hash_table_replace(crm_remote_peer_cache, node->uuid, node); crm_trace("added %s to remote cache", node_name); /* Update the entry's uname, ensuring peer status callbacks are called */ update_peer_uname(node, node_name); return node; } void crm_remote_peer_cache_remove(const char *node_name) { if (g_hash_table_remove(crm_remote_peer_cache, node_name)) { crm_trace("removed %s from remote peer cache", node_name); } } /*! * \internal * \brief Return node status based on a CIB status entry * * \param[in] node_state XML of node state * * \return CRM_NODE_LOST if XML_NODE_IN_CLUSTER is false in node_state, * CRM_NODE_MEMBER otherwise * \note Unlike most boolean XML attributes, this one defaults to true, for * backward compatibility with older controllers that don't set it. */ static const char * remote_state_from_cib(const xmlNode *node_state) { bool status = false; if (pcmk__xe_get_bool_attr(node_state, XML_NODE_IN_CLUSTER, &status) == pcmk_rc_ok && !status) { return CRM_NODE_LOST; } else { return CRM_NODE_MEMBER; } } /* user data for looping through remote node xpath searches */ struct refresh_data { const char *field; /* XML attribute to check for node name */ gboolean has_state; /* whether to update node state based on XML */ }; /*! * \internal * \brief Process one pacemaker_remote node xpath search result * * \param[in] result XML search result * \param[in] user_data what to look for in the XML */ static void remote_cache_refresh_helper(xmlNode *result, void *user_data) { const struct refresh_data *data = user_data; const char *remote = crm_element_value(result, data->field); const char *state = NULL; crm_node_t *node; CRM_CHECK(remote != NULL, return); /* Determine node's state, if the result has it */ if (data->has_state) { state = remote_state_from_cib(result); } /* Check whether cache already has entry for node */ node = g_hash_table_lookup(crm_remote_peer_cache, remote); if (node == NULL) { /* Node is not in cache, so add a new entry for it */ node = crm_remote_peer_get(remote); CRM_ASSERT(node); if (state) { pcmk__update_peer_state(__func__, node, state, 0); } } else if (pcmk_is_set(node->flags, crm_node_dirty)) { /* Node is in cache and hasn't been updated already, so mark it clean */ clear_peer_flags(node, crm_node_dirty); if (state) { pcmk__update_peer_state(__func__, node, state, 0); } } } static void mark_dirty(gpointer key, gpointer value, gpointer user_data) { set_peer_flags((crm_node_t *) value, crm_node_dirty); } static gboolean is_dirty(gpointer key, gpointer value, gpointer user_data) { return pcmk_is_set(((crm_node_t*)value)->flags, crm_node_dirty); } /*! * \brief Repopulate the remote peer cache based on CIB XML * * \param[in] xmlNode CIB XML to parse */ void crm_remote_peer_cache_refresh(xmlNode *cib) { struct refresh_data data; crm_peer_init(); /* First, we mark all existing cache entries as dirty, * so that later we can remove any that weren't in the CIB. * We don't empty the cache, because we need to detect changes in state. */ g_hash_table_foreach(crm_remote_peer_cache, mark_dirty, NULL); /* Look for guest nodes and remote nodes in the status section */ data.field = "id"; data.has_state = TRUE; crm_foreach_xpath_result(cib, PCMK__XP_REMOTE_NODE_STATUS, remote_cache_refresh_helper, &data); /* Look for guest nodes and remote nodes in the configuration section, * because they may have just been added and not have a status entry yet. * In that case, the cached node state will be left NULL, so that the * peer status callback isn't called until we're sure the node started * successfully. */ data.field = "value"; data.has_state = FALSE; crm_foreach_xpath_result(cib, PCMK__XP_GUEST_NODE_CONFIG, remote_cache_refresh_helper, &data); data.field = "id"; data.has_state = FALSE; crm_foreach_xpath_result(cib, PCMK__XP_REMOTE_NODE_CONFIG, remote_cache_refresh_helper, &data); /* Remove all old cache entries that weren't seen in the CIB */ g_hash_table_foreach_remove(crm_remote_peer_cache, is_dirty, NULL); } gboolean crm_is_peer_active(const crm_node_t * node) { if(node == NULL) { return FALSE; } if (pcmk_is_set(node->flags, crm_remote_node)) { /* remote nodes are never considered active members. This * guarantees they will never be considered for DC membership.*/ return FALSE; } #if SUPPORT_COROSYNC if (is_corosync_cluster()) { return crm_is_corosync_peer_active(node); } #endif crm_err("Unhandled cluster type: %s", name_for_cluster_type(get_cluster_type())); return FALSE; } static gboolean crm_reap_dead_member(gpointer key, gpointer value, gpointer user_data) { crm_node_t *node = value; crm_node_t *search = user_data; if (search == NULL) { return FALSE; } else if (search->id && node->id != search->id) { return FALSE; } else if (search->id == 0 && !pcmk__str_eq(node->uname, search->uname, pcmk__str_casei)) { return FALSE; } else if (crm_is_peer_active(value) == FALSE) { crm_info("Removing node with name %s and id %u from membership cache", (node->uname? node->uname : "unknown"), node->id); return TRUE; } return FALSE; } /*! * \brief Remove all peer cache entries matching a node ID and/or uname * * \param[in] id ID of node to remove (or 0 to ignore) * \param[in] name Uname of node to remove (or NULL to ignore) * * \return Number of cache entries removed */ guint reap_crm_member(uint32_t id, const char *name) { int matches = 0; crm_node_t search = { 0, }; if (crm_peer_cache == NULL) { crm_trace("Membership cache not initialized, ignoring purge request"); return 0; } search.id = id; pcmk__str_update(&search.uname, name); matches = g_hash_table_foreach_remove(crm_peer_cache, crm_reap_dead_member, &search); if(matches) { crm_notice("Purged %d peer%s with id=%u%s%s from the membership cache", matches, pcmk__plural_s(matches), search.id, (search.uname? " and/or uname=" : ""), (search.uname? search.uname : "")); } else { crm_info("No peers with id=%u%s%s to purge from the membership cache", search.id, (search.uname? " and/or uname=" : ""), (search.uname? search.uname : "")); } free(search.uname); return matches; } static void count_peer(gpointer key, gpointer value, gpointer user_data) { guint *count = user_data; crm_node_t *node = value; if (crm_is_peer_active(node)) { *count = *count + 1; } } guint crm_active_peers(void) { guint count = 0; if (crm_peer_cache) { g_hash_table_foreach(crm_peer_cache, count_peer, &count); } return count; } static void destroy_crm_node(gpointer data) { crm_node_t *node = data; crm_trace("Destroying entry for node %u: %s", node->id, node->uname); free(node->uname); free(node->state); free(node->uuid); free(node->expected); free(node->conn_host); free(node); } void crm_peer_init(void) { if (crm_peer_cache == NULL) { crm_peer_cache = pcmk__strikey_table(free, destroy_crm_node); } if (crm_remote_peer_cache == NULL) { crm_remote_peer_cache = pcmk__strikey_table(NULL, destroy_crm_node); } if (known_node_cache == NULL) { known_node_cache = pcmk__strikey_table(free, destroy_crm_node); } } void crm_peer_destroy(void) { if (crm_peer_cache != NULL) { crm_trace("Destroying peer cache with %d members", g_hash_table_size(crm_peer_cache)); g_hash_table_destroy(crm_peer_cache); crm_peer_cache = NULL; } if (crm_remote_peer_cache != NULL) { crm_trace("Destroying remote peer cache with %d members", g_hash_table_size(crm_remote_peer_cache)); g_hash_table_destroy(crm_remote_peer_cache); crm_remote_peer_cache = NULL; } if (known_node_cache != NULL) { crm_trace("Destroying known node cache with %d members", g_hash_table_size(known_node_cache)); g_hash_table_destroy(known_node_cache); known_node_cache = NULL; } } static void (*peer_status_callback)(enum crm_status_type, crm_node_t *, const void *) = NULL; /*! * \brief Set a client function that will be called after peer status changes * * \param[in] dispatch Pointer to function to use as callback * * \note Previously, client callbacks were responsible for peer cache * management. This is no longer the case, and client callbacks should do * only client-specific handling. Callbacks MUST NOT add or remove entries * in the peer caches. */ void crm_set_status_callback(void (*dispatch) (enum crm_status_type, crm_node_t *, const void *)) { peer_status_callback = dispatch; } /*! * \brief Tell the library whether to automatically reap lost nodes * * If TRUE (the default), calling crm_update_peer_proc() will also update the * peer state to CRM_NODE_MEMBER or CRM_NODE_LOST, and pcmk__update_peer_state() * will reap peers whose state changes to anything other than CRM_NODE_MEMBER. * Callers should leave this enabled unless they plan to manage the cache * separately on their own. * * \param[in] autoreap TRUE to enable automatic reaping, FALSE to disable */ void crm_set_autoreap(gboolean autoreap) { crm_autoreap = autoreap; } static void dump_peer_hash(int level, const char *caller) { GHashTableIter iter; const char *id = NULL; crm_node_t *node = NULL; g_hash_table_iter_init(&iter, crm_peer_cache); while (g_hash_table_iter_next(&iter, (gpointer *) &id, (gpointer *) &node)) { do_crm_log(level, "%s: Node %u/%s = %p - %s", caller, node->id, node->uname, node, id); } } static gboolean hash_find_by_data(gpointer key, gpointer value, gpointer user_data) { return value == user_data; } /*! * \internal * \brief Search caches for a node (cluster or Pacemaker Remote) * * \param[in] id If not 0, cluster node ID to search for * \param[in] uname If not NULL, node name to search for * \param[in] flags Bitmask of enum crm_get_peer_flags * * \return Node cache entry if found, otherwise NULL */ crm_node_t * pcmk__search_node_caches(unsigned int id, const char *uname, uint32_t flags) { crm_node_t *node = NULL; CRM_ASSERT(id > 0 || uname != NULL); crm_peer_init(); if ((uname != NULL) && pcmk_is_set(flags, CRM_GET_PEER_REMOTE)) { node = g_hash_table_lookup(crm_remote_peer_cache, uname); } if ((node == NULL) && pcmk_is_set(flags, CRM_GET_PEER_CLUSTER)) { - node = pcmk__search_cluster_node_cache(id, uname); + node = pcmk__search_cluster_node_cache(id, uname, NULL); } return node; } /*! * \brief Get a node cache entry (cluster or Pacemaker Remote) * * \param[in] id If not 0, cluster node ID to search for * \param[in] uname If not NULL, node name to search for + * \param[in] uuid If not NULL while id is 0, node UUID instead of cluster + * node ID to search for * \param[in] flags Bitmask of enum crm_get_peer_flags * * \return (Possibly newly created) node cache entry */ crm_node_t * -crm_get_peer_full(unsigned int id, const char *uname, int flags) +pcmk__get_peer_full(unsigned int id, const char *uname, const char *uuid, + int flags) { crm_node_t *node = NULL; CRM_ASSERT(id > 0 || uname != NULL); crm_peer_init(); if (pcmk_is_set(flags, CRM_GET_PEER_REMOTE)) { node = g_hash_table_lookup(crm_remote_peer_cache, uname); } if ((node == NULL) && pcmk_is_set(flags, CRM_GET_PEER_CLUSTER)) { - node = crm_get_peer(id, uname); + node = pcmk__get_peer(id, uname, uuid); } return node; } +/*! + * \brief Get a node cache entry (cluster or Pacemaker Remote) + * + * \param[in] id If not 0, cluster node ID to search for + * \param[in] uname If not NULL, node name to search for + * \param[in] flags Bitmask of enum crm_get_peer_flags + * + * \return (Possibly newly created) node cache entry + */ +crm_node_t * +crm_get_peer_full(unsigned int id, const char *uname, int flags) +{ + return pcmk__get_peer_full(id, uname, NULL, flags); +} + /*! * \internal * \brief Search cluster node cache * * \param[in] id If not 0, cluster node ID to search for * \param[in] uname If not NULL, node name to search for + * \param[in] uuid If not NULL while id is 0, node UUID instead of cluster + * node ID to search for * * \return Cluster node cache entry if found, otherwise NULL */ crm_node_t * -pcmk__search_cluster_node_cache(unsigned int id, const char *uname) +pcmk__search_cluster_node_cache(unsigned int id, const char *uname, + const char *uuid) { GHashTableIter iter; crm_node_t *node = NULL; crm_node_t *by_id = NULL; crm_node_t *by_name = NULL; CRM_ASSERT(id > 0 || uname != NULL); crm_peer_init(); if (uname != NULL) { g_hash_table_iter_init(&iter, crm_peer_cache); while (g_hash_table_iter_next(&iter, NULL, (gpointer *) &node)) { if(node->uname && strcasecmp(node->uname, uname) == 0) { crm_trace("Name match: %s = %p", node->uname, node); by_name = node; break; } } } if (id > 0) { g_hash_table_iter_init(&iter, crm_peer_cache); while (g_hash_table_iter_next(&iter, NULL, (gpointer *) &node)) { if(node->id == id) { crm_trace("ID match: %u = %p", node->id, node); by_id = node; break; } } + + } else if (uuid != NULL) { + g_hash_table_iter_init(&iter, crm_peer_cache); + while (g_hash_table_iter_next(&iter, NULL, (gpointer *) &node)) { + if (pcmk__str_eq(node->uuid, uuid, pcmk__str_casei)) { + crm_trace("UUID match: %s = %p", node->uuid, node); + by_id = node; + break; + } + } } node = by_id; /* Good default */ if(by_id == by_name) { /* Nothing to do if they match (both NULL counts) */ crm_trace("Consistent: %p for %u/%s", by_id, id, uname); } else if(by_id == NULL && by_name) { crm_trace("Only one: %p for %u/%s", by_name, id, uname); if(id && by_name->id) { dump_peer_hash(LOG_WARNING, __func__); crm_crit("Node %u and %u share the same name '%s'", id, by_name->id, uname); node = NULL; /* Create a new one */ } else { node = by_name; } } else if(by_name == NULL && by_id) { crm_trace("Only one: %p for %u/%s", by_id, id, uname); if(uname && by_id->uname) { dump_peer_hash(LOG_WARNING, __func__); crm_crit("Node '%s' and '%s' share the same cluster nodeid %u: assuming '%s' is correct", uname, by_id->uname, id, uname); } } else if(uname && by_id->uname) { if(pcmk__str_eq(uname, by_id->uname, pcmk__str_casei)) { crm_notice("Node '%s' has changed its ID from %u to %u", by_id->uname, by_name->id, by_id->id); g_hash_table_foreach_remove(crm_peer_cache, hash_find_by_data, by_name); } else { crm_warn("Node '%s' and '%s' share the same cluster nodeid: %u %s", by_id->uname, by_name->uname, id, uname); dump_peer_hash(LOG_INFO, __func__); crm_abort(__FILE__, __func__, __LINE__, "member weirdness", TRUE, TRUE); } } else if(id && by_name->id) { crm_warn("Node %u and %u share the same name: '%s'", by_id->id, by_name->id, uname); } else { /* Simple merge */ /* Only corosync-based clusters use node IDs. The functions that call * pcmk__update_peer_state() and crm_update_peer_proc() only know * nodeid, so 'by_id' is authoritative when merging. */ dump_peer_hash(LOG_DEBUG, __func__); crm_info("Merging %p into %p", by_name, by_id); g_hash_table_foreach_remove(crm_peer_cache, hash_find_by_data, by_name); } return node; } #if SUPPORT_COROSYNC static guint remove_conflicting_peer(crm_node_t *node) { int matches = 0; GHashTableIter iter; crm_node_t *existing_node = NULL; if (node->id == 0 || node->uname == NULL) { return 0; } if (!pcmk__corosync_has_nodelist()) { return 0; } g_hash_table_iter_init(&iter, crm_peer_cache); while (g_hash_table_iter_next(&iter, NULL, (gpointer *) &existing_node)) { if (existing_node->id > 0 && existing_node->id != node->id && existing_node->uname != NULL && strcasecmp(existing_node->uname, node->uname) == 0) { if (crm_is_peer_active(existing_node)) { continue; } crm_warn("Removing cached offline node %u/%s which has conflicting uname with %u", existing_node->id, existing_node->uname, node->id); g_hash_table_iter_remove(&iter); matches++; } } return matches; } #endif /*! * \brief Get a cluster node cache entry * * \param[in] id If not 0, cluster node ID to search for * \param[in] uname If not NULL, node name to search for + * \param[in] uuid If not NULL while id is 0, node UUID instead of cluster + * node ID to search for * * \return (Possibly newly created) cluster node cache entry */ /* coverity[-alloc] Memory is referenced in one or both hashtables */ crm_node_t * -crm_get_peer(unsigned int id, const char *uname) +pcmk__get_peer(unsigned int id, const char *uname, const char *uuid) { crm_node_t *node = NULL; char *uname_lookup = NULL; CRM_ASSERT(id > 0 || uname != NULL); crm_peer_init(); - node = pcmk__search_cluster_node_cache(id, uname); + node = pcmk__search_cluster_node_cache(id, uname, uuid); /* if uname wasn't provided, and find_peer did not turn up a uname based on id. * we need to do a lookup of the node name using the id in the cluster membership. */ if ((node == NULL || node->uname == NULL) && (uname == NULL)) { uname_lookup = get_node_name(id); } if (uname_lookup) { uname = uname_lookup; crm_trace("Inferred a name of '%s' for node %u", uname, id); /* try to turn up the node one more time now that we know the uname. */ if (node == NULL) { - node = pcmk__search_cluster_node_cache(id, uname); + node = pcmk__search_cluster_node_cache(id, uname, uuid); } } if (node == NULL) { char *uniqueid = crm_generate_uuid(); node = calloc(1, sizeof(crm_node_t)); CRM_ASSERT(node); crm_info("Created entry %s/%p for node %s/%u (%d total)", uniqueid, node, uname, id, 1 + g_hash_table_size(crm_peer_cache)); g_hash_table_replace(crm_peer_cache, uniqueid, node); } if(id > 0 && uname && (node->id == 0 || node->uname == NULL)) { crm_info("Node %u is now known as %s", id, uname); } if(id > 0 && node->id == 0) { node->id = id; } if (uname && (node->uname == NULL)) { update_peer_uname(node, uname); } if(node->uuid == NULL) { - const char *uuid = crm_peer_uuid(node); + if (uuid == NULL) { + uuid = crm_peer_uuid(node); + } if (uuid) { crm_info("Node %u has uuid %s", id, uuid); } else { crm_info("Cannot obtain a UUID for node %u/%s", id, node->uname); } } free(uname_lookup); return node; } +/*! + * \brief Get a cluster node cache entry + * + * \param[in] id If not 0, cluster node ID to search for + * \param[in] uname If not NULL, node name to search for + * + * \return (Possibly newly created) cluster node cache entry + */ +/* coverity[-alloc] Memory is referenced in one or both hashtables */ +crm_node_t * +crm_get_peer(unsigned int id, const char *uname) +{ + return pcmk__get_peer(id, uname, NULL); +} + /*! * \internal * \brief Update a node's uname * * \param[in,out] node Node object to update * \param[in] uname New name to set * * \note This function should not be called within a peer cache iteration, * because in some cases it can remove conflicting cache entries, * which would invalidate the iterator. */ static void update_peer_uname(crm_node_t *node, const char *uname) { CRM_CHECK(uname != NULL, crm_err("Bug: can't update node name without name"); return); CRM_CHECK(node != NULL, crm_err("Bug: can't update node name to %s without node", uname); return); if (pcmk__str_eq(uname, node->uname, pcmk__str_casei)) { crm_debug("Node uname '%s' did not change", uname); return; } for (const char *c = uname; *c; ++c) { if ((*c >= 'A') && (*c <= 'Z')) { crm_warn("Node names with capitals are discouraged, consider changing '%s'", uname); break; } } pcmk__str_update(&node->uname, uname); if (peer_status_callback != NULL) { peer_status_callback(crm_status_uname, node, NULL); } #if SUPPORT_COROSYNC if (is_corosync_cluster() && !pcmk_is_set(node->flags, crm_remote_node)) { remove_conflicting_peer(node); } #endif } /*! * \internal * \brief Get log-friendly string equivalent of a process flag * * \param[in] proc Process flag * * \return Log-friendly string equivalent of \p proc */ static inline const char * proc2text(enum crm_proc_flag proc) { const char *text = "unknown"; switch (proc) { case crm_proc_none: text = "none"; break; case crm_proc_based: text = "pacemaker-based"; break; case crm_proc_controld: text = "pacemaker-controld"; break; case crm_proc_schedulerd: text = "pacemaker-schedulerd"; break; case crm_proc_execd: text = "pacemaker-execd"; break; case crm_proc_attrd: text = "pacemaker-attrd"; break; case crm_proc_fenced: text = "pacemaker-fenced"; break; case crm_proc_cpg: text = "corosync-cpg"; break; } return text; } /*! * \internal * \brief Update a node's process information (and potentially state) * * \param[in] source Caller's function name (for log messages) * \param[in,out] node Node object to update * \param[in] flag Bitmask of new process information * \param[in] status node status (online, offline, etc.) * * \return NULL if any node was reaped from peer caches, value of node otherwise * * \note If this function returns NULL, the supplied node object was likely * freed and should not be used again. This function should not be * called within a cache iteration if reaping is possible, otherwise * reaping could invalidate the iterator. */ crm_node_t * crm_update_peer_proc(const char *source, crm_node_t * node, uint32_t flag, const char *status) { uint32_t last = 0; gboolean changed = FALSE; CRM_CHECK(node != NULL, crm_err("%s: Could not set %s to %s for NULL", source, proc2text(flag), status); return NULL); /* Pacemaker doesn't spawn processes on remote nodes */ if (pcmk_is_set(node->flags, crm_remote_node)) { return node; } last = node->processes; if (status == NULL) { node->processes = flag; if (node->processes != last) { changed = TRUE; } } else if (pcmk__str_eq(status, ONLINESTATUS, pcmk__str_casei)) { if ((node->processes & flag) != flag) { node->processes = pcmk__set_flags_as(__func__, __LINE__, LOG_TRACE, "Peer process", node->uname, node->processes, flag, "processes"); changed = TRUE; } } else if (node->processes & flag) { node->processes = pcmk__clear_flags_as(__func__, __LINE__, LOG_TRACE, "Peer process", node->uname, node->processes, flag, "processes"); changed = TRUE; } if (changed) { if (status == NULL && flag <= crm_proc_none) { crm_info("%s: Node %s[%u] - all processes are now offline", source, node->uname, node->id); } else { crm_info("%s: Node %s[%u] - %s is now %s", source, node->uname, node->id, proc2text(flag), status); } + if (pcmk_is_set(node->processes, crm_get_cluster_proc())) { + node->when_online = time(NULL); + + } else { + node->when_online = 0; + } + /* Call the client callback first, then update the peer state, * in case the node will be reaped */ if (peer_status_callback != NULL) { peer_status_callback(crm_status_processes, node, &last); } /* The client callback shouldn't touch the peer caches, * but as a safety net, bail if the peer cache was destroyed. */ if (crm_peer_cache == NULL) { return NULL; } if (crm_autoreap) { const char *peer_state = NULL; if (pcmk_is_set(node->processes, crm_get_cluster_proc())) { peer_state = CRM_NODE_MEMBER; } else { peer_state = CRM_NODE_LOST; } node = pcmk__update_peer_state(__func__, node, peer_state, 0); } } else { crm_trace("%s: Node %s[%u] - %s is unchanged (%s)", source, node->uname, node->id, proc2text(flag), status); } return node; } /*! * \internal * \brief Update a cluster node cache entry's expected join state * * \param[in] source Caller's function name (for logging) * \param[in,out] node Node to update * \param[in] expected Node's new join state */ void pcmk__update_peer_expected(const char *source, crm_node_t *node, const char *expected) { char *last = NULL; gboolean changed = FALSE; CRM_CHECK(node != NULL, crm_err("%s: Could not set 'expected' to %s", source, expected); return); /* Remote nodes don't participate in joins */ if (pcmk_is_set(node->flags, crm_remote_node)) { return; } last = node->expected; if (expected != NULL && !pcmk__str_eq(node->expected, expected, pcmk__str_casei)) { node->expected = strdup(expected); changed = TRUE; } if (changed) { crm_info("%s: Node %s[%u] - expected state is now %s (was %s)", source, node->uname, node->id, expected, last); free(last); } else { crm_trace("%s: Node %s[%u] - expected state is unchanged (%s)", source, node->uname, node->id, expected); } } /*! * \internal * \brief Update a node's state and membership information * * \param[in] source Caller's function name (for log messages) * \param[in,out] node Node object to update * \param[in] state Node's new state * \param[in] membership Node's new membership ID * \param[in,out] iter If not NULL, pointer to node's peer cache iterator * * \return NULL if any node was reaped, value of node otherwise * * \note If this function returns NULL, the supplied node object was likely * freed and should not be used again. This function may be called from * within a peer cache iteration if the iterator is supplied. */ static crm_node_t * update_peer_state_iter(const char *source, crm_node_t *node, const char *state, uint64_t membership, GHashTableIter *iter) { gboolean is_member; CRM_CHECK(node != NULL, crm_err("Could not set state for unknown host to %s" CRM_XS " source=%s", state, source); return NULL); is_member = pcmk__str_eq(state, CRM_NODE_MEMBER, pcmk__str_casei); if (is_member) { node->when_lost = 0; if (membership) { node->last_seen = membership; } } if (state && !pcmk__str_eq(node->state, state, pcmk__str_casei)) { char *last = node->state; + if (is_member) { + node->when_member = time(NULL); + + } else { + node->when_member = 0; + } + node->state = strdup(state); crm_notice("Node %s state is now %s " CRM_XS " nodeid=%u previous=%s source=%s", node->uname, state, node->id, (last? last : "unknown"), source); if (peer_status_callback != NULL) { peer_status_callback(crm_status_nstate, node, last); } free(last); if (crm_autoreap && !is_member && !pcmk_is_set(node->flags, crm_remote_node)) { /* We only autoreap from the peer cache, not the remote peer cache, * because the latter should be managed only by * crm_remote_peer_cache_refresh(). */ if(iter) { crm_notice("Purged 1 peer with id=%u and/or uname=%s from the membership cache", node->id, node->uname); g_hash_table_iter_remove(iter); } else { reap_crm_member(node->id, node->uname); } node = NULL; } } else { crm_trace("Node %s state is unchanged (%s) " CRM_XS " nodeid=%u source=%s", node->uname, state, node->id, source); } return node; } /*! * \brief Update a node's state and membership information * * \param[in] source Caller's function name (for log messages) * \param[in,out] node Node object to update * \param[in] state Node's new state * \param[in] membership Node's new membership ID * * \return NULL if any node was reaped, value of node otherwise * * \note If this function returns NULL, the supplied node object was likely * freed and should not be used again. This function should not be * called within a cache iteration if reaping is possible, * otherwise reaping could invalidate the iterator. */ crm_node_t * pcmk__update_peer_state(const char *source, crm_node_t *node, const char *state, uint64_t membership) { return update_peer_state_iter(source, node, state, membership, NULL); } /*! * \internal * \brief Reap all nodes from cache whose membership information does not match * * \param[in] membership Membership ID of nodes to keep */ void pcmk__reap_unseen_nodes(uint64_t membership) { GHashTableIter iter; crm_node_t *node = NULL; crm_trace("Reaping unseen nodes..."); g_hash_table_iter_init(&iter, crm_peer_cache); while (g_hash_table_iter_next(&iter, NULL, (gpointer *)&node)) { if (node->last_seen != membership) { if (node->state) { /* * Calling update_peer_state_iter() allows us to * remove the node from crm_peer_cache without * invalidating our iterator */ update_peer_state_iter(__func__, node, CRM_NODE_LOST, membership, &iter); } else { crm_info("State of node %s[%u] is still unknown", node->uname, node->id); } } } } static crm_node_t * find_known_node(const char *id, const char *uname) { GHashTableIter iter; crm_node_t *node = NULL; crm_node_t *by_id = NULL; crm_node_t *by_name = NULL; if (uname) { g_hash_table_iter_init(&iter, known_node_cache); while (g_hash_table_iter_next(&iter, NULL, (gpointer *) &node)) { if (node->uname && strcasecmp(node->uname, uname) == 0) { crm_trace("Name match: %s = %p", node->uname, node); by_name = node; break; } } } if (id) { g_hash_table_iter_init(&iter, known_node_cache); while (g_hash_table_iter_next(&iter, NULL, (gpointer *) &node)) { if(strcasecmp(node->uuid, id) == 0) { crm_trace("ID match: %s= %p", id, node); by_id = node; break; } } } node = by_id; /* Good default */ if (by_id == by_name) { /* Nothing to do if they match (both NULL counts) */ crm_trace("Consistent: %p for %s/%s", by_id, id, uname); } else if (by_id == NULL && by_name) { crm_trace("Only one: %p for %s/%s", by_name, id, uname); if (id) { node = NULL; } else { node = by_name; } } else if (by_name == NULL && by_id) { crm_trace("Only one: %p for %s/%s", by_id, id, uname); if (uname) { node = NULL; } } else if (uname && by_id->uname && pcmk__str_eq(uname, by_id->uname, pcmk__str_casei)) { /* Multiple nodes have the same uname in the CIB. * Return by_id. */ } else if (id && by_name->uuid && pcmk__str_eq(id, by_name->uuid, pcmk__str_casei)) { /* Multiple nodes have the same id in the CIB. * Return by_name. */ node = by_name; } else { node = NULL; } if (node == NULL) { crm_debug("Couldn't find node%s%s%s%s", id? " " : "", id? id : "", uname? " with name " : "", uname? uname : ""); } return node; } static void known_node_cache_refresh_helper(xmlNode *xml_node, void *user_data) { const char *id = crm_element_value(xml_node, XML_ATTR_ID); const char *uname = crm_element_value(xml_node, XML_ATTR_UNAME); crm_node_t * node = NULL; CRM_CHECK(id != NULL && uname !=NULL, return); node = find_known_node(id, uname); if (node == NULL) { char *uniqueid = crm_generate_uuid(); node = calloc(1, sizeof(crm_node_t)); CRM_ASSERT(node != NULL); node->uname = strdup(uname); CRM_ASSERT(node->uname != NULL); node->uuid = strdup(id); CRM_ASSERT(node->uuid != NULL); g_hash_table_replace(known_node_cache, uniqueid, node); } else if (pcmk_is_set(node->flags, crm_node_dirty)) { pcmk__str_update(&node->uname, uname); /* Node is in cache and hasn't been updated already, so mark it clean */ clear_peer_flags(node, crm_node_dirty); } } static void refresh_known_node_cache(xmlNode *cib) { crm_peer_init(); g_hash_table_foreach(known_node_cache, mark_dirty, NULL); crm_foreach_xpath_result(cib, PCMK__XP_MEMBER_NODE_CONFIG, known_node_cache_refresh_helper, NULL); /* Remove all old cache entries that weren't seen in the CIB */ g_hash_table_foreach_remove(known_node_cache, is_dirty, NULL); } void pcmk__refresh_node_caches_from_cib(xmlNode *cib) { crm_remote_peer_cache_refresh(cib); refresh_known_node_cache(cib); } /*! * \internal * \brief Search known node cache * * \param[in] id If not 0, cluster node ID to search for * \param[in] uname If not NULL, node name to search for * \param[in] flags Bitmask of enum crm_get_peer_flags * * \return Known node cache entry if found, otherwise NULL */ crm_node_t * pcmk__search_known_node_cache(unsigned int id, const char *uname, uint32_t flags) { crm_node_t *node = NULL; char *id_str = NULL; CRM_ASSERT(id > 0 || uname != NULL); node = pcmk__search_node_caches(id, uname, flags); if (node || !(flags & CRM_GET_PEER_CLUSTER)) { return node; } if (id > 0) { id_str = crm_strdup_printf("%u", id); } node = find_known_node(id_str, uname); free(id_str); return node; } // Deprecated functions kept only for backward API compatibility // LCOV_EXCL_START #include int crm_terminate_member(int nodeid, const char *uname, void *unused) { return stonith_api_kick(nodeid, uname, 120, TRUE); } int crm_terminate_member_no_mainloop(int nodeid, const char *uname, int *connection) { return stonith_api_kick(nodeid, uname, 120, TRUE); } // LCOV_EXCL_STOP // End deprecated API diff --git a/lib/pengine/common.c b/lib/pengine/common.c index 6c69bfcb41..a68219c8c8 100644 --- a/lib/pengine/common.c +++ b/lib/pengine/common.c @@ -1,564 +1,573 @@ /* * Copyright 2004-2022 the Pacemaker project contributors * * The version control history for this file may have further details. * * This source code is licensed under the GNU Lesser General Public License * version 2.1 or later (LGPLv2.1+) WITHOUT ANY WARRANTY. */ #include #include #include #include #include #include #include gboolean was_processing_error = FALSE; gboolean was_processing_warning = FALSE; static bool check_placement_strategy(const char *value) { return pcmk__strcase_any_of(value, "default", "utilization", "minimal", "balanced", NULL); } static pcmk__cluster_option_t pe_opts[] = { /* name, old name, type, allowed values, * default value, validator, * short description, * long description */ { "no-quorum-policy", NULL, "select", "stop, freeze, ignore, demote, suicide", "stop", pcmk__valid_quorum, N_("What to do when the cluster does not have quorum"), NULL }, { "symmetric-cluster", NULL, "boolean", NULL, "true", pcmk__valid_boolean, N_("Whether resources can run on any node by default"), NULL }, { "maintenance-mode", NULL, "boolean", NULL, "false", pcmk__valid_boolean, N_("Whether the cluster should refrain from monitoring, starting, " "and stopping resources"), NULL }, { "start-failure-is-fatal", NULL, "boolean", NULL, "true", pcmk__valid_boolean, N_("Whether a start failure should prevent a resource from being " "recovered on the same node"), N_("When true, the cluster will immediately ban a resource from a node " "if it fails to start there. When false, the cluster will instead " "check the resource's fail count against its migration-threshold.") }, { "enable-startup-probes", NULL, "boolean", NULL, "true", pcmk__valid_boolean, N_("Whether the cluster should check for active resources during start-up"), NULL }, { XML_CONFIG_ATTR_SHUTDOWN_LOCK, NULL, "boolean", NULL, "false", pcmk__valid_boolean, N_("Whether to lock resources to a cleanly shut down node"), N_("When true, resources active on a node when it is cleanly shut down " "are kept \"locked\" to that node (not allowed to run elsewhere) " "until they start again on that node after it rejoins (or for at " "most shutdown-lock-limit, if set). Stonith resources and " "Pacemaker Remote connections are never locked. Clone and bundle " "instances and the promoted role of promotable clones are " "currently never locked, though support could be added in a future " "release.") }, { XML_CONFIG_ATTR_SHUTDOWN_LOCK_LIMIT, NULL, "time", NULL, "0", pcmk__valid_interval_spec, N_("Do not lock resources to a cleanly shut down node longer than " "this"), N_("If shutdown-lock is true and this is set to a nonzero time " "duration, shutdown locks will expire after this much time has " "passed since the shutdown was initiated, even if the node has not " "rejoined.") }, // Fencing-related options { "stonith-enabled", NULL, "boolean", NULL, "true", pcmk__valid_boolean, N_("*** Advanced Use Only *** " "Whether nodes may be fenced as part of recovery"), N_("If false, unresponsive nodes are immediately assumed to be harmless, " "and resources that were active on them may be recovered " "elsewhere. This can result in a \"split-brain\" situation, " "potentially leading to data loss and/or service unavailability.") }, { "stonith-action", NULL, "select", "reboot, off, poweroff", "reboot", pcmk__is_fencing_action, N_("Action to send to fence device when a node needs to be fenced " "(\"poweroff\" is a deprecated alias for \"off\")"), NULL }, { "stonith-timeout", NULL, "time", NULL, "60s", pcmk__valid_interval_spec, N_("*** Advanced Use Only *** Unused by Pacemaker"), N_("This value is not used by Pacemaker, but is kept for backward " "compatibility, and certain legacy fence agents might use it.") }, { XML_ATTR_HAVE_WATCHDOG, NULL, "boolean", NULL, "false", pcmk__valid_boolean, N_("Whether watchdog integration is enabled"), N_("This is set automatically by the cluster according to whether SBD " "is detected to be in use. User-configured values are ignored. " "The value `true` is meaningful if diskless SBD is used and " "`stonith-watchdog-timeout` is nonzero. In that case, if fencing " "is required, watchdog-based self-fencing will be performed via " "SBD without requiring a fencing resource explicitly configured.") }, { "concurrent-fencing", NULL, "boolean", NULL, PCMK__CONCURRENT_FENCING_DEFAULT, pcmk__valid_boolean, N_("Allow performing fencing operations in parallel"), NULL }, { "startup-fencing", NULL, "boolean", NULL, "true", pcmk__valid_boolean, N_("*** Advanced Use Only *** Whether to fence unseen nodes at start-up"), N_("Setting this to false may lead to a \"split-brain\" situation," "potentially leading to data loss and/or service unavailability.") }, { XML_CONFIG_ATTR_PRIORITY_FENCING_DELAY, NULL, "time", NULL, "0", pcmk__valid_interval_spec, N_("Apply fencing delay targeting the lost nodes with the highest total resource priority"), N_("Apply specified delay for the fencings that are targeting the lost " "nodes with the highest total resource priority in case we don't " "have the majority of the nodes in our cluster partition, so that " "the more significant nodes potentially win any fencing match, " "which is especially meaningful under split-brain of 2-node " "cluster. A promoted resource instance takes the base priority + 1 " "on calculation if the base priority is not 0. Any static/random " "delays that are introduced by `pcmk_delay_base/max` configured " "for the corresponding fencing resources will be added to this " "delay. This delay should be significantly greater than, safely " "twice, the maximum `pcmk_delay_base/max`. By default, priority " "fencing delay is disabled.") }, + { + XML_CONFIG_ATTR_NODE_PENDING_TIMEOUT, NULL, "time", NULL, + "10min", pcmk__valid_interval_spec, + N_("How long to wait for a node that has joined the cluster to join " + "the process group"), + N_("A node that has joined the cluster can be pending on joining the " + "process group. We wait up to this much time for it. If it times " + "out, fencing targeting the node will be issued if enabled.") + }, { "cluster-delay", NULL, "time", NULL, "60s", pcmk__valid_interval_spec, N_("Maximum time for node-to-node communication"), N_("The node elected Designated Controller (DC) will consider an action " "failed if it does not get a response from the node executing the " "action within this time (after considering the action's own " "timeout). The \"correct\" value will depend on the speed and " "load of your network and cluster nodes.") }, { "batch-limit", NULL, "integer", NULL, "0", pcmk__valid_number, N_("Maximum number of jobs that the cluster may execute in parallel " "across all nodes"), N_("The \"correct\" value will depend on the speed and load of your " "network and cluster nodes. If set to 0, the cluster will " "impose a dynamically calculated limit when any node has a " "high load.") }, { "migration-limit", NULL, "integer", NULL, "-1", pcmk__valid_number, N_("The number of live migration actions that the cluster is allowed " "to execute in parallel on a node (-1 means no limit)") }, /* Orphans and stopping */ { "stop-all-resources", NULL, "boolean", NULL, "false", pcmk__valid_boolean, N_("Whether the cluster should stop all active resources"), NULL }, { "stop-orphan-resources", NULL, "boolean", NULL, "true", pcmk__valid_boolean, N_("Whether to stop resources that were removed from the configuration"), NULL }, { "stop-orphan-actions", NULL, "boolean", NULL, "true", pcmk__valid_boolean, N_("Whether to cancel recurring actions removed from the configuration"), NULL }, { "remove-after-stop", NULL, "boolean", NULL, "false", pcmk__valid_boolean, N_("*** Deprecated *** Whether to remove stopped resources from " "the executor"), N_("Values other than default are poorly tested and potentially dangerous." " This option will be removed in a future release.") }, /* Storing inputs */ { "pe-error-series-max", NULL, "integer", NULL, "-1", pcmk__valid_number, N_("The number of scheduler inputs resulting in errors to save"), N_("Zero to disable, -1 to store unlimited.") }, { "pe-warn-series-max", NULL, "integer", NULL, "5000", pcmk__valid_number, N_("The number of scheduler inputs resulting in warnings to save"), N_("Zero to disable, -1 to store unlimited.") }, { "pe-input-series-max", NULL, "integer", NULL, "4000", pcmk__valid_number, N_("The number of scheduler inputs without errors or warnings to save"), N_("Zero to disable, -1 to store unlimited.") }, /* Node health */ { PCMK__OPT_NODE_HEALTH_STRATEGY, NULL, "select", PCMK__VALUE_NONE ", " PCMK__VALUE_MIGRATE_ON_RED ", " PCMK__VALUE_ONLY_GREEN ", " PCMK__VALUE_PROGRESSIVE ", " PCMK__VALUE_CUSTOM, PCMK__VALUE_NONE, pcmk__validate_health_strategy, N_("How cluster should react to node health attributes"), N_("Requires external entities to create node attributes (named with " "the prefix \"#health\") with values \"red\", " "\"yellow\", or \"green\".") }, { PCMK__OPT_NODE_HEALTH_BASE, NULL, "integer", NULL, "0", pcmk__valid_number, N_("Base health score assigned to a node"), N_("Only used when \"node-health-strategy\" is set to \"progressive\".") }, { PCMK__OPT_NODE_HEALTH_GREEN, NULL, "integer", NULL, "0", pcmk__valid_number, N_("The score to use for a node health attribute whose value is \"green\""), N_("Only used when \"node-health-strategy\" is set to \"custom\" or \"progressive\".") }, { PCMK__OPT_NODE_HEALTH_YELLOW, NULL, "integer", NULL, "0", pcmk__valid_number, N_("The score to use for a node health attribute whose value is \"yellow\""), N_("Only used when \"node-health-strategy\" is set to \"custom\" or \"progressive\".") }, { PCMK__OPT_NODE_HEALTH_RED, NULL, "integer", NULL, "-INFINITY", pcmk__valid_number, N_("The score to use for a node health attribute whose value is \"red\""), N_("Only used when \"node-health-strategy\" is set to \"custom\" or \"progressive\".") }, /*Placement Strategy*/ { "placement-strategy", NULL, "select", "default, utilization, minimal, balanced", "default", check_placement_strategy, N_("How the cluster should allocate resources to nodes"), NULL }, }; void pe_metadata(pcmk__output_t *out) { const char *desc_short = "Pacemaker scheduler options"; const char *desc_long = "Cluster options used by Pacemaker's scheduler"; gchar *s = pcmk__format_option_metadata("pacemaker-schedulerd", desc_short, desc_long, pe_opts, PCMK__NELEM(pe_opts)); out->output_xml(out, "metadata", s); g_free(s); } void verify_pe_options(GHashTable * options) { pcmk__validate_cluster_options(options, pe_opts, PCMK__NELEM(pe_opts)); } const char * pe_pref(GHashTable * options, const char *name) { return pcmk__cluster_option(options, pe_opts, PCMK__NELEM(pe_opts), name); } const char * fail2text(enum action_fail_response fail) { const char *result = ""; switch (fail) { case action_fail_ignore: result = "ignore"; break; case action_fail_demote: result = "demote"; break; case action_fail_block: result = "block"; break; case action_fail_recover: result = "recover"; break; case action_fail_migrate: result = "migrate"; break; case action_fail_stop: result = "stop"; break; case action_fail_fence: result = "fence"; break; case action_fail_standby: result = "standby"; break; case action_fail_restart_container: result = "restart-container"; break; case action_fail_reset_remote: result = "reset-remote"; break; } return result; } enum action_tasks text2task(const char *task) { if (pcmk__str_eq(task, CRMD_ACTION_STOP, pcmk__str_casei)) { return stop_rsc; } else if (pcmk__str_eq(task, CRMD_ACTION_STOPPED, pcmk__str_casei)) { return stopped_rsc; } else if (pcmk__str_eq(task, CRMD_ACTION_START, pcmk__str_casei)) { return start_rsc; } else if (pcmk__str_eq(task, CRMD_ACTION_STARTED, pcmk__str_casei)) { return started_rsc; } else if (pcmk__str_eq(task, CRM_OP_SHUTDOWN, pcmk__str_casei)) { return shutdown_crm; } else if (pcmk__str_eq(task, CRM_OP_FENCE, pcmk__str_casei)) { return stonith_node; } else if (pcmk__str_eq(task, CRMD_ACTION_STATUS, pcmk__str_casei)) { return monitor_rsc; } else if (pcmk__str_eq(task, CRMD_ACTION_NOTIFY, pcmk__str_casei)) { return action_notify; } else if (pcmk__str_eq(task, CRMD_ACTION_NOTIFIED, pcmk__str_casei)) { return action_notified; } else if (pcmk__str_eq(task, CRMD_ACTION_PROMOTE, pcmk__str_casei)) { return action_promote; } else if (pcmk__str_eq(task, CRMD_ACTION_DEMOTE, pcmk__str_casei)) { return action_demote; } else if (pcmk__str_eq(task, CRMD_ACTION_PROMOTED, pcmk__str_casei)) { return action_promoted; } else if (pcmk__str_eq(task, CRMD_ACTION_DEMOTED, pcmk__str_casei)) { return action_demoted; } #if SUPPORT_TRACING if (pcmk__str_eq(task, CRMD_ACTION_CANCEL, pcmk__str_casei)) { return no_action; } else if (pcmk__str_eq(task, CRMD_ACTION_DELETE, pcmk__str_casei)) { return no_action; } else if (pcmk__str_eq(task, CRMD_ACTION_STATUS, pcmk__str_casei)) { return no_action; } else if (pcmk__str_eq(task, CRMD_ACTION_MIGRATE, pcmk__str_casei)) { return no_action; } else if (pcmk__str_eq(task, CRMD_ACTION_MIGRATED, pcmk__str_casei)) { return no_action; } crm_trace("Unsupported action: %s", task); #endif return no_action; } const char * task2text(enum action_tasks task) { const char *result = ""; switch (task) { case no_action: result = "no_action"; break; case stop_rsc: result = CRMD_ACTION_STOP; break; case stopped_rsc: result = CRMD_ACTION_STOPPED; break; case start_rsc: result = CRMD_ACTION_START; break; case started_rsc: result = CRMD_ACTION_STARTED; break; case shutdown_crm: result = CRM_OP_SHUTDOWN; break; case stonith_node: result = CRM_OP_FENCE; break; case monitor_rsc: result = CRMD_ACTION_STATUS; break; case action_notify: result = CRMD_ACTION_NOTIFY; break; case action_notified: result = CRMD_ACTION_NOTIFIED; break; case action_promote: result = CRMD_ACTION_PROMOTE; break; case action_promoted: result = CRMD_ACTION_PROMOTED; break; case action_demote: result = CRMD_ACTION_DEMOTE; break; case action_demoted: result = CRMD_ACTION_DEMOTED; break; } return result; } const char * role2text(enum rsc_role_e role) { switch (role) { case RSC_ROLE_UNKNOWN: return RSC_ROLE_UNKNOWN_S; case RSC_ROLE_STOPPED: return RSC_ROLE_STOPPED_S; case RSC_ROLE_STARTED: return RSC_ROLE_STARTED_S; case RSC_ROLE_UNPROMOTED: #ifdef PCMK__COMPAT_2_0 return RSC_ROLE_UNPROMOTED_LEGACY_S; #else return RSC_ROLE_UNPROMOTED_S; #endif case RSC_ROLE_PROMOTED: #ifdef PCMK__COMPAT_2_0 return RSC_ROLE_PROMOTED_LEGACY_S; #else return RSC_ROLE_PROMOTED_S; #endif } CRM_CHECK(role >= RSC_ROLE_UNKNOWN, return RSC_ROLE_UNKNOWN_S); CRM_CHECK(role < RSC_ROLE_MAX, return RSC_ROLE_UNKNOWN_S); // coverity[dead_error_line] return RSC_ROLE_UNKNOWN_S; } enum rsc_role_e text2role(const char *role) { CRM_ASSERT(role != NULL); if (pcmk__str_eq(role, RSC_ROLE_STOPPED_S, pcmk__str_casei)) { return RSC_ROLE_STOPPED; } else if (pcmk__str_eq(role, RSC_ROLE_STARTED_S, pcmk__str_casei)) { return RSC_ROLE_STARTED; } else if (pcmk__strcase_any_of(role, RSC_ROLE_UNPROMOTED_S, RSC_ROLE_UNPROMOTED_LEGACY_S, NULL)) { return RSC_ROLE_UNPROMOTED; } else if (pcmk__strcase_any_of(role, RSC_ROLE_PROMOTED_S, RSC_ROLE_PROMOTED_LEGACY_S, NULL)) { return RSC_ROLE_PROMOTED; } else if (pcmk__str_eq(role, RSC_ROLE_UNKNOWN_S, pcmk__str_casei)) { return RSC_ROLE_UNKNOWN; } crm_err("Unknown role: %s", role); return RSC_ROLE_UNKNOWN; } void add_hash_param(GHashTable * hash, const char *name, const char *value) { CRM_CHECK(hash != NULL, return); crm_trace("Adding name='%s' value='%s' to hash table", pcmk__s(name, ""), pcmk__s(value, "")); if (name == NULL || value == NULL) { return; } else if (pcmk__str_eq(value, "#default", pcmk__str_casei)) { return; } else if (g_hash_table_lookup(hash, name) == NULL) { g_hash_table_insert(hash, strdup(name), strdup(value)); } } const char * pe_node_attribute_calculated(const pe_node_t *node, const char *name, const pe_resource_t *rsc) { const char *source; if(node == NULL) { return NULL; } else if(rsc == NULL) { return g_hash_table_lookup(node->details->attrs, name); } source = g_hash_table_lookup(rsc->meta, XML_RSC_ATTR_TARGET); if(source == NULL || !pcmk__str_eq("host", source, pcmk__str_casei)) { return g_hash_table_lookup(node->details->attrs, name); } /* Use attributes set for the containers location * instead of for the container itself * * Useful when the container is using the host's local * storage */ CRM_ASSERT(node->details->remote_rsc); CRM_ASSERT(node->details->remote_rsc->container); if(node->details->remote_rsc->container->running_on) { pe_node_t *host = node->details->remote_rsc->container->running_on->data; pe_rsc_trace(rsc, "%s: Looking for %s on the container host %s", rsc->id, name, pe__node_name(host)); return g_hash_table_lookup(host->details->attrs, name); } pe_rsc_trace(rsc, "%s: Not looking for %s on the container host: %s is inactive", rsc->id, name, node->details->remote_rsc->container->id); return NULL; } const char * pe_node_attribute_raw(const pe_node_t *node, const char *name) { if(node == NULL) { return NULL; } return g_hash_table_lookup(node->details->attrs, name); } diff --git a/lib/pengine/unpack.c b/lib/pengine/unpack.c index c7e3daef64..92639e3dc7 100644 --- a/lib/pengine/unpack.c +++ b/lib/pengine/unpack.c @@ -1,4829 +1,4904 @@ /* * Copyright 2004-2023 the Pacemaker project contributors * * The version control history for this file may have further details. * * This source code is licensed under the GNU Lesser General Public License * version 2.1 or later (LGPLv2.1+) WITHOUT ANY WARRANTY. */ #include #include #include #include #include #include #include #include #include #include #include #include #include #include CRM_TRACE_INIT_DATA(pe_status); // A (parsed) resource action history entry struct action_history { pe_resource_t *rsc; // Resource that history is for pe_node_t *node; // Node that history is for xmlNode *xml; // History entry XML // Parsed from entry XML const char *id; // XML ID of history entry const char *key; // Operation key of action const char *task; // Action name const char *exit_reason; // Exit reason given for result guint interval_ms; // Action interval int call_id; // Call ID of action int expected_exit_status; // Expected exit status of action int exit_status; // Actual exit status of action int execution_status; // Execution status of action }; /* This uses pcmk__set_flags_as()/pcmk__clear_flags_as() directly rather than * use pe__set_working_set_flags()/pe__clear_working_set_flags() so that the * flag is stringified more readably in log messages. */ #define set_config_flag(data_set, option, flag) do { \ const char *scf_value = pe_pref((data_set)->config_hash, (option)); \ if (scf_value != NULL) { \ if (crm_is_true(scf_value)) { \ (data_set)->flags = pcmk__set_flags_as(__func__, __LINE__, \ LOG_TRACE, "Working set", \ crm_system_name, (data_set)->flags, \ (flag), #flag); \ } else { \ (data_set)->flags = pcmk__clear_flags_as(__func__, __LINE__,\ LOG_TRACE, "Working set", \ crm_system_name, (data_set)->flags, \ (flag), #flag); \ } \ } \ } while(0) static void unpack_rsc_op(pe_resource_t *rsc, pe_node_t *node, xmlNode *xml_op, xmlNode **last_failure, enum action_fail_response *failed); static void determine_remote_online_status(pe_working_set_t *data_set, pe_node_t *this_node); static void add_node_attrs(const xmlNode *xml_obj, pe_node_t *node, bool overwrite, pe_working_set_t *data_set); static void determine_online_status(const xmlNode *node_state, pe_node_t *this_node, pe_working_set_t *data_set); static void unpack_node_lrm(pe_node_t *node, const xmlNode *xml, pe_working_set_t *data_set); // Bitmask for warnings we only want to print once uint32_t pe_wo = 0; static gboolean is_dangling_guest_node(pe_node_t *node) { /* we are looking for a remote-node that was supposed to be mapped to a * container resource, but all traces of that container have disappeared * from both the config and the status section. */ if (pe__is_guest_or_remote_node(node) && node->details->remote_rsc && node->details->remote_rsc->container == NULL && pcmk_is_set(node->details->remote_rsc->flags, pe_rsc_orphan_container_filler)) { return TRUE; } return FALSE; } /*! * \brief Schedule a fence action for a node * * \param[in,out] data_set Current working set of cluster * \param[in,out] node Node to fence * \param[in] reason Text description of why fencing is needed * \param[in] priority_delay Whether to consider `priority-fencing-delay` */ void pe_fence_node(pe_working_set_t * data_set, pe_node_t * node, const char *reason, bool priority_delay) { CRM_CHECK(node, return); /* A guest node is fenced by marking its container as failed */ if (pe__is_guest_node(node)) { pe_resource_t *rsc = node->details->remote_rsc->container; if (!pcmk_is_set(rsc->flags, pe_rsc_failed)) { if (!pcmk_is_set(rsc->flags, pe_rsc_managed)) { crm_notice("Not fencing guest node %s " "(otherwise would because %s): " "its guest resource %s is unmanaged", pe__node_name(node), reason, rsc->id); } else { crm_warn("Guest node %s will be fenced " "(by recovering its guest resource %s): %s", pe__node_name(node), rsc->id, reason); /* We don't mark the node as unclean because that would prevent the * node from running resources. We want to allow it to run resources * in this transition if the recovery succeeds. */ node->details->remote_requires_reset = TRUE; pe__set_resource_flags(rsc, pe_rsc_failed|pe_rsc_stop); } } } else if (is_dangling_guest_node(node)) { crm_info("Cleaning up dangling connection for guest node %s: " "fencing was already done because %s, " "and guest resource no longer exists", pe__node_name(node), reason); pe__set_resource_flags(node->details->remote_rsc, pe_rsc_failed|pe_rsc_stop); } else if (pe__is_remote_node(node)) { pe_resource_t *rsc = node->details->remote_rsc; if ((rsc != NULL) && !pcmk_is_set(rsc->flags, pe_rsc_managed)) { crm_notice("Not fencing remote node %s " "(otherwise would because %s): connection is unmanaged", pe__node_name(node), reason); } else if(node->details->remote_requires_reset == FALSE) { node->details->remote_requires_reset = TRUE; crm_warn("Remote node %s %s: %s", pe__node_name(node), pe_can_fence(data_set, node)? "will be fenced" : "is unclean", reason); } node->details->unclean = TRUE; // No need to apply `priority-fencing-delay` for remote nodes pe_fence_op(node, NULL, TRUE, reason, FALSE, data_set); } else if (node->details->unclean) { crm_trace("Cluster node %s %s because %s", pe__node_name(node), pe_can_fence(data_set, node)? "would also be fenced" : "also is unclean", reason); } else { crm_warn("Cluster node %s %s: %s", pe__node_name(node), pe_can_fence(data_set, node)? "will be fenced" : "is unclean", reason); node->details->unclean = TRUE; pe_fence_op(node, NULL, TRUE, reason, priority_delay, data_set); } } // @TODO xpaths can't handle templates, rules, or id-refs // nvpair with provides or requires set to unfencing #define XPATH_UNFENCING_NVPAIR XML_CIB_TAG_NVPAIR \ "[(@" XML_NVPAIR_ATTR_NAME "='" PCMK_STONITH_PROVIDES "'" \ "or @" XML_NVPAIR_ATTR_NAME "='" XML_RSC_ATTR_REQUIRES "') " \ "and @" XML_NVPAIR_ATTR_VALUE "='" PCMK__VALUE_UNFENCING "']" // unfencing in rsc_defaults or any resource #define XPATH_ENABLE_UNFENCING \ "/" XML_TAG_CIB "/" XML_CIB_TAG_CONFIGURATION "/" XML_CIB_TAG_RESOURCES \ "//" XML_TAG_META_SETS "/" XPATH_UNFENCING_NVPAIR \ "|/" XML_TAG_CIB "/" XML_CIB_TAG_CONFIGURATION "/" XML_CIB_TAG_RSCCONFIG \ "/" XML_TAG_META_SETS "/" XPATH_UNFENCING_NVPAIR static void set_if_xpath(uint64_t flag, const char *xpath, pe_working_set_t *data_set) { xmlXPathObjectPtr result = NULL; if (!pcmk_is_set(data_set->flags, flag)) { result = xpath_search(data_set->input, xpath); if (result && (numXpathResults(result) > 0)) { pe__set_working_set_flags(data_set, flag); } freeXpathObject(result); } } gboolean unpack_config(xmlNode * config, pe_working_set_t * data_set) { const char *value = NULL; GHashTable *config_hash = pcmk__strkey_table(free, free); pe_rule_eval_data_t rule_data = { .node_hash = NULL, .role = RSC_ROLE_UNKNOWN, .now = data_set->now, .match_data = NULL, .rsc_data = NULL, .op_data = NULL }; data_set->config_hash = config_hash; pe__unpack_dataset_nvpairs(config, XML_CIB_TAG_PROPSET, &rule_data, config_hash, CIB_OPTIONS_FIRST, FALSE, data_set); verify_pe_options(data_set->config_hash); set_config_flag(data_set, "enable-startup-probes", pe_flag_startup_probes); if (!pcmk_is_set(data_set->flags, pe_flag_startup_probes)) { crm_info("Startup probes: disabled (dangerous)"); } value = pe_pref(data_set->config_hash, XML_ATTR_HAVE_WATCHDOG); if (value && crm_is_true(value)) { crm_info("Watchdog-based self-fencing will be performed via SBD if " "fencing is required and stonith-watchdog-timeout is nonzero"); pe__set_working_set_flags(data_set, pe_flag_have_stonith_resource); } /* Set certain flags via xpath here, so they can be used before the relevant * configuration sections are unpacked. */ set_if_xpath(pe_flag_enable_unfencing, XPATH_ENABLE_UNFENCING, data_set); value = pe_pref(data_set->config_hash, "stonith-timeout"); data_set->stonith_timeout = (int) crm_parse_interval_spec(value); crm_debug("STONITH timeout: %d", data_set->stonith_timeout); set_config_flag(data_set, "stonith-enabled", pe_flag_stonith_enabled); crm_debug("STONITH of failed nodes is %s", pcmk_is_set(data_set->flags, pe_flag_stonith_enabled)? "enabled" : "disabled"); data_set->stonith_action = pe_pref(data_set->config_hash, "stonith-action"); if (!strcmp(data_set->stonith_action, "poweroff")) { pe_warn_once(pe_wo_poweroff, "Support for stonith-action of 'poweroff' is deprecated " "and will be removed in a future release (use 'off' instead)"); data_set->stonith_action = "off"; } crm_trace("STONITH will %s nodes", data_set->stonith_action); set_config_flag(data_set, "concurrent-fencing", pe_flag_concurrent_fencing); crm_debug("Concurrent fencing is %s", pcmk_is_set(data_set->flags, pe_flag_concurrent_fencing)? "enabled" : "disabled"); value = pe_pref(data_set->config_hash, XML_CONFIG_ATTR_PRIORITY_FENCING_DELAY); if (value) { data_set->priority_fencing_delay = crm_parse_interval_spec(value) / 1000; crm_trace("Priority fencing delay is %ds", data_set->priority_fencing_delay); } set_config_flag(data_set, "stop-all-resources", pe_flag_stop_everything); crm_debug("Stop all active resources: %s", pcmk__btoa(pcmk_is_set(data_set->flags, pe_flag_stop_everything))); set_config_flag(data_set, "symmetric-cluster", pe_flag_symmetric_cluster); if (pcmk_is_set(data_set->flags, pe_flag_symmetric_cluster)) { crm_debug("Cluster is symmetric" " - resources can run anywhere by default"); } value = pe_pref(data_set->config_hash, "no-quorum-policy"); if (pcmk__str_eq(value, "ignore", pcmk__str_casei)) { data_set->no_quorum_policy = no_quorum_ignore; } else if (pcmk__str_eq(value, "freeze", pcmk__str_casei)) { data_set->no_quorum_policy = no_quorum_freeze; } else if (pcmk__str_eq(value, "demote", pcmk__str_casei)) { data_set->no_quorum_policy = no_quorum_demote; } else if (pcmk__str_eq(value, "suicide", pcmk__str_casei)) { if (pcmk_is_set(data_set->flags, pe_flag_stonith_enabled)) { int do_panic = 0; crm_element_value_int(data_set->input, XML_ATTR_QUORUM_PANIC, &do_panic); if (do_panic || pcmk_is_set(data_set->flags, pe_flag_have_quorum)) { data_set->no_quorum_policy = no_quorum_suicide; } else { crm_notice("Resetting no-quorum-policy to 'stop': cluster has never had quorum"); data_set->no_quorum_policy = no_quorum_stop; } } else { pcmk__config_err("Resetting no-quorum-policy to 'stop' because " "fencing is disabled"); data_set->no_quorum_policy = no_quorum_stop; } } else { data_set->no_quorum_policy = no_quorum_stop; } switch (data_set->no_quorum_policy) { case no_quorum_freeze: crm_debug("On loss of quorum: Freeze resources"); break; case no_quorum_stop: crm_debug("On loss of quorum: Stop ALL resources"); break; case no_quorum_demote: crm_debug("On loss of quorum: " "Demote promotable resources and stop other resources"); break; case no_quorum_suicide: crm_notice("On loss of quorum: Fence all remaining nodes"); break; case no_quorum_ignore: crm_notice("On loss of quorum: Ignore"); break; } set_config_flag(data_set, "stop-orphan-resources", pe_flag_stop_rsc_orphans); crm_trace("Orphan resources are %s", pcmk_is_set(data_set->flags, pe_flag_stop_rsc_orphans)? "stopped" : "ignored"); set_config_flag(data_set, "stop-orphan-actions", pe_flag_stop_action_orphans); crm_trace("Orphan resource actions are %s", pcmk_is_set(data_set->flags, pe_flag_stop_action_orphans)? "stopped" : "ignored"); value = pe_pref(data_set->config_hash, "remove-after-stop"); if (value != NULL) { if (crm_is_true(value)) { pe__set_working_set_flags(data_set, pe_flag_remove_after_stop); #ifndef PCMK__COMPAT_2_0 pe_warn_once(pe_wo_remove_after, "Support for the remove-after-stop cluster property is" " deprecated and will be removed in a future release"); #endif } else { pe__clear_working_set_flags(data_set, pe_flag_remove_after_stop); } } set_config_flag(data_set, "maintenance-mode", pe_flag_maintenance_mode); crm_trace("Maintenance mode: %s", pcmk__btoa(pcmk_is_set(data_set->flags, pe_flag_maintenance_mode))); set_config_flag(data_set, "start-failure-is-fatal", pe_flag_start_failure_fatal); crm_trace("Start failures are %s", pcmk_is_set(data_set->flags, pe_flag_start_failure_fatal)? "always fatal" : "handled by failcount"); if (pcmk_is_set(data_set->flags, pe_flag_stonith_enabled)) { set_config_flag(data_set, "startup-fencing", pe_flag_startup_fencing); } if (pcmk_is_set(data_set->flags, pe_flag_startup_fencing)) { crm_trace("Unseen nodes will be fenced"); } else { pe_warn_once(pe_wo_blind, "Blind faith: not fencing unseen nodes"); } pe__unpack_node_health_scores(data_set); data_set->placement_strategy = pe_pref(data_set->config_hash, "placement-strategy"); crm_trace("Placement strategy: %s", data_set->placement_strategy); set_config_flag(data_set, "shutdown-lock", pe_flag_shutdown_lock); crm_trace("Resources will%s be locked to cleanly shut down nodes", (pcmk_is_set(data_set->flags, pe_flag_shutdown_lock)? "" : " not")); if (pcmk_is_set(data_set->flags, pe_flag_shutdown_lock)) { value = pe_pref(data_set->config_hash, XML_CONFIG_ATTR_SHUTDOWN_LOCK_LIMIT); data_set->shutdown_lock = crm_parse_interval_spec(value) / 1000; crm_trace("Shutdown locks expire after %us", data_set->shutdown_lock); } + value = pe_pref(data_set->config_hash, + XML_CONFIG_ATTR_NODE_PENDING_TIMEOUT); + data_set->node_pending_timeout = crm_parse_interval_spec(value) / 1000; + crm_trace("Node pending timeout is %us", data_set->node_pending_timeout); + return TRUE; } pe_node_t * pe_create_node(const char *id, const char *uname, const char *type, const char *score, pe_working_set_t * data_set) { pe_node_t *new_node = NULL; if (pe_find_node(data_set->nodes, uname) != NULL) { pcmk__config_warn("More than one node entry has name '%s'", uname); } new_node = calloc(1, sizeof(pe_node_t)); if (new_node == NULL) { return NULL; } new_node->weight = char2score(score); new_node->details = calloc(1, sizeof(struct pe_node_shared_s)); if (new_node->details == NULL) { free(new_node); return NULL; } crm_trace("Creating node for entry %s/%s", uname, id); new_node->details->id = id; new_node->details->uname = uname; new_node->details->online = FALSE; new_node->details->shutdown = FALSE; new_node->details->rsc_discovery_enabled = TRUE; new_node->details->running_rsc = NULL; new_node->details->data_set = data_set; if (pcmk__str_eq(type, "member", pcmk__str_null_matches | pcmk__str_casei)) { new_node->details->type = node_member; } else if (pcmk__str_eq(type, "remote", pcmk__str_casei)) { new_node->details->type = node_remote; pe__set_working_set_flags(data_set, pe_flag_have_remote_nodes); } else { /* @COMPAT 'ping' is the default for backward compatibility, but it * should be changed to 'member' at a compatibility break */ if (!pcmk__str_eq(type, "ping", pcmk__str_casei)) { pcmk__config_warn("Node %s has unrecognized type '%s', " "assuming 'ping'", pcmk__s(uname, "without name"), type); } pe_warn_once(pe_wo_ping_node, "Support for nodes of type 'ping' (such as %s) is " "deprecated and will be removed in a future release", pcmk__s(uname, "unnamed node")); new_node->details->type = node_ping; } new_node->details->attrs = pcmk__strkey_table(free, free); if (pe__is_guest_or_remote_node(new_node)) { g_hash_table_insert(new_node->details->attrs, strdup(CRM_ATTR_KIND), strdup("remote")); } else { g_hash_table_insert(new_node->details->attrs, strdup(CRM_ATTR_KIND), strdup("cluster")); } new_node->details->utilization = pcmk__strkey_table(free, free); new_node->details->digest_cache = pcmk__strkey_table(free, pe__free_digests); data_set->nodes = g_list_insert_sorted(data_set->nodes, new_node, pe__cmp_node_name); return new_node; } static const char * expand_remote_rsc_meta(xmlNode *xml_obj, xmlNode *parent, pe_working_set_t *data) { xmlNode *attr_set = NULL; xmlNode *attr = NULL; const char *container_id = ID(xml_obj); const char *remote_name = NULL; const char *remote_server = NULL; const char *remote_port = NULL; const char *connect_timeout = "60s"; const char *remote_allow_migrate=NULL; const char *is_managed = NULL; for (attr_set = pcmk__xe_first_child(xml_obj); attr_set != NULL; attr_set = pcmk__xe_next(attr_set)) { if (!pcmk__str_eq((const char *)attr_set->name, XML_TAG_META_SETS, pcmk__str_casei)) { continue; } for (attr = pcmk__xe_first_child(attr_set); attr != NULL; attr = pcmk__xe_next(attr)) { const char *value = crm_element_value(attr, XML_NVPAIR_ATTR_VALUE); const char *name = crm_element_value(attr, XML_NVPAIR_ATTR_NAME); if (pcmk__str_eq(name, XML_RSC_ATTR_REMOTE_NODE, pcmk__str_casei)) { remote_name = value; } else if (pcmk__str_eq(name, "remote-addr", pcmk__str_casei)) { remote_server = value; } else if (pcmk__str_eq(name, "remote-port", pcmk__str_casei)) { remote_port = value; } else if (pcmk__str_eq(name, "remote-connect-timeout", pcmk__str_casei)) { connect_timeout = value; } else if (pcmk__str_eq(name, "remote-allow-migrate", pcmk__str_casei)) { remote_allow_migrate=value; } else if (pcmk__str_eq(name, XML_RSC_ATTR_MANAGED, pcmk__str_casei)) { is_managed = value; } } } if (remote_name == NULL) { return NULL; } if (pe_find_resource(data->resources, remote_name) != NULL) { return NULL; } pe_create_remote_xml(parent, remote_name, container_id, remote_allow_migrate, is_managed, connect_timeout, remote_server, remote_port); return remote_name; } static void handle_startup_fencing(pe_working_set_t *data_set, pe_node_t *new_node) { if ((new_node->details->type == node_remote) && (new_node->details->remote_rsc == NULL)) { /* Ignore fencing for remote nodes that don't have a connection resource * associated with them. This happens when remote node entries get left * in the nodes section after the connection resource is removed. */ return; } if (pcmk_is_set(data_set->flags, pe_flag_startup_fencing)) { // All nodes are unclean until we've seen their status entry new_node->details->unclean = TRUE; } else { // Blind faith ... new_node->details->unclean = FALSE; } /* We need to be able to determine if a node's status section * exists or not separate from whether the node is unclean. */ new_node->details->unseen = TRUE; } gboolean unpack_nodes(xmlNode * xml_nodes, pe_working_set_t * data_set) { xmlNode *xml_obj = NULL; pe_node_t *new_node = NULL; const char *id = NULL; const char *uname = NULL; const char *type = NULL; const char *score = NULL; for (xml_obj = pcmk__xe_first_child(xml_nodes); xml_obj != NULL; xml_obj = pcmk__xe_next(xml_obj)) { if (pcmk__str_eq((const char *)xml_obj->name, XML_CIB_TAG_NODE, pcmk__str_none)) { new_node = NULL; id = crm_element_value(xml_obj, XML_ATTR_ID); uname = crm_element_value(xml_obj, XML_ATTR_UNAME); type = crm_element_value(xml_obj, XML_ATTR_TYPE); score = crm_element_value(xml_obj, XML_RULE_ATTR_SCORE); crm_trace("Processing node %s/%s", uname, id); if (id == NULL) { pcmk__config_err("Ignoring <" XML_CIB_TAG_NODE "> entry in configuration without id"); continue; } new_node = pe_create_node(id, uname, type, score, data_set); if (new_node == NULL) { return FALSE; } handle_startup_fencing(data_set, new_node); add_node_attrs(xml_obj, new_node, FALSE, data_set); crm_trace("Done with node %s", crm_element_value(xml_obj, XML_ATTR_UNAME)); } } if (data_set->localhost && pe_find_node(data_set->nodes, data_set->localhost) == NULL) { crm_info("Creating a fake local node"); pe_create_node(data_set->localhost, data_set->localhost, NULL, 0, data_set); } return TRUE; } static void setup_container(pe_resource_t * rsc, pe_working_set_t * data_set) { const char *container_id = NULL; if (rsc->children) { g_list_foreach(rsc->children, (GFunc) setup_container, data_set); return; } container_id = g_hash_table_lookup(rsc->meta, XML_RSC_ATTR_CONTAINER); if (container_id && !pcmk__str_eq(container_id, rsc->id, pcmk__str_casei)) { pe_resource_t *container = pe_find_resource(data_set->resources, container_id); if (container) { rsc->container = container; pe__set_resource_flags(container, pe_rsc_is_container); container->fillers = g_list_append(container->fillers, rsc); pe_rsc_trace(rsc, "Resource %s's container is %s", rsc->id, container_id); } else { pe_err("Resource %s: Unknown resource container (%s)", rsc->id, container_id); } } } gboolean unpack_remote_nodes(xmlNode * xml_resources, pe_working_set_t * data_set) { xmlNode *xml_obj = NULL; /* Create remote nodes and guest nodes from the resource configuration * before unpacking resources. */ for (xml_obj = pcmk__xe_first_child(xml_resources); xml_obj != NULL; xml_obj = pcmk__xe_next(xml_obj)) { const char *new_node_id = NULL; /* Check for remote nodes, which are defined by ocf:pacemaker:remote * primitives. */ if (xml_contains_remote_node(xml_obj)) { new_node_id = ID(xml_obj); /* The "pe_find_node" check is here to make sure we don't iterate over * an expanded node that has already been added to the node list. */ if (new_node_id && pe_find_node(data_set->nodes, new_node_id) == NULL) { crm_trace("Found remote node %s defined by resource %s", new_node_id, ID(xml_obj)); pe_create_node(new_node_id, new_node_id, "remote", NULL, data_set); } continue; } /* Check for guest nodes, which are defined by special meta-attributes * of a primitive of any type (for example, VirtualDomain or Xen). */ if (pcmk__str_eq((const char *)xml_obj->name, XML_CIB_TAG_RESOURCE, pcmk__str_none)) { /* This will add an ocf:pacemaker:remote primitive to the * configuration for the guest node's connection, to be unpacked * later. */ new_node_id = expand_remote_rsc_meta(xml_obj, xml_resources, data_set); if (new_node_id && pe_find_node(data_set->nodes, new_node_id) == NULL) { crm_trace("Found guest node %s in resource %s", new_node_id, ID(xml_obj)); pe_create_node(new_node_id, new_node_id, "remote", NULL, data_set); } continue; } /* Check for guest nodes inside a group. Clones are currently not * supported as guest nodes. */ if (pcmk__str_eq((const char *)xml_obj->name, XML_CIB_TAG_GROUP, pcmk__str_none)) { xmlNode *xml_obj2 = NULL; for (xml_obj2 = pcmk__xe_first_child(xml_obj); xml_obj2 != NULL; xml_obj2 = pcmk__xe_next(xml_obj2)) { new_node_id = expand_remote_rsc_meta(xml_obj2, xml_resources, data_set); if (new_node_id && pe_find_node(data_set->nodes, new_node_id) == NULL) { crm_trace("Found guest node %s in resource %s inside group %s", new_node_id, ID(xml_obj2), ID(xml_obj)); pe_create_node(new_node_id, new_node_id, "remote", NULL, data_set); } } } } return TRUE; } /* Call this after all the nodes and resources have been * unpacked, but before the status section is read. * * A remote node's online status is reflected by the state * of the remote node's connection resource. We need to link * the remote node to this connection resource so we can have * easy access to the connection resource during the scheduler calculations. */ static void link_rsc2remotenode(pe_working_set_t *data_set, pe_resource_t *new_rsc) { pe_node_t *remote_node = NULL; if (new_rsc->is_remote_node == FALSE) { return; } if (pcmk_is_set(data_set->flags, pe_flag_quick_location)) { /* remote_nodes and remote_resources are not linked in quick location calculations */ return; } remote_node = pe_find_node(data_set->nodes, new_rsc->id); CRM_CHECK(remote_node != NULL, return); pe_rsc_trace(new_rsc, "Linking remote connection resource %s to %s", new_rsc->id, pe__node_name(remote_node)); remote_node->details->remote_rsc = new_rsc; if (new_rsc->container == NULL) { /* Handle start-up fencing for remote nodes (as opposed to guest nodes) * the same as is done for cluster nodes. */ handle_startup_fencing(data_set, remote_node); } else { /* pe_create_node() marks the new node as "remote" or "cluster"; now * that we know the node is a guest node, update it correctly. */ g_hash_table_replace(remote_node->details->attrs, strdup(CRM_ATTR_KIND), strdup("container")); } } static void destroy_tag(gpointer data) { pe_tag_t *tag = data; if (tag) { free(tag->id); g_list_free_full(tag->refs, free); free(tag); } } /*! * \internal * \brief Parse configuration XML for resource information * * \param[in] xml_resources Top of resource configuration XML * \param[in,out] data_set Where to put resource information * * \return TRUE * * \note unpack_remote_nodes() MUST be called before this, so that the nodes can * be used when pe__unpack_resource() calls resource_location() */ gboolean unpack_resources(const xmlNode *xml_resources, pe_working_set_t * data_set) { xmlNode *xml_obj = NULL; GList *gIter = NULL; data_set->template_rsc_sets = pcmk__strkey_table(free, destroy_tag); for (xml_obj = pcmk__xe_first_child(xml_resources); xml_obj != NULL; xml_obj = pcmk__xe_next(xml_obj)) { pe_resource_t *new_rsc = NULL; const char *id = ID(xml_obj); if (pcmk__str_empty(id)) { pcmk__config_err("Ignoring <%s> resource without ID", crm_element_name(xml_obj)); continue; } if (pcmk__str_eq((const char *) xml_obj->name, XML_CIB_TAG_RSC_TEMPLATE, pcmk__str_none)) { if (g_hash_table_lookup_extended(data_set->template_rsc_sets, id, NULL, NULL) == FALSE) { /* Record the template's ID for the knowledge of its existence anyway. */ g_hash_table_insert(data_set->template_rsc_sets, strdup(id), NULL); } continue; } crm_trace("Unpacking <%s " XML_ATTR_ID "='%s'>", crm_element_name(xml_obj), id); if (pe__unpack_resource(xml_obj, &new_rsc, NULL, data_set) == pcmk_rc_ok) { data_set->resources = g_list_append(data_set->resources, new_rsc); pe_rsc_trace(new_rsc, "Added resource %s", new_rsc->id); } else { pcmk__config_err("Ignoring <%s> resource '%s' " "because configuration is invalid", crm_element_name(xml_obj), id); } } for (gIter = data_set->resources; gIter != NULL; gIter = gIter->next) { pe_resource_t *rsc = (pe_resource_t *) gIter->data; setup_container(rsc, data_set); link_rsc2remotenode(data_set, rsc); } data_set->resources = g_list_sort(data_set->resources, pe__cmp_rsc_priority); if (pcmk_is_set(data_set->flags, pe_flag_quick_location)) { /* Ignore */ } else if (pcmk_is_set(data_set->flags, pe_flag_stonith_enabled) && !pcmk_is_set(data_set->flags, pe_flag_have_stonith_resource)) { pcmk__config_err("Resource start-up disabled since no STONITH resources have been defined"); pcmk__config_err("Either configure some or disable STONITH with the stonith-enabled option"); pcmk__config_err("NOTE: Clusters with shared data need STONITH to ensure data integrity"); } return TRUE; } gboolean unpack_tags(xmlNode * xml_tags, pe_working_set_t * data_set) { xmlNode *xml_tag = NULL; data_set->tags = pcmk__strkey_table(free, destroy_tag); for (xml_tag = pcmk__xe_first_child(xml_tags); xml_tag != NULL; xml_tag = pcmk__xe_next(xml_tag)) { xmlNode *xml_obj_ref = NULL; const char *tag_id = ID(xml_tag); if (!pcmk__str_eq((const char *)xml_tag->name, XML_CIB_TAG_TAG, pcmk__str_none)) { continue; } if (tag_id == NULL) { pcmk__config_err("Ignoring <%s> without " XML_ATTR_ID, crm_element_name(xml_tag)); continue; } for (xml_obj_ref = pcmk__xe_first_child(xml_tag); xml_obj_ref != NULL; xml_obj_ref = pcmk__xe_next(xml_obj_ref)) { const char *obj_ref = ID(xml_obj_ref); if (!pcmk__str_eq((const char *)xml_obj_ref->name, XML_CIB_TAG_OBJ_REF, pcmk__str_none)) { continue; } if (obj_ref == NULL) { pcmk__config_err("Ignoring <%s> for tag '%s' without " XML_ATTR_ID, crm_element_name(xml_obj_ref), tag_id); continue; } if (add_tag_ref(data_set->tags, tag_id, obj_ref) == FALSE) { return FALSE; } } } return TRUE; } /* The ticket state section: * "/cib/status/tickets/ticket_state" */ static gboolean unpack_ticket_state(xmlNode * xml_ticket, pe_working_set_t * data_set) { const char *ticket_id = NULL; const char *granted = NULL; const char *last_granted = NULL; const char *standby = NULL; xmlAttrPtr xIter = NULL; pe_ticket_t *ticket = NULL; ticket_id = ID(xml_ticket); if (pcmk__str_empty(ticket_id)) { return FALSE; } crm_trace("Processing ticket state for %s", ticket_id); ticket = g_hash_table_lookup(data_set->tickets, ticket_id); if (ticket == NULL) { ticket = ticket_new(ticket_id, data_set); if (ticket == NULL) { return FALSE; } } for (xIter = xml_ticket->properties; xIter; xIter = xIter->next) { const char *prop_name = (const char *)xIter->name; const char *prop_value = pcmk__xml_attr_value(xIter); if (pcmk__str_eq(prop_name, XML_ATTR_ID, pcmk__str_none)) { continue; } g_hash_table_replace(ticket->state, strdup(prop_name), strdup(prop_value)); } granted = g_hash_table_lookup(ticket->state, "granted"); if (granted && crm_is_true(granted)) { ticket->granted = TRUE; crm_info("We have ticket '%s'", ticket->id); } else { ticket->granted = FALSE; crm_info("We do not have ticket '%s'", ticket->id); } last_granted = g_hash_table_lookup(ticket->state, "last-granted"); if (last_granted) { long long last_granted_ll; pcmk__scan_ll(last_granted, &last_granted_ll, 0LL); ticket->last_granted = (time_t) last_granted_ll; } standby = g_hash_table_lookup(ticket->state, "standby"); if (standby && crm_is_true(standby)) { ticket->standby = TRUE; if (ticket->granted) { crm_info("Granted ticket '%s' is in standby-mode", ticket->id); } } else { ticket->standby = FALSE; } crm_trace("Done with ticket state for %s", ticket_id); return TRUE; } static gboolean unpack_tickets_state(xmlNode * xml_tickets, pe_working_set_t * data_set) { xmlNode *xml_obj = NULL; for (xml_obj = pcmk__xe_first_child(xml_tickets); xml_obj != NULL; xml_obj = pcmk__xe_next(xml_obj)) { if (!pcmk__str_eq((const char *)xml_obj->name, XML_CIB_TAG_TICKET_STATE, pcmk__str_none)) { continue; } unpack_ticket_state(xml_obj, data_set); } return TRUE; } static void unpack_handle_remote_attrs(pe_node_t *this_node, const xmlNode *state, pe_working_set_t *data_set) { const char *resource_discovery_enabled = NULL; const xmlNode *attrs = NULL; pe_resource_t *rsc = NULL; if (!pcmk__str_eq((const char *)state->name, XML_CIB_TAG_STATE, pcmk__str_none)) { return; } if ((this_node == NULL) || !pe__is_guest_or_remote_node(this_node)) { return; } crm_trace("Processing Pacemaker Remote node %s", pe__node_name(this_node)); pcmk__scan_min_int(crm_element_value(state, XML_NODE_IS_MAINTENANCE), &(this_node->details->remote_maintenance), 0); rsc = this_node->details->remote_rsc; if (this_node->details->remote_requires_reset == FALSE) { this_node->details->unclean = FALSE; this_node->details->unseen = FALSE; } attrs = find_xml_node(state, XML_TAG_TRANSIENT_NODEATTRS, FALSE); add_node_attrs(attrs, this_node, TRUE, data_set); if (pe__shutdown_requested(this_node)) { crm_info("%s is shutting down", pe__node_name(this_node)); this_node->details->shutdown = TRUE; } if (crm_is_true(pe_node_attribute_raw(this_node, "standby"))) { crm_info("%s is in standby mode", pe__node_name(this_node)); this_node->details->standby = TRUE; } if (crm_is_true(pe_node_attribute_raw(this_node, "maintenance")) || ((rsc != NULL) && !pcmk_is_set(rsc->flags, pe_rsc_managed))) { crm_info("%s is in maintenance mode", pe__node_name(this_node)); this_node->details->maintenance = TRUE; } resource_discovery_enabled = pe_node_attribute_raw(this_node, XML_NODE_ATTR_RSC_DISCOVERY); if (resource_discovery_enabled && !crm_is_true(resource_discovery_enabled)) { if (pe__is_remote_node(this_node) && !pcmk_is_set(data_set->flags, pe_flag_stonith_enabled)) { crm_warn("Ignoring " XML_NODE_ATTR_RSC_DISCOVERY " attribute on Pacemaker Remote node %s" " because fencing is disabled", pe__node_name(this_node)); } else { /* This is either a remote node with fencing enabled, or a guest * node. We don't care whether fencing is enabled when fencing guest * nodes, because they are "fenced" by recovering their containing * resource. */ crm_info("%s has resource discovery disabled", pe__node_name(this_node)); this_node->details->rsc_discovery_enabled = FALSE; } } } /*! * \internal * \brief Unpack a cluster node's transient attributes * * \param[in] state CIB node state XML * \param[in,out] node Cluster node whose attributes are being unpacked * \param[in,out] data_set Cluster working set */ static void unpack_transient_attributes(const xmlNode *state, pe_node_t *node, pe_working_set_t *data_set) { const char *discovery = NULL; const xmlNode *attrs = find_xml_node(state, XML_TAG_TRANSIENT_NODEATTRS, FALSE); add_node_attrs(attrs, node, TRUE, data_set); if (crm_is_true(pe_node_attribute_raw(node, "standby"))) { crm_info("%s is in standby mode", pe__node_name(node)); node->details->standby = TRUE; } if (crm_is_true(pe_node_attribute_raw(node, "maintenance"))) { crm_info("%s is in maintenance mode", pe__node_name(node)); node->details->maintenance = TRUE; } discovery = pe_node_attribute_raw(node, XML_NODE_ATTR_RSC_DISCOVERY); if ((discovery != NULL) && !crm_is_true(discovery)) { crm_warn("Ignoring " XML_NODE_ATTR_RSC_DISCOVERY " attribute for %s because disabling resource discovery " "is not allowed for cluster nodes", pe__node_name(node)); } } /*! * \internal * \brief Unpack a node state entry (first pass) * * Unpack one node state entry from status. This unpacks information from the * node_state element itself and node attributes inside it, but not the * resource history inside it. Multiple passes through the status are needed to * fully unpack everything. * * \param[in] state CIB node state XML * \param[in,out] data_set Cluster working set */ static void unpack_node_state(const xmlNode *state, pe_working_set_t *data_set) { const char *id = NULL; const char *uname = NULL; pe_node_t *this_node = NULL; id = crm_element_value(state, XML_ATTR_ID); if (id == NULL) { crm_warn("Ignoring malformed " XML_CIB_TAG_STATE " entry without " XML_ATTR_ID); return; } uname = crm_element_value(state, XML_ATTR_UNAME); if (uname == NULL) { - crm_warn("Ignoring malformed " XML_CIB_TAG_STATE " entry without " - XML_ATTR_UNAME); - return; + /* If a joining peer makes the cluster acquire the quorum from corosync + * meanwhile it has not joined CPG membership of pacemaker-controld yet, + * it's possible that the created node_state entry doesn't have an uname + * yet. We should recognize the node as `pending` and wait for it to + * join CPG. + */ + crm_trace("Handling " XML_CIB_TAG_STATE " entry with id=\"%s\" without " + XML_ATTR_UNAME, id); } this_node = pe_find_node_any(data_set->nodes, id, uname); if (this_node == NULL) { - pcmk__config_warn("Ignoring recorded node state for '%s' because " - "it is no longer in the configuration", uname); + pcmk__config_warn("Ignoring recorded node state for id=\"%s\" (%s) " + "because it is no longer in the configuration", + id, pcmk__s(uname, "uname unknown")); return; } if (pe__is_guest_or_remote_node(this_node)) { /* We can't determine the online status of Pacemaker Remote nodes until * after all resource history has been unpacked. In this first pass, we * do need to mark whether the node has been fenced, as this plays a * role during unpacking cluster node resource state. */ pcmk__scan_min_int(crm_element_value(state, XML_NODE_IS_FENCED), &(this_node->details->remote_was_fenced), 0); return; } unpack_transient_attributes(state, this_node, data_set); /* Provisionally mark this cluster node as clean. We have at least seen it * in the current cluster's lifetime. */ this_node->details->unclean = FALSE; this_node->details->unseen = FALSE; crm_trace("Determining online status of cluster node %s (id %s)", pe__node_name(this_node), id); determine_online_status(state, this_node, data_set); if (!pcmk_is_set(data_set->flags, pe_flag_have_quorum) && this_node->details->online && (data_set->no_quorum_policy == no_quorum_suicide)) { /* Everything else should flow from this automatically * (at least until the scheduler becomes able to migrate off * healthy resources) */ pe_fence_node(data_set, this_node, "cluster does not have quorum", FALSE); } } /*! * \internal * \brief Unpack nodes' resource history as much as possible * * Unpack as many nodes' resource history as possible in one pass through the * status. We need to process Pacemaker Remote nodes' connections/containers * before unpacking their history; the connection/container history will be * in another node's history, so it might take multiple passes to unpack * everything. * * \param[in] status CIB XML status section * \param[in] fence If true, treat any not-yet-unpacked nodes as unseen * \param[in,out] data_set Cluster working set * * \return Standard Pacemaker return code (specifically pcmk_rc_ok if done, * or EAGAIN if more unpacking remains to be done) */ static int unpack_node_history(const xmlNode *status, bool fence, pe_working_set_t *data_set) { int rc = pcmk_rc_ok; // Loop through all node_state entries in CIB status for (const xmlNode *state = first_named_child(status, XML_CIB_TAG_STATE); state != NULL; state = crm_next_same_xml(state)) { const char *id = ID(state); const char *uname = crm_element_value(state, XML_ATTR_UNAME); pe_node_t *this_node = NULL; if ((id == NULL) || (uname == NULL)) { // Warning already logged in first pass through status section crm_trace("Not unpacking resource history from malformed " XML_CIB_TAG_STATE " without id and/or uname"); continue; } this_node = pe_find_node_any(data_set->nodes, id, uname); if (this_node == NULL) { // Warning already logged in first pass through status section crm_trace("Not unpacking resource history for node %s because " "no longer in configuration", id); continue; } if (this_node->details->unpacked) { crm_trace("Not unpacking resource history for node %s because " "already unpacked", id); continue; } if (fence) { // We're processing all remaining nodes } else if (pe__is_guest_node(this_node)) { /* We can unpack a guest node's history only after we've unpacked * other resource history to the point that we know that the node's * connection and containing resource are both up. */ pe_resource_t *rsc = this_node->details->remote_rsc; if ((rsc == NULL) || (rsc->role != RSC_ROLE_STARTED) || (rsc->container->role != RSC_ROLE_STARTED)) { crm_trace("Not unpacking resource history for guest node %s " "because container and connection are not known to " "be up", id); continue; } } else if (pe__is_remote_node(this_node)) { /* We can unpack a remote node's history only after we've unpacked * other resource history to the point that we know that the node's * connection is up, with the exception of when shutdown locks are * in use. */ pe_resource_t *rsc = this_node->details->remote_rsc; if ((rsc == NULL) || (!pcmk_is_set(data_set->flags, pe_flag_shutdown_lock) && (rsc->role != RSC_ROLE_STARTED))) { crm_trace("Not unpacking resource history for remote node %s " "because connection is not known to be up", id); continue; } /* If fencing and shutdown locks are disabled and we're not processing * unseen nodes, then we don't want to unpack offline nodes until online * nodes have been unpacked. This allows us to number active clone * instances first. */ } else if (!pcmk_any_flags_set(data_set->flags, pe_flag_stonith_enabled |pe_flag_shutdown_lock) && !this_node->details->online) { crm_trace("Not unpacking resource history for offline " "cluster node %s", id); continue; } if (pe__is_guest_or_remote_node(this_node)) { determine_remote_online_status(data_set, this_node); unpack_handle_remote_attrs(this_node, state, data_set); } crm_trace("Unpacking resource history for %snode %s", (fence? "unseen " : ""), id); this_node->details->unpacked = TRUE; unpack_node_lrm(this_node, state, data_set); rc = EAGAIN; // Other node histories might depend on this one } return rc; } /* remove nodes that are down, stopping */ /* create positive rsc_to_node constraints between resources and the nodes they are running on */ /* anything else? */ gboolean unpack_status(xmlNode * status, pe_working_set_t * data_set) { xmlNode *state = NULL; crm_trace("Beginning unpack"); if (data_set->tickets == NULL) { data_set->tickets = pcmk__strkey_table(free, destroy_ticket); } for (state = pcmk__xe_first_child(status); state != NULL; state = pcmk__xe_next(state)) { if (pcmk__str_eq((const char *)state->name, XML_CIB_TAG_TICKETS, pcmk__str_none)) { unpack_tickets_state((xmlNode *) state, data_set); } else if (pcmk__str_eq((const char *)state->name, XML_CIB_TAG_STATE, pcmk__str_none)) { unpack_node_state(state, data_set); } } while (unpack_node_history(status, FALSE, data_set) == EAGAIN) { crm_trace("Another pass through node resource histories is needed"); } // Now catch any nodes we didn't see unpack_node_history(status, pcmk_is_set(data_set->flags, pe_flag_stonith_enabled), data_set); /* Now that we know where resources are, we can schedule stops of containers * with failed bundle connections */ if (data_set->stop_needed != NULL) { for (GList *item = data_set->stop_needed; item; item = item->next) { pe_resource_t *container = item->data; pe_node_t *node = pe__current_node(container); if (node) { stop_action(container, node, FALSE); } } g_list_free(data_set->stop_needed); data_set->stop_needed = NULL; } /* Now that we know status of all Pacemaker Remote connections and nodes, * we can stop connections for node shutdowns, and check the online status * of remote/guest nodes that didn't have any node history to unpack. */ for (GList *gIter = data_set->nodes; gIter != NULL; gIter = gIter->next) { pe_node_t *this_node = gIter->data; if (!pe__is_guest_or_remote_node(this_node)) { continue; } if (this_node->details->shutdown && (this_node->details->remote_rsc != NULL)) { pe__set_next_role(this_node->details->remote_rsc, RSC_ROLE_STOPPED, "remote shutdown"); } if (!this_node->details->unpacked) { determine_remote_online_status(data_set, this_node); } } return TRUE; } static gboolean determine_online_status_no_fencing(pe_working_set_t *data_set, const xmlNode *node_state, pe_node_t *this_node) { gboolean online = FALSE; const char *join = crm_element_value(node_state, XML_NODE_JOIN_STATE); const char *is_peer = crm_element_value(node_state, XML_NODE_IS_PEER); const char *in_cluster = crm_element_value(node_state, XML_NODE_IN_CLUSTER); const char *exp_state = crm_element_value(node_state, XML_NODE_EXPECTED); + int member = false; + bool crmd_online = false; + long long when_member = 0; + long long when_online = 0; + + if (crm_str_to_boolean(in_cluster, &member) != 1) { + pcmk__scan_ll(in_cluster, &when_member, 0LL); + member = (when_member > 0) ? true : false; + } + + if (pcmk__str_eq(is_peer, ONLINESTATUS, pcmk__str_casei)) { + crmd_online = true; - if (!crm_is_true(in_cluster)) { + } else if (pcmk__str_eq(is_peer, OFFLINESTATUS, pcmk__str_casei)) { + crmd_online = false; + + } else { + pcmk__scan_ll(is_peer, &when_online, 0LL); + crmd_online = (when_online > 0) ? true : false; + } + + if (!member) { crm_trace("Node is down: in_cluster=%s", pcmk__s(in_cluster, "")); - } else if (pcmk__str_eq(is_peer, ONLINESTATUS, pcmk__str_casei)) { + } else if (crmd_online) { if (pcmk__str_eq(join, CRMD_JOINSTATE_MEMBER, pcmk__str_casei)) { online = TRUE; } else { crm_debug("Node is not ready to run resources: %s", join); } } else if (this_node->details->expected_up == FALSE) { crm_trace("Controller is down: " "in_cluster=%s is_peer=%s join=%s expected=%s", pcmk__s(in_cluster, ""), pcmk__s(is_peer, ""), pcmk__s(join, ""), pcmk__s(exp_state, "")); } else { /* mark it unclean */ pe_fence_node(data_set, this_node, "peer is unexpectedly down", FALSE); crm_info("in_cluster=%s is_peer=%s join=%s expected=%s", pcmk__s(in_cluster, ""), pcmk__s(is_peer, ""), pcmk__s(join, ""), pcmk__s(exp_state, "")); } return online; } static gboolean determine_online_status_fencing(pe_working_set_t *data_set, const xmlNode *node_state, pe_node_t *this_node) { gboolean online = FALSE; gboolean do_terminate = FALSE; bool crmd_online = FALSE; const char *join = crm_element_value(node_state, XML_NODE_JOIN_STATE); const char *is_peer = crm_element_value(node_state, XML_NODE_IS_PEER); const char *in_cluster = crm_element_value(node_state, XML_NODE_IN_CLUSTER); const char *exp_state = crm_element_value(node_state, XML_NODE_EXPECTED); const char *terminate = pe_node_attribute_raw(this_node, "terminate"); + int member = false; + long long when_member = 0; + long long when_online = 0; /* - - XML_NODE_IN_CLUSTER ::= true|false - - XML_NODE_IS_PEER ::= online|offline - XML_NODE_JOIN_STATE ::= member|down|pending|banned - XML_NODE_EXPECTED ::= member|down + + @COMPAT with entries recorded for DCs < 2.1.7 + - XML_NODE_IN_CLUSTER ::= true|false + - XML_NODE_IS_PEER ::= online|offline + + Since crm_feature_set 3.18.0 (pacemaker-2.1.7): + - XML_NODE_IN_CLUSTER ::= |0 + Since when node has been a cluster member. A value 0 of means the node is not + a cluster member. + + - XML_NODE_IS_PEER ::= |0 + Since when peer has been online in CPG. A value 0 means the peer is offline + in CPG. */ if (crm_is_true(terminate)) { do_terminate = TRUE; } else if (terminate != NULL && strlen(terminate) > 0) { /* could be a time() value */ char t = terminate[0]; if (t != '0' && isdigit(t)) { do_terminate = TRUE; } } crm_trace("%s: in_cluster=%s is_peer=%s join=%s expected=%s term=%d", pe__node_name(this_node), pcmk__s(in_cluster, ""), pcmk__s(is_peer, ""), pcmk__s(join, ""), pcmk__s(exp_state, ""), do_terminate); - online = crm_is_true(in_cluster); - crmd_online = pcmk__str_eq(is_peer, ONLINESTATUS, pcmk__str_casei); + /* @COMPAT with boolean values of XML_NODE_IN_CLUSTER recorded for + * DCs < 2.1.7 + */ + if (crm_str_to_boolean(in_cluster, &member) != 1) { + pcmk__scan_ll(in_cluster, &when_member, 0LL); + member = (when_member > 0) ? true : false; + } + + online = member; + + /* @COMPAT with "online"/"offline" values of XML_NODE_IS_PEER recorded for + * DCs < 2.1.7 + */ + if (pcmk__str_eq(is_peer, ONLINESTATUS, pcmk__str_casei)) { + crmd_online = true; + + } else if (pcmk__str_eq(is_peer, OFFLINESTATUS, pcmk__str_casei)) { + crmd_online = false; + + } else { + pcmk__scan_ll(is_peer, &when_online, 0LL); + crmd_online = (when_online > 0) ? true : false; + } + if (exp_state == NULL) { exp_state = CRMD_JOINSTATE_DOWN; } if (this_node->details->shutdown) { crm_debug("%s is shutting down", pe__node_name(this_node)); /* Slightly different criteria since we can't shut down a dead peer */ online = crmd_online; } else if (in_cluster == NULL) { pe_fence_node(data_set, this_node, "peer has not been seen by the cluster", FALSE); } else if (pcmk__str_eq(join, CRMD_JOINSTATE_NACK, pcmk__str_casei)) { pe_fence_node(data_set, this_node, "peer failed Pacemaker membership criteria", FALSE); } else if (do_terminate == FALSE && pcmk__str_eq(exp_state, CRMD_JOINSTATE_DOWN, pcmk__str_casei)) { - if (crm_is_true(in_cluster) || crmd_online) { + if (when_member > 0 + && when_online == 0 + && (get_effective_time(data_set) - when_member + >= data_set->node_pending_timeout)) { + pe_fence_node(data_set, this_node, + "peer pending timed out on joining the process group", + FALSE); + + } else if (member || crmd_online) { crm_info("- %s is not ready to run resources", pe__node_name(this_node)); this_node->details->standby = TRUE; this_node->details->pending = TRUE; } else { crm_trace("%s is down or still coming up", pe__node_name(this_node)); } } else if (do_terminate && pcmk__str_eq(join, CRMD_JOINSTATE_DOWN, pcmk__str_casei) - && crm_is_true(in_cluster) == FALSE && !crmd_online) { + && !member && !crmd_online) { crm_info("%s was just shot", pe__node_name(this_node)); online = FALSE; - } else if (crm_is_true(in_cluster) == FALSE) { + } else if (!member) { // Consider `priority-fencing-delay` for lost nodes pe_fence_node(data_set, this_node, "peer is no longer part of the cluster", TRUE); } else if (!crmd_online) { pe_fence_node(data_set, this_node, "peer process is no longer available", FALSE); /* Everything is running at this point, now check join state */ } else if (do_terminate) { pe_fence_node(data_set, this_node, "termination was requested", FALSE); } else if (pcmk__str_eq(join, CRMD_JOINSTATE_MEMBER, pcmk__str_casei)) { crm_info("%s is active", pe__node_name(this_node)); } else if (pcmk__strcase_any_of(join, CRMD_JOINSTATE_PENDING, CRMD_JOINSTATE_DOWN, NULL)) { crm_info("%s is not ready to run resources", pe__node_name(this_node)); this_node->details->standby = TRUE; this_node->details->pending = TRUE; } else { pe_fence_node(data_set, this_node, "peer was in an unknown state", FALSE); crm_warn("%s: in-cluster=%s is-peer=%s join=%s expected=%s term=%d shutdown=%d", pe__node_name(this_node), pcmk__s(in_cluster, ""), pcmk__s(is_peer, ""), pcmk__s(join, ""), pcmk__s(exp_state, ""), do_terminate, this_node->details->shutdown); } return online; } static void determine_remote_online_status(pe_working_set_t * data_set, pe_node_t * this_node) { pe_resource_t *rsc = this_node->details->remote_rsc; pe_resource_t *container = NULL; pe_node_t *host = NULL; /* If there is a node state entry for a (former) Pacemaker Remote node * but no resource creating that node, the node's connection resource will * be NULL. Consider it an offline remote node in that case. */ if (rsc == NULL) { this_node->details->online = FALSE; goto remote_online_done; } container = rsc->container; if (container && pcmk__list_of_1(rsc->running_on)) { host = rsc->running_on->data; } /* If the resource is currently started, mark it online. */ if (rsc->role == RSC_ROLE_STARTED) { crm_trace("%s node %s presumed ONLINE because connection resource is started", (container? "Guest" : "Remote"), this_node->details->id); this_node->details->online = TRUE; } /* consider this node shutting down if transitioning start->stop */ if (rsc->role == RSC_ROLE_STARTED && rsc->next_role == RSC_ROLE_STOPPED) { crm_trace("%s node %s shutting down because connection resource is stopping", (container? "Guest" : "Remote"), this_node->details->id); this_node->details->shutdown = TRUE; } /* Now check all the failure conditions. */ if(container && pcmk_is_set(container->flags, pe_rsc_failed)) { crm_trace("Guest node %s UNCLEAN because guest resource failed", this_node->details->id); this_node->details->online = FALSE; this_node->details->remote_requires_reset = TRUE; } else if (pcmk_is_set(rsc->flags, pe_rsc_failed)) { crm_trace("%s node %s OFFLINE because connection resource failed", (container? "Guest" : "Remote"), this_node->details->id); this_node->details->online = FALSE; } else if (rsc->role == RSC_ROLE_STOPPED || (container && container->role == RSC_ROLE_STOPPED)) { crm_trace("%s node %s OFFLINE because its resource is stopped", (container? "Guest" : "Remote"), this_node->details->id); this_node->details->online = FALSE; this_node->details->remote_requires_reset = FALSE; } else if (host && (host->details->online == FALSE) && host->details->unclean) { crm_trace("Guest node %s UNCLEAN because host is unclean", this_node->details->id); this_node->details->online = FALSE; this_node->details->remote_requires_reset = TRUE; } remote_online_done: crm_trace("Remote node %s online=%s", this_node->details->id, this_node->details->online ? "TRUE" : "FALSE"); } static void determine_online_status(const xmlNode *node_state, pe_node_t *this_node, pe_working_set_t *data_set) { gboolean online = FALSE; const char *exp_state = crm_element_value(node_state, XML_NODE_EXPECTED); CRM_CHECK(this_node != NULL, return); this_node->details->shutdown = FALSE; this_node->details->expected_up = FALSE; if (pe__shutdown_requested(this_node)) { this_node->details->shutdown = TRUE; } else if (pcmk__str_eq(exp_state, CRMD_JOINSTATE_MEMBER, pcmk__str_casei)) { this_node->details->expected_up = TRUE; } if (this_node->details->type == node_ping) { this_node->details->unclean = FALSE; online = FALSE; /* As far as resource management is concerned, * the node is safely offline. * Anyone caught abusing this logic will be shot */ } else if (!pcmk_is_set(data_set->flags, pe_flag_stonith_enabled)) { online = determine_online_status_no_fencing(data_set, node_state, this_node); } else { online = determine_online_status_fencing(data_set, node_state, this_node); } if (online) { this_node->details->online = TRUE; } else { /* remove node from contention */ this_node->fixed = TRUE; // @COMPAT deprecated and unused this_node->weight = -INFINITY; } if (online && this_node->details->shutdown) { /* don't run resources here */ this_node->fixed = TRUE; // @COMPAT deprecated and unused this_node->weight = -INFINITY; } if (this_node->details->type == node_ping) { crm_info("%s is not a Pacemaker node", pe__node_name(this_node)); } else if (this_node->details->unclean) { pe_proc_warn("%s is unclean", pe__node_name(this_node)); } else if (this_node->details->online) { crm_info("%s is %s", pe__node_name(this_node), this_node->details->shutdown ? "shutting down" : this_node->details->pending ? "pending" : this_node->details->standby ? "standby" : this_node->details->maintenance ? "maintenance" : "online"); } else { crm_trace("%s is offline", pe__node_name(this_node)); } } /*! * \internal * \brief Find the end of a resource's name, excluding any clone suffix * * \param[in] id Resource ID to check * * \return Pointer to last character of resource's base name */ const char * pe_base_name_end(const char *id) { if (!pcmk__str_empty(id)) { const char *end = id + strlen(id) - 1; for (const char *s = end; s > id; --s) { switch (*s) { case '0': case '1': case '2': case '3': case '4': case '5': case '6': case '7': case '8': case '9': break; case ':': return (s == end)? s : (s - 1); default: return end; } } return end; } return NULL; } /*! * \internal * \brief Get a resource name excluding any clone suffix * * \param[in] last_rsc_id Resource ID to check * * \return Pointer to newly allocated string with resource's base name * \note It is the caller's responsibility to free() the result. * This asserts on error, so callers can assume result is not NULL. */ char * clone_strip(const char *last_rsc_id) { const char *end = pe_base_name_end(last_rsc_id); char *basename = NULL; CRM_ASSERT(end); basename = strndup(last_rsc_id, end - last_rsc_id + 1); CRM_ASSERT(basename); return basename; } /*! * \internal * \brief Get the name of the first instance of a cloned resource * * \param[in] last_rsc_id Resource ID to check * * \return Pointer to newly allocated string with resource's base name plus :0 * \note It is the caller's responsibility to free() the result. * This asserts on error, so callers can assume result is not NULL. */ char * clone_zero(const char *last_rsc_id) { const char *end = pe_base_name_end(last_rsc_id); size_t base_name_len = end - last_rsc_id + 1; char *zero = NULL; CRM_ASSERT(end); zero = calloc(base_name_len + 3, sizeof(char)); CRM_ASSERT(zero); memcpy(zero, last_rsc_id, base_name_len); zero[base_name_len] = ':'; zero[base_name_len + 1] = '0'; return zero; } static pe_resource_t * create_fake_resource(const char *rsc_id, const xmlNode *rsc_entry, pe_working_set_t *data_set) { pe_resource_t *rsc = NULL; xmlNode *xml_rsc = create_xml_node(NULL, XML_CIB_TAG_RESOURCE); copy_in_properties(xml_rsc, rsc_entry); crm_xml_add(xml_rsc, XML_ATTR_ID, rsc_id); crm_log_xml_debug(xml_rsc, "Orphan resource"); if (pe__unpack_resource(xml_rsc, &rsc, NULL, data_set) != pcmk_rc_ok) { return NULL; } if (xml_contains_remote_node(xml_rsc)) { pe_node_t *node; crm_debug("Detected orphaned remote node %s", rsc_id); node = pe_find_node(data_set->nodes, rsc_id); if (node == NULL) { node = pe_create_node(rsc_id, rsc_id, "remote", NULL, data_set); } link_rsc2remotenode(data_set, rsc); if (node) { crm_trace("Setting node %s as shutting down due to orphaned connection resource", rsc_id); node->details->shutdown = TRUE; } } if (crm_element_value(rsc_entry, XML_RSC_ATTR_CONTAINER)) { /* This orphaned rsc needs to be mapped to a container. */ crm_trace("Detected orphaned container filler %s", rsc_id); pe__set_resource_flags(rsc, pe_rsc_orphan_container_filler); } pe__set_resource_flags(rsc, pe_rsc_orphan); data_set->resources = g_list_append(data_set->resources, rsc); return rsc; } /*! * \internal * \brief Create orphan instance for anonymous clone resource history * * \param[in,out] parent Clone resource that orphan will be added to * \param[in] rsc_id Orphan's resource ID * \param[in] node Where orphan is active (for logging only) * \param[in,out] data_set Cluster working set * * \return Newly added orphaned instance of \p parent */ static pe_resource_t * create_anonymous_orphan(pe_resource_t *parent, const char *rsc_id, const pe_node_t *node, pe_working_set_t *data_set) { pe_resource_t *top = pe__create_clone_child(parent, data_set); // find_rsc() because we might be a cloned group pe_resource_t *orphan = top->fns->find_rsc(top, rsc_id, NULL, pe_find_clone); pe_rsc_debug(parent, "Created orphan %s for %s: %s on %s", top->id, parent->id, rsc_id, pe__node_name(node)); return orphan; } /*! * \internal * \brief Check a node for an instance of an anonymous clone * * Return a child instance of the specified anonymous clone, in order of * preference: (1) the instance running on the specified node, if any; * (2) an inactive instance (i.e. within the total of clone-max instances); * (3) a newly created orphan (i.e. clone-max instances are already active). * * \param[in,out] data_set Cluster information * \param[in] node Node on which to check for instance * \param[in,out] parent Clone to check * \param[in] rsc_id Name of cloned resource in history (without instance) */ static pe_resource_t * find_anonymous_clone(pe_working_set_t *data_set, const pe_node_t *node, pe_resource_t *parent, const char *rsc_id) { GList *rIter = NULL; pe_resource_t *rsc = NULL; pe_resource_t *inactive_instance = NULL; gboolean skip_inactive = FALSE; CRM_ASSERT(parent != NULL); CRM_ASSERT(pe_rsc_is_clone(parent)); CRM_ASSERT(!pcmk_is_set(parent->flags, pe_rsc_unique)); // Check for active (or partially active, for cloned groups) instance pe_rsc_trace(parent, "Looking for %s on %s in %s", rsc_id, pe__node_name(node), parent->id); for (rIter = parent->children; rsc == NULL && rIter; rIter = rIter->next) { GList *locations = NULL; pe_resource_t *child = rIter->data; /* Check whether this instance is already known to be active or pending * anywhere, at this stage of unpacking. Because this function is called * for a resource before the resource's individual operation history * entries are unpacked, locations will generally not contain the * desired node. * * However, there are three exceptions: * (1) when child is a cloned group and we have already unpacked the * history of another member of the group on the same node; * (2) when we've already unpacked the history of another numbered * instance on the same node (which can happen if globally-unique * was flipped from true to false); and * (3) when we re-run calculations on the same data set as part of a * simulation. */ child->fns->location(child, &locations, 2); if (locations) { /* We should never associate the same numbered anonymous clone * instance with multiple nodes, and clone instances can't migrate, * so there must be only one location, regardless of history. */ CRM_LOG_ASSERT(locations->next == NULL); if (((pe_node_t *)locations->data)->details == node->details) { /* This child instance is active on the requested node, so check * for a corresponding configured resource. We use find_rsc() * instead of child because child may be a cloned group, and we * need the particular member corresponding to rsc_id. * * If the history entry is orphaned, rsc will be NULL. */ rsc = parent->fns->find_rsc(child, rsc_id, NULL, pe_find_clone); if (rsc) { /* If there are multiple instance history entries for an * anonymous clone in a single node's history (which can * happen if globally-unique is switched from true to * false), we want to consider the instances beyond the * first as orphans, even if there are inactive instance * numbers available. */ if (rsc->running_on) { crm_notice("Active (now-)anonymous clone %s has " "multiple (orphan) instance histories on %s", parent->id, pe__node_name(node)); skip_inactive = TRUE; rsc = NULL; } else { pe_rsc_trace(parent, "Resource %s, active", rsc->id); } } } g_list_free(locations); } else { pe_rsc_trace(parent, "Resource %s, skip inactive", child->id); if (!skip_inactive && !inactive_instance && !pcmk_is_set(child->flags, pe_rsc_block)) { // Remember one inactive instance in case we don't find active inactive_instance = parent->fns->find_rsc(child, rsc_id, NULL, pe_find_clone); /* ... but don't use it if it was already associated with a * pending action on another node */ if (inactive_instance && inactive_instance->pending_node && (inactive_instance->pending_node->details != node->details)) { inactive_instance = NULL; } } } } if ((rsc == NULL) && !skip_inactive && (inactive_instance != NULL)) { pe_rsc_trace(parent, "Resource %s, empty slot", inactive_instance->id); rsc = inactive_instance; } /* If the resource has "requires" set to "quorum" or "nothing", and we don't * have a clone instance for every node, we don't want to consume a valid * instance number for unclean nodes. Such instances may appear to be active * according to the history, but should be considered inactive, so we can * start an instance elsewhere. Treat such instances as orphans. * * An exception is instances running on guest nodes -- since guest node * "fencing" is actually just a resource stop, requires shouldn't apply. * * @TODO Ideally, we'd use an inactive instance number if it is not needed * for any clean instances. However, we don't know that at this point. */ if ((rsc != NULL) && !pcmk_is_set(rsc->flags, pe_rsc_needs_fencing) && (!node->details->online || node->details->unclean) && !pe__is_guest_node(node) && !pe__is_universal_clone(parent, data_set)) { rsc = NULL; } if (rsc == NULL) { rsc = create_anonymous_orphan(parent, rsc_id, node, data_set); pe_rsc_trace(parent, "Resource %s, orphan", rsc->id); } return rsc; } static pe_resource_t * unpack_find_resource(pe_working_set_t *data_set, const pe_node_t *node, const char *rsc_id) { pe_resource_t *rsc = NULL; pe_resource_t *parent = NULL; crm_trace("looking for %s", rsc_id); rsc = pe_find_resource(data_set->resources, rsc_id); if (rsc == NULL) { /* If we didn't find the resource by its name in the operation history, * check it again as a clone instance. Even when clone-max=0, we create * a single :0 orphan to match against here. */ char *clone0_id = clone_zero(rsc_id); pe_resource_t *clone0 = pe_find_resource(data_set->resources, clone0_id); if (clone0 && !pcmk_is_set(clone0->flags, pe_rsc_unique)) { rsc = clone0; parent = uber_parent(clone0); crm_trace("%s found as %s (%s)", rsc_id, clone0_id, parent->id); } else { crm_trace("%s is not known as %s either (orphan)", rsc_id, clone0_id); } free(clone0_id); } else if (rsc->variant > pe_native) { crm_trace("Resource history for %s is orphaned because it is no longer primitive", rsc_id); return NULL; } else { parent = uber_parent(rsc); } if (pe_rsc_is_anon_clone(parent)) { if (pe_rsc_is_bundled(parent)) { rsc = pe__find_bundle_replica(parent->parent, node); } else { char *base = clone_strip(rsc_id); rsc = find_anonymous_clone(data_set, node, parent, base); free(base); CRM_ASSERT(rsc != NULL); } } if (rsc && !pcmk__str_eq(rsc_id, rsc->id, pcmk__str_casei) && !pcmk__str_eq(rsc_id, rsc->clone_name, pcmk__str_casei)) { pcmk__str_update(&rsc->clone_name, rsc_id); pe_rsc_debug(rsc, "Internally renamed %s on %s to %s%s", rsc_id, pe__node_name(node), rsc->id, (pcmk_is_set(rsc->flags, pe_rsc_orphan)? " (ORPHAN)" : "")); } return rsc; } static pe_resource_t * process_orphan_resource(const xmlNode *rsc_entry, const pe_node_t *node, pe_working_set_t *data_set) { pe_resource_t *rsc = NULL; const char *rsc_id = crm_element_value(rsc_entry, XML_ATTR_ID); crm_debug("Detected orphan resource %s on %s", rsc_id, pe__node_name(node)); rsc = create_fake_resource(rsc_id, rsc_entry, data_set); if (rsc == NULL) { return NULL; } if (!pcmk_is_set(data_set->flags, pe_flag_stop_rsc_orphans)) { pe__clear_resource_flags(rsc, pe_rsc_managed); } else { CRM_CHECK(rsc != NULL, return NULL); pe_rsc_trace(rsc, "Added orphan %s", rsc->id); resource_location(rsc, NULL, -INFINITY, "__orphan_do_not_run__", data_set); } return rsc; } static void process_rsc_state(pe_resource_t * rsc, pe_node_t * node, enum action_fail_response on_fail) { pe_node_t *tmpnode = NULL; char *reason = NULL; enum action_fail_response save_on_fail = action_fail_ignore; CRM_ASSERT(rsc); pe_rsc_trace(rsc, "Resource %s is %s on %s: on_fail=%s", rsc->id, role2text(rsc->role), pe__node_name(node), fail2text(on_fail)); /* process current state */ if (rsc->role != RSC_ROLE_UNKNOWN) { pe_resource_t *iter = rsc; while (iter) { if (g_hash_table_lookup(iter->known_on, node->details->id) == NULL) { pe_node_t *n = pe__copy_node(node); pe_rsc_trace(rsc, "%s%s%s known on %s", rsc->id, ((rsc->clone_name == NULL)? "" : " also known as "), ((rsc->clone_name == NULL)? "" : rsc->clone_name), pe__node_name(n)); g_hash_table_insert(iter->known_on, (gpointer) n->details->id, n); } if (pcmk_is_set(iter->flags, pe_rsc_unique)) { break; } iter = iter->parent; } } /* If a managed resource is believed to be running, but node is down ... */ if (rsc->role > RSC_ROLE_STOPPED && node->details->online == FALSE && node->details->maintenance == FALSE && pcmk_is_set(rsc->flags, pe_rsc_managed)) { gboolean should_fence = FALSE; /* If this is a guest node, fence it (regardless of whether fencing is * enabled, because guest node fencing is done by recovery of the * container resource rather than by the fencer). Mark the resource * we're processing as failed. When the guest comes back up, its * operation history in the CIB will be cleared, freeing the affected * resource to run again once we are sure we know its state. */ if (pe__is_guest_node(node)) { pe__set_resource_flags(rsc, pe_rsc_failed|pe_rsc_stop); should_fence = TRUE; } else if (pcmk_is_set(rsc->cluster->flags, pe_flag_stonith_enabled)) { if (pe__is_remote_node(node) && node->details->remote_rsc && !pcmk_is_set(node->details->remote_rsc->flags, pe_rsc_failed)) { /* Setting unseen means that fencing of the remote node will * occur only if the connection resource is not going to start * somewhere. This allows connection resources on a failed * cluster node to move to another node without requiring the * remote nodes to be fenced as well. */ node->details->unseen = TRUE; reason = crm_strdup_printf("%s is active there (fencing will be" " revoked if remote connection can " "be re-established elsewhere)", rsc->id); } should_fence = TRUE; } if (should_fence) { if (reason == NULL) { reason = crm_strdup_printf("%s is thought to be active there", rsc->id); } pe_fence_node(rsc->cluster, node, reason, FALSE); } free(reason); } /* In order to calculate priority_fencing_delay correctly, save the failure information and pass it to native_add_running(). */ save_on_fail = on_fail; if (node->details->unclean) { /* No extra processing needed * Also allows resources to be started again after a node is shot */ on_fail = action_fail_ignore; } switch (on_fail) { case action_fail_ignore: /* nothing to do */ break; case action_fail_demote: pe__set_resource_flags(rsc, pe_rsc_failed); demote_action(rsc, node, FALSE); break; case action_fail_fence: /* treat it as if it is still running * but also mark the node as unclean */ reason = crm_strdup_printf("%s failed there", rsc->id); pe_fence_node(rsc->cluster, node, reason, FALSE); free(reason); break; case action_fail_standby: node->details->standby = TRUE; node->details->standby_onfail = TRUE; break; case action_fail_block: /* is_managed == FALSE will prevent any * actions being sent for the resource */ pe__clear_resource_flags(rsc, pe_rsc_managed); pe__set_resource_flags(rsc, pe_rsc_block); break; case action_fail_migrate: /* make sure it comes up somewhere else * or not at all */ resource_location(rsc, node, -INFINITY, "__action_migration_auto__", rsc->cluster); break; case action_fail_stop: pe__set_next_role(rsc, RSC_ROLE_STOPPED, "on-fail=stop"); break; case action_fail_recover: if (rsc->role != RSC_ROLE_STOPPED && rsc->role != RSC_ROLE_UNKNOWN) { pe__set_resource_flags(rsc, pe_rsc_failed|pe_rsc_stop); stop_action(rsc, node, FALSE); } break; case action_fail_restart_container: pe__set_resource_flags(rsc, pe_rsc_failed|pe_rsc_stop); if (rsc->container && pe_rsc_is_bundled(rsc)) { /* A bundle's remote connection can run on a different node than * the bundle's container. We don't necessarily know where the * container is running yet, so remember it and add a stop * action for it later. */ rsc->cluster->stop_needed = g_list_prepend(rsc->cluster->stop_needed, rsc->container); } else if (rsc->container) { stop_action(rsc->container, node, FALSE); } else if (rsc->role != RSC_ROLE_STOPPED && rsc->role != RSC_ROLE_UNKNOWN) { stop_action(rsc, node, FALSE); } break; case action_fail_reset_remote: pe__set_resource_flags(rsc, pe_rsc_failed|pe_rsc_stop); if (pcmk_is_set(rsc->cluster->flags, pe_flag_stonith_enabled)) { tmpnode = NULL; if (rsc->is_remote_node) { tmpnode = pe_find_node(rsc->cluster->nodes, rsc->id); } if (tmpnode && pe__is_remote_node(tmpnode) && tmpnode->details->remote_was_fenced == 0) { /* The remote connection resource failed in a way that * should result in fencing the remote node. */ pe_fence_node(rsc->cluster, tmpnode, "remote connection is unrecoverable", FALSE); } } /* require the stop action regardless if fencing is occurring or not. */ if (rsc->role > RSC_ROLE_STOPPED) { stop_action(rsc, node, FALSE); } /* if reconnect delay is in use, prevent the connection from exiting the * "STOPPED" role until the failure is cleared by the delay timeout. */ if (rsc->remote_reconnect_ms) { pe__set_next_role(rsc, RSC_ROLE_STOPPED, "remote reset"); } break; } /* ensure a remote-node connection failure forces an unclean remote-node * to be fenced. By setting unseen = FALSE, the remote-node failure will * result in a fencing operation regardless if we're going to attempt to * reconnect to the remote-node in this transition or not. */ if (pcmk_is_set(rsc->flags, pe_rsc_failed) && rsc->is_remote_node) { tmpnode = pe_find_node(rsc->cluster->nodes, rsc->id); if (tmpnode && tmpnode->details->unclean) { tmpnode->details->unseen = FALSE; } } if (rsc->role != RSC_ROLE_STOPPED && rsc->role != RSC_ROLE_UNKNOWN) { if (pcmk_is_set(rsc->flags, pe_rsc_orphan)) { if (pcmk_is_set(rsc->flags, pe_rsc_managed)) { pcmk__config_warn("Detected active orphan %s running on %s", rsc->id, pe__node_name(node)); } else { pcmk__config_warn("Resource '%s' must be stopped manually on " "%s because cluster is configured not to " "stop active orphans", rsc->id, pe__node_name(node)); } } native_add_running(rsc, node, rsc->cluster, (save_on_fail != action_fail_ignore)); switch (on_fail) { case action_fail_ignore: break; case action_fail_demote: case action_fail_block: pe__set_resource_flags(rsc, pe_rsc_failed); break; default: pe__set_resource_flags(rsc, pe_rsc_failed|pe_rsc_stop); break; } } else if (rsc->clone_name && strchr(rsc->clone_name, ':') != NULL) { /* Only do this for older status sections that included instance numbers * Otherwise stopped instances will appear as orphans */ pe_rsc_trace(rsc, "Resetting clone_name %s for %s (stopped)", rsc->clone_name, rsc->id); free(rsc->clone_name); rsc->clone_name = NULL; } else { GList *possible_matches = pe__resource_actions(rsc, node, RSC_STOP, FALSE); GList *gIter = possible_matches; for (; gIter != NULL; gIter = gIter->next) { pe_action_t *stop = (pe_action_t *) gIter->data; pe__set_action_flags(stop, pe_action_optional); } g_list_free(possible_matches); } /* A successful stop after migrate_to on the migration source doesn't make * the partially migrated resource stopped on the migration target. */ if (rsc->role == RSC_ROLE_STOPPED && rsc->partial_migration_source && rsc->partial_migration_source->details == node->details && rsc->partial_migration_target && rsc->running_on) { rsc->role = RSC_ROLE_STARTED; } } /* create active recurring operations as optional */ static void process_recurring(pe_node_t * node, pe_resource_t * rsc, int start_index, int stop_index, GList *sorted_op_list, pe_working_set_t * data_set) { int counter = -1; const char *task = NULL; const char *status = NULL; GList *gIter = sorted_op_list; CRM_ASSERT(rsc); pe_rsc_trace(rsc, "%s: Start index %d, stop index = %d", rsc->id, start_index, stop_index); for (; gIter != NULL; gIter = gIter->next) { xmlNode *rsc_op = (xmlNode *) gIter->data; guint interval_ms = 0; char *key = NULL; const char *id = ID(rsc_op); counter++; if (node->details->online == FALSE) { pe_rsc_trace(rsc, "Skipping %s on %s: node is offline", rsc->id, pe__node_name(node)); break; /* Need to check if there's a monitor for role="Stopped" */ } else if (start_index < stop_index && counter <= stop_index) { pe_rsc_trace(rsc, "Skipping %s on %s: resource is not active", id, pe__node_name(node)); continue; } else if (counter < start_index) { pe_rsc_trace(rsc, "Skipping %s on %s: old %d", id, pe__node_name(node), counter); continue; } crm_element_value_ms(rsc_op, XML_LRM_ATTR_INTERVAL_MS, &interval_ms); if (interval_ms == 0) { pe_rsc_trace(rsc, "Skipping %s on %s: non-recurring", id, pe__node_name(node)); continue; } status = crm_element_value(rsc_op, XML_LRM_ATTR_OPSTATUS); if (pcmk__str_eq(status, "-1", pcmk__str_casei)) { pe_rsc_trace(rsc, "Skipping %s on %s: status", id, pe__node_name(node)); continue; } task = crm_element_value(rsc_op, XML_LRM_ATTR_TASK); /* create the action */ key = pcmk__op_key(rsc->id, task, interval_ms); pe_rsc_trace(rsc, "Creating %s on %s", key, pe__node_name(node)); custom_action(rsc, key, task, node, TRUE, TRUE, data_set); } } void calculate_active_ops(const GList *sorted_op_list, int *start_index, int *stop_index) { int counter = -1; int implied_monitor_start = -1; int implied_clone_start = -1; const char *task = NULL; const char *status = NULL; *stop_index = -1; *start_index = -1; for (const GList *iter = sorted_op_list; iter != NULL; iter = iter->next) { const xmlNode *rsc_op = (const xmlNode *) iter->data; counter++; task = crm_element_value(rsc_op, XML_LRM_ATTR_TASK); status = crm_element_value(rsc_op, XML_LRM_ATTR_OPSTATUS); if (pcmk__str_eq(task, CRMD_ACTION_STOP, pcmk__str_casei) && pcmk__str_eq(status, "0", pcmk__str_casei)) { *stop_index = counter; } else if (pcmk__strcase_any_of(task, CRMD_ACTION_START, CRMD_ACTION_MIGRATED, NULL)) { *start_index = counter; } else if ((implied_monitor_start <= *stop_index) && pcmk__str_eq(task, CRMD_ACTION_STATUS, pcmk__str_casei)) { const char *rc = crm_element_value(rsc_op, XML_LRM_ATTR_RC); if (pcmk__strcase_any_of(rc, "0", "8", NULL)) { implied_monitor_start = counter; } } else if (pcmk__strcase_any_of(task, CRMD_ACTION_PROMOTE, CRMD_ACTION_DEMOTE, NULL)) { implied_clone_start = counter; } } if (*start_index == -1) { if (implied_clone_start != -1) { *start_index = implied_clone_start; } else if (implied_monitor_start != -1) { *start_index = implied_monitor_start; } } } // If resource history entry has shutdown lock, remember lock node and time static void unpack_shutdown_lock(const xmlNode *rsc_entry, pe_resource_t *rsc, const pe_node_t *node, pe_working_set_t *data_set) { time_t lock_time = 0; // When lock started (i.e. node shutdown time) if ((crm_element_value_epoch(rsc_entry, XML_CONFIG_ATTR_SHUTDOWN_LOCK, &lock_time) == pcmk_ok) && (lock_time != 0)) { if ((data_set->shutdown_lock > 0) && (get_effective_time(data_set) > (lock_time + data_set->shutdown_lock))) { pe_rsc_info(rsc, "Shutdown lock for %s on %s expired", rsc->id, pe__node_name(node)); pe__clear_resource_history(rsc, node, data_set); } else { /* @COMPAT I don't like breaking const signatures, but * rsc->lock_node should really be const -- we just can't change it * until the next API compatibility break. */ rsc->lock_node = (pe_node_t *) node; rsc->lock_time = lock_time; } } } /*! * \internal * \brief Unpack one lrm_resource entry from a node's CIB status * * \param[in,out] node Node whose status is being unpacked * \param[in] rsc_entry lrm_resource XML being unpacked * \param[in,out] data_set Cluster working set * * \return Resource corresponding to the entry, or NULL if no operation history */ static pe_resource_t * unpack_lrm_resource(pe_node_t *node, const xmlNode *lrm_resource, pe_working_set_t *data_set) { GList *gIter = NULL; int stop_index = -1; int start_index = -1; enum rsc_role_e req_role = RSC_ROLE_UNKNOWN; const char *rsc_id = ID(lrm_resource); pe_resource_t *rsc = NULL; GList *op_list = NULL; GList *sorted_op_list = NULL; xmlNode *rsc_op = NULL; xmlNode *last_failure = NULL; enum action_fail_response on_fail = action_fail_ignore; enum rsc_role_e saved_role = RSC_ROLE_UNKNOWN; if (rsc_id == NULL) { crm_warn("Ignoring malformed " XML_LRM_TAG_RESOURCE " entry without id"); return NULL; } crm_trace("Unpacking " XML_LRM_TAG_RESOURCE " for %s on %s", rsc_id, pe__node_name(node)); // Build a list of individual lrm_rsc_op entries, so we can sort them for (rsc_op = first_named_child(lrm_resource, XML_LRM_TAG_RSC_OP); rsc_op != NULL; rsc_op = crm_next_same_xml(rsc_op)) { op_list = g_list_prepend(op_list, rsc_op); } if (!pcmk_is_set(data_set->flags, pe_flag_shutdown_lock)) { if (op_list == NULL) { // If there are no operations, there is nothing to do return NULL; } } /* find the resource */ rsc = unpack_find_resource(data_set, node, rsc_id); if (rsc == NULL) { if (op_list == NULL) { // If there are no operations, there is nothing to do return NULL; } else { rsc = process_orphan_resource(lrm_resource, node, data_set); } } CRM_ASSERT(rsc != NULL); // Check whether the resource is "shutdown-locked" to this node if (pcmk_is_set(data_set->flags, pe_flag_shutdown_lock)) { unpack_shutdown_lock(lrm_resource, rsc, node, data_set); } /* process operations */ saved_role = rsc->role; rsc->role = RSC_ROLE_UNKNOWN; sorted_op_list = g_list_sort(op_list, sort_op_by_callid); for (gIter = sorted_op_list; gIter != NULL; gIter = gIter->next) { xmlNode *rsc_op = (xmlNode *) gIter->data; unpack_rsc_op(rsc, node, rsc_op, &last_failure, &on_fail); } /* create active recurring operations as optional */ calculate_active_ops(sorted_op_list, &start_index, &stop_index); process_recurring(node, rsc, start_index, stop_index, sorted_op_list, data_set); /* no need to free the contents */ g_list_free(sorted_op_list); process_rsc_state(rsc, node, on_fail); if (get_target_role(rsc, &req_role)) { if (rsc->next_role == RSC_ROLE_UNKNOWN || req_role < rsc->next_role) { pe__set_next_role(rsc, req_role, XML_RSC_ATTR_TARGET_ROLE); } else if (req_role > rsc->next_role) { pe_rsc_info(rsc, "%s: Not overwriting calculated next role %s" " with requested next role %s", rsc->id, role2text(rsc->next_role), role2text(req_role)); } } if (saved_role > rsc->role) { rsc->role = saved_role; } return rsc; } static void handle_orphaned_container_fillers(const xmlNode *lrm_rsc_list, pe_working_set_t *data_set) { for (const xmlNode *rsc_entry = pcmk__xe_first_child(lrm_rsc_list); rsc_entry != NULL; rsc_entry = pcmk__xe_next(rsc_entry)) { pe_resource_t *rsc; pe_resource_t *container; const char *rsc_id; const char *container_id; if (!pcmk__str_eq((const char *)rsc_entry->name, XML_LRM_TAG_RESOURCE, pcmk__str_casei)) { continue; } container_id = crm_element_value(rsc_entry, XML_RSC_ATTR_CONTAINER); rsc_id = crm_element_value(rsc_entry, XML_ATTR_ID); if (container_id == NULL || rsc_id == NULL) { continue; } container = pe_find_resource(data_set->resources, container_id); if (container == NULL) { continue; } rsc = pe_find_resource(data_set->resources, rsc_id); if (rsc == NULL || !pcmk_is_set(rsc->flags, pe_rsc_orphan_container_filler) || rsc->container != NULL) { continue; } pe_rsc_trace(rsc, "Mapped container of orphaned resource %s to %s", rsc->id, container_id); rsc->container = container; container->fillers = g_list_append(container->fillers, rsc); } } /*! * \internal * \brief Unpack one node's lrm status section * * \param[in,out] node Node whose status is being unpacked * \param[in] xml CIB node state XML * \param[in,out] data_set Cluster working set */ static void unpack_node_lrm(pe_node_t *node, const xmlNode *xml, pe_working_set_t *data_set) { bool found_orphaned_container_filler = false; // Drill down to lrm_resources section xml = find_xml_node(xml, XML_CIB_TAG_LRM, FALSE); if (xml == NULL) { return; } xml = find_xml_node(xml, XML_LRM_TAG_RESOURCES, FALSE); if (xml == NULL) { return; } // Unpack each lrm_resource entry for (const xmlNode *rsc_entry = first_named_child(xml, XML_LRM_TAG_RESOURCE); rsc_entry != NULL; rsc_entry = crm_next_same_xml(rsc_entry)) { pe_resource_t *rsc = unpack_lrm_resource(node, rsc_entry, data_set); if ((rsc != NULL) && pcmk_is_set(rsc->flags, pe_rsc_orphan_container_filler)) { found_orphaned_container_filler = true; } } /* Now that all resource state has been unpacked for this node, map any * orphaned container fillers to their container resource. */ if (found_orphaned_container_filler) { handle_orphaned_container_fillers(xml, data_set); } } static void set_active(pe_resource_t * rsc) { const pe_resource_t *top = pe__const_top_resource(rsc, false); if (top && pcmk_is_set(top->flags, pe_rsc_promotable)) { rsc->role = RSC_ROLE_UNPROMOTED; } else { rsc->role = RSC_ROLE_STARTED; } } static void set_node_score(gpointer key, gpointer value, gpointer user_data) { pe_node_t *node = value; int *score = user_data; node->weight = *score; } #define XPATH_NODE_STATE "/" XML_TAG_CIB "/" XML_CIB_TAG_STATUS \ "/" XML_CIB_TAG_STATE #define SUB_XPATH_LRM_RESOURCE "/" XML_CIB_TAG_LRM \ "/" XML_LRM_TAG_RESOURCES \ "/" XML_LRM_TAG_RESOURCE #define SUB_XPATH_LRM_RSC_OP "/" XML_LRM_TAG_RSC_OP static xmlNode * find_lrm_op(const char *resource, const char *op, const char *node, const char *source, int target_rc, pe_working_set_t *data_set) { GString *xpath = NULL; xmlNode *xml = NULL; CRM_CHECK((resource != NULL) && (op != NULL) && (node != NULL), return NULL); xpath = g_string_sized_new(256); pcmk__g_strcat(xpath, XPATH_NODE_STATE "[@" XML_ATTR_UNAME "='", node, "']" SUB_XPATH_LRM_RESOURCE "[@" XML_ATTR_ID "='", resource, "']" SUB_XPATH_LRM_RSC_OP "[@" XML_LRM_ATTR_TASK "='", op, "'", NULL); /* Need to check against transition_magic too? */ if ((source != NULL) && (strcmp(op, CRMD_ACTION_MIGRATE) == 0)) { pcmk__g_strcat(xpath, " and @" XML_LRM_ATTR_MIGRATE_TARGET "='", source, "']", NULL); } else if ((source != NULL) && (strcmp(op, CRMD_ACTION_MIGRATED) == 0)) { pcmk__g_strcat(xpath, " and @" XML_LRM_ATTR_MIGRATE_SOURCE "='", source, "']", NULL); } else { g_string_append_c(xpath, ']'); } xml = get_xpath_object((const char *) xpath->str, data_set->input, LOG_DEBUG); g_string_free(xpath, TRUE); if (xml && target_rc >= 0) { int rc = PCMK_OCF_UNKNOWN_ERROR; int status = PCMK_EXEC_ERROR; crm_element_value_int(xml, XML_LRM_ATTR_RC, &rc); crm_element_value_int(xml, XML_LRM_ATTR_OPSTATUS, &status); if ((rc != target_rc) || (status != PCMK_EXEC_DONE)) { return NULL; } } return xml; } static xmlNode * find_lrm_resource(const char *rsc_id, const char *node_name, pe_working_set_t *data_set) { GString *xpath = NULL; xmlNode *xml = NULL; CRM_CHECK((rsc_id != NULL) && (node_name != NULL), return NULL); xpath = g_string_sized_new(256); pcmk__g_strcat(xpath, XPATH_NODE_STATE "[@" XML_ATTR_UNAME "='", node_name, "']" SUB_XPATH_LRM_RESOURCE "[@" XML_ATTR_ID "='", rsc_id, "']", NULL); xml = get_xpath_object((const char *) xpath->str, data_set->input, LOG_DEBUG); g_string_free(xpath, TRUE); return xml; } /*! * \internal * \brief Check whether a resource has no completed action history on a node * * \param[in,out] rsc Resource to check * \param[in] node_name Node to check * * \return true if \p rsc_id is unknown on \p node_name, otherwise false */ static bool unknown_on_node(pe_resource_t *rsc, const char *node_name) { bool result = false; xmlXPathObjectPtr search; GString *xpath = g_string_sized_new(256); pcmk__g_strcat(xpath, XPATH_NODE_STATE "[@" XML_ATTR_UNAME "='", node_name, "']" SUB_XPATH_LRM_RESOURCE "[@" XML_ATTR_ID "='", rsc->id, "']" SUB_XPATH_LRM_RSC_OP "[@" XML_LRM_ATTR_RC "!='193']", NULL); search = xpath_search(rsc->cluster->input, (const char *) xpath->str); result = (numXpathResults(search) == 0); freeXpathObject(search); g_string_free(xpath, TRUE); return result; } /*! * \brief Check whether a probe/monitor indicating the resource was not running * on a node happened after some event * * \param[in] rsc_id Resource being checked * \param[in] node_name Node being checked * \param[in] xml_op Event that monitor is being compared to * \param[in] same_node Whether the operations are on the same node * \param[in,out] data_set Cluster working set * * \return true if such a monitor happened after event, false otherwise */ static bool monitor_not_running_after(const char *rsc_id, const char *node_name, const xmlNode *xml_op, bool same_node, pe_working_set_t *data_set) { /* Any probe/monitor operation on the node indicating it was not running * there */ xmlNode *monitor = find_lrm_op(rsc_id, CRMD_ACTION_STATUS, node_name, NULL, PCMK_OCF_NOT_RUNNING, data_set); return (monitor && pe__is_newer_op(monitor, xml_op, same_node) > 0); } /*! * \brief Check whether any non-monitor operation on a node happened after some * event * * \param[in] rsc_id Resource being checked * \param[in] node_name Node being checked * \param[in] xml_op Event that non-monitor is being compared to * \param[in] same_node Whether the operations are on the same node * \param[in,out] data_set Cluster working set * * \return true if such a operation happened after event, false otherwise */ static bool non_monitor_after(const char *rsc_id, const char *node_name, const xmlNode *xml_op, bool same_node, pe_working_set_t *data_set) { xmlNode *lrm_resource = NULL; lrm_resource = find_lrm_resource(rsc_id, node_name, data_set); if (lrm_resource == NULL) { return false; } for (xmlNode *op = first_named_child(lrm_resource, XML_LRM_TAG_RSC_OP); op != NULL; op = crm_next_same_xml(op)) { const char * task = NULL; if (op == xml_op) { continue; } task = crm_element_value(op, XML_LRM_ATTR_TASK); if (pcmk__str_any_of(task, CRMD_ACTION_START, CRMD_ACTION_STOP, CRMD_ACTION_MIGRATE, CRMD_ACTION_MIGRATED, NULL) && pe__is_newer_op(op, xml_op, same_node) > 0) { return true; } } return false; } /*! * \brief Check whether the resource has newer state on a node after a migration * attempt * * \param[in] rsc_id Resource being checked * \param[in] node_name Node being checked * \param[in] migrate_to Any migrate_to event that is being compared to * \param[in] migrate_from Any migrate_from event that is being compared to * \param[in,out] data_set Cluster working set * * \return true if such a operation happened after event, false otherwise */ static bool newer_state_after_migrate(const char *rsc_id, const char *node_name, const xmlNode *migrate_to, const xmlNode *migrate_from, pe_working_set_t *data_set) { const xmlNode *xml_op = migrate_to; const char *source = NULL; const char *target = NULL; bool same_node = false; if (migrate_from) { xml_op = migrate_from; } source = crm_element_value(xml_op, XML_LRM_ATTR_MIGRATE_SOURCE); target = crm_element_value(xml_op, XML_LRM_ATTR_MIGRATE_TARGET); /* It's preferred to compare to the migrate event on the same node if * existing, since call ids are more reliable. */ if (pcmk__str_eq(node_name, target, pcmk__str_casei)) { if (migrate_from) { xml_op = migrate_from; same_node = true; } else { xml_op = migrate_to; } } else if (pcmk__str_eq(node_name, source, pcmk__str_casei)) { if (migrate_to) { xml_op = migrate_to; same_node = true; } else { xml_op = migrate_from; } } /* If there's any newer non-monitor operation on the node, or any newer * probe/monitor operation on the node indicating it was not running there, * the migration events potentially no longer matter for the node. */ return non_monitor_after(rsc_id, node_name, xml_op, same_node, data_set) || monitor_not_running_after(rsc_id, node_name, xml_op, same_node, data_set); } /*! * \internal * \brief Parse migration source and target node names from history entry * * \param[in] entry Resource history entry for a migration action * \param[in] source_node If not NULL, source must match this node * \param[in] target_node If not NULL, target must match this node * \param[out] source_name Where to store migration source node name * \param[out] target_name Where to store migration target node name * * \return Standard Pacemaker return code */ static int get_migration_node_names(const xmlNode *entry, const pe_node_t *source_node, const pe_node_t *target_node, const char **source_name, const char **target_name) { *source_name = crm_element_value(entry, XML_LRM_ATTR_MIGRATE_SOURCE); *target_name = crm_element_value(entry, XML_LRM_ATTR_MIGRATE_TARGET); if ((*source_name == NULL) || (*target_name == NULL)) { crm_err("Ignoring resource history entry %s without " XML_LRM_ATTR_MIGRATE_SOURCE " and " XML_LRM_ATTR_MIGRATE_TARGET, ID(entry)); return pcmk_rc_unpack_error; } if ((source_node != NULL) && !pcmk__str_eq(*source_name, source_node->details->uname, pcmk__str_casei|pcmk__str_null_matches)) { crm_err("Ignoring resource history entry %s because " XML_LRM_ATTR_MIGRATE_SOURCE "='%s' does not match %s", ID(entry), *source_name, pe__node_name(source_node)); return pcmk_rc_unpack_error; } if ((target_node != NULL) && !pcmk__str_eq(*target_name, target_node->details->uname, pcmk__str_casei|pcmk__str_null_matches)) { crm_err("Ignoring resource history entry %s because " XML_LRM_ATTR_MIGRATE_TARGET "='%s' does not match %s", ID(entry), *target_name, pe__node_name(target_node)); return pcmk_rc_unpack_error; } return pcmk_rc_ok; } /* * \internal * \brief Add a migration source to a resource's list of dangling migrations * * If the migrate_to and migrate_from actions in a live migration both * succeeded, but there is no stop on the source, the migration is considered * "dangling." Add the source to the resource's dangling migration list, which * will be used to schedule a stop on the source without affecting the target. * * \param[in,out] rsc Resource involved in migration * \param[in] node Migration source */ static void add_dangling_migration(pe_resource_t *rsc, const pe_node_t *node) { pe_rsc_trace(rsc, "Dangling migration of %s requires stop on %s", rsc->id, pe__node_name(node)); rsc->role = RSC_ROLE_STOPPED; rsc->dangling_migrations = g_list_prepend(rsc->dangling_migrations, (gpointer) node); } /*! * \internal * \brief Update resource role etc. after a successful migrate_to action * * \param[in,out] history Parsed action result history */ static void unpack_migrate_to_success(struct action_history *history) { /* A complete migration sequence is: * 1. migrate_to on source node (which succeeded if we get to this function) * 2. migrate_from on target node * 3. stop on source node * * If no migrate_from has happened, the migration is considered to be * "partial". If the migrate_from succeeded but no stop has happened, the * migration is considered to be "dangling". * * If a successful migrate_to and stop have happened on the source node, we * still need to check for a partial migration, due to scenarios (easier to * produce with batch-limit=1) like: * * - A resource is migrating from node1 to node2, and a migrate_to is * initiated for it on node1. * * - node2 goes into standby mode while the migrate_to is pending, which * aborts the transition. * * - Upon completion of the migrate_to, a new transition schedules a stop * on both nodes and a start on node1. * * - If the new transition is aborted for any reason while the resource is * stopping on node1, the transition after that stop completes will see * the migrate_to and stop on the source, but it's still a partial * migration, and the resource must be stopped on node2 because it is * potentially active there due to the migrate_to. * * We also need to take into account that either node's history may be * cleared at any point in the migration process. */ int from_rc = PCMK_OCF_OK; int from_status = PCMK_EXEC_PENDING; pe_node_t *target_node = NULL; xmlNode *migrate_from = NULL; const char *source = NULL; const char *target = NULL; bool source_newer_op = false; bool target_newer_state = false; bool active_on_target = false; // Get source and target node names from XML if (get_migration_node_names(history->xml, history->node, NULL, &source, &target) != pcmk_rc_ok) { return; } // Check for newer state on the source source_newer_op = non_monitor_after(history->rsc->id, source, history->xml, true, history->rsc->cluster); // Check for a migrate_from action from this source on the target migrate_from = find_lrm_op(history->rsc->id, CRMD_ACTION_MIGRATED, target, source, -1, history->rsc->cluster); if (migrate_from != NULL) { if (source_newer_op) { /* There's a newer non-monitor operation on the source and a * migrate_from on the target, so this migrate_to is irrelevant to * the resource's state. */ return; } crm_element_value_int(migrate_from, XML_LRM_ATTR_RC, &from_rc); crm_element_value_int(migrate_from, XML_LRM_ATTR_OPSTATUS, &from_status); } /* If the resource has newer state on both the source and target after the * migration events, this migrate_to is irrelevant to the resource's state. */ target_newer_state = newer_state_after_migrate(history->rsc->id, target, history->xml, migrate_from, history->rsc->cluster); if (source_newer_op && target_newer_state) { return; } /* Check for dangling migration (migrate_from succeeded but stop not done). * We know there's no stop because we already returned if the target has a * migrate_from and the source has any newer non-monitor operation. */ if ((from_rc == PCMK_OCF_OK) && (from_status == PCMK_EXEC_DONE)) { add_dangling_migration(history->rsc, history->node); return; } /* Without newer state, this migrate_to implies the resource is active. * (Clones are not allowed to migrate, so role can't be promoted.) */ history->rsc->role = RSC_ROLE_STARTED; target_node = pe_find_node(history->rsc->cluster->nodes, target); active_on_target = !target_newer_state && (target_node != NULL) && target_node->details->online; if (from_status != PCMK_EXEC_PENDING) { // migrate_from failed on target if (active_on_target) { native_add_running(history->rsc, target_node, history->rsc->cluster, TRUE); } else { // Mark resource as failed, require recovery, and prevent migration pe__set_resource_flags(history->rsc, pe_rsc_failed|pe_rsc_stop); pe__clear_resource_flags(history->rsc, pe_rsc_allow_migrate); } return; } // The migrate_from is pending, complete but erased, or to be scheduled /* If there is no history at all for the resource on an online target, then * it was likely cleaned. Just return, and we'll schedule a probe. Once we * have the probe result, it will be reflected in target_newer_state. */ if ((target_node != NULL) && target_node->details->online && unknown_on_node(history->rsc, target)) { return; } if (active_on_target) { pe_node_t *source_node = pe_find_node(history->rsc->cluster->nodes, source); native_add_running(history->rsc, target_node, history->rsc->cluster, FALSE); if ((source_node != NULL) && source_node->details->online) { /* This is a partial migration: the migrate_to completed * successfully on the source, but the migrate_from has not * completed. Remember the source and target; if the newly * chosen target remains the same when we schedule actions * later, we may continue with the migration. */ history->rsc->partial_migration_target = target_node; history->rsc->partial_migration_source = source_node; } } else if (!source_newer_op) { // Mark resource as failed, require recovery, and prevent migration pe__set_resource_flags(history->rsc, pe_rsc_failed|pe_rsc_stop); pe__clear_resource_flags(history->rsc, pe_rsc_allow_migrate); } } /*! * \internal * \brief Update resource role etc. after a failed migrate_to action * * \param[in,out] history Parsed action result history */ static void unpack_migrate_to_failure(struct action_history *history) { xmlNode *target_migrate_from = NULL; const char *source = NULL; const char *target = NULL; // Get source and target node names from XML if (get_migration_node_names(history->xml, history->node, NULL, &source, &target) != pcmk_rc_ok) { return; } /* If a migration failed, we have to assume the resource is active. Clones * are not allowed to migrate, so role can't be promoted. */ history->rsc->role = RSC_ROLE_STARTED; // Check for migrate_from on the target target_migrate_from = find_lrm_op(history->rsc->id, CRMD_ACTION_MIGRATED, target, source, PCMK_OCF_OK, history->rsc->cluster); if (/* If the resource state is unknown on the target, it will likely be * probed there. * Don't just consider it running there. We will get back here anyway in * case the probe detects it's running there. */ !unknown_on_node(history->rsc, target) /* If the resource has newer state on the target after the migration * events, this migrate_to no longer matters for the target. */ && !newer_state_after_migrate(history->rsc->id, target, history->xml, target_migrate_from, history->rsc->cluster)) { /* The resource has no newer state on the target, so assume it's still * active there. * (if it is up). */ pe_node_t *target_node = pe_find_node(history->rsc->cluster->nodes, target); if (target_node && target_node->details->online) { native_add_running(history->rsc, target_node, history->rsc->cluster, FALSE); } } else if (!non_monitor_after(history->rsc->id, source, history->xml, true, history->rsc->cluster)) { /* We know the resource has newer state on the target, but this * migrate_to still matters for the source as long as there's no newer * non-monitor operation there. */ // Mark node as having dangling migration so we can force a stop later history->rsc->dangling_migrations = g_list_prepend(history->rsc->dangling_migrations, (gpointer) history->node); } } /*! * \internal * \brief Update resource role etc. after a failed migrate_from action * * \param[in,out] history Parsed action result history */ static void unpack_migrate_from_failure(struct action_history *history) { xmlNode *source_migrate_to = NULL; const char *source = NULL; const char *target = NULL; // Get source and target node names from XML if (get_migration_node_names(history->xml, NULL, history->node, &source, &target) != pcmk_rc_ok) { return; } /* If a migration failed, we have to assume the resource is active. Clones * are not allowed to migrate, so role can't be promoted. */ history->rsc->role = RSC_ROLE_STARTED; // Check for a migrate_to on the source source_migrate_to = find_lrm_op(history->rsc->id, CRMD_ACTION_MIGRATE, source, target, PCMK_OCF_OK, history->rsc->cluster); if (/* If the resource state is unknown on the source, it will likely be * probed there. * Don't just consider it running there. We will get back here anyway in * case the probe detects it's running there. */ !unknown_on_node(history->rsc, source) /* If the resource has newer state on the source after the migration * events, this migrate_from no longer matters for the source. */ && !newer_state_after_migrate(history->rsc->id, source, source_migrate_to, history->xml, history->rsc->cluster)) { /* The resource has no newer state on the source, so assume it's still * active there (if it is up). */ pe_node_t *source_node = pe_find_node(history->rsc->cluster->nodes, source); if (source_node && source_node->details->online) { native_add_running(history->rsc, source_node, history->rsc->cluster, TRUE); } } } /*! * \internal * \brief Add an action to cluster's list of failed actions * * \param[in,out] history Parsed action result history */ static void record_failed_op(struct action_history *history) { if (!(history->node->details->online)) { return; } for (const xmlNode *xIter = history->rsc->cluster->failed->children; xIter != NULL; xIter = xIter->next) { const char *key = pe__xe_history_key(xIter); const char *uname = crm_element_value(xIter, XML_ATTR_UNAME); if (pcmk__str_eq(history->key, key, pcmk__str_none) && pcmk__str_eq(uname, history->node->details->uname, pcmk__str_casei)) { crm_trace("Skipping duplicate entry %s on %s", history->key, pe__node_name(history->node)); return; } } crm_trace("Adding entry for %s on %s to failed action list", history->key, pe__node_name(history->node)); crm_xml_add(history->xml, XML_ATTR_UNAME, history->node->details->uname); crm_xml_add(history->xml, XML_LRM_ATTR_RSCID, history->rsc->id); add_node_copy(history->rsc->cluster->failed, history->xml); } static char * last_change_str(const xmlNode *xml_op) { time_t when; char *result = NULL; if (crm_element_value_epoch(xml_op, XML_RSC_OP_LAST_CHANGE, &when) == pcmk_ok) { char *when_s = pcmk__epoch2str(&when, 0); const char *p = strchr(when_s, ' '); // Skip day of week to make message shorter if ((p != NULL) && (*(++p) != '\0')) { result = strdup(p); CRM_ASSERT(result != NULL); } free(when_s); } if (result == NULL) { result = strdup("unknown time"); CRM_ASSERT(result != NULL); } return result; } /*! * \internal * \brief Compare two on-fail values * * \param[in] first One on-fail value to compare * \param[in] second The other on-fail value to compare * * \return A negative number if second is more severe than first, zero if they * are equal, or a positive number if first is more severe than second. * \note This is only needed until the action_fail_response values can be * renumbered at the next API compatibility break. */ static int cmp_on_fail(enum action_fail_response first, enum action_fail_response second) { switch (first) { case action_fail_demote: switch (second) { case action_fail_ignore: return 1; case action_fail_demote: return 0; default: return -1; } break; case action_fail_reset_remote: switch (second) { case action_fail_ignore: case action_fail_demote: case action_fail_recover: return 1; case action_fail_reset_remote: return 0; default: return -1; } break; case action_fail_restart_container: switch (second) { case action_fail_ignore: case action_fail_demote: case action_fail_recover: case action_fail_reset_remote: return 1; case action_fail_restart_container: return 0; default: return -1; } break; default: break; } switch (second) { case action_fail_demote: return (first == action_fail_ignore)? -1 : 1; case action_fail_reset_remote: switch (first) { case action_fail_ignore: case action_fail_demote: case action_fail_recover: return -1; default: return 1; } break; case action_fail_restart_container: switch (first) { case action_fail_ignore: case action_fail_demote: case action_fail_recover: case action_fail_reset_remote: return -1; default: return 1; } break; default: break; } return first - second; } /*! * \internal * \brief Ban a resource (or its clone if an anonymous instance) from all nodes * * \param[in,out] rsc Resource to ban */ static void ban_from_all_nodes(pe_resource_t *rsc) { int score = -INFINITY; pe_resource_t *fail_rsc = rsc; if (fail_rsc->parent != NULL) { pe_resource_t *parent = uber_parent(fail_rsc); if (pe_rsc_is_anon_clone(parent)) { /* For anonymous clones, if an operation with on-fail=stop fails for * any instance, the entire clone must stop. */ fail_rsc = parent; } } // Ban the resource from all nodes crm_notice("%s will not be started under current conditions", fail_rsc->id); if (fail_rsc->allowed_nodes != NULL) { g_hash_table_destroy(fail_rsc->allowed_nodes); } fail_rsc->allowed_nodes = pe__node_list2table(rsc->cluster->nodes); g_hash_table_foreach(fail_rsc->allowed_nodes, set_node_score, &score); } /*! * \internal * \brief Update resource role, failure handling, etc., after a failed action * * \param[in,out] history Parsed action result history * \param[out] last_failure Set this to action XML * \param[in,out] on_fail What should be done about the result */ static void unpack_rsc_op_failure(struct action_history *history, xmlNode **last_failure, enum action_fail_response *on_fail) { bool is_probe = false; pe_action_t *action = NULL; char *last_change_s = NULL; *last_failure = history->xml; is_probe = pcmk_xe_is_probe(history->xml); last_change_s = last_change_str(history->xml); if (!pcmk_is_set(history->rsc->cluster->flags, pe_flag_symmetric_cluster) && (history->exit_status == PCMK_OCF_NOT_INSTALLED)) { crm_trace("Unexpected result (%s%s%s) was recorded for " "%s of %s on %s at %s " CRM_XS " exit-status=%d id=%s", services_ocf_exitcode_str(history->exit_status), (pcmk__str_empty(history->exit_reason)? "" : ": "), pcmk__s(history->exit_reason, ""), (is_probe? "probe" : history->task), history->rsc->id, pe__node_name(history->node), last_change_s, history->exit_status, history->id); } else { crm_warn("Unexpected result (%s%s%s) was recorded for " "%s of %s on %s at %s " CRM_XS " exit-status=%d id=%s", services_ocf_exitcode_str(history->exit_status), (pcmk__str_empty(history->exit_reason)? "" : ": "), pcmk__s(history->exit_reason, ""), (is_probe? "probe" : history->task), history->rsc->id, pe__node_name(history->node), last_change_s, history->exit_status, history->id); if (is_probe && (history->exit_status != PCMK_OCF_OK) && (history->exit_status != PCMK_OCF_NOT_RUNNING) && (history->exit_status != PCMK_OCF_RUNNING_PROMOTED)) { /* A failed (not just unexpected) probe result could mean the user * didn't know resources will be probed even where they can't run. */ crm_notice("If it is not possible for %s to run on %s, see " "the resource-discovery option for location constraints", history->rsc->id, pe__node_name(history->node)); } record_failed_op(history); } free(last_change_s); action = custom_action(history->rsc, strdup(history->key), history->task, NULL, TRUE, FALSE, history->rsc->cluster); if (cmp_on_fail(*on_fail, action->on_fail) < 0) { pe_rsc_trace(history->rsc, "on-fail %s -> %s for %s (%s)", fail2text(*on_fail), fail2text(action->on_fail), action->uuid, history->key); *on_fail = action->on_fail; } if (strcmp(history->task, CRMD_ACTION_STOP) == 0) { resource_location(history->rsc, history->node, -INFINITY, "__stop_fail__", history->rsc->cluster); } else if (strcmp(history->task, CRMD_ACTION_MIGRATE) == 0) { unpack_migrate_to_failure(history); } else if (strcmp(history->task, CRMD_ACTION_MIGRATED) == 0) { unpack_migrate_from_failure(history); } else if (strcmp(history->task, CRMD_ACTION_PROMOTE) == 0) { history->rsc->role = RSC_ROLE_PROMOTED; } else if (strcmp(history->task, CRMD_ACTION_DEMOTE) == 0) { if (action->on_fail == action_fail_block) { history->rsc->role = RSC_ROLE_PROMOTED; pe__set_next_role(history->rsc, RSC_ROLE_STOPPED, "demote with on-fail=block"); } else if (history->exit_status == PCMK_OCF_NOT_RUNNING) { history->rsc->role = RSC_ROLE_STOPPED; } else { /* Staying in the promoted role would put the scheduler and * controller into a loop. Setting the role to unpromoted is not * dangerous because the resource will be stopped as part of * recovery, and any promotion will be ordered after that stop. */ history->rsc->role = RSC_ROLE_UNPROMOTED; } } if (is_probe && (history->exit_status == PCMK_OCF_NOT_INSTALLED)) { /* leave stopped */ pe_rsc_trace(history->rsc, "Leaving %s stopped", history->rsc->id); history->rsc->role = RSC_ROLE_STOPPED; } else if (history->rsc->role < RSC_ROLE_STARTED) { pe_rsc_trace(history->rsc, "Setting %s active", history->rsc->id); set_active(history->rsc); } pe_rsc_trace(history->rsc, "Resource %s: role=%s, unclean=%s, on_fail=%s, fail_role=%s", history->rsc->id, role2text(history->rsc->role), pcmk__btoa(history->node->details->unclean), fail2text(action->on_fail), role2text(action->fail_role)); if ((action->fail_role != RSC_ROLE_STARTED) && (history->rsc->next_role < action->fail_role)) { pe__set_next_role(history->rsc, action->fail_role, "failure"); } if (action->fail_role == RSC_ROLE_STOPPED) { ban_from_all_nodes(history->rsc); } pe_free_action(action); } /*! * \internal * \brief Block a resource with a failed action if it cannot be recovered * * If resource action is a failed stop and fencing is not possible, mark the * resource as unmanaged and blocked, since recovery cannot be done. * * \param[in,out] history Parsed action history entry */ static void block_if_unrecoverable(struct action_history *history) { char *last_change_s = NULL; if (strcmp(history->task, CRMD_ACTION_STOP) != 0) { return; // All actions besides stop are always recoverable } if (pe_can_fence(history->node->details->data_set, history->node)) { return; // Failed stops are recoverable via fencing } last_change_s = last_change_str(history->xml); pe_proc_err("No further recovery can be attempted for %s " "because %s on %s failed (%s%s%s) at %s " CRM_XS " rc=%d id=%s", history->rsc->id, history->task, pe__node_name(history->node), services_ocf_exitcode_str(history->exit_status), (pcmk__str_empty(history->exit_reason)? "" : ": "), pcmk__s(history->exit_reason, ""), last_change_s, history->exit_status, history->id); free(last_change_s); pe__clear_resource_flags(history->rsc, pe_rsc_managed); pe__set_resource_flags(history->rsc, pe_rsc_block); } /*! * \internal * \brief Update action history's execution status and why * * \param[in,out] history Parsed action history entry * \param[out] why Where to store reason for update * \param[in] value New value * \param[in] reason Description of why value was changed */ static inline void remap_because(struct action_history *history, const char **why, int value, const char *reason) { if (history->execution_status != value) { history->execution_status = value; *why = reason; } } /*! * \internal * \brief Remap informational monitor results and operation status * * For the monitor results, certain OCF codes are for providing extended information * to the user about services that aren't yet failed but not entirely healthy either. * These must be treated as the "normal" result by Pacemaker. * * For operation status, the action result can be used to determine an appropriate * status for the purposes of responding to the action. The status provided by the * executor is not directly usable since the executor does not know what was expected. * * \param[in,out] history Parsed action history entry * \param[in,out] on_fail What should be done about the result * \param[in] expired Whether result is expired * * \note If the result is remapped and the node is not shutting down or failed, * the operation will be recorded in the data set's list of failed operations * to highlight it for the user. * * \note This may update the resource's current and next role. */ static void remap_operation(struct action_history *history, enum action_fail_response *on_fail, bool expired) { bool is_probe = false; int orig_exit_status = history->exit_status; int orig_exec_status = history->execution_status; const char *why = NULL; const char *task = history->task; // Remap degraded results to their successful counterparts history->exit_status = pcmk__effective_rc(history->exit_status); if (history->exit_status != orig_exit_status) { why = "degraded result"; if (!expired && (!history->node->details->shutdown || history->node->details->online)) { record_failed_op(history); } } if (!pe_rsc_is_bundled(history->rsc) && pcmk_xe_mask_probe_failure(history->xml) && ((history->execution_status != PCMK_EXEC_DONE) || (history->exit_status != PCMK_OCF_NOT_RUNNING))) { history->execution_status = PCMK_EXEC_DONE; history->exit_status = PCMK_OCF_NOT_RUNNING; why = "equivalent probe result"; } /* If the executor reported an execution status of anything but done or * error, consider that final. But for done or error, we know better whether * it should be treated as a failure or not, because we know the expected * result. */ switch (history->execution_status) { case PCMK_EXEC_DONE: case PCMK_EXEC_ERROR: break; // These should be treated as node-fatal case PCMK_EXEC_NO_FENCE_DEVICE: case PCMK_EXEC_NO_SECRETS: remap_because(history, &why, PCMK_EXEC_ERROR_HARD, "node-fatal error"); goto remap_done; default: goto remap_done; } is_probe = pcmk_xe_is_probe(history->xml); if (is_probe) { task = "probe"; } if (history->expected_exit_status < 0) { /* Pre-1.0 Pacemaker versions, and Pacemaker 1.1.6 or earlier with * Heartbeat 2.0.7 or earlier as the cluster layer, did not include the * expected exit status in the transition key, which (along with the * similar case of a corrupted transition key in the CIB) will be * reported to this function as -1. Pacemaker 2.0+ does not support * rolling upgrades from those versions or processing of saved CIB files * from those versions, so we do not need to care much about this case. */ remap_because(history, &why, PCMK_EXEC_ERROR, "obsolete history format"); crm_warn("Expected result not found for %s on %s " "(corrupt or obsolete CIB?)", history->key, pe__node_name(history->node)); } else if (history->exit_status == history->expected_exit_status) { remap_because(history, &why, PCMK_EXEC_DONE, "expected result"); } else { remap_because(history, &why, PCMK_EXEC_ERROR, "unexpected result"); pe_rsc_debug(history->rsc, "%s on %s: expected %d (%s), got %d (%s%s%s)", history->key, pe__node_name(history->node), history->expected_exit_status, services_ocf_exitcode_str(history->expected_exit_status), history->exit_status, services_ocf_exitcode_str(history->exit_status), (pcmk__str_empty(history->exit_reason)? "" : ": "), pcmk__s(history->exit_reason, "")); } switch (history->exit_status) { case PCMK_OCF_OK: if (is_probe && (history->expected_exit_status == PCMK_OCF_NOT_RUNNING)) { char *last_change_s = last_change_str(history->xml); remap_because(history, &why, PCMK_EXEC_DONE, "probe"); pe_rsc_info(history->rsc, "Probe found %s active on %s at %s", history->rsc->id, pe__node_name(history->node), last_change_s); free(last_change_s); } break; case PCMK_OCF_NOT_RUNNING: if (is_probe || (history->expected_exit_status == history->exit_status) || !pcmk_is_set(history->rsc->flags, pe_rsc_managed)) { /* For probes, recurring monitors for the Stopped role, and * unmanaged resources, "not running" is not considered a * failure. */ remap_because(history, &why, PCMK_EXEC_DONE, "exit status"); history->rsc->role = RSC_ROLE_STOPPED; *on_fail = action_fail_ignore; pe__set_next_role(history->rsc, RSC_ROLE_UNKNOWN, "not running"); } break; case PCMK_OCF_RUNNING_PROMOTED: if (is_probe && (history->exit_status != history->expected_exit_status)) { char *last_change_s = last_change_str(history->xml); remap_because(history, &why, PCMK_EXEC_DONE, "probe"); pe_rsc_info(history->rsc, "Probe found %s active and promoted on %s at %s", history->rsc->id, pe__node_name(history->node), last_change_s); free(last_change_s); } if (!expired || (history->exit_status == history->expected_exit_status)) { history->rsc->role = RSC_ROLE_PROMOTED; } break; case PCMK_OCF_FAILED_PROMOTED: if (!expired) { history->rsc->role = RSC_ROLE_PROMOTED; } remap_because(history, &why, PCMK_EXEC_ERROR, "exit status"); break; case PCMK_OCF_NOT_CONFIGURED: remap_because(history, &why, PCMK_EXEC_ERROR_FATAL, "exit status"); break; case PCMK_OCF_UNIMPLEMENT_FEATURE: { guint interval_ms = 0; crm_element_value_ms(history->xml, XML_LRM_ATTR_INTERVAL_MS, &interval_ms); if (interval_ms == 0) { if (!expired) { block_if_unrecoverable(history); } remap_because(history, &why, PCMK_EXEC_ERROR_HARD, "exit status"); } else { remap_because(history, &why, PCMK_EXEC_NOT_SUPPORTED, "exit status"); } } break; case PCMK_OCF_NOT_INSTALLED: case PCMK_OCF_INVALID_PARAM: case PCMK_OCF_INSUFFICIENT_PRIV: if (!expired) { block_if_unrecoverable(history); } remap_because(history, &why, PCMK_EXEC_ERROR_HARD, "exit status"); break; default: if (history->execution_status == PCMK_EXEC_DONE) { char *last_change_s = last_change_str(history->xml); crm_info("Treating unknown exit status %d from %s of %s " "on %s at %s as failure", history->exit_status, task, history->rsc->id, pe__node_name(history->node), last_change_s); remap_because(history, &why, PCMK_EXEC_ERROR, "unknown exit status"); free(last_change_s); } break; } remap_done: if (why != NULL) { pe_rsc_trace(history->rsc, "Remapped %s result from [%s: %s] to [%s: %s] " "because of %s", history->key, pcmk_exec_status_str(orig_exec_status), crm_exit_str(orig_exit_status), pcmk_exec_status_str(history->execution_status), crm_exit_str(history->exit_status), why); } } // return TRUE if start or monitor last failure but parameters changed static bool should_clear_for_param_change(const xmlNode *xml_op, const char *task, pe_resource_t *rsc, pe_node_t *node) { if (!strcmp(task, "start") || !strcmp(task, "monitor")) { if (pe__bundle_needs_remote_name(rsc)) { /* We haven't allocated resources yet, so we can't reliably * substitute addr parameters for the REMOTE_CONTAINER_HACK. * When that's needed, defer the check until later. */ pe__add_param_check(xml_op, rsc, node, pe_check_last_failure, rsc->cluster); } else { op_digest_cache_t *digest_data = NULL; digest_data = rsc_action_digest_cmp(rsc, xml_op, node, rsc->cluster); switch (digest_data->rc) { case RSC_DIGEST_UNKNOWN: crm_trace("Resource %s history entry %s on %s" " has no digest to compare", rsc->id, pe__xe_history_key(xml_op), node->details->id); break; case RSC_DIGEST_MATCH: break; default: return TRUE; } } } return FALSE; } // Order action after fencing of remote node, given connection rsc static void order_after_remote_fencing(pe_action_t *action, pe_resource_t *remote_conn, pe_working_set_t *data_set) { pe_node_t *remote_node = pe_find_node(data_set->nodes, remote_conn->id); if (remote_node) { pe_action_t *fence = pe_fence_op(remote_node, NULL, TRUE, NULL, FALSE, data_set); order_actions(fence, action, pe_order_implies_then); } } static bool should_ignore_failure_timeout(const pe_resource_t *rsc, const char *task, guint interval_ms, bool is_last_failure) { /* Clearing failures of recurring monitors has special concerns. The * executor reports only changes in the monitor result, so if the * monitor is still active and still getting the same failure result, * that will go undetected after the failure is cleared. * * Also, the operation history will have the time when the recurring * monitor result changed to the given code, not the time when the * result last happened. * * @TODO We probably should clear such failures only when the failure * timeout has passed since the last occurrence of the failed result. * However we don't record that information. We could maybe approximate * that by clearing only if there is a more recent successful monitor or * stop result, but we don't even have that information at this point * since we are still unpacking the resource's operation history. * * This is especially important for remote connection resources with a * reconnect interval, so in that case, we skip clearing failures * if the remote node hasn't been fenced. */ if (rsc->remote_reconnect_ms && pcmk_is_set(rsc->cluster->flags, pe_flag_stonith_enabled) && (interval_ms != 0) && pcmk__str_eq(task, CRMD_ACTION_STATUS, pcmk__str_casei)) { pe_node_t *remote_node = pe_find_node(rsc->cluster->nodes, rsc->id); if (remote_node && !remote_node->details->remote_was_fenced) { if (is_last_failure) { crm_info("Waiting to clear monitor failure for remote node %s" " until fencing has occurred", rsc->id); } return TRUE; } } return FALSE; } /*! * \internal * \brief Check operation age and schedule failure clearing when appropriate * * This function has two distinct purposes. The first is to check whether an * operation history entry is expired (i.e. the resource has a failure timeout, * the entry is older than the timeout, and the resource either has no fail * count or its fail count is entirely older than the timeout). The second is to * schedule fail count clearing when appropriate (i.e. the operation is expired * and either the resource has an expired fail count or the operation is a * last_failure for a remote connection resource with a reconnect interval, * or the operation is a last_failure for a start or monitor operation and the * resource's parameters have changed since the operation). * * \param[in,out] history Parsed action result history * * \return true if operation history entry is expired, otherwise false */ static bool check_operation_expiry(struct action_history *history) { bool expired = false; bool is_last_failure = pcmk__ends_with(history->id, "_last_failure_0"); time_t last_run = 0; int unexpired_fail_count = 0; const char *clear_reason = NULL; if (history->execution_status == PCMK_EXEC_NOT_INSTALLED) { pe_rsc_trace(history->rsc, "Resource history entry %s on %s is not expired: " "Not Installed does not expire", history->id, pe__node_name(history->node)); return false; // "Not installed" must always be cleared manually } if ((history->rsc->failure_timeout > 0) && (crm_element_value_epoch(history->xml, XML_RSC_OP_LAST_CHANGE, &last_run) == 0)) { // Resource has a failure-timeout, and history entry has a timestamp time_t now = get_effective_time(history->rsc->cluster); time_t last_failure = 0; // Is this particular operation history older than the failure timeout? if ((now >= (last_run + history->rsc->failure_timeout)) && !should_ignore_failure_timeout(history->rsc, history->task, history->interval_ms, is_last_failure)) { expired = true; } // Does the resource as a whole have an unexpired fail count? unexpired_fail_count = pe_get_failcount(history->node, history->rsc, &last_failure, pe_fc_effective, history->xml); // Update scheduler recheck time according to *last* failure crm_trace("%s@%lld is %sexpired @%lld with unexpired_failures=%d timeout=%ds" " last-failure@%lld", history->id, (long long) last_run, (expired? "" : "not "), (long long) now, unexpired_fail_count, history->rsc->failure_timeout, (long long) last_failure); last_failure += history->rsc->failure_timeout + 1; if (unexpired_fail_count && (now < last_failure)) { pe__update_recheck_time(last_failure, history->rsc->cluster); } } if (expired) { if (pe_get_failcount(history->node, history->rsc, NULL, pe_fc_default, history->xml)) { // There is a fail count ignoring timeout if (unexpired_fail_count == 0) { // There is no fail count considering timeout clear_reason = "it expired"; } else { /* This operation is old, but there is an unexpired fail count. * In a properly functioning cluster, this should only be * possible if this operation is not a failure (otherwise the * fail count should be expired too), so this is really just a * failsafe. */ pe_rsc_trace(history->rsc, "Resource history entry %s on %s is not expired: " "Unexpired fail count", history->id, pe__node_name(history->node)); expired = false; } } else if (is_last_failure && (history->rsc->remote_reconnect_ms != 0)) { /* Clear any expired last failure when reconnect interval is set, * even if there is no fail count. */ clear_reason = "reconnect interval is set"; } } if (!expired && is_last_failure && should_clear_for_param_change(history->xml, history->task, history->rsc, history->node)) { clear_reason = "resource parameters have changed"; } if (clear_reason != NULL) { // Schedule clearing of the fail count pe_action_t *clear_op = pe__clear_failcount(history->rsc, history->node, clear_reason, history->rsc->cluster); if (pcmk_is_set(history->rsc->cluster->flags, pe_flag_stonith_enabled) && (history->rsc->remote_reconnect_ms != 0)) { /* If we're clearing a remote connection due to a reconnect * interval, we want to wait until any scheduled fencing * completes. * * We could limit this to remote_node->details->unclean, but at * this point, that's always true (it won't be reliable until * after unpack_node_history() is done). */ crm_info("Clearing %s failure will wait until any scheduled " "fencing of %s completes", history->task, history->rsc->id); order_after_remote_fencing(clear_op, history->rsc, history->rsc->cluster); } } if (expired && (history->interval_ms == 0) && pcmk__str_eq(history->task, CRMD_ACTION_STATUS, pcmk__str_none)) { switch (history->exit_status) { case PCMK_OCF_OK: case PCMK_OCF_NOT_RUNNING: case PCMK_OCF_RUNNING_PROMOTED: case PCMK_OCF_DEGRADED: case PCMK_OCF_DEGRADED_PROMOTED: // Don't expire probes that return these values pe_rsc_trace(history->rsc, "Resource history entry %s on %s is not expired: " "Probe result", history->id, pe__node_name(history->node)); expired = false; break; } } return expired; } int pe__target_rc_from_xml(const xmlNode *xml_op) { int target_rc = 0; const char *key = crm_element_value(xml_op, XML_ATTR_TRANSITION_KEY); if (key == NULL) { return -1; } decode_transition_key(key, NULL, NULL, NULL, &target_rc); return target_rc; } /*! * \internal * \brief Get the failure handling for an action * * \param[in,out] history Parsed action history entry * * \return Failure handling appropriate to action */ static enum action_fail_response get_action_on_fail(struct action_history *history) { enum action_fail_response result = action_fail_recover; pe_action_t *action = custom_action(history->rsc, strdup(history->key), history->task, NULL, TRUE, FALSE, history->rsc->cluster); result = action->on_fail; pe_free_action(action); return result; } /*! * \internal * \brief Update a resource's state for an action result * * \param[in,out] history Parsed action history entry * \param[in] exit_status Exit status to base new state on * \param[in] last_failure Resource's last_failure entry, if known * \param[in,out] on_fail Resource's current failure handling */ static void update_resource_state(struct action_history *history, int exit_status, const xmlNode *last_failure, enum action_fail_response *on_fail) { bool clear_past_failure = false; if ((exit_status == PCMK_OCF_NOT_INSTALLED) || (!pe_rsc_is_bundled(history->rsc) && pcmk_xe_mask_probe_failure(history->xml))) { history->rsc->role = RSC_ROLE_STOPPED; } else if (exit_status == PCMK_OCF_NOT_RUNNING) { clear_past_failure = true; } else if (pcmk__str_eq(history->task, CRMD_ACTION_STATUS, pcmk__str_none)) { if ((last_failure != NULL) && pcmk__str_eq(history->key, pe__xe_history_key(last_failure), pcmk__str_none)) { clear_past_failure = true; } if (history->rsc->role < RSC_ROLE_STARTED) { set_active(history->rsc); } } else if (pcmk__str_eq(history->task, CRMD_ACTION_START, pcmk__str_none)) { history->rsc->role = RSC_ROLE_STARTED; clear_past_failure = true; } else if (pcmk__str_eq(history->task, CRMD_ACTION_STOP, pcmk__str_none)) { history->rsc->role = RSC_ROLE_STOPPED; clear_past_failure = true; } else if (pcmk__str_eq(history->task, CRMD_ACTION_PROMOTE, pcmk__str_none)) { history->rsc->role = RSC_ROLE_PROMOTED; clear_past_failure = true; } else if (pcmk__str_eq(history->task, CRMD_ACTION_DEMOTE, pcmk__str_none)) { if (*on_fail == action_fail_demote) { // Demote clears an error only if on-fail=demote clear_past_failure = true; } history->rsc->role = RSC_ROLE_UNPROMOTED; } else if (pcmk__str_eq(history->task, CRMD_ACTION_MIGRATED, pcmk__str_none)) { history->rsc->role = RSC_ROLE_STARTED; clear_past_failure = true; } else if (pcmk__str_eq(history->task, CRMD_ACTION_MIGRATE, pcmk__str_none)) { unpack_migrate_to_success(history); } else if (history->rsc->role < RSC_ROLE_STARTED) { pe_rsc_trace(history->rsc, "%s active on %s", history->rsc->id, pe__node_name(history->node)); set_active(history->rsc); } if (!clear_past_failure) { return; } switch (*on_fail) { case action_fail_stop: case action_fail_fence: case action_fail_migrate: case action_fail_standby: pe_rsc_trace(history->rsc, "%s (%s) is not cleared by a completed %s", history->rsc->id, fail2text(*on_fail), history->task); break; case action_fail_block: case action_fail_ignore: case action_fail_demote: case action_fail_recover: case action_fail_restart_container: *on_fail = action_fail_ignore; pe__set_next_role(history->rsc, RSC_ROLE_UNKNOWN, "clear past failures"); break; case action_fail_reset_remote: if (history->rsc->remote_reconnect_ms == 0) { /* With no reconnect interval, the connection is allowed to * start again after the remote node is fenced and * completely stopped. (With a reconnect interval, we wait * for the failure to be cleared entirely before attempting * to reconnect.) */ *on_fail = action_fail_ignore; pe__set_next_role(history->rsc, RSC_ROLE_UNKNOWN, "clear past failures and reset remote"); } break; } } /*! * \internal * \brief Check whether a given history entry matters for resource state * * \param[in] history Parsed action history entry * * \return true if action can affect resource state, otherwise false */ static inline bool can_affect_state(struct action_history *history) { #if 0 /* @COMPAT It might be better to parse only actions we know we're interested * in, rather than exclude a couple we don't. However that would be a * behavioral change that should be done at a major or minor series release. * Currently, unknown operations can affect whether a resource is considered * active and/or failed. */ return pcmk__str_any_of(history->task, CRMD_ACTION_STATUS, CRMD_ACTION_START, CRMD_ACTION_STOP, CRMD_ACTION_PROMOTE, CRMD_ACTION_DEMOTE, CRMD_ACTION_MIGRATE, CRMD_ACTION_MIGRATED, "asyncmon", NULL); #else return !pcmk__str_any_of(history->task, CRMD_ACTION_NOTIFY, CRMD_ACTION_METADATA, NULL); #endif } /*! * \internal * \brief Unpack execution/exit status and exit reason from a history entry * * \param[in,out] history Action history entry to unpack * * \return Standard Pacemaker return code */ static int unpack_action_result(struct action_history *history) { if ((crm_element_value_int(history->xml, XML_LRM_ATTR_OPSTATUS, &(history->execution_status)) < 0) || (history->execution_status < PCMK_EXEC_PENDING) || (history->execution_status > PCMK_EXEC_MAX) || (history->execution_status == PCMK_EXEC_CANCELLED)) { crm_err("Ignoring resource history entry %s for %s on %s " "with invalid " XML_LRM_ATTR_OPSTATUS " '%s'", history->id, history->rsc->id, pe__node_name(history->node), pcmk__s(crm_element_value(history->xml, XML_LRM_ATTR_OPSTATUS), "")); return pcmk_rc_unpack_error; } if ((crm_element_value_int(history->xml, XML_LRM_ATTR_RC, &(history->exit_status)) < 0) || (history->exit_status < 0) || (history->exit_status > CRM_EX_MAX)) { #if 0 /* @COMPAT We should ignore malformed entries, but since that would * change behavior, it should be done at a major or minor series * release. */ crm_err("Ignoring resource history entry %s for %s on %s " "with invalid " XML_LRM_ATTR_RC " '%s'", history->id, history->rsc->id, pe__node_name(history->node), pcmk__s(crm_element_value(history->xml, XML_LRM_ATTR_RC), "")); return pcmk_rc_unpack_error; #else history->exit_status = CRM_EX_ERROR; #endif } history->exit_reason = crm_element_value(history->xml, XML_LRM_ATTR_EXIT_REASON); return pcmk_rc_ok; } /*! * \internal * \brief Process an action history entry whose result expired * * \param[in,out] history Parsed action history entry * \param[in] orig_exit_status Action exit status before remapping * * \return Standard Pacemaker return code (in particular, pcmk_rc_ok means the * entry needs no further processing) */ static int process_expired_result(struct action_history *history, int orig_exit_status) { if (!pe_rsc_is_bundled(history->rsc) && pcmk_xe_mask_probe_failure(history->xml) && (orig_exit_status != history->expected_exit_status)) { if (history->rsc->role <= RSC_ROLE_STOPPED) { history->rsc->role = RSC_ROLE_UNKNOWN; } crm_trace("Ignoring resource history entry %s for probe of %s on %s: " "Masked failure expired", history->id, history->rsc->id, pe__node_name(history->node)); return pcmk_rc_ok; } if (history->exit_status == history->expected_exit_status) { return pcmk_rc_undetermined; // Only failures expire } if (history->interval_ms == 0) { crm_notice("Ignoring resource history entry %s for %s of %s on %s: " "Expired failure", history->id, history->task, history->rsc->id, pe__node_name(history->node)); return pcmk_rc_ok; } if (history->node->details->online && !history->node->details->unclean) { /* Reschedule the recurring action. schedule_cancel() won't work at * this stage, so as a hacky workaround, forcibly change the restart * digest so pcmk__check_action_config() does what we want later. * * @TODO We should skip this if there is a newer successful monitor. * Also, this causes rescheduling only if the history entry * has an op-digest (which the expire-non-blocked-failure * scheduler regression test doesn't, but that may not be a * realistic scenario in production). */ crm_notice("Rescheduling %s-interval %s of %s on %s " "after failure expired", pcmk__readable_interval(history->interval_ms), history->task, history->rsc->id, pe__node_name(history->node)); crm_xml_add(history->xml, XML_LRM_ATTR_RESTART_DIGEST, "calculated-failure-timeout"); return pcmk_rc_ok; } return pcmk_rc_undetermined; } /*! * \internal * \brief Process a masked probe failure * * \param[in,out] history Parsed action history entry * \param[in] orig_exit_status Action exit status before remapping * \param[in] last_failure Resource's last_failure entry, if known * \param[in,out] on_fail Resource's current failure handling */ static void mask_probe_failure(struct action_history *history, int orig_exit_status, const xmlNode *last_failure, enum action_fail_response *on_fail) { pe_resource_t *ban_rsc = history->rsc; if (!pcmk_is_set(history->rsc->flags, pe_rsc_unique)) { ban_rsc = uber_parent(history->rsc); } crm_notice("Treating probe result '%s' for %s on %s as 'not running'", services_ocf_exitcode_str(orig_exit_status), history->rsc->id, pe__node_name(history->node)); update_resource_state(history, history->expected_exit_status, last_failure, on_fail); crm_xml_add(history->xml, XML_ATTR_UNAME, history->node->details->uname); record_failed_op(history); resource_location(ban_rsc, history->node, -INFINITY, "masked-probe-failure", history->rsc->cluster); } /*! * \internal Check whether a given failure is for a given pending action * * \param[in] history Parsed history entry for pending action * \param[in] last_failure Resource's last_failure entry, if known * * \return true if \p last_failure is failure of pending action in \p history, * otherwise false * \note Both \p history and \p last_failure must come from the same * lrm_resource block, as node and resource are assumed to be the same. */ static bool failure_is_newer(const struct action_history *history, const xmlNode *last_failure) { guint failure_interval_ms = 0U; long long failure_change = 0LL; long long this_change = 0LL; if (last_failure == NULL) { return false; // Resource has no last_failure entry } if (!pcmk__str_eq(history->task, crm_element_value(last_failure, XML_LRM_ATTR_TASK), pcmk__str_none)) { return false; // last_failure is for different action } if ((crm_element_value_ms(last_failure, XML_LRM_ATTR_INTERVAL_MS, &failure_interval_ms) != pcmk_ok) || (history->interval_ms != failure_interval_ms)) { return false; // last_failure is for action with different interval } if ((pcmk__scan_ll(crm_element_value(history->xml, XML_RSC_OP_LAST_CHANGE), &this_change, 0LL) != pcmk_rc_ok) || (pcmk__scan_ll(crm_element_value(last_failure, XML_RSC_OP_LAST_CHANGE), &failure_change, 0LL) != pcmk_rc_ok) || (failure_change < this_change)) { return false; // Failure is not known to be newer } return true; } /*! * \internal * \brief Update a resource's role etc. for a pending action * * \param[in,out] history Parsed history entry for pending action * \param[in] last_failure Resource's last_failure entry, if known */ static void process_pending_action(struct action_history *history, const xmlNode *last_failure) { /* For recurring monitors, a failure is recorded only in RSC_last_failure_0, * and there might be a RSC_monitor_INTERVAL entry with the last successful * or pending result. * * If last_failure contains the failure of the pending recurring monitor * we're processing here, and is newer, the action is no longer pending. * (Pending results have call ID -1, which sorts last, so the last failure * if any should be known.) */ if (failure_is_newer(history, last_failure)) { return; } if (strcmp(history->task, CRMD_ACTION_START) == 0) { pe__set_resource_flags(history->rsc, pe_rsc_start_pending); set_active(history->rsc); } else if (strcmp(history->task, CRMD_ACTION_PROMOTE) == 0) { history->rsc->role = RSC_ROLE_PROMOTED; } else if ((strcmp(history->task, CRMD_ACTION_MIGRATE) == 0) && history->node->details->unclean) { /* A migrate_to action is pending on a unclean source, so force a stop * on the target. */ const char *migrate_target = NULL; pe_node_t *target = NULL; migrate_target = crm_element_value(history->xml, XML_LRM_ATTR_MIGRATE_TARGET); target = pe_find_node(history->rsc->cluster->nodes, migrate_target); if (target != NULL) { stop_action(history->rsc, target, FALSE); } } if (history->rsc->pending_task != NULL) { /* There should never be multiple pending actions, but as a failsafe, * just remember the first one processed for display purposes. */ return; } if (pcmk_is_probe(history->task, history->interval_ms)) { /* Pending probes are currently never displayed, even if pending * operations are requested. If we ever want to change that, * enable the below and the corresponding part of * native.c:native_pending_task(). */ #if 0 history->rsc->pending_task = strdup("probe"); history->rsc->pending_node = history->node; #endif } else { history->rsc->pending_task = strdup(history->task); history->rsc->pending_node = history->node; } } static void unpack_rsc_op(pe_resource_t *rsc, pe_node_t *node, xmlNode *xml_op, xmlNode **last_failure, enum action_fail_response *on_fail) { int old_rc = 0; bool expired = false; pe_resource_t *parent = rsc; enum action_fail_response failure_strategy = action_fail_recover; struct action_history history = { .rsc = rsc, .node = node, .xml = xml_op, .execution_status = PCMK_EXEC_UNKNOWN, }; CRM_CHECK(rsc && node && xml_op, return); history.id = ID(xml_op); if (history.id == NULL) { crm_err("Ignoring resource history entry for %s on %s without ID", rsc->id, pe__node_name(node)); return; } // Task and interval history.task = crm_element_value(xml_op, XML_LRM_ATTR_TASK); if (history.task == NULL) { crm_err("Ignoring resource history entry %s for %s on %s without " XML_LRM_ATTR_TASK, history.id, rsc->id, pe__node_name(node)); return; } crm_element_value_ms(xml_op, XML_LRM_ATTR_INTERVAL_MS, &(history.interval_ms)); if (!can_affect_state(&history)) { pe_rsc_trace(rsc, "Ignoring resource history entry %s for %s on %s " "with irrelevant action '%s'", history.id, rsc->id, pe__node_name(node), history.task); return; } if (unpack_action_result(&history) != pcmk_rc_ok) { return; // Error already logged } history.expected_exit_status = pe__target_rc_from_xml(xml_op); history.key = pe__xe_history_key(xml_op); crm_element_value_int(xml_op, XML_LRM_ATTR_CALLID, &(history.call_id)); pe_rsc_trace(rsc, "Unpacking %s (%s call %d on %s): %s (%s)", history.id, history.task, history.call_id, pe__node_name(node), pcmk_exec_status_str(history.execution_status), crm_exit_str(history.exit_status)); if (node->details->unclean) { pe_rsc_trace(rsc, "%s is running on %s, which is unclean (further action " "depends on value of stop's on-fail attribute)", rsc->id, pe__node_name(node)); } expired = check_operation_expiry(&history); old_rc = history.exit_status; remap_operation(&history, on_fail, expired); if (expired && (process_expired_result(&history, old_rc) == pcmk_rc_ok)) { goto done; } if (!pe_rsc_is_bundled(rsc) && pcmk_xe_mask_probe_failure(xml_op)) { mask_probe_failure(&history, old_rc, *last_failure, on_fail); goto done; } if (!pcmk_is_set(rsc->flags, pe_rsc_unique)) { parent = uber_parent(rsc); } switch (history.execution_status) { case PCMK_EXEC_PENDING: process_pending_action(&history, *last_failure); goto done; case PCMK_EXEC_DONE: update_resource_state(&history, history.exit_status, *last_failure, on_fail); goto done; case PCMK_EXEC_NOT_INSTALLED: failure_strategy = get_action_on_fail(&history); if (failure_strategy == action_fail_ignore) { crm_warn("Cannot ignore failed %s of %s on %s: " "Resource agent doesn't exist " CRM_XS " status=%d rc=%d id=%s", history.task, rsc->id, pe__node_name(node), history.execution_status, history.exit_status, history.id); /* Also for printing it as "FAILED" by marking it as pe_rsc_failed later */ *on_fail = action_fail_migrate; } resource_location(parent, node, -INFINITY, "hard-error", rsc->cluster); unpack_rsc_op_failure(&history, last_failure, on_fail); goto done; case PCMK_EXEC_NOT_CONNECTED: if (pe__is_guest_or_remote_node(node) && pcmk_is_set(node->details->remote_rsc->flags, pe_rsc_managed)) { /* We should never get into a situation where a managed remote * connection resource is considered OK but a resource action * behind the connection gets a "not connected" status. But as a * fail-safe in case a bug or unusual circumstances do lead to * that, ensure the remote connection is considered failed. */ pe__set_resource_flags(node->details->remote_rsc, pe_rsc_failed|pe_rsc_stop); } break; // Not done, do error handling case PCMK_EXEC_ERROR: case PCMK_EXEC_ERROR_HARD: case PCMK_EXEC_ERROR_FATAL: case PCMK_EXEC_TIMEOUT: case PCMK_EXEC_NOT_SUPPORTED: case PCMK_EXEC_INVALID: break; // Not done, do error handling default: // No other value should be possible at this point break; } failure_strategy = get_action_on_fail(&history); if ((failure_strategy == action_fail_ignore) || (failure_strategy == action_fail_restart_container && (strcmp(history.task, CRMD_ACTION_STOP) == 0))) { char *last_change_s = last_change_str(xml_op); crm_warn("Pretending failed %s (%s%s%s) of %s on %s at %s succeeded " CRM_XS " %s", history.task, services_ocf_exitcode_str(history.exit_status), (pcmk__str_empty(history.exit_reason)? "" : ": "), pcmk__s(history.exit_reason, ""), rsc->id, pe__node_name(node), last_change_s, history.id); free(last_change_s); update_resource_state(&history, history.expected_exit_status, *last_failure, on_fail); crm_xml_add(xml_op, XML_ATTR_UNAME, node->details->uname); pe__set_resource_flags(rsc, pe_rsc_failure_ignored); record_failed_op(&history); if ((failure_strategy == action_fail_restart_container) && cmp_on_fail(*on_fail, action_fail_recover) <= 0) { *on_fail = failure_strategy; } } else { unpack_rsc_op_failure(&history, last_failure, on_fail); if (history.execution_status == PCMK_EXEC_ERROR_HARD) { uint8_t log_level = LOG_ERR; if (history.exit_status == PCMK_OCF_NOT_INSTALLED) { log_level = LOG_NOTICE; } do_crm_log(log_level, "Preventing %s from restarting on %s because " "of hard failure (%s%s%s) " CRM_XS " %s", parent->id, pe__node_name(node), services_ocf_exitcode_str(history.exit_status), (pcmk__str_empty(history.exit_reason)? "" : ": "), pcmk__s(history.exit_reason, ""), history.id); resource_location(parent, node, -INFINITY, "hard-error", rsc->cluster); } else if (history.execution_status == PCMK_EXEC_ERROR_FATAL) { crm_err("Preventing %s from restarting anywhere because " "of fatal failure (%s%s%s) " CRM_XS " %s", parent->id, services_ocf_exitcode_str(history.exit_status), (pcmk__str_empty(history.exit_reason)? "" : ": "), pcmk__s(history.exit_reason, ""), history.id); resource_location(parent, NULL, -INFINITY, "fatal-error", rsc->cluster); } } done: pe_rsc_trace(rsc, "%s role on %s after %s is %s (next %s)", rsc->id, pe__node_name(node), history.id, role2text(rsc->role), role2text(rsc->next_role)); } static void add_node_attrs(const xmlNode *xml_obj, pe_node_t *node, bool overwrite, pe_working_set_t *data_set) { const char *cluster_name = NULL; pe_rule_eval_data_t rule_data = { .node_hash = NULL, .role = RSC_ROLE_UNKNOWN, .now = data_set->now, .match_data = NULL, .rsc_data = NULL, .op_data = NULL }; g_hash_table_insert(node->details->attrs, strdup(CRM_ATTR_UNAME), strdup(node->details->uname)); g_hash_table_insert(node->details->attrs, strdup(CRM_ATTR_ID), strdup(node->details->id)); if (pcmk__str_eq(node->details->id, data_set->dc_uuid, pcmk__str_casei)) { data_set->dc_node = node; node->details->is_dc = TRUE; g_hash_table_insert(node->details->attrs, strdup(CRM_ATTR_IS_DC), strdup(XML_BOOLEAN_TRUE)); } else { g_hash_table_insert(node->details->attrs, strdup(CRM_ATTR_IS_DC), strdup(XML_BOOLEAN_FALSE)); } cluster_name = g_hash_table_lookup(data_set->config_hash, "cluster-name"); if (cluster_name) { g_hash_table_insert(node->details->attrs, strdup(CRM_ATTR_CLUSTER_NAME), strdup(cluster_name)); } pe__unpack_dataset_nvpairs(xml_obj, XML_TAG_ATTR_SETS, &rule_data, node->details->attrs, NULL, overwrite, data_set); pe__unpack_dataset_nvpairs(xml_obj, XML_TAG_UTILIZATION, &rule_data, node->details->utilization, NULL, FALSE, data_set); if (pe_node_attribute_raw(node, CRM_ATTR_SITE_NAME) == NULL) { const char *site_name = pe_node_attribute_raw(node, "site-name"); if (site_name) { g_hash_table_insert(node->details->attrs, strdup(CRM_ATTR_SITE_NAME), strdup(site_name)); } else if (cluster_name) { /* Default to cluster-name if unset */ g_hash_table_insert(node->details->attrs, strdup(CRM_ATTR_SITE_NAME), strdup(cluster_name)); } } } static GList * extract_operations(const char *node, const char *rsc, xmlNode * rsc_entry, gboolean active_filter) { int counter = -1; int stop_index = -1; int start_index = -1; xmlNode *rsc_op = NULL; GList *gIter = NULL; GList *op_list = NULL; GList *sorted_op_list = NULL; /* extract operations */ op_list = NULL; sorted_op_list = NULL; for (rsc_op = pcmk__xe_first_child(rsc_entry); rsc_op != NULL; rsc_op = pcmk__xe_next(rsc_op)) { if (pcmk__str_eq((const char *)rsc_op->name, XML_LRM_TAG_RSC_OP, pcmk__str_none)) { crm_xml_add(rsc_op, "resource", rsc); crm_xml_add(rsc_op, XML_ATTR_UNAME, node); op_list = g_list_prepend(op_list, rsc_op); } } if (op_list == NULL) { /* if there are no operations, there is nothing to do */ return NULL; } sorted_op_list = g_list_sort(op_list, sort_op_by_callid); /* create active recurring operations as optional */ if (active_filter == FALSE) { return sorted_op_list; } op_list = NULL; calculate_active_ops(sorted_op_list, &start_index, &stop_index); for (gIter = sorted_op_list; gIter != NULL; gIter = gIter->next) { xmlNode *rsc_op = (xmlNode *) gIter->data; counter++; if (start_index < stop_index) { crm_trace("Skipping %s: not active", ID(rsc_entry)); break; } else if (counter < start_index) { crm_trace("Skipping %s: old", ID(rsc_op)); continue; } op_list = g_list_append(op_list, rsc_op); } g_list_free(sorted_op_list); return op_list; } GList * find_operations(const char *rsc, const char *node, gboolean active_filter, pe_working_set_t * data_set) { GList *output = NULL; GList *intermediate = NULL; xmlNode *tmp = NULL; xmlNode *status = find_xml_node(data_set->input, XML_CIB_TAG_STATUS, TRUE); pe_node_t *this_node = NULL; xmlNode *node_state = NULL; for (node_state = pcmk__xe_first_child(status); node_state != NULL; node_state = pcmk__xe_next(node_state)) { if (pcmk__str_eq((const char *)node_state->name, XML_CIB_TAG_STATE, pcmk__str_none)) { const char *uname = crm_element_value(node_state, XML_ATTR_UNAME); if (node != NULL && !pcmk__str_eq(uname, node, pcmk__str_casei)) { continue; } this_node = pe_find_node(data_set->nodes, uname); if(this_node == NULL) { CRM_LOG_ASSERT(this_node != NULL); continue; } else if (pe__is_guest_or_remote_node(this_node)) { determine_remote_online_status(data_set, this_node); } else { determine_online_status(node_state, this_node, data_set); } if (this_node->details->online || pcmk_is_set(data_set->flags, pe_flag_stonith_enabled)) { /* offline nodes run no resources... * unless stonith is enabled in which case we need to * make sure rsc start events happen after the stonith */ xmlNode *lrm_rsc = NULL; tmp = find_xml_node(node_state, XML_CIB_TAG_LRM, FALSE); tmp = find_xml_node(tmp, XML_LRM_TAG_RESOURCES, FALSE); for (lrm_rsc = pcmk__xe_first_child(tmp); lrm_rsc != NULL; lrm_rsc = pcmk__xe_next(lrm_rsc)) { if (pcmk__str_eq((const char *)lrm_rsc->name, XML_LRM_TAG_RESOURCE, pcmk__str_none)) { const char *rsc_id = crm_element_value(lrm_rsc, XML_ATTR_ID); if (rsc != NULL && !pcmk__str_eq(rsc_id, rsc, pcmk__str_casei)) { continue; } intermediate = extract_operations(uname, rsc_id, lrm_rsc, active_filter); output = g_list_concat(output, intermediate); } } } } } return output; }