diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 1470d44ed4..09a8961608 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -1,12 +1,12 @@ # Contributing to the Pacemaker project If you find Pacemaker useful and wish to support the project, you can: * Spread the word (on blogs, social media, mailing lists, Q&A websites, etc.). * Join and participate in the [mailing lists](https://clusterlabs.org/mailman/listinfo/). * Report [bugs and new feature requests](https://bugs.clusterlabs.org/). * Contribute documentation, bug fixes, or features to the code base. If you would like to contribute code base changes, please read -[Pacemaker Development](https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html-single/Pacemaker_Development/index.html) +[Pacemaker Development](https://clusterlabs.org/pacemaker/doc/2.1/Pacemaker_Development/html/) for detailed information about pull requests and policies. diff --git a/doc/crm_fencing.txt b/doc/crm_fencing.txt index eb706c4bbe..26acde7671 100644 --- a/doc/crm_fencing.txt +++ b/doc/crm_fencing.txt @@ -1,439 +1,439 @@ Fencing and Stonith =================== Dejan_Muhamedagic v0.9 Fencing is a very important concept in computer clusters for HA (High Availability). Unfortunately, given that fencing does not offer a visible service to users, it is often neglected. Fencing may be defined as a method to bring an HA cluster to a known state. But, what is a "cluster state" after all? To answer that question we have to see what is in the cluster. == Introduction to HA clusters Any computer cluster may be loosely defined as a collection of cooperating computers or nodes. Nodes talk to each other over communication channels, which are typically standard network connections, such as Ethernet. The main purpose of an HA cluster is to manage user services. Typical examples of user services are an Apache web server or, say, a MySQL database. From the user's point of view, the services do some specific and hopefully useful work when ordered to do so. To the cluster, however, they are just things which may be started or stopped. This distinction is important, because the nature of the service is irrelevant to the cluster. In the cluster lingo, the user services are known as resources. Every resource has a state attached, for instance: "resource r1 is started on node1". In an HA cluster, such state implies that "resource r1 is stopped on all nodes but node1", because an HA cluster must make sure that every resource may be started on at most one node. A collection of resource states and node states is the cluster state. Every node must report every change that happens to resources. This may happen only for the running resources, because a node should not start resources unless told so by somebody. That somebody is the Cluster Resource Manager (CRM) in our case. So far so good. But what if, for whatever reason, we cannot establish with certainty a state of some node or resource? This is where fencing comes in. With fencing, even when the cluster doesn't know what is happening on some node, we can make sure that that node doesn't run any or certain important resources. If you wonder how this can happen, there may be many risks involved with computing: reckless people, power outages, natural disasters, rodents, thieves, software bugs, just to name a few. We are sure that at least a few times your computer failed unpredictably. == Fencing There are two kinds of fencing: resource level and node level. 
Using the resource level fencing the cluster can make sure that a node cannot access one or more resources. One typical example is a SAN, where a fencing operation changes rules on a SAN switch to deny access from a node. The resource level fencing may be achieved using normal resources on which the resource we want to protect would depend. Such a resource would simply refuse to start on this node and therefore resources which depend on it will be unrunnable on the same node as well. The node level fencing makes sure that a node does not run any resources at all. This is usually done in a very simple, yet brutal way: the node is simply reset using a power switch. This may ultimately be necessary because the node may not be responsive at all. The node level fencing is our primary subject below. == Node level fencing devices Before we get into the configuration details, you need to pick a fencing device for the node level fencing. There are quite a few to choose from. If you want to see the list of stonith devices which are supported just run: stonith -L Stonith devices may be classified into five categories: - UPS (Uninterruptible Power Supply) - PDU (Power Distribution Unit) - Blade power control devices - Lights-out devices - Testing devices The choice depends mainly on your budget and the kind of hardware. For instance, if you're running a cluster on a set of blades, then the power control device in the blade enclosure is the only candidate for fencing. Of course, this device must be capable of managing single blade computers. The lights-out devices (IBM RSA, HP iLO, Dell DRAC) are becoming increasingly popular and in future they may even become standard equipment of of-the-shelf computers. They are, however, inferior to UPS devices, because they share a power supply with their host (a cluster node). If a node stays without power, the device supposed to control it would be just as useless. Even though this is obvious to us, the cluster manager is not in the know and will try to fence the node in vain. This will continue forever because all other resource operations would wait for the fencing/stonith operation to succeed. The testing devices are used exclusively for testing purposes. They are usually more gentle on the hardware. Once the cluster goes into production, they must be replaced with real fencing devices. == STONITH (Shoot The Other Node In The Head) Stonith is our fencing implementation. It provides the node level fencing. .NB The stonith and fencing terms are often used interchangeably here as well as in other texts. The stonith subsystem consists of two components: - pacemaker-fenced - stonith plugins === pacemaker-fenced pacemaker-fenced is a daemon which may be accessed by the local processes or over the network. It accepts commands which correspond to fencing operations: reset, power-off, and power-on. It may also check the status of the fencing device. pacemaker-fenced runs on every node in the CRM HA cluster. The pacemaker-fenced instance running on the DC node receives a fencing request from the CRM. It is up to this and other pacemaker-fenced programs to carry out the desired fencing operation. === Stonith plugins For every supported fencing device there is a stonith plugin which is capable of controlling that device. A stonith plugin is the interface to the fencing device. All stonith plugins look the same to pacemaker-fenced, but are quite different on the other side reflecting the nature of the fencing device. Some plugins support more than one device. 
A typical example is ipmilan (or external/ipmi) which implements the IPMI protocol and can control any device which supports this protocol. == CRM stonith configuration The fencing configuration consists of one or more stonith resources. A stonith resource is a resource of class stonith and it is configured just like any other resource. The list of parameters (attributes) depend on and are specific to a stonith type. Use the stonith(1) program to see the list: $ stonith -t ibmhmc -n ipaddr $ stonith -t ipmilan -n hostname ipaddr port auth priv login password reset_method .NB It is easy to guess the class of a fencing device from the set of attribute names. A short help text is also available: $ stonith -t ibmhmc -h STONITH Device: ibmhmc - IBM Hardware Management Console (HMC) Use for IBM i5, p5, pSeries and OpenPower systems managed by HMC Optional parameter name managedsyspat is white-space delimited list of patterns used to match managed system names; if last character is '*', all names that begin with the pattern are matched Optional parameter name password is password for hscroot if passwordless ssh access to HMC has NOT been setup (to do so, it is necessary to create a public/private key pair with empty passphrase - see "Configure the OpenSSH client" in the redbook for more details) For more information see http://publib-b.boulder.ibm.com/redbooks.nsf/RedbookAbstracts/SG247038.html .You just said that there is pacemaker-fenced and stonith plugins. What's with these resources now? ************************** Resources of class stonith are just a representation of stonith plugins in the CIB. Well, a bit more: apart from the fencing operations, the stonith resources, just like any other, may be started and stopped and monitored. The start and stop operations are a bit of a misnomer: enable and disable would serve better, but it's too late to change that. So, these two are actually administrative operations and do not translate to any operation on the fencing device itself. Monitor, however, does translate to device status. ************************** A dummy stonith resource configuration, which may be used in some testing scenarios is very simple: configure primitive st-null stonith:null \ params hostlist="node1 node2" clone fencing st-null commit .NB ************************** All configuration examples are in the crm configuration tool syntax. To apply them, put the sample in a text file, say sample.txt and run: crm < sample.txt The configure and commit lines are omitted from further examples. ************************** An alternative configuration: primitive st-node1 stonith:null \ params hostlist="node1" primitive st-node2 stonith:null \ params hostlist="node2" location l-st-node1 st-node1 -inf: node1 location l-st-node2 st-node2 -inf: node2 This configuration is perfectly alright as far as the cluster software is concerned. The only difference to a real world configuration is that no fencing operation takes place. A more realistic, but still only for testing, is the following external/ssh configuration: primitive st-ssh stonith:external/ssh \ params hostlist="node1 node2" clone fencing st-ssh This one can also reset nodes. As you can see, this configuration is remarkably similar to the first one which features the null stonith device. .What is this clone thing? ************************** Clones are a CRM/Pacemaker feature. A clone is basically a shortcut: instead of defining _n_ identical, yet differently named resources, a single cloned resource suffices. 
By far the most common use of clones is with stonith resources if the stonith device is accessible from all nodes. ************************** The real device configuration is not much different, though some devices may require more attributes. For instance, an IBM RSA lights-out device might be configured like this: primitive st-ibmrsa-1 stonith:external/ibmrsa-telnet \ params nodename=node1 ipaddr=192.168.0.101 \ userid=USERID passwd=PASSW0RD primitive st-ibmrsa-2 stonith:external/ibmrsa-telnet \ params nodename=node2 ipaddr=192.168.0.102 \ userid=USERID passwd=PASSW0RD # st-ibmrsa-1 can run anywhere but on node1 location l-st-node1 st-ibmrsa-1 -inf: node1 # st-ibmrsa-2 can run anywhere but on node2 location l-st-node2 st-ibmrsa-2 -inf: node2 .Why those strange location constraints? ************************** There is always certain probability that the stonith operation is going to fail. Hence, a stonith operation on the node which is the executioner too is not reliable. If the node is reset, then it cannot send the notification about the fencing operation outcome. ************************** If you haven't already guessed, configuration of a UPS kind of fencing device is remarkably similar to all we have already shown. All UPS devices employ the same mechanics for fencing. What is, however, different is how the device itself is accessed. Old UPS devices, those that were considered professional, used to have just a serial port, typically connected at 1200baud using a special serial cable. Many new ones still come equipped with a serial port, but often they also sport a USB interface or an Ethernet interface. The kind of connection we may make use of depends on what the plugin supports. Let's see a few examples for the APC UPS equipment: $ stonith -t apcmaster -h STONITH Device: apcmaster - APC MasterSwitch (via telnet) NOTE: The APC MasterSwitch accepts only one (telnet) connection/session a time. When one session is active, subsequent attempts to connect to the MasterSwitch will fail. For more information see http://www.apc.com/ List of valid parameter names for apcmaster STONITH device: ipaddr login password $ stonith -t apcsmart -h STONITH Device: apcsmart - APC Smart UPS (via serial port - NOT USB!). Works with higher-end APC UPSes, like Back-UPS Pro, Smart-UPS, Matrix-UPS, etc. (Smart-UPS may have to be >= Smart-UPS 700?). See http://www.networkupstools.org/protocols/apcsmart.html for protocol compatibility details. For more information see http://www.apc.com/ List of valid parameter names for apcsmart STONITH device: ttydev hostlist The former plugin supports APC UPS with a network port and telnet protocol. The latter plugin uses the APC SMART protocol over the serial line which is supported by many different APC UPS product lines. .So, what do I use: clones, constraints, both? ************************** It depends. Depends on the nature of the fencing device. For example, if the device cannot serve more than one connection at the time, then clones won't do. Depends on how many hosts can the device manage. If it's only one, and that is always the case with lights-out devices, then again clones are right out. Depends also on the number of nodes in your cluster: the more nodes the more desirable to use clones. Finally, it is also a matter of personal preference. In short: if clones are safe to use with your configuration and if they reduce the configuration, then make cloned stonith resources. ************************** The CRM configuration is left as an exercise to the reader. 
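As a hint only, a minimal apcmaster sketch might look like the following. The IP address and credentials are placeholders, the exact parameter list should be confirmed with stonith -t apcmaster -n, and, because this device accepts only a single telnet session at a time, the resource is deliberately not cloned:

	primitive st-apc stonith:apcmaster \
		params ipaddr=192.168.0.200 login=apc password=apc \
		op monitor interval=7200s timeout=120s

The monitor interval and timeout shown here are only an illustration; the next section discusses how often the fencing device should be monitored.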
== Monitoring the fencing devices Just like any other resource, the stonith class agents also support the monitor operation. Given that we have often seen monitor either not configured or configured in a wrong way, we have decided to devote a section to the matter. Monitoring stonith resources, which is actually checking status of the corresponding fencing devices, is strongly recommended. So strongly, that we should consider a configuration without it invalid. On the one hand, though an indispensable part of an HA cluster, a fencing device, being the last line of defense, is used seldom. Very seldom and preferably never. On the other, for whatever reason, the power management equipment is known to be rather fragile on the communication side. Some devices were known to give up if there was too much broadcast traffic on the wire. Some cannot handle more than ten or so connections per minute. Some get confused or depressed if two clients try to connect at the same time. Most cannot handle more than one session at the time. The bottom line: try not to exercise your fencing device too often. It may not like it. Use monitoring regularly, yet sparingly, say once every couple of hours. The probability that within those few hours there will be a need for a fencing operation and that the power switch would fail is usually low. == Odd plugins Apart from plugins which handle real devices, some stonith plugins are a bit out of line and deserve special attention. === external/kdumpcheck Sometimes, it may be important to get a kernel core dump. This plugin may be used to check if the dump is in progress. If that is the case, then it will return true, as if the node has been fenced, which is actually true given that it cannot run any resources at the time. kdumpcheck is typically used in concert with another, real, fencing device. See README_kdumpcheck.txt for more details. === external/sbd This is a self-fencing device. It reacts to a so-called "poison pill" which may be inserted into a shared disk. On shared storage connection loss, it also makes the node commit suicide. See http://www.linux-ha.org/wiki/SBD_Fencing for more details. === meatware Strange name and a simple concept. `meatware` requires help from a human to operate. Whenever invoked, `meatware` logs a CRIT severity message which should show up on the node's console. The operator should then make sure that the node is down and issue a `meatclient(8)` command to tell `meatware` that it's OK to tell the cluster that it may consider the node dead. See `README.meatware` for more information. === null This one is probably not of much importance to the general public. It is used in various testing scenarios. `null` is an imaginary device which always behaves and always claims that it has shot a node, but never does anything. Sort of a happy-go-lucky. Do not use it unless you know what you are doing. === suicide `suicide` is a software-only device, which can reboot a node it is running on. It depends on the operating system, so it should be avoided whenever possible. But it is OK on one-node clusters. `suicide` and `null` are the only exceptions to the "don't shoot my host" rule. .What about that pacemaker-fenced? You forgot about it, eh? ************************** The pacemaker-fenced daemon, though it is really the master of ceremony, requires no configuration itself. All configuration is stored in the CIB. 
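For instance, cluster-wide fencing behaviour is tuned through ordinary cluster properties such as stonith-enabled and stonith-timeout; in crm syntax (the values are only an illustration):

	property stonith-enabled=true stonith-timeout=120s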
************************** == Resources http://www.linux-ha.org/wiki/STONITH https://www.clusterlabs.org/doc/crm_fencing.html -https://www.clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html/Pacemaker_Explained/ +https://clusterlabs.org/pacemaker/doc/2.1/Pacemaker_Explained/html/ http://techthoughts.typepad.com/managing_computers/2007/10/split-brain-quo.html diff --git a/doc/sphinx/Pacemaker_Remote/baremetal-tutorial.rst b/doc/sphinx/Pacemaker_Remote/baremetal-tutorial.rst index 063705a2bb..861d3e34a9 100644 --- a/doc/sphinx/Pacemaker_Remote/baremetal-tutorial.rst +++ b/doc/sphinx/Pacemaker_Remote/baremetal-tutorial.rst @@ -1,238 +1,238 @@ .. index:: single: remote node; walk-through Remote Node Walk-through ------------------------ **What this tutorial is:** An in-depth walk-through of how to get Pacemaker to integrate a remote node into the cluster as a node capable of running cluster resources. **What this tutorial is not:** A realistic deployment scenario. The steps shown here are meant to get users familiar with the concept of remote nodes as quickly as possible. Configure Cluster Nodes ####################### This walk-through assumes you already have a Pacemaker cluster configured. For examples, we will use a cluster with two cluster nodes named pcmk-1 and pcmk-2. You can substitute whatever your node names are, for however many nodes you have. If you are not familiar with setting up basic Pacemaker clusters, follow the walk-through in the Clusters From Scratch document before attempting this one. You will need to add the remote node's hostname (we're using **remote1** in this tutorial) to the cluster nodes' ``/etc/hosts`` files if you haven't already. This is required unless you have DNS set up in a way where remote1's address can be discovered. Execute the following on each cluster node, replacing the IP address with the actual IP address of the remote node. .. code-block:: none # cat << END >> /etc/hosts 192.168.122.10 remote1 END Configure Remote Node ##################### .. index:: single: remote node; firewall Configure Firewall on Remote Node _________________________________ Allow cluster-related services through the local firewall: .. code-block:: none # firewall-cmd --permanent --add-service=high-availability success # firewall-cmd --reload success .. NOTE:: If you are using some other firewall solution besides firewalld, simply open the following ports, which can be used by various clustering components: TCP ports 2224, 3121, and 21064, and UDP port 5405. If you run into any problems during testing, you might want to disable the firewall and SELinux entirely until you have everything working. This may create significant security issues and should not be performed on machines that will be exposed to the outside world, but may be appropriate during development and testing on a protected host. To disable security measures: .. code-block:: none # setenforce 0 # sed -i.bak "s/SELINUX=enforcing/SELINUX=permissive/g" /etc/selinux/config # systemctl mask firewalld.service # systemctl stop firewalld.service Configure pacemaker_remote on Remote Node _________________________________________ Install the pacemaker_remote daemon on the remote node. .. code-block:: none # yum install -y pacemaker-remote resource-agents pcs Integrate Remote Node into Cluster __________________________________ Integrating a remote node into the cluster is achieved through the creation of a remote node connection resource. 
The remote node connection resource both establishes the connection to the remote node and defines that the remote node exists. Note that this resource is actually internal to Pacemaker's controller. A metadata file for this resource can be found in the ``/usr/lib/ocf/resource.d/pacemaker/remote`` file that describes what options are available, but there is no actual **ocf:pacemaker:remote** resource agent script that performs any work. Before we integrate the remote node, we'll need to authorize it. .. code-block:: none # pcs host auth remote1 Now, define the remote node connection resource to our remote node, **remote1**, using the following command on any cluster node. This command creates the ocf:pacemaker:remote resource, creates and copies the key, and enables pacemaker_remote. .. code-block:: none # pcs cluster node add-remote remote1 That's it. After a moment you should see the remote node come online. The final ``pcs status`` output should look something like this, and you can see that it created the ocf:pacemaker:remote resource: .. code-block:: none # pcs status Cluster name: mycluster Cluster Summary: * Stack: corosync * Current DC: pcmk-1 (version 2.0.5-8.el8-ba59be7122) - partition with quorum * Last updated: Wed Mar 3 11:02:03 2021 * Last change: Wed Mar 3 11:01:57 2021 by root via cibadmin on pcmk-1 * 3 nodes configured * 1 resource instance configured Node List: * Online: [ pcmk-1 pcmk-2 ] * RemoteOnline: [ remote1 ] Full List of Resources: * remote1 (ocf::pacemaker:remote): Started pcmk-1 How pcs Configures the Remote ############################# To see that it created the key and copied it to all cluster nodes and the guest, run: .. code-block:: none # ls -l /etc/pacemaker To see that it enables pacemaker_remote, run: .. code-block:: none # systemctl status pacemaker_remote ● pacemaker_remote.service - Pacemaker Remote executor daemon Loaded: loaded (/usr/lib/systemd/system/pacemaker_remote.service; enabled; vendor preset: disabled) Active: active (running) since Tue 2021-03-02 10:42:40 EST; 1min 23s ago Docs: man:pacemaker-remoted - https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html-single/Pacemaker_Remote/index.html + https://clusterlabs.org/pacemaker/doc/ Main PID: 1139 (pacemaker-remot) Tasks: 1 Memory: 5.4M CGroup: /system.slice/pacemaker_remote.service └─1139 /usr/sbin/pacemaker-remoted Mar 02 10:42:40 remote1 systemd[1]: Started Pacemaker Remote executor daemon. Mar 02 10:42:40 remote1 pacemaker-remoted[1139]: notice: Additional logging available in /var/log/pacemaker/pacemaker.log Mar 02 10:42:40 remote1 pacemaker-remoted[1139]: notice: Starting Pacemaker remote executor Mar 02 10:42:41 remote1 pacemaker-remoted[1139]: notice: Pacemaker remote executor successfully started and accepting connections Starting Resources on Remote Node ################################# Once the remote node is integrated into the cluster, starting resources on a remote node is the exact same as on cluster nodes. Refer to the `Clusters from Scratch `_ document for examples of resource creation. .. WARNING:: Never involve a remote node connection resource in a resource group, colocation constraint, or order constraint. .. index:: single: remote node; fencing Fencing Remote Nodes #################### Remote nodes are fenced the same way as cluster nodes. No special considerations are required. Configure fencing resources for use with remote nodes the same as you would with cluster nodes. Note, however, that remote nodes can never 'initiate' a fencing action. 
Only cluster nodes are capable of actually executing a fencing operation against another node. Accessing Cluster Tools from a Remote Node ########################################## Besides allowing the cluster to manage resources on a remote node, pacemaker_remote has one other trick. The pacemaker_remote daemon allows nearly all the pacemaker tools (``crm_resource``, ``crm_mon``, ``crm_attribute``, etc.) to work on remote nodes natively. Try it: Run ``crm_mon`` on the remote node after pacemaker has integrated it into the cluster. These tools just work. This means resource agents such as promotable resources (which need access to tools like ``crm_attribute``) work seamlessly on the remote nodes. Higher-level command shells such as ``pcs`` may have partial support on remote nodes, but it is recommended to run them from a cluster node. Troubleshooting a Remote Connection ################################### Note: This section should not be done when the remote is connected to the cluster. Should connectivity issues occur, it can be worth verifying that the cluster nodes can contact the remote node on port 3121. Here's a trick you can use. Connect using ssh from each of the cluster nodes. The connection will get destroyed, but how it is destroyed tells you whether it worked or not. If running the ssh command on one of the cluster nodes results in this output before disconnecting, the connection works: .. code-block:: none # ssh -p 3121 remote1 ssh_exchange_identification: read: Connection reset by peer If you see one of these, the connection is not working: .. code-block:: none # ssh -p 3121 remote1 ssh: connect to host remote1 port 3121: No route to host .. code-block:: none # ssh -p 3121 remote1 ssh: connect to host remote1 port 3121: Connection refused Once you can successfully connect to the remote node from both cluster nodes, you may move on to setting up Pacemaker on the cluster nodes. diff --git a/doc/sphinx/Pacemaker_Remote/kvm-tutorial.rst b/doc/sphinx/Pacemaker_Remote/kvm-tutorial.rst index f71aa65362..92b37f1f47 100644 --- a/doc/sphinx/Pacemaker_Remote/kvm-tutorial.rst +++ b/doc/sphinx/Pacemaker_Remote/kvm-tutorial.rst @@ -1,598 +1,598 @@ .. index:: single: guest node; walk-through Guest Node Walk-through ----------------------- **What this tutorial is:** An in-depth walk-through of how to get Pacemaker to manage a KVM guest instance and integrate that guest into the cluster as a guest node. **What this tutorial is not:** A realistic deployment scenario. The steps shown here are meant to get users familiar with the concept of guest nodes as quickly as possible. Configure Cluster Nodes ####################### This walk-through assumes you already have a Pacemaker cluster configured. For examples, we will use a cluster with two cluster nodes named pcmk-1 and pcmk-2. You can substitute whatever your node names are, for however many nodes you have. If you are not familiar with setting up basic Pacemaker clusters, follow the walk-through in the Clusters From Scratch document before attempting this one. You will need to add the remote node's hostname (we're using **guest1** in this tutorial) to the cluster nodes' ``/etc/hosts`` files if you haven't already. This is required unless you have DNS set up in a way where guest1's address can be discovered. Execute the following on each cluster node, replacing the IP address with the actual IP address of the remote node. ..
code-block:: none # cat << END >> /etc/hosts 192.168.122.10 guest1 END Install Virtualization Software _______________________________ On each node within your cluster, install virt-install, libvirt, and qemu-kvm. Start and enable libvirtd. .. code-block:: none # yum install -y virt-install libvirt qemu-kvm # systemctl start libvirtd # systemctl enable libvirtd Reboot the host. .. NOTE:: While KVM is used in this example, any virtualization platform with a Pacemaker resource agent can be used to create a guest node. The resource agent needs only to support usual commands (start, stop, etc.); Pacemaker implements the **remote-node** meta-attribute, independent of the agent. Configure the KVM guest ####################### Create Guest ____________ Create a KVM guest to use as a guest node. Be sure to configure the guest with a hostname and a static IP address (as an example here, we will use guest1 and 192.168.122.10). Here's an example way to create a guest: * Download an .iso file from the `CentOS Mirrors List `_ into a directory on your cluster node. * Run the following command, using your own path for the **location** flag: .. code-block:: none # virt-install \ --name guest-vm \ --ram 1024 \ --disk path=./guest-vm.qcow2,size=1 \ --vcpus 2 \ --os-type linux \ --os-variant centos-stream8\ --network bridge=virbr0 \ --graphics none \ --console pty,target_type=serial \ --location \ --extra-args 'console=ttyS0,115200n8 serial' .. index:: single: guest node; firewall Configure Firewall on Guest ___________________________ On each guest, allow cluster-related services through the local firewall. Verify Connectivity ___________________ At this point, you should be able to ping and ssh into guests from hosts, and vice versa. Configure pacemaker_remote on Guest Node ________________________________________ Install the pacemaker_remote daemon on the guest node. Here, we also install the ``pacemaker`` package; it is not required, but it contains the dummy resource agent that we will use later for testing. .. code-block:: none # yum install -y pacemaker-remote resource-agents pcs pacemaker Integrate Guest into Cluster ############################ Now the fun part, integrating the virtual machine you've just created into the cluster. It is incredibly simple. Start the Cluster _________________ On the host, start Pacemaker. .. code-block:: none # pcs cluster start Wait for the host to become the DC. Integrate Guest Node into Cluster _________________________________ We will use the following command, which creates the VirtualDomain resource, creates and copies the key, and enables pacemaker_remote: .. code-block:: none # pcs cluster node add-guest guest1 Once the **vm-guest1** resource is started you will see **guest1** appear in the ``pcs status`` output as a node. The final ``pcs status`` output should look something like this, and you can see that it created the VirtualDomain resource: .. 
code-block:: none # pcs status Cluster name: mycluster Cluster Summary: * Stack: corosync * Current DC: pcmk-1 (version 2.0.5-8.el8-ba59be7122) - partition with quorum * Last updated: Wed Mar 17 08:37:37 2021 * Last change: Wed Mar 17 08:31:01 2021 by root via cibadmin on pcmk-1 * 3 nodes configured * 2 resource instances configured Node List: * Online: [ pcmk-1 pcmk-2 ] * GuestOnline: [ guest1@pcmk-1 ] Full List of Resources: * vm-guest1 (ocf::heartbeat:VirtualDomain): pcmk-1 Daemon Status: corosync: active/disabled pacemaker: active/disabled pcsd: active/enabled How pcs Configures the Guest ____________________________ To see that it created the key and copied it to all cluster nodes and the guest, run: .. code-block:: none # ls -l /etc/pacemaker To see that it enables pacemaker_remote, run: .. code-block:: none # systemctl status pacemaker_remote ● pacemaker_remote.service - Pacemaker Remote executor daemon Loaded: loaded (/usr/lib/systemd/system/pacemaker_remote.service; enabled; vendor preset: disabled) Active: active (running) since Wed 2021-03-17 08:31:01 EDT; 1min 5s ago Docs: man:pacemaker-remoted - https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html-single/ Pacemaker_Remote/index.html + https://clusterlabs.org/pacemaker/doc/ Main PID: 90160 (pacemaker-remot) Tasks: 1 Memory: 1.4M CGroup: /system.slice/pacemaker_remote.service └─90160 /usr/sbin/pacemaker-remoted Mar 17 08:31:01 guest1 systemd[1]: Started Pacemaker Remote executor daemon. Mar 17 08:31:01 guest1 pacemaker-remoted[90160]: notice: Additional logging available in /var/log/pacemaker/pacemaker.log Mar 17 08:31:01 guest1 pacemaker-remoted[90160]: notice: Starting Pacemaker remote executor Mar 17 08:31:01 guest1 pacemaker-remoted[90160]: notice: Pacemaker remote executor successfully started and accepting connections .. NOTE:: Pacemaker will automatically monitor pacemaker_remote connections for failure, so it is not necessary to create a recurring monitor on the **VirtualDomain** resource. Starting Resources on KVM Guest ############################### The commands below demonstrate how resources can be executed on both the guest node and the cluster node. Create a few Dummy resources. Dummy resources are real resource agents used just for testing purposes. They actually execute on the host they are assigned to just like an apache server or database would, except their execution just means a file was created. When the resource is stopped, that the file it created is removed. .. code-block:: none # pcs resource create FAKE1 ocf:pacemaker:Dummy # pcs resource create FAKE2 ocf:pacemaker:Dummy # pcs resource create FAKE3 ocf:pacemaker:Dummy # pcs resource create FAKE4 ocf:pacemaker:Dummy # pcs resource create FAKE5 ocf:pacemaker:Dummy Now check your ``pcs status`` output. In the resource section, you should see something like the following, where some of the resources started on the cluster node, and some started on the guest node. .. code-block:: none Full List of Resources: * vm-guest1 (ocf::heartbeat:VirtualDomain): Started pcmk-1 * FAKE1 (ocf::pacemaker:Dummy): Started guest1 * FAKE2 (ocf::pacemaker:Dummy): Started guest1 * FAKE3 (ocf::pacemaker:Dummy): Started pcmk-1 * FAKE4 (ocf::pacemaker:Dummy): Started guest1 * FAKE5 (ocf::pacemaker:Dummy): Started pcmk-1 The guest node, **guest1**, reacts just like any other node in the cluster. For example, pick out a resource that is running on your cluster node. For my purposes, I am picking FAKE3 from the output above. 
We can force FAKE3 to run on **guest1** in the exact same way we would any other node. .. code-block:: none # pcs constraint location FAKE3 prefers guest1 Now, looking at the bottom of the `pcs status` output, you'll see FAKE3 is on **guest1**. .. code-block:: none Full List of Resources: * vm-guest1 (ocf::heartbeat:VirtualDomain): Started pcmk-1 * FAKE1 (ocf::pacemaker:Dummy): Started guest1 * FAKE2 (ocf::pacemaker:Dummy): Started guest1 * FAKE3 (ocf::pacemaker:Dummy): Started guest1 * FAKE4 (ocf::pacemaker:Dummy): Started pcmk-1 * FAKE5 (ocf::pacemaker:Dummy): Started pcmk-1 Testing Recovery and Fencing ############################ Pacemaker's scheduler is smart enough to know that fencing a guest node associated with a virtual machine means shutting off/rebooting the virtual machine. No special configuration is necessary to make this happen. If you are interested in testing this functionality out, try stopping the guest's pacemaker_remote daemon. This would be the equivalent of abruptly terminating a cluster node's corosync membership without properly shutting it down. ssh into the guest and run this command. .. code-block:: none # kill -9 $(pidof pacemaker-remoted) Within a few seconds, your ``pcs status`` output will show a monitor failure, and the **guest1** node will not be shown while it is being recovered. .. code-block:: none # pcs status Cluster name: mycluster Cluster Summary: * Stack: corosync * Current DC: pcmk-1 (version 2.0.5-8.el8-ba59be7122) - partition with quorum * Last updated: Wed Mar 17 08:37:37 2021 * Last change: Wed Mar 17 08:31:01 2021 by root via cibadmin on pcmk-1 * 3 nodes configured * 7 resource instances configured Node List: * Online: [ pcmk-1 pcmk-2 ] * GuestOnline: [ guest1@pcmk-1 ] Full List of Resources: * vm-guest1 (ocf::heartbeat:VirtualDomain): pcmk-1 * FAKE1 (ocf::pacemaker:Dummy): Stopped * FAKE2 (ocf::pacemaker:Dummy): Stopped * FAKE3 (ocf::pacemaker:Dummy): Stopped * FAKE4 (ocf::pacemaker:Dummy): Started pcmk-1 * FAKE5 (ocf::pacemaker:Dummy): Started pcmk-1 Failed Actions: * guest1_monitor_30000 on pcmk-1 'unknown error' (1): call=8, status=Error, exitreason='none', last-rc-change='Wed Mar 17 08:32:01 2021', queued=0ms, exec=0ms Daemon Status: corosync: active/disabled pacemaker: active/disabled pcsd: active/enabled .. NOTE:: A guest node involves two resources: the one you explicitly configured creates the guest, and Pacemaker creates an implicit resource for the pacemaker_remote connection, which will be named the same as the value of the **remote-node** attribute of the explicit resource. When we killed pacemaker_remote, it is the implicit resource that failed, which is why the failed action starts with **guest1** and not **vm-guest1**. Once recovery of the guest is complete, you'll see it automatically get re-integrated into the cluster. The final ``pcs status`` output should look something like this. ..
code-block:: none # pcs status Cluster name: mycluster Cluster Summary: * Stack: corosync * Current DC: pcmk-1 (version 2.0.5-8.el8-ba59be7122) - partition with quorum * Last updated: Wed Mar 17 08:37:37 2021 * Last change: Wed Mar 17 08:31:01 2021 by root via cibadmin on pcmk-1 * 3 nodes configured * 7 resource instances configured Node List: * Online: [ pcmk-1 pcmk-2 ] * GuestOnline: [ guest1@pcmk-1 ] Full List of Resources: * vm-guest1 (ocf::heartbeat:VirtualDomain): pcmk-1 * FAKE1 (ocf::pacemaker:Dummy): Stopped * FAKE2 (ocf::pacemaker:Dummy): Stopped * FAKE3 (ocf::pacemaker:Dummy): Stopped * FAKE4 (ocf::pacemaker:Dummy): Started pcmk-1 * FAKE5 (ocf::pacemaker:Dummy): Started pcmk-1 Failed Actions: * guest1_monitor_30000 on pcmk-1 'unknown error' (1): call=8, status=Error, exitreason='none', last-rc-change='Fri Jan 12 18:08:29 2018', queued=0ms, exec=0ms Daemon Status: corosync: active/disabled pacemaker: active/disabled pcsd: active/enabled Normally, once you've investigated and addressed a failed action, you can clear the failure. However Pacemaker does not yet support cleanup for the implicitly created connection resource while the explicit resource is active. If you want to clear the failed action from the status output, stop the guest resource before clearing it. For example: .. code-block:: none # pcs resource disable vm-guest1 --wait # pcs resource cleanup guest1 # pcs resource enable vm-guest1 Accessing Cluster Tools from Guest Node ####################################### Besides allowing the cluster to manage resources on a guest node, pacemaker_remote has one other trick. The pacemaker_remote daemon allows nearly all the pacemaker tools (``crm_resource``, ``crm_mon``, ``crm_attribute``, etc.) to work on guest nodes natively. Try it: Run ``crm_mon`` on the guest after pacemaker has integrated the guest node into the cluster. These tools just work. This means resource agents such as promotable resources (which need access to tools like ``crm_attribute``) work seamlessly on the guest nodes. Higher-level command shells such as ``pcs`` may have partial support on guest nodes, but it is recommended to run them from a cluster node. Guest nodes will show up in ``crm_mon`` output as normal. For example, this is the ``crm_mon`` output after **guest1** is integrated into the cluster: .. code-block:: none Cluster name: mycluster Cluster Summary: * Stack: corosync * Current DC: pcmk-1 (version 2.0.5-8.el8-ba59be7122) - partition with quorum * Last updated: Wed Mar 17 08:37:37 2021 * Last change: Wed Mar 17 08:31:01 2021 by root via cibadmin on pcmk-1 * 2 nodes configured * 2 resource instances configured Node List: * Online: [ pcmk-1 ] * GuestOnline: [ guest1@pcmk-1 ] Full List of Resources: * vm-guest1 (ocf::heartbeat:VirtualDomain): Started pcmk-1 Now, you could place a resource, such as a webserver, on **guest1**: .. code-block:: none # pcs resource create webserver apache params configfile=/etc/httpd/conf/httpd.conf op monitor interval=30s # pcs constraint location webserver prefers guest1 Now, the crm_mon output would show: .. 
code-block:: none Cluster name: mycluster Cluster Summary: * Stack: corosync * Current DC: pcmk-1 (version 2.0.5-8.el8-ba59be7122) - partition with quorum * Last updated: Wed Mar 17 08:38:37 2021 * Last change: Wed Mar 17 08:35:01 2021 by root via cibadmin on pcmk-1 * 2 nodes configured * 3 resource instances configured Node List: * Online: [ pcmk-1 ] * GuestOnline: [ guest1@pcmk-1 ] Full List of Resources: * vm-guest1 (ocf::heartbeat:VirtualDomain): Started pcmk-1 * webserver (ocf::heartbeat::apache): Started guest1 It is worth noting that after **guest1** is integrated into the cluster, nearly all the Pacemaker command-line tools immediately become available to the guest node. This means things like ``crm_mon``, ``crm_resource``, and ``crm_attribute`` will work natively on the guest node, as long as the connection between the guest node and a cluster node exists. This is particularly important for any promotable clone resources executing on the guest node that need access to ``crm_attribute`` to set promotion scores. Mile-High View of Configuration Steps ##################################### The command used in `Integrate Guest Node into Cluster`_ does multiple things. If you'd like to each part manually, you can do so as follows. You'll see that the end result is the same: * Later, we are going to put the same authentication key with the path ``/etc/pacemaker/authkey`` on every cluster node and on every virtual machine. This secures remote communication. Run this command on your cluster node if you want to make a somewhat random key: .. code-block:: none # dd if=/dev/urandom of=/etc/pacemaker/authkey bs=4096 count=1 * To create the VirtualDomain resource agent for the management of the virtual machine, Pacemaker requires the virtual machine's xml config file to be dumped to a file -- which we can name as we'd like -- on disk. We named our virtual machine guest1; for this example, we'll dump to the file /etc/pacemaker/guest1.xml .. code-block:: none # virsh dumpxml guest1 > /etc/pacemaker/guest1.xml * Install pacemaker_remote on the virtual machine, and if a local firewall is used, allow the node to accept connections on TCP port 3121. .. code-block:: none # yum install pacemaker-remote resource-agents # firewall-cmd --add-port 3121/tcp --permanent .. NOTE:: If you just want to see this work, you may want to simply disable the local firewall and put SELinux in permissive mode while testing. This creates security risks and should not be done on a production machine exposed to the Internet, but can be appropriate for a protected test machine. * On a cluster node, create a Pacemaker VirtualDomain resource to launch the virtual machine. .. code-block:: none [root@pcmk-1 ~]# pcs resource create vm-guest1 VirtualDomain hypervisor="qemu:///system" config="vm-guest1.xml" meta Assumed agent name 'ocf:heartbeat:VirtualDomain' (deduced from 'VirtualDomain') * Now use the following command to convert the VirtualDomain resource into a guest node which we'll name guest1. By doing so, the /etc/pacemaker/authkey will get copied to the guest node and the pacemaker_remote daemon will get started and enabled on the guest node as well. .. 
code-block:: none [root@pcmk-1 ~]# pcs cluster node add-guest guest1 vm-guest1 No addresses specified for host 'guest1', using 'guest1' Sending 'pacemaker authkey' to 'guest1' guest1: successful distribution of the file 'pacemaker authkey' Requesting 'pacemaker_remote enable', 'pacemaker_remote start' on 'guest1' guest1: successful run of 'pacemaker_remote enable' guest1: successful run of 'pacemaker_remote start' * This will create CIB XML similar to the following: .. code-block:: xml .. code-block:: xml [root@pcmk-1 ~]# pcs resource status * vm-guest1 (ocf::heartbeat:VirtualDomain): Stopped [root@pcmk-1 ~]# pcs resource config Resource: vm-guest1 (class=ocf provider=heartbeat type=VirtualDomain) Attributes: config=vm-guest1.xml hypervisor=qemu:///system Meta Attrs: remote-addr=guest1 remote-node=guest1 Operations: migrate_from interval=0s timeout=60s (vm-guest1-migrate_from-interval-0s) migrate_to interval=0s timeout=120s (vm-guest1-migrate_to-interval-0s) monitor interval=10s timeout=30s (vm-guest1-monitor-interval-10s) start interval=0s timeout=90s (vm-guest1-start-interval-0s) stop interval=0s timeout=90s (vm-guest1-stop-interval-0s) The cluster will attempt to contact the virtual machine's pacemaker_remote service at the hostname **guest1** after it launches. .. NOTE:: The ID of the resource creating the virtual machine (**vm-guest1** in the above example) 'must' be different from the virtual machine's uname (**guest1** in the above example). Pacemaker will create an implicit internal resource for the pacemaker_remote connection to the guest, named with the value of **remote-node**, so that value cannot be used as the name of any other resource. Troubleshooting a Remote Connection ################################### Note: This section should not be done when the guest is connected to the cluster. Should connectivity issues occur, it can be worth verifying that the cluster nodes can contact the remote node on port 3121. Here's a trick you can use. Connect using ssh from each of the cluster nodes. The connection will get destroyed, but how it is destroyed tells you whether it worked or not. If running the ssh command on one of the cluster nodes results in this output before disconnecting, the connection works: .. code-block:: none # ssh -p 3121 guest1 ssh_exchange_identification: read: Connection reset by peer If you see one of these, the connection is not working: .. code-block:: none # ssh -p 3121 guest1 ssh: connect to host guest1 port 3121: No route to host .. code-block:: none # ssh -p 3121 guest1 ssh: connect to host guest1 port 3121: Connection refused If you see this, then the connection is working, but port 3121 is attached to SSH, which it should not be. .. code-block:: none # ssh -p 3121 guest1 kex_exchange_identification: banner line contains invalid characters Once you can successfully connect to the guest from the host, you may shutdown the guest. Pacemaker will be managing the virtual machine from this point forward. diff --git a/include/doxygen.h b/include/doxygen.h index 7fa258cd60..c90626b9ef 100644 --- a/include/doxygen.h +++ b/include/doxygen.h @@ -1,52 +1,53 @@ /* - * Copyright 2006-2019 the Pacemaker project contributors + * Copyright 2006-2021 the Pacemaker project contributors * * The version control history for this file may have further details. * * This source code is licensed under the GNU Lesser General Public License * version 2.1 or later (LGPLv2.1+) WITHOUT ANY WARRANTY. 
*/ #ifndef DOXYGEN__H # define DOXYGEN__H /** * \file * \brief Fake header file that contains doxygen documentation. - * \author Andrew Beekhof + * \author the Pacemaker project contributors * * The purpose of this file is to provide a file that can be used to create * doxygen pages. It should contain _only_ comment blocks. * * * \defgroup core Core API * \defgroup date ISO-8601 Date/Time API * \defgroup cib Configuration API * \defgroup lrmd Executor API * \defgroup pengine Scheduler API * \defgroup fencing Fencing API - * \defgroup pacemaker Pacemaker High Level API + * \defgroup pacemaker Pacemaker High-Level API */ /** * \mainpage * Welcome to the developer documentation for The Pacemaker Project! For more * information about Pacemaker, please visit the - * project web site. + * project web site. * * Here are some pointers on where to go from here. * * Using Pacemaker APIs: * - \ref core * - \ref date * - \ref cib * - \ref lrmd * - \ref pengine * - \ref fencing * - \ref pacemaker * * Contributing to the Pacemaker Project: - * - Pacemaker Development + * - Pacemaker Development */ #endif /* DOXYGEN__H */ diff --git a/tools/crm_simulate.c b/tools/crm_simulate.c index cda651ceb4..b4aa9d1951 100644 --- a/tools/crm_simulate.c +++ b/tools/crm_simulate.c @@ -1,1187 +1,1187 @@ /* * Copyright 2009-2021 the Pacemaker project contributors * * The version control history for this file may have further details. * * This source code is licensed under the GNU General Public License version 2 * or later (GPLv2+) WITHOUT ANY WARRANTY. */ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #define SUMMARY "crm_simulate - simulate a Pacemaker cluster's response to events" struct { gboolean all_actions; char *dot_file; char *graph_file; gchar *input_file; guint modified; GList *node_up; GList *node_down; GList *node_fail; GList *op_fail; GList *op_inject; gchar *output_file; gboolean print_pending; gboolean process; char *quorum; long long repeat; gboolean show_attrs; gboolean show_failcounts; gboolean show_scores; gboolean show_utilization; gboolean simulate; gboolean store; gchar *test_dir; GList *ticket_grant; GList *ticket_revoke; GList *ticket_standby; GList *ticket_activate; char *use_date; char *watchdog; char *xml_file; } options = { .print_pending = TRUE, .repeat = 1 }; cib_t *global_cib = NULL; bool action_numbers = FALSE; char *temp_shadow = NULL; extern gboolean bringing_nodes_online; crm_exit_t exit_code = CRM_EX_OK; #define INDENT " " static pcmk__supported_format_t formats[] = { PCMK__SUPPORTED_FORMAT_NONE, PCMK__SUPPORTED_FORMAT_TEXT, PCMK__SUPPORTED_FORMAT_XML, { NULL, NULL, NULL } }; static gboolean in_place_cb(const gchar *option_name, const gchar *optarg, gpointer data, GError **error) { options.store = TRUE; options.process = TRUE; options.simulate = TRUE; return TRUE; } static gboolean live_check_cb(const gchar *option_name, const gchar *optarg, gpointer data, GError **error) { if (options.xml_file) { free(options.xml_file); } options.xml_file = NULL; return TRUE; } static gboolean node_down_cb(const gchar *option_name, const gchar *optarg, gpointer data, GError **error) { options.modified++; options.node_down = g_list_append(options.node_down, (gchar *) g_strdup(optarg)); return TRUE; } static gboolean node_fail_cb(const gchar *option_name, const gchar *optarg, gpointer data, GError **error) { options.modified++; options.node_fail = g_list_append(options.node_fail, (gchar *) 
g_strdup(optarg)); return TRUE; } static gboolean node_up_cb(const gchar *option_name, const gchar *optarg, gpointer data, GError **error) { options.modified++; bringing_nodes_online = TRUE; options.node_up = g_list_append(options.node_up, (gchar *) g_strdup(optarg)); return TRUE; } static gboolean op_fail_cb(const gchar *option_name, const gchar *optarg, gpointer data, GError **error) { options.process = TRUE; options.simulate = TRUE; options.op_fail = g_list_append(options.op_fail, (gchar *) g_strdup(optarg)); return TRUE; } static gboolean op_inject_cb(const gchar *option_name, const gchar *optarg, gpointer data, GError **error) { options.modified++; options.op_inject = g_list_append(options.op_inject, (gchar *) g_strdup(optarg)); return TRUE; } static gboolean quorum_cb(const gchar *option_name, const gchar *optarg, gpointer data, GError **error) { if (options.quorum) { free(options.quorum); } options.modified++; options.quorum = strdup(optarg); return TRUE; } static gboolean save_dotfile_cb(const gchar *option_name, const gchar *optarg, gpointer data, GError **error) { if (options.dot_file) { free(options.dot_file); } options.process = TRUE; options.dot_file = strdup(optarg); return TRUE; } static gboolean save_graph_cb(const gchar *option_name, const gchar *optarg, gpointer data, GError **error) { if (options.graph_file) { free(options.graph_file); } options.process = TRUE; options.graph_file = strdup(optarg); return TRUE; } static gboolean show_scores_cb(const gchar *option_name, const gchar *optarg, gpointer data, GError **error) { options.process = TRUE; options.show_scores = TRUE; return TRUE; } static gboolean simulate_cb(const gchar *option_name, const gchar *optarg, gpointer data, GError **error) { options.process = TRUE; options.simulate = TRUE; return TRUE; } static gboolean ticket_activate_cb(const gchar *option_name, const gchar *optarg, gpointer data, GError **error) { options.modified++; options.ticket_activate = g_list_append(options.ticket_activate, (gchar *) g_strdup(optarg)); return TRUE; } static gboolean ticket_grant_cb(const gchar *option_name, const gchar *optarg, gpointer data, GError **error) { options.modified++; options.ticket_grant = g_list_append(options.ticket_grant, (gchar *) g_strdup(optarg)); return TRUE; } static gboolean ticket_revoke_cb(const gchar *option_name, const gchar *optarg, gpointer data, GError **error) { options.modified++; options.ticket_revoke = g_list_append(options.ticket_revoke, (gchar *) g_strdup(optarg)); return TRUE; } static gboolean ticket_standby_cb(const gchar *option_name, const gchar *optarg, gpointer data, GError **error) { options.modified++; options.ticket_standby = g_list_append(options.ticket_standby, (gchar *) g_strdup(optarg)); return TRUE; } static gboolean utilization_cb(const gchar *option_name, const gchar *optarg, gpointer data, GError **error) { options.process = TRUE; options.show_utilization = TRUE; return TRUE; } static gboolean watchdog_cb(const gchar *option_name, const gchar *optarg, gpointer data, GError **error) { if (options.watchdog) { free(options.watchdog); } options.modified++; options.watchdog = strdup(optarg); return TRUE; } static gboolean xml_file_cb(const gchar *option_name, const gchar *optarg, gpointer data, GError **error) { if (options.xml_file) { free(options.xml_file); } options.xml_file = strdup(optarg); return TRUE; } static gboolean xml_pipe_cb(const gchar *option_name, const gchar *optarg, gpointer data, GError **error) { if (options.xml_file) { free(options.xml_file); } 
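    /* A file name of "-" is used later to mean "read the CIB XML from standard
     * input" (this callback backs the --xml-pipe option described below).
     */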
options.xml_file = strdup("-"); return TRUE; } static GOptionEntry operation_entries[] = { { "run", 'R', 0, G_OPTION_ARG_NONE, &options.process, "Process the supplied input and show what actions the cluster will take in response", NULL }, { "simulate", 'S', G_OPTION_FLAG_NO_ARG, G_OPTION_ARG_CALLBACK, simulate_cb, "Like --run, but also simulate taking those actions and show the resulting new status", NULL }, { "in-place", 'X', G_OPTION_FLAG_NO_ARG, G_OPTION_ARG_CALLBACK, in_place_cb, "Like --simulate, but also store the results back to the input file", NULL }, { "show-attrs", 'A', 0, G_OPTION_ARG_NONE, &options.show_attrs, "Show node attributes", NULL }, { "show-failcounts", 'c', 0, G_OPTION_ARG_NONE, &options.show_failcounts, "Show resource fail counts", NULL }, { "show-scores", 's', G_OPTION_FLAG_NO_ARG, G_OPTION_ARG_CALLBACK, show_scores_cb, "Show allocation scores", NULL }, { "show-utilization", 'U', G_OPTION_FLAG_NO_ARG, G_OPTION_ARG_CALLBACK, utilization_cb, "Show utilization information", NULL }, { "profile", 'P', 0, G_OPTION_ARG_FILENAME, &options.test_dir, "Process all the XML files in the named directory to create profiling data", "DIR" }, { "repeat", 'N', 0, G_OPTION_ARG_INT, &options.repeat, "With --profile, repeat each test N times and print timings", "N" }, /* Deprecated */ { "pending", 'j', G_OPTION_FLAG_HIDDEN, G_OPTION_ARG_NONE, &options.print_pending, "Display pending state if 'record-pending' is enabled", NULL }, { NULL } }; static GOptionEntry synthetic_entries[] = { { "node-up", 'u', 0, G_OPTION_ARG_CALLBACK, node_up_cb, "Simulate bringing a node online", "NODE" }, { "node-down", 'd', 0, G_OPTION_ARG_CALLBACK, node_down_cb, "Simulate taking a node offline", "NODE" }, { "node-fail", 'f', 0, G_OPTION_ARG_CALLBACK, node_fail_cb, "Simulate a node failing", "NODE" }, { "op-inject", 'i', 0, G_OPTION_ARG_CALLBACK, op_inject_cb, "Generate a failure for the cluster to react to in the simulation.\n" INDENT "See `Operation Specification` help for more information.", "OPSPEC" }, { "op-fail", 'F', 0, G_OPTION_ARG_CALLBACK, op_fail_cb, "If the specified task occurs during the simulation, have it fail with return code ${rc}.\n" INDENT "The transition will normally stop at the failed action.\n" INDENT "Save the result with --save-output and re-run with --xml-file.\n" INDENT "See `Operation Specification` help for more information.", "OPSPEC" }, { "set-datetime", 't', 0, G_OPTION_ARG_STRING, &options.use_date, "Set date/time (ISO 8601 format, see https://en.wikipedia.org/wiki/ISO_8601)", "DATETIME" }, { "quorum", 'q', 0, G_OPTION_ARG_CALLBACK, quorum_cb, "Set to '1' (or 'true') to indicate cluster has quorum", "QUORUM" }, { "watchdog", 'w', 0, G_OPTION_ARG_CALLBACK, watchdog_cb, "Set to '1' (or 'true') to indicate cluster has an active watchdog device", "DEVICE" }, { "ticket-grant", 'g', 0, G_OPTION_ARG_CALLBACK, ticket_grant_cb, "Simulate granting a ticket", "TICKET" }, { "ticket-revoke", 'r', 0, G_OPTION_ARG_CALLBACK, ticket_revoke_cb, "Simulate revoking a ticket", "TICKET" }, { "ticket-standby", 'b', 0, G_OPTION_ARG_CALLBACK, ticket_standby_cb, "Simulate making a ticket standby", "TICKET" }, { "ticket-activate", 'e', 0, G_OPTION_ARG_CALLBACK, ticket_activate_cb, "Simulate activating a ticket", "TICKET" }, { NULL } }; static GOptionEntry artifact_entries[] = { { "save-input", 'I', 0, G_OPTION_ARG_FILENAME, &options.input_file, "Save the input configuration to the named file", "FILE" }, { "save-output", 'O', 0, G_OPTION_ARG_FILENAME, &options.output_file, "Save the output 
configuration to the named file", "FILE" }, { "save-graph", 'G', 0, G_OPTION_ARG_CALLBACK, save_graph_cb, "Save the transition graph (XML format) to the named file", "FILE" }, { "save-dotfile", 'D', 0, G_OPTION_ARG_CALLBACK, save_dotfile_cb, "Save the transition graph (DOT format) to the named file", "FILE" }, { "all-actions", 'a', 0, G_OPTION_ARG_NONE, &options.all_actions, "Display all possible actions in DOT graph (even if not part of transition)", NULL }, { NULL } }; static GOptionEntry source_entries[] = { { "live-check", 'L', G_OPTION_FLAG_NO_ARG, G_OPTION_ARG_CALLBACK, live_check_cb, "Connect to CIB manager and use the current CIB contents as input", NULL }, { "xml-file", 'x', 0, G_OPTION_ARG_CALLBACK, xml_file_cb, "Retrieve XML from the named file", "FILE" }, { "xml-pipe", 'p', G_OPTION_FLAG_NO_ARG, G_OPTION_ARG_CALLBACK, xml_pipe_cb, "Retrieve XML from stdin", NULL }, { NULL } }; static void get_date(pe_working_set_t *data_set, bool print_original, char *use_date) { pcmk__output_t *out = data_set->priv; time_t original_date = 0; crm_element_value_epoch(data_set->input, "execution-date", &original_date); if (use_date) { data_set->now = crm_time_new(use_date); out->info(out, "Setting effective cluster time: %s", use_date); crm_time_log(LOG_NOTICE, "Pretending 'now' is", data_set->now, crm_time_log_date | crm_time_log_timeofday); } else if (original_date) { data_set->now = crm_time_new(NULL); crm_time_set_timet(data_set->now, &original_date); if (print_original) { char *when = crm_time_as_string(data_set->now, crm_time_log_date|crm_time_log_timeofday); out->info(out, "Using the original execution date of: %s", when); free(when); } } } static void print_cluster_status(pe_working_set_t * data_set, unsigned int show_opts) { pcmk__output_t *out = data_set->priv; int rc = pcmk_rc_no_output; GList *all = NULL; all = g_list_prepend(all, strdup("*")); rc = out->message(out, "node-list", data_set->nodes, all, all, show_opts); PCMK__OUTPUT_SPACER_IF(out, rc == pcmk_rc_ok); rc = out->message(out, "resource-list", data_set, show_opts | pcmk_show_inactive_rscs, FALSE, all, all, FALSE); if (options.show_attrs) { out->message(out, "node-attribute-list", data_set, 0, rc == pcmk_rc_ok, all, all); } if (options.show_failcounts) { out->message(out, "failed-action-list", data_set, all, all, rc == pcmk_rc_ok); } g_list_free_full(all, free); } static char * create_action_name(pe_action_t *action) { char *action_name = NULL; const char *prefix = ""; const char *action_host = NULL; const char *clone_name = NULL; const char *task = action->task; if (action->node) { action_host = action->node->details->uname; } else if (!pcmk_is_set(action->flags, pe_action_pseudo)) { action_host = ""; } if (pcmk__str_eq(action->task, RSC_CANCEL, pcmk__str_casei)) { prefix = "Cancel "; task = action->cancel_task; } if (action->rsc && action->rsc->clone_name) { clone_name = action->rsc->clone_name; } if (clone_name) { char *key = NULL; guint interval_ms = 0; if (pcmk__guint_from_hash(action->meta, XML_LRM_ATTR_INTERVAL_MS, 0, &interval_ms) != pcmk_rc_ok) { interval_ms = 0; } if (pcmk__strcase_any_of(action->task, RSC_NOTIFY, RSC_NOTIFIED, NULL)) { const char *n_type = g_hash_table_lookup(action->meta, "notify_key_type"); const char *n_task = g_hash_table_lookup(action->meta, "notify_key_operation"); CRM_ASSERT(n_type != NULL); CRM_ASSERT(n_task != NULL); key = pcmk__notify_key(clone_name, n_type, n_task); } else { key = pcmk__op_key(clone_name, task, interval_ms); } if (action_host) { action_name = crm_strdup_printf("%s%s %s", 
prefix, key, action_host); } else { action_name = crm_strdup_printf("%s%s", prefix, key); } free(key); } else if (pcmk__str_eq(action->task, CRM_OP_FENCE, pcmk__str_casei)) { const char *op = g_hash_table_lookup(action->meta, "stonith_action"); action_name = crm_strdup_printf("%s%s '%s' %s", prefix, action->task, op, action_host); } else if (action->rsc && action_host) { action_name = crm_strdup_printf("%s%s %s", prefix, action->uuid, action_host); } else if (action_host) { action_name = crm_strdup_printf("%s%s %s", prefix, action->task, action_host); } else { action_name = crm_strdup_printf("%s", action->uuid); } if (action_numbers) { // i.e. verbose char *with_id = crm_strdup_printf("%s (%d)", action_name, action->id); free(action_name); action_name = with_id; } return action_name; } static bool create_dotfile(pe_working_set_t * data_set, const char *dot_file, gboolean all_actions, GError **error) { GList *gIter = NULL; FILE *dot_strm = fopen(dot_file, "w"); if (dot_strm == NULL) { g_set_error(error, PCMK__RC_ERROR, errno, "Could not open %s for writing: %s", dot_file, pcmk_rc_str(errno)); return false; } fprintf(dot_strm, " digraph \"g\" {\n"); for (gIter = data_set->actions; gIter != NULL; gIter = gIter->next) { pe_action_t *action = (pe_action_t *) gIter->data; const char *style = "dashed"; const char *font = "black"; const char *color = "black"; char *action_name = create_action_name(action); crm_trace("Action %d: %s %s %p", action->id, action_name, action->uuid, action); if (pcmk_is_set(action->flags, pe_action_pseudo)) { font = "orange"; } if (pcmk_is_set(action->flags, pe_action_dumped)) { style = "bold"; color = "green"; } else if ((action->rsc != NULL) && !pcmk_is_set(action->rsc->flags, pe_rsc_managed)) { color = "red"; font = "purple"; if (all_actions == FALSE) { goto do_not_write; } } else if (pcmk_is_set(action->flags, pe_action_optional)) { color = "blue"; if (all_actions == FALSE) { goto do_not_write; } } else { color = "red"; CRM_CHECK(!pcmk_is_set(action->flags, pe_action_runnable), ;); } pe__set_action_flags(action, pe_action_dumped); crm_trace("\"%s\" [ style=%s color=\"%s\" fontcolor=\"%s\"]", action_name, style, color, font); fprintf(dot_strm, "\"%s\" [ style=%s color=\"%s\" fontcolor=\"%s\"]\n", action_name, style, color, font); do_not_write: free(action_name); } for (gIter = data_set->actions; gIter != NULL; gIter = gIter->next) { pe_action_t *action = (pe_action_t *) gIter->data; GList *gIter2 = NULL; for (gIter2 = action->actions_before; gIter2 != NULL; gIter2 = gIter2->next) { pe_action_wrapper_t *before = (pe_action_wrapper_t *) gIter2->data; char *before_name = NULL; char *after_name = NULL; const char *style = "dashed"; gboolean optional = TRUE; if (before->state == pe_link_dumped) { optional = FALSE; style = "bold"; } else if (pcmk_is_set(action->flags, pe_action_pseudo) && (before->type & pe_order_stonith_stop)) { continue; } else if (before->type == pe_order_none) { continue; } else if (pcmk_is_set(before->action->flags, pe_action_dumped) && pcmk_is_set(action->flags, pe_action_dumped) && before->type != pe_order_load) { optional = FALSE; } if (all_actions || optional == FALSE) { before_name = create_action_name(before->action); after_name = create_action_name(action); crm_trace("\"%s\" -> \"%s\" [ style = %s]", before_name, after_name, style); fprintf(dot_strm, "\"%s\" -> \"%s\" [ style = %s]\n", before_name, after_name, style); free(before_name); free(after_name); } } } fprintf(dot_strm, "}\n"); fflush(dot_strm); fclose(dot_strm); return true; } static 
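/* Prepare the CIB the simulation will run against: query the live CIB when no
 * input file was given, otherwise read the named file (or stdin for "-"),
 * ensure a status section exists, upgrade and validate the XML, then write it
 * to the requested output file (or a temporary shadow file) and point the
 * CIB_file environment variable at that copy.
 */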
int setup_input(const char *input, const char *output, GError **error) { int rc = pcmk_rc_ok; cib_t *cib_conn = NULL; xmlNode *cib_object = NULL; char *local_output = NULL; if (input == NULL) { /* Use live CIB */ cib_conn = cib_new(); rc = cib_conn->cmds->signon(cib_conn, crm_system_name, cib_command); rc = pcmk_legacy2rc(rc); if (rc == pcmk_rc_ok) { rc = cib_conn->cmds->query(cib_conn, NULL, &cib_object, cib_scope_local | cib_sync_call); } cib_conn->cmds->signoff(cib_conn); cib_delete(cib_conn); cib_conn = NULL; if (rc != pcmk_rc_ok) { rc = pcmk_legacy2rc(rc); g_set_error(error, PCMK__RC_ERROR, rc, "Live CIB query failed: %s (%d)", pcmk_rc_str(rc), rc); return rc; } else if (cib_object == NULL) { g_set_error(error, PCMK__EXITC_ERROR, CRM_EX_NOINPUT, "Live CIB query failed: empty result"); return pcmk_rc_no_input; } } else if (pcmk__str_eq(input, "-", pcmk__str_casei)) { cib_object = filename2xml(NULL); } else { cib_object = filename2xml(input); } if (get_object_root(XML_CIB_TAG_STATUS, cib_object) == NULL) { create_xml_node(cib_object, XML_CIB_TAG_STATUS); } if (cli_config_update(&cib_object, NULL, FALSE) == FALSE) { free_xml(cib_object); return pcmk_rc_transform_failed; } if (validate_xml(cib_object, NULL, FALSE) != TRUE) { free_xml(cib_object); return pcmk_rc_schema_validation; } if (output == NULL) { char *pid = pcmk__getpid_s(); local_output = get_shadow_file(pid); temp_shadow = strdup(local_output); output = local_output; free(pid); } rc = write_xml_file(cib_object, output, FALSE); free_xml(cib_object); cib_object = NULL; if (rc < 0) { rc = pcmk_legacy2rc(rc); g_set_error(error, PCMK__EXITC_ERROR, CRM_EX_CANTCREAT, "Could not create '%s': %s", output, pcmk_rc_str(rc)); return rc; } else { setenv("CIB_file", output, 1); free(local_output); return pcmk_rc_ok; } } static void profile_one(const char *xml_file, long long repeat, pe_working_set_t *data_set, char *use_date) { pcmk__output_t *out = data_set->priv; xmlNode *cib_object = NULL; clock_t start = 0; clock_t end; cib_object = filename2xml(xml_file); start = clock(); if (get_object_root(XML_CIB_TAG_STATUS, cib_object) == NULL) { create_xml_node(cib_object, XML_CIB_TAG_STATUS); } if (cli_config_update(&cib_object, NULL, FALSE) == FALSE) { free_xml(cib_object); return; } if (validate_xml(cib_object, NULL, FALSE) != TRUE) { free_xml(cib_object); return; } for (int i = 0; i < repeat; ++i) { xmlNode *input = (repeat == 1)? cib_object : copy_xml(cib_object); data_set->input = input; get_date(data_set, false, use_date); pcmk__schedule_actions(data_set, input, NULL); pe_reset_working_set(data_set); } end = clock(); out->message(out, "profile", xml_file, start, end); } #ifndef FILENAME_MAX # define FILENAME_MAX 512 #endif static void profile_all(const char *dir, long long repeat, pe_working_set_t *data_set, char *use_date) { pcmk__output_t *out = data_set->priv; struct dirent **namelist; int file_num = scandir(dir, &namelist, 0, alphasort); if (file_num > 0) { struct stat prop; char buffer[FILENAME_MAX]; out->begin_list(out, NULL, NULL, "Timings"); while (file_num--) { if ('.' 
== namelist[file_num]->d_name[0]) { free(namelist[file_num]); continue; } else if (!pcmk__ends_with_ext(namelist[file_num]->d_name, ".xml")) { free(namelist[file_num]); continue; } snprintf(buffer, sizeof(buffer), "%s/%s", dir, namelist[file_num]->d_name); if (stat(buffer, &prop) == 0 && S_ISREG(prop.st_mode)) { profile_one(buffer, repeat, data_set, use_date); } free(namelist[file_num]); } free(namelist); out->end_list(out); } } PCMK__OUTPUT_ARGS("profile", "const char *", "clock_t", "clock_t") static int profile_default(pcmk__output_t *out, va_list args) { const char *xml_file = va_arg(args, const char *); clock_t start = va_arg(args, clock_t); clock_t end = va_arg(args, clock_t); out->list_item(out, NULL, "Testing %s ... %.2f secs", xml_file, (end - start) / (float) CLOCKS_PER_SEC); return pcmk_rc_ok; } PCMK__OUTPUT_ARGS("profile", "const char *", "clock_t", "clock_t") static int profile_xml(pcmk__output_t *out, va_list args) { const char *xml_file = va_arg(args, const char *); clock_t start = va_arg(args, clock_t); clock_t end = va_arg(args, clock_t); char *duration = pcmk__ftoa((end - start) / (float) CLOCKS_PER_SEC); pcmk__output_create_xml_node(out, "timing", "file", xml_file, "duration", duration, NULL); free(duration); return pcmk_rc_ok; } static pcmk__message_entry_t fmt_functions[] = { { "profile", "default", profile_default, }, { "profile", "xml", profile_xml }, { NULL } }; static void crm_simulate_register_messages(pcmk__output_t *out) { pcmk__register_messages(out, fmt_functions); } static GOptionContext * build_arg_context(pcmk__common_args_t *args, GOptionGroup **group) { GOptionContext *context = NULL; GOptionEntry extra_prog_entries[] = { { "quiet", 'Q', 0, G_OPTION_ARG_NONE, &(args->quiet), "Display only essential output", NULL }, { NULL } }; const char *description = "Operation Specification:\n\n" "The OPSPEC in any command line option is of the form\n" "${resource}_${task}_${interval_in_ms}@${node}=${rc}\n" "(memcached_monitor_20000@bart.example.com=7, for example).\n" "${rc} is an OCF return code. 
For more information on these\n"
-        "return codes, refer to https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html/Pacemaker_Administration/s-ocf-return-codes.html\n\n"
+        "return codes, refer to https://clusterlabs.org/pacemaker/doc/2.1/Pacemaker_Administration/html/agents.html#ocf-return-codes\n\n"
        "Examples:\n\n"
        "Pretend a recurring monitor action found memcached stopped on node\n"
        "fred.example.com and, during recovery, that the memcached stop\n"
        "action failed:\n\n"
        "\tcrm_simulate -LS --op-inject memcached:0_monitor_20000@fred.example.com=7 "
        "--op-fail memcached:0_stop_0@fred.example.com=1 --save-output /tmp/memcached-test.xml\n\n"
        "Now see what the reaction to the stop failure would be:\n\n"
        "\tcrm_simulate -S --xml-file /tmp/memcached-test.xml\n\n";

    context = pcmk__build_arg_context(args, "text (default), xml", group, NULL);
    pcmk__add_main_args(context, extra_prog_entries);
    g_option_context_set_description(context, description);

    pcmk__add_arg_group(context, "operations", "Operations:",
                        "Show operations options", operation_entries);
    pcmk__add_arg_group(context, "synthetic", "Synthetic Cluster Events:",
                        "Show synthetic cluster event options", synthetic_entries);
    pcmk__add_arg_group(context, "artifact", "Artifact Options:",
                        "Show artifact options", artifact_entries);
    pcmk__add_arg_group(context, "source", "Data Source:",
                        "Show data source options", source_entries);

    return context;
}

int
main(int argc, char **argv)
{
    int printed = pcmk_rc_no_output;
    int rc = pcmk_rc_ok;
    pe_working_set_t *data_set = NULL;
    pcmk__output_t *out = NULL;
    xmlNode *input = NULL;

    GError *error = NULL;

    GOptionGroup *output_group = NULL;
    pcmk__common_args_t *args = pcmk__new_common_args(SUMMARY);
    gchar **processed_args = pcmk__cmdline_preproc(argv, "bdefgiqrtuwxDFGINO");
    GOptionContext *context = build_arg_context(args, &output_group);

    /* This must come before g_option_context_parse_strv.
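     * Otherwise, a --xml-file or --xml-pipe value given on the command line
     * would be overwritten by the "-" default set below.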
*/ options.xml_file = strdup("-"); pcmk__register_formats(output_group, formats); if (!g_option_context_parse_strv(context, &processed_args, &error)) { exit_code = CRM_EX_USAGE; goto done; } pcmk__cli_init_logging("crm_simulate", args->verbosity); rc = pcmk__output_new(&out, args->output_ty, args->output_dest, argv); if (rc != pcmk_rc_ok) { fprintf(stderr, "Error creating output format %s: %s\n", args->output_ty, pcmk_rc_str(rc)); exit_code = CRM_EX_ERROR; goto done; } if (pcmk__str_eq(args->output_ty, "text", pcmk__str_null_matches) && !options.show_scores && !options.show_utilization) { pcmk__force_args(context, &error, "%s --text-fancy", g_get_prgname()); } else if (pcmk__str_eq(args->output_ty, "xml", pcmk__str_none)) { pcmk__force_args(context, &error, "%s --xml-simple-list --xml-substitute", g_get_prgname()); } crm_simulate_register_messages(out); pe__register_messages(out); pcmk__register_lib_messages(out); out->quiet = args->quiet; if (args->version) { out->version(out, false); goto done; } if (args->verbosity > 0) { #ifdef PCMK__COMPAT_2_0 /* Redirect stderr to stdout so we can grep the output */ close(STDERR_FILENO); dup2(STDOUT_FILENO, STDERR_FILENO); #endif action_numbers = TRUE; } data_set = pe_new_working_set(); if (data_set == NULL) { rc = ENOMEM; g_set_error(&error, PCMK__RC_ERROR, rc, "Could not allocate working set"); goto done; } if (options.show_scores) { pe__set_working_set_flags(data_set, pe_flag_show_scores); } if (options.show_utilization) { pe__set_working_set_flags(data_set, pe_flag_show_utilization); } pe__set_working_set_flags(data_set, pe_flag_no_compat); if (options.test_dir != NULL) { data_set->priv = out; profile_all(options.test_dir, options.repeat, data_set, options.use_date); rc = pcmk_rc_ok; goto done; } rc = setup_input(options.xml_file, options.store ? options.xml_file : options.output_file, &error); if (rc != pcmk_rc_ok) { goto done; } global_cib = cib_new(); rc = global_cib->cmds->signon(global_cib, crm_system_name, cib_command); if (rc != pcmk_rc_ok) { rc = pcmk_legacy2rc(rc); g_set_error(&error, PCMK__RC_ERROR, rc, "Could not connect to the CIB: %s", pcmk_rc_str(rc)); goto done; } rc = global_cib->cmds->query(global_cib, NULL, &input, cib_sync_call | cib_scope_local); if (rc != pcmk_rc_ok) { rc = pcmk_legacy2rc(rc); g_set_error(&error, PCMK__RC_ERROR, rc, "Could not get local CIB: %s", pcmk_rc_str(rc)); goto done; } data_set->input = input; data_set->priv = out; get_date(data_set, true, options.use_date); if(options.xml_file) { pe__set_working_set_flags(data_set, pe_flag_sanitized); } if (options.show_scores) { pe__set_working_set_flags(data_set, pe_flag_show_scores); } if (options.show_utilization) { pe__set_working_set_flags(data_set, pe_flag_show_utilization); } cluster_status(data_set); if (!out->is_quiet(out)) { unsigned int show_opts = options.print_pending ? pcmk_show_pending : 0; if (pcmk_is_set(data_set->flags, pe_flag_maintenance_mode)) { printed = out->message(out, "maint-mode", data_set->flags); } if (data_set->disabled_resources || data_set->blocked_resources) { PCMK__OUTPUT_SPACER_IF(out, printed == pcmk_rc_ok); printed = out->info(out, "%d of %d resource instances DISABLED and %d BLOCKED " "from further action due to failure", data_set->disabled_resources, data_set->ninstances, data_set->blocked_resources); } PCMK__OUTPUT_SPACER_IF(out, printed == pcmk_rc_ok); /* Most formatted output headers use caps for each word, but this one * only has the first word capitalized for compatibility with pcs. 
*/ out->begin_list(out, NULL, NULL, "Current cluster status"); print_cluster_status(data_set, show_opts); out->end_list(out); printed = pcmk_rc_ok; } if (options.modified) { PCMK__OUTPUT_SPACER_IF(out, printed == pcmk_rc_ok); modify_configuration(data_set, global_cib, options.quorum, options.watchdog, options.node_up, options.node_down, options.node_fail, options.op_inject, options.ticket_grant, options.ticket_revoke, options.ticket_standby, options.ticket_activate); printed = pcmk_rc_ok; rc = global_cib->cmds->query(global_cib, NULL, &input, cib_sync_call); if (rc != pcmk_rc_ok) { rc = pcmk_legacy2rc(rc); g_set_error(&error, PCMK__RC_ERROR, rc, "Could not get modified CIB: %s", pcmk_rc_str(rc)); goto done; } cleanup_calculations(data_set); data_set->input = input; data_set->priv = out; get_date(data_set, true, options.use_date); if(options.xml_file) { pe__set_working_set_flags(data_set, pe_flag_sanitized); } if (options.show_scores) { pe__set_working_set_flags(data_set, pe_flag_show_scores); } if (options.show_utilization) { pe__set_working_set_flags(data_set, pe_flag_show_utilization); } cluster_status(data_set); } if (options.input_file != NULL) { rc = write_xml_file(input, options.input_file, FALSE); if (rc < 0) { rc = pcmk_legacy2rc(rc); g_set_error(&error, PCMK__RC_ERROR, rc, "Could not create '%s': %s", options.input_file, pcmk_rc_str(rc)); goto done; } } if (options.process || options.simulate) { crm_time_t *local_date = NULL; pcmk__output_t *logger_out = NULL; if (pcmk_all_flags_set(data_set->flags, pe_flag_show_scores|pe_flag_show_utilization)) { PCMK__OUTPUT_SPACER_IF(out, printed == pcmk_rc_ok); out->begin_list(out, NULL, NULL, "Allocation Scores and Utilization Information"); printed = pcmk_rc_ok; } else if (pcmk_is_set(data_set->flags, pe_flag_show_scores)) { PCMK__OUTPUT_SPACER_IF(out, printed == pcmk_rc_ok); out->begin_list(out, NULL, NULL, "Allocation Scores"); printed = pcmk_rc_ok; } else if (pcmk_is_set(data_set->flags, pe_flag_show_utilization)) { PCMK__OUTPUT_SPACER_IF(out, printed == pcmk_rc_ok); out->begin_list(out, NULL, NULL, "Utilization Information"); printed = pcmk_rc_ok; } else { logger_out = pcmk__new_logger(); if (logger_out == NULL) { goto done; } data_set->priv = logger_out; } pcmk__schedule_actions(data_set, input, local_date); if (logger_out == NULL) { out->end_list(out); } else { logger_out->finish(logger_out, CRM_EX_OK, true, NULL); pcmk__output_free(logger_out); data_set->priv = out; } input = NULL; /* Don't try and free it twice */ if (options.graph_file != NULL) { write_xml_file(data_set->graph, options.graph_file, FALSE); } if (options.dot_file != NULL) { if (!create_dotfile(data_set, options.dot_file, options.all_actions, &error)) { goto done; } } if (!out->is_quiet(out)) { GList *gIter = NULL; PCMK__OUTPUT_SPACER_IF(out, printed == pcmk_rc_ok); out->begin_list(out, NULL, NULL, "Transition Summary"); LogNodeActions(data_set); for (gIter = data_set->resources; gIter != NULL; gIter = gIter->next) { pe_resource_t *rsc = (pe_resource_t *) gIter->data; LogActions(rsc, data_set); } out->end_list(out); printed = pcmk_rc_ok; } } rc = pcmk_rc_ok; if (options.simulate) { PCMK__OUTPUT_SPACER_IF(out, printed == pcmk_rc_ok); if (run_simulation(data_set, global_cib, options.op_fail) != pcmk_rc_ok) { rc = pcmk_rc_error; } printed = pcmk_rc_ok; if (!out->is_quiet(out)) { get_date(data_set, true, options.use_date); PCMK__OUTPUT_SPACER_IF(out, printed == pcmk_rc_ok); out->begin_list(out, NULL, NULL, "Revised Cluster Status"); if (options.show_scores) { 
                pe__set_working_set_flags(data_set, pe_flag_show_scores);
            }
            if (options.show_utilization) {
                pe__set_working_set_flags(data_set, pe_flag_show_utilization);
            }

            cluster_status(data_set);
            print_cluster_status(data_set, 0);

            out->end_list(out);
        }
    }

  done:
    pcmk__output_and_clear_error(error, NULL);

    /* There sure is a lot to free in options. */
    free(options.dot_file);
    free(options.graph_file);
    g_free(options.input_file);
    g_list_free_full(options.node_up, g_free);
    g_list_free_full(options.node_down, g_free);
    g_list_free_full(options.node_fail, g_free);
    g_list_free_full(options.op_fail, g_free);
    g_list_free_full(options.op_inject, g_free);
    g_free(options.output_file);
    free(options.quorum);
    g_free(options.test_dir);
    g_list_free_full(options.ticket_grant, g_free);
    g_list_free_full(options.ticket_revoke, g_free);
    g_list_free_full(options.ticket_standby, g_free);
    g_list_free_full(options.ticket_activate, g_free);
    free(options.use_date);
    free(options.watchdog);
    free(options.xml_file);

    pcmk__free_arg_context(context);
    g_strfreev(processed_args);

    if (data_set) {
        pe_free_working_set(data_set);
    }

    if (global_cib) {
        global_cib->cmds->signoff(global_cib);
        cib_delete(global_cib);
    }

    fflush(stderr);

    if (temp_shadow) {
        unlink(temp_shadow);
        free(temp_shadow);
    }

    if (rc != pcmk_rc_ok) {
        exit_code = pcmk_rc2exitc(rc);
    }

    if (out != NULL) {
        out->finish(out, exit_code, true, NULL);
        pcmk__output_free(out);
    }

    crm_exit(exit_code);
}