diff --git a/cts/README.md b/cts/README.md index faebf9b0a2..69acef9cda 100644 --- a/cts/README.md +++ b/cts/README.md @@ -1,304 +1,304 @@ # Pacemaker Cluster Test Suite (CTS) The Cluster Test Suite (CTS) refers to all Pacemaker testing code that can be run in an installed environment. (Pacemaker also has unit tests that must be run from a source distribution.) CTS includes: * Regression tests: These test specific Pacemaker components individually (no integration tests). The primary front end is cts-regression in this directory. Run it with the --help option to see its usage. cts-regression is a wrapper for individual component regression tests also in this directory (cts-cli, cts-exec, cts-fencing, and cts-scheduler). The CLI and scheduler regression tests can also be run from a source distribution. The other regression tests can only run in an installed environment, and the cluster should not be running on the node running these tests. * The CTS lab: This is a cluster exerciser for intensively testing the behavior of an entire working cluster. It is primarily for developers and packagers of the Pacemaker source code, but it can be useful for users who wish to see how their cluster will react to various situations. Most of the lab code is in the Pacemaker Python module. The front end, cts-lab, is in this directory. The CTS lab runs a randomized series of predefined tests on the cluster. It can be run against a pre-existing cluster configuration or overwrite the existing configuration with a test configuration. * Helpers: Some of the component regression tests and the CTS lab require certain helpers to be installed as root. These include a dummy LSB init script, dummy systemd service, etc. In a source distribution, the source for these is in cts/support. The tests will install these as needed and uninstall them when done. This means that the cluster configuration created by the CTS lab will generate failures if started manually after the lab exits. 
However, the helper installer can be run manually to make the configuration usable, if you want to do your own further testing with it: /usr/libexec/pacemaker/cts-support install As you might expect, you can also remove the helpers with: /usr/libexec/pacemaker/cts-support uninstall (The actual directory location may vary depending on how Pacemaker was built.) * Cluster benchmark: The benchmark subdirectory of this directory contains some cluster test environment benchmarking code. It is not particularly useful for end users. * Valgrind suppressions: When memory-testing Pacemaker code with valgrind, various bugs in non-Pacemaker libraries and such can clutter the results. The valgrind-pcmk.suppressions file in this directory can be used with valgrind's --suppressions option to eliminate many of these. ## Using the CTS lab ### Requirements * Three or more machines (one test exerciser and at least two cluster nodes). * The test cluster nodes should be on the same subnet and have journalling filesystems (ext4, xfs, etc.) for all of their filesystems other than /boot. You also need a number of free IP addresses on that subnet if you intend to test IP address takeover. * The test exerciser machine doesn't need to be on the same subnet as the test cluster machines. Minimal demands are made on the exerciser; it just has to stay up during the tests. * Tracking problems is easier if all machines' clocks are closely synchronized. NTP does this automatically, but you can do it by hand if you want. * The account on the exerciser used to run the CTS lab (which does not need to be root) must be able to ssh as root to the cluster nodes without a password challenge. See the Mini-HOWTO at the end of this file for details about how to configure ssh for this. * The exerciser needs to be able to resolve all cluster node names, whether by DNS or /etc/hosts. * CTS is not guaranteed to run on all platforms that Pacemaker itself does. 
It calls commands such as service that may not be provided by all OSes. ### Preparation * Install Pacemaker, including the testing code, on all machines. The testing code must be the same version as the rest of Pacemaker, and the Pacemaker version must be the same on the exerciser and all cluster nodes. You can install from source, although many distributions package the testing code (named pacemaker-cts or similar). Typically, everything needed by the CTS lab is installed in /usr/share/pacemaker/tests/cts. * Configure the cluster layer (Corosync) on the cluster machines (*not* the exerciser), and verify it works. Node names used in the cluster configuration *must* match the hosts' names as returned by `uname -n`; they do not have to match the machines' fully qualified domain names. * Optionally, configure the exerciser as a log aggregator, using something like `rsyslog` log forwarding. If aggregation is detected, the exerciser will look for new messages locally instead of requesting them repeatedly from cluster nodes. * Currently, `/var/log/messages` on the exerciser is the only supported log destination. Further, if it's specified explicitly on the command line as the log file, then CTS lab will not check for aggregation. * CTS lab does not currently detect systemd journal log aggregation. * Optionally, if the lab nodes use the systemd journal for logs, create /etc/systemd/journald.conf.d/cts-lab.conf on each with `RateLimitIntervalSec=0` or `RateLimitBurst=0`, to avoid issues with log detection. ### Run The primary interface to the CTS lab is the cts-lab executable: /usr/share/pacemaker/tests/cts-lab [options] (The actual directory location may vary depending on how Pacemaker was built.) 
As part of the options, specify the cluster nodes with --nodes, for example: --nodes "pcmk-1 pcmk-2 pcmk-3" Most people will want to save the output to a file, for example: --outputfile ~/cts.log Unless you want to test a pre-existing cluster configuration, you also want (*warning*: with these options, any existing configuration will be lost): --clobber-cib --populate-resources You can test floating IP addresses (*not* already used by any host), one per cluster node, by specifying the first, for example: --test-ip-base 192.168.9.100 Configure some sort of fencing, for example to use fence\_xvm: - --stonith xvm + --fencing-agent fence_xvm Putting all the above together, a command line might look like: /usr/share/pacemaker/tests/cts-lab --nodes "pcmk-1 pcmk-2 pcmk-3" \ --outputfile ~/cts.log --clobber-cib --populate-resources \ - --test-ip-base 192.168.9.100 --stonith xvm 50 + --test-ip-base 192.168.9.100 --fencing-agent fence_xvm 50 For more options, run with the --help option. There is also a wrapper for cts-lab that some users may find more convenient: cluster\_test, which is in the source directory and typically not installed. ### Optional: Memory testing Pacemaker has various options for testing memory management. On cluster nodes, Pacemaker components use various environment variables to control these options. How these variables are set varies by OS, but usually they are set in a file such as /etc/sysconfig/pacemaker or /etc/default/pacemaker. Valgrind is a program for detecting memory management problems such as use-after-free errors. 
If you have valgrind installed, you can enable it by setting the following environment variables on all cluster nodes: PCMK_valgrind_enabled=pacemaker-attrd,pacemaker-based,pacemaker-controld,pacemaker-execd,pacemaker-fenced,pacemaker-schedulerd VALGRIND_OPTS="--leak-check=full --trace-children=no --num-callers=25 --log-file=/var/lib/pacemaker/valgrind-%p --suppressions=/usr/share/pacemaker/tests/valgrind-pcmk.suppressions --gen-suppressions=all" These options should only be set while specifically testing memory management, because they may slow down the cluster significantly, and they will disable writes to the CIB. If desired, you can enable valgrind on a subset of pacemaker components rather than all of them as listed above. Valgrind will put a text file for each process in the location specified by valgrind's --log-file option. See https://www.valgrind.org/docs/manual/mc-manual.html for explanations of the messages valgrind generates. Separately, if you are using the GNU C library, the G\_SLICE, MALLOC\_PERTURB\_, and MALLOC\_CHECK\_ environment variables can be set to affect the library's memory management functions. When using valgrind, G\_SLICE should be set to "always-malloc", which helps valgrind track memory by always using the malloc() and free() routines directly. When not using valgrind, G\_SLICE can be left unset, or set to "debug-blocks", which enables the C library to catch many memory errors but may impact performance. If the MALLOC\_PERTURB\_ environment variable is set to an 8-bit integer, the C library will initialize all newly allocated bytes of memory to the integer value, and will set all newly freed bytes of memory to the bitwise inverse of the integer value. This helps catch uses of uninitialized or freed memory blocks that might otherwise go unnoticed. Example: MALLOC_PERTURB_=221 If the MALLOC\_CHECK\_ environment variable is set, the C library will check for certain heap corruption errors. 
The most useful value in testing is 3, which will cause the library to print a message to stderr and abort execution. Example: MALLOC_CHECK_=3 Valgrind should be enabled for either all nodes or none when used with the CTS lab, but the C library variables may be set differently on different nodes. ### Optional: Remote node testing If the pacemaker-remoted daemon is installed on all cluster nodes, the CTS lab will enable remote node tests. The remote node tests choose a random node, stop the cluster on it, start pacemaker-remoted on it, and add an ocf:pacemaker:remote resource to turn it into a remote node. When the test is done, the lab will turn the node back into a cluster node. To avoid conflicts, the lab will rename the node, prefixing the original node name with "remote-". For example, "pcmk-1" will become "remote-pcmk-1". These names do not need to be resolvable. The name change may require special fencing configuration, if the fence agent expects the node name to be the same as its hostname. A common approach is to specify the "remote-" names in pcmk\_host\_list. If you use pcmk\_host\_list=all, the lab will expand that to all cluster nodes and their "remote-" names. You may additionally need a pcmk\_host\_map argument to map the "remote-" names to the hostnames. Example: - --stonith xvm --stonith-args \ + --fencing-agent fence_xvm --fencing-params \ pcmk_host_list=all,pcmk_host_map=remote-pcmk-1:pcmk-1;remote-pcmk-2:pcmk-2 ### Optional: Remote node testing with valgrind When running the remote node tests, the Pacemaker components on the *cluster* nodes can be run under valgrind as described in the "Memory testing" section. However, pacemaker-remoted cannot be run under valgrind that way, because it is started by the OS's regular boot system and not by Pacemaker. Details vary by system, but the goal is to set the VALGRIND\_OPTS environment variable and then start pacemaker-remoted by prefixing it with the path to valgrind. 
The init script and systemd service file provided with pacemaker-remoted will load the pacemaker environment variables from the same location used by other Pacemaker components, so VALGRIND\_OPTS will be set correctly if using one of those. For an OS using systemd, you can override the ExecStart parameter to run valgrind. For example: mkdir /etc/systemd/system/pacemaker_remote.service.d cat >/etc/systemd/system/pacemaker_remote.service.d/valgrind.conf <&2 } usage() { echo "usage: $0 " echo " dir: working directory (with the control file)" exit 0 } [ $# -eq 0 ] && usage WORKDIR=$1 test -d "$WORKDIR" || usage CTSCTRL=~/.cts CTRL=$WORKDIR/control CSV=$WORKDIR/bench.csv STATS=$WORKDIR/bench.stats test -f $CTRL && . $CTRL @datadir@/@PACKAGE@/tests/cts/cluster_test 500 || { msg "cluster_test failed" exit 1 } test -f $CTSCTRL || { msg no CTS control file $CTSCTRL exit 1 } . $CTSCTRL : ${CTS_logfile:="@CRM_LOG_DIR@/ha-log-bench"} : ${CTS_adv:="--schema pacemaker-1.2 --clobber-cib -r"} : ${RUNS:=3} : ${CTSTESTS:="--benchmark"} : ${CTSDIR:="@datadir@/@PACKAGE@/tests/cts"} : ${CTS_node_list:=""} : ${CTS_stonith:=""} : ${CTS_stonith_args:=""} [ -n "$CTS_node_list" ] || { msg no node list specified exit 1 } CTSOPTS="$CTS_adv --logfile $CTS_logfile" if [ "x$CTS_stonith" != "x" ]; then - CTSOPTS="$CTSOPTS --stonith-type $CTS_stonith" + CTSOPTS="$CTSOPTS --fencing-agent $CTS_stonith" [ "x$CTS_stonith_args" != "x" ] && - CTSOPTS="$CTSOPTS --stonith-params \"$CTS_stonith_args\"" + CTSOPTS="$CTSOPTS --fencing-params \"$CTS_stonith_args\"" else - CTSOPTS="$CTSOPTS --stonith 0" + CTSOPTS="$CTSOPTS --disable-fencing" fi CTSOPTS="$CTSOPTS $CTSTESTS" fibonacci() { F_LIMIT=$1 F_N=2 F_N_PREV=1 while [ $F_N -le $F_LIMIT ]; do echo $F_N F_N_TMP=$F_N F_N=$((F_N+F_N_PREV)) F_N_PREV=$F_N_TMP done [ $F_N_PREV -ne $F_LIMIT ] && echo $F_LIMIT } [ "$SERIES" ] || SERIES=$(fibonacci "$(echo $CTS_node_list | wc -w)") get_nodes() { GN_C_NODES=$(echo $CTS_node_list | awk -v n="$1" ' { for( i=1; i<=NF; 
i++ ) node[cnt++]=$i } END{for( i=0; i "$RC_ODIR/ctsrun.out" 2>&1 & ctspid=$! tail -f "$RC_ODIR/ctsrun.out" & tailpid=$! wait $ctspid kill $tailpid >/dev/null 2>&1 } bench_re='CTS:.*runtime:' diginfo() { DI_CTS_DIR="$1" DI_S="$2" filter="$3" ( cd "$DI_CTS_DIR" || return for r in [0-9]*.tar.bz2; do tar xjf $r DI_D=$(basename "$r" .tar.bz2) for DI_V in $(grep "$bench_re" "$DI_D/ha-log.txt" | eval "$filter"); do DI_S="$DI_S,$DI_V" done rm -r "$DI_D" done echo $DI_S ) } printheader() { diginfo $1 "" "awk '{print \$(NF-2)}'" } printstats() { diginfo $1 "$clusize" "awk '{print \$(NF)}'" } printmedians() { PM_F="$1" PM_S="$clusize" PM_MIDDLE=$((RUNS/2 + 1)) set $(head -1 "$PM_F" | sed 's/,/ /g') PM_COLS=$# for PM_I in $(seq 2 $PM_COLS); do PM_V=$(awk -v i=$PM_I -F, '{print $i}' < $PM_F | sort -n | head -$PM_MIDDLE | tail -1) PM_S="$PM_S,$PM_V" done echo $PM_S } rm -f $CSV tmpf=`mktemp` test -f "$tmpf" || { msg "can't create temporary file" exit 1 } trap "rm -f $tmpf" 0 for clusize in $SERIES; do nodes=`get_nodes $clusize` outdir=$WORKDIR/$clusize rm -rf $outdir mkdir -p $outdir rm -f $tmpf node_cleanup for i in `seq $RUNS`; do true > $CTS_logfile mkdir -p $outdir/$i runcts $outdir/$i mkreports $outdir/$i printstats $outdir/$i >> $tmpf done [ -f "$CSV" ] || printheader $outdir/1 > $CSV printmedians $tmpf >> $CSV cat $tmpf >> $STATS msg "Statistics for $clusize-node cluster saved" done msg "Tests done for series $SERIES, output in $CSV and $STATS" # vim: set filetype=sh: diff --git a/cts/cluster_test.in b/cts/cluster_test.in index a898eff311..324fc55183 100755 --- a/cts/cluster_test.in +++ b/cts/cluster_test.in @@ -1,148 +1,148 @@ #!@BASH_PATH@ # # Copyright 2008-2025 the Pacemaker project contributors # # The version control history for this file may have further details. # # This source code is licensed under the GNU General Public License version 2 # or later (GPLv2+) WITHOUT ANY WARRANTY. # if [ -e ~/.cts ]; then . 
~/.cts fi anyAsked=0 [ $# -lt 1 ] || CTS_numtests=$1 die() { echo "$@"; exit 1; } if [ -z "$CTS_asked_once" ]; then anyAsked=1 echo "This script should only be executed on the test exerciser." echo "The test exerciser will remotely execute the actions required by the" echo "tests and should not be part of the cluster itself." read -p "Is this host intended to be the test exerciser? (yN) " doUnderstand [ "$doUnderstand" = "y" ] \ || die "This script must be executed on the test exerciser" fi if [ -z "$CTS_node_list" ]; then anyAsked=1 read -p "Please list your cluster nodes (eg. node1 node2 node3): " CTS_node_list else echo "Beginning test of cluster: $CTS_node_list" fi [ "${CTS_node_list}" = "${CTS_node_list/$HOSTNAME/}" ] \ || die "This script must be executed on the test exerciser, and the test exerciser cannot be part of the cluster" printf "+ Bootstrapping ssh... " if [ -z "$SSH_AUTH_SOCK" ]; then printf "\n + Initializing SSH " eval "$(ssh-agent)" echo " + Adding identities..." ssh-add rc=$? if [ $rc -ne 0 ]; then echo " -- No identities added" printf "\nThe ability to open key-based 'ssh' connections (as the user 'root') is required to use CTS.\n" read -p " - Do you want this program to help you create one? (yN) " auto_fix if [ "$auto_fix" = "y" ]; then ssh-keygen -t dsa ssh-add else die "Please run 'ssh-keygen -t dsa' to create a new key" fi fi else echo "OK" fi test_ok=1 printf "+ Testing ssh configuration... " for n in $CTS_node_list; do ssh -l root -o PasswordAuthentication=no -o ConnectTimeout=5 "$n" /bin/true rc=$? if [ $rc -ne 0 ]; then echo " - connection to $n failed" test_ok=0 fi done if [ $test_ok -eq 0 ]; then printf "\nThe ability to open key-based 'ssh' connections (as the user 'root') is required to use CTS.\n" read -p " - Do you want this program to help you with such a setup? (yN) " auto_fix if [ "$auto_fix" = "y" ]; then # XXX are we picking the most suitable identity? 
privKey=$(ssh-add -L | head -n1 | cut -d" " -f3) sshCopyIdOpts="-o User=root" [ -z "$privKey" ] || sshCopyIdOpts+=" -i \"${privKey}.pub\"" for n in $CTS_node_list; do eval "ssh-copy-id $sshCopyIdOpts \"${n}\"" \ || die "Attempt to 'ssh-copy-id $sshCopyIdOpts \"$n\"' failed" done else die "Please install one of your SSH public keys to root's account on all cluster nodes" fi fi echo "OK" if [ -z "$CTS_logfile" ]; then anyAsked=1 read -p " + Where does/should syslog store logs from remote hosts? (/var/log/messages) " CTS_logfile [ -n "$CTS_logfile" ] || CTS_logfile=/var/log/messages fi [ -e "$CTS_logfile" ] || die "$CTS_logfile doesn't exist" if [ -z "$CTS_numtests" ]; then read -p "+ How many test iterations should be performed? (500) " CTS_numtests [ -n "$CTS_numtests" ] || CTS_numtests=500 fi if [ -z "$CTS_asked_once" ]; then anyAsked=1 read -p "+ What type of STONITH agent do you use? (none) " CTS_stonith [ -z "$CTS_stonith" ] \ || read -p "+ List any STONITH agent parameters (eq. device_host=switch.power.com): " CTS_stonith_args [ -n "$CTS_adv" ] \ || read -p "+ (Advanced) Any extra CTS parameters? (none) " CTS_adv fi [ $anyAsked -eq 0 ] \ || read -p "+ Save values to ~/.cts for next time? (yN) " doSave if [ "$doSave" = "y" ]; then cat > ~/.cts <<-EOF # CTS Test data CTS_node_list="$CTS_node_list" CTS_logfile="$CTS_logfile" CTS_logport="$CTS_logport" CTS_asked_once=1 CTS_adv="$CTS_adv" CTS_stonith="$CTS_stonith" CTS_stonith_args="$CTS_stonith_args" EOF fi cts_extra="" if [ -n "$CTS_stonith" ]; then - cts_extra="$cts_extra --stonith-type $CTS_stonith" + cts_extra="$cts_extra --fencing-agent $CTS_stonith" [ -z "$CTS_stonith_args" ] \ - || cts_extra="$cts_extra --stonith-params \"$CTS_stonith_args\"" + || cts_extra="$cts_extra --fencing-params \"$CTS_stonith_args\"" else - cts_extra="$cts_extra --stonith 0" + cts_extra="$cts_extra --disable-fencing" echo " - Testing a cluster without STONITH is like a blunt pencil... 
pointless" fi printf "\nAll set to go for %d iterations!\n" "$CTS_numtests" [ $anyAsked -ne 0 ] \ || echo "+ To use a different configuration, remove ~/.cts and re-run cts (or edit it manually)." echo Now paste the following command into this shell: echo "@PYTHON@ `dirname "$0"`/cts-lab -L \"$CTS_logfile\" --no-unsafe-tests $CTS_adv $cts_extra \"$CTS_numtests\" --nodes \"$CTS_node_list\"" # vim: set filetype=sh: diff --git a/python/pacemaker/_cts/environment.py b/python/pacemaker/_cts/environment.py index 369e32e276..520402a30d 100644 --- a/python/pacemaker/_cts/environment.py +++ b/python/pacemaker/_cts/environment.py @@ -1,483 +1,450 @@ """Test environment classes for Pacemaker's Cluster Test Suite (CTS).""" __all__ = ["EnvFactory", "set_cts_path"] __copyright__ = "Copyright 2014-2025 the Pacemaker project contributors" __license__ = "GNU General Public License version 2 or later (GPLv2+) WITHOUT ANY WARRANTY" import argparse from contextlib import suppress from glob import glob import os import random import shlex import socket import sys from pacemaker.buildoptions import BuildOptions from pacemaker._cts.logging import LogFactory from pacemaker._cts.remote import RemoteFactory from pacemaker._cts.watcher import LogKind class Environment: """ A class for managing the CTS environment. This consists largely of processing and storing command line parameters. """ # pylint doesn't understand that self._rsh is callable (it stores the # singleton instance of RemoteExec, as returned by the getInstance method # of RemoteFactory). # @TODO See if type annotations fix this. # I think we could also fix this by getting rid of the getInstance methods, # but that's a project for another day. For now, just disable the warning. # pylint: disable=not-callable def __init__(self, args): """ Create a new Environment instance. This class can be treated kind of like a dictionary due to the presence of typical dict functions like __contains__, __getitem__, and __setitem__. 
However, it is not a dictionary so do not rely on standard dictionary behavior. Arguments: args -- A list of command line parameters, minus the program name. If None, sys.argv will be used. """ self.data = {} # Set some defaults before processing command line arguments. These are # either not set by any command line parameter, or they need a default # that can't be set in add_argument. self["DeadTime"] = 300 self["StartTime"] = 300 self["StableTime"] = 30 self["tests"] = [] - self["DoFencing"] = True self["CIBResource"] = False self["log_kind"] = None self["scenario"] = "random" self["syslog_facility"] = "daemon" # Hard-coded since there is only one supported cluster manager/stack self["Name"] = "crm-corosync" self["Stack"] = "corosync 2+" self.random_gen = random.Random() self._logger = LogFactory() self._rsh = RemoteFactory().getInstance() self._parse_args(args) if not self["ListTests"]: self._validate() self._discover() def dump(self): """Print the current environment.""" for key in sorted(self.data.keys()): self._logger.debug(f"{f'Environment[{key}]':35}: {str(self[key])}") def __contains__(self, key): """Return True if the given key exists in the environment.""" return key in self.data def __getitem__(self, key): """Return the given environment key, or None if it does not exist.""" return self.data.get(key) def __setitem__(self, key, value): """Set the given environment key to the given value, overriding any previous value.""" if key == "nodes": self.data["nodes"] = [] for node in value: node = node.strip() # I don't think I need the IP address, etc. but this validates # the node name against /etc/hosts and/or DNS, so it's a # GoodThing(tm). try: # @TODO This only handles IPv4, use getaddrinfo() instead # (here and in _discover()) socket.gethostbyname_ex(node) self.data["nodes"].append(node) except socket.herror: self._logger.log(f"{node} not found in DNS... 
aborting") raise else: self.data[key] = value def random_node(self): """Choose a random node from the cluster.""" return self.random_gen.choice(self["nodes"]) def _detect_systemd(self, node): """Detect whether systemd is in use on the target node.""" if "have_systemd" not in self.data: (rc, _) = self._rsh(node, "systemctl list-units", verbose=0) self["have_systemd"] = rc == 0 def _detect_syslog(self, node): """Detect the syslog variant in use on the target node (if any).""" if "syslogd" in self.data: return if self["have_systemd"]: # Systemd (_, lines) = self._rsh(node, r"systemctl list-units | grep syslog.*\.service.*active.*running | sed 's:.service.*::'", verbose=1) else: # SYS-V (_, lines) = self._rsh(node, "chkconfig --list | grep syslog.*on | awk '{print $1}' | head -n 1", verbose=1) with suppress(IndexError): self["syslogd"] = lines[0].strip() def disable_service(self, node, service): """Disable the given service on the given node.""" if self["have_systemd"]: # Systemd (rc, _) = self._rsh(node, f"systemctl disable {service}") return rc # SYS-V (rc, _) = self._rsh(node, f"chkconfig {service} off") return rc def enable_service(self, node, service): """Enable the given service on the given node.""" if self["have_systemd"]: # Systemd (rc, _) = self._rsh(node, f"systemctl enable {service}") return rc # SYS-V (rc, _) = self._rsh(node, f"chkconfig {service} on") return rc def service_is_enabled(self, node, service): """Return True if the given service is enabled on the given node.""" if self["have_systemd"]: # Systemd # With "systemctl is-enabled", we should check if the service is # explicitly "enabled" instead of the return code. For example it returns # 0 if the service is "static" or "indirect", but they don't really count # as "enabled". 
(rc, _) = self._rsh(node, f"systemctl is-enabled {service} | grep enabled") return rc == 0 # SYS-V (rc, _) = self._rsh(node, f"chkconfig --list | grep -e {service}.*on") return rc == 0 def _detect_at_boot(self, node): """Detect if the cluster starts at boot.""" self["at-boot"] = any(self.service_is_enabled(node, service) for service in ("pacemaker", "corosync")) def _detect_ip_offset(self, node): """Detect the offset for IPaddr resources.""" if self["CIBResource"] and "IPBase" not in self.data: (_, lines) = self._rsh(node, "ip addr | grep inet | grep -v -e link -e inet6 -e '/32' -e ' lo' | awk '{print $2}'", verbose=0) network = lines[0].strip() (_, lines) = self._rsh(node, "nmap -sn -n %s | grep 'scan report' | awk '{print $NF}' | sed 's:(::' | sed 's:)::' | sort -V | tail -n 1" % network, verbose=0) try: self["IPBase"] = lines[0].strip() except (IndexError, TypeError): self["IPBase"] = None if not self["IPBase"]: self["IPBase"] = " fe80::1234:56:7890:1000" self._logger.log("Could not determine an offset for IPaddr resources. Perhaps nmap is not installed on the nodes.") self._logger.log(f"""Defaulting to '{self["IPBase"]}', use --test-ip-base to override""") return last_part = self["IPBase"].split('.')[3] if int(last_part) >= 240: self._logger.log(f"Could not determine an offset for IPaddr resources. 
Upper bound is too high: {self['IPBase']} {last_part}") self["IPBase"] = " fe80::1234:56:7890:1000" self._logger.log(f"""Defaulting to '{self["IPBase"]}', use --test-ip-base to override""") def _validate(self): """Check that we were given all required command line parameters.""" if not self["nodes"]: raise ValueError("No nodes specified!") def _discover(self): """Probe cluster nodes to figure out how to log and manage services.""" exerciser = socket.gethostname() # Use the IP where possible to avoid name lookup failures for ip in socket.gethostbyname_ex(exerciser)[2]: if ip != "127.0.0.1": exerciser = ip break self["cts-exerciser"] = exerciser node = self["nodes"][0] self._detect_systemd(node) self._detect_syslog(node) self._detect_at_boot(node) self._detect_ip_offset(node) def _parse_args(self, argv): """ Parse and validate command line parameters. Set the appropriate values in the environment dictionary. If argv is None, use sys.argv instead. """ if not argv: argv = sys.argv[1:] - parser = argparse.ArgumentParser(epilog=f"{sys.argv[0]} -g virt1 -r --stonith ssh --schema pacemaker-2.0 500") + parser = argparse.ArgumentParser() grp1 = parser.add_argument_group("Common options") grp1.add_argument("--benchmark", action="store_true", help="Add timing information") grp1.add_argument("--list", "--list-tests", action="store_true", dest="list_tests", help="List the valid tests") grp1.add_argument("--nodes", default="", metavar="NODES", help="List of cluster nodes separated by whitespace") grp2 = parser.add_argument_group("Options that CTS will usually auto-detect correctly") grp2.add_argument("-L", "--logfile", metavar="PATH", help="Where to look for logs from cluster nodes (or 'journal' for systemd journal)") grp2.add_argument("--ip", "--test-ip-base", metavar="IP", help="Offset for generated IP address resources") grp3 = parser.add_argument_group("Options for release testing") grp3.add_argument("-r", "--populate-resources", action="store_true", help="Generate a sample 
configuration") grp3.add_argument("--choose", metavar="NAME", help="Run only the named tests, separated by whitespace") - grp3.add_argument("--fencing", "--stonith", - choices=["1", "0", "yes", "no", "lha", "openstack", "rhcs", "rhevm", "scsi", "ssh", "virt", "xvm"], - default="1", - help="What fencing agent to use") + grp3.add_argument("--disable-fencing", + action="store_false", + dest="fencing_enabled", + help="Whether to disable fencing") + grp3.add_argument("--fencing-agent", + metavar="AGENT", + default="external/ssh", + help="Agent to use for a fencing resource") + + # @FIXME These params are meaningful only with --fencing-agent="external/ssh" + grp3.add_argument("--fencing-params", + metavar="PARAMS", + default="hostlist=all,livedangerously=yes", + help="Parameters for the fencing resource (comma-delimited)") + grp3.add_argument("--once", action="store_true", help="Run all valid tests once") grp4 = parser.add_argument_group("Additional (less common) options") grp4.add_argument("-c", "--clobber-cib", action="store_true", help="Erase any existing configuration") grp4.add_argument("-y", "--yes", action="store_true", dest="always_continue", help="Continue to run whenever prompted") grp4.add_argument("--boot", action="store_true", help="") grp4.add_argument("--cib-filename", metavar="PATH", help="Install the given CIB file to the cluster") grp4.add_argument("--no-unsafe-tests", action="store_true", help="Don't run tests that are unsafe for use with ocfs2/drbd") grp4.add_argument("--notification-agent", metavar="PATH", default="/var/lib/pacemaker/notify.sh", help="Script to configure for Pacemaker alerts") grp4.add_argument("--notification-recipient", metavar="R", default="/var/lib/pacemaker/notify.log", help="Recipient to pass to alert script") grp4.add_argument("--outputfile", metavar="PATH", help="Location to write logs to") grp4.add_argument("--schema", metavar="SCHEMA", default=f"pacemaker-{BuildOptions.CIB_SCHEMA_VERSION}", help="Create a CIB conforming to 
the given schema") grp4.add_argument("--seed", metavar="SEED", help="Use the given string as the random number seed") - grp4.add_argument("--stonith-args", - metavar="ARGS", - default="hostlist=all,livedangerously=yes", - help="") - grp4.add_argument("--stonith-type", - metavar="TYPE", - default="external/ssh", - help="") grp4.add_argument("--trunc", action="store_true", dest="truncate", help="Truncate log file before starting") parser.add_argument("iterations", nargs='?', type=int, default=1, help="Number of tests to run") args = parser.parse_args(args=argv) # Set values on this object based on what happened with command line # processing. This has to be done in several blocks. # These values can always be set. Most get a default from the add_argument # calls, they only do one thing, and they do not have any side effects. self["CIBfilename"] = args.cib_filename if args.cib_filename else None self["ClobberCIB"] = args.clobber_cib + self["DoFencing"] = args.fencing_enabled self["ListTests"] = args.list_tests self["Schema"] = args.schema self["TruncateLog"] = args.truncate self["benchmark"] = args.benchmark self["continue"] = args.always_continue self["iterations"] = args.iterations self["nodes"] = shlex.split(args.nodes) self["notification-agent"] = args.notification_agent self["notification-recipient"] = args.notification_recipient - self["stonith-params"] = args.stonith_args - self["stonith-type"] = args.stonith_type + self["stonith-params"] = args.fencing_params + self["stonith-type"] = args.fencing_agent self["unsafe-tests"] = not args.no_unsafe_tests # Everything else either can't have a default set in an add_argument # call (likely because we don't want to always have a value set for it) # or it does something fancier than just set a single value. However, # order does not matter for these as long as the user doesn't provide # conflicting arguments on the command line. So just do Everything # alphabetically. 
if args.boot: self["scenario"] = "boot" if args.choose: self["scenario"] = "sequence" self["tests"].extend(shlex.split(args.choose)) self["iterations"] = len(self["tests"]) - if args.fencing in ["0", "no"]: - self["DoFencing"] = False - - elif args.fencing in ["rhcs", "virt", "xvm"]: - self["stonith-type"] = "fence_xvm" - - elif args.fencing == "scsi": - self["stonith-type"] = "fence_scsi" - - elif args.fencing in ["lha", "ssh"]: - self["stonith-params"] = "hostlist=all,livedangerously=yes" - self["stonith-type"] = "external/ssh" - - elif args.fencing == "openstack": - self["stonith-type"] = "fence_openstack" - - print("Obtaining OpenStack credentials from the current environment") - region = os.environ['OS_REGION_NAME'] - tenant = os.environ['OS_TENANT_NAME'] - auth = os.environ['OS_AUTH_URL'] - user = os.environ['OS_USERNAME'] - password = os.environ['OS_PASSWORD'] - - self["stonith-params"] = f"region={region},tenant={tenant},auth={auth},user={user},password={password}" - - elif args.fencing == "rhevm": - self["stonith-type"] = "fence_rhevm" - - print("Obtaining RHEV-M credentials from the current environment") - user = os.environ['RHEVM_USERNAME'] - password = os.environ['RHEVM_PASSWORD'] - server = os.environ['RHEVM_SERVER'] - port = os.environ['RHEVM_PORT'] - - self["stonith-params"] = f"login={user},passwd={password},ipaddr={server},ipport={port},ssl=1,shell_timeout=10" - if args.ip: self["CIBResource"] = True self["ClobberCIB"] = True self["IPBase"] = args.ip if args.logfile == "journal": self["LogAuditDisabled"] = True self["log_kind"] = LogKind.JOURNAL elif args.logfile: self["LogAuditDisabled"] = True self["LogFileName"] = args.logfile self["log_kind"] = LogKind.REMOTE_FILE else: # We can't set this as the default on the parser.add_argument call # for this option because then args.logfile will be set, which means # the above branch will be taken and those other values will also be # set. 
self["LogFileName"] = "/var/log/messages" if args.once: self["scenario"] = "all-once" if args.outputfile: self["OutputFile"] = args.outputfile LogFactory().add_file(self["OutputFile"]) if args.populate_resources: self["CIBResource"] = True self["ClobberCIB"] = True self.random_gen.seed(args.seed) class EnvFactory: """A class for constructing a singleton instance of an Environment object.""" instance = None # pylint: disable=invalid-name def getInstance(self, args=None): """ Return the previously created instance of Environment. If no instance exists, create a new instance and return that. """ if not EnvFactory.instance: EnvFactory.instance = Environment(args) return EnvFactory.instance def set_cts_path(extra=None): """Set the PATH environment variable appropriately for the tests.""" new_path = os.environ['PATH'] # Add any search paths given on the command line if extra is not None: for p in extra: new_path = f"{p}:{new_path}" cwd = os.getcwd() if os.path.exists(f"{cwd}/cts/cts-attrd.in"): # pylint: disable=protected-access print(f"Running tests from the source tree: {BuildOptions._BUILD_DIR}") for d in glob(f"{BuildOptions._BUILD_DIR}/daemons/*/"): new_path = f"{d}:{new_path}" new_path = f"{BuildOptions._BUILD_DIR}/tools:{new_path}" new_path = f"{BuildOptions._BUILD_DIR}/cts/support:{new_path}" print(f"Using local schemas from: {cwd}/xml") os.environ["PCMK_schema_directory"] = f"{cwd}/xml" else: print(f"Running tests from the install tree: {BuildOptions.DAEMON_DIR} (not {cwd})") new_path = f"{BuildOptions.DAEMON_DIR}:{new_path}" os.environ["PCMK_schema_directory"] = BuildOptions.SCHEMA_DIR print(f'Using PATH="{new_path}"') os.environ['PATH'] = new_path
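
Review note: the option changes in `environment.py` can be tried in isolation. The sketch below is a minimal reproduction that assumes only the three `add_argument` calls added by this diff and the three environment keys they feed (`DoFencing`, `stonith-type`, `stonith-params`). It is not the real `Environment` class. `build_parser`, `fencing_settings`, `LEGACY_AGENTS`, and `translate_legacy` are hypothetical names introduced for illustration. The legacy mapping is read off the `args.fencing` branches that the diff removes.

```python
import argparse

# Hypothetical table: old "--stonith VALUE" shorthands mapped to agent names,
# copied from the branches this diff deletes.
LEGACY_AGENTS = {
    "rhcs": "fence_xvm", "virt": "fence_xvm", "xvm": "fence_xvm",
    "scsi": "fence_scsi",
    "lha": "external/ssh", "ssh": "external/ssh",
}


def build_parser():
    """Reproduce only the three fencing options added in this diff."""
    parser = argparse.ArgumentParser()
    grp = parser.add_argument_group("Options for release testing")
    # store_false means the destination defaults to True when the flag is absent.
    grp.add_argument("--disable-fencing", action="store_false",
                     dest="fencing_enabled")
    grp.add_argument("--fencing-agent", metavar="AGENT",
                     default="external/ssh")
    grp.add_argument("--fencing-params", metavar="PARAMS",
                     default="hostlist=all,livedangerously=yes")
    return parser


def fencing_settings(argv):
    """Map parsed options to the environment keys the diff assigns."""
    args = build_parser().parse_args(argv)
    return {
        "DoFencing": args.fencing_enabled,
        "stonith-type": args.fencing_agent,
        "stonith-params": args.fencing_params,
    }


def translate_legacy(value):
    """Translate an old --stonith VALUE into new-style arguments (sketch)."""
    if value in ("0", "no"):
        return ["--disable-fencing"]
    if value in ("1", "yes"):
        return []  # the defaults already enable fencing
    if value in LEGACY_AGENTS:
        return ["--fencing-agent", LEGACY_AGENTS[value]]
    # "openstack" and "rhevm" used to build their parameters from environment
    # variables; with the new options those must be passed via --fencing-params.
    raise ValueError(f"no direct replacement for --stonith {value}")


if __name__ == "__main__":
    print(fencing_settings([]))
    print(fencing_settings(translate_legacy("xvm")))
    print(fencing_settings(translate_legacy("no")))
```

One behavior this makes visible: passing `--fencing-agent fence_xvm` without `--fencing-params` leaves `stonith-params` at the `external/ssh` defaults. This is the case the `@FIXME` comment in the diff points at, so callers switching agents will probably want to pass `--fencing-params` explicitly.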