diff --git a/README-testing b/README-testing
index 0d70c85..dee658b 100644
--- a/README-testing
+++ b/README-testing
@@ -1,104 +1,174 @@
 There's a booth-test RPM available that contains two types of
 tests. It installs the necessary files into `/usr/share/booth/tests`.
 
+=== Live tests (booth operation)
+
+BEWARE: Run this with _test_ clusters only!
+
+The live testing utility tests booth operation using the given
+`booth.conf`:
+
+	$ /usr/share/booth/tests/test/live_test.sh booth.conf
+
+It is possible to run only specific tests: just list their names after
+the configuration file. See the end of the script for the tests which
+are currently available.
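+
+For example, the following invocation should run only the `grant` and
+`failover` tests (both are defined near the end of the script):
+
+	$ /usr/share/booth/tests/test/live_test.sh booth.conf grant failover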
+
+Example booth.conf:
+
+------------
+transport="UDP"
+port="6666"
+arbitrator="10.2.12.53"
+arbitrator="10.2.13.82"
+site="10.2.12.101"
+site="10.2.13.101"
+site="10.121.187.99"
+
+ticket="ticket-A"
+	expire = 30
+	timeout = 3
+	retries = 3
+	before-acquire-handler = /usr/share/booth/service-runnable d-src1
+------------
+
+A split brain condition is also tested. For that to work, all
+sites need `iptables` installed. The supplied script `booth_path`
+is used to manipulate the iptables rules.
+
+It is not necessary to run the test script on one of the sites.
+Just copy the script and make the test `booth.conf` available
+locally:
+
+	$ scp testsite:/usr/share/booth/tests/test/live_test.sh .
+	$ scp testsite:/etc/booth/booth.conf .
+	$ sh live_test.sh booth.conf
+
+You need at least two sites and one arbitrator.
+
+The ticket must be named `ticket-A`.
+
+It is not necessary to configure the `before-acquire-handler`.
+
+Notes:
+
+- (BEWARE!) the supplied configuration file is copied to
+  /etc/booth/booth.conf on all sites/arbitrators, thus overwriting
+  any existing configuration
+
+- the utility uses ssh to manage booth at all sites/arbitrators
+  and logs in as user `root`
+
+- ssh public key authentication must work without a passphrase
+  prompt (otherwise running the tests is impractical)
+
+- the log file is ./test_booth.log (it is actually a shell trace,
+  with timestamps if you're running bash)
+
+- in case one of the tests fails, an hb_report is created
+
+If you want to open a bug report, please attach all hb_reports
+and `test_booth.log`.
+
+
 === Simple tests (commandline, config file)
 
 Run (as non-root)
 
 	# python test/runtests.py
 
 to run the tests written in python.
 
 
 === Unit tests
 
 These use gdb and pexpect to set boothd state to some configured
 value, inject some input, and look at the output.
 
 	# python script/unit-test.py src/boothd unit-tests/
 
 Or, if using the 'booth-test' RPM,
 
 	# python unit-test.py src/boothd unit-tests/
 
 This must (currently?) be run as a non-root user; another optional
 argument is the test to start from, e.g. '003'.
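+
+For example, starting at test '003' rather than at the first one would
+look like:
+
+	# python script/unit-test.py src/boothd unit-tests/ 003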
 
 Basically, boothd is started with the config file
 `unit-tests/booth.conf`, and gdb gets attached to it.
 
 Then, some ticket state is set, incoming messages are delivered, and
 outgoing messages and the state are compared to the expected values.
 
 `unit-tests/_defaults.txt` has default values for the initial
 state and message data.
 
 Each test file consists of headers and key/value pairs:
 
 --------------------
 ticket:
 	state		ST_STABLE
 
 message0: # optional comment for the log file
 	header.cmd	OP_ACCEPTING
 	ticket.id	"asdga"
 
 outgoing0:
 	header.cmd	OP_PREPARING
 	last_ack_ballot	42
 
 finally:
 	new_ballot	1234
 --------------------
 
 A few details about the above example:
 
 * Ticket states in RAM (`ticket`, `finally`) are written in
   host-endianness.
 * Message data (`messageN`, `outgoingN`) are automatically converted via
   `htonl` resp. `ntohl`. They are delivered/checked in the order defined
   by the integer `N` component.
 * Strings are done via `strcpy()`
 * `ticket` and `messageN` are assignment chunks
 * `finally` and `outgoingN` are compare chunks
 * In `outgoingN` you can check _both_ message data (keys with a `.` in
   them) and ticket state
 * Symbolic names are usable, GDB translates them for us
 * The test scripts in `unit-tests/` need to be named with 3 digits, an
   underscore, some text, and `.txt`
 * The "fake" `crm_ticket` script gets the current test via `UNIT_TEST`;
   test scripts can pass additional information via `UNIT_TEST_AUX`.
 
 
 ==== Tips and Hints
 
 There's another special header: `gdb__N__`. These lines are sent to GDB
 after injecting a message, but before waiting for an outgoing line.
 
 Values that contain `§` are sent as multiple lines to GDB.
 
 This means that a stanza like
 
 --------------------
 gdb0:
 	watch booth_conf->ticket[0].owner § commands § bt § c § end
 --------------------
 
 will cause a watchpoint to be set, and when it is triggered a backtrace
 (`bt`) is written to the log file.
 
 This makes it easy to ask for additional data or check for a call-chain
 when hitting bugs that can be reproduced via such a unit-test.
 
 # vim: set ft=asciidoc :
diff --git a/booth.spec.in b/booth.spec.in
index 8f33649..2326145 100644
--- a/booth.spec.in
+++ b/booth.spec.in
@@ -1,117 +1,119 @@
 %global test_path %{_datadir}/booth/tests
 
 %if 0%{?suse_version}
 %define _libexecdir %{_libdir}
 %endif
 %define with_extra_warnings 0
 %define with_debugging 0
 %define without_fatal_warnings 1
 %if 0%{?fedora_version} || 0%{?centos_version} || 0%{?rhel_version}
 %define pkg_group System Environment/Daemons
 %else
 %define pkg_group Productivity/Clustering/HA
 %endif
 
 Name: booth
 Summary: Ticket Manager for Multi-site Clusters
 License: GPL-2.0+
 Group: %{pkg_group}
 Version: @version@
 Release: 0
 Source: booth.tar.bz2
 Source1: %name-rpmlintrc
 BuildRoot: %{_tmppath}/%{name}-%{version}-build
 BuildRequires: asciidoc
 BuildRequires: autoconf
 BuildRequires: automake
 BuildRequires: glib2-devel
 BuildRequires: libglue-devel
 BuildRequires: libpacemaker-devel
 BuildRequires: libxml2-devel
 BuildRequires: pkgconfig
 
 # the following is probably SUSE specific
 Requires: pacemaker-ticket-support >= 2.0
 
 %description
 Booth manages the ticket which authorizes one of the geographically
 dispersed cluster sites to run certain resources. It is designed to be
 an add-on of Pacemaker, which extends Pacemaker to support geographically
 distributed clustering.
 
 %prep
 %setup -q -n %{name}
 
 %build
 ./autogen.sh
 %configure \
 	--with-initddir=%{_initrddir}
 make
 
 #except check
 #%check
 #make check
 
 %install
 make DESTDIR=$RPM_BUILD_ROOT install docdir=%{_defaultdocdir}/%{name}
 mkdir -p %{buildroot}/%{_mandir}/man8/
 gzip < docs/boothd.8 > %{buildroot}/%{_mandir}/man8/booth.8.gz
 ln %{buildroot}/%{_mandir}/man8/booth.8.gz %{buildroot}/%{_mandir}/man8/boothd.8.gz
 
 # systemd
 mkdir -p %{buildroot}/usr/lib/systemd/system/
 cp -a conf/booth@.service %{buildroot}/usr/lib/systemd/system/booth@.service
 
 #install test-parts
 mkdir -p %{buildroot}/%{test_path}
 cp -a unit-tests/ script/unit-test.py test conf %{buildroot}/%{test_path}/
+chmod +x %{buildroot}/%{test_path}/booth_path
+chmod +x %{buildroot}/%{test_path}/live_test.sh
 mkdir -p %{buildroot}/%{test_path}/src/
 ln -s %{_sbindir}/boothd %{buildroot}/%{test_path}/src/
 rm -f %{buildroot}/%{test_path}/test/*.pyc
 
 %clean
 rm -rf %{buildroot}
 
 %files
 %defattr(-,root,root,-)
 %{_sbindir}/booth
 %{_sbindir}/boothd
 %{_initrddir}/booth-arbitrator
 %{_mandir}/man8/booth.8.gz
 %{_mandir}/man8/boothd.8.gz
 %dir /usr/lib/ocf
 %dir /usr/lib/ocf/resource.d
 %dir /usr/lib/ocf/resource.d/pacemaker
 %dir %{_sysconfdir}/booth
 /usr/lib/ocf/resource.d/pacemaker/booth-site
 %config %{_sysconfdir}/booth/booth.conf.example
 /usr/lib/systemd/system/booth@.service
 %dir %{_datadir}/booth
 %{_datadir}/booth/service-runnable
 %doc README COPYING
 
 %package test
 Summary: Test scripts for Booth
 Group: %{pkg_group}
 Requires: booth
 Requires: python
 
 %description test
 This package contains automated tests for Booth,
 the Cluster Ticket Manager for Pacemaker.
 
 %files test
 %defattr(-,root,root)
 %doc README-testing
 %{test_path}
 
 %changelog
diff --git a/test/booth_path b/test/booth_path
new file mode 100755
index 0000000..6ea1402
--- /dev/null
+++ b/test/booth_path
@@ -0,0 +1,35 @@
+#!/bin/sh
+#
+# manage iptables rules for port 6666
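+#
+# live_test.sh uses this helper to simulate a split brain; in short:
+#   booth_path stop  - append iptables DROP rules for the booth UDP port
+#   booth_path start - delete those DROP rules again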
+#
+
+[ $# -lt 1 ] && exit
+action=$1
+port=6666
+testip() {
+	local chain=$1
+	iptables -L $chain | grep -wq ^DROP.*$port
+}
+logcmd() {
+	logger -p local7.info "$*"
+	eval $*
+}
+
+case "$action" in
+start)
+logcmd iptables -D INPUT -p udp --dport $port -j DROP
+logcmd iptables -D OUTPUT -p udp --dport $port -j DROP
+logcmd iptables -D INPUT -p udp --sport $port -j DROP
+logcmd iptables -D OUTPUT -p udp --sport $port -j DROP
+;;
+stop)
+testip INPUT && {
+	echo "packets from/to $port already being dropped!"
+	exit
+}
+logcmd iptables -A INPUT -p udp --dport $port -j DROP
+logcmd iptables -A OUTPUT -p udp --dport $port -j DROP
+logcmd iptables -A INPUT -p udp --sport $port -j DROP
+logcmd iptables -A OUTPUT -p udp --sport $port -j DROP
+;;
+esac
diff --git a/test/live_test.sh b/test/live_test.sh
new file mode 100755
index 0000000..6e129d2
--- /dev/null
+++ b/test/live_test.sh
@@ -0,0 +1,479 @@
+#!/bin/sh
+#
+# see README-testing for more information
+# do some basic booth operation tests for the given config
+#
+
+usage() {
+	echo "$0: {booth.conf}"
+	exit
+}
+
+[ $# -eq 0 ] && usage
+
+tkt=ticket-A
+cnf=$1
+shift 1
+logf=test_booth.log
+iprules=/usr/share/booth/tests/test/booth_path
+
+is_function() {
+	test z"`command -v $1`" = z"$1"
+}
+manage_site() {
+	ssh $1 crm resource $2 booth
+}
+manage_arbitrator() {
+	ssh $1 systemctl $2 booth@booth.service
+}
+start_site() {
+	manage_site $1 start
+}
+start_arbitrator() {
+	manage_arbitrator $1 start
+}
+stop_site_clean() {
+	manage_site $1 stop &&
+	sleep 1 &&
+	ssh $1 crm --force site ticket revoke $tkt
+}
+stop_site() {
+	manage_site $1 stop
+}
+stop_arbitrator() {
+	manage_arbitrator $1 stop
+}
+restart_site() {
+	manage_site $1 restart
+}
+restart_arbitrator() {
+	manage_arbitrator $1 restart
+}
+get_stat_fld() {
+	local h=$1 fld=$2
+	ssh $h booth status | sed "s/.* $fld=//;s/ .*//;s/'//g"
+}
+booth_status() {
+	test "`get_stat_fld $1 booth_state`" = "started"
+}
+start_booth() {
+	local h
+	for h in $sites; do
+		start_site $h
+	done >/dev/null 2>&1
+	for h in $arbitrators; do
+		start_arbitrator $h
+	done >/dev/null 2>&1
+	wait_timeout
+}
+restart_booth() {
+	local h procs
+	for h in $sites; do
+		restart_site $h & procs="$! $procs"
+	done >/dev/null 2>&1
+	for h in $arbitrators; do
+		restart_arbitrator $h
+	done >/dev/null 2>&1
+	wait $procs
+	wait_timeout
+}
+sync_conf() {
+	local h rc=0
+	for h in $sites $arbitrators; do
+		rsync -q $cnf $h:/etc/booth/booth.conf
+		rc=$((rc|$?))
+	done
+	return $rc
+}
+forall() {
+	local h rc=0
+	for h in $sites $arbitrators; do
+		ssh $h $@
+		rc=$((rc|$?))
+	done
+	return $rc
+}
+forall_fun() {
+	local h rc=0 f=$1
+	for h in $sites $arbitrators; do
+		$f $h
+		rc=$((rc|$?))
+		[ $rc -ne 0 ] && break
+	done
+	return $rc
+}
+run_site() {
+	local n=$1 h
+	shift 1
+	h=`echo $sites | awk '{print $'$n'}'`
+	ssh $h $@ || {
+		echo "$h: '$@' failed (exit code $?)" >&2
+	}
+}
+run_arbitrator() {
+	local n=$1 h
+	shift 1
+	h=`echo $arbitrators | awk '{print $'$n'}'`
+	ssh $h $@
+}
+get_site() {
+	local n=$1 h
+	shift 1
+	echo $sites | awk '{print $'$n'}'
+}
+
+get_servers() {
+	grep "^$1" |
+		sed -n 's/.*="//;s/"//p'
+}
+
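+# get_tkt_settings turns the expire/timeout settings of $tkt into shell
+# assignments prefixed with T_ (the example booth.conf in README-testing
+# would yield T_expire=30 and T_timeout=3); they are eval'ed below and
+# drive the wait_* helpers.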
+get_tkt_settings() {
+awk '
+n && /^	/ && /expire|timeout/ {
+	sub(" = ", "=", $0);
+	sub("^	", "T_", $0);
+	print
+	next
+}
+n && /^$/ {exit}
+/^ticket.*'$tkt'/ {n=1}
+' $cnf
+}
+wait_exp() {
+	sleep $T_expire
+}
+wait_half_exp() {
+	sleep $((T_expire/2))
+}
+wait_timeout() {
+	sleep $T_timeout
+}
+
+cib_status() {
+	local h=$1 stat
+	stat=`ssh $h crm_ticket -L |
+		grep "^$tkt" | awk '{print $2}'`
+	test "$stat" != "-1"
+}
+is_cib_granted() {
+	local stat h=$1
+	stat=`ssh $h crm_ticket -L |
+		grep "^$tkt" | awk '{print $2}'`
+	[ "$stat" = "granted" ]
+}
+check_cib_consistency() {
+	local h gh="" rc=0
+	for h in $sites; do
+		if is_cib_granted $h; then
+			[ -n "$gh" ] && rc=1 # granted twice
+			gh="$gh $h"
+		fi
+	done
+	[ -z "$gh" ] && gh="none"
+	if [ $rc -eq 0 ]; then
+		echo $gh
+		return $rc
+	fi
+	cat<<EOF >&2
+CIB consistency test failed
+ticket granted to $gh
+EOF
+	return $rc
+}
+check_cib() {
+	local exp_grantee=$1 cib_grantee booth_grantee
+	local rc=0 pending
+	cib_grantee=`check_cib_consistency`
+	booth_grantee=`booth_where_granted`
+	pending=$?
+	if [ $pending -eq 0 ]; then
+		[ "$cib_grantee" = "$booth_grantee" ]
+		rc=$?
+	else
+		# ticket is not committed to cib yet
+		[ "$exp_grantee" = "$booth_grantee" ]
+		rc=$?
+		exp_grantee="" # cheat a bit
+	fi
+	case "$exp_grantee" in
+	"any") [ "$cib_grantee" != "none" ] ;;
+	"") [ "$cib_grantee" = "none" ] ;;
+	*) [ "$cib_grantee" = "$exp_grantee" ] ;;
+	esac
+	rc=$((rc|$?))
+	if [ $rc -ne 0 ]; then
+		cat<<EOF >&2
+CIB check failed
+CIB grantee: $cib_grantee
+booth grantee: $booth_grantee
+expected grantee: $exp_grantee
+EOF
+	fi
+	return $rc
+}
+
+booth_where_granted() {
+	local grantee ticket_line
+	ticket_line=`run_arbitrator 1 booth list | grep $tkt`
+	grantee=`echo "$ticket_line" | sed 's/.*leader: //;s/,.*//'`
+	echo $grantee
+	! ssh $grantee booth list | grep -q "$tkt.*pending"
+}
+check_booth_consistency() {
+	local cnt tlist
+	tlist=`forall booth list 2>/dev/null | grep $tkt |
+		sed 's/commit:.*//;s/NONE/none/'`
+	cnt=`echo "$tlist" | sort -u | wc -l`
+	test $cnt -eq 1 && return
+	cat<<EOF >&2
+booth list consistency test failed:
+===========
+"$tlist"
+===========
+EOF
+	return 1
+}
+
+check_consistency() {
+	local exp_grantee=$1
+	check_booth_consistency &&
+	check_cib $exp_grantee
+}
+
+reset_booth() {
+	start_booth
+	run_site 1 booth revoke $tkt >/dev/null
+	wait_timeout
+}
+test_booth_status() {
+	forall_fun booth_status
+}
+
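+# Every test "name" consists of a test_name function driving booth, a
+# check_name function verifying the outcome, and an optional
+# recover_name function for cleanup; runtest glues them together and
+# collects an hb_report if the test fails.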
+runtest() {
+	local start_ts end_ts rc
+	local start_time end_time
+	start_time=`date`
+	start_ts=`date +%s`
+	echo -n "Testing: $1... "
+	test_$1 && check_$1
+	rc=$?
+	end_time=`date`
+	end_ts=`date +%s`
+	is_function recover_$1 && recover_$1
+	if [ $rc -eq 0 ]; then
+		echo OK
+	else
+		echo "FAIL (running hb_report ... $1.tar.bz2; see also $logf)"
+		echo "running hb_report" >&2
+		hb_report -f "`date -d @$((start_ts-5))`" \
+			-t "`date -d @$((end_ts+60))`" \
+			-n "$sites $arbitrators" $1 >&2
+	fi
+}
+
+[ -f "$cnf" ] || {
+	ls $cnf
+	usage
+}
+
+sites=`get_servers site < $cnf`
+arbitrators=`get_servers arbitrator < $cnf`
+eval `get_tkt_settings`
+
+[ -z "$sites" ] && {
+	echo no sites in $cnf
+	usage
+}
+
+[ -z "$T_expire" ] && {
+	echo set $tkt expire time in $cnf
+	usage
+}
+
+exec 2>$logf
+BASH_XTRACEFD=2
+PS4='+ `date +"%T"`: '
+set -x
+
+#
+# the tests
+#
+
+# just a grant
+test_grant() {
+	run_site 1 booth revoke $tkt >/dev/null
+	wait_timeout
+	run_site 1 booth grant $tkt >/dev/null
+	wait_timeout
+}
+check_grant() {
+	check_consistency `get_site 1`
+}
+
+# just a revoke
+test_revoke() {
+	run_site 1 booth revoke $tkt >/dev/null
+	wait_timeout
+	run_site 1 booth grant $tkt >/dev/null
+	wait_timeout
+	run_site 1 booth revoke $tkt >/dev/null
+	wait_timeout
+}
+check_revoke() {
+	check_consistency
+}
+
+# just a grant to another site
+test_grant_elsewhere() {
+	run_site 1 booth revoke $tkt >/dev/null
+	wait_timeout
+	run_site 1 booth grant -s `get_site 2` $tkt >/dev/null
+	wait_timeout
+}
+check_grant_elsewhere() {
+	check_consistency `get_site 2`
+}
+
+# grant with one site lost
+test_grant_site_lost() {
+	run_site 1 booth revoke $tkt >/dev/null
+	wait_timeout
+	stop_site `get_site 2`
+	wait_timeout
+	run_site 1 booth grant $tkt >/dev/null
+	wait_timeout
+	check_cib `get_site 1` || return 1
+	wait_exp
+}
+check_grant_site_lost() {
+	check_consistency `get_site 1`
+}
+recover_grant_site_lost() {
+	start_site `get_site 2`
+}
+
+# restart with ticket granted
+test_restart_granted() {
+	run_site 1 booth revoke $tkt >/dev/null
+	wait_timeout
+	run_site 1 booth grant $tkt >/dev/null
+	wait_timeout
+	restart_site `get_site 1`
+	wait_timeout
+}
+check_restart_granted() {
+	check_consistency `get_site 1`
+}
+
+# restart with ticket granted (but cib empty)
+test_restart_granted_nocib() {
+	run_site 1 booth revoke $tkt >/dev/null
+	wait_timeout
+	run_site 1 booth grant $tkt >/dev/null
+	wait_timeout
+	restart_site `get_site 1`
+	wait_timeout
+}
+check_restart_granted_nocib() {
+	check_consistency `get_site 1`
+}
+
+# restart with ticket not granted
+test_restart_notgranted() {
+	run_site 1 booth revoke $tkt >/dev/null
+	wait_timeout
+	run_site 1 booth grant $tkt >/dev/null
+	wait_timeout
+	stop_site `get_site 2`
+	sleep 1
+	start_site `get_site 2`
+	wait_timeout
+}
+check_restart_notgranted() {
+	check_consistency `get_site 1`
+}
+
+# ticket failover
+test_failover() {
+	run_site 1 booth revoke $tkt >/dev/null
+	wait_timeout
+	run_site 1 booth grant $tkt >/dev/null
+	wait_timeout
+	stop_site_clean `get_site 1` || return 1
+	booth_status `get_site 1` && return 1
+	wait_exp
+	wait_timeout
+}
+check_failover() {
+	check_consistency any
+	start_site `get_site 1`
+}
+
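+# The three split brain tests below use $iprules (the booth_path helper
+# installed next to this script) to drop booth's UDP traffic on one
+# site and to restore it afterwards: "stop" blocks the port, "start"
+# unblocks it again.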
+# split brain (leader alone)
+test_split_leader() {
+	run_site 1 booth revoke $tkt >/dev/null
+	wait_timeout
+	run_site 1 booth grant $tkt >/dev/null
+	wait_timeout
+	run_site 1 $iprules stop >/dev/null
+	wait_exp
+	wait_timeout
+	check_cib any || return 1
+	run_site 1 $iprules start >/dev/null
+	wait_timeout
+}
+check_split_leader() {
+	check_consistency any
+}
+recover_split_leader() {
+	run_site 1 $iprules start >/dev/null
+}
+
+# split brain (follower alone)
+test_split_follower() {
+	run_site 1 booth revoke $tkt >/dev/null
+	wait_timeout
+	run_site 1 booth grant $tkt >/dev/null
+	wait_timeout
+	run_site 2 $iprules stop >/dev/null
+	wait_exp
+	wait_timeout
+	run_site 2 $iprules start >/dev/null
+	wait_timeout
+}
+check_split_follower() {
+	check_consistency `get_site 1`
+}
+
+# split brain (leader alone, reconnect at the expiry edge)
+test_split_edge() {
+	run_site 1 booth revoke $tkt >/dev/null
+	wait_timeout
+	run_site 1 booth grant $tkt >/dev/null
+	wait_timeout
+	run_site 1 $iprules stop >/dev/null
+	wait_exp
+	run_site 1 $iprules start >/dev/null
+	wait_timeout
+}
+check_split_edge() {
+	check_consistency any
+}
+
+sync_conf || exit
+restart_booth
+test_booth_status || {
+	reset_booth
+	test_booth_status || exit
+}
+
+TESTS="$@"
+
+: ${TESTS:="grant grant_elsewhere grant_site_lost revoke
+restart_granted restart_granted_nocib restart_notgranted
+failover
+split_leader split_follower split_edge"}
+
+for t in $TESTS; do
+	runtest $t
+done
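+
+# A run prints "Testing: <name>... " followed by OK or FAIL for each
+# selected test; on failure an hb_report archive named after the test
+# (<name>.tar.bz2) should be collected and the full shell trace ends up
+# in $logf (test_booth.log).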