diff --git a/README-testing b/README-testing
index 0d70c85..dee658b 100644
--- a/README-testing
+++ b/README-testing
@@ -1,104 +1,174 @@
There's a booth-test RPM available that contains two types of tests.
It installs the necessary files into `/usr/share/booth/tests`.
+=== Live tests (booth operation)
+
+BEWARE: Run this with _test_ clusters only!
+
+The live testing utility tests booth operation using the given
+`booth.conf`:
+
+ $ /usr/share/booth/tests/test/live_test.sh booth.conf
+
+It is possible to run only specific tests by naming them after
+`booth.conf` on the command line. See the end of the script for the
+list of tests currently available.
+
+Example booth.conf:
+
+------------
+transport="UDP"
+port="6666"
+arbitrator="10.2.12.53"
+arbitrator="10.2.13.82"
+site="10.2.12.101"
+site="10.2.13.101"
+site="10.121.187.99"
+
+ticket="ticket-A"
+ expire = 30
+ timeout = 3
+ retries = 3
+ before-acquire-handler = /usr/share/booth/service-runnable d-src1
+------------
+
+A split brain condition is also tested. For that to work, all
+sites need `iptables` installed. The supplied script `booth_path`
+is used to manipulate iptables rules.
+
+It is not necessary to run the test script on one of the sites.
+Just copy the script and make the test `booth.conf` available
+locally:
+
+ $ scp testsite:/usr/share/booth/tests/test/live_test.sh .
+ $ scp testsite:/etc/booth/booth.conf .
+ $ sh live_test.sh booth.conf
+
+You need at least two sites and one arbitrator.
+
+The ticket must be named `ticket-A`.
+
+It is not necessary to configure the `before-acquire-handler`.
+
+Notes:
+
+- (BEWARE!) the supplied configuration file is copied to
+ /etc/booth/booth.conf on all sites/arbitrators, overwriting
+ any existing configuration
+
+- the utility uses ssh to manage booth at all sites/arbitrators
+ and logs in as user `root`
+
+- ssh public key authentication must work without prompting for
+ a passphrase (otherwise the test run is impractical)
+
+- the log file is ./test_booth.log (it is actually a shell trace,
+ with timestamps if you're running bash)
+
+- if one of the tests fails, an hb_report is created
+
+If you want to open a bug report, please attach all hb_reports
+and `test_booth.log`.
+
+
+
=== Simple tests (commandline, config file)
Run (as non-root)
# python test/runtests.py
to run the tests written in python.
=== Unit tests
These use gdb and pexpect to set boothd state to some configured value,
injecting some input and looking at the output.
# python script/unit-test.py src/boothd unit-tests/
Or, if using the 'booth-test' RPM,
# python unit-test.py src/boothd unit-tests/
This must (currently?) be run as a non-root user; another optional argument is
the test to start from, eg. '003'.
Basically, boothd is started with the config file `unit-tests/booth.conf`, and
gdb gets attached to it.
Then, some ticket state is set, incoming messages are delivered, and outgoing
messages and the state is compared to expected values.
`unit-tests/_defaults.txt` has default values for the initial state and
message data.
Each test file consists of headers and key/value pairs:
--------------------
ticket:
state ST_STABLE
message0: # optional comment for the log file
header.cmd OP_ACCEPTING
ticket.id "asdga"
outgoing0:
header.cmd OP_PREPARING
last_ack_ballot 42
finally:
new_ballot 1234
--------------------
A few details on the above example:
* Ticket states in RAM (`ticket`, `finally`) are written in host-endianness.
* Message data (`messageN`, `outgoingN`) are automatically converted via `htonl` resp. `ntohl`. They are delivered/checked in the order defined by the integer `N` component.
* Strings are done via `strcpy()`
* `ticket` and `messageN` are assignment chunks
* `finally` and `outgoingN` are compare chunks
* In `outgoingN` you can check _both_ message data (keys with a `.` in them) and ticket state
* Symbolic names are useable, GDB translates them for us
* The test scripts in `unit-tests/` need to be named with 3 digits, an underscore, some text, and `.txt`
* The "fake" `crm_ticket` script gets the current test via `UNIT_TEST`; test scripts can pass additional information via `UNIT_TEST_AUX`.
==== Tips and Hints
There's another special header: `gdb__N__`. These lines are sent to GDB after
injecting a message, but before waiting for an outgoing line. Values that
contain `§` are sent as multiple lines to GDB.
This means that a stanza like
--------------------
gdb0:
watch booth_conf->ticket[0].owner § commands § bt § c § end
--------------------
will cause a watchpoint to be set, and when it is triggered a backtrace (`bt`)
is written to the log file.
This makes it easy to ask for additional data or check for a call-chain when
hitting bugs that can be reproduced via such a unit-test.
# vim: set ft=asciidoc :
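
Review note on the live test requirements in README-testing (at least two sites, one arbitrator, a ticket named `ticket-A`): these could be checked before anything is copied to the cluster. The following is only a sketch and not part of this patch; the function name `check_conf` is hypothetical, and it assumes the `key="value"` layout shown in the example `booth.conf`.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of the patch).
# check_conf FILE: verify the requirements listed in README-testing.
check_conf() {
    conf=$1
    # count lines such as site="10.2.12.101" and arbitrator="10.2.12.53"
    nsites=$(grep -c '^site=' "$conf")
    narbs=$(grep -c '^arbitrator=' "$conf")
    [ "$nsites" -ge 2 ] || { echo "need at least two sites" >&2; return 1; }
    [ "$narbs" -ge 1 ] || { echo "need at least one arbitrator" >&2; return 1; }
    # the live tests hard-code the ticket name
    grep -q '^ticket="ticket-A"' "$conf" || {
        echo "ticket must be named ticket-A" >&2
        return 1
    }
}
```

It would be called as `check_conf booth.conf || exit 1` ahead of the `sync_conf` step.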
diff --git a/booth.spec.in b/booth.spec.in
index 8f33649..2326145 100644
--- a/booth.spec.in
+++ b/booth.spec.in
@@ -1,117 +1,119 @@
%global test_path %{_datadir}/booth/tests
%if 0%{?suse_version}
%define _libexecdir %{_libdir}
%endif
%define with_extra_warnings 0
%define with_debugging 0
%define without_fatal_warnings 1
%if 0%{?fedora_version} || 0%{?centos_version} || 0%{?rhel_version}
%define pkg_group System Environment/Daemons
%else
%define pkg_group Productivity/Clustering/HA
%endif
Name: booth
Summary: Ticket Manager for Multi-site Clusters
License: GPL-2.0+
Group: %{pkg_group}
Version: @version@
Release: 0
Source: booth.tar.bz2
Source1: %name-rpmlintrc
BuildRoot: %{_tmppath}/%{name}-%{version}-build
BuildRequires: asciidoc
BuildRequires: autoconf
BuildRequires: automake
BuildRequires: glib2-devel
BuildRequires: libglue-devel
BuildRequires: libpacemaker-devel
BuildRequires: libxml2-devel
BuildRequires: pkgconfig
# the following is probably SUSE specific
Requires: pacemaker-ticket-support >= 2.0
%description
Booth manages the ticket which authorizes one of the cluster sites located in
geographically dispersed distances to run certain resources. It is designed to
be an add-on of Pacemaker, which extends Pacemaker to support geographically
distributed clustering.
%prep
%setup -q -n %{name}
%build
./autogen.sh
%configure \
--with-initddir=%{_initrddir}
make
#except check
#%check
#make check
%install
make DESTDIR=$RPM_BUILD_ROOT install docdir=%{_defaultdocdir}/%{name}
mkdir -p %{buildroot}/%{_mandir}/man8/
gzip < docs/boothd.8 > %{buildroot}/%{_mandir}/man8/booth.8.gz
ln %{buildroot}/%{_mandir}/man8/booth.8.gz %{buildroot}/%{_mandir}/man8/boothd.8.gz
# systemd
mkdir -p %{buildroot}/usr/lib/systemd/system/
cp -a conf/booth@.service %{buildroot}/usr/lib/systemd/system/booth@.service
#install test-parts
mkdir -p %{buildroot}/%{test_path}
cp -a unit-tests/ script/unit-test.py test conf %{buildroot}/%{test_path}/
+chmod +x %{buildroot}/%{test_path}/booth_path
+chmod +x %{buildroot}/%{test_path}/live_test.sh
mkdir -p %{buildroot}/%{test_path}/src/
ln -s %{_sbindir}/boothd %{buildroot}/%{test_path}/src/
rm -f %{buildroot}/%{test_path}/test/*.pyc
%clean
rm -rf %{buildroot}
%files
%defattr(-,root,root,-)
%{_sbindir}/booth
%{_sbindir}/boothd
%{_initrddir}/booth-arbitrator
%{_mandir}/man8/booth.8.gz
%{_mandir}/man8/boothd.8.gz
%dir /usr/lib/ocf
%dir /usr/lib/ocf/resource.d
%dir /usr/lib/ocf/resource.d/pacemaker
%dir %{_sysconfdir}/booth
/usr/lib/ocf/resource.d/pacemaker/booth-site
%config %{_sysconfdir}/booth/booth.conf.example
/usr/lib/systemd/system/booth@.service
%dir %{_datadir}/booth
%{_datadir}/booth/service-runnable
%doc README COPYING
%package test
Summary: Test scripts for Booth
Group: %{pkg_group}
Requires: booth
Requires: python
%description test
This package contains automated tests for Booth,
the Cluster Ticket Manager for Pacemaker.
%files test
%defattr(-,root,root)
%doc README-testing
%{test_path}
%changelog
diff --git a/test/booth_path b/test/booth_path
new file mode 100755
index 0000000..6ea1402
--- /dev/null
+++ b/test/booth_path
@@ -0,0 +1,35 @@
+#!/bin/sh
+#
+# manage iptables rules for port 6666
+#
+
+[ $# -lt 1 ] && exit
+action=$1
+port=6666
+testip() {
+ local chain=$1
+ iptables -L $chain | grep -wq "^DROP.*$port"
+}
+logcmd() {
+ logger -p local7.info "$*"
+ eval $*
+}
+
+case "$action" in
+start)
+logcmd iptables -D INPUT -p udp --dport $port -j DROP
+logcmd iptables -D OUTPUT -p udp --dport $port -j DROP
+logcmd iptables -D INPUT -p udp --sport $port -j DROP
+logcmd iptables -D OUTPUT -p udp --sport $port -j DROP
+;;
+stop)
+testip INPUT && {
+ echo "packets from/to $port already being dropped!"
+ exit
+}
+logcmd iptables -A INPUT -p udp --dport $port -j DROP
+logcmd iptables -A OUTPUT -p udp --dport $port -j DROP
+logcmd iptables -A INPUT -p udp --sport $port -j DROP
+logcmd iptables -A OUTPUT -p udp --sport $port -j DROP
+;;
+esac
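
Review note on `testip` in booth_path: the check greps the `iptables -L` listing for a DROP line that mentions the port as a whole word. A minimal sketch of just that filter, reading the listing from stdin so it can be tried on sample text without root; the name `has_drop_rule` is hypothetical and the sample listing in the usage line is illustrative, not captured from a real host.

```shell
#!/bin/sh
# Hypothetical helper (not part of the patch).
# has_drop_rule PORT: read an iptables listing on stdin and succeed
# if a line starting with DROP mentions PORT as a whole word.
has_drop_rule() {
    grep -wq "^DROP.*$1"
}
```

For example, `printf 'DROP udp -- anywhere anywhere udp dpt:6666\n' | has_drop_rule 6666` succeeds, while a rule for `dpt:66660` does not match because of `-w`. Note that the listing only shows the numeric port if it is not translated to a service name; passing `-n` to `iptables -L` avoids that translation.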
diff --git a/test/live_test.sh b/test/live_test.sh
new file mode 100755
index 0000000..6e129d2
--- /dev/null
+++ b/test/live_test.sh
@@ -0,0 +1,479 @@
+#!/bin/sh
+#
+# see README-testing for more information
+# do some basic booth operation tests for the given config
+#
+
+usage() {
+ echo "$0: {booth.conf}"
+ exit
+}
+
+[ $# -eq 0 ] && usage
+
+tkt=ticket-A
+cnf=$1
+shift 1
+logf=test_booth.log
+iprules=/usr/share/booth/tests/test/booth_path
+
+is_function() {
+ test z"`command -v $1`" = z"$1"
+}
+manage_site() {
+ ssh $1 crm resource $2 booth
+}
+manage_arbitrator() {
+ ssh $1 systemctl $2 booth@booth.service
+}
+start_site() {
+ manage_site $1 start
+}
+start_arbitrator() {
+ manage_arbitrator $1 start
+}
+stop_site_clean() {
+ manage_site $1 stop &&
+ sleep 1 &&
+ ssh $1 crm --force site ticket revoke $tkt
+}
+stop_site() {
+ manage_site $1 stop
+}
+stop_arbitrator() {
+ manage_arbitrator $1 stop
+}
+restart_site() {
+ manage_site $1 restart
+}
+restart_arbitrator() {
+ manage_arbitrator $1 restart
+}
+get_stat_fld() {
+ local h=$1 fld=$2
+ ssh $h booth status | sed "s/.* $fld=//;s/ .*//;s/'//g"
+}
+booth_status() {
+ test "`get_stat_fld $1 booth_state`" = "started"
+}
+start_booth() {
+ local h
+ for h in $sites; do
+ start_site $h
+ done >/dev/null 2>&1
+ for h in $arbitrators; do
+ start_arbitrator $h
+ done >/dev/null 2>&1
+ wait_timeout
+}
+restart_booth() {
+ local h procs
+ for h in $sites; do
+ restart_site $h & procs="$! $procs"
+ done >/dev/null 2>&1
+ for h in $arbitrators; do
+ restart_arbitrator $h
+ done >/dev/null 2>&1
+ wait $procs
+ wait_timeout
+}
+sync_conf() {
+ local h rc=0
+ for h in $sites $arbitrators; do
+ rsync -q $cnf $h:/etc/booth/booth.conf
+ rc=$((rc|$?))
+ done
+ return $rc
+}
+forall() {
+ local h rc=0
+ for h in $sites $arbitrators; do
+ ssh $h $@
+ rc=$((rc|$?))
+ done
+ return $rc
+}
+forall_fun() {
+ local h rc=0 f=$1
+ for h in $sites $arbitrators; do
+ $f $h
+ rc=$((rc|$?))
+ [ $rc -ne 0 ] && break
+ done
+ return $rc
+}
+run_site() {
+ local n=$1 h
+ shift 1
+ h=`echo $sites | awk '{print $'$n'}'`
+ ssh $h $@ || {
+ echo "$h: '$@' failed (exit code $?)" >&2
+ }
+}
+run_arbitrator() {
+ local n=$1 h
+ shift 1
+ h=`echo $arbitrators | awk '{print $'$n'}'`
+ ssh $h $@
+}
+get_site() {
+ local n=$1 h
+ shift 1
+ echo $sites | awk '{print $'$n'}'
+}
+
+get_servers() {
+ grep "^$1" |
+ sed -n 's/.*="//;s/"//p'
+}
+
+get_tkt_settings() {
+awk '
+n && /^ / && /expire|timeout/ {
+ sub(" = ", "=", $0);
+ sub("^ ", "T_", $0);
+ print
+ next
+}
+n && /^$/ {exit}
+/^ticket.*'$tkt'/ {n=1}
+' $cnf
+}
+wait_exp() {
+ sleep $T_expire
+}
+wait_half_exp() {
+ sleep $((T_expire/2))
+}
+wait_timeout() {
+ sleep $T_timeout
+}
+
+cib_status() {
+ local h=$1 stat
+ stat=`ssh $h crm_ticket -L |
+ grep "^$tkt" | awk '{print $2}'`
+ test "$stat" != "-1"
+}
+is_cib_granted() {
+ local stat h=$1
+ stat=`ssh $h crm_ticket -L |
+ grep "^$tkt" | awk '{print $2}'`
+ [ "$stat" = "granted" ]
+}
+check_cib_consistency() {
+ local h gh="" rc=0
+ for h in $sites; do
+ if is_cib_granted $h; then
+ [ -n "$gh" ] && rc=1 # granted twice
+ gh="$gh $h"
+ fi
+ done
+ [ -z "$gh" ] && gh="none"
+ if [ $rc -eq 0 ]; then
+ echo $gh
+ return $rc
+ fi
+ cat<<EOF >&2
+CIB consistency test failed
+ticket granted to $gh
+EOF
+ return $rc
+}
+check_cib() {
+ local exp_grantee=$1 cib_grantee booth_grantee
+ local rc=0 pending
+ cib_grantee=`check_cib_consistency`
+ booth_grantee=`booth_where_granted`
+ pending=$?
+ if [ $pending -eq 0 ]; then
+ [ "$cib_grantee" = "$booth_grantee" ]
+ rc=$?
+ else
+ # ticket is not committed to cib yet
+ [ "$exp_grantee" = "$booth_grantee" ]
+ rc=$?
+ exp_grantee="" # cheat a bit
+ fi
+ case "$exp_grantee" in
+ "any") [ "$cib_grantee" != "none" ] ;;
+ "") [ "$cib_grantee" = "none" ] ;;
+ *) [ "$cib_grantee" = "$exp_grantee" ] ;;
+ esac
+ rc=$((rc|$?))
+ if [ $rc -ne 0 ]; then
+ cat<<EOF >&2
+CIB check failed
+CIB grantee: $cib_grantee
+booth grantee: $booth_grantee
+expected grantee: $exp_grantee
+EOF
+ fi
+ return $rc
+}
+
+booth_where_granted() {
+ local grantee ticket_line
+ ticket_line=`run_arbitrator 1 booth list | grep $tkt`
+ grantee=`echo "$ticket_line" | sed 's/.*leader: //;s/,.*//'`
+ echo $grantee
+ ! ssh $grantee booth list | grep -q "$tkt.*pending"
+}
+check_booth_consistency() {
+ local cnt tlist
+ tlist=`forall booth list 2>/dev/null | grep $tkt |
+ sed 's/commit:.*//;s/NONE/none/'`
+ cnt=`echo "$tlist" | sort -u | wc -l`
+ test $cnt -eq 1 && return
+ cat<<EOF >&2
+booth list consistency test failed:
+===========
+"$tlist"
+===========
+EOF
+ return 1
+}
+
+check_consistency() {
+ local exp_grantee=$1
+ check_booth_consistency &&
+ check_cib $exp_grantee
+}
+
+reset_booth() {
+ start_booth
+ run_site 1 booth revoke $tkt >/dev/null
+ wait_timeout
+}
+test_booth_status() {
+ forall_fun booth_status
+}
+
+runtest() {
+ local start_ts end_ts rc
+ local start_time end_time
+ start_time=`date`
+ start_ts=`date +%s`
+ echo -n "Testing: $1... "
+ test_$1 && check_$1
+ rc=$?
+ end_time=`date`
+ end_ts=`date +%s`
+ is_function recover_$1 && recover_$1
+ if [ $rc -eq 0 ]; then
+ echo OK
+ else
+ echo "FAIL (running hb_report ... $1.tar.bz2; see also $logf)"
+ echo "running hb_report" >&2
+ hb_report -f "`date -d @$((start_ts-5))`" \
+ -t "`date -d @$((end_ts+60))`" \
+ -n "$sites $arbitrators" $1 >&2
+ fi
+}
+
+[ -f "$cnf" ] || {
+ ls $cnf
+ usage
+}
+
+sites=`get_servers site < $cnf`
+arbitrators=`get_servers arbitrator < $cnf`
+eval `get_tkt_settings`
+
+[ -z "$sites" ] && {
+ echo no sites in $cnf
+ usage
+}
+
+[ -z "$T_expire" ] && {
+ echo set $tkt expire time in $cnf
+ usage
+}
+
+exec 2>$logf
+BASH_XTRACEFD=2
+PS4='+ `date +"%T"`: '
+set -x
+
+#
+# the tests
+#
+
+# just a grant
+test_grant() {
+ run_site 1 booth revoke $tkt >/dev/null
+ wait_timeout
+ run_site 1 booth grant $tkt >/dev/null
+ wait_timeout
+}
+check_grant() {
+ check_consistency `get_site 1`
+}
+
+# just a revoke
+test_revoke() {
+ run_site 1 booth revoke $tkt >/dev/null
+ wait_timeout
+ run_site 1 booth grant $tkt >/dev/null
+ wait_timeout
+ run_site 1 booth revoke $tkt >/dev/null
+ wait_timeout
+}
+check_revoke() {
+ check_consistency
+}
+
+# just a grant to another site
+test_grant_elsewhere() {
+ run_site 1 booth revoke $tkt >/dev/null
+ wait_timeout
+ run_site 1 booth grant -s `get_site 2` $tkt >/dev/null
+ wait_timeout
+}
+check_grant_elsewhere() {
+ check_consistency `get_site 2`
+}
+
+# grant with one site lost
+test_grant_site_lost() {
+ run_site 1 booth revoke $tkt >/dev/null
+ wait_timeout
+ stop_site `get_site 2`
+ wait_timeout
+ run_site 1 booth grant $tkt >/dev/null
+ wait_timeout
+ check_cib `get_site 1` || return 1
+ wait_exp
+}
+check_grant_site_lost() {
+ check_consistency `get_site 1`
+}
+recover_grant_site_lost() {
+ start_site `get_site 2`
+}
+
+# restart with ticket granted
+test_restart_granted() {
+ run_site 1 booth revoke $tkt >/dev/null
+ wait_timeout
+ run_site 1 booth grant $tkt >/dev/null
+ wait_timeout
+ restart_site `get_site 1`
+ wait_timeout
+}
+check_restart_granted() {
+ check_consistency `get_site 1`
+}
+
+# restart with ticket granted (but cib empty)
+test_restart_granted_nocib() {
+ run_site 1 booth revoke $tkt >/dev/null
+ wait_timeout
+ run_site 1 booth grant $tkt >/dev/null
+ wait_timeout
+ restart_site `get_site 1`
+ wait_timeout
+}
+check_restart_granted_nocib() {
+ check_consistency `get_site 1`
+}
+
+# restart with ticket not granted
+test_restart_notgranted() {
+ run_site 1 booth revoke $tkt >/dev/null
+ wait_timeout
+ run_site 1 booth grant $tkt >/dev/null
+ wait_timeout
+ stop_site `get_site 2`
+ sleep 1
+ start_site `get_site 2`
+ wait_timeout
+}
+check_restart_notgranted() {
+ check_consistency `get_site 1`
+}
+
+# ticket failover
+test_failover() {
+ run_site 1 booth revoke $tkt >/dev/null
+ wait_timeout
+ run_site 1 booth grant $tkt >/dev/null
+ wait_timeout
+ stop_site_clean `get_site 1` || return 1
+ booth_status `get_site 1` && return 1
+ wait_exp
+ wait_timeout
+}
+check_failover() {
+ check_consistency any
+ start_site `get_site 1`
+}
+
+# split brain (leader alone)
+test_split_leader() {
+ run_site 1 booth revoke $tkt >/dev/null
+ wait_timeout
+ run_site 1 booth grant $tkt >/dev/null
+ wait_timeout
+ run_site 1 $iprules stop >/dev/null
+ wait_exp
+ wait_timeout
+ check_cib any || return 1
+ run_site 1 $iprules start >/dev/null
+ wait_timeout
+}
+check_split_leader() {
+ check_consistency any
+}
+recover_split_leader() {
+ run_site 1 $iprules start >/dev/null
+}
+
+# split brain (follower alone)
+test_split_follower() {
+ run_site 1 booth revoke $tkt >/dev/null
+ wait_timeout
+ run_site 1 booth grant $tkt >/dev/null
+ wait_timeout
+ run_site 2 $iprules stop >/dev/null
+ wait_exp
+ wait_timeout
+ run_site 2 $iprules start >/dev/null
+ wait_timeout
+}
+check_split_follower() {
+ check_consistency `get_site 1`
+}
+
+# split brain (leader alone, connectivity restored right at ticket expiry)
+test_split_edge() {
+ run_site 1 booth revoke $tkt >/dev/null
+ wait_timeout
+ run_site 1 booth grant $tkt >/dev/null
+ wait_timeout
+ run_site 1 $iprules stop >/dev/null
+ wait_exp
+ run_site 1 $iprules start >/dev/null
+ wait_timeout
+}
+check_split_edge() {
+ check_consistency any
+}
+
+sync_conf || exit
+restart_booth
+test_booth_status || {
+ reset_booth
+ test_booth_status || exit
+}
+
+TESTS="$@"
+
+: ${TESTS:="grant grant_elsewhere grant_site_lost revoke
+restart_granted restart_granted_nocib restart_notgranted
+failover
+split_leader split_follower split_edge"}
+
+for t in $TESTS; do
+ runtest $t
+done
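
Review note on `get_tkt_settings` in live_test.sh: it turns the indented `expire` and `timeout` lines of the ticket stanza into `T_expire=...` and `T_timeout=...` assignments that the script then `eval`s. A standalone sketch of the same idea, tolerant of tabs or several spaces of indentation; the name `ticket_settings` and its argument order are hypothetical, and it assumes the stanza ends at the first blank line or at end of file.

```shell
#!/bin/sh
# Hypothetical standalone variant (not part of the patch).
# ticket_settings TICKET FILE: print T_expire=N and T_timeout=N
# for the given ticket stanza in a booth.conf style file.
ticket_settings() {
    awk -v tkt="$1" '
    # inside the stanza: indented expire/timeout lines
    found && /^[ \t]+(expire|timeout)[ \t]*=/ {
        sub(/^[ \t]+/, "")          # drop the indentation
        sub(/[ \t]*=[ \t]*/, "=")   # "expire = 30" becomes "expire=30"
        print "T_" $0
        next
    }
    found && /^$/ { exit }          # a blank line ends the stanza
    $0 ~ "^ticket=\"" tkt "\"" { found = 1 }
    ' "$2"
}
```

Usage would mirror the script: `eval "$(ticket_settings ticket-A booth.conf)"`, after which `$T_expire` and `$T_timeout` are set.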
