diff --git a/tools/README.hb_report b/tools/README.hb_report
deleted file mode 100644
index 5d8f5d6757..0000000000
--- a/tools/README.hb_report
+++ /dev/null
@@ -1,308 +0,0 @@
-Heartbeat reporting
-===================
-Dejan Muhamedagic
-v1.0
-
-`hb_report` is a utility to collect all information relevant to
-Heartbeat over a given period of time.
-
-Quick start
------------
-
-Run `hb_report` on one of the nodes or on the host which serves as
-a central log server. Run `hb_report` without parameters to see usage.
-
-A few examples:
-
-1. Last night during the backup there were several warnings
-encountered (logserver is the log host):
-+
-    logserver# hb_report -f 3:00 -t 4:00 /tmp/report
-+
-collects everything from all nodes from 3am to 4am last night.
-The files are stored in /tmp/report and compressed to a tarball
-/tmp/report.tar.gz.
-
-2. Just found a problem during testing:
-
-    node1# date : note the current time
-    node1# /etc/init.d/heartbeat start
-    node1# nasty_command_that_breaks_things
-    node1# sleep 120 : wait for the cluster to settle
-    node1# hb_report -f time /tmp/hb1
-
-Introduction
-------------
-
-Managing clusters is cumbersome. Heartbeat v2 with its numerous
-configuration files and multi-node clusters just adds to the
-complexity. No wonder then that most problem reports were less
-than optimal. This is an attempt to rectify that situation and
-make life easier for both the users and the developers.
-
-On security
------------
-
-`hb_report` is a fairly complex program. As some of you are
-probably going to run it as `root`, let us state a few important
-things you should keep in mind:
-
-1. Don't run `hb_report` as `root`! It is fairly simple to set
-things up in such a way that root access is not needed. I won't go
-into details, let me just stress that all information collected
-should be readable by accounts belonging to the haclient group.
-
-2. If you still have to run this as root, then at least don't use
-the `-C` option.
-
-3. Of course, every possible precaution has been taken not to
-disturb processes, or touch or remove files outside of the given
-destination directory. If you (by mistake) specify an existing
-directory, `hb_report` will bail out early. If you specify a
-relative path, it won't work either.
-
-The final product of `hb_report` is a tarball. However, the
-destination directory is not removed on any node unless the user
-specifies `-C`. If you're too lazy to clean up after the previous run,
-do yourself a favour and just supply a new destination directory.
-You've been warned. If you worry about the space used, just put
-all your directories under `/tmp` and set up a cronjob to remove
-those directories once a week:
-..........
-    for d in /tmp/*; do
-        test -d $d ||
-            continue
-        test -f $d/description.txt || test -f $d/.env ||
-            continue
-        grep -qs 'By: hb_report' $d/description.txt ||
-            grep -qs '^UNIQUE_MSG=Mark' $d/.env ||
-            continue
-        rm -r $d
-    done
-..........
-
-Mode of operation
------------------
-
-Cluster data collection is straightforward: just run the same
-procedure on all nodes and collect the reports. There is,
-apart from many small ones, one large complication: a central
-syslog destination. So, in order to allow this to be fully
-automated, we sometimes have to run the procedure on the log host
-too. Actually, if there is a log host, then the best way is to
-run `hb_report` there.
-
-We use `ssh` for the remote program invocation. Even though it is
-possible to run `hb_report` without ssh by doing more of the work
-manually, the overall user experience is much better if ssh works.
-Anyway, how else do you manage your cluster?
-
-Another ssh related point: in case your security policy
-proscribes loghost-to-cluster-over-ssh communications, then
-you'll have to copy the log file to one of the nodes and point
-`hb_report` to it.
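For the curious, the remote invocation amounts to starting a slave
instance of `hb_report` over ssh on every other node and unpacking
each slave's tarball as it streams back. A minimal sketch of the
idea (the names here are illustrative, not the exact code):

    # master side: fan out to the other nodes, collect the results
    for node in `getnodes`; do
        [ "$node" = "`uname -n`" ] && continue
        ssh $node "hb_report __slave $DESTDIR" |
            (cd $DESTDIR && tar xf -) &
    done
    wait  # every slave tars its directory to stdout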
-
-Prerequisites
--------------
-
-1. ssh
-+
-This is not strictly required, but you won't regret having a
-password-less ssh. It is not too difficult to set up and will save
-you a lot of time. If you can't have it, for example because your
-security policy does not allow such a thing, or you just prefer
-menial work, then you will have to resort to the semi-manual
-semi-automated report generation. See below for instructions.
-+
-If you need to supply a password for your passphrase/login, then
-please use the `-u` option.
-
-2. Times
-+
-In order to find files and messages in the given period and to
-parse the `-f` and `-t` options, `hb_report` uses perl and one of the
-`Date::Parse` or `Date::Manip` perl modules. Note that you need
-only one of these. Furthermore, on nodes which have no logs and
-where you don't run `hb_report` directly, no date parsing is
-necessary. In other words, if you run this on a loghost then you
-don't need these perl modules on the cluster nodes.
-+
-On rpm based distributions, you can find `Date::Parse` in
-`perl-TimeDate` and on Debian and its derivatives in
-`libtimedate-perl`.
-
-3. Core dumps
-+
-To backtrace core dumps, `gdb` is needed along with the Heartbeat
-packages containing the debugging info. The debug info packages
-may be installed at the time the report is created. Let's hope
-that you will need this really seldom.
-
-What is in the report
----------------------
-
-1. Heartbeat related
-- heartbeat version/release information
-- heartbeat configuration (CIB, ha.cf, logd.cf)
-- heartbeat status (output from crm_mon, crm_verify, crm_node)
-- pengine transition graphs (if any)
-- backtraces of core dumps (if any)
-- heartbeat logs (if any)
-2. System related
-- general platform information (`uname`, `arch`, `distribution`)
-- system statistics (`uptime`, `top`, `ps`, `netstat -i`, `arp`)
-3. User created :)
-- problem description (template to be edited)
-4. Generated
-- problem analysis (generated)
-
-It is preferred that Heartbeat is running at the time of the
-report, but it is not absolutely required. `hb_report` will also do
-a quick analysis of the collected information.
-
-Times
------
-
-Specifying times can at times be a nuisance. That is why we have
-chosen to use one of the perl modules--they do allow certain
-freedom when talking dates. You can either read the instructions
-at the
-http://search.cpan.org/dist/TimeDate/lib/Date/Parse.pm#EXAMPLE_DATES[Date::Parse
-examples page] or just rely on common sense and try stuff like:
-
-    3:00          (today at 3am)
-    15:00         (today at 3pm)
-    2007/9/1 2pm  (September 1st at 2pm)
-
-`hb_report` will (probably) complain if it can't figure out what
-you mean.
-
-Try to delimit the event as closely as possible in order to reduce
-the size of the report, but still leave a minute or two around
-for good measure.
-
-Note that `-f` is not optional. And don't forget to quote dates
-when they contain spaces.
-
-It is also possible to extract a CTS test. Just prefix the test
-number with `cts:` in the `-f` option.
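As an aside, the conversion boils down to a single perl call; a
simplified sketch of what `hb_report` does internally (the real
code also falls back to `Date::Manip` when `Date::Parse` is
missing):

    # turn a human readable date into seconds since the epoch
    str2time() {
        perl -MDate::Parse -e "print str2time('$*')"
    }
    str2time "2007/9/1 2pm"  # prints the corresponding epoch time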
-
-Should I send all this to the rest of the Internet?
----------------------------------------------------
-
-We make an effort to remove sensitive data from the Heartbeat
-configuration (CIB, ha.cf, and transition graphs). However, you
-_have_ to tell us what is sensitive! Use the `-p` option to specify
-additional regular expressions to match variable names which may
-contain information you don't want to leak. For example:
-
-    # hb_report -f 18:00 -p "user.*" -p "secret.*" /var/tmp/report
-
-By default, we look for variable names matching "pass.*" and for
-the stonith_host ha.cf directive.
-
-Logs and other files are not filtered. Please filter them
-yourself if necessary.
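The scrubbing itself is little more than a sed substitution over
matching name/value pairs in the XML; a one-pattern illustration of
the idea (the actual code builds the expression list from the `-p`
arguments):

    # blank the value of every nvpair whose name matches pass.*
    sed '/name="pass[^"]*"/s/value="[^"]*"/value="****"/' cib.xml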
-
-Logs
-----
-
-It may be tricky to find syslog logs. The scheme used is to log a
-unique message on all nodes and then look it up in the usual
-syslog locations. This procedure is not foolproof, in particular
-if the syslog files are in a non-standard directory. We look in
-/var/log /var/logs /var/syslog /var/adm /var/log/ha
-/var/log/cluster. In case we can't find the logs, please supply
-their location:
-
-    # hb_report -f 5pm -l /var/log/cluster1/ha-log -S /tmp/report_node1
-
-If you have different log locations on different nodes, well,
-perhaps you'd like to make them the same and make life easier for
-everybody.
-
-The log files are collected from all hosts where found. In case
-your syslog is configured to log to both the log server and local
-files, and `hb_report` is run on the log server, you will end up
-with multiple logs with the same content.
-
-Files starting with "ha-" are preferred. In case syslog sends
-messages to more than one file, if one of them is named ha-log or
-ha-debug those will be favoured over syslog or messages.
-
-If there is no separate log for Heartbeat, possibly unrelated
-messages from other programs are included. We don't filter logs,
-just pick a segment for the period you specified.
-
-NB: Don't have a central log host? Read the CTS README and set
-one up.
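The lookup scheme is two steps: send a unique marker through
syslog, then ask which file it ended up in. A condensed sketch (the
real code also walks the other directories listed above and
favours "ha-*" files; the marker format here is made up):

    # log a unique marker, then find the syslog file that has it
    mark="Mark:hb_report:`date +%s`"
    logger -p daemon.info "$mark"
    sleep 1
    fgrep -l "$mark" /var/log/* 2>/dev/null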
-
-Manual report collection
-------------------------
-
-So, your ssh doesn't work. In that case, you will have to run
-this procedure on all nodes. Use `-S` so that we don't bother with
-ssh:
-
-    # hb_report -f 5:20pm -t 5:30pm -S /tmp/report_node1
-
-If you also have a log host which is not in the cluster, then
-you'll have to copy the log to one of the nodes and tell us where
-it is:
-
-    # hb_report -f 5:20pm -t 5:30pm -l /var/tmp/ha-log -S /tmp/report_node1
-
-Furthermore, to prevent `hb_report` from asking you to edit the
-report and describe the problem on every node, use `-D` on all but
-one:
-
-    # hb_report -f 5:20pm -t 5:30pm -DS /tmp/report_node1
-
-If you reconsider and want the ssh setup, take a look at the CTS
-README file for instructions.
-
-Analysis
---------
-
-The point of analysis is to extract the most important
-information from what is probably several thousand lines worth of
-text. Perhaps this would more properly be called a report review,
-as it is rather simple, but let's pretend that we are doing
-something utterly sophisticated.
-
-The analysis consists of the following:
-
-- compare files coming from different nodes; if they are equal,
-  make one copy in the top level directory, remove duplicates,
-  and create soft links instead
-- print errors, warnings, and lines matching `-L` patterns from logs
-- report if there were coredumps and by whom
-- report crm_verify results
-
-The goods
----------
-
-1. Common
-+
-- ha-log (if found on the log host)
-- description.txt (template and user report)
-- analysis.txt
-
-2. Per node
-+
-- ha.cf
-- logd.cf
-- ha-log (if found)
-- cib.xml (`cibadmin -Ql` or `cp` if Heartbeat is not running)
-- ccm_tool.txt (`crm_node -p`)
-- crm_mon.txt (`crm_mon -1`)
-- crm_verify.txt (`crm_verify -V`)
-- pengine/ (only on the DC, directory with pengine transitions)
-- sysinfo.txt (static info)
-- sysstats.txt (dynamic info)
-- backtraces.txt (if coredumps found)
-- DC (well...)
-- RUNNING or STOPPED
-
diff --git a/tools/hb_report.in b/tools/hb_report.in
deleted file mode 100755
index 6712c3c617..0000000000
--- a/tools/hb_report.in
+++ /dev/null
@@ -1,778 +0,0 @@
-#!/bin/sh
-
- # Copyright (C) 2007 Dejan Muhamedagic
 #
 # This program is free software; you can redistribute it and/or
 # modify it under the terms of the GNU General Public
 # License as published by the Free Software Foundation; either
 # version 2.1 of the License, or (at your option) any later version.
 #
 # This software is distributed in the hope that it will be useful,
 # but WITHOUT ANY WARRANTY; without even the implied warranty of
 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 # General Public License for more details.
 #
 # You should have received a copy of the GNU General Public
 # License along with this library; if not, write to the Free Software
 # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 #
-
-. @sysconfdir@/ha.d/shellfuncs
-. $HA_NOARCHBIN/utillib.sh
-
-PROG=`basename $0`
-# FIXME: once this is part of the package!
-PROGDIR=`dirname $0`
-echo "$PROGDIR" | grep -qs '^/' || {
-    test -f @sbindir@/$PROG &&
-        PROGDIR=@sbindir@
-    test -f $HA_NOARCHBIN/$PROG &&
-        PROGDIR=$HA_NOARCHBIN
-}
-
-# the default syslog facility is not (yet) exported by heartbeat
-# to shell scripts
-#
-DEFAULT_HA_LOGFACILITY="daemon"
-export DEFAULT_HA_LOGFACILITY
-LOGD_CF=`findlogdcf @sysconfdir@ $HA_DIR`
-export LOGD_CF
-
-: ${SSH_OPTS="-T"}
-LOG_PATTERNS="CRIT: ERROR:"
-# PEINPUTS_PATT="peng.*PEngine Input stored"
-
-# the goods
-ANALYSIS_F=analysis.txt
-DESCRIPTION_F=description.txt
-HALOG_F=ha-log.txt
-BT_F=backtraces.txt
-SYSINFO_F=sysinfo.txt
-SYSSTATS_F=sysstats.txt
-export ANALYSIS_F DESCRIPTION_F HALOG_F BT_F SYSINFO_F SYSSTATS_F
-CRM_MON_F=crm_mon.txt
-CCMTOOL_F=ccm_tool.txt
-CRM_VERIFY_F=crm_verify.txt
-CIB_F=cib.xml
-export CRM_MON_F CCMTOOL_F CRM_VERIFY_F CIB_F
-
-#
-# the instance where user runs hb_report is the master
-# the others are slaves
-#
-if [ x"$1" = x__slave ]; then
-    SLAVE=1
-fi
-
-usage() {
-    cat<
- $DESTDIR/.env &&
-    $PROGDIR/hb_report __slave $DESTDIR" |
-    (cd $DESTDIR && tar xf -) &
-    SLAVEPIDS="$SLAVEPIDS $!"
-    done
-}
-
-#
-# does ssh work?
-#
-testsshuser() {
-    if [ "$2" ]; then
-        ssh -T -o Batchmode=yes $2@$1 true 2>/dev/null
-    else
-        ssh -T -o Batchmode=yes $1 true 2>/dev/null
-    fi
-}
-findsshuser() {
-    for u in "" $TRY_SSH; do
-        rc=0
-        for n in `getnodes`; do
-            [ "$n" = "$WE" ] && continue
-            testsshuser $n $u || {
-                rc=1
-                break
-            }
-        done
-        if [ $rc -eq 0 ]; then
-            echo $u
-            return 0
-        fi
-    done
-    return 1
-}
\"" - ls $1/*/$f >/dev/null 2>&1 || continue - if analyze_one $1 $f; then - echo "OK" - [ "$f" != $CIB_F ] && - consolidate $1 $f - else - echo "varies" - fi - done - checkcrmvfy $1 - checkbacktraces $1 - checklogs $1 -} - -# -# description template, editing, and other notes -# -mktemplate() { - cat<=100{exit 1}' || - cat < $DESTDIR/$WE/$HALOG_F - else - cat > $DESTDIR/$HALOG_F # we are log server, probably - fi - else - warning "could not figure out the log format of $HA_LOG" - fi - fi -else - [ "$MASTER_IS_HOSTLOG" ] || - warning "could not find the log file on $WE" -fi - -# -# part 5: start this program on other nodes -# -if [ ! "$SLAVE" ]; then - if [ "$ssh_good" ]; then - start_remote_collectors - else - if [ -z "$NO_SSH" -a `getnodes | wc -w` -gt 1 ]; then - warning "ssh does not work to all nodes" - warning "please use the -u option if you want to supply a password" - fi - fi -fi - -# -# part 6: get all other info (config, stats, etc) -# -if [ "$THIS_IS_NODE" ]; then - getconfig $DESTDIR/$WE - getpeinputs $FROM_TIME $TO_TIME $DESTDIR/$WE - getbacktraces $FROM_TIME $TO_TIME $DESTDIR/$WE/$BT_F - touch_DC_if_dc $DESTDIR/$WE - sanitize $DESTDIR/$WE - sys_info > $DESTDIR/$WE/$SYSINFO_F - sys_stats > $DESTDIR/$WE/$SYSSTATS_F 2>&1 - - for l in $EXTRA_LOGS; do - [ "$NO_str2time" ] && break - [ "$l" = "$HA_LOG" -o ! -f "$l" ] && continue - getstampproc=`find_getstampproc < $l` - if [ "$getstampproc" ]; then - export getstampproc # used by linetime - findlogseg $l $FROM_TIME $TO_TIME - dumplog $l $FROM_LINE $TO_LINE > $DESTDIR/$WE/`basename $l` - else - warning "could not figure out the log format of $l" - fi - done -fi - -# -# part 7: endgame: -# slaves tar their results to stdout, the master waits -# for them, analyses results, asks the user to edit the -# problem description template, and prints final notes -# -if [ "$SLAVE" ]; then - (cd $DESTDIR && tar cf - $WE) -else - wait $SLAVEPIDS - analyze $DESTDIR > $DESTDIR/$ANALYSIS_F - mktemplate > $DESTDIR/$DESCRIPTION_F - [ "$NO_DESCRIPTION" ] || { - echo press enter to edit the problem description... - read junk - edittemplate $DESTDIR/$DESCRIPTION_F - } - cd $DESTDIR/.. - tar czf $DESTDIR.tar.gz `basename $DESTDIR` - finalword - checksize -fi - -[ "$REMOVE_DEST" ] && - rm -r $DESTDIR diff --git a/tools/ocf-tester.in b/tools/ocf-tester.in deleted file mode 100644 index 071ec1a39a..0000000000 --- a/tools/ocf-tester.in +++ /dev/null @@ -1,345 +0,0 @@ -#!/bin/sh -# -# $Id: ocf-tester,v 1.2 2006/08/14 09:38:20 andrew Exp $ -# -# Copyright (c) 2006 Novell Inc, Andrew Beekhof -# All Rights Reserved. -# -# This program is free software; you can redistribute it and/or modify -# it under the terms of version 2 of the GNU General Public License as -# published by the Free Software Foundation. -# -# This program is distributed in the hope that it would be useful, but -# WITHOUT ANY WARRANTY; without even the implied warranty of -# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. -# -# Further, this software is distributed without any warranty that it is -# free of the rightful claim of any third person regarding infringement -# or the like. Any license provided herein, whether implied or -# otherwise, applies only to this software file. Patent licenses, if -# any, provided herein do not apply to combinations of this program with -# other software, or any other product whatsoever. 
diff --git a/tools/ocf-tester.in b/tools/ocf-tester.in
deleted file mode 100644
index 071ec1a39a..0000000000
--- a/tools/ocf-tester.in
+++ /dev/null
@@ -1,345 +0,0 @@
-#!/bin/sh
-#
-# $Id: ocf-tester,v 1.2 2006/08/14 09:38:20 andrew Exp $
-#
-# Copyright (c) 2006 Novell Inc, Andrew Beekhof
-# All Rights Reserved.
-#
-# This program is free software; you can redistribute it and/or modify
-# it under the terms of version 2 of the GNU General Public License as
-# published by the Free Software Foundation.
-#
-# This program is distributed in the hope that it would be useful, but
-# WITHOUT ANY WARRANTY; without even the implied warranty of
-# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
-#
-# Further, this software is distributed without any warranty that it is
-# free of the rightful claim of any third person regarding infringement
-# or the like. Any license provided herein, whether implied or
-# otherwise, applies only to this software file. Patent licenses, if
-# any, provided herein do not apply to combinations of this program with
-# other software, or any other product whatsoever.
-#
-# You should have received a copy of the GNU General Public License
-# along with this program; if not, write the Free Software Foundation,
-# Inc., 59 Temple Place - Suite 330, Boston MA 02111-1307, USA.
-#
-
-LRMD=@libdir@/heartbeat/lrmd
-LRMADMIN=@sbindir@/lrmadmin
-
-num_errors=0
-
-usage() {
-    echo "ocf-tester [-Lvh] -n resource_name [-o name=value]* /full/path/to/resource"
-    echo ""
-    echo "-L: use lrmadmin/lrmd for tests"
-    echo "-v: be verbose"
-    exit $1
-}
-
-assert() {
-    rc=$1; shift
-    target=$1; shift
-    msg=$1; shift
-    exit_code=$1; shift
-    if [ $rc -ne $target ]; then
-        num_errors=`expr $num_errors + 1`
-        echo -e "* rc=$rc: $msg"
-        if [ ! -z $exit_code ]; then
-            echo "Aborting tests"
-            exit $exit_code
-        fi
-    fi
-}
-
-done=0
-ra_args=""
-verbose=0
-while test "$done" = "0"; do
-    case "$1" in
-    -n) OCF_RESOURCE_INSTANCE=$2; ra_args="$ra_args OCF_RESOURCE_INSTANCE=$2"; shift; shift;;
-    -o) name=${2%%=*}; value=${2##*=};
-        lrm_ra_args="$lrm_ra_args $2";
-        ra_args="$ra_args OCF_RESKEY_$name=$value"; shift; shift;;
-    -L) use_lrmd=1; shift;;
-    -v) verbose=1; shift;;
-    -?) usage 0;;
-    -*) echo "unknown option: $1"; usage 1;;
-    *) done=1;;
-    esac
-done
-
-if [ "x" = "x$OCF_ROOT" ]; then
-    if [ -d /usr/lib/ocf ]; then
-        export OCF_ROOT=/usr/lib/ocf
-    else
-        echo "You must supply the location of OCF_ROOT (common location is /usr/lib/ocf)"
-        usage 1
-    fi
-fi
-
-if [ "x" = "x$OCF_RESOURCE_INSTANCE" ]; then
-    echo "You must give your resource a name, set OCF_RESOURCE_INSTANCE"
-    usage 1
-fi
-
-agent=$1
-if [ ! -e $agent ]; then
-    echo "You must provide the full path to your resource agent"
-    usage 1
-fi
-stopped_rc=7
-has_demote=1
-has_promote=1
-
-start_lrmd() {
-    lrmd_timeout=0
-    lrmd_interval=0
-    lrmd_target_rc=EVERYTIME
-    lrmd_started=""
-    $LRMD -s 2>/dev/null
-    rc=$?
-    if [ $rc -eq 3 ]; then
-        lrmd_started=1
-        $LRMD &
-        sleep 1
-        $LRMD -s 2>/dev/null
-    else
-        return $rc
-    fi
-}
-add_resource() {
-    $LRMADMIN -A $OCF_RESOURCE_INSTANCE \
-        ocf \
-        `basename $agent` \
-        $(basename `dirname $agent`) \
-        $lrm_ra_args > /dev/null
-}
-del_resource() {
-    $LRMADMIN -D $OCF_RESOURCE_INSTANCE
-}
-parse_lrmadmin_output() {
-    awk '
-BEGIN{ rc=1; }
-/Waiting for lrmd to callback.../ { n=1; next; }
-n==1 && /----------------operation--------------/ { n++; next; }
-n==2 && /return code:/ { rc=$0; sub("return code: *","",rc); next }
-n==2 && /---------------------------------------/ {
-    n++;
-    next;
-}
-END{
-    if( n!=3 ) exit 1;
-    else exit rc;
-}
-'
-}
-exec_resource() {
-    op="$1"
-    args="$2"
-    $LRMADMIN -E $OCF_RESOURCE_INSTANCE \
-        $op $lrmd_timeout $lrmd_interval \
-        $lrmd_target_rc \
-        $args | parse_lrmadmin_output
-}
-
-if [ "$use_lrmd" = 1 ]; then
-    echo "Using lrmd/lrmadmin for all tests"
-    start_lrmd || {
-        echo "could not start lrmd"
-        exit 1
-    }
-    trap '
-        [ "$lrmd_started" = 1 ] && $LRMD -k
-    ' EXIT
-    add_resource || {
-        echo "failed to add resource to lrmd"
-        exit 1
-    }
-fi
-
-lrm_test_command() {
-    action="$1"
-    msg="$2"
-    [ "$verbose" -eq 0 ] || echo "$msg"
-    exec_resource $action "$lrm_ra_args"
-}
-
-test_command() {
-    action=$1; shift
-    msg=${1:-"Testing: $action"}
-    if [ "$use_lrmd" = 1 ]; then
-        lrm_test_command $action "$msg"
-        return $?
-    fi
-    #echo Running: "export $ra_args; bash $agent $action 2>&1 > /dev/null"
-    if [ $verbose -eq 0 ]; then
-        bash $agent $action >/dev/null 2>&1
-    else
-        echo $msg
-        bash $agent $action
-    fi
-    rc=$?
-    #echo rc: $rc
-    return $rc
-}
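# For reference: the assertions below rely on the standard OCF exit
# codes (these are OCF resource agent conventions, not values
# defined anywhere in this script):
#   0 - OCF_SUCCESS            action completed successfully
#   3 - OCF_ERR_UNIMPLEMENTED  action not supported by this agent
#   7 - OCF_NOT_RUNNING        resource is cleanly stopped
#   8 - OCF_RUNNING_MASTER     resource is running as a master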
-
-# Begin tests
-echo "Beginning tests for $agent..."
-
-if [ ! -f $agent ]; then
-    assert 7 0 "Could not find file: $agent"
-fi
-
-test_command meta-data
-rc=$?
-if [ $rc -eq 3 ]; then
-    assert $rc 0 "Your agent does not support the meta-data action"
-else
-    assert $rc 0 "The meta-data action cannot fail and must return 0"
-fi
-
-export $ra_args;
-
-test_command validate-all
-rc=$?
-if [ $rc -eq 3 ]; then
-    assert $rc 0 "Your agent does not support the validate-all action"
-elif [ $rc -ne 0 ]; then
-    assert $rc 0 "Validation failed. Did you supply enough options with -o ?" 1
-    usage $rc
-fi
-
-test_command monitor "Checking current state"
-rc=$?
-if [ $rc -eq 3 ]; then
-    assert $rc 7 "Your agent does not support the monitor action" 1
-
-elif [ $rc -eq 1 ]; then
-    assert $rc 7 "Monitoring a stopped resource should return 7"
-    echo "Test updated to expect 1 for stopped resources for the remainder of this run"
-    stopped_rc=1
-
-elif [ $rc -eq 8 ]; then
-    test_command demote "Cleanup, demote"
-    assert $? 0 "Your agent was a master and could not be demoted" 1
-
-    test_command stop "Cleanup, stop"
-    assert $? 0 "Your agent was a master and could not be stopped" 1
-
-elif [ $rc -ne 7 ]; then
-    test_command stop
-    assert $? 0 "Your agent was active and could not be stopped" 1
-fi
-
-test_command monitor
-assert $? $stopped_rc "Monitoring a stopped resource should return $stopped_rc"
-
-test_command start
-assert $? 0 "Start failed. Did you supply enough options with -o ?" 1
-
-test_command monitor
-assert $? 0 "Monitoring an active resource should return 0"
-
-test_command notify
-rc=$?
-if [ $rc -eq 3 ]; then
-    echo "* Your agent does not support the notify action (optional)"
-else
-    assert $rc 0 "The notify action cannot fail and must return 0"
-fi
-
-test_command demote "Checking for demote action"
-if [ $? -eq 3 ]; then
-    has_demote=0
-    echo "* Your agent does not support the demote action (optional)"
-fi
-
-test_command promote "Checking for promote action"
-if [ $? -eq 3 ]; then
-    has_promote=0
-    echo "* Your agent does not support the promote action (optional)"
-fi
-
-if [ $has_promote -eq 1 -a $has_demote -eq 1 ]; then
-    test_command demote "Testing: demotion of started resource"
-    assert $? 0 "Demoting a started resource should not fail"
-
-    test_command promote
-    assert $? 0 "Promote failed"
-
-    test_command demote
-    assert $? 0 "Demote failed" 1
-
-    test_command demote "Testing: demotion of demoted resource"
-    assert $? 0 "Demoting a demoted resource should not fail"
-
-    test_command promote "Promoting resource"
-    assert $? 0 "Promote failed" 1
-
-    test_command promote "Testing: promotion of promoted resource"
-    assert $? 0 "Promoting a promoted resource should not fail"
-
-    test_command demote "Demoting resource"
-    assert $? 0 "Demote failed" 1
-
-elif [ $has_promote -eq 0 -a $has_demote -eq 0 ]; then
-    echo "* Your agent does not support master/slave (optional)"
-
-else
-    echo "* Your agent only partially supports master/slave"
-    num_errors=`expr $num_errors + 1`
-fi
-
-test_command stop
-assert $? 0 "Stop failed" 1
-
-test_command monitor
-assert $? $stopped_rc "Monitoring a stopped resource should return $stopped_rc"
-
-test_command start "Restarting resource..."
-assert $? 0 "Start failed" 1
-
-test_command monitor
-assert $? 0 "Monitoring an active resource should return 0"
-
-test_command start "Testing: starting a started resource"
-assert $? 0 "Starting a running resource is required to succeed"
-
-test_command monitor
-assert $? 0 "Monitoring an active resource should return 0"
-
-test_command stop "Stopping resource"
-assert $? 0 "Stop could not clean up after multiple starts" 1
-
-test_command monitor
-assert $? $stopped_rc "Monitoring a stopped resource should return $stopped_rc"
-
-test_command stop "Testing: stopping a stopped resource"
-assert $? 0 "Stopping a stopped resource is required to succeed"
-
-test_command monitor
-assert $? $stopped_rc "Monitoring a stopped resource should return $stopped_rc"
-
-test_command migrate_to "Checking for migrate_to action"
-rc=$?
-if [ $rc -ne 3 ]; then
-    test_command migrate_from "Checking for migrate_from action"
-fi
-if [ $? -eq 3 ]; then
-    echo "* Your agent does not support the migrate action (optional)"
-fi
-
-test_command reload "Checking for reload action"
-if [ $? -eq 3 ]; then
-    echo "* Your agent does not support the reload action (optional)"
-fi
-
-if [ $num_errors -gt 0 ]; then
-    echo "Tests failed: $agent failed $num_errors tests"
-    exit 1
-else
-    echo "$agent passed all tests"
-    exit 0
-fi
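To make the option plumbing above concrete, a typical invocation of
the tester against an agent looks like this (the resource name and
the ip parameter are made-up example values):

    # ocf-tester -n ip1 -o ip=127.0.0.100 /usr/lib/ocf/resource.d/heartbeat/IPaddr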
0 "Stop could not clean up after multiple starts" 1 - -test_command monitor -assert $? $stopped_rc "Monitoring a stopped resource should return $stopped_rc" - -test_command stop "Testing: stopping a stopped resource" -assert $? 0 "Stopping a stopped resource is required to succeed" - -test_command monitor -assert $? $stopped_rc "Monitoring a stopped resource should return $stopped_rc" - -test_command migrate_to "Checking for migrate_to action" -rc=$? -if [ $rc -ne 3 ]; then - test_command migrate_from "Checking for migrate_from action" -fi -if [ $? -eq 3 ]; then - echo "* Your agent does not support the migrate action (optional)" -fi - -test_command reload "Checking for reload action" -if [ $? -eq 3 ]; then - echo "* Your agent does not support the reload action (optional)" -fi - -if [ $num_errors -gt 0 ]; then - echo Tests failed: $agent failed $num_errors tests - exit 1 -else - echo $agent passed all tests - exit 0 -fi diff --git a/tools/utillib.sh b/tools/utillib.sh deleted file mode 100644 index 1648204192..0000000000 --- a/tools/utillib.sh +++ /dev/null @@ -1,389 +0,0 @@ - # Copyright (C) 2007 Dejan Muhamedagic - # - # This program is free software; you can redistribute it and/or - # modify it under the terms of the GNU General Public - # License as published by the Free Software Foundation; either - # version 2.1 of the License, or (at your option) any later version. - # - # This software is distributed in the hope that it will be useful, - # but WITHOUT ANY WARRANTY; without even the implied warranty of - # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - # General Public License for more details. - # - # You should have received a copy of the GNU General Public - # License along with this library; if not, write to the Free Software - # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA - # - -# -# ha.cf/logd.cf parsing -# -getcfvar() { - [ -f $HA_CF ] || return - sed 's/#.*//' < $HA_CF | - grep -w "^$1" | - sed 's/^[^[:space:]]*[[:space:]]*//' -} -iscfvarset() { - test "`getcfvar \"$1\"`" -} -iscfvartrue() { - getcfvar "$1" | - egrep -qsi "^(true|y|yes|on|1)" -} -getnodes() { - getcfvar node -} - -# -# logging -# -syslogmsg() { - severity=$1 - shift 1 - logtag="" - [ "$HA_LOGTAG" ] && logtag="-t $HA_LOGTAG" - logger -p ${HA_LOGFACILITY:-"daemon"}.$severity $logtag $* -} - -# -# find log destination -# -uselogd() { - iscfvartrue use_logd && - return 0 # if use_logd true - iscfvarset logfacility || - iscfvarset logfile || - iscfvarset debugfile || - return 0 # or none of the log options set - false -} -findlogdcf() { - for f in \ - `which strings > /dev/null 2>&1 && - strings $HA_BIN/ha_logd | grep 'logd\.cf'` \ - `for d; do echo $d/logd.cf $d/ha_logd.cf; done` - do - if [ -f "$f" ]; then - echo $f - return 0 - fi - done - return 1 -} -getlogvars() { - savecf=$HA_CF - if uselogd; then - [ -f "$LOGD_CF" ] || - fatal "could not find logd.cf or ha_logd.cf" - HA_CF=$LOGD_CF - fi - HA_LOGFACILITY=`getcfvar logfacility` - [ none = "$HA_LOGFACILITY" ] && HA_LOGFACILITY="" - HA_LOGFILE=`getcfvar logfile` - HA_DEBUGFILE=`getcfvar debugfile` - HA_SYSLOGMSGFMT="" - iscfvartrue syslogmsgfmt && - HA_SYSLOGMSGFMT=1 - HA_CF=$savecf -} -findmsg() { - # this is tricky, we try a few directories - syslogdir="/var/log /var/logs /var/syslog /var/adm /var/log/ha /var/log/cluster" - favourites="ha-*" - mark=$1 - log="" - for d in $syslogdir; do - [ -d $d ] || continue - log=`fgrep -l "$mark" $d/$favourites` && break - log=`fgrep -l "$mark" $d/*` && break - done 
-
-#
-# print a segment of a log file
-#
-str2time() {
-    perl -e "\$time='$*';" -e '
-    eval "use Date::Parse";
-    if (!$@) {
-        print str2time($time);
-    } else {
-        eval "use Date::Manip";
-        if (!$@) {
-            print UnixDate(ParseDateString($time), "%s");
-        }
-    }
-    '
-}
-getstamp() {
-    if [ "$HA_SYSLOGMSGFMT" -o "$HA_LOGFACILITY" ]; then
-        awk '{print $1,$2,$3}'
-    else
-        awk '{print $2}' | sed 's/_/ /'
-    fi
-}
-linetime() {
-    l=`tail -n +$2 $1 | head -1 | getstamp`
-    str2time "$l"
-}
-findln_by_time() {
-    logf=$1
-    tm=$2
-    first=1
-    last=`wc -l < $logf`
-    while [ $first -le $last ]; do
-        mid=$(((last+first)/2))
-        trycnt=10
-        while [ $trycnt -gt 0 ]; do
-            tmid=`linetime $logf $mid`
-            [ "$tmid" ] && break
-            warning "cannot extract time: $logf:$mid; will try the next one"
-            trycnt=$((trycnt-1))
-            mid=$((mid+1))
-        done
-        if [ -z "$tmid" ]; then
-            warning "giving up on log..."
-            return
-        fi
-        if [ $tmid -gt $tm ]; then
-            last=$((mid-1))
-        elif [ $tmid -lt $tm ]; then
-            first=$((mid+1))
-        else
-            break
-        fi
-    done
-    echo $mid
-}
-
-dumplog() {
-    logf=$1
-    from_line=$2
-    to_line=$3
-    [ "$from_line" ] ||
-        return
-    tail -n +$from_line $logf |
-        if [ "$to_line" ]; then
-            head -$((to_line-from_line+1))
-        else
-            cat
-        fi
-}
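# For reference, a sketch of how the two functions above combine to
# extract a log segment (illustrative only, not part of the original
# library):
#   from_line=`findln_by_time $logf $from_time`
#   to_line=`findln_by_time $logf $to_time`
#   dumplog $logf $from_line $to_line > ha-log.txt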
-
-#
-# find files newer than a and older than b
-#
-isnumber() {
-    echo "$*" | grep -qs '^[0-9][0-9]*$'
-}
-touchfile() {
-    t=`maketempfile` &&
-        perl -e "\$file=\"$t\"; \$tm=$1;" -e 'utime $tm, $tm, $file;' &&
-        echo $t
-}
-find_files() {
-    dir=$1
-    from_time=$2
-    to_time=$3
-    isnumber "$from_time" && [ "$from_time" -gt 0 ] || {
-        warning "sorry, can't find files based on time if you don't supply time"
-        return
-    }
-    from_stamp=`touchfile $from_time`
-    findexp="-newer $from_stamp"
-    if isnumber "$to_time" && [ "$to_time" -gt 0 ]; then
-        to_stamp=`touchfile $to_time`
-        findexp="$findexp ! -newer $to_stamp"
-    fi
-    find $dir -type f $findexp
-    rm -f $from_stamp $to_stamp
-}
-
-#
-# coredumps
-#
-findbinary() {
-    random_binary=`which cat 2>/dev/null` # suppose we are lucky
-    binary=`gdb $random_binary $1 < /dev/null 2>/dev/null |
-        grep 'Core was generated' | awk '{print $5}' |
-        sed "s/^.//;s/[.']*$//"`
-    [ x = x"$binary" ] && return
-    fullpath=`which $binary 2>/dev/null`
-    if [ x = x"$fullpath" ]; then
-        [ -x $HA_BIN/$binary ] && echo $HA_BIN/$binary
-    else
-        echo $fullpath
-    fi
-}
-getbt() {
-    which gdb > /dev/null 2>&1 || {
-        warning "please install gdb to get backtraces"
-        return
-    }
-    for corefile; do
-        absbinpath=`findbinary $corefile`
-        [ x = x"$absbinpath" ] && return 1
-        echo "====================== start backtrace ======================"
-        ls -l $corefile
-        gdb -batch -n -quiet -ex ${BT_OPTS:-"thread apply all bt full"} -ex quit \
-            $absbinpath $corefile 2>/dev/null
-        echo "======================= end backtrace ======================="
-    done
-}
-
-#
-# heartbeat configuration/status
-#
-iscrmrunning() {
-    crmadmin -D >/dev/null 2>&1 &
-    pid=$!
-    maxwait=10
-    while kill -0 $pid 2>/dev/null && [ $maxwait -gt 0 ]; do
-        sleep 1
-        maxwait=$((maxwait-1))
-    done
-    if kill -0 $pid 2>/dev/null; then
-        kill $pid
-        false
-    else
-        wait $pid
-    fi
-}
-dumpstate() {
-    crm_mon -1 | grep -v '^Last upd' > $1/crm_mon.txt
-    cibadmin -Ql > $1/cib.xml
-    crm_node -p > $1/ccm_tool.txt 2>&1
-}
-getconfig() {
-    [ -f $HA_CF ] &&
-        cp -p $HA_CF $1/
-    [ -f $LOGD_CF ] &&
-        cp -p $LOGD_CF $1/
-    if iscrmrunning; then
-        dumpstate $1
-        touch $1/RUNNING
-    else
-        cp -p $HA_VARLIB/crm/cib.xml $1/ 2>/dev/null
-        touch $1/STOPPED
-    fi
-    [ -f "$1/cib.xml" ] &&
-        crm_verify -V -x $1/cib.xml >$1/crm_verify.txt 2>&1
-}
-
-#
-# remove values of sensitive attributes
-#
-# this is not proper xml parsing, but it will work under the
-# circumstances
-sanitize_xml_attrs() {
-    sed $(
-        for patt in $SANITIZE; do
-            echo "-e /name=\"$patt\"/s/value=\"[^\"]*\"/value=\"****\"/"
-        done
-    )
-}
-sanitize_hacf() {
-    awk '
-    $1=="stonith_host"{ for( i=5; i<=NF; i++ ) $i="****"; }
-    {print}
-    '
-}
-sanitize_one() {
-    file=$1
-    compress=""
-    echo $file | grep -qs 'gz$' && compress=gzip
-    echo $file | grep -qs 'bz2$' && compress=bzip2
-    if [ "$compress" ]; then
-        decompress="$compress -dc"
-    else
-        compress=cat
-        decompress=cat
-    fi
-    tmp=`maketempfile` && ref=`maketempfile` ||
-        fatal "cannot create temporary files"
-    touch -r $file $ref # save the mtime
-    if [ "`basename $file`" = ha.cf ]; then
-        sanitize_hacf
-    else
-        $decompress | sanitize_xml_attrs | $compress
-    fi < $file > $tmp
-    mv $tmp $file
-    touch -r $ref $file
-    rm -f $ref
-}
-
-#
-# keep the user posted
-#
-fatal() {
-    echo "`uname -n`: ERROR: $*" >&2
-    exit 1
-}
-warning() {
-    echo "`uname -n`: WARN: $*" >&2
-}
-info() {
-    echo "`uname -n`: INFO: $*" >&2
-}
-pickfirst() {
-    for x; do
-        which $x >/dev/null 2>&1 && {
-            echo $x
-            return 0
-        }
-    done
-    return 1
-}
-
-#
-# get some system info
-#
-distro() {
-    which lsb_release >/dev/null 2>&1 && {
-        lsb_release -d
-        return
-    }
-    relf=`ls /etc/debian_version 2>/dev/null` ||
-        relf=`ls /etc/slackware-version 2>/dev/null` ||
-        relf=`ls -d /etc/*-release 2>/dev/null` && {
-            for f in $relf; do
-                test -f $f && {
-                    echo "`ls $f` `cat $f`"
-                    return
-                }
-            done
-        }
-    warning "no lsb_release no /etc/*-release no /etc/debian_version"
-}
-hb_ver() {
-    # for Linux .deb based systems
-    which dpkg > /dev/null 2>&1 && {
-        for pkg in heartbeat heartbeat-2; do
-            dpkg-query -f '${Version}' -W $pkg 2>/dev/null && break
-        done
-        [ $? -eq 0 ] &&
-            debsums -s $pkg 2>/dev/null
-        return
-    }
-    # for Linux .rpm based systems
-    which rpm > /dev/null 2>&1 && {
-        rpm -q --qf '%{version}' heartbeat &&
-            rpm --verify heartbeat
-        return
-    }
-    # for OpenBSD
-    which pkg_info > /dev/null 2>&1 && {
-        pkg_info | grep heartbeat | cut -d "-" -f 2- | cut -d " " -f 1
-        return
-    }
-    # for Solaris
-    which pkginfo > /dev/null 2>&1 && {
-        pkginfo | awk '{print $3}'
-    }
-    # more packagers?
-}
-crm_info() {
-    $HA_BIN/crmd version 2>&1
-}