diff --git a/tools/README.hb_report b/tools/README.hb_report
index 043898184c..ed6fef4c96 100644
--- a/tools/README.hb_report
+++ b/tools/README.hb_report
@@ -1,297 +1,305 @@
 Heartbeat reporting
 ===================
 Dejan Muhamedagic
 v1.0
 
 `hb_report` is a utility to collect all information relevant to
 Heartbeat over the given period of time.
 
 Quick start
 -----------
 
 Run `hb_report` on one of the nodes or on the host which serves
 as a central log server. Run `hb_report` without parameters to
 see usage. A few examples:
 
 1. Last night during the backup there were several warnings
 encountered (logserver is the log host):
 +
 	logserver# hb_report -f 3:00 -t 4:00 /tmp/report
 +
 collects everything from all nodes from 3am to 4am last night.
 The files are stored in /tmp/report and compressed to a tarball
 /tmp/report.tar.gz.
 
 2. Just found a problem during testing:
 
 	node1# date : note the current time
 	node1# /etc/init.d/heartbeat start
 	node1# nasty_command_that_breaks_things
 	node1# sleep 120 : wait for the cluster to settle
 	node1# hb_report -f time /tmp/hb1
 
 Introduction
 ------------
 
 Managing clusters is cumbersome. Heartbeat v2 with its numerous
 configuration files and multi-node clusters just adds to the
 complexity. No wonder then that most problem reports were less
 than optimal. This is an attempt to rectify that situation and
 make life easier for both the users and the developers.
 
 On security
 -----------
 
 `hb_report` is a fairly complex program. As some of you are
-probably going to run it as root let us state a few important
+probably going to run it as `root` let us state a few important
 things you should keep in mind:
 
-1. Don't run `hb_report` as root! It is fairly simple to setup
+1. Don't run `hb_report` as `root`! It is fairly simple to setup
 things in such a way that root access is not needed. I won't go
 into details, just to stress that all information collected
 should be readable by accounts belonging to the haclient group.
 
 2. If you still have to run this as root, well, don't use the
 `-C` option.
 
 3. Of course, every possible precaution has been taken not to
 disturb processes, or touch or remove files out of the given
 destination directory. If you (by mistake) specify an existing
 directory, `hb_report` will bail out soon. If you specify a
-relative path, it won't work either. The final product of
-`hb_report` is a tarball. However, the destination directory is
-not removed on any node, unless the user specifies `-C`. If you're
-too lazy to cleanup the previous run, do yourself a favour and
-just supply a new destination directory. You've been warned. If
-you worry about the space used, just put all your directories
-under /tmp and setup a cronjob to remove those directories once a
-week:
+relative path, it won't work either.
+
+The final product of `hb_report` is a tarball. However, the
+destination directory is not removed on any node, unless the user
+specifies `-C`. If you're too lazy to cleanup the previous run,
+do yourself a favour and just supply a new destination directory.
+You've been warned. If you worry about the space used, just put
+all your directories under `/tmp` and setup a cronjob to remove
+those directories once a week:
 
 ..........
 	for d in /tmp/*; do
 		test -d $d || continue
 		test -f $d/description.txt ||
 			test -f $d/.env || continue
 		grep -qs 'By: hb_report' $d/description.txt ||
 			grep -qs '^UNIQUE_MSG=Mark' $d/.env || continue
 		rm -r $d
 	done
 ..........
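
If you take that advice, the loop above can be dropped into a small
script and run from cron. The schedule and the script path below are
only an illustration (an /etc/crontab style entry, Sundays at 4:30am):

..........
	# /usr/local/sbin/rm_hb_reports is a hypothetical script
	# containing the cleanup loop shown above
	30 4 * * 0  root  /usr/local/sbin/rm_hb_reports
..........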
 
 Mode of operation
 -----------------
 
 Cluster data collection is straightforward: just run the same
 procedure on all nodes and collect the reports. There is, apart
 from many small ones, one large complication: central syslog
 destination. So, in order to allow this to be fully automated, we
 should sometimes run the procedure on the log host too. Actually,
 if there is a log host, then the best way is to run `hb_report`
 there.
 
-We use ssh for the remote program invocation. Even though it is
+We use `ssh` for the remote program invocation. Even though it is
 possible to run `hb_report` without ssh by doing a more menial
 job, the overall user experience is much better if ssh works.
 Anyway, how else do you manage your cluster?
 
 Another ssh related point: In case your security policy proscribes
 loghost-to-cluster-over-ssh communications, then you'll have to
 copy the log file to one of the nodes and point `hb_report` to it.
 
 Prerequisites
 -------------
 
 1. ssh
 +
 This is not strictly required, but you won't regret having a
 password-less ssh. It is not too difficult to setup and will save
 you a lot of time. If you can't have it, for example because your
 security policy does not allow such a thing, or you just prefer
 menial work, then you will have to resort to the semi-manual
 semi-automated report generation. See below for instructions.
++
+If you need to supply a password for your passphrase/login, then
+please use the `-u` option.
 
 2. Times
 +
 In order to find files and messages in the given period and to
 parse the `-f` and `-t` options, `hb_report` uses perl and one of
 the `Date::Parse` or `Date::Manip` perl modules. Note that you need
-only one of these.
+only one of these. Furthermore, on nodes which have no logs and
+where you don't run `hb_report` directly, no date parsing is
+necessary. In other words, if you run this on a loghost then you
+don't need these perl modules on the cluster nodes.
 +
 On rpm based distributions, you can find `Date::Parse` in
 `perl-TimeDate` and on Debian and its derivatives in
 `libtimedate-perl`.
 
 3. Core dumps
 +
-To backtrace core dumps gdb is needed and the Heartbeat packages
+To backtrace core dumps `gdb` is needed and the Heartbeat packages
 with the debugging info. The debug info packages may be installed
 at the time the report is created. Let's hope that you will need
 this really seldom.
 
 What is in the report
 ---------------------
 
 1. Heartbeat related
 - heartbeat version/release information
 - heartbeat configuration (CIB, ha.cf, logd.cf)
 - heartbeat status (output from crm_mon, crm_verify, ccm_tool)
 - pengine transition graphs (if any)
 - backtraces of core dumps (if any)
 - heartbeat logs (if any)
 
 2. System related
 - general platform information (`uname`, `arch`, `distribution`)
-- system statistics (`uptime`, `top`, `ps`)
+- system statistics (`uptime`, `top`, `ps`, `netstat -i`, `arp`)
 
 3. User created :)
 - problem description (template to be edited)
 
 4. Generated
 - problem analysis (generated)
 
 It is preferred that Heartbeat is running at the time of the
 report, but it is not absolutely required. `hb_report` will also
 do a quick analysis of the collected information.
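
Once you have a report tarball in hand, a reasonable first look is
the analysis and the problem description; the tarball name and the
unpack location below are just examples:

	# tar xzf report.tar.gz
	# less report/description.txt report/analysis.txt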
 
 Times
 -----
 
 Specifying times can at times be a nuisance. That is why we have
 chosen to use one of the perl modules--they do allow certain
 freedom when talking dates. You can either read the instructions
 at the
 http://search.cpan.org/dist/TimeDate/lib/Date/Parse.pm#EXAMPLE_DATES[Date::Parse examples page],
 or just rely on common sense and try stuff like:
 
 	3:00		(today at 3am)
 	15:00		(today at 3pm)
 	2007/9/1 2pm	(September 1st at 2pm)
 
 `hb_report` will (probably) complain if it can't figure out what
 you mean. Try to delimit the event as close as possible in order
 to reduce the size of the report, but still leave a minute or two
 around for good measure. Note that `-f` is not an optional
 option. And don't forget to quote dates when they contain spaces.
 
 Should I send all this to the rest of Internet?
 -----------------------------------------------
 
 We make an effort to remove sensitive data from the Heartbeat
 configuration (CIB, ha.cf, and transition graphs). However, you
 _have_ to tell us what is sensitive! Use the `-p` option to
 specify additional regular expressions to match variable names
 which may contain information you don't want to leak. For example:
 
 	# hb_report -f 18:00 -p "user.*" -p "secret.*" /var/tmp/report
 
 We look by default for variable names matching "pass.*" and the
 stonith_host ha.cf directive. Logs and other files are not
 filtered. Please filter them yourself if necessary.
 
 Logs
 ----
 
 It may be tricky to find syslog logs. The scheme used is to log a
 unique message on all nodes and then look it up in the usual
 syslog locations. This procedure is not foolproof, in particular
 if the syslog files are in a non-standard directory. We look in
 /var/log /var/logs /var/syslog /var/adm /var/log/ha
 /var/log/cluster. In case we can't find the logs, please supply
 their location:
 
 	# hb_report -f 5pm -l /var/log/cluster1/ha-log -S /tmp/report_node1
 
 If you have different log locations on different nodes, well,
-perhaps you'd like to make them the same. Or read about the
-manual report collection.
+perhaps you'd like to make them the same and make life easier for
+everybody.
 
 The log files are collected from all hosts where found. In case
 your syslog is configured to log to both the log server and local
 files and `hb_report` is run on the log server you will end up
 with multiple logs with the same content.
 
 Files starting with "ha-" are preferred. In case syslog sends
 messages to more than one file, if one of them is named ha-log or
 ha-debug those will be favoured over syslog or messages. If there
 is no separate log for Heartbeat, possibly unrelated messages from
 other programs are included. We don't filter logs, just pick a
 segment for the period you specified.
 
 NB: Don't have a central log host? Read the CTS README and set
 one up.
 
 Manual report collection
 ------------------------
 
 So, your ssh doesn't work. In that case, you will have to run
 this procedure on all nodes. Use `-S` so that we don't bother
 with ssh:
 
 	# hb_report -f 5:20pm -t 5:30pm -S /tmp/report_node1
 
 If you also have a log host which is not in the cluster, then
 you'll have to copy the log to one of the nodes and tell us where
 it is:
 
 	# hb_report -f 5:20pm -t 5:30pm -l /var/tmp/ha-log -S /tmp/report_node1
 
 Furthermore, to prevent `hb_report` from asking you to edit the
 report to describe the problem on every node use `-D` on all but
 one:
 
 	# hb_report -f 5:20pm -t 5:30pm -DS /tmp/report_node1
 
 If you reconsider and want the ssh setup, take a look at the CTS
 README file for instructions.
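
If you stick with the manual procedure, the individual steps above
combine like this on, say, a three node cluster (host names are only
an illustration; afterwards bring the resulting directories to one
place for review by whatever transfer method your policy permits):

	node1# hb_report -f 5:20pm -t 5:30pm -S /tmp/report_node1
	node2# hb_report -f 5:20pm -t 5:30pm -DS /tmp/report_node2
	node3# hb_report -f 5:20pm -t 5:30pm -DS /tmp/report_node3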
 
 Analysis
 --------
 
 The point of analysis is to get out the most important
 information from probably several thousand lines worth of text.
 Perhaps this should more properly be named report review as it is
 rather simple, but let's pretend that we are doing something
 utterly sophisticated.
 
 The analysis consists of the following:
 
 - compare files coming from different nodes; if they are equal,
   make one copy in the top level directory, remove duplicates,
   and create soft links instead
 - print errors, warnings, and lines matching `-L` patterns from logs
 - report if there were coredumps and by whom
 - report crm_verify results
 
 The goods
 ---------
 
 1. Common
 +
 - ha-log (if found on the log host)
 - description.txt (template and user report)
 - analysis.txt
 
 2. Per node
 +
 - ha.cf
 - logd.cf
 - ha-log (if found)
 - cib.xml (`cibadmin -Ql` or `cp` if Heartbeat is not running)
 - ccm_tool.txt (`ccm_tool -p`)
 - crm_mon.txt (`crm_mon -1`)
 - crm_verify.txt (`crm_verify -V`)
 - pengine/ (only on DC, directory with pengine transitions)
 - sysinfo.txt (static info)
 - sysstats.txt (dynamic info)
 - backtraces.txt (if coredumps found)
 - DC (well...)
+- RUNNING or STOPPED
diff --git a/tools/hb_report.in b/tools/hb_report.in
index c02a3df378..f4ee7fbee9 100755
--- a/tools/hb_report.in
+++ b/tools/hb_report.in
@@ -1,608 +1,663 @@
 #!/bin/sh
 # Copyright (C) 2007 Dejan Muhamedagic
 #
 # This program is free software; you can redistribute it and/or
 # modify it under the terms of the GNU General Public
 # License as published by the Free Software Foundation; either
 # version 2.1 of the License, or (at your option) any later version.
 #
 # This software is distributed in the hope that it will be useful,
 # but WITHOUT ANY WARRANTY; without even the implied warranty of
 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 # General Public License for more details.
 #
 # You should have received a copy of the GNU General Public
 # License along with this library; if not, write to the Free Software
 # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 #
 
 . @sysconfdir@/ha.d/shellfuncs
 . $HA_NOARCHBIN/utillib.sh
 
 PROG=`basename $0`
 # FIXME: once this is part of the package!
 PROGDIR=`dirname $0`
 echo "$PROGDIR" | grep -qs '^/' || {
 	test -f @sbindir@/$PROG && PROGDIR=@sbindir@
 	test -f $HA_NOARCHBIN/$PROG && PROGDIR=$HA_NOARCHBIN
 }
 
 LOGD_CF=`findlogdcf @sysconfdir@ $HA_DIR`
 export LOGD_CF
 
-: ${SSH_OPTS="-T -o Batchmode=yes"}
+: ${SSH_OPTS="-T"}
 LOG_PATTERNS="CRIT: ERROR:"
 
 #
 # the instance where user runs hb_report is the master
 # the others are slaves
 #
 if [ x"$1" = x__slave ]; then
 	SLAVE=1
 fi
 #
 # if this is the master, allow ha.cf and logd.cf in the current dir
 # (because often the master is the log host)
 #
 if [ "$SLAVE" = "" ]; then
 	[ -f ha.cf ] && HA_CF=ha.cf
 	[ -f logd.cf ] && LOGD_CF=logd.cf
 fi
 
usage() { cat< $DESTDIR/.env"
+	ssh $ssh_opts $node "mkdir -p $DESTDIR; cat > $DESTDIR/.env"
 	done
 }
 start_remote_collectors() {
 	for node in `getnodes`; do
 		[ "$node" = "$WE" ] && continue
-		ssh $SSH_OPTS $SSH_USER@$node "$PROGDIR/hb_report __slave $DESTDIR" |
+		ssh $ssh_opts $node "$PROGDIR/hb_report __slave $DESTDIR" |
 			(cd $DESTDIR && tar xf -) &
 		SLAVEPIDS="$SLAVEPIDS $!"
 	done
 }
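
# To illustrate what start_remote_collectors() sets up, the transfer for
# one slave is roughly equivalent to running by hand (node2 and
# /tmp/report are only stand-ins for a real node name and destination):
#
#	ssh node2 "hb_report __slave /tmp/report" | (cd /tmp/report && tar xf -)
#
# i.e. each slave writes its per-node subdirectory as a tar stream to
# stdout and the master unpacks it under the same destination directory.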
 
 #
 # does ssh work?
 #
-findsshuser() {
-	for n in `getnodes`; do
-		[ "$node" = "$WE" ] && continue
-		trysshusers $n $TRY_SSH && break
-	done
+testsshuser() {
+	if [ "$2" ]; then
+		ssh -T -o Batchmode=yes $2@$1 true 2>/dev/null
+	else
+		ssh -T -o Batchmode=yes $1 true 2>/dev/null
+	fi
 }
-checkssh() {
-	for n in `getnodes`; do
-		[ "$node" = "$WE" ] && continue
-		checksshuser $n $SSH_USER || return 1
+findsshuser() {
+	for u in "" $TRY_SSH; do
+		rc=0
+		for n in `getnodes`; do
+			[ "$node" = "$WE" ] && continue
+			testsshuser $n $u || {
+				rc=1
+				break
+			}
+		done
+		if [ $rc -eq 0 ]; then
+			echo $u
+			return 0
+		fi
 	done
-	return 0
+	return 1
 }
 
 #
 # the usual stuff
 #
 getbacktraces() {
 	flist=`find_files $HA_VARLIB/cores $1 $2`
 	[ "$flist" ] && getbt $flist > $3
 }
 getpeinputs() {
 	n=`basename $3`
 	flist=$(
 		if [ -f $3/ha-log ]; then
 			grep " $n peng.*PEngine Input stored" $3/ha-log | awk '{print $NF}'
 		else
 			find_files $HA_VARLIB/pengine $1 $2
 		fi | sed "s,$HA_VARLIB/,,g"
 	)
 	[ "$flist" ] &&
 		(cd $HA_VARLIB && tar cf - $flist) | (cd $3 && tar xf -)
 }
 touch_DC_if_dc() {
 	dc=`crmadmin -D 2>/dev/null | awk '{print $NF}'`
 	if [ "$WE" = "$dc" ]; then
 		touch $1/DC
 	fi
 }
 
 #
 # some basic system info and stats
 #
 sys_info() {
 	echo "Heartbeat version: `hb_ver`"
 	crm_info
 	echo "Platform: `uname`"
 	echo "Kernel release: `uname -r`"
 	echo "Architecture: `arch`"
 	[ `uname` = Linux ] && echo "Distribution: `distro`"
 }
 sys_stats() {
 	set -x
 	uptime
 	ps axf
 	ps auxw
 	top -b -n 1
 	netstat -i
+	arp -an
 	set +x
 }
 
 #
 # replace sensitive info with '****'
 #
 sanitize() {
 	for f in $1/ha.cf $1/cib.xml $1/pengine/*; do
 		[ -f "$f" ] && sanitize_one $f
 	done
 }
 
 #
 # remove duplicates if files are same, make links instead
 #
 consolidate() {
 	for n in `getnodes`; do
 		if [ -f $1/$2 ]; then
 			rm $1/$n/$2
 		else
 			mv $1/$n/$2 $1
 		fi
 		ln -s ../$2 $1/$n
 	done
 }
 
 #
 # some basic analysis of the report
 #
 checkcrmvfy() {
 	for n in `getnodes`; do
 		if [ -s $1/$n/crm_verify.txt ]; then
 			echo "WARN: crm_verify reported warnings at $n:"
 			cat $1/$n/crm_verify.txt
 		fi
 	done
 }
 checkbacktraces() {
 	for n in `getnodes`; do
 		[ -s $1/$n/backtraces.txt ] && {
 			echo "WARN: coredumps found at $n:"
 			egrep 'Core was generated|Program terminated' \
 				$1/$n/backtraces.txt | sed 's/^/ /'
 		}
 	done
 }
 checklogs() {
 	logs=`find $1 -name ha-log`
 	[ "$logs" ] || return
 	pattfile=`maketempfile` ||
 		fatal "cannot create temporary files"
 	for p in $LOG_PATTERNS; do
 		echo "$p"
 	done > $pattfile
 	echo ""
 	echo "Log patterns:"
 	for n in `getnodes`; do
 		cat $logs | grep -f $pattfile
 	done
 	rm -f $pattfile
 }
 
 #
 # check if files have same content in the cluster
 #
 cibdiff() {
-	crm_diff -c -n $1 -o $2
+	d1=`dirname $1`
+	d2=`dirname $2`
+	if [ -f $d1/RUNNING -a -f $d2/RUNNING ] ||
+		[ -f $d1/STOPPED -a -f $d2/STOPPED ]; then
+		crm_diff -c -n $1 -o $2
+	else
+		echo "can't compare cibs from running and stopped systems"
+	fi
 }
 txtdiff() {
 	diff $1 $2
 }
 diffcheck() {
+	[ -f "$1" ] || {
+		echo "$1 does not exist"
+		return 1
+	}
+	[ -f "$2" ] || {
+		echo "$2 does not exist"
+		return 1
+	}
 	case `basename $1` in
 	ccm_tool.txt) txtdiff $1 $2;; # worddiff?
 	cib.xml) cibdiff $1 $2;;
 	ha.cf) txtdiff $1 $2;; # confdiff?
 	crm_mon.txt|sysinfo.txt) txtdiff $1 $2;;
 	esac
 }
 analyze_one() {
 	rc=0
 	node0=""
 	for n in `getnodes`; do
 		if [ "$node0" ]; then
 			diffcheck $1/$node0/$2 $1/$n/$2
 			rc=$((rc+$?))
 		else
 			node0=$n
 		fi
 	done
 	return $rc
 }
 analyze() {
 	flist="ccm_tool.txt cib.xml crm_mon.txt ha.cf sysinfo.txt"
 	for f in $flist; do
 		perl -e "printf \"Diff $f... \""
 		ls $1/*/$f >/dev/null 2>&1 || continue
 		if analyze_one $1 $f; then
 			echo "OK"
 			consolidate $1 $f
 		else
 			echo "varies"
 		fi
 	done
 	checkcrmvfy $1
 	checkbacktraces $1
 	checklogs $1
 }
 
 #
 # description template, editing, and other notes
 #
mktemplate() { cat<=100{exit 1}' || cat < $DESTDIR/$WE/ha-log
+	if [ "$NO_str2time" ]; then
+		warning "a log found; but we cannot slice it"
+		warning "please install the perl Date::Parse module"
 	else
-		cat > $DESTDIR/ha-log # we are log server, probably
+		dumplog $HA_LOG $FROM_TIME $TO_TIME |
+		if [ "$THIS_IS_NODE" ]; then
+			cat > $DESTDIR/$WE/ha-log
+		else
+			cat > $DESTDIR/ha-log # we are log server, probably
+		fi
+	fi
 	fi
 else
 	warning "could not find the log file on $WE"
 fi
 
 #
 # part 6: get all other info (config, stats, etc)
 #
 if [ "$THIS_IS_NODE" ]; then
 	getconfig $DESTDIR/$WE
 	getpeinputs $FROM_TIME $TO_TIME $DESTDIR/$WE
 	getbacktraces $FROM_TIME $TO_TIME $DESTDIR/$WE/backtraces.txt
 	touch_DC_if_dc $DESTDIR/$WE
 	sanitize $DESTDIR/$WE
 	sys_info > $DESTDIR/$WE/sysinfo.txt
 	sys_stats > $DESTDIR/$WE/sysstats.txt 2>&1
 fi
 
 #
 # part 7: endgame:
 #	slaves tar their results to stdout, the master waits
 #	for them, analyses results, asks the user to edit the
 #	problem description template, and prints final notes
 #
 if [ "$SLAVE" ]; then
 	(cd $DESTDIR && tar cf - $WE)
 else
 	wait $SLAVEPIDS
 	analyze $DESTDIR > $DESTDIR/analysis.txt
 	mktemplate > $DESTDIR/description.txt
 	[ "$NO_DESCRIPTION" ] || {
 		echo press enter to edit the problem description...
 		read junk
 		edittemplate $DESTDIR/description.txt
 	}
 	cd $DESTDIR/..
-	tar czf $DESTDIR.tar.gz $DESTDIR/
+	tar czf $DESTDIR.tar.gz `basename $DESTDIR`
 	finalword
 	checksize
 fi
 
 [ "$REMOVE_DEST" ] && rm -r $DESTDIR
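
A note on the last hunk above: since the script first does `cd
$DESTDIR/..`, archiving `basename $DESTDIR` instead of `$DESTDIR/`
presumably makes the tarball store relative member names, so for a
destination of /tmp/report it unpacks as, roughly:

	report/description.txt
	report/analysis.txt
	report/<node>/ha.cf

rather than the longer tmp/report/... paths the old command produced
(the per-node directory name follows $WE; the layout is the one listed
under "The goods").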
diff --git a/tools/utillib.sh b/tools/utillib.sh
index 05e259120a..2187624d9d 100644
--- a/tools/utillib.sh
+++ b/tools/utillib.sh
@@ -1,384 +1,354 @@
 # Copyright (C) 2007 Dejan Muhamedagic
 #
 # This program is free software; you can redistribute it and/or
 # modify it under the terms of the GNU General Public
 # License as published by the Free Software Foundation; either
 # version 2.1 of the License, or (at your option) any later version.
 #
 # This software is distributed in the hope that it will be useful,
 # but WITHOUT ANY WARRANTY; without even the implied warranty of
 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 # General Public License for more details.
 #
 # You should have received a copy of the GNU General Public
 # License along with this library; if not, write to the Free Software
 # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 #
 
 #
 # ha.cf/logd.cf parsing
 #
 getcfvar() {
 	[ -f $HA_CF ] || return
 	sed 's/#.*//' < $HA_CF |
 		grep -w "^$1" |
 		sed 's/^[^[:space:]]*[[:space:]]*//'
 }
 iscfvarset() {
 	test "`getcfvar \"$1\"`"
 }
 iscfvartrue() {
 	getcfvar "$1" |
 		egrep -qsi "^(true|y|yes|on|1)"
 }
 getnodes() {
 	getcfvar node
 }
 
-#
-# ssh
-#
-checksshuser() {
-	ssh -o Batchmode=yes $2@$1 true 2>/dev/null
-}
-trysshusers() {
-	n=$1
-	shift 1
-	for u; do
-		if checksshuser $n $u; then
-			echo $u
-			break
-		fi
-	done
-}
-
 #
 # logging
 #
 syslogmsg() {
 	severity=$1
 	shift 1
 	logtag=""
 	[ "$HA_LOGTAG" ] && logtag="-t $HA_LOGTAG"
 	logger -p ${HA_LOGFACILITY:-"daemon"}.$severity $logtag $*
 }
 
 #
 # find log destination
 #
 uselogd() {
 	iscfvartrue use_logd &&
 		return 0  # if use_logd true
 	iscfvarset logfacility ||
 		iscfvarset logfile ||
 		iscfvarset debugfile ||
 		return 0  # or none of the log options set
 	false
 }
 findlogdcf() {
 	for f in \
 		`which strings > /dev/null 2>&1 &&
 			strings $HA_BIN/ha_logd | grep 'logd\.cf'` \
 		`for d; do echo $d/logd.cf $d/ha_logd.cf; done`
 	do
 		if [ -f "$f" ]; then
 			echo $f
 			return 0
 		fi
 	done
 	return 1
 }
 getlogvars() {
 	savecf=$HA_CF
 	if uselogd; then
 		[ -f "$LOGD_CF" ] ||
 			fatal "could not find logd.cf or ha_logd.cf"
 		HA_CF=$LOGD_CF
 	fi
 	HA_LOGFACILITY=`getcfvar logfacility`
 	HA_LOGFILE=`getcfvar logfile`
 	HA_DEBUGFILE=`getcfvar debugfile`
 	HA_SYSLOGMSGFMT=""
 	iscfvartrue syslogmsgfmt &&
 		HA_SYSLOGMSGFMT=1
 	HA_CF=$savecf
 }
 findmsg() {
 	# this is tricky, we try a few directories
 	syslogdir="/var/log /var/logs /var/syslog /var/adm /var/log/ha /var/log/cluster"
 	favourites="ha-*"
 	mark=$1
 	log=""
 	for d in $syslogdir; do
 		[ -d $d ] || continue
 		log=`fgrep -l "$mark" $d/$favourites` && break
 		log=`fgrep -l "$mark" $d/*` && break
 	done 2>/dev/null
 	echo $log
 }
 
 #
 # print a segment of a log file
 #
 str2time() {
 	perl -e "\$time='$*';" -e '
 	eval "use Date::Parse";
 	if (!$@) {
 		print str2time($time);
 	} else {
 		eval "use Date::Manip";
 		if (!$@) {
 			print UnixDate(ParseDateString($time), "%s");
 		}
 	}
 	'
 }
 getstamp() {
 	if [ "$HA_SYSLOGMSGFMT" -o "$HA_LOGFACILITY" ]; then
 		awk '{print $1,$2,$3}'
 	else
 		awk '{print $2}' | sed 's/_/ /'
 	fi
 }
 linetime() {
 	l=`tail -n +$2 $1 | head -1 | getstamp`
 	str2time "$l"
 }
 findln_by_time() {
 	logf=$1
 	tm=$2
 	first=1
 	last=`wc -l < $logf`
 	while [ $first -le $last ]; do
 		mid=$(((last+first)/2))
 		tmid=`linetime $logf $mid`
 		if [ -z "$tmid" ]; then
 			warning "cannot extract time: $logf:$mid"
 			return
 		fi
 		if [ $tmid -gt $tm ]; then
 			last=$((mid-1))
 		elif [ $tmid -lt $tm ]; then
 			first=$((mid+1))
 		else
 			break
 		fi
 	done
 	echo $mid
 }
 dumplog() {
 	logf=$1
 	from_time=$2
 	to_time=$3
 	from_line=`findln_by_time $logf $from_time`
 	if [ -z "$from_line" ]; then
 		warning "couldn't find line for time $from_time; corrupt log file?"
 		return
 	fi
 	tail -n +$from_line $logf |
 	if [ "$to_time" != 0 ]; then
 		to_line=`findln_by_time $logf $to_time`
 		if [ -z "$to_line" ]; then
 			warning "couldn't find line for time $to_time; corrupt log file?"
 			return
 		fi
 		head -$((to_line-from_line+1))
 	else
 		cat
 	fi
 }
 
 #
 # find files newer than a and older than b
 #
 touchfile() {
 	t=`maketempfile` &&
 		perl -e "\$file=\"$t\"; \$tm=$1;" -e 'utime $tm, $tm, $file;' &&
 		echo $t
 }
 find_files() {
 	dir=$1
 	from_time=$2
 	to_time=$3
 	from_stamp=`touchfile $from_time`
 	findexp="-newer $from_stamp"
 	if [ "$to_time" -a "$to_time" -gt 0 ]; then
 		to_stamp=`touchfile $to_time`
 		findexp="$findexp ! -newer $to_stamp"
 	fi
 	find $dir -type f $findexp
 	rm -f $from_stamp $to_stamp
 }
 
 #
 # coredumps
 #
 findbinary() {
 	random_binary=`which cat 2>/dev/null` # suppose we are lucky
 	binary=`gdb $random_binary $1 < /dev/null 2>/dev/null |
 		grep 'Core was generated' | awk '{print $5}' |
 		sed "s/^.//;s/[.']*$//"`
 	[ x = x"$binary" ] && return
 	fullpath=`which $binary 2>/dev/null`
 	if [ x = x"$fullpath" ]; then
 		[ -x $HA_BIN/$binary ] && echo $HA_BIN/$binary
 	else
 		echo $fullpath
 	fi
 }
 getbt() {
 	which gdb > /dev/null 2>&1 || {
 		warning "please install gdb to get backtraces"
 		return
 	}
 	for corefile; do
 		absbinpath=`findbinary $corefile`
 		[ x = x"$absbinpath" ] && return 1
 		echo "====================== start backtrace ======================"
 		ls -l $corefile
 		gdb -batch -n -quiet -ex ${BT_OPTS:-"thread apply all bt full"} -ex quit \
 			$absbinpath $corefile 2>/dev/null
 		echo "======================= end backtrace ======================="
 	done
 }
 
 #
 # heartbeat configuration/status
 #
 iscrmrunning() {
 	crmadmin -D >/dev/null 2>&1
 }
 dumpstate() {
 	crm_mon -1 | grep -v '^Last upd' > $1/crm_mon.txt
 	cibadmin -Ql > $1/cib.xml
 	ccm_tool -p > $1/ccm_tool.txt 2>&1
 }
 getconfig() {
-	cp -p $HA_CF $1/
+	[ -f $HA_CF ] &&
+		cp -p $HA_CF $1/
 	[ -f $LOGD_CF ] &&
 		cp -p $LOGD_CF $1/
 	if iscrmrunning; then
 		dumpstate $1
+		touch $1/RUNNING
 	else
 		cp -p $HA_VARLIB/crm/cib.xml $1/ 2>/dev/null
+		touch $1/STOPPED
 	fi
 	[ -f "$1/cib.xml" ] &&
 		crm_verify -V -x $1/cib.xml >$1/crm_verify.txt 2>&1
 }
 
 #
 # remove values of sensitive attributes
 #
 # this is not proper xml parsing, but it will work under the
 # circumstances
 sanitize_xml_attrs() {
 	sed $(
 	for patt in $SANITIZE; do
 		echo "-e /name=\"$patt\"/s/value=\"[^\"]*\"/value=\"****\"/"
 	done
 	)
 }
 sanitize_hacf() {
 	awk '
 	$1=="stonith_host"{ for( i=5; i<=NF; i++ ) $i="****"; }
 	{print}
 	'
 }
 sanitize_one() {
 	file=$1
 	compress=""
 	echo $file | grep -qs 'gz$' && compress=gzip
 	echo $file | grep -qs 'bz2$' && compress=bzip2
 	if [ "$compress" ]; then
 		decompress="$compress -dc"
 	else
 		compress=cat
 		decompress=cat
 	fi
 	tmp=`maketempfile` && ref=`maketempfile` ||
 		fatal "cannot create temporary files"
 	touch -r $file $ref # save the mtime
 	if [ "`basename $file`" = ha.cf ]; then
 		sanitize_hacf
 	else
 		$decompress | sanitize_xml_attrs | $compress
 	fi < $file > $tmp
 	mv $tmp $file
 	touch -r $ref $file
 	rm -f $ref
 }
 
 #
 # keep the user posted
 #
 fatal() {
-	echo "ERROR: $*" >&2
+	echo "`uname -n`: ERROR: $*" >&2
 	exit 1
 }
 warning() {
-	echo "WARN: $*" >&2
+	echo "`uname -n`: WARN: $*" >&2
 }
 info() {
-	echo "INFO: $*" >&2
+	echo "`uname -n`: INFO: $*" >&2
 }
 pickfirst() {
 	for x; do
 		which $x >/dev/null 2>&1 && {
 			echo $x
 			return 0
 		}
 	done
 	return 1
 }
 
-#
-# run a command everywhere
-#
-forall() {
-	c="$*"
-	for n in `getnodes`; do
-		if [ "$n" = "`uname -n`" ]; then
-			$c
-		else
-			if [ "$SSH_USER" ]; then
-				echo $c | ssh $SSH_OPTS $SSH_USER@$n
-			fi
-		fi
-	done
-}
-
 #
 # get some system info
 #
 distro() {
 	which lsb_release >/dev/null 2>&1 && {
 		lsb_release -d
 		return
 	}
 	relf=`ls /etc/debian_version 2>/dev/null` ||
 		relf=`ls /etc/slackware-version 2>/dev/null` ||
 		relf=`ls -d /etc/*-release 2>/dev/null` && {
 		for f in $relf; do
 			test -f $f && {
 				echo "`ls $f` `cat $f`"
 				return
 			}
 		done
 	}
 	warning "no lsb_release no /etc/*-release no /etc/debian_version"
 }
 hb_ver() {
 	which dpkg > /dev/null 2>&1 && {
 		dpkg-query -f '${Version}' -W heartbeat 2>/dev/null ||
 			dpkg-query -f '${Version}' -W heartbeat-2
 		return
 	}
 	which rpm > /dev/null 2>&1 && {
 		rpm -q --qf '%{version}' heartbeat
 		return
 	}
 	# more packagers?
 }
 crm_info() {
 	$HA_BIN/crmd version 2>&1
 }
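
The log-slicing helpers above can also be exercised by hand, which is
useful when checking why a given period came out empty. A minimal
sketch, assuming a syslog-format ha-log under /var/log and the
Date::Parse perl module installed (the path and times are only
examples):

	. ./utillib.sh
	HA_LOGFACILITY=daemon	# make getstamp expect syslog-style timestamps
	FROM_TIME=`str2time "3:00"`
	TO_TIME=`str2time "4:00"`
	dumplog /var/log/ha-log $FROM_TIME $TO_TIME > /tmp/ha-log.slice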