
diff --git a/tools/README.hb_report b/tools/README.hb_report
index 043898184c..ed6fef4c96 100644
--- a/tools/README.hb_report
+++ b/tools/README.hb_report
@@ -1,297 +1,305 @@
Heartbeat reporting
===================
Dejan Muhamedagic <dmuhamedagic@suse.de>
v1.0
`hb_report` is a utility to collect all information relevant to
Heartbeat over a given period of time.
Quick start
-----------
Run `hb_report` on one of the nodes or on the host which serves as
a central log server. Run `hb_report` without parameters to see usage.
A few examples:
1. Last night during the backup there were several warnings
encountered (logserver is the log host):
+
logserver# hb_report -f 3:00 -t 4:00 /tmp/report
+
collects everything from all nodes from 3am to 4am last night.
The files are stored in /tmp/report and compressed to a tarball
/tmp/report.tar.gz.
2. Just found a problem during testing:
node1# date : note the current time
node1# /etc/init.d/heartbeat start
node1# nasty_command_that_breaks_things
node1# sleep 120 : wait for the cluster to settle
node1# hb_report -f time /tmp/hb1
Introduction
------------
Managing clusters is cumbersome. Heartbeat v2 with its numerous
configuration files and multi-node clusters just adds to the
complexity. No wonder then that most problem reports were less
than optimal. This is an attempt to rectify that situation and
make life easier for both the users and the developers.
On security
-----------
`hb_report` is a fairly complex program. As some of you are
-probably going to run it as root let us state a few important
+probably going to run it as `root` let us state a few important
things you should keep in mind:
-1. Don't run `hb_report` as root! It is fairly simple to setup
+1. Don't run `hb_report` as `root`! It is fairly simple to set up
things in such a way that root access is not needed. I won't go
into details, just to stress that all information collected
should be readable by accounts belonging to the haclient group.
2. If you still have to run this as root, well, don't use the
`-C` option.
3. Of course, every possible precaution has been taken not to
disturb processes, or touch or remove files out of the given
destination directory. If you (by mistake) specify an existing
directory, `hb_report` will bail out soon. If you specify a
-relative path, it won't work either. The final product of
-`hb_report` is a tarball. However, the destination directory is
-not removed on any node, unless the user specifies `-C`. If you're
-too lazy to cleanup the previous run, do yourself a favour and
-just supply a new destination directory. You've been warned. If
-you worry about the space used, just put all your directories
-under /tmp and setup a cronjob to remove those directories once a
-week:
+relative path, it won't work either.
+
+The final product of `hb_report` is a tarball. However, the
+destination directory is not removed on any node, unless the user
+specifies `-C`. If you're too lazy to clean up the previous run,
+do yourself a favour and just supply a new destination directory.
+You've been warned. If you worry about the space used, just put
+all your directories under `/tmp` and set up a cronjob to remove
+those directories once a week:
..........
for d in /tmp/*; do
test -d $d ||
continue
test -f $d/description.txt || test -f $d/.env ||
continue
grep -qs 'By: hb_report' $d/description.txt ||
grep -qs '^UNIQUE_MSG=Mark' $d/.env ||
continue
rm -r $d
done
..........
Mode of operation
-----------------
Cluster data collection is straightforward: just run the same
procedure on all nodes and collect the reports. There is,
apart from many small ones, one large complication: central
syslog destination. So, in order to allow this to be fully
automated, we should sometimes run the procedure on the log host
too. Actually, if there is a log host, then the best way is to
run `hb_report` there.
-We use ssh for the remote program invocation. Even though it is
+We use `ssh` for the remote program invocation. Even though it is
possible to run `hb_report` without ssh by doing a more menial job,
the overall user experience is much better if ssh works. Anyway,
how else do you manage your cluster?
Another ssh related point: In case your security policy
proscribes loghost-to-cluster-over-ssh communications, then
you'll have to copy the log file to one of the nodes and point
`hb_report` to it.
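For example, something along these lines should work (the host
names and paths here are only placeholders):
..........
node1# scp logserver:/var/log/ha-log /var/tmp/
node1# hb_report -f 3:00 -t 4:00 -l /var/tmp/ha-log /tmp/report
..........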
Prerequisites
-------------
1. ssh
+
This is not strictly required, but you won't regret having
password-less ssh. It is not too difficult to set up and will save
you a lot of time. If you can't have it, for example because your
security policy does not allow such a thing, or you just prefer
menial work, then you will have to resort to the semi-manual
semi-automated report generation. See below for instructions.
++
+If you need to supply a password for your passphrase/login, then
+please use the `-u` option.
2. Times
+
In order to find files and messages in the given period and to
parse the `-f` and `-t` options, `hb_report` uses perl and one of the
`Date::Parse` or `Date::Manip` perl modules. Note that you need
-only one of these.
+only one of these. Furthermore, on nodes which have no logs and
+where you don't run `hb_report` directly, no date parsing is
+necessary. In other words, if you run this on a loghost then you
+don't need these perl modules on the cluster nodes.
+
On rpm based distributions, you can find `Date::Parse` in
`perl-TimeDate` and on Debian and its derivatives in
`libtimedate-perl`.
3. Core dumps
+
-To backtrace core dumps gdb is needed and the Heartbeat packages
+To backtrace core dumps `gdb` is needed and the Heartbeat packages
with the debugging info. The debug info packages may be installed
at the time the report is created. Let's hope that you will need
this only rarely.
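If you want to check these prerequisites by hand before running
`hb_report`, a sketch like the following should do (the node names
are only examples):
..........
node1# ssh -T -o Batchmode=yes node2 true
node1# perl -e 'use Date::Parse' || perl -e 'use Date::Manip'
node1# which gdb
..........
Each command exits non-zero (or prints an error) when the
corresponding prerequisite is missing.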
What is in the report
---------------------
1. Heartbeat related
- heartbeat version/release information
- heartbeat configuration (CIB, ha.cf, logd.cf)
- heartbeat status (output from crm_mon, crm_verify, ccm_tool)
- pengine transition graphs (if any)
- backtraces of core dumps (if any)
- heartbeat logs (if any)
2. System related
- general platform information (`uname`, `arch`, `distribution`)
-- system statistics (`uptime`, `top`, `ps`)
+- system statistics (`uptime`, `top`, `ps`, `netstat -i`, `arp`)
3. User created :)
- problem description (template to be edited)
4. Generated
- problem analysis (generated)
It is preferred that Heartbeat is running at the time of the
report, but it is not absolutely required. `hb_report` will also do a
quick analysis of the collected information.
Times
-----
Specifying times can at times be a nuisance. That is why we have
chosen to use one of the perl modules--they do allow a certain
freedom when talking dates. You can either read the instructions
at the
http://search.cpan.org/dist/TimeDate/lib/Date/Parse.pm#EXAMPLE_DATES[Date::Parse
examples page]
or just rely on common sense and try stuff like:
3:00 (today at 3am)
15:00 (today at 3pm)
2007/9/1 2pm (September 1st at 2pm)
`hb_report` will (probably) complain if it can't figure out what
you mean.
Try to delimit the event as closely as possible in order to reduce
the size of the report, but still leave a minute or two around
for good measure.
Note that `-f` is not an optional option. And don't forget to quote
dates when they contain spaces.
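If you are not sure whether a particular date will be understood,
you can feed it to the same perl module that `hb_report` uses:

# perl -MDate::Parse -e 'print str2time("2007/9/1 2pm"), "\n"'

It prints the corresponding UNIX time, or nothing if the date could
not be parsed.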
Should I send all this to the rest of Internet?
-----------------------------------------------
We make an effort to remove sensitive data from the Heartbeat
configuration (CIB, ha.cf, and transition graphs). However, you
_have_ to tell us what is sensitive! Use the `-p` option to specify
additional regular expressions to match variable names which may
contain information you don't want to leak. For example:
# hb_report -f 18:00 -p "user.*" -p "secret.*" /var/tmp/report
We look by default for variable names matching "passw.*" and for
the stonith_host ha.cf directive.
Logs and other files are not filtered. Please filter them
yourself if necessary.
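To illustrate, an attribute whose name matches one of the patterns
keeps its name but has its value masked, roughly like this:

<nvpair name="password" value="secret"/>

becomes

<nvpair name="password" value="****"/>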
Logs
----
It may be tricky to find syslog logs. The scheme used is to log a
unique message on all nodes and then look it up in the usual
syslog locations. This procedure is not foolproof, in particular
if the syslog files are in a non-standard directory. We look in
/var/log /var/logs /var/syslog /var/adm /var/log/ha
/var/log/cluster. In case we can't find the logs, please supply
their location:
# hb_report -f 5pm -l /var/log/cluster1/ha-log -S /tmp/report_node1
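The marking scheme mentioned above is simple enough to reproduce by
hand if you want to check where your syslog messages end up (the
mark string below is just an example):
..........
# logger -p daemon.info Mark:HB_REPORT:test
# grep -l "Mark:HB_REPORT:test" /var/log/* 2>/dev/null
..........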
If you have different log locations on different nodes, well,
-perhaps you'd like to make them the same. Or read about the
-manual report collection.
+perhaps you'd like to make them the same and make life easier for
+everybody.
The log files are collected from all hosts where they are found. In
case your syslog is configured to log to both the log server and
local files and `hb_report` is run on the log server, you will end
up with multiple logs with the same content.
Files starting with "ha-" are preferred. In case syslog sends
messages to more than one file, and one of them is named ha-log or
ha-debug, those will be favoured over syslog or messages.
If there is no separate log for Heartbeat, possibly unrelated
messages from other programs are included. We don't filter logs,
just pick a segment for the period you specified.
NB: Don't have a central log host? Read the CTS README and set up
one.
Manual report collection
------------------------
So, your ssh doesn't work. In that case, you will have to run
this procedure on all nodes. Use `-S` so that we don't bother with
ssh:
# hb_report -f 5:20pm -t 5:30pm -S /tmp/report_node1
If you also have a log host which is not in the cluster, then
you'll have to copy the log to one of the nodes and tell us where
it is:
# hb_report -f 5:20pm -t 5:30pm -l /var/tmp/ha-log -S /tmp/report_node1
Furthermore, to prevent `hb_report` from asking you to edit the
report to describe the problem on every node, use `-D` on all but
one:
# hb_report -f 5:20pm -t 5:30pm -DS /tmp/report_node1
If you reconsider and want the ssh setup, take a look at the CTS
README file for instructions.
Analysis
--------
The point of analysis is to extract the most important
information from what is probably several thousand lines worth of
text. Perhaps this should more properly be called a report review,
as it is rather simple, but let's pretend that we are doing
something utterly sophisticated.
The analysis consists of the following:
- compare files coming from different nodes; if they are equal,
make one copy in the top level directory, remove duplicates,
and create soft links instead
- print errors, warnings, and lines matching `-L` patterns from logs
- report if there were coredumps and by whom
- report crm_verify results
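When files turn out to be identical, the result is, for example, a
layout like this (one copy at the top, symbolic links in the
per-node directories):
..........
report/ha.cf
report/node1/ha.cf -> ../ha.cf
report/node2/ha.cf -> ../ha.cf
..........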
The goods
---------
1. Common
+
- ha-log (if found on the log host)
- description.txt (template and user report)
- analysis.txt
2. Per node
+
- ha.cf
- logd.cf
- ha-log (if found)
- cib.xml (`cibadmin -Ql` or `cp` if Heartbeat is not running)
- ccm_tool.txt (`ccm_tool -p`)
- crm_mon.txt (`crm_mon -1`)
- crm_verify.txt (`crm_verify -V`)
- pengine/ (only on DC, directory with pengine transitions)
- sysinfo.txt (static info)
- sysstats.txt (dynamic info)
- backtraces.txt (if coredumps found)
- DC (well...)
+- RUNNING or STOPPED
diff --git a/tools/hb_report.in b/tools/hb_report.in
index c02a3df378..f4ee7fbee9 100755
--- a/tools/hb_report.in
+++ b/tools/hb_report.in
@@ -1,608 +1,663 @@
#!/bin/sh
# Copyright (C) 2007 Dejan Muhamedagic <dmuhamedagic@suse.de>
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public
# License as published by the Free Software Foundation; either
# version 2.1 of the License, or (at your option) any later version.
#
# This software is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU General Public
# License along with this library; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
#
. @sysconfdir@/ha.d/shellfuncs
. $HA_NOARCHBIN/utillib.sh
PROG=`basename $0`
# FIXME: once this is part of the package!
PROGDIR=`dirname $0`
echo "$PROGDIR" | grep -qs '^/' || {
test -f @sbindir@/$PROG &&
PROGDIR=@sbindir@
test -f $HA_NOARCHBIN/$PROG &&
PROGDIR=$HA_NOARCHBIN
}
LOGD_CF=`findlogdcf @sysconfdir@ $HA_DIR`
export LOGD_CF
-: ${SSH_OPTS="-T -o Batchmode=yes"}
+: ${SSH_OPTS="-T"}
LOG_PATTERNS="CRIT: ERROR:"
#
# the instance where user runs hb_report is the master
# the others are slaves
#
if [ x"$1" = x__slave ]; then
SLAVE=1
fi
#
# if this is the master, allow ha.cf and logd.cf in the current dir
# (because often the master is the log host)
#
if [ "$SLAVE" = "" ]; then
[ -f ha.cf ] && HA_CF=ha.cf
[ -f logd.cf ] && LOGD_CF=logd.cf
fi
usage() {
cat<<EOF
usage: hb_report -f time [-t time] [-u user] [-l file] [-p patt] [-L patt]
[-e prog] [-SDC] dest
-f time: time to start from
-t time: time to finish at (dflt: now)
- -u user: ssh user to access other nodes (dftl: hacluster)
+ -u user: ssh user to access other nodes (dflt: empty, hacluster, root)
-l file: log file
-p patt: regular expression to match variables to be removed;
this option is additive (dflt: "passw.*")
-L patt: regular expression to match in log files for analysis;
this option is additive (dflt: $LOG_PATTERNS)
-e prog: your favourite editor
-D : don't invoke editor to write description
-C : remove the destination directory
-S : single node operation; don't try to start report
collectors on other nodes
dest : destination directory
EOF
[ "$1" != short ] &&
cat<<EOF
. the multifile output is first stored in a directory {dest}
of which a tarball {dest}.tar.gz is created
. the time specification is as in either Date::Parse or
Date::Manip, whatever you have installed; Date::Parse is
preferred
	. we try to figure out where the logfile is; if we can't, please
	clue us in
Examples
hb_report -f 2pm /tmp/report_1
hb_report -f "2007/9/5 12:30" -t "2007/9/5 14:00" /tmp/report_2
hb_report -f 1:00 -t 3:00 -l /var/log/cluster/ha-debug /tmp/report_3
hb_report -f "09sep07 2:00" -u hbadmin /tmp/report_4
hb_report -f 18:00 -p "usern.*" -p "admin.*" /tmp/report_5
. WARNING . WARNING . WARNING . WARNING . WARNING . WARNING .
We try to sanitize the CIB and the peinputs files. If you
have more sensitive information, please supply additional
patterns yourself. The logs and the crm_mon, ccm_tool, and
crm_verify output are *not* sanitized.
IT IS YOUR RESPONSIBILITY TO PROTECT THE DATA FROM EXPOSURE!
EOF
exit
}
#
# these are "global" variables
#
setvarsanddefaults() {
now=`perl -e 'print time()'`
# used by all
DESTDIR=""
FROM_TIME=""
TO_TIME=0
HA_LOG=""
UNIQUE_MSG="Mark:HB_REPORT:$now"
SANITIZE="passw.*"
REMOVE_DEST=""
# used only by the master
NO_SSH=""
SSH_USER=""
- TRY_SSH="hacluster"
+ TRY_SSH="hacluster root"
SLAVEPIDS=""
NO_DESCRIPTION=""
}
chkdirname() {
[ "$1" ] || usage short
[ $# -ne 1 ] && fatal "bad directory name: $1"
echo $1 | grep -qs '^/' ||
fatal "destination directory must be an absolute path"
[ "$1" = / ] &&
fatal "no root here, thank you"
}
chktime() {
[ "$1" ] || fatal "bad time specification: $2"
}
msgcleanup() {
fatal "destination directory $DESTDIR exists, please cleanup"
}
nodistdirectory() {
fatal "could not create the destination directory $DESTDIR"
}
time2str() {
perl -e "use POSIX; print strftime('%x %X',localtime($1));"
}
#
# find log files
#
logmarks() {
sev=$1 msg=$2
- forall "logger -p $HA_LOGFACILITY.$sev $msg"
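+	# log the unique mark locally and, when ssh works, on every other node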
+ c="logger -p $HA_LOGFACILITY.$sev $msg"
+
+ for n in `getnodes`; do
+ if [ "$n" = "`uname -n`" ]; then
+ $c
+ else
+ [ "$ssh_good" ] &&
+ echo $c | ssh $ssh_opts $n
+ fi
+ done
}
findlog() {
if [ "$HA_LOGFACILITY" ]; then
findmsg $UNIQUE_MSG | awk '{print $1}'
else
echo ${HA_DEBUGFILE:-$HA_LOGFILE}
fi
}
#
# this is how we pass environment to other hosts
#
dumpenv() {
cat<<EOF
FROM_TIME=$FROM_TIME
TO_TIME=$TO_TIME
HA_LOG=$HA_LOG
DESTDIR=$DESTDIR
UNIQUE_MSG=$UNIQUE_MSG
SANITIZE="$SANITIZE"
REMOVE_DEST="$REMOVE_DEST"
EOF
}
send_config() {
for node in `getnodes`; do
[ "$node" = "$WE" ] && continue
dumpenv |
- ssh $SSH_OPTS $SSH_USER@$node "mkdir -p $DESTDIR; cat > $DESTDIR/.env"
+ ssh $ssh_opts $node "mkdir -p $DESTDIR; cat > $DESTDIR/.env"
done
}
start_remote_collectors() {
for node in `getnodes`; do
[ "$node" = "$WE" ] && continue
- ssh $SSH_OPTS $SSH_USER@$node "$PROGDIR/hb_report __slave $DESTDIR" |
+ ssh $ssh_opts $node "$PROGDIR/hb_report __slave $DESTDIR" |
(cd $DESTDIR && tar xf -) &
SLAVEPIDS="$SLAVEPIDS $!"
done
}
#
# does ssh work?
#
-findsshuser() {
- for n in `getnodes`; do
- [ "$node" = "$WE" ] && continue
- trysshusers $n $TRY_SSH && break
- done
+testsshuser() {
+ if [ "$2" ]; then
+ ssh -T -o Batchmode=yes $2@$1 true 2>/dev/null
+ else
+ ssh -T -o Batchmode=yes $1 true 2>/dev/null
+ fi
}
-checkssh() {
- for n in `getnodes`; do
- [ "$node" = "$WE" ] && continue
- checksshuser $n $SSH_USER || return 1
+findsshuser() {
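+	# try each candidate user (current user first, then hacluster and root)
+	# and echo the first one that can reach every other node over ssh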
+ for u in "" $TRY_SSH; do
+ rc=0
+ for n in `getnodes`; do
+ [ "$node" = "$WE" ] && continue
+ testsshuser $n $u || {
+ rc=1
+ break
+ }
+ done
+ if [ $rc -eq 0 ]; then
+ echo $u
+ return 0
+ fi
done
- return 0
+ return 1
}
#
# the usual stuff
#
getbacktraces() {
flist=`find_files $HA_VARLIB/cores $1 $2`
[ "$flist" ] &&
getbt $flist > $3
}
getpeinputs() {
n=`basename $3`
flist=$(
if [ -f $3/ha-log ]; then
grep " $n peng.*PEngine Input stored" $3/ha-log | awk '{print $NF}'
else
find_files $HA_VARLIB/pengine $1 $2
fi | sed "s,$HA_VARLIB/,,g"
)
[ "$flist" ] &&
(cd $HA_VARLIB && tar cf - $flist) | (cd $3 && tar xf -)
}
touch_DC_if_dc() {
dc=`crmadmin -D 2>/dev/null | awk '{print $NF}'`
if [ "$WE" = "$dc" ]; then
touch $1/DC
fi
}
#
# some basic system info and stats
#
sys_info() {
echo "Heartbeat version: `hb_ver`"
crm_info
echo "Platform: `uname`"
echo "Kernel release: `uname -r`"
echo "Architecture: `arch`"
[ `uname` = Linux ] &&
echo "Distribution: `distro`"
}
sys_stats() {
set -x
uptime
ps axf
ps auxw
top -b -n 1
netstat -i
+ arp -an
set +x
}
#
# replace sensitive info with '****'
#
sanitize() {
for f in $1/ha.cf $1/cib.xml $1/pengine/*; do
[ -f "$f" ] && sanitize_one $f
done
}
#
# remove duplicates if files are same, make links instead
#
consolidate() {
for n in `getnodes`; do
if [ -f $1/$2 ]; then
rm $1/$n/$2
else
mv $1/$n/$2 $1
fi
ln -s ../$2 $1/$n
done
}
#
# some basic analysis of the report
#
checkcrmvfy() {
for n in `getnodes`; do
if [ -s $1/$n/crm_verify.txt ]; then
echo "WARN: crm_verify reported warnings at $n:"
cat $1/$n/crm_verify.txt
fi
done
}
checkbacktraces() {
for n in `getnodes`; do
[ -s $1/$n/backtraces.txt ] && {
echo "WARN: coredumps found at $n:"
egrep 'Core was generated|Program terminated' \
$1/$n/backtraces.txt |
sed 's/^/ /'
}
done
}
checklogs() {
logs=`find $1 -name ha-log`
[ "$logs" ] || return
pattfile=`maketempfile` ||
fatal "cannot create temporary files"
for p in $LOG_PATTERNS; do
echo "$p"
done > $pattfile
echo ""
echo "Log patterns:"
for n in `getnodes`; do
cat $logs | grep -f $pattfile
done
rm -f $pattfile
}
#
# check if files have same content in the cluster
#
cibdiff() {
- crm_diff -c -n $1 -o $2
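+	# compare CIBs only if both were captured in the same cluster state
+	# (both RUNNING or both STOPPED)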
+ d1=`dirname $1`
+ d2=`dirname $2`
+ if [ -f $d1/RUNNING -a -f $d2/RUNNING ] ||
+ [ -f $d1/STOPPED -a -f $d2/STOPPED ]; then
+ crm_diff -c -n $1 -o $2
+ else
+ echo "can't compare cibs from running and stopped systems"
+ fi
}
txtdiff() {
diff $1 $2
}
diffcheck() {
+ [ -f "$1" ] || {
+ echo "$1 does not exist"
+ return 1
+ }
+ [ -f "$2" ] || {
+ echo "$2 does not exist"
+ return 1
+ }
case `basename $1` in
ccm_tool.txt)
txtdiff $1 $2;; # worddiff?
cib.xml)
cibdiff $1 $2;;
ha.cf)
txtdiff $1 $2;; # confdiff?
crm_mon.txt|sysinfo.txt)
txtdiff $1 $2;;
esac
}
analyze_one() {
rc=0
node0=""
for n in `getnodes`; do
if [ "$node0" ]; then
diffcheck $1/$node0/$2 $1/$n/$2
rc=$((rc+$?))
else
node0=$n
fi
done
return $rc
}
analyze() {
flist="ccm_tool.txt cib.xml crm_mon.txt ha.cf sysinfo.txt"
for f in $flist; do
perl -e "printf \"Diff $f... \""
ls $1/*/$f >/dev/null 2>&1 || continue
if analyze_one $1 $f; then
echo "OK"
consolidate $1 $f
else
echo "varies"
fi
done
checkcrmvfy $1
checkbacktraces $1
checklogs $1
}
#
# description template, editing, and other notes
#
mktemplate() {
cat<<EOF
Please edit this template and describe the issue/problem you
encountered. Then, post to
Linux-HA@lists.linux-ha.org
or file a bug at
http://old.linux-foundation.org/developer_bugzilla/
See http://linux-ha.org/ReportingProblems for detailed
description on how to report problems.
Thank you.
Date: `date`
By: $PROG $userargs
Subject: [short problem description]
Severity: [choose one] enhancement minor normal major critical blocking
-Component: [choose one] CRM LRM CCM RA fencing comm GUI other
+Component: [choose one] CRM LRM CCM RA fencing heartbeat comm GUI tools other
Detailed description:
---
[...]
---
-$(
-if [ -f $DESTDIR/sysinfo.txt ]; then
- cat $DESTDIR/sysinfo.txt
-else
- for n in `getnodes`; do
- [ -f $DESTDIR/$n/sysinfo.txt ] &&
- echo "Info $n:"; sed 's/^/ /' $DESTDIR/$n/sysinfo.txt
- done
-fi
-)
EOF
+
+ if [ -f $DESTDIR/sysinfo.txt ]; then
+ echo "Common system info found:"
+ cat $DESTDIR/sysinfo.txt
+ else
+ for n in `getnodes`; do
+ if [ -f $DESTDIR/$n/sysinfo.txt ]; then
+ echo "System info $n:"
+ sed 's/^/ /' $DESTDIR/$n/sysinfo.txt
+ fi
+ done
+ fi
}
edittemplate() {
if ec=`pickfirst $EDITOR vim vi emacs nano`; then
$ec $1
else
warning "could not find a text editor"
fi
}
finalword() {
cat<<EOF
The report is saved in $DESTDIR.tar.gz.
Thank you for taking time to create this report.
EOF
}
checksize() {
ls -s $DESTDIR.tar.gz | awk '$1>=100{exit 1}' ||
cat <<EOF
NB: the size of the tarball exceeds 100kb; if posted to the
mailing list it will first have to be approved by the moderator.
Try reducing the period (use the -f and -t options).
EOF
}
[ $# -eq 0 ] && usage
-# check for the major prereq
+# check for the major prereq for a) parameter parsing and b)
+# parsing logs
+#
+NO_str2time=""
t=`str2time "12:00"`
if [ "$t" = "" ]; then
- fatal "please install the perl Date::Parse module"
+ NO_str2time=1
+ [ "$SLAVE" ] ||
+ fatal "please install the perl Date::Parse module"
fi
WE=`uname -n` # who am i?
THIS_IS_NODE=""
getnodes | grep -wqs $WE && # are we a node?
THIS_IS_NODE=1
getlogvars
#
# part 1: get and check options; and the destination
#
if [ "$SLAVE" = "" ]; then
setvarsanddefaults
userargs="$@"
args=`getopt -o f:t:l:u:p:L:e:SDCh -- "$@"`
[ $? -ne 0 ] && usage
eval set -- "$args"
while [ x"$1" != x ]; do
case "$1" in
-h) usage;;
-f) FROM_TIME=`str2time "$2"`
chktime "$FROM_TIME" "$2"
shift 2;;
-t) TO_TIME=`str2time "$2"`
chktime "$TO_TIME" "$2"
shift 2;;
-u) SSH_USER="$2"; shift 2;;
-l) HA_LOG="$2"; shift 2;;
-e) EDITOR="$2"; shift 2;;
-p) SANITIZE="$SANITIZE $2"; shift 2;;
-L) LOG_PATTERNS="$LOG_PATTERNS $2"; shift 2;;
-S) NO_SSH=1; shift 1;;
-D) NO_DESCRIPTION=1; shift 1;;
-C) REMOVE_DEST=1; shift 1;;
--) shift 1; break;;
*) usage short;;
esac
done
[ $# -ne 1 ] && usage short
DESTDIR=$1
chkdirname $DESTDIR
[ "$FROM_TIME" ] || usage short
fi
# this only on master
if [ "$SLAVE" = "" ]; then
#
# part 2: ssh business
#
# find out if ssh works
- if [ "$NO_SSH" = "" ]; then
+ ssh_good=""
+ if [ -z "$NO_SSH" ]; then
[ "$SSH_USER" ] ||
SSH_USER=`findsshuser`
- [ "$SSH_USER" ] && checkssh || # check if it works on _all_ nodes
- SSH_USER=""
+ if [ $? -eq 0 ]; then
+ ssh_good=1
+ if [ "$SSH_USER" ]; then
+ ssh_opts="-l $SSH_USER $SSH_OPTS"
+ else
+ ssh_opts="$SSH_OPTS"
+ fi
+ fi
fi
# final check: don't run if the destination directory exists
[ -d $DESTDIR ] && msgcleanup
- [ "$SSH_USER" ] &&
+ [ "$ssh_good" ] &&
for node in `getnodes`; do
[ "$node" = "$WE" ] && continue
- ssh $SSH_OPTS $SSH_USER@$node "test -d $DESTDIR" &&
+ ssh $ssh_opts $node "test -d $DESTDIR" &&
msgcleanup
done
fi
if [ "$SLAVE" ]; then
DESTDIR=$2
[ -d $DESTDIR ] || nodistdirectory
. $DESTDIR/.env
else
mkdir -p $DESTDIR
[ -d $DESTDIR ] || nodistdirectory
fi
if [ "$SLAVE" = "" ]; then
#
# part 3: log marks to be searched for later
# important to do this now on _all_ nodes
#
if [ "$HA_LOGFACILITY" ]; then
sev="info"
cfdebug=`getcfvar debug` # prefer debuglog if set
[ "$cfdebug" -a "$cfdebug" -gt 0 ] &&
sev="debug"
logmarks $sev $UNIQUE_MSG
fi
#
# part 4: start this program on other nodes
#
- if [ "$SSH_USER" ]; then
+ if [ "$ssh_good" ]; then
send_config
start_remote_collectors
else
[ `getnodes | wc -w` -gt 1 ] &&
warning "ssh does not work to all nodes"
fi
fi
# only cluster nodes need their own directories
[ "$THIS_IS_NODE" ] && mkdir -p $DESTDIR/$WE
#
# part 5: find the logs and cut out the segment for the period
#
if [ "$HA_LOG" ]; then # log provided by the user?
[ -f "$HA_LOG" ] || { # not present
[ "$SLAVE" ] || # warning if not on slave
warning "$HA_LOG not found; we will try to find log ourselves"
HA_LOG=""
}
fi
if [ "$HA_LOG" = "" ]; then
HA_LOG=`findlog`
[ "$HA_LOG" ] &&
cnt=`fgrep -c $UNIQUE_MSG < $HA_LOG`
fi
nodecnt=`getnodes | wc -w`
if [ "$cnt" ] && [ $cnt -eq $nodecnt ]; then
info "found the central log!"
info "you can ignore warnings about missing logs"
fi
if [ -f "$HA_LOG" ]; then
- dumplog $HA_LOG $FROM_TIME $TO_TIME |
- if [ "$THIS_IS_NODE" ]; then
- cat > $DESTDIR/$WE/ha-log
+ if [ "$NO_str2time" ]; then
+ warning "a log was found, but we cannot slice it"
+ warning "please install the perl Date::Parse module"
else
- cat > $DESTDIR/ha-log # we are log server, probably
+ dumplog $HA_LOG $FROM_TIME $TO_TIME |
+ if [ "$THIS_IS_NODE" ]; then
+ cat > $DESTDIR/$WE/ha-log
+ else
+ cat > $DESTDIR/ha-log # we are log server, probably
+ fi
fi
else
warning "could not find the log file on $WE"
fi
#
# part 6: get all other info (config, stats, etc)
#
if [ "$THIS_IS_NODE" ]; then
getconfig $DESTDIR/$WE
getpeinputs $FROM_TIME $TO_TIME $DESTDIR/$WE
getbacktraces $FROM_TIME $TO_TIME $DESTDIR/$WE/backtraces.txt
touch_DC_if_dc $DESTDIR/$WE
sanitize $DESTDIR/$WE
sys_info > $DESTDIR/$WE/sysinfo.txt
sys_stats > $DESTDIR/$WE/sysstats.txt 2>&1
fi
#
# part 7: endgame:
# slaves tar their results to stdout, the master waits
# for them, analyses results, asks the user to edit the
# problem description template, and prints final notes
#
if [ "$SLAVE" ]; then
(cd $DESTDIR && tar cf - $WE)
else
wait $SLAVEPIDS
analyze $DESTDIR > $DESTDIR/analysis.txt
mktemplate > $DESTDIR/description.txt
[ "$NO_DESCRIPTION" ] || {
echo press enter to edit the problem description...
read junk
edittemplate $DESTDIR/description.txt
}
cd $DESTDIR/..
- tar czf $DESTDIR.tar.gz $DESTDIR/
+ tar czf $DESTDIR.tar.gz `basename $DESTDIR`
finalword
checksize
fi
[ "$REMOVE_DEST" ] &&
rm -r $DESTDIR
diff --git a/tools/utillib.sh b/tools/utillib.sh
index 05e259120a..2187624d9d 100644
--- a/tools/utillib.sh
+++ b/tools/utillib.sh
@@ -1,384 +1,354 @@
# Copyright (C) 2007 Dejan Muhamedagic <dmuhamedagic@suse.de>
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public
# License as published by the Free Software Foundation; either
# version 2.1 of the License, or (at your option) any later version.
#
# This software is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU General Public
# License along with this library; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
#
#
# ha.cf/logd.cf parsing
#
getcfvar() {
[ -f $HA_CF ] || return
sed 's/#.*//' < $HA_CF |
grep -w "^$1" |
sed 's/^[^[:space:]]*[[:space:]]*//'
}
iscfvarset() {
test "`getcfvar \"$1\"`"
}
iscfvartrue() {
getcfvar "$1" |
egrep -qsi "^(true|y|yes|on|1)"
}
getnodes() {
getcfvar node
}
-#
-# ssh
-#
-checksshuser() {
- ssh -o Batchmode=yes $2@$1 true 2>/dev/null
-}
-trysshusers() {
- n=$1
- shift 1
- for u; do
- if checksshuser $n $u; then
- echo $u
- break
- fi
- done
-}
-
#
# logging
#
syslogmsg() {
severity=$1
shift 1
logtag=""
[ "$HA_LOGTAG" ] && logtag="-t $HA_LOGTAG"
logger -p ${HA_LOGFACILITY:-"daemon"}.$severity $logtag $*
}
#
# find log destination
#
uselogd() {
iscfvartrue use_logd &&
return 0 # if use_logd true
iscfvarset logfacility ||
iscfvarset logfile ||
iscfvarset debugfile ||
return 0 # or none of the log options set
false
}
findlogdcf() {
for f in \
`which strings > /dev/null 2>&1 &&
strings $HA_BIN/ha_logd | grep 'logd\.cf'` \
`for d; do echo $d/logd.cf $d/ha_logd.cf; done`
do
if [ -f "$f" ]; then
echo $f
return 0
fi
done
return 1
}
getlogvars() {
savecf=$HA_CF
if uselogd; then
[ -f "$LOGD_CF" ] ||
fatal "could not find logd.cf or ha_logd.cf"
HA_CF=$LOGD_CF
fi
HA_LOGFACILITY=`getcfvar logfacility`
HA_LOGFILE=`getcfvar logfile`
HA_DEBUGFILE=`getcfvar debugfile`
HA_SYSLOGMSGFMT=""
iscfvartrue syslogmsgfmt &&
HA_SYSLOGMSGFMT=1
HA_CF=$savecf
}
findmsg() {
# this is tricky, we try a few directories
syslogdir="/var/log /var/logs /var/syslog /var/adm /var/log/ha /var/log/cluster"
favourites="ha-*"
mark=$1
log=""
for d in $syslogdir; do
[ -d $d ] || continue
log=`fgrep -l "$mark" $d/$favourites` && break
log=`fgrep -l "$mark" $d/*` && break
done 2>/dev/null
echo $log
}
#
# print a segment of a log file
#
str2time() {
perl -e "\$time='$*';" -e '
eval "use Date::Parse";
if (!$@) {
print str2time($time);
} else {
eval "use Date::Manip";
if (!$@) {
print UnixDate(ParseDateString($time), "%s");
}
}
'
}
getstamp() {
if [ "$HA_SYSLOGMSGFMT" -o "$HA_LOGFACILITY" ]; then
awk '{print $1,$2,$3}'
else
awk '{print $2}' | sed 's/_/ /'
fi
}
linetime() {
l=`tail -n +$2 $1 | head -1 | getstamp`
str2time "$l"
}
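# binary search over line timestamps: echoes the number of the line
# whose time is closest to the given time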
findln_by_time() {
logf=$1
tm=$2
first=1
last=`wc -l < $logf`
while [ $first -le $last ]; do
mid=$(((last+first)/2))
tmid=`linetime $logf $mid`
if [ -z "$tmid" ]; then
warning "cannot extract time: $logf:$mid"
return
fi
if [ $tmid -gt $tm ]; then
last=$((mid-1))
elif [ $tmid -lt $tm ]; then
first=$((mid+1))
else
break
fi
done
echo $mid
}
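# print the log segment between from_time and to_time; a to_time of 0
# means up to the end of the log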
dumplog() {
logf=$1
from_time=$2
to_time=$3
from_line=`findln_by_time $logf $from_time`
if [ -z "$from_line" ]; then
warning "couldn't find line for time $from_time; corrupt log file?"
return
fi
tail -n +$from_line $logf |
if [ "$to_time" != 0 ]; then
to_line=`findln_by_time $logf $to_time`
if [ -z "$to_line" ]; then
warning "couldn't find line for time $to_time; corrupt log file?"
return
fi
head -$((to_line-from_line+1))
else
cat
fi
}
#
# find files newer than a and older than b
#
touchfile() {
t=`maketempfile` &&
perl -e "\$file=\"$t\"; \$tm=$1;" -e 'utime $tm, $tm, $file;' &&
echo $t
}
find_files() {
dir=$1
from_time=$2
to_time=$3
from_stamp=`touchfile $from_time`
findexp="-newer $from_stamp"
if [ "$to_time" -a "$to_time" -gt 0 ]; then
to_stamp=`touchfile $to_time`
findexp="$findexp ! -newer $to_stamp"
fi
find $dir -type f $findexp
rm -f $from_stamp $to_stamp
}
#
# coredumps
#
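# ask gdb which program generated a core (the "Core was generated by"
# line) and resolve that name to a full path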
findbinary() {
random_binary=`which cat 2>/dev/null` # suppose we are lucky
binary=`gdb $random_binary $1 < /dev/null 2>/dev/null |
grep 'Core was generated' | awk '{print $5}' |
sed "s/^.//;s/[.']*$//"`
[ x = x"$binary" ] && return
fullpath=`which $binary 2>/dev/null`
if [ x = x"$fullpath" ]; then
[ -x $HA_BIN/$binary ] && echo $HA_BIN/$binary
else
echo $fullpath
fi
}
getbt() {
which gdb > /dev/null 2>&1 || {
warning "please install gdb to get backtraces"
return
}
for corefile; do
absbinpath=`findbinary $corefile`
[ x = x"$absbinpath" ] && return 1
echo "====================== start backtrace ======================"
ls -l $corefile
gdb -batch -n -quiet -ex ${BT_OPTS:-"thread apply all bt full"} -ex quit \
$absbinpath $corefile 2>/dev/null
echo "======================= end backtrace ======================="
done
}
#
# heartbeat configuration/status
#
iscrmrunning() {
crmadmin -D >/dev/null 2>&1
}
dumpstate() {
crm_mon -1 | grep -v '^Last upd' > $1/crm_mon.txt
cibadmin -Ql > $1/cib.xml
ccm_tool -p > $1/ccm_tool.txt 2>&1
}
getconfig() {
- cp -p $HA_CF $1/
+ [ -f $HA_CF ] &&
+ cp -p $HA_CF $1/
[ -f $LOGD_CF ] &&
cp -p $LOGD_CF $1/
if iscrmrunning; then
dumpstate $1
+ touch $1/RUNNING
else
cp -p $HA_VARLIB/crm/cib.xml $1/ 2>/dev/null
+ touch $1/STOPPED
fi
[ -f "$1/cib.xml" ] &&
crm_verify -V -x $1/cib.xml >$1/crm_verify.txt 2>&1
}
#
# remove values of sensitive attributes
#
# this is not proper xml parsing, but it will work under the
# circumstances
sanitize_xml_attrs() {
sed $(
for patt in $SANITIZE; do
echo "-e /name=\"$patt\"/s/value=\"[^\"]*\"/value=\"****\"/"
done
)
}
sanitize_hacf() {
awk '
$1=="stonith_host"{ for( i=5; i<=NF; i++ ) $i="****"; }
{print}
'
}
sanitize_one() {
file=$1
compress=""
echo $file | grep -qs 'gz$' && compress=gzip
echo $file | grep -qs 'bz2$' && compress=bzip2
if [ "$compress" ]; then
decompress="$compress -dc"
else
compress=cat
decompress=cat
fi
tmp=`maketempfile` && ref=`maketempfile` ||
fatal "cannot create temporary files"
touch -r $file $ref # save the mtime
if [ "`basename $file`" = ha.cf ]; then
sanitize_hacf
else
$decompress | sanitize_xml_attrs | $compress
fi < $file > $tmp
mv $tmp $file
touch -r $ref $file
rm -f $ref
}
#
# keep the user posted
#
fatal() {
- echo "ERROR: $*" >&2
+ echo "`uname -n`: ERROR: $*" >&2
exit 1
}
warning() {
- echo "WARN: $*" >&2
+ echo "`uname -n`: WARN: $*" >&2
}
info() {
- echo "INFO: $*" >&2
+ echo "`uname -n`: INFO: $*" >&2
}
pickfirst() {
for x; do
which $x >/dev/null 2>&1 && {
echo $x
return 0
}
done
return 1
}
-#
-# run a command everywhere
-#
-forall() {
- c="$*"
- for n in `getnodes`; do
- if [ "$n" = "`uname -n`" ]; then
- $c
- else
- if [ "$SSH_USER" ]; then
- echo $c | ssh $SSH_OPTS $SSH_USER@$n
- fi
- fi
- done
-}
-
#
# get some system info
#
distro() {
which lsb_release >/dev/null 2>&1 && {
lsb_release -d
return
}
relf=`ls /etc/debian_version 2>/dev/null` ||
relf=`ls /etc/slackware-version 2>/dev/null` ||
relf=`ls -d /etc/*-release 2>/dev/null` && {
for f in $relf; do
test -f $f && {
echo "`ls $f` `cat $f`"
return
}
done
}
warning "no lsb_release no /etc/*-release no /etc/debian_version"
}
hb_ver() {
which dpkg > /dev/null 2>&1 && {
dpkg-query -f '${Version}' -W heartbeat 2>/dev/null ||
dpkg-query -f '${Version}' -W heartbeat-2
return
}
which rpm > /dev/null 2>&1 && {
rpm -q --qf '%{version}' heartbeat
return
}
# more packagers?
}
crm_info() {
$HA_BIN/crmd version 2>&1
}
