diff --git a/doc/Pacemaker_Remote/en-US/Ch-Example.txt b/doc/Pacemaker_Remote/en-US/Ch-Example.txt
index 9513e3da6a..cdc1823dd7 100644
--- a/doc/Pacemaker_Remote/en-US/Ch-Example.txt
+++ b/doc/Pacemaker_Remote/en-US/Ch-Example.txt
@@ -1,130 +1,130 @@
= Guest Node Quick Example =
If you already know how to use Pacemaker, you'll likely be able to grasp this
new concept of guest nodes by reading through this quick example without
having to sort through all the detailed walk-through steps. Here are the key
configuration ingredients that make this possible using libvirt and KVM virtual
guests. These steps strip everything down to the very basics.
(((guest node)))
(((node,guest node)))
== Mile-High View of Configuration Steps ==
* Give each virtual machine that will be used as a guest node a static network
address and unique hostname.
* Put the same authentication key with the path +/etc/pacemaker/authkey+ on
every cluster node and virtual machine. This secures remote communication.
+
Run this command to generate a random key:
+
----
dd if=/dev/urandom of=/etc/pacemaker/authkey bs=4096 count=1
----
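+
The same key must also end up at that path inside each guest. One way to copy
it, assuming the guest is reachable over SSH (the hostname *guest1* is just an
example):
+
----
# ssh root@guest1 mkdir -p /etc/pacemaker
# scp /etc/pacemaker/authkey root@guest1:/etc/pacemaker/authkey
----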
* Install pacemaker_remote on every virtual machine, enabling it to start at
boot, and if a local firewall is used, allow the node to accept connections
on TCP port 3121.
+
----
yum install pacemaker-remote resource-agents
systemctl enable pacemaker_remote
firewall-cmd --add-port 3121/tcp --permanent
----
+
[NOTE]
======
If you just want to see this work, you may want to simply disable the local
firewall and put SELinux in permissive mode while testing. This creates
security risks and should not be done on a production machine exposed to the
Internet, but can be appropriate for a protected test machine.
======
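+
Because the rule above was added with `--permanent`, it does not affect the
running firewall until firewalld is reloaded (assuming firewalld manages the
local firewall, as in the command shown earlier):
+
----
firewall-cmd --reload
----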
* Create a Pacemaker resource to launch each virtual machine, using the
*remote-node* meta-attribute to let Pacemaker know this will be a
guest node capable of running resources.
+
----
# pcs resource create vm-guest1 VirtualDomain hypervisor="qemu:///system" config="vm-guest1.xml" meta remote-node="guest1"
----
+
The above command will create CIB XML similar to the following:
+
[source,XML]
----
----
In the example above, the meta-attribute *remote-node="guest1"* tells Pacemaker
that this resource is a guest node with the hostname *guest1*. The cluster will
attempt to contact the virtual machine's pacemaker_remote service at the
hostname *guest1* after it launches.
[NOTE]
======
The ID of the resource creating the virtual machine (*vm-guest1* in the above
example) 'must' be different from the virtual machine's uname (*guest1* in the
above example). Pacemaker will create an implicit internal resource for the
pacemaker_remote connection to the guest, named with the value of *remote-node*,
so that value cannot be used as the name of any other resource.
======
== Using a Guest Node ==
Guest nodes will show up in `crm_mon` output as normal:
.Example `crm_mon` output after *guest1* is integrated into cluster
----
Last updated: Wed Mar 13 13:52:39 2013
Last change: Wed Mar 13 13:25:17 2013 via crmd on node1
Stack: corosync
Current DC: node1 (24815808) - partition with quorum
Version: 1.1.10
2 Nodes configured, unknown expected votes
2 Resources configured.
Online: [ node1 guest1]
vm-guest1 (ocf::heartbeat:VirtualDomain): Started node1
----
Now, you could place a resource, such as a webserver, on *guest1*:
----
# pcs resource create webserver apache params configfile=/etc/httpd/conf/httpd.conf op monitor interval=30s
-# pcs constraint webserver prefers guest1
+# pcs constraint location webserver prefers guest1
----
Now, the crm_mon output would show:
----
Last updated: Wed Mar 13 13:52:39 2013
Last change: Wed Mar 13 13:25:17 2013 via crmd on node1
Stack: corosync
Current DC: node1 (24815808) - partition with quorum
Version: 1.1.10
2 Nodes configured, unknown expected votes
2 Resources configured.
Online: [ node1 guest1]
vm-guest1 (ocf::heartbeat:VirtualDomain): Started node1
webserver (ocf::heartbeat:apache): Started guest1
----
It is worth noting that after *guest1* is integrated into the cluster, nearly all the
Pacemaker command-line tools immediately become available to the guest node.
This means things like `crm_mon`, `crm_resource`, and `crm_attribute` will work
natively on the guest node, as long as the connection between the guest node
and a cluster node exists. This is particularly important for any master/slave
resources executing on the guest node that need access to `crm_master` to set
transient attributes.
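For example, a promotable (master/slave) resource agent running on *guest1* can
call `crm_master` exactly as it would on a full cluster node (the score below
is purely illustrative):
----
# crm_master -l reboot -v 100
----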
diff --git a/lrmd/main.c b/lrmd/main.c
index 98a14126df..7fc4d5f503 100644
--- a/lrmd/main.c
+++ b/lrmd/main.c
@@ -1,481 +1,481 @@
/*
* Copyright (c) 2012 David Vossel
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
* License as published by the Free Software Foundation; either
* version 2.1 of the License, or (at your option) any later version.
*
* This library is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* Lesser General Public License for more details.
*
* You should have received a copy of the GNU Lesser General Public
* License along with this library; if not, write to the Free Software
* Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
*
*/
#include
#include
#include
#include
#include
#include
#include
#include
#include
#include
#if defined(HAVE_GNUTLS_GNUTLS_H) && defined(SUPPORT_REMOTE)
# define ENABLE_PCMK_REMOTE
#endif
GMainLoop *mainloop = NULL;
static qb_ipcs_service_t *ipcs = NULL;
stonith_t *stonith_api = NULL;
int lrmd_call_id = 0;
#ifdef ENABLE_PCMK_REMOTE
/* whether shutdown request has been sent */
static volatile sig_atomic_t shutting_down = FALSE;
/* timer for waiting for acknowledgment of shutdown request */
static volatile guint shutdown_ack_timer = 0;
static gboolean lrmd_exit(gpointer data);
#endif
static void
stonith_connection_destroy_cb(stonith_t * st, stonith_event_t * e)
{
stonith_api->state = stonith_disconnected;
crm_err("LRMD lost STONITH connection");
stonith_connection_failed();
}
stonith_t *
get_stonith_connection(void)
{
if (stonith_api && stonith_api->state == stonith_disconnected) {
stonith_api_delete(stonith_api);
stonith_api = NULL;
}
if (!stonith_api) {
int rc = 0;
int tries = 10;
stonith_api = stonith_api_new();
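/* Try to connect for up to 10 seconds (10 attempts, one second apart) */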
do {
rc = stonith_api->cmds->connect(stonith_api, "lrmd", NULL);
if (rc == pcmk_ok) {
stonith_api->cmds->register_notification(stonith_api,
T_STONITH_NOTIFY_DISCONNECT,
stonith_connection_destroy_cb);
break;
}
sleep(1);
tries--;
} while (tries);
if (rc) {
crm_err("Unable to connect to stonith daemon to execute command. error: %s",
pcmk_strerror(rc));
stonith_api_delete(stonith_api);
stonith_api = NULL;
}
}
return stonith_api;
}
static int32_t
lrmd_ipc_accept(qb_ipcs_connection_t * c, uid_t uid, gid_t gid)
{
crm_trace("Connection %p", c);
if (crm_client_new(c, uid, gid) == NULL) {
return -EIO;
}
return 0;
}
static void
lrmd_ipc_created(qb_ipcs_connection_t * c)
{
crm_client_t *new_client = crm_client_get(c);
crm_trace("Connection %p", c);
CRM_ASSERT(new_client != NULL);
/* Now that the connection is officially established, alert
* the other clients that a new connection exists. */
notify_of_new_client(new_client);
}
static int32_t
lrmd_ipc_dispatch(qb_ipcs_connection_t * c, void *data, size_t size)
{
uint32_t id = 0;
uint32_t flags = 0;
crm_client_t *client = crm_client_get(c);
xmlNode *request = crm_ipcs_recv(client, data, size, &id, &flags);
CRM_CHECK(client != NULL, crm_err("Invalid client");
return FALSE);
CRM_CHECK(client->id != NULL, crm_err("Invalid client: %p", client);
return FALSE);
CRM_CHECK(flags & crm_ipc_client_response, crm_err("Invalid client request: %p", client);
return FALSE);
if (!request) {
return 0;
}
if (!client->name) {
const char *value = crm_element_value(request, F_LRMD_CLIENTNAME);
if (value == NULL) {
client->name = crm_itoa(crm_ipcs_client_pid(c));
} else {
client->name = strdup(value);
}
}
lrmd_call_id++;
if (lrmd_call_id < 1) {
lrmd_call_id = 1;
}
crm_xml_add(request, F_LRMD_CLIENTID, client->id);
crm_xml_add(request, F_LRMD_CLIENTNAME, client->name);
crm_xml_add_int(request, F_LRMD_CALLID, lrmd_call_id);
process_lrmd_message(client, id, request);
free_xml(request);
return 0;
}
/*!
* \internal
* \brief Free a client connection, and exit if appropriate
*
* \param[in] client Client connection to free
*/
void
lrmd_client_destroy(crm_client_t *client)
{
crm_client_destroy(client);
#ifdef ENABLE_PCMK_REMOTE
/* If we were waiting to shut down, we can now safely do so
* if there are no more proxied IPC providers
*/
if (shutting_down && (ipc_proxy_get_provider() == NULL)) {
lrmd_exit(NULL);
}
#endif
}
static int32_t
lrmd_ipc_closed(qb_ipcs_connection_t * c)
{
crm_client_t *client = crm_client_get(c);
if (client == NULL) {
return 0;
}
crm_trace("Connection %p", c);
client_disconnect_cleanup(client->id);
#ifdef ENABLE_PCMK_REMOTE
ipc_proxy_remove_provider(client);
#endif
lrmd_client_destroy(client);
return 0;
}
static void
lrmd_ipc_destroy(qb_ipcs_connection_t * c)
{
lrmd_ipc_closed(c);
crm_trace("Connection %p", c);
}
static struct qb_ipcs_service_handlers lrmd_ipc_callbacks = {
.connection_accept = lrmd_ipc_accept,
.connection_created = lrmd_ipc_created,
.msg_process = lrmd_ipc_dispatch,
.connection_closed = lrmd_ipc_closed,
.connection_destroyed = lrmd_ipc_destroy
};
int
lrmd_server_send_reply(crm_client_t * client, uint32_t id, xmlNode * reply)
{
crm_trace("sending reply to client (%s) with msg id %d", client->id, id);
switch (client->kind) {
case CRM_CLIENT_IPC:
return crm_ipcs_send(client, id, reply, FALSE);
#ifdef ENABLE_PCMK_REMOTE
case CRM_CLIENT_TLS:
return lrmd_tls_send_msg(client->remote, reply, id, "reply");
#endif
default:
crm_err("Unknown lrmd client type %d", client->kind);
}
return -1;
}
int
lrmd_server_send_notify(crm_client_t * client, xmlNode * msg)
{
crm_trace("sending notify to client (%s)", client->id);
switch (client->kind) {
case CRM_CLIENT_IPC:
if (client->ipcs == NULL) {
crm_trace("Asked to send event to disconnected local client");
return -1;
}
return crm_ipcs_send(client, 0, msg, crm_ipc_server_event);
#ifdef ENABLE_PCMK_REMOTE
case CRM_CLIENT_TLS:
if (client->remote == NULL) {
crm_trace("Asked to send event to disconnected remote client");
return -1;
}
return lrmd_tls_send_msg(client->remote, msg, 0, "notify");
#endif
default:
crm_err("Unknown lrmd client type %d", client->kind);
}
return -1;
}
/*!
* \internal
* \brief Clean up and exit immediately
*
* \param[in] data Ignored
*
* \return Doesn't return
* \note This can be used as a timer callback.
*/
static gboolean
lrmd_exit(gpointer data)
{
crm_info("Terminating with %d clients", crm_hash_table_size(client_connections));
if (stonith_api) {
stonith_api->cmds->remove_notification(stonith_api, T_STONITH_NOTIFY_DISCONNECT);
stonith_api->cmds->disconnect(stonith_api);
stonith_api_delete(stonith_api);
}
if (ipcs) {
mainloop_del_ipc_server(ipcs);
}
#ifdef ENABLE_PCMK_REMOTE
lrmd_tls_server_destroy();
ipc_proxy_cleanup();
#endif
crm_client_cleanup();
g_hash_table_destroy(rsc_list);
crm_exit(pcmk_ok);
return FALSE;
}
/*!
* \internal
* \brief Request cluster shutdown if appropriate, otherwise exit immediately
*
* \param[in] nsig Signal that caused invocation (ignored)
*/
static void
lrmd_shutdown(int nsig)
{
#ifdef ENABLE_PCMK_REMOTE
crm_client_t *ipc_proxy = ipc_proxy_get_provider();
/* If there are active proxied IPC providers, then we may be running
* resources, so notify the cluster that we wish to shut down.
*/
if (ipc_proxy) {
if (shutting_down) {
- crm_trace("Shutdown already in progress");
+ crm_notice("Waiting for cluster to stop resources before exiting");
return;
}
crm_info("Sending shutdown request to cluster");
if (ipc_proxy_shutdown_req(ipc_proxy) < 0) {
crm_crit("Shutdown request failed, exiting immediately");
} else {
/* We requested a shutdown. Now, we need to wait for an
* acknowledgement from the proxy host (which ensures the proxy host
* supports shutdown requests), then wait for all proxy hosts to
* disconnect (which ensures that all resources have been stopped).
*/
shutting_down = TRUE;
/* Stop accepting new proxy connections */
lrmd_tls_server_destroy();
/* Older crmd versions will never acknowledge our request, so set a
* fairly short timeout to exit quickly in that case. If we get the
* ack, we'll defuse this timer.
*/
shutdown_ack_timer = g_timeout_add_seconds(20, lrmd_exit, NULL);
/* Currently, we let the OS kill us if the clients don't disconnect
* in a reasonable time. We could instead set a long timer here
* (shorter than what the OS is likely to use) and exit immediately
* if it pops.
*/
return;
}
}
#endif
lrmd_exit(NULL);
}
/*!
* \internal
* \brief Defuse short exit timer if shutting down
*/
void handle_shutdown_ack()
{
#ifdef ENABLE_PCMK_REMOTE
if (shutting_down) {
crm_info("Received shutdown ack");
if (shutdown_ack_timer > 0) {
g_source_remove(shutdown_ack_timer);
}
return;
}
#endif
crm_debug("Ignoring unexpected shutdown ack");
}
/* *INDENT-OFF* */
static struct crm_option long_options[] = {
/* Top-level Options */
{"help", 0, 0, '?', "\tThis text"},
{"version", 0, 0, '$', "\tVersion information" },
{"verbose", 0, 0, 'V', "\tIncrease debug output"},
{"logfile", 1, 0, 'l', "\tSend logs to the additional named logfile"},
/* For compatibility with the original lrmd */
{"dummy", 0, 0, 'r', NULL, 1},
{0, 0, 0, 0}
};
/* *INDENT-ON* */
int
main(int argc, char **argv)
{
int flag = 0;
int index = 0;
const char *option = NULL;
#ifndef ENABLE_PCMK_REMOTE
crm_log_preinit("lrmd", argc, argv);
crm_set_options(NULL, "[options]", long_options,
"Daemon for controlling services confirming to different standards");
#else
crm_log_preinit("pacemaker_remoted", argc, argv);
crm_set_options(NULL, "[options]", long_options,
"Pacemaker Remote daemon for extending pacemaker functionality to remote nodes.");
#endif
while (1) {
flag = crm_get_option(argc, argv, &index);
if (flag == -1) {
break;
}
switch (flag) {
case 'r':
break;
case 'l':
crm_add_logfile(optarg);
break;
case 'V':
crm_bump_log_level(argc, argv);
break;
case '?':
case '$':
crm_help(flag, EX_OK);
break;
default:
crm_help('?', EX_USAGE);
break;
}
}
crm_log_init(NULL, LOG_INFO, TRUE, FALSE, argc, argv, FALSE);
option = daemon_option("logfacility");
if(option && safe_str_neq(option, "none")) {
setenv("HA_LOGFACILITY", option, 1); /* Used by the ocf_log/ha_log OCF macro */
}
option = daemon_option("logfile");
if(option && safe_str_neq(option, "none")) {
setenv("HA_LOGFILE", option, 1); /* Used by the ocf_log/ha_log OCF macro */
if (daemon_option_enabled(crm_system_name, "debug")) {
setenv("HA_DEBUGLOG", option, 1); /* Used by the ocf_log/ha_debug OCF macro */
}
}
/* The presence of this variable allegedly controls whether child
* processes like httpd will try and use Systemd's sd_notify
* API
*/
unsetenv("NOTIFY_SOCKET");
/* Used by RAs - Leave owned by root */
crm_build_path(CRM_RSCTMP_DIR, 0755);
/* Legacy: Used by RAs - Leave owned by root */
crm_build_path(HA_STATE_DIR"/heartbeat/rsctmp", 0755);
rsc_list = g_hash_table_new_full(crm_str_hash, g_str_equal, NULL, free_rsc);
ipcs = mainloop_add_ipc_server(CRM_SYSTEM_LRMD, QB_IPC_SHM, &lrmd_ipc_callbacks);
if (ipcs == NULL) {
crm_err("Failed to create IPC server: shutting down and inhibiting respawn");
crm_exit(DAEMON_RESPAWN_STOP);
}
#ifdef ENABLE_PCMK_REMOTE
{
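/* Listen on the port given by PCMK_remote_port if set, otherwise the
 * default pacemaker_remote port (3121).
 */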
const char *remote_port_str = getenv("PCMK_remote_port");
int remote_port = remote_port_str ? atoi(remote_port_str) : DEFAULT_REMOTE_PORT;
if (lrmd_init_remote_tls_server(remote_port) < 0) {
crm_err("Failed to create TLS server on port %d: shutting down and inhibiting respawn", remote_port);
crm_exit(DAEMON_RESPAWN_STOP);
}
ipc_proxy_init();
}
#endif
mainloop_add_signal(SIGTERM, lrmd_shutdown);
mainloop = g_main_new(FALSE);
crm_info("Starting");
g_main_run(mainloop);
/* should never get here */
lrmd_exit(NULL);
return pcmk_ok;
}
diff --git a/tools/report.collector b/tools/report.collector
index 0cf4a1e29c..ecd1546b34 100644
--- a/tools/report.collector
+++ b/tools/report.collector
@@ -1,791 +1,794 @@
# Copyright (C) 2007 Dejan Muhamedagic
# Almost everything as part of hb_report
# Copyright (C) 2010 Andrew Beekhof
# Cleanups, refactoring, extensions
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public
# License as published by the Free Software Foundation; either
# version 2.1 of the License, or (at your option) any later version.
#
# This software is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU General Public
# License along with this library; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
#
if
echo $REPORT_HOME | grep -qs '^/'
then
debug "Using full path to working directory: $REPORT_HOME"
else
REPORT_HOME="$HOME/$REPORT_HOME"
debug "Canonicalizing working directory path: $REPORT_HOME"
fi
detect_host
findlogdcf() {
for f in \
`test -x $CRM_DAEMON_DIR/ha_logd &&
which strings > /dev/null 2>&1 &&
strings $CRM_DAEMON_DIR/ha_logd | grep 'logd\.cf'` \
`for d; do echo $d/logd.cf $d/ha_logd.cf; done`
do
if [ -f "$f" ]; then
echo $f
debug "Located logd.cf at: $f"
return 0
fi
done
debug "Could not determine logd.cf location"
return 1
}
#
# find files newer than a and older than b
#
isnumber() {
echo "$*" | grep -qs '^[0-9][0-9]*$'
}
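# create a temporary file whose timestamp is set to the given epoch time,
# for use as a reference point with "find -newer"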
touchfile() {
t=`mktemp` &&
perl -e "\$file=\"$t\"; \$tm=$1;" -e 'utime $tm, $tm, $file;' &&
echo $t
}
find_files_clean() {
[ -z "$from_stamp" ] || rm -f "$from_stamp"
[ -z "$to_stamp" ] || rm -f "$to_stamp"
from_stamp=""
to_stamp=""
}
find_files() {
dirs=
from_time=$2
to_time=$3
for d in $1; do
if [ -d $d ]; then
dirs="$dirs $d"
fi
done
if [ x"$dirs" = x ]; then
return
fi
isnumber "$from_time" && [ "$from_time" -gt 0 ] || {
warning "sorry, can't find files in [ $1 ] based on time if you don't supply time"
return
}
trap find_files_clean 0
if ! from_stamp=`touchfile $from_time`; then
warning "sorry, can't create temporary file for find_files"
return
fi
findexp="-newer $from_stamp"
if isnumber "$to_time" && [ "$to_time" -gt 0 ]; then
if ! to_stamp=`touchfile $to_time`; then
warning "sorry, can't create temporary file for find_files"
find_files_clean
return
fi
findexp="$findexp ! -newer $to_stamp"
fi
find $dirs -type f $findexp
find_files_clean
trap "" 0
}
#
# check permissions of files/dirs
#
pl_checkperms() {
perl -e '
# check permissions and ownership
# uid and gid are numeric
# everything must match exactly
# no error checking! (file should exist, etc)
($filename, $perms, $in_uid, $in_gid) = @ARGV;
($mode,$uid,$gid) = (stat($filename))[2,4,5];
$p=sprintf("%04o", $mode & 07777);
$p ne $perms and exit(1);
$uid ne $in_uid and exit(1);
$gid ne $in_gid and exit(1);
' $*
}
num_id() {
getent $1 $2 | awk -F: '{print $3}'
}
chk_id() {
[ "$2" ] && return 0
echo "$1: id not found"
return 1
}
check_perms() {
while read type f p uid gid; do
[ -$type $f ] || {
echo "$f wrong type or doesn't exist"
continue
}
n_uid=`num_id passwd $uid`
chk_id "$uid" "$n_uid" || continue
n_gid=`num_id group $gid`
chk_id "$gid" "$n_gid" || continue
pl_checkperms $f $p $n_uid $n_gid || {
echo "wrong permissions or ownership for $f:"
ls -ld $f
}
done
}
#
# coredumps
#
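# figure out which program produced a core file: ask gdb first,
# then fall back to file(1), and finally look in $CRM_DAEMON_DIR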
findbinary() {
random_binary=`which cat 2>/dev/null` # suppose we are lucky
binary=`gdb $random_binary $1 < /dev/null 2>/dev/null |
grep 'Core was generated' | awk '{print $5}' |
sed "s/^.//;s/[.':]*$//"`
if [ x = x"$binary" ]; then
debug "Could not detect the program name for core $1 from the gdb output; will try with file(1)"
binary=$(file $1 | awk '/from/{
for( i=1; i<=NF; i++ )
if( $i == "from" ) {
print $(i+1)
break
}
}')
binary=`echo $binary | tr -d "'"`
binary=$(echo $binary | tr -d '`')
if [ "$binary" ]; then
binary=`which $binary 2>/dev/null`
fi
fi
if [ x = x"$binary" ]; then
warning "Could not find the program path for core $1"
return
fi
fullpath=`which $binary 2>/dev/null`
if [ x = x"$fullpath" ]; then
if [ -x $CRM_DAEMON_DIR/$binary ]; then
echo $CRM_DAEMON_DIR/$binary
debug "Found the program at $CRM_DAEMON_DIR/$binary for core $1"
else
warning "Could not find the program path for core $1"
fi
else
echo $fullpath
debug "Found the program at $fullpath for core $1"
fi
}
getbt() {
which gdb > /dev/null 2>&1 || {
warning "Please install gdb to get backtraces"
return
}
for corefile; do
absbinpath=`findbinary $corefile`
[ x = x"$absbinpath" ] && continue
echo "====================== start backtrace ======================"
ls -l $corefile
# Summary first...
gdb -batch -n -quiet -ex ${BT_OPTS:-"thread apply all bt"} -ex quit \
$absbinpath $corefile 2>/dev/null
echo "====================== start detail ======================"
# Now the unreadable details...
gdb -batch -n -quiet -ex ${BT_OPTS:-"thread apply all bt full"} -ex quit \
$absbinpath $corefile 2>/dev/null
echo "======================= end backtrace ======================="
done
}
getconfig() {
cluster=$1; shift;
target=$1; shift;
for cf in $*; do
if [ -e "$cf" ]; then
cp -a "$cf" $target/
fi
done
crm_uuid -r > $target/$HB_UUID_F 2>&1
if
ps -ef | egrep -qs [c]rmd
then
crm_mon -1 2>&1 | grep -v '^Last upd' > $target/$CRM_MON_F
cibadmin -Ql 2>/dev/null > $target/${CIB_F}.live
case $cluster in
cman) crm_node -p --cman > $target/$MEMBERSHIP_F 2>&1;;
corosync|openais) crm_node -p --openais > $target/$MEMBERSHIP_F 2>&1;;
heartbeat) crm_node -p --heartbeat > $target/$MEMBERSHIP_F 2>&1;;
*) crm_node -p > $target/$MEMBERSHIP_F 2>&1;;
esac
echo "$host" > $target/RUNNING
else
echo "$host" > $target/STOPPED
fi
if [ -f "$target/$CIB_F" ]; then
crm_verify -V -x $target/$CIB_F >$target/$CRM_VERIFY_F 2>&1
CIB_file=$target/$CIB_F crm configure show >$target/$CIB_TXT_F 2>&1
fi
}
#
# remove values of sensitive attributes
#
# this is not proper xml parsing, but it will work under the
# circumstances
sanitize_xml_attrs() {
sed $(
for patt in $SANITIZE; do
echo "-e /name=\"$patt\"/s/value=\"[^\"]*\"/value=\"****\"/"
done
)
}
sanitize_hacf() {
awk '
$1=="stonith_host"{ for( i=5; i<=NF; i++ ) $i="****"; }
{print}
'
}
sanitize_one_clean() {
[ -z "$tmp" ] || rm -f "$tmp"
tmp=""
[ -z "$ref" ] || rm -f "$ref"
ref=""
}
sanitize() {
file=$1
compress=""
if [ -z "$SANITIZE" ]; then
return
fi
echo $file | grep -qs 'gz$' && compress=gzip
echo $file | grep -qs 'bz2$' && compress=bzip2
if [ "$compress" ]; then
decompress="$compress -dc"
else
compress=cat
decompress=cat
fi
trap sanitize_one_clean 0
tmp=`mktemp`
ref=`mktemp`
if [ -z "$tmp" -o -z "$ref" ]; then
sanitize_one_clean
fatal "cannot create temporary files"
fi
touch -r $file $ref # save the mtime
if [ "`basename $file`" = ha.cf ]; then
sanitize_hacf
else
$decompress | sanitize_xml_attrs | $compress
fi < $file > $tmp
mv $tmp $file
# note: cleaning $tmp up is still needed even after it's renamed
# because its temp directory is still there.
touch -r $ref $file
sanitize_one_clean
trap "" 0
}
#
# get some system info
#
distro() {
if
which lsb_release >/dev/null 2>&1
then
lsb_release -d
debug "Using lsb_release for distribution info"
return
fi
relf=`ls /etc/debian_version 2>/dev/null` ||
relf=`ls /etc/slackware-version 2>/dev/null` ||
relf=`ls -d /etc/*-release 2>/dev/null` && {
for f in $relf; do
test -f $f && {
echo "`ls $f` `cat $f`"
debug "Found `echo $relf | tr '\n' ' '` distribution release file(s)"
return
}
done
}
warning "No lsb_release, no /etc/*-release, no /etc/debian_version: no distro information"
}
pkg_ver() {
if which dpkg >/dev/null 2>&1 ; then
pkg_mgr="deb"
elif which rpm >/dev/null 2>&1 ; then
pkg_mgr="rpm"
elif which pkg_info >/dev/null 2>&1 ; then
pkg_mgr="pkg_info"
elif which pkginfo >/dev/null 2>&1 ; then
pkg_mgr="pkginfo"
else
warning "Unknown package manager"
return
fi
debug "The package manager is: $pkg_mgr"
echo "The package manager is: $pkg_mgr"
# for Linux .deb based systems
case $pkg_mgr in
deb)
dpkg-query -f '${Package} ${Version} ${Architecture}\n' -W | sort
for pkg in $*; do
if dpkg-query -W $pkg 2>/dev/null ; then
debug "Verifying installation of: $pkg"
echo "Verifying installation of: $pkg"
debsums -s $pkg 2>/dev/null
fi
done
;;
rpm)
rpm -qa --qf '%{name} %{version}-%{release} - %{distribution} %{arch}\n' | sort
for pkg in $*; do
if rpm -q $pkg >/dev/null 2>&1 ; then
debug "Verifying installation of: $pkg"
echo "Verifying installation of: $pkg"
rpm --verify $pkg 2>&1
fi
done
;;
pkg_info)
pkg_info
;;
pkginfo)
pkginfo | awk '{print $3}' # format?
;;
esac
}
getbacktraces() {
debug "Looking for backtraces: $*"
flist=$(
for f in `find_files "$CRM_CORE_DIRS" $1 $2`; do
bf=`basename $f`
test `expr match $bf core` -gt 0 &&
echo $f
done)
if [ "$flist" ]; then
for core in $flist; do
log "Found core file: `ls -al $core`"
done
# Make a copy of them in case we need more data later
# Luckily they compress well
mkdir cores &> /dev/null
cp -a $flist cores/
shrink cores
rm -rf cores
# Now get as much as we can from them automagically
for f in $flist; do
getbt $f
done
fi
}
getpeinputs() {
flist=$(
find_files $PE_STATE_DIR $1 $2 | sed "s,`dirname $PE_STATE_DIR`/,,g"
)
if [ "$flist" ]; then
(cd `dirname $PE_STATE_DIR` && tar cf - $flist) | (cd $3 && tar xf -)
debug "found `echo $flist | wc -w` pengine input files in $PE_STATE_DIR"
fi
}
getblackboxes() {
flist=$(
find_files $BLACKBOX_DIR $1 $2
)
for bb in $flist; do
bb_short=`basename $bb`
qb-blackbox $bb &> $3/${bb_short}.blackbox
info "Extracting contents of blackbox: $bb_short"
done
}
#
# some basic system info and stats
#
sys_info() {
cluster=$1; shift
echo "Platform: `uname`"
echo "Kernel release: `uname -r`"
echo "Architecture: `uname -m`"
if [ `uname` = Linux ]; then
echo "Distribution: `distro`"
fi
cibadmin --version 2>&1
cibadmin -! 2>&1
case $cluster in
openais)
: echo "openais version: how?"
;;
cman)
cman_tool -V
/usr/sbin/corosync -v 2>&1
;;
corosync)
/usr/sbin/corosync -v 2>&1
;;
heartbeat)
echo "heartbeat version: `$CRM_DAEMON_DIR/heartbeat -V 2>&1`"
;;
esac
# Cluster glue version hash (if available)
stonith -V 2>/dev/null
# Resource agents version hash
echo "resource-agents: `grep 'Build version:' /usr/lib/ocf/resource.d/heartbeat/.ocf-shellfuncs`"
pkg_ver $*
}
sys_stats() {
set -x
uname -n
uptime
ps axf
ps auxw
top -b -n 1
ifconfig -a
ip addr list
netstat -i
arp -an
test -d /proc && {
cat /proc/cpuinfo
}
lsscsi
lspci
mount
df
set +x
}
dlm_dump() {
if which dlm_tool >/dev/null 2>&1 ; then
if
ps -ef | egrep -qs '[d]lm_controld'
then
echo "--- Lockspace overview:"
dlm_tool ls -n
echo "---Lockspace history:"
dlm_tool dump
echo "---Lockspace status:"
dlm_tool status
dlm_tool status -v
echo "---Lockspace config:"
dlm_tool dump_config
dlm_tool log_plock
dlm_tool ls | grep name |
while read X N ; do
echo "--- Lockspace $N:"
dlm_tool lockdump "$N"
dlm_tool lockdebug -svw "$N"
done
fi
fi
}
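# iscfvarset: succeed if the given option is set at all in the config file;
# iscfvartrue: succeed if it is set to a true-ish value (true/y/yes/on/1)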
iscfvarset() {
test "`getcfvar $1 $2`"
}
iscfvartrue() {
getcfvar $1 $2 $3 | egrep -qsi "^(true|y|yes|on|1)"
}
uselogd() {
cf_file=$2
case $1 in
heartbeat)
iscfvartrue $1 use_logd $cf_file && return 0 # if use_logd true
iscfvarset $1 logfacility $cf_file ||
iscfvarset $1 logfile $cf_file ||
iscfvarset $1 debugfile $cf_file ||
return 0 # or none of the log options set
false
;;
*)
iscfvartrue $1 use_logd $cf_file
;;
esac
}
get_logfiles() {
cf_type=$1
cf_file="$2"
cf_logd="$3"
facility_var="logfacility"
if [ -f "$cf_logd" ]; then
if uselogd; then
cf_file="$cf_logd"
cf_type="logd"
fi
fi
debug "Reading $cf_type log settings"
case $cf_type in
cman|openais|corosync)
debug "Reading log settings from $cf_file"
if iscfvartrue $cf_type to_syslog $cf_file; then
facility_var=syslog_facility
fi
if iscfvartrue $cf_type to_logfile $cf_file; then
logfile=`getcfvar $cf_type logfile $cf_file`
fi
;;
heartbeat|logd)
debug "Reading log settings from $cf_file"
if
iscfvartrue $cf_type debug $cf_file
then
logfile=`getcfvar $cf_type debugfile $cf_file`
else
logfile=`getcfvar $cf_type logfile $cf_file`
fi
;;
*) debug "Unknown cluster type: $cf_type"
echo "/var/log/pacemaker.log"
;;
esac
if [ "x$logfile" != "x" -a -f "$logfile" ]; then
echo $logfile
fi
if [ "x$facility" = x ]; then
facility=`getcfvar $cf_type $facility_var $cf_file`
[ "" = "$facility" ] && facility="daemon"
fi
if [ "x$facility" = x ]; then
facility="daemon"
fi
# Always include system logs (if we can find them)
msg="Mark:pcmk:`perl -e 'print time()'`"
logger -p $facility.info $msg >/dev/null 2>&1
sleep 2 # Give syslog time to catch up in case it's busy
findmsg 1 "$msg"
# Initial pacemakerd logs and tracing might also go to a file (other than the syslog log file)
findmsg 3 "Starting Pacemaker"
# Make sure we get something from the Policy Engine
findmsg 3 "Calculated Transition"
# These patterns look for cib and lrmd updates
# Helpful on non-DC nodes or when the cluster has been up for a long time
findmsg 3 cib_perform_op
findmsg 3 process_lrm_event
}
essential_files() {
cat< $SYSINFO_F
essential_files $cluster | check_perms > $PERMISSIONS_F 2>&1
getconfig $cluster "$REPORT_HOME/$REPORT_TARGET" "$cluster_cf" "$logd_cf" "$CRM_CONFIG_DIR/$CIB_F" "$HA_STATE_DIR/hostcache" "/etc/drbd.conf" "/etc/drbd.d" "/etc/booth"
getpeinputs $LOG_START $LOG_END $REPORT_HOME/$REPORT_TARGET
getbacktraces $LOG_START $LOG_END > $REPORT_HOME/$REPORT_TARGET/$BT_F
getblackboxes $LOG_START $LOG_END $REPORT_HOME/$REPORT_TARGET
case $cluster in
cman|corosync)
if
ps -ef | egrep -qs '[c]orosync'
then
corosync-blackbox &> corosync-blackbox-live.txt
fi
# corosync-fplay > corosync-blackbox.txt
tool=`pickfirst corosync-objctl corosync-cmapctl`
case $tool in
*objctl) $tool -a > corosync.dump 2>/dev/null;;
*cmapctl) $tool > corosync.dump 2>/dev/null;;
esac
corosync-quorumtool -s -i > corosync.quorum 2>&1
;;
esac
dc=`crm_mon -1 2>/dev/null | awk '/Current DC/ {print $3}'`
if [ "$REPORT_TARGET" = "$dc" ]; then
echo "$REPORT_TARGET" > DC
fi
dlm_dump > $DLM_DUMP_F 2>&1
sys_stats > $SYSSTATS_F 2>&1
debug "Sanitizing files: $SANITIZE"
#
# replace sensitive info with '****'
#
cf=""
if [ ! -z "$cluster_cf" ]; then
cf=`basename $cluster_cf`
fi
for f in $cf $CIB_F $CIB_TXT_F $CIB_F.live pengine/*; do
if [ -f "$f" ]; then
sanitize $f
fi
done
# Grab logs
start=`date -d @${LOG_START} +"%F %T"`
end=`date -d @${LOG_END} +"%F %T"`
debug "Gathering logs from $start to $end: $logfiles $EXTRA_LOGS"
trap '[ -z "$pattfile" ] || rm -f "$pattfile"' 0
pattfile=`mktemp` || fatal "cannot create temporary files"
for p in $LOG_PATTERNS; do
echo "$p"
done > $pattfile
for l in $logfiles $EXTRA_LOGS; do
- b=`basename $l`
+ b="$(basename $l).extract.txt"
+
if [ ! -f "$l" ]; then
# Not a file
continue
elif [ -f "$b" ]; then
# We already have it
continue
fi
dumplogset "$l" $LOG_START $LOG_END > "$b"
+ sanitize "$b"
+
echo "Log patterns $REPORT_TARGET:" > $ANALYSIS_F
- cat $b | grep -f $pattfile >> $ANALYSIS_F
+ grep -f "$pattfile" "$b" >> $ANALYSIS_F
done
which journalctl > /dev/null 2>&1
if [ $? = 0 ]; then
log "Including segment [$LOG_START-$LOG_END] from journald"
journalctl --since "$start" --until "$end" > journal.log
cat journal.log | grep -f $pattfile >> $ANALYSIS_F
fi
rm -f $pattfile
trap "" 0
# Purge files containing no information
for f in `ls -1`; do
if [ -d "$f" ]; then
continue
elif [ ! -s "$f" ]; then
case $f in
*core*) log "Detected empty core file: $f";;
*) debug "Removing empty file: `ls -al $f`"
rm -f $f
;;
esac
fi
done
# Parse for events
for l in $logfiles $EXTRA_LOGS; do
- node_events `basename $l` > $EVENTS_F
+ b="$(basename $l).extract.txt"
+ node_events "$b" > $EVENTS_F
# Link the first logfile to a standard name if it doesn't yet exist
- f=`basename $l`
- if [ -e $f -a ! -e $HALOG_F ]; then
- ln -s $f $HALOG_F
+ if [ -e "$b" -a ! -e "$HALOG_F" ]; then
+ ln -s "$b" "$HALOG_F"
fi
done
if [ -e $REPORT_HOME/.env ]; then
debug "Localhost: $REPORT_MASTER $REPORT_TARGET"
elif [ "$REPORT_MASTER" != "$REPORT_TARGET" ]; then
debug "Streaming report back to $REPORT_MASTER"
(cd $REPORT_HOME && tar cf - $REPORT_TARGET)
if [ "$REMOVE" = "1" ]; then
cd
rm -rf $REPORT_HOME
fi
fi