diff --git a/configure.ac b/configure.ac
index a1edb59..be6a1d8 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1,57 +1,56 @@
dnl
dnl autoconf for Agents
dnl
dnl License: GNU General Public License (GPL)
dnl ===============================================
dnl Bootstrap
dnl ===============================================
AC_PREREQ(2.63)
dnl Suggested structure:
dnl information on the package
dnl checks for programs
dnl checks for libraries
dnl checks for header files
dnl checks for types
dnl checks for structures
dnl checks for compiler characteristics
dnl checks for library functions
dnl checks for system services
AC_INIT([sbd],
[1.0],
[lmb@suse.com])
AC_CANONICAL_HOST
AC_CONFIG_AUX_DIR(.)
AC_CONFIG_HEADERS(config.h)
AM_INIT_AUTOMAKE
AM_PROG_CC_C_O
PKG_CHECK_MODULES(glib, [glib-2.0])
PKG_CHECK_MODULES(libcoroipcc, [libcoroipcc])
PKG_CHECK_MODULES(pcmk, [pcmk, pcmk-cib])
PKG_CHECK_MODULES(libxml, [libxml-2.0])
dnl checks for libraries
AC_CHECK_LIB(aio, io_setup, , missing="yes")
AC_CHECK_LIB(plumbgpl, init_set_proc_title, , missing="yes")
AC_CHECK_LIB(crmcommon, set_crm_log_level, , missing="yes")
AC_CHECK_LIB(cib, cib_new, , missing="yes")
AC_CHECK_LIB(pe_status, pe_find_node, , missing="yes")
AC_CHECK_LIB(pe_rules, test_rule, , missing="yes")
AC_CHECK_LIB(crmcluster, crm_peer_init, , missing="yes")
if test "$missing" = "yes"; then
AC_MSG_ERROR([Missing required libraries or functions.])
fi
-
-
+AC_PATH_PROGS(POD2MAN, pod2man, pod2man)
dnl The Makefiles and shell scripts we output
AC_CONFIG_FILES([Makefile src/Makefile agent/Makefile man/Makefile])
dnl Now process the entire list of files added by previous
dnl calls to AC_CONFIG_FILES()
AC_OUTPUT()
diff --git a/man/Makefile.am b/man/Makefile.am
index 6e2684d..28efd74 100644
--- a/man/Makefile.am
+++ b/man/Makefile.am
@@ -1,4 +1,7 @@
-man_MANS = sbd.7 sbd.8
+man_MANS = sbd.8
EXTRA_DIST = $(man_MANS)
+sbd.8: sbd.8.pod
+ @POD2MAN@ -s 8 -c "STONITH Block Device" -r "SBD" -n "SBD" $< $@
+
diff --git a/man/sbd.7 b/man/sbd.7
deleted file mode 100644
index 100c05c..0000000
--- a/man/sbd.7
+++ /dev/null
@@ -1,326 +0,0 @@
-.TH sbd 7 "29 Mar 2012" "" "cluster-glue"
-.\"
-.SH NAME
-sbd \- Stonith Block Device
-.\"
-.SH DESCRIPTION
-.br
-\fB* Data Protection\fR
-
-The SLE HA cluster stack's highest priority is protecting the integrity
-of data. This is achieved by preventing uncoordinated concurrent access
-to data storage - such as mounting an ext3 file system more than once in
-the cluster, but also preventing OCFS2 from being mounted if
-coordination with other cluster nodes is not available. In a
-well-functioning cluster, Pacemaker will detect if resources are active
-beyond their concurrency limits and initiate recovery; further, its
-policy engine will never exceed these limitations.
-
-However, network partitioning or software malfunction could potentially
-cause scenarios where several coordinators are elected. If this
-so-called split brain scenario were allowed to unfold, data corruption
-might occur. Hence, several layers of protection have been added to the
-cluster stack to mitigate this.
-
-IO fencing/STONITH is the primary component contributing to this goal,
-since they ensure that, prior to storage activation, all other access is
-terminated; cLVM2 exclusive activation or OCFS2 file locking support are
-other mechanisms, protecting against administrative or application
-faults. Combined appropriately for your setup, these can reliably
-prevent split-brain scenarios from causing harm.
-
-This chapter describes an IO fencing mechanism that leverages the
-storage itself, following by a description of an additional layer of
-protection to ensure exclusive storage access. These two mechanisms can
-even be combined for higher levels of protection.
-.\"
-.P
-\fB* Storage-based Fencing\fR
-
-In scenarios where shared storage is used one can
-leverage said shared storage for very reliable I/O fencing and avoidance
-of split-brain scenarios.
-
-This mechanism has been used successfully with the Novell Cluster Suite
-and is also available in a similar fashion for the SLE HA 11 product
-using the "external/sbd" STONITH agent.
-
-In an environment where all nodes have access to shared storage, a small
-partition is formated for use with SBD. The sbd daemon, once
-configured, is brought online on each node before the rest of the
-cluster stack is started, and terminated only after all other cluster
-components have been shut down - ensuring that cluster resources are
-never activated without SBD supervision.
-
-The daemon automatically allocates one of the message slots on the
-partition to itself, and constantly monitors it for messages to itself.
-Upon receipt of a message, the daemon immediately complies with the
-request, such as initiating a power-off or reboot cycle for fencing.
-
-The daemon also constantly monitors connectivity to the storage device,
-and commits suicide in case the partition becomes unreachable,
-guaranteeing that it is not disconnected from fencing message. (If the
-cluster data resides on the same logical unit in a different partition,
-this is not an additional point of failure; the work-load would
-terminate anyway if the storage connectivity was lost.)
-
-SBD supports one, two, or three devices. This affects the operation
-of SBD as follows:
-
-.B ** One device
-
-In its most simple implementation, you use one device only. (Older
-versions of SBD did not support more.) This is appropriate for clusters
-where all your data is on the same shared storage (with internal redundancy)
-anyway; the SBD device does not introduce an additional single point of
-failure then.
-
-If the SBD device is not accessible, the daemon will fail to start and
-inhibit openais startup.
-
-.B ** Two devices
-
-This configuration is a trade-off, primarily aimed at environments where
-host-based mirroring is used, but no third storage device is available.
-
-SBD will not commit suicide if it loses access to one mirror leg; this
-allows the cluster to continue to function even in the face of one outage.
-
-However, SBD will not fence the other side while only one mirror leg is
-available, since it does not have enough knowledge to detect an asymmetric
-split of the storage. So it will not be able to automatically tolerate a
-second failure while one of the storage arrays is down. (Though you
-can use the appropriate crm command to acknowledge the fence manually.)
-
-If devices are configured different, the cluster will not start.
-If no header is on the devices, the cluster starts and keeps looking for a
-valid header.
-
-.B ** Three devices
-
-In this most reliable configuration, SBD will only commit suicide if more
-than one device is lost; hence, this configuration is resilient against
-one device outages (be it due to failures or maintenance). Fencing
-messages can be successfully relayed if at least two devices remain up.
-
-If one device out of three is completely missing at cluster start, the cluster
-will start. If one device out of three is available, but mis-configured, the
-cluster will not start. If two devices are completely missing, the cluster
-will also not start.
-
-This configuration is appropriate for more complex scenarios where storage
-is not confined to a single array.
-
-Host-based mirroring solutions could have one SBD per mirror leg (not
-mirrored itself), and an additional tie-breaker on iSCSI.
-
-.\"
-.P
-\fB* Pre-Requisites\fR
-
-The environment must have shared storage reachable by all nodes.
-You must dedicate a small partition of each as the SBD device.
-This shared storage segment must not make use of host-based RAID, cLVM2,
-nor DRBD.
-
-The SBD device can be connected via Fibre Channel, Fibre Channel over
-Eterhnet, or even iSCSI. Thus, an iSCSI target can become a sort-of
-network-based quorum server; the advantage is that it does not require
-a smart host at your third location, just block storage.
-
-However, using storage-based RAID and multipathing is recommended for
-increased reliability.
-.\"
-.P
-\fB* SBD Partition\fR
-
-It is recommended to create a tiny partition at the start of the device.
-In the rest of this text, this is referred to as "/dev/<SBD>" or "/dev/<SBD_n>",
-please substitute your actual pathnames
-(f.e. "/dev/disk/by-id/scsi-1494554000000000036363600000000000000000000000000-part1")
-for this below.
-
-The size of the SBD device depends on the block size of the underlying
-device. SBD uses 255 slots. Thus, 1MB is fine on plain SCSI devices and
-SAN storages with 512 byte blocks. On the IBM s390x architecture disks could
-have larger block sizes, as 4096 bytes. Therefor 4MB or more are needed there.
-
-After having made very sure that this is indeed the device you want to
-use, and does not hold any data you need - as the sbd command will
-overwrite it without further requests for confirmation -, initialize the
-sbd device.
-
-If your SBD device resides on a multipath group, you may need to adjust
-the timeouts sbd uses, as MPIO's path down detection can cause some
-latency: after the msgwait timeout, the message is assumed to have been
-delivered to the node. For multipath, this should be the time required
-for MPIO to detect a path failure and switch to the next path. You may
-have to test this in your environment. The node will perform suicide if
-it has not updated the watchdog timer fast enough; the watchdog timeout
-must be shorter than the msgwait timeout - half the value is a good
-estimate. This can be specified when the SBD device is initialized.
-.\"
-.P
-\fB* Testing and Starting the SBD Daemon\fR
-
-The sbd daemon is a critical piece of the cluster stack. It must always
-be running when the cluster stack is up, or even when the rest of it has
-crashed, so that it can be fenced.
-
-The openais init script starts and stops SBD if configured; add the
-following to /etc/sysconfig/sbd:
-
-===
-.br
-#/etc/sysconfig/sbd
-.br
-# SBD devices (no trailing ";"):
-.br
-SBD_DEVICE="/dev/<SBD_1>;/dev/<SBD_2>;/dev/<SBD_3>"
-.br
-# Watchdog support:
-.br
-SBD_OPTS="-W -t300"
-.br
-===
-
-Note: If the SBD device becomes inaccessible from a node, this could
-cause the node to enter an infinite reboot cycle. That is technically
-correct, but depending on your administrative policies, might be
-considered a nuisance. You may wish to not automatically start up
-openais on boot in such cases.
-
-Before proceeding, ensure that SBD has indeed started on all nodes
-through "rcopenais restart".
-Once the resource has started, your cluster is now successfully
-configured for shared-storage fencing, and will utilize this method in
-case a node needs to be fenced.
-
-The command sbd
-can be used to read and write the sbd device, see sbd(8) .
-
-To complete the sbd setup, it is necessary to activate SBD as a
-STONITH/fencing mechanism in the CIB.
-The SBD mechanism is used instead of other fencing/stonith mechanisms;
-please disable any others you might have configured before.
-.\"
-.P
-\fB* Software Watchdog\fR
-
-Increased protection is offered through "watchdog" support. Modern
-systems support a "hardware watchdog" that has to be updated by the
-software client, or else the hardware will enforce a system restart.
-This protects against failures of the sbd process itself, such as
-dieing, or becoming stuck on an IO error.
-
-It is highly recommended that you set up your Linux system
-to use a watchdog. Please refer to the SLES manual for this step.
-
-This involves loading the proper watchdog driver on system boot. On HP
-hardware, this is the "hpwdt" module. For systems with a Intel TCO,
-"iTCO_wdt" can be used. "softdog" is the most generic driver, but it is
-recommended that you use one with actual hardware integration. See
-/lib/modules/.../kernel/drivers/watchdog in the kernel package for a list
-of choices.
-
-No other software must access the watchdog timer. Some hardware vendors
-ship systems management software that use the watchdog for system resets
-(f.e. HP ASR daemon). Such software has to be disabled if the watchdog is
-used by SBD.
-
-SBD can be configured in /etc/sysconfig/sbd to use the systems' watchdog.
-.\"
-.P
-\fB* Timeout Settings\fR
-
-If your SBD device resides on a multipath group, you may need to adjust
-the timeouts sbd uses, as MPIO's path down detection can cause some
-latency: after the msgwait timeout, the message is assumed to have been
-delivered to the node. For multipath, this should be the time required
-for MPIO to detect a path failure and switch to the next path. You may
-have to test this in your environment. The node will perform suicide if
-it has not updated the watchdog timer fast enough; the watchdog timeout
-must be shorter than the msgwait timeout - half the value is a good
-estimate. This can be specified when the SBD device is initialized.
-
-If you want to avoid MD mirror splitting in case of IO errors, the watchdog
-timeout has to be shorter than the total MPIO failure timeout. Thus, a node
-is fenced before the MD mirror is splitted. On the other hand, the time
-the cluster waits for SAN and storage to recover is shortened.
-
-In any case, the watchdog timeout must be shorter than sbd message wait timeout.
-The sbd message wait timeout must be shorter than the cluster stonith-timeout.
-
-If the sbd device recovers from IO errors within the watchdog timeout, the sbd
-daemon could reset the watchdog timer and save the node from being fenced.
-To allow re-discovery of a failed sbd device, at least the primary sbd retry
-cycle should be shorter than the watchdog timeout. Since this cycle is currently
-hardcoded as ten time the loop timeout, it has to be set by choosing an
-apropriate loop timeout.
-
-It might be also wise to set a start delay for the cluster resource agent in
-the CIB. This is done to overcome situations where both nodes fence each other
-within the sbd loop timeout, see sbd(8).
-
-Putting it all together:
-.br
-- How long a cluster survives a storage outage depends on the watchdog
- timeout and the sbd retry cycle. All other timeouts should be aligned with
- this settings. That means they have to be longer.
-.br
-- Storage resources - as Raid1, LVM, Filesystem - have operation timeouts.
- Those should be aligned with the MPIO settings. This avoids non-needed failure
- actions, but does not define how long the cluster will survive a storage
- outage.
-.\"
-.SH FILES
-.TP
-/usr/sbin/sbd
- the daemon (and control command).
-.TP
-/usr/lib64/stonith/plugins/external/sbd
- the STONITH plugin.
-.TP
-/etc/sysconfig/sbd
- the SBD configuration file.
-.TP
-/etc/sysconfig/kernel
- the kernel and initrd configuration file.
-.TP
-/etc/rc.d/rc3.d/K01openais
- stop script to prevent stonith during system shutdown.
-.TP
-/dev/<SBD>
- the SBD block device(s).
-.TP
-/dev/watchdog
- the watchdog device node.
-.TP
-/lib/modules/<kernel-version>/kernel/drivers/watchdog/
- the watchdog modules.
-.\"
-.SH BUGS
-To report bugs for a SUSE or Novell product component, please use
- http://support.novell.com/additional/bugreport.html .
-.\"
-.SH SEE ALSO
-
-\fBsbd\fP(8), \fBadd_watchdog_to_initrd\fP(8), \fBdisable_other_watchdog\fP(8),
-\fBmake_sbd_devices\fP(8), \fBdasdfmt\fP(8),
-http://www.linux-ha.org/wiki/SBD_Fencing ,
-http://www.mail-archive.com/pacemaker@oss.clusterlabs.org/msg03849.html ,
-http://www.novell.com/documentation/sle_ha/book_sleha/?page=/documentation/sle_ha/book_sleha/data/part_config.html ,
-http://www.novell.com/documentation/sle_ha/book_sleha/?page=/documentation/sle_ha/book_sleha/data/part_storage.html
-.\"
-.SH AUTHORS
-The content of this manual page was mostly derived from online documentation
-mentioned above.
-.\"
-.SH COPYRIGHT
-(c) 2009-2011 SUSE Linux GmbH, Germany.
-.br
-sbd comes with ABSOLUTELY NO WARRANTY.
-.br
-For details see the GNU General Public License at
-http://www.gnu.org/licenses/gpl.html
-.\"
diff --git a/man/sbd.8 b/man/sbd.8
deleted file mode 100644
index dc57bce..0000000
--- a/man/sbd.8
+++ /dev/null
@@ -1,338 +0,0 @@
-.TH sbd 8 "16 Jan 2012" "" "cluster-glue"
-.\"
-.SH NAME
-sbd \- Stonith Block Device daemon
-.\"
-.SH SYNOPSIS
-.B sbd
-\fIOPTIONS\fR [\fIOPT_ARGUMENT\fR] [\fICOMMAND\fR] [\fICMD_ARGUMENT\fR]
-
-.\"
-.SH OPTIONS
-.TP
-\fB-d\fR <SBD>
- Block device to use (mandatory),
-if you have more than one device, provide them by specifying this
-option multiple times
-.TP
-\fB-h\fR
- Display this help
-.TP
-\fB-n\fR <node>
- Set local node name; defaults to uname -n (optional)
-.TP
-\fB-R\fR
- Do NOT enable realtime priority (debugging only)
-.TP
-\fB-W\fR
- Use watchdog (recommended) (watch only)
-.TP
-\fB-w\fR <SBD>
- Specify watchdog device (optional) (watch only)
-.TP
-\fB-D\fR
- Run as background daemon (optional) (watch only)
-.TP
-\fB-t\fR <N>
- Set timeout to <N> seconds before automatic recover of failed SBD device
-(optional) (default is 3600, set to 0 to disable) (watch only)
-.TP
-\fB-v\fR
- Enable some verbose debug logging (optional)
-.TP
-\fB-1\fR <N>
- Set watchdog timeout to N seconds (optional) (create only)
-.TP
-\fB-2\fR <N>
- Set slot allocation timeout to N seconds (optional) (create only)
-.TP
-\fB-3\fR <N>
- Set daemon loop timeout to N seconds (optional) (create only)
-.TP
-\fB-4\fR <N>
- Set msgwait timeout to N seconds (optional) (create only)
-.TP
-\fB-5\fR <N>
- Warn if loop latency exceeds threshold (optional) (watch only)
-(default is 3, set to 0 to disable)
-
-.\"
-.SH COMMAND
-.TP
-\fBcreate\fR
- Initialize N slots on devicde <SBD> - OVERWRITES DEVICE!
-.TP
-\fBlist\fR
- List all allocated slots on device, and messages.
-.TP
-\fBdump\fR
- Dump meta-data header from device.
-.TP
-\fBwatch\fR
- Loop forever, monitoring own slot
-.TP
-\fBallocate\fR <node>
- Allocate a slot for node (optional)
-.TP
-\fBmessage\fR <node> (test|reset|off|clear|exit)
- Write the specified message to node's slot.
-
-.\"
-.SH DESCRIPTION
-
-The \fBsbd\fR daemon automatically allocates one of the message slots on the
-assigned disk partition to itself, and constantly monitors it for messages to
-itself. Upon receipt of a message, the daemon immediately complies with the
-request, such as initiating a power-off or reboot cycle for fencing.
-
-The daemon also constantly monitors connectivity to the storage device,
-and commits suicide in case the partition becomes unreachable,
-guaranteeing that it is not disconnected from fencing message.
-
-The daemon is brought online on each node before the rest of the
-cluster stack is started, and terminated only after all other cluster
-components have been shut down - ensuring that cluster resources are
-never activated without SBD supervision.
-
-The environment must have shared storage reachable by all nodes.
-This shared storage segment must not make use of host-based RAID, cLVM2,
-nor DRBD. Please refer to sbd(7) for more information.
-
-The \fBsbd\fR can also be called manually to perform actions described in the
-commands section of this manual page.
-
-In the rest of this text, this is referred to as "/dev/<SBD>" or "/dev/<SBD_n>",
-please substitute your actual pathname
-(f.e. "/dev/disk/by-id/scsi-1494554000000000036363600000000000000000000000000")
-for this below.
-
-If a watchdog is used together with the sbd, the watchdog is activated at initial
-start of the sbd daemon. Afterwards the watchdog timer is reset by the inquisitor
-process after each successful read loop of each watcher process.
-
-STONITH is an acronym for Shoot The Other Node in The Head.
-.\"
-.SH EXAMPLES
-
-
-\fB* Initialising SBD Partition\fR
-
-All these steps must be performed as root.
-
-After having made very sure that this is indeed the device you want to
-use, and does not hold any data you need - as the sbd command will
-overwrite it without further requests for confirmation -, initialize one
-single SBD device:
-
-# \fBsbd -d /dev/<SBD> create\fR
-
-This will write a header to the device, and create slots for up to 255
-nodes sharing this device with default timings.
-
-If your sbd device resides on a multipath group, you may need to adjust
-the timeouts sbd uses, as MPIO's path down detection can cause some
-latency: after the msgwait timeout, the message is assumed to have been
-delivered to the node. For multipath, this should be the time required
-for MPIO to detect a path failure and switch to the next path. You may
-have to test this in your environment.
-
-Initialize three SBD devices in exact the same way, with identical settings:
-
-# \fBsbd -d /dev/<SBD_1> -d /dev/<SBD_2> -d /dev/<SBD_3> create\fR
-
-
-\fB* Setting Watchdog Timeout for SBD Partition\fR
-
-The node will perform suicide if
-it has not updated the watchdog timer fast enough; the watchdog timeout
-must be shorter than the msgwait timeout - half the value is a good
-estimate. This can be specified when the SBD device is initialized:
-
-# \fB/usr/sbin/sbd -d /dev/<SBD> -4 $msgwait -3 $looptimeout -1 $watchdogtimeout create\fR
-
-(All timeouts are in seconds. See also sbd(7) for information on timings.)
-
-If your single sbd device resides on a multipath group, you may need to
-adjust the timeouts sbd uses, as MPIO's path down detection can cause
-delays. (If you have multiple devices, transient timeouts of a single
-device will not negatively affect SBD. However, if they all go through
-the same FC switches, you will still need to do this.)
-
-
-\fB* Dumping Content of SBD Partition\fR
-
-You can look at what was written to the device using:
-
-# \fBsbd -d /dev/<SBD> dump\fR
-.br
-Header version : 2
-.br
-Number of slots : 255
-.br
-Sector size : 512
-.br
-Timeout (watchdog) : 5
-.br
-Timeout (allocate) : 2
-.br
-Timeout (loop) : 1
-.br
-Timeout (msgwait) : 10
-
-As you can see, the timeouts are also stored in the header, to ensure
-that all participating nodes agree on them. The example output above
-shows built-in defaults. Usually the timeouts for watchdog and msgwait
-are adjusted to specific needs, see sbd(7). The timeouts for allocate
-and loop normally should not be changed.
-
-Additionally, it is highly recommended that you set up your Linux system
-to use a watchdog.
-
-
-\fB* Starting the SBD daemon\fR
-
-The sbd daemon is a critical piece of the cluster stack. It must always
-be running when the cluster stack is up, or even when the rest of it has
-crashed, so that it can be fenced.
-
-The openais init script starts and stops SBD if configured; add the
-following to /etc/sysconfig/sbd:
-
-===
-.br
-# The next line points to three devices (no trailing ";"):
-.br
-SBD_DEVICE="/dev/<SBD_1>;/dev/<SBD_2>;/dev/<SBD_3>"
-.br
-# The next line enables watchdog support, re-discover time 210 seconds:
-.br
-SBD_OPTS="-W -t 210"
-.br
-===
-
-Before proceeding, ensure that SBD has indeed started on all nodes through
-
-# \fBrcopenais restart\fR
-
-
-\fB* Listing Content of SBD\fR
-
-The command
-
-# \fBsbd -d /dev/<SBD> list\fR
-
-will dump the node slots, and their current messages, from the sbd
-device. You should see all cluster nodes that have ever been started
-with sbd being listed there; most likely with the message slot showing
-"clear".
-
-
-\fB* Testing SBD\fR
-
-You can now try sending a test message to one of the nodes:
-
-# \fBsbd -d /dev/<SBD> message nodea test\fR
-
-The node will acknowledge the receipt of the message in the system logs:
-.br
-Aug 29 14:10:00 nodea sbd: [13412]: info: Received command test from nodeb
-
-This confirms that SBD is indeed up and running on the node, and that it
-is ready to receive messages.
-
-
-\fB* Recovering from temporary SBD device outage\fR
-
-If you have multiple devices, failure of a single device is not immediately
-fatal.
-SBD will retry ten times in succession to reattach to the device, and then pause
-(as to not flood the system) before retrying. The pause intervall timeout could
- be configured. Thus, SBD should automatically recover from temporary outages.
-
-Should you wish to try reattach to the device right now, you can send a SIGUSR1
-to the SBD parent daemon.
-
-# \fBps aux | grep sbd\fR
-.br
-root 3363 0.0 1.0 44552 5764 ? SL Dec16 0:13 sbd: inquisitor
-.br
-root 3364 0.0 1.0 44568 5712 ? SL Dec16 0:32 sbd: watcher: /dev/disk/by-id/scsi-1494554000000000036363600000000000000000000000000-part1 - slot: 0
-.br
-# \fBkill -SIGUSR1 3363\fR
-.br
-# \fBps aux | grep sbd\fR
-.br
-root 3363 0.0 1.0 44552 5764 ? SL Dec16 0:13 sbd: inquisitor
-.br
-root 3364 0.0 1.0 44568 5712 ? SL Dec16 0:32 sbd: watcher: /dev/disk/by-id/scsi-1494554000000000036363600000000000000000000000000-part1 - slot: 0
-.br
-root 3380 0.0 1.0 44568 5712 ? SL Dec16 0:00 sbd: watcher: /dev/disk/by-id/scsi-1494554000000000038383800000000000000000000000000-part1 - slot: 0
-.\" check the fake
-
-There are two to four sbd processes, depending on the number of sbd devices:
-One master process (inquisitor), and per device one watcher.
-
-\fB* Configuring the Fencing Resource in the Cluster Information Base\fR
-
-To complete the sbd setup, it is necessary to activate sbd as a
-STONITH/fencing mechanism in the CIB as follows:
-
-# \fBcrm
-.br
-configure
-.br
-property stonith-enabled="true"
-.br
-property stonith-timeout="150s"
-.br
-primitive stonith_sbd stonith:external/sbd
-op start interval="0" timeout="15" start-delay="5"
-.br
-commit
-.br
-quit
-\fR
-
-Note that since node slots are allocated automatically, no manual hostlist needs
-to be defined. Also, there is no need to define the SBD devices. On the other hand,
- a start delay is set. This is done to overcome situations where both nodes fence
-each other within the sbd loop timeout.
-
-Once the resource has started, your cluster is now successfully
-configured for shared-storage fencing, and will utilize this method in
-case a node needs to be fenced.
-
-The sbd agent does not need to and should not be cloned. If all of your nodes
-run SBD, as is most likely, not even a monitor action provides a real benefit,
-since the daemon would suicide the node if there was a problem.
-
-SBD also supports turning the reset request into a crash request, which may be
-helpful for debugging if you have kernel crashdumping configured; then, every
-fence request will cause the node to dump core. You can enable this via the
-crashdump="true" setting on the fencing resource. This is not recommended for
-on-going production use, but for debugging phases.
-.\"
-.SH BUGS
-To report bugs for a SUSE or Novell product component, please use
- http://support.novell.com/additional/bugreport.html .
-.\"
-.SH SEE ALSO
-
-\fBsbd\fP(7),
-http://www.linux-ha.org/wiki/SBD_Fencing ,
-http://www.mail-archive.com/pacemaker@oss.clusterlabs.org/msg03849.html ,
-http://www.novell.com/documentation/sle_ha/book_sleha/?page=/documentation/sle_ha/book_sleha/data/part_config.html ,
-http://www.novell.com/documentation/sle_ha/book_sleha/?page=/documentation/sle_ha/book_sleha/data/part_storage.html
-.\"
-.SH AUTHORS
-The content of this manual page was mostly derived from online documentation
-mentioned above and the programm's help option.
-.\"
-.SH COPYRIGHT
-(c) 2009-2011 SUSE Linux GmbH, Germany.
-.br
-sbd comes with ABSOLUTELY NO WARRANTY.
-.br
-For details see the GNU General Public License at
-http://www.gnu.org/licenses/gpl.html
-.\"
diff --git a/man/sbd.8.pod b/man/sbd.8.pod
new file mode 100644
index 0000000..98e7b51
--- /dev/null
+++ b/man/sbd.8.pod
@@ -0,0 +1,558 @@
+=head1 NAME
+
+sbd - STONITH Block Device daemon
+
+=head1 SYNOPSIS
+
+sbd <-d F</dev/...>> [options] C<command>
+
+=head1 SUMMARY
+
+SBD provides a node fencing mechanism (Shoot the other node in the head,
+STONITH) for Pacemaker-based clusters through the exchange of messages
+via shared block storage, such as a SAN, iSCSI, or FCoE device. This
+isolates the fencing mechanism from changes in firmware version or
+dependencies on specific firmware controllers, and it can be used as a
+STONITH mechanism in all configurations that have reliable shared
+storage.
+
+The F<sbd> binary implements both the daemon that watches the message
+slots as well as the management tool for interacting with the block
+storage device(s). This mode of operation is specified via the
+C<command> parameter; some of these modes take additional parameters.
+
+To use, you must first C<create> the messaging layout on one to three
+block devices. Second, configure F</etc/sysconfig/sbd> to list those
+devices (and possibly adjust other options), and restart the cluster
+stack on each node to ensure that C<sbd> is started. Third, configure
+the C<external/sbd> fencing resource in the Pacemaker CIB.
+
+Each of these steps is documented in more detail below the description
+of the command options.
+
+C<sbd> can only be used as root.
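+
+As a rough end-to-end sketch (the device name, resource name, and the
+SUSE-style C<rcopenais> call are examples only; each step is described in
+detail below):
+
+    # 1. Initialize the messaging layout on the shared device(s)
+    sbd -d /dev/sda1 create
+
+    # 2. List the device(s) in /etc/sysconfig/sbd, then restart the
+    #    cluster stack so that sbd is started alongside it
+    rcopenais restart
+
+    # 3. Configure the external/sbd fencing resource in the Pacemaker CIB
+    crm configure primitive fencing-sbd stonith:external/sbd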
+
+=head2 GENERAL OPTIONS
+
+=over
+
+=item B<-d> F</dev/...>
+
+Specify the block device(s) to be used. If you have more than one,
+specify this option up to three times. This parameter is mandatory for
+all modes, since SBD always needs a block device to interact with.
+
+This man page uses F</dev/sda1>, F</dev/sdb1>, and F</dev/sdc1> as
+example device names for brevity. However, in your production
+environment, you should instead always refer to them by using the long,
+stable device name (e.g.,
+F</dev/disk/by-id/dm-uuid-part1-mpath-3600508b400105b5a0001500000250000>).
+
+=item B<-v>
+
+Enable some verbose debug logging.
+
+=item B<-h>
+
+Display a concise summary of C<sbd> options.
+
+=item B<-c> I<node>
+
+Set local node name; defaults to C<uname -n>. This should not need to be
+set.
+
+=item B<-R>
+
+Do B<not> enable realtime priority. By default, C<sbd> runs at realtime
+priority, locks itself into memory, and also acquires highest IO
+priority to protect itself against interference from other processes on
+the system. This is a debugging-only option.
+
+=item B<-I> I<N>
+
+Async IO timeout (defaults to 3 seconds, optional). You should not need
+to adjust this unless your IO setup is really very slow.
+
+(In daemon mode, the watchdog is refreshed when the majority of devices
+could be read within this time.)
+
+=back
+
+=head2 create
+
+Example usage:
+
+ sbd -d /dev/sdc2 -d /dev/sdd3 create
+
+If you specify the I<create> command, sbd will write a metadata header
+to the device(s) specified and also initialize the messaging slots for
+up to 255 nodes.
+
+B<Warning>: This command will not prompt for confirmation. Roughly the
+first megabyte of the specified block device(s) will be overwritten
+immediately and without backup.
+
+This command accepts a few options to adjust the default timings that
+are written to the metadata (to ensure they are identical across all
+nodes accessing the device).
+
+=over
+
+=item B<-1> I<N>
+
+Set watchdog timeout to N seconds. This depends mostly on your storage
+latency; the majority of devices must be successfully read within this
+time, or else the node will self-fence.
+
+If your sbd device(s) reside on a multipath setup or iSCSI, this should
+be the time required to detect a path failure. You may be able to reduce
+this if your device outages are independent, or if you are using the
+Pacemaker integration.
+
+=item B<-2> I<N>
+
+Set slot allocation timeout to N seconds. You should not need to tune
+this.
+
+=item B<-3> I<N>
+
+Set daemon loop timeout to N seconds. You should not need to tune this.
+
+=item B<-4> I<N>
+
+Set I<msgwait> timeout to N seconds. This should be twice the I<watchdog>
+timeout. This is the time after which a message written to a node's slot
+will be considered delivered. (Or long enough for the node to detect
+that it needed to self-fence.)
+
+This also affects the I<stonith-timeout> in Pacemaker's CIB; see below.
+
+=back
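+
+Combining the options above, a device could for example be initialized
+with a 20 second watchdog timeout and an I<msgwait> of twice that value
+(the numbers are purely illustrative):
+
+    # -1: watchdog timeout 20s, -4: msgwait 40s (twice the watchdog timeout)
+    sbd -d /dev/sda1 -1 20 -4 40 create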
+
+=head2 list
+
+Example usage:
+
+ # sbd -d /dev/sda1 list
+ 0 hex-0 clear
+ 1 hex-7 clear
+ 2 hex-9 clear
+
+List all allocated slots on device, and messages. You should see all
+cluster nodes that have ever been started against this device. Nodes
+that are currently running should have a I<clear> state; nodes that have
+been fenced, but not yet restarted, will show the appropriate fencing
+message.
+
+=head2 dump
+
+Example usage:
+
+ # sbd -d /dev/sda1 dump
+ ==Dumping header on disk /dev/sda1
+ Header version : 2
+ Number of slots : 255
+ Sector size : 512
+ Timeout (watchdog) : 15
+ Timeout (allocate) : 2
+ Timeout (loop) : 1
+ Timeout (msgwait) : 30
+ ==Header on disk /dev/sda1 is dumped
+
+Dump meta-data header from device.
+
+=head2 watch
+
+Example usage:
+
+ sbd -d /dev/sdc2 -d /dev/sdd3 -W -P watch
+
+This command will make C<sbd> start in daemon mode. It will constantly monitor
+the message slot of the local node for incoming messages, reachability, and
+optionally take Pacemaker's state into account.
+
+The options for this mode are rarely specified directly on the command
+line, but are most frequently set via F</etc/sysconfig/sbd>. The
+C<openais> or C<corosync> system start-up scripts take care of starting
+or stopping C<sbd> as required before starting the rest of the cluster
+stack. Thus, the daemon is brought online on each node before the rest of the
+cluster stack is started, and terminated only after all other cluster
+components have been shut down - ensuring that cluster resources are
+never activated without SBD supervision.
+
+It also constantly monitors connectivity to the storage device, and
+self-fences in case the partition becomes unreachable, guaranteeing that it
+does not disconnect from fencing messages.
+
+A node slot is automatically allocated on the device(s) the first time
+the daemon starts watching the device; hence, manual allocation is not
+usually required.
+
+If a watchdog is used together with C<sbd>, as is strongly
+recommended, the watchdog is activated at initial start of the sbd
+daemon. The watchdog is refreshed every time the majority of SBD devices
+has been successfully read. Using a watchdog provides additional
+protection against C<sbd> crashing.
+
+If the Pacemaker integration is activated, C<sbd> will B<not> self-fence
+when device majority is lost, provided that:
+
+=over
+
+=item 1.
+
+The partition the node is in is still quorate according to the CIB;
+
+=item 2.
+
+it is still quorate according to Corosync's node count;
+
+=item 3.
+
+the node itself is considered online and healthy by Pacemaker.
+
+=back
+
+This allows C<sbd> to survive temporary outages of the majority of
+devices. However, while the cluster is in such a degraded state, it can
+neither successfully fence nor be shut down cleanly (as taking the
+cluster below the quorum threshold will immediately cause all remaining
+nodes to self-fence). In short, it will not tolerate any further faults.
+Please repair the system before continuing.
+
+There is one C<sbd> process that acts as a master to which all watchers
+report; one per device to monitor the node's slot; and, optionally, one
+that handles the Pacemaker integration.
+
+=over
+
+=item B<-W>
+
+Enable use of the system watchdog. This is I<highly> recommended.
+
+=item B<-w> F</dev/watchdog>
+
+This can be used to override the default watchdog device used and should not
+usually be necessary.
+
+=item B<-F> I<N>
+
+Number of failures after which a failing servant process is no longer
+restarted immediately, but only once the dampening delay (B<-t>) has
+expired. If set to zero, servants are restarted immediately and
+indefinitely. If set to one, a failed servant is restarted once every
+B<-t> seconds. For higher values, the servant is restarted that many times
+within the dampening period before the delay applies.
+
+Defaults to I<1>.
+
+=item B<-t> I<N>
+
+Dampening delay before faulty servants are restarted. Combined with C<-F 1>,
+this is the most straightforward way to tune the restart frequency of servant
+processes (see the example after this option list). Default is 5 seconds.
+
+If set to zero, processes will be restarted indefinitely and immediately.
+
+=item B<-P>
+
+Check Pacemaker quorum and node health.
+
+=item B<-Z>
+
+Enable trace mode. B<Warning: this is unsafe for production, use at your
+own risk!> Specifying this once will turn all reboots or power-offs, be
+they caused by self-fence decisions or messages, into a crashdump.
+Specifying this twice will just log them but not continue running.
+
+=item B<-T>
+
+By default, the daemon will set the watchdog timeout as specified in the
+device metadata. However, this does not work for every watchdog device.
+In this case, you must manually ensure that the watchdog timeout used by
+the system correctly matches the SBD settings, and then specify this
+option to allow C<sbd> to continue with start-up.
+
+=item B<-5> I<N>
+
+Warn if the time interval for tickling the watchdog exceeds this many seconds.
+Since the node is unable to log the watchdog expiry (it reboots immediately
+without a chance to write its logs to disk), this is very useful for getting
+an indication that the watchdog timeout is too short for the IO load of the
+system.
+
+Default is 3 seconds, set to zero to disable.
+
+=item B<-C> I<N>
+
+Watchdog timeout to set before crashdumping. If SBD is set to crashdump
+instead of reboot (either via the trace mode settings or the I<external/sbd>
+fencing agent's parameter), SBD will adjust the watchdog timeout to this
+setting before triggering the dump. Otherwise, the watchdog might trigger and
+prevent a successful crashdump from ever being written.
+
+Defaults to 240 seconds. Set to zero to disable.
+
+=back
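+
+As an illustrative combination of the options above, the following would
+start the daemon with watchdog and Pacemaker integration enabled and
+restart a failed servant at most once per minute; in practice these
+options usually go into C<SBD_OPTS> in F</etc/sysconfig/sbd> rather than
+onto the command line:
+
+    # -F 1 -t 60: a failed servant is restarted at most once every 60 seconds
+    sbd -d /dev/sda1 -d /dev/sdb1 -d /dev/sdc1 -W -P -F 1 -t 60 watch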
+
+=head2 allocate
+
+Example usage:
+
+ sbd -d /dev/sda1 allocate node1
+
+Explicitly allocates a slot for the specified node name. This should
+rarely be necessary, as every node will automatically allocate itself a
+slot the first time it starts up in watch mode.
+
+=head2 message
+
+Example usage:
+
+ sbd -d /dev/sda1 message node1 test
+
+Writes the specified message to the node's slot. This is rarely done
+directly, but rather abstracted via the C<external/sbd> fencing agent
+configured as a cluster resource.
+
+Supported message types are:
+
+=over
+
+=item test
+
+This only generates a log message on the receiving node and can be used
+to check if SBD is seeing the device. Note that this could overwrite a
+fencing request sent by the cluster, so it should not be used in
+production.
+
+=item reset
+
+Reset the target upon receipt of this message.
+
+=item off
+
+Power-off the target.
+
+=item crashdump
+
+Cause the target node to crashdump.
+
+=item exit
+
+This will make the C<sbd> daemon exit cleanly on the target. You should
+B<not> send this message manually; this is handled properly during
+shutdown of the cluster stack. Manually stopping the daemon means the
+node is unprotected!
+
+=item clear
+
+This message indicates that no real message has been sent to the node.
+You should not set this manually; C<sbd> will clear the message slot
+automatically during start-up, and setting this manually could overwrite
+a fencing message by the cluster.
+
+=back
+
+=head1 Base system configuration
+
+=head2 Configure a watchdog
+
+It is highly recommended that you configure your Linux system to load a
+watchdog driver with hardware assistance (as is available on most modern
+systems), such as I<hpwdt>, I<iTCO_wdt>, or others. As a fall-back, you
+can use the I<softdog> module.
+
+No other software may access the watchdog timer; it can only be
+accessed by one process at any given time. Some hardware vendors ship
+systems management software that uses the watchdog for system resets
+(for example, the HP ASR daemon). Such software has to be disabled if the
+watchdog is to be used by SBD.
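+
+A minimal sketch for a first test using the generic I<softdog> driver (a
+driver with hardware assistance is preferable for production use):
+
+    # load the generic software watchdog driver
+    modprobe softdog
+    # the watchdog device node should now be present
+    ls -l /dev/watchdog
+    # arrange for the chosen driver to be loaded on every boot via your
+    # distribution's usual mechanism before relying on it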
+
+=head2 Choosing and initializing the block device(s)
+
+First, you have to decide if you want to use one, two, or three devices.
+
+If you are using multiple devices, they should reside on independent
+storage setups. Putting all three of them on the same logical unit, for
+example, would not provide any additional redundancy.
+
+The SBD device can be connected via Fibre Channel, Fibre Channel over
+Ethernet, or even iSCSI. Thus, an iSCSI target can become a sort-of
+network-based quorum server; the advantage is that it does not require
+a smart host at your third location, just block storage.
+
+The SBD partitions themselves B<must not> be mirrored (via MD,
+DRBD, or the storage layer itself), since this could result in a
+split-mirror scenario. Nor can they reside on cLVM2 volume groups, since
+they must be accessed by the cluster stack before it has started the
+cLVM2 daemons; hence, these should be either raw partitions or logical
+units on (multipath) storage.
+
+The block device(s) must be accessible from all nodes. (While it is not
+necessary that they share the same path name on all nodes, this is
+considered a very good idea.)
+
+SBD will only use about one megabyte per device, so you can easily
+create a small partition, or very small logical units. (The size of the
+SBD device depends on the block size of the underlying device. Thus, 1MB
+is fine on plain SCSI devices and SAN storage with 512 byte blocks. On
+the IBM s390x architecture in particular, disks default to 4k blocks,
+and thus require roughly 4MB.)
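+
+To decide between roughly 1MB and 4MB, you can query the logical block
+size of a candidate device (the device name is an example):
+
+    # prints 512 on most SCSI/SAN devices, 4096 on 4k-block disks
+    blockdev --getss /dev/sda1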
+
+The number of devices will affect the operation of SBD as follows:
+
+=over
+
+=item One device
+
+In its most simple implementation, you use one device only. This is
+appropriate for clusters where all your data is on the same shared
+storage (with internal redundancy) anyway; the SBD device does not
+introduce an additional single point of failure then.
+
+If the SBD device is not accessible, the daemon will fail to start and
+inhibit openais startup.
+
+=item Two devices
+
+This configuration is a trade-off, primarily aimed at environments where
+host-based mirroring is used, but no third storage device is available.
+
+SBD will not commit suicide if it loses access to one mirror leg; this
+allows the cluster to continue to function even in the face of one outage.
+
+However, SBD will not fence the other side while only one mirror leg is
+available, since it does not have enough knowledge to detect an asymmetric
+split of the storage. So it will not be able to automatically tolerate a
+second failure while one of the storage arrays is down. (Though you
+can use the appropriate crm command to acknowledge the fence manually.)
+
+It will not start unless both devices are accessible on boot.
+
+=item Three devices
+
+In this most reliable and recommended configuration, SBD will only
+self-fence if more than one device is lost; hence, this configuration is
+resilient against temporary single device outages (be it due to failures
+or maintenance). Fencing messages can still be successfully relayed if
+at least two devices remain accessible.
+
+This configuration is appropriate for more complex scenarios where
+storage is not confined to a single array. For example, host-based
+mirroring solutions could have one SBD per mirror leg (not mirrored
+itself), and an additional tie-breaker on iSCSI.
+
+It will only start if at least two devices are accessible on boot.
+
+=back
+
+After you have chosen the devices and created the appropriate partitions
+and perhaps multipath alias names to ease management, use the C<sbd create>
+command described above to initialize the SBD metadata on them.
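+
+For example, to initialize all three example devices in one step with
+identical metadata:
+
+    sbd -d /dev/sda1 -d /dev/sdb1 -d /dev/sdc1 create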
+
+=head3 Sharing the block device(s) between multiple clusters
+
+It is possible to share the block devices between multiple clusters,
+provided the total number of nodes accessing them does not exceed I<255>,
+and they all share the same SBD timeouts (since these are part of the
+metadata).
+
+If you are using multiple devices, this can reduce the setup overhead
+required. However, you should B<not> share devices between clusters in
+different security domains.
+
+=head2 Configure SBD to start on boot
+
+If configured via F</etc/sysconfig/sbd>, the cluster stack's init script
+will automatically start and stop C<sbd> as required. In this file, you
+must specify the device(s) used, as well as any options to pass to the
+daemon:
+
+ SBD_DEVICE="/dev/sda1;/dev/sdb1;/dev/sdc1"
+ SBD_OPTS="-W -P"
+
+After a restart of the cluster stack on this node, you can now try
+sending a test message to it as root, from this or any other node:
+
+ sbd -d /dev/sda1 message node1 test
+
+The node will acknowledge the receipt of the message in the system logs:
+
+ Aug 29 14:10:00 node1 sbd: [13412]: info: Received command test from node2
+
+This confirms that SBD is indeed up and running on the node, and that it
+is ready to receive messages.
+
+Make sure that F</etc/sysconfig/sbd> is identical on all cluster nodes,
+and that all cluster nodes are running the daemon!
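+
+A quick way to verify that the daemon is running on a node is to look at
+its processes; there should be one I<inquisitor> and one I<watcher> per
+configured device (the output below is shortened and illustrative):
+
+    # the [s] pattern keeps grep from matching itself
+    ps aux | grep "[s]bd:"
+    # root ... sbd: inquisitor
+    # root ... sbd: watcher: /dev/sda1 - slot: 0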
+
+=head1 Pacemaker CIB integration
+
+=head2 Fencing resource
+
+Pacemaker can only interact with SBD to issue a node fence if there is a
+configured fencing resource. This should be a primitive, not a clone, as
+follows:
+
+ primitive fencing-sbd external/sbd \
+ op start start-delay="15"
+
+This will automatically use the same devices as configured in
+F</etc/sysconfig/sbd>.
+
+While you should not configure this as a clone (Pacemaker will start
+a fencing agent in each partition automatically), the I<start-delay>
+setting ensures that, if a split brain does occur in a two-node cluster,
+the node that still needs to instantiate a fencing agent is slightly
+disadvantaged, avoiding fencing loops.
+
+SBD also supports turning the reset request into a crash request, which
+may be helpful for debugging if you have kernel crashdumping configured;
+then, every fence request will cause the node to dump core. You can
+enable this via the C<crashdump="true"> parameter on the fencing
+resource. This is B<not> recommended for production use, but only for
+debugging phases.
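+
+As a sketch, the resource definition with crashdumping enabled could look
+like this in the C<crm> shell (adjust names and delays to your setup):
+
+    primitive fencing-sbd stonith:external/sbd \
+        params crashdump="true" \
+        op start start-delay="15"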
+
+
+=head2 General cluster properties
+
+You must also enable STONITH in general, and set the STONITH timeout to
+be at least twice the I<msgwait> timeout you have configured, to allow
+enough time for the fencing message to be delivered. If your I<msgwait>
+timeout is 60 seconds, this is a possible configuration:
+
+ property stonith-enabled="true"
+ property stonith-timeout="120s"
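+
+Putting the fencing resource and these properties together, an interactive
+C<crm> session could look roughly as follows (the timeout values are
+examples and must be derived from your own I<msgwait> setting):
+
+    crm configure
+    property stonith-enabled="true"
+    property stonith-timeout="120s"
+    primitive fencing-sbd stonith:external/sbd \
+        op start start-delay="15"
+    commit
+    quit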
+
+=head1 Management tasks
+
+=head2 Recovering from temporary SBD device outage
+
+If you have multiple devices, failure of a single device is not immediately
+fatal. By default, C<sbd> will retry restarting the monitor for the device
+every 5 seconds. However, you can tune this via the options to the
+I<watch> command.
+
+If you wish to immediately force a restart of all currently disabled
+monitor processes, you can send a I<SIGUSR1> signal to the SBD
+I<inquisitor> process.
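+
+For example (the process ID shown is of course system-specific):
+
+    # identify the parent (inquisitor) process ...
+    ps aux | grep "[s]bd: inquisitor"
+    # ... and ask it to retry all disabled device monitors right away
+    kill -USR1 <pid-of-inquisitor>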
+
+
+=head1 LICENSE
+
+Copyright (C) 2008-2012 Lars Marowsky-Bree
+
+This program is free software; you can redistribute it and/or
+modify it under the terms of the GNU General Public
+License as published by the Free Software Foundation; either
+version 2.1 of the License, or (at your option) any later version.
+
+This software is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+General Public License for more details.
+
+For details see the GNU General Public License at
+http://www.gnu.org/licenses/gpl.html
+