Page MenuHomeClusterLabs Projects

boothd.8.txt
No OneTemporary

boothd.8.txt

BOOTHD(8)
===========
:doctype: manpage
NAME
----
boothd - The Booth Cluster Ticket Manager.
SYNOPSIS
--------
*boothd* 'daemon' [-SD] [-c 'config'] [-l 'lockfile']
*booth* 'list' [-s 'site'] [-c 'config']
*booth* 'grant' [-s 'site'] [-c 'config'] [-Fw] 'ticket'
*booth* 'revoke' [-s 'site'] [-c 'config'] [-w] 'ticket'
*booth* 'status' [-D] [-c 'config']
DESCRIPTION
-----------
Booth manages tickets which authorizes one of the cluster sites
located in geographically dispersed distances to run certain
resources. It is designed to be extend Pacemaker to support
geographically distributed clustering.
It is based on the RAFT protocol, see eg.
<https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf>
for details.
SHORT EXAMPLES
--------------
---------------------
# boothd daemon -D
# booth list
# booth grant ticket-nfs
# booth revoke ticket-nfs
---------------------
OPTIONS
-------
*-c* 'configfile'::
Configuration to use.
+
Can be a full path to a configuration file, or a short name; in the latter
case, the directory '/etc/booth' and suffix '.conf' are added.
Per default 'booth' is used, which results in the path '/etc/booth/booth.conf'.
+
The configuration name also determines the name of the PID file - for the defaults,
'/var/run/booth/booth.pid'.
*-s*::
Site address or name.
*-F*::
'immediate grant': Don't wait for unreachable sites to
relinquish the ticket. See the 'Booth ticket management'
section below for more details. Use with caution!
*-w*::
'wait indefinitely': The client will wait forever for the
server result for grant and revoke requests.
*-h*, *--help*::
Give a short usage output.
*--version*::
Report version information.
*-S*::
'systemd' mode: don't fork. This is like '-D' but without the debug output.
*-D*::
Debug output/don't daemonize.
Increases the debug output level; booth daemon remains
in the foreground.
*-l* 'lockfile'::
Use another lock file. By default, the lock file name is
inferred from the configuration file name. Normally not
needed.
COMMANDS
--------
Whether the binary is called as 'boothd' or 'booth' doesn't matter; the first
argument determines the mode of operation.
*'daemon'*::
Tells 'boothd' to serve a site. The locally configured interfaces are
searched for an IP address that is defined in the configuration.
booth then runs in either /arbitrator/ or /site/ mode.
*'client'*::
Booth clients can list the ticket information (see also 'crm_ticket -L'),
and revoke or grant tickets to a site.
+
The grant and, under certain circumstances, revoke operations may
take a while to return a definite operation's outcome. The client
will wait up to the network timeout value (by default 5 seconds)
for the result. Unless the '-w' option was set, in which case the
client waits indefinitely.
+
In this mode the configuration file is searched for an IP address that is
locally reachable, ie. matches a configured subnet.
This allows to run the client commands on another node in the same cluster, as
long as the config file and the service IP is locally reachable.
+
For instance, if the booth service IP is 192.168.55.200, and the
local node has 192.168.55.15 configured on one of its network
interfaces, it knows which site it belongs to.
+
Use '-s' to direct client to connect to a different site.
*'status'*::
'boothd' looks for the (locked) PID file and the UDP socket, prints
some output to stdout (for use in shell scripts) and returns
an OCF-compatible return code.
With '-D', a human-readable message is printed to STDERR as well.
CONFIGURATION FILE
------------------
The configuration file must be identical on all sites and
arbitrators.
A minimal file may look like this:
-----------------------
site="192.168.201.100"
site="192.168.202.100"
arbitrator="192.168.203.100"
ticket="ticket-db8"
-----------------------
Comments start with a hash-sign (''#''). Whitespace at the start
and end of the line, and around the ''='', are ignored.
The following key/value pairs are defined:
*'port'*::
The UDP/TCP port to use. Default is '9929'.
*'transport'*::
The transport protocol to use for Raft exchanges.
Currently only UDP is supported.
+
Clients use TCP to communicate with a daemon; Booth
will always bind and listen to both UDP and TCP ports.
*'site'*::
Defines a site Raft member with the given IP. Sites can
acquire tickets. The sites' IP should be managed by the cluster.
*'arbitrator'*::
Defines an arbitrator Raft member with the given IP.
Arbitrators help reach consensus in elections and cannot hold
tickets.
Booth needs at least three members for normal operation. Odd
number of members provides more redundancy.
*'site-user'*, *'site-group'*, *'arbitrator-user'*, *'arbitrator-group'*::
These define the credentials 'boothd' will be running with.
+
On a (Pacemaker) site the booth process will have to call 'crm_ticket', so the
default is to use 'hacluster':'haclient'; for an arbitrator this user and group
might not exists, so there we default to 'nobody':'nobody'.
*'ticket'*::
Registers a ticket. Multiple tickets can be handled by single
Booth instance.
+
Use the special ticket name '__defaults__' to modify the
defaults. The '__defaults__' stanza must precede all the other
ticket specifications.
All times are in seconds.
*'expire'*::
The lease time for a ticket. After that time the ticket can be
acquired by another site if the ticket holder is not
reachable.
+
The default is '600'.
*'acquire-after'*::
Once a ticket is lost, wait this time in addition before
acquiring the ticket.
+
This is to allow for the site that lost the ticket to relinquish
the resources, by either stopping them or fencing a node.
+
A typical delay might be 60 seconds, but ultimately it depends on
the protected resources and the fencing configuration.
+
The default is '0'.
*'renewal-freq'*::
Set the ticket renewal frequency period.
+
If the network reliability is often reduced over prolonged
periods, it is advisable to try to renew more often.
+
Before every renewal the 'before-acquire-handler' is run. This
parameter then doubles as a local cluster monitor interval.
*'timeout'*::
After that time 'booth' will re-send packets if there was an
insufficient number of replies. This should be long enough to
allow packets to reach other members.
+
The default is '5'.
*'retries'*::
Defines how many times to retry sending packets before giving
up waiting for acks from other members.
+
Default is '10'. Values lower than 3 are illegal.
+
Ticket renewals should allow for this number of retries. Hence,
the total retry time must be shorter than the renewal time
(either half the expire time or *'renewal-freq'*):
timeout*(retries+1) < renewal
*'weights'*::
A comma-separated list of integers that define the weight of individual
Raft members, in the same order as the 'site' and 'arbitrator' lines.
+
Default is '0' for all; this means that the order in the configuration
file defines priority for conflicting requests.
*'before-acquire-handler'*::
If set, this command will be called before 'boothd' tries to
acquire or renew a ticket. On exit code other than 0,
'boothd' relinquishes the ticket.
+
Thus it is possible to ensure whether the services and its
dependencies protected by the ticket are in good shape at this
site. For instance, if a service in the dependency-chain has a
failcount of 'INFINITY' on all available nodes, the service will
be unable to run. In that case, it is of no use to claim the
ticket.
+
'boothd' waits synchronously for the result of the handler, so make
sure that the program returns quickly.
+
See below for details about booth specific environment variables.
The distributed 'service-runnable' script is an example which may
be used to test whether a pacemaker resource can be started.
One example of a booth configuration file:
-----------------------
transport = udp
port = 9930
# D-85774
site="192.168.201.100"
# D-90409
site="::ffff:192.168.202.100"
# A-1120
arbitrator="192.168.203.100"
ticket="ticket-db8"
expire = 600
acquire-after = 60
timeout = 10
retries = 5
renewal-freq = 60
before-acquire-handler = /usr/share/booth/service-runnable db8
-----------------------
BOOTH TICKET MANAGEMENT
-----------------------
The booth cluster guarantees that every ticket is owned by only
one site at the time.
Tickets must be initially granted with the 'booth client grant'
command. Once it gets granted, the ticket is managed by the booth
cluster. Hence, only granted tickets are managed by 'booth'.
If the ticket gets lost, i.e. that the other members of the booth
cluster do not hear from the ticket owner in a sufficiently long
time, one of the remaining sites will acquire the ticket. This is
what is called _ticket failover_.
If the remaining members cannot form a majority, then the ticket
cannot fail over.
A ticket may be revoked at any time with the 'booth client
revoke' command. For revoke to succeed, the site holding the
ticket must be reachable.
Once the ticket is administratively revoked, it is not managed by
the booth cluster anymore. For the booth cluster to start
managing the ticket again, it must be again granted to a site.
The grant operation, in case not all sites are reachable, may get
delayed for the ticket expire time (and, if defined, the
'acquire-after' time). The reason is that the other booth members
may not know if the ticket is currently granted at the
unreachable site.
This delay may be disabled with the '-F' option. In that
case, it is up to the administrator to make sure that the
unreachable site is not holding the ticket.
When the ticket is managed by 'booth', it is dangerous to modify
it manually using either `crm_ticket` command or `crm site
ticket`. Neither of these tools is aware of 'booth' and,
consequently, 'booth' itself may not be aware of any ticket
status changes. A notable exception is setting the ticket to
standby which is typically done before a planned failover.
NOTES
-----
Tickets are not meant to be moved around quickly, the default
'expire' time is 600 seconds (10 minutes).
'booth' works with both IPv4 and IPv6 addresses.
'booth' renews a ticket before it expires, to account for
possible transmission delays. The renewal time, unless explicitly
set, is set to half the 'expire' time.
HANDLERS
--------
Currently, there's only one external handler defined (see the
'before-acquire-handler' configuration item above).
The following environment variables are exported to the handler:
*'BOOTH_TICKET'::
The ticket name, as given in the configuration file. (See 'ticket' item above.)
*'BOOTH_LOCAL'::
The local site name, as defined in 'site'.
*'BOOTH_CONF_PATH'::
The path to the active configuration file.
*'BOOTH_CONF_NAME'::
The configuration name, as used by the '-c' commandline argument.
*'BOOTH_TICKET_EXPIRES'::
When the ticket expires (in seconds since 1.1.1970), or '0'.
The handler is invoked with positional arguments specified after
it.
FILES
-----
*'/etc/booth/booth.conf'*::
The default configuration file name. See also the '-c' argument.
*'/var/run/booth/'*::
Directory that holds PID/lock files. See also the 'status' command.
RAFT IMPLEMENTATION
-------------------
In essence, every ticket corresponds to a separate Raft cluster.
A ticket is granted to the Raft _Leader_ which then owns (or
keeps) the ticket.
ARBITRATOR MANAGEMENT
---------------------
The booth daemon for an arbitrator which typically doesn't run
the cluster stack, may be started through systemd or with
'/etc/init.d/booth-arbitrator', depending on which init system
the platform supports.
The SysV init script starts a booth arbitrator for every
configuration file found in '/etc/booth'.
Platforms running systemd can enable and start every
configuration separately using 'systemctl':
-----------
# systemctl enable booth@<configurationname>
# systemctl start booth@<configurationname>
-----------
'systemctl' requires the configuration name, even for the default
name 'booth'.
EXIT STATUS
-----------
*0*::
Success. For the 'status' command: Daemon running.
*1* (PCMK_OCF_UNKNOWN_ERROR)::
General error code.
*7* (PCMK_OCF_NOT_RUNNING)::
No daemon process for that configuration active.
BUGS
----
Probably.
Please report them on GitHub: <https://github.com/ClusterLabs/booth/issues>
AUTHOR
------
'boothd' was originally written (mostly) by Jiaju Zhang.
Many people have contributed to it.
In 2013 Philipp Marek took over maintainership, followed by Dejan
Muhamedagic.
RESOURCES
---------
GitHub: <https://github.com/ClusterLabs/booth>
Documentation: <http://doc.opensuse.org/products/draft/SLE-HA/SLE-ha-guide_sd_draft/cha.ha.geo.html>
COPYING
-------
Copyright (C) 2011 Jiaju Zhang <jjzhang@suse.de>
Copyright (C) 2013-2014 Philipp Marek <philipp.marek@linbit.com>
Copyright (C) 2014 Dejan Muhamedagic <dmuhamedagic@suse.com>
Free use of this software is
granted under the terms of the GNU General Public License (GPL).
// vim: set ft=asciidoc :

File Metadata

Mime Type
text/plain
Expires
Sat, Jan 25, 7:05 AM (1 d, 15 h)
Storage Engine
blob
Storage Format
Raw Data
Storage Handle
1321606
Default Alt Text
boothd.8.txt (12 KB)

Event Timeline