diff --git a/docs/boothd.8.txt b/docs/boothd.8.txt index f396dce..fe34fb5 100644 --- a/docs/boothd.8.txt +++ b/docs/boothd.8.txt @@ -1,377 +1,377 @@ BOOTHD(8) =========== :doctype: manpage NAME ---- boothd - The Booth Cluster Ticket Manager. SYNOPSIS -------- *boothd* 'daemon' ['-D'] [-c 'config'] *booth* ['client'] {'list'} [-s 'site'] ['-D'] [-c 'config'] *booth* ['client'] {'grant'|'revoke'} [-s 'site'] ['-D'] [-t] 'ticket' [-c 'config'] *booth* 'status' ['-D'] [-c 'config'] DESCRIPTION ----------- Booth manages tickets which authorize one of the geographically dispersed cluster sites to run certain resources. It is designed to be an add-on to Pacemaker, which extends Pacemaker to support geographically distributed clustering. It is based on the Raft protocol, see eg. for details. SHORT EXAMPLES -------------- --------------------- # boothd daemon # boothd client list # boothd client grant -t ticket-nfs # boothd client revoke -t ticket-nfs --------------------- OPTIONS ------- *-c*:: Configuration to use. + Can be a full path to a configuration file, or a short name; in the latter case, the directory '/etc/booth' and suffix '.conf' are added. By default 'booth' is used, which results in the path '/etc/booth/booth.conf'. + The configuration name also determines the name of the PID file - for the defaults, '/var/run/booth/booth.pid'. *-D*:: Debug output/don't daemonize. Increases the debug output level; for 'boothd daemon', keeps the process in the foreground. *-h*, *--help*:: Give a short usage output. *-s*:: Site address. *-t*:: Ticket name. *-v*, *--version*:: Report version information. *-S*:: 'systemd' mode: don't fork. This is like '-D' but without the debug output. COMMANDS -------- Whether the binary is called as 'boothd' or 'booth' doesn't matter; the first argument determines the mode of operation. *'daemon'*:: Tells 'boothd' to serve a site.
The locally configured interfaces are searched for an IP address that is defined in the configuration, to determine whether Booth operates in /arbitrator/ or /site/ mode. *'client'*:: Lists the ticket information (see also 'crm_ticket -L'), and revokes or (initially) grants tickets to a site. + In this mode the configuration file is searched for an IP address that is locally reachable, ie. matches a configured subnet. This allows running the client commands on another node in the same cluster, as long as the config file and the service IP are locally reachable. + Example: If the booth service IP is 192.168.55.200, and the local node has 192.168.55.15 configured on an interface, it knows which site it belongs to. + The client can also ask another site; use '-s' to specify where to connect. *'status'*:: 'boothd' looks for the (locked) PID file and the UDP socket, prints some output to stdout (for use in shell scripts) and returns an OCF-compatible return code. With '-D', a human-readable message is printed to STDERR as well. CONFIGURATION FILE ------------------ A basic file looks like this: ----------------------- site="192.168.201.100" site="192.168.202.100" arbitrator="192.168.203.100" ticket="ticket-db8" ----------------------- You can add comment lines by starting them with a hash-sign (''#''). Whitespace at the start and end of the line, and around the ''='', is ignored. The following key/value pairs are defined: *'port'*:: The UDP/TCP port to use. Default is '9929'. *'transport'*:: The transport protocol to use for Raft exchanges. Currently only UDP is available. + The client mode always uses TCP to talk to a daemon; Booth will always bind and listen to *both* UDP and TCP ports. *'site'*, *'arbitrator'*:: Defines a Raft member with the given IP, which should be a service IP. + You will need at least three members for normal operation; an odd number is preferred. *'ticket'*:: Registers a ticket. Multiple tickets can be handled in a single Booth instance.
The next items modify per-ticket defaults. They are stored as defaults for further tickets, and are used as the value for the last defined ticket (if any). *'expire'*:: The lease time for a ticket, in seconds. After that time the ticket can be revoked, and another site can get it. + Typically 'booth' will try to renew a held ticket after half the lease time. *'timeout'*:: After that time 'booth' will re-send packets if there was an insufficient number of replies. + The default is '5' seconds. *'weights'*:: A comma-separated list of integers that define the weight of individual Raft members, in the same order as the 'site' and 'arbitrator' lines. + Default is '0' for all; this means that the order in the configuration file defines priority for conflicting requests. *'acquire-after'*:: - Setting this to a positive value will make 'booth' try to acquire a ticket - that got lost. + Try to acquire a lost ticket _after_ this period has passed. + -Ie. if the site that _had_ the ticket is not reachable any more, -then 'acquire-after' seconds after ticket expiration other sites will try -to activate the ticket. (Only one will succeed, though.) +This is to allow for some time for the site that lost the ticket +to relinquish the resources, by either stopping them or fencing a +node. + -A typical delay might be 60 seconds. +A typical delay might be 60 seconds, but ultimately it depends on +the protected resources and the fencing configuration. *'retries'*:: Defines how many times to retry broadcasting packets before the current operation (grant, revoke) is aborted. + Default is 10. Values lower than 3 are illegal. + This applies only to a single broadcast; if ticket *renewal* runs into this limit (because the network was temporarily down), but the ticket is still valid afterwards, a new renewal run will be started automatically. *'site-user'*, *'site-group'*, *'arbitrator-user'*, *'arbitrator-group'*:: These define the credentials 'boothd' will be running with.
+ On a (Pacemaker) site the booth process will have to call 'crm_ticket', so the default is to use 'hacluster':'haclient'; for an arbitrator this user and group might not exist, so that will default to 'nobody':'nobody'. *'before-acquire-handler'*:: If set, this command will be called before 'boothd' tries to acquire or renew a ticket. On an exit code other than 0, 'boothd' cancels the operation. + This makes it possible to check whether it is appropriate to acquire the ticket. For instance, if a service in the dependency-chain has a failcount of 'INFINITY' on all available nodes, the service will be unable to run. In that case, it is of no use to claim the ticket. + 'boothd' waits synchronously for the result of the handler, so make sure that the program returns quickly. + See below for details about booth-specific environment variables. A more verbose example of a configuration file might be ----------------------- transport = udp port = 9930 # D-85774 site="192.168.201.100" # D-90409 site="::ffff:192.168.202.100" # A-1120 arbitrator="192.168.203.100" ticket="ticket-db8" expire = 600 acquire-after = 60 timeout = 10 retries = 5 ----------------------- NOTES ----- Tickets are not meant to be moved around quickly--a reasonable 'expire' time might be 300 seconds (5 minutes). 'booth' works with both IPv4 and IPv6 addresses. 'booth' renews a ticket before it expires, to account for possible transmission delays. The renewal time is calculated as the larger of half the 'expire' time and 'timeout'*'retries'/2. Hence, with small 'expire' values (eg. 60 seconds) the ticket renewal process will be started just after the ticket was acquired. HANDLERS -------- Currently, there's only one external handler defined (see the 'before-acquire-handler' configuration item above). The following data is available as environment variables: *'BOOTH_TICKET'*:: The ticket name, as given in the configuration file. (See 'ticket' item above.)
*'BOOTH_LOCAL'*:: The local site name, as defined in 'site'. *'BOOTH_CONF_PATH'*:: The path to the active configuration file. *'BOOTH_CONF_NAME'*:: The configuration name, as used by the '-c' command-line argument. *'BOOTH_TICKET_EXPIRES'*:: When the ticket expires (in seconds since the Unix epoch, 1970-01-01), or '0'. FILES ----- *'/etc/booth/booth.conf'*:: The default configuration file name. See also the '-c' argument. *'/var/run/booth/'*:: Directory that holds PID/lock files. See also the 'status' command. RAFT IMPLEMENTATION ------------------- In essence, every ticket corresponds to a separate Raft cluster. A ticket is granted _only_ to the Raft _Leader_, but a Leader need not grant the ticket to Pacemaker. To move a ticket, the Leader withdraws, and votes for the new Leader instead. SYSTEMD INTEGRATION ------------------- The 'boothd' 'systemd' unit file should be distributed with booth. The booth daemon for a site or an arbitrator may be started through systemd: ----------- # systemctl enable booth@{configurationname}.service # systemctl start booth@{configurationname}.service ----------- The configuration name is required for 'systemctl', even in case of the default name 'booth'. EXIT STATUS ----------- *0*:: Success. For the 'status' command: Daemon running. *1* (PCMK_OCF_UNKNOWN_ERROR):: General error code. *7* (PCMK_OCF_NOT_RUNNING):: No daemon process for that configuration active. BUGS ---- Probably. Please report them on GitHub: AUTHOR ------ 'boothd' was originally written (mostly) by Jiaju Zhang. Many people have contributed to it. In 2013 Philipp Marek took over maintainership, followed by Dejan Muhamedagic. RESOURCES --------- GitHub: Documentation: COPYING ------- Copyright (C) 2011 Jiaju Zhang Copyright (C) 2013-2014 Philipp Marek Copyright (C) 2014 Dejan Muhamedagic Free use of this software is granted under the terms of the GNU General Public License (GPL). // vim: set ft=asciidoc :
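
HANDLER EXAMPLE
---------------

To illustrate the 'before-acquire-handler' item and the environment variables listed under HANDLERS, here is a minimal sketch of what such a handler might look like. It is not part of booth. The function name 'booth_may_acquire', the 'BLOCK_FLAG_DIR' variable and the '/run/booth-block' directory are assumptions made up for this example; only 'BOOTH_TICKET', 'BOOTH_LOCAL' and the exit code convention (non-zero cancels the operation) are taken from the text above.

```shell
#!/bin/sh
# Sketch of a before-acquire-handler (hypothetical, not shipped with booth).
# Assumption: an operator blocks a ticket by creating a flag file named
# after the ticket in BLOCK_FLAG_DIR (default: /run/booth-block).

# Returns 0 if the ticket may be acquired or renewed, 1 otherwise.
booth_may_acquire() {
    # BOOTH_TICKET and BOOTH_LOCAL are provided by boothd (see HANDLERS).
    ticket="${BOOTH_TICKET:-unknown}"
    site="${BOOTH_LOCAL:-unknown}"
    flag="${BLOCK_FLAG_DIR:-/run/booth-block}/$ticket"

    if [ -e "$flag" ]; then
        # Explain on stderr why the operation is being declined.
        echo "declining ticket $ticket on $site: $flag exists" >&2
        return 1
    fi
    return 0
}

# As the last command, its status becomes the exit code of the script,
# which is what boothd inspects.
booth_may_acquire
```

Pointing 'before-acquire-handler' at an executable copy of this script would make 'boothd' skip acquiring or renewing a ticket while the flag file exists. The check is kept to a single file test because 'boothd' waits synchronously for the handler to return.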