diff --git a/docs/boothd.8.txt b/docs/boothd.8.txt index 70a4749..c0be935 100644 --- a/docs/boothd.8.txt +++ b/docs/boothd.8.txt @@ -1,581 +1,580 @@ BOOTHD(8) =========== :doctype: manpage NAME ---- boothd - The Booth Cluster Ticket Manager. SYNOPSIS -------- *boothd* 'daemon' [-SD] [-c 'config'] [-l 'lockfile'] *booth* 'list' [-s 'site'] [-c 'config'] *booth* 'grant' [-s 'site'] [-c 'config'] [-FCw] 'ticket' *booth* 'revoke' [-s 'site'] [-c 'config'] [-w] 'ticket' *booth* 'peers' [-s 'site'] [-c 'config'] *booth* 'status' [-D] [-c 'config'] DESCRIPTION ----------- Booth manages tickets which authorizes one of the cluster sites located in geographically dispersed distances to run certain resources. It is designed to be extend Pacemaker to support geographically distributed clustering. It is based on the RAFT protocol, see eg. for details. SHORT EXAMPLES -------------- --------------------- # boothd daemon -D # booth list # booth grant ticket-nfs # booth revoke ticket-nfs --------------------- OPTIONS ------- *-c* 'configfile':: Configuration to use. + Can be a full path to a configuration file, or a short name; in the latter case, the directory '/etc/booth' and suffix '.conf' are added. Per default 'booth' is used, which results in the path '/etc/booth/booth.conf'. + The configuration name also determines the name of the PID file - for the defaults, '/var/run/booth/booth.pid'. *-s*:: Site address or name. + The special value 'other' can be used to specify the other site. Obviously, in that case, the booth configuration must have exactly two sites defined. *-F*:: 'immediate grant': Don't wait for unreachable sites to relinquish the ticket. See the 'Booth ticket management' section below for more details. + This option may be DANGEROUS. It makes booth grant the ticket even though it cannot ascertain that unreachable sites don't hold the same ticket. It is up to the user to make sure that unreachable sites don't have this ticket as granted. *-w*:: 'wait for the request outcome': The client waits for the final outcome of grant or revoke request. *-C*:: 'wait for ticket commit to CIB': The client waits for the ticket commit to CIB (only for grant requests). If one or more sites are unreachable, this takes the ticket expire time (plus, if defined, the 'acquire-after' time). *-h*, *--help*:: Give a short usage output. *--version*:: Report version information. *-S*:: - 'systemd' mode: don't fork. This is like '-D' but without the debug output. + 'systemd' mode: don't fork. + Disables daemonizing, the process will remain in the foreground. *-D*:: - Debug output/don't daemonize. - Increases the debug output level; booth daemon remains - in the foreground. + Increases the debug output level. *-l* 'lockfile':: Use another lock file. By default, the lock file name is inferred from the configuration file name. Normally not needed. COMMANDS -------- Whether the binary is called as 'boothd' or 'booth' doesn't matter; the first argument determines the mode of operation. *'daemon'*:: Tells 'boothd' to serve a site. The locally configured interfaces are searched for an IP address that is defined in the configuration. booth then runs in either /arbitrator/ or /site/ mode. *'client'*:: Booth clients can list the ticket information (see also 'crm_ticket -L'), and revoke or grant tickets to a site. + The grant and, under certain circumstances, revoke operations may take a while to return a definite operation's outcome. The client will wait up to the network timeout value (by default 5 seconds) for the result. Unless the '-w' option was set, in which case the client waits indefinitely. + In this mode the configuration file is searched for an IP address that is locally reachable, ie. matches a configured subnet. This allows to run the client commands on another node in the same cluster, as long as the config file and the service IP is locally reachable. + For instance, if the booth service IP is 192.168.55.200, and the local node has 192.168.55.15 configured on one of its network interfaces, it knows which site it belongs to. + Use '-s' to direct client to connect to a different site. *'status'*:: 'boothd' looks for the (locked) PID file and the UDP socket, prints some output to stdout (for use in shell scripts) and returns an OCF-compatible return code. With '-D', a human-readable message is printed to STDERR as well. *'peers'*:: List the other 'boothd' servers we know about. + In addition to the type, name (IP address), and the last time the server was heard from, network statistics are also printed. The statistics are split into two rows, the first one consists of counters for the sent packets and the second one for the received packets. The first counter is the total number of packets and descriptions of the other counters follows: 'resends':: Packets which had to be resent because the recipient didn't acknowledge a message. This usually means that either the message or the acknowledgement got lost. The number of resends usually reflect the network reliability. 'error':: Packets which either couldn't be sent, got truncated, or were badly formed. Should be zero. 'invalid':: These packets contain either invalid or non-existing ticket name or refer to a non-existing ticket leader. Should be zero. 'authfail':: Packets which couldn't be authenticated. Should be zero. CONFIGURATION FILE ------------------ The configuration file must be identical on all sites and arbitrators. A minimal file may look like this: ----------------------- site="192.168.201.100" site="192.168.202.100" arbitrator="192.168.203.100" ticket="ticket-db8" ----------------------- Comments start with a hash-sign (''#''). Whitespace at the start and end of the line, and around the ''='', are ignored. The following key/value pairs are defined: *'port'*:: The UDP/TCP port to use. Default is '9929'. *'transport'*:: The transport protocol to use for Raft exchanges. Currently only UDP is supported. + Clients use TCP to communicate with a daemon; Booth will always bind and listen to both UDP and TCP ports. *'authfile'*:: File containing the authentication key. The key can be either binary or text. If the latter, then both leading and trailing white space, including new lines, is ignored. This key is a shared secret and used to authenticate both clients and servers. The key must be between 8 and 64 characters long and be readable only by the file owner. *'maxtimeskew'*:: As protection against replay attacks, packets contain generation timestamps. Such a timestamp is not allowed to be too old. Just how old can be specified with this parameter. The value is in seconds and the default is 600 (10 minutes). If clocks vary more than this default between sites and nodes (which is definitely something you should fix) then set this parameter to a higher value. The time skew test is performed only in concert with authentication. *'site'*:: Defines a site Raft member with the given IP. Sites can acquire tickets. The sites' IP should be managed by the cluster. *'arbitrator'*:: Defines an arbitrator Raft member with the given IP. Arbitrators help reach consensus in elections and cannot hold tickets. Booth needs at least three members for normal operation. Odd number of members provides more redundancy. *'site-user'*, *'site-group'*, *'arbitrator-user'*, *'arbitrator-group'*:: These define the credentials 'boothd' will be running with. + On a (Pacemaker) site the booth process will have to call 'crm_ticket', so the default is to use 'hacluster':'haclient'; for an arbitrator this user and group might not exists, so there we default to 'nobody':'nobody'. *'ticket'*:: Registers a ticket. Multiple tickets can be handled by single Booth instance. + Use the special ticket name '__defaults__' to modify the defaults. The '__defaults__' stanza must precede all the other ticket specifications. All times are in seconds. *'expire'*:: The lease time for a ticket. After that time the ticket can be acquired by another site if the ticket holder is not reachable. + The default is '600'. *'acquire-after'*:: Once a ticket is lost, wait this time in addition before acquiring the ticket. + This is to allow for the site that lost the ticket to relinquish the resources, by either stopping them or fencing a node. + A typical delay might be 60 seconds, but ultimately it depends on the protected resources and the fencing configuration. + The default is '0'. *'renewal-freq'*:: Set the ticket renewal frequency period. + If the network reliability is often reduced over prolonged periods, it is advisable to try to renew more often. + Before every renewal, if defined, the command or commands specified in 'before-acquire-handler' is run. In that case the 'renewal-freq' parameter is effectively also the local cluster monitoring interval. *'timeout'*:: After that time 'booth' will re-send packets if there was an insufficient number of replies. This should be long enough to allow packets to reach other members. + The default is '5'. *'retries'*:: Defines how many times to retry sending packets before giving up waiting for acks from other members. + Default is '10'. Values lower than 3 are illegal. + Ticket renewals should allow for this number of retries. Hence, the total retry time must be shorter than the renewal time (either half the expire time or *'renewal-freq'*): timeout*(retries+1) < renewal *'weights'*:: A comma-separated list of integers that define the weight of individual Raft members, in the same order as the 'site' and 'arbitrator' lines. + Default is '0' for all; this means that the order in the configuration file defines priority for conflicting requests. *'before-acquire-handler'*:: If set, this parameter specifies either a file containing a program to be run or a directory where a number of programs can reside. They are invoked before 'boothd' tries to acquire or renew a ticket. If any of them exits with a code other than 0, 'boothd' relinquishes the ticket. + Thus it is possible to ensure whether the services and its dependencies protected by the ticket are in good shape at this site. For instance, if a service in the dependency-chain has a failcount of 'INFINITY' on all available nodes, the service will be unable to run. In that case, it is of no use to claim the ticket. + One or more arguments may follow the program or directory location. Typically, there is at least the name of one of the resources which depend on this ticket. + See below for details about booth specific environment variables. The distributed 'service-runnable' script is an example which may be used to test whether a pacemaker resource can be started. *'attr-prereq'*:: Sites can have GEO attributes managed with the 'geostore(8)' program. Attributes are within ticket's scope and may be tested by 'boothd' for additional control of ticket failover (automatic) or ticket acquire (manual). + Attributes are typically used to convey extra information about resources, for instance database replication status. The attributes are commonly updated by resource agents. + Attribute values are referenced in expressions and may be tested for equality with the 'eq' binary operator or inequality with the 'ne' operator. The usage is as follows: attr-prereq = : "auto" | "manual" : attribute name : "eq" | "ne" : attribute value + The two grant types are 'auto' for ticket failover and 'manual' for grants using the booth client. Only in case the expression evaluates to true can the ticket be granted. + It is not clear whether the 'manual' grant type has any practical use because, obviously, this operation is anyway controlled by a human. + Note that there can be no guarantee on whether an attribute value is up to date, i.e. if it actually reflects the current state. One example of a booth configuration file: ----------------------- transport = udp port = 9930 # D-85774 site="192.168.201.100" # D-90409 site="::ffff:192.168.202.100" # A-1120 arbitrator="192.168.203.100" ticket="ticket-db8" expire = 600 acquire-after = 60 timeout = 10 retries = 5 renewal-freq = 60 before-acquire-handler = /usr/share/booth/service-runnable db8 attr-prereq = auto repl_state eq ACTIVE ----------------------- BOOTH TICKET MANAGEMENT ----------------------- The booth cluster guarantees that every ticket is owned by only one site at the time. Tickets must be initially granted with the 'booth client grant' command. Once it gets granted, the ticket is managed by the booth cluster. Hence, only granted tickets are managed by 'booth'. If the ticket gets lost, i.e. that the other members of the booth cluster do not hear from the ticket owner in a sufficiently long time, one of the remaining sites will acquire the ticket. This is what is called _ticket failover_. If the remaining members cannot form a majority, then the ticket cannot fail over. A ticket may be revoked at any time with the 'booth client revoke' command. For revoke to succeed, the site holding the ticket must be reachable. Once the ticket is administratively revoked, it is not managed by the booth cluster anymore. For the booth cluster to start managing the ticket again, it must be again granted to a site. The grant operation, in case not all sites are reachable, may get delayed for the ticket expire time (and, if defined, the 'acquire-after' time). The reason is that the other booth members may not know if the ticket is currently granted at the unreachable site. This delay may be disabled with the '-F' option. In that case, it is up to the administrator to make sure that the unreachable site is not holding the ticket. When the ticket is managed by 'booth', it is dangerous to modify it manually using either `crm_ticket` command or `crm site ticket`. Neither of these tools is aware of 'booth' and, consequently, 'booth' itself may not be aware of any ticket status changes. A notable exception is setting the ticket to standby which is typically done before a planned failover. NOTES ----- Tickets are not meant to be moved around quickly, the default 'expire' time is 600 seconds (10 minutes). 'booth' works with both IPv4 and IPv6 addresses. 'booth' renews a ticket before it expires, to account for possible transmission delays. The renewal time, unless explicitly set, is set to half the 'expire' time. HANDLERS -------- Currently, there's only one external handler defined (see the 'before-acquire-handler' configuration item above). The following environment variables are exported to the handler: *'BOOTH_TICKET':: The ticket name, as given in the configuration file. (See 'ticket' item above.) *'BOOTH_LOCAL':: The local site name, as defined in 'site'. *'BOOTH_CONF_PATH':: The path to the active configuration file. *'BOOTH_CONF_NAME':: The configuration name, as used by the '-c' commandline argument. *'BOOTH_TICKET_EXPIRES':: When the ticket expires (in seconds since 1.1.1970), or '0'. The handler is invoked with positional arguments specified after it. FILES ----- *'/etc/booth/booth.conf'*:: The default configuration file name. See also the '-c' argument. *'/etc/booth/authkey'*:: There is no default, but this is a typical location for the shared secret (authentication key). *'/var/run/booth/'*:: Directory that holds PID/lock files. See also the 'status' command. RAFT IMPLEMENTATION ------------------- In essence, every ticket corresponds to a separate Raft cluster. A ticket is granted to the Raft _Leader_ which then owns (or keeps) the ticket. ARBITRATOR MANAGEMENT --------------------- The booth daemon for an arbitrator which typically doesn't run the cluster stack, may be started through systemd or with '/etc/init.d/booth-arbitrator', depending on which init system the platform supports. The SysV init script starts a booth arbitrator for every configuration file found in '/etc/booth'. Platforms running systemd can enable and start every configuration separately using 'systemctl': ----------- # systemctl enable booth@ # systemctl start booth@ ----------- 'systemctl' requires the configuration name, even for the default name 'booth'. EXIT STATUS ----------- *0*:: Success. For the 'status' command: Daemon running. *1* (PCMK_OCF_UNKNOWN_ERROR):: General error code. *7* (PCMK_OCF_NOT_RUNNING):: No daemon process for that configuration active. BUGS ---- Booth is tested regularly. See the `README-testing` file for more information. Please report any bugs either at GitHub: Or, if you prefer bugzilla, at openSUSE bugzilla (component "High Availability"): https://bugzilla.opensuse.org/enter_bug.cgi?product=openSUSE%20Factory AUTHOR ------ 'boothd' was originally written (mostly) by Jiaju Zhang. In 2013 and 2014 Philipp Marek took over maintainership. Since April 2014 it has been mainly developed by Dejan Muhamedagic. Many people contributed (see the `AUTHORS` file). RESOURCES --------- GitHub: Documentation: COPYING ------- Copyright (C) 2011 Jiaju Zhang Copyright (C) 2013-2014 Philipp Marek Copyright (C) 2014 Dejan Muhamedagic Free use of this software is granted under the terms of the GNU General Public License (GPL) as of version 2 (see `COPYING` file) or later. // vim: set ft=asciidoc : diff --git a/src/main.c b/src/main.c index ce6c35e..7ccb782 100644 --- a/src/main.c +++ b/src/main.c @@ -1,1626 +1,1629 @@ /* * Copyright (C) 2011 Jiaju Zhang * Copyright (C) 2013-2014 Philipp Marek * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public * License as published by the Free Software Foundation; either * version 2 of the License, or (at your option) any later version. * * This software is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "b_config.h" #if HAVE_LIBGCRYPT #include #endif #ifndef NAMETAG_LIBSYSTEMD #include #else #include "alt/nametag_libsystemd.h" #endif #ifdef COREDUMP_NURSING #include #include #endif #include "log.h" #include "booth.h" #include "config.h" #include "transport.h" #include "inline-fn.h" #include "pacemaker.h" #include "ticket.h" #include "request.h" #include "attr.h" #include "handler.h" #define RELEASE_VERSION "1.0" #define RELEASE_STR RELEASE_VERSION " (build " BOOTH_BUILD_VERSION ")" #define CLIENT_NALLOC 32 static int daemonize = 1; int enable_stderr = 0; timetype start_time; /** Structure for "clients". * Filehandles with incoming data get registered here (and in pollfds), * along with their callbacks. * Because these can be reallocated with every new fd, addressing * happens _only_ by their numeric index. */ struct client *clients = NULL; struct pollfd *pollfds = NULL; static int client_maxi; static int client_size = 0; static const struct booth_site _no_leader = { .addr_string = "none", .site_id = NO_ONE, }; struct booth_site *const no_leader = (struct booth_site*) &_no_leader; typedef enum { BOOTHD_STARTED=0, BOOTHD_STARTING } BOOTH_DAEMON_STATE; int poll_timeout; struct booth_config *booth_conf; struct command_line cl; static void client_alloc(void) { int i; if (!(clients = realloc( clients, (client_size + CLIENT_NALLOC) * sizeof(*clients)) ) || !(pollfds = realloc( pollfds, (client_size + CLIENT_NALLOC) * sizeof(*pollfds)) )) { log_error("can't alloc for client array"); exit(1); } for (i = client_size; i < client_size + CLIENT_NALLOC; i++) { clients[i].workfn = NULL; clients[i].deadfn = NULL; clients[i].fd = -1; pollfds[i].fd = -1; pollfds[i].revents = 0; } client_size += CLIENT_NALLOC; } static void client_dead(int ci) { struct client *c = clients + ci; if (c->fd != -1) { log_debug("removing client %d", c->fd); close(c->fd); } c->fd = -1; c->workfn = NULL; if (c->msg) { free(c->msg); c->msg = NULL; c->offset = 0; } pollfds[ci].fd = -1; } int client_add(int fd, const struct booth_transport *tpt, void (*workfn)(int ci), void (*deadfn)(int ci)) { int i; struct client *c; if (client_size - 1 <= client_maxi ) { client_alloc(); } for (i = 0; i < client_size; i++) { c = clients + i; if (c->fd != -1) continue; c->workfn = workfn; if (deadfn) c->deadfn = deadfn; else c->deadfn = client_dead; c->transport = tpt; c->fd = fd; c->msg = NULL; c->offset = 0; pollfds[i].fd = fd; pollfds[i].events = POLLIN; if (i > client_maxi) client_maxi = i; return i; } assert(!"no client"); } int find_client_by_fd(int fd) { int i; if (fd < 0) return -1; for (i = 0; i <= client_maxi; i++) { if (clients[i].fd == fd) return i; } return -1; } static int format_peers(char **pdata, unsigned int *len) { struct booth_site *s; char *data, *cp; char time_str[64]; int i, alloc; *pdata = NULL; *len = 0; alloc = booth_conf->site_count * (BOOTH_NAME_LEN + 256); data = malloc(alloc); if (!data) return -ENOMEM; cp = data; foreach_node(i, s) { if (s == local) continue; strftime(time_str, sizeof(time_str), "%F %T", localtime(&s->last_recv)); cp += snprintf(cp, alloc - (cp - data), "%-12s %s, last recv: %s\n", type_to_string(s->type), s->addr_string, time_str); cp += snprintf(cp, alloc - (cp - data), "\tSent pkts:%u error:%u resends:%u\n", s->sent_cnt, s->sent_err_cnt, s->resend_cnt); cp += snprintf(cp, alloc - (cp - data), "\tRecv pkts:%u error:%u authfail:%u invalid:%u\n\n", s->recv_cnt, s->recv_err_cnt, s->sec_cnt, s->invalid_cnt); if (alloc - (cp - data) <= 0) { free(data); return -ENOMEM; } } *pdata = data; *len = cp - data; return 0; } void list_peers(int fd) { char *data; unsigned int olen; struct boothc_hdr_msg hdr; if (format_peers(&data, &olen) < 0) goto out; init_header(&hdr.header, CL_LIST, 0, 0, RLT_SUCCESS, 0, sizeof(hdr) + olen); (void)send_header_plus(fd, &hdr, data, olen); out: if (data) free(data); } /* trim trailing spaces if the key is ascii */ static void trim_key() { char *p; int i; for (i=0, p=booth_conf->authkey; i < booth_conf->authkey_len; i++, p++) if (!isascii(*p)) return; p = booth_conf->authkey; while (booth_conf->authkey_len > 0 && isspace(*p)) { p++; booth_conf->authkey_len--; } memmove(booth_conf->authkey, p, booth_conf->authkey_len); p = booth_conf->authkey + booth_conf->authkey_len - 1; while (booth_conf->authkey_len > 0 && isspace(*p)) { booth_conf->authkey_len--; p--; } } static int read_authkey() { int fd; booth_conf->authkey[0] = '\0'; if (stat(booth_conf->authfile, &booth_conf->authstat) < 0) { log_error("cannot stat authentication file %s: %s", booth_conf->authfile, strerror(errno)); return -1; } if (booth_conf->authstat.st_mode & (S_IRGRP | S_IROTH)) { log_error("%s: file shall not be readable for anyone but the owner", booth_conf->authfile); return -1; } fd = open(booth_conf->authfile, O_RDONLY); if (fd < 0) { log_error("cannot open %s: %s", booth_conf->authfile, strerror(errno)); return -1; } booth_conf->authkey_len = read(fd, booth_conf->authkey, BOOTH_MAX_KEY_LEN); close(fd); trim_key(); log_debug("read key of size %d in authfile %s", booth_conf->authkey_len, booth_conf->authfile); /* make sure that the key is of minimum length */ return (booth_conf->authkey_len >= BOOTH_MIN_KEY_LEN) ? 0 : -1; } int update_authkey() { struct stat statbuf; if (stat(booth_conf->authfile, &statbuf) < 0) { log_error("cannot stat authentication file %s: %s", booth_conf->authfile, strerror(errno)); return -1; } if (statbuf.st_mtime > booth_conf->authstat.st_mtime) { return read_authkey(); } return 0; } static int setup_config(int type) { int rv; rv = read_config(cl.configfile, type); if (rv < 0) goto out; if (is_auth_req()) { rv = read_authkey(); if (rv < 0) goto out; #if HAVE_LIBGCRYPT if (!gcry_check_version(NULL)) { log_error("gcry_check_version"); rv = -ENOENT; goto out; } gcry_control(GCRYCTL_DISABLE_SECMEM, 0); gcry_control(GCRYCTL_INITIALIZATION_FINISHED, 0); #endif } /* Set "local" pointer, ignoring errors. */ if (cl.type == DAEMON && cl.site[0]) { if (!find_site_by_name(cl.site, &local, 1)) { log_error("Cannot find \"%s\" in the configuration.", cl.site); return -EINVAL; } local->local = 1; } else find_myself(NULL, type == CLIENT || type == GEOSTORE); rv = check_config(type); if (rv < 0) goto out; /* Per default the PID file name is derived from the * configuration name. */ if (!cl.lockfile[0]) { snprintf(cl.lockfile, sizeof(cl.lockfile)-1, "%s/%s.pid", BOOTH_RUN_DIR, booth_conf->name); } out: return rv; } static int setup_transport(void) { int rv; rv = transport()->init(message_recv); if (rv < 0) { log_error("failed to init booth_transport %s", transport()->name); goto out; } rv = booth_transport[TCP].init(NULL); if (rv < 0) { log_error("failed to init booth_transport[TCP]"); goto out; } out: return rv; } static int write_daemon_state(int fd, int state) { char buffer[1024]; int rv, size; size = sizeof(buffer) - 1; rv = snprintf(buffer, size, "booth_pid=%d " "booth_state=%s " "booth_type=%s " "booth_cfg_name='%s' " "booth_id=%d " "booth_addr_string='%s' " "booth_port=%d\n", getpid(), ( state == BOOTHD_STARTED ? "started" : state == BOOTHD_STARTING ? "starting" : "invalid"), type_to_string(local->type), booth_conf->name, local->site_id, local->addr_string, booth_conf->port); if (rv < 0 || rv == size) { log_error("Buffer filled up in write_daemon_state()."); return -1; } size = rv; rv = ftruncate(fd, 0); if (rv < 0) { log_error("lockfile %s truncate error %d: %s", cl.lockfile, errno, strerror(errno)); return rv; } rv = lseek(fd, 0, SEEK_SET); if (rv < 0) { log_error("lseek set fd(%d) offset to 0 error, return(%d), message(%s)", fd, rv, strerror(errno)); rv = -1; return rv; } rv = write(fd, buffer, size); if (rv != size) { log_error("write to fd(%d, %d) returned %d, errno %d, message(%s)", fd, size, rv, errno, strerror(errno)); return -1; } return 0; } static int loop(int fd) { void (*workfn) (int ci); void (*deadfn) (int ci); int rv, i; rv = setup_transport(); if (rv < 0) goto fail; rv = setup_ticket(); if (rv < 0) goto fail; rv = write_daemon_state(fd, BOOTHD_STARTED); if (rv != 0) { log_error("write daemon state %d to lockfile error %s: %s", BOOTHD_STARTED, cl.lockfile, strerror(errno)); goto fail; } log_info("BOOTH %s daemon started, node id is 0x%08X (%d).", type_to_string(local->type), local->site_id, local->site_id); while (1) { rv = poll(pollfds, client_maxi + 1, poll_timeout); if (rv == -1 && errno == EINTR) continue; if (rv < 0) { log_error("poll failed: %s (%d)", strerror(errno), errno); goto fail; } for (i = 0; i <= client_maxi; i++) { if (clients[i].fd < 0) continue; if (pollfds[i].revents & POLLIN) { workfn = clients[i].workfn; if (workfn) workfn(i); } if (pollfds[i].revents & (POLLERR | POLLHUP | POLLNVAL)) { deadfn = clients[i].deadfn; if (deadfn) deadfn(i); } } process_tickets(); } return 0; fail: return -1; } static int test_reply(cmd_result_t reply_code, cmd_request_t cmd) { int rv = 0; const char *op_str = ""; if (cmd == CMD_GRANT) op_str = "grant"; else if (cmd == CMD_REVOKE) op_str = "revoke"; else if (cmd == CMD_LIST) op_str = "list"; else if (cmd == CMD_PEERS) op_str = "peers"; else { log_error("internal error reading reply result!"); return -1; } switch (reply_code) { case RLT_OVERGRANT: log_info("You're granting a granted ticket. " "If you wanted to migrate a ticket, " "use revoke first, then use grant."); rv = -1; break; case RLT_TICKET_IDLE: log_info("ticket is not owned"); rv = 0; break; case RLT_ASYNC: log_info("%s command sent, result will be returned " "asynchronously. Please use \"booth list\" to " "see the outcome.", op_str); rv = 0; break; case RLT_CIB_PENDING: log_info("%s succeeded (CIB commit pending)", op_str); /* wait for the CIB commit? */ rv = (cl.options & OPT_WAIT_COMMIT) ? 3 : 0; break; case RLT_MORE: rv = 2; break; case RLT_SYNC_SUCC: case RLT_SUCCESS: if (cmd != CMD_LIST && cmd != CMD_PEERS) log_info("%s succeeded!", op_str); rv = 0; break; case RLT_SYNC_FAIL: log_info("%s failed!", op_str); rv = -1; break; case RLT_INVALID_ARG: log_error("ticket \"%s\" does not exist", cl.msg.ticket.id); rv = -1; break; case RLT_AUTH: log_error("authentication error"); rv = -1; break; case RLT_EXT_FAILED: log_error("before-acquire-handler for ticket \"%s\" failed, grant denied", cl.msg.ticket.id); rv = -1; break; case RLT_ATTR_PREREQ: log_error("attr-prereq for ticket \"%s\" failed, grant denied", cl.msg.ticket.id); rv = -1; break; case RLT_REDIRECT: /* talk to another site */ rv = 1; break; default: log_error("got an error code: %x", rv); rv = -1; } return rv; } static int query_get_string_answer(cmd_request_t cmd) { struct booth_site *site; struct boothc_hdr_msg reply; struct boothc_header *header; char *data; int data_len; int rv; struct booth_transport const *tpt; int (*test_reply_f) (cmd_result_t reply_code, cmd_request_t cmd); size_t msg_size; void *request; if (cl.type == GEOSTORE) { test_reply_f = test_attr_reply; msg_size = sizeof(cl.attr_msg); request = &cl.attr_msg; } else { test_reply_f = test_reply; msg_size = sizeof(cl.msg); request = &cl.msg; } header = (struct boothc_header *)request; data = NULL; init_header(header, cmd, 0, cl.options, 0, 0, msg_size); if (!*cl.site) site = local; else if (!find_site_by_name(cl.site, &site, 1)) { log_error("cannot find site \"%s\"", cl.site); rv = ENOENT; goto out; } tpt = booth_transport + TCP; rv = tpt->open(site); if (rv < 0) goto out_close; rv = tpt->send(site, request, msg_size); if (rv < 0) goto out_close; rv = tpt->recv_auth(site, &reply, sizeof(reply)); if (rv < 0) goto out_close; data_len = ntohl(reply.header.length) - rv; /* no attribute, or no ticket found */ if (!data_len) { goto out_test_reply; } data = malloc(data_len+1); if (!data) { rv = -ENOMEM; goto out_close; } rv = tpt->recv(site, data, data_len); if (rv < 0) goto out_close; *(data+data_len) = '\0'; *(data + data_len) = '\0'; (void)fputs(data, stdout); fflush(stdout); rv = 0; out_test_reply: rv = test_reply_f(ntohl(reply.header.result), cmd); out_close: tpt->close(site); out: if (data) free(data); return rv; } static int do_command(cmd_request_t cmd) { struct booth_site *site; struct boothc_ticket_msg reply; struct booth_transport const *tpt; uint32_t leader_id; int rv; int reply_cnt = 0, msg_logged = 0; const char *op_str = ""; if (cmd == CMD_GRANT) op_str = "grant"; else if (cmd == CMD_REVOKE) op_str = "revoke"; rv = 0; site = NULL; /* Always use TCP for client - at least for now. */ tpt = booth_transport + TCP; if (!*cl.site) site = local; else { if (!find_site_by_name(cl.site, &site, 1)) { log_error("Site \"%s\" not configured.", cl.site); goto out_close; } } if (site->type == ARBITRATOR) { if (site == local) { log_error("We're just an arbitrator, cannot grant/revoke tickets here."); } else { log_error("%s is just an arbitrator, cannot grant/revoke tickets there.", cl.site); } goto out_close; } assert(site->type == SITE); /* We don't check for existence of ticket, so that asking can be * done without local configuration, too. * Although, that means that the UDP port has to be specified, too. */ if (!cl.msg.ticket.id[0]) { /* If the loaded configuration has only a single ticket defined, use that. */ if (booth_conf->ticket_count == 1) { strncpy(cl.msg.ticket.id, booth_conf->ticket[0].name, sizeof(cl.msg.ticket.id)); } else { log_error("No ticket given."); goto out_close; } } redirect: init_header(&cl.msg.header, cmd, 0, cl.options, 0, 0, sizeof(cl.msg)); rv = tpt->open(site); if (rv < 0) goto out_close; rv = tpt->send(site, &cl.msg, sendmsglen(&cl.msg)); if (rv < 0) goto out_close; read_more: rv = tpt->recv_auth(site, &reply, sizeof(reply)); if (rv < 0) { /* print any errors depending on the code sent by the * server */ (void)test_reply(ntohl(reply.header.result), cmd); goto out_close; } rv = test_reply(ntohl(reply.header.result), cmd); if (rv == 1) { tpt->close(site); leader_id = ntohl(reply.ticket.leader); if (!find_site_by_id(leader_id, &site)) { log_error("Message with unknown redirect site %x received", leader_id); rv = -1; goto out_close; } goto redirect; } else if (rv == 2 || rv == 3) { /* the server has more to say */ /* don't wait too long */ if (reply_cnt > 1 && !(cl.options & OPT_WAIT)) { rv = 0; log_info("Giving up on waiting for the definite result. " "Please use \"booth list\" later to " "see the outcome."); goto out_close; } if (reply_cnt == 0) { log_info("%s request sent, " "waiting for the result ...", op_str); msg_logged++; } else if (rv == 3 && msg_logged < 2) { log_info("waiting for the CIB commit ..."); msg_logged++; } reply_cnt++; goto read_more; } out_close: if (site) tpt->close(site); return rv; } static int _lockfile(int mode, int *fdp, pid_t *locked_by) { struct flock lock; int fd, rv; /* After reboot the directory may not yet exist. * Try to create it, but ignore errors. */ if (strncmp(cl.lockfile, BOOTH_RUN_DIR, strlen(BOOTH_RUN_DIR)) == 0) mkdir(BOOTH_RUN_DIR, 0775); if (locked_by) *locked_by = 0; *fdp = -1; fd = open(cl.lockfile, mode, 0664); if (fd < 0) return errno; *fdp = fd; lock.l_type = F_WRLCK; lock.l_start = 0; lock.l_whence = SEEK_SET; lock.l_len = 0; lock.l_pid = 0; if (fcntl(fd, F_SETLK, &lock) == 0) return 0; rv = errno; if (locked_by) if (fcntl(fd, F_GETLK, &lock) == 0) *locked_by = lock.l_pid; return rv; } static inline int is_root(void) { return geteuid() == 0; } static int create_lockfile(void) { int rv, fd; fd = -1; rv = _lockfile(O_CREAT | O_WRONLY, &fd, NULL); if (fd == -1) { log_error("lockfile %s open error %d: %s", cl.lockfile, rv, strerror(rv)); return -1; } if (rv < 0) { log_error("lockfile %s setlk error %d: %s", cl.lockfile, rv, strerror(rv)); goto fail; } rv = write_daemon_state(fd, BOOTHD_STARTING); if (rv != 0) { log_error("write daemon state %d to lockfile error %s: %s", BOOTHD_STARTING, cl.lockfile, strerror(errno)); goto fail; } if (is_root()) { if (fchown(fd, booth_conf->uid, booth_conf->gid) < 0) log_error("fchown() on lockfile said %d: %s", errno, strerror(errno)); } return fd; fail: close(fd); return -1; } static void unlink_lockfile(int fd) { unlink(cl.lockfile); close(fd); } static void print_usage(void) { printf( "Usage:\n" " booth list [options]\n" " booth {grant|revoke} [options] \n" " booth status [options]\n" "\n" " list: List all tickets\n" " grant: Grant ticket to site\n" " revoke: Revoke ticket\n" "\n" "Options:\n" " -c FILE Specify config file [default " BOOTH_DEFAULT_CONF "]\n" " Can be a path or just a name without \".conf\" suffix\n" " -s Connect/grant to a different site\n" " -F Try to grant the ticket immediately\n" " even if not all sites are reachable\n" " -w Wait forever for the outcome of the request\n" " -C Wait until the ticket is committed to the CIB (grant only)\n" " -h Print this help\n" "\n" "Examples:\n" "\n" " # booth list (list tickets)\n" " # booth grant ticket-A (grant ticket here)\n" " # booth grant -s 10.121.8.183 ticket-A (grant ticket to site 10.121.8.183)\n" " # booth revoke ticket-A (revoke ticket)\n" "\n" "See the booth(8) man page for more details.\n" ); } #define OPTION_STRING "c:Dl:t:s:FhSwC" #define ATTR_OPTION_STRING "c:Dt:s:h" void safe_copy(char *dest, char *value, size_t buflen, const char *description) { int content_len = buflen - 1; if (strlen(value) >= content_len) { fprintf(stderr, "'%s' exceeds maximum %s length of %d\n", value, description, content_len); exit(EXIT_FAILURE); } strncpy(dest, value, content_len); dest[content_len] = 0; } static int host_convert(char *hostname, char *ip_str, size_t ip_size) { struct addrinfo *result = NULL, hints = {0}; int re = -1; memset(&hints, 0, sizeof(hints)); hints.ai_family = AF_INET; hints.ai_socktype = SOCK_DGRAM; re = getaddrinfo(hostname, NULL, &hints, &result); if (re == 0) { struct in_addr addr = ((struct sockaddr_in *)result->ai_addr)->sin_addr; const char *re_ntop = inet_ntop(AF_INET, &addr, ip_str, ip_size); if (re_ntop == NULL) { re = -1; } } freeaddrinfo(result); return re; } #define cparg(dest, descr) do { \ if (optind >= argc) \ goto missingarg; \ safe_copy(dest, argv[optind], sizeof(dest), descr); \ optind++; \ } while(0) static int read_arguments(int argc, char **argv) { int optchar; char *arg1 = argv[1]; char *op = NULL; char *cp; const char *opt_string = OPTION_STRING; char site_arg[INET_ADDRSTRLEN] = {0}; int left; cl.type = 0; if ((cp = strstr(argv[0], ATTR_PROG)) && !strcmp(cp, ATTR_PROG)) { cl.type = GEOSTORE; op = argv[1]; optind = 2; opt_string = ATTR_OPTION_STRING; } else if (argc > 1 && (strcmp(arg1, "arbitrator") == 0 || strcmp(arg1, "site") == 0 || strcmp(arg1, "start") == 0 || strcmp(arg1, "daemon") == 0)) { cl.type = DAEMON; optind = 2; } else if (argc > 1 && (strcmp(arg1, "status") == 0)) { cl.type = STATUS; optind = 2; } else if (argc > 1 && (strcmp(arg1, "client") == 0)) { cl.type = CLIENT; if (argc < 3) { print_usage(); exit(EXIT_FAILURE); } op = argv[2]; optind = 3; } if (!cl.type) { cl.type = CLIENT; op = argv[1]; optind = 2; } if (argc < 2 || !strcmp(arg1, "help") || !strcmp(arg1, "--help") || !strcmp(arg1, "-h")) { if (cl.type == GEOSTORE) print_geostore_usage(); else print_usage(); exit(EXIT_SUCCESS); } if (!strcmp(arg1, "version") || !strcmp(arg1, "--version") || !strcmp(arg1, "-V")) { printf("%s %s\n", argv[0], RELEASE_STR); exit(EXIT_SUCCESS); } if (cl.type == CLIENT) { if (!strcmp(op, "list")) cl.op = CMD_LIST; else if (!strcmp(op, "grant")) cl.op = CMD_GRANT; else if (!strcmp(op, "revoke")) cl.op = CMD_REVOKE; else if (!strcmp(op, "peers")) cl.op = CMD_PEERS; else { fprintf(stderr, "client operation \"%s\" is unknown\n", op); exit(EXIT_FAILURE); } } else if (cl.type == GEOSTORE) { if (!strcmp(op, "list")) cl.op = ATTR_LIST; else if (!strcmp(op, "set")) cl.op = ATTR_SET; else if (!strcmp(op, "get")) cl.op = ATTR_GET; else if (!strcmp(op, "delete")) cl.op = ATTR_DEL; else { fprintf(stderr, "attribute operation \"%s\" is unknown\n", op); exit(EXIT_FAILURE); } } while (optind < argc) { optchar = getopt(argc, argv, opt_string); switch (optchar) { case 'c': if (strchr(optarg, '/')) { safe_copy(cl.configfile, optarg, sizeof(cl.configfile), "config file"); } else { /* If no "/" in there, use with default directory. */ strcpy(cl.configfile, BOOTH_DEFAULT_CONF_DIR); cp = cl.configfile + strlen(BOOTH_DEFAULT_CONF_DIR); assert(cp > cl.configfile); assert(*(cp-1) == '/'); /* Write at the \0, ie. after the "/" */ safe_copy(cp, optarg, (sizeof(cl.configfile) - (cp - cl.configfile) - strlen(BOOTH_DEFAULT_CONF_EXT)), "config name"); /* If no extension, append ".conf". * Space is available, see -strlen() above. */ if (!strchr(cp, '.')) strcat(cp, BOOTH_DEFAULT_CONF_EXT); } break; + case 'D': debug_level++; enable_stderr = 1; - /* Fall through */ + break; + case 'S': daemonize = 0; + enable_stderr = 1; break; case 'l': safe_copy(cl.lockfile, optarg, sizeof(cl.lockfile), "lock file"); break; case 't': if (cl.op == CMD_GRANT || cl.op == CMD_REVOKE) { safe_copy(cl.msg.ticket.id, optarg, sizeof(cl.msg.ticket.id), "ticket name"); } else if (cl.type == GEOSTORE) { safe_copy(cl.attr_msg.attr.tkt_id, optarg, sizeof(cl.attr_msg.attr.tkt_id), "ticket name"); } else { print_usage(); exit(EXIT_FAILURE); } break; case 's': /* For testing and debugging: allow "-s site" also for * daemon start, so that the address that should be used * can be set manually. * This makes it easier to start multiple processes * on one machine. */ if (cl.type == CLIENT || cl.type == GEOSTORE || (cl.type == DAEMON && debug_level)) { if (strcmp(optarg, OTHER_SITE) && host_convert(optarg, site_arg, INET_ADDRSTRLEN) == 0) { safe_copy(cl.site, site_arg, sizeof(cl.site), "site name"); } else { safe_copy(cl.site, optarg, sizeof(cl.site), "site name"); } } else { log_error("\"-s\" not allowed in daemon mode."); exit(EXIT_FAILURE); } break; case 'F': if (cl.type != CLIENT || cl.op != CMD_GRANT) { log_error("use \"-F\" only for client grant"); exit(EXIT_FAILURE); } cl.options |= OPT_IMMEDIATE; break; case 'w': if (cl.type != CLIENT || (cl.op != CMD_GRANT && cl.op != CMD_REVOKE)) { log_error("use \"-w\" only for grant and revoke"); exit(EXIT_FAILURE); } cl.options |= OPT_WAIT; break; case 'C': if (cl.type != CLIENT || cl.op != CMD_GRANT) { log_error("use \"-C\" only for grant"); exit(EXIT_FAILURE); } cl.options |= OPT_WAIT | OPT_WAIT_COMMIT; break; case 'h': if (cl.type == GEOSTORE) print_geostore_usage(); else print_usage(); exit(EXIT_SUCCESS); break; case ':': case '?': fprintf(stderr, "Please use '-h' for usage.\n"); exit(EXIT_FAILURE); break; case -1: /* No more parameters on cmdline, only arguments. */ goto extra_args; default: goto unknown; }; } return 0; extra_args: if (cl.type == CLIENT && !cl.msg.ticket.id[0]) { cparg(cl.msg.ticket.id, "ticket name"); } else if (cl.type == GEOSTORE) { if (cl.op != ATTR_LIST) { cparg(cl.attr_msg.attr.name, "attribute name"); } if (cl.op == ATTR_SET) { cparg(cl.attr_msg.attr.val, "attribute value"); } } if (optind == argc) return 0; left = argc - optind; fprintf(stderr, "Superfluous argument%s: %s%s\n", left == 1 ? "" : "s", argv[optind], left == 1 ? "" : "..."); exit(EXIT_FAILURE); unknown: fprintf(stderr, "unknown option: %s\n", argv[optind]); exit(EXIT_FAILURE); missingarg: fprintf(stderr, "not enough arguments\n"); exit(EXIT_FAILURE); } static void set_scheduler(void) { struct sched_param sched_param; struct rlimit rlimit; int rv; rlimit.rlim_cur = RLIM_INFINITY; rlimit.rlim_max = RLIM_INFINITY; setrlimit(RLIMIT_MEMLOCK, &rlimit); rv = mlockall(MCL_CURRENT | MCL_FUTURE); if (rv < 0) { log_error("mlockall failed"); } rv = sched_get_priority_max(SCHED_RR); if (rv != -1) { sched_param.sched_priority = rv; rv = sched_setscheduler(0, SCHED_RR, &sched_param); if (rv == -1) log_error("could not set SCHED_RR priority %d: %s (%d)", sched_param.sched_priority, strerror(errno), errno); } else { log_error("could not get maximum scheduler priority err %d", errno); } } static int set_procfs_val(const char *path, const char *val) { int rc = -1; FILE *fp = fopen(path, "w"); if (fp) { if (fprintf(fp, "%s", val) > 0) rc = 0; fclose(fp); } return rc; } static int do_status(int type) { pid_t pid; int rv, status_lock_fd, ret; const char *reason = NULL; char lockfile_data[1024], *cp; ret = PCMK_OCF_NOT_RUNNING; rv = setup_config(type); if (rv) { reason = "Error reading configuration."; ret = PCMK_OCF_UNKNOWN_ERROR; goto quit; } if (!local) { reason = "No Service IP active here."; goto quit; } rv = _lockfile(O_RDWR, &status_lock_fd, &pid); if (status_lock_fd == -1) { reason = "No PID file."; goto quit; } if (rv == 0) { close(status_lock_fd); reason = "PID file not locked."; goto quit; } if (pid) { fprintf(stdout, "booth_lockpid=%d ", pid); fflush(stdout); } rv = read(status_lock_fd, lockfile_data, sizeof(lockfile_data) - 1); if (rv < 4) { close(status_lock_fd); reason = "Cannot read lockfile data."; ret = PCMK_LSB_UNKNOWN_ERROR; goto quit; } lockfile_data[rv] = 0; close(status_lock_fd); /* Make sure it's only a single line */ cp = strchr(lockfile_data, '\r'); if (cp) *cp = 0; cp = strchr(lockfile_data, '\n'); if (cp) *cp = 0; rv = setup_tcp_listener(1); if (rv == 0) { reason = "TCP port not in use."; goto quit; } fprintf(stdout, "booth_lockfile='%s' %s\n", cl.lockfile, lockfile_data); if (!daemonize) fprintf(stderr, "Booth at %s port %d seems to be running.\n", local->addr_string, booth_conf->port); return 0; quit: log_debug("not running: %s", reason); /* Ie. "DEBUG" */ if (!daemonize) fprintf(stderr, "not running: %s\n", reason); return ret; } static int limit_this_process(void) { int rv; if (!is_root()) return 0; if (setregid(booth_conf->gid, booth_conf->gid) < 0) { rv = errno; log_error("setregid() didn't work: %s", strerror(rv)); return rv; } if (setreuid(booth_conf->uid, booth_conf->uid) < 0) { rv = errno; log_error("setreuid() didn't work: %s", strerror(rv)); return rv; } return 0; } static int lock_fd = -1; static void server_exit(void) { int rv; if (lock_fd >= 0) { /* We might not be able to delete it, but at least * make it empty. */ rv = ftruncate(lock_fd, 0); (void)rv; unlink_lockfile(lock_fd); } log_info("exiting"); } static void sig_exit_handler(int sig) { log_info("caught signal %d", sig); exit(0); } static int do_server(int type) { int rv = -1; static char log_ent[128] = DAEMON_NAME "-"; rv = setup_config(type); if (rv < 0) return rv; if (!local) { log_error("Cannot find myself in the configuration."); exit(EXIT_FAILURE); } if (daemonize) { if (daemon(0, 0) < 0) { perror("daemon error"); exit(EXIT_FAILURE); } } /* The lockfile must be written to _after_ the call to daemon(), so * that the lockfile contains the pid of the daemon, not the parent. */ lock_fd = create_lockfile(); if (lock_fd < 0) return lock_fd; atexit(server_exit); strcat(log_ent, type_to_string(local->type)); cl_log_set_entity(log_ent); cl_log_enable_stderr(enable_stderr ? TRUE : FALSE); cl_log_set_facility(HA_LOG_FACILITY); cl_inherit_logging_environment(0); log_info("BOOTH %s %s daemon is starting", type_to_string(local->type), RELEASE_STR); signal(SIGUSR1, (__sighandler_t)tickets_log_info); signal(SIGTERM, (__sighandler_t)sig_exit_handler); signal(SIGINT, (__sighandler_t)sig_exit_handler); /* we'll handle errors there and then */ signal(SIGPIPE, SIG_IGN); set_scheduler(); /* we don't want to be killed by the OOM-killer */ if (set_procfs_val("/proc/self/oom_score_adj", "-999")) (void)set_procfs_val("/proc/self/oom_adj", "-16"); set_proc_title("%s %s %s for [%s]:%d", DAEMON_NAME, cl.configfile, type_to_string(local->type), local->addr_string, booth_conf->port); rv = limit_this_process(); if (rv) return rv; #ifdef COREDUMP_NURSING if (cl_enable_coredumps(TRUE) < 0){ log_error("enabling core dump failed"); } cl_cdtocoredir(); prctl(PR_SET_DUMPABLE, (unsigned long)TRUE, 0UL, 0UL, 0UL); #else if (chdir(BOOTH_CORE_DIR) < 0) { log_error("cannot change working directory to %s", BOOTH_CORE_DIR); } #endif signal(SIGCHLD, (__sighandler_t)wait_child); rv = loop(lock_fd); return rv; } static int do_client(void) { int rv; rv = setup_config(CLIENT); if (rv < 0) { log_error("cannot read config"); goto out; } switch (cl.op) { case CMD_LIST: case CMD_PEERS: rv = query_get_string_answer(cl.op); break; case CMD_GRANT: case CMD_REVOKE: rv = do_command(cl.op); break; } out: return rv; } static int do_attr(void) { int rv = -1; rv = setup_config(GEOSTORE); if (rv < 0) { log_error("cannot read config"); goto out; } /* We don't check for existence of ticket, so that asking can be * done without local configuration, too. * Although, that means that the UDP port has to be specified, too. */ if (!cl.attr_msg.attr.tkt_id[0]) { /* If the loaded configuration has only a single ticket defined, use that. */ if (booth_conf->ticket_count == 1) { strncpy(cl.attr_msg.attr.tkt_id, booth_conf->ticket[0].name, sizeof(cl.attr_msg.attr.tkt_id)); } else { rv = 1; log_error("No ticket given."); goto out; } } switch (cl.op) { case ATTR_LIST: case ATTR_GET: rv = query_get_string_answer(cl.op); break; case ATTR_SET: case ATTR_DEL: rv = do_attr_command(cl.op); break; } out: return rv; } int main(int argc, char *argv[], char *envp[]) { int rv; const char *cp; #ifdef LOGGING_LIBQB enum qb_log_target_slot i; #endif init_set_proc_title(argc, argv, envp); get_time(&start_time); memset(&cl, 0, sizeof(cl)); strncpy(cl.configfile, BOOTH_DEFAULT_CONF, BOOTH_PATH_LEN - 1); cl.lockfile[0] = 0; debug_level = 0; cp = ((cp = strstr(argv[0], ATTR_PROG)) && !strcmp(cp, ATTR_PROG) ? ATTR_PROG : "booth"); #ifndef LOGGING_LIBQB cl_log_set_entity(cp); #else qb_log_init(cp, LOG_USER, LOG_DEBUG); /* prio driven by debug_level */ for (i = QB_LOG_TARGET_START; i < QB_LOG_TARGET_MAX; i++) { if (i == QB_LOG_SYSLOG || i == QB_LOG_BLACKBOX) continue; qb_log_format_set(i, "%t %H %N: [%P]: %p: %b"); } (void) qb_log_filter_ctl(QB_LOG_STDERR, QB_LOG_FILTER_ADD, QB_LOG_FILTER_FILE, "*", LOG_DEBUG); #endif cl_log_enable_stderr(TRUE); cl_log_set_facility(0); rv = read_arguments(argc, argv); if (rv < 0) goto out; switch (cl.type) { case STATUS: rv = do_status(cl.type); break; case ARBITRATOR: case DAEMON: case SITE: rv = do_server(cl.type); break; case CLIENT: rv = do_client(); break; case GEOSTORE: rv = do_attr(); break; } out: #ifdef LOGGING_LIBQB qb_log_fini(); #endif /* Normalize values. 0x100 would be seen as "OK" by waitpid(). */ return (rv >= 0 && rv < 0x70) ? rv : 1; } diff --git a/test/boothrunner.py b/test/boothrunner.py index ac56df8..f9154e7 100755 --- a/test/boothrunner.py +++ b/test/boothrunner.py @@ -1,108 +1,111 @@ #!/usr/bin/python import os import subprocess import time import unittest class BoothRunner: default_config_file = '/etc/booth/booth.conf' default_lock_file = '/var/run/booth.pid' def __init__(self, boothd_path, mode, args): self.boothd_path = boothd_path self.args = [ mode ] self.final_args = args # will be appended to self.args self.mode = mode self.config_file = None self.lock_file = None def set_config_file_arg(self): self.args += [ '-c', self.config_file ] def set_config_file(self, config_file): self.config_file = config_file self.set_config_file_arg() def set_lock_file(self, lock_file): self.lock_file = lock_file self.args += [ '-l', self.lock_file ] def set_debug(self): self.args += [ '-D' ] + def set_foreground(self): + self.args += [ '-S' ] + def all_args(self): return [ self.boothd_path ] + self.args + self.final_args def show_output(self, stdout, stderr): if stdout: print "STDOUT:" print "------" print stdout, if stderr: print "STDERR: (N.B. crm_ticket failures indicate daemon started correctly)" print "------" print stderr, print "-" * 70 def subproc_completed_within(self, p, timeout): start = time.time() wait = 0.1 while True: if p.poll() is not None: return True elapsed = time.time() - start if elapsed + wait > timeout: wait = timeout - elapsed print "Waiting on %d for %.1fs ..." % (p.pid, wait) time.sleep(wait) elapsed = time.time() - start if elapsed >= timeout: return False wait *= 2 def lock_file_used(self): return self.lock_file or self.default_lock_file def config_file_used(self): return self.config_file or self.default_config_file def config_text_used(self): config_file = self.config_file_used() try: c = open(config_file) except: return None text = "".join(c.readlines()) c.close() text = text.replace('\t', '') text = text.replace('\n', '|\n') return text def show_args(self): print "\n" print "-" * 70 print "Running", ' '.join(self.all_args()) msg = "with config from %s" % self.config_file_used() config_text = self.config_text_used() if config_text is not None: msg += ": [%s]" % config_text print msg def run(self): p = subprocess.Popen(self.all_args(), stdout=subprocess.PIPE, stderr=subprocess.PIPE) if not p: raise RuntimeError, "failed to start subprocess" print "Started subprocess pid %d" % p.pid completed = self.subproc_completed_within(p, 2) if completed: (stdout, stderr) = p.communicate() self.show_output(stdout, stderr) return (p.pid, p.returncode, stdout, stderr) return (p.pid, None, None, None) diff --git a/test/serverenv.py b/test/serverenv.py index 59ae6fa..d0467b9 100755 --- a/test/serverenv.py +++ b/test/serverenv.py @@ -1,203 +1,208 @@ #!/usr/bin/python import os import re import time import unittest from assertions import BoothAssertions from boothrunner import BoothRunner from boothtestenv import BoothTestEnvironment from utils import get_IP class ServerTestEnvironment(BoothTestEnvironment): ''' boothd site/arbitrator will hang in setup phase while attempting to connect to an unreachable peer during ticket_catchup(). In a test environment we don't have any reachable peers. Fortunately, we can still successfully launch a daemon by only listing our own IP in the config file. ''' typical_config = """\ # This is like the config in the manual transport="UDP" port="9929" # Here's another comment #arbitrator="147.2.207.14" site="147.4.215.19" #site="147.18.2.1" ticket="ticketA" ticket="ticketB" """ site_re = re.compile('^site=".+"', re.MULTILINE) working_config = re.sub(site_re, 'site="%s"' % get_IP(), typical_config, 1) def run_booth(self, expected_exitcode, expected_daemon, config_text=None, config_file=None, lock_file=True, - args=[], debug=False): + args=[], debug=False, foreground=False): ''' Runs boothd. Defaults to using a temporary lock file and the standard config file path. There are four possible types of outcome: - boothd exits non-zero without launching a daemon (setup phase failed, e.g. due to invalid configuration file) - boothd exits zero after launching a daemon (successful operation) - - boothd does not exit (running in foreground / debug mode) + - boothd does not exit (running in foreground mode) - boothd does not exit (setup phase hangs, e.g. while attempting to connect to peer during ticket_catchup()) Arguments: config_text a string containing the contents of a configuration file to use config_file path to a configuration file to use lock_file False: don't pass a lockfile parameter to booth via -l True: pass a temporary lockfile parameter to booth via -l string: pass the given lockfile path to booth via -l args array of extra args to pass to booth expected_exitcode an integer, or False if booth is not expected to terminate within the timeout expected_daemon - True iff a daemon is expected to be launched (this includes - running the server in debug / foreground mode via -D; even - though in this case the server's not technically not a daemon, + True iff a daemon is expected to be launched (this means + running the server in foreground mode via -S; + even though in this case the server's not technically not a daemon, we still want to treat it like one by checking the lockfile before and after we kill it) debug True means pass the -D parameter + foreground + True means pass the -S parameter Returns a (pid, return_code, stdout, stderr, runner) tuple, where return_code/stdout/stderr are None iff pid is still running. ''' if expected_daemon and expected_exitcode is not None and expected_exitcode != 0: raise RuntimeError, \ "Shouldn't ever expect daemon to start and then failure" if not expected_daemon and expected_exitcode == 0: raise RuntimeError, \ "Shouldn't ever expect success without starting daemon" self.init_log() runner = BoothRunner(self.boothd_path, self.mode, args) if config_text: config_file = self.write_config_file(config_text) if config_file: runner.set_config_file(config_file) if lock_file is True: lock_file = os.path.join(self.test_path, 'boothd-lock.pid') if lock_file: runner.set_lock_file(lock_file) if debug: runner.set_debug() + if foreground: + runner.set_foreground() + runner.show_args() (pid, return_code, stdout, stderr) = runner.run() self.check_return_code(pid, return_code, expected_exitcode) if expected_daemon: self.check_daemon_handling(runner, expected_daemon) elif return_code is None: # This isn't strictly necessary because we ensure no # daemon is running from within test setUp(), but it's # probably a good idea to tidy up after ourselves anyway. self.kill_pid(pid) return (pid, return_code, stdout, stderr, runner) def write_config_file(self, config_text): config_file = self.get_tempfile('config') c = open(config_file, 'w') c.write(config_text) c.close() return config_file def kill_pid(self, pid): print "killing %d ..." % pid os.kill(pid, 15) print "killed" def check_daemon_handling(self, runner, expected_daemon): ''' Check that the lock file contains a pid referring to a running daemon. Then kill the daemon, and ensure that the lock file vanishes (bnc#749763). ''' daemon_pid = self.get_daemon_pid_from_lock_file(runner.lock_file) err = "lock file should contain pid" if not expected_daemon: err += ", even though we didn't expect a daemon" self.assertTrue(daemon_pid is not None, err) daemon_running = self.is_pid_running_daemon(daemon_pid) err = "pid in lock file should refer to a running daemon" self.assertTrue(daemon_running, err) if daemon_running: self.kill_pid(int(daemon_pid)) time.sleep(1) daemon_pid = self.get_daemon_pid_from_lock_file(runner.lock_file) self.assertTrue(daemon_pid is None, 'bnc#749763: lock file should vanish after daemon is killed') def get_daemon_pid_from_lock_file(self, lock_file): ''' Returns the pid contained in lock_file, or None if it doesn't exist. ''' if not os.path.exists(lock_file): print "%s does not exist" % lock_file return None l = open(lock_file) lines = l.readlines() l.close() self.assertEqual(len(lines), 1, "Lock file should contain one line") pid = re.search('\\bbooth_pid="?(\\d+)"?', lines[0]).group(1) print "lockfile contains: <%s>" % pid return pid def is_pid_running_daemon(self, pid): ''' Returns true iff the given pid refers to a running boothd process. ''' path = "/proc/%s" % pid pid_running = os.path.isdir(path) # print "======" # import subprocess # print subprocess.check_output(['lsof', '-p', pid]) # print subprocess.check_output(['ls', path]) # print subprocess.check_output(['cat', "/proc/%s/cmdline" % pid]) # print "======" if not pid_running: return False c = open("/proc/%s/cmdline" % pid) cmdline = "".join(c.readlines()) print cmdline c.close() if cmdline.find('boothd') == -1: print 'no boothd in cmdline:', cmdline return False # self.assertRegexpMatches( # cmdline, # 'boothd', # "lock file should refer to pid of running boothd" # ) return True def _test_buffer_overflow(self, expected_error, **args): (pid, ret, stdout, stderr, runner) = \ self.run_booth(expected_exitcode=1, expected_daemon=False, **args) self.assertRegexpMatches(stderr, expected_error) diff --git a/test/servertests.py b/test/servertests.py index 290bce2..f574f26 100755 --- a/test/servertests.py +++ b/test/servertests.py @@ -1,142 +1,152 @@ #!/usr/bin/python import copy from pprint import pprint, pformat import re import string from serverenv import ServerTestEnvironment class ServerTests(ServerTestEnvironment): # We don't know enough about the build/test system to rely on the # existence, permissions, contents of the default config file. So # we can't predict (and hence test) how booth will behave when -c # is not specified. # # def test_no_args(self): # # If do_server() called lockfile() first then this would be # # the appropriate test: # #self.assertLockFileError(lock_file=False) # # # If do_server() called setup() first, and the default # # config file was readable non-root, then this would be the # # appropriate test: # self.configFileMissingMyIP(lock_file=False) # # def test_custom_lock_file(self): # (pid, ret, stdout, stderr, runner) = \ # self.run_booth(expected_exitcode=1, expected_daemon=False) # self.assertRegexpMatches( # stderr, # 'failed to open %s: ' % runner.config_file_used(), # 'should fail to read default config file' # ) def test_example_config(self): self.configFileMissingMyIP(config_file=self.example_config_path) def test_config_file_buffer_overflow(self): # https://bugzilla.novell.com/show_bug.cgi?id=750256 longfile = (string.lowercase * 5)[:127] expected_error = "'%s' exceeds maximum config name length" % longfile self._test_buffer_overflow(expected_error, config_file=longfile) def test_lock_file_buffer_overflow(self): # https://bugzilla.novell.com/show_bug.cgi?id=750256 longfile = (string.lowercase * 5)[:127] expected_error = "'%s' exceeds maximum lock file length" % longfile self._test_buffer_overflow(expected_error, lock_file=longfile) def test_working_config(self): (pid, ret, stdout, stderr, runner) = \ self.run_booth(expected_exitcode=0, expected_daemon=True, config_text=self.working_config) def test_missing_quotes(self): # quotes no longer required return True orig_lines = self.working_config.split("\n") for i in xrange(len(orig_lines)): new_lines = copy.copy(orig_lines) new_lines[i] = new_lines[i].replace('"', '') new_config = "\n".join(new_lines) line_contains_IP = re.search('^\s*(site|arbitrator)=.*[0-9]\.', orig_lines[i]) if line_contains_IP: # IP addresses need to be surrounded by quotes, # so stripping them should cause it to fail expected_exitcode = 1 expected_daemon = False else: expected_exitcode = 0 expected_daemon = True (pid, ret, stdout, stderr, runner) = \ self.run_booth(config_text=new_config, expected_exitcode=expected_exitcode, expected_daemon=expected_daemon) if line_contains_IP: self.assertRegexpMatches( self.read_log(), "(ERROR|error): invalid config file format: unquoted '.'", 'IP addresses need to be quoted' ) def test_debug_mode(self): (pid, ret, stdout, stderr, runner) = \ self.run_booth(config_text=self.working_config, debug=True, + expected_exitcode=0, expected_daemon=True) + + def test_foreground_mode(self): + (pid, ret, stdout, stderr, runner) = \ + self.run_booth(config_text=self.working_config, foreground=True, + expected_exitcode=None, expected_daemon=True) + + def test_debug_and_foreground_mode(self): + (pid, ret, stdout, stderr, runner) = \ + self.run_booth(config_text=self.working_config, debug=True, foreground=True, expected_exitcode=None, expected_daemon=True) def test_missing_transport(self): # UDP is default -- TODO? return True config = re.sub('transport=.+\n', '', self.typical_config) (pid, ret, stdout, stderr, runner) = \ self.run_booth(config_text=config, expected_exitcode=1, expected_daemon=False) self.assertRegexpMatches( self.read_log(), 'config file was missing transport line' ) def test_invalid_transport_protocol(self): config = re.sub('transport=.+', 'transport=SNEAKERNET', self.typical_config) (pid, ret, stdout, stderr, runner) = \ self.run_booth(config_text=config, expected_exitcode=1, expected_daemon=False) self.assertRegexpMatches(stderr, 'invalid transport protocol') def test_missing_final_newline(self): config = re.sub('\n$', '', self.working_config) (pid, ret, stdout, stderr, runner) = \ self.run_booth(config_text=config, expected_exitcode=0, expected_daemon=True) def test_a_few_trailing_whitespaces(self): for ws in (' ', ' '): new_config = self.working_config.replace("\n", ws + "\n", 3) (pid, ret, stdout, stderr, runner) = \ self.run_booth(config_text=new_config, expected_exitcode=0, expected_daemon=True) def test_trailing_space_everywhere(self): for ws in (' ', ' '): new_config = self.working_config.replace("\n", ws + "\n") (pid, ret, stdout, stderr, runner) = \ self.run_booth(config_text=new_config, expected_exitcode=0, expected_daemon=True) def test_unquoted_space(self): for ticket in ('unquoted space', 'unquoted space man'): new_config = re.sub('ticket=.+', 'ticket=' + ticket, self.working_config, 1) (pid, ret, stdout, stderr, runner) = \ self.run_booth(config_text=new_config, expected_exitcode=1, expected_daemon=False) self.assertRegexpMatches(stderr, 'ticket name "' + ticket + '" invalid') def test_unreachable_peer(self): # what should this test do? daemon not expected, but no exitcode either? # booth would now just run, and try to reach that peer... # TCP reachability is not required during startup anymore. return True config = re.sub('#(.+147.+)', lambda m: m.group(1), self.working_config) self.run_booth(config_text=config, expected_exitcode=None, expected_daemon=False)