diff --git a/docs/boothd.8.txt b/docs/boothd.8.txt index 7310ae7..123a226 100644 --- a/docs/boothd.8.txt +++ b/docs/boothd.8.txt @@ -1,502 +1,512 @@ BOOTHD(8) =========== :doctype: manpage NAME ---- boothd - The Booth Cluster Ticket Manager. SYNOPSIS -------- *boothd* 'daemon' [-SD] [-c 'config'] [-l 'lockfile'] *booth* 'list' [-s 'site'] [-c 'config'] *booth* 'grant' [-s 'site'] [-c 'config'] [-FCw] 'ticket' *booth* 'revoke' [-s 'site'] [-c 'config'] [-w] 'ticket' *booth* 'status' [-D] [-c 'config'] DESCRIPTION ----------- Booth manages tickets which authorizes one of the cluster sites located in geographically dispersed distances to run certain resources. It is designed to be extend Pacemaker to support geographically distributed clustering. It is based on the RAFT protocol, see eg. for details. SHORT EXAMPLES -------------- --------------------- # boothd daemon -D # booth list # booth grant ticket-nfs # booth revoke ticket-nfs --------------------- OPTIONS ------- *-c* 'configfile':: Configuration to use. + Can be a full path to a configuration file, or a short name; in the latter case, the directory '/etc/booth' and suffix '.conf' are added. Per default 'booth' is used, which results in the path '/etc/booth/booth.conf'. + The configuration name also determines the name of the PID file - for the defaults, '/var/run/booth/booth.pid'. *-s*:: Site address or name. *-F*:: 'immediate grant': Don't wait for unreachable sites to relinquish the ticket. See the 'Booth ticket management' section below for more details. + This option may be DANGEROUS. It makes booth grant the ticket even though it cannot ascertain that unreachable sites don't hold the same ticket. It is up to the user to make sure that unreachable sites don't have this ticket as granted. *-w*:: 'wait for the request outcome': The client waits for the final outcome of grant or revoke request. *-C*:: 'wait for ticket commit to CIB': The client waits for the ticket commit to CIB (only for grant requests). If one or more sites are unreachable, this takes the ticket expire time (plus, if defined, the 'acquire-after' time). *-h*, *--help*:: Give a short usage output. *--version*:: Report version information. *-S*:: 'systemd' mode: don't fork. This is like '-D' but without the debug output. *-D*:: Debug output/don't daemonize. Increases the debug output level; booth daemon remains in the foreground. *-l* 'lockfile':: Use another lock file. By default, the lock file name is inferred from the configuration file name. Normally not needed. COMMANDS -------- Whether the binary is called as 'boothd' or 'booth' doesn't matter; the first argument determines the mode of operation. *'daemon'*:: Tells 'boothd' to serve a site. The locally configured interfaces are searched for an IP address that is defined in the configuration. booth then runs in either /arbitrator/ or /site/ mode. *'client'*:: Booth clients can list the ticket information (see also 'crm_ticket -L'), and revoke or grant tickets to a site. + The grant and, under certain circumstances, revoke operations may take a while to return a definite operation's outcome. The client will wait up to the network timeout value (by default 5 seconds) for the result. Unless the '-w' option was set, in which case the client waits indefinitely. + In this mode the configuration file is searched for an IP address that is locally reachable, ie. matches a configured subnet. This allows to run the client commands on another node in the same cluster, as long as the config file and the service IP is locally reachable. + For instance, if the booth service IP is 192.168.55.200, and the local node has 192.168.55.15 configured on one of its network interfaces, it knows which site it belongs to. + Use '-s' to direct client to connect to a different site. *'status'*:: 'boothd' looks for the (locked) PID file and the UDP socket, prints some output to stdout (for use in shell scripts) and returns an OCF-compatible return code. With '-D', a human-readable message is printed to STDERR as well. CONFIGURATION FILE ------------------ The configuration file must be identical on all sites and arbitrators. A minimal file may look like this: ----------------------- site="192.168.201.100" site="192.168.202.100" arbitrator="192.168.203.100" ticket="ticket-db8" ----------------------- Comments start with a hash-sign (''#''). Whitespace at the start and end of the line, and around the ''='', are ignored. The following key/value pairs are defined: *'port'*:: The UDP/TCP port to use. Default is '9929'. *'transport'*:: The transport protocol to use for Raft exchanges. Currently only UDP is supported. + Clients use TCP to communicate with a daemon; Booth will always bind and listen to both UDP and TCP ports. *'authfile'*:: File containing the authentication key. The key can be either binary or text. If the latter, then both leading and trailing white space, including new lines, is ignored. This key is a shared secret and used to authenticate both clients and servers. The key must be between 8 and 64 characters long and be readable only by the file owner. +*'maxtimeskew'*:: + As protection against replay attacks, packets contain + generation timestamps. Such a timestamp is not allowed to be + too old. Just how old can be specified with this parameter. + The value is in seconds and the default is 600 (10 minutes). + If clocks vary more than this default between sites and nodes + (which is definitely something you should fix) then set this + parameter to a higher value. The time skew test is performed + only in concert with authentication. + *'site'*:: Defines a site Raft member with the given IP. Sites can acquire tickets. The sites' IP should be managed by the cluster. *'arbitrator'*:: Defines an arbitrator Raft member with the given IP. Arbitrators help reach consensus in elections and cannot hold tickets. Booth needs at least three members for normal operation. Odd number of members provides more redundancy. *'site-user'*, *'site-group'*, *'arbitrator-user'*, *'arbitrator-group'*:: These define the credentials 'boothd' will be running with. + On a (Pacemaker) site the booth process will have to call 'crm_ticket', so the default is to use 'hacluster':'haclient'; for an arbitrator this user and group might not exists, so there we default to 'nobody':'nobody'. *'ticket'*:: Registers a ticket. Multiple tickets can be handled by single Booth instance. + Use the special ticket name '__defaults__' to modify the defaults. The '__defaults__' stanza must precede all the other ticket specifications. All times are in seconds. *'expire'*:: The lease time for a ticket. After that time the ticket can be acquired by another site if the ticket holder is not reachable. + The default is '600'. *'acquire-after'*:: Once a ticket is lost, wait this time in addition before acquiring the ticket. + This is to allow for the site that lost the ticket to relinquish the resources, by either stopping them or fencing a node. + A typical delay might be 60 seconds, but ultimately it depends on the protected resources and the fencing configuration. + The default is '0'. *'renewal-freq'*:: Set the ticket renewal frequency period. + If the network reliability is often reduced over prolonged periods, it is advisable to try to renew more often. + Before every renewal, if defined, the command specified in 'before-acquire-handler' is run. In that case the 'renewal-freq' parameter is effectively also the local cluster monitoring interval. *'timeout'*:: After that time 'booth' will re-send packets if there was an insufficient number of replies. This should be long enough to allow packets to reach other members. + The default is '5'. *'retries'*:: Defines how many times to retry sending packets before giving up waiting for acks from other members. + Default is '10'. Values lower than 3 are illegal. + Ticket renewals should allow for this number of retries. Hence, the total retry time must be shorter than the renewal time (either half the expire time or *'renewal-freq'*): timeout*(retries+1) < renewal *'weights'*:: A comma-separated list of integers that define the weight of individual Raft members, in the same order as the 'site' and 'arbitrator' lines. + Default is '0' for all; this means that the order in the configuration file defines priority for conflicting requests. *'before-acquire-handler'*:: If set, this command will be called before 'boothd' tries to acquire or renew a ticket. On exit code other than 0, 'boothd' relinquishes the ticket. + Thus it is possible to ensure whether the services and its dependencies protected by the ticket are in good shape at this site. For instance, if a service in the dependency-chain has a failcount of 'INFINITY' on all available nodes, the service will be unable to run. In that case, it is of no use to claim the ticket. + 'boothd' waits synchronously for the result of the handler, so make sure that the program returns quickly. + See below for details about booth specific environment variables. The distributed 'service-runnable' script is an example which may be used to test whether a pacemaker resource can be started. One example of a booth configuration file: ----------------------- transport = udp port = 9930 # D-85774 site="192.168.201.100" # D-90409 site="::ffff:192.168.202.100" # A-1120 arbitrator="192.168.203.100" ticket="ticket-db8" expire = 600 acquire-after = 60 timeout = 10 retries = 5 renewal-freq = 60 before-acquire-handler = /usr/share/booth/service-runnable db8 ----------------------- BOOTH TICKET MANAGEMENT ----------------------- The booth cluster guarantees that every ticket is owned by only one site at the time. Tickets must be initially granted with the 'booth client grant' command. Once it gets granted, the ticket is managed by the booth cluster. Hence, only granted tickets are managed by 'booth'. If the ticket gets lost, i.e. that the other members of the booth cluster do not hear from the ticket owner in a sufficiently long time, one of the remaining sites will acquire the ticket. This is what is called _ticket failover_. If the remaining members cannot form a majority, then the ticket cannot fail over. A ticket may be revoked at any time with the 'booth client revoke' command. For revoke to succeed, the site holding the ticket must be reachable. Once the ticket is administratively revoked, it is not managed by the booth cluster anymore. For the booth cluster to start managing the ticket again, it must be again granted to a site. The grant operation, in case not all sites are reachable, may get delayed for the ticket expire time (and, if defined, the 'acquire-after' time). The reason is that the other booth members may not know if the ticket is currently granted at the unreachable site. This delay may be disabled with the '-F' option. In that case, it is up to the administrator to make sure that the unreachable site is not holding the ticket. When the ticket is managed by 'booth', it is dangerous to modify it manually using either `crm_ticket` command or `crm site ticket`. Neither of these tools is aware of 'booth' and, consequently, 'booth' itself may not be aware of any ticket status changes. A notable exception is setting the ticket to standby which is typically done before a planned failover. NOTES ----- Tickets are not meant to be moved around quickly, the default 'expire' time is 600 seconds (10 minutes). 'booth' works with both IPv4 and IPv6 addresses. 'booth' renews a ticket before it expires, to account for possible transmission delays. The renewal time, unless explicitly set, is set to half the 'expire' time. HANDLERS -------- Currently, there's only one external handler defined (see the 'before-acquire-handler' configuration item above). The following environment variables are exported to the handler: *'BOOTH_TICKET':: The ticket name, as given in the configuration file. (See 'ticket' item above.) *'BOOTH_LOCAL':: The local site name, as defined in 'site'. *'BOOTH_CONF_PATH':: The path to the active configuration file. *'BOOTH_CONF_NAME':: The configuration name, as used by the '-c' commandline argument. *'BOOTH_TICKET_EXPIRES':: When the ticket expires (in seconds since 1.1.1970), or '0'. The handler is invoked with positional arguments specified after it. FILES ----- *'/etc/booth/booth.conf'*:: The default configuration file name. See also the '-c' argument. *'/etc/booth/authkey'*:: There is no default, but this is a typical location for the shared secret (authentication key). *'/var/run/booth/'*:: Directory that holds PID/lock files. See also the 'status' command. RAFT IMPLEMENTATION ------------------- In essence, every ticket corresponds to a separate Raft cluster. A ticket is granted to the Raft _Leader_ which then owns (or keeps) the ticket. ARBITRATOR MANAGEMENT --------------------- The booth daemon for an arbitrator which typically doesn't run the cluster stack, may be started through systemd or with '/etc/init.d/booth-arbitrator', depending on which init system the platform supports. The SysV init script starts a booth arbitrator for every configuration file found in '/etc/booth'. Platforms running systemd can enable and start every configuration separately using 'systemctl': ----------- # systemctl enable booth@ # systemctl start booth@ ----------- 'systemctl' requires the configuration name, even for the default name 'booth'. EXIT STATUS ----------- *0*:: Success. For the 'status' command: Daemon running. *1* (PCMK_OCF_UNKNOWN_ERROR):: General error code. *7* (PCMK_OCF_NOT_RUNNING):: No daemon process for that configuration active. BUGS ---- Booth is tested regularly. See the `README-testing` file for more information. Please report any bugs either at GitHub: Or, if you prefer bugzilla, at openSUSE bugzilla (component "High Availability"): https://bugzilla.opensuse.org/enter_bug.cgi?product=openSUSE%20Factory AUTHOR ------ 'boothd' was originally written (mostly) by Jiaju Zhang. In 2013 and 2014 Philipp Marek took over maintainership. Since April 2014 it has been mainly developed by Dejan Muhamedagic. Many people contributed (see the `AUTHORS` file). RESOURCES --------- GitHub: Documentation: COPYING ------- Copyright (C) 2011 Jiaju Zhang Copyright (C) 2013-2014 Philipp Marek Copyright (C) 2014 Dejan Muhamedagic Free use of this software is granted under the terms of the GNU General Public License (GPL). // vim: set ft=asciidoc : diff --git a/src/booth.h b/src/booth.h index 91a1967..6ffd8cb 100644 --- a/src/booth.h +++ b/src/booth.h @@ -1,308 +1,323 @@ /* * Copyright (C) 2011 Jiaju Zhang * Copyright (C) 2013-2014 Philipp Marek * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This software is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public * License along with this library; if not, write to the Free Software * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. */ #ifndef _BOOTH_H #define _BOOTH_H #include #include #include #include #include +#include "timer.h" #define BOOTH_RUN_DIR "/var/run/booth/" #define BOOTH_LOG_DIR "/var/log" #define BOOTH_LOGFILE_NAME "booth.log" #define BOOTH_DEFAULT_CONF_DIR "/etc/booth/" #define BOOTH_DEFAULT_CONF_NAME "booth" #define BOOTH_DEFAULT_CONF_EXT ".conf" #define BOOTH_DEFAULT_CONF \ BOOTH_DEFAULT_CONF_DIR BOOTH_DEFAULT_CONF_NAME BOOTH_DEFAULT_CONF_EXT #define DAEMON_NAME "boothd" #define BOOTH_PATH_LEN 127 #define BOOTH_MAX_KEY_LEN 64 #define BOOTH_MIN_KEY_LEN 8 /* hash size is 160 bits (sha1), but add a bit more space in case * stronger hashes are required */ #define BOOTH_MAC_SIZE 24 +/* tolerate packets which are not older than 10 minutes */ +#define BOOTH_DEFAULT_MAX_TIME_SKEW 600 + #define BOOTH_DEFAULT_PORT 9929 /* TODO: remove */ #define BOOTH_PROTO_FAMILY AF_INET #define BOOTHC_MAGIC 0x5F1BA08C #define BOOTHC_VERSION 0x00010003 /** Timeout value for poll(). * Determines frequency of periodic jobs, eg. when send-retries are done. * See process_tickets(). */ #define POLL_TIMEOUT 100 /** @{ */ /** The on-network data structures and constants. */ #define BOOTH_NAME_LEN 64 #define CHAR2CONST(a,b,c,d) ((a << 24) | (b << 16) | (c << 8) | d) /* Says that the ticket shouldn't be active anywhere. * NONE wouldn't be specific enough. */ #define NO_ONE ((uint32_t)-1) /* Says that another one should recover. */ #define TICKET_LOST CHAR2CONST('L', 'O', 'S', 'T') typedef unsigned char boothc_site [BOOTH_NAME_LEN]; typedef unsigned char boothc_ticket[BOOTH_NAME_LEN]; struct boothc_header { - /** Authentication data; not used now. */ + /** Generation info (used for authentication) + * This is something that would need to be monotone + * incremental. CLOCK_MONOTONIC should fit the purpose. On + * failover, however, it may happen that the new host has a + * clock which is significantly behind the clock of old host. + * We'll need to relax a bit for the nodes which are starting + * (just accept all OP_STATUS). + */ uint32_t iv; - uint32_t auth1; - uint32_t auth2; + uint32_t secs; /* seconds */ + uint32_t usecs; /* microseconds */ /** BOOTHC_MAGIC */ uint32_t magic; /** BOOTHC_VERSION */ uint32_t version; /** Packet source; site_id. See add_site(). */ uint32_t from; /** Length including header */ uint32_t length; /** The command respectively protocol state. See cmd_request_t. */ uint32_t cmd; /** The matching request (what do we reply to). See cmd_request_t. */ uint32_t request; /** Command options. */ uint32_t options; /** The reason for this RPC. */ uint32_t reason; /** Result of operation. 0 == OK */ uint32_t result; char data[0]; } __attribute__((packed)); struct ticket_msg { /** Ticket name. */ boothc_ticket id; /** Current leader. May be NO_ONE. See add_site(). * For a OP_REQ_VOTE this is */ uint32_t leader; /** Current term. */ uint32_t term; uint32_t term_valid_for; /* Perhaps we need to send a status along, too - like * starting, running, stopping, error, ...? */ } __attribute__((packed)); struct hmac { /** hash id, currently set to constant BOOTH_HASH */ uint32_t hid; /** the calculated hash, BOOTH_MAC_SIZE is big enough to * accommodate the hash of type hid */ unsigned char hash[BOOTH_MAC_SIZE]; } __attribute__((packed)); struct boothc_hdr_msg { struct boothc_header header; struct hmac hmac; } __attribute__((packed)); struct boothc_ticket_msg { struct boothc_header header; struct ticket_msg ticket; struct hmac hmac; } __attribute__((packed)); typedef enum { /* 0x43 = "C"ommands */ CMD_LIST = CHAR2CONST('C', 'L', 's', 't'), CMD_GRANT = CHAR2CONST('C', 'G', 'n', 't'), CMD_REVOKE = CHAR2CONST('C', 'R', 'v', 'k'), /* Replies */ CL_RESULT = CHAR2CONST('R', 's', 'l', 't'), CL_LIST = CHAR2CONST('R', 'L', 's', 't'), CL_GRANT = CHAR2CONST('R', 'G', 'n', 't'), CL_REVOKE = CHAR2CONST('R', 'R', 'v', 'k'), /* get status from another server */ OP_STATUS = CHAR2CONST('S', 't', 'a', 't'), OP_MY_INDEX = CHAR2CONST('M', 'I', 'd', 'x'), /* reply to status */ /* Raft */ OP_REQ_VOTE = CHAR2CONST('R', 'V', 'o', 't'), /* start election */ OP_VOTE_FOR = CHAR2CONST('V', 't', 'F', 'r'), /* reply to REQ_VOTE */ OP_HEARTBEAT= CHAR2CONST('H', 'r', 't', 'B'), /* Heartbeat */ OP_ACK = CHAR2CONST('A', 'c', 'k', '.'), /* Ack for heartbeats and revokes */ OP_UPDATE = CHAR2CONST('U', 'p', 'd', 'E'), /* Update ticket */ OP_REVOKE = CHAR2CONST('R', 'e', 'v', 'k'), /* Revoke ticket */ OP_REJECTED = CHAR2CONST('R', 'J', 'C', '!'), } cmd_request_t; typedef enum { /* for compatibility with other functions */ RLT_SUCCESS = 0, RLT_ASYNC = CHAR2CONST('A', 's', 'y', 'n'), RLT_MORE = CHAR2CONST('M', 'o', 'r', 'e'), RLT_SYNC_SUCC = CHAR2CONST('S', 'c', 'c', 's'), RLT_SYNC_FAIL = CHAR2CONST('F', 'a', 'i', 'l'), RLT_INVALID_ARG = CHAR2CONST('I', 'A', 'r', 'g'), RLT_CIB_PENDING = CHAR2CONST('P', 'e', 'n', 'd'), RLT_EXT_FAILED = CHAR2CONST('X', 'P', 'r', 'g'), RLT_TICKET_IDLE = CHAR2CONST('T', 'i', 'd', 'l'), RLT_OVERGRANT = CHAR2CONST('O', 'v', 'e', 'r'), RLT_PROBABLY_SUCCESS = CHAR2CONST('S', 'u', 'c', '?'), RLT_BUSY = CHAR2CONST('B', 'u', 's', 'y'), RLT_TERM_OUTDATED = CHAR2CONST('T', 'O', 'd', 't'), RLT_TERM_STILL_VALID = CHAR2CONST('T', 'V', 'l', 'd'), RLT_YOU_OUTDATED = CHAR2CONST('O', 'u', 't', 'd'), RLT_REDIRECT = CHAR2CONST('R', 'e', 'd', 'r'), } cmd_result_t; typedef enum { /* for compatibility with other functions */ OR_JUST_SO = 0, OR_AGAIN = CHAR2CONST('A', 'a', 'a', 'a'), OR_TKT_LOST = CHAR2CONST('T', 'L', 's', 't'), OR_REACQUIRE = CHAR2CONST('R', 'a', 'c', 'q'), OR_ADMIN = CHAR2CONST('A', 'd', 'm', 'n'), OR_LOCAL_FAIL = CHAR2CONST('L', 'o', 'c', 'F'), OR_STEPDOWN = CHAR2CONST('S', 'p', 'd', 'n'), OR_SPLIT = CHAR2CONST('S', 'p', 'l', 't'), } cmd_reason_t; /* bitwise command options */ typedef enum { OPT_IMMEDIATE = 1, /* immediate grant */ OPT_WAIT = 2, /* wait for the elections' outcome */ OPT_WAIT_COMMIT = 4, /* wait for the ticket commit to CIB */ } cmd_options_t; /** @} */ /** @{ */ struct booth_site { /** Calculated ID. See add_site(). */ int site_id; int type; int local; /** Roles, like ACCEPTOR, PROPOSER, or LEARNER. Not really used ATM. */ int role; char addr_string[BOOTH_NAME_LEN]; int tcp_fd; int udp_fd; /* 0-based, used for indexing into per-ticket weights */ int index; uint64_t bitmask; unsigned short family; union { struct sockaddr_in sa4; struct sockaddr_in6 sa6; }; int saddrlen; int addrlen; + + /** last timestamp seen from this site */ + uint32_t last_secs; + uint32_t last_usecs; } __attribute__((packed)); extern struct booth_site *local; extern struct booth_site * no_leader; /** @} */ struct booth_transport; struct client { int fd; const struct booth_transport *transport; void (*workfn)(int); void (*deadfn)(int); }; extern struct client *clients; extern struct pollfd *pollfds; int client_add(int fd, const struct booth_transport *tpt, void (*workfn)(int ci), void (*deadfn)(int ci)); int find_client_by_fd(int fd); int do_read(int fd, void *buf, size_t count); int do_write(int fd, void *buf, size_t count); void process_connection(int ci); void safe_copy(char *dest, char *value, size_t buflen, const char *description); int update_authkey(void); struct command_line { int type; /* ACT_ */ int op; /* OP_ */ int options; /* OPT_ */ char configfile[BOOTH_PATH_LEN]; char lockfile[BOOTH_PATH_LEN]; char site[BOOTH_NAME_LEN]; struct boothc_ticket_msg msg; }; extern struct command_line cl; /* http://gcc.gnu.org/onlinedocs/gcc/Typeof.html */ #define min(a__,b__) \ ({ typeof (a__) _a = (a__); \ typeof (b__) _b = (b__); \ _a < _b ? _a : _b; }) #define max(a__,b__) \ ({ typeof (a__) _a = (a__); \ typeof (b__) _b = (b__); \ _a > _b ? _a : _b; }) #endif /* _BOOTH_H */ diff --git a/src/config.c b/src/config.c index a9e32d7..b86559b 100644 --- a/src/config.c +++ b/src/config.c @@ -1,809 +1,815 @@ /* * Copyright (C) 2011 Jiaju Zhang * Copyright (C) 2013-2014 Philipp Marek * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This software is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public * License along with this library; if not, write to the Free Software * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. */ #include #include #include #include #include #include #include #include #include #include #include "b_config.h" #include "booth.h" #include "config.h" #include "raft.h" #include "ticket.h" #include "log.h" static int ticket_size = 0; static int ticket_realloc(void) { const int added = 5; int had, want; void *p; had = booth_conf->ticket_allocated; want = had + added; p = realloc(booth_conf->ticket, sizeof(struct ticket_config) * want); if (!p) { log_error("can't alloc more tickets"); return -ENOMEM; } booth_conf->ticket = p; memset(booth_conf->ticket + had, 0, sizeof(struct ticket_config) * added); booth_conf->ticket_allocated = want; return 0; } int add_site(char *address, int type); int add_site(char *addr_string, int type) { int rv; struct booth_site *site; uLong nid; uint32_t mask; int i; rv = 1; if (booth_conf->site_count == MAX_NODES) { log_error("too many nodes"); goto out; } if (strlen(addr_string)+1 >= sizeof(booth_conf->site[0].addr_string)) { log_error("site address \"%s\" too long", addr_string); goto out; } site = booth_conf->site + booth_conf->site_count; site->family = BOOTH_PROTO_FAMILY; site->type = type; /* Make site_id start at a non-zero point. * Perhaps use hash over string or address? */ strcpy(site->addr_string, addr_string); site->index = booth_conf->site_count; site->bitmask = 1 << booth_conf->site_count; /* Catch site overflow */ assert(site->bitmask); booth_conf->all_bits |= site->bitmask; if (type == SITE) booth_conf->sites_bits |= site->bitmask; site->tcp_fd = -1; booth_conf->site_count++; rv = 0; memset(&site->sa6, 0, sizeof(site->sa6)); nid = crc32(0L, NULL, 0); /* Using the ASCII representation in site->addr_string (both sizeof() * and strlen()) gives quite a lot of collisions; a brute-force run * from 0.0.0.0 to 24.0.0.0 gives ~4% collisions, and this tends to * increase even more. * Whether there'll be a collision in real-life, with 3 or 5 nodes, is * another question ... but for now get the ID from the binary * representation - that had *no* collisions up to 32.0.0.0. */ if (inet_pton(AF_INET, site->addr_string, &site->sa4.sin_addr) > 0) { site->family = AF_INET; site->sa4.sin_family = site->family; site->sa4.sin_port = htons(booth_conf->port); site->saddrlen = sizeof(site->sa4); site->addrlen = sizeof(site->sa4.sin_addr); site->site_id = crc32(nid, (void*)&site->sa4.sin_addr, site->addrlen); } else if (inet_pton(AF_INET6, site->addr_string, &site->sa6.sin6_addr) > 0) { site->family = AF_INET6; site->sa6.sin6_family = site->family; site->sa6.sin6_flowinfo = 0; site->sa6.sin6_port = htons(booth_conf->port); site->saddrlen = sizeof(site->sa6); site->addrlen = sizeof(site->sa6.sin6_addr); site->site_id = crc32(nid, (void*)&site->sa6.sin6_addr, site->addrlen); } else { log_error("Address string \"%s\" is bad", site->addr_string); rv = EINVAL; } /* Make sure we will never collide with NO_ONE, * or be negative (to get "get_local_id() < 0" working). */ mask = 1 << (sizeof(site->site_id)*8 -1); assert(NO_ONE & mask); site->site_id &= ~mask; /* Test for collisions with other sites */ for(i=0; iindex; i++) if (booth_conf->site[i].site_id == site->site_id) { log_error("Got a site-ID collision. Please file a bug on https://github.com/ClusterLabs/booth/issues/new, attaching the configuration file."); exit(1); } out: return rv; } inline static char *skip_while_in(const char *cp, int (*fn)(int), const char *allowed) { /* strchr() returns a pointer to the terminator if *cp == 0. */ while (*cp && (fn(*cp) || strchr(allowed, *cp))) cp++; /* discard "const" qualifier */ return (char*)cp; } inline static char *skip_while(char *cp, int (*fn)(int)) { while (fn(*cp)) cp++; return cp; } inline static char *skip_until(char *cp, char expected) { while (*cp && *cp != expected) cp++; return cp; } static inline int is_end_of_line(char *cp) { char c = *cp; return c == '\n' || c == 0 || c == '#'; } static int add_ticket(const char *name, struct ticket_config **tkp, const struct ticket_config *def) { int rv; struct ticket_config *tk; if (booth_conf->ticket_count == booth_conf->ticket_allocated) { rv = ticket_realloc(); if (rv < 0) return rv; } tk = booth_conf->ticket + booth_conf->ticket_count; booth_conf->ticket_count++; tk->last_valid_tk = malloc(sizeof(struct ticket_config)); if (!tk->last_valid_tk) { log_error("out of memory"); return -ENOMEM; } memset(tk->last_valid_tk, 0, sizeof(struct ticket_config)); if (!check_max_len_valid(name, sizeof(tk->name))) { log_error("ticket name \"%s\" too long.", name); return -EINVAL; } if (find_ticket_by_name(name, NULL)) { log_error("ticket name \"%s\" used again.", name); return -EINVAL; } if (* skip_while_in(name, isalnum, "-/")) { log_error("ticket name \"%s\" invalid; only alphanumeric names.", name); return -EINVAL; } strcpy(tk->name, name); tk->timeout = def->timeout; tk->term_duration = def->term_duration; tk->retries = def->retries; memcpy(tk->weight, def->weight, sizeof(tk->weight)); if (tkp) *tkp = tk; return 0; } static int postproc_ticket(struct ticket_config *tk) { if (!tk) return 1; if (!tk->renewal_freq) { tk->renewal_freq = tk->term_duration/2; } if (tk->timeout*(tk->retries+1) >= tk->renewal_freq) { log_error("%s: total amount of time to " "retry sending packets cannot exceed " "renewal frequency " "(%d*(%d+1) >= %d)", tk->name, tk->timeout, tk->retries, tk->renewal_freq); return 0; } return 1; } /* returns number of weights, or -1 on bad input. */ static int parse_weights(const char *input, int weights[MAX_NODES]) { int i, v; char *cp; for(i=0; iproto = UDP; booth_conf->port = BOOTH_DEFAULT_PORT; + booth_conf->maxtimeskew = BOOTH_DEFAULT_MAX_TIME_SKEW; booth_conf->authkey[0] = '\0'; /* Provide safe defaults. -1 is reserved, though. */ booth_conf->uid = -2; booth_conf->gid = -2; strcpy(booth_conf->site_user, "hacluster"); strcpy(booth_conf->site_group, "haclient"); strcpy(booth_conf->arb_user, "nobody"); strcpy(booth_conf->arb_group, "nobody"); parse_weights("", defaults.weight); defaults.ext_verifier = NULL; defaults.term_duration = DEFAULT_TICKET_EXPIRY; defaults.timeout = DEFAULT_TICKET_TIMEOUT; defaults.retries = DEFAULT_RETRIES; defaults.acquire_after = 0; error = ""; log_debug("reading config file %s", path); while (fgets(line, sizeof(line), fp)) { lineno++; s = skip_while(line, isspace); if (is_end_of_line(s)) continue; key = s; /* Key */ end_of_key = skip_while_in(key, isalnum, "-_"); if (end_of_key == key) { error = "No key"; goto err; } if (!*end_of_key) goto exp_equal; /* whitespace, and something else but nothing more? */ s = skip_while(end_of_key, isspace); if (*s != '=') { exp_equal: error = "Expected '=' after key"; goto err; } s++; /* It's my buffer, and I terminate if I want to. */ /* But not earlier than that, because we had to check for = */ *end_of_key = 0; /* Value tokenizing */ s = skip_while(s, isspace); switch (*s) { case '"': case '\'': val = s+1; s = skip_until(val, *s); /* Terminate value */ if (!*s) { error = "Unterminated quoted string"; goto err; } /* Remove and skip quote */ *s = 0; s++; if (* skip_while(s, isspace)) { error = "Surplus data after value"; goto err; } *s = 0; break; case 0: no_value: error = "No value"; goto err; break; default: val = s; /* Rest of line. */ i = strlen(s); /* i > 0 because of "case 0" above. */ while (i > 0 && isspace(s[i-1])) i--; s += i; *s = 0; } if (val == s) goto no_value; if (strlen(key) > BOOTH_NAME_LEN || strlen(val) > BOOTH_NAME_LEN) { error = "key/value too long"; goto err; } if (strcmp(key, "transport") == 0) { if (got_transport) { error = "config file has multiple transport lines"; goto err; } if (strcasecmp(val, "UDP") == 0) booth_conf->proto = UDP; else if (strcasecmp(val, "SCTP") == 0) booth_conf->proto = SCTP; else { error = "invalid transport protocol"; goto err; } got_transport = 1; continue; } if (strcmp(key, "port") == 0) { booth_conf->port = atoi(val); continue; } if (strcmp(key, "name") == 0) { safe_copy(booth_conf->name, val, BOOTH_NAME_LEN, "name"); continue; } #if HAVE_LIBMHASH if (strcmp(key, "authfile") == 0) { safe_copy(booth_conf->authfile, val, BOOTH_PATH_LEN, "authfile"); continue; } + + if (strcmp(key, "maxtimeskew") == 0) { + booth_conf->maxtimeskew = atoi(val); + continue; + } #endif if (strcmp(key, "site") == 0) { if (add_site(val, SITE)) goto out; continue; } if (strcmp(key, "arbitrator") == 0) { if (add_site(val, ARBITRATOR)) goto out; continue; } if (strcmp(key, "site-user") == 0) { safe_copy(booth_conf->site_user, optarg, BOOTH_NAME_LEN, "site-user"); continue; } if (strcmp(key, "site-group") == 0) { safe_copy(booth_conf->site_group, optarg, BOOTH_NAME_LEN, "site-group"); continue; } if (strcmp(key, "arbitrator-user") == 0) { safe_copy(booth_conf->arb_user, optarg, BOOTH_NAME_LEN, "arbitrator-user"); continue; } if (strcmp(key, "arbitrator-group") == 0) { safe_copy(booth_conf->arb_group, optarg, BOOTH_NAME_LEN, "arbitrator-group"); continue; } if (strcmp(key, "debug") == 0) { if (type != CLIENT) debug_level = max(debug_level, atoi(val)); continue; } if (strcmp(key, "ticket") == 0) { if (current_tk && strcmp(current_tk->name, "__defaults__")) { if (!postproc_ticket(current_tk)) { goto out; } } if (!strcmp(val, "__defaults__")) { current_tk = &defaults; } else if (add_ticket(val, ¤t_tk, &defaults)) { goto out; } continue; } /* current_tk must be allocated at this point, otherwise * we don't know to which ticket the key refers */ if (!current_tk) { error = "Unexpected keyword"; goto err; } if (strcmp(key, "expire") == 0) { current_tk->term_duration = read_time(val); if (current_tk->term_duration <= 0) { error = "Expected time >0 for expire"; goto err; } continue; } if (strcmp(key, "timeout") == 0) { current_tk->timeout = read_time(val); if (current_tk->timeout <= 0) { error = "Expected time >0 for timeout"; goto err; } if (!min_timeout) { min_timeout = current_tk->timeout; } else { min_timeout = min(min_timeout, current_tk->timeout); } continue; } if (strcmp(key, "retries") == 0) { current_tk->retries = strtol(val, &s, 0); if (*s || s == val || current_tk->retries<3 || current_tk->retries > 100) { error = "Expected plain integer value in the range [3, 100] for retries"; goto err; } continue; } if (strcmp(key, "renewal-freq") == 0) { current_tk->renewal_freq = read_time(val); if (current_tk->renewal_freq <= 0) { error = "Expected time >0 for renewal-freq"; goto err; } continue; } if (strcmp(key, "acquire-after") == 0) { current_tk->acquire_after = read_time(val); if (current_tk->acquire_after < 0) { error = "Expected time >=0 for acquire-after"; goto err; } continue; } if (strcmp(key, "before-acquire-handler") == 0) { if (current_tk->ext_verifier) { free(current_tk->ext_verifier); } current_tk->ext_verifier = strdup(val); if (!current_tk->ext_verifier) { error = "Out of memory"; goto err; } continue; } if (strcmp(key, "weights") == 0) { if (parse_weights(val, current_tk->weight) < 0) goto out; continue; } error = "Unknown keyword"; goto err; } if ((booth_conf->site_count % 2) == 0) { log_warn("Odd number of nodes is strongly recommended!"); } /* Default: make config name match config filename. */ if (!booth_conf->name[0]) { cp = strrchr(path, '/'); cp = cp ? cp+1 : (char *)path; cp2 = strrchr(cp, '.'); if (!cp2) cp2 = cp + strlen(cp); if (cp2-cp >= BOOTH_NAME_LEN) { log_error("booth config file name too long"); goto err; } strncpy(booth_conf->name, cp, cp2-cp); *(booth_conf->name+(cp2-cp)) = '\0'; } if (!postproc_ticket(current_tk)) { goto out; } poll_timeout = min(POLL_TIMEOUT, min_timeout/10); return 0; err: out: log_error("%s in config file line %d", error, lineno); free(booth_conf); booth_conf = NULL; return -1; } int check_config(int type) { struct passwd *pw; struct group *gr; char *cp, *input; if (!booth_conf) return -1; input = (type == ARBITRATOR) ? booth_conf->arb_user : booth_conf->site_user; if (!*input) goto u_inval; if (isdigit(input[0])) { booth_conf->uid = strtol(input, &cp, 0); if (*cp != 0) { u_inval: log_error("User \"%s\" cannot be resolved into a UID.", input); return ENOENT; } } else { pw = getpwnam(input); if (!pw) goto u_inval; booth_conf->uid = pw->pw_uid; } input = (type == ARBITRATOR) ? booth_conf->arb_group : booth_conf->site_group; if (!*input) goto g_inval; if (isdigit(input[0])) { booth_conf->gid = strtol(input, &cp, 0); if (*cp != 0) { g_inval: log_error("Group \"%s\" cannot be resolved into a UID.", input); return ENOENT; } } else { gr = getgrnam(input); if (!gr) goto g_inval; booth_conf->gid = gr->gr_gid; } /* TODO: check whether uid or gid is 0 again? * The admin may shoot himself in the foot, though. */ return 0; } int find_site_by_name(unsigned char *site, struct booth_site **node, int any_type) { struct booth_site *n; int i; if (!booth_conf) return 0; for (i = 0; i < booth_conf->site_count; i++) { n = booth_conf->site + i; if ((n->type == SITE || any_type) && strcmp(n->addr_string, site) == 0) { *node = n; return 1; } } return 0; } int find_site_by_id(uint32_t site_id, struct booth_site **node) { struct booth_site *n; int i; if (site_id == NO_ONE) { *node = no_leader; return 1; } if (!booth_conf) return 0; for (i = 0; i < booth_conf->site_count; i++) { n = booth_conf->site + i; if (n->site_id == site_id) { *node = n; return 1; } } return 0; } const char *type_to_string(int type) { switch (type) { case ARBITRATOR: return "arbitrator"; case SITE: return "site"; case CLIENT: return "client"; } return "??invalid-type??"; } diff --git a/src/config.h b/src/config.h index 0303e2a..32097b3 100644 --- a/src/config.h +++ b/src/config.h @@ -1,255 +1,257 @@ /* * Copyright (C) 2011 Jiaju Zhang * Copyright (C) 2013-2014 Philipp Marek * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This software is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public * License along with this library; if not, write to the Free Software * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. */ #ifndef _CONFIG_H #define _CONFIG_H #include #include #include "booth.h" #include "timer.h" #include "raft.h" #include "transport.h" /** @{ */ /** Definitions for in-RAM data. */ #define MAX_NODES 16 #define TICKET_ALLOC 16 struct ticket_config { /** \name Configuration items. * @{ */ /** Name of ticket. */ boothc_ticket name; /** How long a term lasts if not refreshed (in ms) */ int term_duration; /** Network related timeouts (in ms) */ int timeout; /** Retries before giving up. */ int retries; /** If >0, time to wait for a site to get fenced. * The ticket may be acquired after that timespan by * another site. */ int acquire_after; /* TODO: needed? */ /* How often to renew the ticket (in ms) */ int renewal_freq; /* Program to ask whether it makes sense to * acquire the ticket */ char *ext_verifier; /** Node weights. */ int weight[MAX_NODES]; /** @} */ /** \name Runtime values. * @{ */ /** Current state. */ server_state_e state; /** Next state. Used at startup. */ server_state_e next_state; /** When something has to be done */ timetype next_cron; /** Current leader. This is effectively the log[] in Raft. */ struct booth_site *leader; /** Leader that got lost. */ struct booth_site *lost_leader; /** Is the ticket granted? */ int is_granted; /** Timestamp of leadership expiration */ timetype term_expires; /** End of election period */ timetype election_end; struct booth_site *voted_for; /** Who the various sites vote for. * NO_OWNER = no vote yet. */ struct booth_site *votes_for[MAX_NODES]; /* bitmap */ uint64_t votes_received; /** Last voting round that was seen. */ uint32_t current_term; /** Do ticket updates whenever we get enough heartbeats. * But do that only once. * This is reset to 0 whenever we broadcast heartbeat and set * to 1 once enough acks are received. * Increased to 2 when the ticket is commited to the CIB (see * delay_commit). */ uint32_t ticket_updated; /** Outcome of whatever ticket request was processed. * Can also be an intermediate stage. */ uint32_t outcome; /** @} */ /** */ uint32_t last_applied; uint32_t next_index[MAX_NODES]; uint32_t match_index[MAX_NODES]; /* Why did we start the elections? */ cmd_reason_t election_reason; /* if it is potentially dangerous to grant the ticket * immediately, then this is set to some point in time, * usually (now + term_duration + acquire_after) */ timetype delay_commit; /* the last request RPC we sent */ uint32_t last_request; /* if we expect some acks, then set this to the id of * the RPC which others will send us; it is cleared once all * replies were received */ uint32_t acks_expected; /* bitmask of servers which sent acks */ uint64_t acks_received; /* timestamp of the request */ timetype req_sent_at; /* we need to wait for MY_INDEX from other servers, * hold the ticket processing for a while until they reply */ int start_postpone; /** Last renewal time */ timetype last_renewal; /* Do we need to update the copy in the CIB? * Normally, the ticket is written only when it changes via * the UPDATE RPC (for followers) and on expiration update * (for leaders) */ int update_cib; /* Is this ticket in election? */ int in_election; /* don't log warnings unnecessarily */ int expect_more_rejects; /** \name Needed while proposals are being done. * @{ */ /* Need to keep the previous valid ticket in case we moved to * start new elections and another server asks for the ticket * status. It would be wrong to send our candidate ticket. */ struct ticket_config *last_valid_tk; /** Whom to vote for the next time. * Needed to push a ticket to someone else. */ #if 0 /** Bitmap of sites that acknowledge that state. */ uint64_t proposal_acknowledges; /** When an incompletely acknowledged proposal gets done. * If all peers agree, that happens sooner. * See switch_state_to(). */ struct timeval proposal_switch; /** Timestamp of proposal expiration. */ time_t proposal_expires; #endif /** Number of send retries left. * Used on the new owner. * Starts at 0, counts up. */ int retry_number; /** @} */ }; struct booth_config { char name[BOOTH_NAME_LEN]; /** File containing the authentication file. */ char authfile[BOOTH_PATH_LEN]; struct stat authstat; unsigned char authkey[BOOTH_MAX_KEY_LEN]; int authkey_len; + /** Maximum time skew between peers allowed */ + int maxtimeskew; transport_layer_t proto; uint16_t port; /** Stores the OR of sites bitmasks. */ uint64_t sites_bits; /** Stores the OR of all members' bitmasks. */ uint64_t all_bits; char site_user[BOOTH_NAME_LEN]; char site_group[BOOTH_NAME_LEN]; char arb_user[BOOTH_NAME_LEN]; char arb_group[BOOTH_NAME_LEN]; uid_t uid; gid_t gid; int site_count; struct booth_site site[MAX_NODES]; int ticket_count; int ticket_allocated; struct ticket_config *ticket; }; extern struct booth_config *booth_conf; #define is_auth_req() (booth_conf->authkey[0] != '\0') int read_config(const char *path, int type); int check_config(int type); int find_site_by_name(unsigned char *site, struct booth_site **node, int any_type); int find_site_by_id(uint32_t site_id, struct booth_site **node); const char *type_to_string(int type); #endif /* _CONFIG_H */ diff --git a/src/inline-fn.h b/src/inline-fn.h index 5521e28..15712b6 100644 --- a/src/inline-fn.h +++ b/src/inline-fn.h @@ -1,285 +1,294 @@ /* * Copyright (C) 2013-2014 Philipp Marek * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This software is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public * License along with this library; if not, write to the Free Software * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. */ #ifndef _INLINE_FN_H #define _INLINE_FN_H #include #include #include #include #include "timer.h" #include "config.h" #include "transport.h" inline static uint32_t get_local_id(void) { return local ? local->site_id : -1; } inline static uint32_t get_node_id(struct booth_site *node) { return node ? node->site_id : 0; } /** Returns number of seconds left, if any. */ inline static int term_time_left(struct ticket_config *tk) { int left = 0; if (is_time_set(&tk->term_expires)) { left = time_left(&tk->term_expires); } return (left < 0) ? 0 : left; } inline static int leader_and_valid(struct ticket_config *tk) { if (tk->leader != local) return 0; return term_time_left(tk); } /** Is this some leader? */ inline static int is_owned(const struct ticket_config *tk) { return (tk->leader && tk->leader != no_leader); } inline static int is_resend(struct ticket_config *tk) { timetype now; get_time(&now); return time_sub_int(&now, &tk->req_sent_at) >= tk->timeout; } static inline void init_header_bare(struct boothc_header *h) { + timetype now; + assert(local && local->site_id); h->magic = htonl(BOOTHC_MAGIC); h->version = htonl(BOOTHC_VERSION); h->from = htonl(local->site_id); - h->iv = htonl(0); - h->auth1 = htonl(0); - h->auth2 = htonl(0); + if (is_auth_req()) { + get_time(&now); + h->iv = htonl(1); + h->secs = htonl(secs_since_epoch(&now)); + h->usecs = htonl(get_usecs(&now)); + } else { + h->iv = htonl(0); + h->secs = htonl(0); + h->usecs = htonl(0); + } } /* get the _real_ message length out of the header */ #define sendmsglen(msg) ntohl((msg)->header.length) static inline void init_header(struct boothc_header *h, int cmd, int request, int options, int result, int reason, int data_len) { init_header_bare(h); h->length = htonl(data_len - (is_auth_req() ? 0 : sizeof(struct hmac))); h->cmd = htonl(cmd); h->request = htonl(request); h->options = htonl(options); h->result = htonl(result); h->reason = htonl(reason); } #define my_last_term(tk) \ (((tk)->state == ST_CANDIDATE && (tk)->last_valid_tk->current_term) ? \ (tk)->last_valid_tk->current_term : (tk)->current_term) extern int TIME_RES, TIME_MULT; #define msg_term_time(msg) \ ntohl((msg)->ticket.term_valid_for)*TIME_RES/TIME_MULT #define set_msg_term_time(msg, tk) \ (msg)->ticket.term_valid_for = htonl(term_time_left(tk)*TIME_MULT/TIME_RES) static inline void init_ticket_msg(struct boothc_ticket_msg *msg, int cmd, int request, int rv, int reason, struct ticket_config *tk) { assert(sizeof(msg->ticket.id) == sizeof(tk->name)); init_header(&msg->header, cmd, request, 0, rv, reason, sizeof(*msg)); if (!tk) { memset(&msg->ticket, 0, sizeof(msg->ticket)); } else { memcpy(msg->ticket.id, tk->name, sizeof(msg->ticket.id)); msg->ticket.leader = htonl(get_node_id( (tk->leader && tk->leader != no_leader) ? tk->leader : (tk->voted_for ? tk->voted_for : no_leader))); msg->ticket.term = htonl(tk->current_term); set_msg_term_time(msg, tk); } } static inline struct booth_transport const *transport(void) { return booth_transport + booth_conf->proto; } static inline const char *site_string(struct booth_site *site) { return site ? site->addr_string : "NONE"; } static inline const char *ticket_leader_string(struct ticket_config *tk) { return site_string(tk->leader); } /* We allow half of the uint32_t to be used; * half of that below, half of that above the current known "good" value. * 0 UINT32_MAX * |--------------------------+----------------+------------| * | | | * |--------+-------| allowed range * | * current commit index * * So, on overflow it looks like that: * UINT32_MAX 0 * |--------------------------+-----------||---+------------| * | | | * |--------+-------| allowed range * | * current commit index * * This should be possible by using the same datatype and relying * on the under/overflow semantics. * * * Having 30 bits available, and assuming an expire time of * one minute and a (high) commit index step of 64 == 2^6 (because * of weights), we get 2^24 minutes of range - which is ~750 * years. "Should be enough for everybody." */ static inline int index_is_higher_than(uint32_t c_high, uint32_t c_low) { uint32_t diff; if (c_high == c_low) return 0; diff = c_high - c_low; if (diff < UINT32_MAX/4) return 1; diff = c_low - c_high; if (diff < UINT32_MAX/4) return 0; assert(!"commit index out of range - invalid"); } static inline uint32_t index_max2(uint32_t a, uint32_t b) { return index_is_higher_than(a, b) ? a : b; } static inline uint32_t index_max3(uint32_t a, uint32_t b, uint32_t c) { return index_max2( index_max2(a, b), c); } /* only invoked when ticket leader */ static inline void get_next_election_time(struct ticket_config *tk, timetype *next) { assert(tk->leader == local); /* if last_renewal is not set, which is unusual, it may mean * that the ticket never got updated, i.e. nobody acked * ticket updates (say, due to a temporary connection * problem) * we may try a bit later again */ if (!is_time_set(&tk->last_renewal)) { time_reset(next); } else { interval_add(&tk->last_renewal, tk->renewal_freq, next); } /* if delay_commit is earlier than next, then set next to * delay_commit */ if (is_time_set(&tk->delay_commit) && time_cmp(next, &tk->delay_commit, >)) { copy_time(&tk->delay_commit, next); } } static inline void expect_replies(struct ticket_config *tk, int reply_type) { tk->retry_number = 0; tk->acks_expected = reply_type; tk->acks_received = local->bitmask; get_time(&tk->req_sent_at); } static inline void no_resends(struct ticket_config *tk) { tk->retry_number = 0; tk->acks_expected = 0; } static inline struct booth_site *my_vote(struct ticket_config *tk) { return tk->votes_for[ local->index ]; } static inline int count_bits(uint64_t val) { return __builtin_popcount(val); } static inline int majority_of_bits(struct ticket_config *tk, uint64_t val) { /* Use ">" to get majority decision, even for an even number * of participants. */ return count_bits(val) * 2 > booth_conf->site_count; } static inline int all_replied(struct ticket_config *tk) { return !(tk->acks_received ^ booth_conf->all_bits); } static inline int all_sites_replied(struct ticket_config *tk) { return !((tk->acks_received & booth_conf->sites_bits) ^ booth_conf->sites_bits); } #endif diff --git a/src/timer.c b/src/timer.c index 7701d7a..d51e807 100644 --- a/src/timer.c +++ b/src/timer.c @@ -1,160 +1,181 @@ /* * Copyright (C) 2014 Dejan Muhamedagic * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This software is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public * License along with this library; if not, write to the Free Software * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. */ #include "timer.h" /* which time resolution makes most sense? * the factors are clock resolution and network latency */ int TIME_RES = 1000; int TIME_MULT = 1; int time_sub_int(timetype *a, timetype *b) { timetype res; time_sub(a, b, &res); return res.tv_sec*TIME_RES + res.SUBSEC/TIME_FAC; } /* interval (b) is in ms (1/TIME_RES) */ void interval_add(timetype *a, int b, timetype *res) { /* need this to allow interval_add(a, b, a); */ long tmp_subsec = a->SUBSEC + (long)b*TIME_FAC; res->SUBSEC = tmp_subsec%NSECS; res->tv_sec = a->tv_sec + tmp_subsec/NSECS; } int is_time_set(timetype *p) { return (p->tv_sec != 0) || (p->SUBSEC != 0); } int is_past(timetype *p) { timetype now; /*if (!is_time_set(p)) return 1;*/ assert(p->tv_sec || p->SUBSEC); get_time(&now); return time_cmp(&now, p, >); } void secs2tv(time_t secs, timetype *p) { memset(p, 0, sizeof(timetype)); p->tv_sec = secs; } int time_left(timetype *p) { timetype now; assert(p->tv_sec || p->SUBSEC); get_time(&now); return time_sub_int(p, &now); } void set_future_time(timetype *a, int b) { timetype now; get_time(&now); interval_add(&now, b, a); } void time_reset(timetype *p) { memset(p, 0, sizeof(timetype)); } void copy_time(timetype *src, timetype *dst) { dst->SUBSEC = src->SUBSEC; dst->tv_sec = src->tv_sec; } #if _POSIX_TIMERS > 0 void time_sub(struct timespec *a, struct timespec *b, struct timespec *res) { if (a->tv_nsec < b->tv_nsec) { res->tv_sec = a->tv_sec - b->tv_sec - 1L; res->tv_nsec = a->tv_nsec + (NSECS - b->tv_nsec); } else { res->tv_sec = a->tv_sec - b->tv_sec; res->tv_nsec = a->tv_nsec - b->tv_nsec; } } void time_add(struct timespec *a, struct timespec *b, struct timespec *res) { res->tv_nsec = a->tv_nsec + b->tv_nsec; res->tv_sec = a->tv_sec + b->tv_sec; if (res->tv_nsec >= NSECS) { res->tv_sec++; res->tv_nsec %= NSECS; } } time_t get_secs(struct timespec *p) { if (p) { get_time(p); return p->tv_sec; } else { struct timespec tv; get_time(&tv); return tv.tv_sec; } } -/* time booth_clk_t is a time since boot or similar, return - * something humans can understand */ -time_t wall_ts(struct timespec *booth_clk_t) +/* time booth_clk_t is a time since boot or similar, convert that + * to time since epoch (Jan 1, 1970) + */ +static void clock2epochtime(struct timespec *booth_clk_t, struct timespec *res) { - struct timespec booth_clk_now, now_tv, res; + struct timespec booth_clk_now, now_tv; struct timeval now; get_time(&booth_clk_now); gettimeofday(&now, NULL); TIMEVAL_TO_TIMESPEC(&now, &now_tv); - time_sub(&now_tv, &booth_clk_now, &res); - time_add(booth_clk_t, &res, &res); + time_sub(&now_tv, &booth_clk_now, res); + time_add(booth_clk_t, res, res); +} + +/* time booth_clk_t is a time since boot or similar, return + * something humans can understand (rounded seconds only) */ +time_t wall_ts(struct timespec *booth_clk_t) +{ + struct timespec res; + + clock2epochtime(booth_clk_t, &res); return round2secs(&res); } +/* time booth_clk_t is a time since boot or similar, get here + * seconds since epoch + */ +time_t secs_since_epoch(struct timespec *booth_clk_t) +{ + struct timespec res; + + clock2epochtime(booth_clk_t, &res); + return res.tv_sec; +} + /* time t is wall clock time, convert to time compatible * with our clock_gettime clock */ time_t unwall_ts(time_t t) { struct timespec booth_clk_now, now_tv, res; struct timeval now; get_time(&booth_clk_now); gettimeofday(&now, NULL); TIMEVAL_TO_TIMESPEC(&now, &now_tv); time_sub(&now_tv, &booth_clk_now, &res); return t - res.tv_sec; } #endif diff --git a/src/timer.h b/src/timer.h index 31028a5..7d1120b 100644 --- a/src/timer.h +++ b/src/timer.h @@ -1,92 +1,96 @@ /* * Copyright (C) 2014 Dejan Muhamedagic * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This software is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public * License along with this library; if not, write to the Free Software * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. */ #ifndef _TIMER_H #define _TIMER_H #include #include #include #include #include #if _POSIX_TIMERS > 0 #if defined(CLOCK_MONOTONIC) # define BOOTH_CLOCK CLOCK_MONOTONIC #else # define BOOTH_CLOCK CLOCK_REALTIME #endif #define NSECS 1000000000L /* nanoseconds */ #define TIME_FAC (NSECS/TIME_RES) #define SUBSEC tv_nsec #define SUBSEC_FAC NSECS typedef struct timespec timetype; #define get_time(p) clock_gettime(BOOTH_CLOCK, p) +#define get_usecs(p) ((p)->tv_nsec/1000L) #define time_cmp(a, b, CMP) \ (((a)->tv_sec == (b)->tv_sec) ? \ ((a)->tv_nsec CMP (b)->tv_nsec) : \ ((a)->tv_sec CMP (b)->tv_sec)) void time_sub(struct timespec *a, struct timespec *b, struct timespec *res); void time_add(struct timespec *a, struct timespec *b, struct timespec *res); time_t get_secs(struct timespec *p); time_t wall_ts(struct timespec *p); +time_t secs_since_epoch(struct timespec *booth_clk_t); time_t unwall_ts(time_t t); #else #define MUSECS 1000000L /* microseconds */ #define SUBSEC_FAC MUSECS #define TIME_FAC (MUSECS/TIME_RES) #define SUBSEC tv_usec typedef struct timeval timetype; #define get_time(p) gettimeofday(p, NULL) +#define secs_since_epoch(p) ((p)->tv_sec) +#define get_usecs(p) ((p)->tv_usec) #define time_sub timersub #define time_add timeradd #define time_cmp timercmp #define get_secs time #define wall_ts round2secs #define unwall_ts(t) (t) #endif int is_past(timetype *p); void secs2tv(time_t secs, timetype *p); void time_reset(timetype *p); int time_sub_int(timetype *a, timetype *b); void set_future_time(timetype *a, int b); int time_left(timetype *p); void copy_time(timetype *src, timetype *dst); void interval_add(timetype *p, int interval, timetype *res); int is_time_set(timetype *p); #define intfmt(t) "%d.%03d", (t)/TIME_RES, ((t)<0?-(t):(t))%TIME_RES /* random time from 0 to t ms (1/TIME_RES) */ #define rand_time(t) cl_rand_from_interval(0, t*(TIME_RES/1000)) #define round2secs(p) \ ((p)->tv_sec + ((p)->SUBSEC + SUBSEC_FAC/2)/SUBSEC_FAC) #endif diff --git a/src/transport.c b/src/transport.c index 51caecd..d31b092 100644 --- a/src/transport.c +++ b/src/transport.c @@ -1,826 +1,877 @@ /* * Copyright (C) 2011 Jiaju Zhang * Copyright (C) 2013-2014 Philipp Marek * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public * License as published by the Free Software Foundation; either * version 2.1 of the License, or (at your option) any later version. * * This software is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public * License along with this library; if not, write to the Free Software * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. */ #include #include #include #include #include #include #include #include #include #include #include #include #include #include "b_config.h" #include "booth.h" #include "inline-fn.h" #include "log.h" #include "config.h" #include "ticket.h" #include "transport.h" #include "auth.h" #define BOOTH_IPADDR_LEN (sizeof(struct in6_addr)) #define NETLINK_BUFSIZE 16384 #define SOCKET_BUFFER_SIZE 160000 #define FRAME_SIZE_MAX 10000 struct booth_site *local = NULL; static int (*deliver_fn) (void *msg, int msglen); static void parse_rtattr(struct rtattr *tb[], int max, struct rtattr *rta, int len) { while (RTA_OK(rta, len)) { if (rta->rta_type <= max) tb[rta->rta_type] = rta; rta = RTA_NEXT(rta,len); } } enum match_type { NO_MATCH = 0, FUZZY_MATCH, EXACT_MATCH, }; static int find_address(unsigned char ipaddr[BOOTH_IPADDR_LEN], int family, int prefixlen, int fuzzy_allowed, struct booth_site **me, int *address_bits_matched) { int i; struct booth_site *node; int bytes, bits_left, mask; unsigned char node_bits, ip_bits; uint8_t *n_a; int matched; enum match_type did_match = NO_MATCH; bytes = prefixlen / 8; bits_left = prefixlen % 8; /* One bit left to check means ignore 7 lowest bits. */ mask = ~( (1 << (8 - bits_left)) -1); for (i = 0; i < booth_conf->site_count; i++) { node = booth_conf->site + i; if (family != node->family) continue; n_a = node_to_addr_pointer(node); for(matched = 0; matched < node->addrlen; matched++) if (ipaddr[matched] != n_a[matched]) break; if (matched == node->addrlen) { /* Full match. */ *address_bits_matched = matched * 8; found: *me = node; did_match = EXACT_MATCH; continue; } if (!fuzzy_allowed) continue; /* Check prefix, whole bytes */ if (matched < bytes) continue; if (matched * 8 < *address_bits_matched) continue; if (!bits_left) goto found; node_bits = n_a[bytes]; ip_bits = ipaddr[bytes]; if (((node_bits ^ ip_bits) & mask) == 0) { /* _At_least_ prefixlen bits matched. */ *address_bits_matched = prefixlen; if (did_match < EXACT_MATCH) { *me = node; did_match = FUZZY_MATCH; } } } return did_match; } int _find_myself(int family, struct booth_site **mep, int fuzzy_allowed); int _find_myself(int family, struct booth_site **mep, int fuzzy_allowed) { int fd; struct sockaddr_nl nladdr; struct booth_site *me; unsigned char ipaddr[BOOTH_IPADDR_LEN]; static char rcvbuf[NETLINK_BUFSIZE]; struct { struct nlmsghdr nlh; struct rtgenmsg g; } req; int address_bits_matched; if (local) goto found; me = NULL; address_bits_matched = 0; if (mep) *mep = NULL; fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE); if (fd < 0) { log_error("failed to create netlink socket"); return 0; } setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf)); memset(&nladdr, 0, sizeof(nladdr)); nladdr.nl_family = AF_NETLINK; memset(&req, 0, sizeof(req)); req.nlh.nlmsg_len = sizeof(req); req.nlh.nlmsg_type = RTM_GETADDR; req.nlh.nlmsg_flags = NLM_F_ROOT|NLM_F_MATCH|NLM_F_REQUEST; req.nlh.nlmsg_pid = 0; req.nlh.nlmsg_seq = 1; req.g.rtgen_family = family; if (sendto(fd, (void *)&req, sizeof(req), 0, (struct sockaddr*)&nladdr, sizeof(nladdr)) < 0) { close(fd); log_error("failed to send data to netlink socket"); return 0; } while (1) { int status; struct nlmsghdr *h; struct iovec iov = { rcvbuf, sizeof(rcvbuf) }; struct msghdr msg = { (void *)&nladdr, sizeof(nladdr), &iov, 1, NULL, 0, 0 }; status = recvmsg(fd, &msg, 0); if (!status) { close(fd); log_error("failed to recvmsg from netlink socket"); return 0; } h = (struct nlmsghdr *)rcvbuf; if (h->nlmsg_type == NLMSG_DONE) break; if (h->nlmsg_type == NLMSG_ERROR) { close(fd); log_error("netlink socket recvmsg error"); return 0; } while (NLMSG_OK(h, status)) { if (h->nlmsg_type == RTM_NEWADDR) { struct ifaddrmsg *ifa = NLMSG_DATA(h); struct rtattr *tb[IFA_MAX+1]; int len = h->nlmsg_len - NLMSG_LENGTH(sizeof(*ifa)); memset(tb, 0, sizeof(tb)); parse_rtattr(tb, IFA_MAX, IFA_RTA(ifa), len); memset(ipaddr, 0, BOOTH_IPADDR_LEN); /* prefer IFA_LOCAL if it exists, for p-t-p * interfaces, otherwise use IFA_ADDRESS */ if (tb[IFA_LOCAL]) { memcpy(ipaddr, RTA_DATA(tb[IFA_LOCAL]), BOOTH_IPADDR_LEN); } else { memcpy(ipaddr, RTA_DATA(tb[IFA_ADDRESS]), BOOTH_IPADDR_LEN); } /* First try with exact addresses, then optionally with subnet matching. */ if (ifa->ifa_prefixlen > address_bits_matched) find_address(ipaddr, ifa->ifa_family, ifa->ifa_prefixlen, fuzzy_allowed, &me, &address_bits_matched); } h = NLMSG_NEXT(h, status); } } close(fd); if (!me) return 0; me->local = 1; local = me; found: if (mep) *mep = local; return 1; } int find_myself(struct booth_site **mep, int fuzzy_allowed) { return _find_myself(AF_INET6, mep, fuzzy_allowed) || _find_myself(AF_INET, mep, fuzzy_allowed); } /** Checks the header fields for validity. * cf. init_header(). * For @len_incl_data < 0 the length is not checked. * Return <0 if error, else bytes read. */ int check_boothc_header(struct boothc_header *h, int len_incl_data) { int l; if (h->magic != htonl(BOOTHC_MAGIC)) { log_error("magic error %x", ntohl(h->magic)); return -EINVAL; } if (h->version != htonl(BOOTHC_VERSION)) { log_error("version error %x", ntohl(h->version)); return -EINVAL; } l = ntohl(h->length); if (l < sizeof(*h)) { log_error("length %d out of range", l); return -EINVAL; } if (len_incl_data < 0) return 0; if (l != len_incl_data) { log_error("length error - got %d, wanted %d", len_incl_data, l); return -EINVAL; } return len_incl_data; } static void process_tcp_listener(int ci) { int fd, i, one = 1; socklen_t addrlen = sizeof(struct sockaddr); struct sockaddr addr; fd = accept(clients[ci].fd, &addr, &addrlen); if (fd < 0) { log_error("process_tcp_listener: accept error %d %d", fd, errno); return; } setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, (char *)&one, sizeof(one)); i = client_add(fd, clients[ci].transport, process_connection, NULL); log_debug("client connection %d fd %d", i, fd); } int setup_tcp_listener(int test_only) { int s, rv; int one = 1; s = socket(local->family, SOCK_STREAM, 0); if (s == -1) { log_error("failed to create tcp socket %s", strerror(errno)); return s; } rv = setsockopt(s, SOL_SOCKET, SO_REUSEADDR, (char *)&one, sizeof(one)); if (rv == -1) { log_error("failed to set the SO_REUSEADDR option"); return rv; } rv = bind(s, &local->sa6, local->saddrlen); if (test_only) { rv = (rv == -1) ? errno : 0; close(s); return rv; } if (rv == -1) { log_error("failed to bind socket %s", strerror(errno)); return rv; } rv = listen(s, 5); if (rv == -1) { log_error("failed to listen on socket %s", strerror(errno)); return rv; } return s; } static int booth_tcp_init(void * unused __attribute__((unused))) { int rv; if (get_local_id() < 0) return -1; rv = setup_tcp_listener(0); if (rv < 0) return rv; client_add(rv, booth_transport + TCP, process_tcp_listener, NULL); return 0; } static int connect_nonb(int sockfd, const struct sockaddr *saptr, socklen_t salen, int sec) { int flags, n, error; socklen_t len; fd_set rset, wset; struct timeval tval; flags = fcntl(sockfd, F_GETFL, 0); fcntl(sockfd, F_SETFL, flags | O_NONBLOCK); error = 0; if ( (n = connect(sockfd, saptr, salen)) < 0) if (errno != EINPROGRESS) return -1; if (n == 0) goto done; /* connect completed immediately */ FD_ZERO(&rset); FD_SET(sockfd, &rset); wset = rset; tval.tv_sec = sec; tval.tv_usec = 0; if ((n = select(sockfd + 1, &rset, &wset, NULL, sec ? &tval : NULL)) == 0) { /* leave outside function to close */ /* timeout */ /* close(sockfd); */ errno = ETIMEDOUT; return -1; } if (FD_ISSET(sockfd, &rset) || FD_ISSET(sockfd, &wset)) { len = sizeof(error); if (getsockopt(sockfd, SOL_SOCKET, SO_ERROR, &error, &len) < 0) return -1; /* Solaris pending error */ } else { log_error("select error: sockfd not set"); return -1; } done: fcntl(sockfd, F_SETFL, flags); /* restore file status flags */ if (error) { /* leave outside function to close */ /* close(sockfd); */ errno = error; return -1; } return 0; } int booth_tcp_open(struct booth_site *to) { int s, rv; if (to->tcp_fd >= STDERR_FILENO) goto found; s = socket(to->family, SOCK_STREAM, 0); if (s == -1) { log_error("cannot create socket of family %d", to->family); return -1; } rv = connect_nonb(s, (struct sockaddr *)&to->sa6, to->saddrlen, 10); if (rv == -1) { if( errno == ETIMEDOUT) log_error("connect to %s got a timeout", site_string(to)); else log_error("connect to %s got an error: %s", site_string(to), strerror(errno)); goto error; } to->tcp_fd = s; found: return 1; error: if (s >= 0) close(s); return -1; } int booth_tcp_send(struct booth_site *to, void *buf, int len) { int rv; rv = add_hmac(buf, len); if (!rv) rv = do_write(to->tcp_fd, buf, len); return rv; } static int booth_tcp_recv(struct booth_site *from, void *buf, int len) { int got; /* Needs timeouts! */ got = do_read(from->tcp_fd, buf, len); if (got < 0) { log_error("read failed (%d): %s", errno, strerror(errno)); return got; } return len; } static int booth_tcp_recv_auth(struct booth_site *from, void *buf, int len) { int got, total; int payload_len; /* Needs timeouts! */ payload_len = len - sizeof(struct hmac); got = booth_tcp_recv(from, buf, payload_len); if (got < 0) { return got; } total = got; if (is_auth_req()) { got = booth_tcp_recv(from, (unsigned char *)buf+payload_len, sizeof(struct hmac)); if (got != sizeof(struct hmac) || check_auth(from, buf, len)) { return -1; } total += got; } return total; } static int booth_tcp_close(struct booth_site *to) { if (to) { if (to->tcp_fd > STDERR_FILENO) close(to->tcp_fd); to->tcp_fd = -1; } return 0; } static int booth_tcp_exit(void) { return 0; } static int setup_udp_server(void) { int rv, fd; int one = 1; unsigned int recvbuf_size; fd = socket(local->family, SOCK_DGRAM, 0); if (fd == -1) { log_error("failed to create UDP socket %s", strerror(errno)); goto ex; } rv = fcntl(fd, F_SETFL, O_NONBLOCK); if (rv == -1) { log_error("failed to set non-blocking operation " "on UDP socket: %s", strerror(errno)); goto ex; } rv = setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, (char *)&one, sizeof(one)); if (rv == -1) { log_error("failed to set the SO_REUSEADDR option"); goto ex; } rv = bind(fd, (struct sockaddr *)&local->sa6, local->saddrlen); if (rv == -1) { log_error("failed to bind UDP socket to [%s]:%d: %s", site_string(local), booth_conf->port, strerror(errno)); goto ex; } recvbuf_size = SOCKET_BUFFER_SIZE; rv = setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &recvbuf_size, sizeof(recvbuf_size)); if (rv == -1) { log_error("failed to set recvbuf size"); goto ex; } local->udp_fd = fd; return 0; ex: if (fd >= 0) close(fd); return -1; } /* Receive/process callback for UDP */ static void process_recv(int ci) { struct sockaddr_storage sa; int rv; socklen_t sa_len; char buffer[256]; /* Used for unit tests */ struct boothc_ticket_msg *msg; sa_len = sizeof(sa); msg = (void*)buffer; rv = recvfrom(clients[ci].fd, buffer, sizeof(buffer), MSG_NOSIGNAL | MSG_DONTWAIT, (struct sockaddr *)&sa, &sa_len); if (rv == -1) return; deliver_fn(msg, rv); } static int booth_udp_init(void *f) { int rv; rv = setup_udp_server(); if (rv < 0) return rv; deliver_fn = f; client_add(local->udp_fd, booth_transport + UDP, process_recv, NULL); return 0; } int booth_udp_send(struct booth_site *to, void *buf, int len) { int rv; rv = sendto(local->udp_fd, buf, len, MSG_NOSIGNAL, (struct sockaddr *)&to->sa6, to->saddrlen); if (rv == len) { rv = 0; } else if (rv < 0) { log_error("Cannot send to %s: %d %s", site_string(to), errno, strerror(errno)); } else { rv = -1; log_error("Packet sent to %s got truncated", site_string(to)); } return rv; } int booth_udp_send_auth(struct booth_site *to, void *buf, int len) { int rv; rv = add_hmac(buf, len); if (rv < 0) return rv; return booth_udp_send(to, buf, len); } static int booth_udp_broadcast_auth(void *buf, int len) { int i, rv, rvs; struct booth_site *site; if (!booth_conf || !booth_conf->site_count) return -1; rv = add_hmac(buf, len); if (rv < 0) return rv; rvs = 0; foreach_node(i, site) { if (site != local) { rv = booth_udp_send(site, buf, len); if (!rvs) rvs = rv; } } return rvs; } static int booth_udp_exit(void) { return 0; } /* SCTP transport layer has not been developed yet */ static int booth_sctp_init(void *f __attribute__((unused))) { return 0; } static int booth_sctp_send(struct booth_site * to __attribute__((unused)), void *buf __attribute__((unused)), int len __attribute__((unused))) { return 0; } static int booth_sctp_broadcast(void *buf __attribute__((unused)), int len __attribute__((unused))) { return 0; } static int return_0_booth_site(struct booth_site *v __attribute((unused))) { return 0; } static int return_0(void) { return 0; } const struct booth_transport booth_transport[TRANSPORT_ENTRIES] = { [TCP] = { .name = "TCP", .init = booth_tcp_init, .open = booth_tcp_open, .send = booth_tcp_send, .recv = booth_tcp_recv, .recv_auth = booth_tcp_recv_auth, .close = booth_tcp_close, .exit = booth_tcp_exit }, [UDP] = { .name = "UDP", .init = booth_udp_init, .open = return_0_booth_site, .send = booth_udp_send, .send_auth = booth_udp_send_auth, .close = return_0_booth_site, .broadcast_auth = booth_udp_broadcast_auth, .exit = booth_udp_exit }, [SCTP] = { .name = "SCTP", .init = booth_sctp_init, .open = return_0_booth_site, .send = booth_sctp_send, .broadcast = booth_sctp_broadcast, .exit = return_0, } }; const struct booth_transport *local_transport = booth_transport+TCP; /* data + (datalen-sizeof(struct hmac)) points to struct hmac * i.e. struct hmac is always tacked on the payload */ int add_hmac(void *data, int len) { int rv = 0; #if HAVE_LIBMHASH int payload_len; struct hmac *hp; if (!is_auth_req()) return 0; rv = update_authkey(); if (rv < 0) { return rv; } payload_len = len - sizeof(struct hmac); hp = (struct hmac *)((unsigned char *)data + payload_len); hp->hid = htonl(BOOTH_HASH); memset(hp->hash, 0, BOOTH_MAC_SIZE); rv = calc_hmac(data, payload_len, BOOTH_HASH, hp->hash, booth_conf->authkey, booth_conf->authkey_len); if (rv < 0) { log_error("internal error: cannot calculate mac"); } #endif return rv; } +/* TODO: we need some client identification */ +#define peer_string(p) (p ? site_string(p) : "client") + +/* verify the validity of timestamp from the header + * the timestamp needs to be either greater than the one already + * recorded for the site or, and this is checked for clients, + * not to be older than booth_conf->maxtimeskew + * update the timestamp for the site, if this packet is from a + * site + */ +static int verify_ts(struct booth_site *from, void *buf, int len) +{ + struct boothc_header *h; + struct timeval tv, curr_tv, now; + + if (len < sizeof(*h)) { + log_error("%s: packet too short", peer_string(from)); + return -1; + } + + h = (struct boothc_header *)buf; + tv.tv_sec = ntohl(h->secs); + tv.tv_usec = ntohl(h->usecs); + if (from) { + curr_tv.tv_sec = from->last_secs; + curr_tv.tv_usec = from->last_usecs; + if (timercmp(&tv, &curr_tv, >)) + goto accept; + log_warn("%s: packet timestamp older than previous one", + site_string(from)); + } + + gettimeofday(&now, NULL); + now.tv_sec -= booth_conf->maxtimeskew; + if (timercmp(&tv, &now, >)) + goto accept; + log_error("%s: packet timestamp older than %d seconds", + peer_string(from), booth_conf->maxtimeskew); + return -1; + +accept: + if (from) { + from->last_secs = tv.tv_sec; + from->last_usecs = tv.tv_usec; + } + return 0; +} + int check_auth(struct booth_site *from, void *buf, int len) { int rv = 0; #if HAVE_LIBMHASH int payload_len; struct hmac *hp; if (!is_auth_req()) return 0; rv = update_authkey(); if (rv < 0) { return rv; } payload_len = len - sizeof(struct hmac); hp = (struct hmac *)((unsigned char *)buf + payload_len); rv = verify_hmac(buf, payload_len, ntohl(hp->hid), hp->hash, booth_conf->authkey, booth_conf->authkey_len); - if (rv < 0 && from) { - log_error("%s failed to authenticate", site_string(from)); + if (!rv) { + rv = verify_ts(from, buf, len); + } + if (rv != 0) { + log_error("%s: failed to authenticate", peer_string(from)); } #endif return rv; } int send_data(int fd, void *data, int datalen) { int rv = 0; rv = add_hmac(data, datalen); if (!rv) rv = do_write(fd, data, datalen); return rv; } int send_header_plus(int fd, struct boothc_hdr_msg *msg, void *data, int len) { int rv; rv = send_data(fd, msg, sendmsglen(msg)-len); if (rv >= 0 && len) rv = do_write(fd, data, len); return rv; }