diff --git a/README-testing b/README-testing index 9a74a54..9788aea 100644 --- a/README-testing +++ b/README-testing @@ -1,217 +1,217 @@ There's a booth-test package which contains two types of tests. It installs the necessary files into `/usr/share/booth/tests`. === Live tests (booth operation) BEWARE: Run this with _test_ clusters only! The live testing utility tests booth operation using the given `booth.conf`: $ /usr/share/booth/tests/test/live_test.sh booth.conf It is possible to run only specific tests. Run the script without arguments to see usage and the list of tests and netem network emulation functions. There are some restrictions on how booth.conf is formatted. -There could be several tickets defined and all of them will be +There may be several tickets defined and all of them will be tested, one after another (they will be tested separately). The tickets must have expire and timeout parameters configured. Example booth.conf: ------------ transport="UDP" port="9929" arbitrator="10.2.12.53" arbitrator="10.2.13.82" site="10.2.12.101" site="10.2.13.101" site="10.121.187.99" ticket="ticket-A" expire = 30 timeout = 3 retries = 3 before-acquire-handler = /usr/share/booth/service-runnable d-src1 ------------ A split brain condition is also tested. For that to work, all sites need `iptables` installed. The supplied script `booth_path` is used to manipulate iptables rules. ==== Pacemaker configuration This is a sample pacemaker configuration for a single-node cluster: primitive booth ocf:pacemaker:booth-site primitive d-src1 ocf:heartbeat:Dummy rsc_ticket global-d-src1 ticket-A: d-src1 Additionally, you may also add an ocf:booth:sharedrsc resource to also check that the ticket is granted always to only one site: primitive shared ocf:booth:sharedrsc \ params dir="10.2.13.82:/var/tmp/boothtestdir" rsc_ticket global-shared ticket-A: shared Please adjust to your environment. ==== Network environment emulation To introduce packet loss or network delays, set the NETEM_ENV environment variable. There are currently three netem network emulation settings supported: - loss: all servers emulate packet loss (30% by default) - single_loss: the first site in the configuration emulates packet loss (30% by default) - net_delay: all servers emulate packet delay (100ms by default with random variation of 10%) The settings can be supplied by adding ':' to the emulator name. For instance: # NETEM_ENV=loss:50 /usr/share/booth/tests/test/live_test.sh booth.conf It is not necessary to run the test script on one of the sites. Just copy the script and make the test `booth.conf` available locally: $ scp testsite:/usr/share/booth/tests/test/live_test.sh . $ scp testsite:/etc/booth/booth.conf . $ sh live_test.sh booth.conf You need at least two sites and one arbitrator. The configuration can contain just one ticket. It is not necessary to configure the `before-acquire-handler`. Notes: - (BEWARE!) the supplied configuration files is copied to /etc/booth/booth.conf to all sites/arbitrators thus overwriting any existing configuration - the utility uses ssh to manage booth at all sites/arbitrators and logs in as user `root` - it is required that ssh public authentication works without providing the passphrase (otherwise it is impractical) - the log file is ./test_booth.log (it is actually a shell trace, with timestamps if you're running bash) - in case one of the tests fail, hb_report is created If you want to open a bug report, please attach all hb_reports and `test_booth.log`. === Simple tests (commandline, config file) Run (as non-root) # python test/runtests.py to run the tests written in python. === Unit tests These use gdb and pexpect to set boothd state to some configured value, injecting some input and looking at the output. # python script/unit-test.py src/boothd unit-tests/ Or, if using the 'booth-test' RPM, # python unit-test.py src/boothd unit-tests/ This must (currently?) be run as a non-root user; another optional argument is the test to start from, eg. '003'. Basically, boothd is started with the config file `unit-tests/booth.conf`, and gdb gets attached to it. Then, some ticket state is set, incoming messages are delivered, and outgoing messages and the state is compared to expected values. `unit-tests/_defaults.txt` has default values for the initial state and message data. Each test file consists of headers and key/value pairs: -------------------- ticket: state ST_STABLE message0: # optional comment for the log file header.cmd OP_ACCEPTING ticket.id "asdga" outgoing0: header.cmd OP_PREPARING last_ack_ballot 42 finally: new_ballot 1234 -------------------- A few details to the the above example: * Ticket states in RAM (`ticket`, `finally`) are written in host-endianness. * Message data (`messageN`, `outgoingN`) are automatically converted via `htonl` resp. `ntohl`. They are delivered/checked in the order defined by the integer `N` component. * Strings are done via `strcpy()` * `ticket` and `messageN` are assignment chunks * `finally` and `outgoingN` are compare chunks * In `outgoingN` you can check _both_ message data (keys with a `.` in them) and ticket state * Symbolic names are useable, GDB translates them for us * The test scripts in `unit-tests/` need to be named with 3 digits, an underscore, some text, and `.txt` * The "fake" `crm_ticket` script gets the current test via `UNIT_TEST`; test scripts can pass additional information via `UNIT_TEST_AUX`. ==== Tips and Hints There's another special header: `gdb__N__`. These lines are sent to GDB after injecting a message, but before waiting for an outgoing line. Values that contain `§` are sent as multiple lines to GDB. This means that a stanza like -------------------- gdb0: watch booth_conf->ticket[0].owner § commands § bt § c § end -------------------- will cause a watchpoint to be set, and when it is triggered a backtrace (`bt`) is written to the log file. This makes it easy to ask for additional data or check for a call-chain when hitting bugs that can be reproduced via such a unit-test. # vim: set ft=asciidoc : diff --git a/src/config.h b/src/config.h index 09def16..bca73bc 100644 --- a/src/config.h +++ b/src/config.h @@ -1,328 +1,340 @@ /* * Copyright (C) 2011 Jiaju Zhang * Copyright (C) 2013-2014 Philipp Marek * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public * License as published by the Free Software Foundation; either * version 2 of the License, or (at your option) any later version. * * This software is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ #ifndef _CONFIG_H #define _CONFIG_H #include #include #include "booth.h" #include "timer.h" #include "raft.h" #include "transport.h" /** @{ */ /** Definitions for in-RAM data. */ #define MAX_NODES 16 #define MAX_ARGS 16 #define TICKET_ALLOC 16 #define OTHER_SITE "other" typedef enum { EXTPROG_IDLE, EXTPROG_RUNNING, EXTPROG_EXITED, EXTPROG_IGNORE, } extprog_state_e; #define tk_test tk->clu_test typedef enum { ATTR_OP_EQ = 1, ATTR_OP_NE, } attr_op_e; typedef enum { GRANT_AUTO = 1, GRANT_MANUAL, } grant_type_e; typedef enum { TICKET_MODE_AUTO = 1, TICKET_MODE_MANUAL, } ticket_mode_e; struct toktab { const char *str; int val; }; struct attr_prereq { grant_type_e grant_type; /* grant type */ attr_op_e op; /* attribute operation */ char *attr_name; char *attr_val; }; struct ticket_config { /** \name Configuration items. * @{ */ /** Name of ticket. */ boothc_ticket name; /** How long a term lasts if not refreshed (in ms) */ int term_duration; /** Network related timeouts (in ms) */ int timeout; /** Retries before giving up. */ int retries; /** If >0, time to wait for a site to get fenced. * The ticket may be acquired after that timespan by * another site. */ int acquire_after; /* How often to renew the ticket (in ms) */ int renewal_freq; /* Program to ask whether it makes sense to * acquire the ticket */ struct clu_test { char *path; int is_dir; char *argv[MAX_ARGS]; pid_t pid; int status; /* child exit status */ extprog_state_e progstate; /* program running/idle/waited on */ } clu_test; /** Node weights. */ int weight[MAX_NODES]; /* Mode operation of the ticket. * Set to MANUAL to make sure that the ticket will be manipulated * only by manual commands of the administrator. In such a case * automatic elections will be disabled. * Manual tickets do not have to be renewed every some time. * The leader will continue to send heartbeat messages to other sites. */ ticket_mode_e mode; /** @} */ /** \name Runtime values. * @{ */ /** Current state. */ server_state_e state; /** Next state. Used at startup. */ server_state_e next_state; /** When something has to be done */ timetype next_cron; /** Current leader. This is effectively the log[] in Raft. */ struct booth_site *leader; /** Leader that got lost. */ struct booth_site *lost_leader; /** Is the ticket granted? */ int is_granted; /** Which site considered itself a leader. * For manual tickets it is possible, that * more than one site will act as a leader. * This array is used for tracking that situation * and notifying the user about the issue. + * + * Possible values for every site: + * 0: the site does not claim to be the leader + * 1: the site considers itself a leader and + * is sending or used to send heartbeat messages + * + * The site will be marked as '1' until this site + * receives revoke confirmation. + * + * If more than one site has '1', the geo cluster is + * considered to have multiple leadership and proper + * warning are generated. */ int sites_where_granted[MAX_NODES]; /** Timestamp of leadership expiration */ timetype term_expires; /** End of election period */ timetype election_end; struct booth_site *voted_for; /** Who the various sites vote for. * NO_OWNER = no vote yet. */ struct booth_site *votes_for[MAX_NODES]; /* bitmap */ uint64_t votes_received; /** Last voting round that was seen. */ uint32_t current_term; /** Do ticket updates whenever we get enough heartbeats. * But do that only once. * This is reset to 0 whenever we broadcast heartbeat and set * to 1 once enough acks are received. * Increased to 2 when the ticket is commited to the CIB (see * delay_commit). */ uint32_t ticket_updated; /** Outcome of whatever ticket request was processed. * Can also be an intermediate stage. */ uint32_t outcome; /** @} */ /** */ uint32_t last_applied; uint32_t next_index[MAX_NODES]; uint32_t match_index[MAX_NODES]; /* Why did we start the elections? */ cmd_reason_t election_reason; /* if it is potentially dangerous to grant the ticket * immediately, then this is set to some point in time, * usually (now + term_duration + acquire_after) */ timetype delay_commit; /* the last request RPC we sent */ uint32_t last_request; /* if we expect some acks, then set this to the id of * the RPC which others will send us; it is cleared once all * replies were received */ uint32_t acks_expected; /* bitmask of servers which sent acks */ uint64_t acks_received; /* timestamp of the request */ timetype req_sent_at; /* we need to wait for MY_INDEX from other servers, * hold the ticket processing for a while until they reply */ int start_postpone; /** Last renewal time */ timetype last_renewal; /* Do we need to update the copy in the CIB? * Normally, the ticket is written only when it changes via * the UPDATE RPC (for followers) and on expiration update * (for leaders) */ int update_cib; /* Is this ticket in election? */ int in_election; /* don't log warnings unnecessarily */ int expect_more_rejects; /** \name Needed while proposals are being done. * @{ */ /* Need to keep the previous valid ticket in case we moved to * start new elections and another server asks for the ticket * status. It would be wrong to send our candidate ticket. */ struct ticket_config *last_valid_tk; /** Attributes, user defined */ GHashTable *attr; /** Attribute prerequisites */ GList *attr_prereqs; /** Whom to vote for the next time. * Needed to push a ticket to someone else. */ #if 0 /** Bitmap of sites that acknowledge that state. */ uint64_t proposal_acknowledges; /** When an incompletely acknowledged proposal gets done. * If all peers agree, that happens sooner. * See switch_state_to(). */ struct timeval proposal_switch; /** Timestamp of proposal expiration. */ time_t proposal_expires; #endif /** Number of send retries left. * Used on the new owner. * Starts at 0, counts up. */ int retry_number; /** @} */ }; struct booth_config { char name[BOOTH_NAME_LEN]; /** File containing the authentication file. */ char authfile[BOOTH_PATH_LEN]; struct stat authstat; char authkey[BOOTH_MAX_KEY_LEN]; int authkey_len; /** Maximum time skew between peers allowed */ int maxtimeskew; transport_layer_t proto; uint16_t port; /** Stores the OR of sites bitmasks. */ uint64_t sites_bits; /** Stores the OR of all members' bitmasks. */ uint64_t all_bits; char site_user[BOOTH_NAME_LEN]; char site_group[BOOTH_NAME_LEN]; char arb_user[BOOTH_NAME_LEN]; char arb_group[BOOTH_NAME_LEN]; uid_t uid; gid_t gid; int site_count; struct booth_site site[MAX_NODES]; int ticket_count; int ticket_allocated; struct ticket_config *ticket; }; extern struct booth_config *booth_conf; #define is_auth_req() (booth_conf->authkey[0] != '\0') int read_config(const char *path, int type); int check_config(int type); int find_site_by_name(char *site, struct booth_site **node, int any_type); int find_site_by_id(uint32_t site_id, struct booth_site **node); const char *type_to_string(int type); #endif /* _CONFIG_H */ diff --git a/src/manual.c b/src/manual.c index ae951a6..ee9e858 100644 --- a/src/manual.c +++ b/src/manual.c @@ -1,109 +1,108 @@ /* * Copyright (C) 2017 Chris Kowalczyk * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public * License as published by the Free Software Foundation; either * version 2 of the License, or (at your option) any later version. * * This software is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ #include "manual.h" #include "transport.h" #include "ticket.h" #include "config.h" #include "log.h" #include "request.h" /* For manual tickets, manual_selection function is an equivalent * of new_election function used for assigning automatic tickets. * The workflow here is much simplier, as no voting is performed, * and the current node doesn't have to wait for any responses * from other sites. */ int manual_selection(struct ticket_config *tk, struct booth_site *preference, int update_term, cmd_reason_t reason) { if (local->type != SITE) return 0; tk_log_debug("starting manual selection (caused by %s %s)", state_to_string(reason), reason == OR_AGAIN ? state_to_string(tk->election_reason) : "" ); // Manual selection is done without any delay, the leader is assigned set_leader(tk, local); - mark_ticket_as_granted_to(tk, local); set_state(tk, ST_LEADER); // Manual tickets never expire, we don't specify expiration time // Make sure that election_end field is empty time_reset(&tk->election_end); // Make sure that delay commit is empty, as manual tickets don't // wait for any kind of confirmation from other nodes time_reset(&tk->delay_commit); save_committed_tkt(tk); // Inform others about the new leader ticket_broadcast(tk, OP_HEARTBEAT, OP_ACK, RLT_SUCCESS, 0); tk->ticket_updated = 0; return 0; } /* This function is called for manual tickets that were * revoked from another site, which this site doesn't * consider as a leader. */ int process_REVOKE_for_manual_ticket ( struct ticket_config *tk, struct booth_site *sender, struct boothc_ticket_msg *msg) { int rv; // For manual tickets, we may end up having two leaders. // If one of them is revoked, it will send information // to all members of the GEO cluster. // We may find ourselves here if this particular site // has not been following the leader which had been revoked // (and which had sent this message). // We send the ACK, to satisfy the requestor. rv = send_msg(OP_ACK, tk, sender, msg); // Mark this ticket as not granted to the sender anymore. mark_ticket_as_revoked(tk, sender); if (tk->state == ST_LEADER) { tk_log_warn("%s wants to revoke ticket, " "but this site is itself a leader", site_string(sender)); // Because another leader is presumably stepping down, // let's notify other sites that now we are the only leader. ticket_broadcast(tk, OP_HEARTBEAT, OP_ACK, RLT_SUCCESS, 0); } else { tk_log_warn("%s wants to revoke ticket, " "but this site is not following it", site_string(sender)); } return rv; } diff --git a/src/pacemaker.c b/src/pacemaker.c index e9aea4b..7e3f9e6 100644 --- a/src/pacemaker.c +++ b/src/pacemaker.c @@ -1,540 +1,539 @@ /* * Copyright (C) 2011 Jiaju Zhang * Copyright (C) 2013-2014 Philipp Marek * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public * License as published by the Free Software Foundation; either * version 2 of the License, or (at your option) any later version. * * This software is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ #include #include #include #include #include #include #include #include #include #include "ticket.h" #include "log.h" #include "attr.h" #include "pacemaker.h" #include "inline-fn.h" enum atomic_ticket_supported { YES=0, NO, FILENOTFOUND, /* Ie. UNKNOWN */ UNKNOWN = FILENOTFOUND, }; /* http://thedailywtf.com/Articles/What_Is_Truth_0x3f_.aspx */ enum atomic_ticket_supported atomicity = UNKNOWN; #define COMMAND_MAX 1024 /** Determines whether the installed crm_ticket can do atomic ticket grants, * _including_ multiple attribute changes. * * See * https://bugzilla.novell.com/show_bug.cgi?id=855099 * * Run "crm_ticket" without "--force"; * - the old version asks for "Y/N" via STDIN, and returns 0 * when reading "no"; * - the new version just reports an error without asking. */ static void test_atomicity(void) { int rv; if (atomicity != UNKNOWN) return; rv = system("echo n | crm_ticket -g -t any-ticket-name > /dev/null 2> /dev/null"); if (rv == -1) { log_error("Cannot run \"crm_ticket\"!"); /* BIG problem. Abort. */ exit(1); } if (WIFSIGNALED(rv)) { log_error("\"crm_ticket\" terminated by a signal!"); /* Problem. Abort. */ exit(1); } switch (WEXITSTATUS(rv)) { case 0: atomicity = NO; log_info("Old \"crm_ticket\" found, using non-atomic ticket updates."); break; case 1: atomicity = YES; log_info("New \"crm_ticket\" found, using atomic ticket updates."); break; default: log_error("Unexpected return value from \"crm_ticket\" (%d), " "falling back to non-atomic ticket updates.", rv); atomicity = NO; } assert(atomicity == YES || atomicity == NO); } const char * interpret_rv(int rv) { static char text[64]; if (rv == 0) return "0"; if (WIFSIGNALED(rv)) sprintf(text, "got signal %d", WTERMSIG(rv)); else sprintf(text, "exit code %d", WEXITSTATUS(rv)); return text; } static int pcmk_write_ticket_atomic(struct ticket_config *tk, int grant) { char cmd[COMMAND_MAX]; int rv; /* The values are appended to "-v", so that NO_ONE * (which is -1) isn't seen as another option. */ snprintf(cmd, COMMAND_MAX, "crm_ticket -t '%s' " "%s --force " "-S owner -v%" PRIi32 " " "-S expires -v%" PRIi64 " " "-S term -v%" PRIi64, tk->name, (grant > 0 ? "-g" : grant < 0 ? "-r" : ""), (int32_t)get_node_id(tk->leader), (int64_t)wall_ts(&tk->term_expires), (int64_t)tk->current_term); rv = system(cmd); log_debug("command: '%s' was executed", cmd); if (rv != 0) log_error("error: \"%s\" failed, %s", cmd, interpret_rv(rv)); return rv; } static int pcmk_store_ticket_nonatomic(struct ticket_config *tk); static int pcmk_grant_ticket(struct ticket_config *tk) { char cmd[COMMAND_MAX]; int rv; test_atomicity(); if (atomicity == YES) return pcmk_write_ticket_atomic(tk, +1); rv = pcmk_store_ticket_nonatomic(tk); if (rv) return rv; snprintf(cmd, COMMAND_MAX, "crm_ticket -t %s -g --force", tk->name); log_debug("command: '%s' was executed", cmd); rv = system(cmd); if (rv != 0) log_error("error: \"%s\" failed, %s", cmd, interpret_rv(rv)); return rv; } static int pcmk_revoke_ticket(struct ticket_config *tk) { char cmd[COMMAND_MAX]; int rv; test_atomicity(); if (atomicity == YES) return pcmk_write_ticket_atomic(tk, -1); rv = pcmk_store_ticket_nonatomic(tk); if (rv) return rv; snprintf(cmd, COMMAND_MAX, "crm_ticket -t %s -r --force", tk->name); log_debug("command: '%s' was executed", cmd); rv = system(cmd); if (rv != 0) log_error("error: \"%s\" failed, %s", cmd, interpret_rv(rv)); return rv; } static int _run_crm_ticket(char *cmd) { int i, rv; /* If there are errors, there's not much we can do but retry ... */ for (i=0; i<3 && (rv = system(cmd)); i++) ; log_debug("'%s' gave result %s", cmd, interpret_rv(rv)); return rv; } static int crm_ticket_set_int(const struct ticket_config *tk, const char *attr, int64_t val) { char cmd[COMMAND_MAX]; snprintf(cmd, COMMAND_MAX, "crm_ticket -t '%s' -S '%s' -v %" PRIi64, tk->name, attr, val); return _run_crm_ticket(cmd); } static int pcmk_set_attr(struct ticket_config *tk, const char *attr, const char *val) { char cmd[COMMAND_MAX]; snprintf(cmd, COMMAND_MAX, "crm_ticket -t '%s' -S '%s' -v '%s'", tk->name, attr, val); return _run_crm_ticket(cmd); } static int pcmk_del_attr(struct ticket_config *tk, const char *attr) { char cmd[COMMAND_MAX]; snprintf(cmd, COMMAND_MAX, "crm_ticket -t '%s' -D '%s'", tk->name, attr); return _run_crm_ticket(cmd); } static int pcmk_store_ticket_nonatomic(struct ticket_config *tk) { int rv; /* Always try to store *each* attribute, even if there's an error * for one of them. */ rv = crm_ticket_set_int(tk, "owner", (int32_t)get_node_id(tk->leader)); rv = crm_ticket_set_int(tk, "expires", wall_ts(&tk->term_expires)) || rv; rv = crm_ticket_set_int(tk, "term", tk->current_term) || rv; if (rv) log_error("setting crm_ticket attributes failed; %s", interpret_rv(rv)); else log_info("setting crm_ticket attributes successful"); return rv; } typedef int (*attr_f)(struct ticket_config *tk, const char *name, const char *val); struct attr_tab { const char *name; attr_f handling_f; }; static int save_expires(struct ticket_config *tk, const char *name, const char *val) { secs2tv(unwall_ts(atol(val)), &tk->term_expires); return 0; } static int save_term(struct ticket_config *tk, const char *name, const char *val) { tk->current_term = atol(val); return 0; } static int parse_boolean(const char *val) { long v; if (!strncmp(val, "false", 5)) { v = 0; } else if (!strncmp(val, "true", 4)) { v = 1; } else { v = atol(val); } return v; } static int save_granted(struct ticket_config *tk, const char *name, const char *val) { tk->is_granted = parse_boolean(val); return 0; } static int save_owner(struct ticket_config *tk, const char *name, const char *val) { /* No check, node could have been deconfigured. */ tk->leader = NULL; return !find_site_by_id(atol(val), &tk->leader); } static int ignore_attr(struct ticket_config *tk, const char *name, const char *val) { return 0; } static int save_attr(struct ticket_config *tk, const char *name, const char *val) { /* tell store_geo_attr not to store time, we don't have that * information available */ return store_geo_attr(tk, name, val, 1); } struct attr_tab attr_handlers[] = { { "expires", save_expires}, { "term", save_term}, { "granted", save_granted}, { "owner", save_owner}, { "id", ignore_attr}, { "last-granted", ignore_attr}, { NULL, 0}, }; /* get_attr is currently not used and has not been tested */ static int pcmk_get_attr(struct ticket_config *tk, const char *attr, const char **vp) { char cmd[COMMAND_MAX]; char line[BOOTH_ATTRVAL_LEN+1]; int rv = 0; FILE *p; *vp = NULL; snprintf(cmd, COMMAND_MAX, "crm_ticket -t '%s' -G '%s' --quiet", tk->name, attr); p = popen(cmd, "r"); if (p == NULL) { rv = errno; log_error("popen error %d (%s) for \"%s\"", rv, strerror(rv), cmd); return rv || EINVAL; } if (fgets(line, BOOTH_ATTRVAL_LEN, p) == NULL) { rv = ENODATA; goto out; } *vp = g_strdup(line); out: rv = pclose(p); if (!rv) { log_debug("command \"%s\"", cmd); } else if (WEXITSTATUS(rv) == 6) { log_info("command \"%s\", ticket not found", cmd); } else { log_error("command \"%s\" %s", cmd, interpret_rv(rv)); } return rv; } static int save_attributes(struct ticket_config *tk, xmlDocPtr doc) { int rv = 0, rc; xmlNodePtr n; xmlAttrPtr attr; xmlChar *v; struct attr_tab *atp; n = xmlDocGetRootElement(doc); if (n == NULL) { tk_log_error("crm_ticket xml output empty"); return -EINVAL; } if (xmlStrcmp(n->name, (const xmlChar *)"ticket_state")) { tk_log_error("crm_ticket xml root element not ticket_state"); return -EINVAL; } for (attr = n->properties; attr; attr = attr->next) { v = xmlGetProp(n, attr->name); for (atp = attr_handlers; atp->name; atp++) { if (!strcmp(atp->name, (const char *) attr->name)) { rc = atp->handling_f(tk, (const char *) attr->name, (const char *) v); break; } } if (!atp->name) { rc = save_attr(tk, (const char *) attr->name, (const char *) v); } if (rc) { tk_log_error("error storing attribute %s", attr->name); rv |= rc; } xmlFree(v); } return rv; } #define CHUNK_SIZE 256 static int parse_ticket_state(struct ticket_config *tk, FILE *p) { int rv = 0; GString *input = NULL; char line[CHUNK_SIZE]; xmlDocPtr doc = NULL; xmlErrorPtr errptr; int opts = XML_PARSE_COMPACT | XML_PARSE_NONET; /* skip first two lines of output */ if (fgets(line, CHUNK_SIZE-1, p) == NULL || fgets(line, CHUNK_SIZE-1, p) == NULL) { tk_log_error("crm_ticket xml output empty"); rv = ENODATA; goto out; } input = g_string_sized_new(CHUNK_SIZE); if (!input) { log_error("out of memory"); rv = -1; goto out; } while (fgets(line, CHUNK_SIZE-1, p) != NULL) { if (!g_string_append(input, line)) { log_error("out of memory"); rv = -1; goto out; } } doc = xmlReadDoc((const xmlChar *) input->str, NULL, NULL, opts); if (doc == NULL) { errptr = xmlGetLastError(); if (errptr) { tk_log_error("crm_ticket xml parse failed (domain=%d, level=%d, code=%d): %s", errptr->domain, errptr->level, errptr->code, errptr->message); } else { tk_log_error("crm_ticket xml parse failed"); } rv = -EINVAL; goto out; } rv = save_attributes(tk, doc); out: if (doc) xmlFreeDoc(doc); if (input) g_string_free(input, TRUE); return rv; } static int pcmk_load_ticket(struct ticket_config *tk) { char cmd[COMMAND_MAX]; int rv = 0, pipe_rv; FILE *p; /* This here gets run during startup; testing that here means that * normal operation won't be interrupted with that test. */ test_atomicity(); snprintf(cmd, COMMAND_MAX, "crm_ticket -t '%s' -q", tk->name); p = popen(cmd, "r"); if (p == NULL) { pipe_rv = errno; log_error("popen error %d (%s) for \"%s\"", pipe_rv, strerror(pipe_rv), cmd); return pipe_rv || -EINVAL; } rv = parse_ticket_state(tk, p); if (!tk->leader) { /* Hmm, no site found for the ticket we have in the * CIB!? * Assume that the ticket belonged to us if it was * granted here! */ log_warn("%s: no site matches; site got reconfigured?", tk->name); if (tk->is_granted) { log_warn("%s: granted here, assume it belonged to us", tk->name); set_leader(tk, local); - mark_ticket_as_granted_to(tk, local); } } pipe_rv = pclose(p); if (!pipe_rv) { log_debug("command \"%s\"", cmd); } else if (WEXITSTATUS(pipe_rv) == 6) { log_info("command \"%s\", ticket not found", cmd); } else { log_error("command \"%s\" %s", cmd, interpret_rv(pipe_rv)); } return rv | pipe_rv; } struct ticket_handler pcmk_handler = { .grant_ticket = pcmk_grant_ticket, .revoke_ticket = pcmk_revoke_ticket, .load_ticket = pcmk_load_ticket, .set_attr = pcmk_set_attr, .get_attr = pcmk_get_attr, .del_attr = pcmk_del_attr, }; diff --git a/src/raft.c b/src/raft.c index a0cec42..462fc3b 100644 --- a/src/raft.c +++ b/src/raft.c @@ -1,1023 +1,1011 @@ /* * Copyright (C) 2014 Philipp Marek * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public * License as published by the Free Software Foundation; either * version 2 of the License, or (at your option) any later version. * * This software is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ #include #include #include #include #include #include "booth.h" #include "timer.h" #include "transport.h" #include "inline-fn.h" #include "config.h" #include "raft.h" #include "ticket.h" #include "request.h" #include "log.h" #include "manual.h" inline static void clear_election(struct ticket_config *tk) { int i; struct booth_site *site; tk_log_debug("clear election"); tk->votes_received = 0; foreach_node(i, site) tk->votes_for[site->index] = NULL; } inline static void record_vote(struct ticket_config *tk, struct booth_site *who, struct booth_site *vote) { tk_log_debug("site %s votes for %s", site_string(who), site_string(vote)); if (!tk->votes_for[who->index]) { tk->votes_for[who->index] = vote; tk->votes_received |= who->bitmask; } else { if (tk->votes_for[who->index] != vote) tk_log_warn("%s voted previously " "for %s and now wants to vote for %s (ignored)", site_string(who), site_string(tk->votes_for[who->index]), site_string(vote)); } } static void update_term_from_msg(struct ticket_config *tk, struct boothc_ticket_msg *msg) { uint32_t i; i = ntohl(msg->ticket.term); /* if we failed to start the election, then accept the term * from the leader * */ if (tk->state == ST_CANDIDATE) { tk->current_term = i; } else { tk->current_term = max(i, tk->current_term); } } static void set_ticket_expiry(struct ticket_config *tk, int duration) { set_future_time(&tk->term_expires, duration); } static void update_ticket_from_msg(struct ticket_config *tk, struct booth_site *sender, struct boothc_ticket_msg *msg) { int duration; tk_log_info("updating from %s (%d/%d)", site_string(sender), ntohl(msg->ticket.term), msg_term_time(msg)); duration = min(tk->term_duration, msg_term_time(msg)); set_ticket_expiry(tk, duration); update_term_from_msg(tk, msg); } static void copy_ticket_from_msg(struct ticket_config *tk, struct boothc_ticket_msg *msg) { set_ticket_expiry(tk, msg_term_time(msg)); tk->current_term = ntohl(msg->ticket.term); } static void become_follower(struct ticket_config *tk, struct boothc_ticket_msg *msg) { copy_ticket_from_msg(tk, msg); set_state(tk, ST_FOLLOWER); time_reset(&tk->delay_commit); tk->in_election = 0; /* if we're following and the ticket was granted here * then commit to CIB right away (we're probably restarting) */ if (tk->is_granted) { disown_ticket(tk); ticket_write(tk); } } static void won_elections(struct ticket_config *tk) { set_leader(tk, local); - mark_ticket_as_granted_to(tk, local); set_state(tk, ST_LEADER); set_ticket_expiry(tk, tk->term_duration); time_reset(&tk->election_end); tk->voted_for = NULL; if (is_time_set(&tk->delay_commit) && all_sites_replied(tk)) { time_reset(&tk->delay_commit); tk_log_debug("reset delay commit as all sites replied"); } save_committed_tkt(tk); ticket_broadcast(tk, OP_HEARTBEAT, OP_ACK, RLT_SUCCESS, 0); tk->ticket_updated = 0; } /* if more than one member got the same (and maximum within that * election) number of votes, then that is a tie */ static int is_tie(struct ticket_config *tk) { int i; struct booth_site *v; int count[MAX_NODES] = { 0, }; int max_votes = 0, max_cnt = 0; for(i=0; isite_count; i++) { v = tk->votes_for[i]; if (!v) continue; count[v->index]++; max_votes = max(max_votes, count[v->index]); } for(i=0; isite_count; i++) { if (count[i] == max_votes) max_cnt++; } return max_cnt > 1; } static struct booth_site *majority_votes(struct ticket_config *tk) { int i, n; struct booth_site *v; int count[MAX_NODES] = { 0, }; for(i=0; isite_count; i++) { v = tk->votes_for[i]; if (!v || v == no_leader) continue; n = v->index; count[n]++; tk_log_debug("Majority: %d %s wants %d %s => %d", i, site_string(&booth_conf->site[i]), n, site_string(v), count[n]); if (count[n]*2 <= booth_conf->site_count) continue; tk_log_debug("Majority reached: %d of %d for %s", count[n], booth_conf->site_count, site_string(v)); return v; } return NULL; } void elections_end(struct ticket_config *tk) { struct booth_site *new_leader; if (is_past(&tk->election_end)) { /* This is previous election timed out */ tk_log_info("elections finished"); } tk->in_election = 0; new_leader = majority_votes(tk); if (new_leader == local) { won_elections(tk); tk_log_info("granted successfully here"); } else if (new_leader) { tk_log_info("ticket granted at %s", site_string(new_leader)); } else { tk_log_info("nobody won elections, new elections"); tk->outcome = RLT_MORE; foreach_tkt_req(tk, notify_client); if (!new_election(tk, NULL, is_tie(tk) ? 2 : 0, OR_AGAIN)) { ticket_activate_timeout(tk); } } } static int newer_term(struct ticket_config *tk, struct booth_site *sender, struct booth_site *leader, struct boothc_ticket_msg *msg, int in_election) { uint32_t term; /* it may happen that we hear about our newer term */ if (leader == local) return 0; term = ntohl(msg->ticket.term); /* §5.1 */ if (term > tk->current_term) { set_state(tk, ST_FOLLOWER); if (!in_election) { set_leader(tk, leader); - mark_ticket_as_granted_to(tk, leader); tk_log_info("from %s: higher term %d vs. %d, following %s", site_string(sender), term, tk->current_term, ticket_leader_string(tk)); } else { tk_log_debug("from %s: higher term %d vs. %d (election)", site_string(sender), term, tk->current_term); } tk->current_term = term; return 1; } return 0; } static int msg_term_invalid(struct ticket_config *tk, struct booth_site *sender, struct booth_site *leader, struct boothc_ticket_msg *msg) { uint32_t term; term = ntohl(msg->ticket.term); /* §5.1 */ if (is_term_invalid(tk, term)) { tk_log_info("got invalid term from %s " "(%d), ignoring", site_string(sender), term); return 1; } return 0; } static int term_too_low(struct ticket_config *tk, struct booth_site *sender, struct booth_site *leader, struct boothc_ticket_msg *msg) { uint32_t term; term = ntohl(msg->ticket.term); /* §5.1 */ if (term < tk->current_term) { tk_log_info("sending reject to %s, its term too low " "(%d vs. %d)", site_string(sender), term, tk->current_term ); send_reject(sender, tk, RLT_TERM_OUTDATED, msg); return 1; } return 0; } /* For follower. */ static int answer_HEARTBEAT ( struct ticket_config *tk, struct booth_site *sender, struct booth_site *leader, struct boothc_ticket_msg *msg ) { uint32_t term; term = ntohl(msg->ticket.term); tk_log_debug("heartbeat from leader: %s, have %s; term %d vs %d", site_string(leader), ticket_leader_string(tk), term, tk->current_term); if (term < tk->current_term) { if (sender == tk->leader) { tk_log_info("trusting leader %s with a lower term (%d vs %d)", site_string(leader), term, tk->current_term); } else if (is_owned(tk)) { tk_log_warn("different leader %s with a lower term " "(%d vs %d), sending reject", site_string(leader), term, tk->current_term); return send_reject(sender, tk, RLT_TERM_OUTDATED, msg); } } /* got heartbeat, no rejects expected anymore */ tk->expect_more_rejects = 0; /* Needed? */ newer_term(tk, sender, leader, msg, 0); become_follower(tk, msg); /* Racy??? */ assert(sender == leader || !leader); set_leader(tk, leader); - mark_ticket_as_granted_to(tk, leader); /* Ack the heartbeat (we comply). */ return send_msg(OP_ACK, tk, sender, msg); } static int process_UPDATE ( struct ticket_config *tk, struct booth_site *sender, struct booth_site *leader, struct boothc_ticket_msg *msg ) { if (is_owned(tk) && sender != tk->leader) { tk_log_warn("different leader %s wants to update " "our ticket, sending reject", site_string(leader)); - - mark_ticket_as_granted_to(tk, sender); + + mark_ticket_as_granted(tk, sender); return send_reject(sender, tk, RLT_TERM_OUTDATED, msg); } tk_log_debug("leader %s wants to update our ticket", site_string(leader)); become_follower(tk, msg); set_leader(tk, leader); - mark_ticket_as_granted_to(tk, leader); ticket_write(tk); /* run ticket_cron if the ticket expires */ set_ticket_wakeup(tk); return send_msg(OP_ACK, tk, sender, msg); } static int process_REVOKE ( struct ticket_config *tk, struct booth_site *sender, struct booth_site *leader, struct boothc_ticket_msg *msg ) { int rv; if (tk->state == ST_INIT && tk->leader == no_leader) { /* assume that our ack got lost */ rv = send_msg(OP_ACK, tk, sender, msg); } else if (tk->leader != sender) { if (!is_manual(tk)) { tk_log_error("%s wants to revoke ticket, " "but it is not granted there (ignoring)", site_string(sender)); return -1; } else { rv = process_REVOKE_for_manual_ticket(tk, sender, msg); // Ticket data stored in this site is not modified. This means // that this site will still follow another leader (the one which // has not been revoked) or be a leader itself. } } else if (tk->state != ST_FOLLOWER) { tk_log_error("unexpected ticket revoke from %s " "(in state %s) (ignoring)", site_string(sender), state_to_string(tk->state)); return -1; } else { tk_log_info("%s revokes ticket", site_string(tk->leader)); save_committed_tkt(tk); - mark_ticket_as_revoked_from_leader(tk); - reset_ticket(tk); - set_leader(tk, no_leader); + reset_ticket_and_set_no_leader(tk); ticket_write(tk); rv = send_msg(OP_ACK, tk, sender, msg); } return rv; } /* For leader. */ static int process_ACK( struct ticket_config *tk, struct booth_site *sender, struct booth_site *leader, struct boothc_ticket_msg *msg ) { uint32_t term; int req; term = ntohl(msg->ticket.term); if (newer_term(tk, sender, leader, msg, 0)) { /* unexpected higher term */ tk_log_warn("got higher term from %s (%d vs. %d)", site_string(sender), term, tk->current_term); return 0; } /* Don't send a reject. */ if (term < tk->current_term) { /* Doesn't know what he's talking about - perhaps * doesn't receive our packets? */ tk_log_warn("unexpected term " "from %s (%d vs. %d) (ignoring)", site_string(sender), term, tk->current_term); return 0; } /* if the ticket is to be revoked, further processing is not * interesting (and dangerous) */ if (tk->next_state == ST_INIT || tk->state == ST_INIT) return 0; req = ntohl(msg->header.request); if ((req == OP_UPDATE || req == OP_HEARTBEAT) && term == tk->current_term && leader == tk->leader) { if (majority_of_bits(tk, tk->acks_received)) { /* OK, at least half of the nodes are reachable; * Update the ticket and send update messages out */ return leader_update_ticket(tk); } } return 0; } static int process_VOTE_FOR( struct ticket_config *tk, struct booth_site *sender, struct booth_site *leader, struct boothc_ticket_msg *msg ) { if (leader == no_leader) { /* leader wants to step down? */ if (sender == tk->leader && (tk->state == ST_FOLLOWER || tk->state == ST_CANDIDATE)) { tk_log_info("%s wants to give the ticket away (ticket release)", site_string(tk->leader)); save_committed_tkt(tk); reset_ticket(tk); set_state(tk, ST_FOLLOWER); if (local->type == SITE) { ticket_write(tk); schedule_election(tk, OR_STEPDOWN); } } else { tk_log_info("%s votes for none, ignoring (duplicate ticket release?)", site_string(sender)); } return 0; } if (tk->state != ST_CANDIDATE) { /* lost candidate status, somebody rejected our proposal */ tk_log_info("candidate status lost, ignoring VtFr from %s", site_string(sender)); return 0; } if (term_too_low(tk, sender, leader, msg)) return 0; if (newer_term(tk, sender, leader, msg, 0)) { clear_election(tk); } record_vote(tk, sender, leader); /* only if all voted can we take the ticket now, otherwise * wait for timeout in ticket_cron */ if (!tk->acks_expected) { /* §5.2 */ elections_end(tk); } return 0; } static int process_REJECTED( struct ticket_config *tk, struct booth_site *sender, struct booth_site *leader, struct boothc_ticket_msg *msg ) { uint32_t rv; rv = ntohl(msg->header.result); if (tk->state == ST_CANDIDATE && leader == local) { /* the sender has us as the leader (!) * the elections will time out, then we can try again */ tk_log_warn("ticket was granted to us " "(and we didn't know)"); tk->expect_more_rejects = 1; return 0; } if (tk->state == ST_CANDIDATE && rv == RLT_TERM_OUTDATED) { tk_log_warn("ticket outdated (term %d), granted to %s", ntohl(msg->ticket.term), site_string(leader) ); set_leader(tk, leader); - mark_ticket_as_granted_to(tk, leader); tk->expect_more_rejects = 1; become_follower(tk, msg); return 0; } if (tk->state == ST_CANDIDATE && rv == RLT_TERM_STILL_VALID) { if (tk->lost_leader == leader) { if (tk->election_reason == OR_TKT_LOST) { tk_log_warn("%s still has the ticket valid, " "we'll backup a bit", site_string(sender)); } else { tk_log_warn("%s unexpectedly rejects elections", site_string(sender)); } } else { tk_log_warn("ticket was granted to %s " "(and we didn't know)", site_string(leader)); } set_leader(tk, leader); - mark_ticket_as_granted_to(tk, leader); become_follower(tk, msg); tk->expect_more_rejects = 1; return 0; } if (tk->state == ST_CANDIDATE && rv == RLT_YOU_OUTDATED) { set_leader(tk, leader); - mark_ticket_as_granted_to(tk, leader); tk->expect_more_rejects = 1; if (leader && leader != no_leader) { tk_log_warn("our ticket is outdated, granted to %s", site_string(leader)); become_follower(tk, msg); } else { tk_log_warn("our ticket is outdated and revoked"); update_ticket_from_msg(tk, sender, msg); set_state(tk, ST_INIT); } return 0; } if (!tk->expect_more_rejects) { tk_log_warn("from %s: in state %s, got %s (unexpected reject)", site_string(sender), state_to_string(tk->state), state_to_string(rv)); } return 0; } static int ticket_seems_ok(struct ticket_config *tk) { int left; left = term_time_left(tk); if (!left) return 0; /* quite sure */ if (tk->state == ST_CANDIDATE) return 0; /* in state of flux */ if (tk->state == ST_LEADER) return 1; /* quite sure */ if (tk->state == ST_FOLLOWER && left >= tk->term_duration/3) return 1; /* almost quite sure */ return 0; } static int test_reason( struct ticket_config *tk, struct booth_site *sender, struct booth_site *leader, struct boothc_ticket_msg *msg ) { int reason; reason = ntohl(msg->header.reason); if (reason == OR_TKT_LOST) { if (tk->state == ST_INIT && tk->leader == no_leader) { tk_log_warn("%s claims that the ticket is lost, " "but it's in %s state (reject sent)", site_string(sender), state_to_string(tk->state) ); return RLT_YOU_OUTDATED; } if (ticket_seems_ok(tk)) { tk_log_warn("%s claims that the ticket is lost, " "but it is ok here (reject sent)", site_string(sender)); return RLT_TERM_STILL_VALID; } } return 0; } /* §5.2 */ static int answer_REQ_VOTE( struct ticket_config *tk, struct booth_site *sender, struct booth_site *leader, struct boothc_ticket_msg *msg ) { int valid; struct boothc_ticket_msg omsg; cmd_result_t inappr_reason; int reason; inappr_reason = test_reason(tk, sender, leader, msg); if (inappr_reason) return send_reject(sender, tk, inappr_reason, msg); valid = term_time_left(tk); reason = ntohl(msg->header.reason); /* valid tickets are not allowed only if the sender thinks * the ticket got lost */ if (sender != tk->leader && valid && reason != OR_STEPDOWN) { tk_log_warn("election from %s with reason %s rejected " "(we have %s as ticket owner), ticket still valid for %ds", site_string(sender), state_to_string(reason), site_string(tk->leader), valid); return send_reject(sender, tk, RLT_TERM_STILL_VALID, msg); } if (term_too_low(tk, sender, leader, msg)) return 0; /* set this, so that we know not to send status for the * ticket */ tk->in_election = 1; /* reset ticket's leader on not valid tickets */ - if (!valid) { - mark_ticket_as_revoked_from_leader(tk); + if (!valid) set_leader(tk, NULL); - } /* if it's a newer term or ... */ if (newer_term(tk, sender, leader, msg, 1)) { clear_election(tk); goto vote_for_sender; } /* ... we didn't vote yet, then vote for the sender */ /* §5.2, §5.4 */ if (!tk->voted_for) { vote_for_sender: tk->voted_for = sender; record_vote(tk, sender, leader); } init_ticket_msg(&omsg, OP_VOTE_FOR, OP_REQ_VOTE, RLT_SUCCESS, 0, tk); omsg.ticket.leader = htonl(get_node_id(tk->voted_for)); return booth_udp_send_auth(sender, &omsg, sendmsglen(&omsg)); } #define is_reason(r, tk) \ (reason == (r) || (reason == OR_AGAIN && (tk)->election_reason == (r))) int new_election(struct ticket_config *tk, struct booth_site *preference, int update_term, cmd_reason_t reason) { struct booth_site *new_leader; if (local->type != SITE) return 0; if ((is_reason(OR_TKT_LOST, tk) || is_reason(OR_STEPDOWN, tk)) && check_attr_prereq(tk, GRANT_AUTO)) { tk_log_info("attribute prerequisite not met, " "not starting elections"); return 0; } /* elections were already started, but not yet finished/timed out */ if (is_time_set(&tk->election_end) && !is_past(&tk->election_end)) return 1; if (ANYDEBUG) { int tdiff; if (is_time_set(&tk->election_end)) { tdiff = -time_left(&tk->election_end); tk_log_debug("starting elections, previous finished since " intfmt(tdiff)); } else { tk_log_debug("starting elections"); } tk_log_debug("elections caused by %s %s", state_to_string(reason), reason == OR_AGAIN ? state_to_string(tk->election_reason) : "" ); } /* §5.2 */ /* If there was _no_ answer, don't keep incrementing the term number * indefinitely. If there was no peer, there'll probably be no one * listening now either. However, we don't know if we were * invoked due to a timeout (caller does). */ /* increment the term only if either the current term was * valid or if there was a tie (in that case update_term > 1) */ if ((update_term > 1) || (update_term && tk->last_valid_tk && tk->last_valid_tk->current_term >= tk->current_term)) { /* save the previous term, we may need to send out the * MY_INDEX message */ if (tk->state != ST_CANDIDATE) { save_committed_tkt(tk); } tk->current_term++; } set_future_time(&tk->election_end, tk->timeout); tk->in_election = 1; tk_log_info("starting new election (term=%d)", tk->current_term); clear_election(tk); new_leader = preference ? preference : local; record_vote(tk, local, new_leader); tk->voted_for = new_leader; set_state(tk, ST_CANDIDATE); /* some callers may want just to repeat on timeout */ if (reason == OR_AGAIN) { reason = tk->election_reason; } else { tk->election_reason = reason; } ticket_broadcast(tk, OP_REQ_VOTE, OP_VOTE_FOR, RLT_SUCCESS, reason); add_random_delay(tk); return 0; } /* we were a leader and somebody says that they have a more up * to date ticket * there was probably connectivity loss * tricky */ static int leader_handle_newer_ticket( struct ticket_config *tk, struct booth_site *sender, struct booth_site *leader, struct boothc_ticket_msg *msg ) { update_term_from_msg(tk, msg); if (leader != no_leader && leader && leader != local) { /* eek, two leaders, split brain */ /* normally shouldn't happen; run election */ tk_log_error("from %s: ticket granted to %s! (revoking locally)", site_string(sender), site_string(leader) ); } else if (term_time_left(tk)) { /* eek, two leaders, split brain */ /* normally shouldn't happen; run election */ tk_log_error("from %s: ticket granted to %s! (revoking locally)", site_string(sender), site_string(leader) ); } set_next_state(tk, ST_LEADER); return 0; } /* reply to STATUS */ static int process_MY_INDEX ( struct ticket_config *tk, struct booth_site *sender, struct booth_site *leader, struct boothc_ticket_msg *msg ) { int i; int expired; expired = !msg_term_time(msg); /* test against the last valid(!) ticket we have */ i = my_last_term(tk) - ntohl(msg->ticket.term); if (i > 0) { /* let them know about our newer ticket */ send_msg(OP_MY_INDEX, tk, sender, msg); if (tk->state == ST_LEADER) { tk_log_info("sending ticket update to %s", site_string(sender)); return send_msg(OP_UPDATE, tk, sender, msg); } } /* we have a newer or equal ticket and theirs is expired, * nothing more to do here */ if (i >= 0 && expired) { return 0; } if (tk->state == ST_LEADER) { /* we're the leader, thread carefully */ if (expired) { /* if their ticket is expired, * nothing more to do */ return 0; } if (i < 0) { /* they have a newer ticket, trouble if we're already leader * for it */ tk_log_warn("from %s: more up to date ticket at %s", site_string(sender), site_string(leader) ); return leader_handle_newer_ticket(tk, sender, leader, msg); } else { /* we have the ticket and we don't care */ return 0; } } else if (tk->state == ST_CANDIDATE) { if (leader == local) { /* a belated MY_INDEX, we're already trying to get the * ticket */ return 0; } } /* their ticket is either newer or not expired, don't * ignore it */ update_ticket_from_msg(tk, sender, msg); set_leader(tk, leader); - mark_ticket_as_granted_to(tk, leader); update_ticket_state(tk, sender); save_committed_tkt(tk); set_ticket_wakeup(tk); return 0; } int raft_answer( struct ticket_config *tk, struct booth_site *sender, struct booth_site *leader, struct boothc_ticket_msg *msg ) { int cmd, req; int rv; rv = 0; cmd = ntohl(msg->header.cmd); req = ntohl(msg->header.request); if (req) tk_log_debug("got %s (req %s) from %s", state_to_string(cmd), state_to_string(req), site_string(sender)); else tk_log_debug("got %s from %s", state_to_string(cmd), site_string(sender)); /* don't process tickets with invalid term */ if (cmd != OP_STATUS && msg_term_invalid(tk, sender, leader, msg)) return 0; switch (cmd) { case OP_REQ_VOTE: rv = answer_REQ_VOTE(tk, sender, leader, msg); break; case OP_VOTE_FOR: rv = process_VOTE_FOR(tk, sender, leader, msg); break; case OP_ACK: if (tk->leader == local && tk->state == ST_LEADER) rv = process_ACK(tk, sender, leader, msg); break; case OP_HEARTBEAT: if ((tk->leader != local || !term_time_left(tk)) && (tk->state == ST_INIT || tk->state == ST_FOLLOWER || tk->state == ST_CANDIDATE)) rv = answer_HEARTBEAT(tk, sender, leader, msg); else { tk_log_warn("unexpected message %s, from %s", state_to_string(cmd), site_string(sender)); - mark_ticket_as_granted_to(tk, sender); + mark_ticket_as_granted(tk, sender); if (ticket_seems_ok(tk)) send_reject(sender, tk, RLT_TERM_STILL_VALID, msg); rv = -EINVAL; } break; case OP_UPDATE: if (((tk->leader != local && tk->leader == leader) || !is_owned(tk)) && (tk->state == ST_INIT || tk->state == ST_FOLLOWER || tk->state == ST_CANDIDATE)) { rv = process_UPDATE(tk, sender, leader, msg); } else { tk_log_warn("unexpected message %s, from %s", state_to_string(cmd), site_string(sender)); if (ticket_seems_ok(tk)) send_reject(sender, tk, RLT_TERM_STILL_VALID, msg); rv = -EINVAL; } break; case OP_REJECTED: rv = process_REJECTED(tk, sender, leader, msg); break; case OP_REVOKE: rv = process_REVOKE(tk, sender, leader, msg); break; case OP_MY_INDEX: rv = process_MY_INDEX(tk, sender, leader, msg); break; case OP_STATUS: if (!tk->in_election) rv = send_msg(OP_MY_INDEX, tk, sender, msg); break; default: tk_log_error("unknown message %s, from %s", state_to_string(cmd), site_string(sender)); rv = -EINVAL; } return rv; } diff --git a/src/ticket.c b/src/ticket.c index ab87b0b..6becc10 100644 --- a/src/ticket.c +++ b/src/ticket.c @@ -1,1433 +1,1438 @@ /* * Copyright (C) 2011 Jiaju Zhang * Copyright (C) 2013-2014 Philipp Marek * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public * License as published by the Free Software Foundation; either * version 2 of the License, or (at your option) any later version. * * This software is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ #include #include #include #include #include #include #include #include #include "b_config.h" #ifndef RANGE2RANDOM_GLIB #include #else #include "alt/range2random_glib.h" #endif #include "ticket.h" #include "config.h" #include "pacemaker.h" #include "inline-fn.h" #include "log.h" #include "booth.h" #include "raft.h" #include "handler.h" #include "request.h" #include "manual.h" #define TK_LINE 256 extern int TIME_RES; /* Untrusted input, must fit (incl. \0) in a buffer of max chars. */ int check_max_len_valid(const char *s, int max) { int i; for(i=0; iticket_count; i++) { if (!strncmp(booth_conf->ticket[i].name, ticket, sizeof(booth_conf->ticket[i].name))) { if (found) *found = booth_conf->ticket + i; return 1; } } return 0; } int check_ticket(char *ticket, struct ticket_config **found) { if (found) *found = NULL; if (!booth_conf) return 0; if (!check_max_len_valid(ticket, sizeof(booth_conf->ticket[0].name))) return 0; return find_ticket_by_name(ticket, found); } int check_site(char *site, int *is_local) { struct booth_site *node; if (!check_max_len_valid(site, sizeof(node->addr_string))) return 0; if (find_site_by_name(site, &node, 0)) { *is_local = node->local; return 1; } return 0; } /* is it safe to commit the grant? * if we didn't hear from all sites on the initial grant, we may * need to delay the commit * * TODO: investigate possibility to devise from history whether a * missing site could be holding a ticket or not */ static int ticket_dangerous(struct ticket_config *tk) { int tdiff; /* we may be invoked often, don't spam the log unnecessarily */ static int no_log_delay_msg; if (!is_time_set(&tk->delay_commit)) return 0; if (is_past(&tk->delay_commit) || all_sites_replied(tk)) { if (tk->leader == local) { tk_log_info("%s, committing to CIB", is_past(&tk->delay_commit) ? "ticket delay expired" : "all sites replied"); } time_reset(&tk->delay_commit); no_log_delay_msg = 0; return 0; } tdiff = time_left(&tk->delay_commit); tk_log_debug("delay ticket commit for another " intfmt(tdiff)); if (!no_log_delay_msg) { tk_log_info("delaying ticket commit to CIB for " intfmt(tdiff)); tk_log_info("(or all sites are reached)"); no_log_delay_msg = 1; } return 1; } int ticket_write(struct ticket_config *tk) { if (local->type != SITE) return -EINVAL; if (ticket_dangerous(tk)) return 1; if (tk->leader == local) { if (tk->state != ST_LEADER) { tk_log_info("ticket state not yet consistent, " "delaying ticket grant to CIB"); return 1; } pcmk_handler.grant_ticket(tk); } else { pcmk_handler.revoke_ticket(tk); } tk->update_cib = 0; return 0; } void save_committed_tkt(struct ticket_config *tk) { if (!tk->last_valid_tk) { tk->last_valid_tk = malloc(sizeof(struct ticket_config)); if (!tk->last_valid_tk) { log_error("out of memory"); return; } } memcpy(tk->last_valid_tk, tk, sizeof(struct ticket_config)); } static void ext_prog_failed(struct ticket_config *tk, int start_election) { if (!is_manual(tk)) { /* Give it to somebody else. * Just send a VOTE_FOR message, so the * others can start elections. */ if (leader_and_valid(tk)) { save_committed_tkt(tk); reset_ticket(tk); ticket_write(tk); if (start_election) { ticket_broadcast(tk, OP_VOTE_FOR, OP_REQ_VOTE, RLT_SUCCESS, OR_LOCAL_FAIL); } } } else { /* There is not much we can do now because * the manual ticket cannot be relocated. * Just warn the user. */ if (tk->leader == local) { save_committed_tkt(tk); reset_ticket(tk); ticket_write(tk); log_error("external test failed on the specified machine, cannot acquire a manual ticket"); } } } #define attr_found(geo_ap, ap) \ ((geo_ap) && !strcmp((geo_ap)->val, (ap)->attr_val)) int check_attr_prereq(struct ticket_config *tk, grant_type_e grant_type) { GList *el; struct attr_prereq *ap; struct geo_attr *geo_ap; for (el = g_list_first(tk->attr_prereqs); el; el = g_list_next(el)) { ap = (struct attr_prereq *)el->data; if (ap->grant_type != grant_type) continue; geo_ap = (struct geo_attr *)g_hash_table_lookup(tk->attr, ap->attr_name); switch(ap->op) { case ATTR_OP_EQ: if (!attr_found(geo_ap, ap)) goto fail; break; case ATTR_OP_NE: if (attr_found(geo_ap, ap)) goto fail; break; default: break; } } return 0; fail: tk_log_warn("'%s' attr-prereq failed", ap->attr_name); return 1; } /* do we need to run the external program? * or we already done that and waiting for the outcome * or program exited and we can collect the status * return codes * 0: no program defined * RUNCMD_MORE: program forked, results later * != 0: executing program failed (or some other failure) */ static int do_ext_prog(struct ticket_config *tk, int start_election) { int rv = 0; if (!tk_test.path) return 0; switch(tk_test.progstate) { case EXTPROG_IDLE: rv = run_handler(tk); if (rv == RUNCMD_ERR) { tk_log_warn("couldn't run external test, not allowed to acquire ticket"); ext_prog_failed(tk, start_election); } break; case EXTPROG_RUNNING: /* should never get here, but just in case */ rv = RUNCMD_MORE; break; case EXTPROG_EXITED: rv = tk_test_exit_status(tk); if (rv) { ext_prog_failed(tk, start_election); } break; case EXTPROG_IGNORE: /* nothing to do here */ break; } return rv; } /* Try to acquire a ticket * Could be manual grant or after start (if the ticket is granted * and still valid in the CIB) * If the external program needs to run, this is run twice, once * to start the program, and then to get the result and start * elections. */ int acquire_ticket(struct ticket_config *tk, cmd_reason_t reason) { int rv; if (reason == OR_ADMIN && check_attr_prereq(tk, GRANT_MANUAL)) return RLT_ATTR_PREREQ; switch(do_ext_prog(tk, 0)) { case 0: /* everything fine */ break; case RUNCMD_MORE: /* need to wait for the outcome before starting elections */ return 0; default: return RLT_EXT_FAILED; } if (is_manual(tk)) { rv = manual_selection(tk, local, 1, reason); } else { rv = new_election(tk, local, 1, reason); } return rv ? RLT_SYNC_FAIL : 0; } /** Try to get the ticket for the local site. * */ int do_grant_ticket(struct ticket_config *tk, int options) { int rv; tk_log_info("granting ticket"); if (tk->leader == local) return RLT_SUCCESS; if (is_owned(tk)) { if (is_manual(tk) && (options & OPT_IMMEDIATE)) { /* -F flag has been used while granting a manual ticket. * The ticket will be granted and may end up being granted * on multiple sites */ tk_log_warn("manual ticket forced to be granted! be aware that " "you may end up having two sites holding the same manual " "ticket! revoke the ticket from the unnecessary site!"); } else { return RLT_OVERGRANT; } } set_future_time(&tk->delay_commit, tk->term_duration + tk->acquire_after); if (options & OPT_IMMEDIATE) { tk_log_warn("granting ticket immediately! If there are " "unreachable sites, _hope_ you are sure that they don't " "have the ticket!"); time_reset(&tk->delay_commit); } rv = acquire_ticket(tk, OR_ADMIN); if (rv) { time_reset(&tk->delay_commit); return rv; } else { return RLT_MORE; } } static void start_revoke_ticket(struct ticket_config *tk) { tk_log_info("revoking ticket"); save_committed_tkt(tk); - mark_ticket_as_revoked_from_leader(tk); - reset_ticket(tk); - set_leader(tk, no_leader); + reset_ticket_and_set_no_leader(tk); ticket_write(tk); ticket_broadcast(tk, OP_REVOKE, OP_ACK, RLT_SUCCESS, OR_ADMIN); } /** Ticket revoke. * Only to be started from the leader. */ int do_revoke_ticket(struct ticket_config *tk) { if (tk->acks_expected) { tk_log_info("delay ticket revoke until the current operation finishes"); set_next_state(tk, ST_INIT); return RLT_MORE; } else { start_revoke_ticket(tk); return RLT_SUCCESS; } } int list_ticket(char **pdata, unsigned int *len) { struct ticket_config *tk; char timeout_str[64]; char pending_str[64]; char *data, *cp; int i, alloc, site_index; time_t ts; int multiple_grant_warning_length = 0; *pdata = NULL; *len = 0; alloc = booth_conf->ticket_count * (BOOTH_NAME_LEN * 2 + 128 + 16); foreach_ticket(i, tk) { multiple_grant_warning_length = number_sites_marked_as_granted(tk); - + if (multiple_grant_warning_length > 1) { // 164: 55 + 45 + 2*number_of_multiple_sites + some margin alloc += 164 + BOOTH_NAME_LEN * (1+multiple_grant_warning_length); } } data = malloc(alloc); if (!data) return -ENOMEM; cp = data; foreach_ticket(i, tk) { if ((!is_manual(tk)) && is_time_set(&tk->term_expires)) { /* Manual tickets doesn't have term_expires defined */ ts = wall_ts(&tk->term_expires); strftime(timeout_str, sizeof(timeout_str), "%F %T", localtime(&ts)); } else strcpy(timeout_str, "INF"); if (tk->leader == local && is_time_set(&tk->delay_commit) && !is_past(&tk->delay_commit)) { ts = wall_ts(&tk->delay_commit); strcpy(pending_str, " (commit pending until "); strftime(pending_str + strlen(" (commit pending until "), sizeof(pending_str) - strlen(" (commit pending until ") - 1, "%F %T", localtime(&ts)); strcat(pending_str, ")"); } else *pending_str = '\0'; cp += snprintf(cp, alloc - (cp - data), "ticket: %s, leader: %s", tk->name, ticket_leader_string(tk)); if (is_owned(tk)) { cp += snprintf(cp, alloc - (cp - data), ", expires: %s%s", timeout_str, pending_str); } if (is_manual(tk)) { cp += snprintf(cp, alloc - (cp - data), " [manual mode]"); } cp += snprintf(cp, alloc - (cp - data), "\n"); if (alloc - (cp - data) <= 0) { free(data); return -ENOMEM; } } foreach_ticket(i, tk) { multiple_grant_warning_length = number_sites_marked_as_granted(tk); - + if (multiple_grant_warning_length > 1) { cp += snprintf(cp, alloc - (cp - data), "\nWARNING: The ticket %s is granted to multiple sites: ", // ~55 characters tk->name); - + for(site_index=0; site_indexsite_count; ++site_index) { if (tk->sites_where_granted[site_index] > 0) { cp += snprintf(cp, alloc - (cp - data), "%s", site_string(&(booth_conf->site[site_index]))); if (--multiple_grant_warning_length > 0) { cp += snprintf(cp, alloc - (cp - data), ", "); } } } cp += snprintf(cp, alloc - (cp - data), ". Revoke the ticket from the faulty sites.\n"); // ~45 characters } } *pdata = data; *len = cp - data; return 0; } void disown_ticket(struct ticket_config *tk) { - mark_ticket_as_revoked_from_leader(tk); set_leader(tk, NULL); tk->is_granted = 0; get_time(&tk->term_expires); } int disown_if_expired(struct ticket_config *tk) { if (is_past(&tk->term_expires) || !tk->leader) { disown_ticket(tk); return 1; } return 0; } void reset_ticket(struct ticket_config *tk) { ignore_ext_test(tk); disown_ticket(tk); no_resends(tk); set_state(tk, ST_INIT); set_next_state(tk, 0); tk->voted_for = NULL; } +void reset_ticket_and_set_no_leader(struct ticket_config *tk) +{ + mark_ticket_as_revoked_from_leader(tk); + reset_ticket(tk); + + tk->leader = no_leader; + tk_log_debug("ticket leader set to no_leader"); +} static void log_reacquire_reason(struct ticket_config *tk) { int valid; const char *where_granted = "\0"; char buff[75]; valid = is_time_set(&tk->term_expires) && !is_past(&tk->term_expires); if (tk->leader == local) { where_granted = "granted here"; } else { snprintf(buff, sizeof(buff), "granted to %s", site_string(tk->leader)); where_granted = buff; } if (!valid) { tk_log_warn("%s, but not valid " "anymore (will try to reacquire)", where_granted); } if (tk->is_granted && tk->leader != local) { if (tk->leader && tk->leader != no_leader) { tk_log_error("granted here, but also %s, " "that's really too bad (will try to reacquire)", where_granted); } else { tk_log_warn("granted here, but we're " "not recorded as the grantee (will try to reacquire)"); } } } void update_ticket_state(struct ticket_config *tk, struct booth_site *sender) { if (tk->state == ST_CANDIDATE) { tk_log_info("learned from %s about " "newer ticket, stopping elections", site_string(sender)); /* there could be rejects coming from others; don't log * warnings unnecessarily */ tk->expect_more_rejects = 1; } if (tk->leader == local || tk->is_granted) { /* message from a live leader with valid ticket? */ if (sender == tk->leader && term_time_left(tk)) { if (tk->is_granted) { tk_log_warn("ticket was granted here, " "but it's live at %s (revoking here)", site_string(sender)); } else { tk_log_info("ticket live at %s", site_string(sender)); } disown_ticket(tk); ticket_write(tk); set_state(tk, ST_FOLLOWER); set_next_state(tk, ST_FOLLOWER); } else { if (tk->state == ST_CANDIDATE) { set_state(tk, ST_FOLLOWER); } set_next_state(tk, ST_LEADER); } } else { if (!tk->leader || tk->leader == no_leader) { if (sender) tk_log_info("ticket is not granted"); else tk_log_info("ticket is not granted (from CIB)"); set_state(tk, ST_INIT); } else { if (sender) tk_log_info("ticket granted to %s (says %s)", site_string(tk->leader), tk->leader == sender ? "they" : site_string(sender)); else tk_log_info("ticket granted to %s (from CIB)", site_string(tk->leader)); set_state(tk, ST_FOLLOWER); /* just make sure that we check the ticket soon */ set_next_state(tk, ST_FOLLOWER); } } } int setup_ticket(void) { struct ticket_config *tk; int i; foreach_ticket(i, tk) { reset_ticket(tk); if (local->type == SITE) { if (!pcmk_handler.load_ticket(tk)) { update_ticket_state(tk, NULL); } tk->update_cib = 1; } tk_log_info("broadcasting state query"); /* wait until all send their status (or the first * timeout) */ tk->start_postpone = 1; ticket_broadcast(tk, OP_STATUS, OP_MY_INDEX, RLT_SUCCESS, 0); } return 0; } int ticket_answer_list(int fd) { char *data; int rv; unsigned int olen; struct boothc_hdr_msg hdr; rv = list_ticket(&data, &olen); if (rv < 0) goto out; init_header(&hdr.header, CL_LIST, 0, 0, RLT_SUCCESS, 0, sizeof(hdr) + olen); rv = send_header_plus(fd, &hdr, data, olen); out: if (data) free(data); return rv; } int process_client_request(struct client *req_client, void *buf) { int rv, rc = 1; struct ticket_config *tk; int cmd; struct boothc_ticket_msg omsg; struct boothc_ticket_msg *msg; msg = (struct boothc_ticket_msg *)buf; cmd = ntohl(msg->header.cmd); if (!check_ticket(msg->ticket.id, &tk)) { log_warn("client referenced unknown ticket %s", msg->ticket.id); rv = RLT_INVALID_ARG; goto reply_now; } /* Perform the initial check before granting * an already granted non-manual ticket */ if ((!is_manual(tk) && (cmd == CMD_GRANT) && is_owned(tk))) { log_warn("client wants to grant an (already granted!) ticket %s", msg->ticket.id); rv = RLT_OVERGRANT; goto reply_now; } if ((cmd == CMD_REVOKE) && !is_owned(tk)) { log_info("client wants to revoke a free ticket %s", msg->ticket.id); rv = RLT_TICKET_IDLE; goto reply_now; } if ((cmd == CMD_REVOKE) && tk->leader != local) { tk_log_info("not granted here, redirect to %s", ticket_leader_string(tk)); rv = RLT_REDIRECT; goto reply_now; } if (cmd == CMD_REVOKE) rv = do_revoke_ticket(tk); else rv = do_grant_ticket(tk, ntohl(msg->header.options)); if (rv == RLT_MORE) { /* client may receive further notifications, save the * request for further processing */ add_req(tk, req_client, msg); tk_log_debug("queue request %s for client %d", state_to_string(cmd), req_client->fd); rc = 0; /* we're not yet done with the message */ } reply_now: init_ticket_msg(&omsg, CL_RESULT, 0, rv, 0, tk); send_client_msg(req_client->fd, &omsg); return rc; } int notify_client(struct ticket_config *tk, int client_fd, struct boothc_ticket_msg *msg) { struct boothc_ticket_msg omsg; void (*deadfn) (int ci); int rv, rc, ci; int cmd, options; struct client *req_client; cmd = ntohl(msg->header.cmd); options = ntohl(msg->header.options); rv = tk->outcome; ci = find_client_by_fd(client_fd); if (ci < 0) { tk_log_info("client %d (request %s) left before being notified", client_fd, state_to_string(cmd)); return 0; } tk_log_debug("notifying client %d (request %s)", client_fd, state_to_string(cmd)); init_ticket_msg(&omsg, CL_RESULT, 0, rv, 0, tk); rc = send_client_msg(client_fd, &omsg); if (rc == 0 && ((rv == RLT_MORE) || (rv == RLT_CIB_PENDING && (options & OPT_WAIT_COMMIT)))) { /* more to do here, keep the request */ return 1; } else { /* we sent a definite answer or there was a write error, drop * the client */ if (rc) { tk_log_debug("failed to notify client %d (request %s)", client_fd, state_to_string(cmd)); } else { tk_log_debug("client %d (request %s) got final notification", client_fd, state_to_string(cmd)); } req_client = clients + ci; deadfn = req_client->deadfn; if(deadfn) { deadfn(ci); } return 0; /* we're done with this request */ } } int ticket_broadcast(struct ticket_config *tk, cmd_request_t cmd, cmd_request_t expected_reply, cmd_result_t res, cmd_reason_t reason) { struct boothc_ticket_msg msg; init_ticket_msg(&msg, cmd, 0, res, reason, tk); tk_log_debug("broadcasting '%s' (term=%d, valid=%d)", state_to_string(cmd), ntohl(msg.ticket.term), msg_term_time(&msg)); tk->last_request = cmd; if (expected_reply) { expect_replies(tk, expected_reply); } ticket_activate_timeout(tk); return transport()->broadcast_auth(&msg, sendmsglen(&msg)); } /* update the ticket on the leader, write it to the CIB, and send out the update message to others with the new expiry time */ int leader_update_ticket(struct ticket_config *tk) { int rv = 0, rv2; timetype now; if (tk->ticket_updated >= 2) return 0; /* for manual tickets, we don't set time expiration */ if (!is_manual(tk)) { if (tk->ticket_updated < 1) { tk->ticket_updated = 1; get_time(&now); copy_time(&now, &tk->last_renewal); set_future_time(&tk->term_expires, tk->term_duration); rv = ticket_broadcast(tk, OP_UPDATE, OP_ACK, RLT_SUCCESS, 0); } } if (tk->ticket_updated < 2) { rv2 = ticket_write(tk); switch(rv2) { case 0: tk->ticket_updated = 2; tk->outcome = RLT_SUCCESS; foreach_tkt_req(tk, notify_client); break; case 1: if (tk->outcome != RLT_CIB_PENDING) { tk->outcome = RLT_CIB_PENDING; foreach_tkt_req(tk, notify_client); } break; default: break; } } return rv; } static void log_lost_servers(struct ticket_config *tk) { struct booth_site *n; int i; if (tk->retry_number > 1) /* log those that we couldn't reach, but do * that only on the first retry */ return; for (i = 0; i < booth_conf->site_count; i++) { n = booth_conf->site + i; if (!(tk->acks_received & n->bitmask)) { tk_log_warn("%s %s didn't acknowledge our %s, " "will retry %d times", (n->type == ARBITRATOR ? "arbitrator" : "site"), site_string(n), state_to_string(tk->last_request), tk->retries); } } } static void resend_msg(struct ticket_config *tk) { struct booth_site *n; int i; if (!(tk->acks_received ^ local->bitmask)) { ticket_broadcast(tk, tk->last_request, 0, RLT_SUCCESS, 0); } else { for (i = 0; i < booth_conf->site_count; i++) { n = booth_conf->site + i; if (!(tk->acks_received & n->bitmask)) { n->resend_cnt++; tk_log_debug("resending %s to %s", state_to_string(tk->last_request), site_string(n) ); send_msg(tk->last_request, tk, n, NULL); } } ticket_activate_timeout(tk); } } static void handle_resends(struct ticket_config *tk) { int ack_cnt; if (++tk->retry_number > tk->retries) { tk_log_info("giving up on sending retries"); no_resends(tk); set_ticket_wakeup(tk); return; } /* try to reach some sites again if we just stepped down */ if (tk->last_request == OP_VOTE_FOR) { tk_log_warn("no answers to our VtFr request to step down (try #%d), " "we are alone", tk->retry_number); goto just_resend; } if (!majority_of_bits(tk, tk->acks_received)) { ack_cnt = count_bits(tk->acks_received) - 1; if (!ack_cnt) { tk_log_warn("no answers to our request (try #%d), " "we are alone", tk->retry_number); } else { tk_log_warn("not enough answers to our request (try #%d): " "only got %d answers", tk->retry_number, ack_cnt); } } else { log_lost_servers(tk); } just_resend: resend_msg(tk); } int postpone_ticket_processing(struct ticket_config *tk) { extern timetype start_time; return tk->start_postpone && (-time_left(&start_time) < tk->timeout); } #define has_extprog_exited(tk) ((tk)->clu_test.progstate == EXTPROG_EXITED) static void process_next_state(struct ticket_config *tk) { int rv; switch(tk->next_state) { case ST_LEADER: if (has_extprog_exited(tk)) { if (tk->state != ST_LEADER) { rv = acquire_ticket(tk, OR_ADMIN); if (rv != 0) { /* external program failed */ tk->outcome = rv; foreach_tkt_req(tk, notify_client); } } } else { log_reacquire_reason(tk); acquire_ticket(tk, OR_REACQUIRE); } break; case ST_INIT: no_resends(tk); start_revoke_ticket(tk); tk->outcome = RLT_SUCCESS; foreach_tkt_req(tk, notify_client); break; /* wanting to be follower is not much of an ambition; no * processing, just return; don't reset start_postpone until * we got some replies to status */ case ST_FOLLOWER: return; default: break; } tk->start_postpone = 0; } static void ticket_lost(struct ticket_config *tk) { int reason = OR_TKT_LOST; if (tk->leader != local) { tk_log_warn("lost at %s", site_string(tk->leader)); } else { if (is_ext_prog_running(tk)) { ext_prog_timeout(tk); reason = OR_LOCAL_FAIL; } else { tk_log_warn("lost majority (revoking locally)"); reason = tk->election_reason ? tk->election_reason : OR_REACQUIRE; } } tk->lost_leader = tk->leader; save_committed_tkt(tk); mark_ticket_as_revoked_from_leader(tk); reset_ticket(tk); set_state(tk, ST_FOLLOWER); if (local->type == SITE) { ticket_write(tk); schedule_election(tk, reason); } } static void next_action(struct ticket_config *tk) { int rv; switch(tk->state) { case ST_INIT: /* init state, handle resends for ticket revoke */ /* and rebroadcast if stepping down */ /* try to acquire ticket on grant */ if (has_extprog_exited(tk)) { rv = acquire_ticket(tk, OR_ADMIN); if (rv != 0) { /* external program failed */ tk->outcome = rv; foreach_tkt_req(tk, notify_client); } } else { if (tk->acks_expected) { handle_resends(tk); } } break; case ST_FOLLOWER: if (!is_manual(tk)) { /* leader/ticket lost? and we didn't vote yet */ tk_log_debug("leader: %s, voted_for: %s", site_string(tk->leader), site_string(tk->voted_for)); if (!tk->leader) { if (!tk->voted_for || !tk->in_election) { disown_ticket(tk); if (!new_election(tk, NULL, 1, OR_AGAIN)) { ticket_activate_timeout(tk); } } else { /* we should restart elections in case nothing * happens in the meantime */ tk->in_election = 0; ticket_activate_timeout(tk); } } } else { /* for manual tickets, also try to acquire ticket on grant * in the Follower state (because we may end up having * two Leaders) */ if (has_extprog_exited(tk)) { rv = acquire_ticket(tk, OR_ADMIN); if (rv != 0) { /* external program failed */ tk->outcome = rv; foreach_tkt_req(tk, notify_client); } } else { /* Otherwise, just send ACKs if needed */ if (tk->acks_expected) { handle_resends(tk); } } } break; case ST_CANDIDATE: /* elections timed out? */ elections_end(tk); break; case ST_LEADER: /* timeout or ticket renewal? */ if (tk->acks_expected) { handle_resends(tk); if (majority_of_bits(tk, tk->acks_received)) { leader_update_ticket(tk); } } else { /* this is ticket renewal, run local test */ if (!do_ext_prog(tk, 1)) { ticket_broadcast(tk, OP_HEARTBEAT, OP_ACK, RLT_SUCCESS, 0); tk->ticket_updated = 0; } } break; default: break; } } static void ticket_cron(struct ticket_config *tk) { /* don't process the tickets too early after start */ if (postpone_ticket_processing(tk)) { tk_log_debug("ticket processing postponed (start_postpone=%d)", tk->start_postpone); /* but run again soon */ ticket_activate_timeout(tk); return; } /* no need for status resends, we hope we got at least one * my_index back */ if (tk->acks_expected == OP_MY_INDEX) { no_resends(tk); } /* after startup, we need to decide what to do based on the * current ticket state; tk->next_state has a hint * also used for revokes which had to be delayed */ if (tk->next_state) { process_next_state(tk); goto out; } /* Has an owner, has an expiry date, and expiry date in the past? * For automatic tickets, losing the ticket must happen * in _every_ state. */ if ((!is_manual(tk)) && is_owned(tk) && is_time_set(&tk->term_expires) && is_past(&tk->term_expires)) { ticket_lost(tk); goto out; } next_action(tk); out: tk->next_state = 0; if (!tk->in_election && tk->update_cib) ticket_write(tk); } void process_tickets(void) { struct ticket_config *tk; int i; timetype last_cron; foreach_ticket(i, tk) { if (!has_extprog_exited(tk) && is_time_set(&tk->next_cron) && !is_past(&tk->next_cron)) continue; tk_log_debug("ticket cron"); copy_time(&tk->next_cron, &last_cron); ticket_cron(tk); if (time_cmp(&last_cron, &tk->next_cron, ==)) { tk_log_debug("nobody set ticket wakeup"); set_ticket_wakeup(tk); } } } void tickets_log_info(void) { struct ticket_config *tk; int i; time_t ts; foreach_ticket(i, tk) { ts = wall_ts(&tk->term_expires); tk_log_info("state '%s' " "term %d " "leader %s " "expires %-24.24s", state_to_string(tk->state), tk->current_term, ticket_leader_string(tk), ctime(&ts)); } } static void update_acks( struct ticket_config *tk, struct booth_site *sender, struct booth_site *leader, struct boothc_ticket_msg *msg ) { uint32_t cmd; uint32_t req; cmd = ntohl(msg->header.cmd); req = ntohl(msg->header.request); if (req != tk->last_request || (tk->acks_expected != cmd && tk->acks_expected != OP_REJECTED)) return; /* got an ack! */ tk->acks_received |= sender->bitmask; if (all_replied(tk) || /* we just stepped down, need only one site to start * elections */ (cmd == OP_REQ_VOTE && tk->last_request == OP_VOTE_FOR)) { no_resends(tk); tk->start_postpone = 0; set_ticket_wakeup(tk); } } /* read ticket message */ int ticket_recv(void *buf, struct booth_site *source) { struct boothc_ticket_msg *msg; struct ticket_config *tk; struct booth_site *leader; uint32_t leader_u; msg = (struct boothc_ticket_msg *)buf; if (!check_ticket(msg->ticket.id, &tk)) { log_warn("got invalid ticket name %s from %s", msg->ticket.id, site_string(source)); source->invalid_cnt++; return -EINVAL; } leader_u = ntohl(msg->ticket.leader); if (!find_site_by_id(leader_u, &leader)) { tk_log_error("message with unknown leader %u received", leader_u); source->invalid_cnt++; return -EINVAL; } update_acks(tk, source, leader, msg); return raft_answer(tk, source, leader, msg); } static void log_next_wakeup(struct ticket_config *tk) { int left; left = time_left(&tk->next_cron); tk_log_debug("set ticket wakeup in " intfmt(left)); } /* New vote round; §5.2 */ /* delay the next election start for some random time * (up to 1 second) */ void add_random_delay(struct ticket_config *tk) { timetype tv; interval_add(&tk->next_cron, rand_time(min(1000, tk->timeout)), &tv); ticket_next_cron_at(tk, &tv); if (ANYDEBUG) { log_next_wakeup(tk); } } void set_ticket_wakeup(struct ticket_config *tk) { timetype near_future, tv, next_vote; set_future_time(&near_future, 10); if (!is_manual(tk)) { /* At least every hour, perhaps sooner (default) */ tk_log_debug("ticket will be woken up after up to one hour"); ticket_next_cron_in(tk, 3600*TIME_RES); switch (tk->state) { case ST_LEADER: assert(tk->leader == local); get_next_election_time(tk, &next_vote); /* If timestamp is in the past, wakeup in * near future */ if (!is_time_set(&next_vote)) { tk_log_debug("next ts unset, wakeup soon"); ticket_next_cron_at(tk, &near_future); } else if (is_past(&next_vote)) { int tdiff = time_left(&next_vote); tk_log_debug("next ts in the past " intfmt(tdiff)); ticket_next_cron_at(tk, &near_future); } else { ticket_next_cron_at(tk, &next_vote); } break; case ST_CANDIDATE: assert(is_time_set(&tk->election_end)); ticket_next_cron_at(tk, &tk->election_end); break; case ST_INIT: case ST_FOLLOWER: /* If there is (or should be) some owner, check on it later on. * If no one is interested - don't care. */ if (is_owned(tk)) { interval_add(&tk->term_expires, tk->acquire_after, &tv); ticket_next_cron_at(tk, &tv); } break; default: tk_log_error("unknown ticket state: %d", tk->state); } if (tk->next_state) { /* we need to do something soon here */ if (!tk->acks_expected) { ticket_next_cron_at(tk, &near_future); } else { ticket_activate_timeout(tk); } } } else { /* At least six minutes, to make sure that multi-leader situations * will be solved promptly. */ tk_log_debug("manual ticket will be woken up after up to six minutes"); ticket_next_cron_in(tk, 60*TIME_RES); /* For manual tickets, no earlier timeout could be set in a similar * way as it is done in a switch above for automatic tickets. * The reason is that term's timeout is INF and no Raft-based elections * are performed. */ } if (ANYDEBUG) { log_next_wakeup(tk); } } void schedule_election(struct ticket_config *tk, cmd_reason_t reason) { if (local->type != SITE) return; tk->election_reason = reason; get_time(&tk->next_cron); /* introduce a short delay before starting election */ add_random_delay(tk); } int is_manual(struct ticket_config *tk) { return (tk->mode == TICKET_MODE_MANUAL) ? 1 : 0; } int number_sites_marked_as_granted(struct ticket_config *tk) { int i, result = 0; for(i=0; isite_count; ++i) { result += tk->sites_where_granted[i]; } return result; } /* Given a state (in host byte order), return a human-readable (char*). * An array is used so that multiple states can be printed in a single printf(). */ char *state_to_string(uint32_t state_ho) { union mu { cmd_request_t s; char c[5]; }; static union mu cache[6] = { { 0 } }, *cur; static int current = 0; current ++; if (current >= sizeof(cache)/sizeof(cache[0])) current = 0; cur = cache + current; cur->s = htonl(state_ho); /* Shouldn't be necessary, union array is initialized with zeroes, and * these bytes never get written. */ cur->c[4] = 0; return cur->c; } int send_reject(struct booth_site *dest, struct ticket_config *tk, cmd_result_t code, struct boothc_ticket_msg *in_msg) { int req = ntohl(in_msg->header.cmd); struct boothc_ticket_msg msg; tk_log_debug("sending reject to %s", site_string(dest)); init_ticket_msg(&msg, OP_REJECTED, req, code, 0, tk); return booth_udp_send_auth(dest, &msg, sendmsglen(&msg)); } int send_msg ( int cmd, struct ticket_config *tk, struct booth_site *dest, struct boothc_ticket_msg *in_msg ) { int req = 0; struct ticket_config *valid_tk = tk; struct boothc_ticket_msg msg; /* if we want to send the last valid ticket, then if we're in * the ST_CANDIDATE state, the last valid ticket is in * tk->last_valid_tk */ if (cmd == OP_MY_INDEX) { if (tk->state == ST_CANDIDATE && tk->last_valid_tk) { valid_tk = tk->last_valid_tk; } tk_log_info("sending status to %s", site_string(dest)); } if (in_msg) req = ntohl(in_msg->header.cmd); init_ticket_msg(&msg, cmd, req, RLT_SUCCESS, 0, valid_tk); return booth_udp_send_auth(dest, &msg, sendmsglen(&msg)); } diff --git a/src/ticket.h b/src/ticket.h index fb9dc9a..e36e323 100644 --- a/src/ticket.h +++ b/src/ticket.h @@ -1,154 +1,162 @@ /* * Copyright (C) 2011 Jiaju Zhang * Copyright (C) 2013-2014 Philipp Marek * * This program is free software; you can redistribute it and/or * modify it under the terms of the GNU General Public * License as published by the Free Software Foundation; either * version 2 of the License, or (at your option) any later version. * * This software is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * General Public License for more details. * * You should have received a copy of the GNU General Public License along * with this program; if not, write to the Free Software Foundation, Inc., * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. */ #ifndef _TICKET_H #define _TICKET_H #include #include #include #include "timer.h" #include "config.h" #include "log.h" extern int TIME_RES; #define DEFAULT_TICKET_EXPIRY (600*TIME_RES) #define DEFAULT_TICKET_TIMEOUT (5*TIME_RES) #define DEFAULT_RETRIES 10 #define foreach_ticket(i_,t_) for(i_=0; (t_=booth_conf->ticket+i_, i_ticket_count); i_++) #define foreach_node(i_,n_) for(i_=0; (n_=booth_conf->site+i_, i_site_count); i_++) #define set_leader(tk, who) do { \ + if (who == NULL) { \ + mark_ticket_as_revoked_from_leader(tk); \ + } \ + \ tk->leader = who; \ tk_log_debug("ticket leader set to %s", ticket_leader_string(tk)); \ + \ + if (tk->leader) { \ + mark_ticket_as_granted(tk, tk->leader); \ + } \ } while(0) -#define mark_ticket_as_granted_to(tk, who) do { \ +#define mark_ticket_as_granted(tk, who) do { \ if (is_manual(tk) && (who->index > -1)) { \ tk->sites_where_granted[who->index] = 1; \ - tk_log_debug("ticket marked as granted to %s", ticket_leader_string(tk)); \ + tk_log_debug("manual ticket marked as granted to %s", ticket_leader_string(tk)); \ } \ } while(0) #define mark_ticket_as_revoked(tk, who) do { \ - if (is_manual(tk) && (who->index > -1)) { \ + if (is_manual(tk) && who && (who->index > -1)) { \ tk->sites_where_granted[who->index] = 0; \ - tk_log_debug("ticket marked as revoked from %s", site_string(who)); \ + tk_log_debug("manual ticket marked as revoked from %s", site_string(who)); \ } \ } while(0) #define mark_ticket_as_revoked_from_leader(tk) do { \ - if (is_manual(tk) && tk->leader && (tk->leader->index > -1)) { \ - tk->sites_where_granted[tk->leader->index] = 0; \ - tk_log_debug("ticket marked as revoked from %s", ticket_leader_string(tk)); \ + if (tk->leader) { \ + mark_ticket_as_revoked(tk, tk->leader); \ } \ } while(0) #define set_state(tk, newst) do { \ tk_log_debug("state transition: %s -> %s", \ state_to_string(tk->state), state_to_string(newst)); \ tk->state = newst; \ } while(0) #define set_next_state(tk, newst) do { \ if (!(newst)) tk_log_debug("next state reset"); \ else tk_log_debug("next state set to %s", state_to_string(newst)); \ tk->next_state = newst; \ } while(0) #define is_term_invalid(tk, term) \ ((tk)->last_valid_tk && (tk)->last_valid_tk->current_term > (term)) void save_committed_tkt(struct ticket_config *tk); void disown_ticket(struct ticket_config *tk); int disown_if_expired(struct ticket_config *tk); int check_ticket(char *ticket, struct ticket_config **tc); int check_site(char *site, int *local); int grant_ticket(struct ticket_config *ticket); int revoke_ticket(struct ticket_config *ticket); int list_ticket(char **pdata, unsigned int *len); int ticket_recv(void *buf, struct booth_site *source); void reset_ticket(struct ticket_config *tk); +void reset_ticket_and_set_no_leader(struct ticket_config *tk); void update_ticket_state(struct ticket_config *tk, struct booth_site *sender); int setup_ticket(void); int check_max_len_valid(const char *s, int max); int do_grant_ticket(struct ticket_config *ticket, int options); int do_revoke_ticket(struct ticket_config *tk); int find_ticket_by_name(const char *ticket, struct ticket_config **found); void set_ticket_wakeup(struct ticket_config *tk); int postpone_ticket_processing(struct ticket_config *tk); int acquire_ticket(struct ticket_config *tk, cmd_reason_t reason); int ticket_answer_list(int fd); int process_client_request(struct client *req_client, void *buf); int ticket_write(struct ticket_config *tk); void process_tickets(void); void tickets_log_info(void); char *state_to_string(uint32_t state_ho); int send_reject(struct booth_site *dest, struct ticket_config *tk, cmd_result_t code, struct boothc_ticket_msg *in_msg); int send_msg (int cmd, struct ticket_config *tk, struct booth_site *dest, struct boothc_ticket_msg *in_msg); int notify_client(struct ticket_config *tk, int client_fd, struct boothc_ticket_msg *msg); int ticket_broadcast(struct ticket_config *tk, cmd_request_t cmd, cmd_request_t expected_reply, cmd_result_t res, cmd_reason_t reason); int leader_update_ticket(struct ticket_config *tk); void add_random_delay(struct ticket_config *tk); void schedule_election(struct ticket_config *tk, cmd_reason_t reason); int is_manual(struct ticket_config *tk); int number_sites_marked_as_granted(struct ticket_config *tk); int check_attr_prereq(struct ticket_config *tk, grant_type_e grant_type); static inline void ticket_next_cron_at(struct ticket_config *tk, timetype *when) { copy_time(when, &tk->next_cron); } static inline void ticket_next_cron_in(struct ticket_config *tk, int interval) { timetype tv; set_future_time(&tv, interval); ticket_next_cron_at(tk, &tv); } static inline void ticket_activate_timeout(struct ticket_config *tk) { /* TODO: increase timeout when no answers */ tk_log_debug("activate ticket timeout in %d", tk->timeout); ticket_next_cron_in(tk, tk->timeout); } #endif /* _TICKET_H */ diff --git a/test/live_test.sh b/test/live_test.sh index 6df7dfb..f8644a2 100755 --- a/test/live_test.sh +++ b/test/live_test.sh @@ -1,1353 +1,1351 @@ #!/bin/sh # # see README-testing for more information # do some basic booth operation tests for the given config # PROG=`basename $0` usage() { cat<[:]] $PROG [ ...] EOF if [ $1 -eq 0 ]; then list_all examples fi exit } list_all() { echo "Tests:" grep "^test_.*{$" $0 | sed 's/test_//;s/(.*//;s/^/ /' echo echo "Netem functions:" grep "^NETEM_ENV_.*{$" $0 | sed 's/NETEM_ENV_//;s/(.*//;s/^/ /' } examples() { cat< /dev/null } stop_site() { manage_site $1 stop } stop_arbitrator() { manage_arbitrator $1 stop } restart_site() { manage_site $1 restart } cleanup_site() { manage_site $1 cleanup } reload_site() { runcmd $1 OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/pacemaker/booth-site reload } restart_arbitrator() { manage_arbitrator $1 restart } booth_status() { test "`runcmd $1 booth status | get_stat_fld booth_state`" = "started" } cleanup_booth() { local h procs for h in $sites; do cleanup_site $h & procs="$! $procs" done >/dev/null 2>&1 wait $procs wait_timeout } cleanup_dep_rsc() { local dep_rsc=`get_rsc` test -z "$dep_rsc" && return local h procs for h in $sites; do runcmd $h crm -w resource cleanup $dep_rsc & procs="$! $procs" done >/dev/null 2>&1 wait $procs } check_dep_rsc() { local dep_rsc=`get_rsc` test -z "$dep_rsc" && return 0 local h for h in $sites; do runcmd $h BOOTH_TICKET=$tkt /usr/share/booth/service-runnable $dep_rsc || return 1 done return 0 } stop_booth() { local h rc for h in $sites; do stop_site $h rc=$((rc|$?)) done >/dev/null 2>&1 for h in $arbitrators; do stop_arbitrator $h rc=$((rc|$?)) done >/dev/null 2>&1 wait_timeout return $rc } start_booth() { local h rc for h in $sites; do start_site $h rc=$((rc|$?)) done >/dev/null 2>&1 for h in $arbitrators; do start_arbitrator $h rc=$((rc|$?)) done >/dev/null 2>&1 wait_timeout return $rc } restart_booth() { local h procs for h in $sites; do restart_site $h & procs="$! $procs" done >/dev/null 2>&1 for h in $arbitrators; do restart_arbitrator $h done >/dev/null 2>&1 wait $procs wait_timeout } reboot_test() { cleanup_booth restart_booth cleanup_dep_rsc } is_we_server() { local h for h in $sites $arbitrators; do ip a l | fgrep -wq $h && return done return 1 } is_pacemaker_running() { local h for h in $sites; do runcmd $h crmadmin -D >/dev/null || return 1 done return 0 } sync_conf() { local h rc=0 local tmpf for h in $sites $arbitrators; do rsync -q -e "ssh $SSH_OPTS" $1 root@$h:$run_cnf rc=$((rc|$?)) if [ -n "$authfile" ]; then tmpf=`mktemp` scp -q $(get_site 1):$authfile $tmpf && rsync -q -e "ssh $SSH_OPTS" $tmpf root@$h:$authfile rc=$((rc|$?)) rm -f $tmpf fi done return $rc } dump_conf() { echo "test configuration file $cnf:" grep -v '^#' $cnf | grep -v '^[[:space:]]*$' | sed "s/^/$cnf: /" } forall() { local h rc=0 for h in $sites $arbitrators; do runcmd $h $@ rc=$((rc|$?)) done return $rc } forall_withname() { local h rc=0 output for h in $sites $arbitrators; do output=`runcmd $h $@` rc=$((rc|$?)) echo $h: $output done return $rc } forall_sites() { local h rc=0 for h in $sites; do runcmd $h $@ rc=$((rc|$?)) done return $rc } forall_fun() { local h rc=0 f=$1 for h in $sites $arbitrators; do $f $h rc=$((rc|$?)) [ $rc -ne 0 ] && break done return $rc } # run on all hosts whatever function produced on stdout forall_fun2() { local h rc=0 f f=$1 shift 1 for h in $sites $arbitrators; do $f $@ | ssh $SSH_OPTS $h rc=$((rc|$?)) [ $rc -ne 0 ] && break done return $rc } run_site() { local n=$1 h shift 1 h=`echo $sites | awk '{print $'$n'}'` runcmd $h $@ } run_arbitrator() { local n=$1 h shift 1 h=`echo $arbitrators | awk '{print $'$n'}'` runcmd $h $@ } # need to get logs from _all_ clusters' nodes get_all_nodes() { for h in $sites; do runcmd $h crm_node -l | awk '{print $2}' done } extract_value() { sed 's/ *#.*//;s/.*=//;s/"//g;s/^ *//;s/ *$//' } get_extern_ip() { grep "^$1" | awk ' { if(/# *external[_-]ip=/) print $NF; else print; } ' | extract_value } get_value() { grep "^$1" | extract_value } # get internal IP for the external address internal_ip() { fgrep "$1" $cnf | extract_value } get_rsc() { awk ' n && /^[[:space:]]*before-acquire-handler/ {print $NF; exit} n && (/^$/ || /^ticket.*/) {exit} /^ticket.*'$tkt'/ {n=1} ' $cnf } get_attr() { awk ' n && /^[[:space:]]*attr-prereq = auto .* eq / {print $4,$6; exit} n && (/^$/ || /^ticket.*/) {exit} /^ticket.*'$tkt'/ {n=1} ' $cnf } get_mode() { awk ' n && /^[[:space:]]*mode/ {print $NF; exit} n && (/^$/ || /^ticket.*/) {exit} /^ticket.*'$tkt'/ {n=1} ' $cnf } set_site_attr() { local site site=$1 set -- `get_attr` run_site $site geostore set $1 $2 } del_site_attr() { local site site=$1 set -- `get_attr` run_site $site geostore delete $1 } break_external_prog() { run_site $1 crm configure "location $PREFNAME `get_rsc` rule -inf: defined \#uname" } show_pref() { run_site $1 crm configure show $PREFNAME > /dev/null } repair_external_prog() { run_site $1 crm configure delete __pref_booth_live_test } get_tkt() { grep "^ticket=" | head -1 | sed 's/ticket=//;s/"//g' } get_tkt_settings() { awk ' n && /^[[:space:]]*(expire|timeout|renewal-freq)/ { sub(" = ", "=", $0); gsub("-", "_", $0); sub("^[[:space:]]*", "T_", $0); if ($0 ~ /ms$/) { sub("ms$", "", $0); eq = match($0, "="); print substr($0, 1, eq)""substr($0, eq+1)/1000; } else { print; } next } n && (/^$/ || /^ticket.*/) {exit} /^ticket.*'$tkt'/ {n=1} ' $1 } wait_exp() { sleep $T_expire } wait_renewal() { sleep $T_renewal_freq } wait_timeout() { sleep $MIN_TIMEOUT } set_netem_env() { local modfun args modfun=`echo $1 | sed 's/:.*//'` args=`echo $1 | sed 's/[^:]*//;s/:/ /g'` if ! is_function NETEM_ENV_$modfun; then echo "NETEM_ENV_$modfun: doesn't exist" exit 1 fi NETEM_ENV_$modfun $args } reset_netem_env() { [ -z "$NETEM_ENV" ] && return [ -n "$__NETEM_RESET" ] && return __NETEM_RESET=1 forall $ABSPATH $run_cnf __netem__ netem_reset } setup_netem() { [ -z "$NETEM_ENV" ] && return __NETEM_RESET= echo "-------------------------------------------------- (netem)" | logmsg for env in $NETEM_ENV; do set_netem_env $env done trap "reset_netem_env" EXIT } cib_status() { local h=$1 stat stat=`runcmd $h crm_ticket -L | grep "^$tkt" | awk '{print $2}'` test "$stat" != "-1" } is_cib_granted() { local stat h=$1 stat=`runcmd $h crm_ticket -L | grep "^$tkt" | awk '{print $2}'` [ "$stat" = "granted" ] } check_cib_consistency() { local h gh="" rc=0 for h in $sites; do if is_cib_granted $h; then [ -n "$gh" ] && rc=1 # granted twice gh="$gh `internal_ip $h`" fi done [ -z "$gh" ] && gh="none" if [ $rc -eq 0 ]; then echo $gh return $rc fi cat<= 0 ? x : -x; } } ' | sort -n | tail -1 } booth_leader_consistency() { test `booth_list_fld 2 | sort -u | wc -l` -eq 1 } # are there two leaders or is it just that some booths are outdated booth_leader_consistency_2() { test `booth_list_fld 2 | sort -u | grep -iv none | wc -l` -le 1 } # do all booths have the same info? # possible differences: # a) more than one leader # b) some booths not uptodate (have no leader for the ticket) # c) ticket expiry times differ check_booth_consistency() { local tlist tlist_validate rc rc_lead maxdiff tlist=`forall_withname booth list 2>/dev/null | grep $tkt` # Check time consistency ticket_times=$(echo "$tlist" | booth_list_fld 3) if [[ $ticket_times == *"INF"* ]]; then - rc=0 + rc=0 else maxdiff=`echo "$tlist" | max_booth_time_diff` test "$maxdiff" -eq 0 rc=$? fi # Check leader consistency echo "$tlist" | booth_leader_consistency rc_lead=$? if [ $rc_lead -ne 0 ]; then echo "$tlist" | booth_leader_consistency_2 rc_lead=$(($rc_lead + $?)) # rc_lead=2 if the prev test failed fi rc=$(($rc | $rc_lead<<1)) test $rc -eq 0 && return cat</dev/null + run_site 1 booth revoke -w $tkt >/dev/null wait_timeout } run_report() { local start_ts=$1 end_ts=$2 name=$3 local hb_report_opts="" local quick_opt="" logmsg "running hb_report" hb_report -Q 2>&1 | grep -sq "illegal.option" || quick_opt="-Q" if [ `id -u` != 0 ]; then hb_report_opts="-u root" fi hb_report $hb_report_opts $quick_opt -f "`date -d @$((start_ts-5))`" \ -t "`date -d @$((end_ts+60))`" \ -n "$all_nodes $arbitrators" $name 2>&1 | logmsg } runtest() { local start_ts end_ts local rc booth_status dep_rsc_status local start_time end_time local usrmsg - local tested_ticket rc=0 TEST=$1 - tested_ticket=$2 start_time=`date` start_ts=`date +%s` - echo -n "Testing: $1 (ticket: $tested_ticket)... " + echo -n "Testing: $1 (ticket: $tkt)... " can_run_test $1 || return 0 echo "==================================================" | logmsg echo "starting booth test $1 ..." | logmsg if is_function setup_$1; then echo "-------------------------------------------------- (setup)" | logmsg - setup_$1 $tested_ticket + setup_$1 rc=$? [ "$rc" -ne 0 ] && rc=$ERR_SETUP_FAILED fi if [ "$rc" -eq 0 ]; then setup_netem echo "-------------------------------------------------- (test)" | logmsg - test_$1 $tested_ticket + test_$1 rc=$? fi case $rc in 0) # wait a bit more if we're losing packets [ -n "$PKT_LOSS" ] && wait_timeout echo "-------------------------------------------------- (check)" | logmsg - check_$1 $tested_ticket + check_$1 rc=$? if [ $rc -eq 0 ]; then usrmsg="SUCCESS" else usrmsg="check FAIL: $rc" fi ;; $ERR_SETUP_FAILED) usrmsg="setup FAIL" ;; *) usrmsg="test FAIL: $rc" ;; esac end_time=`date` end_ts=`date +%s` - echo "finished booth test $1 ($tested_ticket): $usrmsg" | logmsg + echo "finished booth test $1 ($tkt): $usrmsg" | logmsg echo "==================================================" | logmsg - is_function recover_$1 && recover_$1 $tested_ticket + is_function recover_$1 && recover_$1 reset_netem_env #sleep 3 all_booth_status booth_status=$? check_dep_rsc dep_rsc_status=$? if [ $((rc|booth_status|dep_rsc_status)) -eq 0 ]; then echo OK [ "$GET_REPORT" ] && run_report $start_ts $end_ts $TEST else echo "$usrmsg (running hb_report ... $1.tar.bz2; see also $logf)" [ $booth_status -ne 0 ] && echo "unexpected: some booth daemons not running" [ $dep_rsc_status -ne 0 ] && echo "unexpected: dependent resource failure" run_report $start_ts $end_ts $TEST reboot_test master_rc=1 fi - revoke_ticket $tested_ticket + revoke_ticket } # # the tests # # most tests start by granting ticket grant_ticket() { - run_site $1 booth grant -w $2 >/dev/null + run_site $1 booth grant -w $tkt >/dev/null } grant_ticket_cib() { - run_site $1 booth grant -C $2 >/dev/null + run_site $1 booth grant -C $tkt >/dev/null } ## TEST: grant ## # just a grant test_grant() { - grant_ticket 1 $1 + grant_ticket 1 } check_grant() { check_consistency `get_internal_site 1` } ## TEST: longgrant ## # just a grant followed by three expire times setup_longgrant() { - grant_ticket 1 $1 + grant_ticket 1 } test_longgrant() { wait_exp wait_exp wait_exp } check_longgrant() { check_consistency `get_internal_site 1` } ## TEST: longgrant2 ## # just a grant followed by 10 expire times setup_longgrant2() { - grant_ticket_cib 1 $1 + grant_ticket_cib 1 } test_longgrant2() { local i for i in `seq 10`; do wait_exp done } check_longgrant2() { check_consistency `get_internal_site 1` } ## TEST: grant_noarb ## # just a grant with no arbitrators setup_grant_noarb() { local h for h in $arbitrators; do stop_arbitrator $h || return 1 done >/dev/null 2>&1 #sleep 1 } test_grant_noarb() { - grant_ticket 1 $1 + grant_ticket 1 } check_grant_noarb() { check_consistency `get_internal_site 1` } recover_grant_noarb() { local h for h in $arbitrators; do start_arbitrator $h done >/dev/null 2>&1 } applicable_grant_noarb() { [ -n "$arbitrators" ] } ## TEST: revoke ## # just a revoke setup_revoke() { - grant_ticket 1 $1 + grant_ticket 1 } test_revoke() { - revoke_ticket $1 + revoke_ticket } check_revoke() { check_consistency } ## TEST: grant_elsewhere ## # just a grant to another site test_grant_elsewhere() { run_site 1 booth grant -w -s `get_internal_site 2` $tkt >/dev/null } check_grant_elsewhere() { check_consistency `get_internal_site 2` } ## TEST: grant_site_lost ## # grant with one site lost setup_grant_site_lost() { stop_site `get_site 2` booth_status `get_site 2` && return 1 return 0 } test_grant_site_lost() { - grant_ticket 1 $1 + grant_ticket 1 wait_exp } check_grant_site_lost() { check_consistency `get_internal_site 1` } recover_grant_site_lost() { start_site `get_site 2` } ## TEST: grant_site_reappear ## # grant with one site lost then reappearing setup_grant_site_reappear() { stop_site `get_site 2` booth_status `get_site 2` && return 1 return 0 #sleep 1 } test_grant_site_reappear() { - grant_ticket 1 $1 || return $ERR_SETUP_FAILED + grant_ticket 1 || return $ERR_SETUP_FAILED check_cib `get_internal_site 1` || return $ERR_SETUP_FAILED wait_timeout start_site `get_site 2` || return $ERR_SETUP_FAILED wait_timeout wait_timeout } check_grant_site_reappear() { check_consistency `get_internal_site 1` && is_cib_granted `get_site 1` } recover_grant_site_reappear() { start_site `get_site 2` } ## TEST: simultaneous_start_even ## # simultaneous start of even number of members setup_simultaneous_start_even() { - grant_ticket_cib 2 $1 || return 1 + grant_ticket_cib 2 || return 1 stop_booth || return 1 #wait_timeout } test_simultaneous_start_even() { local serv for serv in $(echo $sites | sed "s/`get_site 1` //"); do start_site $serv & done for serv in $arbitrators; do start_arbitrator $serv & done wait_renewal start_site `get_site 1` wait_timeout wait_timeout } check_simultaneous_start_even() { check_consistency `get_internal_site 2` } ## TEST: slow_start_granted ## # slow start setup_slow_start_granted() { - grant_ticket_cib 1 $1 || return 1 + grant_ticket_cib 1 || return 1 stop_booth || return 1 #wait_timeout } test_slow_start_granted() { for serv in $sites; do start_site $serv wait_timeout done for serv in $arbitrators; do start_arbitrator $serv wait_timeout done } check_slow_start_granted() { check_consistency `get_internal_site 1` } ## TEST: restart_granted ## # restart with ticket granted setup_restart_granted() { - grant_ticket_cib 1 $1 + grant_ticket_cib 1 } test_restart_granted() { restart_site `get_site 1` || return 1 wait_timeout } check_restart_granted() { check_consistency `get_internal_site 1` } ## TEST: reload_granted ## # reload with ticket granted setup_reload_granted() { - grant_ticket_cib 1 $1 + grant_ticket_cib 1 } test_reload_granted() { reload_site `get_site 1` || return 1 wait_timeout } check_reload_granted() { check_consistency `get_internal_site 1` } ## TEST: restart_granted_nocib ## # restart with ticket granted (but cib empty) setup_restart_granted_nocib() { - grant_ticket_cib 1 $1 + grant_ticket_cib 1 } test_restart_granted_nocib() { stop_site_clean `get_site 1` || return 1 #wait_timeout start_site `get_site 1` || return 1 wait_timeout wait_timeout wait_timeout } check_restart_granted_nocib() { check_consistency `get_internal_site 1` } ## TEST: restart_notgranted ## # restart with ticket not granted setup_restart_notgranted() { - grant_ticket_cib 1 $1 + grant_ticket_cib 1 } test_restart_notgranted() { stop_site `get_site 2` || return 1 #sleep 1 start_site `get_site 2` || return 1 wait_timeout } check_restart_notgranted() { check_consistency `get_internal_site 1` } ## TEST: failover ## # ticket failover setup_failover() { - grant_ticket 1 $1 + grant_ticket 1 [ -n "`get_attr`" ] && set_site_attr 2 return 0 } test_failover() { stop_site_clean `get_site 1` || return 1 booth_status `get_site 1` && return 1 wait_exp wait_timeout wait_timeout wait_timeout } check_failover() { check_consistency any } recover_failover() { start_site `get_site 1` } ## TEST: split_leader ## # split brain (leader alone) setup_split_leader() { - grant_ticket_cib 1 $1 + grant_ticket_cib 1 [ -n "`get_attr`" ] && set_site_attr 2 return 0 } test_split_leader() { run_site 1 $iprules stop $port >/dev/null wait_exp wait_timeout wait_timeout wait_timeout wait_timeout check_cib any || return 1 run_site 1 $iprules start $port >/dev/null wait_timeout wait_timeout wait_timeout } check_split_leader() { check_consistency any } recover_split_leader() { run_site 1 $iprules start $port >/dev/null } ## TEST: split_follower ## # split brain (follower alone) setup_split_follower() { - grant_ticket_cib 1 $1 + grant_ticket_cib 1 } test_split_follower() { run_site 2 $iprules stop $port >/dev/null wait_exp wait_timeout run_site 2 $iprules start $port >/dev/null wait_timeout } check_split_follower() { check_consistency `get_internal_site 1` } ## TEST: split_edge ## # split brain (leader alone) setup_split_edge() { - grant_ticket_cib 1 $1 + grant_ticket_cib 1 } test_split_edge() { run_site 1 $iprules stop $port >/dev/null wait_exp run_site 1 $iprules start $port >/dev/null wait_timeout wait_timeout } check_split_edge() { check_consistency any } ## TEST: external_prog_failed ## # external test prog failed setup_external_prog_failed() { - grant_ticket 1 $1 || return 1 + grant_ticket 1 || return 1 [ -n "`get_attr`" ] && set_site_attr 2 break_external_prog 1 show_pref 1 || return 1 } test_external_prog_failed() { wait_renewal wait_timeout } check_external_prog_failed() { check_consistency any && [ `booth_where_granted` != `get_internal_site 1` ] } recover_external_prog_failed() { repair_external_prog 1 } applicable_external_prog_failed() { [ -n "`get_rsc`" ] } ## TEST: attr_prereq_ok ## # failover with attribute prerequisite setup_attr_prereq_ok() { - grant_ticket 1 $1 || return 1 + grant_ticket 1 || return 1 set_site_attr 2 stop_site_clean `get_site 1` booth_status `get_site 1` && return 1 return 0 } test_attr_prereq_ok() { wait_exp wait_timeout } check_attr_prereq_ok() { check_consistency `get_internal_site 2` } recover_attr_prereq_ok() { start_site `get_site 1` del_site_attr 2 } applicable_attr_prereq_ok() { [ -n "`get_attr`" ] } ## TEST: attr_prereq_fail ## # failover with failed attribute prerequisite setup_attr_prereq_fail() { - grant_ticket 1 $1 || return 1 + grant_ticket 1 || return 1 del_site_attr 2 >/dev/null 2>&1 stop_site_clean `get_site 1` booth_status `get_site 1` && return 1 return 0 } test_attr_prereq_fail() { wait_exp wait_exp wait_exp } check_attr_prereq_fail() { check_consistency && booth_where_granted | grep -qwi none } recover_attr_prereq_fail() { start_site `get_site 1` } applicable_attr_prereq_fail() { [ -n "`get_attr`" ] } # # environment modifications # # packet loss at one site 30% NETEM_ENV_single_loss() { run_site 1 $ABSPATH $run_cnf __netem__ netem_loss ${1:-30} PKT_LOSS=${1:-30} } # packet loss everywhere 30% NETEM_ENV_loss() { forall $ABSPATH $run_cnf __netem__ netem_loss ${1:-30} PKT_LOSS=${1:-30} } # network delay 100ms NETEM_ENV_net_delay() { forall $ABSPATH $run_cnf __netem__ netem_delay ${1:-100} } # duplicate packets NETEM_ENV_duplicate() { forall $ABSPATH $run_cnf __netem__ netem_duplicate ${1:-10} } # reorder packets NETEM_ENV_reorder() { forall $ABSPATH $run_cnf __netem__ netem_reorder ${1:-25} ${2:-50} } # need this if we're run from a local directory or such get_prog_abspath() { local p p=`run_site 1 rpm -ql booth-test | fgrep -w $PROG` echo ${p:-/usr/share/booth/tests/test/live_test.sh} } [ -f "$cnf" ] || { echo "ERROR: configuration file $cnf doesn't exist" usage 1 } is_pacemaker_running || { echo "ERROR: sites must run pacemaker" exit 1 } sites=`get_extern_ip site < $cnf` arbitrators=`get_extern_ip arbitrator < $cnf` internal_sites=`get_value site < $cnf` internal_arbitrators=`get_value arbitrator < $cnf` all_nodes=`get_all_nodes` port=`get_value port < $cnf` : ${port:=9929} site_cnt=`echo $internal_sites | wc -w` arbitrator_cnt=`echo $internal_arbitrators | wc -w` if [ "$1" = "__netem__" ]; then shift 1 _JUST_NETEM=1 local_netem_env $@ exit fi [ -z "$internal_sites" ] && { echo no sites in $cnf usage 1 } exec 2>$logf BASH_XTRACEFD=2 PS4='+ `date +"%T"`: ' set -x WE_SERVER="" is_we_server && WE_SERVER=1 PREFNAME=__pref_booth_live_test authfile=`get_value authfile < $cnf` run_site 1 'test -f '"$authfile"' || booth-keygen '"$authfile" TESTS="$@" MANUAL_TESTS="$@" : ${TESTS:="grant longgrant grant_noarb grant_elsewhere grant_site_lost grant_site_reappear revoke simultaneous_start_even slow_start_granted restart_granted reload_granted restart_granted_nocib restart_notgranted failover split_leader split_follower split_edge external_prog_failed attr_prereq_ok attr_prereq_fail"} : ${MANUAL_TESTS:="grant longgrant grant_noarb grant_elsewhere grant_site_lost restart_granted reload_granted split_leader split_follower split_edge "} #get total number od lines in the file conf_file_size=$(grep -c $ $cnf) #get line numbers for all tickets ticket_line_numbers=$(grep -n ticket $cnf | cut -d: -f1) read -a TICKET_LINES<<< $ticket_line_numbers #save the part of config located before ticket definitions sed -n "1,$((${TICKET_LINES[0]}-1))p" $cnf > ${cnf}_main.config #create a separate file for every ticket data number_of_tickets=0 for i in $(seq 0 1 $((${#TICKET_LINES[@]}-1))); do - ticket_line_start=${TICKET_LINES[i]} - ticket_line_end=$((${TICKET_LINES[i+1]}-1)) - if [ ${ticket_line_end} -lt 0 ]; then + ticket_line_start=${TICKET_LINES[i]} + ticket_line_end=$((${TICKET_LINES[i+1]}-1)) + if [ ${ticket_line_end} -lt 0 ]; then # for the last ticket - ticket_line_end=${conf_file_size} - fi - sed -n "${ticket_line_start},${ticket_line_end}p" $cnf > ${cnf}_${number_of_tickets}.ticket - number_of_tickets=$((number_of_tickets+1)) + ticket_line_end=${conf_file_size} + fi + sed -n "${ticket_line_start},${ticket_line_end}p" $cnf > ${cnf}_${number_of_tickets}.ticket + number_of_tickets=$((number_of_tickets+1)) done master_rc=0 # updated in runtest for i in `seq 0 $(($number_of_tickets-1))` do cat ${cnf}_main.config > booth_${i}.conf cat ${cnf}_${i}.ticket >> booth_${i}.conf - tkt=`get_tkt < booth_${i}.conf` + tkt=`get_tkt < booth_${i}.conf` if [ -z "$tkt" ]; then echo "Skipping empty ticket.." - continue + continue fi sync_conf booth_${i}.conf || exit reboot_test all_booth_status || { start_booth all_booth_status || { echo "some booth servers couldn't be started" exit 1 } } ABSPATH=`get_prog_abspath` dump_conf | logmsg - eval `get_tkt_settings booth_${i}.conf` + eval `get_tkt_settings booth_${i}.conf` - MIN_TIMEOUT=`awk -v tm=$T_timeout 'BEGIN{ - if (tm >= 2) print tm; - else print 2*tm; - }'` + MIN_TIMEOUT=`awk -v tm=$T_timeout 'BEGIN{ + if (tm >= 2) print tm; + else print 2*tm; + }'` - [ -z "$T_expire" ] && { - echo set $tkt expire time in $cnf - usage 1 - } + [ -z "$T_expire" ] && { + echo set $tkt expire time in $cnf + usage 1 + } - if [ -z "$T_renewal_freq" ]; then - T_renewal_freq=$((T_expire/2)) - fi + if [ -z "$T_renewal_freq" ]; then + T_renewal_freq=$((T_expire/2)) + fi - revoke_ticket $tkt + revoke_ticket - T_mode=`get_mode` - T_mode_lowercase=$(echo "$T_mode" | tr '[:upper:]' '[:lower:]') + T_mode=`get_mode` + T_mode_lowercase=$(echo "$T_mode" | tr '[:upper:]' '[:lower:]') - if [[ $T_mode_lowercase == *"manual"* ]]; then - echo "Testing the manual ticket.." + if [[ $T_mode_lowercase == *"manual"* ]]; then + echo "Running tests for manual tickets.." - for t in $MANUAL_TESTS; do - runtest $t $tkt - done + for t in $MANUAL_TESTS; do + runtest $t + done else - echo "Testing an automatic Raft ticket.." + echo "Running tests for automatic Raft tickets.." - for t in $TESTS; do - runtest $t $tkt - done - fi + for t in $TESTS; do + runtest $t + done + fi done exit $master_rc