diff --git a/doc/crm_fencing.txt b/doc/crm_fencing.txt index 96c80487ae..cb5bae481c 100644 --- a/doc/crm_fencing.txt +++ b/doc/crm_fencing.txt @@ -1,441 +1,441 @@
Fencing and Stonith
===================

Fencing is a very important concept in computer clusters for HA (High Availability). Unfortunately, given that fencing does not offer a visible service to users, it is often neglected.

Fencing may be defined as a method to bring an HA cluster to a known state. But what is a "cluster state" after all? To answer that question we have to see what is in the cluster.

== Introduction to HA clusters

Any computer cluster may be loosely defined as a collection of cooperating computers or nodes. Nodes talk to each other over communication channels, which are typically standard network connections, such as Ethernet.

The main purpose of an HA cluster is to manage user services. Typical examples of user services are an Apache web server or, say, a MySQL database. From the user's point of view, the services do some specific and hopefully useful work when ordered to do so. To the cluster, however, they are just things which may be started or stopped. This distinction is important, because the nature of the service is irrelevant to the cluster.

In cluster lingo, user services are known as resources. Every resource has a state attached, for instance: "resource r1 is started on node1". In an HA cluster, such a state implies that "resource r1 is stopped on all nodes but node1", because an HA cluster must make sure that every resource runs on at most one node.

A collection of resource states and node states is the cluster state. Every node must report every change that happens to resources. This applies only to running resources, because a node should not start resources unless told to do so by somebody. That somebody is the Cluster Resource Manager (CRM) in our case.

So far so good.
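To make the notion of a resource concrete, here is a minimal resource definition in the crm shell syntax used throughout this text (the resource name and IP address are purely illustrative):

```
primitive r1 ocf:heartbeat:IPaddr \
	params ip=10.0.0.1
```

Once such a resource is committed, the CRM decides where it runs and tracks its state; "resource r1 is started on node1" is then one piece of the cluster state.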
But what if, for whatever reason, we cannot establish with certainty the state of some node or resource? This is where fencing comes in. With fencing, even when the cluster doesn't know what is happening on some node, we can make sure that the node doesn't run any resources, or at least the important ones. If you wonder how this can happen, consider the many risks involved with computing: reckless people, power outages, natural disasters, rodents, thieves, software bugs, just to name a few. We are sure that your computer has failed unpredictably at least a few times.

== Fencing

There are two kinds of fencing: resource level and node level.

Using resource level fencing, the cluster can make sure that a node cannot access one or more resources. One typical example is a SAN, where a fencing operation changes rules on a SAN switch to deny access from a node. Resource level fencing may also be achieved using normal resources on which the resource we want to protect depends. Such a resource would simply refuse to start on the node in question, and therefore resources which depend on it will be unrunnable on that node as well.

Node level fencing makes sure that a node does not run any resources at all. This is usually done in a very simple, yet brutal way: the node is simply reset using a power switch. This may ultimately be necessary because the node may not be responsive at all. Node level fencing is our primary subject below.

== Node level fencing devices

Before we get into the configuration details, you need to pick a fencing device for the node level fencing. There are quite a few to choose from. To see the list of supported stonith devices, just run:

stonith -L

Stonith devices may be classified into five categories:

- UPS (Uninterruptible Power Supply)
- PDU (Power Distribution Unit)
- Blade power control devices
- Lights-out devices
- Testing devices

The choice depends mainly on your budget and the kind of hardware.
For instance, if you're running a cluster on a set of blades, then the power control device in the blade enclosure is the only candidate for fencing. Of course, this device must be capable of managing single blade computers.

The lights-out devices (IBM RSA, HP iLO, Dell DRAC) are becoming increasingly popular and in the future they may even become standard equipment of off-the-shelf computers. They are, however, inferior to UPS devices, because they share a power supply with their host (a cluster node). If a node is left without power, the device that is supposed to control it is just as useless. Even though this is obvious to us, the cluster manager is not in the know and will try to fence the node in vain. This will continue forever, because all other resource operations will wait for the fencing/stonith operation to succeed.

The testing devices are used exclusively for testing purposes. They are usually more gentle on the hardware. Once the cluster goes into production, they must be replaced with real fencing devices.

== STONITH (Shoot The Other Node In The Head)

Stonith is our fencing implementation. It provides the node level fencing.

NOTE: The stonith and fencing terms are often used interchangeably here as well as in other texts.

The stonith subsystem consists of two components:

- stonithd
- stonith plugins

=== stonithd

stonithd is a daemon which may be accessed by local processes or over the network. It accepts commands which correspond to fencing operations: reset, power-off, and power-on. It may also check the status of the fencing device.

stonithd runs on every node in the CRM HA cluster. The stonithd instance running on the DC node receives a fencing request from the CRM. It is up to this and other stonithd programs to carry out the desired fencing operation.

=== Stonith plugins

For every supported fencing device there is a stonith plugin which is capable of controlling that device. A stonith plugin is the interface to the fencing device.
All stonith plugins look the same to stonithd, but are quite different on the other side, reflecting the nature of the fencing device. Some plugins support more than one device. A typical example is ipmilan (or external/ipmi), which implements the IPMI protocol and can control any device which supports it.

== CRM stonith configuration

The fencing configuration consists of one or more stonith resources. A stonith resource is a resource of class stonith and it is configured just like any other resource. The list of parameters (attributes) depends on and is specific to the stonith type. Use the stonith(1) program to see the list:

$ stonith -t ibmhmc -n
ipaddr
$ stonith -t ipmilan -n
hostname ipaddr port auth priv login password reset_method

NOTE: It is easy to guess the class of a fencing device from the set of attribute names.

A short help text is also available:

$ stonith -t ibmhmc -h
STONITH Device: ibmhmc - IBM Hardware Management Console (HMC)
Use for IBM i5, p5, pSeries and OpenPower systems managed by HMC
Optional parameter name managedsyspat is white-space delimited list of
patterns used to match managed system names; if last character is '*',
all names that begin with the pattern are matched
Optional parameter name password is password for hscroot if passwordless
ssh access to HMC has NOT been setup (to do so, it is necessary to create
a public/private key pair with empty passphrase - see "Configure the
OpenSSH client" in the redbook for more details)
For more information see
http://publib-b.boulder.ibm.com/redbooks.nsf/RedbookAbstracts/SG247038.html

.You just said that there is stonithd and stonith plugins. What's with these resources now?
**************************
Resources of class stonith are just a representation of stonith plugins in the CIB. Well, a bit more: apart from the fencing operations, stonith resources, just like any others, may be started, stopped, and monitored.
The start and stop operations are a bit of a misnomer: enable and disable would serve better, but it's too late to change that. So, these two are actually administrative operations and do not translate to any operation on the fencing device itself. Monitor, however, does translate to device status.
**************************

A dummy stonith resource configuration, which may be used in some testing scenarios, is very simple:

configure
primitive st-null stonith:null \
	params hostlist="node1 node2"
clone fencing st-null \
	meta globally-unique=false
commit

.NB
**************************
All configuration examples are in the crm configuration tool syntax. To apply them, put the sample in a text file, say sample.txt, and run:

crm < sample.txt

The configure and commit lines are omitted from further examples.
**************************

An alternative configuration:

primitive st-node1 stonith:null \
	params hostlist="node1"
primitive st-node2 stonith:null \
	params hostlist="node2"
location l-st-node1 st-node1 -inf: node1
location l-st-node2 st-node2 -inf: node2

This configuration is perfectly alright as far as the cluster software is concerned. The only difference from a real world configuration is that no fencing operation takes place.

A more realistic configuration, but still only for testing, is the following external/ssh one:

primitive st-ssh stonith:external/ssh \
	params hostlist="node1 node2"
clone fencing st-ssh \
	meta globally-unique=false

This one can also reset nodes. As you can see, this configuration is remarkably similar to the first one which features the null stonith device.

.What is this clone thing?
**************************
Clones are a CRM/Pacemaker feature. A clone is basically a shortcut: instead of defining n identical, yet differently named resources, a single cloned resource suffices. By far the most common use of clones is with stonith resources, if the stonith device is accessible from all nodes.
**************************

The real device configuration is not much different, though some devices may require more attributes. For instance, an IBM RSA lights-out device might be configured like this:

primitive st-ibmrsa-1 stonith:external/ibmrsa-telnet \
	params nodename=node1 ipaddr=192.168.0.101 \
	userid=USERID passwd=PASSW0RD
primitive st-ibmrsa-2 stonith:external/ibmrsa-telnet \
	params nodename=node2 ipaddr=192.168.0.102 \
	userid=USERID passwd=PASSW0RD
# st-ibmrsa-1 can run anywhere but on node1
location l-st-node1 st-ibmrsa-1 -inf: node1
# st-ibmrsa-2 can run anywhere but on node2
location l-st-node2 st-ibmrsa-2 -inf: node2

.Why those strange location constraints?
**************************
There is always a certain probability that a stonith operation is going to fail. Hence, a stonith operation carried out by the node which is also its target is not reliable. If the node is reset, then it cannot send a notification about the outcome of the fencing operation. The only way to do that is to assume that the operation is going to succeed and send the notification beforehand. Then, if the operation fails, we are in trouble. Given all this, we decided that, by convention, stonithd refuses to kill its host.
**************************

If you didn't already guess, the configuration of a UPS kind of fencing device is remarkably similar to everything we have already shown. All UPS devices employ the same mechanics for fencing. What differs is how the device itself is accessed. Old UPS devices, those that were considered professional, used to have just a serial port, typically connected at 1200 baud using a special serial cable. Many new ones still come equipped with a serial port, but often they also sport a USB or Ethernet interface. The kind of connection we may use depends on what the plugin supports.
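As a rough illustration, a serial-attached APC Smart UPS powering node1 could be configured along these lines (the device path is a placeholder; ttydev and hostlist are the parameters the apcsmart plugin expects):

```
primitive st-apc stonith:apcsmart \
	params ttydev="/dev/ttyS0" hostlist="node1"
location l-st-apc st-apc -inf: node1
```

The location constraint follows the convention above: since stonithd refuses to kill its host, the resource must be kept off the very node it is meant to fence.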
Let's see a few examples for the APC UPS equipment:

$ stonith -t apcmaster -h
STONITH Device: apcmaster - APC MasterSwitch (via telnet)
NOTE: The APC MasterSwitch accepts only one (telnet)
connection/session a time. When one session is active,
subsequent attempts to connect to the MasterSwitch will fail.
For more information see http://www.apc.com/
List of valid parameter names for apcmaster STONITH device:
	ipaddr
	login
	password

$ stonith -t apcsmart -h
STONITH Device: apcsmart - APC Smart UPS
(via serial port - NOT USB!).
Works with higher-end APC UPSes, like
Back-UPS Pro, Smart-UPS, Matrix-UPS, etc.
(Smart-UPS may have to be >= Smart-UPS 700?).
See http://www.networkupstools.org/protocols/apcsmart.html
for protocol compatibility details.
For more information see http://www.apc.com/
List of valid parameter names for apcsmart STONITH device:
	ttydev
	hostlist

The former plugin supports APC UPSes with a network port and the telnet protocol. The latter uses the APC SMART protocol over a serial line, which is supported by many different APC UPS product lines.

.So, what do I use: clones, constraints, both?
**************************
It depends. It depends on the nature of the fencing device: for example, if the device cannot serve more than one connection at a time, then clones won't do. It depends on how many hosts the device can manage: if it's only one, and that is always the case with lights-out devices, then again clones are right out. It also depends on the number of nodes in your cluster: the more nodes, the more desirable clones become. Finally, it is also a matter of personal preference. In short: if clones are safe to use with your configuration and if they simplify the configuration, then make cloned stonith resources.
**************************

The CRM configuration is left as an exercise to the reader.

== Monitoring the fencing devices

Just like any other resource, the stonith class agents also support the monitor operation.
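For example, to have the cluster check the fencing device status every two hours, a monitor operation may be added to the stonith resource (the interval and timeout values here are illustrative; the resource mirrors the earlier IBM RSA example):

```
primitive st-ibmrsa-1 stonith:external/ibmrsa-telnet \
	params nodename=node1 ipaddr=192.168.0.101 \
	userid=USERID passwd=PASSW0RD \
	op monitor interval=120m timeout=60s
```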
Given that we have often seen monitor either not configured or configured in a wrong way, we have decided to devote a section to the matter.

Monitoring stonith resources, which is actually checking status of the corresponding fencing devices, is strongly recommended. So strongly, that we should consider a configuration without it
-wrong.
+invalid.

On the one hand, a fencing device, though an indispensable part of an HA cluster and its last line of defense, is seldom used. Very seldom and preferably never. On the other hand, for whatever reason, power management equipment is known to be rather fragile on the communication side. Some devices were known to give up if there was too much broadcast traffic on the wire. Some cannot handle more than ten or so connections per minute. Some get very confused if two clients try to connect at the same time. Most cannot handle more than one session at a time.

The bottom line: try not to exercise your fencing device too often. It may not like it. Use monitoring regularly, yet sparingly, say once every couple of hours. The probability that within those few hours there will be a need for a fencing operation and that the power switch would fail is usually low.

== Odd plugins

Apart from plugins which handle real devices, some stonith plugins are a bit awkward and deserve special attention.

=== external/kdumpcheck

Sometimes, it may be important to get a kernel core dump. This plugin may be used to check whether a dump is in progress. If that is the case, it will return true, as if the node has been fenced, which is actually true given that it cannot run any resources at the time. kdumpcheck is typically used in
-concert with another, "real", fencing device. See
+concert with another, real, fencing device. See
README_kdumpcheck.txt for more details.

=== external/sbd

This is a self-fencing device. It reacts to a so-called "poison pill" which may be inserted into a shared disk.
On loss of the shared storage connection, it also makes the node commit suicide. See http://www.linux-ha.org/SBD_Fencing for more details.

=== meatware

Strange name, simple concept. meatware requires help from a human to operate. Whenever invoked, meatware logs a CRIT severity message which should show up on the node's console. The operator should then make sure that the node is down and run the meatclient(8) command to tell meatware that the cluster may consider the node dead. See README.meatware for more information.

=== null

This one is probably not of much importance to the general public. It is used in various testing scenarios. null is an imaginary device which always behaves and always claims that it has shot a node, but never does anything. Do not use it unless you know what you are doing.

=== suicide

suicide is a software-only device which can reboot the node it is running on. It depends on the operating system, so it should be avoided whenever possible. It is, however, OK on one-node clusters. suicide and null are the only exceptions to the "don't shoot my host" rule.

.What about that stonithd? You forgot about it, eh?
**************************
The stonithd daemon, though it is really the master of ceremonies, requires no configuration itself. All configuration is stored in the CIB.
************************** == Resources http://linux-ha.org/STONITH http://linux-ha.org/fencing http://linux-ha.org/ConfiguringStonithPlugins http://linux-ha.org/CIB/Idioms http://www.clusterlabs.org/mediawiki/images/f/fb/Configuration_Explained.pdf http://techthoughts.typepad.com/managing_computers/2007/10/split-brain-quo.html diff --git a/tools/crm.in b/tools/crm.in index 54b693c913..e2bff05168 100644 --- a/tools/crm.in +++ b/tools/crm.in @@ -1,5201 +1,5202 @@ #!/usr/bin/env python # # Copyright (C) 2008 Dejan Muhamedagic # # This program is free software; you can redistribute it and/or # modify it under the terms of the GNU General Public # License as published by the Free Software Foundation; either # version 2.1 of the License, or (at your option) any later version. # # This software is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # General Public License for more details. # # You should have received a copy of the GNU General Public # License along with this library; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # import shlex import os from tempfile import mkstemp import subprocess import sys import time import readline import copy import xml.dom.minidom import signal import re class ErrorBuffer(object): ''' Show error messages either immediately or buffered. ''' def __init__(self): self.msg_list = [] self.mode = "immediate" def buffer(self): self.mode = "keep" def release(self): if self.msg_list: print '\n'.join(self.msg_list) if interactive: raw_input("Press enter to continue... 
") self.msg_list = [] self.mode = "immediate" def writemsg(self,msg): if self.mode == "immediate": print msg else: self.msg_list.append(msg) def error(self,s): self.writemsg("ERROR: %s" % add_lineno(s)) def warning(self,s): self.writemsg("WARNING: %s" % add_lineno(s)) def info(self,s): self.writemsg("INFO: %s" % add_lineno(s)) def add_lineno(s): if lineno > 0: return "%d: %s" % (lineno,s) else: return s def common_err(s): err_buf.error(s) def common_warn(s): err_buf.warning(s) def common_info(s): err_buf.info(s) def no_prog_err(name): err_buf.error("%s not available, check your installation"%name) def missing_prog_warn(name): err_buf.warning("could not find any %s on the system"%name) def no_attribute_err(attr,obj_type): err_buf.error("required attribute %s not found in %s"%(attr,obj_type)) def bad_def_err(what,msg): err_buf.error("bad %s definition: %s"%(what,msg)) def unsupported_err(name): err_buf.error("%s is not supported"%name) def no_such_obj_err(name): err_buf.error("%s object is not supported"%name) def obj_cli_err(name): err_buf.error("object %s cannot be represented in the CLI notation"%name) def missing_obj_err(node): err_buf.error("object %s:%s missing (shouldn't have happened)"% \ (node.tagName,node.getAttribute("id"))) def constraint_norefobj_err(constraint_id,obj_id): err_buf.error("constraint %s references a resource %s which doesn't exist"% \ (constraint_id,obj_id)) def obj_exists_err(name): err_buf.error("object %s already exists"%name) def no_object_err(name): err_buf.error("object %s does not exist"%name) def invalid_id_err(obj_id): err_buf.error("%s: invalid object id"%obj_id) def id_used_err(node_id): err_buf.error("%s: id is already in use"%node_id) def skill_err(s): err_buf.error("%s: this command is not allowed at this skill level"%' '.join(s)) def syntax_err(s,token = '',context = ''): pfx = "syntax" if context: pfx = "%s in %s" %(pfx,context) if type(s) == type(''): err_buf.error("%s near <%s>"%(pfx,s)) elif token: err_buf.error("%s near 
<%s>: %s"%(pfx,token,' '.join(s))) else: err_buf.error("%s: %s"%(pfx,' '.join(s))) def bad_attr_usage(cmd,args): err_buf.error("bad usage: %s %s"%(cmd,args)) def cib_parse_err(msg): err_buf.error("%s"%msg) def cib_no_elem_err(el_name): err_buf.error("CIB contains no '%s' element!"%el_name) def cib_ver_unsupported_err(validator,rel): err_buf.error("CIB not supported: validator '%s', release '%s'"% (validator,rel)) err_buf.error("You may try the upgrade command") def update_err(obj_id,cibadm_opt,node): if cibadm_opt == '-C': task = "create" elif cibadm_opt == '-D': task = "delete" else: task = "update" err_buf.error("could not %s %s"%(task,obj_id)) err_buf.info("offending xml: %s" % node.toprettyxml()) def not_impl_info(s): err_buf.info("%s is not implemented yet" % s) def ask(msg): ans = raw_input(msg + ' ') if not ans: return False return ans[0].lower() == 'y' from UserDict import DictMixin class odict(DictMixin): def __init__(self, data=None, **kwdata): self._keys = [] self._data = {} def __setitem__(self, key, value): if key not in self._data: self._keys.append(key) self._data[key] = value def __getitem__(self, key): return self._data[key] def __delitem__(self, key): del self._data[key] self._keys.remove(key) def keys(self): return list(self._keys) def copy(self): copyDict = odict() copyDict._data = self._data.copy() copyDict._keys = self._keys[:] return copyDict global_aliases = { "quit": ("bye","exit"), "end": ("cd","up"), } def setup_aliases(obj): for cmd in obj.cmd_aliases.keys(): for alias in obj.cmd_aliases[cmd]: obj.help_table[alias] = obj.help_table[cmd] obj.cmd_table[alias] = obj.cmd_table[cmd] # # Resource Agents interface (meta-data, parameters, etc) # def lrmadmin(opts, xml = False): ''' Get information directly from lrmd using lrmadmin. 
''' lrmadmin_prog = "@sbindir@/lrmadmin" l = [] #print "invoke: lrmadmin",opts if is_program(lrmadmin_prog) and is_process("lrmd"): proc = subprocess.Popen("%s %s"%(lrmadmin_prog,opts), \ shell=True, stdout=subprocess.PIPE) outp = proc.communicate()[0] proc.wait() outp = outp.strip() l = outp.split('\n') if not xml: l = l[1:] # skip the first line return l def pengine_meta(): ''' Do pengine metadata. ''' pengine = "@CRM_DAEMON_DIR@/pengine" l = [] if is_program(pengine): proc = subprocess.Popen("%s metadata"%pengine, \ shell=True, stdout=subprocess.PIPE) outp = proc.communicate()[0] proc.wait() outp = outp.strip() l = outp.split('\n') return l def get_nodes_text(n,tag): try: node = n.getElementsByTagName(tag)[0] for c in node.childNodes: if c.nodeType == c.TEXT_NODE: return c.data.strip() except: return '' def ra_classes(): ''' List of RA classes. ''' if wcache.is_cached("ra_classes"): return wcache.retrieve("ra_classes") l = lrmadmin("-C") return wcache.store("ra_classes",l) def ra_providers(ra_type,ra_class = "ocf"): 'List of providers for a class:type.' id = "ra_providers-%s-%s" % (ra_class,ra_type) if wcache.is_cached(id): return wcache.retrieve(id) l = lrmadmin("-P %s %s" % (ra_class,ra_type),True) return wcache.store(id,l) def ra_providers_all(ra_class = "ocf"): ''' List of providers for a class:type. ''' id = "ra_providers_all-%s" % ra_class if wcache.is_cached(id): return wcache.retrieve(id) ocf_root = os.getenv("@OCF_ROOT_DIR@") if not ocf_root: ocf_root = "/usr/lib/ocf" dir = ocf_root + "/resource.d" l = [] for s in os.listdir(dir): if os.path.isdir("%s/%s" % (dir,s)): l.append(s) return wcache.store(id,l) def ra_types(ra_class = "ocf", ra_provider = ""): ''' List of RA type for a class. 
''' if not ra_class: ra_class = "ocf" id = "ra_types-%s-%s" % (ra_class,ra_provider) if wcache.is_cached(id): return wcache.retrieve(id) if ra_provider: list = [] for ra in lrmadmin("-T %s" % ra_class): if ra_provider in ra_providers(ra,ra_class): list.append(ra) else: list = lrmadmin("-T %s" % ra_class) list.sort() return wcache.store(id,list) class RAInfo(object): ''' A resource agent and whatever's useful about it. ''' ra_tab = " " # four horses skip_ops = ("meta-data", "validate-all") act_attr = ("timeout", "interval", "depth") def __init__(self,ra_class,ra_type,ra_provider = "heartbeat"): self.ra_class = ra_class self.ra_type = ra_type self.ra_provider = ra_provider if not self.ra_provider: self.ra_provider = "heartbeat" self.mk_ra_node() # indirectly caches meta-data and doc def mk_ra_node(self): ''' Return the resource_agent node. ''' meta = self.meta() try: self.doc = xml.dom.minidom.parseString('\n'.join(meta)) except: #common_err("could not parse meta-data for (%s,%s,%s)" \ # % (self.ra_class,self.ra_type,self.ra_provider)) self.ra_node = None return try: self.ra_node = self.doc.getElementsByTagName("resource-agent")[0] except: common_err("meta-data contains no resource-agent element") self.ra_node = None def params(self): ''' Construct a dict: parameters are keys and lists of flags (required and unique) are values. Cached too. ''' id = "ra_params-%s-%s-%s"%(self.ra_class,self.ra_type,self.ra_provider) if wcache.is_cached(id): return wcache.retrieve(id) if not self.ra_node: return None d = {} for pset in self.ra_node.getElementsByTagName("parameters"): for c in pset.getElementsByTagName("parameter"): name = c.getAttribute("name") if not name: continue required = c.getAttribute("required") unique = c.getAttribute("unique") d[name] = (required == '1',unique == '1') return wcache.store(id,d) def params_list(self): ''' List of parameters. 
''' try: l = self.params().keys() l.sort() return l except: return [] def reqd_params_list(self): ''' List of required parameters. ''' d = self.params() if not d: return [] return [x for x in d if d[x][0]] def is_param_reqd(self,pname): ''' Is pname a required parameter? ''' return pname in self.reqd_params_list() def meta(self): ''' RA meta-data as raw xml. ''' id = "ra_meta-%s-%s-%s" % (self.ra_class,self.ra_type,self.ra_provider) if wcache.is_cached(id): return wcache.retrieve(id) if self.ra_class == "pengine": l = pengine_meta() else: l = lrmadmin("-M %s %s %s" % (self.ra_class,self.ra_type,self.ra_provider),True) return wcache.store(id, l) def meta_pretty(self): ''' Print the RA meta-data in a human readable form. ''' if not self.ra_node: return '' l = [] title = self.meta_title() l.append(title) longdesc = get_nodes_text(self.ra_node,"longdesc") if longdesc: l.append(longdesc) if self.ra_class != "heartbeat": params = self.meta_parameters() if params: l.append(params.rstrip()) actions = self.meta_actions() if actions: l.append(actions) return '\n\n'.join(l) def meta_title(self): if self.ra_class == "ocf": s = "%s:%s:%s" % (self.ra_class,self.ra_provider,self.ra_type) else: s = "%s:%s" % (self.ra_class,self.ra_type) shortdesc = get_nodes_text(self.ra_node,"shortdesc") if shortdesc and shortdesc != self.ra_type: s = "%s (%s)" % (shortdesc,s) return s def meta_param_head(self,n): type = default = None name = n.getAttribute("name") if not name: return None s = name if n.getAttribute("required") == "1": s = s + "*" try: content = n.getElementsByTagName("content")[0] type = content.getAttribute("type") default = content.getAttribute("default") except: pass if type and default: s = "%s (%s, [%s])" % (s,type,default) elif type: s = "%s (%s)" % (s,type) shortdesc = get_nodes_text(n,"shortdesc") if shortdesc and shortdesc != name: s = "%s: %s" % (s,shortdesc) return s def format_parameter(self,n): l = [] head = self.meta_param_head(n) if not head: common_err("no name 
attribute for parameter") return "" l.append(head) longdesc = get_nodes_text(n,"longdesc") if longdesc: longdesc = self.ra_tab + longdesc.replace("\n","\n"+self.ra_tab) + '\n' l.append(longdesc) return '\n'.join(l) def meta_parameter(self,param): if not self.ra_node: return '' l = [] for pset in self.ra_node.getElementsByTagName("parameters"): for c in pset.getElementsByTagName("parameter"): if c.getAttribute("name") == param: return self.format_parameter(c) def meta_parameters(self): if not self.ra_node: return '' l = [] for pset in self.ra_node.getElementsByTagName("parameters"): for c in pset.getElementsByTagName("parameter"): s = self.format_parameter(c) if s: l.append(s) if l: return "Parameters (* denotes required, [] the default):\n\n" + '\n'.join(l) def meta_action_head(self,n): name = n.getAttribute("name") if not name: return '' if name in self.skip_ops: return '' s = "%-8s" % name for a in self.act_attr: v = n.getAttribute(a) if v: s = "%s %s=%s" % (s,a,v) return s def meta_actions(self): l = [] for aset in self.ra_node.getElementsByTagName("actions"): for c in aset.getElementsByTagName("action"): s = self.meta_action_head(c) if s: l.append(self.ra_tab + s) if l: return "Operations' defaults (advisory minimum):\n\n" + '\n'.join(l) def cmd_end(cmd,dir = ".."): "Go up one level." levels.droplevel() def cmd_exit(cmd): "Exit the crm program" cmd_end(cmd) if interactive: print "bye" try: readline.write_history_file(hist_file) except: pass for f in tmpfiles: os.unlink(f) sys.exit() # # help or make users feel less lonely # def add_shorthelp(topic,shorthelp,topic_help): ''' Join topics ("%s,%s") if they share the same short description. 
''' for i in range(len(topic_help)): if topic_help[i][1] == shorthelp: topic_help[i][0] = "%s,%s" % (topic_help[i][0], topic) return topic_help.append([topic, shorthelp]) def dump_short_help(help_tab): topic_help = [] for topic in help_tab: if topic == '.': continue # with odict, for whatever reason, python parses differently: # help_tab["..."] = ("...","...") and # help_tab["..."] = ("...",""" # ...""") # a parser bug? if type(help_tab[topic][0]) == type(()): shorthelp = help_tab[topic][0][0] else: shorthelp = help_tab[topic][0] add_shorthelp(topic,shorthelp,topic_help) for t,d in topic_help: print "\t%-16s %s" % (t,d) def overview(help_tab): print "" print help_tab['.'][1] print "" print "Available commands:" print "" dump_short_help(help_tab) print "" def topic_help(help_tab,topic): if topic not in help_tab: print "There is no help for topic %s" % topic return if type(help_tab[topic][0]) == type(()): shorthelp = help_tab[topic][0][0] longhelp = help_tab[topic][0][1] else: shorthelp = help_tab[topic][0] longhelp = help_tab[topic][1] print longhelp or shorthelp def cmd_help(help_tab,topic = ''): "help!" # help_tab is an odict (ordered dictionary): # help_tab[topic] = (short_help,long_help) # topic '.' is a special entry for the top level if not topic: overview(help_tab) else: topic_help(help_tab,topic) def add_sudo(cmd): if user_prefs.crm_user: return "sudo -E -u %s %s"%(user_prefs.crm_user,cmd) return cmd def pipe_string(cmd,s): cmd = add_sudo(cmd) try: p = os.popen(cmd,'w') p.write(s) return p.close() except IOError, msg: common_err(msg) return -1 def xml2doc(cmd): cmd = add_sudo(cmd) try: p = os.popen(cmd,'r') except IOError, msg: common_err(msg) return None try: doc = xml.dom.minidom.parse(p) except xml.parsers.expat.ExpatError,msg: common_err("cannot parse output of %s: %s"%(cmd,msg)) p.close() return None p.close() return doc #def pipe_string(cmd,s): # 'Run a program, collect and return stdout.' 
#    if user_prefs.crm_user:
#        p = Popen3("sudo -E -u %s %s"%(user_prefs.crm_user,cmd), None)
#    else:
#        p = Popen3(cmd, None)
#    p.fromchild.close()
#    p.tochild.write(s)
#    p.tochild.close()
#    p.wait()
def str2tmp(s):
    '''
    Write the given string to a temporary file.
    Return the name of the file.
    '''
    fd,tmp = mkstemp()
    try:
        f = os.fdopen(fd,"w")
    except IOError, msg:
        common_err(msg)
        return
    f.write(s)
    f.close()
    return tmp
def ext_cmd(cmd):
    if os.system(add_sudo(cmd)) != 0:
        return False
    else:
        return True
def is_program(prog):
    return os.system("which %s >/dev/null 2>&1"%prog) == 0
def find_program(envvar,*args):
    if envvar and os.getenv(envvar):
        return os.getenv(envvar)
    for prog in args:
        if is_program(prog):
            return prog
def is_id_valid(id):
    """
    Verify that the id follows the definition:
    http://www.w3.org/TR/1999/REC-xml-names-19990114/#ns-qualnames
    """
    if not id:
        return False
    id_re = "^[A-Za-z_][\w._-]*$"
    return re.match(id_re,id)
def check_filename(fname):
    """
    Verify that the string is a filename.
    """
    fname_re = "^[^/]+$"
    return re.match(fname_re,fname)
class UserPrefs(object):
    '''
    Keep user preferences here.
    '''
    def __init__(self):
        self.skill_level = 2 #TODO: set back to 0?
        self.editor = find_program("EDITOR","vim","vi","emacs","nano")
        self.pager = find_program("PAGER","less","more","pg")
        self.dotty = find_program("","dotty")
        if not self.editor:
            missing_prog_warn("editor")
        if not self.pager:
            missing_prog_warn("pager")
        self.crm_user = ""
        self.xmlindent = "  " # two spaces
    def check_skill_level(self,n):
        return self.skill_level >= n
class CliOptions(object):
    '''
    Manage user preferences
    '''
    skill_levels = {"operator":0, "administrator":1, "expert":2}
    help_table = odict()
    help_table["."] = ("user preferences","Various user preferences may be set here.")
    help_table["skill-level"] = ("set skill level", "")
    help_table["editor"] = ("set preferred editor program", "")
    help_table["pager"] = ("set preferred pager program", "")
    help_table["user"] = ("set the cluster user", """
If you need extra privileges to talk to the cluster (i.e. the
cib process), then set this to the user. Typically, that is
either "root" or "hacluster". Don't forget to set up the
sudoers file as well.
Example:
	user hacluster
""")
    help_table["quit"] = ("exit the program", "")
    help_table["help"] = ("show help", "")
    help_table["end"] = ("go back one level", "")
    cmd_aliases = global_aliases
    def __init__(self):
        self.cmd_table = {
            "skill-level": (self.set_skill_level,(1,1),0,(skills_list,)),
            "editor": (self.set_editor,(1,1),0),
            "pager": (self.set_pager,(1,1),0),
            "user": (self.set_crm_user,(0,1),0),
            "save": (self.save_options,(0,0),0),
            "show": (self.show_options,(0,0),0),
            "help": (self.help,(0,1),0),
            "quit": (cmd_exit,(0,0),0),
            "end": (cmd_end,(0,1),0),
        }
        setup_aliases(self)
    def set_skill_level(self,cmd,skill_level):
        """usage: skill-level <level>
        level: operator | administrator | expert"""
        if skill_level in self.skill_levels:
            user_prefs.skill_level = self.skill_levels[skill_level]
        else:
            common_err("no %s skill level"%skill_level)
            return False
    def get_skill_level(self):
        for s in self.skill_levels:
            if user_prefs.skill_level == self.skill_levels[s]:
                return s
    def set_editor(self,cmd,prog):
        "usage: editor <program>"
        if is_program(prog):
            user_prefs.editor = prog
        else:
            common_err("program %s does not exist"% prog)
            return False
    def set_pager(self,cmd,prog):
        "usage: pager <program>"
        if is_program(prog):
            user_prefs.pager = prog
        else:
            common_err("program %s does not exist"% prog)
            return False
    def set_crm_user(self,cmd,user = ''):
        "usage: user [<user>]"
        user_prefs.crm_user = user
    def write_rc(self,f):
        print >>f, '%s "%s"' % ("editor",user_prefs.editor)
        print >>f, '%s "%s"' % ("pager",user_prefs.pager)
        print >>f, '%s "%s"' % ("user",user_prefs.crm_user)
        print >>f, '%s "%s"' % ("skill-level",self.get_skill_level())
    def show_options(self,cmd):
        "usage: show"
        self.write_rc(sys.stdout)
    def save_options(self,cmd):
        "usage: save"
        try:
            f = open(rc_file,"w")
        except os.error,msg:
            common_err("open: %s"%msg)
            return
        print >>f, 'options'
        self.write_rc(f)
        print >>f, 'end'
        f.close()
    def help(self,cmd,topic = ''):
        "usage: help [<topic>]"
        cmd_help(self.help_table,topic)
cib_dump = "cibadmin -Ql"
cib_piped = "cibadmin -p"
cib_upgrade = "cibadmin --upgrade --force"
cib_verify = "crm_verify -V -p"
class WCache(object):
    "Cache stuff. A naive implementation."
    def __init__(self):
        self.lists = {}
        self.stamp = time.time()
        self.max_cache_age = 600 # seconds
    def is_cached(self,name):
        if time.time() - self.stamp > self.max_cache_age:
            self.stamp = time.time()
            self.clear()
        return name in self.lists
    def store(self,name,lst):
        self.lists[name] = lst
        return lst
    def retrieve(self,name):
        if self.is_cached(name):
            return self.lists[name]
        else:
            return None
    def clear(self):
        self.lists = {}
def is_name_sane(name):
    if re.search("['/;]",name):
        common_err("%s: bad file name"%name)
        return False
    return True
class CibShadow(object):
    '''
    CIB shadow management class
    '''
    help_table = odict()
    help_table["."] = ("","""
CIB shadow management. See the crm_shadow program.
""")
    help_table["new"] = ("create a new shadow CIB", "")
    help_table["delete"] = ("delete a shadow CIB", "")
    help_table["reset"] = ("copy live cib to a shadow CIB", "")
    help_table["commit"] = ("copy a shadow CIB to the cluster", "")
    help_table["use"] = ("change working CIB", '''
Choose a shadow CIB for further changes. If the name provided
is empty, then the live (cluster) CIB is used.
''')
    help_table["diff"] = ("diff between the shadow CIB and the live CIB", "")
    help_table["list"] = ("list all shadow CIBs", "")
    help_table["quit"] = ("exit the program", "")
    help_table["help"] = ("show help", "")
    help_table["end"] = ("go back one level", "")
    envvar = "CIB_shadow"
    extcmd = ">/dev/null </dev/null crm_shadow"
    def __init__(self):
        self.cmd_table = {
            "new": (self.new,(1,2),1),
            "delete": (self.delete,(1,1),1),
            "reset": (self.reset,(1,1),1),
            "commit": (self.commit,(1,1),1),
            "use": (self.use,(0,1),1),
            "diff": (self.diff,(0,0),1),
            "list": (self.list,(0,0),1),
            "help": (self.help,(0,1),0),
            "quit": (cmd_exit,(0,0),0),
            "end": (cmd_end,(0,1),0),
        }
        setup_aliases(self)
    def checkprog(self):
        # verify that the crm_shadow program is available
        try:
            ext_cmd("%s 2>&1" % self.extcmd)
        except os.error:
            no_prog_err(self.extcmd)
            return False
        return True
    def new(self,cmd,name,force = ''):
        "usage: new <cib> [force]"
        if not is_name_sane(name):
            return False
        new_cmd = "%s -c '%s'" % (self.extcmd,name)
        if force:
            if force == "force" or force == "--force":
                new_cmd = "%s --force" % new_cmd
            else:
                syntax_err((new_cmd,force), context = 'new')
                return False
        if ext_cmd(new_cmd):
            common_info("%s shadow CIB created"%name)
            self.use("use",name)
    def delete(self,cmd,name):
        "usage: delete <cib>"
        if not is_name_sane(name):
            return False
        if cib_in_use == name:
            common_err("%s shadow CIB is in use"%name)
            return False
        if ext_cmd("%s -D '%s' --force" % (self.extcmd,name)):
            common_info("%s shadow CIB deleted"%name)
        else:
            common_err("failed to delete %s shadow CIB"%name)
            return False
    def reset(self,cmd,name):
        "usage: reset <cib>"
        if not is_name_sane(name):
            return False
        if ext_cmd("%s -r '%s'" % (self.extcmd,name)):
            common_info("copied live CIB to %s"%name)
        else:
            common_err("failed to copy live CIB to %s"%name)
            return False
    def commit(self,cmd,name):
        "usage: commit <cib>"
        if not is_name_sane(name):
            return False
        if ext_cmd("%s -C '%s' --force" % (self.extcmd,name)):
            common_info("committed '%s' shadow CIB to the cluster"%name)
        else:
            common_err("failed to commit the %s shadow CIB"%name)
            return False
    def diff(self,cmd):
        "usage: diff"
        return ext_cmd("%s -d" % self.extcmd)
    def list(self,cmd):
        "usage: list"
        return ext_cmd("ls @CRM_CONFIG_DIR@ | fgrep shadow. | sed 's/^shadow\.//'")
    def use(self,cmd,name = ''):
        "usage: use [<cib>]"
        # Choose a shadow cib for further changes. If the name
        # provided is empty, then choose the live (cluster) cib.
        # Don't allow ' in shadow names
        if not is_name_sane(name):
            return False
        global cib_in_use
        if not name or name == "live":
            name = ""
            os.unsetenv(self.envvar)
        else:
            if not ext_cmd("test -r '@CRM_CONFIG_DIR@/shadow.%s'"%name):
                common_err("%s: no such shadow CIB"%name)
                return False
            os.putenv(self.envvar,name)
        cib_in_use = name
    def help(self,cmd,topic = ''):
        cmd_help(self.help_table,topic)
def manage_attr(cmd,attr_ext_commands,*args):
    if len(args) < 3:
        bad_attr_usage(cmd,' '.join(args))
        return False
    attr_cmd = None
    try:
        attr_cmd = attr_ext_commands[args[1]]
    except KeyError:
        bad_attr_usage(cmd,' '.join(args))
        return False
    if not attr_cmd:
        bad_attr_usage(cmd,' '.join(args))
        return False
    if args[1] == 'set':
        if len(args) == 4:
            return ext_cmd(attr_cmd%(args[0],args[2],args[3]))
        else:
            bad_attr_usage(cmd,' '.join(args))
            return False
    elif args[1] in ('delete','show'):
        if len(args) == 3:
            return ext_cmd(attr_cmd%(args[0],args[2]))
        else:
            bad_attr_usage(cmd,' '.join(args))
            return False
    else:
        bad_attr_usage(cmd,' '.join(args))
        return False
def rsc2node(rsc):
    proc = subprocess.Popen(RscMgmt.rsc_showxml % rsc, \
        shell=True, stdout=subprocess.PIPE)
    outp = proc.communicate()[0]
    # skip until "raw xml:"
    # NB: depends on crm_resource output
    l = outp.split('\n')
    i = 0
    for s in l:
        i += 1
        if s.find("raw xml:") == 0:
            break
    s = '\n'.join(l[i:])
    try:
        doc = xml.dom.minidom.parseString(s)
    except xml.parsers.expat.ExpatError,msg:
        cib_parse_err(msg)
        common_info("in output from: %s" % (RscMgmt.rsc_showxml % rsc))
        return None
    return doc.childNodes[0]
def get_meta_param(id,param):
    proc = subprocess.Popen(RscMgmt.rsc_meta['show'] % (id,param), \
        shell=True, \
stdout=subprocess.PIPE, stderr=subprocess.PIPE) return proc.communicate()[0] def is_rsc_running(id): if not id: return False proc = subprocess.Popen(RscMgmt.rsc_status % id, \ shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE) outp = proc.communicate()[0] return outp.find("running") > 0 and outp.find("NOT") == -1 def is_rsc_clone(rsc_id): rsc_node = rsc2node(rsc_id) try: return rsc_node.tagName == "clone" except: return False def is_rsc_ms(rsc_id): rsc_node = rsc2node(rsc_id) try: return rsc_node.tagName == "master" except: return False def is_process(s): proc = subprocess.Popen("ps -e -o pid,command | grep -qs '%s'" % s, \ shell=True, stdout=subprocess.PIPE) proc.wait() return proc.returncode == 0 def cluster_stack(): if is_process("heartbeat:.[m]aster"): return "heartbeat" elif is_process("[a]isexec"): return "openais" return "" def get_cloned_rsc(rsc_id): rsc_node = rsc2node(rsc_id) if not rsc_node: return "" for c in rsc_node.childNodes: if is_child_rsc(c): return c.getAttribute("id") return "" def get_max_clone(id): v = get_meta_param(id,"clone-max") return v and int(v) or len(listnodes()) def cleanup_resource(rsc,node): if is_rsc_clone(rsc) or is_rsc_ms(rsc): base = get_cloned_rsc(rsc) if not base: return False clone_max = get_max_clone(rsc) rc = True for n in range(clone_max): if not ext_cmd(RscMgmt.rsc_cleanup % ("%s:%d" % (base,n), node)): rc = False else: rc = ext_cmd(RscMgmt.rsc_cleanup%(rsc,node)) return rc class RscMgmt(object): ''' Resources management class ''' rsc_status_all = "crm_resource -L" rsc_status = "crm_resource -W -r '%s'" rsc_showxml = "crm_resource -q -r '%s'" rsc_startstop = "crm_resource --meta -r '%s' -p target-role -v '%s'" rsc_manage = "crm_resource --meta -r '%s' -p is-managed -v '%s'" rsc_migrate = "crm_resource -M -r '%s'" rsc_migrateto = "crm_resource -M -r '%s' -H '%s'" rsc_unmigrate = "crm_resource -U -r '%s'" rsc_cleanup = "crm_resource -C -r '%s' -H '%s'" rsc_param = { 'set': "crm_resource -r '%s' -p '%s' -v '%s'", 
'delete': "crm_resource -r '%s' -d '%s'", 'show': "crm_resource -r '%s' -g '%s'", } rsc_meta = { 'set': "crm_resource --meta -r '%s' -p '%s' -v '%s'", 'delete': "crm_resource --meta -r '%s' -d '%s'", 'show': "crm_resource --meta -r '%s' -g '%s'", } rsc_failcount = { 'set': "crm_failcount -r '%s' -U '%s' -v '%s'", 'delete': "crm_failcount -r '%s' -U '%s' -D", 'show': "crm_failcount -r '%s' -U '%s' -G", } rsc_refresh = "crm_resource -R" rsc_refresh_node = "crm_resource -R -H '%s'" rsc_reprobe = "crm_resource -P" rsc_reprobe_node = "crm_resource -P -H '%s'" help_table = odict() help_table["."] = ("","Resource management.") help_table["status"] = ("show status of resources", "") help_table["start"] = ("start a resource", "") help_table["stop"] = ("stop a resource", "") help_table["manage"] = ("put a resource into managed mode", "") help_table["unmanage"] = ("put a resource into unmanaged mode", "") help_table["migrate"] = ("migrate a resource to another node", "") help_table["unmigrate"] = ("migrate a resource to its prefered node", "") help_table["param"] = ("manage a parameter of a resource",""" Manage or display a parameter of a resource (also known as an instance_attribute). Usage: param set param delete param show Example: param ip_0 show ip """) help_table["meta"] = ("manage a meta attribute",""" Show/edit/delete a meta attribute of a resource. Currently, all meta attributes of a resource may be managed with other commands such as 'resource stop'. Usage: meta set meta delete meta show Example: meta ip_0 set target-role stopped """) help_table["failcount"] = ("manage failcounts", """ Show/edit/delete the failcount of a resource. 
Usage: failcount set failcount delete failcount show Example: failcount fs_0 delete node2 """) help_table["cleanup"] = ("cleanup resource status","") help_table["refresh"] = ("refresh CIB from the LRM status","") help_table["reprobe"] = ("probe for resources not started by the CRM","") help_table["quit"] = ("exit the program", "") help_table["help"] = ("show help", "") help_table["end"] = ("go back one level", "") cmd_aliases = global_aliases.copy() cmd_aliases.update({ "status": ("show","list",), "migrate": ("move",), "unmigrate": ("unmove",), }) def __init__(self): self.cmd_table = { "status": (self.status,(0,1),0,(rsc_list,)), "start": (self.start,(1,1),0,(rsc_list,)), "stop": (self.stop,(1,1),0,(rsc_list,)), "manage": (self.manage,(1,1),0,(rsc_list,)), "unmanage": (self.unmanage,(1,1),0,(rsc_list,)), "migrate": (self.migrate,(1,2),0,(rsc_list,nodes_list)), "unmigrate": (self.unmigrate,(1,1),0,(rsc_list,)), "param": (self.param,(3,4),1,(rsc_list,attr_cmds)), "meta": (self.meta,(3,4),1,(rsc_list,attr_cmds)), "failcount": (self.failcount,(3,4),0,(rsc_list,attr_cmds,nodes_list)), "cleanup": (self.cleanup,(1,2),1,(rsc_list,nodes_list)), "refresh": (self.refresh,(0,1),0,(nodes_list,)), "reprobe": (self.reprobe,(0,1),0,(nodes_list,)), "help": (self.help,(0,1),0), "quit": (cmd_exit,(0,0),0), "end": (cmd_end,(0,1),0), } setup_aliases(self) def status(self,cmd,rsc = None): "usage: status []" if rsc: return ext_cmd(self.rsc_status % rsc) else: return ext_cmd(self.rsc_status_all) def start(self,cmd,rsc): "usage: start " return ext_cmd(self.rsc_startstop%(rsc,"Started")) def stop(self,cmd,rsc): "usage: stop " return ext_cmd(self.rsc_startstop%(rsc,"Stopped")) def manage(self,cmd,rsc): "usage: manage " return ext_cmd(self.rsc_manage%(rsc,"true")) def unmanage(self,cmd,rsc): "usage: unmanage " return ext_cmd(self.rsc_manage%(rsc,"false")) def migrate(self,cmd,*args): """usage: migrate []""" if len(args) == 1: return ext_cmd(self.rsc_migrate%args[0]) else: return 
ext_cmd(self.rsc_migrateto%(args[0],args[1])) def unmigrate(self,cmd,rsc): "usage: unmigrate " return ext_cmd(self.rsc_unmigrate%rsc) def cleanup(self,cmd,*args): "usage: cleanup []" # Cleanup a resource on a node. Omit node to cleanup on # all live nodes. if len(args) == 2: # remove return cleanup_resource(args[0],args[1]) else: rv = True for n in listnodes(): if not cleanup_resource(args[0],n): rv = False return rv def failcount(self,cmd,*args): """usage: failcount set failcount delete failcount show """ d = lambda: manage_attr(cmd,self.rsc_failcount,*args) return d() def param(self,cmd,*args): """usage: param set param delete param show """ d = lambda: manage_attr(cmd,self.rsc_param,*args) return d() def meta(self,cmd,*args): """usage: meta set meta delete meta show """ d = lambda: manage_attr(cmd,self.rsc_meta,*args) return d() def refresh(self,cmd,*args): 'usage: refresh []' if len(args) == 1: return ext_cmd(self.rsc_refresh_node%args[0]) else: return ext_cmd(self.rsc_refresh) def reprobe(self,cmd,*args): 'usage: reprobe []' if len(args) == 1: return ext_cmd(self.rsc_reprobe_node%args[0]) else: return ext_cmd(self.rsc_reprobe) def help(self,cmd,topic = ''): cmd_help(self.help_table,topic) def print_node(uname,id,type,other,inst_attr): """ Try to pretty print a node from the cib. 
Sth like: uname(id): type attr1: v1 attr2: v2 """ if uname == id: print "%s: %s" % (uname,type) else: print "%s(%s): %s" % (uname,id,type) for a in other: print "\t%s: %s" % (a,other[a]) for a,v in inst_attr: print "\t%s: %s" % (a,v) class NodeMgmt(object): ''' Nodes management class ''' node_standby = "crm_standby -U '%s' -v '%s'" node_delete = "cibadmin -D -o nodes -X ''" hb_delnode = "@libdir@/heartbeat/hb_delnode '%s'" dc = "crmadmin -D" node_attr = { 'set': "crm_attribute -t nodes -U '%s' -n '%s' -v '%s'", 'delete': "crm_attribute -D -t nodes -U '%s' -n '%s'", 'show': "crm_attribute -G -t nodes -U '%s' -n '%s'", } node_status = { 'set': "crm_attribute -t status -U '%s' -n '%s' -v '%s'", 'delete': "crm_attribute -D -t status -U '%s' -n '%s'", 'show': "crm_attribute -G -t status -U '%s' -n '%s'", } help_table = odict() help_table["."] = ("","Nodes management.") help_table["show"] = ("show node", "") help_table["standby"] = ("put node into standby", "") help_table["online"] = ("bring node online", "") help_table["delete"] = ("delete node", "") help_table["attribute"] = ("manage attributes", """ Edit node attributes. This kind of attribute should refer to relatively static properties, such as memory size. Usage: attribute set attribute delete attribute show Example: attribute node_1 set memory_size 4096 """) help_table["status-attr"] = ("manage status attributes", """ Edit node attributes which are in the CIB status section, i.e. attributes which hold properties of a more volatile nature. One typical example is attribute generated by the 'pingd' utility. Usage: ............... status-attr set status-attr delete status-attr show ............... Example: ............... 
status-attr node_1 show pingd
""")
    help_table["quit"] = ("exit the program", "")
    help_table["help"] = ("show help", "")
    help_table["end"] = ("go back one level", "")
    cmd_aliases = global_aliases.copy()
    cmd_aliases.update({
        "show": ("list",),
    })
    def __init__(self):
        self.cmd_table = {
            "status": (self.status,(0,1),0,(nodes_list,)),
            "show": (self.show,(0,1),0,(nodes_list,)),
            "standby": (self.standby,(1,1),0,(nodes_list,)),
            "online": (self.online,(1,1),0,(nodes_list,)),
            "delete": (self.delete,(1,1),0,(nodes_list,)),
            "attribute": (self.attribute,(3,4),0,(nodes_list,attr_cmds)),
            "status-attr": (self.status_attr,(3,4),0,(nodes_list,attr_cmds)),
            "help": (self.help,(0,1),0),
            "quit": (cmd_exit,(0,0),0),
            "end": (cmd_end,(0,1),0),
        }
        setup_aliases(self)
    def status(self,cmd,node = None):
        'usage: status [<node>]'
        return ext_cmd("%s -o nodes"%cib_dump)
    def show(self,cmd,node = None):
        'usage: show [<node>]'
        doc = xml2doc("%s -o nodes"%cib_dump)
        if not doc:
            return False
        nodes_node = get_conf_elem(doc, "nodes")
        if not nodes_node:
            return False
        for c in nodes_node.childNodes:
            if not is_element(c) or c.tagName != "node":
                continue
            if node and c.getAttribute("uname") != node:
                continue
            type = uname = id = ""
            other = {}
            inst_attr = []
            for attr in c.attributes.keys():
                v = c.getAttribute(attr)
                if attr == "type":
                    type = v
                elif attr == "uname":
                    uname = v
                elif attr == "id":
                    id = v
                else:
                    other[attr] = v
            for c2 in c.childNodes:
                if not is_element(c2):
                    continue
                if c2.tagName == "instance_attributes":
                    inst_attr = nvpairs2list(c2)
            print_node(uname,id,type,other,inst_attr)
    def standby(self,cmd,node = None):
        'usage: standby [<node>]'
        if not node:
            node = this_node
        return ext_cmd(self.node_standby%(node,"on"))
    def online(self,cmd,node = None):
        'usage: online [<node>]'
        if not node:
            node = this_node
        return ext_cmd(self.node_standby%(node,"off"))
    def delete(self,cmd,node):
        'usage: delete <node>'
        rv = True
        if cluster_stack() == "heartbeat":
            rv = ext_cmd(self.hb_delnode%node)
        if rv:
            rv = ext_cmd(self.node_delete%node)
        return rv
    def attribute(self,cmd,*args):
        """usage:
        attribute <node> set <attr> <value>
        attribute <node> delete <attr>
        attribute <node> show <attr>"""
        d = lambda: manage_attr(cmd,self.node_attr,*args)
        return d()
    def status_attr(self,cmd,*args):
        """usage:
        status-attr <node> set <attr> <value>
        status-attr <node> delete <attr>
        status-attr <node> show <attr>"""
        d = lambda: manage_attr(cmd,self.node_status,*args)
        return d()
    def help(self,cmd,topic = ''):
        cmd_help(self.help_table,topic)
def edit_file(fname):
    'Edit a file.'
    if not fname:
        return
    if not user_prefs.editor:
        return
    return os.system("%s %s" % (user_prefs.editor,fname))
def page_string(s):
    'Write string through a pager.'
    if not s:
        return
    if not user_prefs.pager or not interactive:
        print s
    else:
        pipe_string(user_prefs.pager,s)
def lines2cli(s):
    '''
    Convert a string into a list of lines. Replace continuation
    characters. Strip white space, left and right. Drop empty lines.
    '''
    cl = []
    l = s.split('\n')
    cum = []
    for p in l:
        p = p.strip()
        if p.endswith('\\'):
            p = p.rstrip('\\')
            cum.append(p)
        else:
            cum.append(p)
            cl.append(''.join(cum).strip())
            cum = []
    if cum: # in case s ends with backslash
        cl.append(''.join(cum))
    return [x for x in cl if x]
def get_winsize():
    try:
        import curses
        curses.setupterm()
        w = curses.tigetnum('cols')
        h = curses.tigetnum('lines')
    except:
        try:
            w = os.environ['COLS']
            h = os.environ['LINES']
        except:
            w = 80; h = 25
    return w,h
def multicolumn(l):
    '''
    A ls-like representation of a list of strings.
    A naive approach.
    '''
    min_gap = 2
    w,h = get_winsize()
    max_len = 0
    for s in l:
        if len(s) > max_len:
            max_len = len(s)
    cols = w/(max_len + min_gap) # approx.
    col_len = w/cols
    for i in range((len(l) + cols - 1)/cols):
        s = ''
        for j in range(i*cols,(i+1)*cols):
            if not j < len(l):
                break
            if not s:
                s = "%-*s" % (col_len,l[j])
            elif (j+1)%cols == 0:
                s = "%s%s" % (s,l[j])
            else:
                s = "%s%-*s" % (s,col_len,l[j])
        print s
class RA(object):
    '''
    Resource agents information class
    '''
    help_table = odict()
    help_table["."] = ("","""
Resource Agents (RA) lists and documentation.
""") help_table["classes"] = ("list classes and providers", """ Print all resource agents' classes and, where appropriate, a list of available providers. Usage: ............... classes ............... """) help_table["list"] = ("list RA for a class (and provider)", """ List available resource agents for the given class. If the class is `ocf`, supply a provider to get agents which are available only from that provider. Usage: ............... list [] ............... Example: ............... list ocf pacemaker ............... """) help_table["meta"] = ("show meta data for a RA", """ Show the meta-data of a resource agent type. This is where users can find information on how to use a resource agent. Usage: ............... meta [] ............... Example: ............... meta apache ocf meta ipmilan stonith ............... """) help_table["providers"] = ("show providers for a RA", """ List providers for a resource agent type. Usage: ............... providers ............... Example: ............... providers apache ............... 
""") help_table["quit"] = ("exit the program", "") help_table["help"] = ("show help", "") help_table["end"] = ("go back one level", "") provider_classes = ["ocf"] cmd_aliases = global_aliases def __init__(self): self.cmd_table = { "classes": (self.classes,(0,0),0), "list": (self.list,(1,2),1), "providers": (self.providers,(1,1),1), "meta": (self.meta,(2,3),1), "help": (self.help,(0,1),0), "quit": (cmd_exit,(0,0),0), "end": (cmd_end,(0,1),0), } setup_aliases(self) def classes(self,cmd): "usage: classes" for c in ra_classes(): if c in self.provider_classes: print "%s / %s" % (c,' '.join(ra_providers_all(c))) else: print "%s" % c def providers(self,cmd,ra_type): "usage: providers " print ' '.join(ra_providers(ra_type)) def list(self,cmd,c,p = None): "usage: list []" if not c in ra_classes(): common_err("class %s does not exist" % c) return False if p and not p in ra_providers_all(c): common_err("there is no provider %s for class %s" % (p,c)) return False multicolumn(ra_types(c,p)) def meta(self,cmd,ra_type,ra_class,ra_provider = "heartbeat"): "usage: meta []" ra = RAInfo(ra_class,ra_type,ra_provider) try: page_string(ra.meta_pretty()) except: return False def help(self,cmd,topic = ''): cmd_help(self.help_table,topic) class CibConfig(object): ''' The configuration class ''' help_table = odict() help_table["."] = ("","CIB configuration.") help_table["verify"] = ("verify the CIB before commit", "Verify the CIB (before commit)") help_table["erase"] = ("erase the CIB", "Erase the CIB (careful!).") help_table["ptest"] = ("show cluster actions if changes were committed", """ Show PE (policy engine) motions using ptest. A CIB is constructed using the current user edited version and the status from the current CIB. This CIB is run through ptest to show changes. If you have graphviz installed and X11 session, dotty is run to display the changes graphically. """) help_table["refresh"] = ("refresh from CIB", """ Load the CIB from scratch. All changes are lost. 
The user is asked for confirmation beforehand """) help_table["show"] = ("display CIB objects", """ The `show` command displays objects. It may display all objects or a set of objects. Specify 'changed' to see what changed. Usage: ............... show [xml] [ ...] show [xml] changed ............... """) help_table["edit"] = ("edit CIB objects", """ This command invokes the editor with the object description. As with the `show` command, the user may choose to edit all objects or a set of objects. If the user insists, he or she may edit the XML edition of the object. Usage: ............... edit [xml] [ ...] edit [xml] changed ............... """) help_table["delete"] = ("delete CIB objects", """ The user may delete one or more objects by specifying a list of ids. If the object to be deleted belongs to a container object, such as group, and it is the only resource in that container, then the container is deleted as well. Usage: ............... delete [ ...] ............... """) help_table["rename"] = ("rename a CIB object", """ Rename an object. It is recommended to use this command to rename a resource, because it will take care of updating all related constraints. Changing ids with the edit command won't have the same effect. If you want to rename a resource, it must be stopped. Usage: ............... rename ............... """) help_table["save"] = ("save the CIB to a file", """ Save the configuration to a file. Optionally, as XML. Usage: ............... save [xml] ............... Example: ............... save myfirstcib.txt ............... """) help_table["load"] = ("import the CIB from a file", """ Load a part of configuration (or all of it) from a local file or a network URL. The `replace` method replaces the current configuration with the one from the source. The `update` tries to import the contents into the current configuration. The file may be a CLI file or an XML file. Usage: ............... load [xml] method URL method :: replace | update ............... 
Example: ............... load xml replace myfirstcib.xml load xml replace http://storage.big.com/cibs/bigcib.xml ............... """) help_table["template"] = ("edit and import a configuration from a template", """ The specified template is loaded into the editor. It's up to the user to make a good CRM configuration out of it. Usage: ............... template [xml] url ............... Example: ............... template two-apaches.txt ............... """) help_table["commit"] = ("commit the changes to the CIB", """ The changes at the configure level are not immediately applied to the CIB, but by this command or on exiting the configure level. Sometimes, the program will refuse to apply the changes, usually for good reason. If you know what you're doing, you may say 'commit force' to force the changes. """) help_table["upgrade"] = ("upgrade the CIB to version 1.0", """ If you get the "CIB not supported" error, which typically means that the current CIB version is coming from the older release, you may try to upgrade it to the latest revision. The command used is: cibadmin --upgrade --force If we don't recognize the current CIB as the old one, but you're sure that it is, you may force the command. Usage: ............... upgrade [force] ............... """) help_table["node"] = ("define a cluster node", """ The node command describes a cluster node. Nodes in the CIB are commonly created automatically by the CRM. Hence, you should not need to do this yourself unless you also want to define node attributes. Note that it is also possible to manage node attributes at the `node` level. Usage: ............... node [:] [attributes = [=...]] type :: normal | member | ping ............... Example: ............... node node1 node big_node attributes memory=64 ............... """) help_table["primitive"] = ("define a resource", """ The primitive command describes a resource. Usage: ............... 
primitive [:[:]] [params = [=...]] [meta = [=...]] [operations id_spec [op op_type [=...] ...]] id_spec :: $id= | $id-ref= op_type :: start | stop | monitor ............... Example: ............... primitive apcfence stonith:apcsmart \\ params ttydev=/dev/ttyS0 hostlist="node1 node2" \\ op start timeout=60s \\ op monitor interval=30m timeout=60s primitive www8 apache \\ params configfile=/etc/apache/www8.conf \\ operations $id-ref=apache_ops ............... """) help_table["group"] = ("define a group", """ The `group` command creates a group of resources. Usage: ............... group [...] [params = [=...]] [meta = [=...]] ............... Example: ............... group internal_www disk0 fs0 internal_ip apache \\ meta target-role=stopped ............... """) help_table["clone"] = ("define a clone", """ The `clone` command creates a resource clone. It may contain a single primitive resource or one group of resources. Usage: ............... clone [params = [=...]] [meta = [=...]] ............... Example: ............... clone cl_fence apc_1 \\ meta clone-node-max=1 globally-unique=false ............... """) help_table["ms"] = ("define a master-slave resource", """ The `ms` command creates a master/slave resource type. It may contain a single primitive resource or one group of resources. Usage: ............... ms [params = [=...]] [meta = [=...]] ............... Example: ............... ms disk1 drbd1 \\ meta notify=true globally-unique=false ............... """) help_table["location"] = ("a location preference", """ `location` defines the preference of nodes for the given resource. The location constraints consist of one or more rules which specify a score to be awarded if the rule matches. Usage: ............... location {node_pref|rules} node_pref :: : rules :: rule [id_spec] [$role=] : [rule [id_spec] [$role=] : ...] id_spec :: $id= | $id-ref= score :: | | [-]inf expression :: [bool_op ...] 
| bool_op :: or | and single_exp :: [type:] | type :: string | version | number binary_op :: lt | gt | lte | gte | eq | ne unary_op :: defined | not_defined date_expr :: date_op [] (TBD) ............... Examples: ............... location conn_1 internal_www 100: node1 location conn_1 internal_www \\ rule 100: #uname eq node1 \\ rule pingd: defined pingd location conn_2 dummy_float \\ rule -inf: not_defined pingd or pingd lte 0 ............... """) help_table["colocation"] = ("colocate resources", """ This constraint expresses the placement relation between two resources. Usage: ............... colocation : [:] [:] ............... Example: ............... colocation dummy_and_apache -inf: apache dummy ............... """) help_table["order"] = ("order resources", """ This constraint expresses the order of actions on two resources. Usage: ............... order score-type: [:] [:] [symmetrical=] score-type :: advisory | mandatory | ............... Example: ............... order c_apache_1 mandatory: apache:start ip_1 ............... """) help_table["property"] = ("set a cluster property", """ Set the cluster (`crm_config`) options. Usage: ............... property [$id=]