diff --git a/doc/README.hb2openais b/doc/README.hb2openais
index 3817dc24fa..9470e56ded 100644
--- a/doc/README.hb2openais
+++ b/doc/README.hb2openais
@@ -1,275 +1,281 @@
-Heartbeat to OpenAIS cluster stack conversion
+Heartbeat to Corosync/OpenAIS cluster stack conversion
=============================================

Please read this description entirely before converting to
-OpenAIS. Every possible precaution was taken to preclude
+Corosync/OpenAIS. Every possible precaution was taken to preclude
problems. Still, you should run the conversion only once you
have understood all the steps and the consequences. You need to
know your cluster in detail.

The conversion program will inform you about changes it makes.
It is up to you to verify that the changes are meaningful.

Testing the conversion
----------------------

It is possible (and highly recommended) to test the conversion
with your heartbeat configuration without making any changes.
This way you will get acquainted with the process and make sure
that the conversion is done properly.

Create a test directory and copy ha.cf, logd.cf, cib.xml, and
hostcache to it:

$ mkdir /tmp/hb2openais-testdir
$ cp /etc/ha.d/ha.cf /tmp/hb2openais-testdir
$ cp /var/lib/heartbeat/hostcache /tmp/hb2openais-testdir
$ cp /etc/logd.cf /tmp/hb2openais-testdir
$ sudo cp /var/lib/heartbeat/crm/cib.xml /tmp/hb2openais-testdir

Run the test conversion:

$ /usr/lib/heartbeat/hb2openais.sh -T /tmp/hb2openais-testdir

+or
+
+$ /usr/lib/heartbeat/hb2openais.sh -C -T /tmp/hb2openais-testdir
+
+to produce corosync.conf.
+
Here is the script's usage:

usage: hb2openais.sh [-UF] [-u user] [-T directory] [revert]

    -U: skip upgrade the CIB to v1.0
    -F: force conversion despite it being done beforehand
    -u user: a user to sudo with (otherwise, you'd have to run
        this as root)
+    -C: force conversion to corosync (default is openais)
    -T directory: a directory containing ha.cf/logd.cf/cib.xml/hostcache
        (use for testing); with this option files are not copied
        to other nodes and there are no destructive commands
        executed; you may run as unprivileged uid

Note: You can run the test as many times as you want on the same
test directory. Copy files just once.

Note: The directory where hb2openais.sh resides may be
different, e.g. /usr/lib64/heartbeat.

-Read and verify the resulting openais.conf and cib-out.xml:
+Read and verify the resulting corosync.conf/openais.conf and
+cib-out.xml:

$ cd /tmp/hb2openais-testdir
$ less openais.conf
$ crm_verify -V -x cib-out.xml

The conversion proceeds in several stages:

-1. Generate openais.conf from ha.cf.
+1. Generate corosync.conf or openais.conf from ha.cf.
2. Rename node ids.
3. Upgrade of the CIB to Pacemaker v1.0 (optional)
4. Addition of pingd resource.
5. Conversion of ocfs2 filesystem.
6. Conversion of EVMS2 CSM containers to cLVM2 volumes.
7. Replacement of EVMS2 with clvmd.

-Conversion from the Heartbeat to OpenAIS cluster stack is
-implemented in hb2openais.sh which is part of the pacemaker
-package.
+Conversion from the Heartbeat to the Corosync/OpenAIS cluster
+stack is implemented in hb2openais.sh, which is part of the
+pacemaker package.

Prerequisites
-------------

/etc/ha.d/ha.cf must be equal on all nodes.

/var/lib/heartbeat/crm/cib.xml must be equal on all nodes. This
-should have been enforced by the CRM and users should refrain
-from making manual changes there.
+is enforced by the CRM and users should refrain from making
+manual changes there.

The ocfs2 filesystems must not be mounted.

sshd running on all nodes with access allowed for root.
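It may be worth confirming these prerequisites before running
the conversion. A minimal sketch, assuming two nodes named node1
and node2 and working root ssh access (adjust the names to your
cluster), compares checksums of the relevant files:

$ for n in node1 node2; do
>     ssh root@$n "md5sum /etc/ha.d/ha.cf /var/lib/heartbeat/crm/cib.xml"
> done

The checksum reported for each file should be identical on all
nodes.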
The conversion process
----------------------

This procedure is supposed to be run on one node only.

Although the main cluster configuration (the CIB) is
automatically replicated, there are some files which have to be
copied by other means. For that to work, we need sshd running on
all nodes and root access working.

For some operations root privileges are required. Either run
this script as the root user or, if you have a working sudo
setup, specify the privileged user (normally root) using the -u
option:

-# /usr/lib/heartbeat/hb2openais.sh -u root
+$ /usr/lib/heartbeat/hb2openais.sh -u root

NB: Do not run this procedure on more than one node!

-1. Generate openais.conf from ha.cf.
+1. Generate corosync.conf or openais.conf from ha.cf.

-/etc/ha.d/ha.cf is parsed and /etc/ais/openais.conf
-correspondingly generated.
+/etc/ha.d/ha.cf is parsed and /etc/ais/openais.conf or
+/etc/corosync/corosync.conf generated accordingly.

Whereas heartbeat supports several different communication
-types (broadcast, unicast, multicast), OpenAIS uses only
+types (broadcast, unicast, multicast), Corosync/OpenAIS uses only
multicasting. The conversion tries to create equivalent
media, but with some network configurations it may produce
wrong results. Pay particular attention to the "interface"
-sub-directive of the "totem" directive. The openais.conf(5) man
-page is the reference documentation.
+sub-directive of the "totem" directive. The openais.conf(5) or
+corosync.conf(5) man page is the reference documentation.

Make sure that your network supports IP multicasts.

-OpenAIS does not support serial communication links.
+Corosync/OpenAIS does not support serial communication links.

-In addition, an OpenAIS authentication key is generated.
+In addition, a Corosync/OpenAIS authentication key is generated
+if authentication has been used in Heartbeat. Corosync key
+generation may take some time while corosync-keygen gathers
+enough entropy for the key.
+
+NB: corosync.conf is created with compatibility set to whitetank.

2. Rename node ids.

-Since the nodes UUID are generated by OpenAIS in a different
+Since the node UUIDs are generated by Corosync/OpenAIS in a different
manner, the id fields of nodes must be renamed to the node
uname.

3. Upgrade of the CIB to Pacemaker v1.0 (optional)

The CIB has changed significantly since heartbeat versions up to
and including 2.1.4 and the previous pacemaker stable version
0.6. The new CRM in pacemaker still supports the old CIB, but it
is recommended to convert to the new
-version. You may do so by passing the -U option to the
-hb2openais.sh program. If this option is not specified, the
-program will still ask if you want to upgrade the CIB to the new
-version.
+version. The hb2openais.sh program performs the upgrade by
+default. This may be skipped by specifying the -U option.

If you don't convert to the new CIB version, the new crm shell
and configuration tool will not work.

4. Addition of pingd resource.

In heartbeat the pingd daemon could be controlled by the
heartbeat itself through the respawn ha.cf directive. Obviously,
this is no longer possible, so a pingd resource has to be created
in the CIB. Furthermore, hosts from the "ping" directives (the
"ping" nodes) are inserted into the "host_list" pingd resource
attribute.
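The exact resource definition is produced by the helper script.
As a rough illustration only, an equivalent pingd clone
expressed in crm shell syntax could look like this (the
host_list addresses and operation timings are placeholders, not
values taken from your ha.cf):

primitive pingd ocf:pacemaker:pingd \
        params host_list="10.0.0.1 10.0.0.254" multiplier="100" \
        op monitor interval="15s" timeout="20s"
clone pingd-clone pingd \
        meta globally-unique="false"

After the conversion, crm configure show displays the resource
that was actually generated.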
5. Conversion of ocfs2 filesystem.

The ocfs2 filesystem is closely related to the cluster stack
used. It must be converted if the stack is changed. The
-conversion script will do this automatically for you. Note that
-for this step it will start the cluster stack. The conversion is
-performed by the tunefs.ocfs2 program:
+conversion script will do this automatically for you. For this
+step it will start the cluster stack. The conversion is performed
+by the tunefs.ocfs2 program:

tunefs.ocfs2 --update-cluster-stack

For more details on ocfs2 conversion refer to the ocfs2
documentation.

Skip the following two items in case you don't have EVMS2 CSM
containers.

6. Conversion of EVMS2 CSM containers to cLVM2 volumes.

All EVMS2 CSM containers found on the system are converted by
csm-converter (see README.csm-converter for more details). For
volume groups referenced in existing resources in the CIB
(/dev/evms//lvm2//), new LVM resources are created. Order and
collocation constraints are created between the existing
resources and the new LVM resources to ensure proper start/stop
order and resource placement.

7. Non-LVM EVMS2

Skip this in case you don't have EVMS2 resources.

It is not possible to deal with this on a SLE11 system, so you
should convert it to a compatibility volume on SLES10, which
would turn it into an LVM2 volume group. The CIB should then be
modified accordingly.

-Note on logging
----------------
-
-The CRM still does not share the logging setup with the OpenAIS,
-i.e. it does not read the logging stanza from openais.conf. This
-will be rectified in future, but in the meantime the logging
-configuration has to be replicated in /etc/sysconfig/pacemaker,
-for instance:
-
-USE_LOGD=yes
-SYSLOG_FACILITY=local7
-
Enforcing conversion
--------------------

There is a simple mechanism which prevents running the
conversion process twice in a row. If you know what you are
doing, it is possible to force the conversion using the -F
option.

After the conversion
--------------------

Once the conversion has been finished, you may start the new
cluster stack:

-# /etc/init.d/ais start
+# /etc/init.d/openais start (for SLE11 HAE >=SP1 too)
+
+or
+
+# /etc/init.d/corosync start

Put resources back to the managed mode in case they were
previously unmanaged.
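If you had placed resources into unmanaged mode before the
conversion, the crm shell can return them to managed mode; the
resource id below is a placeholder:

# crm resource manage <resource-id>

The managed/unmanaged state of each resource is visible in the
output of crm_mon -1.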
-TODO: What happens to the tunefs.ocfs2 process? We should know
-when it's done and stop the cluster stack.
-
Backup
------

The conversion procedure also creates a backup of all affected
files. It is possible to revert to the version from the time of
backup:

# /usr/lib/heartbeat/hb2openais.sh revert

Note that the revert process is executed only on the node on
which the conversion took place.

-TODO: Check effect of hb_uuid files removal on other nodes! They
-have to be regenerated and will be different from the nodes
-section. Perhaps backup/revert should take place on all nodes.
+NB: The hostcache and hb_uuid files (in /var/lib/heartbeat) are
+not removed. They are not used by Corosync/OpenAIS, so once you
+are satisfied with the conversion you may safely remove them.

Affected files
--------------

All file processing is done on the node where the conversion
runs. The CIB is the only file which is converted:

/var/lib/heartbeat/crm/cib.xml

The CIB is removed on all other nodes.

The following files are generated:

/etc/ais/openais.conf
/etc/ais/authkey

+or
+
+/etc/corosync/corosync.conf
+/etc/corosync/authkey
+
The following files are removed on all nodes:

/var/lib/heartbeat/crm/cib.xml.sig
/var/lib/heartbeat/crm/cib.xml.last
/var/lib/heartbeat/crm/cib.xml.sig.last
/var/lib/heartbeat/hostcache
/var/lib/heartbeat/hb_uuid

-The OpenAIS specific files are copied to all nodes using ssh.
+The Corosync/OpenAIS specific files are copied to all nodes using ssh.
The CIB is automatically replicated by the CRM and it is not
copied to other nodes.

References
----------

-Configuration_Explained.pdf
+http://www.clusterlabs.org/doc/en-US/Pacemaker/1.0/html/Pacemaker_Explained
openais.conf(5)
+corosync.conf(5)
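For orientation, the totem settings written by the conversion
typically have the following shape. This is only a sketch: the
bind and multicast addresses are placeholders (the real values
are derived from your ha.cf), the port matches the script's
default of 5405, secauth is enabled only if Heartbeat
authentication was in use, and the compatibility line
corresponds to the whitetank note above (see openais.conf(5) or
corosync.conf(5) for the full syntax):

compatibility: whitetank

totem {
        version: 2
        secauth: on
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.1.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }
}

Check bindnetaddr and mcastaddr against your actual network
setup before starting the new stack.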
diff --git a/tools/hb2openais.sh.in b/tools/hb2openais.sh.in
index 380da2f14b..7e90776630 100755
--- a/tools/hb2openais.sh.in
+++ b/tools/hb2openais.sh.in
@@ -1,771 +1,804 @@
#!/bin/sh
# Copyright (C) 2008,2009 Dejan Muhamedagic
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public
# License as published by the Free Software Foundation; either
# version 2.1 of the License, or (at your option) any later version.
#
# This software is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
#
# You should have received a copy of the GNU General Public
# License along with this library; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
#

. @sysconfdir@/ha.d/shellfuncs

-. $HA_NOARCHBIN/utillib.sh
-. $HA_NOARCHBIN/ha_cf_support.sh
+# utillib.sh moved (sigh!)
+# cluster-glue doesn't make its shared data dir available,
+# so we guess (and assume that this is safe) that the datadir is the same
+testdirs="@datadir@/cluster-glue $HA_NOARCHBIN"
+for d in $testdirs; do
+    if [ -f $d/utillib.sh ]; then
+        NOARCH_DIR=$d
+        break
+    fi
+done
+test -f $NOARCH_DIR/utillib.sh || {
+    echo "sorry, could not find utillib.sh in $testdirs"
+    exit 1
+}
+
+. $NOARCH_DIR/utillib.sh
+. $NOARCH_DIR/ha_cf_support.sh

PROG=`basename $0`
-# FIXME: once this is part of the package!
PROGDIR=`dirname $0`
-echo "$PROGDIR" | grep -qs '^/' || {
-    test -f @sbindir@/$PROG &&
-        PROGDIR=@sbindir@
-    test -f $HA_NOARCHBIN/$PROG &&
-        PROGDIR=$HA_NOARCHBIN
-}

# the default syslog facility is not (yet) exported by heartbeat
# to shell scripts
#
DEFAULT_HA_LOGFACILITY="daemon"
export DEFAULT_HA_LOGFACILITY

-AIS_CONF=/etc/ais/openais.conf
-AIS_KEYF=/etc/ais/authkey
-AUTHENTICATION=on
-MAXINTERFACE=2
-MCASTPORT=5405
-RRP_MODE=active
-SUPPORTED_RESPAWNS="pingd evmsd"
-
-PY_HELPER=$HA_BIN/hb2openais-helper.py
-CRM_VARLIB=$HA_VARLIB/crm
-CIB=$CRM_VARLIB/cib.xml
-CIBSIG=$CRM_VARLIB/cib.xml.sig
-CIBLAST=$CRM_VARLIB/cib.xml.last
-CIBLAST_SIG=$CRM_VARLIB/cib.xml.sig.last
-HOSTCACHE=$HA_VARLIB/hostcache
-HB_UUID=$HA_VARLIB/hb_uuid
-DONE_F=$HA_VARRUN/heartbeat/.$PROG.conv_done
-BACKUPDIR=/var/tmp/`basename $PROG .sh`.backup
-RM_FILES=" $CIBSIG $HOSTCACHE $HB_UUID $CIBLAST $CIBLAST_SIG"
-REMOTE_RM_FILES=" $CIB $RM_FILES"
-BACKUP_FILES=" $AIS_CONF $AIS_KEYF $REMOTE_RM_FILES "
-DIST_FILES=" $AIS_CONF $AIS_KEYF $DONE_F "
-MAN_TARF=/var/tmp/`basename $PROG .sh`.tar.gz

: ${SSH_OPTS="-T"}

usage() {
    cat<
...
/dev/null
    else
        ssh -T -o Batchmode=yes $1 true 2>/dev/null
    fi
}

findsshuser() {
    for u in "" $TRY_SSH; do
        rc=0
        for n in `getnodes`; do
            [ "$n" = "$WE" ] && continue
            testsshuser $n $u || {
                rc=1
                break
            }
        done
        if [ $rc -eq 0 ]; then
            echo $u
            return 0
        fi
    done
    return 1
}

important() {
    echo "IMPORTANT: $*" >&2
}

newportinfo() {
    important "the multicast port number on $1 is set to $2"
    important "please update your firewall rules (if any)"
}

changemediainfo() {
-    important "openais uses multicast for communication"
+    important "$PRODUCT uses multicast for communication"
    important "please make sure that your network infrastructure supports it"
}

multicastinfo() {
-    info "multicast for openais ring $1 set to $2:$3"
+    info "multicast for $PRODUCT ring $1 set to $2:$3"
}

netaddrinfo() {
-    info "network address for openais ring $1 set to $2"
+    info "network address for $PRODUCT ring $1 set to $2"
}

backup_files() {
    [ "$TEST_DIR" ] && return
    info "backing up $BACKUP_FILES to $BACKUPDIR"
    $DRY mkdir $BACKUPDIR || {
        echo sorry, could not create $BACKUPDIR directory
        echo please cleanup
        exit 1
    }
    if [ -z "$DRY" ]; then
        tar cf - $BACKUP_FILES | gzip > $BACKUPDIR/$WE.tar.gz || {
            echo sorry, could not create $BACKUPDIR/$WE.tar.gz
            exit 1
        }
    else
        $DRY "tar cf - $BACKUP_FILES | gzip > $BACKUPDIR/$WE.tar.gz"
    fi
}

revert() {
    [ "$TEST_DIR" ] && return
    test -d $BACKUPDIR || {
        echo sorry, there is no $BACKUPDIR directory
        echo cannot revert
        exit 1
    }
    info "restoring $BACKUP_FILES from $BACKUPDIR/$WE.tar.gz"
    gzip -dc $BACKUPDIR/$WE.tar.gz | (cd / && tar xf -) || {
        echo sorry, could not unpack $BACKUPDIR/$WE.tar.gz
        exit 1
    }
}

pls_press_enter() {
    [ "$TEST_DIR" ] && return
    cat<
...
/dev/null | prochbmedia 2>/dev/null | sort -u | wc -l`
if [ $mediacnt -ge 2 ]; then
    setvalue rrp_mode $RRP_MODE
fi
changemediainfo
endstanza
# the logging stanza
getlogvars
# enforce some syslog facility
+[ "$COROSYNC" ] &&
+    TO_FILE=to_logfile ||
+    TO_FILE=to_file
debugsetting=`setdebug`
newstanza logging
setvalue debug $debugsetting
setvalue fileline off
setvalue to_stderr no
+setvalue timestamp off
if [ "$HA_LOGFILE" ]; then
-    setvalue to_file yes
+    setvalue $TO_FILE yes
    setvalue logfile $HA_LOGFILE
else
-    setvalue to_file no
+    setvalue $TO_FILE no
fi
if [ "$HA_LOGFACILITY" ]; then
    setvalue to_syslog yes
    setvalue syslog_facility $HA_LOGFACILITY
else
    setvalue to_syslog no
fi
+newstanza logger_subsys
+setvalue subsys AMF
+setvalue debug $debugsetting
+endstanza
endstanza
newstanza amf
setvalue mode disabled
endstanza
}
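Taken together, the setvalue calls above produce a logging
stanza roughly like the following, shown here in corosync.conf
form where the directive is to_logfile (openais.conf uses
to_file instead). The logfile path and syslog facility are only
examples; the real values come from the converted ha.cf/logd.cf
settings:

logging {
        debug: off
        fileline: off
        to_stderr: no
        timestamp: off
        to_logfile: yes
        logfile: /var/log/ha-log
        to_syslog: yes
        syslog_facility: daemon
        logger_subsys {
                subsys: AMF
                debug: off
        }
}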
if [ -z "$DRY" ]; then
    openaisconf > $AIS_CONF ||
        fatal "cannot create $AIS_CONF"
    grep -wqs interface $AIS_CONF ||
        fatal "no media found in $HA_CF"
else
    openaisconf
fi

[ "$AIS_KEYF" ] && {
    info "Generating a key for OpenAIS authentication ..."
    if [ "$TEST_DIR" ]; then
-        echo would run: $DRY ais-keygen
+        echo would run: $DRY $KEYGEN_PROG
    else
-        $DRY ais-keygen ||
-            fatal "cannot generate the key using ais-keygen"
+        $DRY $KEYGEN_PROG ||
+            fatal "cannot generate the key using $KEYGEN_PROG"
    fi
}

# remove various files which could get in the way
if [ -z "$TEST_DIR" ]; then
    $DRY rm -f $RM_FILES
fi

fixcibperms() {
    [ "$TEST_DIR" ] && return
    uid=`ls -ldn $CRM_VARLIB | awk '{print $3}'`
    gid=`ls -ldn $CRM_VARLIB | awk '{print $4}'`
    $DRY $MYSUDO chown $uid:$gid $CIB
}

upgrade_cib() {
    $DRY $MYSUDO cibadmin --upgrade --force
    $DRY $MYSUDO crm_verify -V -x $CIB_file
}

py_proc_cib() {
    tmpfile=`maketempfile`
    $MYSUDO sh -c "python $PY_HELPER $* <$CIB >$tmpfile" ||
        fatal "cannot process cib: $PY_HELPER $*"
    $DRY $MYSUDO mv $tmpfile $CIB
}

set_property() {
    py_proc_cib set_property $*
}

# remove the nodes section from the CIB
py_proc_cib set_node_ids
info "Edited the nodes' ids in the CIB"

numnodes=`getnodes | wc -w`
[ $numnodes -eq 2 ] &&
    set_property no-quorum-policy ignore
set_property expected-nodes $numnodes overwrite

-info "Done converting ha.cf to openais.conf"
+info "Done converting ha.cf to $AIS_CONF_BASE"
important "Please check the resulting $AIS_CONF"
important "and in particular interface stanzas and logging."
important "If you find problems, please edit $AIS_CONF now!"

#
# first part done (openais), on to the CIB

analyze_cib() {
    info "Analyzing the CIB..."
    $MYSUDO sh -c "python $PY_HELPER analyze_cib <$CIB"
}

check_respawns() {
    rc=1
    for p in $SUPPORTED_RESPAWNS; do
        grep -qs "^respawn.*$p" $HA_CF && {
            info "a $p resource has to be created"
            rc=0
        }
    done
    return $rc
}

part2() {
    intro_part2 || return 0
    opts="-c $HA_CF"
    [ "$TEST_DIR" ] && opts="-T $opts"
    py_proc_cib $opts convert_cib
    info "Processed the CIB successfully"
}

# make the user believe that something's happening :)
some_dots_idle() {
    [ "$TEST_DIR" ] && return
    cnt=0
    printf "$2 ."
    while [ $cnt -lt $1 ]; do
        sleep 1
        printf "."
        cnt=$((cnt+1))
    done
    echo
}

print_dc() {
    crm_mon -1 | awk '/Current DC/{print $3}'
}

dcidle() {
    dc=`$MYSUDO print_dc`
    if [ "$dc" = "$WE" ]; then
        maxcnt=60
        cnt=0
        while [ $cnt -lt $maxcnt ]; do
            stat=`$MYSUDO crmadmin -S $dc`
            echo $stat | grep -qs S_IDLE && break
            [ "$1" = "-v" ] && echo $stat
            sleep 1
            printf "."
            cnt=$((cnt+1))
        done
        echo $stat | grep -qs S_IDLE
    else
        some_dots_idle 10 #just wait for 10 seconds
    fi
}

wait_crm() {
    [ "$TEST_DIR" ] && return
    cnt=10
    dc=""
    while [ -z "$dc" -a $cnt -gt 0 ]; do
        dc=`$MYSUDO print_dc`
        cnt=$((cnt-1))
    done
    if [ x = x"$dc" ]; then
        echo "sorry, no dc found/elected"
        exit 1
    fi
    dcidle
}

manage_cluster() {
    if [ "$TEST_DIR" ]; then
        echo would run: /etc/init.d/openais $1
    else
        $DRY /etc/init.d/openais $1
    fi
}

tune_ocfs2() {
    cat<
...
 $MAN_TARF)
fi
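Once the new stack is up, the converted configuration and the
cluster state can be sanity-checked with the standard Pacemaker
tools, for example:

# crm_verify -LV
# crm_mon -1

crm_verify -LV checks the live CIB for configuration errors and
crm_mon -1 prints a one-shot summary of nodes and resources.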