subcommand shows a list of nodes known to cman. The state is one of the following:
.br
M The node is a member of the cluster
.br
X The node is not a member of the cluster
.br
d The node is known to the cluster but disallowed access to it.
.br
.SH ENVIRONMENT VARIABLES
cman_tool removes most environment variables before forking and running OpenAIS, and adds some of its own to pass on
configuration parameters that were overridden on the command line. The exception is that variables with names starting
COROSYNC_ are passed down intact, as they are assumed to be used for configuring the daemon.
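.P
The filtering rule above can be illustrated with a short Python sketch. It is not taken from the cman_tool source. It simplifies "most" to "everything except COROSYNC_ variables", and the names filter_environment, COROSYNC_EXAMPLE and EXAMPLE_OVERRIDE are made up for the illustration.
.nf
```python
def filter_environment(env, added=None):
    """Return the environment a child daemon would see under the rule above."""
    # Keep only variables whose names start with COROSYNC_ (simplified rule).
    kept = {name: value for name, value in env.items()
            if name.startswith("COROSYNC_")}
    # Add variables that carry command-line overrides (hypothetical names).
    kept.update(added or {})
    return kept


if __name__ == "__main__":
    parent = {"PATH": "/usr/bin", "HOME": "/root", "COROSYNC_EXAMPLE": "1"}
    child = filter_environment(parent, {"EXAMPLE_OVERRIDE": "5405"})
    print(sorted(child))
```
.fi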
.SH DISALLOWED NODES
Occasionally (but very infrequently I hope) you may see nodes marked as "Disallowed" in cman_tool status or "d" in cman_tool nodes. This is a bit of a nasty hack to get around a mismatch between what the upper layers expect of the cluster manager and what OpenAIS provides.
.P
If a node experiences a momentary lack of connectivity, but one that is long enough to trigger the token timeouts, then it will be removed from the cluster. When connectivity is restored OpenAIS will happily let it rejoin the cluster with no fuss. Sadly the upper layers don't like this very much. They may (indeed probably will) have changed their internal state while the other node was away, and there is no straightforward way to bring the rejoined node up-to-date with that state. When this happens the node is marked "Disallowed" and is not permitted to take part in cman operations.
.P
If the remainder of the cluster is quorate then the node will be sent a kill message and it will be forced to leave the cluster that way. Note that fencing should kick in to remove the node permanently anyway, but it may take longer than the network outage for this to complete.
.P
If the remainder of the cluster is inquorate then we have a problem. The likelihood is that we will have two (or more) partitioned clusters and we cannot decide which is the "right" one. In this case we need to defer to the system administrator to kill an appropriate selection of nodes to restore the cluster to sensible operation.
.P
The latter scenario should be very rare and may indicate a bug somewhere in the code. If the local network is very flaky or busy it may be necessary to increase some of the protocol timeouts for OpenAIS. We are trying to think of better solutions to this problem.
.P
Recovering from this state can, unfortunately, be complicated. Fortunately, in the majority of cases, fencing will do the job for you, and the disallowed state will only be temporary. If it persists, the recommended approach is to run cman_tool nodes on all systems in the cluster and determine the largest common subset of nodes that are valid members to each other. Then reboot the others and let them rejoin correctly. In the case of a single-node disconnection this should be straightforward; with a large cluster that has experienced a network partition it could get very complicated!
.P
Example:
.P
In this example we have a five-node cluster that has experienced a network partition. Here is the output of cman_tool nodes from all systems:
.nf
Node  Sts   Inc   Joined               Name
   1   M   2372   2007-11-05 02:58:55  node-01.example.com
   2   d   2376   2007-11-05 02:58:56  node-02.example.com
   3   d   2376   2007-11-05 02:58:56  node-03.example.com
   4   M   2376   2007-11-05 02:58:56  node-04.example.com
   5   M   2376   2007-11-05 02:58:56  node-05.example.com

Node  Sts   Inc   Joined               Name
   1   d   2372   2007-11-05 02:58:55  node-01.example.com
   2   M   2376   2007-11-05 02:58:56  node-02.example.com
   3   M   2376   2007-11-05 02:58:56  node-03.example.com
   4   d   2376   2007-11-05 02:58:56  node-04.example.com
   5   d   2376   2007-11-05 02:58:56  node-05.example.com

Node  Sts   Inc   Joined               Name
   1   d   2372   2007-11-05 02:58:55  node-01.example.com
   2   M   2376   2007-11-05 02:58:56  node-02.example.com
   3   M   2376   2007-11-05 02:58:56  node-03.example.com
   4   d   2376   2007-11-05 02:58:56  node-04.example.com
   5   d   2376   2007-11-05 02:58:56  node-05.example.com

Node  Sts   Inc   Joined               Name
   1   M   2372   2007-11-05 02:58:55  node-01.example.com
   2   d   2376   2007-11-05 02:58:56  node-02.example.com
   3   d   2376   2007-11-05 02:58:56  node-03.example.com
   4   M   2376   2007-11-05 02:58:56  node-04.example.com
   5   M   2376   2007-11-05 02:58:56  node-05.example.com

Node  Sts   Inc   Joined               Name
   1   M   2372   2007-11-05 02:58:55  node-01.example.com
   2   d   2376   2007-11-05 02:58:56  node-02.example.com
   3   d   2376   2007-11-05 02:58:56  node-03.example.com
   4   M   2376   2007-11-05 02:58:56  node-04.example.com
   5   M   2376   2007-11-05 02:58:56  node-05.example.com
.fi
.P
In this scenario we should kill nodes node-02 and node-03. Of course, the three-node cluster of node-01, node-04 and node-05 should remain quorate and be able to fence the two rejoined nodes anyway, but it is possible that the cluster has a qdisk setup that precludes this.
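.P
The "largest common subset" step can be sketched in Python as follows. This is an illustration only, not part of cman: the function names are made up, the parser assumes the column layout shown in the listings above, and the brute-force search is only practical for small clusters.
.nf
```python
from itertools import combinations


def parse_members(output):
    """Return the node IDs shown with state M in one 'cman_tool nodes' listing."""
    members = set()
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines and the 'Node Sts Inc Joined Name' header.
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        if fields[1] == "M":
            members.add(int(fields[0]))
    return members


def largest_mutual_group(views):
    """Return the largest set of nodes that all list each other as members.

    views maps a node ID to the set of node IDs that node reports as M.
    """
    nodes = sorted(views)
    # Try the biggest candidate groups first and stop at the first match.
    for size in range(len(nodes), 0, -1):
        for group in combinations(nodes, size):
            if all(set(group) <= views[node] for node in group):
                return set(group)
    return set()


if __name__ == "__main__":
    # Membership views matching the five listings in the example above.
    side_a = {1, 4, 5}
    side_b = {2, 3}
    views = {1: side_a, 2: side_b, 3: side_b, 4: side_a, 5: side_a}
    keep = largest_mutual_group(views)
    print("keep:", sorted(keep))
    print("reboot:", sorted(set(views) - keep))
```
.fi
For the example above this selects nodes 1, 4 and 5 to keep, leaving nodes 2 and 3 to be rebooted.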
.SH CONFIGURATION SYSTEMS
This section details how the configuration systems work in cman. You might need to know this if you are using the -C option
to cman_tool, or writing your own configuration subsystem.
.br
By default cman uses two configuration plugins to OpenAIS. The first, 'ccsconfig', reads the configuration information
stored in cluster.conf and stores it in an internal database, using the same schema as it finds in cluster.conf.
The second plugin, 'cmanpreconfig', takes the information in that database, adds several cman defaults, determines
the OpenAIS node name and nodeID,
and formats the information in a similar manner to openais.conf(5). OpenAIS then reads those keys to start the cluster protocol.
cmanpreconfig also reads several environment variables that might be set by cman_tool, which can override information in the
configuration.
.br
In the absence of ccsconfig, i.e. when 'cman_tool join' is run with the -X switch (this removes ccsconfig from the module list),
cmanpreconfig also generates several defaults so that the cluster can be started without any configuration information. See above
for the details.
.br
Note that cmanpreconfig will not overwrite OpenAIS keys that are explicitly set in the configuration file, allowing you to provide
custom values for token timeouts etc., even though cman has its own defaults for some of those values. The exception to this is the node
name/address and multicast values, which are always taken from the cman configuration keys.
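.P
The precedence rule in the previous paragraph can be pictured with the following Python sketch. It is a simplified model, not the cmanpreconfig implementation. The key names (token, nodename, bindnetaddr, mcastaddr, mcastport) and the function merge_keys are assumptions chosen for the illustration.
.nf
```python
# Keys assumed to always come from the cman configuration (hypothetical names).
ALWAYS_FROM_CMAN = {"nodename", "bindnetaddr", "mcastaddr", "mcastport"}


def merge_keys(explicit, cman_values):
    """Combine explicitly configured keys with cman-derived values."""
    # Start from the cman defaults and derived values.
    merged = dict(cman_values)
    for key, value in explicit.items():
        # An explicit setting wins unless the key is one cman always controls.
        if key not in ALWAYS_FROM_CMAN:
            merged[key] = value
    return merged


if __name__ == "__main__":
    explicit = {"token": 20000, "mcastaddr": "239.0.0.1"}
    derived = {"token": 10000, "mcastaddr": "239.192.1.1", "nodename": "node-01"}
    print(merge_keys(explicit, derived))
```
.fi
In this model the explicit token value is kept, while the multicast address and node name still come from the cman side.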