DRAFT! DRAFT! DRAFT! DRAFT! DRAFT! DRAFT! DRAFT! DRAFT! DRAFT! DRAFT! DRAFT!
NOTICE: Some ideas in this paper aren't yet well sorted. Some ideas aren't
complete. Some phrasings I am myself not yet happy with. Some ideas need
further explanation. Most of the ideas presented are not final yet. It is
mostly a braindump.
And did I mention yet that this is still a DRAFT!!!!!! ?
Title: Design of a Cluster Resource Manager
Revision: $Id: crm.txt,v 1.2 2003/12/01 14:10:14 lars Exp $
Author: Lars Marowsky-Brée <lmb@suse.de>
Acknowledgements: Andrew Beekhof <andrew@beekhof.net>
Luis Claudio R. Goncalves <lclaudio@conectiva.com.br>
Fábio Olivé Leite <olive@conectiva.com.br>
Alan Robertson <alanr@unix.sh>
XX. Global TODO for this document
In this section I keep track of tasks which I still want to perform on this
document; probably of little interest to anyone else, and this section should
be gone in the final version ;-)
- Break out major parts (CIB, Policy Engine etc) into their own documents.
- Add references to sub-features used and do not replicate too much
information here. References should take the form of the feature id from the
feature list.
- Ensure unified use of words and terms when referring to the
components.
- Convert to docbook (not pressing)
0. Abstract
This paper outlines the design of a clustered recovery/resource manager
to run on top of, and enhance, the Open Clustering Framework (OCF)
infrastructure provided by heartbeat.
The goal is to allow flexible resource allocation and globally ordered
actions in a cluster of N nodes and dynamic reallocation of resources in
case of failures (ie, "fail-over") or administrative changes to the
cluster ("switch-over").
This new Cluster Resource Manager is intended to be a replacement for the
currently used "resource manager" of heartbeat.
1. Introduction
1.1. Requirements overview
The CRM needs to provide the following properties and functionality:
- Secure.
- Simple.
heartbeat already provides these two properties; the new resource manager
must not be any less secure. For the sanity and stability of both the
developers and, in particular, the users, complexity needs to be kept to a
minimum (but no simpler).
- Support for more than 2 nodes.
- Fault tolerant.
(Failures can be node, network or resource failures.)
- Ability to deal with complex policies for resource allocation and
dependencies. Examples:
- Support for a globally ordered start sequence,
- Support for feedback in the allocation process to better support
replicated resources,
- Negated dependencies etc
- Ability to make administrative requests for resource migration, adding new
resources, removing resources et cetera while online.
- Extensible framework.
1.2. Scenario description
The design outlined in this document is aimed at a cluster with the following
properties; some of them will be further specified later.
- The cluster provides a Consensus Membership Layer, as outlined in the OCF
documents and provided by the CCM implementation by Ram Pai.
(This provides all nodes in a partition with a common and agreed upon
view of cluster membership, which tries to compute the largest set of
fully connected nodes.)
- It is possible for the nodes in a given partition to order all nodes
in the partition in the same way, without the need for a distributed
decision. This can be either achieved by having the membership be
returned in the same order on all nodes, or by attaching a
distinguishing attribute to each node.
- The cluster provides a communication mechanism for unicast as well as
broadcast messaging.
Messages don't necessarily satisfy any special ordering, but they are
atomic.
- Byzantine failures are rare and self-contained. Stable storage is stable and
not otherwise corrupted; network packets aren't spuriously generated etc.
These errors are self-contained to the respective components which take
appropriate precautions to prevent them from propagating upwards.
(Checksums, authentication, error recovery or fail-fast)
- Time is synchronized cluster-wide. Even in the face of a network
partition, an upper bound on clock divergence can safely be assumed.
This can be virtually guaranteed by running NTP across all nodes in
the cluster.
- An IO fencing mechanism is available in the cluster.
The fencing mechanism provides definitive feedback on whether a given
fencing request succeeded or not.
- A node knows which resources it currently holds and their state
(running / failed / stopped); it can provide this information if
queried and will inform the CRM of state changes.
(This part will be provided by the Local Resource Manager.)
2.1. Basic algorithm
The basic algorithm can be summarized as follows:
a) Every partition elects a "Designated Coordinator"; this node will
activate special logic to coordinate the recovery and administrative
actions on all nodes in the cluster.
a.1) The DC has the full state of the cluster available (or is able to
retrieve it), as well as an up-to-date copy of the administrative
policies, information about fenced nodes etc; this shall further be
referred to as the "Cluster Information Base".
b) Whenever a cluster event occurs, be it an administrative request, a node
failure reported by membership services or a resource failure reported by a
participating LRM, it is forwarded to the "Designated Coordinator".
b1) For administrative requests, the DC arbitrates whether or not they will be
accepted into the CIB and serializes these updates; ie, policy changes which
cannot be satisfied or would lead to an inconsistent state of the cluster will
be rejected (unless explicitly overridden).
c) It then computes via the Policy Engine:
c.1) the new CIB
c.2) The Transition Graph, a dependency-ordered graph of the actions
necessary to go from the current cluster state as close as possible to
the cluster state described by the CIB.
d) Leading the transition to the target state:
d1) Replicating the new CIB to all clients.
d2) Executing each step of the transition graph in dependency order.
(Potentially parallelized.)
e) Exception handling if _any_ event or failure occurs:
e1) The algorithm is aborted cleanly; pending operations are allowed to
finish, but no new commands are issued to clients (in particular during
phase d2)
e2) The algorithm is invoked again from scratch.
(It is obvious that there is room for optimization here by only
recomputing smaller parts of the dependency tree or not broadcasting the
full CIB every time, in particular if the DC has not been re-elected.
However, these complicate the implementation and are not necessary for
the first phase.)
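To make the control flow of a) through e) a bit more concrete, here is a
minimal, purely illustrative Python sketch. Every name in it is a placeholder
and not part of any actual CRM interface:

  class TransitionAborted(Exception):
      """A new cluster event arrived while the transition was running (e1)."""

  def run_recovery(collect_state, compute_cib, compute_graph,
                   replicate_cib, dispatch, pending_events):
      """One pass of the algorithm; it is re-invoked from scratch on abort (e2)."""
      state = collect_state()                # a.1) gather the CIB inputs
      cib = compute_cib(state)               # c.1) compute the new CIB
      graph = compute_graph(state, cib)      # c.2) dependency-ordered action list
      replicate_cib(cib)                     # d1) push the new CIB to all nodes
      for action in graph:                   # d2) dependency order, parallelizable
          if pending_events():
              raise TransitionAborted()      # e1) stop issuing new commands
          dispatch(action)                   # relayed to the owning node's LRM

  # Toy invocation with stand-in callables, just to show the control flow:
  run_recovery(collect_state=lambda: {},
               compute_cib=lambda state: {},
               compute_graph=lambda state, cib: ["stop r1 n2", "start r1 n1"],
               replicate_cib=lambda cib: None,
               dispatch=print,
               pending_events=lambda: False)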
2.2. Feature analysis
This meets the requirements listed as follows:
- It is reasonably simple to have a policy engine capable of dealing
with more than two nodes as distributed decisions are avoided as far
as possible; only a single node leads the transition and coordinates
participating nodes.
Local nodes are only required to know their own state; whenever the
coordinator fails, the necessary state can simply be reconstructed by the
newly elected DC.
- An event can be anything from a failed node or a request to add a new policy
rule (ie, a new resource, a change to an allocation policy etc); this
satisfies the requirement to deal with administrative requests at runtime.
- Support for new kinds of policies, types of resources, ... can be added via
the policy engine.
- The modular design keeps complexity in any single component down.
2.3. Stability of the algorithm
This algorithm will eventually converge if no new events occur for a
sufficiently long period of time.
TODO: Add a discussion on factors affecting stability or kill the section
entirely.
2.4. Components
This approach neatly divides the task into various components: (see
crm-flowchart.fig!)
- Cluster infrastructure
- heartbeat
- Consensus Cluster Membership
- Messaging
- Cluster Resource Manager
- Policy Engine
- Cluster Information Base
- Transitioner
- Local Resource Manager
- Executioner
- Resource Agents
3. Local Resource Manager
Note: This section only documents the requirements on the LRM from the point
of view of the cluster-wide resource management. For a more detailed
explanation, see the documentation by lclaudio.
[ TODO: Add reference to lclaudio's document as soon as available ]
This component knows which resources the node currently holds, their status
(running, running/failed, stopping, stopped, stopped/failed, etc) and can
provide this information to the CRM. It can start, restart and stop resources
on demand. It can provide the CRM with a list of supported resource types.
It will initiate any recovery action which is applicable and limited to the
local node and escalate all other events to the CRM.
Any correct node in the cluster is running this service. Any node which fails
to run this service will be evicted and fenced.
It has access to the CIB for the resource parameters when starting a resource.
NOTE: Because the monitoring operation may need to be invoked with the same
instance parameters the resource was started with in order to succeed, the
LRM needs to keep a copy of that data, because the CIB might change at
runtime.
4. Cluster Resource Manager
The CRM coordinates all non-local interactions in the cluster. It interacts
with:
- The membership layer
- The local resource managers
- non-local CRMs
- Administrative commands
- Fencing functionality
Only one node is running the "Designated Coordinator" CRM at any given time in
any given partition. All other nodes forward their input to this node only,
and will relay its decisions to the local LRM.
The coordinator is a "primus inter pares"; in theory, any CRM can act in this
fashion, but the arbitration algorithm will distinguish a designated node.
4.1. How to deal with failure of the designated CRM
There are two major groups of failures here; if the entire node running the
CRM fails, the underlying membership algorithm shall take care of this.
If the CRM logic fails, this shall be detected by internal consistency checks,
and local heartbeating to apphbd must stop immediately, effectively committing
'suicide' and providing a fail-fast mechanism. However, for now we will
assume that the CRM does not fail. Coping with internal failures
internally is always difficult.
This can later be enhanced by providing 'peer monitoring' of the CRMs among
each other and initiating fencing if necessary; however, this has many pitfalls
and is not inherently better.
4.2. Election algorithm for the DC
The election algorithm exploits the fact that there is a global ordering of
the nodes in a given partition; it will simply select the first one. The node
will know that it has been "distinguished" and take control.
As the first action in the algorithm is to collect the status from all
nodes, this will inform every node about the newly elected leader. Should
any of them have been a DC until the new membership was formed (in the case
of cluster partitions healing), it will cease operation and hand over
control in an orderly fashion.
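As a purely illustrative sketch (sorting by node name stands in for the
distinguishing attribute mentioned in 1.2; none of this is actual CRM code),
the rule boils down to:

  def elect_dc(membership):
      """Return the designated coordinator for this partition.

      'membership' is assumed to be the node list as delivered by the
      membership layer; since every node sees the same ordering, every node
      computes the same answer locally, without any distributed decision."""
      return sorted(membership)[0]

  # Every node in the partition arrives at the same DC without messaging:
  assert elect_dc(["nodeb", "nodea", "nodec"]) == "nodea"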
4.3. Consistency audits
The DC can perform a variety of consistency checks:
- Exclusive allocation of resources
- All nodes have the necessary Resource Agents for the resources which might
be running on them
- Administrative requests (ie, rule additions) can be rejected if they would
prevent the policy engine from computing a valid target state
Implementing any of the audits is optional.
4.4. Communication between the CRM and LRM
- Every resource in the system is clearly identified in the CIB via a "UUID".
This is the key passed around between CRM/LRM.
(This avoids issues seen with FailSafe, which used a "resource name / type"
combination as the key; that made it very difficult to have, for example,
two filesystems mounted at /usr/sap - the production system and the test
system - because that is a key clash. It is, however, a perfectly legal
combination as long as the resources are not activated on the same node,
which can be ensured by an appropriate negative dependency. Combined with
"resource priority", the higher-priority production system will push off the
lower-priority test system and operation will continue.)
TODO: Protocol and channel need to be decided; based on the heartbeat IPC
layer, wrapper library so they don't have to know all the gory details. AI
lclaudio
4.5. Executing the Transition Graph
It is also the task of the DC to execute the computed dependency graph.
The graph will be traversed and evaluated in dependency-compliant order.
Every node in the graph corresponds to a single action; the links between them
correspond to the dependencies.
Only nodes for which all dependencies have been satisfied will be executed;
the CRM is allowed to parallelize these however.
If a task cannot be successfully executed and ultimately has to be considered
failed, this will be treated as a hard barrier - all currently pending tasks
will run to completion, appropriate constraints will be added to the CIB
("resource X cannot start on node N1", for example) and the graph execution
will be aborted with a failure; this escalates the recovery back to the
higher level again (ie, triggers a rerun of the recovery algorithm).
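A rough sketch of these execution rules (the graph encoding and all names are
assumptions made for illustration only):

  def execute(graph, run_action):
      """graph: dict mapping each action to the set of actions it depends on.
      run_action: callable returning True on success, False on failure."""
      done = set()
      remaining = {a: set(deps) for a, deps in graph.items()}
      failed = False
      while remaining and not failed:
          ready = [a for a, deps in remaining.items() if deps <= done]
          if not ready:
              break                        # the rest can never be satisfied
          for action in ready:             # these could be dispatched in parallel
              if failed:
                  break                    # hard barrier: issue nothing new
              del remaining[action]
              if run_action(action):
                  done.add(action)
              else:
                  failed = True            # pending work finishes, nothing new starts
      return not failed and not remaining

  # Toy example: the web server may only be started once its IP address is up.
  toy = {"start ip on n1": set(), "start web on n1": {"start ip on n1"}}
  print(execute(toy, run_action=lambda action: True))    # -> True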
TODO: This needs more thought. Especially the failure paths could simply be
treated just as normal nodes in the graph, simplifying the logic here; and for
some tasks where re-running the recovery algorithm is pointless, the Policy
Engine could precompute alternate actions (like marking the resource
hierarchy failed from that node upwards etc). Could be a possible future
extension not implemented in v1.0.
TODO: A pure dependency graph as outlined doesn't easily allow expressing "OR"
conditions, only "AND". "OR" conditions might be helpful if a node could
be fenced via multiple mechanisms, and if each one should be tried in order as
a fallback; this could be implemented by allowing a node in the graph to be a
_list_ of actions to be tried in order.
5. Cluster Information Base
The CIB also runs on every node in the cluster. In essence, it provides
a distributed database with weak transactional semantics, exploiting the fact
that all updates are serialized by the DC and that each node itself knows
its own latest status.
5.1. Contents of the CIB
The CIB is divided into two major parts; the "configuration" data and the
"runtime" data.
a) The configuration part of the database is set up by the
administrator.
The configuration present on any given node is appropriately versioned; a
combination of timestamp and generation counter seems sensible. Thus the most
recent version available in a partition can always be clearly identified and
selected.
- Configured resources
- Resource identifier
- Resource instance parameters
- Special node attributes
- Administrative policies
- Resource placement constraints
- Resource dependencies
b) Runtime information
This includes:
- Information about the currently participating nodes in the partition.
- Resource Agents present on each node
- Resources active on the node and their status
- Operational status of the node itself
- Fencing data
- results: the timestamped result of a fencing request.
- metadata: each node contributes the list of fencing devices available
to it.
TODO: Maybe this is part of the "static" configuration data?
- Dynamic administrative policies and constraints:
- Temporary constraints to deny placing a resource on a node, ie in response to
failures etc.
- Resource migration requests by the administrator.
(These might be ignored by the policy engine if otherwise a consistent
state cannot be computed, or because they have a limited life time, for
example "until the node has booted again".)
The runtime data can be constructed by merging all available data in the
partition, exploiting the fact that every node holds authoritative data on
itself.
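Purely as an illustration of the split described above (every key, value and
identifier below is invented for this example), a CIB might look roughly like
this:

  example_cib = {
      "configuration": {                   # set up by the administrator, versioned
          "version": {"timestamp": 1070287814, "generation": 42},
          "resources": {
              "f0e1d2c3": {                # resource UUID, the CRM/LRM key
                  "type": "IPaddr",
                  "parameters": {"ip": "10.0.0.10"},
              },
          },
          "constraints": [
              {"kind": "placement", "resource": "f0e1d2c3",
               "nodes": ["n1", "n2"]},
          ],
      },
      "runtime": {                         # merged from per-node authoritative data
          "nodes": {
              "n1": {"status": "member", "agents": ["IPaddr"],
                     "resources": {"f0e1d2c3": "running"},
                     "timestamp": 1070287900},
              "n2": {"status": "fenced", "timestamp": 1070287950,
                     "fence_result": {"ok": True}},
          },
          "dynamic_constraints": [
              {"kind": "deny", "resource": "f0e1d2c3", "node": "n2",
               "expires": "until node n2 rejoins"},
          ],
      },
  }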
5.2. Process of generating an up-to-date CIB
If necessary, the DC will retrieve the CIB from each node, compute the
up-to-date CIB from this data and broadcast the result to all nodes.
The algorithm for merging the CIBs is rather straightforward:
- Select the most recent configuration from all nodes.
Note that this relies on the fact that an administrator has the wits not to
force incompatible changes onto separate cluster partitions; if he does, some
configuration changes might be lost (or rather, overridden by the others),
but that is a classical PEBKAC anyway.
More complex merging algorithms could always be devised later for this step;
the straightforward approach is to complain loudly about this problem as it
can be clearly detected, and to also try to reject configuration changes in
a degraded cluster unless forced.
If any node in the cluster has a valid copy of the configuration, the
cluster will be able to proceed. The case where this is not true is very
unlikely; it requires at least a double failure to occur. The worst case in
this scenario is that some configuration changes might be reverted.
- Merge the runtime data.
- A node present in the partition is assumed to always have authoritative data
on itself, its current status, the resources it holds etc. This overrides any
other data about it from other nodes.
("Normative power of facts", as opposed to rumours.)
- For nodes not present in the partition, their latest status can be
identified from the partial information present on other nodes because it
is timestamped; ie, it is possible to say whether they were cleanly
shut down, whether they have already been fenced cleanly or whether a
fencing attempt has failed etc.
- Update timestamps and generation counters as appropriate.
- Commit locally and broadcast.
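A hedged sketch of this merge procedure, reusing the illustrative dict layout
from 5.1 (the version comparison and all field names are assumptions, not a
fixed format):

  def merge_cibs(local_cibs):
      """local_cibs: node name -> that node's last locally stored CIB copy."""
      # Select the most recent configuration (generation counter, then timestamp).
      def version(cib):
          v = cib["configuration"]["version"]
          return (v["generation"], v["timestamp"])
      config = max(local_cibs.values(), key=version)["configuration"]

      # Merge the runtime data: a member is authoritative about itself ...
      nodes = {}
      for name, cib in local_cibs.items():
          own = cib["runtime"]["nodes"].get(name)
          if own is not None:
              nodes[name] = own
      # ... while for absent nodes the newest timestamped rumour is kept.
      for cib in local_cibs.values():
          for other, data in cib["runtime"]["nodes"].items():
              if other not in local_cibs:
                  known = nodes.get(other, {})
                  if data.get("timestamp", 0) >= known.get("timestamp", 0):
                      nodes[other] = data

      return {"configuration": config, "runtime": {"nodes": nodes}}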
5.3. How are updates to the CIB handled?
Any updates to the configuration will be serialized by the DC; they will be
verified, committed to its own CIB and also broadcast to all nodes in the
partition as an incremental update.
Any updates pertaining to the runtime portion of the CIB can simply be
broadcast to all nodes; the DC will receive them too, and all other nodes can
save them for further reference so they will be available if the (new) DC
needs to compute a new CIB.
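As an illustration of the two update paths (the message format and transport
primitives below are placeholders, not a real heartbeat API):

  def submit_update(update, is_dc,
                    forward_to_dc, broadcast, verify, commit_local):
      if update["section"] == "configuration":
          if not is_dc:
              forward_to_dc(update)        # the DC serializes all config changes
          elif verify(update):
              commit_local(update)         # commit to the DC's own CIB first
              broadcast({"kind": "incremental", "update": update})
      else:                                # runtime data needs no serialization
          broadcast({"kind": "runtime", "update": update})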
6. Policy Engine
6.1. Functionality provided
Even if only simple dependencies (probably just the two required constraint
types below) are supported at first, which is straightforward to implement,
the model can easily be extended.
6.1.1. Required constraints
- To be started on the same node as resX after resX
- To only be started on a node from the set {A, B, C}
6.1.2. Future extensions
- To be started after resX, but without tying them both to a given node, ie
providing a global ordering
- To NOT be run on the same node as resX (or generically, negated constraints)
- To only be placed on a node with the attribute FOO (for example, connected
to the FC-AL rack, or with at least XX MB of memory, and so on)
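To illustrate how these constraint kinds might be represented internally (all
class and field names below are invented for this sketch):

  from dataclasses import dataclass

  @dataclass
  class Colocate:          # required: same node as resX, started after resX
      resource: str
      with_resource: str

  @dataclass
  class Placement:         # required: only on a node from the set {A, B, C}
      resource: str
      allowed_nodes: frozenset

  @dataclass
  class OrderAfter:        # future: global order without tying both to one node
      resource: str
      after_resource: str

  @dataclass
  class AntiColocate:      # future: NOT on the same node as resX
      resource: str
      away_from: str

  @dataclass
  class NodeAttribute:     # future: only on nodes carrying a given attribute
      resource: str
      attribute: str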
6.2. Thoughts about algorithm implementation
The process of arriving at a target state / transition graph for the cluster
from the set of rules and the current cluster state (basically, all
information in the CIB is available) can be implemented by a "constraint
solver".
- Analyse dependency graph in the configuration;
- Compute eligible nodes for each subtree (intersection of nodes
configured for resources within a dependency subtree)
- Order eligible nodes by priority, stickiness etc and select target node
- All other eligible nodes either have to be part of the partition or must
be fenced successfully.
...
TODO: UNFINISHED
TODO: Alternate implementation: steal the Finite Domain solver from GNU Prolog,
or the constraint solver from
http://www.cs.washington.edu/research/constraints/cassowary/ etc; this
would allow policies to be expressed as intuitive constraints. Rather
cool indeed. Needs evaluation.
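Independently of which solver ends up being used, the eligible-node step from
the outline above could look roughly like this (data shapes and names are
assumptions):

  def eligible_nodes(subtree, allowed, preference):
      """subtree: resource ids forming one dependency subtree.
      allowed: resource id -> set of nodes it is configured for.
      preference: node -> sort key (priority, stickiness, ...); higher wins."""
      candidates = set.intersection(*(allowed[r] for r in subtree))
      return sorted(candidates, key=preference, reverse=True)

  # Example: a web server and its IP address must end up on a common node.
  allowed = {"ip": {"n1", "n2", "n3"}, "web": {"n2", "n3"}}
  print(eligible_nodes(["ip", "web"], allowed,
                       preference=lambda n: {"n2": 2, "n3": 1}[n]))
  # -> ['n2', 'n3']; n2 would be picked, the others must be members or fenced.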
6.4. Design considerations
The following part should answer the most common questions regarding this
approach:
6.4.1. Whether to deal with resource groups or "only" resources and
dependencies?
At first glance, resource groups a la FailSafe are a welcome
simplification. They can be treated as atomic units, resource allocation
policy seems to become simpler, the CRM wouldn't even have to know about
resources but simply allocate/reallocate resource groups. It seems natural to
group resources in this fashion; all related resources are put into a resource
group and done.
However, it is also limiting. Resource groups become awkward if they stop
being a logical mapping between resource and provided service; for example,
the natural answer is to throw everything which should be running on a single
node into a single resource group. If this becomes unnecessary later, resource
group splitting is difficult. (Same for merging)
Dependencies which span resource groups are also commonly requested; ie, a
resource group should not run on the same node as a given other one. They are
also cumbersome if one has to have a resource group for _everything_, even if
it might only be a single resource or two; then the abstraction gains little.
Some decisions also span the boundary between single resources and resource
groups, in which case the CRM has to know about both again. For example, the
allocation policy of a resource group must be in line with how each resource
itself can be allocated.
In short, resource groups seem to fall short of easing bigger clusters and
also lack some flexibility which can only be added at the expense of
complexity.
It seems more natural to me (by now) to only deal with resources,
dependencies between them, and external constraints. I believe that in the
end both are just mindsets, and that neither is inherently harder to
understand than the other; one is just cleaner to implement.
A very important point to keep in mind is that this document reflects the
_internal_ representation of the configuration. An appropriate configuration
wizard could (reasonably easily, although with limited features) present the
user with a 'resource group'-style frontend.
6.4.2. Why a transition graph
The dependency information in the graph will allow operations to be
parallelized to speed up recovery. (All operations for which all requirements
are satisfied can be carried out in parallel.)
The graph is "global" (by virtue of being centralized at the DC) and thus even
allows the possibility of "barriers"; ie, starting a resource only if a
resource has been started on another node.
6.4.3. Who orders resource operations on a single node
The LRM does not have enough information to order resource operations (start,
stop etc) even for the local node, because it might - in theory - depend on
non-local actions (resource operations, fencing etc).
Only the transition graph as constructed by the DC contains such information.
A possible optimisation here is that the DC transmits all operations which do
not depend on external events as a single block to a given node, which can
then proceed accordingly. Initially, just issuing the actions one by one
shall suffice.
7. Executioner
7.1. Integration
The Executioner is responsible for the fencing of nodes on request. It is a
local component on each node and knows which fencing devices are available to
it.
This information is contributed to the CIB.
On request of the CRM (relayed from the DC), it will carry out a fencing
attempt and report the status back. The CRM might of course try to fence a
given node from multiple nodes until one succeeds.
See "executioner.txt" for details.
7.2. Fencing algorithm
At the start of the recovery process, the CRM shall verify the list of devices
reachable by each node; this shall be done as part of querying the current CIB
from them.
It will then retrieve the list of nodes fenceable via each device. If this
list has already been retrieved in the past, it may be reused from cache
appropriately. (Reloaded only when new nodes get added to the cluster, or
something)
For nodes which need to be fenced, appropriate dependencies will need to be
added to the transition graph. These shall ensure that any given device is not
concurrently accessed, and that fencing requests are appropriately retried on
other devices if available.
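One possible way to derive these dependencies (illustrative only; note that a
pure AND graph can only express the ordering, not "try the next device only on
failure", which needs the list-of-actions extension discussed in the TODO
under "Executing the Transition Graph"):

  def fencing_actions(to_fence, devices_for):
      """to_fence: nodes that need fencing; devices_for: node -> ordered devices.
      Returns a graph fragment: action -> set of actions it depends on."""
      graph, last_on_device = {}, {}
      for node in to_fence:
          previous_attempt = None
          for device in devices_for(node):
              action = "fence %s via %s" % (node, device)
              deps = set()
              if device in last_on_device:
                  deps.add(last_on_device[device])  # never hit a device concurrently
              if previous_attempt:
                  deps.add(previous_attempt)        # fallback ordered after earlier try
              graph[action] = deps
              last_on_device[device] = action
              previous_attempt = action
      return graph

  print(fencing_actions(["n3"], devices_for=lambda n: ["stonith1", "meatware"]))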
THOUGHT: Adding "meatware" as an ultima ratio fallback fencing method would
allow disaster resilient setups; if one site failed completely, the
administrator could flip the switch and the transition graph would proceed.
This would neatly address this part of the requirements list.
7.3. Un-fencing
Aborting an on-going fencing / STONITH operation is not supported.
7.3.1. After a successful fencing
Okay, so a node has been fenced. How does it properly rejoin the cluster?
For example, a node n1 can notice an unclean shutdown at startup, and must
assume it has been STONITHed (in particular if its uptime is very low ;-).
When can it clear this flag and initiate recovery / startup actions on its
own?
The answer seems to relate to the "thought" under 6.2: when is recovery
action initiated for a resource group anyway? An additional option could be:
"Proceed if more than 50% of the nodes for the resource group are in our
partition; if it is a tie, do not proceed to initiate recovery if the
'unclean' flag is set on any node."
(This moves the unclean flag into the runtime portion of the CIB)
The flag can be cleared once majority for all resources has been reached
again.
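A small sketch of that rule (reading "any node" as any node the resource
group is configured on; all names are made up for illustration):

  def may_recover(resource_nodes, partition, unclean):
      """resource_nodes: nodes the resource (group) is configured on.
      partition: nodes currently in our partition.
      unclean: nodes carrying the 'unclean shutdown' flag."""
      present = resource_nodes & partition
      if 2 * len(present) > len(resource_nodes):
          return True                            # clear majority: proceed
      if 2 * len(present) == len(resource_nodes):
          return not (resource_nodes & unclean)  # tie: only if nobody is unclean
      return False

  print(may_recover({"n1", "n2"}, {"n1"}, unclean={"n1"}))  # tie + unclean -> False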
7.3.2. Rejoining node while fencing requests are still pending
Answer: Running fencing requests cannot be aborted. In this rather
unfortunate case, the node will most likely disappear soon because of the
fencing request. Provisions might be made to consider this node as "down"
already, to prevent resource pingpong.
TODO: Ok, so n1 tried to fence n2 and vice-versa because of a partition, they
failed via the STONITH device and have fallen back to meatware. The partition
resolves. Shouldn't the system try to abort the meatware request? Maybe a
special case for "meatware" requests? Or should meatware just notice this?
7.3.3. Rejoining node for which fencing has ultimately failed in the past
Two options are basically possible; if a node has rejoined the cluster, the
"fencing failed" flag could be cleared for it, and the cluster could proceed
as normal. (This seems sensible)
The other option would be to request the node to commit suicide. Again, this
is difficult in the case of a resolving cluster partition; both sides would
commit suicide. This doesn't seem sensible.
If the failure was due to a local hang - ie, scheduler bug, power management
running wild etc - this shall be considered outside the scope of this
discussion. The local health monitoring on such a node shall be responsible
for containing such failures and reacting to them appropriately.
8. Interaction with quorum
TODO: Highly unfinished!
Summary: CRM does not need quorum. However, the CRM could easily compute
'quorum' as just another resource.
"Quorum" is in fact not necessary for this design. It is implicit in the
policy engine / CRM which will only bring a resources for which all
dependencies - including fencing - have been satisfied.
This in fact is quorum with slightly finer granularity. It allows the cluster
to proceed in a scenario like:
- {a,b,c,d} form a cluster. {a,b} share resources (called R1) and {c,d} share
resources (called R2). If the cluster is partitioned into {a,b} and {c,d}, no
fencing has to be carried out; cluster operation can proceed as normal.
Of course, as soon as a global resource spanning {a,b,c,d} is added, this in
fact translates to "global quorum".
This makes me think that if global quorum is in fact required, it can best be
expressed in this design by mapping it to such a global ('configured on all
nodes') resource and communicating to the partition that it has quorum if it
was able to recover this resource (or that it lacks quorum if it failed to).
However, it also allows for "sub-quorum"; ie, given the example of an
application requiring quorum to operate, it will usually only be interested in
quorum of the _nodes eligible for the related resources_. So quorum could
potentially be different if reported to different clients...
8.1. Issues wrt quorum
TODO: Certainly ;-)
10. Integration with other projects
10.1. Integration with heartbeat
TODO: Elaborate.
This component is supposed to replace the "resource manager" present in
heartbeat currently.
Issues which need to be addressed / kept in mind:
- Start order; heartbeat should only join the cluster if the startup of CRM is
successfully completed locally.
- CRM makes extensive use of heartbeat's libraries (IPC, HBcomm,
STONITH, PILS etc).
...
10.2. Integration with CCM
TODO: Elaborate, if necessary.
CRM is a client of CCM; CCM provides the set of nodes in the partition, CRM
only operates on this data.
It would be nice if CCM enforced policy before allowing a node to join a
partition; ie, time not vastly desynchronized, CRM/LRM etc running; other
ideas come to mind.
10.3. Integration with Group Services
Should Group Services functionality - in particular, group messaging - be
available one day, the exact semantics will of course have to be taken into
account.
The obvious simplification possible with this component would be that the CRM
could form a distributed process group in the cluster, instead of building on
top of node messaging primitives and filtering those.
10.4. Integration with non-heartbeat clusters
In theory, the software should be able to run in any "compliant" environment;
I'd safely assume that it should be reasonably easy to port on top of the
Compaq CI, for example, or any Service-Availability Forum AIS.
This would complement the respective feature lists as well as demonstrate that
such interoperability is in fact possible, providing great leverage for OCF!
TODO: Keep that in mind while designing the code. Encapsulate
interactions with the lower levels cleanly.
10.5. Integration with cluster-aware applications
Software like cluster-aware volume managers, filesystems or distributed
applications (databases) needs to be integrated into the 'recovery' tree just
like Resource Agent based resources.
This should be reasonably simple - it is a matter of inserting the necessary
triggers in the right order into the transition graph - but these clients
need to be aware that they don't need to provide any fencing themselves etc,
because the external framework has already taken care of this.
10.6. Relation to OCF
All applicable OCF specifications shall be implemented. Areas which OCF does
not specify yet shall be implemented as prototypes, hopefully serving as a
basis for OCF discussion.
11. Monitoring
11.1. Integration with health monitoring
Health monitoring should prevent startup of cluster software on a "sick"
node. ("Hey, I've rebooted 12 times within the last 60 minutes, maybe
bringing the cluster software online wouldn't be so smart!") Should this be
handled by the Local Resource Manager?
Health monitoring can send events to the DC:
- I'm about to crash, migrate everything while you can
- I'm overloaded, please migrate some resources if possible
etc
11.2. Monitoring the CRM
Monitoring software can query CIB to get access to all data.
TODO: Should traps/events be triggered from inside the different components
themselves, or might it be a good idea to allow clients to "subscribe" to
certain parts of the CIB and be notified if a change occurs? This would
be helpful for SNMP/CIM traps.
(The latter would be somewhat like FailSafe's cdbd.)
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Attention: Here be dragons. Everything following these lines is unordered
thoughts which haven't yet been incorporated into the grand scheme of things.
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
X. ...
Question: Why isn't moving a resource to another node a basic operation?
Answer: Because it involves more than one node and needs to be coordinated by
the designated CRM.
Question: How can this deal with disaster resilient setups, where one site may
be physically separate and fencing the other side is not possible?
Answer: See the "note" under "Hangman".
