TODO

* Not currently tested with GuLM.
* Resource group state is not persistent. That is, a disabled
service will revert to the "stopped" state when the resource manager
is restarted. This may not be a problem; the only time you'll
get here is after a loss of quorum.
- Might be better to start with all services disabled, but this would
break automatic recovery after a total cluster outage.
- Could add a tag to the resource group config meaning "don't start
this automatically at startup."
* Online configuration changes don't work yet (at all).
- Need a way for CCS to notify clients of a configuration change
- Still working this out with visegrips.
* OCF status/monitor check levels/time intervals don't work yet.
- These will eventually supersede the old "check interval" notion,
which is therefore not implemented.
* Samba failover is not complete.
- We can use the old RHEL model, which required every node to have a
copy of each samba.conf.sharename.
- We could add internal Samba configuration info into the DB as XML
tags/attributes.
- We could dump the entire samba.conf.sharename for each sharename into
cluster.xml as CDATA, and have special tags for IP addrs so that we
know where to insert the IPs from the resource group.
- This requires a way to determine all IPs in a resource group for a
given Samba service. Vile.
* Too much RAM allocated. ;) Silly pthreads.
- Need to measure how much stack we actually need. Resource group
threads need the most, because they recurse over the arbitrarily
complex tree structure of resource groups.
* I suspect View-Formation, as currently implemented, does not scale
beyond about 32 nodes.
- An alternative implementation would track only the data relevant
to the resource groups running on each node and send updates to a
centralized server. This scales better, but requires more recovery
work if the centralized server fails. The centralized server can
simply be the node with the highest or lowest node ID in the current
active membership.
* No man pages.
* Init script is 100% broken.
- All it needs to do is start and stop clurgmgrd.
* Ordered failover domain "relocate-to-more-preferred-node" is broken
at the moment.
* Rewrite list handling code or hack around linux/list.h's restrictions.
- sys/queue.h is BSD stuff; I don't want to go there again.
* Write RHEL 3 -> LCP upgrade.
- Should be simple; the resource group structure is based on the RHEL3
cluster manager model.
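
For the OCF check-level item: a minimal C sketch of how per-level status check intervals might be scheduled. It assumes, without confirmation from this file, that a deeper check level covers everything a shallower one does, so running a deep check also resets the timers of the shallower ones. The struct status_check type and the next_due_check() and mark_ran() helpers are hypothetical, not existing rgmanager code.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical per-resource status check: one entry per check level. */
struct status_check {
    int depth;        /* OCF check level, e.g. 0, 10, 20 */
    time_t interval;  /* seconds between runs of this check */
    time_t last_run;  /* when this check (or a deeper one) last ran */
};

/* Return the index of the deepest check whose interval has elapsed at
 * time `now`, or -1 if no check is due yet. */
static int next_due_check(const struct status_check *c, size_t n, time_t now)
{
    int pick = -1;

    for (size_t i = 0; i < n; i++) {
        if (now - c[i].last_run < c[i].interval)
            continue;                  /* not due yet */
        if (pick < 0 || c[i].depth > c[pick].depth)
            pick = (int)i;             /* prefer the deeper check */
    }
    return pick;
}

/* Record that check `idx` ran at `now`. Assumption: a deeper check
 * subsumes shallower ones, so their timers are reset as well. */
static void mark_ran(struct status_check *c, size_t n, int idx, time_t now)
{
    for (size_t i = 0; i < n; i++)
        if (c[i].depth <= c[idx].depth)
            c[i].last_run = now;
}
```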
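
For the Samba item: determining all IPs in a resource group amounts to a recursive walk of the resource tree. This sketch assumes a hypothetical struct resource with first-child/next-sibling links and a type string; rgmanager's real resource structures may differ, and collect_ips() is an illustrative name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory resource tree node. */
struct resource {
    const char *type;           /* e.g. "ip", "fs", "script" */
    const char *address;        /* only meaningful when type is "ip" */
    struct resource *children;  /* first child, or NULL */
    struct resource *next;      /* next sibling, or NULL */
};

/* Depth-first walk of the subtree rooted at `node`, storing up to `max`
 * IP addresses in `out`. Returns the total number of IPs seen (which may
 * exceed `max` if `out` was too small). Call with found = 0. */
static size_t collect_ips(const struct resource *node, const char **out,
                          size_t max, size_t found)
{
    if (strcmp(node->type, "ip") == 0) {
        if (found < max)
            out[found] = node->address;
        found++;
    }
    for (const struct resource *c = node->children; c != NULL; c = c->next)
        found = collect_ips(c, out, max, found);
    return found;
}
```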
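
For the View-Formation item: choosing the centralized server as the lowest node ID in the active membership can be sketched as below. Assuming every node sees the same membership, each computes the same answer without extra messaging, and a replacement falls out at the next membership change if the server fails. struct member and pick_coordinator() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one node in the current membership. */
struct member {
    uint32_t node_id;
    int online;            /* nonzero if the node is an active member */
};

/* Store the lowest node ID among active members in *coord.
 * Returns 1 if a coordinator was chosen, 0 if no node is active. */
static int pick_coordinator(const struct member *m, size_t n, uint32_t *coord)
{
    int found = 0;

    for (size_t i = 0; i < n; i++) {
        if (!m[i].online)
            continue;
        if (!found || m[i].node_id < *coord) {
            *coord = m[i].node_id;
            found = 1;
        }
    }
    return found;
}
```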
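
For the ordered failover domain item: the "relocate-to-more-preferred-node" decision could look like this sketch, which scans the domain from most to least preferred and relocates only if an online node ranks above the current owner. Node IDs are assumed to be non-negative ints, and relocation_target() and node_online() are hypothetical, not the existing (broken) code.

```c
#include <stddef.h>

/* Return nonzero if `id` is in the list of online node IDs. */
static int node_online(int id, const int *online, size_t n_online)
{
    for (size_t i = 0; i < n_online; i++)
        if (online[i] == id)
            return 1;
    return 0;
}

/* Given an ordered failover domain (domain[0] is most preferred),
 * return the node a service running on `owner` should move to, or -1
 * if it should stay. If the owner is not in the domain at all, the
 * service is moved to the most preferred online domain member. */
static int relocation_target(int owner, const int *domain, size_t n_domain,
                             const int *online, size_t n_online)
{
    for (size_t i = 0; i < n_domain; i++) {
        if (domain[i] == owner)
            return -1;         /* no online node ranks above the owner */
        if (node_online(domain[i], online, n_online))
            return domain[i];  /* a more preferred node is online */
    }
    return -1;                 /* no online domain member found */
}
```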
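
For the list handling item: one way to avoid both linux/list.h and sys/queue.h is a small self-contained intrusive list. This sketch uses only standard C (offsetof rather than the GCC typeof extension); all names are hypothetical.

```c
#include <stddef.h>

/* Circular doubly linked list; the head is a sentinel node. */
struct list_node {
    struct list_node *prev, *next;
};

/* Map a pointer to an embedded list_node back to its containing struct. */
#define LIST_ENTRY(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Iterate over every node; do not remove `pos` inside the loop body. */
#define LIST_FOREACH(pos, head) \
    for ((pos) = (head)->next; (pos) != (head); (pos) = (pos)->next)

static void list_init(struct list_node *head)
{
    head->prev = head->next = head;
}

static int list_empty(const struct list_node *head)
{
    return head->next == head;
}

static void list_insert_tail(struct list_node *head, struct list_node *node)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Unlink a node and leave it pointing at itself, so a second removal
 * is harmless. */
static void list_remove(struct list_node *node)
{
    node->prev->next = node->next;
    node->next->prev = node->prev;
    node->prev = node->next = node;
}
```

A resource or resource group would embed a struct list_node and use LIST_ENTRY() to get back to itself while iterating.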
