This directory contains the design documentation and eventually the code
for the Cluster Resource Manager framework for heartbeat.

We might eventually have several policy engines and other modular
components (as outlined in crm.txt), but hopefully only one framework
that they all plug into.


===============================

Installation:
- Run ConfigureMe or bootstrap with the --enable-crm option
- Run: make && make install

Pids and FIFOs will be in: $HA_VARLIBDIR/heartbeat/crm
Logs will be in: /var/log
Executables will be in: /usr/lib/heartbeat/ (or wherever the heartbeat
  executable is normally found)

Startup:
- Add a respawn entry for ccm to ha.cf
- Add the appropriate "apiauth" entries to ha.cf
- Start heartbeat
- Start the cib ($your_bin_dir/cib)
- Start the crmd ($your_bin_dir/crmd)
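
Before starting heartbeat, a quick check along the following lines can
confirm that ha.cf contains the entries from the first two steps.  This is
only a sketch: the function name check_ha_cf and the /etc/ha.d/ha.cf path
in the usage comment are assumptions, and it checks only that the
directives are present, not that their values are correct.

```shell
#!/bin/sh
# Sketch: verify that ha.cf has a respawn entry for ccm and at least one
# apiauth entry.  check_ha_cf is a hypothetical helper, not part of heartbeat.
check_ha_cf() {
    conf="$1"
    missing=0

    # A respawn line that mentions ccm, ignoring leading whitespace.
    grep -Eq '^[[:space:]]*respawn[[:space:]].*ccm' "$conf" || {
        echo "no respawn entry for ccm in $conf"
        missing=1
    }

    # Any apiauth line at all; which clients need one depends on your setup.
    grep -Eq '^[[:space:]]*apiauth[[:space:]]' "$conf" || {
        echo "no apiauth entries in $conf"
        missing=1
    }

    return "$missing"
}

# Usage (path is an assumption; adjust for your installation):
#   check_ha_cf /etc/ha.d/ha.cf && echo "ha.cf looks ready"
```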


Testing:
I recommend the following command for testing:
admin/crmadmin --daemon --query -o node

It queries the CIB for all the objects of type "node".

To add nodes, use a command like:
admin/crmadmin --create -V -o node -i node_uname_1 -D "test node: node_uname_1"
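
To add several test nodes in one go, the command above can be wrapped in a
small loop.  This is a sketch, not a supplied script: add_test_nodes is a
made-up name, and it assumes a crmadmin build that exits after each
request (which is not the case while the temporary change described in
the Notes is in place).

```shell
#!/bin/sh
# Sketch: create one "node" object per uname given on the command line.
# add_test_nodes is a hypothetical helper; the crmadmin options are the
# ones shown in the example above.
add_test_nodes() {
    crmadmin_bin="$1"
    shift

    for uname in "$@"; do
        # Stop at the first failure so it is obvious which node was rejected.
        "$crmadmin_bin" --create -V -o node -i "$uname" \
            -D "test node: $uname" || return 1
    done
}

# Usage:
#   add_test_nodes admin/crmadmin node_uname_1 node_uname_2
```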

Everything also works for resources and constraints, but let's keep it
simple for the moment :)

I'll be committing a test script as soon as this IPC "issue" is resolved 
and I can verify it works.

Notes:
To help demonstrate my "issue", the admin client has been temporarily
modified to *not* exit.  To reproduce the problem:
1) Execute: $your_bin_dir/crmadmin --daemon --query -o node
2) In another window, execute the same command again
3) (optional) Repeat as many times as you like; each one should succeed
    (that is, get a result back)
4) Exit any one of the crmadmin processes with ctrl-c
5) Re-execute: $your_bin_dir/crmadmin --daemon --query -o node
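
The steps above can be roughly automated as follows.  Treat this as a
sketch under assumptions: reproduce_disconnect and its variables are
made-up names, the one-second pause is arbitrary, and SIGTERM stands in
for ctrl-c because background jobs started from a non-interactive shell
may ignore SIGINT.  If the problem turns out to depend on how the client
is interrupted, the manual steps remain the reference.

```shell
#!/bin/sh
# Sketch: start several query clients, stop one, then start a new one.
# reproduce_disconnect is a hypothetical helper, not part of the CRM.
reproduce_disconnect() {
    crmadmin_bin="$1"
    clients="${2:-3}"
    client_pids=""
    first_pid=""
    i=0

    # Steps 1-3: start the long-running query clients in the background.
    while [ "$i" -lt "$clients" ]; do
        "$crmadmin_bin" --daemon --query -o node >/dev/null 2>&1 &
        client_pids="$client_pids $!"
        [ -n "$first_pid" ] || first_pid=$!
        i=$((i + 1))
    done

    # Arbitrary pause so the clients have a chance to connect.
    sleep 1

    # Step 4: stop one client (SIGTERM in place of ctrl-c, see above).
    kill -TERM "$first_pid" 2>/dev/null || true
    wait "$first_pid" 2>/dev/null || true

    # Step 5: start a new client; this is the one to look for in the logs.
    "$crmadmin_bin" --daemon --query -o node >/dev/null 2>&1 &
    last_pid=$!
    client_pids="$client_pids $last_pid"

    echo "stopped client $first_pid; new client is $last_pid"
}

# Usage:
#   reproduce_disconnect "$your_bin_dir/crmadmin" 3
#   ... inspect the logs, then clean up with: kill $client_pids
```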

At this point you should notice in the logs that the CRMd got a
disconnect for the new client almost immediately after the
G_main_add_IPC_WaitConnection callback is invoked.

Since this only occurs *after* a client has quit, it seems reasonable
that whatever the cause is, it is occurring in code that is invoked by
this event.  In the CRM, this constitutes the
if(client->ch_status == IPC_DISCONNECT) block of crmd_ipc_input_callback()
and all of default_ipc_input_destroy().

I have currently commented out all meaningful code in those blocks, so I
now believe that the culprit lies elsewhere.  The next steps, as I see
them, are to check what the callers of the aforementioned functions are
doing, and then to try a debugging version of malloc.