README

This directory contains the design documentation and eventually the code
for the Cluster Resource Manager framework for heartbeat.
We might eventually have several policy engines and other modular
components (as outlined in the crm.txt), but hopefully only one
framework they all plug into.
===============================
Installation:
run ConfigureMe or bootstrap with the --enable-crm option
make ; make install
Pids and FIFOs will be in: $HA_VARLIBDIR/heartbeat/crm
Logs will be in: /var/log
Executables will be in /usr/lib/heartbeat/ or wherever the heartbeat
executable is normally found
Startup:
- Add a respawn entry for ccm to ha.cf
- Add the appropriate "apiauth" entries to ha.cf
- Start heartbeat
- Start the cib ($your_bin_dir/cib)
- Start the crmd ($your_bin_dir/crmd)
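As a rough illustration of the first two startup steps, the sketch below prints a candidate ha.cf fragment. Everything specific in it is an assumption rather than something this README confirms: the "hacluster" user, the ccm path under /usr/lib/heartbeat, and the apiauth client names (ccm, cib, crmd). The function name crm_hacf_fragment and the HB_USER / HB_LIBDIR variables are hypothetical and exist only for this example. Check the output against your own installation before using it.

```shell
#!/bin/sh
# Sketch only: prints a possible ha.cf fragment for the CRM.
# ASSUMPTIONS (not confirmed by this README): the user name, the ccm
# location, and the apiauth client names below. Adjust to your setup.
HB_USER="${HB_USER:-hacluster}"
HB_LIBDIR="${HB_LIBDIR:-/usr/lib/heartbeat}"

crm_hacf_fragment() {
    cat <<EOF
respawn $HB_USER $HB_LIBDIR/ccm
apiauth ccm uid=$HB_USER
apiauth cib uid=$HB_USER
apiauth crmd uid=$HB_USER
EOF
}
```

Calling crm_hacf_fragment prints the four lines to standard output, so they can be reviewed first and appended to ha.cf by hand.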
Testing:
I recommend the following command for testing:
admin/crmadmin --daemon --query -o node
It queries the CIB for all the objects of type "node".
To add nodes, use a command like:
admin/crmadmin --create -V -o node -i node_uname_1 -D "test node: node_uname_1"
Everything works for resources and constraints as well, but let's keep it
simple for the moment :)
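To add several nodes in one go, the create command above can be wrapped in a loop. This is only a sketch, not the test script mentioned in this section: add_nodes and the CRMADMIN variable are hypothetical names introduced here. It also assumes each crmadmin call returns on its own. While the admin client is modified not to exit (see Notes), each call would have to be interrupted by hand.

```shell
#!/bin/sh
# Sketch only: create one "node" object per argument using the
# create command shown above. CRMADMIN is a hypothetical override
# for the path to the admin client.
CRMADMIN="${CRMADMIN:-admin/crmadmin}"

add_nodes() {
    for name in "$@"; do
        # Stop at the first failure so a broken setup is noticed early.
        "$CRMADMIN" --create -V -o node -i "$name" -D "test node: $name" || return 1
    done
}
```

For example, "add_nodes node_uname_1 node_uname_2" issues the create command twice, once per name.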
I'll be committing a test script as soon as this IPC "issue" is resolved
and I can verify it works.
Notes:
To help demonstrate my "issue", the admin client has been temporarily
modified to *not* exit. To reproduce the problem:
1) execute: $your_bin_dir/crmadmin --daemon --query -o node
2) in another window execute the same command again
3*) repeat as many times as you like; they should all succeed (that is,
get a result back)
4) Exit any of the crmadmin processes with ctrl-c
5) re-execute: $your_bin_dir/crmadmin --daemon --query -o node
At this point you should notice in the logs that the CRMd got a
disconnect for the new client almost straight after the
G_main_add_IPC_WaitConnection callback is invoked.
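The steps above could be scripted roughly as follows. This is a sketch under assumptions, not a verified reproducer: CRMADMIN, SETTLE, query_nodes and reproduce are hypothetical names, and the script sends SIGTERM rather than the SIGINT that ctrl-c delivers, because a non-interactive sh starts background commands with SIGINT ignored. It assumes the CRMd sees the client go away the same way in both cases.

```shell
#!/bin/sh
# Sketch only: automate the manual reproduction steps.
# CRMADMIN: path to the admin client (defaults to $your_bin_dir/crmadmin).
# SETTLE:   seconds to wait after each step (hypothetical knob).
CRMADMIN="${CRMADMIN:-${your_bin_dir:-.}/crmadmin}"
SETTLE="${SETTLE:-2}"

query_nodes() {
    # The modified client does not exit, so run it in the background.
    "$CRMADMIN" --daemon --query -o node &
    QUERY_PID=$!
}

reproduce() {
    query_nodes; first=$QUERY_PID       # step 1
    sleep "$SETTLE"
    query_nodes; second=$QUERY_PID      # step 2
    sleep "$SETTLE"
    kill -TERM "$first" || true         # step 4 (stand-in for ctrl-c)
    wait "$first" 2>/dev/null || true
    query_nodes; third=$QUERY_PID       # step 5
    sleep "$SETTLE"
    echo "check the logs for a CRMd disconnect of client pid $third"
    # Clean up the clients that are still running.
    kill "$second" "$third" 2>/dev/null || true
}
```

Run "reproduce" with heartbeat, the cib and the crmd already started, then inspect the logs for the pid it prints.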
Since this only occurs *after* a client has quit, it seems reasonable
that whatever the cause is, it's occurring in code that is invoked by this
event. In the CRM, this constitutes the
if(client->ch_status == IPC_DISCONNECT) block of crmd_ipc_input_callback()
and all of default_ipc_input_destroy().
I have currently commented out all meaningful code in those blocks, so I
now believe the culprit lies elsewhere. The next steps as I see them are
to check what the callers of the aforementioned functions are doing, and
then to try a debugging version of malloc.
