This directory contains the design documentation, and eventually the code,
for the Cluster Resource Manager framework for heartbeat.

We might eventually have several policy engines and other modular components
(as outlined in crm.txt), but hopefully only one framework that they all
plug into.

===============================

Installation:
  run ConfigureMe or bootstrap with the --enable-crm option
  make ; make install

  Pids and FIFOs will be in: $HA_VARLIBDIR/heartbeat/crm
  Logs will be in: /var/log
  Executables will be in /usr/lib/heartbeat/ or wherever the heartbeat
  executable is normally found

Startup:
- Add a respawn entry for ccm to ha.cf
- Add the appropriate "apiauth" entries to ha.cf
- Start heartbeat
- Start the cib ($your_bin_dir/cib)
- Start the crmd ($your_bin_dir/crmd)

Testing:
I recommend the following command for testing:
  admin/crmadmin --daemon --query -o node

It queries the CIB for all the objects of type "node".

To add nodes, use a command like:
  admin/crmadmin --create -V -o node -i node_uname_1 -D "test node: node_uname_1"

Everything works for the resources and constraints, but let's keep it simple
for the moment :)

I'll be committing a test script as soon as this IPC "issue" is resolved and
I can verify it works.

Notes:
To help demonstrate my "issue", the admin client has been temporarily
modified to *not* exit.

To reproduce the problem:
1) execute: $your_bin_dir/crmadmin --daemon --query -o node
2) in another window, execute the same command again
3) optionally, repeat as many times as you like; they should all succeed
   (as in, get a result back)
4) exit any of the crmadmin processes with ctrl-c
5) re-execute: $your_bin_dir/crmadmin --daemon --query -o node

At this point you should notice in the logs that the CRMd got a disconnect
for the new client almost straight after the G_main_add_IPC_WaitConnection
callback is invoked.
Since this only occurs *after* a client has quit, it seems reasonable that
whatever the cause is, it is occurring in code that is invoked by this event.
In the CRM, this constitutes the if(client->ch_status == IPC_DISCONNECT)
block of crmd_ipc_input_callback() and all of default_ipc_input_destroy().
Currently I have commented out all meaningful code in those blocks, so it is
now my belief that the culprit lies elsewhere.

Next steps as I see them are to check the activities of those calling the
aforementioned functions, followed by using a debugging version of malloc.
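
Until the real test script is committed, the reproduction steps in the Notes
above could be sketched as a small shell function. This is only a rough
sketch under a few assumptions: the CRMADMIN and REPRO_DELAY variables and
the repro function name are made up for illustration, and the script stops
the first client with SIGTERM rather than ctrl-c (SIGINT), because background
jobs started from a non-interactive shell ignore SIGINT. Whether SIGTERM
triggers the same disconnect behaviour in the CRMd has not been verified.

```shell
#!/bin/sh
# Hypothetical helper, not part of the CRM sources.
# CRMADMIN is an assumption: point it at your crmadmin binary,
# e.g. CRMADMIN=$your_bin_dir/crmadmin
CRMADMIN="${CRMADMIN:-crmadmin}"

repro() {
    # Steps 1-3: start two clients. They stay connected because the
    # admin client has been modified to not exit.
    $CRMADMIN --daemon --query -o node &
    first=$!
    $CRMADMIN --daemon --query -o node &
    second=$!
    sleep "${REPRO_DELAY:-2}"

    # Step 4: stop one client. SIGTERM stands in for ctrl-c here.
    kill "$first" 2>/dev/null || true
    wait "$first" 2>/dev/null || true

    # Step 5: start a new client, then check the logs for an
    # immediate disconnect of this client.
    $CRMADMIN --daemon --query -o node &
    third=$!
    sleep "${REPRO_DELAY:-2}"

    # Clean up the clients that are still running.
    kill "$second" "$third" 2>/dev/null || true
    wait 2>/dev/null || true
    echo "done"
}
```

To use it, source the file with heartbeat, the cib and the crmd already
running, call repro, and then look in the logs for the disconnect that
follows the G_main_add_IPC_WaitConnection callback.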