diff --git a/README.devmap b/README.devmap deleted file mode 100644 index 89e4caf7..00000000 --- a/README.devmap +++ /dev/null @@ -1,1220 +0,0 @@ -Copyright (c) 2002-2004 MontaVista Software, Inc. -Copyright (c) 2006, 2009 Red Hat, Inc. - -All rights reserved. - -This software licensed under BSD license, the text of which follows: - -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: - -- Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. -- Redistributions in binary form must reproduce the above copyright notice, - this list of conditions and the following disclaimer in the documentation - and/or other materials provided with the distribution. -- Neither the name of the MontaVista Software, Inc. nor the names of its - contributors may be used to endorse or promote products derived from this - software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE -ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE -LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR -CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF -SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS -INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN -CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) -ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF -THE POSSIBILITY OF SUCH DAMAGE. - -------------------------------------------------------------------------------- -This file provides a map for developers to understand how to contribute -to the corosync project. 
The purpose of this document is to prepare a -developer to write a service for corosync, or understand the architecture -of corosync. - -The following is described in this document: - - * all files, purpose, and dependencies - * architecture of corosync - * taking advantage of virtual synchrony - * adding libraries - * adding services - -------------------------------------------------------------------------------- - all files, purpose, and dependencies. -------------------------------------------------------------------------------- - -*----------------* -*- AIS INCLUDES -* -*----------------* - -include/saAmf.h ----------------- - Definitions for AMF interface. - -include/saCkpt.h ------------------ - Definitions for CKPT interface. - -include/saClm.h ----------------- - Definitions for CLM interface. - -include/saEvt.h ----------------- - Definitions for the EVT interface. - -include/saLck.h ----------------- - Definitions for the LCK interface. - -include/cfg.h - Definitions for the CFG interface. - -include/cpg.h - Definitions for the CPG interface. - -include/evs.h - Definitions for the EVS interface. - -include/ipc_amf.h - IPC interface between client and server for AMF service. - -include/ipc_cfg.h - IPC interface between client and server for CFG service. - -include/ipc_ckpt.h - IPC interface between client and server for CKPT service. - -include/ipc_clm.h - IPC interface between client and server for CLM service. - -include/ipc_cpg.h - IPC interface between client and server for CPG service. - -include/ipc_evs.h - IPC interface between client and server for EVS service. - -include/ipc_evt.h - IPC interface between client and server for EVT service. - -include/ipc_gen.h - IPC interface for generic operations. - -include/ipc_lck.h - IPC interface between client and server for LCK service. - -include/ipc_msg.h - IPC interface between client and server for MSG service.
- -include/hdb.h - Handle database implementation. - -include/list.h - Linked list implementation. - -include/swab.h - Byte swapping implementation. - -include/queue.h - FIFO queue implementation. - -include/sq.h - Sort queue where items are kept in order of their sequence number. Avoids - sorting: a new element is placed directly by its sequence number, so insertion is O(1). Inline implementation. - - depends on list. - -*---------------* -* AIS LIBRARIES * -*---------------* -lib/amf.c ---------- - AMF user library linked into user application. - -lib/cfg.c ---------- - CFG user library linked into user application. - -lib/ckpt.c ---------- - CKPT user library linked into user application. - -lib/clm.c ---------- - CLM user library linked into user application. - -lib/cpg.c ---------- - CPG user library linked into user application. - -lib/evs.c ---------- - EVS user library linked into user application. - -lib/evt.c ---------- - EVT user library linked into user application. - -lib/lck.c ---------- - LCK user library linked into user application. - -lib/msg.c ---------- - MSG user library linked into user application. - -lib/util.c ----------- - Utility functions used by all libraries. - -*-----------------* -*- AIS EXECUTIVE -* -*-----------------* - -exec/aisparser.{h|c} - Parser plugin for the default configuration file format. - -exec/aispoll.{h|c} - Poll abstraction interface. - -exec/amfapp.c - AMF application handling. - -exec/amfcluster.c - AMF cluster handling. - -exec/amfcomp.c - AMF component level handling. - -exec/amf.h - Defines all AMF symbol names. - -exec/amfnode.c - AMF node level handling. - -exec/amfsg.c - AMF service group handling. - -exec/amfsi.c - AMF service instance handling. - -exec/amfsu.c - AMF service unit handling.
- -exec/amfutil.c - AMF utility functions. - -exec/cfg.c - Server-side implementation of the CFG service, which is used to display - redundant ring status and re-enable redundant rings. - -exec/ckpt.c - Server-side implementation of Checkpointing (CKPT API). - -exec/clm.c - Server-side implementation of Cluster Membership (CLM API). - -exec/cpg.c - Server-side implementation of closed process groups (CPG API). - -exec/crypto.{c|h} - Cryptography functions used by corosync. - -exec/evs.c - Server-side implementation of extended virtual synchrony passthrough - (EVS API). - -exec/evt.c - Server-side implementation of Event Service (EVT API). - -exec/ipc.{c|h} - All IPC operations used by corosync. - -exec/jhash.h - A hash routine. - -exec/keygen.c - Secret key generator used by corosync encryption tools. - -exec/lck.c - Server-side implementation of the distributed lock service (LCK API). - -exec/main.{c|h} - Main function which connects all components together. - -exec/mainconfig.{c|h} - Reads the main configuration that is set in the configuration parser. - -exec/mempool.{c|h} - Currently unused. - -exec/msg.c - Server-side implementation of message service (MSG API). - -exec/objdb.{c|h} - Object database used to configure services. - -exec/corosync-instantiate.c - Instantiates a component by forking and exec'ing it and writing its - pid to a pid file. - -exec/print.{c|h} - Non-blocking thread-based logging service with overflow protection. - -exec/service.{c|h} - Service handling routines including the default service handler - description. - -exec/sync.{c|h} - The synchronization service implementation. - -exec/timer.{c|h} - Thread-based timer service (deprecated - use qb_loop_timer). - -exec/totemconfig.{c|h} - Builds the totem configuration from the data parsed by aisparser - from the configuration file. - -exec/totem.h - General definitions for the totem protocol used by the totem stack. - -exec/totemip.{c|h} - IP handling functions for totem - lowest on stack.
- -exec/totemmrp.{c|h} - The totem multi-ring protocol; currently unimplemented. Between - totemsrp and totempg. - -exec/totemnet.{c|h} - Network handling functions for totem - between totemip and totemrrp. - -exec/totempg.{c|h} - Process groups interface which is used by all applications - highest on - stack. - -exec/totemrrp.{c|h} - Redundant ring functions for totem - between totemnet and totemsrp. - -exec/util.{c|h} - Utility functions used by corosync executive. - -exec/version.h - Defines build version. - -exec/vsf.h - Virtual Synchrony plugin API. - -exec/vsf_ykd.c - Virtual Synchrony YKD Dynamic Linear Voting algorithm. - -exec/wthread.{c|h} - Worker threads API. - -loc ---- -Counts the lines of code in the AIS implementation. - -------------------------------------------------------------------------------- - architecture of corosync -------------------------------------------------------------------------------- - -The corosync standards-based cluster framework is a generic cluster plugin -architecture used to create cluster APIs and services. Usually there are -libraries which implement APIs and are linked into the end user application. -The libraries request services from the aisexec process, called the AIS -executive. The AIS executive uses the Totem protocol stack to communicate -within the cluster and execute operations on behalf of the user. Finally, the -response of the API is delivered once the operation has completed.
- - - -------------------------------------------------- - | AMF and more services libraries | - -------------------------------------------------- - | IPC API | - -------------------------------------------------- - | corosync Executive | - | | - | +---------+ +--------+ +---------+ | - | | Object | | AIS | | Service | | - | | Database | | Config | | Handler | | - | | Service | | Parser | | Manager | | - | +---------+ +--------+ +---------+ | - | +-------+ +-------+ | - | | AMF | | more | | - | |Service| |svcs...| | - | +-------+ +-------+ | - | +---------+ | - | | Sync | | - | | Service | | - | +---------+ | - | +---------+ | - | | VSF | | - | | Service | | - | +---------+ | - | +--------------------------------+ +--------+ | - | | Totem | | Timers | | - | | Stack | | API | | - | +--------------------------------+ +--------+ | - | +-----------+ | - | | Poll | | - | | Interface | | - | +-----------+ | - | | - ------------------------------------------------- - - Figure 1: corosync Architecture - -Every application that intends to use corosync links with the libais library. -This library uses IPC, more specifically Unix domain sockets, to communicate -with the executive. The library is a small program responsible only for -packaging the request into a message. This message is sent, using IPC, to -the executive, which then processes it. The library then waits for a response. - -The library itself contains very little intelligence. Some utility services -are provided: - - * create a connection to the executive - * send messages to the executive - * retrieve messages from the executive - * poll on a file descriptor - * create a handle instance - * destroy a handle instance - * get a reference to a handle instance - * release a reference to a handle instance - -When a library connects, it sends the service type in a message. The -service type is stored and used later to look up the message handlers, -both the library message handlers and the executive message handlers.
-Every message sent contains an integer identifier, which is used to index -into an array of message handlers to determine the correct message handler -to execute for the library. Hence a message is uniquely identified by the -message handler ID number and the service handler ID number. - -When a library sends a message via IPC, the message is delivered -to the proper library message handler. The library message handler is -responsible for sending the message via the totem process groups API to all -nodes in the system. - -This keeps the library handler simple: its main purpose should be to package -the library request into a message that can be sent to all nodes. - -The totem process groups API sends the message according to the extended -virtual synchrony model. The group messaging interface also delivers the -message according to the extended virtual synchrony model. This has several -advantages which are described in the virtual synchrony section. One -advantage that matters here is that messages are self-delivered: -if a node sends a message, that same message is delivered back to that -node. - -When the executive message is delivered, it is processed by the executive -message handler. The executive message handler contains the brains of -AIS and is responsible for making all decisions relating to the request -from the libais library user.
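The dispatch rule described above (the service type recorded at connect time selects a handler table, and the message id from the header indexes into it) can be sketched in C. This is a minimal illustration, not the corosync executive: `struct service_entry`, `service_table`, `dispatch_lib_message` and the stand-in handlers are hypothetical simplifications of the handler arrays described under adding services.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified request header (corosync uses mar_req_header_t). */
struct req_header {
	int size;
	int id;
};

typedef int (*lib_handler_fn) (void *conn, const void *msg);

/* Hypothetical per-service table of library message handlers. */
struct service_entry {
	const lib_handler_fn *lib_handlers;
	size_t lib_handler_count;
};

/* Stand-in handlers; the return values only show which one ran. */
static int clm_trackstart (void *conn, const void *msg)
{
	(void)conn;
	(void)msg;
	return 10;
}

static int clm_trackstop (void *conn, const void *msg)
{
	(void)conn;
	(void)msg;
	return 11;
}

/* Indexed by message id, in the spirit of enum req_clm_types. */
static const lib_handler_fn clm_handlers[] = {
	clm_trackstart,		/* id 0 */
	clm_trackstop		/* id 1 */
};

/* Indexed by service id, in the spirit of enum service_types. */
static const struct service_entry service_table[] = {
	{ NULL, 0 },		/* service 0: no handlers in this sketch */
	{ clm_handlers, sizeof (clm_handlers) / sizeof (clm_handlers[0]) }	/* service 1: CLM */
};

/*
 * service was recorded when the library connected; hdr->id comes from the
 * message itself. Together they select exactly one handler.
 * Returns -1 for an unknown service or message id.
 */
static int dispatch_lib_message (unsigned int service, void *conn,
	const struct req_header *hdr)
{
	const struct service_entry *svc;

	assert (hdr != NULL);
	if (service >= sizeof (service_table) / sizeof (service_table[0])) {
		return -1;
	}
	svc = &service_table[service];
	if (hdr->id < 0 || (size_t)hdr->id >= svc->lib_handler_count) {
		return -1;
	}
	return svc->lib_handlers[hdr->id] (conn, hdr);
}
```

The bounds checks matter because the id arrives from a client process; validating it before indexing keeps a malformed message from selecting an arbitrary function pointer.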
- -------------------------------------------------------------------------------- - taking advantage of virtual synchrony -------------------------------------------------------------------------------- - -definitions: -processor: a system responsible for executing the virtual synchrony model -configuration: the list of processors under which messages are delivered -partition: one or more processors leave the configuration -merge: one or more processors join the configuration -group messaging: sending a message from one sender to many receivers - -Virtual synchrony is a model for group messaging. It is often confused -with particular implementations of virtual synchrony. Try to focus on -what virtual synchrony provides, not how it provides it, unless you are interested -in working on the group messaging interface of corosync. - -Virtual synchrony provides several advantages: - - * integrated membership - * strong membership guarantees - * agreed ordering of delivered messages - * same delivery of configuration changes and messages on every node - * self-delivery - * reliable communication in the face of unreliable networks - * recovery of messages sent within a configuration where possible - * use of network multicast using standard UDP/IP - -Integrated membership allows the group messaging interface to give -configuration change events to the API services. This is obviously beneficial -to the cluster membership service (and its respective API), but is helpful -to other services as described later. - -Strong membership guarantees allow a distributed application to make decisions -based upon the configuration (membership). Every service in corosync registers -a configuration change function. This function is called whenever a -configuration change occurs. The information passed consists of the current processors, -the processors that have left the configuration, and the processors that have -joined the configuration.
This information is then used to make decisions -within a distributed state machine. For example, if the processor running an AMF -component has left the configuration, failover actions -must now be taken with the new configuration (and known components). - -Virtual synchrony requires that messages be delivered in agreed order. -FIFO ordering means that one sender and one receiver agree on the order of -messages sent. Agreed ordering extends this requirement to groups, requiring that -all receivers agree on the order of all messages, regardless of which processor sent them. - -Consider a lock service. The service is responsible for arbitrating locks -between multiple processors in the system. With FIFO ordering this is very -difficult, because requests for the same lock sent at about the same time from two separate -processors may arrive at different receivers in different orders. Agreed ordering -ensures that all the processors are delivered the messages in the same order. -If the request from processor X is ordered first, then every processor sees X's -request first and the request from processor Y second. Hence the first request -is honored by all processors, and the second request is rejected (since -the lock is taken). This is how such race conditions are avoided in distributed -systems. - -Every processor is delivered configuration changes and messages within a -configuration in the same order. This ensures that any distributed state -machine will make the same decisions on every processor within the -configuration. It also allows both the configuration and the messages to be -considered when making decisions. - -Virtual synchrony requires that every node is delivered the messages it -sends. This enables the logic to be placed in one location (the handler -for the delivery of the group message) instead of two separate places. It -also places a node's own messages in the same agreed order as all other -messages within the configuration.
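To make the lock example concrete, here is a small C sketch of the delivery handler for a hypothetical lock service. The names (`struct lock_request`, `struct lock_table`, `deliver_lock_request`, `MAX_LOCKS`) are illustrative only, and the sketch assumes node id 0 is never a valid owner. Each processor keeps its own replica of the lock table and changes it only when a request is delivered, including its own self-delivered requests.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LOCKS 8
#define NO_OWNER 0u	/* assumption: node id 0 is never a valid owner */

/* Hypothetical lock request as delivered by the group messaging layer. */
struct lock_request {
	unsigned int nodeid;	/* processor that sent the request */
	unsigned int lock_id;	/* lock being requested */
};

/* Each processor keeps its own replica of this table. */
struct lock_table {
	unsigned int owner[MAX_LOCKS];
};

/*
 * Called for every delivered request, in delivery order, on every processor
 * (self-delivery means the sender runs it too). Because agreed ordering gives
 * every processor the same sequence, every replica makes the same decision.
 * Returns 1 if the lock is granted, 0 if it is rejected.
 */
static int deliver_lock_request (struct lock_table *table,
	const struct lock_request *req)
{
	assert (req->lock_id < MAX_LOCKS);
	if (table->owner[req->lock_id] != NO_OWNER) {
		return 0;	/* already taken: reject */
	}
	table->owner[req->lock_id] = req->nodeid;
	return 1;
}
```

Because the decision is made in the delivery handler rather than when the request is sent, two replicas that receive the same agreed-order stream always agree on the owner. With only FIFO ordering, one replica could see Y's request first and grant the lock to a different processor.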
- -Certain guarantees are required by virtual synchrony. If a message is sent, -it must be delivered to every processor unless that processor fails. If a -particular processor fails, a configuration change occurs, creating a new -configuration under which a new set of decisions may be made. This implies -that messages must be delivered reliably even over unreliable networks. The -implementation in corosync works on unreliable as well as reliable networks. - -Every message sent must be delivered, unless a configuration change occurs. -In the case of a configuration change, every message that can be recovered -must be recovered before the new configuration is installed. During a -partition, some systems stop recovering messages within the old -configuration even though those messages can be recovered. Virtual synchrony -makes that impossible, except for those members that are no longer part -of the configuration. - -Finally, the corosync implementation of virtual synchrony takes advantage of hardware multicast to avoid -duplicated packets and scale to large transmit rates. On a 100 Mbit/s network, -corosync can approach wire speed, depending on the number of messages queued -for a particular processor. - -What does all of this mean for the developer? - - * messages are delivered reliably - * messages are delivered in the same order to all nodes - * configuration and messages can both be used to make decisions - -------------------------------------------------------------------------------- - adding libraries -------------------------------------------------------------------------------- - -The first stage in adding a library to the system is to develop the library. - -Library code should follow these guidelines: - - * use SA Forum coding style for SA Forum APIs to aid in debugging - * use corosync coding guidelines for non-SA Forum APIs that - are to be merged into the corosync tree. - * implement all library code within one file named after the API; - examples are ckpt.c, clm.c, amf.c.
 * use parallel structure as much as possible between different APIs - * make use of utility services provided by util.c. - * if something is needed that is generic and useful to all services, - submit patches for other libraries to use these services. - * use the reference counting handle manager for handle management. - ------------------ - Version checking ------------------ - -struct saVersionDatabase { - int versionCount; - SaVersionT *versionsSupported; -}; - -The versionCount member is the number of entries in the version database. -The versionsSupported member is an array of SaVersionT describing the -versions this API supports. - -An API developer specifies the supported versions by adding the following C -code to the library file: - -/* - * Versions supported - */ -static SaVersionT clmVersionsSupported[] = { - { 'B', 1, 1 }, - { 'b', 1, 1 } -}; - -static struct saVersionDatabase clmVersionDatabase = { - sizeof (clmVersionsSupported) / sizeof (SaVersionT), - clmVersionsSupported -}; - -After this is specified, the following API is used to check versions: - -SaErrorT -saVersionVerify ( - struct saVersionDatabase *versionDatabase, - const SaVersionT *version); - -An example usage of this is: - SaErrorT error; - - error = saVersionVerify (&clmVersionDatabase, version); - - where version is a pointer to an SaVersionT passed into the API. - -saVersionVerify returns SA_OK if the version is valid as specified in the -version database. - ------------------- - Handle Instances ------------------- - -Every handle instance is stored in a handle database. The handle database -stores instance information for every handle used by libraries. The system -includes reference counting and is safe for use in threaded applications.
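The version check from the Version checking section can be sketched as a linear search over the version database. This is an illustration under stated assumptions, not the code in lib/util.c: `SaVersionT` and `SaErrorT` are minimal stand-ins for the SA Forum types, the enum values are placeholders, `version_verify_sketch` is a hypothetical name, and the acceptance rule used here (same release code, supported major version at least as new as the requested one) is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for the SA Forum types (real ones come from the SA Forum headers). */
typedef struct {
	char releaseCode;
	unsigned char majorVersion;
	unsigned char minorVersion;
} SaVersionT;

/* Placeholder error codes; the numeric values are not the SA Forum ones. */
typedef enum {
	SA_OK,
	SA_ERR_VERSION,
	SA_ERR_INVALID_PARAM
} SaErrorT;

struct saVersionDatabase {
	int versionCount;
	SaVersionT *versionsSupported;
};

/*
 * Hypothetical check: accept the caller's version if some database entry has
 * the same release code and a major version at least as new as requested.
 */
static SaErrorT version_verify_sketch (const struct saVersionDatabase *db,
	const SaVersionT *version)
{
	int i;

	assert (db != NULL);
	if (version == NULL) {
		return SA_ERR_INVALID_PARAM;
	}
	for (i = 0; i < db->versionCount; i++) {
		const SaVersionT *supported = &db->versionsSupported[i];

		if (supported->releaseCode == version->releaseCode &&
		    supported->majorVersion >= version->majorVersion) {
			return SA_OK;
		}
	}
	return SA_ERR_VERSION;
}

/* The example database from the Version checking section. */
static SaVersionT clmVersionsSupported[] = {
	{ 'B', 1, 1 },
	{ 'b', 1, 1 }
};

static struct saVersionDatabase clmVersionDatabase = {
	sizeof (clmVersionsSupported) / sizeof (SaVersionT),
	clmVersionsSupported
};
```

Keeping the table static and computing versionCount with sizeof means adding a supported version is a one-line change to the array.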
- -The handle database structure is: - -struct saHandleDatabase { - unsigned int handleCount; - struct saHandle *handles; - pthread_mutex_t mutex; - void (*handleInstanceDestructor) (void *); -}; - -handleCount is the number of handles. -handles is an array of handles. -mutex is a pthread mutex that serializes access to the handle database. -handleInstanceDestructor is a callback that is called when the handle - should be freed because its reference count has dropped to zero. - -The handle database is defined in a library as follows: - -static void clmHandleInstanceDestructor (void *); - -static struct saHandleDatabase clmHandleDatabase = { - .handleCount = 0, - .handles = 0, - .mutex = PTHREAD_MUTEX_INITIALIZER, - .handleInstanceDestructor = clmHandleInstanceDestructor -}; - -There are several APIs to access the handle database: - -SaErrorT -saHandleCreate ( - struct saHandleDatabase *handleDatabase, - int instanceSize, - int *handleOut); - -Creates an instance of size instanceSize in the handleDatabase parameter, -returning the handle number in handleOut. The handle instance reference -count starts at 1. - -SaErrorT -saHandleDestroy ( - struct saHandleDatabase *handleDatabase, - unsigned int handle); - -Prevents further access to the handle and decrements the handle instance -reference count by 1. Once the reference count drops to zero, the database -destructor is called for the handle. - -SaErrorT -saHandleInstanceGet ( - struct saHandleDatabase *handleDatabase, - unsigned int handle, - void **instance); - -Gets the instance for the specified handle from handleDatabase and returns -it in the instance parameter. If the handle is valid, SA_OK is returned; -otherwise an error is returned. This is used to ensure a handle is -valid. Every get call increases the reference count on a handle instance -by one. - -SaErrorT -saHandleInstancePut ( - struct saHandleDatabase *handleDatabase, - unsigned int handle); - -Decrements the reference count by 1.
If the reference count indicates -the handle has been destroyed, it will then be removed from the database -and the destructor called on the instance data. The put call takes care -of freeing the handle instance data. - -Create a data structure for the instance, and use it within the libraries -to store state information about the instance. This information can be -the handle, a mutex for protecting I/O, a queue for queueing async messages -or whatever is needed by the API. - ------------------------------------ - communicating with the executive ------------------------------------ - -A service connection is created with the following API: - -SaErrorT -saServiceConnect ( - int *responseOut, - int *callbackOut, - enum service_types service); - -The responseOut parameter returns the file descriptor on which response messages -will be delivered. The callbackOut parameter returns the file descriptor -on which callback messages are delivered. - -The service parameter selects the service to connect to. - -Messages are sent to the executive with the following functions: - -SaAisErrorT saSendMsgRetry ( - int s, - struct iovec *iov, - unsigned int iov_len); - -s is the socket returned by saServiceConnect. -iov is the I/O vector holding the message. -iov_len is the number of elements in iov. - -This sends an I/O-vectorized message. - -SaErrorT -saSendRetry ( - int s, - const void *msg, - size_t len, - int flags); - -s is the socket returned by saServiceConnect. -msg is a pointer to the message to send to the service. -len is the length of the message to send. -flags are the flags to use with the sendmsg system call. - -This sends a data blob to the executive.
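The retry behaviour implied by saSendRetry can be sketched with plain POSIX send(). This is a hypothetical helper (`send_retry_sketch`), not the code in lib/util.c: it loops until the whole buffer has been written, and retries when the call is interrupted by a signal or the socket buffer is temporarily full.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper in the spirit of saSendRetry: call send() until the
 * whole buffer has been written, retrying when the call is interrupted by a
 * signal (EINTR) or the socket buffer is temporarily full (EAGAIN).
 * Returns 0 on success, -1 on any other error (errno is left set).
 */
static int send_retry_sketch (int s, const void *msg, size_t len, int flags)
{
	const char *p = msg;
	size_t left = len;

	assert (msg != NULL || len == 0);
	while (left > 0) {
		ssize_t n = send (s, p, left, flags);

		if (n == -1) {
			if (errno == EINTR || errno == EAGAIN) {
				continue;	/* transient: try again */
			}
			return -1;
		}
		p += n;			/* partial write: advance past what was sent */
		left -= (size_t)n;
	}
	return 0;
}
```

A production version would wait for the socket to become writable (for example with a poll() wrapper such as saPollRetry) before retrying after EAGAIN, rather than spinning.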
- -A message is received from the executive with the function: - -SaErrorT -saRecvRetry ( - int s, - void *msg, - size_t len, - int flags); - -s is the socket returned by saServiceConnect. -msg is a pointer to the buffer that receives the message. -len is the length of the message to receive. -flags are the flags to use with the recvmsg system call. - -A message may be sent and its reply awaited with the following function: - -SaAisErrorT saSendMsgReceiveReply ( - int s, - struct iovec *iov, - unsigned int iov_len, - void *responseMessage, - int responseLen); - -s is the socket used to send the request and receive the response. -iov is the I/O vector to send. -iov_len is the number of elements in iov. -responseMessage is the buffer used to store the response. -responseLen is the length of the response that is expected to be received. - -Waiting on a file descriptor with the poll system call is done with the API: - -SaErrorT -saPollRetry ( - struct pollfd *ufds, - unsigned int nfds, - int timeout); - -where the parameters are the standard poll parameters. - -Messages can be received out of order searching for a specific message id with: - ----------- - messages ----------- -Please follow the style of the existing messages. Debugging is much easier -when a parallel style is used. - -A new service should be added to the service_types enumeration in ipc_gen.h or, in the -case of an external project, a number should be registered with the project. - -enum service_types { - EVS_SERVICE = 0, - CLM_SERVICE = 1, - AMF_SERVICE = 2, - CKPT_SERVICE = 3, - EVT_SERVICE = 4, - LCK_SERVICE = 5, - MSG_SERVICE = 6, - CFG_SERVICE = 7, - CPG_SERVICE = 8 -}; - -Each library should have an ipc_APINAME.h file in include. It should define -request types and response types. - -These are the request CLM message identifiers:
- -enum req_clm_types { - MESSAGE_REQ_CLM_TRACKSTART = 0, - MESSAGE_REQ_CLM_TRACKSTOP = 1, - MESSAGE_REQ_CLM_NODEGET = 2, - MESSAGE_REQ_CLM_NODEGETASYNC = 3 -}; - -These are the response CLM message identifiers: - -enum res_clm_types { - MESSAGE_RES_CLM_TRACKCALLBACK = 0, - MESSAGE_RES_CLM_TRACKSTART = 1, - MESSAGE_RES_CLM_TRACKSTOP = 2, - MESSAGE_RES_CLM_NODEGET = 3, - MESSAGE_RES_CLM_NODEGETASYNC = 4, - MESSAGE_RES_CLM_NODEGETCALLBACK = 5 -}; - -A request header should be placed at the front of every message sent by -the library. - -typedef struct { - int size __attribute__((aligned(8))); - int id __attribute__((aligned(8))); -} mar_req_header_t __attribute__((aligned(8))); - -There is also a response message header which should start every response -message: - -typedef struct { - int size __attribute__((aligned(8))); - int id __attribute__((aligned(8))); - SaAisErrorT error __attribute__((aligned(8))); -} mar_res_header_t __attribute__((aligned(8))); - -The error member is used to pass errors from the executive to the library, -including SA_ERR_TRY_AGAIN for flow control, which is described later. - -The message source structure, also described later, records the node and the -library connection that a request came from: - -typedef struct { - mar_uint32_t nodeid __attribute__((aligned(8))); - void *conn __attribute__((aligned(8))); -} mar_message_source_t __attribute__((aligned(8))); - -This is the message for the MESSAGE_REQ_CLM_TRACKSTART identifier above: - -struct req_clm_trackstart { - mar_req_header_t header; - SaUint8T trackFlags; - SaClmClusterNotificationT *notificationBufferAddress; - SaUint32T numberOfItems; -}; - -The saClmClusterTrackStart API should create this message and send it to the -executive. - -The matching response is: - -struct res_clm_trackstart - ------------- - some notes ------------- -* Avoid doing anything tricky in the library itself. Let the executive - handler do all of the work of the system. Minimize what the API does. 
-* Once an API is developed, it must be added to the makefile. Just add - a line for the file to the EXECOBJS build line.
-* Protect I/O send/recv with a mutex. -* Always look at other libraries when there is a question about how to - do something. It has likely been thought out in another library. - -------------------------------------------------------------------------------- - adding services -------------------------------------------------------------------------------- -Services are defined by service handlers and messages described in -include/ipc_SERVICE.h. These two pieces of information are used by the -executive to dispatch the correct messages to the correct recipients. - -------------------------------- - the service handler structure -------------------------------- - -A service is added by filling in the structures defined in exec/service.h. The -structures are a little daunting: - -struct libais_handler { - int (*libais_handler_fn) (void *conn, void *msg); - int response_size; - int response_id; - enum corosync_flow_control flow_control; -}; - -The response_size, response_id, and flow_control members of a library handler are -used for flow control. If the totem message queue is full, a response message of -size response_size with the header id response_id is sent to the library, carrying -the SA_ERR_TRY_AGAIN error. Some library APIs may not need to block in this condition -(because they don't have to use totem), so they should specify -COROSYNC_FLOW_CONTROL_NOT_REQUIRED in the flow_control field. - -libais_handler_fn is the function called when a library message for this -handler is received. - -struct corosync_exec_handler { - void (*exec_handler_fn) (void *msg, unsigned int nodeid); - void (*exec_endian_convert_fn) (void *msg); -}; - -exec_handler_fn is the function called when an executive message for this -handler is delivered. - -exec_endian_convert_fn is the function called to convert the endianness -of the executive message. Note that messages are not converted to a fixed byte order -before transmission.
Instead they are transmitted in either big endian or -little endian depending on the byte order of the transmitter and converted to -the host machine order on receipt of the message. - -struct corosync_service_handler { - unsigned char *name; - unsigned short id; - unsigned int private_data_size; - int (*lib_init_fn) (void *conn); - int (*lib_exit_fn) (void *conn); - struct corosync_lib_handler *lib_service; - int lib_service_count; - struct corosync_exec_handler *exec_service; - int (*exec_init_fn) (struct objdb_iface_ver0 *); - int (*config_init_fn) (struct objdb_iface_ver0 *); - void (*exec_dump_fn) (void); - int exec_service_count; - void (*confchg_fn) ( - enum totem_configuration_type configuration_type, - const unsigned int *member_list, size_t member_list_entries, - const unsigned int *left_list, size_t left_list_entries, - const unsigned int *joined_list, size_t joined_list_entries, - const struct memb_ring_id *ring_id); - void (*sync_init) (void); - int (*sync_process) (void); - void (*sync_activate) (void); - void (*sync_abort) (void); -}; - -name is the name of the service. - -id is the identifier of the service. - -private_data_size is the size of the private data used by the connection -which the library and executive handlers can reference. - -lib_init_fn is the function executed when a library connection is made to -the service handler. - -lib_exit_fn is the function executed when a library connection is exited -either because the application closed the file descriptor, or the OS -closed the file descriptor. - -lib_service is an array of corosync_lib_handler data structures which define -the library service handler. - -lib_service_count is the number of elements in lib_service. - -exec_service is an array of corosync_exec_handler data structures which define -the executive service handler. - -exec_init_fn is a function used to initialize the executive service. This -is only called once. 
- -config_init_fn is called to parse config files and populate the object -database. - -exec_dump_fn is called when SIGUSR2 is sent to the executive to dump the -current state of the service. - -exec_service_count is the number of entries in the exec_service array. - -confchg_fn is called every time a configuration change occurs. - -sync_init is called when the service should begin synchronization. - -sync_process is called to process synchronization messages. - -sync_activate is called to activate the current service synchronization. - -sync_abort is called to abort the current service synchronization. - --------------- - flow control --------------- -The totem protocol includes flow control so that it doesn't send too many -messages when the network is completely full. However, a library can -still send messages to the executive much faster than the executive can send -them over totem, so the library relies on the group messaging flow control to -throttle the messages it sends. If the totem queues are full, -no more messages may be sent; the executive (in ipc.c) automatically detects -this scenario and returns an SA_ERR_TRY_AGAIN error. - -When a library gets SA_ERR_TRY_AGAIN, the library may either retry or return -this error to the user if the error is allowed by the API definitions. The -response_size and response_id members of the library handler are critical to -ensuring that the library reads the correct message and message size. Make sure -the libais_handler entry matches the messages used in the handler function. - ------------------------------------------------- - dynamically linking the service handler plugin ------------------------------------------------- - -The service handler needs some special magic to be dynamically linked into -corosync:
-
-/*
- * Dynamic loader definition
- */
-static struct corosync_service_handler *clm_get_service_handler_ver0 (void);
-
-static struct corosync_service_handler_iface_ver0 clm_service_handler_iface = {
-	.corosync_get_service_handler_ver0 = clm_get_service_handler_ver0
-};
-
-static struct lcr_iface corosync_clm_ver0[1] = {
-	{
-		.name = "corosync_clm",
-		.version = 0,
-		.versions_replace = 0,
-		.versions_replace_count = 0,
-		.dependencies = 0,
-		.dependency_count = 0,
-		.constructor = NULL,
-		.destructor = NULL,
-		.interfaces = NULL
-	}
-};
-
-static struct lcr_comp clm_comp_ver0 = {
-	.iface_count = 1,
-	.ifaces = corosync_clm_ver0
-};
-
-static struct corosync_service_handler *clm_get_service_handler_ver0 (void)
-{
-	return (&clm_service_handler);
-}
-
-__attribute__ ((constructor)) static void clm_comp_register (void) {
-	lcr_interfaces_set (&corosync_clm_ver0[0], &clm_service_handler_iface);
-
-	lcr_component_register (&clm_comp_ver0);
-}
-
-Once this code is added (substitute clm for the service being implemented),
-the service will be loaded if it is in the default services list.
-
-The default service list is specified in service.c:default_services.  If
-creating an external plugin, there are configuration parameters which may
-be used to add your plugin into the corosync scanning of plugins.
-
-----------------------------------
- Connection specific information
-----------------------------------
-Every connection may have connection-specific information if
-private_data_size is greater than zero for the service handler.  This allows
-each library connection to maintain private state for that connection.
-The private data for a connection can be retrieved with:
-struct service_pd *service_pd = (struct service_pd *)corosync_conn_private_data_get (conn);
-
-where service is the name of the service implemented and conn is the connection
-information, typically passed into the library handler or stored in a
-message_source structure for later use by an executive handler.
-
-------------------------------
- sending responses to the api
-------------------------------
-
-A message is sent to the library from the executive message handler using
-the function:
-
-extern int corosync_conn_send_response (void *conn_info, void *msg,
-	int mlen);
-
-conn_info is passed into the library message handler or stored in the
-executive message.  It identifies the connection on which the response is sent.
-
-msg is the message to send.
-mlen is the length of the message to send.
-
-Keep in mind that struct res_message should be at the beginning of the response
-message so that it follows the style used in the rest of corosync.
-
---------------------------------------------
- deferring response to an executive message
---------------------------------------------
-
-The message source structure is used to store information about the source of a
-message so a later executive message can respond to a library request.
-In a library handler, the source field should be set up with:
-
-message_source_set (&req_exec_ZZZZZZZ.source, conn);
-gmi_mcast (req_exec_ZZZZZZZ);
-
-In this case conn is passed into the library message handler.
-
-Then the executive message handler determines if this processor is responsible
-for responding, using the source stored in the received message:
-
-if (message_source_is_local (&req_exec_ZZZZZZZ->source)) {
-	corosync_conn_send_response ();
-}
-
---------------
- Using totempg
---------------
-The totempg interface supports multiple users at one time, and if you need
-to use the full totempg interface (defined in totempg.h), please ask for
-assistance on the mailing list.  If you simply want to use multicast
-transmissions in corosync to send a message to every processor, including the
-local processor (self delivery), according to virtual synchrony semantics, use:
-
-	/* keep the call outside assert () so it still runs when NDEBUG is set */
-	int res = totempg_groups_mcast_joined (corosync_group_handle,
-		&req_exec_clm_iovec, 1, TOTEMPG_AGREED);
-	assert (res == 0);
-
------------------
- library handler
------------------
-Every library handler has the prototype:
-
-static int message_handler_req_clm_init (void *conn, void *msg);
-
-The start of the handler function should look something like this:
-
-static int message_handler_req_clm_trackstart (void *conn, void *msg)
-{
-	struct req_clm_trackstart *req_clm_trackstart =
-		(struct req_clm_trackstart *)msg;
-
-	/* package up the library handler message into an executive message */
-	/* multicast the message using the totempg interface */
-}
-
-This assigns the void *msg to a structure that can be used by the
-library handler.
-
-The conn parameter identifies the connection the response should be sent to.
-Use the technique described in "deferring response to an executive message"
-to have the executive handler respond to the message.
-
-Avoid doing anything tricky in a library handler.  Do all the work in the
-executive handler at first.  If it later turns out to be possible to optimize,
-optimize away.
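The library handler and deferred-response pattern described above can be sketched as a small self-contained C program. This is a minimal sketch under stated assumptions, not corosync code: struct message_source, the request layouts, local_nodeid, mcast_stub (standing in for totempg_groups_mcast_joined), and responded_conn (standing in for corosync_conn_send_response) are all hypothetical, and the nodeid comparison is only assumed to play the role of message_source_is_local.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for corosync's message source and request types;
 * field names and layouts are assumptions for illustration only. */
struct message_source {
	void *conn;          /* library connection that issued the request */
	unsigned int nodeid; /* node on which that connection lives */
};

struct req_clm_trackstart {
	int size;
	unsigned char track_flags;
};

struct req_exec_clm_trackstart {
	int size;
	struct message_source source;
	unsigned char track_flags;
};

static const unsigned int local_nodeid = 1; /* assumed id of this node */

/* Hypothetical multicast stub: "delivers" the message locally, as if
 * this node were the only member of the ring. */
static struct req_exec_clm_trackstart delivered;

static int mcast_stub (const void *buf, size_t len)
{
	if (len != sizeof (delivered))
		return -1;
	memcpy (&delivered, buf, len);
	return 0;
}

/* Hypothetical response stub: remembers which connection got a reply. */
static void *responded_conn;

/* Library handler: no real work, just package and multicast. */
static int message_handler_req_clm_trackstart (void *conn, void *msg)
{
	struct req_clm_trackstart *req_clm_trackstart =
		(struct req_clm_trackstart *)msg;
	struct req_exec_clm_trackstart req_exec;

	assert (req_clm_trackstart != NULL);
	memset (&req_exec, 0, sizeof (req_exec));
	req_exec.size = (int)sizeof (req_exec);
	/* Record the source so the executive handler can respond later. */
	req_exec.source.conn = conn;
	req_exec.source.nodeid = local_nodeid;
	req_exec.track_flags = req_clm_trackstart->track_flags;

	return mcast_stub (&req_exec, sizeof (req_exec));
}

/* Executive handler: runs on every node; only the originating node replies. */
static void message_handler_req_exec_clm_trackstart (void *msg,
	unsigned int nodeid)
{
	struct req_exec_clm_trackstart *req_exec =
		(struct req_exec_clm_trackstart *)msg;

	(void)nodeid;
	/* ... update the replicated state here, on every node ... */
	if (req_exec->source.nodeid == local_nodeid)
		responded_conn = req_exec->source.conn;
}
```

In corosync the executive handler would run on every node once totem delivers the message in agreed order; the stub here simply delivers it locally so the round trip can be followed in one process.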
-
--------------------
- executive handler
--------------------
-Every executive handler has the prototype:
-
-static int message_handler_req_exec_clm_nodejoin (void *msg,
-	unsigned int nodeid);
-
-The start of the handler function should look something like this:
-
-static int message_handler_req_exec_clm_nodejoin (void *msg,
-	unsigned int nodeid)
-{
-	struct req_exec_clm_nodejoin *req_exec_clm_nodejoin =
-		(struct req_exec_clm_nodejoin *)msg;
-
-	/* do the real work of executing the request; this runs on every node */
-}
-
-The conn_info structure is not available.  If it is needed, it can be stored
-in a source structure in the message sent by the library message handler.
-
-The msg parameter contains the message sent by the library handler.
-
-The nodeid parameter is a unique identifier of the node that originated the
-message.
-
------------------
- the lib_init_fn
------------------
-This should be used to initialize any state for the connection.
-
------------------
- the lib_exit_fn
------------------
-This function is called every time a service connection is disconnected by
-the executive.  Free memory, change structures, or do whatever other work is
-needed to clean up.
-
-If exit_fn cannot complete because it is waiting for some event, it may
-return -1, which allows the executive to make some forward progress.  Then
-exit_fn will be called again.  Return 0 when the exit has completed.  This is
-most useful when totem should be used to queue a message, but the queue is
-full.  In this case, waiting a few more seconds may open up the queue, so
-return -1, and then the executive will try again to call exit_fn.  Do NOT
-return -1 forever or the ais executive will spin.
-
-If -1 is returned, ENSURE that the state of the library hasn't changed so much
-that exit_fn cannot be called again.  If exit_fn returns -1, it WILL be called
-again, so expect it in the code.
-
----------------
- the confchg_fn
----------------
-This function is called whenever a configuration change occurs.  Some
-services may not need this function, while others may.  This is a good way
-to sync up joining nodes with the current state of the information stored
-on a particular processor.
-
---------------------------------------------------------------------------------
-Final comments
---------------------------------------------------------------------------------
-GDB is your friend, especially the "where" command.  But it stops execution,
-which has the nasty side effect of killing the current configuration.  In this
-case GDB may become your enemy.
-
-printf is your friend when GDB is your enemy.
-
-If you are stuck, ask on the mailing list and send your patches.  A lot of time
-has been spent designing corosync, and even more time debugging it.  There are
-people who can help you debug problems, especially around things like message
-delivery.
-
-Submit patches early to get feedback, especially around things like parallel
-style.  Parallel style is very important to ensure maintainability by the
-corosync community.
-
-If this document is wrong or incomplete, complain so we can get it fixed
-for other people.
-
-Have fun!