diff --git a/doc/sphinx/Pacemaker_Development/c.rst b/doc/sphinx/Pacemaker_Development/c.rst index 3f3c9746fb..076e022972 100644 --- a/doc/sphinx/Pacemaker_Development/c.rst +++ b/doc/sphinx/Pacemaker_Development/c.rst @@ -1,1129 +1,1129 @@ .. index:: single: C pair: C; guidelines C Coding Guidelines ------------------- Pacemaker is a large project accepting contributions from developers with a wide range of skill levels and organizational affiliations, and maintained by multiple people over long periods of time. Following consistent guidelines makes reading, writing, and reviewing code easier, and helps avoid common mistakes. Some existing Pacemaker code does not follow these guidelines, for historical reasons and API backward compatibility, but new code should. Code Organization ################# Pacemaker's C code is organized as follows: +-----------------+-----------------------------------------------------------+ | Directory | Contents | +=================+===========================================================+ | daemons | the Pacemaker daemons (pacemakerd, pacemaker-based, etc.) | +-----------------+-----------------------------------------------------------+ | include | header files for library APIs | +-----------------+-----------------------------------------------------------+ | lib | libraries | +-----------------+-----------------------------------------------------------+ | tools | command-line tools | +-----------------+-----------------------------------------------------------+ Source file names should be unique across the entire project, to allow for individual tracing via ``PCMK_trace_files``. .. index:: single: C; library single: C library Pacemaker Libraries ################### +---------------+---------+---------------+---------------------------+-------------------------------------+ | Library | Symbol | Source | API Headers | Description | | | prefix | location | | | +===============+=========+===============+===========================+=====================================+ | libcib | cib | lib/cib | | include/crm/cib.h | .. index:: | | | | | | include/crm/cib/* | single: C library; libcib | | | | | | single: libcib | | | | | | | | | | | | API for pacemaker-based IPC and | | | | | | the CIB | +---------------+---------+---------------+---------------------------+-------------------------------------+ | libcrmcluster | pcmk | lib/cluster | | include/crm/cluster.h | .. index:: | | | | | | include/crm/cluster/* | single: C library; libcrmcluster | | | | | | single: libcrmcluster | | | | | | | | | | | | Abstract interface to underlying | | | | | | cluster layer | +---------------+---------+---------------+---------------------------+-------------------------------------+ | libcrmcommon | pcmk | lib/common | | include/crm/common/* | .. index:: | | | | | | some of include/crm/* | single: C library; libcrmcommon | | | | | | single: libcrmcommon | | | | | | | | | | | | Everything else | +---------------+---------+---------------+---------------------------+-------------------------------------+ | libcrmservice | svc | lib/services | | include/crm/services.h | .. index:: | | | | | | single: C library; libcrmservice | | | | | | single: libcrmservice | | | | | | | | | | | | Abstract interface to supported | | | | | | resource types (OCF, LSB, etc.) | +---------------+---------+---------------+---------------------------+-------------------------------------+ | liblrmd | lrmd | lib/lrmd | | include/crm/lrmd*.h | .. index:: | | | | | | single: C library; liblrmd | | | | | | single: liblrmd | | | | | | | | | | | | API for pacemaker-execd IPC | +---------------+---------+---------------+---------------------------+-------------------------------------+ | libpacemaker | pcmk | lib/pacemaker | | include/pacemaker*.h | .. index:: | | | | | | include/pcmki/* | single: C library; libpacemaker | | | | | | single: libpacemaker | | | | | | | | | | | | High-level APIs equivalent to | | | | | | command-line tool capabilities | | | | | | (and high-level internal APIs) | +---------------+---------+---------------+---------------------------+-------------------------------------+ | libpe_rules | pe | lib/pengine | | include/crm/pengine/* | .. index:: | | | | | | single: C library; libpe_rules | | | | | | single: libpe_rules | | | | | | | -| | | | | Scheduler functionality related | -| | | | | to evaluating rules | +| | | | | Deprecated APIs related to | +| | | | | evaluating rules | +---------------+---------+---------------+---------------------------+-------------------------------------+ | libpe_status | pe | lib/pengine | | include/crm/pengine/* | .. index:: | | | | | | single: C library; libpe_status | | | | | | single: libpe_status | | | | | | | | | | | | Low-level scheduler functionality | +---------------+---------+---------------+---------------------------+-------------------------------------+ | libstonithd | stonith | lib/fencing | | include/crm/stonith-ng.h| .. index:: | | | | | | include/crm/fencing/* | single: C library; libstonithd | | | | | | single: libstonithd | | | | | | | | | | | | API for pacemaker-fenced IPC | +---------------+---------+---------------+---------------------------+-------------------------------------+ Public versus Internal APIs ___________________________ Pacemaker libraries have both internal and public APIs. Internal APIs are those used only within Pacemaker; public APIs are those offered (via header files and documentation) for external code to use. Generic functionality needed by Pacemaker itself, such as string processing or XML processing, should remain internal, while functions providing useful high-level access to Pacemaker capabilities should be public. When in doubt, keep APIs internal, because it's easier to expose a previously internal API than hide a previously public API. Internal APIs can be changed as needed. The public API/ABI should maintain a degree of stability so that external applications using it do not need to be rewritten or rebuilt frequently. Many OSes/distributions avoid breaking API/ABI compatibility within a major release, so if Pacemaker breaks compatibility, that significantly delays when OSes can package the new version. Therefore, changes to public APIs should be backward-compatible (as detailed throughout this chapter), unless we are doing a (rare) release where we specifically intend to break compatibility. External applications known to use Pacemaker's public C API include `sbd `_ and dlm_controld. .. index:: pair: C; naming API Symbol Naming _________________ Exposed API symbols (non-``static`` function names, ``struct`` and ``typedef`` names in header files, etc.) must begin with the prefix appropriate to the library (shown in the table at the beginning of this section). This reduces the chance of naming collisions when external software links against the library. The prefix is usually lowercase but may be all-caps for some defined constants and macros. Public API symbols should follow the library prefix with a single underbar (for example, ``pcmk_something``), and internal API symbols with a double underbar (for example, ``pcmk__other_thing``). File-local symbols (such as static functions) and non-library code do not require a prefix, though a unique prefix indicating an executable (controld, crm_mon, etc.) can be helpful when symbols are shared between multiple source files for the executable. API Header File Naming ______________________ * Internal API headers should be named ending in ``_internal.h``, in the same location as public headers, with the exception of libpacemaker, which for historical reasons keeps internal headers in ``include/pcmki/pcmki_*.h``). * If a library needs to share symbols just within the library, header files for these should be named ending in ``_private.h`` and located in the library source directory (not ``include``). Such functions should be declared as ``G_GNUC_INTERNAL``, to aid compiler efficiency (glib defines this symbol appropriately for the compiler). Header files that are not library API are kept in the same directory as the source code they're included from. The easiest way to tell what kind of API a symbol is, is to see where it's declared. If it's in a public header, it's public API; if it's in an internal header, it's internal API; if it's in a library-private header, it's library-private API; otherwise, it's not an API. .. index:: pair: C; API documentation single: Doxygen API Documentation _________________ Pacemaker uses `Doxygen `_ to automatically generate its `online API documentation `_, so all public API (header files, functions, structs, enums, etc.) should be documented with Doxygen comment blocks. Other code may be documented in the same way if desired, with an ``\internal`` tag in the Doxygen comment. Simple example of an internal function with a Doxygen comment block: .. code-block:: c /*! * \internal * \brief Return string length plus 1 * * Return the number of characters in a given string, plus one. * * \param[in] s A string (must not be NULL) * * \return The length of \p s plus 1. */ static int f(const char *s) { return strlen(s) + 1; } Function arguments are marked as ``[in]`` for input only, ``[out]`` for output only, or ``[in,out]`` for both input and output. ``[in,out]`` should be used for struct pointer arguments if the function can change any data accessed via the pointer. For example, if the struct contains a ``GHashTable *`` member, the argument should be marked as ``[in,out]`` if the function inserts data into the table, even if the struct members themselves are not changed. However, an argument is not ``[in,out]`` if something reachable via the argument is modified via a separate argument. For example, both ``pcmk_resource_t`` and ``pcmk_node_t`` contain pointers to their ``pcmk_scheduler_t`` and thus indirectly to each other, but if the function modifies the resource via the resource argument, the node argument does not have to be ``[in,out]``. Public API Deprecation ______________________ Public APIs may not be removed in most Pacemaker releases, but they may be deprecated. When a public API is deprecated, it is moved to a header whose name ends in ``compat.h``. The original header includes the compatibility header only if the ``PCMK_ALLOW_DEPRECATED`` symbol is undefined or defined to 1. This allows external code to continue using the deprecated APIs, but internal code is prevented from using them because the ``crm_internal.h`` header defines the symbol to 0. .. index:: pair: C; boilerplate pair: license; C pair: copyright; C C Boilerplate ############# Every C file should start with a short copyright and license notice: .. code-block:: c /* * Copyright the Pacemaker project contributors * * The version control history for this file may have further details. * * This source code is licensed under WITHOUT ANY WARRANTY. */ ** should follow the policy set forth in the `COPYING `_ file, generally one of "GNU General Public License version 2 or later (GPLv2+)" or "GNU Lesser General Public License version 2.1 or later (LGPLv2.1+)". Header files should additionally protect against multiple inclusion by defining a unique symbol of the form ``PCMK____H``, and declare C compatibility for inclusion by C++. For example: .. code-block:: c #ifndef PCMK__MY_HEADER__H #define PCMK__MY_HEADER__H // put #include directives here #ifdef __cplusplus extern "C" { #endif // put header code here #ifdef __cplusplus } #endif #endif // PCMK__MY_HEADER__H Public API header files should give a Doxygen file description at the top of the header code. For example: .. code-block:: c /*! * \file * \brief My brief description here * \ingroup core */ .. index:: pair: C; whitespace Line Formatting ############### * Indentation must be 4 spaces, no tabs. * Do not leave trailing whitespace. * Lines should be no longer than 80 characters unless limiting line length hurts readability. .. index:: pair: C; comment Comments ######## .. code-block:: c /* Single-line comments may look like this */ // ... or this /* Multi-line comments should start immediately after the comment opening. * Subsequent lines should start with an aligned asterisk. The comment * closing should be aligned and on a line by itself. */ .. index:: pair: C; operator Operators ######### .. code-block:: c // Operators have spaces on both sides x = a; /* (1) Do not rely on operator precedence; use parentheses when mixing * operators with different priority, for readability. * (2) No space is used after an opening parenthesis or before a closing * parenthesis. */ x = a + b - (c * d); .. index:: single: C; if single: C; else single: C; while single: C; for single: C; switch Control Statements (if, else, while, for, switch) ################################################# .. code-block:: c /* * (1) The control keyword is followed by a space, a left parenthesis * without a space, the condition, a right parenthesis, a space, and the * opening bracket on the same line. * (2) Always use braces around control statement blocks, even if they only * contain one line. This makes code review diffs smaller if a line gets * added in the future, and avoids the chance of bad indenting making a * line incorrectly appear to be part of the block. * (3) The closing bracket is on a line by itself. */ if (v < 0) { return 0; } /* "else" and "else if" are on the same line with the previous ending brace * and next opening brace, separated by a space. Blank lines may be used * between blocks to help readability. */ if (v > 0) { return 0; } else if (a == 0) { return 1; } else { return 2; } /* Do not use assignments in conditions. This ensures that the developer's * intent is always clear, makes code reviews easier, and reduces the chance * of using assignment where comparison is intended. */ // Do this ... a = f(); if (a) { return 0; } // ... NOT this if (a = f()) { return 0; } /* It helps readability to use the "!" operator only in boolean * comparisons, and explicitly compare numeric values against 0, * pointers against NULL, etc. This helps remind the reader of the * type being compared. */ int i = 0; char *s = NULL; bool cond = false; if (!cond) { return 0; } if (i == 0) { return 0; } if (s == NULL) { return 0; } /* In a "switch" statement, indent "case" one level, and indent the body of * each "case" another level. */ switch (expression) { case 0: command1; break; case 1: command2; break; default: command3; break; } .. index:: pair: C; macro Macros ###### Macros are a powerful but easily misused feature of the C preprocessor, and Pacemaker uses a lot of obscure macro features. If you need to brush up, the `GCC documentation for macros `_ is excellent. Some common issues: * Beware of side effects in macro arguments that may be evaluated more than once * Always parenthesize macro arguments used in the macro body to avoid precedence issues if the argument is an expression * Multi-statement macro bodies should be enclosed in do...while(0) to make them behave more like a single statement and avoid control flow issues Often, a static inline function defined in a header is preferable to a macro, to avoid the numerous issues that plague macros and gain the benefit of argument and return value type checking. .. index:: pair: C; memory Memory Management ################# * Always use ``calloc()`` rather than ``malloc()``. It has no additional cost on modern operating systems, and reduces the severity and security risks of uninitialized memory usage bugs. * Ensure that all dynamically allocated memory is freed when no longer needed, and not used after it is freed. This can be challenging in the more event-driven, callback-oriented sections of code. * Free dynamically allocated memory using the free function corresponding to how it was allocated. For example, use ``free()`` with ``calloc()``, and ``g_free()`` with most glib functions that allocate objects. .. index:: single: C; struct Structures ########## Changes to structures defined in public API headers (adding or removing members, or changing member types) are generally not possible without breaking API compatibility. However, there are exceptions: * Public API structures can be designed such that they can be allocated only via API functions, not declared directly or allocated with standard memory functions using ``sizeof``. * This can be enforced simply by documentating the limitation, in which case new ``struct`` members can be added to the end of the structure without breaking compatibility. * Alternatively, the structure definition can be kept in an internal header, with only a pointer type definition kept in a public header, in which case the structure definition can be changed however needed. .. index:: single: C; variable Variables ######### .. index:: single: C; pointer Pointers ________ .. code-block:: c /* (1) The asterisk goes by the variable name, not the type; * (2) Avoid leaving pointers uninitialized, to lessen the impact of * use-before-assignment bugs */ char *my_string = NULL; // Use space before asterisk and after closing parenthesis in a cast char *foo = (char *) bar; .. index:: single: C; global variable Globals _______ Global variables should be avoided in libraries when possible. State information should instead be passed as function arguments (often as a structure). This is not for thread safety -- Pacemaker's use of forking ensures it will never be threaded -- but it does minimize overhead, improve readability, and avoid obscure side effects. Variable Naming _______________ Time intervals are sometimes represented in Pacemaker code as user-defined text specifications (for example, "10s"), other times as an integer number of seconds or milliseconds, and still other times as a string representation of an integer number. Variables for these should be named with an indication of which is being used (for example, use ``interval_spec``, ``interval_ms``, or ``interval_ms_s`` instead of ``interval``). .. index:: pair: C; booleans pair: C; bool pair: C; gboolean Booleans ________ Booleans in C can be represented by an integer type, ``bool``, or ``gboolean``. Integers are sometimes useful for storing booleans when they must be converted to and from a string, such as an XML attribute value (for which ``crm_element_value_int()`` can be used). Integer booleans use 0 for false and nonzero (usually 1) for true. ``gboolean`` should be used with glib APIs that specify it. ``gboolean`` should always be used with glib's ``TRUE`` and ``FALSE`` constants. Otherwise, ``bool`` should be preferred. ``bool`` should be used with the ``true`` and ``false`` constants from the ``stdbool.h`` header. Do not use equality operators when testing booleans. For example: .. code-block:: c // Do this if (bool1) { fn(); } if (!bool2) { fn2(); } // Not this if (bool1 == true) { fn(); } if (bool2 == false) { fn2(); } // Otherwise there's no logical end ... if ((bool1 == false) == true) { fn(); } .. index:: pair: C; strings String Handling ############### Define Constants for Magic Strings __________________________________ A "magic" string is one used for control purposes rather than human reading, and which must be exactly the same every time it is used. Examples would be configuration option names, XML attribute names, or environment variable names. These should always be defined constants, rather than using the string literal everywhere. If someone mistypes a defined constant, the code won't compile, but if they mistype a literal, it could go unnoticed until a user runs into a problem. String-Related Library Functions ________________________________ Pacemaker's libcrmcommon has a large number of functions to assist in string handling. The most commonly used ones are: * ``pcmk__str_eq()`` tests string equality (similar to ``strcmp()``), but can handle NULL, and takes options for case-insensitive, whether NULL should be considered a match, etc. * ``crm_strdup_printf()`` takes ``printf()``-style arguments and creates a string from them (dynamically allocated, so it must be freed with ``free()``). It asserts on memory failure, so the return value is always non-NULL. String handling functions should almost always be internal API, since Pacemaker isn't intended to be used as a general-purpose library. Most are declared in ``include/crm/common/strings_internal.h``. ``util.h`` has some older ones that are public API (for now, but will eventually be made internal). char*, gchar*, and GString __________________________ When using dynamically allocated strings, be careful to always use the appropriate free function. * ``char*`` strings allocated with something like ``calloc()`` must be freed with ``free()``. Most Pacemaker library functions that allocate strings use this implementation. * glib functions often use ``gchar*`` instead, which must be freed with ``g_free()``. * Occasionally, it's convenient to use glib's flexible ``GString*`` type, which must be freed with ``g_string_free()``. .. index:: pair: C; regular expression Regular Expressions ___________________ - Use ``REG_NOSUB`` with ``regcomp()`` whenever possible, for efficiency. - Be sure to use ``regfree()`` appropriately. .. index:: single: C; enum Enumerations ############ * Enumerations should not have a ``typedef``, and do not require any naming convention beyond what applies to all exposed symbols. * New values should usually be added to the end of public API enumerations, because the compiler will define the values to 0, 1, etc., in the order given, and inserting a value in the middle would change the numerical values of all later values, breaking code compiled with the old values. However, if enum numerical values are explicitly specified rather than left to the compiler, new values can be added anywhere. * When defining constant integer values, enum should be preferred over ``#define`` or ``const`` when possible. This allows type checking without consuming memory. Flag groups ___________ Pacemaker often uses flag groups (also called bit fields or bitmasks) for a collection of boolean options (flags/bits). This is more efficient for storage and manipulation than individual booleans, but its main advantage is when used in public APIs, because using another bit in a bitmask is backward compatible, whereas adding a new function argument (or sometimes even a structure member) is not. .. code-block:: c #include /* (1) Define an enumeration to name the individual flags, for readability. * An enumeration is preferred to a series of "#define" constants * because it is typed, and logically groups the related names. * (2) Define the values using left-shifting, which is more readable and * less error-prone than hexadecimal literals (0x0001, 0x0002, 0x0004, * etc.). * (3) Using a comma after the last entry makes diffs smaller for reviewing * if a new value needs to be added or removed later. */ enum pcmk__some_bitmask_type { pcmk__some_value = (1 << 0), pcmk__other_value = (1 << 1), pcmk__another_value = (1 << 2), }; /* The flag group itself should be an unsigned type from stdint.h (not * the enum type, since it will be a mask of the enum values and not just * one of them). uint32_t is the most common, since we rarely need more than * 32 flags, but a smaller or larger type could be appropriate in some * cases. */ uint32_t flags = pcmk__some_value|pcmk__other_value; /* If the values will be used only with uint64_t, define them accordingly, * to make compilers happier. */ enum pcmk__something_else { pcmk__whatever = (UINT64_C(1) << 0), }; We have convenience functions for checking flags (see ``pcmk_any_flags_set()``, ``pcmk_all_flags_set()``, and ``pcmk_is_set()``) as well as setting and clearing them (see ``pcmk__set_flags_as()`` and ``pcmk__clear_flags_as()``, usually used via wrapper macros defined for specific flag groups). These convenience functions should be preferred to direct bitwise arithmetic, for readability and logging consistency. .. index:: pair: C; function Functions ######### Function Naming _______________ Function names should be unique across the entire project, to allow for individual tracing via ``PCMK_trace_functions``, and make it easier to search code and follow detail logs. .. _sort_func: Sorting ^^^^^^^ A function that sorts an entire list should have ``sort`` in its name. It sorts elements using a :ref:`comparison ` function, which may be either hard-coded or passed as an argument. .. _compare_func: Comparison ^^^^^^^^^^ A comparison function for :ref:`sorting ` should have ``cmp`` in its name and should *not* have ``sort`` in its name. .. _constructor_func: Constructors ^^^^^^^^^^^^ A constructor creates a new dynamically allocated object. It may perform some initialization procedure on the new object. * If the constructor always creates an independent object instance, its name should include ``new``. * If the constructor may add the new object to some existing object, its name should include ``create``. Function Definitions ____________________ .. code-block:: c /* * (1) The return type goes on its own line * (2) The opening brace goes by itself on a line * (3) Use "const" with pointer arguments whenever appropriate, to allow the * function to be used by more callers. */ int my_func1(const char *s) { return 0; } /* Functions with no arguments must explicitly list them as void, * for compatibility with strict compilers */ int my_func2(void) { return 0; } /* * (1) For functions with enough arguments that they must break to the next * line, align arguments with the first argument. * (2) When a function argument is a function itself, use the pointer form. * (3) Declare functions and file-global variables as ``static`` whenever * appropriate. This gains a slight efficiency in shared libraries, and * helps the reader know that it is not used outside the one file. */ static int my_func3(int bar, const char *a, const char *b, const char *c, void (*callback)()) { return 0; } Return Values _____________ Functions that need to indicate success or failure should follow one of the following guidelines. More details, including functions for using them in user messages and converting from one to another, can be found in ``include/crm/common/results.h``. * A **standard Pacemaker return code** is one of the ``pcmk_rc_*`` enum values or a system errno code, as an ``int``. * ``crm_exit_t`` (the ``CRM_EX_*`` enum values) is a system-independent code suitable for the exit status of a process, or for interchange between nodes. * Other special-purpose status codes exist, such as ``enum ocf_exitcode`` for the possible exit statuses of OCF resource agents (along with some Pacemaker-specific extensions). It is usually obvious when the context calls for such. * Some older Pacemaker APIs use the now-deprecated "legacy" return values of ``pcmk_ok`` or the positive or negative value of one of the ``pcmk_err_*`` constants or system errno codes. * Functions registered with external libraries (as callbacks for example) should use the appropriate signature defined by those libraries, rather than follow Pacemaker guidelines. Of course, functions may have return values that aren't success/failure indicators, such as a pointer, integer count, or bool. :ref:`Comparison ` functions should return * a negative integer if the first argument should sort first * 0 if its arguments are equal for sorting purposes * a positive integer is the second argument should sort first Public API Functions ____________________ Unless we are doing a (rare) release where we break public API compatibility, new public API functions can be added, but existing function signatures (return type, name, and argument types) should not be changed. To work around this, an existing function can become a wrapper for a new function. .. index:: pair: C; logging pair: C; output Logging and Output ################## Logging Vs. Output __________________ Log messages and output messages are logically similar but distinct. Oversimplifying a bit, daemons log, and tools output. Log messages are intended to help with troubleshooting and debugging. They may have a high level of technical detail, and are usually filtered by severity -- for example, the system log by default gets messages of notice level and higher. Output is intended to let the user know what a tool is doing, and is generally terser and less technical, and may even be parsed by scripts. Output might have "verbose" and "quiet" modes, but it is not filtered by severity. Common Guidelines for All Messages __________________________________ * When format strings are used for derived data types whose implementation may vary across platforms (``pid_t``, ``time_t``, etc.), the safest approach is to use ``%lld`` in the format string, and cast the value to ``long long``. * Do not rely on ``%s`` handling ``NULL`` values properly. While the standard library functions might, not all functions using printf-style formatting does, and it's safest to get in the habit of always ensuring format values are non-NULL. If a value can be NULL, the ``pcmk__s()`` function is a convenient way to say "this string if not NULL otherwise this default". * The convenience macros ``pcmk__plural_s()`` and ``pcmk__plural_alt()`` are handy when logging a word that may be singular or plural. Log Levels __________ When to use each log level: * **critical:** fatal error (usually something that would make a daemon exit) * **error:** failure of something that affects the cluster (such as a resource action, fencing action, etc.) or daemon operation * **warning:** minor, potential, or recoverable failures (such as something only affecting a daemon client, or invalid configuration that can be left to default) * **notice:** important successful events (such as a node joining or leaving, resource action results, or configuration changes) * **info:** events that would be helpful with troubleshooting (such as status section updates or elections) * **debug:** information that would be helpful for debugging code or complex problems * **trace:** like debug but for very noisy or low-level stuff By default, critical through notice are logged to the system log and detail log, info is logged to the detail log only, and debug and trace are not logged (if enabled, they go to the detail log only). Logging _______ Pacemaker uses libqb for logging, but wraps it with a higher level of functionality (see ``include/crm/common/logging*h``). A few macros ``crm_err()``, ``crm_warn()``, etc. do most of the heavy lifting. By default, Pacemaker sends logs at notice level and higher to the system log, and logs at info level and higher to the detail log (typically ``/var/log/pacemaker/pacemaker.log``). The intent is that most users will only ever need the system log, but for deeper troubleshooting and developer debugging, the detail log may be helpful, at the cost of being more technical and difficult to follow. The same message can have more detail in the detail log than in the system log, using libqb's "extended logging" feature: .. code-block:: c /* The following will log a simple message in the system log, like: warning: Action failed: Node not found with extra detail in the detail log, like: warning: Action failed: Node not found | rc=-1005 id=hgjjg-51006 */ crm_warn("Action failed: %s " QB_XS " rc=%d id=%s", pcmk_rc_str(rc), rc, id); Assertion Logging _________________ ``pcmk__assert(expr)`` If ``expr`` is false, this will call ``crm_err()`` with a "Triggered fatal assertion" message (with details), then abort execution. This should be used for logic errors that should be impossible (such as a NULL function argument where not accepted) and environmental errors that can't be handled gracefully (for example, memory allocation failures, though returning ``ENOMEM`` is often better). ``CRM_LOG_ASSERT(expr)`` If ``expr`` is false, this will generally log a message without aborting. If the log level is below trace, it just calls ``crm_err()`` with a "Triggered assert" message (with details). If the log level is trace, and the caller is a daemon, then it will fork a child process in which to dump core, as well as logging the message. If the log level is trace, and the caller is not a daemon, then it will behave like ``pcmk__assert()`` (i.e. log and abort). This should be used for logic or protocol errors that require no special handling. ``CRM_CHECK(expr, failed_action)`` If ``expr`` is false, behave like ``CRM_LOG_ASSERT(expr)`` (that is, log a message and dump core if requested) then perform ``failed_action`` (which must not contain ``continue``, ``break``, or ``errno``). This should be used for logic or protocol errors that can be handled, usually by returning an error status. Output ______ Pacemaker has a somewhat complicated system for tool output. The main benefit is that the user can select the output format with the ``--output-as`` option (usually "text" for human-friendly output or "xml" for reliably script-parsable output, though ``crm_mon`` additionally supports "console" and "html"). A custom message can be defined with a unique string identifier, plus implementation functions for each supported format. The caller invokes the message using the identifier. The user selects the output format via ``--output-as``, and the output code automatically calls the appropriate implementation function. Custom messages are useful when you want to output messages that are more complex than a one-line error or informational message, reproducible, and automatically handled by the output formatting system. Custom messages can contain other custom messages. Custom message functions are implemented as follows: Start with the macro ``PCMK__OUTPUT_ARGS``, whose arguments are the message name, followed by the arguments to the message. Then there is the function declaration, for which the arguments are the pointer to the current output object, then a variable argument list. To output a custom message, you first need to create, i.e. register, the custom message that you want to output. Either call ``register_message``, which registers a custom message at runtime, or make use of the collection of predefined custom messages in ``fmt_functions``, which is defined in ``lib/pacemaker/pcmk_output.c``. Once you have the message to be outputted, output it by calling ``message``. Note: The ``fmt_functions`` functions accommodate all of the output formats; the default implementation accommodates any format that isn't explicitly accommodated. The default output provides valid output for any output format, but you may still want to implement a specific output, i.e. xml, text, or html. The ``message`` function automatically knows which implementation to use, because the ``pcmk__output_s`` contains this information. The interface (most importantly ``pcmk__output_t``) is declared in ``include/crm/common/output*h``. See the API comments and existing tools for examples. Some of its important member functions are ``err``, which formats error messages and ``info``, which formats informational messages. Also, ``list_item``, which formats list items, ``begin_list``, which starts lists, and ``end_list``, which ends lists, are important because lists can be useful, yet differently handled by the different output types. .. index:: pair: C; XML XML ### External Libraries __________________ Pacemaker uses `libxml2 `_ and `libxslt `_ to process XML. These libraries implement only version 1.0 of the XML, XPath, and XSLT specifications. Naming ______ Names of functions, constants, and enum values related to XML should contain substrings indicating the type of object they're used with, according to the following convention: * ``xml``: XML subtree, or XML generically * ``xe``: XML element node, including the attributes belonging to an element * ``xa``: XML attribute node * ``xc``: XML comment node Private Data ____________ Libxml2 data structures such as ``xmlNode`` and ``xmlDoc`` contain a ``void *_private`` member for application-specific data. Pacemaker uses this field to store internal bookkeeping data, such as changes relative to another XML tree, or ACLs. XML documents, elements, attributes, and comments have private data. The private data field must be allocated immediately after the node is created and freed immediately before the node is freed. Wrapper Functions _________________ Pacemaker provides wrappers for a variety of libxml2 and libxslt functions. They should be used whenever possible. Some are merely for convenience. However, many perform additional, Pacemaker-specific tasks, such as change tracking, ACL checking, and allocation/deallocation of XML documents and private data. Pacemaker assumes that every XML node is part of a document and has private data allocated. If libxml2 APIs are used directly instead of the wrapper functions, Pacemaker may crash with a segmentation fault, or change tracking and ACL checking may be incorrectly disabled. .. index:: single: Makefile.am Makefiles ######### Pacemaker uses `automake `_ for building, so the Makefile.am in each directory should be edited rather than Makefile.in or Makefile, which are automatically generated. * Public API headers are installed (by adding them to a ``HEADERS`` variable in ``Makefile.am``), but internal API headers are not (by adding them to ``noinst_HEADERS``). .. index:: pair: C; vim settings vim Settings ############ Developers who use ``vim`` to edit source code can add the following settings to their ``~/.vimrc`` file to follow Pacemaker C coding guidelines: .. code-block:: none " follow Pacemaker coding guidelines when editing C source code files filetype plugin indent on au FileType c setlocal expandtab tabstop=4 softtabstop=4 shiftwidth=4 textwidth=80 autocmd BufNewFile,BufRead *.h set filetype=c let c_space_errors = 1 diff --git a/doc/sphinx/Pacemaker_Development/components.rst b/doc/sphinx/Pacemaker_Development/components.rst index ce6b36ba52..f332410871 100644 --- a/doc/sphinx/Pacemaker_Development/components.rst +++ b/doc/sphinx/Pacemaker_Development/components.rst @@ -1,491 +1,491 @@ Coding Particular Pacemaker Components -------------------------------------- The Pacemaker code can be intricate and difficult to follow. This chapter has some high-level descriptions of how individual components work. .. index:: single: controller single: pacemaker-controld Controller ########## ``pacemaker-controld`` is the Pacemaker daemon that utilizes the other daemons to orchestrate actions that need to be taken in the cluster. It receives CIB change notifications from the CIB manager, passes the new CIB to the scheduler to determine whether anything needs to be done, uses the executor and fencer to execute any actions required, and sets failure counts (among other things) via the attribute manager. As might be expected, it has the most code of any of the daemons. .. index:: single: join Join sequence _____________ Most daemons track their cluster peers using Corosync's membership and :term:`CPG` only. The controller additionally requires peers to `join`, which ensures they are ready to be assigned tasks. Joining proceeds through a series of phases referred to as the `join sequence` or `join process`. A node's current join phase is tracked by the ``join`` member of ``crm_node_t`` (used in the peer cache). It is an ``enum crm_join_phase`` that (ideally) progresses from the DC's point of view as follows: * The node initially starts at ``crm_join_none`` * The DC sends the node a `join offer` (``CRM_OP_JOIN_OFFER``), and the node proceeds to ``crm_join_welcomed``. This can happen in three ways: * The joining node will send a `join announce` (``CRM_OP_JOIN_ANNOUNCE``) at its controller startup, and the DC will reply to that with a join offer. * When the DC's peer status callback notices that the node has joined the messaging layer, it registers ``I_NODE_JOIN`` (which leads to ``A_DC_JOIN_OFFER_ONE`` -> ``do_dc_join_offer_one()`` -> ``join_make_offer()``). * After certain events (notably a new DC being elected), the DC will send all nodes join offers (via A_DC_JOIN_OFFER_ALL -> ``do_dc_join_offer_all()``). These can overlap. The DC can send a join offer and the node can send a join announce at nearly the same time, so the node responds to the original join offer while the DC responds to the join announce with a new join offer. The situation resolves itself after looping a bit. * The node responds to join offers with a `join request` (``CRM_OP_JOIN_REQUEST``, via ``do_cl_join_offer_respond()`` and ``join_query_callback()``). When the DC receives the request, the node proceeds to ``crm_join_integrated`` (via ``do_dc_join_filter_offer()``). * As each node is integrated, the current best CIB is sync'ed to each integrated node via ``do_dc_join_finalize()``. As each integrated node's CIB sync succeeds, the DC acks the node's join request (``CRM_OP_JOIN_ACKNAK``) and the node proceeds to ``crm_join_finalized`` (via ``finalize_sync_callback()`` + ``finalize_join_for()``). * Each node confirms the finalization ack (``CRM_OP_JOIN_CONFIRM`` via ``do_cl_join_finalize_respond()``), including its current resource operation history (via ``controld_query_executor_state()``). Once the DC receives this confirmation, the node proceeds to ``crm_join_confirmed`` via ``do_dc_join_ack()``. Once all nodes are confirmed, the DC calls ``do_dc_join_final()``, which checks for quorum and responds appropriately. When peers are lost, their join phase is reset to none (in various places). ``crm_update_peer_join()`` updates a node's join phase. The DC increments the global ``current_join_id`` for each joining round, and rejects any (older) replies that don't match. .. index:: single: fencer single: pacemaker-fenced Fencer ###### ``pacemaker-fenced`` is the Pacemaker daemon that handles fencing requests. In the broadest terms, fencing works like this: #. The initiator (an external program such as ``stonith_admin``, or the cluster itself via the controller) asks the local fencer, "Hey, could you please fence this node?" #. The local fencer asks all the fencers in the cluster (including itself), "Hey, what fencing devices do you have access to that can fence this node?" #. Each fencer in the cluster replies with a list of available devices that it knows about. #. Once the original fencer gets all the replies, it asks the most appropriate fencer peer to actually carry out the fencing. It may send out more than one such request if the target node must be fenced with multiple devices. #. The chosen fencer(s) call the appropriate fencing resource agent(s) to do the fencing, then reply to the original fencer with the result. #. The original fencer broadcasts the result to all fencers. #. Each fencer sends the result to each of its local clients (including, at some point, the initiator). A more detailed description follows. .. index:: single: libstonithd Initiating a fencing request ____________________________ A fencing request can be initiated by the cluster or externally, using the libstonithd API. * The cluster always initiates fencing via ``daemons/controld/controld_fencing.c:te_fence_node()`` (which calls the ``fence()`` API method). This occurs when a transition graph synapse contains a ``CRM_OP_FENCE`` XML operation. * The main external clients are ``stonith_admin`` and ``cts-fence-helper``. The ``DLM`` project also uses Pacemaker for fencing. Highlights of the fencing API: * ``stonith_api_new()`` creates and returns a new ``stonith_t`` object, whose ``cmds`` member has methods for connect, disconnect, fence, etc. * the ``fence()`` method creates and sends a ``STONITH_OP_FENCE XML`` request with the desired action and target node. Callers do not have to choose or even have any knowledge about particular fencing devices. Fencing queries _______________ The function calls for a fencing request go something like this: The local fencer receives the client's request via an :term:`IPC` or messaging layer callback, which calls * ``stonith_command()``, which (for requests) calls * ``handle_request()``, which (for ``STONITH_OP_FENCE`` from a client) calls * ``initiate_remote_stonith_op()``, which creates a ``STONITH_OP_QUERY`` XML request with the target, desired action, timeout, etc. then broadcasts the operation to the cluster group (i.e. all fencer instances) and starts a timer. The query is broadcast because (1) location constraints might prevent the local node from accessing the stonith device directly, and (2) even if the local node does have direct access, another node might be preferred to carry out the fencing. Each fencer receives the original fencer's ``STONITH_OP_QUERY`` broadcast request via IPC or messaging layer callback, which calls: * ``stonith_command()``, which (for requests) calls * ``handle_request()``, which (for ``STONITH_OP_QUERY`` from a peer) calls * ``stonith_query()``, which calls * ``get_capable_devices()`` with ``stonith_query_capable_device_cb()`` to add device information to an XML reply and send it. (A message is considered a reply if it contains ``T_STONITH_REPLY``, which is only set by fencer peers, not clients.) The original fencer receives all peers' ``STONITH_OP_QUERY`` replies via IPC or messaging layer callback, which calls: * ``stonith_command()``, which (for replies) calls * ``handle_reply()`` which (for ``STONITH_OP_QUERY``) calls * ``process_remote_stonith_query()``, which allocates a new query result structure, parses device information into it, and adds it to the operation object. It increments the number of replies received for this operation, and compares it against the expected number of replies (i.e. the number of active peers), and if this is the last expected reply, calls * ``request_peer_fencing()``, which calculates the timeout and sends ``STONITH_OP_FENCE`` request(s) to carry out the fencing. If the target node has a fencing "topology" (which allows specifications such as "this node can be fenced either with device A, or devices B and C in combination"), it will choose the device(s), and send out as many requests as needed. If it chooses a device, it will choose the peer; a peer is preferred if it has "verified" access to the desired device, meaning that it has the device "running" on it and thus has a monitor operation ensuring reachability. Fencing operations __________________ Each ``STONITH_OP_FENCE`` request goes something like this: The chosen peer fencer receives the ``STONITH_OP_FENCE`` request via :term:`IPC` or messaging layer callback, which calls: * ``stonith_command()``, which (for requests) calls * ``handle_request()``, which (for ``STONITH_OP_FENCE`` from a peer) calls * ``stonith_fence()``, which calls * ``schedule_stonith_command()`` (using supplied device if ``F_STONITH_DEVICE`` was set, otherwise the highest-priority capable device obtained via ``get_capable_devices()`` with ``stonith_fence_get_devices_cb()``), which adds the operation to the device's pending operations list and triggers processing. The chosen peer fencer's mainloop is triggered and calls * ``stonith_device_dispatch()``, which calls * ``stonith_device_execute()``, which pops off the next item from the device's pending operations list. If acting as the (internally implemented) watchdog agent, it panics the node, otherwise it calls * ``stonith_action_create()`` and ``stonith_action_execute_async()`` to call the fencing agent. The chosen peer fencer's mainloop is triggered again once the fencing agent returns, and calls * ``stonith_action_async_done()`` which adds the results to an action object then calls its * done callback (``st_child_done()``), which calls ``schedule_stonith_command()`` for a new device if there are further required actions to execute or if the original action failed, then builds and sends an XML reply to the original fencer (via ``send_async_reply()``), then checks whether any pending actions are the same as the one just executed and merges them if so. Fencing replies _______________ The original fencer receives the ``STONITH_OP_FENCE`` reply via :term:`IPC` or messaging layer callback, which calls: * ``stonith_command()``, which (for replies) calls * ``handle_reply()``, which calls * ``fenced_process_fencing_reply()``, which calls either ``request_peer_fencing()`` (to retry a failed operation, or try the next device in a topology if appropriate, which issues a new ``STONITH_OP_FENCE`` request, proceeding as before) or ``finalize_op()`` (if the operation is definitively failed or successful). * ``finalize_op()`` broadcasts the result to all peers. Finally, all peers receive the broadcast result and call * ``finalize_op()``, which sends the result to all local clients. .. index:: single: fence history Fencing History _______________ The fencer keeps a running history of all fencing operations. The bulk of the relevant code is in `fenced_history.c` and ensures the history is synchronized across all nodes even if a node leaves and rejoins the cluster. In libstonithd, this information is represented by `stonith_history_t` and is queryable by the `stonith_api_operations_t:history()` method. `crm_mon` and `stonith_admin` use this API to display the history. .. index:: single: scheduler single: pacemaker-schedulerd + single: libcrmcommon single: libpe_status - single: libpe_rules single: libpacemaker Scheduler ######### ``pacemaker-schedulerd`` is the Pacemaker daemon that runs the Pacemaker scheduler for the controller, but "the scheduler" in general refers to related -library code in ``libpe_status`` and ``libpe_rules`` (``lib/pengine/*.c``), and -some of ``libpacemaker`` (``lib/pacemaker/pcmk_sched_*.c``). +library code in various files in ``libcrmcommon``, ``libpe_status``, and +``libpacemaker``. The purpose of the scheduler is to take a CIB as input and generate a transition graph (list of actions that need to be taken) as output. The controller invokes the scheduler by contacting the scheduler daemon via local :term:`IPC`. Tools such as ``crm_simulate``, ``crm_mon``, and ``crm_resource`` can also invoke the scheduler, but do so by calling the library functions directly. This allows them to run using a ``CIB_file`` without the cluster needing to be active. The main entry point for the scheduler code is ``lib/pacemaker/pcmk_scheduler.c:pcmk__schedule_actions()``. It sets defaults and calls a series of functions for the scheduling. Some key steps: * ``unpack_cib()`` parses most of the CIB XML into data structures, and determines the current cluster status. * ``apply_node_criteria()`` applies factors that make resources prefer certain nodes, such as shutdown locks, location constraints, and stickiness. * ``pcmk__create_internal_constraints()`` creates internal constraints, such as the implicit ordering for group members, or start actions being implicitly ordered before promote actions. * ``pcmk__handle_rsc_config_changes()`` processes resource history entries in the CIB status section. This is used to decide whether certain actions need to be done, such as deleting orphan resources, forcing a restart when a resource definition changes, etc. * ``assign_resources()`` :term:`assigns ` resources to nodes. * ``schedule_resource_actions()`` schedules resource-specific actions (which might or might not end up in the final graph). * ``pcmk__apply_orderings()`` processes ordering constraints in order to modify action attributes such as optional or required. * ``pcmk__create_graph()`` creates the transition graph. Challenges __________ Working with the scheduler is difficult. Challenges include: * It is far too much code to keep more than a small portion in your head at one time. * Small changes can have large (and unexpected) effects. This is why we have a large number of regression tests (``cts/cts-scheduler``), which should be run after making code changes. * It produces an insane amount of log messages at debug and trace levels. You can put resource ID(s) in the ``PCMK_trace_tags`` environment variable to enable trace-level messages only when related to specific resources. * Different parts of the main ``pcmk_scheduler_t`` structure are finalized at different points in the scheduling process, so you have to keep in mind whether information you're using at one point of the code can possibly change later. For example, data unpacked from the CIB can safely be used anytime after ``unpack_cib(),`` but actions may become optional or required anytime before ``pcmk__create_graph()``. There's no easy way to deal with this. * Many names of struct members, functions, etc., are suboptimal, but are part of the public API and cannot be changed until an API backward compatibility break. .. index:: single: pcmk_scheduler_t Cluster Working Set ___________________ The main data object for the scheduler is ``pcmk_scheduler_t``, which contains all information needed about nodes, resources, constraints, etc., both as the raw CIB XML and parsed into more usable data structures, plus the resulting transition graph XML. The variable name is usually ``scheduler``. .. index:: single: pcmk_resource_t Resources _________ ``pcmk_resource_t`` is the data object representing cluster resources. A resource has a variant: :term:`primitive`, group, clone, or :term:`bundle`. The resource object has members for two sets of methods, ``resource_object_functions_t`` from the ``libpe_status`` public API, and ``resource_alloc_functions_t`` whose implementation is internal to ``libpacemaker``. The actual functions vary by variant. The object functions have basic capabilities such as unpacking the resource XML, and determining the current or planned location of the resource. The :term:`assignment ` functions have more obscure capabilities needed for scheduling, such as processing location and ordering constraints. For example, ``pcmk__create_internal_constraints()`` simply calls the ``internal_constraints()`` method for each top-level resource in the cluster. .. index:: single: pcmk_node_t Nodes _____ :term:`Assignment ` of resources to nodes is done by choosing the node with the highest :term:`score` for a given resource. The scheduler does a bunch of processing to generate the scores, then the actual assignment is straightforward. Node lists are frequently used. For example, ``pcmk_scheduler_t`` has a ``nodes`` member which is a list of all nodes in the cluster, and ``pcmk_resource_t`` has a ``running_on`` member which is a list of all nodes on which the resource is (or might be) active. These are lists of ``pcmk_node_t`` objects. The ``pcmk_node_t`` object contains a ``struct pe_node_shared_s *details`` member with all node information that is independent of resource assignment (the node name, etc.). The working set's ``nodes`` member contains the original of this information. All other node lists contain copies of ``pcmk_node_t`` where only the ``details`` member points to the originals in the working set's ``nodes`` list. In this way, the other members of ``pcmk_node_t`` (such as ``weight``, which is the node score) may vary by node list, while the common details are shared. .. index:: single: pcmk_action_t single: pe_action_flags Actions _______ ``pcmk_action_t`` is the data object representing actions that might need to be taken. These could be resource actions, cluster-wide actions such as fencing a node, or "pseudo-actions" which are abstractions used as convenient points for ordering other actions against. It has a ``flags`` member which is a bitmask of ``enum pe_action_flags``. The most important of these are ``pe_action_runnable`` (if not set, the action is "blocked" and cannot be added to the transition graph) and ``pe_action_optional`` (actions with this set will not be added to the transition graph; actions often start out as optional, and may become required later). .. index:: single: pe__colocation_t Colocations ___________ ``pcmk__colocation_t`` is the data object representing colocations. Colocation constraints come into play in these parts of the scheduler code: * When sorting resources for :term:`assignment `, so resources with highest node :term:`score` are assigned first (see ``cmp_resources()``) * When updating node scores for resource assigment or promotion priority * When assigning resources, so any resources to be colocated with can be assigned first, and so colocations affect where the resource is assigned * When choosing roles for promotable clone instances, so colocations involving a specific role can affect which instances are promoted The resource assignment functions have several methods related to colocations: * ``apply_coloc_score():`` This applies a colocation's score to either the dependent's allowed node scores (if called while resources are being assigned) or the dependent's priority (if called while choosing promotable instance roles). It can behave differently depending on whether it is being called as the :term:`primary's ` method or as the :term:`dependent's ` method. * ``add_colocated_node_scores():`` This updates a table of nodes for a given colocation attribute and score. It goes through colocations involving a given resource, and updates the scores of the nodes in the table with the best scores of nodes that match up according to the colocation criteria. * ``colocated_resources():`` This generates a list of all resources involved in mandatory colocations (directly or indirectly via colocation chains) with a given resource. .. index:: single: pe__ordering_t single: pe_ordering Orderings _________ Ordering constraints are simple in concept, but they are one of the most important, powerful, and difficult to follow aspects of the scheduler code. ``pe__ordering_t`` is the data object representing an ordering, better thought of as a relationship between two actions, since the relation can be more complex than just "this one runs after that one". For an ordering "A then B", the code generally refers to A as "first" or "before", and B as "then" or "after". Much of the power comes from ``enum pe_ordering``, which are flags that determine how an ordering behaves. There are many obscure flags with big effects. A few examples: * ``pe_order_none`` means the ordering is disabled and will be ignored. It's 0, meaning no flags set, so it must be compared with equality rather than ``pcmk_is_set()``. * ``pe_order_optional`` means the ordering does not make either action required, so it only applies if they both become required for other reasons. * ``pe_order_implies_first`` means that if action B becomes required for any reason, then action A will become required as well.