Page MenuHomeClusterLabs Projects

Validate stonith-watchdog-timeout appropriately and default to 0 on invalid values
Open, NormalPublic

Assigned To
None
Authored By
nrwahl2
Jan 11 2024, 12:45 AM
Tags
  • Restricted Project
  • Restricted Project
  • Restricted Project
Referenced Files
None
Subscribers

Description

See discussion starting at https://github.com/ClusterLabs/pacemaker/pull/3311#discussion_r1446742520.

Currently, the stonith-watchdog-timeout cluster option has two defaults:

  • 0 ("disabled") if unset
  • -1 ("auto-calculate") if the configured value is invalid (that is, a garbage string that can't be parsed as a timeout)
    • All negative and invalid values are treated the same as -1. We say -1 for convenience.

This has a couple of unfortunate and closely related side effects.

  • We can't validate the value at all. Any value not parsable as a timeout by crm_get_msec() is treated as -1 and causes the timeout to be auto-calculated based on the SBD_WATCHDOG_TIMEOUT environment variable. Some configurations might rely on this behavior, so we can't begin validating values until a compatibility break. In effect, everything is allowed and has meaning (invalid means "negative", which means "auto-calculate").
  • The default value in the documentation and the options array is wrong and can't be made correct. The default value is supposed to be the same regardless of whether the option is completely unset or the option is set to an invalid value. For stonith-watchdog-timeout, these two cases yield different default values.

The proposal is to make 0 (disabled) the unconditional default value at Pacemaker 3.0.0 and to reject values that aren't parsable as timeouts. We can either continue letting negative values mean "auto-calculate" or define a string like "auto" for that purpose. The latter is clearer but would require an XSL transform for existing configs. (Would keeping "negative == auto-calculate" also require that an XSL transform from garbage values to -1?)

Note: pcmk__valid_stonith_watchdog_timeout() looks like a validation function, but it can never return false. It can only pass or exit with a fatal error; it exits only when the stonith-watchdog-timeout value is incompatible with the SBD_WATCHDOG_TIMEOUT value.

Event Timeline

nrwahl2 triaged this task as Normal priority.Jan 11 2024, 12:45 AM
nrwahl2 created this task.
nrwahl2 created this object with edit policy "Restricted Project (Project)".
kgaillot moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Jan 24 2024, 1:04 PM