
Dynamically adjust watchdog timeout per node
Open, Low, Public

Assigned To
None
Authored By
kgaillot
Oct 24 2024, 10:30 AM
Tags
  • Restricted Project
  • Restricted Project
Referenced Files
None

Description

Goals:

  • Track each node's STONITH_WATCHDOG_TIMEOUT
    • Most likely in the controller
    • A node_state attribute would be good, so it persists across reboots
    • Cluster nodes could advertise it as part of the join process
    • Pacemaker Remote nodes already go through a timeout verification step, during which the value could be collected
    • The local STONITH_WATCHDOG_TIMEOUT would be the default before a target node's value is known (watchdog fencing is unreliable before a target is first seen anyway, and a never-seen node shouldn't be running resources)
  • The stonith-watchdog-timeout cluster option should be deprecated and replaced with one or more new options
    • The new options should use standard types/validators when practical
    • The new option names should use some variant of "fencing" rather than "stonith"
    • Something like watchdog-fencing-duration as a duration (nonnegative interval spec) to set a specific wait time
    • Something like watchdog-fencing-auto=true/false/increase/decrease, where:
      • true = use twice the target-specific value
      • false = use the specified duration exactly
      • increase = use the higher of the specified duration or twice the target-specific value
      • decrease = use the lower of the specified duration or twice the target-specific value
    • The new syntax should be designed so it is straightforward to XSL-transform the old syntax to it when support for the old syntax is eventually dropped
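
The proposed semantics can be sketched as follows. This is only an illustration of the rules listed above, not Pacemaker code: the function name, parameter names, and the use of plain seconds are assumptions made for the example, and the option names watchdog-fencing-duration and watchdog-fencing-auto are themselves still tentative. The fallback to the local STONITH_WATCHDOG_TIMEOUT applies when the target node's value is not yet known.

```python
from typing import Optional

# Hypothetical values for the proposed watchdog-fencing-auto option.
AUTO_MODES = ("true", "false", "increase", "decrease")


def effective_watchdog_wait(
    auto: str,
    duration_s: float,
    local_timeout_s: float,
    target_timeout_s: Optional[float] = None,
) -> float:
    """Return how long to wait before assuming watchdog fencing succeeded.

    auto             -- proposed watchdog-fencing-auto value
    duration_s       -- proposed watchdog-fencing-duration, in seconds
    local_timeout_s  -- the local node's STONITH_WATCHDOG_TIMEOUT
    target_timeout_s -- the target's advertised value, or None if unknown
    """
    if auto not in AUTO_MODES:
        raise ValueError(f"unknown watchdog-fencing-auto value: {auto!r}")
    if duration_s < 0:
        raise ValueError("duration must be nonnegative")

    # Use the local value as the default until the target's value is known.
    node_timeout = local_timeout_s if target_timeout_s is None else target_timeout_s
    doubled = 2 * node_timeout

    if auto == "true":
        return doubled
    if auto == "false":
        return duration_s
    if auto == "increase":
        return max(duration_s, doubled)
    return min(duration_s, doubled)  # auto == "decrease"
```

For example, with a specified duration of 10 seconds and a target that advertised 8 seconds, "increase" would yield 16 seconds and "decrease" would yield 10 seconds.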

Event Timeline

kgaillot triaged this task as Normal priority. Oct 24 2024, 10:30 AM
kgaillot created this task.
kgaillot created this object with edit policy "Restricted Project (Project)".
kgaillot lowered the priority of this task from Normal to Low. Thu, Jan 2, 4:06 PM
kgaillot updated the task description.