Dynamically adjust watchdog timeout per node
Open, Low, Public

Assigned To

None

Authored By

	kgaillot
	Oct 24 2024, 10:30 AM

Referenced Files

None

Description

Goals:

Track each node's STONITH_WATCHDOG_TIMEOUT
- Most likely in the controller
- A node_state attribute would be good, so it persists across reboots
- Cluster nodes could advertise it as part of the join process
- Pacemaker Remote nodes already have a timeout verification step, during which the value could be advertised
- The local STONITH_WATCHDOG_TIMEOUT would be the default before a target node's value is known (watchdog fencing is unreliable before a target is first seen anyway, and a never-seen node shouldn't be running resources)
The stonith-watchdog-timeout cluster option should be deprecated and replaced with one or more new options
- The new options should use standard types/validators when practical
- The new option names should use some variant of "fencing" rather than "stonith"
- Something like watchdog-fencing-duration as a duration (nonnegative interval spec) to set a specific wait time
- Something like watchdog-fencing-auto=true/false/increase/decrease, where:
  - true = use twice the target-specific value
  - false = use the specified duration exactly
  - increase = use the higher of the specified duration or twice the target-specific value
  - decrease = use the lower of the specified duration or twice the target-specific value
- The new syntax should be designed so it is straightforward to XSL-transform the old syntax to it when the old syntax is eventually dropped
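
The proposed semantics above could be sketched roughly as follows. This is only an illustration of the goals, not existing Pacemaker code: the function names, the AutoMode enum, and the use of whole seconds are all assumptions made for the example, and the option names watchdog-fencing-duration and watchdog-fencing-auto are still only suggestions.

```python
from enum import Enum
from typing import Dict


class AutoMode(Enum):
    """Hypothetical values for the proposed watchdog-fencing-auto option."""
    TRUE = "true"
    FALSE = "false"
    INCREASE = "increase"
    DECREASE = "decrease"


def target_watchdog_timeout(known: Dict[str, int], node: str,
                            local_timeout: int) -> int:
    """Return the timeout (seconds) advertised by the target node.

    Falls back to the local STONITH_WATCHDOG_TIMEOUT when the target's
    value is not yet known (for example, a never-seen node).
    """
    return known.get(node, local_timeout)


def watchdog_fencing_wait(mode: AutoMode, duration: int,
                          target_timeout: int) -> int:
    """Return how long (seconds) to wait before assuming the target is fenced.

    duration stands in for the proposed watchdog-fencing-duration value.
    """
    doubled = 2 * target_timeout
    if mode is AutoMode.TRUE:
        return doubled                 # always twice the target-specific value
    if mode is AutoMode.FALSE:
        return duration                # the specified duration exactly
    if mode is AutoMode.INCREASE:
        return max(duration, doubled)  # the higher of the two
    return min(duration, doubled)      # DECREASE: the lower of the two


# Example: node1 has advertised 10s, node2 has never been seen (local is 5s)
known_timeouts = {"node1": 10}
t1 = target_watchdog_timeout(known_timeouts, "node1", local_timeout=5)
t2 = target_watchdog_timeout(known_timeouts, "node2", local_timeout=5)
wait_node1 = watchdog_fencing_wait(AutoMode.INCREASE, 15, t1)  # max(15, 20)
wait_node2 = watchdog_fencing_wait(AutoMode.INCREASE, 15, t2)  # max(15, 10)
```

With a configured duration of 15s, "increase" would wait 20s for node1 (twice its advertised 10s) but only 15s for node2, whose value defaults to the local 5s until it is first seen.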

Related Objects

Mentioned In: T749: Validate stonith-watchdog-timeout appropriately and default to 0 on invalid values

Event Timeline

kgaillot triaged this task as Normal priority. Oct 24 2024, 10:30 AM

kgaillot created this task.

kgaillot created this object with edit policy "Restricted Project (Project)".

kgaillot mentioned this in T749: Validate stonith-watchdog-timeout appropriately and default to 0 on invalid values. Oct 24 2024, 10:33 AM

kgaillot lowered the priority of this task from Normal to Low. Jan 2 2025, 4:06 PM

kgaillot updated the task description.