
Fix handling of fence device monitor timeouts
Open, Normal, Public

Authored By
kgaillot
Mar 20 2024, 5:57 PM

Description

The executor uses pcmk_monitor_timeout for fence device monitors, but the controller considers a recurring monitor to have timed out as soon as its op timeout expires. If pcmk_monitor_timeout is long relative to the op timeout, a stonith stop action can fail. The controller declares the monitor timed out before pcmk_monitor_timeout expires and requests the stop action, whose timer begins counting down immediately. However, the stop action can't begin until the monitor finishes or pcmk_monitor_timeout expires, so the stop can time out while it is still waiting to start.
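
To make the timing concrete, here is a rough sketch of the sequence described above. It is a simplified model for illustration only, not Pacemaker code: the function name, its parameters, and the returned strings are all hypothetical, and it assumes the stop timer starts the moment the controller declares the monitor timed out.

```python
def stop_outcome(op_timeout, pcmk_monitor_timeout, stop_timeout,
                 monitor_duration, stop_duration=0.0):
    """Simplified, hypothetical model of the race; all times in seconds.

    op_timeout:            timeout the controller applies to the monitor
    pcmk_monitor_timeout:  timeout the executor applies to the monitor
    stop_timeout:          timeout of the stop action
    monitor_duration:      how long the monitor would run if never cut off
    stop_duration:         how long the stop itself takes once it begins
    """
    # The controller only intervenes if the monitor outlives its op timeout.
    if monitor_duration <= op_timeout:
        return "monitor completed in time"

    # Controller view: the monitor has timed out, so the stop is requested
    # now and (by assumption) its timer starts counting immediately.
    stop_requested = op_timeout
    stop_deadline = stop_requested + stop_timeout

    # Executor view: the stop cannot begin until the monitor finishes or
    # pcmk_monitor_timeout expires, whichever comes first.
    monitor_end = min(monitor_duration, pcmk_monitor_timeout)
    stop_begins = max(stop_requested, monitor_end)

    if stop_begins + stop_duration > stop_deadline:
        return "stop timed out"
    return "stop completed"


if __name__ == "__main__":
    # Illustrative values: 20s op and stop timeouts, a monitor that hangs.
    print(stop_outcome(20, 60, 20, monitor_duration=120))
    print(stop_outcome(20, 20, 20, monitor_duration=120))
```

In this model, a hung monitor with a 20s op timeout, a 20s stop timeout, and a 60s pcmk_monitor_timeout leaves the stop waiting until 60s while its deadline passes at 40s. With pcmk_monitor_timeout at 20s, the stop can begin as soon as it is requested.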

Separately, the pcmk_monitor_timeout default is 60s, while the default timeout for generic monitor operations is 20s. We need to investigate how each is used and how they interact. Possible fixes include changing the pcmk_monitor_timeout default to 20s, special-casing fence device monitors in the scheduler to use the pcmk_monitor_timeout default, or some other improvement.

Once this is all straightened out, we should update Pacemaker Explained with the correct information about these timeouts.

Event Timeline

kgaillot triaged this task as Normal priority. (Mar 20 2024, 5:57 PM)
kgaillot created this task.
kgaillot created this object with edit policy "Restricted Project (Project)".
kgaillot added projects: Restricted Project, Restricted Project. (Mar 20 2024, 5:59 PM)
kgaillot moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.
kgaillot edited projects, added Restricted Project; removed Restricted Project. (May 21 2024, 10:35 AM)