Fix: crmd: Reset stonith failcount to recover transitioner when the node rejoins

Description

The crmd transitioner could not recover from "Too many failures to fence".

Steps to reproduce:

  1. Set up a two-node cluster with stonith, for example using IPMI.
  2. Node-1 has a complete power outage for a couple of minutes. The
     IPMI device is also without power, which causes the fencing to fail.
  3. Node-2 tries to fence node-1 several times but fails.
  4. Node-2 reports "Too many failures to fence node-1 (11), giving up".
  5. The power returns and node-1 boots up normally.
  6. Node-1 rejoins the cluster, but resources are not started on it.

Expected result:
The stonith failcount for node-1 should be reset and resources should
be started on node-1.

Actual result:
Node-2 still logs "Too many failures to fence" and resources are not
started on node-1.
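
The intended behaviour can be sketched in a few lines of C. This is only an
illustration of the idea, not the code from this commit: every name below
(fence_fail_increment, fence_fail_reset, on_node_rejoined, and so on) is
hypothetical, and the threshold of 10 is an assumption inferred from the
"(11), giving up" log message above.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_TARGETS        16
#define MAX_NAME           64
#define MAX_FENCE_FAILURES 10   /* assumed limit: give up once count exceeds it */

/* Hypothetical per-target record of consecutive fencing failures. */
struct fence_fail_entry {
    char target[MAX_NAME];
    int  count;
};

static struct fence_fail_entry fail_table[MAX_TARGETS];
static int fail_entries = 0;

/* Look up the entry for a target, optionally creating it if missing. */
static struct fence_fail_entry *find_entry(const char *target, bool create)
{
    for (int i = 0; i < fail_entries; i++) {
        if (strcmp(fail_table[i].target, target) == 0) {
            return &fail_table[i];
        }
    }
    if (create && fail_entries < MAX_TARGETS) {
        struct fence_fail_entry *e = &fail_table[fail_entries++];
        strncpy(e->target, target, MAX_NAME - 1);
        e->target[MAX_NAME - 1] = '\0';
        e->count = 0;
        return e;
    }
    return NULL;
}

/* Record one more failed fencing attempt against a target. */
void fence_fail_increment(const char *target)
{
    struct fence_fail_entry *e = find_entry(target, true);
    if (e != NULL) {
        e->count++;
    }
}

/* Forget all recorded fencing failures for a target. */
void fence_fail_reset(const char *target)
{
    struct fence_fail_entry *e = find_entry(target, false);
    if (e != NULL) {
        e->count = 0;
    }
}

/* Current failure count for a target (0 if none was ever recorded). */
int fence_fail_count(const char *target)
{
    struct fence_fail_entry *e = find_entry(target, false);
    return (e != NULL) ? e->count : 0;
}

/* True if any target has exceeded the limit, which blocks new transitions. */
bool too_many_fence_failures(void)
{
    for (int i = 0; i < fail_entries; i++) {
        if (fail_table[i].count > MAX_FENCE_FAILURES) {
            return true;
        }
    }
    return false;
}

/* The behaviour this fix asks for: a node that rejoins the cluster is
 * evidently alive, so its stale failure count is cleared. */
void on_node_rejoined(const char *target)
{
    fence_fail_reset(target);
}
```

In this sketch, eleven calls to fence_fail_increment("node-1") make
too_many_fence_failures() return true. Without the call in
on_node_rejoined() the count would stay at 11 after node-1 comes back, which
corresponds to the actual result described above; with it, the count returns
to 0 and transitions can proceed again.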

Details

Provenance
gao-yan authored on Mar 10 2015, 11:02 AM
Parents
rP72223f67b88c: Merge pull request #663 from davidvossel/clone-notify-env