Fix: controller: set timeout on scheduler responses

Description

Previously, once the DC successfully read the CIB and sent a calculation
request to the scheduler, it did nothing further with the request; only the
message handler for the scheduler's response would ever act on it.

This meant that if the scheduler accepted the request but was then unable to
reply (for example, because it was not getting enough CPU cycles), the
controller would never detect the problem, and the cluster would be blocked.

Now, the controller sets a 2-minute timer after handing off the request to the
scheduler. If it doesn't get a response in that time, it exits and stays down:
if a node is elected DC but can't run the scheduler, we want to ensure it
doesn't interfere with further elections.
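
The pattern described above (arm a timer when the request is handed off, cancel
it when the matching reply arrives, and act if it expires) can be sketched as
follows. This is only an illustration in Python, not the controller's actual
code, which is written in C. The class name SchedulerRequestTimer, its methods,
and the on_timeout callback are all hypothetical names chosen for the example.

```python
import threading

# Assumption: mirrors the 2-minute timeout mentioned in the commit message.
SCHEDULER_TIMEOUT_S = 120


class SchedulerRequestTimer:
    """Hypothetical helper: watches one outstanding scheduler request."""

    def __init__(self, timeout_s, on_timeout):
        self._timeout_s = timeout_s
        self._on_timeout = on_timeout  # called with the request ID on expiry
        self._timer = None
        self._pending_id = None
        self._lock = threading.Lock()

    def request_sent(self, request_id):
        """Arm the timer right after the request is handed to the scheduler."""
        with self._lock:
            self._cancel_locked()
            self._pending_id = request_id
            self._timer = threading.Timer(
                self._timeout_s, self._expired, args=(request_id,)
            )
            self._timer.daemon = True
            self._timer.start()

    def reply_received(self, request_id):
        """Cancel the timer if the reply matches the pending request.

        Returns False for stale or unexpected replies.
        """
        with self._lock:
            if request_id != self._pending_id:
                return False
            self._cancel_locked()
            return True

    def _cancel_locked(self):
        # Caller must hold self._lock.
        if self._timer is not None:
            self._timer.cancel()
            self._timer = None
        self._pending_id = None

    def _expired(self, request_id):
        with self._lock:
            if request_id != self._pending_id:
                return  # reply arrived, or a newer request replaced this one
            self._pending_id = None
            self._timer = None
        # Run the callback outside the lock.
        self._on_timeout(request_id)
```

In this sketch the caller decides what expiry means. To imitate the behavior in
the commit message, on_timeout would terminate the process rather than retry,
so that a node unable to run the scheduler does not stay in the election.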

Details

Provenance
kgaillot authored on Jun 28 2019, 5:20 PM
Parents
rP7dfc0902afa9: Refactor: controller: functionize access to last scheduler request ID