Make the check for the docker daemon being up more robust

Description

This amends 5941b98140b09e39b4dc2ee155817b287ef32859 (Fails docker RA
gracefully when command not found). That commit checked for a pidfile,
which tends to be less robust in the presence of stale pidfiles, and it
also added a configuration option for the pidfile location, which is
more churn than needed simply to check for service availability.

Let's simply call 'docker version' instead. When that command returns 1,
the docker daemon is not running, and we return OCF_ERR_GENERIC instead
of OCF_NOT_RUNNING. This is a key point: even if the docker daemon is
stopped, the containers may very well still be up (e.g. when you use
live-restore in docker). In this situation we want an explicit fence
event to be triggered because the stop operation failed.
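A minimal sketch of the daemon check described above, assuming POSIX sh. It is not the actual heartbeat/docker code: the helper name check_docker_daemon and the DOCKER_CMD override (which only exists so the check can be exercised without docker installed) are hypothetical. The OCF exit codes are the standard values that a real agent would take from ocf-shellfuncs. For simplicity, the sketch treats any non-zero exit status from 'docker version' as "daemon unreachable".

```shell
#!/bin/sh
# Sketch only; not the real heartbeat/docker resource agent.

# Standard OCF exit codes (a real agent sources these from ocf-shellfuncs).
: "${OCF_SUCCESS:=0}"
: "${OCF_ERR_GENERIC:=1}"
: "${OCF_NOT_RUNNING:=7}"

# Hypothetical override so the check can be tested without docker.
DOCKER_CMD="${DOCKER_CMD:-docker}"

# Hypothetical helper: report OCF_ERR_GENERIC (not OCF_NOT_RUNNING) when the
# daemon cannot be reached, because containers may still be running
# (e.g. with live-restore enabled) and the cluster must not assume otherwise.
check_docker_daemon() {
    if ! $DOCKER_CMD version >/dev/null 2>&1; then
        echo "docker daemon is not running" >&2
        return "$OCF_ERR_GENERIC"
    fi
    return "$OCF_SUCCESS"
}
```

Unlike a pidfile, 'docker version' has to actually talk to the daemon to succeed, so a stale pidfile cannot make a dead daemon look healthy.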

Not doing so would mean that the stop operation returned OK and,
for example, we'd start an A/P resource on a second node while it
was still running on the node where the docker daemon was stopped.

We also explicitly catch OCF_ERR_GENERIC in the docker_stop function
to make our intent clearer.
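A minimal sketch of how a stop action might distinguish the two monitor results, assuming POSIX sh. The function names docker_stop_sketch, docker_monitor and stop_container, along with the MONITOR_RC and STOP_CALLED variables, are hypothetical stand-ins and not the agent's real code. The point is that OCF_NOT_RUNNING means "already stopped, nothing to do". OCF_ERR_GENERIC means "cannot tell", so the stop fails and Pacemaker escalates to fencing.

```shell
#!/bin/sh
# Sketch only; not the real docker_stop implementation.

# Standard OCF exit codes (a real agent sources these from ocf-shellfuncs).
: "${OCF_SUCCESS:=0}"
: "${OCF_ERR_GENERIC:=1}"
: "${OCF_NOT_RUNNING:=7}"

# Hypothetical monitor stub; MONITOR_RC lets the sketch run without docker.
docker_monitor() {
    return "${MONITOR_RC:-$OCF_SUCCESS}"
}

# Hypothetical placeholder for the real container stop call.
stop_container() {
    STOP_CALLED=1
    return "$OCF_SUCCESS"
}

docker_stop_sketch() {
    docker_monitor
    rc=$?
    case "$rc" in
        "$OCF_NOT_RUNNING")
            # Container is already stopped: stop trivially succeeds.
            return "$OCF_SUCCESS" ;;
        "$OCF_ERR_GENERIC")
            # Daemon unreachable: the container may still be up, so fail
            # the stop explicitly and let the cluster fence the node.
            return "$OCF_ERR_GENERIC" ;;
    esac
    # Container is running and the daemon is reachable: stop it.
    stop_container
}
```

Catching OCF_ERR_GENERIC as its own case, rather than letting it fall through, makes it obvious to a reader that a failed stop is intentional here.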

Tested this in an OpenStack deployment and observed the following:
A) All the usual pcmk operations still work correctly
B) A 'systemctl stop docker' will eventually trigger a fence operation
on the node.

Co-Authored-By: Luca Miccini <lmiccini@redhat.com>
Co-Authored-By: Damien Ciabrini <dciabrin@redhat.com>
Signed-off-by: Michele Baldessari <michele@acksyn.org>

Details

Provenance
Michele Baldessari <michele@acksyn.org> authored on Aug 28 2019, 4:46 AM
Parents
rR0be589169f1d: Merge pull request #1387 from ClusterLabs/CI-fix-make-rpm