Simplify podman_monitor()

Description

Before this change podman_monitor() does two things:
\-> podman_simple_status()
    \-> podman inspect {{.State.Running}}
\-> if podman_simple_status == 0 then monitor_cmd_exec()
    \-> if [ -z "$OCF_RESKEY_monitor_cmd" ]; then # so if OCF_RESKEY_monitor_cmd is empty we just return SUCCESS
          return $rc
        fi
        # if OCF_RESKEY_monitor_cmd is set to something we execute it
        podman exec ${CONTAINER} $OCF_RESKEY_monitor_cmd

Let's instead rely only on podman exec, as invoked inside monitor_cmd_exec(),
when $OCF_RESKEY_monitor_cmd is non-empty (which is the default, as it is set to "/bin/true").
When no monitor_cmd is defined, it makes sense to rely on the podman inspect
call contained in podman_simple_status().
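
The control flow described above can be sketched roughly as follows. This is an illustration only, not the patch itself: the bodies of podman_simple_status() and monitor_cmd_exec() are hypothetical stubs that merely report which podman call the real helpers would make, so the sketch runs without podman installed.

```shell
#!/bin/sh
# Hypothetical stand-ins for the agent's helpers. The real functions call
# podman; these stubs only print which call would have been made.
podman_simple_status() {
    echo "podman inspect"
    return 0
}

monitor_cmd_exec() {
    echo "podman exec: $OCF_RESKEY_monitor_cmd"
    return 0
}

# Sketch of the simplified monitor: exactly one of the two checks runs.
podman_monitor() {
    if [ -z "$OCF_RESKEY_monitor_cmd" ]; then
        # No monitor command configured: fall back to the inspect-based check.
        podman_simple_status
        return $?
    fi
    # Monitor command set (default "/bin/true"): rely on podman exec only.
    monitor_cmd_exec
    return $?
}

OCF_RESKEY_monitor_cmd="/bin/true"
podman_monitor    # takes the podman exec path

OCF_RESKEY_monitor_cmd=""
podman_monitor    # takes the podman inspect path
```

With the default monitor_cmd, the inspect path is never reached. This is consistent with the 'set -x' observation in the test notes.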

Tested as follows:

  1. Injected the change on an existing bundle-based cluster
  2. Observed that monitoring operations kept working okay
  3. Restarted rabbitmq-bundle and galera-bundle successfully
  4. Killed a container and we correctly detected the monitor failure

Jun 12 09:52:12 controller-0 pacemaker-controld[25747]: notice: controller-0-haproxy-bundle-podman-1_monitor_60000:230 [ ocf-exit-reason:monitor cmd failed (rc=125), output: cannot exec into container that is not running\n ]

  5. Container correctly got restarted after the monitor failure: haproxy-bundle-podman-1 (ocf::heartbeat:podman): Started controller-0
  6. Stopped and removed a container and pcmk detected it correctly:

Jun 12 09:55:15 controller-0 podman(haproxy-bundle-podman-1)[841411]: ERROR: monitor cmd failed (rc=125), output: unable to exec into haproxy-bundle-podman-1: no container with name or ID haproxy-bundle-podman-1 found: no such container
Jun 12 09:55:15 controller-0 pacemaker-execd[25744]: notice: haproxy-bundle-podman-1_monitor_60000:841411:stderr [ ocf-exit-reason:monitor cmd failed (rc=125), output: unable to exec into haproxy-bundle-podman-1: no container with name or ID haproxy-bundle-podman-1 found: no such container ]

  7. pcmk was able to start the container that was stopped and removed:

Jun 12 09:55:16 controller-0 pacemaker-controld[25747]: notice: Result of start operation for haproxy-bundle-podman-1 on controller-0: 0 (ok)

  8. Added 'set -x' to the RA and confirmed that no 'podman inspect' was invoked during monitoring operations

Signed-off-by: Michele Baldessari <michele@acksyn.org>

Details

Provenance
Michele Baldessari <michele@acksyn.org> Authored on Jun 12 2019, 6:00 AM
Parents
rRd8400a306042: Avoid double call to podman inspect in podman_simple_status()