heartbeat/mysql - Fixed bug where crm_admin is never called, leaving master…

Description

heartbeat/mysql - Fixed bug where crm_admin is never called, leaving master scores to -1 in certain conditions.

Consider the following scenario:

- crm has a mysql master/slave resource configured without check_level and test_table in the config
- crm is put into maintenance mode
- mysql replication is adjusted automatically or by hand
- crm is restarted on all nodes
- crm resources are reprobed
- crm is put into live mode
- at this point all nodes work as expected, but NONE of them has a master-mysql score set, so the score defaults to -1; the monitor operation of the resource never called crm_master
- the master fails
- crm refuses to promote any slave, with the following error:

        failednode.com pengine: debug: master_color: mysql:0 master score: -1

When the ms_mysql resource is configured, the master-mysql attribute (score) for each node is not set by default and therefore reads as -1. This translates to 'never promote this service as master on this machine'.

master-mysql should be set to a positive value by the resource agent (RA) when the RA decides that this machine is suitable as master.

In the configuration described above, if crm never performed any operation on the mysql service on a node, such as start/stop/promote/demote, the score for that node remains -1. The RA simply never called crm_master.

When the current master fails and a new one needs to be promoted, crm is unable to choose a new master and logs the following:

    failednode.com pengine: debug: master_color: mysql:1 master score: 0 ---> because the node that hosts mysql:1 is down
    failednode.com pengine: debug: master_color: mysql:0 master score: -1 --> because the current live node still has the initial default value

Consequently, we fail to promote a new master node for this service.

    failednode.com pengine: info: master_color: ms_mysql: Promoted 0 instances of a possible 1 to master
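The outcome above can be reproduced with a toy model. This is only a sketch, not Pacemaker's actual election algorithm: it assumes, following the statement that the RA sets a positive score on a suitable machine, that an instance is a promotion candidate only when its score is greater than zero. The function name `count_promotable` and the input format are hypothetical.

```shell
#!/bin/sh
# Toy model only: counts instances whose master score is positive.
# Reads lines of the form "<instance> <score>" on stdin.
count_promotable() {
    promotable=0
    while read -r instance score; do
        # Skip empty lines.
        [ -n "$instance" ] || continue
        # Assumption: only a positive score marks a promotion candidate.
        if [ "$score" -gt 0 ]; then
            promotable=$((promotable + 1))
        fi
    done
    echo "$promotable"
}

# Scores from the log above: neither instance qualifies.
printf '%s\n' 'mysql:1 0' 'mysql:0 -1' | count_promotable

# If monitor had exported a positive score (100 is an arbitrary example)
# on the live node, one candidate would be available.
printf '%s\n' 'mysql:1 0' 'mysql:0 100' | count_promotable
```

With the scores from the log no instance qualifies, which matches "Promoted 0 instances of a possible 1".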

When the failover procedure is started, crm calls the resource agents (read: OCF 'init' scripts) with action 'monitor' on all live nodes that have the particular master/slave resource started.

This monitor operation is expected to set the master-mysql score. It did not, because of the specific conditions and configuration described above.

To solve this issue we modified the mysql resource agent to always export the master-mysql score when it is called with 'monitor', with the value depending on the result of the check.

Scores are exported by calling:

    crm_master -l reboot -v SCORE - if the status is success. The higher the score, the better the chance that this node is elected.
    crm_master -l reboot -D - if the monitor operation fails, instructing the engine that the current node cannot be used as master because it has issues.
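The two calls above could be wired into a monitor handler roughly as follows. This is a minimal sketch and not the code from the actual commit: the helper name `export_master_score` and its arguments are hypothetical, and `crm_master` is replaced by a stub that only prints its arguments so the example runs without a cluster.

```shell
#!/bin/sh
# Stand-in for the real crm_master so the sketch runs without a cluster;
# it only prints the command line that would have been executed.
crm_master() { echo "crm_master $*"; }

# Hypothetical helper: export the master score based on the monitor result.
#   $1 = return code of the monitor check (0 means success)
#   $2 = score to advertise when the check succeeded
export_master_score() {
    monitor_rc=$1
    score=$2
    if [ "$monitor_rc" -eq 0 ]; then
        # Healthy: advertise this node as a promotion candidate.
        crm_master -l reboot -v "$score"
    else
        # Unhealthy: delete the score so this node is not used as master.
        crm_master -l reboot -D
    fi
}

export_master_score 0 100   # prints: crm_master -l reboot -v 100
export_master_score 1 100   # prints: crm_master -l reboot -D
```

Calling the helper on every monitor run, rather than only after start/stop/promote/demote, is what keeps the score from staying at its default when crm never performed any other operation on the node.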

Details

Provenance
vaLentin chernoZemski <valentin@siteground.com> Authored on Oct 13 2016, 6:17 AM
Parents
rRa95ea74ce7bb: Merge pull request #842 from ytakeshita/fix_nfsserver_monitor
Branches
Unknown
Tags
Unknown
