Reset timer_problem_decrementer on fault
8f284b26b333

Description

After a heartbeat link goes FAULTY and is automatically re-enabled,
active_instance->timer_problem_decrementer is not reset to zero. As a
result, in the next timer_function_active_token_expired() round,
active_timer_problem_decrementer_start() is not called, so
active_instance->counter_problems for this link can no longer be
decreased. RRP thereby loses its ability to tolerate network
fluctuation.

This problem can be reproduced by the following sequence:

1. Set RRP to active mode and configure at least 2 heartbeat links.
2. Unplug one link until corosync-cfgtool -s shows it as FAULTY.
3. Re-plug the link; corosync-cfgtool -s then shows it as active with
   no faults.
4. Unplug the link again, but quickly re-plug it before it becomes
   FAULTY.
5. Finally, corosync-cfgtool -s shows the link in the
   "Incrementing problem counter" state even though it is physically
   healthy.

The fix is to reset timer_problem_decrementer to zero in
active_timer_problem_decrementer_cancel().
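
The effect of the stale handle can be illustrated with a small standalone
model. This is only a sketch, not the code in exec/totemrrp.c: the struct,
the fake handle counter, the reset_handle switch and the function names
below are hypothetical, and it assumes that the decrementer timer is
cancelled when the link is marked FAULTY and that a zero handle means
"no timer armed".

```c
#include <stdint.h>

/* Hypothetical stand-in for the real instance struct; only the two
 * fields named in the commit message are modeled. */
struct active_instance_model {
	uint64_t timer_problem_decrementer; /* 0 is taken to mean "no timer armed" */
	int counter_problems;
};

static uint64_t next_handle = 1; /* fake source of non-zero timer handles */

/* Models arming the decrementer timer. */
static void decrementer_start(struct active_instance_model *inst)
{
	inst->timer_problem_decrementer = next_handle++;
}

/* Models cancelling the timer. reset_handle == 0 reproduces the behaviour
 * described as buggy; reset_handle == 1 models the fix. */
static void decrementer_cancel(struct active_instance_model *inst, int reset_handle)
{
	if (inst->timer_problem_decrementer != 0) {
		/* The real code would remove the timer from its event loop here. */
		if (reset_handle) {
			inst->timer_problem_decrementer = 0;
		}
	}
}

/* Models one token-expired round: count a problem, and arm the
 * decrementer only if none appears to be armed. */
static void token_expired(struct active_instance_model *inst)
{
	inst->counter_problems++;
	if (inst->timer_problem_decrementer == 0) {
		decrementer_start(inst);
	}
}

/* Returns 1 if a fresh decrementer is armed after: problem, cancel
 * (link FAULTY), then another brief problem on the re-enabled link. */
int decrementer_armed_after_flap(int reset_handle)
{
	struct active_instance_model inst = { 0, 0 };
	uint64_t stale;

	token_expired(&inst);                    /* first problem arms the timer */
	decrementer_cancel(&inst, reset_handle); /* link goes FAULTY */
	stale = inst.timer_problem_decrementer;  /* non-zero only without the reset */
	token_expired(&inst);                    /* brief unplug after re-enable */

	return inst.timer_problem_decrementer != 0 &&
	       inst.timer_problem_decrementer != stale;
}
```

In this model, decrementer_armed_after_flap(0) leaves the old handle in
place, so no new timer is started and counter_problems has nothing to
decrease it, while decrementer_armed_after_flap(1) arms a fresh timer.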

Signed-off-by: Jason <huzhijiang@gmail.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

Details

Provenance

Jason <huzhijiang@gmail.com>	Authored on Dec 8 2014, 10:24 AM
jfriesse	Committed on Dec 8 2014, 10:26 AM

Parents

rC6449bea835c9: config: Ensure mcast address/port differs for rrp

Changes (1)

exec/totemrrp.c