[PMTUd] add dynamic pong timeout when using crypto

Description

Problem originally reported by the Proxmox community: users
observed that, under load, the MTU would flap back and forth
between two values because the other node's PMTUd replies timed out.

Implement a dynamic timeout multiplier, applied when crypto is in use,
that should solve the problem in a more flexible fashion.

When a timeout hits, the new log messages look like this:

[knet]: [info] host: host: 1 (passive) best link: 0 (pri: 0)
[knet]: [debug] pmtud: Starting PMTUD for host: 1 link: 0
[knet]: [debug] pmtud: Increasing PMTUd response timeout multiplier to (4) for host 1 link: 0
[knet]: [info] pmtud: PMTUD link change for host: 1 link: 0 from 469 to 65429
[knet]: [debug] pmtud: PMTUD completed for host: 1 link: 0 current link mtu: 65429
[knet]: [info] pmtud: Global data MTU changed to: 65429
[knet]: [debug] pmtud: Starting PMTUD for host: 1 link: 0
[knet]: [debug] pmtud: Increasing PMTUd response timeout multiplier to (8) for host 1 link: 0
[knet]: [debug] pmtud: Increasing PMTUd response timeout multiplier to (16) for host 1 link: 0
[knet]: [debug] pmtud: Increasing PMTUd response timeout multiplier to (32) for host 1 link: 0
[knet]: [debug] pmtud: Increasing PMTUd response timeout multiplier to (64) for host 1 link: 0
[knet]: [debug] pmtud: PMTUD completed for host: 1 link: 0 current link mtu: 65429
[knet]: [debug] pmtud: Starting PMTUD for host: 1 link: 0
[knet]: [debug] pmtud: Increasing PMTUd response timeout multiplier to (128) for host 1 link: 0
[knet]: [debug] pmtud: PMTUD completed for host: 1 link: 0 current link mtu: 65429

And when latency drops and it is safe to be more responsive again:

[knet]: [debug] pmtud: Starting PMTUD for host: 1 link: 0
[knet]: [debug] pmtud: Decreasing PMTUd response timeout multiplier to (64) for host 1 link: 0
[knet]: [debug] pmtud: PMTUD completed for host: 1 link: 0 current link mtu: 65429
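
The log excerpts above suggest that the multiplier doubles on each timeout (4, 8, ... 128) and halves again once replies arrive in time (128 to 64). The following is only a minimal sketch of that behaviour, not the code from this commit: the struct, the function names and the bounds MULT_MIN and MULT_MAX are hypothetical, and only the field names pong_timeout_adj and pmtud_crypto_timeout_multiplier are taken from the diff in this message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bounds; the commit message does not state the real ones. */
#define MULT_MIN 2u
#define MULT_MAX 128u

/* Hypothetical stand-in for the per-link state kept by libknet. */
struct link_sketch {
	unsigned int pong_timeout_adj;                /* base pong timeout */
	unsigned int pmtud_crypto_timeout_multiplier; /* current multiplier */
};

/* A PMTUd reply timed out: double the multiplier, clamped to MULT_MAX. */
static unsigned int multiplier_increase(struct link_sketch *l)
{
	assert(l != NULL);
	l->pmtud_crypto_timeout_multiplier *= 2;
	if (l->pmtud_crypto_timeout_multiplier > MULT_MAX)
		l->pmtud_crypto_timeout_multiplier = MULT_MAX;
	return l->pmtud_crypto_timeout_multiplier;
}

/* Replies arrive in time again: halve the multiplier, clamped to MULT_MIN. */
static unsigned int multiplier_decrease(struct link_sketch *l)
{
	assert(l != NULL);
	l->pmtud_crypto_timeout_multiplier /= 2;
	if (l->pmtud_crypto_timeout_multiplier < MULT_MIN)
		l->pmtud_crypto_timeout_multiplier = MULT_MIN;
	return l->pmtud_crypto_timeout_multiplier;
}

/* Effective pong timeout used for PMTUd when crypto is enabled. */
static unsigned int crypto_pong_timeout(const struct link_sketch *l)
{
	assert(l != NULL);
	return l->pong_timeout_adj * l->pmtud_crypto_timeout_multiplier;
}
```

Doubling on failure and halving on success lets the timeout grow quickly under pressure while returning to a responsive value step by step, which would match the sequence of multipliers seen in the logs.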

....

Testing this patch on normal hosts is a bit challenging, though.

The patch was tested by hardcoding a very low timeout here:

diff --git a/libknet/threads_pmtud.c b/libknet/threads_pmtud.c
index 4f0ba0f..5e2b89b 100644
--- a/libknet/threads_pmtud.c
+++ b/libknet/threads_pmtud.c
@@ -261,7 +271,8 @@ retry:
         /*
          * crypto, under pressure, is a royal PITA
          */
-        pong_timeout_adj_tmp = dst_link->pong_timeout_adj * 2;
+        //pong_timeout_adj_tmp = dst_link->pong_timeout_adj * dst_link->pmtud_crypto_timeout_multiplier;
+        pong_timeout_adj_tmp = 30 * dst_link->pmtud_crypto_timeout_multiplier;
 } else {
         pong_timeout_adj_tmp = dst_link->pong_timeout_adj;
 }

and by using a long-running version of api_knet_send_crypto_test with a short PMTUd setfreq (10 sec).

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>

Details

Provenance
fabbione authored on Aug 13 2019, 12:41 AM
Parents
rK4aa4db652495: [PMTUd] rework the whole math to calculate MTU