From: Hoang Le hoang.h.le@dektech.com.au
[ Upstream commit 426071f1f3995d7e9603246bffdcbf344cd31719 ]
With huge cluster (e.g >200nodes), the amount of that flow: gap -> retransmit packet -> acked will take time in case of STATE_MSG dropped/delayed because a lot of traffic. This lead to 1.5 sec tolerance value criteria made link easy failure around 2nd, 3rd of failed retransmission attempts.
Instead of re-introduced criteria of 99 faled retransmissions to fix the issue, we increase failure detection timer to ten times tolerance value.
Fixes: 77cf8edbc0e7 ("tipc: simplify stale link failure criteria") Acked-by: Jon Maloy jon.maloy@ericsson.com Signed-off-by: Hoang Le hoang.h.le@dektech.com.au Acked-by: Jon Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/tipc/link.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/tipc/link.c b/net/tipc/link.c index 999eab592de8..a9d8a81e80cf 100644 --- a/net/tipc/link.c +++ b/net/tipc/link.c @@ -1084,7 +1084,7 @@ static bool link_retransmit_failure(struct tipc_link *l, struct tipc_link *r, return false;
if (!time_after(jiffies, TIPC_SKB_CB(skb)->retr_stamp + - msecs_to_jiffies(r->tolerance))) + msecs_to_jiffies(r->tolerance * 10))) return false;
hdr = buf_msg(skb);