5.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet <edumazet@google.com>
[ Upstream commit ec00ed472bdb7d0af840da68c8c11bff9f4d9caa ]
While testing TCP performance with latest trees, I saw suspect SOCKET_BACKLOG drops.
tcp_add_backlog() computes its limit with:

    limit = (u32)READ_ONCE(sk->sk_rcvbuf) +
            (u32)(READ_ONCE(sk->sk_sndbuf) >> 1);
    limit += 64 * 1024;
This does not take into account that sk->sk_backlog.len is reset only at the very end of __release_sock().
Both sk->sk_backlog.len and sk->sk_rmem_alloc could reach sk_rcvbuf in normal conditions.
We should double sk->sk_rcvbuf contribution in the formula to absorb bubbles in the backlog, which happen more often for very fast flows.
This change maintains decent protection against abuses.
Fixes: c377411f2494 ("net: sk_add_backlog() take rmem_alloc into account")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240423125620.3309458-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/ipv4/tcp_ipv4.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)
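Not part of the patch, review aid only: a minimal userspace sketch of the
old vs. new limit arithmetic, using made-up buffer sizes (4 MB rcvbuf,
2 MB sndbuf) and mirroring the final clamp to UINT_MAX, which matters
because sk_add_backlog() takes a 32-bit limit.

/* Illustrative sketch only, not kernel code. */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint32_t sk_rcvbuf = 4u << 20;	/* hypothetical sk->sk_rcvbuf: 4 MB */
	uint32_t sk_sndbuf = 2u << 20;	/* hypothetical sk->sk_sndbuf: 2 MB */

	/* Old formula: sk_rcvbuf counted once. */
	uint64_t old_limit = (uint64_t)sk_rcvbuf + (sk_sndbuf >> 1) + 64 * 1024;

	/* New formula: sk_rcvbuf counted twice, because both
	 * sk_backlog.len and sk_rmem_alloc can reach sk_rcvbuf.
	 */
	uint64_t new_limit = ((uint64_t)sk_rcvbuf << 1) + (sk_sndbuf >> 1) + 64 * 1024;

	/* Mirror min_t(u64, limit, UINT_MAX): the limit is ultimately
	 * passed as a 32-bit value, so avoid truncation of the u64 sum.
	 */
	if (new_limit > UINT32_MAX)
		new_limit = UINT32_MAX;

	printf("old limit: %llu bytes\n", (unsigned long long)old_limit);
	printf("new limit: %llu bytes\n", (unsigned long long)new_limit);
	return 0;
}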
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 85d8688933f3c..0e7179a19e224 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1787,7 +1787,7 @@ int tcp_v4_early_demux(struct sk_buff *skb)
 
 bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb)
 {
-	u32 limit, tail_gso_size, tail_gso_segs;
+	u32 tail_gso_size, tail_gso_segs;
 	struct skb_shared_info *shinfo;
 	const struct tcphdr *th;
 	struct tcphdr *thtail;
@@ -1796,6 +1796,7 @@ bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb)
 	bool fragstolen;
 	u32 gso_segs;
 	u32 gso_size;
+	u64 limit;
 	int delta;
 
 	/* In case all data was pulled from skb frags (in __pskb_pull_tail()),
@@ -1891,7 +1892,13 @@ bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb)
 	__skb_push(skb, hdrlen);
 
 no_coalesce:
-	limit = (u32)READ_ONCE(sk->sk_rcvbuf) + (u32)(READ_ONCE(sk->sk_sndbuf) >> 1);
+	/* sk->sk_backlog.len is reset only at the end of __release_sock().
+	 * Both sk->sk_backlog.len and sk->sk_rmem_alloc could reach
+	 * sk_rcvbuf in normal conditions.
+	 */
+	limit = ((u64)READ_ONCE(sk->sk_rcvbuf)) << 1;
+
+	limit += ((u32)READ_ONCE(sk->sk_sndbuf)) >> 1;
 
 	/* Only socket owner can try to collapse/prune rx queues
 	 * to reduce memory overhead, so add a little headroom here.
@@ -1899,6 +1906,8 @@ bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb)
 	 */
 	limit += 64 * 1024;
 
+	limit = min_t(u64, limit, UINT_MAX);
+
 	if (unlikely(sk_add_backlog(sk, skb, limit))) {
 		bh_unlock_sock(sk);
 		__NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPBACKLOGDROP);