The patch below does not apply to the 6.12-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to stable@vger.kernel.org.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y git checkout FETCH_HEAD git cherry-pick -x 833d4313bc1e9e194814917d23e8874d6b651649 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to 'stable@vger.kernel.org' --in-reply-to '2025101604-chamber-playhouse-5278@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 833d4313bc1e9e194814917d23e8874d6b651649 Mon Sep 17 00:00:00 2001 From: "Matthieu Baerts (NGI0)" matttbe@kernel.org Date: Thu, 18 Sep 2025 10:50:18 +0200 Subject: [PATCH] mptcp: reset blackhole on success with non-loopback ifaces
When a first MPTCP connection gets successfully established after a blackhole period, 'active_disable_times' was supposed to be reset when this connection was done via any non-loopback interfaces.
Unfortunately, the opposite condition was checked: only reset when the connection was established via a loopback interface. Fixing this by simply looking at the opposite.
This is similar to what is done with TCP FastOpen, see tcp_fastopen_active_disable_ofo_check().
This patch is a follow-up of a previous discussion linked to commit 893c49a78d9f ("mptcp: Use __sk_dst_get() and dst_dev_rcu() in mptcp_active_enable()."), see [1].
Fixes: 27069e7cb3d1 ("mptcp: disable active MPTCP in case of blackhole") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/4209a283-8822-47bd-95b7-87e96d9b7ea3@kernel.org [1] Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Reviewed-by: Simon Horman horms@kernel.org Reviewed-by: Kuniyuki Iwashima kuniyu@google.com Link: https://patch.msgid.link/20250918-net-next-mptcp-blackhole-reset-loopback-v1... Signed-off-by: Jakub Kicinski kuba@kernel.org
diff --git a/net/mptcp/ctrl.c b/net/mptcp/ctrl.c index e8ffa62ec183..d96130e49942 100644 --- a/net/mptcp/ctrl.c +++ b/net/mptcp/ctrl.c @@ -507,7 +507,7 @@ void mptcp_active_enable(struct sock *sk) rcu_read_lock(); dst = __sk_dst_get(sk); dev = dst ? dst_dev_rcu(dst) : NULL; - if (dev && (dev->flags & IFF_LOOPBACK)) + if (!(dev && (dev->flags & IFF_LOOPBACK))) atomic_set(&pernet->active_disable_times, 0); rcu_read_unlock(); }
From: Eric Dumazet edumazet@google.com
[ Upstream commit e7b9ecce562ca6a1de32c56c597fa45e08c44ec0 ]
TCP uses of dev_net() are under RCU protection, change them to dev_net_rcu() to get LOCKDEP support.
Signed-off-by: Eric Dumazet edumazet@google.com Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com Link: https://patch.msgid.link/20250301201424.2046477-4-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Stable-dep-of: 833d4313bc1e ("mptcp: reset blackhole on success with non-loopback ifaces") Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/inet6_hashtables.h | 2 +- include/net/inet_hashtables.h | 2 +- net/ipv4/tcp_ipv4.c | 12 ++++++------ net/ipv4/tcp_metrics.c | 6 +++--- net/ipv6/tcp_ipv6.c | 22 +++++++++++----------- 5 files changed, 22 insertions(+), 22 deletions(-)
diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h index 74dd90ff5f129..c32878c69179d 100644 --- a/include/net/inet6_hashtables.h +++ b/include/net/inet6_hashtables.h @@ -150,7 +150,7 @@ static inline struct sock *__inet6_lookup_skb(struct inet_hashinfo *hashinfo, int iif, int sdif, bool *refcounted) { - struct net *net = dev_net(skb_dst(skb)->dev); + struct net *net = dev_net_rcu(skb_dst(skb)->dev); const struct ipv6hdr *ip6h = ipv6_hdr(skb); struct sock *sk;
diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h index 5eea47f135a42..da818fb0205fe 100644 --- a/include/net/inet_hashtables.h +++ b/include/net/inet_hashtables.h @@ -492,7 +492,7 @@ static inline struct sock *__inet_lookup_skb(struct inet_hashinfo *hashinfo, const int sdif, bool *refcounted) { - struct net *net = dev_net(skb_dst(skb)->dev); + struct net *net = dev_net_rcu(skb_dst(skb)->dev); const struct iphdr *iph = ip_hdr(skb); struct sock *sk;
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 824048679e1b8..d8976753d4e47 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -494,14 +494,14 @@ int tcp_v4_err(struct sk_buff *skb, u32 info) { const struct iphdr *iph = (const struct iphdr *)skb->data; struct tcphdr *th = (struct tcphdr *)(skb->data + (iph->ihl << 2)); - struct tcp_sock *tp; + struct net *net = dev_net_rcu(skb->dev); const int type = icmp_hdr(skb)->type; const int code = icmp_hdr(skb)->code; - struct sock *sk; struct request_sock *fastopen; + struct tcp_sock *tp; u32 seq, snd_una; + struct sock *sk; int err; - struct net *net = dev_net(skb->dev);
sk = __inet_lookup_established(net, net->ipv4.tcp_death_row.hashinfo, iph->daddr, th->dest, iph->saddr, @@ -786,7 +786,7 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb, arg.iov[0].iov_base = (unsigned char *)&rep; arg.iov[0].iov_len = sizeof(rep.th);
- net = sk ? sock_net(sk) : dev_net(skb_dst(skb)->dev); + net = sk ? sock_net(sk) : dev_net_rcu(skb_dst(skb)->dev);
/* Invalid TCP option size or twice included auth */ if (tcp_parse_auth_options(tcp_hdr(skb), &md5_hash_location, &aoh)) @@ -1965,7 +1965,7 @@ EXPORT_SYMBOL(tcp_v4_do_rcv);
int tcp_v4_early_demux(struct sk_buff *skb) { - struct net *net = dev_net(skb->dev); + struct net *net = dev_net_rcu(skb->dev); const struct iphdr *iph; const struct tcphdr *th; struct sock *sk; @@ -2176,7 +2176,7 @@ static void tcp_v4_fill_cb(struct sk_buff *skb, const struct iphdr *iph,
int tcp_v4_rcv(struct sk_buff *skb) { - struct net *net = dev_net(skb->dev); + struct net *net = dev_net_rcu(skb->dev); enum skb_drop_reason drop_reason; int sdif = inet_sdif(skb); int dif = inet_iif(skb); diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c index 95669935494ef..4251670e328c8 100644 --- a/net/ipv4/tcp_metrics.c +++ b/net/ipv4/tcp_metrics.c @@ -170,7 +170,7 @@ static struct tcp_metrics_block *tcpm_new(struct dst_entry *dst, bool reclaim = false;
spin_lock_bh(&tcp_metrics_lock); - net = dev_net(dst->dev); + net = dev_net_rcu(dst->dev);
/* While waiting for the spin-lock the cache might have been populated * with this entry and so we have to check again. @@ -273,7 +273,7 @@ static struct tcp_metrics_block *__tcp_get_metrics_req(struct request_sock *req, return NULL; }
- net = dev_net(dst->dev); + net = dev_net_rcu(dst->dev); hash ^= net_hash_mix(net); hash = hash_32(hash, tcp_metrics_hash_log);
@@ -318,7 +318,7 @@ static struct tcp_metrics_block *tcp_get_metrics(struct sock *sk, else return NULL;
- net = dev_net(dst->dev); + net = dev_net_rcu(dst->dev); hash ^= net_hash_mix(net); hash = hash_32(hash, tcp_metrics_hash_log);
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 882ce5444572e..06edfe0b2db90 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -376,7 +376,7 @@ static int tcp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt, { const struct ipv6hdr *hdr = (const struct ipv6hdr *)skb->data; const struct tcphdr *th = (struct tcphdr *)(skb->data+offset); - struct net *net = dev_net(skb->dev); + struct net *net = dev_net_rcu(skb->dev); struct request_sock *fastopen; struct ipv6_pinfo *np; struct tcp_sock *tp; @@ -864,16 +864,16 @@ static void tcp_v6_send_response(const struct sock *sk, struct sk_buff *skb, u32 int oif, int rst, u8 tclass, __be32 label, u32 priority, u32 txhash, struct tcp_key *key) { - const struct tcphdr *th = tcp_hdr(skb); - struct tcphdr *t1; - struct sk_buff *buff; - struct flowi6 fl6; - struct net *net = sk ? sock_net(sk) : dev_net(skb_dst(skb)->dev); - struct sock *ctl_sk = net->ipv6.tcp_sk; + struct net *net = sk ? sock_net(sk) : dev_net_rcu(skb_dst(skb)->dev); unsigned int tot_len = sizeof(struct tcphdr); + struct sock *ctl_sk = net->ipv6.tcp_sk; + const struct tcphdr *th = tcp_hdr(skb); __be32 mrst = 0, *topt; struct dst_entry *dst; - __u32 mark = 0; + struct sk_buff *buff; + struct tcphdr *t1; + struct flowi6 fl6; + u32 mark = 0;
if (tsecr) tot_len += TCPOLEN_TSTAMP_ALIGNED; @@ -1036,7 +1036,7 @@ static void tcp_v6_send_reset(const struct sock *sk, struct sk_buff *skb, if (!sk && !ipv6_unicast_destination(skb)) return;
- net = sk ? sock_net(sk) : dev_net(skb_dst(skb)->dev); + net = sk ? sock_net(sk) : dev_net_rcu(skb_dst(skb)->dev); /* Invalid TCP option size or twice included auth */ if (tcp_parse_auth_options(th, &md5_hash_location, &aoh)) return; @@ -1739,6 +1739,7 @@ static void tcp_v6_fill_cb(struct sk_buff *skb, const struct ipv6hdr *hdr,
INDIRECT_CALLABLE_SCOPE int tcp_v6_rcv(struct sk_buff *skb) { + struct net *net = dev_net_rcu(skb->dev); enum skb_drop_reason drop_reason; int sdif = inet6_sdif(skb); int dif = inet6_iif(skb); @@ -1748,7 +1749,6 @@ INDIRECT_CALLABLE_SCOPE int tcp_v6_rcv(struct sk_buff *skb) bool refcounted; int ret; u32 isn; - struct net *net = dev_net(skb->dev);
drop_reason = SKB_DROP_REASON_NOT_SPECIFIED; if (skb->pkt_type != PACKET_HOST) @@ -1999,7 +1999,7 @@ INDIRECT_CALLABLE_SCOPE int tcp_v6_rcv(struct sk_buff *skb)
void tcp_v6_early_demux(struct sk_buff *skb) { - struct net *net = dev_net(skb->dev); + struct net *net = dev_net_rcu(skb->dev); const struct ipv6hdr *hdr; const struct tcphdr *th; struct sock *sk;
From: Eric Dumazet edumazet@google.com
[ Upstream commit 15492700ac41459b54a6683490adcee350ab11e3 ]
tcp_in_quickack_mode() is called from input path for small packets.
It calls __sk_dst_get() which reads sk->sk_dst_cache which has been put in sock_read_tx group (for good reasons).
Then dst_metric(dst, RTAX_QUICKACK) also needs extra cache line misses.
Cache RTAX_QUICKACK in icsk->icsk_ack.dst_quick_ack to no longer pull these cache lines for the cases a delayed ACK is scheduled.
After this patch TCP receive path does not longer access sock_read_tx group.
Signed-off-by: Eric Dumazet edumazet@google.com Reviewed-by: Jason Xing kerneljasonxing@gmail.com Reviewed-by: Neal Cardwell ncardwell@google.com Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com Link: https://patch.msgid.link/20250312083907.1931644-1-edumazet@google.com Signed-off-by: Paolo Abeni pabeni@redhat.com Stable-dep-of: 833d4313bc1e ("mptcp: reset blackhole on success with non-loopback ifaces") Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/inet_connection_sock.h | 3 ++- net/core/sock.c | 6 +++++- net/ipv4/tcp_input.c | 3 +-- 3 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h index 4bd93571e6c1b..bcc138ff087bd 100644 --- a/include/net/inet_connection_sock.h +++ b/include/net/inet_connection_sock.h @@ -116,7 +116,8 @@ struct inet_connection_sock { #define ATO_BITS 8 __u32 ato:ATO_BITS, /* Predicted tick of soft clock */ lrcv_flowlabel:20, /* last received ipv6 flowlabel */ - unused:4; + dst_quick_ack:1, /* cache dst RTAX_QUICKACK */ + unused:3; unsigned long timeout; /* Currently scheduled timeout */ __u32 lrcvtime; /* timestamp of last received data packet */ __u16 last_seg_size; /* Size of last incoming segment */ diff --git a/net/core/sock.c b/net/core/sock.c index d392cb37a864f..b5723adab4ebf 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -2547,8 +2547,12 @@ void sk_setup_caps(struct sock *sk, struct dst_entry *dst) u32 max_segs = 1;
sk->sk_route_caps = dst->dev->features; - if (sk_is_tcp(sk)) + if (sk_is_tcp(sk)) { + struct inet_connection_sock *icsk = inet_csk(sk); + sk->sk_route_caps |= NETIF_F_GSO; + icsk->icsk_ack.dst_quick_ack = dst_metric(dst, RTAX_QUICKACK); + } if (sk->sk_route_caps & NETIF_F_GSO) sk->sk_route_caps |= NETIF_F_GSO_SOFTWARE; if (unlikely(sk->sk_gso_disabled)) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 4c8d84fc27ca3..1d9e93a04930b 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -331,9 +331,8 @@ static void tcp_enter_quickack_mode(struct sock *sk, unsigned int max_quickacks) static bool tcp_in_quickack_mode(struct sock *sk) { const struct inet_connection_sock *icsk = inet_csk(sk); - const struct dst_entry *dst = __sk_dst_get(sk);
- return (dst && dst_metric(dst, RTAX_QUICKACK)) || + return icsk->icsk_ack.dst_quick_ack || (icsk->icsk_ack.quick && !inet_csk_in_pingpong_mode(sk)); }
From: Eric Dumazet edumazet@google.com
[ Upstream commit 88fe14253e181878c2ddb51a298ae8c468a63010 ]
dst->dev is read locklessly in many contexts, and written in dst_dev_put().
Fixing all the races is going to need many changes.
We probably will have to add full RCU protection.
Add three helpers to ease this painful process.
static inline struct net_device *dst_dev(const struct dst_entry *dst) { return READ_ONCE(dst->dev); }
static inline struct net_device *skb_dst_dev(const struct sk_buff *skb) { return dst_dev(skb_dst(skb)); }
static inline struct net *skb_dst_dev_net(const struct sk_buff *skb) { return dev_net(skb_dst_dev(skb)); }
static inline struct net *skb_dst_dev_net_rcu(const struct sk_buff *skb) { return dev_net_rcu(skb_dst_dev(skb)); }
Fixes: 4a6ce2b6f2ec ("net: introduce a new function dst_dev_put()") Signed-off-by: Eric Dumazet edumazet@google.com Reviewed-by: Kuniyuki Iwashima kuniyu@google.com Link: https://patch.msgid.link/20250630121934.3399505-7-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Stable-dep-of: 833d4313bc1e ("mptcp: reset blackhole on success with non-loopback ifaces") Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/dst.h | 20 ++++++++++++++++++++ net/core/dst.c | 4 ++-- net/core/sock.c | 8 ++++---- 3 files changed, 26 insertions(+), 6 deletions(-)
diff --git a/include/net/dst.h b/include/net/dst.h index e18826cd05595..23ee8139a7563 100644 --- a/include/net/dst.h +++ b/include/net/dst.h @@ -561,6 +561,26 @@ static inline void skb_dst_update_pmtu_no_confirm(struct sk_buff *skb, u32 mtu) dst->ops->update_pmtu(dst, NULL, skb, mtu, false); }
+static inline struct net_device *dst_dev(const struct dst_entry *dst) +{ + return READ_ONCE(dst->dev); +} + +static inline struct net_device *skb_dst_dev(const struct sk_buff *skb) +{ + return dst_dev(skb_dst(skb)); +} + +static inline struct net *skb_dst_dev_net(const struct sk_buff *skb) +{ + return dev_net(skb_dst_dev(skb)); +} + +static inline struct net *skb_dst_dev_net_rcu(const struct sk_buff *skb) +{ + return dev_net_rcu(skb_dst_dev(skb)); +} + struct dst_entry *dst_blackhole_check(struct dst_entry *dst, u32 cookie); void dst_blackhole_update_pmtu(struct dst_entry *dst, struct sock *sk, struct sk_buff *skb, u32 mtu, bool confirm_neigh); diff --git a/net/core/dst.c b/net/core/dst.c index cc990706b6451..9a0ddef8bee43 100644 --- a/net/core/dst.c +++ b/net/core/dst.c @@ -150,7 +150,7 @@ void dst_dev_put(struct dst_entry *dst) dst->ops->ifdown(dst, dev); WRITE_ONCE(dst->input, dst_discard); WRITE_ONCE(dst->output, dst_discard_out); - dst->dev = blackhole_netdev; + WRITE_ONCE(dst->dev, blackhole_netdev); netdev_ref_replace(dev, blackhole_netdev, &dst->dev_tracker, GFP_ATOMIC); } @@ -263,7 +263,7 @@ unsigned int dst_blackhole_mtu(const struct dst_entry *dst) { unsigned int mtu = dst_metric_raw(dst, RTAX_MTU);
- return mtu ? : dst->dev->mtu; + return mtu ? : dst_dev(dst)->mtu; } EXPORT_SYMBOL_GPL(dst_blackhole_mtu);
diff --git a/net/core/sock.c b/net/core/sock.c index b5723adab4ebf..a5f248a914042 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -2534,8 +2534,8 @@ static u32 sk_dst_gso_max_size(struct sock *sk, struct dst_entry *dst) !ipv6_addr_v4mapped(&sk->sk_v6_rcv_saddr)); #endif /* pairs with the WRITE_ONCE() in netif_set_gso(_ipv4)_max_size() */ - max_size = is_ipv6 ? READ_ONCE(dst->dev->gso_max_size) : - READ_ONCE(dst->dev->gso_ipv4_max_size); + max_size = is_ipv6 ? READ_ONCE(dst_dev(dst)->gso_max_size) : + READ_ONCE(dst_dev(dst)->gso_ipv4_max_size); if (max_size > GSO_LEGACY_MAX_SIZE && !sk_is_tcp(sk)) max_size = GSO_LEGACY_MAX_SIZE;
@@ -2546,7 +2546,7 @@ void sk_setup_caps(struct sock *sk, struct dst_entry *dst) { u32 max_segs = 1;
- sk->sk_route_caps = dst->dev->features; + sk->sk_route_caps = dst_dev(dst)->features; if (sk_is_tcp(sk)) { struct inet_connection_sock *icsk = inet_csk(sk);
@@ -2564,7 +2564,7 @@ void sk_setup_caps(struct sock *sk, struct dst_entry *dst) sk->sk_route_caps |= NETIF_F_SG | NETIF_F_HW_CSUM; sk->sk_gso_max_size = sk_dst_gso_max_size(sk, dst); /* pairs with the WRITE_ONCE() in netif_set_gso_max_segs() */ - max_segs = max_t(u32, READ_ONCE(dst->dev->gso_max_segs), 1); + max_segs = max_t(u32, READ_ONCE(dst_dev(dst)->gso_max_segs), 1); } } sk->sk_gso_max_segs = max_segs;
From: Eric Dumazet edumazet@google.com
[ Upstream commit a74fc62eec155ca5a6da8ff3856f3dc87fe24558 ]
Use the new helpers as a first step to deal with potential dst->dev races.
Signed-off-by: Eric Dumazet edumazet@google.com Reviewed-by: Kuniyuki Iwashima kuniyu@google.com Link: https://patch.msgid.link/20250630121934.3399505-8-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Stable-dep-of: 833d4313bc1e ("mptcp: reset blackhole on success with non-loopback ifaces") Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/inet_hashtables.h | 2 +- include/net/ip.h | 11 ++++++----- include/net/route.h | 2 +- net/ipv4/icmp.c | 24 +++++++++++++----------- net/ipv4/igmp.c | 2 +- net/ipv4/ip_fragment.c | 2 +- net/ipv4/ip_output.c | 6 +++--- net/ipv4/ip_vti.c | 4 ++-- net/ipv4/netfilter.c | 4 ++-- net/ipv4/route.c | 8 ++++---- net/ipv4/tcp_fastopen.c | 4 +++- net/ipv4/tcp_ipv4.c | 2 +- net/ipv4/tcp_metrics.c | 8 ++++---- net/ipv4/xfrm4_output.c | 2 +- 14 files changed, 43 insertions(+), 38 deletions(-)
diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h index da818fb0205fe..3c4118c63cfe1 100644 --- a/include/net/inet_hashtables.h +++ b/include/net/inet_hashtables.h @@ -492,7 +492,7 @@ static inline struct sock *__inet_lookup_skb(struct inet_hashinfo *hashinfo, const int sdif, bool *refcounted) { - struct net *net = dev_net_rcu(skb_dst(skb)->dev); + struct net *net = skb_dst_dev_net_rcu(skb); const struct iphdr *iph = ip_hdr(skb); struct sock *sk;
diff --git a/include/net/ip.h b/include/net/ip.h index bd201278c55a5..5f0f1215d2f92 100644 --- a/include/net/ip.h +++ b/include/net/ip.h @@ -475,7 +475,7 @@ static inline unsigned int ip_dst_mtu_maybe_forward(const struct dst_entry *dst,
rcu_read_lock();
- net = dev_net_rcu(dst->dev); + net = dev_net_rcu(dst_dev(dst)); if (READ_ONCE(net->ipv4.sysctl_ip_fwd_use_pmtu) || ip_mtu_locked(dst) || !forwarding) { @@ -489,7 +489,7 @@ static inline unsigned int ip_dst_mtu_maybe_forward(const struct dst_entry *dst, if (mtu) goto out;
- mtu = READ_ONCE(dst->dev->mtu); + mtu = READ_ONCE(dst_dev(dst)->mtu);
if (unlikely(ip_mtu_locked(dst))) { if (rt->rt_uses_gateway && mtu > 576) @@ -509,16 +509,17 @@ static inline unsigned int ip_dst_mtu_maybe_forward(const struct dst_entry *dst, static inline unsigned int ip_skb_dst_mtu(struct sock *sk, const struct sk_buff *skb) { + const struct dst_entry *dst = skb_dst(skb); unsigned int mtu;
if (!sk || !sk_fullsock(sk) || ip_sk_use_pmtu(sk)) { bool forwarding = IPCB(skb)->flags & IPSKB_FORWARDED;
- return ip_dst_mtu_maybe_forward(skb_dst(skb), forwarding); + return ip_dst_mtu_maybe_forward(dst, forwarding); }
- mtu = min(READ_ONCE(skb_dst(skb)->dev->mtu), IP_MAX_MTU); - return mtu - lwtunnel_headroom(skb_dst(skb)->lwtstate, mtu); + mtu = min(READ_ONCE(dst_dev(dst)->mtu), IP_MAX_MTU); + return mtu - lwtunnel_headroom(dst->lwtstate, mtu); }
struct dst_metrics *ip_fib_metrics_init(struct nlattr *fc_mx, int fc_mx_len, diff --git a/include/net/route.h b/include/net/route.h index 8a11d19f897bb..232b7bf55ba22 100644 --- a/include/net/route.h +++ b/include/net/route.h @@ -369,7 +369,7 @@ static inline int ip4_dst_hoplimit(const struct dst_entry *dst) const struct net *net;
rcu_read_lock(); - net = dev_net_rcu(dst->dev); + net = dev_net_rcu(dst_dev(dst)); hoplimit = READ_ONCE(net->ipv4.sysctl_ip_default_ttl); rcu_read_unlock(); } diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c index 8f11870b77377..508b23204edc5 100644 --- a/net/ipv4/icmp.c +++ b/net/ipv4/icmp.c @@ -311,18 +311,20 @@ static bool icmpv4_xrlim_allow(struct net *net, struct rtable *rt, { struct dst_entry *dst = &rt->dst; struct inet_peer *peer; + struct net_device *dev; bool rc = true;
if (!apply_ratelimit) return true;
/* No rate limit on loopback */ - if (dst->dev && (dst->dev->flags&IFF_LOOPBACK)) + dev = dst_dev(dst); + if (dev && (dev->flags & IFF_LOOPBACK)) goto out;
rcu_read_lock(); peer = inet_getpeer_v4(net->ipv4.peers, fl4->daddr, - l3mdev_master_ifindex_rcu(dst->dev)); + l3mdev_master_ifindex_rcu(dev)); rc = inet_peer_xrlim_allow(peer, READ_ONCE(net->ipv4.sysctl_icmp_ratelimit)); rcu_read_unlock(); @@ -468,13 +470,13 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb) */ static struct net_device *icmp_get_route_lookup_dev(struct sk_buff *skb) { - struct net_device *route_lookup_dev = NULL; + struct net_device *dev = skb->dev; + const struct dst_entry *dst;
- if (skb->dev) - route_lookup_dev = skb->dev; - else if (skb_dst(skb)) - route_lookup_dev = skb_dst(skb)->dev; - return route_lookup_dev; + if (dev) + return dev; + dst = skb_dst(skb); + return dst ? dst_dev(dst) : NULL; }
static struct rtable *icmp_route_lookup(struct net *net, struct flowi4 *fl4, @@ -873,7 +875,7 @@ static enum skb_drop_reason icmp_unreach(struct sk_buff *skb) struct net *net; u32 info = 0;
- net = dev_net_rcu(skb_dst(skb)->dev); + net = skb_dst_dev_net_rcu(skb);
/* * Incomplete header ? @@ -1016,7 +1018,7 @@ static enum skb_drop_reason icmp_echo(struct sk_buff *skb) struct icmp_bxm icmp_param; struct net *net;
- net = dev_net_rcu(skb_dst(skb)->dev); + net = skb_dst_dev_net_rcu(skb); /* should there be an ICMP stat for ignored echos? */ if (READ_ONCE(net->ipv4.sysctl_icmp_echo_ignore_all)) return SKB_NOT_DROPPED_YET; @@ -1186,7 +1188,7 @@ static enum skb_drop_reason icmp_timestamp(struct sk_buff *skb) return SKB_NOT_DROPPED_YET;
out_err: - __ICMP_INC_STATS(dev_net_rcu(skb_dst(skb)->dev), ICMP_MIB_INERRORS); + __ICMP_INC_STATS(skb_dst_dev_net_rcu(skb), ICMP_MIB_INERRORS); return SKB_DROP_REASON_PKT_TOO_SMALL; }
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c index 9bf09de6a2e77..f4a87b90351e9 100644 --- a/net/ipv4/igmp.c +++ b/net/ipv4/igmp.c @@ -424,7 +424,7 @@ static int igmpv3_sendpack(struct sk_buff *skb)
pig->csum = ip_compute_csum(igmp_hdr(skb), igmplen);
- return ip_local_out(dev_net(skb_dst(skb)->dev), skb->sk, skb); + return ip_local_out(skb_dst_dev_net(skb), skb->sk, skb); }
static int grec_size(struct ip_mc_list *pmc, int type, int gdel, int sdel) diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c index 9ca0a183a55ff..183856b0b7409 100644 --- a/net/ipv4/ip_fragment.c +++ b/net/ipv4/ip_fragment.c @@ -488,7 +488,7 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *skb, /* Process an incoming IP datagram fragment. */ int ip_defrag(struct net *net, struct sk_buff *skb, u32 user) { - struct net_device *dev = skb->dev ? : skb_dst(skb)->dev; + struct net_device *dev = skb->dev ? : skb_dst_dev(skb); int vif = l3mdev_master_ifindex_rcu(dev); struct ipq *qp;
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 49811c9281d42..4d432e314bcb2 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -117,7 +117,7 @@ int __ip_local_out(struct net *net, struct sock *sk, struct sk_buff *skb) skb->protocol = htons(ETH_P_IP);
return nf_hook(NFPROTO_IPV4, NF_INET_LOCAL_OUT, - net, sk, skb, NULL, skb_dst(skb)->dev, + net, sk, skb, NULL, skb_dst_dev(skb), dst_output); }
@@ -200,7 +200,7 @@ static int ip_finish_output2(struct net *net, struct sock *sk, struct sk_buff *s { struct dst_entry *dst = skb_dst(skb); struct rtable *rt = dst_rtable(dst); - struct net_device *dev = dst->dev; + struct net_device *dev = dst_dev(dst); unsigned int hh_len = LL_RESERVED_SPACE(dev); struct neighbour *neigh; bool is_v6gw = false; @@ -426,7 +426,7 @@ int ip_mc_output(struct net *net, struct sock *sk, struct sk_buff *skb)
int ip_output(struct net *net, struct sock *sk, struct sk_buff *skb) { - struct net_device *dev = skb_dst(skb)->dev, *indev = skb->dev; + struct net_device *dev = skb_dst_dev(skb), *indev = skb->dev;
skb->dev = dev; skb->protocol = htons(ETH_P_IP); diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c index f0b4419cef349..fc95161663071 100644 --- a/net/ipv4/ip_vti.c +++ b/net/ipv4/ip_vti.c @@ -229,7 +229,7 @@ static netdev_tx_t vti_xmit(struct sk_buff *skb, struct net_device *dev, goto tx_error_icmp; }
- tdev = dst->dev; + tdev = dst_dev(dst);
if (tdev == dev) { dst_release(dst); @@ -259,7 +259,7 @@ static netdev_tx_t vti_xmit(struct sk_buff *skb, struct net_device *dev, xmit: skb_scrub_packet(skb, !net_eq(tunnel->net, dev_net(dev))); skb_dst_set(skb, dst); - skb->dev = skb_dst(skb)->dev; + skb->dev = skb_dst_dev(skb);
err = dst_output(tunnel->net, skb->sk, skb); if (net_xmit_eval(err) == 0) diff --git a/net/ipv4/netfilter.c b/net/ipv4/netfilter.c index e0aab66cd9251..dff06b9eb6607 100644 --- a/net/ipv4/netfilter.c +++ b/net/ipv4/netfilter.c @@ -20,12 +20,12 @@ /* route_me_harder function, used by iptable_nat, iptable_mangle + ip_queue */ int ip_route_me_harder(struct net *net, struct sock *sk, struct sk_buff *skb, unsigned int addr_type) { + struct net_device *dev = skb_dst_dev(skb); const struct iphdr *iph = ip_hdr(skb); struct rtable *rt; struct flowi4 fl4 = {}; __be32 saddr = iph->saddr; __u8 flags; - struct net_device *dev = skb_dst(skb)->dev; struct flow_keys flkeys; unsigned int hh_len;
@@ -74,7 +74,7 @@ int ip_route_me_harder(struct net *net, struct sock *sk, struct sk_buff *skb, un #endif
/* Change in oif may mean change in hh_len. */ - hh_len = skb_dst(skb)->dev->hard_header_len; + hh_len = skb_dst_dev(skb)->hard_header_len; if (skb_headroom(skb) < hh_len && pskb_expand_head(skb, HH_DATA_ALIGN(hh_len - skb_headroom(skb)), 0, GFP_ATOMIC)) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 261ddb6542a40..7d04df4fc6608 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -413,7 +413,7 @@ static struct neighbour *ipv4_neigh_lookup(const struct dst_entry *dst, const void *daddr) { const struct rtable *rt = container_of(dst, struct rtable, dst); - struct net_device *dev = dst->dev; + struct net_device *dev = dst_dev(dst); struct neighbour *n;
rcu_read_lock(); @@ -440,7 +440,7 @@ static struct neighbour *ipv4_neigh_lookup(const struct dst_entry *dst, static void ipv4_confirm_neigh(const struct dst_entry *dst, const void *daddr) { const struct rtable *rt = container_of(dst, struct rtable, dst); - struct net_device *dev = dst->dev; + struct net_device *dev = dst_dev(dst); const __be32 *pkey = daddr;
if (rt->rt_gw_family == AF_INET) { @@ -1025,7 +1025,7 @@ static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu) return;
rcu_read_lock(); - net = dev_net_rcu(dst->dev); + net = dev_net_rcu(dst_dev(dst)); if (mtu < net->ipv4.ip_rt_min_pmtu) { lock = true; mtu = min(old_mtu, net->ipv4.ip_rt_min_pmtu); @@ -1323,7 +1323,7 @@ static unsigned int ipv4_default_advmss(const struct dst_entry *dst) struct net *net;
rcu_read_lock(); - net = dev_net_rcu(dst->dev); + net = dev_net_rcu(dst_dev(dst)); advmss = max_t(unsigned int, ipv4_mtu(dst) - header_size, net->ipv4.ip_rt_min_advmss); rcu_read_unlock(); diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c index 408985eb74eef..86c995dc1c5e5 100644 --- a/net/ipv4/tcp_fastopen.c +++ b/net/ipv4/tcp_fastopen.c @@ -558,6 +558,7 @@ bool tcp_fastopen_active_should_disable(struct sock *sk) void tcp_fastopen_active_disable_ofo_check(struct sock *sk) { struct tcp_sock *tp = tcp_sk(sk); + struct net_device *dev; struct dst_entry *dst; struct sk_buff *skb;
@@ -575,7 +576,8 @@ void tcp_fastopen_active_disable_ofo_check(struct sock *sk) } else if (tp->syn_fastopen_ch && atomic_read(&sock_net(sk)->ipv4.tfo_active_disable_times)) { dst = sk_dst_get(sk); - if (!(dst && dst->dev && (dst->dev->flags & IFF_LOOPBACK))) + dev = dst ? dst_dev(dst) : NULL; + if (!(dev && (dev->flags & IFF_LOOPBACK))) atomic_set(&sock_net(sk)->ipv4.tfo_active_disable_times, 0); dst_release(dst); } diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index d8976753d4e47..1572562b0498c 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -786,7 +786,7 @@ static void tcp_v4_send_reset(const struct sock *sk, struct sk_buff *skb, arg.iov[0].iov_base = (unsigned char *)&rep; arg.iov[0].iov_len = sizeof(rep.th);
- net = sk ? sock_net(sk) : dev_net_rcu(skb_dst(skb)->dev); + net = sk ? sock_net(sk) : skb_dst_dev_net_rcu(skb);
/* Invalid TCP option size or twice included auth */ if (tcp_parse_auth_options(tcp_hdr(skb), &md5_hash_location, &aoh)) diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c index 4251670e328c8..03c068ea27b6a 100644 --- a/net/ipv4/tcp_metrics.c +++ b/net/ipv4/tcp_metrics.c @@ -166,11 +166,11 @@ static struct tcp_metrics_block *tcpm_new(struct dst_entry *dst, unsigned int hash) { struct tcp_metrics_block *tm; - struct net *net; bool reclaim = false; + struct net *net;
spin_lock_bh(&tcp_metrics_lock); - net = dev_net_rcu(dst->dev); + net = dev_net_rcu(dst_dev(dst));
/* While waiting for the spin-lock the cache might have been populated * with this entry and so we have to check again. @@ -273,7 +273,7 @@ static struct tcp_metrics_block *__tcp_get_metrics_req(struct request_sock *req, return NULL; }
- net = dev_net_rcu(dst->dev); + net = dev_net_rcu(dst_dev(dst)); hash ^= net_hash_mix(net); hash = hash_32(hash, tcp_metrics_hash_log);
@@ -318,7 +318,7 @@ static struct tcp_metrics_block *tcp_get_metrics(struct sock *sk, else return NULL;
- net = dev_net_rcu(dst->dev); + net = dev_net_rcu(dst_dev(dst)); hash ^= net_hash_mix(net); hash = hash_32(hash, tcp_metrics_hash_log);
diff --git a/net/ipv4/xfrm4_output.c b/net/ipv4/xfrm4_output.c index 3cff51ba72bb0..0ae67d537499a 100644 --- a/net/ipv4/xfrm4_output.c +++ b/net/ipv4/xfrm4_output.c @@ -31,7 +31,7 @@ static int __xfrm4_output(struct net *net, struct sock *sk, struct sk_buff *skb) int xfrm4_output(struct net *net, struct sock *sk, struct sk_buff *skb) { return NF_HOOK_COND(NFPROTO_IPV4, NF_INET_POST_ROUTING, - net, sk, skb, skb->dev, skb_dst(skb)->dev, + net, sk, skb, skb->dev, skb_dst_dev(skb), __xfrm4_output, !(IPCB(skb)->flags & IPSKB_REROUTED)); }
From: Sharath Chandra Vurukala quic_sharathv@quicinc.com
[ Upstream commit 1dbf1d590d10a6d1978e8184f8dfe20af22d680a ]
In ip_output() skb->dev is updated from the skb_dst(skb)->dev this can become invalid when the interface is unregistered and freed,
Introduced new skb_dst_dev_rcu() function to be used instead of skb_dst_dev() within rcu_locks in ip_output.This will ensure that all the skb's associated with the dev being deregistered will be transnmitted out first, before freeing the dev.
Given that ip_output() is called within an rcu_read_lock() critical section or from a bottom-half context, it is safe to introduce an RCU read-side critical section within it.
Multiple panic call stacks were observed when UL traffic was run in concurrency with device deregistration from different functions, pasting one sample for reference.
[496733.627565][T13385] Call trace: [496733.627570][T13385] bpf_prog_ce7c9180c3b128ea_cgroupskb_egres+0x24c/0x7f0 [496733.627581][T13385] __cgroup_bpf_run_filter_skb+0x128/0x498 [496733.627595][T13385] ip_finish_output+0xa4/0xf4 [496733.627605][T13385] ip_output+0x100/0x1a0 [496733.627613][T13385] ip_send_skb+0x68/0x100 [496733.627618][T13385] udp_send_skb+0x1c4/0x384 [496733.627625][T13385] udp_sendmsg+0x7b0/0x898 [496733.627631][T13385] inet_sendmsg+0x5c/0x7c [496733.627639][T13385] __sys_sendto+0x174/0x1e4 [496733.627647][T13385] __arm64_sys_sendto+0x28/0x3c [496733.627653][T13385] invoke_syscall+0x58/0x11c [496733.627662][T13385] el0_svc_common+0x88/0xf4 [496733.627669][T13385] do_el0_svc+0x2c/0xb0 [496733.627676][T13385] el0_svc+0x2c/0xa4 [496733.627683][T13385] el0t_64_sync_handler+0x68/0xb4 [496733.627689][T13385] el0t_64_sync+0x1a4/0x1a8
Changes in v3: - Replaced WARN_ON() with WARN_ON_ONCE(), as suggested by Willem de Bruijn. - Dropped legacy lines mistakenly pulled in from an outdated branch.
Changes in v2: - Addressed review comments from Eric Dumazet - Used READ_ONCE() to prevent potential load/store tearing - Added skb_dst_dev_rcu() and used along with rcu_read_lock() in ip_output
Signed-off-by: Sharath Chandra Vurukala quic_sharathv@quicinc.com Reviewed-by: Eric Dumazet edumazet@google.com Link: https://patch.msgid.link/20250730105118.GA26100@hu-sharathv-hyd.qualcomm.com Signed-off-by: Jakub Kicinski kuba@kernel.org Stable-dep-of: 833d4313bc1e ("mptcp: reset blackhole on success with non-loopback ifaces") Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/dst.h | 12 ++++++++++++ net/ipv4/ip_output.c | 15 ++++++++++----- 2 files changed, 22 insertions(+), 5 deletions(-)
diff --git a/include/net/dst.h b/include/net/dst.h index 23ee8139a7563..e5c9ea1883838 100644 --- a/include/net/dst.h +++ b/include/net/dst.h @@ -566,11 +566,23 @@ static inline struct net_device *dst_dev(const struct dst_entry *dst) return READ_ONCE(dst->dev); }
+static inline struct net_device *dst_dev_rcu(const struct dst_entry *dst) +{ + /* In the future, use rcu_dereference(dst->dev) */ + WARN_ON_ONCE(!rcu_read_lock_held()); + return READ_ONCE(dst->dev); +} + static inline struct net_device *skb_dst_dev(const struct sk_buff *skb) { return dst_dev(skb_dst(skb)); }
+static inline struct net_device *skb_dst_dev_rcu(const struct sk_buff *skb) +{ + return dst_dev_rcu(skb_dst(skb)); +} + static inline struct net *skb_dst_dev_net(const struct sk_buff *skb) { return dev_net(skb_dst_dev(skb)); diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 4d432e314bcb2..0bda561aa8a89 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -426,15 +426,20 @@ int ip_mc_output(struct net *net, struct sock *sk, struct sk_buff *skb)
int ip_output(struct net *net, struct sock *sk, struct sk_buff *skb) { - struct net_device *dev = skb_dst_dev(skb), *indev = skb->dev; + struct net_device *dev, *indev = skb->dev; + int ret_val;
+ rcu_read_lock(); + dev = skb_dst_dev_rcu(skb); skb->dev = dev; skb->protocol = htons(ETH_P_IP);
- return NF_HOOK_COND(NFPROTO_IPV4, NF_INET_POST_ROUTING, - net, sk, skb, indev, dev, - ip_finish_output, - !(IPCB(skb)->flags & IPSKB_REROUTED)); + ret_val = NF_HOOK_COND(NFPROTO_IPV4, NF_INET_POST_ROUTING, + net, sk, skb, indev, dev, + ip_finish_output, + !(IPCB(skb)->flags & IPSKB_REROUTED)); + rcu_read_unlock(); + return ret_val; } EXPORT_SYMBOL(ip_output);
From: Kuniyuki Iwashima kuniyu@google.com
[ Upstream commit 108a86c71c93ff28087994e6107bc99ebe336629 ]
mptcp_active_enable() calls sk_dst_get(), which returns dst with its refcount bumped, but forgot dst_release().
Let's add missing dst_release().
Cc: stable@vger.kernel.org Fixes: 27069e7cb3d1 ("mptcp: disable active MPTCP in case of blackhole") Signed-off-by: Kuniyuki Iwashima kuniyu@google.com Reviewed-by: Matthieu Baerts (NGI0) matttbe@kernel.org Reviewed-by: Eric Dumazet edumazet@google.com Link: https://patch.msgid.link/20250916214758.650211-7-kuniyu@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Stable-dep-of: 833d4313bc1e ("mptcp: reset blackhole on success with non-loopback ifaces") Signed-off-by: Sasha Levin sashal@kernel.org --- net/mptcp/ctrl.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/net/mptcp/ctrl.c b/net/mptcp/ctrl.c index dd595d9b5e50c..6a97d11bda7e2 100644 --- a/net/mptcp/ctrl.c +++ b/net/mptcp/ctrl.c @@ -385,6 +385,8 @@ void mptcp_active_enable(struct sock *sk)
if (dst && dst->dev && (dst->dev->flags & IFF_LOOPBACK)) atomic_set(&pernet->active_disable_times, 0); + + dst_release(dst); } }
From: Kuniyuki Iwashima kuniyu@google.com
[ Upstream commit 893c49a78d9f85e4b8081b908fb7c407d018106a ]
mptcp_active_enable() is called from subflow_finish_connect(), which is icsk->icsk_af_ops->sk_rx_dst_set() and it's not always under RCU.
Using sk_dst_get(sk)->dev could trigger UAF.
Let's use __sk_dst_get() and dst_dev_rcu().
Fixes: 27069e7cb3d1 ("mptcp: disable active MPTCP in case of blackhole") Signed-off-by: Kuniyuki Iwashima kuniyu@google.com Reviewed-by: Matthieu Baerts (NGI0) matttbe@kernel.org Reviewed-by: Eric Dumazet edumazet@google.com Link: https://patch.msgid.link/20250916214758.650211-8-kuniyu@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Stable-dep-of: 833d4313bc1e ("mptcp: reset blackhole on success with non-loopback ifaces") Signed-off-by: Sasha Levin sashal@kernel.org --- net/mptcp/ctrl.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/net/mptcp/ctrl.c b/net/mptcp/ctrl.c index 6a97d11bda7e2..9e1f61b02a66e 100644 --- a/net/mptcp/ctrl.c +++ b/net/mptcp/ctrl.c @@ -381,12 +381,15 @@ void mptcp_active_enable(struct sock *sk) struct mptcp_pernet *pernet = mptcp_get_pernet(sock_net(sk));
if (atomic_read(&pernet->active_disable_times)) { - struct dst_entry *dst = sk_dst_get(sk); + struct net_device *dev; + struct dst_entry *dst;
- if (dst && dst->dev && (dst->dev->flags & IFF_LOOPBACK)) + rcu_read_lock(); + dst = __sk_dst_get(sk); + dev = dst ? dst_dev_rcu(dst) : NULL; + if (dev && (dev->flags & IFF_LOOPBACK)) atomic_set(&pernet->active_disable_times, 0); - - dst_release(dst); + rcu_read_unlock(); } }
From: "Matthieu Baerts (NGI0)" matttbe@kernel.org
[ Upstream commit 833d4313bc1e9e194814917d23e8874d6b651649 ]
When a first MPTCP connection gets successfully established after a blackhole period, 'active_disable_times' was supposed to be reset when this connection was done via any non-loopback interfaces.
Unfortunately, the opposite condition was checked: only reset when the connection was established via a loopback interface. Fixing this by simply looking at the opposite.
This is similar to what is done with TCP FastOpen, see tcp_fastopen_active_disable_ofo_check().
This patch is a follow-up of a previous discussion linked to commit 893c49a78d9f ("mptcp: Use __sk_dst_get() and dst_dev_rcu() in mptcp_active_enable()."), see [1].
Fixes: 27069e7cb3d1 ("mptcp: disable active MPTCP in case of blackhole") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/4209a283-8822-47bd-95b7-87e96d9b7ea3@kernel.org [1] Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Reviewed-by: Simon Horman horms@kernel.org Reviewed-by: Kuniyuki Iwashima kuniyu@google.com Link: https://patch.msgid.link/20250918-net-next-mptcp-blackhole-reset-loopback-v1... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/mptcp/ctrl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/mptcp/ctrl.c b/net/mptcp/ctrl.c index 9e1f61b02a66e..0d60556cfefab 100644 --- a/net/mptcp/ctrl.c +++ b/net/mptcp/ctrl.c @@ -387,7 +387,7 @@ void mptcp_active_enable(struct sock *sk) rcu_read_lock(); dst = __sk_dst_get(sk); dev = dst ? dst_dev_rcu(dst) : NULL; - if (dev && (dev->flags & IFF_LOOPBACK)) + if (!(dev && (dev->flags & IFF_LOOPBACK))) atomic_set(&pernet->active_disable_times, 0); rcu_read_unlock(); }
Hi Sasha, Greg,
On 20/10/2025 17:44, Sasha Levin wrote:
From: "Matthieu Baerts (NGI0)" matttbe@kernel.org
[ Upstream commit 833d4313bc1e9e194814917d23e8874d6b651649 ]
The backport of this whole series for v6.12, containing these commits...
- e7b9ecce562c ("tcp: convert to dev_net_rcu()") - 15492700ac41 ("tcp: cache RTAX_QUICKACK metric in a hot cache line") - 88fe14253e18 ("net: dst: add four helpers to annotate data-races around dst->dev") - a74fc62eec15 ("ipv4: adopt dst_dev, skb_dst_dev and skb_dst_dev_net[_rcu]") - 1dbf1d590d10 ("net: Add locking to protect skb->dev access in ip_output") - 108a86c71c93 ("mptcp: Call dst_release() in mptcp_active_enable().") - 893c49a78d9f ("mptcp: Use __sk_dst_get() and dst_dev_rcu() in mptcp_active_enable().") - 833d4313bc1e ("mptcp: reset blackhole on success with non-loopback ifaces")
... looks good to me, if that's also OK for Eric and/or Kuniyuki.
Please note that these commits should also help to backport a few more fixes from Kuniyuki from this series:
https://lore.kernel.org/20250916214758.650211-1-kuniyu@google.com
- 3d3466878afd ("smc: Fix use-after-free in __pnet_find_base_ndev().") - 935d783e5de9 ("smc: Use __sk_dst_get() and dst_dev_rcu() in in smc_clc_prfx_set().") - 235f81045c00 ("smc: Use __sk_dst_get() and dst_dev_rcu() in smc_clc_prfx_match().") - 0b0e4d51c655 ("smc: Use __sk_dst_get() and dst_dev_rcu() in smc_vlan_by_tcpsk().") - c65f27b9c3be ("tls: Use __sk_dst_get() and dst_dev_rcu() in get_netdev_for_sock().")
And also eventually commit b62a59c18b69 ("tcp: use dst_dev_rcu() in tcp_fastopen_active_disable_ofo_check()") from Eric.
(I don't know who will initiate these backports.)
Cheers, Matt
linux-stable-mirror@lists.linaro.org