From: Eric Dumazet edumazet@google.com
[ Upstream commit 4fd84a0aaf2ba125b441aa09d415022385e66bf2 ]
inet_diag_bc_sk() runs with an unlocked socket, annotate potential races with READ_ONCE().
Signed-off-by: Eric Dumazet edumazet@google.com Reviewed-by: Kuniyuki Iwashima kuniyu@google.com Link: https://patch.msgid.link/20250828102738.2065992-4-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
YES
Explanation
- Fixes unlocked read races: inet_diag_bc_sk() runs without the socket lock; the patch snapshots fields using READ_ONCE() to avoid data races and torn/unstable reads when populating the filter entry used by the bytecode engine. - Snapshots `sk_family`: `net/ipv4/inet_diag.c:603` (`entry.family = READ_ONCE(sk->sk_family);`) - Snapshots ports and ifindex: `net/ipv4/inet_diag.c:605` (`entry.sport = READ_ONCE(inet->inet_num);`), `net/ipv4/inet_diag.c:606` (`entry.dport = ntohs(READ_ONCE(inet->inet_dport));`), `net/ipv4/inet_diag.c:607` (`entry.ifindex = READ_ONCE(sk->sk_bound_dev_if);`) - Snapshots userlocks and mark: `net/ipv4/inet_diag.c:609` (`entry.userlocks = sk_fullsock(sk) ? READ_ONCE(sk->sk_userlocks) : 0;`), `net/ipv4/inet_diag.c:612` (`entry.mark = READ_ONCE(sk->sk_mark);`) - Adds harmless const-correctness for `inet`: `net/ipv4/inet_diag.c:597` (`const struct inet_sock *inet = inet_sk(sk);`)
- Consistent family/address decision: entry_fill_addrs() now branches on the single, snapshotted family stored in entry, rather than re-reading the socket family, removing a possible inconsistency window between family read and address selection. - Uses `entry->family` instead of `sk->sk_family`: `net/ipv4/inet_diag.c:583` (`if (entry->family == AF_INET6) { ... }`)
- Impact and correctness: - inet_diag_bc_sk() collects a snapshot which the bytecode interpreter compares against filters. Without READ_ONCE(), unlocked reads may be optimized into multiple loads or observe mixed values, causing spurious filter mismatches (missing or extra sockets in diagnostics) or KCSAN data-race reports. - No behavioral change beyond stabilizing a read snapshot; avoids inconsistencies and better documents concurrency expectations.
- Small, contained change: - Single file (`net/ipv4/inet_diag.c`) and localized to entry construction in inet_diag_bc_sk() and to a helper branch condition. - No API/ABI changes; only read-side annotations and a trivial const qualifier.
- Low regression risk: - READ_ONCE() is a read-only annotation; it neither introduces ordering constraints beyond the atomic access nor changes control flow. - entry_fill_addrs() uses the already-read family for a consistent decision, which is strictly safer than re-reading an unlocked field.
- Broad but safe applicability: - inet_diag_bc_sk() is called across protocols (e.g., `net/ipv4/udp_diag.c:16`, `net/ipv4/tcp_diag.c:366`, `net/ipv4/raw_diag.c:131`, `net/mptcp/mptcp_diag.c:20`). All benefit from a consistent snapshot without requiring additional locking.
- Stable criteria assessment: - Fixes a real bug class (lockless data races/inconsistent reads) affecting user-visible diagnostics correctness, especially under concurrent socket mutations. - Minimal, non-architectural change with negligible risk. - No new features or behavior changes beyond stabilizing reads. - While there’s no explicit “Cc: stable” tag, the change aligns with typical stable backports that annotate/fix data races in widely used subsystems.
Conclusion: This is a low-risk correctness fix to avoid unlocked read races in inet_diag filtering and should be backported to stable trees.
net/ipv4/inet_diag.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c index 2fa53b16fe778..238b2a4a6cf43 100644 --- a/net/ipv4/inet_diag.c +++ b/net/ipv4/inet_diag.c @@ -785,7 +785,7 @@ static void entry_fill_addrs(struct inet_diag_entry *entry, const struct sock *sk) { #if IS_ENABLED(CONFIG_IPV6) - if (sk->sk_family == AF_INET6) { + if (entry->family == AF_INET6) { entry->saddr = sk->sk_v6_rcv_saddr.s6_addr32; entry->daddr = sk->sk_v6_daddr.s6_addr32; } else @@ -798,18 +798,18 @@ static void entry_fill_addrs(struct inet_diag_entry *entry,
int inet_diag_bc_sk(const struct nlattr *bc, struct sock *sk) { - struct inet_sock *inet = inet_sk(sk); + const struct inet_sock *inet = inet_sk(sk); struct inet_diag_entry entry;
if (!bc) return 1;
- entry.family = sk->sk_family; + entry.family = READ_ONCE(sk->sk_family); entry_fill_addrs(&entry, sk); - entry.sport = inet->inet_num; - entry.dport = ntohs(inet->inet_dport); - entry.ifindex = sk->sk_bound_dev_if; - entry.userlocks = sk_fullsock(sk) ? sk->sk_userlocks : 0; + entry.sport = READ_ONCE(inet->inet_num); + entry.dport = ntohs(READ_ONCE(inet->inet_dport)); + entry.ifindex = READ_ONCE(sk->sk_bound_dev_if); + entry.userlocks = sk_fullsock(sk) ? READ_ONCE(sk->sk_userlocks) : 0; if (sk_fullsock(sk)) entry.mark = READ_ONCE(sk->sk_mark); else if (sk->sk_state == TCP_NEW_SYN_RECV)