On Mon, 2024-02-19 at 18:35 +0100, Eric Dumazet wrote:
On Mon, Feb 19, 2024 at 6:21 PM Eric Dumazet edumazet@google.com wrote:
On Thu, Feb 15, 2024 at 7:25 PM Matthieu Baerts (NGI0) matttbe@kernel.org wrote:
From: Paolo Abeni pabeni@redhat.com
Since the introduction of the subflow ULP diag interface, the dump callback accessed all the subflow data with lockless.
We need either to annotate all the read and write operation accordingly, or acquire the subflow socket lock. Let's do latter, even if slower, to avoid a diffstat havoc.
Fixes: 5147dfb50832 ("mptcp: allow dumping subflow context to userspace") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org
Notes:
This patch modifies the existing ULP API. No better solutions have been found for -net, and there is some similar prior art, see commit 0df48c26d841 ("tcp: add tcpi_bytes_acked to tcp_info").
Please also note that TLS ULP Diag has likely the same issue.
To: Boris Pismenny borisp@nvidia.com To: John Fastabend john.fastabend@gmail.com
include/net/tcp.h | 2 +- net/mptcp/diag.c | 6 +++++- net/tls/tls_main.c | 2 +- 3 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h index dd78a1181031..f6eba9652d01 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -2506,7 +2506,7 @@ struct tcp_ulp_ops { /* cleanup ulp */ void (*release)(struct sock *sk); /* diagnostic */
int (*get_info)(const struct sock *sk, struct sk_buff *skb);
int (*get_info)(struct sock *sk, struct sk_buff *skb); size_t (*get_info_size)(const struct sock *sk); /* clone ulp */ void (*clone)(const struct request_sock *req, struct sock *newsk,
diff --git a/net/mptcp/diag.c b/net/mptcp/diag.c index a536586742f2..e57c5f47f035 100644 --- a/net/mptcp/diag.c +++ b/net/mptcp/diag.c @@ -13,17 +13,19 @@ #include <uapi/linux/mptcp.h> #include "protocol.h"
-static int subflow_get_info(const struct sock *sk, struct sk_buff *skb) +static int subflow_get_info(struct sock *sk, struct sk_buff *skb) { struct mptcp_subflow_context *sf; struct nlattr *start; u32 flags = 0;
bool slow; int err; start = nla_nest_start_noflag(skb, INET_ULP_INFO_MPTCP); if (!start) return -EMSGSIZE;
slow = lock_sock_fast(sk); rcu_read_lock();
I am afraid lockdep is not happy with this change.
Paolo, we probably need the READ_ONCE() annotations after all.
Or perhaps something like the following would be enough.
diff --git a/net/mptcp/diag.c b/net/mptcp/diag.c index 6ff6f14674aa2941bc04c680bacd9f79fc65060d..7017dd60659dc7133318c1c82e3f429bea3a5d57 100644 --- a/net/mptcp/diag.c +++ b/net/mptcp/diag.c @@ -21,6 +21,9 @@ static int subflow_get_info(struct sock *sk, struct sk_buff *skb) bool slow; int err;
if (inet_sk_state_load(sk) == TCP_LISTEN)
return 0;
start = nla_nest_start_noflag(skb, INET_ULP_INFO_MPTCP); if (!start) return -EMSGSIZE;
Thanks for the head-up. This later option looks preferable, to avoid quit a bit of noise with _ONCE annotation. Is there a syzkaller splat I could look at? if it landed on the ML, I missed it.
Thanks!
Paolo