Here is a new batch of fixes related to MPTCP for v6.5 and older.
Patches 1 and 2 fix issues with MPTCP Join selftest when manually launched with '-i' parameter to use 'ip mptcp' tool instead of the dedicated one (pm_nl_ctl). The issues have been there since v5.18.
Thank you Andrea for your first contributions to MPTCP code in the upstream kernel!
Patch 3 avoids corrupting the data stream when trying to reset connections that have fallen back to TCP. This can happen from v6.1.
Patch 4 fixes a race when doing a disconnect() and an accept() in parallel on a listener socket. The issue only happens in rare cases if the user is really unlucky since a fix that landed in v6.3 but backported up to v6.1.
Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net --- Andrea Claudi (2): selftests: mptcp: join: fix 'delete and re-add' test selftests: mptcp: join: fix 'implicit EP' test
Paolo Abeni (2): mptcp: avoid bogus reset on fallback close mptcp: fix disconnect vs accept race
net/mptcp/protocol.c | 2 +- net/mptcp/protocol.h | 1 - net/mptcp/subflow.c | 60 ++++++++++++------------- tools/testing/selftests/net/mptcp/mptcp_join.sh | 6 ++- 4 files changed, 35 insertions(+), 34 deletions(-) --- base-commit: 0f71c9caf26726efea674646f566984e735cc3b9 change-id: 20230803-upstream-net-20230803-misc-fixes-6-5-6046c6ca74b6
Best regards,
From: Andrea Claudi aclaudi@redhat.com
mptcp_join 'delete and re-add' test fails when using ip mptcp:
$ ./mptcp_join.sh -iI <snip> 002 delete and re-add before delete[ ok ] mptcp_info subflows=1 [ ok ] Error: argument "ADDRESS" is wrong: invalid for non-zero id address after delete[fail] got 2:2 subflows expected 1
This happens because endpoint delete includes an ip address while id is not 0, contrary to what is indicated in the ip mptcp man page:
"When used with the delete id operation, an IFADDR is only included when the ID is 0."
This fixes the issue using the $addr variable in pm_nl_del_endpoint() only when id is 0.
Fixes: 34aa6e3bccd8 ("selftests: mptcp: add ip mptcp wrappers") Cc: stable@vger.kernel.org Signed-off-by: Andrea Claudi aclaudi@redhat.com Reviewed-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net --- tools/testing/selftests/net/mptcp/mptcp_join.sh | 1 + 1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh index 3c2096ac97ef..067fabc401f1 100755 --- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -705,6 +705,7 @@ pm_nl_del_endpoint() local addr=$3
if [ $ip_mptcp -eq 1 ]; then + [ $id -ne 0 ] && addr='' ip -n $ns mptcp endpoint delete id $id $addr else ip netns exec $ns ./pm_nl_ctl del $id $addr
From: Andrea Claudi aclaudi@redhat.com
mptcp_join 'implicit EP' test currently fails when using ip mptcp:
$ ./mptcp_join.sh -iI <snip> 001 implicit EP creation[fail] expected '10.0.2.2 10.0.2.2 id 1 implicit' found '10.0.2.2 id 1 rawflags 10 ' Error: too many addresses or duplicate one: -22. ID change is prevented[fail] expected '10.0.2.2 10.0.2.2 id 1 implicit' found '10.0.2.2 id 1 rawflags 10 ' modif is allowed[fail] expected '10.0.2.2 10.0.2.2 id 1 signal' found '10.0.2.2 id 1 signal '
This happens because of two reasons: - iproute v6.3.0 does not support the implicit flag, fixed with iproute2-next commit 3a2535a41854 ("mptcp: add support for implicit flag") - pm_nl_check_endpoint wrongly expects the ip address to be repeated two times in iproute output, and does not account for a final whitespace in it.
This fixes the issue trimming the whitespace in the output string and removing the double address in the expected string.
Fixes: 69c6ce7b6eca ("selftests: mptcp: add implicit endpoint test case") Cc: stable@vger.kernel.org Signed-off-by: Andrea Claudi aclaudi@redhat.com Reviewed-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net --- tools/testing/selftests/net/mptcp/mptcp_join.sh | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh index 067fabc401f1..d01b73a8ed0f 100755 --- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -796,10 +796,11 @@ pm_nl_check_endpoint() fi
if [ $ip_mptcp -eq 1 ]; then + # get line and trim trailing whitespace line=$(ip -n $ns mptcp endpoint show $id) + line="${line% }" # the dump order is: address id flags port dev - expected_line="$addr" - [ -n "$addr" ] && expected_line="$expected_line $addr" + [ -n "$addr" ] && expected_line="$addr" expected_line="$expected_line $id" [ -n "$_flags" ] && expected_line="$expected_line ${_flags//","/" "}" [ -n "$dev" ] && expected_line="$expected_line $dev"
From: Paolo Abeni pabeni@redhat.com
Since the blamed commit, the MPTCP protocol unconditionally sends TCP resets on all the subflows on disconnect().
That fits full-blown MPTCP sockets - to implement the fastclose mechanism - but causes unexpected corruption of the data stream, caught as sporadic self-tests failures.
Fixes: d21f83485518 ("mptcp: use fastclose on more edge scenarios") Cc: stable@vger.kernel.org Tested-by: Matthieu Baerts matthieu.baerts@tessares.net Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/419 Signed-off-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net --- net/mptcp/protocol.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 3317d1cca156..ac7c11a5cbe5 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -2335,7 +2335,7 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
lock_sock_nested(ssk, SINGLE_DEPTH_NESTING);
- if (flags & MPTCP_CF_FASTCLOSE) { + if ((flags & MPTCP_CF_FASTCLOSE) && !__mptcp_check_fallback(msk)) { /* be sure to force the tcp_disconnect() path, * to generate the egress reset */
From: Paolo Abeni pabeni@redhat.com
Despite commit 0ad529d9fd2b ("mptcp: fix possible divide by zero in recvmsg()"), the mptcp protocol is still prone to a race between disconnect() (or shutdown) and accept.
The root cause is that the mentioned commit checks the msk-level flag, but mptcp_stream_accept() does acquire the msk-level lock, as it can rely directly on the first subflow lock.
As reported by Christoph than can lead to a race where an msk socket is accepted after that mptcp_subflow_queue_clean() releases the listener socket lock and just before it takes destructive actions leading to the following splat:
BUG: kernel NULL pointer dereference, address: 0000000000000012 PGD 5a4ca067 P4D 5a4ca067 PUD 37d4c067 PMD 0 Oops: 0000 [#1] PREEMPT SMP CPU: 2 PID: 10955 Comm: syz-executor.5 Not tainted 6.5.0-rc1-gdc7b257ee5dd #37 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014 RIP: 0010:mptcp_stream_accept+0x1ee/0x2f0 include/net/inet_sock.h:330 Code: 0a 09 00 48 8b 1b 4c 39 e3 74 07 e8 bc 7c 7f fe eb a1 e8 b5 7c 7f fe 4c 8b 6c 24 08 eb 05 e8 a9 7c 7f fe 49 8b 85 d8 09 00 00 <0f> b6 40 12 88 44 24 07 0f b6 6c 24 07 bf 07 00 00 00 89 ee e8 89 RSP: 0018:ffffc90000d07dc0 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff888037e8d020 RCX: ffff88803b093300 RDX: 0000000000000000 RSI: ffffffff833822c5 RDI: ffffffff8333896a RBP: 0000607f82031520 R08: ffff88803b093300 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000003e83 R12: ffff888037e8d020 R13: ffff888037e8c680 R14: ffff888009af7900 R15: ffff888009af6880 FS: 00007fc26d708640(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000012 CR3: 0000000066bc5001 CR4: 0000000000370ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> do_accept+0x1ae/0x260 net/socket.c:1872 __sys_accept4+0x9b/0x110 net/socket.c:1913 __do_sys_accept4 net/socket.c:1954 [inline] __se_sys_accept4 net/socket.c:1951 [inline] __x64_sys_accept4+0x20/0x30 net/socket.c:1951 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x47/0xa0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Address the issue by temporary removing the pending request socket from the accept queue, so that racing accept() can't touch them.
After depleting the msk - the ssk still exists, as plain TCP sockets, re-insert them into the accept queue, so that later inet_csk_listen_stop() will complete the tcp socket disposal.
Fixes: 2a6a870e44dd ("mptcp: stops worker on unaccepted sockets at listener close") Cc: stable@vger.kernel.org Reported-by: Christoph Paasch cpaasch@apple.com Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/423 Signed-off-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net --- net/mptcp/protocol.h | 1 - net/mptcp/subflow.c | 60 ++++++++++++++++++++++++++-------------------------- 2 files changed, 30 insertions(+), 31 deletions(-)
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index 37fbe22e2433..ba2a873a4d2e 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -325,7 +325,6 @@ struct mptcp_sock { u32 subflow_id; u32 setsockopt_seq; char ca_name[TCP_CA_NAME_MAX]; - struct mptcp_sock *dl_next; };
#define mptcp_data_lock(sk) spin_lock_bh(&(sk)->sk_lock.slock) diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index 9ee3b7abbaf6..94ae7dd01c65 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -1793,34 +1793,21 @@ static void subflow_state_change(struct sock *sk) void mptcp_subflow_queue_clean(struct sock *listener_sk, struct sock *listener_ssk) { struct request_sock_queue *queue = &inet_csk(listener_ssk)->icsk_accept_queue; - struct mptcp_sock *msk, *next, *head = NULL; - struct request_sock *req; - struct sock *sk; + struct request_sock *req, *head, *tail; + struct mptcp_subflow_context *subflow; + struct sock *sk, *ssk;
- /* build a list of all unaccepted mptcp sockets */ + /* Due to lock dependencies no relevant lock can be acquired under rskq_lock. + * Splice the req list, so that accept() can not reach the pending ssk after + * the listener socket is released below. + */ spin_lock_bh(&queue->rskq_lock); - for (req = queue->rskq_accept_head; req; req = req->dl_next) { - struct mptcp_subflow_context *subflow; - struct sock *ssk = req->sk; - - if (!sk_is_mptcp(ssk)) - continue; - - subflow = mptcp_subflow_ctx(ssk); - if (!subflow || !subflow->conn) - continue; - - /* skip if already in list */ - sk = subflow->conn; - msk = mptcp_sk(sk); - if (msk->dl_next || msk == head) - continue; - - sock_hold(sk); - msk->dl_next = head; - head = msk; - } + head = queue->rskq_accept_head; + tail = queue->rskq_accept_tail; + queue->rskq_accept_head = NULL; + queue->rskq_accept_tail = NULL; spin_unlock_bh(&queue->rskq_lock); + if (!head) return;
@@ -1829,13 +1816,19 @@ void mptcp_subflow_queue_clean(struct sock *listener_sk, struct sock *listener_s */ release_sock(listener_ssk);
- for (msk = head; msk; msk = next) { - sk = (struct sock *)msk; + for (req = head; req; req = req->dl_next) { + ssk = req->sk; + if (!sk_is_mptcp(ssk)) + continue; + + subflow = mptcp_subflow_ctx(ssk); + if (!subflow || !subflow->conn) + continue; + + sk = subflow->conn; + sock_hold(sk);
lock_sock_nested(sk, SINGLE_DEPTH_NESTING); - next = msk->dl_next; - msk->dl_next = NULL; - __mptcp_unaccepted_force_close(sk); release_sock(sk);
@@ -1859,6 +1852,13 @@ void mptcp_subflow_queue_clean(struct sock *listener_sk, struct sock *listener_s
/* we are still under the listener msk socket lock */ lock_sock_nested(listener_ssk, SINGLE_DEPTH_NESTING); + + /* restore the listener queue, to let the TCP code clean it up */ + spin_lock_bh(&queue->rskq_lock); + WARN_ON_ONCE(queue->rskq_accept_head); + queue->rskq_accept_head = head; + queue->rskq_accept_tail = tail; + spin_unlock_bh(&queue->rskq_lock); }
static int subflow_ulp_init(struct sock *sk)
Hello:
This series was applied to netdev/net.git (main) by Jakub Kicinski kuba@kernel.org:
On Thu, 03 Aug 2023 18:27:26 +0200 you wrote:
Here is a new batch of fixes related to MPTCP for v6.5 and older.
Patches 1 and 2 fix issues with MPTCP Join selftest when manually launched with '-i' parameter to use 'ip mptcp' tool instead of the dedicated one (pm_nl_ctl). The issues have been there since v5.18.
Thank you Andrea for your first contributions to MPTCP code in the upstream kernel!
[...]
Here is the summary with links: - [net,1/4] selftests: mptcp: join: fix 'delete and re-add' test https://git.kernel.org/netdev/net/c/aaf2123a5cf4 - [net,2/4] selftests: mptcp: join: fix 'implicit EP' test https://git.kernel.org/netdev/net/c/c8c101ae390a - [net,3/4] mptcp: avoid bogus reset on fallback close https://git.kernel.org/netdev/net/c/ff18f9ef30ee - [net,4/4] mptcp: fix disconnect vs accept race https://git.kernel.org/netdev/net/c/511b90e39250
You are awesome, thank you!
linux-kselftest-mirror@lists.linaro.org