Patch 1 allows the SO_SNDTIMEO sockopt to correctly change the connect timeout on MPTCP sockets.
Patches 2-5 add READ_ONCE()/WRITE_ONCE() annotations to fix KCSAN issues.
Patch 6 correctly initializes some subflow fields on outgoing connections.
Signed-off-by: Mat Martineau martineau@kernel.org --- Paolo Abeni (6): mptcp: fix connect timeout handling mptcp: add annotations around msk->subflow accesses mptcp: consolidate passive msk socket initialization mptcp: fix data race around msk->first access mptcp: add annotations around sk->sk_shutdown accesses mptcp: fix active subflow finalization
net/mptcp/protocol.c | 140 ++++++++++++++++++++++++++++----------------------- net/mptcp/protocol.h | 15 +++--- net/mptcp/subflow.c | 28 +---------- 3 files changed, 88 insertions(+), 95 deletions(-) --- base-commit: 448a5ce1120c5bdbce1f1ccdabcd31c7d029f328 change-id: 20230531-send-net-20230531-428ddf43b4ed
Best regards,
From: Paolo Abeni pabeni@redhat.com
Ondrej reported a functional issue WRT timeout handling on connect with a nice reproducer.
The problem is that the current mptcp connect waits for both the MPTCP socket level timeout, and the first subflow socket timeout. The latter is not influenced/touched by the exposed setsockopt().
Overall the above makes the SO_SNDTIMEO a no-op on connect.
Since mptcp_connect is invoked via inet_stream_connect and the latter properly handle the MPTCP level timeout, we can address the issue making the nested subflow level connect always unblocking.
This also allow simplifying a bit the code, dropping an ugly hack to handle the fastopen and custom proto_ops connect.
The issues predates the blamed commit below, but the current resolution requires the infrastructure introduced there.
Fixes: 54f1944ed6d2 ("mptcp: factor out mptcp_connect()") Reported-by: Ondrej Mosnacek omosnace@redhat.com Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/399 Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Mat Martineau martineau@kernel.org --- net/mptcp/protocol.c | 29 +++++++---------------------- net/mptcp/protocol.h | 1 - 2 files changed, 7 insertions(+), 23 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 08dc53f56bc2..9cafd3b89908 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -1702,7 +1702,6 @@ static int mptcp_sendmsg_fastopen(struct sock *sk, struct msghdr *msg,
lock_sock(ssk); msg->msg_flags |= MSG_DONTWAIT; - msk->connect_flags = O_NONBLOCK; msk->fastopening = 1; ret = tcp_sendmsg_fastopen(ssk, msg, copied_syn, len, NULL); msk->fastopening = 0; @@ -3617,9 +3616,9 @@ static int mptcp_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len) * acquired the subflow socket lock, too. */ if (msk->fastopening) - err = __inet_stream_connect(ssock, uaddr, addr_len, msk->connect_flags, 1); + err = __inet_stream_connect(ssock, uaddr, addr_len, O_NONBLOCK, 1); else - err = inet_stream_connect(ssock, uaddr, addr_len, msk->connect_flags); + err = inet_stream_connect(ssock, uaddr, addr_len, O_NONBLOCK); inet_sk(sk)->defer_connect = inet_sk(ssock->sk)->defer_connect;
/* on successful connect, the msk state will be moved to established by @@ -3632,12 +3631,10 @@ static int mptcp_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
mptcp_copy_inaddrs(sk, ssock->sk);
- /* unblocking connect, mptcp-level inet_stream_connect will error out - * without changing the socket state, update it here. + /* silence EINPROGRESS and let the caller inet_stream_connect + * handle the connection in progress */ - if (err == -EINPROGRESS) - sk->sk_socket->state = ssock->state; - return err; + return 0; }
static struct proto mptcp_prot = { @@ -3696,18 +3693,6 @@ static int mptcp_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len) return err; }
-static int mptcp_stream_connect(struct socket *sock, struct sockaddr *uaddr, - int addr_len, int flags) -{ - int ret; - - lock_sock(sock->sk); - mptcp_sk(sock->sk)->connect_flags = flags; - ret = __inet_stream_connect(sock, uaddr, addr_len, flags, 0); - release_sock(sock->sk); - return ret; -} - static int mptcp_listen(struct socket *sock, int backlog) { struct mptcp_sock *msk = mptcp_sk(sock->sk); @@ -3859,7 +3844,7 @@ static const struct proto_ops mptcp_stream_ops = { .owner = THIS_MODULE, .release = inet_release, .bind = mptcp_bind, - .connect = mptcp_stream_connect, + .connect = inet_stream_connect, .socketpair = sock_no_socketpair, .accept = mptcp_stream_accept, .getname = inet_getname, @@ -3954,7 +3939,7 @@ static const struct proto_ops mptcp_v6_stream_ops = { .owner = THIS_MODULE, .release = inet6_release, .bind = mptcp_bind, - .connect = mptcp_stream_connect, + .connect = inet_stream_connect, .socketpair = sock_no_socketpair, .accept = mptcp_stream_accept, .getname = inet6_getname, diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index 2d7b2c80a164..de4667dafe59 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -297,7 +297,6 @@ struct mptcp_sock { nodelay:1, fastopening:1, in_accept_queue:1; - int connect_flags; struct work_struct work; struct sk_buff *ooo_last_skb; struct rb_root out_of_order_queue;
From: Paolo Abeni pabeni@redhat.com
Active subflow are inserted into the connection list at creation time. When the MPJ handshake completes successfully, a new subflow creation netlink event is generated correctly, but the current code wrongly avoid initializing a couple of subflow data.
The above will cause misbehavior on a few exceptional events: unneeded mptcp-level retransmission on msk-level sequence wrap-around and infinite mapping fallback even when a MPJ socket is present.
Address the issue factoring out the needed initialization in a new helper and invoking the latter from __mptcp_finish_join() time for passive subflow and from mptcp_finish_join() for active ones.
Fixes: 0530020a7c8f ("mptcp: track and update contiguous data status") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Mat Martineau martineau@kernel.org --- net/mptcp/protocol.c | 23 ++++++++++++++--------- 1 file changed, 14 insertions(+), 9 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index af54a878ac27..67311e7d5b21 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -825,6 +825,13 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk) mptcp_data_unlock(sk); }
+static void mptcp_subflow_joined(struct mptcp_sock *msk, struct sock *ssk) +{ + mptcp_subflow_ctx(ssk)->map_seq = READ_ONCE(msk->ack_seq); + WRITE_ONCE(msk->allow_infinite_fallback, false); + mptcp_event(MPTCP_EVENT_SUB_ESTABLISHED, msk, ssk, GFP_ATOMIC); +} + static bool __mptcp_finish_join(struct mptcp_sock *msk, struct sock *ssk) { struct sock *sk = (struct sock *)msk; @@ -839,6 +846,7 @@ static bool __mptcp_finish_join(struct mptcp_sock *msk, struct sock *ssk) mptcp_sock_graft(ssk, sk->sk_socket);
mptcp_sockopt_sync_locked(msk, ssk); + mptcp_subflow_joined(msk, ssk); return true; }
@@ -3485,14 +3493,16 @@ bool mptcp_finish_join(struct sock *ssk) return false; }
- if (!list_empty(&subflow->node)) - goto out; + /* active subflow, already present inside the conn_list */ + if (!list_empty(&subflow->node)) { + mptcp_subflow_joined(msk, ssk); + return true; + }
if (!mptcp_pm_allow_new_subflow(msk)) goto err_prohibited;
- /* active connections are already on conn_list. - * If we can't acquire msk socket lock here, let the release callback + /* If we can't acquire msk socket lock here, let the release callback * handle it */ mptcp_data_lock(parent); @@ -3515,11 +3525,6 @@ bool mptcp_finish_join(struct sock *ssk) return false; }
- subflow->map_seq = READ_ONCE(msk->ack_seq); - WRITE_ONCE(msk->allow_infinite_fallback, false); - -out: - mptcp_event(MPTCP_EVENT_SUB_ESTABLISHED, msk, ssk, GFP_ATOMIC); return true; }
Hello:
This series was applied to netdev/net.git (main) by Jakub Kicinski kuba@kernel.org:
On Wed, 31 May 2023 12:37:02 -0700 you wrote:
Patch 1 allows the SO_SNDTIMEO sockopt to correctly change the connect timeout on MPTCP sockets.
Patches 2-5 add READ_ONCE()/WRITE_ONCE() annotations to fix KCSAN issues.
Patch 6 correctly initializes some subflow fields on outgoing connections.
[...]
Here is the summary with links: - [net,1/6] mptcp: fix connect timeout handling https://git.kernel.org/netdev/net/c/786fc1245726 - [net,2/6] mptcp: add annotations around msk->subflow accesses https://git.kernel.org/netdev/net/c/5b825727d087 - [net,3/6] mptcp: consolidate passive msk socket initialization https://git.kernel.org/netdev/net/c/7e8b88ec35ee - [net,4/6] mptcp: fix data race around msk->first access https://git.kernel.org/netdev/net/c/1b1b43ee7a20 - [net,5/6] mptcp: add annotations around sk->sk_shutdown accesses https://git.kernel.org/netdev/net/c/6b9831bfd932 - [net,6/6] mptcp: fix active subflow finalization https://git.kernel.org/netdev/net/c/55b47ca7d808
You are awesome, thank you!
linux-stable-mirror@lists.linaro.org