Last week, Jakub reported [1] that the MPTCP Connect selftest was unstable. It looked like it started after the introduction of some fixes [2]. After analysis from Paolo, these patches revealed existing bugs, that should be fixed by the following patches.
- Patch 1: Make sure ACK are sent when MPTCP-level window re-opens. In some corner cases, the other peer was not notified when more data could be sent. A fix for v5.11, but depending on a feature introduced in v5.19.
- Patch 2: Fix spurious wake-up under memory pressure. In this situation, the userspace could be invited to read data not being there yet. A fix for v6.7.
- Patch 3: Fix a false positive error when running the MPTCP Connect selftest with the "disconnect" cases. The userspace could disconnect the socket too soon, which would reset (MP_FASTCLOSE) the connection, interpreted as an error by the test. A fix for v5.17.
Link: https://lore.kernel.org/20250107131845.5e5de3c5@kernel.org [1] Link: https://lore.kernel.org/20241230-net-mptcp-rbuf-fixes-v1-0-8608af434ceb@kern... [2] Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org --- Paolo Abeni (3): mptcp: be sure to send ack when mptcp-level window re-opens mptcp: fix spurious wake-up on under memory pressure selftests: mptcp: avoid spurious errors on disconnect
net/mptcp/options.c | 6 ++-- net/mptcp/protocol.h | 9 +++-- tools/testing/selftests/net/mptcp/mptcp_connect.c | 43 +++++++++++++++++------ 3 files changed, 43 insertions(+), 15 deletions(-) --- base-commit: 76201b5979768500bca362871db66d77cb4c225e change-id: 20250113-net-mptcp-connect-st-flakes-4af6389808de
Best regards,
From: Paolo Abeni pabeni@redhat.com
mptcp_cleanup_rbuf() is responsible to send acks when the user-space reads enough data to update the receive windows significantly.
It tries hard to avoid acquiring the subflow sockets locks by checking conditions similar to the ones implemented at the TCP level.
To avoid too much code duplication - the MPTCP protocol can't reuse the TCP helpers as part of the relevant status is maintained into the msk socket - and multiple costly window size computation, mptcp_cleanup_rbuf uses a rough estimate for the most recently advertised window size: the MPTCP receive free space, as recorded as at last-ack time.
Unfortunately the above does not allow mptcp_cleanup_rbuf() to detect a zero to non-zero win change in some corner cases, skipping the tcp_cleanup_rbuf call and leaving the peer stuck.
After commit ea66758c1795 ("tcp: allow MPTCP to update the announced window"), MPTCP has actually cheap access to the announced window value. Use it in mptcp_cleanup_rbuf() for a more accurate ack generation.
Fixes: e3859603ba13 ("mptcp: better msk receive window updates") Cc: stable@vger.kernel.org Reported-by: Jakub Kicinski kuba@kernel.org Closes: https://lore.kernel.org/20250107131845.5e5de3c5@kernel.org Signed-off-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org --- net/mptcp/options.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/net/mptcp/options.c b/net/mptcp/options.c index a62bc874bf1e17c4ee3b4d8d94eef4d0e7c9f272..123f3f2972841aab9fc2ef338687b4c4758efd4a 100644 --- a/net/mptcp/options.c +++ b/net/mptcp/options.c @@ -607,7 +607,6 @@ static bool mptcp_established_options_dss(struct sock *sk, struct sk_buff *skb, } opts->ext_copy.use_ack = 1; opts->suboptions = OPTION_MPTCP_DSS; - WRITE_ONCE(msk->old_wspace, __mptcp_space((struct sock *)msk));
/* Add kind/length/subtype/flag overhead if mapping is not populated */ if (dss_size == 0) @@ -1288,7 +1287,7 @@ static void mptcp_set_rwin(struct tcp_sock *tp, struct tcphdr *th) } MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_RCVWNDCONFLICT); } - return; + goto update_wspace; }
if (rcv_wnd_new != rcv_wnd_old) { @@ -1313,6 +1312,9 @@ static void mptcp_set_rwin(struct tcp_sock *tp, struct tcphdr *th) th->window = htons(new_win); MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_RCVWNDSHARED); } + +update_wspace: + WRITE_ONCE(msk->old_wspace, tp->rcv_wnd); }
__sum16 __mptcp_make_csum(u64 data_seq, u32 subflow_seq, u16 data_len, __wsum sum)
From: Paolo Abeni pabeni@redhat.com
The wake-up condition currently implemented by mptcp_epollin_ready() is wrong, as it could mark the MPTCP socket as readable even when no data are present and the system is under memory pressure.
Explicitly check for some data being available in the receive queue.
Fixes: 5684ab1a0eff ("mptcp: give rcvlowat some love") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org --- net/mptcp/protocol.h | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index a93e661ef5c435155066ce9cc109092661f0711c..73526f1d768fcb6ba5bcf1a43eb09ed5ff9d67bf 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -760,10 +760,15 @@ static inline u64 mptcp_data_avail(const struct mptcp_sock *msk)
static inline bool mptcp_epollin_ready(const struct sock *sk) { + u64 data_avail = mptcp_data_avail(mptcp_sk(sk)); + + if (!data_avail) + return false; + /* mptcp doesn't have to deal with small skbs in the receive queue, - * at it can always coalesce them + * as it can always coalesce them */ - return (mptcp_data_avail(mptcp_sk(sk)) >= sk->sk_rcvlowat) || + return (data_avail >= sk->sk_rcvlowat) || (mem_cgroup_sockets_enabled && sk->sk_memcg && mem_cgroup_under_socket_pressure(sk->sk_memcg)) || READ_ONCE(tcp_memory_pressure);
From: Paolo Abeni pabeni@redhat.com
The disconnect test-case generates spurious errors:
INFO: disconnect INFO: extra options: -I 3 -i /tmp/tmp.r43niviyoI 01 ns1 MPTCP -> ns1 (10.0.1.1:10000 ) MPTCP (duration 140ms) [FAIL] file received by server does not match (in, out): Unexpected revents: POLLERR/POLLNVAL(19) -rw-r--r-- 1 root root 10028676 Jan 10 10:47 /tmp/tmp.r43niviyoI.disconnect Trailing bytes are: ��\����R���!8��u2��5N% -rw------- 1 root root 9992290 Jan 10 10:47 /tmp/tmp.Os4UbnWbI1 Trailing bytes are: ��\����R���!8��u2��5N% 02 ns1 MPTCP -> ns1 (dead:beef:1::1:10001) MPTCP (duration 206ms) [ OK ] 03 ns1 MPTCP -> ns1 (dead:beef:1::1:10002) TCP (duration 31ms) [ OK ] 04 ns1 TCP -> ns1 (dead:beef:1::1:10003) MPTCP (duration 26ms) [ OK ] [FAIL] Tests of the full disconnection have failed Time: 2 seconds
The root cause is actually in the user-space bits: the test program currently disconnects as soon as all the pending data has been spooled, generating an FASTCLOSE. If such option reaches the peer before the latter has reached the closed status, the msk socket will report an error to the user-space, as per protocol specification, causing the above failure.
Address the issue explicitly waiting for all the relevant sockets to reach a closed status before performing the disconnect.
Fixes: 05be5e273c84 ("selftests: mptcp: add disconnect tests") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org --- tools/testing/selftests/net/mptcp/mptcp_connect.c | 43 +++++++++++++++++------ 1 file changed, 32 insertions(+), 11 deletions(-)
diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.c b/tools/testing/selftests/net/mptcp/mptcp_connect.c index 4209b95690394b7565f330a37606bf39b6d2d228..414addef9a4514c489ecd09249143fe0ce2af649 100644 --- a/tools/testing/selftests/net/mptcp/mptcp_connect.c +++ b/tools/testing/selftests/net/mptcp/mptcp_connect.c @@ -25,6 +25,8 @@ #include <sys/types.h> #include <sys/mman.h>
+#include <arpa/inet.h> + #include <netdb.h> #include <netinet/in.h>
@@ -1211,23 +1213,42 @@ static void parse_setsock_options(const char *name) exit(1); }
-void xdisconnect(int fd, int addrlen) +void xdisconnect(int fd) { - struct sockaddr_storage empty; + socklen_t addrlen = sizeof(struct sockaddr_storage); + struct sockaddr_storage addr, empty; int msec_sleep = 10; - int queued = 1; - int i; + void *raw_addr; + int i, cmdlen; + char cmd[128]; + + /* get the local address and convert it to string */ + if (getsockname(fd, (struct sockaddr *)&addr, &addrlen) < 0) + xerror("getsockname"); + + if (addr.ss_family == AF_INET) + raw_addr = &(((struct sockaddr_in *)&addr)->sin_addr); + else if (addr.ss_family == AF_INET6) + raw_addr = &(((struct sockaddr_in6 *)&addr)->sin6_addr); + else + xerror("bad family"); + + strcpy(cmd, "ss -M | grep -q "); + cmdlen = strlen(cmd); + if (!inet_ntop(addr.ss_family, raw_addr, &cmd[cmdlen], + sizeof(cmd) - cmdlen)) + xerror("inet_ntop");
shutdown(fd, SHUT_WR);
- /* while until the pending data is completely flushed, the later + /* + * wait until the pending data is completely flushed and all + * the MPTCP sockets reached the closed status. * disconnect will bypass/ignore/drop any pending data. */ for (i = 0; ; i += msec_sleep) { - if (ioctl(fd, SIOCOUTQ, &queued) < 0) - xerror("can't query out socket queue: %d", errno); - - if (!queued) + /* closed socket are not listed by 'ss' */ + if (system(cmd) != 0) break;
if (i > poll_timeout) @@ -1281,9 +1302,9 @@ int main_loop(void) return ret;
if (cfg_truncate > 0) { - xdisconnect(fd, peer->ai_addrlen); + xdisconnect(fd); } else if (--cfg_repeat > 0) { - xdisconnect(fd, peer->ai_addrlen); + xdisconnect(fd);
/* the socket could be unblocking at this point, we need the * connect to be blocking
Hello:
This series was applied to netdev/net.git (main) by Jakub Kicinski kuba@kernel.org:
On Mon, 13 Jan 2025 16:44:55 +0100 you wrote:
Last week, Jakub reported [1] that the MPTCP Connect selftest was unstable. It looked like it started after the introduction of some fixes [2]. After analysis from Paolo, these patches revealed existing bugs, that should be fixed by the following patches.
- Patch 1: Make sure ACK are sent when MPTCP-level window re-opens. In some corner cases, the other peer was not notified when more data could be sent. A fix for v5.11, but depending on a feature introduced in v5.19.
[...]
Here is the summary with links: - [net,1/3] mptcp: be sure to send ack when mptcp-level window re-opens https://git.kernel.org/netdev/net/c/2ca06a2f6531 - [net,2/3] mptcp: fix spurious wake-up on under memory pressure https://git.kernel.org/netdev/net/c/e226d9259dc4 - [net,3/3] selftests: mptcp: avoid spurious errors on disconnect https://git.kernel.org/netdev/net/c/218cc166321f
You are awesome, thank you!
linux-stable-mirror@lists.linaro.org