Here are some patches for the MPTCP PM, including some refactoring that I thought it would be best to send at the end of a cycle to avoid conflicts between net and net-next that could last a few weeks.
The most interesting changes are in the first and last patch, the rest are patches refactoring the code & tests to validate the modifications.
- Patches 1 & 2: When servers set the C-flag in their MP_CAPABLE to tell clients not to create subflows to the initial address and port -- e.g. a deployment behind an L4 load balancer, as in a typical CDN deployment -- clients will not use their other endpoints when default settings are used. That's because the in-kernel path-manager uses the 'subflow' endpoints to create subflows only to the initial address and port. The first patch fixes that (for >=v5.14), and the second one validates it.
- Patches 3-14: various patches refactoring the code around the in-kernel PM (mainly): split overly long functions, rename variables and functions to avoid confusion, reduce structure size, and compare IDs instead of IP addresses. Note that one patch modifies one internal variable used in one BPF selftest.
- Patch 15: ability to control endpoints that are used in reaction to a new address announced by the other peer. With that, endpoints can be used only once.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
Notes:
- Patches 1 & 2 are sent to net-next on purpose: to delay the backports a bit, just in case. Plus, we are at the end of a cycle, and this avoids delaying the other refactoring patches.
- Sorry, I wanted to send this series earlier on, but due to some unrelated issues (and holiday), it got delayed. Most patches are pure refactoring ones.
---
Matthieu Baerts (NGI0) (15):
      mptcp: pm: in-kernel: usable client side with C-flag
      selftests: mptcp: join: validate C-flag + def limit
      mptcp: pm: in-kernel: refactor fill_local_addresses_vec
      mptcp: pm: in-kernel: refactor fill_remote_addresses_vec
      mptcp: pm: rename 'subflows' to 'extra_subflows'
      mptcp: pm: in-kernel: rename 'subflows_max' to 'limit_extra_subflows'
      mptcp: pm: in-kernel: rename 'add_addr_signal_max' to 'endp_signal_max'
      mptcp: pm: in-kernel: rename 'add_addr_accept_max' to 'limit_add_addr_accepted'
      mptcp: pm: in-kernel: rename 'local_addr_max' to 'endp_subflow_max'
      mptcp: pm: in-kernel: rename 'local_addr_list' to 'endp_list'
      mptcp: pm: in-kernel: rename 'addrs' to 'endpoints'
      mptcp: pm: in-kernel: remove stale_loss_cnt
      mptcp: pm: in-kernel: reduce pernet struct size
      mptcp: pm: in-kernel: compare IDs instead of addresses
      mptcp: pm: in-kernel: add laminar endpoints

 include/uapi/linux/mptcp.h                        |  11 +-
 net/mptcp/pm.c                                    |  32 +-
 net/mptcp/pm_kernel.c                             | 569 ++++++++++++++--------
 net/mptcp/pm_userspace.c                          |   2 +-
 net/mptcp/protocol.h                              |  21 +-
 net/mptcp/sockopt.c                               |  22 +-
 tools/testing/selftests/bpf/progs/mptcp_subflow.c |   2 +-
 tools/testing/selftests/net/mptcp/mptcp_join.sh   |  11 +
 8 files changed, 441 insertions(+), 229 deletions(-)
---
base-commit: a1f1f2422e098485b09e55a492de05cf97f9954d
change-id: 20250925-net-next-mptcp-c-flag-laminar-f8442e4d4bd9
Best regards,
When servers set the C-flag in their MP_CAPABLE to tell clients not to create subflows to the initial address and port, clients will likely not use their other endpoints. That's because the in-kernel path-manager uses the 'subflow' endpoints to create subflows only to the initial address and port.
If the limits have not been modified to accept ADD_ADDR, the client doesn't try to establish new subflows. If the limits accept ADD_ADDR, the routing rules will be used to select the source IP.
The C-flag is typically set when the server is operating behind a legacy Layer 4 load balancer, or using an anycast IP address. Clients with several 'subflow' endpoints set up then do not create multiple subflows as expected, which causes some deployment issues.
A special case is then added here: when servers set the C-flag in the MPC and directly send an ADD_ADDR, this single ADD_ADDR is accepted. The 'subflow' endpoints will then be used with this new remote IP and port. This exception is only allowed when the ADD_ADDR is sent immediately after the 3WHS, and makes the client switch to the 'fully established' mode. After that, 'select_local_address()' will not be able to find any subflows, because 'id_avail_bitmap' will be filled in mptcp_pm_create_subflow_or_signal_addr(), when switching to 'fully established' mode.
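To make the acceptance rule easier to follow, here is a minimal userspace sketch of the condition described above. It is not the kernel code: `struct pm_state`, `c_flag_case()` and `add_addr_only_echoed()` are hypothetical names, and the struct only keeps the fields that the new helper in this patch reads. It models only the branch modified in mptcp_pm_add_addr_received(), ignoring the userspace PM path and locking.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the PM fields read by the patch. */
struct pm_state {
	bool remote_deny_join_id0;        /* peer set the C-flag in its MP_CAPABLE */
	unsigned int local_addr_used;     /* 'subflow' endpoints already consumed */
	unsigned int add_addr_accept_max; /* limit: accepted ADD_ADDR (0 by default) */
	unsigned int subflows;            /* additional subflows created so far */
	unsigned int subflows_max;        /* limit: additional subflows */
};

/* Same four conditions as mptcp_pm_add_addr_c_flag_case() in the patch. */
static bool c_flag_case(const struct pm_state *pm)
{
	return pm->remote_deny_join_id0 &&
	       pm->local_addr_used == 0 &&
	       pm->add_addr_accept_max == 0 &&
	       pm->subflows < pm->subflows_max;
}

/*
 * Returns true when a received ADD_ADDR would take the branch that only
 * echoes it, i.e. no work is scheduled to create subflows towards it.
 * 'accept_addr' models pm->accept_addr, 'is_init_remote_addr' models the
 * result of mptcp_pm_is_init_remote_addr().
 */
static bool add_addr_only_echoed(const struct pm_state *pm, unsigned int addr_id,
				 bool is_init_remote_addr, bool accept_addr)
{
	return (addr_id == 0 && !is_init_remote_addr) ||
	       (addr_id > 0 && !accept_addr && !c_flag_case(pm));
}
```

With default limits (no ADD_ADDR accepted, 2 additional subflows), a C-flag peer and no endpoint used yet, `add_addr_only_echoed()` returns false for an ADD_ADDR with a non-zero ID, so the announced address is processed. Once `local_addr_used` is non-zero, the exception no longer applies.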
Fixes: df377be38725 ("mptcp: add deny_join_id0 in mptcp_options_received")
Cc: stable@vger.kernel.org
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/536
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 net/mptcp/pm.c        |  7 +++++--
 net/mptcp/pm_kernel.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++++-
 net/mptcp/protocol.h  |  8 ++++++++
 3 files changed, 62 insertions(+), 3 deletions(-)
diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index 204e1f61212e2be77a8476f024b59be67d04b80a..584cab90aa6eff4c01cdf4ca4d3dce8894829920 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -637,9 +637,12 @@ void mptcp_pm_add_addr_received(const struct sock *ssk,
 		} else {
 			__MPTCP_INC_STATS(sock_net((struct sock *)msk), MPTCP_MIB_ADDADDRDROP);
 		}
-	/* id0 should not have a different address */
+	/* - id0 should not have a different address
+	 * - special case for C-flag: linked to fill_local_addresses_vec()
+	 */
 	} else if ((addr->id == 0 && !mptcp_pm_is_init_remote_addr(msk, addr)) ||
-		   (addr->id > 0 && !READ_ONCE(pm->accept_addr))) {
+		   (addr->id > 0 && !READ_ONCE(pm->accept_addr) &&
+		    !mptcp_pm_add_addr_c_flag_case(msk))) {
 		mptcp_pm_announce_addr(msk, addr, true);
 		mptcp_pm_add_addr_send_ack(msk);
 	} else if (mptcp_pm_schedule_work(msk, MPTCP_PM_ADD_ADDR_RECEIVED)) {
diff --git a/net/mptcp/pm_kernel.c b/net/mptcp/pm_kernel.c
index 667803d72b643a0bb98365003b136c53f2a9a975..8c46493a0835b0e2d5e70950662ae6e845393777 100644
--- a/net/mptcp/pm_kernel.c
+++ b/net/mptcp/pm_kernel.c
@@ -389,10 +389,12 @@ static unsigned int fill_local_addresses_vec(struct mptcp_sock *msk,
 	struct mptcp_addr_info mpc_addr;
 	struct pm_nl_pernet *pernet;
 	unsigned int subflows_max;
+	bool c_flag_case;
 	int i = 0;
 
 	pernet = pm_nl_get_pernet_from_msk(msk);
 	subflows_max = mptcp_pm_get_subflows_max(msk);
+	c_flag_case = remote->id && mptcp_pm_add_addr_c_flag_case(msk);
 
 	mptcp_local_address((struct sock_common *)msk, &mpc_addr);
 
@@ -405,12 +407,27 @@ static unsigned int fill_local_addresses_vec(struct mptcp_sock *msk,
 			continue;
 
 		if (msk->pm.subflows < subflows_max) {
+			bool is_id0;
+
 			locals[i].addr = entry->addr;
 			locals[i].flags = entry->flags;
 			locals[i].ifindex = entry->ifindex;
 
+			is_id0 = mptcp_addresses_equal(&locals[i].addr,
+						       &mpc_addr,
+						       locals[i].addr.port);
+
+			if (c_flag_case &&
+			    (entry->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW)) {
+				__clear_bit(locals[i].addr.id,
+					    msk->pm.id_avail_bitmap);
+
+				if (!is_id0)
+					msk->pm.local_addr_used++;
+			}
+
 			/* Special case for ID0: set the correct ID */
-			if (mptcp_addresses_equal(&locals[i].addr, &mpc_addr, locals[i].addr.port))
+			if (is_id0)
 				locals[i].addr.id = 0;
 
 			msk->pm.subflows++;
@@ -419,6 +436,37 @@ static unsigned int fill_local_addresses_vec(struct mptcp_sock *msk,
 	}
 	rcu_read_unlock();
 
+	/* Special case: peer sets the C flag, accept one ADD_ADDR if default
+	 * limits are used -- accepting no ADD_ADDR -- and use subflow endpoints
+	 */
+	if (!i && c_flag_case) {
+		unsigned int local_addr_max = mptcp_pm_get_local_addr_max(msk);
+
+		while (msk->pm.local_addr_used < local_addr_max &&
+		       msk->pm.subflows < subflows_max) {
+			struct mptcp_pm_local *local = &locals[i];
+
+			if (!select_local_address(pernet, msk, local))
+				break;
+
+			__clear_bit(local->addr.id, msk->pm.id_avail_bitmap);
+
+			if (!mptcp_pm_addr_families_match(sk, &local->addr,
+							  remote))
+				continue;
+
+			if (mptcp_addresses_equal(&local->addr, &mpc_addr,
+						  local->addr.port))
+				continue;
+
+			msk->pm.local_addr_used++;
+			msk->pm.subflows++;
+			i++;
+		}
+
+		return i;
+	}
+
 	/* If the array is empty, fill in the single
 	 * 'IPADDRANY' local address
 	 */
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index a1787a1344ac1bbeefdb4548740d6aef980b79e7..cbe54331e5c745989af50409d9cb79c6d90a8201 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -1199,6 +1199,14 @@ static inline void mptcp_pm_close_subflow(struct mptcp_sock *msk)
 	spin_unlock_bh(&msk->pm.lock);
 }
 
+static inline bool mptcp_pm_add_addr_c_flag_case(struct mptcp_sock *msk)
+{
+	return READ_ONCE(msk->pm.remote_deny_join_id0) &&
+	       msk->pm.local_addr_used == 0 &&
+	       mptcp_pm_get_add_addr_accept_max(msk) == 0 &&
+	       msk->pm.subflows < mptcp_pm_get_subflows_max(msk);
+}
+
 void mptcp_sockopt_sync_locked(struct mptcp_sock *msk, struct sock *ssk);
 
 static inline struct mptcp_ext *mptcp_get_ext(const struct sk_buff *skb)
The previous commit adds an exception for the C-flag case. The 'mptcp_join.sh' selftest is extended to validate this case.
In this subtest, there is a typical CDN deployment with a client where MPTCP endpoints have been 'automatically' configured:
- the server sets net.mptcp.allow_join_initial_addr_port=0
- the client has multiple 'subflow' endpoints, and the default limits: not accepting ADD_ADDRs.
Without the parent patch, the client is not able to establish new subflows using its 'subflow' endpoints. The parent commit fixes that.
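As an illustration of why two extra subflows are expected in this scenario, the following userspace sketch loosely models the selection loop added to fill_local_addresses_vec() by the parent commit. All names here (`struct endpoint`, `struct client`, `select_local()`, `fill_locals_c_flag()`, `EP_FLAG_SUBFLOW`) are hypothetical simplifications: the address family check is omitted, and the `available` field stands in for one bit of 'id_avail_bitmap'.

```c
#include <stdbool.h>
#include <stddef.h>

#define EP_FLAG_SUBFLOW 0x1 /* hypothetical stand-in for MPTCP_PM_ADDR_FLAG_SUBFLOW */

struct endpoint {
	unsigned int id;
	unsigned int flags;
	bool is_initial_addr; /* same address as the one used by the initial subflow */
	bool available;       /* models one bit of id_avail_bitmap */
};

struct client {
	struct endpoint *endp;
	size_t nr_endp;
	unsigned int local_addr_used;
	unsigned int local_addr_max; /* number of 'subflow' endpoints */
	unsigned int subflows;
	unsigned int subflows_max;   /* limit: additional subflows */
};

/* Pick the next available 'subflow' endpoint, or NULL if none is left. */
static struct endpoint *select_local(struct client *c)
{
	for (size_t i = 0; i < c->nr_endp; i++) {
		struct endpoint *e = &c->endp[i];

		if ((e->flags & EP_FLAG_SUBFLOW) && e->available)
			return e;
	}

	return NULL;
}

/* Number of subflows that would be created towards the announced address. */
static unsigned int fill_locals_c_flag(struct client *c)
{
	unsigned int n = 0;

	while (c->local_addr_used < c->local_addr_max &&
	       c->subflows < c->subflows_max) {
		struct endpoint *e = select_local(c);

		if (!e)
			break;

		e->available = false; /* each endpoint is only considered once */

		if (e->is_initial_addr) /* already used by the initial subflow */
			continue;

		c->local_addr_used++;
		c->subflows++;
		n++;
	}

	return n;
}
```

In this model, a client with two 'subflow' endpoints and a limit of 2 additional subflows gets two entries from `fill_locals_c_flag()` on the first call and none on a second call, since both endpoints have been consumed.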
The 'Fixes' tag here below is the same as the one from the previous commit: this patch here is not fixing anything wrong in the selftests, but it validates the previous fix for an issue introduced by this commit ID.
Fixes: df377be38725 ("mptcp: add deny_join_id0 in mptcp_options_received")
Cc: stable@vger.kernel.org
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 tools/testing/selftests/net/mptcp/mptcp_join.sh | 11 +++++++++++
 1 file changed, 11 insertions(+)
diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index 6055ee5762e13108e5e2924a0e77d58da584d008..a94b3960ad5e009dbead66b6ff2aa01f70aa3e1f 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -3306,6 +3306,17 @@ deny_join_id0_tests()
 		run_tests $ns1 $ns2 10.0.1.1
 		chk_join_nr 1 1 1
 	fi
+
+	# default limits, server deny join id 0 + signal
+	if reset_with_allow_join_id0 "default limits, server deny join id 0" 0 1; then
+		pm_nl_set_limits $ns1 0 2
+		pm_nl_set_limits $ns2 0 2
+		pm_nl_add_endpoint $ns1 10.0.2.1 flags signal
+		pm_nl_add_endpoint $ns2 10.0.3.2 flags subflow
+		pm_nl_add_endpoint $ns2 10.0.4.2 flags subflow
+		run_tests $ns1 $ns2 10.0.1.1
+		chk_join_nr 2 2 2
+	fi
 }
 
 fullmesh_tests()
Hello:
This series was applied to netdev/net-next.git (main) by Jakub Kicinski <kuba@kernel.org>:
On Thu, 25 Sep 2025 12:32:35 +0200 you wrote:
> Here are some patches for the MPTCP PM, including some refactoring that I thought it would be best to send at the end of a cycle to avoid conflicts between net and net-next that could last a few weeks.
>
> The most interesting changes are in the first and last patch, the rest are patches refactoring the code & tests to validate the modifications.
>
> [...]
Here is the summary with links:
  - [net-next,01/15] mptcp: pm: in-kernel: usable client side with C-flag
    https://git.kernel.org/netdev/net-next/c/4b1ff850e0c1
  - [net-next,02/15] selftests: mptcp: join: validate C-flag + def limit
    https://git.kernel.org/netdev/net-next/c/008385efd05e
  - [net-next,03/15] mptcp: pm: in-kernel: refactor fill_local_addresses_vec
    https://git.kernel.org/netdev/net-next/c/8dc63ade451d
  - [net-next,04/15] mptcp: pm: in-kernel: refactor fill_remote_addresses_vec
    https://git.kernel.org/netdev/net-next/c/a845b2bbf26e
  - [net-next,05/15] mptcp: pm: rename 'subflows' to 'extra_subflows'
    https://git.kernel.org/netdev/net-next/c/c5273f6ca166
  - [net-next,06/15] mptcp: pm: in-kernel: rename 'subflows_max' to 'limit_extra_subflows'
    https://git.kernel.org/netdev/net-next/c/3eb3c9a9596a
  - [net-next,07/15] mptcp: pm: in-kernel: rename 'add_addr_signal_max' to 'endp_signal_max'
    https://git.kernel.org/netdev/net-next/c/45cae570664d
  - [net-next,08/15] mptcp: pm: in-kernel: rename 'add_addr_accept_max' to 'limit_add_addr_accepted'
    https://git.kernel.org/netdev/net-next/c/37712d84dfc2
  - [net-next,09/15] mptcp: pm: in-kernel: rename 'local_addr_max' to 'endp_subflow_max'
    https://git.kernel.org/netdev/net-next/c/e7757b6d3a62
  - [net-next,10/15] mptcp: pm: in-kernel: rename 'local_addr_list' to 'endp_list'
    https://git.kernel.org/netdev/net-next/c/35e71e43a56d
  - [net-next,11/15] mptcp: pm: in-kernel: rename 'addrs' to 'endpoints'
    https://git.kernel.org/netdev/net-next/c/e9aa044f4a1f
  - [net-next,12/15] mptcp: pm: in-kernel: remove stale_loss_cnt
    https://git.kernel.org/netdev/net-next/c/db9a0e3858ba
  - [net-next,13/15] mptcp: pm: in-kernel: reduce pernet struct size
    https://git.kernel.org/netdev/net-next/c/4984fe6254f8
  - [net-next,14/15] mptcp: pm: in-kernel: compare IDs instead of addresses
    https://git.kernel.org/netdev/net-next/c/f596293314b2
  - [net-next,15/15] mptcp: pm: in-kernel: add laminar endpoints
    https://git.kernel.org/netdev/net-next/c/539f6b9de39e
You are awesome, thank you!