When looking at improving the user experience around the MPTCP endpoints setup, I noticed that setting an endpoint with both the 'signal' and the 'subflow' flags -- as users have done in the past, according to bug reports we got -- resulted in only announcing the endpoint, not in using it to create subflows: the 'subflow' flag was then ignored.
My initial thought was to modify IPRoute2 to warn the user when the two flags were set, but it doesn't seem right to ignore one of them. I then looked at modifying the kernel to forbid having the two flags set, but when discussing it with Mat, we thought it was maybe not ideal: there might be use cases, and we might break some configs. Then I saw it was working before v5.17. So instead, I fixed the support on the kernel side (patch 5) using Paolo's suggestion. This series also includes a fix on the options side (patch 1: for v5.11+), an explicit deny of some option combinations (patch 2: for v5.18+), and some refactoring (patches 3 and 4) to ease the inclusion of patch 5.
While at it, I added a new selftest (patch 7) to validate this case -- including a modification of the chk_add_nr helper to invert the sides where the counters are checked (patch 6) -- and allowed the ADD_ADDR echo just after the MP_JOIN 3WHS.
The selftest modifications have the same Fixes tag as the previous commit, but no 'Cc: Stable': if the backport works, that's good, but it still needs to be verified by running the selftests. If not, there is no need to worry: many CIs use the selftests from the last stable version to validate previous stable releases.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
Matthieu Baerts (NGI0) (7):
      mptcp: fully established after ADD_ADDR echo on MPJ
      mptcp: pm: deny endp with signal + subflow + port
      mptcp: pm: reduce indentation blocks
      mptcp: pm: don't try to create sf if alloc failed
      mptcp: pm: do not ignore 'subflow' if 'signal' flag is also set
      selftests: mptcp: join: ability to invert ADD_ADDR check
      selftests: mptcp: join: test both signal & subflow
 net/mptcp/options.c                             |  3 +-
 net/mptcp/pm_netlink.c                          | 47 +++++++++++++--------
 tools/testing/selftests/net/mptcp/mptcp_join.sh | 55 ++++++++++++++++++-------
 3 files changed, 73 insertions(+), 32 deletions(-)
---
base-commit: 0bf50cead4c4710d9f704778c32ab8af47ddf070
change-id: 20240731-upstream-net-20240731-mptcp-endp-subflow-signal-181d640cf5e8
Best regards,
Before this patch, receiving an ADD_ADDR echo on the just-connected MP_JOIN subflow -- initiator side, after the MP_JOIN 3WHS -- resulted in an MP_RESET. That's because only ACKs with a DSS or ADD_ADDRs without the echo bit were allowed.
Not allowing the ADD_ADDR echo after an MP_CAPABLE 3WHS makes sense: an ADD_ADDR is not supposed to be sent before that point, because sending it requires being fully established first. For the MP_JOIN 3WHS, that's different: the ADD_ADDR can be sent on a previous subflow, and the ADD_ADDR echo can be received on the recently created one. The other peer will already be fully established, so it is allowed to send that.
We can then relax the conditions here to accept the ADD_ADDR echo for MPJ subflows.
Fixes: 67b12f792d5e ("mptcp: full fully established support after ADD_ADDR")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 net/mptcp/options.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 8a68382a4fe9..ac2f1a54cc43 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -958,7 +958,8 @@ static bool check_fully_established(struct mptcp_sock *msk, struct sock *ssk,
 
 	if (subflow->remote_key_valid &&
 	    (((mp_opt->suboptions & OPTION_MPTCP_DSS) && mp_opt->use_ack) ||
-	     ((mp_opt->suboptions & OPTION_MPTCP_ADD_ADDR) && !mp_opt->echo))) {
+	     ((mp_opt->suboptions & OPTION_MPTCP_ADD_ADDR) &&
+	      (!mp_opt->echo || subflow->mp_join)))) {
 		/* subflows are fully established as soon as we get any
 		 * additional ack, including ADD_ADDR.
 		 */
As mentioned in the 'Fixes' commit, the port flag is only supported by the 'signal' flag, not by the 'subflow' one. So if both the 'signal' and 'subflow' flags are set, the problem is the same: the feature cannot work with the 'subflow' flag.
Technically, if both the 'signal' and 'subflow' flags are set, it would be possible to create the listening socket, but not to establish a subflow using this source port. It is then better to explicitly deny it, to avoid confusion, since the expected behaviour is not possible.
Fixes: 09f12c3ab7a5 ("mptcp: allow to use port and non-signal in set_flags")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 net/mptcp/pm_netlink.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index 37954a0b087d..c921d07e5940 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -1328,8 +1328,8 @@ int mptcp_pm_nl_add_addr_doit(struct sk_buff *skb, struct genl_info *info)
 	if (ret < 0)
 		return ret;
 
-	if (addr.addr.port && !(addr.flags & MPTCP_PM_ADDR_FLAG_SIGNAL)) {
-		GENL_SET_ERR_MSG(info, "flags must have signal when using port");
+	if (addr.addr.port && !address_use_port(&addr)) {
+		GENL_SET_ERR_MSG(info, "flags must have signal and not subflow when using port");
 		return -EINVAL;
 	}
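The new check relies on address_use_port(), whose body is not part of this diff. Going by the new error message, a port is only accepted when 'signal' is set and 'subflow' is not. The sketch below models that rule in user space: the FLAG_* values are redefined locally with the same bits as MPTCP_PM_ADDR_FLAG_SIGNAL, _SUBFLOW and _BACKUP from <linux/mptcp.h>, and validate_endpoint() is a hypothetical stand-in for the netlink handler, not the kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit values as MPTCP_PM_ADDR_FLAG_{SIGNAL,SUBFLOW,BACKUP},
 * redefined so this sketch builds on its own. */
#define FLAG_SIGNAL  (1u << 0)
#define FLAG_SUBFLOW (1u << 1)
#define FLAG_BACKUP  (1u << 2)

static_assert((FLAG_SIGNAL & FLAG_SUBFLOW) == 0, "flags must be distinct bits");

/* A port is usable only with 'signal' set and 'subflow' unset;
 * other flags, such as 'backup', do not matter here. */
static bool may_use_port(uint32_t flags)
{
	return (flags & (FLAG_SIGNAL | FLAG_SUBFLOW)) == FLAG_SIGNAL;
}

/* Hypothetical stand-in for the check done when adding an endpoint:
 * returns 0 if accepted, -EINVAL otherwise. */
static int validate_endpoint(uint16_t port, uint32_t flags)
{
	if (port && !may_use_port(flags))
		return -EINVAL;
	return 0;
}
```

Masking both bits and comparing against FLAG_SIGNAL rejects 'subflow' alone as well as 'signal subflow', while endpoints without a port are not affected.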
If it was not possible to add a new item to the announce list, it is better not to try to create new subflows: that would only waste cycles and/or put extreme memory pressure on the system.
While at it, a warning is now printed if the entry was already in the list, as that should not happen with the in-kernel path-manager. With this PM, mptcp_pm_alloc_anno_list() should only fail in case of memory pressure.
Fixes: b6c08380860b ("mptcp: remove addr and subflow in PM netlink")
Cc: stable@vger.kernel.org
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 net/mptcp/pm_netlink.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index 780f4cca165c..2be7af377cda 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -348,7 +348,7 @@ bool mptcp_pm_alloc_anno_list(struct mptcp_sock *msk,
 	add_entry = mptcp_lookup_anno_list_by_saddr(msk, addr);
 
 	if (add_entry) {
-		if (mptcp_pm_is_kernel(msk))
+		if (WARN_ON_ONCE(mptcp_pm_is_kernel(msk)))
 			return false;
 
 		sk_reset_timer(sk, &add_entry->add_timer,
@@ -555,8 +555,6 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
 
 	/* check first for announce */
 	if (msk->pm.add_addr_signaled < add_addr_signal_max) {
-		local = select_signal_address(pernet, msk);
-
 		/* due to racing events on both ends we can reach here while
 		 * previous add address is still running: if we invoke now
 		 * mptcp_pm_announce_addr(), that will fail and the
@@ -567,11 +565,15 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
 		if (msk->pm.addr_signal & BIT(MPTCP_ADD_ADDR_SIGNAL))
 			return;
 
+		local = select_signal_address(pernet, msk);
 		if (!local)
 			goto subflow;
 
+		/* If the alloc fails, we are on memory pressure, not worth
+		 * continuing, and trying to create subflows.
+		 */
 		if (!mptcp_pm_alloc_anno_list(msk, &local->addr))
-			goto subflow;
+			return;
 
 		__clear_bit(local->addr.id, msk->pm.id_avail_bitmap);
 		msk->pm.add_addr_signaled++;
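The resulting order of checks in the announce step can be summarised with a small decision function. This is only a sketch of the control flow in the diff above: the enum and both function names are hypothetical, and each boolean parameter stands for one of the kernel conditions (signal limit, pending ADD_ADDR, select_signal_address() result, mptcp_pm_alloc_anno_list() result).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes of the announce step, for illustration only. */
enum announce_result {
	ANNOUNCE_SKIPPED, /* limit reached or no signal endpoint: go to subflows */
	ANNOUNCE_PENDING, /* previous ADD_ADDR still in flight: return */
	ANNOUNCE_NOMEM,   /* anno list alloc failed: return, no subflow attempt */
	ANNOUNCE_SENT,    /* ADD_ADDR sent: continue with subflows */
};

/* The signal endpoint is only looked at once we know that no ADD_ADDR
 * is pending, and an alloc failure no longer falls through to subflows. */
static enum announce_result announce_step(bool below_signal_limit,
					  bool add_addr_pending,
					  bool have_signal_endpoint,
					  bool alloc_ok)
{
	if (!below_signal_limit)
		return ANNOUNCE_SKIPPED;
	if (add_addr_pending)
		return ANNOUNCE_PENDING;
	if (!have_signal_endpoint)
		return ANNOUNCE_SKIPPED;
	if (!alloc_ok)
		return ANNOUNCE_NOMEM;
	return ANNOUNCE_SENT;
}

/* Only these outcomes reach the 'subflow:' label. */
static bool tries_subflows(enum announce_result r)
{
	assert(r <= ANNOUNCE_SENT);
	return r == ANNOUNCE_SKIPPED || r == ANNOUNCE_SENT;
}
```

Before this patch, the ANNOUNCE_NOMEM case would also have reached the subflow step.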
Up to the 'Fixes' commit, having an endpoint with both the 'signal' and 'subflow' flags resulted in the creation of a subflow and an address announcement using the address linked to this endpoint. After this commit, only the address announcement was done, ignoring the 'subflow' flag.
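That's because the same bitmap is used for the two flags: once the endpoint has been announced, its ID is cleared from the bitmap, and select_local_address() then skips it. It is OK to keep this single bitmap: the already selected local endpoint simply has to be reused, but not via select_local_address(), so as not to look at the just-modified bitmap.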
Note that it is unusual to set the two flags together: creating a new subflow using a new local address will implicitly advertise it to the other peer. So in theory, there is no need to advertise it explicitly as well. Maybe there are use cases -- the subflow might not reach the other peer that way, and we can ask the other peer to try initiating the new subflow without delay -- or, very likely, the user is confused and set both flags "just to be sure at least the right one is set". Still, if it is allowed, the kernel should do what has been asked: use this endpoint to announce the address and to create a new subflow from it.
An alternative would be to forbid the use of the two flags together, but that's probably too late: there may be use cases, and it was working before. This patch will stop people complaining that subflows are not created using the endpoint they added with the 'subflow' and 'signal' flags.
Note that with the current patch, the subflow might not be created in some corner cases, e.g. if the 'subflows' limit was reached when sending the ADD_ADDR, but changed later on. It is probably not worth splitting id_avail_bitmap per target ('signal', 'subflow'), which would add another large field to the msk "just" to track endpoints (again). Anyway, currently, when the limits are changed, the kernel doesn't check if new subflows can be created or removed, because we would need to keep track of the received ADD_ADDR, and more. It seems OK to assume that the limits should be properly configured before establishing new connections.
Fixes: 86e39e04482b ("mptcp: keep track of local endpoint still available for each msk")
Cc: stable@vger.kernel.org
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 net/mptcp/pm_netlink.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index 2be7af377cda..4cae2aa7be5c 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -512,8 +512,8 @@ __lookup_addr(struct pm_nl_pernet *pernet, const struct mptcp_addr_info *info)
 
 static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
 {
+	struct mptcp_pm_addr_entry *local, *signal_and_subflow = NULL;
 	struct sock *sk = (struct sock *)msk;
-	struct mptcp_pm_addr_entry *local;
 	unsigned int add_addr_signal_max;
 	unsigned int local_addr_max;
 	struct pm_nl_pernet *pernet;
@@ -579,6 +579,9 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
 		msk->pm.add_addr_signaled++;
 		mptcp_pm_announce_addr(msk, &local->addr, false);
 		mptcp_pm_nl_addr_send_ack(msk);
+
+		if (local->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW)
+			signal_and_subflow = local;
 	}
 
 subflow:
@@ -589,9 +592,14 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
 		bool fullmesh;
 		int i, nr;
 
-		local = select_local_address(pernet, msk);
-		if (!local)
-			break;
+		if (signal_and_subflow) {
+			local = signal_and_subflow;
+			signal_and_subflow = NULL;
+		} else {
+			local = select_local_address(pernet, msk);
+			if (!local)
+				break;
+		}
 
 		fullmesh = !!(local->flags & MPTCP_PM_ADDR_FLAG_FULLMESH);
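A small user-space model can show why the announced endpoint has to be reused rather than looked up again. Everything below is a hypothetical simplification: endpoints are picked by flag from a fixed array, one 64-bit mask plays the role of msk->pm.id_avail_bitmap, and only a single subflow step is modelled (the kernel loops, and also handles limits, fullmesh, etc.).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same bit values as MPTCP_PM_ADDR_FLAG_SIGNAL / _SUBFLOW. */
#define FLAG_SIGNAL  (1u << 0)
#define FLAG_SUBFLOW (1u << 1)

/* Hypothetical, simplified endpoint: an ID (< 64 here) and its flags. */
struct endpoint {
	unsigned int id;
	uint32_t flags;
};

/* One mask shared by both steps: a set bit means the ID is still free. */
static bool id_available(uint64_t avail, unsigned int id)
{
	assert(id < 64);
	return (avail >> id) & 1;
}

/* Pick the first still-available endpoint having 'flag', then mark its
 * ID as used, like __clear_bit() on the shared bitmap. */
static const struct endpoint *pick(const struct endpoint *eps, size_t n,
				   uint64_t *avail, uint32_t flag)
{
	for (size_t i = 0; i < n; i++) {
		if ((eps[i].flags & flag) && id_available(*avail, eps[i].id)) {
			*avail &= ~(UINT64_C(1) << eps[i].id);
			return &eps[i];
		}
	}
	return NULL;
}

/* One announce step, then one subflow step. Returns the endpoint used
 * for the subflow, or NULL. 'reuse' enables the patch 5 behaviour. */
static const struct endpoint *announce_then_subflow(const struct endpoint *eps,
						    size_t n, bool reuse)
{
	uint64_t avail = ~UINT64_C(0);
	const struct endpoint *signal_and_subflow = NULL;
	const struct endpoint *local = pick(eps, n, &avail, FLAG_SIGNAL);

	if (reuse && local && (local->flags & FLAG_SUBFLOW))
		signal_and_subflow = local;

	if (signal_and_subflow)
		return signal_and_subflow;
	/* Without reuse, the announced endpoint's ID is already cleared,
	 * so this lookup skips it. */
	return pick(eps, n, &avail, FLAG_SUBFLOW);
}
```

With a single endpoint flagged 'signal subflow', the model returns NULL without reuse and returns that endpoint with it, matching the behaviour described before and after this patch.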
Hello:
This series was applied to netdev/net.git (main) by Jakub Kicinski <kuba@kernel.org>:
On Wed, 31 Jul 2024 13:05:52 +0200 you wrote:
[...]
Here is the summary with links:
  - [net,1/7] mptcp: fully established after ADD_ADDR echo on MPJ
    https://git.kernel.org/netdev/net/c/d67c5649c154
  - [net,2/7] mptcp: pm: deny endp with signal + subflow + port
    https://git.kernel.org/netdev/net/c/8af1f11865f2
  - [net,3/7] mptcp: pm: reduce indentation blocks
    https://git.kernel.org/netdev/net/c/c95eb32ced82
  - [net,4/7] mptcp: pm: don't try to create sf if alloc failed
    https://git.kernel.org/netdev/net/c/cd7c957f936f
  - [net,5/7] mptcp: pm: do not ignore 'subflow' if 'signal' flag is also set
    https://git.kernel.org/netdev/net/c/85df533a787b
  - [net,6/7] selftests: mptcp: join: ability to invert ADD_ADDR check
    https://git.kernel.org/netdev/net/c/bec1f3b119eb
  - [net,7/7] selftests: mptcp: join: test both signal & subflow
    https://git.kernel.org/netdev/net/c/4d2868b5d191
You are awesome, thank you!
linux-stable-mirror@lists.linaro.org