Here is a series from Geliang, adding mptcp_subflow bpf_iter support.
We are working on extending MPTCP with BPF, e.g. to control the path manager -- in charge of the creation, deletion, and announcements of subflows (paths) -- and the packet scheduler -- in charge of selecting which available path the next data will be sent to. These extensions need to iterate over the list of subflows attached to an MPTCP connection, and do some specific actions via some new kfunc that will be added later on.
This preparation work is split in different patches:
- Patch 1: extend bpf_skc_to_mptcp_sock() to be called with msk.
- Patch 2: allow using skc_to_mptcp_sock() in CGroup sockopt hooks.
- Patch 3: register some "basic" MPTCP kfunc.
- Patch 4: add mptcp_subflow bpf_iter support. Note that previous versions of this single patch have already been shared to the BPF mailing list. The changelog has been kept with a comment, but the version number has been reset to avoid confusions.
- Patch 5: add kfunc to make sure the msk is valid
- Patch 6: add more MPTCP endpoints in the selftests, in order to create more than 2 subflows.
- Patch 7: add a very simple test validating mptcp_subflow bpf_iter support. This test could be written without the new bpf_iter, but it is there only to make sure this specific feature works as expected.
Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org --- Changes in v2: - Patches 1-2: new ones. - Patch 3: remove two kfunc, more restrictions. (Martin) - Patch 4: add BUILD_BUG_ON(), more restrictions. (Martin) - Patch 7: adaptations due to modifications in patches 1-4. - Link to v1: https://lore.kernel.org/r/20241108-bpf-next-net-mptcp-bpf_iter-subflows-v1-0...
--- Geliang Tang (7): bpf: Extend bpf_skc_to_mptcp_sock to MPTCP sock bpf: Allow use of skc_to_mptcp_sock in cg_sockopt bpf: Register mptcp common kfunc set bpf: Add mptcp_subflow bpf_iter bpf: Acquire and release mptcp socket selftests/bpf: More endpoints for endpoint_init selftests/bpf: Add mptcp_subflow bpf_iter subtest
include/net/mptcp.h | 4 +- kernel/bpf/cgroup.c | 2 + net/core/filter.c | 2 +- net/mptcp/bpf.c | 113 +++++++++++++++++- tools/testing/selftests/bpf/bpf_experimental.h | 8 ++ tools/testing/selftests/bpf/prog_tests/mptcp.c | 129 ++++++++++++++++++++- tools/testing/selftests/bpf/progs/mptcp_bpf.h | 9 ++ .../testing/selftests/bpf/progs/mptcp_bpf_iters.c | 63 ++++++++++ 8 files changed, 318 insertions(+), 12 deletions(-) --- base-commit: dad704ebe38642cd405e15b9c51263356391355c change-id: 20241108-bpf-next-net-mptcp-bpf_iter-subflows-027f6d87770e
Best regards,
From: Geliang Tang tanggeliang@kylinos.cn
Currently, bpf_skc_to_mptcp_sock() can only be used with sockets that are MPTCP subflows: TCP sockets with tp->is_mptcp, created by the kernel from an MPTCP socket (IPPROTO_MPTCP). Typically used with BPF sock_ops operators.
Here, this helper is extended to support MPTCP sockets, the ones created by the userspace (IPPROTO_MPTCP). This is useful for BPF hooks involving these sockets, e.g. [gs]etsocktopt.
bpf_skc_to_mptcp_sock() uses bpf_mptcp_sock_from_subflow(). The former suggests any MPTCP type/subtype can be used, but the latter only accepts subflow ones. So bpf_mptcp_sock_from_subflow is modified here to support MPTCP socket, and renamed to avoid confusions.
Signed-off-by: Geliang Tang tanggeliang@kylinos.cn Reviewed-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org --- Notes: - v2: new patch. --- include/net/mptcp.h | 4 ++-- net/core/filter.c | 2 +- net/mptcp/bpf.c | 10 ++++++++-- 3 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/include/net/mptcp.h b/include/net/mptcp.h index 814b5f2e3ed5e3e474a2bac5e4cca5a89abcfe1c..94d5976f7b8d8eb8b82f298f7b33ecd653937dc0 100644 --- a/include/net/mptcp.h +++ b/include/net/mptcp.h @@ -322,9 +322,9 @@ static inline void mptcpv6_handle_mapped(struct sock *sk, bool mapped) { } #endif
#if defined(CONFIG_MPTCP) && defined(CONFIG_BPF_SYSCALL) -struct mptcp_sock *bpf_mptcp_sock_from_subflow(struct sock *sk); +struct mptcp_sock *bpf_mptcp_sock_from_sock(struct sock *sk); #else -static inline struct mptcp_sock *bpf_mptcp_sock_from_subflow(struct sock *sk) { return NULL; } +static inline struct mptcp_sock *bpf_mptcp_sock_from_sock(struct sock *sk) { return NULL; } #endif
#if !IS_ENABLED(CONFIG_MPTCP) diff --git a/net/core/filter.c b/net/core/filter.c index 6625b3f563a4a3110a76b9c12a46340828e16b6e..40ed3854a899bf9d11847ec0e8aee174923f03d2 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -11833,7 +11833,7 @@ const struct bpf_func_proto bpf_skc_to_unix_sock_proto = { BPF_CALL_1(bpf_skc_to_mptcp_sock, struct sock *, sk) { BTF_TYPE_EMIT(struct mptcp_sock); - return (unsigned long)bpf_mptcp_sock_from_subflow(sk); + return (unsigned long)bpf_mptcp_sock_from_sock(sk); }
const struct bpf_func_proto bpf_skc_to_mptcp_sock_proto = { diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c index 8a16672b94e2384f5263e1432296cbca1236bb30..988b22a06331ac293b36f3c6f9bc29ff0f6db048 100644 --- a/net/mptcp/bpf.c +++ b/net/mptcp/bpf.c @@ -12,9 +12,15 @@ #include <linux/bpf.h> #include "protocol.h"
-struct mptcp_sock *bpf_mptcp_sock_from_subflow(struct sock *sk) +struct mptcp_sock *bpf_mptcp_sock_from_sock(struct sock *sk) { - if (sk && sk_fullsock(sk) && sk->sk_protocol == IPPROTO_TCP && sk_is_mptcp(sk)) + if (unlikely(!sk || !sk_fullsock(sk))) + return NULL; + + if (sk->sk_protocol == IPPROTO_MPTCP) + return mptcp_sk(sk); + + if (sk->sk_protocol == IPPROTO_TCP && sk_is_mptcp(sk)) return mptcp_sk(mptcp_subflow_ctx(sk)->conn);
return NULL;
From: Geliang Tang tanggeliang@kylinos.cn
Currently, bpf_skc_to_mptcp_sock() helper is not allowed to be used in cg_sockopt. This patch adds this permission.
Thanks to the previous patch allowing skc_to_mptcp_sock() to be used with MPTCP sockets, this permission allows this helper to be use it in CGroup BPF hooks, e.g. [gs]etsocktopt.
Signed-off-by: Geliang Tang tanggeliang@kylinos.cn Reviewed-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org --- Notes: - v2: new patch. --- kernel/bpf/cgroup.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c index 46e5db65dbc8d8c6591b53dfc77bb689357f33ea..1ca22e4842cf6b6c6bdbc37bf415187e706b2b69 100644 --- a/kernel/bpf/cgroup.c +++ b/kernel/bpf/cgroup.c @@ -2358,6 +2358,8 @@ cg_sockopt_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) #ifdef CONFIG_INET case BPF_FUNC_tcp_sock: return &bpf_tcp_sock_proto; + case BPF_FUNC_skc_to_mptcp_sock: + return &bpf_skc_to_mptcp_sock_proto; #endif case BPF_FUNC_perf_event_output: return &bpf_event_output_data_proto;
From: Geliang Tang tanggeliang@kylinos.cn
MPTCP helper mptcp_sk() is used to convert struct sock to mptcp_sock. Helpers mptcp_subflow_ctx() and mptcp_subflow_tcp_sock() are used to convert between struct mptcp_subflow_context and sock. They all will be used in MPTCP BPF programs too.
This patch defines corresponding wrappers of them, and put the wrappers into mptcp common kfunc set and register the set with the flag BPF_PROG_TYPE_UNSPEC to let them accessible to all types of BPF programs.
Signed-off-by: Geliang Tang tanggeliang@kylinos.cn Reviewed-by: Mat Martineau martineau@kernel.org Reviewed-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org --- Notes: - v2: - Thanks to the two new previous patches, bpf_mptcp_sk() and bpf_mptcp_subflow_tcp_sock() are no longer needed. - bpf_mptcp_subflow_ctx(): make sure the socket is an MPTCP subflow, and add KF_RET_NULL (Martin). - Restrict this kfunc to BPF_PROG_TYPE_CGROUP_SOCKOPT for the moment. --- net/mptcp/bpf.c | 31 ++++++++++++++++++++++++++++++- 1 file changed, 30 insertions(+), 1 deletion(-)
diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c index 988b22a06331ac293b36f3c6f9bc29ff0f6db048..c5bfd84c16c43230d9d8e1fd8ff781a767e647b5 100644 --- a/net/mptcp/bpf.c +++ b/net/mptcp/bpf.c @@ -35,8 +35,37 @@ static const struct btf_kfunc_id_set bpf_mptcp_fmodret_set = { .set = &bpf_mptcp_fmodret_ids, };
+__bpf_kfunc_start_defs(); + +__bpf_kfunc static struct mptcp_subflow_context * +bpf_mptcp_subflow_ctx(const struct sock *sk) +{ + if (sk && sk_fullsock(sk) && + sk->sk_protocol == IPPROTO_TCP && sk_is_mptcp(sk)) + return mptcp_subflow_ctx(sk); + + return NULL; +} + +__bpf_kfunc_end_defs(); + +BTF_KFUNCS_START(bpf_mptcp_common_kfunc_ids) +BTF_ID_FLAGS(func, bpf_mptcp_subflow_ctx, KF_RET_NULL) +BTF_KFUNCS_END(bpf_mptcp_common_kfunc_ids) + +static const struct btf_kfunc_id_set bpf_mptcp_common_kfunc_set = { + .owner = THIS_MODULE, + .set = &bpf_mptcp_common_kfunc_ids, +}; + static int __init bpf_mptcp_kfunc_init(void) { - return register_btf_fmodret_id_set(&bpf_mptcp_fmodret_set); + int ret; + + ret = register_btf_fmodret_id_set(&bpf_mptcp_fmodret_set); + ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SOCKOPT, + &bpf_mptcp_common_kfunc_set); + + return ret; } late_initcall(bpf_mptcp_kfunc_init);
From: Geliang Tang tanggeliang@kylinos.cn
It's necessary to traverse all subflows on the conn_list of an MPTCP socket and then call kfunc to modify the fields of each subflow. In kernel space, mptcp_for_each_subflow() helper is used for this:
mptcp_for_each_subflow(msk, subflow) kfunc(subflow);
But in the MPTCP BPF program, this has not yet been implemented. As Martin suggested recently, this conn_list walking + modify-by-kfunc usage fits the bpf_iter use case.
So this patch adds a new bpf_iter type named "mptcp_subflow" to do this and implements its helpers bpf_iter_mptcp_subflow_new()/_next()/ _destroy(). And register these bpf_iter mptcp_subflow into mptcp common kfunc set. Then bpf_for_each() for mptcp_subflow can be used in BPF program like this:
bpf_for_each(mptcp_subflow, subflow, msk) kfunc(subflow);
Suggested-by: Martin KaFai Lau martin.lau@kernel.org Signed-off-by: Geliang Tang tanggeliang@kylinos.cn Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org --- Notes: - v2: - Add BUILD_BUG_ON() checks, similar to the ones done with other bpf_iter_(...) helpers. - Replace msk_owned_by_me() by sock_owned_by_user_nocheck() and !spin_is_locked() (Martin).
A few versions of this single patch have been previously posted to the BPF mailing list by Geliang, before continuing to the MPTCP mailing list only, with other patches of this series. The version of the whole series has been reset to 1, but here is the ChangeLog for the previous ones: - v2: remove msk->pm.lock in _new() and _destroy() (Martin) drop DEFINE_BPF_ITER_FUNC, change opaque[3] to opaque[2] (Andrii) - v3: drop bpf_iter__mptcp_subflow - v4: if msk is NULL, initialize kit->msk to NULL in _new() and check it in _next() (Andrii) - v5: use list_is_last() instead of list_entry_is_head() add KF_ITER_NEW/NEXT/DESTROY flags add msk_owned_by_me in _new() - v6: add KF_TRUSTED_ARGS flag (Andrii, Martin) --- net/mptcp/bpf.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 53 insertions(+)
diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c index c5bfd84c16c43230d9d8e1fd8ff781a767e647b5..e39f0e4fb683c1aa31ee075281daee218dac5878 100644 --- a/net/mptcp/bpf.c +++ b/net/mptcp/bpf.c @@ -35,6 +35,15 @@ static const struct btf_kfunc_id_set bpf_mptcp_fmodret_set = { .set = &bpf_mptcp_fmodret_ids, };
+struct bpf_iter_mptcp_subflow { + __u64 __opaque[2]; +} __aligned(8); + +struct bpf_iter_mptcp_subflow_kern { + struct mptcp_sock *msk; + struct list_head *pos; +} __aligned(8); + __bpf_kfunc_start_defs();
__bpf_kfunc static struct mptcp_subflow_context * @@ -47,10 +56,54 @@ bpf_mptcp_subflow_ctx(const struct sock *sk) return NULL; }
+__bpf_kfunc static int +bpf_iter_mptcp_subflow_new(struct bpf_iter_mptcp_subflow *it, + struct mptcp_sock *msk) +{ + struct bpf_iter_mptcp_subflow_kern *kit = (void *)it; + struct sock *sk = (struct sock *)msk; + + BUILD_BUG_ON(sizeof(struct bpf_iter_mptcp_subflow_kern) > + sizeof(struct bpf_iter_mptcp_subflow)); + BUILD_BUG_ON(__alignof__(struct bpf_iter_mptcp_subflow_kern) != + __alignof__(struct bpf_iter_mptcp_subflow)); + + kit->msk = msk; + if (!msk) + return -EINVAL; + + if (!sock_owned_by_user_nocheck(sk) && + !spin_is_locked(&sk->sk_lock.slock)) + return -EINVAL; + + kit->pos = &msk->conn_list; + return 0; +} + +__bpf_kfunc static struct mptcp_subflow_context * +bpf_iter_mptcp_subflow_next(struct bpf_iter_mptcp_subflow *it) +{ + struct bpf_iter_mptcp_subflow_kern *kit = (void *)it; + + if (!kit->msk || list_is_last(kit->pos, &kit->msk->conn_list)) + return NULL; + + kit->pos = kit->pos->next; + return list_entry(kit->pos, struct mptcp_subflow_context, node); +} + +__bpf_kfunc static void +bpf_iter_mptcp_subflow_destroy(struct bpf_iter_mptcp_subflow *it) +{ +} + __bpf_kfunc_end_defs();
BTF_KFUNCS_START(bpf_mptcp_common_kfunc_ids) BTF_ID_FLAGS(func, bpf_mptcp_subflow_ctx, KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new, KF_ITER_NEW | KF_TRUSTED_ARGS) +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_next, KF_ITER_NEXT | KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_destroy, KF_ITER_DESTROY) BTF_KFUNCS_END(bpf_mptcp_common_kfunc_ids)
static const struct btf_kfunc_id_set bpf_mptcp_common_kfunc_set = {
On 12/19/24 7:46 AM, Matthieu Baerts (NGI0) wrote:
From: Geliang Tang tanggeliang@kylinos.cn
It's necessary to traverse all subflows on the conn_list of an MPTCP socket and then call kfunc to modify the fields of each subflow. In kernel space, mptcp_for_each_subflow() helper is used for this:
mptcp_for_each_subflow(msk, subflow) kfunc(subflow);
But in the MPTCP BPF program, this has not yet been implemented. As Martin suggested recently, this conn_list walking + modify-by-kfunc usage fits the bpf_iter use case.
So this patch adds a new bpf_iter type named "mptcp_subflow" to do this and implements its helpers bpf_iter_mptcp_subflow_new()/_next()/ _destroy(). And register these bpf_iter mptcp_subflow into mptcp common kfunc set. Then bpf_for_each() for mptcp_subflow can be used in BPF program like this:
bpf_for_each(mptcp_subflow, subflow, msk) kfunc(subflow);
Suggested-by: Martin KaFai Lau martin.lau@kernel.org Signed-off-by: Geliang Tang tanggeliang@kylinos.cn Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org
Notes:
- v2:
- Add BUILD_BUG_ON() checks, similar to the ones done with other bpf_iter_(...) helpers.
- Replace msk_owned_by_me() by sock_owned_by_user_nocheck() and !spin_is_locked() (Martin).
A few versions of this single patch have been previously posted to the BPF mailing list by Geliang, before continuing to the MPTCP mailing list only, with other patches of this series. The version of the whole series has been reset to 1, but here is the ChangeLog for the previous ones:
- v2: remove msk->pm.lock in _new() and _destroy() (Martin) drop DEFINE_BPF_ITER_FUNC, change opaque[3] to opaque[2] (Andrii)
- v3: drop bpf_iter__mptcp_subflow
- v4: if msk is NULL, initialize kit->msk to NULL in _new() and check it in _next() (Andrii)
- v5: use list_is_last() instead of list_entry_is_head() add KF_ITER_NEW/NEXT/DESTROY flags add msk_owned_by_me in _new()
- v6: add KF_TRUSTED_ARGS flag (Andrii, Martin)
net/mptcp/bpf.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 53 insertions(+)
diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c index c5bfd84c16c43230d9d8e1fd8ff781a767e647b5..e39f0e4fb683c1aa31ee075281daee218dac5878 100644 --- a/net/mptcp/bpf.c +++ b/net/mptcp/bpf.c @@ -35,6 +35,15 @@ static const struct btf_kfunc_id_set bpf_mptcp_fmodret_set = { .set = &bpf_mptcp_fmodret_ids, }; +struct bpf_iter_mptcp_subflow {
- __u64 __opaque[2];
+} __aligned(8);
+struct bpf_iter_mptcp_subflow_kern {
- struct mptcp_sock *msk;
- struct list_head *pos;
+} __aligned(8);
- __bpf_kfunc_start_defs();
__bpf_kfunc static struct mptcp_subflow_context * @@ -47,10 +56,54 @@ bpf_mptcp_subflow_ctx(const struct sock *sk) return NULL; } +__bpf_kfunc static int +bpf_iter_mptcp_subflow_new(struct bpf_iter_mptcp_subflow *it,
struct mptcp_sock *msk)
+{
- struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
- struct sock *sk = (struct sock *)msk;
- BUILD_BUG_ON(sizeof(struct bpf_iter_mptcp_subflow_kern) >
sizeof(struct bpf_iter_mptcp_subflow));
- BUILD_BUG_ON(__alignof__(struct bpf_iter_mptcp_subflow_kern) !=
__alignof__(struct bpf_iter_mptcp_subflow));
- kit->msk = msk;
- if (!msk)
NULL check is not needed. verifier should have rejected it for KF_TRUSTED_ARGS.
return -EINVAL;
- if (!sock_owned_by_user_nocheck(sk) &&
!spin_is_locked(&sk->sk_lock.slock))
I could have missed something. If it is to catch bug, should it be sock_owned_by_me() that has the lockdep splat? For the cg get/setsockopt hook here, the lock should have already been held earlier in the kernel.
This set is only showing the cg sockopt bpf prog but missing the major struct_ops piece. It is hard to comment. I assumed the lock situation is the same for the struct_ops where the lock will be held before calling the struct_ops prog?
return -EINVAL;
- kit->pos = &msk->conn_list;
- return 0;
+}
+__bpf_kfunc static struct mptcp_subflow_context * +bpf_iter_mptcp_subflow_next(struct bpf_iter_mptcp_subflow *it) +{
- struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
- if (!kit->msk || list_is_last(kit->pos, &kit->msk->conn_list))
return NULL;
- kit->pos = kit->pos->next;
- return list_entry(kit->pos, struct mptcp_subflow_context, node);
+}
+__bpf_kfunc static void +bpf_iter_mptcp_subflow_destroy(struct bpf_iter_mptcp_subflow *it) +{ +}
- __bpf_kfunc_end_defs();
BTF_KFUNCS_START(bpf_mptcp_common_kfunc_ids) BTF_ID_FLAGS(func, bpf_mptcp_subflow_ctx, KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new, KF_ITER_NEW | KF_TRUSTED_ARGS) +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_next, KF_ITER_NEXT | KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_destroy, KF_ITER_DESTROY) BTF_KFUNCS_END(bpf_mptcp_common_kfunc_ids) static const struct btf_kfunc_id_set bpf_mptcp_common_kfunc_set = {
From: Geliang Tang tanggeliang@kylinos.cn
The KF_TRUSTED_ARGS flag is used for bpf_iter_mptcp_subflow_new, it indicates that the all pointer arguments are valid. It's necessary to add a KF_ACQUIRE helper to get valid "msk".
This patch adds bpf_mptcp_sock_acquire() and bpf_mptcp_sock_release() helpers for this. Increase sk->sk_refcnt in _acquire() and decrease it in _release(). Register them with KF_ACQUIRE flag and KF_RELEASE flag.
Signed-off-by: Geliang Tang tanggeliang@kylinos.cn Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org --- net/mptcp/bpf.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+)
diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c index e39f0e4fb683c1aa31ee075281daee218dac5878..d50bd1ea7f6d0ff1abff32deef9a98b98ee8f42c 100644 --- a/net/mptcp/bpf.c +++ b/net/mptcp/bpf.c @@ -97,6 +97,23 @@ bpf_iter_mptcp_subflow_destroy(struct bpf_iter_mptcp_subflow *it) { }
+__bpf_kfunc static struct +mptcp_sock *bpf_mptcp_sock_acquire(struct mptcp_sock *msk) +{ + struct sock *sk = (struct sock *)msk; + + if (sk && refcount_inc_not_zero(&sk->sk_refcnt)) + return msk; + return NULL; +} + +__bpf_kfunc static void bpf_mptcp_sock_release(struct mptcp_sock *msk) +{ + struct sock *sk = (struct sock *)msk; + + WARN_ON_ONCE(!sk || !refcount_dec_not_one(&sk->sk_refcnt)); +} + __bpf_kfunc_end_defs();
BTF_KFUNCS_START(bpf_mptcp_common_kfunc_ids) @@ -104,6 +121,8 @@ BTF_ID_FLAGS(func, bpf_mptcp_subflow_ctx, KF_RET_NULL) BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new, KF_ITER_NEW | KF_TRUSTED_ARGS) BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_next, KF_ITER_NEXT | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_destroy, KF_ITER_DESTROY) +BTF_ID_FLAGS(func, bpf_mptcp_sock_acquire, KF_ACQUIRE | KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_mptcp_sock_release, KF_RELEASE) BTF_KFUNCS_END(bpf_mptcp_common_kfunc_ids)
static const struct btf_kfunc_id_set bpf_mptcp_common_kfunc_set = {
On 12/19/24 7:46 AM, Matthieu Baerts (NGI0) wrote:
From: Geliang Tang tanggeliang@kylinos.cn
The KF_TRUSTED_ARGS flag is used for bpf_iter_mptcp_subflow_new, it indicates that the all pointer arguments are valid. It's necessary to add a KF_ACQUIRE helper to get valid "msk".
This feels wrong. It forces an unnecessary acquire to get around the verifier. bpf_sockopt->sk should be in "trusted". From looking at patch 7, the issue should be the return value of bpf_skc_to_mptcp_sock().
This patch adds bpf_mptcp_sock_acquire() and bpf_mptcp_sock_release() helpers for this. Increase sk->sk_refcnt in _acquire() and decrease it in _release(). Register them with KF_ACQUIRE flag and KF_RELEASE flag.
Signed-off-by: Geliang Tang tanggeliang@kylinos.cn Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org
net/mptcp/bpf.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+)
diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c index e39f0e4fb683c1aa31ee075281daee218dac5878..d50bd1ea7f6d0ff1abff32deef9a98b98ee8f42c 100644 --- a/net/mptcp/bpf.c +++ b/net/mptcp/bpf.c @@ -97,6 +97,23 @@ bpf_iter_mptcp_subflow_destroy(struct bpf_iter_mptcp_subflow *it) { } +__bpf_kfunc static struct +mptcp_sock *bpf_mptcp_sock_acquire(struct mptcp_sock *msk) +{
- struct sock *sk = (struct sock *)msk;
- if (sk && refcount_inc_not_zero(&sk->sk_refcnt))
return msk;
- return NULL;
+}
+__bpf_kfunc static void bpf_mptcp_sock_release(struct mptcp_sock *msk) +{
- struct sock *sk = (struct sock *)msk;
- WARN_ON_ONCE(!sk || !refcount_dec_not_one(&sk->sk_refcnt));
+}
- __bpf_kfunc_end_defs();
BTF_KFUNCS_START(bpf_mptcp_common_kfunc_ids) @@ -104,6 +121,8 @@ BTF_ID_FLAGS(func, bpf_mptcp_subflow_ctx, KF_RET_NULL) BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new, KF_ITER_NEW | KF_TRUSTED_ARGS) BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_next, KF_ITER_NEXT | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_destroy, KF_ITER_DESTROY) +BTF_ID_FLAGS(func, bpf_mptcp_sock_acquire, KF_ACQUIRE | KF_RET_NULL)
It should need a KF_TRUSTED_ARGS here but then it will hit the same problem described in the commit message.
Instead of changing the verifier to get this work, one option is to use the "struct sock *sk" instead of "struct mptcp-sock *msk" as the argument in the bpf_iter_mptcp_subflow_new, and do the bpf_mptcp_sock_from_sock check in the bpf_iter_mptcp_subflow_new.
+BTF_ID_FLAGS(func, bpf_mptcp_sock_release, KF_RELEASE) BTF_KFUNCS_END(bpf_mptcp_common_kfunc_ids) static const struct btf_kfunc_id_set bpf_mptcp_common_kfunc_set = {
From: Geliang Tang tanggeliang@kylinos.cn
This patch changes ADDR_2 from "10.0.1.2" to "10.0.2.1", and adds two more IPv4 test addresses ADDR_3 - ADDR_4, four IPv6 addresses ADDR6_1 - ADDR6_4. Add a new helper address_init() to initialize all these addresses.
Add a new parameter "endpoints" for endpoint_init() to control how many endpoints are used for the tests. This makes it more flexible. Update the parameters of endpoint_init() in test_subflow().
Signed-off-by: Geliang Tang tanggeliang@kylinos.cn Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org --- tools/testing/selftests/bpf/prog_tests/mptcp.c | 56 +++++++++++++++++++++++--- 1 file changed, 50 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c index f8eb7f9d4fd20bbb7ee018728f7ae0f0a09d4d30..85f3d4119802a85c86cde7b74a0b857252bad8b8 100644 --- a/tools/testing/selftests/bpf/prog_tests/mptcp.c +++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c @@ -14,7 +14,13 @@
#define NS_TEST "mptcp_ns" #define ADDR_1 "10.0.1.1" -#define ADDR_2 "10.0.1.2" +#define ADDR_2 "10.0.2.1" +#define ADDR_3 "10.0.3.1" +#define ADDR_4 "10.0.4.1" +#define ADDR6_1 "dead:beef:1::1" +#define ADDR6_2 "dead:beef:2::1" +#define ADDR6_3 "dead:beef:3::1" +#define ADDR6_4 "dead:beef:4::1" #define PORT_1 10001
#ifndef IPPROTO_MPTCP @@ -322,22 +328,60 @@ static void test_mptcpify(void) close(cgroup_fd); }
-static int endpoint_init(char *flags) +static int address_init(void) { SYS(fail, "ip -net %s link add veth1 type veth peer name veth2", NS_TEST); SYS(fail, "ip -net %s addr add %s/24 dev veth1", NS_TEST, ADDR_1); + SYS(fail, "ip -net %s addr add %s/64 dev veth1 nodad", NS_TEST, ADDR6_1); SYS(fail, "ip -net %s link set dev veth1 up", NS_TEST); SYS(fail, "ip -net %s addr add %s/24 dev veth2", NS_TEST, ADDR_2); + SYS(fail, "ip -net %s addr add %s/64 dev veth2 nodad", NS_TEST, ADDR6_2); SYS(fail, "ip -net %s link set dev veth2 up", NS_TEST); - if (SYS_NOFAIL("ip -net %s mptcp endpoint add %s %s", NS_TEST, ADDR_2, flags)) { + + SYS(fail, "ip -net %s link add veth3 type veth peer name veth4", NS_TEST); + SYS(fail, "ip -net %s addr add %s/24 dev veth3", NS_TEST, ADDR_3); + SYS(fail, "ip -net %s addr add %s/64 dev veth3 nodad", NS_TEST, ADDR6_3); + SYS(fail, "ip -net %s link set dev veth3 up", NS_TEST); + SYS(fail, "ip -net %s addr add %s/24 dev veth4", NS_TEST, ADDR_4); + SYS(fail, "ip -net %s addr add %s/64 dev veth4 nodad", NS_TEST, ADDR6_4); + SYS(fail, "ip -net %s link set dev veth4 up", NS_TEST); + + return 0; +fail: + return -1; +} + +static int endpoint_add(char *addr, char *flags) +{ + return SYS_NOFAIL("ip -net %s mptcp endpoint add %s %s", NS_TEST, addr, flags); +} + +static int endpoint_init(char *flags, u8 endpoints) +{ + int ret = -1; + + if (!endpoints || endpoints > 4) + goto fail; + + if (address_init()) + goto fail; + + if (SYS_NOFAIL("ip -net %s mptcp limits set add_addr_accepted 4 subflows 4", + NS_TEST)) { printf("'ip mptcp' not supported, skip this test.\n"); test__skip(); goto fail; }
- return 0; + if (endpoints > 1) + ret = endpoint_add(ADDR_2, flags); + if (endpoints > 2) + ret = ret ?: endpoint_add(ADDR_3, flags); + if (endpoints > 3) + ret = ret ?: endpoint_add(ADDR_4, flags); + fail: - return -1; + return ret; }
static void wait_for_new_subflows(int fd) @@ -423,7 +467,7 @@ static void test_subflow(void) if (!ASSERT_OK_PTR(netns, "netns_new: mptcp_subflow")) goto skel_destroy;
- if (endpoint_init("subflow") < 0) + if (endpoint_init("subflow", 2) < 0) goto close_netns;
run_subflow();
From: Geliang Tang tanggeliang@kylinos.cn
This patch adds a "cgroup/getsockopt" program "iters_subflow" to test the newly added mptcp_subflow bpf_iter.
Export mptcp_subflow helpers bpf_iter_mptcp_subflow_new/_next/_destroy, bpf_mptcp_sock_acquire/_release and other helpers into bpf_experimental.h.
Use bpf_mptcp_sock_acquire() to acquire the msk, then use bpf_for_each() to walk the subflow list of this msk. From there, future MPTCP-specific kfunc can be called in the loop. Because they are not there yet, this test doesn't do anything very useful for the moment, but it focuses on validating the 'bpf_iter' part and the basic MPTCP kfunc. That's why it simply adds all subflow ids to local variable local_ids to make sure all subflows have been seen, then invoke mptcp_subflow_tcp_sock() in the loop to pick the subflow context.
Out of the loop, use bpf_mptcp_subflow_ctx() to get the subflow context of the picked subflow context and do some verifications. Finally, assign local_ids to global variable ids so that the application can obtain this value, and release the msk.
A related subtest called test_iters_subflow is added to load and verify the newly added mptcp_subflow type bpf_iter example in test_mptcp. The endpoint_init() helper is used to add 3 new subflow endpoints. Then one byte of message is sent to trigger the creation of new subflows. getsockopt() is invoked once the subflows have been created to trigger the "cgroup/getsockopt" test program "iters_subflow". skel->bss->ids is then checked to make sure it equals 10, the sum of each subflow ID: we should have 4 subflows: 1 + 2 + 3 + 4 = 10. If that's the case, the bpf_iter loop did the job as expected.
Signed-off-by: Geliang Tang tanggeliang@kylinos.cn Reviewed-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org --- Notes: - v2: - explicit sk protocol checks are no longer needed, implicitly done in bpf_skc_to_mptcp_sock(). - use bpf_skc_to_mptcp_sock() instead of bpf_mptcp_sk(), and mptcp_subflow_tcp_sock() instead of bpf_mptcp_subflow_tcp_sock(). - bpf_mptcp_subflow_ctx() can now return NULL. --- tools/testing/selftests/bpf/bpf_experimental.h | 8 +++ tools/testing/selftests/bpf/prog_tests/mptcp.c | 73 ++++++++++++++++++++++ tools/testing/selftests/bpf/progs/mptcp_bpf.h | 9 +++ .../testing/selftests/bpf/progs/mptcp_bpf_iters.c | 63 +++++++++++++++++++ 4 files changed, 153 insertions(+)
diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h index cd8ecd39c3f3c68d40c6e3e1465b42ed66537027..2ab3f0063c0fd6091ee19da4787671b89b5661f0 100644 --- a/tools/testing/selftests/bpf/bpf_experimental.h +++ b/tools/testing/selftests/bpf/bpf_experimental.h @@ -575,6 +575,14 @@ extern int bpf_iter_css_new(struct bpf_iter_css *it, extern struct cgroup_subsys_state *bpf_iter_css_next(struct bpf_iter_css *it) __weak __ksym; extern void bpf_iter_css_destroy(struct bpf_iter_css *it) __weak __ksym;
+struct bpf_iter_mptcp_subflow; +extern int bpf_iter_mptcp_subflow_new(struct bpf_iter_mptcp_subflow *it, + struct mptcp_sock *msk) __weak __ksym; +extern struct mptcp_subflow_context * +bpf_iter_mptcp_subflow_next(struct bpf_iter_mptcp_subflow *it) __weak __ksym; +extern void +bpf_iter_mptcp_subflow_destroy(struct bpf_iter_mptcp_subflow *it) __weak __ksym; + extern int bpf_wq_init(struct bpf_wq *wq, void *p__map, unsigned int flags) __weak __ksym; extern int bpf_wq_start(struct bpf_wq *wq, unsigned int flags) __weak __ksym; extern int bpf_wq_set_callback_impl(struct bpf_wq *wq, diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c index 85f3d4119802a85c86cde7b74a0b857252bad8b8..f37574b5ef68d8f32f8002df317869dfdf1d4b2d 100644 --- a/tools/testing/selftests/bpf/prog_tests/mptcp.c +++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c @@ -11,6 +11,7 @@ #include "mptcp_sock.skel.h" #include "mptcpify.skel.h" #include "mptcp_subflow.skel.h" +#include "mptcp_bpf_iters.skel.h"
#define NS_TEST "mptcp_ns" #define ADDR_1 "10.0.1.1" @@ -33,6 +34,9 @@ #ifndef MPTCP_INFO #define MPTCP_INFO 1 #endif +#ifndef TCP_IS_MPTCP +#define TCP_IS_MPTCP 43 /* Is MPTCP being used? */ +#endif #ifndef MPTCP_INFO_FLAG_FALLBACK #define MPTCP_INFO_FLAG_FALLBACK _BITUL(0) #endif @@ -480,6 +484,73 @@ static void test_subflow(void) close(cgroup_fd); }
+static void run_iters_subflow(void) +{ + int server_fd, client_fd; + int is_mptcp, err; + socklen_t len; + + server_fd = start_mptcp_server(AF_INET, ADDR_1, PORT_1, 0); + if (!ASSERT_OK_FD(server_fd, "start_mptcp_server")) + return; + + client_fd = connect_to_fd(server_fd, 0); + if (!ASSERT_OK_FD(client_fd, "connect_to_fd")) + goto close_server; + + send_byte(client_fd); + wait_for_new_subflows(client_fd); + + len = sizeof(is_mptcp); + /* mainly to trigger the BPF program */ + err = getsockopt(client_fd, SOL_TCP, TCP_IS_MPTCP, &is_mptcp, &len); + if (ASSERT_OK(err, "getsockopt(client_fd, TCP_IS_MPTCP)")) + ASSERT_EQ(is_mptcp, 1, "is_mptcp"); + + close(client_fd); +close_server: + close(server_fd); +} + +static void test_iters_subflow(void) +{ + struct mptcp_bpf_iters *skel; + struct netns_obj *netns; + int cgroup_fd; + + cgroup_fd = test__join_cgroup("/iters_subflow"); + if (!ASSERT_OK_FD(cgroup_fd, "join_cgroup: iters_subflow")) + return; + + skel = mptcp_bpf_iters__open_and_load(); + if (!ASSERT_OK_PTR(skel, "skel_open_load: iters_subflow")) + goto close_cgroup; + + skel->links.iters_subflow = bpf_program__attach_cgroup(skel->progs.iters_subflow, + cgroup_fd); + if (!ASSERT_OK_PTR(skel->links.iters_subflow, "attach getsockopt")) + goto skel_destroy; + + netns = netns_new(NS_TEST, true); + if (!ASSERT_OK_PTR(netns, "netns_new: iters_subflow")) + goto skel_destroy; + + if (endpoint_init("subflow", 4) < 0) + goto close_netns; + + run_iters_subflow(); + + /* 1 + 2 + 3 + 4 = 10 */ + ASSERT_EQ(skel->bss->ids, 10, "subflow ids"); + +close_netns: + netns_free(netns); +skel_destroy: + mptcp_bpf_iters__destroy(skel); +close_cgroup: + close(cgroup_fd); +} + void test_mptcp(void) { if (test__start_subtest("base")) @@ -488,4 +559,6 @@ void test_mptcp(void) test_mptcpify(); if (test__start_subtest("subflow")) test_subflow(); + if (test__start_subtest("iters_subflow")) + test_iters_subflow(); } diff --git a/tools/testing/selftests/bpf/progs/mptcp_bpf.h b/tools/testing/selftests/bpf/progs/mptcp_bpf.h index 3b188ccdcc4041acb4f7ed38ae8ddf5a7305466a..31d7ecdc4c0489c63d28a25778f91093a6a093a5 100644 --- a/tools/testing/selftests/bpf/progs/mptcp_bpf.h +++ b/tools/testing/selftests/bpf/progs/mptcp_bpf.h @@ -39,4 +39,13 @@ mptcp_subflow_tcp_sock(const struct mptcp_subflow_context *subflow) return subflow->tcp_sock; }
+/* ksym */ +extern struct mptcp_sock *bpf_mptcp_sock_acquire(struct mptcp_sock *msk) __ksym; +extern void bpf_mptcp_sock_release(struct mptcp_sock *msk) __ksym; + +extern struct mptcp_subflow_context * +bpf_mptcp_subflow_ctx(const struct sock *sk) __ksym; +extern struct sock * +bpf_mptcp_subflow_tcp_sock(const struct mptcp_subflow_context *subflow) __ksym; + #endif diff --git a/tools/testing/selftests/bpf/progs/mptcp_bpf_iters.c b/tools/testing/selftests/bpf/progs/mptcp_bpf_iters.c new file mode 100644 index 0000000000000000000000000000000000000000..fd5691a4073b22c24d3fa6f7ed6edb32d366492c --- /dev/null +++ b/tools/testing/selftests/bpf/progs/mptcp_bpf_iters.c @@ -0,0 +1,63 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2024, Kylin Software */ + +/* vmlinux.h, bpf_helpers.h and other 'define' */ +#include "bpf_tracing_net.h" +#include "mptcp_bpf.h" + +char _license[] SEC("license") = "GPL"; +int ids; + +#ifndef TCP_IS_MPTCP +#define TCP_IS_MPTCP 43 /* Is MPTCP being used? */ +#endif + +SEC("cgroup/getsockopt") +int iters_subflow(struct bpf_sockopt *ctx) +{ + struct mptcp_subflow_context *subflow; + struct bpf_sock *sk = ctx->sk; + struct sock *ssk = NULL; + struct mptcp_sock *msk; + int local_ids = 0; + + if (ctx->level != SOL_TCP || ctx->optname != TCP_IS_MPTCP) + return 1; + + msk = bpf_skc_to_mptcp_sock(sk); + if (!msk || msk->pm.server_side || !msk->pm.subflows) + return 1; + + msk = bpf_mptcp_sock_acquire(msk); + if (!msk) + return 1; + bpf_for_each(mptcp_subflow, subflow, msk) { + /* Here MPTCP-specific packet scheduler kfunc can be called: + * this test is not doing anything really useful, only to + * verify the iteration works. + */ + + local_ids += subflow->subflow_id; + + /* only to check the following kfunc works */ + ssk = mptcp_subflow_tcp_sock(subflow); + } + + if (!ssk) + goto out; + + /* assert: if not OK, something wrong on the kernel side */ + if (ssk->sk_dport != ((struct sock *)msk)->sk_dport) + goto out; + + /* only to check the following kfunc works */ + subflow = bpf_mptcp_subflow_ctx(ssk); + if (!subflow || subflow->token != msk->token) + goto out; + + ids = local_ids; + +out: + bpf_mptcp_sock_release(msk); + return 1; +}
On 12/19/24 7:46 AM, Matthieu Baerts (NGI0) wrote:
+SEC("cgroup/getsockopt") +int iters_subflow(struct bpf_sockopt *ctx) +{
- struct mptcp_subflow_context *subflow;
- struct bpf_sock *sk = ctx->sk;
- struct sock *ssk = NULL;
- struct mptcp_sock *msk;
- int local_ids = 0;
- if (ctx->level != SOL_TCP || ctx->optname != TCP_IS_MPTCP)
return 1;
- msk = bpf_skc_to_mptcp_sock(sk);
- if (!msk || msk->pm.server_side || !msk->pm.subflows)
return 1;
- msk = bpf_mptcp_sock_acquire(msk);
- if (!msk)
return 1;
- bpf_for_each(mptcp_subflow, subflow, msk) {
/* Here MPTCP-specific packet scheduler kfunc can be called:
* this test is not doing anything really useful, only to
* verify the iteration works.
*/
local_ids += subflow->subflow_id;
/* only to check the following kfunc works */
ssk = mptcp_subflow_tcp_sock(subflow);
It is good to have test cases to exercise the new iter and kfunc. Thanks.
However, it seems not useful to show how it will be used in the future packet/subflow scheduler. iiuc, the core piece is in bpf_struct_ops. Without it, it is hard to comment. Any RFC patches ready to be posted?
- }
- if (!ssk)
goto out;
- /* assert: if not OK, something wrong on the kernel side */
- if (ssk->sk_dport != ((struct sock *)msk)->sk_dport)
goto out;
- /* only to check the following kfunc works */
- subflow = bpf_mptcp_subflow_ctx(ssk);
- if (!subflow || subflow->token != msk->token)
goto out;
- ids = local_ids;
+out:
- bpf_mptcp_sock_release(msk);
- return 1;
+}
Hello BPF maintainers and reviewers,
On 19/12/2024 16:46, Matthieu Baerts (NGI0) wrote:
Here is a series from Geliang, adding mptcp_subflow bpf_iter support.
We are working on extending MPTCP with BPF, e.g. to control the path manager -- in charge of the creation, deletion, and announcements of subflows (paths) -- and the packet scheduler -- in charge of selecting which available path the next data will be sent to. These extensions need to iterate over the list of subflows attached to an MPTCP connection, and do some specific actions via some new kfunc that will be added later on.
(...)
Changes in v2:
- Patches 1-2: new ones.
- Patch 3: remove two kfunc, more restrictions. (Martin)
- Patch 4: add BUILD_BUG_ON(), more restrictions. (Martin)
- Patch 7: adaptations due to modifications in patches 1-4.
- Link to v1: https://lore.kernel.org/r/20241108-bpf-next-net-mptcp-bpf_iter-subflows-v1-0...
The v2 of this series didn't get any reviews, probably because it has been sent the week before Xmas. Do you prefer if I resend it?
There is no hurry, I can also re-send it later if "now" is not a good time.
Cheers, Matt
On 1/15/25 1:39 AM, Matthieu Baerts wrote:
Here is a series from Geliang, adding mptcp_subflow bpf_iter support.
We are working on extending MPTCP with BPF, e.g. to control the path manager -- in charge of the creation, deletion, and announcements of subflows (paths) -- and the packet scheduler -- in charge of selecting which available path the next data will be sent to. These extensions need to iterate over the list of subflows attached to an MPTCP connection, and do some specific actions via some new kfunc that will be added later on.
(...)
Changes in v2:
- Patches 1-2: new ones.
- Patch 3: remove two kfunc, more restrictions. (Martin)
- Patch 4: add BUILD_BUG_ON(), more restrictions. (Martin)
- Patch 7: adaptations due to modifications in patches 1-4.
- Link to v1: https://lore.kernel.org/r/20241108-bpf-next-net-mptcp-bpf_iter-subflows-v1-0...
The v2 of this series didn't get any reviews, probably because it has been sent the week before Xmas. Do you prefer if I resend it?
No need to resend. will get to it.
linux-kselftest-mirror@lists.linaro.org