On 12/19/24 7:46 AM, Matthieu Baerts (NGI0) wrote:
From: Geliang Tang tanggeliang@kylinos.cn
It's necessary to traverse all subflows on the conn_list of an MPTCP socket and then call kfunc to modify the fields of each subflow. In kernel space, mptcp_for_each_subflow() helper is used for this:
mptcp_for_each_subflow(msk, subflow) kfunc(subflow);
But in the MPTCP BPF program, this has not yet been implemented. As Martin suggested recently, this conn_list walking + modify-by-kfunc usage fits the bpf_iter use case.
So this patch adds a new bpf_iter type named "mptcp_subflow" to do this and implements its helpers bpf_iter_mptcp_subflow_new()/_next()/ _destroy(). And register these bpf_iter mptcp_subflow into mptcp common kfunc set. Then bpf_for_each() for mptcp_subflow can be used in BPF program like this:
bpf_for_each(mptcp_subflow, subflow, msk) kfunc(subflow);
Suggested-by: Martin KaFai Lau martin.lau@kernel.org Signed-off-by: Geliang Tang tanggeliang@kylinos.cn Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org
Notes:
- v2:
- Add BUILD_BUG_ON() checks, similar to the ones done with other bpf_iter_(...) helpers.
- Replace msk_owned_by_me() by sock_owned_by_user_nocheck() and !spin_is_locked() (Martin).
A few versions of this single patch have been previously posted to the BPF mailing list by Geliang, before continuing to the MPTCP mailing list only, with other patches of this series. The version of the whole series has been reset to 1, but here is the ChangeLog for the previous ones:
- v2: remove msk->pm.lock in _new() and _destroy() (Martin) drop DEFINE_BPF_ITER_FUNC, change opaque[3] to opaque[2] (Andrii)
- v3: drop bpf_iter__mptcp_subflow
- v4: if msk is NULL, initialize kit->msk to NULL in _new() and check it in _next() (Andrii)
- v5: use list_is_last() instead of list_entry_is_head() add KF_ITER_NEW/NEXT/DESTROY flags add msk_owned_by_me in _new()
- v6: add KF_TRUSTED_ARGS flag (Andrii, Martin)
net/mptcp/bpf.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 53 insertions(+)
diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c index c5bfd84c16c43230d9d8e1fd8ff781a767e647b5..e39f0e4fb683c1aa31ee075281daee218dac5878 100644 --- a/net/mptcp/bpf.c +++ b/net/mptcp/bpf.c @@ -35,6 +35,15 @@ static const struct btf_kfunc_id_set bpf_mptcp_fmodret_set = { .set = &bpf_mptcp_fmodret_ids, }; +struct bpf_iter_mptcp_subflow {
- __u64 __opaque[2];
+} __aligned(8);
+struct bpf_iter_mptcp_subflow_kern {
- struct mptcp_sock *msk;
- struct list_head *pos;
+} __aligned(8);
- __bpf_kfunc_start_defs();
__bpf_kfunc static struct mptcp_subflow_context * @@ -47,10 +56,54 @@ bpf_mptcp_subflow_ctx(const struct sock *sk) return NULL; } +__bpf_kfunc static int +bpf_iter_mptcp_subflow_new(struct bpf_iter_mptcp_subflow *it,
struct mptcp_sock *msk)
+{
- struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
- struct sock *sk = (struct sock *)msk;
- BUILD_BUG_ON(sizeof(struct bpf_iter_mptcp_subflow_kern) >
sizeof(struct bpf_iter_mptcp_subflow));
- BUILD_BUG_ON(__alignof__(struct bpf_iter_mptcp_subflow_kern) !=
__alignof__(struct bpf_iter_mptcp_subflow));
- kit->msk = msk;
- if (!msk)
NULL check is not needed. verifier should have rejected it for KF_TRUSTED_ARGS.
return -EINVAL;
- if (!sock_owned_by_user_nocheck(sk) &&
!spin_is_locked(&sk->sk_lock.slock))
I could have missed something. If it is to catch bug, should it be sock_owned_by_me() that has the lockdep splat? For the cg get/setsockopt hook here, the lock should have already been held earlier in the kernel.
This set is only showing the cg sockopt bpf prog but missing the major struct_ops piece. It is hard to comment. I assumed the lock situation is the same for the struct_ops where the lock will be held before calling the struct_ops prog?
return -EINVAL;
- kit->pos = &msk->conn_list;
- return 0;
+}
+__bpf_kfunc static struct mptcp_subflow_context * +bpf_iter_mptcp_subflow_next(struct bpf_iter_mptcp_subflow *it) +{
- struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
- if (!kit->msk || list_is_last(kit->pos, &kit->msk->conn_list))
return NULL;
- kit->pos = kit->pos->next;
- return list_entry(kit->pos, struct mptcp_subflow_context, node);
+}
+__bpf_kfunc static void +bpf_iter_mptcp_subflow_destroy(struct bpf_iter_mptcp_subflow *it) +{ +}
- __bpf_kfunc_end_defs();
BTF_KFUNCS_START(bpf_mptcp_common_kfunc_ids) BTF_ID_FLAGS(func, bpf_mptcp_subflow_ctx, KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new, KF_ITER_NEW | KF_TRUSTED_ARGS) +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_next, KF_ITER_NEXT | KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_destroy, KF_ITER_DESTROY) BTF_KFUNCS_END(bpf_mptcp_common_kfunc_ids) static const struct btf_kfunc_id_set bpf_mptcp_common_kfunc_set = {