In this series from Geliang, modifying MPTCP BPF selftests, we have:
- A new MPTCP subflow BPF program setting socket options per subflow: it looks better to have this old test program in the BPF selftests to track regressions and to serve as example.
Note: Nicolas is no longer working at Tessares, but he did this work while working for them, and his email address is no longer available.
- A new hook in the same BPF program to do the verification step.
- A new MPTCP BPF subtest validating the new BPF program added in the first patch, with the help of the new hook added in the second patch.
Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org --- Changes in v5: - See the individual changelog for more details about them - Patch 1/3: set TCP on the 2nd subflow - Patch 2/3: new - Patch 3/3: use the BPF program from patch 2/3 to do the validation instead of using ss. - Link to v4: https://lore.kernel.org/r/20240805-upstream-bpf-next-20240506-mptcp-subflow-...
Changes in v4: - Drop former patch 2/3: MPTCP's pm_nl_ctl requires a new header file: - I will check later if it is possible to avoid having duplicated header files in tools/include/uapi, but no need to block this series for that. Patch 2/3 can be added later if needed. - Patch 2/2: skip the test if 'ip mptcp' is not available. - Link to v3: https://lore.kernel.org/r/20240703-upstream-bpf-next-20240506-mptcp-subflow-...
Changes in v3: - Sorry for the delay between v2 and v3, this series was conflicting with the "add netns helpers", but it looks like it is on hold: https://lore.kernel.org/cover.1715821541.git.tanggeliang@kylinos.cn - Patch 1/3 includes "bpf_tracing_net.h", introduced in between. - New patch 2/3: "selftests/bpf: Add mptcp pm_nl_ctl link". - Patch 3/3: use the tool introduced in patch 2/3 + SYS_NOFAIL() helper. - Link to v2: https://lore.kernel.org/r/20240509-upstream-bpf-next-20240506-mptcp-subflow-...
Changes in v2: - Previous patches 1/4 and 2/4 have been dropped from this series: - 1/4: "selftests/bpf: Handle SIGINT when creating netns": - A new version, more generic and no longer specific to MPTCP BPF selftest will be sent later, as part of a new series. (Alexei) - 2/4: "selftests/bpf: Add RUN_MPTCP_TEST macro": - Removed, not to hide helper functions in macros. (Alexei) - The commit message of patch 1/2 has been clarified to avoid some possible confusions spot by Alexei. - Link to v1: https://lore.kernel.org/r/20240507-upstream-bpf-next-20240506-mptcp-subflow-...
--- Geliang Tang (2): selftests/bpf: Add getsockopt to inspect mptcp subflow selftests/bpf: Add mptcp subflow subtest
Nicolas Rybowski (1): selftests/bpf: Add mptcp subflow example
MAINTAINERS | 2 +- tools/testing/selftests/bpf/prog_tests/mptcp.c | 126 +++++++++++++++++++++ tools/testing/selftests/bpf/progs/mptcp_bpf.h | 42 +++++++ tools/testing/selftests/bpf/progs/mptcp_subflow.c | 128 ++++++++++++++++++++++ 4 files changed, 297 insertions(+), 1 deletion(-) --- base-commit: 6b083650a37318112fb60c65fbb6070584f53d93 change-id: 20240506-upstream-bpf-next-20240506-mptcp-subflow-test-faef6654bfa3
Best regards,
From: Nicolas Rybowski nicolas.rybowski@tessares.net
Move Nicolas' patch into bpf selftests directory. This example adds a different mark (SO_MARK) on each subflow, and changes the TCP CC only on the first subflow.
From the userspace, an application can do a setsockopt() on an MPTCP socket, and typically the same value will be propagated to all subflows (paths). If someone wants to have different values per subflow, the recommended way is to use BPF. So it is good to add such example here, and make sure there is no regressions.
This example shows how it is possible to:
Identify the parent msk of an MPTCP subflow. Put different sockopt for each subflow of a same MPTCP connection.
Here especially, two different behaviours are implemented:
A socket mark (SOL_SOCKET SO_MARK) is put on each subflow of a same MPTCP connection. The order of creation of the current subflow defines its mark. The TCP CC algorithm of the very first subflow of an MPTCP connection is set to "reno".
This is just to show it is possible to identify an MPTCP connection, and set socket options, from different SOL levels, per subflow. "reno" has been picked because it is built-in and usually not set as default one. It is easy to verify with 'ss' that these modifications have been applied correctly. That's what the next patch is going to do.
Nicolas' code comes from:
commit 4d120186e4d6 ("bpf:examples: update mptcp_set_mark_kern.c")
from the MPTCP repo https://github.com/multipath-tcp/mptcp_net-next (the "scripts" branch), and it has been adapted by Geliang.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/76 Co-developed-by: Geliang Tang tanggeliang@kylinos.cn Signed-off-by: Geliang Tang tanggeliang@kylinos.cn Signed-off-by: Nicolas Rybowski nicolas.rybowski@tessares.net Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org --- Notes: - v1 -> v2: - The commit message has been updated: why setting multiple socket options, why reno, the verification is done in a later patch (different author). (Alexei) - v2 -> v3: - Only #include "bpf_tracing_net.h", linked to: https://lore.kernel.org/20240509175026.3423614-1-martin.lau@linux.dev - v4 -> v5: - Set reno as TCP cc on the second subflow, not to influence the getsockopt() done from the userspace, which will return the one from the first subflow, the default TCP cc then, not the modified one. --- tools/testing/selftests/bpf/progs/mptcp_subflow.c | 59 +++++++++++++++++++++++ 1 file changed, 59 insertions(+)
diff --git a/tools/testing/selftests/bpf/progs/mptcp_subflow.c b/tools/testing/selftests/bpf/progs/mptcp_subflow.c new file mode 100644 index 000000000000..2e28f4a215b5 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/mptcp_subflow.c @@ -0,0 +1,59 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2020, Tessares SA. */ +/* Copyright (c) 2024, Kylin Software */ + +/* vmlinux.h, bpf_helpers.h and other 'define' */ +#include "bpf_tracing_net.h" + +char _license[] SEC("license") = "GPL"; + +char cc[TCP_CA_NAME_MAX] = "reno"; + +/* Associate a subflow counter to each token */ +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(key_size, sizeof(__u32)); + __uint(value_size, sizeof(__u32)); + __uint(max_entries, 100); +} mptcp_sf SEC(".maps"); + +SEC("sockops") +int mptcp_subflow(struct bpf_sock_ops *skops) +{ + __u32 init = 1, key, mark, *cnt; + struct mptcp_sock *msk; + struct bpf_sock *sk; + int err; + + if (skops->op != BPF_SOCK_OPS_TCP_CONNECT_CB) + return 1; + + sk = skops->sk; + if (!sk) + return 1; + + msk = bpf_skc_to_mptcp_sock(sk); + if (!msk) + return 1; + + key = msk->token; + cnt = bpf_map_lookup_elem(&mptcp_sf, &key); + if (cnt) { + /* A new subflow is added to an existing MPTCP connection */ + __sync_fetch_and_add(cnt, 1); + mark = *cnt; + } else { + /* A new MPTCP connection is just initiated and this is its primary subflow */ + bpf_map_update_elem(&mptcp_sf, &key, &init, BPF_ANY); + mark = init; + } + + /* Set the mark of the subflow's socket based on appearance order */ + err = bpf_setsockopt(skops, SOL_SOCKET, SO_MARK, &mark, sizeof(mark)); + if (err < 0) + return 1; + if (mark == 2) + err = bpf_setsockopt(skops, SOL_TCP, TCP_CONGESTION, cc, TCP_CA_NAME_MAX); + + return 1; +}
From: Geliang Tang tanggeliang@kylinos.cn
This patch adds a "cgroup/getsockopt" way to inspect the subflows of an MPTCP socket, and verify the modifications done by the same BPF program in the previous commit: a different mark per subflow, and a different TCP CC set on the second one. This new hook will be used by the next commit to verify the socket options set on each subflow.
This extra "cgroup/getsockopt" prog walks the msk->conn_list and use bpf_core_cast to cast a pointer for readonly. It allows to inspect all the fields of a structure.
Note that on the kernel side, the MPTCP socket stores a list of subflows under 'msk->conn_list'. They can be iterated using the generic 'list' helpers. They have been imported here, with a small difference: list_for_each_entry() uses 'cond_break' to limit the number of iterations, and ease its use. Because only data need to be read here, it is enough to use this technique. It is planned to use bpf_iter, when BPF programs will be used to modify data from the different subflows. mptcp_subflow_tcp_sock() and mptcp_for_each_stubflow() helpers have also be imported.
Suggested-by: Martin KaFai Lau martin.lau@kernel.org Signed-off-by: Geliang Tang tanggeliang@kylinos.cn Reviewed-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org --- Notes: - v5: new patch, instead of using 'ss' in the following patch --- MAINTAINERS | 2 +- tools/testing/selftests/bpf/progs/mptcp_bpf.h | 42 ++++++++++++++ tools/testing/selftests/bpf/progs/mptcp_subflow.c | 69 +++++++++++++++++++++++ 3 files changed, 112 insertions(+), 1 deletion(-)
diff --git a/MAINTAINERS b/MAINTAINERS index 30a9b9450e11..2674b86209d1 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -16058,7 +16058,7 @@ F: include/net/mptcp.h F: include/trace/events/mptcp.h F: include/uapi/linux/mptcp*.h F: net/mptcp/ -F: tools/testing/selftests/bpf/*/*mptcp*.c +F: tools/testing/selftests/bpf/*/*mptcp*.[ch] F: tools/testing/selftests/net/mptcp/
NETWORKING [TCP] diff --git a/tools/testing/selftests/bpf/progs/mptcp_bpf.h b/tools/testing/selftests/bpf/progs/mptcp_bpf.h new file mode 100644 index 000000000000..179b74c1205f --- /dev/null +++ b/tools/testing/selftests/bpf/progs/mptcp_bpf.h @@ -0,0 +1,42 @@ +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */ +#ifndef __MPTCP_BPF_H__ +#define __MPTCP_BPF_H__ + +#include "bpf_experimental.h" + +/* list helpers from include/linux/list.h */ +static inline int list_is_head(const struct list_head *list, + const struct list_head *head) +{ + return list == head; +} + +#define list_entry(ptr, type, member) \ + container_of(ptr, type, member) + +#define list_first_entry(ptr, type, member) \ + list_entry((ptr)->next, type, member) + +#define list_next_entry(pos, member) \ + list_entry((pos)->member.next, typeof(*(pos)), member) + +#define list_entry_is_head(pos, head, member) \ + list_is_head(&pos->member, (head)) + +/* small difference: 'cond_break' has been added in the conditions */ +#define list_for_each_entry(pos, head, member) \ + for (pos = list_first_entry(head, typeof(*pos), member); \ + cond_break, !list_entry_is_head(pos, head, member); \ + pos = list_next_entry(pos, member)) + +/* mptcp helpers from protocol.h */ +#define mptcp_for_each_subflow(__msk, __subflow) \ + list_for_each_entry(__subflow, &((__msk)->conn_list), node) + +static __always_inline struct sock * +mptcp_subflow_tcp_sock(const struct mptcp_subflow_context *subflow) +{ + return subflow->tcp_sock; +} + +#endif diff --git a/tools/testing/selftests/bpf/progs/mptcp_subflow.c b/tools/testing/selftests/bpf/progs/mptcp_subflow.c index 2e28f4a215b5..70302477e326 100644 --- a/tools/testing/selftests/bpf/progs/mptcp_subflow.c +++ b/tools/testing/selftests/bpf/progs/mptcp_subflow.c @@ -4,10 +4,12 @@
/* vmlinux.h, bpf_helpers.h and other 'define' */ #include "bpf_tracing_net.h" +#include "mptcp_bpf.h"
char _license[] SEC("license") = "GPL";
char cc[TCP_CA_NAME_MAX] = "reno"; +int pid;
/* Associate a subflow counter to each token */ struct { @@ -57,3 +59,70 @@ int mptcp_subflow(struct bpf_sock_ops *skops)
return 1; } + +static int _check_getsockopt_subflow_mark(struct mptcp_sock *msk, struct bpf_sockopt *ctx) +{ + struct mptcp_subflow_context *subflow; + int i = 0; + + mptcp_for_each_subflow(msk, subflow) { + struct sock *ssk; + + ssk = mptcp_subflow_tcp_sock(bpf_core_cast(subflow, + struct mptcp_subflow_context)); + + if (ssk->sk_mark != ++i) { + ctx->retval = -2; + break; + } + } + + return 1; +} + +static int _check_getsockopt_subflow_cc(struct mptcp_sock *msk, struct bpf_sockopt *ctx) +{ + struct mptcp_subflow_context *subflow; + + mptcp_for_each_subflow(msk, subflow) { + struct inet_connection_sock *icsk; + struct sock *ssk; + + ssk = mptcp_subflow_tcp_sock(bpf_core_cast(subflow, + struct mptcp_subflow_context)); + icsk = bpf_core_cast(ssk, struct inet_connection_sock); + + if (ssk->sk_mark == 2 && + __builtin_memcmp(icsk->icsk_ca_ops->name, cc, TCP_CA_NAME_MAX)) { + ctx->retval = -2; + break; + } + } + + return 1; +} + +SEC("cgroup/getsockopt") +int _getsockopt_subflow(struct bpf_sockopt *ctx) +{ + struct bpf_sock *sk = ctx->sk; + struct mptcp_sock *msk; + + if (bpf_get_current_pid_tgid() >> 32 != pid) + return 1; + + if (!sk || sk->protocol != IPPROTO_MPTCP || + (!(ctx->level == SOL_SOCKET && ctx->optname == SO_MARK) && + !(ctx->level == SOL_TCP && ctx->optname == TCP_CONGESTION))) + return 1; + + msk = bpf_core_cast(sk, struct mptcp_sock); + if (msk->pm.subflows != 1) { + ctx->retval = -1; + return 1; + } + + if (ctx->optname == SO_MARK) + return _check_getsockopt_subflow_mark(msk, ctx); + return _check_getsockopt_subflow_cc(msk, ctx); +}
From: Geliang Tang tanggeliang@kylinos.cn
This patch adds a subtest named test_subflow in test_mptcp to load and verify the newly added MPTCP subflow BPF program. To goal is to make sure it is possible to set different socket options per subflows, while the userspace socket interface only lets the application to set the same socket options for the whole MPTCP connection and its multiple subflows.
To check that, a client and a server are started in a dedicated netns, with veth interfaces to simulate multiple paths. They will exchange data to allow the creation of an additional subflow.
When the different subflows are being created, the new MPTCP subflow BPF program will set some socket options: marks and TCP CC. The validation is done by the same program, when the userspace checks the value of the modified socket options. On the userspace side, it will see that the default values are still being used on the MPTCP connection, while the BPF program will see different options set per subflow of the same MPTCP connection.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/76 Signed-off-by: Geliang Tang tanggeliang@kylinos.cn Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org --- Notes: - v2 -> v3: - Use './mptcp_pm_nl_ctl' instead of 'ip mptcp', not supported by the BPF CI running IPRoute 5.5.0. - Use SYS_NOFAIL() in _ss_search() instead of calling system() - v3 -> v4: - Drop './mptcp_pm_nl_ctl', but skip this new test if 'ip mptcp' is not supported. - v4 -> v5: - Note that this new test is no longer skipped on the BPF CI, because 'ip mptcp' is now supported after the switch from Ubuntu 20.04 to 22.04. - Update the commit message, reflecting the latest version. - The validations are no longer done using 'ss', but using the new BPF program added in the previous patch, to reduce the use of external dependences. --- tools/testing/selftests/bpf/prog_tests/mptcp.c | 126 +++++++++++++++++++++++++ 1 file changed, 126 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c index d2ca32fa3b21..c30f032edaca 100644 --- a/tools/testing/selftests/bpf/prog_tests/mptcp.c +++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c @@ -9,8 +9,12 @@ #include "network_helpers.h" #include "mptcp_sock.skel.h" #include "mptcpify.skel.h" +#include "mptcp_subflow.skel.h"
#define NS_TEST "mptcp_ns" +#define ADDR_1 "10.0.1.1" +#define ADDR_2 "10.0.1.2" +#define PORT_1 10001
#ifndef IPPROTO_MPTCP #define IPPROTO_MPTCP 262 @@ -335,10 +339,132 @@ static void test_mptcpify(void) close(cgroup_fd); }
+static int endpoint_init(char *flags) +{ + SYS(fail, "ip -net %s link add veth1 type veth peer name veth2", NS_TEST); + SYS(fail, "ip -net %s addr add %s/24 dev veth1", NS_TEST, ADDR_1); + SYS(fail, "ip -net %s link set dev veth1 up", NS_TEST); + SYS(fail, "ip -net %s addr add %s/24 dev veth2", NS_TEST, ADDR_2); + SYS(fail, "ip -net %s link set dev veth2 up", NS_TEST); + if (SYS_NOFAIL("ip -net %s mptcp endpoint add %s %s", NS_TEST, ADDR_2, flags)) { + printf("'ip mptcp' not supported, skip this test.\n"); + test__skip(); + goto fail; + } + + return 0; +fail: + return -1; +} + +static void wait_for_new_subflows(int fd) +{ + socklen_t len; + u8 subflows; + int err, i; + + len = sizeof(subflows); + /* Wait max 1 sec for new subflows to be created */ + for (i = 0; i < 10; i++) { + err = getsockopt(fd, SOL_MPTCP, MPTCP_INFO, &subflows, &len); + if (!err && subflows > 0) + break; + + sleep(0.1); + } +} + +static void run_subflow(void) +{ + int server_fd, client_fd, err; + char new[TCP_CA_NAME_MAX]; + char cc[TCP_CA_NAME_MAX]; + unsigned int mark; + socklen_t len; + + server_fd = start_mptcp_server(AF_INET, ADDR_1, PORT_1, 0); + if (!ASSERT_OK_FD(server_fd, "start_mptcp_server")) + return; + + client_fd = connect_to_fd(server_fd, 0); + if (!ASSERT_OK_FD(client_fd, "connect_to_fd")) + goto close_server; + + send_byte(client_fd); + wait_for_new_subflows(client_fd); + + len = sizeof(mark); + err = getsockopt(client_fd, SOL_SOCKET, SO_MARK, &mark, &len); + if (ASSERT_OK(err, "getsockopt(client_fd, SO_MARK)")) + ASSERT_EQ(mark, 0, "mark"); + + len = sizeof(new); + err = getsockopt(client_fd, SOL_TCP, TCP_CONGESTION, new, &len); + if (ASSERT_OK(err, "getsockopt(client_fd, TCP_CONGESTION)")) { + get_msk_ca_name(cc); + ASSERT_STREQ(new, cc, "cc"); + } + + close(client_fd); +close_server: + close(server_fd); +} + +static void test_subflow(void) +{ + int cgroup_fd, prog_fd, err; + struct mptcp_subflow *skel; + struct nstoken *nstoken; + struct bpf_link *link; + + cgroup_fd = test__join_cgroup("/mptcp_subflow"); + if (!ASSERT_OK_FD(cgroup_fd, "join_cgroup: mptcp_subflow")) + return; + + skel = mptcp_subflow__open_and_load(); + if (!ASSERT_OK_PTR(skel, "skel_open_load: mptcp_subflow")) + goto close_cgroup; + + skel->bss->pid = getpid(); + + err = mptcp_subflow__attach(skel); + if (!ASSERT_OK(err, "skel_attach: mptcp_subflow")) + goto skel_destroy; + + prog_fd = bpf_program__fd(skel->progs.mptcp_subflow); + err = bpf_prog_attach(prog_fd, cgroup_fd, BPF_CGROUP_SOCK_OPS, 0); + if (!ASSERT_OK(err, "prog_attach")) + goto skel_destroy; + + nstoken = create_netns(); + if (!ASSERT_OK_PTR(nstoken, "create_netns: mptcp_subflow")) + goto skel_destroy; + + if (endpoint_init("subflow") < 0) + goto close_netns; + + link = bpf_program__attach_cgroup(skel->progs._getsockopt_subflow, + cgroup_fd); + if (!ASSERT_OK_PTR(link, "getsockopt prog")) + goto close_netns; + + run_subflow(); + + bpf_link__destroy(link); +close_netns: + cleanup_netns(nstoken); +skel_destroy: + mptcp_subflow__destroy(skel); +close_cgroup: + close(cgroup_fd); +} + void test_mptcp(void) { if (test__start_subtest("base")) test_base(); if (test__start_subtest("mptcpify")) test_mptcpify(); + if (test__start_subtest("subflow")) + test_subflow(); }
Hello,
On 10/09/2024 16:13, Matthieu Baerts (NGI0) wrote:
From: Geliang Tang tanggeliang@kylinos.cn
This patch adds a subtest named test_subflow in test_mptcp to load and verify the newly added MPTCP subflow BPF program. To goal is to make sure it is possible to set different socket options per subflows, while the userspace socket interface only lets the application to set the same socket options for the whole MPTCP connection and its multiple subflows.
To check that, a client and a server are started in a dedicated netns, with veth interfaces to simulate multiple paths. They will exchange data to allow the creation of an additional subflow.
(...)
diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c index d2ca32fa3b21..c30f032edaca 100644 --- a/tools/testing/selftests/bpf/prog_tests/mptcp.c +++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c @@ -335,10 +339,132 @@ static void test_mptcpify(void) close(cgroup_fd); }
(...)
+static void wait_for_new_subflows(int fd) +{
- socklen_t len;
- u8 subflows;
- int err, i;
- len = sizeof(subflows);
- /* Wait max 1 sec for new subflows to be created */
- for (i = 0; i < 10; i++) {
err = getsockopt(fd, SOL_MPTCP, MPTCP_INFO, &subflows, &len);
if (!err && subflows > 0)
break;
sleep(0.1);
As reported by the CI, we are not in Python, usleep() should be used here. I missed that one during my review. I will send a new version with the fix tomorrow. Sorry for the noise.
Cheers, Matt
linux-kselftest-mirror@lists.linaro.org