This patch set adds support for using FOU or GUE encapsulation with an ipip device operating in collect-metadata mode and a set of kfuncs for controlling encap parameters exposed to a BPF tc-hook.
BPF tc-hooks allow us to read tunnel metadata (like remote IP addresses) in the ingress path of an externally controlled tunnel interface via the bpf_skb_get_tunnel_{key,opt} bpf-helpers. Packets can then be redirected to the same or a different externally controlled tunnel interface by overwriting metadata via the bpf_skb_set_tunnel_{key,opt} helpers and a call to bpf_redirect. This enables us to redirect packets between tunnel interfaces - and potentially change the encapsulation type - using only a single BPF program.
Today this approach works for some tunnel combinations, for example redirecting packets between Geneve and GRE interfaces, or between GRE and plain ipip interfaces. Redirecting with FOU or GUE is not supported, however: the ip_tunnel module does not allow packets to egress with additional UDP encapsulation from an ipip device in collect-metadata mode.
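To make the "additional UDP encapsulation" concrete: FOU places a bare UDP header between the outer IP header and the inner packet, while GUE adds its own header after the UDP header. The sketch below only computes the resulting per-packet overhead. The names (encap_type, udp_hdr, gue_base_hdr, encap_overhead) are made up for illustration, the 4-byte GUE size assumes a base header without optional fields, and none of this is code from the series.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative only: these names are not kernel identifiers. */
enum encap_type { ENCAP_NONE, ENCAP_FOU, ENCAP_GUE };

/* UDP header: source port, destination port, length, checksum. */
struct udp_hdr {
	uint16_t source;
	uint16_t dest;
	uint16_t len;
	uint16_t check;
};

/* GUE base header without optional fields (an assumption here). */
struct gue_base_hdr {
	uint8_t  ver_ctl_hlen;
	uint8_t  proto_ctype;
	uint16_t flags;
};

/* Bytes added between the outer IP header and the inner IP packet. */
static size_t encap_overhead(enum encap_type type)
{
	switch (type) {
	case ENCAP_FOU:
		return sizeof(struct udp_hdr);
	case ENCAP_GUE:
		return sizeof(struct udp_hdr) + sizeof(struct gue_base_hdr);
	default:
		return 0;
	}
}
```

Under these assumptions a plain ipip packet carries no extra bytes, and the FOU and GUE variants differ only by the size of the GUE base header.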
Patch 1 lifts this restriction by adding a struct ip_tunnel_encap to the tunnel metadata. It can be filled in by a new BPF kfunc introduced in Patch 2 and is evaluated by the ip_tunnel egress path. This will allow us to use FOU and GUE encap with externally controlled ipip devices.
Patch 2 introduces two new BPF kfuncs: bpf_skb_{set,get}_fou_encap. These kfuncs can be used to set and get UDP encap parameters from the BPF tc-hook doing the packet redirect.
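The port fields passed to the kfuncs are big-endian (__be16 in the selftest declaration), which is why the test programs wrap 5555 in bpf_htons() and compare via bpf_ntohs(). The following userspace sketch mirrors that handling with a stand-in struct. fou_encap_params, make_encap and dport_matches are hypothetical names, and htons/ntohs from <arpa/inet.h> stand in for the BPF byte-order macros.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for struct bpf_fou_encap: ports in network order. */
struct fou_encap_params {
	uint16_t sport; /* big-endian; the selftests pass 0 here */
	uint16_t dport; /* big-endian */
};

/* Build the parameters from host-order port numbers. */
static struct fou_encap_params make_encap(uint16_t sport, uint16_t dport)
{
	struct fou_encap_params p = {
		.sport = htons(sport),
		.dport = htons(dport),
	};
	return p;
}

/* Mirror of the selftest check: convert back before comparing. */
static bool dport_matches(const struct fou_encap_params *p, uint16_t expected)
{
	return ntohs(p->dport) == expected;
}
```

Comparing the raw field against 5555 without the conversion would fail on little-endian hosts, which is the mistake the ntohs() call in the check guards against.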
Patch 3 adds BPF tunnel selftests using the two kfuncs.
---
v3:
 - Integrate selftest into test_progs (Alexei)
v2:
 - Fixes for checkpatch.pl
 - Fixes for kernel test robot
Christian Ehrig (3):
  ipip,ip_tunnel,sit: Add FOU support for externally controlled ipip devices
  bpf,fou: Add bpf_skb_{set,get}_fou_encap kfuncs
  selftests/bpf: Test FOU kfuncs for externally controlled ipip devices
 include/net/fou.h                             |   2 +
 include/net/ip_tunnels.h                      |  28 ++--
 net/ipv4/Makefile                             |   2 +-
 net/ipv4/fou_bpf.c                            | 119 ++++++++++++++
 net/ipv4/fou_core.c                           |   5 +
 net/ipv4/ip_tunnel.c                          |  22 ++-
 net/ipv4/ipip.c                               |   1 +
 net/ipv6/sit.c                                |   2 +-
 .../selftests/bpf/prog_tests/test_tunnel.c    | 153 +++++++++++++++++-
 .../selftests/bpf/progs/test_tunnel_kern.c    | 117 ++++++++++++++
 10 files changed, 432 insertions(+), 19 deletions(-)
 create mode 100644 net/ipv4/fou_bpf.c
Add tests for FOU and GUE encapsulation via the bpf_skb_{set,get}_fou_encap kfuncs, using ipip devices in collect-metadata mode.
These tests make sure that we can successfully set and obtain FOU and GUE encap parameters using ingress / egress BPF tc-hooks.
Signed-off-by: Christian Ehrig <cehrig@cloudflare.com>
---
 .../selftests/bpf/prog_tests/test_tunnel.c    | 153 +++++++++++++++++-
 .../selftests/bpf/progs/test_tunnel_kern.c    | 117 ++++++++++++++
 2 files changed, 268 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/test_tunnel.c b/tools/testing/selftests/bpf/prog_tests/test_tunnel.c
index 47f1d482fe39..d149ab98798d 100644
--- a/tools/testing/selftests/bpf/prog_tests/test_tunnel.c
+++ b/tools/testing/selftests/bpf/prog_tests/test_tunnel.c
@@ -89,6 +89,9 @@
 #define IP6VXLAN_TUNL_DEV0 "ip6vxlan00"
 #define IP6VXLAN_TUNL_DEV1 "ip6vxlan11"

+#define IPIP_TUNL_DEV0 "ipip00"
+#define IPIP_TUNL_DEV1 "ipip11"
+
 #define PING_ARGS "-i 0.01 -c 3 -w 10 -q"

 static int config_device(void)
@@ -188,6 +191,79 @@ static void delete_ip6vxlan_tunnel(void)
 	SYS_NOFAIL("ip link delete dev %s", IP6VXLAN_TUNL_DEV1);
 }

+enum ipip_encap {
+	NONE	= 0,
+	FOU	= 1,
+	GUE	= 2,
+};
+
+static int set_ipip_encap(const char *ipproto, const char *type)
+{
+	SYS(fail, "ip -n at_ns0 fou add port 5555 %s", ipproto);
+	SYS(fail, "ip -n at_ns0 link set dev %s type ipip encap %s",
+	    IPIP_TUNL_DEV0, type);
+	SYS(fail, "ip -n at_ns0 link set dev %s type ipip encap-dport 5555",
+	    IPIP_TUNL_DEV0);
+
+	return 0;
+fail:
+	return -1;
+}
+
+static int add_ipip_tunnel(enum ipip_encap encap)
+{
+	int err;
+	const char *ipproto, *type;
+
+	switch (encap) {
+	case FOU:
+		ipproto = "ipproto 4";
+		type = "fou";
+		break;
+	case GUE:
+		ipproto = "gue";
+		type = ipproto;
+		break;
+	default:
+		ipproto = NULL;
+		type = ipproto;
+	}
+
+	/* at_ns0 namespace */
+	SYS(fail, "ip -n at_ns0 link add dev %s type ipip local %s remote %s",
+	    IPIP_TUNL_DEV0, IP4_ADDR_VETH0, IP4_ADDR1_VETH1);
+
+	if (type && ipproto) {
+		err = set_ipip_encap(ipproto, type);
+		if (!ASSERT_OK(err, "set_ipip_encap"))
+			goto fail;
+	}
+
+	SYS(fail, "ip -n at_ns0 link set dev %s up", IPIP_TUNL_DEV0);
+	SYS(fail, "ip -n at_ns0 addr add dev %s %s/24",
+	    IPIP_TUNL_DEV0, IP4_ADDR_TUNL_DEV0);
+
+	/* root namespace */
+	if (type && ipproto)
+		SYS(fail, "ip fou add port 5555 %s", ipproto);
+	SYS(fail, "ip link add dev %s type ipip external", IPIP_TUNL_DEV1);
+	SYS(fail, "ip link set dev %s up", IPIP_TUNL_DEV1);
+	SYS(fail, "ip addr add dev %s %s/24", IPIP_TUNL_DEV1,
+	    IP4_ADDR_TUNL_DEV1);
+
+	return 0;
+fail:
+	return -1;
+}
+
+static void delete_ipip_tunnel(void)
+{
+	SYS_NOFAIL("ip -n at_ns0 link delete dev %s", IPIP_TUNL_DEV0);
+	SYS_NOFAIL("ip -n at_ns0 fou del port 5555 2> /dev/null");
+	SYS_NOFAIL("ip link delete dev %s", IPIP_TUNL_DEV1);
+	SYS_NOFAIL("ip fou del port 5555 2> /dev/null");
+}
+
 static int test_ping(int family, const char *addr)
 {
 	SYS(fail, "%s %s %s > /dev/null", ping_command(family), PING_ARGS, addr);
@@ -386,10 +462,80 @@ static void test_ip6vxlan_tunnel(void)
 	test_tunnel_kern__destroy(skel);
 }

-#define RUN_TEST(name)							\
+static void test_ipip_tunnel(enum ipip_encap encap)
+{
+	struct test_tunnel_kern *skel = NULL;
+	struct nstoken *nstoken;
+	int set_src_prog_fd, get_src_prog_fd;
+	int ifindex = -1;
+	int err;
+	DECLARE_LIBBPF_OPTS(bpf_tc_hook, tc_hook,
+			    .attach_point = BPF_TC_INGRESS);
+
+	/* add ipip tunnel */
+	err = add_ipip_tunnel(encap);
+	if (!ASSERT_OK(err, "add_ipip_tunnel"))
+		goto done;
+
+	/* load and attach bpf prog to tunnel dev tc hook point */
+	skel = test_tunnel_kern__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "test_tunnel_kern__open_and_load"))
+		goto done;
+	ifindex = if_nametoindex(IPIP_TUNL_DEV1);
+	if (!ASSERT_NEQ(ifindex, 0, "ipip11 ifindex"))
+		goto done;
+	tc_hook.ifindex = ifindex;
+
+	switch (encap) {
+	case FOU:
+		get_src_prog_fd = bpf_program__fd(
+			skel->progs.ipip_encap_get_tunnel);
+		set_src_prog_fd = bpf_program__fd(
+			skel->progs.ipip_fou_set_tunnel);
+		break;
+	case GUE:
+		get_src_prog_fd = bpf_program__fd(
+			skel->progs.ipip_encap_get_tunnel);
+		set_src_prog_fd = bpf_program__fd(
+			skel->progs.ipip_gue_set_tunnel);
+		break;
+	default:
+		get_src_prog_fd = bpf_program__fd(
+			skel->progs.ipip_get_tunnel);
+		set_src_prog_fd = bpf_program__fd(
+			skel->progs.ipip_set_tunnel);
+	}
+
+	if (!ASSERT_GE(set_src_prog_fd, 0, "bpf_program__fd"))
+		goto done;
+	if (!ASSERT_GE(get_src_prog_fd, 0, "bpf_program__fd"))
+		goto done;
+	if (attach_tc_prog(&tc_hook, get_src_prog_fd, set_src_prog_fd))
+		goto done;
+
+	/* ping from root namespace test */
+	err = test_ping(AF_INET, IP4_ADDR_TUNL_DEV0);
+	if (!ASSERT_OK(err, "test_ping"))
+		goto done;
+
+	/* ping from at_ns0 namespace test */
+	nstoken = open_netns("at_ns0");
+	err = test_ping(AF_INET, IP4_ADDR_TUNL_DEV1);
+	if (!ASSERT_OK(err, "test_ping"))
+		goto done;
+	close_netns(nstoken);
+
+done:
+	/* delete ipip tunnel */
+	delete_ipip_tunnel();
+	if (skel)
+		test_tunnel_kern__destroy(skel);
+}
+
+#define RUN_TEST(name, ...)						\
 	({								\
 		if (test__start_subtest(#name)) {			\
-			test_ ## name();				\
+			test_ ## name(__VA_ARGS__);			\
 		}							\
 	})

@@ -400,6 +546,9 @@ static void *test_tunnel_run_tests(void *arg)

 	RUN_TEST(vxlan_tunnel);
 	RUN_TEST(ip6vxlan_tunnel);
+	RUN_TEST(ipip_tunnel, NONE);
+	RUN_TEST(ipip_tunnel, FOU);
+	RUN_TEST(ipip_tunnel, GUE);

 	cleanup();

diff --git a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
index 9ab2d55ab7c0..f66af753bbbb 100644
--- a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
+++ b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
@@ -52,6 +52,21 @@ struct vxlan_metadata {
 	__u32 gbp;
 };

+struct bpf_fou_encap {
+	__be16 sport;
+	__be16 dport;
+};
+
+enum bpf_fou_encap_type {
+	FOU_BPF_ENCAP_FOU,
+	FOU_BPF_ENCAP_GUE,
+};
+
+int bpf_skb_set_fou_encap(struct __sk_buff *skb_ctx,
+			  struct bpf_fou_encap *encap, int type) __ksym;
+int bpf_skb_get_fou_encap(struct __sk_buff *skb_ctx,
+			  struct bpf_fou_encap *encap) __ksym;
+
 struct {
 	__uint(type, BPF_MAP_TYPE_ARRAY);
 	__uint(max_entries, 1);
@@ -749,6 +764,108 @@ int ipip_get_tunnel(struct __sk_buff *skb)
 	return TC_ACT_OK;
 }

+SEC("tc")
+int ipip_gue_set_tunnel(struct __sk_buff *skb)
+{
+	struct bpf_tunnel_key key = {};
+	struct bpf_fou_encap encap = {};
+	void *data = (void *)(long)skb->data;
+	struct iphdr *iph = data;
+	void *data_end = (void *)(long)skb->data_end;
+	int ret;
+
+	if (data + sizeof(*iph) > data_end) {
+		log_err(1);
+		return TC_ACT_SHOT;
+	}
+
+	key.tunnel_ttl = 64;
+	if (iph->protocol == IPPROTO_ICMP)
+		key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
+
+	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key), 0);
+	if (ret < 0) {
+		log_err(ret);
+		return TC_ACT_SHOT;
+	}
+
+	encap.sport = 0;
+	encap.dport = bpf_htons(5555);
+
+	ret = bpf_skb_set_fou_encap(skb, &encap, FOU_BPF_ENCAP_GUE);
+	if (ret < 0) {
+		log_err(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
+SEC("tc")
+int ipip_fou_set_tunnel(struct __sk_buff *skb)
+{
+	struct bpf_tunnel_key key = {};
+	struct bpf_fou_encap encap = {};
+	void *data = (void *)(long)skb->data;
+	struct iphdr *iph = data;
+	void *data_end = (void *)(long)skb->data_end;
+	int ret;
+
+	if (data + sizeof(*iph) > data_end) {
+		log_err(1);
+		return TC_ACT_SHOT;
+	}
+
+	key.tunnel_ttl = 64;
+	if (iph->protocol == IPPROTO_ICMP)
+		key.remote_ipv4 = 0xac100164; /* 172.16.1.100 */
+
+	ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key), 0);
+	if (ret < 0) {
+		log_err(ret);
+		return TC_ACT_SHOT;
+	}
+
+	encap.sport = 0;
+	encap.dport = bpf_htons(5555);
+
+	ret = bpf_skb_set_fou_encap(skb, &encap, FOU_BPF_ENCAP_FOU);
+	if (ret < 0) {
+		log_err(ret);
+		return TC_ACT_SHOT;
+	}
+
+	return TC_ACT_OK;
+}
+
+SEC("tc")
+int ipip_encap_get_tunnel(struct __sk_buff *skb)
+{
+	int ret;
+	struct bpf_tunnel_key key = {};
+	struct bpf_fou_encap encap = {};
+
+	ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
+	if (ret < 0) {
+		log_err(ret);
+		return TC_ACT_SHOT;
+	}
+
+	ret = bpf_skb_get_fou_encap(skb, &encap);
+	if (ret < 0) {
+		log_err(ret);
+		return TC_ACT_SHOT;
+	}
+
+	if (bpf_ntohs(encap.dport) != 5555)
+		return TC_ACT_SHOT;
+
+	bpf_printk("%d remote ip 0x%x, sport %d, dport %d\n", ret,
+		   key.remote_ipv4, bpf_ntohs(encap.sport),
+		   bpf_ntohs(encap.dport));
+	return TC_ACT_OK;
+}
+
 SEC("tc")
 int ipip6_set_tunnel(struct __sk_buff *skb)
 {
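The RUN_TEST change in the patch relies on a variadic macro so that one dispatcher can call both the zero-argument tests and test_ipip_tunnel(enum ipip_encap). A standalone sketch of that pattern follows. subtest_enabled() is a hypothetical stand-in for test__start_subtest(), the test bodies are placeholders that only record that they ran, and a do/while wrapper is swapped in for the GNU statement expression used in the patch. The zero-argument form assumes a compiler that accepts an empty __VA_ARGS__ (GCC and Clang do).

```c
#include <stdbool.h>

enum ipip_encap { NONE = 0, FOU = 1, GUE = 2 };

static int vxlan_runs;      /* how often the zero-argument test ran */
static int last_encap = -1; /* last argument seen by the ipip test */

/* Hypothetical stand-in for test__start_subtest(): always run. */
static bool subtest_enabled(const char *name)
{
	(void)name;
	return true;
}

/* Placeholder test bodies; the real ones set up tunnels and ping. */
static void test_vxlan_tunnel(void) { vxlan_runs++; }
static void test_ipip_tunnel(enum ipip_encap encap) { last_encap = encap; }

/* Extra macro arguments, if any, are forwarded to test_<name>(). */
#define RUN_TEST(name, ...)                       \
	do {                                      \
		if (subtest_enabled(#name))       \
			test_##name(__VA_ARGS__); \
	} while (0)
```

With this shape, RUN_TEST(vxlan_tunnel) expands to test_vxlan_tunnel() and RUN_TEST(ipip_tunnel, GUE) expands to test_ipip_tunnel(GUE), so existing call sites keep working unchanged.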
Hello:
This series was applied to bpf/bpf-next.git (master) by Alexei Starovoitov <ast@kernel.org>:
On Fri, 7 Apr 2023 15:38:52 +0200 you wrote:
> This patch set adds support for using FOU or GUE encapsulation with an ipip device operating in collect-metadata mode and a set of kfuncs for controlling encap parameters exposed to a BPF tc-hook.
>
> BPF tc-hooks allow us to read tunnel metadata (like remote IP addresses) in the ingress path of an externally controlled tunnel interface via the bpf_skb_get_tunnel_{key,opt} bpf-helpers. Packets can then be redirected to the same or a different externally controlled tunnel interface by overwriting metadata via the bpf_skb_set_tunnel_{key,opt} helpers and a call to bpf_redirect. This enables us to redirect packets between tunnel interfaces - and potentially change the encapsulation type - using only a single BPF program.
>
> [...]
Here is the summary with links:
  - [bpf-next,v3,1/3] ipip,ip_tunnel,sit: Add FOU support for externally controlled ipip devices
    https://git.kernel.org/bpf/bpf-next/c/ac931d4cdec3
  - [bpf-next,v3,2/3] bpf,fou: Add bpf_skb_{set,get}_fou_encap kfuncs
    https://git.kernel.org/bpf/bpf-next/c/c50e96099edb
  - [bpf-next,v3,3/3] selftests/bpf: Test FOU kfuncs for externally controlled ipip devices
    https://git.kernel.org/bpf/bpf-next/c/d9688f898c08
You are awesome, thank you!