From: Amery Hung <ameryhung@gmail.com>
[ Upstream commit efec2e55bdefb889639a6e7fe1f1f2431cdddc6a ]
It is possible for drivers to generate xdp packets with data residing entirely in fragments. To keep parsing headers using direct packet access, call bpf_xdp_pull_data() to pull headers into the linear data area.
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250922233356.3356453-9-ameryhung@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
YES
- What it fixes and why it matters
  - The BPF selftest program parsed Ethernet/IP/UDP headers using direct packet access without guaranteeing those headers are in the linear area. On drivers/NIC configs that place header data entirely in XDP fragments (multi-buffer/HDS), this can lead to invalid accesses or verifier failures. The patch ensures headers are pulled into the linear area before parsing, so the tests behave correctly on such drivers.
- Scope and contained changes
  - Single selftests file only: tools/testing/selftests/net/lib/xdp_native.bpf.c.
  - Adds kfunc declaration for `bpf_xdp_pull_data()` at xdp_native.bpf.c:17 to request pulling bytes from non-linear XDP data into the linear region.
  - Updates the UDP header parsing helper to pull and then re-read pointers:
    - Pull L2 first, then re-load pointers (xdp_native.bpf.c:78–86 and 96–106).
    - For IPv4, pull up to L3+L4 and re-load pointers (xdp_native.bpf.c:91–106).
    - For IPv6, same pattern (xdp_native.bpf.c:109–124).
    - This ensures `data`/`data_end` are refreshed after each pull to satisfy the verifier and correctness of direct accesses.
  - Updates TX path similarly:
    - Pull L2 then re-load pointers (xdp_native.bpf.c:182–190).
    - For IPv4, pull up to L3+L4, re-load pointers, then validate, swap L2 and swap IPv4 src/dst (xdp_native.bpf.c:196–221).
    - For IPv6, same flow including `eth = data` reload before swapping MACs (xdp_native.bpf.c:233–261).
  - No kernel subsystem logic is changed; only test-side BPF program logic.
- Backport suitability vs. stable rules
  - Fixes a real-world issue affecting test correctness on drivers that produce non-linear XDP frames (user-visible in CI/selftests).
  - Minimal, self-contained change confined to selftests; no API or ABI changes; no architecture changes; low regression risk to the kernel proper.
  - Aligns with stable policy to keep selftests working on stable trees that already have the underlying feature.
- Important dependency to include
  - This change depends on kernel support for the kfunc `bpf_xdp_pull_data()` which is introduced by “bpf: Support pulling non-linear xdp data” (net/core/filter.c:12253). Ensure that commit is present in the target stable branch; otherwise the selftest program load will fail on kernels without this kfunc.
  - There is a follow-up fix that must be included to avoid verifier failures: “selftests: drv-net: Reload pkt pointer after calling filter_udphdr” (commit 11ae737efea10). It re-computes header length using a freshly reloaded `ctx->data` after `filter_udphdr()` because `bpf_xdp_pull_data()` invalidates earlier packet pointers. In this tree, that fix manifests as changing `hdr_len` calculations to `... - (void *)(long)ctx->data` (e.g., xdp_native.bpf.c:430–436 and 582–590). Backport this fix alongside the main patch to prevent non-deterministic verifier errors depending on compiler codegen.
- Risk and side effects
  - Selftests-only; no effect on runtime kernel paths.
  - The only meaningful risk is missing dependencies: if `bpf_xdp_pull_data()` support isn’t in the target stable branch, or if the follow-up “Reload pkt pointer” fix is omitted, test load or verification can fail. With both present, changes are straightforward and low risk.
Given the above, this is a good candidate for stable backport on branches that already include `bpf_xdp_pull_data()` support, and it should be backported together with the follow-up “Reload pkt pointer” fix to avoid verifier regressions.
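The pull-then-reload pattern described above can be illustrated outside the kernel with a minimal userspace sketch. Everything in it is hypothetical: `struct mock_xdp`, `mock_pull_data()` and `mock_udp_hdr_len()` are stand-ins invented for illustration, not the selftest code and not the kfunc's implementation. The mock deliberately moves the linear area on every pull, so a pointer saved before a pull would be stale. That mirrors why the patch re-reads `ctx->data`/`ctx->data_end` after each `bpf_xdp_pull_data()` call, and why the follow-up fix computes `hdr_len` from a fresh `ctx->data`. The sketch assumes IPv4 without options.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { ETH_LEN = 14, IP4_LEN = 20, UDP_LEN = 8 };

/* Hypothetical stand-in for struct xdp_md (illustration only). */
struct mock_xdp {
	uint8_t *data;        /* start of the linear area; moves on a pull */
	size_t linear_len;    /* bytes currently linear */
	const uint8_t *frag;  /* bytes that are not linear yet */
	size_t frag_len;
};

/*
 * Mock pull: make at least `len` bytes linear.
 * Returns 0 on success, -1 if the packet is too short or allocation fails.
 * The linear area is always moved to a new buffer, so old pointers go stale.
 */
static int mock_pull_data(struct mock_xdp *x, size_t len)
{
	size_t need;
	uint8_t *fresh;

	assert(x != NULL);
	if (len <= x->linear_len)
		return 0;
	need = len - x->linear_len;
	if (need > x->frag_len)
		return -1;
	fresh = malloc(len);
	if (!fresh)
		return -1;
	if (x->linear_len)
		memcpy(fresh, x->data, x->linear_len);
	memcpy(fresh + x->linear_len, x->frag, need);
	free(x->data);
	x->data = fresh;
	x->linear_len = len;
	x->frag += need;
	x->frag_len -= need;
	return 0;
}

/* Returns the Ethernet + IPv4 + UDP header length, or -1 if not UDP/IPv4. */
static long mock_udp_hdr_len(struct mock_xdp *x)
{
	const uint8_t *data, *data_end, *udph;

	/* Step 1: pull L2, then load the packet pointers. */
	if (mock_pull_data(x, ETH_LEN))
		return -1;
	data = x->data;
	data_end = data + x->linear_len;
	if (data + ETH_LEN > data_end)
		return -1;
	if (data[12] != 0x08 || data[13] != 0x00)  /* EtherType IPv4 */
		return -1;

	/* Step 2: pull L2 + L3 + L4, then reload the pointers again. */
	if (mock_pull_data(x, ETH_LEN + IP4_LEN + UDP_LEN))
		return -1;
	data = x->data;
	data_end = data + x->linear_len;
	if (data + ETH_LEN + IP4_LEN + UDP_LEN > data_end)
		return -1;
	if (data[ETH_LEN + 9] != 17)               /* IPv4 protocol: UDP */
		return -1;

	udph = data + ETH_LEN + IP4_LEN;
	/* Measure against the current base, never a pre-pull copy. */
	return (long)((udph + UDP_LEN) - x->data);
}
```

A caller would start with `data = NULL`, `linear_len = 0` and the whole packet in `frag`, which models a driver that delivers headers entirely in fragments, and is responsible for freeing `data` afterwards.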
 .../selftests/net/lib/xdp_native.bpf.c | 89 +++++++++++++++----
 1 file changed, 74 insertions(+), 15 deletions(-)

diff --git a/tools/testing/selftests/net/lib/xdp_native.bpf.c b/tools/testing/selftests/net/lib/xdp_native.bpf.c
index 521ba38f2ddda..df4eea5c192b3 100644
--- a/tools/testing/selftests/net/lib/xdp_native.bpf.c
+++ b/tools/testing/selftests/net/lib/xdp_native.bpf.c
@@ -14,6 +14,8 @@
 #define MAX_PAYLOAD_LEN 5000
 #define MAX_HDR_LEN 64
 
+extern int bpf_xdp_pull_data(struct xdp_md *xdp, __u32 len) __ksym __weak;
+
 enum {
 	XDP_MODE = 0,
 	XDP_PORT = 1,
@@ -68,30 +70,57 @@ static void record_stats(struct xdp_md *ctx, __u32 stat_type)
 
 static struct udphdr *filter_udphdr(struct xdp_md *ctx, __u16 port)
 {
-	void *data_end = (void *)(long)ctx->data_end;
-	void *data = (void *)(long)ctx->data;
 	struct udphdr *udph = NULL;
-	struct ethhdr *eth = data;
+	void *data, *data_end;
+	struct ethhdr *eth;
+	int err;
+
+	err = bpf_xdp_pull_data(ctx, sizeof(*eth));
+	if (err)
+		return NULL;
+
+	data_end = (void *)(long)ctx->data_end;
+	data = eth = (void *)(long)ctx->data;
 
 	if (data + sizeof(*eth) > data_end)
 		return NULL;
 
 	if (eth->h_proto == bpf_htons(ETH_P_IP)) {
-		struct iphdr *iph = data + sizeof(*eth);
+		struct iphdr *iph;
+
+		err = bpf_xdp_pull_data(ctx, sizeof(*eth) + sizeof(*iph) +
+					sizeof(*udph));
+		if (err)
+			return NULL;
+
+		data_end = (void *)(long)ctx->data_end;
+		data = (void *)(long)ctx->data;
+
+		iph = data + sizeof(*eth);
 
 		if (iph + 1 > (struct iphdr *)data_end ||
 		    iph->protocol != IPPROTO_UDP)
 			return NULL;
 
-		udph = (void *)eth + sizeof(*iph) + sizeof(*eth);
-	} else if (eth->h_proto == bpf_htons(ETH_P_IPV6)) {
-		struct ipv6hdr *ipv6h = data + sizeof(*eth);
+		udph = data + sizeof(*iph) + sizeof(*eth);
+	} else if (eth->h_proto == bpf_htons(ETH_P_IPV6)) {
+		struct ipv6hdr *ipv6h;
+
+		err = bpf_xdp_pull_data(ctx, sizeof(*eth) + sizeof(*ipv6h) +
+					sizeof(*udph));
+		if (err)
+			return NULL;
+
+		data_end = (void *)(long)ctx->data_end;
+		data = (void *)(long)ctx->data;
+
+		ipv6h = data + sizeof(*eth);
 
 		if (ipv6h + 1 > (struct ipv6hdr *)data_end ||
 		    ipv6h->nexthdr != IPPROTO_UDP)
 			return NULL;
 
-		udph = (void *)eth + sizeof(*ipv6h) + sizeof(*eth);
+		udph = data + sizeof(*ipv6h) + sizeof(*eth);
 	} else {
 		return NULL;
 	}
@@ -145,17 +174,34 @@ static void swap_machdr(void *data)
 
 static int xdp_mode_tx_handler(struct xdp_md *ctx, __u16 port)
 {
-	void *data_end = (void *)(long)ctx->data_end;
-	void *data = (void *)(long)ctx->data;
 	struct udphdr *udph = NULL;
-	struct ethhdr *eth = data;
+	void *data, *data_end;
+	struct ethhdr *eth;
+	int err;
+
+	err = bpf_xdp_pull_data(ctx, sizeof(*eth));
+	if (err)
+		return XDP_PASS;
+
+	data_end = (void *)(long)ctx->data_end;
+	data = eth = (void *)(long)ctx->data;
 
 	if (data + sizeof(*eth) > data_end)
 		return XDP_PASS;
 
 	if (eth->h_proto == bpf_htons(ETH_P_IP)) {
-		struct iphdr *iph = data + sizeof(*eth);
-		__be32 tmp_ip = iph->saddr;
+		struct iphdr *iph;
+		__be32 tmp_ip;
+
+		err = bpf_xdp_pull_data(ctx, sizeof(*eth) + sizeof(*iph) +
+					sizeof(*udph));
+		if (err)
+			return XDP_PASS;
+
+		data_end = (void *)(long)ctx->data_end;
+		data = (void *)(long)ctx->data;
+
+		iph = data + sizeof(*eth);
 
 		if (iph + 1 > (struct iphdr *)data_end ||
 		    iph->protocol != IPPROTO_UDP)
@@ -169,8 +215,10 @@ static int xdp_mode_tx_handler(struct xdp_md *ctx, __u16 port)
 			return XDP_PASS;
 
 		record_stats(ctx, STATS_RX);
+		eth = data;
 		swap_machdr((void *)eth);
 
+		tmp_ip = iph->saddr;
 		iph->saddr = iph->daddr;
 		iph->daddr = tmp_ip;
 
@@ -178,9 +226,19 @@ static int xdp_mode_tx_handler(struct xdp_md *ctx, __u16 port)
 
 		return XDP_TX;
 
-	} else if (eth->h_proto == bpf_htons(ETH_P_IPV6)) {
-		struct ipv6hdr *ipv6h = data + sizeof(*eth);
+	} else if (eth->h_proto == bpf_htons(ETH_P_IPV6)) {
 		struct in6_addr tmp_ipv6;
+		struct ipv6hdr *ipv6h;
+
+		err = bpf_xdp_pull_data(ctx, sizeof(*eth) + sizeof(*ipv6h) +
+					sizeof(*udph));
+		if (err)
+			return XDP_PASS;
+
+		data_end = (void *)(long)ctx->data_end;
+		data = (void *)(long)ctx->data;
+
+		ipv6h = data + sizeof(*eth);
 
 		if (ipv6h + 1 > (struct ipv6hdr *)data_end ||
 		    ipv6h->nexthdr != IPPROTO_UDP)
@@ -194,6 +252,7 @@ static int xdp_mode_tx_handler(struct xdp_md *ctx, __u16 port)
 			return XDP_PASS;
 
 		record_stats(ctx, STATS_RX);
+		eth = data;
 		swap_machdr((void *)eth);
 
 		__builtin_memcpy(&tmp_ipv6, &ipv6h->saddr, sizeof(tmp_ipv6));