On Mon Apr 28, 2025 at 6:52 PM CEST, Eduard Zingerman wrote:
Alexis Lothoré alexis.lothore@bootlin.com writes:
[...]
The function listened to is defined as accepting 'struct bpf_testmod_struct_arg_7', at the same time this function uses 'struct bpf_testmod_struct_arg_5'.
That's not an accidental mistake, those are in fact the same definition. bpf_testmod_struct_arg_7 is the kernel side definition in bpf_testmod.c:
struct bpf_testmod_struct_arg_7 { __int128 a; };
and struct bpf_testmode_struct_arg_5 is the one defined in the bpf test program:
struct bpf_testmod_struct_arg_5 { __int128 a; };
Apologies, but I'm still confused:
I apply this series on top of: 224ee86639f5 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after rc4")
line 12 of tracing_struct_many_args.c has the following definition:
struct bpf_testmod_struct_arg_5 { char a; short b; int c; long d; };
line 135 of the same file has the following definition:
SEC("fentry/bpf_testmod_test_struct_arg_11") int BPF_PROG2(test_struct_many_args_9, struct bpf_testmod_struct_arg_5, a, struct bpf_testmod_struct_arg_5, b, struct bpf_testmod_struct_arg_5, c, struct bpf_testmod_struct_arg_5, d, int, e, struct bpf_testmod_struct_arg_5, f)
line 70 of tools/testing/selftests/bpf/test_kmods/bpf_testmod.c:
struct bpf_testmod_struct_arg_7 { __int128 a; };
line 152 of the same file:
noinline int bpf_testmod_test_struct_arg_11(struct bpf_testmod_struct_arg_7 a, struct bpf_testmod_struct_arg_7 b, struct bpf_testmod_struct_arg_7 c, struct bpf_testmod_struct_arg_7 d, short e, struct bpf_testmod_struct_arg_7 f)
Do I use a wrong base to apply the series?
Argh, no, no, you are right, I checked again and I made some confusions between progs/tracing_struct.c and progs/tracing_struct_many_args.c. I initially did most of the work in tracing_struct.c, and eventually moved the code to tracing_struct_many_args.c before sending my series, but I apparently forgot to move bpf_testmod_struct_arg_5 declaration in tracing_struct_many_args.c (and so, to rename it, since this name is already used in there). As a consequence the bpf program is actually using the wrong struct layout. So thanks for insisting and spotting this issue !
I fixed my mess locally in order to re-run the gdb analysis mentioned in my previous mail, and the bug seems to be the same (unexpected t11:f: actual 35 != expected 43), with the same layout issue on the bpf context passed on the stack ("lucky" mistake ?). However, thinking more about this, I feel like there is still something that I have missed:
0xffffc900001dbce8: 38 0 0 0 0 0 0 0 0xffffc900001dbcf0: 0 0 0 0 0 0 0 0 0xffffc900001dbcf8: 39 0 0 0 0 0 0 0 0xffffc900001dbd00: 0 0 0 0 0 0 0 0 0xffffc900001dbd08: 40 0 0 0 0 0 0 0 0xffffc900001dbd10: 0 0 0 0 0 0 0 0 0xffffc900001dbd18: 41 0 0 0 0 0 0 0 0xffffc900001dbd20: 0 0 0 0 0 0 0 0 0xffffffffc04016a6: 42 0 0 0 0 0 0 0 0xffffc900001dbd30: 35 0 0 0 0 0 0 0 0xffffc900001dbd38: 43 0 0 0 0 0 0 0 0xffffc900001dbd40: 37 0 0 0 0 0 0 0
If things really behaved correctly, f would not have the correct value but would still be handled as a 16 bytes value, so the test would not fail with "actual 35 != 43", but something like "actual 27254487904906932132179118915584 != 43" (43 << 64 | 35) I guess. I still need to sort this out.
Alexis