On Tue, Jun 25, 2024 at 10:25 AM Geliang Tang geliang@kernel.org wrote:
From: Geliang Tang tanggeliang@kylinos.cn
Run the following BPF selftest on LoongArch:
./test_progs -t sockmap_basic
A kernel panic occurs:
'''
Oops[#1]:
CPU: 22 PID: 2824 Comm: test_progs Tainted: G OE 6.10.0-rc2+ #18
Hardware name: LOONGSON Dabieshan/Loongson-TC542F0, BIOS Loongson-UDK2018-V4.0.11
pc 9000000004162774 ra 90000000048bf6c0 tp 90001000aa16c000 sp 90001000aa16fb90
a0 0000000000000000 a1 0000000000000000 a2 0000000000000000 a3 90001000aa16fd70
a4 0000000000000800 a5 0000000000000000 a6 000055557b63aae8 a7 00000000000000cf
t0 0000000000000000 t1 0000000000004000 t2 0000000000000048 t3 0000000000000000
t4 0000000000000001 t5 0000000000000002 t6 0000000000000001 t7 0000000000000002
t8 0000000000000018 u0 9000000004856150 s9 0000000000000000 s0 0000000000000000
s1 0000000000000000 s2 90001000aa16fd70 s3 0000000000000000 s4 0000000000000000
s5 0000000000004000 s6 900010009284dc00 s7 0000000000000001 s8 900010009284dc00
   ra: 90000000048bf6c0 sk_msg_recvmsg+0x120/0x560
  ERA: 9000000004162774 copy_page_to_iter+0x74/0x1c0
 CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
 PRMD: 0000000c (PPLV0 +PIE +PWE)
 EUEN: 00000007 (+FPE +SXE +ASXE -BTE)
 ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0)
 BADV: 0000000000000040
 PRID: 0014c011 (Loongson-64bit, Loongson-3C5000)
Modules linked in: bpf_testmod(OE) xt_CHECKSUM xt_MASQUERADE xt_conntrack
Process test_progs (pid: 2824, threadinfo=0000000000863a31, task=000000001cba0874)
Stack : 0000000000000001 fffffffffffffffc 0000000000000000 0000000000000000
        0000000000000018 0000000000000000 0000000000000000 90000000048bf6c0
        90000000052cd638 90001000aa16fd70 900010008bf51580 900010009284f000
        90000000049f2b90 900010009284f188 900010009284f178 90001000861d4780
        9000100084dccd00 0000000000000800 0000000000000007 fffffffffffffff2
        000000000453e92f 90000000049aae34 90001000aa16fd60 900010009284f000
        0000000000000000 0000000000000000 900010008bf51580 90000000049f2b90
        0000000000000001 0000000000000000 9000100084dc3a10 900010009284f1ac
        90001000aa16fd40 0000555559953278 0000000000000001 0000000000000000
        90001000aa16fdc8 9000000005a5a000 90001000861d4780 0000000000000800
        ...
Call Trace:
[<9000000004162774>] copy_page_to_iter+0x74/0x1c0
[<90000000048bf6c0>] sk_msg_recvmsg+0x120/0x560
[<90000000049f2b90>] tcp_bpf_recvmsg_parser+0x170/0x4e0
[<90000000049aae34>] inet_recvmsg+0x54/0x100
[<900000000481ad5c>] sock_recvmsg+0x7c/0xe0
[<900000000481e1a8>] __sys_recvfrom+0x108/0x1c0
[<900000000481e27c>] sys_recvfrom+0x1c/0x40
[<9000000004c076ec>] do_syscall+0x8c/0xc0
[<9000000003731da4>] handle_syscall+0xc4/0x160

Code: 0010b09b 440125a0 0011df8d <28c10364> 0012b70c 00133305 0013b1ac 0010dc84 00151585

---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Fatal exception
Kernel relocated by 0x3510000
 .text @ 0x9000000003710000
 .data @ 0x9000000004d70000
 .bss  @ 0x9000000006469400
---[ end Kernel panic - not syncing: Fatal exception ]---
'''
This is because "sg_page(sge)" is NULL in that case. This patch adds a NULL check for it in sk_msg_recvmsg() to fix this error.
Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Geliang Tang tanggeliang@kylinos.cn
 net/core/skmsg.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index fd20aae30be2..bafcc1e2eadf 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -432,6 +432,8 @@ int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
 			sge = sk_msg_elem(msg_rx, i);
 			copy = sge->length;
 			page = sg_page(sge);
+			if (!page)
+				goto out;
 			if (copied + copy > len)
 				copy = len - copied;
 			copy = copy_page_to_iter(page, sge->offset, copy, iter);
-- 2.43.0
This looks pretty much random to me.
Please find the root cause, instead of desperately trying to fix 'tests'.
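
For anyone following the thread, here is a minimal userspace sketch of what the proposed check does. It is not kernel code: struct model_sge and model_recv() are hypothetical stand-ins for the scatterlist element and the copy loop in sk_msg_recvmsg(), and memcpy() stands in for copy_page_to_iter(). It only models the symptom (the loop bails out at the first element with no backing page). It says nothing about why such an element exists in the first place, which is the root-cause question above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a scatterlist element; not the kernel struct. */
struct model_sge {
	const char *page;	/* models sg_page(sge); may be NULL */
	size_t offset;		/* models sge->offset */
	size_t length;		/* models sge->length */
};

/*
 * Copy up to len bytes from the elements into buf. Stops at the first
 * element whose page is NULL, which models the "goto out" in the patch.
 * Returns the number of bytes copied.
 */
static size_t model_recv(const struct model_sge *sges, size_t n,
			 char *buf, size_t len)
{
	size_t copied = 0;

	assert(buf != NULL);

	for (size_t i = 0; i < n && copied < len; i++) {
		const struct model_sge *sge = &sges[i];
		size_t copy = sge->length;

		if (!sge->page)		/* the check the patch adds */
			break;
		if (copied + copy > len)
			copy = len - copied;
		/* memcpy() stands in for copy_page_to_iter() */
		memcpy(buf + copied, sge->page + sge->offset, copy);
		copied += copy;
	}
	return copied;
}
```

With three elements where the second has a NULL page, model_recv() returns only the length of the first element; the third is never reached. Without the check, the memcpy() would read through a NULL pointer, which is the userspace analogue of the fault in copy_page_to_iter().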