Hi to whom it may concern,
We are getting on a 6.0.0 (and also on 5.10 up) the following Mellanox infiniband problem (see below). Can you please help (this is on a running ia64 cluster).
Regards,
Rudi Gabler
[ 31.915749] Unable to handle kernel NULL pointer dereference (address 0000000000000010) [ 31.915749] kworker/u17:0[44]: Oops 11012296146944 [1] [ 31.915749] Modules linked in: af_packet ib_iser libiscsi scsi_transport_iscsi nf_tables nfnetlink rpcrdma sunrpc ib_ipoib tg3 libphy ib_mthca fuse configfs dm_round_robin qla2xxx firmware_class dm_mirror dm_region_hash dm_log dm_multipath efivarfs
[ 31.915749] CPU: 0 PID: 44 Comm: kworker/u17:0 Not tainted 6.0.0-gentoo-ia64 #5 [ 31.915749] Hardware name: hp server BL860c , BIOS 04.32 05/21/2013 [ 31.915749] Workqueue: ib-comp-unb-wq ib_cq_poll_work [ 31.915749] psr : 0000121008522030 ifs : 8000000000000ca1 ip : [<a00000020036ba21>] Not tainted (6.0.0-gentoo-ia64) [ 31.915749] ip is at mthca_poll_cq+0xc41/0x1620 [ib_mthca] [ 31.915749] unat: 0000000000000000 pfs : 0000000000000ca1 rsc : 0000000000000003 [ 31.915749] rnat: 0000000000000000 bsps: 0000000000000000 pr : 0000000000015555 [ 31.915749] ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70433f [ 31.915749] csd : 0000000000000000 ssd : 0000000000000000 [ 31.915749] b0 : a00000020036b290 b6 : a00000020036ade0 b7 : a00000010000bce0 [ 31.915749] f6 : 1003ee000000106bf1c50 f7 : 1003e61c8864680b583eb [ 31.915749] f8 : 1003e73ad788c017bed70 f9 : 1003e0000000000015ab9 [ 31.915749] f10 : 1003e000000000000b76a f11 : 1003e0000000000000000 [ 31.915749] r1 : a00000020037b480 r2 : 0000000000000000 r3 : 00000000000000d0 [ 31.915749] r8 : e000000107d85100 r9 : 0000000000000000 r10 : 0000000000000000 [ 31.915749] r11 : 0000000000000000 r12 : e000000100507d40 r13 : e000000100500000 [ 31.915749] r14 : e000000100ce9e00 r15 : 0000000000000000 r16 : 0000000000000010 [ 31.915749] r17 : 0000000000040000 r18 : 8080808080808080 r19 : e00000010012cb74 [ 31.915749] r20 : 000000000000012c r21 : 73ad788c017bed70 r22 : 0000040000000000 [ 31.915749] r23 : e000000106bd4c10 r24 : 0000000000010000 r25 : 000000000000ffff [ 31.915749] r26 : 0000000000000400 r27 : e00000010786b018 r28 : e000000107d85148 [ 31.915749] r29 : e000000107d852f0 r30 : 0000000400000000 r31 : e000000107d85314 [ 31.915749] Call Trace: [ 31.915749] [<a000000100013170>] show_stack.part.0+0x30/0x50 sp=e000000100507990 bsp=e000000100501430 [ 31.915749] [<a000000100013720>] show_stack+0x30/0xa0 sp=e000000100507990 bsp=e000000100501400 [ 31.915749] [<a000000100014110>] show_regs+0x980/0x990 sp=e000000100507b60 bsp=e0000001005013a8 [ 31.915749] [<a000000100022340>] die+0x180/0x2e0 sp=e000000100507b60 bsp=e000000100501360 [ 31.915749] [<a000000100045a90>] ia64_do_page_fault+0x850/0xa20 sp=e000000100507b60 bsp=e0000001005012d8 [ 31.915749] [<a00000010000c4c0>] ia64_leave_kernel+0x0/0x270 sp=e000000100507b70 bsp=e0000001005012d8 [ 31.915749] [<a00000020036ba20>] mthca_poll_cq+0xc40/0x1620 [ib_mthca] sp=e000000100507d40 bsp=e0000001005011c8 [ 31.915749] [<a000000100ad0f30>] __ib_process_cq+0xc0/0x210 sp=e000000100507e30 bsp=e000000100501150 [ 31.915749] [<a000000100ad1430>] ib_cq_poll_work+0x40/0x100 sp=e000000100507e30 bsp=e000000100501120 [ 31.915749] [<a000000100081820>] process_one_work+0x3b0/0x4c0 sp=e000000100507e30 bsp=e0000001005010a0 [ 31.915749] [<a000000100081f30>] worker_thread+0x580/0x670 sp=e000000100507e30 bsp=e000000100501008 [ 31.915749] [<a000000100090580>] kthread+0x1d0/0x1f0 sp=e000000100507e30 bsp=e000000100500fb8 [ 31.915749] [<a00000010000c2b0>] call_payload+0x50/0x80 sp=e000000100507e30 bsp=e000000100500fa0 [ 31.915749] Disabling lock debugging due to kernel taint
On Fri, Oct 14, 2022 at 05:49:21PM +0000, rug@usm.lmu.de wrote:
Hi to whom it may concern,
We are getting on a 6.0.0 (and also on 5.10 up) the following Mellanox infiniband problem (see below). Can you please help (this is on a running ia64 cluster).
Please email the infiniband kernel developers about this. Their mailing list info is in the MAINTAINERS file in the kernel source tree.
thanks,
greg k-h
linux-stable-mirror@lists.linaro.org