We recently updated an internal test server from kernel v4.19.273 to v4.19.286, and since then it has already triggered a kernel panic due to a hard lockup multiple times. The lockups look like this, for example:
[29397.950589] RIP: 0010:native_queued_spin_lock_slowpath+0x57/0x190
[29397.950590] Code: 74 38 81 e6 00 ff ff ff 75 60 f0 0f ba 2f 08 8b 07 72 57 89 c2 30 e6 a9 00 00 ff ff 75 48 85 d2 74 0e 8b 07 84 c0 74 08 f3 90 <8b> 07 84 c0 75 f8 b8 01 00 00 00 5d 66 89 07 c3 8b 37 81 fe 00 01
[29397.950591] RSP: 0000:ffff93d07f703de8 EFLAGS: 00000002
[29397.950591] RAX: 0000000000000101 RBX: ffff93cf90087220 RCX: 0000000000000006
[29397.950592] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff93cf90087220
[29397.950592] RBP: ffff93d07f703de8 R08: 0000000000000000 R09: ffff93d07f71fb70
[29397.950593] R10: ffff93d07f71fb28 R11: ffff93d07f703ee8 R12: 0000000000000002
[29397.950593] R13: ffff93cf852cde58 R14: ffff93cf852cc000 R15: 0000000000000003
[29397.950594] ? native_queued_spin_lock_slowpath+0x57/0x190
[29397.950594] ? native_queued_spin_lock_slowpath+0x57/0x190
[29397.950594] </NMI>
[29397.950595] <IRQ>
[29397.950595] _raw_spin_lock_irqsave+0x46/0x50
[29397.950595] set_need_watch+0x2d/0x70 [ipmi_si]
[29397.950596] ? _raw_spin_lock_irqsave+0x25/0x50
[29397.950596] ipmi_timeout+0x2b4/0x530 [ipmi_msghandler]
[29397.950597] ? ipmi_set_gets_events+0x260/0x260 [ipmi_msghandler]
[29397.950597] call_timer_fn+0x30/0x130
[29397.950597] ? ipmi_set_gets_events+0x260/0x260 [ipmi_msghandler]
[29397.950598] run_timer_softirq+0x1ce/0x3f0
[29397.950598] ? ktime_get+0x40/0xa0
[29397.950598] ? sched_clock+0x9/0x10
[29397.950599] ? sched_clock_cpu+0x11/0xc0
[29397.950599] __do_softirq+0x104/0x32d
[29397.950600] ? sched_clock_cpu+0x11/0xc0
[29397.950600] irq_exit+0x11b/0x120
[29397.950600] smp_apic_timer_interrupt+0x79/0x140
[29397.950601] apic_timer_interrupt+0xf/0x20
[29397.950601] </IRQ>
And like this:
[16944.269585] RIP: 0010:native_queued_spin_lock_slowpath+0x57/0x190
[16944.269586] Code: 74 38 81 e6 00 ff ff ff 75 60 f0 0f ba 2f 08 8b 07 72 57 89 c2 30 e6 a9 00 00 ff ff 75 48 85 d2 74 0e 8b 07 84 c0 74 08 f3 90 <8b> 07 84 c0 75 f8 b8 01 00 00 00 5d 66 89 07 c3 8b 37 81 fe 00 01
[16944.269587] RSP: 0018:ffff9a4e40b03dd0 EFLAGS: 00000002
[16944.269588] RAX: 0000000000580101 RBX: ffff9a4ac5089e98 RCX: ffff9a2b9574b538
[16944.269588] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff9a4ac5089e98
[16944.269589] RBP: ffff9a4e40b03dd0 R08: ffff9a4ac5089e58 R09: ffff9a4e40b1fb70
[16944.269589] R10: ffff9a4e40b1fb28 R11: ffff9a4e40b03ee8 R12: 0000000000000002
[16944.269590] R13: ffff9a4ac5089e98 R14: ffff9a4ac5089e58 R15: ffff9a4ac5089e54
[16944.269590] ? native_queued_spin_lock_slowpath+0x57/0x190
[16944.269591] ? native_queued_spin_lock_slowpath+0x57/0x190
[16944.269591] </NMI>
[16944.269592] <IRQ>
[16944.269592] _raw_spin_lock_irqsave+0x46/0x50
[16944.269592] ipmi_smi_msg_received+0x1bc/0x300 [ipmi_msghandler]
[16944.269593] smi_event_handler+0x26c/0x650 [ipmi_si]
[16944.269593] smi_timeout+0x46/0xc0 [ipmi_si]
[16944.269594] ? ipmi_si_irq_handler+0x70/0x70 [ipmi_si]
[16944.269594] call_timer_fn+0x30/0x130
[16944.269595] ? ipmi_si_irq_handler+0x70/0x70 [ipmi_si]
[16944.269595] run_timer_softirq+0x1ce/0x3f0
[16944.269595] ? ktime_get+0x40/0xa0
[16944.269596] ? sched_clock+0x9/0x10
[16944.269596] ? sched_clock_cpu+0x11/0xc0
[16944.269597] __do_softirq+0x104/0x32d
[16944.269597] ? sched_clock_cpu+0x11/0xc0
[16944.269597] irq_exit+0x11b/0x120
[16944.269598] smp_apic_timer_interrupt+0x79/0x140
[16944.269598] apic_timer_interrupt+0xf/0x20
[16944.269599] </IRQ>
To me, these look like a lock-ordering violation between "smi_info->si_lock" and "intf->xmit_msgs_lock", probably introduced by this commit:
commit b4a34aa6dfbca67610e56ad84a3595f537c85af9
Author: Corey Minyard <cminyard@mvista.com>
Date:   Tue Oct 23 11:29:02 2018 -0500
ipmi: Fix how the lower layers are told to watch for messages
[ Upstream commit c65ea996595005be470fbfa16711deba414fd33b ]
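To make the suspected inversion concrete, here is a minimal stand-alone userspace analogue of the two paths, with pthread mutexes standing in for the two spinlocks. The lock and function names mirror the traces above, but this is purely an illustration of the AB-BA pattern, not the driver code itself:

/*
 * Userspace analogue of the suspected si_lock / xmit_msgs_lock inversion.
 * Illustration only; not the ipmi driver source.
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t xmit_msgs_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t si_lock        = PTHREAD_MUTEX_INITIALIZER;

/* Analogue of the msghandler path (ipmi_timeout / smi_recv_tasklet):
 * xmit_msgs_lock is held, then the lower layer is called
 * (set_need_watch), which takes si_lock. */
static void *msghandler_path(void *arg)
{
	for (;;) {
		pthread_mutex_lock(&xmit_msgs_lock);
		pthread_mutex_lock(&si_lock);
		pthread_mutex_unlock(&si_lock);
		pthread_mutex_unlock(&xmit_msgs_lock);
	}
	return NULL;
}

/* Analogue of the ipmi_si path (smi_timeout / smi_event_handler):
 * si_lock is held, then the upper layer is called
 * (ipmi_smi_msg_received), which takes xmit_msgs_lock. */
static void *si_path(void *arg)
{
	for (;;) {
		pthread_mutex_lock(&si_lock);
		pthread_mutex_lock(&xmit_msgs_lock);
		pthread_mutex_unlock(&xmit_msgs_lock);
		pthread_mutex_unlock(&si_lock);
	}
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, msghandler_path, NULL);
	pthread_create(&b, NULL, si_path, NULL);
	printf("threads started; if nothing more happens, they have deadlocked\n");
	pthread_join(a, NULL);	/* never returns once the threads deadlock */
	pthread_join(b, NULL);
	return 0;
}

Compiled with gcc -pthread, the two threads usually wedge against each other within seconds, which is the same AB-BA pattern the traces above hit in timer/softirq context.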
To test the theory further, I built and booted a kernel with lockdep enabled, and this happened:
[ 215.679605] kipmi0/1465 is trying to acquire lock:
[ 215.684490] 00000000fc1528d3 (&(&new_smi->si_lock)->rlock){..-.}, at: set_need_watch+0x2d/0x70 [ipmi_si]
[ 215.694073] but task is already holding lock:
[ 215.699995] 00000000e2eea01c (&(&intf->xmit_msgs_lock)->rlock){..-.}, at: smi_recv_tasklet+0x170/0x260 [ipmi_msghandler]
[ 215.710966] which lock already depends on the new lock.
[ 215.719243] the existing dependency chain (in reverse order) is:
[ 215.726824] -> #1 (&(&intf->xmit_msgs_lock)->rlock){..-.}:
[ 215.733890] lock_acquire+0xae/0x180
[ 215.738111] _raw_spin_lock_irqsave+0x4d/0x8a
[ 215.743113] ipmi_smi_msg_received+0x1bc/0x300 [ipmi_msghandler]
[ 215.749766] smi_event_handler+0x26c/0x660 [ipmi_si]
[ 215.755378] ipmi_thread+0x5d/0x200 [ipmi_si]
[ 215.760377] kthread+0x13c/0x160
[ 215.764247] ret_from_fork+0x24/0x50
[ 215.768457] -> #0 (&(&new_smi->si_lock)->rlock){..-.}:
[ 215.775220] __lock_acquire+0xa61/0xfb0
[ 215.779700] lock_acquire+0xae/0x180
[ 215.783912] _raw_spin_lock_irqsave+0x4d/0x8a
[ 215.788915] set_need_watch+0x2d/0x70 [ipmi_si]
[ 215.794091] smi_tell_to_watch.constprop.39+0x4a/0x50 [ipmi_msghandler]
[ 215.801353] smi_recv_tasklet+0xe8/0x260 [ipmi_msghandler]
[ 215.807484] tasklet_action_common.isra.14+0x83/0x1a0
[ 215.813177] tasklet_action+0x22/0x30
[ 215.817479] __do_softirq+0xd4/0x3ef
[ 215.821698] irq_exit+0x120/0x130
[ 215.825649] smp_irq_work_interrupt+0x8c/0x190
[ 215.830732] irq_work_interrupt+0xf/0x20
[ 215.835296] _raw_spin_unlock_irqrestore+0x56/0x60
[ 215.840725] ipmi_thread+0x13e/0x200 [ipmi_si]
[ 215.845811] kthread+0x13c/0x160
[ 215.849683] ret_from_fork+0x24/0x50
[ 215.853894] other info that might help us debug this:
[ 215.862075] Possible unsafe locking scenario:
[ 215.868139]        CPU0                    CPU1
[ 215.872788]        ----                    ----
[ 215.877433]   lock(&(&intf->xmit_msgs_lock)->rlock);
[ 215.882512]                                lock(&(&new_smi->si_lock)->rlock);
[ 215.889776]                                lock(&(&intf->xmit_msgs_lock)->rlock);
[ 215.897413]   lock(&(&new_smi->si_lock)->rlock);
[ 215.902151] *** DEADLOCK ***
[ 215.908250] 2 locks held by kipmi0/1465:
This also seems to point to the same two locks as the culprits, although the backtraces differ from those in the actual crashes.
On Mon, Jun 19, 2023 at 11:11:16AM +0000, Janne Huttunen (Nokia) wrote:
We recently updated an internal test server from kernel v4.19.273 to v4.19.286, and since then it has already triggered a kernel panic due to a hard lockup multiple times. The lockups look like this, for example:
Does this also happen on newer 5.4 and 6.4-rc7 releases?
thanks,
greg k-h
On Mon, Jun 19, 2023 at 02:23:47PM +0200, Greg KH wrote:
On Mon, Jun 19, 2023 at 11:11:16AM +0000, Janne Huttunen (Nokia) wrote:
We recently updated an internal test server from kernel v4.19.273 to v4.19.286, and since then it has already triggered a kernel panic due to a hard lockup multiple times. The lockups look like this, for example:
Does this also happen on newer 5.4 and 6.4-rc7 releases?
If I am correct, no, it will not happen there, which surprised me at first.
I believe the change that causes this issue was:
b4a34aa6d "ipmi: Fix how the lower layers are told to watch for messages"
which was backported into 4.19. But there was a locking change:
e1891cffd4c4 "ipmi: Make the smi watcher be disabled immediately when not needed"
that came in 5.1. That was required for b4a34aa6d to work correctly, and I didn't realize that. If Janne tests that patch (and the other fix for it), then kernels 5.1 and later are good as they are.
Thanks,
-corey
thanks,
greg k-h
On Mon, Jun 19, 2023 at 11:11:16AM +0000, Janne Huttunen (Nokia) wrote:
We recently updated an internal test server from kernel v4.19.273 to v4.19.286, and since then it has already triggered a kernel panic due to a hard lockup multiple times. The lockups look like this, for example:
It looks like
b4a34aa6d "ipmi: Fix how the lower layers are told to watch for messages"
was backported to fulfill a dependency for another backport, but there was another change:
e1891cffd4c4 "ipmi: Make the smi watcher be disabled immediately when not needed"
That is needed to avoid calling a lower layer function with xmit_msgs_lock held. It doesn't apply completely cleanly because of other changes, but you just need to leave in the free_user_work() function and delete the other function in the conflict. In addition to that, you will also need:
383035211c79 "ipmi: move message error checking to avoid deadlock"
to fix a bug in that change.
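Roughly, the shape of the fix is to decide under xmit_msgs_lock whether the lower layer needs to watch, but only call set_need_watch() after the lock has been dropped. The following is just a simplified userspace-style sketch of that pattern, not the actual upstream diff, and queue_has_messages() is a hypothetical stand-in:

/*
 * Sketch of the "drop the lock before calling the lower layer" pattern.
 * Not the real ipmi code; helpers here are hypothetical stand-ins.
 */
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t xmit_msgs_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t si_lock        = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for "are there queued messages to watch for?" */
static bool queue_has_messages(void)
{
	return true;
}

/* Stands in for the lower layer's set_need_watch(), which takes si_lock. */
static void tell_lower_layer_to_watch(void)
{
	pthread_mutex_lock(&si_lock);
	pthread_mutex_unlock(&si_lock);
}

static void msghandler_path_fixed(void)
{
	bool need_watch;

	/* Decide what is needed while xmit_msgs_lock is held... */
	pthread_mutex_lock(&xmit_msgs_lock);
	need_watch = queue_has_messages();
	pthread_mutex_unlock(&xmit_msgs_lock);

	/* ...but only call into the lower layer after dropping it, so
	 * si_lock is never nested inside xmit_msgs_lock. */
	if (need_watch)
		tell_lower_layer_to_watch();
}

int main(void)
{
	msghandler_path_fixed();
	return 0;
}

With that ordering there is no path that takes si_lock while xmit_msgs_lock is held, so the cycle lockdep complains about goes away.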
Can you try this out?
Thanks,
-corey
Yes, sorry for the delay; I had a few technical problems testing your proposed patches. In the meantime, we found out that over a dozen of our test servers have had the same crash, some of them multiple times since the kernel update.
Anyways, with your proposed patches on top of 4.19.286, I couldn't trigger the lockdep warning anymore even in a server that without the fixes triggers it very reliably right after the boot. I also saw in another very similar server (without the fixes) that it took almost 17 hours to get even the lockdep warning. Maybe some specific BMC behavior affects this or something? Sadly, that kind of diminishes the value of the short duration tests, but at least there has so far been zero lockdep warnings with the fixes applied. The actual lockups are then way too unpredictable to test reliably in any kind of short time frame.
Anyways, looking at e1891cffd4c4, that is exactly where the issue seems to originate, so it makes total sense to me that it fixes this. I was already kind of looking at it when you confirmed it. Thanks also for pointing out the 383035211c79 patch; it could easily have been missed.
On Tue, Jun 20, 2023 at 09:56:50AM +0000, Janne Huttunen (Nokia) wrote:
Yes, sorry for the delay; I had a few technical problems testing your proposed patches. In the meantime, we found out that over a dozen of our test servers have had the same crash, some of them multiple times since the kernel update.
I don't consider this a delay; it was quite speedy.
Anyways, with your proposed patches on top of 4.19.286, I couldn't trigger the lockdep warning anymore even in a server that without the fixes triggers it very reliably right after the boot. I also saw in another very similar server (without the fixes) that it took almost 17 hours to get even the lockdep warning. Maybe some specific BMC behavior affects this or something? Sadly, that kind of diminishes the value of the short duration tests, but at least there has so far been zero lockdep warnings with the fixes applied. The actual lockups are then way too unpredictable to test reliably in any kind of short time frame.
It does depend on what you are doing to the driver, but it sounds like you are running the same software everywhere. I'm not sure; I've seen timing do strange things before.
Anyways, looking at e1891cffd4c4, that is exactly where the issue seems to originate, so it makes total sense to me that it fixes this. I was already kind of looking at it when you confirmed it. Thanks also for pointing out the 383035211c79 patch; it could easily have been missed.
Ok, thank you for testing. I'll prepare a stable kernel request.
-corey