From: Ronald Wahl <ronald.wahl@raritan.com>
When SMP is enabled and spinlocks are actually functional then there is a deadlock with the 'statelock' spinlock between ks8851_start_xmit_spi and ks8851_irq:
watchdog: BUG: soft lockup - CPU#0 stuck for 27s!
call trace:
  queued_spin_lock_slowpath+0x100/0x284
  do_raw_spin_lock+0x34/0x44
  ks8851_start_xmit_spi+0x30/0xb8
  ks8851_start_xmit+0x14/0x20
  netdev_start_xmit+0x40/0x6c
  dev_hard_start_xmit+0x6c/0xbc
  sch_direct_xmit+0xa4/0x22c
  __qdisc_run+0x138/0x3fc
  qdisc_run+0x24/0x3c
  net_tx_action+0xf8/0x130
  handle_softirqs+0x1ac/0x1f0
  __do_softirq+0x14/0x20
  ____do_softirq+0x10/0x1c
  call_on_irq_stack+0x3c/0x58
  do_softirq_own_stack+0x1c/0x28
  __irq_exit_rcu+0x54/0x9c
  irq_exit_rcu+0x10/0x1c
  el1_interrupt+0x38/0x50
  el1h_64_irq_handler+0x18/0x24
  el1h_64_irq+0x64/0x68
  __netif_schedule+0x6c/0x80
  netif_tx_wake_queue+0x38/0x48
  ks8851_irq+0xb8/0x2c8
  irq_thread_fn+0x2c/0x74
  irq_thread+0x10c/0x1b0
  kthread+0xc8/0xd8
  ret_from_fork+0x10/0x20
This issue has not been identified earlier because tests were done on a device with SMP disabled and so spinlocks were actually NOPs.
This commit moves the netif_wake_queue call outside the spinlock protected area.
Fixes: 3dc5d4454545 ("net: ks8851: Fix TX stall caused by TX buffer overrun")
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Simon Horman <horms@kernel.org>
Cc: netdev@vger.kernel.org
Cc: stable@vger.kernel.org # 5.10+
Signed-off-by: Ronald Wahl <ronald.wahl@raritan.com>
---
 drivers/net/ethernet/micrel/ks8851_common.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/micrel/ks8851_common.c b/drivers/net/ethernet/micrel/ks8851_common.c
index 6453c92f0fa7..60b959126b26 100644
--- a/drivers/net/ethernet/micrel/ks8851_common.c
+++ b/drivers/net/ethernet/micrel/ks8851_common.c
@@ -348,15 +348,17 @@ static irqreturn_t ks8851_irq(int irq, void *_ks)
 
 	if (status & IRQ_TXI) {
 		unsigned short tx_space = ks8851_rdreg16(ks, KS_TXMIR);
+		bool need_wake_queue;
 
 		netif_dbg(ks, intr, ks->netdev,
 			  "%s: txspace %d\n", __func__, tx_space);
 
 		spin_lock(&ks->statelock);
 		ks->tx_space = tx_space;
-		if (netif_queue_stopped(ks->netdev))
-			netif_wake_queue(ks->netdev);
+		need_wake_queue = netif_queue_stopped(ks->netdev);
 		spin_unlock(&ks->statelock);
+		if (need_wake_queue)
+			netif_wake_queue(ks->netdev);
 	}
 
 	if (status & IRQ_SPIBEI) {
-- 
2.45.2
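The pattern in this patch (sample the condition under the lock, act on it only after unlocking) can be sketched in userspace roughly as follows. This is only an illustration under stated assumptions: `struct toy_dev`, `toy_wake_queue()`, `toy_tx_done()` and `toy_demo()` are hypothetical stand-ins, a pthread mutex replaces the `statelock` spinlock, and `pthread_mutex_trylock()` is used merely to detect whether the wake path was entered with the lock still held.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver state; not the real struct ks8851_net. */
struct toy_dev {
	pthread_mutex_t statelock;
	unsigned int tx_space;
	bool queue_stopped;
	bool wake_ran_under_lock;	/* set if the wake path saw statelock held */
};

/*
 * Stand-in for netif_wake_queue(). It records whether it was entered while
 * statelock was held, which is the situation that can end in the xmit path
 * spinning on the same lock.
 */
static void toy_wake_queue(struct toy_dev *d)
{
	int rc = pthread_mutex_trylock(&d->statelock);

	if (rc == EBUSY) {
		d->wake_ran_under_lock = true;
	} else {
		assert(rc == 0);
		pthread_mutex_unlock(&d->statelock);
	}
	d->queue_stopped = false;
}

/* Mirrors the patched IRQ_TXI branch: decide under the lock, wake after it. */
static void toy_tx_done(struct toy_dev *d, unsigned int tx_space)
{
	bool need_wake_queue;

	pthread_mutex_lock(&d->statelock);
	d->tx_space = tx_space;
	need_wake_queue = d->queue_stopped;
	pthread_mutex_unlock(&d->statelock);

	if (need_wake_queue)
		toy_wake_queue(d);
}

/* Returns true when the queue was woken without the lock being held. */
static bool toy_demo(void)
{
	struct toy_dev d = {
		.statelock = PTHREAD_MUTEX_INITIALIZER,
		.queue_stopped = true,
	};

	toy_tx_done(&d, 1024);
	return !d.wake_ran_under_lock && !d.queue_stopped && d.tx_space == 1024;
}
```

Calling `toy_demo()` exercises the patched ordering once; moving the `toy_wake_queue()` call back inside the locked region would set `wake_ran_under_lock`.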
On Wed, 3 Jul 2024 18:00:53 +0200 Ronald Wahl wrote:
> +		bool need_wake_queue;
>
>  		netif_dbg(ks, intr, ks->netdev,
>  			  "%s: txspace %d\n", __func__, tx_space);
>
>  		spin_lock(&ks->statelock);
>  		ks->tx_space = tx_space;
> -		if (netif_queue_stopped(ks->netdev))
> -			netif_wake_queue(ks->netdev);
> +		need_wake_queue = netif_queue_stopped(ks->netdev);
>  		spin_unlock(&ks->statelock);
> +		if (need_wake_queue)
> +			netif_wake_queue(ks->netdev);
xmit runs in BH, so this is just one way you can hit this deadlock. A better fix would be to make sure statelock is always taken using spin_lock_bh().
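To illustrate why taking the lock with bottom halves disabled closes the whole class of deadlocks rather than one call site, here is a toy single-CPU model. It is a sketch, not kernel code: every `toy_*` name is hypothetical, and it only mimics the behaviour that `spin_lock_bh()`/`spin_unlock_bh()` defer softirq processing while the lock is held. In the model, a hard interrupt arrives while the IRQ thread holds the lock and raises the TX softirq, which then runs the xmit path.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-CPU state; all names here are hypothetical. */
struct toy_cpu {
	int bh_disable_depth;	/* > 0 means softirqs are deferred */
	bool softirq_pending;
	bool lock_held;		/* models statelock */
	bool deadlocked;	/* xmit found the lock held by the code it interrupted */
};

/* Models the xmit path, which needs statelock and runs in softirq context. */
static void toy_xmit(struct toy_cpu *c)
{
	if (c->lock_held)
		c->deadlocked = true;	/* the real code would spin forever here */
}

/* Runs the pending softirq only if bottom halves are enabled. */
static void toy_run_softirq(struct toy_cpu *c)
{
	if (c->softirq_pending && c->bh_disable_depth == 0) {
		c->softirq_pending = false;
		toy_xmit(c);
	}
}

/* A hard interrupt raises the TX softirq and tries to run it on exit. */
static void toy_hardirq(struct toy_cpu *c)
{
	c->softirq_pending = true;
	toy_run_softirq(c);
}

static void toy_spin_lock(struct toy_cpu *c)   { c->lock_held = true; }
static void toy_spin_unlock(struct toy_cpu *c) { c->lock_held = false; }

static void toy_spin_lock_bh(struct toy_cpu *c)
{
	c->bh_disable_depth++;
	c->lock_held = true;
}

static void toy_spin_unlock_bh(struct toy_cpu *c)
{
	c->lock_held = false;
	assert(c->bh_disable_depth > 0);
	c->bh_disable_depth--;
	toy_run_softirq(c);	/* deferred softirq runs once the lock is dropped */
}

/* Returns true if the modelled IRQ thread would have deadlocked. */
static bool toy_irq_thread(bool use_bh)
{
	struct toy_cpu c = { 0 };

	if (use_bh)
		toy_spin_lock_bh(&c);
	else
		toy_spin_lock(&c);

	toy_hardirq(&c);	/* interrupt lands while the lock is held */

	if (use_bh)
		toy_spin_unlock_bh(&c);
	else
		toy_spin_unlock(&c);

	return c.deadlocked;
}
```

In this model `toy_irq_thread(false)` reports the deadlock, while `toy_irq_thread(true)` defers the xmit until after the unlock.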
Thanks, I made a v2.
I now also found another potential TX stall issue caused by improper locking. In ks8851_tx_work we need to move
last = skb_queue_empty(&ks->txq);
under the lock; otherwise we risk a TX stall when the queue was empty at the time of the check but has been completely filled while we were waiting for the lock. I need to double check this scenario first. If it is indeed an issue then I will provide a separate patch later.
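The suspected race can be sketched in userspace as below. This is a hedged illustration of the scenario as described, not a confirmed reproduction: `struct toy_txq` and the `toy_*` functions are hypothetical, a pthread mutex stands in for the driver lock, and the `while_waiting` callback models another context queueing packets between the emptiness check and the lock acquisition.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the TX queue and its lock. */
struct toy_txq {
	pthread_mutex_t lock;
	unsigned int len;	/* number of queued packets */
};

/* Models xmit filling the queue from another context. */
static void toy_fill(struct toy_txq *q)
{
	int rc = pthread_mutex_lock(&q->lock);

	assert(rc == 0);
	q->len = 4;
	pthread_mutex_unlock(&q->lock);
}

/* Racy variant: emptiness is sampled before the lock is taken. */
static bool toy_last_racy(struct toy_txq *q,
			  void (*while_waiting)(struct toy_txq *))
{
	bool last = (q->len == 0);

	if (while_waiting != NULL)
		while_waiting(q);	/* queue changes while we wait for the lock */

	pthread_mutex_lock(&q->lock);
	/* a stale 'last == true' here would skip the dequeue loop */
	pthread_mutex_unlock(&q->lock);
	return last;
}

/* Proposed variant: emptiness is sampled with the lock held. */
static bool toy_last_locked(struct toy_txq *q,
			    void (*while_waiting)(struct toy_txq *))
{
	bool last;

	if (while_waiting != NULL)
		while_waiting(q);

	pthread_mutex_lock(&q->lock);
	last = (q->len == 0);
	pthread_mutex_unlock(&q->lock);
	return last;
}
```

With `toy_fill` as the callback, the racy variant still reports an empty queue although four packets are queued, whereas the locked variant sees them.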
On 04.07.24 16:44, Jakub Kicinski wrote:
> On Wed, 3 Jul 2024 18:00:53 +0200 Ronald Wahl wrote:
>> +		bool need_wake_queue;
>>
>>  		netif_dbg(ks, intr, ks->netdev,
>>  			  "%s: txspace %d\n", __func__, tx_space);
>>
>>  		spin_lock(&ks->statelock);
>>  		ks->tx_space = tx_space;
>> -		if (netif_queue_stopped(ks->netdev))
>> -			netif_wake_queue(ks->netdev);
>> +		need_wake_queue = netif_queue_stopped(ks->netdev);
>>  		spin_unlock(&ks->statelock);
>> +		if (need_wake_queue)
>> +			netif_wake_queue(ks->netdev);
>
> xmit runs in BH, so this is just one way you can hit this deadlock. A better fix would be to make sure statelock is always taken using spin_lock_bh().