On 21-05-24 10:56:23, Pawel Laszczak wrote:
On 21-05-20 11:42:24, Pawel Laszczak wrote:
From: Pawel Laszczak pawell@cadence.com
Patch fixes the critical issue caused by deadlock which has been detected during testing NCM class.
The root cause of issue is spin_lock/spin_unlock instruction instead spin_lock_irqsave/spin_lock_irqrestore in cdnsp_thread_irq_handler function.
Pawel, would you please explain more about why the deadlock occurs with current code, and why this patch could fix it?
I'm going to add such description to commit message:
Lack of spin_lock_irqsave causes that during handling threaded interrupt inside spin_lock/spin_unlock section the ethernet subsystem invokes eth_start_xmit function from interrupt context which in turn starts queueing free requests in cdnsp driver. Consequently, the deadlock occurs inside cdnsp_gadget_ep_queue on spin_lock_irqsave instruction. Replacing spin_lock/spin_unlock with spin_lock_irqsave/spin_lock_irqrestor causes disableing
%s/disableing/disabling
interrupts and blocks queuing requests by ethernet subsystem until cdnsp_thread_irq_handler ends..
I hope this description is sufficient.
A call stack graph may be better, like [1]
usb/msg211931.html__;!!EHscmS1ygiU1lA!WCxrsHL4OWKd3iPdyymp-dQlVCoiHM9QzjIZPUC6-Dm6cnpU5CRLLOHrgiYSRA$
After putting online extra cpus, I've manage to catch call trace showing the deadlock issue:
[ 223.051673] smp: csd: Detected non-responsive CSD lock (#1) on CPU#0, waiting 5000000015 ns for CPU#02 do_sync_core+0x0/0x30(0x0). [ 223.051690] smp: csd: CSD lock (#1) unresponsive. [ 223.051693] Sending NMI from CPU 0 to CPUs 2: [ 223.052690] NMI backtrace for cpu 2 [ 223.052692] CPU: 2 PID: 3053 Comm: irq/16-0000:01: Tainted: G OE 5.11.0-rc1+ #5 [ 223.052693] Hardware name: ASUS All Series/Q87T, BIOS 0908 07/22/2014 [ 223.052693] RIP: 0010:native_queued_spin_lock_slowpath+0x61/0x1d0 [ 223.052695] Code: 0f ba 2f 08 0f 92 c0 0f b6 c0 c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 75 47 85 c0 74 0e 8b 07 84 c0 74 08 f3 90 8b 07 <84> c0 75 f8 b8 01 00 00 00 5d 66 89 07 c3 8b 37 b8 00 02 00 00 81 [ 223.052696] RSP: 0018:ffffbc494011cde0 EFLAGS: 00000002 [ 223.052698] RAX: 0000000000000101 RBX: ffff9ee8116b4a68 RCX: 0000000000000000 [ 223.052699] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9ee8116b4658 [ 223.052700] RBP: ffffbc494011cde0 R08: 0000000000000001 R09: 0000000000000000 [ 223.052701] R10: ffff9ee8116b4670 R11: 0000000000000000 R12: ffff9ee8116b4658 [ 223.052702] R13: ffff9ee8116b4670 R14: 0000000000000246 R15: ffff9ee8116b4658 [ 223.052702] FS: 0000000000000000(0000) GS:ffff9ee89b400000(0000) knlGS:0000000000000000 [ 223.052703] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 223.052704] CR2: 00007f7bcc41a830 CR3: 000000007a612003 CR4: 00000000001706e0 [ 223.052705] Call Trace: [ 223.052706] <IRQ> [ 223.052706] do_raw_spin_lock+0xc0/0xd0 [ 223.052707] _raw_spin_lock_irqsave+0x95/0xa0 [ 223.052708] cdnsp_gadget_ep_queue.cold+0x88/0x107 [cdnsp_udc_pci] [ 223.052708] usb_ep_queue+0x35/0x110 [ 223.052709] eth_start_xmit+0x220/0x3d0 [u_ether] [ 223.052710] ncm_tx_timeout+0x34/0x40 [usb_f_ncm] [ 223.052711] ? ncm_free_inst+0x50/0x50 [usb_f_ncm] [ 223.052711] __hrtimer_run_queues+0xac/0x440 [ 223.052712] hrtimer_run_softirq+0x8c/0xb0 [ 223.052713] __do_softirq+0xcf/0x428 [ 223.052713] asm_call_irq_on_stack+0x12/0x20 [ 223.052714] </IRQ> [ 223.052715] do_softirq_own_stack+0x61/0x70 [ 223.052715] irq_exit_rcu+0xc1/0xd0 [ 223.052716] sysvec_apic_timer_interrupt+0x52/0xb0 [ 223.052717] asm_sysvec_apic_timer_interrupt+0x12/0x20 [ 223.052718] RIP: 0010:do_raw_spin_trylock+0x18/0x40 [ 223.052719] Code: ff ff 66 90 eb 86 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 55 8b 07 48 89 e5 85 c0 75 29 ba 01 00 00 00 f0 0f b1 17 <75> 1e 65 8b 05 2f 77 ee 70 89 47 08 5d 65 48 8b 04 25 40 7e 01 00 [ 223.052720] RSP: 0018:ffffbc494138bda8 EFLAGS: 00000246 [ 223.052721] RAX: 0000000000000000 RBX: ffff9ee8116b4658 RCX: 0000000000000000 [ 223.052722] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff9ee8116b4658 [ 223.052723] RBP: ffffbc494138bda8 R08: 0000000000000001 R09: 0000000000000000 [ 223.052724] R10: ffff9ee8116b4670 R11: 0000000000000000 R12: ffff9ee8116b4658 [ 223.052725] R13: ffff9ee8116b4670 R14: ffff9ee7b5c73d80 R15: ffff9ee8116b4000 [ 223.052726] _raw_spin_lock+0x3d/0x70 [ 223.052726] ? cdnsp_thread_irq_handler.cold+0x32/0x112c [cdnsp_udc_pci] [ 223.052727] cdnsp_thread_irq_handler.cold+0x32/0x112c [cdnsp_udc_pci] [ 223.052728] ? cdnsp_remove_request+0x1f0/0x1f0 [cdnsp_udc_pci] [ 223.052729] ? cdnsp_thread_irq_handler+0x5/0xa0 [cdnsp_udc_pci] [ 223.052730] ? irq_thread+0xa0/0x1c0 [ 223.052730] irq_thread_fn+0x28/0x60 [ 223.052731] irq_thread+0x105/0x1c0 [ 223.052732] ? __kthread_parkme+0x42/0x90 [ 223.052732] ? irq_forced_thread_fn+0x90/0x90 [ 223.052733] ? wake_threads_waitq+0x30/0x30 [ 223.052734] ? irq_thread_check_affinity+0xe0/0xe0 [ 223.052735] kthread+0x12a/0x160 [ 223.052735] ? kthread_park+0x90/0x90 [ 223.052736] ret_from_fork+0x22/0x30
Pawel
Peter
Thanks, Pawel
Cc: stable@vger.kernel.org Fixes: 3d82904559f4 ("usb: cdnsp: cdns3 Add main part of Cadence USBSSP DRD Driver") Signed-off-by: Pawel Laszczak pawell@cadence.com
drivers/usb/cdns3/cdnsp-ring.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/usb/cdns3/cdnsp-ring.c b/drivers/usb/cdns3/cdnsp-ring.c index 5f0513c96c04..68972746e363 100644 --- a/drivers/usb/cdns3/cdnsp-ring.c +++ b/drivers/usb/cdns3/cdnsp-ring.c @@ -1517,13 +1517,14 @@ irqreturn_t cdnsp_thread_irq_handler(int irq, void *data) { struct cdnsp_device *pdev = (struct cdnsp_device *)data; union cdnsp_trb *event_ring_deq;
- unsigned long flags; int counter = 0;
- spin_lock(&pdev->lock);
spin_lock_irqsave(&pdev->lock, flags);
if (pdev->cdnsp_state & (CDNSP_STATE_HALTED | CDNSP_STATE_DYING)) { cdnsp_died(pdev);
spin_unlock(&pdev->lock);
return IRQ_HANDLED; }spin_unlock_irqrestore(&pdev->lock, flags);
@@ -1539,7 +1540,7 @@ irqreturn_t cdnsp_thread_irq_handler(int irq, void *data)
cdnsp_update_erst_dequeue(pdev, event_ring_deq, 1);
- spin_unlock(&pdev->lock);
spin_unlock_irqrestore(&pdev->lock, flags);
return IRQ_HANDLED;
}
2.25.1
--
Thanks, Peter Chen
--
Thanks, Peter Chen