As &ndlp->lock is acquired by the timer callback lpfc_els_retry_delay() in softirq context, process context code that acquires &ndlp->lock must disable irq or bh. Otherwise a deadlock can occur if the timer preempts execution on the same CPU while the lock is held in process context.
The two lock acquisitions inside lpfc_cleanup_pending_mbox() do not disable irq or softirq.
[Deadlock Scenario]
lpfc_cmpl_els_fdisc()
    -> lpfc_cleanup_pending_mbox()
    -> spin_lock(&ndlp->lock);
        <irq>
        -> lpfc_els_retry_delay()
        -> lpfc_nlp_get()
        -> spin_lock_irqsave(&ndlp->lock, flags); (deadlock here)
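For readers less familiar with this class of bug, the scenario can be illustrated with a small userspace model. This is only a sketch under simplifying assumptions: a single CPU, one lock, and one timer event. All toy_* names are hypothetical and are not kernel or lpfc APIs; toy_lock() stands in for spin_lock() and toy_lock_irq() for spin_lock_irq().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-CPU model; every name here is hypothetical, not a kernel API. */
struct toy_cpu {
	bool irqs_enabled;	/* may the timer handler run right now? */
	bool lock_held;		/* is the modelled ndlp->lock held? */
	bool irq_pending;	/* timer expired while irqs were disabled */
	bool deadlocked;	/* handler would spin on a lock held below it */
	int handler_runs;	/* completed handler invocations */
};

/* Models the timer callback, which needs the lock. */
static void toy_timer_fire(struct toy_cpu *cpu)
{
	if (!cpu->irqs_enabled) {
		/* Deferred until irqs are enabled again. */
		cpu->irq_pending = true;
		return;
	}
	if (cpu->lock_held) {
		/*
		 * The holder is the interrupted context on this same CPU.
		 * It cannot run until the handler returns, and the handler
		 * cannot return until the lock is released.
		 */
		cpu->deadlocked = true;
		return;
	}
	cpu->handler_runs++;
}

/* Stand-in for spin_lock(): irqs stay enabled. */
static void toy_lock(struct toy_cpu *cpu)
{
	cpu->lock_held = true;
}

static void toy_unlock(struct toy_cpu *cpu)
{
	assert(cpu->lock_held);
	cpu->lock_held = false;
}

/* Stand-in for spin_lock_irq(): irqs are disabled before taking the lock. */
static void toy_lock_irq(struct toy_cpu *cpu)
{
	cpu->irqs_enabled = false;
	cpu->lock_held = true;
}

static void toy_unlock_irq(struct toy_cpu *cpu)
{
	assert(cpu->lock_held);
	cpu->lock_held = false;
	cpu->irqs_enabled = true;
	if (cpu->irq_pending) {
		/* The deferred timer now runs with the lock free. */
		cpu->irq_pending = false;
		toy_timer_fire(cpu);
	}
}

/*
 * Runs one critical section with the timer expiring in the middle.
 * Returns true if the model detects the deadlock.
 */
bool toy_run(bool disable_irqs)
{
	struct toy_cpu cpu = { .irqs_enabled = true };

	if (disable_irqs)
		toy_lock_irq(&cpu);
	else
		toy_lock(&cpu);

	toy_timer_fire(&cpu);	/* timer expires inside the critical section */
	if (cpu.deadlocked)
		return true;

	if (disable_irqs)
		toy_unlock_irq(&cpu);
	else
		toy_unlock(&cpu);
	return cpu.deadlocked;
}
```

In this model, toy_run(false) reports the deadlock, while toy_run(true) defers the timer until the lock has been dropped. The real kernel behaviour involves more state (softirq vs. hardirq, multiple CPUs), which the sketch deliberately leaves out.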
This flaw was found by an experimental static analysis tool I am developing for irq-related deadlocks.
This patch fixes the potential deadlock by using spin_lock_irq() to disable irq while the lock is held.
Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>
---
 drivers/scsi/lpfc/lpfc_sli.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index 58d10f8f75a7..8555f6bb9742 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -21049,9 +21049,9 @@ lpfc_cleanup_pending_mbox(struct lpfc_vport *vport)
 				mb->mbox_flag |= LPFC_MBX_IMED_UNREG;
 				restart_loop = 1;
 				spin_unlock_irq(&phba->hbalock);
-				spin_lock(&ndlp->lock);
+				spin_lock_irq(&ndlp->lock);
 				ndlp->nlp_flag &= ~NLP_IGNR_REG_CMPL;
-				spin_unlock(&ndlp->lock);
+				spin_unlock_irq(&ndlp->lock);
 				spin_lock_irq(&phba->hbalock);
 				break;
 			}
@@ -21067,9 +21067,9 @@ lpfc_cleanup_pending_mbox(struct lpfc_vport *vport)
 			ndlp = (struct lpfc_nodelist *)mb->ctx_ndlp;
 			mb->ctx_ndlp = NULL;
 			if (ndlp) {
-				spin_lock(&ndlp->lock);
+				spin_lock_irq(&ndlp->lock);
 				ndlp->nlp_flag &= ~NLP_IGNR_REG_CMPL;
-				spin_unlock(&ndlp->lock);
+				spin_unlock_irq(&ndlp->lock);
 				lpfc_nlp_put(ndlp);
 			}
 		}