From: Justin Tee <justin.tee@broadcom.com>
[ Upstream commit a4809b98eb004fcbf7c4d45eb5a624d1c682bb73 ]
In lpfc_cleanup, there is an extraneous nlp_put for NPIV ports on the F_Port_Ctrl ndlp object. In cases when an ABTS is issued, the outstanding kref is needed for when a second XRI_ABORTED CQE is received. The final kref for the ndlp is designed to be decremented in lpfc_sli4_els_xri_aborted instead. Also, add a new log message to allow for future diagnostics when debugging related issues.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Message-ID: <20250915180811.137530-5-justintee8345@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
YES

**Why Backport**
- `lpfc_cleanup` no longer drops the last reference for NPIV fabric nodes up front, so those entries now flow through the normal discovery teardown path instead of triggering a premature `lpfc_nlp_put()` (drivers/scsi/lpfc/lpfc_init.c:3059-3076). The old short-circuit let the ndlp hit a zero kref while exchanges were still draining.
- During ABTS processing each aborted ELS exchange takes its own reference (`lpfc_nlp_get`) before parking on `lpfc_abts_els_sgl_list`; if the cleanup path has already removed the final reference, the second `XRI_ABORTED` CQE arrives with a dangling pointer, leading to crashes or RRQ cleanup failures (drivers/scsi/lpfc/lpfc_sli.c:1399-1407).
- The final release of those references is already centralized in `lpfc_sli4_els_xri_aborted`, so keeping the extra kref alive until the CQE arrives matches the intended life cycle; the new log line simply aids diagnosis (drivers/scsi/lpfc/lpfc_els.c:12020-12029). Any ndlp that never sees its CQE still gets dropped by `lpfc_sli4_vport_delete_els_xri_aborted` when the vport is torn down (drivers/scsi/lpfc/lpfc_els.c:11953-11979).

**Risk**
- Change is tightly scoped to the lpfc driver, removes an overzealous `kref_put`, and relies on existing cleanup paths; no API shifts or cross-subsystem dependencies. Impact of not backporting is a real NPIV crash/UAF when ABTS races with vport removal, so the bug fix outweighs the low regression risk.
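
The reference life cycle described under **Why Backport** can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct model_node`, `model_put` and `node_alive_at_abort_completion` are hypothetical names invented for illustration, the plain counter stands in for the kernel's kref, and the single starting reference is a simplification rather than the driver's real accounting.

```c
#include <assert.h>

/* Hypothetical stand-in for an lpfc node (ndlp); not the driver's struct. */
struct model_node {
	int refcount;	/* models the kref count */
	int freed;	/* set once the count drops to zero */
};

/* Drop one reference and mark the node freed when the count hits zero. */
static void model_put(struct model_node *n)
{
	assert(n->refcount > 0);
	if (--n->refcount == 0)
		n->freed = 1;
}

/*
 * Replay vport cleanup followed by a late abort completion.
 * extra_put_in_cleanup != 0 models the branch removed from lpfc_cleanup.
 * Returns 1 if the node is still alive when the completion handler
 * looks at it, 0 if it has already been released (the UAF case).
 */
static int node_alive_at_abort_completion(int extra_put_in_cleanup)
{
	/* Assumption: one outstanding reference, reserved for the completion. */
	struct model_node n = { .refcount = 1, .freed = 0 };
	int alive;

	/* Cleanup path: the old code dropped the reference here. */
	if (extra_put_in_cleanup)
		model_put(&n);

	/* Late XRI_ABORTED completion: the handler dereferences the node. */
	alive = !n.freed;
	if (alive)
		model_put(&n);	/* the intended final put */

	return alive;
}
```

In this model `node_alive_at_abort_completion(1)` returns 0, because the early put has already released the node, while `node_alive_at_abort_completion(0)` returns 1 and the completion handler performs the final put itself.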
 drivers/scsi/lpfc/lpfc_els.c  | 6 +++++-
 drivers/scsi/lpfc/lpfc_init.c | 7 -------
 2 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_els.c b/drivers/scsi/lpfc/lpfc_els.c
index 4c405bade4f34..3f703932b2f07 100644
--- a/drivers/scsi/lpfc/lpfc_els.c
+++ b/drivers/scsi/lpfc/lpfc_els.c
@@ -12013,7 +12013,11 @@ lpfc_sli4_els_xri_aborted(struct lpfc_hba *phba,
 			sglq_entry->state = SGL_FREED;
 			spin_unlock_irqrestore(&phba->sli4_hba.sgl_list_lock,
 					       iflag);
-
+			lpfc_printf_log(phba, KERN_INFO, LOG_ELS | LOG_SLI |
+					LOG_DISCOVERY | LOG_NODE,
+					"0732 ELS XRI ABORT on Node: ndlp=x%px "
+					"xri=x%x\n",
+					ndlp, xri);
 			if (ndlp) {
 				lpfc_set_rrq_active(phba, ndlp,
 					sglq_entry->sli4_lxritag,
diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index 4081d2a358eee..f7824266db5e8 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -3057,13 +3057,6 @@ lpfc_cleanup(struct lpfc_vport *vport)
 		lpfc_vmid_vport_cleanup(vport);
 
 	list_for_each_entry_safe(ndlp, next_ndlp, &vport->fc_nodes, nlp_listp) {
-		if (vport->port_type != LPFC_PHYSICAL_PORT &&
-		    ndlp->nlp_DID == Fabric_DID) {
-			/* Just free up ndlp with Fabric_DID for vports */
-			lpfc_nlp_put(ndlp);
-			continue;
-		}
-
 		if (ndlp->nlp_DID == Fabric_Cntl_DID &&
 		    ndlp->nlp_state == NLP_STE_UNUSED_NODE) {
 			lpfc_nlp_put(ndlp);