From: Daniel Wagner <wagi@kernel.org>
[ Upstream commit f2537be4f8421f6495edfa0bc284d722f253841d ]
When forcefully shutting down a port via the configfs interface, nvmet_port_subsys_drop_link() first calls nvmet_port_del_ctrls() and then nvmet_disable_port(). Both functions will eventually schedule all remaining associations for deletion.
The current implementation checks whether an association is about to be removed, but only after the work item has already been scheduled. As a result, it is possible for the first scheduled work item to free all resources, and then for the same work item to be scheduled again for deletion.
Because the association list is an RCU list, it is not possible to take a lock and remove the list entry directly, so it cannot be looked up again. Instead, a flag (terminating) must be used to determine whether the association is already in the process of being deleted.
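This is the classic claim-then-act pattern: atomic_xchg() returns the flag's previous value, so exactly one caller observes the 0 -> 1 transition and goes on to schedule the teardown, while every later caller backs off before ever touching the work item. Below is a minimal userspace sketch of the same pattern, using C11 stdatomic in place of the kernel's atomic_t, with a hypothetical run_teardown() standing in for queueing assoc->del_work:

```c
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical stand-in for the association being torn down. */
struct assoc {
	atomic_int terminating;		/* 0 = live, 1 = teardown claimed */
};

/* Hypothetical stand-in for queueing assoc->del_work. */
static void run_teardown(struct assoc *a)
{
	printf("teardown scheduled exactly once\n");
}

static void schedule_delete(struct assoc *a)
{
	/*
	 * atomic_exchange() returns the old value: only the caller that
	 * sees the 0 -> 1 transition may schedule; everyone else returns
	 * before touching the (possibly soon-to-be-freed) work item.
	 */
	if (atomic_exchange(&a->terminating, 1))
		return;

	run_teardown(a);
}

int main(void)
{
	struct assoc a = { .terminating = 0 };

	schedule_delete(&a);	/* claims the flag, schedules teardown */
	schedule_delete(&a);	/* sees terminating == 1, no-op */
	return 0;
}
```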
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Closes: https://lore.kernel.org/all/rsdinhafrtlguauhesmrrzkybpnvwantwmyfq2ih5areggha...
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Daniel Wagner <wagi@kernel.org>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM-generated explanation; may be completely bogus:
YES – this prevents a real use-after-free when an FC target port is torn down through configfs.
- `nvmet_port_subsys_drop_link()` first calls `nvmet_port_del_ctrls()` and then `nvmet_disable_port()` (`drivers/nvme/target/configfs.c:1088`, `drivers/nvme/target/core.c:301`), and both paths funnel into `__nvmet_fc_free_assocs()`, which queues `assoc->del_work` (`drivers/nvme/target/fc.c:1482`). A forced shutdown therefore schedules the same association cleanup twice (see the refcount sketch after this list).
- The guard that is supposed to stop duplicates only runs inside `nvmet_fc_delete_target_assoc()` after the work executes (`drivers/nvme/target/fc.c:1201`), so the second caller can still re-queue the work once the first invocation has freed the association, hitting the race reported on the mailing list.
- The patch simply flips the `terminating` flag before queueing (`drivers/nvme/target/fc.c:1076` in the new code) and removes the now-redundant check from the worker. That keeps the work from ever being queued a second time, exactly matching the original intent with no behavioural side effects.
- The change is tiny, isolated to the nvmet-fc transport, and has no dependencies beyond the existing `assoc->terminating` infrastructure (already present in supported stable series), so the risk of regression is minimal, while the bug being fixed can crash systems under administrative port removal.
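One more idiom worth noting in the hunk below: `queue_work()` returns false when the work item is already pending, so the reference taken on the tgtport ahead of queueing has to be dropped again in that case. A hedged userspace sketch of that idiom, with hypothetical `tgtport_get()`/`tgtport_put()` helpers standing in for `nvmet_fc_tgtport_get()`/`nvmet_fc_tgtport_put()` and a `fake_queue_work()` modelling `queue_work()`'s return value:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of a refcounted target port and its work item. */
struct tgtport {
	int refs;
	bool work_pending;	/* models the WORK_STRUCT_PENDING bit */
};

static void tgtport_get(struct tgtport *t) { t->refs++; }
static void tgtport_put(struct tgtport *t) { t->refs--; }

/* Models queue_work(): returns false if the work was already pending. */
static bool fake_queue_work(struct tgtport *t)
{
	if (t->work_pending)
		return false;
	t->work_pending = true;
	return true;
}

/*
 * Take a reference *before* queueing so the work item can safely use
 * the port, and give it back immediately if queue_work() reports the
 * work was already queued (the earlier, successful queueing already
 * holds a reference on the work's behalf).
 */
static void schedule_port_work(struct tgtport *t)
{
	tgtport_get(t);
	if (!fake_queue_work(t))
		tgtport_put(t);
}

int main(void)
{
	struct tgtport t = { .refs = 1, .work_pending = false };

	schedule_port_work(&t);	/* queues; reference held by the work */
	schedule_port_work(&t);	/* already pending: ref dropped again */
	printf("refs = %d (1 original + 1 held for the pending work)\n",
	       t.refs);
	return 0;
}
```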
If you're carrying stable trees that include the FC target (v6.10 and earlier, back to when `assoc->terminating` was introduced), you should pick this up; older branches without the later queue_work refcount patch just need the same flag move applied to their local `nvmet_fc_schedule_delete_assoc()`.
 drivers/nvme/target/fc.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/drivers/nvme/target/fc.c b/drivers/nvme/target/fc.c
index a9b18c051f5bd..249adb2811420 100644
--- a/drivers/nvme/target/fc.c
+++ b/drivers/nvme/target/fc.c
@@ -1075,6 +1075,14 @@ nvmet_fc_delete_assoc_work(struct work_struct *work)
 static void
 nvmet_fc_schedule_delete_assoc(struct nvmet_fc_tgt_assoc *assoc)
 {
+	int terminating;
+
+	terminating = atomic_xchg(&assoc->terminating, 1);
+
+	/* if already terminating, do nothing */
+	if (terminating)
+		return;
+
 	nvmet_fc_tgtport_get(assoc->tgtport);
 	if (!queue_work(nvmet_wq, &assoc->del_work))
 		nvmet_fc_tgtport_put(assoc->tgtport);
@@ -1202,13 +1210,7 @@ nvmet_fc_delete_target_assoc(struct nvmet_fc_tgt_assoc *assoc)
 {
 	struct nvmet_fc_tgtport *tgtport = assoc->tgtport;
 	unsigned long flags;
-	int i, terminating;
-
-	terminating = atomic_xchg(&assoc->terminating, 1);
-
-	/* if already terminating, do nothing */
-	if (terminating)
-		return;
+	int i;
 
 	spin_lock_irqsave(&tgtport->lock, flags);
 	list_del_rcu(&assoc->a_list);