Recent patch to prevent calling __nvme_fc_abort_outstanding_ios in interrupt context results in a possible race condition. A controller reset results in errored io completions, which schedules error work. The change of error work to a work element allows it to fire after the ctrl state transition to NVME_CTRL_CONNECTING, causing any outstanding io (used to initialize the controller) to fail and cause problems for connect_work.
Add a state check to only schedule error work if not in the RESETTING state.
Fixes: 19fce0470f05 ("nvme-fc: avoid calling _nvme_fc_abort_outstanding_ios from interrupt context") Cc: stable@vger.kernel.org # v5.10+
Signed-off-by: Nigel Kirkland nkirkland2304@gmail.com Signed-off-by: James Smart jsmart2021@gmail.com
--- v2: clean up typo in commit header --- drivers/nvme/host/fc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c index 20dadd86e981..0f92bd12123e 100644 --- a/drivers/nvme/host/fc.c +++ b/drivers/nvme/host/fc.c @@ -2055,7 +2055,7 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req *req) nvme_fc_complete_rq(rq);
check_error: - if (terminate_assoc) + if (terminate_assoc && ctrl->ctrl.state != NVME_CTRL_RESETTING) queue_work(nvme_reset_wq, &ctrl->ioerr_work); }
On Mon, 2021-03-08 at 16:51 -0800, James Smart wrote:
Recent patch to prevent calling __nvme_fc_abort_outstanding_ios in interrupt context results in a possible race condition. A controller reset results in errored io completions, which schedules error work. The change of error work to a work element allows it to fire after the ctrl state transition to NVME_CTRL_CONNECTING, causing any outstanding io (used to initialize the controller) to fail and cause problems for connect_work.
Add a state check to only schedule error work if not in the RESETTING state.
Fixes: 19fce0470f05 ("nvme-fc: avoid calling _nvme_fc_abort_outstanding_ios from interrupt context") Cc: stable@vger.kernel.org # v5.10+
Signed-off-by: Nigel Kirkland nkirkland2304@gmail.com Signed-off-by: James Smart jsmart2021@gmail.com
v2: clean up typo in commit header
drivers/nvme/host/fc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c index 20dadd86e981..0f92bd12123e 100644 --- a/drivers/nvme/host/fc.c +++ b/drivers/nvme/host/fc.c @@ -2055,7 +2055,7 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req *req) nvme_fc_complete_rq(rq); check_error:
- if (terminate_assoc)
- if (terminate_assoc && ctrl->ctrl.state != NVME_CTRL_RESETTING) queue_work(nvme_reset_wq, &ctrl->ioerr_work);
}
This fix resolves the frequent -EBUSY / -ENETRESET errors I saw when resetting the controller via sysfs, as well as the eventual hang with the controller stuck in the _CONNECTING state, thanks. Looks good.
-Ewan
linux-stable-mirror@lists.linaro.org