From: Ming Lei <ming.lei@redhat.com>
Move start_freeze into nvme_tcp_configure_io_queues(); there are at least two benefits:

1) fixes unbalanced freeze and unfreeze, since the re-connection work may fail or be broken by removal

2) IO during error recovery can fail fast, because nvme fabrics unquiesces the queues after teardown

One side effect is that a !mpath request may time out during connecting because of the queue topology change, but that does not look like a big deal:

1) the same problem exists with the current code base

2) compared with !mpath, the mpath use case is dominant
Fixes: 2875b0aecabe ("nvme-tcp: fix controller reset hang during traffic")
Cc: stable@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 drivers/nvme/host/tcp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 96d8d7844e84..c2e037644ad1 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1882,6 +1882,7 @@ static int nvme_tcp_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
 		goto out_cleanup_connect_q;
 
 	if (!new) {
+		nvme_start_freeze(ctrl);
 		nvme_start_queues(ctrl);
 		if (!nvme_wait_freeze_timeout(ctrl, NVME_IO_TIMEOUT)) {
 			/*
@@ -1890,6 +1891,7 @@ static int nvme_tcp_configure_io_queues(struct nvme_ctrl *ctrl, bool new)
 			 * to be safe.
 			 */
 			ret = -ENODEV;
+			nvme_unfreeze(ctrl);
 			goto out_wait_freeze_timed_out;
 		}
 		blk_mq_update_nr_hw_queues(ctrl->tagset,
@@ -2008,7 +2010,6 @@ static void nvme_tcp_teardown_io_queues(struct nvme_ctrl *ctrl,
 	if (ctrl->queue_count <= 1)
 		return;
 	blk_mq_quiesce_queue(ctrl->admin_q);
-	nvme_start_freeze(ctrl);
 	nvme_stop_queues(ctrl);
 	nvme_sync_io_queues(ctrl);
 	nvme_tcp_stop_io_queues(ctrl);
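The unbalanced freeze/unfreeze that this patch fixes can be illustrated with a small stand-alone model. This is a deliberately simplified userspace sketch, not kernel code: freeze_depth, toy_start_freeze(), old_recovery() and new_recovery() are invented names standing in for the real freeze reference count and for the teardown/reconnect paths.

#include <stdio.h>

/* Toy stand-in for the controller-wide request-queue freeze depth.
 * All names here are made up for illustration; this is not kernel code. */
static int freeze_depth;

static void toy_start_freeze(void) { freeze_depth++; }
static void toy_unfreeze(void)     { freeze_depth--; }

/* Old flow: teardown_io_queues() always freezes, but only a successful
 * re-connection ever unfreezes, so every failed or interrupted reconnect
 * attempt leaks one freeze reference. */
static void old_recovery(int reconnect_ok)
{
	toy_start_freeze();		/* was in teardown_io_queues() */
	if (reconnect_ok)
		toy_unfreeze();		/* only reached on success */
}

/* New flow: configure_io_queues() takes the freeze and drops it on both
 * the success path and the timeout/error path, so every attempt is
 * balanced. */
static void new_recovery(void)
{
	toy_start_freeze();		/* now in configure_io_queues() */
	toy_unfreeze();			/* success or error path */
}

int main(void)
{
	int i;

	for (i = 0; i < 3; i++)
		old_recovery(0);	/* three failed reconnect attempts */
	printf("old flow after 3 failures: freeze depth = %d (leaked)\n",
	       freeze_depth);

	freeze_depth = 0;
	for (i = 0; i < 3; i++)
		new_recovery();
	printf("new flow after 3 failures: freeze depth = %d (balanced)\n",
	       freeze_depth);
	return 0;
}

Compiled and run, the old flow ends at depth 3 after three failed reconnect attempts while the new flow ends at 0, which is the imbalance the patch removes.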
From: Ming Lei <ming.lei@redhat.com>
Move start_freeze into nvme_rdma_configure_io_queues(); there are at least two benefits:

1) fixes unbalanced freeze and unfreeze, since the re-connection work may fail or be broken by removal

2) IO during error recovery can fail fast, because nvme fabrics unquiesces the queues after teardown

One side effect is that a !mpath request may time out during connecting because of the queue topology change, but that does not look like a big deal:

1) the same problem exists with the current code base

2) compared with !mpath, the mpath use case is dominant
Fixes: 9f98772ba307 ("nvme-rdma: fix controller reset hang during traffic")
Cc: stable@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 drivers/nvme/host/rdma.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 2db9c166a1b7..b76e1d4adcc7 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -989,6 +989,7 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
 		goto out_cleanup_connect_q;
 
 	if (!new) {
+		nvme_start_freeze(&ctrl->ctrl);
 		nvme_start_queues(&ctrl->ctrl);
 		if (!nvme_wait_freeze_timeout(&ctrl->ctrl, NVME_IO_TIMEOUT)) {
 			/*
@@ -997,6 +998,7 @@ static int nvme_rdma_configure_io_queues(struct nvme_rdma_ctrl *ctrl, bool new)
 			 * to be safe.
 			 */
 			ret = -ENODEV;
+			nvme_unfreeze(&ctrl->ctrl);
 			goto out_wait_freeze_timed_out;
 		}
 		blk_mq_update_nr_hw_queues(ctrl->ctrl.tagset,
@@ -1038,7 +1040,6 @@ static void nvme_rdma_teardown_io_queues(struct nvme_rdma_ctrl *ctrl,
 		bool remove)
 {
 	if (ctrl->ctrl.queue_count > 1) {
-		nvme_start_freeze(&ctrl->ctrl);
 		nvme_stop_queues(&ctrl->ctrl);
 		nvme_sync_io_queues(&ctrl->ctrl);
 		nvme_rdma_stop_io_queues(ctrl);
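The second stated benefit, IO failing fast during error recovery, follows from the same move. Below is a similarly simplified userspace sketch (struct toy_queue and submit_during_recovery() are invented for illustration and do not reflect the real block-layer API) of why a queue that is unquiesced but still frozen blocks new IO, while an unfrozen one lets it fail over:

#include <stdio.h>
#include <stdbool.h>

/* Toy model of what happens to a request submitted while the controller
 * is reconnecting.  The two flags loosely mirror blk-mq semantics:
 * freeze blocks new submitters at queue entry, quiesce only pauses
 * dispatch.  Names and behaviour are simplified for illustration. */
struct toy_queue {
	bool frozen;
	bool quiesced;
};

static const char *submit_during_recovery(const struct toy_queue *q)
{
	if (q->frozen)
		return "blocks until unfreeze; multipath cannot fail over";
	if (q->quiesced)
		return "sits in the queue until unquiesce";
	/* Neither frozen nor quiesced: the request reaches the (dead)
	 * transport, fails quickly, and mpath can retry another path. */
	return "fails fast; mpath retries on another path";
}

int main(void)
{
	/* Old teardown froze and quiesced; fabrics unquiesces afterwards,
	 * but the queue stays frozen until a successful reconnect. */
	struct toy_queue old_flow = { .frozen = true,  .quiesced = false };
	/* New teardown only quiesces; after the post-teardown unquiesce
	 * nothing holds new IO back. */
	struct toy_queue new_flow = { .frozen = false, .quiesced = false };

	printf("old flow: IO %s\n", submit_during_recovery(&old_flow));
	printf("new flow: IO %s\n", submit_during_recovery(&new_flow));
	return 0;
}

With the freeze removed from teardown, the post-teardown unquiesce is enough for new requests to reach the transport and fail quickly, so multipath can retry them on another path.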
On Sun, Aug 13, 2023 at 05:45:09PM +0300, Sagi Grimberg wrote:
> From: Ming Lei <ming.lei@redhat.com>
>
> Move start_freeze into nvme_tcp_configure_io_queues(); there are at least
> two benefits:
>
> - fixes unbalanced freeze and unfreeze, since the re-connection work may
>   fail or be broken by removal
>
> - IO during error recovery can fail fast, because nvme fabrics
>   unquiesces the queues after teardown
>
> One side effect is that a !mpath request may time out during connecting
> because of the queue topology change, but that does not look like a big
> deal:
>
> - the same problem exists with the current code base
>
> - compared with !mpath, the mpath use case is dominant
>
> Fixes: 2875b0aecabe ("nvme-tcp: fix controller reset hang during traffic")
> Cc: stable@vger.kernel.org
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> Tested-by: Yi Zhang <yi.zhang@redhat.com>
> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>  drivers/nvme/host/tcp.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
All now queued up, thanks.
greg k-h