From: "Chunguang.xu" chunguang.xu@shopee.com
[ Upstream commit 5858b687559809f05393af745cbadf06dee61295 ]
Kernel will hang on destroy admin_q while we create ctrl failed, such as following calltrace:
PID: 23644 TASK: ff2d52b40f439fc0 CPU: 2 COMMAND: "nvme" #0 [ff61d23de260fb78] __schedule at ffffffff8323bc15 #1 [ff61d23de260fc08] schedule at ffffffff8323c014 #2 [ff61d23de260fc28] blk_mq_freeze_queue_wait at ffffffff82a3dba1 #3 [ff61d23de260fc78] blk_freeze_queue at ffffffff82a4113a #4 [ff61d23de260fc90] blk_cleanup_queue at ffffffff82a33006 #5 [ff61d23de260fcb0] nvme_rdma_destroy_admin_queue at ffffffffc12686ce #6 [ff61d23de260fcc8] nvme_rdma_setup_ctrl at ffffffffc1268ced #7 [ff61d23de260fd28] nvme_rdma_create_ctrl at ffffffffc126919b #8 [ff61d23de260fd68] nvmf_dev_write at ffffffffc024f362 #9 [ff61d23de260fe38] vfs_write at ffffffff827d5f25 RIP: 00007fda7891d574 RSP: 00007ffe2ef06958 RFLAGS: 00000202 RAX: ffffffffffffffda RBX: 000055e8122a4d90 RCX: 00007fda7891d574 RDX: 000000000000012b RSI: 000055e8122a4d90 RDI: 0000000000000004 RBP: 00007ffe2ef079c0 R8: 000000000000012b R9: 000055e8122a4d90 R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000004 R13: 000055e8122923c0 R14: 000000000000012b R15: 00007fda78a54500 ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
This due to we have quiesced admi_q before cancel requests, but forgot to unquiesce before destroy it, as a result we fail to drain the pending requests, and hang on blk_mq_freeze_queue_wait() forever. Here try to reuse nvme_rdma_teardown_admin_queue() to fix this issue and simplify the code.
Fixes: 958dc1d32c80 ("nvme-rdma: add clean action for failed reconnection") Reported-by: Yingfu.zhou yingfu.zhou@shopee.com Signed-off-by: Chunguang.xu chunguang.xu@shopee.com Signed-off-by: Yue.zhao yue.zhao@shopee.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Hannes Reinecke hare@suse.de Signed-off-by: Keith Busch kbusch@kernel.org [Minor context change fixed] Signed-off-by: Feng Liu Feng.Liu3@windriver.com Signed-off-by: He Zhe Zhe.He@windriver.com --- Verified the build test. --- drivers/nvme/host/rdma.c | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c index c04317a966b3..055b95d2ce93 100644 --- a/drivers/nvme/host/rdma.c +++ b/drivers/nvme/host/rdma.c @@ -1083,13 +1083,7 @@ static int nvme_rdma_setup_ctrl(struct nvme_rdma_ctrl *ctrl, bool new) nvme_rdma_free_io_queues(ctrl); } destroy_admin: - nvme_quiesce_admin_queue(&ctrl->ctrl); - blk_sync_queue(ctrl->ctrl.admin_q); - nvme_rdma_stop_queue(&ctrl->queues[0]); - nvme_cancel_admin_tagset(&ctrl->ctrl); - if (new) - nvme_remove_admin_tag_set(&ctrl->ctrl); - nvme_rdma_destroy_admin_queue(ctrl); + nvme_rdma_teardown_admin_queue(ctrl, new); return ret; }
[ Sasha's backport helper bot ]
Hi,
✅ All tests passed successfully. No issues detected. No action required from the submitter.
The upstream commit SHA1 provided is correct: 5858b687559809f05393af745cbadf06dee61295
WARNING: Author mismatch between patch and upstream commit: Backport author: Feng LiuFeng.Liu3@windriver.com Commit author: Chunguang.xuchunguang.xu@shopee.com
Status in newer kernel trees: 6.14.y | Present (exact SHA1) 6.13.y | Present (exact SHA1) 6.12.y | Present (different SHA1: 05b436f3cf65)
Note: The patch differs from the upstream commit: --- 1: 5858b68755980 ! 1: 4d596a05770c2 nvme-rdma: unquiesce admin_q before destroy it @@ Metadata ## Commit message ## nvme-rdma: unquiesce admin_q before destroy it
+ [ Upstream commit 5858b687559809f05393af745cbadf06dee61295 ] + Kernel will hang on destroy admin_q while we create ctrl failed, such as following calltrace:
@@ Commit message Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Hannes Reinecke hare@suse.de Signed-off-by: Keith Busch kbusch@kernel.org + [Minor context change fixed] + Signed-off-by: Feng Liu Feng.Liu3@windriver.com + Signed-off-by: He Zhe Zhe.He@windriver.com
## drivers/nvme/host/rdma.c ## @@ drivers/nvme/host/rdma.c: static int nvme_rdma_setup_ctrl(struct nvme_rdma_ctrl *ctrl, bool new) + nvme_rdma_free_io_queues(ctrl); } destroy_admin: - nvme_stop_keep_alive(&ctrl->ctrl); - nvme_quiesce_admin_queue(&ctrl->ctrl); - blk_sync_queue(ctrl->ctrl.admin_q); - nvme_rdma_stop_queue(&ctrl->queues[0]); ---
Results of testing on various branches:
| Branch | Patch Apply | Build Test | |---------------------------|-------------|------------| | stable/linux-6.6.y | Success | Success |
linux-stable-mirror@lists.linaro.org