The patch below does not apply to the 4.14-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to stable@vger.kernel.org.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 25524288631fc5b7d33259fca1e0dc38146be5d6 Mon Sep 17 00:00:00 2001
From: Chuck Lever chuck.lever@oracle.com Date: Mon, 19 Mar 2018 14:23:16 -0400 Subject: [PATCH] xprtrdma: Fix corner cases when handling device removal
Michal Kalderon has found some corner cases around device unload with active NFS mounts that I didn't have the imagination to test when xprtrdma device removal was added last year.
- The ULP device removal handler is responsible for deallocating the PD. That wasn't clear to me initially, and my own testing suggested it was not necessary, but that is incorrect.
- The transport destruction path can no longer assume that there is a valid ID.
- When destroying a transport, ensure that ib_free_cq() is not invoked on a CQ that was already released.
Reported-by: Michal Kalderon Michal.Kalderon@cavium.com Fixes: bebd031866ca ("xprtrdma: Support unplugging an HCA from ...") Signed-off-by: Chuck Lever chuck.lever@oracle.com Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c index 6a7a5a277e75..fe5eaca2d197 100644 --- a/net/sunrpc/xprtrdma/verbs.c +++ b/net/sunrpc/xprtrdma/verbs.c @@ -250,7 +250,6 @@ rpcrdma_conn_upcall(struct rdma_cm_id *id, struct rdma_cm_event *event) wait_for_completion(&ia->ri_remove_done);
ia->ri_id = NULL; - ia->ri_pd = NULL; ia->ri_device = NULL; /* Return 1 to ensure the core destroys the id. */ return 1; @@ -447,7 +446,9 @@ rpcrdma_ia_remove(struct rpcrdma_ia *ia) ia->ri_id->qp = NULL; } ib_free_cq(ep->rep_attr.recv_cq); + ep->rep_attr.recv_cq = NULL; ib_free_cq(ep->rep_attr.send_cq); + ep->rep_attr.send_cq = NULL;
/* The ULP is responsible for ensuring all DMA * mappings and MRs are gone. @@ -460,6 +461,8 @@ rpcrdma_ia_remove(struct rpcrdma_ia *ia) rpcrdma_dma_unmap_regbuf(req->rl_recvbuf); } rpcrdma_mrs_destroy(buf); + ib_dealloc_pd(ia->ri_pd); + ia->ri_pd = NULL;
/* Allow waiters to continue */ complete(&ia->ri_remove_done); @@ -627,14 +630,16 @@ rpcrdma_ep_destroy(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia) { cancel_delayed_work_sync(&ep->rep_connect_worker);
- if (ia->ri_id->qp) { + if (ia->ri_id && ia->ri_id->qp) { rpcrdma_ep_disconnect(ep, ia); rdma_destroy_qp(ia->ri_id); ia->ri_id->qp = NULL; }
- ib_free_cq(ep->rep_attr.recv_cq); - ib_free_cq(ep->rep_attr.send_cq); + if (ep->rep_attr.recv_cq) + ib_free_cq(ep->rep_attr.recv_cq); + if (ep->rep_attr.send_cq) + ib_free_cq(ep->rep_attr.send_cq); }
/* Re-establish a connection after a device removal event.