On Mon, Sep 30, 2019 at 04:16:54PM -0700, Bart Van Assche wrote:
This patch fixes the lock inversion complaint:
============================================ WARNING: possible recursive locking detected 5.3.0-rc7-dbg+ #1 Not tainted kworker/u16:6/171 is trying to acquire lock: 00000000035c6e6c (&id_priv->handler_mutex){+.+.}, at: rdma_destroy_id+0x78/0x4a0 [rdma_cm]
but task is already holding lock: 00000000bc7c307d (&id_priv->handler_mutex){+.+.}, at: iw_conn_req_handler+0x151/0x680 [rdma_cm]
other info that might help us debug this: Possible unsafe locking scenario:
CPU0
lock(&id_priv->handler_mutex); lock(&id_priv->handler_mutex);
*** DEADLOCK ***
May be due to missing lock nesting notation
3 locks held by kworker/u16:6/171: #0: 00000000e2eaa773 ((wq_completion)iw_cm_wq){+.+.}, at: process_one_work+0x472/0xac0 #1: 000000001efd357b ((work_completion)(&work->work)#3){+.+.}, at: process_one_work+0x476/0xac0 #2: 00000000bc7c307d (&id_priv->handler_mutex){+.+.}, at: iw_conn_req_handler+0x151/0x680 [rdma_cm]
stack backtrace: CPU: 3 PID: 171 Comm: kworker/u16:6 Not tainted 5.3.0-rc7-dbg+ #1 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Workqueue: iw_cm_wq cm_work_handler [iw_cm] Call Trace: dump_stack+0x8a/0xd6 __lock_acquire.cold+0xe1/0x24d lock_acquire+0x106/0x240 __mutex_lock+0x12e/0xcb0 mutex_lock_nested+0x1f/0x30 rdma_destroy_id+0x78/0x4a0 [rdma_cm] iw_conn_req_handler+0x5c9/0x680 [rdma_cm] cm_work_handler+0xe62/0x1100 [iw_cm] process_one_work+0x56d/0xac0 worker_thread+0x7a/0x5d0 kthread+0x1bc/0x210 ret_from_fork+0x24/0x30
Cc: Or Gerlitz gerlitz.or@gmail.com Cc: Steve Wise larrystevenwise@gmail.com Cc: Sagi Grimberg sagi@grimberg.me Cc: Bernard Metzler BMT@zurich.ibm.com Cc: Krishnamraju Eraparaju krishna2@chelsio.com Cc: stable@vger.kernel.org Fixes: de910bd92137 ("RDMA/cma: Simplify locking needed for serialization of callbacks"; v2.6.27). Signed-off-by: Bart Van Assche bvanassche@acm.org drivers/infiniband/core/cma.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 0e3cf3461999..d78f67623f24 100644 +++ b/drivers/infiniband/core/cma.c @@ -2396,9 +2396,10 @@ static int iw_conn_req_handler(struct iw_cm_id *cm_id, conn_id->cm_id.iw = NULL; cma_exch(conn_id, RDMA_CM_DESTROYING); mutex_unlock(&conn_id->handler_mutex);
cma_deref_id(conn_id); rdma_destroy_id(&conn_id->id);mutex_unlock(&listen_id->handler_mutex);
goto out;
}return ret;
Hurm. Minimizing code under lock is always a good fix, but the lockdep report is not a bug.
The issue is caused by the hacky use of SINGLE_DEPTH_NESTING when we really have two lock classes, 'listening' and 'connecting' for CM ids.
connecting IDs can be nested under listening IDs but not the other way around.
So two lock classes will also get rid of the lockdep warning, which is why it isn't a bug..
Applied to for-rc
Thanks, Jason