From: Jason Gunthorpe <jgg@nvidia.com>
commit 2ee9bf346fbfd1dad0933b9eb3a4c2c0979b633e upstream.
This three-thread race can result in the work being run once the callback becomes NULL:
       CPU1                 CPU2                      CPU3
 netevent_callback()
                     process_one_req()       rdma_addr_cancel()
                      [..]
  spin_lock_bh()
   set_timeout()
  spin_unlock_bh()

                                             spin_lock_bh()
                                             list_del_init(&req->list);
                                             spin_unlock_bh()

                     req->callback = NULL
                     spin_lock_bh()
                       if (!list_empty(&req->list))
                            // Skipped!
                            // cancel_delayed_work(&req->work);
                     spin_unlock_bh()

                     process_one_req() // again
                      req->callback() // BOOM
                                             cancel_delayed_work_sync()
The solution is to always cancel the work once it has completed, so that any in-between set_timeout() does not result in it running again.
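The requeue in question comes from the neighbour-event path: netevent_callback() walks req_list and re-arms each request's delayed work via set_timeout()/mod_delayed_work(), so a request that is still on the list can have its work rescheduled at any moment. Roughly, simplified from memory of drivers/infiniband/core/addr.c (the set_timeout() signature, the addr_wq/req_list/lock names and the elided neighbour-state check are assumptions here, not quoted code):

/*
 * Sketch: how the work gets re-armed while a request sits on req_list.
 * Names and details are approximate, not the literal upstream code.
 */
static void set_timeout(struct addr_req *req, unsigned long time)
{
	unsigned long delay;

	delay = time - jiffies;
	if ((long)delay < 0)
		delay = 0;

	/* mod_delayed_work() requeues req->work even if it just ran */
	mod_delayed_work(addr_wq, &req->work, delay);
}

static int netevent_callback(struct notifier_block *self, unsigned long event,
			     void *ctx)
{
	struct addr_req *req;

	if (event == NETEVENT_NEIGH_UPDATE) {
		spin_lock_bh(&lock);
		list_for_each_entry(req, &req_list, list)
			set_timeout(req, jiffies);
		spin_unlock_bh(&lock);
	}
	return 0;
}

With the old code, the cancel in process_one_req() was skipped whenever rdma_addr_cancel() had already unlinked the req, so a work item re-armed this way survived and ran process_one_req() a second time.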
Cc: stable@vger.kernel.org
Fixes: 44e75052bc2a ("RDMA/rdma_cm: Make rdma_addr_cancel into a fence")
Link: https://lore.kernel.org/r/20200930072007.1009692-1-leon@kernel.org
Reported-by: Dan Aloni <dan@kernelim.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/infiniband/core/addr.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -647,13 +647,12 @@ static void process_one_req(struct work_
 	req->callback = NULL;
 
 	spin_lock_bh(&lock);
+	/*
+	 * Although the work will normally have been canceled by the workqueue,
+	 * it can still be requeued as long as it is on the req_list.
+	 */
+	cancel_delayed_work(&req->work);
 	if (!list_empty(&req->list)) {
-		/*
-		 * Although the work will normally have been canceled by the
-		 * workqueue, it can still be requeued as long as it is on the
-		 * req_list.
-		 */
-		cancel_delayed_work(&req->work);
 		list_del_init(&req->list);
 		kfree(req);
 	}
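For reference, the CPU3 column is the fence introduced by the Fixes: commit: rdma_addr_cancel() takes ownership of the request by unlinking it from req_list, then sync-cancels the work and frees the request, so a second run of process_one_req() dereferences freed memory through a NULL callback. A rough from-memory sketch of that path (approximate, not the quoted upstream code):

/*
 * Sketch: the cancel-side fence the rerun races against.
 * Approximate reconstruction, not the literal upstream code.
 */
void rdma_addr_cancel(struct rdma_dev_addr *addr)
{
	struct addr_req *req, *temp_req;
	struct addr_req *found = NULL;

	spin_lock_bh(&lock);
	list_for_each_entry_safe(req, temp_req, &req_list, list) {
		if (req->addr == addr) {
			/* Unlinking the req transfers ownership to us */
			list_del_init(&req->list);
			found = req;
			break;
		}
	}
	spin_unlock_bh(&lock);

	if (!found)
		return;

	/*
	 * The sync cancel after the unlink is meant to guarantee the work
	 * is not running and will not be started again; the race above
	 * defeated that whenever a set_timeout() slipped in before
	 * process_one_req() reached its (previously conditional) cancel.
	 */
	cancel_delayed_work_sync(&found->work);
	kfree(found);
}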