This is a note to let you know that I've just added the patch titled
nbd: don't start req until after the dead connection logic
to the 4.14-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git%3Ba=su...
The filename of the patch is: nbd-don-t-start-req-until-after-the-dead-connection-logic.patch and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree, please let stable@vger.kernel.org know about it.
From 6a468d5990ecd1c2d07dd85f8633bbdd0ba61c40 Mon Sep 17 00:00:00 2001
From: Josef Bacik jbacik@fb.com Date: Mon, 6 Nov 2017 16:11:58 -0500 Subject: nbd: don't start req until after the dead connection logic
From: Josef Bacik jbacik@fb.com
commit 6a468d5990ecd1c2d07dd85f8633bbdd0ba61c40 upstream.
We can end up sleeping for a while waiting for the dead timeout, which means we could get the per request timer to fire. We did handle this case, but if the dead timeout happened right after we submitted we'd either tear down the connection or possibly requeue as we're handling an error and race with the endio which can lead to panics and other hilarity.
Fixes: 560bc4b39952 ("nbd: handle dead connections") Signed-off-by: Josef Bacik jbacik@fb.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/block/nbd.c | 20 +++++++------------- 1 file changed, 7 insertions(+), 13 deletions(-)
--- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -288,15 +288,6 @@ static enum blk_eh_timer_return nbd_xmit cmd->status = BLK_STS_TIMEOUT; return BLK_EH_HANDLED; } - - /* If we are waiting on our dead timer then we could get timeout - * callbacks for our request. For this we just want to reset the timer - * and let the queue side take care of everything. - */ - if (!completion_done(&cmd->send_complete)) { - nbd_config_put(nbd); - return BLK_EH_RESET_TIMER; - } config = nbd->config;
if (config->num_connections > 1) { @@ -740,6 +731,7 @@ static int nbd_handle_cmd(struct nbd_cmd if (!refcount_inc_not_zero(&nbd->config_refs)) { dev_err_ratelimited(disk_to_dev(nbd->disk), "Socks array is empty\n"); + blk_mq_start_request(req); return -EINVAL; } config = nbd->config; @@ -748,6 +740,7 @@ static int nbd_handle_cmd(struct nbd_cmd dev_err_ratelimited(disk_to_dev(nbd->disk), "Attempted send on invalid socket\n"); nbd_config_put(nbd); + blk_mq_start_request(req); return -EINVAL; } cmd->status = BLK_STS_OK; @@ -771,6 +764,7 @@ again: */ sock_shutdown(nbd); nbd_config_put(nbd); + blk_mq_start_request(req); return -EIO; } goto again; @@ -781,6 +775,7 @@ again: * here so that it gets put _after_ the request that is already on the * dispatch list. */ + blk_mq_start_request(req); if (unlikely(nsock->pending && nsock->pending != req)) { blk_mq_requeue_request(req, true); ret = 0; @@ -793,10 +788,10 @@ again: ret = nbd_send_cmd(nbd, cmd, index); if (ret == -EAGAIN) { dev_err_ratelimited(disk_to_dev(nbd->disk), - "Request send failed trying another connection\n"); + "Request send failed, requeueing\n"); nbd_mark_nsock_dead(nbd, nsock, 1); - mutex_unlock(&nsock->tx_lock); - goto again; + blk_mq_requeue_request(req, true); + ret = 0; } out: mutex_unlock(&nsock->tx_lock); @@ -820,7 +815,6 @@ static blk_status_t nbd_queue_rq(struct * done sending everything over the wire. */ init_completion(&cmd->send_complete); - blk_mq_start_request(bd->rq);
/* We can be called directly from the user space process, which means we * could possibly have signals pending so our sendmsg will fail. In
Patches currently in stable-queue which might be from jbacik@fb.com are
queue-4.14/nbd-wait-uninterruptible-for-the-dead-timeout.patch queue-4.14/nbd-don-t-start-req-until-after-the-dead-connection-logic.patch