This is a note to let you know that I've just added the patch titled
net: ena: fix race condition between submit and completion admin command
to the 4.9-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git%3Ba=su...
The filename of the patch is: net-ena-fix-race-condition-between-submit-and-completion-admin-command.patch and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree, please let stable@vger.kernel.org know about it.
From foo@baz Mon Apr 9 17:09:24 CEST 2018
From: Netanel Belgazal <netanel@amazon.com>
Date: Sun, 11 Jun 2017 15:42:46 +0300
Subject: net: ena: fix race condition between submit and completion admin command

From: Netanel Belgazal <netanel@amazon.com>
[ Upstream commit 661d2b0ccef6a63f48b61105cf7be17403d1db01 ]
Bug: A "Completion context is occupied" error is printed in dmesg. This error causes the admin command to fail, which leads to an ena_probe() failure or a watchdog reset (depending on which admin command failed).
Root cause: __ena_com_submit_admin_cmd() is the function that submits new entries to the admin queue. The function has a check that makes sure the queue is not full and that it does not override any outstanding command. It uses the head and tail indexes for this check. The head is increased by ena_com_handle_admin_completion(), which runs from interrupt context, and the tail index is increased by the submit function (the function runs under ->q_lock, so there is no risk of multithreaded increment). Each command is associated with a completion context. This context is allocated before the call to __ena_com_submit_admin_cmd() and freed by ena_com_wait_and_process_admin_cq_interrupts(), right after the command has completed.
This can lead to a state where the head was increased, the check passed, but the completion context is still in use.
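The state described above can be reproduced in a small userspace model. This is only a sketch: struct model_queue, the model_* helpers, the depth of 4 and the mapping of a context slot to the masked tail are all assumptions made for illustration and are not the driver's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define Q_DEPTH 4u /* hypothetical queue depth, power of two */

/* Simplified stand-in for the admin queue (not the driver's layout). */
struct model_queue {
	uint16_t head;              /* advanced by the completion handler */
	uint16_t tail;              /* advanced by the submitter */
	bool ctx_occupied[Q_DEPTH]; /* one completion context per slot */
};

/* Old-style check: room is derived from head and tail only. */
static bool model_has_room(const struct model_queue *q)
{
	uint16_t cnt = (uint16_t)(q->tail - q->head);

	return cnt < Q_DEPTH;
}

/* Returns 0 on success, -1 if the queue is full, -2 if the check
 * passed but the completion context for the slot is still in use. */
static int model_submit(struct model_queue *q)
{
	unsigned int slot;

	if (!model_has_room(q))
		return -1;

	slot = q->tail & (Q_DEPTH - 1);
	if (q->ctx_occupied[slot])
		return -2; /* the "Completion context is occupied" case */

	q->ctx_occupied[slot] = true;
	q->tail++;
	return 0;
}

/* Interrupt-side step: the head moves, the context is not released yet. */
static void model_irq_complete(struct model_queue *q)
{
	assert(q->head != q->tail); /* something must be outstanding */
	q->head++;
}

/* Waiter-side step: the context is released after the command completed. */
static void model_release_ctx(struct model_queue *q, unsigned int slot)
{
	q->ctx_occupied[slot & (Q_DEPTH - 1)] = false;
}
```

In this model, filling the queue, calling model_irq_complete() once and then calling model_submit() before model_release_ctx() yields -2: the head/tail check sees a free entry while the context of that entry is still held.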
Solution: Use the atomic variable ->outstanding_cmds instead of the head and tail indexes. This variable is safe to use because it is incremented by get_comp_ctx() in __ena_com_submit_admin_cmd() and decremented by comp_ctxt_release().
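The counter-based check can be sketched in userspace with C11 atomics. This is an illustration under stated assumptions, not the driver code: struct fixed_queue, fixed_submit(), fixed_release() and FIX_Q_DEPTH are hypothetical names, and atomic_load()/atomic_fetch_add() stand in for the kernel's atomic_t helpers. The separate load and increment are only safe here because, as in the driver, a single submitter at a time is assumed (the submit path holds ->q_lock).

```c
#include <assert.h>
#include <stdatomic.h>

#define FIX_Q_DEPTH 4 /* hypothetical queue depth */

/* Simplified stand-in: only the counter of held completion contexts. */
struct fixed_queue {
	atomic_int outstanding_cmds;
};

/* Returns 0 on success, -1 if every completion context is in use.
 * Assumes one submitter at a time (the driver holds ->q_lock here). */
static int fixed_submit(struct fixed_queue *q)
{
	int cnt = atomic_load(&q->outstanding_cmds);

	if (cnt >= FIX_Q_DEPTH)
		return -1;

	/* Taking a completion context bumps the counter. */
	atomic_fetch_add(&q->outstanding_cmds, 1);
	return 0;
}

/* Releasing a completion context is the only thing that frees a slot;
 * a completion interrupt alone does not change the counter. */
static void fixed_release(struct fixed_queue *q)
{
	assert(atomic_load(&q->outstanding_cmds) > 0);
	atomic_fetch_sub(&q->outstanding_cmds, 1);
}
```

Because the counter only drops when a context is released, a submit attempted between the completion interrupt and the release is rejected as "queue full" rather than colliding with a context that is still in use.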
Fixes: 1738cd3ed342 ("Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/amazon/ena/ena_com.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)
--- a/drivers/net/ethernet/amazon/ena/ena_com.c
+++ b/drivers/net/ethernet/amazon/ena/ena_com.c
@@ -232,11 +232,9 @@ static struct ena_comp_ctx *__ena_com_su
 	tail_masked = admin_queue->sq.tail & queue_size_mask;
 
 	/* In case of queue FULL */
-	cnt = admin_queue->sq.tail - admin_queue->sq.head;
+	cnt = atomic_read(&admin_queue->outstanding_cmds);
 	if (cnt >= admin_queue->q_depth) {
-		pr_debug("admin queue is FULL (tail %d head %d depth: %d)\n",
-			 admin_queue->sq.tail, admin_queue->sq.head,
-			 admin_queue->q_depth);
+		pr_debug("admin queue is full.\n");
 		admin_queue->stats.out_of_space++;
 		return ERR_PTR(-ENOSPC);
 	}
Patches currently in stable-queue which might be from netanel@amazon.com are
queue-4.9/net-ena-disable-admin-msix-while-working-in-polling-mode.patch
queue-4.9/net-ena-fix-race-condition-between-submit-and-completion-admin-command.patch
queue-4.9/net-ena-fix-rare-uncompleted-admin-command-false-alarm.patch
queue-4.9/net-ena-add-missing-unmap-bars-on-device-removal.patch
queue-4.9/net-ena-add-missing-return-when-ena_com_get_io_handlers-fails.patch