The patch below does not apply to the 5.5-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to stable@vger.kernel.org.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f0b493e6b9a8959356983f57112229e69c2f7b8c Mon Sep 17 00:00:00 2001
From: Jens Axboe axboe@kernel.dk Date: Sat, 1 Feb 2020 21:30:11 -0700 Subject: [PATCH] io_uring: prevent potential eventfd recursion on poll
If we have nested or circular eventfd wakeups, then we can deadlock if we run them inline from our poll waitqueue wakeup handler. It's also possible to have very long chains of notifications, to the extent where we could risk blowing the stack.
Check the eventfd recursion count before calling eventfd_signal(). If it's non-zero, then punt the signaling to async context. This is always safe, as it takes us out-of-line in terms of stack and locking context.
Cc: stable@vger.kernel.org # 5.1+ Signed-off-by: Jens Axboe axboe@kernel.dk
diff --git a/fs/io_uring.c b/fs/io_uring.c index 217721c7bc41..43f3b7d90299 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -1020,21 +1020,28 @@ static struct io_uring_cqe *io_get_cqring(struct io_ring_ctx *ctx)
static inline bool io_should_trigger_evfd(struct io_ring_ctx *ctx) { + if (!ctx->cq_ev_fd) + return false; if (!ctx->eventfd_async) return true; return io_wq_current_is_worker() || in_interrupt(); }
-static void io_cqring_ev_posted(struct io_ring_ctx *ctx) +static void __io_cqring_ev_posted(struct io_ring_ctx *ctx, bool trigger_ev) { if (waitqueue_active(&ctx->wait)) wake_up(&ctx->wait); if (waitqueue_active(&ctx->sqo_wait)) wake_up(&ctx->sqo_wait); - if (ctx->cq_ev_fd && io_should_trigger_evfd(ctx)) + if (trigger_ev) eventfd_signal(ctx->cq_ev_fd, 1); }
+static void io_cqring_ev_posted(struct io_ring_ctx *ctx) +{ + __io_cqring_ev_posted(ctx, io_should_trigger_evfd(ctx)); +} + /* Returns true if there are no backlogged entries after the flush */ static bool io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force) { @@ -3561,6 +3568,14 @@ static void io_poll_flush(struct io_wq_work **workptr) __io_poll_flush(req->ctx, nodes); }
+static void io_poll_trigger_evfd(struct io_wq_work **workptr) +{ + struct io_kiocb *req = container_of(*workptr, struct io_kiocb, work); + + eventfd_signal(req->ctx->cq_ev_fd, 1); + io_put_req(req); +} + static int io_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync, void *key) { @@ -3586,14 +3601,22 @@ static int io_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
if (llist_empty(&ctx->poll_llist) && spin_trylock_irqsave(&ctx->completion_lock, flags)) { + bool trigger_ev; + hash_del(&req->hash_node); io_poll_complete(req, mask, 0); - req->flags |= REQ_F_COMP_LOCKED; - io_put_req(req); - spin_unlock_irqrestore(&ctx->completion_lock, flags);
- io_cqring_ev_posted(ctx); - req = NULL; + trigger_ev = io_should_trigger_evfd(ctx); + if (trigger_ev && eventfd_signal_count()) { + trigger_ev = false; + req->work.func = io_poll_trigger_evfd; + } else { + req->flags |= REQ_F_COMP_LOCKED; + io_put_req(req); + req = NULL; + } + spin_unlock_irqrestore(&ctx->completion_lock, flags); + __io_cqring_ev_posted(ctx, trigger_ev); } else { req->result = mask; req->llist_node.next = NULL;
On Sun, Feb 09, 2020 at 12:52:51PM +0100, gregkh@linuxfoundation.org wrote:
The patch below does not apply to the 5.5-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to stable@vger.kernel.org.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f0b493e6b9a8959356983f57112229e69c2f7b8c Mon Sep 17 00:00:00 2001 From: Jens Axboe axboe@kernel.dk Date: Sat, 1 Feb 2020 21:30:11 -0700 Subject: [PATCH] io_uring: prevent potential eventfd recursion on poll
If we have nested or circular eventfd wakeups, then we can deadlock if we run them inline from our poll waitqueue wakeup handler. It's also possible to have very long chains of notifications, to the extent where we could risk blowing the stack.
Check the eventfd recursion count before calling eventfd_signal(). If it's non-zero, then punt the signaling to async context. This is always safe, as it takes us out-of-line in terms of stack and locking context.
Cc: stable@vger.kernel.org # 5.1+ Signed-off-by: Jens Axboe axboe@kernel.dk
I queued it back to 5.5 by taking f2842ab5b72d ("io_uring: enable option to only trigger eventfd for async completions") and working around missing commit 69b3e546139a ("io_uring: change io_ring_ctx bool fields into bit fields"). However, 5.4 is a bit more complex than what I can tackle without a test suite.
Jens, is there something I can run to validate io_uring on older kernels?
On 2/9/20 11:33 AM, Sasha Levin wrote:
On Sun, Feb 09, 2020 at 12:52:51PM +0100, gregkh@linuxfoundation.org wrote:
The patch below does not apply to the 5.5-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to stable@vger.kernel.org.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f0b493e6b9a8959356983f57112229e69c2f7b8c Mon Sep 17 00:00:00 2001 From: Jens Axboe axboe@kernel.dk Date: Sat, 1 Feb 2020 21:30:11 -0700 Subject: [PATCH] io_uring: prevent potential eventfd recursion on poll
If we have nested or circular eventfd wakeups, then we can deadlock if we run them inline from our poll waitqueue wakeup handler. It's also possible to have very long chains of notifications, to the extent where we could risk blowing the stack.
Check the eventfd recursion count before calling eventfd_signal(). If it's non-zero, then punt the signaling to async context. This is always safe, as it takes us out-of-line in terms of stack and locking context.
Cc: stable@vger.kernel.org # 5.1+ Signed-off-by: Jens Axboe axboe@kernel.dk
I queued it back to 5.5 by taking f2842ab5b72d ("io_uring: enable option to only trigger eventfd for async completions") and working around missing commit 69b3e546139a ("io_uring: change io_ring_ctx bool fields into bit fields"). However, 5.4 is a bit more complex than what I can tackle without a test suite.
Jens, is there something I can run to validate io_uring on older kernels?
liburing has a set of regression tests, but unfortunately mostly tailored to the current kernel, though stable should hopefully pass if we have everything we need backported! I can try and stake a stab at the backport too, I'll have more later in this round anyway...
linux-stable-mirror@lists.linaro.org