On 8/10/20 9:02 AM, Jens Axboe wrote:
On 8/10/20 5:42 AM, peterz@infradead.org wrote:
On Sat, Aug 08, 2020 at 12:34:39PM -0600, Jens Axboe wrote:
An earlier commit:
b7db41c9e03b ("io_uring: fix regression with always ignoring signals in io_cqring_wait()")
ensured that we didn't get stuck waiting for eventfd reads when it's registered with the io_uring ring for event notification, but we still have a gap where the task can be waiting on other events in the kernel and need a bigger nudge to make forward progress.
Ensure that we use signaled notifications for a task that isn't currently running, to be certain the work is seen and processed immediately.
Cc: stable@vger.kernel.org # v5.7+
Reported-by: Josef <josef.grieb@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

 fs/io_uring.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index e9b27cdaa735..443eecdfeda9 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1712,21 +1712,27 @@ static int io_req_task_work_add(struct io_kiocb *req, struct callback_head *cb)
 	struct io_ring_ctx *ctx = req->ctx;
 	int ret, notify = TWA_RESUME;
 
+	ret = __task_work_add(tsk, cb);
+	if (unlikely(ret))
+		return ret;
+
 	/*
 	 * SQPOLL kernel thread doesn't need notification, just a wakeup.
-	 * If we're not using an eventfd, then TWA_RESUME is always fine,
-	 * as we won't have dependencies between request completions for
-	 * other kernel wait conditions.
+	 * For any other work, use signaled wakeups if the task isn't
+	 * running to avoid dependencies between tasks or threads. If
+	 * the issuing task is currently waiting in the kernel on a thread,
+	 * and same thread is waiting for a completion event, then we need
+	 * to ensure that the issuing task processes task_work. TWA_SIGNAL
+	 * is needed for that.
 	 */
 	if (ctx->flags & IORING_SETUP_SQPOLL)
 		notify = 0;
-	else if (ctx->cq_ev_fd)
+	else if (READ_ONCE(tsk->state) != TASK_RUNNING)
 		notify = TWA_SIGNAL;
 
-	ret = task_work_add(tsk, cb, notify);
-	if (!ret)
-		wake_up_process(tsk);
-	return ret;
+	__task_work_notify(tsk, notify);
+	wake_up_process(tsk);
+	return 0;
 }
Wait.. so the only change here is that you look at tsk->state _after_ doing __task_work_add(), but nothing, neither the changelog nor the comment, explains this.
So you're relying on __task_work_add() being an smp_mb() vs the add, and you order this against the smp_mb() in set_current_state() ?
This really needs spelling out.
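
For readers following along, the ordering being asked about is the classic store-buffering pattern: each side stores, issues a full barrier, then loads what the other side stored. Below is a minimal userspace sketch in C11 atomics, not the kernel code. Every `model_*` / `MODEL_*` name is hypothetical, the seq_cst fences stand in for smp_mb(), and it simply assumes (as the question above does) that the add side provides a full barrier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the task states; values are arbitrary. */
enum { MODEL_TASK_RUNNING = 0, MODEL_TASK_INTERRUPTIBLE = 1 };

/* Hypothetical model of the two shared variables involved. */
struct model_task {
	atomic_int state;         /* models tsk->state */
	atomic_bool work_pending; /* models "task_work has been queued" */
};

static void model_task_init(struct model_task *t)
{
	assert(t != NULL);
	atomic_init(&t->state, MODEL_TASK_RUNNING);
	atomic_init(&t->work_pending, false);
}

/*
 * Waker side: publish the work, full barrier, then read the task state.
 * Returns true if the task was seen as not running, i.e. a plain
 * resume-style notification may not be enough.
 */
static bool model_queue_work(struct model_task *t)
{
	atomic_store_explicit(&t->work_pending, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst); /* stands in for smp_mb() */
	return atomic_load_explicit(&t->state, memory_order_relaxed) !=
	       MODEL_TASK_RUNNING;
}

/*
 * Sleeper side: publish the new state, full barrier, then check for work.
 * Returns true if no work was seen, i.e. the task would go on to sleep.
 */
static bool model_prepare_to_sleep(struct model_task *t)
{
	atomic_store_explicit(&t->state, MODEL_TASK_INTERRUPTIBLE,
			      memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst); /* stands in for smp_mb() */
	return !atomic_load_explicit(&t->work_pending, memory_order_relaxed);
}
```

With a full barrier on both sides, the outcome "waker saw RUNNING and sleeper saw no work" cannot happen: at least one side observes the other's store. Drop either fence and that outcome becomes possible, which is the lost-wakeup case.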
I'll update the changelog, it suffers a bit from having been reused from the earlier versions. Thanks for checking!
I failed to convince myself that the existing construct was safe, so here's an incremental on top of that. Basically we re-check the task state _after_ the initial notification, to protect ourselves from the case where we initially find the task running, but it goes to sleep between that check and the notification. The window should be pretty slim, but I think it's there.
Hence do a loop around it, if we're using TWA_RESUME.
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 44ac103483b6..a4ecb6c7e2b0 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1780,12 +1780,27 @@ static int io_req_task_work_add(struct io_kiocb *req, struct callback_head *cb)
 	 * to ensure that the issuing task processes task_work. TWA_SIGNAL
 	 * is needed for that.
 	 */
-	if (ctx->flags & IORING_SETUP_SQPOLL)
+	if (ctx->flags & IORING_SETUP_SQPOLL) {
 		notify = 0;
-	else if (READ_ONCE(tsk->state) != TASK_RUNNING)
-		notify = TWA_SIGNAL;
+	} else {
+		bool notified = false;
 
-	__task_work_notify(tsk, notify);
+		/*
+		 * If the task is running, TWA_RESUME notify is enough. Make
+		 * sure to re-check after we've sent the notification, as not
+		 * to have a race between the check and the notification. This
+		 * only applies for TWA_RESUME, as TWA_SIGNAL is safe with a
+		 * sleeping task
+		 */
+		do {
+			if (READ_ONCE(tsk->state) != TASK_RUNNING)
+				notify = TWA_SIGNAL;
+			else if (notified)
+				break;
+			__task_work_notify(tsk, notify);
+			notified = true;
+		} while (notify != TWA_SIGNAL);
+	}
 	wake_up_process(tsk);
 	return 0;
 }
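
To make the control flow of that loop easier to trace, here is a small userspace model of it. This is only a sketch: the `model_*` / `MODEL_*` names are hypothetical, the enum values are arbitrary, and the task state is replaced by a scripted sequence of reads so the three possible paths can be walked through deterministically.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for TWA_RESUME / TWA_SIGNAL; values are arbitrary. */
enum model_notify { MODEL_TWA_RESUME = 1, MODEL_TWA_SIGNAL = 2 };

/* Scripted task state: running[i] is what the i-th state read observes. */
struct model_trace {
	const bool *running;
	size_t nreads;   /* number of scripted reads available */
	size_t next;     /* index of the next read */
	int resume_sent; /* count of resume-style notifications */
	int signal_sent; /* count of signal-style notifications */
};

/* Models READ_ONCE(tsk->state) == TASK_RUNNING. */
static bool model_read_running(struct model_trace *tr)
{
	assert(tr->next < tr->nreads);
	return tr->running[tr->next++];
}

/* Models the notification call by counting what was sent. */
static void model_notify_task(struct model_trace *tr, enum model_notify n)
{
	if (n == MODEL_TWA_SIGNAL)
		tr->signal_sent++;
	else
		tr->resume_sent++;
}

/* Same shape as the do/while in the incremental patch (non-SQPOLL branch). */
static enum model_notify model_notify_with_recheck(struct model_trace *tr)
{
	enum model_notify notify = MODEL_TWA_RESUME;
	bool notified = false;

	do {
		if (!model_read_running(tr))
			notify = MODEL_TWA_SIGNAL; /* not running: escalate */
		else if (notified)
			break;                     /* still running after notify */
		model_notify_task(tr, notify);
		notified = true;
	} while (notify != MODEL_TWA_SIGNAL);

	return notify;
}
```

The three paths are: task running on both reads (one resume notification, then exit on the re-check), task running then asleep (a resume notification followed by a signal notification), and task asleep on the first read (a single signal notification with no re-check).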
and I've folded it in here:
https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.9&id=8d685b5...