On 8/10/20 2:12 PM, Peter Zijlstra wrote:
On Mon, Aug 10, 2020 at 01:21:48PM -0600, Jens Axboe wrote:
Wait.. so the only change here is that you look at tsk->state _after_ doing __task_work_add(), but nothing, neither the Changelog nor the comment, explains this.
So you're relying on __task_work_add() being an smp_mb() vs the add, and you order this against the smp_mb() in set_current_state() ?
This really needs spelling out.
I'll update the changelog, it suffers a bit from having been reused from the earlier versions. Thanks for checking!
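For readers following the ordering argument: the shape being asked about is the classic store-buffering pattern, where each side stores its own flag, issues a full barrier, and only then loads the other side's flag. The sketch below is a userspace C11 model, not kernel code. The names model_task, model_task_init(), model_add_work_then_check() and model_set_state_then_check() are hypothetical, and the seq_cst fences merely stand in for the full barriers that the question above attributes to __task_work_add() and set_current_state().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for TASK_RUNNING and a sleeping task state. */
enum { MODEL_TASK_RUNNING = 0, MODEL_TASK_SLEEPING = 1 };

/* Hypothetical model of the two fields that matter for the ordering. */
struct model_task {
	atomic_int state;		/* models tsk->state */
	atomic_bool work_pending;	/* models "task_work has been queued" */
};

void model_task_init(struct model_task *t)
{
	assert(t != NULL);
	atomic_init(&t->state, MODEL_TASK_RUNNING);
	atomic_init(&t->work_pending, false);
}

/*
 * Waker side: publish the work, full barrier, then look at the state.
 * Returns true if the task was observed as not running.
 */
bool model_add_work_then_check(struct model_task *t)
{
	atomic_store_explicit(&t->work_pending, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&t->state, memory_order_relaxed) !=
	       MODEL_TASK_RUNNING;
}

/*
 * Sleeper side: publish the new state, full barrier, then look for work.
 * Returns true if pending work was observed, i.e. the task must not sleep.
 */
bool model_set_state_then_check(struct model_task *t, int state)
{
	atomic_store_explicit(&t->state, state, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&t->work_pending, memory_order_relaxed);
}
```

With a full barrier on both sides, the two loads cannot both miss the other side's store: either the waker sees the task as not running, or the task sees the pending work before it goes to sleep.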
I failed to convince myself that the existing construct was safe, so here's an incremental on top of that. Basically we re-check the task state _after_ the initial notification, to protect ourselves from the case where we initially find the task running, but between that check and the notification it has gone to sleep. The window should be pretty slim, but I think it's there.
Hence do a loop around it, if we're using TWA_RESUME.
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 44ac103483b6..a4ecb6c7e2b0 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1780,12 +1780,27 @@ static int io_req_task_work_add(struct io_kiocb *req, struct callback_head *cb)
 	 * to ensure that the issuing task processes task_work. TWA_SIGNAL
 	 * is needed for that.
 	 */
-	if (ctx->flags & IORING_SETUP_SQPOLL)
+	if (ctx->flags & IORING_SETUP_SQPOLL) {
 		notify = 0;
-	else if (READ_ONCE(tsk->state) != TASK_RUNNING)
-		notify = TWA_SIGNAL;
+	} else {
+		bool notified = false;

-	__task_work_notify(tsk, notify);
+		/*
+		 * If the task is running, TWA_RESUME notify is enough. Make
+		 * sure to re-check after we've sent the notification, as not
Could we get a clue as to why TWA_RESUME is enough when it's running? I presume it is because we'll do task_work_run() somewhere before we block, but having an explicit reference here might help someone new to this make sense of it all.
+		 * to have a race between the check and the notification. This
+		 * only applies for TWA_RESUME, as TWA_SIGNAL is safe with a
+		 * sleeping task
+		 */
+		do {
+			if (READ_ONCE(tsk->state) != TASK_RUNNING)
+				notify = TWA_SIGNAL;
+			else if (notified)
+				break;
+			__task_work_notify(tsk, notify);
+			notified = true;
+		} while (notify != TWA_SIGNAL);
+	}
 	wake_up_process(tsk);
 	return 0;
 }
Would it be clearer to write it like so perhaps?
	/*
	 * Optimization; when the task is RUNNING we can do with a
	 * cheaper TWA_RESUME notification because,... <reason goes
	 * here>. Otherwise do the more expensive, but always correct
	 * TWA_SIGNAL.
	 */
	if (READ_ONCE(tsk->state) == TASK_RUNNING) {
		__task_work_notify(tsk, TWA_RESUME);
		if (READ_ONCE(tsk->state) == TASK_RUNNING)
			return;
	}
	__task_work_notify(tsk, TWA_SIGNAL);
	wake_up_process(tsk);
Yeah, that is easier to read. I wasn't a huge fan of the loop, since it's only a single-retry kind of condition. I'll adopt this suggestion, thanks!
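The check, notify, re-check flow suggested above can be exercised outside the kernel with a small single-threaded model. This is only a sketch under stated assumptions: everything named model_* or MODEL_* is hypothetical, plain struct fields stand in for READ_ONCE(tsk->state), and the race is simulated by a hook that puts the task to sleep while the first notification is being sent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the task state and the two notify modes. */
enum model_state { MODEL_RUNNING, MODEL_SLEEPING };
enum model_notify { MODEL_TWA_RESUME = 1, MODEL_TWA_SIGNAL = 2 };

struct model_task {
	enum model_state state;
	int resume_sent;	/* number of cheap notifications sent */
	int signal_sent;	/* number of expensive notifications sent */
	/* Optional one-shot hook, run during a notification to simulate a race. */
	void (*race_hook)(struct model_task *t);
};

/* Simulated racing event: the task blocks while we are notifying it. */
void model_go_to_sleep(struct model_task *t)
{
	t->state = MODEL_SLEEPING;
}

/* Models __task_work_notify(): records the mode and fires the hook once. */
void model_notify(struct model_task *t, enum model_notify mode)
{
	assert(t != NULL);
	if (mode == MODEL_TWA_SIGNAL)
		t->signal_sent++;
	else
		t->resume_sent++;
	if (t->race_hook) {
		void (*hook)(struct model_task *) = t->race_hook;

		t->race_hook = NULL;
		hook(t);
	}
}

/*
 * Try the cheap notification when the task looks runnable, re-check the
 * state afterwards, and fall back to the expensive notification if the
 * task was, or has since become, not running. Returns the strongest
 * notification that was sent.
 */
enum model_notify model_task_work_notify(struct model_task *t)
{
	if (t->state == MODEL_RUNNING) {
		model_notify(t, MODEL_TWA_RESUME);
		if (t->state == MODEL_RUNNING)
			return MODEL_TWA_RESUME;
	}
	model_notify(t, MODEL_TWA_SIGNAL);
	return MODEL_TWA_SIGNAL;
}
```

In this model a task that stays running gets only the cheap notification, a sleeping task gets only the expensive one, and a task that goes to sleep during the first notification gets both, which is the single-retry behaviour the loop was meant to provide.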