On Mon, Aug 10, 2020 at 10:25 PM Jens Axboe axboe@kernel.dk wrote:
On 8/10/20 2:13 PM, Jens Axboe wrote:
Would it be clearer to write it like so perhaps?
/* * Optimization; when the task is RUNNING we can do with a * cheaper TWA_RESUME notification because,... <reason goes * here>. Otherwise do the more expensive, but always correct * TWA_SIGNAL. */ if (READ_ONCE(tsk->state) == TASK_RUNNING) { __task_work_notify(tsk, TWA_RESUME); if (READ_ONCE(tsk->state) == TASK_RUNNING) return; } __task_work_notify(tsk, TWA_SIGNAL); wake_up_process(tsk);
Yeah that is easier to read, wasn't a huge fan of the loop since it's only a single retry kind of condition. I'll adopt this suggestion, thanks!
Re-write it a bit on top of that, just turning it into two separate READ_ONCE, and added appropriate comments. For the SQPOLL case, the wake_up_process() is enough, so we can clean up that if/else.
https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.9&id=49bc5c1...
I think I'm starting to understand the overall picture here, and I think if my understanding is correct, your solution isn't going to work properly.
My understanding of the scenario you're trying to address is:
- task A starts up io_uring - task A tells io_uring to bump the counter of an eventfd E when work has been completed - task A submits some work ("read a byte from file descriptor X", or something like that) - io_uring internally starts an asynchronous I/O operation, with a callback C - task A calls read(E, &counter, sizeof(counter)) to wait for events to be processed - the async I/O operation finishes, C is invoked, and C schedules task_work for task A
And here you run into a deadlock, because the task_work will only run when task A returns from the syscall, but the syscall will only return once the task_work is executing and has finished the I/O operation.
If that is the scenario you're trying to solve here (where you're trying to force a task that's in the middle of some syscall that's completely unrelated to io_uring to return back to syscall context), I don't think this will work: It might well be that the task has e.g. just started entering the read() syscall, and is *about to* block, but is currently still running.