Backports of io_uring patches that recently failed to apply to stable-6.1
Dylan Yudaken (2):
  io_uring: always lock in io_apoll_task_func
  io_uring: revert "io_uring fix multishot accept ordering"

Gabriel Krisman Bertazi (1):
  io_uring: Don't set affinity on a dying sqpoll thread

Jens Axboe (1):
  io_uring/sqpoll: fix io-wq affinity when IORING_SETUP_SQPOLL is used

Pavel Begunkov (2):
  io_uring/net: don't overflow multishot accept
  io_uring: break out of iowq iopoll on teardown
 io_uring/io-wq.c    | 17 +++++++++++++++--
 io_uring/io-wq.h    |  3 ++-
 io_uring/io_uring.c | 31 ++++++++++++++++++++-----------
 io_uring/net.c      |  8 ++++----
 io_uring/poll.c     |  3 ++-
 io_uring/sqpoll.c   | 17 +++++++++++++++++
 io_uring/sqpoll.h   |  1 +
 7 files changed, 61 insertions(+), 19 deletions(-)
From: Dylan Yudaken <dylany@meta.com>
[ upstream commit c06c6c5d276707e04cedbcc55625e984922118aa ]
Taking the lock is required for the failure case (io_req_complete_failed) and is currently missing.
The alternative would be to only lock in the failure path; however, all of the non-error paths in io_poll_check_events that do not return IOU_POLL_NO_ACTION end up locking anyway. The only extraneous lock would be for a multishot poll overflowing the CQE ring, and multishot poll would probably benefit from being locked anyway, as it allows completions to be batched.
So it seems reasonable to lock always.
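For reference, this is roughly what io_apoll_task_func ends up looking like with the patch applied (a simplified sketch; refcounting and some details are omitted). The point is that the lock is taken before any completion path runs:

	/* Simplified sketch of io_apoll_task_func() after this patch. */
	static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
	{
		int ret = io_poll_check_events(req, locked);

		if (ret == IOU_POLL_NO_ACTION)
			return;

		/* always lock, so failure and multishot paths run locked */
		io_tw_lock(req->ctx, locked);
		io_poll_remove_entries(req);
		io_poll_tw_hash_eject(req, locked);

		if (ret == IOU_POLL_REMOVE_POLL_USE_RES)
			io_req_task_complete(req, locked);
		else if (ret == IOU_POLL_DONE || ret == IOU_POLL_REISSUE)
			io_req_task_submit(req, locked);
		else
			io_req_complete_failed(req, ret);
	}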
Signed-off-by: Dylan Yudaken <dylany@meta.com>
Link: https://lore.kernel.org/r/20221124093559.3780686-3-dylany@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/poll.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/io_uring/poll.c b/io_uring/poll.c
index 869e1d2a4413..a4084acaff91 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -360,11 +360,12 @@ static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
 	if (ret == IOU_POLL_NO_ACTION)
 		return;
 
+	io_tw_lock(req->ctx, locked);
 	io_poll_remove_entries(req);
 	io_poll_tw_hash_eject(req, locked);
 
 	if (ret == IOU_POLL_REMOVE_POLL_USE_RES)
-		io_req_complete_post(req);
+		io_req_task_complete(req, locked);
 	else if (ret == IOU_POLL_DONE || ret == IOU_POLL_REISSUE)
 		io_req_task_submit(req, locked);
 	else
From: Dylan Yudaken <dylany@meta.com>
[ upstream commit 515e26961295bee9da5e26916c27739dca6c10e1 ]
This is no longer needed after commit aa1df3a360a0 ("io_uring: fix CQE reordering"), since all reordering is now taken care of.
This reverts commit cbd25748545c ("io_uring: fix multishot accept ordering").
Signed-off-by: Dylan Yudaken <dylany@meta.com>
Link: https://lore.kernel.org/r/20221107125236.260132-2-dylany@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/net.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/io_uring/net.c b/io_uring/net.c
index 2b44126a876e..00b4433b6cd8 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -1337,12 +1337,12 @@ int io_accept(struct io_kiocb *req, unsigned int issue_flags)
 		return IOU_OK;
 	}
 
-	if (ret >= 0 &&
-	    io_post_aux_cqe(ctx, req->cqe.user_data, ret, IORING_CQE_F_MORE, false))
+	if (ret < 0)
+		return ret;
+	if (io_post_aux_cqe(ctx, req->cqe.user_data, ret, IORING_CQE_F_MORE, true))
 		goto retry;
 
-	io_req_set_res(req, ret, 0);
-	return (issue_flags & IO_URING_F_MULTISHOT) ? IOU_STOP_MULTISHOT : IOU_OK;
+	return -ECANCELED;
 }
 
 int io_socket_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
[ upstream commit 1bfed23349716a7811645336a7ce42c4b8f250bc ]
Don't allow multishot accept CQEs to overflow; we want to limit the growth of the overflow list.
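As a side note, this changes what userspace sees when the CQ ring fills up: instead of overflowed CQEs, the multishot accept now terminates. A hedged liburing sketch (illustration only, not part of this patch) of how a consumer re-arms:

	/*
	 * Hedged liburing sketch (illustration only): a multishot accept
	 * can now end when the CQ ring is full, so re-arm when
	 * IORING_CQE_F_MORE is clear.
	 */
	#include <liburing.h>

	static void rearm_accept(struct io_uring *ring, int listen_fd)
	{
		struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

		io_uring_prep_multishot_accept(sqe, listen_fd, NULL, NULL, 0);
		io_uring_submit(ring);
	}

	static void handle_cqe(struct io_uring *ring, int listen_fd,
			       struct io_uring_cqe *cqe)
	{
		if (cqe->res >= 0) {
			/* cqe->res is a newly accepted socket fd */
		}
		if (!(cqe->flags & IORING_CQE_F_MORE))
			rearm_accept(ring, listen_fd);
	}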
Cc: stable@vger.kernel.org
Fixes: 4e86a2c980137 ("io_uring: implement multishot mode for accept")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7d0d749649244873772623dd7747966f516fe6e2.169175766...
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 io_uring/net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/io_uring/net.c b/io_uring/net.c
index 00b4433b6cd8..7245218fdbe2 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -1339,7 +1339,7 @@ int io_accept(struct io_kiocb *req, unsigned int issue_flags)
 
 	if (ret < 0)
 		return ret;
-	if (io_post_aux_cqe(ctx, req->cqe.user_data, ret, IORING_CQE_F_MORE, true))
+	if (io_post_aux_cqe(ctx, req->cqe.user_data, ret, IORING_CQE_F_MORE, false))
 		goto retry;
 
 	return -ECANCELED;
[ upstream commit 45500dc4e01c167ee063f3dcc22f51ced5b2b1e9 ]
io-wq retries iopoll even when it fails with -EAGAIN. If that races with task exit, which sets TIF_NOTIFY_SIGNAL for all of the task's workers, such workers can end up spinning forever, retrying iopoll again and again and failing each time on some allocation, wait, or the like. Don't keep spinning if io-wq is dying.
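For context, a simplified sketch of the affected retry loop in io_wq_submit_work() (most of the function omitted); the new io_wq_worker_stopped() check is what breaks the spin:

	/*
	 * Simplified sketch of the io_wq_submit_work() retry loop.
	 * Without io_wq_worker_stopped(), an exiting worker could spin
	 * here forever on repeated -EAGAIN.
	 */
	do {
		ret = io_issue_sqe(req, issue_flags);
		if (ret != -EAGAIN)
			break;
		if (!needs_poll) {
			if (!(req->ctx->flags & IORING_SETUP_IOPOLL))
				break;
			if (io_wq_worker_stopped())	/* io-wq is dying */
				break;
			cond_resched();			/* retry iopoll */
			continue;
		}
		if (io_arm_poll_handler(req, issue_flags) == IO_APOLL_OK)
			return;
		/* poll arming failed, retry blocking */
		needs_poll = false;
		issue_flags &= ~IO_URING_F_NONBLOCK;
	} while (1);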
Fixes: 561fb04a6a225 ("io_uring: replace workqueue usage with io-wq")
Cc: stable@vger.kernel.org
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 io_uring/io-wq.c    | 10 ++++++++++
 io_uring/io-wq.h    |  1 +
 io_uring/io_uring.c |  2 ++
 3 files changed, 13 insertions(+)
diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c
index 411bb2d1acd4..dc3d4b835622 100644
--- a/io_uring/io-wq.c
+++ b/io_uring/io-wq.c
@@ -181,6 +181,16 @@ static void io_worker_ref_put(struct io_wq *wq)
 		complete(&wq->worker_done);
 }
 
+bool io_wq_worker_stopped(void)
+{
+	struct io_worker *worker = current->worker_private;
+
+	if (WARN_ON_ONCE(!io_wq_current_is_worker()))
+		return true;
+
+	return test_bit(IO_WQ_BIT_EXIT, &worker->wqe->wq->state);
+}
+
 static void io_worker_cancel_cb(struct io_worker *worker)
 {
 	struct io_wqe_acct *acct = io_wqe_get_acct(worker);
diff --git a/io_uring/io-wq.h b/io_uring/io-wq.h
index 31228426d192..31cc5cc9048c 100644
--- a/io_uring/io-wq.h
+++ b/io_uring/io-wq.h
@@ -52,6 +52,7 @@ void io_wq_hash_work(struct io_wq_work *work, void *val);
 
 int io_wq_cpu_affinity(struct io_wq *wq, cpumask_var_t mask);
 int io_wq_max_workers(struct io_wq *wq, int *new_count);
+bool io_wq_worker_stopped(void);
 
 static inline bool io_wq_is_hashed(struct io_wq_work *work)
 {
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 6d455e2428b9..7c8e81057eb1 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1823,6 +1823,8 @@ void io_wq_submit_work(struct io_wq_work *work)
 		if (!needs_poll) {
 			if (!(req->ctx->flags & IORING_SETUP_IOPOLL))
 				break;
+			if (io_wq_worker_stopped())
+				break;
 			cond_resched();
 			continue;
 		}
From: Jens Axboe <axboe@kernel.dk>
[ upstream commit ebdfefc09c6de7897962769bd3e63a2ff443ebf5 ]
If we set up the ring with SQPOLL, then that polling thread has its own io-wq setup. This means that if the application uses IORING_REGISTER_IOWQ_AFF to set the io-wq affinity, we should not be setting it for the invoking task, but rather for the sqpoll task.
Add an sqpoll helper that parks the thread and updates the affinity, and use that one if we're using SQPOLL.
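For illustration, a hedged userspace sketch using liburing's io_uring_register_iowq_aff() (not part of this patch): for a ring created with IORING_SETUP_SQPOLL, the mask now lands on the sqpoll thread's io-wq workers rather than the calling task's:

	/* Hedged liburing sketch (illustration only). */
	#include <liburing.h>
	#include <sched.h>

	static int pin_iowq(struct io_uring *ring, int cpu)
	{
		cpu_set_t mask;

		CPU_ZERO(&mask);
		CPU_SET(cpu, &mask);	/* restrict io-wq workers to one CPU */
		return io_uring_register_iowq_aff(ring, sizeof(mask), &mask);
	}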
Fixes: fe76421d1da1 ("io_uring: allow user configurable IO thread CPU affinity")
Cc: stable@vger.kernel.org # 5.10+
Link: https://github.com/axboe/liburing/discussions/884
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/io-wq.c    |  7 +++++--
 io_uring/io-wq.h    |  2 +-
 io_uring/io_uring.c | 29 ++++++++++++++++++-----------
 io_uring/sqpoll.c   | 15 +++++++++++++++
 io_uring/sqpoll.h   |  1 +
 5 files changed, 40 insertions(+), 14 deletions(-)
diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c
index dc3d4b835622..98ac9dbcec2f 100644
--- a/io_uring/io-wq.c
+++ b/io_uring/io-wq.c
@@ -1350,13 +1350,16 @@ static int io_wq_cpu_offline(unsigned int cpu, struct hlist_node *node)
 	return __io_wq_cpu_online(wq, cpu, false);
 }
 
-int io_wq_cpu_affinity(struct io_wq *wq, cpumask_var_t mask)
+int io_wq_cpu_affinity(struct io_uring_task *tctx, cpumask_var_t mask)
 {
 	int i;
 
+	if (!tctx || !tctx->io_wq)
+		return -EINVAL;
+
 	rcu_read_lock();
 	for_each_node(i) {
-		struct io_wqe *wqe = wq->wqes[i];
+		struct io_wqe *wqe = tctx->io_wq->wqes[i];
 
 		if (mask)
 			cpumask_copy(wqe->cpu_mask, mask);
diff --git a/io_uring/io-wq.h b/io_uring/io-wq.h
index 31cc5cc9048c..2b2a6406dd8e 100644
--- a/io_uring/io-wq.h
+++ b/io_uring/io-wq.h
@@ -50,7 +50,7 @@ void io_wq_put_and_exit(struct io_wq *wq);
 void io_wq_enqueue(struct io_wq *wq, struct io_wq_work *work);
 void io_wq_hash_work(struct io_wq_work *work, void *val);
 
-int io_wq_cpu_affinity(struct io_wq *wq, cpumask_var_t mask);
+int io_wq_cpu_affinity(struct io_uring_task *tctx, cpumask_var_t mask);
 int io_wq_max_workers(struct io_wq *wq, int *new_count);
 bool io_wq_worker_stopped(void);
 
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 7c8e81057eb1..f413ebed81ab 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3835,16 +3835,28 @@ static int io_register_enable_rings(struct io_ring_ctx *ctx)
 	return 0;
 }
 
+static __cold int __io_register_iowq_aff(struct io_ring_ctx *ctx,
+					 cpumask_var_t new_mask)
+{
+	int ret;
+
+	if (!(ctx->flags & IORING_SETUP_SQPOLL)) {
+		ret = io_wq_cpu_affinity(current->io_uring, new_mask);
+	} else {
+		mutex_unlock(&ctx->uring_lock);
+		ret = io_sqpoll_wq_cpu_affinity(ctx, new_mask);
+		mutex_lock(&ctx->uring_lock);
+	}
+
+	return ret;
+}
+
 static __cold int io_register_iowq_aff(struct io_ring_ctx *ctx,
 				       void __user *arg, unsigned len)
 {
-	struct io_uring_task *tctx = current->io_uring;
 	cpumask_var_t new_mask;
 	int ret;
 
-	if (!tctx || !tctx->io_wq)
-		return -EINVAL;
-
 	if (!alloc_cpumask_var(&new_mask, GFP_KERNEL))
 		return -ENOMEM;
 
@@ -3865,19 +3877,14 @@ static __cold int io_register_iowq_aff(struct io_ring_ctx *ctx,
 		return -EFAULT;
 	}
 
-	ret = io_wq_cpu_affinity(tctx->io_wq, new_mask);
+	ret = __io_register_iowq_aff(ctx, new_mask);
 	free_cpumask_var(new_mask);
 	return ret;
 }
 
 static __cold int io_unregister_iowq_aff(struct io_ring_ctx *ctx)
 {
-	struct io_uring_task *tctx = current->io_uring;
-
-	if (!tctx || !tctx->io_wq)
-		return -EINVAL;
-
-	return io_wq_cpu_affinity(tctx->io_wq, NULL);
+	return __io_register_iowq_aff(ctx, NULL);
 }
 
 static __cold int io_register_iowq_max_workers(struct io_ring_ctx *ctx,
diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
index 6ffa5cf1bbb8..2949959cbe60 100644
--- a/io_uring/sqpoll.c
+++ b/io_uring/sqpoll.c
@@ -423,3 +423,18 @@ __cold int io_sq_offload_create(struct io_ring_ctx *ctx,
 	io_sq_thread_finish(ctx);
 	return ret;
 }
+
+__cold int io_sqpoll_wq_cpu_affinity(struct io_ring_ctx *ctx,
+				     cpumask_var_t mask)
+{
+	struct io_sq_data *sqd = ctx->sq_data;
+	int ret = -EINVAL;
+
+	if (sqd) {
+		io_sq_thread_park(sqd);
+		ret = io_wq_cpu_affinity(sqd->thread->io_uring, mask);
+		io_sq_thread_unpark(sqd);
+	}
+
+	return ret;
+}
diff --git a/io_uring/sqpoll.h b/io_uring/sqpoll.h
index 0c3fbcd1f583..36245f1afa5e 100644
--- a/io_uring/sqpoll.h
+++ b/io_uring/sqpoll.h
@@ -27,3 +27,4 @@ void io_sq_thread_park(struct io_sq_data *sqd);
 void io_sq_thread_unpark(struct io_sq_data *sqd);
 void io_put_sq_data(struct io_sq_data *sqd);
 int io_sqpoll_wait_sq(struct io_ring_ctx *ctx);
+int io_sqpoll_wq_cpu_affinity(struct io_ring_ctx *ctx, cpumask_var_t mask);
From: Gabriel Krisman Bertazi <krisman@suse.de>
[ upstream commit bd6fc5da4c51107e1e0cec4a3a07963d1dae2c84 ]
Syzbot reported a null-ptr-deref of sqd->thread inside io_sqpoll_wq_cpu_affinity. It turns out sqd->thread can go away from under us during io_uring_register if the process gets a fatal signal while the registration is in progress.
It is not particularly hard to hit the race, and while I am not sure this is the exact case hit by syzbot, this patch resolves it. Checking ->thread is enough to close the race because we hold the sqd lock while "parking" the thread, which prevents it from going away.
I reproduced it fairly consistently with a program that does:
int main(void)
{
	...
	io_uring_queue_init(RING_LEN, &ring1, IORING_SETUP_SQPOLL);
	while (1) {
		io_uring_register_iowq_aff(ring, 1, &mask);
	}
}
Executed in a loop with timeout to trigger SIGTERM:

  while true; do timeout 1 /a.out ; done
This will hit the following BUG() in very few attempts.
BUG: kernel NULL pointer dereference, address: 00000000000007a8
PGD 800000010e949067 P4D 800000010e949067 PUD 10e46e067 PMD 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 15715 Comm: dead-sqpoll Not tainted 6.5.0-rc7-next-20230825-g193296236fa0-dirty #23
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:io_sqpoll_wq_cpu_affinity+0x27/0x70
Code: 90 90 90 0f 1f 44 00 00 55 53 48 8b 9f 98 03 00 00 48 85 db 74 4f 48 89 df 48 89 f5 e8 e2 f8 ff ff 48 8b 43 38 48 85 c0 74 22 <48> 8b b8 a8 07 00 00 48 89 ee e8 ba b1 00 00 48 89 df 89 c5 e8 70
RSP: 0018:ffffb04040ea7e70 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff93c010749e40 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffffffffa7653331 RDI: 00000000ffffffff
RBP: ffffb04040ea7eb8 R08: 0000000000000000 R09: c0000000ffffdfff
R10: ffff93c01141b600 R11: ffffb04040ea7d18 R12: ffff93c00ea74840
R13: 0000000000000011 R14: 0000000000000000 R15: ffff93c00ea74800
FS:  00007fb7c276ab80(0000) GS:ffff93c36f200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000007a8 CR3: 0000000111634003 CR4: 0000000000370ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 ? __die_body+0x1a/0x60
 ? page_fault_oops+0x154/0x440
 ? do_user_addr_fault+0x174/0x7b0
 ? exc_page_fault+0x63/0x140
 ? asm_exc_page_fault+0x22/0x30
 ? io_sqpoll_wq_cpu_affinity+0x27/0x70
 __io_register_iowq_aff+0x2b/0x60
 __io_uring_register+0x614/0xa70
 __x64_sys_io_uring_register+0xaa/0x1a0
 do_syscall_64+0x3a/0x90
 entry_SYSCALL_64_after_hwframe+0x6e/0xd8
RIP: 0033:0x7fb7c226fec9
Code: 2e 00 b8 ca 00 00 00 0f 05 eb a5 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 97 7f 2d 00 f7 d8 64 89 01 48
RSP: 002b:00007ffe2c0674f8 EFLAGS: 00000246 ORIG_RAX: 00000000000001ab
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fb7c226fec9
RDX: 00007ffe2c067530 RSI: 0000000000000011 RDI: 0000000000000003
RBP: 00007ffe2c0675d0 R08: 00007ffe2c067550 R09: 00007ffe2c067550
R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffe2c067750 R14: 0000000000000000 R15: 0000000000000000
 </TASK>
Modules linked in:
CR2: 00000000000007a8
---[ end trace 0000000000000000 ]---
Reported-by: syzbot+c74fea926a78b8a91042@syzkaller.appspotmail.com
Fixes: ebdfefc09c6d ("io_uring/sqpoll: fix io-wq affinity when IORING_SETUP_SQPOLL is used")
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://lore.kernel.org/r/87v8cybuo6.fsf@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/sqpoll.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
index 2949959cbe60..7b6facf529b8 100644
--- a/io_uring/sqpoll.c
+++ b/io_uring/sqpoll.c
@@ -432,7 +432,9 @@ __cold int io_sqpoll_wq_cpu_affinity(struct io_ring_ctx *ctx,
 
 	if (sqd) {
 		io_sq_thread_park(sqd);
-		ret = io_wq_cpu_affinity(sqd->thread->io_uring, mask);
+		/* Don't set affinity for a dying thread */
+		if (sqd->thread)
+			ret = io_wq_cpu_affinity(sqd->thread->io_uring, mask);
 		io_sq_thread_unpark(sqd);
 	}
 