From: Pavel Begunkov <asml.silence@gmail.com>
[ Upstream commit fde04c7e2775feb0746301e0ef86a04d3598c3fe ]
io_queue_deferred() is not tolerant to spurious calls made while some requests are still incomplete. You can have an inflight drain-marked request and another request that came after it and got queued into the drain list. Now, if io_queue_deferred() is called before the first request completes, it'll check the second request with req_need_defer(), find that no drain flag is set, and queue it for execution.
To make io_queue_deferred() work correctly, it should at least check sequences for the first request, and we also need to check whether there is another drain request creating another bubble.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/972bde11b7d4ef25b3f5e3fd34f80e4d2aa345b8.174678871...
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
Based on my analysis of the commit message and code changes, here is my evaluation:

**YES**

This commit should be backported to stable kernel trees. Here's my extensive analysis:

## Bug Analysis

The commit fixes a serious correctness issue in the io_uring drain mechanism. The problem described is:

1. **Spurious drain flushing**: The original `io_queue_deferred()` function was not tolerant to spurious calls.
2. **Race condition**: When a drain-marked request is inflight and another request gets queued into the defer list, calling `io_queue_deferred()` before the first request completes could incorrectly queue the second request for execution.
3. **Incorrect sequence checking**: The function only checked `req_need_defer()`, which looks for the drain flag, but didn't properly handle the sequencing requirements.

## Code Changes Analysis

The fix introduces several important changes:

1. **New helper function `io_drain_defer_seq()`**:

```c
static bool io_drain_defer_seq(struct io_kiocb *req, u32 seq)
{
	struct io_ring_ctx *ctx = req->ctx;

	return seq + READ_ONCE(ctx->cq_extra) != ctx->cached_cq_tail;
}
```

This properly checks sequence ordering for drain requests.

2. **Enhanced `io_queue_deferred()` logic**:
   - Tracks whether any drain request has been seen (`drain_seen`)
   - Uses a `first` flag to handle the initial request specially
   - Replaces the simple `req_need_defer()` check with `io_drain_defer_seq()` when drain semantics are involved

3. **Proper drain bubble handling**: The new logic ensures that when there's a drain request, subsequent requests wait for proper sequencing, preventing the "spurious drain flushing" problem.
## Backport Criteria Assessment

✅ **Fixes important bug**: This addresses a correctness issue in io_uring's drain mechanism that could lead to out-of-order execution of requests, violating user expectations and potentially causing data corruption or race conditions.

✅ **Small and contained**: The changes are minimal and focused — adding one helper function and modifying the logic in one existing function. The change is about 20 lines total.

✅ **No architectural changes**: This doesn't change the overall io_uring architecture, just fixes the drain sequencing logic.

✅ **Low regression risk**: The fix makes the drain checking more conservative (stricter), which is safer than the current buggy behavior.

✅ **Critical subsystem**: io_uring is a critical high-performance I/O subsystem used by databases, storage systems, and other performance-critical applications.

## Comparison with Similar Commits

Looking at the provided similar commits:

- Most drain-related fixes were marked as **NO** because they were optimizations or refactoring.
- However, the one marked **YES** (commit about "counter inc/dec mismatch") was a correctness fix similar to this one.
- This commit fits the pattern of the **YES** example: it fixes a functional bug rather than optimizing code.

## Risk Assessment

**Minimal risk**: The change makes drain checking more conservative, so the worst case would be slightly more restrictive ordering, which maintains correctness. There's no risk of introducing the opposite problem (allowing incorrect reordering).

## Conclusion

This commit fixes a real correctness bug in a critical kernel subsystem with minimal, conservative changes. It prevents potential data races and ordering violations in io_uring drain operations, making it an excellent candidate for stable backporting.
 io_uring/io_uring.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index edda31a15c6e6..9266d4f2016ad 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -537,18 +537,30 @@ void io_req_queue_iowq(struct io_kiocb *req)
 	io_req_task_work_add(req);
 }
 
+static bool io_drain_defer_seq(struct io_kiocb *req, u32 seq)
+{
+	struct io_ring_ctx *ctx = req->ctx;
+
+	return seq + READ_ONCE(ctx->cq_extra) != ctx->cached_cq_tail;
+}
+
 static __cold noinline void io_queue_deferred(struct io_ring_ctx *ctx)
 {
+	bool drain_seen = false, first = true;
+
 	spin_lock(&ctx->completion_lock);
 	while (!list_empty(&ctx->defer_list)) {
 		struct io_defer_entry *de = list_first_entry(&ctx->defer_list,
						struct io_defer_entry, list);
 
-		if (req_need_defer(de->req, de->seq))
+		drain_seen |= de->req->flags & REQ_F_IO_DRAIN;
+		if ((drain_seen || first) && io_drain_defer_seq(de->req, de->seq))
 			break;
+
 		list_del_init(&de->list);
 		io_req_task_queue(de->req);
 		kfree(de);
+		first = false;
 	}
 	spin_unlock(&ctx->completion_lock);
 }