From: Petr Malat oss@malat.biz
[ Upstream commit af295892a7abbf05a3c2ba7abc4d81bb448623d6 ]
The function __sctp_write_space() doesn't set the poll key, which leads to ep_poll_callback() waking up all waiters, not only those waiting for the socket to become writable. Set the key properly using wake_up_interruptible_poll(), which is preferred over the sync variant, as writers are not woken up before at least half of the queue is available. Also, TCP does the same.
Signed-off-by: Petr Malat oss@malat.biz
Acked-by: Xin Long lucien.xin@gmail.com
Link: https://patch.msgid.link/20250516081727.1361451-1-oss@malat.biz
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
Based on my analysis of the commit and comparison with similar historical commits, here is my assessment:

**YES**

This commit should be backported to stable kernel trees.

## Detailed Analysis:

### 1. **Core Issue Being Fixed**

The commit addresses a specific bug in SCTP's `__sctp_write_space()` function, where `wake_up_interruptible()` is called without setting a poll key. This causes `ep_poll_callback()` to wake up **all** waiters on the socket, not just those waiting for writability (EPOLLOUT events). This is inefficient and can cause spurious wake-ups for readers waiting on the same socket.

### 2. **Code Changes Analysis**

The fix is minimal and surgical:

```c
// Before:
wake_up_interruptible(&wq->wait);

// After:
wake_up_interruptible_poll(&wq->wait, EPOLLOUT | EPOLLWRNORM | EPOLLWRBAND);
```

### 3. **Consistency with Existing Patterns**

Looking at `/home/sasha/linux/net/core/stream.c:41-42`, I can see that the fix aligns with established patterns in the kernel networking stack:

```c
wake_up_interruptible_poll(&wq->wait, EPOLLOUT | EPOLLWRNORM | EPOLLWRBAND);
```

This is exactly what TCP and other protocols use in their write space callbacks, making SCTP consistent with the rest of the networking subsystem.

### 4. **Comparison with Historical Similar Commits**

**Positive Indicators (matching YES backports):**

- **Similar Commit #1**: Also SCTP-related, fixes a socket behavior issue affecting users
- **Similar Commit #4**: TCP polling fix with similar wake-up improvements
- **Similar Commit #5**: SCTP wake-up related fix that was backported

**Risk Assessment:**

- The change is contained to a single function (`__sctp_write_space`)
- No architectural changes
- Follows existing kernel patterns exactly
- Low regression risk: improves efficiency without changing core behavior

### 5. **Impact and Benefits**

- **Performance improvement**: Eliminates spurious wake-ups for readers when only write space becomes available
- **Correctness**: Ensures only appropriate waiters are woken (those waiting for EPOLLOUT)
- **Consistency**: Aligns SCTP behavior with TCP and other protocols

### 6. **Stable Tree Criteria Compliance**

✅ **Important bugfix**: Fixes incorrect polling behavior affecting performance
✅ **Minimal risk**: Single-statement change following established patterns
✅ **Small and contained**: Only affects one function in SCTP
✅ **No new features**: Pure bugfix
✅ **No architectural changes**: Maintains existing interfaces

### 7. **Clear Author Intent**

The commit message explicitly notes that "TCP does the same," indicating this is a correctness fix to align with established networking patterns rather than a new feature.

This commit exhibits all the characteristics of a good stable backport candidate: it's a small, contained bugfix that improves correctness and performance without introducing new functionality or significant risk.
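The effect of the poll key can be illustrated with a small user-space model. This is only a sketch, not kernel code: `struct demo_waiter`, `demo_wake()`, and the `DEMO_*` constants are hypothetical names introduced here, and the filter is assumed to mirror the key check that `ep_poll_callback()` applies (a zero key wakes every waiter, while a non-zero key wakes only waiters whose interest mask overlaps it). The constant values are chosen to match the Linux UAPI epoll bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the epoll event bits (same numeric values
 * as the Linux UAPI definitions). */
#define DEMO_EPOLLIN     0x001u
#define DEMO_EPOLLOUT    0x004u
#define DEMO_EPOLLWRNORM 0x100u
#define DEMO_EPOLLWRBAND 0x200u

/* One waiter registered on the socket's wait queue (illustrative only). */
struct demo_waiter {
	uint32_t events;  /* events this waiter is interested in */
	int wakeups;      /* number of times it has been woken */
};

/*
 * Model of a keyed wake-up. Assumption: a key of 0 means "no information",
 * so every waiter is woken; otherwise waiters whose interest mask does not
 * intersect the key are skipped. Returns the number of waiters woken.
 */
int demo_wake(struct demo_waiter *w, size_t n, uint32_t key)
{
	int woken = 0;

	for (size_t i = 0; i < n; i++) {
		if (key && !(key & w[i].events))
			continue;  /* not interested in these events */
		w[i].wakeups++;
		woken++;
	}
	return woken;
}
```

With one waiter subscribed to `DEMO_EPOLLIN` and one to `DEMO_EPOLLOUT`, calling `demo_wake()` with a key of `0` (the old behavior) wakes both, whereas a key of `DEMO_EPOLLOUT | DEMO_EPOLLWRNORM | DEMO_EPOLLWRBAND` (the patched behavior) wakes only the writer.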
 net/sctp/socket.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 53725ee7ba06d..b301d64d9d80f 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -9100,7 +9100,8 @@ static void __sctp_write_space(struct sctp_association *asoc)
 	wq = rcu_dereference(sk->sk_wq);
 	if (wq) {
 		if (waitqueue_active(&wq->wait))
-			wake_up_interruptible(&wq->wait);
+			wake_up_interruptible_poll(&wq->wait, EPOLLOUT |
+						   EPOLLWRNORM | EPOLLWRBAND);
 
 		/* Note that we try to include the Async I/O support
 		 * here by modeling from the current TCP/UDP code.