The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: e34dbbc85d64af59176fe59fad7b4122f4330fe2
Gitweb: https://git.kernel.org/tip/e34dbbc85d64af59176fe59fad7b4122f4330fe2
Author: Xin Li (Intel) <xin(a)zytor.com>
AuthorDate: Mon, 09 Jun 2025 01:40:53 -07:00
Committer: Dave Hansen <dave.hansen(a)linux.intel.com>
CommitterDate: Mon, 09 Jun 2025 08:50:58 -07:00
x86/fred/signal: Prevent immediate repeat of single step trap on return from SIGTRAP handler
Clear the software event flag in the augmented SS to prevent immediate
repeat of single step trap on return from SIGTRAP handler if the trap
flag (TF) is set without an external debugger attached.
Following is a typical single-stepping flow for a user process:
1) The user process is prepared for single-stepping by setting
RFLAGS.TF = 1.
2) When any instruction in user space completes, a #DB is triggered.
3) The kernel handles the #DB and returns to user space, invoking the
SIGTRAP handler with RFLAGS.TF = 0.
4) After the SIGTRAP handler finishes, the user process performs a
sigreturn syscall, restoring the original state, including
RFLAGS.TF = 1.
5) Goto step 2.
According to the FRED specification:
A) Bit 17 in the augmented SS is designated as the software event
flag, which is set to 1 for FRED event delivery of SYSCALL,
SYSENTER, or INT n.
B) If bit 17 of the augmented SS is 1 and ERETU would result in
RFLAGS.TF = 1, a single-step trap will be pending upon completion
of ERETU.
In step 4) above, the software event flag is set upon the sigreturn
syscall, and its corresponding ERETU would restore RFLAGS.TF = 1.
This combination causes a pending single-step trap upon completion of
ERETU. Therefore, another #DB is triggered before any user space
instruction is executed, which leads to an infinite loop in which the
SIGTRAP handler keeps being invoked on the same user space IP.
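The interaction between rule B) and the fix can be modelled in a few lines of plain C. This is only an illustrative sketch, not kernel code: `struct demo_frame`, `demo_trap_pending_after_eretu()` and `demo_sigreturn()` are hypothetical names, and the frame is reduced to the two fields that matter here.

```c
#include <stdbool.h>

#define DEMO_EFLAGS_TF 0x100UL	/* trap flag, bit 8 of RFLAGS */

/* Hypothetical, heavily simplified stand-in for the FRED return frame. */
struct demo_frame {
	unsigned long flags;	/* RFLAGS value that ERETU will restore */
	bool swevent;		/* software event flag (bit 17 of augmented SS) */
};

/* Rule B): a single-step trap is pending iff swevent is set and TF would be 1. */
static bool demo_trap_pending_after_eretu(const struct demo_frame *f)
{
	return f->swevent && (f->flags & DEMO_EFLAGS_TF);
}

/*
 * Model of a sigreturn syscall. entry_flags is RFLAGS when the syscall was
 * issued; saved_flags is the value restored from the signal frame.
 * Returns true if a #DB would fire before any user instruction runs.
 */
static bool demo_sigreturn(unsigned long entry_flags,
			   unsigned long saved_flags, bool apply_fix)
{
	/* Rule A): SYSCALL delivery sets the software event flag. */
	struct demo_frame f = { .flags = entry_flags, .swevent = true };

	/* The fix: clear swevent unless the syscall itself is being stepped. */
	if (apply_fix && !(f.flags & DEMO_EFLAGS_TF))
		f.swevent = false;

	f.flags = saved_flags;	/* restore the interrupted context */
	return demo_trap_pending_after_eretu(&f);
}
```

In this model, a handler that runs with TF = 0 and returns to a context with TF = 1 re-traps immediately without the fix and does not with it, while a sigreturn that is itself being single-stepped keeps its pending trap.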
Fixes: 14619d912b65 ("x86/fred: FRED entry/exit and dispatch code")
Suggested-by: H. Peter Anvin (Intel) <hpa(a)zytor.com>
Signed-off-by: Xin Li (Intel) <xin(a)zytor.com>
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Tested-by: Sohil Mehta <sohil.mehta(a)intel.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/20250609084054.2083189-2-xin%40zytor.com
---
arch/x86/include/asm/sighandling.h | 22 ++++++++++++++++++++++
arch/x86/kernel/signal_32.c | 4 ++++
arch/x86/kernel/signal_64.c | 4 ++++
3 files changed, 30 insertions(+)
diff --git a/arch/x86/include/asm/sighandling.h b/arch/x86/include/asm/sighandling.h
index e770c4f..8727c7e 100644
--- a/arch/x86/include/asm/sighandling.h
+++ b/arch/x86/include/asm/sighandling.h
@@ -24,4 +24,26 @@ int ia32_setup_rt_frame(struct ksignal *ksig, struct pt_regs *regs);
int x64_setup_rt_frame(struct ksignal *ksig, struct pt_regs *regs);
int x32_setup_rt_frame(struct ksignal *ksig, struct pt_regs *regs);
+/*
+ * To prevent immediate repeat of single step trap on return from SIGTRAP
+ * handler if the trap flag (TF) is set without an external debugger attached,
+ * clear the software event flag in the augmented SS, ensuring no single-step
+ * trap is pending upon ERETU completion.
+ *
+ * Note, this function should be called in sigreturn() before the original
+ * state is restored to make sure the TF is read from the entry frame.
+ */
+static __always_inline void prevent_single_step_upon_eretu(struct pt_regs *regs)
+{
+	/*
+	 * If the trap flag (TF) is set, i.e., the sigreturn() SYSCALL instruction
+	 * is being single-stepped, do not clear the software event flag in the
+	 * augmented SS, thus a debugger won't skip over the following instruction.
+	 */
+#ifdef CONFIG_X86_FRED
+	if (!(regs->flags & X86_EFLAGS_TF))
+		regs->fred_ss.swevent = 0;
+#endif
+}
+
#endif /* _ASM_X86_SIGHANDLING_H */
diff --git a/arch/x86/kernel/signal_32.c b/arch/x86/kernel/signal_32.c
index 98123ff..42bbc42 100644
--- a/arch/x86/kernel/signal_32.c
+++ b/arch/x86/kernel/signal_32.c
@@ -152,6 +152,8 @@ SYSCALL32_DEFINE0(sigreturn)
struct sigframe_ia32 __user *frame = (struct sigframe_ia32 __user *)(regs->sp-8);
sigset_t set;
+ prevent_single_step_upon_eretu(regs);
+
if (!access_ok(frame, sizeof(*frame)))
goto badframe;
if (__get_user(set.sig[0], &frame->sc.oldmask)
@@ -175,6 +177,8 @@ SYSCALL32_DEFINE0(rt_sigreturn)
struct rt_sigframe_ia32 __user *frame;
sigset_t set;
+ prevent_single_step_upon_eretu(regs);
+
frame = (struct rt_sigframe_ia32 __user *)(regs->sp - 4);
if (!access_ok(frame, sizeof(*frame)))
diff --git a/arch/x86/kernel/signal_64.c b/arch/x86/kernel/signal_64.c
index ee94538..d483b58 100644
--- a/arch/x86/kernel/signal_64.c
+++ b/arch/x86/kernel/signal_64.c
@@ -250,6 +250,8 @@ SYSCALL_DEFINE0(rt_sigreturn)
sigset_t set;
unsigned long uc_flags;
+ prevent_single_step_upon_eretu(regs);
+
frame = (struct rt_sigframe __user *)(regs->sp - sizeof(long));
if (!access_ok(frame, sizeof(*frame)))
goto badframe;
@@ -366,6 +368,8 @@ COMPAT_SYSCALL_DEFINE0(x32_rt_sigreturn)
sigset_t set;
unsigned long uc_flags;
+ prevent_single_step_upon_eretu(regs);
+
frame = (struct rt_sigframe_x32 __user *)(regs->sp - 8);
if (!access_ok(frame, sizeof(*frame)))
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: f287822688eeb44ae1cf6ac45701d965efc33218
Gitweb: https://git.kernel.org/tip/f287822688eeb44ae1cf6ac45701d965efc33218
Author: Xin Li (Intel) <xin(a)zytor.com>
AuthorDate: Mon, 09 Jun 2025 01:40:54 -07:00
Committer: Dave Hansen <dave.hansen(a)linux.intel.com>
CommitterDate: Mon, 09 Jun 2025 08:52:06 -07:00
selftests/x86: Add a test to detect infinite SIGTRAP handler loop
When FRED is enabled, if the Trap Flag (TF) is set without an external
debugger attached, it can lead to an infinite loop in the SIGTRAP
handler. To avoid this, the software event flag in the augmented SS
must be cleared, ensuring that no single-step trap remains pending when
ERETU completes.
This test checks for that specific scenario, verifying that the kernel
correctly prevents an infinite SIGTRAP loop in this edge case when FRED
is enabled.
The test should _always_ pass with IDT event delivery, so there is no
need to disable it when FRED is not enabled.
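The detection heuristic used by the test can be sketched independently of signal handling. The names below (`struct demo_loop_detector`, `demo_note_trap()`) are hypothetical and only mirror the idea of counting consecutive traps at the same instruction pointer; the threshold is passed in by the caller.

```c
#include <stdbool.h>

/* Hypothetical state for spotting repeated traps at one address. */
struct demo_loop_detector {
	unsigned long last_ip;	/* IP reported by the previous trap */
	unsigned int repeats;	/* consecutive repeats of last_ip */
};

/*
 * Record one trap at `ip`. Returns true once the same IP has repeated
 * more than `limit` times in a row, which is treated as an infinite loop.
 */
static bool demo_note_trap(struct demo_loop_detector *d,
			   unsigned long ip, unsigned int limit)
{
	if (d->last_ip == ip)
		return ++d->repeats > limit;

	/* A new IP means forward progress: restart the count. */
	d->last_ip = ip;
	d->repeats = 0;
	return false;
}
```

With a limit of 10, a detector that keeps seeing the same address reports a loop on the eleventh consecutive repeat, while any change of address resets the count.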
Signed-off-by: Xin Li (Intel) <xin(a)zytor.com>
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Tested-by: Sohil Mehta <sohil.mehta(a)intel.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/20250609084054.2083189-3-xin%40zytor.com
---
tools/testing/selftests/x86/Makefile | 2 +-
tools/testing/selftests/x86/sigtrap_loop.c | 101 ++++++++++++++++++++-
2 files changed, 102 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/x86/sigtrap_loop.c
diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index f703fcf..8314887 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -12,7 +12,7 @@ CAN_BUILD_WITH_NOPIE := $(shell ./check_cc.sh "$(CC)" trivial_program.c -no-pie)
TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt test_mremap_vdso \
check_initial_reg_state sigreturn iopl ioperm \
- test_vsyscall mov_ss_trap \
+ test_vsyscall mov_ss_trap sigtrap_loop \
syscall_arg_fault fsgsbase_restore sigaltstack
TARGETS_C_BOTHBITS += nx_stack
TARGETS_C_32BIT_ONLY := entry_from_vm86 test_syscall_vdso unwind_vdso \
diff --git a/tools/testing/selftests/x86/sigtrap_loop.c b/tools/testing/selftests/x86/sigtrap_loop.c
new file mode 100644
index 0000000..9d06547
--- /dev/null
+++ b/tools/testing/selftests/x86/sigtrap_loop.c
@@ -0,0 +1,101 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2025 Intel Corporation
+ */
+#define _GNU_SOURCE
+
+#include <err.h>
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/ucontext.h>
+
+#ifdef __x86_64__
+# define REG_IP REG_RIP
+#else
+# define REG_IP REG_EIP
+#endif
+
+static void sethandler(int sig, void (*handler)(int, siginfo_t *, void *), int flags)
+{
+	struct sigaction sa;
+
+	memset(&sa, 0, sizeof(sa));
+	sa.sa_sigaction = handler;
+	sa.sa_flags = SA_SIGINFO | flags;
+	sigemptyset(&sa.sa_mask);
+
+	if (sigaction(sig, &sa, 0))
+		err(1, "sigaction");
+
+	return;
+}
+
+static void sigtrap(int sig, siginfo_t *info, void *ctx_void)
+{
+	ucontext_t *ctx = (ucontext_t *)ctx_void;
+	static unsigned int loop_count_on_same_ip;
+	static unsigned long last_trap_ip;
+
+	if (last_trap_ip == ctx->uc_mcontext.gregs[REG_IP]) {
+		printf("\tTrapped at %016lx\n", last_trap_ip);
+
+		/*
+		 * If the same IP is hit more than 10 times in a row, it is
+		 * _considered_ an infinite loop.
+		 */
+		if (++loop_count_on_same_ip > 10) {
+			printf("[FAIL]\tDetected SIGTRAP infinite loop\n");
+			exit(1);
+		}
+
+		return;
+	}
+
+	loop_count_on_same_ip = 0;
+	last_trap_ip = ctx->uc_mcontext.gregs[REG_IP];
+	printf("\tTrapped at %016lx\n", last_trap_ip);
+}
+
+int main(int argc, char *argv[])
+{
+	sethandler(SIGTRAP, sigtrap, 0);
+
+	/*
+	 * Set the Trap Flag (TF) to single-step the test code, therefore to
+	 * trigger a SIGTRAP signal after each instruction until the TF is
+	 * cleared.
+	 *
+	 * Because the arithmetic flags are not significant here, the TF is
+	 * set by pushing 0x302 onto the stack and then popping it into the
+	 * flags register.
+	 *
+	 * Four instructions in the following asm code are executed with the
+	 * TF set, thus the SIGTRAP handler is expected to run four times.
+	 */
+	printf("[RUN]\tSIGTRAP infinite loop detection\n");
+	asm volatile(
+#ifdef __x86_64__
+		/*
+		 * Avoid clobbering the redzone
+		 *
+		 * Equivalent to "sub $128, %rsp", however -128 can be encoded
+		 * in a single byte immediate while 128 uses 4 bytes.
+		 */
+		"add $-128, %rsp\n\t"
+#endif
+		"push $0x302\n\t"
+		"popf\n\t"
+		"nop\n\t"
+		"nop\n\t"
+		"push $0x202\n\t"
+		"popf\n\t"
+#ifdef __x86_64__
+		"sub $-128, %rsp\n\t"
+#endif
+	);
+
+	printf("[OK]\tNo SIGTRAP infinite loop detected\n");
+	return 0;
+}
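The two immediates pushed by the test, 0x302 and 0x202, differ only in the trap flag. The sketch below spells out that composition; the `DEMO_*` names and `demo_flags_with_tf()` are hypothetical, while the bit positions are the architectural EFLAGS ones.

```c
#define DEMO_EFLAGS_FIXED 0x002UL	/* bit 1, reserved, always reads as 1 */
#define DEMO_EFLAGS_TF    0x100UL	/* bit 8, trap flag */
#define DEMO_EFLAGS_IF    0x200UL	/* bit 9, interrupt enable flag */

/* Build the flags value the test loads with popf, with or without TF. */
static unsigned long demo_flags_with_tf(int tf)
{
	unsigned long flags = DEMO_EFLAGS_FIXED | DEMO_EFLAGS_IF;

	if (tf)
		flags |= DEMO_EFLAGS_TF;
	return flags;
}
```

Calling it with 1 and 0 reproduces the values that start and stop single-stepping in the asm block; the arithmetic flags are simply left clear, as the test's comment notes.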
From: Yu Kuai <yukuai3(a)huawei.com>
[ Upstream commit 2afe17794cfed5f80295b1b9facd66e6f65e5002 ]
It's supposed to be COUNTER_MAX / 2, not COUNTER_MAX.
Link: https://lore.kernel.org/linux-raid/20250524061320.370630-14-yukuai1@huaweic…
Signed-off-by: Yu Kuai <yukuai3(a)huawei.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Hannes Reinecke <hare(a)suse.de>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
Now I have a clear understanding. Let me analyze the specific code
change in the commit:
**YES**
This commit should be backported to stable kernel trees. Here's my
extensive analysis:
## Analysis of the Bug Fix
### 1. The Bug
The code in `md_bitmap_new_disk_sb()` incorrectly validates
`write_behind` against `COUNTER_MAX` instead of `COUNTER_MAX / 2`:
```c
-	if (write_behind > COUNTER_MAX)
+	if (write_behind > COUNTER_MAX / 2)
 		write_behind = COUNTER_MAX / 2;
```
### 2. Why This is a Real Bug
The comment immediately above this code explicitly states:
```c
/*
 * FIXME: write_behind for RAID1. If not specified, what
 * is a good choice? We choose COUNTER_MAX / 2 arbitrarily.
 */
```
The intent is clearly to limit write_behind to `COUNTER_MAX / 2`, but
the validation check was incorrectly comparing against `COUNTER_MAX`.
This creates an inconsistency where:
- Values between `COUNTER_MAX / 2` and `COUNTER_MAX` pass the old check
and are stored unchanged, exceeding the intended limit
- Only values above `COUNTER_MAX` are reduced to `COUNTER_MAX / 2`
- This means the validation isn't serving its intended purpose
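A small side-by-side sketch makes the difference concrete. The limit `DEMO_COUNTER_MAX` is an illustrative value chosen for this example rather than taken from the patch, and both helper functions are hypothetical wrappers around the old and new comparison.

```c
/* Illustrative limit only; the real COUNTER_MAX comes from the md headers. */
#define DEMO_COUNTER_MAX 16383UL

/* Old check: only values above the full limit are reduced. */
static unsigned long demo_clamp_old(unsigned long write_behind)
{
	if (write_behind > DEMO_COUNTER_MAX)
		write_behind = DEMO_COUNTER_MAX / 2;
	return write_behind;
}

/* New check: anything above half the limit is reduced to half the limit. */
static unsigned long demo_clamp_new(unsigned long write_behind)
{
	if (write_behind > DEMO_COUNTER_MAX / 2)
		write_behind = DEMO_COUNTER_MAX / 2;
	return write_behind;
}
```

For an input of 10000, which lies between half the limit and the limit, the old check returns 10000 unchanged while the new one returns 8191. Inputs at or below 8191 are untouched by both, and inputs above 16383 are reduced to 8191 by both.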
### 3. Impact on Users
This bug affects:
- **dm-raid users**: When creating new bitmap superblocks for RAID
arrays with write-behind configured
- **Data consistency**: While not causing data corruption, it creates
confusion about the actual limits
- **Configuration validation**: Users might set values that appear to be
accepted but are silently reduced
### 4. Consistency with Other Code
The fix aligns with the pattern used elsewhere:
- In `dm-raid.c`: The validation correctly checks `value / 2 >
COUNTER_MAX` (equivalent check for sector-based values)
- The clamping value of `COUNTER_MAX / 2` is consistently used as the
default throughout the codebase
### 5. Backport Criteria
This commit meets the stable kernel backport criteria:
- **It fixes a bug**: Incorrect validation logic that doesn't match the
intended behavior
- **Simple and contained**: One-line change with minimal risk
- **Clear fix**: The correct value is obvious from the surrounding code
and comment
- **User-visible impact**: Affects configuration validation for dm-raid
users
### 6. Similar Historical Commits
Looking at the similar commits provided:
- Commit #1 (backported): Added validation to prevent setting
max_write_behind without write-mostly devices
- Commit #4 (backported): Fixed missing bitmap updates with write-mostly
devices
Both of these were backported and dealt with correctness issues in the
write-behind functionality, establishing a pattern that such fixes are
considered stable-worthy.
### 7. Risk Assessment
- **Low risk**: One-line change (appends `/ 2` to the comparison)
- **Well-understood**: The correct limit is documented in the comment
- **No ABI changes**: Just tightens an existing validation check
- **Fail-safe**: Makes the validation more restrictive, not less
## Conclusion
This is a clear bug fix that corrects an incorrect validation check to
match the documented and intended behavior. The fix is trivial, safe,
and addresses a real issue that affects dm-raid users. It should be
backported to ensure consistent behavior across stable kernel versions.
drivers/md/md-bitmap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index 8fc85b6251e48..feff5b29d0985 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -549,7 +549,7 @@ static int md_bitmap_new_disk_sb(struct bitmap *bitmap)
 	 * is a good choice? We choose COUNTER_MAX / 2 arbitrarily.
 	 */
 	write_behind = bitmap->mddev->bitmap_info.max_write_behind;
-	if (write_behind > COUNTER_MAX)
+	if (write_behind > COUNTER_MAX / 2)
 		write_behind = COUNTER_MAX / 2;
 	sb->write_behind = cpu_to_le32(write_behind);
 	bitmap->mddev->bitmap_info.max_write_behind = write_behind;
--
2.39.5
From: Yu Kuai <yukuai3(a)huawei.com>
[ Upstream commit 2afe17794cfed5f80295b1b9facd66e6f65e5002 ]
It's supposed to be COUNTER_MAX / 2, not COUNTER_MAX.
Link: https://lore.kernel.org/linux-raid/20250524061320.370630-14-yukuai1@huaweic…
Signed-off-by: Yu Kuai <yukuai3(a)huawei.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Hannes Reinecke <hare(a)suse.de>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/md/md-bitmap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index 91bc764a854c6..f2ba541ed89d4 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -546,7 +546,7 @@ static int md_bitmap_new_disk_sb(struct bitmap *bitmap)
 	 * is a good choice? We choose COUNTER_MAX / 2 arbitrarily.
 	 */
 	write_behind = bitmap->mddev->bitmap_info.max_write_behind;
-	if (write_behind > COUNTER_MAX)
+	if (write_behind > COUNTER_MAX / 2)
 		write_behind = COUNTER_MAX / 2;
 	sb->write_behind = cpu_to_le32(write_behind);
 	bitmap->mddev->bitmap_info.max_write_behind = write_behind;
--
2.39.5
From: Yu Kuai <yukuai3(a)huawei.com>
[ Upstream commit 2afe17794cfed5f80295b1b9facd66e6f65e5002 ]
It's supposed to be COUNTER_MAX / 2, not COUNTER_MAX.
Link: https://lore.kernel.org/linux-raid/20250524061320.370630-14-yukuai1@huaweic…
Signed-off-by: Yu Kuai <yukuai3(a)huawei.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Hannes Reinecke <hare(a)suse.de>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/md/md-bitmap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index b26e22dd9ba2e..cb84a4ab8d70f 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -546,7 +546,7 @@ static int md_bitmap_new_disk_sb(struct bitmap *bitmap)
 	 * is a good choice? We choose COUNTER_MAX / 2 arbitrarily.
 	 */
 	write_behind = bitmap->mddev->bitmap_info.max_write_behind;
-	if (write_behind > COUNTER_MAX)
+	if (write_behind > COUNTER_MAX / 2)
 		write_behind = COUNTER_MAX / 2;
 	sb->write_behind = cpu_to_le32(write_behind);
 	bitmap->mddev->bitmap_info.max_write_behind = write_behind;
--
2.39.5
From: Yu Kuai <yukuai3(a)huawei.com>
[ Upstream commit 2afe17794cfed5f80295b1b9facd66e6f65e5002 ]
It's supposed to be COUNTER_MAX / 2, not COUNTER_MAX.
Link: https://lore.kernel.org/linux-raid/20250524061320.370630-14-yukuai1@huaweic…
Signed-off-by: Yu Kuai <yukuai3(a)huawei.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Hannes Reinecke <hare(a)suse.de>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/md/md-bitmap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index 02629516748e4..dac27206cd3df 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -546,7 +546,7 @@ static int md_bitmap_new_disk_sb(struct bitmap *bitmap)
 	 * is a good choice? We choose COUNTER_MAX / 2 arbitrarily.
 	 */
 	write_behind = bitmap->mddev->bitmap_info.max_write_behind;
-	if (write_behind > COUNTER_MAX)
+	if (write_behind > COUNTER_MAX / 2)
 		write_behind = COUNTER_MAX / 2;
 	sb->write_behind = cpu_to_le32(write_behind);
 	bitmap->mddev->bitmap_info.max_write_behind = write_behind;
--
2.39.5
From: Yu Kuai <yukuai3(a)huawei.com>
[ Upstream commit 2afe17794cfed5f80295b1b9facd66e6f65e5002 ]
It's supposed to be COUNTER_MAX / 2, not COUNTER_MAX.
Link: https://lore.kernel.org/linux-raid/20250524061320.370630-14-yukuai1@huaweic…
Signed-off-by: Yu Kuai <yukuai3(a)huawei.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Hannes Reinecke <hare(a)suse.de>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/md/md-bitmap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index 8317e07b326d0..21decb97bc050 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -589,7 +589,7 @@ static int md_bitmap_new_disk_sb(struct bitmap *bitmap)
 	 * is a good choice? We choose COUNTER_MAX / 2 arbitrarily.
 	 */
 	write_behind = bitmap->mddev->bitmap_info.max_write_behind;
-	if (write_behind > COUNTER_MAX)
+	if (write_behind > COUNTER_MAX / 2)
 		write_behind = COUNTER_MAX / 2;
 	sb->write_behind = cpu_to_le32(write_behind);
 	bitmap->mddev->bitmap_info.max_write_behind = write_behind;
--
2.39.5
From: Hannes Reinecke <hare(a)kernel.org>
[ Upstream commit f42d4796ee100fade86086d1cf98537fb4d326c8 ]
When the socket is busy processing nvme_tcp_try_recv() might return
-EAGAIN, but this doesn't automatically imply that the sending side is
blocked, too. So check if there are pending requests once
nvme_tcp_try_recv() returns -EAGAIN and continue with the sending loop
to avoid I/O stalls.
Signed-off-by: Hannes Reinecke <hare(a)kernel.org>
Acked-by: Chris Leech <cleech(a)redhat.com>
Reviewed-by: Sagi Grimberg <sagi(a)grimberg.me>
Signed-off-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
Based on my analysis of the commit and comparison with similar commits,
here is my determination:
**YES**
This commit should be backported to stable kernel trees for the
following reasons:
1. **Fixes a real user-impacting bug**: The commit addresses I/O stalls
on congested sockets, which is a serious issue that can cause system
hangs or severe performance degradation. When the socket is congested
and `nvme_tcp_try_recv()` returns -EAGAIN, the current code
incorrectly assumes that the sending side is also blocked, leading to
I/O stalls.
2. **Small and contained fix**: The changes are minimal and localized to
two functions in `drivers/nvme/host/tcp.c`:
- `nvme_tcp_try_recv()` now returns 0 instead of -EAGAIN, so
`nvme_tcp_io_work()` no longer exits early on that result
- `nvme_tcp_io_work()` gains a check after receive processing to see
whether requests are pending and the socket is writable
- The diffstat is 6 insertions and 1 deletion in total
3. **Clear logic fix**: The patch addresses a specific logic error
where:
- The receive side returns -EAGAIN (socket would block on receive)
- But this doesn't mean the send side is also blocked
- The fix checks if there are pending requests and if the socket is
writable after receive processing
4. **Similar to other backported fixes**: Looking at the historical
commits:
- Commit #2 (backported): Fixed hangs waiting for icresp response
- Commit #3 (backported): Fixed wrong stop condition in io_work
- Commit #4 (backported): Fixed UAF when detecting digest errors
- Commit #5 (backported): Fixed possible null deref on timed out
connections
All these commits that were backported involved fixing hangs, stalls,
or error conditions in the nvme-tcp driver.
5. **No architectural changes**: The commit doesn't introduce new
features or change the architecture. It simply adds a missing check
to prevent I/O stalls, which aligns with stable kernel criteria.
6. **Critical subsystem**: NVMe-TCP is used for storage access, and I/O
stalls can have severe consequences for system stability and data
integrity.
The specific code changes show:
- `return consumed == -EAGAIN ? 0 : consumed;` - prevents treating
EAGAIN as an error
- The new check `if (nvme_tcp_queue_has_pending(queue) &&
sk_stream_is_writeable(queue->sock->sk))` ensures that if there are
pending requests and the socket is writable after receive processing,
we continue processing instead of stalling.
This is exactly the type of bug fix that should be backported to stable
kernels to ensure reliable storage operation.
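The decision logic described above can be modeled outside the kernel. The following is a minimal sketch under stated assumptions, not the driver code: `struct demo_queue` and both `demo_` helpers are hypothetical, and the two boolean fields stand in for the results of `nvme_tcp_queue_has_pending()` and `sk_stream_is_writeable()`.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the queue state the patched code inspects. */
struct demo_queue {
	bool has_pending; /* models nvme_tcp_queue_has_pending(queue) */
	bool writeable;   /* models sk_stream_is_writeable(queue->sock->sk) */
	bool rd_enabled;  /* models queue->rd_enabled */
};

/* Mirrors the patched return statement: -EAGAIN is reported as
 * "nothing consumed" rather than as an error. */
static int demo_try_recv_result(int consumed)
{
	return consumed == -EAGAIN ? 0 : consumed;
}

/* Models the tail of one worker pass after the receive attempt.
 * Returns true if the worker would loop again, false if it would return. */
static bool demo_keep_looping(const struct demo_queue *q, bool pending,
			      int consumed)
{
	int result = demo_try_recv_result(consumed);

	if (result > 0)
		pending = true;
	else if (result < 0)
		return false;

	/* New check: sending may be possible even though receive made
	 * no progress. */
	if (q->has_pending && q->writeable)
		pending = true;

	if (!pending || !q->rd_enabled)
		return false;
	return true;
}
```

In this model, a receive attempt that reports -EAGAIN while requests are queued and the socket is writable keeps the worker looping, whereas other negative results still end the pass.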
drivers/nvme/host/tcp.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 4cc72be28c731..13ede6e309092 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1349,7 +1349,7 @@ static int nvme_tcp_try_recv(struct nvme_tcp_queue *queue)
queue->nr_cqe = 0;
consumed = sock->ops->read_sock(sk, &rd_desc, nvme_tcp_recv_skb);
release_sock(sk);
- return consumed;
+ return consumed == -EAGAIN ? 0 : consumed;
}
static void nvme_tcp_io_work(struct work_struct *w)
@@ -1377,6 +1377,11 @@ static void nvme_tcp_io_work(struct work_struct *w)
else if (unlikely(result < 0))
return;
+ /* did we get some space after spending time in recv? */
+ if (nvme_tcp_queue_has_pending(queue) &&
+ sk_stream_is_writeable(queue->sock->sk))
+ pending = true;
+
if (!pending || !queue->rd_enabled)
return;
--
2.39.5
From: Hannes Reinecke <hare(a)kernel.org>
[ Upstream commit f42d4796ee100fade86086d1cf98537fb4d326c8 ]
When the socket is busy processing, nvme_tcp_try_recv() might return
-EAGAIN, but this doesn't automatically imply that the sending side is
blocked, too. So check if there are pending requests once
nvme_tcp_try_recv() returns -EAGAIN and continue with the sending loop
to avoid I/O stalls.
Signed-off-by: Hannes Reinecke <hare(a)kernel.org>
Acked-by: Chris Leech <cleech(a)redhat.com>
Reviewed-by: Sagi Grimberg <sagi(a)grimberg.me>
Signed-off-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
Based on my analysis of the commit and comparison with similar commits,
here is my determination:
**YES**
This commit should be backported to stable kernel trees for the
following reasons:
1. **Fixes a real user-impacting bug**: The commit addresses I/O stalls
on congested sockets, which is a serious issue that can cause system
hangs or severe performance degradation. When the socket is congested
and `nvme_tcp_try_recv()` returns -EAGAIN, the current code
incorrectly assumes that the sending side is also blocked, leading to
I/O stalls.
2. **Small and contained fix**: The changes are minimal and localized to
two functions in `drivers/nvme/host/tcp.c`:
- `nvme_tcp_try_recv()` now returns 0 instead of -EAGAIN, so
`nvme_tcp_io_work()` no longer exits early on that result
- `nvme_tcp_io_work()` gains a check after receive processing to see
whether requests are pending and the socket is writable
- The diffstat is 6 insertions and 1 deletion in total
3. **Clear logic fix**: The patch addresses a specific logic error
where:
- The receive side returns -EAGAIN (socket would block on receive)
- But this doesn't mean the send side is also blocked
- The fix checks if there are pending requests and if the socket is
writable after receive processing
4. **Similar to other backported fixes**: Looking at the historical
commits:
- Commit #2 (backported): Fixed hangs waiting for icresp response
- Commit #3 (backported): Fixed wrong stop condition in io_work
- Commit #4 (backported): Fixed UAF when detecting digest errors
- Commit #5 (backported): Fixed possible null deref on timed out
connections
All these commits that were backported involved fixing hangs, stalls,
or error conditions in the nvme-tcp driver.
5. **No architectural changes**: The commit doesn't introduce new
features or change the architecture. It simply adds a missing check
to prevent I/O stalls, which aligns with stable kernel criteria.
6. **Critical subsystem**: NVMe-TCP is used for storage access, and I/O
stalls can have severe consequences for system stability and data
integrity.
The specific code changes show:
- `return consumed == -EAGAIN ? 0 : consumed;` - prevents treating
EAGAIN as an error
- The new check `if (nvme_tcp_queue_has_pending(queue) &&
sk_stream_is_writeable(queue->sock->sk))` ensures that if there are
pending requests and the socket is writable after receive processing,
we continue processing instead of stalling.
This is exactly the type of bug fix that should be backported to stable
kernels to ensure reliable storage operation.
drivers/nvme/host/tcp.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index d991baa82a1c2..a2e825e37b38b 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1349,7 +1349,7 @@ static int nvme_tcp_try_recv(struct nvme_tcp_queue *queue)
queue->nr_cqe = 0;
consumed = sock->ops->read_sock(sk, &rd_desc, nvme_tcp_recv_skb);
release_sock(sk);
- return consumed;
+ return consumed == -EAGAIN ? 0 : consumed;
}
static void nvme_tcp_io_work(struct work_struct *w)
@@ -1377,6 +1377,11 @@ static void nvme_tcp_io_work(struct work_struct *w)
else if (unlikely(result < 0))
return;
+ /* did we get some space after spending time in recv? */
+ if (nvme_tcp_queue_has_pending(queue) &&
+ sk_stream_is_writeable(queue->sock->sk))
+ pending = true;
+
if (!pending || !queue->rd_enabled)
return;
--
2.39.5