From: Sam James <sam@gentoo.org>
[ Upstream commit 9aeed9041929812a10a6d693af050846942a1d16 ]
Similar in nature to ab107276607af90b13a5994997e19b7b9731e251. glibc-2.42 drops the legacy termio struct, but the ioctls.h header still defines some TC* constants in terms of termio (via sizeof). Hardcode the values instead.
This fixes building Python for example, which falls over like: ./Modules/termios.c:1119:16: error: invalid application of 'sizeof' to incomplete type 'struct termio'
Link: https://bugs.gentoo.org/961769
Link: https://bugs.gentoo.org/962600
Signed-off-by: Sam James <sam@gentoo.org>
Reviewed-by: Magnus Lindholm <linmag7@gmail.com>
Link: https://lore.kernel.org/r/6ebd3451908785cad53b50ca6bc46cfe9d6bc03c.176492249...
Signed-off-by: Magnus Lindholm <linmag7@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
## Analysis Summary
### 1. COMMIT MESSAGE ANALYSIS
This commit addresses a **userspace build failure** caused by glibc-2.42 removing the legacy `struct termio` definition. The kernel's uapi header `arch/alpha/include/uapi/asm/ioctls.h` uses `sizeof(struct termio)` in macro expansions, which fails when that struct is undefined.
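The failure mode is plain C semantics: `sizeof` cannot be applied to a type that is declared but never defined. A minimal sketch of the same error (the `SIZE_OF` macro is a hypothetical stand-in for the `sizeof` buried inside the kernel's `_IOC()` machinery):

```c
struct termio;                                /* declared, never defined - what
                                                 glibc 2.42 leaves you with */

#define SIZE_OF(t) ((unsigned int)sizeof(t))  /* stand-in for _IOC()'s sizeof */

/* Fails to compile, just like Python's Modules/termios.c:
 *   error: invalid application of 'sizeof' to incomplete type 'struct termio'
 */
unsigned int breaks = SIZE_OF(struct termio);
```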
**Key signals:**
- Links to two real bug reports (Gentoo bugs #961769 and #962600)
- References a prior fix (ab107276607af) for powerpc with the exact same issue
- Has `Reviewed-by:` tag
- Demonstrates real-world impact: Python build failure
### 2. CODE CHANGE ANALYSIS
**What the change does:**
```c
// Before: uses sizeof(struct termio) in the macro expansion
#define TCGETA _IOR('t', 23, struct termio)

// After: pre-computed constant
#define TCGETA 0x40127417
```
**Verification of the hardcoded values:** Looking at `arch/alpha/include/uapi/asm/ioctl.h`, the ioctl encoding on alpha is:
- `_IOC_SIZESHIFT` = 16, `_IOC_DIRSHIFT` = 29
- `_IOC_READ` = 2, `_IOC_WRITE` = 4
For `TCGETA = _IOR('t', 23, struct termio)`:
- dir = 2, type = 0x74, nr = 0x17, size = 18 (0x12)
- Result: `(2<<29) | (0x74<<8) | 0x17 | (0x12<<16)` = **0x40127417** ✓
The hardcoded values are mathematically correct.
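As a cross-check, the encoding can be recomputed in a few lines of userspace C. This is a sketch assuming the shift and direction constants quoted above and `sizeof(struct termio) == 18` (the historical layout: four 16-bit flag words, one byte of line discipline, and an 8-byte `c_cc` array, padded to 18):

```c
#include <stdio.h>

#define TYPESHIFT 8
#define SIZESHIFT 16
#define DIRSHIFT  29
#define IOC_READ  2u   /* alpha: _IOC_READ */
#define IOC_WRITE 4u   /* alpha: _IOC_WRITE */

/* Rebuilds the alpha _IOC() encoding by hand. */
static unsigned int ioc(unsigned int dir, unsigned int type,
                        unsigned int nr, unsigned int size)
{
	return (dir << DIRSHIFT) | (type << TYPESHIFT) | nr | (size << SIZESHIFT);
}

int main(void)
{
	printf("TCGETA  = 0x%08x\n", ioc(IOC_READ,  't', 23, 18)); /* 0x40127417 */
	printf("TCSETA  = 0x%08x\n", ioc(IOC_WRITE, 't', 24, 18)); /* 0x80127418 */
	printf("TCSETAW = 0x%08x\n", ioc(IOC_WRITE, 't', 25, 18)); /* 0x80127419 */
	printf("TCSETAF = 0x%08x\n", ioc(IOC_WRITE, 't', 28, 18)); /* 0x8012741c */
	return 0;
}
```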
### 3. CLASSIFICATION
**Category: BUILD FIX**
This falls under the **build fixes** exception category - it's critical for users who need to build userspace software with modern glibc. Other architectures (powerpc, sh, xtensa) already have identical fixes in the tree.
### 4. SCOPE AND RISK ASSESSMENT
- **Lines changed:** 4
- **Files touched:** 1 (alpha-specific uapi header)
- **Risk: EXTREMELY LOW**
  - No runtime behavior change - only affects compilation
  - The ioctl numbers are identical (pre-computed vs macro-computed)
  - Pattern already established in other architectures
### 5. USER IMPACT
- **Affected users:** Anyone on alpha using glibc 2.42+ (e.g., Gentoo users)
- **Severity:** Critical for affected users - cannot build Python, and potentially many other programs that use termios
- **Scope:** Limited to alpha architecture, but a complete blocker for those users
### 6. STABILITY INDICATORS
- Has `Reviewed-by:` tag
- Same pattern accepted for powerpc (commit ab107276607af)
- Documented with actual bug reports showing real users affected
- No runtime changes - purely compile-time fix
### 7. DEPENDENCY CHECK
- **No dependencies** - standalone fix
- The affected file exists in all stable trees (uapi header)
- Clean application expected
### Final Assessment
**Pros:**
- Fixes a real, documented build breakage with modern glibc
- Extremely small and localized (4 lines, 1 file)
- Zero runtime risk - mathematically equivalent values
- Established precedent with powerpc fix
- Has review tag and bug report links
**Cons:**
- No explicit `Cc: stable@vger.kernel.org` tag
- Alpha is a niche architecture
**Verdict:** This is textbook stable material. It's a build fix that:
1. Is obviously correct (values are mathematically equivalent)
2. Fixes a real bug (complete build failure)
3. Is small and contained (4 lines)
4. Has no runtime risk whatsoever
5. Follows established precedent from other architectures
The lack of a stable tag is not disqualifying - many valid stable patches don't have it. Build compatibility with modern toolchains is exactly what stable trees need to maintain.
**YES**
 arch/alpha/include/uapi/asm/ioctls.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/alpha/include/uapi/asm/ioctls.h b/arch/alpha/include/uapi/asm/ioctls.h
index 971311605288f..a09d04b49cc65 100644
--- a/arch/alpha/include/uapi/asm/ioctls.h
+++ b/arch/alpha/include/uapi/asm/ioctls.h
@@ -23,10 +23,10 @@
 #define TCSETSW		_IOW('t', 21, struct termios)
 #define TCSETSF		_IOW('t', 22, struct termios)
 
-#define TCGETA		_IOR('t', 23, struct termio)
-#define TCSETA		_IOW('t', 24, struct termio)
-#define TCSETAW		_IOW('t', 25, struct termio)
-#define TCSETAF		_IOW('t', 28, struct termio)
+#define TCGETA		0x40127417
+#define TCSETA		0x80127418
+#define TCSETAW		0x80127419
+#define TCSETAF		0x8012741c
 
 #define TCSBRK		_IO('t', 29)
 #define TCXONC		_IO('t', 30)
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
[ Upstream commit fedadc4137234c3d00c4785eeed3e747fe9036ae ]
gup_pgd_range() is invoked with disabled interrupts and invokes __kmap_local_page_prot() via pte_offset_map(), gup_p4d_range(). With HIGHPTE enabled, __kmap_local_page_prot() invokes kmap_high_get() which uses a spinlock_t via lock_kmap_any(). This leads to a sleeping-while-atomic error on PREEMPT_RT because spinlock_t becomes a sleeping lock and must not be acquired in atomic context.
The loop in map_new_virtual() uses wait_queue_head_t for wake up which also is using a spinlock_t.
Since HIGHPTE is rarely needed at all, turn it off for PREEMPT_RT to allow the use of get_user_pages_fast().
[arnd: rework patch to turn off HIGHPTE instead of HAVE_FAST_GUP]
Co-developed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
# Commit Analysis: ARM: 9461/1: Disable HIGHPTE on PREEMPT_RT kernels
## 1. COMMIT MESSAGE ANALYSIS
**Problem identified**: The commit addresses a "sleeping-while-atomic" bug on PREEMPT_RT ARM kernels. The issue occurs because:
- `gup_pgd_range()` runs with interrupts disabled (atomic context)
- With HIGHPTE enabled, the code path calls `kmap_high_get()` which acquires a `spinlock_t` via `lock_kmap_any()`
- On PREEMPT_RT, `spinlock_t` becomes a sleeping lock (mutex)
- Attempting to acquire a sleeping lock in atomic context is a bug
**Tags present**:
- Acked-by: Linus Walleij (ARM/pinctrl maintainer)
- Reviewed-by: Arnd Bergmann (major ARM contributor)
- Signed-off-by: Sebastian Andrzej Siewior (PREEMPT_RT maintainer)
- Signed-off-by: Russell King (ARM maintainer)
**Missing tags**: No `Cc: stable@vger.kernel.org` or `Fixes:` tag.
## 2. CODE CHANGE ANALYSIS
The change is a single-line Kconfig modification:
```diff
-	depends on HIGHMEM
+	depends on HIGHMEM && !PREEMPT_RT
```
This simply prevents the `HIGHPTE` configuration option from being selected when `PREEMPT_RT` is enabled. The technical mechanism of the bug is clear:
1. `get_user_pages_fast()` → `gup_pgd_range()` (runs with interrupts disabled)
2. → `pte_offset_map()` → `__kmap_local_page_prot()` → `kmap_high_get()`
3. `kmap_high_get()` calls `lock_kmap_any()` which uses `spinlock_t`
4. On PREEMPT_RT: `spinlock_t` = sleeping lock → BUG in atomic context
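A minimal kernel-style sketch of the failing shape (names are illustrative, not the mainline code verbatim; `kmap_lock` here stands in for the lock taken by `lock_kmap_any()` in mm/highmem.c):

```c
#include <linux/spinlock.h>
#include <linux/irqflags.h>

static DEFINE_SPINLOCK(kmap_lock);

static void gup_fast_like_path(void)
{
	unsigned long flags;

	local_irq_save(flags);		/* gup_pgd_range() runs like this */

	/*
	 * Fine on !RT. On PREEMPT_RT, spinlock_t is backed by an rt_mutex,
	 * so this nested acquisition can sleep and triggers
	 * "BUG: sleeping function called from invalid context".
	 */
	spin_lock(&kmap_lock);		/* what kmap_high_get() ends up doing */
	spin_unlock(&kmap_lock);

	local_irq_restore(flags);
}
```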
The commit message notes that "HIGHPTE is rarely needed at all" - it's an optimization to put page tables in high memory, which is typically unnecessary on modern systems.
## 3. CLASSIFICATION
- **Bug type**: Runtime crash/BUG (sleeping-while-atomic violation)
- **Not a new feature**: Disables a problematic configuration combination
- **Not a security fix**: No CVE or security-sensitive code
- **Build fix category**: No, this is a runtime issue
## 4. SCOPE AND RISK ASSESSMENT
**Scope**:
- 1 file changed (`arch/arm/Kconfig`)
- 1 line modified
- Affects only ARM + PREEMPT_RT + HIGHMEM configurations
**Risk**: **Very low**
- This is a Kconfig dependency change only
- Users who previously had HIGHPTE enabled will now have it disabled on PREEMPT_RT
- The workaround is conservative (disable the problematic feature rather than complex code fixes)
- Cannot introduce regressions in other code paths
## 5. USER IMPACT
**Affected users**: ARM systems running PREEMPT_RT kernels with HIGHMEM (systems with >~800MB RAM on 32-bit ARM)
**Severity**: High for affected users
- `get_user_pages_fast()` is a commonly used path for I/O and memory management
- Without this fix, users would hit kernel warnings/crashes when the GUP fast path is used
- This completely breaks PREEMPT_RT usability on affected configurations
## 6. STABILITY INDICATORS
**Review chain is strong**:
- Sebastian Andrzej Siewior (PREEMPT_RT maintainer) developed this
- Arnd Bergmann reworked and reviewed it
- Linus Walleij acked it
- Russell King (ARM maintainer) accepted it
## 7. DEPENDENCY CHECK
This is a standalone Kconfig change. Dependencies:
- `PREEMPT_RT` must exist in the kernel - PREEMPT_RT was merged into mainline in kernel 6.12
- `HIGHPTE` and `HIGHMEM` options exist on ARM in all relevant kernel versions
The fix should apply cleanly to any stable tree with PREEMPT_RT support.
## STABLE KERNEL CRITERIA EVALUATION
| Criterion | Assessment |
|-----------|------------|
| Obviously correct | ✅ Yes - disables problematic config combination |
| Fixes real bug | ✅ Yes - sleeping-while-atomic crash |
| Important issue | ✅ Yes - crashes on PREEMPT_RT systems |
| Small and contained | ✅ Yes - 1 line Kconfig change |
| No new features | ✅ Yes - only disables an option |
| Clean application | ✅ Yes - simple dependency addition |
## CONCERNS
1. **No explicit stable request**: Maintainers didn't add `Cc: stable`. However, the fix is clearly appropriate for stable.
2. **PREEMPT_RT availability**: Only relevant for kernels 6.12+ where PREEMPT_RT was merged into mainline.
## CONCLUSION
This commit fixes a real, reproducible crash on ARM PREEMPT_RT systems. The fix is minimal (1 line), obviously correct (disables problematic feature combination), well-reviewed by relevant maintainers (ARM, RT), and carries essentially zero risk. While there's no explicit stable tag, the technical merits strongly support backporting.
The sleeping-while-atomic bug would make `get_user_pages_fast()` unusable on affected configurations, which is a serious correctness issue for PREEMPT_RT users who require deterministic behavior.
**YES**
 arch/arm/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 4fb985b76e97f..70cd3b5b5a059 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1215,7 +1215,7 @@ config HIGHMEM
 
 config HIGHPTE
 	bool "Allocate 2nd-level pagetables from highmem" if EXPERT
-	depends on HIGHMEM
+	depends on HIGHMEM && !PREEMPT_RT
 	default y
 	help
 	  The VM uses one page of physical memory for each page table.
From: Yang Li <yang.li85200@gmail.com>
[ Upstream commit 809ef03d6d21d5fea016bbf6babeec462e37e68c ]
In the csky_cmpxchg_fixup function, it is incorrect to use the global variable csky_cmpxchg_stw to determine the address where the exception occurred. The global variable csky_cmpxchg_stw stores the opcode at the time of the exception, while &csky_cmpxchg_stw is the address where the exception occurred.
Signed-off-by: Yang Li <yang.li85200@gmail.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
## Analysis of csky: fix csky_cmpxchg_fixup not working
### 1. COMMIT MESSAGE ANALYSIS
**Subject**: "csky: fix csky_cmpxchg_fixup not working" - clearly indicates a bug fix
**Key issue explained**: The commit message describes a fundamental semantic error - the code was using the **value** of `csky_cmpxchg_stw` (which contains an opcode) when it should have been using the **address** of `csky_cmpxchg_stw` (where the instruction is located).
**Missing tags**: No "Cc: stable@vger.kernel.org" or "Fixes:" tag, but this doesn't preclude backporting if the fix is clearly warranted.
### 2. CODE CHANGE ANALYSIS
The change is extremely surgical - only 2 lines modified:
```diff
-	if (instruction_pointer(regs) == csky_cmpxchg_stw)
-		instruction_pointer_set(regs, csky_cmpxchg_ldw);
+	if (instruction_pointer(regs) == (unsigned long)&csky_cmpxchg_stw)
+		instruction_pointer_set(regs, (unsigned long)&csky_cmpxchg_ldw);
```
**Technical explanation**:
- `csky_cmpxchg_ldw` and `csky_cmpxchg_stw` are external symbols declared as `extern unsigned long` - they represent labels/addresses in the cmpxchg assembly implementation
- The **value** stored at these symbols is the opcode of the instruction
- The **address** (`&csky_cmpxchg_stw`) is where the instruction resides in memory
- The code compares against `instruction_pointer(regs)`, which is an address, so it must compare against an address, not an opcode value
**Root cause**: Simple semantic error - using value instead of address
**Why the bug is severe**: This function handles TLB modification faults during compare-and-exchange operations. When such a fault occurs at the store instruction, the handler should redirect execution back to the load instruction to retry the operation. With the bug, the comparison `instruction_pointer(regs) == csky_cmpxchg_stw` would almost never match (comparing an address to an opcode), so the fixup **never worked**.
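The pattern is easy to reproduce in plain C. A userspace analogue (all names hypothetical; the stored value stands in for the opcode at the `csky_cmpxchg_stw` label):

```c
#include <stdio.h>

/* Stands in for csky_cmpxchg_stw: the value is the "opcode" stored at
 * the label, while &symbol is the address an instruction pointer holds. */
unsigned long csky_cmpxchg_stw_sim = 0x9820;

int main(void)
{
	/* A fault address always looks like &symbol, never like its contents. */
	unsigned long pc = (unsigned long)&csky_cmpxchg_stw_sim;

	if (pc == csky_cmpxchg_stw_sim)                 /* buggy: address vs opcode */
		puts("fixup taken (buggy form)");       /* almost never printed */

	if (pc == (unsigned long)&csky_cmpxchg_stw_sim) /* fixed: address vs address */
		puts("fixup taken (fixed form)");       /* always printed here */

	return 0;
}
```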
### 3. CLASSIFICATION
- **Bug fix**: Yes, clearly fixing broken functionality
- **Security impact**: Potentially - broken cmpxchg can lead to race conditions
- **Data corruption risk**: Yes - atomic operations that don't work correctly can cause data races
### 4. SCOPE AND RISK ASSESSMENT
- **Lines changed**: 2 lines
- **Files touched**: 1 file (arch/csky/mm/fault.c)
- **Subsystem**: CSKY architecture-specific code
- **Complexity**: Minimal - straightforward address-of fix
- **Risk**: Very low - the fix is obviously correct and architecture-specific
### 5. USER IMPACT
- **Affected users**: CSKY systems without LDSTEX instructions (when `CONFIG_CPU_HAS_LDSTEX` is not defined)
- **Severity**: High - broken compare-and-exchange atomic operations can cause:
  - Race conditions in concurrent code
  - Data corruption
  - Deadlocks
  - Unpredictable behavior in any code using cmpxchg
### 6. STABILITY INDICATORS
- Signed-off by Guo Ren (CSKY maintainer)
- The fix is logically obvious once understood
### 7. DEPENDENCY CHECK
- No dependencies on other commits
- The affected code has existed since CSKY was added to the kernel
- Should apply cleanly to stable trees that have CSKY support
### CONCLUSION
This commit clearly meets all stable kernel criteria:
1. **Obviously correct**: The fix is a textbook case of using `&variable` (address) instead of `variable` (value) when comparing against an instruction pointer
2. **Fixes a real bug**: The cmpxchg fixup mechanism was completely non-functional
3. **Important issue**: Broken atomic operations can cause data corruption, race conditions, and system instability
4. **Small and contained**: Only 2 lines changed in one file, architecture-specific
5. **No new features**: Just fixing existing functionality to actually work
The absence of stable tags appears to be an oversight. This is a critical fix for CSKY platforms - without it, the entire cmpxchg fixup path is dead code that never triggers when it should.
**YES**
 arch/csky/mm/fault.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/csky/mm/fault.c b/arch/csky/mm/fault.c
index a6ca7dff42153..7ff4011089850 100644
--- a/arch/csky/mm/fault.c
+++ b/arch/csky/mm/fault.c
@@ -45,8 +45,8 @@ static inline void csky_cmpxchg_fixup(struct pt_regs *regs)
 	if (trap_no(regs) != VEC_TLBMODIFIED)
 		return;
 
-	if (instruction_pointer(regs) == csky_cmpxchg_stw)
-		instruction_pointer_set(regs, csky_cmpxchg_ldw);
+	if (instruction_pointer(regs) == (unsigned long)&csky_cmpxchg_stw)
+		instruction_pointer_set(regs, (unsigned long)&csky_cmpxchg_ldw);
 	return;
 }
 #endif
From: Mikulas Patocka <mpatocka@redhat.com>
[ Upstream commit 8581b19eb2c5ccf06c195d3b5468c3c9d17a5020 ]
There is a reported 'scheduling while atomic' bug when using dm-snapshot on real-time kernels. The reason for the bug is that the hlist_bl code does preempt_disable() when taking the lock and the kernel attempts to take other spinlocks while holding the hlist_bl lock.
Fix this by converting a hlist_bl spinlock into a regular spinlock.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reported-by: Jiping Ma <jiping.ma2@windriver.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
### Summary
**Bug Analysis:** The commit fixes a genuine runtime bug on PREEMPT_RT kernels. The `hlist_bl` implementation uses bit spinlocks with `preempt_disable()`, creating an atomic context. When dm-snapshot code then tries to take other spinlocks (which on RT kernels are sleeping locks/rt_mutexes), it triggers "scheduling while atomic" warnings.
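A minimal kernel-style sketch of the pattern swap (hypothetical names such as `bucket`; the actual conversion is in the diff below):

```c
#include <linux/list.h>
#include <linux/list_bl.h>
#include <linux/spinlock.h>

/* Before: the lock is bit 0 of the hlist_bl_head pointer word.
 * hlist_bl_lock() is a bit spinlock, which does preempt_disable()
 * even on PREEMPT_RT - everything under it becomes atomic context. */
static void bucket_update_old(struct hlist_bl_head *slot)
{
	hlist_bl_lock(slot);
	/* taking any spinlock_t here sleeps on RT -> scheduling while atomic */
	hlist_bl_unlock(slot);
}

/* After: a plain hlist_head paired with a regular spinlock_t, which on RT
 * is a preemptible rt_mutex, so nested spinlock_t acquisition is legal. */
struct bucket {
	struct hlist_head head;
	spinlock_t lock;
};

static void bucket_update_new(struct bucket *slot)
{
	spin_lock(&slot->lock);
	/* same chain walk; nested spinlock_t locks are now fine on RT */
	spin_unlock(&slot->lock);
}
```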
**Historical Context:** The buggy code (using `hlist_bl` in dm-snapshot) was introduced in commit `f79ae415b64c35` ("dm snapshot: Make exception tables scalable") which landed in **v5.2**. This means all current stable/LTS trees (5.4.y, 5.10.y, 5.15.y, 6.1.y, 6.6.y, 6.12.y) contain the vulnerable code.
**Risk vs Benefit:**
| Factor | Assessment |
|--------|------------|
| Bug severity | HIGH - causes kernel BUG on RT systems |
| Fix correctness | HIGH - straightforward lock type conversion |
| Code scope | CONTAINED - dm-snapshot only |
| Regression risk | LOW - semantically equivalent locking |
| User impact | RT kernel users with dm-snapshot |
**Stable Criteria Checklist:**
- ✅ Fixes a real bug (scheduling while atomic)
- ✅ Obviously correct (converts bit spinlock to regular spinlock)
- ✅ Small and contained (within dm-snapshot subsystem)
- ✅ No new features or APIs
- ✅ Authored by subsystem maintainer (Mikulas Patocka)
- ✅ Has Reported-by indicating real user impact
- ⚠️ No explicit `Cc: stable` tag (but this isn't required)
**Conclusion:** This commit is an excellent stable backport candidate. It fixes a real, reported bug that causes kernel warnings and potential instability on PREEMPT_RT kernels when using dm-snapshot. The fix is well-understood, uses standard kernel primitives, and is authored by the DM subsystem maintainer. The change is self-contained and poses low regression risk since it maintains the same locking semantics while fixing RT compatibility.
**YES**
 drivers/md/dm-exception-store.h |  2 +-
 drivers/md/dm-snap.c            | 73 +++++++++++++++------------------
 2 files changed, 35 insertions(+), 40 deletions(-)
diff --git a/drivers/md/dm-exception-store.h b/drivers/md/dm-exception-store.h
index b679766375381..061b4d3108132 100644
--- a/drivers/md/dm-exception-store.h
+++ b/drivers/md/dm-exception-store.h
@@ -29,7 +29,7 @@ typedef sector_t chunk_t;
  * chunk within the device.
  */
 struct dm_exception {
-	struct hlist_bl_node hash_list;
+	struct hlist_node hash_list;
 
 	chunk_t old_chunk;
 	chunk_t new_chunk;
diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index f40c18da40000..dbd148967de42 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -40,10 +40,15 @@ static const char dm_snapshot_merge_target_name[] = "snapshot-merge";
 #define DM_TRACKED_CHUNK_HASH(x)	((unsigned long)(x) & \
 					 (DM_TRACKED_CHUNK_HASH_SIZE - 1))
 
+struct dm_hlist_head {
+	struct hlist_head head;
+	spinlock_t lock;
+};
+
 struct dm_exception_table {
 	uint32_t hash_mask;
 	unsigned int hash_shift;
-	struct hlist_bl_head *table;
+	struct dm_hlist_head *table;
 };
 
 struct dm_snapshot {
@@ -628,8 +633,8 @@ static uint32_t exception_hash(struct dm_exception_table *et, chunk_t chunk);
 
 /* Lock to protect access to the completed and pending exception hash tables. */
 struct dm_exception_table_lock {
-	struct hlist_bl_head *complete_slot;
-	struct hlist_bl_head *pending_slot;
+	spinlock_t *complete_slot;
+	spinlock_t *pending_slot;
 };
 
 static void dm_exception_table_lock_init(struct dm_snapshot *s, chunk_t chunk,
@@ -638,20 +643,20 @@ static void dm_exception_table_lock_init(struct dm_snapshot *s, chunk_t chunk,
 	struct dm_exception_table *complete = &s->complete;
 	struct dm_exception_table *pending = &s->pending;
 
-	lock->complete_slot = &complete->table[exception_hash(complete, chunk)];
-	lock->pending_slot = &pending->table[exception_hash(pending, chunk)];
+	lock->complete_slot = &complete->table[exception_hash(complete, chunk)].lock;
+	lock->pending_slot = &pending->table[exception_hash(pending, chunk)].lock;
 }
 
 static void dm_exception_table_lock(struct dm_exception_table_lock *lock)
 {
-	hlist_bl_lock(lock->complete_slot);
-	hlist_bl_lock(lock->pending_slot);
+	spin_lock_nested(lock->complete_slot, 1);
+	spin_lock_nested(lock->pending_slot, 2);
 }
 
 static void dm_exception_table_unlock(struct dm_exception_table_lock *lock)
 {
-	hlist_bl_unlock(lock->pending_slot);
-	hlist_bl_unlock(lock->complete_slot);
+	spin_unlock(lock->pending_slot);
+	spin_unlock(lock->complete_slot);
 }
 
 static int dm_exception_table_init(struct dm_exception_table *et,
@@ -661,13 +666,15 @@ static int dm_exception_table_init(struct dm_exception_table *et,
 	et->hash_shift = hash_shift;
 	et->hash_mask = size - 1;
-	et->table = kvmalloc_array(size, sizeof(struct hlist_bl_head),
+	et->table = kvmalloc_array(size, sizeof(struct dm_hlist_head),
 			   GFP_KERNEL);
 	if (!et->table)
 		return -ENOMEM;
 
-	for (i = 0; i < size; i++)
-		INIT_HLIST_BL_HEAD(et->table + i);
+	for (i = 0; i < size; i++) {
+		INIT_HLIST_HEAD(&et->table[i].head);
+		spin_lock_init(&et->table[i].lock);
+	}
 
 	return 0;
 }
@@ -675,16 +682,17 @@ static int dm_exception_table_init(struct dm_exception_table *et,
 static void dm_exception_table_exit(struct dm_exception_table *et,
 				    struct kmem_cache *mem)
 {
-	struct hlist_bl_head *slot;
+	struct dm_hlist_head *slot;
 	struct dm_exception *ex;
-	struct hlist_bl_node *pos, *n;
+	struct hlist_node *pos;
 	int i, size;
 
 	size = et->hash_mask + 1;
 	for (i = 0; i < size; i++) {
 		slot = et->table + i;
 
-		hlist_bl_for_each_entry_safe(ex, pos, n, slot, hash_list) {
+		hlist_for_each_entry_safe(ex, pos, &slot->head, hash_list) {
+			hlist_del(&ex->hash_list);
 			kmem_cache_free(mem, ex);
 			cond_resched();
 		}
@@ -700,7 +708,7 @@ static uint32_t exception_hash(struct dm_exception_table *et, chunk_t chunk)
 
 static void dm_remove_exception(struct dm_exception *e)
 {
-	hlist_bl_del(&e->hash_list);
+	hlist_del(&e->hash_list);
 }
 
 /*
@@ -710,12 +718,11 @@ static void dm_remove_exception(struct dm_exception *e)
 static struct dm_exception *dm_lookup_exception(struct dm_exception_table *et,
 						chunk_t chunk)
 {
-	struct hlist_bl_head *slot;
-	struct hlist_bl_node *pos;
+	struct hlist_head *slot;
 	struct dm_exception *e;
 
-	slot = &et->table[exception_hash(et, chunk)];
-	hlist_bl_for_each_entry(e, pos, slot, hash_list)
+	slot = &et->table[exception_hash(et, chunk)].head;
+	hlist_for_each_entry(e, slot, hash_list)
 		if (chunk >= e->old_chunk &&
 		    chunk <= e->old_chunk + dm_consecutive_chunk_count(e))
 			return e;
@@ -762,18 +769,17 @@ static void free_pending_exception(struct dm_snap_pending_exception *pe)
 static void dm_insert_exception(struct dm_exception_table *eh,
 				struct dm_exception *new_e)
 {
-	struct hlist_bl_head *l;
-	struct hlist_bl_node *pos;
+	struct hlist_head *l;
 	struct dm_exception *e = NULL;
 
-	l = &eh->table[exception_hash(eh, new_e->old_chunk)];
+	l = &eh->table[exception_hash(eh, new_e->old_chunk)].head;
 
 	/* Add immediately if this table doesn't support consecutive chunks */
 	if (!eh->hash_shift)
 		goto out;
 
 	/* List is ordered by old_chunk */
-	hlist_bl_for_each_entry(e, pos, l, hash_list) {
+	hlist_for_each_entry(e, l, hash_list) {
 		/* Insert after an existing chunk? */
 		if (new_e->old_chunk == (e->old_chunk +
 					 dm_consecutive_chunk_count(e) + 1) &&
@@ -804,13 +810,13 @@ static void dm_insert_exception(struct dm_exception_table *eh,
 		 * Either the table doesn't support consecutive chunks or slot
 		 * l is empty.
 		 */
-		hlist_bl_add_head(&new_e->hash_list, l);
+		hlist_add_head(&new_e->hash_list, l);
 	} else if (new_e->old_chunk < e->old_chunk) {
 		/* Add before an existing exception */
-		hlist_bl_add_before(&new_e->hash_list, &e->hash_list);
+		hlist_add_before(&new_e->hash_list, &e->hash_list);
 	} else {
 		/* Add to l's tail: e is the last exception in this slot */
-		hlist_bl_add_behind(&new_e->hash_list, &e->hash_list);
+		hlist_add_behind(&new_e->hash_list, &e->hash_list);
 	}
 }
@@ -820,7 +826,6 @@ static void dm_insert_exception(struct dm_exception_table *eh,
  */
 static int dm_add_exception(void *context, chunk_t old, chunk_t new)
 {
-	struct dm_exception_table_lock lock;
 	struct dm_snapshot *s = context;
 	struct dm_exception *e;
@@ -833,17 +838,7 @@ static int dm_add_exception(void *context, chunk_t old, chunk_t new)
 	/* Consecutive_count is implicitly initialised to zero */
 	e->new_chunk = new;
 
-	/*
-	 * Although there is no need to lock access to the exception tables
-	 * here, if we don't then hlist_bl_add_head(), called by
-	 * dm_insert_exception(), will complain about accessing the
-	 * corresponding list without locking it first.
-	 */
-	dm_exception_table_lock_init(s, old, &lock);
-
-	dm_exception_table_lock(&lock);
 	dm_insert_exception(&s->complete, e);
-	dm_exception_table_unlock(&lock);
 
 	return 0;
 }
@@ -873,7 +868,7 @@ static int calc_max_buckets(void)
 	/* use a fixed size of 2MB */
 	unsigned long mem = 2 * 1024 * 1024;
 
-	mem /= sizeof(struct hlist_bl_head);
+	mem /= sizeof(struct dm_hlist_head);
 
 	return mem;
 }
From: Mikulas Patocka <mpatocka@redhat.com>
[ Upstream commit d9f3e47d3fae0c101d9094bc956ed24e7a0ee801 ]
There are two problems with the recursive correction:
1. It may cause denial-of-service. In fec_read_bufs, there is a loop that has 253 iterations. For each iteration, we may call verity_hash_for_block recursively. There is a limit of 4 nested recursions - that means that there may be at most 253^4 (4 billion) iterations. Red Hat QE team actually created an image that pushes dm-verity to this limit - and this image just makes the udev-worker process get stuck in the 'D' state.
2. It doesn't work. In fec_read_bufs we store data into the variable "fio->bufs", but fio->bufs is shared between recursive invocations; if "verity_hash_for_block" invoked correction recursively, it would overwrite the partially filled fio->bufs.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reported-by: Guangwu Zhang <guazhang@redhat.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
## Summary and Recommendation
### What Problem This Commit Solves
This commit fixes **two critical bugs** in dm-verity's Forward Error Correction (FEC):
1. **Denial-of-Service vulnerability**: The recursive FEC allows 4 levels of nesting with 253 iterations per level, resulting in up to 253^4 (~4 billion) potential iterations. Red Hat QE demonstrated this causes the `udev-worker` process to hang in uninterruptible 'D' state.
2. **Data corruption bug**: The `fio->bufs` buffer is shared across all recursion levels. When `verity_hash_for_block` triggers nested FEC correction, it corrupts partially-filled buffers from outer levels. The recursive FEC feature fundamentally doesn't work.
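The iteration bound quoted in point 1 checks out arithmetically; a trivial check:

```c
/* Worked check of the bound: four recursion levels of 253
 * fec_read_bufs iterations each. */
#include <stdio.h>

int main(void)
{
	unsigned long long n = 1;

	for (int level = 0; level < 4; level++)
		n *= 253;

	printf("253^4 = %llu\n", n);	/* 4097152081, i.e. ~4.1 billion */
	return 0;
}
```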
### Stable Kernel Criteria Assessment
| Criterion | Assessment |
|-----------|------------|
| Obviously correct | ✅ YES - Simple condition change that completely disables broken recursion |
| Fixes real bug | ✅ YES - DoS and data corruption, reproducible by Red Hat QE |
| Important issue | ✅ YES - System hang (DoS), affects Android/Chromebook verified boot |
| Small and contained | ✅ YES - ~20 lines across 3 files, removes code rather than adding |
| No new features | ✅ YES - Removes broken functionality |
### Risk vs Benefit Analysis
**Benefits:**
- Eliminates a system-hang DoS vulnerability
- Fixes a data corruption bug in FEC recovery
- Affects widely-deployed dm-verity users (Android, Chromebooks, verified boot systems)
- Conservative fix - disables broken feature rather than complex repair
**Risks:**
- Minimal - the recursive FEC was fundamentally broken anyway
- Version bump (1.12→1.13) is cosmetic; documents the behavioral change
- Theoretical: some error correction scenarios may not work, but they were already broken
### Additional Considerations
- **Reviewers**: Sami Tolvanen (Google) and Eric Biggers (kernel crypto expert) - strong vetting
- **Author**: Mikulas Patocka, dm subsystem maintainer
- **Bug origin**: FEC feature added in 2015 (commit a739ff3f543af), so affects all current LTS kernels
- **Dependencies**: Self-contained, should apply cleanly to stable trees
- **No explicit `Cc: stable`**: But severity and fix quality strongly support backporting
### Conclusion
This is an excellent stable candidate. It fixes a proven DoS vulnerability and data corruption bug in security-critical dm-verity infrastructure. The fix is minimal, conservative (disables rather than patches), well-reviewed by domain experts, and authored by the subsystem maintainer. The affected FEC recursion feature was broken since introduction, so removing it has no practical downside. The user impact is high given dm-verity's deployment in Android and other verified boot systems.
**YES**
 drivers/md/dm-verity-fec.c    | 4 +---
 drivers/md/dm-verity-fec.h    | 3 ---
 drivers/md/dm-verity-target.c | 2 +-
 3 files changed, 2 insertions(+), 7 deletions(-)
diff --git a/drivers/md/dm-verity-fec.c b/drivers/md/dm-verity-fec.c
index 72047b47a7a0a..e41bde1d3b15b 100644
--- a/drivers/md/dm-verity-fec.c
+++ b/drivers/md/dm-verity-fec.c
@@ -413,10 +413,8 @@ int verity_fec_decode(struct dm_verity *v, struct dm_verity_io *io,
 	if (!verity_fec_is_enabled(v))
 		return -EOPNOTSUPP;
 
-	if (fio->level >= DM_VERITY_FEC_MAX_RECURSION) {
-		DMWARN_LIMIT("%s: FEC: recursion too deep", v->data_dev->name);
+	if (fio->level)
 		return -EIO;
-	}
 
 	fio->level++;
 
diff --git a/drivers/md/dm-verity-fec.h b/drivers/md/dm-verity-fec.h
index 09123a6129538..ec37e607cb3f0 100644
--- a/drivers/md/dm-verity-fec.h
+++ b/drivers/md/dm-verity-fec.h
@@ -23,9 +23,6 @@
 #define DM_VERITY_FEC_BUF_MAX \
 			(1 << (PAGE_SHIFT - DM_VERITY_FEC_BUF_RS_BITS))
 
-/* maximum recursion level for verity_fec_decode */
-#define DM_VERITY_FEC_MAX_RECURSION	4
-
 #define DM_VERITY_OPT_FEC_DEV		"use_fec_from_device"
 #define DM_VERITY_OPT_FEC_BLOCKS	"fec_blocks"
 #define DM_VERITY_OPT_FEC_START		"fec_start"
diff --git a/drivers/md/dm-verity-target.c b/drivers/md/dm-verity-target.c
index 66a00a8ccb398..c8695c079cfe0 100644
--- a/drivers/md/dm-verity-target.c
+++ b/drivers/md/dm-verity-target.c
@@ -1690,7 +1690,7 @@ static struct target_type verity_target = {
 	.name		= "verity",
 	/* Note: the LSMs depend on the singleton and immutable features */
 	.features	= DM_TARGET_SINGLETON | DM_TARGET_IMMUTABLE,
-	.version	= {1, 12, 0},
+	.version	= {1, 13, 0},
 	.module		= THIS_MODULE,
 	.ctr		= verity_ctr,
 	.dtr		= verity_dtr,