May 2025 - Linux-stable-mirror

[PATCH AUTOSEL 6.1 1/3] fs/filesystems: Fix potential unsigned integer underflow in fs_name()

by Sasha Levin

From: Zijun Hu <quic_zijuhu(a)quicinc.com>

[ Upstream commit 1363c134ade81e425873b410566e957fecebb261 ]

fs_name() has @index as unsigned int, so there is underflow risk for
operation '@index--'.

Fix by breaking the for loop when '@index == 0' which is also more proper
than '@index <= 0' for unsigned integer comparison.

Signed-off-by: Zijun Hu <quic_zijuhu(a)quicinc.com>
Link: https://lore.kernel.org/20250410-fix_fs-v1-1-7c14ccc8ebaa@quicinc.com
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---

**YES** This commit should be backported to stable kernel trees.

**Detailed Analysis:**

**The Bug:** The `fs_name()` function at `fs/filesystems.c:156-174` can underflow its unsigned `index` parameter. The loop `for (tmp = file_systems; tmp; tmp = tmp->next, index--)` only stops early when `if (index <= 0 && try_module_get(tmp->owner))` is true. If `index` has reached 0 but `try_module_get()` fails for that entry, the loop does not break, and the next `index--` wraps `index` from 0 to `UINT_MAX` (4294967295, since `unsigned int` is 32 bits wide on Linux). For an unsigned value, `index <= 0` is also just an obscure way of writing `index == 0`.

**The Fix:** The commit changes the logic from:
- Old: `if (index <= 0 && try_module_get(tmp->owner))`
- New: `if (index == 0) { if (try_module_get(tmp->owner)) res = 0; break; }`

The loop now always stops at the entry where `index` reaches 0, so `index` is never decremented past 0, and `res` records whether the module reference was actually taken.

**Impact and Severity:**
1. **User-accessible code path**: The `fs_name()` function is called through the `sysfs` system call (syscall number 139 on x86-64) with option 2, making it directly reachable from userspace applications.
2. **Potential consequences**: A call such as `sysfs(2, 0, buffer)` can reach the underflow when `try_module_get()` fails for the selected entry. In that case:
   - The traversal continues through the rest of the filesystem list with a wrapped index
   - The result of the lookup depends on unsigned wraparound instead of an explicit bounds check
3. **Core filesystem subsystem**: This affects the fundamental filesystem registration mechanism in the kernel.

**Comparison with Similar Commits:** This follows the same pattern as the **accepted backport examples**:
- **Similar to Commit #1 (ntfs3)**: Both fix integer overflow/underflow issues that could cause system instability
- **Similar to Commit #3 (f2fs)**: Both prevent integer arithmetic issues in filesystem code
- **Similar to Commit #5 (f2fs)**: Both add bounds checking to prevent corruption

**Stable Tree Criteria:**
✅ **Fixes important bug**: Removes reliance on unsigned wraparound in a syscall-reachable lookup
✅ **Small and contained**: Minimal code change, only affects one function
✅ **Clear side effects**: No architectural changes, just safer bounds checking
✅ **Low regression risk**: The fix makes the function more robust without changing expected behavior
✅ **Critical subsystem**: Filesystem management is fundamental to kernel operation

**Conclusion:** This commit is well suited to stable backporting: it fixes a clear bug in core kernel infrastructure, uses a minimal and safe approach, and carries little risk of introducing regressions. The unsigned integer underflow is reachable through the `sysfs` syscall.

 fs/filesystems.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/fs/filesystems.c b/fs/filesystems.c
index 58b9067b2391c..95e5256821a53 100644
--- a/fs/filesystems.c
+++ b/fs/filesystems.c
@@ -156,15 +156,19 @@ static int fs_index(const char __user * __name)
 static int fs_name(unsigned int index, char __user * buf)
 {
 	struct file_system_type * tmp;
-	int len, res;
+	int len, res = -EINVAL;
 
 	read_lock(&file_systems_lock);
-	for (tmp = file_systems; tmp; tmp = tmp->next, index--)
-		if (index <= 0 && try_module_get(tmp->owner))
+	for (tmp = file_systems; tmp; tmp = tmp->next, index--) {
+		if (index == 0) {
+			if (try_module_get(tmp->owner))
+				res = 0;
 			break;
+		}
+	}
 	read_unlock(&file_systems_lock);
-	if (!tmp)
-		return -EINVAL;
+	if (res)
+		return res;
 
 	/* OK, we got the reference, so we can safely block */
 	len = strlen(tmp->name) + 1;
-- 
2.39.5


[PATCH AUTOSEL 6.6 1/3] fs/filesystems: Fix potential unsigned integer underflow in fs_name()

by Sasha Levin



[PATCH AUTOSEL 6.12 1/7] btrfs: exit after state insertion failure at btrfs_convert_extent_bit()

by Sasha Levin

From: Filipe Manana <fdmanana(a)suse.com>

[ Upstream commit 3bf179e36da917c5d9bec71c714573ed1649b7c1 ]

If insert_state() state failed it returns an error pointer and we call
extent_io_tree_panic() which will trigger a BUG() call. However if
CONFIG_BUG is disabled, which is an uncommon and exotic scenario, then
we fallthrough and call cache_state() which will dereference the error
pointer, resulting in an invalid memory access.

So jump to the 'out' label after calling extent_io_tree_panic(), it also
makes the code more clear besides dealing with the exotic scenario where
CONFIG_BUG is disabled.

Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---

**YES** This commit should be backported to stable kernel trees.

**Detailed Analysis:**

**1. Bug Description and Impact:** The commit fixes a serious potential memory corruption bug in the btrfs filesystem. When `insert_state()` fails in `btrfs_convert_extent_bit()`, it returns an error pointer (ERR_PTR). The current code calls `extent_io_tree_panic()` which triggers `BUG()`, but if `CONFIG_BUG` is disabled, the code continues executing and calls `cache_state(inserted_state, cached_state)` with the error pointer, causing invalid memory access.

**2. Code Changes Analysis:** The fix is minimal and surgical - just adding a single `goto out;` statement after the `extent_io_tree_panic()` call:

```c
if (IS_ERR(inserted_state)) {
    ret = PTR_ERR(inserted_state);
    extent_io_tree_panic(tree, prealloc, "insert", ret);
+   goto out;  // <-- The fix
}
```

This ensures that when `CONFIG_BUG` is disabled, execution jumps to the cleanup code instead of continuing with an invalid pointer.

**3. Comparison with Similar Commits:** This commit aligns with the pattern seen in "Similar Commit #2" (Status: YES), which also:
- Removes reliance on `BUG_ON()` behavior
- Provides graceful error handling
- Has minimal risk
- Fixes a potential crash/corruption scenario

Similar to commit #3 and #5 (both Status: NO), this touches BUG() handling, but unlike those commits which make broader architectural changes to error handling patterns, this fix is much more contained.

**4. Stable Tree Criteria Assessment:**
✅ **Fixes important bug**: Prevents potential memory corruption/crashes
✅ **Small and contained**: Single line addition
✅ **Minimal risk**: Only affects error path when insert_state() fails AND CONFIG_BUG is disabled
✅ **No new features**: Pure bug fix
✅ **No architectural changes**: Preserves existing error handling, just prevents fallthrough
✅ **Critical subsystem**: btrfs filesystem corruption prevention
✅ **Clear side effects**: No unintended consequences beyond fixing the bug

**5. Risk Assessment:**
- **Very Low Risk**: The change only affects an error condition that's already problematic
- **Exotic scenario**: Only impacts systems with `CONFIG_BUG` disabled (uncommon but not impossible)
- **No regression potential**: The change only prevents executing invalid code, doesn't change normal operation
- **Well-contained**: Affects only one function in one file

**6. Security Implications:** While `CONFIG_BUG` disabled is uncommon, this could potentially be exploited if an attacker can trigger the `insert_state()` failure condition, leading to memory corruption. The fix prevents this attack vector.

This is a clear candidate for stable backporting - it fixes a real bug with minimal risk and follows the stable tree rules perfectly.

 fs/btrfs/extent-io-tree.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/extent-io-tree.c b/fs/btrfs/extent-io-tree.c
index 6d08c100b01de..bb3aaf610652a 100644
--- a/fs/btrfs/extent-io-tree.c
+++ b/fs/btrfs/extent-io-tree.c
@@ -1456,6 +1456,7 @@ int convert_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 		if (IS_ERR(inserted_state)) {
 			ret = PTR_ERR(inserted_state);
 			extent_io_tree_panic(tree, prealloc, "insert", ret);
+			goto out;
 		}
 		cache_state(inserted_state, cached_state);
 		if (inserted_state == prealloc)
-- 
2.39.5
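The fallthrough described in this patch can be modelled in userspace. The sketch below is an illustration under stated assumptions, not btrfs code: `err_ptr()`, `is_err()` and `ptr_err()` are simplified stand-ins for the kernel's `ERR_PTR()`, `IS_ERR()` and `PTR_ERR()`; `panic_stub()` models `extent_io_tree_panic()` on a build where `BUG()` does nothing and returns; `insert_stub()`, `convert_model()` and the choice of `-EEXIST` as the failure code are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified userspace stand-ins for ERR_PTR(), PTR_ERR() and IS_ERR(). */
static inline void *err_ptr(long error) { return (void *)(intptr_t)error; }
static inline long ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct state {
	int cached;
};

/* Models extent_io_tree_panic() when BUG() is compiled out: it simply returns. */
static void panic_stub(int *panic_calls)
{
	(*panic_calls)++;
}

/* Hypothetical insert that either returns the preallocated state or an error pointer. */
static struct state *insert_stub(struct state *prealloc, int fail)
{
	return fail ? err_ptr(-EEXIST) : prealloc;
}

static int convert_model(struct state *prealloc, int fail, int *panic_calls)
{
	struct state *inserted;
	int ret = 0;

	inserted = insert_stub(prealloc, fail);
	if (is_err(inserted)) {
		ret = (int)ptr_err(inserted);
		panic_stub(panic_calls);
		goto out;	/* without this jump, the store below would use the error pointer */
	}
	inserted->cached = 1;	/* stands in for cache_state() */
out:
	return ret;
}
```

On the failure path the model returns the error code and never touches `inserted`; removing the `goto out;` would make the store through `inserted` an invalid memory access, which is the situation the one-line patch avoids.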


[PATCH AUTOSEL 6.14 1/8] btrfs: exit after state insertion failure at btrfs_convert_extent_bit()

by Sasha Levin



[PATCH AUTOSEL 6.15 1/9] btrfs: exit after state insertion failure at btrfs_convert_extent_bit()

by Sasha Levin

 fs/btrfs/extent-io-tree.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/extent-io-tree.c b/fs/btrfs/extent-io-tree.c
index 13de6af279e52..92cfde37b1d33 100644
--- a/fs/btrfs/extent-io-tree.c
+++ b/fs/btrfs/extent-io-tree.c
@@ -1456,6 +1456,7 @@ int convert_extent_bit(struct extent_io_tree *tree, u64 start, u64 end,
 		if (IS_ERR(inserted_state)) {
 			ret = PTR_ERR(inserted_state);
 			extent_io_tree_panic(tree, prealloc, "insert", ret);
+			goto out;
 		}
 		cache_state(inserted_state, cached_state);
 		if (inserted_state == prealloc)
-- 
2.39.5


[PATCH v3] mm/hugetlb: fix a deadlock with pagecache_folio and hugetlb_fault_mutex_table

by Gavin Guo

There is an ABBA deadlock scenario between hugetlb_fault() and
hugetlb_wp() on the pagecache folio's lock and the hugetlb global mutex,
which is reproducible with syzkaller [1]. As the stack traces below
reveal, process-1 tries to take the hugetlb global mutex (A3) while
holding the pagecache folio's lock. Process-2 has taken the hugetlb
global mutex but tries to take the pagecache folio's lock.

Process-1                           Process-2
=========                           =========
hugetlb_fault
   mutex_lock                  (A1)
   filemap_lock_hugetlb_folio  (B1)
   hugetlb_wp
     alloc_hugetlb_folio       #error
       mutex_unlock            (A2)
                                    hugetlb_fault
                                      mutex_lock                  (A4)
                                      filemap_lock_hugetlb_folio  (B4)
       unmap_ref_private
       mutex_lock              (A3)

Fix it by releasing the pagecache folio's lock at (A2) of process-1 so
that the pagecache folio's lock is available to process-2 at (B4), to
avoid the deadlock. In process-1, a new variable is added to track
whether the pagecache folio's lock has been released by its child
function hugetlb_wp(), to avoid a double release of the lock in
hugetlb_fault(). Similar changes are applied to hugetlb_no_page().

Link: https://drive.google.com/file/d/1DVRnIW-vSayU5J1re9Ct_br3jJQU6Vpb/view?usp=… [1]
Fixes: 40549ba8f8e0 ("hugetlb: use new vma_lock for pmd sharing synchronization")
Cc: <stable(a)vger.kernel.org>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Florent Revest <revest(a)google.com>
Reviewed-by: Gavin Shan <gshan(a)redhat.com>
Signed-off-by: Gavin Guo <gavinguo(a)igalia.com>
---
V1 -> V2
Suggested-by Oscar Salvador:
  - Use folio_test_locked to replace the unnecessary parameter passing.
V2 -> V3
  - Dropped the approach suggested by Oscar.
  - Refine the code and git commit suggested by Gavin Shan.

 mm/hugetlb.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6a3cf7935c14..560b9b35262a 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6137,7 +6137,8 @@ static void unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma,
  * Keep the pte_same checks anyway to make transition from the mutex easier.
  */
 static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
-		       struct vm_fault *vmf)
+			     struct vm_fault *vmf,
+			     bool *pagecache_folio_locked)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	struct mm_struct *mm = vma->vm_mm;
@@ -6234,6 +6235,18 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
 			u32 hash;
 
 			folio_put(old_folio);
+			/*
+			 * The pagecache_folio has to be unlocked to avoid
+			 * deadlock and we won't re-lock it in hugetlb_wp(). The
+			 * pagecache_folio could be truncated after being
+			 * unlocked. So its state should not be reliable
+			 * subsequently.
+			 */
+			if (pagecache_folio) {
+				folio_unlock(pagecache_folio);
+				if (pagecache_folio_locked)
+					*pagecache_folio_locked = false;
+			}
 			/*
 			 * Drop hugetlb_fault_mutex and vma_lock before
 			 * unmapping.  unmapping needs to hold vma_lock
@@ -6588,7 +6601,7 @@ static vm_fault_t hugetlb_no_page(struct address_space *mapping,
 	hugetlb_count_add(pages_per_huge_page(h), mm);
 	if ((vmf->flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) {
 		/* Optimization, do the COW without a second fault */
-		ret = hugetlb_wp(folio, vmf);
+		ret = hugetlb_wp(folio, vmf, NULL);
 	}
 
 	spin_unlock(vmf->ptl);
@@ -6660,6 +6673,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	struct hstate *h = hstate_vma(vma);
 	struct address_space *mapping;
 	int need_wait_lock = 0;
+	bool pagecache_folio_locked = true;
 	struct vm_fault vmf = {
 		.vma = vma,
 		.address = address & huge_page_mask(h),
@@ -6814,7 +6828,8 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 
 	if (flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) {
 		if (!huge_pte_write(vmf.orig_pte)) {
-			ret = hugetlb_wp(pagecache_folio, &vmf);
+			ret = hugetlb_wp(pagecache_folio, &vmf,
+					 &pagecache_folio_locked);
 			goto out_put_page;
 		} else if (likely(flags & FAULT_FLAG_WRITE)) {
 			vmf.orig_pte = huge_pte_mkdirty(vmf.orig_pte);
@@ -6832,7 +6847,9 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	spin_unlock(vmf.ptl);
 
 	if (pagecache_folio) {
-		folio_unlock(pagecache_folio);
+		if (pagecache_folio_locked)
+			folio_unlock(pagecache_folio);
+
 		folio_put(pagecache_folio);
 	}
 out_mutex:

base-commit: 914873bc7df913db988284876c16257e6ab772c6
-- 
2.43.0
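The bookkeeping this patch adds (a callee that may drop the caller's lock and reports that through an optional `bool *`) can be shown with a toy model. This is a single-threaded sketch under stated assumptions, not mm code: `struct toy_lock` only counts state instead of blocking, and `wp_model()`/`fault_model()` are hypothetical stand-ins for `hugetlb_wp()`/`hugetlb_fault()`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy lock that records misuse instead of blocking. */
struct toy_lock {
	int held;
	int double_unlocks;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	l->held = 1;
}

static void toy_lock_release(struct toy_lock *l)
{
	if (!l->held)
		l->double_unlocks++;	/* releasing a lock that is not held */
	l->held = 0;
}

/*
 * Models the callee: on its error path it drops the caller's lock early
 * and, if the caller passed a flag, records that the lock is gone.
 */
static int wp_model(struct toy_lock *folio_lock, bool error_path, bool *locked)
{
	if (!error_path)
		return 0;

	toy_lock_release(folio_lock);
	if (locked)
		*locked = false;
	return -1;
}

/* Models the caller: it only unlocks if the callee has not already done so. */
static int fault_model(struct toy_lock *folio_lock, bool error_path)
{
	bool locked = true;
	int ret;

	toy_lock_acquire(folio_lock);
	ret = wp_model(folio_lock, error_path, &locked);
	if (locked)
		toy_lock_release(folio_lock);
	return ret;
}
```

In both the normal and the error path the lock ends up released exactly once. Passing `NULL` for the flag, as the patch does from `hugetlb_no_page()`, is accepted by `wp_model()` in the same way.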


[PATCH 2/8] drm/fdinfo: Switch to idr_for_each() in drm_show_memory_stats()

by Simona Vetter

Unlike idr_for_each_entry(), which terminates on the first NULL entry,
idr_for_each() passes them through. This fixes potential issues with the
idr walk terminating prematurely due to transient NULL entries that
exist when creating and destroying a handle.

Note that transient NULL pointers in drm_file.object_idr have been a
thing since f6cd7daecff5 ("drm: Release driver references to handle
before making it available again"), this is a really old issue.

Aside from temporarily inconsistent fdinfo statistics there's no other
impact of this issue.

Fixes: 686b21b5f6ca ("drm: Add fdinfo memory stats")
Cc: Rob Clark <robdclark(a)chromium.org>
Cc: Emil Velikov <emil.l.velikov(a)gmail.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
Cc: <stable(a)vger.kernel.org> # v6.5+
Signed-off-by: Simona Vetter <simona.vetter(a)intel.com>
Signed-off-by: Simona Vetter <simona.vetter(a)ffwll.ch>
---
 drivers/gpu/drm/drm_file.c | 95 ++++++++++++++++++++++----------------
 1 file changed, 55 insertions(+), 40 deletions(-)

diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index 246cf845e2c9..428a4eb85e94 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -892,6 +892,58 @@ void drm_print_memory_stats(struct drm_printer *p,
 }
 EXPORT_SYMBOL(drm_print_memory_stats);
 
+struct drm_bo_print_data {
+	struct drm_memory_stats status;
+	enum drm_gem_object_status supported_status;
+};
+
+static int
+drm_bo_memory_stats(int id, void *ptr, void *data)
+{
+	struct drm_bo_print_data *drm_data = data;
+	struct drm_gem_object *obj = ptr;
+	enum drm_gem_object_status s = 0;
+	size_t add_size;
+
+	if (!obj)
+		return 0;
+
+	add_size = (obj->funcs && obj->funcs->rss) ?
+		obj->funcs->rss(obj) : obj->size;
+
+	if (obj->funcs && obj->funcs->status) {
+		s = obj->funcs->status(obj);
+		drm_data->supported_status |= s;
+	}
+
+	if (drm_gem_object_is_shared_for_memory_stats(obj))
+		drm_data->status.shared += obj->size;
+	else
+		drm_data->status.private += obj->size;
+
+	if (s & DRM_GEM_OBJECT_RESIDENT) {
+		drm_data->status.resident += add_size;
+	} else {
+		/* If already purged or not yet backed by pages, don't
+		 * count it as purgeable:
+		 */
+		s &= ~DRM_GEM_OBJECT_PURGEABLE;
+	}
+
+	if (!dma_resv_test_signaled(obj->resv, dma_resv_usage_rw(true))) {
+		drm_data->status.active += add_size;
+		drm_data->supported_status |= DRM_GEM_OBJECT_ACTIVE;
+
+		/* If still active, don't count as purgeable: */
+		s &= ~DRM_GEM_OBJECT_PURGEABLE;
+	}
+
+	if (s & DRM_GEM_OBJECT_PURGEABLE)
+		drm_data->status.purgeable += add_size;
+
+	return 0;
+}
+
 /**
  * drm_show_memory_stats - Helper to collect and show standard fdinfo memory stats
  * @p: the printer to print output to
@@ -902,50 +954,13 @@ EXPORT_SYMBOL(drm_print_memory_stats);
  */
 void drm_show_memory_stats(struct drm_printer *p, struct drm_file *file)
 {
-	struct drm_gem_object *obj;
-	struct drm_memory_stats status = {};
-	enum drm_gem_object_status supported_status = 0;
-	int id;
+	struct drm_bo_print_data data = {};
 
 	spin_lock(&file->table_lock);
-	idr_for_each_entry (&file->object_idr, obj, id) {
-		enum drm_gem_object_status s = 0;
-		size_t add_size = (obj->funcs && obj->funcs->rss) ?
-			obj->funcs->rss(obj) : obj->size;
-
-		if (obj->funcs && obj->funcs->status) {
-			s = obj->funcs->status(obj);
-			supported_status |= s;
-		}
-
-		if (drm_gem_object_is_shared_for_memory_stats(obj))
-			status.shared += obj->size;
-		else
-			status.private += obj->size;
-
-		if (s & DRM_GEM_OBJECT_RESIDENT) {
-			status.resident += add_size;
-		} else {
-			/* If already purged or not yet backed by pages, don't
-			 * count it as purgeable:
-			 */
-			s &= ~DRM_GEM_OBJECT_PURGEABLE;
-		}
-
-		if (!dma_resv_test_signaled(obj->resv, dma_resv_usage_rw(true))) {
-			status.active += add_size;
-			supported_status |= DRM_GEM_OBJECT_ACTIVE;
-
-			/* If still active, don't count as purgeable: */
-			s &= ~DRM_GEM_OBJECT_PURGEABLE;
-		}
-
-		if (s & DRM_GEM_OBJECT_PURGEABLE)
-			status.purgeable += add_size;
-	}
+	idr_for_each(&file->object_idr, &drm_bo_memory_stats, &data);
 	spin_unlock(&file->table_lock);
 
-	drm_print_memory_stats(p, &status, supported_status, "memory");
+	drm_print_memory_stats(p, &data.status, data.supported_status, "memory");
 }
 EXPORT_SYMBOL(drm_show_memory_stats);
 
-- 
2.49.0
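The behavioural difference the commit message describes (a walk that stops at the first NULL entry versus a callback walk that is handed every entry and skips NULL itself) can be sketched with a plain array. This is a userspace model only: `struct toy_obj`, `sum_until_null()`, `walk_all()` and `add_size_cb()` are hypothetical names, and the array stands in for the IDR. It does not reproduce the real IDR implementation.

```c
#include <stddef.h>

struct toy_obj {
	size_t size;
};

/* Models an entry-style loop whose continuation test is "entry != NULL". */
static size_t sum_until_null(struct toy_obj *const *slots, size_t n)
{
	size_t total = 0;
	size_t i;

	for (i = 0; i < n && slots[i] != NULL; i++)
		total += slots[i]->size;
	return total;
}

typedef int (*walk_fn)(int id, void *ptr, void *data);

/* Models a callback-style walk: every slot is passed on, NULL included. */
static int walk_all(struct toy_obj *const *slots, size_t n, walk_fn fn, void *data)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int ret = fn((int)i, slots[i], data);

		if (ret)
			return ret;	/* a nonzero return stops the walk */
	}
	return 0;
}

/* Callback in the style of drm_bo_memory_stats(): skip NULL, accumulate otherwise. */
static int add_size_cb(int id, void *ptr, void *data)
{
	struct toy_obj *obj = ptr;
	size_t *total = data;

	(void)id;
	if (!obj)
		return 0;
	*total += obj->size;
	return 0;
}
```

With slots `{ &a, NULL, &b }`, the entry-style sum only sees `a`, while the callback walk accounts for both `a` and `b`. That is the kind of temporarily inconsistent statistic the patch is meant to avoid.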


[PATCH v2] can: kvaser_pciefd: refine error prone echo_skb_max handling logic

by Fedor Pchelkin

echo_skb_max should define the supported upper limit of echo_skb[]
allocated inside the netdevice's priv. The corresponding size value
provided by this driver to alloc_candev() is KVASER_PCIEFD_CAN_TX_MAX_COUNT
which is 17.

But later echo_skb_max is rounded up to the nearest power of two (for the
max case, that would be 32) and the tx/ack indices calculated further
during tx/rx may exceed the upper array boundary. Kasan reported this for
the ack case inside kvaser_pciefd_handle_ack_packet(), though the xmit
function has actually caught the same thing earlier.

 BUG: KASAN: slab-out-of-bounds in kvaser_pciefd_handle_ack_packet+0x2d7/0x92a drivers/net/can/kvaser_pciefd.c:1528
 Read of size 8 at addr ffff888105e4f078 by task swapper/4/0

 CPU: 4 UID: 0 PID: 0 Comm: swapper/4 Not tainted 6.15.0 #12 PREEMPT(voluntary)
 Call Trace:
  <IRQ>
 dump_stack_lvl lib/dump_stack.c:122
 print_report mm/kasan/report.c:521
 kasan_report mm/kasan/report.c:634
 kvaser_pciefd_handle_ack_packet drivers/net/can/kvaser_pciefd.c:1528
 kvaser_pciefd_read_packet drivers/net/can/kvaser_pciefd.c:1605
 kvaser_pciefd_read_buffer drivers/net/can/kvaser_pciefd.c:1656
 kvaser_pciefd_receive_irq drivers/net/can/kvaser_pciefd.c:1684
 kvaser_pciefd_irq_handler drivers/net/can/kvaser_pciefd.c:1733
 __handle_irq_event_percpu kernel/irq/handle.c:158
 handle_irq_event kernel/irq/handle.c:210
 handle_edge_irq kernel/irq/chip.c:833
 __common_interrupt arch/x86/kernel/irq.c:296
 common_interrupt arch/x86/kernel/irq.c:286
  </IRQ>

Tx max count definitely matters for kvaser_pciefd_tx_avail(), but for seq
numbers' generation that's not the case - we're free to calculate them as
would be more convenient, not taking tx max count into account. The only
downside is that the size of echo_skb[] should correspond to the max seq
number (not tx max count), so in some situations a bit more memory would
be consumed than could be.

Thus make the size of the underlying echo_skb[] sufficient for the rounded
max tx value.

Found by Linux Verification Center (linuxtesting.org) with Syzkaller.

Fixes: 8256e0ca6010 ("can: kvaser_pciefd: Fix echo_skb race")
Cc: stable(a)vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin(a)ispras.ru>
---
v2: fix the problem by rounding up the KVASER_PCIEFD_CAN_TX_MAX_COUNT
    constant when allocating candev (Axel Forsman)

 drivers/net/can/kvaser_pciefd.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/can/kvaser_pciefd.c b/drivers/net/can/kvaser_pciefd.c
index f6921368cd14..0071a51ce2c1 100644
--- a/drivers/net/can/kvaser_pciefd.c
+++ b/drivers/net/can/kvaser_pciefd.c
@@ -966,7 +966,7 @@ static int kvaser_pciefd_setup_can_ctrls(struct kvaser_pciefd *pcie)
 		u32 status, tx_nr_packets_max;
 
 		netdev = alloc_candev(sizeof(struct kvaser_pciefd_can),
-				      KVASER_PCIEFD_CAN_TX_MAX_COUNT);
+				      roundup_pow_of_two(KVASER_PCIEFD_CAN_TX_MAX_COUNT));
 		if (!netdev)
 			return -ENOMEM;
 
@@ -995,7 +995,6 @@ static int kvaser_pciefd_setup_can_ctrls(struct kvaser_pciefd *pcie)
 		can->tx_max_count = min(KVASER_PCIEFD_CAN_TX_MAX_COUNT, tx_nr_packets_max - 1);
 
 		can->can.clock.freq = pcie->freq;
-		can->can.echo_skb_max = roundup_pow_of_two(can->tx_max_count);
 		spin_lock_init(&can->lock);
 
 		can->can.bittiming_const = &kvaser_pciefd_bittiming_const;
-- 
2.49.0
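The arithmetic behind the out-of-bounds access can be checked with a short userspace sketch. It assumes, as the discussion of avoiding a 'mod' operation suggests, that the driver derives an echo_skb[] slot by masking a free-running 8-bit counter with `echo_skb_max - 1`. The helpers `roundup_pow2()`, `seq_to_slot()` and `max_slot()` are hypothetical names, and only the value 17 is taken from the commit message.

```c
#include <stddef.h>

#define TX_MAX_COUNT 17u	/* KVASER_PCIEFD_CAN_TX_MAX_COUNT per the commit message */

/* Userspace stand-in for roundup_pow_of_two(); valid for small n >= 1. */
static unsigned int roundup_pow2(unsigned int n)
{
	unsigned int p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Assumed index derivation: mask a free-running counter with echo_skb_max - 1. */
static unsigned int seq_to_slot(unsigned int seq, unsigned int echo_skb_max)
{
	return seq & (echo_skb_max - 1);
}

/* Largest slot any 8-bit counter value can map to for a given echo_skb_max. */
static unsigned int max_slot(unsigned int echo_skb_max)
{
	unsigned int seq, max = 0;

	for (seq = 0; seq < 256; seq++) {
		unsigned int slot = seq_to_slot(seq, echo_skb_max);

		if (slot > max)
			max = slot;
	}
	return max;
}
```

With `echo_skb_max` rounded up to 32, slots up to 31 are reachable, which overruns an array allocated with only 17 entries. Allocating `roundup_pow2(TX_MAX_COUNT)` entries, as v2 does through `alloc_candev()`, keeps every reachable slot in bounds at the cost of 15 extra pointers.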

4 days, 11 hours


[PATCH] can: kvaser_pciefd: refine error prone echo_skb_max handling logic

by Fedor Pchelkin

echo_skb_max should define the supported upper limit of echo_skb[]
allocated inside the netdevice's priv. The corresponding size value
provided by this driver to alloc_candev() is KVASER_PCIEFD_CAN_TX_MAX_COUNT
which is 17.

But later echo_skb_max is rounded up to the nearest power of two (for the
max case, that would be 32) and the tx/ack indices calculated further
during tx/rx may exceed the upper array boundary. Kasan reported this for
the ack case inside kvaser_pciefd_handle_ack_packet(), though the xmit
function has actually caught the same thing earlier.

 BUG: KASAN: slab-out-of-bounds in kvaser_pciefd_handle_ack_packet+0x2d7/0x92a drivers/net/can/kvaser_pciefd.c:1528
 Read of size 8 at addr ffff888105e4f078 by task swapper/4/0

 CPU: 4 UID: 0 PID: 0 Comm: swapper/4 Not tainted 6.15.0 #12 PREEMPT(voluntary)
 Call Trace:
  <IRQ>
  dump_stack_lvl lib/dump_stack.c:122
  print_report mm/kasan/report.c:521
  kasan_report mm/kasan/report.c:634
  kvaser_pciefd_handle_ack_packet drivers/net/can/kvaser_pciefd.c:1528
  kvaser_pciefd_read_packet drivers/net/can/kvaser_pciefd.c:1605
  kvaser_pciefd_read_buffer drivers/net/can/kvaser_pciefd.c:1656
  kvaser_pciefd_receive_irq drivers/net/can/kvaser_pciefd.c:1684
  kvaser_pciefd_irq_handler drivers/net/can/kvaser_pciefd.c:1733
  __handle_irq_event_percpu kernel/irq/handle.c:158
  handle_irq_event kernel/irq/handle.c:210
  handle_edge_irq kernel/irq/chip.c:833
  __common_interrupt arch/x86/kernel/irq.c:296
  common_interrupt arch/x86/kernel/irq.c:286
  </IRQ>

Remove echo_skb_max rounding as this may increase it to unexpected values.
In this sense restore echo_skb_max handling logic prior to commit
8256e0ca6010 ("can: kvaser_pciefd: Fix echo_skb race").

Found by Linux Verification Center (linuxtesting.org) with Syzkaller.

Fixes: 8256e0ca6010 ("can: kvaser_pciefd: Fix echo_skb race")
Cc: stable(a)vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin(a)ispras.ru>
---

Actually the trick with rounding up allows to calculate seq numbers
efficiently, avoiding a more consuming 'mod' operation used in the
current patch.

I see tx max size definitely matters only for kvaser_pciefd_tx_avail(),
but for seq numbers' generation that's not the case - we're free to
calculate them as would be more convenient, not taking tx max size into
account. The only downside is that the size of echo_skb[] should
correspond to the max seq number (not tx max number), so in some
situations a bit more memory would be consumed than could be.

So another approach to fix the problem would be to precompute the rounded
up value of echo_skb_max and pass it to alloc_candev() making the size of
the underlying echo_skb[] sufficient.

If that looks more acceptable, I'll be glad to rework the patch in that
direction.

 drivers/net/can/kvaser_pciefd.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/net/can/kvaser_pciefd.c b/drivers/net/can/kvaser_pciefd.c
index f6921368cd14..1ec4ab9510b6 100644
--- a/drivers/net/can/kvaser_pciefd.c
+++ b/drivers/net/can/kvaser_pciefd.c
@@ -411,7 +411,6 @@ struct kvaser_pciefd_can {
 	void __iomem *reg_base;
 	struct can_berr_counter bec;
 	u8 cmd_seq;
-	u8 tx_max_count;
 	u8 tx_idx;
 	u8 ack_idx;
 	int err_rep_cnt;
@@ -760,7 +759,7 @@ static int kvaser_pciefd_stop(struct net_device *netdev)

 static unsigned int kvaser_pciefd_tx_avail(const struct kvaser_pciefd_can *can)
 {
-	return can->tx_max_count - (READ_ONCE(can->tx_idx) - READ_ONCE(can->ack_idx));
+	return can->can.echo_skb_max - (READ_ONCE(can->tx_idx) - READ_ONCE(can->ack_idx));
 }

 static int kvaser_pciefd_prepare_tx_packet(struct kvaser_pciefd_tx_packet *p,
@@ -810,7 +809,7 @@ static netdev_tx_t kvaser_pciefd_start_xmit(struct sk_buff *skb,
 {
 	struct kvaser_pciefd_can *can = netdev_priv(netdev);
 	struct kvaser_pciefd_tx_packet packet;
-	unsigned int seq = can->tx_idx & (can->can.echo_skb_max - 1);
+	unsigned int seq = can->tx_idx % can->can.echo_skb_max;
 	unsigned int frame_len;
 	int nr_words;

@@ -992,10 +991,9 @@ static int kvaser_pciefd_setup_can_ctrls(struct kvaser_pciefd *pcie)
 		tx_nr_packets_max =
 			FIELD_GET(KVASER_PCIEFD_KCAN_TX_NR_PACKETS_MAX_MASK,
 				  ioread32(can->reg_base + KVASER_PCIEFD_KCAN_TX_NR_PACKETS_REG));
-		can->tx_max_count = min(KVASER_PCIEFD_CAN_TX_MAX_COUNT, tx_nr_packets_max - 1);
+		can->can.echo_skb_max = min(KVASER_PCIEFD_CAN_TX_MAX_COUNT, tx_nr_packets_max - 1);

 		can->can.clock.freq = pcie->freq;
-		can->can.echo_skb_max = roundup_pow_of_two(can->tx_max_count);
 		spin_lock_init(&can->lock);

 		can->can.bittiming_const = &kvaser_pciefd_bittiming_const;
@@ -1523,7 +1521,7 @@ static int kvaser_pciefd_handle_ack_packet(struct kvaser_pciefd *pcie,
 		unsigned int len, frame_len = 0;
 		struct sk_buff *skb;

-		if (echo_idx != (can->ack_idx & (can->can.echo_skb_max - 1)))
+		if (echo_idx != can->ack_idx % can->can.echo_skb_max)
 			return 0;
 		skb = can->can.echo_skb[echo_idx];
 		if (!skb)
--
2.49.0

4 days, 11 hours


[PATCH] accel/ivpu: Trigger device recovery on engine reset/resume failure

by Jacek Lawrynowicz

From: Karol Wachowski <karol.wachowski(a)intel.com>

Trigger full device recovery when the driver fails to restore device state
via engine reset and resume operations. This is necessary because, even if
submissions from a faulty context are blocked, the NPU may still process
previously submitted faulty jobs if the engine reset fails to abort them.
Such jobs can continue to generate faults and occupy device resources.
When engine reset is ineffective, the only way to recover is to perform
a full device recovery.

Fixes: dad945c27a42 ("accel/ivpu: Add handling of VPU_JSM_STATUS_MVNCI_CONTEXT_VIOLATION_HW")
Cc: <stable(a)vger.kernel.org> # v6.15+
Signed-off-by: Karol Wachowski <karol.wachowski(a)intel.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz(a)linux.intel.com>
---
 drivers/accel/ivpu/ivpu_job.c     | 6 ++++--
 drivers/accel/ivpu/ivpu_jsm_msg.c | 9 +++++++--
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/accel/ivpu/ivpu_job.c b/drivers/accel/ivpu/ivpu_job.c
index 1c8e283ad9854..fae8351aa3309 100644
--- a/drivers/accel/ivpu/ivpu_job.c
+++ b/drivers/accel/ivpu/ivpu_job.c
@@ -986,7 +986,8 @@ void ivpu_context_abort_work_fn(struct work_struct *work)
 		return;

 	if (vdev->fw->sched_mode == VPU_SCHEDULING_MODE_HW)
-		ivpu_jsm_reset_engine(vdev, 0);
+		if (ivpu_jsm_reset_engine(vdev, 0))
+			return;

 	mutex_lock(&vdev->context_list_lock);
 	xa_for_each(&vdev->context_xa, ctx_id, file_priv) {
@@ -1009,7 +1010,8 @@ void ivpu_context_abort_work_fn(struct work_struct *work)
 	if (vdev->fw->sched_mode != VPU_SCHEDULING_MODE_HW)
 		goto runtime_put;

-	ivpu_jsm_hws_resume_engine(vdev, 0);
+	if (ivpu_jsm_hws_resume_engine(vdev, 0))
+		return;
 	/*
 	 * In hardware scheduling mode NPU already has stopped processing jobs
 	 * and won't send us any further notifications, thus we have to free job related resources
diff --git a/drivers/accel/ivpu/ivpu_jsm_msg.c b/drivers/accel/ivpu/ivpu_jsm_msg.c
index 219ab8afefabd..0256b2dfefc10 100644
--- a/drivers/accel/ivpu/ivpu_jsm_msg.c
+++ b/drivers/accel/ivpu/ivpu_jsm_msg.c
@@ -7,6 +7,7 @@
 #include "ivpu_hw.h"
 #include "ivpu_ipc.h"
 #include "ivpu_jsm_msg.h"
+#include "ivpu_pm.h"
 #include "vpu_jsm_api.h"

 const char *ivpu_jsm_msg_type_to_str(enum vpu_ipc_msg_type type)
@@ -163,8 +164,10 @@ int ivpu_jsm_reset_engine(struct ivpu_device *vdev, u32 engine)

 	ret = ivpu_ipc_send_receive(vdev, &req, VPU_JSM_MSG_ENGINE_RESET_DONE, &resp,
 				    VPU_IPC_CHAN_ASYNC_CMD, vdev->timeout.jsm);
-	if (ret)
+	if (ret) {
 		ivpu_err_ratelimited(vdev, "Failed to reset engine %d: %d\n", engine, ret);
+		ivpu_pm_trigger_recovery(vdev, "Engine reset failed");
+	}

 	return ret;
 }
@@ -354,8 +357,10 @@ int ivpu_jsm_hws_resume_engine(struct ivpu_device *vdev, u32 engine)

 	ret = ivpu_ipc_send_receive(vdev, &req, VPU_JSM_MSG_HWS_RESUME_ENGINE_DONE, &resp,
 				    VPU_IPC_CHAN_ASYNC_CMD, vdev->timeout.jsm);
-	if (ret)
+	if (ret) {
 		ivpu_err_ratelimited(vdev, "Failed to resume engine %d: %d\n", engine, ret);
+		ivpu_pm_trigger_recovery(vdev, "Engine resume failed");
+	}

 	return ret;
 }
--
2.45.1

4 days, 12 hours

