The quilt patch titled
Subject: mm/ksm: fix flag-dropping behavior in ksm_madvise
has been removed from the -mm tree. Its filename was
mm-ksm-fix-flag-dropping-behavior-in-ksm_madvise.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Jakub Acs <acsjakub(a)amazon.de>
Subject: mm/ksm: fix flag-dropping behavior in ksm_madvise
Date: Wed, 1 Oct 2025 09:03:52 +0000
syzkaller discovered the following crash (kernel BUG):
[ 44.607039] ------------[ cut here ]------------
[ 44.607422] kernel BUG at mm/userfaultfd.c:2067!
[ 44.608148] Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
[ 44.608814] CPU: 1 UID: 0 PID: 2475 Comm: reproducer Not tainted 6.16.0-rc6 #1 PREEMPT(none)
[ 44.609635] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 44.610695] RIP: 0010:userfaultfd_release_all+0x3a8/0x460
<snip other registers, drop unreliable trace>
[ 44.617726] Call Trace:
[ 44.617926] <TASK>
[ 44.619284] userfaultfd_release+0xef/0x1b0
[ 44.620976] __fput+0x3f9/0xb60
[ 44.621240] fput_close_sync+0x110/0x210
[ 44.622222] __x64_sys_close+0x8f/0x120
[ 44.622530] do_syscall_64+0x5b/0x2f0
[ 44.622840] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 44.623244] RIP: 0033:0x7f365bb3f227
The kernel panics because it detects a UFFD inconsistency during
userfaultfd_release_all(): specifically, a VMA that has a valid pointer
to vma->vm_userfaultfd_ctx but no UFFD flags in vma->vm_flags.
The inconsistency is caused in ksm_madvise(): when a user calls madvise()
with MADV_UNMERGEABLE on a VMA that is registered for UFFD in MINOR mode,
it accidentally clears all flags stored in the upper 32 bits of
vma->vm_flags.
Assuming an x86_64 kernel build, unsigned long is 64 bits wide while
unsigned int and int are 32 bits wide. This setup causes the following
mishap during the vm_flags &= ~VM_MERGEABLE assignment.
VM_MERGEABLE is a 32-bit constant of type unsigned int, 0x8000'0000.
After ~ is applied, it becomes 0x7fff'ffff unsigned int, which is then
promoted to unsigned long before the & operation. This promotion fills
the upper 32 bits with leading 0s, as it is an unsigned conversion (and
even a signed conversion wouldn't help here, as the leading bit is 0).
The & operation thus ends up AND-ing vm_flags with 0x0000'0000'7fff'ffff
instead of the intended 0xffff'ffff'7fff'ffff and hence accidentally
clears the upper 32 bits of its value.
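To make the promotion concrete, here is a minimal userspace sketch of the
same arithmetic (a standalone demo, not kernel code; the constant names
merely mirror the old and new kernel definitions):

  #include <stdio.h>

  #define VM_MERGEABLE_OLD 0x80000000    /* unsigned int, as before the fix */
  #define VM_MERGEABLE_NEW (1UL << 31)   /* unsigned long, like BIT(31) */

  int main(void)
  {
          /* vm_flags with some high (64-bit-only) flag bits set */
          unsigned long vm_flags = 0x0000000f80000000UL;

          printf("old: %#lx\n", vm_flags & ~VM_MERGEABLE_OLD); /* 0 - high bits lost */
          printf("new: %#lx\n", vm_flags & ~VM_MERGEABLE_NEW); /* 0xf00000000 */
          return 0;
  }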
Fix it by changing the `VM_MERGEABLE` constant to unsigned long, using the
BIT() macro.
Note: the other VM_* flags are not affected. This only happens to the
VM_MERGEABLE flag, as the other VM_* flags are all constants of type int;
after the ~ operation they end up with a leading 1 and are thus converted
to unsigned long with leading 1s.
Note 2: after commit 31defc3b01d9 ("userfaultfd: remove (VM_)BUG_ON()s"),
this is no longer a kernel BUG, but a WARNING at the same place:
[   45.595973] WARNING: CPU: 1 PID: 2474 at mm/userfaultfd.c:2067
but the root cause (the flag drop) remains the same.
[akpm(a)linux-foundation.org: rust bindgen wasn't able to handle BIT(), from Miguel]
Link: https://lore.kernel.org/oe-kbuild-all/202510030449.VfSaAjvd-lkp@intel.com/
Link: https://lkml.kernel.org/r/20251001090353.57523-2-acsjakub@amazon.de
Fixes: 7677f7fd8be7 ("userfaultfd: add minor fault registration mode")
Signed-off-by: Jakub Acs <acsjakub(a)amazon.de>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis(a)gmail.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Acked-by: SeongJae Park <sj(a)kernel.org>
Tested-by: Alice Ryhl <aliceryhl(a)google.com>
Tested-by: Miguel Ojeda <miguel.ojeda.sandonis(a)gmail.com>
Cc: Xu Xin <xu.xin16(a)zte.com.cn>
Cc: Chengming Zhou <chengming.zhou(a)linux.dev>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mm.h | 2 +-
rust/bindings/bindings_helper.h | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)
--- a/include/linux/mm.h~mm-ksm-fix-flag-dropping-behavior-in-ksm_madvise
+++ a/include/linux/mm.h
@@ -323,7 +323,7 @@ extern unsigned int kobjsize(const void
#define VM_MIXEDMAP 0x10000000 /* Can contain "struct page" and pure PFN pages */
#define VM_HUGEPAGE 0x20000000 /* MADV_HUGEPAGE marked this vma */
#define VM_NOHUGEPAGE 0x40000000 /* MADV_NOHUGEPAGE marked this vma */
-#define VM_MERGEABLE 0x80000000 /* KSM may merge identical pages */
+#define VM_MERGEABLE BIT(31) /* KSM may merge identical pages */
#ifdef CONFIG_ARCH_USES_HIGH_VMA_FLAGS
#define VM_HIGH_ARCH_BIT_0 32 /* bit only usable on 64-bit architectures */
--- a/rust/bindings/bindings_helper.h~mm-ksm-fix-flag-dropping-behavior-in-ksm_madvise
+++ a/rust/bindings/bindings_helper.h
@@ -108,6 +108,7 @@ const xa_mark_t RUST_CONST_HELPER_XA_PRE
const gfp_t RUST_CONST_HELPER_XA_FLAGS_ALLOC = XA_FLAGS_ALLOC;
const gfp_t RUST_CONST_HELPER_XA_FLAGS_ALLOC1 = XA_FLAGS_ALLOC1;
+const vm_flags_t RUST_CONST_HELPER_VM_MERGEABLE = VM_MERGEABLE;
#if IS_ENABLED(CONFIG_ANDROID_BINDER_IPC_RUST)
#include "../../drivers/android/binder/rust_binder.h"
_
Patches currently in -mm which might be from acsjakub(a)amazon.de are
mm-redefine-vm_-flag-constants-with-bit.patch
The quilt patch titled
Subject: mm/damon/vaddr: do not repeat pte_offset_map_lock() until success
has been removed from the -mm tree. Its filename was
mm-damon-vaddr-do-not-repeat-pte_offset_map_lock-until-success.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: SeongJae Park <sj(a)kernel.org>
Subject: mm/damon/vaddr: do not repeat pte_offset_map_lock() until success
Date: Mon, 29 Sep 2025 17:44:09 -0700
DAMON's virtual address space operation set implementation (vaddr) calls
pte_offset_map_lock() inside the page table walk callback function. This
is for reading and writing the page table accessed bits. If
pte_offset_map_lock() fails, it retries by returning from the page table
walk callback function with ACTION_AGAIN.
pte_offset_map_lock() can continuously fail if the target is a pmd
migration entry, though. Hence it could cause an infinite page table walk
if the migration cannot complete before the page table walk finishes.
This indeed caused a soft lockup when CPU hotplugging and DAMON were
running in parallel.
Avoid the infinite loop by simply not retrying the page table walk. DAMON
promises only best-effort accuracy, so missing accesses to such pages is
not a problem.
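For context, the retry loop lives in walk_pmd_range(); a simplified sketch
(trimmed from mm/pagewalk.c, so treat it as an approximation, not the
exact code):

  do {
  again:
          next = pmd_addr_end(addr, end);
          /* ... */
          err = ops->pmd_entry(pmd, addr, next, walk); /* e.g. damon_mkold_pmd_entry() */
          if (err)
                  break;
          if (walk->action == ACTION_AGAIN)
                  goto again;     /* re-runs the callback immediately; if
                                     pte_offset_map_lock() keeps failing on
                                     a pmd migration entry, this never ends */
  } while (pmd++, addr = next, addr != end);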
Link: https://lkml.kernel.org/r/20250930004410.55228-1-sj@kernel.org
Fixes: 7780d04046a2 ("mm/pagewalkers: ACTION_AGAIN if pte_offset_map_lock() fails")
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Reported-by: Xinyu Zheng <zhengxinyu6(a)huawei.com>
Closes: https://lore.kernel.org/20250918030029.2652607-1-zhengxinyu6@huawei.com
Acked-by: Hugh Dickins <hughd(a)google.com>
Cc: <stable(a)vger.kernel.org> [6.5+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/damon/vaddr.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
--- a/mm/damon/vaddr.c~mm-damon-vaddr-do-not-repeat-pte_offset_map_lock-until-success
+++ a/mm/damon/vaddr.c
@@ -328,10 +328,8 @@ static int damon_mkold_pmd_entry(pmd_t *
}
pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
- if (!pte) {
- walk->action = ACTION_AGAIN;
+ if (!pte)
return 0;
- }
if (!pte_present(ptep_get(pte)))
goto out;
damon_ptep_mkold(pte, walk->vma, addr);
@@ -481,10 +479,8 @@ regular_page:
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
- if (!pte) {
- walk->action = ACTION_AGAIN;
+ if (!pte)
return 0;
- }
ptent = ptep_get(pte);
if (!pte_present(ptent))
goto out;
_
Patches currently in -mm which might be from sj(a)kernel.org are
The quilt patch titled
Subject: mm/rmap: fix soft-dirty and uffd-wp bit loss when remapping zero-filled mTHP subpage to shared zeropage
has been removed from the -mm tree. Its filename was
mm-rmap-fix-soft-dirty-and-uffd-wp-bit-loss-when-remapping-zero-filled-mthp-subpage-to-shared-zeropage.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Lance Yang <lance.yang(a)linux.dev>
Subject: mm/rmap: fix soft-dirty and uffd-wp bit loss when remapping zero-filled mTHP subpage to shared zeropage
Date: Tue, 30 Sep 2025 16:10:40 +0800
When splitting an mTHP and replacing a zero-filled subpage with the shared
zeropage, try_to_map_unused_to_zeropage() currently drops several
important PTE bits.
For userspace tools like CRIU, which rely on the soft-dirty mechanism for
incremental snapshots, losing the soft-dirty bit means modified pages are
missed, leading to inconsistent memory state after restore.
As pointed out by David, the more critical uffd-wp bit is also dropped.
This breaks the userfaultfd write-protection mechanism, causing writes to
be silently missed by monitoring applications, which can lead to data
corruption.
Preserve both the soft-dirty and uffd-wp bits from the old PTE when
creating the new zeropage mapping to ensure they are correctly tracked.
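For background, here is a hedged userspace sketch of how a CRIU-style tool
reads the soft-dirty bit (bit 55 of a /proc/<pid>/pagemap entry, per
Documentation/admin-guide/mm/pagemap.rst); error handling is trimmed:

  #include <fcntl.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <unistd.h>

  static int page_is_soft_dirty(int pagemap_fd, uintptr_t vaddr)
  {
          uint64_t entry;
          off_t off = (off_t)(vaddr / getpagesize()) * sizeof(entry);

          if (pread(pagemap_fd, &entry, sizeof(entry), off) != sizeof(entry))
                  return -1;
          return (entry >> 55) & 1;       /* bit 55: pte is soft-dirty */
  }

  int main(void)
  {
          static char buf[4096];
          int fd = open("/proc/self/pagemap", O_RDONLY);

          buf[0] = 1;     /* write to the page so it becomes soft-dirty */
          printf("soft-dirty: %d\n", page_is_soft_dirty(fd, (uintptr_t)buf));
          close(fd);
          return 0;
  }

A subpage that silently lost the bit during an mTHP split would read back
0 here, and an incremental snapshot would wrongly skip it.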
Link: https://lkml.kernel.org/r/20250930081040.80926-1-lance.yang@linux.dev
Fixes: b1f202060afe ("mm: remap unused subpages to shared zeropage when splitting isolated thp")
Signed-off-by: Lance Yang <lance.yang(a)linux.dev>
Suggested-by: David Hildenbrand <david(a)redhat.com>
Suggested-by: Dev Jain <dev.jain(a)arm.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Dev Jain <dev.jain(a)arm.com>
Acked-by: Zi Yan <ziy(a)nvidia.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Reviewed-by: Harry Yoo <harry.yoo(a)oracle.com>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Barry Song <baohua(a)kernel.org>
Cc: Byungchul Park <byungchul(a)sk.com>
Cc: Gregory Price <gourry(a)gourry.net>
Cc: "Huang, Ying" <ying.huang(a)linux.alibaba.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Joshua Hahn <joshua.hahnjy(a)gmail.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Mariano Pache <npache(a)redhat.com>
Cc: Mathew Brost <matthew.brost(a)intel.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Rakie Kim <rakie.kim(a)sk.com>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Usama Arif <usamaarif642(a)gmail.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Yu Zhao <yuzhao(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/migrate.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)
--- a/mm/migrate.c~mm-rmap-fix-soft-dirty-and-uffd-wp-bit-loss-when-remapping-zero-filled-mthp-subpage-to-shared-zeropage
+++ a/mm/migrate.c
@@ -296,8 +296,7 @@ bool isolate_folio_to_list(struct folio
}
static bool try_to_map_unused_to_zeropage(struct page_vma_mapped_walk *pvmw,
- struct folio *folio,
- unsigned long idx)
+ struct folio *folio, pte_t old_pte, unsigned long idx)
{
struct page *page = folio_page(folio, idx);
pte_t newpte;
@@ -306,7 +305,7 @@ static bool try_to_map_unused_to_zeropag
return false;
VM_BUG_ON_PAGE(!PageAnon(page), page);
VM_BUG_ON_PAGE(!PageLocked(page), page);
- VM_BUG_ON_PAGE(pte_present(ptep_get(pvmw->pte)), page);
+ VM_BUG_ON_PAGE(pte_present(old_pte), page);
if (folio_test_mlocked(folio) || (pvmw->vma->vm_flags & VM_LOCKED) ||
mm_forbids_zeropage(pvmw->vma->vm_mm))
@@ -322,6 +321,12 @@ static bool try_to_map_unused_to_zeropag
newpte = pte_mkspecial(pfn_pte(my_zero_pfn(pvmw->address),
pvmw->vma->vm_page_prot));
+
+ if (pte_swp_soft_dirty(old_pte))
+ newpte = pte_mksoft_dirty(newpte);
+ if (pte_swp_uffd_wp(old_pte))
+ newpte = pte_mkuffd_wp(newpte);
+
set_pte_at(pvmw->vma->vm_mm, pvmw->address, pvmw->pte, newpte);
dec_mm_counter(pvmw->vma->vm_mm, mm_counter(folio));
@@ -364,13 +369,13 @@ static bool remove_migration_pte(struct
continue;
}
#endif
+ old_pte = ptep_get(pvmw.pte);
if (rmap_walk_arg->map_unused_to_zeropage &&
- try_to_map_unused_to_zeropage(&pvmw, folio, idx))
+ try_to_map_unused_to_zeropage(&pvmw, folio, old_pte, idx))
continue;
folio_get(folio);
pte = mk_pte(new, READ_ONCE(vma->vm_page_prot));
- old_pte = ptep_get(pvmw.pte);
entry = pte_to_swp_entry(old_pte);
if (!is_migration_entry_young(entry))
_
Patches currently in -mm which might be from lance.yang(a)linux.dev are
hung_task-fix-warnings-caused-by-unaligned-lock-pointers.patch
mm-khugepaged-abort-collapse-scan-on-non-swap-entries.patch
The quilt patch titled
Subject: mm/thp: fix MTE tag mismatch when replacing zero-filled subpages
has been removed from the -mm tree. Its filename was
mm-thp-fix-mte-tag-mismatch-when-replacing-zero-filled-subpages.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Lance Yang <lance.yang(a)linux.dev>
Subject: mm/thp: fix MTE tag mismatch when replacing zero-filled subpages
Date: Mon, 22 Sep 2025 10:14:58 +0800
When both THP and MTE are enabled, splitting a THP and replacing its
zero-filled subpages with the shared zeropage can cause MTE tag mismatch
faults in userspace.
Remapping zero-filled subpages to the shared zeropage is unsafe, as the
zeropage has a fixed tag of zero, which may not match the tag expected by
the userspace pointer.
KSM already avoids this problem by using memcmp_pages(), which on arm64
intentionally reports MTE-tagged pages as non-identical to prevent unsafe
merging.
As suggested by David[1], this patch adopts the same pattern, replacing the
memchr_inv() byte-level check with a call to pages_identical(). This
leverages existing architecture-specific logic to determine if a page is
truly identical to the shared zeropage.
Having both the THP shrinker and KSM rely on pages_identical() makes the
design more future-proof, IMO. Instead of handling quirks in generic code,
we just let the architecture decide what makes two pages identical.
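For reference, pages_identical() is a thin wrapper around memcmp_pages(),
whose arm64 override looks roughly like this (simplified from
arch/arm64/kernel/mte.c; a sketch, not the exact code):

  int memcmp_pages(struct page *page1, struct page *page2)
  {
          char *addr1 = page_address(page1);
          char *addr2 = page_address(page2);
          int ret = memcmp(addr1, addr2, PAGE_SIZE);

          if (!system_supports_mte() || ret)
                  return ret;

          /*
           * The contents match, but if at least one page carries MTE
           * tags, report the pages as different so that callers such as
           * KSM (and now the THP shrinker) never merge or zeropage-map
           * them.
           */
          if (page_mte_tagged(page1) || page_mte_tagged(page2))
                  return addr1 != addr2;

          return ret;
  }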
[1] https://lore.kernel.org/all/ca2106a3-4bb2-4457-81af-301fd99fbef4@redhat.com
Link: https://lkml.kernel.org/r/20250922021458.68123-1-lance.yang@linux.dev
Fixes: b1f202060afe ("mm: remap unused subpages to shared zeropage when splitting isolated thp")
Signed-off-by: Lance Yang <lance.yang(a)linux.dev>
Reported-by: Qun-wei Lin <Qun-wei.Lin(a)mediatek.com>
Closes: https://lore.kernel.org/all/a7944523fcc3634607691c35311a5d59d1a3f8d4.camel@…
Suggested-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Zi Yan <ziy(a)nvidia.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Usama Arif <usamaarif642(a)gmail.com>
Reviewed-by: Catalin Marinas <catalin.marinas(a)arm.com>
Reviewed-by: Wei Yang <richard.weiyang(a)gmail.com>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: andrew.yang <andrew.yang(a)mediatek.com>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Barry Song <baohua(a)kernel.org>
Cc: Byungchul Park <byungchul(a)sk.com>
Cc: Charlie Jenkins <charlie(a)rivosinc.com>
Cc: Chinwen Chang <chinwen.chang(a)mediatek.com>
Cc: Dev Jain <dev.jain(a)arm.com>
Cc: Domenico Cerasuolo <cerasuolodomenico(a)gmail.com>
Cc: Gregory Price <gourry(a)gourry.net>
Cc: "Huang, Ying" <ying.huang(a)linux.alibaba.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Joshua Hahn <joshua.hahnjy(a)gmail.com>
Cc: Kairui Song <ryncsn(a)gmail.com>
Cc: Kalesh Singh <kaleshsingh(a)google.com>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Mariano Pache <npache(a)redhat.com>
Cc: Mathew Brost <matthew.brost(a)intel.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Mike Rapoport <rppt(a)kernel.org>
Cc: Palmer Dabbelt <palmer(a)rivosinc.com>
Cc: Rakie Kim <rakie.kim(a)sk.com>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Roman Gushchin <roman.gushchin(a)linux.dev>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Samuel Holland <samuel.holland(a)sifive.com>
Cc: Shakeel Butt <shakeel.butt(a)linux.dev>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Yu Zhao <yuzhao(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/huge_memory.c | 15 +++------------
mm/migrate.c | 8 +-------
2 files changed, 4 insertions(+), 19 deletions(-)
--- a/mm/huge_memory.c~mm-thp-fix-mte-tag-mismatch-when-replacing-zero-filled-subpages
+++ a/mm/huge_memory.c
@@ -4104,32 +4104,23 @@ static unsigned long deferred_split_coun
static bool thp_underused(struct folio *folio)
{
int num_zero_pages = 0, num_filled_pages = 0;
- void *kaddr;
int i;
if (khugepaged_max_ptes_none == HPAGE_PMD_NR - 1)
return false;
for (i = 0; i < folio_nr_pages(folio); i++) {
- kaddr = kmap_local_folio(folio, i * PAGE_SIZE);
- if (!memchr_inv(kaddr, 0, PAGE_SIZE)) {
- num_zero_pages++;
- if (num_zero_pages > khugepaged_max_ptes_none) {
- kunmap_local(kaddr);
+ if (pages_identical(folio_page(folio, i), ZERO_PAGE(0))) {
+ if (++num_zero_pages > khugepaged_max_ptes_none)
return true;
- }
} else {
/*
* Another path for early exit once the number
* of non-zero filled pages exceeds threshold.
*/
- num_filled_pages++;
- if (num_filled_pages >= HPAGE_PMD_NR - khugepaged_max_ptes_none) {
- kunmap_local(kaddr);
+ if (++num_filled_pages >= HPAGE_PMD_NR - khugepaged_max_ptes_none)
return false;
- }
}
- kunmap_local(kaddr);
}
return false;
}
--- a/mm/migrate.c~mm-thp-fix-mte-tag-mismatch-when-replacing-zero-filled-subpages
+++ a/mm/migrate.c
@@ -300,9 +300,7 @@ static bool try_to_map_unused_to_zeropag
unsigned long idx)
{
struct page *page = folio_page(folio, idx);
- bool contains_data;
pte_t newpte;
- void *addr;
if (PageCompound(page))
return false;
@@ -319,11 +317,7 @@ static bool try_to_map_unused_to_zeropag
* this subpage has been non present. If the subpage is only zero-filled
* then map it to the shared zeropage.
*/
- addr = kmap_local_page(page);
- contains_data = memchr_inv(addr, 0, PAGE_SIZE);
- kunmap_local(addr);
-
- if (contains_data)
+ if (!pages_identical(page, ZERO_PAGE(0)))
return false;
newpte = pte_mkspecial(pfn_pte(my_zero_pfn(pvmw->address),
_
Patches currently in -mm which might be from lance.yang(a)linux.dev are
hung_task-fix-warnings-caused-by-unaligned-lock-pointers.patch
mm-khugepaged-abort-collapse-scan-on-non-swap-entries.patch
The quilt patch titled
Subject: memcg: skip cgroup_file_notify if spinning is not allowed
has been removed from the -mm tree. Its filename was
memcg-skip-cgroup_file_notify-if-spinning-is-not-allowed.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Shakeel Butt <shakeel.butt(a)linux.dev>
Subject: memcg: skip cgroup_file_notify if spinning is not allowed
Date: Mon, 22 Sep 2025 15:02:03 -0700
Generally memcg charging is allowed from all contexts, including NMI,
where even spinning on a spinlock can cause locking issues. However, one
call chain was missed when support for memcg charging from any context
was added. That is try_charge_memcg() -> memcg_memory_event() ->
cgroup_file_notify().
The possible function call tree under cgroup_file_notify() can acquire
many different spin locks in spinning mode. Some of them are
cgroup_file_kn_lock, kernfs_notify_lock, and pool_workqueue's lock. So,
let's just skip cgroup_file_notify() from memcg charging if the context
does not allow spinning.
An alternative approach was also explored where, instead of skipping
cgroup_file_notify(), the memcg event processing is deferred to irq_work
[1]. However, that adds complexity, and it was decided to keep things
simple until we need more memcg events with the !allow_spinning
requirement.
Link: https://lore.kernel.org/all/5qi2llyzf7gklncflo6gxoozljbm4h3tpnuv4u4ej4ztysv… [1]
Link: https://lkml.kernel.org/r/20250922220203.261714-1-shakeel.butt@linux.dev
Fixes: 3ac4638a734a ("memcg: make memcg_rstat_updated nmi safe")
Signed-off-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Closes: https://lore.kernel.org/all/20250905061919.439648-1-yepeilin@google.com/
Cc: Alexei Starovoitov <ast(a)kernel.org>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Kumar Kartikeya Dwivedi <memxor(a)gmail.com>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Peilin Ye <yepeilin(a)google.com>
Cc: Roman Gushchin <roman.gushchin(a)linux.dev>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/memcontrol.h | 26 +++++++++++++++++++-------
mm/memcontrol.c | 7 ++++---
2 files changed, 23 insertions(+), 10 deletions(-)
--- a/include/linux/memcontrol.h~memcg-skip-cgroup_file_notify-if-spinning-is-not-allowed
+++ a/include/linux/memcontrol.h
@@ -1001,22 +1001,28 @@ static inline void count_memcg_event_mm(
count_memcg_events_mm(mm, idx, 1);
}
-static inline void memcg_memory_event(struct mem_cgroup *memcg,
- enum memcg_memory_event event)
+static inline void __memcg_memory_event(struct mem_cgroup *memcg,
+ enum memcg_memory_event event,
+ bool allow_spinning)
{
bool swap_event = event == MEMCG_SWAP_HIGH || event == MEMCG_SWAP_MAX ||
event == MEMCG_SWAP_FAIL;
+ /* For now only MEMCG_MAX can happen with !allow_spinning context. */
+ VM_WARN_ON_ONCE(!allow_spinning && event != MEMCG_MAX);
+
atomic_long_inc(&memcg->memory_events_local[event]);
- if (!swap_event)
+ if (!swap_event && allow_spinning)
cgroup_file_notify(&memcg->events_local_file);
do {
atomic_long_inc(&memcg->memory_events[event]);
- if (swap_event)
- cgroup_file_notify(&memcg->swap_events_file);
- else
- cgroup_file_notify(&memcg->events_file);
+ if (allow_spinning) {
+ if (swap_event)
+ cgroup_file_notify(&memcg->swap_events_file);
+ else
+ cgroup_file_notify(&memcg->events_file);
+ }
if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
break;
@@ -1026,6 +1032,12 @@ static inline void memcg_memory_event(st
!mem_cgroup_is_root(memcg));
}
+static inline void memcg_memory_event(struct mem_cgroup *memcg,
+ enum memcg_memory_event event)
+{
+ __memcg_memory_event(memcg, event, true);
+}
+
static inline void memcg_memory_event_mm(struct mm_struct *mm,
enum memcg_memory_event event)
{
--- a/mm/memcontrol.c~memcg-skip-cgroup_file_notify-if-spinning-is-not-allowed
+++ a/mm/memcontrol.c
@@ -2307,12 +2307,13 @@ static int try_charge_memcg(struct mem_c
bool drained = false;
bool raised_max_event = false;
unsigned long pflags;
+ bool allow_spinning = gfpflags_allow_spinning(gfp_mask);
retry:
if (consume_stock(memcg, nr_pages))
return 0;
- if (!gfpflags_allow_spinning(gfp_mask))
+ if (!allow_spinning)
/* Avoid the refill and flush of the older stock */
batch = nr_pages;
@@ -2348,7 +2349,7 @@ retry:
if (!gfpflags_allow_blocking(gfp_mask))
goto nomem;
- memcg_memory_event(mem_over_limit, MEMCG_MAX);
+ __memcg_memory_event(mem_over_limit, MEMCG_MAX, allow_spinning);
raised_max_event = true;
psi_memstall_enter(&pflags);
@@ -2415,7 +2416,7 @@ force:
* a MEMCG_MAX event.
*/
if (!raised_max_event)
- memcg_memory_event(mem_over_limit, MEMCG_MAX);
+ __memcg_memory_event(mem_over_limit, MEMCG_MAX, allow_spinning);
/*
* The allocation either can't fail or will lead to more memory
_
Patches currently in -mm which might be from shakeel.butt(a)linux.dev are
This series backports 19 patches to update minmax.h in the 5.15.y branch,
aligning it with v6.17-rc7.
The ultimate goal is to synchronize all longterm branches so that they
include the full set of minmax.h changes (6.12.y and 6.6.y were already
backported by me and are now aligned, 6.1.y is in progress).
The key motivation is to bring in commit d03eba99f5bf ("minmax: allow
min()/max()/clamp() if the arguments have the same signedness"), which
is missing in kernel 5.15.y.
In mainline, this change enables min()/max()/clamp() to accept mixed
argument types, provided both have the same signedness. Without it,
backported patches that use these forms may trigger compiler warnings,
which escalate to build failures when -Werror is enabled.
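As a hedged illustration (not taken from the series), this is the class
of code that d03eba99f5bf unblocks:

  u32 ring_size = 4096;
  unsigned long limit = PAGE_SIZE;

  /* Before d03eba99f5bf, min() required identical argument types, so this
   * mixed-type call produced a compile-time warning (an error under
   * -Werror); afterwards it is accepted since both arguments are
   * unsigned. */
  unsigned long n = min(ring_size, limit);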
Changes in v2:
- Fix the order of patches 6-10 to match their order in the mainline branch.
- Use same style of [ Upstream commit <HASH> ] in all patches.
Andy Shevchenko (1):
minmax: deduplicate __unconst_integer_typeof()
David Laight (8):
minmax: fix indentation of __cmp_once() and __clamp_once()
minmax.h: add whitespace around operators and after commas
minmax.h: update some comments
minmax.h: reduce the #define expansion of min(), max() and clamp()
minmax.h: use BUILD_BUG_ON_MSG() for the lo < hi test in clamp()
minmax.h: move all the clamp() definitions after the min/max() ones
minmax.h: simplify the variants of clamp()
minmax.h: remove some #defines that are only expanded once
Herve Codina (1):
minmax: Introduce {min,max}_array()
Linus Torvalds (8):
minmax: avoid overly complicated constant expressions in VM code
minmax: add a few more MIN_T/MAX_T users
minmax: simplify and clarify min_t()/max_t() implementation
minmax: make generic MIN() and MAX() macros available everywhere
minmax: don't use max() in situations that want a C constant
expression
minmax: simplify min()/max()/clamp() implementation
minmax: improve macro expansion and type checking
minmax: fix up min3() and max3() too
Matthew Wilcox (Oracle) (1):
minmax: add in_range() macro
arch/arm/mm/pageattr.c | 6 +-
arch/um/drivers/mconsole_user.c | 2 +
arch/x86/mm/pgtable.c | 2 +-
drivers/edac/sb_edac.c | 4 +-
drivers/edac/skx_common.h | 1 -
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 +
.../drm/amd/display/modules/hdcp/hdcp_ddc.c | 2 +
.../drm/amd/pm/powerplay/hwmgr/ppevvmath.h | 14 +-
.../amd/pm/swsmu/smu11/sienna_cichlid_ppt.c | 2 +
.../drm/arm/display/include/malidp_utils.h | 2 +-
.../display/komeda/komeda_pipeline_state.c | 24 +-
drivers/gpu/drm/drm_color_mgmt.c | 2 +-
drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 6 -
drivers/gpu/drm/radeon/evergreen_cs.c | 2 +
drivers/hwmon/adt7475.c | 24 +-
drivers/input/touchscreen/cyttsp4_core.c | 2 +-
drivers/irqchip/irq-sun6i-r.c | 2 +-
drivers/md/dm-integrity.c | 4 +-
drivers/media/dvb-frontends/stv0367_priv.h | 3 +
.../net/ethernet/chelsio/cxgb3/cxgb3_main.c | 18 +-
.../net/ethernet/stmicro/stmmac/stmmac_main.c | 2 +-
drivers/net/fjes/fjes_main.c | 4 +-
drivers/nfc/pn544/i2c.c | 2 -
drivers/platform/x86/sony-laptop.c | 1 -
drivers/scsi/isci/init.c | 6 +-
.../pci/hive_isp_css_include/math_support.h | 5 -
drivers/virt/acrn/ioreq.c | 4 +-
fs/btrfs/misc.h | 2 -
fs/btrfs/tree-checker.c | 2 +-
fs/ext2/balloc.c | 2 -
fs/ext4/ext4.h | 2 -
fs/ufs/util.h | 6 -
include/linux/compiler.h | 9 +
include/linux/minmax.h | 264 +++++++++++++-----
kernel/trace/preemptirq_delay_test.c | 2 -
lib/btree.c | 1 -
lib/decompress_unlzma.c | 2 +
lib/logic_pio.c | 3 -
lib/vsprintf.c | 2 +-
lib/zstd/zstd_internal.h | 2 -
mm/zsmalloc.c | 1 -
net/ipv4/proc.c | 2 +-
net/ipv6/proc.c | 2 +-
net/netfilter/nf_nat_core.c | 6 +-
net/tipc/core.h | 2 +-
net/tipc/link.c | 10 +-
tools/testing/selftests/vm/mremap_test.c | 2 +
47 files changed, 289 insertions(+), 183 deletions(-)
--
2.47.3
From: Jeongjun Park <aha310510(a)gmail.com>
[ Upstream commit 9f2c0ac1423d5f267e7f1d1940780fc764b0fee3 ]
The previous commit 0718a78f6a9f ("ALSA: usb-audio: Kill timer properly at
removal") patched a UAF issue caused by the error timer.
However, because the error timer kill added in that patch occurs after the
endpoint delete, a race condition leading to UAF still occurs, albeit
rarely. Additionally, since kill/cleanup for the URBs is also missing,
freed memory can be accessed in URB-related interrupt context, which can
cause a UAF.
Therefore, to prevent this, the error timer and URBs must be killed before
the heap memory is freed.
Cc: <stable(a)vger.kernel.org>
Reported-by: syzbot+f02665daa2abeef4a947(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=f02665daa2abeef4a947
Fixes: 0718a78f6a9f ("ALSA: usb-audio: Kill timer properly at removal")
Signed-off-by: Jeongjun Park <aha310510(a)gmail.com>
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
---
sound/usb/midi.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/sound/usb/midi.c b/sound/usb/midi.c
index c3de2b137435..461e183680da 100644
--- a/sound/usb/midi.c
+++ b/sound/usb/midi.c
@@ -1522,15 +1522,14 @@ static void snd_usbmidi_free(struct snd_usb_midi *umidi)
{
int i;
+ if (!umidi->disconnected)
+ snd_usbmidi_disconnect(&umidi->list);
+
for (i = 0; i < MIDI_MAX_ENDPOINTS; ++i) {
struct snd_usb_midi_endpoint *ep = &umidi->endpoints[i];
- if (ep->out)
- snd_usbmidi_out_endpoint_delete(ep->out);
- if (ep->in)
- snd_usbmidi_in_endpoint_delete(ep->in);
+ kfree(ep->out);
}
mutex_destroy(&umidi->mutex);
- timer_shutdown_sync(&umidi->error_timer);
kfree(umidi);
}
--
The desc->len value can be set to up to U32_MAX. If the umem
tx_metadata_len option is also set, the value of the expression
'desc->len + pool->tx_metadata_len' can overflow, and validation
of the incorrect descriptor will pass. This can lead to a subsequent
chain of arithmetic overflows in the xsk_build_skb() function and an
incorrect sk_buff allocation.
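A minimal userspace sketch of the wraparound (a standalone demo; the
variable names merely mirror the kernel fields):

  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
          uint32_t desc_len = UINT32_MAX; /* user-controlled desc->len */
          uint32_t meta_len = 8;          /* pool->tx_metadata_len */

          uint64_t bad  = desc_len + meta_len;           /* wraps in 32 bits: 7 */
          uint64_t good = (uint64_t)desc_len + meta_len; /* 0x100000007 */

          printf("bad=%#llx good=%#llx\n",
                 (unsigned long long)bad, (unsigned long long)good);
          return 0;
  }

With the 32-bit addition, a ~4 GiB descriptor masquerades as a 7-byte one
and sails through validation.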
Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with SVACE.
Fixes: 341ac980eab9 ("xsk: Support tx_metadata_len")
Cc: stable(a)vger.kernel.org
Signed-off-by: Ilia Gavrilov <Ilia.Gavrilov(a)infotecs.ru>
---
net/xdp/xsk_queue.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index f16f390370dc..b206a8839b39 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -144,7 +144,7 @@ static inline bool xp_aligned_validate_desc(struct xsk_buff_pool *pool,
struct xdp_desc *desc)
{
u64 addr = desc->addr - pool->tx_metadata_len;
- u64 len = desc->len + pool->tx_metadata_len;
+ u64 len = (u64)desc->len + pool->tx_metadata_len;
u64 offset = addr & (pool->chunk_size - 1);
if (!desc->len)
@@ -165,7 +165,7 @@ static inline bool xp_unaligned_validate_desc(struct xsk_buff_pool *pool,
struct xdp_desc *desc)
{
u64 addr = xp_unaligned_add_offset_to_addr(desc->addr) - pool->tx_metadata_len;
- u64 len = desc->len + pool->tx_metadata_len;
+ u64 len = (u64)desc->len + pool->tx_metadata_len;
if (!desc->len)
return false;
--
2.39.5
Hi,
We have some reports of long compute jobs on APUs hanging the system.
This has been root-caused, and a workaround has been introduced in the
mainline kernel. I didn't CC stable on the original workaround because I
wanted to make sure we had enough time to verify that it didn't have
unintended side effects.
I feel comfortable with the testing at this point, and I think it's worth
bringing it back to any stable kernels it will apply to: 6.12.y and newer.
The commit is:
1fb710793ce2619223adffaf981b1ff13cd48f17
Thanks,