The quilt patch titled
Subject: iov_iter: iterate_folioq: fix handling of offset >= folio size
has been removed from the -mm tree. Its filename was
iov_iter-iterate_folioq-fix-handling-of-offset-=-folio-size.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Dominique Martinet <asmadeus(a)codewreck.org>
Subject: iov_iter: iterate_folioq: fix handling of offset >= folio size
Date: Wed, 13 Aug 2025 15:04:55 +0900
It's apparently possible to get an iov advanced all the way up to the end
of the current page we're looking at, e.g.
(gdb) p *iter
$24 = {iter_type = 4 '\004', nofault = false, data_source = false, iov_offset = 4096, {__ubuf_iovec = {
iov_base = 0xffff88800f5bc000, iov_len = 655}, {{__iov = 0xffff88800f5bc000, kvec = 0xffff88800f5bc000,
bvec = 0xffff88800f5bc000, folioq = 0xffff88800f5bc000, xarray = 0xffff88800f5bc000,
ubuf = 0xffff88800f5bc000}, count = 655}}, {nr_segs = 2, folioq_slot = 2 '\002', xarray_start = 2}}
Here iov_offset is 4k with 4k-sized folios.
This should have been fine because we're only in the 2nd slot and there's
another one after this, but iterate_folioq should not try to map a folio
when the skip covers its whole size, and, more importantly, 'part' does not
end up zero here (because 'PAGE_SIZE - skip % PAGE_SIZE' evaluates to
PAGE_SIZE rather than zero), so skip forward to the "advance to next folio"
code instead.
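To make the arithmetic concrete, here is a minimal user-space sketch (a
standalone illustration, not the kernel code) of why 'part' stays non-zero
when skip has already reached the folio size:

	/* Standalone sketch: with skip == PAGE_SIZE, the expression
	 * 'PAGE_SIZE - skip % PAGE_SIZE' evaluates to PAGE_SIZE, not 0,
	 * so umin(len, ...) hands the step function a non-empty part of a
	 * fully consumed folio instead of advancing to the next slot.
	 */
	#include <stdio.h>

	#define PAGE_SIZE 4096UL

	static unsigned long part_for(unsigned long len, unsigned long skip)
	{
		unsigned long avail = PAGE_SIZE - skip % PAGE_SIZE;

		return len < avail ? len : avail;	/* umin(len, avail) */
	}

	int main(void)
	{
		/* values from the gdb dump above: count = 655, iov_offset = 4096 */
		printf("part = %lu\n", part_for(655, 4096));	/* 655, not 0 */
		printf("part = %lu\n", part_for(655, 0));	/* after advancing: 655 */
		return 0;
	}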
Link: https://lkml.kernel.org/r/20250813-iot_iter_folio-v3-0-a0ffad2b665a@codewre…
Link: https://lkml.kernel.org/r/20250813-iot_iter_folio-v3-1-a0ffad2b665a@codewre…
Signed-off-by: Dominique Martinet <asmadeus(a)codewreck.org>
Fixes: db0aa2e9566f ("mm: Define struct folio_queue and ITER_FOLIOQ to handle a sequence of folios")
Reported-by: Maximilian Bosch <maximilian(a)mbosch.me>
Reported-by: Ryan Lahfa <ryan(a)lahfa.xyz>
Reported-by: Christian Theune <ct(a)flyingcircus.io>
Reported-by: Arnout Engelen <arnout(a)bzzt.net>
Link: https://lkml.kernel.org/r/D4LHHUNLG79Y.12PI0X6BEHRHW@mbosch.me/
Acked-by: David Howells <dhowells(a)redhat.com>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org> [6.12+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/iov_iter.h | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)
--- a/include/linux/iov_iter.h~iov_iter-iterate_folioq-fix-handling-of-offset-=-folio-size
+++ a/include/linux/iov_iter.h
@@ -160,7 +160,7 @@ size_t iterate_folioq(struct iov_iter *i
do {
struct folio *folio = folioq_folio(folioq, slot);
- size_t part, remain, consumed;
+ size_t part, remain = 0, consumed;
size_t fsize;
void *base;
@@ -168,14 +168,16 @@ size_t iterate_folioq(struct iov_iter *i
break;
fsize = folioq_folio_size(folioq, slot);
- base = kmap_local_folio(folio, skip);
- part = umin(len, PAGE_SIZE - skip % PAGE_SIZE);
- remain = step(base, progress, part, priv, priv2);
- kunmap_local(base);
- consumed = part - remain;
- len -= consumed;
- progress += consumed;
- skip += consumed;
+ if (skip < fsize) {
+ base = kmap_local_folio(folio, skip);
+ part = umin(len, PAGE_SIZE - skip % PAGE_SIZE);
+ remain = step(base, progress, part, priv, priv2);
+ kunmap_local(base);
+ consumed = part - remain;
+ len -= consumed;
+ progress += consumed;
+ skip += consumed;
+ }
if (skip >= fsize) {
skip = 0;
slot++;
_
Patches currently in -mm which might be from asmadeus(a)codewreck.org are
The quilt patch titled
Subject: mm/damon/core: fix commit_ops_filters by using correct nth function
has been removed from the -mm tree. Its filename was
mm-damon-core-fix-commit_ops_filters-by-using-correct-nth-function.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Sang-Heon Jeon <ekffu200098(a)gmail.com>
Subject: mm/damon/core: fix commit_ops_filters by using correct nth function
Date: Sun, 10 Aug 2025 21:42:01 +0900
damos_commit_ops_filters() incorrectly uses damos_nth_filter() which
iterates core_filters. As a result, performing a commit unintentionally
corrupts ops_filters.
Add damos_nth_ops_filter(), which iterates ops_filters, and use it to fix
the issues caused by the wrong iteration.
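For contrast, the existing helper that was being (mis)used looks essentially
like the sketch below; this is paraphrased from the surrounding context (see
the hunk that follows), so the exact body is an assumption:

	/* Paraphrased sketch of the existing damos_nth_filter(): it walks
	 * s->filters (the core filters), not s->ops_filters, which is why
	 * indexing into it from damos_commit_ops_filters() corrupted the
	 * ops_filters list.
	 */
	static struct damos_filter *damos_nth_filter(int n, struct damos *s)
	{
		struct damos_filter *filter;
		int i = 0;

		damos_for_each_filter(filter, s) {	/* core_filters */
			if (i++ == n)
				return filter;
		}
		return NULL;
	}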
Link: https://lkml.kernel.org/r/20250810124201.15743-1-ekffu200098@gmail.com
Fixes: 3607cc590f18 ("mm/damon/core: support committing ops_filters") # 6.15.x
Signed-off-by: Sang-Heon Jeon <ekffu200098(a)gmail.com>
Reviewed-by: SeongJae Park <sj(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/damon/core.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
--- a/mm/damon/core.c~mm-damon-core-fix-commit_ops_filters-by-using-correct-nth-function
+++ a/mm/damon/core.c
@@ -845,6 +845,18 @@ static struct damos_filter *damos_nth_fi
return NULL;
}
+static struct damos_filter *damos_nth_ops_filter(int n, struct damos *s)
+{
+ struct damos_filter *filter;
+ int i = 0;
+
+ damos_for_each_ops_filter(filter, s) {
+ if (i++ == n)
+ return filter;
+ }
+ return NULL;
+}
+
static void damos_commit_filter_arg(
struct damos_filter *dst, struct damos_filter *src)
{
@@ -908,7 +920,7 @@ static int damos_commit_ops_filters(stru
int i = 0, j = 0;
damos_for_each_ops_filter_safe(dst_filter, next, dst) {
- src_filter = damos_nth_filter(i++, src);
+ src_filter = damos_nth_ops_filter(i++, src);
if (src_filter)
damos_commit_filter(dst_filter, src_filter);
else
_
Patches currently in -mm which might be from ekffu200098(a)gmail.com are
mm-damon-core-set-quota-charged_from-to-jiffies-at-first-charge-window.patch
mm-damon-update-expired-description-of-damos_action.patch
docs-mm-damon-design-fix-typo-s-sz_trtied-sz_tried.patch
selftests-damon-test-no-op-commit-broke-damon-status.patch
selftests-damon-test-no-op-commit-broke-damon-status-fix.patch
mm-damon-tests-core-kunit-add-damos_commit_filter-test.patch
The quilt patch titled
Subject: mm/debug_vm_pgtable: clear page table entries at destroy_args()
has been removed from the -mm tree. Its filename was
mm-debug_vm_pgtable-clear-page-table-entries-at-destroy_args.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: "Herton R. Krzesinski" <herton(a)redhat.com>
Subject: mm/debug_vm_pgtable: clear page table entries at destroy_args()
Date: Thu, 31 Jul 2025 18:40:51 -0300
The mm/debug_vm_pgtable test manually allocates page table entries for
the tests it runs, also using its manually allocated mm_struct. That in
itself is ok, but when it exits, at destroy_args(), it fails to clear those
entries with the *_clear functions.
The problem is that this leaves stale entries. If another process allocates
an mm_struct with a pgd at the same address, it may end up running into the
stale entry. This is happening in practice on a debug kernel with
CONFIG_DEBUG_VM_PGTABLE=y; for example, this is the output with some extra
debugging I added (it prints a warning trace if pgtables_bytes goes
negative, in addition to the warning at the check_mm() function):
[ 2.539353] debug_vm_pgtable: [get_random_vaddr ]: random_vaddr is 0x7ea247140000
[ 2.539366] kmem_cache info
[ 2.539374] kmem_cachep 0x000000002ce82385 - freelist 0x0000000000000000 - offset 0x508
[ 2.539447] debug_vm_pgtable: [init_args ]: args->mm is 0x000000002267cc9e
(...)
[ 2.552800] WARNING: CPU: 5 PID: 116 at include/linux/mm.h:2841 free_pud_range+0x8bc/0x8d0
[ 2.552816] Modules linked in:
[ 2.552843] CPU: 5 UID: 0 PID: 116 Comm: modprobe Not tainted 6.12.0-105.debug_vm2.el10.ppc64le+debug #1 VOLUNTARY
[ 2.552859] Hardware name: IBM,9009-41A POWER9 (architected) 0x4e0202 0xf000005 of:IBM,FW910.00 (VL910_062) hv:phyp pSeries
[ 2.552872] NIP: c0000000007eef3c LR: c0000000007eef30 CTR: c0000000003d8c90
[ 2.552885] REGS: c0000000622e73b0 TRAP: 0700 Not tainted (6.12.0-105.debug_vm2.el10.ppc64le+debug)
[ 2.552899] MSR: 800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24002822 XER: 0000000a
[ 2.552954] CFAR: c0000000008f03f0 IRQMASK: 0
[ 2.552954] GPR00: c0000000007eef30 c0000000622e7650 c000000002b1ac00 0000000000000001
[ 2.552954] GPR04: 0000000000000008 0000000000000000 c0000000007eef30 ffffffffffffffff
[ 2.552954] GPR08: 00000000ffff00f5 0000000000000001 0000000000000048 0000000000004000
[ 2.552954] GPR12: 00000003fa440000 c000000017ffa300 c0000000051d9f80 ffffffffffffffdb
[ 2.552954] GPR16: 0000000000000000 0000000000000008 000000000000000a 60000000000000e0
[ 2.552954] GPR20: 4080000000000000 c0000000113af038 00007fffcf130000 0000700000000000
[ 2.552954] GPR24: c000000062a6a000 0000000000000001 8000000062a68000 0000000000000001
[ 2.552954] GPR28: 000000000000000a c000000062ebc600 0000000000002000 c000000062ebc760
[ 2.553170] NIP [c0000000007eef3c] free_pud_range+0x8bc/0x8d0
[ 2.553185] LR [c0000000007eef30] free_pud_range+0x8b0/0x8d0
[ 2.553199] Call Trace:
[ 2.553207] [c0000000622e7650] [c0000000007eef30] free_pud_range+0x8b0/0x8d0 (unreliable)
[ 2.553229] [c0000000622e7750] [c0000000007f40b4] free_pgd_range+0x284/0x3b0
[ 2.553248] [c0000000622e7800] [c0000000007f4630] free_pgtables+0x450/0x570
[ 2.553274] [c0000000622e78e0] [c0000000008161c0] exit_mmap+0x250/0x650
[ 2.553292] [c0000000622e7a30] [c0000000001b95b8] __mmput+0x98/0x290
[ 2.558344] [c0000000622e7a80] [c0000000001d1018] exit_mm+0x118/0x1b0
[ 2.558361] [c0000000622e7ac0] [c0000000001d141c] do_exit+0x2ec/0x870
[ 2.558376] [c0000000622e7b60] [c0000000001d1ca8] do_group_exit+0x88/0x150
[ 2.558391] [c0000000622e7bb0] [c0000000001d1db8] sys_exit_group+0x48/0x50
[ 2.558407] [c0000000622e7be0] [c00000000003d810] system_call_exception+0x1e0/0x4c0
[ 2.558423] [c0000000622e7e50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec
(...)
[ 2.558892] ---[ end trace 0000000000000000 ]---
[ 2.559022] BUG: Bad rss-counter state mm:000000002267cc9e type:MM_ANONPAGES val:1
[ 2.559037] BUG: non-zero pgtables_bytes on freeing mm: -6144
Here the modprobe process ended up with an mm_struct allocated from the
mm_struct slab that was previously used by the debug_vm_pgtable test. That
is not a problem in itself, since the mm_struct is initialized again etc.;
however, if it ends up using the same pgd table, it bumps into the old stale
entry when clearing/freeing the page table entries, and tries to free an
entry that is already gone (the one allocated by the debug_vm_pgtable test).
This also explains the negative pgtables_bytes, since it accounts for
entries not allocated by the current process.
As far as I looked, pgd_{alloc,free} etc. do not clear entries; clearing of
the entries is explicitly done in the free_pgtables->
free_pgd_range->free_p4d_range->free_pud_range->free_pmd_range->
free_pte_range path. However, the debug_vm_pgtable test does not call
free_pgtables, since it allocates the mm_struct and entries manually for its
test and, e.g., does not go through page faults. So it should also clear the
entries manually before exiting, at destroy_args().
This problem was noticed with a "reboot X number of times" test run on a
powerpc host, with a debug kernel with CONFIG_DEBUG_VM_PGTABLE enabled. It
depends on the system, but in a loop of 100 reboots the problem could
manifest once or twice, if a process ends up getting the mm->pgd entry with
the stale entries used by mm/debug_vm_pgtable. After applying this patch, I
couldn't reproduce/experience the problems anymore. I was also able to
reproduce the problem on the latest upstream kernel (6.16).
I also modified destroy_args() to use mmput() instead of mmdrop(): there is
no reason to hold an mm_users reference and not release the mm_struct
entirely. In the output above with my debugging prints I had already patched
it to use mmput(); that did not fix the problem, but it helped in the
debugging as well.
Link: https://lkml.kernel.org/r/20250731214051.4115182-1-herton@redhat.com
Fixes: 3c9b84f044a9 ("mm/debug_vm_pgtable: introduce struct pgtable_debug_args")
Signed-off-by: Herton R. Krzesinski <herton(a)redhat.com>
Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
Cc: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Cc: Gavin Shan <gshan(a)redhat.com>
Cc: Gerald Schaefer <gerald.schaefer(a)linux.ibm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/debug_vm_pgtable.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
--- a/mm/debug_vm_pgtable.c~mm-debug_vm_pgtable-clear-page-table-entries-at-destroy_args
+++ a/mm/debug_vm_pgtable.c
@@ -990,29 +990,34 @@ static void __init destroy_args(struct p
/* Free page table entries */
if (args->start_ptep) {
+ pmd_clear(args->pmdp);
pte_free(args->mm, args->start_ptep);
mm_dec_nr_ptes(args->mm);
}
if (args->start_pmdp) {
+ pud_clear(args->pudp);
pmd_free(args->mm, args->start_pmdp);
mm_dec_nr_pmds(args->mm);
}
if (args->start_pudp) {
+ p4d_clear(args->p4dp);
pud_free(args->mm, args->start_pudp);
mm_dec_nr_puds(args->mm);
}
- if (args->start_p4dp)
+ if (args->start_p4dp) {
+ pgd_clear(args->pgdp);
p4d_free(args->mm, args->start_p4dp);
+ }
/* Free vma and mm struct */
if (args->vma)
vm_area_free(args->vma);
if (args->mm)
- mmdrop(args->mm);
+ mmput(args->mm);
}
static struct page * __init
_
Patches currently in -mm which might be from herton(a)redhat.com are
The quilt patch titled
Subject: squashfs: fix memory leak in squashfs_fill_super
has been removed from the -mm tree. Its filename was
squashfs-fix-memory-leak-in-squashfs_fill_super.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Phillip Lougher <phillip(a)squashfs.org.uk>
Subject: squashfs: fix memory leak in squashfs_fill_super
Date: Mon, 11 Aug 2025 23:37:40 +0100
If sb_min_blocksize returns 0, squashfs_fill_super exits without freeing
allocated memory (sb->s_fs_info).
Fix this by moving the call to sb_min_blocksize to before memory is
allocated.
Link: https://lkml.kernel.org/r/20250811223740.110392-1-phillip@squashfs.org.uk
Fixes: 734aa85390ea ("Squashfs: check return result of sb_min_blocksize")
Signed-off-by: Phillip Lougher <phillip(a)squashfs.org.uk>
Reported-by: Scott GUO <scottzhguo(a)tencent.com>
Closes: https://lore.kernel.org/all/20250811061921.3807353-1-scott_gzh@163.com
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/squashfs/super.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
--- a/fs/squashfs/super.c~squashfs-fix-memory-leak-in-squashfs_fill_super
+++ a/fs/squashfs/super.c
@@ -187,10 +187,15 @@ static int squashfs_fill_super(struct su
unsigned short flags;
unsigned int fragments;
u64 lookup_table_start, xattr_id_table_start, next_table;
- int err;
+ int err, devblksize = sb_min_blocksize(sb, SQUASHFS_DEVBLK_SIZE);
TRACE("Entered squashfs_fill_superblock\n");
+ if (!devblksize) {
+ errorf(fc, "squashfs: unable to set blocksize\n");
+ return -EINVAL;
+ }
+
sb->s_fs_info = kzalloc(sizeof(*msblk), GFP_KERNEL);
if (sb->s_fs_info == NULL) {
ERROR("Failed to allocate squashfs_sb_info\n");
@@ -201,12 +206,7 @@ static int squashfs_fill_super(struct su
msblk->panic_on_errors = (opts->errors == Opt_errors_panic);
- msblk->devblksize = sb_min_blocksize(sb, SQUASHFS_DEVBLK_SIZE);
- if (!msblk->devblksize) {
- errorf(fc, "squashfs: unable to set blocksize\n");
- return -EINVAL;
- }
-
+ msblk->devblksize = devblksize;
msblk->devblksize_log2 = ffz(~msblk->devblksize);
mutex_init(&msblk->meta_index_mutex);
_
Patches currently in -mm which might be from phillip(a)squashfs.org.uk are
The quilt patch titled
Subject: kho: warn if KHO is disabled due to an error
has been removed from the -mm tree. Its filename was
kho-warn-if-kho-is-disabled-due-to-an-error.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Pasha Tatashin <pasha.tatashin(a)soleen.com>
Subject: kho: warn if KHO is disabled due to an error
Date: Fri, 8 Aug 2025 20:18:04 +0000
During boot, the scratch area is allocated based on command line parameters
or auto-calculated. However, the scratch area may fail to allocate, and in
that case KHO is disabled. Currently, no warning is printed that KHO is
disabled, which makes it confusing for the end user to figure out why KHO
is not available. Add the missing warning message.
Link: https://lkml.kernel.org/r/20250808201804.772010-4-pasha.tatashin@soleen.com
Signed-off-by: Pasha Tatashin <pasha.tatashin(a)soleen.com>
Acked-by: Mike Rapoport (Microsoft) <rppt(a)kernel.org>
Acked-by: Pratyush Yadav <pratyush(a)kernel.org>
Cc: Alexander Graf <graf(a)amazon.com>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Changyuan Lyu <changyuanl(a)google.com>
Cc: Coiby Xu <coxu(a)redhat.com>
Cc: Dave Vasilevsky <dave(a)vasilevsky.ca>
Cc: Eric Biggers <ebiggers(a)google.com>
Cc: Kees Cook <kees(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/kexec_handover.c | 1 +
1 file changed, 1 insertion(+)
--- a/kernel/kexec_handover.c~kho-warn-if-kho-is-disabled-due-to-an-error
+++ a/kernel/kexec_handover.c
@@ -564,6 +564,7 @@ err_free_scratch_areas:
err_free_scratch_desc:
memblock_free(kho_scratch, kho_scratch_cnt * sizeof(*kho_scratch));
err_disable_kho:
+ pr_warn("Failed to reserve scratch area, disabling kexec handover\n");
kho_enable = false;
}
_
Patches currently in -mm which might be from pasha.tatashin(a)soleen.com are
The quilt patch titled
Subject: kho: mm: don't allow deferred struct page with KHO
has been removed from the -mm tree. Its filename was
kho-mm-dont-allow-deferred-struct-page-with-kho.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Pasha Tatashin <pasha.tatashin(a)soleen.com>
Subject: kho: mm: don't allow deferred struct page with KHO
Date: Fri, 8 Aug 2025 20:18:03 +0000
KHO uses struct pages for the preserved memory early in boot; however, with
deferred struct page initialization, only a small portion of memory has
properly initialized struct pages.
This problem was detected where the vmemmap is poisoned and illegal flag
combinations are detected.
Don't allow the two to be enabled together; later we will have to teach KHO
to work properly with the deferred struct page init kernel feature.
Link: https://lkml.kernel.org/r/20250808201804.772010-3-pasha.tatashin@soleen.com
Fixes: 4e1d010e3bda ("kexec: add config option for KHO")
Signed-off-by: Pasha Tatashin <pasha.tatashin(a)soleen.com>
Acked-by: Mike Rapoport (Microsoft) <rppt(a)kernel.org>
Acked-by: Pratyush Yadav <pratyush(a)kernel.org>
Cc: Alexander Graf <graf(a)amazon.com>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Changyuan Lyu <changyuanl(a)google.com>
Cc: Coiby Xu <coxu(a)redhat.com>
Cc: Dave Vasilevsky <dave(a)vasilevsky.ca>
Cc: Eric Biggers <ebiggers(a)google.com>
Cc: Kees Cook <kees(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/Kconfig.kexec | 1 +
1 file changed, 1 insertion(+)
--- a/kernel/Kconfig.kexec~kho-mm-dont-allow-deferred-struct-page-with-kho
+++ a/kernel/Kconfig.kexec
@@ -97,6 +97,7 @@ config KEXEC_JUMP
config KEXEC_HANDOVER
bool "kexec handover"
depends on ARCH_SUPPORTS_KEXEC_HANDOVER && ARCH_SUPPORTS_KEXEC_FILE
+ depends on !DEFERRED_STRUCT_PAGE_INIT
select MEMBLOCK_KHO_SCRATCH
select KEXEC_FILE
select DEBUG_FS
_
Patches currently in -mm which might be from pasha.tatashin(a)soleen.com are
The patch titled
Subject: x86/mm/64: define ARCH_PAGE_TABLE_SYNC_MASK and arch_sync_kernel_mappings()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
x86-mm-64-define-arch_page_table_sync_mask-and-arch_sync_kernel_mappings.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Harry Yoo <harry.yoo(a)oracle.com>
Subject: x86/mm/64: define ARCH_PAGE_TABLE_SYNC_MASK and arch_sync_kernel_mappings()
Date: Mon, 18 Aug 2025 11:02:06 +0900
Define ARCH_PAGE_TABLE_SYNC_MASK and arch_sync_kernel_mappings() to ensure
page tables are properly synchronized when calling p*d_populate_kernel().
For 5-level paging, synchronization is performed via
pgd_populate_kernel(). In 4-level paging, pgd_populate() is a no-op, so
synchronization is instead performed at the P4D level via
p4d_populate_kernel().
This fixes intermittent boot failures on systems using 4-level paging and
a large amount of persistent memory:
BUG: unable to handle page fault for address: ffffe70000000034
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: 0002 [#1] SMP NOPTI
RIP: 0010:__init_single_page+0x9/0x6d
Call Trace:
<TASK>
__init_zone_device_page+0x17/0x5d
memmap_init_zone_device+0x154/0x1bb
pagemap_range+0x2e0/0x40f
memremap_pages+0x10b/0x2f0
devm_memremap_pages+0x1e/0x60
dev_dax_probe+0xce/0x2ec [device_dax]
dax_bus_probe+0x6d/0xc9
[... snip ...]
</TASK>
It also fixes a crash in vmemmap_set_pmd() caused by accessing vmemmap
before sync_global_pgds() [1]:
BUG: unable to handle page fault for address: ffffeb3ff1200000
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: Oops: 0002 [#1] PREEMPT SMP NOPTI
Tainted: [W]=WARN
RIP: 0010:vmemmap_set_pmd+0xff/0x230
<TASK>
vmemmap_populate_hugepages+0x176/0x180
vmemmap_populate+0x34/0x80
__populate_section_memmap+0x41/0x90
sparse_add_section+0x121/0x3e0
__add_pages+0xba/0x150
add_pages+0x1d/0x70
memremap_pages+0x3dc/0x810
devm_memremap_pages+0x1c/0x60
xe_devm_add+0x8b/0x100 [xe]
xe_tile_init_noalloc+0x6a/0x70 [xe]
xe_device_probe+0x48c/0x740 [xe]
[... snip ...]
Link: https://lkml.kernel.org/r/20250818020206.4517-4-harry.yoo@oracle.com
Fixes: 8d400913c231 ("x86/vmemmap: handle unpopulated sub-pmd ranges")
Signed-off-by: Harry Yoo <harry.yoo(a)oracle.com>
Closes: https://lore.kernel.org/linux-mm/20250311114420.240341-1-gwan-gyeong.mun@in… [1]
Suggested-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Acked-by: Kiryl Shutsemau <kas(a)kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt(a)kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: Andrey Konovalov <andreyknvl(a)gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a(a)gmail.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar(a)linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
Cc: Ard Biesheuvel <ardb(a)kernel.org>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: bibo mao <maobibo(a)loongson.cn>
Cc: Borislav Betkov <bp(a)alien8.de>
Cc: Christoph Lameter (Ampere) <cl(a)gentwo.org>
Cc: Dennis Zhou <dennis(a)kernel.org>
Cc: Dev Jain <dev.jain(a)arm.com>
Cc: Dmitriy Vyukov <dvyukov(a)google.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Jane Chu <jane.chu(a)oracle.com>
Cc: Joao Martins <joao.m.martins(a)oracle.com>
Cc: Joerg Roedel <joro(a)8bytes.org>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Kevin Brodsky <kevin.brodsky(a)arm.com>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Qi Zheng <zhengqi.arch(a)bytedance.com>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: Thomas Gleinxer <tglx(a)linutronix.de>
Cc: Thomas Huth <thuth(a)redhat.com>
Cc: "Uladzislau Rezki (Sony)" <urezki(a)gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino(a)arm.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/x86/include/asm/pgtable_64_types.h | 3 +++
arch/x86/mm/init_64.c | 18 ++++++++++++++++++
2 files changed, 21 insertions(+)
--- a/arch/x86/include/asm/pgtable_64_types.h~x86-mm-64-define-arch_page_table_sync_mask-and-arch_sync_kernel_mappings
+++ a/arch/x86/include/asm/pgtable_64_types.h
@@ -36,6 +36,9 @@ static inline bool pgtable_l5_enabled(vo
#define pgtable_l5_enabled() cpu_feature_enabled(X86_FEATURE_LA57)
#endif /* USE_EARLY_PGTABLE_L5 */
+#define ARCH_PAGE_TABLE_SYNC_MASK \
+ (pgtable_l5_enabled() ? PGTBL_PGD_MODIFIED : PGTBL_P4D_MODIFIED)
+
extern unsigned int pgdir_shift;
extern unsigned int ptrs_per_p4d;
--- a/arch/x86/mm/init_64.c~x86-mm-64-define-arch_page_table_sync_mask-and-arch_sync_kernel_mappings
+++ a/arch/x86/mm/init_64.c
@@ -224,6 +224,24 @@ static void sync_global_pgds(unsigned lo
}
/*
+ * Make kernel mappings visible in all page tables in the system.
+ * This is necessary except when the init task populates kernel mappings
+ * during the boot process. In that case, all processes originating from
+ * the init task copies the kernel mappings, so there is no issue.
+ * Otherwise, missing synchronization could lead to kernel crashes due
+ * to missing page table entries for certain kernel mappings.
+ *
+ * Synchronization is performed at the top level, which is the PGD in
+ * 5-level paging systems. But in 4-level paging systems, however,
+ * pgd_populate() is a no-op, so synchronization is done at the P4D level.
+ * sync_global_pgds() handles this difference between paging levels.
+ */
+void arch_sync_kernel_mappings(unsigned long start, unsigned long end)
+{
+ sync_global_pgds(start, end);
+}
+
+/*
* NOTE: This function is marked __ref because it calls __init function
* (alloc_bootmem_pages). It's safe to do it ONLY when after_bootmem == 0.
*/
_
Patches currently in -mm which might be from harry.yoo(a)oracle.com are
mm-move-page-table-sync-declarations-to-linux-pgtableh.patch
mm-introduce-and-use-pgdp4d_populate_kernel.patch
x86-mm-64-define-arch_page_table_sync_mask-and-arch_sync_kernel_mappings.patch
The patch titled
Subject: mm: introduce and use {pgd,p4d}_populate_kernel()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-introduce-and-use-pgdp4d_populate_kernel.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Harry Yoo <harry.yoo(a)oracle.com>
Subject: mm: introduce and use {pgd,p4d}_populate_kernel()
Date: Mon, 18 Aug 2025 11:02:05 +0900
Introduce and use {pgd,p4d}_populate_kernel() in core MM code when
populating PGD and P4D entries for the kernel address space. These
helpers ensure proper synchronization of page tables when updating the
kernel portion of top-level page tables.
Until now, the kernel has relied on each architecture to handle
synchronization of top-level page tables in an ad-hoc manner. For
example, see commit 9b861528a801 ("x86-64, mem: Update all PGDs for direct
mapping and vmemmap mapping changes").
However, this approach has proven fragile for following reasons:
1) It is easy to forget to perform the necessary page table
synchronization when introducing new changes.
For instance, commit 4917f55b4ef9 ("mm/sparse-vmemmap: improve memory
savings for compound devmaps") overlooked the need to synchronize
page tables for the vmemmap area.
2) It is also easy to overlook that the vmemmap and direct mapping areas
must not be accessed before explicit page table synchronization.
For example, commit 8d400913c231 ("x86/vmemmap: handle unpopulated
sub-pmd ranges") caused crashes by accessing the vmemmap area
before calling sync_global_pgds().
To address this, as suggested by Dave Hansen, introduce _kernel() variants
of the page table population helpers, which invoke architecture-specific
hooks to properly synchronize page tables. These are introduced in a new
header file, include/linux/pgalloc.h, so they can be called from common
code.
They reuse existing infrastructure for vmalloc and ioremap.
Synchronization requirements are determined by ARCH_PAGE_TABLE_SYNC_MASK,
and the actual synchronization is performed by
arch_sync_kernel_mappings().
This change currently targets only x86_64, so only PGD and P4D level
helpers are introduced. Currently, these helpers are no-ops since no
architecture sets PGTBL_{PGD,P4D}_MODIFIED in ARCH_PAGE_TABLE_SYNC_MASK.
In theory, PUD and PMD level helpers can be added later if needed by other
architectures. For now, 32-bit architectures (x86-32 and arm) only handle
PGTBL_PMD_MODIFIED, so p*d_populate_kernel() will never affect them unless
we introduce a PMD level helper.
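As a purely hypothetical illustration of that pattern (this series does not
add it), a PUD-level helper would presumably mirror the PGD/P4D variants
from include/linux/pgalloc.h below; the name pud_populate_kernel() and its
use of PGTBL_PUD_MODIFIED are assumptions here, not part of the patch:

	/* Hypothetical sketch only -- not added by this series. */
	static inline void pud_populate_kernel(unsigned long addr, pud_t *pud,
					       pmd_t *pmd)
	{
		pud_populate(&init_mm, pud, pmd);
		if (ARCH_PAGE_TABLE_SYNC_MASK & PGTBL_PUD_MODIFIED)
			arch_sync_kernel_mappings(addr, addr);
	}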
Link: https://lkml.kernel.org/r/20250818020206.4517-3-harry.yoo@oracle.com
Fixes: 8d400913c231 ("x86/vmemmap: handle unpopulated sub-pmd ranges")
Signed-off-by: Harry Yoo <harry.yoo(a)oracle.com>
Suggested-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Acked-by: Kiryl Shutsemau <kas(a)kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt(a)kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: Andrey Konovalov <andreyknvl(a)gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a(a)gmail.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar(a)linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
Cc: Ard Biesheuvel <ardb(a)kernel.org>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: bibo mao <maobibo(a)loongson.cn>
Cc: Borislav Betkov <bp(a)alien8.de>
Cc: Christoph Lameter (Ampere) <cl(a)gentwo.org>
Cc: Dennis Zhou <dennis(a)kernel.org>
Cc: Dev Jain <dev.jain(a)arm.com>
Cc: Dmitriy Vyukov <dvyukov(a)google.com>
Cc: Gwan-gyeong Mun <gwan-gyeong.mun(a)intel.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Jane Chu <jane.chu(a)oracle.com>
Cc: Joao Martins <joao.m.martins(a)oracle.com>
Cc: Joerg Roedel <joro(a)8bytes.org>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Kevin Brodsky <kevin.brodsky(a)arm.com>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Qi Zheng <zhengqi.arch(a)bytedance.com>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: Thomas Gleinxer <tglx(a)linutronix.de>
Cc: Thomas Huth <thuth(a)redhat.com>
Cc: "Uladzislau Rezki (Sony)" <urezki(a)gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino(a)arm.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/pgalloc.h | 24 ++++++++++++++++++++++++
include/linux/pgtable.h | 13 +++++++------
mm/kasan/init.c | 12 ++++++------
mm/percpu.c | 6 +++---
mm/sparse-vmemmap.c | 6 +++---
5 files changed, 43 insertions(+), 18 deletions(-)
diff --git a/include/linux/pgalloc.h a/include/linux/pgalloc.h
new file mode 100644
--- /dev/null
+++ a/include/linux/pgalloc.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_PGALLOC_H
+#define _LINUX_PGALLOC_H
+
+#include <linux/pgtable.h>
+#include <asm/pgalloc.h>
+
+static inline void pgd_populate_kernel(unsigned long addr, pgd_t *pgd,
+ p4d_t *p4d)
+{
+ pgd_populate(&init_mm, pgd, p4d);
+ if (ARCH_PAGE_TABLE_SYNC_MASK & PGTBL_PGD_MODIFIED)
+ arch_sync_kernel_mappings(addr, addr);
+}
+
+static inline void p4d_populate_kernel(unsigned long addr, p4d_t *p4d,
+ pud_t *pud)
+{
+ p4d_populate(&init_mm, p4d, pud);
+ if (ARCH_PAGE_TABLE_SYNC_MASK & PGTBL_P4D_MODIFIED)
+ arch_sync_kernel_mappings(addr, addr);
+}
+
+#endif /* _LINUX_PGALLOC_H */
--- a/include/linux/pgtable.h~mm-introduce-and-use-pgdp4d_populate_kernel
+++ a/include/linux/pgtable.h
@@ -1469,8 +1469,8 @@ static inline void modify_prot_commit_pt
/*
* Architectures can set this mask to a combination of PGTBL_P?D_MODIFIED values
- * and let generic vmalloc and ioremap code know when arch_sync_kernel_mappings()
- * needs to be called.
+ * and let generic vmalloc, ioremap and page table update code know when
+ * arch_sync_kernel_mappings() needs to be called.
*/
#ifndef ARCH_PAGE_TABLE_SYNC_MASK
#define ARCH_PAGE_TABLE_SYNC_MASK 0
@@ -1954,10 +1954,11 @@ static inline bool arch_has_pfn_modify_c
/*
* Page Table Modification bits for pgtbl_mod_mask.
*
- * These are used by the p?d_alloc_track*() set of functions an in the generic
- * vmalloc/ioremap code to track at which page-table levels entries have been
- * modified. Based on that the code can better decide when vmalloc and ioremap
- * mapping changes need to be synchronized to other page-tables in the system.
+ * These are used by the p?d_alloc_track*() and p*d_populate_kernel()
+ * functions in the generic vmalloc, ioremap and page table update code
+ * to track at which page-table levels entries have been modified.
+ * Based on that the code can better decide when page table changes need
+ * to be synchronized to other page-tables in the system.
*/
#define __PGTBL_PGD_MODIFIED 0
#define __PGTBL_P4D_MODIFIED 1
--- a/mm/kasan/init.c~mm-introduce-and-use-pgdp4d_populate_kernel
+++ a/mm/kasan/init.c
@@ -13,9 +13,9 @@
#include <linux/mm.h>
#include <linux/pfn.h>
#include <linux/slab.h>
+#include <linux/pgalloc.h>
#include <asm/page.h>
-#include <asm/pgalloc.h>
#include "kasan.h"
@@ -191,7 +191,7 @@ static int __ref zero_p4d_populate(pgd_t
pud_t *pud;
pmd_t *pmd;
- p4d_populate(&init_mm, p4d,
+ p4d_populate_kernel(addr, p4d,
lm_alias(kasan_early_shadow_pud));
pud = pud_offset(p4d, addr);
pud_populate(&init_mm, pud,
@@ -212,7 +212,7 @@ static int __ref zero_p4d_populate(pgd_t
} else {
p = early_alloc(PAGE_SIZE, NUMA_NO_NODE);
pud_init(p);
- p4d_populate(&init_mm, p4d, p);
+ p4d_populate_kernel(addr, p4d, p);
}
}
zero_pud_populate(p4d, addr, next);
@@ -251,10 +251,10 @@ int __ref kasan_populate_early_shadow(co
* puds,pmds, so pgd_populate(), pud_populate()
* is noops.
*/
- pgd_populate(&init_mm, pgd,
+ pgd_populate_kernel(addr, pgd,
lm_alias(kasan_early_shadow_p4d));
p4d = p4d_offset(pgd, addr);
- p4d_populate(&init_mm, p4d,
+ p4d_populate_kernel(addr, p4d,
lm_alias(kasan_early_shadow_pud));
pud = pud_offset(p4d, addr);
pud_populate(&init_mm, pud,
@@ -273,7 +273,7 @@ int __ref kasan_populate_early_shadow(co
if (!p)
return -ENOMEM;
} else {
- pgd_populate(&init_mm, pgd,
+ pgd_populate_kernel(addr, pgd,
early_alloc(PAGE_SIZE, NUMA_NO_NODE));
}
}
--- a/mm/percpu.c~mm-introduce-and-use-pgdp4d_populate_kernel
+++ a/mm/percpu.c
@@ -3108,7 +3108,7 @@ out_free:
#endif /* BUILD_EMBED_FIRST_CHUNK */
#ifdef BUILD_PAGE_FIRST_CHUNK
-#include <asm/pgalloc.h>
+#include <linux/pgalloc.h>
#ifndef P4D_TABLE_SIZE
#define P4D_TABLE_SIZE PAGE_SIZE
@@ -3134,13 +3134,13 @@ void __init __weak pcpu_populate_pte(uns
if (pgd_none(*pgd)) {
p4d = memblock_alloc_or_panic(P4D_TABLE_SIZE, P4D_TABLE_SIZE);
- pgd_populate(&init_mm, pgd, p4d);
+ pgd_populate_kernel(addr, pgd, p4d);
}
p4d = p4d_offset(pgd, addr);
if (p4d_none(*p4d)) {
pud = memblock_alloc_or_panic(PUD_TABLE_SIZE, PUD_TABLE_SIZE);
- p4d_populate(&init_mm, p4d, pud);
+ p4d_populate_kernel(addr, p4d, pud);
}
pud = pud_offset(p4d, addr);
--- a/mm/sparse-vmemmap.c~mm-introduce-and-use-pgdp4d_populate_kernel
+++ a/mm/sparse-vmemmap.c
@@ -27,9 +27,9 @@
#include <linux/spinlock.h>
#include <linux/vmalloc.h>
#include <linux/sched.h>
+#include <linux/pgalloc.h>
#include <asm/dma.h>
-#include <asm/pgalloc.h>
#include <asm/tlbflush.h>
#include "hugetlb_vmemmap.h"
@@ -229,7 +229,7 @@ p4d_t * __meminit vmemmap_p4d_populate(p
if (!p)
return NULL;
pud_init(p);
- p4d_populate(&init_mm, p4d, p);
+ p4d_populate_kernel(addr, p4d, p);
}
return p4d;
}
@@ -241,7 +241,7 @@ pgd_t * __meminit vmemmap_pgd_populate(u
void *p = vmemmap_alloc_block_zero(PAGE_SIZE, node);
if (!p)
return NULL;
- pgd_populate(&init_mm, pgd, p);
+ pgd_populate_kernel(addr, pgd, p);
}
return pgd;
}
_
Patches currently in -mm which might be from harry.yoo(a)oracle.com are
mm-move-page-table-sync-declarations-to-linux-pgtableh.patch
mm-introduce-and-use-pgdp4d_populate_kernel.patch
x86-mm-64-define-arch_page_table_sync_mask-and-arch_sync_kernel_mappings.patch
The patch titled
Subject: mm: move page table sync declarations to linux/pgtable.h
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-move-page-table-sync-declarations-to-linux-pgtableh.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Harry Yoo <harry.yoo(a)oracle.com>
Subject: mm: move page table sync declarations to linux/pgtable.h
Date: Mon, 18 Aug 2025 11:02:04 +0900
During our internal testing, we started observing intermittent boot
failures when the machine uses 4-level paging and has a large amount of
persistent memory:
BUG: unable to handle page fault for address: ffffe70000000034
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: 0002 [#1] SMP NOPTI
RIP: 0010:__init_single_page+0x9/0x6d
Call Trace:
<TASK>
__init_zone_device_page+0x17/0x5d
memmap_init_zone_device+0x154/0x1bb
pagemap_range+0x2e0/0x40f
memremap_pages+0x10b/0x2f0
devm_memremap_pages+0x1e/0x60
dev_dax_probe+0xce/0x2ec [device_dax]
dax_bus_probe+0x6d/0xc9
[... snip ...]
</TASK>
It turns out that the kernel panics while initializing vmemmap (struct
page array) when the vmemmap region spans two PGD entries, because the new
PGD entry is only installed in init_mm.pgd, but not in the page tables of
other tasks.
And looking at __populate_section_memmap():
	if (vmemmap_can_optimize(altmap, pgmap))
		// does not sync top level page tables
		r = vmemmap_populate_compound_pages(pfn, start, end, nid, pgmap);
	else
		// sync top level page tables in x86
		r = vmemmap_populate(start, end, nid, altmap);
In the normal path, vmemmap_populate() in arch/x86/mm/init_64.c
synchronizes the top level page table (See commit 9b861528a801 ("x86-64,
mem: Update all PGDs for direct mapping and vmemmap mapping changes")) so
that all tasks in the system can see the new vmemmap area.
However, when vmemmap_can_optimize() returns true, the optimized path
skips synchronization of top-level page tables. This is because
vmemmap_populate_compound_pages() is implemented in core MM code, which
does not handle synchronization of the top-level page tables. Instead,
the core MM has historically relied on each architecture to perform this
synchronization manually.
We're not the first party to encounter a crash caused by not-sync'd top
level page tables: earlier this year, Gwan-gyeong Mun attempted to address
the issue [1] [2] after hitting a kernel panic when x86 code accessed the
vmemmap area before the corresponding top-level entries were synced. At
that time, the issue was believed to be triggered only when struct page
was enlarged for debugging purposes, and the patch did not get further
updates.
It turns out that the current approach of relying on each arch to handle the
page table sync manually is fragile because 1) it's easy to forget to sync
the top level page table, and 2) it's also easy to overlook that the
kernel should not access the vmemmap and direct mapping areas before the
sync.
# The solution: Make page table sync code more robust and harder to miss
To address this, Dave Hansen suggested [3] [4] introducing
{pgd,p4d}_populate_kernel() for updating kernel portion of the page tables
and allow each architecture to explicitly perform synchronization when
installing top-level entries. With this approach, we no longer need to
worry about missing the sync step, reducing the risk of future
regressions.
The new interface reuses existing ARCH_PAGE_TABLE_SYNC_MASK,
PGTBL_P*D_MODIFIED and arch_sync_kernel_mappings() facility used by
vmalloc and ioremap to synchronize page tables.
pgd_populate_kernel() looks like this:
static inline void pgd_populate_kernel(unsigned long addr, pgd_t *pgd,
				       p4d_t *p4d)
{
	pgd_populate(&init_mm, pgd, p4d);
	if (ARCH_PAGE_TABLE_SYNC_MASK & PGTBL_PGD_MODIFIED)
		arch_sync_kernel_mappings(addr, addr);
}
It is worth noting that vmalloc() and apply_to_range() carefully
synchronize page tables by calling p*d_alloc_track() and
arch_sync_kernel_mappings(), and thus they are not affected by this patch
series.
This series was hugely inspired by Dave Hansen's suggestion, hence the
added Suggested-by: Dave Hansen.
Cc stable because the lack of this series opens the door to intermittent
boot failures.
This patch (of 3):
Move ARCH_PAGE_TABLE_SYNC_MASK and arch_sync_kernel_mappings() to
linux/pgtable.h so that they can be used outside of vmalloc and ioremap.
Link: https://lkml.kernel.org/r/20250818020206.4517-1-harry.yoo@oracle.com
Link: https://lkml.kernel.org/r/20250818020206.4517-2-harry.yoo@oracle.com
Link: https://lore.kernel.org/linux-mm/20250220064105.808339-1-gwan-gyeong.mun@in… [1]
Link: https://lore.kernel.org/linux-mm/20250311114420.240341-1-gwan-gyeong.mun@in… [2]
Link: https://lore.kernel.org/linux-mm/d1da214c-53d3-45ac-a8b6-51821c5416e4@intel… [3]
Link: https://lore.kernel.org/linux-mm/4d800744-7b88-41aa-9979-b245e8bf794b@intel… [4]
Fixes: 8d400913c231 ("x86/vmemmap: handle unpopulated sub-pmd ranges")
Signed-off-by: Harry Yoo <harry.yoo(a)oracle.com>
Acked-by: Kiryl Shutsemau <kas(a)kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt(a)kernel.org>
Reviewed-by: "Uladzislau Rezki (Sony)" <urezki(a)gmail.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: Andrey Konovalov <andreyknvl(a)gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a(a)gmail.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar(a)linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
Cc: Ard Biesheuvel <ardb(a)kernel.org>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: bibo mao <maobibo(a)loongson.cn>
Cc: Borislav Betkov <bp(a)alien8.de>
Cc: Christoph Lameter (Ampere) <cl(a)gentwo.org>
Cc: Dennis Zhou <dennis(a)kernel.org>
Cc: Dev Jain <dev.jain(a)arm.com>
Cc: Dmitriy Vyukov <dvyukov(a)google.com>
Cc: Gwan-gyeong Mun <gwan-gyeong.mun(a)intel.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Jane Chu <jane.chu(a)oracle.com>
Cc: Joao Martins <joao.m.martins(a)oracle.com>
Cc: Joerg Roedel <joro(a)8bytes.org>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Kevin Brodsky <kevin.brodsky(a)arm.com>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Qi Zheng <zhengqi.arch(a)bytedance.com>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: Thomas Gleinxer <tglx(a)linutronix.de>
Cc: Thomas Huth <thuth(a)redhat.com>
Cc: Vincenzo Frascino <vincenzo.frascino(a)arm.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/pgtable.h | 16 ++++++++++++++++
include/linux/vmalloc.h | 16 ----------------
2 files changed, 16 insertions(+), 16 deletions(-)
--- a/include/linux/pgtable.h~mm-move-page-table-sync-declarations-to-linux-pgtableh
+++ a/include/linux/pgtable.h
@@ -1467,6 +1467,22 @@ static inline void modify_prot_commit_pt
}
#endif
+/*
+ * Architectures can set this mask to a combination of PGTBL_P?D_MODIFIED values
+ * and let generic vmalloc and ioremap code know when arch_sync_kernel_mappings()
+ * needs to be called.
+ */
+#ifndef ARCH_PAGE_TABLE_SYNC_MASK
+#define ARCH_PAGE_TABLE_SYNC_MASK 0
+#endif
+
+/*
+ * There is no default implementation for arch_sync_kernel_mappings(). It is
+ * relied upon the compiler to optimize calls out if ARCH_PAGE_TABLE_SYNC_MASK
+ * is 0.
+ */
+void arch_sync_kernel_mappings(unsigned long start, unsigned long end);
+
#endif /* CONFIG_MMU */
/*
--- a/include/linux/vmalloc.h~mm-move-page-table-sync-declarations-to-linux-pgtableh
+++ a/include/linux/vmalloc.h
@@ -220,22 +220,6 @@ int vmap_pages_range(unsigned long addr,
struct page **pages, unsigned int page_shift);
/*
- * Architectures can set this mask to a combination of PGTBL_P?D_MODIFIED values
- * and let generic vmalloc and ioremap code know when arch_sync_kernel_mappings()
- * needs to be called.
- */
-#ifndef ARCH_PAGE_TABLE_SYNC_MASK
-#define ARCH_PAGE_TABLE_SYNC_MASK 0
-#endif
-
-/*
- * There is no default implementation for arch_sync_kernel_mappings(). It is
- * relied upon the compiler to optimize calls out if ARCH_PAGE_TABLE_SYNC_MASK
- * is 0.
- */
-void arch_sync_kernel_mappings(unsigned long start, unsigned long end);
-
-/*
* Lowlevel-APIs (not for driver use!)
*/
_
Patches currently in -mm which might be from harry.yoo(a)oracle.com are
mm-move-page-table-sync-declarations-to-linux-pgtableh.patch
mm-introduce-and-use-pgdp4d_populate_kernel.patch
x86-mm-64-define-arch_page_table_sync_mask-and-arch_sync_kernel_mappings.patch