The patch titled
Subject: mm/userfaultfd: fix memory corruption due to writeprotect
has been removed from the -mm tree. Its filename was
mm-userfaultfd-fix-memory-corruption-due-to-writeprotect.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Nadav Amit <namit(a)vmware.com>
Subject: mm/userfaultfd: fix memory corruption due to writeprotect
Userfaultfd self-test fails occasionally, indicating a memory corruption.
Analyzing this problem indicates that there is a real bug: mwriteprotect_range()
takes mmap_lock only for read and defers TLB flushes, and wp_page_copy() does
not sufficiently consider such concurrent deferred flushes. Although the PTE is
flushed from the TLBs in wp_page_copy(), this flush takes place after the copy
has already been performed, so the page can still be modified between the time
of the copy and the time the PTE is flushed.
To make matters worse, memory-unprotection using userfaultfd also poses a
problem. Although memory unprotection is logically a promotion of PTE
permissions, and therefore should not require a TLB flush, the current
userfaultfd code might actually cause a demotion of the architectural PTE
permission: when userfaultfd_writeprotect() unprotects a memory region, it
unintentionally *clears* the RW-bit if it was already set. Note that
unprotecting a PTE that is not write-protected is a valid use-case: the
userfaultfd monitor might ask to unprotect a region that holds both
write-protected and write-unprotected PTEs.
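For context, a write-unprotect from the monitor's side is a UFFDIO_WRITEPROTECT
ioctl with the WP mode bit cleared. A minimal userspace sketch (uffd setup is
omitted; the wrapper name and the addr/len values are hypothetical):
	/* Minimal sketch of a userfaultfd write-unprotect request.  Setup of
	 * the uffd file descriptor (userfaultfd(2), UFFDIO_API, UFFDIO_REGISTER
	 * with UFFDIO_REGISTER_MODE_WP) is omitted. */
	#include <linux/userfaultfd.h>
	#include <stddef.h>
	#include <sys/ioctl.h>

	static int uffd_write_unprotect(int uffd, void *addr, size_t len)
	{
		struct uffdio_writeprotect wp = {
			.range = { .start = (unsigned long)addr, .len = len },
			/* mode without UFFDIO_WRITEPROTECT_MODE_WP means
			 * write-unprotect; the range may legitimately contain
			 * PTEs that were never write-protected, which is the
			 * valid use-case described above. */
			.mode = 0,
		};

		return ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);
	}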
The scenario that happens in selftests/vm/userfaultfd is as follows:
cpu0                            cpu1                    cpu2
----                            ----                    ----
                                                        [ Writable PTE
                                                          cached in TLB ]
userfaultfd_writeprotect()
[ write-*unprotect* ]
mwriteprotect_range()
mmap_read_lock()
change_protection()
change_protection_range()
...
change_pte_range()
[ *clear* “write”-bit ]
[ defer TLB flushes ]
                                [ page-fault ]
                                ...
                                wp_page_copy()
                                 cow_user_page()
                                  [ copy page ]
                                                        [ write to old
                                                          page ]
                                ...
                                 set_pte_at_notify()
A similar scenario can happen:
cpu0                        cpu1                        cpu2                    cpu3
----                        ----                        ----                    ----
                                                                                [ Writable PTE
                                                                                  cached in TLB ]
userfaultfd_writeprotect()
[ write-protect ]
[ deferred TLB flush ]
                            userfaultfd_writeprotect()
                            [ write-unprotect ]
                            [ deferred TLB flush ]
                                                        [ page-fault ]
                                                        wp_page_copy()
                                                         cow_user_page()
                                                          [ copy page ]
                                                        ...                     [ write to page ]
                                                         set_pte_at_notify()
This race exists since commit 292924b26024 ("userfaultfd: wp: apply
_PAGE_UFFD_WP bit"). Yet, as Yu Zhao pointed out, these races became apparent
since commit 09854ba94c6a ("mm: do_wp_page() simplification") which made
wp_page_copy() more likely to take place, specifically if page_count(page)
> 1.
To resolve the aforementioned races, check whether there are pending
flushes on uffd-write-protected VMAs, and if there are, perform a flush
before doing the COW.
Further optimizations will follow to avoid unnecessary PTE write-protection
and TLB flushes during uffd-write-unprotect.
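In code terms, the check the patch adds (see the mm/memory.c hunk below)
boils down to the following at the top of do_wp_page(); it is reproduced here
only for readability:
	/*
	 * Userfaultfd write-protect can defer flushes: if a batched TLB
	 * flush is still pending for this mm, flush the single page
	 * before cow_user_page() copies it.
	 */
	if (unlikely(userfaultfd_wp(vmf->vma) &&
		     mm_tlb_flush_pending(vmf->vma->vm_mm)))
		flush_tlb_page(vmf->vma, vmf->address);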
Link: https://lkml.kernel.org/r/20210304095423.3825684-1-namit@vmware.com
Fixes: 09854ba94c6a ("mm: do_wp_page() simplification")
Signed-off-by: Nadav Amit <namit(a)vmware.com>
Suggested-by: Yu Zhao <yuzhao(a)google.com>
Reviewed-by: Peter Xu <peterx(a)redhat.com>
Tested-by: Peter Xu <peterx(a)redhat.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Pavel Emelyanov <xemul(a)openvz.org>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Mike Rapoport <rppt(a)linux.vnet.ibm.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Will Deacon <will(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: <stable(a)vger.kernel.org> [5.9+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory.c | 8 ++++++++
1 file changed, 8 insertions(+)
--- a/mm/memory.c~mm-userfaultfd-fix-memory-corruption-due-to-writeprotect
+++ a/mm/memory.c
@@ -3097,6 +3097,14 @@ static vm_fault_t do_wp_page(struct vm_f
return handle_userfault(vmf, VM_UFFD_WP);
}
+ /*
+ * Userfaultfd write-protect can defer flushes. Ensure the TLB
+ * is flushed in this case before copying.
+ */
+ if (unlikely(userfaultfd_wp(vmf->vma) &&
+ mm_tlb_flush_pending(vmf->vma->vm_mm)))
+ flush_tlb_page(vmf->vma, vmf->address);
+
vmf->page = vm_normal_page(vma, vmf->address, vmf->orig_pte);
if (!vmf->page) {
/*
_
Patches currently in -mm which might be from namit(a)vmware.com are
The patch titled
Subject: kasan: fix KASAN_STACK dependency for HW_TAGS
has been removed from the -mm tree. Its filename was
kasan-fix-kasan_stack-dependency-for-hw_tags.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Andrey Konovalov <andreyknvl(a)google.com>
Subject: kasan: fix KASAN_STACK dependency for HW_TAGS
There's a runtime failure when running HW_TAGS-enabled kernel built with
GCC on hardware that doesn't support MTE. GCC-built kernels always have
CONFIG_KASAN_STACK enabled, even though stack instrumentation isn't
supported by HW_TAGS. Having that config enabled causes KASAN to issue
MTE-only instructions to unpoison kernel stacks, which causes the failure.
Fix the issue by disallowing CONFIG_KASAN_STACK when HW_TAGS is used.
(The commit that introduced CONFIG_KASAN_HW_TAGS specified proper
dependency for CONFIG_KASAN_STACK_ENABLE but not for CONFIG_KASAN_STACK.)
Link: https://lkml.kernel.org/r/59e75426241dbb5611277758c8d4d6f5f9298dac.16152154…
Fixes: 6a63a63ff1ac ("kasan: introduce CONFIG_KASAN_HW_TAGS")
Signed-off-by: Andrey Konovalov <andreyknvl(a)google.com>
Reported-by: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: Vincenzo Frascino <vincenzo.frascino(a)arm.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Marco Elver <elver(a)google.com>
Cc: Peter Collingbourne <pcc(a)google.com>
Cc: Evgenii Stepanov <eugenis(a)google.com>
Cc: Branislav Rankov <Branislav.Rankov(a)arm.com>
Cc: Kevin Brodsky <kevin.brodsky(a)arm.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/Kconfig.kasan | 1 +
1 file changed, 1 insertion(+)
--- a/lib/Kconfig.kasan~kasan-fix-kasan_stack-dependency-for-hw_tags
+++ a/lib/Kconfig.kasan
@@ -156,6 +156,7 @@ config KASAN_STACK_ENABLE
config KASAN_STACK
int
+ depends on KASAN_GENERIC || KASAN_SW_TAGS
default 1 if KASAN_STACK_ENABLE || CC_IS_GCC
default 0
_
Patches currently in -mm which might be from andreyknvl(a)google.com are
kasan-fix-per-page-tags-for-non-page_alloc-pages.patch
kasan-initialize-shadow-to-tag_invalid-for-sw_tags.patch
mm-kasan-dont-poison-boot-memory-with-tag-based-modes.patch
arm64-kasan-allow-to-init-memory-when-setting-tags.patch
kasan-init-memory-in-kasan_unpoison-for-hw_tags.patch
kasan-mm-integrate-page_alloc-init-with-hw_tags.patch
kasan-mm-integrate-slab-init_on_alloc-with-hw_tags.patch
kasan-mm-integrate-slab-init_on_free-with-hw_tags.patch
kasan-docs-clean-up-sections.patch
kasan-docs-update-overview-section.patch
kasan-docs-update-usage-section.patch
kasan-docs-update-error-reports-section.patch
kasan-docs-update-boot-parameters-section.patch
kasan-docs-update-generic-implementation-details-section.patch
kasan-docs-update-sw_tags-implementation-details-section.patch
kasan-docs-update-hw_tags-implementation-details-section.patch
kasan-docs-update-shadow-memory-section.patch
kasan-docs-update-ignoring-accesses-section.patch
kasan-docs-update-tests-section.patch
The patch titled
Subject: kasan, mm: fix crash with HW_TAGS and DEBUG_PAGEALLOC
has been removed from the -mm tree. Its filename was
kasan-mm-fix-crash-with-hw_tags-and-debug_pagealloc.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Andrey Konovalov <andreyknvl(a)google.com>
Subject: kasan, mm: fix crash with HW_TAGS and DEBUG_PAGEALLOC
Currently, kasan_free_nondeferred_pages()->kasan_free_pages() is called
after debug_pagealloc_unmap_pages(). This causes a crash when
debug_pagealloc is enabled, as HW_TAGS KASAN can't set tags on an
unmapped page.
This patch puts kasan_free_nondeferred_pages() before
debug_pagealloc_unmap_pages() and arch_free_page(), which can also make
the page unavailable.
Link: https://lkml.kernel.org/r/24cd7db274090f0e5bc3adcdc7399243668e3171.16149873…
Fixes: 94ab5b61ee16 ("kasan, arm64: enable CONFIG_KASAN_HW_TAGS")
Signed-off-by: Andrey Konovalov <andreyknvl(a)google.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: Vincenzo Frascino <vincenzo.frascino(a)arm.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Marco Elver <elver(a)google.com>
Cc: Peter Collingbourne <pcc(a)google.com>
Cc: Evgenii Stepanov <eugenis(a)google.com>
Cc: Branislav Rankov <Branislav.Rankov(a)arm.com>
Cc: Kevin Brodsky <kevin.brodsky(a)arm.com>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
--- a/mm/page_alloc.c~kasan-mm-fix-crash-with-hw_tags-and-debug_pagealloc
+++ a/mm/page_alloc.c
@@ -1282,6 +1282,12 @@ static __always_inline bool free_pages_p
kernel_poison_pages(page, 1 << order);
/*
+ * With hardware tag-based KASAN, memory tags must be set before the
+ * page becomes unavailable via debug_pagealloc or arch_free_page.
+ */
+ kasan_free_nondeferred_pages(page, order);
+
+ /*
* arch_free_page() can make the page's contents inaccessible. s390
* does this. So nothing which can access the page's contents should
* happen after this.
@@ -1290,8 +1296,6 @@ static __always_inline bool free_pages_p
debug_pagealloc_unmap_pages(page, 1 << order);
- kasan_free_nondeferred_pages(page, order);
-
return true;
}
_
Patches currently in -mm which might be from andreyknvl(a)google.com are
kasan-fix-per-page-tags-for-non-page_alloc-pages.patch
kasan-initialize-shadow-to-tag_invalid-for-sw_tags.patch
mm-kasan-dont-poison-boot-memory-with-tag-based-modes.patch
arm64-kasan-allow-to-init-memory-when-setting-tags.patch
kasan-init-memory-in-kasan_unpoison-for-hw_tags.patch
kasan-mm-integrate-page_alloc-init-with-hw_tags.patch
kasan-mm-integrate-slab-init_on_alloc-with-hw_tags.patch
kasan-mm-integrate-slab-init_on_free-with-hw_tags.patch
kasan-docs-clean-up-sections.patch
kasan-docs-update-overview-section.patch
kasan-docs-update-usage-section.patch
kasan-docs-update-error-reports-section.patch
kasan-docs-update-boot-parameters-section.patch
kasan-docs-update-generic-implementation-details-section.patch
kasan-docs-update-sw_tags-implementation-details-section.patch
kasan-docs-update-hw_tags-implementation-details-section.patch
kasan-docs-update-shadow-memory-section.patch
kasan-docs-update-ignoring-accesses-section.patch
kasan-docs-update-tests-section.patch
The patch titled
Subject: mm/madvise: replace ptrace attach requirement for process_madvise
has been removed from the -mm tree. Its filename was
mm-madvise-replace-ptrace-attach-requirement-for-process_madvise.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Suren Baghdasaryan <surenb(a)google.com>
Subject: mm/madvise: replace ptrace attach requirement for process_madvise
process_madvise currently requires ptrace attach capability.
PTRACE_MODE_ATTACH gives one process complete control over another
process. It effectively removes the security boundary between the two
processes (in one direction). Granting ptrace attach capability even to a
system process is considered dangerous since it creates an attack surface.
This severely limits the usage of this API.
The operations process_madvise can perform do not affect the correctness
of the operation of the target process; they only affect where the data is
physically located (and therefore, how fast it can be accessed). What we
want is the ability for one process to influence another process in order
to optimize performance across the entire system while leaving the
security boundary intact.
Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ and
CAP_SYS_NICE: PTRACE_MODE_READ prevents leaking ASLR metadata, and
CAP_SYS_NICE covers influencing process performance.
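As an illustration, a hedged userspace sketch of how a resource-management
daemon might use this (the helper name, pid, addr and len are hypothetical;
__NR_pidfd_open, __NR_process_madvise and MADV_PAGEOUT must be provided by
recent kernel/libc headers):
	#define _GNU_SOURCE
	#include <sys/mman.h>
	#include <sys/syscall.h>
	#include <sys/types.h>
	#include <sys/uio.h>
	#include <unistd.h>

	static long try_pageout(pid_t pid, void *addr, size_t len)
	{
		struct iovec iov = { .iov_base = addr, .iov_len = len };
		int pidfd = syscall(__NR_pidfd_open, pid, 0);
		long ret;

		if (pidfd < 0)
			return -1;

		/* Succeeds only if the caller passes the PTRACE_MODE_READ
		 * check and, with this patch, also has CAP_SYS_NICE. */
		ret = syscall(__NR_process_madvise, pidfd, &iov, 1,
			      MADV_PAGEOUT, 0);
		close(pidfd);
		return ret;
	}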
Link: https://lkml.kernel.org/r/20210303185807.2160264-1-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Reviewed-by: Kees Cook <keescook(a)chromium.org>
Acked-by: Minchan Kim <minchan(a)kernel.org>
Acked-by: David Rientjes <rientjes(a)google.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Jeff Vander Stoep <jeffv(a)google.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Tim Murray <timmurray(a)google.com>
Cc: Florian Weimer <fweimer(a)redhat.com>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: James Morris <jmorris(a)namei.org>
Cc: <stable(a)vger.kernel.org> [5.10+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/madvise.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
--- a/mm/madvise.c~mm-madvise-replace-ptrace-attach-requirement-for-process_madvise
+++ a/mm/madvise.c
@@ -1198,12 +1198,22 @@ SYSCALL_DEFINE5(process_madvise, int, pi
goto release_task;
}
- mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
+ /* Require PTRACE_MODE_READ to avoid leaking ASLR metadata. */
+ mm = mm_access(task, PTRACE_MODE_READ_FSCREDS);
if (IS_ERR_OR_NULL(mm)) {
ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
goto release_task;
}
+ /*
+ * Require CAP_SYS_NICE for influencing process performance. Note that
+ * only non-destructive hints are currently supported.
+ */
+ if (!capable(CAP_SYS_NICE)) {
+ ret = -EPERM;
+ goto release_mm;
+ }
+
total_len = iov_iter_count(&iter);
while (iov_iter_count(&iter)) {
@@ -1218,6 +1228,7 @@ SYSCALL_DEFINE5(process_madvise, int, pi
if (ret == 0)
ret = total_len - iov_iter_count(&iter);
+release_mm:
mmput(mm);
release_task:
put_task_struct(task);
_
Patches currently in -mm which might be from surenb(a)google.com are
The patch titled
Subject: linux/compiler-clang.h: define HAVE_BUILTIN_BSWAP*
has been removed from the -mm tree. Its filename was
linux-compiler-clangh-define-have_builtin_bswap.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Arnd Bergmann <arnd(a)arndb.de>
Subject: linux/compiler-clang.h: define HAVE_BUILTIN_BSWAP*
Separating compiler-clang.h from compiler-gcc.h inadvertently dropped the
definitions of the three HAVE_BUILTIN_BSWAP macros, which requires falling
back to the open-coded version and hoping that the compiler detects it.
Since all versions of clang support the __builtin_bswap interfaces, add
back the flags and have the headers pick these up automatically.
This results in a 4% improvement of compilation speed for arm defconfig.
Note: it might also be worth revisiting which architectures set
CONFIG_ARCH_USE_BUILTIN_BSWAP for one compiler or the other, today this is
set on six architectures (arm32, csky, mips, powerpc, s390, x86), while
another ten architectures define custom helpers (alpha, arc, ia64, m68k,
mips, nios2, parisc, sh, sparc, xtensa), and the rest (arm64, h8300,
hexagon, microblaze, nds32, openrisc, riscv) just get the unoptimized
version and rely on the compiler to detect it.
A long time ago, the compiler builtins were architecture specific, but
nowadays, all compilers that are able to build the kernel have correct
implementations of them, though some may not be as optimized as the inline
asm versions.
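To illustrate the difference the builtin makes, a small self-contained sketch
(not kernel code) of the builtin versus the open-coded fallback that the
compiler would otherwise have to pattern-match:
	#include <stdint.h>
	#include <stdio.h>

	static uint32_t swab32_builtin(uint32_t x)
	{
		return __builtin_bswap32(x);	/* typically one bswap/rev insn */
	}

	static uint32_t swab32_open_coded(uint32_t x)
	{
		/* the fallback used when __HAVE_BUILTIN_BSWAP32__ is missing */
		return ((x & 0x000000ffU) << 24) |
		       ((x & 0x0000ff00U) <<  8) |
		       ((x & 0x00ff0000U) >>  8) |
		       ((x & 0xff000000U) >> 24);
	}

	int main(void)
	{
		printf("%08x %08x\n", swab32_builtin(0x12345678),
		       swab32_open_coded(0x12345678));
		return 0;
	}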
The patch that dropped the optimization landed in v4.19, so as discussed it
would be fairly safe to backport this revert to the 4.19/5.4/5.10 stable
kernels; there is a remaining risk of regressions, but no known side-effects
besides compile speed.
Link: https://lkml.kernel.org/r/20210226161151.2629097-1-arnd@kernel.org
Link: https://lore.kernel.org/lkml/20210225164513.3667778-1-arnd@kernel.org/
Fixes: 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually exclusive")
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
Reviewed-by: Nathan Chancellor <nathan(a)kernel.org>
Reviewed-by: Kees Cook <keescook(a)chromium.org>
Acked-by: Miguel Ojeda <ojeda(a)kernel.org>
Acked-by: Nick Desaulniers <ndesaulniers(a)google.com>
Acked-by: Luc Van Oostenryck <luc.vanoostenryck(a)gmail.com>
Cc: Masahiro Yamada <masahiroy(a)kernel.org>
Cc: Nick Hu <nickhu(a)andestech.com>
Cc: Greentime Hu <green.hu(a)gmail.com>
Cc: Vincent Chen <deanbo422(a)gmail.com>
Cc: Paul Walmsley <paul.walmsley(a)sifive.com>
Cc: Palmer Dabbelt <palmer(a)dabbelt.com>
Cc: Albert Ou <aou(a)eecs.berkeley.edu>
Cc: Guo Ren <guoren(a)kernel.org>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: Sami Tolvanen <samitolvanen(a)google.com>
Cc: Marco Elver <elver(a)google.com>
Cc: Arvind Sankar <nivedita(a)alum.mit.edu>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/compiler-clang.h | 6 ++++++
1 file changed, 6 insertions(+)
--- a/include/linux/compiler-clang.h~linux-compiler-clangh-define-have_builtin_bswap
+++ a/include/linux/compiler-clang.h
@@ -31,6 +31,12 @@
#define __no_sanitize_thread
#endif
+#if defined(CONFIG_ARCH_USE_BUILTIN_BSWAP)
+#define __HAVE_BUILTIN_BSWAP32__
+#define __HAVE_BUILTIN_BSWAP64__
+#define __HAVE_BUILTIN_BSWAP16__
+#endif /* CONFIG_ARCH_USE_BUILTIN_BSWAP */
+
#if __has_feature(undefined_behavior_sanitizer)
/* GCC does not have __SANITIZE_UNDEFINED__ */
#define __no_sanitize_undefined \
_
Patches currently in -mm which might be from arnd(a)arndb.de are
The patch titled
Subject: binfmt_misc: fix possible deadlock in bm_register_write
has been removed from the -mm tree. Its filename was
binfmt_misc-fix-possible-deadlock-in-bm_register_write.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Lior Ribak <liorribak(a)gmail.com>
Subject: binfmt_misc: fix possible deadlock in bm_register_write
There is a deadlock in bm_register_write:
First, at the beginning of the function, a lock is taken on the binfmt_misc
root inode with inode_lock(d_inode(root)).
Then, if the user used the MISC_FMT_OPEN_FILE flag, the function will call
open_exec on the user-provided interpreter.
open_exec will perform a path lookup, and if the lookup passes through the
root of binfmt_misc, it will try to take a shared lock on its inode again;
since the inode is already locked, the code gets stuck in a deadlock.
To reproduce the bug:
$ echo ":iiiii:E::ii::/proc/sys/fs/binfmt_misc/bla:F" > /proc/sys/fs/binfmt_misc/register
backtrace of where the lock occurs (#5):
0 schedule () at ./arch/x86/include/asm/current.h:15
1 0xffffffff81b51237 in rwsem_down_read_slowpath (sem=0xffff888003b202e0, count=<optimized out>, state=state@entry=2) at kernel/locking/rwsem.c:992
2 0xffffffff81b5150a in __down_read_common (state=2, sem=<optimized out>) at kernel/locking/rwsem.c:1213
3 __down_read (sem=<optimized out>) at kernel/locking/rwsem.c:1222
4 down_read (sem=<optimized out>) at kernel/locking/rwsem.c:1355
5 0xffffffff811ee22a in inode_lock_shared (inode=<optimized out>) at ./include/linux/fs.h:783
6 open_last_lookups (op=0xffffc9000022fe34, file=0xffff888004098600, nd=0xffffc9000022fd10) at fs/namei.c:3177
7 path_openat (nd=nd@entry=0xffffc9000022fd10, op=op@entry=0xffffc9000022fe34, flags=flags@entry=65) at fs/namei.c:3366
8 0xffffffff811efe1c in do_filp_open (dfd=<optimized out>, pathname=pathname@entry=0xffff8880031b9000, op=op@entry=0xffffc9000022fe34) at fs/namei.c:3396
9 0xffffffff811e493f in do_open_execat (fd=fd@entry=-100, name=name@entry=0xffff8880031b9000, flags=<optimized out>, flags@entry=0) at fs/exec.c:913
10 0xffffffff811e4a92 in open_exec (name=<optimized out>) at fs/exec.c:948
11 0xffffffff8124aa84 in bm_register_write (file=<optimized out>, buffer=<optimized out>, count=19, ppos=<optimized out>) at fs/binfmt_misc.c:682
12 0xffffffff811decd2 in vfs_write (file=file@entry=0xffff888004098500, buf=buf@entry=0xa758d0 ":iiiii:E::ii::i:CF
", count=count@entry=19, pos=pos@entry=0xffffc9000022ff10) at fs/read_write.c:603
13 0xffffffff811defda in ksys_write (fd=<optimized out>, buf=0xa758d0 ":iiiii:E::ii::i:CF
", count=19) at fs/read_write.c:658
14 0xffffffff81b49813 in do_syscall_64 (nr=<optimized out>, regs=0xffffc9000022ff58) at arch/x86/entry/common.c:46
15 0xffffffff81c0007c in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:120
To solve the issue, the open_exec() call is moved to before the root inode
lock is taken by bm_register_write().
Link: https://lkml.kernel.org/r/20210228224414.95962-1-liorribak@gmail.com
Fixes: 948b701a607f1 ("binfmt_misc: add persistent opened binary handler for containers")
Signed-off-by: Lior Ribak <liorribak(a)gmail.com>
Acked-by: Helge Deller <deller(a)gmx.de>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/binfmt_misc.c | 29 ++++++++++++++---------------
1 file changed, 14 insertions(+), 15 deletions(-)
--- a/fs/binfmt_misc.c~binfmt_misc-fix-possible-deadlock-in-bm_register_write
+++ a/fs/binfmt_misc.c
@@ -649,12 +649,24 @@ static ssize_t bm_register_write(struct
struct super_block *sb = file_inode(file)->i_sb;
struct dentry *root = sb->s_root, *dentry;
int err = 0;
+ struct file *f = NULL;
e = create_entry(buffer, count);
if (IS_ERR(e))
return PTR_ERR(e);
+ if (e->flags & MISC_FMT_OPEN_FILE) {
+ f = open_exec(e->interpreter);
+ if (IS_ERR(f)) {
+ pr_notice("register: failed to install interpreter file %s\n",
+ e->interpreter);
+ kfree(e);
+ return PTR_ERR(f);
+ }
+ e->interp_file = f;
+ }
+
inode_lock(d_inode(root));
dentry = lookup_one_len(e->name, root, strlen(e->name));
err = PTR_ERR(dentry);
@@ -678,21 +690,6 @@ static ssize_t bm_register_write(struct
goto out2;
}
- if (e->flags & MISC_FMT_OPEN_FILE) {
- struct file *f;
-
- f = open_exec(e->interpreter);
- if (IS_ERR(f)) {
- err = PTR_ERR(f);
- pr_notice("register: failed to install interpreter file %s\n", e->interpreter);
- simple_release_fs(&bm_mnt, &entry_count);
- iput(inode);
- inode = NULL;
- goto out2;
- }
- e->interp_file = f;
- }
-
e->dentry = dget(dentry);
inode->i_private = e;
inode->i_fop = &bm_entry_operations;
@@ -709,6 +706,8 @@ out:
inode_unlock(d_inode(root));
if (err) {
+ if (f)
+ filp_close(f, NULL);
kfree(e);
return err;
}
_
Patches currently in -mm which might be from liorribak(a)gmail.com are
The patch titled
Subject: mm/highmem.c: fix zero_user_segments() with start > end
has been removed from the -mm tree. Its filename was
fix-zero_user_segments-with-start-end.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: OGAWA Hirofumi <hirofumi(a)mail.parknet.co.jp>
Subject: mm/highmem.c: fix zero_user_segments() with start > end
zero_user_segments() is used from __block_write_begin_int(), for example
like the following:
  zero_user_segments(page, 4096, 1024, 512, 918)
But the new zero_user_segments() implementation for HIGHMEM +
TRANSPARENT_HUGEPAGE doesn't handle the "start > end" case correctly and
hits BUG_ON(). (We could fix __block_write_begin_int() instead, but it is
old code with multiple callers.)
It also calls kmap_atomic() unnecessarily when start == end == 0.
Link: https://lkml.kernel.org/r/87v9ab60r4.fsf@mail.parknet.co.jp
Fixes: 0060ef3b4e6d ("mm: support THPs in zero_user_segments")
Signed-off-by: OGAWA Hirofumi <hirofumi(a)mail.parknet.co.jp>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/highmem.c | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)
--- a/mm/highmem.c~fix-zero_user_segments-with-start-end
+++ a/mm/highmem.c
@@ -368,20 +368,24 @@ void zero_user_segments(struct page *pag
BUG_ON(end1 > page_size(page) || end2 > page_size(page));
+ if (start1 >= end1)
+ start1 = end1 = 0;
+ if (start2 >= end2)
+ start2 = end2 = 0;
+
for (i = 0; i < compound_nr(page); i++) {
void *kaddr = NULL;
- if (start1 < PAGE_SIZE || start2 < PAGE_SIZE)
- kaddr = kmap_atomic(page + i);
-
if (start1 >= PAGE_SIZE) {
start1 -= PAGE_SIZE;
end1 -= PAGE_SIZE;
} else {
unsigned this_end = min_t(unsigned, end1, PAGE_SIZE);
- if (end1 > start1)
+ if (end1 > start1) {
+ kaddr = kmap_atomic(page + i);
memset(kaddr + start1, 0, this_end - start1);
+ }
end1 -= this_end;
start1 = 0;
}
@@ -392,8 +396,11 @@ void zero_user_segments(struct page *pag
} else {
unsigned this_end = min_t(unsigned, end2, PAGE_SIZE);
- if (end2 > start2)
+ if (end2 > start2) {
+ if (!kaddr)
+ kaddr = kmap_atomic(page + i);
memset(kaddr + start2, 0, this_end - start2);
+ }
end2 -= this_end;
start2 = 0;
}
_
Patches currently in -mm which might be from hirofumi(a)mail.parknet.co.jp are
The patch titled
Subject: mm/page_alloc.c: refactor initialization of struct page for holes in memory layout
has been removed from the -mm tree. Its filename was
mm-page_allocc-refactor-initialization-of-struct-page-for-holes-in-memory-layout.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Mike Rapoport <rppt(a)linux.ibm.com>
Subject: mm/page_alloc.c: refactor initialization of struct page for holes in memory layout
There could be struct pages that are not backed by actual physical memory.
This can happen when the actual memory bank is not a multiple of
SECTION_SIZE or when an architecture does not register memory holes
reserved by the firmware as memblock.memory.
Such pages are currently initialized using init_unavailable_mem() function
that iterates through PFNs in holes in memblock.memory and if there is a
struct page corresponding to a PFN, the fields of this page are set to
default values and it is marked as Reserved.
init_unavailable_mem() does not take into account zone and node the page
belongs to and sets both zone and node links in struct page to zero.
Before commit 73a6e474cb37 ("mm: memmap_init: iterate over memblock
regions rather that check each PFN") the holes inside a zone were
re-initialized during memmap_init() and got their zone/node links right.
However, after that commit nothing updates the struct pages representing
such holes.
On a system that has firmware reserved holes in a zone above ZONE_DMA, for
instance in a configuration below:
# grep -A1 E820 /proc/iomem
7a17b000-7a216fff : Unknown E820 type
7a217000-7bffffff : System RAM
unset zone link in struct page will trigger
VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
in set_pfnblock_flags_mask() when called with a struct page from a range
other than E820_TYPE_RAM because there are pages in the range of
ZONE_DMA32 but the unset zone link in struct page makes them appear as a
part of ZONE_DMA.
Interleave initialization of the unavailable pages with the normal
initialization of memory map, so that zone and node information will be
properly set on struct pages that are not backed by the actual memory.
With this change the pages for holes inside a zone will get proper
zone/node links and the pages that are not spanned by any node will get
links to the adjacent zone/node. Holes between nodes will be prepended to
the zone/node above the hole, and the trailing pages in the last section
will be appended to the zone/node below.
[akpm(a)linux-foundation.org: don't initialize static to zero, use %llu for u64]
Link: https://lkml.kernel.org/r/20210225224351.7356-2-rppt@kernel.org
Fixes: 73a6e474cb37 ("mm: memmap_init: iterate over memblock regions rather that check each PFN")
Signed-off-by: Mike Rapoport <rppt(a)linux.ibm.com>
Reported-by: Qian Cai <cai(a)lca.pw>
Reported-by: Andrea Arcangeli <aarcange(a)redhat.com>
Reviewed-by: Baoquan He <bhe(a)redhat.com>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Łukasz Majczak <lma(a)semihalf.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: "Sarvela, Tomi P" <tomi.p.sarvela(a)intel.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 158 +++++++++++++++++++++-------------------------
1 file changed, 75 insertions(+), 83 deletions(-)
--- a/mm/page_alloc.c~mm-page_allocc-refactor-initialization-of-struct-page-for-holes-in-memory-layout
+++ a/mm/page_alloc.c
@@ -6259,12 +6259,65 @@ static void __meminit zone_init_free_lis
}
}
+#if !defined(CONFIG_FLAT_NODE_MEM_MAP)
+/*
+ * Only struct pages that correspond to ranges defined by memblock.memory
+ * are zeroed and initialized by going through __init_single_page() during
+ * memmap_init_zone().
+ *
+ * But, there could be struct pages that correspond to holes in
+ * memblock.memory. This can happen because of the following reasons:
+ * - physical memory bank size is not necessarily the exact multiple of the
+ * arbitrary section size
+ * - early reserved memory may not be listed in memblock.memory
+ * - memory layouts defined with memmap= kernel parameter may not align
+ * nicely with memmap sections
+ *
+ * Explicitly initialize those struct pages so that:
+ * - PG_Reserved is set
+ * - zone and node links point to zone and node that span the page if the
+ * hole is in the middle of a zone
+ * - zone and node links point to adjacent zone/node if the hole falls on
+ * the zone boundary; the pages in such holes will be prepended to the
+ * zone/node above the hole except for the trailing pages in the last
+ * section that will be appended to the zone/node below.
+ */
+static u64 __meminit init_unavailable_range(unsigned long spfn,
+ unsigned long epfn,
+ int zone, int node)
+{
+ unsigned long pfn;
+ u64 pgcnt = 0;
+
+ for (pfn = spfn; pfn < epfn; pfn++) {
+ if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages))) {
+ pfn = ALIGN_DOWN(pfn, pageblock_nr_pages)
+ + pageblock_nr_pages - 1;
+ continue;
+ }
+ __init_single_page(pfn_to_page(pfn), pfn, zone, node);
+ __SetPageReserved(pfn_to_page(pfn));
+ pgcnt++;
+ }
+
+ return pgcnt;
+}
+#else
+static inline u64 init_unavailable_range(unsigned long spfn, unsigned long epfn,
+ int zone, int node)
+{
+ return 0;
+}
+#endif
+
void __meminit __weak memmap_init_zone(struct zone *zone)
{
unsigned long zone_start_pfn = zone->zone_start_pfn;
unsigned long zone_end_pfn = zone_start_pfn + zone->spanned_pages;
int i, nid = zone_to_nid(zone), zone_id = zone_idx(zone);
+ static unsigned long hole_pfn;
unsigned long start_pfn, end_pfn;
+ u64 pgcnt = 0;
for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
start_pfn = clamp(start_pfn, zone_start_pfn, zone_end_pfn);
@@ -6274,7 +6327,29 @@ void __meminit __weak memmap_init_zone(s
memmap_init_range(end_pfn - start_pfn, nid,
zone_id, start_pfn, zone_end_pfn,
MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
+
+ if (hole_pfn < start_pfn)
+ pgcnt += init_unavailable_range(hole_pfn, start_pfn,
+ zone_id, nid);
+ hole_pfn = end_pfn;
}
+
+#ifdef CONFIG_SPARSEMEM
+ /*
+ * Initialize the hole in the range [zone_end_pfn, section_end].
+ * If zone boundary falls in the middle of a section, this hole
+ * will be re-initialized during the call to this function for the
+ * higher zone.
+ */
+ end_pfn = round_up(zone_end_pfn, PAGES_PER_SECTION);
+ if (hole_pfn < end_pfn)
+ pgcnt += init_unavailable_range(hole_pfn, end_pfn,
+ zone_id, nid);
+#endif
+
+ if (pgcnt)
+ pr_info(" %s zone: %llu pages in unavailable ranges\n",
+ zone->name, pgcnt);
}
static int zone_batchsize(struct zone *zone)
@@ -7071,88 +7146,6 @@ void __init free_area_init_memoryless_no
free_area_init_node(nid);
}
-#if !defined(CONFIG_FLAT_NODE_MEM_MAP)
-/*
- * Initialize all valid struct pages in the range [spfn, epfn) and mark them
- * PageReserved(). Return the number of struct pages that were initialized.
- */
-static u64 __init init_unavailable_range(unsigned long spfn, unsigned long epfn)
-{
- unsigned long pfn;
- u64 pgcnt = 0;
-
- for (pfn = spfn; pfn < epfn; pfn++) {
- if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages))) {
- pfn = ALIGN_DOWN(pfn, pageblock_nr_pages)
- + pageblock_nr_pages - 1;
- continue;
- }
- /*
- * Use a fake node/zone (0) for now. Some of these pages
- * (in memblock.reserved but not in memblock.memory) will
- * get re-initialized via reserve_bootmem_region() later.
- */
- __init_single_page(pfn_to_page(pfn), pfn, 0, 0);
- __SetPageReserved(pfn_to_page(pfn));
- pgcnt++;
- }
-
- return pgcnt;
-}
-
-/*
- * Only struct pages that are backed by physical memory are zeroed and
- * initialized by going through __init_single_page(). But, there are some
- * struct pages which are reserved in memblock allocator and their fields
- * may be accessed (for example page_to_pfn() on some configuration accesses
- * flags). We must explicitly initialize those struct pages.
- *
- * This function also addresses a similar issue where struct pages are left
- * uninitialized because the physical address range is not covered by
- * memblock.memory or memblock.reserved. That could happen when memblock
- * layout is manually configured via memmap=, or when the highest physical
- * address (max_pfn) does not end on a section boundary.
- */
-static void __init init_unavailable_mem(void)
-{
- phys_addr_t start, end;
- u64 i, pgcnt;
- phys_addr_t next = 0;
-
- /*
- * Loop through unavailable ranges not covered by memblock.memory.
- */
- pgcnt = 0;
- for_each_mem_range(i, &start, &end) {
- if (next < start)
- pgcnt += init_unavailable_range(PFN_DOWN(next),
- PFN_UP(start));
- next = end;
- }
-
- /*
- * Early sections always have a fully populated memmap for the whole
- * section - see pfn_valid(). If the last section has holes at the
- * end and that section is marked "online", the memmap will be
- * considered initialized. Make sure that memmap has a well defined
- * state.
- */
- pgcnt += init_unavailable_range(PFN_DOWN(next),
- round_up(max_pfn, PAGES_PER_SECTION));
-
- /*
- * Struct pages that do not have backing memory. This could be because
- * firmware is using some of this memory, or for some other reasons.
- */
- if (pgcnt)
- pr_info("Zeroed struct page in unavailable ranges: %lld pages", pgcnt);
-}
-#else
-static inline void __init init_unavailable_mem(void)
-{
-}
-#endif /* !CONFIG_FLAT_NODE_MEM_MAP */
-
#if MAX_NUMNODES > 1
/*
* Figure out the number of possible node ids.
@@ -7576,7 +7569,6 @@ void __init free_area_init(unsigned long
/* Initialise every node */
mminit_verify_pageflags_layout();
setup_nr_node_ids();
- init_unavailable_mem();
for_each_online_node(nid) {
pg_data_t *pgdat = NODE_DATA(nid);
free_area_init_node(nid);
_
Patches currently in -mm which might be from rppt(a)linux.ibm.com are
Drivers that do not use the ctrl-framework use this function instead.
- Return an error when handling REQUEST_VAL.
- Do not check for multiple classes when getting the DEF_VAL.
Fixes v4l2-compliance:
Control ioctls (Input 0):
fail: v4l2-test-controls.cpp(813): doioctl(node, VIDIOC_G_EXT_CTRLS, &ctrls)
test VIDIOC_G/S/TRY_EXT_CTRLS: FAIL
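For reference, a hedged sketch of the kind of query v4l2-compliance issues
here: reading a control's default value with V4L2_CTRL_WHICH_DEF_VAL. The
device path, control id and function name are arbitrary examples:
	#include <fcntl.h>
	#include <string.h>
	#include <sys/ioctl.h>
	#include <unistd.h>
	#include <linux/videodev2.h>

	/* Query the default value of one control; with the fix below this must
	 * work without being routed through the request API. */
	int read_default_brightness(const char *dev)
	{
		struct v4l2_ext_control ctrl = { .id = V4L2_CID_BRIGHTNESS };
		struct v4l2_ext_controls ctrls;
		int fd = open(dev, O_RDWR);
		int ret;

		if (fd < 0)
			return -1;

		memset(&ctrls, 0, sizeof(ctrls));
		ctrls.which = V4L2_CTRL_WHICH_DEF_VAL;
		ctrls.count = 1;
		ctrls.controls = &ctrl;

		ret = ioctl(fd, VIDIOC_G_EXT_CTRLS, &ctrls);
		close(fd);
		return ret < 0 ? -1 : ctrl.value;
	}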
Cc: stable(a)vger.kernel.org
Fixes: 6fa6f831f095 ("media: v4l2-ctrls: add core request support")
Suggested-by: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
Signed-off-by: Ricardo Ribalda <ribalda(a)chromium.org>
Reviewed-by: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
---
drivers/media/v4l2-core/v4l2-ioctl.c | 25 +++++++++++++++++--------
1 file changed, 17 insertions(+), 8 deletions(-)
diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c b/drivers/media/v4l2-core/v4l2-ioctl.c
index 31d1342e61e8..9406e90ff805 100644
--- a/drivers/media/v4l2-core/v4l2-ioctl.c
+++ b/drivers/media/v4l2-core/v4l2-ioctl.c
@@ -917,15 +917,24 @@ static int check_ext_ctrls(struct v4l2_ext_controls *c, int allow_priv)
for (i = 0; i < c->count; i++)
c->controls[i].reserved2[0] = 0;
- /* V4L2_CID_PRIVATE_BASE cannot be used as control class
- when using extended controls.
- Only when passed in through VIDIOC_G_CTRL and VIDIOC_S_CTRL
- is it allowed for backwards compatibility.
- */
- if (!allow_priv && c->which == V4L2_CID_PRIVATE_BASE)
- return 0;
- if (!c->which)
+ switch (c->which) {
+ case V4L2_CID_PRIVATE_BASE:
+ /*
+ * V4L2_CID_PRIVATE_BASE cannot be used as control class
+ * when using extended controls.
+ * Only when passed in through VIDIOC_G_CTRL and VIDIOC_S_CTRL
+ * is it allowed for backwards compatibility.
+ */
+ if (!allow_priv)
+ return 0;
+ break;
+ case V4L2_CTRL_WHICH_DEF_VAL:
+ case V4L2_CTRL_WHICH_CUR_VAL:
return 1;
+ case V4L2_CTRL_WHICH_REQUEST_VAL:
+ return 0;
+ }
+
/* Check that all controls are from the same control class. */
for (i = 0; i < c->count; i++) {
if (V4L2_CTRL_ID2WHICH(c->controls[i].id) != c->which) {
--
2.31.0.rc2.261.g7f71774620-goog
Please apply the following two upstream commits (attached), in this order:
d567572906d9 nvme: unlink head after removing last namespace
ac262508daa8 nvme: release namespace head reference on error
TO: v5.4, v5.5, v5.6, v5.7
These commits are present in v5.8 and apply cleanly to the above.
Reason:
These fix a potential crash or malfunction when an nvme namespace is deleted
and then a new namespace with the same nsid is created before the old ns_head
for this nsid is gone.
The first commit prevents the new namespace from being matched by
nvme_init_ns_head() with the old ns_head, which would cause an ID mismatch and
consequently a failure to initialize the new namespace.
The second commit prevents an ns_head refcount imbalance in case
nvme_init_ns_head() detects an ID mismatch, and consequently a crash later.