The patch titled
Subject: mm/userfaultfd: fix memory corruption due to writeprotect
has been removed from the -mm tree. Its filename was
mm-userfaultfd-fix-memory-corruption-due-to-writeprotect.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Nadav Amit <namit(a)vmware.com>
Subject: mm/userfaultfd: fix memory corruption due to writeprotect
Userfaultfd self-test fails occasionally, indicating a memory corruption.
Analyzing this problem indicates that there is a real bug: mwriteprotect_range()
takes mmap_lock only for read and defers TLB flushes, and wp_page_copy() does
not sufficiently consider such concurrent deferred flushes. Although the PTE is
flushed from the TLBs in wp_page_copy(), this flush takes place after the copy
has already been performed, so the page can still be modified between the time
of the copy and the time the PTE is flushed.
To make matters worse, memory-unprotection using userfaultfd also poses a
problem. Although memory unprotection is logically a promotion of PTE
permissions, and therefore should not require a TLB flush, the current
userfaultfd code might actually cause a demotion of the architectural PTE
permission: when userfaultfd_writeprotect() unprotects a memory region, it
unintentionally *clears* the RW-bit if it was already set. Note that
unprotecting a PTE that is not write-protected is a valid use-case: the
userfaultfd monitor might ask to unprotect a region that holds both
write-protected and write-unprotected PTEs.
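For context, a write-unprotect from the monitor's side is a UFFDIO_WRITEPROTECT
ioctl with the WP mode bit cleared. A minimal userspace sketch (uffd setup is
omitted; the wrapper name and the addr/len values are hypothetical):
	/* Minimal sketch of a userfaultfd write-unprotect request.  Setup of
	 * the uffd file descriptor (userfaultfd(2), UFFDIO_API, UFFDIO_REGISTER
	 * with UFFDIO_REGISTER_MODE_WP) is omitted. */
	#include <linux/userfaultfd.h>
	#include <stddef.h>
	#include <sys/ioctl.h>

	static int uffd_write_unprotect(int uffd, void *addr, size_t len)
	{
		struct uffdio_writeprotect wp = {
			.range = { .start = (unsigned long)addr, .len = len },
			/* mode without UFFDIO_WRITEPROTECT_MODE_WP means
			 * write-unprotect; the range may legitimately contain
			 * PTEs that were never write-protected, which is the
			 * valid use-case described above. */
			.mode = 0,
		};

		return ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);
	}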
The scenario that happens in selftests/vm/userfaultfd is as follows:
cpu0                            cpu1                    cpu2
----                            ----                    ----
                                                        [ Writable PTE
                                                          cached in TLB ]
userfaultfd_writeprotect()
[ write-*unprotect* ]
mwriteprotect_range()
mmap_read_lock()
change_protection()
change_protection_range()
...
change_pte_range()
[ *clear* “write”-bit ]
[ defer TLB flushes ]
                                [ page-fault ]
                                ...
                                wp_page_copy()
                                 cow_user_page()
                                  [ copy page ]
                                                        [ write to old
                                                          page ]
                                ...
                                 set_pte_at_notify()
A similar scenario can happen:
cpu0                        cpu1                        cpu2                    cpu3
----                        ----                        ----                    ----
                                                                                [ Writable PTE
                                                                                  cached in TLB ]
userfaultfd_writeprotect()
[ write-protect ]
[ deferred TLB flush ]
                            userfaultfd_writeprotect()
                            [ write-unprotect ]
                            [ deferred TLB flush ]
                                                        [ page-fault ]
                                                        wp_page_copy()
                                                         cow_user_page()
                                                          [ copy page ]
                                                        ...                     [ write to page ]
                                                         set_pte_at_notify()
This race exists since commit 292924b26024 ("userfaultfd: wp: apply
_PAGE_UFFD_WP bit"). Yet, as Yu Zhao pointed out, these races became apparent
since commit 09854ba94c6a ("mm: do_wp_page() simplification") which made
wp_page_copy() more likely to take place, specifically if page_count(page)
> 1.
To resolve the aforementioned races, check whether there are pending
flushes on uffd-write-protected VMAs, and if there are, perform a flush
before doing the COW.
Further optimizations will follow to avoid unnecessary PTE write-protection
and TLB flushes during uffd-write-unprotect.
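In code terms, the check the patch adds (see the mm/memory.c hunk below)
boils down to the following at the top of do_wp_page(); it is reproduced here
only for readability:
	/*
	 * Userfaultfd write-protect can defer flushes: if a batched TLB
	 * flush is still pending for this mm, flush the single page
	 * before cow_user_page() copies it.
	 */
	if (unlikely(userfaultfd_wp(vmf->vma) &&
		     mm_tlb_flush_pending(vmf->vma->vm_mm)))
		flush_tlb_page(vmf->vma, vmf->address);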
Link: https://lkml.kernel.org/r/20210304095423.3825684-1-namit@vmware.com
Fixes: 09854ba94c6a ("mm: do_wp_page() simplification")
Signed-off-by: Nadav Amit <namit(a)vmware.com>
Suggested-by: Yu Zhao <yuzhao(a)google.com>
Reviewed-by: Peter Xu <peterx(a)redhat.com>
Tested-by: Peter Xu <peterx(a)redhat.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Pavel Emelyanov <xemul(a)openvz.org>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Mike Rapoport <rppt(a)linux.vnet.ibm.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Will Deacon <will(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: <stable(a)vger.kernel.org> [5.9+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory.c | 8 ++++++++
1 file changed, 8 insertions(+)
--- a/mm/memory.c~mm-userfaultfd-fix-memory-corruption-due-to-writeprotect
+++ a/mm/memory.c
@@ -3097,6 +3097,14 @@ static vm_fault_t do_wp_page(struct vm_f
return handle_userfault(vmf, VM_UFFD_WP);
}
+ /*
+ * Userfaultfd write-protect can defer flushes. Ensure the TLB
+ * is flushed in this case before copying.
+ */
+ if (unlikely(userfaultfd_wp(vmf->vma) &&
+ mm_tlb_flush_pending(vmf->vma->vm_mm)))
+ flush_tlb_page(vmf->vma, vmf->address);
+
vmf->page = vm_normal_page(vma, vmf->address, vmf->orig_pte);
if (!vmf->page) {
/*
_
Patches currently in -mm which might be from namit(a)vmware.com are
The patch titled
Subject: kasan: fix KASAN_STACK dependency for HW_TAGS
has been removed from the -mm tree. Its filename was
kasan-fix-kasan_stack-dependency-for-hw_tags.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Andrey Konovalov <andreyknvl(a)google.com>
Subject: kasan: fix KASAN_STACK dependency for HW_TAGS
There's a runtime failure when running HW_TAGS-enabled kernel built with
GCC on hardware that doesn't support MTE. GCC-built kernels always have
CONFIG_KASAN_STACK enabled, even though stack instrumentation isn't
supported by HW_TAGS. Having that config enabled causes KASAN to issue
MTE-only instructions to unpoison kernel stacks, which causes the failure.
Fix the issue by disallowing CONFIG_KASAN_STACK when HW_TAGS is used.
(The commit that introduced CONFIG_KASAN_HW_TAGS specified proper
dependency for CONFIG_KASAN_STACK_ENABLE but not for CONFIG_KASAN_STACK.)
Link: https://lkml.kernel.org/r/59e75426241dbb5611277758c8d4d6f5f9298dac.16152154…
Fixes: 6a63a63ff1ac ("kasan: introduce CONFIG_KASAN_HW_TAGS")
Signed-off-by: Andrey Konovalov <andreyknvl(a)google.com>
Reported-by: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: Vincenzo Frascino <vincenzo.frascino(a)arm.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Marco Elver <elver(a)google.com>
Cc: Peter Collingbourne <pcc(a)google.com>
Cc: Evgenii Stepanov <eugenis(a)google.com>
Cc: Branislav Rankov <Branislav.Rankov(a)arm.com>
Cc: Kevin Brodsky <kevin.brodsky(a)arm.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/Kconfig.kasan | 1 +
1 file changed, 1 insertion(+)
--- a/lib/Kconfig.kasan~kasan-fix-kasan_stack-dependency-for-hw_tags
+++ a/lib/Kconfig.kasan
@@ -156,6 +156,7 @@ config KASAN_STACK_ENABLE
config KASAN_STACK
int
+ depends on KASAN_GENERIC || KASAN_SW_TAGS
default 1 if KASAN_STACK_ENABLE || CC_IS_GCC
default 0
_
Patches currently in -mm which might be from andreyknvl(a)google.com are
kasan-fix-per-page-tags-for-non-page_alloc-pages.patch
kasan-initialize-shadow-to-tag_invalid-for-sw_tags.patch
mm-kasan-dont-poison-boot-memory-with-tag-based-modes.patch
arm64-kasan-allow-to-init-memory-when-setting-tags.patch
kasan-init-memory-in-kasan_unpoison-for-hw_tags.patch
kasan-mm-integrate-page_alloc-init-with-hw_tags.patch
kasan-mm-integrate-slab-init_on_alloc-with-hw_tags.patch
kasan-mm-integrate-slab-init_on_free-with-hw_tags.patch
kasan-docs-clean-up-sections.patch
kasan-docs-update-overview-section.patch
kasan-docs-update-usage-section.patch
kasan-docs-update-error-reports-section.patch
kasan-docs-update-boot-parameters-section.patch
kasan-docs-update-generic-implementation-details-section.patch
kasan-docs-update-sw_tags-implementation-details-section.patch
kasan-docs-update-hw_tags-implementation-details-section.patch
kasan-docs-update-shadow-memory-section.patch
kasan-docs-update-ignoring-accesses-section.patch
kasan-docs-update-tests-section.patch
The patch titled
Subject: kasan, mm: fix crash with HW_TAGS and DEBUG_PAGEALLOC
has been removed from the -mm tree. Its filename was
kasan-mm-fix-crash-with-hw_tags-and-debug_pagealloc.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Andrey Konovalov <andreyknvl(a)google.com>
Subject: kasan, mm: fix crash with HW_TAGS and DEBUG_PAGEALLOC
Currently, kasan_free_nondeferred_pages()->kasan_free_pages() is called
after debug_pagealloc_unmap_pages(). This causes a crash when
debug_pagealloc is enabled, as HW_TAGS KASAN can't set tags on an
unmapped page.
This patch puts kasan_free_nondeferred_pages() before
debug_pagealloc_unmap_pages() and arch_free_page(), which can also make
the page unavailable.
Link: https://lkml.kernel.org/r/24cd7db274090f0e5bc3adcdc7399243668e3171.16149873…
Fixes: 94ab5b61ee16 ("kasan, arm64: enable CONFIG_KASAN_HW_TAGS")
Signed-off-by: Andrey Konovalov <andreyknvl(a)google.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: Vincenzo Frascino <vincenzo.frascino(a)arm.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Marco Elver <elver(a)google.com>
Cc: Peter Collingbourne <pcc(a)google.com>
Cc: Evgenii Stepanov <eugenis(a)google.com>
Cc: Branislav Rankov <Branislav.Rankov(a)arm.com>
Cc: Kevin Brodsky <kevin.brodsky(a)arm.com>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
--- a/mm/page_alloc.c~kasan-mm-fix-crash-with-hw_tags-and-debug_pagealloc
+++ a/mm/page_alloc.c
@@ -1282,6 +1282,12 @@ static __always_inline bool free_pages_p
kernel_poison_pages(page, 1 << order);
/*
+ * With hardware tag-based KASAN, memory tags must be set before the
+ * page becomes unavailable via debug_pagealloc or arch_free_page.
+ */
+ kasan_free_nondeferred_pages(page, order);
+
+ /*
* arch_free_page() can make the page's contents inaccessible. s390
* does this. So nothing which can access the page's contents should
* happen after this.
@@ -1290,8 +1296,6 @@ static __always_inline bool free_pages_p
debug_pagealloc_unmap_pages(page, 1 << order);
- kasan_free_nondeferred_pages(page, order);
-
return true;
}
_
Patches currently in -mm which might be from andreyknvl(a)google.com are
kasan-fix-per-page-tags-for-non-page_alloc-pages.patch
kasan-initialize-shadow-to-tag_invalid-for-sw_tags.patch
mm-kasan-dont-poison-boot-memory-with-tag-based-modes.patch
arm64-kasan-allow-to-init-memory-when-setting-tags.patch
kasan-init-memory-in-kasan_unpoison-for-hw_tags.patch
kasan-mm-integrate-page_alloc-init-with-hw_tags.patch
kasan-mm-integrate-slab-init_on_alloc-with-hw_tags.patch
kasan-mm-integrate-slab-init_on_free-with-hw_tags.patch
kasan-docs-clean-up-sections.patch
kasan-docs-update-overview-section.patch
kasan-docs-update-usage-section.patch
kasan-docs-update-error-reports-section.patch
kasan-docs-update-boot-parameters-section.patch
kasan-docs-update-generic-implementation-details-section.patch
kasan-docs-update-sw_tags-implementation-details-section.patch
kasan-docs-update-hw_tags-implementation-details-section.patch
kasan-docs-update-shadow-memory-section.patch
kasan-docs-update-ignoring-accesses-section.patch
kasan-docs-update-tests-section.patch
The patch titled
Subject: mm/madvise: replace ptrace attach requirement for process_madvise
has been removed from the -mm tree. Its filename was
mm-madvise-replace-ptrace-attach-requirement-for-process_madvise.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Suren Baghdasaryan <surenb(a)google.com>
Subject: mm/madvise: replace ptrace attach requirement for process_madvise
process_madvise currently requires ptrace attach capability.
PTRACE_MODE_ATTACH gives one process complete control over another
process. It effectively removes the security boundary between the two
processes (in one direction). Granting ptrace attach capability even to a
system process is considered dangerous since it creates an attack surface.
This severely limits the usage of this API.
The operations process_madvise can perform do not affect the correctness
of the operation of the target process; they only affect where the data is
physically located (and therefore, how fast it can be accessed). What we
want is the ability for one process to influence another process in order
to optimize performance across the entire system while leaving the
security boundary intact.
Replace PTRACE_MODE_ATTACH with a combination of PTRACE_MODE_READ and
CAP_SYS_NICE: PTRACE_MODE_READ prevents leaking ASLR metadata, and
CAP_SYS_NICE covers influencing process performance.
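As an illustration, a hedged userspace sketch of how a resource-management
daemon might use this (the helper name, pid, addr and len are hypothetical;
__NR_pidfd_open, __NR_process_madvise and MADV_PAGEOUT must be provided by
recent kernel/libc headers):
	#define _GNU_SOURCE
	#include <sys/mman.h>
	#include <sys/syscall.h>
	#include <sys/types.h>
	#include <sys/uio.h>
	#include <unistd.h>

	static long try_pageout(pid_t pid, void *addr, size_t len)
	{
		struct iovec iov = { .iov_base = addr, .iov_len = len };
		int pidfd = syscall(__NR_pidfd_open, pid, 0);
		long ret;

		if (pidfd < 0)
			return -1;

		/* Succeeds only if the caller passes the PTRACE_MODE_READ
		 * check and, with this patch, also has CAP_SYS_NICE. */
		ret = syscall(__NR_process_madvise, pidfd, &iov, 1,
			      MADV_PAGEOUT, 0);
		close(pidfd);
		return ret;
	}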
Link: https://lkml.kernel.org/r/20210303185807.2160264-1-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Reviewed-by: Kees Cook <keescook(a)chromium.org>
Acked-by: Minchan Kim <minchan(a)kernel.org>
Acked-by: David Rientjes <rientjes(a)google.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Jeff Vander Stoep <jeffv(a)google.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Tim Murray <timmurray(a)google.com>
Cc: Florian Weimer <fweimer(a)redhat.com>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: James Morris <jmorris(a)namei.org>
Cc: <stable(a)vger.kernel.org> [5.10+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/madvise.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
--- a/mm/madvise.c~mm-madvise-replace-ptrace-attach-requirement-for-process_madvise
+++ a/mm/madvise.c
@@ -1198,12 +1198,22 @@ SYSCALL_DEFINE5(process_madvise, int, pi
goto release_task;
}
- mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
+ /* Require PTRACE_MODE_READ to avoid leaking ASLR metadata. */
+ mm = mm_access(task, PTRACE_MODE_READ_FSCREDS);
if (IS_ERR_OR_NULL(mm)) {
ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
goto release_task;
}
+ /*
+ * Require CAP_SYS_NICE for influencing process performance. Note that
+ * only non-destructive hints are currently supported.
+ */
+ if (!capable(CAP_SYS_NICE)) {
+ ret = -EPERM;
+ goto release_mm;
+ }
+
total_len = iov_iter_count(&iter);
while (iov_iter_count(&iter)) {
@@ -1218,6 +1228,7 @@ SYSCALL_DEFINE5(process_madvise, int, pi
if (ret == 0)
ret = total_len - iov_iter_count(&iter);
+release_mm:
mmput(mm);
release_task:
put_task_struct(task);
_
Patches currently in -mm which might be from surenb(a)google.com are
The patch titled
Subject: linux/compiler-clang.h: define HAVE_BUILTIN_BSWAP*
has been removed from the -mm tree. Its filename was
linux-compiler-clangh-define-have_builtin_bswap.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Arnd Bergmann <arnd(a)arndb.de>
Subject: linux/compiler-clang.h: define HAVE_BUILTIN_BSWAP*
Separating compiler-clang.h from compiler-gcc.h inadvertently dropped the
definitions of the three HAVE_BUILTIN_BSWAP macros, which requires falling
back to the open-coded version and hoping that the compiler detects it.
Since all versions of clang support the __builtin_bswap interfaces, add
back the flags and have the headers pick these up automatically.
This results in a 4% improvement of compilation speed for arm defconfig.
Note: it might also be worth revisiting which architectures set
CONFIG_ARCH_USE_BUILTIN_BSWAP for one compiler or the other, today this is
set on six architectures (arm32, csky, mips, powerpc, s390, x86), while
another ten architectures define custom helpers (alpha, arc, ia64, m68k,
mips, nios2, parisc, sh, sparc, xtensa), and the rest (arm64, h8300,
hexagon, microblaze, nds32, openrisc, riscv) just get the unoptimized
version and rely on the compiler to detect it.
A long time ago, the compiler builtins were architecture specific, but
nowadays, all compilers that are able to build the kernel have correct
implementations of them, though some may not be as optimized as the inline
asm versions.
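To illustrate the difference the builtin makes, a small self-contained sketch
(not kernel code) of the builtin versus the open-coded fallback that the
compiler would otherwise have to pattern-match:
	#include <stdint.h>
	#include <stdio.h>

	static uint32_t swab32_builtin(uint32_t x)
	{
		return __builtin_bswap32(x);	/* typically one bswap/rev insn */
	}

	static uint32_t swab32_open_coded(uint32_t x)
	{
		/* the fallback used when __HAVE_BUILTIN_BSWAP32__ is missing */
		return ((x & 0x000000ffU) << 24) |
		       ((x & 0x0000ff00U) <<  8) |
		       ((x & 0x00ff0000U) >>  8) |
		       ((x & 0xff000000U) >> 24);
	}

	int main(void)
	{
		printf("%08x %08x\n", swab32_builtin(0x12345678),
		       swab32_open_coded(0x12345678));
		return 0;
	}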
The patch that dropped the optimization landed in v4.19, so as discussed it
would be fairly safe to backport this revert to the 4.19/5.4/5.10 stable
kernels; there is a remaining risk of regressions, but no known side-effects
besides compile speed.
Link: https://lkml.kernel.org/r/20210226161151.2629097-1-arnd@kernel.org
Link: https://lore.kernel.org/lkml/20210225164513.3667778-1-arnd@kernel.org/
Fixes: 815f0ddb346c ("include/linux/compiler*.h: make compiler-*.h mutually exclusive")
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
Reviewed-by: Nathan Chancellor <nathan(a)kernel.org>
Reviewed-by: Kees Cook <keescook(a)chromium.org>
Acked-by: Miguel Ojeda <ojeda(a)kernel.org>
Acked-by: Nick Desaulniers <ndesaulniers(a)google.com>
Acked-by: Luc Van Oostenryck <luc.vanoostenryck(a)gmail.com>
Cc: Masahiro Yamada <masahiroy(a)kernel.org>
Cc: Nick Hu <nickhu(a)andestech.com>
Cc: Greentime Hu <green.hu(a)gmail.com>
Cc: Vincent Chen <deanbo422(a)gmail.com>
Cc: Paul Walmsley <paul.walmsley(a)sifive.com>
Cc: Palmer Dabbelt <palmer(a)dabbelt.com>
Cc: Albert Ou <aou(a)eecs.berkeley.edu>
Cc: Guo Ren <guoren(a)kernel.org>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: Sami Tolvanen <samitolvanen(a)google.com>
Cc: Marco Elver <elver(a)google.com>
Cc: Arvind Sankar <nivedita(a)alum.mit.edu>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/compiler-clang.h | 6 ++++++
1 file changed, 6 insertions(+)
--- a/include/linux/compiler-clang.h~linux-compiler-clangh-define-have_builtin_bswap
+++ a/include/linux/compiler-clang.h
@@ -31,6 +31,12 @@
#define __no_sanitize_thread
#endif
+#if defined(CONFIG_ARCH_USE_BUILTIN_BSWAP)
+#define __HAVE_BUILTIN_BSWAP32__
+#define __HAVE_BUILTIN_BSWAP64__
+#define __HAVE_BUILTIN_BSWAP16__
+#endif /* CONFIG_ARCH_USE_BUILTIN_BSWAP */
+
#if __has_feature(undefined_behavior_sanitizer)
/* GCC does not have __SANITIZE_UNDEFINED__ */
#define __no_sanitize_undefined \
_
Patches currently in -mm which might be from arnd(a)arndb.de are
The patch titled
Subject: binfmt_misc: fix possible deadlock in bm_register_write
has been removed from the -mm tree. Its filename was
binfmt_misc-fix-possible-deadlock-in-bm_register_write.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Lior Ribak <liorribak(a)gmail.com>
Subject: binfmt_misc: fix possible deadlock in bm_register_write
There is a deadlock in bm_register_write:
First, at the beginning of the function, a lock is taken on the binfmt_misc
root inode with inode_lock(d_inode(root)).
Then, if the user used the MISC_FMT_OPEN_FILE flag, the function will call
open_exec on the user-provided interpreter.
open_exec will perform a path lookup, and if the lookup passes through the
root of binfmt_misc, it will try to take a shared lock on its inode again;
since the inode is already locked, the code gets stuck in a deadlock.
To reproduce the bug:
$ echo ":iiiii:E::ii::/proc/sys/fs/binfmt_misc/bla:F" > /proc/sys/fs/binfmt_misc/register
backtrace of where the lock occurs (#5):
0 schedule () at ./arch/x86/include/asm/current.h:15
1 0xffffffff81b51237 in rwsem_down_read_slowpath (sem=0xffff888003b202e0, count=<optimized out>, state=state@entry=2) at kernel/locking/rwsem.c:992
2 0xffffffff81b5150a in __down_read_common (state=2, sem=<optimized out>) at kernel/locking/rwsem.c:1213
3 __down_read (sem=<optimized out>) at kernel/locking/rwsem.c:1222
4 down_read (sem=<optimized out>) at kernel/locking/rwsem.c:1355
5 0xffffffff811ee22a in inode_lock_shared (inode=<optimized out>) at ./include/linux/fs.h:783
6 open_last_lookups (op=0xffffc9000022fe34, file=0xffff888004098600, nd=0xffffc9000022fd10) at fs/namei.c:3177
7 path_openat (nd=nd@entry=0xffffc9000022fd10, op=op@entry=0xffffc9000022fe34, flags=flags@entry=65) at fs/namei.c:3366
8 0xffffffff811efe1c in do_filp_open (dfd=<optimized out>, pathname=pathname@entry=0xffff8880031b9000, op=op@entry=0xffffc9000022fe34) at fs/namei.c:3396
9 0xffffffff811e493f in do_open_execat (fd=fd@entry=-100, name=name@entry=0xffff8880031b9000, flags=<optimized out>, flags@entry=0) at fs/exec.c:913
10 0xffffffff811e4a92 in open_exec (name=<optimized out>) at fs/exec.c:948
11 0xffffffff8124aa84 in bm_register_write (file=<optimized out>, buffer=<optimized out>, count=19, ppos=<optimized out>) at fs/binfmt_misc.c:682
12 0xffffffff811decd2 in vfs_write (file=file@entry=0xffff888004098500, buf=buf@entry=0xa758d0 ":iiiii:E::ii::i:CF
", count=count@entry=19, pos=pos@entry=0xffffc9000022ff10) at fs/read_write.c:603
13 0xffffffff811defda in ksys_write (fd=<optimized out>, buf=0xa758d0 ":iiiii:E::ii::i:CF
", count=19) at fs/read_write.c:658
14 0xffffffff81b49813 in do_syscall_64 (nr=<optimized out>, regs=0xffffc9000022ff58) at arch/x86/entry/common.c:46
15 0xffffffff81c0007c in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:120
To solve the issue, the open_exec() call is moved to before the root inode
lock is taken by bm_register_write().
Link: https://lkml.kernel.org/r/20210228224414.95962-1-liorribak@gmail.com
Fixes: 948b701a607f1 ("binfmt_misc: add persistent opened binary handler for containers")
Signed-off-by: Lior Ribak <liorribak(a)gmail.com>
Acked-by: Helge Deller <deller(a)gmx.de>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/binfmt_misc.c | 29 ++++++++++++++---------------
1 file changed, 14 insertions(+), 15 deletions(-)
--- a/fs/binfmt_misc.c~binfmt_misc-fix-possible-deadlock-in-bm_register_write
+++ a/fs/binfmt_misc.c
@@ -649,12 +649,24 @@ static ssize_t bm_register_write(struct
struct super_block *sb = file_inode(file)->i_sb;
struct dentry *root = sb->s_root, *dentry;
int err = 0;
+ struct file *f = NULL;
e = create_entry(buffer, count);
if (IS_ERR(e))
return PTR_ERR(e);
+ if (e->flags & MISC_FMT_OPEN_FILE) {
+ f = open_exec(e->interpreter);
+ if (IS_ERR(f)) {
+ pr_notice("register: failed to install interpreter file %s\n",
+ e->interpreter);
+ kfree(e);
+ return PTR_ERR(f);
+ }
+ e->interp_file = f;
+ }
+
inode_lock(d_inode(root));
dentry = lookup_one_len(e->name, root, strlen(e->name));
err = PTR_ERR(dentry);
@@ -678,21 +690,6 @@ static ssize_t bm_register_write(struct
goto out2;
}
- if (e->flags & MISC_FMT_OPEN_FILE) {
- struct file *f;
-
- f = open_exec(e->interpreter);
- if (IS_ERR(f)) {
- err = PTR_ERR(f);
- pr_notice("register: failed to install interpreter file %s\n", e->interpreter);
- simple_release_fs(&bm_mnt, &entry_count);
- iput(inode);
- inode = NULL;
- goto out2;
- }
- e->interp_file = f;
- }
-
e->dentry = dget(dentry);
inode->i_private = e;
inode->i_fop = &bm_entry_operations;
@@ -709,6 +706,8 @@ out:
inode_unlock(d_inode(root));
if (err) {
+ if (f)
+ filp_close(f, NULL);
kfree(e);
return err;
}
_
Patches currently in -mm which might be from liorribak(a)gmail.com are
The patch titled
Subject: mm/highmem.c: fix zero_user_segments() with start > end
has been removed from the -mm tree. Its filename was
fix-zero_user_segments-with-start-end.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: OGAWA Hirofumi <hirofumi(a)mail.parknet.co.jp>
Subject: mm/highmem.c: fix zero_user_segments() with start > end
zero_user_segments() is used from __block_write_begin_int(), for example
like the following:
  zero_user_segments(page, 4096, 1024, 512, 918)
But the new zero_user_segments() implementation for HIGHMEM +
TRANSPARENT_HUGEPAGE doesn't handle the "start > end" case correctly and
hits BUG_ON(). (We could fix __block_write_begin_int() instead, but it is
old code with multiple callers.)
It also calls kmap_atomic() unnecessarily when start == end == 0.
Link: https://lkml.kernel.org/r/87v9ab60r4.fsf@mail.parknet.co.jp
Fixes: 0060ef3b4e6d ("mm: support THPs in zero_user_segments")
Signed-off-by: OGAWA Hirofumi <hirofumi(a)mail.parknet.co.jp>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/highmem.c | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)
--- a/mm/highmem.c~fix-zero_user_segments-with-start-end
+++ a/mm/highmem.c
@@ -368,20 +368,24 @@ void zero_user_segments(struct page *pag
BUG_ON(end1 > page_size(page) || end2 > page_size(page));
+ if (start1 >= end1)
+ start1 = end1 = 0;
+ if (start2 >= end2)
+ start2 = end2 = 0;
+
for (i = 0; i < compound_nr(page); i++) {
void *kaddr = NULL;
- if (start1 < PAGE_SIZE || start2 < PAGE_SIZE)
- kaddr = kmap_atomic(page + i);
-
if (start1 >= PAGE_SIZE) {
start1 -= PAGE_SIZE;
end1 -= PAGE_SIZE;
} else {
unsigned this_end = min_t(unsigned, end1, PAGE_SIZE);
- if (end1 > start1)
+ if (end1 > start1) {
+ kaddr = kmap_atomic(page + i);
memset(kaddr + start1, 0, this_end - start1);
+ }
end1 -= this_end;
start1 = 0;
}
@@ -392,8 +396,11 @@ void zero_user_segments(struct page *pag
} else {
unsigned this_end = min_t(unsigned, end2, PAGE_SIZE);
- if (end2 > start2)
+ if (end2 > start2) {
+ if (!kaddr)
+ kaddr = kmap_atomic(page + i);
memset(kaddr + start2, 0, this_end - start2);
+ }
end2 -= this_end;
start2 = 0;
}
_
Patches currently in -mm which might be from hirofumi(a)mail.parknet.co.jp are
The patch titled
Subject: mm/page_alloc.c: refactor initialization of struct page for holes in memory layout
has been removed from the -mm tree. Its filename was
mm-page_allocc-refactor-initialization-of-struct-page-for-holes-in-memory-layout.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Mike Rapoport <rppt(a)linux.ibm.com>
Subject: mm/page_alloc.c: refactor initialization of struct page for holes in memory layout
There could be struct pages that are not backed by actual physical memory.
This can happen when the actual memory bank is not a multiple of
SECTION_SIZE or when an architecture does not register memory holes
reserved by the firmware as memblock.memory.
Such pages are currently initialized using init_unavailable_mem() function
that iterates through PFNs in holes in memblock.memory and if there is a
struct page corresponding to a PFN, the fields of this page are set to
default values and it is marked as Reserved.
init_unavailable_mem() does not take into account zone and node the page
belongs to and sets both zone and node links in struct page to zero.
Before commit 73a6e474cb37 ("mm: memmap_init: iterate over memblock
regions rather that check each PFN") the holes inside a zone were
re-initialized during memmap_init() and got their zone/node links right.
However, after that commit nothing updates the struct pages representing
such holes.
On a system that has firmware reserved holes in a zone above ZONE_DMA, for
instance in a configuration below:
# grep -A1 E820 /proc/iomem
7a17b000-7a216fff : Unknown E820 type
7a217000-7bffffff : System RAM
unset zone link in struct page will trigger
VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
in set_pfnblock_flags_mask() when called with a struct page from a range
other than E820_TYPE_RAM because there are pages in the range of
ZONE_DMA32 but the unset zone link in struct page makes them appear as a
part of ZONE_DMA.
Interleave initialization of the unavailable pages with the normal
initialization of memory map, so that zone and node information will be
properly set on struct pages that are not backed by the actual memory.
With this change the pages for holes inside a zone will get proper
zone/node links and the pages that are not spanned by any node will get
links to the adjacent zone/node. Holes between nodes will be prepended to
the zone/node above the hole, and the trailing pages in the last section
will be appended to the zone/node below.
[akpm(a)linux-foundation.org: don't initialize static to zero, use %llu for u64]
Link: https://lkml.kernel.org/r/20210225224351.7356-2-rppt@kernel.org
Fixes: 73a6e474cb37 ("mm: memmap_init: iterate over memblock regions rather that check each PFN")
Signed-off-by: Mike Rapoport <rppt(a)linux.ibm.com>
Reported-by: Qian Cai <cai(a)lca.pw>
Reported-by: Andrea Arcangeli <aarcange(a)redhat.com>
Reviewed-by: Baoquan He <bhe(a)redhat.com>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Łukasz Majczak <lma(a)semihalf.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: "Sarvela, Tomi P" <tomi.p.sarvela(a)intel.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 158 +++++++++++++++++++++-------------------------
1 file changed, 75 insertions(+), 83 deletions(-)
--- a/mm/page_alloc.c~mm-page_allocc-refactor-initialization-of-struct-page-for-holes-in-memory-layout
+++ a/mm/page_alloc.c
@@ -6259,12 +6259,65 @@ static void __meminit zone_init_free_lis
}
}
+#if !defined(CONFIG_FLAT_NODE_MEM_MAP)
+/*
+ * Only struct pages that correspond to ranges defined by memblock.memory
+ * are zeroed and initialized by going through __init_single_page() during
+ * memmap_init_zone().
+ *
+ * But, there could be struct pages that correspond to holes in
+ * memblock.memory. This can happen because of the following reasons:
+ * - physical memory bank size is not necessarily the exact multiple of the
+ * arbitrary section size
+ * - early reserved memory may not be listed in memblock.memory
+ * - memory layouts defined with memmap= kernel parameter may not align
+ * nicely with memmap sections
+ *
+ * Explicitly initialize those struct pages so that:
+ * - PG_Reserved is set
+ * - zone and node links point to zone and node that span the page if the
+ * hole is in the middle of a zone
+ * - zone and node links point to adjacent zone/node if the hole falls on
+ * the zone boundary; the pages in such holes will be prepended to the
+ * zone/node above the hole except for the trailing pages in the last
+ * section that will be appended to the zone/node below.
+ */
+static u64 __meminit init_unavailable_range(unsigned long spfn,
+ unsigned long epfn,
+ int zone, int node)
+{
+ unsigned long pfn;
+ u64 pgcnt = 0;
+
+ for (pfn = spfn; pfn < epfn; pfn++) {
+ if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages))) {
+ pfn = ALIGN_DOWN(pfn, pageblock_nr_pages)
+ + pageblock_nr_pages - 1;
+ continue;
+ }
+ __init_single_page(pfn_to_page(pfn), pfn, zone, node);
+ __SetPageReserved(pfn_to_page(pfn));
+ pgcnt++;
+ }
+
+ return pgcnt;
+}
+#else
+static inline u64 init_unavailable_range(unsigned long spfn, unsigned long epfn,
+ int zone, int node)
+{
+ return 0;
+}
+#endif
+
void __meminit __weak memmap_init_zone(struct zone *zone)
{
unsigned long zone_start_pfn = zone->zone_start_pfn;
unsigned long zone_end_pfn = zone_start_pfn + zone->spanned_pages;
int i, nid = zone_to_nid(zone), zone_id = zone_idx(zone);
+ static unsigned long hole_pfn;
unsigned long start_pfn, end_pfn;
+ u64 pgcnt = 0;
for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
start_pfn = clamp(start_pfn, zone_start_pfn, zone_end_pfn);
@@ -6274,7 +6327,29 @@ void __meminit __weak memmap_init_zone(s
memmap_init_range(end_pfn - start_pfn, nid,
zone_id, start_pfn, zone_end_pfn,
MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
+
+ if (hole_pfn < start_pfn)
+ pgcnt += init_unavailable_range(hole_pfn, start_pfn,
+ zone_id, nid);
+ hole_pfn = end_pfn;
}
+
+#ifdef CONFIG_SPARSEMEM
+ /*
+ * Initialize the hole in the range [zone_end_pfn, section_end].
+ * If zone boundary falls in the middle of a section, this hole
+ * will be re-initialized during the call to this function for the
+ * higher zone.
+ */
+ end_pfn = round_up(zone_end_pfn, PAGES_PER_SECTION);
+ if (hole_pfn < end_pfn)
+ pgcnt += init_unavailable_range(hole_pfn, end_pfn,
+ zone_id, nid);
+#endif
+
+ if (pgcnt)
+ pr_info(" %s zone: %llu pages in unavailable ranges\n",
+ zone->name, pgcnt);
}
static int zone_batchsize(struct zone *zone)
@@ -7071,88 +7146,6 @@ void __init free_area_init_memoryless_no
free_area_init_node(nid);
}
-#if !defined(CONFIG_FLAT_NODE_MEM_MAP)
-/*
- * Initialize all valid struct pages in the range [spfn, epfn) and mark them
- * PageReserved(). Return the number of struct pages that were initialized.
- */
-static u64 __init init_unavailable_range(unsigned long spfn, unsigned long epfn)
-{
- unsigned long pfn;
- u64 pgcnt = 0;
-
- for (pfn = spfn; pfn < epfn; pfn++) {
- if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages))) {
- pfn = ALIGN_DOWN(pfn, pageblock_nr_pages)
- + pageblock_nr_pages - 1;
- continue;
- }
- /*
- * Use a fake node/zone (0) for now. Some of these pages
- * (in memblock.reserved but not in memblock.memory) will
- * get re-initialized via reserve_bootmem_region() later.
- */
- __init_single_page(pfn_to_page(pfn), pfn, 0, 0);
- __SetPageReserved(pfn_to_page(pfn));
- pgcnt++;
- }
-
- return pgcnt;
-}
-
-/*
- * Only struct pages that are backed by physical memory are zeroed and
- * initialized by going through __init_single_page(). But, there are some
- * struct pages which are reserved in memblock allocator and their fields
- * may be accessed (for example page_to_pfn() on some configuration accesses
- * flags). We must explicitly initialize those struct pages.
- *
- * This function also addresses a similar issue where struct pages are left
- * uninitialized because the physical address range is not covered by
- * memblock.memory or memblock.reserved. That could happen when memblock
- * layout is manually configured via memmap=, or when the highest physical
- * address (max_pfn) does not end on a section boundary.
- */
-static void __init init_unavailable_mem(void)
-{
- phys_addr_t start, end;
- u64 i, pgcnt;
- phys_addr_t next = 0;
-
- /*
- * Loop through unavailable ranges not covered by memblock.memory.
- */
- pgcnt = 0;
- for_each_mem_range(i, &start, &end) {
- if (next < start)
- pgcnt += init_unavailable_range(PFN_DOWN(next),
- PFN_UP(start));
- next = end;
- }
-
- /*
- * Early sections always have a fully populated memmap for the whole
- * section - see pfn_valid(). If the last section has holes at the
- * end and that section is marked "online", the memmap will be
- * considered initialized. Make sure that memmap has a well defined
- * state.
- */
- pgcnt += init_unavailable_range(PFN_DOWN(next),
- round_up(max_pfn, PAGES_PER_SECTION));
-
- /*
- * Struct pages that do not have backing memory. This could be because
- * firmware is using some of this memory, or for some other reasons.
- */
- if (pgcnt)
- pr_info("Zeroed struct page in unavailable ranges: %lld pages", pgcnt);
-}
-#else
-static inline void __init init_unavailable_mem(void)
-{
-}
-#endif /* !CONFIG_FLAT_NODE_MEM_MAP */
-
#if MAX_NUMNODES > 1
/*
* Figure out the number of possible node ids.
@@ -7576,7 +7569,6 @@ void __init free_area_init(unsigned long
/* Initialise every node */
mminit_verify_pageflags_layout();
setup_nr_node_ids();
- init_unavailable_mem();
for_each_online_node(nid) {
pg_data_t *pgdat = NODE_DATA(nid);
free_area_init_node(nid);
_
Patches currently in -mm which might be from rppt(a)linux.ibm.com are
Drivers that do not use the ctrl-framework use this function instead.
- Return an error when handling REQUEST_VAL.
- Do not check for multiple classes when getting the DEF_VAL.
Fixes v4l2-compliance:
Control ioctls (Input 0):
fail: v4l2-test-controls.cpp(813): doioctl(node, VIDIOC_G_EXT_CTRLS, &ctrls)
test VIDIOC_G/S/TRY_EXT_CTRLS: FAIL
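For reference, a hedged sketch of the kind of query v4l2-compliance issues
here: reading a control's default value with V4L2_CTRL_WHICH_DEF_VAL. The
device path, control id and function name are arbitrary examples:
	#include <fcntl.h>
	#include <string.h>
	#include <sys/ioctl.h>
	#include <unistd.h>
	#include <linux/videodev2.h>

	/* Query the default value of one control; with the fix below this must
	 * work without being routed through the request API. */
	int read_default_brightness(const char *dev)
	{
		struct v4l2_ext_control ctrl = { .id = V4L2_CID_BRIGHTNESS };
		struct v4l2_ext_controls ctrls;
		int fd = open(dev, O_RDWR);
		int ret;

		if (fd < 0)
			return -1;

		memset(&ctrls, 0, sizeof(ctrls));
		ctrls.which = V4L2_CTRL_WHICH_DEF_VAL;
		ctrls.count = 1;
		ctrls.controls = &ctrl;

		ret = ioctl(fd, VIDIOC_G_EXT_CTRLS, &ctrls);
		close(fd);
		return ret < 0 ? -1 : ctrl.value;
	}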
Cc: stable(a)vger.kernel.org
Fixes: 6fa6f831f095 ("media: v4l2-ctrls: add core request support")
Suggested-by: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
Signed-off-by: Ricardo Ribalda <ribalda(a)chromium.org>
Reviewed-by: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
---
drivers/media/v4l2-core/v4l2-ioctl.c | 25 +++++++++++++++++--------
1 file changed, 17 insertions(+), 8 deletions(-)
diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c b/drivers/media/v4l2-core/v4l2-ioctl.c
index 31d1342e61e8..9406e90ff805 100644
--- a/drivers/media/v4l2-core/v4l2-ioctl.c
+++ b/drivers/media/v4l2-core/v4l2-ioctl.c
@@ -917,15 +917,24 @@ static int check_ext_ctrls(struct v4l2_ext_controls *c, int allow_priv)
for (i = 0; i < c->count; i++)
c->controls[i].reserved2[0] = 0;
- /* V4L2_CID_PRIVATE_BASE cannot be used as control class
- when using extended controls.
- Only when passed in through VIDIOC_G_CTRL and VIDIOC_S_CTRL
- is it allowed for backwards compatibility.
- */
- if (!allow_priv && c->which == V4L2_CID_PRIVATE_BASE)
- return 0;
- if (!c->which)
+ switch (c->which) {
+ case V4L2_CID_PRIVATE_BASE:
+ /*
+ * V4L2_CID_PRIVATE_BASE cannot be used as control class
+ * when using extended controls.
+ * Only when passed in through VIDIOC_G_CTRL and VIDIOC_S_CTRL
+ * is it allowed for backwards compatibility.
+ */
+ if (!allow_priv)
+ return 0;
+ break;
+ case V4L2_CTRL_WHICH_DEF_VAL:
+ case V4L2_CTRL_WHICH_CUR_VAL:
return 1;
+ case V4L2_CTRL_WHICH_REQUEST_VAL:
+ return 0;
+ }
+
/* Check that all controls are from the same control class. */
for (i = 0; i < c->count; i++) {
if (V4L2_CTRL_ID2WHICH(c->controls[i].id) != c->which) {
--
2.31.0.rc2.261.g7f71774620-goog
Please apply the following two upstream commits (attached), in this order:
d567572906d9 nvme: unlink head after removing last namespace
ac262508daa8 nvme: release namespace head reference on error
TO: v5.4, v5.5, v5.6, v5.7
These commits are present in v5.8 and apply cleanly to the above.
Reason:
These fix a potential crash or malfunction when an nvme namespace is deleted
and then a new namespace with the same nsid is created before the old ns_head
for this nsid is gone.
The first commit prevents the new namespace from being matched by
nvme_init_ns_head() with the old ns_head, which would cause an ID mismatch and
consequently a failure to initialize the new namespace.
The second commit prevents an ns_head refcount imbalance in case
nvme_init_ns_head() detects an ID mismatch, and consequently a crash later.