The patch titled
Subject: btrfs: avoid live-lock in search_ioctl() on hardware with sub-page faults
has been removed from the -mm tree. Its filename was
btrfs-avoid-live-lock-in-search_ioctl-on-hardware-with-sub-page-faults.patch
This patch was dropped because an updated version will be merged
------------------------------------------------------
From: Catalin Marinas <catalin.marinas@arm.com>
Subject: btrfs: avoid live-lock in search_ioctl() on hardware with sub-page faults
Commit a48b73eca4ce ("btrfs: fix potential deadlock in the search ioctl")
addressed a lockdep warning by pre-faulting the user pages and attempting
the copy_to_user_nofault() in an infinite loop. On architectures like
arm64 with MTE, an access may fault within a page at a location different
from what fault_in_writeable() probed. Since the sk_offset is rewound to
the previous struct btrfs_ioctl_search_header boundary, there is no
guaranteed forward progress and search_ioctl() may live-lock.
Use fault_in_exact_writeable() instead, which probes the entire user
buffer for faults at sub-page granularity.
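As a stand-alone illustration (user-space C simulation, not kernel code;
the page size, granule size, and fault offset are invented for the
example), the following sketch shows why a page-granularity probe plus a
rewinding copy loop never makes progress, while a granule-granularity
probe detects the inaccessible range up front:

/*
 * Illustrative simulation: a page-granular probe "succeeds" while the
 * actual access faults at a 16-byte granule inside the page. Because
 * search_ioctl() rewinds sk_offset to a record boundary, the same
 * probe/copy sequence would repeat forever; probing at granule
 * granularity reports the fault before the copy is retried.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define PAGE_SIZE	4096
#define GRANULE		16	/* MTE tag granule */
#define BAD_OFFSET	2048	/* simulated mismatched tag inside the page */

/* Simulated access check: faults only at one sub-page granule. */
static bool granule_ok(size_t off)
{
	return off / GRANULE != BAD_OFFSET / GRANULE;
}

/* Page-granular probe: one byte per page, like fault_in_writeable(). */
static bool probe_pages(size_t start, size_t len)
{
	for (size_t off = start; off < start + len; off += PAGE_SIZE)
		if (!granule_ok(off))
			return false;
	return true;
}

/* Granule-granular probe, like fault_in_exact_writeable(). */
static bool probe_exact(size_t start, size_t len)
{
	for (size_t off = start; off < start + len; off += GRANULE)
		if (!granule_ok(off))
			return false;
	return true;
}

int main(void)
{
	size_t len = 2 * PAGE_SIZE;

	/* The coarse probe passes even though a copy would fault mid-page. */
	printf("page-granular probe: %s\n",
	       probe_pages(0, len) ? "ok (copy will still fault)" : "fault");
	/* The exact probe reports the failure, so the loop can bail out. */
	printf("granule probe:       %s\n",
	       probe_exact(0, len) ? "ok" : "fault (return -EFAULT)");
	return 0;
}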
Link: https://lkml.kernel.org/r/20211124192024.2408218-4-catalin.marinas@arm.com
Fixes: a48b73eca4ce ("btrfs: fix potential deadlock in the search ioctl")
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: David Sterba <dsterba@suse.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
fs/btrfs/ioctl.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/fs/btrfs/ioctl.c~btrfs-avoid-live-lock-in-search_ioctl-on-hardware-with-sub-page-faults
+++ a/fs/btrfs/ioctl.c
@@ -2225,7 +2225,8 @@ static noinline int search_ioctl(struct
 	while (1) {
 		ret = -EFAULT;
-		if (fault_in_writeable(ubuf + sk_offset, *buf_size - sk_offset))
+		if (fault_in_exact_writeable(ubuf + sk_offset,
+					     *buf_size - sk_offset))
 			break;
 
 		ret = btrfs_search_forward(root, &key, path, sk->min_transid);
_
Patches currently in -mm which might be from catalin.marinas@arm.com are
The patch titled
Subject: arm64: add support for sub-page faults user probing
has been removed from the -mm tree. Its filename was
arm64-add-support-for-sub-page-faults-user-probing.patch
This patch was dropped because an updated version will be merged
------------------------------------------------------
From: Catalin Marinas <catalin.marinas@arm.com>
Subject: arm64: add support for sub-page faults user probing
With MTE, even if the pte allows an access, a mismatched tag somewhere
within a page can still cause a fault. Select ARCH_HAS_SUBPAGE_FAULTS if
MTE is enabled and implement probe_user_writable().
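As a rough illustration, this stand-alone C sketch (a user-space
simulation; load_ok() and the faulting granule are invented for the
example) mirrors the probing loop implemented in the hunk below: align
the start down to the 16-byte tag granule, touch one byte per granule,
and report how many bytes were not accessible:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MTE_GRANULE_SIZE 16

/* Stand-in for a tag-checked load; faults in one simulated granule. */
static bool load_ok(uintptr_t addr)
{
	return addr / MTE_GRANULE_SIZE != 5;	/* granule 5 has a bad tag */
}

static size_t probe_range(uintptr_t uaddr, size_t size)
{
	uintptr_t end = uaddr + size;

	/* PTR_ALIGN_DOWN() equivalent: start at a granule boundary. */
	uaddr &= ~(uintptr_t)(MTE_GRANULE_SIZE - 1);
	while (uaddr < end) {
		if (!load_ok(uaddr))
			return end - uaddr;	/* bytes not accessible */
		uaddr += MTE_GRANULE_SIZE;
	}
	return 0;
}

int main(void)
{
	/* Probe 100 bytes starting mid-granule at address 40: prints 60. */
	printf("inaccessible bytes: %zu\n", probe_range(40, 100));
	return 0;
}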
Link: https://lkml.kernel.org/r/20211124192024.2408218-3-catalin.marinas@arm.com
Fixes: a48b73eca4ce ("btrfs: fix potential deadlock in the search ioctl")
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
arch/arm64/Kconfig | 1
arch/arm64/include/asm/uaccess.h | 33 +++++++++++++++++++++++++++++
2 files changed, 34 insertions(+)
--- a/arch/arm64/include/asm/uaccess.h~arm64-add-support-for-sub-page-faults-user-probing
+++ a/arch/arm64/include/asm/uaccess.h
@@ -479,4 +479,37 @@ static inline int __copy_from_user_flush
 }
 #endif
 
+#ifdef CONFIG_ARCH_HAS_SUBPAGE_FAULTS
+static inline size_t __mte_probe_user_range(const char __user *uaddr,
+					    size_t size)
+{
+	const char __user *end = uaddr + size;
+	int err = 0;
+	char val;
+
+	uaddr = PTR_ALIGN_DOWN(uaddr, MTE_GRANULE_SIZE);
+	while (uaddr < end) {
+		/*
+		 * A read is sufficient for MTE, the caller should have probed
+		 * for the pte write permission.
+		 */
+		__raw_get_user(val, uaddr, err);
+		if (err)
+			return end - uaddr;
+		uaddr += MTE_GRANULE_SIZE;
+	}
+	(void)val;
+
+	return 0;
+}
+
+static inline size_t probe_user_writable(const void __user *uaddr,
+					 size_t size)
+{
+	if (!system_supports_mte())
+		return 0;
+	return __mte_probe_user_range(uaddr, size);
+}
+#endif /* CONFIG_ARCH_HAS_SUBPAGE_FAULTS */
+
 #endif /* __ASM_UACCESS_H */
--- a/arch/arm64/Kconfig~arm64-add-support-for-sub-page-faults-user-probing
+++ a/arch/arm64/Kconfig
@@ -1777,6 +1777,7 @@ config ARM64_MTE
 	depends on AS_HAS_LSE_ATOMICS
 	# Required for tag checking in the uaccess routines
 	depends on ARM64_PAN
+	select ARCH_HAS_SUBPAGE_FAULTS
 	select ARCH_USES_HIGH_VMA_FLAGS
 	help
 	  Memory Tagging (part of the ARMv8.5 Extensions) provides
_
Patches currently in -mm which might be from catalin.marinas@arm.com are
btrfs-avoid-live-lock-in-search_ioctl-on-hardware-with-sub-page-faults.patch
The patch titled
Subject: mm: introduce fault_in_exact_writeable() to probe for sub-page faults
has been removed from the -mm tree. Its filename was
mm-introduce-fault_in_exact_writeable-to-probe-for-sub-page-faults.patch
This patch was dropped because an updated version will be merged
------------------------------------------------------
From: Catalin Marinas <catalin.marinas@arm.com>
Subject: mm: introduce fault_in_exact_writeable() to probe for sub-page faults
Patch series "Avoid live-lock in fault-in+uaccess loops with sub-page faults".
There are a few places in the filesystem layer where a uaccess is
performed in a loop with page faults disabled, together with a
fault_in_*() call to pre-fault the pages. On architectures like arm64
with MTE (memory tagging extensions) or SPARC ADI, even if the
fault_in_*() succeeded, the uaccess can still fault indefinitely.
In general this is not an issue since such code restarts the fault_in_*()
from where the uaccess failed, therefore guaranteeing forward progress.
The btrfs search_ioctl(), however, rewinds the fault_in_*() position and
it can live-lock. This was reported by Al here:
https://lore.kernel.org/r/YSqOUb7yZ7kBoKRY@zeniv-ca.linux.org.uk
There's also an analysis by Al of other fault-in places:
https://lore.kernel.org/r/YSldx9uhMYhT/G8X@zeniv-ca.linux.org.uk
and another sub-thread on the same topic:
https://lore.kernel.org/r/YXBFqD9WVuU8awIv@arm.com
So far only btrfs search_ioctl() seems to be affected and that's what this
series addresses. The existing loops like generic_perform_write() already
guarantee forward progress.
Andreas raised a concern about O_DIRECT accesses since on fault the user
address is rewound to a block size boundary. I tried ext4, btrfs and gfs2
and I could not get any of them to live-lock. Depending on the alignment
of the user buffer (page or not), I found two behaviours:
- the copy to or from the user buffer succeeds entirely if it goes
through the kernel mapping (GUP, kmap'ed page; user MTE tags are not
checked) or
- the copy partially succeeds after a few attempts at uaccess on the
same faulting address (the highest number of attempts in my tests was
11 with btrfs).
Given the high cost of such sub-page probing (which is done prior to the
uaccess), my proposal is to only change the btrfs search_ioctl() (as per
the last patch). We can extend the API and its call sites in the future
if needed, but I hope filesystems already deal with this in other ways.
This patch (of 3):
On hardware with features like arm64 MTE or SPARC ADI, an access fault can
be triggered at sub-page granularity. Depending on how the fault_in_*()
functions are used, the caller can get into a live-lock by continuously
retrying the fault-in on an address different from the one where the
uaccess failed.
In the majority of cases progress is ensured by the following conditions:
1. copy_{to,from}_user() guarantees at least one byte access if the user
address is not faulting;
2. The fault_in_*() is attempted on the next address that could not be
accessed by copy_*_user() (see the sketch after this list).
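A stand-alone sketch of that forward-progress pattern (user-space C;
copy_step() is a hypothetical stand-in for a copy_to_user() variant, and
the fault resolution is simulated):

#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Copies up to the first "faulting" byte; returns bytes NOT copied,
 * like copy_to_user(). */
static size_t copy_step(char *dst, const char *src, size_t len,
			size_t fault_at)
{
	size_t n = len < fault_at ? len : fault_at;

	memcpy(dst, src, n);
	return len - n;
}

int main(void)
{
	char src[64] = "payload", dst[64];
	size_t done = 0, len = sizeof(src);

	while (done < len) {
		/*
		 * In the kernel, a fault_in_*() call here would resolve the
		 * fault at 'done' before retrying; we simulate that by
		 * letting the next 10 bytes succeed on each pass.
		 */
		size_t left = copy_step(dst + done, src + done,
					len - done, 10);

		done += (len - done) - left;
		if (!left)
			break;
		/* Restarting at 'done', the faulting address, rather than
		 * rewinding is what guarantees forward progress. */
	}
	printf("copied %zu of %zu bytes\n", done, len);
	return 0;
}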
In the places where the above conditions are not met, or where the
fault-in/uaccess loop does not have a mechanism to bail out,
fault_in_exact_writeable() ensures that the arch code will probe the range
in question at a sub-page fault granularity (e.g. 16 bytes for arm64
MTE). For large ranges, this is significantly more expensive than the
non-exact versions which probe a single byte in each page or use GUP.
The architecture code has to select ARCH_HAS_SUBPAGE_FAULTS and implement
probe_user_writable().
Link: https://lkml.kernel.org/r/20211124192024.2408218-1-catalin.marinas@arm.com
Link: https://lkml.kernel.org/r/20211124192024.2408218-2-catalin.marinas@arm.com
Fixes: a48b73eca4ce ("btrfs: fix potential deadlock in the search ioctl")
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
arch/Kconfig | 7 +++++++
include/linux/pagemap.h | 1 +
include/linux/uaccess.h | 21 +++++++++++++++++++++
mm/gup.c | 19 +++++++++++++++++++
4 files changed, 48 insertions(+)
--- a/arch/Kconfig~mm-introduce-fault_in_exact_writeable-to-probe-for-sub-page-faults
+++ a/arch/Kconfig
@@ -27,6 +27,13 @@ config HAVE_IMA_KEXEC
 config SET_FS
 	bool
 
+config ARCH_HAS_SUBPAGE_FAULTS
+	bool
+	help
+	  Select if the architecture can check permissions at sub-page
+	  granularity (e.g. arm64 MTE). The probe_user_*() functions
+	  must be implemented.
+
 config HOTPLUG_SMT
 	bool
--- a/include/linux/pagemap.h~mm-introduce-fault_in_exact_writeable-to-probe-for-sub-page-faults
+++ a/include/linux/pagemap.h
@@ -925,6 +925,7 @@ void folio_add_wait_queue(struct folio *
  * Fault in userspace address range.
  */
 size_t fault_in_writeable(char __user *uaddr, size_t size);
+size_t fault_in_exact_writeable(char __user *uaddr, size_t size);
 size_t fault_in_safe_writeable(const char __user *uaddr, size_t size);
 size_t fault_in_readable(const char __user *uaddr, size_t size);
--- a/include/linux/uaccess.h~mm-introduce-fault_in_exact_writeable-to-probe-for-sub-page-faults
+++ a/include/linux/uaccess.h
@@ -271,6 +271,27 @@ static inline bool pagefault_disabled(vo
  */
 #define faulthandler_disabled() (pagefault_disabled() || in_atomic())
 
+#ifndef CONFIG_ARCH_HAS_SUBPAGE_FAULTS
+/**
+ * probe_user_writable: probe for sub-page faults in the user range
+ * @uaddr: start of address range
+ * @size: size of address range
+ *
+ * Returns the number of bytes not accessible (like copy_to_user() and
+ * copy_from_user()).
+ *
+ * Architectures that can generate sub-page faults (e.g. arm64 MTE) should
+ * implement this function. It is expected that the caller checked for the
+ * write permission of each page in the range either by put_user() or GUP.
+ * The architecture port can implement a more efficient get_user() probing of
+ * the range if sub-page faults are triggered by either a load or store.
+ */
+static inline size_t probe_user_writable(void __user *uaddr, size_t size)
+{
+	return 0;
+}
+#endif
+
 #ifndef ARCH_HAS_NOCACHE_UACCESS
 static inline __must_check unsigned long
--- a/mm/gup.c~mm-introduce-fault_in_exact_writeable-to-probe-for-sub-page-faults
+++ a/mm/gup.c
@@ -1699,6 +1699,25 @@ out:
 }
 EXPORT_SYMBOL(fault_in_writeable);
 
+/**
+ * fault_in_exact_writeable - fault in userspace address range for writing,
+ *			      potentially checking for sub-page faults
+ * @uaddr: start of address range
+ * @size: size of address range
+ *
+ * Returns the number of bytes not faulted in (like copy_to_user() and
+ * copy_from_user()).
+ */
+size_t fault_in_exact_writeable(char __user *uaddr, size_t size)
+{
+	size_t accessible = size - fault_in_writeable(uaddr, size);
+
+	if (accessible)
+		accessible -= probe_user_writable(uaddr, accessible);
+	return size - accessible;
+}
+EXPORT_SYMBOL(fault_in_exact_writeable);
+
 /*
  * fault_in_safe_writeable - fault in an address range for writing
  * @uaddr: start of address range
_
Patches currently in -mm which might be from catalin.marinas@arm.com are
arm64-add-support-for-sub-page-faults-user-probing.patch
btrfs-avoid-live-lock-in-search_ioctl-on-hardware-with-sub-page-faults.patch
The patch titled
Subject: mm: use compare-exchange operation to set KASAN page tag
has been added to the -mm tree. Its filename is
mm-use-compare-exchange-operation-to-set-kasan-page-tag.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-use-compare-exchange-operation…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-use-compare-exchange-operation…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Peter Collingbourne <pcc@google.com>
Subject: mm: use compare-exchange operation to set KASAN page tag
It has been reported that the tag setting operation on newly-allocated
pages can cause the page flags to be corrupted when performed concurrently
with other flag updates as a result of the use of non-atomic operations.
Fix the problem by using a compare-exchange loop to update the tag.
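As a stand-alone illustration of the pattern (user-space C11 atomics
rather than the kernel's cmpxchg(), with a made-up TAG_MASK/TAG_SHIFT
field layout; the kernel uses KASAN_TAG_MASK/KASAN_TAG_PGSHIFT on
page->flags):

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define TAG_MASK	0xffUL
#define TAG_SHIFT	48

static _Atomic unsigned long page_flags;

static void tag_set(uint8_t tag)
{
	unsigned long old_flags, flags;

	tag ^= 0xff;
	old_flags = atomic_load(&page_flags);
	do {
		/* Recompute the new value from the latest snapshot. */
		flags = old_flags;
		flags &= ~(TAG_MASK << TAG_SHIFT);
		flags |= ((unsigned long)tag & TAG_MASK) << TAG_SHIFT;
		/* On failure, old_flags is reloaded with the current value
		 * and the read-modify-write is retried. */
	} while (!atomic_compare_exchange_weak(&page_flags, &old_flags, flags));
}

int main(void)
{
	atomic_store(&page_flags, 0x1UL);	/* an unrelated flag bit */
	tag_set(0xab);
	/* The unrelated bit survives concurrent-safe tag updates. */
	printf("flags: %#lx\n", atomic_load(&page_flags));
	return 0;
}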
Link: https://lkml.kernel.org/r/20220113031434.464992-1-pcc@google.com
Link: https://linux-review.googlesource.com/id/I456b24a2b9067d93968d43b4bb3351c0c…
Signed-off-by: Peter Collingbourne <pcc@google.com>
Fixes: 2813b9c02962 ("kasan, mm, arm64: tag non slab memory allocated via pagealloc")
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/mm.h | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)
--- a/include/linux/mm.h~mm-use-compare-exchange-operation-to-set-kasan-page-tag
+++ a/include/linux/mm.h
@@ -1524,11 +1524,17 @@ static inline u8 page_kasan_tag(const st
 static inline void page_kasan_tag_set(struct page *page, u8 tag)
 {
-	if (kasan_enabled()) {
-		tag ^= 0xff;
-		page->flags &= ~(KASAN_TAG_MASK << KASAN_TAG_PGSHIFT);
-		page->flags |= (tag & KASAN_TAG_MASK) << KASAN_TAG_PGSHIFT;
-	}
+	unsigned long old_flags, flags;
+
+	if (!kasan_enabled())
+		return;
+
+	tag ^= 0xff;
+	do {
+		old_flags = flags = page->flags;
+		flags &= ~(KASAN_TAG_MASK << KASAN_TAG_PGSHIFT);
+		flags |= (tag & KASAN_TAG_MASK) << KASAN_TAG_PGSHIFT;
+	} while (unlikely(cmpxchg(&page->flags, old_flags, flags) != old_flags));
 }
 
 static inline void page_kasan_tag_reset(struct page *page)
_
Patches currently in -mm which might be from pcc@google.com are
mm-use-compare-exchange-operation-to-set-kasan-page-tag.patch