The previous patch hooked arch_do_swap_page() so that MTE can restore tags
before they are freed.
However, the arch_do_swap_page() hook API is incompatible with swap
restoration in circumstances where we do not have an mm or a vma,
such as swapoff with swapped-out shmem, and I expect that ADI will
currently fail to restore tags in those circumstances. This implies that
arch-specific metadata stores ought to be indexed by swap entry, as MTE's
is, rather than by mm and vma, as ADI's is, and that we should discourage
hooking arch_do_swap_page(), preferring to hook arch_swap_restore()
instead, as MTE already does.
Therefore, instead of directly hooking arch_do_swap_page() for
MTE, deprecate that hook, change its default implementation to call
arch_swap_restore(), and rely on MTE's existing implementation of the
latter.
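For illustration, here is roughly how the existing swap-in path picks up
the new default (a sketch based on my reading of mm/memory.c, not part of
this diff):

	/* do_swap_page(), unchanged by this patch: */
	arch_do_swap_page(vma->vm_mm, vma, vmf->address, pte, vmf->orig_pte);
	/* ...which, with the new default in the hunk below, expands to: */
	arch_swap_restore(pte_to_swp_entry(vmf->orig_pte),
			  page_folio(pte_page(pte)));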
Fixes: c145e0b47c77 ("mm: streamline COW logic in do_swap_page()")
Link: https://linux-review.googlesource.com/id/Id2f1ad76eaf606ae210e1d2dd0b7fe287…
Signed-off-by: Peter Collingbourne <pcc(a)google.com>
Reported-by: Qun-wei Lin (林群崴) <Qun-wei.Lin(a)mediatek.com>
Link: https://lore.kernel.org/all/5050805753ac469e8d727c797c2218a9d780d434.camel@…
Cc: <stable(a)vger.kernel.org> # 6.1
---
include/linux/pgtable.h | 26 +++++++++++++-------------
1 file changed, 13 insertions(+), 13 deletions(-)
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index c63cd44777ec..fc0259cf60fb 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -740,6 +740,12 @@ static inline int pgd_same(pgd_t pgd_a, pgd_t pgd_b)
set_pgd(pgdp, pgd); \
})
+#ifndef __HAVE_ARCH_SWAP_RESTORE
+static inline void arch_swap_restore(swp_entry_t entry, struct folio *folio)
+{
+}
+#endif
+
#ifndef __HAVE_ARCH_DO_SWAP_PAGE
/*
* Some architectures support metadata associated with a page. When a
@@ -748,14 +754,14 @@ static inline int pgd_same(pgd_t pgd_a, pgd_t pgd_b)
* processors support an ADI (Application Data Integrity) tag for the
* page as metadata for the page. arch_do_swap_page() can restore this
* metadata when a page is swapped back in.
+ *
+ * This hook is deprecated. Architectures should hook arch_swap_restore()
+ * instead, because this hook is not called on all code paths that can
+ * swap in a page, particularly those where mm and vma are not available
+ * (e.g. swapoff for shmem pages).
*/
-static inline void arch_do_swap_page(struct mm_struct *mm,
- struct vm_area_struct *vma,
- unsigned long addr,
- pte_t pte, pte_t oldpte)
-{
-
-}
+#define arch_do_swap_page(mm, vma, addr, pte, oldpte) \
+ arch_swap_restore(pte_to_swp_entry(oldpte), page_folio(pte_page(pte)))
#endif
#ifndef __HAVE_ARCH_UNMAP_ONE
@@ -798,12 +804,6 @@ static inline void arch_swap_invalidate_area(int type)
}
#endif
-#ifndef __HAVE_ARCH_SWAP_RESTORE
-static inline void arch_swap_restore(swp_entry_t entry, struct folio *folio)
-{
-}
-#endif
-
#ifndef __HAVE_ARCH_PGD_OFFSET_GATE
#define pgd_offset_gate(mm, addr) pgd_offset(mm, addr)
#endif
--
2.40.1.606.ga4b1b128d6-goog
Although CONFIG_DEVICE_PRIVATE, hmm_range_fault() and related
functionality were first developed on x86, they also work on arm64.
However, when trying this out on an arm64 system, it turns out that
there is a massive slowdown during the setup and teardown phases.
This slowdown is due to lots of calls to WARN_ON() that check for pages
that are outside the CPU's physical address range. However, that is a
design feature of device private pages: they are specifically chosen to
be outside the range of the CPU's true physical pages.
x86 doesn't have this warning. It only checks that pages are properly
aligned. Below is a comparison between x86 (which works well) and arm64
(which issues these warnings):
memunmap_pages()
  pageunmap_range()
    if (pgmap->type == MEMORY_DEVICE_PRIVATE)
      __remove_pages()
        __remove_section()
          sparse_remove_section()
            section_deactivate()
              depopulate_section_memmap()
                /* arch/arm64/mm/mmu.c */
                vmemmap_free()
                {
                    WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
                    ...
                }
/* arch/x86/mm/init_64.c */
vmemmap_free()
{
    VM_BUG_ON(!PAGE_ALIGNED(start));
    VM_BUG_ON(!PAGE_ALIGNED(end));
    ...
}
So, the warning is a false positive for this case. Therefore, skip the
warning if CONFIG_DEVICE_PRIVATE is set.
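For context, here is a rough sketch (not from this patch; my_dev and
my_pgmap_ops are placeholders) of how a driver typically creates device
private pages, which is why their addresses land outside the CPU's
physical range:

	struct dev_pagemap *pgmap = &my_dev->pgmap;
	struct resource *res;
	void *addr;

	res = request_free_mem_region(&iomem_resource, size, "device-private");
	pgmap->type = MEMORY_DEVICE_PRIVATE;
	pgmap->range.start = res->start;	/* beyond true physical pages */
	pgmap->range.end = res->end;
	pgmap->nr_range = 1;
	pgmap->ops = &my_pgmap_ops;		/* placeholder ops table */
	addr = memremap_pages(pgmap, numa_node_id());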
Signed-off-by: John Hubbard <jhubbard(a)nvidia.com>
cc: <stable(a)vger.kernel.org>
---
arch/arm64/mm/mmu.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 6f9d8898a025..d5c9b611a8d1 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1157,8 +1157,10 @@ int __meminit vmemmap_check_pmd(pmd_t *pmdp, int node,
int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
struct vmem_altmap *altmap)
{
+/* Device private pages are outside of the CPU's physical page range. */
+#ifndef CONFIG_DEVICE_PRIVATE
WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
-
+#endif
if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES))
return vmemmap_populate_basepages(start, end, node, altmap);
else
@@ -1169,8 +1171,10 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
void vmemmap_free(unsigned long start, unsigned long end,
struct vmem_altmap *altmap)
{
+/* Device private pages are outside of the CPU's physical page range. */
+#ifndef CONFIG_DEVICE_PRIVATE
WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
-
+#endif
unmap_hotplug_range(start, end, true, altmap);
free_empty_tables(start, end, VMEMMAP_START, VMEMMAP_END);
}
--
2.40.0
As a result of the previous two patches, there are no circumstances
in which a swapped-in page is installed in a page table without first
having arch_swap_restore() called on it. Therefore, we no longer need
the logic in set_pte_at() that restores the tags, so remove it.
Because we can now rely on the page being locked, we no longer need to
handle the case where a page is having its tags restored by multiple tasks
concurrently, so we can slightly simplify the logic in mte_restore_tags().
This patch also fixes an issue where a page can have PG_mte_tagged set
with uninitialized tags. The issue is that the mte_sync_page_tags()
function sets PG_mte_tagged if it initializes page tags. Then we
return to mte_sync_tags(), which sets PG_mte_tagged again. At best,
this is redundant. However, it is possible for mte_sync_page_tags()
to return without having initialized tags for the page, i.e. in the
case where check_swap is true (non-compound page), is_swap_pte(old_pte)
is false and pte_is_tagged is false. So at worst, we set PG_mte_tagged
on a page with uninitialized tags. This can happen if, for example,
page migration causes a PTE for an untagged page to be replaced. If the
userspace program subsequently uses mprotect() to enable PROT_MTE for
that page, the uninitialized tags will be exposed to userspace.
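For illustration, here is a rough userspace-visible sequence that could
hit this (a sketch, not from this patch; the migration step is outside
the program's control):

	char *p = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	p[0] = 1;			/* fault in an untagged page */
	/* ... the kernel migrates the page, replacing its PTE ... */
	mprotect(p, page_size, PROT_READ | PROT_WRITE | PROT_MTE);
	/* tag checks may now see tags that were never initialized */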
Signed-off-by: Peter Collingbourne <pcc(a)google.com>
Link: https://linux-review.googlesource.com/id/I8ad54476f3b2d0144ccd8ce0c1d7a2963…
Fixes: e059853d14ca ("arm64: mte: Fix/clarify the PG_mte_tagged semantics")
Cc: <stable(a)vger.kernel.org> # 6.1
---
The Fixes: tag (and the commit message in general) is written assuming
that this patch lands in a maintainer tree instead of
"arm64: mte: Do not set PG_mte_tagged if tags were not initialized".
arch/arm64/include/asm/mte.h | 4 ++--
arch/arm64/include/asm/pgtable.h | 14 ++------------
arch/arm64/kernel/mte.c | 32 +++-----------------------------
arch/arm64/mm/mteswap.c | 7 +++----
4 files changed, 10 insertions(+), 47 deletions(-)
diff --git a/arch/arm64/include/asm/mte.h b/arch/arm64/include/asm/mte.h
index 20dd06d70af5..dfea486a6a85 100644
--- a/arch/arm64/include/asm/mte.h
+++ b/arch/arm64/include/asm/mte.h
@@ -90,7 +90,7 @@ static inline bool try_page_mte_tagging(struct page *page)
}
void mte_zero_clear_page_tags(void *addr);
-void mte_sync_tags(pte_t old_pte, pte_t pte);
+void mte_sync_tags(pte_t pte);
void mte_copy_page_tags(void *kto, const void *kfrom);
void mte_thread_init_user(void);
void mte_thread_switch(struct task_struct *next);
@@ -122,7 +122,7 @@ static inline bool try_page_mte_tagging(struct page *page)
static inline void mte_zero_clear_page_tags(void *addr)
{
}
-static inline void mte_sync_tags(pte_t old_pte, pte_t pte)
+static inline void mte_sync_tags(pte_t pte)
{
}
static inline void mte_copy_page_tags(void *kto, const void *kfrom)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index b6ba466e2e8a..efdf48392026 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -337,18 +337,8 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
* don't expose tags (instruction fetches don't check tags).
*/
if (system_supports_mte() && pte_access_permitted(pte, false) &&
- !pte_special(pte)) {
- pte_t old_pte = READ_ONCE(*ptep);
- /*
- * We only need to synchronise if the new PTE has tags enabled
- * or if swapping in (in which case another mapping may have
- * set tags in the past even if this PTE isn't tagged).
- * (!pte_none() && !pte_present()) is an open coded version of
- * is_swap_pte()
- */
- if (pte_tagged(pte) || (!pte_none(old_pte) && !pte_present(old_pte)))
- mte_sync_tags(old_pte, pte);
- }
+ !pte_special(pte) && pte_tagged(pte))
+ mte_sync_tags(pte);
__check_safe_pte_update(mm, ptep, pte);
diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
index f5bcb0dc6267..c40728046fed 100644
--- a/arch/arm64/kernel/mte.c
+++ b/arch/arm64/kernel/mte.c
@@ -35,41 +35,15 @@ DEFINE_STATIC_KEY_FALSE(mte_async_or_asymm_mode);
EXPORT_SYMBOL_GPL(mte_async_or_asymm_mode);
#endif
-static void mte_sync_page_tags(struct page *page, pte_t old_pte,
- bool check_swap, bool pte_is_tagged)
-{
- if (check_swap && is_swap_pte(old_pte)) {
- swp_entry_t entry = pte_to_swp_entry(old_pte);
-
- if (!non_swap_entry(entry))
- mte_restore_tags(entry, page);
- }
-
- if (!pte_is_tagged)
- return;
-
- if (try_page_mte_tagging(page)) {
- mte_clear_page_tags(page_address(page));
- set_page_mte_tagged(page);
- }
-}
-
-void mte_sync_tags(pte_t old_pte, pte_t pte)
+void mte_sync_tags(pte_t pte)
{
struct page *page = pte_page(pte);
long i, nr_pages = compound_nr(page);
- bool check_swap = nr_pages == 1;
- bool pte_is_tagged = pte_tagged(pte);
-
- /* Early out if there's nothing to do */
- if (!check_swap && !pte_is_tagged)
- return;
/* if PG_mte_tagged is set, tags have already been initialised */
for (i = 0; i < nr_pages; i++, page++) {
- if (!page_mte_tagged(page)) {
- mte_sync_page_tags(page, old_pte, check_swap,
- pte_is_tagged);
+ if (try_page_mte_tagging(page)) {
+ mte_clear_page_tags(page_address(page));
set_page_mte_tagged(page);
}
}
diff --git a/arch/arm64/mm/mteswap.c b/arch/arm64/mm/mteswap.c
index cd508ba80ab1..3a78bf1b1364 100644
--- a/arch/arm64/mm/mteswap.c
+++ b/arch/arm64/mm/mteswap.c
@@ -53,10 +53,9 @@ void mte_restore_tags(swp_entry_t entry, struct page *page)
if (!tags)
return;
- if (try_page_mte_tagging(page)) {
- mte_restore_page_tags(page_address(page), tags);
- set_page_mte_tagged(page);
- }
+ WARN_ON_ONCE(!try_page_mte_tagging(page));
+ mte_restore_page_tags(page_address(page), tags);
+ set_page_mte_tagged(page);
}
void mte_invalidate_tags(int type, pgoff_t offset)
--
2.40.1.606.ga4b1b128d6-goog
The patch titled
Subject: kasan: add kasan_tag_mismatch prototype
has been added to the -mm mm-unstable branch. Its filename is
kasan-add-kasan_tag_mismatch-prototype.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Arnd Bergmann <arnd(a)arndb.de>
Subject: kasan: add kasan_tag_mismatch prototype
Date: Tue, 9 May 2023 16:57:20 +0200
The kasan sw-tags implementation contains one function that is only called
from assembler and has no prototype in a header. This causes a W=1
warning:
mm/kasan/sw_tags.c:171:6: warning: no previous prototype for 'kasan_tag_mismatch' [-Wmissing-prototypes]
171 | void kasan_tag_mismatch(unsigned long addr, unsigned long access_info,
Add a prototype in the local header to get a clean build.
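For reference, the definition in mm/kasan/sw_tags.c looks roughly like
this (reproduced from memory, so treat the body as approximate); it is
reached only from the arm64 software tag-based KASAN abort path in
assembly, which is why no C translation unit needed a declaration until
now:

	void kasan_tag_mismatch(unsigned long addr, unsigned long access_info,
				unsigned long ret_ip)
	{
		kasan_report(addr, 1 << (access_info & 0xf),
			     access_info & 0x10, ret_ip);
	}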
Link: https://lkml.kernel.org/r/20230509145735.9263-1-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Andrey Konovalov <andreyknvl(a)gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a(a)gmail.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Marco Elver <elver(a)google.com>
Cc: Vincenzo Frascino <vincenzo.frascino(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kasan/kasan.h | 3 +++
1 file changed, 3 insertions(+)
--- a/mm/kasan/kasan.h~kasan-add-kasan_tag_mismatch-prototype
+++ a/mm/kasan/kasan.h
@@ -646,4 +646,7 @@ void *__hwasan_memset(void *addr, int c,
void *__hwasan_memmove(void *dest, const void *src, size_t len);
void *__hwasan_memcpy(void *dest, const void *src, size_t len);
+void kasan_tag_mismatch(unsigned long addr, unsigned long access_info,
+ unsigned long ret_ip);
+
#endif /* __MM_KASAN_KASAN_H */
_
Patches currently in -mm which might be from arnd(a)arndb.de are
kasan-add-kasan_tag_mismatch-prototype.patch
kasan-use-internal-prototypes-matching-gcc-13-builtins.patch
The patch titled
Subject: nilfs2: fix use-after-free bug of nilfs_root in nilfs_evict_inode()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
nilfs2-fix-use-after-free-bug-of-nilfs_root-in-nilfs_evict_inode.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix use-after-free bug of nilfs_root in nilfs_evict_inode()
Date: Wed, 10 May 2023 00:29:56 +0900
During the unmount process of nilfs2, nothing holds the nilfs_root
structure after nilfs2 detaches its writer in nilfs_detach_log_writer().
However, since nilfs_evict_inode() uses nilfs_root for some cleanup
operations, a use-after-free read may occur if inodes are left in
"garbage_list" and released by nilfs_dispose_list() at the end of
nilfs_detach_log_writer().
Fix this issue by modifying nilfs_evict_inode() to only clear the inode,
without additional metadata changes that use nilfs_root, if the file
system has been degraded to read-only or the writer has been detached.
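Roughly, the problematic teardown path is the following (reconstructed
from the description above, not from the diff):

	nilfs_detach_log_writer()
	  nilfs_dispose_list()		/* drops inodes left on the list */
	    iput()
	      evict()
	        nilfs_evict_inode()	/* still dereferences ii->i_root,
					   which nothing pins any more */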
Link: https://lkml.kernel.org/r/20230509152956.8313-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+78d4495558999f55d1da(a)syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/00000000000099e5ac05fb1c3b85@google.com
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/inode.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
--- a/fs/nilfs2/inode.c~nilfs2-fix-use-after-free-bug-of-nilfs_root-in-nilfs_evict_inode
+++ a/fs/nilfs2/inode.c
@@ -917,6 +917,7 @@ void nilfs_evict_inode(struct inode *ino
struct nilfs_transaction_info ti;
struct super_block *sb = inode->i_sb;
struct nilfs_inode_info *ii = NILFS_I(inode);
+ struct the_nilfs *nilfs;
int ret;
if (inode->i_nlink || !ii->i_root || unlikely(is_bad_inode(inode))) {
@@ -929,6 +930,23 @@ void nilfs_evict_inode(struct inode *ino
truncate_inode_pages_final(&inode->i_data);
+ nilfs = sb->s_fs_info;
+ if (unlikely(sb_rdonly(sb) || !nilfs->ns_writer)) {
+ /*
+ * If this inode is about to be disposed after the file system
+ * has been degraded to read-only due to file system corruption
+ * or after the writer has been detached, do not make any
+ * changes that cause writes, just clear it.
+ * Do this check after read-locking ns_segctor_sem by
+ * nilfs_transaction_begin() in order to avoid a race with
+ * the writer detach operation.
+ */
+ clear_inode(inode);
+ nilfs_clear_inode(inode);
+ nilfs_transaction_abort(sb);
+ return;
+ }
+
/* TODO: some of the following operations may fail. */
nilfs_truncate_bmap(ii, 0);
nilfs_mark_inode_dirty(inode);
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
nilfs2-fix-use-after-free-bug-of-nilfs_root-in-nilfs_evict_inode.patch