The patch titled
Subject: mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
has been added to the -mm tree. Its filename is
mm-filemap-check-if-thp-has-hwpoisoned-subpage-for-pmd-page-fault.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-filemap-check-if-thp-has-hwpoi…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-filemap-check-if-thp-has-hwpoi…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Yang Shi <shy828301(a)gmail.com>
Subject: mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
When handling a shmem page fault, a THP with a corrupted subpage could be
PMD mapped if certain conditions are satisfied. But the kernel is supposed
to send SIGBUS when trying to map a hwpoisoned page.
There are two paths which may do a PMD map: fault around and regular fault.
Before commit f9ce0be71d1f ("mm: Cleanup faultaround and finish_fault()
codepaths") things were even worse in the fault around path. The THP could
be PMD mapped as long as the VMA fit, regardless of which subpage was
accessed or corrupted. After that commit, the THP could be PMD mapped as
long as the head page is not corrupted.
In the regular fault path the THP could be PMD mapped as long as the
corrupted page is not accessed and the VMA fits.
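The conditions above can be restated as predicates. This is only a
userspace sketch of the commit message, not kernel code; struct
sketch_fault and every sketch_* name are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of one fault on a THP; not a kernel structure. */
struct sketch_fault {
	bool vma_fits;          /* the VMA can hold a PMD mapping */
	bool head_poisoned;     /* the head page is hwpoisoned */
	bool accessed_poisoned; /* the faulting subpage is hwpoisoned */
	bool any_poisoned;      /* at least one subpage is hwpoisoned */
};

/* Fault around before f9ce0be71d1f: only the VMA fit mattered. */
static bool sketch_faultaround_old(const struct sketch_fault *f)
{
	return f->vma_fits;
}

/* Fault around after f9ce0be71d1f: the head page is checked as well. */
static bool sketch_faultaround_new(const struct sketch_fault *f)
{
	return f->vma_fits && !f->head_poisoned;
}

/* Regular fault: only the accessed subpage is checked. */
static bool sketch_regular_fault(const struct sketch_fault *f)
{
	return f->vma_fits && !f->accessed_poisoned;
}

/* Expected behaviour: never PMD map a THP with any poisoned subpage. */
static bool sketch_expected(const struct sketch_fault *f)
{
	return f->vma_fits && !f->any_poisoned;
}
```

A THP whose only poisoned subpage is neither the head nor the accessed
page passes the first three predicates but fails the last one, which is
the loophole.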
This loophole could be fixed by iterating over every subpage to check
whether any of them is hwpoisoned, but that is somewhat costly in the page
fault path. So introduce a new page flag called HasHWPoisoned on the first
tail page. It indicates that the THP has hwpoisoned subpage(s). It is set
if any subpage of the THP is found hwpoisoned by memory failure, after the
refcount is bumped successfully, and is cleared when the THP is freed or
split.
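The trade-off between scanning every subpage and testing one flag can be
modelled in userspace roughly as below. This is a sketch under stated
assumptions: struct sketch_thp, the SKETCH_* bits and the sketch_*
helpers are hypothetical stand-ins, not the kernel's struct page or
page-flags API, and 512 subpages assumes a 2M THP with 4K base pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_THP_NR            512        /* assumed subpages per THP */
#define SKETCH_PG_HWPOISON       (1u << 0)  /* per-subpage poison bit */
#define SKETCH_PG_HAS_HWPOISONED (1u << 1)  /* first tail page only */

struct sketch_page {
	unsigned int flags;
};

struct sketch_thp {
	struct sketch_page pages[SKETCH_THP_NR]; /* pages[0] is the head */
};

/* Memory failure: poison one subpage and record it on the first tail. */
static void sketch_memory_failure(struct sketch_thp *thp, size_t idx)
{
	assert(idx < SKETCH_THP_NR);
	thp->pages[idx].flags |= SKETCH_PG_HWPOISON;
	thp->pages[1].flags |= SKETCH_PG_HAS_HWPOISONED;
}

/* Costly check: walk every subpage on each fault. */
static bool sketch_scan_poisoned(const struct sketch_thp *thp)
{
	for (size_t i = 0; i < SKETCH_THP_NR; i++)
		if (thp->pages[i].flags & SKETCH_PG_HWPOISON)
			return true;
	return false;
}

/* Cheap check: a single flag test on the first tail page. */
static bool sketch_has_hwpoisoned(const struct sketch_thp *thp)
{
	return (thp->pages[1].flags & SKETCH_PG_HAS_HWPOISONED) != 0;
}

/* The PMD map path backs off when the flag is set. */
static bool sketch_can_map_pmd(const struct sketch_thp *thp)
{
	return !sketch_has_hwpoisoned(thp);
}

/* Split or free: the compound-level flag is cleared. */
static void sketch_split_or_free(struct sketch_thp *thp)
{
	thp->pages[1].flags &= ~SKETCH_PG_HAS_HWPOISONED;
}
```

The per-subpage poison bit survives the split in this model; only the
compound-level summary flag is dropped.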
The soft offline path doesn't need this since the soft offline handler
only marks a subpage hwpoisoned once the subpage has been migrated
successfully. But a shmem THP doesn't get split and then migrated at all.
Link: https://lkml.kernel.org/r/20211014191615.6674-3-shy828301@gmail.com
Fixes: 800d8c63b2e9 ("shmem: add huge pages support")
Suggested-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Signed-off-by: Yang Shi <shy828301(a)gmail.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/page-flags.h | 23 +++++++++++++++++++++++
mm/huge_memory.c | 2 ++
mm/memory-failure.c | 14 ++++++++++++++
mm/memory.c | 9 +++++++++
mm/page_alloc.c | 4 +++-
5 files changed, 51 insertions(+), 1 deletion(-)
--- a/include/linux/page-flags.h~mm-filemap-check-if-thp-has-hwpoisoned-subpage-for-pmd-page-fault
+++ a/include/linux/page-flags.h
@@ -171,6 +171,15 @@ enum pageflags {
/* Compound pages. Stored in first tail page's flags */
PG_double_map = PG_workingset,
+#ifdef CONFIG_MEMORY_FAILURE
+ /*
+ * Compound pages. Stored in first tail page's flags.
+ * Indicates that at least one subpage is hwpoisoned in the
+ * THP.
+ */
+ PG_has_hwpoisoned = PG_mappedtodisk,
+#endif
+
/* non-lru isolated movable page */
PG_isolated = PG_reclaim,
@@ -668,6 +677,20 @@ PAGEFLAG_FALSE(DoubleMap)
TESTSCFLAG_FALSE(DoubleMap)
#endif
+#if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_TRANSPARENT_HUGEPAGE)
+/*
+ * PageHasHWPoisoned indicates that at least one subpage is hwpoisoned in the
+ * compound page.
+ *
+ * This flag is set by hwpoison handler. Cleared by THP split or free page.
+ */
+PAGEFLAG(HasHWPoisoned, has_hwpoisoned, PF_SECOND)
+ TESTSCFLAG(HasHWPoisoned, has_hwpoisoned, PF_SECOND)
+#else
+PAGEFLAG_FALSE(HasHWPoisoned)
+ TESTSCFLAG_FALSE(HasHWPoisoned)
+#endif
+
/*
* Check if a page is currently marked HWPoisoned. Note that this check is
* best effort only and inherently racy: there is no way to synchronize with
--- a/mm/huge_memory.c~mm-filemap-check-if-thp-has-hwpoisoned-subpage-for-pmd-page-fault
+++ a/mm/huge_memory.c
@@ -2426,6 +2426,8 @@ static void __split_huge_page(struct pag
/* lock lru list/PageCompound, ref frozen by page_ref_freeze */
lruvec = lock_page_lruvec(head);
+ ClearPageHasHWPoisoned(head);
+
for (i = nr - 1; i >= 1; i--) {
__split_huge_page_tail(head, i, lruvec, list);
/* Some pages can be beyond EOF: drop them from page cache */
--- a/mm/memory.c~mm-filemap-check-if-thp-has-hwpoisoned-subpage-for-pmd-page-fault
+++ a/mm/memory.c
@@ -3897,6 +3897,15 @@ vm_fault_t do_set_pmd(struct vm_fault *v
return ret;
/*
+ * Just back off if any subpage of a THP is corrupted, otherwise
+ * the corrupted page may be silently mapped by PMD and escape the
+ * check. This kind of THP can only be PTE mapped. Access to
+ * the corrupted subpage should trigger SIGBUS as expected.
+ */
+ if (unlikely(PageHasHWPoisoned(page)))
+ return ret;
+
+ /*
* Archs like ppc64 need additional space to store information
* related to pte entry. Use the preallocated table for that.
*/
--- a/mm/memory-failure.c~mm-filemap-check-if-thp-has-hwpoisoned-subpage-for-pmd-page-fault
+++ a/mm/memory-failure.c
@@ -1695,6 +1695,20 @@ try_again:
}
if (PageTransHuge(hpage)) {
+ /*
+ * The flag must be set after the refcount is bumped,
+ * otherwise it may race with THP split.
+ * And the flag can't be set in get_hwpoison_page() since
+ * it is called by soft offline too and it is just called
+ * for !MF_COUNT_INCREASE. So here seems to be the best
+ * place.
+ *
+ * No need to care about the above error handling paths for
+ * get_hwpoison_page() since they handle either a free page
+ * or an unhandlable page. The refcount is bumped iff the
+ * page is a valid handlable page.
+ */
+ SetPageHasHWPoisoned(hpage);
if (try_to_split_thp_page(p, "Memory Failure") < 0) {
action_result(pfn, MF_MSG_UNSPLIT_THP, MF_IGNORED);
res = -EBUSY;
--- a/mm/page_alloc.c~mm-filemap-check-if-thp-has-hwpoisoned-subpage-for-pmd-page-fault
+++ a/mm/page_alloc.c
@@ -1310,8 +1310,10 @@ static __always_inline bool free_pages_p
VM_BUG_ON_PAGE(compound && compound_order(page) != order, page);
- if (compound)
+ if (compound) {
ClearPageDoubleMap(page);
+ ClearPageHasHWPoisoned(page);
+ }
for (i = 1; i < (1 << order); i++) {
if (compound)
bad += free_tail_pages_check(page, page + i);
_
Patches currently in -mm which might be from shy828301(a)gmail.com are
mm-hwpoison-remove-the-unnecessary-thp-check.patch
mm-filemap-check-if-thp-has-hwpoisoned-subpage-for-pmd-page-fault.patch
mm-filemap-coding-style-cleanup-for-filemap_map_pmd.patch
mm-hwpoison-refactor-refcount-check-handling.patch
mm-shmem-dont-truncate-page-if-memory-failure-happens.patch
mm-hwpoison-handle-non-anonymous-thp-correctly.patch
mm-migrate-make-demotion-knob-depend-on-migration.patch
The patch titled
Subject: mm: hwpoison: remove the unnecessary THP check
has been added to the -mm tree. Its filename is
mm-hwpoison-remove-the-unnecessary-thp-check.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-hwpoison-remove-the-unnecessar…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-hwpoison-remove-the-unnecessar…
------------------------------------------------------
From: Yang Shi <shy828301(a)gmail.com>
Subject: mm: hwpoison: remove the unnecessary THP check
Patch series "Solve silent data loss caused by poisoned page cache (shmem/tmpfs)", v4.
When discussing the patch that splits page cache THP in order to offline
the poisoned page, Naoya mentioned there is a bigger problem [1] that
prevents this from working, since the page cache page will be truncated if
uncorrectable errors happen. Looking into this more deeply, it turns out
this approach (truncating the poisoned page) may incur silent data loss for
all non-readonly filesystems if the page is dirty. It may be worse for
in-memory filesystems, e.g. shmem/tmpfs, since the data blocks are actually
gone.
To solve this problem we could keep the poisoned dirty page in the page
cache and then notify users on any later access, e.g. page fault,
read/write, etc. Clean pages can be truncated as is, since they can be
reread from disk later on.
The consequence is that filesystems may find a poisoned page and manipulate
it as a healthy page, since filesystems don't actually check whether the
page is poisoned in any of the relevant paths except page fault. In
general, we need to make filesystems aware of poisoned pages before we
can keep the poisoned page in the page cache in order to solve the data
loss problem.
To make filesystems aware of poisoned pages we should consider:
- The page should not be written back: clearing the dirty flag prevents
writeback.
- The page should not be dropped (it shows as a clean page) by drop caches or
other callers: the refcount pin from hwpoison prevents invalidation
(called by cache drop, inode cache shrinking, etc.), but it doesn't avoid
invalidation in the DIO path.
- The page should be able to get truncated/hole punched/unlinked: it works as it
is.
- Notify users when the page is accessed, e.g. read/write, page fault and other
paths (compression, encryption, etc.).
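The first three points can be illustrated with a toy userspace model.
Everything here is a hypothetical simplification: struct sketch_page and
the sketch_* helpers are not kernel interfaces, and the rule "refcount 1
means only the page cache holds the page" is an assumption of this model.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy page cache entry; not the kernel's struct page. */
struct sketch_page {
	bool in_cache;
	bool dirty;
	bool hwpoison;
	int refcount;   /* 1 == held by the page cache only (model rule) */
};

/* hwpoison handler: keep the page cached, clean and pinned. */
static void sketch_mark_poisoned(struct sketch_page *p)
{
	assert(p->in_cache);
	p->hwpoison = true;
	p->dirty = false;   /* a clean page is never picked for writeback */
	p->refcount++;      /* extra pin blocks invalidation */
}

/* Writeback only considers dirty cached pages. */
static bool sketch_writeback(const struct sketch_page *p)
{
	return p->in_cache && p->dirty;
}

/* Drop-caches style invalidation: clean, unpinned pages only. */
static bool sketch_invalidate(struct sketch_page *p)
{
	if (!p->in_cache || p->dirty || p->refcount > 1)
		return false;
	p->in_cache = false;
	p->refcount--;
	return true;
}

/* Truncate, hole punch and unlink remove the page regardless of pins. */
static bool sketch_truncate(struct sketch_page *p)
{
	if (!p->in_cache)
		return false;
	p->in_cache = false;
	p->refcount--;
	return true;
}
```

In this model a poisoned page is skipped by writeback and survives
invalidation, yet can still be removed by truncation. The DIO
invalidation path mentioned above is not modelled.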
The scope of the last one is huge since almost all filesystems need to do it once
a page is returned from page cache lookup. There are a couple of options to
do it:
1. Check the hwpoison flag in every path, the most straightforward way.
2. Return NULL for a poisoned page from page cache lookup. Most callsites
check whether NULL is returned, so this should need the least work, I think. But the
error handling in filesystems just returns -ENOMEM, and that error code will
obviously confuse users.
3. To improve #2, we could return an error pointer, e.g. ERR_PTR(-EIO), but this
will involve a significant amount of code change as well, since all the paths
need to check whether the pointer is an error or not, just like option #1.
I did prototypes for both #1 and #3, but it seems #3 may require more
changes than #1. For #3 ERR_PTR will be returned, so all the callers need
to check the return value, otherwise an invalid pointer may be dereferenced.
But not all callers really care about the content of the page, for
example, partial truncate, which just sets the truncated range in one page
to 0. So such paths need additional modification if ERR_PTR is
returned. And if the callers have their own way to handle the problematic
pages, we need to add a new FGP flag to tell FGP functions to return the
pointer to the page.
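The three options can be compared with a small userspace sketch. The
sketch_err_ptr(), sketch_ptr_err() and sketch_is_err() helpers are
simplified stand-ins modelled on the kernel's ERR_PTR(), PTR_ERR() and
IS_ERR(); the lookup and read functions are hypothetical and do not
correspond to any real filesystem path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno in the top of the pointer range. */
static inline void *sketch_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline long sketch_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

static inline bool sketch_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

struct sketch_page {
	bool hwpoison;
	char data[16];
};

/* Option #1: lookup is unchanged, every caller tests the flag. */
static int sketch_read_opt1(struct sketch_page *cached, char *out)
{
	struct sketch_page *page = cached;

	if (!page)
		return -ENOMEM;
	if (page->hwpoison)
		return -EIO;
	*out = page->data[0];
	return 0;
}

/* Option #2: lookup hides poisoned pages behind NULL. */
static struct sketch_page *sketch_lookup_null(struct sketch_page *cached)
{
	return (cached && cached->hwpoison) ? NULL : cached;
}

static int sketch_read_opt2(struct sketch_page *cached, char *out)
{
	struct sketch_page *page = sketch_lookup_null(cached);

	if (!page)
		return -ENOMEM; /* existing error path, misleading here */
	*out = page->data[0];
	return 0;
}

/* Option #3: lookup returns an error pointer for poisoned pages. */
static struct sketch_page *sketch_lookup_err(struct sketch_page *cached)
{
	if (cached && cached->hwpoison)
		return sketch_err_ptr(-EIO);
	return cached;
}

static int sketch_read_opt3(struct sketch_page *cached, char *out)
{
	struct sketch_page *page = sketch_lookup_err(cached);

	if (!page)
		return -ENOMEM;
	if (sketch_is_err(page)) /* forgetting this dereferences garbage */
		return (int)sketch_ptr_err(page);
	*out = page->data[0];
	return 0;
}
```

For a poisoned page, options #1 and #3 report -EIO while option #2
reports -ENOMEM. Option #3 also needs the extra error-pointer test in
every caller, including callers that never look at the page content.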
It may happen very rarely, but once it happens the consequence (data
corruption) could be very bad and it is very hard to debug. It seems this
problem was briefly discussed before, but no action was taken
at that time. [2]
As the investigation above shows, it takes a huge amount of work to solve
the potential data loss for all filesystems. But it is much easier for
in-memory filesystems, and such filesystems actually suffer more than
others since even the data blocks are gone due to truncating. So this
patchset starts with shmem/tmpfs by taking option #1.
TODO:
* The unpoison has been broken since commit 0ed950d1f281 ("mm,hwpoison: make
get_hwpoison_page() call get_any_page()"), and this patch series makes the
refcount check for unpoisoning a shmem page fail.
* Expand to other filesystems. But I haven't heard feedback from filesystem
developers yet.
Patch breakdown:
Patch #1: cleanup, depended on by patch #2
Patch #2: fix THP with hwpoisoned subpage(s) PMD map bug
Patch #3: coding style cleanup
Patch #4: refactor and preparation.
Patch #5: keep the poisoned page in page cache and handle such case for all
the paths.
Patch #6: the previous patches unblock page cache THP split, so this patch
adds page cache THP split support.
This patch (of 6):
When handling a THP, hwpoison checked whether the THP was in the allocation
or free stage, since hwpoison may mistreat it as a hugetlb page. After commit
415c64c1453a ("mm/memory-failure: split thp earlier in memory error
handling") the problem has been fixed, so this check is no longer needed.
Remove it. The side effect of the removal is that hwpoison may report unsplit
THP instead of unknown error for shmem THP. It does not seem like a big deal.
The following patch, which fixes shmem THP with hwpoisoned subpage(s) being
wrongly PMD mapped, depends on this. So this patch needs to be backported
to -stable as well.
Link: https://lkml.kernel.org/r/20211014191615.6674-1-shy828301@gmail.com
Link: https://lkml.kernel.org/r/20211014191615.6674-2-shy828301@gmail.com
Acked-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Suggested-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Signed-off-by: Yang Shi <shy828301(a)gmail.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory-failure.c | 14 --------------
1 file changed, 14 deletions(-)
--- a/mm/memory-failure.c~mm-hwpoison-remove-the-unnecessary-thp-check
+++ a/mm/memory-failure.c
@@ -1148,20 +1148,6 @@ static int __get_hwpoison_page(struct pa
if (!HWPoisonHandlable(head))
return -EBUSY;
- if (PageTransHuge(head)) {
- /*
- * Non anonymous thp exists only in allocation/free time. We
- * can't handle such a case correctly, so let's give it up.
- * This should be better than triggering BUG_ON when kernel
- * tries to touch the "partially handled" page.
- */
- if (!PageAnon(head)) {
- pr_err("Memory failure: %#lx: non anonymous thp\n",
- page_to_pfn(page));
- return 0;
- }
- }
-
if (get_page_unless_zero(head)) {
if (head == compound_head(page))
return 1;
_