The quilt patch titled
Subject: mm/memory-failure: check the mapcount of the precise page
has been removed from the -mm tree. Its filename was
mm-memory-failure-check-the-mapcount-of-the-precise-page.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: "Matthew Wilcox (Oracle)" <willy(a)infradead.org>
Subject: mm/memory-failure: check the mapcount of the precise page
Date: Mon, 18 Dec 2023 13:58:36 +0000
A process may map only some of the pages in a folio. Such a process
might be missed if it maps the poisoned page but not the head page, or
it might be unnecessarily killed if it maps the head page but not the
poisoned page.
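As a concrete illustration, here is a hedged sketch (not code from the
patch; the helper name and scenario are invented for exposition) of why
the precise page must be tested:

	#include <linux/mm.h>

	/*
	 * 'p' is the precise page that took the error.  For a partially
	 * mapped folio, p and its compound head can have different
	 * mapcounts: a process may map the tail page p without mapping
	 * the head, or the head without mapping p.
	 */
	static bool poisoned_page_still_mapped(struct page *p)
	{
		/* ask about the page that took the error, not its head */
		return page_mapped(p);
	}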
Link: https://lkml.kernel.org/r/20231218135837.3310403-3-willy@infradead.org
Fixes: 7af446a841a2 ("HWPOISON, hugetlb: enable error handling path for hugepage")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memory-failure.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- a/mm/memory-failure.c~mm-memory-failure-check-the-mapcount-of-the-precise-page
+++ a/mm/memory-failure.c
@@ -1570,7 +1570,7 @@ static bool hwpoison_user_mappings(struc
* This check implies we don't kill processes if their pages
* are in the swap cache early. Those are always late kills.
*/
- if (!page_mapped(hpage))
+ if (!page_mapped(p))
return true;
if (PageSwapCache(p)) {
@@ -1621,10 +1621,10 @@ static bool hwpoison_user_mappings(struc
try_to_unmap(folio, ttu);
}
- unmap_success = !page_mapped(hpage);
+ unmap_success = !page_mapped(p);
if (!unmap_success)
pr_err("%#lx: failed to unmap page (mapcount=%d)\n",
- pfn, page_mapcount(hpage));
+ pfn, page_mapcount(p));
/*
* try_to_unmap() might put mlocked page in lru cache, so call
_
Patches currently in -mm which might be from willy@infradead.org are
buffer-return-bool-from-grow_dev_folio.patch
buffer-calculate-block-number-inside-folio_init_buffers.patch
buffer-fix-grow_buffers-for-block-size-page_size.patch
buffer-cast-block-to-loff_t-before-shifting-it.patch
buffer-fix-various-functions-for-block-size-page_size.patch
buffer-handle-large-folios-in-__block_write_begin_int.patch
buffer-fix-more-functions-for-block-size-page_size.patch
mm-convert-ksm_might_need_to_copy-to-work-on-folios.patch
mm-convert-ksm_might_need_to_copy-to-work-on-folios-fix.patch
mm-remove-pageanonexclusive-assertions-in-unuse_pte.patch
mm-convert-unuse_pte-to-use-a-folio-throughout.patch
mm-remove-some-calls-to-page_add_new_anon_rmap.patch
mm-remove-stale-example-from-comment.patch
mm-remove-references-to-page_add_new_anon_rmap-in-comments.patch
mm-convert-migrate_vma_insert_page-to-use-a-folio.patch
mm-convert-collapse_huge_page-to-use-a-folio.patch
mm-remove-page_add_new_anon_rmap-and-lru_cache_add_inactive_or_unevictable.patch
mm-return-the-folio-from-__read_swap_cache_async.patch
mm-pass-a-folio-to-__swap_writepage.patch
mm-pass-a-folio-to-swap_writepage_fs.patch
mm-pass-a-folio-to-swap_writepage_bdev_sync.patch
mm-pass-a-folio-to-swap_writepage_bdev_async.patch
mm-pass-a-folio-to-swap_readpage_fs.patch
mm-pass-a-folio-to-swap_readpage_bdev_sync.patch
mm-pass-a-folio-to-swap_readpage_bdev_async.patch
mm-convert-swap_page_sector-to-swap_folio_sector.patch
mm-convert-swap_readpage-to-swap_read_folio.patch
mm-remove-page_swap_info.patch
mm-return-a-folio-from-read_swap_cache_async.patch
mm-convert-swap_cluster_readahead-and-swap_vma_readahead-to-return-a-folio.patch
mm-convert-swap_cluster_readahead-and-swap_vma_readahead-to-return-a-folio-fix.patch
fs-remove-clean_page_buffers.patch
fs-convert-clean_buffers-to-take-a-folio.patch
fs-reduce-stack-usage-in-__mpage_writepage.patch
fs-reduce-stack-usage-in-do_mpage_readpage.patch
adfs-remove-writepage-implementation.patch
bfs-remove-writepage-implementation.patch
hfs-really-remove-hfs_writepage.patch
hfsplus-really-remove-hfsplus_writepage.patch
minix-remove-writepage-implementation.patch
ocfs2-remove-writepage-implementation.patch
sysv-remove-writepage-implementation.patch
ufs-remove-writepage-implementation.patch
fs-convert-block_write_full_page-to-block_write_full_folio.patch
fs-remove-the-bh_end_io-argument-from-__block_write_full_folio.patch
The quilt patch titled
Subject: mm/memory-failure: pass the folio and the page to collect_procs()
has been removed from the -mm tree. Its filename was
mm-memory-failure-pass-the-folio-and-the-page-to-collect_procs.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: "Matthew Wilcox (Oracle)" <willy(a)infradead.org>
Subject: mm/memory-failure: pass the folio and the page to collect_procs()
Date: Mon, 18 Dec 2023 13:58:35 +0000
Patch series "Three memory-failure fixes".
I've been looking at the memory-failure code and I believe I have found
three bugs that need fixing -- one going all the way back to 2010! I'll
have more patches later to use folios more extensively but didn't want
these bugfixes to get caught up in that.
This patch (of 3):
Both collect_procs_anon() and collect_procs_file() iterate over the VMA
interval trees looking for a single pgoff, so it is wrong to look for the
pgoff of the head page as is currently done. However, it is also wrong to
look at page->mapping of the precise page as this is invalid for tail
pages. Clear up the confusion by passing both the folio and the precise
page to collect_procs().
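A hedged sketch, condensed from the patched collect_procs_file()
(declarations added so the fragment stands alone), of how the two
arguments are used:

	#include <linux/mm.h>
	#include <linux/pagemap.h>

	struct address_space *mapping = folio->mapping;	/* valid even when page is a tail */
	pgoff_t pgoff = page_to_pgoff(page);		/* offset of the precise page */
	struct vm_area_struct *vma;

	i_mmap_lock_read(mapping);
	vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
		/* visits only VMAs that cover the poisoned page itself */
	}
	i_mmap_unlock_read(mapping);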
Link: https://lkml.kernel.org/r/20231218135837.3310403-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20231218135837.3310403-2-willy@infradead.org
Fixes: 415c64c1453a ("mm/memory-failure: split thp earlier in memory error handling")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memory-failure.c | 25 ++++++++++++-------------
1 file changed, 12 insertions(+), 13 deletions(-)
--- a/mm/memory-failure.c~mm-memory-failure-pass-the-folio-and-the-page-to-collect_procs
+++ a/mm/memory-failure.c
@@ -595,10 +595,9 @@ struct task_struct *task_early_kill(stru
/*
* Collect processes when the error hit an anonymous page.
*/
-static void collect_procs_anon(struct page *page, struct list_head *to_kill,
- int force_early)
+static void collect_procs_anon(struct folio *folio, struct page *page,
+ struct list_head *to_kill, int force_early)
{
- struct folio *folio = page_folio(page);
struct vm_area_struct *vma;
struct task_struct *tsk;
struct anon_vma *av;
@@ -633,12 +632,12 @@ static void collect_procs_anon(struct pa
/*
* Collect processes when the error hit a file mapped page.
*/
-static void collect_procs_file(struct page *page, struct list_head *to_kill,
- int force_early)
+static void collect_procs_file(struct folio *folio, struct page *page,
+ struct list_head *to_kill, int force_early)
{
struct vm_area_struct *vma;
struct task_struct *tsk;
- struct address_space *mapping = page->mapping;
+ struct address_space *mapping = folio->mapping;
pgoff_t pgoff;
i_mmap_lock_read(mapping);
@@ -704,17 +703,17 @@ static void collect_procs_fsdax(struct p
/*
* Collect the processes who have the corrupted page mapped to kill.
*/
-static void collect_procs(struct page *page, struct list_head *tokill,
- int force_early)
+static void collect_procs(struct folio *folio, struct page *page,
+ struct list_head *tokill, int force_early)
{
- if (!page->mapping)
+ if (!folio->mapping)
return;
if (unlikely(PageKsm(page)))
collect_procs_ksm(page, tokill, force_early);
else if (PageAnon(page))
- collect_procs_anon(page, tokill, force_early);
+ collect_procs_anon(folio, page, tokill, force_early);
else
- collect_procs_file(page, tokill, force_early);
+ collect_procs_file(folio, page, tokill, force_early);
}
struct hwpoison_walk {
@@ -1602,7 +1601,7 @@ static bool hwpoison_user_mappings(struc
* mapped in dirty form. This has to be done before try_to_unmap,
* because ttu takes the rmap data structures down.
*/
- collect_procs(hpage, &tokill, flags & MF_ACTION_REQUIRED);
+ collect_procs(folio, p, &tokill, flags & MF_ACTION_REQUIRED);
if (PageHuge(hpage) && !PageAnon(hpage)) {
/*
@@ -1772,7 +1771,7 @@ static int mf_generic_kill_procs(unsigne
* SIGBUS (i.e. MF_MUST_KILL)
*/
flags |= MF_ACTION_REQUIRED | MF_MUST_KILL;
- collect_procs(&folio->page, &to_kill, true);
+ collect_procs(folio, &folio->page, &to_kill, true);
unmap_and_kill(&to_kill, pfn, folio->mapping, folio->index, flags);
unlock:
_
The quilt patch titled
Subject: selftests: secretmem: floor the memory size to the multiple of page_size
has been removed from the -mm tree. Its filename was
selftests-secretmem-floor-the-memory-size-to-the-multiple-of-page_size.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Muhammad Usama Anjum <usama.anjum@collabora.com>
Subject: selftests: secretmem: floor the memory size to the multiple of page_size
Date: Thu, 14 Dec 2023 15:19:30 +0500
The "locked-in-memory size" limit per process can be non-multiple of
page_size. The mmap() fails if we try to allocate locked-in-memory with
same size as the allowed limit if it isn't multiple of the page_size
because mmap() rounds off the memory size to be allocated to next multiple
of page_size.
Fix this by flooring the length to be allocated with mmap() to the
previous multiple of the page_size.
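With illustrative numbers (assuming a 4096-byte page size and a
6000-byte RLIMIT_MEMLOCK; neither value is taken from the report), the
flooring works like this:

	len = mlock_limit_cur;				/* e.g. 6000 */
	if (len % page_size != 0)			/* 6000 % 4096 != 0 */
		len = (len / page_size) * page_size;	/* floor to 4096 */

	/*
	 * mmap(NULL, 4096, ...) stays within the 6000-byte limit, whereas
	 * mmap(NULL, 6000, ...) would have been rounded up to 8192 bytes
	 * and failed.
	 */
	mem = mmap(NULL, len, prot, mode, fd, 0);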
This was being triggered regularly on KernelCI because of ulimit
settings that weren't multiples of the page_size. Find the logs
here: https://linux.kernelci.org/test/plan/id/657654bd8e81e654fae13532/
The bug has been present since the test was first added.
Link: https://lkml.kernel.org/r/20231214101931.1155586-1-usama.anjum@collabora.com
Fixes: 76fe17ef588a ("secretmem: test: add basic selftest for memfd_secret(2)")
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Reported-by: "kernelci.org bot" <bot@kernelci.org>
Closes: https://linux.kernelci.org/test/plan/id/657654bd8e81e654fae13532/
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
tools/testing/selftests/mm/memfd_secret.c | 3 +++
1 file changed, 3 insertions(+)
--- a/tools/testing/selftests/mm/memfd_secret.c~selftests-secretmem-floor-the-memory-size-to-the-multiple-of-page_size
+++ a/tools/testing/selftests/mm/memfd_secret.c
@@ -62,6 +62,9 @@ static void test_mlock_limit(int fd)
char *mem;
len = mlock_limit_cur;
+ if (len % page_size != 0)
+ len = (len/page_size) * page_size;
+
mem = mmap(NULL, len, prot, mode, fd, 0);
if (mem == MAP_FAILED) {
fail("unable to mmap secret memory\n");
_
Patches currently in -mm which might be from usama.anjum@collabora.com are
The quilt patch titled
Subject: mm: migrate high-order folios in swap cache correctly
has been removed from the -mm tree. Its filename was
mm-migrate-high-order-folios-in-swap-cache-correctly.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Charan Teja Kalla <quic_charante@quicinc.com>
Subject: mm: migrate high-order folios in swap cache correctly
Date: Thu, 14 Dec 2023 04:58:41 +0000
Large folios occupy N consecutive entries in the swap cache instead of
using multi-index entries like the page cache. However, if a large folio
is re-added to the LRU list, it can be migrated. The migration code was
not aware of the difference between the swap cache and the page cache and
assumed that a single xas_store() would be sufficient.
This leaves potentially many stale pointers to the now-migrated folio in
the swap cache, which can lead to almost arbitrary data corruption in the
future. This can also manifest as infinite loops with the RCU read lock
held.
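The core of the fix (a condensed, hedged sketch of the diff below, with
the surrounding xas setup omitted) is to overwrite every entry that
refers to the folio rather than just one:

	long i, entries;

	/* swap cache: N per-page entries; page cache: 1 multi-index entry */
	entries = folio_test_swapcache(folio) ? folio_nr_pages(folio) : 1;

	for (i = 0; i < entries; i++) {
		xas_store(&xas, newfolio);	/* repoint this entry */
		xas_next(&xas);			/* advance to the next index */
	}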
[willy@infradead.org: modifications to the changelog & tweaked the fix]
Fixes: 3417013e0d18 ("mm/migrate: Add folio_migrate_mapping()")
Link: https://lkml.kernel.org/r/20231214045841.961776-1-willy@infradead.org
Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: Charan Teja Kalla <quic_charante@quicinc.com>
Closes: https://lkml.kernel.org/r/1700569840-17327-1-git-send-email-quic_charante@q…
Cc: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/migrate.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
--- a/mm/migrate.c~mm-migrate-high-order-folios-in-swap-cache-correctly
+++ a/mm/migrate.c
@@ -405,6 +405,7 @@ int folio_migrate_mapping(struct address
int dirty;
int expected_count = folio_expected_refs(mapping, folio) + extra_count;
long nr = folio_nr_pages(folio);
+ long entries, i;
if (!mapping) {
/* Anonymous page without mapping */
@@ -442,8 +443,10 @@ int folio_migrate_mapping(struct address
folio_set_swapcache(newfolio);
newfolio->private = folio_get_private(folio);
}
+ entries = nr;
} else {
VM_BUG_ON_FOLIO(folio_test_swapcache(folio), folio);
+ entries = 1;
}
/* Move dirty while page refs frozen and newpage not yet exposed */
@@ -453,7 +456,11 @@ int folio_migrate_mapping(struct address
folio_set_dirty(newfolio);
}
- xas_store(&xas, newfolio);
+ /* Swap cache still stores N entries instead of a high-order entry */
+ for (i = 0; i < entries; i++) {
+ xas_store(&xas, newfolio);
+ xas_next(&xas);
+ }
/*
* Drop cache reference from old page by unfreezing
_
Patches currently in -mm which might be from quic_charante@quicinc.com are
mm-sparsemem-fix-race-in-accessing-memory_section-usage.patch
mm-sparsemem-fix-race-in-accessing-memory_section-usage-v2.patch
The quilt patch titled
Subject: maple_tree: do not preallocate nodes for slot stores
has been removed from the -mm tree. Its filename was
maple_tree-do-not-preallocate-nodes-for-slot-stores.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Subject: maple_tree: do not preallocate nodes for slot stores
Date: Wed, 13 Dec 2023 12:50:57 -0800
mas_preallocate() defaults to requesting 1 node for preallocation and
then, depending on the type of store, updates the request variable.
There isn't a check for the slot store type, so slot stores preallocate
the default 1 node. Slot stores do not require any additional nodes, so
add a check for the slot store case that bypasses node_count_gfp().
Update the tests to reflect that slot stores do not require allocations.
User-visible effects of this bug include increased memory usage from the
unneeded node that was allocated.
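A hedged usage sketch (tree, index, and entry names are illustrative,
and the tree is assumed not to be in RCU mode) showing the
caller-visible effect:

	MA_STATE(mas, &tree, index, index);	/* range already occupied: a slot store */

	/*
	 * After this patch a slot store requests no preallocated nodes,
	 * so on success mas_allocated(&mas) is 0 rather than 1.
	 */
	if (mas_preallocate(&mas, new_entry, GFP_KERNEL) != 0)
		return -ENOMEM;
	mas_store_prealloc(&mas, new_entry);	/* reuses the existing node */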
Link: https://lkml.kernel.org/r/20231213205058.386589-1-sidhartha.kumar@oracle.com
Fixes: 0b8bb544b1a7 ("maple_tree: update mas_preallocate() testing")
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: <stable@vger.kernel.org> [6.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
lib/maple_tree.c | 11 +++++++++++
tools/testing/radix-tree/maple.c | 2 +-
2 files changed, 12 insertions(+), 1 deletion(-)
--- a/lib/maple_tree.c~maple_tree-do-not-preallocate-nodes-for-slot-stores
+++ a/lib/maple_tree.c
@@ -5501,6 +5501,17 @@ int mas_preallocate(struct ma_state *mas
mas_wr_end_piv(&wr_mas);
node_size = mas_wr_new_end(&wr_mas);
+
+ /* Slot store, does not require additional nodes */
+ if (node_size == wr_mas.node_end) {
+ /* reuse node */
+ if (!mt_in_rcu(mas->tree))
+ return 0;
+ /* shifting boundary */
+ if (wr_mas.offset_end - mas->offset == 1)
+ return 0;
+ }
+
if (node_size >= mt_slots[wr_mas.type]) {
/* Split, worst case for now. */
request = 1 + mas_mt_height(mas) * 2;
--- a/tools/testing/radix-tree/maple.c~maple_tree-do-not-preallocate-nodes-for-slot-stores
+++ a/tools/testing/radix-tree/maple.c
@@ -35538,7 +35538,7 @@ static noinline void __init check_preall
MT_BUG_ON(mt, mas_preallocate(&mas, ptr, GFP_KERNEL) != 0);
allocated = mas_allocated(&mas);
height = mas_mt_height(&mas);
- MT_BUG_ON(mt, allocated != 1);
+ MT_BUG_ON(mt, allocated != 0);
mas_store_prealloc(&mas, ptr);
MT_BUG_ON(mt, mas_allocated(&mas) != 0);
_
Patches currently in -mm which might be from sidhartha.kumar@oracle.com are
The quilt patch titled
Subject: mm/filemap: avoid buffered read/write race to read inconsistent data
has been removed from the -mm tree. Its filename was
mm-filemap-avoid-buffered-read-write-race-to-read-inconsistent-data.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Baokun Li <libaokun1@huawei.com>
Subject: mm/filemap: avoid buffered read/write race to read inconsistent data
Date: Wed, 13 Dec 2023 14:23:24 +0800
The following concurrency may cause the data read to be inconsistent with
the data on disk:
             cpu1                           cpu2
------------------------------|------------------------------
                               // Buffered write 2048 from 0
                               ext4_buffered_write_iter
                                generic_perform_write
                                 copy_page_from_iter_atomic
                                 ext4_da_write_end
                                  ext4_da_do_write_end
                                   block_write_end
                                    __block_commit_write
                                     folio_mark_uptodate
// Buffered read 4096 from 0          smp_wmb()
ext4_file_read_iter                   set_bit(PG_uptodate, folio_flags)
 generic_file_read_iter            i_size_write // 2048
  filemap_read                     unlock_page(page)
   filemap_get_pages
    filemap_get_read_batch
    folio_test_uptodate(folio)
     ret = test_bit(PG_uptodate, folio_flags)
     if (ret)
      smp_rmb();
      // Ensure that the data in page 0-2048 is up-to-date.

                               // New buffered write 2048 from 2048
                               ext4_buffered_write_iter
                                generic_perform_write
                                 copy_page_from_iter_atomic
                                 ext4_da_write_end
                                  ext4_da_do_write_end
                                   block_write_end
                                    __block_commit_write
                                     folio_mark_uptodate
                                      smp_wmb()
                                      set_bit(PG_uptodate, folio_flags)
                                   i_size_write // 4096
                                   unlock_page(page)

   isize = i_size_read(inode) // 4096
   // Read the latest isize 4096, but without smp_rmb(), there may be
   // Load-Load disorder resulting in the data in the 2048-4096 range
   // in the page is not up-to-date.
   copy_page_to_iter
   // copyout 4096
In the concurrency above, we read the updated i_size, but there is no
read barrier to ensure that the data in the page is consistent with that
i_size, so we may copy stale page contents out. Hence add the missing
read memory barrier to fix this.
This is a Load-Load reordering issue, which only occurs on some weakly
ordered architectures (e.g. ARM64, Alpha), not on strongly ordered
architectures (e.g. x86). Theoretically the problem is not limited to
ext4: filesystems that call filemap_read() without holding the inode
lock (e.g. btrfs, f2fs, ubifs, ...) are affected, while filesystems that
do hold the inode lock (e.g. xfs, nfs) are not.
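Condensed from the patched filemap_read() (a hedged sketch of the hunk
below, not the full function), the reader-side ordering becomes:

	isize = i_size_read(inode);
	if (unlikely(iocb->ki_pos >= isize))
		goto put_folios;
	end_offset = min_t(loff_t, isize, iocb->ki_pos + iter->count);

	/*
	 * Pairs with the smp_wmb() the write side executes before it
	 * publishes the new i_size (e.g. in folio_mark_uptodate() via
	 * __block_commit_write()), so that the folio contents covered
	 * by 'isize' are visible before copy_page_to_iter() reads them.
	 */
	smp_rmb();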
Link: https://lkml.kernel.org/r/20231213062324.739009-1-libaokun1@huawei.com
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: yangerkun <yangerkun@huawei.com>
Cc: Yu Kuai <yukuai3@huawei.com>
Cc: Zhang Yi <yi.zhang@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/filemap.c | 9 +++++++++
1 file changed, 9 insertions(+)
--- a/mm/filemap.c~mm-filemap-avoid-buffered-read-write-race-to-read-inconsistent-data
+++ a/mm/filemap.c
@@ -2608,6 +2608,15 @@ ssize_t filemap_read(struct kiocb *iocb,
end_offset = min_t(loff_t, isize, iocb->ki_pos + iter->count);
/*
+ * Pairs with a barrier in
+ * block_write_end()->mark_buffer_dirty() or other page
+ * dirtying routines like iomap_write_end() to ensure
+ * changes to page contents are visible before we see
+ * increased inode size.
+ */
+ smp_rmb();
+
+ /*
* Once we start copying data, we don't want to be touching any
* cachelines that might be contended:
*/
_
Patches currently in -mm which might be from libaokun1@huawei.com are