The patch below does not apply to the 6.12-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to stable@vger.kernel.org.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y git checkout FETCH_HEAD git cherry-pick -x 98c6d259319ecf6e8d027abd3f14b81324b8c0ad # <resolve conflicts, build, test, etc.> git commit -s git send-email --to 'stable@vger.kernel.org' --in-reply-to '2025092158-payee-omega-5893@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 98c6d259319ecf6e8d027abd3f14b81324b8c0ad Mon Sep 17 00:00:00 2001 From: Hugh Dickins hughd@google.com Date: Mon, 8 Sep 2025 15:15:03 -0700 Subject: [PATCH] mm/gup: check ref_count instead of lru before migration
Patch series "mm: better GUP pin lru_add_drain_all()", v2.
Series of lru_add_drain_all()-related patches, arising from recent mm/gup migration report from Will Deacon.
This patch (of 5):
Will Deacon reports:-
When taking a longterm GUP pin via pin_user_pages(), __gup_longterm_locked() tries to migrate target folios that should not be longterm pinned, for example because they reside in a CMA region or movable zone. This is done by first pinning all of the target folios anyway, collecting all of the longterm-unpinnable target folios into a list, dropping the pins that were just taken and finally handing the list off to migrate_pages() for the actual migration.
It is critically important that no unexpected references are held on the folios being migrated, otherwise the migration will fail and pin_user_pages() will return -ENOMEM to its caller. Unfortunately, it is relatively easy to observe migration failures when running pKVM (which uses pin_user_pages() on crosvm's virtual address space to resolve stage-2 page faults from the guest) on a 6.15-based Pixel 6 device and this results in the VM terminating prematurely.
In the failure case, 'crosvm' has called mlock(MLOCK_ONFAULT) on its mapping of guest memory prior to the pinning. Subsequently, when pin_user_pages() walks the page-table, the relevant 'pte' is not present and so the faulting logic allocates a new folio, mlocks it with mlock_folio() and maps it in the page-table.
Since commit 2fbb0c10d1e8 ("mm/munlock: mlock_page() munlock_page() batch by pagevec"), mlock/munlock operations on a folio (formerly page), are deferred. For example, mlock_folio() takes an additional reference on the target folio before placing it into a per-cpu 'folio_batch' for later processing by mlock_folio_batch(), which drops the refcount once the operation is complete. Processing of the batches is coupled with the LRU batch logic and can be forcefully drained with lru_add_drain_all() but as long as a folio remains unprocessed on the batch, its refcount will be elevated.
This deferred batching therefore interacts poorly with the pKVM pinning scenario as we can find ourselves in a situation where the migration code fails to migrate a folio due to the elevated refcount from the pending mlock operation.
Hugh Dickins adds:-
!folio_test_lru() has never been a very reliable way to tell if an lru_add_drain_all() is worth calling, to remove LRU cache references to make the folio migratable: the LRU flag may be set even while the folio is held with an extra reference in a per-CPU LRU cache.
5.18 commit 2fbb0c10d1e8 may have made it more unreliable. Then 6.11 commit 33dfe9204f29 ("mm/gup: clear the LRU flag of a page before adding to LRU batch") tried to make it reliable, by moving LRU flag clearing; but missed the mlock/munlock batches, so still unreliable as reported.
And it turns out to be difficult to extend 33dfe9204f29's LRU flag clearing to the mlock/munlock batches: if they do benefit from batching, mlock/munlock cannot be so effective when easily suppressed while !LRU.
Instead, switch to an expected ref_count check, which was more reliable all along: some more false positives (unhelpful drains) than before, and never a guarantee that the folio will prove migratable, but better.
Note on PG_private_2: ceph and nfs are still using the deprecated PG_private_2 flag, with the aid of netfs and filemap support functions. Although it is consistently matched by an increment of folio ref_count, folio_expected_ref_count() intentionally does not recognize it, and ceph folio migration currently depends on that for PG_private_2 folios to be rejected. New references to the deprecated flag are discouraged, so do not add it into the collect_longterm_unpinnable_folios() calculation: but longterm pinning of transiently PG_private_2 ceph and nfs folios (an uncommon case) may invoke a redundant lru_add_drain_all(). And this makes easy the backport to earlier releases: up to and including 6.12, btrfs also used PG_private_2, but without a ref_count increment.
Note for stable backports: requires 6.16 commit 86ebd50224c0 ("mm: add folio_expected_ref_count() for reference count calculation").
Link: https://lkml.kernel.org/r/41395944-b0e3-c3ac-d648-8ddd70451d28@google.com Link: https://lkml.kernel.org/r/bd1f314a-fca1-8f19-cac0-b936c9614557@google.com Fixes: 9a4e9f3b2d73 ("mm: update get_user_pages_longterm to migrate pages allocated from CMA region") Signed-off-by: Hugh Dickins hughd@google.com Reported-by: Will Deacon will@kernel.org Closes: https://lore.kernel.org/linux-mm/20250815101858.24352-1-will@kernel.org/ Acked-by: Kiryl Shutsemau kas@kernel.org Acked-by: David Hildenbrand david@redhat.com Cc: "Aneesh Kumar K.V" aneesh.kumar@kernel.org Cc: Axel Rasmussen axelrasmussen@google.com Cc: Chris Li chrisl@kernel.org Cc: Christoph Hellwig hch@infradead.org Cc: Jason Gunthorpe jgg@ziepe.ca Cc: Johannes Weiner hannes@cmpxchg.org Cc: John Hubbard jhubbard@nvidia.com Cc: Keir Fraser keirf@google.com Cc: Konstantin Khlebnikov koct9i@gmail.com Cc: Li Zhe lizhe.67@bytedance.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Peter Xu peterx@redhat.com Cc: Rik van Riel riel@surriel.com Cc: Shivank Garg shivankg@amd.com Cc: Vlastimil Babka vbabka@suse.cz Cc: Wei Xu weixugc@google.com Cc: yangge yangge1116@126.com Cc: Yuanchu Xie yuanchu@google.com Cc: Yu Zhao yuzhao@google.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org
diff --git a/mm/gup.c b/mm/gup.c index adffe663594d..82aec6443c0a 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -2307,7 +2307,8 @@ static unsigned long collect_longterm_unpinnable_folios( continue; }
- if (!folio_test_lru(folio) && drain_allow) { + if (drain_allow && folio_ref_count(folio) != + folio_expected_ref_count(folio) + 1) { lru_add_drain_all(); drain_allow = false; }
From: Shivank Garg shivankg@amd.com
[ Upstream commit 86ebd50224c0734d965843260d0dc057a9431c61 ]
Patch series " JFS: Implement migrate_folio for jfs_metapage_aops" v5.
This patchset addresses a warning that occurs during memory compaction due to JFS's missing migrate_folio operation. The warning was introduced by commit 7ee3647243e5 ("migrate: Remove call to ->writepage") which added explicit warnings when filesystem don't implement migrate_folio.
The syzbot reported following [1]: jfs_metapage_aops does not implement migrate_folio WARNING: CPU: 1 PID: 5861 at mm/migrate.c:955 fallback_migrate_folio mm/migrate.c:953 [inline] WARNING: CPU: 1 PID: 5861 at mm/migrate.c:955 move_to_new_folio+0x70e/0x840 mm/migrate.c:1007 Modules linked in: CPU: 1 UID: 0 PID: 5861 Comm: syz-executor280 Not tainted 6.15.0-rc1-next-20250411-syzkaller #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025 RIP: 0010:fallback_migrate_folio mm/migrate.c:953 [inline] RIP: 0010:move_to_new_folio+0x70e/0x840 mm/migrate.c:1007
To fix this issue, this series implement metapage_migrate_folio() for JFS which handles both single and multiple metapages per page configurations.
While most filesystems leverage existing migration implementations like filemap_migrate_folio(), buffer_migrate_folio_norefs() or buffer_migrate_folio() (which internally used folio_expected_refs()), JFS's metapage architecture requires special handling of its private data during migration. To support this, this series introduce the folio_expected_ref_count(), which calculates external references to a folio from page/swap cache, private data, and page table mappings.
This standardized implementation replaces the previous ad-hoc folio_expected_refs() function and enables JFS to accurately determine whether a folio has unexpected references before attempting migration.
Implement folio_expected_ref_count() to calculate expected folio reference counts from: - Page/swap cache (1 per page) - Private data (1) - Page table mappings (1 per map)
While originally needed for page migration operations, this improved implementation standardizes reference counting by consolidating all refcount contributors into a single, reusable function that can benefit any subsystem needing to detect unexpected references to folios.
The folio_expected_ref_count() returns the sum of these external references without including any reference the caller itself might hold. Callers comparing against the actual folio_ref_count() must account for their own references separately.
Link: https://syzkaller.appspot.com/bug?extid=8bb6fd945af4e0ad9299 [1] Link: https://lkml.kernel.org/r/20250430100150.279751-1-shivankg@amd.com Link: https://lkml.kernel.org/r/20250430100150.279751-2-shivankg@amd.com Signed-off-by: David Hildenbrand david@redhat.com Signed-off-by: Shivank Garg shivankg@amd.com Suggested-by: Matthew Wilcox willy@infradead.org Co-developed-by: David Hildenbrand david@redhat.com Cc: Alistair Popple apopple@nvidia.com Cc: Dave Kleikamp shaggy@kernel.org Cc: Donet Tom donettom@linux.ibm.com Cc: Jane Chu jane.chu@oracle.com Cc: Kefeng Wang wangkefeng.wang@huawei.com Cc: Zi Yan ziy@nvidia.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Stable-dep-of: 98c6d259319e ("mm/gup: check ref_count instead of lru before migration") Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/mm.h | 55 ++++++++++++++++++++++++++++++++++++++++++++++ mm/migrate.c | 22 ++++--------------- 2 files changed, 59 insertions(+), 18 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h index deeb535f920c8..41f5c88bdf3bc 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2200,6 +2200,61 @@ static inline bool folio_likely_mapped_shared(struct folio *folio) return atomic_read(&folio->_mapcount) > 0; }
+/** + * folio_expected_ref_count - calculate the expected folio refcount + * @folio: the folio + * + * Calculate the expected folio refcount, taking references from the pagecache, + * swapcache, PG_private and page table mappings into account. Useful in + * combination with folio_ref_count() to detect unexpected references (e.g., + * GUP or other temporary references). + * + * Does currently not consider references from the LRU cache. If the folio + * was isolated from the LRU (which is the case during migration or split), + * the LRU cache does not apply. + * + * Calling this function on an unmapped folio -- !folio_mapped() -- that is + * locked will return a stable result. + * + * Calling this function on a mapped folio will not result in a stable result, + * because nothing stops additional page table mappings from coming (e.g., + * fork()) or going (e.g., munmap()). + * + * Calling this function without the folio lock will also not result in a + * stable result: for example, the folio might get dropped from the swapcache + * concurrently. + * + * However, even when called without the folio lock or on a mapped folio, + * this function can be used to detect unexpected references early (for example, + * if it makes sense to even lock the folio and unmap it). + * + * The caller must add any reference (e.g., from folio_try_get()) it might be + * holding itself to the result. + * + * Returns the expected folio refcount. + */ +static inline int folio_expected_ref_count(const struct folio *folio) +{ + const int order = folio_order(folio); + int ref_count = 0; + + if (WARN_ON_ONCE(folio_test_slab(folio))) + return 0; + + if (folio_test_anon(folio)) { + /* One reference per page from the swapcache. */ + ref_count += folio_test_swapcache(folio) << order; + } else if (!((unsigned long)folio->mapping & PAGE_MAPPING_FLAGS)) { + /* One reference per page from the pagecache. */ + ref_count += !!folio->mapping << order; + /* One reference from PG_private. */ + ref_count += folio_test_private(folio); + } + + /* One reference per page table mapping. */ + return ref_count + folio_mapcount(folio); +} + #ifndef HAVE_ARCH_MAKE_FOLIO_ACCESSIBLE static inline int arch_make_folio_accessible(struct folio *folio) { diff --git a/mm/migrate.c b/mm/migrate.c index 25e7438af968a..8619aa884eaa8 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -453,20 +453,6 @@ void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd) } #endif
-static int folio_expected_refs(struct address_space *mapping, - struct folio *folio) -{ - int refs = 1; - if (!mapping) - return refs; - - refs += folio_nr_pages(folio); - if (folio_test_private(folio)) - refs++; - - return refs; -} - /* * Replace the folio in the mapping. * @@ -609,7 +595,7 @@ static int __folio_migrate_mapping(struct address_space *mapping, int folio_migrate_mapping(struct address_space *mapping, struct folio *newfolio, struct folio *folio, int extra_count) { - int expected_count = folio_expected_refs(mapping, folio) + extra_count; + int expected_count = folio_expected_ref_count(folio) + extra_count + 1;
if (folio_ref_count(folio) != expected_count) return -EAGAIN; @@ -626,7 +612,7 @@ int migrate_huge_page_move_mapping(struct address_space *mapping, struct folio *dst, struct folio *src) { XA_STATE(xas, &mapping->i_pages, folio_index(src)); - int rc, expected_count = folio_expected_refs(mapping, src); + int rc, expected_count = folio_expected_ref_count(src) + 1;
if (folio_ref_count(src) != expected_count) return -EAGAIN; @@ -756,7 +742,7 @@ static int __migrate_folio(struct address_space *mapping, struct folio *dst, struct folio *src, void *src_private, enum migrate_mode mode) { - int rc, expected_count = folio_expected_refs(mapping, src); + int rc, expected_count = folio_expected_ref_count(src) + 1;
/* Check whether src does not have extra refs before we do more work */ if (folio_ref_count(src) != expected_count) @@ -844,7 +830,7 @@ static int __buffer_migrate_folio(struct address_space *mapping, return migrate_folio(mapping, dst, src, mode);
/* Check whether page does not have extra refs before we do more work */ - expected_count = folio_expected_refs(mapping, src); + expected_count = folio_expected_ref_count(src) + 1; if (folio_ref_count(src) != expected_count) return -EAGAIN;
From: Hugh Dickins hughd@google.com
[ Upstream commit 98c6d259319ecf6e8d027abd3f14b81324b8c0ad ]
Patch series "mm: better GUP pin lru_add_drain_all()", v2.
Series of lru_add_drain_all()-related patches, arising from recent mm/gup migration report from Will Deacon.
This patch (of 5):
Will Deacon reports:-
When taking a longterm GUP pin via pin_user_pages(), __gup_longterm_locked() tries to migrate target folios that should not be longterm pinned, for example because they reside in a CMA region or movable zone. This is done by first pinning all of the target folios anyway, collecting all of the longterm-unpinnable target folios into a list, dropping the pins that were just taken and finally handing the list off to migrate_pages() for the actual migration.
It is critically important that no unexpected references are held on the folios being migrated, otherwise the migration will fail and pin_user_pages() will return -ENOMEM to its caller. Unfortunately, it is relatively easy to observe migration failures when running pKVM (which uses pin_user_pages() on crosvm's virtual address space to resolve stage-2 page faults from the guest) on a 6.15-based Pixel 6 device and this results in the VM terminating prematurely.
In the failure case, 'crosvm' has called mlock(MLOCK_ONFAULT) on its mapping of guest memory prior to the pinning. Subsequently, when pin_user_pages() walks the page-table, the relevant 'pte' is not present and so the faulting logic allocates a new folio, mlocks it with mlock_folio() and maps it in the page-table.
Since commit 2fbb0c10d1e8 ("mm/munlock: mlock_page() munlock_page() batch by pagevec"), mlock/munlock operations on a folio (formerly page), are deferred. For example, mlock_folio() takes an additional reference on the target folio before placing it into a per-cpu 'folio_batch' for later processing by mlock_folio_batch(), which drops the refcount once the operation is complete. Processing of the batches is coupled with the LRU batch logic and can be forcefully drained with lru_add_drain_all() but as long as a folio remains unprocessed on the batch, its refcount will be elevated.
This deferred batching therefore interacts poorly with the pKVM pinning scenario as we can find ourselves in a situation where the migration code fails to migrate a folio due to the elevated refcount from the pending mlock operation.
Hugh Dickins adds:-
!folio_test_lru() has never been a very reliable way to tell if an lru_add_drain_all() is worth calling, to remove LRU cache references to make the folio migratable: the LRU flag may be set even while the folio is held with an extra reference in a per-CPU LRU cache.
5.18 commit 2fbb0c10d1e8 may have made it more unreliable. Then 6.11 commit 33dfe9204f29 ("mm/gup: clear the LRU flag of a page before adding to LRU batch") tried to make it reliable, by moving LRU flag clearing; but missed the mlock/munlock batches, so still unreliable as reported.
And it turns out to be difficult to extend 33dfe9204f29's LRU flag clearing to the mlock/munlock batches: if they do benefit from batching, mlock/munlock cannot be so effective when easily suppressed while !LRU.
Instead, switch to an expected ref_count check, which was more reliable all along: some more false positives (unhelpful drains) than before, and never a guarantee that the folio will prove migratable, but better.
Note on PG_private_2: ceph and nfs are still using the deprecated PG_private_2 flag, with the aid of netfs and filemap support functions. Although it is consistently matched by an increment of folio ref_count, folio_expected_ref_count() intentionally does not recognize it, and ceph folio migration currently depends on that for PG_private_2 folios to be rejected. New references to the deprecated flag are discouraged, so do not add it into the collect_longterm_unpinnable_folios() calculation: but longterm pinning of transiently PG_private_2 ceph and nfs folios (an uncommon case) may invoke a redundant lru_add_drain_all(). And this makes easy the backport to earlier releases: up to and including 6.12, btrfs also used PG_private_2, but without a ref_count increment.
Note for stable backports: requires 6.16 commit 86ebd50224c0 ("mm: add folio_expected_ref_count() for reference count calculation").
Link: https://lkml.kernel.org/r/41395944-b0e3-c3ac-d648-8ddd70451d28@google.com Link: https://lkml.kernel.org/r/bd1f314a-fca1-8f19-cac0-b936c9614557@google.com Fixes: 9a4e9f3b2d73 ("mm: update get_user_pages_longterm to migrate pages allocated from CMA region") Signed-off-by: Hugh Dickins hughd@google.com Reported-by: Will Deacon will@kernel.org Closes: https://lore.kernel.org/linux-mm/20250815101858.24352-1-will@kernel.org/ Acked-by: Kiryl Shutsemau kas@kernel.org Acked-by: David Hildenbrand david@redhat.com Cc: "Aneesh Kumar K.V" aneesh.kumar@kernel.org Cc: Axel Rasmussen axelrasmussen@google.com Cc: Chris Li chrisl@kernel.org Cc: Christoph Hellwig hch@infradead.org Cc: Jason Gunthorpe jgg@ziepe.ca Cc: Johannes Weiner hannes@cmpxchg.org Cc: John Hubbard jhubbard@nvidia.com Cc: Keir Fraser keirf@google.com Cc: Konstantin Khlebnikov koct9i@gmail.com Cc: Li Zhe lizhe.67@bytedance.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Peter Xu peterx@redhat.com Cc: Rik van Riel riel@surriel.com Cc: Shivank Garg shivankg@amd.com Cc: Vlastimil Babka vbabka@suse.cz Cc: Wei Xu weixugc@google.com Cc: yangge yangge1116@126.com Cc: Yuanchu Xie yuanchu@google.com Cc: Yu Zhao yuzhao@google.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- mm/gup.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/gup.c b/mm/gup.c index e323843cc5dd8..e010173b24c39 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -2354,7 +2354,8 @@ static unsigned long collect_longterm_unpinnable_folios( continue; }
- if (!folio_test_lru(folio) && drain_allow) { + if (drain_allow && folio_ref_count(folio) != + folio_expected_ref_count(folio) + 1) { lru_add_drain_all(); drain_allow = false; }
linux-stable-mirror@lists.linaro.org