On 18 Apr 2025, at 5:03, Gavin Guo wrote:
On 4/18/25 16:58, Gavin Guo wrote:
When migrating a THP, concurrent access to the PMD migration entry during a deferred split scan can lead to a invalid address access, as illustrated below. To prevent this page fault, it is necessary to check the PMD migration entry and return early. In this context, there is no need to use pmd_to_swp_entry and pfn_swap_entry_to_page to verify the equality of the target folio. Since the PMD migration entry is locked, it cannot be served as the target.
Mailing list discussion and explanation from Hugh Dickins: "An anon_vma lookup points to a location which may contain the folio of interest, but might instead contain another folio: and weeding out those other folios is precisely what the "folio != pmd_folio((*pmd)" check (and the "risk of replacing the wrong folio" comment a few lines above it) is for."
BUG: unable to handle page fault for address: ffffea60001db008 CPU: 0 UID: 0 PID: 2199114 Comm: tee Not tainted 6.14.0+ #4 NONE Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 RIP: 0010:split_huge_pmd_locked+0x3b5/0x2b60 Call Trace:
<TASK> try_to_migrate_one+0x28c/0x3730 rmap_walk_anon+0x4f6/0x770 unmap_folio+0x196/0x1f0 split_huge_page_to_list_to_order+0x9f6/0x1560 deferred_split_scan+0xac5/0x12a0 shrinker_debugfs_scan_write+0x376/0x470 full_proxy_write+0x15c/0x220 vfs_write+0x2fc/0xcb0 ksys_write+0x146/0x250 do_syscall_64+0x6a/0x120 entry_SYSCALL_64_after_hwframe+0x76/0x7e
The bug is found by syzkaller on an internal kernel, then confirmed on upstream.
Fixes: 84c3fc4e9c56 ("mm: thp: check pmd migration entry in common path") Cc: stable@vger.kernel.org Signed-off-by: Gavin Guo gavinguo@igalia.com Acked-by: David Hildenbrand david@redhat.com Acked-by: Hugh Dickins hughd@google.com Acked-by: Zi Yan ziy@nvidia.com Link: https://lore.kernel.org/all/20250414072737.1698513-1-gavinguo@igalia.com/
V1 -> V2: Add explanation from Hugh and correct the wording from page fault to invalid address access.
mm/huge_memory.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 2a47682d1ab7..0cb9547dcff2 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -3075,6 +3075,8 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address, pmd_t *pmd, bool freeze, struct folio *folio) {
- bool pmd_migration = is_pmd_migration_entry(*pmd);
- VM_WARN_ON_ONCE(folio && !folio_test_pmd_mappable(folio)); VM_WARN_ON_ONCE(!IS_ALIGNED(address, HPAGE_PMD_SIZE)); VM_WARN_ON_ONCE(folio && !folio_test_locked(folio));
@@ -3085,10 +3087,18 @@ void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address, * require a folio to check the PMD against. Otherwise, there * is a risk of replacing the wrong folio. */
- if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) ||
is_pmd_migration_entry(*pmd)) {
if (folio && folio != pmd_folio(*pmd))
return;
- if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) || pmd_migration) {
if (folio) {
/*
* Do not apply pmd_folio() to a migration entry; and
* folio lock guarantees that it must be of the wrong
* folio anyway.
*/
if (pmd_migration)
return;
if (folio != pmd_folio(*pmd))
return;
__split_huge_pmd_locked(vma, pmd, address, freeze); } }}
base-commit: a24588245776dafc227243a01bfbeb8a59bafba9
Hi Zi, I've carefully reviewed the mailing list and observed that the indentation is not a strong concern from the reviews. And the cleanup suggestion from David will override the modification in this patch. I have decided to keep the original version (the unindented one). Let me know if you have any feedback with the v2 patch. Thank you!
No problem. Thank you for the fix.
Best Regards, Yan, Zi