The patch below does not apply to the 5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id, to stable@vger.kernel.org.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x be6e843fc51a584672dfd9c4a6a24c8cb81d5fb7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to 'stable@vger.kernel.org' --in-reply-to '2025051204-tidal-lake-6ae7@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From be6e843fc51a584672dfd9c4a6a24c8cb81d5fb7 Mon Sep 17 00:00:00 2001
From: Gavin Guo <gavinguo@igalia.com>
Date: Mon, 21 Apr 2025 19:35:36 +0800
Subject: [PATCH] mm/huge_memory: fix dereferencing invalid pmd migration entry
When migrating a THP, concurrent access to the PMD migration entry during a deferred split scan can lead to an invalid address access, as illustrated below. To prevent this invalid access, it is necessary to check the PMD migration entry and return early. In this context, there is no need to use pmd_to_swp_entry and pfn_swap_entry_to_page to verify the equality of the target folio. Since the PMD migration entry is locked, it cannot serve as the target.

Mailing list discussion and explanation from Hugh Dickins: "An anon_vma lookup points to a location which may contain the folio of interest, but might instead contain another folio: and weeding out those other folios is precisely what the "folio != pmd_folio(*pmd)" check (and the "risk of replacing the wrong folio" comment a few lines above it) is for."
BUG: unable to handle page fault for address: ffffea60001db008
CPU: 0 UID: 0 PID: 2199114 Comm: tee Not tainted 6.14.0+ #4 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:split_huge_pmd_locked+0x3b5/0x2b60
Call Trace:
<TASK>
 try_to_migrate_one+0x28c/0x3730
 rmap_walk_anon+0x4f6/0x770
 unmap_folio+0x196/0x1f0
 split_huge_page_to_list_to_order+0x9f6/0x1560
 deferred_split_scan+0xac5/0x12a0
 shrinker_debugfs_scan_write+0x376/0x470
 full_proxy_write+0x15c/0x220
 vfs_write+0x2fc/0xcb0
 ksys_write+0x146/0x250
 do_syscall_64+0x6a/0x120
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

The bug was found by syzkaller on an internal kernel and then confirmed on upstream.
Link: https://lkml.kernel.org/r/20250421113536.3682201-1-gavinguo@igalia.com
Link: https://lore.kernel.org/all/20250414072737.1698513-1-gavinguo@igalia.com/
Link: https://lore.kernel.org/all/20250418085802.2973519-1-gavinguo@igalia.com/
Fixes: 84c3fc4e9c56 ("mm: thp: check pmd migration entry in common path")
Signed-off-by: Gavin Guo <gavinguo@igalia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: Florent Revest <revest@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2a47682d1ab7..47d76d03ce30 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3075,6 +3075,8 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address,
 			   pmd_t *pmd, bool freeze, struct folio *folio)
 {
+	bool pmd_migration = is_pmd_migration_entry(*pmd);
+
 	VM_WARN_ON_ONCE(folio && !folio_test_pmd_mappable(folio));
 	VM_WARN_ON_ONCE(!IS_ALIGNED(address, HPAGE_PMD_SIZE));
 	VM_WARN_ON_ONCE(folio && !folio_test_locked(folio));
@@ -3085,9 +3087,12 @@ void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address,
 	 * require a folio to check the PMD against. Otherwise, there
 	 * is a risk of replacing the wrong folio.
 	 */
-	if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) ||
-	    is_pmd_migration_entry(*pmd)) {
-		if (folio && folio != pmd_folio(*pmd))
+	if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) || pmd_migration) {
+		/*
+		 * Do not apply pmd_folio() to a migration entry; and folio lock
+		 * guarantees that it must be of the wrong folio anyway.
+		 */
+		if (folio && (pmd_migration || folio != pmd_folio(*pmd)))
 			return;
 		__split_huge_pmd_locked(vma, pmd, address, freeze);
 	}
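To see concretely why pmd_folio()/pmd_page() must not be applied here: a PMD migration entry is a swap-format entry whose PFN bits are encoded rather than directly valid. A minimal sketch of the distinction, using the mainline helper names from linux/swapops.h (the wrapper function itself is hypothetical, for illustration only):

	/* Hypothetical helper, illustrative only: resolve a huge PMD to its page. */
	static struct page *huge_pmd_to_page(pmd_t pmdval)
	{
		if (pmd_trans_huge(pmdval))
			/* Present huge PMD: the PFN bits are real. */
			return pmd_page(pmdval);
		if (is_pmd_migration_entry(pmdval))
			/*
			 * Swap-format entry: the PFN is encoded in a swp_entry_t,
			 * so pmd_page() would compute a bogus struct page address
			 * (cf. the ffffea60001db008 fault above); decode it instead.
			 */
			return pfn_swap_entry_to_page(pmd_to_swp_entry(pmdval));
		return NULL;
	}

The fix avoids even the decode step: the split path holds the folio lock, and a folio under migration holds its own lock, so a PMD migration entry can never refer to the folio this caller is asking about; returning early is sufficient.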
[ Upstream commit be6e843fc51a584672dfd9c4a6a24c8cb81d5fb7 ]
When migrating a THP, concurrent access to the PMD migration entry during a deferred split scan can lead to an invalid address access, as illustrated below. To prevent this invalid access, it is necessary to check the PMD migration entry and return early. In this context, there is no need to use pmd_to_swp_entry and pfn_swap_entry_to_page to verify the equality of the target folio. Since the PMD migration entry is locked, it cannot serve as the target.

Mailing list discussion and explanation from Hugh Dickins: "An anon_vma lookup points to a location which may contain the folio of interest, but might instead contain another folio: and weeding out those other folios is precisely what the "folio != pmd_folio(*pmd)" check (and the "risk of replacing the wrong folio" comment a few lines above it) is for."
BUG: unable to handle page fault for address: ffffea60001db008
CPU: 0 UID: 0 PID: 2199114 Comm: tee Not tainted 6.14.0+ #4 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:split_huge_pmd_locked+0x3b5/0x2b60
Call Trace:
<TASK>
 try_to_migrate_one+0x28c/0x3730
 rmap_walk_anon+0x4f6/0x770
 unmap_folio+0x196/0x1f0
 split_huge_page_to_list_to_order+0x9f6/0x1560
 deferred_split_scan+0xac5/0x12a0
 shrinker_debugfs_scan_write+0x376/0x470
 full_proxy_write+0x15c/0x220
 vfs_write+0x2fc/0xcb0
 ksys_write+0x146/0x250
 do_syscall_64+0x6a/0x120
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

The bug was found by syzkaller on an internal kernel and then confirmed on upstream.
Link: https://lkml.kernel.org/r/20250421113536.3682201-1-gavinguo@igalia.com
Link: https://lore.kernel.org/all/20250414072737.1698513-1-gavinguo@igalia.com/
Link: https://lore.kernel.org/all/20250418085802.2973519-1-gavinguo@igalia.com/
Fixes: 84c3fc4e9c56 ("mm: thp: check pmd migration entry in common path")
Signed-off-by: Gavin Guo <gavinguo@igalia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: Florent Revest <revest@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[gavin: backport the migration checking logic to __split_huge_pmd]
Signed-off-by: Gavin Guo <gavinguo@igalia.com>
---
 mm/huge_memory.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9139da4baa39..bcefc17954d6 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2161,7 +2161,7 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 	VM_BUG_ON(freeze && !page);
 	if (page) {
 		VM_WARN_ON_ONCE(!PageLocked(page));
-		if (page != pmd_page(*pmd))
+		if (is_pmd_migration_entry(*pmd) || page != pmd_page(*pmd))
 			goto out;
 	}
@@ -2196,7 +2196,7 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 		}
 		if (PageMlocked(page))
 			clear_page_mlock(page);
-	} else if (!(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd)))
+	} else if (!pmd_devmap(*pmd))
 		goto out;
 	__split_huge_pmd_locked(vma, pmd, range.start, freeze);
 out:
base-commit: 1c700860e8bc079c5c71d73c55e51865d273943c
On Mon, 16 Jun 2025, Gavin Guo wrote:
[ Upstream commit be6e843fc51a584672dfd9c4a6a24c8cb81d5fb7 ]
When migrating a THP, concurrent access to the PMD migration entry during a deferred split scan can lead to an invalid address access, as illustrated below. To prevent this invalid access, it is necessary to check the PMD migration entry and return early. In this context, there is no need to use pmd_to_swp_entry and pfn_swap_entry_to_page to verify the equality of the target folio. Since the PMD migration entry is locked, it cannot serve as the target.

Mailing list discussion and explanation from Hugh Dickins: "An anon_vma lookup points to a location which may contain the folio of interest, but might instead contain another folio: and weeding out those other folios is precisely what the "folio != pmd_folio(*pmd)" check (and the "risk of replacing the wrong folio" comment a few lines above it) is for."
BUG: unable to handle page fault for address: ffffea60001db008
CPU: 0 UID: 0 PID: 2199114 Comm: tee Not tainted 6.14.0+ #4 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:split_huge_pmd_locked+0x3b5/0x2b60
Call Trace:
<TASK>
 try_to_migrate_one+0x28c/0x3730
 rmap_walk_anon+0x4f6/0x770
 unmap_folio+0x196/0x1f0
 split_huge_page_to_list_to_order+0x9f6/0x1560
 deferred_split_scan+0xac5/0x12a0
 shrinker_debugfs_scan_write+0x376/0x470
 full_proxy_write+0x15c/0x220
 vfs_write+0x2fc/0xcb0
 ksys_write+0x146/0x250
 do_syscall_64+0x6a/0x120
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

The bug was found by syzkaller on an internal kernel and then confirmed on upstream.
Link: https://lkml.kernel.org/r/20250421113536.3682201-1-gavinguo@igalia.com
Link: https://lore.kernel.org/all/20250414072737.1698513-1-gavinguo@igalia.com/
Link: https://lore.kernel.org/all/20250418085802.2973519-1-gavinguo@igalia.com/
Fixes: 84c3fc4e9c56 ("mm: thp: check pmd migration entry in common path")
Signed-off-by: Gavin Guo <gavinguo@igalia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: Florent Revest <revest@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[gavin: backport the migration checking logic to __split_huge_pmd]
Signed-off-by: Gavin Guo <gavinguo@igalia.com>

 mm/huge_memory.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9139da4baa39..bcefc17954d6 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2161,7 +2161,7 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 	VM_BUG_ON(freeze && !page);
 	if (page) {
 		VM_WARN_ON_ONCE(!PageLocked(page));
-		if (page != pmd_page(*pmd))
+		if (is_pmd_migration_entry(*pmd) || page != pmd_page(*pmd))
 			goto out;
 	}
@@ -2196,7 +2196,7 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 		}
 		if (PageMlocked(page))
 			clear_page_mlock(page);
-	} else if (!(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd)))
+	} else if (!pmd_devmap(*pmd))
 		goto out;
I'm sorry, Gavin, but this 5.15 and the 5.10 and 5.4 backports look wrong to me, because here you drop the is_pmd_migration_entry(*pmd) condition, but if !page then that has not been checked earlier (this check here is specifically allowing a pmd migration entry to proceed to the split).
Hugh
 	__split_huge_pmd_locked(vma, pmd, range.start, freeze);
 out:
base-commit: 1c700860e8bc079c5c71d73c55e51865d273943c
2.43.0
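To make Hugh's objection concrete, here is an approximate reconstruction (from memory of 5.15-era mm/huge_memory.c, so context lines may differ from 5.15.y) of __split_huge_pmd() with the v1 backport applied:

	ptl = pmd_lock(vma->vm_mm, pmd);
	VM_BUG_ON(freeze && !page);
	if (page) {
		VM_WARN_ON_ONCE(!PageLocked(page));
		/* v1 hunk 1 (fine): never apply pmd_page() to a migration entry */
		if (is_pmd_migration_entry(*pmd) || page != pmd_page(*pmd))
			goto out;
	}
	if (pmd_trans_huge(*pmd)) {
		if (!page)
			page = pmd_page(*pmd);
		if (PageMlocked(page))
			clear_page_mlock(page);
	} else if (!pmd_devmap(*pmd)) {
		/*
		 * v1 hunk 2 (the problem): with page == NULL, nothing above
		 * has filtered out migration entries, so a pmd migration
		 * entry now takes this goto and is no longer split.
		 */
		goto out;
	}
	__split_huge_pmd_locked(vma, pmd, range.start, freeze);
out:
	spin_unlock(ptl);

With the original !(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd)) condition, a migration entry falls through to the split; that is the behaviour hunk 2 silently removed.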
On 6/19/25 11:30, Hugh Dickins wrote:
On Mon, 16 Jun 2025, Gavin Guo wrote:
[ Upstream commit be6e843fc51a584672dfd9c4a6a24c8cb81d5fb7 ]
When migrating a THP, concurrent access to the PMD migration entry during a deferred split scan can lead to an invalid address access, as illustrated below. To prevent this invalid access, it is necessary to check the PMD migration entry and return early. In this context, there is no need to use pmd_to_swp_entry and pfn_swap_entry_to_page to verify the equality of the target folio. Since the PMD migration entry is locked, it cannot serve as the target.

Mailing list discussion and explanation from Hugh Dickins: "An anon_vma lookup points to a location which may contain the folio of interest, but might instead contain another folio: and weeding out those other folios is precisely what the "folio != pmd_folio(*pmd)" check (and the "risk of replacing the wrong folio" comment a few lines above it) is for."
BUG: unable to handle page fault for address: ffffea60001db008
CPU: 0 UID: 0 PID: 2199114 Comm: tee Not tainted 6.14.0+ #4 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:split_huge_pmd_locked+0x3b5/0x2b60
Call Trace:
<TASK>
 try_to_migrate_one+0x28c/0x3730
 rmap_walk_anon+0x4f6/0x770
 unmap_folio+0x196/0x1f0
 split_huge_page_to_list_to_order+0x9f6/0x1560
 deferred_split_scan+0xac5/0x12a0
 shrinker_debugfs_scan_write+0x376/0x470
 full_proxy_write+0x15c/0x220
 vfs_write+0x2fc/0xcb0
 ksys_write+0x146/0x250
 do_syscall_64+0x6a/0x120
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

The bug was found by syzkaller on an internal kernel and then confirmed on upstream.
Link: https://lkml.kernel.org/r/20250421113536.3682201-1-gavinguo@igalia.com
Link: https://lore.kernel.org/all/20250414072737.1698513-1-gavinguo@igalia.com/
Link: https://lore.kernel.org/all/20250418085802.2973519-1-gavinguo@igalia.com/
Fixes: 84c3fc4e9c56 ("mm: thp: check pmd migration entry in common path")
Signed-off-by: Gavin Guo <gavinguo@igalia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: Florent Revest <revest@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[gavin: backport the migration checking logic to __split_huge_pmd]
Signed-off-by: Gavin Guo <gavinguo@igalia.com>

 mm/huge_memory.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9139da4baa39..bcefc17954d6 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2161,7 +2161,7 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 	VM_BUG_ON(freeze && !page);
 	if (page) {
 		VM_WARN_ON_ONCE(!PageLocked(page));
-		if (page != pmd_page(*pmd))
+		if (is_pmd_migration_entry(*pmd) || page != pmd_page(*pmd))
 			goto out;
 	}
@@ -2196,7 +2196,7 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 		}
 		if (PageMlocked(page))
 			clear_page_mlock(page);
-	} else if (!(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd)))
+	} else if (!pmd_devmap(*pmd))
 		goto out;
I'm sorry, Gavin, but this 5.15 and the 5.10 and 5.4 backports look wrong to me, because here you drop the is_pmd_migration_entry(*pmd) condition, but if !page then that has not been checked earlier (this check here is specifically allowing a pmd migration entry to proceed to the split).
Hugh
Hi Hugh,
Thank you again for the review.
Regarding 5.4/5.10/5.15, what do you think about the following changes?
@@ -2327,6 +2327,8 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 	mmu_notifier_invalidate_range_start(&range);
 	ptl = pmd_lock(vma->vm_mm, pmd);

+	if (is_pmd_migration_entry(*pmd))
+		goto out;
 	/*
 	 * If caller asks to setup a migration entries, we need a page to check
 	 * pmd against. Otherwise we can end up replacing wrong page.
@@ -2369,7 +2371,7 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 		}
 		if (PageMlocked(page))
 			clear_page_mlock(page);
-	} else if (!(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd)))
+	} else if (!pmd_devmap(*pmd))
 		goto out;
 	__split_huge_pmd_locked(vma, pmd, range.start, freeze);
 out:
There is still an access, page = pmd_page(*pmd), inside the if (!page) block. I'm not sure whether the pmd could be a migration entry when page is NULL. To avoid this as well, maybe just goto out directly at the beginning?
 	__split_huge_pmd_locked(vma, pmd, range.start, freeze);
 out:
base-commit: 1c700860e8bc079c5c71d73c55e51865d273943c
2.43.0
On Thu, 19 Jun 2025, Gavin Guo wrote:
On 6/19/25 11:30, Hugh Dickins wrote:
On Mon, 16 Jun 2025, Gavin Guo wrote:
[ Upstream commit be6e843fc51a584672dfd9c4a6a24c8cb81d5fb7 ]
When migrating a THP, concurrent access to the PMD migration entry during a deferred split scan can lead to an invalid address access, as illustrated below. To prevent this invalid access, it is necessary to check the PMD migration entry and return early. In this context, there is no need to use pmd_to_swp_entry and pfn_swap_entry_to_page to verify the equality of the target folio. Since the PMD migration entry is locked, it cannot serve as the target.

Mailing list discussion and explanation from Hugh Dickins: "An anon_vma lookup points to a location which may contain the folio of interest, but might instead contain another folio: and weeding out those other folios is precisely what the "folio != pmd_folio(*pmd)" check (and the "risk of replacing the wrong folio" comment a few lines above it) is for."
BUG: unable to handle page fault for address: ffffea60001db008
CPU: 0 UID: 0 PID: 2199114 Comm: tee Not tainted 6.14.0+ #4 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:split_huge_pmd_locked+0x3b5/0x2b60
Call Trace:
<TASK>
 try_to_migrate_one+0x28c/0x3730
 rmap_walk_anon+0x4f6/0x770
 unmap_folio+0x196/0x1f0
 split_huge_page_to_list_to_order+0x9f6/0x1560
 deferred_split_scan+0xac5/0x12a0
 shrinker_debugfs_scan_write+0x376/0x470
 full_proxy_write+0x15c/0x220
 vfs_write+0x2fc/0xcb0
 ksys_write+0x146/0x250
 do_syscall_64+0x6a/0x120
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

The bug was found by syzkaller on an internal kernel and then confirmed on upstream.
Link: https://lkml.kernel.org/r/20250421113536.3682201-1-gavinguo@igalia.com
Link: https://lore.kernel.org/all/20250414072737.1698513-1-gavinguo@igalia.com/
Link: https://lore.kernel.org/all/20250418085802.2973519-1-gavinguo@igalia.com/
Fixes: 84c3fc4e9c56 ("mm: thp: check pmd migration entry in common path")
Signed-off-by: Gavin Guo <gavinguo@igalia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: Florent Revest <revest@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[gavin: backport the migration checking logic to __split_huge_pmd]
Signed-off-by: Gavin Guo <gavinguo@igalia.com>

 mm/huge_memory.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9139da4baa39..bcefc17954d6 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2161,7 +2161,7 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 	VM_BUG_ON(freeze && !page);
 	if (page) {
 		VM_WARN_ON_ONCE(!PageLocked(page));
-		if (page != pmd_page(*pmd))
+		if (is_pmd_migration_entry(*pmd) || page != pmd_page(*pmd))
 			goto out;
 	}
@@ -2196,7 +2196,7 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 		}
 		if (PageMlocked(page))
 			clear_page_mlock(page);
-	} else if (!(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd)))
+	} else if (!pmd_devmap(*pmd))
 		goto out;
I'm sorry, Gavin, but this 5.15 and the 5.10 and 5.4 backports look wrong to me, because here you drop the is_pmd_migration_entry(*pmd) condition, but if !page then that has not been checked earlier (this check here is specifically allowing a pmd migration entry to proceed to the split).
Hugh
Hi Hugh,
Thank you again for the review.
Regarding 5.4/5.10/5.15, what do you think about the following changes?
I think you are going way off track with the following changes.
The first hunk of your backport (the pmd_page line) was fine, it was the second hunk (the pmd_devmap line) that I objected to: that second hunk should just be deleted, to make no change on the pmd_devmap line.
Maybe you're misreading that pmd_devmap line, it is easy to get lost in its ! and parentheses.
@@ -2327,6 +2327,8 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 	mmu_notifier_invalidate_range_start(&range);
 	ptl = pmd_lock(vma->vm_mm, pmd);

+	if (is_pmd_migration_entry(*pmd))
+		goto out;
No. In general, __split_huge_pmd_locked() works on pmd migration entries; the bug you are fixing is not with pmd migration entries as such, but with applying pmd_page(*pmd) when *pmd is a migration entry.
I do not recall offhand how important it is that __split_huge_pmd_locked() should apply to pmd migration entries (when page here is NULL), and I do not wish to spend time researching that: maybe it's just an optimization, or maybe it's essential on some path. What is clear is that this bugfix backport should not be making any change to that behaviour.
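(Indeed, __split_huge_pmd_locked() decodes rather than dereferences a pmd migration entry; roughly, from 5.15-era code and quoted from memory, so treat as approximate:)

	pmd_migration = is_pmd_migration_entry(old_pmd);
	if (unlikely(pmd_migration)) {
		/* Non-present entry: decode the swp_entry_t to find the page. */
		swp_entry_t entry = pmd_to_swp_entry(old_pmd);
		page = pfn_swap_entry_to_page(entry);
	} else {
		/* Present huge PMD: pmd_page() is safe here. */
		page = pmd_page(old_pmd);
	}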
 	/*
 	 * If caller asks to setup a migration entries, we need a page to check
 	 * pmd against. Otherwise we can end up replacing wrong page.
@@ -2369,7 +2371,7 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 		}
 		if (PageMlocked(page))
 			clear_page_mlock(page);
-	} else if (!(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd)))
+	} else if (!pmd_devmap(*pmd))
 		goto out;
 	__split_huge_pmd_locked(vma, pmd, range.start, freeze);
 out:
There is still an access, page = pmd_page(*pmd), inside the if (!page) block. I'm not sure whether the pmd could be a migration entry when page is NULL. To avoid this as well, maybe just goto out directly at the beginning?
No. The other pmd_page(*pmd) is inside a pmd_trans_huge(*pmd) block, so it's safe, *pmd cannot be a migration entry there. (Though admittedly I have to check rather carefully, because, at least in the x86 case, pmd_trans_huge(*pmd) does not guarantee that the present bit is set.)
Hugh
 	__split_huge_pmd_locked(vma, pmd, range.start, freeze);
 out:
base-commit: 1c700860e8bc079c5c71d73c55e51865d273943c
2.43.0
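For reference on Hugh's parenthetical: as best I recall (verify against the tree in question), the x86 test is a pure flags check with no _PAGE_PRESENT requirement:

	/* arch/x86/include/asm/pgtable.h, approximate */
	static inline int pmd_trans_huge(pmd_t pmd)
	{
		return (pmd_val(pmd) & (_PAGE_PSE | _PAGE_DEVMAP)) == _PAGE_PSE;
	}

Migration entries are encoded so that this test fails for them (otherwise they could never reach the else-if branch quoted above), which is why the pmd_page(*pmd) inside the pmd_trans_huge(*pmd) block is safe even though the present bit is not checked.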
On 6/20/25 04:17, Hugh Dickins wrote:
On Thu, 19 Jun 2025, Gavin Guo wrote:
On 6/19/25 11:30, Hugh Dickins wrote:
On Mon, 16 Jun 2025, Gavin Guo wrote:
[ Upstream commit be6e843fc51a584672dfd9c4a6a24c8cb81d5fb7 ]
When migrating a THP, concurrent access to the PMD migration entry during a deferred split scan can lead to an invalid address access, as illustrated below. To prevent this invalid access, it is necessary to check the PMD migration entry and return early. In this context, there is no need to use pmd_to_swp_entry and pfn_swap_entry_to_page to verify the equality of the target folio. Since the PMD migration entry is locked, it cannot serve as the target.

Mailing list discussion and explanation from Hugh Dickins: "An anon_vma lookup points to a location which may contain the folio of interest, but might instead contain another folio: and weeding out those other folios is precisely what the "folio != pmd_folio(*pmd)" check (and the "risk of replacing the wrong folio" comment a few lines above it) is for."
BUG: unable to handle page fault for address: ffffea60001db008
CPU: 0 UID: 0 PID: 2199114 Comm: tee Not tainted 6.14.0+ #4 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:split_huge_pmd_locked+0x3b5/0x2b60
Call Trace:
<TASK>
 try_to_migrate_one+0x28c/0x3730
 rmap_walk_anon+0x4f6/0x770
 unmap_folio+0x196/0x1f0
 split_huge_page_to_list_to_order+0x9f6/0x1560
 deferred_split_scan+0xac5/0x12a0
 shrinker_debugfs_scan_write+0x376/0x470
 full_proxy_write+0x15c/0x220
 vfs_write+0x2fc/0xcb0
 ksys_write+0x146/0x250
 do_syscall_64+0x6a/0x120
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

The bug was found by syzkaller on an internal kernel and then confirmed on upstream.
Link: https://lkml.kernel.org/r/20250421113536.3682201-1-gavinguo@igalia.com
Link: https://lore.kernel.org/all/20250414072737.1698513-1-gavinguo@igalia.com/
Link: https://lore.kernel.org/all/20250418085802.2973519-1-gavinguo@igalia.com/
Fixes: 84c3fc4e9c56 ("mm: thp: check pmd migration entry in common path")
Signed-off-by: Gavin Guo <gavinguo@igalia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: Florent Revest <revest@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[gavin: backport the migration checking logic to __split_huge_pmd]
Signed-off-by: Gavin Guo <gavinguo@igalia.com>

 mm/huge_memory.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9139da4baa39..bcefc17954d6 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2161,7 +2161,7 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 	VM_BUG_ON(freeze && !page);
 	if (page) {
 		VM_WARN_ON_ONCE(!PageLocked(page));
-		if (page != pmd_page(*pmd))
+		if (is_pmd_migration_entry(*pmd) || page != pmd_page(*pmd))
 			goto out;
 	}
@@ -2196,7 +2196,7 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 		}
 		if (PageMlocked(page))
 			clear_page_mlock(page);
-	} else if (!(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd)))
+	} else if (!pmd_devmap(*pmd))
 		goto out;
I'm sorry, Gavin, but this 5.15 and the 5.10 and 5.4 backports look wrong to me, because here you drop the is_pmd_migration_entry(*pmd) condition, but if !page then that has not been checked earlier (this check here is specifically allowing a pmd migration entry to proceed to the split).
Hugh
Hi Hugh,
Thank you again for the review.
Regarding 5.4/5.10/5.15, what do you think about the following changes?
I think you are going way off track with the following changes.
The first hunk of your backport (the pmd_page line) was fine, it was the second hunk (the pmd_devmap line) that I objected to: that second hunk should just be deleted, to make no change on the pmd_devmap line.
Maybe you're misreading that pmd_devmap line, it is easy to get lost in its ! and parentheses.
Got it. I'll go ahead and remove the second hunk and submit the patch soon.
@@ -2327,6 +2327,8 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 	mmu_notifier_invalidate_range_start(&range);
 	ptl = pmd_lock(vma->vm_mm, pmd);

+	if (is_pmd_migration_entry(*pmd))
+		goto out;
No. In general, __split_huge_pmd_locked() works on pmd migration entries; the bug you are fixing is not with pmd migration entries as such, but with applying pmd_page(*pmd) when *pmd is a migration entry.
Yeah, you are correct. Maybe I was thinking too much. When I modified the code, I recalled the folio-lock argument you made in the upstream discussion: when migration happens, the folio lock is held, so on this split path it's impossible to take the same lock, and the folio cannot be the same one. If it's not the same, it could be a symptom of the reverse-mapping behavior, and we just skip it.

However, I'll stop the discussion here rather than sprawl the logic further, and focus on fixing the pmd_page(*pmd) problem.
I do not recall offhand how important it is that __split_huge_pmd_locked() should apply to pmd migration entries (when page here is NULL), and I do not wish to spend time researching that: maybe it's just an optimization, or maybe it's essential on some path. What is clear is that this bugfix backport should not be making any change to that behaviour.
Agreed.
 	/*
 	 * If caller asks to setup a migration entries, we need a page to check
 	 * pmd against. Otherwise we can end up replacing wrong page.
@@ -2369,7 +2371,7 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 		}
 		if (PageMlocked(page))
 			clear_page_mlock(page);
-	} else if (!(pmd_devmap(*pmd) || is_pmd_migration_entry(*pmd)))
+	} else if (!pmd_devmap(*pmd))
 		goto out;
 	__split_huge_pmd_locked(vma, pmd, range.start, freeze);
 out:
There is still an access, page = pmd_page(*pmd), inside the if (!page) block. I'm not sure whether the pmd could be a migration entry when page is NULL. To avoid this as well, maybe just goto out directly at the beginning?
No. The other pmd_page(*pmd) is inside a pmd_trans_huge(*pmd) block, so it's safe, *pmd cannot be a migration entry there. (Though admittedly I have to check rather carefully, because, at least in the x86 case, pmd_trans_huge(*pmd) does not guarantee that the present bit is set.)
Looks good to me. Thank you for reviewing the logic.
Hugh
__split_huge_pmd_locked(vma, pmd, range.start, freeze);
out:
base-commit: 1c700860e8bc079c5c71d73c55e51865d273943c
2.43.0
[ Upstream commit be6e843fc51a584672dfd9c4a6a24c8cb81d5fb7 ]
When migrating a THP, concurrent access to the PMD migration entry during a deferred split scan can lead to an invalid address access, as illustrated below. To prevent this invalid access, it is necessary to check the PMD migration entry and return early. In this context, there is no need to use pmd_to_swp_entry and pfn_swap_entry_to_page to verify the equality of the target folio. Since the PMD migration entry is locked, it cannot serve as the target.

Mailing list discussion and explanation from Hugh Dickins: "An anon_vma lookup points to a location which may contain the folio of interest, but might instead contain another folio: and weeding out those other folios is precisely what the "folio != pmd_folio(*pmd)" check (and the "risk of replacing the wrong folio" comment a few lines above it) is for."
BUG: unable to handle page fault for address: ffffea60001db008
CPU: 0 UID: 0 PID: 2199114 Comm: tee Not tainted 6.14.0+ #4 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:split_huge_pmd_locked+0x3b5/0x2b60
Call Trace:
<TASK>
 try_to_migrate_one+0x28c/0x3730
 rmap_walk_anon+0x4f6/0x770
 unmap_folio+0x196/0x1f0
 split_huge_page_to_list_to_order+0x9f6/0x1560
 deferred_split_scan+0xac5/0x12a0
 shrinker_debugfs_scan_write+0x376/0x470
 full_proxy_write+0x15c/0x220
 vfs_write+0x2fc/0xcb0
 ksys_write+0x146/0x250
 do_syscall_64+0x6a/0x120
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

The bug was found by syzkaller on an internal kernel and then confirmed on upstream.
Link: https://lkml.kernel.org/r/20250421113536.3682201-1-gavinguo@igalia.com
Link: https://lore.kernel.org/all/20250414072737.1698513-1-gavinguo@igalia.com/
Link: https://lore.kernel.org/all/20250418085802.2973519-1-gavinguo@igalia.com/
Fixes: 84c3fc4e9c56 ("mm: thp: check pmd migration entry in common path")
Signed-off-by: Gavin Guo <gavinguo@igalia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: Florent Revest <revest@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[gavin: backport the migration checking logic to __split_huge_pmd]
Signed-off-by: Gavin Guo <gavinguo@igalia.com>
---
 mm/huge_memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9139da4baa39..e9c5de967b2c 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2161,7 +2161,7 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 	VM_BUG_ON(freeze && !page);
 	if (page) {
 		VM_WARN_ON_ONCE(!PageLocked(page));
-		if (page != pmd_page(*pmd))
+		if (is_pmd_migration_entry(*pmd) || page != pmd_page(*pmd))
 			goto out;
 	}
base-commit: 1c700860e8bc079c5c71d73c55e51865d273943c
On Sat, 21 Jun 2025, Gavin Guo wrote:
[ Upstream commit be6e843fc51a584672dfd9c4a6a24c8cb81d5fb7 ]
When migrating a THP, concurrent access to the PMD migration entry during a deferred split scan can lead to an invalid address access, as illustrated below. To prevent this invalid access, it is necessary to check the PMD migration entry and return early. In this context, there is no need to use pmd_to_swp_entry and pfn_swap_entry_to_page to verify the equality of the target folio. Since the PMD migration entry is locked, it cannot serve as the target.

Mailing list discussion and explanation from Hugh Dickins: "An anon_vma lookup points to a location which may contain the folio of interest, but might instead contain another folio: and weeding out those other folios is precisely what the "folio != pmd_folio(*pmd)" check (and the "risk of replacing the wrong folio" comment a few lines above it) is for."
BUG: unable to handle page fault for address: ffffea60001db008
CPU: 0 UID: 0 PID: 2199114 Comm: tee Not tainted 6.14.0+ #4 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:split_huge_pmd_locked+0x3b5/0x2b60
Call Trace:
<TASK>
 try_to_migrate_one+0x28c/0x3730
 rmap_walk_anon+0x4f6/0x770
 unmap_folio+0x196/0x1f0
 split_huge_page_to_list_to_order+0x9f6/0x1560
 deferred_split_scan+0xac5/0x12a0
 shrinker_debugfs_scan_write+0x376/0x470
 full_proxy_write+0x15c/0x220
 vfs_write+0x2fc/0xcb0
 ksys_write+0x146/0x250
 do_syscall_64+0x6a/0x120
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

The bug was found by syzkaller on an internal kernel and then confirmed on upstream.
Link: https://lkml.kernel.org/r/20250421113536.3682201-1-gavinguo@igalia.com
Link: https://lore.kernel.org/all/20250414072737.1698513-1-gavinguo@igalia.com/
Link: https://lore.kernel.org/all/20250418085802.2973519-1-gavinguo@igalia.com/
Fixes: 84c3fc4e9c56 ("mm: thp: check pmd migration entry in common path")
Signed-off-by: Gavin Guo <gavinguo@igalia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Cc: Florent Revest <revest@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[gavin: backport the migration checking logic to __split_huge_pmd]
Signed-off-by: Gavin Guo <gavinguo@igalia.com>
Thanks, yes, this new 5.15 version
Acked-by: Hugh Dickins <hughd@google.com>
 mm/huge_memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9139da4baa39..e9c5de967b2c 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2161,7 +2161,7 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 	VM_BUG_ON(freeze && !page);
 	if (page) {
 		VM_WARN_ON_ONCE(!PageLocked(page));
-		if (page != pmd_page(*pmd))
+		if (is_pmd_migration_entry(*pmd) || page != pmd_page(*pmd))
 			goto out;
 	}
base-commit: 1c700860e8bc079c5c71d73c55e51865d273943c
2.43.0
[ Sasha's backport helper bot ]
Hi,
✅ All tests passed successfully. No issues detected. No action required from the submitter.
The upstream commit SHA1 provided is correct: be6e843fc51a584672dfd9c4a6a24c8cb81d5fb7
Status in newer kernel trees:
6.15.y | Present (exact SHA1)
6.12.y | Present (different SHA1: 6166c3cf4054)
6.6.y  | Not found
6.1.y  | Not found
Note: The patch differs from the upstream commit:
---
1:  be6e843fc51a5 < -:  ------------- mm/huge_memory: fix dereferencing invalid pmd migration entry
-:  ------------- > 1:  9b81485fcaa21 mm/huge_memory: fix dereferencing invalid pmd migration entry
---
Results of testing on various branches:
| Branch              | Patch Apply | Build Test |
|---------------------|-------------|------------|
| stable/linux-5.15.y | Success     | Success    |