On Tue, Jul 01, 2025 at 10:31:00PM +0800, Lance Yang wrote:
> From: Lance Yang <lance.yang@linux.dev>
>
> As pointed out by David[1], the batched unmap logic in
> try_to_unmap_one() may read past the end of a PTE table when a large
> folio's PTE mappings are not fully contained within a single page
> table.
>
> While this scenario might be rare, an issue triggerable from userspace
> must be fixed regardless of its likelihood. This patch fixes the
> out-of-bounds access by refactoring the logic into a new helper,
> folio_unmap_pte_batch().
>
> The new helper correctly calculates the safe batch size by capping the
> scan at both the VMA and PMD boundaries. To simplify the code, it also
> supports partial batching (i.e., any number of pages from 1 up to the
> calculated safe maximum), as there is no strong reason to special-case
> fully mapped folios.
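
Side note for anyone skimming the thread: the capping described above
boils down to clamping the scan at pmd_addr_end(). A minimal sketch of
the idea (max_batch_ptes() is a made-up name here, not the actual
helper from the patch):

	/*
	 * Never scan past the current PTE table or the end of the VMA:
	 * pmd_addr_end(addr, end) returns the next PMD boundary after
	 * addr, clamped to end, so the result is the number of PTEs
	 * that can be examined safely starting at addr.
	 */
	static unsigned int max_batch_ptes(struct vm_area_struct *vma,
					   unsigned long addr)
	{
		unsigned long end_addr = pmd_addr_end(addr, vma->vm_end);

		return (end_addr - addr) >> PAGE_SHIFT;
	}
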
> [1] https://lore.kernel.org/linux-mm/a694398c-9f03-4737-81b9-7e49c857fcbe@redhat...
>
> Cc: stable@vger.kernel.org
> Reported-by: David Hildenbrand <david@redhat.com>
> Closes: https://lore.kernel.org/linux-mm/a694398c-9f03-4737-81b9-7e49c857fcbe@redhat...
> Fixes: 354dffd29575 ("mm: support batched unmap for lazyfree large folios during reclamation")
> Suggested-by: Barry Song <baohua@kernel.org>
> Acked-by: Barry Song <baohua@kernel.org>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> Acked-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Lance Yang <lance.yang@linux.dev>

LGTM,
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>

With a minor comment below.

> diff --git a/mm/rmap.c b/mm/rmap.c
> index fb63d9256f09..1320b88fab74 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -2206,13 +2213,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>  			hugetlb_remove_rmap(folio);
>  		} else {
>  			folio_remove_rmap_ptes(folio, subpage, nr_pages, vma);
> -			folio_ref_sub(folio, nr_pages - 1);
>  		}
>  		if (vma->vm_flags & VM_LOCKED)
>  			mlock_drain_local();
> -		folio_put(folio);
> -		/* We have already batched the entire folio */
> -		if (nr_pages > 1)
> +		folio_put_refs(folio, nr_pages);
> +
> +		/*
> +		 * If we are sure that we batched the entire folio and cleared
> +		 * all PTEs, we can just optimize and stop right here.
> +		 */
> +		if (nr_pages == folio_nr_pages(folio))
>  			goto walk_done;
>  		continue;

Just a minor comment.

We should probably teach page_vma_mapped_walk() to skip nr_pages pages,
or just rely on the next_pte: do { ... } while (pte_none(ptep_get(pvmw->pte)))
loop in page_vma_mapped_walk() to skip those PTEs?

Taking different paths depending on (nr_pages == folio_nr_pages(folio))
doesn't seem sensible.
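
To make the first option concrete, it could look roughly like the
sketch below (untested, and pvmw_skip_ptes() is a hypothetical helper,
not code from this patch; the exact bookkeeping against the next_pte
loop would need care):

	/*
	 * Hypothetical: after a batched unmap of nr_pages PTEs, advance
	 * the walk past the entries that were already cleared. We step
	 * nr_pages - 1 here and let the regular next_pte increment in
	 * page_vma_mapped_walk() take the final step.
	 */
	static void pvmw_skip_ptes(struct page_vma_mapped_walk *pvmw,
				   unsigned int nr_pages)
	{
		pvmw->pte += nr_pages - 1;
		pvmw->address += (unsigned long)(nr_pages - 1) << PAGE_SHIFT;
	}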