On 27.06.25 09:36, Barry Song wrote:
On Fri, Jun 27, 2025 at 7:15 PM Lance Yang <lance.yang@linux.dev> wrote:
On 2025/6/27 14:55, Barry Song wrote:
On Fri, Jun 27, 2025 at 6:52 PM Barry Song <21cnbao@gmail.com> wrote:
On Fri, Jun 27, 2025 at 6:23 PM Lance Yang <ioworker0@gmail.com> wrote:
From: Lance Yang <lance.yang@linux.dev>
As pointed out by David[1], the batched unmap logic in try_to_unmap_one() can read past the end of a PTE table if a large folio is mapped starting at the last entry of that table. It would be quite rare in practice, as MADV_FREE typically splits the large folio ;)
So let's fix the potential out-of-bounds read by refactoring the logic into a new helper, folio_unmap_pte_batch().
The new helper now correctly calculates the safe number of pages to scan by limiting the operation to the boundaries of the current VMA and the PTE table.
In addition, the "all-or-nothing" batching restriction is removed to support partial batches. The reference counting is also cleaned up to use folio_put_refs().
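For reference, a rough sketch of how the helper can cap the batch (my sketch of the idea described above, not the verbatim patch; the folio_pte_batch() signature is approximated for recent kernels and may differ by version):

static unsigned int folio_unmap_pte_batch(struct folio *folio,
		struct page_vma_mapped_walk *pvmw,
		enum ttu_flags flags, pte_t pte)
{
	unsigned long end_addr, addr = pvmw->address;
	struct vm_area_struct *vma = pvmw->vma;
	unsigned int max_nr;

	if (flags & TTU_HWPOISON)
		return 1;
	if (!folio_test_large(folio))
		return 1;

	/* We may only batch within a single VMA and a single page table. */
	end_addr = pmd_addr_end(addr, vma->vm_end);
	max_nr = (end_addr - addr) >> PAGE_SHIFT;

	/* We only support lazyfree batching for clean anon folios. */
	if (!folio_test_anon(folio) || folio_test_swapbacked(folio))
		return 1;
	if (pte_unused(pte))
		return 1;

	return folio_pte_batch(folio, addr, pvmw->pte, pte, max_nr,
			       FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY,
			       NULL, NULL, NULL);
}

pmd_addr_end() already returns min(next PMD boundary, vma->vm_end), so max_nr can never extend the walk past the current PTE table or the VMA.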
[1] https://lore.kernel.org/linux-mm/a694398c-9f03-4737-81b9-7e49c857fcbe@redhat...
What about this?
As pointed out by David[1], the batched unmap logic in try_to_unmap_one() may read past the end of a PTE table when a large folio spans two PMDs, particularly after being remapped with mremap(). This patch fixes the potential out-of-bounds access by capping the batch at vm_end and the PMD boundary.
It also refactors the logic into a new helper, folio_unmap_pte_batch(), which supports batch sizes between 1 and folio_nr_pages(). This improves code clarity. Note that such cases are rare in practice, as MADV_FREE typically splits large folios.
Sorry, I meant that MADV_FREE typically splits large folios if the specified range doesn't cover the entire folio.
Hmm... I got it wrong as well :( It's the partial coverage that triggers the split.
How about this revised version:
As pointed out by David[1], the batched unmap logic in try_to_unmap_one() may read past the end of a PTE table when a large folio spans two PMDs, particularly after being remapped with mremap(). This patch fixes the potential out-of-bounds access by capping the batch at vm_end and the PMD boundary.
It also refactors the logic into a new helper, folio_unmap_pte_batch(), which supports batch sizes between 1 and folio_nr_pages(). This improves code clarity. Note that such boundary-straddling cases are rare in practice, as MADV_FREE will typically split a large folio if the advice range does not cover the entire folio.
I assume the out-of-bounds access must be fixed, even though it is very unlikely. It might occur after a large folio is marked with MADV_FREE and then remapped to an unaligned address, potentially crossing two PTE tables.
Right. If it can be triggered from userspace, it doesn't matter how likely/common/whatever it is. It must be fixed.
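To make the trigger concrete, here is a hypothetical userspace sequence that could produce the boundary-straddling layout (my sketch, not from the thread; it assumes x86-64 with 2 MiB PMDs, THP enabled, and that reclaim later visits the lazy-free folio):

#define _GNU_SOURCE
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

#define PMD_SIZE	(2UL << 20)	/* assumption: x86-64, 2 MiB PMDs */

int main(void)
{
	void *src, *win, *dst;
	unsigned long base;

	/* Fault in one PMD-sized anonymous region as a THP. */
	src = mmap(NULL, PMD_SIZE, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (src == MAP_FAILED)
		return 1;
	madvise(src, PMD_SIZE, MADV_HUGEPAGE);
	memset(src, 1, PMD_SIZE);

	/* Cover the entire folio so MADV_FREE does not split it. */
	madvise(src, PMD_SIZE, MADV_FREE);

	/*
	 * Pick a destination one page past a PMD boundary, so the folio
	 * ends up PTE-mapped across two PTE tables.
	 */
	win = mmap(NULL, 3 * PMD_SIZE, PROT_NONE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (win == MAP_FAILED)
		return 1;
	base = ((unsigned long)win + PMD_SIZE - 1) & ~(PMD_SIZE - 1);
	dst = (void *)(base + 4096);

	if (mremap(src, PMD_SIZE, PMD_SIZE,
		   MREMAP_MAYMOVE | MREMAP_FIXED, dst) == MAP_FAILED)
		return 1;

	/*
	 * Under memory pressure, try_to_unmap_one() now walks PTEs that
	 * stop partway into the second PTE table.
	 */
	pause();
	return 0;
}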
A batch size between 2 and nr_pages - 1 is rare in practice, as we typically split large folios when MADV_FREE does not cover the entire folio range. Such a batch can only occur if a large folio is partially unmapped after being marked MADV_FREE, which is quite an unusual pattern in userspace.
I think the point is rather: "Simplify the code by not special-casing completely mapped folios; there is no real reason why we cannot batch ranges that don't cover the complete folio."
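Concretely, once the all-or-nothing check is gone, the caller no longer compares the batch against folio_nr_pages(); a partial batch is handled the same as a full one, and the per-batch reference drop collapses to a single call (sketch under the same assumptions as the helper above):

	/*
	 * Drop the references for the nr_pages PTEs we just cleared,
	 * whether or not they covered the whole folio.
	 */
	folio_put_refs(folio, nr_pages);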