Re: [PATCH v3] mm/hugetlb: fix a deadlock with pagecache_folio and hugetlb_fault_mutex_table

28 May 2025


On 28.05.25 17:45, Oscar Salvador wrote:
...
On Wed, May 28, 2025 at 05:09:26PM +0200, David Hildenbrand wrote:
...
On 28.05.25 17:03, Peter Xu wrote:
...
So I'm not 100% sure we need the folio lock even for copy; IIUC a refcount
would be enough?
The introducing patches seem to talk about blocking concurrent migration /
rmap walks.
I thought the main reason was because PageLock protects us against writes,
so when copying (in case of copying the underlying file), we want the
file to be stable throughout the copy?
Well, we don't do the same for ordinary pages, so why should we do it for hugetlb?
See wp_page_copy().
If you have a MAP_PRIVATE mapping of a file and modify the pagecache 
pages concurrently (write to another MAP_SHARED mapping, write() ...), 
there are no guarantees about one observing any specific page state.
At least not that I am aware of ;)
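To make the MAP_PRIVATE point concrete, here is a minimal userspace sketch for ordinary (non-hugetlb) pages. It is not taken from the patch; the function name private_mapping_demo() and the temporary file path are made up for illustration. It only checks the one state that is stable: once the private mapping has taken its own copy through a write fault, later write()s to the file no longer reach it. What the mapping shows before that copy is left unspecified by mmap(2), so the sketch deliberately asserts nothing about it.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Reports the first byte seen through the private mapping and through
 * the file after the file was modified behind the mapping's back.
 * Returns 0 on success, -1 if any setup step fails.
 */
int private_mapping_demo(char *priv_after, char *file_after)
{
    char path[] = "/tmp/map_private_demo_XXXXXX"; /* assumed writable */
    long psz = sysconf(_SC_PAGESIZE);
    char *p = MAP_FAILED;
    int ret = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path); /* the file lives on until fd is closed */

    /* One page of file data, starting with 'A'. */
    if (psz <= 0 || ftruncate(fd, psz) != 0 || pwrite(fd, "A", 1, 0) != 1)
        goto out_close;

    p = mmap(NULL, (size_t)psz, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED)
        goto out_close;

    /* Write fault: the mapping now owns a private copy of the page. */
    p[1] = 'x';

    /* Modify the file after the copy was taken. */
    if (pwrite(fd, "B", 1, 0) != 1)
        goto out_unmap;

    *priv_after = p[0]; /* read from the private copy */
    if (pread(fd, file_after, 1, 0) != 1) /* read from the file itself */
        goto out_unmap;
    ret = 0;

out_unmap:
    munmap(p, (size_t)psz);
out_close:
    close(fd);
    return ret;
}
```

On Linux the private copy keeps 'A' while the file reads back 'B'. Moving the pwrite() of "B" before the write fault is the case discussed above: there is no promise about which byte the mapping would then show.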
...
...
Maybe also concurrent fallocate(PUNCH_HOLE) is a problem regarding
reservations? Not sure ...
fallocate()->hugetlb_vmdelete_list() tries to grab the vma in write-mode,
and hugetlb_wp() grabs the lock in read-mode, so we should be covered?
Yeah, maybe that's the case nowadays. Maybe it wasn't in the past ...
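The read-mode versus write-mode argument can be sketched with a toy model. This is a single-threaded illustration only, not the kernel's lock implementation: struct toy_rwlock and all toy_* helpers are hypothetical names, and the scenario assumes the fault side holds the lock in read mode for the whole window in which the other side attempts write mode.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy reader/writer lock state; hypothetical, not a kernel type. */
struct toy_rwlock {
    int readers;  /* number of current read-mode holders */
    bool writer;  /* true while one holder has write mode */
};

/* Read mode is granted unless a writer currently holds the lock. */
static bool toy_tryread(struct toy_rwlock *l)
{
    if (l->writer)
        return false;
    l->readers++;
    return true;
}

/* Write mode is granted only when nobody holds the lock at all. */
static bool toy_trywrite(struct toy_rwlock *l)
{
    if (l->writer || l->readers > 0)
        return false;
    l->writer = true;
    return true;
}

static void toy_unlock_read(struct toy_rwlock *l)
{
    assert(l->readers > 0);
    l->readers--;
}

static void toy_unlock_write(struct toy_rwlock *l)
{
    assert(l->writer);
    l->writer = false;
}

/*
 * Scenario from the discussion: the fault side holds read mode while
 * the hole-punching side attempts write mode. Returns true when the
 * write attempt is refused during the fault and granted afterwards.
 */
bool punch_hole_excluded_during_fault(void)
{
    struct toy_rwlock vma_lock = { 0, false };
    bool fault_ok, hole_during, hole_after;

    fault_ok = toy_tryread(&vma_lock);     /* fault side: read mode */
    hole_during = toy_trywrite(&vma_lock); /* unmap side: write mode */
    toy_unlock_read(&vma_lock);            /* fault finished */

    hole_after = toy_trywrite(&vma_lock);  /* retry once readers are gone */
    if (hole_after)
        toy_unlock_write(&vma_lock);

    return fault_ok && !hole_during && hole_after;
}
```

The model only captures the exclusion between the two modes. Whether every relevant path really takes the lock, now or in older kernels, is exactly the open question above.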
...
Also, hugetlbfs_punch_hole()->remove_inode_hugepages() will try to grab the mutex.
The only fishy thing I see is hugetlbfs_zero_partial_page().
But that is for old_page, and as I said, I thought the main reason was to
protect us against writes during the copy.
See above, I really wouldn't understand why that is required.
...
...
For 2) I am also not sure if we need the pagecache folio locked; I
doubt it ... but this code is not the easiest to follow.
I have been staring at that code and thinking about potential scenarios
for a few days now, and I cannot convince myself that we need
pagecache_folio's lock when pagecache_folio != old_folio because, as a
matter of fact, I cannot think of anything it protects us against.
I plan to rework this in a more sane way, or at least a less obfuscated one, and then
Galvin can fire his syzkaller to check whether we are good.
-- 
Cheers,

David / dhildenb
