On Wed, May 28, 2025 at 10:26:04PM +0200, David Hildenbrand wrote:
Digging a bit:
commit 56c9cfb13c9b6516017eea4e8cbe22ea02e07ee6
Author: Naoya Horiguchi <nao.horiguchi@gmail.com>
Date:   Fri Sep 10 13:23:04 2010 +0900
    hugetlb, rmap: fix confusing page locking in hugetlb_cow()

    The "if (!trylock_page)" block in the avoidcopy path of hugetlb_cow()
    looks confusing and is buggy. Originally this trylock_page() was
    intended to make sure that old_page is locked even when old_page !=
    pagecache_page, because then only pagecache_page is locked.
Added the comment
/*
* hugetlb_cow() requires page locks of pte_page(entry) and
* pagecache_page, so here we need take the former one
* when page != pagecache_page or !pagecache_page.
* Note that locking order is always pagecache_page -> page,
* so no worry about deadlock.
*/
And
commit 0fe6e20b9c4c53b3e97096ee73a0857f60aad43f
Author: Naoya Horiguchi <nao.horiguchi@gmail.com>
Date:   Fri May 28 09:29:16 2010 +0900
    hugetlb, rmap: add reverse mapping for hugepage

    This patch adds reverse mapping feature for hugepage by introducing
    mapcount for shared/private-mapped hugepage and anon_vma for
    private-mapped hugepage.

    While hugepage is not currently swappable, reverse mapping can be
    useful for memory error handler.

    Without this patch, memory error handler cannot identify processes
    using the bad hugepage nor unmap it from them. That is:
    - for shared hugepage:
      we can collect processes using a hugepage through pagecache,
      but can not unmap the hugepage because of the lack of mapcount.
    - for privately mapped hugepage:
      we can neither collect processes nor unmap the hugepage.
    This patch solves these problems.

    This patch include the bug fix given by commit 23be7468e8, so reverts it.
Added the real locking magic.
Yes, I have been checking "hugetlb, rmap: add reverse mapping for hugepage", which added locking the now-so-called 'old_folio' in case hugetlbfs_pagecache_page() didn't return anything.
Because in hugetlb_wp() (hugetlb_cow() back then), this was added:
@@ -2286,8 +2299,11 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
 retry_avoidcopy:
 	/* If no-one else is actually using this page, avoid the copy
 	 * and just make the page writable */
-	avoidcopy = (page_count(old_page) == 1);
+	avoidcopy = (page_mapcount(old_page) == 1);
 	if (avoidcopy) {
+		if (!trylock_page(old_page))
+			if (PageAnon(old_page))
+				page_move_anon_rmap(old_page, vma, address);
So, as you mentioned, it was done to keep the rmap stable, as I guess the rmap code tests the page lock.
Not that much changed regarding locking until COW support was added in
commit 1e8f889b10d8d2223105719e36ce45688fedbd59
Author: David Gibson <david@gibson.dropbear.id.au>
Date:   Fri Jan 6 00:10:44 2006 -0800
    [PATCH] Hugetlb: Copy on Write support

    Implement copy-on-write support for hugetlb mappings so MAP_PRIVATE
    can be supported.  This helps us to safely use hugetlb pages in many
    more applications.  The patch makes the following changes.  If needed,
    I also have it broken out according to the following paragraphs.
Confusing.
Locking the *old_folio* when calling hugetlb_wp() makes sense when it is an anon folio because we might want to call folio_move_anon_rmap() to adjust the rmap root.
Yes, this is clear.
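Just to spell out the dependency (with a hypothetical helper of my own, nothing that exists in hugetlb.c): folio_move_anon_rmap() retargets the folio's anon_vma, and that is only safe while the folio lock is held, so rmap walkers cannot see the mapping change under them:

static void reuse_anon_folio(struct folio *old_folio,
			     struct vm_area_struct *vma)
{
	/* Hypothetical helper, just to illustrate the invariant:
	 * the anon_vma may only be retargeted under the folio lock. */
	VM_WARN_ON_ONCE(!folio_test_locked(old_folio));
	folio_move_anon_rmap(old_folio, vma);
}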
Locking the pagecache folio when calling hugetlb_wp() if old_folio is an anon folio ... does not make sense to me.
I think this one is also clear.
Locking the pagecache folio when calling hugetlb_wp() if old_folio is a pagecache folio ... also doesn't quite make sense to me. Again, we don't take the lock for ordinary pages, so what's special about hugetlb for the last case (reservations, I assume?).
So, this case is when pagecache_folio == old_folio.
I guess we are talking about resv_maps? But I think we cannot interfere there: for the reserves to be modified, the page has to go away.
Now, I have been checking this one too:
commit 04f2cbe35699d22dbf428373682ead85ca1240f5
Author: Mel Gorman <mel@csn.ul.ie>
Date:   Wed Jul 23 21:27:25 2008 -0700
    hugetlb: guarantee that COW faults for a process that called mmap(MAP_PRIVATE) on hugetlbfs will succeed
And I think it is interesting. That one added this chunk in hugetlb_fault():
@@ -1126,8 +1283,15 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	spin_lock(&mm->page_table_lock);
 	/* Check for a racing update before calling hugetlb_cow */
 	if (likely(pte_same(entry, huge_ptep_get(ptep))))
-		if (write_access && !pte_write(entry))
-			ret = hugetlb_cow(mm, vma, address, ptep, entry);
+		if (write_access && !pte_write(entry)) {
+			struct page *page;
+			page = hugetlbfs_pagecache_page(vma, address);
+			ret = hugetlb_cow(mm, vma, address, ptep, entry, page);
+			if (page) {
+				unlock_page(page);
+				put_page(page);
+			}
+		}
So, it finds and locks the page in the pagecache, and calls hugetlb_cow().
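(For reference, the lookup side is basically a find_lock_page() on the inode's mapping; a minimal sketch, with the index computation elided and a helper name of my own rather than the kernel's:

static struct page *pagecache_page_sketch(struct address_space *mapping,
					  pgoff_t idx)
{
	/* Returns the page locked and with an extra reference, or NULL */
	return find_lock_page(mapping, idx);
}

which is also why the hunk above has to unlock_page()/put_page() once hugetlb_cow() returns.)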
hugetlb_fault() takes hugetlb_instantiation_mutex, and there is a comment saying:
	/*
	 * Serialize hugepage allocation and instantiation, so that we don't
	 * get spurious allocation failures if two CPUs race to instantiate
	 * the same page in the page cache.
	 */
	mutex_lock(&hugetlb_instantiation_mutex);
But it does not say anything about truncation. Actually, checking the truncation code from back then, neither truncate_hugepages() nor any of its callers took the hugetlb_instantiation_mutex, the way it is done today (e.g. the current remove_inode_hugepages() code).
Back then, truncate_hugepages() relied only on lock_page():
static void truncate_hugepages(struct inode *inode, loff_t lstart)
{
	...
	...
	lock_page(page);
	truncate_huge_page(page);
	unlock_page(page);
}
Today, on the other hand, remove_inode_hugepages() takes the mutex and also the lock, and only then zaps the page and does its thing with the resv_map.
So I think that we should not even need the lock for hugetlb_wp() when pagecache_folio == old_folio (pagecache), because the mutex already protects us from the page going away (e.g. being truncated), right? Besides, we hold a reference on that folio, since filemap_lock_hugetlb_folio() locks the folio and increases its refcount.
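Roughly, the serialization I have in mind looks like this (a sketch from memory, not verbatim kernel code; error handling and the actual removal are elided):

	/* fault side (hugetlb_fault() today) */
	hash = hugetlb_fault_mutex_hash(mapping, idx);
	mutex_lock(&hugetlb_fault_mutex_table[hash]);
	folio = filemap_lock_hugetlb_folio(h, mapping, idx);	/* locked + extra ref */
	/* ... hugetlb_wp() runs with the mutex held ... */
	mutex_unlock(&hugetlb_fault_mutex_table[hash]);

	/* truncation side (remove_inode_hugepages() today) */
	hash = hugetlb_fault_mutex_hash(mapping, idx);
	mutex_lock(&hugetlb_fault_mutex_table[hash]);
	folio_lock(folio);
	/* unmap, remove from the page cache, fix up the resv_map */
	folio_unlock(folio);
	mutex_unlock(&hugetlb_fault_mutex_table[hash]);

So truncation cannot pull the folio out from under the fault path while the fault mutex is held, and the folio lock plus the extra reference pin it on top of that.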
All in all, I am leaning towards the lock not being needed, but it's getting late here...