The patch titled
     Subject: mm: hugetlb: fix incorrect fallback for subpool
has been added to the -mm mm-hotfixes-unstable branch.  Its filename is
     mm-hugetlb-fix-incorrect-fallback-for-subpool.patch
This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches...
This patch will later appear in the mm-hotfixes-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days
------------------------------------------------------
From: Wupeng Ma <mawupeng1@huawei.com>
Subject: mm: hugetlb: fix incorrect fallback for subpool
Date: Thu, 10 Apr 2025 14:26:33 +0800
During our testing with hugetlb subpool enabled, we observe that
hstate->resv_huge_pages may underflow into negative values.  Root cause
analysis reveals a race condition in the subpool reservation fallback
handling, as follows:
hugetlb_reserve_pages()
    /* Attempt subpool reservation */
    gbl_reserve = hugepage_subpool_get_pages(spool, chg);

    /* Global reservation may fail after subpool allocation */
    if (hugetlb_acct_memory(h, gbl_reserve) < 0)
        goto out_put_pages;

out_put_pages:
    /* This incorrectly restores reservation to subpool */
    hugepage_subpool_put_pages(spool, chg);
When hugetlb_acct_memory() fails after the subpool allocation, the
current implementation over-commits the subpool by handing back the full
'chg' value instead of only the 'chg - gbl_reserve' portion that was
actually drawn from the subpool's reserve.  This discrepancy propagates
to the global reservations during subsequent releases, eventually
causing resv_huge_pages to underflow.
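To make the accounting asymmetry concrete, here is a minimal userspace
model of the two subpool helpers (a sketch for illustration only: the
names subpool_get/subpool_put and all numbers are ours, and the real
helpers in mm/hugetlb.c additionally handle max_hpages accounting and
locking).  It shows that after a get of 'chg' pages the subpool's
reserve has dropped by only chg - gbl_reserve, so the symmetric rollback
is to put back exactly that amount, and to uncharge from the hstate
whatever the put reports as releasable:

#include <assert.h>
#include <stdio.h>

/* toy subpool with only minimum-size (reserve) accounting */
static long min_hpages = 2;	/* subpool min_size, in huge pages */
static long rsv_hpages = 2;	/* pages the subpool holds in reserve */
static long used_hpages;	/* pages handed out by the subpool */
static long resv_huge_pages;	/* stand-in for the hstate counter */

/* returns how many pages still need a global (hstate) reservation */
static long subpool_get(long delta)
{
	long covered = rsv_hpages < delta ? rsv_hpages : delta;

	used_hpages += delta;
	rsv_hpages -= covered;
	return delta - covered;
}

/* returns how many global reservations the caller must release */
static long subpool_put(long delta)
{
	long ret = delta;

	used_hpages -= delta;
	if (used_hpages < min_hpages) {	/* refill the reserve first */
		long room = min_hpages - rsv_hpages;
		long keep = room < delta ? room : delta;

		rsv_hpages += keep;
		ret -= keep;
	}
	return ret;
}

int main(void)
{
	long chg = 3;
	long gbl_reserve = subpool_get(chg);	/* reserve 2 -> 0, returns 1 */

	/* hugetlb_acct_memory(h, gbl_reserve) fails at this point, so the
	 * hstate never holds those pages; now roll back the subpool side */

	/* buggy rollback: subpool_put(chg) would report one releasable
	 * global reservation that was never taken, skewing the books */

	/* fixed rollback, mirroring the patch: */
	resv_huge_pages -= subpool_put(chg - gbl_reserve);

	assert(rsv_hpages == min_hpages);	/* reserve fully restored */
	assert(resv_huge_pages == 0);		/* hstate counter untouched */
	printf("rollback consistent\n");
	return 0;
}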
This problem can be triggered easily with the following steps:

1. reserve hugepages for hugetlb allocation
2. mount hugetlbfs with min_size to enable the hugetlb subpool
3. allocate hugepages from two tasks (make sure the second one fails
   due to an insufficient number of hugepages)
4. wait for a few seconds and repeat step 3, which will make
   hstate->resv_huge_pages go below zero
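For steps 3 and 4, a shared hugetlbfs mapping is enough, because shared
mappings reserve their pages up front via hugetlb_reserve_pages(), so an
over-sized second instance fails already at mmap() and exercises the
fallback path.  A minimal repro sketch (the mount point /mnt/huge, the
2MB huge page size and the page counts below are assumptions, not part
of the report):

/*
 * Setup (steps 1 and 2), assuming 2MB huge pages:
 *   echo 4 > /proc/sys/vm/nr_hugepages
 *   mount -t hugetlbfs -o min_size=4M none /mnt/huge
 * Then run two instances, e.g. "./repro 3 &" followed by "./repro 3";
 * the second one should fail with ENOMEM.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define HPAGE_SIZE	(2UL << 20)	/* assumes 2MB huge pages */

int main(int argc, char **argv)
{
	long npages = argc > 1 ? atol(argv[1]) : 3;
	size_t len = npages * HPAGE_SIZE;
	char path[64], *p;
	int fd;

	/* one file per task so the two mappings do not share reservations */
	snprintf(path, sizeof(path), "/mnt/huge/repro-%d", getpid());
	fd = open(path, O_CREAT | O_RDWR, 0600);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* the reservation (and the fallback under test) happens here */
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	memset(p, 0, len);	/* fault the pages in */
	sleep(10);		/* hold the mapping while the other task runs */
	munmap(p, len);
	close(fd);
	unlink(path);
	return 0;
}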
To fix this problem, return the correct number of pages to the subpool
during the fallback after hugepage_subpool_get_pages is called.
Link: https://lkml.kernel.org/r/20250410062633.3102457-1-mawupeng1@huawei.com
Fixes: 1c5ecae3a93f ("hugetlbfs: add minimum size accounting to subpools")
Signed-off-by: Wupeng Ma <mawupeng1@huawei.com>
Tested-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Ma Wupeng <mawupeng1@huawei.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 mm/hugetlb.c |   28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)
--- a/mm/hugetlb.c~mm-hugetlb-fix-incorrect-fallback-for-subpool
+++ a/mm/hugetlb.c
@@ -3010,7 +3010,7 @@ struct folio *alloc_hugetlb_folio(struct
 	struct hugepage_subpool *spool = subpool_vma(vma);
 	struct hstate *h = hstate_vma(vma);
 	struct folio *folio;
-	long retval, gbl_chg;
+	long retval, gbl_chg, gbl_reserve;
 	map_chg_state map_chg;
 	int ret, idx;
 	struct hugetlb_cgroup *h_cg = NULL;
@@ -3163,8 +3163,16 @@ out_uncharge_cgroup_reservation:
 		hugetlb_cgroup_uncharge_cgroup_rsvd(idx, pages_per_huge_page(h),
 						    h_cg);
 out_subpool_put:
-	if (map_chg)
-		hugepage_subpool_put_pages(spool, 1);
+	/*
+	 * put page to subpool iff the quota of subpool's rsv_hpages is used
+	 * during hugepage_subpool_get_pages.
+	 */
+	if (map_chg && !gbl_chg) {
+		gbl_reserve = hugepage_subpool_put_pages(spool, 1);
+		hugetlb_acct_memory(h, -gbl_reserve);
+	}
+
+
 out_end_reservation:
 	if (map_chg != MAP_CHG_ENFORCED)
 		vma_end_reservation(h, vma, addr);
@@ -7233,7 +7241,7 @@ bool hugetlb_reserve_pages(struct inode
 					struct vm_area_struct *vma,
 					vm_flags_t vm_flags)
 {
-	long chg = -1, add = -1;
+	long chg = -1, add = -1, spool_resv, gbl_resv;
 	struct hstate *h = hstate_inode(inode);
 	struct hugepage_subpool *spool = subpool_inode(inode);
 	struct resv_map *resv_map;
@@ -7368,8 +7376,16 @@ bool hugetlb_reserve_pages(struct inode
 	return true;
 
 out_put_pages:
-	/* put back original number of pages, chg */
-	(void)hugepage_subpool_put_pages(spool, chg);
+	spool_resv = chg - gbl_reserve;
+	if (spool_resv) {
+		/* put sub pool's reservation back, chg - gbl_reserve */
+		gbl_resv = hugepage_subpool_put_pages(spool, spool_resv);
+		/*
+		 * subpool's reserved pages can not be put back due to race,
+		 * return to hstate.
+		 */
+		hugetlb_acct_memory(h, -gbl_resv);
+	}
 out_uncharge_cgroup:
 	hugetlb_cgroup_uncharge_cgroup_rsvd(hstate_index(h),
 					    chg * pages_per_huge_page(h), h_cg);
_
Patches currently in -mm which might be from mawupeng1@huawei.com are
mm-hugetlb-fix-incorrect-fallback-for-subpool.patch