One fix for occasional failures I found while testing and a bunch of cleanups that should make that test easier to digest.
Tested on x86-64; the test seems to pass reliably.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Wei Yang <richard.weiyang@gmail.com>
David Hildenbrand (2):
  selftests/mm: split_huge_page_test: fix occasional is_backed_by_folio() wrong results
  selftests/mm: split_huge_page_test: cleanups for split_pte_mapped_thp test

 .../selftests/mm/split_huge_page_test.c | 138 ++++++++++--------
 1 file changed, 81 insertions(+), 57 deletions(-)
base-commit: b73c6f2b5712809f5f386780ac46d1d78c31b2e6
When checking for actual tail or head pages of a folio, we must make sure that the KPF_COMPOUND_HEAD/KPF_COMPOUND_TAIL flag is paired with KPF_THP.
For example, if we have another large folio after our large folio in physical memory, our "pfn_flags & (KPF_THP | KPF_COMPOUND_TAIL" would trigger even though it's actually a head page of the next folio.
If is_backed_by_folio() returns a wrong result, split_pte_mapped_thp() can fail with "Some THPs are missing during mremap".
Fix it by checking for head/tail pages of folios properly. Add folio_tail_flags/folio_head_flags to improve readability and use these masks also when just testing for any compound page.
Fixes: 169b456b0162 ("selftests/mm: reimplement is_backed_by_thp() with more precise check")
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 tools/testing/selftests/mm/split_huge_page_test.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/tools/testing/selftests/mm/split_huge_page_test.c b/tools/testing/selftests/mm/split_huge_page_test.c
index 10ae65ea032f6..72d6d8bb329ed 100644
--- a/tools/testing/selftests/mm/split_huge_page_test.c
+++ b/tools/testing/selftests/mm/split_huge_page_test.c
@@ -44,6 +44,8 @@ int kpageflags_fd;
 static bool is_backed_by_folio(char *vaddr, int order, int pagemap_fd,
 		int kpageflags_fd)
 {
+	const uint64_t folio_head_flags = KPF_THP | KPF_COMPOUND_HEAD;
+	const uint64_t folio_tail_flags = KPF_THP | KPF_COMPOUND_TAIL;
 	const unsigned long nr_pages = 1UL << order;
 	unsigned long pfn_head;
 	uint64_t pfn_flags;
@@ -61,7 +63,7 @@ static bool is_backed_by_folio(char *vaddr, int order, int pagemap_fd,
 
 	/* check for order-0 pages */
 	if (!order) {
-		if (pfn_flags & (KPF_THP | KPF_COMPOUND_HEAD | KPF_COMPOUND_TAIL))
+		if (pfn_flags & (folio_head_flags | folio_tail_flags))
 			return false;
 		return true;
 	}
@@ -76,14 +78,14 @@ static bool is_backed_by_folio(char *vaddr, int order, int pagemap_fd,
 		goto fail;
 
 	/* head PFN has no compound_head flag set */
-	if (!(pfn_flags & (KPF_THP | KPF_COMPOUND_HEAD)))
+	if ((pfn_flags & folio_head_flags) != folio_head_flags)
 		return false;
 
 	/* check all tail PFN flags */
 	for (i = 1; i < nr_pages; i++) {
 		if (pageflags_get(pfn_head + i, kpageflags_fd, &pfn_flags))
 			goto fail;
-		if (!(pfn_flags & (KPF_THP | KPF_COMPOUND_TAIL)))
+		if ((pfn_flags & folio_tail_flags) != folio_tail_flags)
 			return false;
 	}
 
@@ -94,11 +96,8 @@ static bool is_backed_by_folio(char *vaddr, int order, int pagemap_fd,
 	if (pageflags_get(pfn_head + nr_pages, kpageflags_fd, &pfn_flags))
 		return true;
 
-	/* this folio is bigger than the given order */
-	if (pfn_flags & (KPF_THP | KPF_COMPOUND_TAIL))
-		return false;
-
-	return true;
+	/* If we find another tail page, then the folio is larger. */
+	return (pfn_flags & folio_tail_flags) != folio_tail_flags;
 fail:
 	ksft_exit_fail_msg("Failed to get folio info\n");
 	return false;
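For readers skimming the thread, the difference between the old "any bit set" test and the new "all bits set" test can be reproduced outside the kernel tree. The following is only a sketch: the helper names are made up for illustration, and the bit positions are assumed to match include/uapi/linux/kernel-page-flags.h (they are redefined locally to keep the example self-contained).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption: these bit positions mirror the uapi kernel-page-flags
 * header. They are redefined here only so the sketch builds standalone.
 */
#define KPF_COMPOUND_HEAD	(UINT64_C(1) << 15)
#define KPF_COMPOUND_TAIL	(UINT64_C(1) << 16)
#define KPF_THP			(UINT64_C(1) << 22)

/* Old-style check (hypothetical name): true if ANY bit of the mask is set. */
static bool looks_like_tail_any(uint64_t pfn_flags)
{
	return (pfn_flags & (KPF_THP | KPF_COMPOUND_TAIL)) != 0;
}

/* New-style check (hypothetical name): true only if ALL mask bits are set. */
static bool looks_like_tail_all(uint64_t pfn_flags)
{
	const uint64_t folio_tail_flags = KPF_THP | KPF_COMPOUND_TAIL;

	return (pfn_flags & folio_tail_flags) == folio_tail_flags;
}

/* Hypothetical self-check: flags of the head page of a neighbouring THP. */
static void check_next_folio_head(void)
{
	const uint64_t next_head = KPF_THP | KPF_COMPOUND_HEAD;
	const uint64_t real_tail = KPF_THP | KPF_COMPOUND_TAIL;

	/* KPF_THP alone satisfies the old test: a false positive. */
	assert(looks_like_tail_any(next_head));
	/* The paired test rejects the head page and accepts a real tail. */
	assert(!looks_like_tail_all(next_head));
	assert(looks_like_tail_all(real_tail));
}
```

Calling check_next_folio_head() from a small main() should pass silently; the first assertion documents the false positive that made is_backed_by_folio() report a larger folio than was actually there.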
On 2 Sep 2025, at 12:25, David Hildenbrand wrote:
> When checking for actual tail or head pages of a folio, we must make sure that the KPF_COMPOUND_HEAD/KPF_COMPOUND_TAIL flag is paired with KPF_THP.
> For example, if we have another large folio after our large folio in physical memory, our "pfn_flags & (KPF_THP | KPF_COMPOUND_TAIL" would trigger even though it's actually a head page of the next folio.
> If is_backed_by_folio() returns a wrong result, split_pte_mapped_thp() can fail with "Some THPs are missing during mremap".
> Fix it by checking for head/tail pages of folios properly. Add folio_tail_flags/folio_head_flags to improve readability and use these masks also when just testing for any compound page.
> Fixes: 169b456b0162 ("selftests/mm: reimplement is_backed_by_thp() with more precise check")
> Signed-off-by: David Hildenbrand <david@redhat.com>
>  tools/testing/selftests/mm/split_huge_page_test.c | 15 +++++++--------
>  1 file changed, 7 insertions(+), 8 deletions(-)
LGTM. Thanks for fixing it.
Reviewed-by: Zi Yan <ziy@nvidia.com>
Best Regards, Yan, Zi
On Tue, Sep 02, 2025 at 06:25:35PM +0200, David Hildenbrand wrote:
> When checking for actual tail or head pages of a folio, we must make sure that the KPF_COMPOUND_HEAD/KPF_COMPOUND_TAIL flag is paired with KPF_THP.
> For example, if we have another large folio after our large folio in physical memory, our "pfn_flags & (KPF_THP | KPF_COMPOUND_TAIL" would
One nit here: we missed the closing ")" after KPF_COMPOUND_TAIL.
> trigger even though it's actually a head page of the next folio.
> If is_backed_by_folio() returns a wrong result, split_pte_mapped_thp() can fail with "Some THPs are missing during mremap".
> Fix it by checking for head/tail pages of folios properly. Add folio_tail_flags/folio_head_flags to improve readability and use these masks also when just testing for any compound page.
> Fixes: 169b456b0162 ("selftests/mm: reimplement is_backed_by_thp() with more precise check")
> Signed-off-by: David Hildenbrand <david@redhat.com>
Otherwise,
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
There is room for improvement, so let's clean up a bit:
(1) Define "4" as a constant.
(2) SKIP if we fail to allocate all THPs (e.g., fragmented) and add recovery code for all other failure cases: no need to exit the test.
(3) Rename "len" to thp_area_size, and "one_page" to "thp_area".
(4) Allocate a new area "page_area" into which we will mremap the pages; add "page_area_size". Now we can easily merge the two mremap instances into a single one.
(5) Iterate THPs instead of bytes when checking for missed THPs after mremap.
(6) Rename "pte_mapped2" to "tmp", used to verify mremap(MAP_FIXED) result.
(7) Split the corruption test from the failed-split test, so we can just iterate bytes vs. thps naturally.
(8) Extend comments and clarify why we are using mremap in the first place.
Reviewed-by: Zi Yan <ziy@nvidia.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 .../selftests/mm/split_huge_page_test.c | 123 +++++++++++-------
 1 file changed, 74 insertions(+), 49 deletions(-)
diff --git a/tools/testing/selftests/mm/split_huge_page_test.c b/tools/testing/selftests/mm/split_huge_page_test.c
index 72d6d8bb329ed..7731191cc8e9b 100644
--- a/tools/testing/selftests/mm/split_huge_page_test.c
+++ b/tools/testing/selftests/mm/split_huge_page_test.c
@@ -389,67 +389,92 @@ static void split_pmd_thp_to_order(int order)
 
 static void split_pte_mapped_thp(void)
 {
-	char *one_page, *pte_mapped, *pte_mapped2;
-	size_t len = 4 * pmd_pagesize;
-	uint64_t thp_size;
+	const size_t nr_thps = 4;
+	const size_t thp_area_size = nr_thps * pmd_pagesize;
+	const size_t page_area_size = nr_thps * pagesize;
+	char *thp_area, *tmp, *page_area = MAP_FAILED;
 	size_t i;
 
-	one_page = mmap((void *)(1UL << 30), len, PROT_READ | PROT_WRITE,
+	thp_area = mmap((void *)(1UL << 30), thp_area_size, PROT_READ | PROT_WRITE,
 			MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
-	if (one_page == MAP_FAILED)
-		ksft_exit_fail_msg("Fail to allocate memory: %s\n", strerror(errno));
+	if (thp_area == MAP_FAILED) {
+		ksft_test_result_fail("Fail to allocate memory: %s\n", strerror(errno));
+		return;
+	}
 
-	madvise(one_page, len, MADV_HUGEPAGE);
+	madvise(thp_area, thp_area_size, MADV_HUGEPAGE);
 
-	for (i = 0; i < len; i++)
-		one_page[i] = (char)i;
+	for (i = 0; i < thp_area_size; i++)
+		thp_area[i] = (char)i;
 
-	if (!check_huge_anon(one_page, 4, pmd_pagesize))
-		ksft_exit_fail_msg("No THP is allocated\n");
+	if (!check_huge_anon(thp_area, nr_thps, pmd_pagesize)) {
+		ksft_test_result_skip("Not all THPs allocated\n");
+		goto out;
+	}
 
-	/* remap the first pagesize of first THP */
-	pte_mapped = mremap(one_page, pagesize, pagesize, MREMAP_MAYMOVE);
-
-	/* remap the Nth pagesize of Nth THP */
-	for (i = 1; i < 4; i++) {
-		pte_mapped2 = mremap(one_page + pmd_pagesize * i + pagesize * i,
-				     pagesize, pagesize,
-				     MREMAP_MAYMOVE|MREMAP_FIXED,
-				     pte_mapped + pagesize * i);
-		if (pte_mapped2 == MAP_FAILED)
-			ksft_exit_fail_msg("mremap failed: %s\n", strerror(errno));
-	}
-
-	/* smap does not show THPs after mremap, use kpageflags instead */
-	thp_size = 0;
-	for (i = 0; i < pagesize * 4; i++)
-		if (i % pagesize == 0 &&
-		    is_backed_by_folio(&pte_mapped[i], pmd_order, pagemap_fd, kpageflags_fd))
-			thp_size++;
-
-	if (thp_size != 4)
-		ksft_exit_fail_msg("Some THPs are missing during mremap\n");
-
-	/* split all remapped THPs */
-	write_debugfs(PID_FMT, getpid(), (uint64_t)pte_mapped,
-		      (uint64_t)pte_mapped + pagesize * 4, 0);
-
-	/* smap does not show THPs after mremap, use kpageflags instead */
-	thp_size = 0;
-	for (i = 0; i < pagesize * 4; i++) {
-		if (pte_mapped[i] != (char)i)
-			ksft_exit_fail_msg("%ld byte corrupted\n", i);
+	/*
+	 * To challenge splitting code, we will mremap a single page of each
+	 * THP (page[i] of thp[i]) in the thp_area into page_area. This will
+	 * replace the PMD mappings in the thp_area by PTE mappings first,
+	 * but leaving the THP unsplit, to then create a page-sized hole in
+	 * the thp_area.
+	 * We will then manually trigger splitting of all THPs through the
+	 * single mremap'ed pages of each THP in the page_area.
+	 */
+	page_area = mmap(NULL, page_area_size, PROT_READ | PROT_WRITE,
+			 MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+	if (page_area == MAP_FAILED) {
+		ksft_test_result_fail("Fail to allocate memory: %s\n", strerror(errno));
+		goto out;
+	}
 
-		if (i % pagesize == 0 &&
-		    !is_backed_by_folio(&pte_mapped[i], 0, pagemap_fd, kpageflags_fd))
-			thp_size++;
+	for (i = 0; i < nr_thps; i++) {
+		tmp = mremap(thp_area + pmd_pagesize * i + pagesize * i,
+			     pagesize, pagesize, MREMAP_MAYMOVE|MREMAP_FIXED,
+			     page_area + pagesize * i);
+		if (tmp != MAP_FAILED)
+			continue;
+		ksft_test_result_fail("mremap failed: %s\n", strerror(errno));
+		goto out;
+	}
+
+	/*
+	 * Verify that our THPs were not split yet. Note that
+	 * check_huge_anon() cannot be used as it checks for PMD mappings.
+	 */
+	for (i = 0; i < nr_thps; i++) {
+		if (is_backed_by_folio(page_area + i * pagesize, pmd_order,
+				       pagemap_fd, kpageflags_fd))
+			continue;
+		ksft_test_result_fail("THP %zu missing after mremap\n", i);
+		goto out;
 	}
 
-	if (thp_size)
-		ksft_exit_fail_msg("Still %ld THPs not split\n", thp_size);
+	/* Split all THPs through the remapped pages. */
+	write_debugfs(PID_FMT, getpid(), (uint64_t)page_area,
+		      (uint64_t)page_area + page_area_size, 0);
+
+	/* Corruption during mremap or split? */
+	for (i = 0; i < page_area_size; i++) {
+		if (page_area[i] == (char)i)
+			continue;
+		ksft_test_result_fail("%zu byte corrupted\n", i);
+		goto out;
+	}
+
+	/* Split failed? */
+	for (i = 0; i < nr_thps; i++) {
+		if (is_backed_by_folio(page_area + i * pagesize, 0,
+				       pagemap_fd, kpageflags_fd))
+			continue;
+		ksft_test_result_fail("THP %zu not split\n", i);
+	}
 
 	ksft_test_result_pass("Split PTE-mapped huge pages successful\n");
-	munmap(one_page, len);
+out:
+	munmap(thp_area, thp_area_size);
+	if (page_area != MAP_FAILED)
+		munmap(page_area, page_area_size);
 }
 
 static void split_file_backed_thp(int order)
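The "page[i] of thp[i]" remapping pattern can be tried in isolation with ordinary anonymous memory. What follows is a rough sketch rather than the selftest itself: remap_one_page_per_chunk() and check_remap() are hypothetical names, no THPs are requested, and the "chunk" size is an arbitrary number of base pages standing in for pmd_pagesize.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: move page[i] of chunk[i] from a source area into a
 * compact destination area, then verify the byte pattern survived.
 * Returns 0 on success, -1 on any failure.
 */
static int remap_one_page_per_chunk(size_t nr_chunks, size_t pages_per_chunk)
{
	const long ps = sysconf(_SC_PAGESIZE);
	size_t pagesize, chunk_size, src_size, dst_size, i, j;
	char *src, *dst = MAP_FAILED, *tmp;
	int ret = -1;

	/* page[i] must exist inside chunk[i]. */
	if (ps <= 0 || nr_chunks == 0 || nr_chunks > pages_per_chunk)
		return -1;
	pagesize = (size_t)ps;
	chunk_size = pages_per_chunk * pagesize;
	src_size = nr_chunks * chunk_size;
	dst_size = nr_chunks * pagesize;

	src = mmap(NULL, src_size, PROT_READ | PROT_WRITE,
		   MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
	if (src == MAP_FAILED)
		return -1;
	for (i = 0; i < src_size; i++)
		src[i] = (char)i;

	/* Reserve the target range; MREMAP_FIXED replaces it page by page. */
	dst = mmap(NULL, dst_size, PROT_READ | PROT_WRITE,
		   MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
	if (dst == MAP_FAILED)
		goto out;

	for (i = 0; i < nr_chunks; i++) {
		tmp = mremap(src + chunk_size * i + pagesize * i,
			     pagesize, pagesize, MREMAP_MAYMOVE | MREMAP_FIXED,
			     dst + pagesize * i);
		if (tmp == MAP_FAILED)
			goto out;
	}

	/* Each moved page must still carry the pattern of its old offset. */
	for (i = 0; i < nr_chunks; i++) {
		const size_t old_off = chunk_size * i + pagesize * i;

		for (j = 0; j < pagesize; j++)
			if (dst[pagesize * i + j] != (char)(old_off + j))
				goto out;
	}
	ret = 0;
out:
	/* src now has page-sized holes; munmap() tolerates unmapped pages. */
	munmap(src, src_size);
	if (dst != MAP_FAILED)
		munmap(dst, dst_size);
	return ret;
}

/* Hypothetical usage: four chunks of 16 base pages each. */
static void check_remap(void)
{
	assert(remap_one_page_per_chunk(4, 16) == 0);
}
```

The patch compares page_area[i] against (char)i directly; that works because the old and new offsets of each moved page are both multiples of the page size, so they agree in their low eight bits. The sketch spells out the old offset only to make that reasoning visible.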
On Tue, Sep 02, 2025 at 06:25:36PM +0200, David Hildenbrand wrote:
> There is room for improvement, so let's clean up a bit:
> (1) Define "4" as a constant.
> (2) SKIP if we fail to allocate all THPs (e.g., fragmented) and add recovery code for all other failure cases: no need to exit the test.
> (3) Rename "len" to thp_area_size, and "one_page" to "thp_area".
> (4) Allocate a new area "page_area" into which we will mremap the pages; add "page_area_size". Now we can easily merge the two mremap instances into a single one.
> (5) Iterate THPs instead of bytes when checking for missed THPs after mremap.
> (6) Rename "pte_mapped2" to "tmp", used to verify mremap(MAP_FIXED) result.
> (7) Split the corruption test from the failed-split test, so we can just iterate bytes vs. thps naturally.
> (8) Extend comments and clarify why we are using mremap in the first place.
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>