The two patches fix the va_high_addr_switch.sh test failure on x86_64. Patch 1 fixes the hugepages setup issue that nr_hugepages is reset too early in run_vmtests.sh and break the later va_high_addr_switch testing. Patch 2 fixes the test failure caused by the hint addr align method change in hugetlb_get_unmapped_area().
Chunyu Hu (2): selftests/mm: fix hugepages cleanup too early selftests/mm: fix va_high_addr_switch.sh failure on x86_64
tools/testing/selftests/mm/run_vmtests.sh | 9 +++++++-- tools/testing/selftests/mm/va_high_addr_switch.c | 4 ++-- 2 files changed, 9 insertions(+), 4 deletions(-)
The nr_hugepgs variable is used to keep the original nr_hugepages at the hugepage setup step at test beginning. After userfaultfd test, a cleaup is executed, both /sys/kernel/mm/hugepages/hugepages-*/nr_hugepages and /proc/sys//vm/nr_hugepages are reset to 'original' value before userfaultfd test starts.
Issue here is the value used to restore /proc/sys/vm/nr_hugepages is nr_hugepgs which is the initial value before the vm_runtests.sh runs, not the value before userfaultfd test starts. 'va_high_addr_swith.sh' tests runs after that will possibly see no hugepages available for test, and got EINVAL when mmap(HUGETLB), making the result invalid.
And before pkey tests, nr_hugepgs is changed to be used as a temp variable to save nr_hugepages before pkey test, and restore it after pkey tests finish. The original nr_hugepages value is not tracked anymore, so no way to restore it after all tests finish.
Add a new variable nr_hugepgs_origin to save the original nr_hugepages, and and restore it to nr_hugepages after all tests finish. And change to use the nr_hugepgs variable to save the /proc/sys/vm/nr_hugeages after hugepage setup, it's also the value before userfaultfd test starts, and the correct value to be restored after userfaultfd finishes. The va_high_addr_switch.sh broken will be resolved.
Signed-off-by: Chunyu Hu chuhu@redhat.com --- tools/testing/selftests/mm/run_vmtests.sh | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh index 471e539d82b8..f1a7ad3ec6a7 100755 --- a/tools/testing/selftests/mm/run_vmtests.sh +++ b/tools/testing/selftests/mm/run_vmtests.sh @@ -172,13 +172,13 @@ fi
# set proper nr_hugepages if [ -n "$freepgs" ] && [ -n "$hpgsize_KB" ]; then - nr_hugepgs=$(cat /proc/sys/vm/nr_hugepages) + nr_hugepgs_origin=$(cat /proc/sys/vm/nr_hugepages) needpgs=$((needmem_KB / hpgsize_KB)) tries=2 while [ "$tries" -gt 0 ] && [ "$freepgs" -lt "$needpgs" ]; do lackpgs=$((needpgs - freepgs)) echo 3 > /proc/sys/vm/drop_caches - if ! echo $((lackpgs + nr_hugepgs)) > /proc/sys/vm/nr_hugepages; then + if ! echo $((lackpgs + nr_hugepgs_origin)) > /proc/sys/vm/nr_hugepages; then echo "Please run this test as root" exit $ksft_skip fi @@ -189,6 +189,7 @@ if [ -n "$freepgs" ] && [ -n "$hpgsize_KB" ]; then done < /proc/meminfo tries=$((tries - 1)) done + nr_hugepgs=$(cat /proc/sys/vm/nr_hugepages) if [ "$freepgs" -lt "$needpgs" ]; then printf "Not enough huge pages available (%d < %d)\n" \ "$freepgs" "$needpgs" @@ -532,6 +533,10 @@ CATEGORY="page_frag" run_test ./test_page_frag.sh aligned
CATEGORY="page_frag" run_test ./test_page_frag.sh nonaligned
+if [ "${HAVE_HUGEPAGES}" = 1 ]; then + echo "$nr_hugepgs_origin" > /proc/sys/vm/nr_hugepages +fi + echo "SUMMARY: PASS=${count_pass} SKIP=${count_skip} FAIL=${count_fail}" | tap_prefix echo "1..${count_total}" | tap_output
The test will fail as below on x86_64 with cpu la57 support (will skip if no la57 support). Note, the test requries nr_hugepages to be set first.
# running bash ./va_high_addr_switch.sh # ------------------------------------- # mmap(addr_switch_hint - pagesize, pagesize): 0x7f55b60fa000 - OK # mmap(addr_switch_hint - pagesize, (2 * pagesize)): 0x7f55b60f9000 - OK # mmap(addr_switch_hint, pagesize): 0x800000000000 - OK # mmap(addr_switch_hint, 2 * pagesize, MAP_FIXED): 0x800000000000 - OK # mmap(NULL): 0x7f55b60f9000 - OK # mmap(low_addr): 0x40000000 - OK # mmap(high_addr): 0x1000000000000 - OK # mmap(high_addr) again: 0xffff55b6136000 - OK # mmap(high_addr, MAP_FIXED): 0x1000000000000 - OK # mmap(-1): 0xffff55b6134000 - OK # mmap(-1) again: 0xffff55b6132000 - OK # mmap(addr_switch_hint - pagesize, pagesize): 0x7f55b60fa000 - OK # mmap(addr_switch_hint - pagesize, 2 * pagesize): 0x7f55b60f9000 - OK # mmap(addr_switch_hint - pagesize/2 , 2 * pagesize): 0x7f55b60f7000 - OK # mmap(addr_switch_hint, pagesize): 0x800000000000 - OK # mmap(addr_switch_hint, 2 * pagesize, MAP_FIXED): 0x800000000000 - OK # mmap(NULL, MAP_HUGETLB): 0x7f55b5c00000 - OK # mmap(low_addr, MAP_HUGETLB): 0x40000000 - OK # mmap(high_addr, MAP_HUGETLB): 0x1000000000000 - OK # mmap(high_addr, MAP_HUGETLB) again: 0xffff55b5e00000 - OK # mmap(high_addr, MAP_FIXED | MAP_HUGETLB): 0x1000000000000 - OK # mmap(-1, MAP_HUGETLB): 0x7f55b5c00000 - OK # mmap(-1, MAP_HUGETLB) again: 0x7f55b5a00000 - OK # mmap(addr_switch_hint - pagesize, 2*hugepagesize, MAP_HUGETLB): 0x800000000000 - FAILED # mmap(addr_switch_hint , 2*hugepagesize, MAP_FIXED | MAP_HUGETLB): 0x800000000000 - OK # [FAIL]
addr_switch_hint is defined as DFEFAULT_MAP_WINDOW in the failed test (for x86_64, DFEFAULT_MAP_WINDOW is defined as (1UL<<47) - pagesize) in 64 bit.
Before commit cc92882ee218 ("mm: drop hugetlb_get_unmapped_area{_*} functions"), for x86_64 hugetlb_get_unmapped_area() is handled in arch code arch/x86/mm/hugetlbpage.c and addr is checked with map_address_hint_valid() after align with 'addr &= huge_page_mask(h)' which is a round down way, and it will fail the check because the addr is within the DEFAULT_MAP_WINDOW but (addr + len) is above the DFEFAULT_MAP_WINDOW. So it wil go through the hugetlb_get_unmmaped_area_top_down() to find an area within the DFEFAULT_MAP_WINDOW.
After commit cc92882ee218 ("mm: drop hugetlb_get_unmapped_area{_*} functions"). The addr hint for hugetlb_get_unmmaped_area() will be rounded up and aligned to hugepage size with ALIGN() for all arches. And after the align, the addr will be above the default MAP_DEFAULT_WINDOW, and the map_addresshint_valid() check will pass because both aligned addr (addr0) and (addr + len) are above the DEFAULT_MAP_WINDOW, and the aligned hint address (0x800000000000) is returned as an suitable gap is found there, in arch_get_unmapped_area_topdown().
To still cover the case that addr is within the DEFAULT_MAP_WINDOW, and addr + len is above the DFEFAULT_MAP_WINDOW, make the addr hint one hugepage lower, so that after the align it's still within DEFAULT_MAP_WINDOW, and the addr + len (2 hugepages) will be above DEFAULT_MAP_WINDOW.
Fixes: cc92882ee218 ("mm: drop hugetlb_get_unmapped_area{_*} functions") Signed-off-by: Chunyu Hu chuhu@redhat.com --- tools/testing/selftests/mm/va_high_addr_switch.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/mm/va_high_addr_switch.c b/tools/testing/selftests/mm/va_high_addr_switch.c index 896b3f73fc53..bd96dc3b5931 100644 --- a/tools/testing/selftests/mm/va_high_addr_switch.c +++ b/tools/testing/selftests/mm/va_high_addr_switch.c @@ -230,10 +230,10 @@ void testcases_init(void) .msg = "mmap(-1, MAP_HUGETLB) again", }, { - .addr = (void *)(addr_switch_hint - pagesize), + .addr = (void *)(addr_switch_hint - pagesize - hugepagesize), .size = 2 * hugepagesize, .flags = MAP_HUGETLB | MAP_PRIVATE | MAP_ANONYMOUS, - .msg = "mmap(addr_switch_hint - pagesize, 2*hugepagesize, MAP_HUGETLB)", + .msg = "mmap(addr_switch_hint - pagesize - hugepagesize, 2*hugepagesize, MAP_HUGETLB)", .low_addr_required = 1, .keep_mapped = 1, },
On 27.08.25 09:52, Chunyu Hu wrote:
The nr_hugepgs variable is used to keep the original nr_hugepages at the hugepage setup step at test beginning. After userfaultfd test, a cleaup is executed, both /sys/kernel/mm/hugepages/hugepages-*/nr_hugepages and /proc/sys//vm/nr_hugepages are reset to 'original' value before userfaultfd test starts.
Issue here is the value used to restore /proc/sys/vm/nr_hugepages is nr_hugepgs which is the initial value before the vm_runtests.sh runs, not the value before userfaultfd test starts. 'va_high_addr_swith.sh' tests runs after that will possibly see no hugepages available for test, and got EINVAL when mmap(HUGETLB), making the result invalid.
And before pkey tests, nr_hugepgs is changed to be used as a temp variable to save nr_hugepages before pkey test, and restore it after pkey tests finish. The original nr_hugepages value is not tracked anymore, so no way to restore it after all tests finish.
Add a new variable nr_hugepgs_origin to save the original nr_hugepages, and and restore it to nr_hugepages after all tests finish. And change to use the nr_hugepgs variable to save the /proc/sys/vm/nr_hugeages after hugepage setup, it's also the value before userfaultfd test starts, and the correct value to be restored after userfaultfd finishes. The va_high_addr_switch.sh broken will be resolved.
Signed-off-by: Chunyu Hu chuhu@redhat.com
tools/testing/selftests/mm/run_vmtests.sh | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh index 471e539d82b8..f1a7ad3ec6a7 100755 --- a/tools/testing/selftests/mm/run_vmtests.sh +++ b/tools/testing/selftests/mm/run_vmtests.sh @@ -172,13 +172,13 @@ fi # set proper nr_hugepages if [ -n "$freepgs" ] && [ -n "$hpgsize_KB" ]; then
- nr_hugepgs=$(cat /proc/sys/vm/nr_hugepages)
- nr_hugepgs_origin=$(cat /proc/sys/vm/nr_hugepages)
I'd call this "orig_nr_hugepgs".
But it's a shame that the naming is then out of sync with nr_size_hugepgs?
needpgs=$((needmem_KB / hpgsize_KB)) tries=2 while [ "$tries" -gt 0 ] && [ "$freepgs" -lt "$needpgs" ]; do lackpgs=$((needpgs - freepgs)) echo 3 > /proc/sys/vm/drop_caches
if ! echo $((lackpgs + nr_hugepgs)) > /proc/sys/vm/nr_hugepages; then
fiif ! echo $((lackpgs + nr_hugepgs_origin)) > /proc/sys/vm/nr_hugepages; then echo "Please run this test as root" exit $ksft_skip
@@ -189,6 +189,7 @@ if [ -n "$freepgs" ] && [ -n "$hpgsize_KB" ]; then done < /proc/meminfo tries=$((tries - 1)) done
- nr_hugepgs=$(cat /proc/sys/vm/nr_hugepages) if [ "$freepgs" -lt "$needpgs" ]; then printf "Not enough huge pages available (%d < %d)\n" \ "$freepgs" "$needpgs"
@@ -532,6 +533,10 @@ CATEGORY="page_frag" run_test ./test_page_frag.sh aligned CATEGORY="page_frag" run_test ./test_page_frag.sh nonaligned +if [ "${HAVE_HUGEPAGES}" = 1 ]; then
- echo "$nr_hugepgs_origin" > /proc/sys/vm/nr_hugepages
+fi
FWIW, I think the tests should maybe be doing that (save+configure+restore) themselves, like we do with THP settings through.
thp_save_settings() thp_write_settings()
and friends.
This is not really something run_vmtests.sh should bother with.
A bigger rework, though ...
On Wed, Aug 27, 2025 at 7:41 PM David Hildenbrand david@redhat.com wrote:
On 27.08.25 09:52, Chunyu Hu wrote:
The nr_hugepgs variable is used to keep the original nr_hugepages at the hugepage setup step at test beginning. After userfaultfd test, a cleaup is executed, both /sys/kernel/mm/hugepages/hugepages-*/nr_hugepages and /proc/sys//vm/nr_hugepages are reset to 'original' value before userfaultfd test starts.
Issue here is the value used to restore /proc/sys/vm/nr_hugepages is nr_hugepgs which is the initial value before the vm_runtests.sh runs, not the value before userfaultfd test starts. 'va_high_addr_swith.sh' tests runs after that will possibly see no hugepages available for test, and got EINVAL when mmap(HUGETLB), making the result invalid.
And before pkey tests, nr_hugepgs is changed to be used as a temp variable to save nr_hugepages before pkey test, and restore it after pkey tests finish. The original nr_hugepages value is not tracked anymore, so no way to restore it after all tests finish.
Add a new variable nr_hugepgs_origin to save the original nr_hugepages, and and restore it to nr_hugepages after all tests finish. And change to use the nr_hugepgs variable to save the /proc/sys/vm/nr_hugeages after hugepage setup, it's also the value before userfaultfd test starts, and the correct value to be restored after userfaultfd finishes. The va_high_addr_switch.sh broken will be resolved.
Signed-off-by: Chunyu Hu chuhu@redhat.com
tools/testing/selftests/mm/run_vmtests.sh | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh index 471e539d82b8..f1a7ad3ec6a7 100755 --- a/tools/testing/selftests/mm/run_vmtests.sh +++ b/tools/testing/selftests/mm/run_vmtests.sh @@ -172,13 +172,13 @@ fi
# set proper nr_hugepages if [ -n "$freepgs" ] && [ -n "$hpgsize_KB" ]; then
nr_hugepgs=$(cat /proc/sys/vm/nr_hugepages)
nr_hugepgs_origin=$(cat /proc/sys/vm/nr_hugepages)
I'd call this "orig_nr_hugepgs".
Hi David,
Thank you for your review and valuable feedback. I will rename it with a v2 and resend the two patches. Do you have suggestions on patch 2?
But it's a shame that the naming is then out of sync with nr_size_hugepgs?
nr_size_hugepgs is for uffd-wp-mremap, the test need all sizes hugepages, it's used to save and restore the nr_hugepagees of all sizes of hugepages, it's a test case setup, not like nr_hugepgs which is a global/general setup. They are not the same kind, maybe they don't need to be aligned...
needpgs=$((needmem_KB / hpgsize_KB)) tries=2 while [ "$tries" -gt 0 ] && [ "$freepgs" -lt "$needpgs" ]; do lackpgs=$((needpgs - freepgs)) echo 3 > /proc/sys/vm/drop_caches
if ! echo $((lackpgs + nr_hugepgs)) > /proc/sys/vm/nr_hugepages; then
if ! echo $((lackpgs + nr_hugepgs_origin)) > /proc/sys/vm/nr_hugepages; then echo "Please run this test as root" exit $ksft_skip fi
@@ -189,6 +189,7 @@ if [ -n "$freepgs" ] && [ -n "$hpgsize_KB" ]; then done < /proc/meminfo tries=$((tries - 1)) done
nr_hugepgs=$(cat /proc/sys/vm/nr_hugepages) if [ "$freepgs" -lt "$needpgs" ]; then printf "Not enough huge pages available (%d < %d)\n" \ "$freepgs" "$needpgs"
@@ -532,6 +533,10 @@ CATEGORY="page_frag" run_test ./test_page_frag.sh aligned
CATEGORY="page_frag" run_test ./test_page_frag.sh nonaligned
+if [ "${HAVE_HUGEPAGES}" = 1 ]; then
echo "$nr_hugepgs_origin" > /proc/sys/vm/nr_hugepages
+fi
FWIW, I think the tests should maybe be doing that (save+configure+restore) themselves, like we do with THP settings through.
thp_save_settings() thp_write_settings()
and friends.
This is not really something run_vmtests.sh should bother with.
A bigger rework, though ...
Totally agree, with the c interface to do that is better. then the vm_runtest.sh would be clean. It's a bigger rework outside of this topic...
-- Cheers
David / dhildenb
-- ---- Thanks, Chunyu Hu
linux-kselftest-mirror@lists.linaro.org