The selftest started failing since commit e93d2521b27f ("x86/vdso: Split virtual clock pages into dedicated mapping") was merged. While debugging I stumbled upon some memory usage optimizations.
With these test now runs on a VM with only 60MiB of memory.
Signed-off-by: Thomas Weißschuh thomas.weissschuh@linutronix.de --- Changes in v2: - Drop /dev/null usage - Avoid overcommit restrictions by dropping PROT_WRITE - Avoid high memory usage due to PTEs - Link to v1: https://lore.kernel.org/r/20250107-virtual_address_range-tests-v1-0-3834a2fb...
--- Thomas Weißschuh (3): selftests/mm: virtual_address_range: mmap() without PROT_WRITE selftests/mm: virtual_address_range: Unmap chunks after validation selftests/mm: virtual_address_range: Avoid reading VVAR mappings
tools/testing/selftests/mm/config | 1 + tools/testing/selftests/mm/virtual_address_range.c | 34 +++++++++++++++++++--- 2 files changed, 31 insertions(+), 4 deletions(-) --- base-commit: 32af4d2269d20fe2f8d32aaa456cad8e40abd365 change-id: 20250107-virtual_address_range-tests-95843766fa97
Best regards,
When mapping a larger chunk than physical memory is available with PROT_WRITE and overcommit is disabled, the mapping will fail. This will prevent the test from running on systems with less then ~1GiB of memory and triggering an inscrutinable test failure. As the mappings are never written to anyways, the flag can be removed.
Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()") Signed-off-by: Thomas Weißschuh thomas.weissschuh@linutronix.de
--- I went with dropping PROT_WRITE instead of adding MAP_NORESERVE as this works even in the face of OVERCOMMIT_NEVER. --- tools/testing/selftests/mm/virtual_address_range.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c index 2a2b69e91950a37999f606847c9c8328d79890c2..ea6ccf49ef4c552f26317c2a40b09bca1a677f8f 100644 --- a/tools/testing/selftests/mm/virtual_address_range.c +++ b/tools/testing/selftests/mm/virtual_address_range.c @@ -166,7 +166,7 @@ int main(int argc, char *argv[]) ksft_set_plan(1);
for (i = 0; i < NR_CHUNKS_LOW; i++) { - ptr[i] = mmap(NULL, MAP_CHUNK_SIZE, PROT_READ | PROT_WRITE, + ptr[i] = mmap(NULL, MAP_CHUNK_SIZE, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (ptr[i] == MAP_FAILED) { @@ -186,7 +186,7 @@ int main(int argc, char *argv[])
for (i = 0; i < NR_CHUNKS_HIGH; i++) { hint = hint_addr(); - hptr[i] = mmap(hint, MAP_CHUNK_SIZE, PROT_READ | PROT_WRITE, + hptr[i] = mmap(hint, MAP_CHUNK_SIZE, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (hptr[i] == MAP_FAILED)
On 10.01.25 14:05, Thomas Weißschuh wrote:
When mapping a larger chunk than physical memory is available with PROT_WRITE and overcommit is disabled, the mapping will fail. This will prevent the test from running on systems with less then ~1GiB of memory and triggering an inscrutinable test failure. As the mappings are never written to anyways, the flag can be removed.
Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()") Signed-off-by: Thomas Weißschuh thomas.weissschuh@linutronix.de
I went with dropping PROT_WRITE instead of adding MAP_NORESERVE as this works even in the face of OVERCOMMIT_NEVER.
Yes, makes sense, it's certainly simpler this way.
Acked-by: David Hildenbrand david@redhat.com
On 10/01/25 6:35 pm, Thomas Weißschuh wrote:
When mapping a larger chunk than physical memory is available with PROT_WRITE and overcommit is disabled, the mapping will fail. This will prevent the test from running on systems with less then ~1GiB of memory and triggering an inscrutinable test failure. As the mappings are never written to anyways, the flag can be removed.
Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()") Signed-off-by: Thomas Weißschuh thomas.weissschuh@linutronix.de
Thanks! FWIW:
Acked-by: Dev Jain dev.jain@arm.com
For each accessed chunk a PTE is created. More than 1GiB of PTEs is used in this way. Remove each PTE after validating a chunk to reduce peak memory usage.
It is important to only unmap memory that previously mmap()ed, as unmapping other mappings like the stack, heap or executable mappings will crash the process. The mappings read from /proc/self/maps and the return values from mmap() don't allow a simple correlation due to merging and no guaranteed order. To correlate the pointers and mappings use prctl(PR_SET_VMA_ANON_NAME). While it introduces a test dependency, other alternatives would introduce runtime or development overhead.
Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()") Signed-off-by: Thomas Weißschuh thomas.weissschuh@linutronix.de --- tools/testing/selftests/mm/config | 1 + tools/testing/selftests/mm/virtual_address_range.c | 26 ++++++++++++++++++++-- 2 files changed, 25 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/mm/config b/tools/testing/selftests/mm/config index 4309916f629e36498efb07eb606b2f0c49ee6211..a28baa536332f3fcfb1b83759b5fbb432ae80178 100644 --- a/tools/testing/selftests/mm/config +++ b/tools/testing/selftests/mm/config @@ -7,3 +7,4 @@ CONFIG_TEST_HMM=m CONFIG_GUP_TEST=y CONFIG_TRANSPARENT_HUGEPAGE=y CONFIG_MEM_SOFT_DIRTY=y +CONFIG_ANON_VMA_NAME=y diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c index ea6ccf49ef4c552f26317c2a40b09bca1a677f8f..4fc1c21a5e218eaec4d059b75c31a21dd4e8a215 100644 --- a/tools/testing/selftests/mm/virtual_address_range.c +++ b/tools/testing/selftests/mm/virtual_address_range.c @@ -10,6 +10,7 @@ #include <string.h> #include <unistd.h> #include <errno.h> +#include <sys/prctl.h> #include <sys/mman.h> #include <sys/time.h> #include <fcntl.h> @@ -82,6 +83,17 @@ static void validate_addr(char *ptr, int high_addr) ksft_exit_fail_msg("Bad address %lx\n", addr); }
+static void mark_addr(char *ptr) +{ + if (prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, ptr, MAP_CHUNK_SIZE, "virtual_address_range")) + ksft_exit_fail_msg("prctl(PR_SET_VMA_ANON_NAME) failed: %s\n", strerror(errno)); +} + +static int is_marked_addr(const char *vma_name) +{ + return vma_name && !strcmp(vma_name, "[anon:virtual_address_range]\n"); +} + static int validate_lower_address_hint(void) { char *ptr; @@ -116,12 +128,17 @@ static int validate_complete_va_space(void)
prev_end_addr = 0; while (fgets(line, sizeof(line), file)) { + const char *vma_name = NULL; + int vma_name_start = 0; unsigned long hop;
- if (sscanf(line, "%lx-%lx %s[rwxp-]", - &start_addr, &end_addr, prot) != 3) + if (sscanf(line, "%lx-%lx %4s %*s %*s %*s %n", + &start_addr, &end_addr, prot, &vma_name_start) != 3) ksft_exit_fail_msg("cannot parse /proc/self/maps\n");
+ if (vma_name_start) + vma_name = line + vma_name_start; + /* end of userspace mappings; ignore vsyscall mapping */ if (start_addr & (1UL << 63)) return 0; @@ -149,6 +166,9 @@ static int validate_complete_va_space(void) return 1; lseek(fd, 0, SEEK_SET);
+ if (is_marked_addr(vma_name)) + munmap((char *)(start_addr + hop), MAP_CHUNK_SIZE); + hop += MAP_CHUNK_SIZE; } } @@ -175,6 +195,7 @@ int main(int argc, char *argv[]) break; }
+ mark_addr(ptr[i]); validate_addr(ptr[i], 0); } lchunks = i; @@ -192,6 +213,7 @@ int main(int argc, char *argv[]) if (hptr[i] == MAP_FAILED) break;
+ mark_addr(ptr[i]); validate_addr(hptr[i], 1); } hchunks = i;
On 10.01.25 14:05, Thomas Weißschuh wrote:
For each accessed chunk a PTE is created. More than 1GiB of PTEs is used in this way. Remove each PTE after validating a chunk to reduce peak memory usage.
It is important to only unmap memory that previously mmap()ed, as unmapping other mappings like the stack, heap or executable mappings will crash the process. The mappings read from /proc/self/maps and the return values from mmap() don't allow a simple correlation due to merging and no guaranteed order. To correlate the pointers and mappings use prctl(PR_SET_VMA_ANON_NAME). While it introduces a test dependency, other alternatives would introduce runtime or development overhead.
Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()") Signed-off-by: Thomas Weißschuh thomas.weissschuh@linutronix.de
tools/testing/selftests/mm/config | 1 + tools/testing/selftests/mm/virtual_address_range.c | 26 ++++++++++++++++++++-- 2 files changed, 25 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/mm/config b/tools/testing/selftests/mm/config index 4309916f629e36498efb07eb606b2f0c49ee6211..a28baa536332f3fcfb1b83759b5fbb432ae80178 100644 --- a/tools/testing/selftests/mm/config +++ b/tools/testing/selftests/mm/config @@ -7,3 +7,4 @@ CONFIG_TEST_HMM=m CONFIG_GUP_TEST=y CONFIG_TRANSPARENT_HUGEPAGE=y CONFIG_MEM_SOFT_DIRTY=y +CONFIG_ANON_VMA_NAME=y
I'm afraid, nobody uses these :) People run these tests against arbitrary kernels (i.e., distro kernels).
In addition to that, we should handle it like uffd-unit-tests.c and sense support for CONFIG_ANON_VMA_NAME in the current kernel.
If not around skip the test, like uffd-unit-tests.c does.
diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c index ea6ccf49ef4c552f26317c2a40b09bca1a677f8f..4fc1c21a5e218eaec4d059b75c31a21dd4e8a215 100644 --- a/tools/testing/selftests/mm/virtual_address_range.c +++ b/tools/testing/selftests/mm/virtual_address_range.c @@ -10,6 +10,7 @@ #include <string.h> #include <unistd.h> #include <errno.h> +#include <sys/prctl.h> #include <sys/mman.h> #include <sys/time.h> #include <fcntl.h> @@ -82,6 +83,17 @@ static void validate_addr(char *ptr, int high_addr) ksft_exit_fail_msg("Bad address %lx\n", addr); } +static void mark_addr(char *ptr)
I would call this "mark_range" and pass the size (MAP_CHUNK_SIZE) as well.
+{
- if (prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, ptr, MAP_CHUNK_SIZE, "virtual_address_range"))
ksft_exit_fail_msg("prctl(PR_SET_VMA_ANON_NAME) failed: %s\n", strerror(errno));
+}
+static int is_marked_addr(const char *vma_name)
"is_marked_vma" / "is_marked_mapping" ?
Because you are not passing an address ...
Apart from that LGTM.
The virtual_address_range selftest reads from the start of each mapping listed in /proc/self/maps. However not all mappings are valid to be arbitrarily accessed. For example the vvar data used for virtual clocks on x86 [vvar_vclock] can only be accessed if 1) the kernel configuration enables virtual clocks and 2) the hypervisor provided the data for it. Only the VDSO itself has the necessary information to know this. Since commit e93d2521b27f ("x86/vdso: Split virtual clock pages into dedicated mapping") the virtual clock data was split out into its own mapping, leading to EFAULT from read() during the validation.
Skip the various vvar mappings in virtual_address_range to avoid the issue.
Fixes: e93d2521b27f ("x86/vdso: Split virtual clock pages into dedicated mapping") Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()") Reported-by: kernel test robot oliver.sang@intel.com Closes: https://lore.kernel.org/oe-lkp/202412271148.2656e485-lkp@intel.com Signed-off-by: Thomas Weißschuh thomas.weissschuh@linutronix.de --- tools/testing/selftests/mm/virtual_address_range.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c index 4fc1c21a5e218eaec4d059b75c31a21dd4e8a215..993990aba56fc986c42084ffa91973558aa07e87 100644 --- a/tools/testing/selftests/mm/virtual_address_range.c +++ b/tools/testing/selftests/mm/virtual_address_range.c @@ -152,6 +152,10 @@ static int validate_complete_va_space(void) if (prot[0] != 'r') continue;
+ /* Only the VDSO can know if a VVAR mapping is really readable */ + if (vma_name && !strncmp(vma_name, "[vvar", 5)) + continue; + /* * Confirm whether MAP_CHUNK_SIZE chunk can be found or not. * If write succeeds, no need to check MAP_CHUNK_SIZE - 1
On 10.01.25 14:05, Thomas Weißschuh wrote:
The virtual_address_range selftest reads from the start of each mapping listed in /proc/self/maps. However not all mappings are valid to be arbitrarily accessed. For example the vvar data used for virtual clocks on x86 [vvar_vclock] can only be accessed if 1) the kernel configuration enables virtual clocks and 2) the hypervisor provided the data for it. Only the VDSO itself has the necessary information to know this. Since commit e93d2521b27f ("x86/vdso: Split virtual clock pages into dedicated mapping") the virtual clock data was split out into its own mapping, leading to EFAULT from read() during the validation.
Skip the various vvar mappings in virtual_address_range to avoid the issue.
Fixes: e93d2521b27f ("x86/vdso: Split virtual clock pages into dedicated mapping") Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()") Reported-by: kernel test robot oliver.sang@intel.com Closes: https://lore.kernel.org/oe-lkp/202412271148.2656e485-lkp@intel.com Signed-off-by: Thomas Weißschuh thomas.weissschuh@linutronix.de
tools/testing/selftests/mm/virtual_address_range.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c index 4fc1c21a5e218eaec4d059b75c31a21dd4e8a215..993990aba56fc986c42084ffa91973558aa07e87 100644 --- a/tools/testing/selftests/mm/virtual_address_range.c +++ b/tools/testing/selftests/mm/virtual_address_range.c @@ -152,6 +152,10 @@ static int validate_complete_va_space(void) if (prot[0] != 'r') continue;
/* Only the VDSO can know if a VVAR mapping is really readable */
if (vma_name && !strncmp(vma_name, "[vvar", 5))
continue;
I'm wondering if there is a more generic way ... but likely not when staring at /proc/self/maps.
/proc/self/smaps would indicate this as
VM_IO: "io" VM_DONTDUMP: "dd" VM_PFNMAP: "pf"
Especially checking for VM_IO sounds reasonable ...
On Fri, Jan 10, 2025 at 04:41:03PM +0100, David Hildenbrand wrote:
On 10.01.25 14:05, Thomas Weißschuh wrote:
The virtual_address_range selftest reads from the start of each mapping listed in /proc/self/maps. However not all mappings are valid to be arbitrarily accessed. For example the vvar data used for virtual clocks on x86 [vvar_vclock] can only be accessed if 1) the kernel configuration enables virtual clocks and 2) the hypervisor provided the data for it. Only the VDSO itself has the necessary information to know this. Since commit e93d2521b27f ("x86/vdso: Split virtual clock pages into dedicated mapping") the virtual clock data was split out into its own mapping, leading to EFAULT from read() during the validation.
Skip the various vvar mappings in virtual_address_range to avoid the issue.
Fixes: e93d2521b27f ("x86/vdso: Split virtual clock pages into dedicated mapping") Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()") Reported-by: kernel test robot oliver.sang@intel.com Closes: https://lore.kernel.org/oe-lkp/202412271148.2656e485-lkp@intel.com Signed-off-by: Thomas Weißschuh thomas.weissschuh@linutronix.de
tools/testing/selftests/mm/virtual_address_range.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c index 4fc1c21a5e218eaec4d059b75c31a21dd4e8a215..993990aba56fc986c42084ffa91973558aa07e87 100644 --- a/tools/testing/selftests/mm/virtual_address_range.c +++ b/tools/testing/selftests/mm/virtual_address_range.c @@ -152,6 +152,10 @@ static int validate_complete_va_space(void) if (prot[0] != 'r') continue;
/* Only the VDSO can know if a VVAR mapping is really readable */
if (vma_name && !strncmp(vma_name, "[vvar", 5))
continue;
I'm wondering if there is a more generic way ... but likely not when staring at /proc/self/maps.
/proc/self/smaps would indicate this as
VM_IO: "io" VM_DONTDUMP: "dd" VM_PFNMAP: "pf"
Especially checking for VM_IO sounds reasonable ...
Agreed.
Can we instead rely on madvise(MADV_DOFORK) returning EINVAL iff VM_IO? That would remove the need for custom parsing and the dependency on CONFIG_PROC_PAGE_MONITOR.
Thomas
On 13.01.25 10:09, Thomas Weißschuh wrote:
On Fri, Jan 10, 2025 at 04:41:03PM +0100, David Hildenbrand wrote:
On 10.01.25 14:05, Thomas Weißschuh wrote:
The virtual_address_range selftest reads from the start of each mapping listed in /proc/self/maps. However not all mappings are valid to be arbitrarily accessed. For example the vvar data used for virtual clocks on x86 [vvar_vclock] can only be accessed if 1) the kernel configuration enables virtual clocks and 2) the hypervisor provided the data for it. Only the VDSO itself has the necessary information to know this. Since commit e93d2521b27f ("x86/vdso: Split virtual clock pages into dedicated mapping") the virtual clock data was split out into its own mapping, leading to EFAULT from read() during the validation.
Skip the various vvar mappings in virtual_address_range to avoid the issue.
Fixes: e93d2521b27f ("x86/vdso: Split virtual clock pages into dedicated mapping") Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()") Reported-by: kernel test robot oliver.sang@intel.com Closes: https://lore.kernel.org/oe-lkp/202412271148.2656e485-lkp@intel.com Signed-off-by: Thomas Weißschuh thomas.weissschuh@linutronix.de
tools/testing/selftests/mm/virtual_address_range.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c index 4fc1c21a5e218eaec4d059b75c31a21dd4e8a215..993990aba56fc986c42084ffa91973558aa07e87 100644 --- a/tools/testing/selftests/mm/virtual_address_range.c +++ b/tools/testing/selftests/mm/virtual_address_range.c @@ -152,6 +152,10 @@ static int validate_complete_va_space(void) if (prot[0] != 'r') continue;
/* Only the VDSO can know if a VVAR mapping is really readable */
if (vma_name && !strncmp(vma_name, "[vvar", 5))
continue;
I'm wondering if there is a more generic way ... but likely not when staring at /proc/self/maps.
/proc/self/smaps would indicate this as
VM_IO: "io" VM_DONTDUMP: "dd" VM_PFNMAP: "pf"
Especially checking for VM_IO sounds reasonable ...
Agreed.
Can we instead rely on madvise(MADV_DOFORK) returning EINVAL iff VM_IO?
That might be one option indeed, although it feels like this is something that might change in the future ... not sure.
MADV_POPULATE_READ fails with -EFAULT on VM_IO, but also on VM_PFNMAP and some other conditions, and might not be completely what we want.
We do have some initial smaps parsing in vm_util.c:__check_huge(), maybe that could be factored out / reused, not sure how hard that would be to extend to return the flags.
linux-kselftest-mirror@lists.linaro.org