The selftest started failing since commit e93d2521b27f ("x86/vdso: Split virtual clock pages into dedicated mapping") was merged. While debugging I stumbled upon another bug and potential cleanup.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
---
Thomas Weißschuh (3):
      selftests/mm: virtual_address_range: Fix error when CommitLimit < 1GiB
      selftests/mm: virtual_address_range: Avoid reading VVAR mappings
      selftests/mm: virtual_address_range: Dump to /dev/null

 tools/testing/selftests/mm/virtual_address_range.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)
---
base-commit: fbfd64d25c7af3b8695201ebc85efe90be28c5a3
change-id: 20250107-virtual_address_range-tests-95843766fa97
Best regards,
If not enough physical memory is available the kernel may fail mmap(); see __vm_enough_memory() and vm_commit_limit(). In that case the logic in validate_complete_va_space() does not make sense and will even incorrectly fail. Instead skip the test if no mmap() succeeded.
Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()")
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>

---
The logic in __vm_enough_memory() seems weird. It describes itself as "Check that a process has enough memory to allocate a new virtual mapping", however it never checks the current memory usage of the process. So it only disallows large mappings. But many small mappings taking the same amount of memory are allowed; and then even automatically merged into one big mapping.
---
 tools/testing/selftests/mm/virtual_address_range.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
index 2a2b69e91950a37999f606847c9c8328d79890c2..d7bf8094d8bcd4bc96e2db4dc3fcb41968def859 100644
--- a/tools/testing/selftests/mm/virtual_address_range.c
+++ b/tools/testing/selftests/mm/virtual_address_range.c
@@ -178,6 +178,12 @@ int main(int argc, char *argv[])
 		validate_addr(ptr[i], 0);
 	}
 	lchunks = i;
+
+	if (!lchunks) {
+		ksft_test_result_skip("Not enough memory for a single chunk\n");
+		ksft_finished();
+	}
+
 	hptr = (char **) calloc(NR_CHUNKS_HIGH, sizeof(char *));
 	if (hptr == NULL) {
 		ksft_test_result_skip("Memory constraint not fulfilled\n");
On 07/01/25 8:44 pm, Thomas Weißschuh wrote:
If not enough physical memory is available the kernel may fail mmap(); see __vm_enough_memory() and vm_commit_limit(). In that case the logic in validate_complete_va_space() does not make sense and will even incorrectly fail. Instead skip the test if no mmap() succeeded.
Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()") Cc: stable@vger.kernel.org Signed-off-by: Thomas Weißschuh thomas.weissschuh@linutronix.de
The logic in __vm_enough_memory() seems weird. It describes itself as "Check that a process has enough memory to allocate a new virtual mapping", however it never checks the current memory usage of the process. So it only disallows large mappings. But many small mappings taking the same amount of memory are allowed; and then even automatically merged into one big mapping.
 tools/testing/selftests/mm/virtual_address_range.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
index 2a2b69e91950a37999f606847c9c8328d79890c2..d7bf8094d8bcd4bc96e2db4dc3fcb41968def859 100644
--- a/tools/testing/selftests/mm/virtual_address_range.c
+++ b/tools/testing/selftests/mm/virtual_address_range.c
@@ -178,6 +178,12 @@ int main(int argc, char *argv[])
 		validate_addr(ptr[i], 0);
 	}
 	lchunks = i;
+
+	if (!lchunks) {
+		ksft_test_result_skip("Not enough memory for a single chunk\n");
+		ksft_finished();
+	}
+
 	hptr = (char **) calloc(NR_CHUNKS_HIGH, sizeof(char *));
 	if (hptr == NULL) {
 		ksft_test_result_skip("Memory constraint not fulfilled\n");
I do not know about __vm_enough_memory(), but I am going by your description: You say that the kernel may fail mmap() when enough physical memory is not there, but it may happen that we have already done 100 mmap()'s, and then the kernel fails mmap(), so if (!lchunks) won't be able to handle this case. Basically, lchunks == 0 is not a complete indicator of kernel failing mmap().
The basic assumption of the test is that any process should be able to exhaust its virtual address space, and running the test under memory pressure and the kernel violating this behaviour defeats the point of the test I think?
On Wed, Jan 08, 2025 at 11:46:19AM +0530, Dev Jain wrote:
On 07/01/25 8:44 pm, Thomas Weißschuh wrote:
If not enough physical memory is available the kernel may fail mmap(); see __vm_enough_memory() and vm_commit_limit(). In that case the logic in validate_complete_va_space() does not make sense and will even incorrectly fail. Instead skip the test if no mmap() succeeded.
Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()") Cc: stable@vger.kernel.org Signed-off-by: Thomas Weißschuh thomas.weissschuh@linutronix.de
The logic in __vm_enough_memory() seems weird. It describes itself as "Check that a process has enough memory to allocate a new virtual mapping", however it never checks the current memory usage of the process. So it only disallows large mappings. But many small mappings taking the same amount of memory are allowed; and then even automatically merged into one big mapping.
 tools/testing/selftests/mm/virtual_address_range.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
index 2a2b69e91950a37999f606847c9c8328d79890c2..d7bf8094d8bcd4bc96e2db4dc3fcb41968def859 100644
--- a/tools/testing/selftests/mm/virtual_address_range.c
+++ b/tools/testing/selftests/mm/virtual_address_range.c
@@ -178,6 +178,12 @@ int main(int argc, char *argv[])
 		validate_addr(ptr[i], 0);
 	}
 	lchunks = i;
+
+	if (!lchunks) {
+		ksft_test_result_skip("Not enough memory for a single chunk\n");
+		ksft_finished();
+	}
+
 	hptr = (char **) calloc(NR_CHUNKS_HIGH, sizeof(char *));
 	if (hptr == NULL) {
 		ksft_test_result_skip("Memory constraint not fulfilled\n");
I do not know about __vm_enough_memory(), but I am going by your description: You say that the kernel may fail mmap() when enough physical memory is not there, but it may happen that we have already done 100 mmap()'s, and then the kernel fails mmap(), so if (!lchunks) won't be able to handle this case. Basically, lchunks == 0 is not a complete indicator of kernel failing mmap().
__vm_enough_memory() only checks the size of each single mmap() on its own. It does not actually check the current memory or address space usage of the process. This seems a bit weird, as indicated in my after-the-fold explanation.
The basic assumption of the test is that any process should be able to exhaust its virtual address space, and running the test under memory pressure and the kernel violating this behaviour defeats the point of the test I think?
The assumption is correct, as soon as one mapping succeeds the others will also succeed, until the actual address space is exhausted.
Looking at it again, __vm_enough_memory() is only called for writable mappings, so it would be possible to use only readable mappings in the test. The test will still fail with OOM, as the many PTEs need more than 1GiB of physical memory anyways, but at least that produces a usable error message. However I'm not sure if this would violate other test assumptions.
On 08.01.25 09:05, Thomas Weißschuh wrote:
On Wed, Jan 08, 2025 at 11:46:19AM +0530, Dev Jain wrote:
On 07/01/25 8:44 pm, Thomas Weißschuh wrote:
If not enough physical memory is available the kernel may fail mmap(); see __vm_enough_memory() and vm_commit_limit(). In that case the logic in validate_complete_va_space() does not make sense and will even incorrectly fail. Instead skip the test if no mmap() succeeded.
Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()") Cc: stable@vger.kernel.org
CC stable on tests is ... odd.
Signed-off-by: Thomas Weißschuh thomas.weissschuh@linutronix.de
The logic in __vm_enough_memory() seems weird. It describes itself as "Check that a process has enough memory to allocate a new virtual mapping", however it never checks the current memory usage of the process. So it only disallows large mappings. But many small mappings taking the same amount of memory are allowed; and then even automatically merged into one big mapping.
 tools/testing/selftests/mm/virtual_address_range.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
index 2a2b69e91950a37999f606847c9c8328d79890c2..d7bf8094d8bcd4bc96e2db4dc3fcb41968def859 100644
--- a/tools/testing/selftests/mm/virtual_address_range.c
+++ b/tools/testing/selftests/mm/virtual_address_range.c
@@ -178,6 +178,12 @@ int main(int argc, char *argv[])
 		validate_addr(ptr[i], 0);
 	}
 	lchunks = i;
+
+	if (!lchunks) {
+		ksft_test_result_skip("Not enough memory for a single chunk\n");
+		ksft_finished();
+	}
+
 	hptr = (char **) calloc(NR_CHUNKS_HIGH, sizeof(char *));
 	if (hptr == NULL) {
 		ksft_test_result_skip("Memory constraint not fulfilled\n");
I do not know about __vm_enough_memory(), but I am going by your description: You say that the kernel may fail mmap() when enough physical memory is not there, but it may happen that we have already done 100 mmap()'s, and then the kernel fails mmap(), so if (!lchunks) won't be able to handle this case. Basically, lchunks == 0 is not a complete indicator of kernel failing mmap().
__vm_enough_memory() only checks the size of each single mmap() on its own. It does not actually check the current memory or address space usage of the process. This seems a bit weird, as indicated in my after-the-fold explanation.
The basic assumption of the test is that any process should be able to exhaust its virtual address space, and running the test under memory pressure and the kernel violating this behaviour defeats the point of the test I think?
The assumption is correct, as soon as one mapping succeeds the others will also succeed, until the actual address space is exhausted.
Looking at it again, __vm_enough_memory() is only called for writable mappings, so it would be possible to use only readable mappings in the test. The test will still fail with OOM, as the many PTEs need more than 1GiB of physical memory anyways, but at least that produces a usable error message. However I'm not sure if this would violate other test assumptions.
Note that with MAP_NORESERVE, most setups we care about will allow mapping as much as you want, but on access OOM will fire.
So one could require that /proc/sys/vm/overcommit_memory is set up properly and use MAP_NORESERVE.
Reading from anonymous memory will populate the shared zeropage. To mitigate OOM from "too many page tables", one could simply unmap the pieces as they are verified (or MAP_FIXED over them, to free page tables).
On Wed, Jan 08, 2025 at 02:36:57PM +0100, David Hildenbrand wrote:
On 08.01.25 09:05, Thomas Weißschuh wrote:
On Wed, Jan 08, 2025 at 11:46:19AM +0530, Dev Jain wrote:
On 07/01/25 8:44 pm, Thomas Weißschuh wrote:
If not enough physical memory is available the kernel may fail mmap(); see __vm_enough_memory() and vm_commit_limit(). In that case the logic in validate_complete_va_space() does not make sense and will even incorrectly fail. Instead skip the test if no mmap() succeeded.
Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()") Cc: stable@vger.kernel.org
CC stable on tests is ... odd.
I thought it was fairly common, but it isn't. Will drop it.
Signed-off-by: Thomas Weißschuh thomas.weissschuh@linutronix.de
The logic in __vm_enough_memory() seems weird. It describes itself as "Check that a process has enough memory to allocate a new virtual mapping", however it never checks the current memory usage of the process. So it only disallows large mappings. But many small mappings taking the same amount of memory are allowed; and then even automatically merged into one big mapping.
 tools/testing/selftests/mm/virtual_address_range.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
index 2a2b69e91950a37999f606847c9c8328d79890c2..d7bf8094d8bcd4bc96e2db4dc3fcb41968def859 100644
--- a/tools/testing/selftests/mm/virtual_address_range.c
+++ b/tools/testing/selftests/mm/virtual_address_range.c
@@ -178,6 +178,12 @@ int main(int argc, char *argv[])
 		validate_addr(ptr[i], 0);
 	}
 	lchunks = i;
+
+	if (!lchunks) {
+		ksft_test_result_skip("Not enough memory for a single chunk\n");
+		ksft_finished();
+	}
+
 	hptr = (char **) calloc(NR_CHUNKS_HIGH, sizeof(char *));
 	if (hptr == NULL) {
 		ksft_test_result_skip("Memory constraint not fulfilled\n");
I do not know about __vm_enough_memory(), but I am going by your description: You say that the kernel may fail mmap() when enough physical memory is not there, but it may happen that we have already done 100 mmap()'s, and then the kernel fails mmap(), so if (!lchunks) won't be able to handle this case. Basically, lchunks == 0 is not a complete indicator of kernel failing mmap().
__vm_enough_memory() only checks the size of each single mmap() on its own. It does not actually check the current memory or address space usage of the process. This seems a bit weird, as indicated in my after-the-fold explanation.
The basic assumption of the test is that any process should be able to exhaust its virtual address space, and running the test under memory pressure and the kernel violating this behaviour defeats the point of the test I think?
The assumption is correct, as soon as one mapping succeeds the others will also succeed, until the actual address space is exhausted.
Looking at it again, __vm_enough_memory() is only called for writable mappings, so it would be possible to use only readable mappings in the test. The test will still fail with OOM, as the many PTEs need more than 1GiB of physical memory anyways, but at least that produces a usable error message. However I'm not sure if this would violate other test assumptions.
Note that with MAP_NORESERVE, most setups we care about will allow mapping as much as you want, but on access OOM will fire.
Thanks for the hint.
So one could require that /proc/sys/vm/overcommit_memory is set up properly and use MAP_NORESERVE.
Isn't the check for lchunks == 0 essentially exactly this?
Reading from anonymous memory will populate the shared zeropage. To mitigate OOM from "too many page tables", one could simply unmap the pieces as they are verified (or MAP_FIXED over them, to free page tables).
The code has to figure out if a verified region was created by mmap(), otherwise an munmap() could crash the process. As the entries from /proc/self/maps may have been merged and (I assume) the ordering of mappings is not guaranteed, some bespoke logic to establish the link will be needed.
Is it fine to rely on CONFIG_ANON_VMA_NAME? That would make it much easier to implement.
Using MAP_NORESERVE and eager munmap()s, the testcase works nicely even in very low physical memory conditions.
Thomas
On 08.01.25 17:13, Thomas Weißschuh wrote:
On Wed, Jan 08, 2025 at 02:36:57PM +0100, David Hildenbrand wrote:
On 08.01.25 09:05, Thomas Weißschuh wrote:
On Wed, Jan 08, 2025 at 11:46:19AM +0530, Dev Jain wrote:
On 07/01/25 8:44 pm, Thomas Weißschuh wrote:
If not enough physical memory is available the kernel may fail mmap(); see __vm_enough_memory() and vm_commit_limit(). In that case the logic in validate_complete_va_space() does not make sense and will even incorrectly fail. Instead skip the test if no mmap() succeeded.
Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()") Cc: stable@vger.kernel.org
CC stable on tests is ... odd.
I thought it was fairly common, but it isn't. Will drop it.
As it's not really a "kernel BUG", it's rather uncommon.
Note that with MAP_NORESERVE, most setups we care about will allow mapping as much as you want, but on access OOM will fire.
Thanks for the hint.
So one could require that /proc/sys/vm/overcommit_memory is set up properly and use MAP_NORESERVE.
Isn't the check for lchunks == 0 essentially exactly this?
I assume paired with MAP_NORESERVE?
Maybe, but it could be better to have something that says "if overcommit_memory is not set up properly I will SKIP this test, but otherwise I expect this to work and will FAIL if it doesn't".
Or would you expect to run into lchunks == 0 even if overcommit_memory is set up properly and MAP_NORESERVE is used? (very very low memory that we cannot even create all the VMAs?)
Reading from anonymous memory will populate the shared zeropage. To mitigate OOM from "too many page tables", one could simply unmap the pieces as they are verified (or MAP_FIXED over them, to free page tables).
The code has to figure out if a verified region was created by mmap(), otherwise an munmap() could crash the process. As the entries from /proc/self/maps may have been merged and (I assume)
Yes, and partial unmap (in chunk granularity?) would split them again.
the ordering of mappings is not guaranteed, some bespoke logic to establish the link will be needed.
My thinking was that you simply process one /proc/self/maps entry in some chunks. After processing a chunk, you munmap() it.
So you would process + munmap in chunks.
Is it fine to rely on CONFIG_ANON_VMA_NAME? That would make it much easier to implement.
Can you elaborate how you would do it?
Using MAP_NORESERVE and eager munmap()s, the testcase works nicely even in very low physical memory conditions.
Cool.
On Wed, Jan 08, 2025 at 05:46:37PM +0100, David Hildenbrand wrote:
On 08.01.25 17:13, Thomas Weißschuh wrote:
On Wed, Jan 08, 2025 at 02:36:57PM +0100, David Hildenbrand wrote:
On 08.01.25 09:05, Thomas Weißschuh wrote:
On Wed, Jan 08, 2025 at 11:46:19AM +0530, Dev Jain wrote:
On 07/01/25 8:44 pm, Thomas Weißschuh wrote:
If not enough physical memory is available the kernel may fail mmap(); see __vm_enough_memory() and vm_commit_limit(). In that case the logic in validate_complete_va_space() does not make sense and will even incorrectly fail. Instead skip the test if no mmap() succeeded.
Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()") Cc: stable@vger.kernel.org
CC stable on tests is ... odd.
I thought it was fairly common, but it isn't. Will drop it.
As it's not really a "kernel BUG", it's rather uncommon.
I also used it on patch 2, which is now reproducibly broken on x86 mainline since my commit mentioned in that patch. But I'll drop it there, too.
Note that with MAP_NORESERVE, most setups we care about will allow mapping as much as you want, but on access OOM will fire.
Thanks for the hint.
So one could require that /proc/sys/vm/overcommit_memory is set up properly and use MAP_NORESERVE.
Isn't the check for lchunks == 0 essentially exactly this?
I assume paired with MAP_NORESERVE?
Yes.
Maybe, but it could be better to have something that says "if overcommit_memory is not set up properly I will SKIP this test, but otherwise I expect this to work and will FAIL if it doesn't".
Ok, I'll validate the sysctl value.
Or would you expect to run into lchunks == 0 even if overcommit_memory is set up properly and MAP_NORESERVE is used? (very very low memory that we cannot even create all the VMAs?)
No.
Reading from anonymous memory will populate the shared zeropage. To mitigate OOM from "too many page tables", one could simply unmap the pieces as they are verified (or MAP_FIXED over them, to free page tables).
The code has to figure out if a verified region was created by mmap(), otherwise an munmap() could crash the process. As the entries from /proc/self/maps may have been merged and (I assume)
Yes, and partial unmap (in chunk granularity?) would split them again.
the ordering of mappings is not guaranteed, some bespoke logic to establish the link will be needed.
My thinking was that you simply process one /proc/self/maps entry in some chunks. After processing a chunk, you munmap() it.
So you would process + munmap in chunks.
That is clear. The issue would be to figure which chunks are valid to unmap. If something critical like the executable file is unmapped, the process crashes. But see below.
Is it fine to rely on CONFIG_ANON_VMA_NAME? That would make it much easier to implement.
Can you elaborate how you would do it?
First set the VMA name after mmap():
	for (i = 0; i < NR_CHUNKS_LOW; i++) {
		ptr[i] = mmap(NULL, MAP_CHUNK_SIZE, PROT_READ | PROT_WRITE,
			      MAP_NORESERVE | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (ptr[i] == MAP_FAILED) {
			if (validate_lower_address_hint())
				ksft_exit_fail_msg("mmap unexpectedly succeeded with hint\n");
			break;
		}

		validate_addr(ptr[i], 0);
		if (prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, ptr[i], MAP_CHUNK_SIZE, "virtual_address_range"))
			ksft_exit_fail_msg("prctl(PR_SET_VMA_ANON_NAME) failed: %s\n", strerror(errno));
	}
During validation:
	hop = 0;
	while (start_addr + hop < end_addr) {
		if (write(fd, (void *)(start_addr + hop), 1) != 1)
			return 1;
		lseek(fd, 0, SEEK_SET);

		if (!strncmp(line + path_offset, "[anon:virtual_address_range]", 28))
			munmap((char *)(start_addr + hop), MAP_CHUNK_SIZE);

		hop += MAP_CHUNK_SIZE;
	}
It is done for each chunk, as all chunks may have been merged into a single VMA and a per-VMA unmap would not happen before OOM.
Using MAP_NORESERVE and eager munmap()s, the testcase works nicely even in very low physical memory conditions.
Cool.
That is clear. The issue would be to figure which chunks are valid to unmap. If something critical like the executable file is unmapped, the process crashes. But see below.
Ah, now I see what you mean. Yes, also the stack etc. will be problematic. So IIUC, you want to limit the munmap optimization only to the manually mmap()ed parts.
Is it fine to rely on CONFIG_ANON_VMA_NAME? That would make it much easier to implement.
Can you elaborate how you would do it?
First set the VMA name after mmap():
	for (i = 0; i < NR_CHUNKS_LOW; i++) {
		ptr[i] = mmap(NULL, MAP_CHUNK_SIZE, PROT_READ | PROT_WRITE,
			      MAP_NORESERVE | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (ptr[i] == MAP_FAILED) {
			if (validate_lower_address_hint())
				ksft_exit_fail_msg("mmap unexpectedly succeeded with hint\n");
			break;
		}

		validate_addr(ptr[i], 0);
		if (prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, ptr[i], MAP_CHUNK_SIZE, "virtual_address_range"))
			ksft_exit_fail_msg("prctl(PR_SET_VMA_ANON_NAME) failed: %s\n", strerror(errno));
Likely this would prevent merging of VMAs.
With a 1 GiB chunk size, and NR_CHUNKS_LOW == 128TiB, you'd already require 128k VMAs. The default limit is frequently 64k.
We could just scan the ptr / hptr array to see if this is a manual mmap area or not. If this takes too long, one could sort the arrays by address and perform a binary search.
Not the most efficient way of doing it, but maybe good enough for this test?
Alternatively, store the pointer in a xarray-like tree instead of two arrays. Requires a bit more memory ... and we'd have to find a simple implementation we could just reuse in this test. So maybe there is a simpler way to get it done.
On 09.01.25 14:05, David Hildenbrand wrote:
That is clear. The issue would be to figure which chunks are valid to unmap. If something critical like the executable file is unmapped, the process crashes. But see below.
Ah, now I see what you mean. Yes, also the stack etc. will be problematic. So IIUC, you want to limit the munmap optimization only to the manually mmap()ed parts.
Is it fine to rely on CONFIG_ANON_VMA_NAME? That would make it much easier to implement.
Can you elaborate how you would do it?
First set the VMA name after mmap():
I took a look at the implementation, and VMA merging seems to be able to merge such VMAs that share the same name (even when set separately).
So assuming you use the same name for all, that should indeed also work.
On Thu, Jan 09, 2025 at 02:05:43PM +0100, David Hildenbrand wrote:
That is clear. The issue would be to figure which chunks are valid to unmap. If something critical like the executable file is unmapped, the process crashes. But see below.
Ah, now I see what you mean. Yes, also the stack etc. will be problematic. So IIUC, you want to limit the munmap optimization only to the manually mmap()ed parts.
Correct.
Is it fine to rely on CONFIG_ANON_VMA_NAME? That would make it much easier to implement.
Can you elaborate how you would do it?
First set the VMA name after mmap():
	for (i = 0; i < NR_CHUNKS_LOW; i++) {
		ptr[i] = mmap(NULL, MAP_CHUNK_SIZE, PROT_READ | PROT_WRITE,
			      MAP_NORESERVE | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (ptr[i] == MAP_FAILED) {
			if (validate_lower_address_hint())
				ksft_exit_fail_msg("mmap unexpectedly succeeded with hint\n");
			break;
		}

		validate_addr(ptr[i], 0);
		if (prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, ptr[i], MAP_CHUNK_SIZE, "virtual_address_range"))
			ksft_exit_fail_msg("prctl(PR_SET_VMA_ANON_NAME) failed: %s\n", strerror(errno));
Likely this would prevent merging of VMAs.
With a 1 GiB chunk size, and NR_CHUNKS_LOW == 128TiB, you'd already require 128k VMAs. The default limit is frequently 64k.
They are merged for me, as they all share the same name.
PR_SET_VMA(2const) even mentions merging:
Note that assigning an attribute to a virtual memory area might prevent it from being merged with adjacent virtual memory areas due to the difference in that attribute's value.
is_mergeable_vma() has an explicit check using anon_vma_name_eq().
We could just scan the ptr / hptr array to see if this is a manual mmap area or not. If this takes too long, one could sort the arrays by address and perform a binary search.
Not the most efficient way of doing it, but maybe good enough for this test?
A naive loop is what I tried first, but it took forever.
Alternatively, store the pointer in a xarray-like tree instead of two arrays. Requires a bit more memory ... and we'd have to find a simple implementation we could just reuse in this test. So maybe there is a simpler way to get it done.
IMO the prctl() is that simpler way. The only real drawback is the dependency on CONFIG_ANON_VMA_NAME. We can add an entry to tools/testing/selftests/mm/config for it.
Thomas
On 08/01/25 9:43 pm, Thomas Weißschuh wrote:
On Wed, Jan 08, 2025 at 02:36:57PM +0100, David Hildenbrand wrote:
On 08.01.25 09:05, Thomas Weißschuh wrote:
On Wed, Jan 08, 2025 at 11:46:19AM +0530, Dev Jain wrote:
On 07/01/25 8:44 pm, Thomas Weißschuh wrote:
If not enough physical memory is available the kernel may fail mmap(); see __vm_enough_memory() and vm_commit_limit(). In that case the logic in validate_complete_va_space() does not make sense and will even incorrectly fail. Instead skip the test if no mmap() succeeded.
Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()") Cc: stable@vger.kernel.org
CC stable on tests is ... odd.
I thought it was fairly common, but it isn't. Will drop it.
Oh, well... https://lore.kernel.org/all/20240521074358.675031-4-dev.jain@arm.com/ I have done that before :) although the change I was making was fixing a fundamental flaw in the test and your change is fixing the test for a specific case (memory pressure), so I tend to concur with David.
Signed-off-by: Thomas Weißschuh thomas.weissschuh@linutronix.de
The logic in __vm_enough_memory() seems weird. It describes itself as "Check that a process has enough memory to allocate a new virtual mapping", however it never checks the current memory usage of the process. So it only disallows large mappings. But many small mappings taking the same amount of memory are allowed; and then even automatically merged into one big mapping.
 tools/testing/selftests/mm/virtual_address_range.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
index 2a2b69e91950a37999f606847c9c8328d79890c2..d7bf8094d8bcd4bc96e2db4dc3fcb41968def859 100644
--- a/tools/testing/selftests/mm/virtual_address_range.c
+++ b/tools/testing/selftests/mm/virtual_address_range.c
@@ -178,6 +178,12 @@ int main(int argc, char *argv[])
 		validate_addr(ptr[i], 0);
 	}
 	lchunks = i;
+
+	if (!lchunks) {
+		ksft_test_result_skip("Not enough memory for a single chunk\n");
+		ksft_finished();
+	}
+
 	hptr = (char **) calloc(NR_CHUNKS_HIGH, sizeof(char *));
 	if (hptr == NULL) {
 		ksft_test_result_skip("Memory constraint not fulfilled\n");
I do not know about __vm_enough_memory(), but I am going by your description: You say that the kernel may fail mmap() when enough physical memory is not there, but it may happen that we have already done 100 mmap()'s, and then the kernel fails mmap(), so if (!lchunks) won't be able to handle this case. Basically, lchunks == 0 is not a complete indicator of kernel failing mmap().
__vm_enough_memory() only checks the size of each single mmap() on its own. It does not actually check the current memory or address space usage of the process. This seems a bit weird, as indicated in my after-the-fold explanation.
The basic assumption of the test is that any process should be able to exhaust its virtual address space, and running the test under memory pressure and the kernel violating this behaviour defeats the point of the test I think?
The assumption is correct, as soon as one mapping succeeds the others will also succeed, until the actual address space is exhausted.
Looking at it again, __vm_enough_memory() is only called for writable mappings, so it would be possible to use only readable mappings in the test. The test will still fail with OOM, as the many PTEs need more than 1GiB of physical memory anyways, but at least that produces a usable error message. However I'm not sure if this would violate other test assumptions.
Note that with MAP_NORESERVE, most setups we care about will allow mapping as much as you want, but on access OOM will fire.
Thanks for the hint.
So one could require that /proc/sys/vm/overcommit_memory is set up properly and use MAP_NORESERVE.
Isn't the check for lchunks == 0 essentially exactly this?
Reading from anonymous memory will populate the shared zeropage. To mitigate OOM from "too many page tables", one could simply unmap the pieces as they are verified (or MAP_FIXED over them, to free page tables).
The code has to figure out if a verified region was created by mmap(), otherwise an munmap() could crash the process. As the entries from /proc/self/maps may have been merged and (I assume) the ordering of mappings is not guaranteed, some bespoke logic to establish the link will be needed.
Is it fine to rely on CONFIG_ANON_VMA_NAME? That would make it much easier to implement.
Using MAP_NORESERVE and eager munmap()s, the testcase works nicely even in very low physical memory conditions.
Thomas
The virtual_address_range selftest reads from the start of each mapping listed in /proc/self/maps. However, not all mappings are valid to be arbitrarily accessed. For example, the vvar data used for virtual clocks on x86 can only be accessed if 1) the kernel configuration enables virtual clocks and 2) the hypervisor provided the data for it, which can only be determined by the VDSO code itself. Since commit e93d2521b27f ("x86/vdso: Split virtual clock pages into dedicated mapping") the virtual clock data has been split out into its own mapping, triggering faulting accesses by virtual_address_range.
Skip the various vvar mappings in virtual_address_range to avoid errors.
Fixes: e93d2521b27f ("x86/vdso: Split virtual clock pages into dedicated mapping")
Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()")
Cc: stable@vger.kernel.org
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202412271148.2656e485-lkp@intel.com
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
---
 tools/testing/selftests/mm/virtual_address_range.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
index d7bf8094d8bcd4bc96e2db4dc3fcb41968def859..484f82c7b7c871f82a7d9ec6d6c649f2ab1eb0cd 100644
--- a/tools/testing/selftests/mm/virtual_address_range.c
+++ b/tools/testing/selftests/mm/virtual_address_range.c
@@ -116,10 +116,11 @@ static int validate_complete_va_space(void)

 	prev_end_addr = 0;
 	while (fgets(line, sizeof(line), file)) {
+		int path_offset = 0;
 		unsigned long hop;

-		if (sscanf(line, "%lx-%lx %s[rwxp-]",
-			   &start_addr, &end_addr, prot) != 3)
+		if (sscanf(line, "%lx-%lx %4s %*s %*s %*s %n",
+			   &start_addr, &end_addr, prot, &path_offset) != 3)
 			ksft_exit_fail_msg("cannot parse /proc/self/maps\n");

 		/* end of userspace mappings; ignore vsyscall mapping */
@@ -135,6 +136,10 @@ static int validate_complete_va_space(void)
 		if (prot[0] != 'r')
 			continue;

+		/* Only the VDSO can know if a VVAR mapping is really readable */
+		if (path_offset && !strncmp(line + path_offset, "[vvar", 5))
+			continue;
+
 		/*
 		 * Confirm whether MAP_CHUNK_SIZE chunk can be found or not.
 		 * If write succeeds, no need to check MAP_CHUNK_SIZE - 1
During the execution of validate_complete_va_space() a lot of memory is consumed by the VM subsystem. When running on a low-memory system, an OOM may be triggered while writing to the dump file, as the filesystem may also require memory.
On my test system with 1100MiB physical memory:
Tasks state (memory values in pages):
[  pid  ]   uid  tgid total_vm      rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
[     57]     0    57 34359215953      695      256        0       439     1064390656        0             0 virtual_address

Out of memory: Killed process 57 (virtual_address) total-vm:137436863812kB, anon-rss:1024kB, file-rss:0kB, shmem-rss:1756kB, UID:0 pgtables:1039444kB oom_score_adj:0
<snip>
fault_in_iov_iter_readable+0x4a/0xd0
generic_perform_write+0x9c/0x280
shmem_file_write_iter+0x86/0x90
vfs_write+0x29c/0x480
ksys_write+0x6c/0xe0
do_syscall_64+0x9e/0x1a0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Write the dumped data into /dev/null instead which does not require additional memory during write(), making the code simpler as a side-effect.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
---
 tools/testing/selftests/mm/virtual_address_range.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c
index 484f82c7b7c871f82a7d9ec6d6c649f2ab1eb0cd..4042fd878acd702d23da2c3293292de33bd48143 100644
--- a/tools/testing/selftests/mm/virtual_address_range.c
+++ b/tools/testing/selftests/mm/virtual_address_range.c
@@ -103,10 +103,9 @@ static int validate_complete_va_space(void)
 	FILE *file;
 	int fd;

-	fd = open("va_dump", O_CREAT | O_WRONLY, 0600);
-	unlink("va_dump");
+	fd = open("/dev/null", O_WRONLY);
 	if (fd < 0) {
-		ksft_test_result_skip("cannot create or open dump file\n");
+		ksft_test_result_skip("cannot create or open /dev/null\n");
 		ksft_finished();
 	}

@@ -152,7 +151,6 @@ static int validate_complete_va_space(void)
 		while (start_addr + hop < end_addr) {
 			if (write(fd, (void *)(start_addr + hop), 1) != 1)
 				return 1;
-			lseek(fd, 0, SEEK_SET);

 			hop += MAP_CHUNK_SIZE;
 		}
linux-kselftest-mirror@lists.linaro.org