Hi all,
This series refactors the VMA count limit code to improve clarity, test coverage, and observability.
The VMA count limit, controlled by sysctl_max_map_count, is a safeguard that prevents a single process from consuming excessive kernel memory by creating too many memory mappings.
A major change since v3 is the first patch in the series: instead of attempting to fix overshooting the limit, it now documents that this is the intended behavior. As Hugh pointed out, the lenient check (>) in do_mmap() and do_brk_flags() is intentional, allowing for potential VMA merges or expansions when the process is at the sysctl_max_map_count limit. The consensus is that this historical behavior is correct but non-obvious.
This series now focuses on making that behavior clear and the surrounding code more robust. Based on feedback from Lorenzo and David, this series retains the helper function and the rename of map_count.
The refined v4 series is now structured as follows:
1. Documents the lenient VMA count checks with comments to clarify their purpose.
2. Adds a comprehensive selftest to codify the expected behavior at the limit, including the lenient mmap case.
3. Introduces max_vma_count() to abstract the max map count sysctl, making the sysctl static and converting all callers to use the new helper.
4. Renames mm_struct->map_count to the more explicit vma_count for better code clarity.
5. Adds a tracepoint for observability when a process fails to allocate a VMA due to the count limit.
Tested on x86_64 and arm64:
1. Build test: allyesconfig for rename
2. Selftests: cd tools/testing/selftests/mm && make && ./run_vmtests.sh -t max_vma_count
3. vma tests: cd tools/testing/vma && make && ./vma
Link to v3: https://lore.kernel.org/r/20251013235259.589015-1-kaleshsingh@google.com/
Thanks to everyone for the valuable discussion on previous revisions.
-- Kalesh
Kalesh Singh (5):
  mm: Document lenient map_count checks
  mm/selftests: add max_vma_count tests
  mm: Introduce max_vma_count() to abstract the max map count sysctl
  mm: rename mm_struct::map_count to vma_count
  mm/tracing: introduce trace_mm_insufficient_vma_slots event
 MAINTAINERS                                   |   2 +
 fs/binfmt_elf.c                               |   2 +-
 fs/coredump.c                                 |   2 +-
 include/linux/mm.h                            |   2 -
 include/linux/mm_types.h                      |   2 +-
 include/trace/events/vma.h                    |  32 +
 kernel/fork.c                                 |   2 +-
 mm/debug.c                                    |   2 +-
 mm/internal.h                                 |   3 +
 mm/mmap.c                                     |  25 +-
 mm/mremap.c                                   |  13 +-
 mm/nommu.c                                    |   8 +-
 mm/util.c                                     |   1 -
 mm/vma.c                                      |  42 +-
 mm/vma_internal.h                             |   2 +
 tools/testing/selftests/mm/.gitignore         |   1 +
 tools/testing/selftests/mm/Makefile           |   1 +
 .../selftests/mm/max_vma_count_tests.c        | 716 ++++++++++++++++++
 tools/testing/selftests/mm/run_vmtests.sh     |   5 +
 tools/testing/vma/vma.c                       |  32 +-
 tools/testing/vma/vma_internal.h              |  13 +-
 21 files changed, 856 insertions(+), 52 deletions(-)
 create mode 100644 include/trace/events/vma.h
 create mode 100644 tools/testing/selftests/mm/max_vma_count_tests.c
base-commit: b227c04932039bccc21a0a89cd6df50fa57e4716
Add comments to the map_count limit checks in do_mmap() and do_brk_flags() to clarify their intended behavior.
The use of '>' (rather than '>=') in these checks is intentional but non-obvious. It allows these functions to succeed when the VMA count is exactly at the sysctl_max_map_count limit. This historical behavior accounts for cases where the operation might not create a new VMA, but instead merge with or expand an existing one, in which case the VMA count does not increase.
These comments clarify the long-standing behavior and will help prevent future misinterpretation as an off-by-one error.
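For reference, a minimal userspace sketch (illustrative only, not kernel code) of the boundary semantics the comments describe, modeled on plain integers; the limit value used here is simply the sysctl's default:

/* Illustrative model of the lenient (>) check; not kernel code. */
#include <assert.h>

static int lenient_check_fails(int vma_count, int limit)
{
	/* Mirrors the do_mmap()/do_brk_flags() condition. */
	return vma_count > limit;
}

int main(void)
{
	int limit = 65530;	/* default sysctl_max_map_count */

	assert(!lenient_check_fails(limit - 1, limit));	/* below the limit: allowed */
	assert(!lenient_check_fails(limit, limit));	/* at the limit: allowed; a merge may avoid a new VMA */
	assert(lenient_check_fails(limit + 1, limit));	/* one above the limit: rejected */
	return 0;
}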
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
---
Changes in v4:
- Keep the existing lenient behavior, per Hugh
- Document this is intended, per Lorenzo

Changes in v3:
- Collect Reviewed-by and Acked-by tags.

Changes in v2:
- Fix mmap check, per Pedro
 mm/mmap.c | 9 +++++++++
 mm/vma.c  | 6 ++++++
 2 files changed, 15 insertions(+)
diff --git a/mm/mmap.c b/mm/mmap.c
index 644f02071a41..78843a2fae42 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -374,6 +374,15 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 		return -EOVERFLOW;
 
 	/* Too many mappings? */
+	/*
+	 * The check is intentionally lenient (>) to allow an mmap() at the limit
+	 * to succeed. This is for historical reasons, as the new mapping might
+	 * merge with an adjacent VMA and not increase the total VMA count.
+	 *
+	 * If a merge does not occur, the process is allowed to exceed the
+	 * sysctl_max_map_count limit by one. This behavior is preserved to
+	 * avoid breaking existing applications.
+	 */
 	if (mm->map_count > sysctl_max_map_count)
 		return -ENOMEM;
 
diff --git a/mm/vma.c b/mm/vma.c
index 919d1fc63a52..d0bb3127280e 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -2813,6 +2813,12 @@ int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
 	if (!may_expand_vm(mm, vm_flags, len >> PAGE_SHIFT))
 		return -ENOMEM;
 
+	/*
+	 * The check is intentionally lenient (>) to allow brk() to succeed at
+	 * the limit. This is for historical reasons, as expanding the heap
+	 * typically extends the existing brk VMA rather than creating a new one.
+	 * See also the comment in do_mmap().
+	 */
 	if (mm->map_count > sysctl_max_map_count)
 		return -ENOMEM;
Add a new selftest to verify that the max VMA count limit is correctly enforced.
This test suite checks that various VMA operations (mmap, mprotect, munmap, mremap) succeed or fail as expected when the number of VMAs is close to the sysctl_max_map_count limit.
The test works by first creating a large number of VMAs to bring the process close to the limit, and then performing various operations that may or may not create new VMAs. The test then verifies that the operations that would exceed the limit fail, and that the operations that do not exceed the limit succeed.
NOTE: munmap() is special: it is allowed to temporarily exceed the limit by one when a split is required, since the count drops back to the limit once the unmap succeeds.
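For context, a minimal standalone sketch of the bookkeeping the test depends on: comparing a process's current VMA count against the sysctl limit. This simpler variant just counts lines in /proc/self/maps and is illustrative only; the actual selftest below uses the PROCMAP_QUERY interface via vm_util.h and excludes [vsyscall] to match kernel accounting:

/* Illustrative only: report roughly how many VMA slots remain. */
#include <stdio.h>

static long count_lines(const char *path)
{
	FILE *f = fopen(path, "r");
	long count = 0;
	int c;

	if (!f)
		return -1;
	while ((c = fgetc(f)) != EOF)
		if (c == '\n')
			count++;
	fclose(f);
	return count;
}

int main(void)
{
	FILE *f = fopen("/proc/sys/vm/max_map_count", "r");
	long limit, vmas;

	if (!f || fscanf(f, "%ld", &limit) != 1)
		return 1;
	fclose(f);

	/* Approximate: one line per mapping, including [vsyscall]. */
	vmas = count_lines("/proc/self/maps");
	if (vmas < 0)
		return 1;

	printf("max_map_count=%ld current=%ld remaining=%ld\n",
	       limit, vmas, limit - vmas);
	return 0;
}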
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Pedro Falcato <pfalcato@suse.de>
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
---
Changes in v4:
- Update the mmap test cases to correctly validate the historical lenient
  behavior discussed in the v3 review. The test now confirms that an mmap()
  call succeeds at the limit and fails once the count is one above the limit.
- Add comments to the test code to document this expected behavior.
Changes in v3:
- Rewrite test using kselftest harness, per Lorenzo
- Update test diagram to be vertical so as to not exceed 80 chars, per Lorenzo
- Use vm_util.h helpers, per Lorenzo
- Update .gitignore, per Lorenzo
- Add max_vma_count_tests to MEMORY MAPPING section in MAINTAINERS, per Lorenzo
- Remove /proc/*/maps debugging prints and globals, per Lorenzo
- Rename guard regions to holes to avoid confusion with VMA guard regions, per David

Changes in v2:
- Add tests, per Liam (note that the do_brk_flags() path is not easily tested
  from userspace, so it's not included here). Exceeding the limit there should
  be uncommon.
 MAINTAINERS                                   |   1 +
 tools/testing/selftests/mm/.gitignore         |   1 +
 tools/testing/selftests/mm/Makefile           |   1 +
 .../selftests/mm/max_vma_count_tests.c        | 716 ++++++++++++++++++
 tools/testing/selftests/mm/run_vmtests.sh     |   5 +
 5 files changed, 724 insertions(+)
 create mode 100644 tools/testing/selftests/mm/max_vma_count_tests.c
diff --git a/MAINTAINERS b/MAINTAINERS index 51aa95b80034..66f7ca5b01ad 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -16581,6 +16581,7 @@ F: mm/vma.h F: mm/vma_exec.c F: mm/vma_init.c F: mm/vma_internal.h +F: tools/testing/selftests/mm/max_vma_count_tests.c F: tools/testing/selftests/mm/merge.c F: tools/testing/vma/
diff --git a/tools/testing/selftests/mm/.gitignore b/tools/testing/selftests/mm/.gitignore index c2a8586e51a1..010f1bced5b9 100644 --- a/tools/testing/selftests/mm/.gitignore +++ b/tools/testing/selftests/mm/.gitignore @@ -10,6 +10,7 @@ hugetlb-soft-offline khugepaged map_hugetlb map_populate +max_vma_count_tests thuge-gen compaction_test migration diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile index eaf9312097f7..4f0b03cdece5 100644 --- a/tools/testing/selftests/mm/Makefile +++ b/tools/testing/selftests/mm/Makefile @@ -93,6 +93,7 @@ TEST_GEN_FILES += transhuge-stress TEST_GEN_FILES += uffd-stress TEST_GEN_FILES += uffd-unit-tests TEST_GEN_FILES += uffd-wp-mremap +TEST_GEN_FILES += max_vma_count_tests TEST_GEN_FILES += split_huge_page_test TEST_GEN_FILES += ksm_tests TEST_GEN_FILES += ksm_functional_tests diff --git a/tools/testing/selftests/mm/max_vma_count_tests.c b/tools/testing/selftests/mm/max_vma_count_tests.c new file mode 100644 index 000000000000..7506f44321a9 --- /dev/null +++ b/tools/testing/selftests/mm/max_vma_count_tests.c @@ -0,0 +1,716 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright 2025 Google LLC + */ +#define _GNU_SOURCE + +#include <errno.h> +#include <linux/prctl.h> /* Definition of PR_* constants */ +#include <stdbool.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <sys/mman.h> +#include <sys/prctl.h> +#include <unistd.h> + +#define TH_LOG_ENABLED 0 +#include "../kselftest_harness.h" +#include "vm_util.h" + +#define DEFAULT_MAX_MAP_COUNT 65530 +#define TEST_AREA_NR_PAGES 3 +#define TEST_AREA_PROT (PROT_NONE) +#define EXTRA_MAP_PROT (PROT_NONE) + +/* mremap accounts for the worst case to fail early */ +#define MREMAP_REQUIRED_VMA_SLOTS 6 + +FIXTURE(max_vma_count) { + int max_vma_count; + int original_max_vma_count; + int test_area_size; + int nr_extra_maps; + char *test_area; + char *extra_maps; +}; + +/* To keep checkpatch happy */ +#define max_vma_count_data_t FIXTURE_DATA(max_vma_count) + +static int get_max_vma_count(void); +static bool set_max_vma_count(int val); +static int get_current_vma_count(void); +static bool is_test_area_mapped(char *test_area, int test_area_size); +static bool lower_max_map_count_if_needed(max_vma_count_data_t *self, + struct __test_metadata *_metadata); +static void restore_max_map_count_if_needed(max_vma_count_data_t *self, + struct __test_metadata *_metadata); +static bool free_vma_slots(max_vma_count_data_t *self, int slots_to_free); +static void create_reservation(max_vma_count_data_t *self, + struct __test_metadata *_metadata); +static void create_extra_maps(max_vma_count_data_t *self, + struct __test_metadata *_metadata); + +/** + * FIXTURE_SETUP - Sets up the VMA layout for max VMA count testing. + * + * Sets up a specific VMA layout to test behavior near the max_vma_count limit. + * A large memory area is reserved and then unmapped to create a contiguous + * address space. Mappings are then created within this space. + * + * The layout is as follows (addresses increase downwards): + * + * base_addr --> +----------------------+ + * | Hole (1 page) | + * +----------------------+ + * TEST_AREA --> | TEST_AREA | + * | (unmapped, 3 pages) | + * +----------------------+ + * | Hole (1 page) | + * +----------------------+ + * EXTRA_MAPS --> | Extra Map 1 (1 page) | + * +----------------------+ + * | Hole (1 page) | + * +----------------------+ + * | Extra Map 2 (1 page) | + * +----------------------+ + * | ... 
| + * +----------------------+ + * | Extra Map N (1 page) | + * +----------------------+ + * | Hole (1 page) | + * +----------------------+ + * + * "Holes" are unmapped, 1-page gaps used to isolate mappings. + * The number of "Extra Maps" is calculated to bring the total VMA count + * to MAX_VMA_COUNT - 1. + * + * Populates TEST_AREA and other globals required for the tests. + * + * Return: true on success, false on failure. + */ +FIXTURE_SETUP(max_vma_count) +{ + int initial_vma_count; + + TH_LOG("Setting up vma_max_count test ..."); + + self->test_area_size = TEST_AREA_NR_PAGES * psize(); + + if (!lower_max_map_count_if_needed(self, _metadata)) { + SKIP(return, + "max_map_count too high and cannot be lowered. Please rerun as root."); + } + + initial_vma_count = get_current_vma_count(); + ASSERT_GT(initial_vma_count, 0); + + self->nr_extra_maps = self->max_vma_count - 1 - initial_vma_count; + if (self->nr_extra_maps < 1) { + SKIP(return, + "Not enough available maps to run test (max: %d, current: %d)", + self->max_vma_count, initial_vma_count); + } + + create_reservation(self, _metadata); + create_extra_maps(self, _metadata); + + ASSERT_EQ(get_current_vma_count(), self->max_vma_count - 1); + TH_LOG("vma_max_count test setup done."); +} + +FIXTURE_TEARDOWN(max_vma_count) +{ + /* + * NOTE: Each test is run in a separate process; we leave + * mapping cleanup to process teardown for simplicity. + */ + + restore_max_map_count_if_needed(self, _metadata); +} + +static bool mmap_anon(max_vma_count_data_t *self) +{ + void *addr = mmap(NULL, psize(), PROT_READ, + MAP_PRIVATE|MAP_ANONYMOUS, -1, 0); + return addr != MAP_FAILED; +} + +static inline bool __mprotect(char *addr, int size) +{ + int new_prot = ~TEST_AREA_PROT & (PROT_READ | PROT_WRITE | PROT_EXEC); + + return mprotect(addr, size, new_prot) == 0; +} + +static bool mprotect_nosplit(max_vma_count_data_t *self) +{ + return __mprotect(self->test_area, self->test_area_size); +} + +static bool mprotect_2way_split(max_vma_count_data_t *self) +{ + return __mprotect(self->test_area, self->test_area_size - psize()); +} + +static bool mprotect_3way_split(max_vma_count_data_t *self) +{ + return __mprotect(self->test_area + psize(), psize()); +} + +static inline bool __munmap(char *addr, int size) +{ + return munmap(addr, size) == 0; +} + +static bool munmap_nosplit(max_vma_count_data_t *self) +{ + return __munmap(self->test_area, self->test_area_size); +} + +static bool munmap_2way_split(max_vma_count_data_t *self) +{ + return __munmap(self->test_area, self->test_area_size - psize()); +} + +static bool munmap_3way_split(max_vma_count_data_t *self) +{ + return __munmap(self->test_area + psize(), psize()); +} + +static bool mremap_dontunmap(max_vma_count_data_t *self) +{ + /* + * Using MREMAP_DONTUNMAP will create a new mapping without + * removing the old one, consuming one VMA slot. 
+ */ + return mremap(self->test_area, self->test_area_size, + self->test_area_size, MREMAP_MAYMOVE | MREMAP_DONTUNMAP, + NULL) != MAP_FAILED; +} + +TEST_F(max_vma_count, mmap_at_1_below_vma_count_limit) +{ + int vma_slots_needed = 1; + + ASSERT_NE(mmap(self->test_area, self->test_area_size, TEST_AREA_PROT, + MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0), + MAP_FAILED); + + ASSERT_TRUE(free_vma_slots(self, vma_slots_needed)); + + ASSERT_EQ(get_current_vma_count(), + self->max_vma_count - vma_slots_needed); + ASSERT_TRUE(is_test_area_mapped(self->test_area, self->test_area_size)); + + ASSERT_TRUE(mmap_anon(self)); +} + +TEST_F(max_vma_count, mmap_at_vma_count_limit) +{ + int vma_slots_needed = 0; + + ASSERT_NE(mmap(self->test_area, self->test_area_size, TEST_AREA_PROT, + MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0), + MAP_FAILED); + + ASSERT_TRUE(free_vma_slots(self, vma_slots_needed)); + + ASSERT_EQ(get_current_vma_count(), + self->max_vma_count - vma_slots_needed); + ASSERT_TRUE(is_test_area_mapped(self->test_area, self->test_area_size)); + + /* + * Validate the historical lenient behavior of mmap() at the VMA limit. + * + * Unlike stricter syscalls (e.g., mprotect(), mremap()) that fail + * preemptively at the limit, mmap() is allowed to proceed. This is + * because the new mapping may merge with an adjacent VMA, in which + * case a new VMA slot is not consumed. + * + * This test confirms that an mmap() call at exactly the + * sysctl_max_map_count limit succeeds, preserving this behavior. + */ + ASSERT_TRUE(mmap_anon(self)); +} + +TEST_F(max_vma_count, mmap_at_1_above_vma_count_limit) +{ + /* + * Verify the upper bound of the lenient mmap() behavior. + * + * The previous test confirms mmap() can succeed at the VMA limit, + * potentially bringing the count to limit + 1. This test ensures + * that this behavior does not permit unrestricted growth. + * + * We first perform one successful mmap() to exceed the limit, then + * assert that the subsequent mmap() call fails as expected. + */ + int vma_slots_needed = 0; + + ASSERT_NE(mmap(self->test_area, self->test_area_size, TEST_AREA_PROT, + MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0), + MAP_FAILED); + + ASSERT_TRUE(free_vma_slots(self, vma_slots_needed)); + + ASSERT_EQ(get_current_vma_count(), + self->max_vma_count - vma_slots_needed); + ASSERT_TRUE(is_test_area_mapped(self->test_area, self->test_area_size)); + + ASSERT_TRUE(mmap_anon(self)); + + /* + * We are now 1 above the vma_count_limit. + * Test that unrestricted growth of VMAs is prevented. 
+ */ + ASSERT_FALSE(mmap_anon(self)); +} + +TEST_F(max_vma_count, mprotect_nosplit_at_1_below_vma_count_limit) +{ + int vma_slots_needed = 1; + + ASSERT_NE(mmap(self->test_area, self->test_area_size, TEST_AREA_PROT, + MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0), + MAP_FAILED); + + ASSERT_TRUE(free_vma_slots(self, vma_slots_needed)); + + ASSERT_EQ(get_current_vma_count(), + self->max_vma_count - vma_slots_needed); + ASSERT_TRUE(is_test_area_mapped(self->test_area, self->test_area_size)); + + ASSERT_TRUE(mprotect_nosplit(self)); +} + +TEST_F(max_vma_count, mprotect_nosplit_at_vma_count_limit) +{ + int vma_slots_needed = 0; + + ASSERT_NE(mmap(self->test_area, self->test_area_size, TEST_AREA_PROT, + MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0), + MAP_FAILED); + + ASSERT_TRUE(free_vma_slots(self, vma_slots_needed)); + + ASSERT_EQ(get_current_vma_count(), + self->max_vma_count - vma_slots_needed); + ASSERT_TRUE(is_test_area_mapped(self->test_area, self->test_area_size)); + + ASSERT_TRUE(mprotect_nosplit(self)); +} + +TEST_F(max_vma_count, mprotect_2way_split_at_1_below_vma_count_limit) +{ + int vma_slots_needed = 1; + + ASSERT_NE(mmap(self->test_area, self->test_area_size, TEST_AREA_PROT, + MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0), + MAP_FAILED); + + ASSERT_TRUE(free_vma_slots(self, vma_slots_needed)); + + ASSERT_EQ(get_current_vma_count(), + self->max_vma_count - vma_slots_needed); + ASSERT_TRUE(is_test_area_mapped(self->test_area, self->test_area_size)); + + ASSERT_TRUE(mprotect_2way_split(self)); +} + +TEST_F(max_vma_count, mprotect_2way_split_at_vma_count_limit) +{ + int vma_slots_needed = 0; + + ASSERT_NE(mmap(self->test_area, self->test_area_size, TEST_AREA_PROT, + MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0), + MAP_FAILED); + + ASSERT_TRUE(free_vma_slots(self, vma_slots_needed)); + + ASSERT_EQ(get_current_vma_count(), + self->max_vma_count - vma_slots_needed); + ASSERT_TRUE(is_test_area_mapped(self->test_area, self->test_area_size)); + + ASSERT_FALSE(mprotect_2way_split(self)); +} + +TEST_F(max_vma_count, mprotect_3way_split_at_2_below_vma_count_limit) +{ + int vma_slots_needed = 2; + + ASSERT_NE(mmap(self->test_area, self->test_area_size, TEST_AREA_PROT, + MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0), + MAP_FAILED); + + ASSERT_TRUE(free_vma_slots(self, vma_slots_needed)); + + ASSERT_EQ(get_current_vma_count(), + self->max_vma_count - vma_slots_needed); + ASSERT_TRUE(is_test_area_mapped(self->test_area, self->test_area_size)); + + ASSERT_TRUE(mprotect_3way_split(self)); +} + +TEST_F(max_vma_count, mprotect_3way_split_at_1_below_vma_count_limit) +{ + int vma_slots_needed = 1; + + ASSERT_NE(mmap(self->test_area, self->test_area_size, TEST_AREA_PROT, + MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0), + MAP_FAILED); + + ASSERT_TRUE(free_vma_slots(self, vma_slots_needed)); + + ASSERT_EQ(get_current_vma_count(), + self->max_vma_count - vma_slots_needed); + ASSERT_TRUE(is_test_area_mapped(self->test_area, self->test_area_size)); + + ASSERT_FALSE(mprotect_3way_split(self)); +} + +TEST_F(max_vma_count, mprotect_3way_split_at_vma_count_limit) +{ + int vma_slots_needed = 0; + + ASSERT_NE(mmap(self->test_area, self->test_area_size, TEST_AREA_PROT, + MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0), + MAP_FAILED); + + ASSERT_TRUE(free_vma_slots(self, vma_slots_needed)); + + ASSERT_EQ(get_current_vma_count(), + self->max_vma_count - vma_slots_needed); + ASSERT_TRUE(is_test_area_mapped(self->test_area, self->test_area_size)); + + ASSERT_FALSE(mprotect_3way_split(self)); +} + 
+TEST_F(max_vma_count, munmap_nosplit_at_1_below_vma_count_limit) +{ + int vma_slots_needed = 1; + + ASSERT_NE(mmap(self->test_area, self->test_area_size, TEST_AREA_PROT, + MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0), + MAP_FAILED); + + ASSERT_TRUE(free_vma_slots(self, vma_slots_needed)); + + ASSERT_EQ(get_current_vma_count(), + self->max_vma_count - vma_slots_needed); + ASSERT_TRUE(is_test_area_mapped(self->test_area, self->test_area_size)); + + ASSERT_TRUE(munmap_nosplit(self)); +} + +TEST_F(max_vma_count, munmap_nosplit_at_vma_count_limit) +{ + int vma_slots_needed = 0; + + ASSERT_NE(mmap(self->test_area, self->test_area_size, TEST_AREA_PROT, + MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0), + MAP_FAILED); + + ASSERT_TRUE(free_vma_slots(self, vma_slots_needed)); + + ASSERT_EQ(get_current_vma_count(), + self->max_vma_count - vma_slots_needed); + ASSERT_TRUE(is_test_area_mapped(self->test_area, self->test_area_size)); + + ASSERT_TRUE(munmap_nosplit(self)); +} + +TEST_F(max_vma_count, munmap_2way_split_at_1_below_vma_count_limit) +{ + int vma_slots_needed = 1; + + ASSERT_NE(mmap(self->test_area, self->test_area_size, TEST_AREA_PROT, + MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0), + MAP_FAILED); + + ASSERT_TRUE(free_vma_slots(self, vma_slots_needed)); + + ASSERT_EQ(get_current_vma_count(), + self->max_vma_count - vma_slots_needed); + ASSERT_TRUE(is_test_area_mapped(self->test_area, self->test_area_size)); + + ASSERT_TRUE(munmap_2way_split(self)); +} + +TEST_F(max_vma_count, munmap_2way_split_at_vma_count_limit) +{ + int vma_slots_needed = 0; + + ASSERT_NE(mmap(self->test_area, self->test_area_size, TEST_AREA_PROT, + MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0), + MAP_FAILED); + + ASSERT_TRUE(free_vma_slots(self, vma_slots_needed)); + + ASSERT_EQ(get_current_vma_count(), + self->max_vma_count - vma_slots_needed); + ASSERT_TRUE(is_test_area_mapped(self->test_area, self->test_area_size)); + + ASSERT_TRUE(munmap_2way_split(self)); +} + +TEST_F(max_vma_count, munmap_3way_split_at_2_below_vma_count_limit) +{ + int vma_slots_needed = 2; + + ASSERT_NE(mmap(self->test_area, self->test_area_size, TEST_AREA_PROT, + MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0), + MAP_FAILED); + + ASSERT_TRUE(free_vma_slots(self, vma_slots_needed)); + + ASSERT_EQ(get_current_vma_count(), + self->max_vma_count - vma_slots_needed); + ASSERT_TRUE(is_test_area_mapped(self->test_area, self->test_area_size)); + + ASSERT_TRUE(munmap_3way_split(self)); +} + +TEST_F(max_vma_count, munmap_3way_split_at_1_below_vma_count_limit) +{ + int vma_slots_needed = 1; + + ASSERT_NE(mmap(self->test_area, self->test_area_size, TEST_AREA_PROT, + MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0), + MAP_FAILED); + + ASSERT_TRUE(free_vma_slots(self, vma_slots_needed)); + + ASSERT_EQ(get_current_vma_count(), + self->max_vma_count - vma_slots_needed); + ASSERT_TRUE(is_test_area_mapped(self->test_area, self->test_area_size)); + + ASSERT_TRUE(munmap_3way_split(self)); +} + +TEST_F(max_vma_count, munmap_3way_split_at_vma_count_limit) +{ + int vma_slots_needed = 0; + + ASSERT_NE(mmap(self->test_area, self->test_area_size, TEST_AREA_PROT, + MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0), + MAP_FAILED); + + ASSERT_TRUE(free_vma_slots(self, vma_slots_needed)); + + ASSERT_EQ(get_current_vma_count(), + self->max_vma_count - vma_slots_needed); + ASSERT_TRUE(is_test_area_mapped(self->test_area, self->test_area_size)); + + ASSERT_FALSE(munmap_3way_split(self)); +} + +TEST_F(max_vma_count, mremap_dontunmap_at_required_vma_count_capcity) +{ + int 
vma_slots_needed = MREMAP_REQUIRED_VMA_SLOTS; + + ASSERT_NE(mmap(self->test_area, self->test_area_size, TEST_AREA_PROT, + MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0), + MAP_FAILED); + + ASSERT_TRUE(free_vma_slots(self, vma_slots_needed)); + + ASSERT_EQ(get_current_vma_count(), + self->max_vma_count - vma_slots_needed); + ASSERT_TRUE(is_test_area_mapped(self->test_area, self->test_area_size)); + + ASSERT_TRUE(mremap_dontunmap(self)); +} + +TEST_F(max_vma_count, mremap_dontunmap_at_1_below_required_vma_count_capacity) +{ + int vma_slots_needed = MREMAP_REQUIRED_VMA_SLOTS - 1; + + ASSERT_NE(mmap(self->test_area, self->test_area_size, TEST_AREA_PROT, + MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0), + MAP_FAILED); + + ASSERT_TRUE(free_vma_slots(self, vma_slots_needed)); + + ASSERT_EQ(get_current_vma_count(), + self->max_vma_count - vma_slots_needed); + ASSERT_TRUE(is_test_area_mapped(self->test_area, self->test_area_size)); + + ASSERT_FALSE(mremap_dontunmap(self)); +} + +TEST_HARNESS_MAIN + +/* --- Utilities --- */ + +static bool lower_max_map_count_if_needed(max_vma_count_data_t *self, + struct __test_metadata *_metadata) +{ + self->max_vma_count = get_max_vma_count(); + + ASSERT_GT(self->max_vma_count, 0); + + self->original_max_vma_count = 0; + if (self->max_vma_count > DEFAULT_MAX_MAP_COUNT) { + self->original_max_vma_count = self->max_vma_count; + TH_LOG("Max VMA count: %d; lowering to default %d for test...", + self->max_vma_count, DEFAULT_MAX_MAP_COUNT); + + if (!set_max_vma_count(DEFAULT_MAX_MAP_COUNT)) + return false; + self->max_vma_count = DEFAULT_MAX_MAP_COUNT; + } + return true; +} + +static void restore_max_map_count_if_needed(max_vma_count_data_t *self, + struct __test_metadata *_metadata) +{ + if (!self->original_max_vma_count) + return; + + if (self->max_vma_count == self->original_max_vma_count) + return; + + if (!set_max_vma_count(self->original_max_vma_count)) + TH_LOG("Failed to restore max_map_count to %d", + self->original_max_vma_count); +} + +static int get_max_vma_count(void) +{ + unsigned long val; + int ret; + + ret = read_sysfs("/proc/sys/vm/max_map_count", &val); + if (ret) + return -1; + return val; +} + +static bool set_max_vma_count(int val) +{ + return write_sysfs("/proc/sys/vm/max_map_count", val) == 0; +} + +static int get_current_vma_count(void) +{ + struct procmap_fd pmap; + int count = 0; + int ret; + char vma_name[PATH_MAX]; + + ret = open_self_procmap(&pmap); + if (ret) + return -1; + + pmap.query.query_addr = 0; + pmap.query.query_flags = PROCMAP_QUERY_COVERING_OR_NEXT_VMA; + + while (true) { + pmap.query.vma_name_addr = (uint64_t)(uintptr_t)vma_name; + pmap.query.vma_name_size = sizeof(vma_name); + vma_name[0] = '\0'; + + ret = query_procmap(&pmap); + if (ret != 0) + break; + + /* + * The [vsyscall] mapping is a special mapping that + * doesn't count against the max_map_count limit. + * Ignore it here to match the kernel's accounting. + */ + if (strcmp(vma_name, "[vsyscall]") != 0) + count++; + + pmap.query.query_addr = pmap.query.vma_end; + } + + close_procmap(&pmap); + return count; +} + +static void create_reservation(max_vma_count_data_t *self, + struct __test_metadata *_metadata) +{ + size_t reservation_size; + void *base_addr = NULL; + + /* + * To break the dependency on knowing the exact number of extra maps + * before creating the reservation, we allocate a reservation size + * large enough for the maximum possible number of extra maps. + * The maximum number of extra maps is bounded by max_vma_count. 
+ */ + reservation_size = ((self->max_vma_count * 2) + + TEST_AREA_NR_PAGES + + 2 /* Holes around TEST_AREA */) * psize(); + + base_addr = mmap(NULL, reservation_size, PROT_NONE, + MAP_PRIVATE|MAP_ANONYMOUS, -1, 0); + ASSERT_NE(base_addr, MAP_FAILED); + + ASSERT_EQ(munmap(base_addr, reservation_size), 0); + + /* The test area is offset by one hole page from the base address. */ + self->test_area = (char *)base_addr + psize(); + + /* The extra maps start after the test area and another hole page. */ + self->extra_maps = self->test_area + self->test_area_size + psize(); +} + +static void create_extra_maps(max_vma_count_data_t *self, + struct __test_metadata *_metadata) +{ + char *ptr = self->extra_maps; + + for (int i = 0; i < self->nr_extra_maps; ++i) { + ASSERT_NE(mmap(ptr, psize(), EXTRA_MAP_PROT, + MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED_NOREPLACE, + -1, 0), MAP_FAILED) { + TH_LOG("Failed on mapping #%d of %d", i + 1, + self->nr_extra_maps); + } + + /* + * Advance pointer by two pages to leave a 1-page hole, + * after each 1-page map. + */ + ptr += (2 * psize()); + } +} + +static bool free_vma_slots(max_vma_count_data_t *self, int slots_to_free) +{ + for (int i = 0; i < slots_to_free; i++) { + if (munmap(self->extra_maps + (i * 2 * psize()), psize()) != 0) + return false; + } + + return true; +} + +static bool is_test_area_mapped(char *test_area, int test_area_size) +{ + struct procmap_fd pmap; + bool found = false; + int ret; + + ret = open_self_procmap(&pmap); + if (ret) + return false; + + pmap.query.query_addr = (uint64_t)(uintptr_t)test_area; + pmap.query.query_flags = 0; /* Find VMA covering address */ + + if (query_procmap(&pmap) == 0 && + pmap.query.vma_start == (unsigned long)test_area && + pmap.query.vma_end == (unsigned long)test_area + test_area_size) + found = true; + + close_procmap(&pmap); + return found; +} + diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh index d9173f2312b7..a85db61e6a92 100755 --- a/tools/testing/selftests/mm/run_vmtests.sh +++ b/tools/testing/selftests/mm/run_vmtests.sh @@ -49,6 +49,8 @@ separated by spaces: test madvise(2) MADV_GUARD_INSTALL and MADV_GUARD_REMOVE options - madv_populate test memadvise(2) MADV_POPULATE_{READ,WRITE} options +- max_vma_count + tests for max vma_count - memfd_secret test memfd_secret(2) - process_mrelease @@ -426,6 +428,9 @@ fi # VADDR64 # vmalloc stability smoke test CATEGORY="vmalloc" run_test bash ./test_vmalloc.sh smoke
+# test operations against max vma count limit +CATEGORY="max_vma_count" run_test ./max_vma_count_tests + CATEGORY="mremap" run_test ./mremap_dontunmap
CATEGORY="hmm" run_test bash ./test_hmm.sh smoke
Introduce a new helper function, max_vma_count(), to act as the canonical accessor for the maximum VMA count limit.
The global variable sysctl_max_map_count is used in multiple files to check the VMA limit. This direct usage exposes an implementation detail and makes the code harder to read and maintain.
This patch abstracts the global variable behind the more aptly named max_vma_count() function. As a result, the sysctl_max_map_count variable can now be made static to mm/mmap.c, improving encapsulation.
All call sites are converted to use the new helper, making the limit checks more readable.
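To illustrate the shape the converted call sites take, here is a small userspace model of the "remaining slots" pattern (the limit variable is a stand-in for the real sysctl plumbing; the slot counts in the comment come from the call sites converted below):

/* Illustrative model of the "remaining slots" check pattern; not kernel code. */
#include <stdio.h>

static int fake_max_map_count = 65530;	/* stand-in for the real sysctl */

static int max_vma_count(void)
{
	return fake_max_map_count;
}

/* A caller that needs `needed` new VMA slots fails early like this. */
static int check_vma_slots(int vma_count, int needed)
{
	if (max_vma_count() - vma_count < needed)
		return -1;	/* -ENOMEM in the kernel */
	return 0;
}

int main(void)
{
	int at_limit = max_vma_count();

	/* split_vma() needs 1 slot, prep_move_vma() checks 4, mremap params check 6. */
	printf("split at limit: %d\n", check_vma_slots(at_limit, 1));	/* fails */
	printf("mmap at limit:  %d\n", check_vma_slots(at_limit, 0));	/* lenient: succeeds */
	return 0;
}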
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
---
Changes in v4:
- Introduce max_vma_count() to abstract the max map count sysctl, replacing the
  previously proposed vma_count_remaining() helper -- since this remaining
  count can now be negative as some cases are allowed to exceed the limit.
- Convert all callers to use the new helper.
Changes in v3:
- Move vma_count_remaining() out of #if CONFIG_SYSCTL to fix build failure
- Use READ_ONCE() for sysctl_max_map_count, per David, Lorenzo
- Remove use of ternary op in vma_count_remaining, per Lorenzo
- Rebase on mm-new to fix conflicts in vma_internal.h and mm/internal.h
 include/linux/mm.h               |  2 --
 mm/internal.h                    |  3 +++
 mm/mmap.c                        |  9 ++++++++-
 mm/mremap.c                      |  7 ++++---
 mm/nommu.c                       |  2 +-
 mm/util.c                        |  1 -
 mm/vma.c                         | 10 +++++-----
 tools/testing/vma/vma_internal.h |  6 ++++++
 8 files changed, 27 insertions(+), 13 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h index aada935c4950..5db9d95043f6 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -205,8 +205,6 @@ static inline void __mm_zero_struct_page(struct page *page) #define MAPCOUNT_ELF_CORE_MARGIN (5) #define DEFAULT_MAX_MAP_COUNT (USHRT_MAX - MAPCOUNT_ELF_CORE_MARGIN)
-extern int sysctl_max_map_count; - extern unsigned long sysctl_user_reserve_kbytes; extern unsigned long sysctl_admin_reserve_kbytes;
diff --git a/mm/internal.h b/mm/internal.h index 116a1ba85e66..eba30ff7c8dc 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -1702,4 +1702,7 @@ static inline int io_remap_pfn_range_complete(struct vm_area_struct *vma, return remap_pfn_range_complete(vma, addr, pfn, size, prot); }
+/* mmap.c */ +int max_vma_count(void); + #endif /* __MM_INTERNAL_H */ diff --git a/mm/mmap.c b/mm/mmap.c index 78843a2fae42..5a967a307099 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -383,7 +383,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr, * sysctl_max_map_count limit by one. This behavior is preserved to * avoid breaking existing applications. */ - if (mm->map_count > sysctl_max_map_count) + if (max_vma_count() - mm->map_count < 0) return -ENOMEM;
/* @@ -1504,6 +1504,13 @@ struct vm_area_struct *_install_special_mapping( &special_mapping_vmops); }
+static int sysctl_max_map_count __read_mostly = DEFAULT_MAX_MAP_COUNT; + +int max_vma_count(void) +{ + return READ_ONCE(sysctl_max_map_count); +} + #ifdef CONFIG_SYSCTL #if defined(HAVE_ARCH_PICK_MMAP_LAYOUT) || \ defined(CONFIG_ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT) diff --git a/mm/mremap.c b/mm/mremap.c index a7f531c17b79..02c38fd957e4 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -1040,7 +1040,7 @@ static unsigned long prep_move_vma(struct vma_remap_struct *vrm) * We'd prefer to avoid failure later on in do_munmap: * which may split one vma into three before unmapping. */ - if (current->mm->map_count >= sysctl_max_map_count - 3) + if (max_vma_count() - current->mm->map_count < 4) return -ENOMEM;
if (vma->vm_ops && vma->vm_ops->may_split) { @@ -1811,9 +1811,10 @@ static unsigned long check_mremap_params(struct vma_remap_struct *vrm) * split in 3 before unmapping it. * That means 2 more maps (1 for each) to the ones we already hold. * Check whether current map count plus 2 still leads us to 4 maps below - * the threshold, otherwise return -ENOMEM here to be more safe. + * the threshold. In other words, is the current map count + 6 at or + * below the threshold? Otherwise return -ENOMEM here to be more safe. */ - if ((current->mm->map_count + 2) >= sysctl_max_map_count - 3) + if (max_vma_count() - current->mm->map_count < 6) return -ENOMEM;
return 0; diff --git a/mm/nommu.c b/mm/nommu.c index c3a23b082adb..ae2b20cc324a 100644 --- a/mm/nommu.c +++ b/mm/nommu.c @@ -1317,7 +1317,7 @@ static int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma, return -ENOMEM;
mm = vma->vm_mm; - if (mm->map_count >= sysctl_max_map_count) + if (max_vma_count() - mm->map_count < 1) return -ENOMEM;
region = kmem_cache_alloc(vm_region_jar, GFP_KERNEL); diff --git a/mm/util.c b/mm/util.c index 97cae40c0209..eb1bcfc1d48d 100644 --- a/mm/util.c +++ b/mm/util.c @@ -752,7 +752,6 @@ EXPORT_SYMBOL(folio_mc_copy); int sysctl_overcommit_memory __read_mostly = OVERCOMMIT_GUESS; static int sysctl_overcommit_ratio __read_mostly = 50; static unsigned long sysctl_overcommit_kbytes __read_mostly; -int sysctl_max_map_count __read_mostly = DEFAULT_MAX_MAP_COUNT; unsigned long sysctl_user_reserve_kbytes __read_mostly = 1UL << 17; /* 128MB */ unsigned long sysctl_admin_reserve_kbytes __read_mostly = 1UL << 13; /* 8MB */
diff --git a/mm/vma.c b/mm/vma.c index d0bb3127280e..768d216beed3 100644 --- a/mm/vma.c +++ b/mm/vma.c @@ -493,8 +493,8 @@ void unmap_region(struct ma_state *mas, struct vm_area_struct *vma, }
/* - * __split_vma() bypasses sysctl_max_map_count checking. We use this where it - * has already been checked or doesn't make sense to fail. + * __split_vma() bypasses max_vma_count() checks. We use this where + * it has already been checked or doesn't make sense to fail. * VMA Iterator will point to the original VMA. */ static __must_check int @@ -594,7 +594,7 @@ __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma, static int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma, unsigned long addr, int new_below) { - if (vma->vm_mm->map_count >= sysctl_max_map_count) + if (max_vma_count() - vma->vm_mm->map_count < 1) return -ENOMEM;
return __split_vma(vmi, vma, addr, new_below); @@ -1347,7 +1347,7 @@ static int vms_gather_munmap_vmas(struct vma_munmap_struct *vms, * its limit temporarily, to help free resources as expected. */ if (vms->end < vms->vma->vm_end && - vms->vma->vm_mm->map_count >= sysctl_max_map_count) { + max_vma_count() - vms->vma->vm_mm->map_count < 1) { error = -ENOMEM; goto map_count_exceeded; } @@ -2819,7 +2819,7 @@ int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma, * typically extends the existing brk VMA rather than creating a new one. * See also the comment in do_mmap(). */ - if (mm->map_count > sysctl_max_map_count) + if (max_vma_count() - mm->map_count < 0) return -ENOMEM;
if (security_vm_enough_memory_mm(mm, len >> PAGE_SHIFT)) diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h index d873667704e8..41d354a699c5 100644 --- a/tools/testing/vma/vma_internal.h +++ b/tools/testing/vma/vma_internal.h @@ -1491,4 +1491,10 @@ static inline int do_munmap(struct mm_struct *, unsigned long, size_t, return 0; }
+/* Helper to get max vma count */ +static int max_vma_count(void) +{ + return sysctl_max_map_count; +} + #endif /* __MM_VMA_INTERNAL_H */
Perform a mechanical rename of mm_struct->map_count to vma_count.
The name "map_count" is ambiguous. It can be confused with other counters like the page mapcount (page->_mapcount), which tracks PTE references.
The new name, vma_count, is more precise and self-documenting, as this field has always counted the number of vm_area_structs associated with an mm_struct. This change improves code clarity and readability.
While at it, update the BUG_ON() in exit_mmap() to a WARN_ON_ONCE() to avoid crashing the kernel on a simple accounting mismatch. No other functional change is intended.
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
---
Changes in v4:
- Update the new max_vma_count_tests.c to use the vma_count name.

Changes in v3:
- Change vma_count BUG_ON() in exit_mmap() to WARN_ON_ONCE, per David and Lorenzo
- Collect Reviewed-by tags
 fs/binfmt_elf.c                               |  2 +-
 fs/coredump.c                                 |  2 +-
 include/linux/mm_types.h                      |  2 +-
 kernel/fork.c                                 |  2 +-
 mm/debug.c                                    |  2 +-
 mm/mmap.c                                     |  6 ++--
 mm/mremap.c                                   |  4 +--
 mm/nommu.c                                    |  8 ++---
 mm/vma.c                                      | 30 ++++++++---------
 .../selftests/mm/max_vma_count_tests.c        | 28 ++++++++--------
 tools/testing/vma/vma.c                       | 32 +++++++++----------
 tools/testing/vma/vma_internal.h              |  2 +-
 12 files changed, 60 insertions(+), 60 deletions(-)
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index e4653bb99946..a5acfe97612d 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -1660,7 +1660,7 @@ static int fill_files_note(struct memelfnote *note, struct coredump_params *cprm data[0] = count; data[1] = PAGE_SIZE; /* - * Count usually is less than mm->map_count, + * Count usually is less than mm->vma_count, * we need to move filenames down. */ n = cprm->vma_count - count; diff --git a/fs/coredump.c b/fs/coredump.c index b5fc06a092a4..5e0859813141 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -1733,7 +1733,7 @@ static bool dump_vma_snapshot(struct coredump_params *cprm)
cprm->vma_data_size = 0; gate_vma = get_gate_vma(mm); - cprm->vma_count = mm->map_count + (gate_vma ? 1 : 0); + cprm->vma_count = mm->vma_count + (gate_vma ? 1 : 0);
cprm->vma_meta = kvmalloc_array(cprm->vma_count, sizeof(*cprm->vma_meta), GFP_KERNEL); if (!cprm->vma_meta) { diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 5021047485a9..2a102f4899ed 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -1085,7 +1085,7 @@ struct mm_struct { #ifdef CONFIG_MMU atomic_long_t pgtables_bytes; /* size of all page tables */ #endif - int map_count; /* number of VMAs */ + int vma_count; /* number of VMAs */
spinlock_t page_table_lock; /* Protects page tables and some * counters diff --git a/kernel/fork.c b/kernel/fork.c index dd0bb5fe4305..b2c2ca8a0a9d 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1038,7 +1038,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p, mmap_init_lock(mm); INIT_LIST_HEAD(&mm->mmlist); mm_pgtables_bytes_init(mm); - mm->map_count = 0; + mm->vma_count = 0; mm->locked_vm = 0; atomic64_set(&mm->pinned_vm, 0); memset(&mm->rss_stat, 0, sizeof(mm->rss_stat)); diff --git a/mm/debug.c b/mm/debug.c index 64ddb0c4b4be..a35e2912ae53 100644 --- a/mm/debug.c +++ b/mm/debug.c @@ -204,7 +204,7 @@ void dump_mm(const struct mm_struct *mm) mm->pgd, atomic_read(&mm->mm_users), atomic_read(&mm->mm_count), mm_pgtables_bytes(mm), - mm->map_count, + mm->vma_count, mm->hiwater_rss, mm->hiwater_vm, mm->total_vm, mm->locked_vm, (u64)atomic64_read(&mm->pinned_vm), mm->data_vm, mm->exec_vm, mm->stack_vm, diff --git a/mm/mmap.c b/mm/mmap.c index 5a967a307099..647a676c0ab4 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -383,7 +383,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr, * sysctl_max_map_count limit by one. This behavior is preserved to * avoid breaking existing applications. */ - if (max_vma_count() - mm->map_count < 0) + if (max_vma_count() - mm->vma_count < 0) return -ENOMEM;
/* @@ -1314,7 +1314,7 @@ void exit_mmap(struct mm_struct *mm) vma = vma_next(&vmi); } while (vma && likely(!xa_is_zero(vma)));
- BUG_ON(count != mm->map_count); + WARN_ON_ONCE(count != mm->vma_count);
trace_exit_mmap(mm); destroy: @@ -1822,7 +1822,7 @@ __latent_entropy int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm) */ vma_iter_bulk_store(&vmi, tmp);
- mm->map_count++; + mm->vma_count++;
if (tmp->vm_ops && tmp->vm_ops->open) tmp->vm_ops->open(tmp); diff --git a/mm/mremap.c b/mm/mremap.c index 02c38fd957e4..4874729cd65c 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -1040,7 +1040,7 @@ static unsigned long prep_move_vma(struct vma_remap_struct *vrm) * We'd prefer to avoid failure later on in do_munmap: * which may split one vma into three before unmapping. */ - if (max_vma_count() - current->mm->map_count < 4) + if (max_vma_count() - current->mm->vma_count < 4) return -ENOMEM;
if (vma->vm_ops && vma->vm_ops->may_split) { @@ -1814,7 +1814,7 @@ static unsigned long check_mremap_params(struct vma_remap_struct *vrm) * the threshold. In other words, is the current map count + 6 at or * below the threshold? Otherwise return -ENOMEM here to be more safe. */ - if (max_vma_count() - current->mm->map_count < 6) + if (max_vma_count() - current->mm->vma_count < 6) return -ENOMEM;
return 0; diff --git a/mm/nommu.c b/mm/nommu.c index ae2b20cc324a..ef05d5abbe9f 100644 --- a/mm/nommu.c +++ b/mm/nommu.c @@ -577,7 +577,7 @@ static void setup_vma_to_mm(struct vm_area_struct *vma, struct mm_struct *mm)
static void cleanup_vma_from_mm(struct vm_area_struct *vma) { - vma->vm_mm->map_count--; + vma->vm_mm->vma_count--; /* remove the VMA from the mapping */ if (vma->vm_file) { struct address_space *mapping; @@ -1199,7 +1199,7 @@ unsigned long do_mmap(struct file *file, goto error_just_free;
setup_vma_to_mm(vma, current->mm); - current->mm->map_count++; + current->mm->vma_count++; /* add the VMA to the tree */ vma_iter_store_new(&vmi, vma);
@@ -1317,7 +1317,7 @@ static int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma, return -ENOMEM;
mm = vma->vm_mm; - if (max_vma_count() - mm->map_count < 1) + if (max_vma_count() - mm->vma_count < 1) return -ENOMEM;
region = kmem_cache_alloc(vm_region_jar, GFP_KERNEL); @@ -1367,7 +1367,7 @@ static int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma, setup_vma_to_mm(vma, mm); setup_vma_to_mm(new, mm); vma_iter_store_new(vmi, new); - mm->map_count++; + mm->vma_count++; return 0;
err_vmi_preallocate: diff --git a/mm/vma.c b/mm/vma.c index 768d216beed3..fbb8d1a0449d 100644 --- a/mm/vma.c +++ b/mm/vma.c @@ -354,7 +354,7 @@ static void vma_complete(struct vma_prepare *vp, struct vma_iterator *vmi, * (it may either follow vma or precede it). */ vma_iter_store_new(vmi, vp->insert); - mm->map_count++; + mm->vma_count++; }
if (vp->anon_vma) { @@ -385,7 +385,7 @@ static void vma_complete(struct vma_prepare *vp, struct vma_iterator *vmi, } if (vp->remove->anon_vma) anon_vma_merge(vp->vma, vp->remove); - mm->map_count--; + mm->vma_count--; mpol_put(vma_policy(vp->remove)); if (!vp->remove2) WARN_ON_ONCE(vp->vma->vm_end < vp->remove->vm_end); @@ -594,7 +594,7 @@ __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma, static int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma, unsigned long addr, int new_below) { - if (max_vma_count() - vma->vm_mm->map_count < 1) + if (max_vma_count() - vma->vm_mm->vma_count < 1) return -ENOMEM;
return __split_vma(vmi, vma, addr, new_below); @@ -685,13 +685,13 @@ void validate_mm(struct mm_struct *mm) } #endif /* Check for a infinite loop */ - if (++i > mm->map_count + 10) { + if (++i > mm->vma_count + 10) { i = -1; break; } } - if (i != mm->map_count) { - pr_emerg("map_count %d vma iterator %d\n", mm->map_count, i); + if (i != mm->vma_count) { + pr_emerg("vma_count %d vma iterator %d\n", mm->vma_count, i); bug = 1; } VM_BUG_ON_MM(bug, mm); @@ -1268,7 +1268,7 @@ static void vms_complete_munmap_vmas(struct vma_munmap_struct *vms, struct mm_struct *mm;
mm = current->mm; - mm->map_count -= vms->vma_count; + mm->vma_count -= vms->vma_count; mm->locked_vm -= vms->locked_vm; if (vms->unlock) mmap_write_downgrade(mm); @@ -1342,14 +1342,14 @@ static int vms_gather_munmap_vmas(struct vma_munmap_struct *vms, if (vms->start > vms->vma->vm_start) {
/* - * Make sure that map_count on return from munmap() will + * Make sure that vma_count on return from munmap() will * not exceed its limit; but let map_count go just above * its limit temporarily, to help free resources as expected. */ if (vms->end < vms->vma->vm_end && - max_vma_count() - vms->vma->vm_mm->map_count < 1) { + max_vma_count() - vms->vma->vm_mm->vma_count < 1) { error = -ENOMEM; - goto map_count_exceeded; + goto vma_count_exceeded; }
/* Don't bother splitting the VMA if we can't unmap it anyway */ @@ -1463,7 +1463,7 @@ static int vms_gather_munmap_vmas(struct vma_munmap_struct *vms, modify_vma_failed: reattach_vmas(mas_detach); start_split_failed: -map_count_exceeded: +vma_count_exceeded: return error; }
@@ -1781,7 +1781,7 @@ static int vma_link(struct mm_struct *mm, struct vm_area_struct *vma) vma_start_write(vma); vma_iter_store_new(&vmi, vma); vma_link_file(vma, /* hold_rmap_lock= */false); - mm->map_count++; + mm->vma_count++; validate_mm(mm); return 0; } @@ -2498,7 +2498,7 @@ static int __mmap_new_vma(struct mmap_state *map, struct vm_area_struct **vmap) /* Lock the VMA since it is modified after insertion into VMA tree */ vma_start_write(vma); vma_iter_store_new(vmi, vma); - map->mm->map_count++; + map->mm->vma_count++; vma_link_file(vma, map->hold_file_rmap_lock);
/* @@ -2819,7 +2819,7 @@ int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma, * typically extends the existing brk VMA rather than creating a new one. * See also the comment in do_mmap(). */ - if (max_vma_count() - mm->map_count < 0) + if (max_vma_count() - mm->vma_count < 0) return -ENOMEM;
if (security_vm_enough_memory_mm(mm, len >> PAGE_SHIFT)) @@ -2857,7 +2857,7 @@ int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma, if (vma_iter_store_gfp(vmi, vma, GFP_KERNEL)) goto mas_store_fail;
- mm->map_count++; + mm->vma_count++; validate_mm(mm); out: perf_event_mmap(vma); diff --git a/tools/testing/selftests/mm/max_vma_count_tests.c b/tools/testing/selftests/mm/max_vma_count_tests.c index e20cafaefc82..79404cb22df2 100644 --- a/tools/testing/selftests/mm/max_vma_count_tests.c +++ b/tools/testing/selftests/mm/max_vma_count_tests.c @@ -18,7 +18,7 @@ #include "../kselftest_harness.h" #include "vm_util.h"
-#define DEFAULT_MAX_MAP_COUNT 65530 +#define DEFAULT_MAX_VMA_COUNT 65530 #define TEST_AREA_NR_PAGES 3 #define TEST_AREA_PROT (PROT_NONE) #define EXTRA_MAP_PROT (PROT_NONE) @@ -42,9 +42,9 @@ static int get_max_vma_count(void); static bool set_max_vma_count(int val); static int get_current_vma_count(void); static bool is_test_area_mapped(char *test_area, int test_area_size); -static bool lower_max_map_count_if_needed(max_vma_count_data_t *self, +static bool lower_max_vma_count_if_needed(max_vma_count_data_t *self, struct __test_metadata *_metadata); -static void restore_max_map_count_if_needed(max_vma_count_data_t *self, +static void restore_max_vma_count_if_needed(max_vma_count_data_t *self, struct __test_metadata *_metadata); static bool free_vma_slots(max_vma_count_data_t *self, int slots_to_free); static void create_reservation(max_vma_count_data_t *self, @@ -98,9 +98,9 @@ FIXTURE_SETUP(max_vma_count)
self->test_area_size = TEST_AREA_NR_PAGES * psize();
- if (!lower_max_map_count_if_needed(self, _metadata)) { + if (!lower_max_vma_count_if_needed(self, _metadata)) { SKIP(return, - "max_map_count too high and cannot be lowered. Please rerun as root."); + "max_vma_count too high and cannot be lowered. Please rerun as root."); }
initial_vma_count = get_current_vma_count(); @@ -127,7 +127,7 @@ FIXTURE_TEARDOWN(max_vma_count) * mapping cleanup to process teardown for simplicity. */
- restore_max_map_count_if_needed(self, _metadata); + restore_max_vma_count_if_needed(self, _metadata); }
static bool mmap_anon(max_vma_count_data_t *self) @@ -542,7 +542,7 @@ TEST_HARNESS_MAIN
/* --- Utilities --- */
-static bool lower_max_map_count_if_needed(max_vma_count_data_t *self, +static bool lower_max_vma_count_if_needed(max_vma_count_data_t *self, struct __test_metadata *_metadata) { self->max_vma_count = get_max_vma_count(); @@ -550,19 +550,19 @@ static bool lower_max_map_count_if_needed(max_vma_count_data_t *self, ASSERT_GT(self->max_vma_count, 0);
self->original_max_vma_count = 0; - if (self->max_vma_count > DEFAULT_MAX_MAP_COUNT) { + if (self->max_vma_count > DEFAULT_MAX_VMA_COUNT) { self->original_max_vma_count = self->max_vma_count; TH_LOG("Max VMA count: %d; lowering to default %d for test...", - self->max_vma_count, DEFAULT_MAX_MAP_COUNT); + self->max_vma_count, DEFAULT_MAX_VMA_COUNT);
- if (!set_max_vma_count(DEFAULT_MAX_MAP_COUNT)) + if (!set_max_vma_count(DEFAULT_MAX_VMA_COUNT)) return false; - self->max_vma_count = DEFAULT_MAX_MAP_COUNT; + self->max_vma_count = DEFAULT_MAX_VMA_COUNT; } return true; }
-static void restore_max_map_count_if_needed(max_vma_count_data_t *self, +static void restore_max_vma_count_if_needed(max_vma_count_data_t *self, struct __test_metadata *_metadata) { if (!self->original_max_vma_count) @@ -572,7 +572,7 @@ static void restore_max_map_count_if_needed(max_vma_count_data_t *self, return;
if (!set_max_vma_count(self->original_max_vma_count)) - TH_LOG("Failed to restore max_map_count to %d", + TH_LOG("Failed to restore max_vma_count to %d", self->original_max_vma_count); }
@@ -617,7 +617,7 @@ static int get_current_vma_count(void)
/* * The [vsyscall] mapping is a special mapping that - * doesn't count against the max_map_count limit. + * doesn't count against the max_vma_count limit. * Ignore it here to match the kernel's accounting. */ if (strcmp(vma_name, "[vsyscall]") != 0) diff --git a/tools/testing/vma/vma.c b/tools/testing/vma/vma.c index 656e1c75b711..69fa7d14a6c2 100644 --- a/tools/testing/vma/vma.c +++ b/tools/testing/vma/vma.c @@ -261,7 +261,7 @@ static int cleanup_mm(struct mm_struct *mm, struct vma_iterator *vmi) }
mtree_destroy(&mm->mm_mt); - mm->map_count = 0; + mm->vma_count = 0; return count; }
@@ -500,7 +500,7 @@ static bool test_merge_new(void) INIT_LIST_HEAD(&vma_d->anon_vma_chain); list_add(&dummy_anon_vma_chain_d.same_vma, &vma_d->anon_vma_chain); ASSERT_FALSE(merged); - ASSERT_EQ(mm.map_count, 4); + ASSERT_EQ(mm.vma_count, 4);
/* * Merge BOTH sides. @@ -519,7 +519,7 @@ static bool test_merge_new(void) ASSERT_EQ(vma->vm_pgoff, 0); ASSERT_EQ(vma->anon_vma, &dummy_anon_vma); ASSERT_TRUE(vma_write_started(vma)); - ASSERT_EQ(mm.map_count, 3); + ASSERT_EQ(mm.vma_count, 3);
/* * Merge to PREVIOUS VMA. @@ -536,7 +536,7 @@ static bool test_merge_new(void) ASSERT_EQ(vma->vm_pgoff, 0); ASSERT_EQ(vma->anon_vma, &dummy_anon_vma); ASSERT_TRUE(vma_write_started(vma)); - ASSERT_EQ(mm.map_count, 3); + ASSERT_EQ(mm.vma_count, 3);
/* * Merge to NEXT VMA. @@ -555,7 +555,7 @@ static bool test_merge_new(void) ASSERT_EQ(vma->vm_pgoff, 6); ASSERT_EQ(vma->anon_vma, &dummy_anon_vma); ASSERT_TRUE(vma_write_started(vma)); - ASSERT_EQ(mm.map_count, 3); + ASSERT_EQ(mm.vma_count, 3);
/* * Merge BOTH sides. @@ -573,7 +573,7 @@ static bool test_merge_new(void) ASSERT_EQ(vma->vm_pgoff, 0); ASSERT_EQ(vma->anon_vma, &dummy_anon_vma); ASSERT_TRUE(vma_write_started(vma)); - ASSERT_EQ(mm.map_count, 2); + ASSERT_EQ(mm.vma_count, 2);
/* * Merge to NEXT VMA. @@ -591,7 +591,7 @@ static bool test_merge_new(void) ASSERT_EQ(vma->vm_pgoff, 0xa); ASSERT_EQ(vma->anon_vma, &dummy_anon_vma); ASSERT_TRUE(vma_write_started(vma)); - ASSERT_EQ(mm.map_count, 2); + ASSERT_EQ(mm.vma_count, 2);
/* * Merge BOTH sides. @@ -608,7 +608,7 @@ static bool test_merge_new(void) ASSERT_EQ(vma->vm_pgoff, 0); ASSERT_EQ(vma->anon_vma, &dummy_anon_vma); ASSERT_TRUE(vma_write_started(vma)); - ASSERT_EQ(mm.map_count, 1); + ASSERT_EQ(mm.vma_count, 1);
/* * Final state. @@ -967,7 +967,7 @@ static bool test_vma_merge_new_with_close(void) ASSERT_EQ(vma->vm_pgoff, 0); ASSERT_EQ(vma->vm_ops, &vm_ops); ASSERT_TRUE(vma_write_started(vma)); - ASSERT_EQ(mm.map_count, 2); + ASSERT_EQ(mm.vma_count, 2);
cleanup_mm(&mm, &vmi); return true; @@ -1017,7 +1017,7 @@ static bool test_merge_existing(void) ASSERT_EQ(vma->vm_pgoff, 2); ASSERT_TRUE(vma_write_started(vma)); ASSERT_TRUE(vma_write_started(vma_next)); - ASSERT_EQ(mm.map_count, 2); + ASSERT_EQ(mm.vma_count, 2);
/* Clear down and reset. */ ASSERT_EQ(cleanup_mm(&mm, &vmi), 2); @@ -1045,7 +1045,7 @@ static bool test_merge_existing(void) ASSERT_EQ(vma_next->vm_pgoff, 2); ASSERT_EQ(vma_next->anon_vma, &dummy_anon_vma); ASSERT_TRUE(vma_write_started(vma_next)); - ASSERT_EQ(mm.map_count, 1); + ASSERT_EQ(mm.vma_count, 1);
/* Clear down and reset. We should have deleted vma. */ ASSERT_EQ(cleanup_mm(&mm, &vmi), 1); @@ -1079,7 +1079,7 @@ static bool test_merge_existing(void) ASSERT_EQ(vma->vm_pgoff, 6); ASSERT_TRUE(vma_write_started(vma_prev)); ASSERT_TRUE(vma_write_started(vma)); - ASSERT_EQ(mm.map_count, 2); + ASSERT_EQ(mm.vma_count, 2);
/* Clear down and reset. */ ASSERT_EQ(cleanup_mm(&mm, &vmi), 2); @@ -1108,7 +1108,7 @@ static bool test_merge_existing(void) ASSERT_EQ(vma_prev->vm_pgoff, 0); ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma); ASSERT_TRUE(vma_write_started(vma_prev)); - ASSERT_EQ(mm.map_count, 1); + ASSERT_EQ(mm.vma_count, 1);
/* Clear down and reset. We should have deleted vma. */ ASSERT_EQ(cleanup_mm(&mm, &vmi), 1); @@ -1138,7 +1138,7 @@ static bool test_merge_existing(void) ASSERT_EQ(vma_prev->vm_pgoff, 0); ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma); ASSERT_TRUE(vma_write_started(vma_prev)); - ASSERT_EQ(mm.map_count, 1); + ASSERT_EQ(mm.vma_count, 1);
/* Clear down and reset. We should have deleted prev and next. */ ASSERT_EQ(cleanup_mm(&mm, &vmi), 1); @@ -1540,7 +1540,7 @@ static bool test_merge_extend(void) ASSERT_EQ(vma->vm_end, 0x4000); ASSERT_EQ(vma->vm_pgoff, 0); ASSERT_TRUE(vma_write_started(vma)); - ASSERT_EQ(mm.map_count, 1); + ASSERT_EQ(mm.vma_count, 1);
cleanup_mm(&mm, &vmi); return true; @@ -1652,7 +1652,7 @@ static bool test_mmap_region_basic(void) 0x24d, NULL); ASSERT_EQ(addr, 0x24d000);
- ASSERT_EQ(mm.map_count, 2); + ASSERT_EQ(mm.vma_count, 2);
for_each_vma(vmi, vma) { if (vma->vm_start == 0x300000) { diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h index 41d354a699c5..d89b26e81679 100644 --- a/tools/testing/vma/vma_internal.h +++ b/tools/testing/vma/vma_internal.h @@ -261,7 +261,7 @@ typedef struct {
struct mm_struct { struct maple_tree mm_mt; - int map_count; /* number of VMAs */ + int vma_count; /* number of VMAs */ unsigned long total_vm; /* Total pages mapped */ unsigned long locked_vm; /* Pages that have PG_mlocked set */ unsigned long data_vm; /* VM_WRITE & ~VM_SHARED & ~VM_STACK */
Introduce the trace_mm_insufficient_vma_slots tracepoint to improve observability of VMA allocation failures.
This event fires when an operation is about to fail because it requires more VMA slots than are currently available, according to the sysctl_max_map_count limit. This is a preemptive check that occurs in call paths like mmap(), mremap(), and split_vma() before they attempt to create new VMAs.
This tracepoint can be used with event-driven telemetry, such as BPF programs, to collect data from devices in the field with minimal overhead.
The tracepoint captures the mm_struct pointer and the current vma_count at the time of failure. This allows for observing the distribution of these events to determine if there are legitimate bugs or if an increase to the limit is warranted.
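As one hypothetical way to consume the event without BPF, the sketch below enables it through tracefs and streams trace_pipe. It assumes tracefs is mounted at /sys/kernel/tracing and root privileges; the event path follows the TRACE_SYSTEM and event name introduced in this patch:

/* Hypothetical consumer sketch for the new tracepoint via tracefs. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	const char *enable =
		"/sys/kernel/tracing/events/vma/mm_insufficient_vma_slots/enable";
	char buf[4096];
	ssize_t n;
	int fd;

	fd = open(enable, O_WRONLY);
	if (fd < 0 || write(fd, "1", 1) != 1) {
		perror("enable tracepoint");
		return 1;
	}
	close(fd);

	fd = open("/sys/kernel/tracing/trace_pipe", O_RDONLY);
	if (fd < 0) {
		perror("open trace_pipe");
		return 1;
	}
	/* Each event line ends with "mm=<ptr> vma_count=<n>" per TP_printk(). */
	while ((n = read(fd, buf, sizeof(buf))) > 0)
		fwrite(buf, 1, (size_t)n, stdout);
	close(fd);
	return 0;
}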
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Pedro Falcato <pfalcato@suse.de>
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
---
Changes in v4:
- Update commit description to accurately reflect the trace event's parameters.

Changes in v3:
- Capture the mm pointer as the unique identifier and capture the vma_count as
  well, instead of current task tgid, per Steve
- Add include/trace/events/vma.h to MEMORY MAPPING section in MAINTAINERS, per Lorenzo
- Rename trace_max_vma_count_exceeded() to trace_mm_insufficient_vma_slots(),
  since this is a preemptive check, per Lorenzo
- Fix tools/testing/vma build errors, per Lorenzo
 MAINTAINERS                      |  1 +
 include/trace/events/vma.h       | 32 ++++++++++++++++++++++++++++++++
 mm/mmap.c                        |  5 ++++-
 mm/mremap.c                      | 10 ++++++++--
 mm/vma.c                         |  4 +++-
 mm/vma_internal.h                |  2 ++
 tools/testing/vma/vma_internal.h |  5 +++++
 7 files changed, 55 insertions(+), 4 deletions(-)
 create mode 100644 include/trace/events/vma.h
diff --git a/MAINTAINERS b/MAINTAINERS index 66f7ca5b01ad..223124cb7d21 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -16567,6 +16567,7 @@ S: Maintained W: http://www.linux-mm.org T: git git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm F: include/trace/events/mmap.h +F: include/trace/events/vma.h F: mm/interval_tree.c F: mm/mincore.c F: mm/mlock.c diff --git a/include/trace/events/vma.h b/include/trace/events/vma.h new file mode 100644 index 000000000000..4540fa607f66 --- /dev/null +++ b/include/trace/events/vma.h @@ -0,0 +1,32 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM vma + +#if !defined(_TRACE_VMA_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_VMA_H + +#include <linux/tracepoint.h> + +TRACE_EVENT(mm_insufficient_vma_slots, + + TP_PROTO(struct mm_struct *mm), + + TP_ARGS(mm), + + TP_STRUCT__entry( + __field(void *, mm) + __field(int, vma_count) + ), + + TP_fast_assign( + __entry->mm = mm; + __entry->vma_count = mm->vma_count; + ), + + TP_printk("mm=%p vma_count=%d", __entry->mm, __entry->vma_count) +); + +#endif /* _TRACE_VMA_H */ + +/* This part must be outside protection */ +#include <trace/define_trace.h> diff --git a/mm/mmap.c b/mm/mmap.c index 647a676c0ab4..3ebe9d5f7dfe 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -56,6 +56,7 @@
#define CREATE_TRACE_POINTS #include <trace/events/mmap.h> +#include <trace/events/vma.h>
#include "internal.h"
@@ -383,8 +384,10 @@ unsigned long do_mmap(struct file *file, unsigned long addr, * sysctl_max_map_count limit by one. This behavior is preserved to * avoid breaking existing applications. */ - if (max_vma_count() - mm->vma_count < 0) + if (max_vma_count() - mm->vma_count < 0) { + trace_mm_insufficient_vma_slots(mm); return -ENOMEM; + }
/* * addr is returned from get_unmapped_area, diff --git a/mm/mremap.c b/mm/mremap.c index 4874729cd65c..dfb481c5bfb1 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -30,6 +30,8 @@ #include <asm/cacheflush.h> #include <asm/tlb.h>
+#include <trace/events/vma.h> + #include "internal.h"
/* Classify the kind of remap operation being performed. */ @@ -1040,8 +1042,10 @@ static unsigned long prep_move_vma(struct vma_remap_struct *vrm) * We'd prefer to avoid failure later on in do_munmap: * which may split one vma into three before unmapping. */ - if (max_vma_count() - current->mm->vma_count < 4) + if (max_vma_count() - current->mm->vma_count < 4) { + trace_mm_insufficient_vma_slots(current->mm); return -ENOMEM; + }
if (vma->vm_ops && vma->vm_ops->may_split) { if (vma->vm_start != old_addr) @@ -1814,8 +1818,10 @@ static unsigned long check_mremap_params(struct vma_remap_struct *vrm) * the threshold. In other words, is the current map count + 6 at or * below the threshold? Otherwise return -ENOMEM here to be more safe. */ - if (max_vma_count() - current->mm->vma_count < 6) + if (max_vma_count() - current->mm->vma_count < 6) { + trace_mm_insufficient_vma_slots(current->mm); return -ENOMEM; + }
return 0; } diff --git a/mm/vma.c b/mm/vma.c index fbb8d1a0449d..2c35c3d008bc 100644 --- a/mm/vma.c +++ b/mm/vma.c @@ -594,8 +594,10 @@ __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma, static int split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma, unsigned long addr, int new_below) { - if (max_vma_count() - vma->vm_mm->vma_count < 1) + if (max_vma_count() - vma->vm_mm->vma_count < 1) { + trace_mm_insufficient_vma_slots(vma->vm_mm); return -ENOMEM; + }
return __split_vma(vmi, vma, addr, new_below); } diff --git a/mm/vma_internal.h b/mm/vma_internal.h index 2f05735ff190..86823ca6857b 100644 --- a/mm/vma_internal.h +++ b/mm/vma_internal.h @@ -52,4 +52,6 @@
#include "internal.h"
+#include <trace/events/vma.h> + #endif /* __MM_VMA_INTERNAL_H */ diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h index d89b26e81679..0fdde2eb5a57 100644 --- a/tools/testing/vma/vma_internal.h +++ b/tools/testing/vma/vma_internal.h @@ -1497,4 +1497,9 @@ static int max_vma_count(void) return sysctl_max_map_count; }
+/* Stub for trace_mm_insufficient_vma_slots */ +static inline void trace_mm_insufficient_vma_slots(struct mm_struct *mm) +{ +} + #endif /* __MM_VMA_INTERNAL_H */