On Wed, Oct 29, 2025 at 9:51 AM Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:
It's useful to be able to force a VMA to be copied on fork outside of the parameters specified by vma_needs_copy(), which otherwise only copies page tables if:
- The destination VMA has VM_UFFD_WP set
- The mapping is a PFN or mixed map
- The mapping is anonymous and forked in (i.e. vma->anon_vma is non-NULL)
Setting this flag implies that the page tables mapping the VMA are such that simply re-faulting the VMA will not re-establish them in identical form.
We introduce VM_COPY_ON_FORK to clearly identify which flags require this behaviour, which currently is only VM_MAYBE_GUARD.
Any VMA flags which require this behaviour are inherently 'sticky', that is, should we merge two VMAs together, this implies that the newly merged VMA maps a range that requires page table copying on fork.
In order to implement this we must both introduce the concept of a 'sticky' VMA flag and adjust the VMA merge logic accordingly, and also have VMA merge still successfully succeed should one VMA have the flag set and another not.
"successfully succeed" sounds weird. Just "succeed"?
Note that we update the VMA expand logic to handle new VMA merging, as this function is the one ultimately called by all instances of merging of new VMAs.
This patch implements this, establishing VM_STICKY to contain all such flags and VM_IGNORE_MERGE for those flags which should be ignored when comparing adjacent VMA's flags for the purposes of merging.
As part of this change we place VM_SOFTDIRTY in VM_IGNORE_MERGE as it already had this behaviour, alongside VM_STICKY as sticky flags by implication must not disallow merge.
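The mask arithmetic behind VM_IGNORE_MERGE can be illustrated with a small sketch. It assumes hypothetical bit values and a hypothetical `flags_mergeable()` helper, modelled on the `(vma->vm_flags ^ vmg->vm_flags) & ~VM_IGNORE_MERGE` test in is_mergeable_vma(). XOR yields the bits that differ. Masking out VM_IGNORE_MERGE then leaves only the differences that should block a merge.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned long vm_flags_t;

/* Hypothetical bit values for illustration; not the real mm.h values. */
#define VM_READ         (1UL << 0)
#define VM_WRITE        (1UL << 1)
#define VM_SOFTDIRTY    (1UL << 2)
#define VM_MAYBE_GUARD  (1UL << 3)

/* Flag groupings as defined by the patch. */
#define VM_COPY_ON_FORK VM_MAYBE_GUARD
#define VM_STICKY       VM_COPY_ON_FORK
#define VM_IGNORE_MERGE (VM_SOFTDIRTY | VM_STICKY)

/* Sticky flags must never prevent a merge, so they must be ignored. */
static_assert((VM_STICKY & ~VM_IGNORE_MERGE) == 0,
	      "sticky flags must be part of VM_IGNORE_MERGE");

/* True if two VMAs' flags differ only in bits ignored for merging. */
static bool flags_mergeable(vm_flags_t a, vm_flags_t b)
{
	return ((a ^ b) & ~VM_IGNORE_MERGE) == 0;
}
```

Under these assumptions, VMAs differing only in VM_SOFTDIRTY or VM_MAYBE_GUARD remain mergeable, while a VM_WRITE mismatch still blocks the merge.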
We update the VMA userland tests to account for the changes and, furthermore, in order to assert that the functionality is workingly
s/workingly/working
correctly, update the new VMA and existing VMA merging logic to consider every permutation of the flag being set/not set in all VMAs being considered for merge.
As a result of this change, VMAs with guard ranges will now not have their merge behaviour impacted by doing so and can be freely merged with other VMAs without VM_MAYBE_GUARD set.
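The resulting behaviour can be modelled as below: the surviving VMA inherits the union of every participant's sticky bits. This is a sketch under stated assumptions. `struct vma`, `merge_flags()` and the bit values are hypothetical, and only a target VMA plus an optional following VMA are modelled. The sticky bits are applied only after the compatibility checks pass, mirroring how the patch calls vm_flags_set() only once commit_merge() has succeeded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned long vm_flags_t;

/* Hypothetical bit values for illustration; not the real mm.h values. */
#define VM_READ         (1UL << 0)
#define VM_WRITE        (1UL << 1)
#define VM_SOFTDIRTY    (1UL << 2)
#define VM_MAYBE_GUARD  (1UL << 3)
#define VM_STICKY       VM_MAYBE_GUARD
#define VM_IGNORE_MERGE (VM_SOFTDIRTY | VM_STICKY)

/* Hypothetical, heavily simplified stand-in for a VMA. */
struct vma {
	vm_flags_t vm_flags;
};

/*
 * Merge a range with flags new_flags into target, optionally absorbing
 * next. Returns false (leaving target untouched) if flags are
 * incompatible. Sticky bits from every participant are OR-ed into the
 * surviving VMA, like the sticky_flags accumulation in vma_expand().
 */
static bool merge_flags(struct vma *target, vm_flags_t new_flags,
			const struct vma *next)
{
	vm_flags_t sticky;

	assert(target != NULL);
	if ((target->vm_flags ^ new_flags) & ~VM_IGNORE_MERGE)
		return false;
	sticky = (new_flags | target->vm_flags) & VM_STICKY;

	if (next) {
		if ((next->vm_flags ^ new_flags) & ~VM_IGNORE_MERGE)
			return false;
		sticky |= next->vm_flags & VM_STICKY;
	}

	/* Only applied once the merge is known to succeed. */
	target->vm_flags |= sticky;
	return true;
}
```

The design point the sketch tries to capture is that stickiness is accumulated up front but committed last, so an aborted merge never leaves a VMA with flags it did not already have.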
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
 include/linux/mm.h               | 32 ++++++++++++
 mm/memory.c                      |  3 +-
 mm/vma.c                         | 22 ++++----
 tools/testing/vma/vma.c          | 89 ++++++++++++++++++++++++++++----
 tools/testing/vma/vma_internal.h | 32 ++++++++++++
 5 files changed, 156 insertions(+), 22 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index f963afa1b9de..a8811ba57150 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -522,6 +522,38 @@ extern unsigned int kobjsize(const void *objp);
 #endif
 #define VM_FLAGS_CLEAR (ARCH_VM_PKEY_FLAGS | VM_ARCH_CLEAR)

+/* Flags which should result in page tables being copied on fork. */
+#define VM_COPY_ON_FORK VM_MAYBE_GUARD
+
+/*
+ * Flags which should be 'sticky' on merge - that is, flags which, when one VMA
+ * possesses it but the other does not, the merged VMA should nonetheless have
+ * applied to it:
+ *
+ * VM_COPY_ON_FORK - These flags indicates that a VMA maps a range that contains
+ *                   metadata which should be unconditionally propagated upon
+ *                   fork. When merging two VMAs, we encapsulate this range in
+ *                   the merged VMA, so the flag should be 'sticky' as a result.
It's probably worth noting that after a split, we do not remove "sticky" flags even if the VMA acquired them as a result of a previous merge.
+ */
+#define VM_STICKY VM_COPY_ON_FORK
+
+/*
+ * VMA flags we ignore for the purposes of merge, i.e. one VMA possessing one
+ * of these flags and the other not does not preclude a merge.
+ *
+ * VM_SOFTDIRTY - Should not prevent from VMA merging, if we match the flags but
+ *                dirty bit -- the caller should mark merged VMA as dirty. If
+ *                dirty bit won't be excluded from comparison, we increase
+ *                pressure on the memory system forcing the kernel to generate
+ *                new VMAs when old one could be extended instead.
+ *
+ * VM_STICKY - If one VMA has flags which most be 'sticky', that is ones
s/most/must ?
+ *             which should propagate to all VMAs, but the other does not,
+ *             the merge should still proceed with the merge logic applying
+ *             sticky flags to the final VMA.
+ */
+#define VM_IGNORE_MERGE (VM_SOFTDIRTY | VM_STICKY)
+
 /*
  * mapping from the currently active vm_flags protection bits (the
  * low four bits) to a page protection mask..
diff --git a/mm/memory.c b/mm/memory.c
index a2c79ee43d68..9528133e5147 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1478,8 +1478,7 @@ vma_needs_copy(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
 	if (src_vma->anon_vma)
 		return true;

-	/* Guard regions have momdified page tables that require copying. */
-	if (src_vma->vm_flags & VM_MAYBE_GUARD)
+	if (src_vma->vm_flags & VM_COPY_ON_FORK)
 		return true;

 	/*
diff --git a/mm/vma.c b/mm/vma.c
index 919d1fc63a52..50a6909c4be3 100644
--- a/mm/vma.c
+++ b/mm/vma.c
@@ -89,15 +89,7 @@ static inline bool is_mergeable_vma(struct vma_merge_struct *vmg, bool merge_nex

 	if (!mpol_equal(vmg->policy, vma_policy(vma)))
 		return false;
-	/*
-	 * VM_SOFTDIRTY should not prevent from VMA merging, if we
-	 * match the flags but dirty bit -- the caller should mark
-	 * merged VMA as dirty. If dirty bit won't be excluded from
-	 * comparison, we increase pressure on the memory system forcing
-	 * the kernel to generate new VMAs when old one could be
-	 * extended instead.
-	 */
-	if ((vma->vm_flags ^ vmg->vm_flags) & ~VM_SOFTDIRTY)
+	if ((vma->vm_flags ^ vmg->vm_flags) & ~VM_IGNORE_MERGE)
 		return false;
 	if (vma->vm_file != vmg->file)
 		return false;
@@ -809,6 +801,7 @@ static bool can_merge_remove_vma(struct vm_area_struct *vma)
 static __must_check struct vm_area_struct *vma_merge_existing_range(
 		struct vma_merge_struct *vmg)
 {
+	vm_flags_t sticky_flags = vmg->vm_flags & VM_STICKY;
 	struct vm_area_struct *middle = vmg->middle;
 	struct vm_area_struct *prev = vmg->prev;
 	struct vm_area_struct *next;
@@ -901,11 +894,13 @@ static __must_check struct vm_area_struct *vma_merge_existing_range(
 	if (merge_right) {
 		vma_start_write(next);
 		vmg->target = next;
+		sticky_flags |= (next->vm_flags & VM_STICKY);
 	}

 	if (merge_left) {
 		vma_start_write(prev);
 		vmg->target = prev;
+		sticky_flags |= (prev->vm_flags & VM_STICKY);
 	}

 	if (merge_both) {
@@ -975,6 +970,7 @@ static __must_check struct vm_area_struct *vma_merge_existing_range(
 	if (err || commit_merge(vmg))
 		goto abort;

+	vm_flags_set(vmg->target, sticky_flags);
 	khugepaged_enter_vma(vmg->target, vmg->vm_flags);
 	vmg->state = VMA_MERGE_SUCCESS;
 	return vmg->target;
@@ -1125,6 +1121,10 @@ int vma_expand(struct vma_merge_struct *vmg)
 	bool remove_next = false;
 	struct vm_area_struct *target = vmg->target;
 	struct vm_area_struct *next = vmg->next;
+	vm_flags_t sticky_flags;
+
+	sticky_flags = vmg->vm_flags & VM_STICKY;
+	sticky_flags |= target->vm_flags & VM_STICKY;

 	VM_WARN_ON_VMG(!target, vmg);
@@ -1134,6 +1134,7 @@ int vma_expand(struct vma_merge_struct *vmg)
 	if (next && (target != next) && (vmg->end == next->vm_end)) {
 		int ret;

+		sticky_flags |= next->vm_flags & VM_STICKY;
 		remove_next = true;
 		/* This should already have been checked by this point. */
 		VM_WARN_ON_VMG(!can_merge_remove_vma(next), vmg);
@@ -1160,6 +1161,7 @@ int vma_expand(struct vma_merge_struct *vmg)
 	if (commit_merge(vmg))
 		goto nomem;

+	vm_flags_set(target, sticky_flags);
 	return 0;

 nomem:
@@ -1903,7 +1905,7 @@ static int anon_vma_compatible(struct vm_area_struct *a, struct vm_area_struct *
 	return a->vm_end == b->vm_start &&
 		mpol_equal(vma_policy(a), vma_policy(b)) &&
 		a->vm_file == b->vm_file &&
-		!((a->vm_flags ^ b->vm_flags) & ~(VM_ACCESS_FLAGS | VM_SOFTDIRTY)) &&
+		!((a->vm_flags ^ b->vm_flags) & ~(VM_ACCESS_FLAGS | VM_IGNORE_MERGE)) &&
 		b->vm_pgoff == a->vm_pgoff + ((b->vm_start - a->vm_start) >> PAGE_SHIFT);
 }
diff --git a/tools/testing/vma/vma.c b/tools/testing/vma/vma.c
index 656e1c75b711..ee9d3547c421 100644
--- a/tools/testing/vma/vma.c
+++ b/tools/testing/vma/vma.c
I prefer tests in a separate patch, but that might just be me. Feel free to ignore.
@@ -48,6 +48,8 @@ static struct anon_vma dummy_anon_vma;
 #define ASSERT_EQ(_val1, _val2) ASSERT_TRUE((_val1) == (_val2))
 #define ASSERT_NE(_val1, _val2) ASSERT_TRUE((_val1) != (_val2))

+#define IS_SET(_val, _flags) ((_val & _flags) == _flags)
+
 static struct task_struct __current;

 struct task_struct *get_current(void)
@@ -441,7 +443,7 @@ static bool test_simple_shrink(void)
 	return true;
 }

-static bool test_merge_new(void)
+static bool __test_merge_new(bool is_sticky, bool a_is_sticky, bool b_is_sticky, bool c_is_sticky)
 {
 	vm_flags_t vm_flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
 	struct mm_struct mm = {};
@@ -469,23 +471,32 @@ static bool test_merge_new(void)
 	struct vm_area_struct *vma, *vma_a, *vma_b, *vma_c, *vma_d;
 	bool merged;

+	if (is_sticky)
+		vm_flags |= VM_STICKY;
+
 	/*
 	 * 0123456789abc
 	 * AA B       CC
 	 */
 	vma_a = alloc_and_link_vma(&mm, 0, 0x2000, 0, vm_flags);
 	ASSERT_NE(vma_a, NULL);
+	if (a_is_sticky)
+		vm_flags_set(vma_a, VM_STICKY);
 	/* We give each VMA a single avc so we can test anon_vma duplication. */
 	INIT_LIST_HEAD(&vma_a->anon_vma_chain);
 	list_add(&dummy_anon_vma_chain_a.same_vma, &vma_a->anon_vma_chain);

 	vma_b = alloc_and_link_vma(&mm, 0x3000, 0x4000, 3, vm_flags);
 	ASSERT_NE(vma_b, NULL);
+	if (b_is_sticky)
+		vm_flags_set(vma_b, VM_STICKY);
 	INIT_LIST_HEAD(&vma_b->anon_vma_chain);
 	list_add(&dummy_anon_vma_chain_b.same_vma, &vma_b->anon_vma_chain);

 	vma_c = alloc_and_link_vma(&mm, 0xb000, 0xc000, 0xb, vm_flags);
 	ASSERT_NE(vma_c, NULL);
+	if (c_is_sticky)
+		vm_flags_set(vma_c, VM_STICKY);
 	INIT_LIST_HEAD(&vma_c->anon_vma_chain);
 	list_add(&dummy_anon_vma_chain_c.same_vma, &vma_c->anon_vma_chain);
@@ -520,6 +531,8 @@ static bool test_merge_new(void)
 	ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
 	ASSERT_TRUE(vma_write_started(vma));
 	ASSERT_EQ(mm.map_count, 3);
+	if (is_sticky || a_is_sticky || b_is_sticky)
+		ASSERT_TRUE(IS_SET(vma->vm_flags, VM_STICKY));

 	/*
 	 * Merge to PREVIOUS VMA.
@@ -537,6 +550,8 @@ static bool test_merge_new(void)
 	ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
 	ASSERT_TRUE(vma_write_started(vma));
 	ASSERT_EQ(mm.map_count, 3);
+	if (is_sticky || a_is_sticky)
+		ASSERT_TRUE(IS_SET(vma->vm_flags, VM_STICKY));

 	/*
 	 * Merge to NEXT VMA.
@@ -556,6 +571,8 @@ static bool test_merge_new(void)
 	ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
 	ASSERT_TRUE(vma_write_started(vma));
 	ASSERT_EQ(mm.map_count, 3);
+	if (is_sticky) /* D uses is_sticky. */
+		ASSERT_TRUE(IS_SET(vma->vm_flags, VM_STICKY));

 	/*
 	 * Merge BOTH sides.
@@ -574,6 +591,8 @@ static bool test_merge_new(void)
 	ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
 	ASSERT_TRUE(vma_write_started(vma));
 	ASSERT_EQ(mm.map_count, 2);
+	if (is_sticky || a_is_sticky)
+		ASSERT_TRUE(IS_SET(vma->vm_flags, VM_STICKY));

 	/*
 	 * Merge to NEXT VMA.
@@ -592,6 +611,8 @@ static bool test_merge_new(void)
 	ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
 	ASSERT_TRUE(vma_write_started(vma));
 	ASSERT_EQ(mm.map_count, 2);
+	if (is_sticky || c_is_sticky)
+		ASSERT_TRUE(IS_SET(vma->vm_flags, VM_STICKY));

 	/*
 	 * Merge BOTH sides.
@@ -609,6 +630,8 @@ static bool test_merge_new(void)
 	ASSERT_EQ(vma->anon_vma, &dummy_anon_vma);
 	ASSERT_TRUE(vma_write_started(vma));
 	ASSERT_EQ(mm.map_count, 1);
+	if (is_sticky || a_is_sticky || c_is_sticky)
+		ASSERT_TRUE(IS_SET(vma->vm_flags, VM_STICKY));

 	/*
 	 * Final state.
@@ -637,6 +660,20 @@ static bool test_merge_new(void)
 	return true;
 }

+static bool test_merge_new(void)
+{
+	int i, j, k, l;
+
+	/* Generate every possible permutation of sticky flags. */
+	for (i = 0; i < 2; i++)
+		for (j = 0; j < 2; j++)
+			for (k = 0; k < 2; k++)
+				for (l = 0; l < 2; l++)
+					ASSERT_TRUE(__test_merge_new(i, j, k, l));
+
+	return true;
+}
+
 static bool test_vma_merge_special_flags(void)
 {
 	vm_flags_t vm_flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
@@ -973,9 +1010,11 @@ static bool test_vma_merge_new_with_close(void)
 	return true;
 }

-static bool test_merge_existing(void)
+static bool __test_merge_existing(bool prev_is_sticky, bool middle_is_sticky, bool next_is_sticky)
 {
 	vm_flags_t vm_flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
+	vm_flags_t prev_flags = vm_flags;
+	vm_flags_t next_flags = vm_flags;
 	struct mm_struct mm = {};
 	VMA_ITERATOR(vmi, &mm, 0);
 	struct vm_area_struct *vma, *vma_prev, *vma_next;
@@ -988,6 +1027,13 @@ static bool test_merge_existing(void)
 	};
 	struct anon_vma_chain avc = {};

+	if (prev_is_sticky)
+		prev_flags |= VM_STICKY;
+	if (middle_is_sticky)
+		vm_flags |= VM_STICKY;
+	if (next_is_sticky)
+		next_flags |= VM_STICKY;
+
 	/*
 	 * Merge right case - partial span.
 	 *
@@ -1000,7 +1046,7 @@ static bool test_merge_existing(void)
 	 */
 	vma = alloc_and_link_vma(&mm, 0x2000, 0x6000, 2, vm_flags);
 	vma->vm_ops = &vm_ops; /* This should have no impact. */
-	vma_next = alloc_and_link_vma(&mm, 0x6000, 0x9000, 6, vm_flags);
+	vma_next = alloc_and_link_vma(&mm, 0x6000, 0x9000, 6, next_flags);
 	vma_next->vm_ops = &vm_ops; /* This should have no impact. */
 	vmg_set_range_anon_vma(&vmg, 0x3000, 0x6000, 3, vm_flags, &dummy_anon_vma);
 	vmg.middle = vma;
@@ -1018,6 +1064,8 @@ static bool test_merge_existing(void)
 	ASSERT_TRUE(vma_write_started(vma));
 	ASSERT_TRUE(vma_write_started(vma_next));
 	ASSERT_EQ(mm.map_count, 2);
+	if (middle_is_sticky || next_is_sticky)
+		ASSERT_TRUE(IS_SET(vma_next->vm_flags, VM_STICKY));

 	/* Clear down and reset. */
 	ASSERT_EQ(cleanup_mm(&mm, &vmi), 2);
@@ -1033,7 +1081,7 @@ static bool test_merge_existing(void)
 	 * NNNNNNN
 	 */
 	vma = alloc_and_link_vma(&mm, 0x2000, 0x6000, 2, vm_flags);
-	vma_next = alloc_and_link_vma(&mm, 0x6000, 0x9000, 6, vm_flags);
+	vma_next = alloc_and_link_vma(&mm, 0x6000, 0x9000, 6, next_flags);
 	vma_next->vm_ops = &vm_ops; /* This should have no impact. */
 	vmg_set_range_anon_vma(&vmg, 0x2000, 0x6000, 2, vm_flags, &dummy_anon_vma);
 	vmg.middle = vma;
@@ -1046,6 +1094,8 @@ static bool test_merge_existing(void)
 	ASSERT_EQ(vma_next->anon_vma, &dummy_anon_vma);
 	ASSERT_TRUE(vma_write_started(vma_next));
 	ASSERT_EQ(mm.map_count, 1);
+	if (middle_is_sticky || next_is_sticky)
+		ASSERT_TRUE(IS_SET(vma_next->vm_flags, VM_STICKY));

 	/* Clear down and reset. We should have deleted vma. */
 	ASSERT_EQ(cleanup_mm(&mm, &vmi), 1);
@@ -1060,7 +1110,7 @@ static bool test_merge_existing(void)
 	 * 0123456789
 	 * PPPPPPV
 	 */
-	vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, vm_flags);
+	vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, prev_flags);
 	vma_prev->vm_ops = &vm_ops; /* This should have no impact. */
 	vma = alloc_and_link_vma(&mm, 0x3000, 0x7000, 3, vm_flags);
 	vma->vm_ops = &vm_ops; /* This should have no impact. */
@@ -1080,6 +1130,8 @@ static bool test_merge_existing(void)
 	ASSERT_TRUE(vma_write_started(vma_prev));
 	ASSERT_TRUE(vma_write_started(vma));
 	ASSERT_EQ(mm.map_count, 2);
+	if (prev_is_sticky || middle_is_sticky)
+		ASSERT_TRUE(IS_SET(vma_prev->vm_flags, VM_STICKY));

 	/* Clear down and reset. */
 	ASSERT_EQ(cleanup_mm(&mm, &vmi), 2);
@@ -1094,7 +1146,7 @@ static bool test_merge_existing(void)
 	 * 0123456789
 	 * PPPPPPP
 	 */
-	vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, vm_flags);
+	vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, prev_flags);
 	vma_prev->vm_ops = &vm_ops; /* This should have no impact. */
 	vma = alloc_and_link_vma(&mm, 0x3000, 0x7000, 3, vm_flags);
 	vmg_set_range_anon_vma(&vmg, 0x3000, 0x7000, 3, vm_flags, &dummy_anon_vma);
@@ -1109,6 +1161,8 @@ static bool test_merge_existing(void)
 	ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma);
 	ASSERT_TRUE(vma_write_started(vma_prev));
 	ASSERT_EQ(mm.map_count, 1);
+	if (prev_is_sticky || middle_is_sticky)
+		ASSERT_TRUE(IS_SET(vma_prev->vm_flags, VM_STICKY));

 	/* Clear down and reset. We should have deleted vma. */
 	ASSERT_EQ(cleanup_mm(&mm, &vmi), 1);
@@ -1123,10 +1177,10 @@ static bool test_merge_existing(void)
 	 * 0123456789
 	 * PPPPPPPPPP
 	 */
-	vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, vm_flags);
+	vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, prev_flags);
 	vma_prev->vm_ops = &vm_ops; /* This should have no impact. */
 	vma = alloc_and_link_vma(&mm, 0x3000, 0x7000, 3, vm_flags);
-	vma_next = alloc_and_link_vma(&mm, 0x7000, 0x9000, 7, vm_flags);
+	vma_next = alloc_and_link_vma(&mm, 0x7000, 0x9000, 7, next_flags);
 	vmg_set_range_anon_vma(&vmg, 0x3000, 0x7000, 3, vm_flags, &dummy_anon_vma);
 	vmg.prev = vma_prev;
 	vmg.middle = vma;
@@ -1139,6 +1193,8 @@ static bool test_merge_existing(void)
 	ASSERT_EQ(vma_prev->anon_vma, &dummy_anon_vma);
 	ASSERT_TRUE(vma_write_started(vma_prev));
 	ASSERT_EQ(mm.map_count, 1);
+	if (prev_is_sticky || middle_is_sticky || next_is_sticky)
+		ASSERT_TRUE(IS_SET(vma_prev->vm_flags, VM_STICKY));

 	/* Clear down and reset. We should have deleted prev and next. */
 	ASSERT_EQ(cleanup_mm(&mm, &vmi), 1);
@@ -1158,9 +1214,9 @@ static bool test_merge_existing(void)
 	 * PPPVVVVVNNN
 	 */
-	vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, vm_flags);
+	vma_prev = alloc_and_link_vma(&mm, 0, 0x3000, 0, prev_flags);
 	vma = alloc_and_link_vma(&mm, 0x3000, 0x8000, 3, vm_flags);
-	vma_next = alloc_and_link_vma(&mm, 0x8000, 0xa000, 8, vm_flags);
+	vma_next = alloc_and_link_vma(&mm, 0x8000, 0xa000, 8, next_flags);
 	vmg_set_range(&vmg, 0x4000, 0x5000, 4, vm_flags);
 	vmg.prev = vma;
@@ -1203,6 +1259,19 @@ static bool test_merge_existing(void)
 	return true;
 }

+static bool test_merge_existing(void)
+{
+	int i, j, k;
+
+	/* Generate every possible permutation of sticky flags. */
+	for (i = 0; i < 2; i++)
+		for (j = 0; j < 2; j++)
+			for (k = 0; k < 2; k++)
+				ASSERT_TRUE(__test_merge_existing(i, j, k));
+
+	return true;
+}
+
 static bool test_anon_vma_non_mergeable(void)
 {
 	vm_flags_t vm_flags = VM_READ | VM_WRITE | VM_MAYREAD | VM_MAYWRITE;
diff --git a/tools/testing/vma/vma_internal.h b/tools/testing/vma/vma_internal.h
index e40c93edc5a7..3d9cb3a9411a 100644
--- a/tools/testing/vma/vma_internal.h
+++ b/tools/testing/vma/vma_internal.h
@@ -117,6 +117,38 @@ extern unsigned long dac_mmap_min_addr;
 #define VM_SEALED VM_NONE
 #endif

+/* Flags which should result in page tables being copied on fork. */
+#define VM_COPY_ON_FORK VM_MAYBE_GUARD
+
+/*
+ * Flags which should be 'sticky' on merge - that is, flags which, when one VMA
+ * possesses it but the other does not, the merged VMA should nonetheless have
+ * applied to it:
+ *
+ * VM_COPY_ON_FORK - These flags indicates that a VMA maps a range that contains
+ *                   metadata which should be unconditionally propagated upon
+ *                   fork. When merging two VMAs, we encapsulate this range in
+ *                   the merged VMA, so the flag should be 'sticky' as a result.
+ */
+#define VM_STICKY VM_COPY_ON_FORK
+
+/*
+ * VMA flags we ignore for the purposes of merge, i.e. one VMA possessing one
+ * of these flags and the other not does not preclude a merge.
+ *
+ * VM_SOFTDIRTY - Should not prevent from VMA merging, if we match the flags but
+ *                dirty bit -- the caller should mark merged VMA as dirty. If
+ *                dirty bit won't be excluded from comparison, we increase
+ *                pressure on the memory system forcing the kernel to generate
+ *                new VMAs when old one could be extended instead.
+ *
+ * VM_STICKY - If one VMA has flags which must be 'sticky', that is ones
+ *             which should propagate to all VMAs, but the other does not,
+ *             the merge should still proceed with the merge logic applying
+ *             sticky flags to the final VMA.
+ */
+#define VM_IGNORE_MERGE (VM_SOFTDIRTY | VM_STICKY)
+
 #define FIRST_USER_ADDRESS 0UL
 #define USER_PGTABLES_CEILING 0UL
--
2.51.0