On Mon, Jul 31, 2023 at 1:30 PM Liam R. Howlett Liam.Howlett@oracle.com wrote:
Adding Lorenzo since this also touches vma_merge() again..
- Suren Baghdasaryan surenb@google.com [230731 13:12]:
vma_prepare() is currently the central place where vmas are being locked before vma_complete() applies changes to them. While this is convenient, it also obscures vma locking and makes it hard to follow the locking rules. Move vma locking out of vma_prepare() and take vma locks explicitly at the locations where vmas are being modified.
I get the idea of locking closer to the edits, but vma_merge() is very hard to follow. It was worse when it was two functions and much larger, but adding this into vma_merge() is difficult to validate.
We still set vma = <another vma> in places, so that adds to the difficulty of ensuring the end result is all VMAs that will be modified/removed have been locked...and the 'locking rules' are also obscured.
It's also annoying that this doesn't fully allow you to follow the locking anyways. dup_anon_vma() still locks internally, with good reason, but it's still not clear that the VMA is locked when looking at this.
That being said, I did go through each case and it looks like it locks the correct VMA(s) to me.
Thanks! Yeah, it took me some time to ensure the locking there is correct. If you see a better alternative to make the locking more obvious I'm open to suggestions. I accept that locking in vma_merge() is not easy to follow.
Reviewed-by: Liam R. Howlett Liam.Howlett@oracle.com
Suggested-by: Linus Torvalds torvalds@linuxfoundation.org Signed-off-by: Suren Baghdasaryan surenb@google.com
mm/mmap.c | 26 ++++++++++++++++---------- 1 file changed, 16 insertions(+), 10 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c index 850a39dee075..e59d83cb1d7a 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -476,16 +476,6 @@ static inline void init_vma_prep(struct vma_prepare *vp, */ static inline void vma_prepare(struct vma_prepare *vp) {
vma_start_write(vp->vma);
if (vp->adj_next)
vma_start_write(vp->adj_next);
if (vp->insert)
vma_start_write(vp->insert);
if (vp->remove)
vma_start_write(vp->remove);
if (vp->remove2)
vma_start_write(vp->remove2);
if (vp->file) { uprobe_munmap(vp->vma, vp->vma->vm_start, vp->vma->vm_end);
@@ -650,6 +640,7 @@ int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma, bool remove_next = false; struct vma_prepare vp;
vma_start_write(vma); if (next && (vma != next) && (end == next->vm_end)) { int ret;
@@ -657,6 +648,7 @@ int vma_expand(struct vma_iterator *vmi, struct vm_area_struct *vma, ret = dup_anon_vma(vma, next); if (ret) return ret;
vma_start_write(next); } init_multi_vma_prep(&vp, vma, NULL, remove_next ? next : NULL, NULL);
@@ -708,6 +700,8 @@ int vma_shrink(struct vma_iterator *vmi, struct vm_area_struct *vma, if (vma_iter_prealloc(vmi)) return -ENOMEM;
vma_start_write(vma);
init_vma_prep(&vp, vma); vma_prepare(&vp); vma_adjust_trans_huge(vma, start, end, 0);
@@ -946,10 +940,12 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm, /* Can we merge both the predecessor and the successor? */ if (merge_prev && merge_next && is_mergeable_anon_vma(prev->anon_vma, next->anon_vma, NULL)) {
vma_start_write(next); remove = next; /* case 1 */ vma_end = next->vm_end; err = dup_anon_vma(prev, next); if (curr) { /* case 6 */
vma_start_write(curr); remove = curr; remove2 = next; if (!next->anon_vma)
@@ -958,6 +954,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm, } else if (merge_prev) { /* case 2 */ if (curr) { err = dup_anon_vma(prev, curr);
vma_start_write(curr); if (end == curr->vm_end) { /* case 7 */ remove = curr; } else { /* case 5 */
@@ -969,6 +966,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm, res = next; if (prev && addr < prev->vm_end) { /* case 4 */ vma_end = addr;
vma_start_write(next); adjust = next; adj_start = -(prev->vm_end - addr); err = dup_anon_vma(next, prev);
@@ -983,6 +981,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm, vma_pgoff = next->vm_pgoff - pglen; if (curr) { /* case 8 */ vma_pgoff = curr->vm_pgoff;
vma_start_write(curr); remove = curr; err = dup_anon_vma(next, curr); }
@@ -996,6 +995,8 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm, if (vma_iter_prealloc(vmi)) return NULL;
vma_start_write(vma);
init_multi_vma_prep(&vp, vma, adjust, remove, remove2); VM_WARN_ON(vp.anon_vma && adjust && adjust->anon_vma && vp.anon_vma != adjust->anon_vma);
@@ -2373,6 +2374,9 @@ int __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma, if (new->vm_ops && new->vm_ops->open) new->vm_ops->open(new);
vma_start_write(vma);
vma_start_write(new);
init_vma_prep(&vp, vma); vp.insert = new; vma_prepare(&vp);
@@ -3078,6 +3082,8 @@ static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma, if (vma_iter_prealloc(vmi)) goto unacct_fail;
vma_start_write(vma);
init_vma_prep(&vp, vma); vma_prepare(&vp); vma_adjust_trans_huge(vma, vma->vm_start, addr + len, 0);
-- 2.41.0.487.g6d72f3e995-goog