On Thu, Aug 3, 2023 at 11:15 AM Liam R. Howlett Liam.Howlett@oracle.com wrote:
- Linus Torvalds torvalds@linux-foundation.org [230803 14:02]:
On Thu, 3 Aug 2023 at 10:27, Suren Baghdasaryan surenb@google.com wrote:
While it's not strictly necessary to lock a newly created vma before adding it into the vma tree (as long as no further changes are performed to it), it seems like a good policy to lock it and prevent accidental changes after it becomes visible to the page faults. Lock the vma before adding it into the vma tree.
So my main reaction here is that I started to wonder about the vma allocation.
Why doesn't vma_init() do something like
mmap_assert_write_locked(mm); vma->vm_lock_seq = mm->mm_lock_seq;
and instead we seem to expect vma_lock_alloc() to do this (and do it very badly indeed).
Strange.
Anyway, this observation was just a reaction to that "not strictly necessary to lock a newly created vma" part of the commentary. I feel like we could/should just make sure that all newly created vma's are always simply created write-locked.
I thought the same thing initially, but Suren pointed out that it's not necessary to hold the vma lock to allocate a vma object. And it seems there is at least one user (arch/ia64/mm/init.c) which does allocate outside the lock during ia64_init_addr_space(), which is fine but I'm not sure it gains much to do it this way - the insert needs to take the lock anyways and it is hardly going to be contended.
Yeah, I remember discussing that. At the time of VMA creation the mmap_lock might not be write-locked, so mmap_assert_write_locked() would trigger and mm->mm_lock_seq is not stable. Maybe we can necessitate holding mmap_lock at the time of VMA creation but that sounds like an unnecessary restriction. IIRC some drivers also create vm_are_structs without holding mmap_lock... I'll double-check.
Anywhere else besides an address space setup would probably introduce a race.
Thanks, Liam