On 11/5/25 20:48, Lorenzo Stoakes wrote:
On Thu, Oct 30, 2025 at 07:47:34PM +0100, Vlastimil Babka wrote:
On 10/30/25 19:31, Vlastimil Babka wrote:
On 10/30/25 17:43, Lorenzo Stoakes wrote:
Could we use MADVISE_VMA_READ_LOCK mode (would be actually an improvement over the current MADVISE_MMAP_READ_LOCK), together with the atomic flag setting? I think the places that could race with us to cause RMW use vma write lock so that would be excluded. Fork AFAICS unfortunately doesn't (for the oldmm) and it probably would't make sense to start doing it. Maybe we could think of something to deal with this special case...
During discussion with Pedro off-list I realized fork takes mmap lock for write on the old mm, so if we kept taking mmap sem for read, then vma lock for read in addition (which should be cheap enough, also we'd only need it in case VM_MAYBE_GUARD is not yet set), and set the flag atomicaly, perhaps that would cover all non-bening races?
So thinking about this again, taking mmap read lock will exclude any VMA write locks (which must hold mmap write lock), so no need to additionally take VMA read lock.
Right. Of course it would be nice if we could later also replace the mmap read lock with vma read lock. Hopefully should be feasible because MADV_DONTNEED can do it and guard ranges perform similar kind of operations (with more complex page table handling necessary, IIRC but that should hopefully be still fine with vma read lock).
It's obviously not in scope of the series here, maybe just to keep in mind if that next step will be compatible with anything done here.
Also as mentioned later in thread, the invocation of vma_needs_copy() is performed under VMA write lock (and this mmap write lock) so no need to read atomically there either.
As per Pedro, we can treat other races as benign.
Merge/Split etc. are done under VMA write lock so there's no race that matters there.
The only other place we even care about this flag in is for /proc/$pid/smaps, but there it'd be benign as you'd happen not to observe the flag being set up at the point at which a concurrent guard region install is happening - something that you could race with _anyway_.
Right.
As Pedro says, remaining cases where you read flags would be benign, as the readers should never observe a meaningful torn read being a bitmap and given this flag only matters on fork and smaps.
So I think just having something that sets atomically with an allow list is fine, but making that very strict so literally just this flag is allowed (currently).
Ack.