Re: [PATCH 1/2] mm/memory: ensure fork child sees coherent memory snapshot

4 Jun 2025

      On 03.06.25 21:03, Jann Horn wrote:
...
On Tue, Jun 3, 2025 at 8:29 PM Matthew Wilcox willy@infradead.org wrote:
...
On Tue, Jun 03, 2025 at 08:21:02PM +0200, Jann Horn wrote:
...
When fork() encounters possibly-pinned pages, those pages are immediately
copied instead of just marking PTEs to make CoW happen later. If the parent
is multithreaded, this can cause the child to see memory contents that are
inconsistent in multiple ways:

We are copying the contents of a page with a memcpy() while userspace
 may be writing to it. This can cause the resulting data in the child to
 be inconsistent.
After we've copied this page, future writes to other pages may
 continue to be visible to the child while future writes to this page are
 no longer visible to the child.

This means the child could theoretically see incoherent states where
allocator freelists point to objects that are actually in use or stuff like
that. A mitigating factor is that, unless userspace already has a deadlock
bug, userspace can pretty much only observe such issues when fancy lockless
data structures are used (because if another thread was in the middle of
mutating data during fork() and the post-fork child tried to take the mutex
protecting that data, it might wait forever).
Um, OK, but isn't that expected behaviour?  POSIX says:
I don't think it is expected behavior that locklessly-updated data
structures in application code could break.
...
: A process shall be created with a single thread. If a multi-threaded
: process calls fork(), the new process shall contain a replica of the
: calling thread and its entire address space, possibly including the
: states of mutexes and other resources. Consequently, the application
: shall ensure that the child process only executes async-signal-safe
: operations until such time as one of the exec functions is successful.
I think that is only talking about ways in which you can interact with
libc, and does not mean that application code couldn't access its own
lockless data structures or such.
Though admittedly that is a fairly theoretical point, since in
practice the most likely place where you'd encounter this kind of
issue would be in an allocator implementation or such.
I thought a bit further about that.
The thing is, that another thread in the parent might be doing something 
lockless using atomics etc. And in the parent, that thread will make 
progress as soon as fork() is done. In the child, that thread will never 
make progress again, as it is gone.
So the assumption must be that another thread possibly stalling forever 
and not making any progress will not affect the algorithm in the child.
I am not sure if that is actually the case for many lockless algorithms 
in allocators. I'm curious, do we have any examples of such algorithms, 
in particular, regarding allocators?
-- 
Cheers,

David / dhildenb

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH 1/2] mm/memory: ensure fork child sees coherent memory snapshot