On Thu, Jun 05, 2025 at 09:33:24AM +0200, Vlastimil Babka wrote:
On 6/3/25 20:21, Jann Horn wrote:
When fork() encounters possibly-pinned pages, those pages are immediately copied instead of just marking PTEs to make CoW happen later. If the parent is multithreaded, this can cause the child to see memory contents that are inconsistent in multiple ways:
- We are copying the contents of a page with a memcpy() while userspace may be writing to it. This can cause the resulting data in the child to be inconsistent.
- After we've copied this page, future writes to other pages may continue to be visible to the child while future writes to this page are no longer visible to the child.
This means the child could theoretically see incoherent states where allocator freelists point to objects that are actually in use or stuff like that. A mitigating factor is that, unless userspace already has a deadlock bug, userspace can pretty much only observe such issues when fancy lockless data structures are used (because if another thread was in the middle of mutating data during fork() and the post-fork child tried to take the mutex protecting that data, it might wait forever).
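The second inconsistency above can be illustrated with a small deterministic userspace model. This is only a sketch under simplifying assumptions, not kernel code: `struct fork_model`, `copy_page_a()`, `snapshot_page_b()` and `parent_store()` are hypothetical names, fork's handling of the ordinary page is modelled as a plain copy taken at the moment fork reaches it, and the "other thread" is simulated by a call placed between the two steps.

```c
#include <string.h>

#define PAGE_WORDS 4

/* Hypothetical model: two parent pages and what the child ends up seeing. */
struct fork_model {
    int parent_a[PAGE_WORDS]; /* maybe-pinned page: copied eagerly by fork */
    int parent_b[PAGE_WORDS]; /* ordinary page: snapshotted later in fork */
    int child_a[PAGE_WORDS];
    int child_b[PAGE_WORDS];
};

/* Fork reaches page A first and copies it immediately. */
static void copy_page_a(struct fork_model *m)
{
    memcpy(m->child_a, m->parent_a, sizeof(m->parent_a));
}

/* Fork reaches page B later; the child's view of B is fixed only now. */
static void snapshot_page_b(struct fork_model *m)
{
    memcpy(m->child_b, m->parent_b, sizeof(m->parent_b));
}

/* A parent thread keeps the invariant a[0] == b[0] across both pages. */
static void parent_store(struct fork_model *m, int v)
{
    m->parent_a[0] = v;
    m->parent_b[0] = v;
}

/* Returns 1 if the child sees a[0] == b[0], 0 if the snapshot is torn. */
int child_view_is_coherent(int writer_runs_mid_fork)
{
    struct fork_model m;

    memset(&m, 0, sizeof(m));
    parent_store(&m, 1);
    copy_page_a(&m);
    if (writer_runs_mid_fork)
        parent_store(&m, 2); /* write lands after A was copied */
    snapshot_page_b(&m);
    return m.child_a[0] == m.child_b[0];
}
```

In this model `child_view_is_coherent(0)` reports a coherent child, while `child_view_is_coherent(1)` reports a child that holds the old value in page A and the new value in page B.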
On top of that, this issue is only observable when pages are either DMA-pinned or appear false-positive-DMA-pinned due to a page having >=1024 references and the parent process having used DMA-pinning at least once before.
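The condition described above can be sketched as a small userspace model. It is an assumption-laden illustration, not the kernel's implementation: `struct page_model`, `struct mm_model`, `PIN_BIAS` and `fork_copies_eagerly()` are hypothetical names, and the only facts taken from the text are the 1024-reference threshold and the requirement that the parent has used DMA-pinning before.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the ">=1024 references" threshold from the text (assumed name). */
#define PIN_BIAS 1024

/* Hypothetical stand-ins for a page and the parent's address space. */
struct page_model {
    int refcount;
};

struct mm_model {
    bool has_pinned; /* parent has used DMA-pinning at least once */
};

/* A page "may be pinned" once its reference count reaches the bias. */
static bool page_maybe_pinned(const struct page_model *p)
{
    assert(p->refcount >= 0);
    return p->refcount >= PIN_BIAS;
}

/*
 * Fork takes the eager-copy slowpath only if the parent has ever pinned
 * and the page looks pinned. A page with many ordinary references is a
 * false positive: it passes this check without being DMA-pinned.
 */
bool fork_copies_eagerly(const struct mm_model *mm, const struct page_model *p)
{
    if (!mm->has_pinned)
        return false;
    return page_maybe_pinned(p);
}
```

With these definitions, a page holding 1024 plain references in a process that has pinned before is treated the same as a really pinned page, which is the false-positive case mentioned above.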
Seems the changelog is missing the part describing what the patch does to fix the issue? Some details are not immediately obvious (the writing threads become blocked in page fault), as the conversation has shown.
Fixes: 70e806e4e645 ("mm: Do early cow for pinned pages during fork() for ptes")
Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
Given how the fix seems to be localized to the already rare slowpath and doesn't require us to pessimize every trivial fork(), it seems reasonable to me even if we don't have a concrete example of sane code in the wild that's broken by the current behavior, so:
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Pedro Falcato <pfalcato@suse.de>