Re: Bug: Performance regression in 1013af4f585f: mm/hugetlb: fix huge_pmd_unshare() vs GUP-fast race

9 Oct 2025


      ...
On Oct 9, 2025, at 12:23 AM, David Hildenbrand david@redhat.com wrote:
On 09.10.25 00:54, Prakash Sangappa wrote:
...
...
On Sep 1, 2025, at 4:26 AM, David Hildenbrand david@redhat.com wrote:
On 01.09.25 12:58, Jann Horn wrote:
...
Hi!
On Fri, Aug 29, 2025 at 4:30 PM Uschakow, Stanislav suschako@amazon.de wrote:
...
We have observed a huge latency increase in `fork()` after ingesting the CVE-2025-38085 fix, which leads to the commit `1013af4f585f: mm/hugetlb: fix huge_pmd_unshare() vs GUP-fast race`. On large machines with 1.5TB of memory and 196 cores, when a process mmaps 1.2TB of shared memory and forks itself dozens or hundreds of times, we see execution times increase by a factor of 4. The reproducer is at the end of the email.
Yeah, every 1G virtual address range you unshare on unmap will do an
extra synchronous IPI broadcast to all CPU cores, so it's not very
surprising that doing this would be a bit slow on a machine with 196
cores.
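To put rough numbers on the explanation above, here is a back-of-the-envelope sketch in Python. It is only an estimate under stated assumptions: the 1.2TB mapping is read as 1.2 TiB, every 1G range is assumed to be shared and then unshared once per child unmap, each broadcast is assumed to interrupt every core except the sender, and 100 children stands in for "dozens or hundreds". The function names are hypothetical and do not correspond to anything in the kernel.

```python
# Rough estimate only; the model and all helper names are assumptions.
GIB = 1 << 30
TIB = 1 << 40


def unshare_broadcasts(mapping_bytes: int, range_bytes: int = GIB) -> int:
    """Number of 1G ranges (one extra IPI broadcast each) in a full unmap."""
    return -(-mapping_bytes // range_bytes)  # ceiling division


def interrupts_delivered(broadcasts: int, cpus: int) -> int:
    """Assume each broadcast interrupts every CPU except the sender."""
    return broadcasts * (cpus - 1)


mapping = 12 * TIB // 10  # 1.2TB from the report, read as 1.2 TiB
cpus = 196                # core count from the report
children = 100            # illustrative stand-in for "dozens or hundreds"

per_unmap = unshare_broadcasts(mapping)
per_child = interrupts_delivered(per_unmap, cpus)

print("broadcasts per full unmap:", per_unmap)
print("interrupts per child unmap:", per_child)
print("interrupts for all children:", per_child * children)
```

Under these assumptions a single full unmap already triggers over a thousand synchronous broadcasts, each of which has to wait for every other core, which is consistent with the slowdown growing with both mapping size and core count.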
What is the use case for this extreme usage of fork() in that context? Is it just something people noticed and it's suboptimal, or is this a real problem for some use cases?
Our DB team is reporting performance issues due to this change. While running TPCC, the
database times out and shuts down (crashes). This is seen when a large number of
processes (thousands) are involved. It is less prominent when there are fewer
processes.
Backing out this change addresses the problem.
I suspect the timeouts are due to fork() taking longer, and there is no kernel crash etc, right?
That is correct, there is no kernel crash.
-Prakash
...
-- 
Cheers
David / dhildenb
