On Oct 9, 2025, at 12:23 AM, David Hildenbrand david@redhat.com wrote:
On 09.10.25 00:54, Prakash Sangappa wrote:
On Sep 1, 2025, at 4:26 AM, David Hildenbrand david@redhat.com wrote:
On 01.09.25 12:58, Jann Horn wrote:
Hi! On Fri, Aug 29, 2025 at 4:30 PM Uschakow, Stanislav suschako@amazon.de wrote:
We have observed a huge latency increase using `fork()` after ingesting the CVE-2025-38085 fix, which leads to commit `1013af4f585f: mm/hugetlb: fix huge_pmd_unshare() vs GUP-fast race`. On large machines with 1.5TB of memory and 196 cores, we found that when a process mmaps 1.2TB of shared memory and forks itself dozens or hundreds of times, execution time increases by a factor of 4. The reproducer is at the end of the email.
Yeah, every 1G virtual address range you unshare on unmap will do an extra synchronous IPI broadcast to all CPU cores, so it's not very surprising that doing this would be a bit slow on a machine with 196 cores.
What is the use case for this extreme usage of fork() in that context? Is it just something people noticed and it's suboptimal, or is this a real problem for some use cases?
Our DB team is reporting performance issues due to this change. While running TPCC, the database times out and shuts down (crashes). This is seen when a large number of processes (thousands) are involved. It is less prominent with fewer processes. Backing out this change addresses the problem.
I suspect the timeouts are due to fork() taking longer, and there is no kernel crash etc, right?
That is correct, there is no kernel crash. -Prakash
-- Cheers
David / dhildenb