Re: [PATCH v1 06/11] mm: support GUP-triggered unsharing via FAULT_FLAG_UNSHARE (!hugetlb)

18 Dec 2021


      ...
> On Dec 17, 2021, at 8:02 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> On Fri, Dec 17, 2021 at 3:53 PM Nadav Amit <namit@vmware.com> wrote:
>> ...
>> I understand the discussion mainly revolves around correctness, which is
>> obviously the most important property, but I would like to mention
>> that having transient get_page() calls causing unnecessary COWs can
>> cause hard-to-analyze and hard-to-avoid performance degradation.
>
> Note that the COW itself is pretty cheap. Yes, there's the page
> allocation and copy, but it's mostly a local thing.

I don’t know about the page-lock overhead, but I understand your argument.

Having said that, I do know a bit about TLB flushes, which you did not
mention as an overhead of COW. Such flushes can be quite expensive on
multithreaded workloads (specifically on VMs, but let's put those aside).

Take memcached, for instance, and assume you overcommit memory with a very fast
swap (e.g., pmem, zram, perhaps even slower). Now, it turns out memcached
often accesses a page first for read and shortly after for write. In a
similar scenario, I found that the page reference that lru_cache_add() takes
during the first fault-in (for read) causes a COW on a write page fault that
happens shortly after [1]. So on memcached I assume this would also trigger
frequent unnecessary COWs.

Besides page allocation and copy, COW would then require a TLB flush, which,
when performed locally, might not be too bad (~200 cycles). But if memcached
has many threads, as it usually does, then you need a TLB shootdown, and that
can be expensive (microseconds). If you start getting a TLB shootdown
storm, you may avoid some IPIs since you see that other CPUs already queued
IPIs for the target CPU. But then the kernel would flush the entire TLB on
the target CPU, as it realizes that multiple TLB flushes were queued,
and as it assumes that a full TLB flush would be cheaper.
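
To put rough numbers on the argument above, here is a back-of-the-envelope sketch. Only the ~200-cycle local flush and the "microseconds" order of magnitude come from the message; the clock rate, the allocation-and-copy cost, the 2 µs shootdown latency, and the function name cow_cost_ns are all assumptions chosen for illustration, not measurements.

```python
# Toy cost model for a single COW fault. All constants except the
# ~200-cycle local flush are assumed values, not measurements.

CPU_GHZ = 3.0               # assumed clock rate (cycles per nanosecond)
COPY_NS = 500.0             # assumed cost of page allocation + 4 KiB copy
LOCAL_FLUSH_CYCLES = 200    # local TLB flush, as quoted in the message
SHOOTDOWN_NS = 2000.0       # assumed remote shootdown latency ("microseconds")


def cow_cost_ns(remote_cpus: int) -> float:
    """Estimate the cost in nanoseconds of one COW fault.

    remote_cpus is the number of other CPUs currently running threads
    of the same address space (hypothetical parameter).
    """
    cost = COPY_NS + LOCAL_FLUSH_CYCLES / CPU_GHZ
    if remote_cpus > 0:
        # Simplification: the initiator waits for all remote CPUs to
        # acknowledge, so a single flat latency is charged regardless
        # of how many CPUs are targeted.
        cost += SHOOTDOWN_NS
    return cost


if __name__ == "__main__":
    for n in (0, 1, 15):
        print(f"{n:2d} remote CPUs: ~{cow_cost_ns(n):.0f} ns per COW")
```

Under these assumed numbers the shootdown, not the copy, dominates as soon as a second CPU is running the same address space. The model ignores IPI batching and the full-flush fallback described above, both of which shift the cost from the initiator to later TLB misses on the target CPUs.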

[ I can try to run a benchmark during the weekend to measure the impact, as I
  did not really measure the impact on memcached before/after 5.8. ]

So I am in no position to prioritize one overhead over the other, but I do
not think that COW can be characterized as mostly-local and cheap in the
case of multithreaded workloads.

[1] https://lore.kernel.org/linux-mm/0480D692-D9B2-429A-9A88-9BBA1331AC3A@gmail....
