On Fri, Dec 17, 2021 at 3:53 PM Nadav Amit <namit@vmware.com> wrote:
> I understand the discussion mainly revolves around correctness, which is obviously the most important property, but I would like to mention that having transient get_page() calls causing unnecessary COWs can cause hard-to-analyze and hard-to-avoid performance degradation.
Note that the COW itself is pretty cheap. Yes, there's the page allocation and copy, but it's mostly a local thing.
So that falls under the "good to avoid" heading, but in the end it's not an immense deal.
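To make the "unnecessary COW" mechanism concrete, here's a rough stand-alone toy of the reuse-vs-copy decision (the types and helper names below are made up for illustration, this is not the actual mm/memory.c logic): the write fault can only re-use the page in place when the refcount says nobody else holds a reference, so even a short-lived get_page()-style reference makes the page look shared and forces the allocation and copy.

/*
 * cow_sketch.c - stand-alone toy, not kernel code.  Models why any
 * extra reference - even a transient get_page()-style one - makes
 * the write fault copy instead of reusing the page in place.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct toy_page {
        atomic_int refcount;            /* the mapping itself holds one ref */
};

/* Something like a page walker taking and dropping a transient reference. */
static void transient_get(struct toy_page *p) { atomic_fetch_add(&p->refcount, 1); }
static void transient_put(struct toy_page *p) { atomic_fetch_sub(&p->refcount, 1); }

/* The write-fault decision: reuse in place only if we are the sole user. */
static bool must_copy(struct toy_page *p)
{
        return atomic_load(&p->refcount) != 1;
}

int main(void)
{
        struct toy_page p = { .refcount = 1 };

        printf("no extra ref:   copy? %d\n", must_copy(&p));    /* 0: reuse */

        transient_get(&p);
        printf("transient ref:  copy? %d\n", must_copy(&p));    /* 1: forced COW */
        transient_put(&p);

        return 0;
}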
In contrast, the page lock has been an actual big user-visible latency issue, to the point of becoming a correctness problem.
A couple of years ago, we literally had NMI watchdog timeouts due to the page wait-queues growing basically boundlessly. This was some customer internal benchmark code that I never saw, so it wasn't *quite* clear exactly what was going on, but we ended up having to split up the page wait list traversal using bookmark entries, because it was such a huge latency issue.
That was mostly NUMA balancing faults, I think, but the point I'm making is that avoiding the page lock can be a *much* bigger deal than avoiding some local allocation and copying of a page of data. There are real loads where the page-lock gets insanely bad, and I think it's because we use it much too much.
See commit 2554db916586 ("sched/wait: Break up long wake list walk") for some of that saga.
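Very roughly, the idea from that commit, modeled here as a stand-alone single-threaded toy (the names are made up, this is not the kernel's __wake_up_common() / WQ_FLAG_BOOKMARK code): the waker parks a bookmark entry in the wait list, wakes a bounded batch, drops the wait-queue lock so other CPUs can make progress, and then resumes from the bookmark instead of holding the lock across the whole walk.

/*
 * bookmark_sketch.c - stand-alone, single-threaded toy of the bookmark
 * idea, purely for illustration.  The real list is a doubly linked
 * list protected by the wait-queue spinlock; the bookmark is a
 * placeholder entry that lets the walker drop that lock every BATCH
 * wakeups and later resume where it left off instead of restarting.
 */
#include <stdio.h>
#include <stdlib.h>

#define BATCH 64

struct waiter {
        struct waiter *next;
        int is_bookmark;                /* placeholder, not a real waiter */
        int id;
};

static int woken;

/* Stand-ins for the wait-queue spinlock and the actual wakeup. */
static void lock(void)   { }
static void unlock(void) { }
static void wake_one(struct waiter *w) { woken++; free(w); }

/* Wake every waiter, but never hold the lock for more than BATCH entries. */
static void wake_all_batched(struct waiter **head)
{
        struct waiter bookmark = { .is_bookmark = 1 };
        struct waiter **pp = head;
        int batch = 0;

        lock();
        while (*pp) {
                struct waiter *w = *pp;

                if (w->is_bookmark) {   /* skip some other walker's bookmark */
                        pp = &w->next;
                        continue;
                }

                *pp = w->next;          /* unlink ... */
                wake_one(w);            /* ... and wake */

                if (++batch == BATCH) {
                        batch = 0;
                        /*
                         * Park a bookmark at our position and drop the lock
                         * so waiters can be added or removed meanwhile.  In
                         * this single-threaded toy nothing changes while the
                         * lock is dropped; in the kernel the bookmark is a
                         * node in a doubly linked list, so it can be removed
                         * in O(1) even if the list did change.
                         */
                        bookmark.next = *pp;
                        *pp = &bookmark;
                        unlock();
                        lock();
                        *pp = bookmark.next;    /* drop bookmark, resume */
                }
        }
        unlock();
}

int main(void)
{
        struct waiter *head = NULL;

        /* Build a long wait list, the kind that caused the NMI timeouts. */
        for (int i = 0; i < 1000; i++) {
                struct waiter *w = calloc(1, sizeof(*w));
                w->id = i;
                w->next = head;
                head = w;
        }

        wake_all_batched(&head);
        printf("woke %d waiters\n", woken);
        return 0;
}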
So I really think that having to serialize with the page lock in order to do some "exact page use counting" is a false economy. Yes, maybe you'd be able to avoid a COW or two, but at what locking cost?
Linus