On Dec 17, 2021, at 9:03 PM, Matthew Wilcox <willy@infradead.org> wrote:
On Sat, Dec 18, 2021 at 04:52:13AM +0000, Nadav Amit wrote:
Take memcached, for instance, and assume you overcommit memory with a very fast swap (e.g., pmem, zram, perhaps even something slower). It turns out that memcached often accesses a page first for read and, shortly after, for write. In a similar scenario, I found that the page reference that lru_cache_add() takes during the first fault-in (for read) causes a COW on the write page fault that follows shortly after [1]. So with memcached, I assume this would also trigger frequent unnecessary COWs.
Why are we comparing page_count() against 1 and not 1 + PageLRU(page)? Having a reference from the LRU should be expected. Is it because of some race that we'd need to take the page lock to protect against?
IIUC, the reference is taken on the page before SetPageLRU() is called, and it is dropped later:
lru_add_drain()
  lru_add_drain_cpu()
    __pagevec_lru_add()
      __pagevec_lru_add_fn()
        SetPageLRU()      <- sets the LRU
      release_pages()     <- drops the reference
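A toy userspace model may help illustrate why comparing against 1 + PageLRU(page) would not avoid the COW in this window. This is a hypothetical sketch, not kernel code: struct sim_page and the sim_* functions are invented for illustration, with refcount standing in for page_count() and lru standing in for PageLRU(). It assumes the sequence in the call chain above: the extra reference is taken at fault-in while the page sits in the pagevec, SetPageLRU() runs at drain time, and release_pages() then drops that reference.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of a page; NOT the kernel's struct page. */
struct sim_page {
    int refcount; /* stands in for page_count() */
    bool lru;     /* stands in for PageLRU() */
};

/* Read fault: the mapping holds one reference, and an lru_cache_add()
 * analogue takes a transient extra reference for the pagevec. The LRU
 * flag is not set yet. */
void sim_read_fault(struct sim_page *p)
{
    p->refcount = 1;  /* reference held by the mapping */
    p->lru = false;   /* not on the LRU until the pagevec is drained */
    p->refcount++;    /* transient pagevec reference */
}

/* Drain analogue: set the LRU flag (SetPageLRU()), then drop the
 * pagevec's reference (release_pages()). */
void sim_lru_drain(struct sim_page *p)
{
    p->lru = true;
    p->refcount--;
}

/* Reuse check that compares page_count() against 1. */
bool sim_can_reuse_strict(const struct sim_page *p)
{
    return p->refcount == 1;
}

/* Reuse check that compares page_count() against 1 + PageLRU(page). */
bool sim_can_reuse_lru(const struct sim_page *p)
{
    return p->refcount == 1 + (p->lru ? 1 : 0);
}
```

Calling sim_read_fault() and then evaluating both checks before sim_lru_drain() shows the point: the count is 2 while the LRU flag is still clear, so both checks see an unexpected reference and the write fault would COW. Only after the drain does the count fall back to 1.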
This is one scenario I encountered. There might be others that take transient references on pages and cause unnecessary COWs. I think David and Andrea had a few in mind. To trigger a COW bug, I once used mlock()/munlock(), which take such a transient reference. But who knows how many other cases exist (KSM? vmscan?).