On Mon, 15 Oct 2018, Andrea Arcangeli wrote:
> On Mon, 15 Oct 2018 15:30:17 -0700 (PDT) David Rientjes <rientjes@google.com> wrote:
> > Would it be possible to test with my patch[*], which does not try reclaim, to address the thrashing issue?
> Yes please.
> It'd also be great if a testcase reproducing the 40% higher access latency (with the original one-liner fix) were available.
I never said 40% higher access latency; I said 40% higher fault latency.
The higher access latency is 13.9% as measured on Haswell.
The test case is rather trivial: fragment all memory with order-4 allocations to replicate a fragmented local zone, use sched_setaffinity() to bind to that node, and fault a reasonable number of hugepages (128MB, 256MB, whatever). The cost of faulting remotely in this case was measured to be 40% higher than falling back to local small pages. This occurs quite obviously because you are thrashing the remote node trying to allocate thp.
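A minimal sketch of the timed-faulting half of that test (the fragmentation step is assumed to be done separately, e.g. by pinning order-4 allocations from a kernel module; CPU 0 stands in for a CPU on the fragmented node, and the region size is illustrative):

/* Bind to the fragmented node, fault MADV_HUGEPAGE memory, time it. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/mman.h>
#include <time.h>

#define SIZE (128UL << 20)	/* 128MB of anonymous memory to fault */

int main(void)
{
	cpu_set_t set;
	struct timespec a, b;
	size_t off;
	void *p;

	/* Bind to a CPU on the fragmented node so "local" means that node. */
	CPU_ZERO(&set);
	CPU_SET(0, &set);
	if (sched_setaffinity(0, sizeof(set), &set)) {
		perror("sched_setaffinity");
		return 1;
	}

	p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	madvise(p, SIZE, MADV_HUGEPAGE);

	clock_gettime(CLOCK_MONOTONIC, &a);
	/* Touch one byte per 2MB hugepage to trigger the THP faults. */
	for (off = 0; off < SIZE; off += 2UL << 20)
		((volatile char *)p)[off] = 1;
	clock_gettime(CLOCK_MONOTONIC, &b);

	printf("fault time: %.3f ms\n", (b.tv_sec - a.tv_sec) * 1e3 +
	       (b.tv_nsec - a.tv_nsec) / 1e6);
	return 0;
}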
> We don't have a testcase for David's 40% latency increase problem, but that's likely to happen only when the system is somewhat low on memory globally.
Well, yes, but that's most of our systems. We can't keep around gigabytes of memory free just to work around this patch. Removing __GFP_THISNODE to avoid thrashing the local node obviously will incur a substantial performance degradation if you thrash the remote node as well. This should be rather straightforward.
> When there's 75% or more of the RAM free (not even allocated as easily reclaimable pagecache) globally, you don't expect to hit heavy swapping.
I agree there is no regression introduced by your patch when 75% of memory is free.
> The 40% THP allocation latency increase, if you use MADV_HUGEPAGE in such a window where all remote zones are fully fragmented, is somewhat less of a concern in my view (plus there's the deferred compaction logic that should mitigate that scenario). Furthermore, it is only a concern for page faults in MADV_HUGEPAGE ranges. If MADV_HUGEPAGE is set, the userland allocation is long lived, so such higher allocation latency won't risk hitting short-lived allocations that don't set MADV_HUGEPAGE (unless madvise=always, but that's not the default, precisely because not all allocations are long lived).
> It'd also be nice if the MADV_HUGEPAGE-using library were freely available.
You scan your mappings for .text segments, map a hugepage-aligned region sufficient in size, mremap() to that region, and do MADV_HUGEPAGE.
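A rough sketch of those steps, with a hypothetical remap_text() helper; the hard part (this must run from code outside the segment being moved) and most error handling are glossed over:

/* Find r-xp segments in /proc/self/maps and remap each onto a
 * hugepage-aligned region that is then marked MADV_HUGEPAGE. */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define HPAGE_SIZE (2UL << 20)

static void remap_text(unsigned long start, unsigned long end)
{
	size_t len = end - start;
	void *dst, *aligned;

	/* Map a hugepage-aligned region sufficient in size. */
	dst = mmap(NULL, len + HPAGE_SIZE, PROT_NONE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (dst == MAP_FAILED)
		return;
	aligned = (void *)(((unsigned long)dst + HPAGE_SIZE - 1) &
			   ~(HPAGE_SIZE - 1));

	/* mremap() the segment onto the aligned region... */
	if (mremap((void *)start, len, len,
		   MREMAP_MAYMOVE | MREMAP_FIXED, aligned) == MAP_FAILED)
		return;

	/* ...and make it eligible for collapse into hugepages. */
	madvise(aligned, len, MADV_HUGEPAGE);
}

int main(void)
{
	FILE *f = fopen("/proc/self/maps", "r");
	char line[512];

	/* Scan our own mappings for executable (r-xp) segments. */
	while (f && fgets(line, sizeof(line), f)) {
		unsigned long start, end;
		char perms[5];

		if (sscanf(line, "%lx-%lx %4s", &start, &end, perms) == 3 &&
		    !strcmp(perms, "r-xp")) {
			/* A real library would call remap_text(start, end)
			 * here, from code living outside that segment. */
			printf("candidate text segment: %lx-%lx\n", start, end);
		}
	}
	if (f)
		fclose(f);
	return 0;
}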