On 6/21/19 1:43 PM, Alan Jenkins wrote:
When setting the low and high watermarks we use min_wmark_pages(zone). I guess this is to reduce the line length. But we forgot that this macro includes zone->watermark_boost. We need to reset zone->watermark_boost first. Otherwise the watermarks will be set inconsistently.
E.g. this could cause inconsistent values if the watermarks have been boosted, and then you change a sysctl which triggers __setup_per_zone_wmarks().
I strongly suspect this explains why I have seen slightly high watermarks. Suspicious-looking zoneinfo below - notice high-low != low-min.
Node 0, zone Normal pages free 74597 min 9582 low 34505 high 36900
https://unix.stackexchange.com/questions/525674/my-low-and-high-watermarks-s...
Signed-off-by: Alan Jenkins alan.christopher.jenkins@gmail.com Fixes: 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs") Cc: stable@vger.kernel.org
Nice catch, thanks!
Acked-by: Vlastimil Babka vbabka@suse.cz
Personally I would implement it a bit differently, see below. If you agree, it's fine if you keep the authorship of the whole patch.
Tested by compiler :-).
Ideally the commit message would be clear about what happens the *first* time __setup_per_zone_watermarks() is called. I guess that zone->watermark_boost is *usually* zero, or we would have noticed some wild problems :-). However I am not familiar with how the zone structures are allocated & initialized. Maybe there is a case where zone->watermark_boost could contain an arbitrary unitialized value at this point. Can we rule that out?
Dunno if there's some arch override, but generic_alloc_nodedata() uses kzalloc() so it's zeroed.
-----8<----- diff --git a/mm/page_alloc.c b/mm/page_alloc.c index d66bc8abe0af..3b2f0cedf78e 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -7624,6 +7624,7 @@ static void __setup_per_zone_wmarks(void)
for_each_zone(zone) { u64 tmp; + unsigned long wmark_min;
spin_lock_irqsave(&zone->lock, flags); tmp = (u64)pages_min * zone_managed_pages(zone); @@ -7642,13 +7643,13 @@ static void __setup_per_zone_wmarks(void)
min_pages = zone_managed_pages(zone) / 1024; min_pages = clamp(min_pages, SWAP_CLUSTER_MAX, 128UL); - zone->_watermark[WMARK_MIN] = min_pages; + wmark_min = min_pages; } else { /* * If it's a lowmem zone, reserve a number of pages * proportionate to the zone's size. */ - zone->_watermark[WMARK_MIN] = tmp; + wmark_min = tmp; }
/* @@ -7660,8 +7661,9 @@ static void __setup_per_zone_wmarks(void) mult_frac(zone_managed_pages(zone), watermark_scale_factor, 10000));
- zone->_watermark[WMARK_LOW] = min_wmark_pages(zone) + tmp; - zone->_watermark[WMARK_HIGH] = min_wmark_pages(zone) + tmp * 2; + zone->_watermark[WMARK_MIN] = wmark_min; + zone->_watermark[WMARK_LOW] = wmark_min + tmp; + zone->_watermark[WMARK_HIGH] = wmark_min + tmp * 2; zone->watermark_boost = 0;
spin_unlock_irqrestore(&zone->lock, flags);