The patch titled
Subject: mm: memcontrol: fix network errors from failing __GFP_ATOMIC charges
has been removed from the -mm tree. Its filename was
mm-memcontrol-fix-network-errors-from-failing-__gfp_atomic-charges.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Johannes Weiner <hannes(a)cmpxchg.org>
Subject: mm: memcontrol: fix network errors from failing __GFP_ATOMIC charges
While upgrading from 4.16 to 5.2, we noticed these allocation errors in
the log of the new kernel:
[ 8642.253395] SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
[ 8642.269170] cache: tw_sock_TCPv6(960:helper-logs), object size: 232, buffer size: 240, default order: 1, min order: 0
[ 8642.293009] node 0: slabs: 5, objs: 170, free: 0
slab_out_of_memory+1
___slab_alloc+969
__slab_alloc+14
kmem_cache_alloc+346
inet_twsk_alloc+60
tcp_time_wait+46
tcp_fin+206
tcp_data_queue+2034
tcp_rcv_state_process+784
tcp_v6_do_rcv+405
__release_sock+118
tcp_close+385
inet_release+46
__sock_release+55
sock_close+17
__fput+170
task_work_run+127
exit_to_usermode_loop+191
do_syscall_64+212
entry_SYSCALL_64_after_hwframe+68
accompanied by an increase in machines going completely radio silent under
memory pressure.
One thing that changed since 4.16 is e699e2c6a654 ("net, mm: account sock
objects to kmemcg"), which made these slab caches subject to cgroup memory
accounting and control.
The problem with that is that cgroups, unlike the page allocator, do not
maintain dedicated atomic reserves. As a cgroup's usage hovers at its
limit, atomic allocations - such as those done during network rx - can
fail consistently for extended periods of time. The kernel is not able
to operate under these conditions.
We don't want to revert the culprit patch, because it indeed tracks a
potentially substantial amount of memory used by a cgroup.
We also don't want to implement dedicated atomic reserves for cgroups.
There is no point in keeping a fixed margin of unused bytes in the
cgroup's memory budget to accommodate a consumer that is impossible to
predict - we'd be wasting memory and getting into configuration
headaches, not unlike what we have going with min_free_kbytes. We do
this for physical memory because we have to, but cgroups are an
accounting game.
Instead, account these privileged allocations to the cgroup, but let them
bypass the configured limit if they have to. This way, we get the
benefits of accounting the consumed memory and have it exert pressure on
the rest of the cgroup, but like with the page allocator, we shift the
burden of reclaiming on behalf of atomic allocations onto the regular
allocations that can block.
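To illustrate the policy in userspace terms, here is a small model of the
charge decision (the names, such as try_charge_model, are invented for
this sketch, and the reclaim/retry steps of the real try_charge() are
omitted); the authoritative change is the hunk in the diff below:

/* Minimal userspace model of the charge policy described above. All
 * names are invented for illustration; this is not the kernel code.
 */
#include <stdbool.h>
#include <stdio.h>

#define MODEL_GFP_ATOMIC (1u << 0)	/* stand-in for __GFP_ATOMIC */

struct memcg_model {
	unsigned long usage;	/* pages currently charged */
	unsigned long limit;	/* configured limit, in pages */
};

/* Atomic requests are always charged (so they keep exerting pressure on
 * the group) but may overrun the limit instead of failing; the reclaim
 * work is left to later allocations that can block.
 */
static bool try_charge_model(struct memcg_model *memcg,
			     unsigned int gfp_mask, unsigned long nr_pages)
{
	if (memcg->usage + nr_pages <= memcg->limit ||
	    (gfp_mask & MODEL_GFP_ATOMIC)) {
		memcg->usage += nr_pages;	/* account it either way */
		return true;
	}
	return false;	/* over limit and blockable: reclaim or fail */
}

int main(void)
{
	struct memcg_model m = { .usage = 1000, .limit = 1000 };

	printf("regular charge at the limit: %s\n",
	       try_charge_model(&m, 0, 1) ? "admitted" : "refused");
	printf("atomic charge at the limit:  %s\n",
	       try_charge_model(&m, MODEL_GFP_ATOMIC, 1) ? "admitted" : "refused");
	printf("usage is now %lu pages (limit %lu)\n", m.usage, m.limit);
	return 0;
}

Built with a plain cc invocation, the model refuses the regular charge at
the limit but admits the atomic one and lets usage overrun to 1001 pages,
which is the behaviour the paragraph above describes.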
Link: http://lkml.kernel.org/r/20191022233708.365764-1-hannes@cmpxchg.org
Fixes: e699e2c6a654 ("net, mm: account sock objects to kmemcg")
Signed-off-by: Johannes Weiner <hannes(a)cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb(a)google.com>
Cc: Suleiman Souhlal <suleiman(a)google.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: <stable(a)vger.kernel.org> [4.18+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 9 +++++++++
1 file changed, 9 insertions(+)
--- a/mm/memcontrol.c~mm-memcontrol-fix-network-errors-from-failing-__gfp_atomic-charges
+++ a/mm/memcontrol.c
@@ -2535,6 +2535,15 @@ retry:
}
/*
+ * Memcg doesn't have a dedicated reserve for atomic
+ * allocations. But like the global atomic pool, we need to
+ * put the burden of reclaim on regular allocation requests
+ * and let these go through as privileged allocations.
+ */
+ if (gfp_mask & __GFP_ATOMIC)
+ goto force;
+
+ /*
* Unlike in global OOM situations, memcg is not in a physical
* memory shortage. Allow dying and OOM-killed tasks to
* bypass the last charges so that they can exit quickly and
_
Patches currently in -mm which might be from hannes(a)cmpxchg.org are
mm-memcontrol-remove-dead-code-from-memory_max_write.patch
mm-memcontrol-try-harder-to-set-a-new-memoryhigh.patch
mm-drop-mmap_sem-before-calling-balance_dirty_pages-in-write-fault.patch
mm-vmscan-simplify-lruvec_lru_size.patch
mm-clean-up-and-clarify-lruvec-lookup-procedure.patch
mm-vmscan-move-inactive_list_is_low-swap-check-to-the-caller.patch
mm-vmscan-naming-fixes-global_reclaim-and-sane_reclaim.patch
mm-vmscan-replace-shrink_node-loop-with-a-retry-jump.patch
mm-vmscan-turn-shrink_node_memcg-into-shrink_lruvec.patch
mm-vmscan-split-shrink_node-into-node-part-and-memcgs-part.patch
mm-vmscan-split-shrink_node-into-node-part-and-memcgs-part-fix.patch
mm-vmscan-harmonize-writeback-congestion-tracking-for-nodes-memcgs.patch
kernel-sysctl-make-drop_caches-write-only.patch
The patch titled
Subject: mm: slab: make page_cgroup_ino() to recognize non-compound slab pages properly
has been removed from the -mm tree. Its filename was
mm-slab-make-page_cgroup_ino-to-recognize-non-compound-slab-pages-properly.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Roman Gushchin <guro(a)fb.com>
Subject: mm: slab: make page_cgroup_ino() to recognize non-compound slab pages properly
page_cgroup_ino() doesn't return a valid memcg pointer for non-compound
slab pages, because it requires both the PgHead and PgSlab flags to be set
in order to determine the memory cgroup from the kmem_cache. That's
correct for compound pages, but not for generic small pages, which don't
have PgHead set, so it ends up returning zero.
Fix this by replacing the condition with PageSlab() && !PageTail().
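To see why the old check misses order-0 slab pages, here is a toy
userspace model with the page flags reduced to plain booleans (PageSlab
is modelled as true for slab tail pages as well, matching the comment in
the mm/slab.h hunk below); none of this is kernel code:

/* Toy model: the three flag bits the check cares about, as booleans. */
#include <stdbool.h>
#include <stdio.h>

struct toy_page {
	const char *what;
	bool head;	/* PG_head: first page of a compound page */
	bool tail;	/* tail page of a compound page */
	bool slab;	/* PageSlab(), true for tail slab pages too */
};

static bool old_check(const struct toy_page *p) { return p->head && p->slab; }
static bool new_check(const struct toy_page *p) { return p->slab && !p->tail; }

int main(void)
{
	const struct toy_page pages[] = {
		{ "compound slab head page", true,  false, true  },
		{ "order-0 slab page",       false, false, true  }, /* missed by old check */
		{ "compound slab tail page", false, true,  true  },
		{ "regular non-slab page",   false, false, false },
	};

	for (unsigned int i = 0; i < sizeof(pages) / sizeof(pages[0]); i++)
		printf("%-25s old=%d new=%d\n", pages[i].what,
		       old_check(&pages[i]), new_check(&pages[i]));
	return 0;
}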
Before this patch:
[root@localhost ~]# ./page-types -c /sys/fs/cgroup/user.slice/user-0.slice/user(a)0.service/ | grep slab
0x0000000000000080 38 0 _______S___________________________________ slab
After this patch:
[root@localhost ~]# ./page-types -c /sys/fs/cgroup/user.slice/user-0.slice/user(a)0.service/ | grep slab
0x0000000000000080 147 0 _______S___________________________________ slab
Also, hwpoison_filter_task() uses the output of page_cgroup_ino() in
order to filter error injection events based on memcg. So if
page_cgroup_ino() fails to return the memcg pointer, we just fail to
inject a memory error. Considering that the hwpoison filter is for
testing, affected users are limited and the impact should be marginal.
[n-horiguchi(a)ah.jp.nec.com: changelog additions]
Link: http://lkml.kernel.org/r/20191031012151.2722280-1-guro@fb.com
Fixes: 4d96ba353075 ("mm: memcg/slab: stop setting page->mem_cgroup pointer for slab pages")
Signed-off-by: Roman Gushchin <guro(a)fb.com>
Reviewed-by: Shakeel Butt <shakeelb(a)google.com>
Acked-by: David Rientjes <rientjes(a)google.com>
Cc: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: Daniel Jordan <daniel.m.jordan(a)oracle.com>
Cc: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 2 +-
mm/slab.h | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
--- a/mm/memcontrol.c~mm-slab-make-page_cgroup_ino-to-recognize-non-compound-slab-pages-properly
+++ a/mm/memcontrol.c
@@ -484,7 +484,7 @@ ino_t page_cgroup_ino(struct page *page)
unsigned long ino = 0;
rcu_read_lock();
- if (PageHead(page) && PageSlab(page))
+ if (PageSlab(page) && !PageTail(page))
memcg = memcg_from_slab_page(page);
else
memcg = READ_ONCE(page->mem_cgroup);
--- a/mm/slab.h~mm-slab-make-page_cgroup_ino-to-recognize-non-compound-slab-pages-properly
+++ a/mm/slab.h
@@ -323,8 +323,8 @@ static inline struct kmem_cache *memcg_r
* Expects a pointer to a slab page. Please note, that PageSlab() check
* isn't sufficient, as it returns true also for tail compound slab pages,
* which do not have slab_cache pointer set.
- * So this function assumes that the page can pass PageHead() and PageSlab()
- * checks.
+ * So this function assumes that the page can pass PageSlab() && !PageTail()
+ * check.
*
* The kmem_cache can be reparented asynchronously. The caller must ensure
* the memcg lifetime, e.g. by taking rcu_read_lock() or cgroup_mutex.
_
Patches currently in -mm which might be from guro(a)fb.com are
mm-memcg-switch-to-css_tryget-in-get_mem_cgroup_from_mm.patch
mm-hugetlb-switch-to-css_tryget-in-hugetlb_cgroup_charge_cgroup.patch
The patch titled
Subject: dump_stack: avoid the livelock of the dump_lock
has been removed from the -mm tree. Its filename was
dump_stack-avoid-the-livelock-of-the-dump_lock.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Kevin Hao <haokexin(a)gmail.com>
Subject: dump_stack: avoid the livelock of the dump_lock
In the current code, we use atomic_cmpxchg() to serialize the output of
dump_stack(), but this implementation suffers from the thundering herd
problem. We have observed this kind of livelock on a Marvell cn96xx
board (24 CPUs) when heavily using dump_stack() in a kprobe handler.
Actually, we can let the competitors wait for the lock to be released
before jumping to atomic_cmpxchg(). This will definitely mitigate the
thundering herd problem. Thanks to Linus for the suggestion.
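The locking pattern can be sketched in userspace with C11 atomics and
pthreads; this is only a model of the idea (the kernel's handling of
reentrancy on the same CPU and of interrupt flags is left out):

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int dump_lock = -1;	/* -1 == unlocked, otherwise holder id */

static void lock_with_wait(int me)
{
	int expected = -1;

	while (!atomic_compare_exchange_strong(&dump_lock, &expected, me)) {
		expected = -1;
		/*
		 * The fix modelled here: instead of retrying the cmpxchg
		 * immediately (the thundering herd that can livelock the
		 * holder), wait until the lock is observed free, then retry.
		 */
		while (atomic_load(&dump_lock) != -1)
			;	/* the kernel would cpu_relax() here */
	}
}

static void unlock_dump(void)
{
	atomic_store(&dump_lock, -1);
}

static void *worker(void *arg)
{
	int id = (int)(long)arg;

	for (int i = 0; i < 3; i++) {
		lock_with_wait(id);
		printf("thread %d holds the lock (iteration %d)\n", id, i);
		unlock_dump();
	}
	return NULL;
}

int main(void)
{
	pthread_t t[4];

	for (long i = 0; i < 4; i++)
		pthread_create(&t[i], NULL, worker, (void *)i);
	for (int i = 0; i < 4; i++)
		pthread_join(t[i], NULL);
	return 0;
}

Build with cc -pthread; the serialized output shows the competitors now
wait for the release before re-attempting the compare-and-exchange.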
[akpm(a)linux-foundation.org: fix comment]
Link: http://lkml.kernel.org/r/20191030031637.6025-1-haokexin@gmail.com
Fixes: b58d977432c8 ("dump_stack: serialize the output from dump_stack()")
Signed-off-by: Kevin Hao <haokexin(a)gmail.com>
Suggested-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/dump_stack.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
--- a/lib/dump_stack.c~dump_stack-avoid-the-livelock-of-the-dump_lock
+++ a/lib/dump_stack.c
@@ -106,7 +106,12 @@ retry:
was_locked = 1;
} else {
local_irq_restore(flags);
- cpu_relax();
+ /*
+ * Wait for the lock to release before jumping to
+ * atomic_cmpxchg() in order to mitigate the thundering herd
+ * problem.
+ */
+ do { cpu_relax(); } while (atomic_read(&dump_lock) != -1);
goto retry;
}
_
Patches currently in -mm which might be from haokexin(a)gmail.com are
The patch titled
Subject: mm, vmstat: hide /proc/pagetypeinfo from normal users
has been removed from the -mm tree. Its filename was
mm-vmstat-hide-proc-pagetypeinfo-from-normal-users.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Michal Hocko <mhocko(a)suse.com>
Subject: mm, vmstat: hide /proc/pagetypeinfo from normal users
/proc/pagetypeinfo is a debugging tool to examine internal page allocator
state with respect to fragmentation. It is not very useful for anything
else, so normal users really do not need to read this file.
Waiman Long has noticed that reading this file can have negative side
effects, because zone->lock is necessary for gathering the data, and that
a) interferes with the page allocator and its users and b) can lead to
hard lockups on large machines which have very long free lists.
Reduce both issues by simply not exporting the file to regular users.
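A quick way to check the new behaviour from userspace is to stat and open
the file; the snippet below assumes a kernel with this patch applied and
should be run once as root and once as an unprivileged user:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
	const char *path = "/proc/pagetypeinfo";
	struct stat st;
	int fd;

	if (stat(path, &st) == 0)
		printf("%s mode: %04o\n", path,
		       (unsigned int)(st.st_mode & 07777));

	fd = open(path, O_RDONLY);
	if (fd < 0) {
		printf("open as uid %d failed: %s\n",
		       (int)getuid(), strerror(errno));
		return 1;
	}
	printf("open as uid %d succeeded\n", (int)getuid());
	close(fd);
	return 0;
}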
Link: http://lkml.kernel.org/r/20191025072610.18526-2-mhocko@kernel.org
Fixes: 467c996c1e19 ("Print out statistics in relation to fragmentation avoidance to /proc/pagetypeinfo")
Signed-off-by: Michal Hocko <mhocko(a)suse.com>
Reported-by: Waiman Long <longman(a)redhat.com>
Acked-by: Mel Gorman <mgorman(a)suse.de>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Acked-by: Waiman Long <longman(a)redhat.com>
Acked-by: Rafael Aquini <aquini(a)redhat.com>
Acked-by: David Rientjes <rientjes(a)google.com>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Roman Gushchin <guro(a)fb.com>
Cc: Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
Cc: Jann Horn <jannh(a)google.com>
Cc: Song Liu <songliubraving(a)fb.com>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmstat.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/vmstat.c~mm-vmstat-hide-proc-pagetypeinfo-from-normal-users
+++ a/mm/vmstat.c
@@ -1972,7 +1972,7 @@ void __init init_mm_internals(void)
#endif
#ifdef CONFIG_PROC_FS
proc_create_seq("buddyinfo", 0444, NULL, &fragmentation_op);
- proc_create_seq("pagetypeinfo", 0444, NULL, &pagetypeinfo_op);
+ proc_create_seq("pagetypeinfo", 0400, NULL, &pagetypeinfo_op);
proc_create_seq("vmstat", 0444, NULL, &vmstat_op);
proc_create_seq("zoneinfo", 0444, NULL, &zoneinfo_op);
#endif
_
Patches currently in -mm which might be from mhocko(a)suse.com are
The patch titled
Subject: mm: thp: handle page cache THP correctly in PageTransCompoundMap
has been removed from the -mm tree. Its filename was
mm-thp-handle-page-cache-thp-correctly-in-pagetranscompoundmap.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Yang Shi <yang.shi(a)linux.alibaba.com>
Subject: mm: thp: handle page cache THP correctly in PageTransCompoundMap
We have a use case that uses tmpfs as the QEMU memory backend, and we
would like to take advantage of THP as well. But our test shows the EPT
is not PMD mapped even though the underlying THPs are PMD mapped on the
host. The number shown by /sys/kernel/debug/kvm/largepages is much less
than the number of PMD mapped shmem pages, as shown below:
7f2778200000-7f2878200000 rw-s 00000000 00:14 262232 /dev/shm/qemu_back_mem.mem.Hz2hSf (deleted)
Size: 4194304 kB
[snip]
AnonHugePages: 0 kB
ShmemPmdMapped: 579584 kB
[snip]
Locked: 0 kB
cat /sys/kernel/debug/kvm/largepages
12
And some benchmarks do worse than with anonymous THPs.
By digging into the code we figured out that commit 127393fbe597 ("mm:
thp: kvm: fix memory corruption in KVM with THP enabled") checks whether
there is a single PTE mapping on the page for anonymous THP when setting
up the EPT map. But the _mapcount < 0 check doesn't apply to page cache
THP, since every subpage of a page cache THP gets its _mapcount
incremented once it is PMD mapped, so PageTransCompoundMap() always
returns false for page cache THP. This prevents KVM from setting up a
PMD mapped EPT entry.
So we need to handle page cache THP correctly. However, when a page
cache THP's PMD gets split, the kernel just removes the mapping instead
of setting up PTE maps the way anonymous THP does. Before KVM calls
get_user_pages(), the subpages may get PTE mapped even though the page is
still a THP, since the page cache THP may be mapped by other processes in
the meantime.
So check both its _mapcount and whether the THP has been PTE mapped.
Although this may report some false negatives (PTE mapped by other
processes), it does not look trivial to make this fully accurate.
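The bookkeeping can be illustrated with a toy userspace model (an
invented struct whose counters start at -1, like the ones in struct
page); it only shows why comparing a subpage's _mapcount with the head
page's compound_mapcount tells "PMD mapped only" apart from "also PTE
mapped" for page cache THP:

#include <stdbool.h>
#include <stdio.h>

struct toy_thp {
	int compound_mapcount;	/* kept on the head page, starts at -1 */
	int subpage_mapcount;	/* one representative subpage, starts at -1 */
	bool anon;
};

static void pmd_map(struct toy_thp *p)
{
	p->compound_mapcount++;
	if (!p->anon)		/* file THP: PMD map also bumps every subpage */
		p->subpage_mapcount++;
}

static void pte_map_subpage(struct toy_thp *p)
{
	p->subpage_mapcount++;
}

/* The check this patch adds, expressed on the toy counters. */
static bool trans_compound_map(const struct toy_thp *p)
{
	if (p->anon)
		return p->subpage_mapcount < 0;
	return p->subpage_mapcount == p->compound_mapcount;
}

int main(void)
{
	struct toy_thp anon = { -1, -1, true };
	struct toy_thp file = { -1, -1, false };

	pmd_map(&anon);
	pmd_map(&file);
	printf("anon THP, PMD mapped only:  %d\n", trans_compound_map(&anon));
	printf("file THP, PMD mapped only:  %d\n", trans_compound_map(&file));

	pte_map_subpage(&file);	/* another process PTE maps the subpage */
	printf("file THP, PMD + PTE mapped: %d\n", trans_compound_map(&file));
	return 0;
}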
With this fix, /sys/kernel/debug/kvm/largepages shows that a reasonable
number of pages are PMD mapped by EPT, as shown below:
7fbeaee00000-7fbfaee00000 rw-s 00000000 00:14 275464 /dev/shm/qemu_back_mem.mem.SKUvat (deleted)
Size: 4194304 kB
[snip]
AnonHugePages: 0 kB
ShmemPmdMapped: 557056 kB
[snip]
Locked: 0 kB
cat /sys/kernel/debug/kvm/largepages
271
And the benchmarks perform the same as with anonymous THPs.
[yang.shi(a)linux.alibaba.com: v4]
Link: http://lkml.kernel.org/r/1571865575-42913-1-git-send-email-yang.shi@linux.a…
Link: http://lkml.kernel.org/r/1571769577-89735-1-git-send-email-yang.shi@linux.a…
Fixes: dd78fedde4b9 ("rmap: support file thp")
Signed-off-by: Yang Shi <yang.shi(a)linux.alibaba.com>
Reported-by: Gang Deng <gavin.dg(a)linux.alibaba.com>
Tested-by: Gang Deng <gavin.dg(a)linux.alibaba.com>
Suggested-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mm.h | 5 -----
include/linux/mm_types.h | 5 +++++
include/linux/page-flags.h | 20 ++++++++++++++++++--
3 files changed, 23 insertions(+), 7 deletions(-)
--- a/include/linux/mm.h~mm-thp-handle-page-cache-thp-correctly-in-pagetranscompoundmap
+++ a/include/linux/mm.h
@@ -695,11 +695,6 @@ static inline void *kvcalloc(size_t n, s
extern void kvfree(const void *addr);
-static inline atomic_t *compound_mapcount_ptr(struct page *page)
-{
- return &page[1].compound_mapcount;
-}
-
static inline int compound_mapcount(struct page *page)
{
VM_BUG_ON_PAGE(!PageCompound(page), page);
--- a/include/linux/mm_types.h~mm-thp-handle-page-cache-thp-correctly-in-pagetranscompoundmap
+++ a/include/linux/mm_types.h
@@ -221,6 +221,11 @@ struct page {
#endif
} _struct_page_alignment;
+static inline atomic_t *compound_mapcount_ptr(struct page *page)
+{
+ return &page[1].compound_mapcount;
+}
+
/*
* Used for sizing the vmemmap region on some architectures
*/
--- a/include/linux/page-flags.h~mm-thp-handle-page-cache-thp-correctly-in-pagetranscompoundmap
+++ a/include/linux/page-flags.h
@@ -622,12 +622,28 @@ static inline int PageTransCompound(stru
*
* Unlike PageTransCompound, this is safe to be called only while
* split_huge_pmd() cannot run from under us, like if protected by the
- * MMU notifier, otherwise it may result in page->_mapcount < 0 false
+ * MMU notifier, otherwise it may result in page->_mapcount check false
* positives.
+ *
+ * We have to treat page cache THP differently since every subpage of it
+ * would get _mapcount inc'ed once it is PMD mapped. But, it may be PTE
+ * mapped in the current process so comparing subpage's _mapcount to
+ * compound_mapcount to filter out PTE mapped case.
*/
static inline int PageTransCompoundMap(struct page *page)
{
- return PageTransCompound(page) && atomic_read(&page->_mapcount) < 0;
+ struct page *head;
+
+ if (!PageTransCompound(page))
+ return 0;
+
+ if (PageAnon(page))
+ return atomic_read(&page->_mapcount) < 0;
+
+ head = compound_head(page);
+ /* File THP is PMD mapped and not PTE mapped */
+ return atomic_read(&page->_mapcount) ==
+ atomic_read(compound_mapcount_ptr(head));
}
/*
_
Patches currently in -mm which might be from yang.shi(a)linux.alibaba.com are
mm-mempolicy-fix-the-wrong-return-value-and-potential-pages-leak-of-mbind.patch
mm-rmap-use-vm_bug_on-in-__page_check_anon_rmap.patch
mm-vmscan-remove-unused-scan_control-parameter-from-pageout.patch
The patch titled
Subject: mm, meminit: recalculate pcpu batch and high limits after init completes
has been removed from the -mm tree. Its filename was
mm-meminit-recalculate-pcpu-batch-and-high-limits-after-init-completes.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Mel Gorman <mgorman(a)techsingularity.net>
Subject: mm, meminit: recalculate pcpu batch and high limits after init completes
Deferred memory initialisation updates zone->managed_pages during the
initialisation phase but before that finishes, the per-cpu page allocator
(pcpu) calculates the number of pages allocated/freed in batches as well
as the maximum number of pages allowed on a per-cpu list. As
zone->managed_pages is not up to date yet, the pcpu initialisation
calculates inappropriately low batch and high values.
This increases zone lock contention quite severely in some cases with the
degree of severity depending on how many CPUs share a local zone and the
size of the zone. A private report indicated that kernel build times were
excessive with extremely high system CPU usage. A perf profile indicated
that a large chunk of time was lost on zone->lock contention.
This patch recalculates the pcpu batch and high values after deferred
initialisation completes for every populated zone in the system. It was
tested on a 2-socket AMD EPYC 2 machine using a kernel compilation
workload -- allmodconfig and all available CPUs.
mmtests configuration: config-workload-kernbench-max. The configuration
was modified to build on a fresh XFS partition.
kernbench
5.4.0-rc3 5.4.0-rc3
vanilla resetpcpu-v2
Amean user-256 13249.50 ( 0.00%) 16401.31 * -23.79%*
Amean syst-256 14760.30 ( 0.00%) 4448.39 * 69.86%*
Amean elsp-256 162.42 ( 0.00%) 119.13 * 26.65%*
Stddev user-256 42.97 ( 0.00%) 19.15 ( 55.43%)
Stddev syst-256 336.87 ( 0.00%) 6.71 ( 98.01%)
Stddev elsp-256 2.46 ( 0.00%) 0.39 ( 84.03%)
5.4.0-rc3 5.4.0-rc3
vanilla resetpcpu-v2
Duration User 39766.24 49221.79
Duration System 44298.10 13361.67
Duration Elapsed 519.11 388.87
The patch reduces system CPU usage by 69.86% and total build time by
26.65%. The variance of system CPU usage is also much reduced.
Before the patch, this was the breakdown of batch and high values over all zones:
256 batch: 1
256 batch: 63
512 batch: 7
256 high: 0
256 high: 378
512 high: 42
512 pcpu pagesets had a batch limit of 7 and a high limit of 42. After
the patch:
256 batch: 1
768 batch: 63
256 high: 0
768 high: 378
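For illustration, here is a simplified userspace rendering of the sizing
heuristic; the constants approximate the 5.4-era zone_batchsize(), with
high taken as six times batch, so treat the exact arithmetic as an
assumption rather than a copy of the kernel code:

#include <stdio.h>

#define PAGE_SIZE 4096UL

static unsigned long rounddown_pow_of_two(unsigned long n)
{
	unsigned long p = 1;

	while (p * 2 <= n)
		p *= 2;
	return p;
}

static unsigned long zone_batchsize_model(unsigned long managed_pages)
{
	unsigned long batch = managed_pages / 1024;

	if (batch * PAGE_SIZE > 1024 * 1024)	/* cap at roughly 1MB per batch */
		batch = (1024 * 1024) / PAGE_SIZE;
	batch /= 4;
	if (batch < 1)
		batch = 1;
	return rounddown_pow_of_two(batch + batch / 2) - 1;
}

int main(void)
{
	/* managed_pages while deferred init is still running vs. afterwards */
	unsigned long sizes[] = { 28672UL /* ~112MB */, 8388608UL /* 32GB */ };

	for (int i = 0; i < 2; i++) {
		unsigned long batch = zone_batchsize_model(sizes[i]);

		printf("managed_pages=%-9lu -> batch=%lu high=%lu\n",
		       sizes[i], batch, 6 * batch);
	}
	return 0;
}

With these inputs the model reproduces the batch 7 / high 42 and
batch 63 / high 378 pairs quoted above, showing how an incomplete
managed_pages count translates directly into undersized pcpu limits.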
[mgorman(a)techsingularity.net: fix merge/linkage snafu]
Link: http://lkml.kernel.org/r/20191023084705.GD3016@techsingularity.net
Link: http://lkml.kernel.org/r/20191021094808.28824-2-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman(a)techsingularity.net>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Matt Fleming <matt(a)codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Qian Cai <cai(a)lca.pw>
Cc: <stable(a)vger.kernel.org> [4.1+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
--- a/mm/page_alloc.c~mm-meminit-recalculate-pcpu-batch-and-high-limits-after-init-completes
+++ a/mm/page_alloc.c
@@ -1948,6 +1948,14 @@ void __init page_alloc_init_late(void)
wait_for_completion(&pgdat_init_all_done_comp);
/*
+ * The number of managed pages has changed due to the initialisation
+ * so the pcpu batch and high limits needs to be updated or the limits
+ * will be artificially small.
+ */
+ for_each_populated_zone(zone)
+ zone_pcp_update(zone);
+
+ /*
* We initialized the rest of the deferred pages. Permanently disable
* on-demand struct page initialization.
*/
@@ -8514,7 +8522,6 @@ void free_contig_range(unsigned long pfn
WARN(count != 0, "%d pages are still in use!\n", count);
}
-#ifdef CONFIG_MEMORY_HOTPLUG
/*
* The zone indicated has a new number of managed_pages; batch sizes and percpu
* page high values need to be recalulated.
@@ -8528,7 +8535,6 @@ void __meminit zone_pcp_update(struct zo
per_cpu_ptr(zone->pageset, cpu));
mutex_unlock(&pcp_batch_high_lock);
}
-#endif
void zone_pcp_reset(struct zone *zone)
{
_
Patches currently in -mm which might be from mgorman(a)techsingularity.net are
mm-pcp-share-common-code-between-memory-hotplug-and-percpu-sysctl-handler.patch
mm-pcpu-make-zone-pcp-updates-and-reset-internal-to-the-mm.patch