The patch titled
Subject: mm, compaction: move high_pfn to the for loop scope
has been added to the -mm tree. Its filename is
mm-compaction-move-high_pfn-to-the-for-loop-scope.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-compaction-move-high_pfn-to-th…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-compaction-move-high_pfn-to-th…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Rokudo Yan <wu-yan(a)tcl.com>
Subject: mm, compaction: move high_pfn to the for loop scope
In fast_isolate_freepages, high_pfn will be used if a prefered one(PFN >=
low_fn) not found. But the high_pfn is not reset before searching an free
area, so when it was used as freepage, it may from another free area
searched before. And move_freelist_head(freelist, freepage) will have
unexpected behavior(eg. corrupt the MOVABLE freelist)
Unable to handle kernel paging request at virtual address dead000000000200
Mem abort info:
ESR = 0x96000044
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000044
CM = 0, WnR = 1
[dead000000000200] address between user and kernel address ranges
-000|list_cut_before(inline)
-000|move_freelist_head(inline)
-000|fast_isolate_freepages(inline)
-000|isolate_freepages(inline)
-000|compaction_alloc(?, ?)
-001|unmap_and_move(inline)
-001|migrate_pages([NSD:0xFFFFFF80088CBBD0] from = 0xFFFFFF80088CBD88, [NSD:0xFFFFFF80088CBBC8] get_new_p
-002|__read_once_size(inline)
-002|static_key_count(inline)
-002|static_key_false(inline)
-002|trace_mm_compaction_migratepages(inline)
-002|compact_zone(?, [NSD:0xFFFFFF80088CBCB0] capc = 0x0)
-003|kcompactd_do_work(inline)
-003|kcompactd([X19] p = 0xFFFFFF93227FBC40)
-004|kthread([X20] _create = 0xFFFFFFE1AFB26380)
-005|ret_from_fork(asm)
---|end of frame
The issue was reported on an smart phone product with 6GB ram and 3GB zram
as swap device.
This patch fixes the issue by reset high_pfn before searching each free
area, which ensure freepage and freelist match when call
move_freelist_head in fast_isolate_freepages().
Link: http://lkml.kernel.org/r/20190118175136.31341-12-mgorman@techsingularity.net
Link: https://lkml.kernel.org/r/20210112094720.1238444-1-wu-yan@tcl.com
Fixes: 5a811889de10f1eb ("mm, compaction: use free lists to quickly locate a migration target")
Acked-by: Mel Gorman <mgorman(a)techsingularity.net>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/compaction.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/mm/compaction.c~mm-compaction-move-high_pfn-to-the-for-loop-scope
+++ a/mm/compaction.c
@@ -1342,7 +1342,7 @@ fast_isolate_freepages(struct compact_co
{
unsigned int limit = min(1U, freelist_scan_limit(cc) >> 1);
unsigned int nr_scanned = 0;
- unsigned long low_pfn, min_pfn, high_pfn = 0, highest = 0;
+ unsigned long low_pfn, min_pfn, highest = 0;
unsigned long nr_isolated = 0;
unsigned long distance;
struct page *page = NULL;
@@ -1387,6 +1387,7 @@ fast_isolate_freepages(struct compact_co
struct page *freepage;
unsigned long flags;
unsigned int order_scanned = 0;
+ unsigned long high_pfn = 0;
if (!area->nr_free)
continue;
_
Patches currently in -mm which might be from wu-yan(a)tcl.com are
mm-compaction-move-high_pfn-to-the-for-loop-scope.patch
The patch titled
Subject: mm: memcontrol: prevent starvation when writing memory.high
has been added to the -mm tree. Its filename is
mm-memcontrol-prevent-starvation-when-writing-memoryhigh.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-memcontrol-prevent-starvation-…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-memcontrol-prevent-starvation-…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Johannes Weiner <hannes(a)cmpxchg.org>
Subject: mm: memcontrol: prevent starvation when writing memory.high
When a value is written to a cgroup's memory.high control file, the
write() context first tries to reclaim the cgroup to size before putting
the limit in place for the workload. Concurrent charges from the workload
can keep such a write() looping in reclaim indefinitely.
In the past, a write to memory.high would first put the limit in place for
the workload, then do targeted reclaim until the new limit has been met -
similar to how we do it for memory.max. This wasn't prone to the
described starvation issue. However, this sequence could cause excessive
latencies in the workload, when allocating threads could be put into long
penalty sleeps on the sudden memory.high overage created by the write(),
before that had a chance to work it off.
Now that memory_high_write() performs reclaim before enforcing the new
limit, reflect that the cgroup may well fail to converge due to concurrent
workload activity. Bail out of the loop after a few tries.
Link: https://lkml.kernel.org/r/20210112163011.127833-1-hannes@cmpxchg.org
Fixes: 536d3bf261a2 ("mm: memcontrol: avoid workload stalls when lowering memory.high")
Signed-off-by: Johannes Weiner <hannes(a)cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb(a)google.com>
Reported-by: Tejun Heo <tj(a)kernel.org>
Cc: Roman Gushchin <guro(a)fb.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: <stable(a)vger.kernel.org> [5.8+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
--- a/mm/memcontrol.c~mm-memcontrol-prevent-starvation-when-writing-memoryhigh
+++ a/mm/memcontrol.c
@@ -6273,7 +6273,6 @@ static ssize_t memory_high_write(struct
for (;;) {
unsigned long nr_pages = page_counter_read(&memcg->memory);
- unsigned long reclaimed;
if (nr_pages <= high)
break;
@@ -6287,10 +6286,10 @@ static ssize_t memory_high_write(struct
continue;
}
- reclaimed = try_to_free_mem_cgroup_pages(memcg, nr_pages - high,
- GFP_KERNEL, true);
+ try_to_free_mem_cgroup_pages(memcg, nr_pages - high,
+ GFP_KERNEL, true);
- if (!reclaimed && !nr_retries--)
+ if (!nr_retries--)
break;
}
_
Patches currently in -mm which might be from hannes(a)cmpxchg.org are
mm-memcontrol-prevent-starvation-when-writing-memoryhigh.patch
Changes since v1 [1]:
- Clarify the failing condition in patch 3 (Michal)
- Clarify how subsection collisions manifest in shipping systems
(Michal)
- Use zone_idx() (Michal)
- Move section_taint_zone_device() conditions to
move_pfn_range_to_zone() (Michal)
- Fix pfn_to_online_page() to account for pfn_valid() vs
pfn_section_valid() confusion (David)
[1]: http://lore.kernel.org/r/160990599013.2430134.11556277600719835946.stgit@dw…
---
Michal reminds that the discussion about how to ensure pfn-walkers do
not get confused by ZONE_DEVICE pages never resolved. A pfn-walker that
uses pfn_to_online_page() may inadvertently translate a pfn as online
and in the page allocator, when it is offline managed by a ZONE_DEVICE
mapping (details in Patch 3: ("mm: Teach pfn_to_online_page() about
ZONE_DEVICE section collisions")).
The 2 proposals under consideration are teach pfn_to_online_page() to be
precise in the presence of mixed-zone sections, or teach the memory-add
code to drop the System RAM associated with ZONE_DEVICE collisions. In
order to not regress memory capacity by a few 10s to 100s of MiB the
approach taken in this set is to add precision to pfn_to_online_page().
In the course of validating pfn_to_online_page() a couple other fixes
fell out:
1/ soft_offline_page() fails to drop the reference taken in the
madvise(..., MADV_SOFT_OFFLINE) case.
2/ The libnvdimm sysfs attribute visibility code was failing to publish
the resource base for memmap=ss!nn defined namespaces. This is needed
for the regression test for soft_offline_page().
---
Dan Williams (5):
mm: Move pfn_to_online_page() out of line
mm: Teach pfn_to_online_page() to consider subsection validity
mm: Teach pfn_to_online_page() about ZONE_DEVICE section collisions
mm: Fix page reference leak in soft_offline_page()
libnvdimm/namespace: Fix visibility of namespace resource attribute
drivers/nvdimm/namespace_devs.c | 10 +++---
include/linux/memory_hotplug.h | 17 +----------
include/linux/mmzone.h | 22 +++++++++-----
mm/memory-failure.c | 20 ++++++++++---
mm/memory_hotplug.c | 62 +++++++++++++++++++++++++++++++++++++++
5 files changed, 99 insertions(+), 32 deletions(-)