On Mon 05-11-18 08:20:09, Baoquan He wrote:
Hi Michal,
On 11/02/18 at 04:55pm, Michal Hocko wrote:
From: Michal Hocko mhocko@suse.com
Baoquan He has noticed that 15c30bc09085 ("mm, memory_hotplug: make has_unmovable_pages more robust") is causing memory offlining failures on a movable node. After a further debugging it turned out that has_unmovable_pages fails prematurely because it stumbles over off-LRU pages. Nevertheless those pages are not on LRU because they are waiting on the pcp LRU caches (an example of __dump_page added by a debugging patch) [ 560.923297] page:ffffea043f39fa80 count:1 mapcount:0 mapping:ffff880e5dce1b59 index:0x7f6eec459 [ 560.931967] flags: 0x5fffffc0080024(uptodate|active|swapbacked) [ 560.937867] raw: 005fffffc0080024 dead000000000100 dead000000000200 ffff880e5dce1b59 [ 560.945606] raw: 00000007f6eec459 0000000000000000 00000001ffffffff ffff880e43ae8000 [ 560.953323] page dumped because: hotplug [ 560.957238] page->mem_cgroup:ffff880e43ae8000 [ 560.961620] has_unmovable_pages: pfn:0x10fd030d, found:0x1, count:0x0 [ 560.968127] page:ffffea043f40c340 count:2 mapcount:0 mapping:ffff880e2f2d8628 index:0x0 [ 560.976104] flags: 0x5fffffc0000006(referenced|uptodate) [ 560.981401] raw: 005fffffc0000006 dead000000000100 dead000000000200 ffff880e2f2d8628 [ 560.989119] raw: 0000000000000000 0000000000000000 00000002ffffffff ffff88010a8f5000 [ 560.996833] page dumped because: hotplug
Sorry, last week I didn't test this patch with memory pressure adding. Today use "stress -m 200 -t 2h" to add pressure, hot removing failed. Will send you output log. W/o memory pressure, it sometimes succeed. I saw one failure last night, it still show un-removable as 0 in hotpluggable node one time, I worried it might be caused by my compiling mistake, so compile and try again this morning.
In a private email you have sent this (let's assume this is correctly testing the patch I have posted):
: [43283.914082] has_unmovable_pages: pfn:0x10e62600, found:0x1, count:0x0 : [43283.920669] page:ffffea0439898000 count:1 mapcount:1 mapping:ffff880e5639d3c9 index:0x7f2430400 compound_mapcount: 1 : [43283.931219] flags: 0x5fffffc0090034(uptodate|lru|active|head|swapbacked) : [43283.937954] raw: 005fffffc0090034 ffffea043ffcb888 ffffea043f728008 ffff880e5639d3c9 : [43283.945722] raw: 00000007f2430400 0000000000000000 00000001ffffffff ffff880e2baad000 : [43283.955381] page dumped because: hotplug
The page is both LRU and SwapBacked which should hit the some of the checks in has_unmovable_pages. The fact it hasn't means we have clearly raced with the page being allocated and marked SwapBacked/LRU. This is surely possible and there is no universal way to prevent from that for all types of potentially migratedable pages. The race window should be relatively small. Maybe we can add a retry for movable zone pages.
How reproducible this is?
But, as I've said memory isolation resp. has_unmovable_pages begs for a complete redesign.