On 15.01.20 16:39, Greg Kroah-Hartman wrote:
On Wed, Jan 15, 2020 at 04:33:14PM +0100, David Hildenbrand wrote:
This is the backport of the following fixes for 4.19-stable:
- a31b264c2b41 ("mm/memory_hotplug: make unregister_memory_block_under_nodes() never fail")
-- Turned out to not only be a cleanup but also a fix
Took the wrong one. It's d84f2f5a7552 ("drivers/base/node.c: simplify unregister_memory_block_under_nodes()")
- 2c91f8fc6c99 ("mm/memory_hotplug: fix try_offline_node()")
-- Automatic stable backport failed due to missing dependencies.
- feee6b298916 ("mm/memory_hotplug: shrink zones when offlining memory")
-- Was marked as stable 5.0+ due to the backport complexity, but it's also relevant for 4.19/4.14. As I have to backport quite some cleanups already ...
To minimize manual code changes, I decided to pull in quite some cleanups. Still, some manual code changes are necessary (indicated in the individual patches). Especially missing arm64 hot(un)plug, missing sub-section hotadd support, and missing unification of mm/hmm.c and kernel/memremap.c require care.
Due to:
- 4e0d2e7ef14d ("mm, sparse: pass nid instead of pgdat to sparse_add_one_section()")
I need:
- afe9b36ca890 ("mm/memunmap: don't access uninitialized memmap in memunmap_pages()")
Please note that:
- 4c4b7f9ba948 ("mm/memory_hotplug: remove memory block devices before arch_remove_memory()")
Makes big (e.g., 32TB) machines boot up slower (e.g., 2h vs 10m). There is a performance fix in linux-next, but it does not seem to classify as a fix for current RC / stable.
I did quite some testing with hot(un)plug, onlining/offlining of memory blocks and memory-less/CPU-less NUMA nodes under x86_64 - the same set of tests I run against upstream on a fairly regular basis. I compile-tested on PowerPC. I did not test any ZONE_DEVICE/HMM thingies.
Let's see what people think - it's a lot of patches. If we want this, then I can try to prepare a similar set for 4.4-stable.
What bug(s) are these trying to fix here?
All tackle memory unplug issues, especially when memory was never onlined (or onlining failed), paired with memory unplug. When trying to access garbage memmaps we crash the kernel (e.g., because the derived pgdat pointer is broken).
d84f2f5a7552 ("drivers/base/node.c: simplify unregister_memory_block_under_nodes()")
-> https://lore.kernel.org/linux-mm/b2e31976-b07d-11e6-f806-f13f4619be4d@redhat...
"If the memory we are removing was never onlined, get_nid_for_pfn()->pfn_to_nid() will return garbage. Removing will succeed but links will remain in place. [...] We will trigger the BUG_ON(ret) in add_memory_resource(), because link_mem_sections() will return with -EEXIST."
2c91f8fc6c99 ("mm/memory_hotplug: fix try_offline_node()")
We might access garbage memmaps on memory unplug and trigger a crash when trying to offline the node.
feee6b298916 ("mm/memory_hotplug: shrink zones when offlining memory")
Memory unplug will access garbage memmaps (resulting in crashes) and the zones might not get fixed up properly. Relevant when memory was never onlined, when memory blocks of a DIMM were onlined to different zones, or when memory blocks were re-onlined to different zones.
This backports the remaining "don't access uninitialized memmaps"-like fixes. The other ones were already backported.
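To make the "garbage memmap" failure mode concrete, here is a minimal userspace sketch. It is not kernel code: struct sim_page, the SIM_* constants and the sim_*() helpers are all made up for illustration, and the 0xff fill only stands in for "whatever happens to be in a memmap that was never initialized". It merely shows that a node id read from such an entry has nothing to do with the node the memory was added to.

```c
#include <string.h>

/*
 * Hypothetical stand-in for struct page: we assume the node id is
 * encoded in the topmost bits of the flags word.
 */
struct sim_page {
	unsigned long flags;
};

#define SIM_NODES_SHIFT		6	/* assumption: at most 64 nodes */
#define SIM_NODES_MASK		((1UL << SIM_NODES_SHIFT) - 1)
#define SIM_NODES_PGSHIFT	(sizeof(unsigned long) * 8 - SIM_NODES_SHIFT)

/* What onlining would do: store the real node id in the memmap entry. */
static void sim_init_page(struct sim_page *page, int nid)
{
	page->flags = ((unsigned long)nid & SIM_NODES_MASK) << SIM_NODES_PGSHIFT;
}

/* Derive the node id from a memmap entry, initialized or not. */
static int sim_page_to_nid(const struct sim_page *page)
{
	return (int)((page->flags >> SIM_NODES_PGSHIFT) & SIM_NODES_MASK);
}

/* Model a memmap that was allocated on hot-add but never initialized. */
static void sim_poison_memmap(struct sim_page *memmap, size_t nr_pages)
{
	memset(memmap, 0xff, nr_pages * sizeof(*memmap));
}
```

In this model, calling sim_page_to_nid() on an entry that only went through sim_poison_memmap() yields a node id derived from the fill pattern, while an entry that went through sim_init_page() yields the node id that was stored. Anything that is then derived from the bogus node id (such as a pgdat pointer) is equally bogus.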
And why would 4.9 and 4.4 care about them?
The crashes can be triggered under 4.9 and 4.4. If we decide that we do not care, then this series can be dropped.