In move_freepages() a BUG_ON() can be triggered on uninitialized page structures due to pageblock alignment. Aligning the skipped pfns in memmap_init_zone() the same way as in move_freepages_block() simply fixes those crashes.
Fixes: b92df1de5d28 ("[mm] page_alloc: skip over regions of invalid pfns where possible")
Signed-off-by: Daniel Vacek <neelx@redhat.com>
Cc: stable@vger.kernel.org
---
 mm/page_alloc.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cb416723538f..9edee36e6a74 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5359,9 +5359,14 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 			/*
 			 * Skip to the pfn preceding the next valid one (or
 			 * end_pfn), such that we hit a valid pfn (or end_pfn)
-			 * on our next iteration of the loop.
+			 * on our next iteration of the loop. Note that it needs
+			 * to be pageblock aligned even when the region itself
+			 * is not as move_freepages_block() can shift ahead of
+			 * the valid region but still depends on correct page
+			 * metadata.
 			 */
-			pfn = memblock_next_valid_pfn(pfn, end_pfn) - 1;
+			pfn = (memblock_next_valid_pfn(pfn, end_pfn) &
+					~(pageblock_nr_pages-1)) - 1;
 #endif
 			continue;
 		}
On Thu 01-03-18 13:47:45, Daniel Vacek wrote:
In move_freepages() a BUG_ON() can be triggered on uninitialized page structures due to pageblock alignment. Aligning the skipped pfns in memmap_init_zone() the same way as in move_freepages_block() simply fixes those crashes.
This changelog doesn't describe how the fix works. Why doesn't memblock_next_valid_pfn return the first valid pfn as one would expect?
It would also be good to put the panic info in the changelog.
On Thu, Mar 1, 2018 at 2:10 PM, Michal Hocko <mhocko@kernel.org> wrote:
On Thu 01-03-18 13:47:45, Daniel Vacek wrote:
In move_freepages() a BUG_ON() can be triggered on uninitialized page structures due to pageblock alignment. Aligning the skipped pfns in memmap_init_zone() the same way as in move_freepages_block() simply fixes those crashes.
This changelog doesn't describe how the fix works. Why doesn't memblock_next_valid_pfn return the first valid pfn as one would expect?
Actually it does. The point is it is not guaranteed to be pageblock aligned. And we actually want to initialize even those page structures which are outside of the range. Hence the alignment here.
For example from reproducer machine, memory map from e820/BIOS:
$ grep 7b7ff000 /proc/iomem
7b7ff000-7b7fffff : System RAM
Page structures before commit b92df1de5d28:
crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b7fe000 7b7ff000 7b800000 7ffff000 80000000
      PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
fffff73941e00000  78000000                0      0   1 1fffff00000000
fffff73941ed7fc0  7b5ff000                0      0   1 1fffff00000000
fffff73941ed8000  7b600000                0      0   1 1fffff00000000
fffff73941edff80  7b7fe000                0      0   1 1fffff00000000
fffff73941edffc0  7b7ff000 ffff8e67e04d3ae0   ad84   1 1fffff00020068 uptodate,lru,active,mappedtodisk    <<<< start of the range here
fffff73941ee0000  7b800000                0      0   1 1fffff00000000
fffff73941ffffc0  7ffff000                0      0   1 1fffff00000000
So far so good.
After commit b92df1de5d28 machine eventually crashes with:
BUG at mm/page_alloc.c:1913
VM_BUG_ON(page_zone(start_page) != page_zone(end_page));
From the registers and stack I dug out that start_page points to ffffe31d01ed8000 (note that this is page ffffe31d01edffc0 aligned to pageblock), and I can see this in the memory dump:
crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b7fe000 7b7ff000 7b800000 7ffff000 80000000
      PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
ffffe31d01e00000  78000000                0      0   0 0
ffffe31d01ed7fc0  7b5ff000                0      0   0 0
ffffe31d01ed8000  7b600000                0      0   0 0    <<<< note that nodeid and zonenr are encoded in top bits of page flags which are not initialized here, hence the crash :-(
ffffe31d01edff80  7b7fe000                0      0   0 0
ffffe31d01edffc0  7b7ff000                0      0   1 1fffff00000000
ffffe31d01ee0000  7b800000                0      0   1 1fffff00000000
ffffe31d01ffffc0  7ffff000                0      0   1 1fffff00000000
With my fix applied:
crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b7fe000 7b7ff000 7b800000 7ffff000 80000000
      PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
ffffea0001e00000  78000000                0      0   0 0
ffffea0001e00000  7b5ff000                0      0   0 0
ffffea0001ed8000  7b600000                0      0   1 1fffff00000000    <<<< vital data filled in here this time \o/
ffffea0001edff80  7b7fe000                0      0   1 1fffff00000000
ffffea0001edffc0  7b7ff000 ffff88017fb13720      8   2 1fffff00020068 uptodate,lru,active,mappedtodisk
ffffea0001ee0000  7b800000                0      0   1 1fffff00000000
ffffea0001ffffc0  7ffff000                0      0   1 1fffff00000000
We are not interested in the beginning of whole section. Just the pages in the first populated block where the range begins are important (actually just the first one really, but...).
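The alignment arithmetic can be sketched in plain userspace C. This is only an illustration, not kernel code: it assumes 4 KiB pages and 2 MiB pageblocks (512 pages) as on the reproducer, and the names phys_to_pfn_sketch(), pageblock_align_down() and resume_pfn() are made up for this example.

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT         12UL   /* assumption: 4 KiB pages */
#define SKETCH_PAGEBLOCK_NR_PAGES 512UL  /* assumption: 2 MiB pageblocks */

/* Physical address -> page frame number. */
static unsigned long phys_to_pfn_sketch(unsigned long long phys)
{
	return (unsigned long)(phys >> SKETCH_PAGE_SHIFT);
}

/* Round a pfn down to the first pfn of its pageblock. */
static unsigned long pageblock_align_down(unsigned long pfn)
{
	/* The mask trick only works for power-of-two block sizes. */
	assert((SKETCH_PAGEBLOCK_NR_PAGES & (SKETCH_PAGEBLOCK_NR_PAGES - 1)) == 0);
	return pfn & ~(SKETCH_PAGEBLOCK_NR_PAGES - 1);
}

/*
 * Value the init loop stores in pfn before its own pfn++, mirroring
 * "(next_valid & ~(pageblock_nr_pages-1)) - 1" from the patch.
 */
static unsigned long resume_pfn(unsigned long next_valid_pfn)
{
	return pageblock_align_down(next_valid_pfn) - 1;
}
```

For the range starting at 7b7ff000, phys_to_pfn_sketch() gives 0x7b7ff, pageblock_align_down() gives 0x7b600 and resume_pfn() gives 0x7b5ff, so the next loop iteration initializes the page for 7b600000, which is the row marked in the dumps above.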
It would also be good to put the panic info in the changelog.
Of course I forgot to link the related bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=196443
Though it is not very well explained there either. I hope my notes above make it clear.
-- Michal Hocko SUSE Labs
On Thu 01-03-18 16:09:35, Daniel Vacek wrote: [...]
$ grep 7b7ff000 /proc/iomem
7b7ff000-7b7fffff : System RAM
[...]
After commit b92df1de5d28 machine eventually crashes with:
BUG at mm/page_alloc.c:1913
VM_BUG_ON(page_zone(start_page) != page_zone(end_page));
This is important information that should be in the changelog.
From the registers and stack I dug out that start_page points to ffffe31d01ed8000 (note that this is page ffffe31d01edffc0 aligned to pageblock), and I can see this in the memory dump:
crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b7fe000 7b7ff000 7b800000 7ffff000 80000000
      PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
ffffe31d01e00000  78000000                0      0   0 0
ffffe31d01ed7fc0  7b5ff000                0      0   0 0
ffffe31d01ed8000  7b600000                0      0   0 0    <<<< note
Are those ranges covered by the System RAM as well?
that nodeid and zonenr are encoded in top bits of page flags which are not initialized here, hence the crash :-(
ffffe31d01edff80  7b7fe000                0      0   0 0
ffffe31d01edffc0  7b7ff000                0      0   1 1fffff00000000
ffffe31d01ee0000  7b800000                0      0   1 1fffff00000000
ffffe31d01ffffc0  7ffff000                0      0   1 1fffff00000000
It is still not clear why not to do the alignment in memblock_next_valid_pfn rather than its caller.
On Thu, Mar 1, 2018 at 4:27 PM, Michal Hocko mhocko@kernel.org wrote:
On Thu 01-03-18 16:09:35, Daniel Vacek wrote: [...]
$ grep 7b7ff000 /proc/iomem
7b7ff000-7b7fffff : System RAM
[...]
After commit b92df1de5d28 machine eventually crashes with:
BUG at mm/page_alloc.c:1913
VM_BUG_ON(page_zone(start_page) != page_zone(end_page));
This is important information that should be in the changelog.
And that's exactly what my seven very first words tried to express in human readable form instead of mechanically pasting the source code. I guess that's a matter of preference. Though I see grepping later can be an issue here.
From the registers and stack I dug out that start_page points to ffffe31d01ed8000 (note that this is page ffffe31d01edffc0 aligned to pageblock), and I can see this in the memory dump:
crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b7fe000 7b7ff000 7b800000 7ffff000 80000000
      PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
ffffe31d01e00000  78000000                0      0   0 0
ffffe31d01ed7fc0  7b5ff000                0      0   0 0
ffffe31d01ed8000  7b600000                0      0   0 0    <<<< note
Are those ranges covered by the System RAM as well?
that nodeid and zonenr are encoded in top bits of page flags which are not initialized here, hence the crash :-(
ffffe31d01edff80  7b7fe000                0      0   0 0
ffffe31d01edffc0  7b7ff000                0      0   1 1fffff00000000
ffffe31d01ee0000  7b800000                0      0   1 1fffff00000000
ffffe31d01ffffc0  7ffff000                0      0   1 1fffff00000000
It is still not clear why not to do the alignment in memblock_next_valid_pfn rather than its caller.
As it's the mem init which needs it to be aligned. Other callers may not, possibly? Not that there are any other callers at the moment so it really does not matter where it is placed. The only difference would be the end of the loop with end_pfn vs aligned end_pfn. And it looks like the pure (unaligned) end_pfn would be preferred here. Wanna me send a v2?
-- Michal Hocko SUSE Labs
On Thu, 1 Mar 2018 17:20:04 +0100 Daniel Vacek neelx@redhat.com wrote:
Wanna me send a v2?
Yes please ;)
On Thu, Mar 1, 2018 at 5:20 PM, Daniel Vacek neelx@redhat.com wrote:
On Thu, Mar 1, 2018 at 4:27 PM, Michal Hocko mhocko@kernel.org wrote:
It is still not clear why not to do the alignment in memblock_next_valid_pfn rather than its caller.
As it's the mem init which needs it to be aligned. Other callers may not, possibly? Not that there are any other callers at the moment so it really does not matter where it is placed. The only difference would be the end of the loop with end_pfn vs aligned end_pfn. And it looks like the pure (unaligned) end_pfn would be preferred here. Wanna me send a v2?
Thinking about it again, memblock has nothing to do with pageblock. And the function name suggests one shall get the next valid pfn, not something totally unrelated to memblock. So that's what it returns. It's the mem init which needs to align this and hence mem init aligns it for its purposes. I'd call this the correct design.
To deal with the end_pfn special case I'd actually get rid of it completely and hardcode -1UL as max pfn instead (rather than 0). Caller should handle max pfn as an error or end of the loop as here in this case.
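As a rough userspace sketch of that contract (not the memblock implementation, which does a binary search over memblock.memory): the lookup below returns -1UL when nothing valid follows, and the caller treats that value as the end of its walk. The struct, the example map and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NO_PFN (-1UL)	/* sentinel: no valid pfn above the argument */

/* Hypothetical stand-in for a memory region, in pfns: [start, end). */
struct pfn_range {
	unsigned long start;
	unsigned long end;
};

/* Example map, sorted and non-overlapping (values are made up). */
static const struct pfn_range sketch_map[] = {
	{ 0x10, 0x20 },
	{ 0x100, 0x108 },
};
#define SKETCH_MAP_CNT (sizeof(sketch_map) / sizeof(sketch_map[0]))

/* Return the lowest valid pfn strictly above pfn, or SKETCH_NO_PFN. */
static unsigned long next_valid_pfn_sketch(unsigned long pfn)
{
	size_t i;

	pfn++;
	for (i = 0; i < SKETCH_MAP_CNT; i++) {
		if (pfn < sketch_map[i].start)
			return sketch_map[i].start;	/* in a hole: jump ahead */
		if (pfn < sketch_map[i].end)
			return pfn;			/* already inside a range */
	}
	return SKETCH_NO_PFN;
}

/* The caller, not the lookup, decides where the walk ends. */
static unsigned long count_valid_pfns(unsigned long start_pfn,
				      unsigned long end_pfn)
{
	unsigned long pfn, count = 0;

	assert(start_pfn <= end_pfn);
	/* start_pfn - 1 may wrap for start_pfn == 0; the pfn++ above undoes it. */
	for (pfn = next_valid_pfn_sketch(start_pfn - 1);
	     pfn != SKETCH_NO_PFN && pfn < end_pfn;
	     pfn = next_valid_pfn_sketch(pfn))
		count++;
	return count;
}
```

With this shape the lookup no longer needs an end_pfn argument; clamping to the end of the zone is left entirely to the loop condition in the caller.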
I'll send a v2 with this implemented.
Paul> Why is it based on memblock actually? Wouldn't a generic mem_section solution work satisfactorily for you? That would be natively aligned with the whole section (doing a bit more work as a result in the end) and also independent of CONFIG_HAVE_MEMBLOCK_NODE_MAP availability.
-- Michal Hocko SUSE Labs
On Thu 01-03-18 17:20:04, Daniel Vacek wrote:
On Thu, Mar 1, 2018 at 4:27 PM, Michal Hocko mhocko@kernel.org wrote:
On Thu 01-03-18 16:09:35, Daniel Vacek wrote: [...]
$ grep 7b7ff000 /proc/iomem
7b7ff000-7b7fffff : System RAM
[...]
After commit b92df1de5d28 machine eventually crashes with:
BUG at mm/page_alloc.c:1913
VM_BUG_ON(page_zone(start_page) != page_zone(end_page));
This is important information that should be in the changelog.
And that's exactly what my seven very first words tried to express in human readable form instead of mechanically pasting the source code. I guess that's a matter of preference. Though I see grepping later can be an issue here.
Do not get me wrong I do not want to nag just for fun of it. The changelog should be really clear about the problem. What might be clear to you based on the debugging might not be so clear to others. And the struct page initialization code is far from trivial especially when we have different alignment requirements by the memory model and the page allocator.
Therefore being as clear as possible is really valuable. So I would really love to see the changelog to contain:
- What is going on - VM_BUG_ON in move_freepages along with the crash report
- memory ranges exported by BIOS/FW
- explain why is the pageblock alignment the proper one. How does the range look from the memory section POV (with SPARSEMEM).
- What about those unaligned pages which are not backed by any memory? Are they reserved so that they will never get used?
And just to be clear: I am not saying your patch is wrong. It just raises more questions than answers and I suspect it just papers over some more fundamental problem. I might be clearly wrong and I cannot devote more time to this for the next week because I will be offline, but I would _really_ appreciate if this all got explained.
Thanks!
On Fri, Mar 2, 2018 at 2:01 PM, Michal Hocko mhocko@kernel.org wrote:
On Thu 01-03-18 17:20:04, Daniel Vacek wrote:
On Thu, Mar 1, 2018 at 4:27 PM, Michal Hocko mhocko@kernel.org wrote:
On Thu 01-03-18 16:09:35, Daniel Vacek wrote: [...]
$ grep 7b7ff000 /proc/iomem
7b7ff000-7b7fffff : System RAM
[...]
After commit b92df1de5d28 machine eventually crashes with:
BUG at mm/page_alloc.c:1913
VM_BUG_ON(page_zone(start_page) != page_zone(end_page));
This is important information that should be in the changelog.
And that's exactly what my seven very first words tried to express in human readable form instead of mechanically pasting the source code. I guess that's a matter of preference. Though I see grepping later can be an issue here.
Do not get me wrong I do not want to nag just for fun of it. The changelog should be really clear about the problem. What might be clear to you based on the debugging might not be so clear to others. And the struct page initialization code is far from trivial especially when we have different alignment requirements by the memory model and the page allocator.
I get it. I didn't mean to be rude or something. I just thought I covered all the relevant details..
Therefore being as clear as possible is really valuable. So I would really love to see the changelog to contain.
- What is going on - VM_BUG_ON in move_freepages along with the crash report
I'll put more details there.
- memory ranges exported by BIOS/FW
They were not mentioned as they are not really relevant. Any e820 map can have issues. So far I have only seen reports from a few machine types, mostly LENOVO System x3650 M5, some FUJITSU, some Cisco blades. But the map is always fairly normal. IIUC, the bug only happens if the range which is not pageblock aligned happens to be the first one in a zone or follows a not-populated section.
Again, none of that is really relevant. What is relevant is that commit b92df1de5d28 changes the way page structures are initialized, so that for some perfectly fine maps from BIOS the kernel can now crash as a result. And my fix tries to keep at least the bare minimum of the original behavior needed to keep the kernel stable.
- explain why is the pageblock alignment the proper one. How does the range look from the memory section POV (with SPARSEMEM).
The commit message explains that. "the same way as in move_freepages_block()" to quote myself. The alignment in this function is the one causing the crash, as the VM_BUG_ON() assert in the subsequent move_freepages() is checking the (now) uninitialized structure. If we follow this alignment, the initialization will not get skipped for that structure. Again, this is partially restoring the original behavior rather than rewriting move_freepages{,_block} to not crash with some data it was not designed for.
I'll try to explain this more transparently in commit message.
Alternatively you can just revert the b92df1de5d28. That will fix the crashes as well.
- What about those unaligned pages which are not backed by any memory? Are they reserved so that they will never get used?
They are handled the same way as they used to be before b92df1de5d28. This patch does not change or touch anything in this regard. Or am I wrong?
And just to be clear. I am not saying your patch is wrong. It just
You better not. My patch is totally correct :p (I hope)
raises more questions than answers and I suspect it just papers over some more fundamental problem. I might be clearly wrong and I cannot
I see. Thank you for looking into it. It's appreciated. I would not call it a fundamental problem, rather a design of move_freepages{,_block} which I'd vote for keeping for now. Hopefully I explained it above.
devote more time to this for the next week because I will be offline
Enjoy your time off.
but I would _really_ appreciate if this all got explained.
I'll do my best.
Thanks!
Michal Hocko SUSE Labs
On Thu, Mar 1, 2018 at 4:27 PM, Michal Hocko mhocko@kernel.org wrote:
On Thu 01-03-18 16:09:35, Daniel Vacek wrote:
From the registers and stack I dug out that start_page points to ffffe31d01ed8000 (note that this is page ffffe31d01edffc0 aligned to pageblock), and I can see this in the memory dump:
crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b7fe000 7b7ff000 7b800000 7ffff000 80000000
      PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
ffffe31d01e00000  78000000                0      0   0 0
ffffe31d01ed7fc0  7b5ff000                0      0   0 0
ffffe31d01ed8000  7b600000                0      0   0 0    <<<< note
Are those ranges covered by the System RAM as well?
Sorry I forgot to answer this. If they were, the loop won't be skipping them, right? But it really does not matter here, kernel needs (some) page structures initialized anyways. And I do not feel comfortable with removing the VM_BUG_ON(). The initialization is what changed with commit b92df1de5d28, hence fixing this.
--nX
that nodeid and zonenr are encoded in top bits of page flags which are not initialized here, hence the crash :-(
ffffe31d01edff80  7b7fe000                0      0   0 0
ffffe31d01edffc0  7b7ff000                0      0   1 1fffff00000000
ffffe31d01ee0000  7b800000                0      0   1 1fffff00000000
ffffe31d01ffffc0  7ffff000                0      0   1 1fffff00000000
It is still not clear why not to do the alignment in memblock_next_valid_pfn rather than its caller.
-- Michal Hocko SUSE Labs
BUG at mm/page_alloc.c:1913
VM_BUG_ON(page_zone(start_page) != page_zone(end_page));
Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns where possible") introduced a bug where move_freepages() triggers a VM_BUG_ON() on uninitialized page structure due to pageblock alignment. To fix this, simply align the skipped pfns in memmap_init_zone() the same way as in move_freepages_block().
Fixes: b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns where possible")
Signed-off-by: Daniel Vacek <neelx@redhat.com>
Cc: stable@vger.kernel.org
---
 mm/memblock.c   | 13 ++++++-------
 mm/page_alloc.c |  9 +++++++--
 2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index 5a9ca2a1751b..2a5facd236bb 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1101,13 +1101,12 @@ void __init_memblock __next_mem_pfn_range(int *idx, int nid,
 		*out_nid = r->nid;
 }
 
-unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn,
-						      unsigned long max_pfn)
+unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
 {
 	struct memblock_type *type = &memblock.memory;
 	unsigned int right = type->cnt;
 	unsigned int mid, left = 0;
-	phys_addr_t addr = PFN_PHYS(pfn + 1);
+	phys_addr_t addr = PFN_PHYS(++pfn);
 
 	do {
 		mid = (right + left) / 2;
@@ -1118,15 +1117,15 @@ unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn,
 				  type->regions[mid].size))
 			left = mid + 1;
 		else {
-			/* addr is within the region, so pfn + 1 is valid */
-			return min(pfn + 1, max_pfn);
+			/* addr is within the region, so pfn is valid */
+			return pfn;
 		}
 	} while (left < right);
 
 	if (right == type->cnt)
-		return max_pfn;
+		return -1UL;
 	else
-		return min(PHYS_PFN(type->regions[right].base), max_pfn);
+		return PHYS_PFN(type->regions[right].base);
 }
 
 /**
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cb416723538f..eb27ccb50928 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5359,9 +5359,14 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 			/*
 			 * Skip to the pfn preceding the next valid one (or
 			 * end_pfn), such that we hit a valid pfn (or end_pfn)
-			 * on our next iteration of the loop.
+			 * on our next iteration of the loop. Note that it needs
+			 * to be pageblock aligned even when the region itself
+			 * is not as move_freepages_block() can shift ahead of
+			 * the valid region but still depends on correct page
+			 * metadata.
 			 */
-			pfn = memblock_next_valid_pfn(pfn, end_pfn) - 1;
+			pfn = (memblock_next_valid_pfn(pfn) &
+					~(pageblock_nr_pages-1)) - 1;
 #endif
 			continue;
 		}
The kernel can crash on a failed VM_BUG_ON assertion in move_freepages() on some rare physical memory mappings (with huge range(s) of memory reserved by BIOS followed by usable memory not aligned to pageblock).

crash> page_init_bug -v | grep resource | sed '/RAM .3/,/RAM .4/!d'
<struct resource 0xffff88067fffd480>         4bfac000 -         646b1fff  System RAM (391.02 MiB = 400408.00 KiB)
<struct resource 0xffff88067fffd4b8>         646b2000 -         793fefff  reserved (333.30 MiB = 341300.00 KiB)
<struct resource 0xffff88067fffd4f0>         793ff000 -         7b3fefff  ACPI Non-volatile Storage ( 32.00 MiB)
<struct resource 0xffff88067fffd528>         7b3ff000 -         7b787fff  ACPI Tables (  3.54 MiB = 3620.00 KiB)
<struct resource 0xffff88067fffd560>         7b788000 -         7b7fffff  System RAM (480.00 KiB)
More details in second patch.
v2: Use the -1 constant for max_pfn and remove the parameter. That's mostly just cosmetic.
v3: Split into a two-patch series to make clear what is the actual fix and what is just a clean up. No code changes compared to v2, and the second patch is identical to the original v1.
Cc: stable@vger.kernel.org
Daniel Vacek (2):
  mm/memblock: hardcode the max_pfn being -1
  mm/page_alloc: fix memmap_init_zone pageblock alignment

 mm/memblock.c   | 13 ++++++-------
 mm/page_alloc.c |  9 +++++++--
 2 files changed, 13 insertions(+), 9 deletions(-)
This is just a clean up. It avoids having to handle the special end case in the next commit.
Signed-off-by: Daniel Vacek <neelx@redhat.com>
Cc: stable@vger.kernel.org
---
 mm/memblock.c   | 13 ++++++-------
 mm/page_alloc.c |  2 +-
 2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index 5a9ca2a1751b..2a5facd236bb 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1101,13 +1101,12 @@ void __init_memblock __next_mem_pfn_range(int *idx, int nid,
 		*out_nid = r->nid;
 }
 
-unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn,
-						      unsigned long max_pfn)
+unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
 {
 	struct memblock_type *type = &memblock.memory;
 	unsigned int right = type->cnt;
 	unsigned int mid, left = 0;
-	phys_addr_t addr = PFN_PHYS(pfn + 1);
+	phys_addr_t addr = PFN_PHYS(++pfn);
 
 	do {
 		mid = (right + left) / 2;
@@ -1118,15 +1117,15 @@ unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn,
 				  type->regions[mid].size))
 			left = mid + 1;
 		else {
-			/* addr is within the region, so pfn + 1 is valid */
-			return min(pfn + 1, max_pfn);
+			/* addr is within the region, so pfn is valid */
+			return pfn;
 		}
 	} while (left < right);
 
 	if (right == type->cnt)
-		return max_pfn;
+		return -1UL;
 	else
-		return min(PHYS_PFN(type->regions[right].base), max_pfn);
+		return PHYS_PFN(type->regions[right].base);
 }
 
 /**
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cb416723538f..f2c57da5bbe5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5361,7 +5361,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 			 * end_pfn), such that we hit a valid pfn (or end_pfn)
 			 * on our next iteration of the loop.
 			 */
-			pfn = memblock_next_valid_pfn(pfn, end_pfn) - 1;
+			pfn = memblock_next_valid_pfn(pfn) - 1;
 #endif
 			continue;
 		}
Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns where possible") introduced a bug where move_freepages() triggers a VM_BUG_ON() on uninitialized page structure due to pageblock alignment. To fix this, simply align the skipped pfns in memmap_init_zone() the same way as in move_freepages_block().
From one of the RHEL reports:
crash> log | grep -e BUG -e RIP -e Call.Trace -e move_freepages_block -e rmqueue -e freelist -A1
kernel BUG at mm/page_alloc.c:1389!
invalid opcode: 0000 [#1] SMP
--
RIP: 0010:[<ffffffff8118833e>]  [<ffffffff8118833e>] move_freepages+0x15e/0x160
RSP: 0018:ffff88054d727688  EFLAGS: 00010087
--
Call Trace:
 [<ffffffff811883b3>] move_freepages_block+0x73/0x80
 [<ffffffff81189e63>] __rmqueue+0x263/0x460
 [<ffffffff8118c781>] get_page_from_freelist+0x7e1/0x9e0
 [<ffffffff8118caf6>] __alloc_pages_nodemask+0x176/0x420
--
RIP  [<ffffffff8118833e>] move_freepages+0x15e/0x160
 RSP <ffff88054d727688>

crash> page_init_bug -v | grep RAM
<struct resource 0xffff88067fffd2f8>             1000 -            9bfff  System RAM (620.00 KiB)
<struct resource 0xffff88067fffd3a0>           100000 -         430bffff  System RAM (  1.05 GiB = 1071.75 MiB = 1097472.00 KiB)
<struct resource 0xffff88067fffd410>         4b0c8000 -         4bf9cfff  System RAM ( 14.83 MiB = 15188.00 KiB)
<struct resource 0xffff88067fffd480>         4bfac000 -         646b1fff  System RAM (391.02 MiB = 400408.00 KiB)
<struct resource 0xffff88067fffd560>         7b788000 -         7b7fffff  System RAM (480.00 KiB)
<struct resource 0xffff88067fffd640>        100000000 -        67fffffff  System RAM ( 22.00 GiB)

crash> page_init_bug | head -6
<struct resource 0xffff88067fffd560>         7b788000 -         7b7fffff  System RAM (480.00 KiB)
<struct page 0xffffea0001ede200>   1fffff00000000  0 <struct pglist_data 0xffff88047ffd9000> 1 <struct zone 0xffff88047ffd9800> DMA32          4096    1048575
<struct page 0xffffea0001ede200> 505736 505344 <struct page 0xffffea0001ed8000> 505855 <struct page 0xffffea0001edffc0>
<struct page 0xffffea0001ed8000>                0  0 <struct pglist_data 0xffff88047ffd9000> 0 <struct zone 0xffff88047ffd9000> DMA               1       4095
<struct page 0xffffea0001edffc0>   1fffff00000400  0 <struct pglist_data 0xffff88047ffd9000> 1 <struct zone 0xffff88047ffd9800> DMA32          4096    1048575
BUG, zones differ!
Note that this range follows two not populated sections 68000000-77ffffff in this zone. 7b788000-7b7fffff is the first one after a gap. This makes memmap_init_zone() skip all the pfns up to the beginning of this range. But this range is not pageblock (2M) aligned. In fact no range has to be.
crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b787000 7b788000
      PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
ffffea0001e00000  78000000                0      0   0 0
ffffea0001ed7fc0  7b5ff000                0      0   0 0
ffffea0001ed8000  7b600000                0      0   0 0    <<<<
ffffea0001ede1c0  7b787000                0      0   0 0
ffffea0001ede200  7b788000                0      0   1 1fffff00000000
Top part of page flags should contain nodeid and zonenr, which is not the case for page ffffea0001ed8000 here (<<<<).
crash> log | grep -o fffea0001ed[^\ ]* | sort -u
fffea0001ed8000
fffea0001eded20
fffea0001edffc0

crash> bt -r | grep -o fffea0001ed[^\ ]* | sort -u
fffea0001ed8000
fffea0001eded00
fffea0001eded20
fffea0001edffc0
Initialization of the whole beginning of the section is skipped up to the start of the range due to the commit b92df1de5d28. Now any code calling move_freepages_block() (like reusing the page from a freelist as in this example) with a page from the beginning of the range will get the page rounded down to start_page ffffea0001ed8000 and passed to move_freepages() which crashes on assertion getting wrong zonenr.
VM_BUG_ON(page_zone(start_page) != page_zone(end_page));
Note, page_zone() derives the zone from page flags here.
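The rounding described above can be checked against the addresses in this report with a small userspace sketch. It is not the kernel code: it assumes the x86_64 vmemmap starts at 0xffffea0000000000, that sizeof(struct page) is 64 and that a pageblock is 512 pages, which is consistent with the dumps but is an assumption here. All names are made up.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_VMEMMAP_BASE  0xffffea0000000000ULL /* assumed vmemmap start */
#define SKETCH_PAGE_STRUCT   64ULL                 /* assumed sizeof(struct page) */
#define SKETCH_PAGEBLOCK_NR  512ULL                /* assumed pages per pageblock */

/* struct page address -> pfn, as with a virtually contiguous memmap. */
static uint64_t page_to_pfn_sketch(uint64_t page)
{
	assert(page >= SKETCH_VMEMMAP_BASE);
	return (page - SKETCH_VMEMMAP_BASE) / SKETCH_PAGE_STRUCT;
}

/* pfn -> struct page address. */
static uint64_t pfn_to_page_sketch(uint64_t pfn)
{
	return SKETCH_VMEMMAP_BASE + pfn * SKETCH_PAGE_STRUCT;
}

/* First struct page of the pageblock containing page. */
static uint64_t block_start_page(uint64_t page)
{
	uint64_t pfn = page_to_pfn_sketch(page) & ~(SKETCH_PAGEBLOCK_NR - 1);

	return pfn_to_page_sketch(pfn);
}

/* Last struct page of the pageblock containing page. */
static uint64_t block_end_page(uint64_t page)
{
	return block_start_page(page) +
	       (SKETCH_PAGEBLOCK_NR - 1) * SKETCH_PAGE_STRUCT;
}
```

Feeding in the first initialized page ffffea0001ede200 (pfn 0x7b788 = 505736) yields start page ffffea0001ed8000 and end page ffffea0001edffc0, the same pair reported by page_init_bug. The start page lies below the populated range, which is why its flags were never initialized.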
From similar machine before commit b92df1de5d28:
crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b7fe000 7b7ff000
      PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
fffff73941e00000  78000000                0      0   1 1fffff00000000
fffff73941ed7fc0  7b5ff000                0      0   1 1fffff00000000
fffff73941ed8000  7b600000                0      0   1 1fffff00000000
fffff73941edff80  7b7fe000                0      0   1 1fffff00000000
fffff73941edffc0  7b7ff000 ffff8e67e04d3ae0   ad84   1 1fffff00020068 uptodate,lru,active,mappedtodisk

All the pages since the beginning of the section are initialized. move_freepages() is not going to blow up.
The same machine with this fix applied:
crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b7fe000 7b7ff000
      PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
ffffea0001e00000  78000000                0      0   0 0
ffffea0001e00000  7b5ff000                0      0   0 0
ffffea0001ed8000  7b600000                0      0   1 1fffff00000000
ffffea0001edff80  7b7fe000                0      0   1 1fffff00000000
ffffea0001edffc0  7b7ff000 ffff88017fb13720      8   2 1fffff00020068 uptodate,lru,active,mappedtodisk
At least the bare minimum of pages is initialized preventing the crash as well.
Fixes: b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns where possible")
Signed-off-by: Daniel Vacek <neelx@redhat.com>
Cc: stable@vger.kernel.org
---
 mm/page_alloc.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f2c57da5bbe5..eb27ccb50928 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5359,9 +5359,14 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 			/*
 			 * Skip to the pfn preceding the next valid one (or
 			 * end_pfn), such that we hit a valid pfn (or end_pfn)
-			 * on our next iteration of the loop.
+			 * on our next iteration of the loop. Note that it needs
+			 * to be pageblock aligned even when the region itself
+			 * is not. move_freepages_block() can shift ahead of
+			 * the valid region but still depends on correct page
+			 * metadata.
 			 */
-			pfn = memblock_next_valid_pfn(pfn) - 1;
+			pfn = (memblock_next_valid_pfn(pfn) &
+					~(pageblock_nr_pages-1)) - 1;
 #endif
 			continue;
 		}
On Sat, 3 Mar 2018 01:12:26 +0100 Daniel Vacek neelx@redhat.com wrote:
Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns where possible") introduced a bug where move_freepages() triggers a VM_BUG_ON() on uninitialized page structure due to pageblock alignment.
b92df1de5d28 was merged a year ago. Can you suggest why this hasn't been reported before now?
This makes me wonder whether a -stable backport is really needed...
On Sat, Mar 3, 2018 at 1:40 AM, Andrew Morton akpm@linux-foundation.org wrote:
On Sat, 3 Mar 2018 01:12:26 +0100 Daniel Vacek neelx@redhat.com wrote:
Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns where possible") introduced a bug where move_freepages() triggers a VM_BUG_ON() on uninitialized page structure due to pageblock alignment.
b92df1de5d28 was merged a year ago. Can you suggest why this hasn't been reported before now?
Yeah. I was surprised myself I couldn't find a fix to backport to RHEL. But actually customers started to report this as soon as 7.4 (where b92df1de5d28 was merged in RHEL) was released. I remember reports from September/October-ish times. It's not easily reproduced and happens on a handful of machines only. I guess that's why. But that does not make it less serious, I think.
Though there actually is a report here: https://bugzilla.kernel.org/show_bug.cgi?id=196443
And there are reports for Fedora from July: https://bugzilla.redhat.com/show_bug.cgi?id=1473242 and CentOS: https://bugs.centos.org/view.php?id=13964 and we internally track several dozen reports for RHEL bug https://bugzilla.redhat.com/show_bug.cgi?id=1525121
Enough? ;-)
This makes me wonder whether a -stable backport is really needed...
For some machines it definitely is. Won't hurt either, IMHO.
--nX
Hi,
I couldn't find the exact mail corresponding to the patch merged in v4.16-rc5 but commit 864b75f9d6b01 "mm/page_alloc: fix memmap_init_zone pageblock alignment" cause boot hang on my ARM64 platform.
Log:
[ 0.000000] NUMA: No NUMA configuration found
[ 0.000000] NUMA: Faking a node at [mem 0x0000000000000000-0x00000009ffffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0x9fffcb480-0x9fffccf7f]
[ 0.000000] Zone ranges:
[ 0.000000] DMA32 [mem 0x0000000080000000-0x00000000ffffffff]
[ 0.000000] Normal [mem 0x0000000100000000-0x00000009ffffffff]
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000080000000-0x00000000f8f9afff]
[ 0.000000] node 0: [mem 0x00000000f8f9b000-0x00000000f908ffff]
[ 0.000000] node 0: [mem 0x00000000f9090000-0x00000000f914ffff]
[ 0.000000] node 0: [mem 0x00000000f9150000-0x00000000f920ffff]
[ 0.000000] node 0: [mem 0x00000000f9210000-0x00000000f922ffff]
[ 0.000000] node 0: [mem 0x00000000f9230000-0x00000000f95bffff]
[ 0.000000] node 0: [mem 0x00000000f95c0000-0x00000000fe58ffff]
[ 0.000000] node 0: [mem 0x00000000fe590000-0x00000000fe5cffff]
[ 0.000000] node 0: [mem 0x00000000fe5d0000-0x00000000fe5dffff]
[ 0.000000] node 0: [mem 0x00000000fe5e0000-0x00000000fe62ffff]
[ 0.000000] node 0: [mem 0x00000000fe630000-0x00000000feffffff]
[ 0.000000] node 0: [mem 0x0000000880000000-0x00000009ffffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000080000000-0x00000009ffffffff]
On 12 March 2018 at 17:56, Sudeep Holla sudeep.holla@arm.com wrote:
Hi,
I couldn't find the exact mail corresponding to the patch merged in v4.16-rc5, but commit 864b75f9d6b01 "mm/page_alloc: fix memmap_init_zone pageblock alignment" causes a boot hang on my ARM64 platform.
I have also noticed this problem on hi6220 Hikey - arm64.
LKFT: linux-next: Hikey boot failed linux-next-20180308 https://bugs.linaro.org/show_bug.cgi?id=3676
- Naresh
On Mon, Mar 12, 2018 at 3:49 PM, Naresh Kamboju naresh.kamboju@linaro.org wrote:
On 12 March 2018 at 17:56, Sudeep Holla sudeep.holla@arm.com wrote:
Hi,
I couldn't find the exact mail corresponding to the patch merged in v4.16-rc5, but commit 864b75f9d6b01 "mm/page_alloc: fix memmap_init_zone pageblock alignment" causes a boot hang on my ARM64 platform.
I have also noticed this problem on hi6220 Hikey - arm64.
LKFT: linux-next: Hikey boot failed linux-next-20180308 https://bugs.linaro.org/show_bug.cgi?id=3676
- Naresh
Hmm, does it step back perhaps?
Can you check if below cures the boot hang?
--nX
~~~~
neelx@metal:~/nX/src/linux$ git diff
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3d974cb2a1a1..415571120bbd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5365,8 +5365,10 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 			 * the valid region but still depends on correct page
 			 * metadata.
 			 */
-			pfn = (memblock_next_valid_pfn(pfn, end_pfn) &
+			unsigned long next_pfn;
+			next_pfn = (memblock_next_valid_pfn(pfn, end_pfn) &
 					~(pageblock_nr_pages-1)) - 1;
+			pfn = max(next_pfn, pfn);
 #endif
 			continue;
 		}
~~~~
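The suspected "step back" can be reproduced with plain arithmetic. The sketch below is a user-space illustration under stated assumptions, not the kernel code: PAGEBLOCK_NR_PAGES is an assumed 512, the pfn values are made up, and skip_aligned() and skip_aligned_clamped() are hypothetical names for the expression before and after the diff above.

```c
#include <assert.h>

/* Assumed for illustration only: 512 pages per pageblock. */
#define PAGEBLOCK_NR_PAGES 512UL

/* Hypothetical helper mirroring the expression before the diff above:
 * align the next valid pfn down to its pageblock, then subtract one. */
static unsigned long skip_aligned(unsigned long next_valid)
{
	return (next_valid & ~(PAGEBLOCK_NR_PAGES - 1)) - 1;
}

/* Hypothetical helper mirroring the expression after the diff above:
 * the same computation, but never moving below the current pfn. */
static unsigned long skip_aligned_clamped(unsigned long pfn,
					  unsigned long next_valid)
{
	unsigned long next_pfn;

	/* Assumption of this sketch: the next valid pfn lies ahead of pfn. */
	assert(next_valid > pfn);

	next_pfn = skip_aligned(next_valid);
	return next_pfn > pfn ? next_pfn : pfn;	/* max(next_pfn, pfn) */
}
```

Take a made-up hole at pfn 0x1210 with the next valid pfn at 0x1280, both inside the pageblock starting at 0x1200. The unclamped expression yields 0x11ff, which is below the current pfn, so a loop that assigns it and then increments would revisit the same pfns indefinitely. The clamped version returns 0x1210 and the loop keeps moving forward. Memory layouts with many small, unaligned regions, like the one in the ARM64 log earlier in the thread, are where such a case could plausibly arise.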
On 12/03/18 16:51, Daniel Vacek wrote: [...]
Hmm, does it step back perhaps?
Can you check if below cures the boot hang?
Yes it does fix the boot hang.
On 12 March 2018 at 22:21, Daniel Vacek neelx@redhat.com wrote:
Hmm, does it step back perhaps?
Can you check if below cures the boot hang?
After applying this patch on linux-next, the boot hang problem is resolved. Now the hi6220-hikey is booting successfully. Thank you.
- Naresh
On Tue, Mar 13, 2018 at 7:34 AM, Naresh Kamboju naresh.kamboju@linaro.org wrote:
After applying this patch on linux-next the boot hang problem resolved. Now the hi6220-hikey is booting successfully. Thank you.
Thank you and Sudeep for testing. I've just sent Andrew a formal patch.
- Naresh
linux-stable-mirror@lists.linaro.org