I have posted a patch for the above issue:
http://lkml.kernel.org/r/20180716151630.770-1-pasha.tatashin@oracle.com

On Mon, Jul 16, 2018 at 9:15 AM Pavel Tatashin pasha.tatashin@oracle.com wrote:
I have figured out what is going on with x86-32. Since it has a FLATMEM memory layout, the memmap is now allocated after zero_resv_unavail() has already run.
Now, we have something like this:
zero_resv_unavail()
free_area_init_node()
    #ifdef CONFIG_FLAT_NODE_MEM_MAP
    alloc_node_mem_map()
    #endif
At the time when zero_resv_unavail() is called, memmap for FLAT_NODE_MEM_MAP is not yet allocated. On the other hand, alloc_node_mem_map() calls memblock_virt_alloc_node_nopanic() which calls memset(0), so zero_resv_unavail() is not needed for this layout.
The fix is:
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5d800d61ddb7..9ec34218713b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6847,7 +6847,9 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
 	/* Initialise every node */
 	mminit_verify_pageflags_layout();
 	setup_nr_node_ids();
+#ifndef CONFIG_FLAT_NODE_MEM_MAP
 	zero_resv_unavail();
+#endif
 	for_each_online_node(nid) {
 		pg_data_t *pgdat = NODE_DATA(nid);
 		free_area_init_node(nid, NULL,
This is just a temporary fix. I will do a proper fix later, when I get rid of zero_resv_unavail(), but that will require more thought on how to ensure that no section in memmap is skipped while we go through memmap_init_zone().
Should I submit an updated patch for "mm: zero unavailable pages before memmap init", or just this incremental fix?
Thank you, Pavel
On Mon, Jul 16, 2018 at 7:56 AM Pavel Tatashin pasha.tatashin@oracle.com wrote:
I have reproduced the problem on mainline. Use x86_32 defconfig + qemu, and the problem reproduces immediately. I will send an update once I figure out what is going on.

Pavel

On Mon, Jul 16, 2018 at 7:02 AM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Mon, Jul 16, 2018 at 11:54:51AM +0100, Mark Brown wrote:
On Mon, Jul 16, 2018 at 11:40:06AM +0100, Guillaume Tucker wrote:
On 15/07/18 01:32, kernelci.org bot wrote:
mainline/master boot: 177 boots: 2 failed, 174 passed with 1 conflict (v4.18-rc4-160-gf353078f028f)
Full Boot Summary: https://kernelci.org/boot/all/job/mainline/branch/master/kernel/v4.18-rc4-16...
Full Build Summary: https://kernelci.org/build/mainline/branch/master/kernel/v4.18-rc4-160-gf353...

Tree: mainline
Branch: master
Git Describe: v4.18-rc4-160-gf353078f028f
Git Commit: f353078f028fbfe9acd4b747b4a19c69ef6846cd
Git URL: http://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Tested: 67 unique boards, 25 SoC families, 21 builds out of 199
Boot Regressions Detected:
[...]
x86:

    i386_defconfig:
        x86-celeron:
            lab-mhart: new failure (last pass: v4.18-rc4-147-g2db39a2f491a)
        x86-pentium4:
            lab-mhart: new failure (last pass: v4.18-rc4-147-g2db39a2f491a)
Please see below an automated bisection report for this regression. Several bisections were run on other x86 platforms with i386_defconfig on a few revisions up to v4.18-rc5, they all reached the same "bad" commit.
Unfortunately there isn't much to learn from the kernelci.org boot logs as the kernel seems to crash very early on:
https://kernelci.org/boot/all/job/mainline/branch/master/kernel/v4.18-rc5/
https://storage.kernelci.org/mainline/master/v4.18-rc4-160-gf353078f028f/x86/i386_defconfig/lab-mhart/lava-x86-celeron.html
It looks like stable-rc/linux-4.17.y is also broken with i386_defconfig, which tends to confirm the "bad" commit found by the automated bisection which was applied there as well:
https://kernelci.org/boot/all/job/stable-rc/branch/linux-4.17.y/kernel/v4.17.6-68-gbc0bd9e05fa1/
Adding Greg directly to the CCs due to the stable impact, not deleting context for his benefit.
Hey, I read all stable emails, who else would? :)
The automated bisection on kernelci.org is still quite new, so please take the results with a pinch of salt as the "bad" commit found may not be the actual root cause of the boot failure.
Hope this helps!
Best wishes, Guillaume
--------------------------------------8<--------------------------------------
Bisection result for mainline/master (v4.18-rc4-160-gf353078f028f) on x86-celeron
Good:  2db39a2f491a Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Bad:   f353078f028f Merge branch 'akpm' (patches from Andrew)
Found: e181ae0c5db9 mm: zero unavailable pages before memmap init

Checks:
  revert: PASS
  verify: PASS

Parameters:
  Tree:   mainline
  URL:    http://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
  Branch: master
  Target: x86-celeron
  Lab:    lab-mhart
  Config: i386_defconfig
  Plan:   boot
Breaking commit found:
commit e181ae0c5db9544de9c53239eb22bc012ce75033
Author: Pavel Tatashin pasha.tatashin@oracle.com
Date:   Sat Jul 14 09:15:07 2018 -0400

    mm: zero unavailable pages before memmap init

    We must zero struct pages for memory that is not backed by physical
    memory, or kernel does not have access to.

    Recently, there was a change which zeroed all memmap for all holes in
    e820. Unfortunately, it introduced a bug that is discussed here:

      https://www.spinics.net/lists/linux-mm/msg156764.html

    Linus, also saw this bug on his machine, and confirmed that reverting
    commit 124049decbb1 ("x86/e820: put !E820_TYPE_RAM regions into
    memblock.reserved") fixes the issue.

    The problem is that we incorrectly zero some struct pages after they
    were setup.

    The fix is to zero unavailable struct pages prior to initializing of
    struct pages.

    A more detailed fix should come later that would avoid double zeroing
    cases: one in __init_single_page(), the other one in
    zero_resv_unavail().

    Fixes: 124049decbb1 ("x86/e820: put !E820_TYPE_RAM regions into memblock.reserved")
    Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1521100f1e63..5d800d61ddb7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6847,6 +6847,7 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
 	/* Initialise every node */
 	mminit_verify_pageflags_layout();
 	setup_nr_node_ids();
+	zero_resv_unavail();
 	for_each_online_node(nid) {
 		pg_data_t *pgdat = NODE_DATA(nid);
 		free_area_init_node(nid, NULL,
@@ -6857,7 +6858,6 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn)
 			node_set_state(nid, N_MEMORY);
 		check_for_memory(pgdat, nid);
 	}
-	zero_resv_unavail();
 }
 
 static int __init cmdline_parse_core(char *p, unsigned long *core,
@@ -7033,9 +7033,9 @@ void __init set_dma_reserve(unsigned long new_dma_reserve)
 
 void __init free_area_init(unsigned long *zones_size)
 {
+	zero_resv_unavail();
 	free_area_init_node(0, zones_size,
 			__pa(PAGE_OFFSET) >> PAGE_SHIFT, NULL);
-	zero_resv_unavail();
 }
 
 static int page_alloc_cpu_dead(unsigned int cpu)
So this patch breaks i386, ick. I'll wait for the fix to hit Linus's tree, as for now it's a bit more important to have the large majority of the x86-64 boxes fixed by this patch.
thanks,
greg k-h