On Fri, 2 Mar 2018 23:55:29 +0100 Daniel Vacek neelx@redhat.com wrote:
On Fri, Mar 2, 2018 at 9:59 PM, akpm@linux-foundation.org wrote:
The patch titled
     Subject: mm/page_alloc: fix memmap_init_zone pageblock alignment
has been added to the -mm tree.  Its filename is
     mm-page_alloc-fix-memmap_init_zone-pageblock-alignment.patch
This patch should soon appear at
   http://ozlabs.org/~akpm/mmots/broken-out/mm-page_alloc-fix-memmap_init_zone-...
and later at
   http://ozlabs.org/~akpm/mmotm/broken-out/mm-page_alloc-fix-memmap_init_zone-...
Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/SubmitChecklist when testing your code ***
Documentation/process/submit-checklist.rst nowadays, btw.
thanks.
The -mm tree is included in linux-next and is updated there every 3-4 working days.
Actually, this should go directly to v4.16-rc4. Shall I cc Linus on the v3 I'm about to send? Or do you think it's fine to go through -next and -stable, and we keep it like this?
I normally will sit on these things for a week or so before sending to Linus, to make sure they've had a cycle in -next and that reviewers have had time to consider the patch.
Take a look in http://ozlabs.org/~akpm/mmots/series, at the "mainline-urgent" and "mainline-later" sections. Those are the patches which I'm planning on sending to Linus for this -rc cycle.
On this particular patch I'd like to see some reviews and acks - I'm not terribly familiar with the zone initialization code.
Please be aware that this code later gets altered by mm-page_alloc-skip-over-regions-of-invalid-pfns-on-uma.patch, which is below. I did some reject-fixing on this one.
From: Eugeniu Rosca erosca@de.adit-jv.com
Subject: mm: page_alloc: skip over regions of invalid pfns on UMA
As a result of bisecting the v4.10..v4.11 commit range, it was determined that commits [1] and [2] are together responsible for a ~140ms early-startup improvement on the Rcar-H3-ES20 arm64 platform.
Since the R-Car Gen3 family is not NUMA, we don't define CONFIG_NUMA in the rcar3 defconfig (which also reduces the KNL binary image by ~64KB), but disabling NUMA is also how the boot-time improvement from [2] is lost, since that optimization is only compiled in under CONFIG_HAVE_MEMBLOCK_NODE_MAP.
This patch makes optimization [2] available on UMA systems which provide support for CONFIG_HAVE_MEMBLOCK.
Testing this change on Rcar H3-ES20-ULCB with a v4.15-rc9 KNL and the vanilla arm64 defconfig + NUMA=n shows a speed-up of ~139ms (from ~174ms [3] to ~35ms [4]) in the execution of memmap_init_zone().
No boot-time improvement is observed on the Apollo Lake SoC.
[1] commit 0f84832fb8f9 ("arm64: defconfig: Enable NUMA and NUMA_BALANCING")
[2] commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns where possible")
[3] 174ms spent in memmap_init_zone() on H3ULCB w/o this patch (NUMA=n)
[    2.643685] On node 0 totalpages: 1015808
[    2.643688]   DMA zone: 3584 pages used for memmap
[    2.643691]   DMA zone: 0 pages reserved
[    2.643693]   DMA zone: 229376 pages, LIFO batch:31
[    2.643696] > memmap_init_zone
[    2.663628] < memmap_init_zone (19.932 ms)
[    2.663632]   Normal zone: 12288 pages used for memmap
[    2.663635]   Normal zone: 786432 pages, LIFO batch:31
[    2.663637] > memmap_init_zone
[    2.818012] < memmap_init_zone (154.375 ms)
[    2.818041] psci: probing for conduit method from DT.
[4] 35ms spent in memmap_init_zone() on H3ULCB with this patch (NUMA=n)
[    2.677202] On node 0 totalpages: 1015808
[    2.677205]   DMA zone: 3584 pages used for memmap
[    2.677208]   DMA zone: 0 pages reserved
[    2.677211]   DMA zone: 229376 pages, LIFO batch:31
[    2.677213] > memmap_init_zone
[    2.684378] < memmap_init_zone (7.165 ms)
[    2.684382]   Normal zone: 12288 pages used for memmap
[    2.684385]   Normal zone: 786432 pages, LIFO batch:31
[    2.684387] > memmap_init_zone
[    2.712556] < memmap_init_zone (28.169 ms)
[    2.712584] psci: probing for conduit method from DT.
[mhocko@kernel.org: fix build]
Link: http://lkml.kernel.org/r/20180222072037.GC30681@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/20180217222846.29589-1-rosca.eugeniu@gmail.com
Signed-off-by: Eugeniu Rosca erosca@de.adit-jv.com
Signed-off-by: Michal Hocko mhocko@suse.com
Reported-by: Eugeniu Rosca erosca@de.adit-jv.com
Tested-by: Eugeniu Rosca erosca@de.adit-jv.com
Acked-by: Michal Hocko mhocko@suse.com
Signed-off-by: Andrew Morton akpm@linux-foundation.org
---
 include/linux/memblock.h |    3 +-
 mm/memblock.c            |   54 ++++++++++++++++++------------------
 mm/page_alloc.c          |    2 -
 3 files changed, 30 insertions(+), 29 deletions(-)
diff -puN include/linux/memblock.h~mm-page_alloc-skip-over-regions-of-invalid-pfns-on-uma include/linux/memblock.h
--- a/include/linux/memblock.h~mm-page_alloc-skip-over-regions-of-invalid-pfns-on-uma
+++ a/include/linux/memblock.h
@@ -187,7 +187,6 @@ int memblock_search_pfn_nid(unsigned lon
 			    unsigned long *end_pfn);
 void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
 			  unsigned long *out_end_pfn, int *out_nid);
-unsigned long memblock_next_valid_pfn(unsigned long pfn, unsigned long max_pfn);
 
 /**
  * for_each_mem_pfn_range - early memory pfn range iterator
@@ -204,6 +203,8 @@ unsigned long memblock_next_valid_pfn(un
 	     i >= 0; __next_mem_pfn_range(&i, nid, p_start, p_end, p_nid))
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
+unsigned long memblock_next_valid_pfn(unsigned long pfn, unsigned long max_pfn);
+
 /**
  * for_each_free_mem_range - iterate through free memblock areas
  * @i: u64 used as loop variable
diff -puN mm/memblock.c~mm-page_alloc-skip-over-regions-of-invalid-pfns-on-uma mm/memblock.c
--- a/mm/memblock.c~mm-page_alloc-skip-over-regions-of-invalid-pfns-on-uma
+++ a/mm/memblock.c
@@ -1101,33 +1101,6 @@ void __init_memblock __next_mem_pfn_rang
 		*out_nid = r->nid;
 }
-unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
-{
-	struct memblock_type *type = &memblock.memory;
-	unsigned int right = type->cnt;
-	unsigned int mid, left = 0;
-	phys_addr_t addr = PFN_PHYS(++pfn);
-
-	do {
-		mid = (right + left) / 2;
-
-		if (addr < type->regions[mid].base)
-			right = mid;
-		else if (addr >= (type->regions[mid].base +
-				  type->regions[mid].size))
-			left = mid + 1;
-		else {
-			/* addr is within the region, so pfn is valid */
-			return pfn;
-		}
-	} while (left < right);
-
-	if (right == type->cnt)
-		return -1UL;
-	else
-		return PHYS_PFN(type->regions[right].base);
-}
-
 /**
  * memblock_set_node - set node ID on memblock regions
  * @base: base of area to set node ID for
@@ -1159,6 +1132,33 @@ int __init_memblock memblock_set_node(ph
 }
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
+unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
+{
+	struct memblock_type *type = &memblock.memory;
+	unsigned int right = type->cnt;
+	unsigned int mid, left = 0;
+	phys_addr_t addr = PFN_PHYS(++pfn);
+
+	do {
+		mid = (right + left) / 2;
+
+		if (addr < type->regions[mid].base)
+			right = mid;
+		else if (addr >= (type->regions[mid].base +
+				  type->regions[mid].size))
+			left = mid + 1;
+		else {
+			/* addr is within the region, so pfn is valid */
+			return pfn;
+		}
+	} while (left < right);
+
+	if (right == type->cnt)
+		return -1UL;
+	else
+		return PHYS_PFN(type->regions[right].base);
+}
+
 static phys_addr_t __init memblock_alloc_range_nid(phys_addr_t size,
 					phys_addr_t align, phys_addr_t start,
 					phys_addr_t end, int nid, ulong flags)
diff -puN mm/page_alloc.c~mm-page_alloc-skip-over-regions-of-invalid-pfns-on-uma mm/page_alloc.c
--- a/mm/page_alloc.c~mm-page_alloc-skip-over-regions-of-invalid-pfns-on-uma
+++ a/mm/page_alloc.c
@@ -5440,7 +5440,6 @@ void __meminit memmap_init_zone(unsigned
 			goto not_early;
 		if (!early_pfn_valid(pfn)) {
-#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
 			/*
 			 * Skip to the pfn preceding the next valid one (or
 			 * end_pfn), such that we hit a valid pfn (or end_pfn)
@@ -5450,6 +5449,7 @@ void __meminit memmap_init_zone(unsigned
 			 * the valid region but still depends on correct page
 			 * metadata.
 			 */
+#ifdef CONFIG_HAVE_MEMBLOCK
 			pfn = (memblock_next_valid_pfn(pfn) &
 			       ~(pageblock_nr_pages-1)) - 1;
 #endif
_