The patch titled
Subject: mm/hugetlb: avoid hardcoding while checking if cma is enabled
has been added to the -mm tree. Its filename is
mm-hugetlb-avoid-hardcoding-while-checking-if-cma-is-enabled.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-hugetlb-avoid-hardcoding-while-…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-hugetlb-avoid-hardcoding-while-…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Barry Song <song.bao.hua@hisilicon.com>
Subject: mm/hugetlb: avoid hardcoding while checking if cma is enabled
hugetlb_cma[0] can be NULL for various reasons; for example, node 0 may have
no memory, so a NULL hugetlb_cma[0] doesn't necessarily mean CMA is not
enabled, and gigantic pages might have been reserved on other nodes. This
patch fixes a possible double reservation and CMA leak.
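To make the failure mode concrete (the sizes below are made up for
illustration): on a machine whose node 0 has no memory, a boot command line
such as

	hugetlb_cma=2G hugepagesz=1G hugepages=2

places the CMA area on some other node, so hugetlb_cma[0] stays NULL even
though CMA is in use, and the old IS_ENABLED(CONFIG_CMA) && hugetlb_cma[0]
test wrongly concludes that it is not.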
Link: http://lkml.kernel.org/r/20200710005726.36068-1-song.bao.hua@hisilicon.com
Fixes: cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages using cma")
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Acked-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/hugetlb.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/hugetlb.c~mm-hugetlb-avoid-hardcoding-while-checking-if-cma-is-enabled
+++ a/mm/hugetlb.c
@@ -46,6 +46,7 @@ unsigned int default_hstate_idx;
struct hstate hstates[HUGE_MAX_HSTATE];
static struct cma *hugetlb_cma[MAX_NUMNODES];
+static unsigned long hugetlb_cma_size __initdata;
/*
* Minimum page order among possible hugepage sizes, set to a proper value
@@ -2571,7 +2572,7 @@ static void __init hugetlb_hstate_alloc_
for (i = 0; i < h->max_huge_pages; ++i) {
if (hstate_is_gigantic(h)) {
- if (IS_ENABLED(CONFIG_CMA) && hugetlb_cma[0]) {
+ if (hugetlb_cma_size) {
pr_warn_once("HugeTLB: hugetlb_cma is enabled, skip boot time allocation\n");
break;
}
@@ -5654,7 +5655,6 @@ void move_hugetlb_state(struct page *old
}
#ifdef CONFIG_CMA
-static unsigned long hugetlb_cma_size __initdata;
static bool cma_reserve_called __initdata;
static int __init cmdline_parse_hugetlb_cma(char *p)
_
Patches currently in -mm which might be from song.bao.hua@hisilicon.com are
mm-hugetlb-avoid-hardcoding-while-checking-if-cma-is-enabled.patch
mm-hugetlb-split-hugetlb_cma-in-nodes-with-memory.patch
mm-cma-fix-the-name-of-cma-areas.patch
mm-hugetlb-fix-the-name-of-hugetlb-cma.patch
The patch titled
Subject: mm/hugetlb: avoid hardcoding while checking if cma is enabled
has been removed from the -mm tree. Its filename was
mm-hugetlb-avoid-hardcoding-while-checking-if-cma-is-enable.patch
This patch was dropped because an updated version will be merged
------------------------------------------------------
From: Barry Song <song.bao.hua@hisilicon.com>
Subject: mm/hugetlb: avoid hardcoding while checking if cma is enabled
hugetlb_cma[0] can be NULL for various reasons; for example, node 0 may have
no memory, so a NULL hugetlb_cma[0] doesn't necessarily mean CMA is not
enabled, and gigantic pages might have been reserved on other nodes.
Mike Kravetz said:
: Based on the code changes, I believe the following could happen:
: - Someone uses 'hugetlb_cma=' kernel command line parameter to reserve
: CMA for gigantic pages.
: - The system topology is such that no memory is on node 0. Therefore,
: no CMA can be reserved for gigantic pages on node 0. CMA is reserved
: on other nodes.
: - The user also specifies a number of gigantic pages to pre-allocate on
: the command line with hugepagesz=<gigantic_page_size> hugepages=<N>
: - The routine which allocates gigantic pages from the bootmem allocator
: will not detect CMA has been reserved as there is no memory on node 0.
: Therefore, pages will be pre-allocated from bootmem allocator as well
: as reserved in CMA.
:
: This double allocation (bootmem and CMA) is the worst case scenario. Not
: sure if this is what Barry saw, and I suspect this would rarely happen.
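As a trimmed sketch of the pre-fix loop in hugetlb_hstate_alloc_pages()
(simplified from the hunk below; most error handling omitted), the double
reservation happens because the CMA check only looks at node 0:

	for (i = 0; i < h->max_huge_pages; ++i) {
		if (hstate_is_gigantic(h)) {
			/* On a machine with no memory on node 0,
			 * hugetlb_cma[0] is NULL even though CMA areas
			 * exist on other nodes, so this test misses
			 * the CMA reservation... */
			if (IS_ENABLED(CONFIG_CMA) && hugetlb_cma[0])
				break;
			/* ...and gigantic pages are additionally
			 * pre-allocated from the bootmem allocator,
			 * reserving the memory twice. */
			if (!alloc_bootmem_huge_page(h))
				break;
		}
	}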
Link: http://lkml.kernel.org/r/20200707040204.30132-1-song.bao.hua@hisilicon.com
Fixes: cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages using cma")
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/hugetlb.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
--- a/mm/hugetlb.c~mm-hugetlb-avoid-hardcoding-while-checking-if-cma-is-enable
+++ a/mm/hugetlb.c
@@ -2546,6 +2546,20 @@ static void __init gather_bootmem_preall
}
}
+bool __init hugetlb_cma_enabled(void)
+{
+#ifdef CONFIG_CMA
+ int node;
+
+ for_each_online_node(node) {
+ if (hugetlb_cma[node])
+ return true;
+ }
+#endif
+
+ return false;
+}
+
static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
{
unsigned long i;
@@ -2571,7 +2585,7 @@ static void __init hugetlb_hstate_alloc_
for (i = 0; i < h->max_huge_pages; ++i) {
if (hstate_is_gigantic(h)) {
- if (IS_ENABLED(CONFIG_CMA) && hugetlb_cma[0]) {
+ if (hugetlb_cma_enabled()) {
pr_warn_once("HugeTLB: hugetlb_cma is enabled, skip boot time allocation\n");
break;
}
_
Patches currently in -mm which might be from song.bao.hua@hisilicon.com are
mm-hugetlb-split-hugetlb_cma-in-nodes-with-memory.patch
mm-cma-fix-the-name-of-cma-areas.patch
mm-hugetlb-fix-the-name-of-hugetlb-cma.patch
mm-hugetlb-avoid-hardcoding-while-checking-if-cma-is-enabled.patch
The patch titled
Subject: mm/memory_hotplug: fix unpaired mem_hotplug_begin/done
has been added to the -mm tree. Its filename is
mm-memory_hotplug-fix-unpaired-mem_hotplug_begin-done.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-memory_hotplug-fix-unpaired-mem…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-memory_hotplug-fix-unpaired-mem…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Jia He <justin.he@arm.com>
Subject: mm/memory_hotplug: fix unpaired mem_hotplug_begin/done
When check_memblock_offlined_cb() returns a failing rc (e.g. because the
memblock is still online at that time), mem_hotplug_begin/done ends up
unpaired, which triggers a warning:
Call Trace:
percpu_up_write+0x33/0x40
try_remove_memory+0x66/0x120
? _cond_resched+0x19/0x30
remove_memory+0x2b/0x40
dev_dax_kmem_remove+0x36/0x72 [kmem]
device_release_driver_internal+0xf0/0x1c0
device_release_driver+0x12/0x20
bus_remove_device+0xe1/0x150
device_del+0x17b/0x3e0
unregister_dev_dax+0x29/0x60
devm_action_release+0x15/0x20
release_nodes+0x19a/0x1e0
devres_release_all+0x3f/0x50
device_release_driver_internal+0x100/0x1c0
driver_detach+0x4c/0x8f
bus_remove_driver+0x5c/0xd0
driver_unregister+0x31/0x50
dax_pmem_exit+0x10/0xfe0 [dax_pmem]
Link: http://lkml.kernel.org/r/20200710031619.18762-3-justin.he@arm.com
Fixes: f1037ec0cc8a ("mm/memory_hotplug: fix remove_memory() lockdep splat")
Signed-off-by: Jia He <justin.he@arm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org> [5.6+]
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chuhong Yuan <hslester96@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Cc: Kaly Xin <Kaly.Xin@arm.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memory_hotplug.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
--- a/mm/memory_hotplug.c~mm-memory_hotplug-fix-unpaired-mem_hotplug_begin-done
+++ a/mm/memory_hotplug.c
@@ -1756,7 +1756,7 @@ static int __ref try_remove_memory(int n
*/
rc = walk_memory_blocks(start, size, NULL, check_memblock_offlined_cb);
if (rc)
- goto done;
+ return rc;
/* remove memmap entry */
firmware_map_remove(start, start + size, "System RAM");
@@ -1780,9 +1780,8 @@ static int __ref try_remove_memory(int n
try_offline_node(nid);
-done:
mem_hotplug_done();
- return rc;
+ return 0;
}
/**
_
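For reference, a minimal sketch of the locking pattern after the fix above
(heavily trimmed from try_remove_memory(); only the begin/done pairing is
shown): the fallible offline check returns directly, so mem_hotplug_done()
can no longer run on a path that never called mem_hotplug_begin().

	static int __ref try_remove_memory(int nid, u64 start, u64 size)
	{
		int rc;

		/* This check can fail before the hotplug lock is taken;
		 * bailing out here must not run mem_hotplug_done(). */
		rc = walk_memory_blocks(start, size, NULL,
					check_memblock_offlined_cb);
		if (rc)
			return rc;

		mem_hotplug_begin();
		/* ... arch_remove_memory(), try_offline_node(), ... */
		mem_hotplug_done();	/* paired with the begin above */
		return 0;
	}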
Patches currently in -mm which might be from justin.he@arm.com are
mm-memory_hotplug-introduce-default-dummy-memory_add_physaddr_to_nid.patch
mm-memory_hotplug-fix-unpaired-mem_hotplug_begin-done.patch
VMAs with the VM_GROWSDOWN or VM_GROWSUP flag set can change their size under
mmap_read_lock(). This can lead to a race with __do_munmap():
	Thread A				Thread B
__do_munmap()
  detach_vmas_to_be_unmapped()
  mmap_write_downgrade()
					expand_downwards()
					  vma->vm_start = address;
					  // The VMA now overlaps with
					  // VMAs detached by the Thread A

					// page fault populates expanded part
					// of the VMA
unmap_region()
  // Zaps pagetables partly
  // populated by Thread B
A similar race exists for expand_upwards().
The fix is to avoid downgrading mmap_lock in __do_munmap() if the detached
VMAs are next to a VM_GROWSDOWN or VM_GROWSUP VMA.
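For background, the VMAs Thread B expands in the diagram are stack-like
mappings. A minimal userspace sketch of such a mapping (a glibc-assuming
illustration, not part of the fix; exact growth behavior also depends on the
kernel's stack guard handling):

	#define _GNU_SOURCE	/* MAP_GROWSDOWN, MAP_ANONYMOUS */
	#include <sys/mman.h>

	int main(void)
	{
		/* A stack-like VMA: the kernel may expand it downwards
		 * (expand_downwards()) on a fault below vm_start. */
		char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS | MAP_GROWSDOWN,
			       -1, 0);
		if (p == MAP_FAILED)
			return 1;

		p[-1] = 1;	/* fault just below the VMA grows it */
		return 0;
	}

Growth like this needs only mmap_read_lock(), which is why downgrading the
lock in __do_munmap() next to such a VMA is unsafe.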
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Jann Horn <jannh@google.com>
Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")
Cc: <stable@vger.kernel.org> # 4.20
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
---
mm/mmap.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index 59a4682ebf3f..71df4b36b42a 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2620,7 +2620,7 @@ static void unmap_region(struct mm_struct *mm,
* Create a list of vma's touched by the unmap, removing them from the mm's
* vma list as we go..
*/
-static void
+static bool
detach_vmas_to_be_unmapped(struct mm_struct *mm, struct vm_area_struct *vma,
struct vm_area_struct *prev, unsigned long end)
{
@@ -2645,6 +2645,17 @@ detach_vmas_to_be_unmapped(struct mm_struct *mm, struct vm_area_struct *vma,
/* Kill the cache */
vmacache_invalidate(mm);
+
+ /*
+ * Do not downgrade mmap_sem if we are next to VM_GROWSDOWN or
+ * VM_GROWSUP VMA. Such VMAs can change their size under
+ * down_read(mmap_sem) and collide with the VMA we are about to unmap.
+ */
+ if (vma && (vma->vm_flags & VM_GROWSDOWN))
+ return false;
+ if (prev && (prev->vm_flags & VM_GROWSUP))
+ return false;
+ return true;
}
/*
@@ -2825,7 +2836,8 @@ int __do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
}
/* Detach vmas from rbtree */
- detach_vmas_to_be_unmapped(mm, vma, prev, end);
+ if (!detach_vmas_to_be_unmapped(mm, vma, prev, end))
+ downgrade = false;
if (downgrade)
mmap_write_downgrade(mm);
--
2.26.2