The patch titled
Subject: mm/hmm: hmm_pfns_bad() was accessing wrong struct
has been added to the -mm tree. Its filename is
mm-hmm-hmm_pfns_bad-was-accessing-wrong-struct.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-hmm-hmm_pfns_bad-was-accessing-…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-hmm-hmm_pfns_bad-was-accessing-…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Jérôme Glisse <jglisse(a)redhat.com>
Subject: mm/hmm: hmm_pfns_bad() was accessing wrong struct
The private field of the mm_walk struct points to an hmm_vma_walk struct,
not directly to the desired hmm_range struct. Fix hmm_pfns_bad() to go
through hmm_vma_walk to get the proper struct pointer.
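For context, the relationship between the two structures looks roughly
like this (a simplified sketch; the real definitions in mm/hmm.c carry
additional walk-state fields):

	struct hmm_vma_walk {
		struct hmm_range	*range;	/* what hmm_pfns_bad() needs */
		/* ... fault/last-address bookkeeping omitted ... */
	};

	/* mm_walk.private is set up to point at a struct hmm_vma_walk, so
	 * walker callbacks must dereference it to reach the hmm_range. */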
Link: http://lkml.kernel.org/r/20180323005527.758-6-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse(a)redhat.com>
Cc: Evgeny Baskakov <ebaskakov(a)nvidia.com>
Cc: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: Mark Hairgrove <mhairgrove(a)nvidia.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hmm.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff -puN mm/hmm.c~mm-hmm-hmm_pfns_bad-was-accessing-wrong-struct mm/hmm.c
--- a/mm/hmm.c~mm-hmm-hmm_pfns_bad-was-accessing-wrong-struct
+++ a/mm/hmm.c
@@ -336,7 +336,8 @@ static int hmm_pfns_bad(unsigned long ad
unsigned long end,
struct mm_walk *walk)
{
- struct hmm_range *range = walk->private;
+ struct hmm_vma_walk *hmm_vma_walk = walk->private;
+ struct hmm_range *range = hmm_vma_walk->range;
hmm_pfn_t *pfns = range->pfns;
unsigned long i;
_
Patches currently in -mm which might be from jglisse(a)redhat.com are
mm-hmm-fix-header-file-if-else-endif-maze-v2.patch
mm-hmm-unregister-mmu_notifier-when-last-hmm-client-quit-v3.patch
mm-hmm-hmm_pfns_bad-was-accessing-wrong-struct.patch
mm-hmm-use-struct-for-hmm_vma_fault-hmm_vma_get_pfns-parameters-v2.patch
mm-hmm-remove-hmm_pfn_read-flag-and-ignore-peculiar-architecture-v2.patch
mm-hmm-use-uint64_t-for-hmm-pfn-instead-of-defining-hmm_pfn_t-to-ulong-v2.patch
mm-hmm-cleanup-special-vma-handling-vm_special.patch
mm-hmm-do-not-differentiate-between-empty-entry-or-missing-directory-v3.patch
mm-hmm-rename-hmm_pfn_device_unaddressable-to-hmm_pfn_device_private.patch
mm-hmm-move-hmm_pfns_clear-closer-to-where-it-is-use.patch
mm-hmm-factor-out-pte-and-pmd-handling-to-simplify-hmm_vma_walk_pmd-v2.patch
mm-hmm-change-hmm_vma_fault-to-allow-write-fault-on-page-basis.patch
mm-hmm-use-device-driver-encoding-for-hmm-pfn-v2.patch
The patch titled
Subject: mm/hmm: fix header file if/else/endif maze
has been added to the -mm tree. Its filename is
mm-hmm-fix-header-file-if-else-endif-maze-v2.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-hmm-fix-header-file-if-else-end…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-hmm-fix-header-file-if-else-end…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Jérôme Glisse <jglisse(a)redhat.com>
Subject: mm/hmm: fix header file if/else/endif maze
The #if/#else/#endif blocks for IS_ENABLED(CONFIG_HMM) were wrong. Because
of this, after multiple inclusions of the header there were multiple
definitions of both hmm_mm_init() and hmm_mm_destroy(), leading to a build
failure if HMM was enabled (CONFIG_HMM set).
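After the fix, the tail of include/linux/hmm.h nests as follows (a sketch
distilled from the diff below; unrelated declarations elided):

	#if IS_ENABLED(CONFIG_HMM)
	/* ... public HMM API ... */

	/* Below are for HMM internal use only! Not to be used by device driver! */
	void hmm_mm_destroy(struct mm_struct *mm);
	static inline void hmm_mm_init(struct mm_struct *mm)
	{
		mm->hmm = NULL;
	}
	#else /* IS_ENABLED(CONFIG_HMM) */
	static inline void hmm_mm_destroy(struct mm_struct *mm) {}
	static inline void hmm_mm_init(struct mm_struct *mm) {}
	#endif /* IS_ENABLED(CONFIG_HMM) */
	#endif /* LINUX_HMM_H */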
Link: http://lkml.kernel.org/r/20180323005527.758-3-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse(a)redhat.com>
Acked-by: Balbir Singh <bsingharora(a)gmail.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Evgeny Baskakov <ebaskakov(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/hmm.h | 9 +--------
1 file changed, 1 insertion(+), 8 deletions(-)
diff -puN include/linux/hmm.h~mm-hmm-fix-header-file-if-else-endif-maze-v2 include/linux/hmm.h
--- a/include/linux/hmm.h~mm-hmm-fix-header-file-if-else-endif-maze-v2
+++ a/include/linux/hmm.h
@@ -498,23 +498,16 @@ struct hmm_device {
struct hmm_device *hmm_device_new(void *drvdata);
void hmm_device_put(struct hmm_device *hmm_device);
#endif /* CONFIG_DEVICE_PRIVATE || CONFIG_DEVICE_PUBLIC */
-#endif /* IS_ENABLED(CONFIG_HMM) */
/* Below are for HMM internal use only! Not to be used by device driver! */
-#if IS_ENABLED(CONFIG_HMM_MIRROR)
void hmm_mm_destroy(struct mm_struct *mm);
static inline void hmm_mm_init(struct mm_struct *mm)
{
mm->hmm = NULL;
}
-#else /* IS_ENABLED(CONFIG_HMM_MIRROR) */
-static inline void hmm_mm_destroy(struct mm_struct *mm) {}
-static inline void hmm_mm_init(struct mm_struct *mm) {}
-#endif /* IS_ENABLED(CONFIG_HMM_MIRROR) */
-
-
#else /* IS_ENABLED(CONFIG_HMM) */
static inline void hmm_mm_destroy(struct mm_struct *mm) {}
static inline void hmm_mm_init(struct mm_struct *mm) {}
+#endif /* IS_ENABLED(CONFIG_HMM) */
#endif /* LINUX_HMM_H */
_
Patches currently in -mm which might be from jglisse(a)redhat.com are
mm-hmm-fix-header-file-if-else-endif-maze-v2.patch
mm-hmm-unregister-mmu_notifier-when-last-hmm-client-quit-v3.patch
mm-hmm-hmm_pfns_bad-was-accessing-wrong-struct.patch
mm-hmm-use-struct-for-hmm_vma_fault-hmm_vma_get_pfns-parameters-v2.patch
mm-hmm-remove-hmm_pfn_read-flag-and-ignore-peculiar-architecture-v2.patch
mm-hmm-use-uint64_t-for-hmm-pfn-instead-of-defining-hmm_pfn_t-to-ulong-v2.patch
mm-hmm-cleanup-special-vma-handling-vm_special.patch
mm-hmm-do-not-differentiate-between-empty-entry-or-missing-directory-v3.patch
mm-hmm-rename-hmm_pfn_device_unaddressable-to-hmm_pfn_device_private.patch
mm-hmm-move-hmm_pfns_clear-closer-to-where-it-is-use.patch
mm-hmm-factor-out-pte-and-pmd-handling-to-simplify-hmm_vma_walk_pmd-v2.patch
mm-hmm-change-hmm_vma_fault-to-allow-write-fault-on-page-basis.patch
mm-hmm-use-device-driver-encoding-for-hmm-pfn-v2.patch
From: Toshi Kani <toshi.kani(a)hpe.com>
Subject: mm/vmalloc: add interfaces to free unmapped page table
On architectures with CONFIG_HAVE_ARCH_HUGE_VMAP set, ioremap() may create
pud/pmd mappings. A kernel panic was observed on arm64 systems with
Cortex-A75 in the following sequence, as described by Hanjun Guo:
1. ioremap a 4K size; a valid page table is built,
2. iounmap it; pte0 is set to 0;
3. ioremap the same address with a 2M size; pgd/pmd is unchanged,
then a new value is set for the pmd;
4. pte0 is leaked;
5. the CPU may hit an exception because the old pmd is still in the TLB,
which leads to a kernel panic.
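From the driver side, the triggering sequence corresponds roughly to the
following (a hedged sketch; io_phys and the sizes are only illustrative,
and error handling is omitted):

	void __iomem *p;

	p = ioremap(io_phys, SZ_4K);	/* step 1: pte-level mapping built   */
	iounmap(p);			/* step 2: pte cleared, pmd left     */
	p = ioremap(io_phys, SZ_2M);	/* step 3: a huge pmd is written over
					 * the stale pmd; the old pte page is
					 * leaked (step 4) and the stale TLB
					 * entry may later trigger a fault
					 * (step 5) */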
This panic is not reproducible on x86: INVLPG, called from iounmap(),
purges all levels of entries associated with the purged address there.
x86 still leaks the lower-level page table page, however.
The patch changes the ioremap path to free unmapped page table(s) since
doing so in the unmap path has the following issues:
- The iounmap() path is shared with vunmap(). Since vmap() only
supports pte mappings, making vunmap() free a pte page is an
overhead for regular vmap users as they do not need a pte page
freed up.
- Checking if all entries in a pte page are cleared in the unmap path
is racy, and serializing this check is expensive.
- The unmap path calls free_vmap_area_noflush() to do lazy TLB purges.
Clearing a pud/pmd entry before the lazy TLB purges needs an extra TLB
purge.
Add two interfaces, pud_free_pmd_page() and pmd_free_pte_page(), which
clear a given pud/pmd entry and free up the page holding the lower level
entries.
This patch implements them as stub functions on x86 and arm64, which work
as a workaround.
[akpm(a)linux-foundation.org: fix typo in pmd_free_pte_page() stub]
Link: http://lkml.kernel.org/r/20180314180155.19492-2-toshi.kani@hpe.com
Fixes: e61ce6ade404e ("mm: change ioremap to set up huge I/O mappings")
Reported-by: Lei Li <lious.lilei(a)hisilicon.com>
Signed-off-by: Toshi Kani <toshi.kani(a)hpe.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Wang Xuefeng <wxf.wang(a)hisilicon.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: Hanjun Guo <guohanjun(a)huawei.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Borislav Petkov <bp(a)suse.de>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Chintan Pandya <cpandya(a)codeaurora.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/arm64/mm/mmu.c | 10 ++++++++++
arch/x86/mm/pgtable.c | 24 ++++++++++++++++++++++++
include/asm-generic/pgtable.h | 10 ++++++++++
lib/ioremap.c | 6 ++++--
4 files changed, 48 insertions(+), 2 deletions(-)
diff -puN arch/arm64/mm/mmu.c~mm-vmalloc-add-interfaces-to-free-unmapped-page-table arch/arm64/mm/mmu.c
--- a/arch/arm64/mm/mmu.c~mm-vmalloc-add-interfaces-to-free-unmapped-page-table
+++ a/arch/arm64/mm/mmu.c
@@ -972,3 +972,13 @@ int pmd_clear_huge(pmd_t *pmdp)
pmd_clear(pmdp);
return 1;
}
+
+int pud_free_pmd_page(pud_t *pud)
+{
+ return pud_none(*pud);
+}
+
+int pmd_free_pte_page(pmd_t *pmd)
+{
+ return pmd_none(*pmd);
+}
diff -puN arch/x86/mm/pgtable.c~mm-vmalloc-add-interfaces-to-free-unmapped-page-table arch/x86/mm/pgtable.c
--- a/arch/x86/mm/pgtable.c~mm-vmalloc-add-interfaces-to-free-unmapped-page-table
+++ a/arch/x86/mm/pgtable.c
@@ -702,4 +702,28 @@ int pmd_clear_huge(pmd_t *pmd)
return 0;
}
+
+/**
+ * pud_free_pmd_page - Clear pud entry and free pmd page.
+ * @pud: Pointer to a PUD.
+ *
+ * Context: The pud range has been unmaped and TLB purged.
+ * Return: 1 if clearing the entry succeeded. 0 otherwise.
+ */
+int pud_free_pmd_page(pud_t *pud)
+{
+ return pud_none(*pud);
+}
+
+/**
+ * pmd_free_pte_page - Clear pmd entry and free pte page.
+ * @pmd: Pointer to a PMD.
+ *
+ * Context: The pmd range has been unmaped and TLB purged.
+ * Return: 1 if clearing the entry succeeded. 0 otherwise.
+ */
+int pmd_free_pte_page(pmd_t *pmd)
+{
+ return pmd_none(*pmd);
+}
#endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */
diff -puN include/asm-generic/pgtable.h~mm-vmalloc-add-interfaces-to-free-unmapped-page-table include/asm-generic/pgtable.h
--- a/include/asm-generic/pgtable.h~mm-vmalloc-add-interfaces-to-free-unmapped-page-table
+++ a/include/asm-generic/pgtable.h
@@ -983,6 +983,8 @@ int pud_set_huge(pud_t *pud, phys_addr_t
int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot);
int pud_clear_huge(pud_t *pud);
int pmd_clear_huge(pmd_t *pmd);
+int pud_free_pmd_page(pud_t *pud);
+int pmd_free_pte_page(pmd_t *pmd);
#else /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
static inline int p4d_set_huge(p4d_t *p4d, phys_addr_t addr, pgprot_t prot)
{
@@ -1008,6 +1010,14 @@ static inline int pmd_clear_huge(pmd_t *
{
return 0;
}
+static inline int pud_free_pmd_page(pud_t *pud)
+{
+ return 0;
+}
+static inline int pmd_free_pte_page(pmd_t *pmd)
+{
+ return 0;
+}
#endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */
#ifndef __HAVE_ARCH_FLUSH_PMD_TLB_RANGE
diff -puN lib/ioremap.c~mm-vmalloc-add-interfaces-to-free-unmapped-page-table lib/ioremap.c
--- a/lib/ioremap.c~mm-vmalloc-add-interfaces-to-free-unmapped-page-table
+++ a/lib/ioremap.c
@@ -91,7 +91,8 @@ static inline int ioremap_pmd_range(pud_
if (ioremap_pmd_enabled() &&
((next - addr) == PMD_SIZE) &&
- IS_ALIGNED(phys_addr + addr, PMD_SIZE)) {
+ IS_ALIGNED(phys_addr + addr, PMD_SIZE) &&
+ pmd_free_pte_page(pmd)) {
if (pmd_set_huge(pmd, phys_addr + addr, prot))
continue;
}
@@ -117,7 +118,8 @@ static inline int ioremap_pud_range(p4d_
if (ioremap_pud_enabled() &&
((next - addr) == PUD_SIZE) &&
- IS_ALIGNED(phys_addr + addr, PUD_SIZE)) {
+ IS_ALIGNED(phys_addr + addr, PUD_SIZE) &&
+ pud_free_pmd_page(pud)) {
if (pud_set_huge(pud, phys_addr + addr, prot))
continue;
}
_
The patch titled
Subject: mm/vmscan: wake up flushers for legacy cgroups too
has been removed from the -mm tree. Its filename was
mm-vmscan-wake-up-flushers-for-legacy-cgroups-too.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Subject: mm/vmscan: wake up flushers for legacy cgroups too
Commit 726d061fbd36 ("mm: vmscan: kick flushers when we encounter dirty
pages on the LRU") added flusher invocation to shrink_inactive_list() when
many dirty pages on the LRU are encountered.
However, shrink_inactive_list() doesn't wake up flushers for legacy cgroup
reclaim, so the next commit bbef938429f5 ("mm: vmscan: remove old flusher
wakeup from direct reclaim path") removed the only source of flusher
wakeup in the legacy memcg reclaim path.
This leads to a premature OOM if there are too many dirty pages in the
cgroup:
# mkdir /sys/fs/cgroup/memory/test
# echo $$ > /sys/fs/cgroup/memory/test/tasks
# echo 50M > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
# dd if=/dev/zero of=tmp_file bs=1M count=100
Killed
dd invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=0
Call Trace:
dump_stack+0x46/0x65
dump_header+0x6b/0x2ac
oom_kill_process+0x21c/0x4a0
out_of_memory+0x2a5/0x4b0
mem_cgroup_out_of_memory+0x3b/0x60
mem_cgroup_oom_synchronize+0x2ed/0x330
pagefault_out_of_memory+0x24/0x54
__do_page_fault+0x521/0x540
page_fault+0x45/0x50
Task in /test killed as a result of limit of /test
memory: usage 51200kB, limit 51200kB, failcnt 73
memory+swap: usage 51200kB, limit 9007199254740988kB, failcnt 0
kmem: usage 296kB, limit 9007199254740988kB, failcnt 0
Memory cgroup stats for /test: cache:49632KB rss:1056KB rss_huge:0KB shmem:0KB
mapped_file:0KB dirty:49500KB writeback:0KB swap:0KB inactive_anon:0KB
active_anon:1168KB inactive_file:24760KB active_file:24960KB unevictable:0KB
Memory cgroup out of memory: Kill process 3861 (bash) score 88 or sacrifice child
Killed process 3876 (dd) total-vm:8484kB, anon-rss:1052kB, file-rss:1720kB, shmem-rss:0kB
oom_reaper: reaped process 3876 (dd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Wake up flushers in legacy cgroup reclaim too.
Link: http://lkml.kernel.org/r/20180315164553.17856-1-aryabinin@virtuozzo.com
Fixes: bbef938429f5 ("mm: vmscan: remove old flusher wakeup from direct reclaim path")
Signed-off-by: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Tested-by: Shakeel Butt <shakeelb(a)google.com>
Acked-by: Michal Hocko <mhocko(a)suse.cz>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmscan.c | 31 ++++++++++++++++---------------
1 file changed, 16 insertions(+), 15 deletions(-)
diff -puN mm/vmscan.c~mm-vmscan-wake-up-flushers-for-legacy-cgroups-too mm/vmscan.c
--- a/mm/vmscan.c~mm-vmscan-wake-up-flushers-for-legacy-cgroups-too
+++ a/mm/vmscan.c
@@ -1780,6 +1780,20 @@ shrink_inactive_list(unsigned long nr_to
set_bit(PGDAT_WRITEBACK, &pgdat->flags);
/*
+ * If dirty pages are scanned that are not queued for IO, it
+ * implies that flushers are not doing their job. This can
+ * happen when memory pressure pushes dirty pages to the end of
+ * the LRU before the dirty limits are breached and the dirty
+ * data has expired. It can also happen when the proportion of
+ * dirty pages grows not through writes but through memory
+ * pressure reclaiming all the clean cache. And in some cases,
+ * the flushers simply cannot keep up with the allocation
+ * rate. Nudge the flusher threads in case they are asleep.
+ */
+ if (stat.nr_unqueued_dirty == nr_taken)
+ wakeup_flusher_threads(WB_REASON_VMSCAN);
+
+ /*
* Legacy memcg will stall in page writeback so avoid forcibly
* stalling here.
*/
@@ -1791,22 +1805,9 @@ shrink_inactive_list(unsigned long nr_to
if (stat.nr_dirty && stat.nr_dirty == stat.nr_congested)
set_bit(PGDAT_CONGESTED, &pgdat->flags);
- /*
- * If dirty pages are scanned that are not queued for IO, it
- * implies that flushers are not doing their job. This can
- * happen when memory pressure pushes dirty pages to the end of
- * the LRU before the dirty limits are breached and the dirty
- * data has expired. It can also happen when the proportion of
- * dirty pages grows not through writes but through memory
- * pressure reclaiming all the clean cache. And in some cases,
- * the flushers simply cannot keep up with the allocation
- * rate. Nudge the flusher threads in case they are asleep, but
- * also allow kswapd to start writing pages during reclaim.
- */
- if (stat.nr_unqueued_dirty == nr_taken) {
- wakeup_flusher_threads(WB_REASON_VMSCAN);
+ /* Allow kswapd to start writing pages during reclaim. */
+ if (stat.nr_unqueued_dirty == nr_taken)
set_bit(PGDAT_DIRTY, &pgdat->flags);
- }
/*
* If kswapd scans pages marked marked for immediate
_
Patches currently in -mm which might be from aryabinin(a)virtuozzo.com are
mm-vmscan-update-stale-comments.patch
mm-vmscan-replace-mm_vmscan_lru_shrink_inactive-with-shrink_page_list-tracepoint.patch
mm-vmscan-remove-redundant-current_may_throttle-check.patch
mm-vmscan-dont-change-pgdat-state-on-base-of-a-single-lru-list-state.patch
mm-vmscan-dont-mess-with-pgdat-flags-in-memcg-reclaim.patch
mm-kasan-dont-vfree-nonexistent-vm_area.patch
The patch titled
Subject: mm/shmem: do not wait for lock_page() in shmem_unused_huge_shrink()
has been removed from the -mm tree. Its filename was
mm-shmem-do-not-wait-for-lock_page-in-shmem_unused_huge_shrink.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Subject: mm/shmem: do not wait for lock_page() in shmem_unused_huge_shrink()
shmem_unused_huge_shrink() gets called from the reclaim path. Waiting for
the page lock there may lead to a deadlock.
There was a bug report that may be attributed to this:
http://lkml.kernel.org/r/alpine.LRH.2.11.1801242349220.30642@mail.ewheeler.…
Replace lock_page() with trylock_page() and skip the page if we failed to
lock it. We will get to the page on the next scan.
We can test PageTransHuge() outside the page lock, as we only need
protection against the page being split under us. Holding a pin on the
page is enough for this.
Link: http://lkml.kernel.org/r/20180316210830.43738-1-kirill.shutemov@linux.intel…
Fixes: 779750d20b93 ("shmem: split huge pages beyond i_size under memory pressure")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Reported-by: Eric Wheeler <linux-mm(a)lists.ewheeler.net>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: <stable(a)vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/shmem.c | 31 ++++++++++++++++++++-----------
1 file changed, 20 insertions(+), 11 deletions(-)
diff -puN mm/shmem.c~mm-shmem-do-not-wait-for-lock_page-in-shmem_unused_huge_shrink mm/shmem.c
--- a/mm/shmem.c~mm-shmem-do-not-wait-for-lock_page-in-shmem_unused_huge_shrink
+++ a/mm/shmem.c
@@ -493,36 +493,45 @@ next:
info = list_entry(pos, struct shmem_inode_info, shrinklist);
inode = &info->vfs_inode;
- if (nr_to_split && split >= nr_to_split) {
- iput(inode);
- continue;
- }
+ if (nr_to_split && split >= nr_to_split)
+ goto leave;
- page = find_lock_page(inode->i_mapping,
+ page = find_get_page(inode->i_mapping,
(inode->i_size & HPAGE_PMD_MASK) >> PAGE_SHIFT);
if (!page)
goto drop;
+ /* No huge page at the end of the file: nothing to split */
if (!PageTransHuge(page)) {
- unlock_page(page);
put_page(page);
goto drop;
}
+ /*
+ * Leave the inode on the list if we failed to lock
+ * the page at this time.
+ *
+ * Waiting for the lock may lead to deadlock in the
+ * reclaim path.
+ */
+ if (!trylock_page(page)) {
+ put_page(page);
+ goto leave;
+ }
+
ret = split_huge_page(page);
unlock_page(page);
put_page(page);
- if (ret) {
- /* split failed: leave it on the list */
- iput(inode);
- continue;
- }
+ /* If split failed leave the inode on the list */
+ if (ret)
+ goto leave;
split++;
drop:
list_del_init(&info->shrinklist);
removed++;
+leave:
iput(inode);
}
_
Patches currently in -mm which might be from kirill.shutemov(a)linux.intel.com are
The patch titled
Subject: mm/thp: do not wait for lock_page() in deferred_split_scan()
has been removed from the -mm tree. Its filename was
mm-thp-do-not-wait-for-lock_page-in-deferred_split_scan.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Subject: mm/thp: do not wait for lock_page() in deferred_split_scan()
deferred_split_scan() gets called from the reclaim path. Waiting for the
page lock there may lead to a deadlock.
Replace lock_page() with trylock_page() and skip the page if we failed to
lock it. We will get to the page on the next scan.
Link: http://lkml.kernel.org/r/20180315150747.31945-1-kirill.shutemov@linux.intel…
Fixes: 9a982250f773 ("thp: introduce deferred_split_huge_page()")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/huge_memory.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff -puN mm/huge_memory.c~mm-thp-do-not-wait-for-lock_page-in-deferred_split_scan mm/huge_memory.c
--- a/mm/huge_memory.c~mm-thp-do-not-wait-for-lock_page-in-deferred_split_scan
+++ a/mm/huge_memory.c
@@ -2783,11 +2783,13 @@ static unsigned long deferred_split_scan
list_for_each_safe(pos, next, &list) {
page = list_entry((void *)pos, struct page, mapping);
- lock_page(page);
+ if (!trylock_page(page))
+ goto next;
/* split_huge_page() removes page from list on success */
if (!split_huge_page(page))
split++;
unlock_page(page);
+next:
put_page(page);
}
_
Patches currently in -mm which might be from kirill.shutemov(a)linux.intel.com are
The patch titled
Subject: mm/khugepaged.c: convert VM_BUG_ON() to collapse fail
has been removed from the -mm tree. Its filename was
mm-khugepaged-convert-vm_bug_on-to-collapse-fail.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Subject: mm/khugepaged.c: convert VM_BUG_ON() to collapse fail
khugepaged is not yet able to convert PTE-mapped huge pages back to
PMD-mapped ones, so we do not collapse such pages. See the check in
khugepaged_scan_pmd().
But if, between khugepaged_scan_pmd() and __collapse_huge_page_isolate(),
somebody manages to instantiate a THP in the range and then split the PMD
back to PTEs, we have a problem: VM_BUG_ON_PAGE(PageCompound(page)) gets
triggered.
This is possible because we drop mmap_sem during collapse and then re-take
it for write.
Replace the VM_BUG_ON() with a graceful collapse failure.
Link: http://lkml.kernel.org/r/20180315152353.27989-1-kirill.shutemov@linux.intel…
Fixes: b1caa957ae6d ("khugepaged: ignore pmd tables with THP mapped with ptes")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Laura Abbott <labbott(a)redhat.com>
Cc: Jerome Marchand <jmarchan(a)redhat.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/khugepaged.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff -puN mm/khugepaged.c~mm-khugepaged-convert-vm_bug_on-to-collapse-fail mm/khugepaged.c
--- a/mm/khugepaged.c~mm-khugepaged-convert-vm_bug_on-to-collapse-fail
+++ a/mm/khugepaged.c
@@ -530,7 +530,12 @@ static int __collapse_huge_page_isolate(
goto out;
}
- VM_BUG_ON_PAGE(PageCompound(page), page);
+ /* TODO: teach khugepaged to collapse THP mapped with pte */
+ if (PageCompound(page)) {
+ result = SCAN_PAGE_COMPOUND;
+ goto out;
+ }
+
VM_BUG_ON_PAGE(!PageAnon(page), page);
/*
_
Patches currently in -mm which might be from kirill.shutemov(a)linux.intel.com are
The patch titled
Subject: x86/mm: implement free pmd/pte page interfaces
has been removed from the -mm tree. Its filename was
x86-mm-implement-free-pmd-pte-page-interfaces.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Toshi Kani <toshi.kani(a)hpe.com>
Subject: x86/mm: implement free pmd/pte page interfaces
Implement pud_free_pmd_page() and pmd_free_pte_page() on x86, which clear
a given pud/pmd entry and free up the lower level page table(s). The
address range associated with the pud/pmd entry must already have been
purged by INVLPG.
Link: http://lkml.kernel.org/r/20180314180155.19492-3-toshi.kani@hpe.com
Fixes: e61ce6ade404e ("mm: change ioremap to set up huge I/O mappings")
Signed-off-by: Toshi Kani <toshi.kani(a)hpe.com>
Reported-by: Lei Li <lious.lilei(a)hisilicon.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Borislav Petkov <bp(a)suse.de>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/x86/mm/pgtable.c | 28 ++++++++++++++++++++++++++--
1 file changed, 26 insertions(+), 2 deletions(-)
diff -puN arch/x86/mm/pgtable.c~x86-mm-implement-free-pmd-pte-page-interfaces arch/x86/mm/pgtable.c
--- a/arch/x86/mm/pgtable.c~x86-mm-implement-free-pmd-pte-page-interfaces
+++ a/arch/x86/mm/pgtable.c
@@ -712,7 +712,22 @@ int pmd_clear_huge(pmd_t *pmd)
*/
int pud_free_pmd_page(pud_t *pud)
{
- return pud_none(*pud);
+ pmd_t *pmd;
+ int i;
+
+ if (pud_none(*pud))
+ return 1;
+
+ pmd = (pmd_t *)pud_page_vaddr(*pud);
+
+ for (i = 0; i < PTRS_PER_PMD; i++)
+ if (!pmd_free_pte_page(&pmd[i]))
+ return 0;
+
+ pud_clear(pud);
+ free_page((unsigned long)pmd);
+
+ return 1;
}
/**
@@ -724,6 +739,15 @@ int pud_free_pmd_page(pud_t *pud)
*/
int pmd_free_pte_page(pmd_t *pmd)
{
- return pmd_none(*pmd);
+ pte_t *pte;
+
+ if (pmd_none(*pmd))
+ return 1;
+
+ pte = (pte_t *)pmd_page_vaddr(*pmd);
+ pmd_clear(pmd);
+ free_page((unsigned long)pte);
+
+ return 1;
}
#endif /* CONFIG_HAVE_ARCH_HUGE_VMAP */
_
Patches currently in -mm which might be from toshi.kani(a)hpe.com are