From: Andrey Ryabinin <aryabinin@virtuozzo.com>
Subject: mm/vmscan: wake up flushers for legacy cgroups too
Commit 726d061fbd36 ("mm: vmscan: kick flushers when we encounter dirty
pages on the LRU") added flusher invocation to shrink_inactive_list() when
many dirty pages on the LRU are encountered.
However, that new wakeup is not reached during legacy cgroup reclaim, and the follow-up commit bbef938429f5 ("mm: vmscan: remove old flusher
wakeup from direct reclaim path") then removed the only remaining source of flusher wakeups on the legacy memcg reclaim path.
This leads to premature OOM if there are too many dirty pages in the cgroup:
# mkdir /sys/fs/cgroup/memory/test
# echo $$ > /sys/fs/cgroup/memory/test/tasks
# echo 50M > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
# dd if=/dev/zero of=tmp_file bs=1M count=100
Killed
dd invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=0
Call Trace:
dump_stack+0x46/0x65
dump_header+0x6b/0x2ac
oom_kill_process+0x21c/0x4a0
out_of_memory+0x2a5/0x4b0
mem_cgroup_out_of_memory+0x3b/0x60
mem_cgroup_oom_synchronize+0x2ed/0x330
pagefault_out_of_memory+0x24/0x54
__do_page_fault+0x521/0x540
page_fault+0x45/0x50
Task in /test killed as a result of limit of /test
memory: usage 51200kB, limit 51200kB, failcnt 73
memory+swap: usage 51200kB, limit 9007199254740988kB, failcnt 0
kmem: usage 296kB, limit 9007199254740988kB, failcnt 0
Memory cgroup stats for /test: cache:49632KB rss:1056KB rss_huge:0KB shmem:0KB
mapped_file:0KB dirty:49500KB writeback:0KB swap:0KB inactive_anon:0KB
active_anon:1168KB inactive_file:24760KB active_file:24960KB unevictable:0KB
Memory cgroup out of memory: Kill process 3861 (bash) score 88 or sacrifice child
Killed process 3876 (dd) total-vm:8484kB, anon-rss:1052kB, file-rss:1720kB, shmem-rss:0kB
oom_reaper: reaped process 3876 (dd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Wake up flushers in legacy cgroup reclaim too.
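As a minimal sketch of the intended control flow (simplified from 4.16's shrink_inactive_list(); assumes sane_reclaim() is false for legacy cgroup v1 reclaim and elides the surrounding throttling logic):

	/* Hoisted above the sane_reclaim() block so cgroup v1 reaches it too */
	if (stat.nr_unqueued_dirty == nr_taken)
		wakeup_flusher_threads(WB_REASON_VMSCAN);

	if (sane_reclaim(sc)) {
		/* The wakeup used to sit in here, unreachable for legacy memcg */
		if (stat.nr_unqueued_dirty == nr_taken)
			set_bit(PGDAT_DIRTY, &pgdat->flags);
	}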
Link: http://lkml.kernel.org/r/20180315164553.17856-1-aryabinin@virtuozzo.com
Fixes: bbef938429f5 ("mm: vmscan: remove old flusher wakeup from direct reclaim path")
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Tested-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/vmscan.c | 31 ++++++++++++++++---------------
1 file changed, 16 insertions(+), 15 deletions(-)
diff -puN mm/vmscan.c~mm-vmscan-wake-up-flushers-for-legacy-cgroups-too mm/vmscan.c
--- a/mm/vmscan.c~mm-vmscan-wake-up-flushers-for-legacy-cgroups-too
+++ a/mm/vmscan.c
@@ -1780,6 +1780,20 @@ shrink_inactive_list(unsigned long nr_to
set_bit(PGDAT_WRITEBACK, &pgdat->flags);
/*
+ * If dirty pages are scanned that are not queued for IO, it
+ * implies that flushers are not doing their job. This can
+ * happen when memory pressure pushes dirty pages to the end of
+ * the LRU before the dirty limits are breached and the dirty
+ * data has expired. It can also happen when the proportion of
+ * dirty pages grows not through writes but through memory
+ * pressure reclaiming all the clean cache. And in some cases,
+ * the flushers simply cannot keep up with the allocation
+ * rate. Nudge the flusher threads in case they are asleep.
+ */
+ if (stat.nr_unqueued_dirty == nr_taken)
+ wakeup_flusher_threads(WB_REASON_VMSCAN);
+
+ /*
* Legacy memcg will stall in page writeback so avoid forcibly
* stalling here.
*/
@@ -1791,22 +1805,9 @@ shrink_inactive_list(unsigned long nr_to
if (stat.nr_dirty && stat.nr_dirty == stat.nr_congested)
set_bit(PGDAT_CONGESTED, &pgdat->flags);
- /*
- * If dirty pages are scanned that are not queued for IO, it
- * implies that flushers are not doing their job. This can
- * happen when memory pressure pushes dirty pages to the end of
- * the LRU before the dirty limits are breached and the dirty
- * data has expired. It can also happen when the proportion of
- * dirty pages grows not through writes but through memory
- * pressure reclaiming all the clean cache. And in some cases,
- * the flushers simply cannot keep up with the allocation
- * rate. Nudge the flusher threads in case they are asleep, but
- * also allow kswapd to start writing pages during reclaim.
- */
- if (stat.nr_unqueued_dirty == nr_taken) {
- wakeup_flusher_threads(WB_REASON_VMSCAN);
+ /* Allow kswapd to start writing pages during reclaim. */
+ if (stat.nr_unqueued_dirty == nr_taken)
set_bit(PGDAT_DIRTY, &pgdat->flags);
- }
/*
* If kswapd scans pages marked marked for immediate
_
From: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Subject: mm/shmem: do not wait for lock_page() in shmem_unused_huge_shrink()
shmem_unused_huge_shrink() gets called from the reclaim path. Waiting for
the page lock there may lead to deadlock.
There was a bug report that may be attributed to this:
http://lkml.kernel.org/r/alpine.LRH.2.11.1801242349220.30642@mail.ewheeler.…
Replace lock_page() with trylock_page() and skip the page if we fail to
lock it. We will get back to the page on the next scan.
We can test PageTransHuge() outside the page lock, as we only need
protection against the page being split under us; holding a pin on the page
is enough for this.
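The resulting pattern, as a minimal sketch (the deadlock scenario being that the task entering reclaim may itself hold the lock on this very page, e.g. if it locked the page and then blocked in an allocation that recursed into the shrinker):

	page = find_get_page(mapping, index);	/* pin only, do not lock */
	...
	if (!trylock_page(page)) {	/* never sleep on a page lock in reclaim */
		put_page(page);
		goto leave;	/* keep inode on the list; retry on the next scan */
	}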
Link: http://lkml.kernel.org/r/20180316210830.43738-1-kirill.shutemov@linux.intel…
Fixes: 779750d20b93 ("shmem: split huge pages beyond i_size under memory pressure")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Eric Wheeler <linux-mm@lists.ewheeler.net>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/shmem.c | 31 ++++++++++++++++++++-----------
1 file changed, 20 insertions(+), 11 deletions(-)
diff -puN mm/shmem.c~mm-shmem-do-not-wait-for-lock_page-in-shmem_unused_huge_shrink mm/shmem.c
--- a/mm/shmem.c~mm-shmem-do-not-wait-for-lock_page-in-shmem_unused_huge_shrink
+++ a/mm/shmem.c
@@ -493,36 +493,45 @@ next:
info = list_entry(pos, struct shmem_inode_info, shrinklist);
inode = &info->vfs_inode;
- if (nr_to_split && split >= nr_to_split) {
- iput(inode);
- continue;
- }
+ if (nr_to_split && split >= nr_to_split)
+ goto leave;
- page = find_lock_page(inode->i_mapping,
+ page = find_get_page(inode->i_mapping,
(inode->i_size & HPAGE_PMD_MASK) >> PAGE_SHIFT);
if (!page)
goto drop;
+ /* No huge page at the end of the file: nothing to split */
if (!PageTransHuge(page)) {
- unlock_page(page);
put_page(page);
goto drop;
}
+ /*
+ * Leave the inode on the list if we failed to lock
+ * the page at this time.
+ *
+ * Waiting for the lock may lead to deadlock in the
+ * reclaim path.
+ */
+ if (!trylock_page(page)) {
+ put_page(page);
+ goto leave;
+ }
+
ret = split_huge_page(page);
unlock_page(page);
put_page(page);
- if (ret) {
- /* split failed: leave it on the list */
- iput(inode);
- continue;
- }
+ /* If split failed leave the inode on the list */
+ if (ret)
+ goto leave;
split++;
drop:
list_del_init(&info->shrinklist);
removed++;
+leave:
iput(inode);
}
_
From: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Subject: mm/thp: do not wait for lock_page() in deferred_split_scan()
deferred_split_scan() gets called from the reclaim path. Waiting for the
page lock there may lead to deadlock.
Replace lock_page() with trylock_page() and skip the page if we fail to
lock it. We will get back to the page on the next scan.
Link: http://lkml.kernel.org/r/20180315150747.31945-1-kirill.shutemov@linux.intel…
Fixes: 9a982250f773 ("thp: introduce deferred_split_huge_page()")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/huge_memory.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff -puN mm/huge_memory.c~mm-thp-do-not-wait-for-lock_page-in-deferred_split_scan mm/huge_memory.c
--- a/mm/huge_memory.c~mm-thp-do-not-wait-for-lock_page-in-deferred_split_scan
+++ a/mm/huge_memory.c
@@ -2783,11 +2783,13 @@ static unsigned long deferred_split_scan
list_for_each_safe(pos, next, &list) {
page = list_entry((void *)pos, struct page, mapping);
- lock_page(page);
+ if (!trylock_page(page))
+ goto next;
/* split_huge_page() removes page from list on success */
if (!split_huge_page(page))
split++;
unlock_page(page);
+next:
put_page(page);
}
_
From: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Subject: mm/khugepaged.c: convert VM_BUG_ON() to collapse fail
khugepaged is not yet able to convert PTE-mapped huge pages back to
PMD-mapped ones, so we do not collapse such pages; see the check in
khugepaged_scan_pmd().
But if, between khugepaged_scan_pmd() and __collapse_huge_page_isolate(),
somebody manages to instantiate a THP in the range and then split the PMD
back to PTEs, VM_BUG_ON_PAGE(PageCompound(page)) triggers.
This is possible because we drop mmap_sem during collapse and re-take it
for write.
Replace the VM_BUG_ON() with a graceful collapse failure.
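An illustrative timeline of the race (simplified):

	khugepaged				another thread
	----------				--------------
	khugepaged_scan_pmd()
	  sees no compound pages
	up_read(mmap_sem)
						fault a THP into the range
						split the PMD back to PTEs
	down_write(mmap_sem)
	__collapse_huge_page_isolate()
	  PageCompound(page) is true
	  -> formerly VM_BUG_ON, now fails with SCAN_PAGE_COMPOUND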
Link: http://lkml.kernel.org/r/20180315152353.27989-1-kirill.shutemov@linux.intel…
Fixes: b1caa957ae6d ("khugepaged: ignore pmd tables with THP mapped with ptes")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/khugepaged.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff -puN mm/khugepaged.c~mm-khugepaged-convert-vm_bug_on-to-collapse-fail mm/khugepaged.c
--- a/mm/khugepaged.c~mm-khugepaged-convert-vm_bug_on-to-collapse-fail
+++ a/mm/khugepaged.c
@@ -530,7 +530,12 @@ static int __collapse_huge_page_isolate(
goto out;
}
- VM_BUG_ON_PAGE(PageCompound(page), page);
+ /* TODO: teach khugepaged to collapse THP mapped with pte */
+ if (PageCompound(page)) {
+ result = SCAN_PAGE_COMPOUND;
+ goto out;
+ }
+
VM_BUG_ON_PAGE(!PageAnon(page), page);
/*
_
From: Arnd Bergmann <arnd@arndb.de>
Subject: h8300: remove extraneous __BIG_ENDIAN definition
A bugfix I did earlier caused a build regression on h8300, which defines
the __BIG_ENDIAN macro in a slightly different way than the generic code does:
arch/h8300/include/asm/byteorder.h:5:0: warning: "__BIG_ENDIAN" redefined
We don't need to define it here: the same macro is already provided by
linux/byteorder/big_endian.h, and that version does not conflict.
While this is a v4.16 regression, my earlier patch also got backported to
the 4.14 and 4.15 stable kernels, so we need the fixup there as well.
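For illustration, the clash looks roughly like this (assuming kconfig.h as of commit 101110f6271c; both definitions expand to 4321, but the replacement lists differ textually, which is what makes the preprocessor warn):

	#define __BIG_ENDIAN 4321			/* include/linux/kconfig.h */
	#define __BIG_ENDIAN __ORDER_BIG_ENDIAN__	/* asm/byteorder.h:5 */

linux/byteorder/big_endian.h, by contrast, guards its definition with #ifndef __BIG_ENDIAN, which is why it never conflicts.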
Link: http://lkml.kernel.org/r/20180313120752.2645129-1-arnd@arndb.de
Fixes: 101110f6271c ("Kbuild: always define endianess in kconfig.h")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
arch/h8300/include/asm/byteorder.h | 1 -
1 file changed, 1 deletion(-)
diff -puN arch/h8300/include/asm/byteorder.h~h8300-remove-extranous-__big_endian-definition arch/h8300/include/asm/byteorder.h
--- a/arch/h8300/include/asm/byteorder.h~h8300-remove-extranous-__big_endian-definition
+++ a/arch/h8300/include/asm/byteorder.h
@@ -2,7 +2,6 @@
#ifndef __H8300_BYTEORDER_H__
#define __H8300_BYTEORDER_H__
-#define __BIG_ENDIAN __ORDER_BIG_ENDIAN__
#include <linux/byteorder/big_endian.h>
#endif
_
From: Mike Kravetz <mike.kravetz@oracle.com>
Subject: hugetlbfs: check for pgoff value overflow
A vma with vm_pgoff large enough to overflow a loff_t type when converted
to a byte offset can be passed via the remap_file_pages system call. The
hugetlbfs mmap routine uses the byte offset to calculate reservations and
file size.
A sequence such as:
mmap(0x20a00000, 0x600000, 0, 0x66033, -1, 0);
remap_file_pages(0x20a00000, 0x600000, 0, 0x20000000000000, 0);
will result in the following when the task exits or the file is closed:
kernel BUG at mm/hugetlb.c:749!
Call Trace:
hugetlbfs_evict_inode+0x2f/0x40
evict+0xcb/0x190
__dentry_kill+0xcb/0x150
__fput+0x164/0x1e0
task_work_run+0x84/0xa0
exit_to_usermode_loop+0x7d/0x80
do_syscall_64+0x18b/0x190
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
The overflowed pgoff value causes hugetlbfs to try to set up a mapping
with a negative range (end < start), which leaves invalid state behind and
ultimately triggers the BUG.
The previous overflow fix to this code was incomplete and did not take the
remap_file_pages() system call into account.
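A worked instance of the overflow, assuming a 64-bit long and PAGE_SHIFT == 12 (macro names as in the patch below):

	vm_pgoff = 0x20000000000000		/* 1UL << 53, from the reproducer */

	/* Old check: the shift overflows and in practice wraps to 0,
	 * so the "< 0" test lets the bogus offset through: */
	((loff_t)vm_pgoff << PAGE_SHIFT) < 0	/* 1 << 65 wraps to 0: false */

	/* New check: PGOFF_LOFFT_MAX covers the top PAGE_SHIFT + 1 = 13 bits
	 * (bits 51..63); bit 53 is among them, so the mmap is rejected: */
	vm_pgoff & (((1UL << 13) - 1) << 51)	/* nonzero -> -EINVAL */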
[mike.kravetz@oracle.com: v3]
Link: http://lkml.kernel.org/r/20180309002726.7248-1-mike.kravetz@oracle.com
[akpm@linux-foundation.org: include mmdebug.h]
[akpm@linux-foundation.org: fix -ve left shift count on sh]
Link: http://lkml.kernel.org/r/20180308210502.15952-1-mike.kravetz@oracle.com
Fixes: 045c7a3f53d9 ("hugetlbfs: fix offset overflow in hugetlbfs mmap")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Nic Losby <blurbdust@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Yisheng Xie <xieyisheng1@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
fs/hugetlbfs/inode.c | 17 ++++++++++++++---
mm/hugetlb.c | 7 +++++++
2 files changed, 21 insertions(+), 3 deletions(-)
diff -puN fs/hugetlbfs/inode.c~hugetlbfs-check-for-pgoff-value-overflow fs/hugetlbfs/inode.c
--- a/fs/hugetlbfs/inode.c~hugetlbfs-check-for-pgoff-value-overflow
+++ a/fs/hugetlbfs/inode.c
@@ -108,6 +108,16 @@ static void huge_pagevec_release(struct
pagevec_reinit(pvec);
}
+/*
+ * Mask used when checking the page offset value passed in via system
+ * calls. This value will be converted to a loff_t which is signed.
+ * Therefore, we want to check the upper PAGE_SHIFT + 1 bits of the
+ * value. The extra bit (- 1 in the shift value) is to take the sign
+ * bit into account.
+ */
+#define PGOFF_LOFFT_MAX \
+ (((1UL << (PAGE_SHIFT + 1)) - 1) << (BITS_PER_LONG - (PAGE_SHIFT + 1)))
+
static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
{
struct inode *inode = file_inode(file);
@@ -127,12 +137,13 @@ static int hugetlbfs_file_mmap(struct fi
vma->vm_ops = &hugetlb_vm_ops;
/*
- * Offset passed to mmap (before page shift) could have been
- * negative when represented as a (l)off_t.
+ * page based offset in vm_pgoff could be sufficiently large to
+ * overflow a (l)off_t when converted to byte offset.
*/
- if (((loff_t)vma->vm_pgoff << PAGE_SHIFT) < 0)
+ if (vma->vm_pgoff & PGOFF_LOFFT_MAX)
return -EINVAL;
+ /* must be huge page aligned */
if (vma->vm_pgoff & (~huge_page_mask(h) >> PAGE_SHIFT))
return -EINVAL;
diff -puN mm/hugetlb.c~hugetlbfs-check-for-pgoff-value-overflow mm/hugetlb.c
--- a/mm/hugetlb.c~hugetlbfs-check-for-pgoff-value-overflow
+++ a/mm/hugetlb.c
@@ -18,6 +18,7 @@
#include <linux/bootmem.h>
#include <linux/sysfs.h>
#include <linux/slab.h>
+#include <linux/mmdebug.h>
#include <linux/sched/signal.h>
#include <linux/rmap.h>
#include <linux/string_helpers.h>
@@ -4374,6 +4375,12 @@ int hugetlb_reserve_pages(struct inode *
struct resv_map *resv_map;
long gbl_reserve;
+ /* This should never happen */
+ if (from > to) {
+ VM_WARN(1, "%s called with a negative range\n", __func__);
+ return -EINVAL;
+ }
+
/*
* Only apply hugepage reservation if asked. At fault time, an
* attempt will be made for VM_NORESERVE to allocate a page
_