The patch titled
Subject: khugepaged: collapse_pte_mapped_thp() protect the pmd lock
has been removed from the -mm tree. Its filename was
khugepaged-collapse_pte_mapped_thp-protect-the-pmd-lock.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: khugepaged: collapse_pte_mapped_thp() protect the pmd lock
When retract_page_tables() removes a page table to make way for a huge
pmd, it holds huge page lock, i_mmap_lock_write, mmap_write_trylock and
pmd lock; but when collapse_pte_mapped_thp() does the same (to handle the
case when the original mmap_write_trylock had failed), only
mmap_write_trylock and pmd lock are held.
That's not enough. One machine has twice crashed under load, with "BUG:
spinlock bad magic" and GPF on 6b6b6b6b6b6b6b6b. Examining the second
crash, page_vma_mapped_walk_done()'s spin_unlock of pvmw->ptl (serving
page_referenced() on a file THP, that had found a page table at *pmd)
discovers that the page table page and its lock have already been freed by
the time it comes to unlock.
Follow the example of retract_page_tables(), but we only need one of huge
page lock or i_mmap_lock_write to secure against this: because it's the
narrower lock, and because it simplifies collapse_pte_mapped_thp() to know
the hpage earlier, choose to rely on huge page lock here.
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021213070.27773@eggly.anvils
Fixes: 27e1f8273113 ("khugepaged: enable collapse pmd for pte-mapped THP")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Song Liu <songliubraving(a)fb.com>
Cc: <stable(a)vger.kernel.org> [5.4+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/khugepaged.c | 44 +++++++++++++++++++-------------------------
1 file changed, 19 insertions(+), 25 deletions(-)
--- a/mm/khugepaged.c~khugepaged-collapse_pte_mapped_thp-protect-the-pmd-lock
+++ a/mm/khugepaged.c
@@ -1412,7 +1412,7 @@ void collapse_pte_mapped_thp(struct mm_s
{
unsigned long haddr = addr & HPAGE_PMD_MASK;
struct vm_area_struct *vma = find_vma(mm, haddr);
- struct page *hpage = NULL;
+ struct page *hpage;
pte_t *start_pte, *pte;
pmd_t *pmd, _pmd;
spinlock_t *ptl;
@@ -1432,9 +1432,17 @@ void collapse_pte_mapped_thp(struct mm_s
if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE))
return;
+ hpage = find_lock_page(vma->vm_file->f_mapping,
+ linear_page_index(vma, haddr));
+ if (!hpage)
+ return;
+
+ if (!PageHead(hpage))
+ goto drop_hpage;
+
pmd = mm_find_pmd(mm, haddr);
if (!pmd)
- return;
+ goto drop_hpage;
start_pte = pte_offset_map_lock(mm, pmd, haddr, &ptl);
@@ -1453,30 +1461,11 @@ void collapse_pte_mapped_thp(struct mm_s
page = vm_normal_page(vma, addr, *pte);
- if (!page || !PageCompound(page))
- goto abort;
-
- if (!hpage) {
- hpage = compound_head(page);
- /*
- * The mapping of the THP should not change.
- *
- * Note that uprobe, debugger, or MAP_PRIVATE may
- * change the page table, but the new page will
- * not pass PageCompound() check.
- */
- if (WARN_ON(hpage->mapping != vma->vm_file->f_mapping))
- goto abort;
- }
-
/*
- * Confirm the page maps to the correct subpage.
- *
- * Note that uprobe, debugger, or MAP_PRIVATE may change
- * the page table, but the new page will not pass
- * PageCompound() check.
+ * Note that uprobe, debugger, or MAP_PRIVATE may change the
+ * page table, but the new page will not be a subpage of hpage.
*/
- if (WARN_ON(hpage + i != page))
+ if (hpage + i != page)
goto abort;
count++;
}
@@ -1495,7 +1484,7 @@ void collapse_pte_mapped_thp(struct mm_s
pte_unmap_unlock(start_pte, ptl);
/* step 3: set proper refcount and mm_counters. */
- if (hpage) {
+ if (count) {
page_ref_sub(hpage, count);
add_mm_counter(vma->vm_mm, mm_counter_file(hpage), -count);
}
@@ -1506,10 +1495,15 @@ void collapse_pte_mapped_thp(struct mm_s
spin_unlock(ptl);
mm_dec_nr_ptes(mm);
pte_free(mm, pmd_pgtable(_pmd));
+
+drop_hpage:
+ unlock_page(hpage);
+ put_page(hpage);
return;
abort:
pte_unmap_unlock(start_pte, ptl);
+ goto drop_hpage;
}
static int khugepaged_collapse_pte_mapped_thps(struct mm_slot *mm_slot)
_
Patches currently in -mm which might be from hughd(a)google.com are
The patch titled
Subject: khugepaged: collapse_pte_mapped_thp() flush the right range
has been removed from the -mm tree. Its filename was
khugepaged-collapse_pte_mapped_thp-flush-the-right-range.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: khugepaged: collapse_pte_mapped_thp() flush the right range
pmdp_collapse_flush() should be given the start address at which the huge
page is mapped, haddr: it was given addr, which at that point has been
used as a local variable, incremented to the end address of the extent.
Found by source inspection while chasing a hugepage locking bug, which I
then could not explain by this. At first I thought this was very bad;
then saw that all of the page translations that were not flushed would
actually still point to the right pages afterwards, so harmless; then
realized that I know nothing of how different architectures and models
cache intermediate paging structures, so maybe it matters after all -
particularly since the page table concerned is immediately freed.
Much easier to fix than to think about.
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021204390.27773@eggly.anvils
Fixes: 27e1f8273113 ("khugepaged: enable collapse pmd for pte-mapped THP")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Song Liu <songliubraving(a)fb.com>
Cc: <stable(a)vger.kernel.org> [5.4+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/khugepaged.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/khugepaged.c~khugepaged-collapse_pte_mapped_thp-flush-the-right-range
+++ a/mm/khugepaged.c
@@ -1502,7 +1502,7 @@ void collapse_pte_mapped_thp(struct mm_s
/* step 4: collapse pmd */
ptl = pmd_lock(vma->vm_mm, pmd);
- _pmd = pmdp_collapse_flush(vma, addr, pmd);
+ _pmd = pmdp_collapse_flush(vma, haddr, pmd);
spin_unlock(ptl);
mm_dec_nr_ptes(mm);
pte_free(mm, pmd_pgtable(_pmd));
_
Patches currently in -mm which might be from hughd(a)google.com are
The patch titled
Subject: mm/hugetlb: fix calculation of adjust_range_if_pmd_sharing_possible
has been removed from the -mm tree. Its filename was
mm-hugetlb-fix-calculation-of-adjust_range_if_pmd_sharing_possible.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Peter Xu <peterx(a)redhat.com>
Subject: mm/hugetlb: fix calculation of adjust_range_if_pmd_sharing_possible
This is found by code observation only.
Firstly, the worst case scenario should assume the whole range was covered
by pmd sharing.  The old algorithm might not work as expected for ranges
like (1g-2m, 1g+2m): it only adjusts the range to (0, 1g+2m), while the
expected result is (0, 2g).
While at it, remove the loop, since it should not be required.  With that,
the new code should also be faster when the invalidating range is huge.
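For illustration, here is a minimal user-space sketch (assuming PUD_SIZE is
1G, as on x86-64, and using simplified stand-ins for the kernel's ALIGN and
ALIGN_DOWN macros) showing how the worst-case adjustment turns
(1g-2m, 1g+2m) into (0, 2g) before it is intersected with the vma:

	#include <stdio.h>

	#define SZ_2M			(2UL << 20)
	#define PUD_SIZE		(1UL << 30)	/* assume 1G, as on x86-64 */
	#define ALIGN_DOWN(x, a)	((x) & ~((a) - 1))
	#define ALIGN(x, a)		(((x) + (a) - 1) & ~((a) - 1))

	int main(void)
	{
		unsigned long start = PUD_SIZE - SZ_2M;		/* 1g - 2m */
		unsigned long end   = PUD_SIZE + SZ_2M;		/* 1g + 2m */

		/* Worst case: assume the whole range may be pmd-shared. */
		printf("(%lu, %lu) -> (%lu, %lu)\n", start, end,
		       ALIGN_DOWN(start, PUD_SIZE),		/* 0 */
		       ALIGN(end, PUD_SIZE));			/* 2g */
		return 0;
	}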
Mike said:
: With range (1g-2m, 1g+2m) within a vma (0, 2g) the existing code will only
: adjust to (0, 1g+2m) which is incorrect.
:
: We should cc stable. The original reason for adjusting the range was to
: prevent data corruption (getting wrong page). Since the range is not
: always adjusted correctly, the potential for corruption still exists.
:
: However, I am fairly confident that adjust_range_if_pmd_sharing_possible
: is only gong to be called in two cases:
:
: 1) for a single page
: 2) for range == entire vma
:
: In those cases, the current code should produce the correct results.
:
: To be safe, let's just cc stable.
Link: http://lkml.kernel.org/r/20200730201636.74778-1-peterx@redhat.com
Fixes: 017b1660df89 ("mm: migration: fix migration of huge PMD shared pages")
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 24 ++++++++++--------------
1 file changed, 10 insertions(+), 14 deletions(-)
--- a/mm/hugetlb.c~mm-hugetlb-fix-calculation-of-adjust_range_if_pmd_sharing_possible
+++ a/mm/hugetlb.c
@@ -5314,25 +5314,21 @@ static bool vma_shareable(struct vm_area
void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
unsigned long *start, unsigned long *end)
{
- unsigned long check_addr;
+ unsigned long a_start, a_end;
if (!(vma->vm_flags & VM_MAYSHARE))
return;
- for (check_addr = *start; check_addr < *end; check_addr += PUD_SIZE) {
- unsigned long a_start = check_addr & PUD_MASK;
- unsigned long a_end = a_start + PUD_SIZE;
+ /* Extend the range to be PUD aligned for a worst case scenario */
+ a_start = ALIGN_DOWN(*start, PUD_SIZE);
+ a_end = ALIGN(*end, PUD_SIZE);
- /*
- * If sharing is possible, adjust start/end if necessary.
- */
- if (range_in_vma(vma, a_start, a_end)) {
- if (a_start < *start)
- *start = a_start;
- if (a_end > *end)
- *end = a_end;
- }
- }
+ /*
+ * Intersect the range with the vma range, since pmd sharing won't be
+ * across vma after all
+ */
+ *start = max(vma->vm_start, a_start);
+ *end = min(vma->vm_end, a_end);
}
/*
_
Patches currently in -mm which might be from peterx(a)redhat.com are
mm-do-page-fault-accounting-in-handle_mm_fault.patch
mm-alpha-use-general-page-fault-accounting.patch
mm-arc-use-general-page-fault-accounting.patch
mm-arm-use-general-page-fault-accounting.patch
mm-arm64-use-general-page-fault-accounting.patch
mm-csky-use-general-page-fault-accounting.patch
mm-hexagon-use-general-page-fault-accounting.patch
mm-ia64-use-general-page-fault-accounting.patch
mm-m68k-use-general-page-fault-accounting.patch
mm-microblaze-use-general-page-fault-accounting.patch
mm-mips-use-general-page-fault-accounting.patch
mm-nds32-use-general-page-fault-accounting.patch
mm-nios2-use-general-page-fault-accounting.patch
mm-openrisc-use-general-page-fault-accounting.patch
mm-parisc-use-general-page-fault-accounting.patch
mm-powerpc-use-general-page-fault-accounting.patch
mm-riscv-use-general-page-fault-accounting.patch
mm-s390-use-general-page-fault-accounting.patch
mm-sh-use-general-page-fault-accounting.patch
mm-sparc32-use-general-page-fault-accounting.patch
mm-sparc64-use-general-page-fault-accounting.patch
mm-x86-use-general-page-fault-accounting.patch
mm-xtensa-use-general-page-fault-accounting.patch
mm-clean-up-the-last-pieces-of-page-fault-accountings.patch
mm-gup-remove-task_struct-pointer-for-all-gup-code.patch
The patch titled
Subject: mm/page_counter.c: fix protection usage propagation
has been removed from the -mm tree. Its filename was
mm-fix-protection-usage-propagation.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Michal Koutný <mkoutny(a)suse.com>
Subject: mm/page_counter.c: fix protection usage propagation
When a workload runs in cgroups that aren't directly below the root cgroup
and their parent specifies reclaim protection, that protection may end up
ineffective.
The reason is that propagate_protected_usage() is not called all the way up
the hierarchy.  All the protected usage is incorrectly accumulated in the
workload's parent.  This means that siblings_low_usage is overestimated and
effective protection underestimated.  Even though this is a transitional
phenomenon (the uncharge path does correct propagation and fixes the wrong
children_low_usage), it can undermine the intended protection unexpectedly.
We have noticed this problem while seeing a swap out in a descendant of a
protected memcg (intermediate node) while the parent was conveniently
under its protection limit and the memory pressure was external to that
hierarchy. Michal has pinpointed this down to the wrong
siblings_low_usage which led to the unwanted reclaim.
The fix is simply to update children_low_usage in the respective ancestors
also in the charging path.
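A simplified sketch of the charge loop (modeled on the hunk below, with
watermark handling and the rest of the function omitted, so not the verbatim
mm/page_counter.c code) shows where the propagation went wrong:

	void page_counter_charge(struct page_counter *counter, unsigned long nr_pages)
	{
		struct page_counter *c;

		/* walk from the charged counter up to the root */
		for (c = counter; c; c = c->parent) {
			long new;

			new = atomic_long_add_return(nr_pages, &c->usage);
			/*
			 * Each level must propagate *its own* new usage to its
			 * parent; the old code passed the leaf "counter" here,
			 * so only the workload's parent was ever updated.
			 */
			propagate_protected_usage(c, new);
		}
	}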
Link: http://lkml.kernel.org/r/20200803153231.15477-1-mhocko@kernel.org
Fixes: 230671533d64 ("mm: memory.low hierarchical behavior")
Signed-off-by: Michal Koutný <mkoutny(a)suse.com>
Signed-off-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Roman Gushchin <guro(a)fb.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: <stable(a)vger.kernel.org> [4.18+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_counter.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- a/mm/page_counter.c~mm-fix-protection-usage-propagation
+++ a/mm/page_counter.c
@@ -72,7 +72,7 @@ void page_counter_charge(struct page_cou
long new;
new = atomic_long_add_return(nr_pages, &c->usage);
- propagate_protected_usage(counter, new);
+ propagate_protected_usage(c, new);
/*
* This is indeed racy, but we can live with some
* inaccuracy in the watermark.
@@ -116,7 +116,7 @@ bool page_counter_try_charge(struct page
new = atomic_long_add_return(nr_pages, &c->usage);
if (new > c->max) {
atomic_long_sub(nr_pages, &c->usage);
- propagate_protected_usage(counter, new);
+ propagate_protected_usage(c, new);
/*
* This is racy, but we can live with some
* inaccuracy in the failcnt.
@@ -125,7 +125,7 @@ bool page_counter_try_charge(struct page
*fail = c;
goto failed;
}
- propagate_protected_usage(counter, new);
+ propagate_protected_usage(c, new);
/*
* Just like with failcnt, we can live with some
* inaccuracy in the watermark.
_
Patches currently in -mm which might be from mkoutny(a)suse.com are
proc-pid-smaps-consistent-whitespace-output-format.patch
The patch titled
Subject: ocfs2: change slot number type s16 to u16
has been removed from the -mm tree. Its filename was
ocfs2-change-slot-number-type-s16-to-u16.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Junxiao Bi <junxiao.bi(a)oracle.com>
Subject: ocfs2: change slot number type s16 to u16
Dan Carpenter reported the following static checker warning.
fs/ocfs2/super.c:1269 ocfs2_parse_options() warn: '(-1)' 65535 can't fit into 32767 'mopt->slot'
fs/ocfs2/suballoc.c:859 ocfs2_init_inode_steal_slot() warn: '(-1)' 65535 can't fit into 32767 'osb->s_inode_steal_slot'
fs/ocfs2/suballoc.c:867 ocfs2_init_meta_steal_slot() warn: '(-1)' 65535 can't fit into 32767 'osb->s_meta_steal_slot'
That's because OCFS2_INVALID_SLOT is (u16)-1.  A slot number in ocfs2 can
never be negative, so change s16 to u16.
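A tiny stand-alone example of the mismatch the checker flags (assuming
OCFS2_INVALID_SLOT is defined as (u16)-1, i.e. 65535):

	#include <stdio.h>
	#include <stdint.h>

	#define OCFS2_INVALID_SLOT	((uint16_t)-1)	/* 65535 */

	int main(void)
	{
		int16_t  s16_slot = OCFS2_INVALID_SLOT;	/* implementation-defined; typically wraps to -1 */
		uint16_t u16_slot = OCFS2_INVALID_SLOT;	/* keeps 65535 */

		printf("s16: %d  u16: %u\n", s16_slot, u16_slot);
		return 0;
	}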
Link: http://lkml.kernel.org/r/20200627001259.19757-1-junxiao.bi@oracle.com
Fixes: 9277f8334ffc ("ocfs2: fix value of OCFS2_INVALID_SLOT")
Signed-off-by: Junxiao Bi <junxiao.bi(a)oracle.com>
Reported-by: Dan Carpenter <dan.carpenter(a)oracle.com>
Reviewed-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Reviewed-by: Gang He <ghe(a)suse.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/ocfs2.h | 4 ++--
fs/ocfs2/suballoc.c | 4 ++--
fs/ocfs2/super.c | 4 ++--
3 files changed, 6 insertions(+), 6 deletions(-)
--- a/fs/ocfs2/ocfs2.h~ocfs2-change-slot-number-type-s16-to-u16
+++ a/fs/ocfs2/ocfs2.h
@@ -327,8 +327,8 @@ struct ocfs2_super
spinlock_t osb_lock;
u32 s_next_generation;
unsigned long osb_flags;
- s16 s_inode_steal_slot;
- s16 s_meta_steal_slot;
+ u16 s_inode_steal_slot;
+ u16 s_meta_steal_slot;
atomic_t s_num_inodes_stolen;
atomic_t s_num_meta_stolen;
--- a/fs/ocfs2/suballoc.c~ocfs2-change-slot-number-type-s16-to-u16
+++ a/fs/ocfs2/suballoc.c
@@ -879,9 +879,9 @@ static void __ocfs2_set_steal_slot(struc
{
spin_lock(&osb->osb_lock);
if (type == INODE_ALLOC_SYSTEM_INODE)
- osb->s_inode_steal_slot = slot;
+ osb->s_inode_steal_slot = (u16)slot;
else if (type == EXTENT_ALLOC_SYSTEM_INODE)
- osb->s_meta_steal_slot = slot;
+ osb->s_meta_steal_slot = (u16)slot;
spin_unlock(&osb->osb_lock);
}
--- a/fs/ocfs2/super.c~ocfs2-change-slot-number-type-s16-to-u16
+++ a/fs/ocfs2/super.c
@@ -78,7 +78,7 @@ struct mount_options
unsigned long commit_interval;
unsigned long mount_opt;
unsigned int atime_quantum;
- signed short slot;
+ unsigned short slot;
int localalloc_opt;
unsigned int resv_level;
int dir_resv_level;
@@ -1349,7 +1349,7 @@ static int ocfs2_parse_options(struct su
goto bail;
}
if (option)
- mopt->slot = (s16)option;
+ mopt->slot = (u16)option;
break;
case Opt_commit:
if (match_int(&args[0], &option)) {
_
Patches currently in -mm which might be from junxiao.bi(a)oracle.com are
The patch titled
Subject: mm: fix kthread_use_mm() vs TLB invalidate
has been removed from the -mm tree. Its filename was
mm-fix-kthread_use_mm-vs-tlb-invalidate.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Peter Zijlstra <peterz(a)infradead.org>
Subject: mm: fix kthread_use_mm() vs TLB invalidate
For SMP systems using IPI based TLB invalidation, looking at
current->active_mm is entirely reasonable. This then presents the
following race condition:
	CPU0                          CPU1

	flush_tlb_mm(mm)              use_mm(mm)
	  <send-IPI>
	                              tsk->active_mm = mm;
	                              <IPI>
	                                if (tsk->active_mm == mm)
	                                  // flush TLBs
	                              </IPI>
	                              switch_mm(old_mm, mm, tsk);
Where it is possible the IPI flushed the TLBs for @old_mm, not @mm,
because the IPI lands before we actually switched.
Avoid this by disabling IRQs across changing ->active_mm and
switch_mm().
Of the (SMP) architectures that have IPI based TLB invalidate:
Alpha - checks active_mm
ARC - ASID specific
IA64 - checks active_mm
MIPS - ASID specific flush
OpenRISC - shoots down world
PARISC - shoots down world
SH - ASID specific
SPARC - ASID specific
x86 - N/A
xtensa - checks active_mm
So at the very least Alpha, IA64 and Xtensa are suspect.
On top of this, for scheduler consistency we need at least preemption
disabled across changing tsk->mm and doing switch_mm(), which is
currently provided by task_lock(), but that's not sufficient for
PREEMPT_RT.
[akpm(a)linux-foundation.org: add comment]
Link: http://lkml.kernel.org/r/20200721154106.GE10769@hirez.programming.kicks-ass…
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Reported-by: Andy Lutomirski <luto(a)amacapital.net>
Cc: Nicholas Piggin <npiggin(a)gmail.com>
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Jann Horn <jannh(a)google.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Nicholas Piggin <npiggin(a)gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/kthread.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
--- a/kernel/kthread.c~mm-fix-kthread_use_mm-vs-tlb-invalidate
+++ a/kernel/kthread.c
@@ -1241,13 +1241,16 @@ void kthread_use_mm(struct mm_struct *mm
WARN_ON_ONCE(tsk->mm);
task_lock(tsk);
+ /* Hold off tlb flush IPIs while switching mm's */
+ local_irq_disable();
active_mm = tsk->active_mm;
if (active_mm != mm) {
mmgrab(mm);
tsk->active_mm = mm;
}
tsk->mm = mm;
- switch_mm(active_mm, mm, tsk);
+ switch_mm_irqs_off(active_mm, mm, tsk);
+ local_irq_enable();
task_unlock(tsk);
#ifdef finish_arch_post_lock_switch
finish_arch_post_lock_switch();
@@ -1276,9 +1279,11 @@ void kthread_unuse_mm(struct mm_struct *
task_lock(tsk);
sync_mm_rss(mm);
+ local_irq_disable();
tsk->mm = NULL;
/* active_mm is still 'mm' */
enter_lazy_tlb(mm, tsk);
+ local_irq_enable();
task_unlock(tsk);
}
EXPORT_SYMBOL_GPL(kthread_unuse_mm);
_
Patches currently in -mm which might be from peterz(a)infradead.org are
The patch titled
Subject: mm/shuffle: don't move pages between zones and don't read garbage memmaps
has been removed from the -mm tree. Its filename was
mm-shuffle-dont-move-pages-between-zones-and-dont-read-garbage-memmaps.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: David Hildenbrand <david(a)redhat.com>
Subject: mm/shuffle: don't move pages between zones and don't read garbage memmaps
Especially with memory hotplug, we can have offline sections (with a
garbage memmap) and overlapping zones. We have to make sure to only touch
initialized memmaps (online sections managed by the buddy) and that the
zone matches, to not move pages between zones.
To test if this can actually happen, I added a simple
BUG_ON(page_zone(page_i) != page_zone(page_j));
right before the swap. When hotplugging a 256M DIMM to a 4G x86-64 VM and
onlining the first memory block "online_movable" and the second memory
block "online_kernel", it will trigger the BUG, as both zones (NORMAL and
MOVABLE) overlap.
This might result in all kinds of weird situations (e.g., double
allocations, list corruptions, unmovable allocations ending up in the
movable zone).
Link: http://lkml.kernel.org/r/20200624094741.9918-2-david@redhat.com
Fixes: e900a918b098 ("mm: shuffle initial free memory to improve memory-side-cache utilization")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Wei Yang <richard.weiyang(a)linux.alibaba.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Dan Williams <dan.j.williams(a)intel.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Huang Ying <ying.huang(a)intel.com>
Cc: Wei Yang <richard.weiyang(a)gmail.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: <stable(a)vger.kernel.org> [5.2+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/shuffle.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
--- a/mm/shuffle.c~mm-shuffle-dont-move-pages-between-zones-and-dont-read-garbage-memmaps
+++ a/mm/shuffle.c
@@ -58,25 +58,25 @@ module_param_call(shuffle, shuffle_store
* For two pages to be swapped in the shuffle, they must be free (on a
* 'free_area' lru), have the same order, and have the same migratetype.
*/
-static struct page * __meminit shuffle_valid_page(unsigned long pfn, int order)
+static struct page * __meminit shuffle_valid_page(struct zone *zone,
+ unsigned long pfn, int order)
{
- struct page *page;
+ struct page *page = pfn_to_online_page(pfn);
/*
* Given we're dealing with randomly selected pfns in a zone we
* need to ask questions like...
*/
- /* ...is the pfn even in the memmap? */
- if (!pfn_valid_within(pfn))
+ /* ... is the page managed by the buddy? */
+ if (!page)
return NULL;
- /* ...is the pfn in a present section or a hole? */
- if (!pfn_in_present_section(pfn))
+ /* ... is the page assigned to the same zone? */
+ if (page_zone(page) != zone)
return NULL;
/* ...is the page free and currently on a free_area list? */
- page = pfn_to_page(pfn);
if (!PageBuddy(page))
return NULL;
@@ -123,7 +123,7 @@ void __meminit __shuffle_zone(struct zon
* page_j randomly selected in the span @zone_start_pfn to
* @spanned_pages.
*/
- page_i = shuffle_valid_page(i, order);
+ page_i = shuffle_valid_page(z, i, order);
if (!page_i)
continue;
@@ -137,7 +137,7 @@ void __meminit __shuffle_zone(struct zon
j = z->zone_start_pfn +
ALIGN_DOWN(get_random_long() % z->spanned_pages,
order_pages);
- page_j = shuffle_valid_page(j, order);
+ page_j = shuffle_valid_page(z, j, order);
if (page_j && page_j != page_i)
break;
}
_
Patches currently in -mm which might be from david(a)redhat.com are
From: Chengming Zhou <zhouchengming(a)bytedance.com>
When a module is loaded and enabled, we use __ftrace_replace_code for the
module if any ftrace_ops referencing it is found.  But we get the wrong
ftrace_addr for the module rec in ftrace_get_addr_new, because rec->flags
has not been set up correctly.  This can cause the callback function of a
ftrace_ops that has FTRACE_OPS_FL_SAVE_REGS to be called with pt_regs set
to NULL.
So set up the correct FTRACE_FL_REGS flag for rec when we call
referenced_filters to find the ftrace_ops that reference it.
Link: https://lkml.kernel.org/r/20200728180554.65203-1-zhouchengming@bytedance.com
Cc: stable(a)vger.kernel.org
Fixes: 8c4f3c3fa9681 ("ftrace: Check module functions being traced on reload")
Signed-off-by: Chengming Zhou <zhouchengming(a)bytedance.com>
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
---
kernel/trace/ftrace.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index c141d347f71a..d052f856f1cf 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -6198,8 +6198,11 @@ static int referenced_filters(struct dyn_ftrace *rec)
int cnt = 0;
for (ops = ftrace_ops_list; ops != &ftrace_list_end; ops = ops->next) {
- if (ops_references_rec(ops, rec))
- cnt++;
+ if (ops_references_rec(ops, rec)) {
+ cnt++;
+ if (ops->flags & FTRACE_OPS_FL_SAVE_REGS)
+ rec->flags |= FTRACE_FL_REGS;
+ }
}
return cnt;
@@ -6378,8 +6381,8 @@ void ftrace_module_enable(struct module *mod)
if (ftrace_start_up)
cnt += referenced_filters(rec);
- /* This clears FTRACE_FL_DISABLED */
- rec->flags = cnt;
+ rec->flags &= ~FTRACE_FL_DISABLED;
+ rec->flags += cnt;
if (ftrace_start_up && cnt) {
int failed = __ftrace_replace_code(rec, 1);
--
2.26.2
From: Qiushi Wu <wu000273(a)umn.edu>
[ Upstream commit 17ed808ad243192fb923e4e653c1338d3ba06207 ]
When kobject_init_and_add() returns an error, it should be handled
because kobject_init_and_add() takes a reference even when it fails. If
this function returns an error, kobject_put() must be called to properly
clean up the memory associated with the object.
Therefore, replace the kfree() call with kobject_put(), and add a missing
kobject_put() in the edac_device_register_sysfs_main_kobj() error path.
[ bp: Massage and merge into a single patch. ]
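A minimal sketch of the cleanup rule described above (hypothetical names,
not the EDAC code itself): once kobject_init_and_add() has run, the error
path must drop the reference with kobject_put() and let ->release free the
memory, instead of calling kfree() directly.

	#include <linux/kobject.h>
	#include <linux/slab.h>

	static void example_release(struct kobject *kobj)
	{
		kfree(kobj);		/* memory is freed only via the release callback */
	}

	static struct kobj_type example_ktype = {
		.release = example_release,
	};

	static int example_register_kobj(struct kobject *parent)
	{
		struct kobject *kobj;
		int err;

		kobj = kzalloc(sizeof(*kobj), GFP_KERNEL);
		if (!kobj)
			return -ENOMEM;

		err = kobject_init_and_add(kobj, &example_ktype, parent, "example");
		if (err) {
			/* a bare kfree() here would leak the reference and bypass ->release */
			kobject_put(kobj);
			return err;
		}
		return 0;
	}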
Fixes: b2ed215a3338 ("Kobject: change drivers/edac to use kobject_init_and_add")
Signed-off-by: Qiushi Wu <wu000273(a)umn.edu>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Link: https://lkml.kernel.org/r/20200528202238.18078-1-wu000273@umn.edu
Link: https://lkml.kernel.org/r/20200528203526.20908-1-wu000273@umn.edu
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/edac/edac_device_sysfs.c | 1 +
drivers/edac/edac_pci_sysfs.c | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/edac/edac_device_sysfs.c b/drivers/edac/edac_device_sysfs.c
index fb68a06ad6837..18991cfec2af4 100644
--- a/drivers/edac/edac_device_sysfs.c
+++ b/drivers/edac/edac_device_sysfs.c
@@ -280,6 +280,7 @@ int edac_device_register_sysfs_main_kobj(struct edac_device_ctl_info *edac_dev)
/* Error exit stack */
err_kobj_reg:
+ kobject_put(&edac_dev->kobj);
module_put(edac_dev->owner);
err_mod_get:
diff --git a/drivers/edac/edac_pci_sysfs.c b/drivers/edac/edac_pci_sysfs.c
index 24d877f6e5775..c56128402bc67 100644
--- a/drivers/edac/edac_pci_sysfs.c
+++ b/drivers/edac/edac_pci_sysfs.c
@@ -394,7 +394,7 @@ static int edac_pci_main_kobj_setup(void)
/* Error unwind statck */
kobject_init_and_add_fail:
- kfree(edac_pci_top_main_kobj);
+ kobject_put(edac_pci_top_main_kobj);
kzalloc_fail:
module_put(THIS_MODULE);
--
2.25.1
From: Qiushi Wu <wu000273(a)umn.edu>
[ Upstream commit 17ed808ad243192fb923e4e653c1338d3ba06207 ]
When kobject_init_and_add() returns an error, it should be handled
because kobject_init_and_add() takes a reference even when it fails. If
this function returns an error, kobject_put() must be called to properly
clean up the memory associated with the object.
Therefore, replace the kfree() call with kobject_put(), and add a missing
kobject_put() in the edac_device_register_sysfs_main_kobj() error path.
[ bp: Massage and merge into a single patch. ]
Fixes: b2ed215a3338 ("Kobject: change drivers/edac to use kobject_init_and_add")
Signed-off-by: Qiushi Wu <wu000273(a)umn.edu>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Link: https://lkml.kernel.org/r/20200528202238.18078-1-wu000273@umn.edu
Link: https://lkml.kernel.org/r/20200528203526.20908-1-wu000273@umn.edu
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/edac/edac_device_sysfs.c | 1 +
drivers/edac/edac_pci_sysfs.c | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/edac/edac_device_sysfs.c b/drivers/edac/edac_device_sysfs.c
index 93da1a45c7161..470b02fc2de96 100644
--- a/drivers/edac/edac_device_sysfs.c
+++ b/drivers/edac/edac_device_sysfs.c
@@ -275,6 +275,7 @@ int edac_device_register_sysfs_main_kobj(struct edac_device_ctl_info *edac_dev)
/* Error exit stack */
err_kobj_reg:
+ kobject_put(&edac_dev->kobj);
module_put(edac_dev->owner);
err_out:
diff --git a/drivers/edac/edac_pci_sysfs.c b/drivers/edac/edac_pci_sysfs.c
index 6e3428ba400f3..622d117e25335 100644
--- a/drivers/edac/edac_pci_sysfs.c
+++ b/drivers/edac/edac_pci_sysfs.c
@@ -386,7 +386,7 @@ static int edac_pci_main_kobj_setup(void)
/* Error unwind statck */
kobject_init_and_add_fail:
- kfree(edac_pci_top_main_kobj);
+ kobject_put(edac_pci_top_main_kobj);
kzalloc_fail:
module_put(THIS_MODULE);
--
2.25.1