The patch titled
Subject: hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
hugetlb-dont-delete-vma_lock-in-hugetlb-madv_dontneed-processing.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing
Date: Mon, 14 Nov 2022 15:55:06 -0800
madvise(MADV_DONTNEED) ends up calling zap_page_range() to clear page
tables associated with the address range. For hugetlb vmas,
zap_page_range will call __unmap_hugepage_range_final. However,
__unmap_hugepage_range_final assumes the passed vma is about to be removed
and deletes the vma_lock to prevent pmd sharing as the vma is on the way
out. In the case of madvise(MADV_DONTNEED) the vma remains, but the
missing vma_lock prevents pmd sharing and could potentially lead to issues
with truncation/fault races.
This issue was originally reported here [1] as a BUG triggered in
page_try_dup_anon_rmap. Prior to the introduction of the hugetlb
vma_lock, __unmap_hugepage_range_final cleared the VM_MAYSHARE flag to
prevent pmd sharing. Subsequent faults on this vma were confused:
VM_MAYSHARE indicates a sharable vma, but since it was no longer set,
page_mapping was not set in new pages added to the page table. This
resulted in pages that appeared anonymous in a VM_SHARED vma and
triggered the BUG.
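A minimal userspace sketch of the triggering sequence (not the original
reproducer; it assumes a 2MB default huge page size and available huge
pages): zap a shared hugetlb mapping with MADV_DONTNEED, then fault it
again while the vma, now missing its vma_lock on unfixed kernels, is
still in place.
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
int main(void)
{
	size_t len = 2UL << 20;	/* assumed 2MB huge page size */
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (p == MAP_FAILED)
		return 1;
	memset(p, 'x', len);		/* populate the huge page */
	madvise(p, len, MADV_DONTNEED);	/* zaps ptes; the vma remains */
	p[0] = 'y';			/* fresh fault on the surviving vma */
	munmap(p, len);
	return 0;
}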
Address the issue by adding a new zap flag, ZAP_FLAG_UNMAP, to indicate an
unmap call from unmap_vmas(). This is used to indicate the 'final'
unmapping of a hugetlb vma. When called via MADV_DONTNEED, this flag is
not set and the vma_lock is not deleted.
[1] https://lore.kernel.org/lkml/CAO4mrfdLMXsao9RF4fUE8-Wfde8xmjsKrTNMNC9wjUb6J…
Link: https://lkml.kernel.org/r/20221114235507.294320-3-mike.kravetz@oracle.com
Fixes: 90e7e7f5ef3f ("mm: enable MADV_DONTNEED for hugetlb mappings")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reported-by: Wei Chen <harperchen1110(a)gmail.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Mina Almasry <almasrymina(a)google.com>
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Cc: Naoya Horiguchi <naoya.horiguchi(a)linux.dev>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mm.h | 2 ++
mm/hugetlb.c | 27 ++++++++++++++++-----------
mm/memory.c | 2 +-
3 files changed, 19 insertions(+), 12 deletions(-)
--- a/include/linux/mm.h~hugetlb-dont-delete-vma_lock-in-hugetlb-madv_dontneed-processing
+++ a/include/linux/mm.h
@@ -1868,6 +1868,8 @@ struct zap_details {
* default, the flag is not set.
*/
#define ZAP_FLAG_DROP_MARKER ((__force zap_flags_t) BIT(0))
+/* Set in unmap_vmas() to indicate a final unmap call. Only used by hugetlb */
+#define ZAP_FLAG_UNMAP ((__force zap_flags_t) BIT(1))
#ifdef CONFIG_MMU
extern bool can_do_mlock(void);
--- a/mm/hugetlb.c~hugetlb-dont-delete-vma_lock-in-hugetlb-madv_dontneed-processing
+++ a/mm/hugetlb.c
@@ -5204,17 +5204,22 @@ void __unmap_hugepage_range_final(struct
__unmap_hugepage_range(tlb, vma, start, end, ref_page, zap_flags);
- /*
- * Unlock and free the vma lock before releasing i_mmap_rwsem. When
- * the vma_lock is freed, this makes the vma ineligible for pmd
- * sharing. And, i_mmap_rwsem is required to set up pmd sharing.
- * This is important as page tables for this unmapped range will
- * be asynchrously deleted. If the page tables are shared, there
- * will be issues when accessed by someone else.
- */
- __hugetlb_vma_unlock_write_free(vma);
-
- i_mmap_unlock_write(vma->vm_file->f_mapping);
+ if (zap_flags & ZAP_FLAG_UNMAP) { /* final unmap */
+ /*
+ * Unlock and free the vma lock before releasing i_mmap_rwsem.
+ * When the vma_lock is freed, this makes the vma ineligible
+ * for pmd sharing. And, i_mmap_rwsem is required to set up
+ * pmd sharing. This is important as page tables for this
+ * unmapped range will be asynchrously deleted. If the page
+ * tables are shared, there will be issues when accessed by
+ * someone else.
+ */
+ __hugetlb_vma_unlock_write_free(vma);
+ i_mmap_unlock_write(vma->vm_file->f_mapping);
+ } else {
+ i_mmap_unlock_write(vma->vm_file->f_mapping);
+ hugetlb_vma_unlock_write(vma);
+ }
}
void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
--- a/mm/memory.c~hugetlb-dont-delete-vma_lock-in-hugetlb-madv_dontneed-processing
+++ a/mm/memory.c
@@ -1711,7 +1711,7 @@ void unmap_vmas(struct mmu_gather *tlb,
{
struct mmu_notifier_range range;
struct zap_details details = {
- .zap_flags = ZAP_FLAG_DROP_MARKER,
+ .zap_flags = ZAP_FLAG_DROP_MARKER | ZAP_FLAG_UNMAP,
/* Careful - we need to zap private pages too! */
.even_cows = true,
};
_
Patches currently in -mm which might be from mike.kravetz(a)oracle.com are
ipc-shm-call-underlying-open-close-vm_ops.patch
madvise-use-zap_page_range_single-for-madvise-dontneed.patch
hugetlb-dont-delete-vma_lock-in-hugetlb-madv_dontneed-processing.patch
selftests-vm-update-hugetlb-madvise.patch
hugetlb-remove-duplicate-mmu-notifications.patch
The patch titled
Subject: madvise: use zap_page_range_single for madvise dontneed
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
madvise-use-zap_page_range_single-for-madvise-dontneed.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: madvise: use zap_page_range_single for madvise dontneed
Date: Mon, 14 Nov 2022 15:55:05 -0800
This series addresses the issue first reported in [1], and fully described
in patch 2. Patches 1 and 2 address the user-visible issue and are tagged
for stable backports.
While exploring solutions to this issue, related problems with mmu
notification calls were discovered. These are addressed in the patch
"hugetlb: remove duplicate mmu notifications". Since there are no
user-visible effects, this third patch is not tagged for stable backports.
Previous discussions suggested further cleanup by removing the
routine zap_page_range. This is possible because zap_page_range_single
is now exported, and all callers of zap_page_range pass ranges entirely
within a single vma. This work will be done in a later patch so as not
to distract from this bug fix.
[1] https://lore.kernel.org/lkml/CAO4mrfdLMXsao9RF4fUE8-Wfde8xmjsKrTNMNC9wjUb6J…
This patch (of 2):
Expose the routine zap_page_range_single to zap a range within a single
vma. The madvise routine madvise_dontneed_single_vma can use this routine
as it explicitly operates on a single vma. Also, update the mmu
notification range in zap_page_range_single to take hugetlb pmd sharing
into account. This is required as MADV_DONTNEED supports hugetlb vmas.
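A worked example of the range adjustment with made-up addresses, assuming
x86-64 where 2MB hugetlb ptes may be shared via pmd pages covering
PUD_SIZE (1GB) aligned regions:
/*
 * madvise(MADV_DONTNEED) over a single 2MB page:
 *	address = 0x40200000, size = 0x200000
 *	range actually unmapped:	[0x40200000, 0x40400000)
 * adjust_range_if_pmd_sharing_possible() widens only the mmu notifier
 * range to the enclosing PUD_SIZE-aligned region:
 *	range notified:			[0x40000000, 0x80000000)
 * Any mm sharing the pmd page must see the invalidation, even though
 * only the caller's 2MB is unmapped.
 */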
Link: https://lkml.kernel.org/r/20221114235507.294320-1-mike.kravetz@oracle.com
Link: https://lkml.kernel.org/r/20221114235507.294320-2-mike.kravetz@oracle.com
Fixes: 90e7e7f5ef3f ("mm: enable MADV_DONTNEED for hugetlb mappings")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reported-by: Wei Chen <harperchen1110(a)gmail.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Mina Almasry <almasrymina(a)google.com>
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Cc: Naoya Horiguchi <naoya.horiguchi(a)linux.dev>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mm.h | 27 +++++++++++++++++++--------
mm/madvise.c | 6 +++---
mm/memory.c | 23 +++++++++++------------
3 files changed, 33 insertions(+), 23 deletions(-)
--- a/include/linux/mm.h~madvise-use-zap_page_range_single-for-madvise-dontneed
+++ a/include/linux/mm.h
@@ -1852,6 +1852,23 @@ static void __maybe_unused show_free_are
__show_free_areas(flags, nodemask, MAX_NR_ZONES - 1);
}
+/*
+ * Parameter block passed down to zap_pte_range in exceptional cases.
+ */
+struct zap_details {
+ struct folio *single_folio; /* Locked folio to be unmapped */
+ bool even_cows; /* Zap COWed private pages too? */
+ zap_flags_t zap_flags; /* Extra flags for zapping */
+};
+
+/*
+ * Whether to drop the pte markers, for example, the uffd-wp information for
+ * file-backed memory. This should only be specified when we will completely
+ * drop the page in the mm, either by truncation or unmapping of the vma. By
+ * default, the flag is not set.
+ */
+#define ZAP_FLAG_DROP_MARKER ((__force zap_flags_t) BIT(0))
+
#ifdef CONFIG_MMU
extern bool can_do_mlock(void);
#else
@@ -1869,6 +1886,8 @@ void zap_vma_ptes(struct vm_area_struct
unsigned long size);
void zap_page_range(struct vm_area_struct *vma, unsigned long address,
unsigned long size);
+void zap_page_range_single(struct vm_area_struct *vma, unsigned long address,
+ unsigned long size, struct zap_details *details);
void unmap_vmas(struct mmu_gather *tlb, struct maple_tree *mt,
struct vm_area_struct *start_vma, unsigned long start,
unsigned long end);
@@ -3467,12 +3486,4 @@ madvise_set_anon_name(struct mm_struct *
}
#endif
-/*
- * Whether to drop the pte markers, for example, the uffd-wp information for
- * file-backed memory. This should only be specified when we will completely
- * drop the page in the mm, either by truncation or unmapping of the vma. By
- * default, the flag is not set.
- */
-#define ZAP_FLAG_DROP_MARKER ((__force zap_flags_t) BIT(0))
-
#endif /* _LINUX_MM_H */
--- a/mm/madvise.c~madvise-use-zap_page_range_single-for-madvise-dontneed
+++ a/mm/madvise.c
@@ -772,8 +772,8 @@ static int madvise_free_single_vma(struc
* Application no longer needs these pages. If the pages are dirty,
* it's OK to just throw them away. The app will be more careful about
* data it wants to keep. Be sure to free swap resources too. The
- * zap_page_range call sets things up for shrink_active_list to actually free
- * these pages later if no one else has touched them in the meantime,
+ * zap_page_range_single call sets things up for shrink_active_list to actually
+ * free these pages later if no one else has touched them in the meantime,
* although we could add these pages to a global reuse list for
* shrink_active_list to pick up before reclaiming other pages.
*
@@ -790,7 +790,7 @@ static int madvise_free_single_vma(struc
static long madvise_dontneed_single_vma(struct vm_area_struct *vma,
unsigned long start, unsigned long end)
{
- zap_page_range(vma, start, end - start);
+ zap_page_range_single(vma, start, end - start, NULL);
return 0;
}
--- a/mm/memory.c~madvise-use-zap_page_range_single-for-madvise-dontneed
+++ a/mm/memory.c
@@ -1341,15 +1341,6 @@ copy_page_range(struct vm_area_struct *d
return ret;
}
-/*
- * Parameter block passed down to zap_pte_range in exceptional cases.
- */
-struct zap_details {
- struct folio *single_folio; /* Locked folio to be unmapped */
- bool even_cows; /* Zap COWed private pages too? */
- zap_flags_t zap_flags; /* Extra flags for zapping */
-};
-
/* Whether we should zap all COWed (private) pages too */
static inline bool should_zap_cows(struct zap_details *details)
{
@@ -1774,19 +1765,27 @@ void zap_page_range(struct vm_area_struc
*
* The range must fit into one VMA.
*/
-static void zap_page_range_single(struct vm_area_struct *vma, unsigned long address,
+void zap_page_range_single(struct vm_area_struct *vma, unsigned long address,
unsigned long size, struct zap_details *details)
{
+ const unsigned long end = address + size;
struct mmu_notifier_range range;
struct mmu_gather tlb;
lru_add_drain();
mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
- address, address + size);
+ address, end);
+ if (is_vm_hugetlb_page(vma))
+ adjust_range_if_pmd_sharing_possible(vma, &range.start,
+ &range.end);
tlb_gather_mmu(&tlb, vma->vm_mm);
update_hiwater_rss(vma->vm_mm);
mmu_notifier_invalidate_range_start(&range);
- unmap_single_vma(&tlb, vma, address, range.end, details);
+ /*
+ * unmap 'address-end' not 'range.start-range.end' as range
+ * could have been expanded for hugetlb pmd sharing.
+ */
+ unmap_single_vma(&tlb, vma, address, end, details);
mmu_notifier_invalidate_range_end(&range);
tlb_finish_mmu(&tlb);
}
_
Patches currently in -mm which might be from mike.kravetz(a)oracle.com are
ipc-shm-call-underlying-open-close-vm_ops.patch
madvise-use-zap_page_range_single-for-madvise-dontneed.patch
hugetlb-dont-delete-vma_lock-in-hugetlb-madv_dontneed-processing.patch
selftests-vm-update-hugetlb-madvise.patch
hugetlb-remove-duplicate-mmu-notifications.patch
The patch titled
Subject: mm/migrate: fix read-only page got writable when recover pte
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-migrate-fix-read-only-page-got-writable-when-recover-pte.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Peter Xu <peterx(a)redhat.com>
Subject: mm/migrate: fix read-only page got writable when recover pte
Date: Sun, 13 Nov 2022 19:04:46 -0500
Ives van Hoorne from codesandbox.io reported an issue regarding possible
data loss of uffd-wp when applied to memfds on heavily loaded systems.
The symptom is that some pages read back from the snapshot child VMs show
a data mismatch.
I can also reproduce this with a Rust reproducer provided by Ives that
keeps taking snapshots of a 256MB VM; on a 32G system, when I initiate 80
instances I can trigger the issue within ten minutes.
It turns out that some pages can be written through even if uffd-wp is
applied to the pte.
The problem is, when removing migration entries, we didn't really worry
about the write bit as long as we knew it wasn't a write migration entry.
That may not be true: for some memory types (e.g. writable shmem) mk_pte
can return a pte with the write bit set, so to recover the migration entry
to its original state we need to explicitly wr-protect the pte, or it'll
have the write bit set if it's a read migration entry. For uffd this can
cause write-through.
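The rule the fix enforces, restated as a sketch (condensed from the hunk
below, not literal kernel context):
	pte = mk_pte(page, vma->vm_page_prot);	/* may come back writable, e.g. writable shmem */
	if (is_writable_migration_entry(entry))
		pte = maybe_mkwrite(pte, vma);
	else
		pte = pte_wrprotect(pte);	/* read entry: force the write bit off */
	if (pte_swp_uffd_wp(*pvmw.pte))
		pte = pte_mkuffd_wp(pte);	/* uffd-wp must never sit on a writable pte */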
The relevant code on uffd was introduced in the anon support, which is
commit f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration",
2020-04-07). However, anon shouldn't suffer from this problem because anon
ptes should always have the write bit cleared already, so that may not be
a proper Fixes target; instead, I'm pointing the Fixes tag at the uffd
shmem support.
Link: https://lkml.kernel.org/r/20221114000447.1681003-2-peterx@redhat.com
Fixes: b1f9e876862d ("mm/uffd: enable write protection for shmem & hugetlbfs")
Reported-by: Ives van Hoorne <ives(a)codesandbox.io>
Reviewed-by: Alistair Popple <apopple(a)nvidia.com>
Tested-by: Ives van Hoorne <ives(a)codesandbox.io>
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: Mike Rapoport <rppt(a)linux.vnet.ibm.com>
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/migrate.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
--- a/mm/migrate.c~mm-migrate-fix-read-only-page-got-writable-when-recover-pte
+++ a/mm/migrate.c
@@ -213,8 +213,14 @@ static bool remove_migration_pte(struct
pte = pte_mkdirty(pte);
if (is_writable_migration_entry(entry))
pte = maybe_mkwrite(pte, vma);
- else if (pte_swp_uffd_wp(*pvmw.pte))
+ else
+ /* NOTE: mk_pte can have write bit set */
+ pte = pte_wrprotect(pte);
+
+ if (pte_swp_uffd_wp(*pvmw.pte)) {
+ WARN_ON_ONCE(pte_write(pte));
pte = pte_mkuffd_wp(pte);
+ }
if (folio_test_anon(folio) && !is_readable_migration_entry(entry))
rmap_flags |= RMAP_EXCLUSIVE;
_
Patches currently in -mm which might be from peterx(a)redhat.com are
mm-migrate-fix-read-only-page-got-writable-when-recover-pte.patch
mm-always-compile-in-pte-markers.patch
mm-use-pte-markers-for-swap-errors.patch
Now that we've fixed the issue with using the incorrect topology manager,
we're actually grabbing the topology manager's lock - and consequently
deadlocking. Luckily for us though, there's actually nothing in AMD's DSC
state computation code that really should need this lock. The one exception
is the mutex_lock() in dm_dp_mst_is_port_support_mode(); however, we grab
no locks beneath &mgr->lock there, so it should be fine to leave as-is.
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Gitlab issue: https://gitlab.freedesktop.org/drm/amd/-/issues/2171
Fixes: 8c20a1ed9b4f ("drm/amd/display: MST DSC compute fair share")
Cc: <stable(a)vger.kernel.org> # v5.6+
---
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
index 5196c9a0e432d..59648f5ffb59d 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
@@ -1148,10 +1148,8 @@ int compute_mst_dsc_configs_for_state(struct drm_atomic_state *state,
continue;
mst_mgr = aconnector->port->mgr;
- mutex_lock(&mst_mgr->lock);
ret = compute_mst_dsc_configs_for_link(state, dc_state, stream->link, vars, mst_mgr,
&link_vars_start_index);
- mutex_unlock(&mst_mgr->lock);
if (ret != 0)
return ret;
@@ -1208,10 +1206,8 @@ static int pre_compute_mst_dsc_configs_for_state(struct drm_atomic_state *state,
continue;
mst_mgr = aconnector->port->mgr;
- mutex_lock(&mst_mgr->lock);
ret = compute_mst_dsc_configs_for_link(state, dc_state, stream->link, vars, mst_mgr,
&link_vars_start_index);
- mutex_unlock(&mst_mgr->lock);
if (ret != 0)
return ret;
--
2.37.3
This bug hurt me. Basically, it appears that we've been grabbing the
entirely wrong mutex in the MST DSC computation code for amdgpu! We've
been grabbing:
amdgpu_dm_connector->mst_mgr
but that's zero-initialized memory, because the only connectors we'll ever
actually be doing DSC computations for are MST ports, which have mst_mgr
zero-initialized and instead have the correct topology mgr pointer located
at:
amdgpu_dm_connector->mst_port->mgr;
I'm a bit impressed that until now, this code has managed not to crash
anyone's systems! It does seem to cause a warning in LOCKDEP though:
[ 66.637670] DEBUG_LOCKS_WARN_ON(lock->magic != lock)
This was causing the problems that appeared to have been introduced by:
commit 4d07b0bc4034 ("drm/display/dp_mst: Move all payload info into the atomic state")
That commit wasn't actually where the problems came from, though.
Presumably before it, the only thing we were doing with the topology mgr
pointer was attempting to grab mst_mgr->lock. Since the above commit,
however, we grab much more information from mst_mgr, including the atomic
MST state and the respective modesetting locks.
This patch also implies that up until now, it's quite likely we could be
susceptible to race conditions when going through the MST topology state
for DSC computations since we technically will not have grabbed any lock
when going through it.
So, let's fix this by adjusting all the respective code paths to look at
the right pointer and skip things that aren't actual MST connectors from a
topology.
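For reference, a rough sketch of the fields involved (abridged; treat the
exact layout as illustrative):
	struct amdgpu_dm_connector {
		...
		/* embedded mgr; only meaningful on the root connector */
		struct drm_dp_mst_topology_mgr mst_mgr;
		/* set on MST connectors; the real mgr lives behind it */
		struct drm_dp_mst_port *port;
		...
	};
For an MST port the usable topology manager is aconnector->port->mgr,
while &aconnector->mst_mgr is the zero-initialized embedded copy.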
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Gitlab issue: https://gitlab.freedesktop.org/drm/amd/-/issues/2171
Fixes: 8c20a1ed9b4f ("drm/amd/display: MST DSC compute fair share")
Cc: <stable(a)vger.kernel.org> # v5.6+
---
.../display/amdgpu_dm/amdgpu_dm_mst_types.c | 37 +++++++++----------
1 file changed, 18 insertions(+), 19 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
index bba2e8aaa2c20..5196c9a0e432d 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
@@ -1117,6 +1117,7 @@ int compute_mst_dsc_configs_for_state(struct drm_atomic_state *state,
struct dc_stream_state *stream;
bool computed_streams[MAX_PIPES];
struct amdgpu_dm_connector *aconnector;
+ struct drm_dp_mst_topology_mgr *mst_mgr;
int link_vars_start_index = 0;
int ret = 0;
@@ -1131,7 +1132,7 @@ int compute_mst_dsc_configs_for_state(struct drm_atomic_state *state,
aconnector = (struct amdgpu_dm_connector *)stream->dm_stream_context;
- if (!aconnector || !aconnector->dc_sink)
+ if (!aconnector || !aconnector->dc_sink || !aconnector->port)
continue;
if (!aconnector->dc_sink->dsc_caps.dsc_dec_caps.is_dsc_supported)
@@ -1146,16 +1147,13 @@ int compute_mst_dsc_configs_for_state(struct drm_atomic_state *state,
if (!is_dsc_need_re_compute(state, dc_state, stream->link))
continue;
- mutex_lock(&aconnector->mst_mgr.lock);
-
- ret = compute_mst_dsc_configs_for_link(state, dc_state, stream->link, vars,
- &aconnector->mst_mgr,
+ mst_mgr = aconnector->port->mgr;
+ mutex_lock(&mst_mgr->lock);
+ ret = compute_mst_dsc_configs_for_link(state, dc_state, stream->link, vars, mst_mgr,
&link_vars_start_index);
- if (ret != 0) {
- mutex_unlock(&aconnector->mst_mgr.lock);
+ mutex_unlock(&mst_mgr->lock);
+ if (ret != 0)
return ret;
- }
- mutex_unlock(&aconnector->mst_mgr.lock);
for (j = 0; j < dc_state->stream_count; j++) {
if (dc_state->streams[j]->link == stream->link)
@@ -1182,6 +1180,7 @@ static int pre_compute_mst_dsc_configs_for_state(struct drm_atomic_state *state,
struct dc_stream_state *stream;
bool computed_streams[MAX_PIPES];
struct amdgpu_dm_connector *aconnector;
+ struct drm_dp_mst_topology_mgr *mst_mgr;
int link_vars_start_index = 0;
int ret;
@@ -1196,7 +1195,7 @@ static int pre_compute_mst_dsc_configs_for_state(struct drm_atomic_state *state,
aconnector = (struct amdgpu_dm_connector *)stream->dm_stream_context;
- if (!aconnector || !aconnector->dc_sink)
+ if (!aconnector || !aconnector->dc_sink || !aconnector->port)
continue;
if (!aconnector->dc_sink->dsc_caps.dsc_dec_caps.is_dsc_supported)
@@ -1208,15 +1207,13 @@ static int pre_compute_mst_dsc_configs_for_state(struct drm_atomic_state *state,
if (!is_dsc_need_re_compute(state, dc_state, stream->link))
continue;
- mutex_lock(&aconnector->mst_mgr.lock);
- ret = compute_mst_dsc_configs_for_link(state, dc_state, stream->link, vars,
- &aconnector->mst_mgr,
+ mst_mgr = aconnector->port->mgr;
+ mutex_lock(&mst_mgr->lock);
+ ret = compute_mst_dsc_configs_for_link(state, dc_state, stream->link, vars, mst_mgr,
&link_vars_start_index);
- if (ret != 0) {
- mutex_unlock(&aconnector->mst_mgr.lock);
+ mutex_unlock(&mst_mgr->lock);
+ if (ret != 0)
return ret;
- }
- mutex_unlock(&aconnector->mst_mgr.lock);
for (j = 0; j < dc_state->stream_count; j++) {
if (dc_state->streams[j]->link == stream->link)
@@ -1419,6 +1416,7 @@ enum dc_status dm_dp_mst_is_port_support_mode(
unsigned int upper_link_bw_in_kbps = 0, down_link_bw_in_kbps = 0;
unsigned int max_compressed_bw_in_kbps = 0;
struct dc_dsc_bw_range bw_range = {0};
+ struct drm_dp_mst_topology_mgr *mst_mgr;
/*
* check if the mode could be supported if DSC pass-through is supported
@@ -1427,7 +1425,8 @@ enum dc_status dm_dp_mst_is_port_support_mode(
*/
if (is_dsc_common_config_possible(stream, &bw_range) &&
aconnector->port->passthrough_aux) {
- mutex_lock(&aconnector->mst_mgr.lock);
+ mst_mgr = aconnector->port->mgr;
+ mutex_lock(&mst_mgr->lock);
cur_link_settings = stream->link->verified_link_cap;
@@ -1440,7 +1439,7 @@ enum dc_status dm_dp_mst_is_port_support_mode(
end_to_end_bw_in_kbps = min(upper_link_bw_in_kbps,
down_link_bw_in_kbps);
- mutex_unlock(&aconnector->mst_mgr.lock);
+ mutex_unlock(&mst_mgr->lock);
/*
* use the maximum dsc compression bandwidth as the required
--
2.37.3
Looks like we're accidentally dropping a pretty important return code
here. For some reason, we just return -EINVAL if we fail to get the MST
topology state. This is wrong: error codes are important and should never
be squashed without being handled, which here seems to have the potential
to cause a deadlock.
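One note on why propagating matters here (my reading, not spelled out in
the patch): in atomic modesetting, drm_atomic_get_mst_topology_state() can
return ERR_PTR(-EDEADLK) when lock acquisition must back off, and the
atomic machinery relies on seeing -EDEADLK to drop its locks and retry:
	mst_state = drm_atomic_get_mst_topology_state(state, mgr);
	if (IS_ERR(mst_state))
		return PTR_ERR(mst_state);	/* may be -EDEADLK: retry, don't fail */
Squashing that to -EINVAL defeats the backoff and can leave the locks
deadlocked.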
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Reviewed-by: Wayne Lin <Wayne.Lin(a)amd.com>
Fixes: 8ec046716ca8 ("drm/dp_mst: Add helper to trigger modeset on affected DSC MST CRTCs")
Cc: <stable(a)vger.kernel.org> # v5.6+
---
drivers/gpu/drm/display/drm_dp_mst_topology.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/display/drm_dp_mst_topology.c b/drivers/gpu/drm/display/drm_dp_mst_topology.c
index ecd22c038c8c0..51a46689cda70 100644
--- a/drivers/gpu/drm/display/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/display/drm_dp_mst_topology.c
@@ -5186,7 +5186,7 @@ int drm_dp_mst_add_affected_dsc_crtcs(struct drm_atomic_state *state, struct drm
mst_state = drm_atomic_get_mst_topology_state(state, mgr);
if (IS_ERR(mst_state))
- return -EINVAL;
+ return PTR_ERR(mst_state);
list_for_each_entry(pos, &mst_state->payloads, next) {
--
2.37.3
The patch titled
Subject: mm/damon/sysfs-schemes: skip stats update if the scheme directory is removed
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-damon-sysfs-schemes-skip-stats-update-if-the-scheme-directory-is-removed.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: SeongJae Park <sj(a)kernel.org>
Subject: mm/damon/sysfs-schemes: skip stats update if the scheme directory is removed
Date: Mon, 14 Nov 2022 17:55:52 +0000
A DAMON sysfs interface user can start DAMON with a scheme, remove the
sysfs directory for the scheme, and then request an update of the scheme's
stats. Because the scheme stats update logic isn't aware of the situation,
this results in an invalid memory access. Fix the bug by checking whether
the scheme sysfs directory still exists.
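A sketch of the triggering sequence from userspace (sysfs paths and the
update_schemes_stats command per my reading of the DAMON sysfs ABI; treat
the exact layout as illustrative, and assume kdamonds/0 was already set up
with one context and one scheme):
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
static void echo(const char *path, const char *val)
{
	int fd = open(path, O_WRONLY);
	if (fd < 0)
		return;
	if (write(fd, val, strlen(val)) < 0)
		perror(path);
	close(fd);
}
int main(void)
{
	const char *kd = "/sys/kernel/mm/damon/admin/kdamonds/0";
	char p[128];
	snprintf(p, sizeof(p), "%s/state", kd);
	echo(p, "on");				/* start DAMON with the scheme */
	snprintf(p, sizeof(p), "%s/contexts/0/schemes/nr_schemes", kd);
	echo(p, "0");				/* removes the scheme sysfs dir */
	snprintf(p, sizeof(p), "%s/state", kd);
	echo(p, "update_schemes_stats");	/* stats update walks a stale array */
	return 0;
}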
Link: https://lkml.kernel.org/r/20221114175552.1951-1-sj@kernel.org
Fixes: 0ac32b8affb5 ("mm/damon/sysfs: support DAMOS stats")
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Cc: <stable(a)vger.kernel.org> [v5.18]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/damon/sysfs.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/mm/damon/sysfs.c~mm-damon-sysfs-schemes-skip-stats-update-if-the-scheme-directory-is-removed
+++ a/mm/damon/sysfs.c
@@ -2339,6 +2339,10 @@ static int damon_sysfs_upd_schemes_stats
damon_for_each_scheme(scheme, ctx) {
struct damon_sysfs_stats *sysfs_stats;
+ /* user could removed the scheme sysfs dir */
+ if (schemes_idx >= sysfs_schemes->nr)
+ break;
+
sysfs_stats = sysfs_schemes->schemes_arr[schemes_idx++]->stats;
sysfs_stats->nr_tried = scheme->stat.nr_tried;
sysfs_stats->sz_tried = scheme->stat.sz_tried;
_
Patches currently in -mm which might be from sj(a)kernel.org are
mm-damon-sysfs-schemes-skip-stats-update-if-the-scheme-directory-is-removed.patch
docs-admin-guide-mm-damon-usage-describe-the-rules-of-sysfs-region-directories.patch
docs-admin-guide-mm-damon-usage-fix-wrong-usage-example-of-init_regions-file.patch
mm-damon-core-add-a-callback-for-scheme-target-regions-check.patch
mm-damon-sysfs-schemes-implement-schemes-tried_regions-directory.patch
mm-damon-sysfs-schemes-implement-scheme-region-directory.patch
mm-damon-sysfs-implement-damos-tried-regions-update-command.patch
mm-damon-sysfs-implement-damos-tried-regions-update-command-fix.patch
mm-damon-sysfs-schemes-implement-damos-tried-regions-clear-command.patch
mm-damon-sysfs-schemes-implement-damos-tried-regions-clear-command-fix.patch
tools-selftets-damon-sysfs-test-tried_regions-directory-existence.patch
docs-admin-guide-mm-damon-usage-document-schemes-s-tried_regions-sysfs-directory.patch
docs-abi-damon-document-schemes-s-tried_regions-sysfs-directory.patch
selftests-damon-test-non-context-inputs-to-rm_contexts-file.patch