The patch titled Subject: mm/thp: try_to_unmap() use TTU_SYNC for safe splitting has been added to the -mm tree. Its filename is mm-thp-try_to_unmap-use-ttu_sync-for-safe-splitting.patch
This patch should soon appear at https://ozlabs.org/~akpm/mmots/broken-out/mm-thp-try_to_unmap-use-ttu_sync-f... and later at https://ozlabs.org/~akpm/mmotm/broken-out/mm-thp-try_to_unmap-use-ttu_sync-f...
Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated there every 3-4 working days
------------------------------------------------------ From: Hugh Dickins hughd@google.com Subject: mm/thp: try_to_unmap() use TTU_SYNC for safe splitting
Stressing huge tmpfs often crashed on unmap_page()'s VM_BUG_ON_PAGE (!unmap_success): with dump_page() showing mapcount:1, but then its raw struct page output showing _mapcount ffffffff i.e. mapcount 0.
And even if that particular VM_BUG_ON_PAGE(!unmap_success) is removed, it is immediately followed by a VM_BUG_ON_PAGE(compound_mapcount(head)), and further down an IS_ENABLED(CONFIG_DEBUG_VM) total_mapcount BUG(): all indicative of some mapcount difficulty in development here perhaps. But the !CONFIG_DEBUG_VM path handles the failures correctly and silently.
I believe the problem is that once a racing unmap has cleared pte or pmd, try_to_unmap_one() may skip taking the page table lock, and emerge from try_to_unmap() before the racing task has reached decrementing mapcount.
Instead of abandoning the unsafe VM_BUG_ON_PAGE(), and the ones that follow, use PVMW_SYNC in try_to_unmap_one() in this case: adding TTU_SYNC to the options, and passing that from unmap_page().
When CONFIG_DEBUG_VM, or for non-debug too? Consensus is to do the same for both: the slight overhead added should rarely matter, except perhaps if splitting sparsely-populated multiply-mapped shmem. Once confident that bugs are fixed, TTU_SYNC here can be removed, and the race tolerated.
Link: https://lkml.kernel.org/r/c1e95853-8bcd-d8fd-55fa-e7f2488e78f@google.com Fixes: fec89c109f3a ("thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers") Signed-off-by: Hugh Dickins hughd@google.com Cc: stable@vger.kernel.org Cc: Alistair Popple apopple@nvidia.com Cc: Jan Kara jack@suse.cz Cc: Jue Wang juew@google.com Cc: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: "Matthew Wilcox (Oracle)" willy@infradead.org Cc: Miaohe Lin linmiaohe@huawei.com Cc: Minchan Kim minchan@kernel.org Cc: Naoya Horiguchi naoya.horiguchi@nec.com Cc: Oscar Salvador osalvador@suse.de Cc: Peter Xu peterx@redhat.com Cc: Ralph Campbell rcampbell@nvidia.com Cc: Shakeel Butt shakeelb@google.com Cc: Wang Yugui wangyugui@e16-tech.com Cc: Yang Shi shy828301@gmail.com Cc: Zi Yan ziy@nvidia.com Signed-off-by: Andrew Morton akpm@linux-foundation.org ---
include/linux/rmap.h | 1 + mm/huge_memory.c | 2 +- mm/page_vma_mapped.c | 11 +++++++++++ mm/rmap.c | 17 ++++++++++++++++- 4 files changed, 29 insertions(+), 2 deletions(-)
--- a/include/linux/rmap.h~mm-thp-try_to_unmap-use-ttu_sync-for-safe-splitting +++ a/include/linux/rmap.h @@ -91,6 +91,7 @@ enum ttu_flags {
TTU_SPLIT_HUGE_PMD = 0x4, /* split huge PMD if any */ TTU_IGNORE_MLOCK = 0x8, /* ignore mlock */ + TTU_SYNC = 0x10, /* avoid racy checks with PVMW_SYNC */ TTU_IGNORE_HWPOISON = 0x20, /* corrupted page is recoverable */ TTU_BATCH_FLUSH = 0x40, /* Batch TLB flushes where possible * and caller guarantees they will --- a/mm/huge_memory.c~mm-thp-try_to_unmap-use-ttu_sync-for-safe-splitting +++ a/mm/huge_memory.c @@ -2350,7 +2350,7 @@ void vma_adjust_trans_huge(struct vm_are
static void unmap_page(struct page *page) { - enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK | + enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK | TTU_SYNC | TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD; bool unmap_success;
--- a/mm/page_vma_mapped.c~mm-thp-try_to_unmap-use-ttu_sync-for-safe-splitting +++ a/mm/page_vma_mapped.c @@ -212,6 +212,17 @@ restart: pvmw->ptl = NULL; } } else if (!pmd_present(pmde)) { + /* + * If PVMW_SYNC, take and drop THP pmd lock so that we + * cannot return prematurely, while zap_huge_pmd() has + * cleared *pmd but not decremented compound_mapcount(). + */ + if ((pvmw->flags & PVMW_SYNC) && + PageTransCompound(pvmw->page)) { + spinlock_t *ptl = pmd_lock(mm, pvmw->pmd); + + spin_unlock(ptl); + } return false; } if (!map_pte(pvmw)) --- a/mm/rmap.c~mm-thp-try_to_unmap-use-ttu_sync-for-safe-splitting +++ a/mm/rmap.c @@ -1405,6 +1405,15 @@ static bool try_to_unmap_one(struct page struct mmu_notifier_range range; enum ttu_flags flags = (enum ttu_flags)(long)arg;
+ /* + * When racing against e.g. zap_pte_range() on another cpu, + * in between its ptep_get_and_clear_full() and page_remove_rmap(), + * try_to_unmap() may return false when it is about to become true, + * if page table locking is skipped: use TTU_SYNC to wait for that. + */ + if (flags & TTU_SYNC) + pvmw.flags = PVMW_SYNC; + /* munlock has nothing to gain from examining un-locked vmas */ if ((flags & TTU_MUNLOCK) && !(vma->vm_flags & VM_LOCKED)) return true; @@ -1777,7 +1786,13 @@ bool try_to_unmap(struct page *page, enu else rmap_walk(page, &rwc);
- return !page_mapcount(page) ? true : false; + /* + * When racing against e.g. zap_pte_range() on another cpu, + * in between its ptep_get_and_clear_full() and page_remove_rmap(), + * try_to_unmap() may return false when it is about to become true, + * if page table locking is skipped: use TTU_SYNC to wait for that. + */ + return !page_mapcount(page); }
/** _
Patches currently in -mm which might be from hughd@google.com are
mm-thp-fix-__split_huge_pmd_locked-on-shmem-migration-entry.patch mm-thp-make-is_huge_zero_pmd-safe-and-quicker.patch mm-thp-try_to_unmap-use-ttu_sync-for-safe-splitting.patch mm-thp-fix-vma_address-if-virtual-address-below-file-offset.patch mm-thp-unmap_mapping_page-to-fix-thp-truncate_cleanup_page.patch mm-page_vma_mapped_walk-use-page-for-pvmw-page.patch mm-page_vma_mapped_walk-settle-pagehuge-on-entry.patch mm-page_vma_mapped_walk-use-pmd_read_atomic.patch mm-page_vma_mapped_walk-use-pmde-for-pvmw-pmd.patch mm-page_vma_mapped_walk-prettify-pvmw_migration-block.patch mm-page_vma_mapped_walk-crossing-page-table-boundary.patch mm-page_vma_mapped_walk-add-a-level-of-indentation.patch mm-page_vma_mapped_walk-use-goto-instead-of-while-1.patch mm-page_vma_mapped_walk-get-vma_address_end-earlier.patch mm-thp-fix-page_vma_mapped_walk-if-thp-mapped-by-ptes.patch mm-thp-another-pvmw_sync-fix-in-page_vma_mapped_walk.patch mm-thp-remap_page-is-only-needed-on-anonymous-thp.patch mm-hwpoison_user_mappings-try_to_unmap-with-ttu_sync.patch