The patch titled
     Subject: hugetlbfs: remove unnecessary code after i_mmap_rwsem synchronization
has been added to the -mm tree.  Its filename is
     hugetlbfs-remove-unnecessary-code-after-i_mmap_rwsem-synchronization.patch
This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/hugetlbfs-remove-unnecessary-code-a...
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/hugetlbfs-remove-unnecessary-code-a...
Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated there every 3-4 working days
------------------------------------------------------
From: Mike Kravetz <mike.kravetz@oracle.com>
Subject: hugetlbfs: remove unnecessary code after i_mmap_rwsem synchronization
After expanding i_mmap_rwsem use for better shared pmd and page fault/truncation
synchronization, remove code that is no longer necessary.
Link: http://lkml.kernel.org/r/20181203200850.6460-4-mike.kravetz@oracle.com
Fixes: ebed4bfc8da8 ("hugetlb: fix absurd HugePages_Rsvd")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/hugetlbfs/inode.c |   46 +++++++++++++----------------------------
 mm/hugetlb.c         |   21 ++++++++----------
 2 files changed, 25 insertions(+), 42 deletions(-)

--- a/fs/hugetlbfs/inode.c~hugetlbfs-remove-unnecessary-code-after-i_mmap_rwsem-synchronization
+++ a/fs/hugetlbfs/inode.c
@@ -383,17 +383,16 @@ hugetlb_vmdelete_list(struct rb_root_cac
  * truncation is indicated by end of range being LLONG_MAX
  *	In this case, we first scan the range and release found pages.
  *	After releasing pages, hugetlb_unreserve_pages cleans up region/reserv
- *	maps and global counts. Page faults can not race with truncation
- *	in this routine. hugetlb_no_page() prevents page faults in the
- *	truncated range. It checks i_size before allocation, and again after
- *	with the page table lock for the page held. The same lock must be
- *	acquired to unmap a page.
+ *	maps and global counts.
  * hole punch is indicated if end is not LLONG_MAX
  *	In the hole punch case we scan the range and release found pages.
  *	Only when releasing a page is the associated region/reserv map
  *	deleted. The region/reserv map for ranges without associated
- *	pages are not modified. Page faults can race with hole punch.
- *	This is indicated if we find a mapped page.
+ *	pages are not modified.
+ *
+ * Callers of this routine must hold the i_mmap_rwsem in write mode to prevent
+ * races with page faults.
+ *
  * Note: If the passed end of range value is beyond the end of file, but
  * not LLONG_MAX this routine still performs a hole punch operation.
  */
@@ -423,32 +422,14 @@ static void remove_inode_hugepages(struc
 
 		for (i = 0; i < pagevec_count(&pvec); ++i) {
 			struct page *page = pvec.pages[i];
-			u32 hash;
 
 			index = page->index;
-			hash = hugetlb_fault_mutex_hash(h, current->mm,
-							&pseudo_vma,
-							mapping, index, 0);
-			mutex_lock(&hugetlb_fault_mutex_table[hash]);
-
 			/*
-			 * If page is mapped, it was faulted in after being
-			 * unmapped in caller. Unmap (again) now after taking
-			 * the fault mutex. The mutex will prevent faults
-			 * until we finish removing the page.
-			 *
-			 * This race can only happen in the hole punch case.
-			 * Getting here in a truncate operation is a bug.
+			 * A mapped page is impossible as callers should unmap
+			 * all references before calling. And, i_mmap_rwsem
+			 * prevents the creation of additional mappings.
 			 */
-			if (unlikely(page_mapped(page))) {
-				BUG_ON(truncate_op);
-
-				i_mmap_lock_write(mapping);
-				hugetlb_vmdelete_list(&mapping->i_mmap,
-					index * pages_per_huge_page(h),
-					(index + 1) * pages_per_huge_page(h));
-				i_mmap_unlock_write(mapping);
-			}
+			VM_BUG_ON(page_mapped(page));
 
 			lock_page(page);
 			/*
@@ -470,7 +451,6 @@ static void remove_inode_hugepages(struc
 			}
 
 			unlock_page(page);
-			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 		}
 		huge_pagevec_release(&pvec);
 		cond_resched();
@@ -624,7 +604,11 @@ static long hugetlbfs_fallocate(struct f
 		/* addr is the offset within the file (zero based) */
 		addr = index * hpage_size;
 
-		/* mutex taken here, fault path and hole punch */
+		/*
+		 * fault mutex taken here, protects against fault path
+		 * and hole punch. inode_lock previously taken protects
+		 * against truncation.
+		 */
 		hash = hugetlb_fault_mutex_hash(h, mm, &pseudo_vma, mapping,
 						index, addr);
 		mutex_lock(&hugetlb_fault_mutex_table[hash]);
--- a/mm/hugetlb.c~hugetlbfs-remove-unnecessary-code-after-i_mmap_rwsem-synchronization
+++ a/mm/hugetlb.c
@@ -3760,16 +3760,16 @@ static vm_fault_t hugetlb_no_page(struct
 	}
 
 	/*
-	 * Use page lock to guard against racing truncation
-	 * before we get page_table_lock.
+	 * We can not race with truncation due to holding i_mmap_rwsem.
+	 * Check once here for faults beyond end of file.
 	 */
+	size = i_size_read(mapping->host) >> huge_page_shift(h);
+	if (idx >= size)
+		goto out;
+
 retry:
 	page = find_lock_page(mapping, idx);
 	if (!page) {
-		size = i_size_read(mapping->host) >> huge_page_shift(h);
-		if (idx >= size)
-			goto out;
-
 		/*
 		 * Check for page in userfault range
 		 */
@@ -3859,9 +3859,6 @@ retry:
 	}
 
 	ptl = huge_pte_lock(h, mm, ptep);
-	size = i_size_read(mapping->host) >> huge_page_shift(h);
-	if (idx >= size)
-		goto backout;
 
 	ret = 0;
 	if (!huge_pte_none(huge_ptep_get(ptep)))
@@ -3964,8 +3961,10 @@ vm_fault_t hugetlb_fault(struct mm_struc
 
 	/*
 	 * Acquire i_mmap_rwsem before calling huge_pte_alloc and hold
-	 * until finished with ptep. This prevents huge_pmd_unshare from
-	 * being called elsewhere and making the ptep no longer valid.
+	 * until finished with ptep. This serves two purposes:
+	 * 1) It prevents huge_pmd_unshare from being called elsewhere
+	 *    and making the ptep no longer valid.
+	 * 2) It synchronizes us with file truncation.
 	 *
 	 * ptep could have already be assigned via huge_pte_offset. That
 	 * is OK, as huge_pte_alloc will return the same value unless
_

Patches currently in -mm which might be from mike.kravetz@oracle.com are

hugetlbfs-use-i_mmap_rwsem-for-more-pmd-sharing-synchronization.patch
hugetlbfs-use-i_mmap_rwsem-to-fix-page-fault-truncate-race.patch
hugetlbfs-remove-unnecessary-code-after-i_mmap_rwsem-synchronization.patch