4.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hugh Dickins hughd@google.com
commit b0b9b3df27d100a975b4e8818f35382b64a5e35c upstream.
4.10-rc loadtest (even on x86, and even without THPCache) fails with "fork: Cannot allocate memory" or some such; and /proc/meminfo shows PageTables growing.
Commit 953c66c2b22a ("mm: THP page cache support for ppc64") that got merged in rc1 removed the freeing of an unused preallocated pagetable after do_fault_around() has called map_pages().
This is usually a good optimization, so that the followup doesn't have to reallocate one; but it's not sufficient to shift the freeing into alloc_set_pte(), since there are failure cases (most commonly VM_FAULT_RETRY) which never reach finish_fault().
Check and free it at the outer level in do_fault(), then we don't need to worry in alloc_set_pte(), and can restore that to how it was (I cannot find any reason to pte_free() under lock as it was doing).
And fix a separate pagetable leak, or crash, introduced by the same change, that could only show up on some ppc64: why does do_set_pmd()'s failure case attempt to withdraw a pagetable when it never deposited one, at the same time overwriting (so leaking) the vmf->prealloc_pte? Residue of an earlier implementation, perhaps? Delete it.
Fixes: 953c66c2b22a ("mm: THP page cache support for ppc64") Cc: Aneesh Kumar K.V aneesh.kumar@linux.vnet.ibm.com Cc: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: Michael Ellerman mpe@ellerman.id.au Cc: Benjamin Herrenschmidt benh@kernel.crashing.org Cc: Michael Neuling mikey@neuling.org Cc: Paul Mackerras paulus@samba.org Cc: Balbir Singh bsingharora@gmail.com Cc: Andrew Morton akpm@linux-foundation.org Signed-off-by: Hugh Dickins hughd@google.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Minchan Kim minchan@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/memory.c | 21 +++++++++++++++------ 1 file changed, 15 insertions(+), 6 deletions(-)
--- a/mm/memory.c +++ b/mm/memory.c @@ -3329,15 +3329,24 @@ static int do_fault(struct fault_env *fe { struct vm_area_struct *vma = fe->vma; pgoff_t pgoff = linear_page_index(vma, fe->address); + int ret;
/* The VMA was not fully populated on mmap() or missing VM_DONTEXPAND */ if (!vma->vm_ops->fault) - return VM_FAULT_SIGBUS; - if (!(fe->flags & FAULT_FLAG_WRITE)) - return do_read_fault(fe, pgoff); - if (!(vma->vm_flags & VM_SHARED)) - return do_cow_fault(fe, pgoff); - return do_shared_fault(fe, pgoff); + ret = VM_FAULT_SIGBUS; + else if (!(fe->flags & FAULT_FLAG_WRITE)) + ret = do_read_fault(fe, pgoff); + else if (!(vma->vm_flags & VM_SHARED)) + ret = do_cow_fault(fe, pgoff); + else + ret = do_shared_fault(fe, pgoff); + + /* preallocated pagetable is unused: free it */ + if (fe->prealloc_pte) { + pte_free(vma->vm_mm, fe->prealloc_pte); + fe->prealloc_pte = 0; + } + return ret; }
static int numa_migrate_prep(struct page *page, struct vm_area_struct *vma,