December 2017 - Linux-stable-mirror

[Linux-stable-mirror] FAILED: patch "[PATCH] nfsd: Fix stateid races between OPEN and CLOSE" failed to apply to 4.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.

thanks,

greg k-h

------------------ original commit in Linus's tree ------------------

>From 15ca08d3299682dc49bad73251677b2c5017ef08 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust(a)primarydata.com>
Date: Fri, 3 Nov 2017 08:00:10 -0400
Subject: [PATCH] nfsd: Fix stateid races between OPEN and CLOSE

Open file stateids can linger on the nfs4_file list of stateids even
after they have been closed. In order to avoid reusing such a
stateid, and confusing the client, we need to recheck the nfs4_stid's
type after taking the mutex.
Otherwise, we risk reusing an old stateid that was already closed,
which will confuse clients that expect new stateids to conform to
RFC7530 Sections 9.1.4.2 and 16.2.5 or RFC5661 Sections 8.2.2 and 18.2.4.

Signed-off-by: Trond Myklebust <trond.myklebust(a)primarydata.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields(a)redhat.com>

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index b82817767b9d..ee8fde2dfa92 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -3562,7 +3562,9 @@ nfsd4_find_existing_open(struct nfs4_file *fp, struct nfsd4_open *open)
 		/* ignore lock owners */
 		if (local->st_stateowner->so_is_open_owner == 0)
 			continue;
-		if (local->st_stateowner == &oo->oo_owner) {
+		if (local->st_stateowner != &oo->oo_owner)
+			continue;
+		if (local->st_stid.sc_type == NFS4_OPEN_STID) {
 			ret = local;
 			refcount_inc(&ret->st_stid.sc_count);
 			break;
@@ -3571,6 +3573,52 @@ nfsd4_find_existing_open(struct nfs4_file *fp, struct nfsd4_open *open)
 	return ret;
 }
 
+static __be32
+nfsd4_verify_open_stid(struct nfs4_stid *s)
+{
+	__be32 ret = nfs_ok;
+
+	switch (s->sc_type) {
+	default:
+		break;
+	case NFS4_CLOSED_STID:
+	case NFS4_CLOSED_DELEG_STID:
+		ret = nfserr_bad_stateid;
+		break;
+	case NFS4_REVOKED_DELEG_STID:
+		ret = nfserr_deleg_revoked;
+	}
+	return ret;
+}
+
+/* Lock the stateid st_mutex, and deal with races with CLOSE */
+static __be32
+nfsd4_lock_ol_stateid(struct nfs4_ol_stateid *stp)
+{
+	__be32 ret;
+
+	mutex_lock(&stp->st_mutex);
+	ret = nfsd4_verify_open_stid(&stp->st_stid);
+	if (ret != nfs_ok)
+		mutex_unlock(&stp->st_mutex);
+	return ret;
+}
+
+static struct nfs4_ol_stateid *
+nfsd4_find_and_lock_existing_open(struct nfs4_file *fp, struct nfsd4_open *open)
+{
+	struct nfs4_ol_stateid *stp;
+	for (;;) {
+		spin_lock(&fp->fi_lock);
+		stp = nfsd4_find_existing_open(fp, open);
+		spin_unlock(&fp->fi_lock);
+		if (!stp || nfsd4_lock_ol_stateid(stp) == nfs_ok)
+			break;
+		nfs4_put_stid(&stp->st_stid);
+	}
+	return stp;
+}
+
 static struct nfs4_openowner *
 alloc_init_open_stateowner(unsigned int strhashval, struct nfsd4_open *open,
 			   struct nfsd4_compound_state *cstate)
@@ -3615,6 +3663,7 @@ init_open_stateid(struct nfs4_file *fp, struct nfsd4_open *open)
 	mutex_init(&stp->st_mutex);
 	mutex_lock(&stp->st_mutex);
 
+retry:
 	spin_lock(&oo->oo_owner.so_client->cl_lock);
 	spin_lock(&fp->fi_lock);
 
@@ -3639,7 +3688,11 @@ init_open_stateid(struct nfs4_file *fp, struct nfsd4_open *open)
 	spin_unlock(&fp->fi_lock);
 	spin_unlock(&oo->oo_owner.so_client->cl_lock);
 	if (retstp) {
-		mutex_lock(&retstp->st_mutex);
+		/* Handle races with CLOSE */
+		if (nfsd4_lock_ol_stateid(retstp) != nfs_ok) {
+			nfs4_put_stid(&retstp->st_stid);
+			goto retry;
+		}
 		/* To keep mutex tracking happy */
 		mutex_unlock(&stp->st_mutex);
 		stp = retstp;
@@ -4460,9 +4513,7 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf
 		status = nfs4_check_deleg(cl, open, &dp);
 		if (status)
 			goto out;
-		spin_lock(&fp->fi_lock);
-		stp = nfsd4_find_existing_open(fp, open);
-		spin_unlock(&fp->fi_lock);
+		stp = nfsd4_find_and_lock_existing_open(fp, open);
 	} else {
 		open->op_file = NULL;
 		status = nfserr_bad_stateid;
@@ -4476,7 +4527,6 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf
 	 */
 	if (stp) {
 		/* Stateid was found, this is an OPEN upgrade */
-		mutex_lock(&stp->st_mutex);
 		status = nfs4_upgrade_open(rqstp, fp, current_fh, stp, open);
 		if (status) {
 			mutex_unlock(&stp->st_mutex);
@@ -5367,7 +5417,6 @@ static void nfsd4_close_open_stateid(struct nfs4_ol_stateid *s)
 	bool unhashed;
 	LIST_HEAD(reaplist);
 
-	s->st_stid.sc_type = NFS4_CLOSED_STID;
 	spin_lock(&clp->cl_lock);
 	unhashed = unhash_open_stateid(s, &reaplist);
 
@@ -5407,10 +5456,12 @@ nfsd4_close(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	nfsd4_bump_seqid(cstate, status);
 	if (status)
 		goto out;
+
+	stp->st_stid.sc_type = NFS4_CLOSED_STID;
 	nfs4_inc_and_copy_stateid(&close->cl_stateid, &stp->st_stid);
-	mutex_unlock(&stp->st_mutex);
 
 	nfsd4_close_open_stateid(stp);
+	mutex_unlock(&stp->st_mutex);
 
 	/* put reference from nfs4_preprocess_seqid_op */
 	nfs4_put_stid(&stp->st_stid);
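The lookup/lock/recheck/retry pattern this patch introduces can be seen in a minimal, single-threaded C model. Every name here (`model_stid`, `model_find_and_lock()` and so on) is hypothetical and is not nfsd code. A `bool` stands in for `st_mutex`, and the lookup skips closed entries the way the patched `nfsd4_find_existing_open()` does. The essential step is that the type is checked again only once the lock is held. If CLOSE got there first, the entry is unlocked, its reference is dropped, and the lookup is retried.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-threaded model; these are NOT the nfsd types. */
enum model_type { MODEL_OPEN_STID, MODEL_CLOSED_STID };

struct model_stid {
	bool locked;          /* stands in for st_mutex */
	enum model_type type; /* stands in for st_stid.sc_type */
	int refcount;         /* stands in for sc_count */
};

static void model_lock(struct model_stid *s)   { assert(!s->locked); s->locked = true; }
static void model_unlock(struct model_stid *s) { assert(s->locked); s->locked = false; }

/* Like nfsd4_lock_ol_stateid(): take the lock, then recheck the type. */
static int model_lock_and_verify(struct model_stid *s)
{
	model_lock(s);
	if (s->type == MODEL_CLOSED_STID) {
		model_unlock(s);  /* CLOSE won the race: back off */
		return -1;
	}
	return 0;
}

/* Like the patched nfsd4_find_existing_open(): skip closed entries and
 * take a reference on the first open one. */
static struct model_stid *model_find(struct model_stid *tab, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (tab[i].type == MODEL_OPEN_STID) {
			tab[i].refcount++;
			return &tab[i];
		}
	}
	return NULL;
}

/* Like nfsd4_find_and_lock_existing_open(): retry when the entry found
 * was closed between the lookup and acquiring its lock. */
static struct model_stid *model_find_and_lock(struct model_stid *tab, size_t n)
{
	struct model_stid *s;

	for (;;) {
		s = model_find(tab, n);
		if (s == NULL || model_lock_and_verify(s) == 0)
			break;
		s->refcount--;    /* like nfs4_put_stid() */
	}
	return s;
}
```

In the kernel, the recheck is there because CLOSE can change `sc_type` in the window between the lookup under `fi_lock` and `mutex_lock()`. A single-threaded model has no such window, so this sketch shows the control flow only.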

[Linux-stable-mirror] Patch "mm, thp: Do not make page table dirty unconditionally in touch_p[mu]d()" has been added to the 4.9-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled

    mm, thp: Do not make page table dirty unconditionally in touch_p[mu]d()

to the 4.9-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…

The filename of the patch is:
     mm-thp-do-not-make-page-table-dirty-unconditionally-in-touch_pd.patch
and it can be found in the queue-4.9 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.


>From a8f97366452ed491d13cf1e44241bc0b5740b1f0 Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Date: Mon, 27 Nov 2017 06:21:25 +0300
Subject: mm, thp: Do not make page table dirty unconditionally in touch_p[mu]d()

From: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>

commit a8f97366452ed491d13cf1e44241bc0b5740b1f0 upstream.

Currently, we unconditionally make page table dirty in touch_pmd().
It may result in false-positive can_follow_write_pmd().

We may avoid the situation, if we would only make the page table entry
dirty if caller asks for write access -- FOLL_WRITE.

The patch also changes touch_pud() in the same way.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
[Salvatore Bonaccorso: backport for 4.9:
 - Adjust context
 - Drop specific part for PUD-sized transparent hugepages. Support
   for PUD-sized transparent hugepages was added in v4.11-rc1
]
Signed-off-by: Ben Hutchings <ben(a)decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
 mm/huge_memory.c |   19 +++++++------------
 1 file changed, 7 insertions(+), 12 deletions(-)

--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -745,20 +745,15 @@ int vmf_insert_pfn_pmd(struct vm_area_st
 EXPORT_SYMBOL_GPL(vmf_insert_pfn_pmd);
 
 static void touch_pmd(struct vm_area_struct *vma, unsigned long addr,
-		pmd_t *pmd)
+		pmd_t *pmd, int flags)
 {
 	pmd_t _pmd;
 
-	/*
-	 * We should set the dirty bit only for FOLL_WRITE but for now
-	 * the dirty bit in the pmd is meaningless. And if the dirty
-	 * bit will become meaningful and we'll only set it with
-	 * FOLL_WRITE, an atomic set_bit will be required on the pmd to
-	 * set the young bit, instead of the current set_pmd_at.
-	 */
-	_pmd = pmd_mkyoung(pmd_mkdirty(*pmd));
+	_pmd = pmd_mkyoung(*pmd);
+	if (flags & FOLL_WRITE)
+		_pmd = pmd_mkdirty(_pmd);
 	if (pmdp_set_access_flags(vma, addr & HPAGE_PMD_MASK,
-				pmd, _pmd, 1))
+				pmd, _pmd, flags & FOLL_WRITE))
 		update_mmu_cache_pmd(vma, addr, pmd);
 }
 
@@ -787,7 +782,7 @@ struct page *follow_devmap_pmd(struct vm
 		return NULL;
 
 	if (flags & FOLL_TOUCH)
-		touch_pmd(vma, addr, pmd);
+		touch_pmd(vma, addr, pmd, flags);
 
 	/*
 	 * device mapped pages can only be returned if the
@@ -1158,7 +1153,7 @@ struct page *follow_trans_huge_pmd(struc
 	page = pmd_page(*pmd);
 	VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page);
 	if (flags & FOLL_TOUCH)
-		touch_pmd(vma, addr, pmd);
+		touch_pmd(vma, addr, pmd, flags);
 	if ((flags & FOLL_MLOCK) && (vma->vm_flags & VM_LOCKED)) {
 		/*
 		 * We don't mlock() pte-mapped THPs. This way we can avoid


Patches currently in stable-queue which might be from kirill.shutemov(a)linux.intel.com are

queue-4.9/mm-thp-do-not-make-page-table-dirty-unconditionally-in-touch_pd.patch
queue-4.9/mm-madvise.c-fix-madvise-infinite-loop-under-special-circumstances.patch
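The behavioural change can be shown in a small C sketch that uses invented bit values instead of the kernel's `pmd_t` helpers. The function `model_can_follow_write()` only loosely follows `can_follow_write_pmd()`. The assumption about its shape is that a non-writable entry may still be followed for a forced COW write if the entry is dirty. That much is enough to show why dirtying the entry on a read-only touch produced a false positive.

```c
#include <assert.h>

/* Hypothetical bit values for illustration only; not the kernel's. */
#define MODEL_FOLL_WRITE 0x1u
#define MODEL_PMD_YOUNG  0x1u
#define MODEL_PMD_DIRTY  0x2u

/* Old behaviour: every FOLL_TOUCH access made the entry young and dirty. */
static unsigned int model_touch_old(unsigned int pmd, unsigned int flags)
{
	(void)flags;
	return pmd | MODEL_PMD_YOUNG | MODEL_PMD_DIRTY;
}

/* Patched behaviour: always young, dirty only when FOLL_WRITE is set. */
static unsigned int model_touch_new(unsigned int pmd, unsigned int flags)
{
	pmd |= MODEL_PMD_YOUNG;
	if (flags & MODEL_FOLL_WRITE)
		pmd |= MODEL_PMD_DIRTY;
	return pmd;
}

/* Loosely modelled on can_follow_write_pmd() (assumed shape): a writable
 * entry can always be followed for write; a forced COW write is also
 * allowed once the entry is dirty, which is taken as "COW already done". */
static int model_can_follow_write(unsigned int pmd, int writable, int forced_cow)
{
	return writable || (forced_cow && (pmd & MODEL_PMD_DIRTY) != 0);
}
```

A plain read with the old code leaves the entry dirty, and the later write check then succeeds even though no write fault ever took place. The patched version does not have this problem.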

[Linux-stable-mirror] Patch "mm/madvise.c: fix madvise() infinite loop under special circumstances" has been added to the 4.9-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled

    mm/madvise.c: fix madvise() infinite loop under special circumstances

to the 4.9-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…

The filename of the patch is:
     mm-madvise.c-fix-madvise-infinite-loop-under-special-circumstances.patch
and it can be found in the queue-4.9 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.


>From 6ea8d958a2c95a1d514015d4e29ba21a8c0a1a91 Mon Sep 17 00:00:00 2001
From: chenjie <chenjie6(a)huawei.com>
Date: Wed, 29 Nov 2017 16:10:54 -0800
Subject: mm/madvise.c: fix madvise() infinite loop under special circumstances

From: chenjie <chenjie6(a)huawei.com>

commit 6ea8d958a2c95a1d514015d4e29ba21a8c0a1a91 upstream.

MADVISE_WILLNEED has always been a noop for DAX (formerly XIP) mappings.
Unfortunately madvise_willneed() doesn't communicate this information
properly to the generic madvise syscall implementation. The calling
convention is quite subtle there. madvise_vma() is supposed to either
return an error or update &prev otherwise the main loop will never
advance to the next vma and it will keep looping for ever without a way
to get out of the kernel.

It seems this has been broken since introduction. Nobody has noticed
because nobody seems to be using MADVISE_WILLNEED on these DAX mappings.

[mhocko(a)suse.com: rewrite changelog]
Link: http://lkml.kernel.org/r/20171127115318.911-1-guoxuenan@huawei.com
Fixes: fe77ba6f4f97 ("[PATCH] xip: madvice/fadvice: execute in place")
Signed-off-by: chenjie <chenjie6(a)huawei.com>
Signed-off-by: guoxuenan <guoxuenan(a)huawei.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: zhangyi (F) <yi.zhang(a)huawei.com>
Cc: Miao Xie <miaoxie(a)huawei.com>
Cc: Mike Rapoport <rppt(a)linux.vnet.ibm.com>
Cc: Shaohua Li <shli(a)fb.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Anshuman Khandual <khandual(a)linux.vnet.ibm.com>
Cc: Rik van Riel <riel(a)redhat.com>
Cc: Carsten Otte <cotte(a)de.ibm.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>

---
 mm/madvise.c |    4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -228,15 +228,14 @@ static long madvise_willneed(struct vm_a
 {
 	struct file *file = vma->vm_file;
 
+	*prev = vma;
 #ifdef CONFIG_SWAP
 	if (!file) {
-		*prev = vma;
 		force_swapin_readahead(vma, start, end);
 		return 0;
 	}
 
 	if (shmem_mapping(file->f_mapping)) {
-		*prev = vma;
 		force_shm_swapin_readahead(vma, start, end,
 					file->f_mapping);
 		return 0;
@@ -251,7 +250,6 @@ static long madvise_willneed(struct vm_a
 		return 0;
 	}
 
-	*prev = vma;
 	start = ((start - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
 	if (end > vma->vm_end)
 		end = vma->vm_end;


Patches currently in stable-queue which might be from chenjie6(a)huawei.com are

queue-4.9/mm-madvise.c-fix-madvise-infinite-loop-under-special-circumstances.patch
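The calling convention described in the changelog is that the handler must either fail or update `prev`, or else the loop never advances. It can be modelled as a simplified walk over an array of VMAs that advances through `prev`, roughly as the `madvise()` main loop does. The types and handlers are hypothetical. The sketch also assumes that the walk begins after at least one preceding VMA, so that `prev` starts out non-NULL, and it leaves out the real loop's `find_vma()` fallback. An iteration cap takes the place of "looping for ever".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a sorted, contiguous VMA list (not kernel types). */
struct model_vma {
	unsigned long start, end;
};

/* Per-VMA handler in the style of madvise_vma(): returns 0 or an error and
 * is expected to set *prev to the VMA it handled. */
typedef int (*model_handler)(struct model_vma *vma, struct model_vma **prev);

/* Mimics the old DAX path of madvise_willneed(): succeeds, *prev stays stale. */
static int willneed_buggy(struct model_vma *vma, struct model_vma **prev)
{
	(void)vma;
	(void)prev;
	return 0;
}

/* Mimics the patched madvise_willneed(): *prev is set before any return. */
static int willneed_fixed(struct model_vma *vma, struct model_vma **prev)
{
	*prev = vma;
	return 0;
}

/* Walk [vmas[first].start, end) roughly like the madvise() main loop.
 * Assumes first > 0 and end <= the last VMA's end. Returns the number of
 * iterations taken, or -1 if not finished after max_iter iterations. */
static int model_walk(struct model_vma *vmas, size_t first, unsigned long end,
		      model_handler handler, int max_iter)
{
	struct model_vma *prev, *vma;
	unsigned long start;

	assert(first > 0);
	prev = &vmas[first - 1];
	vma = &vmas[first];

	for (int iter = 1; iter <= max_iter; iter++) {
		unsigned long tmp = vma->end < end ? vma->end : end;

		if (handler(vma, &prev))
			return -1;
		start = tmp;
		if (start < prev->end)
			start = prev->end;
		if (start >= end)
			return iter;
		vma = prev + 1;	/* with a stale prev this is the same VMA again */
	}
	return -1;
}
```

With `willneed_buggy`, `prev + 1` keeps pointing back at the VMA that was just handled. The walk therefore never reaches `end`, which is the infinite loop the patch fixes.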

[Linux-stable-mirror] Patch "mm, hugetlbfs: introduce ->split() to vm_operations_struct" has been added to the 4.9-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled

    mm, hugetlbfs: introduce ->split() to vm_operations_struct

to the 4.9-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…

The filename of the patch is:
     mm-hugetlbfs-introduce-split-to-vm_operations_struct.patch
and it can be found in the queue-4.9 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.


>From 31383c6865a578834dd953d9dbc88e6b19fe3997 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Wed, 29 Nov 2017 16:10:28 -0800
Subject: mm, hugetlbfs: introduce ->split() to vm_operations_struct

From: Dan Williams <dan.j.williams(a)intel.com>

commit 31383c6865a578834dd953d9dbc88e6b19fe3997 upstream.

Patch series "device-dax: fix unaligned munmap handling"

When device-dax is operating in huge-page mode we want it to behave like
hugetlbfs and fail attempts to split vmas into unaligned ranges. It
would be messy to teach the munmap path about device-dax alignment
constraints in the same (hstate) way that hugetlbfs communicates this
constraint. Instead, these patches introduce a new ->split() vm
operation.

This patch (of 2):

The device-dax interface has similar constraints as hugetlbfs in that it
requires the munmap path to unmap in huge page aligned units. Rather
than add more custom vma handling code in __split_vma() introduce a new
vm operation to perform this vma specific check.

Link: http://lkml.kernel.org/r/151130418135.4029.6783191281930729710.stgit@dwilli…
Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>

---
 include/linux/mm.h |    1 +
 mm/hugetlb.c       |    8 ++++++++
 mm/mmap.c          |    8 +++++---
 3 files changed, 14 insertions(+), 3 deletions(-)

--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -347,6 +347,7 @@ struct fault_env {
 struct vm_operations_struct {
 	void (*open)(struct vm_area_struct * area);
 	void (*close)(struct vm_area_struct * area);
+	int (*split)(struct vm_area_struct * area, unsigned long addr);
 	int (*mremap)(struct vm_area_struct * area);
 	int (*fault)(struct vm_area_struct *vma, struct vm_fault *vmf);
 	int (*pmd_fault)(struct vm_area_struct *, unsigned long address,
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3135,6 +3135,13 @@ static void hugetlb_vm_op_close(struct v
 	}
 }
 
+static int hugetlb_vm_op_split(struct vm_area_struct *vma, unsigned long addr)
+{
+	if (addr & ~(huge_page_mask(hstate_vma(vma))))
+		return -EINVAL;
+	return 0;
+}
+
 /*
  * We cannot handle pagefaults against hugetlb pages at all. They cause
  * handle_mm_fault() to try to instantiate regular-sized pages in the
@@ -3151,6 +3158,7 @@ const struct vm_operations_struct hugetl
 	.fault = hugetlb_vm_op_fault,
 	.open = hugetlb_vm_op_open,
 	.close = hugetlb_vm_op_close,
+	.split = hugetlb_vm_op_split,
 };
 
 static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2538,9 +2538,11 @@ static int __split_vma(struct mm_struct
 	struct vm_area_struct *new;
 	int err;
 
-	if (is_vm_hugetlb_page(vma) && (addr &
-				~(huge_page_mask(hstate_vma(vma)))))
-		return -EINVAL;
+	if (vma->vm_ops && vma->vm_ops->split) {
+		err = vma->vm_ops->split(vma, addr);
+		if (err)
+			return err;
+	}
 
 	new = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
 	if (!new)


Patches currently in stable-queue which might be from dan.j.williams(a)intel.com are

queue-4.9/mm-hugetlbfs-introduce-split-to-vm_operations_struct.patch
queue-4.9/mm-madvise.c-fix-madvise-infinite-loop-under-special-circumstances.patch
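The design in this patch gives each mapping type an optional hook in its ops table, and the generic split path calls that hook before doing any work. Below is a small C model of that design. `struct model_vma`, `struct model_vm_ops` and the fixed `page_size` field are hypothetical simplifications: the real `hugetlb_vm_op_split()` derives its mask from `hstate_vma()`. The alignment test `addr & (page_size - 1)` computes the same thing as `addr & ~huge_page_mask(...)`, provided the page size is a power of two.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct model_vma;

/* Hypothetical, trimmed-down ops table with the new optional hook. */
struct model_vm_ops {
	int (*split)(struct model_vma *vma, unsigned long addr);
};

struct model_vma {
	unsigned long start, end;
	unsigned long page_size;            /* power of two, e.g. 2 MiB */
	const struct model_vm_ops *ops;     /* NULL for an ordinary mapping */
};

/* Like hugetlb_vm_op_split(): refuse split points not aligned to the
 * mapping's page size. */
static int model_huge_split(struct model_vma *vma, unsigned long addr)
{
	if (addr & (vma->page_size - 1))
		return -EINVAL;
	return 0;
}

static const struct model_vm_ops model_huge_ops = { .split = model_huge_split };

/* Like the patched __split_vma(): ask the mapping first, then split
 * [start, end) into [start, addr) and [addr, end). */
static int model_split_vma(struct model_vma *vma, struct model_vma *tail,
			   unsigned long addr)
{
	if (vma->ops && vma->ops->split) {
		int err = vma->ops->split(vma, addr);
		if (err)
			return err;
	}
	*tail = *vma;
	tail->start = addr;
	vma->end = addr;
	return 0;
}
```

This lets the second patch in the series give device-dax its own `->split()` implementation without adding more special cases to the generic code.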

[Linux-stable-mirror] Patch "mm/cma: fix alloc_contig_range ret code/potential leak" has been added to the 4.9-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled

    mm/cma: fix alloc_contig_range ret code/potential leak

to the 4.9-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…

The filename of the patch is:
     mm-cma-fix-alloc_contig_range-ret-code-potential-leak.patch
and it can be found in the queue-4.9 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.


>From 63cd448908b5eb51d84c52f02b31b9b4ccd1cb5a Mon Sep 17 00:00:00 2001
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Date: Wed, 29 Nov 2017 16:10:01 -0800
Subject: mm/cma: fix alloc_contig_range ret code/potential leak

From: Mike Kravetz <mike.kravetz(a)oracle.com>

commit 63cd448908b5eb51d84c52f02b31b9b4ccd1cb5a upstream.

If the call __alloc_contig_migrate_range() in alloc_contig_range returns
-EBUSY, processing continues so that test_pages_isolated() is called
where there is a tracepoint to identify the busy pages. However, it is
possible for busy pages to become available between the calls to these
two routines. In this case, the range of pages may be allocated.
Unfortunately, the original return code (ret == -EBUSY) is still set and
returned to the caller. Therefore, the caller believes the pages were
not allocated and they are leaked.

Update the comment to indicate that allocation is still possible even if
__alloc_contig_migrate_range returns -EBUSY. Also, clear return code in
this case so that it is not accidentally used or returned to caller.

Link: http://lkml.kernel.org/r/20171122185214.25285-1-mike.kravetz@oracle.com
Fixes: 8ef5849fa8a2 ("mm/cma: always check which page caused allocation failure")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Acked-by: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: Michal Nazarewicz <mina86(a)mina86.com>
Cc: Laura Abbott <labbott(a)redhat.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>

---
 mm/page_alloc.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7309,11 +7309,18 @@ int alloc_contig_range(unsigned long sta
 
 	/*
 	 * In case of -EBUSY, we'd like to know which page causes problem.
-	 * So, just fall through. We will check it in test_pages_isolated().
+	 * So, just fall through. test_pages_isolated() has a tracepoint
+	 * which will report the busy page.
+	 *
+	 * It is possible that busy pages could become available before
+	 * the call to test_pages_isolated, and the range will actually be
+	 * allocated. So, if we fall through be sure to clear ret so that
+	 * -EBUSY is not accidentally used or returned to caller.
 	 */
 	ret = __alloc_contig_migrate_range(&cc, start, end);
 	if (ret && ret != -EBUSY)
 		goto done;
+	ret =0;
 
 	/*
 	 * Pages from [start, end) are within a MAX_ORDER_NR_PAGES


Patches currently in stable-queue which might be from mike.kravetz(a)oracle.com are

queue-4.9/mm-cma-fix-alloc_contig_range-ret-code-potential-leak.patch
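The return-code logic in this fix can be reduced to a few lines of C. This is a hypothetical model, not `alloc_contig_range()`. Its inputs are the value `__alloc_contig_migrate_range()` returned, whether the later `test_pages_isolated()`-style recheck still finds busy pages, and a switch that turns the fix on or off. The sketch assumes that a failed recheck reports `-EBUSY` by itself. The case it highlights is the one where the pages became free in between: without the fix, the stale `-EBUSY` is returned even though the range was allocated.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of the error handling in alloc_contig_range().
 * migrate_ret: what __alloc_contig_migrate_range() returned.
 * still_busy:  nonzero if the later isolation recheck fails.
 * clear_ret:   nonzero to apply the fix (reset ret after tolerating -EBUSY). */
static int model_alloc_contig(int migrate_ret, int still_busy, int clear_ret)
{
	int ret = migrate_ret;

	if (ret && ret != -EBUSY)
		return ret;		/* hard migration failure */
	if (clear_ret)
		ret = 0;		/* -EBUSY was only provisional */

	if (still_busy)
		return -EBUSY;		/* range really could not be isolated */

	/* The range is allocated here; a nonzero return makes the caller
	 * believe otherwise, so the pages would never be freed. */
	return ret;
}
```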

[Linux-stable-mirror] Patch "mm, thp: Do not make page table dirty unconditionally in touch_p[mu]d()" has been added to the 4.4-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled

    mm, thp: Do not make page table dirty unconditionally in touch_p[mu]d()

to the 4.4-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…

The filename of the patch is:
     mm-thp-do-not-make-page-table-dirty-unconditionally-in-touch_pd.patch
and it can be found in the queue-4.4 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.


>From a8f97366452ed491d13cf1e44241bc0b5740b1f0 Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Date: Mon, 27 Nov 2017 06:21:25 +0300
Subject: mm, thp: Do not make page table dirty unconditionally in touch_p[mu]d()

From: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>

commit a8f97366452ed491d13cf1e44241bc0b5740b1f0 upstream.

Currently, we unconditionally make page table dirty in touch_pmd().
It may result in false-positive can_follow_write_pmd().

We may avoid the situation, if we would only make the page table entry
dirty if caller asks for write access -- FOLL_WRITE.

The patch also changes touch_pud() in the same way.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
[Salvatore Bonaccorso: backport for 3.16:
 - Adjust context
 - Drop specific part for PUD-sized transparent hugepages. Support
   for PUD-sized transparent hugepages was added in v4.11-rc1
]
Signed-off-by: Ben Hutchings <ben(a)decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
 mm/huge_memory.c |   14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1304,17 +1304,11 @@ struct page *follow_trans_huge_pmd(struc
 	VM_BUG_ON_PAGE(!PageHead(page), page);
 	if (flags & FOLL_TOUCH) {
 		pmd_t _pmd;
-		/*
-		 * We should set the dirty bit only for FOLL_WRITE but
-		 * for now the dirty bit in the pmd is meaningless.
-		 * And if the dirty bit will become meaningful and
-		 * we'll only set it with FOLL_WRITE, an atomic
-		 * set_bit will be required on the pmd to set the
-		 * young bit, instead of the current set_pmd_at.
-		 */
-		_pmd = pmd_mkyoung(pmd_mkdirty(*pmd));
+		_pmd = pmd_mkyoung(*pmd);
+		if (flags & FOLL_WRITE)
+			_pmd = pmd_mkdirty(_pmd);
 		if (pmdp_set_access_flags(vma, addr & HPAGE_PMD_MASK,
-					  pmd, _pmd, 1))
+					  pmd, _pmd, flags & FOLL_WRITE))
 			update_mmu_cache_pmd(vma, addr, pmd);
 	}
 	if ((flags & FOLL_MLOCK) && (vma->vm_flags & VM_LOCKED)) {


Patches currently in stable-queue which might be from kirill.shutemov(a)linux.intel.com are

queue-4.4/mm-thp-do-not-make-page-table-dirty-unconditionally-in-touch_pd.patch
queue-4.4/mm-madvise.c-fix-madvise-infinite-loop-under-special-circumstances.patch

[Linux-stable-mirror] Patch "mm/madvise.c: fix madvise() infinite loop under special circumstances" has been added to the 4.4-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled

    mm/madvise.c: fix madvise() infinite loop under special circumstances

to the 4.4-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…

The filename of the patch is:
     mm-madvise.c-fix-madvise-infinite-loop-under-special-circumstances.patch
and it can be found in the queue-4.4 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.


>From 6ea8d958a2c95a1d514015d4e29ba21a8c0a1a91 Mon Sep 17 00:00:00 2001
From: chenjie <chenjie6(a)huawei.com>
Date: Wed, 29 Nov 2017 16:10:54 -0800
Subject: mm/madvise.c: fix madvise() infinite loop under special circumstances

From: chenjie <chenjie6(a)huawei.com>

commit 6ea8d958a2c95a1d514015d4e29ba21a8c0a1a91 upstream.

MADVISE_WILLNEED has always been a noop for DAX (formerly XIP) mappings.
Unfortunately madvise_willneed() doesn't communicate this information
properly to the generic madvise syscall implementation. The calling
convention is quite subtle there. madvise_vma() is supposed to either
return an error or update &prev otherwise the main loop will never
advance to the next vma and it will keep looping for ever without a way
to get out of the kernel.

It seems this has been broken since introduction. Nobody has noticed
because nobody seems to be using MADVISE_WILLNEED on these DAX mappings.

[mhocko(a)suse.com: rewrite changelog]
Link: http://lkml.kernel.org/r/20171127115318.911-1-guoxuenan@huawei.com
Fixes: fe77ba6f4f97 ("[PATCH] xip: madvice/fadvice: execute in place")
Signed-off-by: chenjie <chenjie6(a)huawei.com>
Signed-off-by: guoxuenan <guoxuenan(a)huawei.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: zhangyi (F) <yi.zhang(a)huawei.com>
Cc: Miao Xie <miaoxie(a)huawei.com>
Cc: Mike Rapoport <rppt(a)linux.vnet.ibm.com>
Cc: Shaohua Li <shli(a)fb.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Anshuman Khandual <khandual(a)linux.vnet.ibm.com>
Cc: Rik van Riel <riel(a)redhat.com>
Cc: Carsten Otte <cotte(a)de.ibm.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>

---
 mm/madvise.c |    4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -223,15 +223,14 @@ static long madvise_willneed(struct vm_a
 {
 	struct file *file = vma->vm_file;
 
+	*prev = vma;
 #ifdef CONFIG_SWAP
 	if (!file) {
-		*prev = vma;
 		force_swapin_readahead(vma, start, end);
 		return 0;
 	}
 
 	if (shmem_mapping(file->f_mapping)) {
-		*prev = vma;
 		force_shm_swapin_readahead(vma, start, end,
 					file->f_mapping);
 		return 0;
@@ -246,7 +245,6 @@ static long madvise_willneed(struct vm_a
 		return 0;
 	}
 
-	*prev = vma;
 	start = ((start - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
 	if (end > vma->vm_end)
 		end = vma->vm_end;


Patches currently in stable-queue which might be from chenjie6(a)huawei.com are

queue-4.4/mm-madvise.c-fix-madvise-infinite-loop-under-special-circumstances.patch

[Linux-stable-mirror] Patch "v4l2: disable filesystem-dax mapping support" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled

    v4l2: disable filesystem-dax mapping support

to the 4.14-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…

The filename of the patch is:
     v4l2-disable-filesystem-dax-mapping-support.patch
and it can be found in the queue-4.14 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.


>From b70131de648c2b997d22f4653934438013f407a1 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Wed, 29 Nov 2017 16:10:43 -0800
Subject: v4l2: disable filesystem-dax mapping support

From: Dan Williams <dan.j.williams(a)intel.com>

commit b70131de648c2b997d22f4653934438013f407a1 upstream.

V4L2 memory registrations are incompatible with filesystem-dax that
needs the ability to revoke dma access to a mapping at will, or
otherwise allow the kernel to wait for completion of DMA. The
filesystem-dax implementation breaks the traditional solution of
truncate of active file backed mappings since there is no page-cache
page we can orphan to sustain ongoing DMA.

If v4l2 wants to support long lived DMA mappings it needs to arrange to
hold a file lease or use some other mechanism so that the kernel can
coordinate revoking DMA access when the filesystem needs to truncate
mappings.

Link: http://lkml.kernel.org/r/151068940499.7446.12846708245365671207.stgit@dwill…
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Reported-by: Jan Kara <jack(a)suse.cz>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Cc: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Doug Ledford <dledford(a)redhat.com>
Cc: Hal Rosenstock <hal.rosenstock(a)gmail.com>
Cc: Inki Dae <inki.dae(a)samsung.com>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Cc: Joonyoung Shim <jy0922.shim(a)samsung.com>
Cc: Kyungmin Park <kyungmin.park(a)samsung.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Ross Zwisler <ross.zwisler(a)linux.intel.com>
Cc: Sean Hefty <sean.hefty(a)intel.com>
Cc: Seung-Woo Kim <sw0312.kim(a)samsung.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>

---
 drivers/media/v4l2-core/videobuf-dma-sg.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/drivers/media/v4l2-core/videobuf-dma-sg.c
+++ b/drivers/media/v4l2-core/videobuf-dma-sg.c
@@ -185,12 +185,13 @@ static int videobuf_dma_init_user_locked
 	dprintk(1, "init user [0x%lx+0x%lx => %d pages]\n",
 		data, size, dma->nr_pages);
 
-	err = get_user_pages(data & PAGE_MASK, dma->nr_pages,
+	err = get_user_pages_longterm(data & PAGE_MASK, dma->nr_pages,
 			     flags, dma->pages, NULL);
 
 	if (err != dma->nr_pages) {
 		dma->nr_pages = (err >= 0) ? err : 0;
-		dprintk(1, "get_user_pages: err=%d [%d]\n", err, dma->nr_pages);
+		dprintk(1, "get_user_pages_longterm: err=%d [%d]\n", err,
+			dma->nr_pages);
 		return err < 0 ? err : -EINVAL;
 	}
 	return 0;


Patches currently in stable-queue which might be from dan.j.williams(a)intel.com are

queue-4.14/mm-hugetlbfs-introduce-split-to-vm_operations_struct.patch
queue-4.14/ib-core-disable-memory-registration-of-filesystem-dax-vmas.patch
queue-4.14/mm-introduce-get_user_pages_longterm.patch
queue-4.14/mm-fail-get_vaddr_frames-for-filesystem-dax-mappings.patch
queue-4.14/device-dax-implement-split-to-catch-invalid-munmap-attempts.patch
queue-4.14/v4l2-disable-filesystem-dax-mapping-support.patch
queue-4.14/mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages.patch
queue-4.14/mm-madvise.c-fix-madvise-infinite-loop-under-special-circumstances.patch
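A toy C model shows the policy behind the switch to `get_user_pages_longterm()`: a long-lived pin is refused when any page in the range comes from a filesystem-dax mapping, because the filesystem could not later revoke that DMA access. The model's approach is to pin every page, release all of them if any is fs-dax, and fail. The page struct and the function are hypothetical. The real helper inspects VMAs, not per-page flags. The choice of `-EOPNOTSUPP` as the error code should be treated as an assumption. The caller in the patch turns any negative result into its own return value (`return err < 0 ? err : -EINVAL;`).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical page model: where the page came from, and whether we pin it. */
struct model_page {
	int fsdax;	/* 1 if backed by a filesystem-dax mapping */
	int pinned;	/* 1 while a reference is held */
};

/* Sketch of a long-term pinning policy: pin all pages, then, if any page is
 * fs-dax backed, drop every pin and fail, since the filesystem would have no
 * way to revoke a long-lived DMA mapping of it.
 * Returns the number of pages pinned, or a negative errno. */
static long model_pin_longterm(struct model_page *pages, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		pages[i].pinned = 1;		/* like get_user_pages() */

	for (i = 0; i < n; i++)
		if (pages[i].fsdax)
			break;
	if (i == n)
		return (long)n;			/* safe to keep pinned */

	for (i = 0; i < n; i++)
		pages[i].pinned = 0;		/* like put_page() on each page */
	return -EOPNOTSUPP;
}
```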

[Linux-stable-mirror] Patch "mm, thp: Do not make page table dirty unconditionally in touch_p[mu]d()" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled

    mm, thp: Do not make page table dirty unconditionally in touch_p[mu]d()

to the 4.14-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…

The filename of the patch is:
     mm-thp-do-not-make-page-table-dirty-unconditionally-in-touch_pd.patch
and it can be found in the queue-4.14 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.


>From a8f97366452ed491d13cf1e44241bc0b5740b1f0 Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Date: Mon, 27 Nov 2017 06:21:25 +0300
Subject: mm, thp: Do not make page table dirty unconditionally in touch_p[mu]d()

From: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>

commit a8f97366452ed491d13cf1e44241bc0b5740b1f0 upstream.

Currently, we unconditionally make page table dirty in touch_pmd().
It may result in false-positive can_follow_write_pmd().

We may avoid the situation, if we would only make the page table entry
dirty if caller asks for write access -- FOLL_WRITE.

The patch also changes touch_pud() in the same way.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Hugh Dickins <hughd(a)google.com>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>

---
 mm/huge_memory.c | 36 +++++++++++++-----------------------
 1 file changed, 13 insertions(+), 23 deletions(-)

--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -842,20 +842,15 @@ EXPORT_SYMBOL_GPL(vmf_insert_pfn_pud);
 #endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
 
 static void touch_pmd(struct vm_area_struct *vma, unsigned long addr,
-		pmd_t *pmd)
+		pmd_t *pmd, int flags)
 {
 	pmd_t _pmd;
 
-	/*
-	 * We should set the dirty bit only for FOLL_WRITE but for now
-	 * the dirty bit in the pmd is meaningless. And if the dirty
-	 * bit will become meaningful and we'll only set it with
-	 * FOLL_WRITE, an atomic set_bit will be required on the pmd to
-	 * set the young bit, instead of the current set_pmd_at.
-	 */
-	_pmd = pmd_mkyoung(pmd_mkdirty(*pmd));
+	_pmd = pmd_mkyoung(*pmd);
+	if (flags & FOLL_WRITE)
+		_pmd = pmd_mkdirty(_pmd);
 	if (pmdp_set_access_flags(vma, addr & HPAGE_PMD_MASK,
-				pmd, _pmd, 1))
+				pmd, _pmd, flags & FOLL_WRITE))
 		update_mmu_cache_pmd(vma, addr, pmd);
 }
 
@@ -884,7 +879,7 @@ struct page *follow_devmap_pmd(struct vm
 		return NULL;
 
 	if (flags & FOLL_TOUCH)
-		touch_pmd(vma, addr, pmd);
+		touch_pmd(vma, addr, pmd, flags);
 
 	/*
 	 * device mapped pages can only be returned if the
@@ -995,20 +990,15 @@ out:
 
 #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
 static void touch_pud(struct vm_area_struct *vma, unsigned long addr,
-		pud_t *pud)
+		pud_t *pud, int flags)
 {
 	pud_t _pud;
 
-	/*
-	 * We should set the dirty bit only for FOLL_WRITE but for now
-	 * the dirty bit in the pud is meaningless. And if the dirty
-	 * bit will become meaningful and we'll only set it with
-	 * FOLL_WRITE, an atomic set_bit will be required on the pud to
-	 * set the young bit, instead of the current set_pud_at.
-	 */
-	_pud = pud_mkyoung(pud_mkdirty(*pud));
+	_pud = pud_mkyoung(*pud);
+	if (flags & FOLL_WRITE)
+		_pud = pud_mkdirty(_pud);
 	if (pudp_set_access_flags(vma, addr & HPAGE_PUD_MASK,
-				pud, _pud, 1))
+				pud, _pud, flags & FOLL_WRITE))
 		update_mmu_cache_pud(vma, addr, pud);
 }
 
@@ -1031,7 +1021,7 @@ struct page *follow_devmap_pud(struct vm
 		return NULL;
 
 	if (flags & FOLL_TOUCH)
-		touch_pud(vma, addr, pud);
+		touch_pud(vma, addr, pud, flags);
 
 	/*
 	 * device mapped pages can only be returned if the
@@ -1407,7 +1397,7 @@ struct page *follow_trans_huge_pmd(struc
 	page = pmd_page(*pmd);
 	VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page);
 	if (flags & FOLL_TOUCH)
-		touch_pmd(vma, addr, pmd);
+		touch_pmd(vma, addr, pmd, flags);
 	if ((flags & FOLL_MLOCK) && (vma->vm_flags & VM_LOCKED)) {
 		/*
 		 * We don't mlock() pte-mapped THPs. This way we can avoid


Patches currently in stable-queue which might be from kirill.shutemov(a)linux.intel.com are

queue-4.14/mm-fix-device-dax-pud-write-faults-triggered-by-get_user_pages.patch
queue-4.14/mm-hugetlb-fix-null-pointer-dereference-on-5-level-paging-machine.patch
queue-4.14/mm-thp-do-not-make-page-table-dirty-unconditionally-in-touch_pd.patch
queue-4.14/mm-madvise.c-fix-madvise-infinite-loop-under-special-circumstances.patch
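
The commit message says that dirtying the entry on every touch "may result in false-positive can_follow_write_pmd()". A toy userspace model of that reasoning is sketched below, under stated assumptions: the toy_pmd type, the PMD_* bits, the flag values and touch_model_demo() are all hypothetical; only the FOLL_* names and touch_pmd()/can_follow_write_pmd() are taken from the patch, and the shape of the write check (writable, or forced COW on an already dirty entry) is an assumption based on 4.14-era code, not something the patch shows.

```c
#include <assert.h>
#include <stdbool.h>

/* Kernel flag names reused for readability; the values are made up. */
#define FOLL_WRITE 0x01u
#define FOLL_TOUCH 0x02u
#define FOLL_FORCE 0x04u
#define FOLL_COW   0x08u

/* Hypothetical toy PMD: just three status bits. */
#define PMD_WRITE 0x1u
#define PMD_YOUNG 0x2u
#define PMD_DIRTY 0x4u

typedef unsigned int toy_pmd;

/* Old touch_pmd(): young and dirty on every FOLL_TOUCH lookup. */
static toy_pmd touch_pmd_old(toy_pmd pmd, unsigned int flags)
{
	(void)flags;                     /* flags were not consulted */
	return pmd | PMD_YOUNG | PMD_DIRTY;
}

/* Patched touch_pmd(): dirty only when the caller asked to write. */
static toy_pmd touch_pmd_new(toy_pmd pmd, unsigned int flags)
{
	pmd |= PMD_YOUNG;
	if (flags & FOLL_WRITE)
		pmd |= PMD_DIRTY;
	return pmd;
}

/* Assumed shape of can_follow_write_pmd(): a read-only entry counts as
 * writable for a forced COW lookup if it is already dirty, i.e. the
 * dirty bit is taken as evidence that the COW break already happened. */
static bool can_follow_write(toy_pmd pmd, unsigned int flags)
{
	return (pmd & PMD_WRITE) ||
	       ((flags & FOLL_FORCE) && (flags & FOLL_COW) &&
		(pmd & PMD_DIRTY));
}

/* A plain read lookup touches a read-only, clean entry; a forced COW
 * write lookup then checks the same entry. */
void touch_model_demo(void)
{
	const toy_pmd ro_clean = 0;
	const unsigned int read_lookup = FOLL_TOUCH;
	const unsigned int forced_cow = FOLL_WRITE | FOLL_FORCE | FOLL_COW;

	/* Old code: the read dirtied the entry, so the check passes. */
	assert(can_follow_write(touch_pmd_old(ro_clean, read_lookup),
				forced_cow));
	/* New code: the entry stays clean, so the check fails. */
	assert(!can_follow_write(touch_pmd_new(ro_clean, read_lookup),
				 forced_cow));
}
```

Calling touch_model_demo() from a main() (built without -DNDEBUG) exercises both paths. The model ignores locking and pmdp_set_access_flags(); it only shows why a dirty bit set by a read makes a dirty-based write check unreliable.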


[Linux-stable-mirror] Patch "mm: migrate: fix an incorrect call of prep_transhuge_page()" has been added to the 4.14-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled

    mm: migrate: fix an incorrect call of prep_transhuge_page()

to the 4.14-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…

The filename of the patch is:
     mm-migrate-fix-an-incorrect-call-of-prep_transhuge_page.patch
and it can be found in the queue-4.14 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.


>From 40a899ed16486455f964e46d1af31fd4fded21c1 Mon Sep 17 00:00:00 2001
From: Zi Yan <zi.yan(a)cs.rutgers.edu>
Date: Wed, 29 Nov 2017 16:11:12 -0800
Subject: mm: migrate: fix an incorrect call of prep_transhuge_page()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

From: Zi Yan <zi.yan(a)cs.rutgers.edu>

commit 40a899ed16486455f964e46d1af31fd4fded21c1 upstream.

In https://lkml.org/lkml/2017/11/20/411, Andrea reported that during memory hotplug/hot remove prep_transhuge_page() is called incorrectly on non-THP pages for migration, when THP is on but THP migration is not enabled. This leads to a bad state of target pages for migration.

By inspecting the code, if called on a non-THP, prep_transhuge_page() will

 1) change the value of the mapping of (page + 2), since it is used for THP deferred list;

 2) change the lru value of (page + 1), since it is used for THP's dtor.

Both can lead to data corruption of these two pages.

Andrea said:
 "Pragmatically and from the point of view of the memory_hotplug subsys, the effect is a kernel crash when pages are being migrated during a memory hot remove offline and migration target pages are found in a bad state"

This patch fixes it by only calling prep_transhuge_page() when we are certain that the target page is THP.

Link: http://lkml.kernel.org/r/20171121021855.50525-1-zi.yan@sent.com
Fixes: 8135d8926c08 ("mm: memory_hotplug: memory hotremove supports thp migration")
Signed-off-by: Zi Yan <zi.yan(a)cs.rutgers.edu>
Reported-by: Andrea Reale <ar(a)linux.vnet.ibm.com>
Cc: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: "Jérôme Glisse" <jglisse(a)redhat.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 895ec0c4942e..a2246cf670ba 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -54,7 +54,7 @@ static inline struct page *new_page_nodemask(struct page *page,
 	new_page = __alloc_pages_nodemask(gfp_mask, order,
 				preferred_nid, nodemask);
 
-	if (new_page && PageTransHuge(page))
+	if (new_page && PageTransHuge(new_page))
 		prep_transhuge_page(new_page);
 
 	return new_page;


Patches currently in stable-queue which might be from zi.yan(a)cs.rutgers.edu are

queue-4.14/mm-migrate-fix-an-incorrect-call-of-prep_transhuge_page.patch
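
The bug comes down to testing the wrong page: the allocation order follows one rule, while the THP setup decision looked only at the source page. A toy userspace model of that mismatch is sketched below. Everything in it is hypothetical: struct model_page, the 8-page pool, MODEL_HPAGE_ORDER and model_new_page() are invented for illustration; model_prep_thp() mirrors only the two side effects the commit message lists, not the real prep_transhuge_page(); and the rule that a huge target is requested only when THP migration is supported is an assumption about new_page_nodemask(), whose earlier lines the excerpted hunk does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy page: only the fields the commit message talks about. */
struct model_page {
	bool trans_huge; /* head of a huge page (stands in for PageTransHuge()) */
	void *mapping;   /* clobbered in (page + 2) by THP setup */
	void *lru_dtor;  /* clobbered in (page + 1) by THP setup */
};

#define MODEL_HPAGE_ORDER 2u     /* made up: a "huge" page is 4 pages */
#define MODEL_POOL_PAGES  8u

static struct model_page pool[MODEL_POOL_PAGES]; /* toy physical memory */
static size_t next_free;                         /* bump allocator cursor */

static int thp_dtor_marker, deferred_list_marker; /* dummy write targets */

/* Bump-allocate 1 << order contiguous pages; order > 0 yields a huge page. */
static struct model_page *model_alloc(unsigned int order)
{
	size_t n;
	struct model_page *p;

	assert(order <= MODEL_HPAGE_ORDER);
	n = (size_t)1 << order;
	if (next_free + n > MODEL_POOL_PAGES)
		return NULL;
	p = &pool[next_free];
	next_free += n;
	p->trans_huge = order > 0;
	return p;
}

/* Writes into the two pages after the head, as the commit message says
 * prep_transhuge_page() does; on an order-0 page these are not ours. */
static void model_prep_thp(struct model_page *head)
{
	head[1].lru_dtor = &thp_dtor_marker;
	head[2].mapping = &deferred_list_marker;
}

/* Sketch of new_page_nodemask(): 'fixed' selects which page is tested
 * before THP setup, the source page (old code) or new_page (patched). */
static struct model_page *model_new_page(const struct model_page *src,
					 bool thp_migration, bool fixed)
{
	unsigned int order = 0;
	struct model_page *new_page;

	/* Assumption: huge targets only when THP migration is supported. */
	if (thp_migration && src->trans_huge)
		order = MODEL_HPAGE_ORDER;

	new_page = model_alloc(order);
	if (new_page && (fixed ? new_page->trans_huge : src->trans_huge))
		model_prep_thp(new_page);
	return new_page;
}

/* Clear the toy memory between experiments. */
static void model_reset(void)
{
	memset(pool, 0, sizeof(pool));
	next_free = 0;
}
```

With THP migration disabled and a huge source page, the old check allocates pool[0] alone yet writes into pool[1] and pool[2]; the patched check leaves those neighbours untouched, and both versions still prepare a genuine huge target when migration is enabled.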

