This is a note to let you know that I've just added the patch titled
mm: avoid spurious 'bad pmd' warning messages
to the 4.9-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git%3Ba=su...
The filename of the patch is: mm-avoid-spurious-bad-pmd-warning-messages.patch and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree, please let stable@vger.kernel.org know about it.
From foo@baz Mon Feb 26 20:55:53 CET 2018
From: Dan Williams <dan.j.williams@intel.com>
Date: Fri, 23 Feb 2018 14:05:27 -0800
Subject: mm: avoid spurious 'bad pmd' warning messages
To: gregkh@linuxfoundation.org
Cc: Jan Kara <jack@suse.cz>, Eryu Guan <eguan@redhat.com>,
 Xiong Zhou <xzhou@redhat.com>, linux-kernel@vger.kernel.org,
 Matthew Wilcox <mawilcox@microsoft.com>, Christoph Hellwig <hch@lst.de>,
 stable@vger.kernel.org, Pawel Lebioda <pawel.lebioda@intel.com>,
 Dave Hansen <dave.hansen@intel.com>,
 Alexander Viro <viro@zeniv.linux.org.uk>,
 Ross Zwisler <ross.zwisler@linux.intel.com>,
 Dave Jiang <dave.jiang@intel.com>,
 Andrew Morton <akpm@linux-foundation.org>,
 Linus Torvalds <torvalds@linux-foundation.org>,
 "Darrick J. Wong" <darrick.wong@oracle.com>,
 "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Message-ID: <151942352781.21775.15841303754448120195.stgit@dwillia2-desk3.amr.corp.intel.com>
From: Ross Zwisler ross.zwisler@linux.intel.com
commit d0f0931de936a0a468d7e59284d39581c16d3a73 upstream.
When the pmd_devmap() checks were added by 5c7fb56e5e3f ("mm, dax: dax-pmd vs thp-pmd vs hugetlbfs-pmd") to add better support for DAX huge pages, they were all added to the end of if() statements after existing pmd_trans_huge() checks. So, things like:
-	if (pmd_trans_huge(*pmd))
+	if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd))
When further checks were added after pmd_trans_unstable() checks by commit 7267ec008b5c ("mm: postpone page table allocation until we have page to map") they were also added at the end of the conditional:
+	if (pmd_trans_unstable(fe->pmd) || pmd_devmap(*fe->pmd))
This ordering is fine for pmd_trans_huge(), but doesn't work for pmd_trans_unstable(). This is because DAX huge pages trip the bad_pmd() check inside of pmd_none_or_trans_huge_or_clear_bad() (called by pmd_trans_unstable()), which prints out a warning and returns 1. So, we do end up doing the right thing, but only after spamming dmesg with suspicious looking messages:
mm/pgtable-generic.c:39: bad pmd ffff8808daa49b88(84000001006000a5)
Reorder these checks in a helper so that pmd_devmap() is checked first, avoiding the error messages, and add a comment explaining why the ordering is important.
Fixes: commit 7267ec008b5c ("mm: postpone page table allocation until we have page to map")
Link: http://lkml.kernel.org/r/20170522215749.23516-1-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Pawel Lebioda <pawel.lebioda@intel.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Xiong Zhou <xzhou@redhat.com>
Cc: Eryu Guan <eguan@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 mm/memory.c | 40 ++++++++++++++++++++++++++++++----------
 1 file changed, 30 insertions(+), 10 deletions(-)
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2848,6 +2848,17 @@ static int __do_fault(struct fault_env *
 	return ret;
 }

+/*
+ * The ordering of these checks is important for pmds with _PAGE_DEVMAP set.
+ * If we check pmd_trans_unstable() first we will trip the bad_pmd() check
+ * inside of pmd_none_or_trans_huge_or_clear_bad(). This will end up correctly
+ * returning 1 but not before it spams dmesg with the pmd_clear_bad() output.
+ */
+static int pmd_devmap_trans_unstable(pmd_t *pmd)
+{
+	return pmd_devmap(*pmd) || pmd_trans_unstable(pmd);
+}
+
 static int pte_alloc_one_map(struct fault_env *fe)
 {
 	struct vm_area_struct *vma = fe->vma;
@@ -2871,18 +2882,27 @@ static int pte_alloc_one_map(struct faul
 map_pte:
 	/*
 	 * If a huge pmd materialized under us just retry later. Use
-	 * pmd_trans_unstable() instead of pmd_trans_huge() to ensure the pmd
-	 * didn't become pmd_trans_huge under us and then back to pmd_none, as
-	 * a result of MADV_DONTNEED running immediately after a huge pmd fault
-	 * in a different thread of this mm, in turn leading to a misleading
-	 * pmd_trans_huge() retval. All we have to ensure is that it is a
-	 * regular pmd that we can walk with pte_offset_map() and we can do that
-	 * through an atomic read in C, which is what pmd_trans_unstable()
-	 * provides.
+	 * pmd_trans_unstable() via pmd_devmap_trans_unstable() instead of
+	 * pmd_trans_huge() to ensure the pmd didn't become pmd_trans_huge
+	 * under us and then back to pmd_none, as a result of MADV_DONTNEED
+	 * running immediately after a huge pmd fault in a different thread of
+	 * this mm, in turn leading to a misleading pmd_trans_huge() retval.
+	 * All we have to ensure is that it is a regular pmd that we can walk
+	 * with pte_offset_map() and we can do that through an atomic read in
+	 * C, which is what pmd_trans_unstable() provides.
 	 */
-	if (pmd_trans_unstable(fe->pmd) || pmd_devmap(*fe->pmd))
+	if (pmd_devmap_trans_unstable(fe->pmd))
 		return VM_FAULT_NOPAGE;

+	/*
+	 * At this point we know that our vmf->pmd points to a page of ptes
+	 * and it cannot become pmd_none(), pmd_devmap() or pmd_trans_huge()
+	 * for the duration of the fault. If a racing MADV_DONTNEED runs and
+	 * we zap the ptes pointed to by our vmf->pmd, the vmf->ptl will still
+	 * be valid and we will re-check to make sure the vmf->pte isn't
+	 * pte_none() under vmf->ptl protection when we return to
+	 * alloc_set_pte().
+	 */
 	fe->pte = pte_offset_map_lock(vma->vm_mm, fe->pmd, fe->address,
 			&fe->ptl);
 	return 0;
@@ -3456,7 +3476,7 @@ static int handle_pte_fault(struct fault
 		fe->pte = NULL;
 	} else {
 		/* See comment in pte_alloc_one_map() */
-		if (pmd_trans_unstable(fe->pmd) || pmd_devmap(*fe->pmd))
+		if (pmd_devmap_trans_unstable(fe->pmd))
 			return 0;
 		/*
 		 * A regular pmd is established and it can't morph into a huge
Patches currently in stable-queue which might be from dan.j.williams@intel.com are
queue-4.9/mm-fix-devm_memremap_pages-collision-handling.patch
queue-4.9/ib-core-disable-memory-registration-of-filesystem-dax-vmas.patch
queue-4.9/mm-avoid-spurious-bad-pmd-warning-messages.patch
queue-4.9/mm-introduce-get_user_pages_longterm.patch
queue-4.9/mm-fail-get_vaddr_frames-for-filesystem-dax-mappings.patch
queue-4.9/fs-dax.c-fix-inefficiency-in-dax_writeback_mapping_range.patch
queue-4.9/device-dax-implement-split-to-catch-invalid-munmap-attempts.patch
queue-4.9/v4l2-disable-filesystem-dax-mapping-support.patch
queue-4.9/libnvdimm-dax-fix-1gb-aligned-namespaces-vs-physical-misalignment.patch
queue-4.9/x86-entry-64-clear-extra-registers-beyond-syscall-arguments-to-reduce-speculation-attack-surface.patch
queue-4.9/libnvdimm-fix-integer-overflow-static-analysis-warning.patch
linux-stable-mirror@lists.linaro.org