From: "Kirill A. Shutemov" kirill.shutemov@linux.intel.com
commit 0a85e51d37645e9ce57e5e1a30859e07810ed07c upstream.
Patch series "thp: fix few MADV_DONTNEED races"
For MADV_DONTNEED to work properly with huge pages, it's critical not to clear the pmd intermittently unless you hold down_write(mmap_sem).
Otherwise MADV_DONTNEED can miss the THP, which can lead to userspace breakage.
See an example of such a race in the commit message of patch 2/4.
All these races were found by code inspection; I haven't seen them triggered. I don't think it's worth applying them to stable@.
This patch (of 4):
Restructure code in preparation for a fix.
Link: http://lkml.kernel.org/r/20170302151034.27829-2-kirill.shutemov@linux.intel....
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[jwang: adjust context for 4.9]
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 mm/huge_memory.c | 52 ++++++++++++++++++++++++++--------------------------
 1 file changed, 26 insertions(+), 26 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 3cae1dc..660e732 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1509,37 +1509,37 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 		unsigned long addr, pgprot_t newprot, int prot_numa)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	spinlock_t *ptl;
-	int ret = 0;
+	pmd_t entry;
+	bool preserve_write;
+	int ret;
 
 	ptl = __pmd_trans_huge_lock(pmd, vma);
-	if (ptl) {
-		pmd_t entry;
-		bool preserve_write = prot_numa && pmd_write(*pmd);
-		ret = 1;
+	if (!ptl)
+		return 0;
 
-		/*
-		 * Avoid trapping faults against the zero page. The read-only
-		 * data is likely to be read-cached on the local CPU and
-		 * local/remote hits to the zero page are not interesting.
-		 */
-		if (prot_numa && is_huge_zero_pmd(*pmd)) {
-			spin_unlock(ptl);
-			return ret;
-		}
+	preserve_write = prot_numa && pmd_write(*pmd);
+	ret = 1;
 
-		if (!prot_numa || !pmd_protnone(*pmd)) {
-			entry = pmdp_huge_get_and_clear_notify(mm, addr, pmd);
-			entry = pmd_modify(entry, newprot);
-			if (preserve_write)
-				entry = pmd_mkwrite(entry);
-			ret = HPAGE_PMD_NR;
-			set_pmd_at(mm, addr, pmd, entry);
-			BUG_ON(vma_is_anonymous(vma) && !preserve_write &&
-					pmd_write(entry));
-		}
-		spin_unlock(ptl);
-	}
+	/*
+	 * Avoid trapping faults against the zero page. The read-only
+	 * data is likely to be read-cached on the local CPU and
+	 * local/remote hits to the zero page are not interesting.
+	 */
+	if (prot_numa && is_huge_zero_pmd(*pmd))
+		goto unlock;
 
+	if (prot_numa && pmd_protnone(*pmd))
+		goto unlock;
+
+	entry = pmdp_huge_get_and_clear_notify(mm, addr, pmd);
+	entry = pmd_modify(entry, newprot);
+	if (preserve_write)
+		entry = pmd_mkwrite(entry);
+	ret = HPAGE_PMD_NR;
+	set_pmd_at(mm, addr, pmd, entry);
+	BUG_ON(vma_is_anonymous(vma) && !preserve_write && pmd_write(entry));
+unlock:
+	spin_unlock(ptl);
 	return ret;
 }
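For readability, here is change_huge_pmd() as it looks after this restructuring, reassembled from the hunks above (a sketch of the 4.9 result, not a verbatim copy of the tree):

int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
		unsigned long addr, pgprot_t newprot, int prot_numa)
{
	struct mm_struct *mm = vma->vm_mm;
	spinlock_t *ptl;
	pmd_t entry;
	bool preserve_write;
	int ret;

	/* Take the pmd lock; bail out if this is not a huge pmd. */
	ptl = __pmd_trans_huge_lock(pmd, vma);
	if (!ptl)
		return 0;

	preserve_write = prot_numa && pmd_write(*pmd);
	ret = 1;

	/* Avoid trapping faults against the zero page (see comment in diff). */
	if (prot_numa && is_huge_zero_pmd(*pmd))
		goto unlock;

	if (prot_numa && pmd_protnone(*pmd))
		goto unlock;

	entry = pmdp_huge_get_and_clear_notify(mm, addr, pmd);
	entry = pmd_modify(entry, newprot);
	if (preserve_write)
		entry = pmd_mkwrite(entry);
	ret = HPAGE_PMD_NR;
	set_pmd_at(mm, addr, pmd, entry);
	BUG_ON(vma_is_anonymous(vma) && !preserve_write && pmd_write(entry));
unlock:
	spin_unlock(ptl);
	return ret;
}

Behaviour is unchanged; the early return and the unlock label flatten the nesting so the next patch can insert the pmdp_invalidate() sequence without reindenting the whole function.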
From: "Kirill A. Shutemov" kirill.shutemov@linux.intel.com
commit ced108037c2aa542b3ed8b7afd1576064ad1362a upstream.
In the prot_numa case we are under down_read(mmap_sem). It's critical not to clear the pmd intermittently, to avoid racing with MADV_DONTNEED, which also runs under down_read(mmap_sem):
	CPU0:				CPU1:
				change_huge_pmd(prot_numa=1)
				 pmdp_huge_get_and_clear_notify()
	madvise_dontneed()
	 zap_pmd_range()
	  pmd_trans_huge(*pmd) == 0 (without ptl)
	  // skip the pmd
				 set_pmd_at();
				 // pmd is re-established
The race makes MADV_DONTNEED miss the huge pmd and fail to clear it, which may break userspace.
Found by code analysis; never seen triggered.
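The lockless check that loses the race is on the MADV_DONTNEED side. Abridged from zap_pmd_range() in 4.9's mm/memory.c (a sketch for illustration, not the verbatim loop):

	if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) {
		/* split or zap the huge pmd, taking the ptl inside */
	}
	/*
	 * If change_huge_pmd() cleared *pmd just before these checks, the
	 * entry looks pmd_none() (no ptl is held here), the pmd is skipped,
	 * and set_pmd_at() on the other CPU re-establishes it afterwards.
	 */
	if (pmd_none_or_trans_huge_or_clear_bad(pmd))
		goto next;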
Link: http://lkml.kernel.org/r/20170302151034.27829-3-kirill.shutemov@linux.intel....
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[jwang: adjust context for 4.9]
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 mm/huge_memory.c | 34 +++++++++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 660e732..c234c07 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1531,7 +1531,39 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 	if (prot_numa && pmd_protnone(*pmd))
 		goto unlock;
 
-	entry = pmdp_huge_get_and_clear_notify(mm, addr, pmd);
+	/*
+	 * In case prot_numa, we are under down_read(mmap_sem). It's critical
+	 * to not clear pmd intermittently to avoid race with MADV_DONTNEED
+	 * which is also under down_read(mmap_sem):
+	 *
+	 *	CPU0:				CPU1:
+	 *				change_huge_pmd(prot_numa=1)
+	 *				 pmdp_huge_get_and_clear_notify()
+	 * madvise_dontneed()
+	 *  zap_pmd_range()
+	 *   pmd_trans_huge(*pmd) == 0 (without ptl)
+	 *   // skip the pmd
+	 *				 set_pmd_at();
+	 *				 // pmd is re-established
+	 *
+	 * The race makes MADV_DONTNEED miss the huge pmd and don't clear it
+	 * which may break userspace.
+	 *
+	 * pmdp_invalidate() is required to make sure we don't miss
+	 * dirty/young flags set by hardware.
+	 */
+	entry = *pmd;
+	pmdp_invalidate(vma, addr, pmd);
+
+	/*
+	 * Recover dirty/young flags.  It relies on pmdp_invalidate to not
+	 * corrupt them.
+	 */
+	if (pmd_dirty(*pmd))
+		entry = pmd_mkdirty(entry);
+	if (pmd_young(*pmd))
+		entry = pmd_mkyoung(entry);
+
 	entry = pmd_modify(entry, newprot);
 	if (preserve_write)
 		entry = pmd_mkwrite(entry);
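With the fix applied, the update sequence in change_huge_pmd() never leaves the pmd empty: pmdp_invalidate() marks the entry invalid (not-present) rather than zeroing it, so a concurrent zap_pmd_range() no longer observes pmd_none() and cannot skip the range. The resulting sequence, assembled from the hunk above:

	entry = *pmd;
	pmdp_invalidate(vma, addr, pmd);

	/*
	 * Re-read *pmd: hardware may have set dirty/young in the window
	 * between the snapshot and the invalidate; pmdp_invalidate() is
	 * relied on not to corrupt those bits.
	 */
	if (pmd_dirty(*pmd))
		entry = pmd_mkdirty(entry);
	if (pmd_young(*pmd))
		entry = pmd_mkyoung(entry);

	entry = pmd_modify(entry, newprot);
	if (preserve_write)
		entry = pmd_mkwrite(entry);
	ret = HPAGE_PMD_NR;
	set_pmd_at(mm, addr, pmd, entry);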
From: "Kirill A. Shutemov" kirill.shutemov@linux.intel.com
commit c0c379e2931b05facef538e53bf3b21f283d9a0b upstream.
Dave noticed that after fixing the MADV_DONTNEED vs. NUMA balancing race, the last user of pmdp_huge_get_and_clear_notify() is gone.
Let's drop the helper.
Link: http://lkml.kernel.org/r/20170306112047.24809-1-kirill.shutemov@linux.intel....
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[jwang: adjust context for 4.9]
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 include/linux/mmu_notifier.h | 13 -------------
 1 file changed, 13 deletions(-)
diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index 25c0dc3..854dfa6 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -381,18 +381,6 @@ static inline void mmu_notifier_mm_destroy(struct mm_struct *mm)
 	___pmd;								\
 })
 
-#define pmdp_huge_get_and_clear_notify(__mm, __haddr, __pmd)		\
-({									\
-	unsigned long ___haddr = __haddr & HPAGE_PMD_MASK;		\
-	pmd_t ___pmd;							\
-									\
-	___pmd = pmdp_huge_get_and_clear(__mm, __haddr, __pmd);	\
-	mmu_notifier_invalidate_range(__mm, ___haddr,			\
-				      ___haddr + HPAGE_PMD_SIZE);	\
-									\
-	___pmd;								\
-})
-
 /*
  * set_pte_at_notify() sets the pte _after_ running the notifier.
  * This is safe to start by updating the secondary MMUs, because the primary MMU
@@ -480,7 +468,6 @@ static inline void mmu_notifier_mm_destroy(struct mm_struct *mm)
 #define pmdp_clear_young_notify pmdp_test_and_clear_young
 #define ptep_clear_flush_notify ptep_clear_flush
 #define pmdp_huge_clear_flush_notify pmdp_huge_clear_flush
-#define pmdp_huge_get_and_clear_notify pmdp_huge_get_and_clear
 #define set_pte_at_notify set_pte_at
 
 #endif /* CONFIG_MMU_NOTIFIER */
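Any out-of-tree caller that still wants the old semantics can open-code the helper. A minimal equivalent of the removed macro as a static inline (hypothetical name, assuming CONFIG_MMU_NOTIFIER and the 4.9 APIs):

static inline pmd_t my_pmdp_huge_get_and_clear_notify(struct mm_struct *mm,
						      unsigned long haddr,
						      pmd_t *pmd)
{
	pmd_t entry;

	haddr &= HPAGE_PMD_MASK;
	/*
	 * This clears the pmd outright -- exactly the transient-empty
	 * window the previous two patches eliminate in change_huge_pmd().
	 * Prefer pmdp_invalidate() where that race matters.
	 */
	entry = pmdp_huge_get_and_clear(mm, haddr, pmd);
	mmu_notifier_invalidate_range(mm, haddr, haddr + HPAGE_PMD_SIZE);
	return entry;
}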