This is a note to let you know that I've just added the patch titled
thp: fix MADV_DONTNEED vs. numa balancing race
to the 4.4-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=su...
The filename of the patch is: thp-fix-madv_dontneed-vs.-numa-balancing-race.patch and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree, please let stable@vger.kernel.org know about it.
From ced108037c2aa542b3ed8b7afd1576064ad1362a Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" kirill.shutemov@linux.intel.com Date: Thu, 13 Apr 2017 14:56:20 -0700 Subject: thp: fix MADV_DONTNEED vs. numa balancing race
From: Kirill A. Shutemov kirill.shutemov@linux.intel.com
commit ced108037c2aa542b3ed8b7afd1576064ad1362a upstream.
In the prot_numa case, we are under down_read(mmap_sem). It's critical not to clear the pmd intermittently, to avoid racing with MADV_DONTNEED, which is also run under down_read(mmap_sem):
	CPU0:				CPU1:
				change_huge_pmd(prot_numa=1)
				 pmdp_huge_get_and_clear_notify()
	madvise_dontneed()
	 zap_pmd_range()
	  pmd_trans_huge(*pmd) == 0 (without ptl)
	  // skip the pmd
				 set_pmd_at();
				 // pmd is re-established
The race makes MADV_DONTNEED miss the huge pmd and fail to clear it, which may break userspace.

Found by code analysis; never seen triggered.
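For reference, the loop below is a hypothetical stress sketch of the userspace side of the race, not a known reproducer (the race was only found by inspection): it repeatedly faults a THP-backed mapping and punches it out with MADV_DONTNEED while automatic NUMA balancing (kernel.numa_balancing = 1) changes protections on the same huge pmds. The mapping size and iteration count are arbitrary.

/* Hypothetical stress sketch: MADV_DONTNEED vs. NUMA-balancing protection
 * changes on the same THP-backed mapping.  Assumes kernel.numa_balancing = 1
 * and transparent hugepages enabled.
 */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define LEN (64UL << 20)	/* 64 MB of anonymous memory */

int main(void)
{
	char *buf = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	madvise(buf, LEN, MADV_HUGEPAGE);	/* ask for huge pmds */

	for (int i = 0; i < 100000; i++) {
		memset(buf, i, LEN);		/* fault in huge pages */
		/*
		 * Races with change_huge_pmd(prot_numa=1): with the old
		 * clear-and-restore sequence, zap_pmd_range() could observe
		 * pmd_trans_huge(*pmd) == 0 and skip the pmd, leaving the
		 * huge page mapped after MADV_DONTNEED returns.
		 */
		madvise(buf, LEN, MADV_DONTNEED);
	}
	munmap(buf, LEN);
	return 0;
}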
Link: http://lkml.kernel.org/r/20170302151034.27829-3-kirill.shutemov@linux.intel....
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[jwang: adjust context for 4.4]
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 mm/huge_memory.c |   34 +++++++++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1588,7 +1588,39 @@ int change_huge_pmd(struct vm_area_struc
 	if (prot_numa && pmd_protnone(*pmd))
 		goto unlock;
 
-	entry = pmdp_huge_get_and_clear_notify(mm, addr, pmd);
+	/*
+	 * In case prot_numa, we are under down_read(mmap_sem). It's critical
+	 * to not clear pmd intermittently to avoid race with MADV_DONTNEED
+	 * which is also under down_read(mmap_sem):
+	 *
+	 *	CPU0:				CPU1:
+	 *				change_huge_pmd(prot_numa=1)
+	 *				 pmdp_huge_get_and_clear_notify()
+	 * madvise_dontneed()
+	 *  zap_pmd_range()
+	 *   pmd_trans_huge(*pmd) == 0 (without ptl)
+	 *   // skip the pmd
+	 *				 set_pmd_at();
+	 *				 // pmd is re-established
+	 *
+	 * The race makes MADV_DONTNEED miss the huge pmd and don't clear it
+	 * which may break userspace.
+	 *
+	 * pmdp_invalidate() is required to make sure we don't miss
+	 * dirty/young flags set by hardware.
+	 */
+	entry = *pmd;
+	pmdp_invalidate(vma, addr, pmd);
+
+	/*
+	 * Recover dirty/young flags.  It relies on pmdp_invalidate to not
+	 * corrupt them.
+	 */
+	if (pmd_dirty(*pmd))
+		entry = pmd_mkdirty(entry);
+	if (pmd_young(*pmd))
+		entry = pmd_mkyoung(entry);
+
 	entry = pmd_modify(entry, newprot);
 	if (preserve_write)
 		entry = pmd_mkwrite(entry);
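To spell out why the switch to pmdp_invalidate() closes the window, here is a simplified before/after sketch of the sequence (a sketch of the logic, not the literal 4.4 source):

/* Before (racy): the pmd is cleared for a moment, so the unlocked
 * pmd_trans_huge(*pmd) check in zap_pmd_range() can read 0, treat the
 * pmd as gone and skip it; set_pmd_at() then re-establishes the huge
 * pmd behind MADV_DONTNEED's back.
 */
entry = pmdp_huge_get_and_clear_notify(mm, addr, pmd);
entry = pmd_modify(entry, newprot);
set_pmd_at(mm, addr, pmd, entry);

/* After: pmdp_invalidate() marks the pmd invalid but never leaves it
 * clear, so the concurrent unlocked check still sees a huge pmd and
 * falls through to take the page-table lock.  Dirty/young bits that
 * hardware may set in the window are folded back into the saved entry
 * before the new protection is written out.
 */
entry = *pmd;
pmdp_invalidate(vma, addr, pmd);
if (pmd_dirty(*pmd))
	entry = pmd_mkdirty(entry);
if (pmd_young(*pmd))
	entry = pmd_mkyoung(entry);
entry = pmd_modify(entry, newprot);
set_pmd_at(mm, addr, pmd, entry);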
Patches currently in stable-queue which might be from kirill.shutemov@linux.intel.com are
queue-4.4/mm-drop-unused-pmdp_huge_get_and_clear_notify.patch
queue-4.4/thp-fix-madv_dontneed-vs.-numa-balancing-race.patch
queue-4.4/thp-reduce-indentation-level-in-change_huge_pmd.patch