The patch below does not apply to the 6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to stable@vger.kernel.org.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 2b0f922323ccfa76219bcaacd35cd50aeaa13592
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to 'stable@vger.kernel.org' --in-reply-to '2024101837-mammogram-headsman-2dec@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2b0f922323ccfa76219bcaacd35cd50aeaa13592 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david@redhat.com>
Date: Fri, 11 Oct 2024 12:24:45 +0200
Subject: [PATCH] mm: don't install PMD mappings when THPs are disabled by the hw/process/vma
We (or rather, readahead logic :) ) might be allocating a THP in the pagecache and then try mapping it into a process that explicitly disabled THP: we might end up installing PMD mappings.
This is a problem for s390x KVM, which explicitly remaps all PMD-mapped THPs to be PTE-mapped in s390_enable_sie()->thp_split_mm(), before starting the VM.
For example, starting a VM backed on a file system with large folios supported makes the VM crash when the VM tries accessing such a mapping using KVM.
Is it also a problem when the HW disabled THP using TRANSPARENT_HUGEPAGE_UNSUPPORTED? At least on x86 this would be the case without X86_FEATURE_PSE.
In the future, we might be able to do better on s390x and only disallow PMD mappings -- what s390x and likely TRANSPARENT_HUGEPAGE_UNSUPPORTED really wants. For now, fix it by essentially performing the same check as would be done in __thp_vma_allowable_orders() or in shmem code, where this works as expected, and disallow PMD mappings, making us fallback to PTE mappings.
Link: https://lkml.kernel.org/r/20241011102445.934409-3-david@redhat.com
Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Leo Fu <bfu@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
diff --git a/mm/memory.c b/mm/memory.c
index c0869a962ddd..30feedabc932 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4920,6 +4920,15 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
 	pmd_t entry;
 	vm_fault_t ret = VM_FAULT_FALLBACK;
 
+	/*
+	 * It is too late to allocate a small folio, we already have a large
+	 * folio in the pagecache: especially s390 KVM cannot tolerate any
+	 * PMD mappings, but PTE-mapped THP are fine. So let's simply refuse any
+	 * PMD mappings if THPs are disabled.
+	 */
+	if (thp_disabled_by_hw() || vma_thp_disabled(vma, vma->vm_flags))
+		return ret;
+
 	if (!thp_vma_suitable_order(vma, haddr, PMD_ORDER))
 		return ret;
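For readers following the thread, the effect of the new early return can be sketched in user space. This is only a rough model under stated assumptions: every name prefixed with model_ is hypothetical, the booleans stand in for the kernel state the patch reads (VM_NOHUGEPAGE, MMF_DISABLE_THP, TRANSPARENT_HUGEPAGE_UNSUPPORTED), and the remaining checks in do_set_pmd() are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-VMA and per-mm state the patch consults. */
struct model_vma {
	bool vm_nohugepage;	/* models vm_flags & VM_NOHUGEPAGE */
	bool mm_disable_thp;	/* models MMF_DISABLE_THP on the owning mm */
	bool pmd_suitable;	/* models thp_vma_suitable_order() succeeding */
};

enum model_mapping { MODEL_MAP_PTE, MODEL_MAP_PMD };

/* Models thp_disabled_by_hw(): the platform reported THP as unsupported. */
static bool model_thp_disabled_by_hw(bool hw_unsupported)
{
	return hw_unsupported;
}

/* Models vma_thp_disabled(): opt-out through madvise() or prctl(). */
static bool model_vma_thp_disabled(const struct model_vma *vma)
{
	return vma->vm_nohugepage || vma->mm_disable_thp;
}

/*
 * Models the decision at the top of do_set_pmd(): a large folio already
 * sits in the pagecache, so the only remaining choice is how to map it.
 * MODEL_MAP_PTE corresponds to returning VM_FAULT_FALLBACK.
 */
static enum model_mapping model_choose_mapping(const struct model_vma *vma,
					       bool hw_unsupported)
{
	assert(vma != NULL);

	/* The check added by the patch: refuse PMD mappings if THP is disabled. */
	if (model_thp_disabled_by_hw(hw_unsupported) || model_vma_thp_disabled(vma))
		return MODEL_MAP_PTE;

	/* The suitability check that was already there before the patch. */
	if (!vma->pmd_suitable)
		return MODEL_MAP_PTE;

	return MODEL_MAP_PMD;
}
```

In this model, a VMA whose mm has the THP-disable bit set always ends up PTE-mapped, even when the folio is PMD-sized and the range would otherwise be suitable. That is the behaviour s390 KVM depends on.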
On 18.10.24 at 09:57, gregkh@linuxfoundation.org wrote:
I'll take a stab at this today.
From: Kefeng Wang wangkefeng.wang@huawei.com
Patch series "mm: don't install PMD mappings when THPs are disabled by the hw/process/vma".
During testing, it was found that we can get PMD mappings in processes where THP (and more precisely, PMD mappings) are supposed to be disabled. While it works as expected for anon+shmem, the pagecache is the problematic bit.
For s390 KVM this currently means that a VM backed by a file located on filesystem with large folio support can crash when KVM tries accessing the problematic page, because the readahead logic might decide to use a PMD-sized THP and faulting it into the page tables will install a PMD mapping, something that s390 KVM cannot tolerate.
This might also be a problem with HW that does not support PMD mappings, but I did not try reproducing it.
Fix it by respecting the ways to disable THPs when deciding whether we can install a PMD mapping. khugepaged should already be taking care of not collapsing if THPs are effectively disabled for the hw/process/vma.
This patch (of 2):
Add vma_thp_disabled() and thp_disabled_by_hw() helpers to be shared by shmem_allowable_huge_orders() and __thp_vma_allowable_orders().
[david@redhat.com: rename to vma_thp_disabled(), split out thp_disabled_by_hw() ]
Link: https://lkml.kernel.org/r/20241011102445.934409-2-david@redhat.com
Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Leo Fu <bfu@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: Boqiao Fu <bfu@redhat.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 963756aac1f011d904ddd9548ae82286d3a91f96)
Signed-off-by: David Hildenbrand <david@redhat.com>
---
Only contextual differences in shmem_allowable_huge_orders(). Note that this patch is required to backport the fix 2b0f922323ccfa76219bcaacd35cd50aeaa13592, which can be cleanly cherry picked on top.
---
 include/linux/huge_mm.h | 18 ++++++++++++++++++
 mm/huge_memory.c        | 13 +------------
 mm/shmem.c              |  7 +------
 3 files changed, 20 insertions(+), 18 deletions(-)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index e25d9ebfdf89..6d334c211176 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -308,6 +308,24 @@ static inline void count_mthp_stat(int order, enum mthp_stat_item item)
 	(transparent_hugepage_flags &					\
 	 (1<<TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG))
 
+static inline bool vma_thp_disabled(struct vm_area_struct *vma,
+		unsigned long vm_flags)
+{
+	/*
+	 * Explicitly disabled through madvise or prctl, or some
+	 * architectures may disable THP for some mappings, for
+	 * example, s390 kvm.
+	 */
+	return (vm_flags & VM_NOHUGEPAGE) ||
+	       test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags);
+}
+
+static inline bool thp_disabled_by_hw(void)
+{
+	/* If the hardware/firmware marked hugepage support disabled. */
+	return transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_UNSUPPORTED);
+}
+
 unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
 		unsigned long len, unsigned long pgoff, unsigned long flags);
 unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 99b146d16a18..1536421f76d4 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -106,18 +106,7 @@ unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
 	if (!vma->vm_mm)		/* vdso */
 		return 0;
 
-	/*
-	 * Explicitly disabled through madvise or prctl, or some
-	 * architectures may disable THP for some mappings, for
-	 * example, s390 kvm.
-	 * */
-	if ((vm_flags & VM_NOHUGEPAGE) ||
-	    test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
-		return 0;
-	/*
-	 * If the hardware/firmware marked hugepage support disabled.
-	 */
-	if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_UNSUPPORTED))
+	if (thp_disabled_by_hw() || vma_thp_disabled(vma, vm_flags))
 		return 0;
 
 	/* khugepaged doesn't collapse DAX vma, but page fault is fine. */
diff --git a/mm/shmem.c b/mm/shmem.c
index 5a77acf6ac6a..2e21e06565ef 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1632,12 +1632,7 @@ unsigned long shmem_allowable_huge_orders(struct inode *inode,
 	loff_t i_size;
 	int order;
 
-	if ((vm_flags & VM_NOHUGEPAGE) ||
-	    test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
-		return 0;
-
-	/* If the hardware/firmware marked hugepage support disabled. */
-	if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_UNSUPPORTED))
+	if (thp_disabled_by_hw() || vma_thp_disabled(vma, vm_flags))
 		return 0;
 
 	/*
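The comment in vma_thp_disabled() mentions THP being "explicitly disabled through madvise or prctl". As a minimal sketch of what that looks like from user space, the two opt-outs map to prctl(PR_SET_THP_DISABLE), which sets MMF_DISABLE_THP for the process, and madvise(MADV_NOHUGEPAGE), which sets VM_NOHUGEPAGE on the affected range. The helper names below are hypothetical wrappers, not part of the patch, and whether the calls succeed depends on the running kernel's configuration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/prctl.h>

/*
 * Process-wide opt-out (hypothetical wrapper name).
 * Returns 0 on success, -1 with errno set on failure.
 */
static int disable_thp_for_process(void)
{
	/* A nonzero arg2 enables the "THP disabled" flag; args 3-5 stay 0. */
	return prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0);
}

/*
 * Query the process-wide setting (hypothetical wrapper name).
 * Returns 1 if THP is disabled for the process, 0 if not, -1 on error.
 */
static int thp_disabled_for_process(void)
{
	return prctl(PR_GET_THP_DISABLE, 0, 0, 0, 0);
}

/*
 * Per-mapping opt-out (hypothetical wrapper name).
 * addr must be page-aligned and lie inside an existing mapping.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int disable_thp_for_range(void *addr, size_t len)
{
	assert(addr != NULL);
	return madvise(addr, len, MADV_NOHUGEPAGE);
}
```

Either opt-out makes vma_thp_disabled() return true for the affected VMAs, which after this series is also respected when a PMD-sized pagecache folio is faulted in.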
On 22.10.24 11:00, David Hildenbrand wrote:
ARG my backporting skills (or rather patch sending skills) are not strong today. This is the 6.11.y variant. Please ignore this mail ... :(
From: Kefeng Wang wangkefeng.wang@huawei.com
Patch series "mm: don't install PMD mappings when THPs are disabled by the hw/process/vma".
During testing, it was found that we can get PMD mappings in processes where THP (and more precisely, PMD mappings) are supposed to be disabled. While it works as expected for anon+shmem, the pagecache is the problematic bit.
For s390 KVM this currently means that a VM backed by a file located on filesystem with large folio support can crash when KVM tries accessing the problematic page, because the readahead logic might decide to use a PMD-sized THP and faulting it into the page tables will install a PMD mapping, something that s390 KVM cannot tolerate.
This might also be a problem with HW that does not support PMD mappings, but I did not try reproducing it.
Fix it by respecting the ways to disable THPs when deciding whether we can install a PMD mapping. khugepaged should already be taking care of not collapsing if THPs are effectively disabled for the hw/process/vma.
This patch (of 2):
Add vma_thp_disabled() and thp_disabled_by_hw() helpers to be shared by shmem_allowable_huge_orders() and __thp_vma_allowable_orders().
[david@redhat.com: rename to vma_thp_disabled(), split out thp_disabled_by_hw() ]
Link: https://lkml.kernel.org/r/20241011102445.934409-2-david@redhat.com
Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Leo Fu <bfu@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: Boqiao Fu <bfu@redhat.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 963756aac1f011d904ddd9548ae82286d3a91f96)
Signed-off-by: David Hildenbrand <david@redhat.com>
---
The change in mm/shmem.c does not exist yet. TRANSPARENT_HUGEPAGE_UNSUPPORTED was called TRANSPARENT_HUGEPAGE_NEVER_DAX.
This patch is required to backport the fix 2b0f922323ccfa76219bcaacd35cd50aeaa13592, for which a backport will be sent separately in reply to the "FAILED: ..." mail.
---
 include/linux/huge_mm.h | 18 ++++++++++++++++++
 mm/huge_memory.c        | 15 ++-------------
 2 files changed, 20 insertions(+), 13 deletions(-)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index a1341fdcf666..0396c39e9e40 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -175,6 +175,24 @@ bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags,
 	(transparent_hugepage_flags &					\
 	 (1<<TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG))
 
+static inline bool vma_thp_disabled(struct vm_area_struct *vma,
+		unsigned long vm_flags)
+{
+	/*
+	 * Explicitly disabled through madvise or prctl, or some
+	 * architectures may disable THP for some mappings, for
+	 * example, s390 kvm.
+	 */
+	return (vm_flags & VM_NOHUGEPAGE) ||
+	       test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags);
+}
+
+static inline bool thp_disabled_by_hw(void)
+{
+	/* If the hardware/firmware marked hugepage support disabled. */
+	return transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_NEVER_DAX);
+}
+
 unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
 		unsigned long len, unsigned long pgoff, unsigned long flags);
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 98a1a05f2db2..160d975d930f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -78,19 +78,8 @@ bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags,
 	if (!vma->vm_mm)		/* vdso */
 		return false;
 
-	/*
-	 * Explicitly disabled through madvise or prctl, or some
-	 * architectures may disable THP for some mappings, for
-	 * example, s390 kvm.
-	 * */
-	if ((vm_flags & VM_NOHUGEPAGE) ||
-	    test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
-		return false;
-	/*
-	 * If the hardware/firmware marked hugepage support disabled.
-	 */
-	if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_NEVER_DAX))
-		return false;
+	if (thp_disabled_by_hw() || vma_thp_disabled(vma, vm_flags))
+		return 0;
 
 	/* khugepaged doesn't collapse DAX vma, but page fault is fine. */
 	if (vma_is_dax(vma))
We (or rather, readahead logic :) ) might be allocating a THP in the pagecache and then try mapping it into a process that explicitly disabled THP: we might end up installing PMD mappings.
This is a problem for s390x KVM, which explicitly remaps all PMD-mapped THPs to be PTE-mapped in s390_enable_sie()->thp_split_mm(), before starting the VM.
For example, starting a VM backed on a file system with large folios supported makes the VM crash when the VM tries accessing such a mapping using KVM.
Is it also a problem when the HW disabled THP using TRANSPARENT_HUGEPAGE_UNSUPPORTED? At least on x86 this would be the case without X86_FEATURE_PSE.
In the future, we might be able to do better on s390x and only disallow PMD mappings -- what s390x and likely TRANSPARENT_HUGEPAGE_UNSUPPORTED really wants. For now, fix it by essentially performing the same check as would be done in __thp_vma_allowable_orders() or in shmem code, where this works as expected, and disallow PMD mappings, making us fallback to PTE mappings.
Link: https://lkml.kernel.org/r/20241011102445.934409-3-david@redhat.com
Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: Leo Fu <bfu@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Cc: Thomas Huth <thuth@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 2b0f922323ccfa76219bcaacd35cd50aeaa13592)
Signed-off-by: David Hildenbrand <david@redhat.com>
---
Minor contextual difference.
Note that the backport of 963756aac1f011d904ddd9548ae82286d3a91f96 is required (sent separately as a reply to the "FAILED:" mail).
---
 mm/memory.c | 9 +++++++++
 1 file changed, 9 insertions(+)
diff --git a/mm/memory.c b/mm/memory.c
index da9fed5e6025..d0af31ffd6b5 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4328,6 +4328,15 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
 	int i;
 	vm_fault_t ret = VM_FAULT_FALLBACK;
 
+	/*
+	 * It is too late to allocate a small folio, we already have a large
+	 * folio in the pagecache: especially s390 KVM cannot tolerate any
+	 * PMD mappings, but PTE-mapped THP are fine. So let's simply refuse any
+	 * PMD mappings if THPs are disabled.
+	 */
+	if (thp_disabled_by_hw() || vma_thp_disabled(vma, vma->vm_flags))
+		return ret;
+
 	if (!transhuge_vma_suitable(vma, haddr))
 		return ret;