The patch titled
Subject: mm/swapfile: skip HugeTLB pages for unuse_vma
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-swapfile-skip-hugetlb-pages-for-unuse_vma.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Liu Shixin <liushixin2(a)huawei.com>
Subject: mm/swapfile: skip HugeTLB pages for unuse_vma
Date: Tue, 15 Oct 2024 09:45:21 +0800
I got a bad pud error and lost a 1GB HugeTLB page when calling swapoff. The
problem can be reproduced by the following steps:
1. Allocate an anonymous 1GB HugeTLB page and some other anonymous memory.
2. Swap out the above anonymous memory.
3. Run swapoff; a bad pud error appears in the kernel log:
mm/pgtable-generic.c:42: bad pud 00000000743d215d(84000001400000e7)
Using ftrace, we can tell that pud_clear_bad() is called by
pud_none_or_clear_bad() in unuse_pud_range(). As a result, the HugeTLB page
will never be freed, because its entry has been cleared from the page table.
Fix it by skipping HugeTLB VMAs instead of passing them to unuse_vma().
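For readers decoding the log line, here is a minimal sketch that splits the raw value 84000001400000e7 into flag bits and a physical address. It assumes the x86-64 page-table entry layout, which the report does not state; the macro and function names are local to this example and are not kernel definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64 entry layout; these names are local to this sketch. */
#define ENT_PRESENT   (1ULL << 0)
#define ENT_PSE       (1ULL << 7)   /* entry maps a huge page directly */
#define ENT_NX        (1ULL << 63)
#define ENT_ADDR_MASK 0x000ffffffffff000ULL  /* bits 12..51 */

/* True when the entry is a present leaf mapping rather than a table pointer. */
static int ent_is_huge_leaf(uint64_t ent)
{
	return (ent & ENT_PRESENT) && (ent & ENT_PSE);
}

/* Physical address field of the entry. */
static uint64_t ent_phys_addr(uint64_t ent)
{
	return ent & ENT_ADDR_MASK;
}

/* Usage example: decode the value printed in the log line. */
void pud_decode_demo(void)
{
	uint64_t pud = 0x84000001400000e7ULL;

	assert(ent_is_huge_leaf(pud));
	assert(pud & ENT_NX);
	assert(ent_phys_addr(pud) == 0x140000000ULL);
	/* The address is aligned to 1GB (1 << 30). */
	assert((ent_phys_addr(pud) & ((1ULL << 30) - 1)) == 0);
}
```

Under that assumption, the entry has the huge-page bit set and a 1GB-aligned address, which is consistent with a 1GB HugeTLB leaf being treated as a bad table pointer by the swapoff page-table walk.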
Link: https://lkml.kernel.org/r/20241015014521.570237-1-liushixin2@huawei.com
Fixes: 0fe6e20b9c4c ("hugetlb, rmap: add reverse mapping for hugepage")
Signed-off-by: Liu Shixin <liushixin2(a)huawei.com>
Acked-by: Muchun Song <muchun.song(a)linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/swapfile.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/swapfile.c~mm-swapfile-skip-hugetlb-pages-for-unuse_vma
+++ a/mm/swapfile.c
@@ -2313,7 +2313,7 @@ static int unuse_mm(struct mm_struct *mm
 	mmap_read_lock(mm);
 	for_each_vma(vmi, vma) {
-		if (vma->anon_vma) {
+		if (vma->anon_vma && !is_vm_hugetlb_page(vma)) {
 			ret = unuse_vma(vma, type);
 			if (ret)
 				break;
_
Patches currently in -mm which might be from liushixin2(a)huawei.com are
mm-swapfile-skip-hugetlb-pages-for-unuse_vma.patch
I got a bad pud error and lost a 1GB HugeTLB page when calling swapoff.
The problem can be reproduced by the following steps:
1. Allocate an anonymous 1GB HugeTLB page and some other anonymous memory.
2. Swap out the above anonymous memory.
3. Run swapoff; a bad pud error appears in the kernel log:
mm/pgtable-generic.c:42: bad pud 00000000743d215d(84000001400000e7)
Using ftrace, we can tell that pud_clear_bad() is called by
pud_none_or_clear_bad() in unuse_pud_range(). As a result, the HugeTLB
page will never be freed, because its entry has been cleared from the
page table. Fix it by skipping HugeTLB VMAs instead of passing them to
unuse_vma().
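The effect of the new condition can be sketched in user space. This is a simplified model, not kernel code: `struct model_vma` and `count_scanned()` are hypothetical stand-ins for `struct vm_area_struct` and the loop in unuse_mm().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vm_area_struct; not the kernel type. */
struct model_vma {
	bool has_anon_vma;  /* models vma->anon_vma != NULL */
	bool is_hugetlb;    /* models is_vm_hugetlb_page(vma) */
};

/* Counts the VMAs that the patched condition would pass to unuse_vma(). */
static size_t count_scanned(const struct model_vma *vmas, size_t n)
{
	size_t scanned = 0;

	for (size_t i = 0; i < n; i++) {
		if (vmas[i].has_anon_vma && !vmas[i].is_hugetlb)
			scanned++;
	}
	return scanned;
}

/* Usage example: only the ordinary anonymous VMA is scanned. */
void vma_filter_demo(void)
{
	const struct model_vma vmas[] = {
		{ true,  false },  /* ordinary anonymous memory: scanned */
		{ true,  true  },  /* anonymous 1GB HugeTLB mapping: skipped */
		{ false, false },  /* no anon_vma: skipped, as before the patch */
	};

	assert(count_scanned(vmas, 3) == 1);
}
```

In this model the HugeTLB VMA never reaches the page-table walk, so the walk never sees a huge PUD entry it would misread as a bad table pointer.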
Fixes: 0fe6e20b9c4c ("hugetlb, rmap: add reverse mapping for hugepage")
Signed-off-by: Liu Shixin <liushixin2(a)huawei.com>
---
mm/swapfile.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 0cded32414a1..f4ef91513fc9 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2312,7 +2312,7 @@ static int unuse_mm(struct mm_struct *mm, unsigned int type)
 	mmap_read_lock(mm);
 	for_each_vma(vmi, vma) {
-		if (vma->anon_vma) {
+		if (vma->anon_vma && !is_vm_hugetlb_page(vma)) {
 			ret = unuse_vma(vma, type);
 			if (ret)
 				break;
--
2.34.1
Previously, the domain_context_clear() function incorrectly called
pci_for_each_dma_alias() for non-PCI devices as well, even though their
context entry had already been cleared directly. This could lead to kernel
hangs or other unexpected behavior.
Add a check to only call pci_for_each_dma_alias() for PCI devices. For
non-PCI devices, domain_context_clear_one() is called directly.
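The control-flow change can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the driver code: `struct model_info` and its counters are hypothetical and merely record which path would run.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct device_domain_info. */
struct model_info {
	bool is_pci;        /* models dev_is_pci(info->dev) */
	int direct_clears;  /* models calls to domain_context_clear_one() */
	int alias_walks;    /* models calls to pci_for_each_dma_alias() */
};

/* Models the patched domain_context_clear(). */
static void model_context_clear(struct model_info *info)
{
	if (!info->is_pci) {
		info->direct_clears++;
		return;  /* the early return added by the patch */
	}
	info->alias_walks++;
}

/* Usage example: each device type takes exactly one path. */
void context_clear_demo(void)
{
	struct model_info platform_dev = { .is_pci = false };
	struct model_info pci_dev = { .is_pci = true };

	model_context_clear(&platform_dev);
	model_context_clear(&pci_dev);

	assert(platform_dev.direct_clears == 1);
	assert(platform_dev.alias_walks == 0);
	assert(pci_dev.direct_clears == 0);
	assert(pci_dev.alias_walks == 1);
}
```

Without the `return`, the non-PCI case in this model would also increment `alias_walks`, which corresponds to the real code treating a non-PCI device as a `struct pci_dev`.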
Reported-by: Todd Brandt <todd.e.brandt(a)intel.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219363
Fixes: 9a16ab9d6402 ("iommu/vt-d: Make context clearing consistent with context mapping")
Cc: stable(a)vger.kernel.org
Signed-off-by: Lu Baolu <baolu.lu(a)linux.intel.com>
---
drivers/iommu/intel/iommu.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 9f6b0780f2ef..e860bc9439a2 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3340,8 +3340,10 @@ static int domain_context_clear_one_cb(struct pci_dev *pdev, u16 alias, void *op
  */
 static void domain_context_clear(struct device_domain_info *info)
 {
-	if (!dev_is_pci(info->dev))
+	if (!dev_is_pci(info->dev)) {
 		domain_context_clear_one(info, info->bus, info->devfn);
+		return;
+	}
 	pci_for_each_dma_alias(to_pci_dev(info->dev),
 			       &domain_context_clear_one_cb, info);
--
2.43.0
The test case added by commit
09bcf9254838 ("selftests/ftrace: Add new test case which checks non unique symbol")
fails; the failure is fixed by
b022f0c7e404 ("tracing/kprobes: Return EADDRNOTAVAIL when func matches several symbols").
Backport that commit and its follow-up fix to 5.4.y together. Minor context conflicts were resolved.
This series is resent after first backporting it to 5.10.y.
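The behavior described by the two patch titles can be sketched as follows. This is a user-space model with hypothetical names (`count_symbol()`, `check_probe_symbol()`), not the code in trace_kprobe.c; returning -ENOENT when no symbol matches is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Counts how many entries in a symbol table have the given name. */
static size_t count_symbol(const char *const *table, size_t n, const char *name)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (strcmp(table[i], name) == 0)
			count++;
	}
	return count;
}

/*
 * Returns 0 when the name is unique across core and module symbols,
 * -EADDRNOTAVAIL when it is ambiguous, and (by assumption) -ENOENT
 * when it is not found at all.
 */
static int check_probe_symbol(const char *const *core, size_t ncore,
			      const char *const *mods, size_t nmods,
			      const char *name)
{
	size_t count = count_symbol(core, ncore, name) +
		       count_symbol(mods, nmods, name);

	if (count > 1)
		return -EADDRNOTAVAIL;
	if (count == 0)
		return -ENOENT;
	return 0;
}

/* Usage example with made-up symbol names. */
void probe_symbol_demo(void)
{
	const char *const core[] = { "foo_init", "bar_open", "foo_init" };
	const char *const mods[] = { "baz_probe" };

	assert(check_probe_symbol(core, 3, mods, 1, "foo_init") == -EADDRNOTAVAIL);
	assert(check_probe_symbol(core, 3, mods, 1, "bar_open") == 0);
	assert(check_probe_symbol(core, 3, mods, 1, "baz_probe") == 0);
	assert(check_probe_symbol(core, 3, mods, 1, "missing") == -ENOENT);
}
```

Counting over both tables mirrors the second patch's title: a name that is unique in the core table alone may still be ambiguous once module symbols are included.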
Andrii Nakryiko (1):
tracing/kprobes: Fix symbol counting logic by looking at modules as
well
Francis Laniel (1):
tracing/kprobes: Return EADDRNOTAVAIL when func matches several
symbols
kernel/trace/trace_kprobe.c | 76 +++++++++++++++++++++++++++++++++++++
kernel/trace/trace_probe.h | 1 +
2 files changed, 77 insertions(+)
--
2.46.0
Hello all,
We are experiencing a boot hang issue when booting kernel version
6.1.83+ on a Dell Inc. PowerEdge R770 equipped with an Intel Xeon
6710E processor. After extensive testing and use of `git bisect`, we
have traced the issue to commit:
`586e19c88a0c ("iommu/vt-d: Retrieve IOMMU perfmon capability information")`
This commit appears to be part of a larger patchset, which can be found here:
[Patchset on lore.kernel.org](https://lore.kernel.org/lkml/7c4b3e4e-1c5d-04f1-1891-84f68…
We attempted to boot with the `intel_iommu=off` option, but the system
hangs in the same manner. However, the system boots successfully after
disabling `CONFIG_INTEL_IOMMU_PERF_EVENTS`.
I'm reporting here in case others hit the same issue.
Any assistance or guidance on understanding/resolving this issue would
be greatly appreciated.
Thank you.
Jinpu Wang @ IONOS Cloud