A Partial Region Controller can be connected to one or more
Freeze Bridges. Each Freeze Bridge has an illegal_request
bit represented in the freeze_illegal_request register.
Thus, instead of always writing 1 to clear the illegal_request bit
of the first Freeze Bridge, we need to ensure the clear
action is applied to whichever Freeze Bridge has raised an
illegal request.
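As an illustration of the intended behaviour (a minimal sketch, not the
driver code; it assumes the write-1-to-clear semantics of the
freeze_illegal_request register described above):

	/*
	 * Sketch only: each set bit corresponds to one Freeze Bridge that
	 * raised an illegal request.  Writing back the read value clears
	 * exactly those bits, whereas writing a constant 1 only ever
	 * clears the bit of the first bridge.
	 */
	u32 illegal = readl(csr_illegal_req_addr);

	if (illegal)
		writel(illegal, csr_illegal_req_addr);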
Fixes: ca24a648f535 ("fpga: add altera freeze bridge support")
Signed-off-by: Chiau Ee Chew <chiau.ee.chew(a)intel.com>
Signed-off-by: Tanmay Kathpalia <tanmay.kathpalia(a)altera.com>
---
drivers/fpga/altera-freeze-bridge.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/fpga/altera-freeze-bridge.c b/drivers/fpga/altera-freeze-bridge.c
index 594693ff786e..23e8b2b54355 100644
--- a/drivers/fpga/altera-freeze-bridge.c
+++ b/drivers/fpga/altera-freeze-bridge.c
@@ -52,7 +52,7 @@ static int altera_freeze_br_req_ack(struct altera_freeze_br_data *priv,
if (illegal) {
dev_err(dev, "illegal request detected 0x%x", illegal);
- writel(1, csr_illegal_req_addr);
+ writel(illegal, csr_illegal_req_addr);
illegal = readl(csr_illegal_req_addr);
if (illegal)
--
2.19.0
Add a call to qlcnic_sriov_free_vlans() in qlcnic_sriov_alloc_vlans() if
any of the vf->sriov_vlans allocations fails.
Also add a call to qlcnic_sriov_free_vlans() to free the memory allocated by
qlcnic_sriov_alloc_vlans() if the "sriov->allowed_vlans" allocation fails.
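A minimal sketch of the pattern being applied (simplified, not the driver
code verbatim): on a partial allocation failure, release whatever has been
allocated so far before returning.

	/* Sketch only: undo partial per-VF allocations on failure. */
	for (i = 0; i < sriov->num_vfs; i++) {
		vf = &sriov->vf_info[i];
		vf->sriov_vlans = kcalloc(sriov->num_allowed_vlans,
					  sizeof(*vf->sriov_vlans), GFP_KERNEL);
		if (!vf->sriov_vlans) {
			/* kfree(NULL) is safe for the VFs not yet touched */
			qlcnic_sriov_free_vlans(adapter);
			return -ENOMEM;
		}
	}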
Fixes: 91b7282b613d ("qlcnic: Support VLAN id config.")
Cc: stable(a)vger.kernel.org
Signed-off-by: Haoxiang Li <haoxiang_li2024(a)163.com>
---
Changes in v3:
- Handle allocation errors in qlcnic_sriov_alloc_vlans()
- Modify the patch title and description.
There's one more thing I'm confused about: I'm not sure if the Fixes tag
is correct, because I noticed that the two modifications correspond to
different commits. Should I split them into two separate patch submissions? Thanks, Paolo!
Changes in v2:
- Add qlcnic_sriov_free_vlans() if qlcnic_sriov_alloc_vlans() fails.
- Modify the patch description.
vf_info was allocated with kcalloc(), so there is no need for extra checks
because kfree(NULL) is safe. Thanks, Paolo!
---
drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c
index f9dd50152b1e..28d24d59efb8 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_common.c
@@ -454,8 +454,10 @@ static int qlcnic_sriov_set_guest_vlan_mode(struct qlcnic_adapter *adapter,
num_vlans = sriov->num_allowed_vlans;
sriov->allowed_vlans = kcalloc(num_vlans, sizeof(u16), GFP_KERNEL);
- if (!sriov->allowed_vlans)
+ if (!sriov->allowed_vlans) {
+ qlcnic_sriov_free_vlans(adapter);
return -ENOMEM;
+ }
vlans = (u16 *)&cmd->rsp.arg[3];
for (i = 0; i < num_vlans; i++)
@@ -2167,8 +2169,10 @@ int qlcnic_sriov_alloc_vlans(struct qlcnic_adapter *adapter)
vf = &sriov->vf_info[i];
vf->sriov_vlans = kcalloc(sriov->num_allowed_vlans,
sizeof(*vf->sriov_vlans), GFP_KERNEL);
- if (!vf->sriov_vlans)
+ if (!vf->sriov_vlans) {
+ qlcnic_sriov_free_vlans(adapter);
return -ENOMEM;
+ }
}
return 0;
--
2.25.1
Hello,
New build issue found on stable-rc/linux-5.10.y:
---
in vmlinux (Makefile:1212) [logspec:kbuild,kbuild.other]
---
- dashboard: https://d.kernelci.org/issue/maestro:d5c2be698989c7de46471109aae8df0339b713…
- giturl: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
- commit HEAD: a0e8dfa03993fda7b4d4b696c50f69726522abba
Log excerpt:
=====================================================
.lds
In file included from ./include/linux/kernel.h:15,
net/ipv6/udp.c: In function ‘udp_v6_send_skb’:
./include/linux/minmax.h:20:35: warning: comparison of distinct
pointer types lacks a cast
./include/linux/minmax.h:26:18: note: in expansion of macro ‘__typecheck’
./include/linux/minmax.h:36:31: note: in expansion of macro ‘__safe_cmp’
./include/linux/minmax.h:45:25: note: in expansion of macro ‘__careful_cmp’
net/ipv6/udp.c:1213:28: note: in expansion of macro ‘min’
In file included from ./include/linux/uaccess.h:7,
net/ipv4/udp.c: In function ‘udp_send_skb’:
./include/linux/minmax.h:20:35: warning: comparison of distinct
pointer types lacks a cast
./include/linux/minmax.h:26:18: note: in expansion of macro ‘__typecheck’
./include/linux/minmax.h:36:31: note: in expansion of macro ‘__safe_cmp’
./include/linux/minmax.h:45:25: note: in expansion of macro ‘__careful_cmp’
net/ipv4/udp.c:926:28: note: in expansion of macro ‘min’
FAILED unresolved symbol filp_close
=====================================================
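For reference, a minimal reproduction of the min() warning class seen above
(illustrative only, not the code at net/ipv6/udp.c or net/ipv4/udp.c):

	/*
	 * min() in include/linux/minmax.h type-checks its arguments by
	 * comparing pointers to their types; operands of different types
	 * (e.g. size_t vs int) trigger "comparison of distinct pointer
	 * types lacks a cast".  min_t() forces a common type instead.
	 */
	size_t a = 16;
	int b = 32;
	size_t c = min_t(size_t, a, b);	/* no warning */

Note that the actual build failure appears to be the "FAILED unresolved
symbol filp_close" line at the end; the min() messages above it are
warnings only.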
# Builds where the incident occurred:
## cros://chromeos-5.10/x86_64/chromeos-amd-stoneyridge.flavour.config+lab-setup+x86-board+CONFIG_MODULE_COMPRESS=n+CONFIG_MODULE_COMPRESS_NONE=y
on (x86_64):
- compiler: gcc-12
- dashboard: https://d.kernelci.org/build/maestro:67ceffea18018371957ebdc0
#kernelci issue maestro:d5c2be698989c7de46471109aae8df0339b713c1
Reported-by: kernelci.org bot <bot(a)kernelci.org>
--
This is an experimental report format. Please send feedback in!
Talk to us at kernelci(a)lists.linux.dev
Made with love by the KernelCI team - https://kernelci.org
---8<---
Changes in v2:
- Added explicit comment about the quirk, as requested by Mani.
- Made commit message more clear, as requested by Bjorn.
---8<---
On our Marvell OCTEON CN96XX board, we observed the following panic on
the latest kernel:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000080
CPU: 22 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc6 #20
Hardware name: Marvell OcteonTX CN96XX board (DT)
pc : of_pci_add_properties+0x278/0x4c8
Call trace:
of_pci_add_properties+0x278/0x4c8 (P)
of_pci_make_dev_node+0xe0/0x158
pci_bus_add_device+0x158/0x228
pci_bus_add_devices+0x40/0x98
pci_host_probe+0x94/0x118
pci_host_common_probe+0x130/0x1b0
platform_probe+0x70/0xf0
The dmesg logs indicated that the PCI bridge was scanning with an invalid bus range:
pci-host-generic 878020000000.pci: PCI host bridge to bus 0002:00
pci_bus 0002:00: root bus resource [bus 00-ff]
pci 0002:00:00.0: scanning [bus f9-f9] behind bridge, pass 0
pci 0002:00:01.0: scanning [bus fa-fa] behind bridge, pass 0
pci 0002:00:02.0: scanning [bus fb-fb] behind bridge, pass 0
pci 0002:00:03.0: scanning [bus fc-fc] behind bridge, pass 0
pci 0002:00:04.0: scanning [bus fd-fd] behind bridge, pass 0
pci 0002:00:05.0: scanning [bus fe-fe] behind bridge, pass 0
pci 0002:00:06.0: scanning [bus ff-ff] behind bridge, pass 0
pci 0002:00:07.0: scanning [bus 00-00] behind bridge, pass 0
pci 0002:00:07.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pci 0002:00:08.0: scanning [bus 01-01] behind bridge, pass 0
pci 0002:00:09.0: scanning [bus 02-02] behind bridge, pass 0
pci 0002:00:0a.0: scanning [bus 03-03] behind bridge, pass 0
pci 0002:00:0b.0: scanning [bus 04-04] behind bridge, pass 0
pci 0002:00:0c.0: scanning [bus 05-05] behind bridge, pass 0
pci 0002:00:0d.0: scanning [bus 06-06] behind bridge, pass 0
pci 0002:00:0e.0: scanning [bus 07-07] behind bridge, pass 0
pci 0002:00:0f.0: scanning [bus 08-08] behind bridge, pass 0
This regression was introduced by commit 7246a4520b4b ("PCI: Use
preserve_config in place of pci_flags"). On our board, the 0002:00:07.0
bridge is misconfigured by the bootloader. Both its secondary and
subordinate bus numbers are initialized to 0, while its fixed secondary
bus number is set to 8. However, bus number 8 is also assigned to another
bridge (0002:00:0f.0). Although this is a bootloader issue, before the
change in commit 7246a4520b4b, the PCI_REASSIGN_ALL_BUS flag was set
by default when PCI_PROBE_ONLY was not enabled, ensuring that all the
bus numbers for these bridges were reassigned, avoiding any conflicts.
After the change introduced in commit 7246a4520b4b, the bus numbers
assigned by the bootloader are reused by all other bridges, except
the misconfigured 0002:00:07.0 bridge. The kernel attempts to reconfigure
0002:00:07.0 by reusing the fixed secondary bus number 8 assigned by the
bootloader. However, since a pci_bus has already been allocated for
bus 8 due to the probe of 0002:00:0f.0, no new pci_bus is allocated for
0002:00:07.0. This results in a PCI bridge device without a pci_bus
attached (pdev->subordinate == NULL). Consequently, accessing
pdev->subordinate in of_pci_prop_bus_range() leads to a NULL pointer
dereference.
To summarize, we need to set the PCI_REASSIGN_ALL_BUS flag when
PCI_PROBE_ONLY is not enabled in order to work around issues like the
one described above.
Cc: stable(a)vger.kernel.org
Fixes: 7246a4520b4b ("PCI: Use preserve_config in place of pci_flags")
Signed-off-by: Bo Sun <Bo.Sun.CN(a)windriver.com>
---
drivers/pci/quirks.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 82b21e34c545..cec58c7479e1 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -6181,6 +6181,23 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1536, rom_bar_overlap_defect);
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1537, rom_bar_overlap_defect);
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1538, rom_bar_overlap_defect);
+/*
+ * Quirk for Marvell CN96XX/CN10XXX boards:
+ *
+ * Adds PCI_REASSIGN_ALL_BUS unless PCI_PROBE_ONLY is set, forcing bus number
+ * reassignment to avoid conflicts caused by bootloader misconfigured PCI bridges.
+ *
+ * This resolves a regression introduced by commit 7246a4520b4b ("PCI: Use
+ * preserve_config in place of pci_flags"), which removed this behavior.
+ */
+static void quirk_marvell_cn96xx_cn10xxx_reassign_all_busnr(struct pci_dev *dev)
+{
+ if (!pci_has_flag(PCI_PROBE_ONLY))
+ pci_add_flags(PCI_REASSIGN_ALL_BUS);
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_CAVIUM, 0xa002,
+ quirk_marvell_cn96xx_cn10xxx_reassign_all_busnr);
+
#ifdef CONFIG_PCIEASPM
/*
* Several Intel DG2 graphics devices advertise that they can only tolerate
--
2.48.1
On the arm64 platform with the 4K base page config, SECTION_SIZE_BITS is set
to 27, making one section 128M. The corresponding page structs that vmemmap
points to then occupy 2M per section.
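For reference, the arithmetic behind these numbers (assuming 4K pages and
the usual 64-byte struct page): one section is 2^27 bytes = 128M, i.e.
2^27 / 2^12 = 32768 pages, and 32768 * 64 bytes = 2M of vmemmap per
section, exactly one PMD-sized mapping.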
Commit c1cc1552616d ("arm64: MMU initialisation") optimizes the
vmemmap to populate at the PMD section level, which was suitable
initially since the hotplug granularity was always one section (128M).
However, commit ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
introduced a 2M (SUBSECTION_SIZE) hotplug granularity, which disrupted the
existing arm64 assumptions.
The first problem is that if start or end is not aligned to a section
boundary, such as when a subsection is hot added, populating the entire
section is wasteful.
The next problem is that if we hotplug something that spans part of a 128 MiB
section (subsections, let's call it memblock1), then hotplug something
that spans another part of the same 128 MiB section (subsections, let's call
it memblock2), and subsequently unplug memblock1, vmemmap_free() will clear
the entire PMD entry which also backs memblock2, even though memblock2
is still active.
Assuming hotplug/unplug sizes are guaranteed to be symmetric, do the
fix similarly to x86-64: populate at the base-page level if start/end is not
aligned to a section boundary.
Cc: <stable(a)vger.kernel.org> # v5.4+
Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug")
Acked-by: David Hildenbrand <david(a)redhat.com>
Signed-off-by: Zhenhua Huang <quic_zhenhuah(a)quicinc.com>
---
arch/arm64/mm/mmu.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index b4df5bc5b1b8..1dfe1a8efdbe 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1177,8 +1177,11 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
struct vmem_altmap *altmap)
{
WARN_ON((start < VMEMMAP_START) || (end > VMEMMAP_END));
+ /* [start, end] should be within one section */
+ WARN_ON_ONCE(end - start > PAGES_PER_SECTION * sizeof(struct page));
- if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES))
+ if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) ||
+ (end - start < PAGES_PER_SECTION * sizeof(struct page)))
return vmemmap_populate_basepages(start, end, node, altmap);
else
return vmemmap_populate_hugepages(start, end, node, altmap);
--
2.25.1
Some users are reporting that ov08x40_identify_module() fails
to identify the chip, reading 0x00 as the value of OV08X40_REG_CHIP_ID.
Intel's out-of-tree IPU6 drivers include some ov08x40 changes for older
kernels, including adding support for the reset GPIO, and Intel's patch
for this uses a 5 ms sleep. Extend the sleep to 5 ms following Intel's
example; this fixes the ov08x40_identify_module() problem.
Link: https://github.com/intel/ipu6-drivers/blob/c09e2198d801e1eb701984d294837312…
Fixes: df1ae2251a50 ("media: ov08x40: Add OF probe support")
Cc: stable(a)vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
---
drivers/media/i2c/ov08x40.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/media/i2c/ov08x40.c b/drivers/media/i2c/ov08x40.c
index cf0e41fc3071..54575eea3c49 100644
--- a/drivers/media/i2c/ov08x40.c
+++ b/drivers/media/i2c/ov08x40.c
@@ -1341,7 +1341,7 @@ static int ov08x40_power_on(struct device *dev)
}
gpiod_set_value_cansleep(ov08x->reset_gpio, 0);
- usleep_range(1500, 1800);
+ usleep_range(5000, 5500);
return 0;
--
2.48.1
On Tue, Mar 11, 2025 at 06:54:00AM +0000, Cameron Williams wrote:
> Cc'ing stable
>
> Cc: stable(a)vger.kernel.org
>
<formletter>
This is not the correct way to submit patches for inclusion in the
stable kernel tree. Please read:
https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.
</formletter>
The patch titled
Subject: memcg: drain obj stock on cpu hotplug teardown
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
memcg-drain-obj-stock-on-cpu-hotplug-teardown.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Shakeel Butt <shakeel.butt(a)linux.dev>
Subject: memcg: drain obj stock on cpu hotplug teardown
Date: Mon, 10 Mar 2025 16:09:34 -0700
Currently on cpu hotplug teardown, only the memcg stock is drained, but we
need to drain the obj stock as well, otherwise we will miss the stats
accumulated on the target cpu as well as the cached nr_bytes. The stats
include MEMCG_KMEM, NR_SLAB_RECLAIMABLE_B & NR_SLAB_UNRECLAIMABLE_B. In
addition, we are leaking a reference to the struct obj_cgroup object.
Link: https://lkml.kernel.org/r/20250310230934.2913113-1-shakeel.butt@linux.dev
Fixes: bf4f059954dc ("mm: memcg/slab: obj_cgroup API")
Signed-off-by: Shakeel Butt <shakeel.butt(a)linux.dev>
Cc:
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Roman Gushchin <roman.gushchin(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 9 +++++++++
1 file changed, 9 insertions(+)
--- a/mm/memcontrol.c~memcg-drain-obj-stock-on-cpu-hotplug-teardown
+++ a/mm/memcontrol.c
@@ -1921,9 +1921,18 @@ void drain_all_stock(struct mem_cgroup *
static int memcg_hotplug_cpu_dead(unsigned int cpu)
{
struct memcg_stock_pcp *stock;
+ struct obj_cgroup *old;
+ unsigned long flags;
stock = &per_cpu(memcg_stock, cpu);
+
+ /* drain_obj_stock requires stock_lock */
+ local_lock_irqsave(&memcg_stock.stock_lock, flags);
+ old = drain_obj_stock(stock);
+ local_unlock_irqrestore(&memcg_stock.stock_lock, flags);
+
drain_stock(stock);
+ obj_cgroup_put(old);
return 0;
}
_
Patches currently in -mm which might be from shakeel.butt(a)linux.dev are
memcg-drain-obj-stock-on-cpu-hotplug-teardown.patch
memcg-add-hierarchical-effective-limits-for-v2.patch
memcg-dont-call-propagate_protected_usage-for-v1.patch
page_counter-track-failcnt-only-for-legacy-cgroups.patch
page_counter-reduce-struct-page_counter-size.patch
memcg-bypass-root-memcg-check-for-skmem-charging.patch
The patch titled
Subject: mm/huge_memory: drop beyond-EOF folios with the right number of refs.
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-huge_memory-drop-beyond-eof-folios-with-the-right-number-of-refs.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Zi Yan <ziy(a)nvidia.com>
Subject: mm/huge_memory: drop beyond-EOF folios with the right number of refs.
Date: Mon, 10 Mar 2025 11:57:27 -0400
When an after-split folio is large and needs to be dropped due to EOF,
folio_put_refs(folio, folio_nr_pages(folio)) should be used to drop all
page cache refs. Otherwise, the folio will not be freed, causing a memory
leak.
This leak would happen on a filesystem with blocksize > page_size when a
truncate is performed, where the blocksize makes folios split to >0 order
ones, causing truncated folios not to be freed.
Link: https://lkml.kernel.org/r/20250310155727.472846-1-ziy@nvidia.com
Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages")
Signed-off-by: Zi Yan <ziy(a)nvidia.com>
Reported-by: Hugh Dickins <hughd(a)google.com>
Closes: https://lore.kernel.org/all/fcbadb7f-dd3e-21df-f9a7-2853b53183c4@google.com/
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Luis Chamberlain <mcgrof(a)kernel.org>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Pankaj Raghav <p.raghav(a)samsung.com>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Yang Shi <yang(a)os.amperecomputing.com>
Cc: Yu Zhao <yuzhao(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/huge_memory.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/huge_memory.c~mm-huge_memory-drop-beyond-eof-folios-with-the-right-number-of-refs
+++ a/mm/huge_memory.c
@@ -3304,7 +3304,7 @@ static void __split_huge_page(struct pag
folio_account_cleaned(tail,
inode_to_wb(folio->mapping->host));
__filemap_remove_folio(tail, NULL);
- folio_put(tail);
+ folio_put_refs(tail, folio_nr_pages(tail));
} else if (!folio_test_anon(folio)) {
__xa_store(&folio->mapping->i_pages, tail->index,
tail, 0);
_
Patches currently in -mm which might be from ziy(a)nvidia.com are
mm-migrate-fix-shmem-xarray-update-during-migration.patch
mm-huge_memory-drop-beyond-eof-folios-with-the-right-number-of-refs.patch
selftests-mm-make-file-backed-thp-split-work-by-writing-pmd-size-data.patch
mm-huge_memory-allow-split-shmem-large-folio-to-any-lower-order.patch
selftests-mm-test-splitting-file-backed-thp-to-any-lower-order.patch
xarray-add-xas_try_split-to-split-a-multi-index-entry.patch
mm-huge_memory-add-two-new-not-yet-used-functions-for-folio_split.patch
mm-huge_memory-add-two-new-not-yet-used-functions-for-folio_split-fix.patch
mm-huge_memory-move-folio-split-common-code-to-__folio_split.patch
mm-huge_memory-add-buddy-allocator-like-non-uniform-folio_split.patch
mm-huge_memory-remove-the-old-unused-__split_huge_page.patch
mm-huge_memory-add-folio_split-to-debugfs-testing-interface.patch
mm-truncate-use-folio_split-in-truncate-operation.patch
selftests-mm-add-tests-for-folio_split-buddy-allocator-like-split.patch
mm-filemap-use-xas_try_split-in-__filemap_add_folio.patch
mm-shmem-use-xas_try_split-in-shmem_split_large_entry.patch
The patch titled
Subject: mm/mremap: correctly handle partial mremap() of VMA starting at 0
has been added to the -mm mm-unstable branch. Its filename is
mm-mremap-correctly-handle-partial-mremap-of-vma-starting-at-0.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Subject: mm/mremap: correctly handle partial mremap() of VMA starting at 0
Date: Mon, 10 Mar 2025 20:50:34 +0000
Patch series "refactor mremap and fix bug", v3.
The existing mremap() logic has grown organically over a very long period
of time, resulting in code that is in many parts, very difficult to follow
and full of subtleties and sources of confusion.
In addition, it is difficult to thread state through the operation
correctly, as function arguments have expanded, some parameters are
expected to be temporarily altered during the operation, others are
intended to remain static and some can be overridden.
This series completely refactors the mremap implementation, sensibly
separating functions, adding comments to explain the more subtle aspects
of the implementation and making use of small structs to thread state
through everything.
The reason for doing so is to lay the groundwork for planned future
changes to the mremap logic, changes which require the ability to easily
pass around state.
Additionally, it would be unhelpful to add yet more logic to code that is
already difficult to follow without first refactoring it like this.
The first patch in this series additionally fixes a bug when a VMA with
start address zero is partially remapped.
Tested on real hardware under heavy workload and all self tests are
passing.
This patch (of 3):
Consider the case of a partial mremap() (that results in a VMA split) of
an accountable VMA (i.e. which has the VM_ACCOUNT flag set) whose start
address is zero, with the MREMAP_MAYMOVE flag specified and a scenario
where a move does in fact occur:
addr end
| |
v v
|-------------|
| vma |
|-------------|
0
This move is effected by unmapping the range [addr, end). In order to
prevent an incorrect decrement of accounted memory which has already been
determined, the mremap() code in move_vma() clears VM_ACCOUNT from the VMA
prior to doing so, before reestablishing it in each of the VMAs
post-split:
addr end
| |
v v
|---| |---|
| A | | B |
|---| |---|
Commit 6b73cff239e5 ("mm: change munmap splitting order and move_vma()")
changed this logic such as to determine whether there is a need to do so
by establishing account_start and account_end and, in the instance where
such an operation is required, assigning them to vma->vm_start and
vma->vm_end.
Later the code checks if the operation is required for 'A' referenced
above thusly:
if (account_start) {
...
}
However, if the VMA described above has vma->vm_start == 0, which is now
assigned to account_start, this branch will not be executed.
As a result, the VMA 'A' above will remain stripped of its VM_ACCOUNT
flag, incorrectly.
The fix is to simply convert these variables to booleans and set them as
required.
Link: https://lkml.kernel.org/r/cover.1741639347.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/dc55cb6db25d97c3d9e460de4986a323fa959676.17416393…
Fixes: 6b73cff239e5 ("mm: change munmap splitting order and move_vma()")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Reviewed-by: Harry Yoo <harry.yoo(a)oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mremap.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
--- a/mm/mremap.c~mm-mremap-correctly-handle-partial-mremap-of-vma-starting-at-0
+++ a/mm/mremap.c
@@ -705,8 +705,8 @@ static unsigned long move_vma(struct vm_
unsigned long vm_flags = vma->vm_flags;
unsigned long new_pgoff;
unsigned long moved_len;
- unsigned long account_start = 0;
- unsigned long account_end = 0;
+ bool account_start = false;
+ bool account_end = false;
unsigned long hiwater_vm;
int err = 0;
bool need_rmap_locks;
@@ -790,9 +790,9 @@ static unsigned long move_vma(struct vm_
if (vm_flags & VM_ACCOUNT && !(flags & MREMAP_DONTUNMAP)) {
vm_flags_clear(vma, VM_ACCOUNT);
if (vma->vm_start < old_addr)
- account_start = vma->vm_start;
+ account_start = true;
if (vma->vm_end > old_addr + old_len)
- account_end = vma->vm_end;
+ account_end = true;
}
/*
@@ -832,7 +832,7 @@ static unsigned long move_vma(struct vm_
/* OOM: unable to split vma, just get accounts right */
if (vm_flags & VM_ACCOUNT && !(flags & MREMAP_DONTUNMAP))
vm_acct_memory(old_len >> PAGE_SHIFT);
- account_start = account_end = 0;
+ account_start = account_end = false;
}
if (vm_flags & VM_LOCKED) {
_
Patches currently in -mm which might be from lorenzo.stoakes(a)oracle.com are
mm-simplify-vma-merge-structure-and-expand-comments.patch
mm-further-refactor-commit_merge.patch
mm-eliminate-adj_start-parameter-from-commit_merge.patch
mm-make-vmg-target-consistent-and-further-simplify-commit_merge.patch
mm-completely-abstract-unnecessary-adj_start-calculation.patch
mm-madvise-split-out-mmap-locking-operations-for-madvise-fix.patch
mm-use-read-write_once-for-vma-vm_flags-on-migrate-mprotect.patch
mm-refactor-rmap_walk_file-to-separate-out-traversal-logic.patch
mm-provide-mapping_wrprotect_range-function.patch
fb_defio-do-not-use-deprecated-page-mapping-index-fields.patch
fb_defio-do-not-use-deprecated-page-mapping-index-fields-fix.patch
mm-allow-guard-regions-in-file-backed-and-read-only-mappings.patch
selftests-mm-rename-guard-pages-to-guard-regions.patch
selftests-mm-rename-guard-pages-to-guard-regions-fix.patch
tools-selftests-expand-all-guard-region-tests-to-file-backed.patch
tools-selftests-add-file-shmem-backed-mapping-guard-region-tests.patch
fs-proc-task_mmu-add-guard-region-bit-to-pagemap.patch
tools-selftests-add-guard-region-test-for-proc-pid-pagemap.patch
tools-selftests-add-guard-region-test-for-proc-pid-pagemap-fix.patch
mm-mremap-correctly-handle-partial-mremap-of-vma-starting-at-0.patch
mm-mremap-refactor-mremap-system-call-implementation.patch
mm-mremap-introduce-and-use-vma_remap_struct-threaded-state.patch
mm-mremap-initial-refactor-of-move_vma.patch
mm-mremap-complete-refactor-of-move_vma.patch
mm-mremap-refactor-move_page_tables-abstracting-state.patch
mm-mremap-thread-state-through-move-page-table-operation.patch
The handling of the MST Connection Status Notify message is skipped if
the probing of the topology is still pending. Acquiring the
drm_dp_mst_topology_mgr::probe_lock for this in
drm_dp_mst_handle_up_req() is problematic: the task/work this function
is called from is also responsible for handling MST down-request replies
(in drm_dp_mst_handle_down_rep()). Thus drm_dp_mst_link_probe_work() -
holding already probe_lock - could be blocked waiting for an MST
down-request reply while drm_dp_mst_handle_up_req() is waiting for
probe_lock while processing a CSN message. This leads to the probe
work's down-request message timing out.
A scenario similar to the above leading to a down-request timeout is
handling a CSN message in drm_dp_mst_handle_conn_stat(), holding the
probe_lock and sending down-request messages while a second CSN message
sent by the sink subsequently is handled by drm_dp_mst_handle_up_req().
Fix the above by moving the logic to skip the CSN handling to
drm_dp_mst_process_up_req(). This function is called from a work
(separate from the task/work handling new up/down messages), already
holding probe_lock. This solves the above timeout issue, since handling
of down-request replies won't be blocked by probe_lock.
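Put as a simplified sequence (derived from the description above):
1. drm_dp_mst_link_probe_work() takes probe_lock and sends a down-request,
   then waits for the reply.
2. The task/work that would process that reply via
   drm_dp_mst_handle_down_rep() instead enters drm_dp_mst_handle_up_req()
   for a CSN and blocks on probe_lock.
3. The down-request reply is never processed, so the probe work's message
   times out.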
Fixes: ddf983488c3e ("drm/dp_mst: Skip CSN if topology probing is not done yet")
Cc: Wayne Lin <Wayne.Lin(a)amd.com>
Cc: Lyude Paul <lyude(a)redhat.com>
Cc: stable(a)vger.kernel.org # v6.6+
Signed-off-by: Imre Deak <imre.deak(a)intel.com>
---
drivers/gpu/drm/display/drm_dp_mst_topology.c | 40 +++++++++++--------
1 file changed, 24 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/display/drm_dp_mst_topology.c b/drivers/gpu/drm/display/drm_dp_mst_topology.c
index 8b68bb3fbffb0..3a1f1ffc7b552 100644
--- a/drivers/gpu/drm/display/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/display/drm_dp_mst_topology.c
@@ -4036,6 +4036,22 @@ static int drm_dp_mst_handle_down_rep(struct drm_dp_mst_topology_mgr *mgr)
return 0;
}
+static bool primary_mstb_probing_is_done(struct drm_dp_mst_topology_mgr *mgr)
+{
+ bool probing_done = false;
+
+ mutex_lock(&mgr->lock);
+
+ if (mgr->mst_primary && drm_dp_mst_topology_try_get_mstb(mgr->mst_primary)) {
+ probing_done = mgr->mst_primary->link_address_sent;
+ drm_dp_mst_topology_put_mstb(mgr->mst_primary);
+ }
+
+ mutex_unlock(&mgr->lock);
+
+ return probing_done;
+}
+
static inline bool
drm_dp_mst_process_up_req(struct drm_dp_mst_topology_mgr *mgr,
struct drm_dp_pending_up_req *up_req)
@@ -4066,8 +4082,12 @@ drm_dp_mst_process_up_req(struct drm_dp_mst_topology_mgr *mgr,
/* TODO: Add missing handler for DP_RESOURCE_STATUS_NOTIFY events */
if (msg->req_type == DP_CONNECTION_STATUS_NOTIFY) {
- dowork = drm_dp_mst_handle_conn_stat(mstb, &msg->u.conn_stat);
- hotplug = true;
+ if (!primary_mstb_probing_is_done(mgr)) {
+ drm_dbg_kms(mgr->dev, "Got CSN before finish topology probing. Skip it.\n");
+ } else {
+ dowork = drm_dp_mst_handle_conn_stat(mstb, &msg->u.conn_stat);
+ hotplug = true;
+ }
}
drm_dp_mst_topology_put_mstb(mstb);
@@ -4149,10 +4169,11 @@ static int drm_dp_mst_handle_up_req(struct drm_dp_mst_topology_mgr *mgr)
drm_dp_send_up_ack_reply(mgr, mst_primary, up_req->msg.req_type,
false);
+ drm_dp_mst_topology_put_mstb(mst_primary);
+
if (up_req->msg.req_type == DP_CONNECTION_STATUS_NOTIFY) {
const struct drm_dp_connection_status_notify *conn_stat =
&up_req->msg.u.conn_stat;
- bool handle_csn;
drm_dbg_kms(mgr->dev, "Got CSN: pn: %d ldps:%d ddps: %d mcs: %d ip: %d pdt: %d\n",
conn_stat->port_number,
@@ -4161,16 +4182,6 @@ static int drm_dp_mst_handle_up_req(struct drm_dp_mst_topology_mgr *mgr)
conn_stat->message_capability_status,
conn_stat->input_port,
conn_stat->peer_device_type);
-
- mutex_lock(&mgr->probe_lock);
- handle_csn = mst_primary->link_address_sent;
- mutex_unlock(&mgr->probe_lock);
-
- if (!handle_csn) {
- drm_dbg_kms(mgr->dev, "Got CSN before finish topology probing. Skip it.");
- kfree(up_req);
- goto out_put_primary;
- }
} else if (up_req->msg.req_type == DP_RESOURCE_STATUS_NOTIFY) {
const struct drm_dp_resource_status_notify *res_stat =
&up_req->msg.u.resource_stat;
@@ -4185,9 +4196,6 @@ static int drm_dp_mst_handle_up_req(struct drm_dp_mst_topology_mgr *mgr)
list_add_tail(&up_req->next, &mgr->up_req_list);
mutex_unlock(&mgr->up_req_lock);
queue_work(system_long_wq, &mgr->up_req_work);
-
-out_put_primary:
- drm_dp_mst_topology_put_mstb(mst_primary);
out_clear_reply:
reset_msg_rx_state(&mgr->up_req_recv);
return ret;
--
2.44.2
Dear stable team,
I noticed that ceeeb99cd821 ("dmaengine: mxs: rename custom flag") got backported, but the additional fix 269e31aecdd0 ("spi-mxs: Fix chipselect glitch") hasn't.
I think this was caused by the lack of a Cc to stable. Without the latter patch, SPI causes glitches on the MXS platform.
Please backport it to the stable trees from 5.4 to 6.6.
Thanks
Stefan
Sometimes I get a NULL pointer dereference at boot time in kobject_get()
with the following call stack:
anatop_regulator_probe()
devm_regulator_register()
regulator_register()
regulator_resolve_supply()
kobject_get()
By placing some extra BUG_ON() statements I could verify that this is
raised because probing of the 'dummy' regulator driver is not completed
('dummy_regulator_rdev' is still NULL).
In the JTAG debugger I can see that dummy_regulator_probe() and
anatop_regulator_probe() can be run by different kernel threads
(kworker/u4:*). I haven't further investigated whether this can be
changed or if there are other possibilities to force synchronization
between these two probe routines. On the other hand I don't expect much
boot time penalty by probing the 'dummy' regulator synchronously.
Cc: stable(a)vger.kernel.org
Fixes: 259b93b21a9f ("regulator: Set PROBE_PREFER_ASYNCHRONOUS for drivers that existed in 4.14")
Signed-off-by: Christian Eggers <ceggers(a)arri.de>
---
drivers/regulator/dummy.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/regulator/dummy.c b/drivers/regulator/dummy.c
index 5b9b9e4e762d..9f59889129ab 100644
--- a/drivers/regulator/dummy.c
+++ b/drivers/regulator/dummy.c
@@ -60,7 +60,7 @@ static struct platform_driver dummy_regulator_driver = {
.probe = dummy_regulator_probe,
.driver = {
.name = "reg-dummy",
- .probe_type = PROBE_PREFER_ASYNCHRONOUS,
+ .probe_type = PROBE_FORCE_SYNCHRONOUS,
},
};
--
2.43.0
Upon encountering errors in the HSIC pinctrl handling section, the
regulator should be disabled.
After the above-stated changes, it is possible to jump to the
"disable_hsic_regulator" label without having added the CPU latency QoS
request beforehand. This would result in cpu_latency_qos_remove_request()
yielding a WARNING.
So rearrange the error handling path to follow the reverse order of the
probing phases.
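A minimal sketch of the resulting structure (hypothetical setup_*/teardown_*
helpers, not the driver code): error labels unwind strictly in reverse order
of the probe steps, so a failure before the QoS request is added can never
reach cpu_latency_qos_remove_request().

	static int probe_sketch(void)
	{
		int ret;

		ret = setup_regulator();	/* e.g. HSIC pad regulator */
		if (ret)
			return ret;
		ret = setup_pinctrl();		/* e.g. hsic_idle/hsic_active */
		if (ret)
			goto undo_regulator;
		ret = setup_qos();		/* e.g. cpu_latency_qos_add_request() */
		if (ret)
			goto undo_pinctrl;
		return 0;

	undo_pinctrl:
		teardown_pinctrl();
	undo_regulator:
		teardown_regulator();
		return ret;
	}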
Found by Linux Verification Center (linuxtesting.org).
Fixes: 4d6141288c33 ("usb: chipidea: imx: pinctrl for HSIC is optional")
Cc: stable(a)vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin(a)ispras.ru>
---
drivers/usb/chipidea/ci_hdrc_imx.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/drivers/usb/chipidea/ci_hdrc_imx.c b/drivers/usb/chipidea/ci_hdrc_imx.c
index 619779eef333..3f11ae071c7f 100644
--- a/drivers/usb/chipidea/ci_hdrc_imx.c
+++ b/drivers/usb/chipidea/ci_hdrc_imx.c
@@ -407,13 +407,13 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev)
"pinctrl_hsic_idle lookup failed, err=%ld\n",
PTR_ERR(pinctrl_hsic_idle));
ret = PTR_ERR(pinctrl_hsic_idle);
- goto err_put;
+ goto disable_hsic_regulator;
}
ret = pinctrl_select_state(data->pinctrl, pinctrl_hsic_idle);
if (ret) {
dev_err(dev, "hsic_idle select failed, err=%d\n", ret);
- goto err_put;
+ goto disable_hsic_regulator;
}
data->pinctrl_hsic_active = pinctrl_lookup_state(data->pinctrl,
@@ -423,7 +423,7 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev)
"pinctrl_hsic_active lookup failed, err=%ld\n",
PTR_ERR(data->pinctrl_hsic_active));
ret = PTR_ERR(data->pinctrl_hsic_active);
- goto err_put;
+ goto disable_hsic_regulator;
}
}
@@ -432,11 +432,11 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev)
ret = imx_get_clks(dev);
if (ret)
- goto disable_hsic_regulator;
+ goto qos_remove_request;
ret = imx_prepare_enable_clks(dev);
if (ret)
- goto disable_hsic_regulator;
+ goto qos_remove_request;
ret = clk_prepare_enable(data->clk_wakeup);
if (ret)
@@ -526,12 +526,13 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev)
clk_disable_unprepare(data->clk_wakeup);
err_wakeup_clk:
imx_disable_unprepare_clks(dev);
+qos_remove_request:
+ if (pdata.flags & CI_HDRC_PMQOS)
+ cpu_latency_qos_remove_request(&data->pm_qos_req);
disable_hsic_regulator:
if (data->hsic_pad_regulator)
/* don't overwrite original ret (cf. EPROBE_DEFER) */
regulator_disable(data->hsic_pad_regulator);
- if (pdata.flags & CI_HDRC_PMQOS)
- cpu_latency_qos_remove_request(&data->pm_qos_req);
data->ci_pdev = NULL;
err_put:
if (data->usbmisc_data)
--
2.48.1
On Sun, Mar 09, 2025 at 03:45:57PM -0400, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> drm/i915: Plumb 'dsb' all way to the plane hooks
>
> to the 6.12-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> drm-i915-plumb-dsb-all-way-to-the-plane-hooks.patch
> and it can be found in the queue-6.12 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
>
>
> commit f03e7cca22f4bb50cae98840f91fcf1e6d780a54
> Author: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
> Date: Mon Sep 30 20:04:13 2024 +0300
>
> drm/i915: Plumb 'dsb' all way to the plane hooks
>
> [ Upstream commit 01389846f7d61d262cc92d42ad4d1a25730e3eff ]
It would help if you actually mentioned *why* you need to backport this?
--
Ville Syrjälä
Intel
Hello,
New build issue found on stable-rc/linux-6.6.y:
---
‘RISCV_ISA_EXT_XLINUXENVCFG’ undeclared (first use in this function);
did you mean ‘RISCV_ISA_EXT_ZIFENCEI’? in arch/riscv/kernel/suspend.o
(arch/riscv/kernel/suspend.c) [logspec:kbuild,kbuild.compiler.error]
---
- dashboard: https://d.kernelci.org/issue/maestro:f277022d07efdd2a5858eb44b3c3dab79cca84…
- giturl: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
- commit HEAD: b49d45c66a5e8cc1c82591049bfc0d04daa1e77c
Log excerpt:
=====================================================
arch/riscv/kernel/suspend.c:14:66: error: ‘RISCV_ISA_EXT_XLINUXENVCFG’
undeclared (first use in this function); did you mean
‘RISCV_ISA_EXT_ZIFENCEI’?
14 | if
(riscv_cpu_has_extension_unlikely(smp_processor_id(),
RISCV_ISA_EXT_XLINUXENVCFG))
|
^~~~~~~~~~~~~~~~~~~~~~~~~~
|
RISCV_ISA_EXT_ZIFENCEI
arch/riscv/kernel/suspend.c:14:66: note: each undeclared identifier is
reported only once for each function it appears in
CC fs/proc/cpuinfo.o
arch/riscv/kernel/suspend.c: In function ‘suspend_restore_csrs’:
arch/riscv/kernel/suspend.c:37:66: error: ‘RISCV_ISA_EXT_XLINUXENVCFG’
undeclared (first use in this function); did you mean
‘RISCV_ISA_EXT_ZIFENCEI’?
37 | if
(riscv_cpu_has_extension_unlikely(smp_processor_id(),
RISCV_ISA_EXT_XLINUXENVCFG))
|
^~~~~~~~~~~~~~~~~~~~~~~~~~
|
RISCV_ISA_EXT_ZIFENCEI
=====================================================
# Builds where the incident occurred:
## defconfig on (riscv):
- compiler: gcc-12
- dashboard: https://d.kernelci.org/build/maestro:67cf00ee18018371957ec83e
#kernelci issue maestro:f277022d07efdd2a5858eb44b3c3dab79cca847e
Reported-by: kernelci.org bot <bot(a)kernelci.org>
--
This is an experimental report format. Please send feedback in!
Talk to us at kernelci(a)lists.linux.dev
Made with love by the KernelCI team - https://kernelci.org
When an after-split folio is large and needs to be dropped due to EOF,
folio_put_refs(folio, folio_nr_pages(folio)) should be used to drop
all page cache refs. Otherwise, the folio will not be freed, causing a
memory leak.
This leak would happen on a filesystem with blocksize > page_size when a
truncate is performed, where the blocksize makes folios split to
>0 order ones, causing truncated folios not to be freed.
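As a concrete illustration (an order-2 example, not taken from the report):
such an after-split tail folio holds folio_nr_pages() = 4 page cache
references; folio_put() drops only one of them, leaving three, so the folio
can never be freed, while folio_put_refs(tail, folio_nr_pages(tail)) drops
all four.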
Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages")
Reported-by: Hugh Dickins <hughd(a)google.com>
Closes: https://lore.kernel.org/all/fcbadb7f-dd3e-21df-f9a7-2853b53183c4@google.com/
Cc: stable(a)vger.kernel.org
Signed-off-by: Zi Yan <ziy(a)nvidia.com>
---
mm/huge_memory.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 3d3ebdc002d5..373781b21e5c 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3304,7 +3304,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
folio_account_cleaned(tail,
inode_to_wb(folio->mapping->host));
__filemap_remove_folio(tail, NULL);
- folio_put(tail);
+ folio_put_refs(tail, folio_nr_pages(tail));
} else if (!folio_test_anon(folio)) {
__xa_store(&folio->mapping->i_pages, tail->index,
tail, 0);
--
2.47.2
The macb ethernet driver (Raspberry Pi 5) delivers interrupts only to
the first core, quickly saturating it at higher packet rates.
Introducing software interrupt coalescing dramatically alleviates this
limitation; the oneliner fix is upstream at
d57f7b45945ac0517ff8ea50655f00db6e8d637c.
Please backport this fix to 6.6 -stable to bring this benefit to more
Raspberry Pis; it applies cleanly on this branch.
Many thanks,
Daniel
--
Daniel J Blueman
usbmisc is an optional device property, so it is totally valid for the
corresponding data->usbmisc_data to have a NULL value.
Check that before dereferencing the pointer.
Found by Linux Verification Center (linuxtesting.org) with Svace static
analysis tool.
Fixes: 74adad500346 ("usb: chipidea: ci_hdrc_imx: decrement device's refcount in .remove() and in the error path of .probe()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin(a)ispras.ru>
---
drivers/usb/chipidea/ci_hdrc_imx.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/chipidea/ci_hdrc_imx.c b/drivers/usb/chipidea/ci_hdrc_imx.c
index 1a7fc638213e..619779eef333 100644
--- a/drivers/usb/chipidea/ci_hdrc_imx.c
+++ b/drivers/usb/chipidea/ci_hdrc_imx.c
@@ -534,7 +534,8 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev)
cpu_latency_qos_remove_request(&data->pm_qos_req);
data->ci_pdev = NULL;
err_put:
- put_device(data->usbmisc_data->dev);
+ if (data->usbmisc_data)
+ put_device(data->usbmisc_data->dev);
return ret;
}
@@ -559,7 +560,8 @@ static void ci_hdrc_imx_remove(struct platform_device *pdev)
if (data->hsic_pad_regulator)
regulator_disable(data->hsic_pad_regulator);
}
- put_device(data->usbmisc_data->dev);
+ if (data->usbmisc_data)
+ put_device(data->usbmisc_data->dev);
}
static void ci_hdrc_imx_shutdown(struct platform_device *pdev)
--
2.48.1
Hello,
New build issue found on stable-rc/linux-5.15.y:
---
implicit declaration of function ‘acpi_get_cache_info’; did you mean
‘acpi_get_system_info’? [-Werror=implicit-function-declaration] in
arch/riscv/kernel/cacheinfo.o (arch/riscv/kernel/cacheinfo.c)
[logspec:kbuild,kbuild.compiler.error]
---
- dashboard: https://d.kernelci.org/issue/maestro:c4d70565f303a7d7450fbf5add7ca4cc80a961…
- giturl: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
- commit HEAD: 2ae395ef666caf57984ff9d2ad7bca6be851f719
Log excerpt:
=====================================================
arch/riscv/kernel/cacheinfo.c:127:23: error: implicit declaration of
function ‘acpi_get_cache_info’; did you mean ‘acpi_get_system_info’?
[-Werror=implicit-function-declaration]
127 | ret = acpi_get_cache_info(cpu, &fw_levels,
&split_levels);
| ^~~~~~~~~~~~~~~~~~~
| acpi_get_system_info
cc1: some warnings being treated as errors
CC arch/riscv/kernel/patch.o
CC fs/proc/generic.o
=====================================================
# Builds where the incident occurred:
## defconfig on (riscv):
- compiler: gcc-12
- dashboard: https://d.kernelci.org/build/maestro:67ced73618018371957dfa8e
## nommu_k210_defconfig on (riscv):
- compiler: gcc-12
- dashboard: https://d.kernelci.org/build/maestro:67ced73a18018371957dfa91
#kernelci issue maestro:c4d70565f303a7d7450fbf5add7ca4cc80a96112
Reported-by: kernelci.org bot <bot(a)kernelci.org>
--
This is an experimental report format. Please send feedback in!
Talk to us at kernelci(a)lists.linux.dev
Made with love by the KernelCI team - https://kernelci.org
Hello,
New build issue found on stable-rc/linux-5.4.y:
---
implicit declaration of function ‘acpi_get_cache_info’; did you mean
‘acpi_get_system_info’? [-Werror=implicit-function-declaration] in
arch/riscv/kernel/cacheinfo.o (arch/riscv/kernel/cacheinfo.c)
[logspec:kbuild,kbuild.compiler.error]
---
- dashboard: https://d.kernelci.org/issue/maestro:0f2670909ac3275cc312c3c604f3ed03443fee…
- giturl: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
- commit HEAD: 2f9225fb6ea4ba2ad94f50f0e24bad9c353b8649
Log excerpt:
=====================================================
arch/riscv/kernel/cacheinfo.c:118:23: error: implicit declaration of
function ‘acpi_get_cache_info’; did you mean ‘acpi_get_system_info’?
[-Werror=implicit-function-declaration]
118 | ret = acpi_get_cache_info(cpu, &fw_levels,
&split_levels);
| ^~~~~~~~~~~~~~~~~~~
| acpi_get_system_info
arch/riscv/kernel/cacheinfo.c:140:13: error: implicit declaration of
function ‘of_property_present’; did you mean
‘fwnode_property_present’? [-Werror=implicit-function-declaration]
140 | if (of_property_present(np, "cache-size"))
| ^~~~~~~~~~~~~~~~~~~
| fwnode_property_present
CC arch/riscv/kernel/module-sections.o
CC arch/riscv/kernel/perf_regs.o
cc1: some warnings being treated as errors
=====================================================
# Builds where the incident occurred:
## defconfig on (riscv):
- compiler: gcc-12
- dashboard: https://d.kernelci.org/build/maestro:67ced63718018371957df9ae
#kernelci issue maestro:0f2670909ac3275cc312c3c604f3ed03443feecc
Reported-by: kernelci.org bot <bot(a)kernelci.org>
--
This is an experimental report format. Please send feedback in!
Talk to us at kernelci(a)lists.linux.dev
Made with love by the KernelCI team - https://kernelci.org
From: Stefan Eichenberger <stefan.eichenberger(a)toradex.com>
Ensure the PHY reset and perst are asserted during power-off to
guarantee it is in a reset state upon repeated power-on calls. This
resolves an issue where the PHY may not properly initialize during
subsequent power-on cycles. Power-on will deassert the reset at the
appropriate time after tuning the PHY parameters.
During suspend/resume cycles, we observed that the PHY PLL failed to
lock during resume when the CPU temperature increased from 65C to 75C.
The observed errors were:
phy phy-32f00000.pcie-phy.3: phy poweron failed --> -110
imx6q-pcie 33800000.pcie: waiting for PHY ready timeout!
imx6q-pcie 33800000.pcie: PM: dpm_run_callback(): genpd_resume_noirq+0x0/0x80 returns -110
imx6q-pcie 33800000.pcie: PM: failed to resume noirq: error -110
This resulted in a complete CPU freeze, which is resolved by ensuring
the PHY is in reset during power-on, thus preventing PHY PLL failures.
Cc: stable(a)vger.kernel.org
Fixes: 1aa97b002258 ("phy: freescale: pcie: Initialize the imx8 pcie standalone phy driver")
Reviewed-by: Frank Li <Frank.Li(a)nxp.com>
Acked-by: Richard Zhu <hongxing.zhu(a)nxp.com>
Signed-off-by: Stefan Eichenberger <stefan.eichenberger(a)toradex.com>
---
drivers/phy/freescale/phy-fsl-imx8m-pcie.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/phy/freescale/phy-fsl-imx8m-pcie.c b/drivers/phy/freescale/phy-fsl-imx8m-pcie.c
index 5b505e34ca364..7355d9921b646 100644
--- a/drivers/phy/freescale/phy-fsl-imx8m-pcie.c
+++ b/drivers/phy/freescale/phy-fsl-imx8m-pcie.c
@@ -156,6 +156,16 @@ static int imx8_pcie_phy_power_on(struct phy *phy)
return ret;
}
+static int imx8_pcie_phy_power_off(struct phy *phy)
+{
+ struct imx8_pcie_phy *imx8_phy = phy_get_drvdata(phy);
+
+ reset_control_assert(imx8_phy->reset);
+ reset_control_assert(imx8_phy->perst);
+
+ return 0;
+}
+
static int imx8_pcie_phy_init(struct phy *phy)
{
struct imx8_pcie_phy *imx8_phy = phy_get_drvdata(phy);
@@ -176,6 +186,7 @@ static const struct phy_ops imx8_pcie_phy_ops = {
.init = imx8_pcie_phy_init,
.exit = imx8_pcie_phy_exit,
.power_on = imx8_pcie_phy_power_on,
+ .power_off = imx8_pcie_phy_power_off,
.owner = THIS_MODULE,
};
--
2.45.2
This is a note to let you know that I've just added the patch titled
iio: dac: ad3552r: clear reset status flag
to the 6.1-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
iio-dac-ad3552r-clear-reset-status-flag.patch
and it can be found in the queue-6.1 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
From e17b9f20da7d2bc1f48878ab2230523b2512d965 Mon Sep 17 00:00:00 2001
From: Angelo Dureghello <adureghello(a)baylibre.com>
Date: Sat, 25 Jan 2025 17:24:32 +0100
Subject: iio: dac: ad3552r: clear reset status flag
From: Angelo Dureghello <adureghello(a)baylibre.com>
commit e17b9f20da7d2bc1f48878ab2230523b2512d965 upstream.
Clear the reset status flag to keep the error status register clean after
reset (ad3552r manual, rev B, table 38).
The reset error flag was left at 1, so when debugging registers, the "Error
Status Register" was dirty (0x01). It is important to clear this bit so
that, if a reset event occurs during normal operation, it can be detected.
Fixes: 8f2b54824b28 ("drivers:iio:dac: Add AD3552R driver support")
Signed-off-by: Angelo Dureghello <adureghello(a)baylibre.com>
Link: https://patch.msgid.link/20250125-wip-bl-ad3552r-clear-reset-v2-1-aa3a27f3f…
Cc: <Stable@vger..kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/iio/dac/ad3552r.c | 6 ++++++
1 file changed, 6 insertions(+)
--- a/drivers/iio/dac/ad3552r.c
+++ b/drivers/iio/dac/ad3552r.c
@@ -703,6 +703,12 @@ static int ad3552r_reset(struct ad3552r_
return ret;
}
+ /* Clear reset error flag, see ad3552r manual, rev B table 38. */
+ ret = ad3552r_write_reg(dac, AD3552R_REG_ADDR_ERR_STATUS,
+ AD3552R_MASK_RESET_STATUS);
+ if (ret)
+ return ret;
+
return ad3552r_update_reg_field(dac,
addr_mask_map[AD3552R_ADDR_ASCENSION][0],
addr_mask_map[AD3552R_ADDR_ASCENSION][1],
Patches currently in stable-queue which might be from adureghello(a)baylibre.com are
queue-6.1/iio-dac-ad3552r-clear-reset-status-flag.patch
This is a note to let you know that I've just added the patch titled
iio: dac: ad3552r: clear reset status flag
to the 6.12-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
iio-dac-ad3552r-clear-reset-status-flag.patch
and it can be found in the queue-6.12 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
From e17b9f20da7d2bc1f48878ab2230523b2512d965 Mon Sep 17 00:00:00 2001
From: Angelo Dureghello <adureghello(a)baylibre.com>
Date: Sat, 25 Jan 2025 17:24:32 +0100
Subject: iio: dac: ad3552r: clear reset status flag
From: Angelo Dureghello <adureghello(a)baylibre.com>
commit e17b9f20da7d2bc1f48878ab2230523b2512d965 upstream.
Clear the reset status flag to keep the error status register clean after
reset (ad3552r manual, rev B, table 38).
The reset error flag was left at 1, so when debugging registers, the "Error
Status Register" was dirty (0x01). It is important to clear this bit so
that, if a reset event occurs during normal operation, it can be detected.
Fixes: 8f2b54824b28 ("drivers:iio:dac: Add AD3552R driver support")
Signed-off-by: Angelo Dureghello <adureghello(a)baylibre.com>
Link: https://patch.msgid.link/20250125-wip-bl-ad3552r-clear-reset-v2-1-aa3a27f3f…
Cc: <Stable@vger..kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/iio/dac/ad3552r.c | 6 ++++++
1 file changed, 6 insertions(+)
--- a/drivers/iio/dac/ad3552r.c
+++ b/drivers/iio/dac/ad3552r.c
@@ -714,6 +714,12 @@ static int ad3552r_reset(struct ad3552r_
return ret;
}
+ /* Clear reset error flag, see ad3552r manual, rev B table 38. */
+ ret = ad3552r_write_reg(dac, AD3552R_REG_ADDR_ERR_STATUS,
+ AD3552R_MASK_RESET_STATUS);
+ if (ret)
+ return ret;
+
return ad3552r_update_reg_field(dac,
addr_mask_map[AD3552R_ADDR_ASCENSION][0],
addr_mask_map[AD3552R_ADDR_ASCENSION][1],
Patches currently in stable-queue which might be from adureghello(a)baylibre.com are
queue-6.12/iio-dac-ad3552r-clear-reset-status-flag.patch
This is a note to let you know that I've just added the patch titled
iio: dac: ad3552r: clear reset status flag
to the 6.13-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
iio-dac-ad3552r-clear-reset-status-flag.patch
and it can be found in the queue-6.13 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
From e17b9f20da7d2bc1f48878ab2230523b2512d965 Mon Sep 17 00:00:00 2001
From: Angelo Dureghello <adureghello(a)baylibre.com>
Date: Sat, 25 Jan 2025 17:24:32 +0100
Subject: iio: dac: ad3552r: clear reset status flag
From: Angelo Dureghello <adureghello(a)baylibre.com>
commit e17b9f20da7d2bc1f48878ab2230523b2512d965 upstream.
Clear the reset status flag to keep the error status register clean after
reset (ad3552r manual, rev B, table 38).
The reset error flag was left at 1, so when debugging registers, the "Error
Status Register" was dirty (0x01). It is important to clear this bit so
that, if a reset event occurs during normal operation, it can be detected.
Fixes: 8f2b54824b28 ("drivers:iio:dac: Add AD3552R driver support")
Signed-off-by: Angelo Dureghello <adureghello(a)baylibre.com>
Link: https://patch.msgid.link/20250125-wip-bl-ad3552r-clear-reset-v2-1-aa3a27f3f…
Cc: <Stable@vger..kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/iio/dac/ad3552r.c | 6 ++++++
1 file changed, 6 insertions(+)
--- a/drivers/iio/dac/ad3552r.c
+++ b/drivers/iio/dac/ad3552r.c
@@ -410,6 +410,12 @@ static int ad3552r_reset(struct ad3552r_
return ret;
}
+ /* Clear reset error flag, see ad3552r manual, rev B table 38. */
+ ret = ad3552r_write_reg(dac, AD3552R_REG_ADDR_ERR_STATUS,
+ AD3552R_MASK_RESET_STATUS);
+ if (ret)
+ return ret;
+
return ad3552r_update_reg_field(dac,
AD3552R_REG_ADDR_INTERFACE_CONFIG_A,
AD3552R_MASK_ADDR_ASCENSION,
Patches currently in stable-queue which might be from adureghello(a)baylibre.com are
queue-6.13/iio-dac-ad3552r-clear-reset-status-flag.patch
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 91d44c1afc61a2fec37a9c7a3485368309391e0b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031035-dangle-briskness-0e29@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 91d44c1afc61a2fec37a9c7a3485368309391e0b Mon Sep 17 00:00:00 2001
From: Qiu-ji Chen <chenqiuji666(a)gmail.com>
Date: Sat, 18 Jan 2025 15:08:33 +0800
Subject: [PATCH] cdx: Fix possible UAF error in driver_override_show()
Fixed a possible UAF problem in driver_override_show() in drivers/cdx/cdx.c
This function driver_override_show() is part of DEVICE_ATTR_RW, which
includes both driver_override_show() and driver_override_store().
These functions can be executed concurrently in sysfs.
The driver_override_store() function uses driver_set_override() to
update the driver_override value, and driver_set_override() internally
locks the device (device_lock(dev)). If driver_override_show() reads
cdx_dev->driver_override without locking, it could potentially access
a freed pointer if driver_override_store() frees the string
concurrently. This could lead to printing a kernel address, which is a
security risk since DEVICE_ATTR can be read by all users.
Additionally, a similar pattern is used in drivers/amba/bus.c, as well
as many other bus drivers, where device_lock() is taken in the show
function, and it has been working without issues.
This potential bug was detected by our experimental static analysis
tool, which analyzes locking APIs and paired functions to identify
data races and atomicity violations.
Fixes: 1f86a00c1159 ("bus/fsl-mc: add support for 'driver_override' in the mc-bus")
Cc: stable <stable(a)kernel.org>
Signed-off-by: Qiu-ji Chen <chenqiuji666(a)gmail.com>
Link: https://lore.kernel.org/r/20250118070833.27201-1-chenqiuji666@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/cdx/cdx.c b/drivers/cdx/cdx.c
index c573ed2ee71a..7811aa734053 100644
--- a/drivers/cdx/cdx.c
+++ b/drivers/cdx/cdx.c
@@ -473,8 +473,12 @@ static ssize_t driver_override_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct cdx_device *cdx_dev = to_cdx_device(dev);
+ ssize_t len;
- return sysfs_emit(buf, "%s\n", cdx_dev->driver_override);
+ device_lock(dev);
+ len = sysfs_emit(buf, "%s\n", cdx_dev->driver_override);
+ device_unlock(dev);
+ return len;
}
static DEVICE_ATTR_RW(driver_override);
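For context, the store side of the same attribute is sketched below; this is
illustrative only, modelled on the commit message rather than copied from
drivers/cdx/cdx.c, and shows the race partner: driver_set_override() frees and
replaces the override string under device_lock(), so an unlocked reader in
driver_override_show() can dereference freed memory.
static ssize_t driver_override_store(struct device *dev,
				     struct device_attribute *attr,
				     const char *buf, size_t count)
{
	struct cdx_device *cdx_dev = to_cdx_device(dev);
	int ret;
	/*
	 * driver_set_override() takes device_lock(dev) internally and may
	 * kfree() the old cdx_dev->driver_override string, which is why the
	 * show path above must read the pointer under the same lock.
	 */
	ret = driver_set_override(dev, &cdx_dev->driver_override, buf, count);
	if (ret)
		return ret;
	return count;
}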
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 91d44c1afc61a2fec37a9c7a3485368309391e0b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031035-unmoving-oak-e2a9@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 91d44c1afc61a2fec37a9c7a3485368309391e0b Mon Sep 17 00:00:00 2001
From: Qiu-ji Chen <chenqiuji666(a)gmail.com>
Date: Sat, 18 Jan 2025 15:08:33 +0800
Subject: [PATCH] cdx: Fix possible UAF error in driver_override_show()
Fixed a possible UAF problem in driver_override_show() in drivers/cdx/cdx.c
This function driver_override_show() is part of DEVICE_ATTR_RW, which
includes both driver_override_show() and driver_override_store().
These functions can be executed concurrently in sysfs.
The driver_override_store() function uses driver_set_override() to
update the driver_override value, and driver_set_override() internally
locks the device (device_lock(dev)). If driver_override_show() reads
cdx_dev->driver_override without locking, it could potentially access
a freed pointer if driver_override_store() frees the string
concurrently. This could lead to printing a kernel address, which is a
security risk since DEVICE_ATTR can be read by all users.
Additionally, a similar pattern is used in drivers/amba/bus.c, as well
as many other bus drivers, where device_lock() is taken in the show
function, and it has been working without issues.
This potential bug was detected by our experimental static analysis
tool, which analyzes locking APIs and paired functions to identify
data races and atomicity violations.
Fixes: 1f86a00c1159 ("bus/fsl-mc: add support for 'driver_override' in the mc-bus")
Cc: stable <stable(a)kernel.org>
Signed-off-by: Qiu-ji Chen <chenqiuji666(a)gmail.com>
Link: https://lore.kernel.org/r/20250118070833.27201-1-chenqiuji666@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/cdx/cdx.c b/drivers/cdx/cdx.c
index c573ed2ee71a..7811aa734053 100644
--- a/drivers/cdx/cdx.c
+++ b/drivers/cdx/cdx.c
@@ -473,8 +473,12 @@ static ssize_t driver_override_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct cdx_device *cdx_dev = to_cdx_device(dev);
+ ssize_t len;
- return sysfs_emit(buf, "%s\n", cdx_dev->driver_override);
+ device_lock(dev);
+ len = sysfs_emit(buf, "%s\n", cdx_dev->driver_override);
+ device_unlock(dev);
+ return len;
}
static DEVICE_ATTR_RW(driver_override);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 91d44c1afc61a2fec37a9c7a3485368309391e0b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031034-faction-uphold-6310@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 91d44c1afc61a2fec37a9c7a3485368309391e0b Mon Sep 17 00:00:00 2001
From: Qiu-ji Chen <chenqiuji666(a)gmail.com>
Date: Sat, 18 Jan 2025 15:08:33 +0800
Subject: [PATCH] cdx: Fix possible UAF error in driver_override_show()
Fixed a possible UAF problem in driver_override_show() in drivers/cdx/cdx.c
This function driver_override_show() is part of DEVICE_ATTR_RW, which
includes both driver_override_show() and driver_override_store().
These functions can be executed concurrently in sysfs.
The driver_override_store() function uses driver_set_override() to
update the driver_override value, and driver_set_override() internally
locks the device (device_lock(dev)). If driver_override_show() reads
cdx_dev->driver_override without locking, it could potentially access
a freed pointer if driver_override_store() frees the string
concurrently. This could lead to printing a kernel address, which is a
security risk since DEVICE_ATTR can be read by all users.
Additionally, a similar pattern is used in drivers/amba/bus.c, as well
as many other bus drivers, where device_lock() is taken in the show
function, and it has been working without issues.
This potential bug was detected by our experimental static analysis
tool, which analyzes locking APIs and paired functions to identify
data races and atomicity violations.
Fixes: 1f86a00c1159 ("bus/fsl-mc: add support for 'driver_override' in the mc-bus")
Cc: stable <stable(a)kernel.org>
Signed-off-by: Qiu-ji Chen <chenqiuji666(a)gmail.com>
Link: https://lore.kernel.org/r/20250118070833.27201-1-chenqiuji666@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/cdx/cdx.c b/drivers/cdx/cdx.c
index c573ed2ee71a..7811aa734053 100644
--- a/drivers/cdx/cdx.c
+++ b/drivers/cdx/cdx.c
@@ -473,8 +473,12 @@ static ssize_t driver_override_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct cdx_device *cdx_dev = to_cdx_device(dev);
+ ssize_t len;
- return sysfs_emit(buf, "%s\n", cdx_dev->driver_override);
+ device_lock(dev);
+ len = sysfs_emit(buf, "%s\n", cdx_dev->driver_override);
+ device_unlock(dev);
+ return len;
}
static DEVICE_ATTR_RW(driver_override);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 189ecdb3e112da703ac0699f4ec76aa78122f911
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031003-unstitch-arbitrate-baa1@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 189ecdb3e112da703ac0699f4ec76aa78122f911 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Thu, 27 Feb 2025 14:24:10 -0800
Subject: [PATCH] KVM: x86: Snapshot the host's DEBUGCTL after disabling IRQs
Snapshot the host's DEBUGCTL after disabling IRQs, as perf can toggle
debugctl bits from IRQ context, e.g. when enabling/disabling events via
smp_call_function_single(). Taking the snapshot (long) before IRQs are
disabled could result in KVM effectively clobbering DEBUGCTL due to using
a stale snapshot.
Cc: stable(a)vger.kernel.org
Reviewed-and-tested-by: Ravi Bangoria <ravi.bangoria(a)amd.com>
Link: https://lore.kernel.org/r/20250227222411.3490595-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5c6fd0edc41f..12d5f47c1bbe 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4968,7 +4968,6 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
/* Save host pkru register if supported */
vcpu->arch.host_pkru = read_pkru();
- vcpu->arch.host_debugctl = get_debugctlmsr();
/* Apply any externally detected TSC adjustments (due to suspend) */
if (unlikely(vcpu->arch.tsc_offset_adjustment)) {
@@ -10969,6 +10968,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
set_debugreg(0, 7);
}
+ vcpu->arch.host_debugctl = get_debugctlmsr();
+
guest_timing_enter_irqoff();
for (;;) {
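A condensed sketch of the ordering the change above establishes (illustrative
only, not the actual vcpu_enter_guest() body):
static void vcpu_enter_guest_sketch(struct kvm_vcpu *vcpu)
{
	local_irq_disable();
	/*
	 * Snapshot DEBUGCTL with IRQs already masked: a perf IPI can no
	 * longer rewrite the MSR between the snapshot and VM-entry.
	 */
	vcpu->arch.host_debugctl = get_debugctlmsr();
	/* ... VM-entry; DEBUGCTL is zeroed on VM-exit ... */
	/* Restore the host value captured above. */
	update_debugctlmsr(vcpu->arch.host_debugctl);
	local_irq_enable();
}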
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 189ecdb3e112da703ac0699f4ec76aa78122f911
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031002-campsite-railroad-4d13@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 189ecdb3e112da703ac0699f4ec76aa78122f911 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Thu, 27 Feb 2025 14:24:10 -0800
Subject: [PATCH] KVM: x86: Snapshot the host's DEBUGCTL after disabling IRQs
Snapshot the host's DEBUGCTL after disabling IRQs, as perf can toggle
debugctl bits from IRQ context, e.g. when enabling/disabling events via
smp_call_function_single(). Taking the snapshot (long) before IRQs are
disabled could result in KVM effectively clobbering DEBUGCTL due to using
a stale snapshot.
Cc: stable(a)vger.kernel.org
Reviewed-and-tested-by: Ravi Bangoria <ravi.bangoria(a)amd.com>
Link: https://lore.kernel.org/r/20250227222411.3490595-6-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5c6fd0edc41f..12d5f47c1bbe 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4968,7 +4968,6 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
/* Save host pkru register if supported */
vcpu->arch.host_pkru = read_pkru();
- vcpu->arch.host_debugctl = get_debugctlmsr();
/* Apply any externally detected TSC adjustments (due to suspend) */
if (unlikely(vcpu->arch.tsc_offset_adjustment)) {
@@ -10969,6 +10968,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
set_debugreg(0, 7);
}
+ vcpu->arch.host_debugctl = get_debugctlmsr();
+
guest_timing_enter_irqoff();
for (;;) {
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x fb71c795935652fa20eaf9517ca9547f5af99a76
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031034-twister-stash-ba87@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fb71c795935652fa20eaf9517ca9547f5af99a76 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Thu, 27 Feb 2025 14:24:08 -0800
Subject: [PATCH] KVM: x86: Snapshot the host's DEBUGCTL in common x86
Move KVM's snapshot of DEBUGCTL to kvm_vcpu_arch and take the snapshot in
common x86, so that SVM can also use the snapshot.
Opportunistically change the field to a u64. While bits 63:32 are reserved
on AMD, not mentioned at all in Intel's SDM, and managed as an "unsigned
long" by the kernel, DEBUGCTL is an MSR and therefore a 64-bit value.
Reviewed-by: Xiaoyao Li <xiaoyao.li(a)intel.com>
Cc: stable(a)vger.kernel.org
Reviewed-and-tested-by: Ravi Bangoria <ravi.bangoria(a)amd.com>
Link: https://lore.kernel.org/r/20250227222411.3490595-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0b7af5902ff7..32ae3aa50c7e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -780,6 +780,7 @@ struct kvm_vcpu_arch {
u32 pkru;
u32 hflags;
u64 efer;
+ u64 host_debugctl;
u64 apic_base;
struct kvm_lapic *apic; /* kernel irqchip context */
bool load_eoi_exitmap_pending;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 6c56d5235f0f..3b92f893b239 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1514,16 +1514,12 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
*/
void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
{
- struct vcpu_vmx *vmx = to_vmx(vcpu);
-
if (vcpu->scheduled_out && !kvm_pause_in_guest(vcpu->kvm))
shrink_ple_window(vcpu);
vmx_vcpu_load_vmcs(vcpu, cpu, NULL);
vmx_vcpu_pi_load(vcpu, cpu);
-
- vmx->host_debugctlmsr = get_debugctlmsr();
}
void vmx_vcpu_put(struct kvm_vcpu *vcpu)
@@ -7458,8 +7454,8 @@ fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
}
/* MSR_IA32_DEBUGCTLMSR is zeroed on vmexit. Restore it if needed */
- if (vmx->host_debugctlmsr)
- update_debugctlmsr(vmx->host_debugctlmsr);
+ if (vcpu->arch.host_debugctl)
+ update_debugctlmsr(vcpu->arch.host_debugctl);
#ifndef CONFIG_X86_64
/*
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 8b111ce1087c..951e44dc9d0e 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -340,8 +340,6 @@ struct vcpu_vmx {
/* apic deadline value in host tsc */
u64 hv_deadline_tsc;
- unsigned long host_debugctlmsr;
-
/*
* Only bits masked by msr_ia32_feature_control_valid_bits can be set in
* msr_ia32_feature_control. FEAT_CTL_LOCKED is always included
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 02159c967d29..5c6fd0edc41f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4968,6 +4968,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
/* Save host pkru register if supported */
vcpu->arch.host_pkru = read_pkru();
+ vcpu->arch.host_debugctl = get_debugctlmsr();
/* Apply any externally detected TSC adjustments (due to suspend) */
if (unlikely(vcpu->arch.tsc_offset_adjustment)) {
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x fb71c795935652fa20eaf9517ca9547f5af99a76
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031034-latitude-stinking-09c1@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fb71c795935652fa20eaf9517ca9547f5af99a76 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Thu, 27 Feb 2025 14:24:08 -0800
Subject: [PATCH] KVM: x86: Snapshot the host's DEBUGCTL in common x86
Move KVM's snapshot of DEBUGCTL to kvm_vcpu_arch and take the snapshot in
common x86, so that SVM can also use the snapshot.
Opportunistically change the field to a u64. While bits 63:32 are reserved
on AMD, not mentioned at all in Intel's SDM, and managed as an "unsigned
long" by the kernel, DEBUGCTL is an MSR and therefore a 64-bit value.
Reviewed-by: Xiaoyao Li <xiaoyao.li(a)intel.com>
Cc: stable(a)vger.kernel.org
Reviewed-and-tested-by: Ravi Bangoria <ravi.bangoria(a)amd.com>
Link: https://lore.kernel.org/r/20250227222411.3490595-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0b7af5902ff7..32ae3aa50c7e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -780,6 +780,7 @@ struct kvm_vcpu_arch {
u32 pkru;
u32 hflags;
u64 efer;
+ u64 host_debugctl;
u64 apic_base;
struct kvm_lapic *apic; /* kernel irqchip context */
bool load_eoi_exitmap_pending;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 6c56d5235f0f..3b92f893b239 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1514,16 +1514,12 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
*/
void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
{
- struct vcpu_vmx *vmx = to_vmx(vcpu);
-
if (vcpu->scheduled_out && !kvm_pause_in_guest(vcpu->kvm))
shrink_ple_window(vcpu);
vmx_vcpu_load_vmcs(vcpu, cpu, NULL);
vmx_vcpu_pi_load(vcpu, cpu);
-
- vmx->host_debugctlmsr = get_debugctlmsr();
}
void vmx_vcpu_put(struct kvm_vcpu *vcpu)
@@ -7458,8 +7454,8 @@ fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
}
/* MSR_IA32_DEBUGCTLMSR is zeroed on vmexit. Restore it if needed */
- if (vmx->host_debugctlmsr)
- update_debugctlmsr(vmx->host_debugctlmsr);
+ if (vcpu->arch.host_debugctl)
+ update_debugctlmsr(vcpu->arch.host_debugctl);
#ifndef CONFIG_X86_64
/*
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 8b111ce1087c..951e44dc9d0e 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -340,8 +340,6 @@ struct vcpu_vmx {
/* apic deadline value in host tsc */
u64 hv_deadline_tsc;
- unsigned long host_debugctlmsr;
-
/*
* Only bits masked by msr_ia32_feature_control_valid_bits can be set in
* msr_ia32_feature_control. FEAT_CTL_LOCKED is always included
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 02159c967d29..5c6fd0edc41f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4968,6 +4968,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
/* Save host pkru register if supported */
vcpu->arch.host_pkru = read_pkru();
+ vcpu->arch.host_debugctl = get_debugctlmsr();
/* Apply any externally detected TSC adjustments (due to suspend) */
if (unlikely(vcpu->arch.tsc_offset_adjustment)) {
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x be45bc4eff33d9a7dae84a2150f242a91a617402
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031025-hurry-muster-0e93@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From be45bc4eff33d9a7dae84a2150f242a91a617402 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Mon, 24 Feb 2025 08:54:41 -0800
Subject: [PATCH] KVM: SVM: Set RFLAGS.IF=1 in C code, to get VMRUN out of the
STI shadow
Enable/disable local IRQs, i.e. set/clear RFLAGS.IF, in the common
svm_vcpu_enter_exit() just after/before guest_state_{enter,exit}_irqoff()
so that VMRUN is not executed in an STI shadow. AMD CPUs have a quirk
(some would say "bug"), where the STI shadow bleeds into the guest's
intr_state field if a #VMEXIT occurs during injection of an event, i.e. if
the VMRUN doesn't complete before the subsequent #VMEXIT.
The spurious "interrupts masked" state is relatively benign, as it only
occurs during event injection and is transient. Because KVM is already
injecting an event, the guest can't be in HLT, and if KVM is querying IRQ
blocking for injection, then KVM would need to force an immediate exit
anyways since injecting multiple events is impossible.
However, because KVM copies int_state verbatim from vmcb02 to vmcb12, the
spurious STI shadow is visible to L1 when running a nested VM, which can
trip sanity checks, e.g. in VMware's VMM.
Hoist the STI+CLI all the way to C code, as the aforementioned calls to
guest_state_{enter,exit}_irqoff() already inform lockdep that IRQs are
enabled/disabled, and taking a fault on VMRUN with RFLAGS.IF=1 is already
possible. I.e. if there's kernel code that is confused by running with
RFLAGS.IF=1, then it's already a problem. In practice, since GIF=0 also
blocks NMIs, the only change in exposure to non-KVM code (relative to
surrounding VMRUN with STI+CLI) is exception handling code, and except for
the kvm_rebooting=1 case, all exceptions in the core VM-Enter/VM-Exit path
are fatal.
Use the "raw" variants to enable/disable IRQs to avoid tracing in the
"no instrumentation" code; the guest state helpers also take care of
tracing IRQ state.
Opportunistically document why KVM needs to do STI in the first place.
Reported-by: Doug Covelli <doug.covelli(a)broadcom.com>
Closes: https://lore.kernel.org/all/CADH9ctBs1YPmE4aCfGPNBwA10cA8RuAk2gO7542DjMZgs4…
Fixes: f14eec0a3203 ("KVM: SVM: move more vmentry code to assembly")
Cc: stable(a)vger.kernel.org
Reviewed-by: Jim Mattson <jmattson(a)google.com>
Link: https://lore.kernel.org/r/20250224165442.2338294-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index a713c803a3a3..0d299f3f921e 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4189,6 +4189,18 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu, bool spec_ctrl_in
guest_state_enter_irqoff();
+ /*
+ * Set RFLAGS.IF prior to VMRUN, as the host's RFLAGS.IF at the time of
+ * VMRUN controls whether or not physical IRQs are masked (KVM always
+ * runs with V_INTR_MASKING_MASK). Toggle RFLAGS.IF here to avoid the
+ * temptation to do STI+VMRUN+CLI, as AMD CPUs bleed the STI shadow
+ * into guest state if delivery of an event during VMRUN triggers a
+ * #VMEXIT, and the guest_state transitions already tell lockdep that
+ * IRQs are being enabled/disabled. Note! GIF=0 for the entirety of
+ * this path, so IRQs aren't actually unmasked while running host code.
+ */
+ raw_local_irq_enable();
+
amd_clear_divider();
if (sev_es_guest(vcpu->kvm))
@@ -4197,6 +4209,8 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu, bool spec_ctrl_in
else
__svm_vcpu_run(svm, spec_ctrl_intercepted);
+ raw_local_irq_disable();
+
guest_state_exit_irqoff();
}
diff --git a/arch/x86/kvm/svm/vmenter.S b/arch/x86/kvm/svm/vmenter.S
index 2ed80aea3bb1..0c61153b275f 100644
--- a/arch/x86/kvm/svm/vmenter.S
+++ b/arch/x86/kvm/svm/vmenter.S
@@ -170,12 +170,8 @@ SYM_FUNC_START(__svm_vcpu_run)
mov VCPU_RDI(%_ASM_DI), %_ASM_DI
/* Enter guest mode */
- sti
-
3: vmrun %_ASM_AX
4:
- cli
-
/* Pop @svm to RAX while it's the only available register. */
pop %_ASM_AX
@@ -340,12 +336,8 @@ SYM_FUNC_START(__svm_sev_es_vcpu_run)
mov KVM_VMCB_pa(%rax), %rax
/* Enter guest mode */
- sti
-
1: vmrun %rax
-
-2: cli
-
+2:
/* IMPORTANT: Stuff the RSB immediately after VM-Exit, before RET! */
FILL_RETURN_BUFFER %rax, RSB_CLEAR_LOOPS, X86_FEATURE_RSB_VMEXIT
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x be45bc4eff33d9a7dae84a2150f242a91a617402
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031024-bootleg-parkway-393c@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From be45bc4eff33d9a7dae84a2150f242a91a617402 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Mon, 24 Feb 2025 08:54:41 -0800
Subject: [PATCH] KVM: SVM: Set RFLAGS.IF=1 in C code, to get VMRUN out of the
STI shadow
Enable/disable local IRQs, i.e. set/clear RFLAGS.IF, in the common
svm_vcpu_enter_exit() just after/before guest_state_{enter,exit}_irqoff()
so that VMRUN is not executed in an STI shadow. AMD CPUs have a quirk
(some would say "bug"), where the STI shadow bleeds into the guest's
intr_state field if a #VMEXIT occurs during injection of an event, i.e. if
the VMRUN doesn't complete before the subsequent #VMEXIT.
The spurious "interrupts masked" state is relatively benign, as it only
occurs during event injection and is transient. Because KVM is already
injecting an event, the guest can't be in HLT, and if KVM is querying IRQ
blocking for injection, then KVM would need to force an immediate exit
anyways since injecting multiple events is impossible.
However, because KVM copies int_state verbatim from vmcb02 to vmcb12, the
spurious STI shadow is visible to L1 when running a nested VM, which can
trip sanity checks, e.g. in VMware's VMM.
Hoist the STI+CLI all the way to C code, as the aforementioned calls to
guest_state_{enter,exit}_irqoff() already inform lockdep that IRQs are
enabled/disabled, and taking a fault on VMRUN with RFLAGS.IF=1 is already
possible. I.e. if there's kernel code that is confused by running with
RFLAGS.IF=1, then it's already a problem. In practice, since GIF=0 also
blocks NMIs, the only change in exposure to non-KVM code (relative to
surrounding VMRUN with STI+CLI) is exception handling code, and except for
the kvm_rebooting=1 case, all exceptions in the core VM-Enter/VM-Exit path
are fatal.
Use the "raw" variants to enable/disable IRQs to avoid tracing in the
"no instrumentation" code; the guest state helpers also take care of
tracing IRQ state.
Opportunistically document why KVM needs to do STI in the first place.
Reported-by: Doug Covelli <doug.covelli(a)broadcom.com>
Closes: https://lore.kernel.org/all/CADH9ctBs1YPmE4aCfGPNBwA10cA8RuAk2gO7542DjMZgs4…
Fixes: f14eec0a3203 ("KVM: SVM: move more vmentry code to assembly")
Cc: stable(a)vger.kernel.org
Reviewed-by: Jim Mattson <jmattson(a)google.com>
Link: https://lore.kernel.org/r/20250224165442.2338294-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index a713c803a3a3..0d299f3f921e 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4189,6 +4189,18 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu, bool spec_ctrl_in
guest_state_enter_irqoff();
+ /*
+ * Set RFLAGS.IF prior to VMRUN, as the host's RFLAGS.IF at the time of
+ * VMRUN controls whether or not physical IRQs are masked (KVM always
+ * runs with V_INTR_MASKING_MASK). Toggle RFLAGS.IF here to avoid the
+ * temptation to do STI+VMRUN+CLI, as AMD CPUs bleed the STI shadow
+ * into guest state if delivery of an event during VMRUN triggers a
+ * #VMEXIT, and the guest_state transitions already tell lockdep that
+ * IRQs are being enabled/disabled. Note! GIF=0 for the entirety of
+ * this path, so IRQs aren't actually unmasked while running host code.
+ */
+ raw_local_irq_enable();
+
amd_clear_divider();
if (sev_es_guest(vcpu->kvm))
@@ -4197,6 +4209,8 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu, bool spec_ctrl_in
else
__svm_vcpu_run(svm, spec_ctrl_intercepted);
+ raw_local_irq_disable();
+
guest_state_exit_irqoff();
}
diff --git a/arch/x86/kvm/svm/vmenter.S b/arch/x86/kvm/svm/vmenter.S
index 2ed80aea3bb1..0c61153b275f 100644
--- a/arch/x86/kvm/svm/vmenter.S
+++ b/arch/x86/kvm/svm/vmenter.S
@@ -170,12 +170,8 @@ SYM_FUNC_START(__svm_vcpu_run)
mov VCPU_RDI(%_ASM_DI), %_ASM_DI
/* Enter guest mode */
- sti
-
3: vmrun %_ASM_AX
4:
- cli
-
/* Pop @svm to RAX while it's the only available register. */
pop %_ASM_AX
@@ -340,12 +336,8 @@ SYM_FUNC_START(__svm_sev_es_vcpu_run)
mov KVM_VMCB_pa(%rax), %rax
/* Enter guest mode */
- sti
-
1: vmrun %rax
-
-2: cli
-
+2:
/* IMPORTANT: Stuff the RSB immediately after VM-Exit, before RET! */
FILL_RETURN_BUFFER %rax, RSB_CLEAR_LOOPS, X86_FEATURE_RSB_VMEXIT
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x be45bc4eff33d9a7dae84a2150f242a91a617402
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031023-dodge-ungodly-172a@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From be45bc4eff33d9a7dae84a2150f242a91a617402 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Mon, 24 Feb 2025 08:54:41 -0800
Subject: [PATCH] KVM: SVM: Set RFLAGS.IF=1 in C code, to get VMRUN out of the
STI shadow
Enable/disable local IRQs, i.e. set/clear RFLAGS.IF, in the common
svm_vcpu_enter_exit() just after/before guest_state_{enter,exit}_irqoff()
so that VMRUN is not executed in an STI shadow. AMD CPUs have a quirk
(some would say "bug"), where the STI shadow bleeds into the guest's
intr_state field if a #VMEXIT occurs during injection of an event, i.e. if
the VMRUN doesn't complete before the subsequent #VMEXIT.
The spurious "interrupts masked" state is relatively benign, as it only
occurs during event injection and is transient. Because KVM is already
injecting an event, the guest can't be in HLT, and if KVM is querying IRQ
blocking for injection, then KVM would need to force an immediate exit
anyways since injecting multiple events is impossible.
However, because KVM copies int_state verbatim from vmcb02 to vmcb12, the
spurious STI shadow is visible to L1 when running a nested VM, which can
trip sanity checks, e.g. in VMware's VMM.
Hoist the STI+CLI all the way to C code, as the aforementioned calls to
guest_state_{enter,exit}_irqoff() already inform lockdep that IRQs are
enabled/disabled, and taking a fault on VMRUN with RFLAGS.IF=1 is already
possible. I.e. if there's kernel code that is confused by running with
RFLAGS.IF=1, then it's already a problem. In practice, since GIF=0 also
blocks NMIs, the only change in exposure to non-KVM code (relative to
surrounding VMRUN with STI+CLI) is exception handling code, and except for
the kvm_rebooting=1 case, all exceptions in the core VM-Enter/VM-Exit path
are fatal.
Use the "raw" variants to enable/disable IRQs to avoid tracing in the
"no instrumentation" code; the guest state helpers also take care of
tracing IRQ state.
Opportunistically document why KVM needs to do STI in the first place.
Reported-by: Doug Covelli <doug.covelli(a)broadcom.com>
Closes: https://lore.kernel.org/all/CADH9ctBs1YPmE4aCfGPNBwA10cA8RuAk2gO7542DjMZgs4…
Fixes: f14eec0a3203 ("KVM: SVM: move more vmentry code to assembly")
Cc: stable(a)vger.kernel.org
Reviewed-by: Jim Mattson <jmattson(a)google.com>
Link: https://lore.kernel.org/r/20250224165442.2338294-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index a713c803a3a3..0d299f3f921e 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4189,6 +4189,18 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu, bool spec_ctrl_in
guest_state_enter_irqoff();
+ /*
+ * Set RFLAGS.IF prior to VMRUN, as the host's RFLAGS.IF at the time of
+ * VMRUN controls whether or not physical IRQs are masked (KVM always
+ * runs with V_INTR_MASKING_MASK). Toggle RFLAGS.IF here to avoid the
+ * temptation to do STI+VMRUN+CLI, as AMD CPUs bleed the STI shadow
+ * into guest state if delivery of an event during VMRUN triggers a
+ * #VMEXIT, and the guest_state transitions already tell lockdep that
+ * IRQs are being enabled/disabled. Note! GIF=0 for the entirety of
+ * this path, so IRQs aren't actually unmasked while running host code.
+ */
+ raw_local_irq_enable();
+
amd_clear_divider();
if (sev_es_guest(vcpu->kvm))
@@ -4197,6 +4209,8 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu, bool spec_ctrl_in
else
__svm_vcpu_run(svm, spec_ctrl_intercepted);
+ raw_local_irq_disable();
+
guest_state_exit_irqoff();
}
diff --git a/arch/x86/kvm/svm/vmenter.S b/arch/x86/kvm/svm/vmenter.S
index 2ed80aea3bb1..0c61153b275f 100644
--- a/arch/x86/kvm/svm/vmenter.S
+++ b/arch/x86/kvm/svm/vmenter.S
@@ -170,12 +170,8 @@ SYM_FUNC_START(__svm_vcpu_run)
mov VCPU_RDI(%_ASM_DI), %_ASM_DI
/* Enter guest mode */
- sti
-
3: vmrun %_ASM_AX
4:
- cli
-
/* Pop @svm to RAX while it's the only available register. */
pop %_ASM_AX
@@ -340,12 +336,8 @@ SYM_FUNC_START(__svm_sev_es_vcpu_run)
mov KVM_VMCB_pa(%rax), %rax
/* Enter guest mode */
- sti
-
1: vmrun %rax
-
-2: cli
-
+2:
/* IMPORTANT: Stuff the RSB immediately after VM-Exit, before RET! */
FILL_RETURN_BUFFER %rax, RSB_CLEAR_LOOPS, X86_FEATURE_RSB_VMEXIT
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x be45bc4eff33d9a7dae84a2150f242a91a617402
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025031022-debunk-winner-e8fe@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From be45bc4eff33d9a7dae84a2150f242a91a617402 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Mon, 24 Feb 2025 08:54:41 -0800
Subject: [PATCH] KVM: SVM: Set RFLAGS.IF=1 in C code, to get VMRUN out of the
STI shadow
Enable/disable local IRQs, i.e. set/clear RFLAGS.IF, in the common
svm_vcpu_enter_exit() just after/before guest_state_{enter,exit}_irqoff()
so that VMRUN is not executed in an STI shadow. AMD CPUs have a quirk
(some would say "bug"), where the STI shadow bleeds into the guest's
intr_state field if a #VMEXIT occurs during injection of an event, i.e. if
the VMRUN doesn't complete before the subsequent #VMEXIT.
The spurious "interrupts masked" state is relatively benign, as it only
occurs during event injection and is transient. Because KVM is already
injecting an event, the guest can't be in HLT, and if KVM is querying IRQ
blocking for injection, then KVM would need to force an immediate exit
anyways since injecting multiple events is impossible.
However, because KVM copies int_state verbatim from vmcb02 to vmcb12, the
spurious STI shadow is visible to L1 when running a nested VM, which can
trip sanity checks, e.g. in VMware's VMM.
Hoist the STI+CLI all the way to C code, as the aforementioned calls to
guest_state_{enter,exit}_irqoff() already inform lockdep that IRQs are
enabled/disabled, and taking a fault on VMRUN with RFLAGS.IF=1 is already
possible. I.e. if there's kernel code that is confused by running with
RFLAGS.IF=1, then it's already a problem. In practice, since GIF=0 also
blocks NMIs, the only change in exposure to non-KVM code (relative to
surrounding VMRUN with STI+CLI) is exception handling code, and except for
the kvm_rebooting=1 case, all exceptions in the core VM-Enter/VM-Exit path
are fatal.
Use the "raw" variants to enable/disable IRQs to avoid tracing in the
"no instrumentation" code; the guest state helpers also take care of
tracing IRQ state.
Opportunistically document why KVM needs to do STI in the first place.
Reported-by: Doug Covelli <doug.covelli(a)broadcom.com>
Closes: https://lore.kernel.org/all/CADH9ctBs1YPmE4aCfGPNBwA10cA8RuAk2gO7542DjMZgs4…
Fixes: f14eec0a3203 ("KVM: SVM: move more vmentry code to assembly")
Cc: stable(a)vger.kernel.org
Reviewed-by: Jim Mattson <jmattson(a)google.com>
Link: https://lore.kernel.org/r/20250224165442.2338294-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index a713c803a3a3..0d299f3f921e 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4189,6 +4189,18 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu, bool spec_ctrl_in
guest_state_enter_irqoff();
+ /*
+ * Set RFLAGS.IF prior to VMRUN, as the host's RFLAGS.IF at the time of
+ * VMRUN controls whether or not physical IRQs are masked (KVM always
+ * runs with V_INTR_MASKING_MASK). Toggle RFLAGS.IF here to avoid the
+ * temptation to do STI+VMRUN+CLI, as AMD CPUs bleed the STI shadow
+ * into guest state if delivery of an event during VMRUN triggers a
+ * #VMEXIT, and the guest_state transitions already tell lockdep that
+ * IRQs are being enabled/disabled. Note! GIF=0 for the entirety of
+ * this path, so IRQs aren't actually unmasked while running host code.
+ */
+ raw_local_irq_enable();
+
amd_clear_divider();
if (sev_es_guest(vcpu->kvm))
@@ -4197,6 +4209,8 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu, bool spec_ctrl_in
else
__svm_vcpu_run(svm, spec_ctrl_intercepted);
+ raw_local_irq_disable();
+
guest_state_exit_irqoff();
}
diff --git a/arch/x86/kvm/svm/vmenter.S b/arch/x86/kvm/svm/vmenter.S
index 2ed80aea3bb1..0c61153b275f 100644
--- a/arch/x86/kvm/svm/vmenter.S
+++ b/arch/x86/kvm/svm/vmenter.S
@@ -170,12 +170,8 @@ SYM_FUNC_START(__svm_vcpu_run)
mov VCPU_RDI(%_ASM_DI), %_ASM_DI
/* Enter guest mode */
- sti
-
3: vmrun %_ASM_AX
4:
- cli
-
/* Pop @svm to RAX while it's the only available register. */
pop %_ASM_AX
@@ -340,12 +336,8 @@ SYM_FUNC_START(__svm_sev_es_vcpu_run)
mov KVM_VMCB_pa(%rax), %rax
/* Enter guest mode */
- sti
-
1: vmrun %rax
-
-2: cli
-
+2:
/* IMPORTANT: Stuff the RSB immediately after VM-Exit, before RET! */
FILL_RETURN_BUFFER %rax, RSB_CLEAR_LOOPS, X86_FEATURE_RSB_VMEXIT
This patch series addresses 2 issues
1) Fix typo in pattern properties for R-Car V4M.
2) Fix page entries in the AFL list.
v1->v2:
* Split the fixes patches into a separate series.
* Added Rb tag from Geert for binding patch.
* Added the tag Cc:stable@vger.kernel.org
Biju Das (2):
dt-bindings: can: renesas,rcar-canfd: Fix typo in pattern properties
for R-Car V4M
can: rcar_canfd: Fix page entries in the AFL list
.../bindings/net/can/renesas,rcar-canfd.yaml | 2 +-
drivers/net/can/rcar/rcar_canfd.c | 17 ++++++++++-------
2 files changed, 11 insertions(+), 8 deletions(-)
--
2.43.0
This small series adds support for non-coherent video capture buffers
on Rockchip ISP V1. Patch 1 fixes cache management for dmabufs
allocated by the dma-contig allocator. Patch 2 allows non-coherent
allocations on the rkisp1 capture queue. Some timing measurements are
provided in the commit message of patch 2.
Signed-off-by: Mikhail Rudenko <mike.rudenko(a)gmail.com>
---
Changes in v4:
- rebase to media/next
- use `direction` instead of `buf->dma_dir` in dma_sync_sgtable_*
- Link to v3: https://lore.kernel.org/r/20250128-b4-rkisp-noncoherent-v3-0-baf39c997d2a@g…
Changes in v3:
- ignore skip_cache_sync_* flags in vb2_dc_dmabuf_ops_{begin,end}_cpu_access
- invalidate/flush kernel mappings as appropriate if they exist
- use dma_sync_sgtable_* instead of dma_sync_sg_*
- Link to v2: https://lore.kernel.org/r/20250115-b4-rkisp-noncoherent-v2-0-0853e1a24012@g…
Changes in v2:
- Fix vb2_dc_dmabuf_ops_{begin,end}_cpu_access() for non-coherent buffers.
- Add cache management timing information to patch 2 commit message.
- Link to v1: https://lore.kernel.org/r/20250102-b4-rkisp-noncoherent-v1-1-bba164f7132c@g…
---
Mikhail Rudenko (2):
media: videobuf2: Fix dmabuf cache sync/flush in dma-contig
media: rkisp1: Allow non-coherent video capture buffers
.../media/common/videobuf2/videobuf2-dma-contig.c | 22 ++++++++++++++++++++++
.../platform/rockchip/rkisp1/rkisp1-capture.c | 1 +
2 files changed, 23 insertions(+)
---
base-commit: b2c4bf0c102084e77ed1b12090d77a76469a6814
change-id: 20241231-b4-rkisp-noncoherent-ad6e7c7a68ba
Best regards,
--
Mikhail Rudenko <mike.rudenko(a)gmail.com>
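As a rough illustration of the dma_sync_sgtable_*() calls mentioned in the v4
changelog (the function and parameter names below are placeholders, not the
actual videobuf2 or rkisp1 fields):
static void capture_buffer_sync_sketch(struct device *dev, struct sg_table *sgt)
{
	/* Invalidate CPU caches before the CPU reads a freshly captured frame. */
	dma_sync_sgtable_for_cpu(dev, sgt, DMA_FROM_DEVICE);
	/* ... CPU access to the non-coherent buffer ... */
	/* Hand the buffer back to the device for the next capture. */
	dma_sync_sgtable_for_device(dev, sgt, DMA_FROM_DEVICE);
}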
Note that this was a real fix, but the fix only matters if commit
aaec5a95d596 ("pipe_read: don't wake up the writer if the pipe is
still full") is in the tree.
Now, the bug was pre-existing, and *maybe* it could be hit without
that commit aaec5a95d596, but nobody has ever reported it, so it's
very very unlikely.
Also, this fix then had some fall-out, and while I think you've queued
all the fallout fixes too, I think it might be a good idea to wait for
more reports from the development tree before considering these for
stable.
Put another way: this fix caused some pain. It might not be worth
back-porting to stable at all, and if it is, it might be worth waiting
to see that there's no other fallout.
Linus
On Sun, 9 Mar 2025 at 09:52, Sasha Levin <sashal(a)kernel.org> wrote:
>
> This is a note to let you know that I've just added the patch titled
>
> fs/pipe: Read pipe->{head,tail} atomically outside pipe->mutex
From: Saurabh Sengar <ssengar(a)linux.microsoft.com>
On an x86 system under test with 1780 CPUs, topology_span_sane() takes
around 8 seconds cumulatively across all the iterations. It is an expensive
operation which sanity-checks the non-NUMA topology masks.
CPU topology is not something which changes very frequently, so make
this check optional for systems where the topology is trusted and a
faster bootup is needed.
Restrict this to the sched_verbose kernel cmdline option so that this
penalty can be avoided on systems that want to skip it.
Cc: stable(a)vger.kernel.org
Fixes: ccf74128d66c ("sched/topology: Assert non-NUMA topology masks don't (partially) overlap")
Signed-off-by: Saurabh Sengar <ssengar(a)linux.microsoft.com>
Co-developed-by: Naman Jain <namjain(a)linux.microsoft.com>
Signed-off-by: Naman Jain <namjain(a)linux.microsoft.com>
Tested-by: K Prateek Nayak <kprateek.nayak(a)amd.com>
---
Changes since v3:
https://lore.kernel.org/all/20250203114738.3109-1-namjain@linux.microsoft.c…
- Minor typo correction in comment
- Added Tested-by tag from Prateek for x86
Changes since v2:
https://lore.kernel.org/all/1731922777-7121-1-git-send-email-ssengar@linux.…
- Use sched_debug() instead of using sched_debug_verbose
variable directly (addressing Prateek's comment)
Changes since v1:
https://lore.kernel.org/all/1729619853-2597-1-git-send-email-ssengar@linux.…
- Use kernel cmdline param instead of compile time flag.
Adding a link to the other patch which is under review.
https://lore.kernel.org/lkml/20241031200431.182443-1-steve.wahl@hpe.com/
The above patch tries to optimize the topology sanity check, whereas this
patch makes it optional. We believe both patches can coexist, as even
with optimization, there will still be some performance overhead for
this check.
---
kernel/sched/topology.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index c49aea8c1025..666f0a18cc6c 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2359,6 +2359,13 @@ static bool topology_span_sane(struct sched_domain_topology_level *tl,
{
int i = cpu + 1;
+ /* Skip the topology sanity check for non-debug, as it is a time-consuming operation */
+ if (!sched_debug()) {
+ pr_info_once("%s: Skipping topology span sanity check. Use `sched_verbose` boot parameter to enable it.\n",
+ __func__);
+ return true;
+ }
+
/* NUMA levels are allowed to overlap */
if (tl->flags & SDTL_OVERLAP)
return true;
--
2.34.1
Backport of a similar change from commit 5ac9b4e935df ("lib/buildid:
Handle memfd_secret() files in build_id_parse()") to address an issue
where accessing secret memfd contents through build_id_parse() would
trigger faults.
Original report and repro can be found in [0].
[0] https://lore.kernel.org/bpf/ZwyG8Uro%2FSyTXAni@ly-workstation/
This repro will cause BUG: unable to handle kernel paging request in
build_id_parse in 5.15/6.1/6.6.
Some other discussions can be found in [1].
[1] https://lore.kernel.org/bpf/20241104175256.2327164-1-jolsa@kernel.org/T/#u
Cc: stable(a)vger.kernel.org
Fixes: 88a16a130933 ("perf: Add build id data in mmap2 event")
Signed-off-by: Chen Linxuan <chenlinxuan(a)deepin.org>
---
lib/buildid.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/lib/buildid.c b/lib/buildid.c
index 9fc46366597e..b78d119ed1f7 100644
--- a/lib/buildid.c
+++ b/lib/buildid.c
@@ -157,6 +157,12 @@ int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id,
if (!vma->vm_file)
return -EINVAL;
+#ifdef CONFIG_SECRETMEM
+ /* reject secretmem folios created with memfd_secret() */
+ if (vma->vm_file->f_mapping->a_ops == &secretmem_aops)
+ return -EFAULT;
+#endif
+
page = find_get_page(vma->vm_file->f_mapping, 0);
if (!page)
return -EFAULT; /* page not mapped */
--
2.48.1
From: David Hildenbrand <david(a)redhat.com>
commit 091c1dd2d4df6edd1beebe0e5863d4034ade9572 upstream.
We currently assume that there is at least one VMA in a MM, which isn't
true.
So we might end up having find_vma() return NULL, to then de-reference
NULL. So properly handle find_vma() returning NULL.
This fixes the report:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 1 UID: 0 PID: 6021 Comm: syz-executor284 Not tainted 6.12.0-rc7-syzkaller-00187-gf868cd251776 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/30/2024
RIP: 0010:migrate_to_node mm/mempolicy.c:1090 [inline]
RIP: 0010:do_migrate_pages+0x403/0x6f0 mm/mempolicy.c:1194
Code: ...
RSP: 0018:ffffc9000375fd08 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffc9000375fd78 RCX: 0000000000000000
RDX: ffff88807e171300 RSI: dffffc0000000000 RDI: ffff88803390c044
RBP: ffff88807e171428 R08: 0000000000000014 R09: fffffbfff2039ef1
R10: ffffffff901cf78f R11: 0000000000000000 R12: 0000000000000003
R13: ffffc9000375fe90 R14: ffffc9000375fe98 R15: ffffc9000375fdf8
FS: 00005555919e1380(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005555919e1ca8 CR3: 000000007f12a000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
kernel_migrate_pages+0x5b2/0x750 mm/mempolicy.c:1709
__do_sys_migrate_pages mm/mempolicy.c:1727 [inline]
__se_sys_migrate_pages mm/mempolicy.c:1723 [inline]
__x64_sys_migrate_pages+0x96/0x100 mm/mempolicy.c:1723
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
[akpm(a)linux-foundation.org: add unlikely()]
Link: https://lkml.kernel.org/r/20241120201151.9518-1-david@redhat.com
Fixes: 39743889aaf7 ("[PATCH] Swap Migration V5: sys_migrate_pages interface")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reported-by: syzbot+3511625422f7aa637f0d(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/lkml/673d2696.050a0220.3c9d61.012f.GAE@google.com/T/
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Reviewed-by: Christoph Lameter <cl(a)linux.com>
Cc: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
[ Alexey: mmap_read_lock is not used in this context, so mmap_read_unlock
is removed. Synchronization is provided by an external context in
do_migrate_pages(). ]
Signed-off-by: Alexey Panov <apanov(a)astralinux.ru>
---
v2: Clarify mmap_lock context in changes summary. Fix braces for a single
statement block. Rearrange the changes with a comment and VM_BUG_ON to
look more consistent with upstream.
mm/mempolicy.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 399d8cb48813..f60ff4727f46 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1062,13 +1062,17 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest,
nodes_clear(nmask);
node_set(source, nmask);
+ VM_BUG_ON(!(flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)));
+
+ vma = find_vma(mm, 0);
+ if (unlikely(!vma))
+ return 0;
+
/*
* This does not "check" the range but isolates all pages that
* need migration. Between passing in the full user address
* space range and MPOL_MF_DISCONTIG_OK, this call can not fail.
*/
- vma = find_vma(mm, 0);
- VM_BUG_ON(!(flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)));
queue_pages_range(mm, vma->vm_start, mm->task_size, &nmask,
flags | MPOL_MF_DISCONTIG_OK, &pagelist);
--
2.30.2
From: David Hildenbrand <david(a)redhat.com>
commit 091c1dd2d4df6edd1beebe0e5863d4034ade9572 upstream.
We currently assume that there is at least one VMA in a MM, which isn't
true.
So we might end up having find_vma() return NULL, to then de-reference
NULL. So properly handle find_vma() returning NULL.
This fixes the report:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 1 UID: 0 PID: 6021 Comm: syz-executor284 Not tainted 6.12.0-rc7-syzkaller-00187-gf868cd251776 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/30/2024
RIP: 0010:migrate_to_node mm/mempolicy.c:1090 [inline]
RIP: 0010:do_migrate_pages+0x403/0x6f0 mm/mempolicy.c:1194
Code: ...
RSP: 0018:ffffc9000375fd08 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffc9000375fd78 RCX: 0000000000000000
RDX: ffff88807e171300 RSI: dffffc0000000000 RDI: ffff88803390c044
RBP: ffff88807e171428 R08: 0000000000000014 R09: fffffbfff2039ef1
R10: ffffffff901cf78f R11: 0000000000000000 R12: 0000000000000003
R13: ffffc9000375fe90 R14: ffffc9000375fe98 R15: ffffc9000375fdf8
FS: 00005555919e1380(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005555919e1ca8 CR3: 000000007f12a000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
kernel_migrate_pages+0x5b2/0x750 mm/mempolicy.c:1709
__do_sys_migrate_pages mm/mempolicy.c:1727 [inline]
__se_sys_migrate_pages mm/mempolicy.c:1723 [inline]
__x64_sys_migrate_pages+0x96/0x100 mm/mempolicy.c:1723
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
[akpm(a)linux-foundation.org: add unlikely()]
Link: https://lkml.kernel.org/r/20241120201151.9518-1-david@redhat.com
Fixes: 39743889aaf7 ("[PATCH] Swap Migration V5: sys_migrate_pages interface")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reported-by: syzbot+3511625422f7aa637f0d(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/lkml/673d2696.050a0220.3c9d61.012f.GAE@google.com/T/
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Reviewed-by: Christoph Lameter <cl(a)linux.com>
Cc: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
[ Alexey: mmap_read_lock is not used in this context, so mmap_read_unlock
is removed. Synchronization is provided by an external context in
do_migrate_pages(). find_vma(mm, 0) is the same as mm->mmap. ]
Signed-off-by: Alexey Panov <apanov(a)astralinux.ru>
---
v2: Clarify mmap_lock context in changes summary. Fix braces for a single
statement block. Rearrange the changes with a comment and VM_BUG_ON to
look more consistent with upstream.
mm/mempolicy.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 6c98585f20df..db94aec0ea17 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1067,6 +1067,7 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest,
int flags)
{
nodemask_t nmask;
+ struct vm_area_struct *vma;
LIST_HEAD(pagelist);
int err = 0;
struct migration_target_control mtc = {
@@ -1077,13 +1078,18 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest,
nodes_clear(nmask);
node_set(source, nmask);
+ VM_BUG_ON(!(flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)));
+
+ vma = find_vma(mm, 0);
+ if (unlikely(!vma))
+ return 0;
+
/*
* This does not "check" the range but isolates all pages that
* need migration. Between passing in the full user address
* space range and MPOL_MF_DISCONTIG_OK, this call can not fail.
*/
- VM_BUG_ON(!(flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)));
- queue_pages_range(mm, mm->mmap->vm_start, mm->task_size, &nmask,
+ queue_pages_range(mm, vma->vm_start, mm->task_size, &nmask,
flags | MPOL_MF_DISCONTIG_OK, &pagelist);
if (!list_empty(&pagelist)) {
--
2.30.2
This patch series backports the minimum set of changes in order to fix
this warning that popped up with >= 5.4.284 stable kernels:
In file included from ./include/linux/mm.h:29,
from ./include/linux/pagemap.h:8,
from ./include/linux/buffer_head.h:14,
from fs/udf/udfdecl.h:12,
from fs/udf/super.c:41:
fs/udf/super.c: In function 'udf_fill_partdesc_info':
./include/linux/overflow.h:70:15: warning: comparison of distinct pointer types lacks a cast
(void) (&__a == &__b); \
^~
fs/udf/super.c:1162:7: note: in expansion of macro 'check_add_overflow'
if (check_add_overflow(map->s_partition_len,
^~~~~~~~~~~~~~~~~~
Changes in v2:
- added missing upstream commit ID to the last patch in the series
Kees Cook (2):
overflow: Add __must_check attribute to check_*() helpers
overflow: Allow mixed type arguments
Keith Busch (1):
overflow: Correct check_shl_overflow() comment
include/linux/overflow.h | 101 +++++++++++++++++++++++----------------
1 file changed, 60 insertions(+), 41 deletions(-)
--
2.34.1
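A minimal illustration of the mixed-type case the series makes warning-free;
the types mirror the udf example above, and the helper name is hypothetical:
#include <linux/overflow.h>
/* With the backported check_add_overflow(), mixing a u32 addend with a
 * sector_t sum compiles without the "distinct pointer types" warning.
 * Returns true if the addition overflows the destination type. */
static bool example_extent_end(u32 partition_len, sector_t start, sector_t *end)
{
	return check_add_overflow(start, partition_len, end);
}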
No upstream commit exists for this patch.
The issue was introduced when backporting upstream commit 091c1dd2d4df
("mm/mempolicy: fix migrate_to_node() assuming there is at least one VMA
in a MM").
The backport incorrectly added unlock logic to a path where the
mmap_lock is provided by the external context in do_migrate_pages(),
creating a lock imbalance when no VMAs are found.
This fixes the report:
WARNING: bad unlock balance detected!
6.6.79 #1 Not tainted
-------------------------------------
repro/9655 is trying to release lock (&mm->mmap_lock) at:
[<ffffffff81daa36f>] mmap_read_unlock include/linux/mmap_lock.h:173 [inline]
[<ffffffff81daa36f>] do_migrate_pages+0x59f/0x700 mm/mempolicy.c:1196
but there are no more locks to release!
other info that might help us debug this:
no locks held by repro/9655.
stack backtrace:
CPU: 1 PID: 9655 Comm: a Not tainted 6.6.79 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd5/0x1b0 lib/dump_stack.c:106
__lock_release kernel/locking/lockdep.c:5431 [inline]
lock_release+0x4b1/0x680 kernel/locking/lockdep.c:5774
up_read+0x12/0x20 kernel/locking/rwsem.c:1615
mmap_read_unlock include/linux/mmap_lock.h:173 [inline]
do_migrate_pages+0x59f/0x700 mm/mempolicy.c:1196
kernel_migrate_pages+0x59b/0x780 mm/mempolicy.c:1665
__do_sys_migrate_pages mm/mempolicy.c:1684 [inline]
__se_sys_migrate_pages mm/mempolicy.c:1680 [inline]
__x64_sys_migrate_pages+0x92/0xf0 mm/mempolicy.c:1680
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x34/0xb0 arch/x86/entry/common.c:81
entry_SYSCALL_64_after_hwframe+0x68/0xd2
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: a13b2b9b0b0b ("mm/mempolicy: fix migrate_to_node() assuming there is at least one VMA in a MM")
Signed-off-by: Alexey Panov <apanov(a)astralinux.ru>
---
v2: Clarify mmap_lock context in commit description. Fix braces for a
single statement block. Add empty line after VM_BUG_ON to look more
consistent with upstream.
mm/mempolicy.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 94c74c594d10..d2855507d2e9 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1070,11 +1070,10 @@ static long migrate_to_node(struct mm_struct *mm, int source, int dest,
node_set(source, nmask);
VM_BUG_ON(!(flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)));
+
vma = find_vma(mm, 0);
- if (unlikely(!vma)) {
- mmap_read_unlock(mm);
+ if (unlikely(!vma))
return 0;
- }
/*
* This does not migrate the range, but isolates all pages that
--
2.30.2
The internal microphone on the Lenovo ThinkPad E16 model requires a
quirk entry to work properly. This was fixed in a previous patch (linked
below), but depending on the specific variant of the model, the product
name may be "21M5" or "21M6".
The following patch fixed this issue for the 21M5 variant:
https://lore.kernel.org/all/20240725065442.9293-1-tiwai@suse.de/
This patch adds support for the microphone on the 21M6 variant.
Link: https://github.com/ramaureirac/thinkpad-e14-linux/issues/31
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Thomas Mizrahi <thomasmizra(a)gmail.com>
---
I recently acquired a ThinkPad E16 Gen 2 AMD and could not get the internal
microphone working. After some research, I discovered this issue. Since my
machine is a 21M6 variant, the required quirk was not applied by the
existing patch. After applying this patch and testing on my machine, the
microphone was immediately recognized and worked without further issues.
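As background, a hedged sketch of how a DMI quirk table like this is
typically consumed at probe time (the helper name below is made up; this is
not a quote of acp6x-mach.c): the first entry whose matches[] all hit
supplies its driver_data.
/* Illustrative consumer of yc_acp_quirk_table; error handling trimmed. */
#include <linux/dmi.h>

static struct snd_soc_card *yc_pick_card(void)
{
	const struct dmi_system_id *dmi;

	dmi = dmi_first_match(yc_acp_quirk_table);
	if (!dmi)
		return NULL;		/* no quirk entry for this machine */

	return dmi->driver_data;	/* e.g. &acp6x_card once "21M6" matches */
}
Without the "21M6" entry added here, the table lookup finds nothing on this
variant and no card data is selected, which is consistent with the
missing-microphone symptom.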
sound/soc/amd/yc/acp6x-mach.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/sound/soc/amd/yc/acp6x-mach.c b/sound/soc/amd/yc/acp6x-mach.c
index b16587d8f97a..a7637056972a 100644
--- a/sound/soc/amd/yc/acp6x-mach.c
+++ b/sound/soc/amd/yc/acp6x-mach.c
@@ -248,6 +248,13 @@ static const struct dmi_system_id yc_acp_quirk_table[] = {
DMI_MATCH(DMI_PRODUCT_NAME, "21M5"),
}
},
+ {
+ .driver_data = &acp6x_card,
+ .matches = {
+ DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "21M6"),
+ }
+ },
{
.driver_data = &acp6x_card,
.matches = {
--
2.48.1
On Thu, Mar 06, 2025 at 09:06:23PM +0000, Colin Evans wrote:
> > Please try collecting a usbmon trace for bus 2 showing the problem.
> > Ideally the trace should show what happens from system boot-up, but
> > there's no way to do that. Instead, you can do this (the first command
> > below disables the bus, the second starts the usbmon trace, and the
> > third re-enables the bus):
> >
> > echo 0 >/sys/bus/usb/devices/usb2/bConfigurationValue
> > cat /sys/kernel/debug/usb/usbmon/2u >usbmon.txt &
> > echo 1 >/sys/bus/usb/devices/usb2/bConfigurationValue
> >
> > Then after enough time has passed for the errors to show up, kill the
> > "cat" process and post the resulting trace file. (Note: If your
> > keyboard is attached to bus 2, you won't be able to use it to issue the
> > second and third commands. You could use a network login, or put the
> > commands into a shell file and run them that way.)
> >
> > In fact, you should do this twice: The second time, run it on machine 2
> > with the powered hub plugged in to suppress the errors.
> >
> > Alan Stern
>
> Happy to try this, but as it stands there is no such file, or file-like
> thing, on my machine-
>
> # ls /sys/kernel/debug/usb/usbmon/2u
> ls: cannot access '/sys/kernel/debug/usb/usbmon/2u': No such file or
> directory
>
> # find /sys/kernel/debug/usb -name "2u"
> #
>
> # ls /sys/kernel/debug/usb
> devices ehci ohci uhci uvcvideo xhci
>
>
> It seems something is missing?
Ah -- you have to load the usbmon module first:
modprobe usbmon
Some distributions do this for you automatically.
Alan Stern
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x c133ec0e5717868c9967fa3df92a55e537b1aead
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030900-slaw-onstage-6b47@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c133ec0e5717868c9967fa3df92a55e537b1aead Mon Sep 17 00:00:00 2001
From: Michal Pecio <michal.pecio(a)gmail.com>
Date: Tue, 25 Feb 2025 11:59:27 +0200
Subject: [PATCH] usb: xhci: Enable the TRB overfetch quirk on VIA VL805
Raspberry Pi is a major user of those chips and they discovered a bug -
when the end of a transfer ring segment is reached, up to four TRBs can
be prefetched from the next page even if the segment ends with link TRB
and on page boundary (the chip claims to support standard 4KB pages).
It also appears that if the prefetched TRBs belong to a different ring
whose doorbell is later rung, they may be used without refreshing from
system RAM and the endpoint will stay idle if their cycle bit is stale.
Other users complain about IOMMU faults on x86 systems, unsurprisingly.
Deal with it by using existing quirk which allocates a dummy page after
each transfer ring segment. This was seen to resolve both problems. RPi
came up with a more efficient solution, shortening each segment by four
TRBs, but it complicated the driver and they ditched it for this quirk.
Also rename the quirk and add VL805 device ID macro.
Signed-off-by: Michal Pecio <michal.pecio(a)gmail.com>
Link: https://github.com/raspberrypi/linux/issues/4685
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=215906
CC: stable(a)vger.kernel.org
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
Link: https://lore.kernel.org/r/20250225095927.2512358-2-mathias.nyman@linux.inte…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
index 92703efda1f7..fdf0c1008225 100644
--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -2437,7 +2437,8 @@ int xhci_mem_init(struct xhci_hcd *xhci, gfp_t flags)
* and our use of dma addresses in the trb_address_map radix tree needs
* TRB_SEGMENT_SIZE alignment, so we pick the greater alignment need.
*/
- if (xhci->quirks & XHCI_ZHAOXIN_TRB_FETCH)
+ if (xhci->quirks & XHCI_TRB_OVERFETCH)
+ /* Buggy HC prefetches beyond segment bounds - allocate dummy space at the end */
xhci->segment_pool = dma_pool_create("xHCI ring segments", dev,
TRB_SEGMENT_SIZE * 2, TRB_SEGMENT_SIZE * 2, xhci->page_size * 2);
else
diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
index ad0ff356f6fa..54460d11f7ee 100644
--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -38,6 +38,8 @@
#define PCI_DEVICE_ID_ETRON_EJ168 0x7023
#define PCI_DEVICE_ID_ETRON_EJ188 0x7052
+#define PCI_DEVICE_ID_VIA_VL805 0x3483
+
#define PCI_DEVICE_ID_INTEL_LYNXPOINT_XHCI 0x8c31
#define PCI_DEVICE_ID_INTEL_LYNXPOINT_LP_XHCI 0x9c31
#define PCI_DEVICE_ID_INTEL_WILDCATPOINT_LP_XHCI 0x9cb1
@@ -418,8 +420,10 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
pdev->device == 0x3432)
xhci->quirks |= XHCI_BROKEN_STREAMS;
- if (pdev->vendor == PCI_VENDOR_ID_VIA && pdev->device == 0x3483)
+ if (pdev->vendor == PCI_VENDOR_ID_VIA && pdev->device == PCI_DEVICE_ID_VIA_VL805) {
xhci->quirks |= XHCI_LPM_SUPPORT;
+ xhci->quirks |= XHCI_TRB_OVERFETCH;
+ }
if (pdev->vendor == PCI_VENDOR_ID_ASMEDIA &&
pdev->device == PCI_DEVICE_ID_ASMEDIA_1042_XHCI) {
@@ -467,11 +471,11 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
if (pdev->device == 0x9202) {
xhci->quirks |= XHCI_RESET_ON_RESUME;
- xhci->quirks |= XHCI_ZHAOXIN_TRB_FETCH;
+ xhci->quirks |= XHCI_TRB_OVERFETCH;
}
if (pdev->device == 0x9203)
- xhci->quirks |= XHCI_ZHAOXIN_TRB_FETCH;
+ xhci->quirks |= XHCI_TRB_OVERFETCH;
}
if (pdev->vendor == PCI_VENDOR_ID_CDNS &&
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 8c164340a2c3..779b01dee068 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1632,7 +1632,7 @@ struct xhci_hcd {
#define XHCI_EP_CTX_BROKEN_DCS BIT_ULL(42)
#define XHCI_SUSPEND_RESUME_CLKS BIT_ULL(43)
#define XHCI_RESET_TO_DEFAULT BIT_ULL(44)
-#define XHCI_ZHAOXIN_TRB_FETCH BIT_ULL(45)
+#define XHCI_TRB_OVERFETCH BIT_ULL(45)
#define XHCI_ZHAOXIN_HOST BIT_ULL(46)
#define XHCI_WRITE_64_HI_LO BIT_ULL(47)
#define XHCI_CDNS_SCTX_QUIRK BIT_ULL(48)
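As a side note on the commit quoted above, a rough sketch of the numbers
(assumed values: the standard 16-byte TRB and one 4 KiB page per ring
segment) shows why doubling the dma_pool element size gives enough headroom:
/* Illustrative arithmetic only -- not driver code. */
#define TRB_BYTES		16
#define SEGMENT_BYTES		4096				/* one ring segment per page */
#define TRBS_PER_SEGMENT	(SEGMENT_BYTES / TRB_BYTES)	/* 256, the last one a link TRB */
#define OVERFETCH_TRBS		4
#define OVERFETCH_BYTES		(OVERFETCH_TRBS * TRB_BYTES)	/* up to 64 bytes past the page */
Allocating 2 * TRB_SEGMENT_SIZE per segment, as the quirk does, keeps those
stray 64 bytes inside memory the driver owns rather than in the next ring or
an unmapped IOMMU page.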
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x cc5bfc4e16fc1d1c520cd7bb28646e82b6e69217
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030953-washboard-overcrowd-fed5@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From cc5bfc4e16fc1d1c520cd7bb28646e82b6e69217 Mon Sep 17 00:00:00 2001
From: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Date: Thu, 30 Jan 2025 23:49:31 +0000
Subject: [PATCH] usb: dwc3: Set SUSPENDENABLE soon after phy init
After phy initialization, some phy operations can only be executed while
in lower P states. Ensure GUSB3PIPECTL.SUSPENDENABLE and
GUSB2PHYCFG.SUSPHY are set soon after initialization to avoid blocking
phy ops.
Previously the SUSPENDENABLE bits are only set after the controller
initialization, which may not happen right away if there's no gadget
driver or xhci driver bound. Revise this to clear SUSPENDENABLE bits
only when there's mode switching (change in GCTL.PRTCAPDIR).
Fixes: 6d735722063a ("usb: dwc3: core: Prevent phy suspend during init")
Cc: stable <stable(a)kernel.org>
Signed-off-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Link: https://lore.kernel.org/r/633aef0afee7d56d2316f7cc3e1b2a6d518a8cc9.17382809…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index 2c472cb97f6c..66a08b527165 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -131,11 +131,24 @@ void dwc3_enable_susphy(struct dwc3 *dwc, bool enable)
}
}
-void dwc3_set_prtcap(struct dwc3 *dwc, u32 mode)
+void dwc3_set_prtcap(struct dwc3 *dwc, u32 mode, bool ignore_susphy)
{
+ unsigned int hw_mode;
u32 reg;
reg = dwc3_readl(dwc->regs, DWC3_GCTL);
+
+ /*
+ * For DRD controllers, GUSB3PIPECTL.SUSPENDENABLE and
+ * GUSB2PHYCFG.SUSPHY should be cleared during mode switching,
+ * and they can be set after core initialization.
+ */
+ hw_mode = DWC3_GHWPARAMS0_MODE(dwc->hwparams.hwparams0);
+ if (hw_mode == DWC3_GHWPARAMS0_MODE_DRD && !ignore_susphy) {
+ if (DWC3_GCTL_PRTCAP(reg) != mode)
+ dwc3_enable_susphy(dwc, false);
+ }
+
reg &= ~(DWC3_GCTL_PRTCAPDIR(DWC3_GCTL_PRTCAP_OTG));
reg |= DWC3_GCTL_PRTCAPDIR(mode);
dwc3_writel(dwc->regs, DWC3_GCTL, reg);
@@ -216,7 +229,7 @@ static void __dwc3_set_mode(struct work_struct *work)
spin_lock_irqsave(&dwc->lock, flags);
- dwc3_set_prtcap(dwc, desired_dr_role);
+ dwc3_set_prtcap(dwc, desired_dr_role, false);
spin_unlock_irqrestore(&dwc->lock, flags);
@@ -658,16 +671,7 @@ static int dwc3_ss_phy_setup(struct dwc3 *dwc, int index)
*/
reg &= ~DWC3_GUSB3PIPECTL_UX_EXIT_PX;
- /*
- * Above DWC_usb3.0 1.94a, it is recommended to set
- * DWC3_GUSB3PIPECTL_SUSPHY to '0' during coreConsultant configuration.
- * So default value will be '0' when the core is reset. Application
- * needs to set it to '1' after the core initialization is completed.
- *
- * Similarly for DRD controllers, GUSB3PIPECTL.SUSPENDENABLE must be
- * cleared after power-on reset, and it can be set after core
- * initialization.
- */
+ /* Ensure the GUSB3PIPECTL.SUSPENDENABLE is cleared prior to phy init. */
reg &= ~DWC3_GUSB3PIPECTL_SUSPHY;
if (dwc->u2ss_inp3_quirk)
@@ -747,15 +751,7 @@ static int dwc3_hs_phy_setup(struct dwc3 *dwc, int index)
break;
}
- /*
- * Above DWC_usb3.0 1.94a, it is recommended to set
- * DWC3_GUSB2PHYCFG_SUSPHY to '0' during coreConsultant configuration.
- * So default value will be '0' when the core is reset. Application
- * needs to set it to '1' after the core initialization is completed.
- *
- * Similarly for DRD controllers, GUSB2PHYCFG.SUSPHY must be cleared
- * after power-on reset, and it can be set after core initialization.
- */
+ /* Ensure the GUSB2PHYCFG.SUSPHY is cleared prior to phy init. */
reg &= ~DWC3_GUSB2PHYCFG_SUSPHY;
if (dwc->dis_enblslpm_quirk)
@@ -830,6 +826,25 @@ static int dwc3_phy_init(struct dwc3 *dwc)
goto err_exit_usb3_phy;
}
+ /*
+ * Above DWC_usb3.0 1.94a, it is recommended to set
+ * DWC3_GUSB3PIPECTL_SUSPHY and DWC3_GUSB2PHYCFG_SUSPHY to '0' during
+ * coreConsultant configuration. So default value will be '0' when the
+ * core is reset. Application needs to set it to '1' after the core
+ * initialization is completed.
+ *
+ * Certain phy requires to be in P0 power state during initialization.
+ * Make sure GUSB3PIPECTL.SUSPENDENABLE and GUSB2PHYCFG.SUSPHY are clear
+ * prior to phy init to maintain in the P0 state.
+ *
+ * After phy initialization, some phy operations can only be executed
+ * while in lower P states. Ensure GUSB3PIPECTL.SUSPENDENABLE and
+ * GUSB2PHYCFG.SUSPHY are set soon after initialization to avoid
+ * blocking phy ops.
+ */
+ if (!DWC3_VER_IS_WITHIN(DWC3, ANY, 194A))
+ dwc3_enable_susphy(dwc, true);
+
return 0;
err_exit_usb3_phy:
@@ -1588,7 +1603,7 @@ static int dwc3_core_init_mode(struct dwc3 *dwc)
switch (dwc->dr_mode) {
case USB_DR_MODE_PERIPHERAL:
- dwc3_set_prtcap(dwc, DWC3_GCTL_PRTCAP_DEVICE);
+ dwc3_set_prtcap(dwc, DWC3_GCTL_PRTCAP_DEVICE, false);
if (dwc->usb2_phy)
otg_set_vbus(dwc->usb2_phy->otg, false);
@@ -1600,7 +1615,7 @@ static int dwc3_core_init_mode(struct dwc3 *dwc)
return dev_err_probe(dev, ret, "failed to initialize gadget\n");
break;
case USB_DR_MODE_HOST:
- dwc3_set_prtcap(dwc, DWC3_GCTL_PRTCAP_HOST);
+ dwc3_set_prtcap(dwc, DWC3_GCTL_PRTCAP_HOST, false);
if (dwc->usb2_phy)
otg_set_vbus(dwc->usb2_phy->otg, true);
@@ -1645,7 +1660,7 @@ static void dwc3_core_exit_mode(struct dwc3 *dwc)
}
/* de-assert DRVVBUS for HOST and OTG mode */
- dwc3_set_prtcap(dwc, DWC3_GCTL_PRTCAP_DEVICE);
+ dwc3_set_prtcap(dwc, DWC3_GCTL_PRTCAP_DEVICE, true);
}
static void dwc3_get_software_properties(struct dwc3 *dwc)
@@ -2453,7 +2468,7 @@ static int dwc3_resume_common(struct dwc3 *dwc, pm_message_t msg)
if (ret)
return ret;
- dwc3_set_prtcap(dwc, DWC3_GCTL_PRTCAP_DEVICE);
+ dwc3_set_prtcap(dwc, DWC3_GCTL_PRTCAP_DEVICE, true);
dwc3_gadget_resume(dwc);
break;
case DWC3_GCTL_PRTCAP_HOST:
@@ -2461,7 +2476,7 @@ static int dwc3_resume_common(struct dwc3 *dwc, pm_message_t msg)
ret = dwc3_core_init_for_resume(dwc);
if (ret)
return ret;
- dwc3_set_prtcap(dwc, DWC3_GCTL_PRTCAP_HOST);
+ dwc3_set_prtcap(dwc, DWC3_GCTL_PRTCAP_HOST, true);
break;
}
/* Restore GUSB2PHYCFG bits that were modified in suspend */
@@ -2490,7 +2505,7 @@ static int dwc3_resume_common(struct dwc3 *dwc, pm_message_t msg)
if (ret)
return ret;
- dwc3_set_prtcap(dwc, dwc->current_dr_role);
+ dwc3_set_prtcap(dwc, dwc->current_dr_role, true);
dwc3_otg_init(dwc);
if (dwc->current_otg_role == DWC3_OTG_ROLE_HOST) {
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index c955039bb4f6..aaa39e663f60 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -1558,7 +1558,7 @@ struct dwc3_gadget_ep_cmd_params {
#define DWC3_HAS_OTG BIT(3)
/* prototypes */
-void dwc3_set_prtcap(struct dwc3 *dwc, u32 mode);
+void dwc3_set_prtcap(struct dwc3 *dwc, u32 mode, bool ignore_susphy);
void dwc3_set_mode(struct dwc3 *dwc, u32 mode);
u32 dwc3_core_fifo_space(struct dwc3_ep *dep, u8 type);
diff --git a/drivers/usb/dwc3/drd.c b/drivers/usb/dwc3/drd.c
index d76ae676783c..7977860932b1 100644
--- a/drivers/usb/dwc3/drd.c
+++ b/drivers/usb/dwc3/drd.c
@@ -173,7 +173,7 @@ void dwc3_otg_init(struct dwc3 *dwc)
* block "Initialize GCTL for OTG operation".
*/
/* GCTL.PrtCapDir=2'b11 */
- dwc3_set_prtcap(dwc, DWC3_GCTL_PRTCAP_OTG);
+ dwc3_set_prtcap(dwc, DWC3_GCTL_PRTCAP_OTG, true);
/* GUSB2PHYCFG0.SusPHY=0 */
reg = dwc3_readl(dwc->regs, DWC3_GUSB2PHYCFG(0));
reg &= ~DWC3_GUSB2PHYCFG_SUSPHY;
@@ -556,7 +556,7 @@ int dwc3_drd_init(struct dwc3 *dwc)
dwc3_drd_update(dwc);
} else {
- dwc3_set_prtcap(dwc, DWC3_GCTL_PRTCAP_OTG);
+ dwc3_set_prtcap(dwc, DWC3_GCTL_PRTCAP_OTG, true);
/* use OTG block to get ID event */
irq = dwc3_otg_get_irq(dwc);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x c133ec0e5717868c9967fa3df92a55e537b1aead
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030902-fernlike-flashback-65c0@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c133ec0e5717868c9967fa3df92a55e537b1aead Mon Sep 17 00:00:00 2001
From: Michal Pecio <michal.pecio(a)gmail.com>
Date: Tue, 25 Feb 2025 11:59:27 +0200
Subject: [PATCH] usb: xhci: Enable the TRB overfetch quirk on VIA VL805
Raspberry Pi is a major user of those chips and they discovered a bug -
when the end of a transfer ring segment is reached, up to four TRBs can
be prefetched from the next page even if the segment ends with link TRB
and on page boundary (the chip claims to support standard 4KB pages).
It also appears that if the prefetched TRBs belong to a different ring
whose doorbell is later rung, they may be used without refreshing from
system RAM and the endpoint will stay idle if their cycle bit is stale.
Other users complain about IOMMU faults on x86 systems, unsurprisingly.
Deal with it by using existing quirk which allocates a dummy page after
each transfer ring segment. This was seen to resolve both problems. RPi
came up with a more efficient solution, shortening each segment by four
TRBs, but it complicated the driver and they ditched it for this quirk.
Also rename the quirk and add VL805 device ID macro.
Signed-off-by: Michal Pecio <michal.pecio(a)gmail.com>
Link: https://github.com/raspberrypi/linux/issues/4685
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=215906
CC: stable(a)vger.kernel.org
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
Link: https://lore.kernel.org/r/20250225095927.2512358-2-mathias.nyman@linux.inte…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
index 92703efda1f7..fdf0c1008225 100644
--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -2437,7 +2437,8 @@ int xhci_mem_init(struct xhci_hcd *xhci, gfp_t flags)
* and our use of dma addresses in the trb_address_map radix tree needs
* TRB_SEGMENT_SIZE alignment, so we pick the greater alignment need.
*/
- if (xhci->quirks & XHCI_ZHAOXIN_TRB_FETCH)
+ if (xhci->quirks & XHCI_TRB_OVERFETCH)
+ /* Buggy HC prefetches beyond segment bounds - allocate dummy space at the end */
xhci->segment_pool = dma_pool_create("xHCI ring segments", dev,
TRB_SEGMENT_SIZE * 2, TRB_SEGMENT_SIZE * 2, xhci->page_size * 2);
else
diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
index ad0ff356f6fa..54460d11f7ee 100644
--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -38,6 +38,8 @@
#define PCI_DEVICE_ID_ETRON_EJ168 0x7023
#define PCI_DEVICE_ID_ETRON_EJ188 0x7052
+#define PCI_DEVICE_ID_VIA_VL805 0x3483
+
#define PCI_DEVICE_ID_INTEL_LYNXPOINT_XHCI 0x8c31
#define PCI_DEVICE_ID_INTEL_LYNXPOINT_LP_XHCI 0x9c31
#define PCI_DEVICE_ID_INTEL_WILDCATPOINT_LP_XHCI 0x9cb1
@@ -418,8 +420,10 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
pdev->device == 0x3432)
xhci->quirks |= XHCI_BROKEN_STREAMS;
- if (pdev->vendor == PCI_VENDOR_ID_VIA && pdev->device == 0x3483)
+ if (pdev->vendor == PCI_VENDOR_ID_VIA && pdev->device == PCI_DEVICE_ID_VIA_VL805) {
xhci->quirks |= XHCI_LPM_SUPPORT;
+ xhci->quirks |= XHCI_TRB_OVERFETCH;
+ }
if (pdev->vendor == PCI_VENDOR_ID_ASMEDIA &&
pdev->device == PCI_DEVICE_ID_ASMEDIA_1042_XHCI) {
@@ -467,11 +471,11 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
if (pdev->device == 0x9202) {
xhci->quirks |= XHCI_RESET_ON_RESUME;
- xhci->quirks |= XHCI_ZHAOXIN_TRB_FETCH;
+ xhci->quirks |= XHCI_TRB_OVERFETCH;
}
if (pdev->device == 0x9203)
- xhci->quirks |= XHCI_ZHAOXIN_TRB_FETCH;
+ xhci->quirks |= XHCI_TRB_OVERFETCH;
}
if (pdev->vendor == PCI_VENDOR_ID_CDNS &&
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 8c164340a2c3..779b01dee068 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1632,7 +1632,7 @@ struct xhci_hcd {
#define XHCI_EP_CTX_BROKEN_DCS BIT_ULL(42)
#define XHCI_SUSPEND_RESUME_CLKS BIT_ULL(43)
#define XHCI_RESET_TO_DEFAULT BIT_ULL(44)
-#define XHCI_ZHAOXIN_TRB_FETCH BIT_ULL(45)
+#define XHCI_TRB_OVERFETCH BIT_ULL(45)
#define XHCI_ZHAOXIN_HOST BIT_ULL(46)
#define XHCI_WRITE_64_HI_LO BIT_ULL(47)
#define XHCI_CDNS_SCTX_QUIRK BIT_ULL(48)
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x c133ec0e5717868c9967fa3df92a55e537b1aead
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030959-thee-uniformed-b4eb@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c133ec0e5717868c9967fa3df92a55e537b1aead Mon Sep 17 00:00:00 2001
From: Michal Pecio <michal.pecio(a)gmail.com>
Date: Tue, 25 Feb 2025 11:59:27 +0200
Subject: [PATCH] usb: xhci: Enable the TRB overfetch quirk on VIA VL805
Raspberry Pi is a major user of those chips and they discovered a bug -
when the end of a transfer ring segment is reached, up to four TRBs can
be prefetched from the next page even if the segment ends with link TRB
and on page boundary (the chip claims to support standard 4KB pages).
It also appears that if the prefetched TRBs belong to a different ring
whose doorbell is later rung, they may be used without refreshing from
system RAM and the endpoint will stay idle if their cycle bit is stale.
Other users complain about IOMMU faults on x86 systems, unsurprisingly.
Deal with it by using existing quirk which allocates a dummy page after
each transfer ring segment. This was seen to resolve both problems. RPi
came up with a more efficient solution, shortening each segment by four
TRBs, but it complicated the driver and they ditched it for this quirk.
Also rename the quirk and add VL805 device ID macro.
Signed-off-by: Michal Pecio <michal.pecio(a)gmail.com>
Link: https://github.com/raspberrypi/linux/issues/4685
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=215906
CC: stable(a)vger.kernel.org
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
Link: https://lore.kernel.org/r/20250225095927.2512358-2-mathias.nyman@linux.inte…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
index 92703efda1f7..fdf0c1008225 100644
--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -2437,7 +2437,8 @@ int xhci_mem_init(struct xhci_hcd *xhci, gfp_t flags)
* and our use of dma addresses in the trb_address_map radix tree needs
* TRB_SEGMENT_SIZE alignment, so we pick the greater alignment need.
*/
- if (xhci->quirks & XHCI_ZHAOXIN_TRB_FETCH)
+ if (xhci->quirks & XHCI_TRB_OVERFETCH)
+ /* Buggy HC prefetches beyond segment bounds - allocate dummy space at the end */
xhci->segment_pool = dma_pool_create("xHCI ring segments", dev,
TRB_SEGMENT_SIZE * 2, TRB_SEGMENT_SIZE * 2, xhci->page_size * 2);
else
diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
index ad0ff356f6fa..54460d11f7ee 100644
--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -38,6 +38,8 @@
#define PCI_DEVICE_ID_ETRON_EJ168 0x7023
#define PCI_DEVICE_ID_ETRON_EJ188 0x7052
+#define PCI_DEVICE_ID_VIA_VL805 0x3483
+
#define PCI_DEVICE_ID_INTEL_LYNXPOINT_XHCI 0x8c31
#define PCI_DEVICE_ID_INTEL_LYNXPOINT_LP_XHCI 0x9c31
#define PCI_DEVICE_ID_INTEL_WILDCATPOINT_LP_XHCI 0x9cb1
@@ -418,8 +420,10 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
pdev->device == 0x3432)
xhci->quirks |= XHCI_BROKEN_STREAMS;
- if (pdev->vendor == PCI_VENDOR_ID_VIA && pdev->device == 0x3483)
+ if (pdev->vendor == PCI_VENDOR_ID_VIA && pdev->device == PCI_DEVICE_ID_VIA_VL805) {
xhci->quirks |= XHCI_LPM_SUPPORT;
+ xhci->quirks |= XHCI_TRB_OVERFETCH;
+ }
if (pdev->vendor == PCI_VENDOR_ID_ASMEDIA &&
pdev->device == PCI_DEVICE_ID_ASMEDIA_1042_XHCI) {
@@ -467,11 +471,11 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
if (pdev->device == 0x9202) {
xhci->quirks |= XHCI_RESET_ON_RESUME;
- xhci->quirks |= XHCI_ZHAOXIN_TRB_FETCH;
+ xhci->quirks |= XHCI_TRB_OVERFETCH;
}
if (pdev->device == 0x9203)
- xhci->quirks |= XHCI_ZHAOXIN_TRB_FETCH;
+ xhci->quirks |= XHCI_TRB_OVERFETCH;
}
if (pdev->vendor == PCI_VENDOR_ID_CDNS &&
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 8c164340a2c3..779b01dee068 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1632,7 +1632,7 @@ struct xhci_hcd {
#define XHCI_EP_CTX_BROKEN_DCS BIT_ULL(42)
#define XHCI_SUSPEND_RESUME_CLKS BIT_ULL(43)
#define XHCI_RESET_TO_DEFAULT BIT_ULL(44)
-#define XHCI_ZHAOXIN_TRB_FETCH BIT_ULL(45)
+#define XHCI_TRB_OVERFETCH BIT_ULL(45)
#define XHCI_ZHAOXIN_HOST BIT_ULL(46)
#define XHCI_WRITE_64_HI_LO BIT_ULL(47)
#define XHCI_CDNS_SCTX_QUIRK BIT_ULL(48)
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x c133ec0e5717868c9967fa3df92a55e537b1aead
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030959-character-delouse-db17@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c133ec0e5717868c9967fa3df92a55e537b1aead Mon Sep 17 00:00:00 2001
From: Michal Pecio <michal.pecio(a)gmail.com>
Date: Tue, 25 Feb 2025 11:59:27 +0200
Subject: [PATCH] usb: xhci: Enable the TRB overfetch quirk on VIA VL805
Raspberry Pi is a major user of those chips and they discovered a bug -
when the end of a transfer ring segment is reached, up to four TRBs can
be prefetched from the next page even if the segment ends with link TRB
and on page boundary (the chip claims to support standard 4KB pages).
It also appears that if the prefetched TRBs belong to a different ring
whose doorbell is later rung, they may be used without refreshing from
system RAM and the endpoint will stay idle if their cycle bit is stale.
Other users complain about IOMMU faults on x86 systems, unsurprisingly.
Deal with it by using existing quirk which allocates a dummy page after
each transfer ring segment. This was seen to resolve both problems. RPi
came up with a more efficient solution, shortening each segment by four
TRBs, but it complicated the driver and they ditched it for this quirk.
Also rename the quirk and add VL805 device ID macro.
Signed-off-by: Michal Pecio <michal.pecio(a)gmail.com>
Link: https://github.com/raspberrypi/linux/issues/4685
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=215906
CC: stable(a)vger.kernel.org
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
Link: https://lore.kernel.org/r/20250225095927.2512358-2-mathias.nyman@linux.inte…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
index 92703efda1f7..fdf0c1008225 100644
--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -2437,7 +2437,8 @@ int xhci_mem_init(struct xhci_hcd *xhci, gfp_t flags)
* and our use of dma addresses in the trb_address_map radix tree needs
* TRB_SEGMENT_SIZE alignment, so we pick the greater alignment need.
*/
- if (xhci->quirks & XHCI_ZHAOXIN_TRB_FETCH)
+ if (xhci->quirks & XHCI_TRB_OVERFETCH)
+ /* Buggy HC prefetches beyond segment bounds - allocate dummy space at the end */
xhci->segment_pool = dma_pool_create("xHCI ring segments", dev,
TRB_SEGMENT_SIZE * 2, TRB_SEGMENT_SIZE * 2, xhci->page_size * 2);
else
diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
index ad0ff356f6fa..54460d11f7ee 100644
--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -38,6 +38,8 @@
#define PCI_DEVICE_ID_ETRON_EJ168 0x7023
#define PCI_DEVICE_ID_ETRON_EJ188 0x7052
+#define PCI_DEVICE_ID_VIA_VL805 0x3483
+
#define PCI_DEVICE_ID_INTEL_LYNXPOINT_XHCI 0x8c31
#define PCI_DEVICE_ID_INTEL_LYNXPOINT_LP_XHCI 0x9c31
#define PCI_DEVICE_ID_INTEL_WILDCATPOINT_LP_XHCI 0x9cb1
@@ -418,8 +420,10 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
pdev->device == 0x3432)
xhci->quirks |= XHCI_BROKEN_STREAMS;
- if (pdev->vendor == PCI_VENDOR_ID_VIA && pdev->device == 0x3483)
+ if (pdev->vendor == PCI_VENDOR_ID_VIA && pdev->device == PCI_DEVICE_ID_VIA_VL805) {
xhci->quirks |= XHCI_LPM_SUPPORT;
+ xhci->quirks |= XHCI_TRB_OVERFETCH;
+ }
if (pdev->vendor == PCI_VENDOR_ID_ASMEDIA &&
pdev->device == PCI_DEVICE_ID_ASMEDIA_1042_XHCI) {
@@ -467,11 +471,11 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
if (pdev->device == 0x9202) {
xhci->quirks |= XHCI_RESET_ON_RESUME;
- xhci->quirks |= XHCI_ZHAOXIN_TRB_FETCH;
+ xhci->quirks |= XHCI_TRB_OVERFETCH;
}
if (pdev->device == 0x9203)
- xhci->quirks |= XHCI_ZHAOXIN_TRB_FETCH;
+ xhci->quirks |= XHCI_TRB_OVERFETCH;
}
if (pdev->vendor == PCI_VENDOR_ID_CDNS &&
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 8c164340a2c3..779b01dee068 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1632,7 +1632,7 @@ struct xhci_hcd {
#define XHCI_EP_CTX_BROKEN_DCS BIT_ULL(42)
#define XHCI_SUSPEND_RESUME_CLKS BIT_ULL(43)
#define XHCI_RESET_TO_DEFAULT BIT_ULL(44)
-#define XHCI_ZHAOXIN_TRB_FETCH BIT_ULL(45)
+#define XHCI_TRB_OVERFETCH BIT_ULL(45)
#define XHCI_ZHAOXIN_HOST BIT_ULL(46)
#define XHCI_WRITE_64_HI_LO BIT_ULL(47)
#define XHCI_CDNS_SCTX_QUIRK BIT_ULL(48)
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x c133ec0e5717868c9967fa3df92a55e537b1aead
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030958-june-lard-2d9f@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c133ec0e5717868c9967fa3df92a55e537b1aead Mon Sep 17 00:00:00 2001
From: Michal Pecio <michal.pecio(a)gmail.com>
Date: Tue, 25 Feb 2025 11:59:27 +0200
Subject: [PATCH] usb: xhci: Enable the TRB overfetch quirk on VIA VL805
Raspberry Pi is a major user of those chips and they discovered a bug -
when the end of a transfer ring segment is reached, up to four TRBs can
be prefetched from the next page even if the segment ends with link TRB
and on page boundary (the chip claims to support standard 4KB pages).
It also appears that if the prefetched TRBs belong to a different ring
whose doorbell is later rung, they may be used without refreshing from
system RAM and the endpoint will stay idle if their cycle bit is stale.
Other users complain about IOMMU faults on x86 systems, unsurprisingly.
Deal with it by using existing quirk which allocates a dummy page after
each transfer ring segment. This was seen to resolve both problems. RPi
came up with a more efficient solution, shortening each segment by four
TRBs, but it complicated the driver and they ditched it for this quirk.
Also rename the quirk and add VL805 device ID macro.
Signed-off-by: Michal Pecio <michal.pecio(a)gmail.com>
Link: https://github.com/raspberrypi/linux/issues/4685
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=215906
CC: stable(a)vger.kernel.org
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
Link: https://lore.kernel.org/r/20250225095927.2512358-2-mathias.nyman@linux.inte…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
index 92703efda1f7..fdf0c1008225 100644
--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -2437,7 +2437,8 @@ int xhci_mem_init(struct xhci_hcd *xhci, gfp_t flags)
* and our use of dma addresses in the trb_address_map radix tree needs
* TRB_SEGMENT_SIZE alignment, so we pick the greater alignment need.
*/
- if (xhci->quirks & XHCI_ZHAOXIN_TRB_FETCH)
+ if (xhci->quirks & XHCI_TRB_OVERFETCH)
+ /* Buggy HC prefetches beyond segment bounds - allocate dummy space at the end */
xhci->segment_pool = dma_pool_create("xHCI ring segments", dev,
TRB_SEGMENT_SIZE * 2, TRB_SEGMENT_SIZE * 2, xhci->page_size * 2);
else
diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
index ad0ff356f6fa..54460d11f7ee 100644
--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -38,6 +38,8 @@
#define PCI_DEVICE_ID_ETRON_EJ168 0x7023
#define PCI_DEVICE_ID_ETRON_EJ188 0x7052
+#define PCI_DEVICE_ID_VIA_VL805 0x3483
+
#define PCI_DEVICE_ID_INTEL_LYNXPOINT_XHCI 0x8c31
#define PCI_DEVICE_ID_INTEL_LYNXPOINT_LP_XHCI 0x9c31
#define PCI_DEVICE_ID_INTEL_WILDCATPOINT_LP_XHCI 0x9cb1
@@ -418,8 +420,10 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
pdev->device == 0x3432)
xhci->quirks |= XHCI_BROKEN_STREAMS;
- if (pdev->vendor == PCI_VENDOR_ID_VIA && pdev->device == 0x3483)
+ if (pdev->vendor == PCI_VENDOR_ID_VIA && pdev->device == PCI_DEVICE_ID_VIA_VL805) {
xhci->quirks |= XHCI_LPM_SUPPORT;
+ xhci->quirks |= XHCI_TRB_OVERFETCH;
+ }
if (pdev->vendor == PCI_VENDOR_ID_ASMEDIA &&
pdev->device == PCI_DEVICE_ID_ASMEDIA_1042_XHCI) {
@@ -467,11 +471,11 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
if (pdev->device == 0x9202) {
xhci->quirks |= XHCI_RESET_ON_RESUME;
- xhci->quirks |= XHCI_ZHAOXIN_TRB_FETCH;
+ xhci->quirks |= XHCI_TRB_OVERFETCH;
}
if (pdev->device == 0x9203)
- xhci->quirks |= XHCI_ZHAOXIN_TRB_FETCH;
+ xhci->quirks |= XHCI_TRB_OVERFETCH;
}
if (pdev->vendor == PCI_VENDOR_ID_CDNS &&
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 8c164340a2c3..779b01dee068 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1632,7 +1632,7 @@ struct xhci_hcd {
#define XHCI_EP_CTX_BROKEN_DCS BIT_ULL(42)
#define XHCI_SUSPEND_RESUME_CLKS BIT_ULL(43)
#define XHCI_RESET_TO_DEFAULT BIT_ULL(44)
-#define XHCI_ZHAOXIN_TRB_FETCH BIT_ULL(45)
+#define XHCI_TRB_OVERFETCH BIT_ULL(45)
#define XHCI_ZHAOXIN_HOST BIT_ULL(46)
#define XHCI_WRITE_64_HI_LO BIT_ULL(47)
#define XHCI_CDNS_SCTX_QUIRK BIT_ULL(48)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 67bab13307c83fb742c2556b06cdc39dbad27f07
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030908-defacing-rumor-448c@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 67bab13307c83fb742c2556b06cdc39dbad27f07 Mon Sep 17 00:00:00 2001
From: Ge Yang <yangge1116(a)126.com>
Date: Wed, 19 Feb 2025 11:46:44 +0800
Subject: [PATCH] mm/hugetlb: wait for hugetlb folios to be freed
Since the introduction of commit c77c0a8ac4c52 ("mm/hugetlb: defer freeing
of huge pages if in non-task context"), which supports deferring the
freeing of hugetlb pages, the allocation of contiguous memory through
cma_alloc() may fail probabilistically.
In the CMA allocation process, if it is found that the CMA area is
occupied by in-use hugetlb folios, these in-use hugetlb folios need to be
migrated to another location. When there are no available hugetlb folios
in the free hugetlb pool during the migration of in-use hugetlb folios,
new folios are allocated from the buddy system. A temporary state is set
on the newly allocated folio. Upon completion of the hugetlb folio
migration, the temporary state is transferred from the new folios to the
old folios. Normally, when the old folios with the temporary state are
freed, it is directly released back to the buddy system. However, due to
the deferred freeing of hugetlb pages, the PageBuddy() check fails,
ultimately leading to the failure of cma_alloc().
Here is a simplified call trace illustrating the process:
cma_alloc()
->__alloc_contig_migrate_range() // Migrate in-use hugetlb folios
->unmap_and_move_huge_page()
->folio_putback_hugetlb() // Free old folios
->test_pages_isolated()
->__test_page_isolated_in_pageblock()
->PageBuddy(page) // Check if the page is in buddy
To resolve this issue, we have implemented a function named
wait_for_freed_hugetlb_folios(). This function ensures that the hugetlb
folios are properly released back to the buddy system after their
migration is completed. By invoking wait_for_freed_hugetlb_folios()
before calling PageBuddy(), we ensure that PageBuddy() will succeed.
Link: https://lkml.kernel.org/r/1739936804-18199-1-git-send-email-yangge1116@126.…
Fixes: c77c0a8ac4c5 ("mm/hugetlb: defer freeing of huge pages if in non-task context")
Signed-off-by: Ge Yang <yangge1116(a)126.com>
Reviewed-by: Muchun Song <muchun.song(a)linux.dev>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Barry Song <21cnbao(a)gmail.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index ec8c0ccc8f95..dbe76d4f1bfc 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -682,6 +682,7 @@ struct huge_bootmem_page {
int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn);
+void wait_for_freed_hugetlb_folios(void);
struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
unsigned long addr, bool cow_from_owner);
struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
@@ -1066,6 +1067,10 @@ static inline int replace_free_hugepage_folios(unsigned long start_pfn,
return 0;
}
+static inline void wait_for_freed_hugetlb_folios(void)
+{
+}
+
static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
unsigned long addr,
bool cow_from_owner)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 163190e89ea1..811b29f77abf 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2943,6 +2943,14 @@ int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
return ret;
}
+void wait_for_freed_hugetlb_folios(void)
+{
+ if (llist_empty(&hpage_freelist))
+ return;
+
+ flush_work(&free_hpage_work);
+}
+
typedef enum {
/*
* For either 0/1: we checked the per-vma resv map, and one resv
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index c608e9d72865..a051a29e95ad 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -607,6 +607,16 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
struct zone *zone;
int ret;
+ /*
+ * Due to the deferred freeing of hugetlb folios, the hugepage folios may
+ * not immediately release to the buddy system. This can cause PageBuddy()
+ * to fail in __test_page_isolated_in_pageblock(). To ensure that the
+ * hugetlb folios are properly released back to the buddy system, we
+ * invoke the wait_for_freed_hugetlb_folios() function to wait for the
+ * release to complete.
+ */
+ wait_for_freed_hugetlb_folios();
+
/*
* Note: pageblock_nr_pages != MAX_PAGE_ORDER. Then, chunks of free
* pages are not aligned to pageblock_nr_pages.
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 67bab13307c83fb742c2556b06cdc39dbad27f07
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030907-blush-surname-f05c@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 67bab13307c83fb742c2556b06cdc39dbad27f07 Mon Sep 17 00:00:00 2001
From: Ge Yang <yangge1116(a)126.com>
Date: Wed, 19 Feb 2025 11:46:44 +0800
Subject: [PATCH] mm/hugetlb: wait for hugetlb folios to be freed
Since the introduction of commit c77c0a8ac4c52 ("mm/hugetlb: defer freeing
of huge pages if in non-task context"), which supports deferring the
freeing of hugetlb pages, the allocation of contiguous memory through
cma_alloc() may fail probabilistically.
In the CMA allocation process, if it is found that the CMA area is
occupied by in-use hugetlb folios, these in-use hugetlb folios need to be
migrated to another location. When there are no available hugetlb folios
in the free hugetlb pool during the migration of in-use hugetlb folios,
new folios are allocated from the buddy system. A temporary state is set
on the newly allocated folio. Upon completion of the hugetlb folio
migration, the temporary state is transferred from the new folios to the
old folios. Normally, when the old folios with the temporary state are
freed, it is directly released back to the buddy system. However, due to
the deferred freeing of hugetlb pages, the PageBuddy() check fails,
ultimately leading to the failure of cma_alloc().
Here is a simplified call trace illustrating the process:
cma_alloc()
->__alloc_contig_migrate_range() // Migrate in-use hugetlb folios
->unmap_and_move_huge_page()
->folio_putback_hugetlb() // Free old folios
->test_pages_isolated()
->__test_page_isolated_in_pageblock()
->PageBuddy(page) // Check if the page is in buddy
To resolve this issue, we have implemented a function named
wait_for_freed_hugetlb_folios(). This function ensures that the hugetlb
folios are properly released back to the buddy system after their
migration is completed. By invoking wait_for_freed_hugetlb_folios()
before calling PageBuddy(), we ensure that PageBuddy() will succeed.
Link: https://lkml.kernel.org/r/1739936804-18199-1-git-send-email-yangge1116@126.…
Fixes: c77c0a8ac4c5 ("mm/hugetlb: defer freeing of huge pages if in non-task context")
Signed-off-by: Ge Yang <yangge1116(a)126.com>
Reviewed-by: Muchun Song <muchun.song(a)linux.dev>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Barry Song <21cnbao(a)gmail.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index ec8c0ccc8f95..dbe76d4f1bfc 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -682,6 +682,7 @@ struct huge_bootmem_page {
int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn);
+void wait_for_freed_hugetlb_folios(void);
struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
unsigned long addr, bool cow_from_owner);
struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
@@ -1066,6 +1067,10 @@ static inline int replace_free_hugepage_folios(unsigned long start_pfn,
return 0;
}
+static inline void wait_for_freed_hugetlb_folios(void)
+{
+}
+
static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
unsigned long addr,
bool cow_from_owner)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 163190e89ea1..811b29f77abf 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2943,6 +2943,14 @@ int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
return ret;
}
+void wait_for_freed_hugetlb_folios(void)
+{
+ if (llist_empty(&hpage_freelist))
+ return;
+
+ flush_work(&free_hpage_work);
+}
+
typedef enum {
/*
* For either 0/1: we checked the per-vma resv map, and one resv
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index c608e9d72865..a051a29e95ad 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -607,6 +607,16 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
struct zone *zone;
int ret;
+ /*
+ * Due to the deferred freeing of hugetlb folios, the hugepage folios may
+ * not immediately release to the buddy system. This can cause PageBuddy()
+ * to fail in __test_page_isolated_in_pageblock(). To ensure that the
+ * hugetlb folios are properly released back to the buddy system, we
+ * invoke the wait_for_freed_hugetlb_folios() function to wait for the
+ * release to complete.
+ */
+ wait_for_freed_hugetlb_folios();
+
/*
* Note: pageblock_nr_pages != MAX_PAGE_ORDER. Then, chunks of free
* pages are not aligned to pageblock_nr_pages.
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 67bab13307c83fb742c2556b06cdc39dbad27f07
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030906-iodize-baboon-b1af@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 67bab13307c83fb742c2556b06cdc39dbad27f07 Mon Sep 17 00:00:00 2001
From: Ge Yang <yangge1116(a)126.com>
Date: Wed, 19 Feb 2025 11:46:44 +0800
Subject: [PATCH] mm/hugetlb: wait for hugetlb folios to be freed
Since the introduction of commit c77c0a8ac4c52 ("mm/hugetlb: defer freeing
of huge pages if in non-task context"), which supports deferring the
freeing of hugetlb pages, the allocation of contiguous memory through
cma_alloc() may fail probabilistically.
In the CMA allocation process, if it is found that the CMA area is
occupied by in-use hugetlb folios, these in-use hugetlb folios need to be
migrated to another location. When there are no available hugetlb folios
in the free hugetlb pool during the migration of in-use hugetlb folios,
new folios are allocated from the buddy system. A temporary state is set
on the newly allocated folio. Upon completion of the hugetlb folio
migration, the temporary state is transferred from the new folios to the
old folios. Normally, when the old folios with the temporary state are
freed, it is directly released back to the buddy system. However, due to
the deferred freeing of hugetlb pages, the PageBuddy() check fails,
ultimately leading to the failure of cma_alloc().
Here is a simplified call trace illustrating the process:
cma_alloc()
->__alloc_contig_migrate_range() // Migrate in-use hugetlb folios
->unmap_and_move_huge_page()
->folio_putback_hugetlb() // Free old folios
->test_pages_isolated()
->__test_page_isolated_in_pageblock()
->PageBuddy(page) // Check if the page is in buddy
To resolve this issue, we have implemented a function named
wait_for_freed_hugetlb_folios(). This function ensures that the hugetlb
folios are properly released back to the buddy system after their
migration is completed. By invoking wait_for_freed_hugetlb_folios()
before calling PageBuddy(), we ensure that PageBuddy() will succeed.
Link: https://lkml.kernel.org/r/1739936804-18199-1-git-send-email-yangge1116@126.…
Fixes: c77c0a8ac4c5 ("mm/hugetlb: defer freeing of huge pages if in non-task context")
Signed-off-by: Ge Yang <yangge1116(a)126.com>
Reviewed-by: Muchun Song <muchun.song(a)linux.dev>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Barry Song <21cnbao(a)gmail.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index ec8c0ccc8f95..dbe76d4f1bfc 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -682,6 +682,7 @@ struct huge_bootmem_page {
int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn);
+void wait_for_freed_hugetlb_folios(void);
struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
unsigned long addr, bool cow_from_owner);
struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
@@ -1066,6 +1067,10 @@ static inline int replace_free_hugepage_folios(unsigned long start_pfn,
return 0;
}
+static inline void wait_for_freed_hugetlb_folios(void)
+{
+}
+
static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
unsigned long addr,
bool cow_from_owner)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 163190e89ea1..811b29f77abf 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2943,6 +2943,14 @@ int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
return ret;
}
+void wait_for_freed_hugetlb_folios(void)
+{
+ if (llist_empty(&hpage_freelist))
+ return;
+
+ flush_work(&free_hpage_work);
+}
+
typedef enum {
/*
* For either 0/1: we checked the per-vma resv map, and one resv
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index c608e9d72865..a051a29e95ad 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -607,6 +607,16 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
struct zone *zone;
int ret;
+ /*
+ * Due to the deferred freeing of hugetlb folios, the hugepage folios may
+ * not immediately release to the buddy system. This can cause PageBuddy()
+ * to fail in __test_page_isolated_in_pageblock(). To ensure that the
+ * hugetlb folios are properly released back to the buddy system, we
+ * invoke the wait_for_freed_hugetlb_folios() function to wait for the
+ * release to complete.
+ */
+ wait_for_freed_hugetlb_folios();
+
/*
* Note: pageblock_nr_pages != MAX_PAGE_ORDER. Then, chunks of free
* pages are not aligned to pageblock_nr_pages.
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 67bab13307c83fb742c2556b06cdc39dbad27f07
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030905-parchment-riddance-0a09@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 67bab13307c83fb742c2556b06cdc39dbad27f07 Mon Sep 17 00:00:00 2001
From: Ge Yang <yangge1116(a)126.com>
Date: Wed, 19 Feb 2025 11:46:44 +0800
Subject: [PATCH] mm/hugetlb: wait for hugetlb folios to be freed
Since the introduction of commit c77c0a8ac4c52 ("mm/hugetlb: defer freeing
of huge pages if in non-task context"), which supports deferring the
freeing of hugetlb pages, the allocation of contiguous memory through
cma_alloc() may fail probabilistically.
In the CMA allocation process, if it is found that the CMA area is
occupied by in-use hugetlb folios, these in-use hugetlb folios need to be
migrated to another location. When there are no available hugetlb folios
in the free hugetlb pool during the migration of in-use hugetlb folios,
new folios are allocated from the buddy system. A temporary state is set
on the newly allocated folio. Upon completion of the hugetlb folio
migration, the temporary state is transferred from the new folios to the
old folios. Normally, when the old folios with the temporary state are
freed, it is directly released back to the buddy system. However, due to
the deferred freeing of hugetlb pages, the PageBuddy() check fails,
ultimately leading to the failure of cma_alloc().
Here is a simplified call trace illustrating the process:
cma_alloc()
->__alloc_contig_migrate_range() // Migrate in-use hugetlb folios
->unmap_and_move_huge_page()
->folio_putback_hugetlb() // Free old folios
->test_pages_isolated()
->__test_page_isolated_in_pageblock()
->PageBuddy(page) // Check if the page is in buddy
To resolve this issue, we have implemented a function named
wait_for_freed_hugetlb_folios(). This function ensures that the hugetlb
folios are properly released back to the buddy system after their
migration is completed. By invoking wait_for_freed_hugetlb_folios()
before calling PageBuddy(), we ensure that PageBuddy() will succeed.
Link: https://lkml.kernel.org/r/1739936804-18199-1-git-send-email-yangge1116@126.…
Fixes: c77c0a8ac4c5 ("mm/hugetlb: defer freeing of huge pages if in non-task context")
Signed-off-by: Ge Yang <yangge1116(a)126.com>
Reviewed-by: Muchun Song <muchun.song(a)linux.dev>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Barry Song <21cnbao(a)gmail.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index ec8c0ccc8f95..dbe76d4f1bfc 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -682,6 +682,7 @@ struct huge_bootmem_page {
int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn);
+void wait_for_freed_hugetlb_folios(void);
struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
unsigned long addr, bool cow_from_owner);
struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
@@ -1066,6 +1067,10 @@ static inline int replace_free_hugepage_folios(unsigned long start_pfn,
return 0;
}
+static inline void wait_for_freed_hugetlb_folios(void)
+{
+}
+
static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
unsigned long addr,
bool cow_from_owner)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 163190e89ea1..811b29f77abf 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2943,6 +2943,14 @@ int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
return ret;
}
+void wait_for_freed_hugetlb_folios(void)
+{
+ if (llist_empty(&hpage_freelist))
+ return;
+
+ flush_work(&free_hpage_work);
+}
+
typedef enum {
/*
* For either 0/1: we checked the per-vma resv map, and one resv
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index c608e9d72865..a051a29e95ad 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -607,6 +607,16 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
struct zone *zone;
int ret;
+ /*
+ * Due to the deferred freeing of hugetlb folios, the hugepage folios may
+ * not immediately release to the buddy system. This can cause PageBuddy()
+ * to fail in __test_page_isolated_in_pageblock(). To ensure that the
+ * hugetlb folios are properly released back to the buddy system, we
+ * invoke the wait_for_freed_hugetlb_folios() function to wait for the
+ * release to complete.
+ */
+ wait_for_freed_hugetlb_folios();
+
/*
* Note: pageblock_nr_pages != MAX_PAGE_ORDER. Then, chunks of free
* pages are not aligned to pageblock_nr_pages.
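The fix above relies on the deferred-freeing behaviour introduced by commit
c77c0a8ac4c52: hugetlb folios freed from non-task context are queued on
hpage_freelist and released later by a work item, so a PageBuddy() check can
run before that work has executed. Below is a minimal userspace C sketch of
that pattern (not kernel code; all names and values are illustrative), showing
why the deferred work has to be drained, as wait_for_freed_hugetlb_folios()
does with flush_work(), before the check:

#include <stdbool.h>
#include <stdio.h>

#define NPAGES 4

static bool in_buddy[NPAGES];              /* analogue of PageBuddy()     */
static int deferred[NPAGES], ndeferred;    /* analogue of hpage_freelist  */

static void free_page_deferred(int pfn)    /* queue the free, do it later */
{
	deferred[ndeferred++] = pfn;
}

static void flush_free_work(void)          /* analogue of flush_work()    */
{
	while (ndeferred)
		in_buddy[deferred[--ndeferred]] = true;
}

int main(void)
{
	free_page_deferred(2);   /* old hugetlb folio queued for freeing */

	/* without the flush the check fails, just as cma_alloc() did */
	printf("before flush: PageBuddy(2) = %d\n", in_buddy[2]);

	flush_free_work();       /* wait_for_freed_hugetlb_folios() analogue */
	printf("after flush:  PageBuddy(2) = %d\n", in_buddy[2]);
	return 0;
}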
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 67bab13307c83fb742c2556b06cdc39dbad27f07
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030904-splendor-sly-a852@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 67bab13307c83fb742c2556b06cdc39dbad27f07 Mon Sep 17 00:00:00 2001
From: Ge Yang <yangge1116(a)126.com>
Date: Wed, 19 Feb 2025 11:46:44 +0800
Subject: [PATCH] mm/hugetlb: wait for hugetlb folios to be freed
Since the introduction of commit c77c0a8ac4c52 ("mm/hugetlb: defer freeing
of huge pages if in non-task context"), which supports deferring the
freeing of hugetlb pages, the allocation of contiguous memory through
cma_alloc() may fail probabilistically.
In the CMA allocation process, if it is found that the CMA area is
occupied by in-use hugetlb folios, these in-use hugetlb folios need to be
migrated to another location. When there are no available hugetlb folios
in the free hugetlb pool during the migration of in-use hugetlb folios,
new folios are allocated from the buddy system. A temporary state is set
on the newly allocated folio. Upon completion of the hugetlb folio
migration, the temporary state is transferred from the new folios to the
old folios. Normally, when the old folios with the temporary state are
freed, they are directly released back to the buddy system. However, due to
the deferred freeing of hugetlb pages, the PageBuddy() check fails,
ultimately leading to the failure of cma_alloc().
Here is a simplified call trace illustrating the process:
cma_alloc()
->__alloc_contig_migrate_range() // Migrate in-use hugetlb folios
->unmap_and_move_huge_page()
->folio_putback_hugetlb() // Free old folios
->test_pages_isolated()
->__test_page_isolated_in_pageblock()
->PageBuddy(page) // Check if the page is in buddy
To resolve this issue, we have implemented a function named
wait_for_freed_hugetlb_folios(). This function ensures that the hugetlb
folios are properly released back to the buddy system after their
migration is completed. By invoking wait_for_freed_hugetlb_folios()
before calling PageBuddy(), we ensure that PageBuddy() will succeed.
Link: https://lkml.kernel.org/r/1739936804-18199-1-git-send-email-yangge1116@126.…
Fixes: c77c0a8ac4c5 ("mm/hugetlb: defer freeing of huge pages if in non-task context")
Signed-off-by: Ge Yang <yangge1116(a)126.com>
Reviewed-by: Muchun Song <muchun.song(a)linux.dev>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Barry Song <21cnbao(a)gmail.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index ec8c0ccc8f95..dbe76d4f1bfc 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -682,6 +682,7 @@ struct huge_bootmem_page {
int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn);
+void wait_for_freed_hugetlb_folios(void);
struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
unsigned long addr, bool cow_from_owner);
struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
@@ -1066,6 +1067,10 @@ static inline int replace_free_hugepage_folios(unsigned long start_pfn,
return 0;
}
+static inline void wait_for_freed_hugetlb_folios(void)
+{
+}
+
static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
unsigned long addr,
bool cow_from_owner)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 163190e89ea1..811b29f77abf 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2943,6 +2943,14 @@ int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
return ret;
}
+void wait_for_freed_hugetlb_folios(void)
+{
+ if (llist_empty(&hpage_freelist))
+ return;
+
+ flush_work(&free_hpage_work);
+}
+
typedef enum {
/*
* For either 0/1: we checked the per-vma resv map, and one resv
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index c608e9d72865..a051a29e95ad 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -607,6 +607,16 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
struct zone *zone;
int ret;
+ /*
+ * Due to the deferred freeing of hugetlb folios, the hugepage folios may
+ * not immediately release to the buddy system. This can cause PageBuddy()
+ * to fail in __test_page_isolated_in_pageblock(). To ensure that the
+ * hugetlb folios are properly released back to the buddy system, we
+ * invoke the wait_for_freed_hugetlb_folios() function to wait for the
+ * release to complete.
+ */
+ wait_for_freed_hugetlb_folios();
+
/*
* Note: pageblock_nr_pages != MAX_PAGE_ORDER. Then, chunks of free
* pages are not aligned to pageblock_nr_pages.
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 67bab13307c83fb742c2556b06cdc39dbad27f07
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030903-simplify-blooming-c758@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 67bab13307c83fb742c2556b06cdc39dbad27f07 Mon Sep 17 00:00:00 2001
From: Ge Yang <yangge1116(a)126.com>
Date: Wed, 19 Feb 2025 11:46:44 +0800
Subject: [PATCH] mm/hugetlb: wait for hugetlb folios to be freed
Since the introduction of commit c77c0a8ac4c52 ("mm/hugetlb: defer freeing
of huge pages if in non-task context"), which supports deferring the
freeing of hugetlb pages, the allocation of contiguous memory through
cma_alloc() may fail probabilistically.
In the CMA allocation process, if it is found that the CMA area is
occupied by in-use hugetlb folios, these in-use hugetlb folios need to be
migrated to another location. When there are no available hugetlb folios
in the free hugetlb pool during the migration of in-use hugetlb folios,
new folios are allocated from the buddy system. A temporary state is set
on the newly allocated folio. Upon completion of the hugetlb folio
migration, the temporary state is transferred from the new folios to the
old folios. Normally, when the old folios with the temporary state are
freed, they are directly released back to the buddy system. However, due to
the deferred freeing of hugetlb pages, the PageBuddy() check fails,
ultimately leading to the failure of cma_alloc().
Here is a simplified call trace illustrating the process:
cma_alloc()
->__alloc_contig_migrate_range() // Migrate in-use hugetlb folios
->unmap_and_move_huge_page()
->folio_putback_hugetlb() // Free old folios
->test_pages_isolated()
->__test_page_isolated_in_pageblock()
->PageBuddy(page) // Check if the page is in buddy
To resolve this issue, we have implemented a function named
wait_for_freed_hugetlb_folios(). This function ensures that the hugetlb
folios are properly released back to the buddy system after their
migration is completed. By invoking wait_for_freed_hugetlb_folios()
before calling PageBuddy(), we ensure that PageBuddy() will succeed.
Link: https://lkml.kernel.org/r/1739936804-18199-1-git-send-email-yangge1116@126.…
Fixes: c77c0a8ac4c5 ("mm/hugetlb: defer freeing of huge pages if in non-task context")
Signed-off-by: Ge Yang <yangge1116(a)126.com>
Reviewed-by: Muchun Song <muchun.song(a)linux.dev>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Barry Song <21cnbao(a)gmail.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index ec8c0ccc8f95..dbe76d4f1bfc 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -682,6 +682,7 @@ struct huge_bootmem_page {
int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn);
+void wait_for_freed_hugetlb_folios(void);
struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
unsigned long addr, bool cow_from_owner);
struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
@@ -1066,6 +1067,10 @@ static inline int replace_free_hugepage_folios(unsigned long start_pfn,
return 0;
}
+static inline void wait_for_freed_hugetlb_folios(void)
+{
+}
+
static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
unsigned long addr,
bool cow_from_owner)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 163190e89ea1..811b29f77abf 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2943,6 +2943,14 @@ int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
return ret;
}
+void wait_for_freed_hugetlb_folios(void)
+{
+ if (llist_empty(&hpage_freelist))
+ return;
+
+ flush_work(&free_hpage_work);
+}
+
typedef enum {
/*
* For either 0/1: we checked the per-vma resv map, and one resv
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index c608e9d72865..a051a29e95ad 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -607,6 +607,16 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
struct zone *zone;
int ret;
+ /*
+ * Due to the deferred freeing of hugetlb folios, the hugepage folios may
+ * not immediately release to the buddy system. This can cause PageBuddy()
+ * to fail in __test_page_isolated_in_pageblock(). To ensure that the
+ * hugetlb folios are properly released back to the buddy system, we
+ * invoke the wait_for_freed_hugetlb_folios() function to wait for the
+ * release to complete.
+ */
+ wait_for_freed_hugetlb_folios();
+
/*
* Note: pageblock_nr_pages != MAX_PAGE_ORDER. Then, chunks of free
* pages are not aligned to pageblock_nr_pages.
The patch below does not apply to the 6.13-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.13.y
git checkout FETCH_HEAD
git cherry-pick -x 67bab13307c83fb742c2556b06cdc39dbad27f07
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030902-guidance-kung-0573@gregkh' --subject-prefix 'PATCH 6.13.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 67bab13307c83fb742c2556b06cdc39dbad27f07 Mon Sep 17 00:00:00 2001
From: Ge Yang <yangge1116(a)126.com>
Date: Wed, 19 Feb 2025 11:46:44 +0800
Subject: [PATCH] mm/hugetlb: wait for hugetlb folios to be freed
Since the introduction of commit c77c0a8ac4c52 ("mm/hugetlb: defer freeing
of huge pages if in non-task context"), which supports deferring the
freeing of hugetlb pages, the allocation of contiguous memory through
cma_alloc() may fail probabilistically.
In the CMA allocation process, if it is found that the CMA area is
occupied by in-use hugetlb folios, these in-use hugetlb folios need to be
migrated to another location. When there are no available hugetlb folios
in the free hugetlb pool during the migration of in-use hugetlb folios,
new folios are allocated from the buddy system. A temporary state is set
on the newly allocated folio. Upon completion of the hugetlb folio
migration, the temporary state is transferred from the new folios to the
old folios. Normally, when the old folios with the temporary state are
freed, they are directly released back to the buddy system. However, due to
the deferred freeing of hugetlb pages, the PageBuddy() check fails,
ultimately leading to the failure of cma_alloc().
Here is a simplified call trace illustrating the process:
cma_alloc()
->__alloc_contig_migrate_range() // Migrate in-use hugetlb folios
->unmap_and_move_huge_page()
->folio_putback_hugetlb() // Free old folios
->test_pages_isolated()
->__test_page_isolated_in_pageblock()
->PageBuddy(page) // Check if the page is in buddy
To resolve this issue, we have implemented a function named
wait_for_freed_hugetlb_folios(). This function ensures that the hugetlb
folios are properly released back to the buddy system after their
migration is completed. By invoking wait_for_freed_hugetlb_folios()
before calling PageBuddy(), we ensure that PageBuddy() will succeed.
Link: https://lkml.kernel.org/r/1739936804-18199-1-git-send-email-yangge1116@126.…
Fixes: c77c0a8ac4c5 ("mm/hugetlb: defer freeing of huge pages if in non-task context")
Signed-off-by: Ge Yang <yangge1116(a)126.com>
Reviewed-by: Muchun Song <muchun.song(a)linux.dev>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Barry Song <21cnbao(a)gmail.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index ec8c0ccc8f95..dbe76d4f1bfc 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -682,6 +682,7 @@ struct huge_bootmem_page {
int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn);
+void wait_for_freed_hugetlb_folios(void);
struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
unsigned long addr, bool cow_from_owner);
struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
@@ -1066,6 +1067,10 @@ static inline int replace_free_hugepage_folios(unsigned long start_pfn,
return 0;
}
+static inline void wait_for_freed_hugetlb_folios(void)
+{
+}
+
static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
unsigned long addr,
bool cow_from_owner)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 163190e89ea1..811b29f77abf 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2943,6 +2943,14 @@ int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn)
return ret;
}
+void wait_for_freed_hugetlb_folios(void)
+{
+ if (llist_empty(&hpage_freelist))
+ return;
+
+ flush_work(&free_hpage_work);
+}
+
typedef enum {
/*
* For either 0/1: we checked the per-vma resv map, and one resv
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index c608e9d72865..a051a29e95ad 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -607,6 +607,16 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
struct zone *zone;
int ret;
+ /*
+ * Due to the deferred freeing of hugetlb folios, the hugepage folios may
+ * not immediately release to the buddy system. This can cause PageBuddy()
+ * to fail in __test_page_isolated_in_pageblock(). To ensure that the
+ * hugetlb folios are properly released back to the buddy system, we
+ * invoke the wait_for_freed_hugetlb_folios() function to wait for the
+ * release to complete.
+ */
+ wait_for_freed_hugetlb_folios();
+
/*
* Note: pageblock_nr_pages != MAX_PAGE_ORDER. Then, chunks of free
* pages are not aligned to pageblock_nr_pages.
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 058313515d5aab10d0a01dd634f92ed4a4e71d4c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030954-polish-overeater-d2be@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 058313515d5aab10d0a01dd634f92ed4a4e71d4c Mon Sep 17 00:00:00 2001
From: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Date: Tue, 25 Feb 2025 17:52:55 +0800
Subject: [PATCH] mm: shmem: fix potential data corruption during shmem swapin
Alex and Kairui reported some issues (system hang or data corruption) when
swapping out or swapping in large shmem folios. This is especially easy
to reproduce when the tmpfs is mounted with the 'huge=within_size'
parameter. Thanks to Kairui's reproducer, the issue can be easily
replicated.
The root cause of the problem is that swap readahead may asynchronously
swap in order 0 folios into the swap cache, while the shmem mapping can
still store large swap entries. Then an order 0 folio is inserted into
the shmem mapping without splitting the large swap entry, which overwrites
the original large swap entry, leading to data corruption.
When getting a folio from the swap cache, we should split the large swap
entry stored in the shmem mapping if the orders do not match, to fix this
issue.
Link: https://lkml.kernel.org/r/2fe47c557e74e9df5fe2437ccdc6c9115fa1bf70.17404769…
Fixes: 809bc86517cc ("mm: shmem: support large folio swap out")
Signed-off-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Reported-by: Alex Xu (Hello71) <alex_y_xu(a)yahoo.ca>
Reported-by: Kairui Song <ryncsn(a)gmail.com>
Closes: https://lore.kernel.org/all/1738717785.im3r5g2vxc.none@localhost/
Tested-by: Kairui Song <kasong(a)tencent.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Lance Yang <ioworker0(a)gmail.com>
Cc: Matthew Wilcow <willy(a)infradead.org>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/shmem.c b/mm/shmem.c
index 4ea6109a8043..cebbac97a221 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2253,7 +2253,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
struct folio *folio = NULL;
bool skip_swapcache = false;
swp_entry_t swap;
- int error, nr_pages;
+ int error, nr_pages, order, split_order;
VM_BUG_ON(!*foliop || !xa_is_value(*foliop));
swap = radix_to_swp_entry(*foliop);
@@ -2272,10 +2272,9 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
/* Look it up and read it in.. */
folio = swap_cache_get_folio(swap, NULL, 0);
+ order = xa_get_order(&mapping->i_pages, index);
if (!folio) {
- int order = xa_get_order(&mapping->i_pages, index);
bool fallback_order0 = false;
- int split_order;
/* Or update major stats only when swapin succeeds?? */
if (fault_type) {
@@ -2339,6 +2338,29 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
error = -ENOMEM;
goto failed;
}
+ } else if (order != folio_order(folio)) {
+ /*
+ * Swap readahead may swap in order 0 folios into swapcache
+ * asynchronously, while the shmem mapping can still stores
+ * large swap entries. In such cases, we should split the
+ * large swap entry to prevent possible data corruption.
+ */
+ split_order = shmem_split_large_entry(inode, index, swap, gfp);
+ if (split_order < 0) {
+ error = split_order;
+ goto failed;
+ }
+
+ /*
+ * If the large swap entry has already been split, it is
+ * necessary to recalculate the new swap entry based on
+ * the old order alignment.
+ */
+ if (split_order > 0) {
+ pgoff_t offset = index - round_down(index, 1 << split_order);
+
+ swap = swp_entry(swp_type(swap), swp_offset(swap) + offset);
+ }
}
alloced:
@@ -2346,7 +2368,8 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
folio_lock(folio);
if ((!skip_swapcache && !folio_test_swapcache(folio)) ||
folio->swap.val != swap.val ||
- !shmem_confirm_swap(mapping, index, swap)) {
+ !shmem_confirm_swap(mapping, index, swap) ||
+ xa_get_order(&mapping->i_pages, index) != folio_order(folio)) {
error = -EEXIST;
goto unlock;
}
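The swap-offset recalculation added above is plain arithmetic: after a large
swap entry of order split_order has been split, the order-0 entry for a given
page index sits at the old swap offset plus the page's distance from the
start of the former large entry. A standalone C sketch of the same
calculation (illustrative values only, not kernel code):

#include <stdio.h>

static unsigned long round_down_ul(unsigned long x, unsigned long a)
{
	return x & ~(a - 1);            /* a must be a power of two */
}

int main(void)
{
	unsigned long index = 0x153;    /* faulting page index (example)           */
	int split_order = 4;            /* former large entry covered 1 << 4 pages */
	unsigned long swp_off = 0x900;  /* swap offset stored in the large entry   */

	/* index of the first page the former large entry covered */
	unsigned long base = round_down_ul(index, 1UL << split_order);

	/* same recalculation as shmem_swapin_folio() does after the split */
	unsigned long offset = index - base;
	unsigned long new_off = swp_off + offset;

	printf("base=0x%lx offset=%lu new swap offset=0x%lx\n",
	       base, offset, new_off);
	return 0;
}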
The patch below does not apply to the 6.13-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.13.y
git checkout FETCH_HEAD
git cherry-pick -x 058313515d5aab10d0a01dd634f92ed4a4e71d4c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030953-alkalize-eardrum-de40@gregkh' --subject-prefix 'PATCH 6.13.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 058313515d5aab10d0a01dd634f92ed4a4e71d4c Mon Sep 17 00:00:00 2001
From: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Date: Tue, 25 Feb 2025 17:52:55 +0800
Subject: [PATCH] mm: shmem: fix potential data corruption during shmem swapin
Alex and Kairui reported some issues (system hang or data corruption) when
swapping out or swapping in large shmem folios. This is especially easy
to reproduce when the tmpfs is mounted with the 'huge=within_size'
parameter. Thanks to Kairui's reproducer, the issue can be easily
replicated.
The root cause of the problem is that swap readahead may asynchronously
swap in order 0 folios into the swap cache, while the shmem mapping can
still store large swap entries. Then an order 0 folio is inserted into
the shmem mapping without splitting the large swap entry, which overwrites
the original large swap entry, leading to data corruption.
When getting a folio from the swap cache, we should split the large swap
entry stored in the shmem mapping if the orders do not match, to fix this
issue.
Link: https://lkml.kernel.org/r/2fe47c557e74e9df5fe2437ccdc6c9115fa1bf70.17404769…
Fixes: 809bc86517cc ("mm: shmem: support large folio swap out")
Signed-off-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Reported-by: Alex Xu (Hello71) <alex_y_xu(a)yahoo.ca>
Reported-by: Kairui Song <ryncsn(a)gmail.com>
Closes: https://lore.kernel.org/all/1738717785.im3r5g2vxc.none@localhost/
Tested-by: Kairui Song <kasong(a)tencent.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Lance Yang <ioworker0(a)gmail.com>
Cc: Matthew Wilcow <willy(a)infradead.org>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/shmem.c b/mm/shmem.c
index 4ea6109a8043..cebbac97a221 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2253,7 +2253,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
struct folio *folio = NULL;
bool skip_swapcache = false;
swp_entry_t swap;
- int error, nr_pages;
+ int error, nr_pages, order, split_order;
VM_BUG_ON(!*foliop || !xa_is_value(*foliop));
swap = radix_to_swp_entry(*foliop);
@@ -2272,10 +2272,9 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
/* Look it up and read it in.. */
folio = swap_cache_get_folio(swap, NULL, 0);
+ order = xa_get_order(&mapping->i_pages, index);
if (!folio) {
- int order = xa_get_order(&mapping->i_pages, index);
bool fallback_order0 = false;
- int split_order;
/* Or update major stats only when swapin succeeds?? */
if (fault_type) {
@@ -2339,6 +2338,29 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
error = -ENOMEM;
goto failed;
}
+ } else if (order != folio_order(folio)) {
+ /*
+ * Swap readahead may swap in order 0 folios into swapcache
+ * asynchronously, while the shmem mapping can still stores
+ * large swap entries. In such cases, we should split the
+ * large swap entry to prevent possible data corruption.
+ */
+ split_order = shmem_split_large_entry(inode, index, swap, gfp);
+ if (split_order < 0) {
+ error = split_order;
+ goto failed;
+ }
+
+ /*
+ * If the large swap entry has already been split, it is
+ * necessary to recalculate the new swap entry based on
+ * the old order alignment.
+ */
+ if (split_order > 0) {
+ pgoff_t offset = index - round_down(index, 1 << split_order);
+
+ swap = swp_entry(swp_type(swap), swp_offset(swap) + offset);
+ }
}
alloced:
@@ -2346,7 +2368,8 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
folio_lock(folio);
if ((!skip_swapcache && !folio_test_swapcache(folio)) ||
folio->swap.val != swap.val ||
- !shmem_confirm_swap(mapping, index, swap)) {
+ !shmem_confirm_swap(mapping, index, swap) ||
+ xa_get_order(&mapping->i_pages, index) != folio_order(folio)) {
error = -EEXIST;
goto unlock;
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x af288a426c3e3552b62595c6138ec6371a17dbba
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030937-relax-dubbed-d185@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From af288a426c3e3552b62595c6138ec6371a17dbba Mon Sep 17 00:00:00 2001
From: Ma Wupeng <mawupeng1(a)huawei.com>
Date: Mon, 17 Feb 2025 09:43:29 +0800
Subject: [PATCH] hwpoison, memory_hotplug: lock folio before unmap hwpoisoned
folio
Commit b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to
be offlined") added page poison checks in do_migrate_range in order to make
offlining hwpoisoned pages possible by introducing isolate_lru_page and
try_to_unmap for hwpoisoned pages. However, the folio lock must be held
before calling try_to_unmap. Add the missing lock to fix this problem.
A warning will be produced if the folio is not locked during unmap:
------------[ cut here ]------------
kernel BUG at ./include/linux/swapops.h:400!
Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
Modules linked in:
CPU: 4 UID: 0 PID: 411 Comm: bash Tainted: G W 6.13.0-rc1-00016-g3c434c7ee82a-dirty #41
Tainted: [W]=WARN
Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : try_to_unmap_one+0xb08/0xd3c
lr : try_to_unmap_one+0x3dc/0xd3c
Call trace:
try_to_unmap_one+0xb08/0xd3c (P)
try_to_unmap_one+0x3dc/0xd3c (L)
rmap_walk_anon+0xdc/0x1f8
rmap_walk+0x3c/0x58
try_to_unmap+0x88/0x90
unmap_poisoned_folio+0x30/0xa8
do_migrate_range+0x4a0/0x568
offline_pages+0x5a4/0x670
memory_block_action+0x17c/0x374
memory_subsys_offline+0x3c/0x78
device_offline+0xa4/0xd0
state_store+0x8c/0xf0
dev_attr_store+0x18/0x2c
sysfs_kf_write+0x44/0x54
kernfs_fop_write_iter+0x118/0x1a8
vfs_write+0x3a8/0x4bc
ksys_write+0x6c/0xf8
__arm64_sys_write+0x1c/0x28
invoke_syscall+0x44/0x100
el0_svc_common.constprop.0+0x40/0xe0
do_el0_svc+0x1c/0x28
el0_svc+0x30/0xd0
el0t_64_sync_handler+0xc8/0xcc
el0t_64_sync+0x198/0x19c
Code: f9407be0 b5fff320 d4210000 17ffff97 (d4210000)
---[ end trace 0000000000000000 ]---
Link: https://lkml.kernel.org/r/20250217014329.3610326-4-mawupeng1@huawei.com
Fixes: b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined")
Signed-off-by: Ma Wupeng <mawupeng1(a)huawei.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index a6abd8d4a09c..16cf9e17077e 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1832,8 +1832,11 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
(folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) {
if (WARN_ON(folio_test_lru(folio)))
folio_isolate_lru(folio);
- if (folio_mapped(folio))
+ if (folio_mapped(folio)) {
+ folio_lock(folio);
unmap_poisoned_folio(folio, pfn, false);
+ folio_unlock(folio);
+ }
goto put_folio;
}
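The hunk above enforces the usual rule that try_to_unmap()-style code must
run with the folio lock held; the BUG in swapops.h shown in the trace fires
exactly because unmap was reached on an unlocked folio. A minimal userspace
sketch of the lock-then-unmap ordering (not kernel code; the assert stands in
for the kernel BUG_ON and all names are illustrative):

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct folio {
	bool locked;
	bool mapped;
};

static void folio_lock(struct folio *f)   { f->locked = true; }
static void folio_unlock(struct folio *f) { f->locked = false; }

/* try_to_unmap()-style helper: callers must hold the folio lock */
static void unmap_poisoned(struct folio *f)
{
	assert(f->locked);      /* stands in for the BUG hit in swapops.h */
	f->mapped = false;
}

int main(void)
{
	struct folio f = { .locked = false, .mapped = true };

	/* the fixed do_migrate_range() sequence: lock, unmap, unlock */
	folio_lock(&f);
	unmap_poisoned(&f);
	folio_unlock(&f);

	printf("mapped = %d\n", f.mapped);
	return 0;
}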
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x af288a426c3e3552b62595c6138ec6371a17dbba
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030936-oink-rocklike-abc3@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From af288a426c3e3552b62595c6138ec6371a17dbba Mon Sep 17 00:00:00 2001
From: Ma Wupeng <mawupeng1(a)huawei.com>
Date: Mon, 17 Feb 2025 09:43:29 +0800
Subject: [PATCH] hwpoison, memory_hotplug: lock folio before unmap hwpoisoned
folio
Commit b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to
be offlined") added page poison checks in do_migrate_range in order to make
offlining hwpoisoned pages possible by introducing isolate_lru_page and
try_to_unmap for hwpoisoned pages. However, the folio lock must be held
before calling try_to_unmap. Add the missing lock to fix this problem.
A warning will be produced if the folio is not locked during unmap:
------------[ cut here ]------------
kernel BUG at ./include/linux/swapops.h:400!
Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
Modules linked in:
CPU: 4 UID: 0 PID: 411 Comm: bash Tainted: G W 6.13.0-rc1-00016-g3c434c7ee82a-dirty #41
Tainted: [W]=WARN
Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : try_to_unmap_one+0xb08/0xd3c
lr : try_to_unmap_one+0x3dc/0xd3c
Call trace:
try_to_unmap_one+0xb08/0xd3c (P)
try_to_unmap_one+0x3dc/0xd3c (L)
rmap_walk_anon+0xdc/0x1f8
rmap_walk+0x3c/0x58
try_to_unmap+0x88/0x90
unmap_poisoned_folio+0x30/0xa8
do_migrate_range+0x4a0/0x568
offline_pages+0x5a4/0x670
memory_block_action+0x17c/0x374
memory_subsys_offline+0x3c/0x78
device_offline+0xa4/0xd0
state_store+0x8c/0xf0
dev_attr_store+0x18/0x2c
sysfs_kf_write+0x44/0x54
kernfs_fop_write_iter+0x118/0x1a8
vfs_write+0x3a8/0x4bc
ksys_write+0x6c/0xf8
__arm64_sys_write+0x1c/0x28
invoke_syscall+0x44/0x100
el0_svc_common.constprop.0+0x40/0xe0
do_el0_svc+0x1c/0x28
el0_svc+0x30/0xd0
el0t_64_sync_handler+0xc8/0xcc
el0t_64_sync+0x198/0x19c
Code: f9407be0 b5fff320 d4210000 17ffff97 (d4210000)
---[ end trace 0000000000000000 ]---
Link: https://lkml.kernel.org/r/20250217014329.3610326-4-mawupeng1@huawei.com
Fixes: b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined")
Signed-off-by: Ma Wupeng <mawupeng1(a)huawei.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index a6abd8d4a09c..16cf9e17077e 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1832,8 +1832,11 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
(folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) {
if (WARN_ON(folio_test_lru(folio)))
folio_isolate_lru(folio);
- if (folio_mapped(folio))
+ if (folio_mapped(folio)) {
+ folio_lock(folio);
unmap_poisoned_folio(folio, pfn, false);
+ folio_unlock(folio);
+ }
goto put_folio;
}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x af288a426c3e3552b62595c6138ec6371a17dbba
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030935-pasted-diner-95df@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From af288a426c3e3552b62595c6138ec6371a17dbba Mon Sep 17 00:00:00 2001
From: Ma Wupeng <mawupeng1(a)huawei.com>
Date: Mon, 17 Feb 2025 09:43:29 +0800
Subject: [PATCH] hwpoison, memory_hotplug: lock folio before unmap hwpoisoned
folio
Commit b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to
be offlined") added page poison checks in do_migrate_range in order to make
offlining hwpoisoned pages possible by introducing isolate_lru_page and
try_to_unmap for hwpoisoned pages. However, the folio lock must be held
before calling try_to_unmap. Add the missing lock to fix this problem.
A warning will be produced if the folio is not locked during unmap:
------------[ cut here ]------------
kernel BUG at ./include/linux/swapops.h:400!
Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
Modules linked in:
CPU: 4 UID: 0 PID: 411 Comm: bash Tainted: G W 6.13.0-rc1-00016-g3c434c7ee82a-dirty #41
Tainted: [W]=WARN
Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : try_to_unmap_one+0xb08/0xd3c
lr : try_to_unmap_one+0x3dc/0xd3c
Call trace:
try_to_unmap_one+0xb08/0xd3c (P)
try_to_unmap_one+0x3dc/0xd3c (L)
rmap_walk_anon+0xdc/0x1f8
rmap_walk+0x3c/0x58
try_to_unmap+0x88/0x90
unmap_poisoned_folio+0x30/0xa8
do_migrate_range+0x4a0/0x568
offline_pages+0x5a4/0x670
memory_block_action+0x17c/0x374
memory_subsys_offline+0x3c/0x78
device_offline+0xa4/0xd0
state_store+0x8c/0xf0
dev_attr_store+0x18/0x2c
sysfs_kf_write+0x44/0x54
kernfs_fop_write_iter+0x118/0x1a8
vfs_write+0x3a8/0x4bc
ksys_write+0x6c/0xf8
__arm64_sys_write+0x1c/0x28
invoke_syscall+0x44/0x100
el0_svc_common.constprop.0+0x40/0xe0
do_el0_svc+0x1c/0x28
el0_svc+0x30/0xd0
el0t_64_sync_handler+0xc8/0xcc
el0t_64_sync+0x198/0x19c
Code: f9407be0 b5fff320 d4210000 17ffff97 (d4210000)
---[ end trace 0000000000000000 ]---
Link: https://lkml.kernel.org/r/20250217014329.3610326-4-mawupeng1@huawei.com
Fixes: b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined")
Signed-off-by: Ma Wupeng <mawupeng1(a)huawei.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index a6abd8d4a09c..16cf9e17077e 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1832,8 +1832,11 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
(folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) {
if (WARN_ON(folio_test_lru(folio)))
folio_isolate_lru(folio);
- if (folio_mapped(folio))
+ if (folio_mapped(folio)) {
+ folio_lock(folio);
unmap_poisoned_folio(folio, pfn, false);
+ folio_unlock(folio);
+ }
goto put_folio;
}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x af288a426c3e3552b62595c6138ec6371a17dbba
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025030934-clock-preview-4a7a@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From af288a426c3e3552b62595c6138ec6371a17dbba Mon Sep 17 00:00:00 2001
From: Ma Wupeng <mawupeng1(a)huawei.com>
Date: Mon, 17 Feb 2025 09:43:29 +0800
Subject: [PATCH] hwpoison, memory_hotplug: lock folio before unmap hwpoisoned
folio
Commit b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to
be offlined") added page poison checks in do_migrate_range in order to make
offlining hwpoisoned pages possible by introducing isolate_lru_page and
try_to_unmap for hwpoisoned pages. However, the folio lock must be held
before calling try_to_unmap. Add the missing lock to fix this problem.
A warning will be produced if the folio is not locked during unmap:
------------[ cut here ]------------
kernel BUG at ./include/linux/swapops.h:400!
Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
Modules linked in:
CPU: 4 UID: 0 PID: 411 Comm: bash Tainted: G W 6.13.0-rc1-00016-g3c434c7ee82a-dirty #41
Tainted: [W]=WARN
Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : try_to_unmap_one+0xb08/0xd3c
lr : try_to_unmap_one+0x3dc/0xd3c
Call trace:
try_to_unmap_one+0xb08/0xd3c (P)
try_to_unmap_one+0x3dc/0xd3c (L)
rmap_walk_anon+0xdc/0x1f8
rmap_walk+0x3c/0x58
try_to_unmap+0x88/0x90
unmap_poisoned_folio+0x30/0xa8
do_migrate_range+0x4a0/0x568
offline_pages+0x5a4/0x670
memory_block_action+0x17c/0x374
memory_subsys_offline+0x3c/0x78
device_offline+0xa4/0xd0
state_store+0x8c/0xf0
dev_attr_store+0x18/0x2c
sysfs_kf_write+0x44/0x54
kernfs_fop_write_iter+0x118/0x1a8
vfs_write+0x3a8/0x4bc
ksys_write+0x6c/0xf8
__arm64_sys_write+0x1c/0x28
invoke_syscall+0x44/0x100
el0_svc_common.constprop.0+0x40/0xe0
do_el0_svc+0x1c/0x28
el0_svc+0x30/0xd0
el0t_64_sync_handler+0xc8/0xcc
el0t_64_sync+0x198/0x19c
Code: f9407be0 b5fff320 d4210000 17ffff97 (d4210000)
---[ end trace 0000000000000000 ]---
Link: https://lkml.kernel.org/r/20250217014329.3610326-4-mawupeng1@huawei.com
Fixes: b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined")
Signed-off-by: Ma Wupeng <mawupeng1(a)huawei.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index a6abd8d4a09c..16cf9e17077e 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1832,8 +1832,11 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
(folio_test_large(folio) && folio_test_has_hwpoisoned(folio))) {
if (WARN_ON(folio_test_lru(folio)))
folio_isolate_lru(folio);
- if (folio_mapped(folio))
+ if (folio_mapped(folio)) {
+ folio_lock(folio);
unmap_poisoned_folio(folio, pfn, false);
+ folio_unlock(folio);
+ }
goto put_folio;
}