Currently the driver only configure the data edge sampling partially. The
AM62 require it to be configured in two distincts registers: one in tidss
and one in the general device registers.
Introduce a new dt property to link the proper syscon node from the main
device registers into the tidss driver.
Fixes: 32a1795f57ee ("drm/tidss: New driver for TI Keystone platform Display SubSystem")
---
Cc: stable(a)vger.kernel.org
Signed-off-by: Louis Chauvet <louis.chauvet(a)bootlin.com>
---
Louis Chauvet (4):
dt-bindings: display: ti,am65x-dss: Add clk property for data edge synchronization
dt-bindings: mfd: syscon: Add ti,am625-dss-clk-ctrl
arm64: dts: ti: k3-am62-main: Add tidss clk-ctrl property
drm/tidss: Fix sampling edge configuration
.../devicetree/bindings/display/ti/ti,am65x-dss.yaml | 6 ++++++
Documentation/devicetree/bindings/mfd/syscon.yaml | 3 ++-
arch/arm64/boot/dts/ti/k3-am62-main.dtsi | 6 ++++++
drivers/gpu/drm/tidss/tidss_dispc.c | 14 ++++++++++++++
4 files changed, 28 insertions(+), 1 deletion(-)
---
base-commit: 85c23f28905cf20a86ceec3cfd7a0a5572c9eb13
change-id: 20250730-fix-edge-handling-9123f7438910
Best regards,
--
Louis Chauvet <louis.chauvet(a)bootlin.com>
From: Rong Zhang <i(a)rong.moe>
[ Upstream commit e5d1e313d7b6272d6dfda983906d99f97ad9062b ]
The device ID of Strix Halo Data Fabric Function 3 has been in the tree
since commit 0e640f0a47d8 ("x86/amd_nb: Add new PCI IDs for AMD family
0x1a"), but is somehow missing from k10temp_id_table.
Add it so that it works out of the box.
Tested on Beelink GTR9 Pro Mini PC.
Signed-off-by: Rong Zhang <i(a)rong.moe>
Reviewed-by: Mario Limonciello <mario.limonciello(a)amd.com>
Link: https://lore.kernel.org/r/20250823180443.85512-1-i@rong.moe
Signed-off-by: Guenter Roeck <linux(a)roeck-us.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
LLM Generated explanations, may be completely bogus:
YES – the added ID lets the existing k10temp driver bind to Strix Halo’s
DF3 device so users get temperature readings on that platform.
- `drivers/hwmon/k10temp.c:560` gains
`PCI_DEVICE_ID_AMD_1AH_M70H_DF_F3`, fixing the current omission that
prevents the module from attaching to Strix Halo’s Data Fabric
function 3 and leaves its sensors unavailable.
- The constant already exists in released kernels
(`include/linux/pci_ids.h:587`) and is used by the AMD northbridge
driver (`arch/x86/kernel/amd_nb.c:98`), so the new table entry simply
connects existing infrastructure; no functional code paths change.
- Scope is minimal (one ID entry, no new logic), making regression risk
negligible; the patch has been verified on shipping hardware (Beelink
GTR9 Pro).
- For stable backports, this applies cleanly to branches ≥ v6.10 where
the PCI ID is defined; older long-term trees would first need commit
0e640f0a47d8 (or an equivalent definition).
Natural next step: backport to the relevant stable lines that already
carry the Strix Halo PCI ID definition (6.10.y, upcoming 6.11.y, etc.).
drivers/hwmon/k10temp.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/hwmon/k10temp.c b/drivers/hwmon/k10temp.c
index 2f90a2e9ad496..b98d5ec72c4ff 100644
--- a/drivers/hwmon/k10temp.c
+++ b/drivers/hwmon/k10temp.c
@@ -565,6 +565,7 @@ static const struct pci_device_id k10temp_id_table[] = {
{ PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_1AH_M20H_DF_F3) },
{ PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_1AH_M50H_DF_F3) },
{ PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_1AH_M60H_DF_F3) },
+ { PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_1AH_M70H_DF_F3) },
{ PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_1AH_M90H_DF_F3) },
{ PCI_VDEVICE(HYGON, PCI_DEVICE_ID_AMD_17H_DF_F3) },
{}
--
2.51.0
Our implementation for BAR2 (lmembar) resize works at the xe_vram layer
and only releases that BAR before resizing. That is not always
sufficient. If the parent bridge needs to move, the BAR0 also needs to
be released, otherwise the resize fails. This is the case of not having
enough space allocated from the beginning.
Also, there's a BAR0 in the upstream port of the pcie switch in BMG
preventing the resize to propagate to the bridge as previously discussed
in https://lore.kernel.org/intel-xe/20250721173057.867829-1-uwu@icenowy.me/
and https://lore.kernel.org/intel-xe/wqukxnxni2dbpdhri3cbvlrzsefgdanesgskzmxi5s…
I'm bringing that commit from Ilpo here so this can be tested with the
xe changes and propagate to stable. Note that the use of a pci fixup is
not ideal, but without intrusive changes on resource fitting it's
possibly the best alternative. I also have confirmation from HW folks
that the BAR in the upstream port has no production use.
I have more cleanups on top on the xe side, but those conflict with some
refactors Ilpo is working on as prep for the resource fitting, so I will
wait things settle to submit again.
I propose to take this through the drm tree.
With this I could resize the lmembar on some problematic hosts and after
doing an SBR, with one caveat: the audio device also prevents the BAR
from moving and it needs to be manually removed before resizing. With
the PCI refactors and BAR fitting logic that Ilpo is working on, it's
expected that it won't be needed for a long time.
Signed-off-by: Lucas De Marchi <lucas.demarchi(a)intel.com>
---
Ilpo Järvinen (1):
PCI: Release BAR0 of an integrated bridge to allow GPU BAR resize
Lucas De Marchi (1):
drm/xe: Move rebar to be done earlier
drivers/gpu/drm/xe/xe_pci.c | 2 ++
drivers/gpu/drm/xe/xe_vram.c | 34 ++++++++++++++++++++++++++--------
drivers/gpu/drm/xe/xe_vram.h | 1 +
drivers/pci/quirks.c | 23 +++++++++++++++++++++++
4 files changed, 52 insertions(+), 8 deletions(-)
base-commit: 8031d70dbb4201841897de480cec1f9750d4a5dc
change-id: 20250917-xe-pci-rebar-2-c0fe2f04c879
Lucas De Marchi
The patch titled
Subject: ocfs2: clear extent cache after moving/defragmenting extents
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
ocfs2-clear-extent-cache-after-moving-defragmenting-extents.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Deepanshu Kartikey <kartikey406(a)gmail.com>
Subject: ocfs2: clear extent cache after moving/defragmenting extents
Date: Thu, 9 Oct 2025 21:19:03 +0530
The extent map cache can become stale when extents are moved or
defragmented, causing subsequent operations to see outdated extent flags.
This triggers a BUG_ON in ocfs2_refcount_cal_cow_clusters().
The problem occurs when:
1. copy_file_range() creates a reflinked extent with OCFS2_EXT_REFCOUNTED
2. ioctl(FITRIM) triggers ocfs2_move_extents()
3. __ocfs2_move_extents_range() reads and caches the extent (flags=0x2)
4. ocfs2_move_extent()/ocfs2_defrag_extent() calls __ocfs2_move_extent()
which clears OCFS2_EXT_REFCOUNTED flag on disk (flags=0x0)
5. The extent map cache is not invalidated after the move
6. Later write() operations read stale cached flags (0x2) but disk has
updated flags (0x0), causing a mismatch
7. BUG_ON(!(rec->e_flags & OCFS2_EXT_REFCOUNTED)) triggers
Fix by clearing the extent map cache after each extent move/defrag
operation in __ocfs2_move_extents_range(). This ensures subsequent
operations read fresh extent data from disk.
Link: https://lore.kernel.org/all/20251009142917.517229-1-kartikey406@gmail.com/T/
Link: https://lkml.kernel.org/r/20251009154903.522339-1-kartikey406@gmail.com
Signed-off-by: Deepanshu Kartikey <kartikey406(a)gmail.com>
Reported-by: syzbot+6fdd8fa3380730a4b22c(a)syzkaller.appspotmail.com
Tested-by: syzbot+6fdd8fa3380730a4b22c(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?id=2959889e1f6e216585ce522f7e8bc002b46ad9…
Reviewed-by: Mark Fasheh <mark(a)fasheh.com>
Reviewed-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/move_extents.c | 5 +++++
1 file changed, 5 insertions(+)
--- a/fs/ocfs2/move_extents.c~ocfs2-clear-extent-cache-after-moving-defragmenting-extents
+++ a/fs/ocfs2/move_extents.c
@@ -867,6 +867,11 @@ static int __ocfs2_move_extents_range(st
mlog_errno(ret);
goto out;
}
+ /*
+ * Invalidate extent cache after moving/defragging to prevent
+ * stale cached data with outdated extent flags.
+ */
+ ocfs2_extent_map_trunc(inode, cpos);
context->clusters_moved += alloc_size;
next:
_
Patches currently in -mm which might be from kartikey406(a)gmail.com are
hugetlbfs-check-for-shareable-lock-before-calling-huge_pmd_unshare.patch
ocfs2-clear-extent-cache-after-moving-defragmenting-extents.patch
Hi,
This series adds support for the power domains on Google GS101. It's
fairly similar to SoCs already supported by this driver, except that
register acces does not work via plain ioremap() / readl() / writel().
Instead, the regmap created by the PMU driver must be used (which uses
Arm SMCC calls under the hood).
The DT update to add the new required properties on gs101 will be
posted separately.
Signed-off-by: André Draszik <andre.draszik(a)linaro.org>
---
Changes in v2:
- Krzysztof:
- move google,gs101-pmu binding into separate file
- mark devm_kstrdup_const() patch as fix
- use bool for need_early_sync_state
- merge patches 8 and 10 from v1 series into one patch
- collect tags
- Link to v1: https://lore.kernel.org/r/20251006-gs101-pd-v1-0-f0cb0c01ea7b@linaro.org
---
André Draszik (10):
dt-bindings: power: samsung: add google,gs101-pd
dt-bindings: soc: samsung: exynos-pmu: move gs101-pmu into separate binding
dt-bindings: soc: samsung: gs101-pmu: allow power domains as children
pmdomain: samsung: plug potential memleak during probe
pmdomain: samsung: convert to using regmap
pmdomain: samsung: convert to regmap_read_poll_timeout()
pmdomain: samsung: don't hardcode offset for registers to 0 and 4
pmdomain: samsung: selectively handle enforced sync_state
pmdomain: samsung: add support for google,gs101-pd
pmdomain: samsung: use dev_err() instead of pr_err()
.../devicetree/bindings/power/pd-samsung.yaml | 1 +
.../bindings/soc/google/google,gs101-pmu.yaml | 107 +++++++++++++++++
.../bindings/soc/samsung/exynos-pmu.yaml | 20 ----
MAINTAINERS | 1 +
drivers/pmdomain/samsung/exynos-pm-domains.c | 126 +++++++++++++++------
5 files changed, 201 insertions(+), 54 deletions(-)
---
base-commit: a5f97c90e75f09f24ece2dca34168722b140a798
change-id: 20251001-gs101-pd-d4dc97d70a84
Best regards,
--
André Draszik <andre.draszik(a)linaro.org>
For idpf:
Milena fixes a memory leak in the idpf reset logic when the driver resets
with an outstanding Tx timestamp.
For ixgbe and ixgbevf:
Jedrzej fixes an issue with reporting link speed on E610 VFs.
Jedrzej also fixes the VF mailbox API incompatibilities caused by the
confusion with API v1.4, v1.5, and v1.6. The v1.4 API introduced IPSEC
offload, but this was only supported on Linux hosts. The v1.5 API
introduced a new mailbox API which is necessary to resolve issues on ESX
hosts. The v1.6 API introduced a new link management API for E610. Jedrzej
introduces a new v1.7 API with a feature negotiation which enables properly
checking if features such as IPSEC or the ESX mailbox APIs are supported.
This resolves issues with compatibility on different hosts, and aligns the
API across hosts instead of having Linux require custom mailbox API
versions for IPSEC offload.
Koichiro fixes a KASAN use-after-free bug in ixgbe_remove().
Signed-off-by: Jacob Keller <jacob.e.keller(a)intel.com>
---
Changes in v3:
- Rebase and verified compilation issues are resolved.
- Configured b4 to exclude undeliverable addresses.
- Link to v2: https://lore.kernel.org/r/20251003-jk-iwl-net-2025-10-01-v2-0-e59b4141c1b5@…
Changes in v2:
- Drop Emil's idpf_vport_open race fix for now.
- Add my signature.
- Link to v1: https://lore.kernel.org/r/20251001-jk-iwl-net-2025-10-01-v1-0-49fa99e86600@…
---
Jedrzej Jagielski (4):
ixgbevf: fix getting link speed data for E610 devices
ixgbe: handle IXGBE_VF_GET_PF_LINK_STATE mailbox operation
ixgbevf: fix mailbox API compatibility by negotiating supported features
ixgbe: handle IXGBE_VF_FEATURES_NEGOTIATE mbox cmd
Koichiro Den (1):
ixgbe: fix too early devlink_free() in ixgbe_remove()
Milena Olech (1):
idpf: cleanup remaining SKBs in PTP flows
drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.h | 15 ++
drivers/net/ethernet/intel/ixgbevf/defines.h | 1 +
drivers/net/ethernet/intel/ixgbevf/ixgbevf.h | 7 +
drivers/net/ethernet/intel/ixgbevf/mbx.h | 8 +
drivers/net/ethernet/intel/ixgbevf/vf.h | 1 +
drivers/net/ethernet/intel/idpf/idpf_ptp.c | 3 +
.../net/ethernet/intel/idpf/idpf_virtchnl_ptp.c | 1 +
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 3 +-
drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c | 79 +++++++++
drivers/net/ethernet/intel/ixgbevf/ipsec.c | 10 ++
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 34 +++-
drivers/net/ethernet/intel/ixgbevf/vf.c | 182 +++++++++++++++++----
12 files changed, 310 insertions(+), 34 deletions(-)
---
base-commit: fea8cdf6738a8b25fccbb7b109b440795a0892cb
change-id: 20251001-jk-iwl-net-2025-10-01-92cd2a626ff7
Best regards,
--
Jacob Keller <jacob.e.keller(a)intel.com>
The patch titled
Subject: dma-debug: don't report false positives with DMA_BOUNCE_UNALIGNED_KMALLOC
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
dma-debug-dont-report-false-positives-with-dma_bounce_unaligned_kmalloc.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Marek Szyprowski <m.szyprowski(a)samsung.com>
Subject: dma-debug: don't report false positives with DMA_BOUNCE_UNALIGNED_KMALLOC
Date: Thu, 9 Oct 2025 16:15:08 +0200
Commit 370645f41e6e ("dma-mapping: force bouncing if the kmalloc() size is
not cache-line-aligned") introduced DMA_BOUNCE_UNALIGNED_KMALLOC feature
and lets architecture specific code to configure kmalloc slabs with sizes
smaller than the value of dma_get_cache_alignment().
When that feature is enabled, the physical address of some small
kmalloc()-ed buffers might be not aligned to the CPU cachelines, thus not
really suitable for typical DMA. To properly handle that case a SWIOTLB
buffer bouncing is used, so no CPU cache corruption occurs. When that
happens, there is no point reporting a false-positive DMA-API warning that
the buffer is not properly aligned, as this is not a client driver fault.
Link: https://lkml.kernel.org/r/20251009141508.2342138-1-m.szyprowski@samsung.com
Fixes: 370645f41e6e ("dma-mapping: force bouncing if the kmalloc() size is not cache-line-aligned")
Signed-off-by: Marek Szyprowski <m.szyprowski(a)samsung.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Inki Dae <m.szyprowski(a)samsung.com>
Cc: Robin Murohy <robin.murphy(a)arm.com>
Cc: "Isaac J. Manjarres" <isaacmanjarres(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/dma/debug.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
--- a/kernel/dma/debug.c~dma-debug-dont-report-false-positives-with-dma_bounce_unaligned_kmalloc
+++ a/kernel/dma/debug.c
@@ -23,6 +23,7 @@
#include <linux/ctype.h>
#include <linux/list.h>
#include <linux/slab.h>
+#include <linux/swiotlb.h>
#include <asm/sections.h>
#include "debug.h"
@@ -594,7 +595,9 @@ static void add_dma_entry(struct dma_deb
if (rc == -ENOMEM) {
pr_err_once("cacheline tracking ENOMEM, dma-debug disabled\n");
global_disable = true;
- } else if (rc == -EEXIST && !(attrs & DMA_ATTR_SKIP_CPU_SYNC)) {
+ } else if (rc == -EEXIST && !(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
+ !(IS_ENABLED(CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC) &&
+ is_swiotlb_allocated())) {
err_printk(entry->dev, entry,
"cacheline tracking EEXIST, overlapping mappings aren't supported\n");
}
_
Patches currently in -mm which might be from m.szyprowski(a)samsung.com are
dma-debug-dont-report-false-positives-with-dma_bounce_unaligned_kmalloc.patch
A few cleanups and a bugfix that are either suitable after the swap
table phase I or found during code review.
Patch 1 is a bugfix and needs to be included in the stable branch,
the rest have no behavior change.
---
Kairui Song (4):
mm, swap: do not perform synchronous discard during allocation
mm, swap: rename helper for setup bad slots
mm, swap: cleanup swap entry allocation parameter
mm/migrate, swap: drop usage of folio_index
include/linux/swap.h | 4 ++--
mm/migrate.c | 4 ++--
mm/shmem.c | 2 +-
mm/swap.h | 21 -----------------
mm/swapfile.c | 64 ++++++++++++++++++++++++++++++++++++----------------
mm/vmscan.c | 4 ++--
6 files changed, 52 insertions(+), 47 deletions(-)
---
base-commit: 53e573001f2b5168f9b65d2b79e9563a3b479c17
change-id: 20251007-swap-clean-after-swap-table-p1-b9a7635ee3fa
Best regards,
--
Kairui Song <kasong(a)tencent.com>
After the loop that converts characters to ucs2 ends, the variable i
may be greater than or equal to len. However, when checking whether the
last byte of p_cstring is NULL, the variable i is used as is, resulting
in an out-of-bounds read if i >= len.
Therefore, to prevent this, we need to modify the function to check
whether i is less than len, and if i is greater than or equal to len,
to check p_cstring[len - 1] byte.
Cc: <stable(a)vger.kernel.org>
Reported-by: syzbot+98cc76a76de46b3714d4(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=98cc76a76de46b3714d4
Fixes: 370e812b3ec1 ("exfat: add nls operations")
Signed-off-by: Jeongjun Park <aha310510(a)gmail.com>
---
fs/exfat/nls.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/exfat/nls.c b/fs/exfat/nls.c
index 8243d94ceaf4..a52f3494eb20 100644
--- a/fs/exfat/nls.c
+++ b/fs/exfat/nls.c
@@ -616,7 +616,7 @@ static int exfat_nls_to_ucs2(struct super_block *sb,
unilen++;
}
- if (p_cstring[i] != '\0')
+ if (p_cstring[min(i, len - 1)] != '\0')
lossy |= NLS_NAME_OVERLEN;
*uniname = '\0';
--
From: Steven Rostedt <rostedt(a)goodmis.org>
It was reported that using __copy_from_user_inatomic() can actually
schedule. Which is bad when preemption is disabled. Even though there's
logic to check in_atomic() is set, but this is a nop when the kernel is
configured with PREEMPT_NONE. This is due to page faulting and the code
could schedule with preemption disabled.
Link: https://lore.kernel.org/all/20250819105152.2766363-1-luogengkun@huaweicloud…
The solution was to change the __copy_from_user_inatomic() to
copy_from_user_nofault(). But then it was reported that this caused a
regression in Android. There's several applications writing into
trace_marker() in Android, but now instead of showing the expected data,
it is showing:
tracing_mark_write: <faulted>
After reverting the conversion to copy_from_user_nofault(), Android was
able to get the data again.
Writes to the trace_marker is a way to efficiently and quickly enter data
into the Linux tracing buffer. It takes no locks and was designed to be as
non-intrusive as possible. This means it cannot allocate memory, and must
use pre-allocated data.
A method that is actively being worked on to have faultable system call
tracepoints read user space data is to allocate per CPU buffers, and use
them in the callback. The method uses a technique similar to seqcount.
That is something like this:
preempt_disable();
cpu = smp_processor_id();
buffer = this_cpu_ptr(&pre_allocated_cpu_buffers, cpu);
do {
cnt = nr_context_switches_cpu(cpu);
migrate_disable();
preempt_enable();
ret = copy_from_user(buffer, ptr, size);
preempt_disable();
migrate_enable();
} while (!ret && cnt != nr_context_switches_cpu(cpu));
if (!ret)
ring_buffer_write(buffer);
preempt_enable();
It's a little more involved than that, but the above is the basic logic.
The idea is to acquire the current CPU buffer, disable migration, and then
enable preemption. At this moment, it can safely use copy_from_user().
After reading the data from user space, it disables preemption again. It
then checks to see if there was any new scheduling on this CPU. If there
was, it must assume that the buffer was corrupted by another task. If
there wasn't, then the buffer is still valid as only tasks in preemptable
context can write to this buffer and only those that are running on the
CPU.
By using this method, where trace_marker open allocates the per CPU
buffers, trace_marker writes can access user space and even fault it in,
without having to allocate or take any locks of its own.
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Luo Gengkun <luogengkun(a)huaweicloud.com>
Cc: Wattson CI <wattson-external(a)google.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Link: https://lore.kernel.org/20251008124510.6dba541a@gandalf.local.home
Fixes: 3d62ab32df065 ("tracing: Fix tracing_marker may trigger page fault during preempt_disable")
Reported-by: Runping Lai <runpinglai(a)google.com>
Tested-by: Runping Lai <runpinglai(a)google.com>
Closes: https://lore.kernel.org/linux-trace-kernel/20251007003417.3470979-2-runping…
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/trace.c | 268 +++++++++++++++++++++++++++++++++++--------
1 file changed, 220 insertions(+), 48 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index b3c94fbaf002..0fd582651293 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4791,12 +4791,6 @@ int tracing_single_release_file_tr(struct inode *inode, struct file *filp)
return single_release(inode, filp);
}
-static int tracing_mark_open(struct inode *inode, struct file *filp)
-{
- stream_open(inode, filp);
- return tracing_open_generic_tr(inode, filp);
-}
-
static int tracing_release(struct inode *inode, struct file *file)
{
struct trace_array *tr = inode->i_private;
@@ -7163,7 +7157,7 @@ tracing_free_buffer_release(struct inode *inode, struct file *filp)
#define TRACE_MARKER_MAX_SIZE 4096
-static ssize_t write_marker_to_buffer(struct trace_array *tr, const char __user *ubuf,
+static ssize_t write_marker_to_buffer(struct trace_array *tr, const char *buf,
size_t cnt, unsigned long ip)
{
struct ring_buffer_event *event;
@@ -7173,20 +7167,11 @@ static ssize_t write_marker_to_buffer(struct trace_array *tr, const char __user
int meta_size;
ssize_t written;
size_t size;
- int len;
-
-/* Used in tracing_mark_raw_write() as well */
-#define FAULTED_STR "<faulted>"
-#define FAULTED_SIZE (sizeof(FAULTED_STR) - 1) /* '\0' is already accounted for */
meta_size = sizeof(*entry) + 2; /* add '\0' and possible '\n' */
again:
size = cnt + meta_size;
- /* If less than "<faulted>", then make sure we can still add that */
- if (cnt < FAULTED_SIZE)
- size += FAULTED_SIZE - cnt;
-
buffer = tr->array_buffer.buffer;
event = __trace_buffer_lock_reserve(buffer, TRACE_PRINT, size,
tracing_gen_ctx());
@@ -7196,9 +7181,6 @@ static ssize_t write_marker_to_buffer(struct trace_array *tr, const char __user
* make it smaller and try again.
*/
if (size > ring_buffer_max_event_size(buffer)) {
- /* cnt < FAULTED size should never be bigger than max */
- if (WARN_ON_ONCE(cnt < FAULTED_SIZE))
- return -EBADF;
cnt = ring_buffer_max_event_size(buffer) - meta_size;
/* The above should only happen once */
if (WARN_ON_ONCE(cnt + meta_size == size))
@@ -7212,14 +7194,8 @@ static ssize_t write_marker_to_buffer(struct trace_array *tr, const char __user
entry = ring_buffer_event_data(event);
entry->ip = ip;
-
- len = copy_from_user_nofault(&entry->buf, ubuf, cnt);
- if (len) {
- memcpy(&entry->buf, FAULTED_STR, FAULTED_SIZE);
- cnt = FAULTED_SIZE;
- written = -EFAULT;
- } else
- written = cnt;
+ memcpy(&entry->buf, buf, cnt);
+ written = cnt;
if (tr->trace_marker_file && !list_empty(&tr->trace_marker_file->triggers)) {
/* do not add \n before testing triggers, but add \0 */
@@ -7243,6 +7219,169 @@ static ssize_t write_marker_to_buffer(struct trace_array *tr, const char __user
return written;
}
+struct trace_user_buf {
+ char *buf;
+};
+
+struct trace_user_buf_info {
+ struct trace_user_buf __percpu *tbuf;
+ int ref;
+};
+
+
+static DEFINE_MUTEX(trace_user_buffer_mutex);
+static struct trace_user_buf_info *trace_user_buffer;
+
+static void trace_user_fault_buffer_free(struct trace_user_buf_info *tinfo)
+{
+ char *buf;
+ int cpu;
+
+ for_each_possible_cpu(cpu) {
+ buf = per_cpu_ptr(tinfo->tbuf, cpu)->buf;
+ kfree(buf);
+ }
+ free_percpu(tinfo->tbuf);
+ kfree(tinfo);
+}
+
+static int trace_user_fault_buffer_enable(void)
+{
+ struct trace_user_buf_info *tinfo;
+ char *buf;
+ int cpu;
+
+ guard(mutex)(&trace_user_buffer_mutex);
+
+ if (trace_user_buffer) {
+ trace_user_buffer->ref++;
+ return 0;
+ }
+
+ tinfo = kmalloc(sizeof(*tinfo), GFP_KERNEL);
+ if (!tinfo)
+ return -ENOMEM;
+
+ tinfo->tbuf = alloc_percpu(struct trace_user_buf);
+ if (!tinfo->tbuf) {
+ kfree(tinfo);
+ return -ENOMEM;
+ }
+
+ tinfo->ref = 1;
+
+ /* Clear each buffer in case of error */
+ for_each_possible_cpu(cpu) {
+ per_cpu_ptr(tinfo->tbuf, cpu)->buf = NULL;
+ }
+
+ for_each_possible_cpu(cpu) {
+ buf = kmalloc_node(TRACE_MARKER_MAX_SIZE, GFP_KERNEL,
+ cpu_to_node(cpu));
+ if (!buf) {
+ trace_user_fault_buffer_free(tinfo);
+ return -ENOMEM;
+ }
+ per_cpu_ptr(tinfo->tbuf, cpu)->buf = buf;
+ }
+
+ trace_user_buffer = tinfo;
+
+ return 0;
+}
+
+static void trace_user_fault_buffer_disable(void)
+{
+ struct trace_user_buf_info *tinfo;
+
+ guard(mutex)(&trace_user_buffer_mutex);
+
+ tinfo = trace_user_buffer;
+
+ if (WARN_ON_ONCE(!tinfo))
+ return;
+
+ if (--tinfo->ref)
+ return;
+
+ trace_user_fault_buffer_free(tinfo);
+ trace_user_buffer = NULL;
+}
+
+/* Must be called with preemption disabled */
+static char *trace_user_fault_read(struct trace_user_buf_info *tinfo,
+ const char __user *ptr, size_t size,
+ size_t *read_size)
+{
+ int cpu = smp_processor_id();
+ char *buffer = per_cpu_ptr(tinfo->tbuf, cpu)->buf;
+ unsigned int cnt;
+ int trys = 0;
+ int ret;
+
+ if (size > TRACE_MARKER_MAX_SIZE)
+ size = TRACE_MARKER_MAX_SIZE;
+ *read_size = 0;
+
+ /*
+ * This acts similar to a seqcount. The per CPU context switches are
+ * recorded, migration is disabled and preemption is enabled. The
+ * read of the user space memory is copied into the per CPU buffer.
+ * Preemption is disabled again, and if the per CPU context switches count
+ * is still the same, it means the buffer has not been corrupted.
+ * If the count is different, it is assumed the buffer is corrupted
+ * and reading must be tried again.
+ */
+
+ do {
+ /*
+ * If for some reason, copy_from_user() always causes a context
+ * switch, this would then cause an infinite loop.
+ * If this task is preempted by another user space task, it
+ * will cause this task to try again. But just in case something
+ * changes where the copying from user space causes another task
+ * to run, prevent this from going into an infinite loop.
+ * 100 tries should be plenty.
+ */
+ if (WARN_ONCE(trys++ > 100, "Error: Too many tries to read user space"))
+ return NULL;
+
+ /* Read the current CPU context switch counter */
+ cnt = nr_context_switches_cpu(cpu);
+
+ /*
+ * Preemption is going to be enabled, but this task must
+ * remain on this CPU.
+ */
+ migrate_disable();
+
+ /*
+ * Now preemption is being enabed and another task can come in
+ * and use the same buffer and corrupt our data.
+ */
+ preempt_enable_notrace();
+
+ ret = __copy_from_user(buffer, ptr, size);
+
+ preempt_disable_notrace();
+ migrate_enable();
+
+ /* if it faulted, no need to test if the buffer was corrupted */
+ if (ret)
+ return NULL;
+
+ /*
+ * Preemption is disabled again, now check the per CPU context
+ * switch counter. If it doesn't match, then another user space
+ * process may have schedule in and corrupted our buffer. In that
+ * case the copying must be retried.
+ */
+ } while (nr_context_switches_cpu(cpu) != cnt);
+
+ *read_size = size;
+ return buffer;
+}
+
static ssize_t
tracing_mark_write(struct file *filp, const char __user *ubuf,
size_t cnt, loff_t *fpos)
@@ -7250,6 +7389,8 @@ tracing_mark_write(struct file *filp, const char __user *ubuf,
struct trace_array *tr = filp->private_data;
ssize_t written = -ENODEV;
unsigned long ip;
+ size_t size;
+ char *buf;
if (tracing_disabled)
return -EINVAL;
@@ -7263,6 +7404,16 @@ tracing_mark_write(struct file *filp, const char __user *ubuf,
if (cnt > TRACE_MARKER_MAX_SIZE)
cnt = TRACE_MARKER_MAX_SIZE;
+ /* Must have preemption disabled while having access to the buffer */
+ guard(preempt_notrace)();
+
+ buf = trace_user_fault_read(trace_user_buffer, ubuf, cnt, &size);
+ if (!buf)
+ return -EFAULT;
+
+ if (cnt > size)
+ cnt = size;
+
/* The selftests expect this function to be the IP address */
ip = _THIS_IP_;
@@ -7270,32 +7421,27 @@ tracing_mark_write(struct file *filp, const char __user *ubuf,
if (tr == &global_trace) {
guard(rcu)();
list_for_each_entry_rcu(tr, &marker_copies, marker_list) {
- written = write_marker_to_buffer(tr, ubuf, cnt, ip);
+ written = write_marker_to_buffer(tr, buf, cnt, ip);
if (written < 0)
break;
}
} else {
- written = write_marker_to_buffer(tr, ubuf, cnt, ip);
+ written = write_marker_to_buffer(tr, buf, cnt, ip);
}
return written;
}
static ssize_t write_raw_marker_to_buffer(struct trace_array *tr,
- const char __user *ubuf, size_t cnt)
+ const char *buf, size_t cnt)
{
struct ring_buffer_event *event;
struct trace_buffer *buffer;
struct raw_data_entry *entry;
ssize_t written;
- int size;
- int len;
-
-#define FAULT_SIZE_ID (FAULTED_SIZE + sizeof(int))
+ size_t size;
size = sizeof(*entry) + cnt;
- if (cnt < FAULT_SIZE_ID)
- size += FAULT_SIZE_ID - cnt;
buffer = tr->array_buffer.buffer;
@@ -7309,14 +7455,8 @@ static ssize_t write_raw_marker_to_buffer(struct trace_array *tr,
return -EBADF;
entry = ring_buffer_event_data(event);
-
- len = copy_from_user_nofault(&entry->id, ubuf, cnt);
- if (len) {
- entry->id = -1;
- memcpy(&entry->buf, FAULTED_STR, FAULTED_SIZE);
- written = -EFAULT;
- } else
- written = cnt;
+ memcpy(&entry->id, buf, cnt);
+ written = cnt;
__buffer_unlock_commit(buffer, event);
@@ -7329,8 +7469,8 @@ tracing_mark_raw_write(struct file *filp, const char __user *ubuf,
{
struct trace_array *tr = filp->private_data;
ssize_t written = -ENODEV;
-
-#define FAULT_SIZE_ID (FAULTED_SIZE + sizeof(int))
+ size_t size;
+ char *buf;
if (tracing_disabled)
return -EINVAL;
@@ -7342,6 +7482,17 @@ tracing_mark_raw_write(struct file *filp, const char __user *ubuf,
if (cnt < sizeof(unsigned int))
return -EINVAL;
+ /* Must have preemption disabled while having access to the buffer */
+ guard(preempt_notrace)();
+
+ buf = trace_user_fault_read(trace_user_buffer, ubuf, cnt, &size);
+ if (!buf)
+ return -EFAULT;
+
+ /* raw write is all or nothing */
+ if (cnt > size)
+ return -EINVAL;
+
/* The global trace_marker_raw can go to multiple instances */
if (tr == &global_trace) {
guard(rcu)();
@@ -7357,6 +7508,27 @@ tracing_mark_raw_write(struct file *filp, const char __user *ubuf,
return written;
}
+static int tracing_mark_open(struct inode *inode, struct file *filp)
+{
+ int ret;
+
+ ret = trace_user_fault_buffer_enable();
+ if (ret < 0)
+ return ret;
+
+ stream_open(inode, filp);
+ ret = tracing_open_generic_tr(inode, filp);
+ if (ret < 0)
+ trace_user_fault_buffer_disable();
+ return ret;
+}
+
+static int tracing_mark_release(struct inode *inode, struct file *file)
+{
+ trace_user_fault_buffer_disable();
+ return tracing_release_generic_tr(inode, file);
+}
+
static int tracing_clock_show(struct seq_file *m, void *v)
{
struct trace_array *tr = m->private;
@@ -7764,13 +7936,13 @@ static const struct file_operations tracing_free_buffer_fops = {
static const struct file_operations tracing_mark_fops = {
.open = tracing_mark_open,
.write = tracing_mark_write,
- .release = tracing_release_generic_tr,
+ .release = tracing_mark_release,
};
static const struct file_operations tracing_mark_raw_fops = {
.open = tracing_mark_open,
.write = tracing_mark_raw_write,
- .release = tracing_release_generic_tr,
+ .release = tracing_mark_release,
};
static const struct file_operations trace_clock_fops = {
--
2.51.0
From: Ankit Khushwaha <ankitkhushwaha.linux(a)gmail.com>
The return value from `__rb_map_vma()`, which rejects writable or
executable mappings (VM_WRITE, VM_EXEC, or !VM_MAYSHARE), was being
ignored. As a result the caller of `__rb_map_vma` always returned 0
even when the mapping had actually failed, allowing it to proceed
with an invalid VMA.
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Link: https://lore.kernel.org/20251008172516.20697-1-ankitkhushwaha.linux@gmail.c…
Fixes: 117c39200d9d7 ("ring-buffer: Introducing ring-buffer mapping functions")
Reported-by: syzbot+ddc001b92c083dbf2b97(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?id=194151be8eaebd826005329b2e123aecae714b…
Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux(a)gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/ring_buffer.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 43460949ad3f..1244d2c5c384 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -7273,7 +7273,7 @@ int ring_buffer_map(struct trace_buffer *buffer, int cpu,
atomic_dec(&cpu_buffer->resize_disabled);
}
- return 0;
+ return err;
}
int ring_buffer_unmap(struct trace_buffer *buffer, int cpu)
--
2.51.0
From: Steven Rostedt <rostedt(a)goodmis.org>
The functions irqsoff_graph_entry() and irqsoff_graph_return() both call
func_prolog_dec() that will test if the data->disable is already set and
if not, increment it and return. If it was set, it returns false and the
caller exits.
The caller of this function must decrement the disable counter, but misses
doing so if the calltime fails to be acquired.
Instead of exiting out when calltime is NULL, change the logic to do the
work if it is not NULL and still do the clean up at the end of the
function if it is NULL.
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Link: https://lore.kernel.org/20251008114943.6f60f30f@gandalf.local.home
Fixes: a485ea9e3ef3 ("tracing: Fix irqsoff and wakeup latency tracers when using function graph")
Reported-by: Sasha Levin <sashal(a)kernel.org>
Closes: https://lore.kernel.org/linux-trace-kernel/20251006175848.1906912-2-sashal@…
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/trace_irqsoff.c | 23 ++++++++++-------------
1 file changed, 10 insertions(+), 13 deletions(-)
diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
index 5496758b6c76..4c45c49b06c8 100644
--- a/kernel/trace/trace_irqsoff.c
+++ b/kernel/trace/trace_irqsoff.c
@@ -184,7 +184,7 @@ static int irqsoff_graph_entry(struct ftrace_graph_ent *trace,
unsigned long flags;
unsigned int trace_ctx;
u64 *calltime;
- int ret;
+ int ret = 0;
if (ftrace_graph_ignore_func(gops, trace))
return 0;
@@ -202,13 +202,11 @@ static int irqsoff_graph_entry(struct ftrace_graph_ent *trace,
return 0;
calltime = fgraph_reserve_data(gops->idx, sizeof(*calltime));
- if (!calltime)
- return 0;
-
- *calltime = trace_clock_local();
-
- trace_ctx = tracing_gen_ctx_flags(flags);
- ret = __trace_graph_entry(tr, trace, trace_ctx);
+ if (calltime) {
+ *calltime = trace_clock_local();
+ trace_ctx = tracing_gen_ctx_flags(flags);
+ ret = __trace_graph_entry(tr, trace, trace_ctx);
+ }
local_dec(&data->disabled);
return ret;
@@ -233,11 +231,10 @@ static void irqsoff_graph_return(struct ftrace_graph_ret *trace,
rettime = trace_clock_local();
calltime = fgraph_retrieve_data(gops->idx, &size);
- if (!calltime)
- return;
-
- trace_ctx = tracing_gen_ctx_flags(flags);
- __trace_graph_return(tr, trace, trace_ctx, *calltime, rettime);
+ if (calltime) {
+ trace_ctx = tracing_gen_ctx_flags(flags);
+ __trace_graph_return(tr, trace, trace_ctx, *calltime, rettime);
+ }
local_dec(&data->disabled);
}
--
2.51.0
From: Steven Rostedt <rostedt(a)goodmis.org>
The functions wakeup_graph_entry() and wakeup_graph_return() both call
func_prolog_preempt_disable() that will test if the data->disable is
already set and if not, increment it and disable preemption. If it was
set, it returns false and the caller exits.
The caller of this function must decrement the disable counter, but misses
doing so if the calltime fails to be acquired.
Instead of exiting out when calltime is NULL, change the logic to do the
work if it is not NULL and still do the clean up at the end of the
function if it is NULL.
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Link: https://lore.kernel.org/20251008114835.027b878a@gandalf.local.home
Fixes: a485ea9e3ef3 ("tracing: Fix irqsoff and wakeup latency tracers when using function graph")
Reported-by: Sasha Levin <sashal(a)kernel.org>
Closes: https://lore.kernel.org/linux-trace-kernel/20251006175848.1906912-1-sashal@…
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/trace_sched_wakeup.c | 16 ++++++----------
1 file changed, 6 insertions(+), 10 deletions(-)
diff --git a/kernel/trace/trace_sched_wakeup.c b/kernel/trace/trace_sched_wakeup.c
index bf1cb80742ae..e3f2e4f56faa 100644
--- a/kernel/trace/trace_sched_wakeup.c
+++ b/kernel/trace/trace_sched_wakeup.c
@@ -138,12 +138,10 @@ static int wakeup_graph_entry(struct ftrace_graph_ent *trace,
return 0;
calltime = fgraph_reserve_data(gops->idx, sizeof(*calltime));
- if (!calltime)
- return 0;
-
- *calltime = trace_clock_local();
-
- ret = __trace_graph_entry(tr, trace, trace_ctx);
+ if (calltime) {
+ *calltime = trace_clock_local();
+ ret = __trace_graph_entry(tr, trace, trace_ctx);
+ }
local_dec(&data->disabled);
preempt_enable_notrace();
@@ -169,12 +167,10 @@ static void wakeup_graph_return(struct ftrace_graph_ret *trace,
rettime = trace_clock_local();
calltime = fgraph_retrieve_data(gops->idx, &size);
- if (!calltime)
- return;
+ if (calltime)
+ __trace_graph_return(tr, trace, trace_ctx, *calltime, rettime);
- __trace_graph_return(tr, trace, trace_ctx, *calltime, rettime);
local_dec(&data->disabled);
-
preempt_enable_notrace();
return;
}
--
2.51.0
Hello.
We have observed a huge latency increase using `fork()` after ingesting the CVE-2025-38085 fix which leads to the commit `1013af4f585f: mm/hugetlb: fix huge_pmd_unshare() vs GUP-fast race`. On large machines with 1.5TB of memory with 196 cores, we identified mmapping of 1.2TB of shared memory and forking itself dozens or hundreds of times we see a increase of execution times of a factor of 4. The reproducer is at the end of the email.
Comparing the a kernel without this patch with a kernel with this patch applied when spawning 1000 children we see those execution times:
Patched kernel:
$ time make stress
...
real 0m11.275s
user 0m0.177s
sys 0m23.905s
Original kernel :
$ time make stress
...real 0m2.475s
user 0m1.398s
sys 0m2.501s
The patch in question: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…
My observation/assumption is:
each child touches 100 random pages and despawns
on each despawn `huge_pmd_unshare()` is called
each call to `huge_pmd_unshare()` syncrhonizes all threads using `tlb_remove_table_sync_one()` leading to the regression
I'm happy to provide more information.
Thank you
Stanislav Uschakow
=== Reproducer ===
Setup:
#!/bin/bash
echo "Setting up hugepages for reproduction..."
# hugepages (1.2TB / 2MB = 614400 pages)
REQUIRED_PAGES=614400
# Check current hugepage allocation
CURRENT_PAGES=$(cat /proc/sys/vm/nr_hugepages)
echo "Current hugepages: $CURRENT_PAGES"
if [ "$CURRENT_PAGES" -lt "$REQUIRED_PAGES" ]; then
echo "Allocating $REQUIRED_PAGES hugepages..."
echo $REQUIRED_PAGES | sudo tee /proc/sys/vm/nr_hugepages
ALLOCATED=$(cat /proc/sys/vm/nr_hugepages)
echo "Allocated hugepages: $ALLOCATED"
if [ "$ALLOCATED" -lt "$REQUIRED_PAGES" ]; then
echo "Warning: Could not allocate all required hugepages"
echo "Available: $ALLOCATED, Required: $REQUIRED_PAGES"
fi
fi
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
echo -e "\nHugepage information:"
cat /proc/meminfo | grep -i huge
echo -e "\nSetup complete. You can now run the reproduction test."
Makefile:
CXX = gcc
CXXFLAGS = -O2 -Wall
TARGET = hugepage_repro
SOURCE = hugepage_repro.c
$(TARGET): $(SOURCE)
$(CXX) $(CXXFLAGS) -o $(TARGET) $(SOURCE)
clean:
rm -f $(TARGET)
setup:
chmod +x setup_hugepages.sh
./setup_hugepages.sh
test: $(TARGET)
./$(TARGET) 20 3
stress: $(TARGET)
./$(TARGET) 1000 1
.PHONY: clean setup test stress
hugepage_repro.c:
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <stdio.h>
#define HUGEPAGE_SIZE (2 * 1024 * 1024) // 2MB
#define TOTAL_SIZE (1200ULL * 1024 * 1024 * 1024) // 1.2TB
#define NUM_HUGEPAGES (TOTAL_SIZE / HUGEPAGE_SIZE)
void* create_hugepage_mapping() {
void* addr = mmap(NULL, TOTAL_SIZE, PROT_READ | PROT_WRITE,
MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
if (addr == MAP_FAILED) {
perror("mmap hugepages failed");
exit(1);
}
return addr;
}
void touch_random_pages(void* addr, int num_touches) {
char* base = (char*)addr;
for (int i = 0; i < num_touches; ++i) {
size_t offset = (rand() % NUM_HUGEPAGES) * HUGEPAGE_SIZE;
volatile char val = base[offset];
(void)val;
}
}
void child_process(void* shared_mem, int child_id) {
struct timespec start, end;
clock_gettime(CLOCK_MONOTONIC, &start);
touch_random_pages(shared_mem, 100);
clock_gettime(CLOCK_MONOTONIC, &end);
long duration = (end.tv_sec - start.tv_sec) * 1000000 +
(end.tv_nsec - start.tv_nsec) / 1000;
printf("Child %d completed in %ld μs\n", child_id, duration);
}
int main(int argc, char* argv[]) {
int num_processes = argc > 1 ? atoi(argv[1]) : 50;
int iterations = argc > 2 ? atoi(argv[2]) : 5;
printf("Creating %lluGB hugepage mapping...\n", TOTAL_SIZE / (1024*1024*1024));
void* shared_mem = create_hugepage_mapping();
for (int iter = 0; iter < iterations; ++iter) {
printf("\nIteration %d: Forking %d processes\n", iter + 1, num_processes);
pid_t children[num_processes];
struct timespec iter_start, iter_end;
clock_gettime(CLOCK_MONOTONIC, &iter_start);
for (int i = 0; i < num_processes; ++i) {
pid_t pid = fork();
if (pid == 0) {
child_process(shared_mem, i);
exit(0);
} else if (pid > 0) {
children[i] = pid;
}
}
for (int i = 0; i < num_processes; ++i) {
waitpid(children[i], NULL, 0);
}
clock_gettime(CLOCK_MONOTONIC, &iter_end);
long iter_duration = (iter_end.tv_sec - iter_start.tv_sec) * 1000 +
(iter_end.tv_nsec - iter_start.tv_nsec) / 1000000;
printf("Iteration completed in %ld ms\n", iter_duration);
}
munmap(shared_mem, TOTAL_SIZE);
return 0;
}
Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
From: Shin'ichiro Kawasaki <shinichiro.kawasaki(a)wdc.com>
For DMA initialization to work across all EPC drivers, the DMA
initialization has to be done in the .init() callback.
This is because not all EPC drivers will have a refclock (which is often
needed to access registers of a DMA controller embedded in a PCIe
controller) at the time the .bind() callback is called.
However, all EPC drivers are guaranteed to have a refclock by the time
the .init() callback is called.
Thus, move the DMA initialization to the .init() callback.
This change was already done for other EPF drivers in
commit 60bd3e039aa2 ("PCI: endpoint: pci-epf-{mhi/test}: Move DMA
initialization to EPC init callback").
Cc: stable(a)vger.kernel.org
Fixes: 0faa0fe6f90e ("nvmet: New NVMe PCI endpoint function target driver")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki(a)wdc.com>
Signed-off-by: Niklas Cassel <cassel(a)kernel.org>
---
drivers/nvme/target/pci-epf.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/target/pci-epf.c b/drivers/nvme/target/pci-epf.c
index 2e78397a7373a..9c5b0f78ce8df 100644
--- a/drivers/nvme/target/pci-epf.c
+++ b/drivers/nvme/target/pci-epf.c
@@ -2325,6 +2325,8 @@ static int nvmet_pci_epf_epc_init(struct pci_epf *epf)
return ret;
}
+ nvmet_pci_epf_init_dma(nvme_epf);
+
/* Set device ID, class, etc. */
epf->header->vendorid = ctrl->tctrl->subsys->vendor_id;
epf->header->subsys_vendor_id = ctrl->tctrl->subsys->subsys_vendor_id;
@@ -2422,8 +2424,6 @@ static int nvmet_pci_epf_bind(struct pci_epf *epf)
if (ret)
return ret;
- nvmet_pci_epf_init_dma(nvme_epf);
-
return 0;
}
--
2.51.0
The previous timeout of 500us seems to be too small; panning the map in
the Roll20 VTT in Firefox on a KDE/Wayland desktop reliably triggered
timeouts within a few seconds of usage, causing the monitor to freeze
and the following to be printed to dmesg:
[Jul30 13:44] xe 0000:03:00.0: [drm] *ERROR* GT0: Global invalidation timeout
[Jul30 13:48] xe 0000:03:00.0: [drm] *ERROR* [CRTC:82:pipe A] flip_done timed out
I haven't hit a single timeout since increasing it to 1000us even after
several multi-hour testing sessions.
Fixes: c0114fdf6d4a ("drm/xe: Move DSB l2 flush to a more sensible place")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5710
Signed-off-by: Kenneth Graunke <kenneth(a)whitecape.org>
Cc: stable(a)vger.kernel.org
Cc: Maarten Lankhorst <dev(a)lankhorst.se>
---
drivers/gpu/drm/xe/xe_device.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
This fixes my desktop which has been broken since 6.15. Given that
https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6097 was recently
filed and they seem to need a timeout of 2000 (and are having somewhat
different issues), maybe more work's needed here...but I figured I'd
send out the fix for my system and let xe folks figure out what they'd
like to do. Thanks :)
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index a4d12ee7d575..6339b8800914 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -1064,7 +1064,7 @@ void xe_device_l2_flush(struct xe_device *xe)
spin_lock(>->global_invl_lock);
xe_mmio_write32(>->mmio, XE2_GLOBAL_INVAL, 0x1);
- if (xe_mmio_wait32(>->mmio, XE2_GLOBAL_INVAL, 0x1, 0x0, 500, NULL, true))
+ if (xe_mmio_wait32(>->mmio, XE2_GLOBAL_INVAL, 0x1, 0x0, 1000, NULL, true))
xe_gt_err_once(gt, "Global invalidation timeout\n");
spin_unlock(>->global_invl_lock);
--
2.51.0
This fix regressed the original issue that commit d83c747a1225
("drm/amd/display: Fix brightness level not retained over reboot") solved,
so revert it until a different approach to solve the regression that
it caused with AMD_PRIVATE_COLOR is found.
Fixes: a490c8d77d50 ("drm/amd/display: Only restore backlight after amdgpu_dm_init or dm_resume")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4620
Cc: stable(a)vger.kernel.org
Signed-off-by: Matthew Schwartz <matthew.schwartz(a)linux.dev>
---
v1 -> v2:
- Fix missing stable tag
---
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 12 ++++--------
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h | 7 -------
2 files changed, 4 insertions(+), 15 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 8e1622bf7a42..21281e684b84 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -2081,8 +2081,6 @@ static int amdgpu_dm_init(struct amdgpu_device *adev)
dc_hardware_init(adev->dm.dc);
- adev->dm.restore_backlight = true;
-
adev->dm.hpd_rx_offload_wq = hpd_rx_irq_create_workqueue(adev);
if (!adev->dm.hpd_rx_offload_wq) {
drm_err(adev_to_drm(adev), "failed to create hpd rx offload workqueue.\n");
@@ -3438,7 +3436,6 @@ static int dm_resume(struct amdgpu_ip_block *ip_block)
dc_set_power_state(dm->dc, DC_ACPI_CM_POWER_STATE_D0);
dc_resume(dm->dc);
- adev->dm.restore_backlight = true;
amdgpu_dm_irq_resume_early(adev);
@@ -9965,6 +9962,7 @@ static void amdgpu_dm_commit_streams(struct drm_atomic_state *state,
bool mode_set_reset_required = false;
u32 i;
struct dc_commit_streams_params params = {dc_state->streams, dc_state->stream_count};
+ bool set_backlight_level = false;
/* Disable writeback */
for_each_old_connector_in_state(state, connector, old_con_state, i) {
@@ -10084,6 +10082,7 @@ static void amdgpu_dm_commit_streams(struct drm_atomic_state *state,
acrtc->hw_mode = new_crtc_state->mode;
crtc->hwmode = new_crtc_state->mode;
mode_set_reset_required = true;
+ set_backlight_level = true;
} else if (modereset_required(new_crtc_state)) {
drm_dbg_atomic(dev,
"Atomic commit: RESET. crtc id %d:[%p]\n",
@@ -10140,16 +10139,13 @@ static void amdgpu_dm_commit_streams(struct drm_atomic_state *state,
* to fix a flicker issue.
* It will cause the dm->actual_brightness is not the current panel brightness
* level. (the dm->brightness is the correct panel level)
- * So we set the backlight level with dm->brightness value after initial
- * set mode. Use restore_backlight flag to avoid setting backlight level
- * for every subsequent mode set.
+ * So we set the backlight level with dm->brightness value after set mode
*/
- if (dm->restore_backlight) {
+ if (set_backlight_level) {
for (i = 0; i < dm->num_of_edps; i++) {
if (dm->backlight_dev[i])
amdgpu_dm_backlight_set_level(dm, i, dm->brightness[i]);
}
- dm->restore_backlight = false;
}
}
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
index 009f206226f0..db75e991ac7b 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
@@ -630,13 +630,6 @@ struct amdgpu_display_manager {
*/
u32 actual_brightness[AMDGPU_DM_MAX_NUM_EDP];
- /**
- * @restore_backlight:
- *
- * Flag to indicate whether to restore backlight after modeset.
- */
- bool restore_backlight;
-
/**
* @aux_hpd_discon_quirk:
*
--
2.51.0
This is the start of the stable review cycle for the 6.12.51 release.
There are 10 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun, 05 Oct 2025 16:02:25 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.12.51-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.12.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.12.51-rc1
Srinivas Kandagatla <srinivas.kandagatla(a)oss.qualcomm.com>
ASoC: qcom: audioreach: fix potential null pointer dereference
Matvey Kovalev <matvey.kovalev(a)ispras.ru>
wifi: ath11k: fix NULL dereference in ath11k_qmi_m3_load()
Charan Teja Kalla <charan.kalla(a)oss.qualcomm.com>
mm: swap: check for stable address space before operating on the VMA
Thadeu Lima de Souza Cascardo <cascardo(a)igalia.com>
media: uvcvideo: Mark invalid entities with id UVC_INVALID_ENTITY_ID
Larshin Sergey <Sergey.Larshin(a)kaspersky.com>
media: rc: fix races with imon_disconnect()
Duoming Zhou <duoming(a)zju.edu.cn>
media: tuner: xc5000: Fix use-after-free in xc5000_release
Duoming Zhou <duoming(a)zju.edu.cn>
media: b2c2: Fix use-after-free causing by irq_check_work in flexcop_pci_remove
Wang Haoran <haoranwangsec(a)gmail.com>
scsi: target: target_core_configfs: Add length check to avoid buffer overflow
Kees Cook <kees(a)kernel.org>
gcc-plugins: Remove TODO_verify_il for GCC >= 16
Breno Leitao <leitao(a)debian.org>
crypto: sha256 - fix crash at kexec
-------------
Diffstat:
Makefile | 4 +-
drivers/media/pci/b2c2/flexcop-pci.c | 2 +-
drivers/media/rc/imon.c | 27 +++++++++----
drivers/media/tuners/xc5000.c | 2 +-
drivers/media/usb/uvc/uvc_driver.c | 73 ++++++++++++++++++++++-------------
drivers/media/usb/uvc/uvcvideo.h | 2 +
drivers/net/wireless/ath/ath11k/qmi.c | 2 +-
drivers/target/target_core_configfs.c | 2 +-
include/crypto/sha256_base.h | 2 +-
mm/swapfile.c | 3 ++
scripts/gcc-plugins/gcc-common.h | 7 ++++
sound/soc/qcom/qdsp6/topology.c | 4 +-
12 files changed, 87 insertions(+), 43 deletions(-)
Backport of the two riscv mmap patches from master. In effect, these two
patches removes arch_get_mmap_{base,end} for riscv.
Guo Ren: Please take a look. Patch 1 has a slightly non-trivial conflict
with your commit 97b7ac69be2e ("riscv: mm: Fixup compat
arch_get_mmap_end"), which changed STACK_TOP_MAX from TASK_SIZE_64 to
TASK_SIZE when CONFIG_64BIT=y. This shouldn't be a problem, but, well,
just to be safe.
---
Charlie Jenkins (2):
riscv: mm: Use hint address in mmap if available
riscv: mm: Do not restrict mmap address based on hint
arch/riscv/include/asm/processor.h | 33 +++++----------------------------
1 file changed, 5 insertions(+), 28 deletions(-)
---
base-commit: 60a9e718726fa7019ae00916e4b1c52498da5b60
change-id: 20250917-riscv-mmap-addr-space-6-6-15e7db6b5db6
Best regards,
--
Vivian "dramforever" Wang
From: Steve Wilkins <steve.wilkins(a)raymarine.com>
[ Upstream commit 9cf71eb0faef4bff01df4264841b8465382d7927 ]
While transmitting with rx_len == 0, the RX FIFO is not going to be
emptied in the interrupt handler. A subsequent transfer could then
read crap from the previous transfer out of the RX FIFO into the
start RX buffer. The core provides a register that will empty the RX and
TX FIFOs, so do that before each transfer.
Fixes: 9ac8d17694b6 ("spi: add support for microchip fpga spi controllers")
Signed-off-by: Steve Wilkins <steve.wilkins(a)raymarine.com>
Signed-off-by: Conor Dooley <conor.dooley(a)microchip.com>
Link: https://patch.msgid.link/20240715-flammable-provoke-459226d08e70@wendy
Signed-off-by: Mark Brown <broonie(a)kernel.org>
[Minor conflict resolved due to code context change.]
Signed-off-by: Jianqi Ren <jianqi.ren.cn(a)windriver.com>
Signed-off-by: He Zhe <zhe.he(a)windriver.com>
---
Verified the build test
---
drivers/spi/spi-microchip-core.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/spi/spi-microchip-core.c b/drivers/spi/spi-microchip-core.c
index bfad0fe743ad..acc05f5a929e 100644
--- a/drivers/spi/spi-microchip-core.c
+++ b/drivers/spi/spi-microchip-core.c
@@ -91,6 +91,8 @@
#define REG_CONTROL2 (0x28)
#define REG_COMMAND (0x2c)
#define COMMAND_CLRFRAMECNT BIT(4)
+#define COMMAND_TXFIFORST BIT(3)
+#define COMMAND_RXFIFORST BIT(2)
#define REG_PKTSIZE (0x30)
#define REG_CMD_SIZE (0x34)
#define REG_HWSTATUS (0x38)
@@ -489,6 +491,8 @@ static int mchp_corespi_transfer_one(struct spi_controller *host,
mchp_corespi_set_xfer_size(spi, (spi->tx_len > FIFO_DEPTH)
? FIFO_DEPTH : spi->tx_len);
+ mchp_corespi_write(spi, REG_COMMAND, COMMAND_RXFIFORST | COMMAND_TXFIFORST);
+
while (spi->tx_len)
mchp_corespi_write_fifo(spi);
--
2.34.1
From: Takashi Iwai <tiwai(a)suse.de>
[ Upstream commit 0718a78f6a9f04b88d0dc9616cc216b31c5f3cf1 ]
The USB-audio MIDI code initializes the timer, but in a rare case, the
driver might be freed without the disconnect call. This leaves the
timer in an active state while the assigned object is released via
snd_usbmidi_free(), which ends up with a kernel warning when the debug
configuration is enabled, as spotted by fuzzer.
For avoiding the problem, put timer_shutdown_sync() at
snd_usbmidi_free(), so that the timer can be killed properly.
While we're at it, replace the existing timer_delete_sync() at the
disconnect callback with timer_shutdown_sync(), too.
Reported-by: syzbot+d8f72178ab6783a7daea(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/681c70d7.050a0220.a19a9.00c6.GAE@google.com
Cc: <stable(a)vger.kernel.org>
Link: https://patch.msgid.link/20250519212031.14436-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
[ del_timer vs timer_delete differences ]
Signed-off-by: Jeongjun Park <aha310510(a)gmail.com>
---
sound/usb/midi.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/sound/usb/midi.c b/sound/usb/midi.c
index a792ada18863..c3de2b137435 100644
--- a/sound/usb/midi.c
+++ b/sound/usb/midi.c
@@ -1530,6 +1530,7 @@ static void snd_usbmidi_free(struct snd_usb_midi *umidi)
snd_usbmidi_in_endpoint_delete(ep->in);
}
mutex_destroy(&umidi->mutex);
+ timer_shutdown_sync(&umidi->error_timer);
kfree(umidi);
}
@@ -1553,7 +1554,7 @@ void snd_usbmidi_disconnect(struct list_head *p)
spin_unlock_irq(&umidi->disc_lock);
up_write(&umidi->disc_rwsem);
- del_timer_sync(&umidi->error_timer);
+ timer_shutdown_sync(&umidi->error_timer);
for (i = 0; i < MIDI_MAX_ENDPOINTS; ++i) {
struct snd_usb_midi_endpoint *ep = &umidi->endpoints[i];
--
Make sure to drop the reference taken to the sysmgr platform device when
retrieving its driver data.
Note that holding a reference to a device does not prevent its driver
data from going away.
Fixes: f36e789a1f8d ("mfd: altera-sysmgr: Add SOCFPGA System Manager")
Cc: stable(a)vger.kernel.org # 5.2
Signed-off-by: Johan Hovold <johan(a)kernel.org>
---
drivers/mfd/altera-sysmgr.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/mfd/altera-sysmgr.c b/drivers/mfd/altera-sysmgr.c
index fb5f988e61f3..90c6902d537d 100644
--- a/drivers/mfd/altera-sysmgr.c
+++ b/drivers/mfd/altera-sysmgr.c
@@ -117,6 +117,8 @@ struct regmap *altr_sysmgr_regmap_lookup_by_phandle(struct device_node *np,
sysmgr = dev_get_drvdata(dev);
+ put_device(dev);
+
return sysmgr->regmap;
}
EXPORT_SYMBOL_GPL(altr_sysmgr_regmap_lookup_by_phandle);
--
2.49.1
A few fixes for drm panic, that I found when writing unit tests with kunit.
Jocelyn Falempe (6):
drm/panic: Fix drawing the logo on a small narrow screen
drm/panic: Fix overlap between qr code and logo
drm/panic: Fix qr_code, ensure vmargin is positive
drm/panic: Fix kmsg text drawing rectangle
drm/panic: Fix divide by 0 if the screen width < font width
drm/panic: Fix 24bit pixel crossing page boundaries
drivers/gpu/drm/drm_panic.c | 60 +++++++++++++++++++++++++++++++++----
1 file changed, 54 insertions(+), 6 deletions(-)
base-commit: e4bea919584ff292c9156cf7d641a2ab3cbe27b0
--
2.51.0
A regression was reported to me recently whereby /dev/fb0 had disappeared
from a PowerBook G3 Series "Wallstreet". The problem shows up when the
"video=ofonly" parameter is passed to the kernel, which is what the
bootloader does when "no video driver" is selected. The cause of the
problem is the "offb" string comparison, which got mangled when it got
refactored. Fix it.
Cc: stable(a)vger.kernel.org
Fixes: 93604a5ade3a ("fbdev: Handle video= parameter in video/cmdline.c")
Reported-and-tested-by: Stan Johnson <userm57(a)yahoo.com>
Signed-off-by: Finn Thain <fthain(a)linux-m68k.org>
---
drivers/video/fbdev/core/fb_cmdline.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/video/fbdev/core/fb_cmdline.c b/drivers/video/fbdev/core/fb_cmdline.c
index 4d1634c492ec..594b60424d1c 100644
--- a/drivers/video/fbdev/core/fb_cmdline.c
+++ b/drivers/video/fbdev/core/fb_cmdline.c
@@ -40,7 +40,7 @@ int fb_get_options(const char *name, char **option)
bool enabled;
if (name)
- is_of = strncmp(name, "offb", 4);
+ is_of = !strncmp(name, "offb", 4);
enabled = __video_get_options(name, &options, is_of);
--
2.49.1