According to the documentation, the DP PHY on x1e80100 has another clock
called ref.
The X Elite devices currently supported upstream work fine without this
clock because the boot firmware leaves it enabled, but we should not rely
on that. Also, when it comes to power management, this clock also needs to
be disabled on suspend. So even though this change breaks the ABI, it is
needed in order to make sure we disable this clock for runtime PM, once
that gets enabled in the driver.
So rework the driver to allow a flexible number of clocks, fix the
dt-bindings schema and add the clock to the DT node as well.
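As an illustration only (a minimal sketch, not the actual driver change), handling a flexible number of clocks with devm_clk_bulk_get_all() could look roughly like the following; the probe function name and error message are hypothetical:
#include <linux/clk.h>
#include <linux/device.h>
#include <linux/platform_device.h>
static int example_dp_phy_probe(struct platform_device *pdev)
{
	struct device *dev = &pdev->dev;
	struct clk_bulk_data *clks;
	int num_clks;
	/* Acquire however many clocks the DT node provides (aux, cfg_ahb, ref, ...). */
	num_clks = devm_clk_bulk_get_all(dev, &clks);
	if (num_clks < 0)
		return dev_err_probe(dev, num_clks, "failed to get clocks\n");
	/* Enable them all; the count is no longer hard-coded in the driver. */
	return clk_bulk_prepare_enable(num_clks, clks);
}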
Signed-off-by: Abel Vesa <abel.vesa(a)linaro.org>
---
Changes in v5:
- Picked-up Bjorn's R-b tags.
- Replaced "parse" with "get" on clocks acquiring failure.
- Link to v4: https://lore.kernel.org/r/20251029-phy-qcom-edp-add-missing-refclk-v4-0-adb…
Changes in v4:
- Picked Dmitry's R-b tag for the driver patch
- Added x1e80100 substring to subject of dts patch
- Link to v3 (resend): https://lore.kernel.org/r/20251014-phy-qcom-edp-add-missing-refclk-v3-0-078…
Changes in v3 (resend):
- picked-up Krzysztof's R-b tag for bindings patch
- Link to v3: https://lore.kernel.org/r/20250909-phy-qcom-edp-add-missing-refclk-v3-0-4ec…
Changes in v3:
- Use dev_err_probe() on clocks parsing failure.
- Explain why the ABI break is necessary.
- Drop the extra 'clk' suffix from the clock name. So ref instead of
refclk.
- Link to v2: https://lore.kernel.org/r/20250903-phy-qcom-edp-add-missing-refclk-v2-0-d88…
Changes in v2:
- Fix schema by adding the minItems, as suggested by Krzysztof.
- Use devm_clk_bulk_get_all, as suggested by Konrad.
- Rephrase the commit messages to reflect the flexible number of clocks.
- Link to v1: https://lore.kernel.org/r/20250730-phy-qcom-edp-add-missing-refclk-v1-0-6f7…
---
Abel Vesa (3):
dt-bindings: phy: qcom-edp: Add missing clock for X Elite
phy: qcom: edp: Make the number of clocks flexible
arm64: dts: qcom: x1e80100: Add missing TCSR ref clock to the DP PHYs
.../devicetree/bindings/phy/qcom,edp-phy.yaml | 28 +++++++++++++++++++++-
arch/arm64/boot/dts/qcom/hamoa.dtsi | 12 ++++++----
drivers/phy/qualcomm/phy-qcom-edp.c | 16 ++++++-------
3 files changed, 43 insertions(+), 13 deletions(-)
---
base-commit: 131f3d9446a6075192cdd91f197989d98302faa6
change-id: 20250730-phy-qcom-edp-add-missing-refclk-5ab82828f8e7
Best regards,
--
Abel Vesa <abel.vesa(a)oss.qualcomm.com>
From: Sean Christopherson <seanjc(a)google.com>
When loading guest XSAVE state via KVM_SET_XSAVE, and when updating XFD in
response to a guest WRMSR, clear XFD-disabled features in the saved (or to
be restored) XSTATE_BV to ensure KVM doesn't attempt to load state for
features that are disabled via the guest's XFD. Because the kernel
executes XRSTOR with the guest's XFD, saving XSTATE_BV[i]=1 with XFD[i]=1
will cause XRSTOR to #NM and panic the kernel.
E.g. if fpu_update_guest_xfd() sets XFD without clearing XSTATE_BV:
------------[ cut here ]------------
WARNING: arch/x86/kernel/traps.c:1524 at exc_device_not_available+0x101/0x110, CPU#29: amx_test/848
Modules linked in: kvm_intel kvm irqbypass
CPU: 29 UID: 1000 PID: 848 Comm: amx_test Not tainted 6.19.0-rc2-ffa07f7fd437-x86_amx_nm_xfd_non_init-vm #171 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:exc_device_not_available+0x101/0x110
Call Trace:
<TASK>
asm_exc_device_not_available+0x1a/0x20
RIP: 0010:restore_fpregs_from_fpstate+0x36/0x90
switch_fpu_return+0x4a/0xb0
kvm_arch_vcpu_ioctl_run+0x1245/0x1e40 [kvm]
kvm_vcpu_ioctl+0x2c3/0x8f0 [kvm]
__x64_sys_ioctl+0x8f/0xd0
do_syscall_64+0x62/0x940
entry_SYSCALL_64_after_hwframe+0x4b/0x53
</TASK>
---[ end trace 0000000000000000 ]---
This can happen if the guest executes WRMSR(MSR_IA32_XFD) to set XFD[18] = 1,
and a host IRQ triggers kernel_fpu_begin() prior to the vmexit handler's
call to fpu_update_guest_xfd().
And if userspace stuffs XSTATE_BV[i]=1 via KVM_SET_XSAVE:
------------[ cut here ]------------
WARNING: arch/x86/kernel/traps.c:1524 at exc_device_not_available+0x101/0x110, CPU#14: amx_test/867
Modules linked in: kvm_intel kvm irqbypass
CPU: 14 UID: 1000 PID: 867 Comm: amx_test Not tainted 6.19.0-rc2-2dace9faccd6-x86_amx_nm_xfd_non_init-vm #168 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:exc_device_not_available+0x101/0x110
Call Trace:
<TASK>
asm_exc_device_not_available+0x1a/0x20
RIP: 0010:restore_fpregs_from_fpstate+0x36/0x90
fpu_swap_kvm_fpstate+0x6b/0x120
kvm_load_guest_fpu+0x30/0x80 [kvm]
kvm_arch_vcpu_ioctl_run+0x85/0x1e40 [kvm]
kvm_vcpu_ioctl+0x2c3/0x8f0 [kvm]
__x64_sys_ioctl+0x8f/0xd0
do_syscall_64+0x62/0x940
entry_SYSCALL_64_after_hwframe+0x4b/0x53
</TASK>
---[ end trace 0000000000000000 ]---
The new behavior is consistent with the AMX architecture. Per Intel's SDM,
XSAVE saves XSTATE_BV as '0' for components that are disabled via XFD
(and non-compacted XSAVE saves the initial configuration of the state
component):
If XSAVE, XSAVEC, XSAVEOPT, or XSAVES is saving the state component i,
the instruction does not generate #NM when XCR0[i] = IA32_XFD[i] = 1;
instead, it operates as if XINUSE[i] = 0 (and the state component was
in its initial state): it saves bit i of XSTATE_BV field of the XSAVE
header as 0; in addition, XSAVE saves the initial configuration of the
state component (the other instructions do not save state component i).
Alternatively, KVM could always do XRSTOR with XFD=0, e.g. by using
a constant XFD based on the set of enabled features when XSAVEing for
a struct fpu_guest. However, having XSTATE_BV[i]=1 for XFD-disabled
features can only happen in the above interrupt case, or in similar
scenarios involving preemption on preemptible kernels, because
fpu_swap_kvm_fpstate()'s call to save_fpregs_to_fpstate() saves the
outgoing FPU state with the current XFD; and that is (on all but the
first WRMSR to XFD) the guest XFD.
Therefore, XFD can only go out of sync with XSTATE_BV in those rare cases,
and we can consider it (de facto) part of the KVM ABI that KVM_GET_XSAVE
returns XSTATE_BV[i]=0 for XFD-disabled features.
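To make the resulting invariant concrete, here is an informal sketch (a hypothetical helper, not the actual kernel code) of the masking that must happen before XRSTOR runs with the guest's XFD:
/*
 * Hypothetical helper for illustration: sanitize the XSTATE_BV field of a
 * to-be-restored XSAVE area against the guest's IA32_XFD value.
 */
static inline u64 sanitize_xstate_bv(u64 xstate_bv, u64 guest_xfd)
{
	/*
	 * XRSTOR raises #NM if XSTATE_BV[i]=1 while XFD[i]=1, so drop the
	 * XFD-disabled components; they are architecturally treated as being
	 * in their initial state anyway.
	 */
	return xstate_bv & ~guest_xfd;
}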
Reported-by: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: stable(a)vger.kernel.org
Fixes: 820a6ee944e7 ("kvm: x86: Add emulation for IA32_XFD", 2022-01-14)
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
[Move clearing of XSTATE_BV from fpu_copy_uabi_to_guest_fpstate
to kvm_vcpu_ioctl_x86_set_xsave. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
---
arch/x86/kernel/fpu/core.c | 32 +++++++++++++++++++++++++++++---
arch/x86/kvm/x86.c | 9 +++++++++
2 files changed, 38 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index da233f20ae6f..166c380b0161 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -319,10 +319,29 @@ EXPORT_SYMBOL_FOR_KVM(fpu_enable_guest_xfd_features);
#ifdef CONFIG_X86_64
void fpu_update_guest_xfd(struct fpu_guest *guest_fpu, u64 xfd)
{
+ struct fpstate *fpstate = guest_fpu->fpstate;
+
fpregs_lock();
- guest_fpu->fpstate->xfd = xfd;
- if (guest_fpu->fpstate->in_use)
- xfd_update_state(guest_fpu->fpstate);
+
+ /*
+ * KVM's guest ABI is that setting XFD[i]=1 *can* immediately revert
+ * the save state to initialized. Likewise, KVM_GET_XSAVE does the
+ * same as XSAVE and returns XSTATE_BV[i]=0 whenever XFD[i]=1.
+ *
+ * If the guest's FPU state is in hardware, just update XFD: the XSAVE
+ * in fpu_swap_kvm_fpstate will clear XSTATE_BV[i] whenever XFD[i]=1.
+ *
+ * If however the guest's FPU state is NOT resident in hardware, clear
+ * disabled components in XSTATE_BV now, or a subsequent XRSTOR will
+ * attempt to load disabled components and generate #NM _in the host_.
+ */
+ if (xfd && test_thread_flag(TIF_NEED_FPU_LOAD))
+ fpstate->regs.xsave.header.xfeatures &= ~xfd;
+
+ fpstate->xfd = xfd;
+ if (fpstate->in_use)
+ xfd_update_state(fpstate);
+
fpregs_unlock();
}
EXPORT_SYMBOL_FOR_KVM(fpu_update_guest_xfd);
@@ -430,6 +449,13 @@ int fpu_copy_uabi_to_guest_fpstate(struct fpu_guest *gfpu, const void *buf,
if (ustate->xsave.header.xfeatures & ~xcr0)
return -EINVAL;
+ /*
+ * Disabled features must be in their initial state, otherwise XRSTOR
+ * causes an exception.
+ */
+ if (WARN_ON_ONCE(ustate->xsave.header.xfeatures & kstate->xfd))
+ return -EINVAL;
+
/*
* Nullify @vpkru to preserve its current value if PKRU's bit isn't set
* in the header. KVM's odd ABI is to leave PKRU untouched in this
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ff8812f3a129..c0416f53b5f5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5807,9 +5807,18 @@ static int kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu,
static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu,
struct kvm_xsave *guest_xsave)
{
+ union fpregs_state *xstate = (union fpregs_state *)guest_xsave->region;
+
if (fpstate_is_confidential(&vcpu->arch.guest_fpu))
return vcpu->kvm->arch.has_protected_state ? -EINVAL : 0;
+ /*
+ * Do not reject non-initialized disabled features for backwards
+ * compatibility, but clear XSTATE_BV[i] whenever XFD[i]=1.
+ * Otherwise, XRSTOR would cause a #NM.
+ */
+ xstate->xsave.header.xfeatures &= ~vcpu->arch.guest_fpu.fpstate->xfd;
+
return fpu_copy_uabi_to_guest_fpstate(&vcpu->arch.guest_fpu,
guest_xsave->region,
kvm_caps.supported_xcr0,
--
2.52.0
When software issues a Cache Maintenance Operation (CMO) targeting a
dirty cache line, the CPU and DSU cluster may optimize the operation by
combining the CopyBack Write and CMO into a single combined CopyBack
Write plus CMO transaction presented to the interconnect (MCN).
For these combined transactions, the MCN splits the operation into two
separate transactions, one Write and one CMO, and then propagates the
write and optionally the CMO to the downstream memory system or external
Point of Serialization (PoS).
However, the MCN may return an early CompCMO response to the DSU cluster
before the corresponding Write and CMO transactions have completed at
the external PoS or downstream memory. As a result, stale data may be
observed by external observers that are directly connected to the
external PoS or downstream memory.
This erratum affects any system topology in which the following
conditions apply:
- The Point of Serialization (PoS) is located downstream of the
interconnect.
- A downstream observer accesses memory directly, bypassing the
interconnect.
Conditions:
This erratum occurs only when all of the following conditions are met:
1. Software executes a data cache maintenance operation, specifically,
a clean or invalidate by virtual address (DC CVAC, DC CIVAC, or DC
IVAC), that hits on unique dirty data in the CPU or DSU cache. This
results in a combined CopyBack and CMO being issued to the
interconnect.
2. The interconnect splits the combined transaction into separate Write
and CMO transactions and returns an early completion response to the
CPU or DSU before the write has completed at the downstream memory
or PoS.
3. A downstream observer accesses the affected memory address after the
early completion response is issued but before the actual memory
write has completed. This allows the observer to read stale data
that has not yet been updated at the PoS or downstream memory.
The workaround issues a second loop of CMOs over the same virtual address
range whenever the operation meets the erratum conditions, so that the code
waits until the cache data has actually been cleaned to the PoC.
Implementing it this way mitigates the performance penalty compared to
simply duplicating the original CMO.
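In rough C-level pseudocode (the real workaround lives in the assembly of dcache_by_myline_op and __pi_dcache_inval_poc; dc_civac(), dsb_sy() and erratum_4311569_enabled() below are hypothetical helpers), the idea is simply to walk the affected range a second time:
/* Illustrative sketch only; not the actual arch/arm64 implementation. */
static void dcache_clean_inval_poc_sketch(unsigned long start, unsigned long end,
					  unsigned long linesz)
{
	int pass, passes = erratum_4311569_enabled() ? 2 : 1;
	unsigned long addr;
	for (pass = 0; pass < passes; pass++)
		for (addr = start & ~(linesz - 1); addr < end; addr += linesz)
			dc_civac(addr);	/* second pass re-issues the CMO so the data
					 * cleaned by the first pass has reached the PoC */
	dsb_sy();	/* final barrier, as in the original sequence */
}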
Reported-by: kernel test robot <lkp(a)intel.com>
Cc: stable(a)vger.kernel.org # 6.12.x
Signed-off-by: Lucas Wei <lucaswei(a)google.com>
---
Changes in v2:
1. Fixed warning from kernel test robot by changing
arm_si_l1_workaround_4311569 to static
[Reported-by: kernel test robot <lkp(a)intel.com>]
---
Documentation/arch/arm64/silicon-errata.rst | 3 ++
arch/arm64/Kconfig | 19 +++++++++++++
arch/arm64/include/asm/assembler.h | 10 +++++++
arch/arm64/kernel/cpu_errata.c | 31 +++++++++++++++++++++
arch/arm64/mm/cache.S | 13 ++++++++-
arch/arm64/tools/cpucaps | 1 +
6 files changed, 76 insertions(+), 1 deletion(-)
diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst
index a7ec57060f64..98efdf528719 100644
--- a/Documentation/arch/arm64/silicon-errata.rst
+++ b/Documentation/arch/arm64/silicon-errata.rst
@@ -213,6 +213,9 @@ stable kernels.
| ARM | GIC-700 | #2941627 | ARM64_ERRATUM_2941627 |
+----------------+-----------------+-----------------+-----------------------------+
+----------------+-----------------+-----------------+-----------------------------+
+| ARM | SI L1 | #4311569 | ARM64_ERRATUM_4311569 |
++----------------+-----------------+-----------------+-----------------------------+
++----------------+-----------------+-----------------+-----------------------------+
| Broadcom | Brahma-B53 | N/A | ARM64_ERRATUM_845719 |
+----------------+-----------------+-----------------+-----------------------------+
| Broadcom | Brahma-B53 | N/A | ARM64_ERRATUM_843419 |
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 65db12f66b8f..a834d30859cc 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1153,6 +1153,25 @@ config ARM64_ERRATUM_3194386
If unsure, say Y.
+config ARM64_ERRATUM_4311569
+ bool "SI L1: 4311569: workaround for premature CMO completion erratum"
+ default y
+ help
+ This option adds the workaround for ARM SI L1 erratum 4311569.
+
+ The erratum of SI L1 can cause an early response to a combined write
+ and cache maintenance operation (WR+CMO) before the operation is fully
+ completed to the Point of Serialization (POS).
+ This can result in a non-I/O coherent agent observing stale data,
+ potentially leading to system instability or incorrect behavior.
+
+ Enabling this option implements a software workaround by inserting a
+ second loop of Cache Maintenance Operations (CMOs) at the end of the
+ functions that perform CMOs. This ensures that the data is correctly
+ serialized before the buffer is handed off to a non-coherent agent.
+
+ If unsure, say Y.
+
config CAVIUM_ERRATUM_22375
bool "Cavium erratum 22375, 24313"
default y
diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index f0ca7196f6fa..d3d46e5f7188 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -381,6 +381,9 @@ alternative_endif
.macro dcache_by_myline_op op, domain, start, end, linesz, tmp, fixup
sub \tmp, \linesz, #1
bic \start, \start, \tmp
+alternative_if ARM64_WORKAROUND_4311569
+ mov \tmp, \start
+alternative_else_nop_endif
.Ldcache_op\@:
.ifc \op, cvau
__dcache_op_workaround_clean_cache \op, \start
@@ -402,6 +405,13 @@ alternative_endif
add \start, \start, \linesz
cmp \start, \end
b.lo .Ldcache_op\@
+alternative_if ARM64_WORKAROUND_4311569
+ .ifnc \op, cvau
+ mov \start, \tmp
+ mov \tmp, xzr
+ cbnz \start, .Ldcache_op\@
+ .endif
+alternative_else_nop_endif
dsb \domain
_cond_uaccess_extable .Ldcache_op\@, \fixup
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 8cb3b575a031..5c0ab6bfd44a 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -141,6 +141,30 @@ has_mismatched_cache_type(const struct arm64_cpu_capabilities *entry,
return (ctr_real != sys) && (ctr_raw != sys);
}
+#ifdef CONFIG_ARM64_ERRATUM_4311569
+static DEFINE_STATIC_KEY_FALSE(arm_si_l1_workaround_4311569);
+static int __init early_arm_si_l1_workaround_4311569_cfg(char *arg)
+{
+ static_branch_enable(&arm_si_l1_workaround_4311569);
+ pr_info("Enabling cache maintenance workaround for ARM SI-L1 erratum 4311569\n");
+
+ return 0;
+}
+early_param("arm_si_l1_workaround_4311569", early_arm_si_l1_workaround_4311569_cfg);
+
+/*
+ * Cache maintenance functions such as dcache_inval_poc() and dcache_clean_poc() are already
+ * called earlier (e.g. from head.S), before the decision to turn on this workaround can be
+ * made. Since the scope of this workaround is limited to non-coherent DMA agents, it is
+ * safe to have the workaround off by default.
+ */
+static bool
+need_arm_si_l1_workaround_4311569(const struct arm64_cpu_capabilities *entry, int scope)
+{
+ return static_branch_unlikely(&arm_si_l1_workaround_4311569);
+}
+#endif
+
static void
cpu_enable_trap_ctr_access(const struct arm64_cpu_capabilities *cap)
{
@@ -870,6 +894,13 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
ERRATA_MIDR_RANGE_LIST(erratum_spec_ssbs_list),
},
#endif
+#ifdef CONFIG_ARM64_ERRATUM_4311569
+ {
+ .capability = ARM64_WORKAROUND_4311569,
+ .type = ARM64_CPUCAP_SYSTEM_FEATURE,
+ .matches = need_arm_si_l1_workaround_4311569,
+ },
+#endif
#ifdef CONFIG_ARM64_WORKAROUND_SPECULATIVE_UNPRIV_LOAD
{
.desc = "ARM errata 2966298, 3117295",
diff --git a/arch/arm64/mm/cache.S b/arch/arm64/mm/cache.S
index 503567c864fd..ddf0097624ed 100644
--- a/arch/arm64/mm/cache.S
+++ b/arch/arm64/mm/cache.S
@@ -143,9 +143,14 @@ SYM_FUNC_END(dcache_clean_pou)
* - end - kernel end address of region
*/
SYM_FUNC_START(__pi_dcache_inval_poc)
+alternative_if ARM64_WORKAROUND_4311569
+ mov x4, x0
+ mov x5, x1
+ mov x6, #1
+alternative_else_nop_endif
dcache_line_size x2, x3
sub x3, x2, #1
- tst x1, x3 // end cache line aligned?
+again: tst x1, x3 // end cache line aligned?
bic x1, x1, x3
b.eq 1f
dc civac, x1 // clean & invalidate D / U line
@@ -158,6 +163,12 @@ SYM_FUNC_START(__pi_dcache_inval_poc)
3: add x0, x0, x2
cmp x0, x1
b.lo 2b
+alternative_if ARM64_WORKAROUND_4311569
+ mov x0, x4
+ mov x1, x5
+ sub x6, x6, #1
+ cbz x6, again
+alternative_else_nop_endif
dsb sy
ret
SYM_FUNC_END(__pi_dcache_inval_poc)
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 1b32c1232d28..3b18734f9744 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -101,6 +101,7 @@ WORKAROUND_2077057
WORKAROUND_2457168
WORKAROUND_2645198
WORKAROUND_2658417
+WORKAROUND_4311569
WORKAROUND_AMPERE_AC03_CPU_38
WORKAROUND_AMPERE_AC04_CPU_23
WORKAROUND_TRBE_OVERWRITE_FILL_MODE
base-commit: edde060637b92607f3522252c03d64ad06369933
--
2.52.0.358.g0dd7633a29-goog
If damon_call() is executed against a DAMON context that is not running,
the function returns error while keeping the damon_call_control object
linked to the context's call_controls list. Let's suppose the object is
deallocated after the damon_call(), and yet another damon_call() is
executed against the same context. The function tries to add the new
damon_call_control object to the call_controls list, which still has the
pointer to the previous damon_call_control object, which is deallocated.
As a result, use-after-free happens.
This can actually be triggered using the DAMON sysfs interface. It is
not easily exploitable, though, since it requires sysfs write permission
and a decidedly unusual sequence of file writes. Please refer to the
report for more details about the issue reproduction steps.
Fix the issue by making damon_call() clean up the damon_call_control
object before returning the error.
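For illustration, a rough sketch of the triggering sequence (hypothetical caller; the struct initializers are abridged):
/* Illustrative only: mimics what the sysfs interface effectively does. */
static void damon_call_uaf_sketch(struct damon_ctx *ctx)
{
	{
		struct damon_call_control control = { /* .fn, .data, ... */ };
		/*
		 * @ctx is not running, so damon_call() returns -EINVAL, but
		 * before this fix &control stayed linked on ctx->call_controls.
		 */
		damon_call(ctx, &control);
	}	/* control goes out of scope (is deallocated) here */
	{
		struct damon_call_control control2 = { /* ... */ };
		/*
		 * list_add_tail() now updates the stale, freed list entry that
		 * ctx->call_controls still points to: use-after-free.
		 */
		damon_call(ctx, &control2);
	}
}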
Reported-by: JaeJoon Jung <rgbi3307(a)gmail.com>
Closes: https://lore.kernel.org/20251224094401.20384-1-rgbi3307@gmail.com
Fixes: 42b7491af14c ("mm/damon/core: introduce damon_call()")
Cc: <stable(a)vger.kernel.org> # 6.14.x
Signed-off-by: SeongJae Park <sj(a)kernel.org>
---
mm/damon/core.c | 31 ++++++++++++++++++++++++++++++-
1 file changed, 30 insertions(+), 1 deletion(-)
diff --git a/mm/damon/core.c b/mm/damon/core.c
index 2d3e8006db50..65482a0ce20b 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -1442,6 +1442,35 @@ bool damon_is_running(struct damon_ctx *ctx)
return running;
}
+/*
+ * damon_call_handle_inactive_ctx() - handle a DAMON call request that was added to
+ * an inactive context.
+ * @ctx: The inactive DAMON context.
+ * @control: Control variable of the call request.
+ *
+ * This function is called in a case that @control is added to @ctx but @ctx is
+ * not running (inactive). See if @ctx handled @control or not, and cleanup
+ * @control if it was not handled.
+ *
+ * Returns 0 if @control was handled by @ctx, negative error code otherwise.
+ */
+static int damon_call_handle_inactive_ctx(
+ struct damon_ctx *ctx, struct damon_call_control *control)
+{
+ struct damon_call_control *c;
+
+ mutex_lock(&ctx->call_controls_lock);
+ list_for_each_entry(c, &ctx->call_controls, list) {
+ if (c == control) {
+ list_del(&control->list);
+ mutex_unlock(&ctx->call_controls_lock);
+ return -EINVAL;
+ }
+ }
+ mutex_unlock(&ctx->call_controls_lock);
+ return 0;
+}
+
/**
* damon_call() - Invoke a given function on DAMON worker thread (kdamond).
* @ctx: DAMON context to call the function for.
@@ -1472,7 +1501,7 @@ int damon_call(struct damon_ctx *ctx, struct damon_call_control *control)
list_add_tail(&control->list, &ctx->call_controls);
mutex_unlock(&ctx->call_controls_lock);
if (!damon_is_running(ctx))
- return -EINVAL;
+ return damon_call_handle_inactive_ctx(ctx, control);
if (control->repeat)
return 0;
wait_for_completion(&control->completion);
base-commit: a100272b3541cb00a5e29fdc16e428234ebfddd1
--
2.47.3
The patch titled
Subject: mm/damon/core: remove call_control in inactive contexts
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-damon-core-remove-call_control-in-inactive-contexts.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via various
branches at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there most days
------------------------------------------------------
From: SeongJae Park <sj(a)kernel.org>
Subject: mm/damon/core: remove call_control in inactive contexts
Date: Tue, 30 Dec 2025 17:23:13 -0800
If damon_call() is executed against a DAMON context that is not running,
the function returns error while keeping the damon_call_control object
linked to the context's call_controls list. Let's suppose the object is
deallocated after the damon_call(), and yet another damon_call() is
executed against the same context. The function tries to add the new
damon_call_control object to the call_controls list, which still has the
pointer to the previous damon_call_control object, which is deallocated.
As a result, use-after-free happens.
This can actually be triggered using the DAMON sysfs interface. It is not
easily exploitable, though, since it requires sysfs write permission and a
decidedly unusual sequence of file writes. Please refer to the report for
more details about the issue reproduction steps.
Fix the issue by making two changes. First, move the final kdamond_call()
that cancels all existing damon_call() requests of the terminating DAMON
context so that it runs before the ctx->kdamond reset. This way, any code
that sees a NULL ctx->kdamond can safely assume the context will not access
damon_call() requests anymore. Second, make damon_call() clean up
damon_call_control objects that were added to an already-terminated DAMON
context before returning the error.
Link: https://lkml.kernel.org/r/20251231012315.75835-1-sj@kernel.org
Fixes: 004ded6bee11 ("mm/damon: accept parallel damon_call() requests")
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Reported-by: JaeJoon Jung <rgbi3307(a)gmail.com>
Closes: https://lore.kernel.org/20251224094401.20384-1-rgbi3307@gmail.com
Cc: <stable(a)vger.kernel.org> # 6.17.x
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/damon/core.c | 33 +++++++++++++++++++++++++++++++--
1 file changed, 31 insertions(+), 2 deletions(-)
--- a/mm/damon/core.c~mm-damon-core-remove-call_control-in-inactive-contexts
+++ a/mm/damon/core.c
@@ -1431,6 +1431,35 @@ bool damon_is_running(struct damon_ctx *
return running;
}
+/*
+ * damon_call_handle_inactive_ctx() - handle a DAMON call request that was added to
+ * an inactive context.
+ * @ctx: The inactive DAMON context.
+ * @control: Control variable of the call request.
+ *
+ * This function is called in a case that @control is added to @ctx but @ctx is
+ * not running (inactive). See if @ctx handled @control or not, and cleanup
+ * @control if it was not handled.
+ *
+ * Returns 0 if @control was handled by @ctx, negative error code otherwise.
+ */
+static int damon_call_handle_inactive_ctx(
+ struct damon_ctx *ctx, struct damon_call_control *control)
+{
+ struct damon_call_control *c;
+
+ mutex_lock(&ctx->call_controls_lock);
+ list_for_each_entry(c, &ctx->call_controls, list) {
+ if (c == control) {
+ list_del(&control->list);
+ mutex_unlock(&ctx->call_controls_lock);
+ return -EINVAL;
+ }
+ }
+ mutex_unlock(&ctx->call_controls_lock);
+ return 0;
+}
+
/**
* damon_call() - Invoke a given function on DAMON worker thread (kdamond).
* @ctx: DAMON context to call the function for.
@@ -1461,7 +1490,7 @@ int damon_call(struct damon_ctx *ctx, st
list_add_tail(&control->list, &ctx->call_controls);
mutex_unlock(&ctx->call_controls_lock);
if (!damon_is_running(ctx))
- return -EINVAL;
+ return damon_call_handle_inactive_ctx(ctx, control);
if (control->repeat)
return 0;
wait_for_completion(&control->completion);
@@ -2755,13 +2784,13 @@ done:
if (ctx->ops.cleanup)
ctx->ops.cleanup(ctx);
kfree(ctx->regions_score_histogram);
+ kdamond_call(ctx, true);
pr_debug("kdamond (%d) finishes\n", current->pid);
mutex_lock(&ctx->kdamond_lock);
ctx->kdamond = NULL;
mutex_unlock(&ctx->kdamond_lock);
- kdamond_call(ctx, true);
damos_walk_cancel(ctx);
mutex_lock(&damon_lock);
_
Patches currently in -mm which might be from sj(a)kernel.org are
mm-damon-core-remove-call_control-in-inactive-contexts.patch
mm-damon-core-introduce-nr_snapshots-damos-stat.patch
mm-damon-sysfs-schemes-introduce-nr_snapshots-damos-stat-file.patch
docs-mm-damon-design-update-for-nr_snapshots-damos-stat.patch
docs-admin-guide-mm-damon-usage-update-for-nr_snapshots-damos-stat.patch
docs-abi-damon-update-for-nr_snapshots-damos-stat.patch
mm-damon-update-damos-kerneldoc-for-stat-field.patch
mm-damon-core-implement-max_nr_snapshots.patch
mm-damon-sysfs-schemes-implement-max_nr_snapshots-file.patch
docs-mm-damon-design-update-for-max_nr_snapshots.patch
docs-admin-guide-mm-damon-usage-update-for-max_nr_snapshots.patch
docs-abi-damon-update-for-max_nr_snapshots.patch
mm-damon-core-add-trace-point-for-damos-stat-per-apply-interval.patch
mm-damon-sysfs-cleanup-intervals-subdirs-on-attrs-dir-setup-failure.patch
mm-damon-sysfs-cleanup-attrs-subdirs-on-context-dir-setup-failure.patch
mm-damon-sysfs-scheme-cleanup-quotas-subdirs-on-scheme-dir-setup-failure.patch
mm-damon-sysfs-scheme-cleanup-access_pattern-subdirs-on-scheme-dir-setup-failure.patch
mm-damon-tests-core-kunit-add-test-cases-for-multiple-regions-in-damon_test_split_regions_of-fix.patch
The patch titled
Subject: x86/kexec: add a sanity check on previous kernel's ima kexec buffer
has been added to the -mm mm-nonmm-unstable branch. Its filename is
x86-kexec-add-a-sanity-check-on-previous-kernels-ima-kexec-buffer.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-nonmm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Harshit Mogalapalli <harshit.m.mogalapalli(a)oracle.com>
Subject: x86/kexec: add a sanity check on previous kernel's ima kexec buffer
Date: Tue, 30 Dec 2025 22:16:09 -0800
When the second-stage kernel is booted via kexec with a limiting command
line such as "mem=<size>", the physical range that contains the carried-over
IMA measurement list may fall outside the truncated RAM, leading to a
kernel panic.
BUG: unable to handle page fault for address: ffff97793ff47000
RIP: ima_restore_measurement_list+0xdc/0x45a
#PF: error_code(0x0000) - not-present page
Other architectures already validate the range with page_is_ram(), as done
in commit cbf9c4b9617b ("of: check previous kernel's ima-kexec-buffer
against memory bounds"); do a similar check on x86.
Without carrying the measurement list across kexec, the attestation
would fail.
Link: https://lkml.kernel.org/r/20251231061609.907170-4-harshit.m.mogalapalli@ora…
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli(a)oracle.com>
Fixes: b69a2afd5afc ("x86/kexec: Carry forward IMA measurement log on kexec")
Reported-by: Paul Webb <paul.x.webb(a)oracle.com>
Reviewed-by: Mimi Zohar <zohar(a)linux.ibm.com>
Cc: Alexander Graf <graf(a)amazon.com>
Cc: Ard Biesheuvel <ardb(a)kernel.org>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Borislav Betkov <bp(a)alien8.de>
Cc: guoweikang <guoweikang.kernel(a)gmail.com>
Cc: Henry Willard <henry.willard(a)oracle.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Jiri Bohac <jbohac(a)suse.cz>
Cc: Joel Granados <joel.granados(a)kernel.org>
Cc: Jonathan McDowell <noodles(a)fb.com>
Cc: Mike Rapoport <rppt(a)kernel.org>
Cc: Sohil Mehta <sohil.mehta(a)intel.com>
Cc: Sourabh Jain <sourabhjain(a)linux.ibm.com>
Cc: Thomas Gleinxer <tglx(a)linutronix.de>
Cc: Yifei Liu <yifei.l.liu(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/x86/kernel/setup.c | 6 ++++++
1 file changed, 6 insertions(+)
--- a/arch/x86/kernel/setup.c~x86-kexec-add-a-sanity-check-on-previous-kernels-ima-kexec-buffer
+++ a/arch/x86/kernel/setup.c
@@ -439,9 +439,15 @@ int __init ima_free_kexec_buffer(void)
int __init ima_get_kexec_buffer(void **addr, size_t *size)
{
+ int ret;
+
if (!ima_kexec_buffer_size)
return -ENOENT;
+ ret = ima_validate_range(ima_kexec_buffer_phys, ima_kexec_buffer_size);
+ if (ret)
+ return ret;
+
*addr = __va(ima_kexec_buffer_phys);
*size = ima_kexec_buffer_size;
_
Patches currently in -mm which might be from harshit.m.mogalapalli(a)oracle.com are
ima-verify-the-previous-kernels-ima-buffer-lies-in-addressable-ram.patch
of-kexec-refactor-ima_get_kexec_buffer-to-use-ima_validate_range.patch
x86-kexec-add-a-sanity-check-on-previous-kernels-ima-kexec-buffer.patch
The patch titled
Subject: of/kexec: refactor ima_get_kexec_buffer() to use ima_validate_range()
has been added to the -mm mm-nonmm-unstable branch. Its filename is
of-kexec-refactor-ima_get_kexec_buffer-to-use-ima_validate_range.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-nonmm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Harshit Mogalapalli <harshit.m.mogalapalli(a)oracle.com>
Subject: of/kexec: refactor ima_get_kexec_buffer() to use ima_validate_range()
Date: Tue, 30 Dec 2025 22:16:08 -0800
Refactor the OF/DT ima_get_kexec_buffer() to use a generic helper to
validate the address range. No functional change intended.
Link: https://lkml.kernel.org/r/20251231061609.907170-3-harshit.m.mogalapalli@ora…
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli(a)oracle.com>
Reviewed-by: Mimi Zohar <zohar(a)linux.ibm.com>
Cc: Alexander Graf <graf(a)amazon.com>
Cc: Ard Biesheuvel <ardb(a)kernel.org>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Borislav Betkov <bp(a)alien8.de>
Cc: guoweikang <guoweikang.kernel(a)gmail.com>
Cc: Henry Willard <henry.willard(a)oracle.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Jiri Bohac <jbohac(a)suse.cz>
Cc: Joel Granados <joel.granados(a)kernel.org>
Cc: Jonathan McDowell <noodles(a)fb.com>
Cc: Mike Rapoport <rppt(a)kernel.org>
Cc: Paul Webb <paul.x.webb(a)oracle.com>
Cc: Sohil Mehta <sohil.mehta(a)intel.com>
Cc: Sourabh Jain <sourabhjain(a)linux.ibm.com>
Cc: Thomas Gleinxer <tglx(a)linutronix.de>
Cc: Yifei Liu <yifei.l.liu(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/of/kexec.c | 15 +++------------
1 file changed, 3 insertions(+), 12 deletions(-)
--- a/drivers/of/kexec.c~of-kexec-refactor-ima_get_kexec_buffer-to-use-ima_validate_range
+++ a/drivers/of/kexec.c
@@ -128,7 +128,6 @@ int __init ima_get_kexec_buffer(void **a
{
int ret, len;
unsigned long tmp_addr;
- unsigned long start_pfn, end_pfn;
size_t tmp_size;
const void *prop;
@@ -144,17 +143,9 @@ int __init ima_get_kexec_buffer(void **a
if (!tmp_size)
return -ENOENT;
- /*
- * Calculate the PFNs for the buffer and ensure
- * they are with in addressable memory.
- */
- start_pfn = PHYS_PFN(tmp_addr);
- end_pfn = PHYS_PFN(tmp_addr + tmp_size - 1);
- if (!page_is_ram(start_pfn) || !page_is_ram(end_pfn)) {
- pr_warn("IMA buffer at 0x%lx, size = 0x%zx beyond memory\n",
- tmp_addr, tmp_size);
- return -EINVAL;
- }
+ ret = ima_validate_range(tmp_addr, tmp_size);
+ if (ret)
+ return ret;
*addr = __va(tmp_addr);
*size = tmp_size;
_
Patches currently in -mm which might be from harshit.m.mogalapalli(a)oracle.com are
ima-verify-the-previous-kernels-ima-buffer-lies-in-addressable-ram.patch
of-kexec-refactor-ima_get_kexec_buffer-to-use-ima_validate_range.patch
x86-kexec-add-a-sanity-check-on-previous-kernels-ima-kexec-buffer.patch
The patch titled
Subject: ima: verify the previous kernel's IMA buffer lies in addressable RAM
has been added to the -mm mm-nonmm-unstable branch. Its filename is
ima-verify-the-previous-kernels-ima-buffer-lies-in-addressable-ram.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-nonmm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Harshit Mogalapalli <harshit.m.mogalapalli(a)oracle.com>
Subject: ima: verify the previous kernel's IMA buffer lies in addressable RAM
Date: Tue, 30 Dec 2025 22:16:07 -0800
Patch series "Address page fault in ima_restore_measurement_list()", v3.
When the second-stage kernel is booted via kexec with a limiting command
line such as "mem=<size>", we observe a page fault:
BUG: unable to handle page fault for address: ffff97793ff47000
RIP: ima_restore_measurement_list+0xdc/0x45a
#PF: error_code(0x0000) not-present page
This happens on x86_64 only, as it was already fixed for aarch64 by
commit cbf9c4b9617b ("of: check previous kernel's ima-kexec-buffer
against memory bounds").
This patch (of 3):
When the second-stage kernel is booted with a limiting command line (e.g.
"mem=<size>"), the IMA measurement buffer handed over from the previous
kernel may fall outside the addressable RAM of the new kernel. Accessing
such a buffer can fault during early restore.
Introduce a small generic helper, ima_validate_range(), which verifies
that a physical [start, end] range for the previous-kernel IMA buffer lies
within addressable memory:
- On x86, use pfn_range_is_mapped().
- On OF based architectures, use page_is_ram().
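A minimal caller sketch (mirroring the of/kexec and x86 call sites in the follow-up patches; the function name and the example_buffer_phys/example_buffer_size variables are placeholders for the handed-over values):
int __init example_get_kexec_buffer(void **addr, size_t *size)
{
	int ret;
	/* Reject a buffer that lies outside addressable RAM and skip the restore. */
	ret = ima_validate_range(example_buffer_phys, example_buffer_size);
	if (ret)
		return ret;
	/* Only now is it safe to map the buffer and restore the measurement list. */
	*addr = __va(example_buffer_phys);
	*size = example_buffer_size;
	return 0;
}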
Link: https://lkml.kernel.org/r/20251231061609.907170-1-harshit.m.mogalapalli@ora…
Link: https://lkml.kernel.org/r/20251231061609.907170-2-harshit.m.mogalapalli@ora…
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli(a)oracle.com>
Cc: Alexander Graf <graf(a)amazon.com>
Cc: Ard Biesheuvel <ardb(a)kernel.org>
Cc: Borislav Betkov <bp(a)alien8.de>
Cc: guoweikang <guoweikang.kernel(a)gmail.com>
Cc: Henry Willard <henry.willard(a)oracle.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Jiri Bohac <jbohac(a)suse.cz>
Cc: Joel Granados <joel.granados(a)kernel.org>
Cc: Jonathan McDowell <noodles(a)fb.com>
Cc: Mike Rapoport <rppt(a)kernel.org>
Cc: Mimi Zohar <zohar(a)linux.ibm.com>
Cc: Paul Webb <paul.x.webb(a)oracle.com>
Cc: Sohil Mehta <sohil.mehta(a)intel.com>
Cc: Sourabh Jain <sourabhjain(a)linux.ibm.com>
Cc: Thomas Gleinxer <tglx(a)linutronix.de>
Cc: Yifei Liu <yifei.l.liu(a)oracle.com>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/ima.h | 1
security/integrity/ima/ima_kexec.c | 35 +++++++++++++++++++++++++++
2 files changed, 36 insertions(+)
--- a/include/linux/ima.h~ima-verify-the-previous-kernels-ima-buffer-lies-in-addressable-ram
+++ a/include/linux/ima.h
@@ -69,6 +69,7 @@ static inline int ima_measure_critical_d
#ifdef CONFIG_HAVE_IMA_KEXEC
int __init ima_free_kexec_buffer(void);
int __init ima_get_kexec_buffer(void **addr, size_t *size);
+int ima_validate_range(phys_addr_t phys, size_t size);
#endif
#ifdef CONFIG_IMA_SECURE_AND_OR_TRUSTED_BOOT
--- a/security/integrity/ima/ima_kexec.c~ima-verify-the-previous-kernels-ima-buffer-lies-in-addressable-ram
+++ a/security/integrity/ima/ima_kexec.c
@@ -12,6 +12,8 @@
#include <linux/kexec.h>
#include <linux/of.h>
#include <linux/ima.h>
+#include <linux/mm.h>
+#include <linux/overflow.h>
#include <linux/reboot.h>
#include <asm/page.h>
#include "ima.h"
@@ -294,3 +296,36 @@ void __init ima_load_kexec_buffer(void)
pr_debug("Error restoring the measurement list: %d\n", rc);
}
}
+
+/*
+ * ima_validate_range - verify a physical buffer lies in addressable RAM
+ * @phys: physical start address of the buffer from previous kernel
+ * @size: size of the buffer
+ *
+ * On success return 0. On failure returns -EINVAL so callers can skip
+ * restoring.
+ */
+int ima_validate_range(phys_addr_t phys, size_t size)
+{
+ unsigned long start_pfn, end_pfn;
+ phys_addr_t end_phys;
+
+ if (check_add_overflow(phys, (phys_addr_t)size - 1, &end_phys))
+ return -EINVAL;
+
+ start_pfn = PHYS_PFN(phys);
+ end_pfn = PHYS_PFN(end_phys);
+
+#ifdef CONFIG_X86
+ if (!pfn_range_is_mapped(start_pfn, end_pfn))
+#else
+ if (!page_is_ram(start_pfn) || !page_is_ram(end_pfn))
+#endif
+ {
+ pr_warn("IMA: previous kernel measurement buffer %pa (size 0x%zx) lies outside available memory\n",
+ &phys, size);
+ return -EINVAL;
+ }
+
+ return 0;
+}
_
Patches currently in -mm which might be from harshit.m.mogalapalli(a)oracle.com are
ima-verify-the-previous-kernels-ima-buffer-lies-in-addressable-ram.patch
of-kexec-refactor-ima_get_kexec_buffer-to-use-ima_validate_range.patch
x86-kexec-add-a-sanity-check-on-previous-kernels-ima-kexec-buffer.patch