From: Sean Christopherson <seanjc(a)google.com>
When loading guest XSAVE state via KVM_SET_XSAVE, and when updating XFD in
response to a guest WRMSR, clear XFD-disabled features in the saved (or to
be restored) XSTATE_BV to ensure KVM doesn't attempt to load state for
features that are disabled via the guest's XFD. Because the kernel
executes XRSTOR with the guest's XFD, saving XSTATE_BV[i]=1 with XFD[i]=1
will cause XRSTOR to #NM and panic the kernel.
E.g. if fpu_update_guest_xfd() sets XFD without clearing XSTATE_BV:
------------[ cut here ]------------
WARNING: arch/x86/kernel/traps.c:1524 at exc_device_not_available+0x101/0x110, CPU#29: amx_test/848
Modules linked in: kvm_intel kvm irqbypass
CPU: 29 UID: 1000 PID: 848 Comm: amx_test Not tainted 6.19.0-rc2-ffa07f7fd437-x86_amx_nm_xfd_non_init-vm #171 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:exc_device_not_available+0x101/0x110
Call Trace:
<TASK>
asm_exc_device_not_available+0x1a/0x20
RIP: 0010:restore_fpregs_from_fpstate+0x36/0x90
switch_fpu_return+0x4a/0xb0
kvm_arch_vcpu_ioctl_run+0x1245/0x1e40 [kvm]
kvm_vcpu_ioctl+0x2c3/0x8f0 [kvm]
__x64_sys_ioctl+0x8f/0xd0
do_syscall_64+0x62/0x940
entry_SYSCALL_64_after_hwframe+0x4b/0x53
</TASK>
---[ end trace 0000000000000000 ]---
This can happen if the guest executes WRMSR(MSR_IA32_XFD) to set XFD[18] = 1,
and a host IRQ triggers kernel_fpu_begin() prior to the vmexit handler's
call to fpu_update_guest_xfd().
And if userspace stuffs XSTATE_BV[i]=1 via KVM_SET_XSAVE:
------------[ cut here ]------------
WARNING: arch/x86/kernel/traps.c:1524 at exc_device_not_available+0x101/0x110, CPU#14: amx_test/867
Modules linked in: kvm_intel kvm irqbypass
CPU: 14 UID: 1000 PID: 867 Comm: amx_test Not tainted 6.19.0-rc2-2dace9faccd6-x86_amx_nm_xfd_non_init-vm #168 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:exc_device_not_available+0x101/0x110
Call Trace:
<TASK>
asm_exc_device_not_available+0x1a/0x20
RIP: 0010:restore_fpregs_from_fpstate+0x36/0x90
fpu_swap_kvm_fpstate+0x6b/0x120
kvm_load_guest_fpu+0x30/0x80 [kvm]
kvm_arch_vcpu_ioctl_run+0x85/0x1e40 [kvm]
kvm_vcpu_ioctl+0x2c3/0x8f0 [kvm]
__x64_sys_ioctl+0x8f/0xd0
do_syscall_64+0x62/0x940
entry_SYSCALL_64_after_hwframe+0x4b/0x53
</TASK>
---[ end trace 0000000000000000 ]---
The new behavior is consistent with the AMX architecture. Per Intel's SDM,
XSAVE saves XSTATE_BV as '0' for components that are disabled via XFD
(and non-compacted XSAVE saves the initial configuration of the state
component):
If XSAVE, XSAVEC, XSAVEOPT, or XSAVES is saving the state component i,
the instruction does not generate #NM when XCR0[i] = IA32_XFD[i] = 1;
instead, it operates as if XINUSE[i] = 0 (and the state component was
in its initial state): it saves bit i of XSTATE_BV field of the XSAVE
header as 0; in addition, XSAVE saves the initial configuration of the
state component (the other instructions do not save state component i).
Alternatively, KVM could always do XRSTOR with XFD=0, e.g. by using
a constant XFD based on the set of enabled features when XSAVEing for
a struct fpu_guest.  However, XSTATE_BV[i]=1 for an XFD-disabled
feature can only arise in the interrupt case above, or in similar
scenarios involving preemption on preemptible kernels, because
fpu_swap_kvm_fpstate()'s call to save_fpregs_to_fpstate() saves the
outgoing FPU state with the current XFD, which is (on all but the
first WRMSR to XFD) the guest XFD.
Since XFD can only go out of sync with XSTATE_BV in those narrow
cases, we can consider it (de facto) part of the KVM ABI that
KVM_GET_XSAVE returns XSTATE_BV[i]=0 for XFD-disabled features.
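For illustration only (a sketch against the fields touched by the hunks
below, not code this patch adds), the invariant being restored is that a
guest fpstate never advertises in-use state for a feature its own XFD
disables:
	/* Sketch: if this fires, a host-side XRSTOR of the guest fpstate
	 * would #NM. */
	WARN_ON_ONCE(guest_fpu->fpstate->regs.xsave.header.xfeatures &
		     guest_fpu->fpstate->xfd);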
Reported-by: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: stable(a)vger.kernel.org
Fixes: 820a6ee944e7 ("kvm: x86: Add emulation for IA32_XFD", 2022-01-14)
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
[Move clearing of XSTATE_BV from fpu_copy_uabi_to_guest_fpstate
to kvm_vcpu_ioctl_x86_set_xsave. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
---
arch/x86/kernel/fpu/core.c | 32 +++++++++++++++++++++++++++++---
arch/x86/kvm/x86.c | 9 +++++++++
2 files changed, 38 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index da233f20ae6f..166c380b0161 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -319,10 +319,29 @@ EXPORT_SYMBOL_FOR_KVM(fpu_enable_guest_xfd_features);
#ifdef CONFIG_X86_64
void fpu_update_guest_xfd(struct fpu_guest *guest_fpu, u64 xfd)
{
+ struct fpstate *fpstate = guest_fpu->fpstate;
+
fpregs_lock();
- guest_fpu->fpstate->xfd = xfd;
- if (guest_fpu->fpstate->in_use)
- xfd_update_state(guest_fpu->fpstate);
+
+ /*
+ * KVM's guest ABI is that setting XFD[i]=1 *can* immediately revert
+ * the save state to initialized. Likewise, KVM_GET_XSAVE does the
+ * same as XSAVE and returns XSTATE_BV[i]=0 whenever XFD[i]=1.
+ *
+ * If the guest's FPU state is in hardware, just update XFD: the XSAVE
+ * in fpu_swap_kvm_fpstate will clear XSTATE_BV[i] whenever XFD[i]=1.
+ *
+ * If however the guest's FPU state is NOT resident in hardware, clear
+ * disabled components in XSTATE_BV now, or a subsequent XRSTOR will
+ * attempt to load disabled components and generate #NM _in the host_.
+ */
+ if (xfd && test_thread_flag(TIF_NEED_FPU_LOAD))
+ fpstate->regs.xsave.header.xfeatures &= ~xfd;
+
+ fpstate->xfd = xfd;
+ if (fpstate->in_use)
+ xfd_update_state(fpstate);
+
fpregs_unlock();
}
EXPORT_SYMBOL_FOR_KVM(fpu_update_guest_xfd);
@@ -430,6 +449,13 @@ int fpu_copy_uabi_to_guest_fpstate(struct fpu_guest *gfpu, const void *buf,
if (ustate->xsave.header.xfeatures & ~xcr0)
return -EINVAL;
+ /*
+ * Disabled features must be in their initial state, otherwise XRSTOR
+ * causes an exception.
+ */
+ if (WARN_ON_ONCE(ustate->xsave.header.xfeatures & kstate->xfd))
+ return -EINVAL;
+
/*
* Nullify @vpkru to preserve its current value if PKRU's bit isn't set
* in the header. KVM's odd ABI is to leave PKRU untouched in this
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ff8812f3a129..c0416f53b5f5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5807,9 +5807,18 @@ static int kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu,
static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu,
struct kvm_xsave *guest_xsave)
{
+ union fpregs_state *xstate = (union fpregs_state *)guest_xsave->region;
+
if (fpstate_is_confidential(&vcpu->arch.guest_fpu))
return vcpu->kvm->arch.has_protected_state ? -EINVAL : 0;
+ /*
+ * Do not reject non-initialized disabled features for backwards
+ * compatibility, but clear XSTATE_BV[i] whenever XFD[i]=1.
+ * Otherwise, XRSTOR would cause a #NM.
+ */
+ xstate->xsave.header.xfeatures &= ~vcpu->arch.guest_fpu.fpstate->xfd;
+
return fpu_copy_uabi_to_guest_fpstate(&vcpu->arch.guest_fpu,
guest_xsave->region,
kvm_caps.supported_xcr0,
--
2.52.0
If damon_call() is executed against a DAMON context that is not running,
the function returns an error while keeping the damon_call_control object
linked on the context's call_controls list.  Suppose the object is
deallocated after damon_call() returns, and yet another damon_call() is
executed against the same context.  The function tries to add the new
damon_call_control object to the call_controls list, which still holds
the pointer to the previous, already-freed damon_call_control object.
As a result, a use-after-free happens.
This can actually be triggered using the DAMON sysfs interface.  It is
not easily exploitable, though, since it requires sysfs write permission
and a decidedly unusual sequence of file writes.  Please refer to the
report for the reproduction steps.
Fix the issue by making damon_call() clean up the damon_call_control
object before returning the error.
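A hedged sketch of the failure mode (the caller shape and the callback
name are illustrative assumptions; DAMON's sysfs code uses on-stack
controls in practice):
	struct damon_call_control control = {
		.fn = some_fn,	/* hypothetical callback */
	};

	/* ctx is not running: damon_call() fails but, before this fix,
	 * left &control linked on ctx->call_controls. */
	damon_call(ctx, &control);

	/* 'control' goes out of scope or is otherwise freed here ... */

	/* ... so the next request is linked through a dangling pointer. */
	damon_call(ctx, &another_control);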
Reported-by: JaeJoon Jung <rgbi3307(a)gmail.com>
Closes: https://lore.kernel.org/20251224094401.20384-1-rgbi3307@gmail.com
Fixes: 42b7491af14c ("mm/damon/core: introduce damon_call()")
Cc: <stable(a)vger.kernel.org> # 6.14.x
Signed-off-by: SeongJae Park <sj(a)kernel.org>
---
mm/damon/core.c | 31 ++++++++++++++++++++++++++++++-
1 file changed, 30 insertions(+), 1 deletion(-)
diff --git a/mm/damon/core.c b/mm/damon/core.c
index 2d3e8006db50..65482a0ce20b 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -1442,6 +1442,35 @@ bool damon_is_running(struct damon_ctx *ctx)
return running;
}
+/*
+ * damon_call_handle_inactive_ctx() - handle a DAMON call request that was
+ * added to an inactive context.
+ * @ctx: The inactive DAMON context.
+ * @control: Control variable of the call request.
+ *
+ * This function is called when @control has been added to @ctx but @ctx is
+ * not running (inactive).  Check whether @ctx handled @control, and clean
+ * up @control if it was not handled.
+ *
+ * Returns 0 if @control was handled by @ctx, negative error code otherwise.
+ */
+static int damon_call_handle_inactive_ctx(
+ struct damon_ctx *ctx, struct damon_call_control *control)
+{
+ struct damon_call_control *c;
+
+ mutex_lock(&ctx->call_controls_lock);
+ list_for_each_entry(c, &ctx->call_controls, list) {
+ if (c == control) {
+ list_del(&control->list);
+ mutex_unlock(&ctx->call_controls_lock);
+ return -EINVAL;
+ }
+ }
+ mutex_unlock(&ctx->call_controls_lock);
+ return 0;
+}
+
/**
* damon_call() - Invoke a given function on DAMON worker thread (kdamond).
* @ctx: DAMON context to call the function for.
@@ -1472,7 +1501,7 @@ int damon_call(struct damon_ctx *ctx, struct damon_call_control *control)
list_add_tail(&control->list, &ctx->call_controls);
mutex_unlock(&ctx->call_controls_lock);
if (!damon_is_running(ctx))
- return -EINVAL;
+ return damon_call_handle_inactive_ctx(ctx, control);
if (control->repeat)
return 0;
wait_for_completion(&control->completion);
base-commit: a100272b3541cb00a5e29fdc16e428234ebfddd1
--
2.47.3
The patch titled
Subject: mm/damon/core: remove call_control in inactive contexts
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-damon-core-remove-call_control-in-inactive-contexts.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via various
branches at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there most days
------------------------------------------------------
From: SeongJae Park <sj(a)kernel.org>
Subject: mm/damon/core: remove call_control in inactive contexts
Date: Tue, 30 Dec 2025 17:23:13 -0800
If damon_call() is executed against a DAMON context that is not running,
the function returns an error while keeping the damon_call_control object
linked on the context's call_controls list.  Suppose the object is
deallocated after damon_call() returns, and yet another damon_call() is
executed against the same context.  The function tries to add the new
damon_call_control object to the call_controls list, which still holds
the pointer to the previous, already-freed damon_call_control object.
As a result, a use-after-free happens.
This can actually be triggered using the DAMON sysfs interface.  It is
not easily exploitable, though, since it requires sysfs write permission
and a decidedly unusual sequence of file writes.  Please refer to the
report for the reproduction steps.
Fix the issue by making two changes.  First, move the final kdamond_call()
that cancels all pending damon_call() requests on a terminating DAMON
context so that it runs before the ctx->kdamond reset.  Any code that sees
a NULL ctx->kdamond can then safely assume the context will no longer
access damon_call() requests.  Second, make damon_call() clean up the
damon_call_control objects that were added to an already-terminated DAMON
context before returning the error.
Link: https://lkml.kernel.org/r/20251231012315.75835-1-sj@kernel.org
Fixes: 004ded6bee11 ("mm/damon: accept parallel damon_call() requests")
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Reported-by: JaeJoon Jung <rgbi3307(a)gmail.com>
Closes: https://lore.kernel.org/20251224094401.20384-1-rgbi3307@gmail.com
Cc: <stable(a)vger.kernel.org> # 6.17.x
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/damon/core.c | 33 +++++++++++++++++++++++++++++++--
1 file changed, 31 insertions(+), 2 deletions(-)
--- a/mm/damon/core.c~mm-damon-core-remove-call_control-in-inactive-contexts
+++ a/mm/damon/core.c
@@ -1431,6 +1431,35 @@ bool damon_is_running(struct damon_ctx *
return running;
}
+/*
+ * damon_call_handle_inactive_ctx() - handle a DAMON call request that was
+ * added to an inactive context.
+ * @ctx: The inactive DAMON context.
+ * @control: Control variable of the call request.
+ *
+ * This function is called when @control has been added to @ctx but @ctx is
+ * not running (inactive).  Check whether @ctx handled @control, and clean
+ * up @control if it was not handled.
+ *
+ * Returns 0 if @control was handled by @ctx, negative error code otherwise.
+ */
+static int damon_call_handle_inactive_ctx(
+ struct damon_ctx *ctx, struct damon_call_control *control)
+{
+ struct damon_call_control *c;
+
+ mutex_lock(&ctx->call_controls_lock);
+ list_for_each_entry(c, &ctx->call_controls, list) {
+ if (c == control) {
+ list_del(&control->list);
+ mutex_unlock(&ctx->call_controls_lock);
+ return -EINVAL;
+ }
+ }
+ mutex_unlock(&ctx->call_controls_lock);
+ return 0;
+}
+
/**
* damon_call() - Invoke a given function on DAMON worker thread (kdamond).
* @ctx: DAMON context to call the function for.
@@ -1461,7 +1490,7 @@ int damon_call(struct damon_ctx *ctx, st
list_add_tail(&control->list, &ctx->call_controls);
mutex_unlock(&ctx->call_controls_lock);
if (!damon_is_running(ctx))
- return -EINVAL;
+ return damon_call_handle_inactive_ctx(ctx, control);
if (control->repeat)
return 0;
wait_for_completion(&control->completion);
@@ -2755,13 +2784,13 @@ done:
if (ctx->ops.cleanup)
ctx->ops.cleanup(ctx);
kfree(ctx->regions_score_histogram);
+ kdamond_call(ctx, true);
pr_debug("kdamond (%d) finishes\n", current->pid);
mutex_lock(&ctx->kdamond_lock);
ctx->kdamond = NULL;
mutex_unlock(&ctx->kdamond_lock);
- kdamond_call(ctx, true);
damos_walk_cancel(ctx);
mutex_lock(&damon_lock);
_
Patches currently in -mm which might be from sj(a)kernel.org are
mm-damon-core-remove-call_control-in-inactive-contexts.patch
mm-damon-core-introduce-nr_snapshots-damos-stat.patch
mm-damon-sysfs-schemes-introduce-nr_snapshots-damos-stat-file.patch
docs-mm-damon-design-update-for-nr_snapshots-damos-stat.patch
docs-admin-guide-mm-damon-usage-update-for-nr_snapshots-damos-stat.patch
docs-abi-damon-update-for-nr_snapshots-damos-stat.patch
mm-damon-update-damos-kerneldoc-for-stat-field.patch
mm-damon-core-implement-max_nr_snapshots.patch
mm-damon-sysfs-schemes-implement-max_nr_snapshots-file.patch
docs-mm-damon-design-update-for-max_nr_snapshots.patch
docs-admin-guide-mm-damon-usage-update-for-max_nr_snapshots.patch
docs-abi-damon-update-for-max_nr_snapshots.patch
mm-damon-core-add-trace-point-for-damos-stat-per-apply-interval.patch
mm-damon-sysfs-cleanup-intervals-subdirs-on-attrs-dir-setup-failure.patch
mm-damon-sysfs-cleanup-attrs-subdirs-on-context-dir-setup-failure.patch
mm-damon-sysfs-scheme-cleanup-quotas-subdirs-on-scheme-dir-setup-failure.patch
mm-damon-sysfs-scheme-cleanup-access_pattern-subdirs-on-scheme-dir-setup-failure.patch
mm-damon-tests-core-kunit-add-test-cases-for-multiple-regions-in-damon_test_split_regions_of-fix.patch
The patch titled
Subject: x86/kexec: add a sanity check on previous kernel's ima kexec buffer
has been added to the -mm mm-nonmm-unstable branch. Its filename is
x86-kexec-add-a-sanity-check-on-previous-kernels-ima-kexec-buffer.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-nonmm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via various
branches at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there most days
------------------------------------------------------
From: Harshit Mogalapalli <harshit.m.mogalapalli(a)oracle.com>
Subject: x86/kexec: add a sanity check on previous kernel's ima kexec buffer
Date: Tue, 30 Dec 2025 22:16:09 -0800
When the second-stage kernel is booted via kexec with a limiting command
line such as "mem=<size>", the physical range that contains the
carried-over IMA measurement list may fall outside the truncated RAM,
leading to a kernel panic.
BUG: unable to handle page fault for address: ffff97793ff47000
RIP: ima_restore_measurement_list+0xdc/0x45a
#PF: error_code(0x0000) - not-present page
Other architectures already validate the range with page_is_ram(), as done
in commit cbf9c4b9617b ("of: check previous kernel's ima-kexec-buffer
against memory bounds"); do a similar check on x86.
Without carrying the measurement list across kexec, the attestation
would fail.
Link: https://lkml.kernel.org/r/20251231061609.907170-4-harshit.m.mogalapalli@ora…
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli(a)oracle.com>
Fixes: b69a2afd5afc ("x86/kexec: Carry forward IMA measurement log on kexec")
Reported-by: Paul Webb <paul.x.webb(a)oracle.com>
Reviewed-by: Mimi Zohar <zohar(a)linux.ibm.com>
Cc: Alexander Graf <graf(a)amazon.com>
Cc: Ard Biesheuvel <ardb(a)kernel.org>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: guoweikang <guoweikang.kernel(a)gmail.com>
Cc: Henry Willard <henry.willard(a)oracle.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Jiri Bohac <jbohac(a)suse.cz>
Cc: Joel Granados <joel.granados(a)kernel.org>
Cc: Jonathan McDowell <noodles(a)fb.com>
Cc: Mike Rapoport <rppt(a)kernel.org>
Cc: Sohil Mehta <sohil.mehta(a)intel.com>
Cc: Sourabh Jain <sourabhjain(a)linux.ibm.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Yifei Liu <yifei.l.liu(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/x86/kernel/setup.c | 6 ++++++
1 file changed, 6 insertions(+)
--- a/arch/x86/kernel/setup.c~x86-kexec-add-a-sanity-check-on-previous-kernels-ima-kexec-buffer
+++ a/arch/x86/kernel/setup.c
@@ -439,9 +439,15 @@ int __init ima_free_kexec_buffer(void)
int __init ima_get_kexec_buffer(void **addr, size_t *size)
{
+ int ret;
+
if (!ima_kexec_buffer_size)
return -ENOENT;
+ ret = ima_validate_range(ima_kexec_buffer_phys, ima_kexec_buffer_size);
+ if (ret)
+ return ret;
+
*addr = __va(ima_kexec_buffer_phys);
*size = ima_kexec_buffer_size;
_
Patches currently in -mm which might be from harshit.m.mogalapalli(a)oracle.com are
ima-verify-the-previous-kernels-ima-buffer-lies-in-addressable-ram.patch
of-kexec-refactor-ima_get_kexec_buffer-to-use-ima_validate_range.patch
x86-kexec-add-a-sanity-check-on-previous-kernels-ima-kexec-buffer.patch
The patch titled
Subject: of/kexec: refactor ima_get_kexec_buffer() to use ima_validate_range()
has been added to the -mm mm-nonmm-unstable branch. Its filename is
of-kexec-refactor-ima_get_kexec_buffer-to-use-ima_validate_range.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-nonmm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via various
branches at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there most days
------------------------------------------------------
From: Harshit Mogalapalli <harshit.m.mogalapalli(a)oracle.com>
Subject: of/kexec: refactor ima_get_kexec_buffer() to use ima_validate_range()
Date: Tue, 30 Dec 2025 22:16:08 -0800
Refactor the OF/DT ima_get_kexec_buffer() to use a generic helper to
validate the address range. No functional change intended.
Link: https://lkml.kernel.org/r/20251231061609.907170-3-harshit.m.mogalapalli@ora…
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli(a)oracle.com>
Reviewed-by: Mimi Zohar <zohar(a)linux.ibm.com>
Cc: Alexander Graf <graf(a)amazon.com>
Cc: Ard Biesheuvel <ardb(a)kernel.org>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: guoweikang <guoweikang.kernel(a)gmail.com>
Cc: Henry Willard <henry.willard(a)oracle.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Jiri Bohac <jbohac(a)suse.cz>
Cc: Joel Granados <joel.granados(a)kernel.org>
Cc: Jonathan McDowell <noodles(a)fb.com>
Cc: Mike Rapoport <rppt(a)kernel.org>
Cc: Paul Webb <paul.x.webb(a)oracle.com>
Cc: Sohil Mehta <sohil.mehta(a)intel.com>
Cc: Sourabh Jain <sourabhjain(a)linux.ibm.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Yifei Liu <yifei.l.liu(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/of/kexec.c | 15 +++------------
1 file changed, 3 insertions(+), 12 deletions(-)
--- a/drivers/of/kexec.c~of-kexec-refactor-ima_get_kexec_buffer-to-use-ima_validate_range
+++ a/drivers/of/kexec.c
@@ -128,7 +128,6 @@ int __init ima_get_kexec_buffer(void **a
{
int ret, len;
unsigned long tmp_addr;
- unsigned long start_pfn, end_pfn;
size_t tmp_size;
const void *prop;
@@ -144,17 +143,9 @@ int __init ima_get_kexec_buffer(void **a
if (!tmp_size)
return -ENOENT;
- /*
- * Calculate the PFNs for the buffer and ensure
- * they are with in addressable memory.
- */
- start_pfn = PHYS_PFN(tmp_addr);
- end_pfn = PHYS_PFN(tmp_addr + tmp_size - 1);
- if (!page_is_ram(start_pfn) || !page_is_ram(end_pfn)) {
- pr_warn("IMA buffer at 0x%lx, size = 0x%zx beyond memory\n",
- tmp_addr, tmp_size);
- return -EINVAL;
- }
+ ret = ima_validate_range(tmp_addr, tmp_size);
+ if (ret)
+ return ret;
*addr = __va(tmp_addr);
*size = tmp_size;
_
Patches currently in -mm which might be from harshit.m.mogalapalli(a)oracle.com are
ima-verify-the-previous-kernels-ima-buffer-lies-in-addressable-ram.patch
of-kexec-refactor-ima_get_kexec_buffer-to-use-ima_validate_range.patch
x86-kexec-add-a-sanity-check-on-previous-kernels-ima-kexec-buffer.patch
The patch titled
Subject: ima: verify the previous kernel's IMA buffer lies in addressable RAM
has been added to the -mm mm-nonmm-unstable branch. Its filename is
ima-verify-the-previous-kernels-ima-buffer-lies-in-addressable-ram.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-nonmm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via various
branches at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there most days
------------------------------------------------------
From: Harshit Mogalapalli <harshit.m.mogalapalli(a)oracle.com>
Subject: ima: verify the previous kernel's IMA buffer lies in addressable RAM
Date: Tue, 30 Dec 2025 22:16:07 -0800
Patch series "Address page fault in ima_restore_measurement_list()", v3.
When the second-stage kernel is booted via kexec with a limiting command
line such as "mem=<size>", we observe a page fault:
BUG: unable to handle page fault for address: ffff97793ff47000
RIP: ima_restore_measurement_list+0xdc/0x45a
#PF: error_code(0x0000) - not-present page
This happens on x86_64 only, as this is already fixed on aarch64 by
commit cbf9c4b9617b ("of: check previous kernel's ima-kexec-buffer
against memory bounds").
This patch (of 3):
When the second-stage kernel is booted with a limiting command line (e.g.
"mem=<size>"), the IMA measurement buffer handed over from the previous
kernel may fall outside the addressable RAM of the new kernel. Accessing
such a buffer can fault during early restore.
Introduce a small generic helper, ima_validate_range(), which verifies
that a physical [start, end] range for the previous-kernel IMA buffer lies
within addressable memory:
- On x86, use pfn_range_is_mapped().
- On OF based architectures, use page_is_ram().
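As a usage sketch (this mirrors the x86 call site added in patch 3 of the
series, also shown above; the tail of the function is an assumption), an
arch's ima_get_kexec_buffer() validates the handed-over range before
mapping it:
	int __init ima_get_kexec_buffer(void **addr, size_t *size)
	{
		int ret;

		if (!ima_kexec_buffer_size)
			return -ENOENT;

		ret = ima_validate_range(ima_kexec_buffer_phys,
					 ima_kexec_buffer_size);
		if (ret)
			return ret;

		*addr = __va(ima_kexec_buffer_phys);
		*size = ima_kexec_buffer_size;

		return 0;
	}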
Link: https://lkml.kernel.org/r/20251231061609.907170-1-harshit.m.mogalapalli@ora…
Link: https://lkml.kernel.org/r/20251231061609.907170-2-harshit.m.mogalapalli@ora…
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli(a)oracle.com>
Cc: Alexander Graf <graf(a)amazon.com>
Cc: Ard Biesheuvel <ardb(a)kernel.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: guoweikang <guoweikang.kernel(a)gmail.com>
Cc: Henry Willard <henry.willard(a)oracle.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Jiri Bohac <jbohac(a)suse.cz>
Cc: Joel Granados <joel.granados(a)kernel.org>
Cc: Jonathan McDowell <noodles(a)fb.com>
Cc: Mike Rapoport <rppt(a)kernel.org>
Cc: Mimi Zohar <zohar(a)linux.ibm.com>
Cc: Paul Webb <paul.x.webb(a)oracle.com>
Cc: Sohil Mehta <sohil.mehta(a)intel.com>
Cc: Sourabh Jain <sourabhjain(a)linux.ibm.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Yifei Liu <yifei.l.liu(a)oracle.com>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/ima.h | 1
security/integrity/ima/ima_kexec.c | 35 +++++++++++++++++++++++++++
2 files changed, 36 insertions(+)
--- a/include/linux/ima.h~ima-verify-the-previous-kernels-ima-buffer-lies-in-addressable-ram
+++ a/include/linux/ima.h
@@ -69,6 +69,7 @@ static inline int ima_measure_critical_d
#ifdef CONFIG_HAVE_IMA_KEXEC
int __init ima_free_kexec_buffer(void);
int __init ima_get_kexec_buffer(void **addr, size_t *size);
+int ima_validate_range(phys_addr_t phys, size_t size);
#endif
#ifdef CONFIG_IMA_SECURE_AND_OR_TRUSTED_BOOT
--- a/security/integrity/ima/ima_kexec.c~ima-verify-the-previous-kernels-ima-buffer-lies-in-addressable-ram
+++ a/security/integrity/ima/ima_kexec.c
@@ -12,6 +12,8 @@
#include <linux/kexec.h>
#include <linux/of.h>
#include <linux/ima.h>
+#include <linux/mm.h>
+#include <linux/overflow.h>
#include <linux/reboot.h>
#include <asm/page.h>
#include "ima.h"
@@ -294,3 +296,36 @@ void __init ima_load_kexec_buffer(void)
pr_debug("Error restoring the measurement list: %d\n", rc);
}
}
+
+/*
+ * ima_validate_range - verify a physical buffer lies in addressable RAM
+ * @phys: physical start address of the buffer from previous kernel
+ * @size: size of the buffer
+ *
+ * Returns 0 on success, or -EINVAL on failure so callers can skip
+ * restoring.
+ */
+int ima_validate_range(phys_addr_t phys, size_t size)
+{
+ unsigned long start_pfn, end_pfn;
+ phys_addr_t end_phys;
+
+ if (check_add_overflow(phys, (phys_addr_t)size - 1, &end_phys))
+ return -EINVAL;
+
+ start_pfn = PHYS_PFN(phys);
+ end_pfn = PHYS_PFN(end_phys);
+
+#ifdef CONFIG_X86
+ if (!pfn_range_is_mapped(start_pfn, end_pfn))
+#else
+ if (!page_is_ram(start_pfn) || !page_is_ram(end_pfn))
+#endif
+ {
+ pr_warn("IMA: previous kernel measurement buffer %pa (size 0x%zx) lies outside available memory\n",
+ &phys, size);
+ return -EINVAL;
+ }
+
+ return 0;
+}
_
Patches currently in -mm which might be from harshit.m.mogalapalli(a)oracle.com are
ima-verify-the-previous-kernels-ima-buffer-lies-in-addressable-ram.patch
of-kexec-refactor-ima_get_kexec_buffer-to-use-ima_validate_range.patch
x86-kexec-add-a-sanity-check-on-previous-kernels-ima-kexec-buffer.patch
crypto_alloc_acomp_node() may return an ERR_PTR(), but the fail path
checks only for NULL and can pass an error pointer to crypto_free_acomp().
Use IS_ERR_OR_NULL() so that only valid acomp instances are freed.
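For context, a hedged sketch of the allocation/error pattern (variable and
field names here are illustrative, not zswap's exact code):
	struct crypto_acomp *acomp = NULL;
	u8 *buffer = NULL;
	int ret = 0;

	acomp = crypto_alloc_acomp_node(pool->tfm_name, 0, 0, cpu_to_node(cpu));
	if (IS_ERR(acomp)) {
		ret = PTR_ERR(acomp);
		goto fail;
	}
	/* ... further per-CPU setup that can also fail ... */
	return 0;
fail:
	/* acomp can be an ERR_PTR() here, so a bare NULL check is not
	 * enough before crypto_free_acomp(). */
	if (!IS_ERR_OR_NULL(acomp))
		crypto_free_acomp(acomp);
	kfree(buffer);
	return ret;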
Fixes: 779b9955f643 ("mm: zswap: move allocations during CPU init outside the lock")
Cc: stable(a)vger.kernel.org
Signed-off-by: Pavel Butsykin <pbutsykin(a)cloudlinux.com>
---
mm/zswap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/zswap.c b/mm/zswap.c
index 5d0f8b13a958..ac9b7a60736b 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -787,7 +787,7 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
return 0;
fail:
- if (acomp)
+ if (!IS_ERR_OR_NULL(acomp))
crypto_free_acomp(acomp);
kfree(buffer);
return ret;
--
2.52.0
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x bba4322e3f303b2d656e748be758320b567f046f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122927-untapped-stimulate-e26d@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From bba4322e3f303b2d656e748be758320b567f046f Mon Sep 17 00:00:00 2001
From: Damien Le Moal <dlemoal(a)kernel.org>
Date: Wed, 5 Nov 2025 06:22:36 +0900
Subject: [PATCH] block: freeze queue when updating zone resources
Modify disk_update_zone_resources() to freeze the device queue before
updating the number of zones, zone capacity and other zone related
resources. The locking order resulting from the call to
queue_limits_commit_update_frozen() is preserved, that is, the queue
limits lock is first taken by calling queue_limits_start_update() before
freezing the queue, and the queue is unfrozen after executing
queue_limits_commit_update(), which replaces the call to
queue_limits_commit_update_frozen().
This change ensures that there are no in-flights I/Os when the zone
resources are updated due to a zone revalidation. In case of error when
the limits are applied, directly call disk_free_zone_resources() from
disk_update_zone_resources() while the disk queue is still frozen to
avoid needing to freeze & unfreeze the queue again in
blk_revalidate_disk_zones(), thus simplifying that function's code a
little.
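Condensed from the hunks below, the resulting ordering in
disk_update_zone_resources() is (sketch only, error paths trimmed):
	lim = queue_limits_start_update(q);	/* take the limits lock first */
	memflags = blk_mq_freeze_queue(q);	/* then freeze the queue */

	/* ... update nr_zones, zone capacity, write plug pool size ... */

	ret = queue_limits_commit_update(q, &lim);
	if (ret)
		disk_free_zone_resources(disk);	/* still frozen, no in-flight I/O */
	blk_mq_unfreeze_queue(q, memflags);
	return ret;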
Fixes: 0b83c86b444a ("block: Prevent potential deadlock in blk_revalidate_disk_zones()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal(a)kernel.org>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Reviewed-by: Chaitanya Kulkarni <kch(a)nvidia.com>
Reviewed-by: Hannes Reinecke <hare(a)suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index 1621e8f78338..39381f2b2e94 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -1557,8 +1557,13 @@ static int disk_update_zone_resources(struct gendisk *disk,
{
struct request_queue *q = disk->queue;
unsigned int nr_seq_zones, nr_conv_zones;
- unsigned int pool_size;
+ unsigned int pool_size, memflags;
struct queue_limits lim;
+ int ret = 0;
+
+ lim = queue_limits_start_update(q);
+
+ memflags = blk_mq_freeze_queue(q);
disk->nr_zones = args->nr_zones;
disk->zone_capacity = args->zone_capacity;
@@ -1568,11 +1573,10 @@ static int disk_update_zone_resources(struct gendisk *disk,
if (nr_conv_zones >= disk->nr_zones) {
pr_warn("%s: Invalid number of conventional zones %u / %u\n",
disk->disk_name, nr_conv_zones, disk->nr_zones);
- return -ENODEV;
+ ret = -ENODEV;
+ goto unfreeze;
}
- lim = queue_limits_start_update(q);
-
/*
* Some devices can advertize zone resource limits that are larger than
* the number of sequential zones of the zoned block device, e.g. a
@@ -1609,7 +1613,15 @@ static int disk_update_zone_resources(struct gendisk *disk,
}
commit:
- return queue_limits_commit_update_frozen(q, &lim);
+ ret = queue_limits_commit_update(q, &lim);
+
+unfreeze:
+ if (ret)
+ disk_free_zone_resources(disk);
+
+ blk_mq_unfreeze_queue(q, memflags);
+
+ return ret;
}
static int blk_revalidate_conv_zone(struct blk_zone *zone, unsigned int idx,
@@ -1774,7 +1786,7 @@ int blk_revalidate_disk_zones(struct gendisk *disk)
sector_t zone_sectors = q->limits.chunk_sectors;
sector_t capacity = get_capacity(disk);
struct blk_revalidate_zone_args args = { };
- unsigned int noio_flag;
+ unsigned int memflags, noio_flag;
int ret = -ENOMEM;
if (WARN_ON_ONCE(!blk_queue_is_zoned(q)))
@@ -1824,20 +1836,14 @@ int blk_revalidate_disk_zones(struct gendisk *disk)
ret = -ENODEV;
}
- /*
- * Set the new disk zone parameters only once the queue is frozen and
- * all I/Os are completed.
- */
if (ret > 0)
- ret = disk_update_zone_resources(disk, &args);
- else
- pr_warn("%s: failed to revalidate zones\n", disk->disk_name);
- if (ret) {
- unsigned int memflags = blk_mq_freeze_queue(q);
+ return disk_update_zone_resources(disk, &args);
- disk_free_zone_resources(disk);
- blk_mq_unfreeze_queue(q, memflags);
- }
+ pr_warn("%s: failed to revalidate zones\n", disk->disk_name);
+
+ memflags = blk_mq_freeze_queue(q);
+ disk_free_zone_resources(disk);
+ blk_mq_unfreeze_queue(q, memflags);
return ret;
}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x de83d4617f9fe059623e97acf7e1e10d209625b5
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122941-reluctant-exhale-a49f@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From de83d4617f9fe059623e97acf7e1e10d209625b5 Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan(a)kernel.org>
Date: Mon, 20 Oct 2025 06:53:10 +0200
Subject: [PATCH] iommu/mediatek: fix use-after-free on probe deferral
The driver is dropping the references taken to the larb devices during
probe, both after a successful lookup and on errors.  This can
potentially lead to a use-after-free if a larb device has not yet been
bound to its driver, causing the iommu driver's probe to be deferred.
Fix this by keeping the references as expected while the iommu driver is
bound.
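A condensed sketch of the problematic pattern in the old code (the larb
lookup shape is inferred from the driver; the hunks below show the actual
change):
	plarbdev = of_find_device_by_node(larbnode);	/* takes a reference */
	component_match_add(dev, match, component_compare_dev, &plarbdev->dev);
	platform_device_put(plarbdev);	/* reference dropped while the match
					 * (and larb_imu[]) still point at the
					 * device, so a later probe deferral
					 * can use freed memory */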
Fixes: 26593928564c ("iommu/mediatek: Add error path for loop of mm_dts_parse")
Cc: stable(a)vger.kernel.org
Cc: Yong Wu <yong.wu(a)mediatek.com>
Acked-by: Robin Murphy <robin.murphy(a)arm.com>
Signed-off-by: Johan Hovold <johan(a)kernel.org>
Reviewed-by: Yong Wu <yong.wu(a)mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno(a)collabora.com>
Signed-off-by: Joerg Roedel <joerg.roedel(a)amd.com>
diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 82a55fe19a62..54d8936d9d11 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -1213,16 +1213,19 @@ static int mtk_iommu_mm_dts_parse(struct device *dev, struct component_match **m
}
component_match_add(dev, match, component_compare_dev, &plarbdev->dev);
- platform_device_put(plarbdev);
}
- if (!frst_avail_smicomm_node)
- return -EINVAL;
+ if (!frst_avail_smicomm_node) {
+ ret = -EINVAL;
+ goto err_larbdev_put;
+ }
pcommdev = of_find_device_by_node(frst_avail_smicomm_node);
of_node_put(frst_avail_smicomm_node);
- if (!pcommdev)
- return -ENODEV;
+ if (!pcommdev) {
+ ret = -ENODEV;
+ goto err_larbdev_put;
+ }
data->smicomm_dev = &pcommdev->dev;
link = device_link_add(data->smicomm_dev, dev,
@@ -1230,7 +1233,8 @@ static int mtk_iommu_mm_dts_parse(struct device *dev, struct component_match **m
platform_device_put(pcommdev);
if (!link) {
dev_err(dev, "Unable to link %s.\n", dev_name(data->smicomm_dev));
- return -EINVAL;
+ ret = -EINVAL;
+ goto err_larbdev_put;
}
return 0;
@@ -1402,8 +1406,12 @@ static int mtk_iommu_probe(struct platform_device *pdev)
iommu_device_sysfs_remove(&data->iommu);
out_list_del:
list_del(&data->list);
- if (MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_MM))
+ if (MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_MM)) {
device_link_remove(data->smicomm_dev, dev);
+
+ for (i = 0; i < MTK_LARB_NR_MAX; i++)
+ put_device(data->larb_imu[i].dev);
+ }
out_runtime_disable:
pm_runtime_disable(dev);
return ret;
@@ -1423,6 +1431,9 @@ static void mtk_iommu_remove(struct platform_device *pdev)
if (MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_MM)) {
device_link_remove(data->smicomm_dev, &pdev->dev);
component_master_del(&pdev->dev, &mtk_iommu_com_ops);
+
+ for (i = 0; i < MTK_LARB_NR_MAX; i++)
+ put_device(data->larb_imu[i].dev);
}
pm_runtime_disable(&pdev->dev);
for (i = 0; i < data->plat_data->banks_num; i++) {
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 5654889a94b0de5ad6ceae3793e7f5e0b61b50b6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025122900-ripple-expert-4378@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5654889a94b0de5ad6ceae3793e7f5e0b61b50b6 Mon Sep 17 00:00:00 2001
From: Nicolas Ferre <nicolas.ferre(a)microchip.com>
Date: Fri, 14 Nov 2025 11:33:13 +0100
Subject: [PATCH] ARM: dts: microchip: sama7g5: fix uart fifo size to 32
On some flexcom nodes related to uart, the fifo sizes were wrong: fix
them to 32.
Fixes: 7540629e2fc7 ("ARM: dts: at91: add sama7g5 SoC DT and sama7g5-ek")
Cc: stable(a)vger.kernel.org # 5.15+
Signed-off-by: Nicolas Ferre <nicolas.ferre(a)microchip.com>
Link: https://lore.kernel.org/r/20251114103313.20220-2-nicolas.ferre@microchip.com
Signed-off-by: Claudiu Beznea <claudiu.beznea(a)tuxon.dev>
diff --git a/arch/arm/boot/dts/microchip/sama7g5.dtsi b/arch/arm/boot/dts/microchip/sama7g5.dtsi
index 381cbcfcb34a..03ef3d9aaeec 100644
--- a/arch/arm/boot/dts/microchip/sama7g5.dtsi
+++ b/arch/arm/boot/dts/microchip/sama7g5.dtsi
@@ -824,7 +824,7 @@ uart4: serial@200 {
dma-names = "tx", "rx";
atmel,use-dma-rx;
atmel,use-dma-tx;
- atmel,fifo-size = <16>;
+ atmel,fifo-size = <32>;
status = "disabled";
};
};
@@ -850,7 +850,7 @@ uart7: serial@200 {
dma-names = "tx", "rx";
atmel,use-dma-rx;
atmel,use-dma-tx;
- atmel,fifo-size = <16>;
+ atmel,fifo-size = <32>;
status = "disabled";
};
};