On Sat, Jul 19, 2025 at 2:24 PM Jiaqi Yan jiaqiyan@google.com wrote:
On Sat, Jul 12, 2025 at 12:57 PM Oliver Upton oliver.upton@linux.dev wrote:
On Fri, Jul 11, 2025 at 04:59:11PM -0700, Jiaqi Yan wrote:
- Add some detail about FEAT_RAS where we may still exit to userspace for host-controlled memory, as we cannot differentiate between a stage-1 or stage-2 TTW SEA when taken on the descriptor PA
Ah, IIUC, you are saying even if the FSC code tells fault is on TTW (esr_fsc_is_secc_ttw or esr_fsc_is_sea_ttw), it can either be guest stage-1's or stage-2's descriptor PA, and we can tell which from which.
However, if ESR_ELx_S1PTW is set, we can tell this is a sub-case of stage-2 descriptor PA, their usage is for stage-1 PTW but they are stage-2 memory.
Is my current understanding right?
Yep, that's exactly what I'm getting at. As you note, stage-2 aborts during a stage-1 walk are sufficiently described, but not much else.
Got it, thanks!
+/*
- Returns true if the SEA should be handled locally within KVM if the abort is
- caused by a kernel memory allocation (e.g. stage-2 table memory).
- */
+static bool host_owns_sea(struct kvm_vcpu *vcpu, u64 esr) +{
/*
* Without FEAT_RAS HCR_EL2.TEA is RES0, meaning any external abort
* taken from a guest EL to EL2 is due to a host-imposed access (e.g.
* stage-2 PTW).
*/
if (!cpus_have_final_cap(ARM64_HAS_RAS_EXTN))
return true;
/* KVM owns the VNCR when the vCPU isn't in a nested context. */
if (is_hyp_ctxt(vcpu) && (esr & ESR_ELx_VNCR))
return true;
/*
* Determining if an external abort during a table walk happened at
* stage-2 is only possible with S1PTW is set. Otherwise, since KVM
* sets HCR_EL2.TEA, SEAs due to a stage-1 walk (i.e. accessing the PA
* of the stage-1 descriptor) can reach here and are reported with a
* TTW ESR value.
*/
return esr_fsc_is_sea_ttw(esr) && (esr & ESR_ELx_S1PTW);
Should we include esr_fsc_is_secc_ttw? like (esr_fsc_is_sea_ttw(esr) || esr_fsc_is_secc_ttw(esr)) && (esr & ESR_ELx_S1PTW)
Parity / ECC errors are not permitted if FEAT_RAS is implemented (which is tested for up front).
Ah, thanks for pointing this out.
+}
int kvm_handle_guest_sea(struct kvm_vcpu *vcpu) {
u64 esr = kvm_vcpu_get_esr(vcpu);
struct kvm_run *run = vcpu->run;
struct kvm *kvm = vcpu->kvm;
u64 esr_mask = ESR_ELx_EC_MASK |
ESR_ELx_FnV |
ESR_ELx_EA |
ESR_ELx_CM |
ESR_ELx_WNR |
ESR_ELx_FSC;
Do you (and why) exclude ESR_ELx_IL on purpose?
Unintended :)
Will add into my patch.
BTW, if my previous statement about TTW SEA is correct, then I also understand why we need to explicitly exclude ESR_ELx_S1PTW.
Right, we shouldn't be exposing genuine stage-2 external aborts to userspace.
u64 ipa;
/* * Give APEI the opportunity to claim the abort before handling it * within KVM. apei_claim_sea() expects to be called with IRQs
@@ -1824,7 +1864,32 @@ int kvm_handle_guest_sea(struct kvm_vcpu *vcpu) if (apei_claim_sea(NULL) == 0)
I assume kvm should still lockdep_assert_irqs_enabled(), right? That is, a WARN_ON_ONCE is still useful in case?
Ah, this is diffed against my VNCR prefix which has this context. Yes, I want to preserve the lockdep assertion.
Thanks for sharing the patch! Should I wait for you to send and queue to kvmarm/next and rebase my v3 to it? Or should I insert it into my v3 patch series with you as the commit author, and Signed-off-by you?
Friendly ping for this question, my v3 is ready but want to confirm the best option here.
Recently we found even the newer ARM64 platforms used by our org has to rely on KVM to more gracefully handle SEA (lacking support from APEI), so we would really want to work with upstream to lock down the proposed approach/UAPI asap.
Thanks!
BTW, while I am working on v3, I think it is probably better to decouple the current patchset into two. The first one for KVM_EXIT_ARM_SEA, and the second one for injecting (D|I)ABT with user-supplemented esr. This way may help KVM_EXIT_ARM_SEA, the more important feature, get reviewed and accepted sooner. I will send out a separate patchset for enhancing the guest SEA injection.
From eb63dbf07b3d1f42b059f5c94abd147d195299c8 Mon Sep 17 00:00:00 2001 From: Oliver Upton oliver.upton@linux.dev Date: Thu, 10 Jul 2025 17:14:51 -0700 Subject: [PATCH] KVM: arm64: nv: Handle SEAs due to VNCR redirection
Signed-off-by: Oliver Upton oliver.upton@linux.dev
arch/arm64/include/asm/kvm_mmu.h | 1 + arch/arm64/include/asm/kvm_ras.h | 25 ------------------------- arch/arm64/kvm/mmu.c | 30 ++++++++++++++++++------------ arch/arm64/kvm/nested.c | 3 +++ 4 files changed, 22 insertions(+), 37 deletions(-) delete mode 100644 arch/arm64/include/asm/kvm_ras.h
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index ae563ebd6aee..e4069f2ce642 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -180,6 +180,7 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu); int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa, phys_addr_t pa, unsigned long size, bool writable);
+int kvm_handle_guest_sea(struct kvm_vcpu *vcpu); int kvm_handle_guest_abort(struct kvm_vcpu *vcpu);
phys_addr_t kvm_mmu_get_httbr(void); diff --git a/arch/arm64/include/asm/kvm_ras.h b/arch/arm64/include/asm/kvm_ras.h deleted file mode 100644 index 9398ade632aa..000000000000 --- a/arch/arm64/include/asm/kvm_ras.h +++ /dev/null @@ -1,25 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -/* Copyright (C) 2018 - Arm Ltd */
-#ifndef __ARM64_KVM_RAS_H__ -#define __ARM64_KVM_RAS_H__
-#include <linux/acpi.h> -#include <linux/errno.h> -#include <linux/types.h>
-#include <asm/acpi.h>
-/*
- Was this synchronous external abort a RAS notification?
- Returns '0' for errors handled by some RAS subsystem, or -ENOENT.
- */
-static inline int kvm_handle_guest_sea(void) -{
/* apei_claim_sea(NULL) expects to mask interrupts itself */
lockdep_assert_irqs_enabled();
return apei_claim_sea(NULL);
-}
-#endif /* __ARM64_KVM_RAS_H__ */ diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c index 1c78864767c5..6934f4acdc45 100644 --- a/arch/arm64/kvm/mmu.c +++ b/arch/arm64/kvm/mmu.c @@ -4,19 +4,20 @@
- Author: Christoffer Dall c.dall@virtualopensystems.com
*/
+#include <linux/acpi.h> #include <linux/mman.h> #include <linux/kvm_host.h> #include <linux/io.h> #include <linux/hugetlb.h> #include <linux/sched/signal.h> #include <trace/events/kvm.h> +#include <asm/acpi.h> #include <asm/pgalloc.h> #include <asm/cacheflush.h> #include <asm/kvm_arm.h> #include <asm/kvm_mmu.h> #include <asm/kvm_pgtable.h> #include <asm/kvm_pkvm.h> -#include <asm/kvm_ras.h> #include <asm/kvm_asm.h> #include <asm/kvm_emulate.h> #include <asm/virt.h> @@ -1811,6 +1812,20 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa) read_unlock(&vcpu->kvm->mmu_lock); }
+int kvm_handle_guest_sea(struct kvm_vcpu *vcpu) +{
/*
* Give APEI the opportunity to claim the abort before handling it
* within KVM. apei_claim_sea() expects to be called with IRQs
* enabled.
*/
lockdep_assert_irqs_enabled();
if (apei_claim_sea(NULL) == 0)
return 1;
return kvm_inject_serror(vcpu);
+}
/**
- kvm_handle_guest_abort - handles all 2nd stage aborts
- @vcpu: the VCPU pointer
@@ -1834,17 +1849,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu) gfn_t gfn; int ret, idx;
/* Synchronous External Abort? */
if (kvm_vcpu_abt_issea(vcpu)) {
/*
* For RAS the host kernel may handle this abort.
* There is no need to pass the error into the guest.
*/
if (kvm_handle_guest_sea())
return kvm_inject_serror(vcpu);
return 1;
}
if (kvm_vcpu_abt_issea(vcpu))
return kvm_handle_guest_sea(vcpu); esr = kvm_vcpu_get_esr(vcpu);
diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c index 096747a61bf6..38b0e3a9a6db 100644 --- a/arch/arm64/kvm/nested.c +++ b/arch/arm64/kvm/nested.c @@ -1289,6 +1289,9 @@ int kvm_handle_vncr_abort(struct kvm_vcpu *vcpu)
BUG_ON(!(esr & ESR_ELx_VNCR_SHIFT));
if (kvm_vcpu_abt_issea(vcpu))
return kvm_handle_guest_sea(vcpu);
if (esr_fsc_is_permission_fault(esr)) { inject_vncr_perm(vcpu); } else if (esr_fsc_is_translation_fault(esr)) {
-- 2.39.5