Re: [PATCH 1/3] KVM: arm64: Fix S1PTW handling on RO memslots

21 Dec 2022

On Tue, 20 Dec 2022 21:47:36 +0000,
Oliver Upton oliver.upton@linux.dev wrote:
...
Hi Marc,
On Tue, Dec 20, 2022 at 08:09:21PM +0000, Marc Zyngier wrote:
...
A recent development on the EFI front has resulted in guests having
their page tables baked in the firmware binary, and mapped into
the IPA space as part as a read-only memslot.
as part of a
...
Not only this is legitimate, but it also results in added security,
so thumbs up. However, this clashes mildly with our handling of a S1PTW
as a write to correctly handle AF/DB updates to the S1 PTs, and results
in the guest taking an abort it won't recover from (the PTs mapping the
vectors will suffer freom the same problem...).
To be clear, the read-only page tables already have the AF set,
right?  They certainly must, or else the guest isn't getting far :)
Yes, the guest definitely has the AF set in the PT, and is not trying
to use the HW-assisted AF (which obviously wouldn't work).
...
I understand you're trying to describe _why_ we promote S1PTW to a
write, but doing it inline with the context of the EFI issue makes
it slightly unclear. Could you break these ideas up into two
paragraphs and maybe spell out the fault conditions a bit more?
A recent development on the EFI front has resulted in guests having
  their page tables baked in the firmware binary, and mapped into the
  IPA space as part of a read-only memslot. Not only is this legitimate,
  but it also results in added security, so thumbs up.
It is possible to take an S1PTW translation fault if the S1 PTs are
  unmapped at stage-2. However, KVM unconditionally treats S1PTW as a
  write to correctly handle hardware AF/DB updates to the S1 PTs.
  Furthermore, KVM injects a data abort into the guest for S1PTW writes.
  In the aforementioned case this results in the guest taking an abort
  it won't recover from, as the S1 PTs mapping the vectors suffer from
  the same problem.
Dunno, maybe I stink at reading which is why I got confused in the
first place.
Nothing wrong with you, just that my write-up is indeed sloppy. I'll
copy paste the above, thanks!
...
...
So clearly our handling is... wrong.
Instead, switch to a two-pronged approach:

On S1PTW translation fault, handle the fault as a read

On S1PTW permission fault, handle the fault as a write


This is of no consequence to SW that *writes* to its PTs (the write
will trigger a non-S1PTW fault), and SW that uses RO PTs will not
use AF/DB anyway, as that'd be wrong.
Only in the case described in c4ad98e4b72c ("KVM: arm64: Assume write
fault on S1PTW permission fault on instruction fetch") do we end-up
with two back-to-back faults (page being evicted and faulted back).
I don't think this is a case worth optimising for.
Fixes: c4ad98e4b72c ("KVM: arm64: Assume write fault on S1PTW permission fault on instruction fetch")
Signed-off-by: Marc Zyngier maz@kernel.org
Cc: stable@vger.kernel.org

arch/arm64/include/asm/kvm_emulate.h | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 9bdba47f7e14..fd6ad8b21f85 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -373,8 +373,26 @@ static __always_inline int kvm_vcpu_sys_get_rt(struct kvm_vcpu *vcpu)
 
 static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
 {

if (kvm_vcpu_abt_iss1tw(vcpu))
return true;




if (kvm_vcpu_abt_iss1tw(vcpu)) {
/*


 * Only a permission fault on a S1PTW should be


 * considered as a write. Otherwise, page tables baked


 * in a read-only memslot will result in an exception


 * being delivered in the guest.



Somewhat of a tangent, but:
Aren't we somewhat unaligned with the KVM UAPI by injecting an
exception in this case? I know we've been doing it for a while, but it
flies in the face of the rules outlined in the
KVM_SET_USER_MEMORY_REGION documentation.
That's an interesting point, and I certainly haven't considered that
for faults introduced by page table walks.
I'm not sure what userspace can do with that though. The problem is
that this is a write for which we don't have useful data: although we
know it is a page-table walker access, we don't know what it was about
to write. The instruction that caused the write is meaningless (it
could either be a load, a store, or an instruction fetch). How do you
populate the data[] field then?
If anything, this is closer to KVM_EXIT_ARM_NISV, for which we give
userspace the full ESR and ask it to sort it out. I doubt it will be
able to, but hey, maybe it is worth a shot. This would need to be a
different exit reason though, as NISV is explicitly for non-memslot
stuff.
In any case, the documentation for KVM_SET_USER_MEMORY_REGION needs to
reflect the fact that KVM_EXIT_MMIO cannot represent a fault due to a
S1 PTW.
...
...

 * The drawback is that we end-up fauling twice if the



typo: s/fauling/faulting/
...

 * guest is using any of HW AF/DB: a translation fault


 * to map the page containing the PT (read only at


 * first), then a permission fault to allow the flags


 * to be set.


 */


switch (kvm_vcpu_trap_get_fault_type(vcpu)) {


case ESR_ELx_FSC_PERM:


	return true;


default:


	return false;


}


}

if (kvm_vcpu_trap_is_iabt(vcpu))
   	return false;
-- 
2.34.1
Besides the changelog/comment suggestions, the patch looks good to me.
Reviewed-by: Oliver Upton oliver.upton@linux.dev
Thanks for the quick review! I'll wait a bit before respinning the
series, as I'd like to get closure on the UAPI point you have raised.
Cheers,
M.
-- 
Without deviation from the norm, progress is not possible.

    

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH 1/3] KVM: arm64: Fix S1PTW handling on RO memslots