On Wed, Sep 09, 2020 at 10:05:27PM +0100, Marc Zyngier wrote:
KVM currently assumes that an instruction abort can never be a write. This is in general true, except when the abort is triggered by a S1PTW on instruction fetch that tries to update the S1 page tables (to set AF, for example).
This can happen if the page tables have been paged out and brought back in without seeing a direct write to them (they are thus marked read only), and the fault handling code will make the PT executable(!) instead of writable. The guest gets stuck forever.
In these conditions, the permission fault must be considered as a write so that the Stage-1 update can take place. This is essentially the I-side equivalent of the problem fixed by 60e21a0ef54c ("arm64: KVM: Take S1 walks into account when determining S2 write faults").
Update both kvm_is_write_fault() to return true on IABT+S1PTW, as well as kvm_vcpu_trap_is_iabt() to return false in the same conditions.
Cc: stable@vger.kernel.org Signed-off-by: Marc Zyngier maz@kernel.org
This could do with some cleanup (kvm_vcpu_dabt_iss1tw has nothing to do with data aborts), but I've chosen to keep the patch simple in order to ease backporting.
arch/arm64/include/asm/kvm_emulate.h | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h index d21676409a24..33d7e16edaa3 100644 --- a/arch/arm64/include/asm/kvm_emulate.h +++ b/arch/arm64/include/asm/kvm_emulate.h @@ -480,7 +480,8 @@ static __always_inline u8 kvm_vcpu_trap_get_class(const struct kvm_vcpu *vcpu) static inline bool kvm_vcpu_trap_is_iabt(const struct kvm_vcpu *vcpu) {
- return kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_IABT_LOW;
- return (kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_IABT_LOW &&
!kvm_vcpu_dabt_iss1tw(vcpu));
} static __always_inline u8 kvm_vcpu_trap_get_fault(const struct kvm_vcpu *vcpu) @@ -520,6 +521,9 @@ static __always_inline int kvm_vcpu_sys_get_rt(struct kvm_vcpu *vcpu) static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu) {
- if (kvm_vcpu_dabt_iss1tw(vcpu))
return true;
Hmm, I'm a bit uneasy about the interaction of this with kvm_handle_guest_abort() if we take an S1PTW fault on instruction fetch with our page-tables sitting in a read-only memslot. In this case, I think we'll end up injecting a data abort into the guest instead of an instruction abort. It hurts my brain thinking about it though.
Overall, I'd be inclined to:
1. Rename kvm_vcpu_dabt_iss1tw() to kvm_vcpu_abt_iss1tw()
2. Introduce something like kvm_is_exec_fault() as:
return kvm_vcpu_trap_is_iabt() && !kvm_vcpu_abt_iss1tw();
3. Use that new function in user_mem_abort() to assign 'exec_fault'
4. Hack kvm_is_write_fault() as you have done above.
Which I _think_ should work (famous last words)...
The only nasty bit is that we then duplicate the kvm_vcpu_dabt_iss1tw() check in both kvm_is_write_fault() and kvm_vcpu_dabt_iswrite(). Perhaps we could remove the latter function in favour of the first? Anyway, obviously this sort of cleanup isn't for stable.
Will