Commit 7db21530479f ("KVM: arm64: Restore hyp when panicking in guest context") tracks the currently running vCPU, clearing the pointer to NULL on exit from a guest.
Unfortunately, the use of 'set_loaded_vcpu' clobbers x1 to point at the kvm_hyp_ctxt instead of the vCPU context, causing the subsequent RAS code to go off into the weeds when it saves the DISR assuming that the CPU context is embedded in a struct vCPU.
Leave x1 alone and use x3 as a temporary register instead when clearing the vCPU on the guest exit path.
Cc: Marc Zyngier <maz@kernel.org>
Cc: Andrew Scull <ascull@google.com>
Cc: stable@vger.kernel.org
Fixes: 7db21530479f ("KVM: arm64: Restore hyp when panicking in guest context")
Suggested-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
This was pretty awful to debug!
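
For anyone trying to picture the failure mode, here is a rough user-space C model of it. This is only a sketch: the struct layouts, field names ('running_vcpu', 'vtcr' as the neighbouring field) and the helper functions are hypothetical stand-ins and not the kernel's definitions. The one thing it is meant to show is that saving DISR relative to a context pointer is only safe while that pointer still refers to a context embedded in a vCPU.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct vcpu;

struct cpu_context {
	uint64_t regs[31];
	struct vcpu *running_vcpu;
};

struct vcpu {
	uint64_t disr_el1;		/* slot the RAS path saves DISR into */
	struct cpu_context ctxt;	/* guest context embedded in the vCPU */
};

/* Per-CPU hyp state: this context is NOT embedded in a struct vcpu. */
struct hyp_percpu {
	uint64_t vtcr;			/* unrelated neighbour of the hyp context */
	struct cpu_context hyp_ctxt;
};

/* Models set_loaded_vcpu: 'ctxt_reg' is an output and gets clobbered. */
static void set_loaded_vcpu(struct hyp_percpu *hyp, struct vcpu *v,
			    struct cpu_context **ctxt_reg)
{
	*ctxt_reg = &hyp->hyp_ctxt;
	(*ctxt_reg)->running_vcpu = v;
}

/* Models the RAS path: assumes 'ctxt' lives inside a struct vcpu. */
static void save_disr(struct cpu_context *ctxt, uint64_t disr)
{
	char *base = (char *)ctxt - offsetof(struct vcpu, ctxt);

	memcpy(base + offsetof(struct vcpu, disr_el1), &disr, sizeof(disr));
}

/* Guest exit with the bug: x1 doubles as the macro's output register. */
static void guest_exit_buggy(struct hyp_percpu *hyp, struct vcpu *v,
			     uint64_t disr)
{
	struct cpu_context *x1 = &v->ctxt;

	set_loaded_vcpu(hyp, NULL, &x1);	/* x1 now points at hyp_ctxt */
	save_disr(x1, disr);			/* store lands outside the vCPU */
}

/* Guest exit with the fix: a different register takes the clobber. */
static void guest_exit_fixed(struct hyp_percpu *hyp, struct vcpu *v,
			     uint64_t disr)
{
	struct cpu_context *x1 = &v->ctxt;
	struct cpu_context *x2;

	set_loaded_vcpu(hyp, NULL, &x2);	/* x1 is left alone */
	save_disr(x1, disr);
}
```

In this model the buggy path overwrites 'vtcr' in the per-CPU hyp state and leaves the vCPU's DISR slot untouched, while the fixed path does the opposite. Which field actually gets hit in a real kernel depends on the real layout, which this sketch does not attempt to reproduce.
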
 arch/arm64/kvm/hyp/entry.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index b0afad7a99c6..0c66a1d408fd 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -146,7 +146,7 @@ SYM_INNER_LABEL(__guest_exit, SYM_L_GLOBAL)
 	// Now restore the hyp regs
 	restore_callee_saved_regs x2
 
-	set_loaded_vcpu xzr, x1, x2
+	set_loaded_vcpu xzr, x2, x3
 
 alternative_if ARM64_HAS_RAS_EXTN
 	// If we have the RAS extensions we can consume a pending error

On 2021-02-26 18:12, Will Deacon wrote:
> -	set_loaded_vcpu xzr, x1, x2
> +	set_loaded_vcpu xzr, x2, x3
>
>  alternative_if ARM64_HAS_RAS_EXTN
>  	// If we have the RAS extensions we can consume a pending error

Grmbl... How comes we have never seen that for the past 5 months, including on CPUs that implement RAS?
Thanks,
M.
On Fri, Feb 26, 2021 at 06:35:42PM +0000, Marc Zyngier wrote:
> On 2021-02-26 18:12, Will Deacon wrote:
> > -	set_loaded_vcpu xzr, x1, x2
> > +	set_loaded_vcpu xzr, x2, x3
> >
> >  alternative_if ARM64_HAS_RAS_EXTN
> >  	// If we have the RAS extensions we can consume a pending error
>
> Grmbl... How comes we have never seen that for the past 5 months,
> including on CPUs that implement RAS?

I think it's probably a combination of (a) not having a massive testing community, (b) not having tools that would scream about this (e.g. I don't think you could detect this with KASAN), and (c) the nature of the corruption being mostly benign in practice.
We found it in pKVM development because it landed on the vtcr we were restoring when coming out of suspend, which then meant the page-table code went wonky on the next stage-2 fault because it got the wrong start level and kept returning -EAGAIN because it thought a table was a leaf. So even then, the failure mode is horribly subtle.
Will
On Fri, 26 Feb 2021 18:12:11 +0000, Will Deacon wrote:
> Commit 7db21530479f ("KVM: arm64: Restore hyp when panicking in guest context") tracks the currently running vCPU, clearing the pointer to NULL on exit from a guest.
>
> Unfortunately, the use of 'set_loaded_vcpu' clobbers x1 to point at the kvm_hyp_ctxt instead of the vCPU context, causing the subsequent RAS code to go off into the weeds when it saves the DISR assuming that the CPU context is embedded in a struct vCPU.
>
> [...]

Applied to kvmarm-master/fixes, thanks!

[1/1] KVM: arm64: Avoid corrupting vCPU context register in guest exit
      commit: a8a0f5dbcdf57d89bb8d555c6423763d99a156c1

Cheers,
M.