The root cause is kvm_lapic_set_base() failing to handle x2APIC -> xapic ID switch, which is addressed by patch 1. Patch 2 provides a selftest to verify this behavior.
This serie is an RFC because I think that commit ef40757743b47 already tries to fix one such effect of the error made in kvm_lapic_set_base, but I am not sure how such error described in the commit message is triggered, nor how to reproduce it using a selftest. I don't think one can enable/disable x2APIC using KVM_SET_LAPIC, and kvm_lapic_set_base() in kvm_apic_set_state() just takes care of updating apic->base_address, since value == old_value. The test in patch 2 fails with the fix in ef40757743b47.
Thank you, Emanuele
Emanuele Giuseppe Esposito (2): KVM: x86: update APIC_ID also when disabling x2APIC in kvm_lapic_set_base KVM: selftests: APIC_ID must be correctly updated when disabling x2apic
arch/x86/kvm/lapic.c | 8 ++- .../selftests/kvm/x86_64/xapic_state_test.c | 64 +++++++++++++++++++ 2 files changed, 70 insertions(+), 2 deletions(-)
If KVM_SET_MSR firstly enables and then disables x2APIC, make sure APIC_ID is actually updated correctly, since bits and offset differ from xAPIC and x2APIC.
Currently this is not handled correctly, as kvm_set_apic_base() will have msr_info->host_initiated, so switching from x2APIC to xAPIC won't fail, but kvm_lapic_set_base() does not handle the case.
Fixes: 8d860bbeedef ("kvm: vmx: Basic APIC virtualization controls have three settings") Signed-off-by: Emanuele Giuseppe Esposito eesposit@redhat.com --- arch/x86/kvm/lapic.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 4efdb4a4d72c..df0a50099aa2 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -2394,8 +2394,12 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value) } }
- if (((old_value ^ value) & X2APIC_ENABLE) && (value & X2APIC_ENABLE)) - kvm_apic_set_x2apic_id(apic, vcpu->vcpu_id); + if ((old_value ^ value) & X2APIC_ENABLE) { + if (value & X2APIC_ENABLE) + kvm_apic_set_x2apic_id(apic, vcpu->vcpu_id); + else + kvm_apic_set_xapic_id(apic, vcpu->vcpu_id); + }
if ((old_value ^ value) & (MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE)) { kvm_vcpu_update_apicv(vcpu);
On Mon, 2023-01-09 at 08:06 -0500, Emanuele Giuseppe Esposito wrote:
If KVM_SET_MSR firstly enables and then disables x2APIC, make sure APIC_ID is actually updated correctly, since bits and offset differ from xAPIC and x2APIC.
Currently this is not handled correctly, as kvm_set_apic_base() will have msr_info->host_initiated, so switching from x2APIC to xAPIC won't fail, but kvm_lapic_set_base() does not handle the case.
Fixes: 8d860bbeedef ("kvm: vmx: Basic APIC virtualization controls have three settings") Signed-off-by: Emanuele Giuseppe Esposito eesposit@redhat.com
arch/x86/kvm/lapic.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 4efdb4a4d72c..df0a50099aa2 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -2394,8 +2394,12 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value) } }
- if (((old_value ^ value) & X2APIC_ENABLE) && (value & X2APIC_ENABLE))
kvm_apic_set_x2apic_id(apic, vcpu->vcpu_id);
- if ((old_value ^ value) & X2APIC_ENABLE) {
if (value & X2APIC_ENABLE)
kvm_apic_set_x2apic_id(apic, vcpu->vcpu_id);
else
kvm_apic_set_xapic_id(apic, vcpu->vcpu_id);
- }
if ((old_value ^ value) & (MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE)) { kvm_vcpu_update_apicv(vcpu);
I don't think that this patch is 100% needed in a strict sense, but I don't object to it either.
The switch between x2apic and xapic mode is not allowed by X86 spec, while vise versa is allowed and I think that the spec says that in this case APIC ID is restored to its default value.
When QEMU does x2apic->xapic switch anyway to reset the vCPU, it should both upload the IA32_APIC_BASE and all of the apic registers via KVM_SET_LAPIC, and it looks like that is what Qemu does:
static void kvm_apic_put(CPUState *cs, run_on_cpu_data data) { ...
// this calls KVM_SET_MSRS with MSR_IA32_APICBASE, and APIC might be with wrong apic id after this kvm_put_apicbase(s->cpu, s->apicbase);
// this just initializes the kapic with apic state to upload kvm_put_apic_state(s, &kapic);
// and this uploads the apic state, including the APIC ID register
ret = kvm_vcpu_ioctl(CPU(s->cpu), KVM_SET_LAPIC, &kapic); if (ret < 0) { fprintf(stderr, "KVM_SET_LAPIC failed: %s\n", strerror(-ret)); abort(); } }
SO between KVM_SET_MSRS and KVM_SET_LAPIC, apic id is indeed != vcpu_id, which might inhibit APICv/AVIC but after recent Sean's patch series, the inhibition is now reversible.
In fact I think it won't hurt to make the APICV_INHIBIT_REASON_APIC_BASE_MODIFIED inhibition reversible as well.
Best regards, Maxim Levitsky
On Mon, Jan 09, 2023, Maxim Levitsky wrote:
On Mon, 2023-01-09 at 08:06 -0500, Emanuele Giuseppe Esposito wrote:
If KVM_SET_MSR firstly enables and then disables x2APIC, make sure APIC_ID is actually updated correctly, since bits and offset differ from xAPIC and x2APIC.
Currently this is not handled correctly, as kvm_set_apic_base() will have msr_info->host_initiated, so switching from x2APIC to xAPIC won't fail, but kvm_lapic_set_base() does not handle the case.
Fixes: 8d860bbeedef ("kvm: vmx: Basic APIC virtualization controls have three settings") Signed-off-by: Emanuele Giuseppe Esposito eesposit@redhat.com
arch/x86/kvm/lapic.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 4efdb4a4d72c..df0a50099aa2 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -2394,8 +2394,12 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value) } }
- if (((old_value ^ value) & X2APIC_ENABLE) && (value & X2APIC_ENABLE))
kvm_apic_set_x2apic_id(apic, vcpu->vcpu_id);
- if ((old_value ^ value) & X2APIC_ENABLE) {
if (value & X2APIC_ENABLE)
kvm_apic_set_x2apic_id(apic, vcpu->vcpu_id);
else
kvm_apic_set_xapic_id(apic, vcpu->vcpu_id);
- }
if ((old_value ^ value) & (MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE)) { kvm_vcpu_update_apicv(vcpu);
I don't think that this patch is 100% needed in a strict sense, but I don't object to it either.
I'd prefer not to go this route, I assume/suspect there's a diffferent underlying issue that is the real problem.
The switch between x2apic and xapic mode is not allowed by X86 spec, while vise versa is allowed and I think that the spec says that in this case APIC ID is restored to its default value.
No, APIC ID is initialized on RESET, but AFAIK it's preserved for all other transitions. It's definitely preserved on INIT (doesn't touch the enable bit), and this snippet from the SDM more or less says the APIC ID is preserved when it's disabled in IA32_APIC_BASE.
From the disabled state, the only valid x2APIC transition using IA32_APIC_BASE is to the xAPIC mode (EN= 1, EXTD = 0). Thus the only means to transition from x2APIC mode to xAPIC mode is a two-step process:
- first transition from x2APIC mode to local APIC disabled mode (EN= 0, EXTD = 0), - followed by another transition from disabled mode to xAPIC mode (EN= 1, EXTD= 0).
Consequently, all the APIC register states in the x2APIC, except for the x2APIC ID (32 bits), are not preserved across mode transitions.
And for RESET vs. INIT
A reset in this state places the x2APIC in xAPIC mode. All APIC registers (including the local APIC ID register) are initialized as described in Section 10.12.5.1.
An INIT in this state keeps the x2APIC in the x2APIC mode. The state of the local APIC ID register is preserved (all 32 bits). However, all the other APIC registers are initialized as a result of the INIT transition.
Emanuele, what is the actual issue you are trying to fix? E.g. is APICv left inihibited after an emulated RESET? Something else? Stuffing APIC state from userspace should do the right thing after commit ef40757743b4 ("KVM: x86: fix APICv/x2AVIC disabled when vm reboot by itself") and this patch:
https://lore.kernel.org/all/20230106011306.85230-33-seanjc@google.com
Am 09/01/2023 um 17:23 schrieb Sean Christopherson:
On Mon, Jan 09, 2023, Maxim Levitsky wrote:
On Mon, 2023-01-09 at 08:06 -0500, Emanuele Giuseppe Esposito wrote:
If KVM_SET_MSR firstly enables and then disables x2APIC, make sure APIC_ID is actually updated correctly, since bits and offset differ from xAPIC and x2APIC.
Currently this is not handled correctly, as kvm_set_apic_base() will have msr_info->host_initiated, so switching from x2APIC to xAPIC won't fail, but kvm_lapic_set_base() does not handle the case.
Fixes: 8d860bbeedef ("kvm: vmx: Basic APIC virtualization controls have three settings") Signed-off-by: Emanuele Giuseppe Esposito eesposit@redhat.com
arch/x86/kvm/lapic.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 4efdb4a4d72c..df0a50099aa2 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -2394,8 +2394,12 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value) } }
- if (((old_value ^ value) & X2APIC_ENABLE) && (value & X2APIC_ENABLE))
kvm_apic_set_x2apic_id(apic, vcpu->vcpu_id);
- if ((old_value ^ value) & X2APIC_ENABLE) {
if (value & X2APIC_ENABLE)
kvm_apic_set_x2apic_id(apic, vcpu->vcpu_id);
else
kvm_apic_set_xapic_id(apic, vcpu->vcpu_id);
- }
if ((old_value ^ value) & (MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE)) { kvm_vcpu_update_apicv(vcpu);
I don't think that this patch is 100% needed in a strict sense, but I don't object to it either.
I'd prefer not to go this route, I assume/suspect there's a diffferent underlying issue that is the real problem.
The switch between x2apic and xapic mode is not allowed by X86 spec, while vise versa is allowed and I think that the spec says that in this case APIC ID is restored to its default value.
No, APIC ID is initialized on RESET, but AFAIK it's preserved for all other transitions. It's definitely preserved on INIT (doesn't touch the enable bit), and this snippet from the SDM more or less says the APIC ID is preserved when it's disabled in IA32_APIC_BASE.
From the disabled state, the only valid x2APIC transition using IA32_APIC_BASE is to the xAPIC mode (EN= 1, EXTD = 0). Thus the only means to transition from x2APIC mode to xAPIC mode is a two-step process:
- first transition from x2APIC mode to local APIC disabled mode (EN= 0, EXTD = 0),
- followed by another transition from disabled mode to xAPIC mode (EN= 1, EXTD= 0).
Consequently, all the APIC register states in the x2APIC, except for the x2APIC ID (32 bits), are not preserved across mode transitions.
And for RESET vs. INIT
A reset in this state places the x2APIC in xAPIC mode. All APIC registers (including the local APIC ID register) are initialized as described in Section 10.12.5.1.
An INIT in this state keeps the x2APIC in the x2APIC mode. The state of the local APIC ID register is preserved (all 32 bits). However, all the other APIC registers are initialized as a result of the INIT transition.
Emanuele, what is the actual issue you are trying to fix? E.g. is APICv left inihibited after an emulated RESET? Something else?
I think the test in patch 2 I wrote gives a better idea on what I am trying to fix: if we are transitioning from x2APIC to xAPIC (RESET I would say, even though I am not sure if userspace really does it in the way I do it in the test, ie through KVM_SET_MSRS), the APIC_ID is not updated back in the right bits, and we can see that by querying the ID with KVM_GET_LAPIC after disabling x2APIC.
Now, if the way I reproduce this issue is correct, it is indeed a bug and needs to be fixed with the fix in patch 1 or something similar. I think it won't really make any difference if instead following what the doc says (x2APIC -> disabled -> xAPIC) we directly do x2APIC -> xAPIC.
The test in patch 2 started being developed to test ef40757743b47 ("KVM: x86: fix APICv/x2AVIC disabled when vm reboot by itself") even though I honestly didn't really understand how to replicate that bug (see cover letter) and instead I found this other possibility that still manages to screw APIC_ID.
Hope this clarifies it a little, Emanuele
Stuffing APIC state from
userspace should do the right thing after commit ef40757743b4 ("KVM: x86: fix APICv/x2AVIC disabled when vm reboot by itself") and this patch:
https://lore.kernel.org/all/20230106011306.85230-33-seanjc@google.com
On 1/10/23 13:16, Emanuele Giuseppe Esposito wrote:
I think the test in patch 2 I wrote gives a better idea on what I am trying to fix: if we are transitioning from x2APIC to xAPIC (RESET I would say, even though I am not sure if userspace really does it in the way I do it in the test, ie through KVM_SET_MSRS), the APIC_ID is not updated back in the right bits, and we can see that by querying the ID with KVM_GET_LAPIC after disabling x2APIC.
Now, if the way I reproduce this issue is correct, it is indeed a bug and needs to be fixed with the fix in patch 1 or something similar. I think it won't really make any difference if instead following what the doc says (x2APIC -> disabled -> xAPIC) we directly do x2APIC -> xAPIC.
Yes, the default value at reset is xAPIC mode, so a reset will do a KVM_SET_MSRS that clears X2APIC_ENABLE but leaves MSR_IA32_APICBASE_ENABLE set.
So, if I understand correctly...
The test in patch 2 started being developed to test ef40757743b47 ("KVM: x86: fix APICv/x2AVIC disabled when vm reboot by itself") even though I honestly didn't really understand how to replicate that bug (see cover letter) and instead I found this other possibility that still manages to screw APIC_ID.
... what you're saying is that there were two different bugs, but one fixing any one of them was enough to prevent the symptoms shown by commit ef40757743b47? That is:
- the APICv inhibit was set by KVM_GET_LAPIC because it called kvm_lapic_xapic_id_updated(), and the call was unnecessary as fixed in commit ef40757743b47;
- however, there is no reason for the vCPU ID to be mismatched. It happened because the code didn't handle the host-initiated x2APIC->xAPIC case and thus lacked a call to kvm_apic_set_xapic_id().
If so, I think the idea of the patch is fine.
Just one thing: your patch also changes the APIC_ID on the x2APIC->disabled transition, not just the "forbidden" (i.e. host- initiated only) x2APIC->xAPIC transition. I think this is okay too: the manual says:
10.4.3 Enabling or Disabling the Local APIC
When IA32_APIC_BASE[11] is set to 0, prior initialization to the APIC may be lost and the APIC may return to the state described in Section 10.4.7.1, “Local APIC State After Power-Up or Reset.”
10.4.7.1 Local APIC State After Power-Up or Reset
... The local APIC ID register is set to a unique APIC ID. ...
(which must be an xAPIC ID) and this is what your patch does.
In fact perhaps you can change the code further to invoke kvm_lapic_reset() after static_branch_inc(&apic_hw_disabled.key)? It's just a bit messy that you have a call back to kvm_lapic_set_base() in there, so perhaps something like this can help:
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 4efdb4a4d72c..24e5df23a4d9 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -2433,9 +2436,7 @@ void kvm_apic_update_apicv(struct kvm_vcpu *vcpu)
void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event) { - struct kvm_lapic *apic = vcpu->arch.apic; u64 msr_val; - int i;
if (!init_event) { msr_val = APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE; @@ -2444,8 +2445,14 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event) kvm_lapic_set_base(vcpu, msr_val); }
- if (!apic) - return; + if (vcpu->arch.apic) + __kvm_lapic_reset(vcpu, init_event); +} + +static void __kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event) +{ + struct kvm_lapic *apic = vcpu->arch.apic; + int i;
/* Stop the timer in case it's a reset to an active apic */ hrtimer_cancel(&apic->lapic_timer.timer);
(just a sketch to show the idea, of course __kvm_lapic_reset would have to go first).
Paolo
Am 10/01/2023 um 15:07 schrieb Paolo Bonzini:
On 1/10/23 13:16, Emanuele Giuseppe Esposito wrote:
I think the test in patch 2 I wrote gives a better idea on what I am trying to fix: if we are transitioning from x2APIC to xAPIC (RESET I would say, even though I am not sure if userspace really does it in the way I do it in the test, ie through KVM_SET_MSRS), the APIC_ID is not updated back in the right bits, and we can see that by querying the ID with KVM_GET_LAPIC after disabling x2APIC.
Now, if the way I reproduce this issue is correct, it is indeed a bug and needs to be fixed with the fix in patch 1 or something similar. I think it won't really make any difference if instead following what the doc says (x2APIC -> disabled -> xAPIC) we directly do x2APIC -> xAPIC.
Yes, the default value at reset is xAPIC mode, so a reset will do a KVM_SET_MSRS that clears X2APIC_ENABLE but leaves MSR_IA32_APICBASE_ENABLE set.
So, if I understand correctly...
The test in patch 2 started being developed to test ef40757743b47 ("KVM: x86: fix APICv/x2AVIC disabled when vm reboot by itself") even though I honestly didn't really understand how to replicate that bug (see cover letter) and instead I found this other possibility that still manages to screw APIC_ID.
... what you're saying is that there were two different bugs, but one fixing any one of them was enough to prevent the symptoms shown by commit ef40757743b47? That is:
- the APICv inhibit was set by KVM_GET_LAPIC because it called
kvm_lapic_xapic_id_updated(), and the call was unnecessary as fixed in commit ef40757743b47;
- however, there is no reason for the vCPU ID to be mismatched. It
happened because the code didn't handle the host-initiated x2APIC->xAPIC case and thus lacked a call to kvm_apic_set_xapic_id().
If so, I think the idea of the patch is fine.
Yes :)
Just one thing: your patch also changes the APIC_ID on the x2APIC->disabled transition, not just the "forbidden" (i.e. host- initiated only) x2APIC->xAPIC transition. I think this is okay too: the manual says:
10.4.3 Enabling or Disabling the Local APIC
When IA32_APIC_BASE[11] is set to 0, prior initialization to the APIC may be lost and the APIC may return to the state described in Section 10.4.7.1, “Local APIC State After Power-Up or Reset.”
10.4.7.1 Local APIC State After Power-Up or Reset
... The local APIC ID register is set to a unique APIC ID. ...
(which must be an xAPIC ID) and this is what your patch does.
In fact perhaps you can change the code further to invoke kvm_lapic_reset() after static_branch_inc(&apic_hw_disabled.key)?
Ok, it makes sense. In that case we are disabling lapic (X2APIC_ENABLE should be 0 too, otherwise it would be invalid.
xAPIC global enable(IA32_APIC_BASE[11]) | x2APIC enable(IA32_APIC_BASE[10]) | Description 0 0 local APIC is disabled 0 1 Invalid 1 0 local APIC is enabled in xAPIC mode 1 1 local APIC is enabled in x2APIC mode
Thank you, Emanuele
It's
just a bit messy that you have a call back to kvm_lapic_set_base() in there, so perhaps something like this can help:
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 4efdb4a4d72c..24e5df23a4d9 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -2433,9 +2436,7 @@ void kvm_apic_update_apicv(struct kvm_vcpu *vcpu) void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event) { - struct kvm_lapic *apic = vcpu->arch.apic; u64 msr_val; - int i; if (!init_event) { msr_val = APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE; @@ -2444,8 +2445,14 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event) kvm_lapic_set_base(vcpu, msr_val); } - if (!apic) - return; + if (vcpu->arch.apic) + __kvm_lapic_reset(vcpu, init_event); +}
+static void __kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event) +{ + struct kvm_lapic *apic = vcpu->arch.apic; + int i; /* Stop the timer in case it's a reset to an active apic */ hrtimer_cancel(&apic->lapic_timer.timer);
(just a sketch to show the idea, of course __kvm_lapic_reset would have to go first).
Paolo
On Tue, Jan 10, 2023, Paolo Bonzini wrote:
On 1/10/23 13:16, Emanuele Giuseppe Esposito wrote:
I think the test in patch 2 I wrote gives a better idea on what I am trying to fix:
Tests and descriptions of expected s vs. actual behavior explain what _you_ think should happen, but don't help explain what higher level bug is being fixed. IIUC, QEMU emulates RESET and expects the xAPIC ID to be re-initialized. That's the info that should be provided in the changelog since it ties KVM behavior to a real world userspace emulating actual hardware behavior.
if we are transitioning from x2APIC to xAPIC (RESET I would say, even though I am not sure if userspace really does it in the way I do it in the test, ie through KVM_SET_MSRS), the APIC_ID is not updated back in the right bits, and we can see that by querying the ID with KVM_GET_LAPIC after disabling x2APIC.
Now, if the way I reproduce this issue is correct, it is indeed a bug and needs to be fixed with the fix in patch 1 or something similar. I think it won't really make any difference if instead following what the doc says (x2APIC -> disabled -> xAPIC) we directly do x2APIC -> xAPIC.
Yes, the default value at reset is xAPIC mode, so a reset will do a KVM_SET_MSRS that clears X2APIC_ENABLE but leaves MSR_IA32_APICBASE_ENABLE set.
So, if I understand correctly...
The test in patch 2 started being developed to test ef40757743b47 ("KVM: x86: fix APICv/x2AVIC disabled when vm reboot by itself") even though I honestly didn't really understand how to replicate that bug (see cover letter) and instead I found this other possibility that still manages to screw APIC_ID.
... what you're saying is that there were two different bugs, but one fixing any one of them was enough to prevent the symptoms shown by commit ef40757743b47? That is:
- the APICv inhibit was set by KVM_GET_LAPIC because it called
kvm_lapic_xapic_id_updated(), and the call was unnecessary as fixed in commit ef40757743b47;
- however, there is no reason for the vCPU ID to be mismatched. It
happened because the code didn't handle the host-initiated x2APIC->xAPIC case and thus lacked a call to kvm_apic_set_xapic_id().
If so, I think the idea of the patch is fine.
Just one thing: your patch also changes the APIC_ID on the x2APIC->disabled transition, not just the "forbidden" (i.e. host- initiated only) x2APIC->xAPIC transition. I think this is okay too: the manual says:
10.4.3 Enabling or Disabling the Local APIC
When IA32_APIC_BASE[11] is set to 0, prior initialization to the APIC may be lost and the APIC may return to the state described in Section 10.4.7.1, “Local APIC State After Power-Up or Reset.”
10.4.7.1 Local APIC State After Power-Up or Reset
... The local APIC ID register is set to a unique APIC ID. ...
(which must be an xAPIC ID) and this is what your patch does.
Ugh, couldn't find that yesterday. It's actually irrelevant though, KVM already stuffs the APIC ID when the APIC goes from DISABLED to ENABLED (xAPIC) since commit:
49bd29ba1dbd ("KVM: x86: reset APIC ID when enabling LAPIC")
For giggles, and because I misread that like 5 times, I tested on hardware. Intel CPUs since at least Haswell make the APIC ID read-only, i.e. it's a moot point on Intel these days. But on AMD, the APIC ID is preserved across disabling => enabling xAPIC.
In fact perhaps you can change the code further to invoke kvm_lapic_reset() after static_branch_inc(&apic_hw_disabled.key)? It's just a bit messy that you have a call back to kvm_lapic_set_base() in there, so perhaps something like this can help:
I'd rather not touch kvm_lapic_reset(). KVM doesn't emulate RESET, and I don't want to make assumptions about why userspace is forcing x2APIC => xAPIC. If userspace wants to propery emulate RESET, it can use KVM_SET_LAPIC.
My preference is to do a light tweak on the original patch, with a rewritten shortlog and changelog. And because I spent way, way too much time digging into this, I went a bit overboard...
From: Sean Christopherson seanjc@google.com Date: Tue, 10 Jan 2023 10:40:33 -0800 Subject: [PATCH] KVM: x86: Reinitialize xAPIC ID when userspace forces x2APIC => xAPIC MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit
Reinitialize the xAPIC ID to the vCPU ID when userspace forces the APIC to transition directly from x2APIC to xAPIC mode, e.g. to emulate RESET. KVM already stuffs the xAPIC ID when the APIC is transitioned from DISABLED to xAPIC (commit 49bd29ba1dbd ("KVM: x86: reset APIC ID when enabling LAPIC")), i.e. userspace is conditioned to expect KVM to update the xAPIC ID, but KVM doesn't handle the architecturally-impossible case where userspace forces x2APIC=>xAPIC via KVM_SET_MSRS.
On its own, the "bug" is benign, as userspace emulation of RESET will also stuff APIC registers via KVM_SET_LAPIC, i.e. will manually set the xAPIC ID. However, commit 3743c2f02517 ("KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base") introduced a bug, fixed by commit commit ef40757743b4 ("KVM: x86: fix APICv/x2AVIC disabled when vm reboot by itself"), that caused KVM to fail to properly update the xAPIC ID when handling KVM_SET_LAPIC. Refresh the xAPIC ID even though it's not strictly necessary so that KVM provides consistent behavior.
Note, KVM follows Intel architecture with regard to handling the xAPIC ID and x2APIC IDs across mode transitions. For the APIC DISABLED case (commit 49bd29ba1dbd), Intel's SDM says the xAPIC ID _may_ be reinitialized
10.4.3 Enabling or Disabling the Local APIC
When IA32_APIC_BASE[11] is set to 0, prior initialization to the APIC may be lost and the APIC may return to the state described in Section 10.4.7.1, “Local APIC State After Power-Up or Reset.”
10.4.7.1 Local APIC State After Power-Up or Reset
... The local APIC ID register is set to a unique APIC ID. ...
i.e. KVM's behavior is legal as per Intel's architecture. In practice, Intel's behavior is N/A as modern Intel CPUs (since at least Haswell) make the xAPIC ID fully read-only.
And for xAPIC => x2APIC transitions (commit 257b9a5faab5 ("KVM: x86: use correct APIC ID on x2APIC transition")), Intel's SDM says:
Any APIC ID value written to the memory-mapped local APIC ID register is not preserved.
AMD's APM says nothing (that I could find) about the xAPIC ID when the APIC is DISABLED, but testing on bare metal (Rome) shows that the xAPIC ID is preserved when the APIC is DISABLED and re-enabled in xAPIC mode. AMD also preserves the xAPIC ID when the APIC is transitioned from xAPIC to x2APIC, i.e. allows a backdoor write of the x2APIC ID, which is again not emulated by KVM.
Reported-by: Emanuele Giuseppe Esposito eesposit@redhat.com Signed-off-by: Sean Christopherson seanjc@google.com --- arch/x86/kvm/lapic.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 80f92cbc4029..79141d76ad49 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -2485,8 +2485,12 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value) } }
- if (((old_value ^ value) & X2APIC_ENABLE) && (value & X2APIC_ENABLE)) - kvm_apic_set_x2apic_id(apic, vcpu->vcpu_id); + if ((old_value ^ value) & X2APIC_ENABLE) { + if (value & X2APIC_ENABLE) + kvm_apic_set_x2apic_id(apic, vcpu->vcpu_id); + else if (value & MSR_IA32_APICBASE_ENABLE) + kvm_apic_set_xapic_id(apic, vcpu->vcpu_id); + }
if ((old_value ^ value) & (MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE)) { kvm_make_request(KVM_REQ_APICV_UPDATE, vcpu);
base-commit: 91dc252b0dbb6879e4067f614df1e397fec532a1
Make sure the APIC_ID is correctly shifted in the right bit positions when disabling x2APIC via KVM_SET_MSRS.
Signed-off-by: Emanuele Giuseppe Esposito eesposit@redhat.com --- .../selftests/kvm/x86_64/xapic_state_test.c | 64 +++++++++++++++++++ 1 file changed, 64 insertions(+)
diff --git a/tools/testing/selftests/kvm/x86_64/xapic_state_test.c b/tools/testing/selftests/kvm/x86_64/xapic_state_test.c index d7d37dae3eeb..6ebda7162a25 100644 --- a/tools/testing/selftests/kvm/x86_64/xapic_state_test.c +++ b/tools/testing/selftests/kvm/x86_64/xapic_state_test.c @@ -132,6 +132,62 @@ static void test_icr(struct xapic_vcpu *x) __test_icr(x, -1ull & ~APIC_DM_FIXED_MASK); }
+static void _test_lapic_id(struct kvm_vcpu *vcpu, bool x2apic_enabled, + int expected_id) +{ + struct kvm_lapic_state xapic; + + vcpu_ioctl(vcpu, KVM_GET_LAPIC, &xapic); + if (x2apic_enabled) + ASSERT_EQ(xapic.regs[APIC_ID], expected_id); + else + ASSERT_EQ(xapic.regs[0x23], expected_id); + +} + +static void test_apic_id(struct kvm_vcpu *vcpu, int id) +{ + int ret; + struct { + struct kvm_msrs info; + struct kvm_msr_entry entries[1]; + } msr_data = { + .info.nmsrs = 1, + .entries[0].index = MSR_IA32_APICBASE, + }; + + /* vcpu is initialized with xAPIC enabled */ + ret = __vcpu_ioctl(vcpu, KVM_GET_MSRS, &msr_data.info); + TEST_ASSERT(ret == 1, __KVM_IOCTL_ERROR("__vcpu_ioctl", ret)); + ASSERT_EQ(msr_data.entries[0].data & MSR_IA32_APICBASE_ENABLE, + MSR_IA32_APICBASE_ENABLE); + ASSERT_EQ(msr_data.entries[0].data & X2APIC_ENABLE, 0); + _test_lapic_id(vcpu, false, id); + + /* enable x2APIC */ + msr_data.entries[0].data |= X2APIC_ENABLE; + ret = __vcpu_ioctl(vcpu, KVM_SET_MSRS, &msr_data.info); + TEST_ASSERT(ret == 1, __KVM_IOCTL_ERROR("__vcpu_ioctl", ret)); + ASSERT_EQ(msr_data.entries[0].data & MSR_IA32_APICBASE_ENABLE, + MSR_IA32_APICBASE_ENABLE); + ASSERT_EQ(msr_data.entries[0].data & X2APIC_ENABLE, X2APIC_ENABLE); + _test_lapic_id(vcpu, true, id); + + /* + * Check that disabling x2APIC correctly updates the APIC ID to the + * xAPIC format. + */ + msr_data.entries[0].data ^= X2APIC_ENABLE; + ret = __vcpu_ioctl(vcpu, KVM_SET_MSRS, &msr_data.info); + TEST_ASSERT(ret == 1, __KVM_IOCTL_ERROR("__vcpu_ioctl", ret)); + ASSERT_EQ(msr_data.entries[0].data & MSR_IA32_APICBASE_ENABLE, + MSR_IA32_APICBASE_ENABLE); + ASSERT_EQ(msr_data.entries[0].data & X2APIC_ENABLE, 0); + _test_lapic_id(vcpu, false, id); +} + +#define NCPUS 3 + int main(int argc, char *argv[]) { struct xapic_vcpu x = { @@ -139,6 +195,14 @@ int main(int argc, char *argv[]) .is_x2apic = true, }; struct kvm_vm *vm; + struct kvm_vcpu *vcpus[NCPUS] = { 0 }; + int i; + + vm = vm_create_with_vcpus(NCPUS, NULL, vcpus); + vm_enable_cap(vm, KVM_CAP_X2APIC_API, KVM_X2APIC_API_USE_32BIT_IDS); + for (i = 0; i < NCPUS; i++) + test_apic_id(vcpus[i], i); + kvm_vm_free(vm);
vm = vm_create_with_one_vcpu(&x.vcpu, x2apic_guest_code); test_icr(&x);
On Mon, Jan 09, 2023, Emanuele Giuseppe Esposito wrote:
Make sure the APIC_ID is correctly shifted in the right bit positions when disabling x2APIC via KVM_SET_MSRS.
Signed-off-by: Emanuele Giuseppe Esposito eesposit@redhat.com
.../selftests/kvm/x86_64/xapic_state_test.c | 64 +++++++++++++++++++ 1 file changed, 64 insertions(+)
diff --git a/tools/testing/selftests/kvm/x86_64/xapic_state_test.c b/tools/testing/selftests/kvm/x86_64/xapic_state_test.c index d7d37dae3eeb..6ebda7162a25 100644 --- a/tools/testing/selftests/kvm/x86_64/xapic_state_test.c +++ b/tools/testing/selftests/kvm/x86_64/xapic_state_test.c @@ -132,6 +132,62 @@ static void test_icr(struct xapic_vcpu *x) __test_icr(x, -1ull & ~APIC_DM_FIXED_MASK); } +static void _test_lapic_id
There's no need for the underscore since this is "lapic" vs. "apic", though I would prefer to name them both "apic" and go with double underscores, i.e. __test_apic_id().
(struct kvm_vcpu *vcpu, bool x2apic_enabled,
Pass the entire apic_base value to avoid magic booleans, and then that also lets this helper do the vcpu_set_msr().
int expected_id)
There's no need to pass the expected APIC ID, it can be derived from vcpu->id.
+{
- struct kvm_lapic_state xapic;
- vcpu_ioctl(vcpu, KVM_GET_LAPIC, &xapic);
- if (x2apic_enabled)
ASSERT_EQ(xapic.regs[APIC_ID], expected_id);
- else
ASSERT_EQ(xapic.regs[0x23], expected_id);
Oof. It's gross (we need more helpers), but the APIC_ID should be read as a 32-bit value, both to fully validate x2APIC and to check that KVM doesn't leave bits set in the reserved portion of APIC_ID when in xAPIC mode.
apic_id = *((u32 *)&xapic.regs[APIC_ID]);
And then shift the expected ID instead of the actual ID so that it's more obvious what went wrong on failure, e.g. generate errors like
APIC_ID not set back to xAPIC format; wanted = 1000000, got = 1
versus just seeing '0' from the high byte.
+}
+static void test_apic_id(struct kvm_vcpu *vcpu, int id) +{
- int ret;
- struct {
struct kvm_msrs info;
struct kvm_msr_entry entries[1];
- } msr_data = {
.info.nmsrs = 1,
.entries[0].index = MSR_IA32_APICBASE,
- };
- /* vcpu is initialized with xAPIC enabled */
- ret = __vcpu_ioctl(vcpu, KVM_GET_MSRS, &msr_data.info);
- TEST_ASSERT(ret == 1, __KVM_IOCTL_ERROR("__vcpu_ioctl", ret));
Use vcpu_get_msr().
- ASSERT_EQ(msr_data.entries[0].data & MSR_IA32_APICBASE_ENABLE,
MSR_IA32_APICBASE_ENABLE);
This is hard to read. I get annoyed with TEST_ASSERT() requiring a message, but in this case using ASSERT_EQ() to avoid the message is a net negative (I blinked a few times to figure out what it was asserting).
TEST_ASSERT(apic_base & MSR_IA32_APICBASE_ENABLE, "APIC not in ENABLED state at vCPU RESET"); TEST_ASSERT(!(apic_base & X2APIC_ENABLE), "APIC not in xAPIC mode at vCPU RESET");
- ASSERT_EQ(msr_data.entries[0].data & X2APIC_ENABLE, 0);
- _test_lapic_id(vcpu, false, id);
- /* enable x2APIC */
- msr_data.entries[0].data |= X2APIC_ENABLE;
- ret = __vcpu_ioctl(vcpu, KVM_SET_MSRS, &msr_data.info);
- TEST_ASSERT(ret == 1, __KVM_IOCTL_ERROR("__vcpu_ioctl", ret));
- ASSERT_EQ(msr_data.entries[0].data & MSR_IA32_APICBASE_ENABLE,
MSR_IA32_APICBASE_ENABLE);
- ASSERT_EQ(msr_data.entries[0].data & X2APIC_ENABLE, X2APIC_ENABLE);
- _test_lapic_id(vcpu, true, id);
- /*
* Check that disabling x2APIC correctly updates the APIC ID to the
* xAPIC format.
*/
- msr_data.entries[0].data ^= X2APIC_ENABLE;
XOR works, but it obfuscates the code. AND ~, or just use the original value.
- ret = __vcpu_ioctl(vcpu, KVM_SET_MSRS, &msr_data.info);
- TEST_ASSERT(ret == 1, __KVM_IOCTL_ERROR("__vcpu_ioctl", ret));
- ASSERT_EQ(msr_data.entries[0].data & MSR_IA32_APICBASE_ENABLE,
MSR_IA32_APICBASE_ENABLE);
- ASSERT_EQ(msr_data.entries[0].data & X2APIC_ENABLE, 0);
- _test_lapic_id(vcpu, false, id);
+}
+#define NCPUS 3
int main(int argc, char *argv[]) { struct xapic_vcpu x = { @@ -139,6 +195,14 @@ int main(int argc, char *argv[]) .is_x2apic = true, }; struct kvm_vm *vm;
- struct kvm_vcpu *vcpus[NCPUS] = { 0 };
- int i;
- vm = vm_create_with_vcpus(NCPUS, NULL, vcpus);
- vm_enable_cap(vm, KVM_CAP_X2APIC_API, KVM_X2APIC_API_USE_32BIT_IDS);
- for (i = 0; i < NCPUS; i++)
test_apic_id(vcpus[i], i);
- kvm_vm_free(vm);
I would prefer to put this in the helper, test_apic_id(), so that there isn't confusion between the number of vCPUs for that sub-test and the existing tests.
This is what I ended up with:
static void __test_apic_id(struct kvm_vcpu *vcpu, uint64_t apic_base) { uint32_t apic_id, expected; struct kvm_lapic_state xapic;
vcpu_set_msr(vcpu, MSR_IA32_APICBASE, apic_base);
vcpu_ioctl(vcpu, KVM_GET_LAPIC, &xapic);
expected = apic_base & X2APIC_ENABLE ? vcpu->id : vcpu->id << 24; apic_id = *((u32 *)&xapic.regs[APIC_ID]);
TEST_ASSERT(apic_id == expected, "APIC_ID not set back to %s format; wanted = %x, got = %x", (apic_base & X2APIC_ENABLE) ? "x2APIC" : "xAPIC", expected, apic_id); }
/* * Verify that KVM switches the APIC_ID between xAPIC and x2APIC when userspace * stuffs MSR_IA32_APICBASE. Setting the APIC_ID when x2APIC is enabled and * when the APIC transitions for DISABLED to ENABLED is architectural behavior * (on Intel), whereas the x2APIC => xAPIC transition behavior is KVM ABI since * attempted to transition from x2APIC to xAPIC without disabling the APIC is * architecturally disallowed. */ static void test_apic_id(void) { const uint32_t NR_VCPUS = 3; struct kvm_vcpu *vcpus[NR_VCPUS]; uint64_t apic_base; struct kvm_vm *vm; int i;
vm = vm_create_with_vcpus(NR_VCPUS, NULL, vcpus); vm_enable_cap(vm, KVM_CAP_X2APIC_API, KVM_X2APIC_API_USE_32BIT_IDS);
for (i = 0; i < NR_VCPUS; i++) { apic_base = vcpu_get_msr(vcpus[i], MSR_IA32_APICBASE);
TEST_ASSERT(apic_base & MSR_IA32_APICBASE_ENABLE, "APIC not in ENABLED state at vCPU RESET"); TEST_ASSERT(!(apic_base & X2APIC_ENABLE), "APIC not in xAPIC mode at vCPU RESET");
__test_apic_id(vcpus[i], apic_base); __test_apic_id(vcpus[i], apic_base | X2APIC_ENABLE); __test_apic_id(vcpus[i], apic_base); }
kvm_vm_free(vm); }
On Mon, 09 Jan 2023 08:06:03 -0500, Emanuele Giuseppe Esposito wrote:
The root cause is kvm_lapic_set_base() failing to handle x2APIC -> xapic ID switch, which is addressed by patch 1. Patch 2 provides a selftest to verify this behavior.
This serie is an RFC because I think that commit ef40757743b47 already tries to fix one such effect of the error made in kvm_lapic_set_base, but I am not sure how such error described in the commit message is triggered, nor how to reproduce it using a selftest. I don't think one can enable/disable x2APIC using KVM_SET_LAPIC, and kvm_lapic_set_base() in kvm_apic_set_state() just takes care of updating apic->base_address, since value == old_value. The test in patch 2 fails with the fix in ef40757743b47.
[...]
Applied to kvm-x86 apic, with the tweak of only stuffing the APIC_ID if the APIC is enabled. I also heavily reworked the testcase (see feedback on that patch).
Thanks!
[1/2] KVM: x86: Reinitialize xAPIC ID when userspace forces x2APIC => xAPIC https://github.com/kvm-x86/linux/commit/052c3b99cbc8 [2/2] KVM: selftests: Verify APIC_ID is set when forcing x2APIC=>xAPIC transition https://github.com/kvm-x86/linux/commit/eb9819257631
-- https://github.com/kvm-x86/linux/tree/next https://github.com/kvm-x86/linux/tree/fixes
linux-kselftest-mirror@lists.linaro.org