The IRTE for an assigned device can trigger a POSTED_INTR_VECTOR even if APICv is disabled on the vCPU that receives it. In that case, the interrupt will just cause a vmexit and leave the ON bit set together with the PIR bit corresponding to the interrupt.
Right now, the interrupt would not be delivered until APICv is re-enabled. However, fixing this is just a matter of always doing the PIR->IRR synchronization, even if the vCPU has temporarily disabled APICv.
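To make the failure mode concrete, here is a minimal user-space model (not KVM code; every name in it is made up for illustration) of how a vector posted into the PIR stays invisible until something performs the PIR->IRR sync, regardless of the vCPU's APICv state:

/*
 * Minimal user-space model, not KVM code: a device posts a vector by
 * setting a PIR bit plus ON; the vector only becomes deliverable once
 * a PIR->IRR sync runs, whether or not APICv is active on the vCPU.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct pi_desc_model {
	uint64_t pir[4];   /* 256 pending-interrupt request bits */
	bool on;           /* outstanding notification */
};

struct apic_model {
	uint64_t irr[4];   /* interrupt request register */
};

/* Move every pending PIR bit into the IRR and report the highest vector. */
static int sync_pir_to_irr_model(struct pi_desc_model *pi, struct apic_model *apic)
{
	int highest = -1, v;

	if (pi->on) {
		pi->on = false;
		for (int i = 0; i < 4; i++) {
			apic->irr[i] |= pi->pir[i];
			pi->pir[i] = 0;
		}
	}
	for (v = 255; v >= 0; v--) {
		if (apic->irr[v / 64] & (1ull << (v % 64))) {
			highest = v;
			break;
		}
	}
	return highest;
}

int main(void)
{
	struct pi_desc_model pi = { .on = true };
	struct apic_model apic = { .irr = { 0 } };

	/* Assigned device posted vector 0x50 while APICv was inhibited. */
	pi.pir[0x50 / 64] |= 1ull << (0x50 % 64);

	/* Nothing is deliverable until the sync runs; doing the sync
	 * unconditionally makes the pending interrupt visible. */
	printf("after sync, highest IRR vector: 0x%x\n",
	       sync_pir_to_irr_model(&pi, &apic));
	return 0;
}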
This is not a problem for performance; if anything, it is an improvement. First, in the common case where vcpu->arch.apicv_active is true, one fewer check has to be performed. Second, static_call_cond will elide the function call if APICv is not present or disabled. Finally, in the case of AMD hardware we can remove the sync_pir_to_irr callback: it is only needed for apic_has_interrupt_for_ppr, and that function already has a fallback for !APICv.
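A rough user-space sketch of the call-site pattern (illustrative names only, not the kernel's static_call machinery): static_call_cond() elides the call when the op is NULL and discards the return value, which is also why apic_has_interrupt_for_ppr, which does need the return value, checks kvm_x86_ops.sync_pir_to_irr explicitly instead of using static_call_cond:

/*
 * User-space sketch of the pattern, not the kernel implementation:
 * a NULL op means "no PIR to sync" (AMD after this patch, or VMX with
 * APICv disabled) and the call is simply skipped.
 */
#include <stdio.h>

struct vcpu_model { int id; };

struct x86_ops_model {
	int (*sync_pir_to_irr)(struct vcpu_model *vcpu);
};

static struct x86_ops_model ops_model;	/* .sync_pir_to_irr starts out NULL */

/* Model of a static_call_cond() site: skip NULL ops, discard the result. */
static void sync_pir_to_irr_cond(struct vcpu_model *vcpu)
{
	if (ops_model.sync_pir_to_irr)
		ops_model.sync_pir_to_irr(vcpu);
}

static int fake_vmx_sync_pir_to_irr(struct vcpu_model *vcpu)
{
	printf("vCPU %d: PIR->IRR sync ran\n", vcpu->id);
	return 0x50;	/* pretend 0x50 is now the highest IRR vector */
}

int main(void)
{
	struct vcpu_model v = { .id = 0 };

	sync_pir_to_irr_cond(&v);	/* NULL op: elided, no apicv_active check needed */
	ops_model.sync_pir_to_irr = fake_vmx_sync_pir_to_irr;
	sync_pir_to_irr_cond(&v);	/* op installed: the sync runs */
	return 0;
}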
Cc: stable@vger.kernel.org
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/lapic.c   |  2 +-
 arch/x86/kvm/svm/svm.c |  1 -
 arch/x86/kvm/x86.c     | 18 +++++++++---------
 3 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 759952dd1222..f206fc35deff 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -707,7 +707,7 @@ static void pv_eoi_clr_pending(struct kvm_vcpu *vcpu)
 static int apic_has_interrupt_for_ppr(struct kvm_lapic *apic, u32 ppr)
 {
 	int highest_irr;
-	if (apic->vcpu->arch.apicv_active)
+	if (kvm_x86_ops.sync_pir_to_irr)
 		highest_irr = static_call(kvm_x86_sync_pir_to_irr)(apic->vcpu);
 	else
 		highest_irr = apic_find_highest_irr(apic);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 5630c241d5f6..d0f68d11ec70 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4651,7 +4651,6 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.load_eoi_exitmap = svm_load_eoi_exitmap,
 	.hwapic_irr_update = svm_hwapic_irr_update,
 	.hwapic_isr_update = svm_hwapic_isr_update,
-	.sync_pir_to_irr = kvm_lapic_find_highest_irr,
 	.apicv_post_state_restore = avic_post_state_restore,
 
 	.set_tss_addr = svm_set_tss_addr,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 627c955101a0..a8f12c83db4b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4448,8 +4448,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu,
 				    struct kvm_lapic_state *s)
 {
-	if (vcpu->arch.apicv_active)
-		static_call(kvm_x86_sync_pir_to_irr)(vcpu);
+	static_call_cond(kvm_x86_sync_pir_to_irr)(vcpu);
 
 	return kvm_apic_get_state(vcpu, s);
 }
@@ -9528,8 +9527,7 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
 	if (irqchip_split(vcpu->kvm))
 		kvm_scan_ioapic_routes(vcpu, vcpu->arch.ioapic_handled_vectors);
 	else {
-		if (vcpu->arch.apicv_active)
-			static_call(kvm_x86_sync_pir_to_irr)(vcpu);
+		static_call_cond(kvm_x86_sync_pir_to_irr)(vcpu);
 		if (ioapic_in_kernel(vcpu->kvm))
 			kvm_ioapic_scan_entry(vcpu, vcpu->arch.ioapic_handled_vectors);
 	}
@@ -9802,10 +9800,12 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
 		/*
 		 * This handles the case where a posted interrupt was
-		 * notified with kvm_vcpu_kick.
+		 * notified with kvm_vcpu_kick. Assigned devices can
+		 * use the POSTED_INTR_VECTOR even if APICv is disabled,
+		 * so do it even if !kvm_vcpu_apicv_active(vcpu).
 		 */
-		if (kvm_lapic_enabled(vcpu) && vcpu->arch.apicv_active)
-			static_call(kvm_x86_sync_pir_to_irr)(vcpu);
+		if (kvm_lapic_enabled(vcpu))
+			static_call_cond(kvm_x86_sync_pir_to_irr)(vcpu);
 
 		if (kvm_vcpu_exit_request(vcpu)) {
 			vcpu->mode = OUTSIDE_GUEST_MODE;
@@ -9849,8 +9849,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		if (likely(exit_fastpath != EXIT_FASTPATH_REENTER_GUEST))
 			break;
 
-		if (kvm_lapic_enabled(vcpu) && vcpu->arch.apicv_active)
-			static_call(kvm_x86_sync_pir_to_irr)(vcpu);
+		if (kvm_lapic_enabled(vcpu))
+			static_call_cond(kvm_x86_sync_pir_to_irr)(vcpu);
 
 		if (unlikely(kvm_vcpu_exit_request(vcpu))) {
 			exit_fastpath = EXIT_FASTPATH_EXIT_HANDLED;
On Thu, 2021-11-18 at 02:25 -0500, Paolo Bonzini wrote:
The IRTE for an assigned device can trigger a POSTED_INTR_VECTOR even if APICv is disabled on the vCPU that receives it. In that case, the interrupt will just cause a vmexit and leave the ON bit set together with the PIR bit corresponding to the interrupt.
100% true.
Right now, the interrupt would not be delivered until APICv is re-enabled. However, fixing this is just a matter of always doing the PIR->IRR synchronization, even if the vCPU has temporarily disabled APICv.
This is not a problem for performance; if anything, it is an improvement. First, in the common case where vcpu->arch.apicv_active is true, one fewer check has to be performed. Second, static_call_cond will elide the function call if APICv is not present or disabled. Finally, in the case of AMD hardware we can remove the sync_pir_to_irr callback: it is only needed for apic_has_interrupt_for_ppr, and that function already has a fallback for !APICv.
vmx_sync_pir_to_irr has 'if (KVM_BUG_ON(!vcpu->arch.apicv_active, vcpu->kvm))'. I think that has to be removed for this to work.
Plus the above calls can now happen when APICv is fully disabled (and not just inhibited), which is also something that I think vmx_sync_pir_to_irr should be fixed to be aware of.
Also note that VMX has code that sets vmx_x86_ops.sync_pir_to_irr to NULL in its 'hardware_setup' if APICv is disabled. I wonder if that is done before or after the static_call_cond sites are updated.
I think that this code should be removed as well, and vmx_sync_pir_to_irr should just do nothing when APICv is fully disabled.
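For reference, a paraphrase of the two vmx.c pieces being pointed at here, reconstructed from this discussion rather than quoted from the tree (details may differ):

/* 1) In vmx_sync_pir_to_irr(): the guard that now also trips when the
 *    unconditional call sites run with APICv inhibited on the vCPU. */
	if (KVM_BUG_ON(!vcpu->arch.apicv_active, vcpu->kvm))
		return -EIO;

/* 2) In hardware_setup(): the callback is nulled when APICv is disabled
 *    module-wide, so the static_call_cond() sites never reach it. */
	if (!enable_apicv)
		vmx_x86_ops.sync_pir_to_irr = NULL;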
I haven't run-tested this code, so I might be wrong of course.
Best regards, Maxim Levitsky
On 11/18/21 10:56, Maxim Levitsky wrote:
vmx_sync_pir_to_irr has 'if (KVM_BUG_ON(!vcpu->arch.apicv_active, vcpu->kvm))'. I think that has to be removed for this to work.
Good point.
Plus the above calls can now happen when APICv is fully disabled (and not just inhibited), which is also something that I think vmx_sync_pir_to_irr should be fixed to be aware of.
No, that works because sync_pir_to_irr is set to NULL as you point out below. static_call sites are updated right after ops->hardware_setup(), in kvm_arch_hardware_setup.
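Roughly, the ordering being described (a paraphrased sketch from memory, not the verbatim function; helper names may be off):

int kvm_arch_hardware_setup(void *opaque)
{
	...
	r = ops->hardware_setup();	/* VMX can NULL out sync_pir_to_irr here */
	...
	/* Only afterwards are the kvm_x86_ops static_call sites patched,
	 * so the static_call_cond() callers already see the NULL op. */
	kvm_ops_static_call_update();
	...
}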
Paolo
Also note that VMX has code that sets vmx_x86_ops.sync_pir_to_irr to NULL in its 'hardware_setup' if APICv is disabled. I wonder if that is done before or after the static_call_cond sites are updated.
I think that this code should be removed as well, and vmx_sync_pir_to_irr should just do nothing when APICv is fully disabled.
On Thu, 2021-11-18 at 12:11 +0100, Paolo Bonzini wrote:
On 11/18/21 10:56, Maxim Levitsky wrote:
vmx_sync_pir_to_irr has 'if (KVM_BUG_ON(!vcpu->arch.apicv_active, vcpu->kvm))'. I think that has to be removed for this to work.
Good point.
Plus the above calls can now happen when APICv is fully disabled (and not just inhibited), which is also something that I think vmx_sync_pir_to_irr should be fixed to be aware of.
No, that works because sync_pir_to_irr is set to NULL as you point out below. static_call sites are updated right after ops->hardware_setup(), in kvm_arch_hardware_setup.
I understand now. Thanks!
Best regards, Maxim Levitsky
Paolo
Also note that VMX has code that sets vmx_x86_ops.sync_pir_to_irr to NULL in its 'hardware_setup' if APICv is disabled. I wonder if that is done before or after the static_call_cond sites are updated.
I think that this code should be removed as well, and vmx_sync_pir_to_irr should just do nothing when APICv is fully disabled.
On Thu, Nov 18, 2021, Paolo Bonzini wrote:
On 11/18/21 10:56, Maxim Levitsky wrote:
vmx_sync_pir_to_irr has 'if (KVM_BUG_ON(!vcpu->arch.apicv_active, vcpu->kvm))'. I think that has to be removed for this to work.
Good point.
Hmm, I think I'd prefer to keep it as
	if (KVM_BUG_ON(!enable_apicv, vcpu->kvm))
		return -EIO;
since calling it directly or failing to nullify vmx_x86_ops.sync_pir_to_irr when APICv is unsupported would lead to all sorts of errors. It's not a strong preference though.
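A sketch of how the top of vmx_sync_pir_to_irr could look with that check, assuming the rest of the function is unchanged (paraphrased, not actual code):

static int vmx_sync_pir_to_irr(struct kvm_vcpu *vcpu)
{
	/* Module-wide sanity check only: the callback should have been
	 * nulled in hardware_setup() if APICv is unsupported/disabled. */
	if (KVM_BUG_ON(!enable_apicv, vcpu->kvm))
		return -EIO;

	/* APICv may still be inhibited on this particular vCPU; the PIR
	 * is drained anyway so a posted interrupt from an assigned
	 * device is not lost. */
	...
}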
On 11/18/21 17:17, Sean Christopherson wrote:
On Thu, Nov 18, 2021, Paolo Bonzini wrote:
On 11/18/21 10:56, Maxim Levitsky wrote:
vmx_sync_pir_to_irr has 'if (KVM_BUG_ON(!vcpu->arch.apicv_active, vcpu->kvm))'. I think that has to be removed for this to work.
Good point.
Hmm, I think I'd prefer to keep it as
	if (KVM_BUG_ON(!enable_apicv, vcpu->kvm))
		return -EIO;
since calling it directly or failing to nullify vmx_x86_ops.sync_pir_to_irr when APICv is unsupported would lead to all sorts of errors. It's not a strong preference though.
Sure, why not. There are a few more changes required to handle KVM_REQ_EVENT when APICv is !active on the vCPU, so I'll post it early next week.
(The MOVE/COPY context stuff also exposed itself as a bit of a trainwreck and ate half of my day).
Paolo
On Thu, Nov 18, 2021, Paolo Bonzini wrote:
The IRTE for an assigned device can trigger a POSTED_INTR_VECTOR even if APICv is disabled on the vCPU that receives it. In that case, the interrupt will just cause a vmexit and leave the ON bit set together with the PIR bit corresponding to the interrupt.
Right now, the interrupt would not be delivered until APICv is re-enabled. However, fixing this is just a matter of always doing the PIR->IRR synchronization, even if the vCPU has temporarily disabled APICv.
This is not a problem for performance; if anything, it is an improvement. First, in the common case where vcpu->arch.apicv_active is true, one fewer check has to be performed. Second, static_call_cond will elide the function call if APICv is not present or disabled. Finally, in the case of AMD hardware we can remove the sync_pir_to_irr callback: it is only needed for apic_has_interrupt_for_ppr, and that function already has a fallback for !APICv.
Cc: stable@vger.kernel.org
Co-developed-by: Sean Christopherson <seanjc@google.com>
For my bits:
Signed-off-by: Sean Christopherson <seanjc@google.com>