After fixing the handling of POSTED_INTR_WAKEUP_VECTOR for vCPUs with
disabled APICv, take care of POSTED_INTR_VECTOR. The IRTE for an assigned
device can trigger a POSTED_INTR_VECTOR even if APICv is disabled on the
vCPU that receives it. In that case, the interrupt will just cause a
vmexit and leave the ON bit set together with the PIR bit corresponding
to the interrupt.
Right now, the interrupt would not be delivered until APICv is re-enabled.
However, fixing this is just a matter of always doing the PIR->IRR
synchronization, even if the vCPU does not have APICv enabled.
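In sketch form, the synchronization amounts to the following (a minimal
userspace sketch, not the real vmx_sync_pir_to_irr(); the struct layout
and all names below are illustrative stand-ins, not KVM definitions):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PIR_WORDS 4	/* 256 vectors / 64 bits per word */

struct pi_desc_sketch {
	_Atomic uint64_t pir[PIR_WORDS];	/* one bit per pending vector */
	_Atomic bool on;			/* outstanding notification */
};

/* Move pending vectors into irr[]; return true if anything was moved. */
static bool sync_pir_to_irr_sketch(struct pi_desc_sketch *pi,
				   uint64_t irr[PIR_WORDS])
{
	bool found = false;

	/* Clear ON first, so that a later notification sets it again. */
	atomic_exchange(&pi->on, false);

	for (int i = 0; i < PIR_WORDS; i++) {
		/* Atomically take ownership of the pending bits. */
		uint64_t val = atomic_exchange(&pi->pir[i], 0);

		if (val) {
			irr[i] |= val;
			found = true;
		}
	}
	return found;
}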
This is not a problem for performance; if anything, it is an
improvement. static_call_cond will elide the function call if APICv is
not present or is disabled, or if (as is the case for AMD hardware) no
sync_pir_to_irr callback is provided. And in the common case where
kvm_vcpu_apicv_active(vcpu) is true, one fewer check has to be performed.
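For illustration, the two call sites differ roughly as below when
written with an ordinary function pointer; the real static_call_cond()
patches the call site itself, so even the NULL check disappears. The
ops structure here is a stand-in, not the real kvm_x86_ops:

struct kvm_vcpu;	/* opaque for the purposes of this sketch */

struct kvm_x86_ops_sketch {
	/* NULL when APICv is absent or disabled, and on AMD hardware. */
	void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
};

static void sync_pir_to_irr_cond(struct kvm_x86_ops_sketch *ops,
				 struct kvm_vcpu *vcpu)
{
	/*
	 * static_call_cond() behavior: a NULL callback is simply
	 * skipped, so the caller no longer needs its own
	 * kvm_vcpu_apicv_active() check.
	 */
	if (ops->sync_pir_to_irr)
		ops->sync_pir_to_irr(vcpu);
}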
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
arch/x86/kvm/x86.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index dcefb1485362..eda86378dcff 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4445,8 +4445,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu,
struct kvm_lapic_state *s)
{
- if (kvm_vcpu_apicv_active(vcpu))
- static_call(kvm_x86_sync_pir_to_irr)(vcpu);
+ static_call_cond(kvm_x86_sync_pir_to_irr)(vcpu);
return kvm_apic_get_state(vcpu, s);
}
@@ -9645,8 +9644,7 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
if (irqchip_split(vcpu->kvm))
kvm_scan_ioapic_routes(vcpu, vcpu->arch.ioapic_handled_vectors);
else {
- if (kvm_vcpu_apicv_active(vcpu))
- static_call(kvm_x86_sync_pir_to_irr)(vcpu);
+ static_call_cond(kvm_x86_sync_pir_to_irr)(vcpu);
if (ioapic_in_kernel(vcpu->kvm))
kvm_ioapic_scan_entry(vcpu, vcpu->arch.ioapic_handled_vectors);
}
@@ -9919,10 +9917,12 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
/*
* This handles the case where a posted interrupt was
- * notified with kvm_vcpu_kick.
+ * notified with kvm_vcpu_kick. Assigned devices can
+ * use the POSTED_INTR_VECTOR even if APICv is disabled,
+ * so do it even if !kvm_vcpu_apicv_active(vcpu).
*/
- if (kvm_lapic_enabled(vcpu) && kvm_vcpu_apicv_active(vcpu))
- static_call(kvm_x86_sync_pir_to_irr)(vcpu);
+ if (kvm_lapic_enabled(vcpu))
+ static_call_cond(kvm_x86_sync_pir_to_irr)(vcpu);
if (kvm_vcpu_exit_request(vcpu)) {
vcpu->mode = OUTSIDE_GUEST_MODE;
--
2.27.0
Currently in the is_continue case in hugetlb_mcopy_atomic_pte(), if we
bail out using "goto out_release_unlock;" in the cases where idx >=
size or !huge_pte_none(), the code will detect that new_pagecache_page
== false and so call restore_reserve_on_error().
In this case I see restore_reserve_on_error() delete the reservation,
and the following call to remove_inode_hugepages() will increment
h->resv_huge_pages, causing a 100% reproducible leak.
We should treat the is_continue case similarly to adding a page into
the pagecache and set new_pagecache_page to true, to indicate that
there is no reservation to restore on the error path and that we need
not call restore_reserve_on_error(). Rename new_pagecache_page to
page_in_pagecache to make that clear.
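For illustration, the invariant this patch enforces can be condensed as
below; every name here is a hypothetical stand-in for the real
mm/hugetlb.c helpers, not the actual function:

#include <stdbool.h>
#include <stddef.h>

struct sk_page { int id; };

static struct sk_page *sk_find_lock_page(bool present)
{
	static struct sk_page p;
	return present ? &p : NULL;	/* page already in the cache */
}

static int sk_add_to_page_cache(struct sk_page *page)
{
	(void)page;
	return 0;			/* 0 == inserted successfully */
}

static void sk_restore_reserve_on_error(struct sk_page *page)
{
	(void)page;			/* would return the reservation */
}

/* Condensed control flow of the fixed function. */
static int sk_mcopy_atomic_pte(struct sk_page *page, bool is_continue,
			       bool later_failure)
{
	bool page_in_pagecache = false;

	if (is_continue) {
		page = sk_find_lock_page(true);
		if (!page)
			return -1;
		page_in_pagecache = true;	/* the fix */
	} else {
		if (sk_add_to_page_cache(page))
			goto out_release;
		page_in_pagecache = true;
	}

	if (later_failure)			/* e.g. !huge_pte_none() */
		goto out_release;
	return 0;

out_release:
	/* Restore the reservation only if this call consumed one. */
	if (!page_in_pagecache)
		sk_restore_reserve_on_error(page);
	return -1;
}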
Cc: Wei Xu <weixugc@google.com>
Cc: stable@vger.kernel.org
Fixes: c7b1850dfb41 ("hugetlb: don't pass page cache pages to restore_reserve_on_error")
Signed-off-by: Mina Almasry <almasrymina@google.com>
Reported-by: James Houghton <jthoughton@google.com>
---
Changes in v2:
- Renamed new_pagecache_page to page_in_pagecache
- Removed unnecessary comment after the name update.
- Cc: stable
---
mm/hugetlb.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e09159c957e3..e7ebc4b355cf 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5734,13 +5734,14 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
int ret = -ENOMEM;
struct page *page;
int writable;
- bool new_pagecache_page = false;
+ bool page_in_pagecache = false;
if (is_continue) {
ret = -EFAULT;
page = find_lock_page(mapping, idx);
if (!page)
goto out;
+ page_in_pagecache = true;
} else if (!*pagep) {
/* If a page already exists, then it's UFFDIO_COPY for
* a non-missing case. Return -EEXIST.
@@ -5828,7 +5829,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
ret = huge_add_to_page_cache(page, mapping, idx);
if (ret)
goto out_release_nounlock;
- new_pagecache_page = true;
+ page_in_pagecache = true;
}
ptl = huge_pte_lockptr(h, dst_mm, dst_pte);
@@ -5892,7 +5893,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
if (vm_shared || is_continue)
unlock_page(page);
out_release_nounlock:
- if (!new_pagecache_page)
+ if (!page_in_pagecache)
restore_reserve_on_error(h, dst_vma, dst_addr, page);
put_page(page);
goto out;
--
2.34.0.rc2.393.gf8c9666880-goog