From: Lai Jiangshan laijs@linux.alibaba.com
[ Upstream commit e45e9e3998f0001079b09555db5bb3b4257f6746 ]
KVM doesn't know whether any TLB entries for a specific PCID are cached in the CPU when TDP is enabled, so it is better to flush all the guest TLB entries when invalidating any single PCID context.
The case is very rare, or even impossible, since KVM generally doesn't intercept CR3 writes or INVPCID instructions when TDP is enabled, so the fix is mostly for the sake of overall robustness.
Signed-off-by: Lai Jiangshan laijs@linux.alibaba.com
Message-Id: 20211019110154.4091-2-jiangshanlai@gmail.com
Signed-off-by: Paolo Bonzini pbonzini@redhat.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 arch/x86/kvm/x86.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c48e2b5729c5d..0644f429f848c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1091,6 +1091,18 @@ static void kvm_invalidate_pcid(struct kvm_vcpu *vcpu, unsigned long pcid)
 	unsigned long roots_to_free = 0;
 	int i;
 
+	/*
+	 * MOV CR3 and INVPCID are usually not intercepted when using TDP, but
+	 * this is reachable when running EPT=1 and unrestricted_guest=0, and
+	 * also via the emulator.  KVM's TDP page tables are not in the scope of
+	 * the invalidation, but the guest's TLB entries need to be flushed as
+	 * the CPU may have cached entries in its TLB for the target PCID.
+	 */
+	if (unlikely(tdp_enabled)) {
+		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
+		return;
+	}
+
 	/*
 	 * If neither the current CR3 nor any of the prev_roots use the given
 	 * PCID, then nothing needs to be done here because a resync will
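For stable reviewers, here is a condensed sketch of kvm_invalidate_pcid() with the hunk above applied. It is not a verbatim copy of x86.c; only the control flow from the hunk is shown and the shadow-paging bookkeeping is elided.

static void kvm_invalidate_pcid(struct kvm_vcpu *vcpu, unsigned long pcid)
{
	/*
	 * With TDP the CPU, not KVM, tracks which PCIDs have cached
	 * translations, so the only safe option is a full guest TLB flush.
	 */
	if (unlikely(tdp_enabled)) {
		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
		return;
	}

	/* Shadow paging: free only the roots that use the target PCID. */
	/* ... existing current-CR3 / prev_roots scan elided ... */
}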
From: Lai Jiangshan laijs@linux.alibaba.com
[ Upstream commit 552617382c197949ff965a3559da8952bf3c1fa5 ]
X86_CR4_PCIDE doesn't participate in kvm_mmu_role, so the MMU context doesn't need to be reset; it is only necessary to flush all the guest TLB entries.
Signed-off-by: Lai Jiangshan laijs@linux.alibaba.com
Reviewed-by: Sean Christopherson seanjc@google.com
Message-Id: 20210919024246.89230-2-jiangshanlai@gmail.com
Signed-off-by: Paolo Bonzini pbonzini@redhat.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 arch/x86/kvm/x86.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0644f429f848c..98a0f3c28caec 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1042,9 +1042,10 @@ EXPORT_SYMBOL_GPL(kvm_is_valid_cr4);
 
 void kvm_post_set_cr4(struct kvm_vcpu *vcpu, unsigned long old_cr4,
 		      unsigned long cr4)
 {
-	if (((cr4 ^ old_cr4) & KVM_MMU_CR4_ROLE_BITS) ||
-	    (!(cr4 & X86_CR4_PCIDE) && (old_cr4 & X86_CR4_PCIDE)))
+	if ((cr4 ^ old_cr4) & KVM_MMU_CR4_ROLE_BITS)
 		kvm_mmu_reset_context(vcpu);
+	else if (!(cr4 & X86_CR4_PCIDE) && (old_cr4 & X86_CR4_PCIDE))
+		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
 }
 EXPORT_SYMBOL_GPL(kvm_post_set_cr4);
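In other words, after this patch kvm_post_set_cr4() handles the two cases separately. A condensed view of the function with the hunk applied, with explanatory comments added here:

void kvm_post_set_cr4(struct kvm_vcpu *vcpu, unsigned long old_cr4,
		      unsigned long cr4)
{
	/* Only bits that feed kvm_mmu_role require rebuilding the MMU context. */
	if ((cr4 ^ old_cr4) & KVM_MMU_CR4_ROLE_BITS)
		kvm_mmu_reset_context(vcpu);
	/* CR4.PCIDE going 1->0 only invalidates TLB entries, so just flush. */
	else if (!(cr4 & X86_CR4_PCIDE) && (old_cr4 & X86_CR4_PCIDE))
		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
}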
Hello,
[PATCH MANUALSEL 5.15 2/8] KVM: X86: Don't reset mmu context when X86_CR4_PCIDE 1->0
[PATCH MANUALSEL 5.15 3/8] KVM: X86: Don't check unsync if the original spte is writible
are pure cleanups; they don't need to be backported to stable.
Thanks,
Lai
On 2021/11/24 00:36, Sasha Levin wrote:
From: Lai Jiangshan laijs@linux.alibaba.com
[ Upstream commit 552617382c197949ff965a3559da8952bf3c1fa5 ]
X86_CR4_PCIDE doesn't participate in kvm_mmu_role, so the MMU context doesn't need to be reset; it is only necessary to flush all the guest TLB entries.
Signed-off-by: Lai Jiangshan laijs@linux.alibaba.com
Reviewed-by: Sean Christopherson seanjc@google.com
Message-Id: 20210919024246.89230-2-jiangshanlai@gmail.com
Signed-off-by: Paolo Bonzini pbonzini@redhat.com
Signed-off-by: Sasha Levin sashal@kernel.org
 arch/x86/kvm/x86.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0644f429f848c..98a0f3c28caec 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1042,9 +1042,10 @@ EXPORT_SYMBOL_GPL(kvm_is_valid_cr4);
 
 void kvm_post_set_cr4(struct kvm_vcpu *vcpu, unsigned long old_cr4,
 		      unsigned long cr4)
 {
-	if (((cr4 ^ old_cr4) & KVM_MMU_CR4_ROLE_BITS) ||
-	    (!(cr4 & X86_CR4_PCIDE) && (old_cr4 & X86_CR4_PCIDE)))
+	if ((cr4 ^ old_cr4) & KVM_MMU_CR4_ROLE_BITS)
 		kvm_mmu_reset_context(vcpu);
+	else if (!(cr4 & X86_CR4_PCIDE) && (old_cr4 & X86_CR4_PCIDE))
+		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
 }
 EXPORT_SYMBOL_GPL(kvm_post_set_cr4);
From: Lai Jiangshan laijs@linux.alibaba.com
[ Upstream commit 8b8f9d753b84c243bf0b1004b515c53b7ec7e138 ]
If the original spte is writable, the target gfn should not be the gfn of a synchronized shadow page, and it can continue to be writable.
When !can_unsync, speculative must be false, so when the "!can_unsync" check is removed, the "out" label needs to be moved up.
Signed-off-by: Lai Jiangshan laijs@linux.alibaba.com
Signed-off-by: Paolo Bonzini pbonzini@redhat.com
Message-Id: 20210918005636.3675-11-jiangshanlai@gmail.com
Signed-off-by: Paolo Bonzini pbonzini@redhat.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 arch/x86/kvm/mmu/spte.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 3e97cdb13eb7e..86a21eb85d25f 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -150,7 +150,7 @@ int make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level,
 	 * is responsibility of kvm_mmu_get_page / kvm_mmu_sync_roots.
 	 * Same reasoning can be applied to dirty page accounting.
 	 */
-	if (!can_unsync && is_writable_pte(old_spte))
+	if (is_writable_pte(old_spte))
 		goto out;
 
 	/*
@@ -171,10 +171,10 @@ int make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level,
 	if (pte_access & ACC_WRITE_MASK)
 		spte |= spte_shadow_dirty_mask(spte);
 
+out:
 	if (speculative)
 		spte = mark_spte_for_access_track(spte);
 
-out:
 	WARN_ONCE(is_rsvd_spte(&vcpu->arch.mmu->shadow_zero_check, spte, level),
 		  "spte = 0x%llx, level = %d, rsvd bits = 0x%llx", spte, level,
 		  get_rsvd_bits(&vcpu->arch.mmu->shadow_zero_check, spte, level));
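To spell out why the label moves: before this patch the early "goto out" was only reachable with !can_unsync, and in that case speculative is guaranteed false, so landing below the access-track marking was harmless. Now the early exit can also be taken on a speculative fault, so the marking must still run. A condensed view of the tail of make_spte() with the hunk applied, comments added here:

	if (is_writable_pte(old_spte))
		goto out;	/* may now be reached with speculative == true */

	/* ... write-protection / unsync handling elided ... */

out:
	/* Must run even on the early-exit path, hence the label sits above it. */
	if (speculative)
		spte = mark_spte_for_access_track(spte);

	WARN_ONCE(is_rsvd_spte(&vcpu->arch.mmu->shadow_zero_check, spte, level),
		  "spte = 0x%llx, level = %d, rsvd bits = 0x%llx", spte, level,
		  get_rsvd_bits(&vcpu->arch.mmu->shadow_zero_check, spte, level));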
From: Thomas Huth thuth@redhat.com
[ Upstream commit 22d7108ce47290d47e1ea83a28fbfc85e0ecf97e ]
The kvm_vm_free() statement here is currently dead code, since the loop in front of it can only be left via the "goto done" that jumps right past the kvm_vm_free(). Fix it by swapping the locations of the "done" label and the kvm_vm_free() call.
Signed-off-by: Thomas Huth thuth@redhat.com
Message-Id: 20210826074928.240942-1-thuth@redhat.com
Signed-off-by: Paolo Bonzini pbonzini@redhat.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 tools/testing/selftests/kvm/x86_64/cr4_cpuid_sync_test.c | 3 +--
 tools/testing/selftests/kvm/x86_64/vmx_tsc_adjust_test.c | 2 +-
 2 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/kvm/x86_64/cr4_cpuid_sync_test.c b/tools/testing/selftests/kvm/x86_64/cr4_cpuid_sync_test.c
index f40fd097cb359..6f6fd189dda3f 100644
--- a/tools/testing/selftests/kvm/x86_64/cr4_cpuid_sync_test.c
+++ b/tools/testing/selftests/kvm/x86_64/cr4_cpuid_sync_test.c
@@ -109,8 +109,7 @@ int main(int argc, char *argv[])
 		}
 	}
 
-	kvm_vm_free(vm);
-
 done:
+	kvm_vm_free(vm);
 	return 0;
 }
diff --git a/tools/testing/selftests/kvm/x86_64/vmx_tsc_adjust_test.c b/tools/testing/selftests/kvm/x86_64/vmx_tsc_adjust_test.c
index 7e33a350b053a..e683d0ac3e45e 100644
--- a/tools/testing/selftests/kvm/x86_64/vmx_tsc_adjust_test.c
+++ b/tools/testing/selftests/kvm/x86_64/vmx_tsc_adjust_test.c
@@ -161,7 +161,7 @@ int main(int argc, char *argv[])
 		}
 	}
 
-	kvm_vm_free(vm);
 done:
+	kvm_vm_free(vm);
 	return 0;
 }
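For readers without the selftest sources handy, here is a minimal standalone illustration of the pattern being fixed. The helper names (vm_create(), vm_free(), run_iteration()) are hypothetical stand-ins, not the real kvm selftest API:

#include <stdlib.h>

struct vm { int dummy; };

static struct vm *vm_create(void) { return calloc(1, sizeof(struct vm)); }
static void vm_free(struct vm *vm) { free(vm); }
static int run_iteration(struct vm *vm) { (void)vm; return 1; /* pretend the guest finished */ }

int main(void)
{
	struct vm *vm = vm_create();

	for (;;) {
		if (run_iteration(vm))
			goto done;	/* the only way out of the loop */
	}
	/*
	 * Anything placed here is dead code, which is where the old
	 * kvm_vm_free() call sat.  Putting the cleanup after the label
	 * guarantees it actually runs.
	 */
done:
	vm_free(vm);
	return 0;
}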
From: Vitaly Kuznetsov vkuznets@redhat.com
[ Upstream commit 82cc27eff4486f8e79ef8faac1af1f5573039aa4 ]
KVM_CAP_NR_VCPUS is a legacy advisory value which on other architectures returns num_online_cpus() capped by KVM_CAP_MAX_VCPUS, or something else (ppc and arm64 are special cases). On s390, KVM_CAP_NR_VCPUS returns the same as KVM_CAP_MAX_VCPUS, and this may turn out to be bad 'advice'. Switch s390 to returning a capped num_online_cpus() too.
Acked-by: Christian Borntraeger borntraeger@de.ibm.com
Signed-off-by: Vitaly Kuznetsov vkuznets@redhat.com
Reviewed-by: Christian Borntraeger borntraeger@linux.ibm.com
Message-Id: 20211116163443.88707-6-vkuznets@redhat.com
Signed-off-by: Paolo Bonzini pbonzini@redhat.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 arch/s390/kvm/kvm-s390.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 1c97493d21e10..31bf4bc5a23d7 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -585,6 +585,8 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 			r = KVM_MAX_VCPUS;
 		else if (sclp.has_esca && sclp.has_64bscao)
 			r = KVM_S390_ESCA_CPU_SLOTS;
+		if (ext == KVM_CAP_NR_VCPUS)
+			r = min_t(unsigned int, num_online_cpus(), r);
 		break;
 	case KVM_CAP_S390_COW:
 		r = MACHINE_HAS_ESOP;
From: Vitaly Kuznetsov vkuznets@redhat.com
[ Upstream commit b7915d55b1ac0e68a7586697fa2d06c018135c49 ]
It doesn't make sense to return the recommended maximum number of vCPUs which exceeds the maximum possible number of vCPUs.
Signed-off-by: Vitaly Kuznetsov vkuznets@redhat.com
Message-Id: 20211116163443.88707-4-vkuznets@redhat.com
Signed-off-by: Paolo Bonzini pbonzini@redhat.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 arch/powerpc/kvm/powerpc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index b4e6f70b97b94..8334563a46236 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -641,9 +641,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		 * implementations just count online CPUs.
 		 */
 		if (hv_enabled)
-			r = num_present_cpus();
+			r = min_t(unsigned int, num_present_cpus(), KVM_MAX_VCPUS);
 		else
-			r = num_online_cpus();
+			r = min_t(unsigned int, num_online_cpus(), KVM_MAX_VCPUS);
 		break;
 	case KVM_CAP_MAX_VCPUS:
 		r = KVM_MAX_VCPUS;
From: Vitaly Kuznetsov vkuznets@redhat.com
[ Upstream commit 57a2e13ebdda8b65602b44ec8b80e385603eb84c ]
It doesn't make sense to return the recommended maximum number of vCPUs which exceeds the maximum possible number of vCPUs.
Signed-off-by: Vitaly Kuznetsov vkuznets@redhat.com
Message-Id: 20211116163443.88707-3-vkuznets@redhat.com
Signed-off-by: Paolo Bonzini pbonzini@redhat.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 arch/mips/kvm/mips.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index 75c6f264c626c..713ac87fbeb59 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -1067,7 +1067,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		r = 1;
 		break;
 	case KVM_CAP_NR_VCPUS:
-		r = num_online_cpus();
+		r = min_t(unsigned int, num_online_cpus(), KVM_MAX_VCPUS);
 		break;
 	case KVM_CAP_MAX_VCPUS:
 		r = KVM_MAX_VCPUS;
From: Vitaly Kuznetsov vkuznets@redhat.com
[ Upstream commit f60a00d7295057cb4baea5a321501efc72794453 ]
Generally, it doesn't make sense to return the recommended maximum number of vCPUs which exceeds the maximum possible number of vCPUs.
Note: ARM64 is special as the value returned by KVM_CAP_MAX_VCPUS differs depending on whether it is a system-wide ioctl or a per-VM one. Previously, KVM_CAP_NR_VCPUS didn't have this difference and it seems preferable to keep the status quo. Cap KVM_CAP_NR_VCPUS by kvm_arm_default_max_vcpus() which is what gets returned by system-wide KVM_CAP_MAX_VCPUS.
Signed-off-by: Vitaly Kuznetsov vkuznets@redhat.com
Message-Id: 20211116163443.88707-2-vkuznets@redhat.com
Acked-by: Marc Zyngier maz@kernel.org
Signed-off-by: Paolo Bonzini pbonzini@redhat.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 arch/arm64/kvm/arm.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 9b328bb05596a..1ed82b6d5e8db 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -223,7 +223,14 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		r = 1;
 		break;
 	case KVM_CAP_NR_VCPUS:
-		r = num_online_cpus();
+		/*
+		 * ARM64 treats KVM_CAP_NR_CPUS differently from all other
+		 * architectures, as it does not always bound it to
+		 * KVM_CAP_MAX_VCPUS. It should not matter much because
+		 * this is just an advisory value.
+		 */
+		r = min_t(unsigned int, num_online_cpus(),
+			  kvm_arm_default_max_vcpus());
 		break;
 	case KVM_CAP_MAX_VCPUS:
 	case KVM_CAP_MAX_VCPU_ID:
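The last four backports all apply the same idea: KVM_CAP_NR_VCPUS is advisory, so clamp it to the hard limit rather than blindly returning the host CPU count. A standalone sketch of that clamping, using hypothetical stand-ins (HARD_VCPU_LIMIT and online_cpus() are not kernel symbols; the real code uses min_t() with KVM_MAX_VCPUS or kvm_arm_default_max_vcpus()):

#include <stdio.h>

#define HARD_VCPU_LIMIT 1024u	/* stand-in for the KVM_CAP_MAX_VCPUS value */

static unsigned int online_cpus(void)
{
	return 4096;	/* pretend: a host with more CPUs than the VM limit */
}

static unsigned int recommended_vcpus(void)
{
	unsigned int n = online_cpus();

	/* Never recommend more vCPUs than a VM can actually create. */
	return n < HARD_VCPU_LIMIT ? n : HARD_VCPU_LIMIT;
}

int main(void)
{
	printf("recommended vCPUs: %u\n", recommended_vcpus());	/* prints 1024, not 4096 */
	return 0;
}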
On 11/23/21 17:36, Sasha Levin wrote:
From: Lai Jiangshan laijs@linux.alibaba.com
[ Upstream commit e45e9e3998f0001079b09555db5bb3b4257f6746 ]
KVM doesn't know whether any TLB entries for a specific PCID are cached in the CPU when TDP is enabled, so it is better to flush all the guest TLB entries when invalidating any single PCID context.
The case is very rare, or even impossible, since KVM generally doesn't intercept CR3 writes or INVPCID instructions when TDP is enabled, so the fix is mostly for the sake of overall robustness.
Signed-off-by: Lai Jiangshan laijs@linux.alibaba.com
Message-Id: 20211019110154.4091-2-jiangshanlai@gmail.com
Signed-off-by: Paolo Bonzini pbonzini@redhat.com
Signed-off-by: Sasha Levin sashal@kernel.org
Acked-by: Paolo Bonzini pbonzini@redhat.com
for this patch, but not for all the others.
Paolo