Linux-stable-mirror February 2018

linux-stable-mirror@lists.linaro.org

304 participants
3308 discussions

[Linux-stable-mirror] Patch "array_index_nospec: Sanitize speculative array de-references" has been added to the 4.15-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled array_index_nospec: Sanitize speculative array de-references to the 4.15-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: array_index_nospec_Sanitize_speculative_array_de-references.patch and it can be found in the queue-4.15 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. Subject: array_index_nospec: Sanitize speculative array de-references From: Dan Williams dan.j.williams(a)intel.com Date: Mon Jan 29 17:02:22 2018 -0800 From: Dan Williams dan.j.williams(a)intel.com commit f3804203306e098dae9ca51540fcd5eb700d7f40 array_index_nospec() is proposed as a generic mechanism to mitigate against Spectre-variant-1 attacks, i.e. an attack that bypasses boundary checks via speculative execution. The array_index_nospec() implementation is expected to be safe for current generation CPUs across multiple architectures (ARM, x86). Based on an original implementation by Linus Torvalds, tweaked to remove speculative flows by Alexei Starovoitov, and tweaked again by Linus to introduce an x86 assembly implementation for the mask generation. Co-developed-by: Linus Torvalds <torvalds(a)linux-foundation.org> Co-developed-by: Alexei Starovoitov <ast(a)kernel.org> Suggested-by: Cyril Novikov <cnovikov(a)lynx.com> Signed-off-by: Dan Williams <dan.j.williams(a)intel.com> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de> Cc: linux-arch(a)vger.kernel.org Cc: kernel-hardening(a)lists.openwall.com Cc: Peter Zijlstra <peterz(a)infradead.org> Cc: Catalin Marinas <catalin.marinas(a)arm.com> Cc: Will Deacon <will.deacon(a)arm.com> Cc: Russell King <linux(a)armlinux.org.uk> Cc: gregkh(a)linuxfoundation.org Cc: torvalds(a)linux-foundation.org Cc: alan(a)linux.intel.com Link: https://lkml.kernel.org/r/151727414229.33451.18411580953862676575.stgit@dwi… Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- include/linux/nospec.h | 72 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 72 insertions(+) --- /dev/null +++ b/include/linux/nospec.h @@ -0,0 +1,72 @@ +// SPDX-License-Identifier: GPL-2.0 +// Copyright(c) 2018 Linus Torvalds. All rights reserved. +// Copyright(c) 2018 Alexei Starovoitov. All rights reserved. +// Copyright(c) 2018 Intel Corporation. All rights reserved. + +#ifndef _LINUX_NOSPEC_H +#define _LINUX_NOSPEC_H + +/** + * array_index_mask_nospec() - generate a ~0 mask when index < size, 0 otherwise + * @index: array element index + * @size: number of elements in array + * + * When @index is out of bounds (@index >= @size), the sign bit will be + * set. Extend the sign bit to all bits and invert, giving a result of + * zero for an out of bounds index, or ~0 if within bounds [0, @size). + */ +#ifndef array_index_mask_nospec +static inline unsigned long array_index_mask_nospec(unsigned long index, + unsigned long size) +{ + /* + * Warn developers about inappropriate array_index_nospec() usage. + * + * Even if the CPU speculates past the WARN_ONCE branch, the + * sign bit of @index is taken into account when generating the + * mask. + * + * This warning is compiled out when the compiler can infer that + * @index and @size are less than LONG_MAX. + */ + if (WARN_ONCE(index > LONG_MAX || size > LONG_MAX, + "array_index_nospec() limited to range of [0, LONG_MAX]\n")) + return 0; + + /* + * Always calculate and emit the mask even if the compiler + * thinks the mask is not needed. The compiler does not take + * into account the value of @index under speculation. + */ + OPTIMIZER_HIDE_VAR(index); + return ~(long)(index | (size - 1UL - index)) >> (BITS_PER_LONG - 1); +} +#endif + +/* + * array_index_nospec - sanitize an array index after a bounds check + * + * For a code sequence like: + * + * if (index < size) { + * index = array_index_nospec(index, size); + * val = array[index]; + * } + * + * ...if the CPU speculates past the bounds check then + * array_index_nospec() will clamp the index within the range of [0, + * size). + */ +#define array_index_nospec(index, size) \ +({ \ + typeof(index) _i = (index); \ + typeof(size) _s = (size); \ + unsigned long _mask = array_index_mask_nospec(_i, _s); \ + \ + BUILD_BUG_ON(sizeof(_i) > sizeof(long)); \ + BUILD_BUG_ON(sizeof(_s) > sizeof(long)); \ + \ + _i &= _mask; \ + _i; \ +}) +#endif /* _LINUX_NOSPEC_H */ Patches currently in stable-queue which might be from torvalds(a)linux-foundation.org are queue-4.15/objtool_Add_support_for_alternatives_at_the_end_of_a_section.patch queue-4.15/x86pti_Do_not_enable_PTI_on_CPUs_which_are_not_vulnerable_to_Meltdown.patch queue-4.15/x86_Introduce_barrier_nospec.patch queue-4.15/x86speculation_Use_Indirect_Branch_Prediction_Barrier_in_context_switch.patch queue-4.15/x86get_user_Use_pointer_masking_to_limit_speculation.patch queue-4.15/x86_Introduce___uaccess_begin_nospec()_and_uaccess_try_nospec.patch queue-4.15/x86cpufeature_Blacklist_SPEC_CTRLPRED_CMD_on_early_Spectre_v2_microcodes.patch queue-4.15/x86cpufeatures_Add_Intel_feature_bits_for_Speculation_Control.patch queue-4.15/x86paravirt_Remove_noreplace-paravirt_cmdline_option.patch queue-4.15/KVM_VMX_Make_indirect_call_speculation_safe.patch queue-4.15/x86msr_Add_definitions_for_new_speculation_control_MSRs.patch queue-4.15/x86alternative_Print_unadorned_pointers.patch queue-4.15/KVMVMX_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch queue-4.15/x86cpufeatures_Add_CPUID_7_EDX_CPUID_leaf.patch queue-4.15/array_index_nospec_Sanitize_speculative_array_de-references.patch queue-4.15/Documentation_Document_array_index_nospec.patch queue-4.15/x86entry64_Remove_the_SYSCALL64_fast_path.patch queue-4.15/x86bugs_Drop_one_mitigation_from_dmesg.patch queue-4.15/x86cpufeatures_Add_AMD_feature_bits_for_Speculation_Control.patch queue-4.15/KVMSVM_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch queue-4.15/x86asm_Move_status_from_thread_struct_to_thread_info.patch queue-4.15/KVMx86_Add_IBPB_support.patch queue-4.15/x86_Implement_array_index_mask_nospec.patch queue-4.15/KVMVMX_Emulate_MSR_IA32_ARCH_CAPABILITIES.patch queue-4.15/nl80211_Sanitize_array_index_in_parse_txq_params.patch queue-4.15/moduleretpoline_Warn_about_missing_retpoline_in_module.patch queue-4.15/x86speculation_Add_basic_IBPB_(Indirect_Branch_Prediction_Barrier)_support.patch queue-4.15/x86speculation_Simplify_indirect_branch_prediction_barrier().patch queue-4.15/x86nospec_Fix_header_guards_names.patch queue-4.15/KVM_x86_Make_indirect_calls_in_emulator_speculation_safe.patch queue-4.15/x86uaccess_Use___uaccess_begin_nospec()_and_uaccess_try_nospec.patch queue-4.15/x86entry64_Push_extra_regs_right_away.patch queue-4.15/x86usercopy_Replace_open_coded_stacclac_with___uaccess_begin_end.patch queue-4.15/vfs_fdtable_Prevent_bounds-check_bypass_via_speculative_execution.patch queue-4.15/x86retpoline_Simplify_vmexit_fill_RSB().patch queue-4.15/objtool_Warn_on_stripped_section_symbol.patch queue-4.15/x86spectre_Report_get_user_mitigation_for_spectre_v1.patch queue-4.15/x86cpufeatures_Clean_up_Spectre_v2_related_CPUID_flags.patch queue-4.15/x86syscall_Sanitize_syscall_table_de-references_under_speculation.patch queue-4.15/objtool_Improve_retpoline_alternative_handling.patch

7 years, 11 months

[Linux-stable-mirror] Patch "KVM/x86: Update the reverse_cpuid list to include CPUID_7_EDX" has been added to the 4.15-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled KVM/x86: Update the reverse_cpuid list to include CPUID_7_EDX to the 4.15-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: KVMx86_Update_the_reverse_cpuid_list_to_include_CPUID_7_EDX.patch and it can be found in the queue-4.15 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. Subject: KVM/x86: Update the reverse_cpuid list to include CPUID_7_EDX From: KarimAllah Ahmed karahmed(a)amazon.de Date: Thu Feb 1 22:59:42 2018 +0100 From: KarimAllah Ahmed karahmed(a)amazon.de commit b7b27aa011a1df42728d1768fc181d9ce69e6911 [dwmw2: Stop using KF() for bits in it, too] Signed-off-by: KarimAllah Ahmed <karahmed(a)amazon.de> Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de> Reviewed-by: Paolo Bonzini <pbonzini(a)redhat.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com> Reviewed-by: Jim Mattson <jmattson(a)google.com> Cc: kvm(a)vger.kernel.org Cc: Radim Krčmář <rkrcmar(a)redhat.com> Link: https://lkml.kernel.org/r/1517522386-18410-2-git-send-email-karahmed@amazon… Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/x86/kvm/cpuid.c | 8 +++----- arch/x86/kvm/cpuid.h | 1 + 2 files changed, 4 insertions(+), 5 deletions(-) --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -67,9 +67,7 @@ u64 kvm_supported_xcr0(void) #define F(x) bit(X86_FEATURE_##x) -/* These are scattered features in cpufeatures.h. */ -#define KVM_CPUID_BIT_AVX512_4VNNIW 2 -#define KVM_CPUID_BIT_AVX512_4FMAPS 3 +/* For scattered features from cpufeatures.h; we currently expose none */ #define KF(x) bit(KVM_CPUID_BIT_##x) int kvm_update_cpuid(struct kvm_vcpu *vcpu) @@ -392,7 +390,7 @@ static inline int __do_cpuid_ent(struct /* cpuid 7.0.edx*/ const u32 kvm_cpuid_7_0_edx_x86_features = - KF(AVX512_4VNNIW) | KF(AVX512_4FMAPS); + F(AVX512_4VNNIW) | F(AVX512_4FMAPS); /* all calls to cpuid_count() should be made on the same cpu */ get_cpu(); @@ -477,7 +475,7 @@ static inline int __do_cpuid_ent(struct if (!tdp_enabled || !boot_cpu_has(X86_FEATURE_OSPKE)) entry->ecx &= ~F(PKU); entry->edx &= kvm_cpuid_7_0_edx_x86_features; - entry->edx &= get_scattered_cpuid_leaf(7, 0, CPUID_EDX); + cpuid_mask(&entry->edx, CPUID_7_EDX); } else { entry->ebx = 0; entry->ecx = 0; --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -54,6 +54,7 @@ static const struct cpuid_reg reverse_cp [CPUID_8000_000A_EDX] = {0x8000000a, 0, CPUID_EDX}, [CPUID_7_ECX] = { 7, 0, CPUID_ECX}, [CPUID_8000_0007_EBX] = {0x80000007, 0, CPUID_EBX}, + [CPUID_7_EDX] = { 7, 0, CPUID_EDX}, }; static __always_inline struct cpuid_reg x86_feature_cpuid(unsigned x86_feature) Patches currently in stable-queue which might be from karahmed(a)amazon.de are queue-4.15/x86spectre_Simplify_spectre_v2_command_line_parsing.patch queue-4.15/x86pti_Do_not_enable_PTI_on_CPUs_which_are_not_vulnerable_to_Meltdown.patch queue-4.15/x86speculation_Use_Indirect_Branch_Prediction_Barrier_in_context_switch.patch queue-4.15/x86cpuid_Fix_up_virtual_IBRSIBPBSTIBP_feature_bits_on_Intel.patch queue-4.15/x86cpufeature_Blacklist_SPEC_CTRLPRED_CMD_on_early_Spectre_v2_microcodes.patch queue-4.15/x86cpufeatures_Add_Intel_feature_bits_for_Speculation_Control.patch queue-4.15/x86msr_Add_definitions_for_new_speculation_control_MSRs.patch queue-4.15/KVMVMX_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch queue-4.15/x86cpufeatures_Add_CPUID_7_EDX_CPUID_leaf.patch queue-4.15/x86cpufeatures_Add_AMD_feature_bits_for_Speculation_Control.patch queue-4.15/KVMSVM_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch queue-4.15/KVMx86_Add_IBPB_support.patch queue-4.15/KVMVMX_Emulate_MSR_IA32_ARCH_CAPABILITIES.patch queue-4.15/x86speculation_Add_basic_IBPB_(Indirect_Branch_Prediction_Barrier)_support.patch queue-4.15/x86speculation_Simplify_indirect_branch_prediction_barrier().patch queue-4.15/x86retpoline_Simplify_vmexit_fill_RSB().patch queue-4.15/x86cpufeatures_Clean_up_Spectre_v2_related_CPUID_flags.patch queue-4.15/KVMx86_Update_the_reverse_cpuid_list_to_include_CPUID_7_EDX.patch queue-4.15/x86retpoline_Avoid_retpolines_for_built-in___init_functions.patch

7 years, 11 months

[Linux-stable-mirror] Patch "KVM/x86: Add IBPB support" has been added to the 4.15-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled KVM/x86: Add IBPB support to the 4.15-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: KVMx86_Add_IBPB_support.patch and it can be found in the queue-4.15 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. Subject: KVM/x86: Add IBPB support From: Ashok Raj ashok.raj(a)intel.com Date: Thu Feb 1 22:59:43 2018 +0100 From: Ashok Raj ashok.raj(a)intel.com commit 15d45071523d89b3fb7372e2135fbd72f6af9506 The Indirect Branch Predictor Barrier (IBPB) is an indirect branch control mechanism. It keeps earlier branches from influencing later ones. Unlike IBRS and STIBP, IBPB does not define a new mode of operation. It's a command that ensures predicted branch targets aren't used after the barrier. Although IBRS and IBPB are enumerated by the same CPUID enumeration, IBPB is very different. IBPB helps mitigate against three potential attacks: * Mitigate guests from being attacked by other guests. - This is addressed by issing IBPB when we do a guest switch. * Mitigate attacks from guest/ring3->host/ring3. These would require a IBPB during context switch in host, or after VMEXIT. The host process has two ways to mitigate - Either it can be compiled with retpoline - If its going through context switch, and has set !dumpable then there is a IBPB in that path. (Tim's patch: https://patchwork.kernel.org/patch/10192871) - The case where after a VMEXIT you return back to Qemu might make Qemu attackable from guest when Qemu isn't compiled with retpoline. There are issues reported when doing IBPB on every VMEXIT that resulted in some tsc calibration woes in guest. * Mitigate guest/ring0->host/ring0 attacks. When host kernel is using retpoline it is safe against these attacks. If host kernel isn't using retpoline we might need to do a IBPB flush on every VMEXIT. Even when using retpoline for indirect calls, in certain conditions 'ret' can use the BTB on Skylake-era CPUs. There are other mitigations available like RSB stuffing/clearing. * IBPB is issued only for SVM during svm_free_vcpu(). VMX has a vmclear and SVM doesn't. Follow discussion here: https://lkml.org/lkml/2018/1/15/146 Please refer to the following spec for more details on the enumeration and control. Refer here to get documentation about mitigations. https://software.intel.com/en-us/side-channel-security-support [peterz: rebase and changelog rewrite] [karahmed: - rebase - vmx: expose PRED_CMD if guest has it in CPUID - svm: only pass through IBPB if guest has it in CPUID - vmx: support !cpu_has_vmx_msr_bitmap()] - vmx: support nested] [dwmw2: Expose CPUID bit too (AMD IBPB only for now as we lack IBRS) PRED_CMD is a write-only MSR] Signed-off-by: Ashok Raj <ashok.raj(a)intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org> Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk> Signed-off-by: KarimAllah Ahmed <karahmed(a)amazon.de> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com> Cc: Andrea Arcangeli <aarcange(a)redhat.com> Cc: Andi Kleen <ak(a)linux.intel.com> Cc: kvm(a)vger.kernel.org Cc: Asit Mallick <asit.k.mallick(a)intel.com> Cc: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: Andy Lutomirski <luto(a)kernel.org> Cc: Dave Hansen <dave.hansen(a)intel.com> Cc: Arjan Van De Ven <arjan.van.de.ven(a)intel.com> Cc: Greg KH <gregkh(a)linuxfoundation.org> Cc: Jun Nakajima <jun.nakajima(a)intel.com> Cc: Paolo Bonzini <pbonzini(a)redhat.com> Cc: Dan Williams <dan.j.williams(a)intel.com> Cc: Tim Chen <tim.c.chen(a)linux.intel.com> Link: http://lkml.kernel.org/r/1515720739-43819-6-git-send-email-ashok.raj@intel.… Link: https://lkml.kernel.org/r/1517522386-18410-3-git-send-email-karahmed@amazon… Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/x86/kvm/cpuid.c | 11 ++++++- arch/x86/kvm/svm.c | 28 +++++++++++++++++ arch/x86/kvm/vmx.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++++-- 3 files changed, 116 insertions(+), 3 deletions(-) --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -365,6 +365,10 @@ static inline int __do_cpuid_ent(struct F(3DNOWPREFETCH) | F(OSVW) | 0 /* IBS */ | F(XOP) | 0 /* SKINIT, WDT, LWP */ | F(FMA4) | F(TBM); + /* cpuid 0x80000008.ebx */ + const u32 kvm_cpuid_8000_0008_ebx_x86_features = + F(IBPB); + /* cpuid 0xC0000001.edx */ const u32 kvm_cpuid_C000_0001_edx_x86_features = F(XSTORE) | F(XSTORE_EN) | F(XCRYPT) | F(XCRYPT_EN) | @@ -625,7 +629,12 @@ static inline int __do_cpuid_ent(struct if (!g_phys_as) g_phys_as = phys_as; entry->eax = g_phys_as | (virt_as << 8); - entry->ebx = entry->edx = 0; + entry->edx = 0; + /* IBPB isn't necessarily present in hardware cpuid */ + if (boot_cpu_has(X86_FEATURE_IBPB)) + entry->ebx |= F(IBPB); + entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features; + cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX); break; } case 0x80000019: --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -249,6 +249,7 @@ static const struct svm_direct_access_ms { .index = MSR_CSTAR, .always = true }, { .index = MSR_SYSCALL_MASK, .always = true }, #endif + { .index = MSR_IA32_PRED_CMD, .always = false }, { .index = MSR_IA32_LASTBRANCHFROMIP, .always = false }, { .index = MSR_IA32_LASTBRANCHTOIP, .always = false }, { .index = MSR_IA32_LASTINTFROMIP, .always = false }, @@ -529,6 +530,7 @@ struct svm_cpu_data { struct kvm_ldttss_desc *tss_desc; struct page *save_area; + struct vmcb *current_vmcb; }; static DEFINE_PER_CPU(struct svm_cpu_data *, svm_data); @@ -1703,11 +1705,17 @@ static void svm_free_vcpu(struct kvm_vcp __free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER); kvm_vcpu_uninit(vcpu); kmem_cache_free(kvm_vcpu_cache, svm); + /* + * The vmcb page can be recycled, causing a false negative in + * svm_vcpu_load(). So do a full IBPB now. + */ + indirect_branch_prediction_barrier(); } static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu) { struct vcpu_svm *svm = to_svm(vcpu); + struct svm_cpu_data *sd = per_cpu(svm_data, cpu); int i; if (unlikely(cpu != vcpu->cpu)) { @@ -1736,6 +1744,10 @@ static void svm_vcpu_load(struct kvm_vcp if (static_cpu_has(X86_FEATURE_RDTSCP)) wrmsrl(MSR_TSC_AUX, svm->tsc_aux); + if (sd->current_vmcb != svm->vmcb) { + sd->current_vmcb = svm->vmcb; + indirect_branch_prediction_barrier(); + } avic_vcpu_load(vcpu, cpu); } @@ -3684,6 +3696,22 @@ static int svm_set_msr(struct kvm_vcpu * case MSR_IA32_TSC: kvm_write_tsc(vcpu, msr); break; + case MSR_IA32_PRED_CMD: + if (!msr->host_initiated && + !guest_cpuid_has(vcpu, X86_FEATURE_IBPB)) + return 1; + + if (data & ~PRED_CMD_IBPB) + return 1; + + if (!data) + break; + + wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB); + if (is_guest_mode(vcpu)) + break; + set_msr_interception(svm->msrpm, MSR_IA32_PRED_CMD, 0, 1); + break; case MSR_STAR: svm->vmcb->save.star = data; break; --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -593,6 +593,7 @@ struct vcpu_vmx { u64 msr_host_kernel_gs_base; u64 msr_guest_kernel_gs_base; #endif + u32 vm_entry_controls_shadow; u32 vm_exit_controls_shadow; u32 secondary_exec_control; @@ -934,6 +935,8 @@ static void vmx_set_nmi_mask(struct kvm_ static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12, u16 error_code); static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu); +static void __always_inline vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, + u32 msr, int type); static DEFINE_PER_CPU(struct vmcs *, vmxarea); static DEFINE_PER_CPU(struct vmcs *, current_vmcs); @@ -1905,6 +1908,29 @@ static void update_exception_bitmap(stru vmcs_write32(EXCEPTION_BITMAP, eb); } +/* + * Check if MSR is intercepted for L01 MSR bitmap. + */ +static bool msr_write_intercepted_l01(struct kvm_vcpu *vcpu, u32 msr) +{ + unsigned long *msr_bitmap; + int f = sizeof(unsigned long); + + if (!cpu_has_vmx_msr_bitmap()) + return true; + + msr_bitmap = to_vmx(vcpu)->vmcs01.msr_bitmap; + + if (msr <= 0x1fff) { + return !!test_bit(msr, msr_bitmap + 0x800 / f); + } else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) { + msr &= 0x1fff; + return !!test_bit(msr, msr_bitmap + 0xc00 / f); + } + + return true; +} + static void clear_atomic_switch_msr_special(struct vcpu_vmx *vmx, unsigned long entry, unsigned long exit) { @@ -2283,6 +2309,7 @@ static void vmx_vcpu_load(struct kvm_vcp if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) { per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs; vmcs_load(vmx->loaded_vmcs->vmcs); + indirect_branch_prediction_barrier(); } if (!already_loaded) { @@ -3340,6 +3367,34 @@ static int vmx_set_msr(struct kvm_vcpu * case MSR_IA32_TSC: kvm_write_tsc(vcpu, msr_info); break; + case MSR_IA32_PRED_CMD: + if (!msr_info->host_initiated && + !guest_cpuid_has(vcpu, X86_FEATURE_IBPB) && + !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL)) + return 1; + + if (data & ~PRED_CMD_IBPB) + return 1; + + if (!data) + break; + + wrmsrl(MSR_IA32_PRED_CMD, PRED_CMD_IBPB); + + /* + * For non-nested: + * When it's written (to non-zero) for the first time, pass + * it through. + * + * For nested: + * The handling of the MSR bitmap for L2 guests is done in + * nested_vmx_merge_msr_bitmap. We should not touch the + * vmcs02.msr_bitmap here since it gets completely overwritten + * in the merging. + */ + vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD, + MSR_TYPE_W); + break; case MSR_IA32_CR_PAT: if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) { if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data)) @@ -10042,9 +10097,23 @@ static inline bool nested_vmx_merge_msr_ struct page *page; unsigned long *msr_bitmap_l1; unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap; + /* + * pred_cmd is trying to verify two things: + * + * 1. L0 gave a permission to L1 to actually passthrough the MSR. This + * ensures that we do not accidentally generate an L02 MSR bitmap + * from the L12 MSR bitmap that is too permissive. + * 2. That L1 or L2s have actually used the MSR. This avoids + * unnecessarily merging of the bitmap if the MSR is unused. This + * works properly because we only update the L01 MSR bitmap lazily. + * So even if L0 should pass L1 these MSRs, the L01 bitmap is only + * updated to reflect this when L1 (or its L2s) actually write to + * the MSR. + */ + bool pred_cmd = msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD); - /* This shortcut is ok because we support only x2APIC MSRs so far. */ - if (!nested_cpu_has_virt_x2apic_mode(vmcs12)) + if (!nested_cpu_has_virt_x2apic_mode(vmcs12) && + !pred_cmd) return false; page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->msr_bitmap); @@ -10077,6 +10146,13 @@ static inline bool nested_vmx_merge_msr_ MSR_TYPE_W); } } + + if (pred_cmd) + nested_vmx_disable_intercept_for_msr( + msr_bitmap_l1, msr_bitmap_l0, + MSR_IA32_PRED_CMD, + MSR_TYPE_W); + kunmap(page); kvm_release_page_clean(page); Patches currently in stable-queue which might be from ashok.raj(a)intel.com are queue-4.15/x86pti_Do_not_enable_PTI_on_CPUs_which_are_not_vulnerable_to_Meltdown.patch queue-4.15/x86cpufeature_Blacklist_SPEC_CTRLPRED_CMD_on_early_Spectre_v2_microcodes.patch queue-4.15/x86cpufeatures_Add_Intel_feature_bits_for_Speculation_Control.patch queue-4.15/x86paravirt_Remove_noreplace-paravirt_cmdline_option.patch queue-4.15/KVM_VMX_Make_indirect_call_speculation_safe.patch queue-4.15/x86msr_Add_definitions_for_new_speculation_control_MSRs.patch queue-4.15/KVMVMX_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch queue-4.15/x86cpufeatures_Add_CPUID_7_EDX_CPUID_leaf.patch queue-4.15/x86cpufeatures_Add_AMD_feature_bits_for_Speculation_Control.patch queue-4.15/KVMSVM_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch queue-4.15/KVMx86_Add_IBPB_support.patch queue-4.15/KVMVMX_Emulate_MSR_IA32_ARCH_CAPABILITIES.patch queue-4.15/x86speculation_Add_basic_IBPB_(Indirect_Branch_Prediction_Barrier)_support.patch queue-4.15/KVM_x86_Make_indirect_calls_in_emulator_speculation_safe.patch

7 years, 11 months

[Linux-stable-mirror] Patch "KVM: VMX: make MSR bitmaps per-VCPU" has been added to the 4.15-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled KVM: VMX: make MSR bitmaps per-VCPU to the 4.15-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: KVM_VMX_make_MSR_bitmaps_per-VCPU.patch and it can be found in the queue-4.15 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. Subject: KVM: VMX: make MSR bitmaps per-VCPU From: Paolo Bonzini pbonzini(a)redhat.com Date: Tue Jan 16 16:51:18 2018 +0100 From: Paolo Bonzini pbonzini(a)redhat.com commit 904e14fb7cb96401a7dc803ca2863fd5ba32ffe6 Place the MSR bitmap in struct loaded_vmcs, and update it in place every time the x2apic or APICv state can change. This is rare and the loop can handle 64 MSRs per iteration, in a similar fashion as nested_vmx_prepare_msr_bitmap. This prepares for choosing, on a per-VM basis, whether to intercept the SPEC_CTRL and PRED_CMD MSRs. Cc: stable(a)vger.kernel.org # prereq for Spectre mitigation Suggested-by: Jim Mattson <jmattson(a)google.com> Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/x86/kvm/vmx.c | 276 ++++++++++++++++++++++++++++------------------------- 1 file changed, 150 insertions(+), 126 deletions(-) --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -111,6 +111,14 @@ static u64 __read_mostly host_xss; static bool __read_mostly enable_pml = 1; module_param_named(pml, enable_pml, bool, S_IRUGO); +#define MSR_TYPE_R 1 +#define MSR_TYPE_W 2 +#define MSR_TYPE_RW 3 + +#define MSR_BITMAP_MODE_X2APIC 1 +#define MSR_BITMAP_MODE_X2APIC_APICV 2 +#define MSR_BITMAP_MODE_LM 4 + #define KVM_VMX_TSC_MULTIPLIER_MAX 0xffffffffffffffffULL /* Guest_tsc -> host_tsc conversion requires 64-bit division. */ @@ -209,6 +217,7 @@ struct loaded_vmcs { int soft_vnmi_blocked; ktime_t entry_time; s64 vnmi_blocked_time; + unsigned long *msr_bitmap; struct list_head loaded_vmcss_on_cpu_link; }; @@ -449,8 +458,6 @@ struct nested_vmx { bool pi_pending; u16 posted_intr_nv; - unsigned long *msr_bitmap; - struct hrtimer preemption_timer; bool preemption_timer_expired; @@ -573,6 +580,7 @@ struct vcpu_vmx { struct kvm_vcpu vcpu; unsigned long host_rsp; u8 fail; + u8 msr_bitmap_mode; u32 exit_intr_info; u32 idt_vectoring_info; ulong rflags; @@ -927,6 +935,7 @@ static bool vmx_get_nmi_mask(struct kvm_ static void vmx_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked); static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12, u16 error_code); +static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu); static DEFINE_PER_CPU(struct vmcs *, vmxarea); static DEFINE_PER_CPU(struct vmcs *, current_vmcs); @@ -946,12 +955,6 @@ static DEFINE_PER_CPU(spinlock_t, blocke enum { VMX_IO_BITMAP_A, VMX_IO_BITMAP_B, - VMX_MSR_BITMAP_LEGACY, - VMX_MSR_BITMAP_LONGMODE, - VMX_MSR_BITMAP_LEGACY_X2APIC_APICV, - VMX_MSR_BITMAP_LONGMODE_X2APIC_APICV, - VMX_MSR_BITMAP_LEGACY_X2APIC, - VMX_MSR_BITMAP_LONGMODE_X2APIC, VMX_VMREAD_BITMAP, VMX_VMWRITE_BITMAP, VMX_BITMAP_NR @@ -961,12 +964,6 @@ static unsigned long *vmx_bitmap[VMX_BIT #define vmx_io_bitmap_a (vmx_bitmap[VMX_IO_BITMAP_A]) #define vmx_io_bitmap_b (vmx_bitmap[VMX_IO_BITMAP_B]) -#define vmx_msr_bitmap_legacy (vmx_bitmap[VMX_MSR_BITMAP_LEGACY]) -#define vmx_msr_bitmap_longmode (vmx_bitmap[VMX_MSR_BITMAP_LONGMODE]) -#define vmx_msr_bitmap_legacy_x2apic_apicv (vmx_bitmap[VMX_MSR_BITMAP_LEGACY_X2APIC_APICV]) -#define vmx_msr_bitmap_longmode_x2apic_apicv (vmx_bitmap[VMX_MSR_BITMAP_LONGMODE_X2APIC_APICV]) -#define vmx_msr_bitmap_legacy_x2apic (vmx_bitmap[VMX_MSR_BITMAP_LEGACY_X2APIC]) -#define vmx_msr_bitmap_longmode_x2apic (vmx_bitmap[VMX_MSR_BITMAP_LONGMODE_X2APIC]) #define vmx_vmread_bitmap (vmx_bitmap[VMX_VMREAD_BITMAP]) #define vmx_vmwrite_bitmap (vmx_bitmap[VMX_VMWRITE_BITMAP]) @@ -2564,36 +2561,6 @@ static void move_msr_up(struct vcpu_vmx vmx->guest_msrs[from] = tmp; } -static void vmx_set_msr_bitmap(struct kvm_vcpu *vcpu) -{ - unsigned long *msr_bitmap; - - if (is_guest_mode(vcpu)) - msr_bitmap = to_vmx(vcpu)->nested.msr_bitmap; - else if (cpu_has_secondary_exec_ctrls() && - (vmcs_read32(SECONDARY_VM_EXEC_CONTROL) & - SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE)) { - if (enable_apicv && kvm_vcpu_apicv_active(vcpu)) { - if (is_long_mode(vcpu)) - msr_bitmap = vmx_msr_bitmap_longmode_x2apic_apicv; - else - msr_bitmap = vmx_msr_bitmap_legacy_x2apic_apicv; - } else { - if (is_long_mode(vcpu)) - msr_bitmap = vmx_msr_bitmap_longmode_x2apic; - else - msr_bitmap = vmx_msr_bitmap_legacy_x2apic; - } - } else { - if (is_long_mode(vcpu)) - msr_bitmap = vmx_msr_bitmap_longmode; - else - msr_bitmap = vmx_msr_bitmap_legacy; - } - - vmcs_write64(MSR_BITMAP, __pa(msr_bitmap)); -} - /* * Set up the vmcs to automatically save and restore system * msrs. Don't touch the 64-bit msrs if the guest is in legacy @@ -2634,7 +2601,7 @@ static void setup_msrs(struct vcpu_vmx * vmx->save_nmsrs = save_nmsrs; if (cpu_has_vmx_msr_bitmap()) - vmx_set_msr_bitmap(&vmx->vcpu); + vmx_update_msr_bitmap(&vmx->vcpu); } /* @@ -3844,6 +3811,8 @@ static void free_loaded_vmcs(struct load loaded_vmcs_clear(loaded_vmcs); free_vmcs(loaded_vmcs->vmcs); loaded_vmcs->vmcs = NULL; + if (loaded_vmcs->msr_bitmap) + free_page((unsigned long)loaded_vmcs->msr_bitmap); WARN_ON(loaded_vmcs->shadow_vmcs != NULL); } @@ -3860,7 +3829,18 @@ static int alloc_loaded_vmcs(struct load loaded_vmcs->shadow_vmcs = NULL; loaded_vmcs_init(loaded_vmcs); + + if (cpu_has_vmx_msr_bitmap()) { + loaded_vmcs->msr_bitmap = (unsigned long *)__get_free_page(GFP_KERNEL); + if (!loaded_vmcs->msr_bitmap) + goto out_vmcs; + memset(loaded_vmcs->msr_bitmap, 0xff, PAGE_SIZE); + } return 0; + +out_vmcs: + free_loaded_vmcs(loaded_vmcs); + return -ENOMEM; } static void free_kvm_area(void) @@ -4921,10 +4901,8 @@ static void free_vpid(int vpid) spin_unlock(&vmx_vpid_lock); } -#define MSR_TYPE_R 1 -#define MSR_TYPE_W 2 -static void __vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, - u32 msr, int type) +static void __always_inline vmx_disable_intercept_for_msr(unsigned long *msr_bitmap, + u32 msr, int type) { int f = sizeof(unsigned long); @@ -4958,6 +4936,50 @@ static void __vmx_disable_intercept_for_ } } +static void __always_inline vmx_enable_intercept_for_msr(unsigned long *msr_bitmap, + u32 msr, int type) +{ + int f = sizeof(unsigned long); + + if (!cpu_has_vmx_msr_bitmap()) + return; + + /* + * See Intel PRM Vol. 3, 20.6.9 (MSR-Bitmap Address). Early manuals + * have the write-low and read-high bitmap offsets the wrong way round. + * We can control MSRs 0x00000000-0x00001fff and 0xc0000000-0xc0001fff. + */ + if (msr <= 0x1fff) { + if (type & MSR_TYPE_R) + /* read-low */ + __set_bit(msr, msr_bitmap + 0x000 / f); + + if (type & MSR_TYPE_W) + /* write-low */ + __set_bit(msr, msr_bitmap + 0x800 / f); + + } else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) { + msr &= 0x1fff; + if (type & MSR_TYPE_R) + /* read-high */ + __set_bit(msr, msr_bitmap + 0x400 / f); + + if (type & MSR_TYPE_W) + /* write-high */ + __set_bit(msr, msr_bitmap + 0xc00 / f); + + } +} + +static void __always_inline vmx_set_intercept_for_msr(unsigned long *msr_bitmap, + u32 msr, int type, bool value) +{ + if (value) + vmx_enable_intercept_for_msr(msr_bitmap, msr, type); + else + vmx_disable_intercept_for_msr(msr_bitmap, msr, type); +} + /* * If a msr is allowed by L0, we should check whether it is allowed by L1. * The corresponding bit will be cleared unless both of L0 and L1 allow it. @@ -5004,28 +5026,68 @@ static void nested_vmx_disable_intercept } } -static void vmx_disable_intercept_for_msr(u32 msr, bool longmode_only) +static u8 vmx_msr_bitmap_mode(struct kvm_vcpu *vcpu) { - if (!longmode_only) - __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy, - msr, MSR_TYPE_R | MSR_TYPE_W); - __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode, - msr, MSR_TYPE_R | MSR_TYPE_W); -} - -static void vmx_disable_intercept_msr_x2apic(u32 msr, int type, bool apicv_active) -{ - if (apicv_active) { - __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic_apicv, - msr, type); - __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic_apicv, - msr, type); - } else { - __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic, - msr, type); - __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic, - msr, type); + u8 mode = 0; + + if (cpu_has_secondary_exec_ctrls() && + (vmcs_read32(SECONDARY_VM_EXEC_CONTROL) & + SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE)) { + mode |= MSR_BITMAP_MODE_X2APIC; + if (enable_apicv && kvm_vcpu_apicv_active(vcpu)) + mode |= MSR_BITMAP_MODE_X2APIC_APICV; + } + + if (is_long_mode(vcpu)) + mode |= MSR_BITMAP_MODE_LM; + + return mode; +} + +#define X2APIC_MSR(r) (APIC_BASE_MSR + ((r) >> 4)) + +static void vmx_update_msr_bitmap_x2apic(unsigned long *msr_bitmap, + u8 mode) +{ + int msr; + + for (msr = 0x800; msr <= 0x8ff; msr += BITS_PER_LONG) { + unsigned word = msr / BITS_PER_LONG; + msr_bitmap[word] = (mode & MSR_BITMAP_MODE_X2APIC_APICV) ? 0 : ~0; + msr_bitmap[word + (0x800 / sizeof(long))] = ~0; } + + if (mode & MSR_BITMAP_MODE_X2APIC) { + /* + * TPR reads and writes can be virtualized even if virtual interrupt + * delivery is not in use. + */ + vmx_disable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_TASKPRI), MSR_TYPE_RW); + if (mode & MSR_BITMAP_MODE_X2APIC_APICV) { + vmx_enable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_TMCCT), MSR_TYPE_R); + vmx_disable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_EOI), MSR_TYPE_W); + vmx_disable_intercept_for_msr(msr_bitmap, X2APIC_MSR(APIC_SELF_IPI), MSR_TYPE_W); + } + } +} + +static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu) +{ + struct vcpu_vmx *vmx = to_vmx(vcpu); + unsigned long *msr_bitmap = vmx->vmcs01.msr_bitmap; + u8 mode = vmx_msr_bitmap_mode(vcpu); + u8 changed = mode ^ vmx->msr_bitmap_mode; + + if (!changed) + return; + + vmx_set_intercept_for_msr(msr_bitmap, MSR_KERNEL_GS_BASE, MSR_TYPE_RW, + !(mode & MSR_BITMAP_MODE_LM)); + + if (changed & (MSR_BITMAP_MODE_X2APIC | MSR_BITMAP_MODE_X2APIC_APICV)) + vmx_update_msr_bitmap_x2apic(msr_bitmap, mode); + + vmx->msr_bitmap_mode = mode; } static bool vmx_get_enable_apicv(struct kvm_vcpu *vcpu) @@ -5277,7 +5339,7 @@ static void vmx_refresh_apicv_exec_ctrl( } if (cpu_has_vmx_msr_bitmap()) - vmx_set_msr_bitmap(vcpu); + vmx_update_msr_bitmap(vcpu); } static u32 vmx_exec_control(struct vcpu_vmx *vmx) @@ -5464,7 +5526,7 @@ static void vmx_vcpu_setup(struct vcpu_v vmcs_write64(VMWRITE_BITMAP, __pa(vmx_vmwrite_bitmap)); } if (cpu_has_vmx_msr_bitmap()) - vmcs_write64(MSR_BITMAP, __pa(vmx_msr_bitmap_legacy)); + vmcs_write64(MSR_BITMAP, __pa(vmx->vmcs01.msr_bitmap)); vmcs_write64(VMCS_LINK_POINTER, -1ull); /* 22.3.1.5 */ @@ -6747,7 +6809,7 @@ void vmx_enable_tdp(void) static __init int hardware_setup(void) { - int r = -ENOMEM, i, msr; + int r = -ENOMEM, i; rdmsrl_safe(MSR_EFER, &host_efer); @@ -6767,9 +6829,6 @@ static __init int hardware_setup(void) memset(vmx_io_bitmap_b, 0xff, PAGE_SIZE); - memset(vmx_msr_bitmap_legacy, 0xff, PAGE_SIZE); - memset(vmx_msr_bitmap_longmode, 0xff, PAGE_SIZE); - if (setup_vmcs_config(&vmcs_config) < 0) { r = -EIO; goto out; @@ -6838,42 +6897,8 @@ static __init int hardware_setup(void) kvm_tsc_scaling_ratio_frac_bits = 48; } - vmx_disable_intercept_for_msr(MSR_FS_BASE, false); - vmx_disable_intercept_for_msr(MSR_GS_BASE, false); - vmx_disable_intercept_for_msr(MSR_KERNEL_GS_BASE, true); - vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_CS, false); - vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_ESP, false); - vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_EIP, false); - - memcpy(vmx_msr_bitmap_legacy_x2apic_apicv, - vmx_msr_bitmap_legacy, PAGE_SIZE); - memcpy(vmx_msr_bitmap_longmode_x2apic_apicv, - vmx_msr_bitmap_longmode, PAGE_SIZE); - memcpy(vmx_msr_bitmap_legacy_x2apic, - vmx_msr_bitmap_legacy, PAGE_SIZE); - memcpy(vmx_msr_bitmap_longmode_x2apic, - vmx_msr_bitmap_longmode, PAGE_SIZE); - set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */ - for (msr = 0x800; msr <= 0x8ff; msr++) { - if (msr == 0x839 /* TMCCT */) - continue; - vmx_disable_intercept_msr_x2apic(msr, MSR_TYPE_R, true); - } - - /* - * TPR reads and writes can be virtualized even if virtual interrupt - * delivery is not in use. - */ - vmx_disable_intercept_msr_x2apic(0x808, MSR_TYPE_W, true); - vmx_disable_intercept_msr_x2apic(0x808, MSR_TYPE_R | MSR_TYPE_W, false); - - /* EOI */ - vmx_disable_intercept_msr_x2apic(0x80b, MSR_TYPE_W, true); - /* SELF-IPI */ - vmx_disable_intercept_msr_x2apic(0x83f, MSR_TYPE_W, true); - if (enable_ept) vmx_enable_tdp(); else @@ -7162,13 +7187,6 @@ static int enter_vmx_operation(struct kv if (r < 0) goto out_vmcs02; - if (cpu_has_vmx_msr_bitmap()) { - vmx->nested.msr_bitmap = - (unsigned long *)__get_free_page(GFP_KERNEL); - if (!vmx->nested.msr_bitmap) - goto out_msr_bitmap; - } - vmx->nested.cached_vmcs12 = kmalloc(VMCS12_SIZE, GFP_KERNEL); if (!vmx->nested.cached_vmcs12) goto out_cached_vmcs12; @@ -7195,9 +7213,6 @@ out_shadow_vmcs: kfree(vmx->nested.cached_vmcs12); out_cached_vmcs12: - free_page((unsigned long)vmx->nested.msr_bitmap); - -out_msr_bitmap: free_loaded_vmcs(&vmx->nested.vmcs02); out_vmcs02: @@ -7343,10 +7358,6 @@ static void free_nested(struct vcpu_vmx free_vpid(vmx->nested.vpid02); vmx->nested.posted_intr_nv = -1; vmx->nested.current_vmptr = -1ull; - if (vmx->nested.msr_bitmap) { - free_page((unsigned long)vmx->nested.msr_bitmap); - vmx->nested.msr_bitmap = NULL; - } if (enable_shadow_vmcs) { vmx_disable_shadow_vmcs(vmx); vmcs_clear(vmx->vmcs01.shadow_vmcs); @@ -8862,7 +8873,7 @@ static void vmx_set_virtual_x2apic_mode( } vmcs_write32(SECONDARY_VM_EXEC_CONTROL, sec_exec_control); - vmx_set_msr_bitmap(vcpu); + vmx_update_msr_bitmap(vcpu); } static void vmx_set_apic_access_page_addr(struct kvm_vcpu *vcpu, hpa_t hpa) @@ -9523,6 +9534,7 @@ static struct kvm_vcpu *vmx_create_vcpu( { int err; struct vcpu_vmx *vmx = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL); + unsigned long *msr_bitmap; int cpu; if (!vmx) @@ -9559,6 +9571,15 @@ static struct kvm_vcpu *vmx_create_vcpu( if (err < 0) goto free_msrs; + msr_bitmap = vmx->vmcs01.msr_bitmap; + vmx_disable_intercept_for_msr(msr_bitmap, MSR_FS_BASE, MSR_TYPE_RW); + vmx_disable_intercept_for_msr(msr_bitmap, MSR_GS_BASE, MSR_TYPE_RW); + vmx_disable_intercept_for_msr(msr_bitmap, MSR_KERNEL_GS_BASE, MSR_TYPE_RW); + vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_CS, MSR_TYPE_RW); + vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_ESP, MSR_TYPE_RW); + vmx_disable_intercept_for_msr(msr_bitmap, MSR_IA32_SYSENTER_EIP, MSR_TYPE_RW); + vmx->msr_bitmap_mode = 0; + vmx->loaded_vmcs = &vmx->vmcs01; cpu = get_cpu(); vmx_vcpu_load(&vmx->vcpu, cpu); @@ -10022,7 +10043,7 @@ static inline bool nested_vmx_merge_msr_ int msr; struct page *page; unsigned long *msr_bitmap_l1; - unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.msr_bitmap; + unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap; /* This shortcut is ok because we support only x2APIC MSRs so far. */ if (!nested_cpu_has_virt_x2apic_mode(vmcs12)) @@ -10599,6 +10620,9 @@ static int prepare_vmcs02(struct kvm_vcp if (kvm_has_tsc_control) decache_tsc_multiplier(vmx); + if (cpu_has_vmx_msr_bitmap()) + vmcs_write64(MSR_BITMAP, __pa(vmx->nested.vmcs02.msr_bitmap)); + if (enable_vpid) { /* * There is no direct mapping between vpid02 and vpid12, the @@ -11397,7 +11421,7 @@ static void load_vmcs12_host_state(struc vmcs_write64(GUEST_IA32_DEBUGCTL, 0); if (cpu_has_vmx_msr_bitmap()) - vmx_set_msr_bitmap(vcpu); + vmx_update_msr_bitmap(vcpu); if (nested_vmx_load_msr(vcpu, vmcs12->vm_exit_msr_load_addr, vmcs12->vm_exit_msr_load_count)) Patches currently in stable-queue which might be from jmattson(a)google.com are queue-4.15/x86kvm_Update_spectre-v1_mitigation.patch queue-4.15/KVMVMX_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch queue-4.15/KVM_nVMX_Eliminate_vmcs02_pool.patch queue-4.15/KVMVMX_Emulate_MSR_IA32_ARCH_CAPABILITIES.patch queue-4.15/KVM_VMX_make_MSR_bitmaps_per-VCPU.patch queue-4.15/KVMx86_Update_the_reverse_cpuid_list_to_include_CPUID_7_EDX.patch

7 years, 11 months

[Linux-stable-mirror] Patch "KVM: nVMX: Eliminate vmcs02 pool" has been added to the 4.15-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled KVM: nVMX: Eliminate vmcs02 pool to the 4.15-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: KVM_nVMX_Eliminate_vmcs02_pool.patch and it can be found in the queue-4.15 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. Subject: KVM: nVMX: Eliminate vmcs02 pool From: Jim Mattson jmattson(a)google.com Date: Mon Nov 27 17:22:25 2017 -0600 From: Jim Mattson jmattson(a)google.com commit de3a0021a60635de96aa92713c1a31a96747d72c The potential performance advantages of a vmcs02 pool have never been realized. To simplify the code, eliminate the pool. Instead, a single vmcs02 is allocated per VCPU when the VCPU enters VMX operation. Cc: stable(a)vger.kernel.org # prereq for Spectre mitigation Signed-off-by: Jim Mattson <jmattson(a)google.com> Signed-off-by: Mark Kanda <mark.kanda(a)oracle.com> Reviewed-by: Ameya More <ameya.more(a)oracle.com> Reviewed-by: David Hildenbrand <david(a)redhat.com> Reviewed-by: Paolo Bonzini <pbonzini(a)redhat.com> Signed-off-by: Radim Krčmář <rkrcmar(a)redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/x86/kvm/vmx.c | 146 ++++++++--------------------------------------------- 1 file changed, 23 insertions(+), 123 deletions(-) --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -185,7 +185,6 @@ module_param(ple_window_max, int, S_IRUG extern const ulong vmx_return; #define NR_AUTOLOAD_MSRS 8 -#define VMCS02_POOL_SIZE 1 struct vmcs { u32 revision_id; @@ -226,7 +225,7 @@ struct shared_msr_entry { * stored in guest memory specified by VMPTRLD, but is opaque to the guest, * which must access it using VMREAD/VMWRITE/VMCLEAR instructions. * More than one of these structures may exist, if L1 runs multiple L2 guests. - * nested_vmx_run() will use the data here to build a vmcs02: a VMCS for the + * nested_vmx_run() will use the data here to build the vmcs02: a VMCS for the * underlying hardware which will be used to run L2. * This structure is packed to ensure that its layout is identical across * machines (necessary for live migration). @@ -409,13 +408,6 @@ struct __packed vmcs12 { */ #define VMCS12_SIZE 0x1000 -/* Used to remember the last vmcs02 used for some recently used vmcs12s */ -struct vmcs02_list { - struct list_head list; - gpa_t vmptr; - struct loaded_vmcs vmcs02; -}; - /* * The nested_vmx structure is part of vcpu_vmx, and holds information we need * for correct emulation of VMX (i.e., nested VMX) on this vcpu. @@ -440,15 +432,15 @@ struct nested_vmx { */ bool sync_shadow_vmcs; - /* vmcs02_list cache of VMCSs recently used to run L2 guests */ - struct list_head vmcs02_pool; - int vmcs02_num; bool change_vmcs01_virtual_x2apic_mode; /* L2 must run next, and mustn't decide to exit to L1. */ bool nested_run_pending; + + struct loaded_vmcs vmcs02; + /* - * Guest pages referred to in vmcs02 with host-physical pointers, so - * we must keep them pinned while L2 runs. + * Guest pages referred to in the vmcs02 with host-physical + * pointers, so we must keep them pinned while L2 runs. */ struct page *apic_access_page; struct page *virtual_apic_page; @@ -6974,94 +6966,6 @@ static int handle_monitor(struct kvm_vcp } /* - * To run an L2 guest, we need a vmcs02 based on the L1-specified vmcs12. - * We could reuse a single VMCS for all the L2 guests, but we also want the - * option to allocate a separate vmcs02 for each separate loaded vmcs12 - this - * allows keeping them loaded on the processor, and in the future will allow - * optimizations where prepare_vmcs02 doesn't need to set all the fields on - * every entry if they never change. - * So we keep, in vmx->nested.vmcs02_pool, a cache of size VMCS02_POOL_SIZE - * (>=0) with a vmcs02 for each recently loaded vmcs12s, most recent first. - * - * The following functions allocate and free a vmcs02 in this pool. - */ - -/* Get a VMCS from the pool to use as vmcs02 for the current vmcs12. */ -static struct loaded_vmcs *nested_get_current_vmcs02(struct vcpu_vmx *vmx) -{ - struct vmcs02_list *item; - list_for_each_entry(item, &vmx->nested.vmcs02_pool, list) - if (item->vmptr == vmx->nested.current_vmptr) { - list_move(&item->list, &vmx->nested.vmcs02_pool); - return &item->vmcs02; - } - - if (vmx->nested.vmcs02_num >= max(VMCS02_POOL_SIZE, 1)) { - /* Recycle the least recently used VMCS. */ - item = list_last_entry(&vmx->nested.vmcs02_pool, - struct vmcs02_list, list); - item->vmptr = vmx->nested.current_vmptr; - list_move(&item->list, &vmx->nested.vmcs02_pool); - return &item->vmcs02; - } - - /* Create a new VMCS */ - item = kzalloc(sizeof(struct vmcs02_list), GFP_KERNEL); - if (!item) - return NULL; - item->vmcs02.vmcs = alloc_vmcs(); - item->vmcs02.shadow_vmcs = NULL; - if (!item->vmcs02.vmcs) { - kfree(item); - return NULL; - } - loaded_vmcs_init(&item->vmcs02); - item->vmptr = vmx->nested.current_vmptr; - list_add(&(item->list), &(vmx->nested.vmcs02_pool)); - vmx->nested.vmcs02_num++; - return &item->vmcs02; -} - -/* Free and remove from pool a vmcs02 saved for a vmcs12 (if there is one) */ -static void nested_free_vmcs02(struct vcpu_vmx *vmx, gpa_t vmptr) -{ - struct vmcs02_list *item; - list_for_each_entry(item, &vmx->nested.vmcs02_pool, list) - if (item->vmptr == vmptr) { - free_loaded_vmcs(&item->vmcs02); - list_del(&item->list); - kfree(item); - vmx->nested.vmcs02_num--; - return; - } -} - -/* - * Free all VMCSs saved for this vcpu, except the one pointed by - * vmx->loaded_vmcs. We must be running L1, so vmx->loaded_vmcs - * must be &vmx->vmcs01. - */ -static void nested_free_all_saved_vmcss(struct vcpu_vmx *vmx) -{ - struct vmcs02_list *item, *n; - - WARN_ON(vmx->loaded_vmcs != &vmx->vmcs01); - list_for_each_entry_safe(item, n, &vmx->nested.vmcs02_pool, list) { - /* - * Something will leak if the above WARN triggers. Better than - * a use-after-free. - */ - if (vmx->loaded_vmcs == &item->vmcs02) - continue; - - free_loaded_vmcs(&item->vmcs02); - list_del(&item->list); - kfree(item); - vmx->nested.vmcs02_num--; - } -} - -/* * The following 3 functions, nested_vmx_succeed()/failValid()/failInvalid(), * set the success or error code of an emulated VMX instruction, as specified * by Vol 2B, VMX Instruction Reference, "Conventions". @@ -7242,6 +7146,12 @@ static int enter_vmx_operation(struct kv struct vcpu_vmx *vmx = to_vmx(vcpu); struct vmcs *shadow_vmcs; + vmx->nested.vmcs02.vmcs = alloc_vmcs(); + vmx->nested.vmcs02.shadow_vmcs = NULL; + if (!vmx->nested.vmcs02.vmcs) + goto out_vmcs02; + loaded_vmcs_init(&vmx->nested.vmcs02); + if (cpu_has_vmx_msr_bitmap()) { vmx->nested.msr_bitmap = (unsigned long *)__get_free_page(GFP_KERNEL); @@ -7264,9 +7174,6 @@ static int enter_vmx_operation(struct kv vmx->vmcs01.shadow_vmcs = shadow_vmcs; } - INIT_LIST_HEAD(&(vmx->nested.vmcs02_pool)); - vmx->nested.vmcs02_num = 0; - hrtimer_init(&vmx->nested.preemption_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_PINNED); vmx->nested.preemption_timer.function = vmx_preemption_timer_fn; @@ -7281,6 +7188,9 @@ out_cached_vmcs12: free_page((unsigned long)vmx->nested.msr_bitmap); out_msr_bitmap: + free_loaded_vmcs(&vmx->nested.vmcs02); + +out_vmcs02: return -ENOMEM; } @@ -7434,7 +7344,7 @@ static void free_nested(struct vcpu_vmx vmx->vmcs01.shadow_vmcs = NULL; } kfree(vmx->nested.cached_vmcs12); - /* Unpin physical memory we referred to in current vmcs02 */ + /* Unpin physical memory we referred to in the vmcs02 */ if (vmx->nested.apic_access_page) { kvm_release_page_dirty(vmx->nested.apic_access_page); vmx->nested.apic_access_page = NULL; @@ -7450,7 +7360,7 @@ static void free_nested(struct vcpu_vmx vmx->nested.pi_desc = NULL; } - nested_free_all_saved_vmcss(vmx); + free_loaded_vmcs(&vmx->nested.vmcs02); } /* Emulate the VMXOFF instruction */ @@ -7493,8 +7403,6 @@ static int handle_vmclear(struct kvm_vcp vmptr + offsetof(struct vmcs12, launch_state), &zero, sizeof(zero)); - nested_free_vmcs02(vmx, vmptr); - nested_vmx_succeed(vcpu); return kvm_skip_emulated_instruction(vcpu); } @@ -8406,10 +8314,11 @@ static bool nested_vmx_exit_reflected(st /* * The host physical addresses of some pages of guest memory - * are loaded into VMCS02 (e.g. L1's Virtual APIC Page). The CPU - * may write to these pages via their host physical address while - * L2 is running, bypassing any address-translation-based dirty - * tracking (e.g. EPT write protection). + * are loaded into the vmcs02 (e.g. vmcs12's Virtual APIC + * Page). The CPU may write to these pages via their host + * physical address while L2 is running, bypassing any + * address-translation-based dirty tracking (e.g. EPT write + * protection). * * Mark them dirty on every exit from L2 to prevent them from * getting out of sync with dirty tracking. @@ -10903,20 +10812,15 @@ static int enter_vmx_non_root_mode(struc { struct vcpu_vmx *vmx = to_vmx(vcpu); struct vmcs12 *vmcs12 = get_vmcs12(vcpu); - struct loaded_vmcs *vmcs02; u32 msr_entry_idx; u32 exit_qual; - vmcs02 = nested_get_current_vmcs02(vmx); - if (!vmcs02) - return -ENOMEM; - enter_guest_mode(vcpu); if (!(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS)) vmx->nested.vmcs01_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL); - vmx_switch_vmcs(vcpu, vmcs02); + vmx_switch_vmcs(vcpu, &vmx->nested.vmcs02); vmx_segment_cache_clear(vmx); if (prepare_vmcs02(vcpu, vmcs12, from_vmentry, &exit_qual)) { @@ -11534,10 +11438,6 @@ static void nested_vmx_vmexit(struct kvm vm_exit_controls_reset_shadow(vmx); vmx_segment_cache_clear(vmx); - /* if no vmcs02 cache requested, remove the one we used */ - if (VMCS02_POOL_SIZE == 0) - nested_free_vmcs02(vmx, vmx->nested.current_vmptr); - /* Update any VMCS fields that might have changed while L2 ran */ vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, vmx->msr_autoload.nr); vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, vmx->msr_autoload.nr); Patches currently in stable-queue which might be from jmattson(a)google.com are queue-4.15/x86kvm_Update_spectre-v1_mitigation.patch queue-4.15/KVMVMX_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch queue-4.15/KVM_nVMX_Eliminate_vmcs02_pool.patch queue-4.15/KVMVMX_Emulate_MSR_IA32_ARCH_CAPABILITIES.patch queue-4.15/KVM_VMX_make_MSR_bitmaps_per-VCPU.patch queue-4.15/KVMx86_Update_the_reverse_cpuid_list_to_include_CPUID_7_EDX.patch

7 years, 11 months

[Linux-stable-mirror] Patch "KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL" has been added to the 4.15-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL to the 4.15-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: KVMVMX_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch and it can be found in the queue-4.15 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. Subject: KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL From: KarimAllah Ahmed karahmed(a)amazon.de Date: Thu Feb 1 22:59:45 2018 +0100 From: KarimAllah Ahmed karahmed(a)amazon.de commit d28b387fb74da95d69d2615732f50cceb38e9a4d [ Based on a patch from Ashok Raj <ashok.raj(a)intel.com> ] Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for guests that will only mitigate Spectre V2 through IBRS+IBPB and will not be using a retpoline+IBPB based approach. To avoid the overhead of saving and restoring the MSR_IA32_SPEC_CTRL for guests that do not actually use the MSR, only start saving and restoring when a non-zero is written to it. No attempt is made to handle STIBP here, intentionally. Filtering STIBP may be added in a future patch, which may require trapping all writes if we don't want to pass it through directly to the guest. [dwmw2: Clean up CPUID bits, save/restore manually, handle reset] Signed-off-by: KarimAllah Ahmed <karahmed(a)amazon.de> Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de> Reviewed-by: Darren Kenny <darren.kenny(a)oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com> Reviewed-by: Jim Mattson <jmattson(a)google.com> Cc: Andrea Arcangeli <aarcange(a)redhat.com> Cc: Andi Kleen <ak(a)linux.intel.com> Cc: Jun Nakajima <jun.nakajima(a)intel.com> Cc: kvm(a)vger.kernel.org Cc: Dave Hansen <dave.hansen(a)intel.com> Cc: Tim Chen <tim.c.chen(a)linux.intel.com> Cc: Andy Lutomirski <luto(a)kernel.org> Cc: Asit Mallick <asit.k.mallick(a)intel.com> Cc: Arjan Van De Ven <arjan.van.de.ven(a)intel.com> Cc: Greg KH <gregkh(a)linuxfoundation.org> Cc: Paolo Bonzini <pbonzini(a)redhat.com> Cc: Dan Williams <dan.j.williams(a)intel.com> Cc: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: Ashok Raj <ashok.raj(a)intel.com> Link: https://lkml.kernel.org/r/1517522386-18410-5-git-send-email-karahmed@amazon… Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/x86/kvm/cpuid.c | 9 ++-- arch/x86/kvm/vmx.c | 105 ++++++++++++++++++++++++++++++++++++++++++++++++++- arch/x86/kvm/x86.c | 2 3 files changed, 110 insertions(+), 6 deletions(-) --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -367,7 +367,7 @@ static inline int __do_cpuid_ent(struct /* cpuid 0x80000008.ebx */ const u32 kvm_cpuid_8000_0008_ebx_x86_features = - F(IBPB); + F(IBPB) | F(IBRS); /* cpuid 0xC0000001.edx */ const u32 kvm_cpuid_C000_0001_edx_x86_features = @@ -394,7 +394,8 @@ static inline int __do_cpuid_ent(struct /* cpuid 7.0.edx*/ const u32 kvm_cpuid_7_0_edx_x86_features = - F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES); + F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) | + F(ARCH_CAPABILITIES); /* all calls to cpuid_count() should be made on the same cpu */ get_cpu(); @@ -630,9 +631,11 @@ static inline int __do_cpuid_ent(struct g_phys_as = phys_as; entry->eax = g_phys_as | (virt_as << 8); entry->edx = 0; - /* IBPB isn't necessarily present in hardware cpuid */ + /* IBRS and IBPB aren't necessarily present in hardware cpuid */ if (boot_cpu_has(X86_FEATURE_IBPB)) entry->ebx |= F(IBPB); + if (boot_cpu_has(X86_FEATURE_IBRS)) + entry->ebx |= F(IBRS); entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features; cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX); break; --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -595,6 +595,7 @@ struct vcpu_vmx { #endif u64 arch_capabilities; + u64 spec_ctrl; u32 vm_entry_controls_shadow; u32 vm_exit_controls_shadow; @@ -1911,6 +1912,29 @@ static void update_exception_bitmap(stru } /* + * Check if MSR is intercepted for currently loaded MSR bitmap. + */ +static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr) +{ + unsigned long *msr_bitmap; + int f = sizeof(unsigned long); + + if (!cpu_has_vmx_msr_bitmap()) + return true; + + msr_bitmap = to_vmx(vcpu)->loaded_vmcs->msr_bitmap; + + if (msr <= 0x1fff) { + return !!test_bit(msr, msr_bitmap + 0x800 / f); + } else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) { + msr &= 0x1fff; + return !!test_bit(msr, msr_bitmap + 0xc00 / f); + } + + return true; +} + +/* * Check if MSR is intercepted for L01 MSR bitmap. */ static bool msr_write_intercepted_l01(struct kvm_vcpu *vcpu, u32 msr) @@ -3262,6 +3286,14 @@ static int vmx_get_msr(struct kvm_vcpu * case MSR_IA32_TSC: msr_info->data = guest_read_tsc(vcpu); break; + case MSR_IA32_SPEC_CTRL: + if (!msr_info->host_initiated && + !guest_cpuid_has(vcpu, X86_FEATURE_IBRS) && + !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL)) + return 1; + + msr_info->data = to_vmx(vcpu)->spec_ctrl; + break; case MSR_IA32_ARCH_CAPABILITIES: if (!msr_info->host_initiated && !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES)) @@ -3375,6 +3407,37 @@ static int vmx_set_msr(struct kvm_vcpu * case MSR_IA32_TSC: kvm_write_tsc(vcpu, msr_info); break; + case MSR_IA32_SPEC_CTRL: + if (!msr_info->host_initiated && + !guest_cpuid_has(vcpu, X86_FEATURE_IBRS) && + !guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL)) + return 1; + + /* The STIBP bit doesn't fault even if it's not advertised */ + if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP)) + return 1; + + vmx->spec_ctrl = data; + + if (!data) + break; + + /* + * For non-nested: + * When it's written (to non-zero) for the first time, pass + * it through. + * + * For nested: + * The handling of the MSR bitmap for L2 guests is done in + * nested_vmx_merge_msr_bitmap. We should not touch the + * vmcs02.msr_bitmap here since it gets completely overwritten + * in the merging. We update the vmcs01 here for L1 as well + * since it will end up touching the MSR anyway now. + */ + vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, + MSR_IA32_SPEC_CTRL, + MSR_TYPE_RW); + break; case MSR_IA32_PRED_CMD: if (!msr_info->host_initiated && !guest_cpuid_has(vcpu, X86_FEATURE_IBPB) && @@ -5700,6 +5763,7 @@ static void vmx_vcpu_reset(struct kvm_vc u64 cr0; vmx->rmode.vm86_active = 0; + vmx->spec_ctrl = 0; vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val(); kvm_set_cr8(vcpu, 0); @@ -9371,6 +9435,15 @@ static void __noclone vmx_vcpu_run(struc vmx_arm_hv_timer(vcpu); + /* + * If this vCPU has touched SPEC_CTRL, restore the guest's value if + * it's non-zero. Since vmentry is serialising on affected CPUs, there + * is no need to worry about the conditional branch over the wrmsr + * being speculatively taken. + */ + if (vmx->spec_ctrl) + wrmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl); + vmx->__launched = vmx->loaded_vmcs->launched; asm( /* Store host registers */ @@ -9489,6 +9562,27 @@ static void __noclone vmx_vcpu_run(struc #endif ); + /* + * We do not use IBRS in the kernel. If this vCPU has used the + * SPEC_CTRL MSR it may have left it on; save the value and + * turn it off. This is much more efficient than blindly adding + * it to the atomic save/restore list. Especially as the former + * (Saving guest MSRs on vmexit) doesn't even exist in KVM. + * + * For non-nested case: + * If the L01 MSR bitmap does not intercept the MSR, then we need to + * save it. + * + * For nested case: + * If the L02 MSR bitmap does not intercept the MSR, then we need to + * save it. + */ + if (!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL)) + rdmsrl(MSR_IA32_SPEC_CTRL, vmx->spec_ctrl); + + if (vmx->spec_ctrl) + wrmsrl(MSR_IA32_SPEC_CTRL, 0); + /* Eliminate branch target predictions from guest mode */ vmexit_fill_RSB(); @@ -10113,7 +10207,7 @@ static inline bool nested_vmx_merge_msr_ unsigned long *msr_bitmap_l1; unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap; /* - * pred_cmd is trying to verify two things: + * pred_cmd & spec_ctrl are trying to verify two things: * * 1. L0 gave a permission to L1 to actually passthrough the MSR. This * ensures that we do not accidentally generate an L02 MSR bitmap @@ -10126,9 +10220,10 @@ static inline bool nested_vmx_merge_msr_ * the MSR. */ bool pred_cmd = msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD); + bool spec_ctrl = msr_write_intercepted_l01(vcpu, MSR_IA32_SPEC_CTRL); if (!nested_cpu_has_virt_x2apic_mode(vmcs12) && - !pred_cmd) + !pred_cmd && !spec_ctrl) return false; page = kvm_vcpu_gpa_to_page(vcpu, vmcs12->msr_bitmap); @@ -10162,6 +10257,12 @@ static inline bool nested_vmx_merge_msr_ } } + if (spec_ctrl) + nested_vmx_disable_intercept_for_msr( + msr_bitmap_l1, msr_bitmap_l0, + MSR_IA32_SPEC_CTRL, + MSR_TYPE_R | MSR_TYPE_W); + if (pred_cmd) nested_vmx_disable_intercept_for_msr( msr_bitmap_l1, msr_bitmap_l0, --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1009,7 +1009,7 @@ static u32 msrs_to_save[] = { #endif MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA, MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX, - MSR_IA32_ARCH_CAPABILITIES + MSR_IA32_SPEC_CTRL, MSR_IA32_ARCH_CAPABILITIES }; static unsigned num_msrs_to_save; Patches currently in stable-queue which might be from ashok.raj(a)intel.com are queue-4.15/x86pti_Do_not_enable_PTI_on_CPUs_which_are_not_vulnerable_to_Meltdown.patch queue-4.15/x86cpufeature_Blacklist_SPEC_CTRLPRED_CMD_on_early_Spectre_v2_microcodes.patch queue-4.15/x86cpufeatures_Add_Intel_feature_bits_for_Speculation_Control.patch queue-4.15/x86paravirt_Remove_noreplace-paravirt_cmdline_option.patch queue-4.15/KVM_VMX_Make_indirect_call_speculation_safe.patch queue-4.15/x86msr_Add_definitions_for_new_speculation_control_MSRs.patch queue-4.15/KVMVMX_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch queue-4.15/x86cpufeatures_Add_CPUID_7_EDX_CPUID_leaf.patch queue-4.15/x86cpufeatures_Add_AMD_feature_bits_for_Speculation_Control.patch queue-4.15/KVMSVM_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch queue-4.15/KVMx86_Add_IBPB_support.patch queue-4.15/KVMVMX_Emulate_MSR_IA32_ARCH_CAPABILITIES.patch queue-4.15/x86speculation_Add_basic_IBPB_(Indirect_Branch_Prediction_Barrier)_support.patch queue-4.15/KVM_x86_Make_indirect_calls_in_emulator_speculation_safe.patch

7 years, 11 months

[Linux-stable-mirror] Patch "KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES" has been added to the 4.15-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES to the 4.15-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: KVMVMX_Emulate_MSR_IA32_ARCH_CAPABILITIES.patch and it can be found in the queue-4.15 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. Subject: KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES From: KarimAllah Ahmed karahmed(a)amazon.de Date: Thu Feb 1 22:59:44 2018 +0100 From: KarimAllah Ahmed karahmed(a)amazon.de commit 28c1c9fabf48d6ad596273a11c46e0d0da3e14cd Intel processors use MSR_IA32_ARCH_CAPABILITIES MSR to indicate RDCL_NO (bit 0) and IBRS_ALL (bit 1). This is a read-only MSR. By default the contents will come directly from the hardware, but user-space can still override it. [dwmw2: The bit in kvm_cpuid_7_0_edx_x86_features can be unconditional] Signed-off-by: KarimAllah Ahmed <karahmed(a)amazon.de> Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de> Reviewed-by: Paolo Bonzini <pbonzini(a)redhat.com> Reviewed-by: Darren Kenny <darren.kenny(a)oracle.com> Reviewed-by: Jim Mattson <jmattson(a)google.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com> Cc: Andrea Arcangeli <aarcange(a)redhat.com> Cc: Andi Kleen <ak(a)linux.intel.com> Cc: Jun Nakajima <jun.nakajima(a)intel.com> Cc: kvm(a)vger.kernel.org Cc: Dave Hansen <dave.hansen(a)intel.com> Cc: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: Andy Lutomirski <luto(a)kernel.org> Cc: Asit Mallick <asit.k.mallick(a)intel.com> Cc: Arjan Van De Ven <arjan.van.de.ven(a)intel.com> Cc: Greg KH <gregkh(a)linuxfoundation.org> Cc: Dan Williams <dan.j.williams(a)intel.com> Cc: Tim Chen <tim.c.chen(a)linux.intel.com> Cc: Ashok Raj <ashok.raj(a)intel.com> Link: https://lkml.kernel.org/r/1517522386-18410-4-git-send-email-karahmed@amazon… Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/x86/kvm/cpuid.c | 2 +- arch/x86/kvm/vmx.c | 15 +++++++++++++++ arch/x86/kvm/x86.c | 1 + 3 files changed, 17 insertions(+), 1 deletion(-) --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -394,7 +394,7 @@ static inline int __do_cpuid_ent(struct /* cpuid 7.0.edx*/ const u32 kvm_cpuid_7_0_edx_x86_features = - F(AVX512_4VNNIW) | F(AVX512_4FMAPS); + F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(ARCH_CAPABILITIES); /* all calls to cpuid_count() should be made on the same cpu */ get_cpu(); --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -594,6 +594,8 @@ struct vcpu_vmx { u64 msr_guest_kernel_gs_base; #endif + u64 arch_capabilities; + u32 vm_entry_controls_shadow; u32 vm_exit_controls_shadow; u32 secondary_exec_control; @@ -3260,6 +3262,12 @@ static int vmx_get_msr(struct kvm_vcpu * case MSR_IA32_TSC: msr_info->data = guest_read_tsc(vcpu); break; + case MSR_IA32_ARCH_CAPABILITIES: + if (!msr_info->host_initiated && + !guest_cpuid_has(vcpu, X86_FEATURE_ARCH_CAPABILITIES)) + return 1; + msr_info->data = to_vmx(vcpu)->arch_capabilities; + break; case MSR_IA32_SYSENTER_CS: msr_info->data = vmcs_read32(GUEST_SYSENTER_CS); break; @@ -3395,6 +3403,11 @@ static int vmx_set_msr(struct kvm_vcpu * vmx_disable_intercept_for_msr(vmx->vmcs01.msr_bitmap, MSR_IA32_PRED_CMD, MSR_TYPE_W); break; + case MSR_IA32_ARCH_CAPABILITIES: + if (!msr_info->host_initiated) + return 1; + vmx->arch_capabilities = data; + break; case MSR_IA32_CR_PAT: if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) { if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data)) @@ -5657,6 +5670,8 @@ static void vmx_vcpu_setup(struct vcpu_v ++vmx->nmsrs; } + if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES)) + rdmsrl(MSR_IA32_ARCH_CAPABILITIES, vmx->arch_capabilities); vm_exit_controls_init(vmx, vmcs_config.vmexit_ctrl); --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1009,6 +1009,7 @@ static u32 msrs_to_save[] = { #endif MSR_IA32_TSC, MSR_IA32_CR_PAT, MSR_VM_HSAVE_PA, MSR_IA32_FEATURE_CONTROL, MSR_IA32_BNDCFGS, MSR_TSC_AUX, + MSR_IA32_ARCH_CAPABILITIES }; static unsigned num_msrs_to_save; Patches currently in stable-queue which might be from karahmed(a)amazon.de are queue-4.15/x86spectre_Simplify_spectre_v2_command_line_parsing.patch queue-4.15/x86pti_Do_not_enable_PTI_on_CPUs_which_are_not_vulnerable_to_Meltdown.patch queue-4.15/x86speculation_Use_Indirect_Branch_Prediction_Barrier_in_context_switch.patch queue-4.15/x86cpuid_Fix_up_virtual_IBRSIBPBSTIBP_feature_bits_on_Intel.patch queue-4.15/x86cpufeature_Blacklist_SPEC_CTRLPRED_CMD_on_early_Spectre_v2_microcodes.patch queue-4.15/x86cpufeatures_Add_Intel_feature_bits_for_Speculation_Control.patch queue-4.15/x86msr_Add_definitions_for_new_speculation_control_MSRs.patch queue-4.15/KVMVMX_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch queue-4.15/x86cpufeatures_Add_CPUID_7_EDX_CPUID_leaf.patch queue-4.15/x86cpufeatures_Add_AMD_feature_bits_for_Speculation_Control.patch queue-4.15/KVMSVM_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch queue-4.15/KVMx86_Add_IBPB_support.patch queue-4.15/KVMVMX_Emulate_MSR_IA32_ARCH_CAPABILITIES.patch queue-4.15/x86speculation_Add_basic_IBPB_(Indirect_Branch_Prediction_Barrier)_support.patch queue-4.15/x86speculation_Simplify_indirect_branch_prediction_barrier().patch queue-4.15/x86retpoline_Simplify_vmexit_fill_RSB().patch queue-4.15/x86cpufeatures_Clean_up_Spectre_v2_related_CPUID_flags.patch queue-4.15/KVMx86_Update_the_reverse_cpuid_list_to_include_CPUID_7_EDX.patch queue-4.15/x86retpoline_Avoid_retpolines_for_built-in___init_functions.patch

7 years, 11 months

[Linux-stable-mirror] Patch "KVM: VMX: introduce alloc_loaded_vmcs" has been added to the 4.15-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled KVM: VMX: introduce alloc_loaded_vmcs to the 4.15-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: KVM_VMX_introduce_alloc_loaded_vmcs.patch and it can be found in the queue-4.15 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. Subject: KVM: VMX: introduce alloc_loaded_vmcs From: Paolo Bonzini pbonzini(a)redhat.com Date: Thu Jan 11 12:16:15 2018 +0100 From: Paolo Bonzini pbonzini(a)redhat.com commit f21f165ef922c2146cc5bdc620f542953c41714b Group together the calls to alloc_vmcs and loaded_vmcs_init. Soon we'll also allocate an MSR bitmap there. Cc: stable(a)vger.kernel.org # prereq for Spectre mitigation Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/x86/kvm/vmx.c | 36 ++++++++++++++++++++++-------------- 1 file changed, 22 insertions(+), 14 deletions(-) --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -3829,11 +3829,6 @@ static struct vmcs *alloc_vmcs_cpu(int c return vmcs; } -static struct vmcs *alloc_vmcs(void) -{ - return alloc_vmcs_cpu(raw_smp_processor_id()); -} - static void free_vmcs(struct vmcs *vmcs) { free_pages((unsigned long)vmcs, vmcs_config.order); @@ -3852,6 +3847,22 @@ static void free_loaded_vmcs(struct load WARN_ON(loaded_vmcs->shadow_vmcs != NULL); } +static struct vmcs *alloc_vmcs(void) +{ + return alloc_vmcs_cpu(raw_smp_processor_id()); +} + +static int alloc_loaded_vmcs(struct loaded_vmcs *loaded_vmcs) +{ + loaded_vmcs->vmcs = alloc_vmcs(); + if (!loaded_vmcs->vmcs) + return -ENOMEM; + + loaded_vmcs->shadow_vmcs = NULL; + loaded_vmcs_init(loaded_vmcs); + return 0; +} + static void free_kvm_area(void) { int cpu; @@ -7145,12 +7156,11 @@ static int enter_vmx_operation(struct kv { struct vcpu_vmx *vmx = to_vmx(vcpu); struct vmcs *shadow_vmcs; + int r; - vmx->nested.vmcs02.vmcs = alloc_vmcs(); - vmx->nested.vmcs02.shadow_vmcs = NULL; - if (!vmx->nested.vmcs02.vmcs) + r = alloc_loaded_vmcs(&vmx->nested.vmcs02); + if (r < 0) goto out_vmcs02; - loaded_vmcs_init(&vmx->nested.vmcs02); if (cpu_has_vmx_msr_bitmap()) { vmx->nested.msr_bitmap = @@ -9545,13 +9555,11 @@ static struct kvm_vcpu *vmx_create_vcpu( if (!vmx->guest_msrs) goto free_pml; - vmx->loaded_vmcs = &vmx->vmcs01; - vmx->loaded_vmcs->vmcs = alloc_vmcs(); - vmx->loaded_vmcs->shadow_vmcs = NULL; - if (!vmx->loaded_vmcs->vmcs) + err = alloc_loaded_vmcs(&vmx->vmcs01); + if (err < 0) goto free_msrs; - loaded_vmcs_init(vmx->loaded_vmcs); + vmx->loaded_vmcs = &vmx->vmcs01; cpu = get_cpu(); vmx_vcpu_load(&vmx->vcpu, cpu); vmx->vcpu.cpu = cpu; Patches currently in stable-queue which might be from pbonzini(a)redhat.com are queue-4.15/x86pti_Do_not_enable_PTI_on_CPUs_which_are_not_vulnerable_to_Meltdown.patch queue-4.15/x86kvm_Update_spectre-v1_mitigation.patch queue-4.15/x86speculation_Use_Indirect_Branch_Prediction_Barrier_in_context_switch.patch queue-4.15/KVM_VMX_introduce_alloc_loaded_vmcs.patch queue-4.15/x86cpufeature_Blacklist_SPEC_CTRLPRED_CMD_on_early_Spectre_v2_microcodes.patch queue-4.15/x86cpufeatures_Add_Intel_feature_bits_for_Speculation_Control.patch queue-4.15/x86paravirt_Remove_noreplace-paravirt_cmdline_option.patch queue-4.15/KVM_VMX_Make_indirect_call_speculation_safe.patch queue-4.15/x86msr_Add_definitions_for_new_speculation_control_MSRs.patch queue-4.15/KVMVMX_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch queue-4.15/x86cpufeatures_Add_CPUID_7_EDX_CPUID_leaf.patch queue-4.15/KVM_nVMX_Eliminate_vmcs02_pool.patch queue-4.15/x86cpufeatures_Add_AMD_feature_bits_for_Speculation_Control.patch queue-4.15/KVMSVM_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch queue-4.15/KVMx86_Add_IBPB_support.patch queue-4.15/KVMVMX_Emulate_MSR_IA32_ARCH_CAPABILITIES.patch queue-4.15/x86speculation_Add_basic_IBPB_(Indirect_Branch_Prediction_Barrier)_support.patch queue-4.15/x86speculation_Simplify_indirect_branch_prediction_barrier().patch queue-4.15/KVM_VMX_make_MSR_bitmaps_per-VCPU.patch queue-4.15/KVM_x86_Make_indirect_calls_in_emulator_speculation_safe.patch queue-4.15/x86retpoline_Simplify_vmexit_fill_RSB().patch queue-4.15/x86cpufeatures_Clean_up_Spectre_v2_related_CPUID_flags.patch queue-4.15/KVMx86_Update_the_reverse_cpuid_list_to_include_CPUID_7_EDX.patch

7 years, 11 months

[Linux-stable-mirror] Patch "KVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL" has been added to the 4.15-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled KVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL to the 4.15-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: KVMSVM_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch and it can be found in the queue-4.15 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. Subject: KVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL From: KarimAllah Ahmed karahmed(a)amazon.de Date: Sat Feb 3 15:56:23 2018 +0100 From: KarimAllah Ahmed karahmed(a)amazon.de commit b2ac58f90540e39324e7a29a7ad471407ae0bf48 [ Based on a patch from Paolo Bonzini <pbonzini(a)redhat.com> ] ... basically doing exactly what we do for VMX: - Passthrough SPEC_CTRL to guests (if enabled in guest CPUID) - Save and restore SPEC_CTRL around VMExit and VMEntry only if the guest actually used it. Signed-off-by: KarimAllah Ahmed <karahmed(a)amazon.de> Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de> Reviewed-by: Darren Kenny <darren.kenny(a)oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com> Cc: Andrea Arcangeli <aarcange(a)redhat.com> Cc: Andi Kleen <ak(a)linux.intel.com> Cc: Jun Nakajima <jun.nakajima(a)intel.com> Cc: kvm(a)vger.kernel.org Cc: Dave Hansen <dave.hansen(a)intel.com> Cc: Tim Chen <tim.c.chen(a)linux.intel.com> Cc: Andy Lutomirski <luto(a)kernel.org> Cc: Asit Mallick <asit.k.mallick(a)intel.com> Cc: Arjan Van De Ven <arjan.van.de.ven(a)intel.com> Cc: Greg KH <gregkh(a)linuxfoundation.org> Cc: Paolo Bonzini <pbonzini(a)redhat.com> Cc: Dan Williams <dan.j.williams(a)intel.com> Cc: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: Ashok Raj <ashok.raj(a)intel.com> Link: https://lkml.kernel.org/r/1517669783-20732-1-git-send-email-karahmed@amazon… Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/x86/kvm/svm.c | 88 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 88 insertions(+) --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -184,6 +184,8 @@ struct vcpu_svm { u64 gs_base; } host; + u64 spec_ctrl; + u32 *msrpm; ulong nmi_iret_rip; @@ -249,6 +251,7 @@ static const struct svm_direct_access_ms { .index = MSR_CSTAR, .always = true }, { .index = MSR_SYSCALL_MASK, .always = true }, #endif + { .index = MSR_IA32_SPEC_CTRL, .always = false }, { .index = MSR_IA32_PRED_CMD, .always = false }, { .index = MSR_IA32_LASTBRANCHFROMIP, .always = false }, { .index = MSR_IA32_LASTBRANCHTOIP, .always = false }, @@ -882,6 +885,25 @@ static bool valid_msr_intercept(u32 inde return false; } +static bool msr_write_intercepted(struct kvm_vcpu *vcpu, unsigned msr) +{ + u8 bit_write; + unsigned long tmp; + u32 offset; + u32 *msrpm; + + msrpm = is_guest_mode(vcpu) ? to_svm(vcpu)->nested.msrpm: + to_svm(vcpu)->msrpm; + + offset = svm_msrpm_offset(msr); + bit_write = 2 * (msr & 0x0f) + 1; + tmp = msrpm[offset]; + + BUG_ON(offset == MSR_INVALID); + + return !!test_bit(bit_write, &tmp); +} + static void set_msr_interception(u32 *msrpm, unsigned msr, int read, int write) { @@ -1584,6 +1606,8 @@ static void svm_vcpu_reset(struct kvm_vc u32 dummy; u32 eax = 1; + svm->spec_ctrl = 0; + if (!init_event) { svm->vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE; @@ -3605,6 +3629,13 @@ static int svm_get_msr(struct kvm_vcpu * case MSR_VM_CR: msr_info->data = svm->nested.vm_cr_msr; break; + case MSR_IA32_SPEC_CTRL: + if (!msr_info->host_initiated && + !guest_cpuid_has(vcpu, X86_FEATURE_IBRS)) + return 1; + + msr_info->data = svm->spec_ctrl; + break; case MSR_IA32_UCODE_REV: msr_info->data = 0x01000065; break; @@ -3696,6 +3727,33 @@ static int svm_set_msr(struct kvm_vcpu * case MSR_IA32_TSC: kvm_write_tsc(vcpu, msr); break; + case MSR_IA32_SPEC_CTRL: + if (!msr->host_initiated && + !guest_cpuid_has(vcpu, X86_FEATURE_IBRS)) + return 1; + + /* The STIBP bit doesn't fault even if it's not advertised */ + if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP)) + return 1; + + svm->spec_ctrl = data; + + if (!data) + break; + + /* + * For non-nested: + * When it's written (to non-zero) for the first time, pass + * it through. + * + * For nested: + * The handling of the MSR bitmap for L2 guests is done in + * nested_svm_vmrun_msrpm. + * We update the L1 MSR bit as well since it will end up + * touching the MSR anyway now. + */ + set_msr_interception(svm->msrpm, MSR_IA32_SPEC_CTRL, 1, 1); + break; case MSR_IA32_PRED_CMD: if (!msr->host_initiated && !guest_cpuid_has(vcpu, X86_FEATURE_IBPB)) @@ -4964,6 +5022,15 @@ static void svm_vcpu_run(struct kvm_vcpu local_irq_enable(); + /* + * If this vCPU has touched SPEC_CTRL, restore the guest's value if + * it's non-zero. Since vmentry is serialising on affected CPUs, there + * is no need to worry about the conditional branch over the wrmsr + * being speculatively taken. + */ + if (svm->spec_ctrl) + wrmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl); + asm volatile ( "push %%" _ASM_BP "; \n\t" "mov %c[rbx](%[svm]), %%" _ASM_BX " \n\t" @@ -5056,6 +5123,27 @@ static void svm_vcpu_run(struct kvm_vcpu #endif ); + /* + * We do not use IBRS in the kernel. If this vCPU has used the + * SPEC_CTRL MSR it may have left it on; save the value and + * turn it off. This is much more efficient than blindly adding + * it to the atomic save/restore list. Especially as the former + * (Saving guest MSRs on vmexit) doesn't even exist in KVM. + * + * For non-nested case: + * If the L01 MSR bitmap does not intercept the MSR, then we need to + * save it. + * + * For nested case: + * If the L02 MSR bitmap does not intercept the MSR, then we need to + * save it. + */ + if (!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL)) + rdmsrl(MSR_IA32_SPEC_CTRL, svm->spec_ctrl); + + if (svm->spec_ctrl) + wrmsrl(MSR_IA32_SPEC_CTRL, 0); + /* Eliminate branch target predictions from guest mode */ vmexit_fill_RSB(); Patches currently in stable-queue which might be from pbonzini(a)redhat.com are queue-4.15/x86pti_Do_not_enable_PTI_on_CPUs_which_are_not_vulnerable_to_Meltdown.patch queue-4.15/x86kvm_Update_spectre-v1_mitigation.patch queue-4.15/x86speculation_Use_Indirect_Branch_Prediction_Barrier_in_context_switch.patch queue-4.15/KVM_VMX_introduce_alloc_loaded_vmcs.patch queue-4.15/x86cpufeature_Blacklist_SPEC_CTRLPRED_CMD_on_early_Spectre_v2_microcodes.patch queue-4.15/x86cpufeatures_Add_Intel_feature_bits_for_Speculation_Control.patch queue-4.15/x86paravirt_Remove_noreplace-paravirt_cmdline_option.patch queue-4.15/KVM_VMX_Make_indirect_call_speculation_safe.patch queue-4.15/x86msr_Add_definitions_for_new_speculation_control_MSRs.patch queue-4.15/KVMVMX_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch queue-4.15/x86cpufeatures_Add_CPUID_7_EDX_CPUID_leaf.patch queue-4.15/KVM_nVMX_Eliminate_vmcs02_pool.patch queue-4.15/x86cpufeatures_Add_AMD_feature_bits_for_Speculation_Control.patch queue-4.15/KVMSVM_Allow_direct_access_to_MSR_IA32_SPEC_CTRL.patch queue-4.15/KVMx86_Add_IBPB_support.patch queue-4.15/KVMVMX_Emulate_MSR_IA32_ARCH_CAPABILITIES.patch queue-4.15/x86speculation_Add_basic_IBPB_(Indirect_Branch_Prediction_Barrier)_support.patch queue-4.15/x86speculation_Simplify_indirect_branch_prediction_barrier().patch queue-4.15/KVM_VMX_make_MSR_bitmaps_per-VCPU.patch queue-4.15/KVM_x86_Make_indirect_calls_in_emulator_speculation_safe.patch queue-4.15/x86retpoline_Simplify_vmexit_fill_RSB().patch queue-4.15/x86cpufeatures_Clean_up_Spectre_v2_related_CPUID_flags.patch queue-4.15/KVMx86_Update_the_reverse_cpuid_list_to_include_CPUID_7_EDX.patch

7 years, 11 months

[Linux-stable-mirror] Patch "Documentation: Document array_index_nospec" has been added to the 4.15-stable tree

by gregkh＠linuxfoundation.org

This is a note to let you know that I've just added the patch titled Documentation: Document array_index_nospec to the 4.15-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… The filename of the patch is: Documentation_Document_array_index_nospec.patch and it can be found in the queue-4.15 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable(a)vger.kernel.org> know about it. Subject: Documentation: Document array_index_nospec From: Mark Rutland mark.rutland(a)arm.com Date: Mon Jan 29 17:02:16 2018 -0800 From: Mark Rutland mark.rutland(a)arm.com commit f84a56f73dddaeac1dba8045b007f742f61cd2da Document the rationale and usage of the new array_index_nospec() helper. Signed-off-by: Mark Rutland <mark.rutland(a)arm.com> Signed-off-by: Will Deacon <will.deacon(a)arm.com> Signed-off-by: Dan Williams <dan.j.williams(a)intel.com> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de> Reviewed-by: Kees Cook <keescook(a)chromium.org> Cc: linux-arch(a)vger.kernel.org Cc: Jonathan Corbet <corbet(a)lwn.net> Cc: Peter Zijlstra <peterz(a)infradead.org> Cc: gregkh(a)linuxfoundation.org Cc: kernel-hardening(a)lists.openwall.com Cc: torvalds(a)linux-foundation.org Cc: alan(a)linux.intel.com Link: https://lkml.kernel.org/r/151727413645.33451.15878817161436755393.stgit@dwi… Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- Documentation/speculation.txt | 90 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 90 insertions(+) --- /dev/null +++ b/Documentation/speculation.txt @@ -0,0 +1,90 @@ +This document explains potential effects of speculation, and how undesirable +effects can be mitigated portably using common APIs. + +=========== +Speculation +=========== + +To improve performance and minimize average latencies, many contemporary CPUs +employ speculative execution techniques such as branch prediction, performing +work which may be discarded at a later stage. + +Typically speculative execution cannot be observed from architectural state, +such as the contents of registers. However, in some cases it is possible to +observe its impact on microarchitectural state, such as the presence or +absence of data in caches. Such state may form side-channels which can be +observed to extract secret information. + +For example, in the presence of branch prediction, it is possible for bounds +checks to be ignored by code which is speculatively executed. Consider the +following code: + + int load_array(int *array, unsigned int index) + { + if (index >= MAX_ARRAY_ELEMS) + return 0; + else + return array[index]; + } + +Which, on arm64, may be compiled to an assembly sequence such as: + + CMP <index>, #MAX_ARRAY_ELEMS + B.LT less + MOV <returnval>, #0 + RET + less: + LDR <returnval>, [<array>, <index>] + RET + +It is possible that a CPU mis-predicts the conditional branch, and +speculatively loads array[index], even if index >= MAX_ARRAY_ELEMS. This +value will subsequently be discarded, but the speculated load may affect +microarchitectural state which can be subsequently measured. + +More complex sequences involving multiple dependent memory accesses may +result in sensitive information being leaked. Consider the following +code, building on the prior example: + + int load_dependent_arrays(int *arr1, int *arr2, int index) + { + int val1, val2, + + val1 = load_array(arr1, index); + val2 = load_array(arr2, val1); + + return val2; + } + +Under speculation, the first call to load_array() may return the value +of an out-of-bounds address, while the second call will influence +microarchitectural state dependent on this value. This may provide an +arbitrary read primitive. + +==================================== +Mitigating speculation side-channels +==================================== + +The kernel provides a generic API to ensure that bounds checks are +respected even under speculation. Architectures which are affected by +speculation-based side-channels are expected to implement these +primitives. + +The array_index_nospec() helper in <linux/nospec.h> can be used to +prevent information from being leaked via side-channels. + +A call to array_index_nospec(index, size) returns a sanitized index +value that is bounded to [0, size) even under cpu speculation +conditions. + +This can be used to protect the earlier load_array() example: + + int load_array(int *array, unsigned int index) + { + if (index >= MAX_ARRAY_ELEMS) + return 0; + else { + index = array_index_nospec(index, MAX_ARRAY_ELEMS); + return array[index]; + } + } Patches currently in stable-queue which might be from mark.rutland(a)arm.com are queue-4.15/Documentation_Document_array_index_nospec.patch

7 years, 11 months

Jump to page:

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror February 2018