Hi Sean,
Add a new feature flag, X86_FEATURE_MSR_IA32_FEAT_CTL, to track whether IA32_FEAT_CTL has been initialized. This will allow KVM, and any future subsystems that depend on IA32_FEAT_CTL, to rely purely on cpufeatures to query platform support, e.g. allows a future patch to remove KVM's manual IA32_FEAT_CTL MSR checks.
Various features (on platforms that support IA32_FEAT_CTL) are dependent on IA32_FEAT_CTL being configured and locked, e.g. VMX and LMCE. The MSR is always configured during boot, but only if the CPU vendor is recognized by the kernel. Because CPUID doesn't incorporate the current IA32_FEAT_CTL value in its reporting of relevant features, it's possible for a feature to be reported as supported in cpufeatures but not truly enabled, e.g. if the CPU supports VMX but the kernel doesn't recognize the CPU.
As a result, without the flag, KVM would see VMX as supported even if IA32_FEAT_CTL hasn't been initialized, and so would need to manually read the MSR and check the various enabling bits to avoid taking an unexpected #GP on VMXON.
I recently ran into a general protection fault that I believe is the fault of this patch:
[ 32.189584] general protection fault, maybe for address 0xffffb567801bcf58: 0000 [#1] SMP PTI [ 32.198103] CPU: 1 PID: 2600 Comm: rngd Not tainted 5.6.0-rc2-jk+ #2 [ 32.204454] Hardware name: Intel Corporation S2600STQ/S2600STQ, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019 [ 32.214887] RIP: 0010:hardware_enable+0x100/0x1a0 [kvm_intel] [ 32.220628] Code: 00 00 48 39 f8 74 0f 65 48 89 3d 43 a2 cb 3c e8 66 d3 cc c5 66 90 48 89 df 57 9d 0f 1f 44 00 00 bf 01 00 00 00 e8 90 3d ca c5 <f3> 0f c7 34 24 31 c0 80 3d 59 8d 03 00 00 75 36 48 8b 5c 24 10 65 [ 32.239373] RSP: 0000:ffffb567801bcf58 EFLAGS: 00010002 [ 32.244598] RAX: 0000000000300000 RBX: 0000000000000086 RCX: ffff8f2650440000 [ 32.251730] RDX: 0000000000300000 RSI: 0000000000000000 RDI: ffff8f2650457020 [ 32.258862] RBP: 0000000000000007 R08: 000000077ea5d531 R09: 0000000000000000 [ 32.265986] R10: 000001432bf20982 R11: 0000000000000000 R12: ffffd55b80467110 [ 32.273118] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 32.280243] FS: 00007facfe66f700(0000) GS:ffff8f2650440000(0000) knlGS:0000000000000000 [ 32.288329] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 32.294077] CR2: 00007facf0003000 CR3: 0000000b7d402006 CR4: 00000000007626e0 [ 32.301210] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 32.308342] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 32.315474] PKRU: 55555554 [ 32.318186] Call Trace: [ 32.320642] <IRQ> [ 32.322689] kvm_arch_hardware_enable+0x84/0x240 [kvm] [ 32.327836] hardware_enable_nolock+0x31/0x60 [kvm] [ 32.332717] flush_smp_call_function_queue+0x4d/0xe0 [ 32.337683] smp_call_function_interrupt+0x3a/0xd0 [ 32.342471] call_function_interrupt+0xf/0x20 [ 32.346830] </IRQ> [ 32.348935] RIP: 0033:0x7facffd4c753 [ 32.352514] Code: e8 48 c7 45 e0 00 00 00 00 eb 5f 48 8b 45 c8 48 8b 50 38 48 8b 45 c8 8b 40 40 89 c0 48 01 d0 48 89 45 f0 48 8b 45 f0 0f b6 00 <83> c0 01 89 c2 48 8b 45 f0 88 10 48 8b 45 c8 8b 50 40 48 8b 45 c8 [ 32.371263] RSP: 002b:00007facfe66ebf0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff03 [ 32.378826] RAX: 00000000000000ee RBX: 0000000000004097 RCX: 0000000000000000 [ 32.385961] RDX: 0000562781dbadf0 RSI: 0000000000000000 RDI: 00007ffd7edf9080 [ 32.393092] RBP: 00007facfe66ec30 R08: 00007ffd7edf9080 R09: 000000000000cd4a [ 32.400226] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 [ 32.407358] R13: 00007facf0000b20 R14: 0000562781dba2e8 R15: 00007facfe66ed10 [ 32.414493] Modules linked in: ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter rfkill ib_isert iscsi _target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib vfat fat ib_umad rpcrdma sunrpc intel_rapl_msr intel_rapl_common rdma_ucm ib_iser rdma_cm isst_if_common iw_cm ib_cm libiscsi skx_edac scsi_transport_iscsi nfit libnv dimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate i40iw qat_c62x iTCO_wdt ipmi_ssif iTCO_vendor_support ib_uverbs mei_me intel_qat intel_uncore ib_c ore joydev intel_rapl_perf pcspkr ipmi_si authenc ioatdma mei i2c_i801 lpc_ich dca ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad ip_tables ast i2c_algo_bit drm_vram_helper drm_ttm_helper ttm drm_kms_helper cec drm ice i40e crc32 c_intel wmi fuse [ 32.498314] ---[ end trace bfeeeba337a01208 ]---
I noticed that a slightly older commit from before this does not fail. Additionally, the system reports the following during boot:
kvm: disabled by bios
I looked into the vmx_disabled_by_bios and noticed that it checks for both X86_FEATURE_MSR_IA32_FEAT_CTL and X86_FEATURE_VMX.
Compared to the older code before commit a4d0b2fdbcf7 ("KVM: VMX: Use VMX feature flag to query BIOS enabling") it's not clear to me how exactly this could fail to match up.
I suspect something is wrong and the features are enabled even though the BIOS has it disabled, leading to later failure because of this.
Thanks, Jake