Supersedes: 20240130180400.1698136-1-pbonzini@redhat.com
MKTME repurposes the high bit of physical address to key id for encryption key and, even though MAXPHYADDR in CPUID[0x80000008] remains the same, the valid bits in the MTRR mask register are based on the reduced number of physical address bits. This breaks boot on machines that have TME enabled and do something to cleanup MTRRs, unless "disable_mtrr_cleanup" is passed on the command line. The fix is to move the check to early CPU initialization, which runs before Linux sets up MTRRs.
However, as noticed by Kirill, the patch I sent as v1 actually works only until Linux 6.6. In Linux 6.7, commit fbf6449f84bf ("x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach") reorganized the initialization of c->x86_phys_bits in a way that broke the patch. But even in 6.7 AMD processors, which did try to reduce it in this_cpu->c_early_init(c), had their x86_phys_bits value overwritten by get_cpu_address_sizes(), so that early_identify_cpu() left the wrong value in x86_phys_bits. This probably went unnoticed because on AMD processors you need not apply the reduced MAXPHYADDR to MTRR masks.
Therefore, this v2 prepends the fix for this issue in commit fbf6449f84bf. Apologies for the oversight.
Tested on an AMD Epyc machine (where I resorted to dumping mtrr_state) and on the problematic Intel Emerald Rapids machine.
Thanks,
Paolo
Paolo Bonzini (2): x86/cpu: allow reducing x86_phys_bits during early_identify_cpu() x86/cpu/intel: Detect TME keyid bits before setting MTRR mask registers
arch/x86/kernel/cpu/common.c | 4 +- arch/x86/kernel/cpu/intel.c | 178 ++++++++++++++++++----------------- 2 files changed, 93 insertions(+), 89 deletions(-)