Supersedes: 20240130180400.1698136-1-pbonzini@redhat.com
MKTME repurposes the high bits of the physical address as a key ID for the encryption key and, even though MAXPHYADDR in CPUID[0x80000008] remains the same, the valid bits in the MTRR mask registers are based on the reduced number of physical address bits. This breaks boot on machines that have TME enabled and do anything to clean up the MTRRs, unless "disable_mtrr_cleanup" is passed on the command line. The fix is to move the TME detection to early CPU initialization, which runs before Linux sets up MTRRs.
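As a back-of-the-envelope illustration, here is a minimal user-space sketch (not kernel code) of the mask arithmetic involved. mtrr_physmask() is a made-up helper, and the 6 keyid bits are only an assumed example configuration; the two printed values match the failing and working mask values quoted in patch 2.

#include <stdint.h>
#include <stdio.h>

#define MTRR_PHYSMASK_V	(1ULL << 11)	/* "valid" bit in IA32_MTRR_PHYSMASKn */

/* Hypothetical helper: mask for a power-of-two range, given usable bits. */
static uint64_t mtrr_physmask(unsigned int phys_bits, uint64_t range_size)
{
	/* Bits (phys_bits-1):N set for a 2^N-byte range, plus the V bit. */
	return (((1ULL << phys_bits) - 1) & ~(range_size - 1)) | MTRR_PHYSMASK_V;
}

int main(void)
{
	uint64_t size = 1ULL << 31;	/* a 2 GiB variable MTRR range */

	/* 52 usable bits: prints 0x000fffff80000800, the failing value */
	printf("%#018llx\n", (unsigned long long)mtrr_physmask(52, size));
	/* 6 keyid bits subtracted: 0x00003fff80000800, the working value */
	printf("%#018llx\n", (unsigned long long)mtrr_physmask(52 - 6, size));
	return 0;
}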
However, as noticed by Kirill, the patch I sent as v1 actually works only up to Linux 6.6. In Linux 6.7, commit fbf6449f84bf ("x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach") reorganized the initialization of c->x86_phys_bits in a way that broke the patch. But even in 6.7, AMD processors, which did try to reduce the value in this_cpu->c_early_init(c), had their x86_phys_bits overwritten by get_cpu_address_sizes(), so that early_identify_cpu() left the wrong value in x86_phys_bits. This probably went unnoticed because on AMD processors the reduced MAXPHYADDR need not be applied to the MTRR masks.
Therefore, this v2 prepends a fix for the issue introduced by commit fbf6449f84bf. Apologies for the oversight.
Tested on an AMD EPYC machine (where I resorted to dumping mtrr_state) and on the problematic Intel Emerald Rapids machine.
Thanks,
Paolo
Paolo Bonzini (2):
  x86/cpu: allow reducing x86_phys_bits during early_identify_cpu()
  x86/cpu/intel: Detect TME keyid bits before setting MTRR mask registers

 arch/x86/kernel/cpu/common.c |   4 +-
 arch/x86/kernel/cpu/intel.c  | 178 ++++++++++++++++++-----------------
 2 files changed, 93 insertions(+), 89 deletions(-)
In commit fbf6449f84bf ("x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach"), the initialization of c->x86_phys_bits was moved after this_cpu->c_early_init(c). This is incorrect because early_init_amd() expected to be able to reduce the value according to the contents of CPUID leaf 0x8000001f.
Fortunately, the bug was negated by init_amd()'s call to early_init_amd(), which does reduce x86_phys_bits in the end. However, this is very late in the boot process and, most notably, the wrong value is used for x86_phys_bits when setting up MTRRs.
To fix this, call get_cpu_address_sizes() as soon as X86_FEATURE_CPUID is set/cleared and c->extended_cpuid_level is retrieved.
Fixes: fbf6449f84bf ("x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach")
Cc: Adam Dunlap <acdunlap@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Xiaoyao Li <xiaoyao.li@intel.com>
Cc: Kai Huang <kai.huang@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: x86@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kernel/cpu/common.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 0b97bcde70c6..fbc4e60d027c 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1589,6 +1589,7 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
 		get_cpu_vendor(c);
 		get_cpu_cap(c);
 		setup_force_cpu_cap(X86_FEATURE_CPUID);
+		get_cpu_address_sizes(c);
 		cpu_parse_early_param();
 
 		if (this_cpu->c_early_init)
@@ -1601,10 +1602,9 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
 			this_cpu->c_bsp_init(c);
 	} else {
 		setup_clear_cpu_cap(X86_FEATURE_CPUID);
+		get_cpu_address_sizes(c);
 	}
 
-	get_cpu_address_sizes(c);
-
 	setup_force_cpu_cap(X86_FEATURE_ALWAYS);
 
 	cpu_set_bug_bits(c);
On Thu, Feb 01, 2024 at 12:09:01AM +0100, Paolo Bonzini wrote:
In commit fbf6449f84bf ("x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach"), the initialization of c->x86_phys_bits was moved after this_cpu->c_early_init(c). This is incorrect because early_init_amd() expected to be able to reduce the value according to the contents of CPUID leaf 0x8000001f.
Fortunately, the bug was negated by init_amd()'s call to early_init_amd(), which does reduce x86_phys_bits in the end. However, this is very late in the boot process and, most notably, the wrong value is used for x86_phys_bits when setting up MTRRs.
To fix this, call get_cpu_address_sizes() as soon as X86_FEATURE_CPUID is set/cleared and c->extended_cpuid_level is retrieved.
Fixes: fbf6449f84bf ("x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach")
Cc: Adam Dunlap <acdunlap@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Xiaoyao Li <xiaoyao.li@intel.com>
Cc: Kai Huang <kai.huang@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: x86@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
On 1/02/2024 7:09 am, Paolo Bonzini wrote:
In commit fbf6449f84bf ("x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach"), the initialization of c->x86_phys_bits was moved after this_cpu->c_early_init(c). This is incorrect because early_init_amd() expected to be able to reduce the value according to the contents of CPUID leaf 0x8000001f.
Fortunately, the bug was negated by init_amd()'s call to early_init_amd(), which does reduce x86_phys_bits in the end. However, this is very late in the boot process and, most notably, the wrong value is used for x86_phys_bits when setting up MTRRs.
To fix this, call get_cpu_address_sizes() as soon as X86_FEATURE_CPUID is set/cleared and c->extended_cpuid_level is retrieved.
Fixes: fbf6449f84bf ("x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach")
Cc: Adam Dunlap <acdunlap@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Xiaoyao Li <xiaoyao.li@intel.com>
Cc: Kai Huang <kai.huang@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: x86@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Kai Huang <kai.huang@intel.com>
MKTME repurposes the high bits of the physical address as a key ID for the encryption key and, even though MAXPHYADDR in CPUID[0x80000008] remains the same, the valid bits in the MTRR mask registers are based on the reduced number of physical address bits.
detect_tme() in arch/x86/kernel/cpu/intel.c detects TME and subtracts the keyid bits from the total usable physical bits, but it is called too late. Move the call to early_init_intel() so that it runs in setup_arch(), before the MTRRs are set up.
This fixes boot on TDX-enabled systems, which until now only worked with "disable_mtrr_cleanup". Without the patch, the values written to the MTRR mask registers were 52 bits wide (e.g. 0x000fffff_80000800) and the writes failed; with the patch, the values are 46 bits wide, which matches the reduced MAXPHYADDR shown in /proc/cpuinfo.
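For context, the ordering on the BSP that this depends on looks roughly like the sketch below; this is a simplified outline of arch/x86/kernel/setup.c, not the literal code, and only the two calls that matter here are shown.

void __init setup_arch(char **cmdline_p)
{
	/*
	 * -> early_identify_cpu() -> early_init_intel()
	 * -> detect_tme_early(), which shrinks c->x86_phys_bits
	 *    by the number of keyid bits.
	 */
	early_cpu_init();

	/* ...much later... */

	/*
	 * MTRR setup/cleanup computes the mask registers from
	 * x86_phys_bits, which by now has already been reduced.
	 */
	mtrr_bp_init();
}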
Reported-by: Zixi Chen <zixchen@redhat.com>
Cc: Adam Dunlap <acdunlap@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Xiaoyao Li <xiaoyao.li@intel.com>
Cc: Kai Huang <kai.huang@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: x86@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kernel/cpu/intel.c | 178 ++++++++++++++++++------------------
 1 file changed, 91 insertions(+), 87 deletions(-)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index a927a8fc9624..40dec9b56f87 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -184,6 +184,90 @@ static bool bad_spectre_microcode(struct cpuinfo_x86 *c)
 	return false;
 }
 
+#define MSR_IA32_TME_ACTIVATE		0x982
+
+/* Helpers to access TME_ACTIVATE MSR */
+#define TME_ACTIVATE_LOCKED(x)		(x & 0x1)
+#define TME_ACTIVATE_ENABLED(x)		(x & 0x2)
+
+#define TME_ACTIVATE_POLICY(x)		((x >> 4) & 0xf)	/* Bits 7:4 */
+#define TME_ACTIVATE_POLICY_AES_XTS_128	0
+
+#define TME_ACTIVATE_KEYID_BITS(x)	((x >> 32) & 0xf)	/* Bits 35:32 */
+
+#define TME_ACTIVATE_CRYPTO_ALGS(x)	((x >> 48) & 0xffff)	/* Bits 63:48 */
+#define TME_ACTIVATE_CRYPTO_AES_XTS_128	1
+
+/* Values for mktme_status (SW only construct) */
+#define MKTME_ENABLED			0
+#define MKTME_DISABLED			1
+#define MKTME_UNINITIALIZED		2
+static int mktme_status = MKTME_UNINITIALIZED;
+
+static void detect_tme_early(struct cpuinfo_x86 *c)
+{
+	u64 tme_activate, tme_policy, tme_crypto_algs;
+	int keyid_bits = 0, nr_keyids = 0;
+	static u64 tme_activate_cpu0 = 0;
+
+	rdmsrl(MSR_IA32_TME_ACTIVATE, tme_activate);
+
+	if (mktme_status != MKTME_UNINITIALIZED) {
+		if (tme_activate != tme_activate_cpu0) {
+			/* Broken BIOS? */
+			pr_err_once("x86/tme: configuration is inconsistent between CPUs\n");
+			pr_err_once("x86/tme: MKTME is not usable\n");
+			mktme_status = MKTME_DISABLED;
+
+			/* Proceed. We may need to exclude bits from x86_phys_bits. */
+		}
+	} else {
+		tme_activate_cpu0 = tme_activate;
+	}
+
+	if (!TME_ACTIVATE_LOCKED(tme_activate) || !TME_ACTIVATE_ENABLED(tme_activate)) {
+		pr_info_once("x86/tme: not enabled by BIOS\n");
+		mktme_status = MKTME_DISABLED;
+		return;
+	}
+
+	if (mktme_status != MKTME_UNINITIALIZED)
+		goto detect_keyid_bits;
+
+	pr_info("x86/tme: enabled by BIOS\n");
+
+	tme_policy = TME_ACTIVATE_POLICY(tme_activate);
+	if (tme_policy != TME_ACTIVATE_POLICY_AES_XTS_128)
+		pr_warn("x86/tme: Unknown policy is active: %#llx\n", tme_policy);
+
+	tme_crypto_algs = TME_ACTIVATE_CRYPTO_ALGS(tme_activate);
+	if (!(tme_crypto_algs & TME_ACTIVATE_CRYPTO_AES_XTS_128)) {
+		pr_err("x86/mktme: No known encryption algorithm is supported: %#llx\n",
+		       tme_crypto_algs);
+		mktme_status = MKTME_DISABLED;
+	}
+detect_keyid_bits:
+	keyid_bits = TME_ACTIVATE_KEYID_BITS(tme_activate);
+	nr_keyids = (1UL << keyid_bits) - 1;
+	if (nr_keyids) {
+		pr_info_once("x86/mktme: enabled by BIOS\n");
+		pr_info_once("x86/mktme: %d KeyIDs available\n", nr_keyids);
+	} else {
+		pr_info_once("x86/mktme: disabled by BIOS\n");
+	}
+
+	if (mktme_status == MKTME_UNINITIALIZED) {
+		/* MKTME is usable */
+		mktme_status = MKTME_ENABLED;
+	}
+
+	/*
+	 * KeyID bits effectively lower the number of physical address
+	 * bits.  Update cpuinfo_x86::x86_phys_bits accordingly.
+	 */
+	c->x86_phys_bits -= keyid_bits;
+}
+
 static void early_init_intel(struct cpuinfo_x86 *c)
 {
 	u64 misc_enable;
@@ -322,6 +406,13 @@ static void early_init_intel(struct cpuinfo_x86 *c)
 	 */
 	if (detect_extended_topology_early(c) < 0)
 		detect_ht_early(c);
+
+	/*
+	 * Adjust the number of physical bits early because it affects the
+	 * valid bits of the MTRR mask registers.
+	 */
+	if (cpu_has(c, X86_FEATURE_TME))
+		detect_tme_early(c);
 }
 
 static void bsp_init_intel(struct cpuinfo_x86 *c)
@@ -482,90 +573,6 @@ static void srat_detect_node(struct cpuinfo_x86 *c)
 #endif
 }
 
-#define MSR_IA32_TME_ACTIVATE		0x982
-
-/* Helpers to access TME_ACTIVATE MSR */
-#define TME_ACTIVATE_LOCKED(x)		(x & 0x1)
-#define TME_ACTIVATE_ENABLED(x)		(x & 0x2)
-
-#define TME_ACTIVATE_POLICY(x)		((x >> 4) & 0xf)	/* Bits 7:4 */
-#define TME_ACTIVATE_POLICY_AES_XTS_128	0
-
-#define TME_ACTIVATE_KEYID_BITS(x)	((x >> 32) & 0xf)	/* Bits 35:32 */
-
-#define TME_ACTIVATE_CRYPTO_ALGS(x)	((x >> 48) & 0xffff)	/* Bits 63:48 */
-#define TME_ACTIVATE_CRYPTO_AES_XTS_128	1
-
-/* Values for mktme_status (SW only construct) */
-#define MKTME_ENABLED			0
-#define MKTME_DISABLED			1
-#define MKTME_UNINITIALIZED		2
-static int mktme_status = MKTME_UNINITIALIZED;
-
-static void detect_tme(struct cpuinfo_x86 *c)
-{
-	u64 tme_activate, tme_policy, tme_crypto_algs;
-	int keyid_bits = 0, nr_keyids = 0;
-	static u64 tme_activate_cpu0 = 0;
-
-	rdmsrl(MSR_IA32_TME_ACTIVATE, tme_activate);
-
-	if (mktme_status != MKTME_UNINITIALIZED) {
-		if (tme_activate != tme_activate_cpu0) {
-			/* Broken BIOS? */
-			pr_err_once("x86/tme: configuration is inconsistent between CPUs\n");
-			pr_err_once("x86/tme: MKTME is not usable\n");
-			mktme_status = MKTME_DISABLED;
-
-			/* Proceed. We may need to exclude bits from x86_phys_bits. */
-		}
-	} else {
-		tme_activate_cpu0 = tme_activate;
-	}
-
-	if (!TME_ACTIVATE_LOCKED(tme_activate) || !TME_ACTIVATE_ENABLED(tme_activate)) {
-		pr_info_once("x86/tme: not enabled by BIOS\n");
-		mktme_status = MKTME_DISABLED;
-		return;
-	}
-
-	if (mktme_status != MKTME_UNINITIALIZED)
-		goto detect_keyid_bits;
-
-	pr_info("x86/tme: enabled by BIOS\n");
-
-	tme_policy = TME_ACTIVATE_POLICY(tme_activate);
-	if (tme_policy != TME_ACTIVATE_POLICY_AES_XTS_128)
-		pr_warn("x86/tme: Unknown policy is active: %#llx\n", tme_policy);
-
-	tme_crypto_algs = TME_ACTIVATE_CRYPTO_ALGS(tme_activate);
-	if (!(tme_crypto_algs & TME_ACTIVATE_CRYPTO_AES_XTS_128)) {
-		pr_err("x86/mktme: No known encryption algorithm is supported: %#llx\n",
-		       tme_crypto_algs);
-		mktme_status = MKTME_DISABLED;
-	}
-detect_keyid_bits:
-	keyid_bits = TME_ACTIVATE_KEYID_BITS(tme_activate);
-	nr_keyids = (1UL << keyid_bits) - 1;
-	if (nr_keyids) {
-		pr_info_once("x86/mktme: enabled by BIOS\n");
-		pr_info_once("x86/mktme: %d KeyIDs available\n", nr_keyids);
-	} else {
-		pr_info_once("x86/mktme: disabled by BIOS\n");
-	}
-
-	if (mktme_status == MKTME_UNINITIALIZED) {
-		/* MKTME is usable */
-		mktme_status = MKTME_ENABLED;
-	}
-
-	/*
-	 * KeyID bits effectively lower the number of physical address
-	 * bits.  Update cpuinfo_x86::x86_phys_bits accordingly.
-	 */
-	c->x86_phys_bits -= keyid_bits;
-}
-
 static void init_cpuid_fault(struct cpuinfo_x86 *c)
 {
 	u64 msr;
@@ -702,9 +709,6 @@ static void init_intel(struct cpuinfo_x86 *c)
 
 	init_ia32_feat_ctl(c);
 
-	if (cpu_has(c, X86_FEATURE_TME))
-		detect_tme(c);
-
 	init_intel_misc_features(c);
 
 	split_lock_init();
On Thu, Feb 01, 2024 at 12:09:02AM +0100, Paolo Bonzini wrote:
MKTME repurposes the high bits of the physical address as a key ID for the encryption key and, even though MAXPHYADDR in CPUID[0x80000008] remains the same, the valid bits in the MTRR mask registers are based on the reduced number of physical address bits.
detect_tme() in arch/x86/kernel/cpu/intel.c detects TME and subtracts the keyid bits from the total usable physical bits, but it is called too late. Move the call to early_init_intel() so that it runs in setup_arch(), before the MTRRs are set up.
This fixes boot on TDX-enabled systems, which until now only worked with "disable_mtrr_cleanup". Without the patch, the values written to the MTRR mask registers were 52 bits wide (e.g. 0x000fffff_80000800) and the writes failed; with the patch, the values are 46 bits wide, which matches the reduced MAXPHYADDR shown in /proc/cpuinfo.
Reported-by: Zixi Chen <zixchen@redhat.com>
Cc: Adam Dunlap <acdunlap@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Xiaoyao Li <xiaoyao.li@intel.com>
Cc: Kai Huang <kai.huang@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: x86@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
 static void early_init_intel(struct cpuinfo_x86 *c)
 {
 	u64 misc_enable;
@@ -322,6 +406,13 @@ static void early_init_intel(struct cpuinfo_x86 *c)
 	 */
 	if (detect_extended_topology_early(c) < 0)
 		detect_ht_early(c);
+
+	/*
+	 * Adjust the number of physical bits early because it affects the
+	 * valid bits of the MTRR mask registers.
+	 */
+	if (cpu_has(c, X86_FEATURE_TME))
+		detect_tme_early(c);
 }
early_init_intel() is also called by init_intel(), so IIUC detect_tme_early() will be called twice for the BSP.
But this does no harm AFAICT, so:
Acked-by: Kai Huang <kai.huang@intel.com>
On 1/31/24 15:09, Paolo Bonzini wrote:
However, as noticed by Kirill, the patch I sent as v1 actually works only up to Linux 6.6. In Linux 6.7, commit fbf6449f84bf ("x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach") reorganized the initialization of c->x86_phys_bits in a way that broke the patch. But even in 6.7, AMD processors, which did try to reduce the value in this_cpu->c_early_init(c), had their x86_phys_bits overwritten by get_cpu_address_sizes(), so that early_identify_cpu() left the wrong value in x86_phys_bits. This probably went unnoticed because on AMD processors the reduced MAXPHYADDR need not be applied to the MTRR masks.
I really wanted get_cpu_address_sizes() to be the one and only spot where c->x86_phys_bits is established. That way, we don't get a bunch of code all over the place tweaking it and fighting for who "wins".
We're not there yet, but the approach in this patch moves it back in the wrong direction because it permits the random tweaking of c->x86_phys_bits.
Could we instead do something more like the (completely untested) attached patch?
BTW, I'm pretty sure the WARN_ON() in the patch won't actually work, but it'd be nice to work toward a point when it does.
On Thu, Feb 1, 2024 at 7:29 PM Dave Hansen dave.hansen@intel.com wrote:
I really wanted get_cpu_address_sizes() to be the one and only spot where c->x86_phys_bits is established. That way, we don't get a bunch of code all over the place tweaking it and fighting for who "wins". We're not there yet, but the approach in this patch moves it back in the wrong direction because it permits the random tweaking of c->x86_phys_bits.
I see your point; one of my earlier attempts added a ->c_detect_mem_encrypt() callback that basically amounted to either amd_detect_mem_encrypt() or detect_tme(). I bailed out of it mostly because these functions do more than adjusting phys_bits, and it seemed arbitrary to call them from get_cpu_address_sizes(). The two approaches share the idea of centralizing the computation of x86_phys_bits in get_cpu_address_sizes().
There is unfortunately an important hurdle for your patch, in that currently the BSP and AP flows are completely different. For the BSP the flow is ->c_early_init(), then get_cpu_address_sizes(), then again ->c_early_init() called by ->c_init(), then ->c_init(). For APs it is get_cpu_address_sizes(), then ->c_early_init() called by ->c_init(), then the rest of ->c_init(). And let's not even look at ->c_identify().
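To make the ordering concrete, here is a compressed pseudo-C outline of the two flows just described; this is my reading of common.c in 6.7/6.8, not the literal code, and the bodies are heavily elided so that only the ordering shows:

/* BSP, from setup_arch() -> early_cpu_init(): */
static void __init early_identify_cpu(struct cpuinfo_x86 *c)
{
	/* ... */
	this_cpu->c_early_init(c);	/* may try to shrink x86_phys_bits  */
	get_cpu_address_sizes(c);	/* ...then overwrites it from CPUID */
}

/* BSP again *and* APs, much later: */
static void identify_cpu(struct cpuinfo_x86 *c)
{
	generic_identify(c);		/* calls get_cpu_address_sizes()    */
	this_cpu->c_init(c);		/* runs ->c_early_init() again,
					 * then the rest of ->c_init()      */
}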
This difference is bad for your patch, because get_cpu_address_sizes() is called too early to see enc_phys_bits on APs. But it was also something that fbf6449f84bf didn't take into account, because it left behind the tentative initialization of x86_*_bits in identify_cpu(), while removing it from early_identify_cpu(). And
TBH my first reaction after Kirill pointed me to fbf6449f84bf was to revert it. It's not like the code before was much less of a dumpster fire, but that commit made the BSP vs. AP mess even worse. But it fixed a real-world bug and it did remove most of the "first set then adjust" logic, at least for the BSP, so a revert wasn't on the table and patch 1 was what came out of it.
I know that in general in Linux we prefer to fix things for good. Dancing one step behind and two ahead comes with the risk that you only do the step behind. But in the end something like this patch 1 would have to be posted for stable branches (and Greg doesn't like one-off patches), and I am not even sure it's a step behind, because it removes _some_ of the BSP vs. AP differences introduced by fbf6449f84bf.
In a nutshell: I don't dislike the idea behind your patch, but the code is just not ready for it.
I can look into cleaning up cpu/common.c though. I'm a natural procrastinator; it's my kind of thing to embark on a series of 30 patches to clean up 20-year-old code...
Paolo
Could we instead do something more like the (completely untested) attached patch?
BTW, I'm pretty sure the WARN_ON() in the patch won't actually work, but it'd be nice to work toward a point when it does.
On Fri, Feb 2, 2024 at 12:08 AM Paolo Bonzini pbonzini@redhat.com wrote:
On Thu, Feb 1, 2024 at 7:29 PM Dave Hansen dave.hansen@intel.com wrote:
I really wanted get_cpu_address_sizes() to be the one and only spot where c->x86_phys_bits is established. That way, we don't get a bunch of code all over the place tweaking it and fighting for who "wins". We're not there yet, but the approach in this patch moves it back in the wrong direction because it permits the random tweaking of c->x86_phys_bits.
I see your point; one of my earlier attempts added a ->c_detect_mem_encrypt() callback that basically amounted to either amd_detect_mem_encrypt() or detect_tme(). I bailed out of it mostly because these functions do more than adjusting phys_bits, and it seemed arbitrary to call them from get_cpu_address_sizes(). The two approaches share the idea of centralizing the computation of x86_phys_bits in get_cpu_address_sizes().
There is unfortunately an important hurdle for your patch, in that currently the BSP and AP flows are completely different. For the BSP the flow is ->c_early_init(), then get_cpu_address_sizes(), then again ->c_early_init() called by ->c_init(), then ->c_init(). For APs it is get_cpu_address_sizes(), then ->c_early_init() called by ->c_init(), then the rest of ->c_init(). And let's not even look at ->c_identify().
This difference is bad for your patch, because get_cpu_address_sizes() is called too early to see enc_phys_bits on APs. But it was also something that fbf6449f84bf didn't take into account, because it left behind the tentative initialization of x86_*_bits in identify_cpu(), while removing it from early_identify_cpu(). And
TBH my first reaction after Kirill pointed me to fbf6449f84bf was to revert it. It's not like the code before was much less of a dumpster fire, but that commit made the BSP vs. AP mess even worse. But it fixed a real-world bug and it did remove most of the "first set then adjust" logic, at least for the BSP, so a revert wasn't on the table and patch 1 was what came out of it.
I know that in general in Linux we prefer to fix things for good. Dancing one step behind and two ahead comes with the risk that you only do the step behind. But in the end something like this patch 1 would have to be posted for stable branches (and Greg doesn't like one-off patches), and I am not even sure it's a step behind, because it removes _some_ of the BSP vs. AP differences introduced by fbf6449f84bf.
In a nutshell: I don't dislike the idea behind your patch, but the code is just not ready for it.
This is the diffstat of the preparation needed before your patch can be applied more or less as-is:

$ git diffstat origin/master
 arch/x86/include/asm/processor.h |   1 +
 arch/x86/kernel/cpu/amd.c        |  12 ++--
 arch/x86/kernel/cpu/centaur.c    |  13 ++---
 arch/x86/kernel/cpu/common.c     | 103 +++++++++++++++++---------------
 arch/x86/kernel/cpu/hygon.c      |   2 -
 arch/x86/kernel/cpu/intel.c      |   4 +-
 arch/x86/kernel/cpu/transmeta.c  |   2 -
 arch/x86/kernel/cpu/zhaoxin.c    |   1 -
 8 files changed, 69 insertions(+), 69 deletions(-)

$ git log --oneline --reverse origin/master..
d639afed02aa x86/cpu/common: move code up to early get_cpu_address_sizes() to a new function
40d34260a4ba x86/cpu/common: pull get_cpu_*() calls common to BSPs and APs to a new function
67b9839f9c38 x86/cpu/common: put all setup_force/clear capabilities together
ebeae7f91cbc x86/cpu/centaur: do everything before early_init_centaur() in early_init_centaur()
d62fa7400885 x86/cpu/cyrix: call early init function from init function
0aa0916cd7e0 x86/cpu/common: call early_init function on APs from common code
d656b651d217 (HEAD) dave
I still haven't tested it very well, but anyway, what do you think?
Paolo
On Sun, Feb 4, 2024 at 6:21 PM Paolo Bonzini pbonzini@redhat.com wrote:
On Fri, Feb 2, 2024 at 12:08 AM Paolo Bonzini pbonzini@redhat.com wrote:
On Thu, Feb 1, 2024 at 7:29 PM Dave Hansen dave.hansen@intel.com wrote:
I really wanted get_cpu_address_sizes() to be the one and only spot where c->x86_phys_bits is established. That way, we don't get a bunch of code all over the place tweaking it and fighting for who "wins". We're not there yet, but the approach in this patch moves it back in the wrong direction because it permits the random tweaking of c->x86_phys_bits.
There is unfortunately an important hurdle [...] in that currently the BSP and AP flows are completely different. For the BSP the flow is ->c_early_init(), then get_cpu_address_sizes(), then again ->c_early_init() called by ->c_init(), then ->c_init(). For APs it is get_cpu_address_sizes(), then ->c_early_init() called by ->c_init(), then the rest of ->c_init(). And let's not even look at ->c_identify(). [...] get_cpu_address_sizes() is called too early to see enc_phys_bits on APs. But it was also something that fbf6449f84bf didn't take into account, because it left behind the tentative initialization of x86_*_bits in identify_cpu(), while removing it from early_identify_cpu(). And
Ping, either for applying the original patches or for guidance on how to proceed.
Paolo
On 2/13/24 07:45, Paolo Bonzini wrote:
Ping, either for applying the original patches or for guidance on how to proceed.
Gah, all of the gunk that get_cpu_address_sizes() touches is out of control.
They (phys/virt_bits and clflush) need to get consolidated back to a single copy that gets set up *once* in early boot and then read by everyone else. I've got a series to do that, but it's got its tentacles in quite a few places. They're not great backporting material.
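Purely as an illustration of that "set up once, read everywhere" shape, something like the sketch below; the struct and helper names are hypothetical, not from my actual series:

#include <linux/cache.h>
#include <linux/init.h>
#include <linux/types.h>
#include <asm/processor.h>

/* Hypothetical: one authoritative copy, written once during early boot. */
struct x86_addr_bits {
	u8  phys;		/* MAXPHYADDR after TME/SME adjustments */
	u8  virt;
	u16 clflush_size;
};

static struct x86_addr_bits addr_bits __ro_after_init;

static void __init setup_addr_bits(struct cpuinfo_x86 *c)
{
	/* Derive once on the BSP, keyid/encryption-bit fixups included. */
	addr_bits.phys = c->x86_phys_bits;
	addr_bits.virt = c->x86_virt_bits;
	addr_bits.clflush_size = c->x86_clflush_size;
}

/* Everyone else, including the APs, only ever reads: */
static inline u8 x86_phys_bits_once(void)
{
	return addr_bits.phys;
}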
Your patches make things a wee bit worse in the meantime, but they pale in comparison to the random spaghetti that we've already got. Also, we probably need the early TME stuff regardless.
I think I'll probably suck it up, apply them, then fix them up along with the greater mess.
Anybody have any better ideas?
On Tue, Feb 13, 2024 at 11:02 PM Dave Hansen dave.hansen@intel.com wrote:
On 2/13/24 07:45, Paolo Bonzini wrote:
Ping, either for applying the original patches or for guidance on how to proceed.
Gah, all of the gunk that get_cpu_address_sizes() touches is out of control.
They (phys/virt_bits and clflush) need to get consolidated back to a single copy that gets set up *once* in early boot and then read by everyone else.
I've got a series to do that, but it's got its tentacles in quite a few places. They're not great backporting material.
Yes, same for mine. I probably digressed in a different direction than you did, but there will almost surely be conflicts. I can post my stuff in the next few days.
Paolo
On Tue, Feb 13, 2024 at 11:02 PM Dave Hansen dave.hansen@intel.com wrote:
Your patches make things a wee bit worse in the meantime, but they pale in comparison to the random spaghetti that we've already got. Also, we probably need the early TME stuff regardless.
I think I'll probably suck it up, apply them, then fix them up along with the greater mess.
Anybody have any better ideas?
Ping, in the end are we applying these patches for either 6.8 or 6.9?
Paolo
On 2/20/24 04:22, Paolo Bonzini wrote:
On Tue, Feb 13, 2024 at 11:02 PM Dave Hansen dave.hansen@intel.com wrote:
Your patches make things a wee bit worse in the meantime, but they pale in comparison to the random spaghetti that we've already got. Also, we probably need the early TME stuff regardless.
I think I'll probably suck it up, apply them, then fix them up along with the greater mess.
Anybody have any better ideas?
Ping, in the end are we applying these patches for either 6.8 or 6.9?
Let me poke at them and see if we can stick them in x86/urgent early next week. They do fix an actual bug that's biting people, right?
On Thu, Feb 22, 2024 at 7:07 PM Dave Hansen dave.hansen@intel.com wrote:
Ping, in the end are we applying these patches for either 6.8 or 6.9?
Let me poke at them and see if we can stick them in x86/urgent early next week. They do fix an actual bug that's biting people, right?
Yes, I have gotten reports of {Sapphire,Emerald} Rapids machines that don't boot at all without either these patches or "disable_mtrr_cleanup".
Paolo
On 2/23/24 02:08, Paolo Bonzini wrote:
On Thu, Feb 22, 2024 at 7:07 PM Dave Hansen dave.hansen@intel.com wrote:
Ping, in the end are we applying these patches for either 6.8 or 6.9?
Let me poke at them and see if we can stick them in x86/urgent early next week. They do fix an actual bug that's biting people, right?
Yes, I have gotten reports of {Sapphire,Emerald} Rapids machines that don't boot at all without either these patches or "disable_mtrr_cleanup".
We tried a platform other than Sapphire and Emerald Rapids. This patchset fixes the boot issue on that platform as well.
Regards Yin, Fengwei
Paolo
On 2/25/24 17:57, Yin Fengwei wrote:
On 2/23/24 02:08, Paolo Bonzini wrote:
On Thu, Feb 22, 2024 at 7:07 PM Dave Hansen dave.hansen@intel.com wrote:
Ping, in the end are we applying these patches for either 6.8 or 6.9?
Let me poke at them and see if we can stick them in x86/urgent early next week. They do fix an actual bug that's biting people, right?
Yes, I have gotten reports of {Sapphire,Emerald} Rapids machines that don't boot at all without either these patches or "disable_mtrr_cleanup".
We tried a platform other than Sapphire and Emerald Rapids. This patchset fixes the boot issue on that platform as well.
Fengwei, could you also test this series on the troublesome hardware, please?
https://lore.kernel.org/all/20240222183926.517AFCD2@davehans-spike.ostc.inte...
If it _also_ fixes the problem, it'll be a strong indication that it's the right long-term approach.
On 2/27/24 00:21, Dave Hansen wrote:
On 2/25/24 17:57, Yin Fengwei wrote:
On 2/23/24 02:08, Paolo Bonzini wrote:
On Thu, Feb 22, 2024 at 7:07 PM Dave Hansen dave.hansen@intel.com wrote:
Ping, in the end are we applying these patches for either 6.8 or 6.9?
Let me poke at them and see if we can stick them in x86/urgent early next week. They do fix an actual bug that's biting people, right?
Yes, I have gotten reports of {Sapphire,Emerald} Rapids machines that don't boot at all without either these patches or "disable_mtrr_cleanup".
We tried a platform other than Sapphire and Emerald Rapids. This patchset fixes the boot issue on that platform as well.
Fengwei, could you also test this series on the troublesome hardware, please?
Sure. I will try it on my env. I just didn't connect your patchset to this boot issue. :(
Regards Yin, Fengwei
https://lore.kernel.org/all/20240222183926.517AFCD2@davehans-spike.ostc.inte...
If it _also_ fixes the problem, it'll be a strong indication that it's the right long-term approach.
Hi Dave,
On 2/27/24 00:21, Dave Hansen wrote:
On 2/25/24 17:57, Yin Fengwei wrote:
On 2/23/24 02:08, Paolo Bonzini wrote:
On Thu, Feb 22, 2024 at 7:07 PM Dave Hansen dave.hansen@intel.com wrote:
Ping, in the end are we applying these patches for either 6.8 or 6.9?
Let me poke at them and see if we can stick them in x86/urgent early next week. They do fix an actual bug that's biting people, right?
Yes, I have gotten reports of {Sapphire,Emerald} Rapids machines that don't boot at all without either these patches or "disable_mtrr_cleanup".
We tried a platform other than Sapphire and Emerald Rapids. This patchset fixes the boot issue on that platform as well.
Fengwei, could you also test this series on the troublesome hardware, please?
https://lore.kernel.org/all/20240222183926.517AFCD2@davehans-spike.ostc.inte...
If it _also_ fixes the problem, it'll be a strong indication that it's the right long-term approach.
I tried your patchset on a Sapphire Rapids machine, which is the only broken machine I can access today. The base commit is 45ec2f5f6ed3 from the latest Linus tree.
Without your patchset, the system hangs at boot. With your patchset, it boots successfully.
Regards Yin, Fengwei
On 2/26/24 22:08, Yin Fengwei wrote:
https://lore.kernel.org/all/20240222183926.517AFCD2@davehans-spike.ostc.inte...
If it _also_ fixes the problem, it'll be a strong indication that it's the right long-term approach.
I tried your patchset on a Sapphire Rapids machine, which is the only broken machine I can access today. The base commit is 45ec2f5f6ed3 from the latest Linus tree.
Without your patchset, the system hangs at boot. With your patchset, it boots successfully.
Yay! Thanks for testing.
On 2/27/24 21:30, Dave Hansen wrote:
On 2/26/24 22:08, Yin Fengwei wrote:
https://lore.kernel.org/all/20240222183926.517AFCD2@davehans-spike.ostc.inte...
If it _also_ fixes the problem, it'll be a strong indication that it's the right long-term approach.
I tried your patchset on a Sapphire Rapids machine, which is the only broken machine I can access today. The base commit is 45ec2f5f6ed3 from the latest Linus tree.
Without your patchset, the system hangs at boot. With your patchset, it boots successfully.
Yay! Thanks for testing.
My pleasure.
Regards Yin, Fengwei
On 2/1/24 15:08, Paolo Bonzini wrote:
There is unfortunately an important hurdle for your patch, in that currently the BSP and AP flows are completely different.
Do we even _need_ c->x86_phys_bits for APs? I need to do a bit more grepping, but I only see it being read in show_cpuinfo(). Everything else seems to be boot_cpu_data.x86_phys_bits.