Re: [PATCH v2 0/2] x86/cpu: fix invalid MTRR mask values for SEV or TME

4 Feb 2024


      On Fri, Feb 2, 2024 at 12:08 AM Paolo Bonzini pbonzini@redhat.com wrote:
...
On Thu, Feb 1, 2024 at 7:29 PM Dave Hansen dave.hansen@intel.com wrote:
...
I really wanted get_cpu_address_sizes() to be the one and only spot
where c->x86_phys_bits is established.  That way, we don't get a bunch
of code all of the place tweaking it and fighting for who "wins".
We're not there yet, but the approach in this patch moves it back in the
wrong direction because it permits the random tweaking of c->x86_phys_bits.
I see your point; one of my earlier attempts added a
->c_detect_mem_encrypt() callback that basically amounted to either
amd_detect_mem_encrypt() or detect_tme(). I bailed out of it mostly
because these functions do more than adjusting phys_bits, and it
seemed arbitrary to call them from get_cpu_address_sizes(). The two
approaches share the idea of centralizing the computation of
x86_phys_bits in get_cpu_address_sizes().
There is unfortunately an important hurdle for your patch, in that
currently the BSP and AP flows are completely different. For the BSP
the flow is ->c_early_init(), then get_cpu_address_sizes(), then again
->c_early_init() called by ->c_init(), then ->c_init(). For APs it is
get_cpu_address_sizes(), then ->c_early_init() called by ->c_init(),
then the rest of ->c_init(). And let's not even look at
->c_identify().
This difference is bad for your patch, because get_cpu_address_sizes()
is called too early to see enc_phys_bits on APs. But it was also
something that fbf6449f84bf didn't take into account, because it left
behind the tentative initialization of x86_*_bits in identify_cpu(),
while removing it from early_identify_cpu().  And
TBH my first reaction after Kirill pointed me to fbf6449f84bf was to
revert it. It's not like the code before was much less of a dumpster
fire, but that commit made the BSP vs. AP mess even worse. But it
fixed a real-world bug and it did remove most of the "first set then
adjust" logic, at least for the BSP, so a revert wasn't on the table
and patch 1 was what came out of it.
I know that in general in Linux we prefer to fix things for good.
Dancing one step behind and two ahead comes with the the risk that you
only do the step behind. But in the end something like this patch 1
would have to be posted for stable branches (and Greg doesn't like
one-off patches), and I am not even sure it's a step behind because it
removes _some_ of the BSP vs. AP differences introduced by
fbf6449f84bf.
In a nutshell: I don't dislike the idea behind your patch, but the
code is just not ready for it.
This is the diffstat before your patch can be applied more or less as is:
$ git diffstat origin/master
 arch/x86/include/asm/processor.h |   1 +
 arch/x86/kernel/cpu/amd.c        |  12 ++--
 arch/x86/kernel/cpu/centaur.c    |  13 ++---
 arch/x86/kernel/cpu/common.c     | 103 +++++++++++++++++++----------------
 arch/x86/kernel/cpu/hygon.c      |   2 -
 arch/x86/kernel/cpu/intel.c      |   4 +-
 arch/x86/kernel/cpu/transmeta.c  |   2 -
 arch/x86/kernel/cpu/zhaoxin.c    |   1 -
 8 files changed, 69 insertions(+), 69 deletions(-)
$ git log --oneline --reverse origin/master..
d639afed02aa x86/cpu/common: move code up to early
get_cpu_address_sizes() to a new function
40d34260a4ba x86/cpu/common: pull get_cpu_*() calls common to BSPs and
APs to a new function
67b9839f9c38 x86/cpu/common: put all setup_force/clear capabilities together
ebeae7f91cbc x86/cpu/centaur: do everything before
early_init_centaur() in early_init_centaur()
d62fa7400885 x86/cpu/cyrix: call early init function from init function
0aa0916cd7e0 x86/cpu/common: call early_init function on APs from common code
d656b651d217 (HEAD) dave
I still haven't tested very well, but anyway, what do you think?
Paolo

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH v2 0/2] x86/cpu: fix invalid MTRR mask values for SEV or TME