Peter!
On Mon, May 27 2024 at 23:15, Peter Schneider wrote:
Thanks for providing all the information!
> I want to add one thing: there is a log entry in the dmesg output of a
> "bad" kernel, which I initially overlooked, because it is way up, and I
> noticed this just now. I guess this might be relevant:
>
> [ 1.683564] [Firmware Bug]: CPU0: Topology domain 0 shift 1 != 5
Yes. That's absolutely related. I can see what goes wrong, but I have absolutely no idea how that happens.
Can you please apply the debug patch below and provide the full dmesg after boot?
Thanks,
        tglx
---
--- a/arch/x86/kernel/cpu/topology_common.c
+++ b/arch/x86/kernel/cpu/topology_common.c
@@ -65,6 +65,7 @@ static void parse_legacy(struct topo_sca
 		cores <<= smt_shift;
 	}
 
+	pr_info("Legacy: %u %u %u\n", c->cpuid_level, smt_shift, core_shift);
 	topology_set_dom(tscan, TOPO_SMT_DOMAIN, smt_shift, 1U << smt_shift);
 	topology_set_dom(tscan, TOPO_CORE_DOMAIN, core_shift, cores);
 }
--- a/arch/x86/kernel/cpu/topology_ext.c
+++ b/arch/x86/kernel/cpu/topology_ext.c
@@ -72,6 +72,9 @@ static inline bool topo_subleaf(struct t
 
 	cpuid_subleaf(leaf, subleaf, &sl);
 
+	pr_info("L:%0x %0x %0x S:%u N:%u T:%u\n", leaf, subleaf, sl.level, sl.x2apic_shift,
+		sl.num_processors, sl.type);
+
 	if (!sl.num_processors || sl.type == INVALID_TYPE)
 		return false;
 
@@ -97,6 +100,7 @@ static inline bool topo_subleaf(struct t
 			     leaf, subleaf, tscan->c->topo.initial_apicid, sl.x2apic_id);
 	}
 
+	pr_info("D: %u\n", dom);
 	topology_set_dom(tscan, dom, sl.x2apic_shift, sl.num_processors);
 	return true;
 }