Hi all,
I have noticed strange messages with kernel version 6.9, apparently from CPU topology detection, which were not present in 6.8.y and earlier kernels.
This is coming from an older server machine: 2-socket Ivy Bridge Xeon E5-2697 v2 (24C/48T) in an Asus Z9PE-D16/2L motherboard (Intel C-602A chipset); BIOS patched to the latest available from Asus. All memory slots occupied, so 256 GB RAM in total.
From a "good boot", e.g. kernel 6.8.11, dmesg output looks like this:
[ 1.823797] smpboot: x86: Booting SMP configuration:
[ 1.823799] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11
[ 1.827514] .... node #1, CPUs: #12 #13 #14 #15 #16 #17 #18 #19 #20 #21 #22 #23
[ 0.011462] smpboot: CPU 12 Converting physical 0 to logical die 1
[ 1.875532] .... node #0, CPUs: #24 #25 #26 #27 #28 #29 #30 #31 #32 #33 #34 #35
[ 1.882453] .... node #1, CPUs: #36 #37 #38 #39 #40 #41 #42 #43 #44 #45 #46 #47
[ 1.887532] MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
[ 1.933640] smp: Brought up 2 nodes, 48 CPUs
[ 1.933640] smpboot: Max logical packages: 2
[ 1.933640] smpboot: Total of 48 processors activated (259199.61 BogoMIPS)
From a "bad" boot, e.g. kernel 6.9.2, dmesg output has these messages in it:
[ 1.785937] smpboot: x86: Booting SMP configuration:
[ 1.785939] .... node #0, CPUs: #4
[ 1.786215] .... node #1, CPUs: #12 #16
[ 1.793547] MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
[ 1.797547] .... node #0, CPUs: #1 #2 #3 #5 #6 #7 #8 #9 #10 #11
[ 1.801858] .... node #1, CPUs: #13 #14 #15 #17 #18 #19 #20 #21 #22 #23
[ 1.804687] .... node #0, CPUs: #24 #25 #26 #27 #28 #29 #30 #31 #32 #33 #34 #35
[ 1.810728] .... node #1, CPUs: #36 #37 #38 #39 #40 #41 #42 #43 #44 #45 #46 #47
[ 1.901547] smp: Brought up 2 nodes, 48 CPUs
[ 1.901547] smpboot: Total of 48 processors activated (259207.87 BogoMIPS)
[ 1.903803] BUG: arch topology borken
[ 1.903879] the SMT domain not a subset of the CLS domain
[ 1.903970] BUG: arch topology borken
[ 1.904040] the SMT domain not a subset of the CLS domain
[ 1.904128] BUG: arch topology borken
[ 1.904198] the SMT domain not a subset of the CLS domain
... and this "BUG" line and the one following it repeat 48 times, which is the number of logical CPUs this machine has. Also, there is a funny typo in the message, but that might be intended, I guess?! Moreover, I noticed that in the 6.9.2 log the node #1 lines list CPUs #12 and #16 separately, ahead of the rest, so the counting may be wrong?!
However, the machine boots, and apart from these strange messages, I cannot detect any other abnormal behaviour. It is running ~15 QEMU/KVM virtual machines just fine. Because these messages look unusual and a bit scary though, I have bisected the issue to be able to report it here. The first bad commit I found is this one:
22d63660c35eb751c63a709bf901a64c1726592a is the first bad commit
commit 22d63660c35eb751c63a709bf901a64c1726592a
Author: Thomas Gleixner tglx@linutronix.de
Date: Tue Feb 13 22:04:08 2024 +0100

    x86/cpu: Use common topology code for Intel

    Intel CPUs use either topology leaf 0xb/0x1f evaluation or the legacy SMP/HT evaluation based on CPUID leaf 0x1/0x4.

    Move it over to the consolidated topology code and remove the random topology hacks which are sprinkled into the Intel and the common code.

    No functional change intended.

    Signed-off-by: Thomas Gleixner tglx@linutronix.de
    Tested-by: Juergen Gross jgross@suse.com
    Tested-by: Sohil Mehta sohil.mehta@intel.com
    Tested-by: Michael Kelley mhklinux@outlook.com
    Tested-by: Zhang Rui rui.zhang@intel.com
    Tested-by: Wang Wendy wendy.wang@intel.com
    Tested-by: K Prateek Nayak kprateek.nayak@amd.com
    Link: https://lore.kernel.org/r/20240212153624.893644349@linutronix.de

 arch/x86/kernel/cpu/common.c          | 65 -----------------------------------
 arch/x86/kernel/cpu/cpu.h             |  4 ---
 arch/x86/kernel/cpu/intel.c           | 25 --------------
 arch/x86/kernel/cpu/topology.c        | 22 ------------
 arch/x86/kernel/cpu/topology_common.c |  5 ++-
 5 files changed, 4 insertions(+), 117 deletions(-)
root@linus:/usr/src/linux#
I attach my bisect log, and full dmesg output from a good and from a bad kernel version.
Moreover, the last 3 bad kernels from my bisect session did not boot at all, including the one with commit SHA1 from the first bad commit above. These kernels also had the series of "BUG" messages scrolling through on the console, and then additionally a kernel panic, seemingly coming from a divide exception from function init_intel_microcode:
<5>[ 5.968685] Key type dns_resolver registered
<4>[ 5.974402] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
<4>[ 5.977017] divide error: 0000 [#1] PREEMPT SMP PTI
<4>[ 5.977116] CPU: 9 PID: 1 Comm: swapper/0 Not tainted 6.8.0-rc4+ #1
<4>[ 5.977213] Hardware name: ASUSTeK COMPUTER INC. Z9PE-D16 Series/Z9PE-D16 Series, BIOS 5601 06/11/2015
<4>[ 5.977337] RIP: 0010:init_intel_microcode+0x3c/0x80
<4>[ 5.977436] Code: ff 75 44 40 80 fe 05 76 3e 48 8b 05 b6 45 f7 ff a9 00 00 00 40 75 30 8b 05 85 46 f7 ff 0f b7 0d aa 46 f7 ff 31 d2 48 c1 e0 0a <48> f7 f1 89 05 9b f9 46 ff 48 c7 c0 c0 98 e4 a8 31 d2 31 c9 31 f6
<4>[ 5.977602] RSP: 0000:ffffb79b8008fd80 EFLAGS: 00010206
<4>[ 5.977697] RAX: 0000000001e00000 RBX: 0000000000000000 RCX: 0000000000000000
<4>[ 5.977795] RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000000
<4>[ 5.977894] RBP: ffffb79b8008fdf8 R08: 0000000000000000 R09: 0000000000000000
<4>[ 5.977992] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
<4>[ 5.978090] R13: 000000000000019a R14: ffffb79b8008fe08 R15: ffff96ad4026cf00
<4>[ 5.978187] FS: 0000000000000000(0000) GS:ffff96cc3fa40000(0000) knlGS:0000000000000000
<4>[ 5.978308] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 5.978402] CR2: 0000000000000000 CR3: 0000000e6d236001 CR4: 00000000001706f0
<4>[ 5.978500] Call Trace:
<4>[ 5.978588] <TASK>
<4>[ 5.978675] ? show_regs+0x6d/0x80
<4>[ 5.978767] ? die+0x37/0xa0
<4>[ 5.978857] ? do_trap+0xd4/0xf0
<4>[ 5.978948] ? do_error_trap+0x71/0xb0
<4>[ 5.979040] ? init_intel_microcode+0x3c/0x80
<4>[ 5.979131] ? exc_divide_error+0x3a/0x70
<4>[ 5.979226] ? init_intel_microcode+0x3c/0x80
<4>[ 5.979317] ? asm_exc_divide_error+0x1b/0x20
<4>[ 5.979427] ? init_intel_microcode+0x3c/0x80
<4>[ 5.979520] ? microcode_init+0x196/0x260
<4>[ 5.979612] ? __pfx_microcode_init+0x10/0x10
<4>[ 5.979718] do_one_initcall+0x5e/0x340
<4>[ 5.979813] kernel_init_freeable+0x322/0x490
<4>[ 5.979906] ? __pfx_kernel_init+0x10/0x10
<4>[ 5.979998] kernel_init+0x1b/0x200
<4>[ 5.980089] ret_from_fork+0x47/0x70
<4>[ 5.980180] ? __pfx_kernel_init+0x10/0x10
<4>[ 5.980272] ret_from_fork_asm+0x1b/0x30
<4>[ 5.980364] </TASK>
<4>[ 5.980450] Modules linked in:
<4>[ 5.980544] ---[ end trace 0000000000000000 ]---
<4>[ 6.959943] RIP: 0010:init_intel_microcode+0x3c/0x80
<4>[ 6.960041] Code: ff 75 44 40 80 fe 05 76 3e 48 8b 05 b6 45 f7 ff a9 00 00 00 40 75 30 8b 05 85 46 f7 ff 0f b7 0d aa 46 f7 ff 31 d2 48 c1 e0 0a <48> f7 f1 89 05 9b f9 46 ff 48 c7 c0 c0 98 e4 a8 31 d2 31 c9 31 f6
<4>[ 6.960207] RSP: 0000:ffffb79b8008fd80 EFLAGS: 00010206
<4>[ 6.960316] RAX: 0000000001e00000 RBX: 0000000000000000 RCX: 0000000000000000
<4>[ 6.960414] RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000000
<4>[ 6.960512] RBP: ffffb79b8008fdf8 R08: 0000000000000000 R09: 0000000000000000
<4>[ 6.960610] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
<4>[ 6.960708] R13: 000000000000019a R14: ffffb79b8008fe08 R15: ffff96ad4026cf00
<4>[ 6.960806] FS: 0000000000000000(0000) GS:ffff96cc3fa40000(0000) knlGS:0000000000000000
<4>[ 6.960927] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 6.961021] CR2: 0000000000000000 CR3: 0000000e6d236001 CR4: 00000000001706f0
<0>[ 6.961120] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
<0>[ 6.961312] Kernel Offset: 0x25c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
I also attached full dmesg log file "dmesg-erst-7373208397568540677" of this panic which I could find in /var/lib/systemd/pstore.
Best regards,
Peter Schneider
On Mon, May 27 2024 at 09:29, Peter Schneider wrote:
This is coming from an older server machine: 2-socket Ivy Bridge Xeon E5-2697 v2 (24C/48T) in an Asus Z9PE-D16/2L motherboard (Intel C-602A chipset); BIOS patched to the latest available from Asus. All memory slots occupied, so 256 GB RAM in total.
From a "good boot", e.g. kernel 6.8.11, dmesg output looks like this:
[ 1.823797] smpboot: x86: Booting SMP configuration:
[ 1.823799] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11
[ 1.827514] .... node #1, CPUs: #12 #13 #14 #15 #16 #17 #18 #19 #20 #21 #22 #23
[ 0.011462] smpboot: CPU 12 Converting physical 0 to logical die 1
[ 1.875532] .... node #0, CPUs: #24 #25 #26 #27 #28 #29 #30 #31 #32 #33 #34 #35
[ 1.882453] .... node #1, CPUs: #36 #37 #38 #39 #40 #41 #42 #43 #44 #45 #46 #47
[ 1.887532] MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
[ 1.933640] smp: Brought up 2 nodes, 48 CPUs
[ 1.933640] smpboot: Max logical packages: 2
[ 1.933640] smpboot: Total of 48 processors activated (259199.61 BogoMIPS)
From a "bad" boot, e.g. kernel 6.9.2, dmesg output has these messages in it:
[ 1.785937] smpboot: x86: Booting SMP configuration:
[ 1.785939] .... node #0, CPUs: #4
[ 1.786215] .... node #1, CPUs: #12 #16
Yuck. That does not make any sense.
[ 1.797547] .... node #0, CPUs: #1 #2 #3 #5 #6 #7 #8 #9 #10 #11
[ 1.801858] .... node #1, CPUs: #13 #14 #15 #17 #18 #19 #20 #21 #22 #23
[ 1.804687] .... node #0, CPUs: #24 #25 #26 #27 #28 #29 #30 #31 #32 #33 #34 #35
[ 1.810728] .... node #1, CPUs: #36 #37 #38 #39 #40 #41 #42 #43 #44 #45 #46 #47
However the machine boots, and except from these strange messages, I cannot detect any other abnormal behaviour. It is running ~15 QEMU/KVM virtual machines just fine. Because these messages look unusual and a bit scary though, I have bisected the issue, to be able to report it here. The first bad commit I found is this one:
Ok. So as the machine is booting, can you please provide the output of:
cat /sys/kernel/debug/x86/topo/cpus/*
on the 6.9 kernel and
cat /proc/cpuinfo
for both 6.8 and 6.9?
Thanks,
tglx
On Mon, May 27 2024 at 15:14, Thomas Gleixner wrote:
Ok. So as the machine is booting, can you please provide the output of:
cat /sys/kernel/debug/x86/topo/cpus/*
on the 6.9 kernel and
cat /proc/cpuinfo
for both 6.8 and 6.9?
And once the output of:
cpuid -r
no matter on which kernel please?
Thanks,
tglx
Hello Thomas,
thanks very much for looking into this issue!
On 27.05.2024 at 15:14, Thomas Gleixner wrote:
Ok. So as the machine is booting, can you please provide the output of:
cat /sys/kernel/debug/x86/topo/cpus/*
on the 6.9 kernel and
Please find attached files topo_cpus_RAW_6_8_11.txt, topo_cpus_RAW_6_9_2.txt, topo_cpus_SORTED_6_8_11.txt, topo_cpus_SORTED_6_9_2.txt. One for each kernel, and one raw as requested, and one a bit sorted for easier navigation.
cat /proc/cpuinfo
for both 6.8 and 6.9?
Please find attached files cpuinfo_6_8_11.txt and cpuinfo_6_9_2.txt
And once the output of:
cpuid -r
no matter on which kernel please?
Please find attached files cpuid.txt and cpuid-r.txt.
Best regards,
Peter Schneider
Thomas,
On 27.05.2024 at 23:06, Peter Schneider wrote:
Hello Thomas,
thanks very much for looking into this issue!
[...]
I want to add one thing: there is a log entry in the dmesg output of a "bad" kernel, which I initially overlooked, because it is way up, and I noticed this just now. I guess this might be relevant:
[ 1.683564] [Firmware Bug]: CPU0: Topology domain 0 shift 1 != 5
This does not appear in the 6.8 kernel dmesg.
What do you think?
Best regards,
Peter Schneider
Hey Peter,
On 24/05/27 11:15PM, Peter Schneider wrote:
I want to add one thing: there is a log entry in the dmesg output of a "bad" kernel, which I initially overlooked, because it is way up, and I noticed this just now. I guess this might be relevant:
[ 1.683564] [Firmware Bug]: CPU0: Topology domain 0 shift 1 != 5
This does not appear in the 6.8 kernel dmesg.
I also can't comment on whether this is relevant or not, but I have noticed this in more places:
- https://bugzilla.kernel.org/show_bug.cgi?id=218879
- https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/issues/57
Cheers, Chris
Peter!
On Mon, May 27 2024 at 23:15, Peter Schneider wrote:
Thanks for providing all the information!
I want to add one thing: there is a log entry in the dmesg output of a "bad" kernel, which I initially overlooked, because it is way up, and I noticed this just now. I guess this might be relevant:
[ 1.683564] [Firmware Bug]: CPU0: Topology domain 0 shift 1 != 5
Yes. That's absolutely related. I can see what goes wrong, but I have absolutely no idea how that happens.
Can you please apply the debug patch below and provide the full dmesg after boot?
Thanks,
tglx
---
--- a/arch/x86/kernel/cpu/topology_common.c
+++ b/arch/x86/kernel/cpu/topology_common.c
@@ -65,6 +65,7 @@ static void parse_legacy(struct topo_sca
 		cores <<= smt_shift;
 	}

+	pr_info("Legacy: %u %u %u\n", c->cpuid_level, smt_shift, core_shift);
 	topology_set_dom(tscan, TOPO_SMT_DOMAIN, smt_shift, 1U << smt_shift);
 	topology_set_dom(tscan, TOPO_CORE_DOMAIN, core_shift, cores);
 }
--- a/arch/x86/kernel/cpu/topology_ext.c
+++ b/arch/x86/kernel/cpu/topology_ext.c
@@ -72,6 +72,9 @@ static inline bool topo_subleaf(struct t

 	cpuid_subleaf(leaf, subleaf, &sl);

+	pr_info("L:%0x %0x %0x S:%u N:%u T:%u\n", leaf, subleaf, sl.level, sl.x2apic_shift,
+		sl.num_processors, sl.type);
+
 	if (!sl.num_processors || sl.type == INVALID_TYPE)
 		return false;

@@ -97,6 +100,7 @@ static inline bool topo_subleaf(struct t
 			leaf, subleaf, tscan->c->topo.initial_apicid, sl.x2apic_id);
 	}

+	pr_info("D: %u\n", dom);
 	topology_set_dom(tscan, dom, sl.x2apic_shift, sl.num_processors);
 	return true;
 }
Hi Thomas,
On 30.05.24 at 10:30, Thomas Gleixner wrote:
Can you please apply the debug patch below and provide the full dmesg after boot?
Here you go... The patch applied cleanly against 6.9.3, which I saw was just released by Greg, so I used that. If you want, I can repeat the test against 6.9.2, too.
Please note: to be able to boot any kernel >= 6.8.4 on my machine, I also had to apply this patch by Martin Petersen, fixing another (unrelated SCSI) regression I reported some time ago, see here:
https://lore.kernel.org/all/20240521023040.2703884-1-martin.petersen@oracle....
But I think these two issues are not connected in any way. It was during testing the above patch by Martin that I noticed this new issue in 6.9 BTW.
I have attached resulting file dmesg_6.9.3-dirty_Bad_wDebugInfo.txt, and I hope you can make some sense of it.
Best regards,
Peter Schneider
Peter!
On Thu, May 30 2024 at 12:06, Peter Schneider wrote:
On 30.05.24 at 10:30, Thomas Gleixner wrote:
Can you please apply the debug patch below and provide the full dmesg after boot?
Here you go... The patch applied cleanly against 6.9.3, which I saw was just released by Greg, so I used that. If you want, I can repeat the test against 6.9.2, too.
.3 is fine
Please note: to be able to boot any kernel >= 6.8.4 on my machine, I also had to apply this patch by Martin Petersen, fixing another (unrelated SCSI) regression I reported some time ago, see here:
https://lore.kernel.org/all/20240521023040.2703884-1-martin.petersen@oracle....
But I think these two issues are not connected in any way. It was during testing the above patch by Martin that I noticed this new issue in 6.9 BTW.
Right. It's a separate problem.
I have attached resulting file dmesg_6.9.3-dirty_Bad_wDebugInfo.txt, and I hope you can make some sense of it.
It's exactly what I expected but it does not make any sense at all.
[ 0.000000] Legacy: 2 5 5
So that means that during early boot, when the topology parameters are decoded from CPUID, the CPUID evaluation code sees 0x02 as the maximum supported CPUID leaf and therefore reads complete nonsense.
Later on when the full CPUID evaluation happens it sees the full space and uses leaf 0xb.
[ 1.687649] L:b 0 0 S:1 N:2 T:1
[ 1.687652] D: 0
[ 1.687653] L:b 1 1 S:5 N:24 T:2
[ 1.687655] D: 1
[ 1.687656] L:b 2 2 S:0 N:0 T:0
[ 1.687658] [Firmware Bug]: CPU0: Topology domain 0 shift 1 != 5
And this obviously sees the proper numbers and complains about the inconsistency.
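To make the two shift values concrete, here is a minimal user-space sketch of how an APIC ID splits into thread, core and package when the shifts are cumulative (1 for SMT and 5 for core, as in the "S:" fields above). The struct and function names are hypothetical and are not the kernel's; this only illustrates the bit arithmetic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type, not a kernel structure. */
struct apic_parts {
	uint32_t thread;	/* SMT thread within its core */
	uint32_t core;		/* core within its package */
	uint32_t package;	/* package (socket) number */
};

/*
 * Split an APIC ID using cumulative shifts as reported by leaf 0xb:
 * smt_shift = 1 and core_shift = 5 in the debug output above.
 */
static struct apic_parts split_apic_id(uint32_t apic_id, unsigned int smt_shift,
				       unsigned int core_shift)
{
	struct apic_parts p;

	assert(smt_shift <= core_shift && core_shift < 32);
	p.thread = apic_id & ((1u << smt_shift) - 1);
	p.core = (apic_id >> smt_shift) & ((1u << (core_shift - smt_shift)) - 1);
	p.package = apic_id >> core_shift;
	return p;
}

/* Two APIC IDs are SMT siblings when all bits above the SMT bits match. */
static bool smt_siblings(uint32_t a, uint32_t b, unsigned int smt_shift)
{
	return (a >> smt_shift) == (b >> smt_shift);
}
```

In this simplified model, `smt_siblings(0, 2, 1)` is false (two different cores), while `smt_siblings(0, 2, 5)` is true: with the early SMT shift of 5, every thread of a package would look like a sibling of the same core.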
So something on this CPU is broken. The same problem exists on all APs:
[ 1.790035] .... node #0, CPUs: #4
[ 1.790312] .... node #1, CPUs: #12 #16
[ 0.011992] Legacy: 2 5 5
[ 0.011992] Legacy: 2 5 5
[ 0.011992] Legacy: 2 5 5
[ 0.011992] Legacy: 2 5 5
.....
Now the million-dollar question is what unlocks CPUID to read the proper value of EAX of leaf 0. All I could come up with is to sprinkle a dozen printks into that code. Updated debug patch below.
Thanks,
tglx
---
--- a/arch/x86/kernel/cpu/topology_common.c
+++ b/arch/x86/kernel/cpu/topology_common.c
@@ -65,6 +65,7 @@ static void parse_legacy(struct topo_sca
 		cores <<= smt_shift;
 	}

+	pr_info("Legacy: %u %u %u\n", c->cpuid_level, smt_shift, core_shift);
 	topology_set_dom(tscan, TOPO_SMT_DOMAIN, smt_shift, 1U << smt_shift);
 	topology_set_dom(tscan, TOPO_CORE_DOMAIN, core_shift, cores);
 }
--- a/arch/x86/kernel/cpu/topology_ext.c
+++ b/arch/x86/kernel/cpu/topology_ext.c
@@ -72,6 +72,9 @@ static inline bool topo_subleaf(struct t

 	cpuid_subleaf(leaf, subleaf, &sl);

+	pr_info("L:%0x %0x %0x S:%u N:%u T:%u\n", leaf, subleaf, sl.level, sl.x2apic_shift,
+		sl.num_processors, sl.type);
+
 	if (!sl.num_processors || sl.type == INVALID_TYPE)
 		return false;

@@ -97,6 +100,7 @@ static inline bool topo_subleaf(struct t
 			leaf, subleaf, tscan->c->topo.initial_apicid, sl.x2apic_id);
 	}

+	pr_info("D: %u\n", dom);
 	topology_set_dom(tscan, dom, sl.x2apic_shift, sl.num_processors);
 	return true;
 }
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1584,22 +1584,30 @@ static void __init early_identify_cpu(st
 	/* cyrix could have cpuid enabled via c_identify()*/
 	if (have_cpuid_p()) {
 		cpu_detect(c);
+		pr_info("MAXL1: %x\n", cpuid_eax(0));
 		get_cpu_vendor(c);
+		pr_info("MAXL2: %x\n", cpuid_eax(0));
 		get_cpu_cap(c);
+		pr_info("MAXL3: %x\n", cpuid_eax(0));
 		setup_force_cpu_cap(X86_FEATURE_CPUID);
 		get_cpu_address_sizes(c);
+		pr_info("MAXL4: %x\n", cpuid_eax(0));
 		cpu_parse_early_param();
+		pr_info("MAXL5: %x\n", cpuid_eax(0));

 		cpu_init_topology(c);
+		pr_info("MAXL6: %x\n", cpuid_eax(0));

 		if (this_cpu->c_early_init)
 			this_cpu->c_early_init(c);
+		pr_info("MAXL7: %x\n", cpuid_eax(0));

 		c->cpu_index = 0;
 		filter_cpuid_features(c, false);

 		if (this_cpu->c_bsp_init)
 			this_cpu->c_bsp_init(c);
+		pr_info("MAXL8: %x\n", cpuid_eax(0));
 	} else {
 		setup_clear_cpu_cap(X86_FEATURE_CPUID);
 		get_cpu_address_sizes(c);
@@ -1797,9 +1805,12 @@ static void identify_cpu(struct cpuinfo_
 #ifdef CONFIG_X86_VMX_FEATURE_NAMES
 	memset(&c->vmx_capability, 0, sizeof(c->vmx_capability));
 #endif
+	pr_info("MAXLG1: %x\n", cpuid_eax(0));

 	generic_identify(c);

+	pr_info("MAXLG2: %x\n", cpuid_eax(0));
+
 	cpu_parse_topology(c);

 	if (this_cpu->c_identify)
On Thu, May 30 2024 at 15:35, Thomas Gleixner wrote:
On Thu, May 30 2024 at 12:06, Peter Schneider wrote:
[...]
Now the million-dollar question is what unlocks CPUID to read the proper value of EAX of leaf 0. All I could come up with is to sprinkle a dozen of printks into that code. Updated debug patch below.
Don't bother. Dave pointed out to me that this is unlocked in early_init_intel() via MSR_IA32_MISC_ENABLE_LIMIT_CPUID...
Let me figure out how to fix that sanely.
Thanks,
tglx
On Thu, May 30 2024 at 17:53, Thomas Gleixner wrote:
On Thu, May 30 2024 at 15:35, Thomas Gleixner wrote:
On Thu, May 30 2024 at 12:06, Peter Schneider wrote:
[...]
Now the million-dollar question is what unlocks CPUID to read the proper value of EAX of leaf 0. All I could come up with is to sprinkle a dozen of printks into that code. Updated debug patch below.
Don't bother. Dave pointed out to me that this is unlocked in early_init_intel() via MSR_IA32_MISC_ENABLE_LIMIT_CPUID...
Let me figure out how to fix that sanely.
The original code just worked because it was reevaluating this stuff over and over until it magically became "correct".
The proper fix is obviously to unlock CPUID on Intel _before_ anything which depends on cpuid_level is evaluated.
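The ordering problem can be modelled in user space. The following is only a toy sketch: `struct fake_cpu` and the `fake_*` helpers are hypothetical and touch no real hardware, and the real maximum leaf of 0xd used in the usage note is an illustrative assumption. It shows that a single-pass evaluation sees a maximum leaf of 2 while the limit bit in the simulated MSR_IA32_MISC_ENABLE is set, and sees the full range only once the bit has been cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 22 of MSR_IA32_MISC_ENABLE is the "limit CPUID" bit. */
#define LIMIT_CPUID_BIT 22

/* Toy model of one CPU; nothing here reads or writes a real MSR. */
struct fake_cpu {
	uint64_t misc_enable;	/* simulated MSR_IA32_MISC_ENABLE */
	uint32_t real_max_leaf;	/* what CPUID leaf 0 reports when unlocked */
};

/* Simulated CPUID.0:EAX, capped at 2 while the limit bit is set. */
static uint32_t fake_max_leaf(const struct fake_cpu *cpu)
{
	bool limited = cpu->misc_enable & (1ULL << LIMIT_CPUID_BIT);

	if (limited && cpu->real_max_leaf > 2)
		return 2;
	return cpu->real_max_leaf;
}

/* Clear the limit bit; returns true if it had been set. */
static bool fake_unlock(struct fake_cpu *cpu)
{
	bool was_set = cpu->misc_enable & (1ULL << LIMIT_CPUID_BIT);

	cpu->misc_enable &= ~(1ULL << LIMIT_CPUID_BIT);
	return was_set;
}

/* Single-pass evaluation: leaf 0xb is only usable if the max leaf allows it. */
static bool uses_leaf_0xb(uint32_t cpuid_level)
{
	return cpuid_level >= 0xb;
}
```

With a CPU set up as `{ 1ULL << LIMIT_CPUID_BIT, 0xd }`, `uses_leaf_0xb(fake_max_leaf(&cpu))` is false before `fake_unlock(&cpu)` and true afterwards, which is why the unlock has to come before anything that looks at the maximum leaf.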
Thanks,
tglx
---
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -969,7 +969,7 @@ static void init_speculation_control(str
 	}
 }

-void get_cpu_cap(struct cpuinfo_x86 *c)
+static void get_cpu_cap(struct cpuinfo_x86 *c)
 {
 	u32 eax, ebx, ecx, edx;

@@ -1585,6 +1585,7 @@ static void __init early_identify_cpu(st
 	if (have_cpuid_p()) {
 		cpu_detect(c);
 		get_cpu_vendor(c);
+		intel_unlock_cpuid_leafs(c);
 		get_cpu_cap(c);
 		setup_force_cpu_cap(X86_FEATURE_CPUID);
 		get_cpu_address_sizes(c);
@@ -1744,7 +1745,7 @@ static void generic_identify(struct cpui
 	cpu_detect(c);

 	get_cpu_vendor(c);
-
+	intel_unlock_cpuid_leafs(c);
 	get_cpu_cap(c);

 	get_cpu_address_sizes(c);
--- a/arch/x86/kernel/cpu/cpu.h
+++ b/arch/x86/kernel/cpu/cpu.h
@@ -61,14 +61,15 @@ extern __ro_after_init enum tsx_ctrl_sta

 extern void __init tsx_init(void);
 void tsx_ap_init(void);
+void intel_unlock_cpuid_leafs(struct cpuinfo_x86 *c);
 #else
 static inline void tsx_init(void) { }
 static inline void tsx_ap_init(void) { }
+static inline void intel_unlock_cpuid_leafs(struct cpuinfo_x86 *c) { }
 #endif /* CONFIG_CPU_SUP_INTEL */

 extern void init_spectral_chicken(struct cpuinfo_x86 *c);

-extern void get_cpu_cap(struct cpuinfo_x86 *c);
 extern void get_cpu_address_sizes(struct cpuinfo_x86 *c);
 extern void cpu_detect_cache_sizes(struct cpuinfo_x86 *c);
 extern void init_scattered_cpuid_features(struct cpuinfo_x86 *c);
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -269,19 +269,26 @@ static void detect_tme_early(struct cpui
 	c->x86_phys_bits -= keyid_bits;
 }

+void intel_unlock_cpuid_leafs(struct cpuinfo_x86 *c)
+{
+	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+		return;
+
+	if (c->x86 < 6 || (c->x86 == 6 && c->x86_model < 0xd))
+		return;
+
+	/*
+	 * The BIOS can have limited CPUID to leaf 2, which breaks feature
+	 * enumeration. Unlock it and update the maximum leaf info.
+	 */
+	if (msr_clear_bit(MSR_IA32_MISC_ENABLE, MSR_IA32_MISC_ENABLE_LIMIT_CPUID_BIT) > 0)
+		c->cpuid_level = cpuid_eax(0);
+}
+
 static void early_init_intel(struct cpuinfo_x86 *c)
 {
 	u64 misc_enable;

-	/* Unmask CPUID levels if masked: */
-	if (c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xd)) {
-		if (msr_clear_bit(MSR_IA32_MISC_ENABLE,
-				  MSR_IA32_MISC_ENABLE_LIMIT_CPUID_BIT) > 0) {
-			c->cpuid_level = cpuid_eax(0);
-			get_cpu_cap(c);
-		}
-	}
-
 	if ((c->x86 == 0xf && c->x86_model >= 0x03) ||
 		(c->x86 == 0x6 && c->x86_model >= 0x0e))
 		set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
Hi Thomas,
On 30.05.2024 at 18:24, Thomas Gleixner wrote:
The proper fix is obviously to unlock CPUID on Intel _before_ anything which depends on cpuid_level is evaluated.
Thanks,
tglx
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -969,7 +969,7 @@ static void init_speculation_control(str
 	}
 }

-void get_cpu_cap(struct cpuinfo_x86 *c)
+static void get_cpu_cap(struct cpuinfo_x86 *c)
 {
 	u32 eax, ebx, ecx, edx;

@@ -1585,6 +1585,7 @@ static void __init early_identify_cpu(st
 	if (have_cpuid_p()) {
 		cpu_detect(c);
 		get_cpu_vendor(c);
+		intel_unlock_cpuid_leafs(c);
 		get_cpu_cap(c);
 		setup_force_cpu_cap(X86_FEATURE_CPUID);
 		get_cpu_address_sizes(c);
@@ -1744,7 +1745,7 @@ static void generic_identify(struct cpui
 	cpu_detect(c);

 	get_cpu_vendor(c);
-
+	intel_unlock_cpuid_leafs(c);
 	get_cpu_cap(c);

 	get_cpu_address_sizes(c);
--- a/arch/x86/kernel/cpu/cpu.h
+++ b/arch/x86/kernel/cpu/cpu.h
@@ -61,14 +61,15 @@ extern __ro_after_init enum tsx_ctrl_sta

 extern void __init tsx_init(void);
 void tsx_ap_init(void);
+void intel_unlock_cpuid_leafs(struct cpuinfo_x86 *c);
 #else
 static inline void tsx_init(void) { }
 static inline void tsx_ap_init(void) { }
+static inline void intel_unlock_cpuid_leafs(struct cpuinfo_x86 *c) { }
 #endif /* CONFIG_CPU_SUP_INTEL */

 extern void init_spectral_chicken(struct cpuinfo_x86 *c);

-extern void get_cpu_cap(struct cpuinfo_x86 *c);
 extern void get_cpu_address_sizes(struct cpuinfo_x86 *c);
 extern void cpu_detect_cache_sizes(struct cpuinfo_x86 *c);
 extern void init_scattered_cpuid_features(struct cpuinfo_x86 *c);
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -269,19 +269,26 @@ static void detect_tme_early(struct cpui
 	c->x86_phys_bits -= keyid_bits;
 }

+void intel_unlock_cpuid_leafs(struct cpuinfo_x86 *c)
+{
+	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+		return;
+
+	if (c->x86 < 6 || (c->x86 == 6 && c->x86_model < 0xd))
+		return;
+
+	/*
+	 * The BIOS can have limited CPUID to leaf 2, which breaks feature
+	 * enumeration. Unlock it and update the maximum leaf info.
+	 */
+	if (msr_clear_bit(MSR_IA32_MISC_ENABLE, MSR_IA32_MISC_ENABLE_LIMIT_CPUID_BIT) > 0)
+		c->cpuid_level = cpuid_eax(0);
+}
+
 static void early_init_intel(struct cpuinfo_x86 *c)
 {
 	u64 misc_enable;

-	/* Unmask CPUID levels if masked: */
-	if (c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xd)) {
-		if (msr_clear_bit(MSR_IA32_MISC_ENABLE,
-				  MSR_IA32_MISC_ENABLE_LIMIT_CPUID_BIT) > 0) {
-			c->cpuid_level = cpuid_eax(0);
-			get_cpu_cap(c);
-		}
-	}
-
 	if ((c->x86 == 0xf && c->x86_model >= 0x03) ||
 		(c->x86 == 0x6 && c->x86_model >= 0x0e))
 		set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
With that patch applied, I now get a build error:
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/modules/hdcp/hdcp.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/modules/hdcp/hdcp1_execution.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/modules/hdcp/hdcp1_transition.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/modules/hdcp/hdcp2_execution.o
  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/modules/hdcp/hdcp2_transition.o
  LD [M]  drivers/gpu/drm/amd/amdgpu/amdgpu.o
  AR      drivers/gpu/built-in.a
  AR      drivers/built-in.a
make[1]: *** [/usr/src/linux/Makefile:1919: .] Error 2
make: *** [Makefile:240: __sub-make] Error 2
root@linus:/usr/src/linux# make
  CALL    scripts/checksyscalls.sh
  DESCEND objtool
  INSTALL libsubcmd_headers
  DESCEND bpf/resolve_btfids
  INSTALL libsubcmd_headers
  CC      arch/x86/xen/enlighten_pv.o
arch/x86/xen/enlighten_pv.c: In function 'xen_start_kernel':
arch/x86/xen/enlighten_pv.c:1388:9: error: implicit declaration of function 'get_cpu_cap'; did you mean 'set_cpu_cap'? [-Werror=implicit-function-declaration]
 1388 |         get_cpu_cap(&boot_cpu_data);
      |         ^~~~~~~~~~~
      |         set_cpu_cap
cc1: some warnings being treated as errors
make[4]: *** [scripts/Makefile.build:244: arch/x86/xen/enlighten_pv.o] Error 1
make[3]: *** [scripts/Makefile.build:485: arch/x86/xen] Error 2
make[2]: *** [scripts/Makefile.build:485: arch/x86] Error 2
make[1]: *** [/usr/src/linux/Makefile:1919: .] Error 2
make: *** [Makefile:240: __sub-make] Error 2
root@linus:/usr/src/linux#
I used the kernel config of my Proxmox VE kernel, like so:
root@linus:/usr/src/linux# cp /boot/config-6.5.13-5-pve .config
and then ran "make olddefconfig", and then "make -j 48". That's how I tested all these patches, including Martin's previously mentioned SCSI patch, and this used to work. I have attached the .config file.
I am not a C programmer, let alone a kernel dev, so please bear with me if this is nonsense, but: could the reason be that with your change, you have removed the declaration of get_cpu_cap from the cpu.h header file, while it is still being referenced in arch/x86/xen/enlighten_pv.c like so:
#include "../kernel/cpu/cpu.h" /* get_cpu_cap() */
Should I try to just add it back in, and see if that works? Or would you prefer to look more deeply at this first, and then send me a reworked patch?
Best regards,
Peter Schneider
Peter!
On Fri, May 31 2024 at 08:52, Peter Schneider wrote:
On 30.05.2024 at 18:24, Thomas Gleixner wrote:
With that patch applied, I now get a build error:
arch/x86/xen/enlighten_pv.c: In function 'xen_start_kernel':
arch/x86/xen/enlighten_pv.c:1388:9: error: implicit declaration of function 'get_cpu_cap'; did you mean 'set_cpu_cap'? [-Werror=implicit-function-declaration]
 1388 |         get_cpu_cap(&boot_cpu_data);
Bah. Updated patch below.
Thanks,
tglx
---
Subject: x86/topology/intel: Unlock CPUID before evaluating anything
From: Thomas Gleixner tglx@linutronix.de
Date: Thu, 30 May 2024 17:29:18 +0200
Intel CPUs have a MSR bit to limit CPUID enumeration to leaf two. If this bit is set by the BIOS then CPUID evaluation including topology enumeration does not work correctly as the evaluation code does not try to analyze any leaf greater than two.
This went unnoticed before because the original topology code just repeated evaluation several times and managed to overwrite the initial limited information with the correct one later. The new evaluation code does it once and therefore ends up with the limited and wrong information.
Cure this by unlocking CPUID right before evaluating anything which depends on the maximum CPUID leaf being greater than two instead of rereading stuff after unlock.
Fixes: 22d63660c35e ("x86/cpu: Use common topology code for Intel") Reported-by: Peter Schneider pschneider1968@googlemail.com Signed-off-by: Thomas Gleixner tglx@linutronix.de --- arch/x86/kernel/cpu/common.c | 3 ++- arch/x86/kernel/cpu/intel.c | 25 ++++++++++++++++--------- 2 files changed, 18 insertions(+), 10 deletions(-)
--- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -1585,6 +1585,7 @@ static void __init early_identify_cpu(st if (have_cpuid_p()) { cpu_detect(c); get_cpu_vendor(c); + intel_unlock_cpuid_leafs(c); get_cpu_cap(c); setup_force_cpu_cap(X86_FEATURE_CPUID); get_cpu_address_sizes(c); @@ -1744,7 +1745,7 @@ static void generic_identify(struct cpui cpu_detect(c);
get_cpu_vendor(c); - + intel_unlock_cpuid_leafs(c); get_cpu_cap(c);
get_cpu_address_sizes(c); --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -269,19 +269,26 @@ static void detect_tme_early(struct cpui c->x86_phys_bits -= keyid_bits; }
+void intel_unlock_cpuid_leafs(struct cpuinfo_x86 *c) +{ + if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL) + return; + + if (c->x86 < 6 || (c->x86 == 6 && c->x86_model < 0xd)) + return; + + /* + * The BIOS can have limited CPUID to leaf 2, which breaks feature + * enumeration. Unlock it and update the maximum leaf info. + */ + if (msr_clear_bit(MSR_IA32_MISC_ENABLE, MSR_IA32_MISC_ENABLE_LIMIT_CPUID_BIT) > 0) + c->cpuid_level = cpuid_eax(0); +} + static void early_init_intel(struct cpuinfo_x86 *c) { u64 misc_enable;
- /* Unmask CPUID levels if masked: */ - if (c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xd)) { - if (msr_clear_bit(MSR_IA32_MISC_ENABLE, - MSR_IA32_MISC_ENABLE_LIMIT_CPUID_BIT) > 0) { - c->cpuid_level = cpuid_eax(0); - get_cpu_cap(c); - } - } - if ((c->x86 == 0xf && c->x86_model >= 0x03) || (c->x86 == 0x6 && c->x86_model >= 0x0e)) set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
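One detail of the patch that is easy to verify: the new helper returns early when "c->x86 < 6 || (c->x86 == 6 && c->x86_model < 0xd)", while the removed block in early_init_intel() performed the unlock when "c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xd)". These should be exact complements, so the set of CPUs that get unlocked stays the same. A minimal userspace sketch that checks this by brute force follows; the function names are invented for illustration and are not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>

/* Condition under which the removed early_init_intel() code tried the unlock. */
bool old_gate_runs_unlock(unsigned int family, unsigned int model)
{
	return family > 6 || (family == 6 && model >= 0xd);
}

/* Condition under which the new intel_unlock_cpuid_leafs() returns early. */
bool new_gate_returns_early(unsigned int family, unsigned int model)
{
	return family < 6 || (family == 6 && model < 0xd);
}

/*
 * Count family/model pairs for which the two gates are NOT complements,
 * i.e. both say "unlock" or both say "skip". The 0..0xff ranges are an
 * arbitrary illustrative bound, not something taken from the kernel.
 */
unsigned int count_gate_disagreements(void)
{
	unsigned int family, model, bad = 0;

	for (family = 0; family <= 0xff; family++)
		for (model = 0; model <= 0xff; model++)
			if (old_gate_runs_unlock(family, model) ==
			    new_gate_returns_early(family, model))
				bad++;

	return bad;
}
```

If count_gate_disagreements() returns 0, the early return in the new helper skips exactly the CPUs the old code skipped.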
On Fri, May 31 2024 at 10:33, Thomas Gleixner wrote:
Clearly coffee did not set in yet.
Thanks,
tglx
---
Subject: x86/topology/intel: Unlock CPUID before evaluating anything
From: Thomas Gleixner tglx@linutronix.de
Date: Thu, 30 May 2024 17:29:18 +0200

Intel CPUs have a MSR bit to limit CPUID enumeration to leaf two. If this bit is set by the BIOS then CPUID evaluation including topology enumeration does not work correctly as the evaluation code does not try to analyze any leaf greater than two.

This went unnoticed before because the original topology code just repeated evaluation several times and managed to overwrite the initial limited information with the correct one later. The new evaluation code does it once and therefore ends up with the limited and wrong information.

Cure this by unlocking CPUID right before evaluating anything which depends on the maximum CPUID leaf being greater than two instead of rereading stuff after unlock.

Fixes: 22d63660c35e ("x86/cpu: Use common topology code for Intel")
Reported-by: Peter Schneider pschneider1968@googlemail.com
Signed-off-by: Thomas Gleixner tglx@linutronix.de
---
 arch/x86/kernel/cpu/common.c |    3 ++-
 arch/x86/kernel/cpu/cpu.h    |    2 ++
 arch/x86/kernel/cpu/intel.c  |   25 ++++++++++++++++---------
 3 files changed, 20 insertions(+), 10 deletions(-)

--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1585,6 +1585,7 @@ static void __init early_identify_cpu(st
 	if (have_cpuid_p()) {
 		cpu_detect(c);
 		get_cpu_vendor(c);
+		intel_unlock_cpuid_leafs(c);
 		get_cpu_cap(c);
 		setup_force_cpu_cap(X86_FEATURE_CPUID);
 		get_cpu_address_sizes(c);
@@ -1744,7 +1745,7 @@ static void generic_identify(struct cpui
 	cpu_detect(c);
 
 	get_cpu_vendor(c);
-
+	intel_unlock_cpuid_leafs(c);
 	get_cpu_cap(c);
 
 	get_cpu_address_sizes(c);
--- a/arch/x86/kernel/cpu/cpu.h
+++ b/arch/x86/kernel/cpu/cpu.h
@@ -61,9 +61,11 @@ extern __ro_after_init enum tsx_ctrl_sta
 
 extern void __init tsx_init(void);
 void tsx_ap_init(void);
+void intel_unlock_cpuid_leafs(struct cpuinfo_x86 *c);
 #else
 static inline void tsx_init(void) { }
 static inline void tsx_ap_init(void) { }
+static inline void intel_unlock_cpuid_leafs(struct cpuinfo_x86 *c) { }
 #endif /* CONFIG_CPU_SUP_INTEL */
 
 extern void init_spectral_chicken(struct cpuinfo_x86 *c);
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -269,19 +269,26 @@ static void detect_tme_early(struct cpui
 	c->x86_phys_bits -= keyid_bits;
 }
 
+void intel_unlock_cpuid_leafs(struct cpuinfo_x86 *c)
+{
+	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+		return;
+
+	if (c->x86 < 6 || (c->x86 == 6 && c->x86_model < 0xd))
+		return;
+
+	/*
+	 * The BIOS can have limited CPUID to leaf 2, which breaks feature
+	 * enumeration. Unlock it and update the maximum leaf info.
+	 */
+	if (msr_clear_bit(MSR_IA32_MISC_ENABLE, MSR_IA32_MISC_ENABLE_LIMIT_CPUID_BIT) > 0)
+		c->cpuid_level = cpuid_eax(0);
+}
+
 static void early_init_intel(struct cpuinfo_x86 *c)
 {
 	u64 misc_enable;
 
-	/* Unmask CPUID levels if masked: */
-	if (c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xd)) {
-		if (msr_clear_bit(MSR_IA32_MISC_ENABLE,
-				  MSR_IA32_MISC_ENABLE_LIMIT_CPUID_BIT) > 0) {
-			c->cpuid_level = cpuid_eax(0);
-			get_cpu_cap(c);
-		}
-	}
-
 	if ((c->x86 == 0xf && c->x86_model >= 0x03) ||
 		(c->x86 == 0x6 && c->x86_model >= 0x0e))
 		set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
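The ordering problem described in the changelog can be modelled in a few lines of userspace C. This is only a toy sketch under stated assumptions: struct fake_cpu and all function names are hypothetical, the limit bit is taken to be bit 22 of IA32_MISC_ENABLE, and the "evaluation" is reduced to asking whether a topology leaf (0xb or higher) is visible at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LIMIT_CPUID_BIT 22	/* assumed position of the CPUID limit bit */

/* Toy stand-in for a CPU; not a kernel structure. */
struct fake_cpu {
	uint64_t misc_enable;	/* simulated IA32_MISC_ENABLE contents */
	uint32_t real_max_leaf;	/* what CPUID(0).EAX reports when unlocked */
};

/* Simulated CPUID(0).EAX: capped at 2 while the limit bit is set. */
uint32_t fake_cpuid_max_leaf(const struct fake_cpu *cpu)
{
	bool limited = (cpu->misc_enable >> LIMIT_CPUID_BIT) & 1;

	if (limited && cpu->real_max_leaf > 2)
		return 2;
	return cpu->real_max_leaf;
}

/* Clear the limit bit; returns true if it had been set. */
bool fake_unlock(struct fake_cpu *cpu)
{
	uint64_t mask = 1ULL << LIMIT_CPUID_BIT;
	bool was_set = (cpu->misc_enable & mask) != 0;

	cpu->misc_enable &= ~mask;
	return was_set;
}

/* A topology leaf (0xb or 0x1f) is only reachable if max leaf covers it. */
bool topology_leaf_visible(uint32_t max_leaf)
{
	return max_leaf >= 0xb;
}

/* Evaluate once, unlock afterwards: the evaluation sees the capped value. */
bool evaluate_then_unlock(struct fake_cpu cpu)
{
	bool visible = topology_leaf_visible(fake_cpuid_max_leaf(&cpu));

	fake_unlock(&cpu);
	return visible;
}

/* Unlock first, evaluate once: the ordering the patch establishes. */
bool unlock_then_evaluate(struct fake_cpu cpu)
{
	fake_unlock(&cpu);
	return topology_leaf_visible(fake_cpuid_max_leaf(&cpu));
}
```

With the limit bit set, evaluate_then_unlock() reports no topology leaf while unlock_then_evaluate() does; with the bit clear, both agree. The old code got away with the first ordering only because it re-evaluated after the unlock.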
Hey Thomas,
On 31.05.2024 at 10:42, Thomas Gleixner wrote:
Bah. Updated patch below. Clearly coffee did not set in yet.
[...]
;-)
There seems to be an absolute lower limit of caffeine per ml blood serum, below which you just can't get things done right... I know that too!
Anyway, this last version of your patch fixes things for me, please see attached dmesg output. Thanks very much for investigating and fixing this issue!
Tested-by: Peter Schneider pschneider1968@googlemail.com
If you like, I can retest with your first patch (with additional debug info output) additionally applied on top of that and send the output, if that would be useful for you. Just let me know.
Best regards,
Peter Schneider
Peter!
On Fri, May 31 2024 at 11:41, Peter Schneider wrote:
Anyway, this last version of your patch fixes things for me, please see attached dmesg output. Thanks very much for investigating and fixing this issue!
Tested-by: Peter Schneider pschneider1968@googlemail.com
If you like, I can retest with your first patch (with additional debug info output) additionally applied on top of that and send the output, if that would be useful for you.
No need. I'm properly caffeinated and confident enough that this cures it. :)
Thanks a lot for testing and providing all the information!
tglx
On 31.05.2024 at 12:07, Thomas Gleixner wrote:
Thanks a lot for testing and providing all the information!
Refactoring messy legacy code is not an easy task. I'm glad I could help a tiny little bit so that you can get this done right!
Best regards,
Peter Schneider
On 31.05.24 10:42, Thomas Gleixner wrote:
On Fri, May 31 2024 at 10:33, Thomas Gleixner wrote:
Subject: x86/topology/intel: Unlock CPUID before evaluating anything
From: Thomas Gleixner tglx@linutronix.de
Date: Thu, 30 May 2024 17:29:18 +0200
Intel CPUs have a MSR bit to limit CPUID enumeration to leaf two. If this bit is set by the BIOS then CPUID evaluation including topology enumeration does not work correctly as the evaluation code does not try to analyze any leaf greater than two. [...]
TWIMC, I noticed a bug report with a "12 × 12th Gen Intel® Core™ i7-1255U" where the reporter also noticed a lot of messages like these:
archlinux kernel: [Firmware Bug]: CPU4: Topology domain 1 shift 7 != 6
archlinux kernel: [Firmware Bug]: CPU4: Topology domain 2 shift 7 != 6
archlinux kernel: [Firmware Bug]: CPU4: Topology domain 3 shift 7 != 6
Asked the reporter to test this patch. For details see: https://bugzilla.kernel.org/show_bug.cgi?id=218879
Ciao, Thorsten
#regzbot fix: x86/topology/intel: Unlock CPUID before evaluating anything
On Sat, Jun 01 2024 at 09:06, Linux regression tracking (Thorsten Leemhuis) wrote:
On 31.05.24 10:42, Thomas Gleixner wrote:
On Fri, May 31 2024 at 10:33, Thomas Gleixner wrote:
Subject: x86/topology/intel: Unlock CPUID before evaluating anything
From: Thomas Gleixner tglx@linutronix.de
Date: Thu, 30 May 2024 17:29:18 +0200
Intel CPUs have a MSR bit to limit CPUID enumeration to leaf two. If this bit is set by the BIOS then CPUID evaluation including topology enumeration does not work correctly as the evaluation code does not try to analyze any leaf greater than two. [...]
TWIMC, I noticed a bug report with a "12 × 12th Gen Intel® Core™ i7-1255U" where the reporter also noticed a lot of messages like these:
archlinux kernel: [Firmware Bug]: CPU4: Topology domain 1 shift 7 != 6
archlinux kernel: [Firmware Bug]: CPU4: Topology domain 2 shift 7 != 6
archlinux kernel: [Firmware Bug]: CPU4: Topology domain 3 shift 7 != 6
Asked the reporter to test this patch. For details see: https://bugzilla.kernel.org/show_bug.cgi?id=218879
Won't help. See: https://lore.kernel.org/all/87plt26m2b.ffs@tglx/
Thanks,
tglx
On 01.06.24 09:20, Thomas Gleixner wrote:
On Sat, Jun 01 2024 at 09:06, Linux regression tracking (Thorsten Leemhuis) wrote:
On 31.05.24 10:42, Thomas Gleixner wrote:
On Fri, May 31 2024 at 10:33, Thomas Gleixner wrote:
Subject: x86/topology/intel: Unlock CPUID before evaluating anything
From: Thomas Gleixner tglx@linutronix.de
Date: Thu, 30 May 2024 17:29:18 +0200
Intel CPUs have a MSR bit to limit CPUID enumeration to leaf two. If this bit is set by the BIOS then CPUID evaluation including topology enumeration does not work correctly as the evaluation code does not try to analyze any leaf greater than two. [...]
TWIMC, I noticed a bug report with a "12 × 12th Gen Intel® Core™ i7-1255U" where the reporter also noticed a lot of messages like these:
archlinux kernel: [Firmware Bug]: CPU4: Topology domain 1 shift 7 != 6
archlinux kernel: [Firmware Bug]: CPU4: Topology domain 2 shift 7 != 6
archlinux kernel: [Firmware Bug]: CPU4: Topology domain 3 shift 7 != 6
Asked the reporter to test this patch. For details see: https://bugzilla.kernel.org/show_bug.cgi?id=218879
Won't help. See: https://lore.kernel.org/all/87plt26m2b.ffs@tglx/
Ahh, it was the other problem in this thread. Sorry for not noticing that, had not followed things that closely. Forwarded that info to the ticket. Many thx! Ciao, Thorsten
On 24/05/30 06:24PM, Thomas Gleixner wrote:
On Thu, May 30 2024 at 17:53, Thomas Gleixner wrote:
Let me figure out how to fix that sanely.
The proper fix is obviously to unlock CPUID on Intel _before_ anything which depends on cpuid_level is evaluated.
Thanks,
tglx
Hey Thomas,
as reported on the other mail the proposed fix broke the build (see below) due to get_cpu_cap() becoming static but still being used in other parts of the code.
One of the reporters in the Arch Bugtracker with an Intel Core i7-7700k has tested a modified version of this fix[0] with the static change reversed on top of the 6.9.2 stable kernel and reports that the patch does not fix the issue for them. I have attached their output for the patched (dmesg6.9.2-1.5.log) and nonpatched (dmesg6.9.2-1.log) kernel.
Should we also get them to test the mainline version or do you need any other debug output?
Cheers, gromit
[0]: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/issues/57#...
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -969,7 +969,7 @@ static void init_speculation_control(str
 	}
 }
 
-void get_cpu_cap(struct cpuinfo_x86 *c)
+static void get_cpu_cap(struct cpuinfo_x86 *c)

Making this function static breaks the build for me:

arch/x86/xen/enlighten_pv.c: In function ‘xen_start_kernel’:
arch/x86/xen/enlighten_pv.c:1388:9: error: implicit declaration of function ‘get_cpu_cap’; did you mean ‘set_cpu_cap’? [-Wimplicit-function-declaration]
 1388 |         get_cpu_cap(&boot_cpu_data);
      |         ^~~~~~~~~~~
      |         set_cpu_cap

 {
 	u32 eax, ebx, ecx, edx;
 
@@ -1585,6 +1585,7 @@ static void __init early_identify_cpu(st
 	if (have_cpuid_p()) {
 		cpu_detect(c);
 		get_cpu_vendor(c);
+		intel_unlock_cpuid_leafs(c);
 		get_cpu_cap(c);
 		setup_force_cpu_cap(X86_FEATURE_CPUID);
 		get_cpu_address_sizes(c);
@@ -1744,7 +1745,7 @@ static void generic_identify(struct cpui
 	cpu_detect(c);
 
 	get_cpu_vendor(c);
-
+	intel_unlock_cpuid_leafs(c);
 	get_cpu_cap(c);
 
 	get_cpu_address_sizes(c);
--- a/arch/x86/kernel/cpu/cpu.h
+++ b/arch/x86/kernel/cpu/cpu.h
@@ -61,14 +61,15 @@ extern __ro_after_init enum tsx_ctrl_sta
 
 extern void __init tsx_init(void);
 void tsx_ap_init(void);
+void intel_unlock_cpuid_leafs(struct cpuinfo_x86 *c);
 #else
 static inline void tsx_init(void) { }
 static inline void tsx_ap_init(void) { }
+static inline void intel_unlock_cpuid_leafs(struct cpuinfo_x86 *c) { }
 #endif /* CONFIG_CPU_SUP_INTEL */
 
 extern void init_spectral_chicken(struct cpuinfo_x86 *c);
-extern void get_cpu_cap(struct cpuinfo_x86 *c);
 extern void get_cpu_address_sizes(struct cpuinfo_x86 *c);
 extern void cpu_detect_cache_sizes(struct cpuinfo_x86 *c);
 extern void init_scattered_cpuid_features(struct cpuinfo_x86 *c);
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -269,19 +269,26 @@ static void detect_tme_early(struct cpui
 	c->x86_phys_bits -= keyid_bits;
 }
 
+void intel_unlock_cpuid_leafs(struct cpuinfo_x86 *c)
+{
+	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+		return;
+
+	if (c->x86 < 6 || (c->x86 == 6 && c->x86_model < 0xd))
+		return;
+
+	/*
+	 * The BIOS can have limited CPUID to leaf 2, which breaks feature
+	 * enumeration. Unlock it and update the maximum leaf info.
+	 */
+	if (msr_clear_bit(MSR_IA32_MISC_ENABLE, MSR_IA32_MISC_ENABLE_LIMIT_CPUID_BIT) > 0)
+		c->cpuid_level = cpuid_eax(0);
+}
+
 static void early_init_intel(struct cpuinfo_x86 *c)
 {
 	u64 misc_enable;
 
-	/* Unmask CPUID levels if masked: */
-	if (c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xd)) {
-		if (msr_clear_bit(MSR_IA32_MISC_ENABLE,
-				  MSR_IA32_MISC_ENABLE_LIMIT_CPUID_BIT) > 0) {
-			c->cpuid_level = cpuid_eax(0);
-			get_cpu_cap(c);
-		}
-	}
-
 	if ((c->x86 == 0xf && c->x86_model >= 0x03) ||
 		(c->x86 == 0x6 && c->x86_model >= 0x0e))
 		set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
On 24/05/31 10:13AM, Christian Heusel wrote:
On 24/05/30 06:24PM, Thomas Gleixner wrote:
On Thu, May 30 2024 at 17:53, Thomas Gleixner wrote:
Let me figure out how to fix that sanely.
The proper fix is obviously to unlock CPUID on Intel _before_ anything which depends on cpuid_level is evaluated.
Thanks,
tglx
Hey Thomas,
as reported on the other mail the proposed fix broke the build (see below) due to get_cpu_cap() becoming static but still being used in other parts of the code.
One of the reporters in the Arch Bugtracker with an Intel Core i7-7700k has tested a modified version of this fix[0] with the static change reversed on top of the 6.9.2 stable kernel and reports that the patch does not fix the issue for them. I have attached their output for the patched (dmesg6.9.2-1.5.log) and nonpatched (dmesg6.9.2-1.log) kernel.
Should we also get them to test the mainline version or do you need any other debug output?
Cheers, gromit
Now with the logs really attached!
Cheers, Chris
Christian!
On Fri, May 31 2024 at 10:16, Christian Heusel wrote:
One of the reporters in the Arch Bugtracker with an Intel Core i7-7700k has tested a modified version of this fix[0] with the static change reversed on top of the 6.9.2 stable kernel and reports that the patch does not fix the issue for them. I have attached their output for the patched (dmesg6.9.2-1.5.log) and nonpatched (dmesg6.9.2-1.log) kernel.
Should we also get them to test the mainline version or do you need any other debug output?
Can I get:
- dmesg from 6.8.y kernel
- output of cpuid -r
- content of /sys/kernel/debug/x86/topo/cpus/* (on 6.9.y)
please?
Thanks,
tglx
On Fri, May 31 2024 at 10:48, Thomas Gleixner wrote:
Christian!
On Fri, May 31 2024 at 10:16, Christian Heusel wrote:
One of the reporters in the Arch Bugtracker with an Intel Core i7-7700k has tested a modified version of this fix[0] with the static change reversed on top of the 6.9.2 stable kernel and reports that the patch does not fix the issue for them. I have attached their output for the patched (dmesg6.9.2-1.5.log) and nonpatched (dmesg6.9.2-1.log) kernel.
Should we also get them to test the mainline version or do you need any other debug output?
Can I get:
- dmesg from 6.8.y kernel
- output of cpuid -r
- content of /sys/kernel/debug/x86/topo/cpus/* (on 6.9.y)
please?
It seems there are two different issues here. The dmesg you provided is from an i7-1255U, which is a hybrid CPU. The i7-7700k has 4 cores (8 threads), so the root cause is not necessarily the same.
Thanks,
tglx
On 24/05/31 11:11AM, Thomas Gleixner wrote:
On Fri, May 31 2024 at 10:48, Thomas Gleixner wrote:
It seems there are two different issues here. The dmesg you provided is from an i7-1255U, which is a hybrid CPU. The i7-7700k has 4 cores (8 threads), so the root cause is not necessarily the same.
It seems like I was also below my needed caffeine levels :p The person reporting (in the same thread) with the i7-7700k reports the problem fixed[1] as well, so this is in line with Peter's observations!
The other person with the i7-1255U in the meantime got back to me with the needed outputs:
Can I get:
- dmesg from 6.8.y kernel
See attachment (dmesg6.8.9-arch1-2.log)
- output of cpuid -r
Basic Leafs :
================
0x00000000: EAX=0x00000020, EBX=0x756e6547, ECX=0x6c65746e, EDX=0x49656e69
0x00000001: EAX=0x000906a4, EBX=0x12400800, ECX=0x7ffafbff, EDX=0xbfebfbff
0x00000002: EAX=0x00feff01, EBX=0x00000000, ECX=0x00000000, EDX=0x00000000
0x00000004: subleafs:
  0: EAX=0x7c004121, EBX=0x01c0003f, ECX=0x0000003f, EDX=0x00000000
  1: EAX=0x7c004122, EBX=0x01c0003f, ECX=0x0000007f, EDX=0x00000000
  2: EAX=0x7c01c143, EBX=0x03c0003f, ECX=0x000007ff, EDX=0x00000000
  3: EAX=0x7c0fc163, EBX=0x02c0003f, ECX=0x00003fff, EDX=0x00000004
0x00000005: EAX=0x00000040, EBX=0x00000040, ECX=0x00000003, EDX=0x10102020
0x00000006: EAX=0x00df8ff7, EBX=0x00000002, ECX=0x00000409, EDX=0x00020003
0x00000007: subleafs:
  0: EAX=0x00000002, EBX=0x239ca7eb, ECX=0x984007bc, EDX=0xfc18c410
  1: EAX=0x00400810, EBX=0x00000000, ECX=0x00000000, EDX=0x00040000
  2: EAX=0x00000000, EBX=0x00000000, ECX=0x00000000, EDX=0x00000017
0x0000000a: EAX=0x07300605, EBX=0x00000000, ECX=0x00000007, EDX=0x00008603
0x0000000b: subleafs:
  0: EAX=0x00000001, EBX=0x00000001, ECX=0x00000100, EDX=0x00000012
  1: EAX=0x00000006, EBX=0x0000000c, ECX=0x00000201, EDX=0x00000012
0x0000000d: subleafs:
  0: EAX=0x00000207, EBX=0x00000a88, ECX=0x00000a88, EDX=0x00000000
  1: EAX=0x0000000f, EBX=0x00000680, ECX=0x00009900, EDX=0x00000000
  2: EAX=0x00000100, EBX=0x00000240, ECX=0x00000000, EDX=0x00000000
  8: EAX=0x00000080, EBX=0x00000000, ECX=0x00000001, EDX=0x00000000
  9: EAX=0x00000008, EBX=0x00000a80, ECX=0x00000000, EDX=0x00000000
 11: EAX=0x00000010, EBX=0x00000000, ECX=0x00000001, EDX=0x00000000
 12: EAX=0x00000018, EBX=0x00000000, ECX=0x00000001, EDX=0x00000000
 15: EAX=0x00000328, EBX=0x00000000, ECX=0x00000001, EDX=0x00000000
0x00000010: subleafs:
  0: EAX=0x00000000, EBX=0x00000004, ECX=0x00000000, EDX=0x00000000
  2: EAX=0x0000000f, EBX=0x00000000, ECX=0x00000004, EDX=0x0000000f
0x00000014: subleafs:
  0: EAX=0x00000001, EBX=0x0000005f, ECX=0x80000007, EDX=0x00000000
  1: EAX=0x02490002, EBX=0x003f003f, ECX=0x00000000, EDX=0x00000000
0x00000015: EAX=0x00000002, EBX=0x00000088, ECX=0x0249f000, EDX=0x00000000
0x00000016: EAX=0x00000a28, EBX=0x00000dac, ECX=0x00000064, EDX=0x00000000
0x00000018: subleafs:
  0: EAX=0x00000004, EBX=0x00000000, ECX=0x00000000, EDX=0x00000000
  1: EAX=0x00000000, EBX=0x00300001, ECX=0x00000001, EDX=0x00000121
  2: EAX=0x00000000, EBX=0x00040003, ECX=0x00000200, EDX=0x00000043
  3: EAX=0x00000000, EBX=0x00400001, ECX=0x00000001, EDX=0x00000122
  4: EAX=0x00000000, EBX=0x00080008, ECX=0x00000001, EDX=0x00000143
0x0000001a: EAX=0x20000001, EBX=0x00000000, ECX=0x00000000, EDX=0x00000000
0x0000001c: EAX=0xc000000b, EBX=0x00000007, ECX=0x00000007, EDX=0x00000000
0x0000001f: subleafs:
  0: EAX=0x00000001, EBX=0x00000001, ECX=0x00000100, EDX=0x00000012
  1: EAX=0x00000007, EBX=0x0000000c, ECX=0x00000201, EDX=0x00000012
  2: EAX=0x00000000, EBX=0x00000000, ECX=0x00000002, EDX=0x00000012
  3: EAX=0x00000000, EBX=0x00000000, ECX=0x00000003, EDX=0x00000012
  4: EAX=0x00000000, EBX=0x00000000, ECX=0x00000004, EDX=0x00000012
  5: EAX=0x00000000, EBX=0x00000000, ECX=0x00000005, EDX=0x00000012
  6: EAX=0x00000000, EBX=0x00000000, ECX=0x00000006, EDX=0x00000012
  7: EAX=0x00000000, EBX=0x00000000, ECX=0x00000007, EDX=0x00000012
  8: EAX=0x00000000, EBX=0x00000000, ECX=0x00000008, EDX=0x00000012
  9: EAX=0x00000000, EBX=0x00000000, ECX=0x00000009, EDX=0x00000012
 10: EAX=0x00000000, EBX=0x00000000, ECX=0x0000000a, EDX=0x00000012
 11: EAX=0x00000000, EBX=0x00000000, ECX=0x0000000b, EDX=0x00000012
 12: EAX=0x00000000, EBX=0x00000000, ECX=0x0000000c, EDX=0x00000012
 13: EAX=0x00000000, EBX=0x00000000, ECX=0x0000000d, EDX=0x00000012
 14: EAX=0x00000000, EBX=0x00000000, ECX=0x0000000e, EDX=0x00000012
 15: EAX=0x00000000, EBX=0x00000000, ECX=0x0000000f, EDX=0x00000012
 16: EAX=0x00000000, EBX=0x00000000, ECX=0x00000010, EDX=0x00000012
 17: EAX=0x00000000, EBX=0x00000000, ECX=0x00000011, EDX=0x00000012
 18: EAX=0x00000000, EBX=0x00000000, ECX=0x00000012, EDX=0x00000012
 19: EAX=0x00000000, EBX=0x00000000, ECX=0x00000013, EDX=0x00000012
 20: EAX=0x00000000, EBX=0x00000000, ECX=0x00000014, EDX=0x00000012
 21: EAX=0x00000000, EBX=0x00000000, ECX=0x00000015, EDX=0x00000012
 22: EAX=0x00000000, EBX=0x00000000, ECX=0x00000016, EDX=0x00000012
 23: EAX=0x00000000, EBX=0x00000000, ECX=0x00000017, EDX=0x00000012
 24: EAX=0x00000000, EBX=0x00000000, ECX=0x00000018, EDX=0x00000012
 25: EAX=0x00000000, EBX=0x00000000, ECX=0x00000019, EDX=0x00000012
 26: EAX=0x00000000, EBX=0x00000000, ECX=0x0000001a, EDX=0x00000012
 27: EAX=0x00000000, EBX=0x00000000, ECX=0x0000001b, EDX=0x00000012
 28: EAX=0x00000000, EBX=0x00000000, ECX=0x0000001c, EDX=0x00000012
 29: EAX=0x00000000, EBX=0x00000000, ECX=0x0000001d, EDX=0x00000012
 30: EAX=0x00000000, EBX=0x00000000, ECX=0x0000001e, EDX=0x00000012
 31: EAX=0x00000000, EBX=0x00000000, ECX=0x0000001f, EDX=0x00000012
0x00000020: EAX=0x00000000, EBX=0x00000001, ECX=0x00000000, EDX=0x00000000

Extended Leafs :
================
0x80000000: EAX=0x80000008, EBX=0x00000000, ECX=0x00000000, EDX=0x00000000
0x80000001: EAX=0x00000000, EBX=0x00000000, ECX=0x00000121, EDX=0x2c100800
0x80000002: EAX=0x68743231, EBX=0x6e654720, ECX=0x746e4920, EDX=0x52286c65
0x80000003: EAX=0x6f432029, EBX=0x54286572, ECX=0x6920294d, EDX=0x32312d37
0x80000004: EAX=0x00553535, EBX=0x00000000, ECX=0x00000000, EDX=0x00000000
0x80000006: EAX=0x00000000, EBX=0x00000000, ECX=0x08008040, EDX=0x00000000
0x80000007: EAX=0x00000000, EBX=0x00000000, ECX=0x00000000, EDX=0x00000100
0x80000008: EAX=0x00003027, EBX=0x00000000, ECX=0x00000000, EDX=0x00000000
- content of /sys/kernel/debug/x86/topo/cpus/* (on 6.9.y)
See attachment (cat_debug.log)
Cheers, chris
[1]: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/issues/57#...
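As a sanity check that the raw dump above really comes from the i7-1255U, the vendor ID (leaf 0) and the brand string (leaves 0x80000002 to 0x80000004) can be decoded by hand: each register holds four ASCII bytes, least significant byte first. A small userspace sketch follows; the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Append the four bytes of one CPUID register to buf, least significant
 * byte first, and return the new length.
 */
size_t append_reg(char *buf, size_t len, uint32_t reg)
{
	int i;

	for (i = 0; i < 4; i++)
		buf[len++] = (char)((reg >> (8 * i)) & 0xff);
	return len;
}

/* Leaf 0 stores the vendor ID in EBX, EDX, ECX (in that order). */
void decode_vendor(uint32_t ebx, uint32_t ecx, uint32_t edx, char out[13])
{
	size_t len = 0;

	len = append_reg(out, len, ebx);
	len = append_reg(out, len, edx);
	len = append_reg(out, len, ecx);
	out[len] = '\0';
}

/*
 * Leaves 0x80000002..0x80000004 hold the brand string. regs[] is the 12
 * registers in EAX, EBX, ECX, EDX order for each of the three leaves.
 */
void decode_brand(const uint32_t regs[12], char out[49])
{
	size_t len = 0;
	int i;

	for (i = 0; i < 12; i++)
		len = append_reg(out, len, regs[i]);
	out[len] = '\0';
}
```

Fed with the values from the dump, decode_vendor() yields "GenuineIntel" and decode_brand() yields "12th Gen Intel(R) Core(TM) i7-1255U".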
On Fri, May 31 2024 at 15:08, Christian Heusel wrote:
On 24/05/31 11:11AM, Thomas Gleixner wrote:
On Fri, May 31 2024 at 10:48, Thomas Gleixner wrote:
It seems there are two different issues here. The dmesg you provided is from an i7-1255U, which is a hybrid CPU. The i7-7700k has 4 cores (8 threads), so the root cause is not necessarily the same.
It seems like I was also below my needed caffeine levels :p The person reporting (in the same thread) with the i7-7700k reports the problem fixed[1] as well, so this is in line with Peter's observations!
Cool!
The other person with the i7-1255U in the meantime got back to me with the needed outputs:
- output of cpuid -r
0x0000000b: subleafs:
  0: EAX=0x00000001, EBX=0x00000001, ECX=0x00000100, EDX=0x00000012
  1: EAX=0x00000006, EBX=0x0000000c, ECX=0x00000201, EDX=0x00000012

0x0000001f: subleafs:
  0: EAX=0x00000001, EBX=0x00000001, ECX=0x00000100, EDX=0x00000012
  1: EAX=0x00000007, EBX=0x0000000c, ECX=0x00000201, EDX=0x00000012
So this is inconsistent already. Both leafs should describe the same topology. See the differing EAX values (6/7) in subleaf 1, which are exactly the values the kernel complains about :)
But that should not be an issue, because the kernel prefers 0x1f over 0xb and will never evaluate both. Then again, this is from just one randomly picked CPU.
I wonder which variant of the cpuid tool that is. cpuid -r usually just gives you the plain values and collects them for all CPUs.
I really need to have the values for all CPUs to see whether there are differences at the relevant places. The above is probably from one of the E-Cores.
Thanks,
tglx
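The 6 versus 7 mismatch can be read straight out of the register values. In leaves 0xb and 0x1f, EAX[4:0] of a subleaf is the number of bits to shift the x2APIC ID right to reach the next level, and ECX[15:8] is the level type (1 = SMT, 2 = core). A minimal userspace sketch of that comparison follows; the struct and function names are hypothetical, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One subleaf of CPUID leaf 0xb or 0x1f, as printed in the dump. */
struct topo_subleaf {
	uint32_t eax, ebx, ecx, edx;
};

/* EAX[4:0]: bits to shift the x2APIC ID right to reach the next level. */
unsigned int topo_shift(const struct topo_subleaf *s)
{
	return s->eax & 0x1f;
}

/* ECX[15:8]: level type (1 = SMT, 2 = core). */
unsigned int topo_level_type(const struct topo_subleaf *s)
{
	return (s->ecx >> 8) & 0xff;
}

/* True if both subleafs describe the same level type with the same shift. */
bool topo_subleaf_consistent(const struct topo_subleaf *a,
			     const struct topo_subleaf *b)
{
	return topo_level_type(a) == topo_level_type(b) &&
	       topo_shift(a) == topo_shift(b);
}
```

For subleaf 1 of the dump, both leaves report level type 2 (core), but leaf 0xb gives a shift of 6 and leaf 0x1f a shift of 7, so topo_subleaf_consistent() returns false. Subleaf 0 (SMT, shift 1) is consistent.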
On 24/05/31 03:42PM, Thomas Gleixner wrote:
On Fri, May 31 2024 at 15:08, Christian Heusel wrote:
The other person with the i7-1255U in the meantime got back to me with the needed outputs:
- output of cpuid -r
0x0000000b: subleafs:
  0: EAX=0x00000001, EBX=0x00000001, ECX=0x00000100, EDX=0x00000012
  1: EAX=0x00000006, EBX=0x0000000c, ECX=0x00000201, EDX=0x00000012

0x0000001f: subleafs:
  0: EAX=0x00000001, EBX=0x00000001, ECX=0x00000100, EDX=0x00000012
  1: EAX=0x00000007, EBX=0x0000000c, ECX=0x00000201, EDX=0x00000012
So this is inconsistent already. Both leafs should describe the same topology. See the differing EAX values (6/7) in subleaf 1, which are exactly the values the kernel complains about :)
But that should not be an issue, because the kernel prefers 0x1f over 0xb and will never evaluate both. Then again, this is from just one randomly picked CPU.
I wonder which variant of the cpuid tool that is. cpuid -r usually just gives you the plain values and collects them for all CPUs.
The previously attached one is output from the version located here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tool...
The one I have now attached is the one being built from this: https://www.etallen.com/cpuid.html
Cheers, Chris
On Fri, May 31 2024 at 16:29, Christian Heusel wrote:
P-Cores are consistent:
CPU 0:
   0x0000000b 0x01: eax=0x00000006 ebx=0x0000000c ecx=0x00000201 edx=0x00000000
   0x0000001f 0x00: eax=0x00000001 ebx=0x00000002 ecx=0x00000100 edx=0x00000000

E-Cores are not:

CPU 4:
   0x0000000b 0x01: eax=0x00000006 ebx=0x0000000c ecx=0x00000201 edx=0x00000010
   0x0000001f 0x01: eax=0x00000007 ebx=0x0000000c ecx=0x00000201 edx=0x00000010
As the topology is evaluated from CPU0 CPUID leaf 0x1f it's obvious that CPU4...11 will trigger the sanity checks because their CPUID leaf 0x1f subleaf 1 entries are bogus.
IOW, it's a firmware bug, and there is nothing the kernel can or will do about it beyond what it already does: complain about the inconsistency.
Thanks for providing all the information!
tglx
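The sanity check described above boils down to comparing each CPU's per-domain shift values against the reference taken from the boot CPU. The following is a rough userspace sketch of that idea, not the kernel's implementation: the function name, the four-entry domain array and the assumption that the domains above CORE inherit the core shift are all illustrative. Only the message format is copied from the log lines quoted in this thread.

```c
#include <assert.h>
#include <stdio.h>

#define NUM_DOMAINS 4	/* illustrative; not the kernel's domain count */

/*
 * Compare one CPU's per-domain shifts against the reference from the boot
 * CPU. Prints one line per mismatch and returns the number of mismatches.
 */
unsigned int check_topo_shifts(unsigned int cpu,
			       const unsigned int ref[NUM_DOMAINS],
			       const unsigned int seen[NUM_DOMAINS])
{
	unsigned int dom, mismatches = 0;

	for (dom = 0; dom < NUM_DOMAINS; dom++) {
		if (seen[dom] == ref[dom])
			continue;
		printf("[Firmware Bug]: CPU%u: Topology domain %u shift %u != %u\n",
		       cpu, dom, seen[dom], ref[dom]);
		mismatches++;
	}
	return mismatches;
}
```

With a reference of { 1, 6, 6, 6 } from CPU0 and { 1, 7, 7, 7 } from CPU4, this prints three lines for domains 1 to 3, matching the shape of the messages reported for the i7-1255U.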
On Fri, May 31 2024 at 10:16, Christian Heusel wrote:
On 24/05/31 10:13AM, Christian Heusel wrote:
[ 0.046127] TSC deadline timer available
[ 0.046129] CPU topo: Max. logical packages: 1
[ 0.046129] CPU topo: Max. logical dies: 1
[ 0.046129] CPU topo: Max. dies per package: 1
[ 0.046131] CPU topo: Max. threads per core: 2
[ 0.046132] CPU topo: Num. cores per package: 10
[ 0.046132] CPU topo: Num. threads per package: 12
[ 0.046132] CPU topo: Allowing 12 present CPUs plus 0 hotplug CPUs
This looks correct.
[ 0.117308] smpboot: x86: Booting SMP configuration:
[ 0.117308] .... node #0, CPUs: #2 #4 #5 #6 #7 #8 #9 #10 #11
[ 0.009676] [Firmware Bug]: CPU4: Topology domain 1 shift 7 != 6
So this means that the E-Cores have different topology information for the CORE shift value than the P-Cores, which is definitely wrong.
Let's see what cpuid -r reports.
Thanks,
tglx
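The "CPU topo" numbers quoted above (10 cores, 12 threads, at most 2 threads per core) are enough to work out the hybrid split. Assuming every core has either one or two threads, p + e = cores and 2p + e = threads, so p = threads - cores. A small sketch of that arithmetic follows; the function name is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Solve p + e = cores and 2*p + e = threads for the number of two-thread
 * cores (p) and single-thread cores (e). Returns false if no split with
 * non-negative p and e exists.
 */
bool split_hybrid_cores(unsigned int cores, unsigned int threads,
			unsigned int *p, unsigned int *e)
{
	if (threads < cores || threads - cores > cores)
		return false;
	*p = threads - cores;
	*e = cores - *p;
	return true;
}
```

For 10 cores and 12 threads this gives 2 two-thread cores and 8 single-thread cores, which is consistent with CPU4...11 (eight CPUs) being the E-Cores mentioned elsewhere in this thread.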
linux-stable-mirror@lists.linaro.org