[PATCH AUTOSEL 5.10 1/6] pm: cpupower: bench: Prevent NULL dereference on malloc failure

List overview All Threads
Download

newer

older

Recall: Re: [PATCH v3 1/2]...

[Patch v2 1/4] Drivers: hv:...

Sasha Levin

31 Mar 2025 31 Mar '25

2:37 p.m.

From: Zhongqiu Han quic_zhonhan@quicinc.com

[ Upstream commit 208baa3ec9043a664d9acfb8174b332e6b17fb69 ]

If malloc returns NULL due to low memory, 'config' pointer can be NULL. Add a check to prevent NULL dereference.

Link: https://lore.kernel.org/r/20250219122715.3892223-1-quic_zhonhan@quicinc.com Signed-off-by: Zhongqiu Han quic_zhonhan@quicinc.com Signed-off-by: Shuah Khan skhan@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/power/cpupower/bench/parse.c | 4 ++++ 1 file changed, 4 insertions(+)

diff --git a/tools/power/cpupower/bench/parse.c b/tools/power/cpupower/bench/parse.c index e63dc11fa3a53..48e25be6e1635 100644 --- a/tools/power/cpupower/bench/parse.c +++ b/tools/power/cpupower/bench/parse.c @@ -120,6 +120,10 @@ FILE *prepare_output(const char *dirname) struct config *prepare_default_config() { struct config *config = malloc(sizeof(struct config)); + if (!config) { + perror("malloc"); + return NULL; + }

dprintf("loading defaults\n");

-- 2.39.5

Show replies by date

Sasha Levin

31 Mar 31 Mar

2:37 p.m.

New subject: [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine

From: Max Grobecker max@grobecker.info

[ Upstream commit a4248ee16f411ac1ea7dfab228a6659b111e3d65 ]

When running in a virtual machine, we might see the original hardware CPU vendor string (i.e. "AuthenticAMD"), but a model and family ID set by the hypervisor. In case we run on AMD hardware and the hypervisor sets a model ID < 0x14, the LAHF cpu feature is eliminated from the the list of CPU capabilities present to circumvent a bug with some BIOSes in conjunction with AMD K8 processors.

Parsing the flags list from /proc/cpuinfo seems to be happening mostly in bash scripts and prebuilt Docker containers, as it does not need to have additionals tools present – even though more reliable ways like using "kcpuid", which calls the CPUID instruction instead of parsing a list, should be preferred. Scripts, that use /proc/cpuinfo to determine if the current CPU is "compliant" with defined microarchitecture levels like x86-64-v2 will falsely claim the CPU is incapable of modern CPU instructions when "lahf_lm" is missing in that flags list.

This can prevent some docker containers from starting or build scripts to create unoptimized binaries.

Admittably, this is more a small inconvenience than a severe bug in the kernel and the shoddy scripts that rely on parsing /proc/cpuinfo should be fixed instead.

This patch adds an additional check to see if we're running inside a virtual machine (X86_FEATURE_HYPERVISOR is present), which, to my understanding, can't be present on a real K8 processor as it was introduced only with the later/other Athlon64 models.

Example output with the "lahf_lm" flag missing in the flags list (should be shown between "hypervisor" and "abm"):

$ cat /proc/cpuinfo processor : 0 vendor_id : AuthenticAMD cpu family : 15 model : 6 model name : Common KVM processor stepping : 1 microcode : 0x1000065 cpu MHz : 2599.998 cache size : 512 KB physical id : 0 siblings : 1 core id : 0 cpu cores : 1 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c hypervisor abm 3dnowprefetch vmmcall bmi1 avx2 bmi2 xsaveopt

... while kcpuid shows the feature to be present in the CPU:

# kcpuid -d | grep lahf lahf_lm - LAHF/SAHF available in 64-bit mode

[ mingo: Updated the comment a bit, incorporated Boris's review feedback. ]

Signed-off-by: Max Grobecker max@grobecker.info Signed-off-by: Ingo Molnar mingo@kernel.org Cc: linux-kernel@vger.kernel.org Cc: Borislav Petkov bp@alien8.de Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kernel/cpu/amd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c index c10f7dcaa7b7c..5f0bdb53b0067 100644 --- a/arch/x86/kernel/cpu/amd.c +++ b/arch/x86/kernel/cpu/amd.c @@ -839,7 +839,7 @@ static void init_amd_k8(struct cpuinfo_x86 *c) * (model = 0x14) and later actually support it. * (AMD Erratum #110, docId: 25759). */ - if (c->x86_model < 0x14 && cpu_has(c, X86_FEATURE_LAHF_LM)) { + if (c->x86_model < 0x14 && cpu_has(c, X86_FEATURE_LAHF_LM) && !cpu_has(c, X86_FEATURE_HYPERVISOR)) { clear_cpu_cap(c, X86_FEATURE_LAHF_LM); if (!rdmsrl_amd_safe(0xc001100d, &value)) { value &= ~BIT_64(32);

-- 2.39.5

Pavel Machek

18 Apr 18 Apr

4:54 p.m.

New subject: [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine

Hi!

...

From: Max Grobecker max@grobecker.info

[ Upstream commit a4248ee16f411ac1ea7dfab228a6659b111e3d65 ]

...

This can prevent some docker containers from starting or build scripts to create unoptimized binaries.

Admittably, this is more a small inconvenience than a severe bug in the kernel and the shoddy scripts that rely on parsing /proc/cpuinfo should be fixed instead.

I'd say this is not good stable candidate.

Best regards, Pavel

...

+++ b/arch/x86/kernel/cpu/amd.c @@ -839,7 +839,7 @@ static void init_amd_k8(struct cpuinfo_x86 *c) * (model = 0x14) and later actually support it. * (AMD Erratum #110, docId: 25759). */

if (c->x86_model < 0x14 && cpu_has(c, X86_FEATURE_LAHF_LM)) {

if (c->x86_model < 0x14 && cpu_has(c, X86_FEATURE_LAHF_LM) && !cpu_has(c, X86_FEATURE_HYPERVISOR)) { clear_cpu_cap(c, X86_FEATURE_LAHF_LM); if (!rdmsrl_amd_safe(0xc001100d, &value)) { value &= ~BIT_64(32);

-- DENX Software Engineering GmbH, Managing Director: Erika Unter HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany

Sean Christopherson

5:19 p.m.

New subject: [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine

On Fri, Apr 18, 2025, Pavel Machek wrote:

...

Hi!

...
From: Max Grobecker max@grobecker.info

[ Upstream commit a4248ee16f411ac1ea7dfab228a6659b111e3d65 ]

...
This can prevent some docker containers from starting or build scripts to create unoptimized binaries.

Admittably, this is more a small inconvenience than a severe bug in the kernel and the shoddy scripts that rely on parsing /proc/cpuinfo should be fixed instead.

Uh, and the hypervisor too? Why is the hypervisor enumerating an old K8 CPU for what appears to be a modern workload?

...

I'd say this is not good stable candidate.

Eh, practically speaking, there's no chance of this causing problems. The setup is all kinds of weird, but AIUI, K8 CPUs don't support virtualization so there's no chance that the underlying CPU is actually affected by the K8 bug, because the underlying CPU can't be K8. And no bare metal CPU will ever set the HYPERVISOR bit, so there won't be false positives on that front.

I personally object to the patch itself; it's not the kernel's responsibility to deal with a misconfigured VM. But unless we revert the commit, I don't see any reason to withhold this from stable@.

Borislav Petkov

5:36 p.m.

New subject: [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine

On Fri, Apr 18, 2025 at 10:19:14AM -0700, Sean Christopherson wrote:

...

Uh, and the hypervisor too? Why is the hypervisor enumerating an old K8 CPU for what appears to be a modern workload?

...
I'd say this is not good stable candidate.

Eh, practically speaking, there's no chance of this causing problems. The setup is all kinds of weird, but AIUI, K8 CPUs don't support virtualization so there's no chance that the underlying CPU is actually affected by the K8 bug, because the underlying CPU can't be K8. And no bare metal CPU will ever set the HYPERVISOR bit, so there won't be false positives on that front.

I personally object to the patch itself; it's not the kernel's responsibility to deal with a misconfigured VM. But unless we revert the commit, I don't see any reason to withhold this from stable@.

I objected back then but it is some obscure VM migration madness (pasting the whole reply here because it didn't land on any ML):

Date: Tue, 17 Dec 2024 21:32:21 +0100 From: Max Grobecker max@grobecker.info To: Thomas Gleixner tglx@linutronix.de, Ingo Molnar mingo@redhat.com, Borislav Petkov bp@alien8.de, Dave Hansen dave.hansen@linux.intel.com Cc: Max Grobecker max@grobecker.info, x86@kernel.org Subject: Re: [PATCH v2] Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine Message-ID: <d77caeea-b922-4bf5-8349-4b5acab4d2eb> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset=utf-8

Hi,

sorry for my late response, was hit by a flu last days.

On Tue, 10 Dec 2024 13:51:50 +0100, Borislav Petkov wrote:

...

Lemme get this straight: you - I don't know who "we" is - are running K8 models in guests? Why?

Oh, I see, I missed to explain that, indeed.

This error happens, when I start a virtual machine using libvirt/QEMU while not passing through the host CPU. I do this, because I want to be able to live-migrate the VM between hosts, that have slightly different CPUs. Migration works, but only if I choose the generic "kvm64" CPU preset to be used with QEMU using the "-cpu kvm64" parameter:

qemu-system-x86_64 -cpu kvm64

I also explicitly enabled additional features like SSE4.1 or AXV2 to have as most features as I can but still being able to do live-migration between hosts.

By using this config, the CPU is identified as "Common KVM processor" inside the VM:

processor : 0 vendor_id : AuthenticAMD cpu family : 15 model : 6 model name : Common KVM processor

Also, the model reads as 0x06, which is set by that kvm64 CPU preset, but usually does not pose a problem.

The original vendor id of the host CPU is still visible to the guest, and in case the host uses an AMD CPU the combination of "AuthenticAMD" and model 0x06 triggers the bug and the lahf_lm flag vanishes. If the guest is running with the same settings on an Intel CPU and therefore reads "GenuineIntel" as the vendor string, the model is still 0x06, but also the lahf_lm flag is still listed in /proc/cpuinfo.

The CPU is mistakenly identified to be an AMD K8 model, while, in fact, nearly all features a modern Epyc or Xeon CPU is offering, are available.

Greetings, Max

-- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette

Sean Christopherson

6:31 p.m.

New subject: [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine

On Fri, Apr 18, 2025, Borislav Petkov wrote:

...

On Fri, Apr 18, 2025 at 10:19:14AM -0700, Sean Christopherson wrote:

...
Uh, and the hypervisor too? Why is the hypervisor enumerating an old K8 CPU for what appears to be a modern workload?

...
I'd say this is not good stable candidate.

Eh, practically speaking, there's no chance of this causing problems. The setup is all kinds of weird, but AIUI, K8 CPUs don't support virtualization so there's no chance that the underlying CPU is actually affected by the K8 bug, because the underlying CPU can't be K8. And no bare metal CPU will ever set the HYPERVISOR bit, so there won't be false positives on that front.

I personally object to the patch itself; it's not the kernel's responsibility to deal with a misconfigured VM. But unless we revert the commit, I don't see any reason to withhold this from stable@.

I objected back then but it is some obscure VM migration madness (pasting the whole reply here because it didn't land on any ML):

Date: Tue, 17 Dec 2024 21:32:21 +0100 From: Max Grobecker max@grobecker.info To: Thomas Gleixner tglx@linutronix.de, Ingo Molnar mingo@redhat.com, Borislav Petkov bp@alien8.de, Dave Hansen dave.hansen@linux.intel.com Cc: Max Grobecker max@grobecker.info, x86@kernel.org Subject: Re: [PATCH v2] Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine Message-ID: <d77caeea-b922-4bf5-8349-4b5acab4d2eb> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset=utf-8

Hi,

sorry for my late response, was hit by a flu last days.

On Tue, 10 Dec 2024 13:51:50 +0100, Borislav Petkov wrote:

...
Lemme get this straight: you - I don't know who "we" is - are running K8 models in guests? Why?

Oh, I see, I missed to explain that, indeed.

This error happens, when I start a virtual machine using libvirt/QEMU while not passing through the host CPU. I do this, because I want to be able to live-migrate the VM between hosts, that have slightly different CPUs. Migration works, but only if I choose the generic "kvm64" CPU preset to be used with QEMU using the "-cpu kvm64" parameter: qemu-system-x86_64 -cpu kvm64 I also explicitly enabled additional features like SSE4.1 or AXV2 to have as most features as I can but still being able to do live-migration between hosts. By using this config, the CPU is identified as "Common KVM processor" inside the VM:

processor : 0 vendor_id : AuthenticAMD cpu family : 15 model : 6 model name : Common KVM processor

Also, the model reads as 0x06, which is set by that kvm64 CPU preset, but usually does not pose a problem.

IMO, this is blatantly a QEMU bug (I verified the behavior when using "kvm64" on AMD). As per QEMU commit d1cd4bf419 ("introduce kvm64 CPU"), the vendor + FMS enumerates an Intel P4:

.name = "kvm64", .level = 0xd, .vendor = CPUID_VENDOR_INTEL, .family = 15, .model = 6,

Per x86_cpu_load_model(), QEMU overrides the vendor when using KVM (at a glance, I can't find the code that actually overrides the vendor, gotta love QEMU's object model):

/* * vendor property is set here but then overloaded with the * host cpu vendor for KVM and HVF. */ object_property_set_str(OBJECT(cpu), "vendor", def->vendor, &error_abort);

Overriding the vendor but using Intel's P4 FMS is flat out wrong. IMO, QEMU should use the same FMS as qemu64 for kvm64 when running on AMD.

.name = "qemu64", .level = 0xd, .vendor = CPUID_VENDOR_AMD, .family = 15, .model = 107, .stepping = 1,

Yeah, scraping FMS information is a bad idea, but what QEMU is doing is arguably far worse.

...

The original vendor id of the host CPU is still visible to the guest, and in case the host uses an AMD CPU the combination of "AuthenticAMD" and model 0x06 triggers the bug and the lahf_lm flag vanishes. If the guest is running with the same settings on an Intel CPU and therefore reads "GenuineIntel" as the vendor string, the model is still 0x06, but also the lahf_lm flag is still listed in /proc/cpuinfo.

The CPU is mistakenly identified to be an AMD K8 model, while, in fact, nearly all features a modern Epyc or Xeon CPU is offering, are available.

Borislav Petkov

7:12 p.m.

New subject: [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine

On Fri, Apr 18, 2025 at 11:31:27AM -0700, Sean Christopherson wrote:

...

IMO, this is blatantly a QEMU bug (I verified the behavior when using "kvm64" on AMD). As per QEMU commit d1cd4bf419 ("introduce kvm64 CPU"), the vendor + FMS enumerates an Intel P4:
    .name = "kvm64",
    .level = 0xd,
    .vendor = CPUID_VENDOR_INTEL,
    .family = 15,
    .model = 6,
Per x86_cpu_load_model(), QEMU overrides the vendor when using KVM (at a glance, I can't find the code that actually overrides the vendor, gotta love QEMU's object model):

LOL, I thought I was the only one who thought this is madness. :-P

...

/*
 * vendor property is set here but then overloaded with the
 * host cpu vendor for KVM and HVF.
 */
object_property_set_str(OBJECT(cpu), "vendor", def->vendor, &error_abort);
Overriding the vendor but using Intel's P4 FMS is flat out wrong. IMO, QEMU should use the same FMS as qemu64 for kvm64 when running on AMD.
    .name = "qemu64",
    .level = 0xd,
    .vendor = CPUID_VENDOR_AMD,
    .family = 15,
    .model = 107,
    .stepping = 1,
Yeah, scraping FMS information is a bad idea, but what QEMU is doing is arguably far worse.

Ok, let's fix qemu. I don't have a clue, though, how to go about that so I'd rely on your guidance here.

Because I really hate wagging the dog and "fixing" the kernel because something else can't be bothered. I didn't object stronger to that fix because it is meh, more of those "if I'm a guest" gunk which we sprinkle nowadays and that's apparently not that awful-ish...

Thx.

-- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette

Sean Christopherson

22 Apr 22 Apr

5:22 p.m.

New subject: [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine

+Paolo

On Fri, Apr 18, 2025, Borislav Petkov wrote:

...

On Fri, Apr 18, 2025 at 11:31:27AM -0700, Sean Christopherson wrote:

...
IMO, this is blatantly a QEMU bug (I verified the behavior when using "kvm64" on AMD). As per QEMU commit d1cd4bf419 ("introduce kvm64 CPU"), the vendor + FMS enumerates an Intel P4:
    .name = "kvm64",
    .level = 0xd,
    .vendor = CPUID_VENDOR_INTEL,
    .family = 15,
    .model = 6,
Per x86_cpu_load_model(), QEMU overrides the vendor when using KVM (at a glance, I can't find the code that actually overrides the vendor, gotta love QEMU's object model):
LOL, I thought I was the only one who thought this is madness. :-P

Yeah, I've got backtraces and I still don't entirely understand who's doing what.

...

...
/*
 * vendor property is set here but then overloaded with the
 * host cpu vendor for KVM and HVF.
 */
object_property_set_str(OBJECT(cpu), "vendor", def->vendor, &error_abort);
Overriding the vendor but using Intel's P4 FMS is flat out wrong. IMO, QEMU should use the same FMS as qemu64 for kvm64 when running on AMD.
    .name = "qemu64",
    .level = 0xd,
    .vendor = CPUID_VENDOR_AMD,
    .family = 15,
    .model = 107,
    .stepping = 1,
Yeah, scraping FMS information is a bad idea, but what QEMU is doing is arguably far worse.
Ok, let's fix qemu. I don't have a clue, though, how to go about that so I'd rely on your guidance here.

I have no idea how to fix the QEMU code.

Paolo,

The TL;DR of the problem is that QEMU's "kvm64" CPU type sets FMS to Intel P4, and doesn't swizzle the FMS to something sane when running on AMD. This results in QEMU advertising the CPU as an ancient K8, which causes at least one *known* problem due software making decisions on the funky FMS.

My stance is that QEMU is buggy/flawed and should stuff a FMS that is sane for the underlying vendor for kvm64. I'd send an RFC patch, but for the life of me I can't figure what that would even look like.

...

Because I really hate wagging the dog and "fixing" the kernel because something else can't be bothered. I didn't object stronger to that fix because it is meh, more of those "if I'm a guest" gunk which we sprinkle nowadays and that's apparently not that awful-ish...

FWIW, I think splattering X86_FEATURE_HYPERVISOR everywhere is quite awful. There are definitely cases where the kernel needs to know if it's running as a guest, because the behavior of "hardware" fundamentally changes in ways that can't be enumerated otherwise. E.g. that things like the HPET are fully emulated and thus will be prone to significant jitter.

But when it comes to feature enumeration, IMO sprinkling HYPERVISOR everywhere is unnecessary because it's the hypervisor/VMM's responsibility to present a sane model. And I also think it's outright dangerous, because everywhere the kernel does X for bare metal and Y for guest results in reduced test coverage.

E.g. things like syzkaller and other bots will largely be testing the HYPERVISOR code, while humans will largely be testing and using the bare metal code.

Borislav Petkov

5:33 p.m.

New subject: CONFIG_X86_HYPERVISOR (was: Re: [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine)

On Tue, Apr 22, 2025 at 10:22:54AM -0700, Sean Christopherson wrote:

...

...
Because I really hate wagging the dog and "fixing" the kernel because something else can't be bothered. I didn't object stronger to that fix because it is meh, more of those "if I'm a guest" gunk which we sprinkle nowadays and that's apparently not that awful-ish...

FWIW, I think splattering X86_FEATURE_HYPERVISOR everywhere is quite awful. There are definitely cases where the kernel needs to know if it's running as a guest, because the behavior of "hardware" fundamentally changes in ways that can't be enumerated otherwise. E.g. that things like the HPET are fully emulated and thus will be prone to significant jitter.

But when it comes to feature enumeration, IMO sprinkling HYPERVISOR everywhere is unnecessary because it's the hypervisor/VMM's responsibility to present a sane model. And I also think it's outright dangerous, because everywhere the kernel does X for bare metal and Y for guest results in reduced test coverage.

E.g. things like syzkaller and other bots will largely be testing the HYPERVISOR code, while humans will largely be testing and using the bare metal code.

All valid points...

At least one case justifies the X86_FEATURE_HYPERVISOR check: microcode loading and we've chewed that topic back then with Xen ad nauseam.

But I'd love to whack as many of such checks as possible.

$ git grep X86_FEATURE_HYPERVISOR | wc -l 60

I think I should start whacking at those and CC you if I'm not sure. It'll be a long-term, low prio thing but it'll be a good cleanup.

Thx.

-- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette

Sean Christopherson

7:48 p.m.

New subject: CONFIG_X86_HYPERVISOR (was: Re: [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine)

On Tue, Apr 22, 2025, Borislav Petkov wrote:

...

On Tue, Apr 22, 2025 at 10:22:54AM -0700, Sean Christopherson wrote:

...
...
Because I really hate wagging the dog and "fixing" the kernel because something else can't be bothered. I didn't object stronger to that fix because it is meh, more of those "if I'm a guest" gunk which we sprinkle nowadays and that's apparently not that awful-ish...

FWIW, I think splattering X86_FEATURE_HYPERVISOR everywhere is quite awful. There are definitely cases where the kernel needs to know if it's running as a guest, because the behavior of "hardware" fundamentally changes in ways that can't be enumerated otherwise. E.g. that things like the HPET are fully emulated and thus will be prone to significant jitter.

But when it comes to feature enumeration, IMO sprinkling HYPERVISOR everywhere is unnecessary because it's the hypervisor/VMM's responsibility to present a sane model. And I also think it's outright dangerous, because everywhere the kernel does X for bare metal and Y for guest results in reduced test coverage.

E.g. things like syzkaller and other bots will largely be testing the HYPERVISOR code, while humans will largely be testing and using the bare metal code.

All valid points...

At least one case justifies the X86_FEATURE_HYPERVISOR check: microcode loading and we've chewed that topic back then with Xen ad nauseam.

Yeah, from my perspective, ucode loading falls into the "fundamentally different" bucket.

...

But I'd love to whack as many of such checks as possible.

$ git grep X86_FEATURE_HYPERVISOR | wc -l 60

I think I should start whacking at those and CC you if I'm not sure. It'll be a long-term, low prio thing but it'll be a good cleanup.

I did a quick pass. Most of the usage is "fine". E.g. explicit PV code, cases where checking for HYPERVISOR is the least awful option, etc.

Looks sketchy, might be worth investigating? -------------------------------------------- arch/x86/kernel/cpu/amd.c: if (!cpu_has(c, X86_FEATURE_HYPERVISOR) && arch/x86/kernel/cpu/amd.c: if (!cpu_has(c, X86_FEATURE_HYPERVISOR) && !cpu_has(c, X86_FEATURE_IBPB_BRTYPE)) { arch/x86/kernel/cpu/amd.c: if (c->x86_model < 0x14 && cpu_has(c, X86_FEATURE_LAHF_LM) && !cpu_has(c, X86_FEATURE_HYPERVISOR)) { arch/x86/kernel/cpu/amd.c: if (!cpu_has(c, X86_FEATURE_HYPERVISOR)) { arch/x86/kernel/cpu/amd.c: if (!cpu_has(c, X86_FEATURE_HYPERVISOR)) { arch/x86/kernel/cpu/amd.c: if (cpu_has(c, X86_FEATURE_HYPERVISOR)) arch/x86/kernel/cpu/amd.c: if (!cpu_has(c, X86_FEATURE_HYPERVISOR)) { arch/x86/kernel/cpu/amd.c: if (!cpu_has(c, X86_FEATURE_HYPERVISOR)) arch/x86/kernel/cpu/topology_amd.c: if (!boot_cpu_has(X86_FEATURE_HYPERVISOR) && tscan->c->x86_model <= 0x3) { arch/x86/mm/init_64.c: if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) { arch/x86/mm/pat/set_memory.c: return !cpu_feature_enabled(X86_FEATURE_HYPERVISOR); drivers/platform/x86/intel/pmc/pltdrv.c: if (cpu_feature_enabled(X86_FEATURE_HYPERVISOR) && !xen_initial_domain()) drivers/platform/x86/intel/uncore-frequency/uncore-frequency.c: if (cpu_feature_enabled(X86_FEATURE_HYPERVISOR)) --------------------------------------

Could do with some love, but not horrible. ------------------------------------------ Eww. Optimization to lessen the pain of DR7 interception. It'd be nice to clean this up at some point, especially with things like SEV-ES with DebugSwap, where DR7 is never intercepted. arch/x86/include/asm/debugreg.h: if (static_cpu_has(X86_FEATURE_HYPERVISOR) && !hw_breakpoint_active()) arch/x86/kernel/hw_breakpoint.c: * When in guest (X86_FEATURE_HYPERVISOR), local_db_save()

This usage should be restricted to just the FMS matching, but unfortunately needs to be kept for that check. arch/x86/kernel/cpu/bus_lock.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR))

Most of these look sane, e.g. are just being transparent about the state of mitigations when running in a VM. The use in update_srbds_msr() is the only one that stands out as somewhat sketchy. arch/x86/kernel/cpu/bugs.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) arch/x86/kernel/cpu/bugs.c: else if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) arch/x86/kernel/cpu/bugs.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) { arch/x86/kernel/cpu/bugs.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) { arch/x86/kernel/cpu/bugs.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) { arch/x86/kernel/cpu/bugs.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) { ------------------------------------------

Don't bother ------------------------------------------ Most of these look sane, e.g. are just being transparent about the state of mitigations when running in a VM. The use in update_srbds_msr() is the only one that stands out as somewhat sketchy. arch/x86/kernel/cpu/bugs.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) arch/x86/kernel/cpu/bugs.c: else if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) arch/x86/kernel/cpu/bugs.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) { arch/x86/kernel/cpu/bugs.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) { arch/x86/kernel/cpu/bugs.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) { arch/x86/kernel/cpu/bugs.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) {

Perf, don't bother. PMUs are notoriously virtualization-unfriendly, and perf has had to resort to detecting its running in a VM to avoid crashing the kernel, and I don't see this being fully solved any time soon. arch/x86/events/core.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) { arch/x86/events/intel/core.c: if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) arch/x86/events/intel/core.c: int assume = 3 * !boot_cpu_has(X86_FEATURE_HYPERVISOR); arch/x86/events/intel/cstate.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) arch/x86/events/intel/uncore.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR))

PV code of one form or another. arch/x86/include/asm/acrn.h: if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) arch/x86/hyperv/ivm.c: if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR)) arch/x86/kernel/cpu/mshyperv.c: if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) arch/x86/kernel/cpu/vmware.c: * If !boot_cpu_has(X86_FEATURE_HYPERVISOR), vmware_hypercall_mode arch/x86/kernel/cpu/vmware.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) { arch/x86/kernel/jailhouse.c: !boot_cpu_has(X86_FEATURE_HYPERVISOR)) arch/x86/kernel/kvm.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) arch/x86/kernel/paravirt.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) arch/x86/kernel/tsc.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR) || arch/x86/kvm/vmx/vmx.c: if (!static_cpu_has(X86_FEATURE_HYPERVISOR) || arch/x86/virt/svm/cmdline.c: if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR)) {

Ugh. Eliding WBINVD when running as a VM. Probably the least awful option as there's no sane way to enumerate that WBINVD is a nop, and a "passthrough" setup can (and should) simply omit HYPERVISOR. arch/x86/include/asm/acenv.h: if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR)) \

Skip sanity check on TSC deadline timer. Makes sense to keep; either the timer is emulated and thus not subject to hardware errata, or its passed through, in which case HYPERVSIOR arguably shouldn't be set. arch/x86/kernel/apic/apic.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR))

This "feature" is awful, but getting rid of it may not be feasible. https://lore.kernel.org/all/20250201005048.657470-1-seanjc@google.com arch/x86/kernel/cpu/mtrr/generic.c: if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))

Exempting VMs from a gross workaround for old, buggy Intel chipsets. Fine to keep. drivers/acpi/processor_idle.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR))

More mitigation crud, probably not worth pursuing. arch/x86/kernel/cpu/common.c: boot_cpu_has(X86_FEATURE_HYPERVISOR))) arch/x86/kernel/cpu/common.c: if (cpu_has(c, X86_FEATURE_HYPERVISOR)) {

LOL. Skip ucode revision check when detecting bad Spectre mitigation. arch/x86/kernel/cpu/intel.c: if (cpu_has(c, X86_FEATURE_HYPERVISOR)) ------------------------------------------

Borislav Petkov

23 Apr 23 Apr

7:20 a.m.

New subject: CONFIG_X86_HYPERVISOR (was: Re: [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine)

On Tue, Apr 22, 2025 at 12:48:44PM -0700, Sean Christopherson wrote:

...

I did a quick pass.

You couldn't resist, I know. Doing something else for a change is always cool.

:-P

...

Most of the usage is "fine". E.g. explicit PV code, cases where checking for HYPERVISOR is the least awful option, etc.

Looks sketchy, might be worth investigating?

Oh, I will, it is on my do-this-while-waiting-for-compile/test-to-finish. ;-P

...

arch/x86/kernel/cpu/amd.c: if (!cpu_has(c, X86_FEATURE_HYPERVISOR) &&

So that first one is to set CC_ATTR_HOST_SEV_SNP when we really are a SNP host. I'll go through the rest slowly.

...

arch/x86/kernel/cpu/amd.c: if (!cpu_has(c, X86_FEATURE_HYPERVISOR) && !cpu_has(c, X86_FEATURE_IBPB_BRTYPE)) { arch/x86/kernel/cpu/amd.c: if (c->x86_model < 0x14 && cpu_has(c, X86_FEATURE_LAHF_LM) && !cpu_has(c, X86_FEATURE_HYPERVISOR)) { arch/x86/kernel/cpu/amd.c: if (!cpu_has(c, X86_FEATURE_HYPERVISOR)) { arch/x86/kernel/cpu/amd.c: if (!cpu_has(c, X86_FEATURE_HYPERVISOR)) { arch/x86/kernel/cpu/amd.c: if (cpu_has(c, X86_FEATURE_HYPERVISOR)) arch/x86/kernel/cpu/amd.c: if (!cpu_has(c, X86_FEATURE_HYPERVISOR)) { arch/x86/kernel/cpu/amd.c: if (!cpu_has(c, X86_FEATURE_HYPERVISOR)) arch/x86/kernel/cpu/topology_amd.c: if (!boot_cpu_has(X86_FEATURE_HYPERVISOR) && tscan->c->x86_model <= 0x3) { arch/x86/mm/init_64.c: if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) { arch/x86/mm/pat/set_memory.c: return !cpu_feature_enabled(X86_FEATURE_HYPERVISOR); drivers/platform/x86/intel/pmc/pltdrv.c: if (cpu_feature_enabled(X86_FEATURE_HYPERVISOR) && !xen_initial_domain()) drivers/platform/x86/intel/uncore-frequency/uncore-frequency.c: if (cpu_feature_enabled(X86_FEATURE_HYPERVISOR))

Could do with some love, but not horrible.

Eww. Optimization to lessen the pain of DR7 interception. It'd be nice to clean this up at some point, especially with things like SEV-ES with DebugSwap, where DR7 is never intercepted. arch/x86/include/asm/debugreg.h: if (static_cpu_has(X86_FEATURE_HYPERVISOR) && !hw_breakpoint_active()) arch/x86/kernel/hw_breakpoint.c: * When in guest (X86_FEATURE_HYPERVISOR), local_db_save()

Patch adding it says "Because DRn access is 'difficult' with virt;..." so yeah. I guess we need to agree how to do debug exceptions in guests. Probably start documenting it and then have guest and host adhere to that. I'm talking completely without having looked at what the code does but the "handshake" agreement should be something like this and then we can start simplifying code...

...

This usage should be restricted to just the FMS matching, but unfortunately needs to be kept for that check. arch/x86/kernel/cpu/bus_lock.c: if (boot_cpu_has(X86_FEATURE_HYPERVISOR))

I have no idea why that was added - perhaps to avoid split-lock related #ACs on guests...

/does more git archeology...

Aha, I see it: 6650cdd9a8ccf

Although this doesn't explicitly comment on the guest aspect...

Anyway, thanks for the initial run-thru - I'll keep coming back to this as time provides and we can talk.

Others reading are ofc more than welcome to do patches...

;-)

Thx.

-- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette

Sean Christopherson

2:10 p.m.

New subject: CONFIG_X86_HYPERVISOR (was: Re: [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine)

On Wed, Apr 23, 2025, Borislav Petkov wrote:

...

...
Eww. Optimization to lessen the pain of DR7 interception. It'd be nice to clean this up at some point, especially with things like SEV-ES with DebugSwap, where DR7 is never intercepted. arch/x86/include/asm/debugreg.h: if (static_cpu_has(X86_FEATURE_HYPERVISOR) && !hw_breakpoint_active()) arch/x86/kernel/hw_breakpoint.c: * When in guest (X86_FEATURE_HYPERVISOR), local_db_save()

Patch adding it says "Because DRn access is 'difficult' with virt;..." so yeah. I guess we need to agree how to do debug exceptions in guests. Probably start documenting it and then have guest and host adhere to that. I'm talking completely without having looked at what the code does but the "handshake" agreement should be something like this and then we can start simplifying code...

I don't know that we'll be able to simplify the code.

#DBs in the guest are complex because DR[0-3] aren't context switched by hardware, and running with active breakpoints is uncommon. As a result, loading the guest's DRs into hardware on every VM-Enter is undesirable, because it would add significant latency (load DRs on entry, save DRs on exit) for a relatively rare situation (guest has active breakpoints).

KVM (and presumably other hypervisors) intercepts DR accesses so that it can detect when the guest has active breakpoints (DR7 bits enabled), at which point KVM does load the guest's DRs into hardware and disables DR interception until the next VM-Exit.

KVM also allows the host user to utilize hardware breakpoints to debug the guest, which further adds to the madness, and that's not something the guest can change or even influence.

So removing the "am I guest logic" entirely probably isn't feasible, because in the common case where there are no active breakpoints, reading cpu_dr7 instead of DR7 is a significant performance boost for "normal" VMs.

I mentioned SEV-ES+ DebugSwap because in that case DR7 is effectively guaranteed to not be intercepted, and so the native behavior of reading DR7 instead of the per-CPU variable is likely desirable. I believe TDX has similar functionality (I forget if it's always on, or opt-in).

Borislav Petkov

6:43 p.m.

New subject: CONFIG_X86_HYPERVISOR (was: Re: [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine)

On Wed, Apr 23, 2025 at 07:10:17AM -0700, Sean Christopherson wrote:

...

On Wed, Apr 23, 2025, Borislav Petkov wrote:

...
...
Eww. Optimization to lessen the pain of DR7 interception. It'd be nice to clean this up at some point, especially with things like SEV-ES with DebugSwap, where DR7 is never intercepted. arch/x86/include/asm/debugreg.h: if (static_cpu_has(X86_FEATURE_HYPERVISOR) && !hw_breakpoint_active()) arch/x86/kernel/hw_breakpoint.c: * When in guest (X86_FEATURE_HYPERVISOR), local_db_save()

Patch adding it says "Because DRn access is 'difficult' with virt;..." so yeah. I guess we need to agree how to do debug exceptions in guests. Probably start documenting it and then have guest and host adhere to that. I'm talking completely without having looked at what the code does but the "handshake" agreement should be something like this and then we can start simplifying code...

I don't know that we'll be able to simplify the code.

#DBs in the guest are complex because DR[0-3] aren't context switched by hardware, and running with active breakpoints is uncommon. As a result, loading the guest's DRs into hardware on every VM-Enter is undesirable, because it would add significant latency (load DRs on entry, save DRs on exit) for a relatively rare situation (guest has active breakpoints).

KVM (and presumably other hypervisors) intercepts DR accesses so that it can detect when the guest has active breakpoints (DR7 bits enabled), at which point KVM does load the guest's DRs into hardware and disables DR interception until the next VM-Exit.

KVM also allows the host user to utilize hardware breakpoints to debug the guest, which further adds to the madness, and that's not something the guest can change or even influence.

So removing the "am I guest logic" entirely probably isn't feasible, because in the common case where there are no active breakpoints, reading cpu_dr7 instead of DR7 is a significant performance boost for "normal" VMs.

So I see three modes:

- default off - the usual case

- host debugs the guest

- guests are allowed to do breakpoints

So depending on what is enabled, the code can behave properly - it just needs logic which tells the relevant code - guest or host - which of the debugging mode is enabled. And then everything adheres to that and DTRT.

But before any of that, the even more important question is: do we even care to beef it up that much?

I get the feeling that we don't so it likely is a "whatever's the easiest" game.

...

I mentioned SEV-ES+ DebugSwap because in that case DR7 is effectively guaranteed to not be intercepted, and so the native behavior of reading DR7 instead of the per-CPU variable is likely desirable. I believe TDX has similar functionality (I forget if it's always on, or opt-in).

Aha, the choice was made by the CoCo hw designers - guests are allowed to do breakpoints.

Oh well...

-- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette

Sean Christopherson

24 Apr 24 Apr

7:18 p.m.

New subject: CONFIG_X86_HYPERVISOR (was: Re: [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine)

On Wed, Apr 23, 2025, Borislav Petkov wrote:

...

On Wed, Apr 23, 2025 at 07:10:17AM -0700, Sean Christopherson wrote:

...
On Wed, Apr 23, 2025, Borislav Petkov wrote:

...
...
Eww. Optimization to lessen the pain of DR7 interception. It'd be nice to clean this up at some point, especially with things like SEV-ES with DebugSwap, where DR7 is never intercepted. arch/x86/include/asm/debugreg.h: if (static_cpu_has(X86_FEATURE_HYPERVISOR) && !hw_breakpoint_active()) arch/x86/kernel/hw_breakpoint.c: * When in guest (X86_FEATURE_HYPERVISOR), local_db_save()

Patch adding it says "Because DRn access is 'difficult' with virt;..." so yeah. I guess we need to agree how to do debug exceptions in guests. Probably start documenting it and then have guest and host adhere to that. I'm talking completely without having looked at what the code does but the "handshake" agreement should be something like this and then we can start simplifying code...

I don't know that we'll be able to simplify the code.

#DBs in the guest are complex because DR[0-3] aren't context switched by hardware, and running with active breakpoints is uncommon. As a result, loading the guest's DRs into hardware on every VM-Enter is undesirable, because it would add significant latency (load DRs on entry, save DRs on exit) for a relatively rare situation (guest has active breakpoints).

KVM (and presumably other hypervisors) intercepts DR accesses so that it can detect when the guest has active breakpoints (DR7 bits enabled), at which point KVM does load the guest's DRs into hardware and disables DR interception until the next VM-Exit.

KVM also allows the host user to utilize hardware breakpoints to debug the guest, which further adds to the madness, and that's not something the guest can change or even influence.

So removing the "am I guest logic" entirely probably isn't feasible, because in the common case where there are no active breakpoints, reading cpu_dr7 instead of DR7 is a significant performance boost for "normal" VMs.

So I see three modes:

default off - the usual case

host debugs the guest

guests are allowed to do breakpoints

Not quite. KVM supports all of those seamlessly, with some caveats. E.g. if host userspace and guest kernel are trying to use the same DRx, the guest will "lose" and not get its #DBs.

...

So depending on what is enabled, the code can behave properly - it just needs logic which tells the relevant code - guest or host - which of the debugging mode is enabled. And then everything adheres to that and DTRT.

But before any of that, the even more important question is: do we even care to beef it up that much?

I get the feeling that we don't so it likely is a "whatever's the easiest" game.

Definitely not. All I was thinking was something like:

diff --git a/arch/x86/include/asm/debugreg.h b/arch/x86/include/asm/debugreg.h index fdbbbfec745a..a218c5170ecd 100644 --- a/arch/x86/include/asm/debugreg.h +++ b/arch/x86/include/asm/debugreg.h @@ -121,7 +121,7 @@ static __always_inline unsigned long local_db_save(void) { unsigned long dr7;

- if (static_cpu_has(X86_FEATURE_HYPERVISOR) && !hw_breakpoint_active()) + if (static_cpu_has(X86_FEATURE_DRS_MAY_VMEXIT) && !hw_breakpoint_active()) return 0;

get_debugreg(dr7, 7);

Where X86_FEATURE_DRS_MAY_VMEXIT is set if HYPERVISOR is detected, but then cleared by SEV-ES+ and TDX guests with guaranteed access to DRs. That said, even that much infrastructure probably isn't worth the marginal benefits.

Borislav Petkov

8:31 p.m.

New subject: CONFIG_X86_HYPERVISOR (was: Re: [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine)

On Thu, Apr 24, 2025 at 12:18:50PM -0700, Sean Christopherson wrote:

...

Not quite. KVM supports all of those seamlessly, with some caveats. E.g. if host userspace and guest kernel are trying to use the same DRx, the guest will "lose" and not get its #DBs.

Pff, so cloud providers have big fat signs over their workstations saying: you're not allowed to use breakpoints on production systems?

With my silly thinking, I'd prefer to reglement this more explicitly and actually have the kernel enforce policy:

HV userspace has higher prio with #DB or guests do. But the "losing" bit sounds weird and not nice.

...

Definitely not. All I was thinking was something like:

diff --git a/arch/x86/include/asm/debugreg.h b/arch/x86/include/asm/debugreg.h index fdbbbfec745a..a218c5170ecd 100644 --- a/arch/x86/include/asm/debugreg.h +++ b/arch/x86/include/asm/debugreg.h @@ -121,7 +121,7 @@ static __always_inline unsigned long local_db_save(void) { unsigned long dr7;
  if (static_cpu_has(X86_FEATURE_HYPERVISOR) && !hw_breakpoint_active())
  if (static_cpu_has(X86_FEATURE_DRS_MAY_VMEXIT) && !hw_breakpoint_active())
          return 0;
get_debugreg(dr7, 7);

Where X86_FEATURE_DRS_MAY_VMEXIT is set if HYPERVISOR is detected, but then cleared by SEV-ES+ and TDX guests with guaranteed access to DRs. That said, even that much infrastructure probably isn't worth the marginal benefits.

Btw you can replace that X86_FEATURE_DRS_MAY_VMEXIT with a cc_platform flag which gets properly set on all those coco guest types as those flags are exactly for that stuff.

In any case, I don't see why not. It is easy enough and doesn't make things worse, API-wise.

Care to send a proper patch with rationale why?

Thx.

-- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette

Sean Christopherson

26 Apr 26 Apr

12:08 a.m.

New subject: CONFIG_X86_HYPERVISOR (was: Re: [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine)

On Thu, Apr 24, 2025, Borislav Petkov wrote:

...

On Thu, Apr 24, 2025 at 12:18:50PM -0700, Sean Christopherson wrote:

...
Not quite. KVM supports all of those seamlessly, with some caveats. E.g. if host userspace and guest kernel are trying to use the same DRx, the guest will "lose" and not get its #DBs.

Pff, so cloud providers have big fat signs over their workstations saying: you're not allowed to use breakpoints on production systems?

Heh, it's a bit more than a sign.

...

With my silly thinking, I'd prefer to reglement this more explicitly and actually have the kernel enforce policy:

The kernel already can enforce policy. Setting host breakpoints on guest code is done through a dedicated ioctl(), and access to said ioctl() can be restricted through various sandboxing methods, e.g. seccomp.

...

HV userspace has higher prio with #DB or guests do. But the "losing" bit sounds weird and not nice.

Yeah, it's weird and not nice. But if a human is interactive debugging a guest, odds are very, very good that a missing breakpoint in the guest is not at all a concern.

...

...
Definitely not. All I was thinking was something like:

diff --git a/arch/x86/include/asm/debugreg.h b/arch/x86/include/asm/debugreg.h index fdbbbfec745a..a218c5170ecd 100644 --- a/arch/x86/include/asm/debugreg.h +++ b/arch/x86/include/asm/debugreg.h @@ -121,7 +121,7 @@ static __always_inline unsigned long local_db_save(void) { unsigned long dr7;
  if (static_cpu_has(X86_FEATURE_HYPERVISOR) && !hw_breakpoint_active())
  if (static_cpu_has(X86_FEATURE_DRS_MAY_VMEXIT) && !hw_breakpoint_active())
          return 0;
get_debugreg(dr7, 7);

Where X86_FEATURE_DRS_MAY_VMEXIT is set if HYPERVISOR is detected, but then cleared by SEV-ES+ and TDX guests with guaranteed access to DRs. That said, even that much infrastructure probably isn't worth the marginal benefits.
Btw you can replace that X86_FEATURE_DRS_MAY_VMEXIT with a cc_platform flag which gets properly set on all those coco guest types as those flags are exactly for that stuff.

No, that would defeat the purpose of the check. The X86_FEATURE_HYPERVISOR has nothing to do with correctness, it's all about performance. Critically, it's a static check that gets patched at runtime. It's a micro-optimization for bare metal to avoid a single cache miss (the __this_cpu_read(cpu_dr7)). Routing through cc_platform_has() would be far, far heavier than calling hw_breakpoint_active().

I pointed out the SEV-ES+/TDX cases because they likely would benefit from that same micro-optimization, i.e. by avoiding the call to hw_breakpoint_active().

Borislav Petkov

11:26 a.m.

New subject: CONFIG_X86_HYPERVISOR (was: Re: [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine)

On April 26, 2025 3:08:29 AM GMT+03:00, Sean Christopherson seanjc@google.com wrote:

...

The kernel already can enforce policy. Setting host breakpoints on guest code is done through a dedicated ioctl(), and access to said ioctl() can be restricted through various sandboxing methods, e.g. seccomp.

Ok, makes sense.

...

No, that would defeat the purpose of the check. The X86_FEATURE_HYPERVISOR has nothing to do with correctness, it's all about performance. Critically, it's a static check that gets patched at runtime. It's a micro-optimization for bare metal to avoid a single cache miss (the __this_cpu_read(cpu_dr7)). Routing through cc_platform_has() would be far, far heavier than calling hw_breakpoint_active().

Huh, we care so much about speed here?

-- Sent from a small device: formatting sucks and brevity is inevitable.

Sean Christopherson

6 May 6 May

1:04 a.m.

New subject: CONFIG_X86_HYPERVISOR (was: Re: [PATCH AUTOSEL 5.10 2/6] x86/cpu: Don't clear X86_FEATURE_LAHF_LM flag in init_amd_k8() on AMD when running in a virtual machine)

On Sat, Apr 26, 2025, Borislav Petkov wrote:

...

On April 26, 2025 3:08:29 AM GMT+03:00, Sean Christopherson seanjc@google.com wrote:

...
No, that would defeat the purpose of the check. The X86_FEATURE_HYPERVISOR has nothing to do with correctness, it's all about performance. Critically, it's a static check that gets patched at runtime. It's a micro-optimization for bare metal to avoid a single cache miss (the __this_cpu_read(cpu_dr7)). Routing through cc_platform_has() would be far, far heavier than calling hw_breakpoint_active().

Huh, we care so much about speed here?

That's a PeterZ question :-)

Sasha Levin

31 Mar 31 Mar

2:37 p.m.

New subject: [PATCH AUTOSEL 5.10 3/6] perf: arm_pmu: Don't disable counter in armpmu_add()

From: Mark Rutland mark.rutland@arm.com

[ Upstream commit dcca27bc1eccb9abc2552aab950b18a9742fb8e7 ]

Currently armpmu_add() tries to handle a newly-allocated counter having a stale associated event, but this should not be possible, and if this were to happen the current mitigation is insufficient and potentially expensive. It would be better to warn if we encounter the impossible case.

Calls to pmu::add() and pmu::del() are serialized by the core perf code, and armpmu_del() clears the relevant slot in pmu_hw_events::events[] before clearing the bit in pmu_hw_events::used_mask such that the counter can be reallocated. Thus when armpmu_add() allocates a counter index from pmu_hw_events::used_mask, it should not be possible to observe a stale even in pmu_hw_events::events[] unless either pmu_hw_events::used_mask or pmu_hw_events::events[] have been corrupted.

If this were to happen, we'd end up with two events with the same event->hw.idx, which would clash with each other during reprogramming, deletion, etc, and produce bogus results. Add a WARN_ON_ONCE() for this case so that we can detect if this ever occurs in practice.

That possiblity aside, there's no need to call arm_pmu::disable(event) for the new event. The PMU reset code initialises the counter in a disabled state, and armpmu_del() will disable the counter before it can be reused. Remove the redundant disable.

Signed-off-by: Mark Rutland mark.rutland@arm.com Signed-off-by: Rob Herring (Arm) robh@kernel.org Reviewed-by: Anshuman Khandual anshuman.khandual@arm.com Tested-by: James Clark james.clark@linaro.org Link: https://lore.kernel.org/r/20250218-arm-brbe-v19-v20-2-4e9922fc2e8e@kernel.or... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/perf/arm_pmu.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c index 7fd11ef5cb8a2..8568b5a78c45b 100644 --- a/drivers/perf/arm_pmu.c +++ b/drivers/perf/arm_pmu.c @@ -338,12 +338,10 @@ armpmu_add(struct perf_event *event, int flags) if (idx < 0) return idx;

- /* - * If there is an event in the counter we are going to use then make - * sure it is disabled. - */ + /* The newly-allocated counter should be empty */ + WARN_ON_ONCE(hw_events->events[idx]); + event->hw.idx = idx; - armpmu->disable(event); hw_events->events[idx] = event;

hwc->state = PERF_HES_STOPPED | PERF_HES_UPTODATE;

-- 2.39.5

Sasha Levin

2:37 p.m.

New subject: [PATCH AUTOSEL 5.10 4/6] arm64: cputype: Add QCOM_CPU_PART_KRYO_3XX_GOLD

From: Douglas Anderson dianders@chromium.org

[ Upstream commit 401c3333bb2396aa52e4121887a6f6a6e2f040bc ]

Add a definition for the Qualcomm Kryo 300-series Gold cores.

Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Signed-off-by: Douglas Anderson dianders@chromium.org Acked-by: Trilok Soni quic_tsoni@quicinc.com Link: https://lore.kernel.org/r/20241219131107.v3.1.I18e0288742871393228249a768e5d... Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/include/asm/cputype.h | 2 ++ 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h index d8305b4657d2e..5e292e08393d5 100644 --- a/arch/arm64/include/asm/cputype.h +++ b/arch/arm64/include/asm/cputype.h @@ -110,6 +110,7 @@ #define QCOM_CPU_PART_KRYO 0x200 #define QCOM_CPU_PART_KRYO_2XX_GOLD 0x800 #define QCOM_CPU_PART_KRYO_2XX_SILVER 0x801 +#define QCOM_CPU_PART_KRYO_3XX_GOLD 0x802 #define QCOM_CPU_PART_KRYO_3XX_SILVER 0x803 #define QCOM_CPU_PART_KRYO_4XX_GOLD 0x804 #define QCOM_CPU_PART_KRYO_4XX_SILVER 0x805 @@ -167,6 +168,7 @@ #define MIDR_QCOM_KRYO MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_KRYO) #define MIDR_QCOM_KRYO_2XX_GOLD MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_KRYO_2XX_GOLD) #define MIDR_QCOM_KRYO_2XX_SILVER MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_KRYO_2XX_SILVER) +#define MIDR_QCOM_KRYO_3XX_GOLD MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_KRYO_3XX_GOLD) #define MIDR_QCOM_KRYO_3XX_SILVER MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_KRYO_3XX_SILVER) #define MIDR_QCOM_KRYO_4XX_GOLD MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_KRYO_4XX_GOLD) #define MIDR_QCOM_KRYO_4XX_SILVER MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_KRYO_4XX_SILVER)

-- 2.39.5

Pavel Machek

18 Apr 18 Apr

4:55 p.m.

New subject: [PATCH AUTOSEL 5.10 4/6] arm64: cputype: Add QCOM_CPU_PART_KRYO_3XX_GOLD

Hi!

...

From: Douglas Anderson dianders@chromium.org

[ Upstream commit 401c3333bb2396aa52e4121887a6f6a6e2f040bc ]

Add a definition for the Qualcomm Kryo 300-series Gold cores.

Why are we adding unused defines to stable?

Best regards, Pavel

...

+++ b/arch/arm64/include/asm/cputype.h @@ -110,6 +110,7 @@ #define QCOM_CPU_PART_KRYO 0x200 #define QCOM_CPU_PART_KRYO_2XX_GOLD 0x800 #define QCOM_CPU_PART_KRYO_2XX_SILVER 0x801 +#define QCOM_CPU_PART_KRYO_3XX_GOLD 0x802 #define QCOM_CPU_PART_KRYO_3XX_SILVER 0x803 #define QCOM_CPU_PART_KRYO_4XX_GOLD 0x804 #define QCOM_CPU_PART_KRYO_4XX_SILVER 0x805 @@ -167,6 +168,7 @@ #define MIDR_QCOM_KRYO MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_KRYO) #define MIDR_QCOM_KRYO_2XX_GOLD MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_KRYO_2XX_GOLD) #define MIDR_QCOM_KRYO_2XX_SILVER MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_KRYO_2XX_SILVER) +#define MIDR_QCOM_KRYO_3XX_GOLD MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_KRYO_3XX_GOLD) #define MIDR_QCOM_KRYO_3XX_SILVER MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_KRYO_3XX_SILVER) #define MIDR_QCOM_KRYO_4XX_GOLD MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_KRYO_4XX_GOLD) #define MIDR_QCOM_KRYO_4XX_SILVER MIDR_CPU_MODEL(ARM_CPU_IMP_QCOM, QCOM_CPU_PART_KRYO_4XX_SILVER)

-- DENX Software Engineering GmbH, Managing Director: Erika Unter HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany

Doug Anderson

7:27 p.m.

New subject: [PATCH AUTOSEL 5.10 4/6] arm64: cputype: Add QCOM_CPU_PART_KRYO_3XX_GOLD

Hi,

On Fri, Apr 18, 2025 at 9:55 AM Pavel Machek pavel@denx.de wrote:

...

Hi!

...
From: Douglas Anderson dianders@chromium.org

[ Upstream commit 401c3333bb2396aa52e4121887a6f6a6e2f040bc ]

Add a definition for the Qualcomm Kryo 300-series Gold cores.

Why are we adding unused defines to stable?

I don't really have a strong opinion, but I can see the logic at some level. This patch definitely doesn't _hurt_ and it seems plausible that a define like this could be used in a future errata. Having this already in stable would mean that the future errata would just pick cleanly without anyone having to track down the original patch.

-Doug

Sasha Levin

31 Mar 31 Mar

2:37 p.m.

New subject: [PATCH AUTOSEL 5.10 5/6] xen/mcelog: Add __nonstring annotations for unterminated strings

From: Kees Cook kees@kernel.org

[ Upstream commit 1c3dfc7c6b0f551fdca3f7c1f1e4c73be8adb17d ]

When a character array without a terminating NUL character has a static initializer, GCC 15's -Wunterminated-string-initialization will only warn if the array lacks the "nonstring" attribute[1]. Mark the arrays with __nonstring to and correctly identify the char array as "not a C string" and thereby eliminate the warning.

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117178 [1] Cc: Juergen Gross jgross@suse.com Cc: Stefano Stabellini sstabellini@kernel.org Cc: Oleksandr Tyshchenko oleksandr_tyshchenko@epam.com Cc: xen-devel@lists.xenproject.org Signed-off-by: Kees Cook kees@kernel.org Acked-by: Juergen Gross jgross@suse.com Message-ID: 20250310222234.work.473-kees@kernel.org Signed-off-by: Juergen Gross jgross@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/xen/interface/xen-mca.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/xen/interface/xen-mca.h b/include/xen/interface/xen-mca.h index 7483a78d24251..20a3b320d1a58 100644 --- a/include/xen/interface/xen-mca.h +++ b/include/xen/interface/xen-mca.h @@ -371,7 +371,7 @@ struct xen_mce { #define XEN_MCE_LOG_LEN 32

struct xen_mce_log { - char signature[12]; /* "MACHINECHECK" */ + char signature[12] __nonstring; /* "MACHINECHECK" */ unsigned len; /* = XEN_MCE_LOG_LEN */ unsigned next; unsigned flags;

-- 2.39.5

Sasha Levin

2:37 p.m.

New subject: [PATCH AUTOSEL 5.10 6/6] x86/mm/ident_map: Fix theoretical virtual address overflow to zero

From: "Kirill A. Shutemov" kirill.shutemov@linux.intel.com

[ Upstream commit f666c92090a41ac5524dade63ff96b3adcf8c2ab ]

The current calculation of the 'next' virtual address in the page table initialization functions in arch/x86/mm/ident_map.c doesn't protect against wrapping to zero.

This is a theoretical issue that cannot happen currently, the problematic case is possible only if the user sets a high enough x86_mapping_info::offset value - which no current code in the upstream kernel does.

( The wrapping to zero only occurs if the top PGD entry is accessed. There are no such users upstream. Only hibernate_64.c uses x86_mapping_info::offset, and it operates on the direct mapping range, which is not the top PGD entry. )

Should such an overflow happen, it can result in page table corruption and a hang.

To future-proof this code, replace the manual 'next' calculation with p?d_addr_end() which handles wrapping correctly.

[ Backporter's note: there's no need to backport this patch. ]

Signed-off-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar mingo@kernel.org Reviewed-by: Kai Huang kai.huang@intel.com Reviewed-by: Tom Lendacky thomas.lendacky@amd.com Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Link: https://lore.kernel.org/r/20241016111458.846228-2-kirill.shutemov@linux.inte... Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/mm/ident_map.c | 14 +++----------- 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c index 968d7005f4a72..2f383e288c430 100644 --- a/arch/x86/mm/ident_map.c +++ b/arch/x86/mm/ident_map.c @@ -27,9 +27,7 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page, pud_t *pud = pud_page + pud_index(addr); pmd_t *pmd;

- next = (addr & PUD_MASK) + PUD_SIZE; - if (next > end) - next = end; + next = pud_addr_end(addr, end);

if (info->direct_gbpages) { pud_t pudval; @@ -68,10 +66,7 @@ static int ident_p4d_init(struct x86_mapping_info *info, p4d_t *p4d_page, p4d_t *p4d = p4d_page + p4d_index(addr); pud_t *pud;

- next = (addr & P4D_MASK) + P4D_SIZE; - if (next > end) - next = end; - + next = p4d_addr_end(addr, end); if (p4d_present(*p4d)) { pud = pud_offset(p4d, 0); result = ident_pud_init(info, pud, addr, next); @@ -113,10 +108,7 @@ int kernel_ident_mapping_init(struct x86_mapping_info *info, pgd_t *pgd_page, pgd_t *pgd = pgd_page + pgd_index(addr); p4d_t *p4d;

- next = (addr & PGDIR_MASK) + PGDIR_SIZE; - if (next > end) - next = end; - + next = pgd_addr_end(addr, end); if (pgd_present(*pgd)) { p4d = p4d_offset(pgd, 0); result = ident_p4d_init(info, p4d, addr, next);

-- 2.39.5

Pavel Machek

18 Apr 18 Apr

4:52 p.m.

Hi!

...

[ Upstream commit 208baa3ec9043a664d9acfb8174b332e6b17fb69 ]

If malloc returns NULL due to low memory, 'config' pointer can be NULL. Add a check to prevent NULL dereference.

This fixes nothing. We have oom killer, so we don't have malloc returning NULL.

Best regards, Pavel

...

+++ b/tools/power/cpupower/bench/parse.c @@ -120,6 +120,10 @@ FILE *prepare_output(const char *dirname) struct config *prepare_default_config() { struct config *config = malloc(sizeof(struct config));
if (!config) {
perror("malloc");
return NULL;
}
dprintf("loading defaults\n");

-- DENX Software Engineering GmbH, Managing Director: Erika Unter HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany

110

days inactive

146

days old

linux-stable-mirror@lists.linaro.org

24 comments

participants

tags (0)

participants (5)

Borislav Petkov
Doug Anderson
Pavel Machek
Sasha Levin
Sean Christopherson