On Wed, Apr 23, 2025, Borislav Petkov wrote:
Eww. Optimization to lessen the pain of DR7 interception. It'd be nice to clean this up at some point, especially with things like SEV-ES with DebugSwap, where DR7 is never intercepted. arch/x86/include/asm/debugreg.h: if (static_cpu_has(X86_FEATURE_HYPERVISOR) && !hw_breakpoint_active()) arch/x86/kernel/hw_breakpoint.c: * When in guest (X86_FEATURE_HYPERVISOR), local_db_save()
Patch adding it says "Because DRn access is 'difficult' with virt;..." so yeah. I guess we need to agree how to do debug exceptions in guests. Probably start documenting it and then have guest and host adhere to that. I'm talking completely without having looked at what the code does but the "handshake" agreement should be something like this and then we can start simplifying code...
I don't know that we'll be able to simplify the code.
#DBs in the guest are complex because DR[0-3] aren't context switched by hardware, and running with active breakpoints is uncommon. As a result, loading the guest's DRs into hardware on every VM-Enter is undesirable, because it would add significant latency (load DRs on entry, save DRs on exit) for a relatively rare situation (guest has active breakpoints).
KVM (and presumably other hypervisors) intercepts DR accesses so that it can detect when the guest has active breakpoints (DR7 bits enabled), at which point KVM does load the guest's DRs into hardware and disables DR interception until the next VM-Exit.
KVM also allows the host user to utilize hardware breakpoints to debug the guest, which further adds to the madness, and that's not something the guest can change or even influence.
So removing the "am I guest logic" entirely probably isn't feasible, because in the common case where there are no active breakpoints, reading cpu_dr7 instead of DR7 is a significant performance boost for "normal" VMs.
I mentioned SEV-ES+ DebugSwap because in that case DR7 is effectively guaranteed to not be intercepted, and so the native behavior of reading DR7 instead of the per-CPU variable is likely desirable. I believe TDX has similar functionality (I forget if it's always on, or opt-in).