From: Tian, Kevin
Sent: Thursday, December 30, 2021 9:28 AM
As posted, there is zero chance that the patches correctly handle #NM in L2, because nested_vmx_l0_wants_exit() doesn't prevent an #NM from being forwarded to L1. E.g. if L0 (KVM) and L1 both intercept #NM, handle_exception_nm() will queue a #NM for injection and then synthesize a nested VM-Exit, i.e. the #NM from L2 will get injected into L1 after the nested exit.
Yes, it's incorrect behavior.
That also means handle_exception_nmi_irqoff() => handle_exception_nm() is fatally broken on non-XFD hardware, as it will attempt RDMSR(MSR_IA32_XFD_ERR) if L1 intercepts #NM, since handle_exception_nmi_irqoff() will run before __vmx_handle_exit() => nested_vmx_reflect_vmexit() checks whether L0 or L1 should handle the exit.
Ditto. Thanks for pointing out these issues, which we obviously overlooked.
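For reference, the ordering in question looks roughly like below (a simplified sketch with function names from arch/x86/kvm/vmx/vmx.c, bodies elided, so treat it as illustrative rather than exact upstream code):

static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)
{
	/* Runs first, with IRQs still disabled, for every exit. */
	if (to_vmx(vcpu)->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI)
		handle_exception_nmi_irqoff(to_vmx(vcpu));
	...
}

static int __vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
{
	...
	/* Only here is the exit reflected to L1 when L1 intercepts it. */
	if (is_guest_mode(vcpu)) {
		if (nested_vmx_reflect_vmexit(vcpu))
			return 1;
	}
	...
}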
So handle_exception_nm() should neither blindly read xfd_err (the #NM might be caused by L1 interception on a non-XFD platform) nor blindly queue a #NM exception (triggered in L2) to an L1 that intercepts #NM and thus expects a VM-exit instead.
The former suggests that reading xfd_err should be made conditional, similar to what we did in vmx_update_exception_bitmap(). The latter suggests that queuing the actual exception is better postponed until __vmx_handle_exit().
We are looking at this change now.
And once the #NM handler correctly handles interception by either L0 or L1, I'm not sure whether we still want to special-case L1 vs. L2 in vmx_update_exception_bitmap(), since in the end L0 wants to intercept #NM to save xfd_err for both L1 and L2 (if L2 is exposed with the XFD capability by L1).
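In that case the #NM trap decision could be keyed purely off the guest's XFD state. A minimal sketch (assuming the fpstate->xfd condition discussed above; the rest of vmx_update_exception_bitmap() is elided):

static void vmx_update_exception_bitmap(struct kvm_vcpu *vcpu)
{
	u32 eb;
	...
	/*
	 * Trap #NM whenever the guest may set a non-zero XFD, regardless
	 * of whether the vCPU is running L1 or L2, so L0 can snapshot
	 * XFD_ERR before it is clobbered.
	 */
	if (vcpu->arch.guest_fpu.fpstate->xfd)
		eb |= (1u << NM_VECTOR);

	/* L1's intercepts are merged on top for L2. */
	if (is_guest_mode(vcpu))
		eb |= get_vmcs12(vcpu)->exception_bitmap;

	vmcs_write32(EXCEPTION_BITMAP, eb);
}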
The new change is like below:
static void handle_nm_fault_irqoff(struct kvm_vcpu *vcpu)
{
	/*
	 * Save xfd_err to guest_fpu before interrupts are enabled, so the
	 * guest value is not clobbered by host activity before the guest
	 * has a chance to consume it.
	 *
	 * Since trapping #NM is only started once xfd write interception
	 * is disabled, use that flag to guard the saving operation. This
	 * implies a no-op for a non-XFD #NM due to L1 interception.
	 *
	 * Queuing the exception is done in vmx_handle_exit.
	 */
	if (vcpu->arch.xfd_no_write_intercept)
		rdmsrl(MSR_IA32_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);
}
In the final series it will first check vcpu->arch.guest_fpu.fpstate->xfd, before the disable-interception patch is applied, and then take the above form, similar to your suggestion on vmx_update_exception_bitmap().
Whether to check the msr_bitmap vs. an extra flag is an orthogonal open question.
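For illustration only, the msr_bitmap variant would derive the same condition from the intercept state itself rather than a dedicated flag, e.g. (assuming the vmx_test_msr_bitmap_write() helper; a sketch, not the proposed patch):

	/* Writes to MSR_IA32_XFD not intercepted => XFD trapping is armed. */
	if (!vmx_test_msr_bitmap_write(to_vmx(vcpu)->vmcs01.msr_bitmap,
				       MSR_IA32_XFD))
		rdmsrl(MSR_IA32_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);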
Then:
static int handle_exception_nmi(struct kvm_vcpu *vcpu)
{
	...
	if (is_machine_check(intr_info) || is_nmi(intr_info))
		return 1; /* handled by handle_exception_nmi_irqoff() */

	/*
	 * Queue the exception here instead of in handle_nm_fault_irqoff().
	 * This ensures the nested_vmx check is not skipped, so the vmexit
	 * can be reflected to L1 (when it intercepts #NM) before reaching
	 * this point.
	 */
	if (is_nm_fault(intr_info)) {
		kvm_queue_exception(vcpu, NM_VECTOR);
		return 1;
	}
	...
}
Then regarding testing non-AMX nested #NM usage: it might be difficult to trigger from a modern OS. As the comment in the Linux #NM handler notes, #NM is expected only for XFD, or for math emulation when the FPU is missing. So we plan to run a selftest in L1 which sets CR0.TS and then touches the FPU registers, and for the L1 kernel we will run two binaries, one trapping #NM and the other not.
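A rough sketch of the guest-side logic (illustrative only; the helper names and the X86_CR0_TS definition here are ours, not from an existing selftest):

#include <stdint.h>

#define X86_CR0_TS	(1UL << 3)

static inline uint64_t read_cr0(void)
{
	uint64_t cr0;

	asm volatile("mov %%cr0, %0" : "=r"(cr0));
	return cr0;
}

static inline void write_cr0(uint64_t cr0)
{
	asm volatile("mov %0, %%cr0" : : "r"(cr0));
}

static void guest_code(void)
{
	/* Arm #NM: any x87 instruction faults while CR0.TS is set. */
	write_cr0(read_cr0() | X86_CR0_TS);

	/* Touch the FPU; this should raise #NM. */
	asm volatile("fnop");
}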
Please let us know if you have a better suggestion for the testing here.
Thanks,
Kevin