From: Sean Christopherson <seanjc@google.com>
Sent: Thursday, December 30, 2021 1:26 AM
On Wed, Dec 29, 2021, Tian, Kevin wrote:
From: Tian, Kevin
Sent: Wednesday, December 29, 2021 11:35 AM
Speaking of nested, interception of #NM in vmx_update_exception_bitmap() is wrong with respect to nested guests. Until XFD is supported for L2, which I didn't see in this series, #NM should not be intercepted while L2 is running.
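(For illustration, the condition being described would look roughly like the sketch below inside vmx_update_exception_bitmap(); the guest_fpu.fpstate->xfd check mirrors this series and is an assumption, not the posted patch.)

	/*
	 * Sketch: have L0 trap #NM only while L1 is running, since XFD is
	 * not yet virtualized for L2; L1's own intercepts for L2 are still
	 * merged in via vmcs12->exception_bitmap further below.
	 */
	if (!is_guest_mode(vcpu) && vcpu->arch.guest_fpu.fpstate->xfd)
		eb |= (1u << NM_VECTOR);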
Can you remind me what additional work is required to support XFD for L2?
OK, at least we require additional work on when to disable write interception. It can be done only when both L0 and L1 make the same decision, just as we do for many other VMCS settings...
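For reference, the vmcs02 MSR bitmap is already built this way for other pass-through MSRs: in nested_vmx_prepare_msr_bitmap(), an MSR such as MSR_IA32_SPEC_CTRL stays intercepted unless both the vmcs01 bitmap and L1's bitmap pass it through. A hypothetical extension for the XFD MSRs (not part of this series) could reuse the same helper:

	/*
	 * Hypothetical sketch: intercept XFD accesses in vmcs02 unless both
	 * L0 (vmcs01) and L1 (vmcs12) have chosen to pass them through.
	 */
	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
					 MSR_IA32_XFD, MSR_TYPE_R | MSR_TYPE_W);
	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
					 MSR_IA32_XFD_ERR, MSR_TYPE_R | MSR_TYPE_W);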
I'm not terribly concerned with exposing XFD to L2 right now; I'm much more concerned with not breaking nVMX when XFD is exposed to L1.
If it's only about performance, I prefer the current conservative approach as the first step. As explained earlier, #NM should be rare if the guest doesn't run AMX applications at all. Adding nested into this picture doesn't make things much worse.
All your comments are incorporated except this one. As said, always trapping #NM for L2 is not a big problem, as long as it's rare and doesn't break functionality.
Usually a relatively static scheme is safer than a dynamic one, in case anything was overlooked in the initial implementation. 😊
I strongly disagree, it's not automagically safer. Either way, we need to validate that KVM does the correct thing with respect to vmcs12. E.g. does KVM correctly reflect the #NM back to L2 when it's not intercepted by L1? Does it synthesize a nested VM-Exit when it is intercepted by L1? The testing required doesn't magically go away.
As posted, there is zero chance that the patches correctly handle #NM in L2, because nested_vmx_l0_wants_exit() doesn't prevent an #NM from being forwarded to L1. E.g. if L0 (KVM) and L1 both intercept #NM, handle_exception_nm() will queue a #NM for injection and then synthesize a nested VM-Exit, i.e. the #NM from L2 will get injected into L1 after the nested exit.
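To spell out the ordering behind this (simplified; function names as in current vmx.c, for illustration only):

	/* With IRQs still disabled, right after VM-Exit: */
	vmx_handle_exit_irqoff(vcpu)
	  -> handle_exception_nmi_irqoff()
	    -> handle_exception_nm()        /* reads XFD_ERR, queues #NM */

	/* Only later, in the regular exit path: */
	vmx_handle_exit(vcpu, ...)
	  -> __vmx_handle_exit()
	    -> nested_vmx_reflect_vmexit()  /* "L0 vs. L1" is decided here;
					       if L1 intercepts #NM, a nested
					       VM-Exit is synthesized, and the
					       #NM queued above then gets
					       injected into L1 on top of it */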
Yes, it's incorrect behavior.
That also means handle_exception_nmi_irqoff() => handle_exception_nm() is fatally broken on non-XFD hardware, as it will attempt RDMSR(MSR_IA32_XFD_ERR) if L1 intercepts #NM, since handle_exception_nmi_irqoff() will run before __vmx_handle_exit() => nested_vmx_reflect_vmexit() checks whether L0 or L1 should handle the exit.
Ditto. Thanks for pointing out those facts that we obviously overlooked.
So handle_exception_nm() should neither blindly read xfd_err (the #NM might be caused by L1's interception on a non-XFD platform) nor blindly queue a #NM exception (triggered in L2) to an L1 which intercepts #NM (and thus expects a nested VM-exit instead).
The former suggests that xfd_err should be read conditionally, similar to what we did in vmx_update_exception_bitmap(). The latter suggests that queuing the actual exception is better postponed until __vmx_handle_exit().
We are looking at this change now.
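A rough sketch of that direction (the is_nm_fault() helper and the guest_fpu xfd/xfd_err fields are the ones introduced by this series; this is an assumption of shape, not the final patch):

	/* irqoff path: only save XFD_ERR when the guest can actually have set it */
	static void handle_exception_nm(struct kvm_vcpu *vcpu)
	{
		/*
		 * The #NM may be due to L1's interception on hardware without
		 * XFD, so don't read the MSR unconditionally.  Also don't
		 * queue the exception here; that is deferred to the
		 * __vmx_handle_exit() path, see below.
		 */
		if (vcpu->arch.guest_fpu.fpstate->xfd)
			rdmsrl(MSR_IA32_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);
	}

	/* in handle_exception_nmi(), which is reached only after
	 * nested_vmx_reflect_vmexit() has had the chance to forward the exit
	 * to L1: */
		if (is_nm_fault(intr_info)) {
			kvm_queue_exception(vcpu, NM_VECTOR);
			return 1;
		}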
And once the #NM handler correctly handles interception by either L0 or L1, I'm not sure whether we still want to special-case L1 vs. L2 in vmx_update_exception_bitmap(), since in the end L0 wants to intercept #NM to save xfd_err for both L1 and L2 (if L1 exposes the xfd capability to L2).
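i.e. something like the following in vmx_update_exception_bitmap() (again just a sketch, not a posted patch):

	/*
	 * Sketch of the alternative: trap #NM whenever guest XFD may be
	 * non-zero, for both L1 and L2, so L0 can always save xfd_err;
	 * L1's intercepts for L2 are still merged in as before.
	 */
	if (vcpu->arch.guest_fpu.fpstate->xfd)
		eb |= (1u << NM_VECTOR);

	if (is_guest_mode(vcpu))
		eb |= get_vmcs12(vcpu)->exception_bitmap;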
Thanks,
Kevin