On 6/4/2024 6:17 AM, Sean Christopherson wrote:
On Tue, May 28, 2024, Paolo Bonzini wrote:
On Tue, May 28, 2024 at 6:19 AM Manali Shukla <manali.shukla@amd.com> wrote:
The upcoming Idle HLT Intercept feature allows the hypervisor to intercept a vCPU's execution of the HLT instruction only if neither a V_INTR nor a V_NMI event is pending for that vCPU. When the vCPU is expected to service a pending V_INTR or V_NMI event, the Idle HLT intercept won’t trigger. The feature allows the hypervisor to determine whether the vCPU is actually idle and reduces wasteful VMEXITs.
Does this have an effect on the number of vmexits for KVM, unless AVIC is enabled? Can you write a testcase for kvm-unit-tests' vmexit.flat that shows an improvement?
The reason I am wondering is because KVM does not really use V_INTR injection. The "idle HLT" intercept basically differs from the basic HLT trigger only in how it handles an STI;HLT sequence, as in that case the interrupt can be injected directly and the HLT vmexit is suppressed. But in that circumstance KVM would anyway use a V_INTR intercept to detect the opening of the interrupt injection window (and then the interrupt uses event injection rather than V_INTR). Again, this is only true if AVIC is disabled, but that is the default.
So unless I'm wrong in my analysis above, I'm not sure this series, albeit small, is really worth it.
But aren't we hoping to enable x2AVIC by default sooner rather than later?
The Idle HLT intercept feature suppresses the HLT exit not only when a V_INTR event is pending at the time the HLT instruction executes, but also when a V_NMI event is pending. This capability will be advantageous for IBS virtualization and PMC virtualization, as both rely on VNMI to deliver virtualized interrupts from the IBS and PMC hardware.
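To make the difference concrete, here is a minimal C sketch of the exit decision as described above. It is only a model of the behavior, not KVM or hardware code: the struct and function names (`vcpu_events`, `hlt_intercept_exits`, `idle_hlt_intercept_exits`) are hypothetical and chosen for illustration.

```c
#include <stdbool.h>

/* Hypothetical model of the per-vCPU state that matters for the decision. */
struct vcpu_events {
    bool v_intr_pending; /* a virtual interrupt (V_INTR) is pending */
    bool v_nmi_pending;  /* a virtual NMI (V_NMI) is pending */
};

/* Basic HLT intercept: every HLT executed by the guest causes a VMEXIT. */
static bool hlt_intercept_exits(const struct vcpu_events *ev)
{
    (void)ev; /* pending events are not considered */
    return true;
}

/*
 * Idle HLT intercept: HLT causes a VMEXIT only when the vCPU is really
 * idle, i.e. neither V_INTR nor V_NMI is pending.
 */
static bool idle_hlt_intercept_exits(const struct vcpu_events *ev)
{
    return !ev->v_intr_pending && !ev->v_nmi_pending;
}
```

With a V_NMI pending (the IBS/PMC virtualization case), `idle_hlt_intercept_exits()` returns false while the basic intercept would still exit; that saved exit is the benefit being claimed.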
As things stand, it would be more interesting to enable this for nested VMs, especially Hyper-V which does use V_INTR and V_TPR; even better, _emulating_ it on older processors would reduce the L2->L0->L1->L0->L2 path to a less-expensive L2->L0->L2 vmexit.