At the moment, direct injection of vLPIs can only be enabled on an all-or-nothing, per-VM basis, causing unnecessary I/O performance loss when a VM's vCPU count exceeds the number of available vPEs. This RFC introduces per-vCPU control over vLPI injection to recover the potential I/O performance in such situations.
Background
----------
The value of dynamically enabling the direct injection of vLPIs on a per-vCPU basis is the ability to run guest VMs with simultaneous hardware-forwarded and software-forwarded message-signaled interrupts.
Currently, hardware-forwarded vLPI direct injection on a KVM guest requires GICv4 and is enabled on a per-VM, all-or-nothing basis. vLPI injection enablement happens in two stages:
1) At vGIC initialization, allocate direct injection structures for each vCPU (doorbell IRQ, vPE table entry, virtual pending table, vPEID).
2) When a PCI device is configured for passthrough, map its MSIs to vLPIs using the structures allocated in step 1.
Step 1 is all-or-nothing; if any vCPU cannot be configured with the vPE structures necessary for direct injection, the vPEs of all vCPUs are torn down and direct injection is disabled VM-wide.
This all-or-nothing approach to enabling direct vLPI injection raises several issues, the most pressing being performance degradation on overcommitted hosts.
VM-wide vLPI enablement creates resource inefficiency when guest VMs have more vCPUs than the host has available vPEIDs. The number of vPEIDs (and consequently, vPEs) a host can allocate is constrained by hardware and defined by GICD_TYPER2.VID + 1 (ITS_MAX_VPEID). Since direct injection requires a vCPU to be assigned a vPEID, at most ITS_MAX_VPEID vCPUs can be configured for direct injection at a time; for example, an implementation with GICD_TYPER2.VID = 511 can direct-inject for at most 512 vCPUs across all VMs on the host. Because vLPI direct injection is all-or-nothing on a VM, if a new guest VM would exhaust the remaining vPEIDs, all vCPUs on that VM fall back to hypervisor-forwarded LPIs, causing considerable I/O performance degradation.
Such performance degradation is most pronounced on hosts with CPU overcommitment. Overcommitting an arbitrarily high number of vCPUs allows a VM's vCPU count to easily exceed the host's available vPEIDs. Even with marginally more vCPUs than vPEIDs, the current all-or-nothing vLPI paradigm disables direct injection entirely. This creates two problems: first, a single many-vCPU overcommitted VM loses all direct injection despite vPEIDs being available; second, on multi-tenant hosts, VMs booted first consume all vPEIDs, leaving later VMs without direct injection regardless of their I/O intensity. Per-vCPU control would allow userspace to allocate available vPEIDs across VMs based on I/O workload rather than boot order or per-VM vCPU count, recovering most of the direct injection performance benefit instead of losing it completely.
To allow this per-vCPU granularity, this RFC introduces three new ioctls to the KVM API that enable userspace to activate/deactivate direct vLPI injection capability and resources on individual vCPUs ad hoc during VM runtime.
This RFC proposes userspace control, rather than kernel control, over vPEID allocation for simplicity of implementation, ease of testability, and autonomy over resource usage. In the future, the vLPI enable/disable building blocks from this RFC may be used to implement a full vPE allocation policy in the kernel.
The solution comes in several parts
-----------------------------------
1) [P 1] General declarations (ioctl definitions/stubs, kconfig option)
2) [P 2] Conditionally disable auto vLPI injection init routines
To prevent vCPUs from exceeding vPEID allocation limits upon VM boot, disable automatic vPEID allocation in the GICv4 initialization routine when the per-vCPU kconfig is active. Likewise, disable automatic hardware forwarding for PCI device-backed MSIs upon device registration.
3) [P 3-6] Implement per-vCPU vLPI enablement routine, which:
a) Creates a per-vCPU doorbell IRQ on a new vCPU-scoped, rather than VM-scoped, interrupt domain hierarchy.
b) Allocates per-vCPU vPE table entries and virtual pending table, linking them to the vCPU's doorbell IRQ.
c) Iterates through interrupt translation table to set hardware forwarding for all PCI device–backed interrupts targeting the specific vCPU.
4) [P 7-8] Implement per-vCPU vLPI disablement routine, which:
a) Iterates through interrupt translation table to unset hardware forwarding for all interrupts targeting the specific vCPU.
b) Frees per-vCPU vPE table entries, virtual pending table, and doorbell IRQ, then removes vgic_dist's pointer to the vCPU's freed vPE.
5) [P 9] Couple vSGI enablement with per-vCPU vPE allocation
Since vSGIs cannot be direct-injected without an allocated vPE on the receiving vCPU, couple vSGI enablement with vLPI enablement on GICv4.1.
6) [P 10-13] Write selftests for vLPI direct injection
PCI devices cannot be passed through to selftest guests, so define an ioctl that mocks a hardware source for software-defined MSI interrupts and sets vLPI "hardware" forwarding for the MSIs. Use these vLPIs to selftest per-vCPU vLPI enablement/disablement ioctls.
Testing
-------
Testing has been carried out via selftests and QEMU-emulated guests.
Selftests have covered diverse vLPI configurations and race conditions, including:
1) Stress testing LPI injection across multiple vCPUs while concurrently and repeatedly toggling the vCPUs' vLPI injection capability.
2) Enabling/disabling vLPI direct injection while scheduling or unscheduling a vCPU.
3) Allocating and freeing a single vPEID to multiple vCPUs, ensuring reusability.
4) Attempting to allocate a vPEID when all are already allocated, validating that an error is returned.
5) Calling the enable/disable vLPI ioctls when the GIC is not initialized.
6) Idempotent ioctl calls (sketched below).
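The toggling and idempotency checks can be sketched roughly as follows. This is illustrative only, not the actual selftest code: it assumes a vm_fd whose vGIC is initialized, nr_vcpus vCPUs, and the TEST_ASSERT helper from the KVM selftest framework.

	#include <sys/ioctl.h>
	#include <linux/kvm.h>
	#include "test_util.h"

	static void toggle_stress(int vm_fd, int nr_vcpus)
	{
		for (int iter = 0; iter < 1000; iter++) {
			int vcpu_id = iter % nr_vcpus;

			/* Enabling twice must succeed and leave vLPIs enabled (test 6). */
			TEST_ASSERT(!ioctl(vm_fd, KVM_ENABLE_VCPU_VLPI, &vcpu_id), "enable");
			TEST_ASSERT(!ioctl(vm_fd, KVM_ENABLE_VCPU_VLPI, &vcpu_id), "re-enable");
			TEST_ASSERT(ioctl(vm_fd, KVM_QUERY_VCPU_VLPI, &vcpu_id) == 1, "query on");

			/* Disabling frees the vPEID for reuse on a later iteration (test 3). */
			TEST_ASSERT(!ioctl(vm_fd, KVM_DISABLE_VCPU_VLPI, &vcpu_id), "disable");
			TEST_ASSERT(ioctl(vm_fd, KVM_QUERY_VCPU_VLPI, &vcpu_id) == 0, "query off");
		}
	}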
PCI device passthrough and interrupt injection to a QEMU guest demonstrated:
1) Complete hypervisor circumvention when vLPI injection is enabled on a vCPU, and hypervisor forwarding when vLPI injection is disabled.
2) Interrupts are not lost when received during per-vCPU vLPI state transitions.
Caveats
-------
1) Pending interrupts are flushed when vLPI injection is disabled for a vCPU; hardware pending state is not transferred to software. This may cause pending interrupts to be lost upon vPE disablement.
Unlike vSGIs, vLPIs do not expose their pending state through a GICD_ISPENDR register. Reading a vLPI's pending state therefore requires reading the vPT, which in turn requires invalidating any vPT cache associated with the vCPU's vPE. That means unmapping the vPE and halting the vCPU, which would be incredibly expensive and unnecessary given that MSIs are usually recoverable by the driver.
2) Direct-injected vSGIs (GICv4.1) require vCPUs to have associated vPEs. Since disabling vLPI injection on a vCPU frees its vPE, vSGI direct injection must simultaneously be disabled as well. At the moment, we use the per-vCPU vSGI toggle mechanism introduced in commit bacf2c6 to enable/disable vSGI injection alongside vLPI injection.
Maximilian Dittgen (13):
  KVM: Introduce config option for per-vCPU vLPI enablement
  KVM: arm64: Disable auto vCPU vPE assignment with per-vCPU vLPI config
  KVM: arm64: Refactor out locked section of kvm_vgic_v4_set_forwarding()
  KVM: arm64: Implement vLPI QUERY ioctl for per-vCPU vLPI injection API
  KVM: arm64: Implement vLPI ENABLE ioctl for per-vCPU vLPI injection API
  KVM: arm64: Resolve race between vCPU scheduling and vLPI enablement
  KVM: arm64: Implement vLPI DISABLE ioctl for per-vCPU vLPI injection API
  KVM: arm64: Make per-vCPU vLPI control ioctls atomic
  KVM: arm64: Couple vSGI enablement with per-vCPU vPE allocation
  KVM: selftests: fix MAPC RDbase target formatting in vgic_lpi_stress
  KVM: Ioctl to set up userspace-injected MSIs as software-bypassing vLPIs
  KVM: arm64: selftests: Add support for stress testing direct-injected vLPIs
  KVM: arm64: selftests: Add test for per-vCPU vLPI control API
 Documentation/virt/kvm/api.rst             |  56 +++
 arch/arm64/kvm/arm.c                       |  89 +++++
 arch/arm64/kvm/vgic/vgic-its.c             | 142 ++++++-
 arch/arm64/kvm/vgic/vgic-v3.c              |  14 +-
 arch/arm64/kvm/vgic/vgic-v4.c              | 370 +++++++++++++++++-
 arch/arm64/kvm/vgic/vgic.h                 |  10 +
 drivers/irqchip/Kconfig                    |  13 +
 drivers/irqchip/irq-gic-v3-its.c           |  58 ++-
 drivers/irqchip/irq-gic-v4.c               |  75 +++-
 include/kvm/arm_vgic.h                     |   8 +
 include/linux/irqchip/arm-gic-v3.h         |   5 +
 include/linux/irqchip/arm-gic-v4.h         |  10 +-
 include/linux/kvm_host.h                   |  11 +
 include/uapi/linux/kvm.h                   |  22 ++
 tools/testing/selftests/kvm/Makefile.kvm   |   1 +
 .../selftests/kvm/arm64/per_vcpu_vlpi.c    | 274 +++++++++++++
 .../selftests/kvm/arm64/vgic_lpi_stress.c  | 181 ++++++++-
 .../selftests/kvm/lib/arm64/gic_v3_its.c   |   9 +-
 18 files changed, 1307 insertions(+), 41 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/arm64/per_vcpu_vlpi.c
Add CONFIG_ARM_GIC_V3_PER_VCPU_VLPI to control whether vLPI direct injection is enabled on a per-VM (all-or-nothing) or a per-vCPU basis.
When enabled, vPEs can be allocated/deallocated to vCPUs on an ad-hoc, per-vCPU basis at runtime. When disabled, keep the current vgic_v4_init behavior of automatic vCPU vPE allocation upon VM initialization.
We declare three ioctl numbers to manage per-vCPU vLPI enablement (see the usage sketch below):

- KVM_ENABLE_VCPU_VLPI, which, given a vCPU ID, allocates a vPE and initializes the vCPU for receiving direct vLPI interrupts.
- KVM_DISABLE_VCPU_VLPI, which, given a vCPU ID, disables the vCPU's ability to receive direct vLPI interrupts and frees its underlying vPE structure.
- KVM_QUERY_VCPU_VLPI, which, given a vCPU ID, returns a boolean describing whether the vCPU is configured to receive direct vLPI interrupts.
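As an illustration (not part of the patch), userspace could shift a scarce vPEID from one vCPU to a more I/O-intensive one roughly as follows; vm_fd is assumed to be a VM file descriptor, and move_vlpi_budget() is a hypothetical helper:

	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	/* Hypothetical helper: move direct-injection capability between vCPUs. */
	static int move_vlpi_budget(int vm_fd, int from_vcpu, int to_vcpu)
	{
		if (ioctl(vm_fd, KVM_QUERY_VCPU_VLPI, &from_vcpu) == 1 &&
		    ioctl(vm_fd, KVM_DISABLE_VCPU_VLPI, &from_vcpu))
			return -1;

		/* The freed vPEID can now back direct injection on to_vcpu. */
		return ioctl(vm_fd, KVM_ENABLE_VCPU_VLPI, &to_vcpu);
	}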
This commit declares the kconfig, ioctl numbers, and documentation. Implementation will come throughout this patch set.
Signed-off-by: Maximilian Dittgen <mdittgen@amazon.de>
---
 Documentation/virt/kvm/api.rst | 56 ++++++++++++++++++++++++++++++++++
 arch/arm64/kvm/arm.c           | 15 +++++++++
 arch/arm64/kvm/vgic/vgic-v4.c  |  9 ++++++
 arch/arm64/kvm/vgic/vgic.h     |  2 ++
 drivers/irqchip/Kconfig        | 13 ++++++++
 include/uapi/linux/kvm.h       |  6 ++++
 6 files changed, 101 insertions(+)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 27f726ff8fe0..dcfb326dff10 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6517,6 +6517,62 @@ the capability to be present.
 `flags` must currently be zero.

+4.XXX KVM_ENABLE_VCPU_VLPI
+--------------------------
+
+:Capability: KVM_CAP_ARM_PER_VCPU_VLPI
+:Architectures: arm64
+:Type: vm ioctl
+:Parameters: int vcpu_id (in)
+:Returns: 0 on success, negative value on error
+
+This ioctl enables GICv4 direct vLPI injection for the specified vCPU.
+Allocates vPE structures (doorbell IRQ, vPE table entry, virtual pending
+table, vPEID) and upgrades existing software-forwarded LPIs targeting
+this vCPU to hardware-forwarded vLPIs.
+
+If GICv4.1 is supported and vSGIs are disabled on the specified vCPU,
+this ioctl enables vCPU vSGI support.
+
+Requires CONFIG_ARM_GIC_V3_PER_VCPU_VLPI and GICv4 hardware support.
+
+Returns -EINVAL if vGICv4 is not initialized or if the passed vcpu_id
+does not map to a vCPU.
+
+4.XXX KVM_DISABLE_VCPU_VLPI
+---------------------------
+
+:Capability: KVM_CAP_ARM_PER_VCPU_VLPI
+:Architectures: arm64
+:Type: vm ioctl
+:Parameters: int vcpu_id (in)
+:Returns: 0 on success, negative value on error
+
+This ioctl disables GICv4 direct vLPI injection for the specified vCPU.
+Downgrades hardware-forwarded vLPIs to software-forwarded LPIs and frees
+vPE structures. Pending interrupts in the virtual pending table may be
+lost.
+
+If vSGIs are enabled on the specified vCPU, this ioctl disables them.
+
+Returns -EINVAL if vGICv4 is not initialized or if the passed vcpu_id
+does not map to a vCPU.
+
+4.XXX KVM_QUERY_VCPU_VLPI
+-------------------------
+
+:Capability: KVM_CAP_ARM_PER_VCPU_VLPI
+:Architectures: arm64
+:Type: vm ioctl
+:Parameters: int vcpu_id (in)
+:Returns: 1 if enabled, 0 if disabled, negative value on error
+
+This ioctl queries whether GICv4 direct vLPI injection is enabled for
+the specified vCPU.
+
+Returns -EINVAL if vGICv4 is not initialized or if the passed vcpu_id
+does not map to a vCPU.
+
 .. _kvm_run:
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 452d0c85281e..2839e11ba2c1 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -424,6 +424,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		else
 			r = kvm_supports_cacheable_pfnmap();
 		break;
+	case KVM_CAP_ARM_PER_VCPU_VLPI:
+		r = kvm_per_vcpu_vlpi_supported();
+		break;

 	default:
 		r = 0;
@@ -1947,6 +1950,18 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
 			return -EFAULT;
 		return kvm_vm_ioctl_get_reg_writable_masks(kvm, &range);
 	}
+	case KVM_ENABLE_VCPU_VLPI: {
+		/* TODO: create ioctl handler function */
+		return -ENOSYS;
+	}
+	case KVM_DISABLE_VCPU_VLPI: {
+		/* TODO: create ioctl handler function */
+		return -ENOSYS;
+	}
+	case KVM_QUERY_VCPU_VLPI: {
+		/* TODO: create ioctl handler function */
+		return -ENOSYS;
+	}
 	default:
 		return -EINVAL;
 	}
diff --git a/arch/arm64/kvm/vgic/vgic-v4.c b/arch/arm64/kvm/vgic/vgic-v4.c
index 09c3e9eb23f8..9ef12c33b3f7 100644
--- a/arch/arm64/kvm/vgic/vgic-v4.c
+++ b/arch/arm64/kvm/vgic/vgic-v4.c
@@ -226,6 +226,15 @@ void vgic_v4_get_vlpi_state(struct vgic_irq *irq, bool *val)
 	*val = !!(*ptr & mask);
 }

+bool kvm_per_vcpu_vlpi_supported(void)
+{
+#ifdef CONFIG_ARM_GIC_V3_PER_VCPU_VLPI
+	return kvm_vgic_global_state.has_gicv4;
+#else
+	return false;
+#endif
+}
+
 int vgic_v4_request_vpe_irq(struct kvm_vcpu *vcpu, int irq)
 {
 	return request_irq(irq, vgic_v4_doorbell_handler, 0, "vcpu", vcpu);
diff --git a/arch/arm64/kvm/vgic/vgic.h b/arch/arm64/kvm/vgic/vgic.h
index 5f0fc96b4dc2..99894806a4e9 100644
--- a/arch/arm64/kvm/vgic/vgic.h
+++ b/arch/arm64/kvm/vgic/vgic.h
@@ -467,4 +467,6 @@ static inline bool vgic_is_v3(struct kvm *kvm)
 int vgic_its_debug_init(struct kvm_device *dev);
 void vgic_its_debug_destroy(struct kvm_device *dev);

+bool kvm_per_vcpu_vlpi_supported(void);
+
 #endif
diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
index a61c6dc63c29..1c3e0c6d3177 100644
--- a/drivers/irqchip/Kconfig
+++ b/drivers/irqchip/Kconfig
@@ -52,6 +52,19 @@ config ARM_GIC_V3_ITS
 	default ARM_GIC_V3
 	select IRQ_MSI_IOMMU

+config ARM_GIC_V3_PER_VCPU_VLPI
+	bool "ARM GICv4 per-vCPU vLPI direct injection support"
+	depends on ARM_GIC_V3_ITS
+	default n
+	help
+	  Enable GICv4 direct injection of MSIs as vLPIs on a per-vCPU
+	  basis. Enables partial vLPI enablement on systems with more
+	  vCPU capacity than vPE capacity. When enabled, all vCPUs
+	  will boot without GICv4 vPE structures and handle interrupts
+	  as software LPIs. The KVM_ENABLE_VCPU_VLPI ioctl must then be
+	  called on individual vCPUs to initialize their GICv4 structs
+	  and upgrade targeting LPIs to vLPIs.
+
 config ARM_GIC_V3_ITS_FSL_MC
 	bool
 	depends on ARM_GIC_V3_ITS
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 1e541193e98d..002fe0f4841d 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -973,6 +973,7 @@ struct kvm_enable_cap {
 #define KVM_CAP_ARM_CACHEABLE_PFNMAP_SUPPORTED 243
 #define KVM_CAP_GUEST_MEMFD_FLAGS 244
 #define KVM_CAP_ARM_SEA_TO_USER 245
+#define KVM_CAP_ARM_PER_VCPU_VLPI 246

 struct kvm_irq_routing_irqchip {
 	__u32 irqchip;
@@ -1451,6 +1452,11 @@ struct kvm_enc_region {
 #define KVM_GET_SREGS2 _IOR(KVMIO, 0xcc, struct kvm_sregs2)
 #define KVM_SET_SREGS2 _IOW(KVMIO, 0xcd, struct kvm_sregs2)

+/* Per-vCPU vLPI enablement/disablement */
+#define KVM_ENABLE_VCPU_VLPI  _IOW(KVMIO, 0xf0, int)
+#define KVM_DISABLE_VCPU_VLPI _IOW(KVMIO, 0xf1, int)
+#define KVM_QUERY_VCPU_VLPI   _IOR(KVMIO, 0xf2, int)
+
 #define KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE	(1 << 0)
 #define KVM_DIRTY_LOG_INITIALLY_SET		(1 << 1)
On Thu, 20 Nov 2025 14:02:50 +0000, Maximilian Dittgen <mdittgen@amazon.de> wrote:
> Add CONFIG_ARM_GIC_V3_PER_VCPU_VLPI to control whether vLPI direct
> injection is enabled on a per-VM (all-or-nothing) or a per-vCPU basis.
>
> When enabled, vPEs can be allocated/deallocated to vCPUs on an ad-hoc,
> per-vCPU basis at runtime. When disabled, keep the current vgic_v4_init
> behavior of automatic vCPU vPE allocation upon VM initialization.
>
> We declare three ioctl numbers to manage per-vCPU vLPI enablement:
>
> - KVM_ENABLE_VCPU_VLPI, which, given a vCPU ID, allocates a vPE and
>   initializes the vCPU for receiving direct vLPI interrupts.
> - KVM_DISABLE_VCPU_VLPI, which, given a vCPU ID, disables the vCPU's
>   ability to receive direct vLPI interrupts and frees its underlying
>   vPE structure.
> - KVM_QUERY_VCPU_VLPI, which, given a vCPU ID, returns a boolean
>   describing whether the vCPU is configured to receive direct vLPI
>   interrupts.
>
> This commit declares the kconfig, ioctl numbers, and documentation.
> Implementation will come throughout this patch set.
>
> Signed-off-by: Maximilian Dittgen <mdittgen@amazon.de>
> ---
>  Documentation/virt/kvm/api.rst | 56 ++++++++++++++++++++++++++++++++++
>  arch/arm64/kvm/arm.c           | 15 +++++++++
>  arch/arm64/kvm/vgic/vgic-v4.c  |  9 ++++++
>  arch/arm64/kvm/vgic/vgic.h     |  2 ++
>  drivers/irqchip/Kconfig        | 13 ++++++++
>  include/uapi/linux/kvm.h       |  6 ++++
>  6 files changed, 101 insertions(+)
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 27f726ff8fe0..dcfb326dff10 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6517,6 +6517,62 @@ the capability to be present.
>  `flags` must currently be zero.
>
> +4.XXX KVM_ENABLE_VCPU_VLPI
> +--------------------------
> +
> +:Capability: KVM_CAP_ARM_PER_VCPU_VLPI
> +:Architectures: arm64
> +:Type: vm ioctl
> +:Parameters: int vcpu_id (in)
> +:Returns: 0 on success, negative value on error
> +
> +This ioctl enables GICv4 direct vLPI injection for the specified vCPU.
> +Allocates vPE structures (doorbell IRQ, vPE table entry, virtual pending
> +table, vPEID) and upgrades existing software-forwarded LPIs targeting
> +this vCPU to hardware-forwarded vLPIs.
> +
> +If GICv4.1 is supported and vSGIs are disabled on the specified vCPU,
> +this ioctl enables vCPU vSGI support.
> +
> +Requires CONFIG_ARM_GIC_V3_PER_VCPU_VLPI and GICv4 hardware support.
> +
> +Returns -EINVAL if vGICv4 is not initialized or if the passed vcpu_id
> +does not map to a vCPU.
> +
> +4.XXX KVM_DISABLE_VCPU_VLPI
> +---------------------------
> +
> +:Capability: KVM_CAP_ARM_PER_VCPU_VLPI
> +:Architectures: arm64
> +:Type: vm ioctl
> +:Parameters: int vcpu_id (in)
> +:Returns: 0 on success, negative value on error
> +
> +This ioctl disables GICv4 direct vLPI injection for the specified vCPU.
> +Downgrades hardware-forwarded vLPIs to software-forwarded LPIs and frees
> +vPE structures. Pending interrupts in the virtual pending table may be
> +lost.
I'm going to put my foot down on that immediately.
There is no conceivable case where losing interrupts is acceptable. Ever. If that's what you want, please write your own hypervisor. I wish you luck!
> +If vSGIs are enabled on the specified vCPU, this ioctl disables them.
So what? Something that didn't have an active state now has one that the guest doesn't know about? There is exactly *one* bit that defines that, and it doesn't exist in some quantum superposition.
This whole thing is completely insane, has not been thought out at all, is ignoring the basis of the architecture, and I'm really sorry that you wasted your time on that.
M.
The first step in implementing per-vCPU vLPI enablement is ensuring that vCPUs are not automatically assigned vPEs upon GICv4 VM boot. This is a) so that new VMs on a host do not selfishly grab all available vPEs when existing VMs are sharing the resource, and b) to avoid crashing hosts on which the number of launchable vCPUs can exceed the number of vPEIDs available in hardware.
When the CONFIG_ARM_GIC_V3_PER_VCPU_VLPI kconfig is enabled, skip the vPE initialization portion of the vgic_v4_init() routine. Note that we continue to allocate memory for the array of vPE pointers for future initialization. This allows us to easily track which vCPUs are vLPI-enabled by simply null-checking the vpes[vcpu_id] entry.
Disable automatic kvm_vgic_v4_set_forwarding() upon PCI endpoint configuration since vCPUs no longer have vPEs mapped by default. Instead, store the host_irq mapping so set_forwarding() can be called later upon per-vCPU vLPI enablement.
vPE allocation/freeing functions must be modified to work on a vCPU rather than a VM level. This commit modifies vPE unmap/map to function on a per-vCPU basis, and disables IRQ allocation/freeing functionality for now, since it is currently implemented on a per-VM level. Per-vCPU IRQ allocation/freeing will come in a later patch.
Signed-off-by: Maximilian Dittgen <mdittgen@amazon.de>
---
 arch/arm64/kvm/arm.c          |  6 ++++
 arch/arm64/kvm/vgic/vgic-v3.c | 12 ++++++--
 arch/arm64/kvm/vgic/vgic-v4.c | 55 ++++++++++++++++++++++++++++++++---
 include/kvm/arm_vgic.h        |  2 ++
 4 files changed, 69 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 2839e11ba2c1..31db3ccb3296 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -2798,8 +2798,14 @@ int kvm_arch_irq_bypass_add_producer(struct irq_bypass_consumer *cons,
 	if (irq_entry->type != KVM_IRQ_ROUTING_MSI)
 		return 0;

+#ifndef CONFIG_ARM_GIC_V3_PER_VCPU_VLPI
 	return kvm_vgic_v4_set_forwarding(irqfd->kvm, prod->irq,
 					  &irqfd->irq_entry);
+#else
+	/* Set forwarding later, ad-hoc upon per-vCPU vLPI enable request */
+	return kvm_vgic_v4_map_irq_to_host(irqfd->kvm, prod->irq,
+					   &irqfd->irq_entry);
+#endif
 }

 void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *cons,
diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c
index 968aa9d89be6..842a3a50f3a2 100644
--- a/arch/arm64/kvm/vgic/vgic-v3.c
+++ b/arch/arm64/kvm/vgic/vgic-v3.c
@@ -566,8 +566,12 @@ static void unmap_all_vpes(struct kvm *kvm)
 	struct vgic_dist *dist = &kvm->arch.vgic;
 	int i;

-	for (i = 0; i < dist->its_vm.nr_vpes; i++)
+	for (i = 0; i < dist->its_vm.nr_vpes; i++) {
+		if (!dist->its_vm.vpes[i]) /* Skip uninitialized vPEs */
+			continue;
+
 		free_irq(dist->its_vm.vpes[i]->irq, kvm_get_vcpu(kvm, i));
+	}
 }

 static void map_all_vpes(struct kvm *kvm)
@@ -575,9 +579,13 @@ static void map_all_vpes(struct kvm *kvm)
 	struct vgic_dist *dist = &kvm->arch.vgic;
 	int i;

-	for (i = 0; i < dist->its_vm.nr_vpes; i++)
+	for (i = 0; i < dist->its_vm.nr_vpes; i++) {
+		if (!dist->its_vm.vpes[i])
+			continue;
+
 		WARN_ON(vgic_v4_request_vpe_irq(kvm_get_vcpu(kvm, i),
 						dist->its_vm.vpes[i]->irq));
+	}
 }

 /*
diff --git a/arch/arm64/kvm/vgic/vgic-v4.c b/arch/arm64/kvm/vgic/vgic-v4.c
index 9ef12c33b3f7..fb2e6af96aa9 100644
--- a/arch/arm64/kvm/vgic/vgic-v4.c
+++ b/arch/arm64/kvm/vgic/vgic-v4.c
@@ -252,7 +252,7 @@ int vgic_v4_init(struct kvm *kvm)
 {
 	struct vgic_dist *dist = &kvm->arch.vgic;
 	struct kvm_vcpu *vcpu;
-	int nr_vcpus, ret;
+	int nr_vcpus, ret = 0;
 	unsigned long i;

 	lockdep_assert_held(&kvm->arch.config_lock);
@@ -272,6 +272,7 @@ int vgic_v4_init(struct kvm *kvm)

 	dist->its_vm.nr_vpes = nr_vcpus;

+#ifndef CONFIG_ARM_GIC_V3_PER_VCPU_VLPI
 	kvm_for_each_vcpu(i, vcpu, kvm)
 		dist->its_vm.vpes[i] = &vcpu->arch.vgic_cpu.vgic_v3.its_vpe;

@@ -313,7 +314,12 @@ int vgic_v4_init(struct kvm *kvm)
 			break;
 		}
 	}
-
+#else
+	/*
+	 * TODO: Initialize the shared VM properties that remain necessary
+	 * in per-vCPU mode
+	 */
+#endif
 	if (ret)
 		vgic_v4_teardown(kvm);

@@ -335,6 +341,9 @@ void vgic_v4_teardown(struct kvm *kvm)
 		return;

 	for (i = 0; i < its_vm->nr_vpes; i++) {
+		if (!its_vm->vpes[i]) /* Skip NULL vPEs */
+			continue;
+
 		struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, i);
 		int irq = its_vm->vpes[i]->irq;

@@ -342,7 +351,15 @@ void vgic_v4_teardown(struct kvm *kvm)
 		free_irq(irq, vcpu);
 	}

+#ifdef CONFIG_ARM_GIC_V3_PER_VCPU_VLPI
+	/*
+	 * TODO: Free the shared VM properties that remain necessary
+	 * in per-vCPU mode. Create separate teardown function
+	 * that operates on a per-vCPU basis.
+	 */
+#else
 	its_free_vcpu_irqs(its_vm);
+#endif
 	kfree(its_vm->vpes);
 	its_vm->nr_vpes = 0;
 	its_vm->vpes = NULL;
@@ -368,7 +385,9 @@ int vgic_v4_put(struct kvm_vcpu *vcpu)
 {
 	struct its_vpe *vpe = &vcpu->arch.vgic_cpu.vgic_v3.its_vpe;

-	if (!vgic_supports_direct_irqs(vcpu->kvm) || !vpe->resident)
+	if (!vgic_supports_direct_irqs(vcpu->kvm) ||
+	    !vpe->its_vm || /* check if vPE is initialized for vCPU */
+	    !vpe->resident)
 		return 0;

 	return its_make_vpe_non_resident(vpe, vgic_v4_want_doorbell(vcpu));
@@ -379,7 +398,9 @@ int vgic_v4_load(struct kvm_vcpu *vcpu)
 	struct its_vpe *vpe = &vcpu->arch.vgic_cpu.vgic_v3.its_vpe;
 	int err;

-	if (!vgic_supports_direct_irqs(vcpu->kvm) || vpe->resident)
+	if (!vgic_supports_direct_irqs(vcpu->kvm) ||
+	    !vpe->its_vm ||
+	    vpe->resident)
 		return 0;

 	if (vcpu_get_flag(vcpu, IN_WFI))
@@ -414,6 +435,9 @@ void vgic_v4_commit(struct kvm_vcpu *vcpu)
 {
 	struct its_vpe *vpe = &vcpu->arch.vgic_cpu.vgic_v3.its_vpe;

+	if (!vpe->its_vm)
+		return;
+
 	/*
 	 * No need to wait for the vPE to be ready across a shallow guest
 	 * exit, as only a vcpu_put will invalidate it.
@@ -436,6 +460,29 @@ static struct vgic_its *vgic_get_its(struct kvm *kvm,
 	return vgic_msi_to_its(kvm, &msi);
 }

+/**
+ * Map an interrupt to a host IRQ without setting up hardware forwarding.
+ * Useful for deferred vLPI enablement.
+ */
+int kvm_vgic_v4_map_irq_to_host(struct kvm *kvm, int virq,
+				struct kvm_kernel_irq_routing_entry *irq_entry)
+{
+	struct vgic_its *its;
+	struct vgic_irq *irq;
+
+	its = vgic_get_its(kvm, irq_entry);
+	if (IS_ERR(its))
+		return 0;
+
+	if (vgic_its_resolve_lpi(kvm, its, irq_entry->msi.devid,
+				 irq_entry->msi.data, &irq))
+		return 0;
+
+	irq->host_irq = virq;
+
+	return 0;
+}
+
 int kvm_vgic_v4_set_forwarding(struct kvm *kvm, int virq,
 			       struct kvm_kernel_irq_routing_entry *irq_entry)
 {
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index b261fb3968d0..02842754627f 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -450,6 +450,8 @@ int kvm_vgic_set_owner(struct kvm_vcpu *vcpu, unsigned int intid, void *owner);

 struct kvm_kernel_irq_routing_entry;

+int kvm_vgic_v4_map_irq_to_host(struct kvm *kvm, int virq,
+				struct kvm_kernel_irq_routing_entry *irq_entry);
 int kvm_vgic_v4_set_forwarding(struct kvm *kvm, int irq,
 			       struct kvm_kernel_irq_routing_entry *irq_entry);
kvm_vgic_v4_set_forwarding() acquires its_lock to safely map guest LPIs to host IRQs for vLPI upgrades. Future per-vCPU direct vLPI injection requires atomically upgrading multiple LPIs while holding its_lock, which would cause recursive locking when calling kvm_vgic_v4_set_forwarding().
Extract the locked portion to kvm_vgic_v4_set_forwarding_locked() to allow callers already holding its_lock to perform vLPI upgrades without recursive locking.
No functional change.
Signed-off-by: Maximilian Dittgen <mdittgen@amazon.com>
---
 arch/arm64/kvm/vgic/vgic-v4.c | 38 +++++++++++++++++++++--------------
 include/kvm/arm_vgic.h        |  3 +++
 2 files changed, 26 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/kvm/vgic/vgic-v4.c b/arch/arm64/kvm/vgic/vgic-v4.c
index fb2e6af96aa9..4a1825a1a5d7 100644
--- a/arch/arm64/kvm/vgic/vgic-v4.c
+++ b/arch/arm64/kvm/vgic/vgic-v4.c
@@ -483,27 +483,15 @@ int kvm_vgic_v4_map_irq_to_host(struct kvm *kvm, int virq,
 	return 0;
 }

-int kvm_vgic_v4_set_forwarding(struct kvm *kvm, int virq,
-			       struct kvm_kernel_irq_routing_entry *irq_entry)
+int kvm_vgic_v4_set_forwarding_locked(struct kvm *kvm, int virq,
+		struct kvm_kernel_irq_routing_entry *irq_entry, struct vgic_its *its)
 {
-	struct vgic_its *its;
 	struct vgic_irq *irq;
 	struct its_vlpi_map map;
 	unsigned long flags;
 	int ret = 0;

-	if (!vgic_supports_direct_msis(kvm))
-		return 0;
-
-	/*
-	 * Get the ITS, and escape early on error (not a valid
-	 * doorbell for any of our vITSs).
-	 */
-	its = vgic_get_its(kvm, irq_entry);
-	if (IS_ERR(its))
-		return 0;
-
-	guard(mutex)(&its->its_lock);
+	lockdep_assert_held(&its->its_lock);

 	/*
 	 * Perform the actual DevID/EventID -> LPI translation.
@@ -567,6 +555,26 @@ int kvm_vgic_v4_set_forwarding(struct kvm *kvm, int virq,
 	return ret;
 }

+int kvm_vgic_v4_set_forwarding(struct kvm *kvm, int virq,
+			       struct kvm_kernel_irq_routing_entry *irq_entry)
+{
+	struct vgic_its *its;
+
+	if (!vgic_supports_direct_msis(kvm))
+		return 0;
+
+	/*
+	 * Get the ITS, and escape early on error (not a valid
+	 * doorbell for any of our vITSs).
+	 */
+	its = vgic_get_its(kvm, irq_entry);
+	if (IS_ERR(its))
+		return 0;
+
+	guard(mutex)(&its->its_lock);
+	return kvm_vgic_v4_set_forwarding_locked(kvm, virq, irq_entry, its);
+}
+
 static struct vgic_irq *__vgic_host_irq_get_vlpi(struct kvm *kvm, int host_irq)
 {
 	struct vgic_irq *irq;
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 02842754627f..18a49c4b83f8 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -454,6 +454,9 @@ int kvm_vgic_v4_map_irq_to_host(struct kvm *kvm, int virq,
 				struct kvm_kernel_irq_routing_entry *irq_entry);
 int kvm_vgic_v4_set_forwarding(struct kvm *kvm, int irq,
 			       struct kvm_kernel_irq_routing_entry *irq_entry);
+int kvm_vgic_v4_set_forwarding_locked(struct kvm *kvm, int virq,
+				      struct kvm_kernel_irq_routing_entry *irq_entry,
+				      struct vgic_its *its);

 void kvm_vgic_v4_unset_forwarding(struct kvm *kvm, int host_irq);
Implement kvm_vgic_query_vcpu_vlpi, which handles the KVM_QUERY_VCPU_VLPI ioctl to query whether a vCPU is currently initialized to handle LPIs via direct vLPI injection. This function checks whether the vCPU's entry in the VM's vPE array is populated.
Signed-off-by: Maximilian Dittgen <mdittgen@amazon.com>
---
 arch/arm64/kvm/arm.c          | 13 +++++++++++--
 arch/arm64/kvm/vgic/vgic-v4.c | 15 +++++++++++++++
 arch/arm64/kvm/vgic/vgic.h    |  1 +
 include/linux/kvm_host.h      | 11 +++++++++++
 4 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 31db3ccb3296..afb04162e0cf 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1959,8 +1959,17 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
 	case KVM_QUERY_VCPU_VLPI: {
-		/* TODO: create ioctl handler function */
-		return -ENOSYS;
+		int vcpu_id;
+		struct kvm_vcpu *vcpu;
+
+		if (copy_from_user(&vcpu_id, argp, sizeof(vcpu_id)))
+			return -EFAULT;
+
+		vcpu = kvm_get_vcpu_by_id(kvm, vcpu_id);
+		if (!vcpu)
+			return -EINVAL;
+
+		return kvm_vgic_query_vcpu_vlpi(vcpu);
 	}
 	default:
 		return -EINVAL;
diff --git a/arch/arm64/kvm/vgic/vgic-v4.c b/arch/arm64/kvm/vgic/vgic-v4.c
index 4a1825a1a5d7..cebcb9175572 100644
--- a/arch/arm64/kvm/vgic/vgic-v4.c
+++ b/arch/arm64/kvm/vgic/vgic-v4.c
@@ -617,3 +617,18 @@ void kvm_vgic_v4_unset_forwarding(struct kvm *kvm, int host_irq)
 	raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
 	vgic_put_irq(kvm, irq);
 }
+
+/* query whether vLPI direct injection is enabled on a specific vCPU.
+ * return 0 if disabled, 1 if enabled, -EINVAL if vCPU non-existent or GIC
+ * uninitialized
+ */
+int kvm_vgic_query_vcpu_vlpi(struct kvm_vcpu *vcpu)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct vgic_dist *dist = &kvm->arch.vgic;
+	int i = kvm_idx_from_vcpu(kvm, vcpu);
+
+	if (i == UINT_MAX || !dist->its_vm.vpes)
+		return -EINVAL; /* vCPU non-existent or uninitialized */
+	return dist->its_vm.vpes[i] != NULL;
+}
diff --git a/arch/arm64/kvm/vgic/vgic.h b/arch/arm64/kvm/vgic/vgic.h
index 99894806a4e9..295088913c26 100644
--- a/arch/arm64/kvm/vgic/vgic.h
+++ b/arch/arm64/kvm/vgic/vgic.h
@@ -468,5 +468,6 @@ int vgic_its_debug_init(struct kvm_device *dev);
 void vgic_its_debug_destroy(struct kvm_device *dev);

 bool kvm_per_vcpu_vlpi_supported(void);
+int kvm_vgic_query_vcpu_vlpi(struct kvm_vcpu *vcpu);

 #endif
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 5bd76cf394fa..bc7001f8c5dd 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1030,6 +1030,17 @@ static inline struct kvm_vcpu *kvm_get_vcpu_by_id(struct kvm *kvm, int id)
 	return NULL;
 }

+static inline unsigned int kvm_idx_from_vcpu(struct kvm *kvm, struct kvm_vcpu *target_vcpu)
+{
+	struct kvm_vcpu *vcpu;
+	unsigned long i;
+
+	kvm_for_each_vcpu(i, vcpu, kvm)
+		if (vcpu == target_vcpu)
+			return i;
+	return UINT_MAX;
+}
+
 void kvm_destroy_vcpus(struct kvm *kvm);

 int kvm_trylock_all_vcpus(struct kvm *kvm);
Implement kvm_vgic_enable_vcpu_vlpi(), which handles the KVM_ENABLE_VCPU_VLPI ioctl to enable direct vLPI injection on a specific vCPU. The function has two components: a call to vgic_v4_vcpu_init() and a call to upgrade_existing_lpis_to_vlpis():
- vgic_v4_vcpu_init() is the per-vCPU counterpart to vgic_v4_init(), and initializes all of the GIC structures a vCPU needs to handle LPI interrupts via direct injection. While IRQ domains are usually allocated on a per-VM basis, vgic_v4_vcpu_init() creates a per-vPE IRQ domain and fwnode to decouple vLPI doorbell allocation across separate vCPUs. The domain allocation routine in its_vpe_irq_domain_alloc() also allocates a vPE table entry and virtual pending table for the vCPU.
- upgrade_existing_lpis_to_vlpis() iterates through all of the LPIs targeting the vCPU and initializes hardware forwarding to process them as direct vLPIs. This includes updating each LPI's ITE to hold a vPE's vPEID instead of a collection table's collection ID. It also toggles each interrupt's irq->hw flag to true to notify the ITS to handle the interrupt via direct injection.
Signed-off-by: Maximilian Dittgen <mdittgen@amazon.com>
---
 arch/arm64/kvm/arm.c               |  13 ++-
 arch/arm64/kvm/vgic/vgic-its.c     |   4 +-
 arch/arm64/kvm/vgic/vgic-v4.c      | 157 ++++++++++++++++++++++++++++-
 arch/arm64/kvm/vgic/vgic.h         |   4 +
 drivers/irqchip/irq-gic-v3-its.c   |  48 ++++++++-
 drivers/irqchip/irq-gic-v4.c       |  56 ++++++++--
 include/linux/irqchip/arm-gic-v3.h |   4 +
 include/linux/irqchip/arm-gic-v4.h |   8 +-
 8 files changed, 277 insertions(+), 17 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index afb04162e0cf..169860649bdd 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1951,8 +1951,17 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
 		return kvm_vm_ioctl_get_reg_writable_masks(kvm, &range);
 	}
 	case KVM_ENABLE_VCPU_VLPI: {
-		/* TODO: create ioctl handler function */
-		return -ENOSYS;
+		int vcpu_id;
+		struct kvm_vcpu *vcpu;
+
+		if (copy_from_user(&vcpu_id, argp, sizeof(vcpu_id)))
+			return -EFAULT;
+
+		vcpu = kvm_get_vcpu_by_id(kvm, vcpu_id);
+		if (!vcpu)
+			return -EINVAL;
+
+		return kvm_vgic_enable_vcpu_vlpi(vcpu);
 	}
 	case KVM_DISABLE_VCPU_VLPI: {
 		/* TODO: create ioctl handler function */
 		return -ENOSYS;
diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
index ce3e3ed3f29f..5f3bbf24cc2f 100644
--- a/arch/arm64/kvm/vgic/vgic-its.c
+++ b/arch/arm64/kvm/vgic/vgic-its.c
@@ -23,7 +23,7 @@
 #include "vgic.h"
 #include "vgic-mmio.h"

-static struct kvm_device_ops kvm_arm_vgic_its_ops;
+struct kvm_device_ops kvm_arm_vgic_its_ops;

 static int vgic_its_save_tables_v0(struct vgic_its *its);
 static int vgic_its_restore_tables_v0(struct vgic_its *its);
@@ -2801,7 +2801,7 @@ static int vgic_its_get_attr(struct kvm_device *dev,
 	return 0;
 }

-static struct kvm_device_ops kvm_arm_vgic_its_ops = {
+struct kvm_device_ops kvm_arm_vgic_its_ops = {
 	.name = "kvm-arm-vgic-its",
 	.create = vgic_its_create,
 	.destroy = vgic_its_destroy,
diff --git a/arch/arm64/kvm/vgic/vgic-v4.c b/arch/arm64/kvm/vgic/vgic-v4.c
index cebcb9175572..efb9ac9188e3 100644
--- a/arch/arm64/kvm/vgic/vgic-v4.c
+++ b/arch/arm64/kvm/vgic/vgic-v4.c
@@ -316,9 +316,15 @@ int vgic_v4_init(struct kvm *kvm)
 	}
 #else
 	/*
-	 * TODO: Initialize the shared VM properties that remain necessary
-	 * in per-vCPU mode
+	 * Initialize the shared VM properties that remain necessary in per-vCPU mode
 	 */
+
+	/* vPE properties table */
+	if (!dist->its_vm.vprop_page) {
+		dist->its_vm.vprop_page = its_allocate_prop_table(GFP_KERNEL);
+		if (!dist->its_vm.vprop_page)
+			ret = -ENOMEM;
+	}
 #endif
 	if (ret)
 		vgic_v4_teardown(kvm);
@@ -326,6 +332,51 @@ int vgic_v4_init(struct kvm *kvm)
 	return ret;
 }

+/**
+ * vgic_v4_vcpu_init - When per-vCPU vLPI injection is enabled,
+ * initialize the GICv4 data structures for a specific vCPU
+ * @vcpu: Pointer to the vcpu being initialized
+ *
+ * Called every time the KVM_ENABLE_VCPU_VLPI ioctl is called.
+ */
+int vgic_v4_vcpu_init(struct kvm_vcpu *vcpu)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct vgic_dist *dist = &kvm->arch.vgic;
+	int i, ret, irq;
+	unsigned long irq_flags = DB_IRQ_FLAGS;
+
+	/* Validate vgic_v4_init() has been called to allocate the vpe array */
+	if (!dist->its_vm.vpes)
+		return -ENODEV;
+
+	/* Link KVM distributor to the newly-allocated vPE */
+	i = kvm_idx_from_vcpu(kvm, vcpu);
+	if (i == UINT_MAX)
+		return -EINVAL;
+	dist->its_vm.vpes[i] = &vcpu->arch.vgic_cpu.vgic_v3.its_vpe;
+
+	ret = its_alloc_vcpu_irq(vcpu);
+	if (ret)
+		return ret;
+
+	/* Same routine as the kvm_for_each_vcpu of vgic_v4_init */
+	irq = dist->its_vm.vpes[i]->irq;
+
+	if (kvm_vgic_global_state.has_gicv4_1)
+		irq_flags &= ~IRQ_NOAUTOEN;
+	irq_set_status_flags(irq, irq_flags);
+
+	ret = vgic_v4_request_vpe_irq(vcpu, irq);
+	if (ret)
+		kvm_err("failed to allocate vcpu IRQ%d\n", irq);
+
+	if (ret)
+		vgic_v4_teardown(kvm);
+
+	return ret;
+}
+
 /**
  * vgic_v4_teardown - Free the GICv4 data structures
  * @kvm: Pointer to the VM being destroyed
@@ -357,6 +408,9 @@ void vgic_v4_teardown(struct kvm *kvm)
 	 * in per-vCPU mode. Create separate teardown function
 	 * that operates on a per-vCPU basis.
 	 */
+
+	/* vPE properties table */
+	its_free_prop_table(its_vm->vprop_page);
 #else
 	its_free_vcpu_irqs(its_vm);
 #endif
@@ -618,6 +672,105 @@ void kvm_vgic_v4_unset_forwarding(struct kvm *kvm, int host_irq)
 	vgic_put_irq(kvm, irq);
 }

+static int upgrade_existing_lpis_to_vlpis(struct kvm_vcpu *vcpu)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_device *dev;
+	struct vgic_its *its, *its_from_entry;
+	struct its_device *device;
+	struct its_ite *ite;
+	struct kvm_kernel_irq_routing_entry entry;
+	int ret = 0;
+	int host_irq;
+
+	list_for_each_entry(dev, &kvm->devices, vm_node) {
+		/* Ensure we only look at ITS devices */
+		if (dev->ops != &kvm_arm_vgic_its_ops)
+			continue;
+
+		its = dev->private;
+		mutex_lock(&its->its_lock);
+
+		list_for_each_entry(device, &its->device_list, dev_list) {
+			list_for_each_entry(ite, &device->itt_head, ite_list) {
+				/* ite->irq->hw means entry already upgraded to vLPI */
+				if (ite->collection &&
+				    ite->collection->target_addr == vcpu->vcpu_id &&
+				    ite->irq && !ite->irq->hw) {
+
+					/*
+					 * An existing IRQ would only have a null host_irq if it is
+					 * completely defined in software, in which case it cannot
+					 * be direct injected anyways. Thus, we skip interrupt
+					 * upgrade for IRQs with null host_irqs.
+					 */
+					if (ite->irq->host_irq > 0)
+						host_irq = ite->irq->host_irq;
+					else
+						continue;
+
+					/* Create routing entry */
+					memset(&entry, 0, sizeof(entry));
+					entry.gsi = host_irq;
+					entry.type = KVM_IRQ_ROUTING_MSI;
+					/* MSI address is system defined for ARM GICv3 */
+					entry.msi.address_lo =
+						(u32)(its->vgic_its_base + GITS_TRANSLATER);
+					entry.msi.address_hi =
+						(u32)((its->vgic_its_base + GITS_TRANSLATER) >> 32);
+					entry.msi.data = ite->event_id;
+					entry.msi.devid = device->device_id;
+					entry.msi.flags = KVM_MSI_VALID_DEVID;
+
+					/* Verify ITS consistency */
+					its_from_entry = vgic_get_its(kvm, &entry);
+					if (IS_ERR(its_from_entry) || its_from_entry != its)
+						continue;
+
+					/* Upgrade to vLPI */
+					ret = kvm_vgic_v4_set_forwarding_locked(kvm, host_irq,
+										&entry, its);
+					if (ret)
+						kvm_info("Failed to upgrade LPI %d: %d\n",
+							 host_irq, ret);
+				}
+			}
+		}
+
+		mutex_unlock(&its->its_lock);
+	}
+
+	return 0;
+}
+
+/* Enable vLPI direct injection on a specific vCPU */
+int kvm_vgic_enable_vcpu_vlpi(struct kvm_vcpu *vcpu)
+{
+	int ret;
+	int vcpu_vlpi_status = kvm_vgic_query_vcpu_vlpi(vcpu);
+
+	/* vGIC not initialized for vCPU */
+	if (vcpu_vlpi_status < 0)
+		return vcpu_vlpi_status;
+	/* vLPI already enabled */
+	if (vcpu_vlpi_status > 0)
+		return 0;
+
+	/* Allocate the vPE struct and vPE table for the vCPU */
+	ret = vgic_v4_vcpu_init(vcpu);
+	if (ret)
+		return ret;
+
+	/*
+	 * Upgrade existing LPIs to vLPIs. We
+	 * do not need to error check since
+	 * a failure in upgrading an LPI is non-breaking;
+	 * those LPIs may continue to be processed by
+	 * software.
+	 */
+	return upgrade_existing_lpis_to_vlpis(vcpu);
+}
+
 /* query whether vLPI direct injection is enabled on a specific vCPU.
 * return 0 if disabled, 1 if enabled, -EINVAL if vCPU non-existent or GIC
 * uninitialized
diff --git a/arch/arm64/kvm/vgic/vgic.h b/arch/arm64/kvm/vgic/vgic.h
index 295088913c26..60ae0d1f044d 100644
--- a/arch/arm64/kvm/vgic/vgic.h
+++ b/arch/arm64/kvm/vgic/vgic.h
@@ -251,6 +251,8 @@ struct ap_list_summary {
 #define irqs_active_outside_lrs(s) \
 	((s)->nr_act && irqs_outside_lrs(s))

+extern struct kvm_device_ops kvm_arm_vgic_its_ops;
+
 int vgic_v3_parse_attr(struct kvm_device *dev, struct kvm_device_attr *attr,
 		       struct vgic_reg_attr *reg_attr);
 int vgic_v2_parse_attr(struct kvm_device *dev, struct kvm_device_attr *attr,
@@ -434,6 +436,7 @@ static inline bool vgic_supports_direct_irqs(struct kvm *kvm)
 }

 int vgic_v4_init(struct kvm *kvm);
+int vgic_v4_vcpu_init(struct kvm_vcpu *vcpu);
 void vgic_v4_teardown(struct kvm *kvm);
 void vgic_v4_configure_vsgis(struct kvm *kvm);
 void vgic_v4_get_vlpi_state(struct vgic_irq *irq, bool *val);
@@ -468,6 +471,7 @@ int vgic_its_debug_init(struct kvm_device *dev);
 void vgic_its_debug_destroy(struct kvm_device *dev);

 bool kvm_per_vcpu_vlpi_supported(void);
+int kvm_vgic_enable_vcpu_vlpi(struct kvm_vcpu *vcpu);
 int kvm_vgic_query_vcpu_vlpi(struct kvm_vcpu *vcpu);

 #endif
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 467cb78435a9..67749578f973 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -2261,7 +2261,7 @@ static void gic_reset_prop_table(void *va)
 	gic_flush_dcache_to_poc(va, LPI_PROPBASE_SZ);
 }

-static struct page *its_allocate_prop_table(gfp_t gfp_flags)
+struct page *its_allocate_prop_table(gfp_t gfp_flags)
 {
 	struct page *prop_page;

@@ -2275,7 +2275,7 @@ static struct page *its_allocate_prop_table(gfp_t gfp_flags)
 	return prop_page;
 }

-static void its_free_prop_table(struct page *prop_page)
+void its_free_prop_table(struct page *prop_page)
 {
 	its_free_pages(page_address(prop_page), get_order(LPI_PROPBASE_SZ));
 }
@@ -4612,25 +4612,65 @@ static void its_vpe_irq_domain_free(struct irq_domain *domain,

 		BUG_ON(vm != vpe->its_vm);

+#ifdef CONFIG_ARM_GIC_V3_PER_VCPU_VLPI
+		free_lpi_range(vpe->vpe_db_lpi, 1);
+#else
 		clear_bit(data->hwirq, vm->db_bitmap);
+#endif
 		its_vpe_teardown(vpe);
 		irq_domain_reset_irq_data(data);
 	}

+#ifndef CONFIG_ARM_GIC_V3_PER_VCPU_VLPI
 	if (bitmap_empty(vm->db_bitmap, vm->nr_db_lpis)) {
 		its_lpi_free(vm->db_bitmap, vm->db_lpi_base, vm->nr_db_lpis);
 		its_free_prop_table(vm->vprop_page);
 	}
+#endif
 }

 static int its_vpe_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
 				    unsigned int nr_irqs, void *args)
 {
 	struct irq_chip *irqchip = &its_vpe_irq_chip;
+	int base, err;
+#ifdef CONFIG_ARM_GIC_V3_PER_VCPU_VLPI
+	struct its_vpe *vpe = args;
+
+	/* Per-vCPU mode: allocate domain at the vPE, rather than VM, level */
+	WARN_ON(nr_irqs != 1);
+
+	/* Use VM's shared properties table */
+	if (!vpe->its_vm || !vpe->its_vm->vprop_page)
+		return -EINVAL;
+
+	if (gic_rdists->has_rvpeid)
+		irqchip = &its_vpe_4_1_irq_chip;
+
+	err = alloc_lpi_range(1, &base);
+	if (err)
+		return err;
+	vpe->vpe_db_lpi = base;
+	err = its_vpe_init(vpe);
+	if (err)
+		return err;
+
+	err = its_irq_gic_domain_alloc(domain, virq, vpe->vpe_db_lpi);
+	if (err)
+		goto err_teardown_vpe;
+
+	irq_domain_set_hwirq_and_chip(domain, virq, 0, irqchip, vpe);
+	irqd_set_resend_when_in_progress(irq_get_irq_data(virq));
+
+	return 0;
+
+err_teardown_vpe:
+	its_vpe_teardown(vpe);
+#else
 	struct its_vm *vm = args;
 	unsigned long *bitmap;
 	struct page *vprop_page;
-	int base, nr_ids, i, err = 0;
+	int nr_ids, i;

 	bitmap = its_lpi_alloc(roundup_pow_of_two(nr_irqs), &base, &nr_ids);
 	if (!bitmap)
@@ -4673,7 +4713,7 @@ static int its_vpe_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,

 	if (err)
 		its_vpe_irq_domain_free(domain, virq, i);
-
+#endif
 	return err;
 }

diff --git a/drivers/irqchip/irq-gic-v4.c b/drivers/irqchip/irq-gic-v4.c
index 8455b4a5fbb0..c8e324cd8911 100644
--- a/drivers/irqchip/irq-gic-v4.c
+++ b/drivers/irqchip/irq-gic-v4.c
@@ -7,6 +7,7 @@
 #include <linux/interrupt.h>
 #include <linux/irq.h>
 #include <linux/irqdomain.h>
+#include <linux/kvm_host.h>
 #include <linux/msi.h>
 #include <linux/pid.h>
 #include <linux/sched.h>
@@ -128,14 +129,14 @@ static int its_alloc_vcpu_sgis(struct its_vpe *vpe, int idx)
 	if (!name)
 		goto err;

-	vpe->fwnode = irq_domain_alloc_named_id_fwnode(name, idx);
-	if (!vpe->fwnode)
+	vpe->sgi_fwnode = irq_domain_alloc_named_id_fwnode(name, idx);
+	if (!vpe->sgi_fwnode)
 		goto err;

 	kfree(name);
 	name = NULL;

-	vpe->sgi_domain = irq_domain_create_linear(vpe->fwnode, 16,
+	vpe->sgi_domain = irq_domain_create_linear(vpe->sgi_fwnode, 16,
 						   sgi_domain_ops, vpe);
 	if (!vpe->sgi_domain)
 		goto err;
@@ -149,8 +150,8 @@ static int its_alloc_vcpu_sgis(struct its_vpe *vpe, int idx)
 err:
 	if (vpe->sgi_domain)
 		irq_domain_remove(vpe->sgi_domain);
-	if (vpe->fwnode)
-		irq_domain_free_fwnode(vpe->fwnode);
+	if (vpe->sgi_fwnode)
+		irq_domain_free_fwnode(vpe->sgi_fwnode);
 	kfree(name);
 	return -ENOMEM;
 }
@@ -199,6 +200,49 @@ int its_alloc_vcpu_irqs(struct its_vm *vm)
 	return -ENOMEM;
 }

+int its_alloc_vcpu_irq(struct kvm_vcpu *vcpu)
+{
+	struct its_vpe *vpe = &vcpu->arch.vgic_cpu.vgic_v3.its_vpe;
+	struct its_vm *vm = &vcpu->kvm->arch.vgic.its_vm;
+	int ret;
+
+	vpe->its_vm = vm; /* point all vPEs on a VM to the same shared dist its_vm */
+	if (!has_v4_1_sgi()) /* idai bool shares memory with sgi_domain pointer */
+		vpe->idai = true;
+
+	/* create a per-vPE, rather than per-VM, fwnode */
+	if (!vpe->lpi_fwnode) {
+		/* add vcpu_id to fwnode naming to differentiate vcpus in same VM */
+		vpe->lpi_fwnode = irq_domain_alloc_named_id_fwnode("GICv4-vpe-lpi",
+				task_pid_nr(current) * 1000 + vcpu->vcpu_id);
+		if (!vpe->lpi_fwnode)
+			goto err;
+	}
+
+	/* create domain hierarchy for vPE */
+	vpe->lpi_domain = irq_domain_create_hierarchy(gic_domain, 0, 1,
+						      vpe->lpi_fwnode, vpe_domain_ops, vpe);
+	if (!vpe->lpi_domain)
+		goto err;
+
+	/* allocate IRQs from vPE domain */
+	vpe->irq = irq_domain_alloc_irqs(vpe->lpi_domain, 1, NUMA_NO_NODE, vpe);
+	if (vpe->irq <= 0)
+		goto err;
+
+	ret = its_alloc_vcpu_sgis(vpe, vcpu->vcpu_id);
+	if (ret)
+		goto err;
+
+	return 0;
+err:
+	if (vpe->lpi_domain)
+		irq_domain_remove(vpe->lpi_domain);
+	if (vpe->lpi_fwnode)
+		irq_domain_free_fwnode(vpe->lpi_fwnode);
+	return -ENOMEM;
+}
+
 static void its_free_sgi_irqs(struct its_vm *vm)
 {
 	int i;
@@ -214,7 +258,7 @@ static void its_free_sgi_irqs(struct its_vm *vm)

 		irq_domain_free_irqs(irq, 16);
 		irq_domain_remove(vm->vpes[i]->sgi_domain);
-		irq_domain_free_fwnode(vm->vpes[i]->fwnode);
+		irq_domain_free_fwnode(vm->vpes[i]->sgi_fwnode);
 	}
 }

diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
index 70c0948f978e..5031a4c25543 100644
--- a/include/linux/irqchip/arm-gic-v3.h
+++ b/include/linux/irqchip/arm-gic-v3.h
@@ -641,6 +641,10 @@ int its_init(struct fwnode_handle *handle, struct rdists *rdists,
 	     struct irq_domain *domain, u8 irq_prio);
 int mbi_init(struct fwnode_handle *fwnode, struct irq_domain *parent);

+/* Enable prop table alloc/free on vGIC init/destroy when per-vCPU vLPI is enabled */
+struct page *its_allocate_prop_table(gfp_t gfp_flags);
+void its_free_prop_table(struct page *prop_page);
+
 static inline bool gic_enable_sre(void)
 {
 	u32 val;
diff --git a/include/linux/irqchip/arm-gic-v4.h b/include/linux/irqchip/arm-gic-v4.h
index 0b0887099fd7..bc493fed75ab 100644
--- a/include/linux/irqchip/arm-gic-v4.h
+++ b/include/linux/irqchip/arm-gic-v4.h
@@ -8,6 +8,7 @@
 #define __LINUX_IRQCHIP_ARM_GIC_V4_H

 struct its_vpe;
+struct kvm_vcpu;

 /*
  * Maximum number of ITTs when GITS_TYPER.VMOVP == 0, using the
@@ -42,6 +43,10 @@ struct its_vpe {
 	struct its_vm		*its_vm;
 	/* per-vPE VLPI tracking */
 	atomic_t		vlpi_count;
+	/* per-vPE domain for per-vCPU VLPI enablement */
+	struct irq_domain	*lpi_domain;
+	/* enables per-vPE vLPI IRQ domains during per-vCPU VLPI enablement */
+	struct fwnode_handle	*lpi_fwnode;
 	/* Doorbell interrupt */
 	int			irq;
 	irq_hw_number_t		vpe_db_lpi;
@@ -59,7 +64,7 @@ struct its_vpe {
 		};
 		/* GICv4.1 implementations */
 		struct {
-			struct fwnode_handle	*fwnode;
+			struct fwnode_handle	*sgi_fwnode;
 			struct irq_domain	*sgi_domain;
 			struct {
 				u8	priority;
@@ -139,6 +144,7 @@ struct its_cmd_info {
 };

 int its_alloc_vcpu_irqs(struct its_vm *vm);
+int its_alloc_vcpu_irq(struct kvm_vcpu *vcpu);
 void its_free_vcpu_irqs(struct its_vm *vm);
 int its_make_vpe_resident(struct its_vpe *vpe, bool g0en, bool g1en);
 int its_make_vpe_non_resident(struct its_vpe *vpe, bool db);
If vgic_v4_load() is called to schedule GICv4 on a vCPU at the same time that kvm_vgic_enable_vcpu_vlpi() is called to enable vLPI direct injection on the vCPU, vgic_v4_load() will attempt to map the vCPU's doorbell IRQ to the physical processor while kvm_vgic_enable_vcpu_vlpi() is still creating the doorbell IRQ.
This race will cause vgic_v4_load()'s mapping operation to fail, triggering a WARN_ON in vgic_v3_load().
Fix by checking for the presence of a doorbell IRQ before attempting to load GICv4. Remove the WARN_ON to reduce the verbosity of GICv4 load failures resulting from this race; failure to load GICv4 is not breaking behavior.
Signed-off-by: Maximilian Dittgen <mdittgen@amazon.com>
---
 arch/arm64/kvm/vgic/vgic-v3.c | 2 +-
 arch/arm64/kvm/vgic/vgic-v4.c | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c
index 842a3a50f3a2..ffaf692399fd 100644
--- a/arch/arm64/kvm/vgic/vgic-v3.c
+++ b/arch/arm64/kvm/vgic/vgic-v3.c
@@ -995,7 +995,7 @@ void vgic_v3_load(struct kvm_vcpu *vcpu)
 	if (has_vhe())
 		__vgic_v3_activate_traps(cpu_if);

-	WARN_ON(vgic_v4_load(vcpu));
+	vgic_v4_load(vcpu);
 }

 void vgic_v3_put(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/kvm/vgic/vgic-v4.c b/arch/arm64/kvm/vgic/vgic-v4.c
index efb9ac9188e3..0affcfca17f0 100644
--- a/arch/arm64/kvm/vgic/vgic-v4.c
+++ b/arch/arm64/kvm/vgic/vgic-v4.c
@@ -454,6 +454,7 @@ int vgic_v4_load(struct kvm_vcpu *vcpu)

 	if (!vgic_supports_direct_irqs(vcpu->kvm) ||
 	    !vpe->its_vm ||
+	    !vpe->irq || /* check if irq has been allocated yet */
 	    vpe->resident)
 		return 0;
Implement kvm_vgic_disable_vcpu_vlpi(), which handles the KVM_DISABLE_VCPU_VLPI ioctl to disable direct vLPI injection on a specific vCPU. The function has two components: a call to vgic_v4_vcpu_teardown() and a call to downgrade_existing_vlpis_to_lpis():
- vgic_v4_vcpu_teardown() is the per-vCPU counterpart to vgic_v4_teardown() and frees all of the GIC structures a vCPU needs to handle LPI interrupts via direct injection. While vgic_v4_teardown() operates on a per-VM basis, vgic_v4_vcpu_teardown() frees the IRQ, LPI domain, and fwnode of the single targeted vCPU. The domain free routine in this function frees the vPE table entry and virtual pending table of the vCPU.
- downgrade_existing_vlpis_to_lpis() iterates through all of the vLPIs targeting the vCPU and tears down the hardware forwarding that processes them as vLPIs. It uses kvm_vgic_v4_unset_forwarding() to unmap direct injection.
Signed-off-by: Maximilian Dittgen <mdittgen@amazon.com>
---
 arch/arm64/kvm/arm.c               |  13 +++-
 arch/arm64/kvm/vgic/vgic-v4.c      | 105 ++++++++++++++++++++++++++---
 arch/arm64/kvm/vgic/vgic.h         |   2 +
 drivers/irqchip/irq-gic-v3-its.c   |   4 +-
 drivers/irqchip/irq-gic-v4.c       |  19 ++++++
 include/linux/irqchip/arm-gic-v4.h |   1 +
 6 files changed, 131 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 169860649bdd..180eaa4165e9 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1964,8 +1964,17 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
 		return kvm_vgic_enable_vcpu_vlpi(vcpu);
 	}
 	case KVM_DISABLE_VCPU_VLPI: {
-		/* TODO: create ioctl handler function */
-		return -ENOSYS;
+		int vcpu_id;
+		struct kvm_vcpu *vcpu;
+
+		if (copy_from_user(&vcpu_id, argp, sizeof(vcpu_id)))
+			return -EFAULT;
+
+		vcpu = kvm_get_vcpu_by_id(kvm, vcpu_id);
+		if (!vcpu)
+			return -EINVAL;
+
+		return kvm_vgic_disable_vcpu_vlpi(vcpu);
 	}
 	case KVM_QUERY_VCPU_VLPI: {
 		int vcpu_id;
diff --git a/arch/arm64/kvm/vgic/vgic-v4.c b/arch/arm64/kvm/vgic/vgic-v4.c
index 0affcfca17f0..39fababf2861 100644
--- a/arch/arm64/kvm/vgic/vgic-v4.c
+++ b/arch/arm64/kvm/vgic/vgic-v4.c
@@ -372,7 +372,7 @@ int vgic_v4_vcpu_init(struct kvm_vcpu *vcpu)
 		kvm_err("failed to allocate vcpu IRQ%d\n", irq);

 	if (ret)
-		vgic_v4_teardown(kvm);
+		vgic_v4_vcpu_teardown(vcpu);

 	return ret;
 }
@@ -384,7 +384,8 @@ void vgic_v4_teardown(struct kvm *kvm)
 {
 	struct its_vm *its_vm = &kvm->arch.vgic.its_vm;
-	int i;
+	struct kvm_vcpu *vcpu;
+	unsigned long i;

 	lockdep_assert_held(&kvm->arch.config_lock);

@@ -395,7 +396,7 @@ void vgic_v4_teardown(struct kvm *kvm)
 		if (!its_vm->vpes[i]) /* Skip NULL vPEs */
 			continue;

-		struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, i);
+		vcpu = kvm_get_vcpu(kvm, i);
 		int irq = its_vm->vpes[i]->irq;

 		irq_clear_status_flags(irq, DB_IRQ_FLAGS);
@@ -403,14 +404,14 @@ void vgic_v4_teardown(struct kvm *kvm)
 	}

 #ifdef CONFIG_ARM_GIC_V3_PER_VCPU_VLPI
-	/*
-	 * TODO: Free the shared VM properties that remain necessary
-	 * in per-vCPU mode. Create separate teardown function
-	 * that operates on a per-vCPU basis.
-	 */
-
-	/* vPE properties table */
+	/* Free shared VM vPE properties table */
 	its_free_prop_table(its_vm->vprop_page);
+
+	/* Free remaining doorbell IRQs */
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (its_vm->vpes[i])
+			its_free_vcpu_irq(vcpu);
+	}
 #else
 	its_free_vcpu_irqs(its_vm);
 #endif
@@ -419,6 +420,41 @@
 	its_vm->vpes = NULL;
 }

+/**
+ * vgic_v4_vcpu_teardown - teardown the GICv4 data structures for a
+ * specific vCPU
+ * @vcpu: Pointer to the vcpu being torn down
+ *
+ * Called every time the KVM_DISABLE_VCPU_VLPI ioctl is called.
+ */
+int vgic_v4_vcpu_teardown(struct kvm_vcpu *vcpu)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct vgic_dist *dist = &kvm->arch.vgic;
+	int i, irq;
+
+	/* Get vCPU index */
+	i = kvm_idx_from_vcpu(kvm, vcpu);
+	/* On userspace to validate that vcpu has vLPIs enabled before calling ioctl */
+	if (i == UINT_MAX || !dist->its_vm.vpes || !dist->its_vm.vpes[i])
+		return -EINVAL;
+
+	irq = dist->its_vm.vpes[i]->irq;
+
+	/* Free the vPE IRQ */
+	irq_clear_status_flags(irq, DB_IRQ_FLAGS);
+	free_irq(irq, vcpu);
+
+	/* Free vCPU IRQ resources */
+	its_free_vcpu_irq(vcpu);
+
+	/* Unlink distributor from vPE - this officially "disables" vLPIs on the vCPU */
+	dist->its_vm.vpes[i] = NULL;
+
+	return 0;
+}
+
 static inline bool vgic_v4_want_doorbell(struct kvm_vcpu *vcpu)
 {
 	if (vcpu_get_flag(vcpu, IN_WFI))
@@ -744,6 +780,41 @@ static int upgrade_existing_lpis_to_vlpis(struct kvm_vcpu *vcpu)
 	return 0;
 }

+static int downgrade_existing_vlpis_to_lpis(struct kvm_vcpu *vcpu)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_device *dev;
+	struct vgic_its *its;
+	struct its_device *device;
+	struct its_ite *ite;
+
+	list_for_each_entry(dev, &kvm->devices, vm_node) {
+		/* Ensure we only look at ITS devices */
+		if (dev->ops != &kvm_arm_vgic_its_ops)
+			continue;
+
+		its = dev->private;
+		mutex_lock(&its->its_lock);
+
+		list_for_each_entry(device, &its->device_list, dev_list) {
+			list_for_each_entry(ite, &device->itt_head, ite_list) {
+				/* Only downgrade vLPIs targeting this vCPU */
+				if (ite->collection &&
+				    ite->collection->target_addr == vcpu->vcpu_id &&
+				    ite->irq && ite->irq->hw) {
+
+					/* Unmap direct injection */
+					kvm_vgic_v4_unset_forwarding(kvm, ite->irq->host_irq);
+				}
+			}
+		}
+
+		mutex_unlock(&its->its_lock);
+	}
+
+	return 0;
+}
+
 /* Enable vLPI direct injection on a specific vCPU */
 int kvm_vgic_enable_vcpu_vlpi(struct kvm_vcpu *vcpu)
 {
@@ -772,6 +843,20 @@ int kvm_vgic_enable_vcpu_vlpi(struct kvm_vcpu *vcpu)
 	return upgrade_existing_lpis_to_vlpis(vcpu);
 }

+/* Disable vLPI direct injection on a specific vCPU */
+int kvm_vgic_disable_vcpu_vlpi(struct kvm_vcpu *vcpu)
+{
+	int vcpu_vlpi_status = kvm_vgic_query_vcpu_vlpi(vcpu);
+
+	/* vGIC not initialized for vCPU or vLPI already disabled */
+	if (vcpu_vlpi_status <= 0)
+		return vcpu_vlpi_status;
+
+	downgrade_existing_vlpis_to_lpis(vcpu);
+
+	return vgic_v4_vcpu_teardown(vcpu);
+}
+
 /* query whether vLPI direct injection is enabled on a specific vCPU.
 * return 0 if disabled, 1 if enabled, -EINVAL if vCPU non-existent or GIC
 * uninitialized
diff --git a/arch/arm64/kvm/vgic/vgic.h b/arch/arm64/kvm/vgic/vgic.h
index 60ae0d1f044d..b16419eb9121 100644
--- a/arch/arm64/kvm/vgic/vgic.h
+++ b/arch/arm64/kvm/vgic/vgic.h
@@ -438,6 +438,7 @@ static inline bool vgic_supports_direct_irqs(struct kvm *kvm)
 int vgic_v4_init(struct kvm *kvm);
 int vgic_v4_vcpu_init(struct kvm_vcpu *vcpu);
 void vgic_v4_teardown(struct kvm *kvm);
+int vgic_v4_vcpu_teardown(struct kvm_vcpu *vcpu);
 void vgic_v4_configure_vsgis(struct kvm *kvm);
 void vgic_v4_get_vlpi_state(struct vgic_irq *irq, bool *val);
 int vgic_v4_request_vpe_irq(struct kvm_vcpu *vcpu, int irq);
@@ -472,6 +473,7 @@ void vgic_its_debug_destroy(struct kvm_device *dev);

 bool kvm_per_vcpu_vlpi_supported(void);
 int kvm_vgic_enable_vcpu_vlpi(struct kvm_vcpu *vcpu);
+int kvm_vgic_disable_vcpu_vlpi(struct kvm_vcpu *vcpu);
 int kvm_vgic_query_vcpu_vlpi(struct kvm_vcpu *vcpu);

 #endif
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 67749578f973..0e0778d61df2 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -4600,7 +4600,9 @@ static void its_vpe_irq_domain_free(struct irq_domain *domain,
 					unsigned int virq,
 					unsigned int nr_irqs)
 {
+#ifndef CONFIG_ARM_GIC_V3_PER_VCPU_VLPI
 	struct its_vm *vm = domain->host_data;
+#endif
 	int i;

 	irq_domain_free_irqs_parent(domain, virq, nr_irqs);
@@ -4610,11 +4612,11 @@ static void its_vpe_irq_domain_free(struct irq_domain *domain,
 							  virq + i);
 		struct its_vpe *vpe = irq_data_get_irq_chip_data(data);

-		BUG_ON(vm != vpe->its_vm);

 #ifdef CONFIG_ARM_GIC_V3_PER_VCPU_VLPI
 		free_lpi_range(vpe->vpe_db_lpi, 1);
 #else
+		BUG_ON(vm != vpe->its_vm);
 		clear_bit(data->hwirq, vm->db_bitmap);
 #endif
 		its_vpe_teardown(vpe);
diff --git a/drivers/irqchip/irq-gic-v4.c b/drivers/irqchip/irq-gic-v4.c
index c8e324cd8911..6fa0edd19659 100644
--- a/drivers/irqchip/irq-gic-v4.c
+++ b/drivers/irqchip/irq-gic-v4.c
@@ -270,6 +270,25 @@ void its_free_vcpu_irqs(struct its_vm *vm)
 	irq_domain_free_fwnode(vm->fwnode);
 }

+void its_free_vcpu_irq(struct kvm_vcpu *vcpu)
+{
+	struct its_vpe *vpe = &vcpu->arch.vgic_cpu.vgic_v3.its_vpe;
+	unsigned int irq = irq_find_mapping(vpe->lpi_domain, 0);
+
+	if (WARN_ON(!irq))
+		return;
+
+	irq_domain_free_irqs(irq, 1);
+	irq_domain_remove(vpe->lpi_domain);
+	irq_domain_free_fwnode(vpe->lpi_fwnode);
+
+	/* Reset vPE fields to prevent stale references during re-enablement */
+	vpe->its_vm = NULL;
+	vpe->irq = 0;
+	vpe->lpi_domain = NULL;
+	vpe->lpi_fwnode = NULL;
+}
+
 static int its_send_vpe_cmd(struct its_vpe *vpe, struct its_cmd_info *info)
 {
 	return irq_set_vcpu_affinity(vpe->irq, info);
diff --git a/include/linux/irqchip/arm-gic-v4.h b/include/linux/irqchip/arm-gic-v4.h
index bc493fed75ab..bd3e8de35147 100644
--- a/include/linux/irqchip/arm-gic-v4.h
+++ b/include/linux/irqchip/arm-gic-v4.h
@@ -146,6 +146,7 @@ struct its_cmd_info {
 int its_alloc_vcpu_irqs(struct its_vm *vm);
 int its_alloc_vcpu_irq(struct kvm_vcpu *vcpu);
 void its_free_vcpu_irqs(struct its_vm *vm);
+void its_free_vcpu_irq(struct kvm_vcpu *vcpu);
 int its_make_vpe_resident(struct its_vpe *vpe, bool g0en, bool g1en);
 int its_make_vpe_non_resident(struct its_vpe *vpe, bool db);
 int its_commit_vpe(struct its_vpe *vpe);
The per-vCPU vLPI enable, disable, and query ioctls must be serialized: each must finish reading or writing the state of its target vCPU before another operation does the same.
Implement a vlpi_toggle_mutex that is acquired whenever a KVM_ENABLE_VCPU_VLPI, KVM_DISABLE_VCPU_VLPI, or KVM_QUERY_VCPU_VLPI ioctl is handled.
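For context, guard(mutex) in the diff below is the scoped-lock helper from linux/cleanup.h: it acquires the mutex at the declaration and releases it automatically when the enclosing scope is left. A minimal sketch of the pattern (the function name is illustrative, not from the patch):

    static int example_vlpi_toggle(struct kvm_vcpu *vcpu)
    {
            /* Acquire the per-vCPU toggle mutex... */
            guard(mutex)(&vcpu->arch.vgic_cpu.vlpi_toggle_mutex);

            /* ...the enable/disable/query operation runs to completion here... */

            return 0;       /* ...and the mutex is dropped on every return path. */
    }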
Signed-off-by: Maximilian Dittgen <mdittgen@amazon.com> --- arch/arm64/kvm/arm.c | 3 +++ arch/arm64/kvm/vgic/vgic-v4.c | 5 +++++ include/kvm/arm_vgic.h | 3 +++ 3 files changed, 11 insertions(+)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index 180eaa4165e9..c2224664f05e 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -1961,6 +1961,7 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) if (!vcpu) return -EINVAL;
+ guard(mutex)(&vcpu->arch.vgic_cpu.vlpi_toggle_mutex); return kvm_vgic_enable_vcpu_vlpi(vcpu); } case KVM_DISABLE_VCPU_VLPI: { @@ -1974,6 +1975,7 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) if (!vcpu) return -EINVAL;
+ guard(mutex)(&vcpu->arch.vgic_cpu.vlpi_toggle_mutex); return kvm_vgic_disable_vcpu_vlpi(vcpu); } case KVM_QUERY_VCPU_VLPI: { @@ -1987,6 +1989,7 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) if (!vcpu) return -EINVAL;
+ guard(mutex)(&vcpu->arch.vgic_cpu.vlpi_toggle_mutex); return kvm_vgic_query_vcpu_vlpi(vcpu); } default: diff --git a/arch/arm64/kvm/vgic/vgic-v4.c b/arch/arm64/kvm/vgic/vgic-v4.c index 39fababf2861..b7dbc1789c90 100644 --- a/arch/arm64/kvm/vgic/vgic-v4.c +++ b/arch/arm64/kvm/vgic/vgic-v4.c @@ -325,6 +325,11 @@ int vgic_v4_init(struct kvm *kvm) if (!dist->its_vm.vprop_page) ret = -ENOMEM; } + + /* vLPI toggle mutex */ + kvm_for_each_vcpu(i, vcpu, kvm) + mutex_init(&vcpu->arch.vgic_cpu.vlpi_toggle_mutex); + #endif if (ret) vgic_v4_teardown(kvm); diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h index 18a49c4b83f8..c9dad0e84c77 100644 --- a/include/kvm/arm_vgic.h +++ b/include/kvm/arm_vgic.h @@ -372,6 +372,9 @@ struct vgic_cpu { u32 rdreg_index; atomic_t syncr_busy;
+ /* Ensure atomicity of per-vCPU vLPI enable/disable/query operations */ + struct mutex vlpi_toggle_mutex; + /* Contains the attributes and gpa of the LPI pending tables. */ u64 pendbaser; /* GICR_CTLR.{ENABLE_LPIS,RWP} */
Once KVM_ENABLE_VCPU_VLPI and KVM_DISABLE_VCPU_VLPI are available, vCPU vPEs are dynamically allocated and freed at runtime. vSGI direct injection requires the receiving vCPU to have a vPE, so vSGI enablement must be coupled with vPE allocation to avoid injecting vSGIs into nonexistent vPEs.
Modify vgic_v4_configure_vsgis() to check whether a target vCPU has an assigned vPE before calling vgic_v4_enable_vsgis() on it at boot. Call vgic_v4_enable_vsgis() and vgic_v4_disable_vsgis() as the vCPU's vPE is allocated and freed within the vLPI enablement and disablement functions.
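For orientation, a sketch of the resulting call ordering (the helper names come from this series, but the two functions below are illustrative, not literal patch code):

    /* Enable path: the vPE must exist before vSGIs are switched over. */
    static int vlpi_enable_order(struct kvm_vcpu *vcpu)
    {
            int ret = vgic_v4_vcpu_init(vcpu);      /* allocate the vPE */

            if (ret)
                    return ret;
            if (kvm_vgic_global_state.has_gicv4_1 && vcpu->kvm->arch.vgic.nassgireq)
                    vgic_v4_enable_vsgis(vcpu);     /* safe: the vPE now exists
                                                       (the patch takes config_lock here) */
            return upgrade_existing_lpis_to_vlpis(vcpu);
    }

    /* Disable path: vSGIs must be switched back before the vPE is freed. */
    static int vlpi_disable_order(struct kvm_vcpu *vcpu)
    {
            downgrade_existing_vlpis_to_lpis(vcpu);
            vgic_v4_disable_vsgis(vcpu);            /* before the vPE goes away */
            return vgic_v4_vcpu_teardown(vcpu);
    }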
Signed-off-by: Maximilian Dittgen <mdittgen@amazon.de> --- arch/arm64/kvm/vgic/vgic-v4.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/vgic/vgic-v4.c b/arch/arm64/kvm/vgic/vgic-v4.c index b7dbc1789c90..5d6694d366b5 100644 --- a/arch/arm64/kvm/vgic/vgic-v4.c +++ b/arch/arm64/kvm/vgic/vgic-v4.c @@ -198,7 +198,7 @@ void vgic_v4_configure_vsgis(struct kvm *kvm) kvm_arm_halt_guest(kvm);
kvm_for_each_vcpu(i, vcpu, kvm) { - if (dist->nassgireq) + if (dist->nassgireq && kvm_vgic_query_vcpu_vlpi(vcpu) > 0) vgic_v4_enable_vsgis(vcpu); else vgic_v4_disable_vsgis(vcpu); @@ -838,6 +838,13 @@ int kvm_vgic_enable_vcpu_vlpi(struct kvm_vcpu *vcpu) if (ret) return ret;
+ /* Enable direct vSGIs */ + if (kvm_vgic_global_state.has_gicv4_1 && vcpu->kvm->arch.vgic.nassgireq) { + mutex_lock(&vcpu->kvm->arch.config_lock); + vgic_v4_enable_vsgis(vcpu); + mutex_unlock(&vcpu->kvm->arch.config_lock); + } + /* * Upgrade existing LPIs to vLPIs. We * do not need to error check since @@ -859,6 +866,8 @@ int kvm_vgic_disable_vcpu_vlpi(struct kvm_vcpu *vcpu)
downgrade_existing_vlpis_to_lpis(vcpu);
+ vgic_v4_disable_vsgis(vcpu); + return vgic_v4_vcpu_teardown(vcpu); }
Since GITS_TYPER.PTA == 0, the ITS MAPC command expects a CPU ID, rather than a physical redistributor address, as its RDbase argument.
As such, when MAPC-ing guest ITS collections, vgic_lpi_stress iterates over CPU IDs in the range [0, nr_cpus), passing them as the RDbase vcpu_id argument to its_send_mapc_cmd().
However, its_encode_target() in the its_send_mapc_cmd() selftest handler expects RDbase arguments to carry the target in bits [51:16], as shown by the 16-bit right shift of target_addr in its implementation:
its_mask_encode(&cmd->raw_cmd[2], target_addr >> 16, 51, 16)
At the moment, all CPU IDs passed into its_send_mapc_cmd() are unshifted, so they become 0x0 after the right shift. Thus, when vgic_its_cmd_handle_mapc() in vgic-its.c receives the ITS command, it always interprets the RDbase target CPU as CPU 0. All interrupts sent to collections are then processed by vCPU 0, which defeats the purpose of this multi-vCPU test.
Fix this by adding a procnum_to_rdbase() helper, which left-shifts the vcpu_id received by its_send_mapc_cmd() by 16 bits before passing it to its_encode_target() for encoding.
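For illustration only (not part of the patch), the round trip can be checked in plain C; encode_target_field() below mimics the shift its_encode_target() applies when packing bits [51:16]:

    #include <assert.h>
    #include <stdint.h>

    /* Mimics its_encode_target(): the RDbase field stores bits [51:16] of its argument. */
    static uint64_t encode_target_field(uint64_t target_addr)
    {
            return (target_addr >> 16) & ((1ULL << 36) - 1);
    }

    int main(void)
    {
            uint64_t vcpu_id = 3;

            assert(encode_target_field(vcpu_id) == 0);              /* before the fix: ID lost */
            assert(encode_target_field(vcpu_id << 16) == vcpu_id);  /* after the fix: ID kept */
            return 0;
    }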
Signed-off-by: Maximilian Dittgen <mdittgen@amazon.de> --- This patch has already been merged as a fix in Linux 6.18-rc6 as a24f7af. --- tools/testing/selftests/kvm/lib/arm64/gic_v3_its.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/lib/arm64/gic_v3_its.c b/tools/testing/selftests/kvm/lib/arm64/gic_v3_its.c index aec1b69a4de3..7f9fdcf42ae6 100644 --- a/tools/testing/selftests/kvm/lib/arm64/gic_v3_its.c +++ b/tools/testing/selftests/kvm/lib/arm64/gic_v3_its.c @@ -15,6 +15,8 @@ #include "gic_v3.h" #include "processor.h"
+#define GITS_COLLECTION_TARGET_SHIFT 16 + static u64 its_read_u64(unsigned long offset) { return readq_relaxed(GITS_BASE_GVA + offset); @@ -163,6 +165,11 @@ static void its_encode_collection(struct its_cmd_block *cmd, u16 col) its_mask_encode(&cmd->raw_cmd[2], col, 15, 0); }
+static u64 procnum_to_rdbase(u32 vcpu_id) +{ + return vcpu_id << GITS_COLLECTION_TARGET_SHIFT; +} + #define GITS_CMDQ_POLL_ITERATIONS 0
static void its_send_cmd(void *cmdq_base, struct its_cmd_block *cmd) @@ -217,7 +224,7 @@ void its_send_mapc_cmd(void *cmdq_base, u32 vcpu_id, u32 collection_id, bool val
its_encode_cmd(&cmd, GITS_CMD_MAPC); its_encode_collection(&cmd, collection_id); - its_encode_target(&cmd, vcpu_id); + its_encode_target(&cmd, procnum_to_rdbase(vcpu_id)); its_encode_valid(&cmd, valid);
its_send_cmd(cmdq_base, &cmd);
At the moment, all MSIs injected from userspace using KVM_SIGNAL_MSI are intercepted by the hypervisor and handled in software. To properly test GICv4 direct vLPI injection from KVM selftests, add a KVM_DEBUG_GIC_MSI_SETUP ioctl that manually creates an IRQ routing table entry for the specified MSI and populates the ITS structures (device, collection, and interrupt translation table entries) that map the MSI to a vLPI. The ioctl then calls kvm_vgic_v4_set_forwarding() so the vLPI bypasses hypervisor traps and is injected directly into the vCPU.
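A hypothetical invocation from a selftest might look as follows (field values are illustrative, and itt_gpa stands for a guest-physical ITT allocation; both are assumptions, not taken from the patch):

    struct kvm_debug_gic_msi_setup params = {
            .device_id = 0,
            .event_id  = 0,
            .vcpu_id   = 0,
            .vintid    = 8192,      /* first valid LPI INTID */
            .host_irq  = 32,        /* mock host IRQ in the SPI range */
            .itt_addr  = itt_gpa,   /* assumed guest-physical ITT backing */
    };

    if (ioctl(vm_fd, KVM_DEBUG_GIC_MSI_SETUP, &params))
            perror("KVM_DEBUG_GIC_MSI_SETUP");      /* e.g. vPE not enabled on the vCPU */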
Signed-off-by: Maximilian Dittgen <mdittgen@amazon.de> --- arch/arm64/kvm/arm.c | 34 +++++++ arch/arm64/kvm/vgic/vgic-its.c | 138 +++++++++++++++++++++++++++++ arch/arm64/kvm/vgic/vgic.h | 1 + include/linux/irqchip/arm-gic-v3.h | 1 + include/uapi/linux/kvm.h | 15 ++++ 5 files changed, 189 insertions(+)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index c2224664f05e..ecc3c87889db 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -45,6 +45,8 @@ #include <kvm/arm_pmu.h> #include <kvm/arm_psci.h>
+#include <vgic/vgic.h> + #include "sys_regs.h"
static enum kvm_mode kvm_mode = KVM_MODE_DEFAULT; @@ -1992,6 +1994,38 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) guard(mutex)(&vcpu->arch.vgic_cpu.vlpi_toggle_mutex); return kvm_vgic_query_vcpu_vlpi(vcpu); } + case KVM_DEBUG_GIC_MSI_SETUP: { + /* Define interrupt ID boundaries for input validation */ + #define GIC_LPI_OFFSET 8192 + #define GIC_LPI_MAX 65535 + #define SPI_INTID_MIN 32 + #define SPI_INTID_MAX 1019 + + struct kvm_debug_gic_msi_setup params; + struct kvm_vcpu *vcpu; + + if (copy_from_user(¶ms, argp, sizeof(params))) + return -EFAULT; + + /* validate vcpu_id is in range and exists */ + vcpu = kvm_get_vcpu_by_id(kvm, params.vcpu_id); + if (!vcpu) + return -EINVAL; + + /* validate vintid is in LPI range */ + if (params.vintid < GIC_LPI_OFFSET || params.vintid > GIC_LPI_MAX) + return -EINVAL; + + /* + * Validate host_irq is in safe range -- we use SPI range since + * selftests guests will have no shared peripheral devices + */ + if (params.host_irq < SPI_INTID_MIN || params.host_irq > SPI_INTID_MAX) + return -EINVAL; + + /* Mock single MSI for testing */ + return debug_gic_msi_setup_mock_msi(kvm, ¶ms); + } default: return -EINVAL; } diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c index 5f3bbf24cc2f..a0d140ce35d1 100644 --- a/arch/arm64/kvm/vgic/vgic-its.c +++ b/arch/arm64/kvm/vgic/vgic-its.c @@ -2815,3 +2815,141 @@ int kvm_vgic_register_its_device(void) return kvm_register_device_ops(&kvm_arm_vgic_its_ops, KVM_DEV_TYPE_ARM_VGIC_ITS); } + +static struct vgic_its *vgic_get_its(struct kvm *kvm, + struct kvm_kernel_irq_routing_entry *irq_entry) +{ + struct kvm_msi msi = (struct kvm_msi) { + .address_lo = irq_entry->msi.address_lo, + .address_hi = irq_entry->msi.address_hi, + .data = irq_entry->msi.data, + .flags = irq_entry->msi.flags, + .devid = irq_entry->msi.devid, + }; + + return vgic_msi_to_its(kvm, &msi); +} + +/* + * debug_gic_msi_setup_mock_msi - manually set up vLPI direct injection infrastructure + * for an MSI upon userspace request. Used for testing vLPIs from selftests. + * + * Creates an IRQ routing entry mapping the specified MSI signature to a mock + * host IRQ, then populates ITS structures (device, collection, ITE) to establish + * the DevID/EventID to LPI translation. Finally enables GICv4 vLPI forwarding + * to bypass software emulation and inject interrupts directly to the vCPU. + * + * This function is intended solely for KVM selftests via KVM_DEBUG_GIC_MSI_SETUP. + * It uses mock host IRQs in the SPI range assuming no real hardware devices are + * present on a selftest guest. Using this interface in production will corrupt the + * IRQ routing table. 
+ */ +int debug_gic_msi_setup_mock_msi(struct kvm *kvm, struct kvm_debug_gic_msi_setup *params) +{ + struct kvm_irq_routing_entry user_entry; + struct kvm_kernel_irq_routing_entry entry; + struct vgic_its *its; + struct its_device *device; + struct its_collection *collection; + struct its_ite *ite; + struct vgic_irq *irq; + struct kvm_vcpu *vcpu; + u64 doorbell_addr = GITS_BASE_GPA + GITS_TRANSLATER; + u32 device_id = params->device_id; + u32 event_id = params->event_id; + u32 coll_id = params->vcpu_id; + u32 lpi_nr = params->vintid; + gpa_t itt_addr = params->itt_addr; + int ret; + int host_irq = params->host_irq; + + /* Get target vCPU, validate it has a vPE for direct injection */ + vcpu = kvm_get_vcpu(kvm, params->vcpu_id); + if (!vcpu) + return -EINVAL; + else if (!vcpu->arch.vgic_cpu.vgic_v3.its_vpe.its_vm) + return -ENXIO; /* vPE not currently enabled for this vCPU */ + + /* + * Enable vLPIs for this vCPU manually for testing, normally + * done by the guest writing GICR_CTLR + */ + atomic_set(&vcpu->arch.vgic_cpu.ctlr, GICR_CTLR_ENABLE_LPIS); + + /* Unmap any existing vLPI on the mock host IRQ (remnants from prior mocks) */ + kvm_vgic_v4_unset_forwarding(kvm, host_irq); + + /* Create mock user IRQ routing entry using kvm_set_routing_entry function */ + memset(&user_entry, 0, sizeof(user_entry)); + user_entry.gsi = host_irq; + user_entry.type = KVM_IRQ_ROUTING_MSI; + user_entry.u.msi.address_lo = doorbell_addr & 0xFFFFFFFF; + user_entry.u.msi.address_hi = doorbell_addr >> 32; + user_entry.u.msi.data = event_id; + user_entry.u.msi.devid = device_id; + user_entry.flags = KVM_MSI_VALID_DEVID; + + /* Initialize kernel routing entry */ + memset(&entry, 0, sizeof(entry)); + + /* Use vgic-irqfd.c function to create entry */ + ret = kvm_set_routing_entry(kvm, &entry, &user_entry); + if (ret) + return ret; + + /* Now that we created an MSI -> ITS mapping, we can populate the ITS for this MSI */ + + /* Get ITS instance */ + its = vgic_get_its(kvm, &entry); + if (IS_ERR(its)) + return PTR_ERR(its); + + /* Enable ITS manually for testing, normally done by guest writing to GITS_CTLR register */ + its->enabled = true; + + mutex_lock(&its->its_lock); + + /* Create ITS device */ + device = vgic_its_alloc_device(its, device_id, itt_addr, 8); + if (IS_ERR(device)) { + ret = PTR_ERR(device); + goto unlock; + } + + /* Create collection mapped to the specified vCPU */ + ret = vgic_its_alloc_collection(its, &collection, coll_id); + if (ret) + goto unlock; + + collection->target_addr = params->vcpu_id; + + /* Create ITE */ + ite = vgic_its_alloc_ite(device, collection, event_id); + if (IS_ERR(ite)) { + ret = PTR_ERR(ite); + vgic_its_free_collection(its, coll_id); + goto unlock; + } + + /* Create LPI */ + irq = vgic_add_lpi(kvm, lpi_nr, vcpu); + if (IS_ERR(irq)) { + ret = PTR_ERR(irq); + its_free_ite(kvm, ite); + vgic_its_free_collection(its, coll_id); + goto unlock; + } + + ite->irq = irq; + update_affinity_ite(kvm, ite); + + /* Now that routing entry is initialized, call v4 forwarding setup */ + ret = kvm_vgic_v4_set_forwarding_locked(kvm, host_irq, &entry, its); + + mutex_unlock(&its->its_lock); + return ret; + +unlock: + mutex_unlock(&its->its_lock); + return ret; +} diff --git a/arch/arm64/kvm/vgic/vgic.h b/arch/arm64/kvm/vgic/vgic.h index b16419eb9121..9f8be87e3294 100644 --- a/arch/arm64/kvm/vgic/vgic.h +++ b/arch/arm64/kvm/vgic/vgic.h @@ -475,5 +475,6 @@ bool kvm_per_vcpu_vlpi_supported(void); int kvm_vgic_enable_vcpu_vlpi(struct kvm_vcpu *vcpu); int kvm_vgic_disable_vcpu_vlpi(struct kvm_vcpu *vcpu); int kvm_vgic_query_vcpu_vlpi(struct kvm_vcpu *vcpu); +int debug_gic_msi_setup_mock_msi(struct kvm *kvm, struct kvm_debug_gic_msi_setup *params);
#endif diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h index 5031a4c25543..1ab1eb80e685 100644 --- a/include/linux/irqchip/arm-gic-v3.h +++ b/include/linux/irqchip/arm-gic-v3.h @@ -378,6 +378,7 @@ #define GITS_CIDR3 0xfffc
#define GITS_TRANSLATER 0x10040 +#define GITS_BASE_GPA 0x8000000ULL
#define GITS_SGIR 0x20020
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 002fe0f4841d..057eb9e61ac8 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1457,6 +1457,21 @@ struct kvm_enc_region { #define KVM_DISABLE_VCPU_VLPI _IOW(KVMIO, 0xf1, int) #define KVM_QUERY_VCPU_VLPI _IOR(KVMIO, 0xf2, int)
+/* + * Generate an IRQ routing entry and vLPI tables for userspace-sourced + * MSI, enabling direct vLPI injection testing from selftests + */ +#define KVM_DEBUG_GIC_MSI_SETUP _IOW(KVMIO, 0xf3, struct kvm_debug_gic_msi_setup) + +struct kvm_debug_gic_msi_setup { + __u32 device_id; + __u32 event_id; + __u32 vcpu_id; + __u32 vintid; + __u32 host_irq; + __u64 itt_addr; +}; + #define KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE (1 << 0) #define KVM_DIRTY_LOG_INITIALLY_SET (1 << 1)
vgic_lpi_stress stress-tests LPIs by injecting MSIs into a guest's GIC. Since these MSIs are defined purely in userspace, the hypervisor handles them in software as LPIs.
We provide two ways to stress test direct-injected vLPIs.
- When per-vCPU vLPI injection is disabled, use the -D flag to upgrade all LPIs fired by the stress test to vLPIs. This flag mocks a host_irq for each MSI and calls KVM_DEBUG_GIC_MSI_SETUP to create and map the vITS data structures needed for direct injection.
- When per-vCPU vLPI injection is enabled, use the -s flag to pass per-vCPU command strings that control the state of vPE initialization on each vCPU throughout the test. This allows stress testing vLPI injection on partially-enabled VMs.
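As an illustration, assumed invocations based on the flag descriptions above (not taken from the patch):

    ./vgic_lpi_stress -D -v 4 -d 2 -e 32 -i 1000    # upgrade every LPI fired by the test to a vLPI
    ./vgic_lpi_stress -s dede,eede                  # 2 vCPUs, 4 iterations of per-vCPU toggling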
Signed-off-by: Maximilian Dittgen <mdittgen@amazon.de> --- .../selftests/kvm/arm64/vgic_lpi_stress.c | 181 +++++++++++++++++- 1 file changed, 177 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/kvm/arm64/vgic_lpi_stress.c b/tools/testing/selftests/kvm/arm64/vgic_lpi_stress.c index e857a605f577..b3fe5fdf4285 100644 --- a/tools/testing/selftests/kvm/arm64/vgic_lpi_stress.c +++ b/tools/testing/selftests/kvm/arm64/vgic_lpi_stress.c @@ -18,9 +18,17 @@ #include "ucall.h" #include "vgic.h"
+#define KVM_DEBUG_GIC_MSI_SETUP _IOW(KVMIO, 0xf3, struct kvm_debug_gic_msi_setup) + #define TEST_MEMSLOT_INDEX 1
#define GIC_LPI_OFFSET 8192 +#define SPI_IRQ_RANGE_OFFSET 32 + +static bool vlpi_enabled; +static bool string_mode; +static char **vcpu_strings; +static bool *vcpu_enabled;
static size_t nr_iterations = 1000; static vm_paddr_t gpa_base; @@ -222,6 +230,79 @@ static void setup_gic(void) its_fd = vgic_its_setup(vm); }
+static int enable_msi_vlpi_injection(u32 device_id, u32 event_id, + u32 vcpu_id, u32 vintid, u32 host_irq) +{ + struct kvm_debug_gic_msi_setup params = { + .device_id = device_id, + .event_id = event_id, + .vcpu_id = vcpu_id, + .vintid = vintid, + .host_irq = host_irq, + .itt_addr = test_data.itt_tables + (device_id * SZ_64K) + }; + + return __vm_ioctl(vm, KVM_DEBUG_GIC_MSI_SETUP, &params); +} + +static void upgrade_vcpu_lpis(struct kvm_vcpu *vcpu) +{ + u32 intid = GIC_LPI_OFFSET; + u32 target_vcpu = 0; /* Start round-robin from vCPU 0 */ + u32 device_id, event_id; + + for (device_id = 0; device_id < test_data.nr_devices; device_id++) { + for (event_id = 0; event_id < test_data.nr_event_ids; + event_id++) { + /* + * Only set up a vLPI mapping if this is the target vCPU + * for this interrupt + */ + if (target_vcpu == vcpu->id) { + /* + * We mock host_irqs in the SPI interrupt range + * of 32-1019 since selftest guests have no + * hardware devices + */ + int ret = enable_msi_vlpi_injection(device_id, + event_id, vcpu->id, intid, + intid - GIC_LPI_OFFSET + SPI_IRQ_RANGE_OFFSET); + + if (ret == -ENXIO || ret == -1) { + pr_info("Direct vLPI injection is disabled for vCPU %d, defaulting to software LPI handling\n", + vcpu->id); + return; + } + TEST_ASSERT(ret == 0, + "KVM_DEBUG_GIC_MSI_SETUP failed: %d\n", + ret); + } + + intid++; + target_vcpu = (target_vcpu + 1) % test_data.nr_cpus; + } + } +} + +static void enable_vcpu_vlpi_injection(int vcpu_id) +{ + int ret = ioctl(vm->fd, KVM_ENABLE_VCPU_VLPI, &vcpu_id); + + TEST_ASSERT(ret == 0, "KVM_ENABLE_VCPU_VLPI failed: %d", ret); + pr_info("Enabled vLPI injection on vCPU %d\n", vcpu_id); + upgrade_vcpu_lpis(vcpus[vcpu_id]); +} + +static void disable_vcpu_vlpi_injection(int vcpu_id) +{ + int ret = ioctl(vm->fd, KVM_DISABLE_VCPU_VLPI, &vcpu_id); + + TEST_ASSERT(ret == 0, "KVM_DISABLE_VCPU_VLPI failed: %d", ret); + pr_info("Disabled vLPI injection on vCPU %d\n", vcpu_id); +} + static void signal_lpi(u32 device_id, u32 event_id) { vm_paddr_t db_addr = GITS_BASE_GPA + GITS_TRANSLATER; @@ -243,18 +324,36 @@ static void signal_lpi(u32 device_id, u32 event_id) }
static pthread_barrier_t test_setup_barrier; +static pthread_barrier_t vlpi_upgrade_barrier;
static void *lpi_worker_thread(void *data) { u32 device_id = (size_t)data; u32 event_id; size_t i; + int vcpu_id;
pthread_barrier_wait(&test_setup_barrier); - - for (i = 0; i < nr_iterations; i++) + pthread_barrier_wait(&vlpi_upgrade_barrier); + + for (i = 0; i < nr_iterations; i++) { + /* conduct per-vCPU vLPI enablement/disablement */ + if (string_mode) { + for (vcpu_id = 0; vcpu_id < test_data.nr_cpus; vcpu_id++) { + char action = vcpu_strings[vcpu_id][i]; + + if (action == 'e' && !vcpu_enabled[vcpu_id]) { + enable_vcpu_vlpi_injection(vcpu_id); + vcpu_enabled[vcpu_id] = true; + } else if (action == 'd' && vcpu_enabled[vcpu_id]) { + disable_vcpu_vlpi_injection(vcpu_id); + vcpu_enabled[vcpu_id] = false; + } + } + } for (event_id = 0; event_id < test_data.nr_event_ids; event_id++) signal_lpi(device_id, event_id); + }
return NULL; } @@ -270,6 +369,10 @@ static void *vcpu_worker_thread(void *data) switch (get_ucall(vcpu, &uc)) { case UCALL_SYNC: pthread_barrier_wait(&test_setup_barrier); + /* if flag is set, set direct injection mappings for MSIs */ + if (vlpi_enabled) + upgrade_vcpu_lpis(vcpu); + pthread_barrier_wait(&vlpi_upgrade_barrier); continue; case UCALL_DONE: return NULL; @@ -309,6 +412,7 @@ static void run_test(void) TEST_ASSERT(lpi_threads && vcpu_threads, "Failed to allocate pthread arrays");
pthread_barrier_init(&test_setup_barrier, NULL, nr_vcpus + nr_devices + 1); + pthread_barrier_init(&vlpi_upgrade_barrier, NULL, nr_vcpus + nr_devices + 1);
for (i = 0; i < nr_vcpus; i++) pthread_create(&vcpu_threads[i], NULL, vcpu_worker_thread, vcpus[i]); @@ -317,6 +421,7 @@ static void run_test(void) pthread_create(&lpi_threads[i], NULL, lpi_worker_thread, (void *)i);
pthread_barrier_wait(&test_setup_barrier); + pthread_barrier_wait(&vlpi_upgrade_barrier); /* Wait for all vLPI upgrades */
clock_gettime(CLOCK_MONOTONIC, &start);
@@ -361,13 +466,71 @@ static void destroy_vm(void) free(vcpus); }
+static int parse_vcpu_strings(const char *str) +{ + char *token, *saveptr, *str_copy; + int count = 0, len = -1, i; + + str_copy = strdup(str); + TEST_ASSERT(str_copy, "Failed to allocate string copy"); + + token = strtok_r(str_copy, ",", &saveptr); + while (token) { + count++; + token = strtok_r(NULL, ",", &saveptr); + } + free(str_copy); + + vcpu_strings = malloc(count * sizeof(char *)); + vcpu_enabled = calloc(count, sizeof(bool)); + TEST_ASSERT(vcpu_strings && vcpu_enabled, "Failed to allocate arrays"); + + str_copy = strdup(str); + token = strtok_r(str_copy, ",", &saveptr); + for (i = 0; i < count; i++) { + int token_len = strlen(token); + + if (len == -1) + len = token_len; + else if (len != token_len) + TEST_FAIL("All strings must have same length"); + + TEST_ASSERT(len > 0, "Strings cannot be empty"); + + for (int j = 0; j < token_len; j++) + if (token[j] != 'd' && token[j] != 'e') + TEST_FAIL("Strings can only contain 'd' and 'e'"); + + vcpu_strings[i] = strdup(token); + TEST_ASSERT(vcpu_strings[i], "Failed to allocate string"); + token = strtok_r(NULL, ",", &saveptr); + } + free(str_copy); + + test_data.nr_cpus = count; + test_data.nr_devices = 1; + test_data.nr_event_ids = count; + nr_iterations = len; + + return 0; +} + static void pr_usage(const char *name) { - pr_info("%s [-v NR_VCPUS] [-d NR_DEVICES] [-e NR_EVENTS] [-i ITERS] -h\n", name); + pr_info("%s -D [-v NR_VCPUS] [-d NR_DEVICES] [-e NR_EVENTS] [-i ITERS] | -s STRINGS -h\n", + name); + pr_info(" -D:\tenable direct vLPI injection (default: %s)\n", + vlpi_enabled ? "true" : "false"); pr_info(" -v:\tnumber of vCPUs (default: %u)\n", test_data.nr_cpus); pr_info(" -d:\tnumber of devices (default: %u)\n", test_data.nr_devices); pr_info(" -e:\tnumber of event IDs per device (default: %u)\n", test_data.nr_event_ids); pr_info(" -i:\tnumber of iterations (default: %lu)\n", nr_iterations); + pr_info(" -s:\tvCPU control strings (comma-separated, e.g., \"dede,eede\"),\n"); + pr_info(" \twhere each string corresponds to the per-iteration vLPI status\n"); + pr_info(" \tof a single vCPU. \"ddeed\" means a vCPU will be vLPI-disabled for two\n"); + pr_info(" \titerations, enabled for two iterations, then disabled for one iteration.\n"); + pr_info(" \tNumber of strings corresponds to the number of vCPUs, and all strings must\n"); + pr_info(" \tbe of the same size. Cannot be used in conjunction with other flags.\n"); }
int main(int argc, char **argv) @@ -377,8 +540,11 @@ int main(int argc, char **argv)
TEST_REQUIRE(kvm_supports_vgic_v3());
- while ((c = getopt(argc, argv, "hv:d:e:i:")) != -1) { + while ((c = getopt(argc, argv, "hDv:d:e:i:s:")) != -1) { switch (c) { + case 'D': + vlpi_enabled = true; + break; case 'v': test_data.nr_cpus = atoi(optarg); break; @@ -391,6 +557,10 @@ int main(int argc, char **argv) case 'i': nr_iterations = strtoul(optarg, NULL, 0); break; + case 's': + string_mode = true; + parse_vcpu_strings(optarg); + break; case 'h': default: pr_usage(argv[0]); @@ -398,6 +568,9 @@ int main(int argc, char **argv) } }
+ if (string_mode && argc > 3) + TEST_FAIL("-s cannot be used with other flags"); + nr_threads = test_data.nr_cpus + test_data.nr_devices; if (nr_threads > get_nprocs()) pr_info("WARNING: running %u threads on %d CPUs; performance is degraded.\n",
Add a selftest for the KVM API ioctls that enable, disable, and query direct vLPI injection capability on a per-vCPU basis. Ensure that ITS data structures remain correct, that vPEIDs can be reused by different vCPUs, and that ioctl behavior holds in corner cases (idempotent calls, uninitialized vGIC).
Signed-off-by: Maximilian Dittgen <mdittgen@amazon.de> --- arch/arm64/kvm/arm.c | 4 + drivers/irqchip/irq-gic-v3-its.c | 6 + include/linux/irqchip/arm-gic-v4.h | 1 + include/uapi/linux/kvm.h | 1 + tools/testing/selftests/kvm/Makefile.kvm | 1 + .../selftests/kvm/arm64/per_vcpu_vlpi.c | 274 ++++++++++++++++++ 6 files changed, 287 insertions(+) create mode 100644 tools/testing/selftests/kvm/arm64/per_vcpu_vlpi.c
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index ecc3c87889db..eea0d77508a2 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -21,6 +21,7 @@ #include <linux/sched/stat.h> #include <linux/psci.h> #include <trace/events/kvm.h> +#include <linux/irqchip/arm-gic-v4.h>
#define CREATE_TRACE_POINTS #include "trace_arm.h" @@ -429,6 +430,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_ARM_PER_VCPU_VLPI: r = kvm_per_vcpu_vlpi_supported(); break; + case KVM_CAP_ARM_MAX_VPEID: + r = its_get_max_vpeid(); + break;
default: r = 0; diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c index 0e0778d61df2..078a9cafaf17 100644 --- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -4546,6 +4546,12 @@ static const struct irq_domain_ops its_sgi_domain_ops = { .deactivate = its_sgi_irq_domain_deactivate, };
+int its_get_max_vpeid(void) +{ + return ITS_MAX_VPEID; +} +EXPORT_SYMBOL_GPL(its_get_max_vpeid); + static int its_vpe_id_alloc(void) { return ida_alloc_max(&its_vpeid_ida, ITS_MAX_VPEID - 1, GFP_KERNEL); diff --git a/include/linux/irqchip/arm-gic-v4.h b/include/linux/irqchip/arm-gic-v4.h index bd3e8de35147..3a42cccb72af 100644 --- a/include/linux/irqchip/arm-gic-v4.h +++ b/include/linux/irqchip/arm-gic-v4.h @@ -147,6 +147,7 @@ int its_alloc_vcpu_irqs(struct its_vm *vm); int its_alloc_vcpu_irq(struct kvm_vcpu *vcpu); void its_free_vcpu_irqs(struct its_vm *vm); void its_free_vcpu_irq(struct kvm_vcpu *vcpu); +int its_get_max_vpeid(void); int its_make_vpe_resident(struct its_vpe *vpe, bool g0en, bool g1en); int its_make_vpe_non_resident(struct its_vpe *vpe, bool db); int its_commit_vpe(struct its_vpe *vpe); diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 057eb9e61ac8..9f0ae2096e58 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -974,6 +974,7 @@ struct kvm_enable_cap { #define KVM_CAP_GUEST_MEMFD_FLAGS 244 #define KVM_CAP_ARM_SEA_TO_USER 245 #define KVM_CAP_ARM_PER_VCPU_VLPI 246 +#define KVM_CAP_ARM_MAX_VPEID 247
struct kvm_irq_routing_irqchip { __u32 irqchip; diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm index 02a7663c097b..71a929ef7e5d 100644 --- a/tools/testing/selftests/kvm/Makefile.kvm +++ b/tools/testing/selftests/kvm/Makefile.kvm @@ -162,6 +162,7 @@ TEST_GEN_PROGS_arm64 += arm64/host_sve TEST_GEN_PROGS_arm64 += arm64/hypercalls TEST_GEN_PROGS_arm64 += arm64/external_aborts TEST_GEN_PROGS_arm64 += arm64/page_fault_test +TEST_GEN_PROGS_arm64 += arm64/per_vcpu_vlpi TEST_GEN_PROGS_arm64 += arm64/psci_test TEST_GEN_PROGS_arm64 += arm64/sea_to_user TEST_GEN_PROGS_arm64 += arm64/set_id_regs diff --git a/tools/testing/selftests/kvm/arm64/per_vcpu_vlpi.c b/tools/testing/selftests/kvm/arm64/per_vcpu_vlpi.c new file mode 100644 index 000000000000..9a5b1b40ff10 --- /dev/null +++ b/tools/testing/selftests/kvm/arm64/per_vcpu_vlpi.c @@ -0,0 +1,274 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Test per-vCPU vLPI enable/disable/query correctness + */ + +#include <linux/kvm.h> +#include <pthread.h> +#include <sys/resource.h> +#include "test_util.h" +#include "kvm_util.h" +#include "processor.h" +#include "gic.h" +#include "vgic.h" +#include "../kselftest_harness.h" + +static int MAX_VCPUS; +static int ITS_MAX_VPEID; + +/* Dynamically fetch MAX_VCPUS and ITS_MAX_VPEID values */ +__attribute__((constructor)) +static void init_test_limits(void) +{ + int kvm_fd = open("/dev/kvm", O_RDWR); + int max_vcpus, max_vpeids; + + if (kvm_fd >= 0) { + max_vcpus = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_MAX_VCPUS); + if (max_vcpus > 0) + MAX_VCPUS = max_vcpus; + + max_vpeids = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_ARM_MAX_VPEID); + if (max_vpeids > 0) + ITS_MAX_VPEID = max_vpeids; + + close(kvm_fd); + } +} + +static void guest_code(void) +{ + GUEST_SYNC(0); + GUEST_DONE(); +} + +static void setup_vm_with_gic(struct kvm_vm **vm, struct kvm_vcpu **vcpu, int nr_vcpus) +{ + struct kvm_vcpu **vcpus; + + TEST_REQUIRE(kvm_supports_vgic_v3()); + + if (nr_vcpus == 1) { + *vm = vm_create_with_one_vcpu(vcpu, guest_code); + } else { + vcpus = calloc(nr_vcpus, sizeof(*vcpus)); + TEST_ASSERT(vcpus, "Failed to allocate vcpu array"); + *vm = vm_create_with_vcpus(nr_vcpus, guest_code, vcpus); + *vcpu = vcpus[0]; + free(vcpus); + } +} + +static void cleanup_vm(struct kvm_vm *vm, int its_fd) +{ + if (its_fd >= 0) + close(its_fd); + kvm_vm_free(vm); +} + +TEST(basic_vlpi_toggle) +{ + struct kvm_vm *vm; + struct kvm_vcpu *vcpu; + int its_fd, ret; + int vcpu_id = 0; + + setup_vm_with_gic(&vm, &vcpu, 1); + its_fd = vgic_its_setup(vm); + + ret = ioctl(vm->fd, KVM_QUERY_VCPU_VLPI, &vcpu_id); + EXPECT_GE(ret, 0); + + ret = ioctl(vm->fd, KVM_ENABLE_VCPU_VLPI, &vcpu_id); + EXPECT_EQ(ret, 0); + + ret = ioctl(vm->fd, KVM_QUERY_VCPU_VLPI, &vcpu_id); + EXPECT_GT(ret, 0); + + ret = ioctl(vm->fd, KVM_DISABLE_VCPU_VLPI, &vcpu_id); + EXPECT_EQ(ret, 0); + + ret = ioctl(vm->fd, KVM_QUERY_VCPU_VLPI, &vcpu_id); + EXPECT_EQ(ret, 0); + + cleanup_vm(vm, its_fd); +} + +/* recycle test */ +struct thread_data { + struct kvm_vm *vm; + int vcpu_id; + int ret; +}; + +static void *vlpi_thread(void *arg) +{ + struct thread_data *data = arg; + + data->ret = ioctl(data->vm->fd, KVM_ENABLE_VCPU_VLPI, &data->vcpu_id); + if (data->ret == 0) + data->ret = ioctl(data->vm->fd, KVM_DISABLE_VCPU_VLPI, &data->vcpu_id); + + return NULL; +} + +TEST(vpeid_recycling) +{ + struct kvm_vm *vm; + struct kvm_vcpu *vcpu; + int its_fd; + int vcpu_id, i; + int cycles = (ITS_MAX_VPEID * 2) / MAX_VCPUS; + pthread_t threads[MAX_VCPUS]; + 
struct thread_data data[MAX_VCPUS]; + + setup_vm_with_gic(&vm, &vcpu, MAX_VCPUS); + its_fd = vgic_its_setup(vm); + + for (i = 0; i < cycles; i++) { + for (vcpu_id = 0; vcpu_id < MAX_VCPUS; vcpu_id++) { + data[vcpu_id].vm = vm; + data[vcpu_id].vcpu_id = vcpu_id; + pthread_create(&threads[vcpu_id], NULL, vlpi_thread, &data[vcpu_id]); + } + + for (vcpu_id = 0; vcpu_id < MAX_VCPUS; vcpu_id++) { + pthread_join(threads[vcpu_id], NULL); + EXPECT_EQ(data[vcpu_id].ret, 0); + } + } + + cleanup_vm(vm, its_fd); +} + +TEST(double_enable_disable) +{ + struct kvm_vm *vm; + struct kvm_vcpu *vcpu; + int its_fd, ret; + int vcpu_id = 0; + + setup_vm_with_gic(&vm, &vcpu, 1); + its_fd = vgic_its_setup(vm); + + ret = ioctl(vm->fd, KVM_ENABLE_VCPU_VLPI, &vcpu_id); + EXPECT_EQ(ret, 0); + + ret = ioctl(vm->fd, KVM_ENABLE_VCPU_VLPI, &vcpu_id); + EXPECT_EQ(ret, 0); + + ret = ioctl(vm->fd, KVM_QUERY_VCPU_VLPI, &vcpu_id); + EXPECT_GT(ret, 0); + + ret = ioctl(vm->fd, KVM_DISABLE_VCPU_VLPI, &vcpu_id); + EXPECT_EQ(ret, 0); + + ret = ioctl(vm->fd, KVM_DISABLE_VCPU_VLPI, &vcpu_id); + EXPECT_EQ(ret, 0); + + ret = ioctl(vm->fd, KVM_QUERY_VCPU_VLPI, &vcpu_id); + EXPECT_EQ(ret, 0); + + cleanup_vm(vm, its_fd); +} + +TEST(uninitialized_vcpu) +{ + struct kvm_vm *vm; + struct kvm_vcpu *vcpu; + int its_fd, ret; + int invalid_vcpu_id = 999; + + setup_vm_with_gic(&vm, &vcpu, 1); + its_fd = vgic_its_setup(vm); + + ret = ioctl(vm->fd, KVM_QUERY_VCPU_VLPI, &invalid_vcpu_id); + EXPECT_LT(ret, 0); + + ret = ioctl(vm->fd, KVM_ENABLE_VCPU_VLPI, &invalid_vcpu_id); + EXPECT_LT(ret, 0); + + ret = ioctl(vm->fd, KVM_DISABLE_VCPU_VLPI, &invalid_vcpu_id); + EXPECT_LT(ret, 0); + + cleanup_vm(vm, its_fd); +} + +TEST(vpeid_exhaustion) +{ + struct rlimit rlim; + struct kvm_vm **vms; + struct kvm_vcpu **vcpus; + int *its_fds; + /* Allocate enough VMs to exhaust vPEs, plus one */ + int num_vms = ITS_MAX_VPEID / MAX_VCPUS + 1; + int remainder_vcpus = ITS_MAX_VPEID % MAX_VCPUS; + int vm_idx, vcpu_id, ret; + int successful_enables = 0; + + /* Raise fd limit if below vPE limit, as we can't allocate enough vCPUs */ + if (getrlimit(RLIMIT_NOFILE, &rlim) == 0) { + struct rlimit new_rlim = rlim; + /* + * Require [num_vms * (vcpus_per_vm + VM_fd + ITS_fd) + KVM] file + * descriptors, tripled for safety. 
+ */ + int required_fds = (num_vms * (MAX_VCPUS + 2) + 1) * 3; + + if (rlim.rlim_cur < required_fds) { + new_rlim.rlim_cur = min_t(rlim_t, required_fds, rlim.rlim_max); + if (setrlimit(RLIMIT_NOFILE, &new_rlim) != 0) { + SKIP(return, "Need %d FDs, have %ld, cannot increase limit", + required_fds, rlim.rlim_cur); + } + } + } + + vms = calloc(num_vms, sizeof(*vms)); + vcpus = calloc(num_vms, sizeof(*vcpus)); + its_fds = calloc(num_vms, sizeof(*its_fds)); + TEST_ASSERT(vms && vcpus && its_fds, "Failed to allocate VM arrays"); + + /* Create all VMs */ + for (vm_idx = 0; vm_idx < num_vms; vm_idx++) { + setup_vm_with_gic(&vms[vm_idx], &vcpus[vm_idx], MAX_VCPUS); + its_fds[vm_idx] = vgic_its_setup(vms[vm_idx]); + } + + /* Exhaust all vPEs */ + for (vm_idx = 0; vm_idx < num_vms - 1; vm_idx++) { + for (vcpu_id = 0; vcpu_id < MAX_VCPUS; vcpu_id++) { + ret = ioctl(vms[vm_idx]->fd, KVM_ENABLE_VCPU_VLPI, &vcpu_id); + if (ret == 0) + successful_enables++; + } + } + + for (vcpu_id = 0; vcpu_id < remainder_vcpus; vcpu_id++) { + ret = ioctl(vms[num_vms - 1]->fd, KVM_ENABLE_VCPU_VLPI, &vcpu_id); + if (ret == 0) + successful_enables++; + } + + /* Should have exhausted vPEID limit */ + TEST_ASSERT(successful_enables == ITS_MAX_VPEID, + "Failed to allocate all existing vPEIDs"); + + /* Try assigning one more vPEID past exhaustion*/ + vcpu_id = remainder_vcpus; + ret = ioctl(vms[num_vms - 1]->fd, KVM_ENABLE_VCPU_VLPI, &vcpu_id); + + /* Verify failure to allocate additional vPEID */ + TEST_ASSERT(ret < 0, "Failed to detect vPEID exhaustion"); + + /* Cleanup all VMs */ + for (vm_idx = 0; vm_idx < num_vms; vm_idx++) + cleanup_vm(vms[vm_idx], its_fds[vm_idx]); + + free(vms); + free(vcpus); + free(its_fds); + setrlimit(RLIMIT_NOFILE, &rlim); /* Restore fd limit */ +} + +TEST_HARNESS_MAIN
Maximilian: you keep ignoring the reviewers that are listed in MAINTAINERS. This isn't acceptable. Next time, I will simply ignore your patches.
On Thu, 20 Nov 2025 14:02:49 +0000, Maximilian Dittgen <mdittgen@amazon.de> wrote:
> At the moment, the ability to direct-inject vLPIs is only enableable on an all-or-nothing per-VM basis, causing unnecessary I/O performance loss in cases where a VM's vCPU count exceeds available vPEs. This RFC introduces per-vCPU control over vLPI injection to realize potential I/O performance gain in such situations.
> Background
> The value of dynamically enabling the direct injection of vLPIs on a per-vCPU basis is the ability to run guest VMs with simultaneous hardware-forwarded and software-forwarded message-signaled interrupts.
> Currently, hardware-forwarded vLPI direct injection on a KVM guest requires GICv4 and is enabled on a per-VM, all-or-nothing basis. vLPI injection enablment happens in two stages:
> 1) At vGIC initialization, allocate direct injection structures for each vCPU (doorbell IRQ, vPE table entry, virtual pending table, vPEID). 2) When a PCI device is configured for passthrough, map its MSIs to vLPIs using the structures allocated in step 1. Step 1 is all-or-nothing; if any vCPU cannot be configured with the vPE structures necessary for direct injection, the vPEs of all vCPUs are torn down and direct injection is disabled VM-wide.
> This universality of direct vLPI injection enablement sparks several issues, with the most pressing being performance degradation on overcommitted hosts.
> VM-wide vLPI enablement creates resource inefficiency when guest VMs have more vCPUs than the host has available vPEIDs. The amount of vPEIDs (and consequently, vPEs) a host can allocate is constrained by hardware and defined by GICD_TYPER2.VID + 1 (ITS_MAX_VPEID). Since direct injection requires a vCPU to be assigned a vPEID, at most ITS_MAX_VPEID vCPUs can be configured for direct injection at a time. Because vLPI direct injection is all-or-nothing on a VM, if a new guest VM would exhaust remaining vPEIDs, all vCPUs on that VM would fall back to hypervisor-forwarded LPIs, causing considerable I/O performance degradation.
> Such performance degradation is exemplified on hosts with CPU overcommitment. Overcommitting an arbitrarily high number of vCPUs enables a VM's vCPU count to easily exceed the host's available vPEIDs.
Let it be crystal clear: GICv4 and overcommitment is a non-story. It isn't designed for that. If that's what you are trying to achieve, you clearly didn't get the memo.
> Even with marginally more vCPUs than vPEIDs, the current all-or-nothing vLPI paradigm disables direct injection entirely. This creates two problems: first, a single many-vCPU overcommitted VM loses all direct injection despite having vPEIDs available;
Are you saying that your HW is so undersized that you cannot create a *single VM* with direct injection? You really have fewer than 9 bits' worth of VPEIDs? I'm sorry, but that's laughable. Even a $200 dev board does better.
> second, on multi-tenant hosts, VMs booted first consume all vPEIDs, leaving later VMs without direct injection regardless of their I/O intensity. Per-vCPU control would allow userspace to allocate available vPEIDs across VMs based on I/O workload rather than boot order or per-VM vCPU count. This per-vCPU granularity recovers most of the direct injection performance benefit instead of losing it completely.
> To allow this per-vCPU granularity, this RFC introduces three new ioctls to the KVM API that enables userspace the ability to activate/deactivate direct vLPI injection capability and resources to vCPUs ad-hoc during VM runtime.
How can that even work when changing the affinity of a vLPI (directly injected) to a vcpu that doesn't have direct injection enabled? You'd have to unmap the vLPI and plug it back in as a normal LPI. Not only is this absolutely ridiculous from a performance perspective, but you are also guaranteed to lose interrupts that would have fired in the meantime. Losing interrupts is a total no-go.
Before I even look at the code, I expect you to explain how you are dealing with this.
M.