At the moment, direct injection of vLPIs can only be enabled on an all-or-nothing, per-VM basis, causing unnecessary I/O performance loss when a VM's vCPU count exceeds the number of available vPEs. This RFC introduces per-vCPU control over vLPI injection to recover the potential I/O performance in such situations.
Background
----------
The value of dynamically enabling the direct injection of vLPIs on a per-vCPU basis is the ability to run guest VMs with simultaneous hardware-forwarded and software-forwarded message-signaled interrupts.
Currently, hardware-forwarded vLPI direct injection on a KVM guest requires GICv4 and is enabled on a per-VM, all-or-nothing basis. vLPI injection enablement happens in two stages:
1) At vGIC initialization, allocate direct injection structures for each vCPU (doorbell IRQ, vPE table entry, virtual pending table, vPEID).
2) When a PCI device is configured for passthrough, map its MSIs to vLPIs using the structures allocated in step 1.
Step 1 is all-or-nothing; if any vCPU cannot be configured with the vPE structures necessary for direct injection, the vPEs of all vCPUs are torn down and direct injection is disabled VM-wide.
This all-or-nothing approach to enabling direct vLPI injection raises several issues, the most pressing being performance degradation on overcommitted hosts.
VM-wide vLPI enablement creates resource inefficiency when guest VMs have more vCPUs than the host has available vPEIDs. The number of vPEIDs (and consequently, vPEs) a host can allocate is constrained by hardware and defined by GICD_TYPER2.VID + 1 (ITS_MAX_VPEID). Since direct injection requires a vCPU to be assigned a vPEID, at most ITS_MAX_VPEID vCPUs can be configured for direct injection at a time; for example, an implementation with GICD_TYPER2.VID = 511 can direct-inject for at most 512 vCPUs across all VMs on the host. Because vLPI direct injection is all-or-nothing on a VM, if a new guest VM would exhaust the remaining vPEIDs, all vCPUs on that VM fall back to hypervisor-forwarded LPIs, causing considerable I/O performance degradation.
Such performance degradation is most pronounced on hosts with CPU overcommitment. Overcommitting an arbitrarily high number of vCPUs allows a VM's vCPU count to easily exceed the host's available vPEIDs. Even with marginally more vCPUs than vPEIDs, the current all-or-nothing vLPI paradigm disables direct injection entirely. This creates two problems: first, a single many-vCPU overcommitted VM loses all direct injection despite vPEIDs being available; second, on multi-tenant hosts, VMs booted first consume all vPEIDs, leaving later VMs without direct injection regardless of their I/O intensity. Per-vCPU control would allow userspace to allocate available vPEIDs across VMs based on I/O workload rather than boot order or per-VM vCPU count, recovering most of the direct injection performance benefit instead of losing it completely.
To allow this per-vCPU granularity, this RFC introduces three new ioctls to the KVM API that enable userspace to activate/deactivate direct vLPI injection capability and resources on individual vCPUs ad hoc during VM runtime.
This RFC proposes userspace control, rather than kernel control, over vPEID allocation for simplicity of implementation, ease of testability, and autonomy over resource usage. In the future, the vLPI enable/disable building blocks from this RFC may be used to implement a full vPE allocation policy in the kernel.
The solution comes in several parts
-----------------------------------
1) [P 1] General declarations (ioctl definitions/stubs, kconfig option)
2) [P 2] Conditionally disable auto vLPI injection init routines
To prevent vCPUs from exceeding vPEID allocation limits upon VM boot, disable automatic vPEID allocation in the GICv4 initialization routine when the per-vCPU kconfig is active. Likewise, disable automatic hardware forwarding for PCI device-backed MSIs upon device registration.
3) [P 3-6] Implement per-vCPU vLPI enablement routine, which:
a) Creates a per-vCPU doorbell IRQ on a new vCPU-scoped, rather than VM-scoped, interrupt domain hierarchy.
b) Allocates per-vCPU vPE table entries and virtual pending table, linking them to the vCPU's doorbell IRQ.
c) Iterates through interrupt translation table to set hardware forwarding for all PCI device–backed interrupts targeting the specific vCPU.
4) [P 7-8] Implement per-vCPU vLPI disablement routine, which:
a) Iterates through interrupt translation table to unset hardware forwarding for all interrupts targeting the specific vCPU.
b) Frees per-vCPU vPE table entries, virtual pending table, and doorbell IRQ, then removes vgic_dist's pointer to the vCPU's freed vPE.
5) [P 9] Couple vSGI enablement with per-vCPU vPE allocation
Since vSGIs cannot be direct-injected without an allocated vPE on the receiving vCPU, couple vSGI enablement with vLPI enablement on GICv4.1.
6) [P 10-13] Write selftests for vLPI direct injection
PCI devices cannot be passed through to selftest guests, so define an ioctl that mocks a hardware source for software-defined MSI interrupts and sets vLPI "hardware" forwarding for the MSIs. Use these vLPIs to selftest per-vCPU vLPI enablement/disablement ioctls.
Testing
-------
Testing has been carried out via selftests and QEMU-emulated guests.
Selftests have covered diverse vLPI configurations and race conditions, including:
1) Stress testing LPI injection across multiple vCPUs while concurrently and repeatedly toggling the vCPUs' vLPI injection capability.
2) Enabling/disabling vLPI direct injection while scheduling or unscheduling a vCPU.
3) Allocating and freeing a single vPEID to multiple vCPUs, ensuring reusability.
4) Attempting to allocate a vPEID when all are already allocated, validating that an error is returned.
5) Calling the enable/disable vLPI ioctls when the GIC is not initialized.
6) Idempotent ioctl calls (sketched below).
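The toggling and idempotency checks can be sketched roughly as follows. This is illustrative only, not the actual selftest code: it assumes a vm_fd whose vGIC is initialized, nr_vcpus vCPUs, and the TEST_ASSERT helper from the KVM selftest framework.

	#include <sys/ioctl.h>
	#include <linux/kvm.h>
	#include "test_util.h"

	static void toggle_stress(int vm_fd, int nr_vcpus)
	{
		for (int iter = 0; iter < 1000; iter++) {
			int vcpu_id = iter % nr_vcpus;

			/* Enabling twice must succeed and leave vLPIs enabled (test 6). */
			TEST_ASSERT(!ioctl(vm_fd, KVM_ENABLE_VCPU_VLPI, &vcpu_id), "enable");
			TEST_ASSERT(!ioctl(vm_fd, KVM_ENABLE_VCPU_VLPI, &vcpu_id), "re-enable");
			TEST_ASSERT(ioctl(vm_fd, KVM_QUERY_VCPU_VLPI, &vcpu_id) == 1, "query on");

			/* Disabling frees the vPEID for reuse on a later iteration (test 3). */
			TEST_ASSERT(!ioctl(vm_fd, KVM_DISABLE_VCPU_VLPI, &vcpu_id), "disable");
			TEST_ASSERT(ioctl(vm_fd, KVM_QUERY_VCPU_VLPI, &vcpu_id) == 0, "query off");
		}
	}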
PCI device passthrough and interrupt injection to a QEMU guest demonstrated:
1) Complete hypervisor circumvention when vLPI injection is enabled on a vCPU, and hypervisor forwarding when vLPI injection is disabled.
2) Interrupts are not lost when received during per-vCPU vLPI state transitions.
Caveats
-------
1) Pending interrupts are flushed when vLPI injection is disabled for a vCPU; hardware pending state is not transferred to software. This may cause pending interrupts to be lost upon vPE disablement.
Unlike vSGIs, vLPIs do not expose their pending state through a GICD_ISPENDR register. Reading a vLPI's pending state therefore requires reading the vPT, which in turn requires invalidating any vPT cache associated with the vCPU's vPE. That means unmapping the vPE and halting the vCPU, which would be incredibly expensive and unnecessary given that MSIs are usually recoverable by the driver.
2) Direct-injected vSGIs (GICv4.1) require vCPUs to have associated vPEs. Since disabling vLPI injection on a vCPU frees its vPE, vSGI direct injection must simultaneously be disabled as well. At the moment, we use the per-vCPU vSGI toggle mechanism introduced in commit bacf2c6 to enable/disable vSGI injection alongside vLPI injection.
Maximilian Dittgen (13):
  KVM: Introduce config option for per-vCPU vLPI enablement
  KVM: arm64: Disable auto vCPU vPE assignment with per-vCPU vLPI config
  KVM: arm64: Refactor out locked section of kvm_vgic_v4_set_forwarding()
  KVM: arm64: Implement vLPI QUERY ioctl for per-vCPU vLPI injection API
  KVM: arm64: Implement vLPI ENABLE ioctl for per-vCPU vLPI injection API
  KVM: arm64: Resolve race between vCPU scheduling and vLPI enablement
  KVM: arm64: Implement vLPI DISABLE ioctl for per-vCPU vLPI injection API
  KVM: arm64: Make per-vCPU vLPI control ioctls atomic
  KVM: arm64: Couple vSGI enablement with per-vCPU vPE allocation
  KVM: selftests: fix MAPC RDbase target formatting in vgic_lpi_stress
  KVM: Ioctl to set up userspace-injected MSIs as software-bypassing vLPIs
  KVM: arm64: selftests: Add support for stress testing direct-injected vLPIs
  KVM: arm64: selftests: Add test for per-vCPU vLPI control API
 Documentation/virt/kvm/api.rst             |  56 +++
 arch/arm64/kvm/arm.c                       |  89 +++++
 arch/arm64/kvm/vgic/vgic-its.c             | 142 ++++++-
 arch/arm64/kvm/vgic/vgic-v3.c              |  14 +-
 arch/arm64/kvm/vgic/vgic-v4.c              | 370 +++++++++++++++++-
 arch/arm64/kvm/vgic/vgic.h                 |  10 +
 drivers/irqchip/Kconfig                    |  13 +
 drivers/irqchip/irq-gic-v3-its.c           |  58 ++-
 drivers/irqchip/irq-gic-v4.c               |  75 +++-
 include/kvm/arm_vgic.h                     |   8 +
 include/linux/irqchip/arm-gic-v3.h         |   5 +
 include/linux/irqchip/arm-gic-v4.h         |  10 +-
 include/linux/kvm_host.h                   |  11 +
 include/uapi/linux/kvm.h                   |  22 ++
 tools/testing/selftests/kvm/Makefile.kvm   |   1 +
 .../selftests/kvm/arm64/per_vcpu_vlpi.c    | 274 +++++++++++++
 .../selftests/kvm/arm64/vgic_lpi_stress.c  | 181 ++++++++-
 .../selftests/kvm/lib/arm64/gic_v3_its.c   |   9 +-
 18 files changed, 1307 insertions(+), 41 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/arm64/per_vcpu_vlpi.c
Add CONFIG_ARM_GIC_V3_PER_VCPU_VLPI to control whether vLPI direct injection is enabled on a per-VM (all-or-nothing) or a per-vCPU basis.
When enabled, vPEs can be allocated/deallocated to vCPUs on an ad-hoc, per-vCPU basis at runtime. When disabled, keep the current vgic_v4_init behavior of automatic vCPU vPE allocation upon VM initialization.
We declare three ioctl numbers to manage per-vCPU vLPI enablement (see the usage sketch below):

- KVM_ENABLE_VCPU_VLPI, which, given a vCPU ID, allocates a vPE and initializes the vCPU for receiving direct vLPI interrupts.
- KVM_DISABLE_VCPU_VLPI, which, given a vCPU ID, disables the vCPU's ability to receive direct vLPI interrupts and frees its underlying vPE structure.
- KVM_QUERY_VCPU_VLPI, which, given a vCPU ID, returns a boolean describing whether the vCPU is configured to receive direct vLPI interrupts.
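As an illustration (not part of the patch), userspace could shift a scarce vPEID from one vCPU to a more I/O-intensive one roughly as follows; vm_fd is assumed to be a VM file descriptor, and move_vlpi_budget() is a hypothetical helper:

	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	/* Hypothetical helper: move direct-injection capability between vCPUs. */
	static int move_vlpi_budget(int vm_fd, int from_vcpu, int to_vcpu)
	{
		if (ioctl(vm_fd, KVM_QUERY_VCPU_VLPI, &from_vcpu) == 1 &&
		    ioctl(vm_fd, KVM_DISABLE_VCPU_VLPI, &from_vcpu))
			return -1;

		/* The freed vPEID can now back direct injection on to_vcpu. */
		return ioctl(vm_fd, KVM_ENABLE_VCPU_VLPI, &to_vcpu);
	}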
This commit declares the kconfig, ioctl numbers, and documentation. Implementation will come throughout this patch set.
Signed-off-by: Maximilian Dittgen <mdittgen@amazon.de>
---
 Documentation/virt/kvm/api.rst | 56 ++++++++++++++++++++++++++++++++++
 arch/arm64/kvm/arm.c           | 15 +++++++++
 arch/arm64/kvm/vgic/vgic-v4.c  |  9 ++++++
 arch/arm64/kvm/vgic/vgic.h     |  2 ++
 drivers/irqchip/Kconfig        | 13 ++++++++
 include/uapi/linux/kvm.h       |  6 ++++
 6 files changed, 101 insertions(+)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 27f726ff8fe0..dcfb326dff10 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6517,6 +6517,62 @@ the capability to be present.
 `flags` must currently be zero.

+4.XXX KVM_ENABLE_VCPU_VLPI
+--------------------------
+
+:Capability: KVM_CAP_ARM_PER_VCPU_VLPI
+:Architectures: arm64
+:Type: vm ioctl
+:Parameters: int vcpu_id (in)
+:Returns: 0 on success, negative value on error
+
+This ioctl enables GICv4 direct vLPI injection for the specified vCPU.
+Allocates vPE structures (doorbell IRQ, vPE table entry, virtual pending
+table, vPEID) and upgrades existing software-forwarded LPIs targeting
+this vCPU to hardware-forwarded vLPIs.
+
+If GICv4.1 is supported and vSGIs are disabled on the specified vCPU,
+this ioctl enables vCPU vSGI support.
+
+Requires CONFIG_ARM_GIC_V3_PER_VCPU_VLPI and GICv4 hardware support.
+
+Returns -EINVAL if vGICv4 is not initialized or if the passed vcpu_id
+does not map to a vCPU.
+
+4.XXX KVM_DISABLE_VCPU_VLPI
+---------------------------
+
+:Capability: KVM_CAP_ARM_PER_VCPU_VLPI
+:Architectures: arm64
+:Type: vm ioctl
+:Parameters: int vcpu_id (in)
+:Returns: 0 on success, negative value on error
+
+This ioctl disables GICv4 direct vLPI injection for the specified vCPU.
+Downgrades hardware-forwarded vLPIs to software-forwarded LPIs and frees
+vPE structures. Pending interrupts in the virtual pending table may be
+lost.
+
+If vSGIs are enabled on the specified vCPU, this ioctl disables them.
+
+Returns -EINVAL if vGICv4 is not initialized or if the passed vcpu_id
+does not map to a vCPU.
+
+4.XXX KVM_QUERY_VCPU_VLPI
+-------------------------
+
+:Capability: KVM_CAP_ARM_PER_VCPU_VLPI
+:Architectures: arm64
+:Type: vm ioctl
+:Parameters: int vcpu_id (in)
+:Returns: 1 if enabled, 0 if disabled, negative value on error
+
+This ioctl queries whether GICv4 direct vLPI injection is enabled for
+the specified vCPU.
+
+Returns -EINVAL if vGICv4 is not initialized or if the passed vcpu_id
+does not map to a vCPU.
+
 .. _kvm_run:
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 452d0c85281e..2839e11ba2c1 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -424,6 +424,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		else
 			r = kvm_supports_cacheable_pfnmap();
 		break;
+	case KVM_CAP_ARM_PER_VCPU_VLPI:
+		r = kvm_per_vcpu_vlpi_supported();
+		break;

 	default:
 		r = 0;
@@ -1947,6 +1950,18 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
 			return -EFAULT;
 		return kvm_vm_ioctl_get_reg_writable_masks(kvm, &range);
 	}
+	case KVM_ENABLE_VCPU_VLPI: {
+		/* TODO: create ioctl handler function */
+		return -ENOSYS;
+	}
+	case KVM_DISABLE_VCPU_VLPI: {
+		/* TODO: create ioctl handler function */
+		return -ENOSYS;
+	}
+	case KVM_QUERY_VCPU_VLPI: {
+		/* TODO: create ioctl handler function */
+		return -ENOSYS;
+	}
 	default:
 		return -EINVAL;
 	}
diff --git a/arch/arm64/kvm/vgic/vgic-v4.c b/arch/arm64/kvm/vgic/vgic-v4.c
index 09c3e9eb23f8..9ef12c33b3f7 100644
--- a/arch/arm64/kvm/vgic/vgic-v4.c
+++ b/arch/arm64/kvm/vgic/vgic-v4.c
@@ -226,6 +226,15 @@ void vgic_v4_get_vlpi_state(struct vgic_irq *irq, bool *val)
 	*val = !!(*ptr & mask);
 }

+bool kvm_per_vcpu_vlpi_supported(void)
+{
+#ifdef CONFIG_ARM_GIC_V3_PER_VCPU_VLPI
+	return kvm_vgic_global_state.has_gicv4;
+#else
+	return false;
+#endif
+}
+
 int vgic_v4_request_vpe_irq(struct kvm_vcpu *vcpu, int irq)
 {
 	return request_irq(irq, vgic_v4_doorbell_handler, 0, "vcpu", vcpu);
diff --git a/arch/arm64/kvm/vgic/vgic.h b/arch/arm64/kvm/vgic/vgic.h
index 5f0fc96b4dc2..99894806a4e9 100644
--- a/arch/arm64/kvm/vgic/vgic.h
+++ b/arch/arm64/kvm/vgic/vgic.h
@@ -467,4 +467,6 @@ static inline bool vgic_is_v3(struct kvm *kvm)
 int vgic_its_debug_init(struct kvm_device *dev);
 void vgic_its_debug_destroy(struct kvm_device *dev);

+bool kvm_per_vcpu_vlpi_supported(void);
+
 #endif
diff --git a/drivers/irqchip/Kconfig b/drivers/irqchip/Kconfig
index a61c6dc63c29..1c3e0c6d3177 100644
--- a/drivers/irqchip/Kconfig
+++ b/drivers/irqchip/Kconfig
@@ -52,6 +52,19 @@ config ARM_GIC_V3_ITS
 	default ARM_GIC_V3
 	select IRQ_MSI_IOMMU

+config ARM_GIC_V3_PER_VCPU_VLPI
+	bool "ARM GICv4 per-vCPU vLPI direct injection support"
+	depends on ARM_GIC_V3_ITS
+	default n
+	help
+	  Enable GICv4 direct injection of MSIs as vLPIs on a per-vCPU
+	  basis. Enables partial vLPI enablement on systems with more
+	  vCPU capacity than vPE capacity. When enabled, all vCPUs
+	  will boot without GICv4 vPE structures and handle interrupts
+	  as software LPIs. The KVM_ENABLE_VCPU_VLPI ioctl must then be
+	  called on individual vCPUs to initialize their GICv4 structs
+	  and upgrade targeting LPIs to vLPIs.
+
 config ARM_GIC_V3_ITS_FSL_MC
 	bool
 	depends on ARM_GIC_V3_ITS
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 1e541193e98d..002fe0f4841d 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -973,6 +973,7 @@ struct kvm_enable_cap {
 #define KVM_CAP_ARM_CACHEABLE_PFNMAP_SUPPORTED 243
 #define KVM_CAP_GUEST_MEMFD_FLAGS 244
 #define KVM_CAP_ARM_SEA_TO_USER 245
+#define KVM_CAP_ARM_PER_VCPU_VLPI 246

 struct kvm_irq_routing_irqchip {
 	__u32 irqchip;
@@ -1451,6 +1452,11 @@ struct kvm_enc_region {
 #define KVM_GET_SREGS2 _IOR(KVMIO, 0xcc, struct kvm_sregs2)
 #define KVM_SET_SREGS2 _IOW(KVMIO, 0xcd, struct kvm_sregs2)

+/* Per-vCPU vLPI enablement/disablement */
+#define KVM_ENABLE_VCPU_VLPI  _IOW(KVMIO, 0xf0, int)
+#define KVM_DISABLE_VCPU_VLPI _IOW(KVMIO, 0xf1, int)
+#define KVM_QUERY_VCPU_VLPI   _IOR(KVMIO, 0xf2, int)
+
 #define KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE	(1 << 0)
 #define KVM_DIRTY_LOG_INITIALLY_SET		(1 << 1)
On Thu, 20 Nov 2025 14:02:50 +0000, Maximilian Dittgen <mdittgen@amazon.de> wrote:
> Add CONFIG_ARM_GIC_V3_PER_VCPU_VLPI to control whether vLPI direct
> injection is enabled on a per-VM (all-or-nothing) or a per-vCPU basis.
>
> When enabled, vPEs can be allocated/deallocated to vCPUs on an ad-hoc,
> per-vCPU basis at runtime. When disabled, keep the current vgic_v4_init
> behavior of automatic vCPU vPE allocation upon VM initialization.
>
> We declare three ioctl numbers to manage per-vCPU vLPI enablement:
>
> - KVM_ENABLE_VCPU_VLPI, which, given a vCPU ID, allocates a vPE and
>   initializes the vCPU for receiving direct vLPI interrupts.
> - KVM_DISABLE_VCPU_VLPI, which, given a vCPU ID, disables the vCPU's
>   ability to receive direct vLPI interrupts and frees its underlying
>   vPE structure.
> - KVM_QUERY_VCPU_VLPI, which, given a vCPU ID, returns a boolean
>   describing whether the vCPU is configured to receive direct vLPI
>   interrupts.
>
> This commit declares the kconfig, ioctl numbers, and documentation.
> Implementation will come throughout this patch set.
>
> Signed-off-by: Maximilian Dittgen <mdittgen@amazon.de>
> ---
>  Documentation/virt/kvm/api.rst | 56 ++++++++++++++++++++++++++++++++++
>  arch/arm64/kvm/arm.c           | 15 +++++++++
>  arch/arm64/kvm/vgic/vgic-v4.c  |  9 ++++++
>  arch/arm64/kvm/vgic/vgic.h     |  2 ++
>  drivers/irqchip/Kconfig        | 13 ++++++++
>  include/uapi/linux/kvm.h       |  6 ++++
>  6 files changed, 101 insertions(+)
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 27f726ff8fe0..dcfb326dff10 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6517,6 +6517,62 @@ the capability to be present.
>  `flags` must currently be zero.
>
> +4.XXX KVM_ENABLE_VCPU_VLPI
> +--------------------------
> +
> +:Capability: KVM_CAP_ARM_PER_VCPU_VLPI
> +:Architectures: arm64
> +:Type: vm ioctl
> +:Parameters: int vcpu_id (in)
> +:Returns: 0 on success, negative value on error
> +
> +This ioctl enables GICv4 direct vLPI injection for the specified vCPU.
> +Allocates vPE structures (doorbell IRQ, vPE table entry, virtual pending
> +table, vPEID) and upgrades existing software-forwarded LPIs targeting
> +this vCPU to hardware-forwarded vLPIs.
> +
> +If GICv4.1 is supported and vSGIs are disabled on the specified vCPU,
> +this ioctl enables vCPU vSGI support.
> +
> +Requires CONFIG_ARM_GIC_V3_PER_VCPU_VLPI and GICv4 hardware support.
> +
> +Returns -EINVAL if vGICv4 is not initialized or if the passed vcpu_id
> +does not map to a vCPU.
> +
> +4.XXX KVM_DISABLE_VCPU_VLPI
> +---------------------------
> +
> +:Capability: KVM_CAP_ARM_PER_VCPU_VLPI
> +:Architectures: arm64
> +:Type: vm ioctl
> +:Parameters: int vcpu_id (in)
> +:Returns: 0 on success, negative value on error
> +
> +This ioctl disables GICv4 direct vLPI injection for the specified vCPU.
> +Downgrades hardware-forwarded vLPIs to software-forwarded LPIs and frees
> +vPE structures. Pending interrupts in the virtual pending table may be
> +lost.
I'm going to put my foot down on that immediately.
There is no conceivable case where losing interrupts is acceptable. Ever. If that's what you want, please write your own hypervisor. I wish you luck!
> +If vSGIs are enabled on the specified vCPU, this ioctl disables them.
So what? Something that didn't have an active state now has one that the guest doesn't know about? There is exactly *one* bit that defines that, and it doesn't exist in some quantum superposition.
This whole thing is completely insane, has not been thought out at all, is ignoring the basis of the architecture, and I'm really sorry that you wasted your time on that.
M.
The first step in implementing per-vCPU vLPI enablement is ensuring that vCPUs are not automatically assigned vPEs upon GICv4 VM boot. This is a) so that new VMs on a host do not selfishly grab all available vPEs when existing VMs are sharing the resource, and b) to avoid crashing hosts on which the number of launchable vCPUs can exceed the number of vPEIDs available in hardware.
When the CONFIG_ARM_GIC_V3_PER_VCPU_VLPI kconfig is enabled, skip the vPE initialization portion of the vgic_v4_init() routine. Note that we continue to allocate memory for the array of vPE pointers for future initialization. This allows us to easily track which vCPUs are vLPI-enabled by simply null-checking the vpes[vcpu_id] entry.
Disable automatic kvm_vgic_v4_set_forwarding() upon PCI endpoint configuration since vCPUs no longer have vPEs mapped by default. Instead, store the host_irq mapping so set_forwarding() can be called later upon per-vCPU vLPI enablement.
vPE allocation/freeing functions must be modified to work on a vCPU rather than a VM level. This commit modifies vPE unmap/map to function on a per-vCPU basis, and disables IRQ allocation/freeing functionality for now, since it is currently implemented on a per-VM level. Per-vCPU IRQ allocation/freeing will come in a later patch.
Signed-off-by: Maximilian Dittgen <mdittgen@amazon.de>
---
 arch/arm64/kvm/arm.c          |  6 ++++
 arch/arm64/kvm/vgic/vgic-v3.c | 12 ++++++--
 arch/arm64/kvm/vgic/vgic-v4.c | 55 ++++++++++++++++++++++++++++++++---
 include/kvm/arm_vgic.h        |  2 ++
 4 files changed, 69 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 2839e11ba2c1..31db3ccb3296 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -2798,8 +2798,14 @@ int kvm_arch_irq_bypass_add_producer(struct irq_bypass_consumer *cons,
 	if (irq_entry->type != KVM_IRQ_ROUTING_MSI)
 		return 0;

+#ifndef CONFIG_ARM_GIC_V3_PER_VCPU_VLPI
 	return kvm_vgic_v4_set_forwarding(irqfd->kvm, prod->irq,
 					  &irqfd->irq_entry);
+#else
+	/* Set forwarding later, ad-hoc upon per-vCPU vLPI enable request */
+	return kvm_vgic_v4_map_irq_to_host(irqfd->kvm, prod->irq,
+					   &irqfd->irq_entry);
+#endif
 }

 void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *cons,
diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c
index 968aa9d89be6..842a3a50f3a2 100644
--- a/arch/arm64/kvm/vgic/vgic-v3.c
+++ b/arch/arm64/kvm/vgic/vgic-v3.c
@@ -566,8 +566,12 @@ static void unmap_all_vpes(struct kvm *kvm)
 	struct vgic_dist *dist = &kvm->arch.vgic;
 	int i;

-	for (i = 0; i < dist->its_vm.nr_vpes; i++)
+	for (i = 0; i < dist->its_vm.nr_vpes; i++) {
+		if (!dist->its_vm.vpes[i]) /* Skip uninitialized vPEs */
+			continue;
+
 		free_irq(dist->its_vm.vpes[i]->irq, kvm_get_vcpu(kvm, i));
+	}
 }

 static void map_all_vpes(struct kvm *kvm)
@@ -575,9 +579,13 @@ static void map_all_vpes(struct kvm *kvm)
 	struct vgic_dist *dist = &kvm->arch.vgic;
 	int i;

-	for (i = 0; i < dist->its_vm.nr_vpes; i++)
+	for (i = 0; i < dist->its_vm.nr_vpes; i++) {
+		if (!dist->its_vm.vpes[i])
+			continue;
+
 		WARN_ON(vgic_v4_request_vpe_irq(kvm_get_vcpu(kvm, i),
 						dist->its_vm.vpes[i]->irq));
+	}
 }

 /*
diff --git a/arch/arm64/kvm/vgic/vgic-v4.c b/arch/arm64/kvm/vgic/vgic-v4.c
index 9ef12c33b3f7..fb2e6af96aa9 100644
--- a/arch/arm64/kvm/vgic/vgic-v4.c
+++ b/arch/arm64/kvm/vgic/vgic-v4.c
@@ -252,7 +252,7 @@ int vgic_v4_init(struct kvm *kvm)
 {
 	struct vgic_dist *dist = &kvm->arch.vgic;
 	struct kvm_vcpu *vcpu;
-	int nr_vcpus, ret;
+	int nr_vcpus, ret = 0;
 	unsigned long i;

 	lockdep_assert_held(&kvm->arch.config_lock);
@@ -272,6 +272,7 @@ int vgic_v4_init(struct kvm *kvm)

 	dist->its_vm.nr_vpes = nr_vcpus;

+#ifndef CONFIG_ARM_GIC_V3_PER_VCPU_VLPI
 	kvm_for_each_vcpu(i, vcpu, kvm)
 		dist->its_vm.vpes[i] = &vcpu->arch.vgic_cpu.vgic_v3.its_vpe;

@@ -313,7 +314,12 @@ int vgic_v4_init(struct kvm *kvm)
 			break;
 		}
 	}
-
+#else
+	/*
+	 * TODO: Initialize the shared VM properties that remain necessary
+	 * in per-vCPU mode
+	 */
+#endif
 	if (ret)
 		vgic_v4_teardown(kvm);

@@ -335,6 +341,9 @@ void vgic_v4_teardown(struct kvm *kvm)
 		return;

 	for (i = 0; i < its_vm->nr_vpes; i++) {
+		if (!its_vm->vpes[i]) /* Skip NULL vPEs */
+			continue;
+
 		struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, i);
 		int irq = its_vm->vpes[i]->irq;

@@ -342,7 +351,15 @@ void vgic_v4_teardown(struct kvm *kvm)
 		free_irq(irq, vcpu);
 	}

+#ifdef CONFIG_ARM_GIC_V3_PER_VCPU_VLPI
+	/*
+	 * TODO: Free the shared VM properties that remain necessary
+	 * in per-vCPU mode. Create separate teardown function
+	 * that operates on a per-vCPU basis.
+	 */
+#else
 	its_free_vcpu_irqs(its_vm);
+#endif
 	kfree(its_vm->vpes);
 	its_vm->nr_vpes = 0;
 	its_vm->vpes = NULL;
@@ -368,7 +385,9 @@ int vgic_v4_put(struct kvm_vcpu *vcpu)
 {
 	struct its_vpe *vpe = &vcpu->arch.vgic_cpu.vgic_v3.its_vpe;

-	if (!vgic_supports_direct_irqs(vcpu->kvm) || !vpe->resident)
+	if (!vgic_supports_direct_irqs(vcpu->kvm) ||
+	    !vpe->its_vm || /* check if vPE is initialized for vCPU */
+	    !vpe->resident)
 		return 0;

 	return its_make_vpe_non_resident(vpe, vgic_v4_want_doorbell(vcpu));
@@ -379,7 +398,9 @@ int vgic_v4_load(struct kvm_vcpu *vcpu)
 	struct its_vpe *vpe = &vcpu->arch.vgic_cpu.vgic_v3.its_vpe;
 	int err;

-	if (!vgic_supports_direct_irqs(vcpu->kvm) || vpe->resident)
+	if (!vgic_supports_direct_irqs(vcpu->kvm) ||
+	    !vpe->its_vm ||
+	    vpe->resident)
 		return 0;

 	if (vcpu_get_flag(vcpu, IN_WFI))
@@ -414,6 +435,9 @@ void vgic_v4_commit(struct kvm_vcpu *vcpu)
 {
 	struct its_vpe *vpe = &vcpu->arch.vgic_cpu.vgic_v3.its_vpe;

+	if (!vpe->its_vm)
+		return;
+
 	/*
 	 * No need to wait for the vPE to be ready across a shallow guest
 	 * exit, as only a vcpu_put will invalidate it.
@@ -436,6 +460,29 @@ static struct vgic_its *vgic_get_its(struct kvm *kvm,
 	return vgic_msi_to_its(kvm, &msi);
 }

+/**
+ * Map an interrupt to a host IRQ without setting up hardware forwarding.
+ * Useful for deferred vLPI enablement.
+ */
+int kvm_vgic_v4_map_irq_to_host(struct kvm *kvm, int virq,
+				struct kvm_kernel_irq_routing_entry *irq_entry)
+{
+	struct vgic_its *its;
+	struct vgic_irq *irq;
+
+	its = vgic_get_its(kvm, irq_entry);
+	if (IS_ERR(its))
+		return 0;
+
+	if (vgic_its_resolve_lpi(kvm, its, irq_entry->msi.devid,
+				 irq_entry->msi.data, &irq))
+		return 0;
+
+	irq->host_irq = virq;
+
+	return 0;
+}
+
 int kvm_vgic_v4_set_forwarding(struct kvm *kvm, int virq,
 			       struct kvm_kernel_irq_routing_entry *irq_entry)
 {
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index b261fb3968d0..02842754627f 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -450,6 +450,8 @@ int kvm_vgic_set_owner(struct kvm_vcpu *vcpu, unsigned int intid, void *owner);

 struct kvm_kernel_irq_routing_entry;

+int kvm_vgic_v4_map_irq_to_host(struct kvm *kvm, int virq,
+				struct kvm_kernel_irq_routing_entry *irq_entry);
 int kvm_vgic_v4_set_forwarding(struct kvm *kvm, int irq,
 			       struct kvm_kernel_irq_routing_entry *irq_entry);
kvm_vgic_v4_set_forwarding() acquires its_lock to safely map guest LPIs to host IRQs for vLPI upgrades. Future per-vCPU direct vLPI injection requires atomically upgrading multiple LPIs while holding its_lock, which would cause recursive locking when calling kvm_vgic_v4_set_forwarding().
Extract the locked portion to kvm_vgic_v4_set_forwarding_locked() to allow callers already holding its_lock to perform vLPI upgrades without recursive locking.
No functional change.
Signed-off-by: Maximilian Dittgen <mdittgen@amazon.com>
---
 arch/arm64/kvm/vgic/vgic-v4.c | 38 +++++++++++++++++++++--------------
 include/kvm/arm_vgic.h        |  3 +++
 2 files changed, 26 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/kvm/vgic/vgic-v4.c b/arch/arm64/kvm/vgic/vgic-v4.c
index fb2e6af96aa9..4a1825a1a5d7 100644
--- a/arch/arm64/kvm/vgic/vgic-v4.c
+++ b/arch/arm64/kvm/vgic/vgic-v4.c
@@ -483,27 +483,15 @@ int kvm_vgic_v4_map_irq_to_host(struct kvm *kvm, int virq,
 	return 0;
 }

-int kvm_vgic_v4_set_forwarding(struct kvm *kvm, int virq,
-			       struct kvm_kernel_irq_routing_entry *irq_entry)
+int kvm_vgic_v4_set_forwarding_locked(struct kvm *kvm, int virq,
+		struct kvm_kernel_irq_routing_entry *irq_entry, struct vgic_its *its)
 {
-	struct vgic_its *its;
 	struct vgic_irq *irq;
 	struct its_vlpi_map map;
 	unsigned long flags;
 	int ret = 0;

-	if (!vgic_supports_direct_msis(kvm))
-		return 0;
-
-	/*
-	 * Get the ITS, and escape early on error (not a valid
-	 * doorbell for any of our vITSs).
-	 */
-	its = vgic_get_its(kvm, irq_entry);
-	if (IS_ERR(its))
-		return 0;
-
-	guard(mutex)(&its->its_lock);
+	lockdep_assert_held(&its->its_lock);

 	/*
 	 * Perform the actual DevID/EventID -> LPI translation.
@@ -567,6 +555,26 @@ int kvm_vgic_v4_set_forwarding(struct kvm *kvm, int virq,
 	return ret;
 }

+int kvm_vgic_v4_set_forwarding(struct kvm *kvm, int virq,
+			       struct kvm_kernel_irq_routing_entry *irq_entry)
+{
+	struct vgic_its *its;
+
+	if (!vgic_supports_direct_msis(kvm))
+		return 0;
+
+	/*
+	 * Get the ITS, and escape early on error (not a valid
+	 * doorbell for any of our vITSs).
+	 */
+	its = vgic_get_its(kvm, irq_entry);
+	if (IS_ERR(its))
+		return 0;
+
+	guard(mutex)(&its->its_lock);
+	return kvm_vgic_v4_set_forwarding_locked(kvm, virq, irq_entry, its);
+}
+
 static struct vgic_irq *__vgic_host_irq_get_vlpi(struct kvm *kvm, int host_irq)
 {
 	struct vgic_irq *irq;
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 02842754627f..18a49c4b83f8 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -454,6 +454,9 @@ int kvm_vgic_v4_map_irq_to_host(struct kvm *kvm, int virq,
 				struct kvm_kernel_irq_routing_entry *irq_entry);
 int kvm_vgic_v4_set_forwarding(struct kvm *kvm, int irq,
 			       struct kvm_kernel_irq_routing_entry *irq_entry);
+int kvm_vgic_v4_set_forwarding_locked(struct kvm *kvm, int virq,
+				      struct kvm_kernel_irq_routing_entry *irq_entry,
+				      struct vgic_its *its);

 void kvm_vgic_v4_unset_forwarding(struct kvm *kvm, int host_irq);
Implement kvm_vgic_query_vcpu_vlpi, which handles the KVM_QUERY_VCPU_VLPI ioctl to query whether a vCPU is currently initialized to handle LPIs via direct vLPI injection. This function checks whether the vCPU's entry in the VM's vPE array is populated.
Signed-off-by: Maximilian Dittgen <mdittgen@amazon.com>
---
 arch/arm64/kvm/arm.c          | 13 +++++++++++--
 arch/arm64/kvm/vgic/vgic-v4.c | 15 +++++++++++++++
 arch/arm64/kvm/vgic/vgic.h    |  1 +
 include/linux/kvm_host.h      | 11 +++++++++++
 4 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 31db3ccb3296..afb04162e0cf 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1959,8 +1959,17 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
 	case KVM_QUERY_VCPU_VLPI: {
-		/* TODO: create ioctl handler function */
-		return -ENOSYS;
+		int vcpu_id;
+		struct kvm_vcpu *vcpu;
+
+		if (copy_from_user(&vcpu_id, argp, sizeof(vcpu_id)))
+			return -EFAULT;
+
+		vcpu = kvm_get_vcpu_by_id(kvm, vcpu_id);
+		if (!vcpu)
+			return -EINVAL;
+
+		return kvm_vgic_query_vcpu_vlpi(vcpu);
 	}
 	default:
 		return -EINVAL;
diff --git a/arch/arm64/kvm/vgic/vgic-v4.c b/arch/arm64/kvm/vgic/vgic-v4.c
index 4a1825a1a5d7..cebcb9175572 100644
--- a/arch/arm64/kvm/vgic/vgic-v4.c
+++ b/arch/arm64/kvm/vgic/vgic-v4.c
@@ -617,3 +617,18 @@ void kvm_vgic_v4_unset_forwarding(struct kvm *kvm, int host_irq)
 	raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
 	vgic_put_irq(kvm, irq);
 }
+
+/* query whether vLPI direct injection is enabled on a specific vCPU.
+ * return 0 if disabled, 1 if enabled, -EINVAL if vCPU non-existent or GIC
+ * uninitialized
+ */
+int kvm_vgic_query_vcpu_vlpi(struct kvm_vcpu *vcpu)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct vgic_dist *dist = &kvm->arch.vgic;
+	int i = kvm_idx_from_vcpu(kvm, vcpu);
+
+	if (i == UINT_MAX || !dist->its_vm.vpes)
+		return -EINVAL; /* vCPU non-existent or uninitialized */
+	return dist->its_vm.vpes[i] != NULL;
+}
diff --git a/arch/arm64/kvm/vgic/vgic.h b/arch/arm64/kvm/vgic/vgic.h
index 99894806a4e9..295088913c26 100644
--- a/arch/arm64/kvm/vgic/vgic.h
+++ b/arch/arm64/kvm/vgic/vgic.h
@@ -468,5 +468,6 @@ int vgic_its_debug_init(struct kvm_device *dev);
 void vgic_its_debug_destroy(struct kvm_device *dev);

 bool kvm_per_vcpu_vlpi_supported(void);
+int kvm_vgic_query_vcpu_vlpi(struct kvm_vcpu *vcpu);

 #endif
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 5bd76cf394fa..bc7001f8c5dd 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1030,6 +1030,17 @@ static inline struct kvm_vcpu *kvm_get_vcpu_by_id(struct kvm *kvm, int id)
 	return NULL;
 }

+static inline unsigned int kvm_idx_from_vcpu(struct kvm *kvm, struct kvm_vcpu *target_vcpu)
+{
+	struct kvm_vcpu *vcpu;
+	unsigned long i;
+
+	kvm_for_each_vcpu(i, vcpu, kvm)
+		if (vcpu == target_vcpu)
+			return i;
+	return UINT_MAX;
+}
+
 void kvm_destroy_vcpus(struct kvm *kvm);

 int kvm_trylock_all_vcpus(struct kvm *kvm);
Implement kvm_vgic_enable_vcpu_vlpi(), which handles the KVM_ENABLE_VCPU_VLPI ioctl to enable direct vLPI injection on a specific vCPU. The function has two components: a call to vgic_v4_vcpu_init() and a call to upgrade_existing_lpis_to_vlpis():
- vgic_v4_vcpu_init() is the per-vCPU counterpart to vgic_v4_init(), and initializes all of the GIC structures a vCPU needs to handle LPI interrupts via direct injection. While IRQ domains are usually allocated on a per-VM basis, vgic_v4_vcpu_init() creates a per-vPE IRQ domain and fwnode to decouple vLPI doorbell allocation across separate vCPUs. The domain allocation routine in its_vpe_irq_domain_alloc() also allocates a vPE table entry and virtual pending table for the vCPU.
- upgrade_existing_lpis_to_vlpis() iterates through all of the LPIs targeting the vCPU and initializes hardware forwarding to process them as direct vLPIs. This includes updating each LPI's ITE to hold a vPE's vPEID instead of a collection table's collection ID. It also toggles each interrupt's irq->hw flag to true to notify the ITS to handle the interrupt via direct injection.
Signed-off-by: Maximilian Dittgen <mdittgen@amazon.com>
---
 arch/arm64/kvm/arm.c               |  13 ++-
 arch/arm64/kvm/vgic/vgic-its.c     |   4 +-
 arch/arm64/kvm/vgic/vgic-v4.c      | 157 ++++++++++++++++++++++++++++-
 arch/arm64/kvm/vgic/vgic.h         |   4 +
 drivers/irqchip/irq-gic-v3-its.c   |  48 ++++++++-
 drivers/irqchip/irq-gic-v4.c       |  56 ++++++++--
 include/linux/irqchip/arm-gic-v3.h |   4 +
 include/linux/irqchip/arm-gic-v4.h |   8 +-
 8 files changed, 277 insertions(+), 17 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index afb04162e0cf..169860649bdd 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1951,8 +1951,17 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
 		return kvm_vm_ioctl_get_reg_writable_masks(kvm, &range);
 	}
 	case KVM_ENABLE_VCPU_VLPI: {
-		/* TODO: create ioctl handler function */
-		return -ENOSYS;
+		int vcpu_id;
+		struct kvm_vcpu *vcpu;
+
+		if (copy_from_user(&vcpu_id, argp, sizeof(vcpu_id)))
+			return -EFAULT;
+
+		vcpu = kvm_get_vcpu_by_id(kvm, vcpu_id);
+		if (!vcpu)
+			return -EINVAL;
+
+		return kvm_vgic_enable_vcpu_vlpi(vcpu);
 	}
 	case KVM_DISABLE_VCPU_VLPI: {
 		/* TODO: create ioctl handler function */
 		return -ENOSYS;
diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
index ce3e3ed3f29f..5f3bbf24cc2f 100644
--- a/arch/arm64/kvm/vgic/vgic-its.c
+++ b/arch/arm64/kvm/vgic/vgic-its.c
@@ -23,7 +23,7 @@
 #include "vgic.h"
 #include "vgic-mmio.h"

-static struct kvm_device_ops kvm_arm_vgic_its_ops;
+struct kvm_device_ops kvm_arm_vgic_its_ops;

 static int vgic_its_save_tables_v0(struct vgic_its *its);
 static int vgic_its_restore_tables_v0(struct vgic_its *its);
@@ -2801,7 +2801,7 @@ static int vgic_its_get_attr(struct kvm_device *dev,
 	return 0;
 }

-static struct kvm_device_ops kvm_arm_vgic_its_ops = {
+struct kvm_device_ops kvm_arm_vgic_its_ops = {
 	.name = "kvm-arm-vgic-its",
 	.create = vgic_its_create,
 	.destroy = vgic_its_destroy,
diff --git a/arch/arm64/kvm/vgic/vgic-v4.c b/arch/arm64/kvm/vgic/vgic-v4.c
index cebcb9175572..efb9ac9188e3 100644
--- a/arch/arm64/kvm/vgic/vgic-v4.c
+++ b/arch/arm64/kvm/vgic/vgic-v4.c
@@ -316,9 +316,15 @@ int vgic_v4_init(struct kvm *kvm)
 	}
 #else
 	/*
-	 * TODO: Initialize the shared VM properties that remain necessary
-	 * in per-vCPU mode
+	 * Initialize the shared VM properties that remain necessary in per-vCPU mode
 	 */
+
+	/* vPE properties table */
+	if (!dist->its_vm.vprop_page) {
+		dist->its_vm.vprop_page = its_allocate_prop_table(GFP_KERNEL);
+		if (!dist->its_vm.vprop_page)
+			ret = -ENOMEM;
+	}
 #endif
 	if (ret)
 		vgic_v4_teardown(kvm);
@@ -326,6 +332,51 @@ int vgic_v4_init(struct kvm *kvm)
 	return ret;
 }

+/**
+ * vgic_v4_vcpu_init - When per-vCPU vLPI injection is enabled,
+ * initialize the GICv4 data structures for a specific vCPU
+ * @vcpu: Pointer to the vcpu being initialized
+ *
+ * Called every time the KVM_ENABLE_VCPU_VLPI ioctl is called.
+ */
+int vgic_v4_vcpu_init(struct kvm_vcpu *vcpu)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct vgic_dist *dist = &kvm->arch.vgic;
+	int i, ret, irq;
+	unsigned long irq_flags = DB_IRQ_FLAGS;
+
+	/* Validate vgic_v4_init() has been called to allocate the vpe array */
+	if (!dist->its_vm.vpes)
+		return -ENODEV;
+
+	/* Link KVM distributor to the newly-allocated vPE */
+	i = kvm_idx_from_vcpu(kvm, vcpu);
+	if (i == UINT_MAX)
+		return -EINVAL;
+	dist->its_vm.vpes[i] = &vcpu->arch.vgic_cpu.vgic_v3.its_vpe;
+
+	ret = its_alloc_vcpu_irq(vcpu);
+	if (ret)
+		return ret;
+
+	/* Same routine as the kvm_for_each_vcpu of vgic_v4_init */
+	irq = dist->its_vm.vpes[i]->irq;
+
+	if (kvm_vgic_global_state.has_gicv4_1)
+		irq_flags &= ~IRQ_NOAUTOEN;
+	irq_set_status_flags(irq, irq_flags);
+
+	ret = vgic_v4_request_vpe_irq(vcpu, irq);
+	if (ret)
+		kvm_err("failed to allocate vcpu IRQ%d\n", irq);
+
+	if (ret)
+		vgic_v4_teardown(kvm);
+
+	return ret;
+}
+
 /**
  * vgic_v4_teardown - Free the GICv4 data structures
  * @kvm: Pointer to the VM being destroyed
@@ -357,6 +408,9 @@ void vgic_v4_teardown(struct kvm *kvm)
 	 * in per-vCPU mode. Create separate teardown function
 	 * that operates on a per-vCPU basis.
 	 */
+
+	/* vPE properties table */
+	its_free_prop_table(its_vm->vprop_page);
 #else
 	its_free_vcpu_irqs(its_vm);
 #endif
@@ -618,6 +672,105 @@ void kvm_vgic_v4_unset_forwarding(struct kvm *kvm, int host_irq)
 	vgic_put_irq(kvm, irq);
 }

+static int upgrade_existing_lpis_to_vlpis(struct kvm_vcpu *vcpu)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_device *dev;
+	struct vgic_its *its, *its_from_entry;
+	struct its_device *device;
+	struct its_ite *ite;
+	struct kvm_kernel_irq_routing_entry entry;
+	int ret = 0;
+	int host_irq;
+
+	list_for_each_entry(dev, &kvm->devices, vm_node) {
+		/* Ensure we only look at ITS devices */
+		if (dev->ops != &kvm_arm_vgic_its_ops)
+			continue;
+
+		its = dev->private;
+		mutex_lock(&its->its_lock);
+
+		list_for_each_entry(device, &its->device_list, dev_list) {
+			list_for_each_entry(ite, &device->itt_head, ite_list) {
+				/* ite->irq->hw means entry already upgraded to vLPI */
+				if (ite->collection &&
+				    ite->collection->target_addr == vcpu->vcpu_id &&
+				    ite->irq && !ite->irq->hw) {
+
+					/*
+					 * An existing IRQ would only have a null host_irq if it is
+					 * completely defined in software, in which case it cannot
+					 * be direct injected anyways. Thus, we skip interrupt
+					 * upgrade for IRQs with null host_irqs.
+					 */
+					if (ite->irq->host_irq > 0)
+						host_irq = ite->irq->host_irq;
+					else
+						continue;
+
+					/* Create routing entry */
+					memset(&entry, 0, sizeof(entry));
+					entry.gsi = host_irq;
+					entry.type = KVM_IRQ_ROUTING_MSI;
+					/* MSI address is system defined for ARM GICv3 */
+					entry.msi.address_lo =
+						(u32)(its->vgic_its_base + GITS_TRANSLATER);
+					entry.msi.address_hi =
+						(u32)((its->vgic_its_base + GITS_TRANSLATER) >> 32);
+					entry.msi.data = ite->event_id;
+					entry.msi.devid = device->device_id;
+					entry.msi.flags = KVM_MSI_VALID_DEVID;
+
+					/* Verify ITS consistency */
+					its_from_entry = vgic_get_its(kvm, &entry);
+					if (IS_ERR(its_from_entry) || its_from_entry != its)
+						continue;
+
+					/* Upgrade to vLPI */
+					ret = kvm_vgic_v4_set_forwarding_locked(kvm, host_irq,
+										&entry, its);
+					if (ret)
+						kvm_info("Failed to upgrade LPI %d: %d\n",
+							 host_irq, ret);
+				}
+			}
+		}
+
+		mutex_unlock(&its->its_lock);
+	}
+
+	return 0;
+}
+
+/* Enable vLPI direct injection on a specific vCPU */
+int kvm_vgic_enable_vcpu_vlpi(struct kvm_vcpu *vcpu)
+{
+	int ret;
+	int vcpu_vlpi_status = kvm_vgic_query_vcpu_vlpi(vcpu);
+
+	/* vGIC not initialized for vCPU */
+	if (vcpu_vlpi_status < 0)
+		return vcpu_vlpi_status;
+	/* vLPI already enabled */
+	if (vcpu_vlpi_status > 0)
+		return 0;
+
+	/* Allocate the vPE struct and vPE table for the vCPU */
+	ret = vgic_v4_vcpu_init(vcpu);
+	if (ret)
+		return ret;
+
+	/*
+	 * Upgrade existing LPIs to vLPIs. We
+	 * do not need to error check since
+	 * a failure in upgrading an LPI is non-breaking;
+	 * those LPIs may continue to be processed by
+	 * software.
+	 */
+	return upgrade_existing_lpis_to_vlpis(vcpu);
+}
+
 /* query whether vLPI direct injection is enabled on a specific vCPU.
 * return 0 if disabled, 1 if enabled, -EINVAL if vCPU non-existent or GIC
 * uninitialized
diff --git a/arch/arm64/kvm/vgic/vgic.h b/arch/arm64/kvm/vgic/vgic.h
index 295088913c26..60ae0d1f044d 100644
--- a/arch/arm64/kvm/vgic/vgic.h
+++ b/arch/arm64/kvm/vgic/vgic.h
@@ -251,6 +251,8 @@ struct ap_list_summary {
 #define irqs_active_outside_lrs(s) \
 	((s)->nr_act && irqs_outside_lrs(s))

+extern struct kvm_device_ops kvm_arm_vgic_its_ops;
+
 int vgic_v3_parse_attr(struct kvm_device *dev, struct kvm_device_attr *attr,
 		       struct vgic_reg_attr *reg_attr);
 int vgic_v2_parse_attr(struct kvm_device *dev, struct kvm_device_attr *attr,
@@ -434,6 +436,7 @@ static inline bool vgic_supports_direct_irqs(struct kvm *kvm)
 }

 int vgic_v4_init(struct kvm *kvm);
+int vgic_v4_vcpu_init(struct kvm_vcpu *vcpu);
 void vgic_v4_teardown(struct kvm *kvm);
 void vgic_v4_configure_vsgis(struct kvm *kvm);
 void vgic_v4_get_vlpi_state(struct vgic_irq *irq, bool *val);
@@ -468,6 +471,7 @@ int vgic_its_debug_init(struct kvm_device *dev);
 void vgic_its_debug_destroy(struct kvm_device *dev);

 bool kvm_per_vcpu_vlpi_supported(void);
+int kvm_vgic_enable_vcpu_vlpi(struct kvm_vcpu *vcpu);
 int kvm_vgic_query_vcpu_vlpi(struct kvm_vcpu *vcpu);

 #endif
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 467cb78435a9..67749578f973 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -2261,7 +2261,7 @@ static void gic_reset_prop_table(void *va)
 	gic_flush_dcache_to_poc(va, LPI_PROPBASE_SZ);
 }

-static struct page *its_allocate_prop_table(gfp_t gfp_flags)
+struct page *its_allocate_prop_table(gfp_t gfp_flags)
 {
 	struct page *prop_page;

@@ -2275,7 +2275,7 @@ static struct page *its_allocate_prop_table(gfp_t gfp_flags)
 	return prop_page;
 }

-static void its_free_prop_table(struct page *prop_page)
+void its_free_prop_table(struct page *prop_page)
 {
 	its_free_pages(page_address(prop_page), get_order(LPI_PROPBASE_SZ));
 }
@@ -4612,25 +4612,65 @@ static void its_vpe_irq_domain_free(struct irq_domain *domain,

 		BUG_ON(vm != vpe->its_vm);

+#ifdef CONFIG_ARM_GIC_V3_PER_VCPU_VLPI
+		free_lpi_range(vpe->vpe_db_lpi, 1);
+#else
 		clear_bit(data->hwirq, vm->db_bitmap);
+#endif
 		its_vpe_teardown(vpe);
 		irq_domain_reset_irq_data(data);
 	}

+#ifndef CONFIG_ARM_GIC_V3_PER_VCPU_VLPI
 	if (bitmap_empty(vm->db_bitmap, vm->nr_db_lpis)) {
 		its_lpi_free(vm->db_bitmap, vm->db_lpi_base, vm->nr_db_lpis);
 		its_free_prop_table(vm->vprop_page);
 	}
+#endif
 }

 static int its_vpe_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
 				    unsigned int nr_irqs, void *args)
 {
 	struct irq_chip *irqchip = &its_vpe_irq_chip;
+	int base, err;
+#ifdef CONFIG_ARM_GIC_V3_PER_VCPU_VLPI
+	struct its_vpe *vpe = args;
+
+	/* Per-vCPU mode: allocate domain at the vPE, rather than VM, level */
+	WARN_ON(nr_irqs != 1);
+
+	/* Use VM's shared properties table */
+	if (!vpe->its_vm || !vpe->its_vm->vprop_page)
+		return -EINVAL;
+
+	if (gic_rdists->has_rvpeid)
+		irqchip = &its_vpe_4_1_irq_chip;
+
+	err = alloc_lpi_range(1, &base);
+	if (err)
+		return err;
+	vpe->vpe_db_lpi = base;
+	err = its_vpe_init(vpe);
+	if (err)
+		return err;
+
+	err = its_irq_gic_domain_alloc(domain, virq, vpe->vpe_db_lpi);
+	if (err)
+		goto err_teardown_vpe;
+
+	irq_domain_set_hwirq_and_chip(domain, virq, 0, irqchip, vpe);
+	irqd_set_resend_when_in_progress(irq_get_irq_data(virq));
+
+	return 0;
+
+err_teardown_vpe:
+	its_vpe_teardown(vpe);
+#else
 	struct its_vm *vm = args;
 	unsigned long *bitmap;
 	struct page *vprop_page;
-	int base, nr_ids, i, err = 0;
+	int nr_ids, i;

 	bitmap = its_lpi_alloc(roundup_pow_of_two(nr_irqs), &base, &nr_ids);
 	if (!bitmap)
@@ -4673,7 +4713,7 @@ static int its_vpe_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,

 	if (err)
 		its_vpe_irq_domain_free(domain, virq, i);
-
+#endif
 	return err;
 }

diff --git a/drivers/irqchip/irq-gic-v4.c b/drivers/irqchip/irq-gic-v4.c
index 8455b4a5fbb0..c8e324cd8911 100644
--- a/drivers/irqchip/irq-gic-v4.c
+++ b/drivers/irqchip/irq-gic-v4.c
@@ -7,6 +7,7 @@
 #include <linux/interrupt.h>
 #include <linux/irq.h>
 #include <linux/irqdomain.h>
+#include <linux/kvm_host.h>
 #include <linux/msi.h>
 #include <linux/pid.h>
 #include <linux/sched.h>
@@ -128,14 +129,14 @@ static int its_alloc_vcpu_sgis(struct its_vpe *vpe, int idx)
 	if (!name)
 		goto err;

-	vpe->fwnode = irq_domain_alloc_named_id_fwnode(name, idx);
-	if (!vpe->fwnode)
+	vpe->sgi_fwnode = irq_domain_alloc_named_id_fwnode(name, idx);
+	if (!vpe->sgi_fwnode)
 		goto err;

 	kfree(name);
 	name = NULL;

-	vpe->sgi_domain = irq_domain_create_linear(vpe->fwnode, 16,
+	vpe->sgi_domain = irq_domain_create_linear(vpe->sgi_fwnode, 16,
 						   sgi_domain_ops, vpe);
 	if (!vpe->sgi_domain)
 		goto err;
@@ -149,8 +150,8 @@ static int its_alloc_vcpu_sgis(struct its_vpe *vpe, int idx)
 err:
 	if (vpe->sgi_domain)
 		irq_domain_remove(vpe->sgi_domain);
-	if (vpe->fwnode)
-		irq_domain_free_fwnode(vpe->fwnode);
+	if (vpe->sgi_fwnode)
+		irq_domain_free_fwnode(vpe->sgi_fwnode);
 	kfree(name);
 	return -ENOMEM;
 }
@@ -199,6 +200,49 @@ int its_alloc_vcpu_irqs(struct its_vm *vm)
 	return -ENOMEM;
 }

+int its_alloc_vcpu_irq(struct kvm_vcpu *vcpu)
+{
+	struct its_vpe *vpe = &vcpu->arch.vgic_cpu.vgic_v3.its_vpe;
+	struct its_vm *vm = &vcpu->kvm->arch.vgic.its_vm;
+	int ret;
+
+	vpe->its_vm = vm; /* point all vPEs on a VM to the same shared dist its_vm */
+	if (!has_v4_1_sgi()) /* idai bool shares memory with sgi_domain pointer */
+		vpe->idai = true;
+
+	/* create a per-vPE, rather than per-VM, fwnode */
+	if (!vpe->lpi_fwnode) {
+		/* add vcpu_id to fwnode naming to differentiate vcpus in same VM */
+		vpe->lpi_fwnode = irq_domain_alloc_named_id_fwnode("GICv4-vpe-lpi",
+				task_pid_nr(current) * 1000 + vcpu->vcpu_id);
+		if (!vpe->lpi_fwnode)
+			goto err;
+	}
+
+	/* create domain hierarchy for vPE */
+	vpe->lpi_domain = irq_domain_create_hierarchy(gic_domain, 0, 1,
+						      vpe->lpi_fwnode, vpe_domain_ops, vpe);
+	if (!vpe->lpi_domain)
+		goto err;
+
+	/* allocate IRQs from vPE domain */
+	vpe->irq = irq_domain_alloc_irqs(vpe->lpi_domain, 1, NUMA_NO_NODE, vpe);
+	if (vpe->irq <= 0)
+		goto err;
+
+	ret = its_alloc_vcpu_sgis(vpe, vcpu->vcpu_id);
+	if (ret)
+		goto err;
+
+	return 0;
+err:
+	if (vpe->lpi_domain)
+		irq_domain_remove(vpe->lpi_domain);
+	if (vpe->lpi_fwnode)
+		irq_domain_free_fwnode(vpe->lpi_fwnode);
+	return -ENOMEM;
+}
+
 static void its_free_sgi_irqs(struct its_vm *vm)
 {
 	int i;
@@ -214,7 +258,7 @@ static void its_free_sgi_irqs(struct its_vm *vm)

 		irq_domain_free_irqs(irq, 16);
 		irq_domain_remove(vm->vpes[i]->sgi_domain);
-		irq_domain_free_fwnode(vm->vpes[i]->fwnode);
+		irq_domain_free_fwnode(vm->vpes[i]->sgi_fwnode);
 	}
 }

diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
index 70c0948f978e..5031a4c25543 100644
--- a/include/linux/irqchip/arm-gic-v3.h
+++ b/include/linux/irqchip/arm-gic-v3.h
@@ -641,6 +641,10 @@ int its_init(struct fwnode_handle *handle, struct rdists *rdists,
 	     struct irq_domain *domain, u8 irq_prio);
 int mbi_init(struct fwnode_handle *fwnode, struct irq_domain *parent);

+/* Enable prop table alloc/free on vGIC init/destroy when per-vCPU vLPI is enabled */
+struct page *its_allocate_prop_table(gfp_t gfp_flags);
+void its_free_prop_table(struct page *prop_page);
+
 static inline bool gic_enable_sre(void)
 {
 	u32 val;
diff --git a/include/linux/irqchip/arm-gic-v4.h b/include/linux/irqchip/arm-gic-v4.h
index 0b0887099fd7..bc493fed75ab 100644
--- a/include/linux/irqchip/arm-gic-v4.h
+++ b/include/linux/irqchip/arm-gic-v4.h
@@ -8,6 +8,7 @@
 #define __LINUX_IRQCHIP_ARM_GIC_V4_H

 struct its_vpe;
+struct kvm_vcpu;

 /*
  * Maximum number of ITTs when GITS_TYPER.VMOVP == 0, using the
@@ -42,6 +43,10 @@ struct its_vpe {
 	struct its_vm		*its_vm;
 	/* per-vPE VLPI tracking */
 	atomic_t		vlpi_count;
+	/* per-vPE domain for per-vCPU VLPI enablement */
+	struct irq_domain	*lpi_domain;
+	/* enables per-vPE vLPI IRQ domains during per-vCPU VLPI enablement */
+	struct fwnode_handle	*lpi_fwnode;
 	/* Doorbell interrupt */
 	int			irq;
 	irq_hw_number_t		vpe_db_lpi;
@@ -59,7 +64,7 @@ struct its_vpe {
 		};
 		/* GICv4.1 implementations */
 		struct {
-			struct fwnode_handle	*fwnode;
+			struct fwnode_handle	*sgi_fwnode;
 			struct irq_domain	*sgi_domain;
 			struct {
 				u8	priority;
@@ -139,6 +144,7 @@ struct its_cmd_info {
 };

 int its_alloc_vcpu_irqs(struct its_vm *vm);
+int its_alloc_vcpu_irq(struct kvm_vcpu *vcpu);
 void its_free_vcpu_irqs(struct its_vm *vm);
 int its_make_vpe_resident(struct its_vpe *vpe, bool g0en, bool g1en);
 int its_make_vpe_non_resident(struct its_vpe *vpe, bool db);
If vgic_v4_load() is called to schedule GICv4 on a vCPU at the same time that kvm_vgic_enable_vcpu_vlpi() is called to enable vLPI direct injection on the vCPU, vgic_v4_load() will attempt to map the vCPU's doorbell IRQ to the physical processor while kvm_vgic_enable_vcpu_vlpi() is still creating the doorbell IRQ.
This race will cause vgic_v4_load()'s mapping operation to fail, triggering a WARN_ON in vgic_v3_load().
Fix by checking for the presence of a doorbell IRQ before attempting to load GICv4. Remove the WARN_ON to reduce the verbosity of GICv4 load failures resulting from this race; failure to load GICv4 is not breaking behavior.
Signed-off-by: Maximilian Dittgen <mdittgen@amazon.com>
---
 arch/arm64/kvm/vgic/vgic-v3.c | 2 +-
 arch/arm64/kvm/vgic/vgic-v4.c | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c
index 842a3a50f3a2..ffaf692399fd 100644
--- a/arch/arm64/kvm/vgic/vgic-v3.c
+++ b/arch/arm64/kvm/vgic/vgic-v3.c
@@ -995,7 +995,7 @@ void vgic_v3_load(struct kvm_vcpu *vcpu)
 	if (has_vhe())
 		__vgic_v3_activate_traps(cpu_if);

-	WARN_ON(vgic_v4_load(vcpu));
+	vgic_v4_load(vcpu);
 }

 void vgic_v3_put(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/kvm/vgic/vgic-v4.c b/arch/arm64/kvm/vgic/vgic-v4.c
index efb9ac9188e3..0affcfca17f0 100644
--- a/arch/arm64/kvm/vgic/vgic-v4.c
+++ b/arch/arm64/kvm/vgic/vgic-v4.c
@@ -454,6 +454,7 @@ int vgic_v4_load(struct kvm_vcpu *vcpu)

 	if (!vgic_supports_direct_irqs(vcpu->kvm) ||
 	    !vpe->its_vm ||
+	    !vpe->irq || /* check if irq has been allocated yet */
 	    vpe->resident)
 		return 0;
Implement kvm_vgic_disable_vcpu_vlpi(), which handles the KVM_DISABLE_VCPU_VLPI ioctl to disable direct vLPI injection on a specific vCPU. The function has two components: a call to vgic_v4_vcpu_teardown() and a call to downgrade_existing_vlpis_to_lpis():
- vgic_v4_vcpu_teardown() is the per-vCPU counterpart to vgic_v4_teardown() and frees all of the GIC structures a vCPU needs to handle LPI interrupts via direct injection. While vgic_v4_teardown() operates on a per-VM basis, vgic_v4_vcpu_teardown() frees the IRQ, LPI domain, and fwnode of the single targeted vCPU. The domain free routine in this function frees the vPE table entry and virtual pending table of the vCPU.
- downgrade_existing_vlpis_to_lpis() iterates through all of the vLPIs targeting the vCPU and tears down the hardware forwarding that processes them as vLPIs. It uses kvm_vgic_v4_unset_forwarding() to unmap direct injection.
Signed-off-by: Maximilian Dittgen <mdittgen@amazon.com>
---
 arch/arm64/kvm/arm.c               |  13 +++-
 arch/arm64/kvm/vgic/vgic-v4.c      | 105 ++++++++++++++++++++++++++---
 arch/arm64/kvm/vgic/vgic.h         |   2 +
 drivers/irqchip/irq-gic-v3-its.c   |   4 +-
 drivers/irqchip/irq-gic-v4.c       |  19 ++++++
 include/linux/irqchip/arm-gic-v4.h |   1 +
 6 files changed, 131 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 169860649bdd..180eaa4165e9 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1964,8 +1964,17 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
 		return kvm_vgic_enable_vcpu_vlpi(vcpu);
 	}
 	case KVM_DISABLE_VCPU_VLPI: {
-		/* TODO: create ioctl handler function */
-		return -ENOSYS;
+		int vcpu_id;
+		struct kvm_vcpu *vcpu;
+
+		if (copy_from_user(&vcpu_id, argp, sizeof(vcpu_id)))
+			return -EFAULT;
+
+		vcpu = kvm_get_vcpu_by_id(kvm, vcpu_id);
+		if (!vcpu)
+			return -EINVAL;
+
+		return kvm_vgic_disable_vcpu_vlpi(vcpu);
 	}
 	case KVM_QUERY_VCPU_VLPI: {
 		int vcpu_id;
diff --git a/arch/arm64/kvm/vgic/vgic-v4.c b/arch/arm64/kvm/vgic/vgic-v4.c
index 0affcfca17f0..39fababf2861 100644
--- a/arch/arm64/kvm/vgic/vgic-v4.c
+++ b/arch/arm64/kvm/vgic/vgic-v4.c
@@ -372,7 +372,7 @@ int vgic_v4_vcpu_init(struct kvm_vcpu *vcpu)
 		kvm_err("failed to allocate vcpu IRQ%d\n", irq);

 	if (ret)
-		vgic_v4_teardown(kvm);
+		vgic_v4_vcpu_teardown(vcpu);

 	return ret;
 }
@@ -384,7 +384,8 @@ void vgic_v4_teardown(struct kvm *kvm)
 {
 	struct its_vm *its_vm = &kvm->arch.vgic.its_vm;
-	int i;
+	struct kvm_vcpu *vcpu;
+	unsigned long i;

 	lockdep_assert_held(&kvm->arch.config_lock);

@@ -395,7 +396,7 @@ void vgic_v4_teardown(struct kvm *kvm)
 		if (!its_vm->vpes[i]) /* Skip NULL vPEs */
 			continue;

-		struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, i);
+		vcpu = kvm_get_vcpu(kvm, i);
 		int irq = its_vm->vpes[i]->irq;

 		irq_clear_status_flags(irq, DB_IRQ_FLAGS);
@@ -403,14 +404,14 @@ void vgic_v4_teardown(struct kvm *kvm)
 	}

 #ifdef CONFIG_ARM_GIC_V3_PER_VCPU_VLPI
-	/*
-	 * TODO: Free the shared VM properties that remain necessary
-	 * in per-vCPU mode. Create separate teardown function
-	 * that operates on a per-vCPU basis.
-	 */
-
-	/* vPE properties table */
+	/* Free shared VM vPE properties table */
 	its_free_prop_table(its_vm->vprop_page);
+
+	/* Free remaining doorbell IRQs */
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (its_vm->vpes[i])
+			its_free_vcpu_irq(vcpu);
+	}
 #else
 	its_free_vcpu_irqs(its_vm);
 #endif
@@ -419,6 +420,41 @@
 	its_vm->vpes = NULL;
 }

+/**
+ * vgic_v4_vcpu_teardown - teardown the GICv4 data structures for a
+ * specific vCPU
+ * @vcpu: Pointer to the vcpu being torn down
+ *
+ * Called every time the KVM_DISABLE_VCPU_VLPI ioctl is called.
+ */
+int vgic_v4_vcpu_teardown(struct kvm_vcpu *vcpu)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct vgic_dist *dist = &kvm->arch.vgic;
+	int i, irq;
+
+	/* Get vCPU index */
+	i = kvm_idx_from_vcpu(kvm, vcpu);
+	/* On userspace to validate that vcpu has vLPIs enabled before calling ioctl */
+	if (i == UINT_MAX || !dist->its_vm.vpes || !dist->its_vm.vpes[i])
+		return -EINVAL;
+
+	irq = dist->its_vm.vpes[i]->irq;
+
+	/* Free the vPE IRQ */
+	irq_clear_status_flags(irq, DB_IRQ_FLAGS);
+	free_irq(irq, vcpu);
+
+	/* Free vCPU IRQ resources */
+	its_free_vcpu_irq(vcpu);
+
+	/* Unlink distributor from vPE - this officially "disables" vLPIs on the vCPU */
+	dist->its_vm.vpes[i] = NULL;
+
+	return 0;
+}
+
 static inline bool vgic_v4_want_doorbell(struct kvm_vcpu *vcpu)
 {
 	if (vcpu_get_flag(vcpu, IN_WFI))
@@ -744,6 +780,41 @@ static int upgrade_existing_lpis_to_vlpis(struct kvm_vcpu *vcpu)
 	return 0;
 }

+static int downgrade_existing_vlpis_to_lpis(struct kvm_vcpu *vcpu)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_device *dev;
+	struct vgic_its *its;
+	struct its_device *device;
+	struct its_ite *ite;
+
+	list_for_each_entry(dev, &kvm->devices, vm_node) {
+		/* Ensure we only look at ITS devices */
+		if (dev->ops != &kvm_arm_vgic_its_ops)
+			continue;
+
+		its = dev->private;
+		mutex_lock(&its->its_lock);
+
+		list_for_each_entry(device, &its->device_list, dev_list) {
+			list_for_each_entry(ite, &device->itt_head, ite_list) {
+				/* Only downgrade vLPIs targeting this vCPU */
+				if (ite->collection &&
+				    ite->collection->target_addr == vcpu->vcpu_id &&
+				    ite->irq && ite->irq->hw) {
+
+					/* Unmap direct injection */
+					kvm_vgic_v4_unset_forwarding(kvm, ite->irq->host_irq);
+				}
+			}
+		}
+
+		mutex_unlock(&its->its_lock);
+	}
+
+	return 0;
+}
+
 /* Enable vLPI direct injection on a specific vCPU */
 int kvm_vgic_enable_vcpu_vlpi(struct kvm_vcpu *vcpu)
 {
@@ -772,6 +843,20 @@ int kvm_vgic_enable_vcpu_vlpi(struct kvm_vcpu *vcpu)
 	return upgrade_existing_lpis_to_vlpis(vcpu);
 }

+/* Disable vLPI direct injection on a specific vCPU */
+int kvm_vgic_disable_vcpu_vlpi(struct kvm_vcpu *vcpu)
+{
+	int vcpu_vlpi_status = kvm_vgic_query_vcpu_vlpi(vcpu);
+
+	/* vGIC not initialized for vCPU or vLPI already disabled */
+	if (vcpu_vlpi_status <= 0)
+		return vcpu_vlpi_status;
+
+	downgrade_existing_vlpis_to_lpis(vcpu);
+
+	return vgic_v4_vcpu_teardown(vcpu);
+}
+
 /* query whether vLPI direct injection is enabled on a specific vCPU.
 * return 0 if disabled, 1 if enabled, -EINVAL if vCPU non-existent or GIC
 * uninitialized
diff --git a/arch/arm64/kvm/vgic/vgic.h b/arch/arm64/kvm/vgic/vgic.h
index 60ae0d1f044d..b16419eb9121 100644
--- a/arch/arm64/kvm/vgic/vgic.h
+++ b/arch/arm64/kvm/vgic/vgic.h
@@ -438,6 +438,7 @@ static inline bool vgic_supports_direct_irqs(struct kvm *kvm)
 int vgic_v4_init(struct kvm *kvm);
 int vgic_v4_vcpu_init(struct kvm_vcpu *vcpu);
 void vgic_v4_teardown(struct kvm *kvm);
+int vgic_v4_vcpu_teardown(struct kvm_vcpu *vcpu);
 void vgic_v4_configure_vsgis(struct kvm *kvm);
 void vgic_v4_get_vlpi_state(struct vgic_irq *irq, bool *val);
 int vgic_v4_request_vpe_irq(struct kvm_vcpu *vcpu, int irq);
@@ -472,6 +473,7 @@ void vgic_its_debug_destroy(struct kvm_device *dev);

 bool kvm_per_vcpu_vlpi_supported(void);
 int kvm_vgic_enable_vcpu_vlpi(struct kvm_vcpu *vcpu);
+int kvm_vgic_disable_vcpu_vlpi(struct kvm_vcpu *vcpu);
 int kvm_vgic_query_vcpu_vlpi(struct kvm_vcpu *vcpu);

 #endif
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 67749578f973..0e0778d61df2 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -4600,7 +4600,9 @@ static void its_vpe_irq_domain_free(struct irq_domain *domain,
 					unsigned int virq,
 					unsigned int nr_irqs)
 {
+#ifndef CONFIG_ARM_GIC_V3_PER_VCPU_VLPI
 	struct its_vm *vm = domain->host_data;
+#endif
 	int i;

 	irq_domain_free_irqs_parent(domain, virq, nr_irqs);
@@ -4610,11 +4612,11 @@ static void its_vpe_irq_domain_free(struct irq_domain *domain,
 							  virq + i);
 		struct its_vpe *vpe = irq_data_get_irq_chip_data(data);

-		BUG_ON(vm != vpe->its_vm);

 #ifdef CONFIG_ARM_GIC_V3_PER_VCPU_VLPI
 		free_lpi_range(vpe->vpe_db_lpi, 1);
 #else
+		BUG_ON(vm != vpe->its_vm);
 		clear_bit(data->hwirq, vm->db_bitmap);
 #endif
 		its_vpe_teardown(vpe);
diff --git a/drivers/irqchip/irq-gic-v4.c b/drivers/irqchip/irq-gic-v4.c
index c8e324cd8911..6fa0edd19659 100644
--- a/drivers/irqchip/irq-gic-v4.c
+++ b/drivers/irqchip/irq-gic-v4.c
@@ -270,6 +270,25 @@ void its_free_vcpu_irqs(struct its_vm *vm)
 	irq_domain_free_fwnode(vm->fwnode);
 }

+void its_free_vcpu_irq(struct kvm_vcpu *vcpu)
+{
+	struct its_vpe *vpe = &vcpu->arch.vgic_cpu.vgic_v3.its_vpe;
+	unsigned int irq = irq_find_mapping(vpe->lpi_domain, 0);
+
+	if (WARN_ON(!irq))
+		return;
+
+	irq_domain_free_irqs(irq, 1);
+	irq_domain_remove(vpe->lpi_domain);
+	irq_domain_free_fwnode(vpe->lpi_fwnode);
+
+	/* Reset vPE fields to prevent stale references during re-enablement */
+	vpe->its_vm = NULL;
+	vpe->irq = 0;
+	vpe->lpi_domain = NULL;
+	vpe->lpi_fwnode = NULL;
+}
+
 static int its_send_vpe_cmd(struct its_vpe *vpe, struct its_cmd_info *info)
 {
 	return irq_set_vcpu_affinity(vpe->irq, info);
diff --git a/include/linux/irqchip/arm-gic-v4.h b/include/linux/irqchip/arm-gic-v4.h
index bc493fed75ab..bd3e8de35147 100644
--- a/include/linux/irqchip/arm-gic-v4.h
+++ b/include/linux/irqchip/arm-gic-v4.h
@@ -146,6 +146,7 @@ struct its_cmd_info {
 int its_alloc_vcpu_irqs(struct its_vm *vm);
 int its_alloc_vcpu_irq(struct kvm_vcpu *vcpu);
 void its_free_vcpu_irqs(struct its_vm *vm);
+void its_free_vcpu_irq(struct kvm_vcpu *vcpu);
 int its_make_vpe_resident(struct its_vpe *vpe, bool g0en, bool g1en);
 int its_make_vpe_non_resident(struct its_vpe *vpe, bool db);
 int its_commit_vpe(struct its_vpe *vpe);
The per-vCPU vLPI enable, disable, and query ioctls must be serialized: each must finish reading or writing the state of its target vCPU before another operation does the same.
Implement a vlpi_toggle_mutex that is acquired whenever a KVM_ENABLE_VCPU_VLPI, KVM_DISABLE_VCPU_VLPI, or KVM_QUERY_VCPU_VLPI ioctl is handled.
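For context, guard(mutex) in the diff below is the scoped-lock helper from linux/cleanup.h: it acquires the mutex at the declaration and releases it automatically when the enclosing scope is left. A minimal sketch of the pattern (the function name is illustrative, not from the patch):

    static int example_vlpi_toggle(struct kvm_vcpu *vcpu)
    {
            /* Acquire the per-vCPU toggle mutex... */
            guard(mutex)(&vcpu->arch.vgic_cpu.vlpi_toggle_mutex);

            /* ...the enable/disable/query operation runs to completion here... */

            return 0;       /* ...and the mutex is dropped on every return path. */
    }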
Signed-off-by: Maximilian Dittgen <mdittgen@amazon.com> --- arch/arm64/kvm/arm.c | 3 +++ arch/arm64/kvm/vgic/vgic-v4.c | 5 +++++ include/kvm/arm_vgic.h | 3 +++ 3 files changed, 11 insertions(+)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index 180eaa4165e9..c2224664f05e 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -1961,6 +1961,7 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) if (!vcpu) return -EINVAL;
+ guard(mutex)(&vcpu->arch.vgic_cpu.vlpi_toggle_mutex); return kvm_vgic_enable_vcpu_vlpi(vcpu); } case KVM_DISABLE_VCPU_VLPI: { @@ -1974,6 +1975,7 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) if (!vcpu) return -EINVAL;
+ guard(mutex)(&vcpu->arch.vgic_cpu.vlpi_toggle_mutex); return kvm_vgic_disable_vcpu_vlpi(vcpu); } case KVM_QUERY_VCPU_VLPI: { @@ -1987,6 +1989,7 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) if (!vcpu) return -EINVAL;
+ guard(mutex)(&vcpu->arch.vgic_cpu.vlpi_toggle_mutex); return kvm_vgic_query_vcpu_vlpi(vcpu); } default: diff --git a/arch/arm64/kvm/vgic/vgic-v4.c b/arch/arm64/kvm/vgic/vgic-v4.c index 39fababf2861..b7dbc1789c90 100644 --- a/arch/arm64/kvm/vgic/vgic-v4.c +++ b/arch/arm64/kvm/vgic/vgic-v4.c @@ -325,6 +325,11 @@ int vgic_v4_init(struct kvm *kvm) if (!dist->its_vm.vprop_page) ret = -ENOMEM; } + + /* vLPI toggle mutex */ + kvm_for_each_vcpu(i, vcpu, kvm) + mutex_init(&vcpu->arch.vgic_cpu.vlpi_toggle_mutex); + #endif if (ret) vgic_v4_teardown(kvm); diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h index 18a49c4b83f8..c9dad0e84c77 100644 --- a/include/kvm/arm_vgic.h +++ b/include/kvm/arm_vgic.h @@ -372,6 +372,9 @@ struct vgic_cpu { u32 rdreg_index; atomic_t syncr_busy;
+ /* Ensure atomicity of per-vCPU vLPI enable/disable/query operations */ + struct mutex vlpi_toggle_mutex; + /* Contains the attributes and gpa of the LPI pending tables. */ u64 pendbaser; /* GICR_CTLR.{ENABLE_LPIS,RWP} */
Once KVM_ENABLE_VCPU_VLPI and KVM_DISABLE_VCPU_VLPI are available, vCPU vPEs are dynamically allocated and freed at runtime. vSGI direct injection requires the receiving vCPU to have a vPE, so vSGI enablement must be coupled with vPE allocation to avoid injecting vSGIs into nonexistent vPEs.
Modify vgic_v4_configure_vsgis() to check whether a target vCPU has an assigned vPE before calling vgic_v4_enable_vsgis() on it at boot. Call vgic_v4_enable_vsgis() and vgic_v4_disable_vsgis() as the vCPU's vPE is allocated and freed within the vLPI enablement and disablement functions.
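For orientation, a sketch of the resulting call ordering (the helper names come from this series, but the two functions below are illustrative, not literal patch code):

    /* Enable path: the vPE must exist before vSGIs are switched over. */
    static int vlpi_enable_order(struct kvm_vcpu *vcpu)
    {
            int ret = vgic_v4_vcpu_init(vcpu);      /* allocate the vPE */

            if (ret)
                    return ret;
            if (kvm_vgic_global_state.has_gicv4_1 && vcpu->kvm->arch.vgic.nassgireq)
                    vgic_v4_enable_vsgis(vcpu);     /* safe: the vPE now exists
                                                       (the patch takes config_lock here) */
            return upgrade_existing_lpis_to_vlpis(vcpu);
    }

    /* Disable path: vSGIs must be switched back before the vPE is freed. */
    static int vlpi_disable_order(struct kvm_vcpu *vcpu)
    {
            downgrade_existing_vlpis_to_lpis(vcpu);
            vgic_v4_disable_vsgis(vcpu);            /* before the vPE goes away */
            return vgic_v4_vcpu_teardown(vcpu);
    }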
Signed-off-by: Maximilian Dittgen <mdittgen@amazon.de> --- arch/arm64/kvm/vgic/vgic-v4.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/vgic/vgic-v4.c b/arch/arm64/kvm/vgic/vgic-v4.c index b7dbc1789c90..5d6694d366b5 100644 --- a/arch/arm64/kvm/vgic/vgic-v4.c +++ b/arch/arm64/kvm/vgic/vgic-v4.c @@ -198,7 +198,7 @@ void vgic_v4_configure_vsgis(struct kvm *kvm) kvm_arm_halt_guest(kvm);
kvm_for_each_vcpu(i, vcpu, kvm) { - if (dist->nassgireq) + if (dist->nassgireq && kvm_vgic_query_vcpu_vlpi(vcpu) > 0) vgic_v4_enable_vsgis(vcpu); else vgic_v4_disable_vsgis(vcpu); @@ -838,6 +838,13 @@ int kvm_vgic_enable_vcpu_vlpi(struct kvm_vcpu *vcpu) if (ret) return ret;
+ /* Enable direct vSGIs */ + if (kvm_vgic_global_state.has_gicv4_1 && vcpu->kvm->arch.vgic.nassgireq) { + mutex_lock(&vcpu->kvm->arch.config_lock); + vgic_v4_enable_vsgis(vcpu); + mutex_unlock(&vcpu->kvm->arch.config_lock); + } + /* * Upgrade existing LPIs to vLPIs. We * do not need to error check since @@ -859,6 +866,8 @@ int kvm_vgic_disable_vcpu_vlpi(struct kvm_vcpu *vcpu)
downgrade_existing_vlpis_to_lpis(vcpu);
+ vgic_v4_disable_vsgis(vcpu); + return vgic_v4_vcpu_teardown(vcpu); }
Since GITS_TYPER.PTA == 0, the ITS MAPC command expects a CPU ID, rather than a physical redistributor address, as its RDbase argument.
As such, when MAPC-ing guest ITS collections, vgic_lpi_stress iterates over CPU IDs in the range [0, nr_cpus), passing them as the RDbase vcpu_id argument to its_send_mapc_cmd().
However, its_encode_target() in the its_send_mapc_cmd() selftest handler expects RDbase arguments to carry the target in bits [51:16], as shown by the 16-bit right shift of target_addr in its implementation:
its_mask_encode(&cmd->raw_cmd[2], target_addr >> 16, 51, 16)
At the moment, all CPU IDs passed into its_send_mapc_cmd() are unshifted, so they become 0x0 after the right shift. Thus, when vgic_its_cmd_handle_mapc() in vgic-its.c receives the ITS command, it always interprets the RDbase target CPU as CPU 0. All interrupts sent to collections are then processed by vCPU 0, which defeats the purpose of this multi-vCPU test.
Fix this by adding a procnum_to_rdbase() helper, which left-shifts the vcpu_id received by its_send_mapc_cmd() by 16 bits before passing it to its_encode_target() for encoding.
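For illustration only (not part of the patch), the round trip can be checked in plain C; encode_target_field() below mimics the shift its_encode_target() applies when packing bits [51:16]:

    #include <assert.h>
    #include <stdint.h>

    /* Mimics its_encode_target(): the RDbase field stores bits [51:16] of its argument. */
    static uint64_t encode_target_field(uint64_t target_addr)
    {
            return (target_addr >> 16) & ((1ULL << 36) - 1);
    }

    int main(void)
    {
            uint64_t vcpu_id = 3;

            assert(encode_target_field(vcpu_id) == 0);              /* before the fix: ID lost */
            assert(encode_target_field(vcpu_id << 16) == vcpu_id);  /* after the fix: ID kept */
            return 0;
    }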
Signed-off-by: Maximilian Dittgen <mdittgen@amazon.de> --- This patch has already been merged as a fix in Linux 6.18-rc6 as a24f7af. --- tools/testing/selftests/kvm/lib/arm64/gic_v3_its.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/lib/arm64/gic_v3_its.c b/tools/testing/selftests/kvm/lib/arm64/gic_v3_its.c index aec1b69a4de3..7f9fdcf42ae6 100644 --- a/tools/testing/selftests/kvm/lib/arm64/gic_v3_its.c +++ b/tools/testing/selftests/kvm/lib/arm64/gic_v3_its.c @@ -15,6 +15,8 @@ #include "gic_v3.h" #include "processor.h"
+#define GITS_COLLECTION_TARGET_SHIFT 16 + static u64 its_read_u64(unsigned long offset) { return readq_relaxed(GITS_BASE_GVA + offset); @@ -163,6 +165,11 @@ static void its_encode_collection(struct its_cmd_block *cmd, u16 col) its_mask_encode(&cmd->raw_cmd[2], col, 15, 0); }
+static u64 procnum_to_rdbase(u32 vcpu_id) +{ + return vcpu_id << GITS_COLLECTION_TARGET_SHIFT; +} + #define GITS_CMDQ_POLL_ITERATIONS 0
static void its_send_cmd(void *cmdq_base, struct its_cmd_block *cmd) @@ -217,7 +224,7 @@ void its_send_mapc_cmd(void *cmdq_base, u32 vcpu_id, u32 collection_id, bool val
its_encode_cmd(&cmd, GITS_CMD_MAPC); its_encode_collection(&cmd, collection_id); - its_encode_target(&cmd, vcpu_id); + its_encode_target(&cmd, procnum_to_rdbase(vcpu_id)); its_encode_valid(&cmd, valid);
its_send_cmd(cmdq_base, &cmd);
At the moment, all MSIs injected from userspace using KVM_SIGNAL_MSI are intercepted by the hypervisor and handled in software. To properly test GICv4 direct vLPI injection from KVM selftests, add a KVM_DEBUG_GIC_MSI_SETUP ioctl that manually creates an IRQ routing table entry for the specified MSI and populates the ITS structures (device, collection, and interrupt translation table entries) that map the MSI to a vLPI. The ioctl then calls kvm_vgic_v4_set_forwarding() so the vLPI bypasses hypervisor traps and is injected directly into the vCPU.
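A hypothetical invocation from a selftest might look as follows (field values are illustrative, and itt_gpa stands for a guest-physical ITT allocation; both are assumptions, not taken from the patch):

    struct kvm_debug_gic_msi_setup params = {
            .device_id = 0,
            .event_id  = 0,
            .vcpu_id   = 0,
            .vintid    = 8192,      /* first valid LPI INTID */
            .host_irq  = 32,        /* mock host IRQ in the SPI range */
            .itt_addr  = itt_gpa,   /* assumed guest-physical ITT backing */
    };

    if (ioctl(vm_fd, KVM_DEBUG_GIC_MSI_SETUP, &params))
            perror("KVM_DEBUG_GIC_MSI_SETUP");      /* e.g. vPE not enabled on the vCPU */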
Signed-off-by: Maximilian Dittgen <mdittgen@amazon.de> --- arch/arm64/kvm/arm.c | 34 +++++++ arch/arm64/kvm/vgic/vgic-its.c | 138 +++++++++++++++++++++++++++++ arch/arm64/kvm/vgic/vgic.h | 1 + include/linux/irqchip/arm-gic-v3.h | 1 + include/uapi/linux/kvm.h | 15 ++++ 5 files changed, 189 insertions(+)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index c2224664f05e..ecc3c87889db 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -45,6 +45,8 @@ #include <kvm/arm_pmu.h> #include <kvm/arm_psci.h>
+#include <vgic/vgic.h> + #include "sys_regs.h"
static enum kvm_mode kvm_mode = KVM_MODE_DEFAULT; @@ -1992,6 +1994,38 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) guard(mutex)(&vcpu->arch.vgic_cpu.vlpi_toggle_mutex); return kvm_vgic_query_vcpu_vlpi(vcpu); } + case KVM_DEBUG_GIC_MSI_SETUP: { + /* Define interrupt ID boundaries for input validation */ + #define GIC_LPI_OFFSET 8192 + #define GIC_LPI_MAX 65535 + #define SPI_INTID_MIN 32 + #define SPI_INTID_MAX 1019 + + struct kvm_debug_gic_msi_setup params; + struct kvm_vcpu *vcpu; + + if (copy_from_user(¶ms, argp, sizeof(params))) + return -EFAULT; + + /* validate vcpu_id is in range and exists */ + vcpu = kvm_get_vcpu_by_id(kvm, params.vcpu_id); + if (!vcpu) + return -EINVAL; + + /* validate vintid is in LPI range */ + if (params.vintid < GIC_LPI_OFFSET || params.vintid > GIC_LPI_MAX) + return -EINVAL; + + /* + * Validate host_irq is in safe range -- we use SPI range since + * selftests guests will have no shared peripheral devices + */ + if (params.host_irq < SPI_INTID_MIN || params.host_irq > SPI_INTID_MAX) + return -EINVAL; + + /* Mock single MSI for testing */ + return debug_gic_msi_setup_mock_msi(kvm, ¶ms); + } default: return -EINVAL; } diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c index 5f3bbf24cc2f..a0d140ce35d1 100644 --- a/arch/arm64/kvm/vgic/vgic-its.c +++ b/arch/arm64/kvm/vgic/vgic-its.c @@ -2815,3 +2815,141 @@ int kvm_vgic_register_its_device(void) return kvm_register_device_ops(&kvm_arm_vgic_its_ops, KVM_DEV_TYPE_ARM_VGIC_ITS); } + +static struct vgic_its *vgic_get_its(struct kvm *kvm, + struct kvm_kernel_irq_routing_entry *irq_entry) +{ + struct kvm_msi msi = (struct kvm_msi) { + .address_lo = irq_entry->msi.address_lo, + .address_hi = irq_entry->msi.address_hi, + .data = irq_entry->msi.data, + .flags = irq_entry->msi.flags, + .devid = irq_entry->msi.devid, + }; + + return vgic_msi_to_its(kvm, &msi); +} + +/* + * debug_gic_msi_setup_mock_msi - manually set up vLPI direct injection infrastructure + * for an MSI upon userspace request. Used for testing vLPIs from selftests. + * + * Creates an IRQ routing entry mapping the specified MSI signature to a mock + * host IRQ, then populates ITS structures (device, collection, ITE) to establish + * the DevID/EventID to LPI translation. Finally enables GICv4 vLPI forwarding + * to bypass software emulation and inject interrupts directly to the vCPU. + * + * This function is intended solely for KVM selftests via KVM_DEBUG_GIC_MSI_SETUP. + * It uses mock host IRQs in the SPI range assuming no real hardware devices are + * present on a selftest guest. Using this interface in production will corrupt the + * IRQ routing table. 
+ */ +int debug_gic_msi_setup_mock_msi(struct kvm *kvm, struct kvm_debug_gic_msi_setup *params) +{ + struct kvm_irq_routing_entry user_entry; + struct kvm_kernel_irq_routing_entry entry; + struct vgic_its *its; + struct its_device *device; + struct its_collection *collection; + struct its_ite *ite; + struct vgic_irq *irq; + struct kvm_vcpu *vcpu; + u64 doorbell_addr = GITS_BASE_GPA + GITS_TRANSLATER; + u32 device_id = params->device_id; + u32 event_id = params->event_id; + u32 coll_id = params->vcpu_id; + u32 lpi_nr = params->vintid; + gpa_t itt_addr = params->itt_addr; + int ret; + int host_irq = params->host_irq; + + /* Get target vCPU, validate it has a vPE for direct injection */ + vcpu = kvm_get_vcpu(kvm, params->vcpu_id); + if (!vcpu) + return -EINVAL; + else if (!vcpu->arch.vgic_cpu.vgic_v3.its_vpe.its_vm) + return -ENXIO; /* vPE not currently enabled for this vCPU */ + + /* + * Enable vLPIs for this vCPU manually for testing, normally + * done by the guest writing GICR_CTLR + */ + atomic_set(&vcpu->arch.vgic_cpu.ctlr, GICR_CTLR_ENABLE_LPIS); + + /* Unmap any existing vLPI on the mock host IRQ (remnants from prior mocks) */ + kvm_vgic_v4_unset_forwarding(kvm, host_irq); + + /* Create mock user IRQ routing entry using kvm_set_routing_entry function */ + memset(&user_entry, 0, sizeof(user_entry)); + user_entry.gsi = host_irq; + user_entry.type = KVM_IRQ_ROUTING_MSI; + user_entry.u.msi.address_lo = doorbell_addr & 0xFFFFFFFF; + user_entry.u.msi.address_hi = doorbell_addr >> 32; + user_entry.u.msi.data = event_id; + user_entry.u.msi.devid = device_id; + user_entry.flags = KVM_MSI_VALID_DEVID; + + /* Initialize kernel routing entry */ + memset(&entry, 0, sizeof(entry)); + + /* Use vgic-irqfd.c function to create entry */ + ret = kvm_set_routing_entry(kvm, &entry, &user_entry); + if (ret) + return ret; + + /* Now that we created an MSI -> ITS mapping, we can populate the ITS for this MSI */ + + /* Get ITS instance */ + its = vgic_get_its(kvm, &entry); + if (IS_ERR(its)) + return PTR_ERR(its); + + /* Enable ITS manually for testing, normally done by guest writing to GITS_CTLR register */ + its->enabled = true; + + mutex_lock(&its->its_lock); + + /* Create ITS device */ + device = vgic_its_alloc_device(its, device_id, itt_addr, 8); + if (IS_ERR(device)) { + ret = PTR_ERR(device); + goto unlock; + } + + /* Create collection mapped to the specified vCPU */ + ret = vgic_its_alloc_collection(its, &collection, coll_id); + if (ret) + goto unlock; + + collection->target_addr = params->vcpu_id; + + /* Create ITE */ + ite = vgic_its_alloc_ite(device, collection, event_id); + if (IS_ERR(ite)) { + ret = PTR_ERR(ite); + vgic_its_free_collection(its, coll_id); + goto unlock; + } + + /* Create LPI */ + irq = vgic_add_lpi(kvm, lpi_nr, vcpu); + if (IS_ERR(irq)) { + ret = PTR_ERR(irq); + its_free_ite(kvm, ite); + vgic_its_free_collection(its, coll_id); + goto unlock; + } + + ite->irq = irq; + update_affinity_ite(kvm, ite); + + /* Now that routing entry is initialized, call v4 forwarding setup */ + ret = kvm_vgic_v4_set_forwarding_locked(kvm, host_irq, &entry, its); + + mutex_unlock(&its->its_lock); + return ret; + +unlock: + mutex_unlock(&its->its_lock); + return ret; +} diff --git a/arch/arm64/kvm/vgic/vgic.h b/arch/arm64/kvm/vgic/vgic.h index b16419eb9121..9f8be87e3294 100644 --- a/arch/arm64/kvm/vgic/vgic.h +++ b/arch/arm64/kvm/vgic/vgic.h @@ -475,5 +475,6 @@ bool kvm_per_vcpu_vlpi_supported(void); int kvm_vgic_enable_vcpu_vlpi(struct kvm_vcpu *vcpu); int kvm_vgic_disable_vcpu_vlpi(struct kvm_vcpu *vcpu); int kvm_vgic_query_vcpu_vlpi(struct kvm_vcpu *vcpu); +int debug_gic_msi_setup_mock_msi(struct kvm *kvm, struct kvm_debug_gic_msi_setup *params);
#endif diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h index 5031a4c25543..1ab1eb80e685 100644 --- a/include/linux/irqchip/arm-gic-v3.h +++ b/include/linux/irqchip/arm-gic-v3.h @@ -378,6 +378,7 @@ #define GITS_CIDR3 0xfffc
#define GITS_TRANSLATER 0x10040 +#define GITS_BASE_GPA 0x8000000ULL
#define GITS_SGIR 0x20020
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 002fe0f4841d..057eb9e61ac8 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1457,6 +1457,21 @@ struct kvm_enc_region { #define KVM_DISABLE_VCPU_VLPI _IOW(KVMIO, 0xf1, int) #define KVM_QUERY_VCPU_VLPI _IOR(KVMIO, 0xf2, int)
+/* + * Generate an IRQ routing entry and vLPI tables for userspace-sourced + * MSI, enabling direct vLPI injection testing from selftests + */ +#define KVM_DEBUG_GIC_MSI_SETUP _IOW(KVMIO, 0xf3, struct kvm_debug_gic_msi_setup) + +struct kvm_debug_gic_msi_setup { + __u32 device_id; + __u32 event_id; + __u32 vcpu_id; + __u32 vintid; + __u32 host_irq; + __u64 itt_addr; +}; + #define KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE (1 << 0) #define KVM_DIRTY_LOG_INITIALLY_SET (1 << 1)
vgic_lpi_stress stress-tests LPIs by injecting MSIs into a guest's GIC. Since these MSIs are defined purely in userspace, the hypervisor handles them in software as LPIs.
We provide two ways to stress test direct-injected vLPIs.
- When per-vCPU vLPI injection is disabled, use the -D flag to upgrade all LPIs fired by the stress test to vLPIs. This flag mocks a host_irq for each MSI and calls KVM_DEBUG_GIC_MSI_SETUP to create and map the vITS data structures needed for direct injection.
- When per-vCPU vLPI injection is enabled, use the -s flag to pass per-vCPU command strings that control the state of vPE initialization on each vCPU throughout the test. This allows stress testing vLPI injection on partially-enabled VMs.
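As an illustration, assumed invocations based on the flag descriptions above (not taken from the patch):

    ./vgic_lpi_stress -D -v 4 -d 2 -e 32 -i 1000    # upgrade every LPI fired by the test to a vLPI
    ./vgic_lpi_stress -s dede,eede                  # 2 vCPUs, 4 iterations of per-vCPU toggling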
Signed-off-by: Maximilian Dittgen <mdittgen@amazon.de> --- .../selftests/kvm/arm64/vgic_lpi_stress.c | 181 +++++++++++++++++- 1 file changed, 177 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/kvm/arm64/vgic_lpi_stress.c b/tools/testing/selftests/kvm/arm64/vgic_lpi_stress.c index e857a605f577..b3fe5fdf4285 100644 --- a/tools/testing/selftests/kvm/arm64/vgic_lpi_stress.c +++ b/tools/testing/selftests/kvm/arm64/vgic_lpi_stress.c @@ -18,9 +18,17 @@ #include "ucall.h" #include "vgic.h"
+#define KVM_DEBUG_GIC_MSI_SETUP _IOW(KVMIO, 0xf3, struct kvm_debug_gic_msi_setup) + #define TEST_MEMSLOT_INDEX 1
#define GIC_LPI_OFFSET 8192 +#define SPI_IRQ_RANGE_OFFSET 32 + +static bool vlpi_enabled; +static bool string_mode; +static char **vcpu_strings; +static bool *vcpu_enabled;
static size_t nr_iterations = 1000; static vm_paddr_t gpa_base; @@ -222,6 +230,79 @@ static void setup_gic(void) its_fd = vgic_its_setup(vm); }
+static int enable_msi_vlpi_injection(u32 device_id, u32 event_id, + u32 vcpu_id, u32 vintid, u32 host_irq) +{ + struct kvm_debug_gic_msi_setup params = { + .device_id = device_id, + .event_id = event_id, + .vcpu_id = vcpu_id, + .vintid = vintid, + .host_irq = host_irq, + .itt_addr = test_data.itt_tables + (device_id * SZ_64K) + }; + + return __vm_ioctl(vm, KVM_DEBUG_GIC_MSI_SETUP, &params); +} + +static void upgrade_vcpu_lpis(struct kvm_vcpu *vcpu) +{ + u32 intid = GIC_LPI_OFFSET; + u32 target_vcpu = 0; /* Start round-robin from vCPU 0 */ + u32 device_id, event_id; + + for (device_id = 0; device_id < test_data.nr_devices; device_id++) { + for (event_id = 0; event_id < test_data.nr_event_ids; + event_id++) { + /* + * Only set up a vLPI mapping if this is the target vCPU + * for this interrupt + */ + if (target_vcpu == vcpu->id) { + /* + * We mock host_irqs in the SPI interrupt range + * of 32-1019 since selftest guests have no + * hardware devices + */ + int ret = enable_msi_vlpi_injection(device_id, + event_id, vcpu->id, intid, + intid - GIC_LPI_OFFSET + SPI_IRQ_RANGE_OFFSET); + + if (ret == -ENXIO || ret == -1) { + pr_info("Direct vLPI injection is disabled for vCPU %d, defaulting to software LPI handling\n", + vcpu->id); + return; + } + TEST_ASSERT(ret == 0, + "KVM_DEBUG_GIC_MSI_SETUP failed: %d\n", + ret); + } + + intid++; + target_vcpu = (target_vcpu + 1) % test_data.nr_cpus; + } + } +} + +static void enable_vcpu_vlpi_injection(int vcpu_id) +{ + int ret = ioctl(vm->fd, KVM_ENABLE_VCPU_VLPI, &vcpu_id); + + TEST_ASSERT(ret == 0, "KVM_ENABLE_VCPU_VLPI failed: %d", ret); + pr_info("Enabled vLPI injection on vCPU %d\n", vcpu_id); + upgrade_vcpu_lpis(vcpus[vcpu_id]); +} + +static void disable_vcpu_vlpi_injection(int vcpu_id) +{ + int ret = ioctl(vm->fd, KVM_DISABLE_VCPU_VLPI, &vcpu_id); + + TEST_ASSERT(ret == 0, "KVM_DISABLE_VCPU_VLPI failed: %d", ret); + pr_info("Disabled vLPI injection on vCPU %d\n", vcpu_id); +} + static void signal_lpi(u32 device_id, u32 event_id) { vm_paddr_t db_addr = GITS_BASE_GPA + GITS_TRANSLATER; @@ -243,18 +324,36 @@ static void signal_lpi(u32 device_id, u32 event_id) }
static pthread_barrier_t test_setup_barrier; +static pthread_barrier_t vlpi_upgrade_barrier;
static void *lpi_worker_thread(void *data) { u32 device_id = (size_t)data; u32 event_id; size_t i; + int vcpu_id;
pthread_barrier_wait(&test_setup_barrier); - - for (i = 0; i < nr_iterations; i++) + pthread_barrier_wait(&vlpi_upgrade_barrier); + + for (i = 0; i < nr_iterations; i++) { + /* conduct per-vCPU vLPI enablement/disablement */ + if (string_mode) { + for (vcpu_id = 0; vcpu_id < test_data.nr_cpus; vcpu_id++) { + char action = vcpu_strings[vcpu_id][i]; + + if (action == 'e' && !vcpu_enabled[vcpu_id]) { + enable_vcpu_vlpi_injection(vcpu_id); + vcpu_enabled[vcpu_id] = true; + } else if (action == 'd' && vcpu_enabled[vcpu_id]) { + disable_vcpu_vlpi_injection(vcpu_id); + vcpu_enabled[vcpu_id] = false; + } + } + } for (event_id = 0; event_id < test_data.nr_event_ids; event_id++) signal_lpi(device_id, event_id); + }
return NULL; } @@ -270,6 +369,10 @@ static void *vcpu_worker_thread(void *data) switch (get_ucall(vcpu, &uc)) { case UCALL_SYNC: pthread_barrier_wait(&test_setup_barrier); + /* if flag is set, set direct injection mappings for MSIs */ + if (vlpi_enabled) + upgrade_vcpu_lpis(vcpu); + pthread_barrier_wait(&vlpi_upgrade_barrier); continue; case UCALL_DONE: return NULL; @@ -309,6 +412,7 @@ static void run_test(void) TEST_ASSERT(lpi_threads && vcpu_threads, "Failed to allocate pthread arrays");
pthread_barrier_init(&test_setup_barrier, NULL, nr_vcpus + nr_devices + 1); + pthread_barrier_init(&vlpi_upgrade_barrier, NULL, nr_vcpus + nr_devices + 1);
for (i = 0; i < nr_vcpus; i++) pthread_create(&vcpu_threads[i], NULL, vcpu_worker_thread, vcpus[i]); @@ -317,6 +421,7 @@ static void run_test(void) pthread_create(&lpi_threads[i], NULL, lpi_worker_thread, (void *)i);
pthread_barrier_wait(&test_setup_barrier); + pthread_barrier_wait(&vlpi_upgrade_barrier); /* Wait for all vLPI upgrades */
clock_gettime(CLOCK_MONOTONIC, &start);
@@ -361,13 +466,71 @@ static void destroy_vm(void) free(vcpus); }
+static int parse_vcpu_strings(const char *str) +{ + char *token, *saveptr, *str_copy; + int count = 0, len = -1, i; + + str_copy = strdup(str); + TEST_ASSERT(str_copy, "Failed to allocate string copy"); + + token = strtok_r(str_copy, ",", &saveptr); + while (token) { + count++; + token = strtok_r(NULL, ",", &saveptr); + } + free(str_copy); + + vcpu_strings = malloc(count * sizeof(char *)); + vcpu_enabled = calloc(count, sizeof(bool)); + TEST_ASSERT(vcpu_strings && vcpu_enabled, "Failed to allocate arrays"); + + str_copy = strdup(str); + token = strtok_r(str_copy, ",", &saveptr); + for (i = 0; i < count; i++) { + int token_len = strlen(token); + + if (len == -1) + len = token_len; + else if (len != token_len) + TEST_FAIL("All strings must have same length"); + + TEST_ASSERT(len > 0, "Strings cannot be empty"); + + for (int j = 0; j < token_len; j++) + if (token[j] != 'd' && token[j] != 'e') + TEST_FAIL("Strings can only contain 'd' and 'e'"); + + vcpu_strings[i] = strdup(token); + TEST_ASSERT(vcpu_strings[i], "Failed to allocate string"); + token = strtok_r(NULL, ",", &saveptr); + } + free(str_copy); + + test_data.nr_cpus = count; + test_data.nr_devices = 1; + test_data.nr_event_ids = count; + nr_iterations = len; + + return 0; +} + static void pr_usage(const char *name) { - pr_info("%s [-v NR_VCPUS] [-d NR_DEVICES] [-e NR_EVENTS] [-i ITERS] -h\n", name); + pr_info("%s -D [-v NR_VCPUS] [-d NR_DEVICES] [-e NR_EVENTS] [-i ITERS] | -s STRINGS -h\n", + name); + pr_info(" -D:\tenable direct vLPI injection (default: %s)\n", + vlpi_enabled ? "true" : "false"); pr_info(" -v:\tnumber of vCPUs (default: %u)\n", test_data.nr_cpus); pr_info(" -d:\tnumber of devices (default: %u)\n", test_data.nr_devices); pr_info(" -e:\tnumber of event IDs per device (default: %u)\n", test_data.nr_event_ids); pr_info(" -i:\tnumber of iterations (default: %lu)\n", nr_iterations); + pr_info(" -s:\tvCPU control strings (comma-separated, e.g., \"dede,eede\"),\n"); + pr_info(" \twhere each string corresponds to the per-iteration vLPI status\n"); + pr_info(" \tof a single vCPU. \"ddeed\" means a vCPU will be vLPI-disabled for two\n"); + pr_info(" \titerations, enabled for two iterations, then disabled for one iteration.\n"); + pr_info(" \tNumber of strings corresponds to the number of vCPUs, and all strings must\n"); + pr_info(" \tbe of the same size. Cannot be used in conjunction with other flags.\n"); }
int main(int argc, char **argv) @@ -377,8 +540,11 @@ int main(int argc, char **argv)
TEST_REQUIRE(kvm_supports_vgic_v3());
- while ((c = getopt(argc, argv, "hv:d:e:i:")) != -1) { + while ((c = getopt(argc, argv, "hDv:d:e:i:s:")) != -1) { switch (c) { + case 'D': + vlpi_enabled = true; + break; case 'v': test_data.nr_cpus = atoi(optarg); break; @@ -391,6 +557,10 @@ int main(int argc, char **argv) case 'i': nr_iterations = strtoul(optarg, NULL, 0); break; + case 's': + string_mode = true; + parse_vcpu_strings(optarg); + break; case 'h': default: pr_usage(argv[0]); @@ -398,6 +568,9 @@ int main(int argc, char **argv) } }
+ if (string_mode && argc > 3) + TEST_FAIL("-s cannot be used with other flags"); + nr_threads = test_data.nr_cpus + test_data.nr_devices; if (nr_threads > get_nprocs()) pr_info("WARNING: running %u threads on %d CPUs; performance is degraded.\n",
Add a selftest for the KVM API ioctls that enable, disable, and query direct vLPI injection capability on a per-vCPU basis. Ensure that ITS data structures remain correct, that vPEIDs can be reused by different vCPUs, and that ioctl behavior holds in corner cases (idempotent calls, uninitialized vGIC).
Signed-off-by: Maximilian Dittgen <mdittgen@amazon.de> --- arch/arm64/kvm/arm.c | 4 + drivers/irqchip/irq-gic-v3-its.c | 6 + include/linux/irqchip/arm-gic-v4.h | 1 + include/uapi/linux/kvm.h | 1 + tools/testing/selftests/kvm/Makefile.kvm | 1 + .../selftests/kvm/arm64/per_vcpu_vlpi.c | 274 ++++++++++++++++++ 6 files changed, 287 insertions(+) create mode 100644 tools/testing/selftests/kvm/arm64/per_vcpu_vlpi.c
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index ecc3c87889db..eea0d77508a2 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -21,6 +21,7 @@ #include <linux/sched/stat.h> #include <linux/psci.h> #include <trace/events/kvm.h> +#include <linux/irqchip/arm-gic-v4.h>
#define CREATE_TRACE_POINTS #include "trace_arm.h" @@ -429,6 +430,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_ARM_PER_VCPU_VLPI: r = kvm_per_vcpu_vlpi_supported(); break; + case KVM_CAP_ARM_MAX_VPEID: + r = its_get_max_vpeid(); + break;
default: r = 0; diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c index 0e0778d61df2..078a9cafaf17 100644 --- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -4546,6 +4546,12 @@ static const struct irq_domain_ops its_sgi_domain_ops = { .deactivate = its_sgi_irq_domain_deactivate, };
+int its_get_max_vpeid(void) +{ + return ITS_MAX_VPEID; +} +EXPORT_SYMBOL_GPL(its_get_max_vpeid); + static int its_vpe_id_alloc(void) { return ida_alloc_max(&its_vpeid_ida, ITS_MAX_VPEID - 1, GFP_KERNEL); diff --git a/include/linux/irqchip/arm-gic-v4.h b/include/linux/irqchip/arm-gic-v4.h index bd3e8de35147..3a42cccb72af 100644 --- a/include/linux/irqchip/arm-gic-v4.h +++ b/include/linux/irqchip/arm-gic-v4.h @@ -147,6 +147,7 @@ int its_alloc_vcpu_irqs(struct its_vm *vm); int its_alloc_vcpu_irq(struct kvm_vcpu *vcpu); void its_free_vcpu_irqs(struct its_vm *vm); void its_free_vcpu_irq(struct kvm_vcpu *vcpu); +int its_get_max_vpeid(void); int its_make_vpe_resident(struct its_vpe *vpe, bool g0en, bool g1en); int its_make_vpe_non_resident(struct its_vpe *vpe, bool db); int its_commit_vpe(struct its_vpe *vpe); diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 057eb9e61ac8..9f0ae2096e58 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -974,6 +974,7 @@ struct kvm_enable_cap { #define KVM_CAP_GUEST_MEMFD_FLAGS 244 #define KVM_CAP_ARM_SEA_TO_USER 245 #define KVM_CAP_ARM_PER_VCPU_VLPI 246 +#define KVM_CAP_ARM_MAX_VPEID 247
struct kvm_irq_routing_irqchip { __u32 irqchip; diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm index 02a7663c097b..71a929ef7e5d 100644 --- a/tools/testing/selftests/kvm/Makefile.kvm +++ b/tools/testing/selftests/kvm/Makefile.kvm @@ -162,6 +162,7 @@ TEST_GEN_PROGS_arm64 += arm64/host_sve TEST_GEN_PROGS_arm64 += arm64/hypercalls TEST_GEN_PROGS_arm64 += arm64/external_aborts TEST_GEN_PROGS_arm64 += arm64/page_fault_test +TEST_GEN_PROGS_arm64 += arm64/per_vcpu_vlpi TEST_GEN_PROGS_arm64 += arm64/psci_test TEST_GEN_PROGS_arm64 += arm64/sea_to_user TEST_GEN_PROGS_arm64 += arm64/set_id_regs diff --git a/tools/testing/selftests/kvm/arm64/per_vcpu_vlpi.c b/tools/testing/selftests/kvm/arm64/per_vcpu_vlpi.c new file mode 100644 index 000000000000..9a5b1b40ff10 --- /dev/null +++ b/tools/testing/selftests/kvm/arm64/per_vcpu_vlpi.c @@ -0,0 +1,274 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Test per-vCPU vLPI enable/disable/query correctness + */ + +#include <linux/kvm.h> +#include <pthread.h> +#include <sys/resource.h> +#include "test_util.h" +#include "kvm_util.h" +#include "processor.h" +#include "gic.h" +#include "vgic.h" +#include "../kselftest_harness.h" + +static int MAX_VCPUS; +static int ITS_MAX_VPEID; + +/* Dynamically fetch MAX_VCPUS and ITS_MAX_VPEID values */ +__attribute__((constructor)) +static void init_test_limits(void) +{ + int kvm_fd = open("/dev/kvm", O_RDWR); + int max_vcpus, max_vpeids; + + if (kvm_fd >= 0) { + max_vcpus = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_MAX_VCPUS); + if (max_vcpus > 0) + MAX_VCPUS = max_vcpus; + + max_vpeids = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_ARM_MAX_VPEID); + if (max_vpeids > 0) + ITS_MAX_VPEID = max_vpeids; + + close(kvm_fd); + } +} + +static void guest_code(void) +{ + GUEST_SYNC(0); + GUEST_DONE(); +} + +static void setup_vm_with_gic(struct kvm_vm **vm, struct kvm_vcpu **vcpu, int nr_vcpus) +{ + struct kvm_vcpu **vcpus; + + TEST_REQUIRE(kvm_supports_vgic_v3()); + + if (nr_vcpus == 1) { + *vm = vm_create_with_one_vcpu(vcpu, guest_code); + } else { + vcpus = calloc(nr_vcpus, sizeof(*vcpus)); + TEST_ASSERT(vcpus, "Failed to allocate vcpu array"); + *vm = vm_create_with_vcpus(nr_vcpus, guest_code, vcpus); + *vcpu = vcpus[0]; + free(vcpus); + } +} + +static void cleanup_vm(struct kvm_vm *vm, int its_fd) +{ + if (its_fd >= 0) + close(its_fd); + kvm_vm_free(vm); +} + +TEST(basic_vlpi_toggle) +{ + struct kvm_vm *vm; + struct kvm_vcpu *vcpu; + int its_fd, ret; + int vcpu_id = 0; + + setup_vm_with_gic(&vm, &vcpu, 1); + its_fd = vgic_its_setup(vm); + + ret = ioctl(vm->fd, KVM_QUERY_VCPU_VLPI, &vcpu_id); + EXPECT_GE(ret, 0); + + ret = ioctl(vm->fd, KVM_ENABLE_VCPU_VLPI, &vcpu_id); + EXPECT_EQ(ret, 0); + + ret = ioctl(vm->fd, KVM_QUERY_VCPU_VLPI, &vcpu_id); + EXPECT_GT(ret, 0); + + ret = ioctl(vm->fd, KVM_DISABLE_VCPU_VLPI, &vcpu_id); + EXPECT_EQ(ret, 0); + + ret = ioctl(vm->fd, KVM_QUERY_VCPU_VLPI, &vcpu_id); + EXPECT_EQ(ret, 0); + + cleanup_vm(vm, its_fd); +} + +/* recycle test */ +struct thread_data { + struct kvm_vm *vm; + int vcpu_id; + int ret; +}; + +static void *vlpi_thread(void *arg) +{ + struct thread_data *data = arg; + + data->ret = ioctl(data->vm->fd, KVM_ENABLE_VCPU_VLPI, &data->vcpu_id); + if (data->ret == 0) + data->ret = ioctl(data->vm->fd, KVM_DISABLE_VCPU_VLPI, &data->vcpu_id); + + return NULL; +} + +TEST(vpeid_recycling) +{ + struct kvm_vm *vm; + struct kvm_vcpu *vcpu; + int its_fd; + int vcpu_id, i; + int cycles = (ITS_MAX_VPEID * 2) / MAX_VCPUS; + pthread_t threads[MAX_VCPUS]; + 
struct thread_data data[MAX_VCPUS]; + + setup_vm_with_gic(&vm, &vcpu, MAX_VCPUS); + its_fd = vgic_its_setup(vm); + + for (i = 0; i < cycles; i++) { + for (vcpu_id = 0; vcpu_id < MAX_VCPUS; vcpu_id++) { + data[vcpu_id].vm = vm; + data[vcpu_id].vcpu_id = vcpu_id; + pthread_create(&threads[vcpu_id], NULL, vlpi_thread, &data[vcpu_id]); + } + + for (vcpu_id = 0; vcpu_id < MAX_VCPUS; vcpu_id++) { + pthread_join(threads[vcpu_id], NULL); + EXPECT_EQ(data[vcpu_id].ret, 0); + } + } + + cleanup_vm(vm, its_fd); +} + +TEST(double_enable_disable) +{ + struct kvm_vm *vm; + struct kvm_vcpu *vcpu; + int its_fd, ret; + int vcpu_id = 0; + + setup_vm_with_gic(&vm, &vcpu, 1); + its_fd = vgic_its_setup(vm); + + ret = ioctl(vm->fd, KVM_ENABLE_VCPU_VLPI, &vcpu_id); + EXPECT_EQ(ret, 0); + + ret = ioctl(vm->fd, KVM_ENABLE_VCPU_VLPI, &vcpu_id); + EXPECT_EQ(ret, 0); + + ret = ioctl(vm->fd, KVM_QUERY_VCPU_VLPI, &vcpu_id); + EXPECT_GT(ret, 0); + + ret = ioctl(vm->fd, KVM_DISABLE_VCPU_VLPI, &vcpu_id); + EXPECT_EQ(ret, 0); + + ret = ioctl(vm->fd, KVM_DISABLE_VCPU_VLPI, &vcpu_id); + EXPECT_EQ(ret, 0); + + ret = ioctl(vm->fd, KVM_QUERY_VCPU_VLPI, &vcpu_id); + EXPECT_EQ(ret, 0); + + cleanup_vm(vm, its_fd); +} + +TEST(uninitialized_vcpu) +{ + struct kvm_vm *vm; + struct kvm_vcpu *vcpu; + int its_fd, ret; + int invalid_vcpu_id = 999; + + setup_vm_with_gic(&vm, &vcpu, 1); + its_fd = vgic_its_setup(vm); + + ret = ioctl(vm->fd, KVM_QUERY_VCPU_VLPI, &invalid_vcpu_id); + EXPECT_LT(ret, 0); + + ret = ioctl(vm->fd, KVM_ENABLE_VCPU_VLPI, &invalid_vcpu_id); + EXPECT_LT(ret, 0); + + ret = ioctl(vm->fd, KVM_DISABLE_VCPU_VLPI, &invalid_vcpu_id); + EXPECT_LT(ret, 0); + + cleanup_vm(vm, its_fd); +} + +TEST(vpeid_exhaustion) +{ + struct rlimit rlim; + struct kvm_vm **vms; + struct kvm_vcpu **vcpus; + int *its_fds; + /* Allocate enough VMs to exhaust vPEs, plus one */ + int num_vms = ITS_MAX_VPEID / MAX_VCPUS + 1; + int remainder_vcpus = ITS_MAX_VPEID % MAX_VCPUS; + int vm_idx, vcpu_id, ret; + int successful_enables = 0; + + /* Raise fd limit if below vPE limit, as we can't allocate enough vCPUs */ + if (getrlimit(RLIMIT_NOFILE, &rlim) == 0) { + struct rlimit new_rlim = rlim; + /* + * Require [num_vms * (vcpus_per_vm + VM_fd + ITS_fd) + KVM] file + * descriptors, tripled for safety. 
+ */ + int required_fds = (num_vms * (MAX_VCPUS + 2) + 1) * 3; + + if (rlim.rlim_cur < required_fds) { + new_rlim.rlim_cur = min_t(rlim_t, required_fds, rlim.rlim_max); + if (setrlimit(RLIMIT_NOFILE, &new_rlim) != 0) { + SKIP(return, "Need %d FDs, have %ld, cannot increase limit", + required_fds, rlim.rlim_cur); + } + } + } + + vms = calloc(num_vms, sizeof(*vms)); + vcpus = calloc(num_vms, sizeof(*vcpus)); + its_fds = calloc(num_vms, sizeof(*its_fds)); + TEST_ASSERT(vms && vcpus && its_fds, "Failed to allocate VM arrays"); + + /* Create all VMs */ + for (vm_idx = 0; vm_idx < num_vms; vm_idx++) { + setup_vm_with_gic(&vms[vm_idx], &vcpus[vm_idx], MAX_VCPUS); + its_fds[vm_idx] = vgic_its_setup(vms[vm_idx]); + } + + /* Exhaust all vPEs */ + for (vm_idx = 0; vm_idx < num_vms - 1; vm_idx++) { + for (vcpu_id = 0; vcpu_id < MAX_VCPUS; vcpu_id++) { + ret = ioctl(vms[vm_idx]->fd, KVM_ENABLE_VCPU_VLPI, &vcpu_id); + if (ret == 0) + successful_enables++; + } + } + + for (vcpu_id = 0; vcpu_id < remainder_vcpus; vcpu_id++) { + ret = ioctl(vms[num_vms - 1]->fd, KVM_ENABLE_VCPU_VLPI, &vcpu_id); + if (ret == 0) + successful_enables++; + } + + /* Should have exhausted vPEID limit */ + TEST_ASSERT(successful_enables == ITS_MAX_VPEID, + "Failed to allocate all existing vPEIDs"); + + /* Try assigning one more vPEID past exhaustion*/ + vcpu_id = remainder_vcpus; + ret = ioctl(vms[num_vms - 1]->fd, KVM_ENABLE_VCPU_VLPI, &vcpu_id); + + /* Verify failure to allocate additional vPEID */ + TEST_ASSERT(ret < 0, "Failed to detect vPEID exhaustion"); + + /* Cleanup all VMs */ + for (vm_idx = 0; vm_idx < num_vms; vm_idx++) + cleanup_vm(vms[vm_idx], its_fds[vm_idx]); + + free(vms); + free(vcpus); + free(its_fds); + setrlimit(RLIMIT_NOFILE, &rlim); /* Restore fd limit */ +} + +TEST_HARNESS_MAIN
Maximilian: you keep ignoring the reviewers that are listed in MAINTAINERS. This isn't acceptable. Next time, I will simply ignore your patches.
On Thu, 20 Nov 2025 14:02:49 +0000, Maximilian Dittgen <mdittgen@amazon.de> wrote:
> At the moment, the ability to direct-inject vLPIs is only enableable on an all-or-nothing per-VM basis, causing unnecessary I/O performance loss in cases where a VM's vCPU count exceeds available vPEs. This RFC introduces per-vCPU control over vLPI injection to realize potential I/O performance gain in such situations.
> Background
> The value of dynamically enabling the direct injection of vLPIs on a per-vCPU basis is the ability to run guest VMs with simultaneous hardware-forwarded and software-forwarded message-signaled interrupts.
> Currently, hardware-forwarded vLPI direct injection on a KVM guest requires GICv4 and is enabled on a per-VM, all-or-nothing basis. vLPI injection enablment happens in two stages:
> 1) At vGIC initialization, allocate direct injection structures for each vCPU (doorbell IRQ, vPE table entry, virtual pending table, vPEID). 2) When a PCI device is configured for passthrough, map its MSIs to vLPIs using the structures allocated in step 1. Step 1 is all-or-nothing; if any vCPU cannot be configured with the vPE structures necessary for direct injection, the vPEs of all vCPUs are torn down and direct injection is disabled VM-wide.
> This universality of direct vLPI injection enablement sparks several issues, with the most pressing being performance degradation on overcommitted hosts.
> VM-wide vLPI enablement creates resource inefficiency when guest VMs have more vCPUs than the host has available vPEIDs. The amount of vPEIDs (and consequently, vPEs) a host can allocate is constrained by hardware and defined by GICD_TYPER2.VID + 1 (ITS_MAX_VPEID). Since direct injection requires a vCPU to be assigned a vPEID, at most ITS_MAX_VPEID vCPUs can be configured for direct injection at a time. Because vLPI direct injection is all-or-nothing on a VM, if a new guest VM would exhaust remaining vPEIDs, all vCPUs on that VM would fall back to hypervisor-forwarded LPIs, causing considerable I/O performance degradation.
> Such performance degradation is exemplified on hosts with CPU overcommitment. Overcommitting an arbitrarily high number of vCPUs enables a VM's vCPU count to easily exceed the host's available vPEIDs.
Let it be crystal clear: GICv4 and overcommitment is a non-story. It isn't designed for that. If that's what you are trying to achieve, you clearly didn't get the memo.
> Even with marginally more vCPUs than vPEIDs, the current all-or-nothing vLPI paradigm disables direct injection entirely. This creates two problems: first, a single many-vCPU overcommitted VM loses all direct injection despite having vPEIDs available;
Are you saying that your HW is so undersized that you cannot create a *single VM* with direct injection? You really have fewer than 9 bits' worth of VPEIDs? I'm sorry, but that's laughable. Even a $200 dev board does better.
> second, on multi-tenant hosts, VMs booted first consume all vPEIDs, leaving later VMs without direct injection regardless of their I/O intensity. Per-vCPU control would allow userspace to allocate available vPEIDs across VMs based on I/O workload rather than boot order or per-VM vCPU count. This per-vCPU granularity recovers most of the direct injection performance benefit instead of losing it completely.
> To allow this per-vCPU granularity, this RFC introduces three new ioctls to the KVM API that enables userspace the ability to activate/deactivate direct vLPI injection capability and resources to vCPUs ad-hoc during VM runtime.
How can that even work when changing the affinity of a vLPI (directly injected) to a vcpu that doesn't have direct injection enabled? You'd have to unmap the vLPI and plug it back in as a normal LPI. Not only is this absolutely ridiculous from a performance perspective, but you are also guaranteed to lose interrupts that would have fired in the meantime. Losing interrupts is a total no-go.
Before I even look at the code, I expect you to explain how you are dealing with this.
M.