From: Timothy Pearson <tpearson@raptorengineering.com>
[ Upstream commit 8b9f128947dd72e0fcf256088a673abac9b720bf ]
PCI devices prior to PCI 2.3 both use level interrupts and do not support interrupt masking, leading to a failure when passed through to a KVM guest on at least the ppc64 platform. This failure manifests as receiving and acknowledging a single interrupt in the guest, while the device continues to assert the level interrupt indicating a need for further servicing.
When lazy IRQ masking is used on DisINTx- (non-PCI 2.3) hardware, the following sequence occurs:
* Level IRQ assertion on device
* IRQ marked disabled in kernel
* Host interrupt handler exits without clearing the interrupt on the device
* Eventfd is delivered to userspace
* Guest processes IRQ and clears device interrupt
* Device de-asserts INTx, then re-asserts INTx while the interrupt is masked
* Newly asserted interrupt acknowledged by kernel VMM without being handled
* Software mask removed by VFIO driver
* Device INTx still asserted, host controller does not see new edge after EOI
The behavior is now platform-dependent. Some platforms (amd64) will continue to spew IRQs for as long as the INTX line remains asserted, therefore the IRQ will be handled by the host as soon as the mask is dropped. Others (ppc64) will only send the one request, and if it is not handled no further interrupts will be sent. The former behavior theoretically leaves the system vulnerable to interrupt storm, and the latter will result in the device stalling after receiving exactly one interrupt in the guest.
Work around this by disabling lazy IRQ masking for DisINTx- INTx devices.
Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com>
Link: https://lore.kernel.org/r/333803015.1744464.1758647073336.JavaMail.zimbra@ra...
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
YES – the change makes VFIO’s INTx masking work correctly on legacy devices that lack PCI 2.3 masking support, eliminating a real guest-visible interrupt loss.
- The handler already relies on `disable_irq_nosync()` when `pci_2_3` is false (`drivers/vfio/pci/vfio_pci_intrs.c:232-235`), but without this patch the disable stays “lazy,” so a level-triggered device that reasserts while masked never generates another host interrupt on platforms such as ppc64. The new call to `irq_set_status_flags(..., IRQ_DISABLE_UNLAZY)` for those devices (`drivers/vfio/pci/vfio_pci_intrs.c:307-309`) forces the core to perform an immediate hardware disable, exactly as recommended in the IRQ core (`kernel/irq/chip.c:380-408`), preventing the lost-interrupt stall described in the commit message.
- Cleanup paths clear the flag both on request failure and normal teardown (`drivers/vfio/pci/vfio_pci_intrs.c:312-314` and `drivers/vfio/pci/vfio_pci_intrs.c:360-361`), so the change is tightly contained and doesn’t leak settings after the device is released.
- The fix is small, self-contained, and only touches the legacy INTx path, leaving MSI/MSI-X and modern PCI 2.3 devices untouched. It uses long-standing IRQ APIs with no new dependencies.
Given the user-visible failure (guest stops receiving interrupts or risks storms) and the minimal, well-scoped fix, this is a good candidate for stable backporting. Suggested next step: backport to supported stable branches that ship the current VFIO INTx logic.
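The lost-interrupt sequence from the commit message can be reproduced in a small userspace toy model. This is only a sketch under stated assumptions, not kernel code: every `toy_*` name is hypothetical, the controller is assumed to latch exactly one request per rising edge of INTx (the ppc64-like behavior described above), and a request acknowledged while the IRQ is software-disabled is assumed not to be replayed. The `unlazy` field stands in for `IRQ_DISABLE_UNLAZY`.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_irq {
	bool line_asserted;   /* level of the device's INTx line */
	bool request_latched; /* controller holds one undelivered request */
	bool hw_masked;       /* mask at the interrupt controller */
	bool sw_disabled;     /* kernel bookkeeping: IRQ marked disabled */
	bool unlazy;          /* stands in for IRQ_DISABLE_UNLAZY */
	int handled;          /* times the host handler ran */
	int dropped;          /* requests acknowledged without the handler */
};

/* Disable as the handler does for non-PCI-2.3 devices. */
static void toy_disable(struct toy_irq *irq)
{
	irq->sw_disabled = true;
	if (irq->unlazy)
		irq->hw_masked = true; /* unlazy: mask the hardware right away */
}

/* Present a latched request to the CPU unless the controller mask holds it. */
static void toy_deliver(struct toy_irq *irq)
{
	if (!irq->request_latched || irq->hw_masked)
		return;
	irq->request_latched = false; /* request is consumed by the acknowledge */
	if (irq->sw_disabled) {
		/* Lazy path: mask only now; assumed not to be replayed later. */
		irq->hw_masked = true;
		irq->dropped++;
		return;
	}
	irq->handled++;   /* handler signals the eventfd ... */
	toy_disable(irq); /* ... and disables the IRQ until userspace unmasks */
}

/* Assumption: the controller latches one request per rising edge. */
static void toy_set_line(struct toy_irq *irq, bool level)
{
	if (level && !irq->line_asserted)
		irq->request_latched = true;
	irq->line_asserted = level;
	toy_deliver(irq);
}

/* Userspace unmask: drop both masks and deliver anything still held. */
static void toy_enable(struct toy_irq *irq)
{
	irq->sw_disabled = false;
	irq->hw_masked = false;
	toy_deliver(irq);
}

/* Replay the sequence from the commit message; return the final state. */
struct toy_irq toy_run(bool unlazy)
{
	struct toy_irq irq = { .unlazy = unlazy };

	toy_set_line(&irq, true);  /* device asserts INTx */
	toy_set_line(&irq, false); /* guest services the device */
	toy_set_line(&irq, true);  /* device re-asserts while masked */
	toy_enable(&irq);          /* VFIO removes the mask */
	assert(irq.line_asserted); /* device still wants service either way */
	return irq;
}
```

In this model, `toy_run(false)` ends with `handled == 1` and `dropped == 1`: the second request is consumed while the IRQ is only software-disabled, so nothing is left to deliver at unmask and the device stalls. `toy_run(true)` ends with `handled == 2` and `dropped == 0`, because the hardware mask holds the second request until the unmask.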
 drivers/vfio/pci/vfio_pci_intrs.c | 7 +++++++
 1 file changed, 7 insertions(+)
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 123298a4dc8f5..61d29f6b3730c 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -304,9 +304,14 @@ static int vfio_intx_enable(struct vfio_pci_core_device *vdev,
 
 	vdev->irq_type = VFIO_PCI_INTX_IRQ_INDEX;
 
+	if (!vdev->pci_2_3)
+		irq_set_status_flags(pdev->irq, IRQ_DISABLE_UNLAZY);
+
 	ret = request_irq(pdev->irq, vfio_intx_handler,
 			  irqflags, ctx->name, ctx);
 	if (ret) {
+		if (!vdev->pci_2_3)
+			irq_clear_status_flags(pdev->irq, IRQ_DISABLE_UNLAZY);
 		vdev->irq_type = VFIO_PCI_NUM_IRQS;
 		kfree(name);
 		vfio_irq_ctx_free(vdev, ctx, 0);
@@ -352,6 +357,8 @@ static void vfio_intx_disable(struct vfio_pci_core_device *vdev)
 		vfio_virqfd_disable(&ctx->unmask);
 		vfio_virqfd_disable(&ctx->mask);
 		free_irq(pdev->irq, ctx);
+		if (!vdev->pci_2_3)
+			irq_clear_status_flags(pdev->irq, IRQ_DISABLE_UNLAZY);
 		if (ctx->trigger)
 			eventfd_ctx_put(ctx->trigger);
 		kfree(ctx->name);