Re: [PATCH] PCI: dwc: Fix interrupt race in when handling MSI

8 Nov 2018

      On 07/11/2018 18:46, Marc Zyngier wrote:
...
On 07/11/18 12:57, Gustavo Pimentel wrote:
...
On 06/11/2018 16:00, Marc Zyngier wrote:
...
On 06/11/18 14:53, Lorenzo Pieralisi wrote:
...
[CC Marc]
On Sat, Oct 27, 2018 at 12:00:57AM +0000, Trent Piepho wrote:
...
This reverts commit 8c934095fa2f ("PCI: dwc: Clear MSI interrupt status
after it is handled, not before").
This is a very real race that we observed quickly after switching from
4.13 to 4.16.  Using a custom PCI-e endpoint and driver, I was able to
track it to the precise race and verify the fixed behavior, as will be
described below.
This bug was originally fixed in 2013, in commit ca1658921b63 ("PCI:
designware: Fix missing MSI IRQs").  The discussion of that commit,
archived in patchwork [1], is informative and worth reading.
The bug was re-added in the '8c934 commit this is reverting, which
appeared in the 4.14 kernel.
Unfortunately, Synopsys appears to consider the operation of this PCI-e
controller secret.  They provide no publicly available docs for it nor
allow the reference manuals of SoCs using the controller to publish any
documentation of it.
So I cannot say for certain that this code is correctly using the controller's
features.  However, extensive testing gives me high confidence in the
accuracy of what is described below.
If an MSI is received by the PCI-e controller while the status register
bit for that MSI is set, the PCI-e controller will NOT generate another
interrupt.  In addition, it will NOT queue or otherwise mark the
interrupt as "pending", which means it will NOT generate an interrupt
when the status bit is unmasked.
This gives the following race scenario:

1.  An MSI is received by, and the status bit for the MSI is set in, the
DWC PCI-e controller.
2.  dw_handle_msi_irq() calls a driver's registered interrupt handler
for the MSI received.
3.  At some point, the interrupt handler must decide, correctly, that
there is no more work to do and return.
4.  The hardware generates a new MSI.  As the MSI's status bit is still
set, this new MSI is ignored.
6.  dw_handle_msi_irq() unsets the MSI status bit.
The MSI received at point 4 will never be acted upon.  It occurred after
the driver had finished checking the hardware status for interrupt
conditions to act on.  Since the MSI status was masked, it does not
generate a new IRQ, neither when it was received nor when the MSI is
unmasked.
It seems clear there is an unsolvable race here.
After this patch, the sequence becomes as follows:

1.  An MSI is received and the status bit for the MSI is set in the
DWC PCI-e controller.
2.  dw_handle_msi_irq() clears this MSI status bit.
3.  dw_handle_msi_irq() calls a driver's registered interrupt handler
for the MSI received.
3.  At some point, the interrupt handler must decide, correctly, that
there is no more work to do and return.
4.  The hardware generates a new MSI.  This sets the MSI status bit and
triggers another interrupt in the interrupt controller(s) above the DWC
PCI-e controller.  As the dwc-pcie handler is not re-entrant, it is
not run again at this time.
6.  dw_handle_msi_irq() finishes.  The MSI status bit remains set.
7.  The IRQ is re-raised and dw_handle_msi_irq() runs again.
8.  dw_handle_msi_irq() invokes the MSI's registered interrupt handler
again as the status bit was still set.
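
The two orderings above can be compared with a small user-space model.
This is only a sketch under the hardware behaviour described in this
patch (an MSI that arrives while its status bit is set raises no new
interrupt and is not latched).  The names struct msi_ctrl, msi_arrive(),
run_handler() and simulate() are hypothetical and are not driver code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one MSI vector in the controller. */
struct msi_ctrl {
	bool status;      /* MSI status bit */
	bool irq_raised;  /* interrupt signalled to the parent controller */
	int  pending;     /* endpoint events the driver has not handled yet */
	int  handled;     /* endpoint events the driver has handled */
};

/* Endpoint sends an MSI.  Assumed: ignored if the status bit is already set. */
static void msi_arrive(struct msi_ctrl *c)
{
	assert(c);
	c->pending++;
	if (c->status)
		return;           /* not latched, no new interrupt */
	c->status = true;
	c->irq_raised = true;
}

/*
 * One pass of the chained handler.  late_msi injects a new MSI after the
 * driver's handler has decided there is no more work (step 4 in the lists).
 */
static void run_handler(struct msi_ctrl *c, bool clear_before, bool late_msi)
{
	c->irq_raised = false;    /* parent interrupt is being serviced */
	if (!c->status)
		return;
	if (clear_before)
		c->status = false;    /* ordering after this patch */

	c->handled += c->pending; /* driver handler drains all visible work */
	c->pending = 0;

	if (late_msi)
		msi_arrive(c);        /* step 4 */

	if (!clear_before)
		c->status = false;    /* ordering being reverted */
}

/* Returns the number of events left unhandled once no interrupt is raised. */
static int simulate(bool clear_before)
{
	struct msi_ctrl c = {0};

	msi_arrive(&c);                       /* step 1 */
	run_handler(&c, clear_before, true);  /* first pass, with a late MSI */
	while (c.irq_raised)                  /* steps 7-8, if re-raised */
		run_handler(&c, clear_before, false);
	return c.pending;
}
```

In this model simulate(false) leaves one event unhandled, while
simulate(true) drains everything because the late MSI sets the status
bit again and re-raises the interrupt.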
Not sure why (5) is not used in your lists, I assume because you want
to highlight the race condition with the jump from 4 to 6 (or maybe
you do not like number 5 :), just curious).
...
The observant will notice that point 4 presents the opportunity for the
SoC's interrupt controller to lose the interrupt in the same manner as
the bug in this driver.  The driver for that interrupt controller will
be written to properly deal with this.  In some cases the hardware
supports an EOI concept, where the 2nd IRQ is masked and internally
queued in the hardware, to be re-raised at EOI in step 7.  In other
cases the IRQ will be unmasked and re-raised at step 4, but the kernel
will see the handler is INPROGRESS and not re-invoke it, and instead set
a PENDING flag, which causes the handler to re-run at step 7.
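
The INPROGRESS/PENDING case can be sketched as follows.  This is a
simplified user-space model loosely following the behaviour described
above, not the kernel's actual flow handler; struct irq_model,
irq_arrive() and nested_once_handler() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the per-IRQ state kept by the core. */
struct irq_model {
	bool inprogress;                       /* handler currently running */
	bool pending;                          /* IRQ arrived while inprogress */
	int  runs;                             /* times the handler has run */
	void (*handler)(struct irq_model *);
};

/* Called each time the interrupt controller signals the IRQ. */
static void irq_arrive(struct irq_model *d)
{
	assert(d && d->handler);
	if (d->inprogress) {
		d->pending = true;             /* remember it, do not re-enter */
		return;
	}
	d->inprogress = true;
	do {
		d->pending = false;
		d->runs++;
		d->handler(d);
	} while (d->pending);                  /* replay what arrived meanwhile */
	d->inprogress = false;
}

/* Example handler: a second IRQ fires while the first run is executing. */
static void nested_once_handler(struct irq_model *d)
{
	if (d->runs == 1)
		irq_arrive(d);
}

/* Deliver one IRQ and report how many times the handler ended up running. */
static int demo_runs(void)
{
	struct irq_model d = { .handler = nested_once_handler };

	irq_arrive(&d);
	return d.runs;
}
```

Here the nested arrival only sets the pending flag, and the handler is
run a second time once the first run has returned.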
[1] https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.kernel.org_pa...
Fixes: 8c934095fa2f ("PCI: dwc: Clear MSI interrupt status after it is handled, not before")
I have two questions:

- Commit 8c934095fa2f has been in the kernel for a year and no
  regression was reported. It was assumed to fix a problem so before
  reverting it I want to make sure we are not breaking anything else.
- Your reasoning seems correct but I would pick Marc's brain on this
  because I want to understand if what this patch does is what IRQ core
  expects it to do, especially in relation to the IRQ chaining you
  are mentioning.

It is hard to decide what the right solution is without understanding
exactly what this particular write actually does. It seems to be some
form of acknowledgement, but I'm only making an educated guess, and some
of the defines suggest that there might be another register for that.
This status register indicates whether or not there is an MSI interrupt on that
controller [0..7] to be handled.
OK. What happens when the interrupt is masked? Does this interrupt appear
in the status register? And what is the effect of writing to that register?
When an MSI is received for a masked interrupt, the corresponding status bit
gets set in the interrupt status register but the controller will not signal it.
As soon as the masked interrupt is unmasked, and assuming the status bit is still
set, the controller will signal it. Have I explained this clearly?
...
...
In theory, we should clear the interrupt flag only after the interrupt has
actually been handled (which can take some time to process in the worst-case scenario).
At this stage, we do not care about performance, but correctness. If we
lose interrupts, then the driver is not correct, and we need to address
this first.
Ok, Marc. Let's see if I can summarize it. You are suggesting to change the
register from PCIE_MSI_INTR0_ENABLE to PCIE_MSI_INTR0_MASK on
dw_pci_bottom_mask() and dw_pci_bottom_unmask() and clearing the interrupt
status bit inside of dw_pci_bottom_ack() instead of dw_handle_msi_irq(), right?
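
A rough user-space sketch of what that summary might look like is given
below.  It uses a fake register block instead of real config accesses,
and the function names bottom_mask(), bottom_unmask(), bottom_ack() and
signalled() are hypothetical analogues, not the driver's code.  The
write-1-to-clear status behaviour and the way ENABLE gates signalling
are assumptions; how the hardware really behaves is what this thread is
still trying to establish.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one MSI controller's registers. */
struct fake_msi_regs {
	uint32_t intr0_enable;  /* models PCIE_MSI_INTR0_ENABLE */
	uint32_t intr0_mask;    /* models PCIE_MSI_INTR0_MASK */
	uint32_t intr0_status;  /* models PCIE_MSI_INTR0_STATUS */
};

/* Mask one vector: set its bit in MASK and leave ENABLE alone. */
static void bottom_mask(struct fake_msi_regs *r, unsigned int bit)
{
	assert(bit < 32);
	r->intr0_mask |= UINT32_C(1) << bit;
}

/* Unmask one vector: clear its bit in MASK. */
static void bottom_unmask(struct fake_msi_regs *r, unsigned int bit)
{
	assert(bit < 32);
	r->intr0_mask &= ~(UINT32_C(1) << bit);
}

/* Ack one vector: clear its status bit (assumed write-1-to-clear in HW). */
static void bottom_ack(struct fake_msi_regs *r, unsigned int bit)
{
	assert(bit < 32);
	r->intr0_status &= ~(UINT32_C(1) << bit);
}

/* Vectors the controller would signal, assuming ENABLE gates and MASK hides. */
static uint32_t signalled(const struct fake_msi_regs *r)
{
	return r->intr0_status & r->intr0_enable & ~r->intr0_mask;
}
```

With this model, an MSI that sets a status bit while the vector is
masked is not signalled until bottom_unmask() is called, and it stops
being signalled after bottom_ack().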
...
...
However, Trent's patch allows acknowledging the flag and handling the
interrupt later, giving the opportunity to catch a possible new interrupt, which
will be handled by a new call of this function.
...
What I'm interested in is the relationship this has with the mask/unmask
callbacks, and whether masking the interrupt before acking it would help.
Although there is the possibility of masking/unmasking the interrupts on the
controller, from what I've seen in other dw drivers this is typically not done.
Probably we don't get much benefit from using it.
I don't care about the drivers. I need to know what the HW does,
and how this maps to the Linux interrupt subsystem. At the moment, this
is just too blurry to be understandable.
Ok, let's clarify this then.
...
Thanks,
M.
