Re: [PATCH] PCI: dwc: Fix interrupt race in when handling MSI

13 Nov 2018


      On Tue, 13 Nov 2018 00:41:24 +0000,
Gustavo Pimentel gustavo.pimentel@synopsys.com wrote:
...
On 09/11/2018 11:34, Marc Zyngier wrote:
...
On 08/11/18 19:49, Trent Piepho wrote:
...
On Thu, 2018-11-08 at 09:49 +0000, Marc Zyngier wrote:
...
On 07/11/18 20:17, Trent Piepho wrote:
...
On Wed, 2018-11-07 at 18:41 +0000, Marc Zyngier wrote:
...
On 06/11/18 19:40, Trent Piepho wrote:
>
> What about stable kernels that don't have the hierarchical API?
My goal is to fix mainline first. Once we have something that works on
mainline, we can look at propagating the fix to other versions. But
mainline always comes first.
This is a regression that went into 4.14.  Wouldn't the appropriate
action for the stable series be to undo the regression?
This is not how stable works. Stable kernels *only* contain patches that
are backported from mainline, and do not take standalone patch.
Furthermore, your fix is to actually undo someone else's fix. Who is
right? In the absence of any documentation, the answer is "nobody".
Little more history to this bug.  The code was originally the way it is
now, but this same bug was fixed in 2013 in https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.kernel.or&...
g/patch/3333681/
Then that lasted four years until it was changed Aug 2017 in https://urldefense.proofpoint.com/v2/url?u=https-3A__pa&d=DwICaQ&c=D...
tchwork.kernel.org/patch/9893303/
That lasted just six months until someone tried to revert it, https://urldefense.proofpoint.com/v2/url?u=https-3A__p&d=DwICaQ&c=DP...
atchwork.kernel.org/patch/9893303/
Seems pretty clear the way it is now is much worse than the way it was
before, even if the previous design may have had another flaw.  Though
I've yet to see anyone point out something makes the previous design
broken.  Sub-optimal yes, but not broken.
This is not what was reported by the previous submitter. I guess they
changed this for a reason, no? I'm prepared to admit this is a end-point
driver bug, but we need to understand what is happening and stop
changing this driver randomly.
...
...
Anything can be backported to stable once we understand the issue. At
the moment, we're just playing games moving stuff around and hope
nothing else will break. That's not a sustainable way of maintaining
this driver. At the moment, the only patch I'm inclined to propose until
we get an actual interrupt handling flow from Synopsys is to mark this
driver as "BROKEN".
It feels like you're using this bug to hold designware hostage in a
broken kernel, and me along with them.  I don't have the documentation,
no one does, there's no way for me to give you want you want.  But I've
got hardware that doesn't work in the mainline kernel.
I take it as a personal offence that I'd be holding anything or anyone
hostage. I think I have a long enough track record working with the
Linux kernel not to take any of this nonsense. What's my interest in
keeping anything in this sorry state? Think about it for a minute.
When I'm asking for documentation, I'm not asking you. I perfectly
understood that you don't have any. We need Synopsys to step up and give
us a simple description of what the MSI workflow is. Because no matter
how you randomly mutate this code, it will still be broken until we get
a clear answer from *Synopsys*.
Gustavo, here's one simple ask. Please reply to this email with a step
by step, register by register description of how an MSI must be handled
on this HW. We do need it, and we need it now.
Hi Marc, I thought that I did respond to your question on Nov 8. However I will
repeat it and add some extra information to it now.
About the registers:
PCIE_MSI_INTR0_MASK
When an MSI is received for a masked interrupt, the corresponding status bit
gets set in the interrupt status register but the controller will not signal it.
As soon as the masked interrupt is unmasked and assuming the status bit is still
set the controller will signal it.
So this is *really* a mask.
...
PCIE_MSI_INTR0_ENABLE
Enables/disables every interrupt. When an MSI is received from a disabled
interrupt, no status bit gets set in MSI controller interrupt status register.
And that's not a mask at all. This allows the interrupt to be simply
lost. It is just a bit worrying that this is exactly what the DWC
driver is using, and faces all kind of interrupt loss when
enabling/disabling interrupts (which do have a mask/unmask semantic --
yes, the language is confusing).
...
About this new callbacks:
The dw_pci_bottom_mask() and dw_pci_bottom_unmask() callbacks were added on
Linux kernel v4.17 on PCI: dwc: Move MSI IRQs allocation to IRQ domains
hierarchical API patch [1] and based on what I've seen so far, before this patch
there were no interrupt masking been performed.
Based on the information provided, its very likely that I should use the
PCIE_MSI_INTR0_MASK register instead of PCIE_MSI_INTR0_ENABLE on the
dw_pci_bottom_mask() and dw_pci_bottom_unmask() callbacks. As soon as I return
from Linux plumbers conference, I will test the system using the
PCIE_MSI_INTR0_MASK register.
I'm at Plumbers as well, so feel free to grab me and we'll sort this
out. We've wasted enough time on this broken driver.
Thanks,
M.
-- 
Jazz is not dead, it just smell funny.

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH] PCI: dwc: Fix interrupt race in when handling MSI