On Fri, Jul 16, 2021 at 02:25:04PM +0200, Pali Rohár wrote:
commit f18139966d072dab8e4398c95ce955a9742e04f7 upstream.
Trying to start a new PIO transfer by writing value 0 in PIO_START register when previous transfer has not yet completed (which is indicated by value 1 in PIO_START) causes an External Abort on CPU, which results in kernel panic:
SError Interrupt on CPU0, code 0xbf000002 -- SError Kernel panic - not syncing: Asynchronous SError Interrupt
To prevent kernel panic, it is required to reject a new PIO transfer when previous one has not finished yet.
If previous PIO transfer is not finished yet, the kernel may issue a new PIO request only if the previous PIO transfer timed out.
In the past the root cause of this issue was incorrectly identified (as it often happens during link retraining or after link down event) and special hack was implemented in Trusted Firmware to catch all SError events in EL3, to ignore errors with code 0xbf000002 and not forwarding any other errors to kernel and instead throw panic from EL3 Trusted Firmware handler.
Links to discussion and patches about this issue: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/commit/?id=3c7dc... https://lore.kernel.org/linux-pci/20190316161243.29517-1-repk@triplefau.lt/ https://lore.kernel.org/linux-pci/971be151d24312cc533989a64bd454b4@www.loen.... https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/1541
But the real cause was the fact that during link retraining or after link down event the PIO transfer may take longer time, up to the 1.44s until it times out. This increased probability that a new PIO transfer would be issued by kernel while previous one has not finished yet.
After applying this change into the kernel, it is possible to revert the mentioned TF-A hack and SError events do not have to be caught in TF-A EL3.
Link: https://lore.kernel.org/r/20210608203655.31228-1-pali@kernel.org Signed-off-by: Pali Rohár pali@kernel.org Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Signed-off-by: Bjorn Helgaas bhelgaas@google.com Reviewed-by: Marek Behún kabel@kernel.org Cc: stable@vger.kernel.org # 7fbcb5da811b ("PCI: aardvark: Don't rely on jiffies while holding spinlock") [pali: Backported to 4.19 version]
This patch is backported to 4.19 version. It depends on commit 7fbcb5da811b as presented on Cc: stable line.
drivers/pci/controller/pci-aardvark.c | 49 ++++++++++++++++++++++----- 1 file changed, 40 insertions(+), 9 deletions(-)
Now queued up, thanks.
greg k-h