[PATCH 4.19 058/245] powerpc/eeh: Only dump stack once if an MMIO loop is detected

29 Sep 2020

From: Oliver O'Halloran <oohall@gmail.com>

[ Upstream commit 4e0942c0302b5ad76b228b1a7b8c09f658a1d58a ]

Many drivers don't check for errors when they get a 0xFFs response from an
MMIO load. As a result, after an EEH event occurs a driver can get stuck in
a polling loop unless it has some kind of internal timeout logic.

Currently EEH tries to detect and report stuck drivers by dumping a stack
trace after eeh_dev_check_failure() is called EEH_MAX_FAILS times on an
already frozen PE. The value of EEH_MAX_FAILS was chosen so that a dump
would occur every few seconds if the driver was spinning in a loop. This
results in a lot of spurious stack traces in the kernel log.

Fix this by limiting it to printing one stack trace for each PE freeze. If
the driver is truly stuck, the kernel's hung task detector is better suited
to reporting the problem anyway.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Tested-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191016012536.22588-1-oohall@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/kernel/eeh.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index fe3c6f3bd3b62..d123cba0992d0 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -502,7 +502,7 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
 	rc = 1;
 	if (pe->state & EEH_PE_ISOLATED) {
 		pe->check_count++;
-		if (pe->check_count % EEH_MAX_FAILS == 0) {
+		if (pe->check_count == EEH_MAX_FAILS) {
 			dn = pci_device_to_OF_node(dev);
 			if (dn)
 				location = of_get_property(dn, "ibm,loc-code",
-- 
2.25.1




    
