From: Niklas Schnelle <schnelle@linux.ibm.com>
[ Upstream commit 704e5dd1c02371dfc7d22e1520102b197a3b628b ]
Ever since uevent support was added for AER and EEH with commit 856e1eb9bdd4 ("PCI/AER: Add uevents in AER and EEH error/resume"), it reported PCI_ERS_RESULT_NONE as uevent when recovery begins.
Commit 7b42d97e99d3 ("PCI/ERR: Always report current recovery status for udev") subsequently amended AER to report the actual return value of error_detected().
Make the same change to EEH to align it with AER and s390.
Suggested-by: Lukas Wunner <lukas@wunner.de>
Link: https://lore.kernel.org/linux-pci/aIp6LiKJor9KLVpv@wunner.de/
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Acked-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Link: https://patch.msgid.link/20250807-add_err_uevents-v5-3-adf85b0620b0@linux.ib...
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
LLM Generated explanations, may be completely bogus:
YES

Rationale
- Fixes incorrect uevent status at start of EEH recovery: the code
  currently emits a uevent with `PCI_ERS_RESULT_NONE` regardless of what
  the driver reported via `error_detected()`. This misrepresents the
  actual recovery status to user space.
- The fix makes EEH behave like AER (already fixed by commit
  7b42d97e99d3) and s390, improving cross-arch consistency and user
  space expectations.

Evidence in code
- Current EEH behavior: emits BEGIN_RECOVERY unconditionally at error
  detection
  - `pci_uevent_ers(pdev, PCI_ERS_RESULT_NONE);` is called after
    `error_detected()` even if the driver “votes” differently (e.g.,
    DISCONNECT/NEED_RESET): arch/powerpc/kernel/eeh_driver.c:337
- Proposed change: pass actual driver result
  - Changes the above call to `pci_uevent_ers(pdev, rc);`, where `rc` is
    the result of `driver->err_handler->error_detected()` captured just
    above: arch/powerpc/kernel/eeh_driver.c:337
- uevent mapping semantics (what user space sees) are centralized in
  `pci_uevent_ers()`:
  - NONE/CAN_RECOVER -> `ERROR_EVENT=BEGIN_RECOVERY`, `DEVICE_ONLINE=0`
  - RECOVERED -> `ERROR_EVENT=SUCCESSFUL_RECOVERY`, `DEVICE_ONLINE=1`
  - DISCONNECT -> `ERROR_EVENT=FAILED_RECOVERY`, `DEVICE_ONLINE=0`
  - Others (e.g., NEED_RESET) -> no immediate uevent (consistent with
    AER)
  - drivers/pci/pci-driver.c:1595
- AER already reports actual `error_detected()` return value to udev:
  - `pci_uevent_ers(dev, vote);` after computing `vote` in
    `report_error_detected()`: drivers/pci/pcie/err.c:83
- EEH already emits final-stage uevents correctly (unchanged by this
  patch):
  - Success at resume: `pci_uevent_ers(edev->pdev,
    PCI_ERS_RESULT_RECOVERED);` arch/powerpc/kernel/eeh_driver.c:432
  - Failure path: `pci_uevent_ers(pdev, PCI_ERS_RESULT_DISCONNECT);`
    arch/powerpc/kernel/eeh_driver.c:462
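
The mapping listed above can be modelled in user space as a small lookup. This is only a sketch under stated assumptions: `enum ers_result`, `struct ers_uevent`, and `ers_to_uevent()` are hypothetical names made up for illustration, and the enumerator values are arbitrary rather than the kernel's. The real logic lives in `pci_uevent_ers()`.

```c
#include <stddef.h>

/* Hypothetical user-space stand-in for the kernel's enum pci_ers_result.
 * The numeric values are arbitrary and are NOT the kernel's. */
enum ers_result {
	ERS_NONE,
	ERS_CAN_RECOVER,
	ERS_NEED_RESET,
	ERS_DISCONNECT,
	ERS_RECOVERED,
};

/* What a listener would see; event == NULL means no uevent is sent. */
struct ers_uevent {
	const char *event;	/* value of ERROR_EVENT, or NULL */
	int device_online;	/* value of DEVICE_ONLINE, -1 if no uevent */
};

/* Model of the result-to-uevent mapping described in the list above. */
struct ers_uevent ers_to_uevent(enum ers_result r)
{
	struct ers_uevent ev = { NULL, -1 };

	switch (r) {
	case ERS_NONE:
	case ERS_CAN_RECOVER:
		ev.event = "BEGIN_RECOVERY";
		ev.device_online = 0;
		break;
	case ERS_RECOVERED:
		ev.event = "SUCCESSFUL_RECOVERY";
		ev.device_online = 1;
		break;
	case ERS_DISCONNECT:
		ev.event = "FAILED_RECOVERY";
		ev.device_online = 0;
		break;
	default:
		/* e.g. NEED_RESET: no uevent at this step */
		break;
	}
	return ev;
}
```

In this model the old EEH code corresponds to always calling `ers_to_uevent(ERS_NONE)`, so a driver voting DISCONNECT still produced BEGIN_RECOVERY. Passing the driver's actual result yields FAILED_RECOVERY instead.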

Why this is a bugfix suitable for stable
- User-visible correctness: With the current code, user space always
  sees “BEGIN_RECOVERY” even when drivers have already indicated an
  unrecoverable state (e.g., DISCONNECT). The patch ensures uevents
  reflect the true state immediately, matching AER behavior introduced
  by 7b42d97e99d3.
- Minimal, contained change: One-line change in a single
  architecture-specific file (PowerPC EEH). No API/ABI changes; only
  corrects the parameter passed to an existing helper.
- No architectural change: Keeps existing EEH flow; only adjusts the
  uevent status emitted at a single step.
- Low regression risk: AER has used this semantic for years;
  `pci_uevent_ers()` already handles `rc` values. EEH already emits
  RECOVERED/DISCONNECT at later stages; this makes the initial event
  consistent.
- Aligns cross-arch semantics: Consistent uevent reporting across AER,
  EEH, and s390 reduces user space special-casing and potential errors.

Potential side effects and why acceptable
- For drivers returning `PCI_ERS_RESULT_DISCONNECT` at
  `error_detected()`, user space will now see `FAILED_RECOVERY`
  immediately instead of a misleading `BEGIN_RECOVERY`. This is a
  correctness fix.
- For returns like `PCI_ERS_RESULT_NEED_RESET`, no initial uevent is
  emitted (consistent with AER); user space will still receive final
  RECOVERED/FAILED, as today. Any scripts that strictly expected an
  initial BEGIN_RECOVERY for all cases are already inconsistent with AER
  and should not rely on that behavior.
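
A user-space listener that copes with both side effects might track device state as sketched here. This is an assumption-laden illustration, not code from any existing tool: `enum dev_state` and `next_state()` are hypothetical names, and the only inputs taken from the text above are the three `ERROR_EVENT` values.

```c
#include <string.h>

/* Hypothetical per-device state kept by a user-space listener. */
enum dev_state { DEV_OK, DEV_RECOVERING, DEV_FAILED };

/* Advance the state from one ERROR_EVENT value. No transition requires
 * a preceding BEGIN_RECOVERY, so a first event of FAILED_RECOVERY or
 * SUCCESSFUL_RECOVERY is handled like any other. Unknown or NULL values
 * leave the state unchanged. */
enum dev_state next_state(enum dev_state cur, const char *error_event)
{
	if (error_event == NULL)
		return cur;
	if (strcmp(error_event, "BEGIN_RECOVERY") == 0)
		return DEV_RECOVERING;
	if (strcmp(error_event, "SUCCESSFUL_RECOVERY") == 0)
		return DEV_OK;
	if (strcmp(error_event, "FAILED_RECOVERY") == 0)
		return DEV_FAILED;
	return cur;
}
```

Because each event is interpreted on its own, the listener behaves the same whether the first event it receives is BEGIN_RECOVERY, FAILED_RECOVERY, or only a final result after a NEED_RESET vote.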

Historical context
- Uevent support was added by 856e1eb9bdd4 (“PCI/AER: Add uevents in AER
  and EEH error/resume”), initially emitting `NONE` at error detection
  for both AER and EEH.
- AER was corrected by 7b42d97e99d3 (“PCI/ERR: Always report current
  recovery status for udev”) to emit the actual `error_detected()`
  result.
- This patch brings EEH to parity with that established AER behavior.

Conclusion
- This is a targeted, low-risk correctness fix that improves user space
  observability and cross-arch consistency without changing kernel-side
  recovery logic. It fits stable backport rules (important bugfix,
  minimal change, low regression risk, confined to a subsystem).

 arch/powerpc/kernel/eeh_driver.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 48ad0116f3590..ef78ff77cf8f2 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -334,7 +334,7 @@ static enum pci_ers_result eeh_report_error(struct eeh_dev *edev,
 	rc = driver->err_handler->error_detected(pdev, pci_channel_io_frozen);
 
 	edev->in_error = true;
-	pci_uevent_ers(pdev, PCI_ERS_RESULT_NONE);
+	pci_uevent_ers(pdev, rc);
 	return rc;
 }