From: Robert Richter <rrichter@amd.com>
[ Upstream commit d90acdf49e18029cfe4194475c45ef143657737a ]
Broken target lists are hard to discover as the driver fails at a later initialization stage. Add an error message for this.
Example log messages:
```
cxl_mem mem1: failed to find endpoint6:0000:e0:01.3 in target list of decoder1.1
cxl_port endpoint6: failed to register decoder6.0: -6
cxl_port endpoint6: probe: 0
```
Signed-off-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: "Fabio M. De Francesco" <fabio.m.de.francesco@linux.intel.com>
Tested-by: Gregory Price <gourry@gourry.net>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Link: https://patch.msgid.link/20250509150700.2817697-14-rrichter@amd.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>

---
**YES**
This commit should be backported to stable kernel trees. Here's my extensive analysis:
## Analysis of the Commit
### **What the Commit Does**

The commit adds a single `dev_err()` call in the `find_pos_and_ways()` function at `drivers/cxl/core/region.c:1808`. When the function fails to find a port's `parent_dport` in the target list of a switch decoder, it now logs an error message that identifies exactly which port and device could not be found in which decoder's target list.
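The search itself can be modeled in userspace C. This is a minimal sketch, not the kernel code. `struct dport`, `struct switch_decoder`, `MAX_TARGETS`, and `find_pos_in_targets()` are hypothetical simplifications of the kernel's dport, switch decoder, and `find_pos_and_ways()`, and `fprintf()` stands in for `dev_err()`. The sketch assumes the lookup scans the first `interleave_ways` entries of the target list for the parent dport. When nothing matches, it logs the same message text the patch adds.

```c
#include <errno.h>
#include <stdio.h>

#define MAX_TARGETS 8 /* hypothetical cap, chosen only for this sketch */

/* Hypothetical stand-in for a downstream port (dport). */
struct dport {
	const char *name; /* e.g. "0000:e0:01.3" */
};

/* Hypothetical stand-in for a switch decoder and its target list. */
struct switch_decoder {
	const char *name;    /* e.g. "decoder1.1" */
	int interleave_ways; /* number of valid target[] slots (assumption) */
	const struct dport *target[MAX_TARGETS];
};

/*
 * Look for parent_dport in the decoder's target list. On a match, store
 * the slot index in *pos and return 0. Otherwise, log which port/dport
 * was missing from which decoder and return -ENXIO, mirroring the
 * patched error path. *pos is left untouched on failure.
 */
static int find_pos_in_targets(const struct switch_decoder *sd,
			       const char *port_name,
			       const struct dport *parent_dport, int *pos)
{
	int rc = -ENXIO;

	for (int i = 0; i < sd->interleave_ways; i++) {
		if (sd->target[i] == parent_dport) {
			*pos = i;
			rc = 0;
			break;
		}
	}

	/* The new diagnostic: only emitted when the scan found nothing. */
	if (rc)
		fprintf(stderr, "failed to find %s:%s in target list of %s\n",
			port_name, parent_dport->name, sd->name);

	return rc;
}
```

Consider a decoder `decoder1.1` whose two targets are dports `0000:e0:01.1` and `0000:e0:01.2`. Looking up the second dport returns 0 and sets `pos` to 1. Looking up `0000:e0:01.3` for `endpoint6` prints the message from the commit's example log, without the `dev_err()` device prefix, and returns `-ENXIO`.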
### **Why This Should Be Backported**
#### **1. Debugging and Diagnostic Improvement**

This commit significantly improves the debugging experience for CXL region configuration failures. The existing code path:

- Returns `-ENXIO` when no entry `cxlsd->target[i]` matches `port->parent_dport`
- Provides no indication of *why* the failure occurred or *which specific* port/device was missing
The new error message provides crucial diagnostic information:

```c
dev_err(port->uport_dev,
	"failed to find %s:%s in target list of %s\n",
	dev_name(&port->dev),
	dev_name(port->parent_dport->dport_dev),
	dev_name(&cxlsd->cxld.dev));
```
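The format string can be run on its own to show how the three `dev_name()` arguments map onto the first example log line in the commit message. This is a hedged sketch. `format_target_miss()` is a hypothetical helper, and `snprintf()` stands in for `dev_err()`. The real `dev_err()` also prefixes the text with the driver name and device name of `port->uport_dev`, which is `cxl_mem mem1: ` in the example.

```c
#include <stdio.h>

/*
 * Build the same text the patch passes to dev_err(). The three arguments
 * correspond to dev_name(&port->dev), dev_name(port->parent_dport->dport_dev)
 * and dev_name(&cxlsd->cxld.dev). Returns snprintf()'s result: the length
 * of the full message, excluding the terminating NUL.
 */
static int format_target_miss(char *buf, size_t len, const char *port,
			      const char *dport, const char *decoder)
{
	return snprintf(buf, len, "failed to find %s:%s in target list of %s\n",
			port, dport, decoder);
}
```

Calling it with `"endpoint6"`, `"0000:e0:01.3"`, and `"decoder1.1"` produces `failed to find endpoint6:0000:e0:01.3 in target list of decoder1.1` followed by a newline. This is the example log line without the `cxl_mem mem1: ` prefix.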
#### **2. Critical Failure Context**

`find_pos_and_ways()` is called from `cxl_calc_interleave_pos()`. When it fails:

- The error propagates back through `cxl_calc_interleave_pos()`, and the code at `region.c:1891` stores the negative error code in `cxled->pos`
- The region sorting process continues but records the failure (`rc = -ENXIO`)
- The failure ultimately prevents proper CXL region initialization
Without this diagnostic message, administrators and developers have no clear indication of which specific hardware topology element is misconfigured.
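The propagation described above can be sketched as follows. This is a simplified model under stated assumptions, not the kernel's region code. `struct endpoint_decoder` and `sort_targets()` are hypothetical, and the `calc_result` array stands in for the values `cxl_calc_interleave_pos()` would return for each target. The sketch shows the two properties listed above: a failed lookup leaves a negative error code in `pos`, and the loop keeps going while recording the failure in `rc`.

```c
#include <errno.h>

/* Hypothetical stand-in for an endpoint decoder; pos may hold -errno. */
struct endpoint_decoder {
	int pos;
};

/*
 * Model of the sort step: every target gets a position assigned, even
 * after an earlier target failed, and the failure is only recorded in
 * rc for the caller. calc_result[i] stands in for what
 * cxl_calc_interleave_pos() would return for target i (assumption).
 */
static int sort_targets(struct endpoint_decoder *targets,
			const int *calc_result, int n)
{
	int rc = 0;

	for (int i = 0; i < n; i++) {
		targets[i].pos = calc_result[i];
		if (targets[i].pos < 0)
			rc = -ENXIO; /* record the failure but keep going */
	}
	return rc;
}
```

With per-target results `{ 1, -ENXIO, 0 }`, `sort_targets()` returns `-ENXIO`, yet the third target still receives position 0. The error is recorded, but the loop does not stop at the failing entry.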
#### **3. Characteristics Supporting Backporting**
**Small and Contained Change**: The commit adds 7 lines, per the diffstat: an `if (rc)` guard around a single error message, plus a blank line. There are no functional changes to the logic, error handling paths, or data structures.
**No Risk of Regression**: The change is purely additive logging. It cannot introduce functional regressions because:

- The error path already existed and returned the same error code
- No existing control flow is modified; the new `if (rc)` only guards the log call
- No data structures are changed
**Important Bugfix Category**: While not fixing a crash, this addresses a significant usability/debuggability issue. CXL region configuration failures are difficult to diagnose without proper error messaging.
**Follows Stable Rules**: This is exactly the type of change stable trees are designed for: minimal-risk improvements that help users diagnose real problems in production systems.
#### **4. Comparison with Historical Commits**

Looking at the reference commits, this aligns closely with the commits marked "YES":

- Similar to commits #1, #2, and #3, which add error handling and diagnostic improvements
- Much safer than architectural changes (like commits #4 and #5, marked "NO")
- Focuses on improving failure diagnosis rather than changing core functionality
#### **5. Production Value**

CXL is increasingly deployed in production environments. When region configuration fails due to broken target lists, the current failure mode provides insufficient information for field diagnosis. This commit provides the specific device and decoder information needed to identify hardware topology issues.
The commit message explicitly states that "Broken target lists are hard to discover as the driver fails at a later initialization stage". This is precisely the type of production issue that stable backports should address.
### **Conclusion**

This commit represents a minimal-risk diagnostic improvement that significantly enhances the debuggability of CXL region configuration failures in production systems. It meets all criteria for stable backporting: important user-facing improvement, minimal code change, zero regression risk, and clear production value.
```
 drivers/cxl/core/region.c | 7 +++++++
 1 file changed, 7 insertions(+)
```

```diff
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index c3f4dc244df77..24b161c7749f9 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -1805,6 +1805,13 @@ static int find_pos_and_ways(struct cxl_port *port, struct range *range,
 	}
 	put_device(dev);
 
+	if (rc)
+		dev_err(port->uport_dev,
+			"failed to find %s:%s in target list of %s\n",
+			dev_name(&port->dev),
+			dev_name(port->parent_dport->dport_dev),
+			dev_name(&cxlsd->cxld.dev));
+
 	return rc;
 }
```