Robert Richter wrote:
On 22.10.24 18:43:15, Dan Williams wrote:
Changes since v1 [1]:
- Fix some misspellings missed by checkpatch in changelogs (Jonathan)
- Add comments explaining the order of objects in drivers/cxl/Makefile (Jonathan)
- Rename attach_device => cxl_rescan_attach (Jonathan)
- Fixup Zijun's email (Zijun)
Original cover:
Gregory's modest proposal to fix CXL cxl_mem_probe() failures due to delayed arrival of the CXL "root" infrastructure [1] prompted questions of how the existing mechanism for retrying cxl_mem_probe() could be failing.
I found a similar issue with the region creation.
A region is created with the first endpoint found and immediately added as a device, which triggers cxl_region_probe(). In interleaved setups, however, the region only reaches the commit state after the last endpoint has been probed. So the probe must be repeated until all endpoints were enumerated. I ended up with this change:
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index a07b62254596..c78704e435e5 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -3775,8 +3775,8 @@ static int cxl_region_probe(struct device *dev)
 	}
 	if (p->state < CXL_CONFIG_COMMIT) {
-		dev_dbg(&cxlr->dev, "config state: %d\n", p->state);
-		rc = -ENXIO;
+		rc = dev_err_probe(&cxlr->dev, -EPROBE_DEFER,
+				   "region config state: %d\n", p->state);
I would argue EPROBE_DEFER is not appropriate because there is no guarantee that the other members of the region show up, and if they do they will re-trigger probe. So "probe must be repeated until all endpoints were enumerated" is the case either way. I.e. either further endpoint arrival triggers re-probe, or EPROBE_DEFER triggers extra redundant probing *and* still results in probe attempts as endpoints arrive.
So a dev_dbg() plus -ENXIO return on uncommitted region state is expected.
 		goto out;
 	}
-- 
2.39.5
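To make the -ENXIO versus -EPROBE_DEFER argument above concrete, here is a minimal user-space sketch, not kernel code. It assumes, as stated above, that every endpoint arrival re-triggers the region probe. All sim_* names are hypothetical, and the number of deferred-probe retries between arrivals is an arbitrary parameter, since how often the driver core retries depends on unrelated probe activity in the system.

```c
#include <assert.h>

/* Local stand-ins for the kernel's errno values; used only as labels here. */
#define SIM_ENXIO        6
#define SIM_EPROBE_DEFER 517

enum sim_config_state { SIM_CONFIG_ACTIVE, SIM_CONFIG_COMMIT };

/* Hypothetical model of a region: committed once all ways are attached. */
struct sim_region {
	int ways;      /* interleave ways the region needs */
	int attached;  /* endpoints attached so far */
	int probes;    /* number of probe attempts made */
	enum sim_config_state state;
};

/* Mirrors the shape of the check in the diff: fail until committed. */
static int sim_region_probe(struct sim_region *r, int err)
{
	r->probes++;
	if (r->state < SIM_CONFIG_COMMIT)
		return -err;
	return 0;
}

static void sim_attach_endpoint(struct sim_region *r)
{
	if (++r->attached == r->ways)
		r->state = SIM_CONFIG_COMMIT;
}

/*
 * Count probe attempts until the region binds. Each arrival triggers one
 * probe. If the probe returned -EPROBE_DEFER, model 'deferred_retries'
 * extra retries (an assumed number) before the next endpoint shows up.
 */
int sim_count_probes(int ways, int err, int deferred_retries)
{
	struct sim_region r = { .ways = ways, .state = SIM_CONFIG_ACTIVE };
	int rc = -err;

	assert(ways > 0 && deferred_retries >= 0);

	for (int i = 0; i < ways; i++) {
		sim_attach_endpoint(&r);
		rc = sim_region_probe(&r, err);
		for (int j = 0; rc == -SIM_EPROBE_DEFER && j < deferred_retries; j++)
			rc = sim_region_probe(&r, err);
	}
	assert(rc == 0); /* the last arrival always finds a committed region */
	return r.probes;
}
```

In this model a 4-way region with an assumed 2 deferred retries per arrival takes 4 probe attempts with -ENXIO and 10 with -EPROBE_DEFER, and both end up bound at the same point, on the last arrival. The extra 6 attempts are the redundant probing described above.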
I don't see an init order issue here as the mem module is always up before the regions are probed.
Right, cxl_endpoint_port_probe() triggers region discovery and cxl_endpoint_port_probe() currently only triggers after cxl_mem has registered an endpoint port.
The failure this set addresses is unwanted cxl_mem_probe() failures.