On 22.10.24 18:43:15, Dan Williams wrote:
Changes since v1 [1]:
- Fix some misspellings missed by checkpatch in changelogs (Jonathan)
- Add comments explaining the order of objects in drivers/cxl/Makefile
(Jonathan)
- Rename attach_device => cxl_rescan_attach (Jonathan)
- Fixup Zijun's email (Zijun)
Original cover:
Gregory's modest proposal to fix CXL cxl_mem_probe() failures due to
delayed arrival of the CXL "root" infrastructure [1] prompted questions
of how the existing mechanism for retrying cxl_mem_probe() could be
failing.
I found a similar issue with the region creation.
A region is created with the first endpoint found and immediately
added as device which triggers cxl_region_probe(). Now, in
interleaving setups the region state comes into commit state only
after the last endpoint was probed. So the probe must be repeated
until all endpoints were enumerated. I ended up with this change:
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index a07b62254596..c78704e435e5 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -3775,8 +3775,8 @@ static int cxl_region_probe(struct device *dev)
}
if (p->state < CXL_CONFIG_COMMIT) {
- dev_dbg(&cxlr->dev, "config state: %d\n", p->state);
- rc = -ENXIO;
+ rc = dev_err_probe(&cxlr->dev, -EPROBE_DEFER,
+ "region config state: %d\n", p->state);
goto out;
}
--
2.39.5
I don't see an init order issue here as the mem module is always up
before the regions are probed.
-Robert
>
> The critical missing piece in the debug was that Gregory's setup had
> almost all CXL modules built-in to the kernel.
>
> On the way to that discovery several other bugs and init-order corner
> cases were discovered.
>
> The main fix is to make sure the drivers/cxl/Makefile object order
> supports root CXL ports being fully initialized upon cxl_acpi_probe()
> exit. The modular case has some similar potential holes that are fixed
> with MODULE_SOFTDEP() and other fix ups. Finally, an attempt to update
> cxl_test to reproduce the original report resulted in the discovery of a
> separate long standing use after free bug in cxl_region_detach().
>
> [2]:
http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net
>
> ---
>
> Dan Williams (6):
> cxl/port: Fix CXL port initialization order when the subsystem is built-in
> cxl/port: Fix cxl_bus_rescan() vs bus_rescan_devices()
> cxl/acpi: Ensure ports ready at cxl_acpi_probe() return
> cxl/port: Fix use-after-free, permit out-of-order decoder shutdown
> cxl/port: Prevent out-of-order decoder allocation
> cxl/test: Improve init-order fidelity relative to real-world systems
>
>
> drivers/base/core.c | 35 +++++++
> drivers/cxl/Kconfig | 1
> drivers/cxl/Makefile | 20 +++-
> drivers/cxl/acpi.c | 7 +
> drivers/cxl/core/hdm.c | 50 +++++++++--
> drivers/cxl/core/port.c | 13 ++-
> drivers/cxl/core/region.c | 91 ++++++++++---------
> drivers/cxl/cxl.h | 3 -
> include/linux/device.h | 3 +
> tools/testing/cxl/test/cxl.c | 200 +++++++++++++++++++++++-------------------
> tools/testing/cxl/test/mem.c | 1
> 11 files changed, 269 insertions(+), 155 deletions(-)
>
> base-commit: 8cf0b93919e13d1e8d4466eb4080a4c4d9d66d7b