On the way to fixing and regression testing Jonathan's report of CXL region creation failure on a single-port host bridge configuration [1], several other fixes fell out. Details in the individual commits, but the fixes mostly revolve around leaked references and other bugs in the region creation failure case. All but the last fix are tagged for -stable. The final fix is cosmetic, but leaving it unfixed gives the appearance of another memory leak condition.
Lastly, the problematic configuration is added to cxl_test to allow for regression testing it going forward.
[1]: http://lore.kernel.org/r/20221010172057.00001559@huawei.com
---
Dan Williams (7): cxl/region: Fix region HPA ordering validation cxl/region: Fix cxl_region leak, cleanup targets at region delete cxl/pmem: Fix cxl_pmem_region and cxl_memdev leak tools/testing/cxl: Fix some error exits tools/testing/cxl: Add a single-port host-bridge regression config cxl/region: Fix 'distance' calculation with passthrough ports cxl/region: Recycle region ids
drivers/cxl/core/pmem.c | 2 drivers/cxl/core/port.c | 11 +- drivers/cxl/core/region.c | 43 ++++++ drivers/cxl/cxl.h | 4 - drivers/cxl/pmem.c | 100 +++++++++----- tools/testing/cxl/test/cxl.c | 301 +++++++++++++++++++++++++++++++++++++++--- 6 files changed, 400 insertions(+), 61 deletions(-)
base-commit: 4f1aa35f1fb7d51b125487c835982af792697ecb
Some regions may not have any address space allocated. Skip them when validating HPA order otherwise a crash like the following may result:
devm_cxl_add_region: cxl_acpi cxl_acpi.0: decoder3.4: created region9 BUG: kernel NULL pointer dereference, address: 0000000000000000 [..] RIP: 0010:store_targetN+0x655/0x1740 [cxl_core] [..] Call Trace: <TASK> kernfs_fop_write_iter+0x144/0x200 vfs_write+0x24a/0x4d0 ksys_write+0x69/0xf0 do_syscall_64+0x3a/0x90
store_targetN+0x655/0x1740: alloc_region_ref at drivers/cxl/core/region.c:676 (inlined by) cxl_port_attach_region at drivers/cxl/core/region.c:850 (inlined by) cxl_region_attach at drivers/cxl/core/region.c:1290 (inlined by) attach_target at drivers/cxl/core/region.c:1410 (inlined by) store_targetN at drivers/cxl/core/region.c:1453
Cc: stable@vger.kernel.org Fixes: 384e624bb211 ("cxl/region: Attach endpoint decoders") Signed-off-by: Dan Williams dan.j.williams@intel.com --- drivers/cxl/core/region.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index bb6f4fc84a3f..d26ca7a6beae 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -658,6 +658,9 @@ static struct cxl_region_ref *alloc_region_ref(struct cxl_port *port, xa_for_each(&port->regions, index, iter) { struct cxl_region_params *ip = &iter->region->params;
+ if (!ip->res) + continue; + if (ip->res->start > p->res->start) { dev_dbg(&cxlr->dev, "%s: HPA order violation %s:%pr vs %pr\n",
On Thu, 2022-11-03 at 17:30 -0700, Dan Williams wrote:
Some regions may not have any address space allocated. Skip them when validating HPA order otherwise a crash like the following may result:
devm_cxl_add_region: cxl_acpi cxl_acpi.0: decoder3.4: created region9 BUG: kernel NULL pointer dereference, address: 0000000000000000 [..] RIP: 0010:store_targetN+0x655/0x1740 [cxl_core] [..] Call Trace: <TASK> kernfs_fop_write_iter+0x144/0x200 vfs_write+0x24a/0x4d0 ksys_write+0x69/0xf0 do_syscall_64+0x3a/0x90
store_targetN+0x655/0x1740: alloc_region_ref at drivers/cxl/core/region.c:676 (inlined by) cxl_port_attach_region at drivers/cxl/core/region.c:850 (inlined by) cxl_region_attach at drivers/cxl/core/region.c:1290 (inlined by) attach_target at drivers/cxl/core/region.c:1410 (inlined by) store_targetN at drivers/cxl/core/region.c:1453
Cc: stable@vger.kernel.org Fixes: 384e624bb211 ("cxl/region: Attach endpoint decoders") Signed-off-by: Dan Williams dan.j.williams@intel.com
drivers/cxl/core/region.c | 3 +++ 1 file changed, 3 insertions(+)
Makes sense,
Reviewed-by: Vishal Verma vishal.l.verma@intel.com
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index bb6f4fc84a3f..d26ca7a6beae 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -658,6 +658,9 @@ static struct cxl_region_ref *alloc_region_ref(struct cxl_port *port, xa_for_each(&port->regions, index, iter) { struct cxl_region_params *ip = &iter->region->params; + if (!ip->res) + continue;
if (ip->res->start > p->res->start) { dev_dbg(&cxlr->dev, "%s: HPA order violation %s:%pr vs %pr\n",
On 11/3/2022 5:30 PM, Dan Williams wrote:
Some regions may not have any address space allocated. Skip them when validating HPA order otherwise a crash like the following may result:
devm_cxl_add_region: cxl_acpi cxl_acpi.0: decoder3.4: created region9 BUG: kernel NULL pointer dereference, address: 0000000000000000 [..] RIP: 0010:store_targetN+0x655/0x1740 [cxl_core] [..] Call Trace:
<TASK> kernfs_fop_write_iter+0x144/0x200 vfs_write+0x24a/0x4d0 ksys_write+0x69/0xf0 do_syscall_64+0x3a/0x90
store_targetN+0x655/0x1740: alloc_region_ref at drivers/cxl/core/region.c:676 (inlined by) cxl_port_attach_region at drivers/cxl/core/region.c:850 (inlined by) cxl_region_attach at drivers/cxl/core/region.c:1290 (inlined by) attach_target at drivers/cxl/core/region.c:1410 (inlined by) store_targetN at drivers/cxl/core/region.c:1453
Cc: stable@vger.kernel.org Fixes: 384e624bb211 ("cxl/region: Attach endpoint decoders") Signed-off-by: Dan Williams dan.j.williams@intel.com
Reviewed-by: Dave Jiang dave.jiang@intel.com
drivers/cxl/core/region.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index bb6f4fc84a3f..d26ca7a6beae 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -658,6 +658,9 @@ static struct cxl_region_ref *alloc_region_ref(struct cxl_port *port, xa_for_each(&port->regions, index, iter) { struct cxl_region_params *ip = &iter->region->params;
if (!ip->res)
continue;
- if (ip->res->start > p->res->start) { dev_dbg(&cxlr->dev, "%s: HPA order violation %s:%pr vs %pr\n",
When a region is deleted any targets that have been previously assigned to that region hold references to it. Trigger those references to drop by detaching all targets at unregister_region() time.
Otherwise that region object will leak as userspace has lost the ability to detach targets once region sysfs is torn down.
Cc: stable@vger.kernel.org Fixes: b9686e8c8e39 ("cxl/region: Enable the assignment of endpoint decoders to regions") Signed-off-by: Dan Williams dan.j.williams@intel.com --- drivers/cxl/core/region.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index d26ca7a6beae..c52465e09f26 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -1557,8 +1557,19 @@ static struct cxl_region *to_cxl_region(struct device *dev) static void unregister_region(void *dev) { struct cxl_region *cxlr = to_cxl_region(dev); + struct cxl_region_params *p = &cxlr->params; + int i;
device_del(dev); + + /* + * Now that region sysfs is shutdown, the parameter block is now + * read-only, so no need to hold the region rwsem to access the + * region parameters. + */ + for (i = 0; i < p->interleave_ways; i++) + detach_target(cxlr, i); + cxl_region_iomem_release(cxlr); put_device(dev); }
On Thu, 2022-11-03 at 17:30 -0700, Dan Williams wrote:
When a region is deleted any targets that have been previously assigned to that region hold references to it. Trigger those references to drop by detaching all targets at unregister_region() time.
Otherwise that region object will leak as userspace has lost the ability to detach targets once region sysfs is torn down.
Cc: stable@vger.kernel.org Fixes: b9686e8c8e39 ("cxl/region: Enable the assignment of endpoint decoders to regions") Signed-off-by: Dan Williams dan.j.williams@intel.com
drivers/cxl/core/region.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
Looks good,
Reviewed-by: Vishal Verma vishal.l.verma@intel.com
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index d26ca7a6beae..c52465e09f26 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -1557,8 +1557,19 @@ static struct cxl_region *to_cxl_region(struct device *dev) static void unregister_region(void *dev) { struct cxl_region *cxlr = to_cxl_region(dev); + struct cxl_region_params *p = &cxlr->params; + int i; device_del(dev);
+ /* + * Now that region sysfs is shutdown, the parameter block is now + * read-only, so no need to hold the region rwsem to access the + * region parameters. + */ + for (i = 0; i < p->interleave_ways; i++) + detach_target(cxlr, i);
cxl_region_iomem_release(cxlr); put_device(dev); }
On 11/3/2022 5:30 PM, Dan Williams wrote:
When a region is deleted any targets that have been previously assigned to that region hold references to it. Trigger those references to drop by detaching all targets at unregister_region() time.
Otherwise that region object will leak as userspace has lost the ability to detach targets once region sysfs is torn down.
Cc: stable@vger.kernel.org Fixes: b9686e8c8e39 ("cxl/region: Enable the assignment of endpoint decoders to regions") Signed-off-by: Dan Williams dan.j.williams@intel.com
Reviewed-by: Dave Jiang dave.jiang@intel.com
drivers/cxl/core/region.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index d26ca7a6beae..c52465e09f26 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -1557,8 +1557,19 @@ static struct cxl_region *to_cxl_region(struct device *dev) static void unregister_region(void *dev) { struct cxl_region *cxlr = to_cxl_region(dev);
- struct cxl_region_params *p = &cxlr->params;
- int i;
device_del(dev);
- /*
* Now that region sysfs is shutdown, the parameter block is now
* read-only, so no need to hold the region rwsem to access the
* region parameters.
*/
- for (i = 0; i < p->interleave_ways; i++)
detach_target(cxlr, i);
- cxl_region_iomem_release(cxlr); put_device(dev); }
When a cxl_nvdimm object goes through a ->remove() event (device physically removed, nvdimm-bridge disabled, or nvdimm device disabled), then any associated regions must also be disabled. As highlighted by the cxl-create-region.sh test [1], a single device may host multiple regions, but the driver was only tracking one region at a time. This leads to a situation where only the last enabled region per nvdimm device is cleaned up properly. Other regions are leaked, and this also causes cxl_memdev reference leaks.
Fix the tracking by allowing cxl_nvdimm objects to track multiple region associations.
Cc: stable@vger.kernel.org Link: https://github.com/pmem/ndctl/blob/main/test/cxl-create-region.sh [1] Reported-by: Vishal Verma vishal.l.verma@intel.com Fixes: 04ad63f086d1 ("cxl/region: Introduce cxl_pmem_region objects") Signed-off-by: Dan Williams dan.j.williams@intel.com --- drivers/cxl/core/pmem.c | 2 + drivers/cxl/cxl.h | 2 - drivers/cxl/pmem.c | 100 ++++++++++++++++++++++++++++++----------------- 3 files changed, 67 insertions(+), 37 deletions(-)
diff --git a/drivers/cxl/core/pmem.c b/drivers/cxl/core/pmem.c index 1d12a8206444..36aa5070d902 100644 --- a/drivers/cxl/core/pmem.c +++ b/drivers/cxl/core/pmem.c @@ -188,6 +188,7 @@ static void cxl_nvdimm_release(struct device *dev) { struct cxl_nvdimm *cxl_nvd = to_cxl_nvdimm(dev);
+ xa_destroy(&cxl_nvd->pmem_regions); kfree(cxl_nvd); }
@@ -230,6 +231,7 @@ static struct cxl_nvdimm *cxl_nvdimm_alloc(struct cxl_memdev *cxlmd)
dev = &cxl_nvd->dev; cxl_nvd->cxlmd = cxlmd; + xa_init(&cxl_nvd->pmem_regions); device_initialize(dev); lockdep_set_class(&dev->mutex, &cxl_nvdimm_key); device_set_pm_not_required(dev); diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index f680450f0b16..1164ad49f3d3 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -423,7 +423,7 @@ struct cxl_nvdimm { struct device dev; struct cxl_memdev *cxlmd; struct cxl_nvdimm_bridge *bridge; - struct cxl_pmem_region *region; + struct xarray pmem_regions; };
struct cxl_pmem_region_mapping { diff --git a/drivers/cxl/pmem.c b/drivers/cxl/pmem.c index 0bac05d804bc..c98ff5eb85a6 100644 --- a/drivers/cxl/pmem.c +++ b/drivers/cxl/pmem.c @@ -30,17 +30,20 @@ static void unregister_nvdimm(void *nvdimm) struct cxl_nvdimm *cxl_nvd = nvdimm_provider_data(nvdimm); struct cxl_nvdimm_bridge *cxl_nvb = cxl_nvd->bridge; struct cxl_pmem_region *cxlr_pmem; + unsigned long index;
device_lock(&cxl_nvb->dev); - cxlr_pmem = cxl_nvd->region; dev_set_drvdata(&cxl_nvd->dev, NULL); - cxl_nvd->region = NULL; - device_unlock(&cxl_nvb->dev); + xa_for_each(&cxl_nvd->pmem_regions, index, cxlr_pmem) { + get_device(&cxlr_pmem->dev); + device_unlock(&cxl_nvb->dev);
- if (cxlr_pmem) { device_release_driver(&cxlr_pmem->dev); put_device(&cxlr_pmem->dev); + + device_lock(&cxl_nvb->dev); } + device_unlock(&cxl_nvb->dev);
nvdimm_delete(nvdimm); cxl_nvd->bridge = NULL; @@ -366,25 +369,48 @@ static int match_cxl_nvdimm(struct device *dev, void *data)
static void unregister_nvdimm_region(void *nd_region) { - struct cxl_nvdimm_bridge *cxl_nvb; - struct cxl_pmem_region *cxlr_pmem; + nvdimm_region_delete(nd_region); +} + +static int cxl_nvdimm_add_region(struct cxl_nvdimm *cxl_nvd, + struct cxl_pmem_region *cxlr_pmem) +{ + int rc; + + rc = xa_insert(&cxl_nvd->pmem_regions, (unsigned long)cxlr_pmem, + cxlr_pmem, GFP_KERNEL); + if (rc) + return rc; + + get_device(&cxlr_pmem->dev); + return 0; +} + +static void cxl_nvdimm_del_region(struct cxl_nvdimm *cxl_nvd, struct cxl_pmem_region *cxlr_pmem) +{ + /* + * It is possible this is called without a corresponding + * cxl_nvdimm_add_region for @cxlr_pmem + */ + cxlr_pmem = xa_erase(&cxl_nvd->pmem_regions, (unsigned long)cxlr_pmem); + if (cxlr_pmem) + put_device(&cxlr_pmem->dev); +} + +static void release_mappings(void *data) +{ int i; + struct cxl_pmem_region *cxlr_pmem = data; + struct cxl_nvdimm_bridge *cxl_nvb = cxlr_pmem->bridge;
- cxlr_pmem = nd_region_provider_data(nd_region); - cxl_nvb = cxlr_pmem->bridge; device_lock(&cxl_nvb->dev); for (i = 0; i < cxlr_pmem->nr_mappings; i++) { struct cxl_pmem_region_mapping *m = &cxlr_pmem->mapping[i]; struct cxl_nvdimm *cxl_nvd = m->cxl_nvd;
- if (cxl_nvd->region) { - put_device(&cxlr_pmem->dev); - cxl_nvd->region = NULL; - } + cxl_nvdimm_del_region(cxl_nvd, cxlr_pmem); } device_unlock(&cxl_nvb->dev); - - nvdimm_region_delete(nd_region); }
static void cxlr_pmem_remove_resource(void *res) @@ -422,7 +448,7 @@ static int cxl_pmem_region_probe(struct device *dev) if (!cxl_nvb->nvdimm_bus) { dev_dbg(dev, "nvdimm bus not found\n"); rc = -ENXIO; - goto err; + goto out_nvb; }
memset(&mappings, 0, sizeof(mappings)); @@ -431,7 +457,7 @@ static int cxl_pmem_region_probe(struct device *dev) res = devm_kzalloc(dev, sizeof(*res), GFP_KERNEL); if (!res) { rc = -ENOMEM; - goto err; + goto out_nvb; }
res->name = "Persistent Memory"; @@ -442,11 +468,11 @@ static int cxl_pmem_region_probe(struct device *dev)
rc = insert_resource(&iomem_resource, res); if (rc) - goto err; + goto out_nvb;
rc = devm_add_action_or_reset(dev, cxlr_pmem_remove_resource, res); if (rc) - goto err; + goto out_nvb;
ndr_desc.res = res; ndr_desc.provider_data = cxlr_pmem; @@ -462,7 +488,7 @@ static int cxl_pmem_region_probe(struct device *dev) nd_set = devm_kzalloc(dev, sizeof(*nd_set), GFP_KERNEL); if (!nd_set) { rc = -ENOMEM; - goto err; + goto out_nvb; }
ndr_desc.memregion = cxlr->id; @@ -472,9 +498,13 @@ static int cxl_pmem_region_probe(struct device *dev) info = kmalloc_array(cxlr_pmem->nr_mappings, sizeof(*info), GFP_KERNEL); if (!info) { rc = -ENOMEM; - goto err; + goto out_nvb; }
+ rc = devm_add_action_or_reset(dev, release_mappings, cxlr_pmem); + if (rc) + goto out_nvd; + for (i = 0; i < cxlr_pmem->nr_mappings; i++) { struct cxl_pmem_region_mapping *m = &cxlr_pmem->mapping[i]; struct cxl_memdev *cxlmd = m->cxlmd; @@ -486,7 +516,7 @@ static int cxl_pmem_region_probe(struct device *dev) dev_dbg(dev, "[%d]: %s: no cxl_nvdimm found\n", i, dev_name(&cxlmd->dev)); rc = -ENODEV; - goto err; + goto out_nvd; }
/* safe to drop ref now with bridge lock held */ @@ -498,10 +528,17 @@ static int cxl_pmem_region_probe(struct device *dev) dev_dbg(dev, "[%d]: %s: no nvdimm found\n", i, dev_name(&cxlmd->dev)); rc = -ENODEV; - goto err; + goto out_nvd; } - cxl_nvd->region = cxlr_pmem; - get_device(&cxlr_pmem->dev); + + /* + * Pin the region per nvdimm device as those may be released + * out-of-order with respect to the region, and a single nvdimm + * maybe associated with multiple regions + */ + rc = cxl_nvdimm_add_region(cxl_nvd, cxlr_pmem); + if (rc) + goto out_nvd; m->cxl_nvd = cxl_nvd; mappings[i] = (struct nd_mapping_desc) { .nvdimm = nvdimm, @@ -527,27 +564,18 @@ static int cxl_pmem_region_probe(struct device *dev) nvdimm_pmem_region_create(cxl_nvb->nvdimm_bus, &ndr_desc); if (!cxlr_pmem->nd_region) { rc = -ENOMEM; - goto err; + goto out_nvd; }
rc = devm_add_action_or_reset(dev, unregister_nvdimm_region, cxlr_pmem->nd_region); -out: +out_nvd: kfree(info); +out_nvb: device_unlock(&cxl_nvb->dev); put_device(&cxl_nvb->dev);
return rc; - -err: - dev_dbg(dev, "failed to create nvdimm region\n"); - for (i--; i >= 0; i--) { - nvdimm = mappings[i].nvdimm; - cxl_nvd = nvdimm_provider_data(nvdimm); - put_device(&cxl_nvd->region->dev); - cxl_nvd->region = NULL; - } - goto out; }
static struct cxl_driver cxl_pmem_region_driver = {
On Thu, 2022-11-03 at 17:30 -0700, Dan Williams wrote:
When a cxl_nvdimm object goes through a ->remove() event (device physically removed, nvdimm-bridge disabled, or nvdimm device disabled), then any associated regions must also be disabled. As highlighted by the cxl-create-region.sh test [1], a single device may host multiple regions, but the driver was only tracking one region at a time. This leads to a situation where only the last enabled region per nvdimm device is cleaned up properly. Other regions are leaked, and this also causes cxl_memdev reference leaks.
Fix the tracking by allowing cxl_nvdimm objects to track multiple region associations.
Cc: stable@vger.kernel.org Link: https://github.com/pmem/ndctl/blob/main/test/cxl-create-region.sh%C2%A0%5B1] Reported-by: Vishal Verma vishal.l.verma@intel.com Fixes: 04ad63f086d1 ("cxl/region: Introduce cxl_pmem_region objects") Signed-off-by: Dan Williams dan.j.williams@intel.com
drivers/cxl/core/pmem.c | 2 + drivers/cxl/cxl.h | 2 - drivers/cxl/pmem.c | 100 ++++++++++++++++++++++++++++++----------------- 3 files changed, 67 insertions(+), 37 deletions(-)
One minor nit below, otherwise looks good to me.
Reviewed-by: Vishal Verma vishal.l.verma@intel.com
[..]
diff --git a/drivers/cxl/pmem.c b/drivers/cxl/pmem.c index 0bac05d804bc..c98ff5eb85a6 100644 --- a/drivers/cxl/pmem.c +++ b/drivers/cxl/pmem.c @@ -30,17 +30,20 @@ static void unregister_nvdimm(void *nvdimm) struct cxl_nvdimm *cxl_nvd = nvdimm_provider_data(nvdimm); struct cxl_nvdimm_bridge *cxl_nvb = cxl_nvd->bridge; struct cxl_pmem_region *cxlr_pmem; + unsigned long index; device_lock(&cxl_nvb->dev); - cxlr_pmem = cxl_nvd->region; dev_set_drvdata(&cxl_nvd->dev, NULL); - cxl_nvd->region = NULL; - device_unlock(&cxl_nvb->dev); + xa_for_each(&cxl_nvd->pmem_regions, index, cxlr_pmem) { + get_device(&cxlr_pmem->dev); + device_unlock(&cxl_nvb->dev); - if (cxlr_pmem) { device_release_driver(&cxlr_pmem->dev); put_device(&cxlr_pmem->dev);
+ device_lock(&cxl_nvb->dev); } + device_unlock(&cxl_nvb->dev); nvdimm_delete(nvdimm); cxl_nvd->bridge = NULL; @@ -366,25 +369,48 @@ static int match_cxl_nvdimm(struct device *dev, void *data) static void unregister_nvdimm_region(void *nd_region) { - struct cxl_nvdimm_bridge *cxl_nvb; - struct cxl_pmem_region *cxlr_pmem; + nvdimm_region_delete(nd_region); +}
+static int cxl_nvdimm_add_region(struct cxl_nvdimm *cxl_nvd, + struct cxl_pmem_region *cxlr_pmem) +{ + int rc;
+ rc = xa_insert(&cxl_nvd->pmem_regions, (unsigned long)cxlr_pmem, + cxlr_pmem, GFP_KERNEL); + if (rc) + return rc;
+ get_device(&cxlr_pmem->dev); + return 0; +}
+static void cxl_nvdimm_del_region(struct cxl_nvdimm *cxl_nvd, struct cxl_pmem_region *cxlr_pmem)
Split the long line?
+{ + /* + * It is possible this is called without a corresponding + * cxl_nvdimm_add_region for @cxlr_pmem + */ + cxlr_pmem = xa_erase(&cxl_nvd->pmem_regions, (unsigned long)cxlr_pmem); + if (cxlr_pmem) + put_device(&cxlr_pmem->dev); +}
+static void release_mappings(void *data) +{ int i; + struct cxl_pmem_region *cxlr_pmem = data; + struct cxl_nvdimm_bridge *cxl_nvb = cxlr_pmem->bridge; - cxlr_pmem = nd_region_provider_data(nd_region); - cxl_nvb = cxlr_pmem->bridge; device_lock(&cxl_nvb->dev); for (i = 0; i < cxlr_pmem->nr_mappings; i++) { struct cxl_pmem_region_mapping *m = &cxlr_pmem->mapping[i]; struct cxl_nvdimm *cxl_nvd = m->cxl_nvd; - if (cxl_nvd->region) { - put_device(&cxlr_pmem->dev); - cxl_nvd->region = NULL; - } + cxl_nvdimm_del_region(cxl_nvd, cxlr_pmem); } device_unlock(&cxl_nvb->dev);
- nvdimm_region_delete(nd_region); } static void cxlr_pmem_remove_resource(void *res) @@ -422,7 +448,7 @@ static int cxl_pmem_region_probe(struct device *dev) if (!cxl_nvb->nvdimm_bus) { dev_dbg(dev, "nvdimm bus not found\n"); rc = -ENXIO; - goto err; + goto out_nvb; } memset(&mappings, 0, sizeof(mappings)); @@ -431,7 +457,7 @@ static int cxl_pmem_region_probe(struct device *dev) res = devm_kzalloc(dev, sizeof(*res), GFP_KERNEL); if (!res) { rc = -ENOMEM; - goto err; + goto out_nvb; } res->name = "Persistent Memory"; @@ -442,11 +468,11 @@ static int cxl_pmem_region_probe(struct device *dev) rc = insert_resource(&iomem_resource, res); if (rc) - goto err; + goto out_nvb; rc = devm_add_action_or_reset(dev, cxlr_pmem_remove_resource, res); if (rc) - goto err; + goto out_nvb; ndr_desc.res = res; ndr_desc.provider_data = cxlr_pmem; @@ -462,7 +488,7 @@ static int cxl_pmem_region_probe(struct device *dev) nd_set = devm_kzalloc(dev, sizeof(*nd_set), GFP_KERNEL); if (!nd_set) { rc = -ENOMEM; - goto err; + goto out_nvb; } ndr_desc.memregion = cxlr->id; @@ -472,9 +498,13 @@ static int cxl_pmem_region_probe(struct device *dev) info = kmalloc_array(cxlr_pmem->nr_mappings, sizeof(*info), GFP_KERNEL); if (!info) { rc = -ENOMEM; - goto err; + goto out_nvb; } + rc = devm_add_action_or_reset(dev, release_mappings, cxlr_pmem); + if (rc) + goto out_nvd;
for (i = 0; i < cxlr_pmem->nr_mappings; i++) { struct cxl_pmem_region_mapping *m = &cxlr_pmem->mapping[i]; struct cxl_memdev *cxlmd = m->cxlmd; @@ -486,7 +516,7 @@ static int cxl_pmem_region_probe(struct device *dev) dev_dbg(dev, "[%d]: %s: no cxl_nvdimm found\n", i, dev_name(&cxlmd->dev)); rc = -ENODEV; - goto err; + goto out_nvd; } /* safe to drop ref now with bridge lock held */ @@ -498,10 +528,17 @@ static int cxl_pmem_region_probe(struct device *dev) dev_dbg(dev, "[%d]: %s: no nvdimm found\n", i, dev_name(&cxlmd->dev)); rc = -ENODEV; - goto err; + goto out_nvd; } - cxl_nvd->region = cxlr_pmem; - get_device(&cxlr_pmem->dev);
+ /* + * Pin the region per nvdimm device as those may be released + * out-of-order with respect to the region, and a single nvdimm + * maybe associated with multiple regions + */ + rc = cxl_nvdimm_add_region(cxl_nvd, cxlr_pmem); + if (rc) + goto out_nvd; m->cxl_nvd = cxl_nvd; mappings[i] = (struct nd_mapping_desc) { .nvdimm = nvdimm, @@ -527,27 +564,18 @@ static int cxl_pmem_region_probe(struct device *dev) nvdimm_pmem_region_create(cxl_nvb->nvdimm_bus, &ndr_desc); if (!cxlr_pmem->nd_region) { rc = -ENOMEM; - goto err; + goto out_nvd; } rc = devm_add_action_or_reset(dev, unregister_nvdimm_region, cxlr_pmem->nd_region); -out: +out_nvd: kfree(info); +out_nvb: device_unlock(&cxl_nvb->dev); put_device(&cxl_nvb->dev); return rc;
-err: - dev_dbg(dev, "failed to create nvdimm region\n"); - for (i--; i >= 0; i--) { - nvdimm = mappings[i].nvdimm; - cxl_nvd = nvdimm_provider_data(nvdimm); - put_device(&cxl_nvd->region->dev); - cxl_nvd->region = NULL; - } - goto out; } static struct cxl_driver cxl_pmem_region_driver = {
On 11/3/2022 5:30 PM, Dan Williams wrote:
When a cxl_nvdimm object goes through a ->remove() event (device physically removed, nvdimm-bridge disabled, or nvdimm device disabled), then any associated regions must also be disabled. As highlighted by the cxl-create-region.sh test [1], a single device may host multiple regions, but the driver was only tracking one region at a time. This leads to a situation where only the last enabled region per nvdimm device is cleaned up properly. Other regions are leaked, and this also causes cxl_memdev reference leaks.
Fix the tracking by allowing cxl_nvdimm objects to track multiple region associations.
Cc: stable@vger.kernel.org Link: https://github.com/pmem/ndctl/blob/main/test/cxl-create-region.sh [1] Reported-by: Vishal Verma vishal.l.verma@intel.com Fixes: 04ad63f086d1 ("cxl/region: Introduce cxl_pmem_region objects") Signed-off-by: Dan Williams dan.j.williams@intel.com
Reviewed-by: Dave Jiang dave.jiang@intel.com
drivers/cxl/core/pmem.c | 2 + drivers/cxl/cxl.h | 2 - drivers/cxl/pmem.c | 100 ++++++++++++++++++++++++++++++----------------- 3 files changed, 67 insertions(+), 37 deletions(-)
diff --git a/drivers/cxl/core/pmem.c b/drivers/cxl/core/pmem.c index 1d12a8206444..36aa5070d902 100644 --- a/drivers/cxl/core/pmem.c +++ b/drivers/cxl/core/pmem.c @@ -188,6 +188,7 @@ static void cxl_nvdimm_release(struct device *dev) { struct cxl_nvdimm *cxl_nvd = to_cxl_nvdimm(dev);
- xa_destroy(&cxl_nvd->pmem_regions); kfree(cxl_nvd); }
@@ -230,6 +231,7 @@ static struct cxl_nvdimm *cxl_nvdimm_alloc(struct cxl_memdev *cxlmd) dev = &cxl_nvd->dev; cxl_nvd->cxlmd = cxlmd;
- xa_init(&cxl_nvd->pmem_regions); device_initialize(dev); lockdep_set_class(&dev->mutex, &cxl_nvdimm_key); device_set_pm_not_required(dev);
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index f680450f0b16..1164ad49f3d3 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -423,7 +423,7 @@ struct cxl_nvdimm { struct device dev; struct cxl_memdev *cxlmd; struct cxl_nvdimm_bridge *bridge;
- struct cxl_pmem_region *region;
- struct xarray pmem_regions; };
struct cxl_pmem_region_mapping { diff --git a/drivers/cxl/pmem.c b/drivers/cxl/pmem.c index 0bac05d804bc..c98ff5eb85a6 100644 --- a/drivers/cxl/pmem.c +++ b/drivers/cxl/pmem.c @@ -30,17 +30,20 @@ static void unregister_nvdimm(void *nvdimm) struct cxl_nvdimm *cxl_nvd = nvdimm_provider_data(nvdimm); struct cxl_nvdimm_bridge *cxl_nvb = cxl_nvd->bridge; struct cxl_pmem_region *cxlr_pmem;
- unsigned long index;
device_lock(&cxl_nvb->dev);
- cxlr_pmem = cxl_nvd->region; dev_set_drvdata(&cxl_nvd->dev, NULL);
- cxl_nvd->region = NULL;
- device_unlock(&cxl_nvb->dev);
- xa_for_each(&cxl_nvd->pmem_regions, index, cxlr_pmem) {
get_device(&cxlr_pmem->dev);
device_unlock(&cxl_nvb->dev);
- if (cxlr_pmem) { device_release_driver(&cxlr_pmem->dev); put_device(&cxlr_pmem->dev);
}device_lock(&cxl_nvb->dev);
- device_unlock(&cxl_nvb->dev);
nvdimm_delete(nvdimm); cxl_nvd->bridge = NULL; @@ -366,25 +369,48 @@ static int match_cxl_nvdimm(struct device *dev, void *data) static void unregister_nvdimm_region(void *nd_region) {
- struct cxl_nvdimm_bridge *cxl_nvb;
- struct cxl_pmem_region *cxlr_pmem;
- nvdimm_region_delete(nd_region);
+}
+static int cxl_nvdimm_add_region(struct cxl_nvdimm *cxl_nvd,
struct cxl_pmem_region *cxlr_pmem)
+{
- int rc;
- rc = xa_insert(&cxl_nvd->pmem_regions, (unsigned long)cxlr_pmem,
cxlr_pmem, GFP_KERNEL);
- if (rc)
return rc;
- get_device(&cxlr_pmem->dev);
- return 0;
+}
+static void cxl_nvdimm_del_region(struct cxl_nvdimm *cxl_nvd, struct cxl_pmem_region *cxlr_pmem) +{
- /*
* It is possible this is called without a corresponding
* cxl_nvdimm_add_region for @cxlr_pmem
*/
- cxlr_pmem = xa_erase(&cxl_nvd->pmem_regions, (unsigned long)cxlr_pmem);
- if (cxlr_pmem)
put_device(&cxlr_pmem->dev);
+}
+static void release_mappings(void *data) +{ int i;
- struct cxl_pmem_region *cxlr_pmem = data;
- struct cxl_nvdimm_bridge *cxl_nvb = cxlr_pmem->bridge;
- cxlr_pmem = nd_region_provider_data(nd_region);
- cxl_nvb = cxlr_pmem->bridge; device_lock(&cxl_nvb->dev); for (i = 0; i < cxlr_pmem->nr_mappings; i++) { struct cxl_pmem_region_mapping *m = &cxlr_pmem->mapping[i]; struct cxl_nvdimm *cxl_nvd = m->cxl_nvd;
if (cxl_nvd->region) {
put_device(&cxlr_pmem->dev);
cxl_nvd->region = NULL;
}
} device_unlock(&cxl_nvb->dev);cxl_nvdimm_del_region(cxl_nvd, cxlr_pmem);
- nvdimm_region_delete(nd_region); }
static void cxlr_pmem_remove_resource(void *res) @@ -422,7 +448,7 @@ static int cxl_pmem_region_probe(struct device *dev) if (!cxl_nvb->nvdimm_bus) { dev_dbg(dev, "nvdimm bus not found\n"); rc = -ENXIO;
goto err;
}goto out_nvb;
memset(&mappings, 0, sizeof(mappings)); @@ -431,7 +457,7 @@ static int cxl_pmem_region_probe(struct device *dev) res = devm_kzalloc(dev, sizeof(*res), GFP_KERNEL); if (!res) { rc = -ENOMEM;
goto err;
}goto out_nvb;
res->name = "Persistent Memory"; @@ -442,11 +468,11 @@ static int cxl_pmem_region_probe(struct device *dev) rc = insert_resource(&iomem_resource, res); if (rc)
goto err;
goto out_nvb;
rc = devm_add_action_or_reset(dev, cxlr_pmem_remove_resource, res); if (rc)
goto err;
goto out_nvb;
ndr_desc.res = res; ndr_desc.provider_data = cxlr_pmem; @@ -462,7 +488,7 @@ static int cxl_pmem_region_probe(struct device *dev) nd_set = devm_kzalloc(dev, sizeof(*nd_set), GFP_KERNEL); if (!nd_set) { rc = -ENOMEM;
goto err;
}goto out_nvb;
ndr_desc.memregion = cxlr->id; @@ -472,9 +498,13 @@ static int cxl_pmem_region_probe(struct device *dev) info = kmalloc_array(cxlr_pmem->nr_mappings, sizeof(*info), GFP_KERNEL); if (!info) { rc = -ENOMEM;
goto err;
}goto out_nvb;
- rc = devm_add_action_or_reset(dev, release_mappings, cxlr_pmem);
- if (rc)
goto out_nvd;
- for (i = 0; i < cxlr_pmem->nr_mappings; i++) { struct cxl_pmem_region_mapping *m = &cxlr_pmem->mapping[i]; struct cxl_memdev *cxlmd = m->cxlmd;
@@ -486,7 +516,7 @@ static int cxl_pmem_region_probe(struct device *dev) dev_dbg(dev, "[%d]: %s: no cxl_nvdimm found\n", i, dev_name(&cxlmd->dev)); rc = -ENODEV;
goto err;
}goto out_nvd;
/* safe to drop ref now with bridge lock held */ @@ -498,10 +528,17 @@ static int cxl_pmem_region_probe(struct device *dev) dev_dbg(dev, "[%d]: %s: no nvdimm found\n", i, dev_name(&cxlmd->dev)); rc = -ENODEV;
goto err;
}goto out_nvd;
cxl_nvd->region = cxlr_pmem;
get_device(&cxlr_pmem->dev);
/*
* Pin the region per nvdimm device as those may be released
* out-of-order with respect to the region, and a single nvdimm
* maybe associated with multiple regions
*/
rc = cxl_nvdimm_add_region(cxl_nvd, cxlr_pmem);
if (rc)
m->cxl_nvd = cxl_nvd; mappings[i] = (struct nd_mapping_desc) { .nvdimm = nvdimm,goto out_nvd;
@@ -527,27 +564,18 @@ static int cxl_pmem_region_probe(struct device *dev) nvdimm_pmem_region_create(cxl_nvb->nvdimm_bus, &ndr_desc); if (!cxlr_pmem->nd_region) { rc = -ENOMEM;
goto err;
}goto out_nvd;
rc = devm_add_action_or_reset(dev, unregister_nvdimm_region, cxlr_pmem->nd_region); -out: +out_nvd: kfree(info); +out_nvb: device_unlock(&cxl_nvb->dev); put_device(&cxl_nvb->dev); return rc;
-err:
- dev_dbg(dev, "failed to create nvdimm region\n");
- for (i--; i >= 0; i--) {
nvdimm = mappings[i].nvdimm;
cxl_nvd = nvdimm_provider_data(nvdimm);
put_device(&cxl_nvd->region->dev);
cxl_nvd->region = NULL;
- }
- goto out; }
static struct cxl_driver cxl_pmem_region_driver = {
When programming port decode targets, the algorithm wants to ensure that two devices are compatible to be programmed as peers beneath a given port. A compatible peer is a target that shares the same dport, and where that target's interleave position also routes it to the same dport. Compatibility is determined by the device's interleave position being >= to distance. For example, if a given dport can only map every Nth position then positions less than N away from the last target programmed are incompatible.
The @distance for the host-bridge-cxl_port a simple dual-ported host-bridge configuration with 2 direct-attached devices is 1. An x2 region divded by 2 dports to reach 2 region targets.
An x4 region under an x2 host-bridge would need 2 intervening switches where the @distance at the host bridge level is 2 (x4 region divided by 2 switches to reach 4 devices).
However, the distance between peers underneath a single ported host-bridge is always zero because there is no limit to the number of devices that can be mapped. In other words, there are no decoders to program in a passthrough, all descendants are mapped and distance only starts matters for the intervening descendant ports of the passthrough port.
Add tracking for the number of dports mapped to a port, and use that to detect the passthrough case for calculating @distance.
Cc: stable@vger.kernel.org Reported-by: Bobo WL lmw.bobo@gmail.com Reported-by: Jonathan Cameron Jonathan.Cameron@huawei.com Link: http://lore.kernel.org/r/20221010172057.00001559@huawei.com Fixes: 27b3f8d13830 ("cxl/region: Program target lists") Signed-off-by: Dan Williams dan.j.williams@intel.com --- drivers/cxl/core/port.c | 11 +++++++++-- drivers/cxl/core/region.c | 9 ++++++++- drivers/cxl/cxl.h | 2 ++ 3 files changed, 19 insertions(+), 3 deletions(-)
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index bffde862de0b..e7556864ea80 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -811,6 +811,7 @@ static struct cxl_dport *find_dport(struct cxl_port *port, int id) static int add_dport(struct cxl_port *port, struct cxl_dport *new) { struct cxl_dport *dup; + int rc;
device_lock_assert(&port->dev); dup = find_dport(port, new->port_id); @@ -821,8 +822,14 @@ static int add_dport(struct cxl_port *port, struct cxl_dport *new) dev_name(dup->dport)); return -EBUSY; } - return xa_insert(&port->dports, (unsigned long)new->dport, new, - GFP_KERNEL); + + rc = xa_insert(&port->dports, (unsigned long)new->dport, new, + GFP_KERNEL); + if (rc) + return rc; + + port->nr_dports++; + return 0; }
/* diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index c52465e09f26..c0253de74945 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -990,7 +990,14 @@ static int cxl_port_setup_targets(struct cxl_port *port, if (cxl_rr->nr_targets_set) { int i, distance;
- distance = p->nr_targets / cxl_rr->nr_targets; + /* + * Passthrough ports impose no distance requirements between + * peers + */ + if (port->nr_dports == 1) + distance = 0; + else + distance = p->nr_targets / cxl_rr->nr_targets; for (i = 0; i < cxl_rr->nr_targets_set; i++) if (ep->dport == cxlsd->target[i]) { rc = check_last_peer(cxled, ep, cxl_rr, diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 1164ad49f3d3..ac75554b5d76 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -457,6 +457,7 @@ struct cxl_pmem_region { * @regions: cxl_region_ref instances, regions mapped by this port * @parent_dport: dport that points to this port in the parent * @decoder_ida: allocator for decoder ids + * @nr_dports: number of entries in @dports * @hdm_end: track last allocated HDM decoder instance for allocation ordering * @commit_end: cursor to track highest committed decoder for commit ordering * @component_reg_phys: component register capability base address (optional) @@ -475,6 +476,7 @@ struct cxl_port { struct xarray regions; struct cxl_dport *parent_dport; struct ida decoder_ida; + int nr_dports; int hdm_end; int commit_end; resource_size_t component_reg_phys;
On Thu, 2022-11-03 at 17:30 -0700, Dan Williams wrote:
When programming port decode targets, the algorithm wants to ensure that two devices are compatible to be programmed as peers beneath a given port. A compatible peer is a target that shares the same dport, and where that target's interleave position also routes it to the same dport. Compatibility is determined by the device's interleave position being >= to distance. For example, if a given dport can only map every Nth position then positions less than N away from the last target programmed are incompatible.
The @distance for the host-bridge-cxl_port a simple dual-ported
^ Is this meant to be "the distance for the host-bridge to a cxl_port"?
Also missing '/in/ a simple dual-ported..'?
host-bridge configuration with 2 direct-attached devices is 1. An x2 region divded by 2 dports to reach 2 region targets.
The second sentence seems slightly incomprehensible too.
An x4 region under an x2 host-bridge would need 2 intervening switches where the @distance at the host bridge level is 2 (x4 region divided by 2 switches to reach 4 devices).
However, the distance between peers underneath a single ported host-bridge is always zero because there is no limit to the number of devices that can be mapped. In other words, there are no decoders to program in a passthrough, all descendants are mapped and distance only starts matters for the intervening descendant ports of the passthrough
starts to matter?
port.
Add tracking for the number of dports mapped to a port, and use that to detect the passthrough case for calculating @distance.
Cc: stable@vger.kernel.org Reported-by: Bobo WL lmw.bobo@gmail.com Reported-by: Jonathan Cameron Jonathan.Cameron@huawei.com Link: http://lore.kernel.org/r/20221010172057.00001559@huawei.com Fixes: 27b3f8d13830 ("cxl/region: Program target lists") Signed-off-by: Dan Williams dan.j.williams@intel.com
drivers/cxl/core/port.c | 11 +++++++++-- drivers/cxl/core/region.c | 9 ++++++++- drivers/cxl/cxl.h | 2 ++ 3 files changed, 19 insertions(+), 3 deletions(-)
Other than the above, looks good,
Reviewed-by: Vishal Verma vishal.l.verma@intel.com
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index bffde862de0b..e7556864ea80 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -811,6 +811,7 @@ static struct cxl_dport *find_dport(struct cxl_port *port, int id) static int add_dport(struct cxl_port *port, struct cxl_dport *new) { struct cxl_dport *dup; + int rc; device_lock_assert(&port->dev); dup = find_dport(port, new->port_id); @@ -821,8 +822,14 @@ static int add_dport(struct cxl_port *port, struct cxl_dport *new) dev_name(dup->dport)); return -EBUSY; } - return xa_insert(&port->dports, (unsigned long)new->dport, new, - GFP_KERNEL);
+ rc = xa_insert(&port->dports, (unsigned long)new->dport, new, + GFP_KERNEL); + if (rc) + return rc;
+ port->nr_dports++; + return 0; } /* diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index c52465e09f26..c0253de74945 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -990,7 +990,14 @@ static int cxl_port_setup_targets(struct cxl_port *port, if (cxl_rr->nr_targets_set) { int i, distance; - distance = p->nr_targets / cxl_rr->nr_targets; + /* + * Passthrough ports impose no distance requirements between + * peers + */ + if (port->nr_dports == 1) + distance = 0; + else + distance = p->nr_targets / cxl_rr->nr_targets; for (i = 0; i < cxl_rr->nr_targets_set; i++) if (ep->dport == cxlsd->target[i]) { rc = check_last_peer(cxled, ep, cxl_rr, diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 1164ad49f3d3..ac75554b5d76 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -457,6 +457,7 @@ struct cxl_pmem_region { * @regions: cxl_region_ref instances, regions mapped by this port * @parent_dport: dport that points to this port in the parent * @decoder_ida: allocator for decoder ids
- @nr_dports: number of entries in @dports
* @hdm_end: track last allocated HDM decoder instance for allocation ordering * @commit_end: cursor to track highest committed decoder for commit ordering * @component_reg_phys: component register capability base address (optional) @@ -475,6 +476,7 @@ struct cxl_port { struct xarray regions; struct cxl_dport *parent_dport; struct ida decoder_ida; + int nr_dports; int hdm_end; int commit_end; resource_size_t component_reg_phys;
Verma, Vishal L wrote:
On Thu, 2022-11-03 at 17:30 -0700, Dan Williams wrote:
When programming port decode targets, the algorithm wants to ensure that two devices are compatible to be programmed as peers beneath a given port. A compatible peer is a target that shares the same dport, and where that target's interleave position also routes it to the same dport. Compatibility is determined by the device's interleave position being >= to distance. For example, if a given dport can only map every Nth position then positions less than N away from the last target programmed are incompatible.
The @distance for the host-bridge-cxl_port a simple dual-ported
^
Is this meant to be "the distance for the host-bridge to a cxl_port"?
No, but I will preface this explanation by admitting "distance" may not be the best term for what this value is describing. CXL decode routes to targets in a round robin fashion per-port. Take the diagram below:
┌───────────────────────────────────┬──┐ │WINDOW0 │x2│ └─────────┬─────────────────┬───────┴──┘ │ │ ┌─────────▼────┬──┐ ┌──────▼───────┬──┐ │HB0 │x2│ │HB1 │x2│ └──┬────────┬──┴──┘ └─┬─────────┬──┴──┘ │ │ │ │ ┌──▼─┬──┐ ┌─▼──┬──┐ ┌─▼──┬──┐ ┌─▼──┬──┐ │DEV0│x4│ │DEV1│x4│ │DEV2│x4│ │DEV3│x4│ └────┴──┘ └────┴──┘ └────┴──┘ └────┴──┘ 0 2 1 3
...where an x4 region is being established, and all the xN values are the interleave-ways settings for those ports/devices. The @distance value for that "HB0" port is 2. I.e. in order for 2 devices in that region to be mapped under HB0 they must be at position X and X+2 in the region. The algorithm needs to be flexible to also allow this configuration:
┌───────────────────────────────────┬──┐ │WINDOW0 │x2│ └─────────┬─────────────────┬───────┴──┘ │ │ ┌─────────▼────┬──┐ ┌──────▼───────┬──┐ │HB0 │x2│ │HB1 │x2│ └──┬────────┬──┴──┘ └─┬─────────┬──┴──┘ │ │ │ │ ┌──▼─┬──┐ ┌─▼──┬──┐ ┌─▼──┬──┐ ┌─▼──┬──┐ │DEV3│x4│ │DEV2│x4│ │DEV1│x4│ │DEV0│x4│ └────┴──┘ └────┴──┘ └────┴──┘ └────┴──┘ 3 1 2 0
...and the algorithm can not know that a device is in the wrong position until trying to map the peer (like DEV0 and DEV1 are peers) into the decode. So "The @distance for the host-bridge-cxl_port" is referring to the value "2" for host-bridge-cxl_port:HB0 and host-bridge-cxl_port:HB1 in the diagram.
Also missing '/in/ a simple dual-ported..'?
Yes to this fixup though.
host-bridge configuration with 2 direct-attached devices is 1. An x2 region divded by 2 dports to reach 2 region targets.
The second sentence seems slightly incomprehensible too.
Oh, I think I meant that to be s/. An/, i.e. an/:
"...host-bridge configuration with 2 direct-attached devices is 1, i.e. an x2 region divded by 2 dports to reach 2 region positions"
An x4 region under an x2 host-bridge would need 2 intervening switches where the @distance at the host bridge level is 2 (x4 region divided by 2 switches to reach 4 devices).
However, the distance between peers underneath a single ported host-bridge is always zero because there is no limit to the number of devices that can be mapped. In other words, there are no decoders to program in a passthrough, all descendants are mapped and distance only starts matters for the intervening descendant ports of the passthrough
starts to matter?
s/starts matters/matters/
port.
Add tracking for the number of dports mapped to a port, and use that to detect the passthrough case for calculating @distance.
Cc: stable@vger.kernel.org Reported-by: Bobo WL lmw.bobo@gmail.com Reported-by: Jonathan Cameron Jonathan.Cameron@huawei.com Link: http://lore.kernel.org/r/20221010172057.00001559@huawei.com Fixes: 27b3f8d13830 ("cxl/region: Program target lists") Signed-off-by: Dan Williams dan.j.williams@intel.com
drivers/cxl/core/port.c | 11 +++++++++-- drivers/cxl/core/region.c | 9 ++++++++- drivers/cxl/cxl.h | 2 ++ 3 files changed, 19 insertions(+), 3 deletions(-)
Other than the above, looks good,
Reviewed-by: Vishal Verma vishal.l.verma@intel.com
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index bffde862de0b..e7556864ea80 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -811,6 +811,7 @@ static struct cxl_dport *find_dport(struct cxl_port *port, int id) static int add_dport(struct cxl_port *port, struct cxl_dport *new) { struct cxl_dport *dup; + int rc; device_lock_assert(&port->dev); dup = find_dport(port, new->port_id); @@ -821,8 +822,14 @@ static int add_dport(struct cxl_port *port, struct cxl_dport *new) dev_name(dup->dport)); return -EBUSY; } - return xa_insert(&port->dports, (unsigned long)new->dport, new, - GFP_KERNEL);
+ rc = xa_insert(&port->dports, (unsigned long)new->dport, new, + GFP_KERNEL); + if (rc) + return rc;
+ port->nr_dports++; + return 0; } /* diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index c52465e09f26..c0253de74945 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -990,7 +990,14 @@ static int cxl_port_setup_targets(struct cxl_port *port, if (cxl_rr->nr_targets_set) { int i, distance; - distance = p->nr_targets / cxl_rr->nr_targets; + /* + * Passthrough ports impose no distance requirements between + * peers + */ + if (port->nr_dports == 1) + distance = 0; + else + distance = p->nr_targets / cxl_rr->nr_targets; for (i = 0; i < cxl_rr->nr_targets_set; i++) if (ep->dport == cxlsd->target[i]) { rc = check_last_peer(cxled, ep, cxl_rr, diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 1164ad49f3d3..ac75554b5d76 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -457,6 +457,7 @@ struct cxl_pmem_region { * @regions: cxl_region_ref instances, regions mapped by this port * @parent_dport: dport that points to this port in the parent * @decoder_ida: allocator for decoder ids
- @nr_dports: number of entries in @dports
* @hdm_end: track last allocated HDM decoder instance for allocation ordering * @commit_end: cursor to track highest committed decoder for commit ordering * @component_reg_phys: component register capability base address (optional) @@ -475,6 +476,7 @@ struct cxl_port { struct xarray regions; struct cxl_dport *parent_dport; struct ida decoder_ida; + int nr_dports; int hdm_end; int commit_end; resource_size_t component_reg_phys;
On Fri, 2022-11-04 at 11:59 -0700, Dan Williams wrote:
Verma, Vishal L wrote:
On Thu, 2022-11-03 at 17:30 -0700, Dan Williams wrote:
When programming port decode targets, the algorithm wants to ensure that two devices are compatible to be programmed as peers beneath a given port. A compatible peer is a target that shares the same dport, and where that target's interleave position also routes it to the same dport. Compatibility is determined by the device's interleave position being >= to distance. For example, if a given dport can only map every Nth position then positions less than N away from the last target programmed are incompatible.
The @distance for the host-bridge-cxl_port a simple dual-ported
^ Is this meant to be "the distance for the host-bridge to a cxl_port"?
No, but I will preface this explanation by admitting "distance" may not be the best term for what this value is describing. CXL decode routes to targets in a round robin fashion per-port. Take the diagram below:
┌───────────────────────────────────┬──┐ │WINDOW0 │x2│ └─────────┬─────────────────┬───────┴──┘ │ │ ┌─────────▼────┬──┐ ┌──────▼───────┬──┐ │HB0 │x2│ │HB1 │x2│ └──┬────────┬──┴──┘ └─┬─────────┬──┴──┘ │ │ │ │ ┌──▼─┬──┐ ┌─▼──┬──┐ ┌─▼──┬──┐ ┌─▼──┬──┐ │DEV0│x4│ │DEV1│x4│ │DEV2│x4│ │DEV3│x4│ └────┴──┘ └────┴──┘ └────┴──┘ └────┴──┘ 0 2 1 3
...where an x4 region is being established, and all the xN values are the interleave-ways settings for those ports/devices. The @distance value for that "HB0" port is 2. I.e. in order for 2 devices in that region to be mapped under HB0 they must be at position X and X+2 in the region. The algorithm needs to be flexible to also allow this configuration:
┌───────────────────────────────────┬──┐ │WINDOW0 │x2│ └─────────┬─────────────────┬───────┴──┘ │ │ ┌─────────▼────┬──┐ ┌──────▼───────┬──┐ │HB0 │x2│ │HB1 │x2│ └──┬────────┬──┴──┘ └─┬─────────┬──┴──┘ │ │ │ │ ┌──▼─┬──┐ ┌─▼──┬──┐ ┌─▼──┬──┐ ┌─▼──┬──┐ │DEV3│x4│ │DEV2│x4│ │DEV1│x4│ │DEV0│x4│ └────┴──┘ └────┴──┘ └────┴──┘ └────┴──┘ 3 1 2 0
...and the algorithm can not know that a device is in the wrong position until trying to map the peer (like DEV0 and DEV1 are peers) into the decode. So "The @distance for the host-bridge-cxl_port" is referring to the value "2" for host-bridge-cxl_port:HB0 and host-bridge-cxl_port:HB1 in the diagram.
Ah I understand now. The host-bridge-cxl_port term confused me slightly - I wonder if it is better to say "the host-bridge's cxl_port" or something? Regardless, thanks for the explanation!
Also missing '/in/ a simple dual-ported..'?
Yes to this fixup though.
host-bridge configuration with 2 direct-attached devices is 1. An x2 region divded by 2 dports to reach 2 region targets.
The second sentence seems slightly incomprehensible too.
Oh, I think I meant that to be s/. An/, i.e. an/:
"...host-bridge configuration with 2 direct-attached devices is 1, i.e. an x2 region divded by 2 dports to reach 2 region positions"
Ah ok that makes more sense!
An x4 region under an x2 host-bridge would need 2 intervening switches where the @distance at the host bridge level is 2 (x4 region divided by 2 switches to reach 4 devices).
However, the distance between peers underneath a single ported host-bridge is always zero because there is no limit to the number of devices that can be mapped. In other words, there are no decoders to program in a passthrough, all descendants are mapped and distance only starts matters for the intervening descendant ports of the passthrough
starts to matter?
s/starts matters/matters/
port.
Add tracking for the number of dports mapped to a port, and use that to detect the passthrough case for calculating @distance.
Cc: stable@vger.kernel.org Reported-by: Bobo WL lmw.bobo@gmail.com Reported-by: Jonathan Cameron Jonathan.Cameron@huawei.com Link: http://lore.kernel.org/r/20221010172057.00001559@huawei.com Fixes: 27b3f8d13830 ("cxl/region: Program target lists") Signed-off-by: Dan Williams dan.j.williams@intel.com
drivers/cxl/core/port.c | 11 +++++++++-- drivers/cxl/core/region.c | 9 ++++++++- drivers/cxl/cxl.h | 2 ++ 3 files changed, 19 insertions(+), 3 deletions(-)
Other than the above, looks good,
Reviewed-by: Vishal Verma vishal.l.verma@intel.com
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index bffde862de0b..e7556864ea80 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -811,6 +811,7 @@ static struct cxl_dport *find_dport(struct cxl_port *port, int id) static int add_dport(struct cxl_port *port, struct cxl_dport *new) { struct cxl_dport *dup; + int rc; device_lock_assert(&port->dev); dup = find_dport(port, new->port_id); @@ -821,8 +822,14 @@ static int add_dport(struct cxl_port *port, struct cxl_dport *new) dev_name(dup->dport)); return -EBUSY; } - return xa_insert(&port->dports, (unsigned long)new->dport, new, - GFP_KERNEL);
+ rc = xa_insert(&port->dports, (unsigned long)new->dport, new, + GFP_KERNEL); + if (rc) + return rc;
+ port->nr_dports++; + return 0; } /* diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index c52465e09f26..c0253de74945 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -990,7 +990,14 @@ static int cxl_port_setup_targets(struct cxl_port *port, if (cxl_rr->nr_targets_set) { int i, distance; - distance = p->nr_targets / cxl_rr->nr_targets; + /* + * Passthrough ports impose no distance requirements between + * peers + */ + if (port->nr_dports == 1) + distance = 0; + else + distance = p->nr_targets / cxl_rr->nr_targets; for (i = 0; i < cxl_rr->nr_targets_set; i++) if (ep->dport == cxlsd->target[i]) { rc = check_last_peer(cxled, ep, cxl_rr, diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 1164ad49f3d3..ac75554b5d76 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -457,6 +457,7 @@ struct cxl_pmem_region { * @regions: cxl_region_ref instances, regions mapped by this port * @parent_dport: dport that points to this port in the parent * @decoder_ida: allocator for decoder ids
- @nr_dports: number of entries in @dports
* @hdm_end: track last allocated HDM decoder instance for allocation ordering * @commit_end: cursor to track highest committed decoder for commit ordering * @component_reg_phys: component register capability base address (optional) @@ -475,6 +476,7 @@ struct cxl_port { struct xarray regions; struct cxl_dport *parent_dport; struct ida decoder_ida; + int nr_dports; int hdm_end; int commit_end; resource_size_t component_reg_phys;
On 11/4/2022 11:59 AM, Dan Williams wrote:
Verma, Vishal L wrote:
On Thu, 2022-11-03 at 17:30 -0700, Dan Williams wrote:
When programming port decode targets, the algorithm wants to ensure that two devices are compatible to be programmed as peers beneath a given port. A compatible peer is a target that shares the same dport, and where that target's interleave position also routes it to the same dport. Compatibility is determined by the device's interleave position being >= to distance. For example, if a given dport can only map every Nth position then positions less than N away from the last target programmed are incompatible.
The @distance for the host-bridge-cxl_port a simple dual-ported
^
Is this meant to be "the distance for the host-bridge to a cxl_port"?
No, but I will preface this explanation by admitting "distance" may not be the best term for what this value is describing. CXL decode routes to targets in a round robin fashion per-port. Take the diagram below:
┌───────────────────────────────────┬──┐ │WINDOW0 │x2│ └─────────┬─────────────────┬───────┴──┘ │ │ ┌─────────▼────┬──┐ ┌──────▼───────┬──┐ │HB0 │x2│ │HB1 │x2│ └──┬────────┬──┴──┘ └─┬─────────┬──┴──┘ │ │ │ │ ┌──▼─┬──┐ ┌─▼──┬──┐ ┌─▼──┬──┐ ┌─▼──┬──┐ │DEV0│x4│ │DEV1│x4│ │DEV2│x4│ │DEV3│x4│ └────┴──┘ └────┴──┘ └────┴──┘ └────┴──┘ 0 2 1 3
...where an x4 region is being established, and all the xN values are the interleave-ways settings for those ports/devices. The @distance value for that "HB0" port is 2. I.e. in order for 2 devices in that region to be mapped under HB0 they must be at position X and X+2 in the region. The algorithm needs to be flexible to also allow this configuration:
┌───────────────────────────────────┬──┐ │WINDOW0 │x2│ └─────────┬─────────────────┬───────┴──┘ │ │ ┌─────────▼────┬──┐ ┌──────▼───────┬──┐ │HB0 │x2│ │HB1 │x2│ └──┬────────┬──┴──┘ └─┬─────────┬──┴──┘ │ │ │ │ ┌──▼─┬──┐ ┌─▼──┬──┐ ┌─▼──┬──┐ ┌─▼──┬──┐ │DEV3│x4│ │DEV2│x4│ │DEV1│x4│ │DEV0│x4│ └────┴──┘ └────┴──┘ └────┴──┘ └────┴──┘ 3 1 2 0
...and the algorithm can not know that a device is in the wrong position until trying to map the peer (like DEV0 and DEV1 are peers) into the decode. So "The @distance for the host-bridge-cxl_port" is referring to the value "2" for host-bridge-cxl_port:HB0 and host-bridge-cxl_port:HB1 in the diagram.
Also missing '/in/ a simple dual-ported..'?
Yes to this fixup though.
host-bridge configuration with 2 direct-attached devices is 1. An x2 region divded by 2 dports to reach 2 region targets.
The second sentence seems slightly incomprehensible too.
Oh, I think I meant that to be s/. An/, i.e. an/:
"...host-bridge configuration with 2 direct-attached devices is 1, i.e. an x2 region divded by 2 dports to reach 2 region positions"
s/divded/divided/ ?
An x4 region under an x2 host-bridge would need 2 intervening switches where the @distance at the host bridge level is 2 (x4 region divided by 2 switches to reach 4 devices).
However, the distance between peers underneath a single ported host-bridge is always zero because there is no limit to the number of devices that can be mapped. In other words, there are no decoders to program in a passthrough, all descendants are mapped and distance only starts matters for the intervening descendant ports of the passthrough
starts to matter?
s/starts matters/matters/
port.
Add tracking for the number of dports mapped to a port, and use that to detect the passthrough case for calculating @distance.
Cc: stable@vger.kernel.org Reported-by: Bobo WL lmw.bobo@gmail.com Reported-by: Jonathan Cameron Jonathan.Cameron@huawei.com Link: http://lore.kernel.org/r/20221010172057.00001559@huawei.com Fixes: 27b3f8d13830 ("cxl/region: Program target lists") Signed-off-by: Dan Williams dan.j.williams@intel.com
drivers/cxl/core/port.c | 11 +++++++++-- drivers/cxl/core/region.c | 9 ++++++++- drivers/cxl/cxl.h | 2 ++ 3 files changed, 19 insertions(+), 3 deletions(-)
Other than the above, looks good,
Reviewed-by: Vishal Verma vishal.l.verma@intel.com
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c index bffde862de0b..e7556864ea80 100644 --- a/drivers/cxl/core/port.c +++ b/drivers/cxl/core/port.c @@ -811,6 +811,7 @@ static struct cxl_dport *find_dport(struct cxl_port *port, int id) static int add_dport(struct cxl_port *port, struct cxl_dport *new) { struct cxl_dport *dup; + int rc; device_lock_assert(&port->dev); dup = find_dport(port, new->port_id); @@ -821,8 +822,14 @@ static int add_dport(struct cxl_port *port, struct cxl_dport *new) dev_name(dup->dport)); return -EBUSY; } - return xa_insert(&port->dports, (unsigned long)new->dport, new, - GFP_KERNEL);
+ rc = xa_insert(&port->dports, (unsigned long)new->dport, new, + GFP_KERNEL); + if (rc) + return rc;
+ port->nr_dports++; + return 0; } /* diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index c52465e09f26..c0253de74945 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -990,7 +990,14 @@ static int cxl_port_setup_targets(struct cxl_port *port, if (cxl_rr->nr_targets_set) { int i, distance; - distance = p->nr_targets / cxl_rr->nr_targets; + /* + * Passthrough ports impose no distance requirements between + * peers + */ + if (port->nr_dports == 1) + distance = 0; + else + distance = p->nr_targets / cxl_rr->nr_targets; for (i = 0; i < cxl_rr->nr_targets_set; i++) if (ep->dport == cxlsd->target[i]) { rc = check_last_peer(cxled, ep, cxl_rr, diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 1164ad49f3d3..ac75554b5d76 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -457,6 +457,7 @@ struct cxl_pmem_region { * @regions: cxl_region_ref instances, regions mapped by this port * @parent_dport: dport that points to this port in the parent * @decoder_ida: allocator for decoder ids
- @nr_dports: number of entries in @dports
* @hdm_end: track last allocated HDM decoder instance for allocation ordering * @commit_end: cursor to track highest committed decoder for commit ordering * @component_reg_phys: component register capability base address (optional) @@ -475,6 +476,7 @@ struct cxl_port { struct xarray regions; struct cxl_dport *parent_dport; struct ida decoder_ida; + int nr_dports; int hdm_end; int commit_end; resource_size_t component_reg_phys;
Dave Jiang wrote:
On 11/4/2022 11:59 AM, Dan Williams wrote:
Verma, Vishal L wrote:
On Thu, 2022-11-03 at 17:30 -0700, Dan Williams wrote:
When programming port decode targets, the algorithm wants to ensure that two devices are compatible to be programmed as peers beneath a given port. A compatible peer is a target that shares the same dport, and where that target's interleave position also routes it to the same dport. Compatibility is determined by the device's interleave position being >= to distance. For example, if a given dport can only map every Nth position then positions less than N away from the last target programmed are incompatible.
The @distance for the host-bridge-cxl_port a simple dual-ported
^
Is this meant to be "the distance for the host-bridge to a cxl_port"?
No, but I will preface this explanation by admitting "distance" may not be the best term for what this value is describing. CXL decode routes to targets in a round robin fashion per-port. Take the diagram below:
┌───────────────────────────────────┬──┐ │WINDOW0 │x2│ └─────────┬─────────────────┬───────┴──┘ │ │ ┌─────────▼────┬──┐ ┌──────▼───────┬──┐ │HB0 │x2│ │HB1 │x2│ └──┬────────┬──┴──┘ └─┬─────────┬──┴──┘ │ │ │ │ ┌──▼─┬──┐ ┌─▼──┬──┐ ┌─▼──┬──┐ ┌─▼──┬──┐ │DEV0│x4│ │DEV1│x4│ │DEV2│x4│ │DEV3│x4│ └────┴──┘ └────┴──┘ └────┴──┘ └────┴──┘ 0 2 1 3
...where an x4 region is being established, and all the xN values are the interleave-ways settings for those ports/devices. The @distance value for that "HB0" port is 2. I.e. in order for 2 devices in that region to be mapped under HB0 they must be at position X and X+2 in the region. The algorithm needs to be flexible to also allow this configuration:
┌───────────────────────────────────┬──┐ │WINDOW0 │x2│ └─────────┬─────────────────┬───────┴──┘ │ │ ┌─────────▼────┬──┐ ┌──────▼───────┬──┐ │HB0 │x2│ │HB1 │x2│ └──┬────────┬──┴──┘ └─┬─────────┬──┴──┘ │ │ │ │ ┌──▼─┬──┐ ┌─▼──┬──┐ ┌─▼──┬──┐ ┌─▼──┬──┐ │DEV3│x4│ │DEV2│x4│ │DEV1│x4│ │DEV0│x4│ └────┴──┘ └────┴──┘ └────┴──┘ └────┴──┘ 3 1 2 0
...and the algorithm can not know that a device is in the wrong position until trying to map the peer (like DEV0 and DEV1 are peers) into the decode. So "The @distance for the host-bridge-cxl_port" is referring to the value "2" for host-bridge-cxl_port:HB0 and host-bridge-cxl_port:HB1 in the diagram.
Also missing '/in/ a simple dual-ported..'?
Yes to this fixup though.
host-bridge configuration with 2 direct-attached devices is 1. An x2 region divded by 2 dports to reach 2 region targets.
The second sentence seems slightly incomprehensible too.
Oh, I think I meant that to be s/. An/, i.e. an/:
"...host-bridge configuration with 2 direct-attached devices is 1, i.e. an x2 region divded by 2 dports to reach 2 region positions"
s/divded/divided/ ?
Yup, thanks.
linux-stable-mirror@lists.linaro.org