Zijun Hu wrote:
On 2024/10/11 13:34, Dan Williams wrote:
In support of investigating an initialization failure report [1], cxl_test was updated to register mock memory-devices after the mock root-port/bus device had been registered. That led to cxl_test crashing with a use-after-free bug with the following signature:
cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem0:decoder7.0 @ 0 next: cxl_switch_uport.0 nr_eps: 1 nr_targets: 1 cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem4:decoder14.0 @ 1 next: cxl_switch_uport.0 nr_eps: 2 nr_targets: 1 cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[0] = cxl_switch_dport.0 for mem0:decoder7.0 @ 0
- cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[1] = cxl_switch_dport.4 for mem4:decoder14.0 @ 1 [..] cxld_unregister: cxl decoder14.0: cxl_region_decode_reset: cxl_region region3: mock_decoder_reset: cxl_port port3: decoder3.0 reset
- mock_decoder_reset: cxl_port port3: decoder3.0: out of order reset, expected decoder3.1 cxl_endpoint_decoder_release: cxl decoder14.0: [..] cxld_unregister: cxl decoder7.0:
- cxl_region_decode_reset: cxl_region region3: Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6bc3: 0000 [#1] PREEMPT SMP PTI [..] RIP: 0010:to_cxl_port+0x8/0x60 [cxl_core] [..] Call Trace:
<TASK> cxl_region_decode_reset+0x69/0x190 [cxl_core] cxl_region_detach+0xe8/0x210 [cxl_core] cxl_decoder_kill_region+0x27/0x40 [cxl_core] cxld_unregister+0x5d/0x60 [cxl_core]
At 1) a region has been established with 2 endpoint decoders (7.0 and 14.0). Those endpoints share a common switch-decoder in the topology (3.0). At teardown, 2), decoder14.0 is the first to be removed and hits the "out of order reset case" in the switch decoder. The effect though is that region3 cleanup is aborted leaving it in-tact and referencing decoder14.0. At 3) the second attempt to teardown region3 trips over the stale decoder14.0 object which has long since been deleted.
The fix here is to recognize that the CXL specification places no mandate on in-order shutdown of switch-decoders, the driver enforces in-order allocation, and hardware enforces in-order commit. So, rather than fail and leave objects dangling, always remove them.
In support of making cxl_region_decode_reset() always succeed, cxl_region_invalidate_memregion() failures are turned into warnings. Crashing the kernel is ok there since system integrity is at risk if caches cannot be managed around physical address mutation events like CXL region destruction.
A new device_for_each_child_reverse_from() is added to cleanup port->commit_end after all dependent decoders have been disabled. In other words if decoders are allocated 0->1->2 and disabled 1->2->0 then port->commit_end only decrements from 2 after 2 has been disabled, and it decrements all the way to zero since 1 was disabled previously.
Link: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net [1] Cc: stable@vger.kernel.org Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Davidlohr Bueso dave@stgolabs.net Cc: Jonathan Cameron jonathan.cameron@huawei.com Cc: Dave Jiang dave.jiang@intel.com Cc: Alison Schofield alison.schofield@intel.com Cc: Ira Weiny ira.weiny@intel.com Cc: Zijun Hu zijun_hu@icloud.com Signed-off-by: Dan Williams dan.j.williams@intel.com
drivers/base/core.c | 35 +++++++++++++++++++++++++++++ drivers/cxl/core/hdm.c | 50 +++++++++++++++++++++++++++++++++++------- drivers/cxl/core/region.c | 48 +++++++++++----------------------------- drivers/cxl/cxl.h | 3 ++- include/linux/device.h | 3 +++ tools/testing/cxl/test/cxl.c | 14 ++++-------- 6 files changed, 100 insertions(+), 53 deletions(-)
diff --git a/drivers/base/core.c b/drivers/base/core.c index a4c853411a6b..e42f1ad73078 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -4037,6 +4037,41 @@ int device_for_each_child_reverse(struct device *parent, void *data, } EXPORT_SYMBOL_GPL(device_for_each_child_reverse); +/**
- device_for_each_child_reverse_from - device child iterator in reversed order.
- @parent: parent struct device.
- @from: optional starting point in child list
- @fn: function to be called for each device.
- @data: data for the callback.
- Iterate over @parent's child devices, starting at @from, and call @fn
- for each, passing it @data. This helper is identical to
- device_for_each_child_reverse() when @from is NULL.
- @fn is checked each iteration. If it returns anything other than 0,
- iteration stop and that value is returned to the caller of
- device_for_each_child_reverse_from();
- */
+int device_for_each_child_reverse_from(struct device *parent,
struct device *from, const void *data,
int (*fn)(struct device *, const void *))
+{
- struct klist_iter i;
- struct device *child;
- int error = 0;
- if (!parent->p)
return 0;
- klist_iter_init_node(&parent->p->klist_children, &i,
(from ? &from->p->knode_parent : NULL));
- while ((child = prev_device(&i)) && !error)
error = fn(child, data);
- klist_iter_exit(&i);
- return error;
+} +EXPORT_SYMBOL_GPL(device_for_each_child_reverse_from);
it does NOT deserve, also does NOT need to introduce a new core driver API device_for_each_child_reverse_from(). existing device_for_each_child_reverse() can do what the _from() wants to do.
we can use similar approach as below link shown: https://lore.kernel.org/all/20240815-const_dfc_prepare-v2-2-8316b87b8ff9@qui...
No, just have a simple starting point parameter. I understand that more logic can be placed around device_for_each_child_reverse() to achieve the same effect, but the core helpers should be removing logic from consumers, not forcing them to add more.
If bloat is a concern, then after your const cleanups go through device_for_each_child_reverse() can be rewritten in terms of device_for_each_child_reverse_from() as (untested):
diff --git a/drivers/base/core.c b/drivers/base/core.c index e42f1ad73078..2571c910da46 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -4007,36 +4007,6 @@ int device_for_each_child(struct device *parent, void *data, } EXPORT_SYMBOL_GPL(device_for_each_child);
-/** - * device_for_each_child_reverse - device child iterator in reversed order. - * @parent: parent struct device. - * @fn: function to be called for each device. - * @data: data for the callback. - * - * Iterate over @parent's child devices, and call @fn for each, - * passing it @data. - * - * We check the return of @fn each time. If it returns anything - * other than 0, we break out and return that value. - */ -int device_for_each_child_reverse(struct device *parent, void *data, - int (*fn)(struct device *dev, void *data)) -{ - struct klist_iter i; - struct device *child; - int error = 0; - - if (!parent || !parent->p) - return 0; - - klist_iter_init(&parent->p->klist_children, &i); - while ((child = prev_device(&i)) && !error) - error = fn(child, data); - klist_iter_exit(&i); - return error; -} -EXPORT_SYMBOL_GPL(device_for_each_child_reverse); - /** * device_for_each_child_reverse_from - device child iterator in reversed order. * @parent: parent struct device. diff --git a/include/linux/device.h b/include/linux/device.h index 667cb6db9019..96a2c072bf5b 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -1076,11 +1076,14 @@ DEFINE_FREE(device_del, struct device *, if (_T) device_del(_T))
int device_for_each_child(struct device *dev, void *data, int (*fn)(struct device *dev, void *data)); -int device_for_each_child_reverse(struct device *dev, void *data, - int (*fn)(struct device *dev, void *data)); int device_for_each_child_reverse_from(struct device *parent, struct device *from, const void *data, int (*fn)(struct device *, const void *)); +static inline int device_for_each_child_reverse(struct device *dev, const void *data, + int (*fn)(struct device *, const void *)) +{ + return device_for_each_child_reverse_from(dev, NULL, data, fn); +} struct device *device_find_child(struct device *dev, void *data, int (*match)(struct device *dev, void *data)); struct device *device_find_child_by_name(struct device *parent,