In the following sequence: 1) of_platform_depopulate() 2) of_overlay_remove()
During the step 1, devices are destroyed and devlinks are removed. During the step 2, OF nodes are destroyed but __of_changeset_entry_destroy() can raise warnings related to missing of_node_put(): ERROR: memory leak, expected refcount 1 instead of 2 ...
Indeed, during the devlink removals performed at step 1, the removal itself releasing the device (and the attached of_node) is done by a job queued in a workqueue and so, it is done asynchronously with respect to function calls. When the warning is present, of_node_put() will be called but wrongly too late from the workqueue job.
In order to be sure that any ongoing devlink removals are done before the of_node destruction, synchronize the of_changeset_destroy() with the devlink removals.
Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal") Cc: stable@vger.kernel.org Signed-off-by: Herve Codina herve.codina@bootlin.com --- drivers/of/dynamic.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c index 3bf27052832f..169e2a9ae22f 100644 --- a/drivers/of/dynamic.c +++ b/drivers/of/dynamic.c @@ -9,6 +9,7 @@
#define pr_fmt(fmt) "OF: " fmt
+#include <linux/device.h> #include <linux/of.h> #include <linux/spinlock.h> #include <linux/slab.h> @@ -667,6 +668,12 @@ void of_changeset_destroy(struct of_changeset *ocs) { struct of_changeset_entry *ce, *cen;
+ /* + * Wait for any ongoing device link removals before destroying some of + * nodes. + */ + device_link_wait_removal(); + list_for_each_entry_safe_reverse(ce, cen, &ocs->entries, node) __of_changeset_entry_destroy(ce); }
On Wed, 2024-03-06 at 09:50 +0100, Herve Codina wrote:
In the following sequence: 1) of_platform_depopulate() 2) of_overlay_remove()
During the step 1, devices are destroyed and devlinks are removed. During the step 2, OF nodes are destroyed but __of_changeset_entry_destroy() can raise warnings related to missing of_node_put(): ERROR: memory leak, expected refcount 1 instead of 2 ...
Indeed, during the devlink removals performed at step 1, the removal itself releasing the device (and the attached of_node) is done by a job queued in a workqueue and so, it is done asynchronously with respect to function calls. When the warning is present, of_node_put() will be called but wrongly too late from the workqueue job.
In order to be sure that any ongoing devlink removals are done before the of_node destruction, synchronize the of_changeset_destroy() with the devlink removals.
Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal") Cc: stable@vger.kernel.org Signed-off-by: Herve Codina herve.codina@bootlin.com
LGTM
Reviewed-by: Nuno Sa nuno.sa@analog.com
drivers/of/dynamic.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c index 3bf27052832f..169e2a9ae22f 100644 --- a/drivers/of/dynamic.c +++ b/drivers/of/dynamic.c @@ -9,6 +9,7 @@ #define pr_fmt(fmt) "OF: " fmt +#include <linux/device.h> #include <linux/of.h> #include <linux/spinlock.h> #include <linux/slab.h> @@ -667,6 +668,12 @@ void of_changeset_destroy(struct of_changeset *ocs) { struct of_changeset_entry *ce, *cen;
- /*
* Wait for any ongoing device link removals before destroying some
of
* nodes.
*/
- device_link_wait_removal();
list_for_each_entry_safe_reverse(ce, cen, &ocs->entries, node) __of_changeset_entry_destroy(ce); }
On Wed, 6 Mar 2024 09:50:03 +0100 Herve Codina herve.codina@bootlin.com wrote:
In the following sequence:
- of_platform_depopulate()
- of_overlay_remove()
During the step 1, devices are destroyed and devlinks are removed. During the step 2, OF nodes are destroyed but __of_changeset_entry_destroy() can raise warnings related to missing of_node_put(): ERROR: memory leak, expected refcount 1 instead of 2 ...
Indeed, during the devlink removals performed at step 1, the removal itself releasing the device (and the attached of_node) is done by a job queued in a workqueue and so, it is done asynchronously with respect to function calls. When the warning is present, of_node_put() will be called but wrongly too late from the workqueue job.
In order to be sure that any ongoing devlink removals are done before the of_node destruction, synchronize the of_changeset_destroy() with the devlink removals.
Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal") Cc: stable@vger.kernel.org Signed-off-by: Herve Codina herve.codina@bootlin.com
drivers/of/dynamic.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c index 3bf27052832f..169e2a9ae22f 100644 --- a/drivers/of/dynamic.c +++ b/drivers/of/dynamic.c @@ -9,6 +9,7 @@ #define pr_fmt(fmt) "OF: " fmt +#include <linux/device.h> #include <linux/of.h> #include <linux/spinlock.h> #include <linux/slab.h> @@ -667,6 +668,12 @@ void of_changeset_destroy(struct of_changeset *ocs) { struct of_changeset_entry *ce, *cen;
- /*
* Wait for any ongoing device link removals before destroying some of
* nodes.
*/
- device_link_wait_removal();
Tested-by: Luca Ceresoli luca.ceresoli@bootlin.com
And no problem appeared in my tests due to the removed unlock/lock around device_link_wait_removal().
Luca
On Wed, Mar 6, 2024 at 12:51 AM Herve Codina herve.codina@bootlin.com wrote:
In the following sequence:
- of_platform_depopulate()
- of_overlay_remove()
During the step 1, devices are destroyed and devlinks are removed. During the step 2, OF nodes are destroyed but __of_changeset_entry_destroy() can raise warnings related to missing of_node_put(): ERROR: memory leak, expected refcount 1 instead of 2 ...
Indeed, during the devlink removals performed at step 1, the removal itself releasing the device (and the attached of_node) is done by a job queued in a workqueue and so, it is done asynchronously with respect to function calls. When the warning is present, of_node_put() will be called but wrongly too late from the workqueue job.
In order to be sure that any ongoing devlink removals are done before the of_node destruction, synchronize the of_changeset_destroy() with the devlink removals.
Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal") Cc: stable@vger.kernel.org Signed-off-by: Herve Codina herve.codina@bootlin.com
drivers/of/dynamic.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c index 3bf27052832f..169e2a9ae22f 100644 --- a/drivers/of/dynamic.c +++ b/drivers/of/dynamic.c @@ -9,6 +9,7 @@
#define pr_fmt(fmt) "OF: " fmt
+#include <linux/device.h> #include <linux/of.h> #include <linux/spinlock.h> #include <linux/slab.h> @@ -667,6 +668,12 @@ void of_changeset_destroy(struct of_changeset *ocs) { struct of_changeset_entry *ce, *cen;
/*
* Wait for any ongoing device link removals before destroying some of
* nodes.
*/
Not going to ask you to revise this patch just for this, but this comment isn't very useful. It's telling what you are doing. Not why. And the function name is already clear on what you are doing.
Maybe something like this would be better at describing the "why"? Free free to reword it.
When a device is deleted, the device links to/from it are also queued for deletion. Until these device links are freed, the devices themselves aren't freed. If the device being deleted is due to an overlay change, this device might be holding a reference to a device node that will be freed. So, wait until all already pending device links are deleted before freeing a device node. This ensures we don't free any device node that has a non-zero reference count.
Reviewed-by: Saravana Kannan saravanak@google.com
-Saravana
device_link_wait_removal();
list_for_each_entry_safe_reverse(ce, cen, &ocs->entries, node) __of_changeset_entry_destroy(ce);
}
2.43.0
linux-stable-mirror@lists.linaro.org