On Thu, Feb 29, 2024 at 12:18:49PM +0100, Nuno Sá wrote:
On Thu, 2024-02-29 at 11:52 +0100, Herve Codina wrote:
In the following sequence:
  1) of_platform_depopulate()
  2) of_overlay_remove()
During step 1, devices are destroyed and devlinks are removed. During step 2, OF nodes are destroyed, but __of_changeset_entry_destroy() can raise warnings related to a missing of_node_put():
  ERROR: memory leak, expected refcount 1 instead of 2 ...
Indeed, during the devlink removals performed at step 1, the actual release of the device (and of the attached of_node) is done by a job queued on a workqueue, so it happens asynchronously with respect to the function calls. When the warning appears, of_node_put() is still called, but too late, from the workqueue job.
To make sure that any ongoing devlink removals are done before the of_node destruction, synchronize of_overlay_remove() with the devlink removals.
Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal")
Cc: stable@vger.kernel.org
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
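For illustration only, the race described above can be modelled in userspace. This is a minimal sketch, not kernel code: `fake_node`, `deferred_put()` and `observed_refcount()` are made-up names, a pthread stands in for the workqueue job, and `pthread_join()` stands in for waiting on pending devlink removals.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for an OF node with a reference count. */
struct fake_node {
	atomic_int refcount;
};

/* Stand-in for the queued release job: drops its reference late. */
static void *deferred_put(void *arg)
{
	struct fake_node *np = arg;

	/* Busy loop to mimic the job running some time after being queued. */
	for (volatile int i = 0; i < 1000000; i++)
		;
	atomic_fetch_sub(&np->refcount, 1);
	return NULL;
}

/*
 * Returns the refcount seen at "node destruction" time.
 * wait_first != 0 models waiting for the pending job before the check.
 */
static int observed_refcount(int wait_first)
{
	struct fake_node node;
	pthread_t worker;
	int seen;

	atomic_init(&node.refcount, 2);
	if (pthread_create(&worker, NULL, deferred_put, &node))
		return -1;

	if (wait_first)
		pthread_join(worker, NULL);

	seen = atomic_load(&node.refcount);

	/* The node lives on this stack frame, so always reap the worker. */
	if (!wait_first)
		pthread_join(worker, NULL);

	return seen;
}
```

With `wait_first` set, the check always sees a refcount of 1. Without it, the check races with the deferred put and will usually still see 2, which corresponds to the "expected refcount 1 instead of 2" warning.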
 drivers/of/overlay.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
index 2ae7e9d24a64..7a010a62b9d8 100644
--- a/drivers/of/overlay.c
+++ b/drivers/of/overlay.c
@@ -8,6 +8,7 @@
 #define pr_fmt(fmt)	"OF: overlay: " fmt
+#include <linux/device.h>
This is clearly up to the DT maintainers to decide but, IMHO, I would very much prefer to see fwnode.h included in here rather than directly device.h (so yeah, renaming the function to fwnode_*).
IMO, the DT code should know almost nothing about fwnode because that's the layer above it. But then overlay stuff is kind of a layer above the core DT code too.
But yeah, I might be biased by my own series :)
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/of.h>
@@ -853,6 +854,14 @@ static void free_overlay_changeset(struct overlay_changeset *ovcs)
 {
 	int i;
+	/*
+	 * Wait for any ongoing device link removals before removing some of
+	 * nodes. Drop the global lock while waiting
+	 */
+	mutex_unlock(&of_mutex);
+	device_link_wait_removal();
+	mutex_lock(&of_mutex);
I'm still not convinced we need to drop the lock. What happens if someone else grabs the lock while we are in device_link_wait_removal()? Can we guarantee that we can't screw things up badly?
It is also just ugly because it's the callers of free_overlay_changeset() that hold the lock and now we're releasing it behind their back.
As device_link_wait_removal() is called before we touch anything, can't it be called before we take the lock? And do we need to call it if applying the overlay fails?
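To make the concern concrete, here is a rough userspace model, not kernel code and not a proposed patch. All names (`fake_mutex`, `shared_state`, `intruder()`, `fake_wait()`) are made up. To keep the result deterministic, the stand-in wait simply joins the other lock user, which forces the worst-case interleaving.

```c
#include <pthread.h>

static pthread_mutex_t fake_mutex = PTHREAD_MUTEX_INITIALIZER;
static int shared_state;	/* protected by fake_mutex */

/* Another user of the lock that modifies the protected state. */
static void *intruder(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&fake_mutex);
	shared_state++;
	pthread_mutex_unlock(&fake_mutex);
	return NULL;
}

/* Stand-in for the wait; here it lets the other lock user run to the end. */
static void fake_wait(pthread_t other)
{
	pthread_join(other, NULL);
}

/* Callee drops the caller's lock while waiting. Returns 1 if state changed. */
static int state_changed_drop_lock(void)
{
	pthread_t other;
	int before, after;

	pthread_mutex_lock(&fake_mutex);	/* taken by the "caller" */
	before = shared_state;
	if (pthread_create(&other, NULL, intruder, NULL)) {
		pthread_mutex_unlock(&fake_mutex);
		return -1;
	}

	pthread_mutex_unlock(&fake_mutex);	/* released behind the caller's back */
	fake_wait(other);
	pthread_mutex_lock(&fake_mutex);

	after = shared_state;
	pthread_mutex_unlock(&fake_mutex);
	return before != after;
}

/* Wait before taking the lock; the critical section is never interrupted. */
static int state_changed_wait_first(void)
{
	pthread_t other;
	int before, after;

	if (pthread_create(&other, NULL, intruder, NULL))
		return -1;
	fake_wait(other);

	pthread_mutex_lock(&fake_mutex);
	before = shared_state;
	/* ... the whole critical section runs with the lock held ... */
	after = shared_state;
	pthread_mutex_unlock(&fake_mutex);
	return before != after;
}
```

In the first variant, whatever the caller read under the lock may be stale once the callee returns. In the second, the wait happens before anything is read, so the caller's view stays consistent for the whole critical section.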
Rob