Hi Rafael,
On Wed, 6 Mar 2024 13:48:37 +0100 "Rafael J. Wysocki" rafael@kernel.org wrote:
On Wed, Mar 6, 2024 at 9:51 AM Herve Codina herve.codina@bootlin.com wrote:
The commit 80dd33cf72d1 ("drivers: base: Fix device link removal") introduces a workqueue to release the consumer and supplier devices used in the devlink. In the job queued, devices are release and in turn, when all the references to these devices are dropped, the release function of the device itself is called.
Nothing is present to provide some synchronisation with this workqueue in order to ensure that all ongoing releasing operations are done and so, some other operations can be started safely.
For instance, in the following sequence:
- of_platform_depopulate()
- of_overlay_remove()
During the step 1, devices are released and related devlinks are removed (jobs pushed in the workqueue). During the step 2, OF nodes are destroyed but, without any synchronisation with devlink removal jobs, of_overlay_remove() can raise warnings related to missing of_node_put(): ERROR: memory leak, expected refcount 1 instead of 2
Indeed, the missing of_node_put() call is going to be done, too late, from the workqueue job execution.
Introduce device_link_wait_removal() to offer a way to synchronize operations waiting for the end of devlink removals (i.e. end of workqueue jobs). Also, as a flushing operation is done on the workqueue, the workqueue used is moved from a system-wide workqueue to a local one.
Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal")
No, it is not fixed by this patch.
Was explicitly asked by Saravana on v1 review: https://lore.kernel.org/linux-kernel/CAGETcx9uP86EHyKJNifBMd23oCsA+KpMa+e36w...
The commit 80dd33cf72d1 introduces the workqueue and so some asynchronous tasks on removal. This patch and the next one allows to re-sync execution waiting for jobs in the workqueue when it is needed.
In fact, the only possibly observable effect of this patch is the failure when the allocation of device_link_wq fails AFAICS.
Cc: stable@vger.kernel.org
So why?
Cc:stable is needed as this patch is a prerequisite of patch 2 (needed to fix the asynchronous workqueue task issue).
Best regards, Hervé