On Wed, Mar 6, 2024 at 4:24 PM Herve Codina herve.codina@bootlin.com wrote:
Hi Rafael,
On Wed, 6 Mar 2024 13:48:37 +0100 "Rafael J. Wysocki" rafael@kernel.org wrote:
On Wed, Mar 6, 2024 at 9:51 AM Herve Codina herve.codina@bootlin.com wrote:
The commit 80dd33cf72d1 ("drivers: base: Fix device link removal") introduces a workqueue to release the consumer and supplier devices used in the devlink. In the job queued, devices are release and in turn, when all the references to these devices are dropped, the release function of the device itself is called.
Nothing is present to provide some synchronisation with this workqueue in order to ensure that all ongoing releasing operations are done and so, some other operations can be started safely.
For instance, in the following sequence:
- of_platform_depopulate()
- of_overlay_remove()
During the step 1, devices are released and related devlinks are removed (jobs pushed in the workqueue). During the step 2, OF nodes are destroyed but, without any synchronisation with devlink removal jobs, of_overlay_remove() can raise warnings related to missing of_node_put(): ERROR: memory leak, expected refcount 1 instead of 2
Indeed, the missing of_node_put() call is going to be done, too late, from the workqueue job execution.
Introduce device_link_wait_removal() to offer a way to synchronize operations waiting for the end of devlink removals (i.e. end of workqueue jobs). Also, as a flushing operation is done on the workqueue, the workqueue used is moved from a system-wide workqueue to a local one.
Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal")
No, it is not fixed by this patch.
Was explicitly asked by Saravana on v1 review: https://lore.kernel.org/linux-kernel/CAGETcx9uP86EHyKJNifBMd23oCsA+KpMa+e36w...
Well, I don't think this is a valid request, sorry.
The commit 80dd33cf72d1 introduces the workqueue and so some asynchronous tasks on removal. This patch and the next one allows to re-sync execution waiting for jobs in the workqueue when it is needed.
I get this, but still, this particular individual patch by itself doesn't fix anything. Do you agree with this?
If somebody applies this patch without the next one in the series, they will not get any change in behavior, so the tag is at least misleading.
You can claim that the next patch on top of this one fixes things, so adding a Fixes tag to the other patch would be fine.
There is an explicit dependency between them (the second patch is not even applicable without the first one, or if it is, the resulting code won't compile anyway), but you can make a note to the maintainer that they need to go to -stable together.
In fact, the only possibly observable effect of this patch is the failure when the allocation of device_link_wq fails AFAICS.
Cc: stable@vger.kernel.org
So why?
Cc:stable is needed as this patch is a prerequisite of patch 2 (needed to fix the asynchronous workqueue task issue).
Dependencies like this can be expressed in "Cc: stable" tags. Personally, I'd do it like this:
Cc: stable@vger.kernel.org # 5.13: Depends on the first patch in the series