On Tue, Mar 5, 2024 at 2:43 AM Nuno Sá noname.nuno@gmail.com wrote:
On Tue, 2024-03-05 at 11:27 +0100, Herve Codina wrote:
Hi Nuno, Saravana, Rob,
On Tue, 05 Mar 2024 08:36:45 +0100 Nuno Sá noname.nuno@gmail.com wrote:
On Mon, 2024-03-04 at 22:47 -0800, Saravana Kannan wrote:
On Mon, Mar 4, 2024 at 8:49 AM Herve Codina herve.codina@bootlin.com wrote:
Hi Rob,
On Mon, 4 Mar 2024 09:22:02 -0600 Rob Herring robh@kernel.org wrote:
...
> > @@ -853,6 +854,14 @@ static void free_overlay_changeset(struct > > overlay_changeset *ovcs) > > { > > int i; > > > > + /* > > + * Wait for any ongoing device link removals before removing > > some of > > + * nodes. Drop the global lock while waiting > > + */ > > + mutex_unlock(&of_mutex); > > + device_link_wait_removal(); > > + mutex_lock(&of_mutex); > > I'm still not convinced we need to drop the lock. What happens if > someone else > grabs the lock while we are in device_link_wait_removal()? Can we > guarantee that > we can't screw things badly?
It is also just ugly because it's the callers of free_overlay_changeset() that hold the lock and now we're releasing it behind their back.
As device_link_wait_removal() is called before we touch anything, can't it be called before we take the lock? And do we need to call it if applying the overlay fails?
Rob,
This[1] scenario Luca reported seems like a reason for the device_link_wait_removal() to be where Herve put it. That example seems reasonable.
[1] - https://lore.kernel.org/all/20231220181627.341e8789@booty/
I'm still not totally convinced about that. Why not putting the check right before checking the kref in __of_changeset_entry_destroy(). I'll contradict myself a bit because this is just theory but if we look at pci_stop_dev(), which AFAIU, could be reached from a sysfs write(), we have:
device_release_driver(&dev->dev); ... of_pci_remove_node(dev); of_changeset_revert(np->data); of_changeset_destroy(np->data);
So looking at the above we would hit the same issue if we flush the queue in free_overlay_changeset() - as the queue won't be flushed at all and we could have devlink removal due to device_release_driver(). Right?
Again, completely theoretical but seems like a reasonable one plus I'm not understanding the push against having the flush in __of_changeset_entry_destroy(). Conceptually, it looks the best place to me but I may be missing some issue in doing it there?
Instead of having the wait called in __of_changeset_entry_destroy() and so called in a loop. I could move this call in the __of_changeset_entry_destroy() caller (without any of_mutex lock drop).
Oh, good catch! At this point all the devlinks removals (related to the changeset) should have been queued so yes, we should only need to flush once.
So this will look like this: --- 8< --- void of_changeset_destroy(struct of_changeset *ocs) { struct of_changeset_entry *ce, *cen;
device_link_wait_removal(); list_for_each_entry_safe_reverse(ce, cen, &ocs->entries, node) __of_changeset_entry_destroy(ce);
} --- 8< ---
I already tested on my system and it works correctly with device_link_wait_removal() only called from of_changeset_destroy() as proposed.
Saravana, Nuno, Rob does it seems ok for you ?
Looks good to me.
-Saravana
It looks good to me...
- Nuno Sá