On Tue, May 06, 2025 at 10:12:40AM +0200, Antoine Tenart wrote:
Hello,
On Mon, May 05, 2025 at 06:12:50PM -0400, Sasha Levin wrote:
From: Antoine Tenart <atenart@kernel.org>
[ Upstream commit 79c61899b5eee317907efd1b0d06a1ada0cc00d8 ]
There is an ABBA deadlock between net device unregistration and sysfs files being accessed[1][2]. To prevent this from happening, all paths taking the rtnl lock after the sysfs one (actually the kn->active refcount) use rtnl_trylock and return early (using restart_syscall)[3], which can make syscalls spin for a long time when there is contention on the rtnl lock[4].
There are not many possibilities to improve the above:
- Rework the entire net/ locking logic.
- Invert two locks in one of the paths — not possible.
But here it's actually possible to drop one of the locks safely: the kernfs_node refcount. More details in the code itself, which comes with lots of comments.
Note that we check the device is alive in the added sysfs_rtnl_lock helper to prevent sysfs operations from running after device dismantle has started. This also helps keep the same behavior as before. Because of this, calls to dev_isalive in sysfs ops were removed.
[1] https://lore.kernel.org/netdev/49A4D5D5.5090602@trash.net/
[2] https://lore.kernel.org/netdev/m14oyhis31.fsf@fess.ebiederm.org/
[3] https://lore.kernel.org/netdev/20090226084924.16cb3e08@nehalam/
[4] https://lore.kernel.org/all/20210928125500.167943-1-atenart@kernel.org/T/
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Link: https://patch.msgid.link/20250204170314.146022-2-atenart@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
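For readers unfamiliar with the trylock-and-restart pattern the commit message describes, here is a minimal user-space sketch of its shape. It is an analogy only, not the kernel code: fake_rtnl, fake_attr_show, fake_read_attr and the ATTR_* codes are hypothetical names, a pthread mutex stands in for the rtnl lock, and a bounded retry loop stands in for the syscall being restarted.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the rtnl lock; not the kernel's rtnl mutex. */
static pthread_mutex_t fake_rtnl = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical return codes; ATTR_RESTART plays the role of restart_syscall(). */
enum { ATTR_OK = 0, ATTR_RESTART = 1 };

/*
 * Hypothetical attribute handler. It never blocks on fake_rtnl: if the
 * lock is busy it bails out and asks the caller to start over.
 */
static int fake_attr_show(int *value)
{
	if (pthread_mutex_trylock(&fake_rtnl) != 0)
		return ATTR_RESTART;

	*value = 1500; /* placeholder for reading a device attribute */
	pthread_mutex_unlock(&fake_rtnl);
	return ATTR_OK;
}

/*
 * Hypothetical syscall wrapper: retries until the handler succeeds or
 * max_attempts is exhausted, and reports how many attempts were made.
 * Under contention, this loop is where the spinning happens.
 */
static bool fake_read_attr(int *value, int max_attempts, int *attempts)
{
	for (*attempts = 1; *attempts <= max_attempts; (*attempts)++) {
		if (fake_attr_show(value) == ATTR_OK)
			return true;
	}
	*attempts = max_attempts;
	return false;
}
```

While another holder keeps fake_rtnl locked, every call to fake_read_attr burns all of its attempts without making progress, which is the contention cost the series aims to remove by taking the lock normally instead.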
I'm not sure why commits from this series were flagged for stable trees, but I would not advise taking them. They do not fix a bug; they only improve performance by reducing lock contention.
The commits are:
79c61899b5ee net-sysfs: remove rtnl_trylock from device attributes
b7ecc1de51ca net-sysfs: move queue attribute groups outside the default groups
  [It seems this one was missed?]
7e54f85c6082 net-sysfs: prevent uncleared queues from being re-added
  [My guess is that this one looks like a real fix, but it only prevents an issue that arises after the changes made in the series]
b0b6fcfa6ad8 net-sysfs: remove rtnl_trylock from queue attributes
The same applies to the other stable backport requests.
I'll drop them, thanks!