Hi,
syzbot reported a circular locking dependency in the NET/ROM routing code involving nr_neigh_list_lock, nr_node_list_lock and nr_node->node_lock when nr_rt_device_down() interacts with the ioctl path. This series fixes that deadlock and also addresses a long-standing reference count leak found while auditing the same code.
Patch 1/2 refactors nr_rt_device_down() to avoid nested locking between nr_neigh_list_lock and nr_node_list_lock by doing two separate passes over nodes and neighbours, and adjusts nr_rt_free() to follow the same lock ordering.
Patch 2/2 fixes a per-route reference count leak by dropping nr_neigh->count and calling nr_neigh_put() when removing routes from nr_rt_device_down(), mirroring the behaviour of nr_dec_obs()/nr_del_node().
[1] https://syzkaller.appspot.com/bug?extid=14afda08dc3484d5db82
Thanks, Junjie
syzbot reported a circular locking dependency involving nr_neigh_list_lock, nr_node_list_lock and nr_node->node_lock in the NET/ROM routing code [1].
One of the problematic scenarios looks like this:
CPU0 CPU1 ---- ---- nr_rt_device_down() nr_rt_ioctl() lock(nr_neigh_list_lock); nr_del_node() ... lock(nr_node_list_lock); lock(nr_node_list_lock); nr_remove_neigh(); lock(nr_neigh_list_lock);
This creates the following lock chain:
nr_neigh_list_lock -> nr_node_list_lock -> &nr_node->node_lock
while the ioctl path may acquire the locks in the opposite order via nr_dec_obs()/nr_del_node(), which makes lockdep complain about a possible deadlock.
Refactor nr_rt_device_down() to avoid nested locking of nr_neigh_list_lock and nr_node_list_lock. The function now performs two separate passes: one that walks all nodes under nr_node_list_lock and drops routes, and a second one that removes unused neighbours under nr_neigh_list_lock.
Also adjust nr_rt_free() to acquire nr_node_list_lock before nr_neigh_list_lock so that the global lock ordering remains consistent.
[1] https://syzkaller.appspot.com/bug?extid=14afda08dc3484d5db82
Reported-by: syzbot+14afda08dc3484d5db82@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=14afda08dc3484d5db82 Tested-by: syzbot+14afda08dc3484d5db82@syzkaller.appspotmail.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Signed-off-by: Junjie Cao junjie.cao@intel.com --- net/netrom/nr_route.c | 65 ++++++++++++++++++++++--------------------- 1 file changed, 33 insertions(+), 32 deletions(-)
diff --git a/net/netrom/nr_route.c b/net/netrom/nr_route.c index b94cb2ffbaf8..20aacfdfccd4 100644 --- a/net/netrom/nr_route.c +++ b/net/netrom/nr_route.c @@ -508,40 +508,41 @@ void nr_rt_device_down(struct net_device *dev) { struct nr_neigh *s; struct hlist_node *nodet, *node2t; - struct nr_node *t; + struct nr_node *t; int i;
- spin_lock_bh(&nr_neigh_list_lock); - nr_neigh_for_each_safe(s, nodet, &nr_neigh_list) { - if (s->dev == dev) { - spin_lock_bh(&nr_node_list_lock); - nr_node_for_each_safe(t, node2t, &nr_node_list) { - nr_node_lock(t); - for (i = 0; i < t->count; i++) { - if (t->routes[i].neighbour == s) { - t->count--; - - switch (i) { - case 0: - t->routes[0] = t->routes[1]; - fallthrough; - case 1: - t->routes[1] = t->routes[2]; - break; - case 2: - break; - } - } - } + spin_lock_bh(&nr_node_list_lock); + nr_node_for_each_safe(t, node2t, &nr_node_list) { + nr_node_lock(t); + for (i = 0; i < t->count; i++) { + s = t->routes[i].neighbour; + if (s->dev == dev) { + t->count--;
- if (t->count <= 0) - nr_remove_node_locked(t); - nr_node_unlock(t); + switch (i) { + case 0: + t->routes[0] = t->routes[1]; + fallthrough; + case 1: + t->routes[1] = t->routes[2]; + break; + case 2: + break; + } + i--; } - spin_unlock_bh(&nr_node_list_lock); + }
+ if (t->count <= 0) + nr_remove_node_locked(t); + nr_node_unlock(t); + } + spin_unlock_bh(&nr_node_list_lock); + + spin_lock_bh(&nr_neigh_list_lock); + nr_neigh_for_each_safe(s, nodet, &nr_neigh_list) { + if (s->dev == dev) nr_remove_neigh_locked(s); - } } spin_unlock_bh(&nr_neigh_list_lock); } @@ -962,23 +963,23 @@ const struct seq_operations nr_neigh_seqops = { void nr_rt_free(void) { struct nr_neigh *s = NULL; - struct nr_node *t = NULL; + struct nr_node *t = NULL; struct hlist_node *nodet;
- spin_lock_bh(&nr_neigh_list_lock); spin_lock_bh(&nr_node_list_lock); + spin_lock_bh(&nr_neigh_list_lock); nr_node_for_each_safe(t, nodet, &nr_node_list) { nr_node_lock(t); nr_remove_node_locked(t); nr_node_unlock(t); } nr_neigh_for_each_safe(s, nodet, &nr_neigh_list) { - while(s->count) { + while (s->count) { s->count--; nr_neigh_put(s); } nr_remove_neigh_locked(s); } - spin_unlock_bh(&nr_node_list_lock); spin_unlock_bh(&nr_neigh_list_lock); + spin_unlock_bh(&nr_node_list_lock); }
When a device goes down, nr_rt_device_down() clears the routes that use this device. However, it fails to drop the per-route reference to the neighbour (nr_neigh): neither nr_neigh->count nor its refcount is decremented when the route is removed.
Mirror the behaviour of nr_dec_obs() / nr_del_node() by decrementing nr_neigh->count and calling nr_neigh_put(), so that the neighbour reference is properly released when a device goes down.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Signed-off-by: Junjie Cao junjie.cao@intel.com --- net/netrom/nr_route.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/net/netrom/nr_route.c b/net/netrom/nr_route.c index 20aacfdfccd4..35d10985fd7a 100644 --- a/net/netrom/nr_route.c +++ b/net/netrom/nr_route.c @@ -517,6 +517,9 @@ void nr_rt_device_down(struct net_device *dev) for (i = 0; i < t->count; i++) { s = t->routes[i].neighbour; if (s->dev == dev) { + s->count--; + nr_neigh_put(s); + t->count--;
switch (i) {
linux-stable-mirror@lists.linaro.org