On 25.04.2018 17:16, Rafał Miłecki wrote:
On 23.04.2018 15:08, Rafał Miłecki wrote:
I've just updated my 4.4.x kernel and noticed a regression. Bisecting pointed me to commit 2417da3f4d6bc ("ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER") [0], which is a backport of upstream 76da0704507bb. That backported commit appeared in 4.4.103.

I use the OpenWrt/LEDE [1] distribution and LXC [2] 1.1.5. After stopping a container I start getting these messages:
[ 229.419188] unregister_netdevice: waiting for lo to become free. Usage count = 1
[ 239.660408] unregister_netdevice: waiting for lo to become free. Usage count = 1
[ 249.839189] unregister_netdevice: waiting for lo to become free. Usage count = 1
(...)
Trying to start LXC anyway makes the lxc-start command hang around network configuration. Querying the LXC state afterwards makes the lxc-info command hang too.

I tried Googling for this issue and found similar reports:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1729637
https://github.com/fnproject/fn/issues/686
https://lime-technology.com/forums/topic/66863-kernelunregister_netdevice-wa...
All of them are related to Docker, which is probably a similar use case to LXC.
I couldn't find any reference to commit 76da0704507bb that would suggest a fix for the problem I'm seeing.
Does anyone have an idea what the issue I'm seeing is about? Or even better, how to fix it? Can I provide any additional info that would help?
[0] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/comm...
[1] https://openwrt.org/
[2] https://linuxcontainers.org/
Today I tried 4.14.34 to see if that helps. Unfortunately it doesn't. I still experience the same problem.
From reading various reports regarding that "unregister_netdevice: waiting for lo to become free" message it appears the problem is caused by a leaking dst refcnt somewhere in the kernel code.
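For anyone who wants to check their own logs for the same symptom, here is a rough Python sketch that scans kernel log text for that message. The regular expression is only inferred from the three lines quoted in this thread, and the names find_stuck_devices and SAMPLE are made up for illustration, so treat it as a starting point rather than a complete parser.

```python
import re

# Pattern inferred from the messages quoted in this thread (an assumption,
# not taken from the kernel source).
PATTERN = re.compile(
    r"unregister_netdevice: waiting for (?P<dev>\S+) to become free\. "
    r"Usage count = (?P<count>-?\d+)"
)


def find_stuck_devices(log_text):
    """Return {device: (occurrences, last_usage_count)} for matching lines."""
    stuck = {}
    for match in PATTERN.finditer(log_text):
        dev = match.group("dev")
        count = int(match.group("count"))
        seen, _ = stuck.get(dev, (0, 0))
        # Keep a running number of hits and the most recent usage count.
        stuck[dev] = (seen + 1, count)
    return stuck


# Sample input copied from the log lines quoted earlier in this thread.
SAMPLE = """\
[ 229.419188] unregister_netdevice: waiting for lo to become free. Usage count = 1
[ 239.660408] unregister_netdevice: waiting for lo to become free. Usage count = 1
[ 249.839189] unregister_netdevice: waiting for lo to become free. Usage count = 1
"""

if __name__ == "__main__":
    print(find_stuck_devices(SAMPLE))
```

A device that keeps showing up with a non-zero usage count across repeated messages is the one whose reference is presumably never being released.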
I found links to a few commits fixing leaks in various places:
4a31a6b19f9dd ("sctp: fix dst refcnt leak in sctp_v4_get_dst")
957d761cf91cd ("sctp: fix dst refcnt leak in sctp_v6_get_dst()")
4ee806d51176b ("net: tcp: close sock if net namespace is exiting")
d747a7a51b009 ("tcp: reset sk_rx_dst in tcp_disconnect()")
751eb6b6042a5 ("ipv6: addrconf: fix dev refcont leak when DAD failed")
All of the above patches are present in linux-v4.4.y and are part of the 4.4.124 kernel I use. So it seems I'm facing yet another dst refcnt leak.
Could commit 2417da3f4d6bc ("ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER") introduce a new dst refcnt leak? Or does it only expose an existing one?
Mathias Tillman reported this as "4.4.103 linux kernel regression". The last message in that thread (which I couldn't find in mailing list archives) had:
| As it turns out, it's due to a patch in the Turris Omnia/OpenWRT code that adds a in6_dev_get call without calling in6_dev_put.
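To illustrate why one unpaired get is enough to produce the symptom, here is a toy userspace model in Python. It is not kernel code: ToyDevice, leaky_user and balanced_user are hypothetical names, and the only thing assumed about the real in6_dev_get / in6_dev_put pair is that the first takes a reference and the second drops it.

```python
class ToyDevice:
    """Userspace stand-in for a refcounted object (illustration only)."""

    def __init__(self, name):
        self.name = name
        self.refcnt = 0

    def get(self):
        # Take a reference, loosely analogous to an in6_dev_get-style call.
        self.refcnt += 1
        return self

    def put(self):
        # Drop a reference, loosely analogous to an in6_dev_put-style call.
        if self.refcnt <= 0:
            raise RuntimeError("put without a matching get")
        self.refcnt -= 1

    def can_unregister(self):
        # In this toy model, teardown may only finish at zero references.
        return self.refcnt == 0


def balanced_user(dev):
    """Takes a reference and always releases it."""
    dev.get()
    try:
        pass  # work with the device would go here
    finally:
        dev.put()


def leaky_user(dev):
    """Takes a reference and never releases it (the bug pattern)."""
    dev.get()


if __name__ == "__main__":
    lo = ToyDevice("lo")
    balanced_user(lo)
    print(lo.can_unregister())  # no references left after a balanced user
    leaky_user(lo)
    print(lo.can_unregister(), lo.refcnt)  # one reference is now stuck
```

In the model, a single call to leaky_user leaves the count at 1 forever, which matches the shape of the "Usage count = 1" messages, although whether my case has the same cause is still an open question.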