Hi,
I've just updated my 4.4.x kernel and noticed a regression. Bisecting pointed me to commit 2417da3f4d6bc ("ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER") [0], which is a backport of upstream commit 76da0704507bb. The backport first appeared in 4.4.103.
I use the OpenWrt/LEDE [1] distribution and LXC [2] 1.1.5. After stopping a container I start getting these messages:
[ 229.419188] unregister_netdevice: waiting for lo to become free. Usage count = 1
[ 239.660408] unregister_netdevice: waiting for lo to become free. Usage count = 1
[ 249.839189] unregister_netdevice: waiting for lo to become free. Usage count = 1
(...)
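For anyone who wants to check their own logs for the same symptom, here is a minimal Python sketch that extracts the device name and usage count from messages of this form. The function name find_stuck_devices and the SAMPLE text are illustrative assumptions; the pattern is based only on the three lines quoted above and may need adjusting for other kernels.

```python
import re

# Pattern for the kernel message quoted above. The timestamp prefix is
# ignored, so the text can come from dmesg, syslog, or a pasted report.
WAIT_RE = re.compile(
    r"unregister_netdevice: waiting for (?P<dev>\S+) to become free\. "
    r"Usage count = (?P<count>-?\d+)"
)


def find_stuck_devices(log_text):
    """Return {device: last reported usage count} for every matching message."""
    stuck = {}
    for match in WAIT_RE.finditer(log_text):
        stuck[match.group("dev")] = int(match.group("count"))
    return stuck


# Sample input copied from the messages in this report.
SAMPLE = """\
[ 229.419188] unregister_netdevice: waiting for lo to become free. Usage count = 1
[ 239.660408] unregister_netdevice: waiting for lo to become free. Usage count = 1
[ 249.839189] unregister_netdevice: waiting for lo to become free. Usage count = 1
"""

if __name__ == "__main__":
    print(find_stuck_devices(SAMPLE))
```

A non-empty result means some device is still referenced after its namespace was torn down, which is the condition described in this report.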
Trying to start a container after that makes the lxc-start command hang around network configuration. Trying to query the LXC state afterwards makes the lxc-info command hang too.
I tried Googling for this issue and found similar reports:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1729637
https://github.com/fnproject/fn/issues/686
https://lime-technology.com/forums/topic/66863-kernelunregister_netdevice-wa...
All of them are related to Docker, which is probably a similar use case to LXC.
I couldn't find any follow-up referencing commit 76da0704507bb that would suggest a fix for the problem I'm seeing.
Does anyone have an idea what is causing the issue I'm seeing? Or, even better, how to fix it? Can I provide any additional info that would help?
[0] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/comm...
[1] https://openwrt.org/
[2] https://linuxcontainers.org/
On 23.04.2018 15:08, Rafał Miłecki wrote:
(...)
Today I tried 4.14.34 to see if that helps. Unfortunately it doesn't. I still experience the same problem.
From reading various reports regarding that "unregister_netdevice: waiting for lo to become free" message it appears the problem is caused by a leaking dst refcnt somewhere in the kernel code.
I found links to a few commits fixing leaks in various places:
4a31a6b19f9dd ("sctp: fix dst refcnt leak in sctp_v4_get_dst")
957d761cf91cd ("sctp: fix dst refcnt leak in sctp_v6_get_dst()")
4ee806d51176b ("net: tcp: close sock if net namespace is exiting")
d747a7a51b009 ("tcp: reset sk_rx_dst in tcp_disconnect()")
751eb6b6042a5 ("ipv6: addrconf: fix dev refcont leak when DAD failed")
All of the above patches are present in linux-v4.4.y and are part of the 4.4.124 kernel I use. So it seems I'm facing yet another dst refcnt leak.
Could commit 2417da3f4d6bc ("ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER") introduce a new dst refcnt leak? Or does it only expose an existing one?
On 25.04.2018 17:16, Rafał Miłecki wrote:
(...)
Mathias Tillman reported this as "4.4.103 linux kernel regression". The last message in that thread (which I couldn't find in mailing list archives) had:
| As it turns out, it's due to a patch in the Turris Omnia/OpenWRT code that adds a in6_dev_get call without calling in6_dev_put.
On 25.04.2018 16:30, Konstantin Khlebnikov wrote:
(...)
Wow, this is very helpful, thank you!
Somehow I didn't even think about OpenWrt downstream patches. Too bad this wasn't reported to the OpenWrt community; I spent 2 days on this. There is indeed:
target/linux/generic/patches-4.4/670-ipv6-allow-rejecting-with-source-address-failed-policy.patch
[PATCH 1/2] ipv6: allow rejecting with "source address failed policy"
I'll move the discussion of this issue to OpenWrt/LEDE now. I hope we can sort it out.
On 25 April 2018 at 16:44, Rafał Miłecki <zajec5@gmail.com> wrote:
(...)
For reference, this has been fixed in OpenWrt/LEDE by Felix in:
1) master branch: https://git.openwrt.org/?p=openwrt/openwrt.git%3Ba=commitdiff%3Bh=58f7b5b96c...
2) lede-17.01 branch: https://git.openwrt.org/?p=openwrt/openwrt.git%3Ba=commitdiff%3Bh=999bb66b20...