On 3/14/23 12:45, Mirsad Todorovac wrote:
Hi, all!
After running tools/testing/selftests/net/tun, there seems to be some kind of hang in the tests reported as "FAIL tun.reattach_delete_close" and "FAIL tun.reattach_close_delete".
The two tests exit by timeout, but the processes they leave behind are unkillable, even with kill -9 PID:
[root@pc-mtodorov linux_torvalds]# ps -ef | grep tun
root 1140 1 0 12:16 ? 00:00:00 /bin/bash /usr/sbin/ksmtuned
root 1333 1 0 12:16 ? 00:00:01 /usr/libexec/platform-python -Es /usr/sbin/tuned -l -P
root 3930 2309 0 12:20 pts/1 00:00:00 tools/testing/selftests/net/tun
root 3952 2309 0 12:21 pts/1 00:00:00 tools/testing/selftests/net/tun
root 4056 3765 0 12:25 pts/1 00:00:00 grep --color=auto tun
[root@pc-mtodorov linux_torvalds]# kill -9 3930 3952
[root@pc-mtodorov linux_torvalds]# ps -ef | grep tun
root 1140 1 0 12:16 ? 00:00:00 /bin/bash /usr/sbin/ksmtuned
root 1333 1 0 12:16 ? 00:00:01 /usr/libexec/platform-python -Es /usr/sbin/tuned -l -P
root 3930 2309 0 12:20 pts/1 00:00:00 tools/testing/selftests/net/tun
root 3952 2309 0 12:21 pts/1 00:00:00 tools/testing/selftests/net/tun
root 4060 3765 0 12:25 pts/1 00:00:00 grep --color=auto tun
[root@pc-mtodorov linux_torvalds]#
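One way to check whether the leftover processes are in uninterruptible sleep (state D in /proc/<pid>/stat) is sketched below. A process in that state does not act on SIGKILL until it leaves the kernel, which could be why kill -9 appears to do nothing here; that explanation is an assumption, not something confirmed in this report. The helper names parse_state and process_state are hypothetical.

```python
from pathlib import Path


def parse_state(stat_line: str) -> str:
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' before reading the state.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def process_state(pid: int) -> str:
    """Read the current state of a live process (Linux only)."""
    return parse_state(Path(f"/proc/{pid}/stat").read_text())
```

For example, print(process_state(3930)) would print D if PID 3930 is blocked in uninterruptible sleep, and raise FileNotFoundError if the process no longer exists.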
The kernel seems to be stuck in some loop, filling the log with the following message about every ten seconds until reboot. The reboot itself also waits a very long time for the situation to time out, which apparently never happens.
Mar 14 11:54:09 pc-mtodorov kernel: unregister_netdevice: waiting for tap0 to become free. Usage count = 3
Mar 14 11:54:19 pc-mtodorov kernel: unregister_netdevice: waiting for tap0 to become free. Usage count = 3
Mar 14 11:54:29 pc-mtodorov kernel: unregister_netdevice: waiting for tap0 to become free. Usage count = 3
Mar 14 11:54:40 pc-mtodorov kernel: unregister_netdevice: waiting for tap0 to become free. Usage count = 3
Mar 14 11:54:50 pc-mtodorov kernel: unregister_netdevice: waiting for tap0 to become free. Usage count = 3
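To see at a glance whether the usage count ever changes over a longer log, the messages can be collected per device, roughly as in the sketch below. The function name usage_counts is hypothetical, and the regular expression is written only against the message format quoted above. A count that never moves might point to references that are never dropped, but that is a guess rather than a diagnosis.

```python
import re

# Matches the message format seen in the log excerpt above.
PATTERN = re.compile(
    r"unregister_netdevice: waiting for (\S+) to become free\. "
    r"Usage count = (\d+)"
)


def usage_counts(log_text: str) -> dict[str, list[int]]:
    """Map each device name to the usage counts logged for it, in order."""
    counts: dict[str, list[int]] = {}
    for match in PATTERN.finditer(log_text):
        counts.setdefault(match.group(1), []).append(int(match.group(2)))
    return counts


# Two lines copied from the report, used as sample input.
SAMPLE = (
    "Mar 14 11:54:09 pc-mtodorov kernel: unregister_netdevice: "
    "waiting for tap0 to become free. Usage count = 3\n"
    "Mar 14 11:54:19 pc-mtodorov kernel: unregister_netdevice: "
    "waiting for tap0 to become free. Usage count = 3\n"
)

print(usage_counts(SAMPLE))
```

Feeding it the whole journal instead of SAMPLE would show whether any device other than tap0 is affected.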
The platform is kernel 6.3.0-rc2 on AlmaLinux 8.7 and a LENOVO_MT_10TX_BU_Lenovo_FM_V530S-07ICB (lshw output attached).
The .config is here:
https://domac.alu.hr/~mtodorov/linux/selftests/net-tun/config-6.3.0-rc2-mg-a...
Basically, it is a vanilla Torvalds tree kernel with MGLRU, KMEMLEAK, and CONFIG_DEBUG_KOBJECT enabled, plus the devres patch.
Please find the strace of the net/tun run attached.
I am available for additional diagnostics.
Hi, again!
I've been busy while waiting for a reply, so I wondered how a vanilla kernel would go through the test, considering that I've been testing a number of patches lately.
I did a fresh git clone from the repo and, surprisingly, the test with CONFIG_DEBUG_KOBJECT turned off passes:
[root@pc-mtodorov linux_torvalds]# tools/testing/selftests/net/tun
TAP version 13
1..5
# Starting 5 tests from 1 test cases.
# RUN tun.delete_detach_close ...
# OK tun.delete_detach_close
ok 1 tun.delete_detach_close
# RUN tun.detach_delete_close ...
# OK tun.detach_delete_close
ok 2 tun.detach_delete_close
# RUN tun.detach_close_delete ...
# OK tun.detach_close_delete
ok 3 tun.detach_close_delete
# RUN tun.reattach_delete_close ...
# OK tun.reattach_delete_close
ok 4 tun.reattach_delete_close
# RUN tun.reattach_close_delete ...
# OK tun.reattach_close_delete
ok 5 tun.reattach_close_delete
# PASSED: 5 / 5 tests passed.
# Totals: pass:5 fail:0 xfail:0 xpass:0 skip:0 error:0
[root@pc-mtodorov linux_torvalds]#
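For comparing runs across kernel configurations, the per-test verdicts can be pulled out of the TAP output with something like the sketch below. The function name tap_results is hypothetical, and the pattern only handles plain "ok N name" and "not ok N name" result lines, not the full TAP grammar.

```python
import re

# A result line starts with "ok" or "not ok", then the test number and name.
# Comment lines begin with "#" and are therefore skipped.
RESULT = re.compile(r"^(not ok|ok) (\d+) (\S+)", re.MULTILINE)


def tap_results(tap_text: str) -> dict[str, bool]:
    """Map each test name to True (passed) or False (failed)."""
    return {m.group(3): m.group(1) == "ok" for m in RESULT.finditer(tap_text)}


# Excerpt in the shape of the passing run; the "not ok" line is an
# invented example of what a failing result line would look like.
SAMPLE = """TAP version 13
1..5
# RUN tun.delete_detach_close ...
# OK tun.delete_detach_close
ok 1 tun.delete_detach_close
not ok 4 tun.reattach_delete_close
"""

print(tap_results(SAMPLE))
```

Diffing the dictionaries from a run with CONFIG_DEBUG_KOBJECT=y and a run without it would list exactly which tests change verdict.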
So, no hanging processes that cannot be killed now.
If you think it is worth exploring the lockup that occurs with CONFIG_DEBUG_KOBJECT=y, I will rebuild once again with these options turned on, to clear any doubts.
Until later.
Best regards, Mirsad