The following tests are failing on debug kernels:
tcp_tcp_info_tcp-info-rwnd-limited.pkt tcp_tcp_info_tcp-info-sndbuf-limited.pkt
with reports like:
assert 19000 <= tcpi_sndbuf_limited <= 21000, tcpi_sndbuf_limited; \
AssertionError: 18000
and:
assert 348000 <= tcpi_busy_time <= 360000, tcpi_busy_time
AssertionError: 362000
Extend commit 912d6f669725 ("selftests/net: packetdrill: report benign debug flakes as xfail") to cover them.
Signed-off-by: Jakub Kicinski kuba@kernel.org
---
CC: shuah@kernel.org
CC: willemb@google.com
CC: matttbe@kernel.org
CC: linux-kselftest@vger.kernel.org
---
 tools/testing/selftests/net/packetdrill/ksft_runner.sh | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/testing/selftests/net/packetdrill/ksft_runner.sh b/tools/testing/selftests/net/packetdrill/ksft_runner.sh
index ff989c325eef..e15c43b7359b 100755
--- a/tools/testing/selftests/net/packetdrill/ksft_runner.sh
+++ b/tools/testing/selftests/net/packetdrill/ksft_runner.sh
@@ -43,6 +43,7 @@ if [[ -n "${KSFT_MACHINE_SLOW}" ]]; then
 		"tcp_timestamping.*.pkt"
 		"tcp_user_timeout_user-timeout-probe.pkt"
 		"tcp_zerocopy_epoll_.*.pkt"
+		"tcp_tcp_info_tcp-info-*-limited.pkt"
 	)
 	readonly xfail_regex="^($(printf '%s|' "${xfail_list[@]}"))$"
 	[[ "$script" =~ ${xfail_regex} ]] && failfunc=ktap_test_xfail
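For readers following the diff, here is a minimal sketch of how the runner appears to turn xfail_list into one anchored regex and check a script name against it. The helper names build_xfail_regex and is_xfail are hypothetical, and the pattern list is a shortened subset of the real one. Because [[ =~ ]] in bash takes a POSIX extended regular expression rather than a shell glob, "-*" should mean "zero or more hyphens", so the sketch also tries a ".*" variant for comparison. Treat that as an observation to verify, not as part of the posted patch.

```shell
#!/bin/bash
# Sketch only: mirrors the regex construction in ksft_runner.sh.
# build_xfail_regex and is_xfail are hypothetical helper names.

build_xfail_regex() {
	# Join all arguments with '|' inside an anchored group.
	# printf leaves a trailing '|' (an empty last alternative), as in the runner.
	printf '^(%s)$' "$(printf '%s|' "$@")"
}

is_xfail() {
	local regex="$1" script="$2"
	# Right-hand side is left unquoted so bash treats it as an ERE.
	[[ "$script" =~ $regex ]]
}

# Pattern exactly as it appears in the diff.
as_posted="$(build_xfail_regex \
	"tcp_zerocopy_epoll_.*.pkt" \
	"tcp_tcp_info_tcp-info-*-limited.pkt")"

# Assumed alternative: '.*' instead of '*' (not part of the posted patch).
dot_star="$(build_xfail_regex \
	"tcp_zerocopy_epoll_.*.pkt" \
	"tcp_tcp_info_tcp-info-.*-limited.pkt")"

for name in \
	"tcp_tcp_info_tcp-info-rwnd-limited.pkt" \
	"tcp_tcp_info_tcp-info-sndbuf-limited.pkt"; do
	for label in as_posted dot_star; do
		regex="${!label}"	# indirect expansion: value of the named variable
		if is_xfail "$regex" "$name"; then
			verdict="match"
		else
			verdict="no match"
		fi
		printf '%s: %s: %s\n' "$label" "$name" "$verdict"
	done
done
```

If the as-posted pattern reports "no match" for the two tcp-info tests, a ".*" in place of "*" is the likely intent.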
Jakub Kicinski wrote:
> The following tests are failing on debug kernels:
> tcp_tcp_info_tcp-info-rwnd-limited.pkt tcp_tcp_info_tcp-info-sndbuf-limited.pkt
> with reports like:
> assert 19000 <= tcpi_sndbuf_limited <= 21000, tcpi_sndbuf_limited; \
> AssertionError: 18000
> and:
> assert 348000 <= tcpi_busy_time <= 360000, tcpi_busy_time
> AssertionError: 362000
> Extend commit 912d6f669725 ("selftests/net: packetdrill: report benign debug flakes as xfail") to cover them.
> Signed-off-by: Jakub Kicinski kuba@kernel.org
Reviewed-by: Willem de Bruijn willemb@google.com
Thanks.
I see that we'll still have a few flakes on dbg. Perhaps one total failure a day. From the following.
tcp-close-close-local-close-then-remote-fin-pkt
tcp-ecn-ecn-uses-ect0-pkt
tcp-eor-no-coalesce-retrans-pkt
tcp-slow-start-slow-start-after-win-update-pkt
tcp-sack-sack-route-refresh-ip-tos-pkt
tcp-ts-recent-reset-tsval-pkt
tcp-zerocopy-closed-pkt
We'll take a look after this change whether we can make these more resilient. But likely also allow-list or even xfail for everything in dbg.
On Thu, 16 Jan 2025 08:05:57 -0500 Willem de Bruijn wrote:
> Jakub Kicinski wrote:
>> The following tests are failing on debug kernels:
>> tcp_tcp_info_tcp-info-rwnd-limited.pkt tcp_tcp_info_tcp-info-sndbuf-limited.pkt
>> with reports like:
>> assert 19000 <= tcpi_sndbuf_limited <= 21000, tcpi_sndbuf_limited; \
>> AssertionError: 18000
>> and:
>> assert 348000 <= tcpi_busy_time <= 360000, tcpi_busy_time
>> AssertionError: 362000
>> Extend commit 912d6f669725 ("selftests/net: packetdrill: report benign debug flakes as xfail") to cover them.
>> Signed-off-by: Jakub Kicinski kuba@kernel.org
> Reviewed-by: Willem de Bruijn willemb@google.com
> Thanks.
> I see that we'll still have a few flakes on dbg. Perhaps one total failure a day. From the following.
> tcp-close-close-local-close-then-remote-fin-pkt
> tcp-ecn-ecn-uses-ect0-pkt
> tcp-eor-no-coalesce-retrans-pkt
> tcp-slow-start-slow-start-after-win-update-pkt
Argh, I missed the two above, I had the ignored cases filtered out when I was looking :(
> tcp-sack-sack-route-refresh-ip-tos-pkt
> tcp-ts-recent-reset-tsval-pkt
> tcp-zerocopy-closed-pkt
> We'll take a look after this change whether we can make these more resilient. But likely also allow-list or even xfail for everything in dbg.
Okay.
Hi Willem, Jakub,
On 16/01/2025 14:05, Willem de Bruijn wrote:
> Jakub Kicinski wrote:
>> The following tests are failing on debug kernels:
>> tcp_tcp_info_tcp-info-rwnd-limited.pkt tcp_tcp_info_tcp-info-sndbuf-limited.pkt
(...)
> We'll take a look after this change whether we can make these more resilient. But likely also allow-list or even xfail for everything in dbg.
On MPTCP side, I spent quite a bit of time trying to improve the situation on debug kernels. Sure it feels good and reassuring to have spent this time understanding the instabilities. Most issues were due to spurious retransmissions, because Packetdrill was "too slow" to inject replies: so more like an issue in the tests. But I don't know if having these tests running in such slow environments helped to find bugs directly, e.g. catching unexpected packets. Maybe once? But at what cost?
Still it is good to run them on debug kernels to have extra verifications on the kernel side. As Ido mentioned last summer, perhaps we can ignore the test results, but keep logging them, and only look at the kernel warnings?
So yes, I agree with Willem: if that cannot easily be fixed, ignoring packetdrill err code for everything in debug sounds like the right direction.
Cheers, Matt
Hello:
This patch was applied to netdev/net-next.git (main) by Jakub Kicinski kuba@kernel.org:
On Wed, 15 Jan 2025 15:21:29 -0800 you wrote:
> The following tests are failing on debug kernels:
> tcp_tcp_info_tcp-info-rwnd-limited.pkt tcp_tcp_info_tcp-info-sndbuf-limited.pkt
> with reports like:
> [...]
Here is the summary with links:
  - [net-next] selftests/net: packetdrill: make tcp buf limited timing tests benign
    https://git.kernel.org/netdev/net-next/c/3030e3d57ba8
You are awesome, thank you!