Commit b9d5f5711dd8 ("selftests: net: increase the delay for relative cmsg_time.sh test") widened the accepted value range 8x but we still see flakes (at a rate of around 7%).
Return XFAIL for the most timing sensitive test on slow machines.
Before:
# ./cmsg_time.sh Case UDPv4 - TXTIME rel returned '8074us - 7397us < 4000', expected 'OK' FAIL - 1/36 cases failed
After:
# ./cmsg_time.sh Case UDPv4 - TXTIME rel returned '1123us - 941us < 500', expected 'OK' (XFAIL) Case UDPv6 - TXTIME rel returned '1227us - 776us < 500', expected 'OK' (XFAIL) OK
Signed-off-by: Jakub Kicinski kuba@kernel.org --- CC: shuah@kernel.org CC: linux-kselftest@vger.kernel.org CC: willemdebruijn.kernel@gmail.com --- tools/testing/selftests/net/cmsg_time.sh | 35 +++++++++++++++++------- 1 file changed, 25 insertions(+), 10 deletions(-)
diff --git a/tools/testing/selftests/net/cmsg_time.sh b/tools/testing/selftests/net/cmsg_time.sh index 1d7e756644bc..478af0aefa97 100755 --- a/tools/testing/selftests/net/cmsg_time.sh +++ b/tools/testing/selftests/net/cmsg_time.sh @@ -34,13 +34,28 @@ BAD=0 TOTAL=0
check_result() { + local ret=$1 + local got=$2 + local exp=$3 + local case=$4 + local xfail=$5 + local xf= + local inc= + + if [ "$xfail" == "xfail" ]; then + xf="(XFAIL)" + inc=0 + else + inc=1 + fi + ((TOTAL++)) - if [ $1 -ne 0 ]; then - echo " Case $4 returned $1, expected 0" - ((BAD++)) + if [ $ret -ne 0 ]; then + echo " Case $case returned $ret, expected 0 $xf" + ((BAD+=inc)) elif [ "$2" != "$3" ]; then - echo " Case $4 returned '$2', expected '$3'" - ((BAD++)) + echo " Case $case returned '$got', expected '$exp' $xf" + ((BAD+=inc)) fi }
@@ -66,14 +81,14 @@ for i in "-4 $TGT4" "-6 $TGT6"; do awk '/SND/ { if ($3 > 1000) print "OK"; }') check_result $? "$ts" "OK" "$prot - TXTIME abs"
- [ "$KSFT_MACHINE_SLOW" = yes ] && delay=8000 || delay=1000 + [ "$KSFT_MACHINE_SLOW" = yes ] && xfail=xfail
- ts=$(ip netns exec $NS ./cmsg_sender -p $p $i 1234 -t -d $delay | + ts=$(ip netns exec $NS ./cmsg_sender -p $p $i 1234 -t -d 1000 | awk '/SND/ {snd=$3} /SCHED/ {sch=$3} - END { if (snd - sch > '$((delay/2))') print "OK"; - else print snd, "-", sch, "<", '$((delay/2))'; }') - check_result $? "$ts" "OK" "$prot - TXTIME rel" + END { if (snd - sch > 500) print "OK"; + else print snd, "-", sch, "<", 500; }') + check_result $? "$ts" "OK" "$prot - TXTIME rel" $xfail done done
On 1/16/25 03:01, Jakub Kicinski wrote:
Commit b9d5f5711dd8 ("selftests: net: increase the delay for relative cmsg_time.sh test") widened the accepted value range 8x but we still see flakes (at a rate of around 7%).
you have undid the 8x change by this commit (without mentioning that) [fine for me]
Return XFAIL for the most timing sensitive test on slow machines.
code change looks fine for me, and does exactly that, so: Reviewed-by: Przemek Kitszel przemyslaw.kitszel@intel.com
Before:
# ./cmsg_time.sh Case UDPv4 - TXTIME rel returned '8074us - 7397us < 4000', expected 'OK' FAIL - 1/36 cases failed
After:
# ./cmsg_time.sh Case UDPv4 - TXTIME rel returned '1123us - 941us < 500', expected 'OK' (XFAIL) Case UDPv6 - TXTIME rel returned '1227us - 776us < 500', expected 'OK' (XFAIL) OK
Signed-off-by: Jakub Kicinski kuba@kernel.org
Jakub Kicinski wrote:
Commit b9d5f5711dd8 ("selftests: net: increase the delay for relative cmsg_time.sh test") widened the accepted value range 8x but we still see flakes (at a rate of around 7%).
Return XFAIL for the most timing sensitive test on slow machines.
Before:
# ./cmsg_time.sh Case UDPv4 - TXTIME rel returned '8074us - 7397us < 4000', expected 'OK' FAIL - 1/36 cases failed
After:
# ./cmsg_time.sh Case UDPv4 - TXTIME rel returned '1123us - 941us < 500', expected 'OK' (XFAIL) Case UDPv6 - TXTIME rel returned '1227us - 776us < 500', expected 'OK' (XFAIL) OK
Signed-off-by: Jakub Kicinski kuba@kernel.org
Reviewed-by: Willem de Bruijn willemb@google.com
Jakub Kicinski kuba@kernel.org writes:
Commit b9d5f5711dd8 ("selftests: net: increase the delay for relative cmsg_time.sh test") widened the accepted value range 8x but we still see flakes (at a rate of around 7%).
Return XFAIL for the most timing sensitive test on slow machines.
Before:
# ./cmsg_time.sh Case UDPv4 - TXTIME rel returned '8074us - 7397us < 4000', expected 'OK' FAIL - 1/36 cases failed
After:
# ./cmsg_time.sh Case UDPv4 - TXTIME rel returned '1123us - 941us < 500', expected 'OK' (XFAIL) Case UDPv6 - TXTIME rel returned '1227us - 776us < 500', expected 'OK' (XFAIL) OK
Signed-off-by: Jakub Kicinski kuba@kernel.org
CC: shuah@kernel.org CC: linux-kselftest@vger.kernel.org CC: willemdebruijn.kernel@gmail.com
tools/testing/selftests/net/cmsg_time.sh | 35 +++++++++++++++++------- 1 file changed, 25 insertions(+), 10 deletions(-)
diff --git a/tools/testing/selftests/net/cmsg_time.sh b/tools/testing/selftests/net/cmsg_time.sh index 1d7e756644bc..478af0aefa97 100755 --- a/tools/testing/selftests/net/cmsg_time.sh +++ b/tools/testing/selftests/net/cmsg_time.sh @@ -34,13 +34,28 @@ BAD=0 TOTAL=0 check_result() {
- local ret=$1
- local got=$2
- local exp=$3
- local case=$4
- local xfail=$5
- local xf=
- local inc=
- if [ "$xfail" == "xfail" ]; then
- xf="(XFAIL)"
- inc=0
- else
- inc=1
- fi
- ((TOTAL++))
- if [ $1 -ne 0 ]; then
- echo " Case $4 returned $1, expected 0"
- ((BAD++))
- if [ $ret -ne 0 ]; then
- echo " Case $case returned $ret, expected 0 $xf"
- ((BAD+=inc)) elif [ "$2" != "$3" ]; then
- echo " Case $4 returned '$2', expected '$3'"
- ((BAD++))
- echo " Case $case returned '$got', expected '$exp' $xf"
- ((BAD+=inc)) fi
} @@ -66,14 +81,14 @@ for i in "-4 $TGT4" "-6 $TGT6"; do awk '/SND/ { if ($3 > 1000) print "OK"; }') check_result $? "$ts" "OK" "$prot - TXTIME abs"
- [ "$KSFT_MACHINE_SLOW" = yes ] && delay=8000 || delay=1000
- [ "$KSFT_MACHINE_SLOW" = yes ] && xfail=xfail
- ts=$(ip netns exec $NS ./cmsg_sender -p $p $i 1234 -t -d $delay |
- ts=$(ip netns exec $NS ./cmsg_sender -p $p $i 1234 -t -d 1000 | awk '/SND/ {snd=$3} /SCHED/ {sch=$3}
END { if (snd - sch > '$((delay/2))') print "OK";
else print snd, "-", sch, "<", '$((delay/2))'; }')
- check_result $? "$ts" "OK" "$prot - TXTIME rel"
END { if (snd - sch > 500) print "OK";
else print snd, "-", sch, "<", 500; }')
- check_result $? "$ts" "OK" "$prot - TXTIME rel" $xfail done
done
This logging and xfail handling duplicates lib.sh. Would a patch like below be OK with you? The gist of it is to just use check_err, log_test and xfail_on_slow from lib.sh to achieve what the test open-codes.
I can send it as a follow-up instead if you prefer.
--- a/tools/testing/selftests/net/cmsg_time.sh +++ b/tools/testing/selftests/net/cmsg_time.sh @@ -29,19 +29,21 @@ ip -netns $NS addr add $IP6 dev dummy0 # Need FQ for TXTIME ip netns exec $NS tc qdisc replace dev dummy0 root fq
-# Test -BAD=0 -TOTAL=0 - check_result() { - ((TOTAL++)) - if [ $1 -ne 0 ]; then - echo " Case $4 returned $1, expected 0" - ((BAD++)) - elif [ "$2" != "$3" ]; then - echo " Case $4 returned '$2', expected '$3'" - ((BAD++)) - fi + local ret=$1 + local got=$2 + local exp=$3 + local case=$4 + + RET=0 + + [ $1 -eq 0 ] + check_err $? "Case $4 returned $1, expected 0" + + [ "$2" == "$3" ] + check_err $? "Case $4 returned '$2', expected '$3'" + + log_test "$4" }
for i in "-4 $TGT4" "-6 $TGT6"; do @@ -73,15 +79,8 @@ for i in "-4 $TGT4" "-6 $TGT6"; do /SCHED/ {sch=$3} END { if (snd - sch > '$((delay/2))') print "OK"; else print snd, "-", sch, "<", '$((delay/2))'; }') - check_result $? "$ts" "OK" "$prot - TXTIME rel" + xfail_on_slow check_result $? "$ts" "OK" "$prot - TXTIME rel" done done
-# Summary -if [ $BAD -ne 0 ]; then - echo "FAIL - $BAD/$TOTAL cases failed" - exit 1 -else - echo "OK" - exit 0 -fi +exit $EXIT_STATUS
This is much more verbose, but that's how tests tend to be. I could change it to only log on RET != 0, but like this the custom results block can go away and the test is overall more median.
bash-5.2# KSFT_MACHINE_SLOW=yes ./cmsg_time.sh TEST: UDPv4 - no options [ OK ] TEST: UDPv4 - ts cnt [ OK ] TEST: UDPv4 - ts0 SCHED [ OK ] TEST: UDPv4 - ts0 SND [ OK ] TEST: UDPv4 - TXTIME abs [ OK ] TEST: UDPv4 - TXTIME rel [XFAIL] Case UDPv4 - TXTIME rel returned '34us - 30us < 4000', expected 'OK' TEST: ICMPv4 - no options [ OK ] TEST: ICMPv4 - ts cnt [ OK ] [... snip ...] bash-5.2# echo $? 0
On Fri, 17 Jan 2025 13:49:23 +0100 Petr Machata wrote:
This logging and xfail handling duplicates lib.sh. Would a patch like below be OK with you? The gist of it is to just use check_err, log_test and xfail_on_slow from lib.sh to achieve what the test open-codes.
Hm, maybe? I'm not going to say no if you send a patch, but IMHO if we were to change the output I think we should make it ktap.
Jakub Kicinski kuba@kernel.org writes:
On Fri, 17 Jan 2025 13:49:23 +0100 Petr Machata wrote:
This logging and xfail handling duplicates lib.sh. Would a patch like below be OK with you? The gist of it is to just use check_err, log_test and xfail_on_slow from lib.sh to achieve what the test open-codes.
Hm, maybe? I'm not going to say no if you send a patch, but IMHO if we were to change the output I think we should make it ktap.
Hmm, at some point, we'll want to convert lib.sh to ktap as well. Hopefully transparently so that all the existing selftests out there just magically become ktap-compatible.
If you want to preserve the output 1:1 then I feel like that's bending over backwards too much. We could still maybe reuse xfail_on_slow and ask for FAIL_TO_XFAIL, but that's marked internal in lib.sh. So I retract my proposal. Too faffy at that point.
Hello:
This patch was applied to netdev/net-next.git (main) by Jakub Kicinski kuba@kernel.org:
On Wed, 15 Jan 2025 18:01:05 -0800 you wrote:
Commit b9d5f5711dd8 ("selftests: net: increase the delay for relative cmsg_time.sh test") widened the accepted value range 8x but we still see flakes (at a rate of around 7%).
Return XFAIL for the most timing sensitive test on slow machines.
Before:
[...]
Here is the summary with links: - [net-next] selftests: net: give up on the cmsg_time accuracy on slow machines https://git.kernel.org/netdev/net-next/c/54ea680b759c
You are awesome, thank you!
linux-kselftest-mirror@lists.linaro.org