From: Willem de Bruijn <willemb@google.com>
The test sends packets and compares enqueue, transmit and Ack timestamps with expected values. It installs netem delays to increase latency between these points.
The test proves flaky in a virtual environment (vng). Increase the delays to reduce variance, and scale the measurement tolerance accordingly.
Time-sensitive tests are difficult to calibrate. Increasing the delays 10x also increases runtime 10x, for one. And the test may still prove flaky at some rate.
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 tools/testing/selftests/net/txtimestamp.sh | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/net/txtimestamp.sh b/tools/testing/selftests/net/txtimestamp.sh
index 31637769f59f..25baca4b148e 100755
--- a/tools/testing/selftests/net/txtimestamp.sh
+++ b/tools/testing/selftests/net/txtimestamp.sh
@@ -8,13 +8,13 @@ set -e
 
 setup() {
 	# set 1ms delay on lo egress
-	tc qdisc add dev lo root netem delay 1ms
+	tc qdisc add dev lo root netem delay 10ms
 
 	# set 2ms delay on ifb0 egress
 	modprobe ifb
 	ip link add ifb_netem0 type ifb
 	ip link set dev ifb_netem0 up
-	tc qdisc add dev ifb_netem0 root netem delay 2ms
+	tc qdisc add dev ifb_netem0 root netem delay 20ms
 
 	# redirect lo ingress through ifb0 egress
 	tc qdisc add dev lo handle ffff: ingress
@@ -24,9 +24,11 @@ setup() {
 }
 
 run_test_v4v6() {
-	# SND will be delayed 1000us
-	# ACK will be delayed 6000us: 1 + 2 ms round-trip
-	local -r args="$@ -v 1000 -V 6000"
+	# SND will be delayed 10ms
+	# ACK will be delayed 60ms: 10 + 20 ms round-trip
+	# allow +/- tolerance of 8ms
+	# wait for ACK to be queued
+	local -r args="$@ -v 10000 -V 60000 -t 8000 -S 80000"
 
 	./txtimestamp ${args} -4 -L 127.0.0.1
 	./txtimestamp ${args} -6 -L ::1
On Fri, Jan 26, 2024 at 09:31:51PM -0500, Willem de Bruijn wrote:
From: Willem de Bruijn <willemb@google.com>
The test sends packets and compares enqueue, transmit and Ack timestamps with expected values. It installs netem delays to increase latency between these points.
The test proves flaky in a virtual environment (vng). Increase the delays to reduce variance, and scale the measurement tolerance accordingly.
Time-sensitive tests are difficult to calibrate. Increasing the delays 10x also increases runtime 10x, for one. And the test may still prove flaky at some rate.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Simon Horman <horms@kernel.org>
On Fri, 26 Jan 2024 21:31:51 -0500 Willem de Bruijn wrote:
From: Willem de Bruijn <willemb@google.com>
The test sends packets and compares enqueue, transmit and Ack timestamps with expected values. It installs netem delays to increase latency between these points.
The test proves flaky in a virtual environment (vng). Increase the delays to reduce variance, and scale the measurement tolerance accordingly.
Time-sensitive tests are difficult to calibrate. Increasing the delays 10x also increases runtime 10x, for one. And the test may still prove flaky at some rate.
Willem, do you still want us to apply this as is or should we do the 10x only if [ x$KSFT_MACHINE_SLOW != x ] ?
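As a minimal sketch of that idea as it might look in setup() (the scale variable and conditional are illustrative only, not part of the posted patch; the -v/-V/-t arguments passed to txtimestamp would need the same scaling):

	# hypothetical: only apply the 10x netem delays on slow machines
	if [ "x${KSFT_MACHINE_SLOW}" != x ]; then
		scale=10
	else
		scale=1
	fi
	tc qdisc add dev lo root netem delay "$((1 * scale))ms"
	tc qdisc add dev ifb_netem0 root netem delay "$((2 * scale))ms"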
Jakub Kicinski wrote:
On Fri, 26 Jan 2024 21:31:51 -0500 Willem de Bruijn wrote:
From: Willem de Bruijn <willemb@google.com>
The test sends packets and compares enqueue, transmit and Ack timestamps with expected values. It installs netem delays to increase latency between these points.
The test proves flaky in a virtual environment (vng). Increase the delays to reduce variance, and scale the measurement tolerance accordingly.
Time-sensitive tests are difficult to calibrate. Increasing the delays 10x also increases runtime 10x, for one. And the test may still prove flaky at some rate.
Willem, do you still want us to apply this as is or should we do the 10x only if [ x$KSFT_MACHINE_SLOW != x ] ?
If the test passes on all platforms with this change, I think that's still preferable.
The only downside is that it will take 10x runtime. But that will continue on debug and virtualized builds anyway.
On the upside, the awesome dash does indicate that it passes as is on non-debug metal instances:
https://netdev.bots.linux.dev/contest.html?test=txtimestamp-sh
Let me know if you want me to use this as a testcase for $KSFT_MACHINE_SLOW.
Otherwise I'll start with the gro and so-txtime tests. They may not be so easily calibrated, as we cannot control the GRO timeout nor the FQ max horizon.
In such cases we can use the environment variable to either skip the test entirely or -- my preference -- run it to get code coverage, but suppress a failure if it is due to timing (only). Sounds good?
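A rough sketch of that second option, assuming the check is done in the shell wrapper around each txtimestamp invocation (the exact interface is not settled; this is illustrative only):

	# hypothetical: run for coverage, but do not abort the script
	# (which uses set -e) on a timing failure when the machine is slow
	if ! ./txtimestamp ${args} -4 -L 127.0.0.1; then
		if [ "x${KSFT_MACHINE_SLOW}" != x ]; then
			echo "txtimestamp -4: ignoring timing failure on slow machine"
		else
			exit 1
		fi
	fi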
On Wed, 31 Jan 2024 10:06:18 -0500 Willem de Bruijn wrote:
Willem, do you still want us to apply this as is or should we do the 10x only if [ x$KSFT_MACHINE_SLOW != x ] ?
If the test passes on all platforms with this change, I think that's still preferable.
The only downside is that it will take 10x runtime. But that will continue on debug and virtualized builds anyway.
On the upside, the awesome dash does indicate that it passes as is on non-debug metal instances:
https://netdev.bots.linux.dev/contest.html?test=txtimestamp-sh
Let me know if you want me to use this as a testcase for $KSFT_MACHINE_SLOW.
Ah, all good, I thought you were increasing the acceptance criteria.
Otherwise I'll start with the gro and so-txtime tests. They may not be so easily calibrated, as we cannot control the GRO timeout nor the FQ max horizon.
Paolo also mentioned working on GRO, maybe we need a spreadsheet for people to "reserve" broken tests to avoid duplicating work? :S
In such cases we can use the environment variable to either skip the test entirely or -- my preference -- run it to get code coverage, but suppress a failure if it is due to timing (only). Sounds good?
+1 I also think we should run and ignore failure. I was wondering if we can swap FAIL for XFAIL in those cases:
tools/testing/selftests/kselftest.h
#define KSFT_XFAIL 2
Documentation/dev-tools/ktap.rst
- "XFAIL", which indicates that a test is expected to fail. This is similar to "TODO", above, and is used by some kselftest tests.
IDK if that's a stretch or not. Or we can just return PASS with a comment?
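As an illustrative sketch only: a shell test could emit that result directly in KTAP form, mirroring KSFT_XFAIL from kselftest.h (the wording and numbering below are made up):

	# hypothetical: report an expected failure rather than FAIL
	echo "ok 1 # XFAIL txtimestamp: tolerance exceeded on slow machine"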
Jakub Kicinski wrote:
On Wed, 31 Jan 2024 10:06:18 -0500 Willem de Bruijn wrote:
Willem, do you still want us to apply this as is or should we do the 10x only if [ x$KSFT_MACHINE_SLOW != x ] ?
If the test passes on all platforms with this change, I think that's still preferable.
The only downside is that it will take 10x runtime. But that will continue on debug and virtualized builds anyway.
On the upside, the awesome dash does indicate that it passes as is on non-debug metal instances:
https://netdev.bots.linux.dev/contest.html?test=txtimestamp-sh
Let me know if you want me to use this as a testcase for $KSFT_MACHINE_SLOW.
Ah, all good, I thought you were increasing the acceptance criteria.
Otherwise I'll start with the gro and so-txtime tests. They may not be so easily calibrated, as we cannot control the GRO timeout nor the FQ max horizon.
Paolo also mentioned working on GRO, maybe we need a spreadsheet for people to "reserve" broken tests to avoid duplicating work? :S
In such cases we can use the environment variable to either skip the test entirely or -- my preference -- run it to get code coverage, but suppress a failure if it is due to timing (only). Sounds good?
+1 I also think we should run and ignore failure. I was wondering if we can swap FAIL for XFAIL in those cases:
tools/testing/selftests/kselftest.h
#define KSFT_XFAIL 2
Documentation/dev-tools/ktap.rst
- "XFAIL", which indicates that a test is expected to fail. This is similar to "TODO", above, and is used by some kselftest tests.
IDK if that's a stretch or not. Or we can just return PASS with a comment?
Flaky tests will then report both pass and expected fail. That might add noise to https://netdev.bots.linux.dev/flakes.html?
I initially considered returning skipped on timing failure. But that has the same issue.
So perhaps just return pass?
Especially for flaky tests, alternating between pass and expected-to-fail would muddle a red/green dash such as the contest page above.
On Wed, 31 Jan 2024 15:27:34 -0500 Willem de Bruijn wrote:
Jakub Kicinski wrote:
+1 I also think we should run and ignore failure. I was wondering if we can swap FAIL for XFAIL in those cases:
tools/testing/selftests/kselftest.h
#define KSFT_XFAIL 2
Documentation/dev-tools/ktap.rst
- "XFAIL", which indicates that a test is expected to fail. This is similar to "TODO", above, and is used by some kselftest tests.
IDK if that's a stretch or not. Or we can just return PASS with a comment?
Flaky tests will then report both pass and expected fail. That might add noise to https://netdev.bots.linux.dev/flakes.html?
I initially considered returning skipped on timing failure. But that has the same issue.
So perhaps just return pass?
Especially for flaky tests, alternating between pass and expected-to-fail would muddle a red/green dash such as the contest page above.
Right, we only have pass / fail / skip. (I put the "warn" result in for tests migrated from patchwork so ignore its existence for tests.)
We already treat XFAIL in KTAP as "pass". TCP-AO's key-management_ipv6 test for example already reports XFAIL:
# ok 15 # XFAIL listen() after current/rnext keys set: the socket has current/rnext keys: 100:200
Skips look somewhat similar in KTAP, "ok $number # SKIP" but we fish those out specifically to catch skips. Any other "ok .... # comment" KTAP result is treated as a "pass" right now.
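For reference, the three result forms side by side (illustrative lines only, not real output):

	ok 1 selftests: net: txtimestamp.sh
	ok 2 selftests: net: something.sh # SKIP required module missing
	ok 3 # XFAIL expected failure, counted as a pass by the parser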
Jakub Kicinski wrote:
On Wed, 31 Jan 2024 15:27:34 -0500 Willem de Bruijn wrote:
Jakub Kicinski wrote:
+1 I also think we should run and ignore failure. I was wondering if we can swap FAIL for XFAIL in those cases:
tools/testing/selftests/kselftest.h
#define KSFT_XFAIL 2
Documentation/dev-tools/ktap.rst
- "XFAIL", which indicates that a test is expected to fail. This is similar to "TODO", above, and is used by some kselftest tests.
IDK if that's a stretch or not. Or we can just return PASS with a comment?
Flaky tests will then report both pass and expected fail. That might add noise to https://netdev.bots.linux.dev/flakes.html?
I initially considered returning skipped on timing failure. But that has the same issue.
So perhaps just return pass?
Especially for flaky tests, alternating between pass and expected-to-fail would muddle a red/green dash such as the contest page above.
Right, we only have pass / fail / skip. (I put the "warn" result in for tests migrated from patchwork so ignore its existence for tests.)
We already treat XFAIL in KTAP as "pass". TCP-AO's key-management_ipv6 test for example already reports XFAIL:
Ok perfect. Then I'll do the same.
# ok 15 # XFAIL listen() after current/rnext keys set: the socket has current/rnext keys: 100:200
Skips look somewhat similar in KTAP, "ok $number # SKIP" but we fish those out specifically to catch skips. Any other "ok .... # comment" KTAP result is treated as a "pass" right now.
On Wed, 2024-01-31 at 10:06 -0500, Willem de Bruijn wrote:
Otherwise I'll start with the gro and so-txtime tests. They may not be so easily calibrated, as we cannot control the GRO timeout nor the FQ max horizon.
Note that we can control the GRO timeout to some extent, via gro_flush_timeout, see commit 89abe628375301fedb68770644df845d49018d8b.
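For reference, that control is a per-device sysfs knob; a sketch of it in use (the device name and values here are placeholders, not taken from any test):

	# hold packets in GRO for up to 1ms instead of flushing at the
	# end of every NAPI poll (gro_flush_timeout is in nanoseconds)
	echo 1 > /sys/class/net/eth0/napi_defer_hard_irqs
	echo 1000000 > /sys/class/net/eth0/gro_flush_timeout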
Unfortunately that is not enough for the 'large' gro tests. I think the root cause is that the process sending the packets can be de-scheduled (even the qemu VM itself can be de-scheduled from the hypervisor CPU), causing an extremely large gap between consecutive packets.
I guess/hope that replacing multiple sendmsg() calls with a single sendmmsg() could improve the situation a bit, but I fear it will not solve the issue completely.
In such cases we can use the environment variable to either skip the test entirely or -- my preference -- run it to get code coverage, but suppress a failure if it is due to timing (only). Sounds good?
Sounds good to me! I was wondering about skipping only the 'large' test, but suppressing the failure when KSFT_MACHINE_SLOW=yes, only for that test, looks like a better option.
Thanks!
Paolo
Hello:
This patch was applied to netdev/net-next.git (main) by Jakub Kicinski <kuba@kernel.org>:
On Fri, 26 Jan 2024 21:31:51 -0500 you wrote:
From: Willem de Bruijn <willemb@google.com>
The test sends packets and compares enqueue, transmit and Ack timestamps with expected values. It installs netem delays to increase latency between these points.
The test proves flaky in a virtual environment (vng). Increase the delays to reduce variance, and scale the measurement tolerance accordingly.
[...]
Here is the summary with links:
  - [net-next] selftests/net: calibrate txtimestamp
    https://git.kernel.org/netdev/net-next/c/5264ab612e28
You are awesome, thank you!