GRO tests are timing dependent and can easily flake. This is partially mitigated in gro.sh by giving each subtest 3 chances to pass. However, this still flakes on some machines.
Set the device's napi_defer_hard_irqs to 50 so that GRO is less likely to immediately flush. This already happened in setup_loopback.sh, but wasn't added to setup_veth.sh. This accounts for most of the reduction in flakiness.
We also increase the number of chances for success from 3 to 6.
`gro.sh -t <test>` now returns a passing/failing exit code as expected.
gro.c:main no longer erroneously claims a test passes when running as a server.
Tested: Ran `gro.sh -t large` 100 times with and without this change. Passed 100/100 with and 64/100 without. Ran inside strace to increase flakiness.
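A loop along these lines can reproduce the counts (a sketch only; the exact strace invocation is an assumption, and it relies on gro.sh returning a real exit code, which this change adds):

  # Reproduction sketch (not part of the patch): run the 'large' subtest 100
  # times under strace and count passes. Exact strace flags are an assumption.
  cd tools/testing/selftests/net
  pass=0
  for i in $(seq 1 100); do
    strace -f -o /dev/null ./gro.sh -t large >/dev/null 2>&1 && pass=$((pass + 1))
  done
  echo "passed ${pass}/100"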
Signed-off-by: Kevin Krakauer <krakauer@google.com>
---
 tools/testing/selftests/net/gro.c         | 8 +++++---
 tools/testing/selftests/net/gro.sh        | 5 +++--
 tools/testing/selftests/net/setup_veth.sh | 1 +
 3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/net/gro.c b/tools/testing/selftests/net/gro.c
index b2184847e388..d5824eadea10 100644
--- a/tools/testing/selftests/net/gro.c
+++ b/tools/testing/selftests/net/gro.c
@@ -1318,11 +1318,13 @@ int main(int argc, char **argv)
 	read_MAC(src_mac, smac);
 	read_MAC(dst_mac, dmac);
 
-	if (tx_socket)
+	if (tx_socket) {
 		gro_sender();
-	else
+	} else {
+		/* Only the receiver exit status determines test success. */
 		gro_receiver();
+		fprintf(stderr, "Gro::%s test passed.\n", testname);
+	}
 
-	fprintf(stderr, "Gro::%s test passed.\n", testname);
 	return 0;
 }
diff --git a/tools/testing/selftests/net/gro.sh b/tools/testing/selftests/net/gro.sh
index 02c21ff4ca81..703173f8c8a9 100755
--- a/tools/testing/selftests/net/gro.sh
+++ b/tools/testing/selftests/net/gro.sh
@@ -21,7 +21,7 @@ run_test() {
   # Each test is run 3 times to deflake, because given the receive timing,
   # not all packets that should coalesce will be considered in the same flow
   # on every try.
-  for tries in {1..3}; do
+  for tries in {1..6}; do
     # Actual test starts here
     ip netns exec $server_ns ./gro "${ARGS[@]}" "--rx" "--iface" "server" \
       1>>log.txt &
@@ -100,5 +100,6 @@ trap cleanup EXIT
 if [[ "${test}" == "all" ]]; then
   run_all_tests
 else
-  run_test "${proto}" "${test}"
+  exit_code=$(run_test "${proto}" "${test}")
+  exit $exit_code
 fi;
diff --git a/tools/testing/selftests/net/setup_veth.sh b/tools/testing/selftests/net/setup_veth.sh
index 1f78a87f6f37..9882ad730c24 100644
--- a/tools/testing/selftests/net/setup_veth.sh
+++ b/tools/testing/selftests/net/setup_veth.sh
@@ -12,6 +12,7 @@ setup_veth_ns() {
   [[ -e /var/run/netns/"${ns_name}" ]] || ip netns add "${ns_name}"
   echo 1000000 > "/sys/class/net/${ns_dev}/gro_flush_timeout"
+  echo 50 > "/sys/class/net/${ns_dev}/napi_defer_hard_irqs"
   ip link set dev "${ns_dev}" netns "${ns_name}" mtu 65535
   ip -netns "${ns_name}" link set dev "${ns_dev}" up
On Tue, 18 Feb 2025 08:45:55 -0800 Kevin Krakauer wrote:
> GRO tests are timing dependent and can easily flake. This is partially mitigated in gro.sh by giving each subtest 3 chances to pass. However, this still flakes on some machines.

To be clear - are you running this over veth or a real device?

> Set the device's napi_defer_hard_irqs to 50 so that GRO is less likely to immediately flush. This already happened in setup_loopback.sh, but wasn't added to setup_veth.sh. This accounts for most of the reduction in flakiness.

That doesn't make intuitive sense to me. If we already defer flushes why do we need to also defer IRQs?

> We also increase the number of chances for success from 3 to 6.
>
> `gro.sh -t <test>` now returns a passing/failing exit code as expected.
>
> gro.c:main no longer erroneously claims a test passes when running as a server.
>
> Tested: Ran `gro.sh -t large` 100 times with and without this change. Passed 100/100 with and 64/100 without. Ran inside strace to increase flakiness.
>
> Signed-off-by: Kevin Krakauer <krakauer@google.com>
> ---
>  tools/testing/selftests/net/gro.c         | 8 +++++---
>  tools/testing/selftests/net/gro.sh        | 5 +++--
>  tools/testing/selftests/net/setup_veth.sh | 1 +
>  3 files changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/tools/testing/selftests/net/gro.c b/tools/testing/selftests/net/gro.c
> index b2184847e388..d5824eadea10 100644
> --- a/tools/testing/selftests/net/gro.c
> +++ b/tools/testing/selftests/net/gro.c
> @@ -1318,11 +1318,13 @@ int main(int argc, char **argv)
>  	read_MAC(src_mac, smac);
>  	read_MAC(dst_mac, dmac);
>  
> -	if (tx_socket)
> +	if (tx_socket) {
>  		gro_sender();
> -	else
> +	} else {
> +		/* Only the receiver exit status determines test success. */
>  		gro_receiver();
> +		fprintf(stderr, "Gro::%s test passed.\n", testname);
> +	}
>  
> -	fprintf(stderr, "Gro::%s test passed.\n", testname);
That seems quite separate to the stability fix?
>  	return 0;
>  }
> diff --git a/tools/testing/selftests/net/gro.sh b/tools/testing/selftests/net/gro.sh
> index 02c21ff4ca81..703173f8c8a9 100755
> --- a/tools/testing/selftests/net/gro.sh
> +++ b/tools/testing/selftests/net/gro.sh
> @@ -21,7 +21,7 @@ run_test() {
>    # Each test is run 3 times to deflake, because given the receive timing,
>    # not all packets that should coalesce will be considered in the same flow
>    # on every try.
> -  for tries in {1..3}; do
> +  for tries in {1..6}; do
>      # Actual test starts here
>      ip netns exec $server_ns ./gro "${ARGS[@]}" "--rx" "--iface" "server" \
>        1>>log.txt &
> @@ -100,5 +100,6 @@ trap cleanup EXIT
>  if [[ "${test}" == "all" ]]; then
>    run_all_tests
>  else
> -  run_test "${proto}" "${test}"
> +  exit_code=$(run_test "${proto}" "${test}")
> +  exit $exit_code
Also separate from stability?
Let's split the patch up into logically separate changes.
>  fi;
> diff --git a/tools/testing/selftests/net/setup_veth.sh b/tools/testing/selftests/net/setup_veth.sh
> index 1f78a87f6f37..9882ad730c24 100644
> --- a/tools/testing/selftests/net/setup_veth.sh
> +++ b/tools/testing/selftests/net/setup_veth.sh
> @@ -12,6 +12,7 @@ setup_veth_ns() {
>    [[ -e /var/run/netns/"${ns_name}" ]] || ip netns add "${ns_name}"
>    echo 1000000 > "/sys/class/net/${ns_dev}/gro_flush_timeout"
> +  echo 50 > "/sys/class/net/${ns_dev}/napi_defer_hard_irqs"
>    ip link set dev "${ns_dev}" netns "${ns_name}" mtu 65535
>    ip -netns "${ns_name}" link set dev "${ns_dev}" up
Thanks for the review! I'll split this up. Do you think it's better as two patchsets -- one for stability/deflaking, one for return value and output cleanup -- or as a single patchset with several commits?
> To be clear - are you running this over veth or a real device?

Over a veth.

>> Set the device's napi_defer_hard_irqs to 50 so that GRO is less likely to immediately flush. This already happened in setup_loopback.sh, but wasn't added to setup_veth.sh. This accounts for most of the reduction in flakiness.
>
> That doesn't make intuitive sense to me. If we already defer flushes why do we need to also defer IRQs?
Yep, the behavior here is weird. I ran `gro.sh -t large` 1000 times with each of the following setups (all inside strace to increase flakiness):
- gro_flush_timeout=1ms, napi_defer_hard_irqs=0 --> failed to GRO 29 times
- gro_flush_timeout=5ms, napi_defer_hard_irqs=0 --> failed to GRO 45 times
- gro_flush_timeout=50ms, napi_defer_hard_irqs=0 --> failed to GRO 35 times
- gro_flush_timeout=1ms, napi_defer_hard_irqs=1 --> failed to GRO 0 times
- gro_flush_timeout=1ms, napi_defer_hard_irqs=50 --> failed to GRO 0 times
napi_defer_hard_irqs is clearly having an effect. And deferring once is enough. I believe that deferring IRQs prevents anything else from causing a GRO flush before gro_flush_timeout expires. While waiting for the timeout to expire, an incoming packet can cause napi_complete_done and thus napi_gro_flush to run. Outgoing packets from the veth can also cause this: veth_xmit calls __veth_xdp_flush, which only actually does anything when IRQs are enabled.
So napi_defer_hard_irqs=1 seems sufficient to allow the full gro_flush_timeout to expire before flushing GRO.
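For reference, a minimal sketch of how the knobs were toggled between runs (the device name and helper function below are placeholders, not the exact names from setup_veth.sh; gro_flush_timeout takes nanoseconds, so 1000000 is the 1ms case above):

  # Sketch only: set the two NAPI/GRO knobs on one veth endpoint.
  # DEV is a placeholder device name, not the one setup_veth.sh creates.
  DEV=veth0

  set_gro_knobs() {
    local flush_timeout_ns="$1"  # e.g. 1000000 (1ms), 5000000 (5ms), 50000000 (50ms)
    local defer_irqs="$2"        # e.g. 0, 1, or 50
    echo "${flush_timeout_ns}" > "/sys/class/net/${DEV}/gro_flush_timeout"
    echo "${defer_irqs}" > "/sys/class/net/${DEV}/napi_defer_hard_irqs"
  }

  # The configuration that never failed above: 1ms timeout, defer IRQs once.
  set_gro_knobs 1000000 1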
On Sun, 23 Feb 2025 07:19:49 -0800 Kevin Krakauer wrote:
> Thanks for the review! I'll split this up. Do you think it's better as two patchsets -- one for stability/deflaking, one for return value and output cleanup -- or as a single patchset with several commits?
Should be fine either way, they will both end up in net-next. One patchset may be easier to merge, as we can't CI-test two conflicting series on the list.
>> To be clear - are you running this over veth or a real device?
>
> Over a veth.
>
>>> Set the device's napi_defer_hard_irqs to 50 so that GRO is less likely to immediately flush. This already happened in setup_loopback.sh, but wasn't added to setup_veth.sh. This accounts for most of the reduction in flakiness.
>>
>> That doesn't make intuitive sense to me. If we already defer flushes why do we need to also defer IRQs?
>
> Yep, the behavior here is weird. I ran `gro.sh -t large` 1000 times with each of the following setups (all inside strace to increase flakiness):
>
> - gro_flush_timeout=1ms, napi_defer_hard_irqs=0 --> failed to GRO 29 times
> - gro_flush_timeout=5ms, napi_defer_hard_irqs=0 --> failed to GRO 45 times
> - gro_flush_timeout=50ms, napi_defer_hard_irqs=0 --> failed to GRO 35 times
> - gro_flush_timeout=1ms, napi_defer_hard_irqs=1 --> failed to GRO 0 times
> - gro_flush_timeout=1ms, napi_defer_hard_irqs=50 --> failed to GRO 0 times
>
> napi_defer_hard_irqs is clearly having an effect. And deferring once is enough. I believe that deferring IRQs prevents anything else from causing a GRO flush before gro_flush_timeout expires. While waiting for the timeout to expire, an incoming packet can cause napi_complete_done and thus napi_gro_flush to run. Outgoing packets from the veth can also cause this: veth_xmit calls __veth_xdp_flush, which only actually does anything when IRQs are enabled.
>
> So napi_defer_hard_irqs=1 seems sufficient to allow the full gro_flush_timeout to expire before flushing GRO.
With msec-long deferrals we'll flush due to jiffies change. At least that explains a bit. Could you maybe try lower timeouts than 1msec? Previously we'd just keep partially-completed packets in GRO for up to 1msec, now we'll delay all packet processing for 1msec, that's a lot.
On Mon, Feb 24, 2025 at 12:48:30PM -0800, Jakub Kicinski wrote:
> With msec-long deferrals we'll flush due to jiffies change. At least that explains a bit. Could you maybe try lower timeouts than 1msec? Previously we'd just keep partially-completed packets in GRO for up to 1msec, now we'll delay all packet processing for 1msec, that's a lot.
Results again with each test run 1000 times:
gro_flush_timeout=50us  napi_defer_hard_irqs=1 --> failed to GRO 0 times
gro_flush_timeout=100us napi_defer_hard_irqs=1 --> failed to GRO 0 times

gro_flush_timeout=50us  napi_defer_hard_irqs=0 --> failed to GRO 36 times
gro_flush_timeout=100us napi_defer_hard_irqs=0 --> failed to GRO 46 times
100us with 1 defer seems to work fine and is well below the duration of a jiffy. So we'll usually be testing the "default" GRO path and only occasionally the jiffy-update path. I'll make these the numbers in the revised patch unless someone thinks otherwise.
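In setup_veth.sh terms that would look roughly like this (a sketch of the intended values, not the actual revised patch; assuming gro_flush_timeout keeps its nanosecond units, where the current 1000000 is 1ms):

  # Sketch of the planned values: 100us GRO flush timeout, defer hard IRQs once.
  echo 100000 > "/sys/class/net/${ns_dev}/gro_flush_timeout"   # 100us in ns
  echo 1 > "/sys/class/net/${ns_dev}/napi_defer_hard_irqs"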
Kevin Krakauer <krakauer@google.com> writes:
> diff --git a/tools/testing/selftests/net/gro.sh b/tools/testing/selftests/net/gro.sh
> index 02c21ff4ca81..703173f8c8a9 100755
> --- a/tools/testing/selftests/net/gro.sh
> +++ b/tools/testing/selftests/net/gro.sh
> @@ -21,7 +21,7 @@ run_test() {
>    # Each test is run 3 times to deflake, because given the receive timing,

This comment should be updated to match:

-  # Each test is run 3 times to deflake, because given the receive timing,
+  # Each test is run 6 times to deflake, because given the receive timing,

>    # not all packets that should coalesce will be considered in the same flow
>    # on every try.
> -  for tries in {1..3}; do
> +  for tries in {1..6}; do
>      # Actual test starts here
>      ip netns exec $server_ns ./gro "${ARGS[@]}" "--rx" "--iface" "server" \
>        1>>log.txt &