[PATCH] selftests/net: deflake GRO tests and fix return value and output

23 Feb 2025


Thanks for the review! I'll split this up. Do you think it's better as two
patchsets -- one for stability/deflaking, one for return value and output
cleanup -- or as a single patchset with several commits?
...
> To be clear - are you running this over veth or a real device?

Over a veth.

...
> > Set the device's napi_defer_hard_irqs to 50 so that GRO is less likely
> > to immediately flush. This already happened in setup_loopback.sh, but
> > wasn't added to setup_veth.sh. This accounts for most of the reduction
> > in flakiness.
>
> That doesn't make intuitive sense to me. If we already defer flushes
> why do we need to also defer IRQs?

Yep, the behavior here is weird. I ran `gro.sh -t large` 1000 times with each of
the following setups (all inside strace to increase flakiness):
- gro_flush_timeout=1ms, napi_defer_hard_irqs=0  --> failed to GRO 29 times
- gro_flush_timeout=5ms, napi_defer_hard_irqs=0  --> failed to GRO 45 times
- gro_flush_timeout=50ms, napi_defer_hard_irqs=0 --> failed to GRO 35 times
- gro_flush_timeout=1ms, napi_defer_hard_irqs=1  --> failed to GRO 0 times
- gro_flush_timeout=1ms, napi_defer_hard_irqs=50 --> failed to GRO 0 times
napi_defer_hard_irqs clearly has an effect, and deferring once is enough.
I believe that deferring IRQs prevents anything else from causing a GRO flush
before gro_flush_timeout expires. Without deferral, an incoming packet that
arrives while the timeout is pending can cause napi_complete_done, and thus
napi_gro_flush, to run. Outgoing packets from the veth can also cause this:
veth_xmit calls __veth_xdp_flush, which only actually does anything when IRQs
are enabled. So napi_defer_hard_irqs=1 seems sufficient to let the full
gro_flush_timeout expire before GRO is flushed.
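
For anyone who wants to repeat this kind of comparison, here is a rough
sketch of the two pieces involved: writing the two per-device knobs and
counting failed runs. The helper names (set_napi_knobs, count_failures) and
the SYSFS_NET override are made up for illustration and are not part of the
selftests. The sketch also assumes that the test command's exit status
reflects a GRO failure, and that the knobs are written from wherever the
device is actually configured (for the veth case, inside the test's network
namespace).

```shell
#!/bin/sh
# Sketch only: helper names and the SYSFS_NET override are hypothetical.

# Root of the per-device sysfs tree. Overridable so the helper can be
# exercised against a scratch directory instead of a live device.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# set_napi_knobs DEV FLUSH_TIMEOUT_NS DEFER_COUNT
# gro_flush_timeout is expressed in nanoseconds, so 1ms is 1000000.
set_napi_knobs() {
	local dev="$1"
	local flush_ns="$2"
	local defer="$3"

	echo "$flush_ns" > "${SYSFS_NET}/${dev}/gro_flush_timeout" || return 1
	echo "$defer" > "${SYSFS_NET}/${dev}/napi_defer_hard_irqs" || return 1
}

# count_failures RUNS CMD [ARGS...]
# Runs CMD RUNS times and prints how many of those runs exited non-zero.
count_failures() {
	local runs="$1"
	local failures=0
	local i=0
	shift

	while [ "$i" -lt "$runs" ]; do
		"$@" >/dev/null 2>&1 || failures=$((failures + 1))
		i=$((i + 1))
	done
	echo "$failures"
}
```

With these, `set_napi_knobs veth0 1000000 1` would correspond to the
gro_flush_timeout=1ms, napi_defer_hard_irqs=1 row (veth0 is a placeholder
device name), and `count_failures 1000 ./gro.sh -t large` would give the
failure count for that setup. Wrapping the command in strace, as in the runs
above, is left to the caller.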
