Thanks for the review! I'll split this up. Do you think it's better as two patchsets -- one for stability/deflaking, one for return value and output cleanup -- or as a single patchset with several commits?
To be clear - are you running this over veth or a real device?
Over a veth.
Set the device's napi_defer_hard_irqs to 50 so that GRO is less likely to immediately flush. setup_loopback.sh already does this, but it was never added to setup_veth.sh. This accounts for most of the reduction in flakiness.
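For reference, a minimal sketch of what setting these knobs looks like from a shell script. This is not the actual setup_veth.sh change. The function name and the SYSFS_ROOT override are hypothetical (the override only exists so the function can be tried against a scratch directory without root), and the values in the usage note are illustrative. It assumes the standard per-device sysfs attributes napi_defer_hard_irqs and gro_flush_timeout, the latter in nanoseconds.

```shell
#!/bin/sh
# Hypothetical helper: write the two GRO-related NAPI tunables for one device.
# SYSFS_ROOT is an assumption added for testing; it defaults to /sys.
set_gro_tunables() {
    dev="$1"               # network device name, e.g. a veth endpoint
    defer_irqs="$2"        # value for napi_defer_hard_irqs
    flush_timeout_ns="$3"  # value for gro_flush_timeout, in nanoseconds
    base="${SYSFS_ROOT:-/sys}/class/net/${dev}"

    # Bail out early if the device directory does not exist.
    [ -d "$base" ] || { echo "no such device: $dev" >&2; return 1; }

    echo "$defer_irqs" > "${base}/napi_defer_hard_irqs" || return 1
    echo "$flush_timeout_ns" > "${base}/gro_flush_timeout" || return 1
}
```

Something like `set_gro_tunables veth0 50 1000000` (run as root, with veth0 standing in for the real device name) would then request 50 deferrals and a 1 ms flush timeout.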
That doesn't make intuitive sense to me. If we already defer flushes why do we need to also defer IRQs?
Yep, the behavior here is weird. I ran `gro.sh -t large` 1000 times with each of the following setups (all inside strace to increase flakiness):
- gro_flush_timeout=1ms, napi_defer_hard_irqs=0 --> failed to GRO 29 times
- gro_flush_timeout=5ms, napi_defer_hard_irqs=0 --> failed to GRO 45 times
- gro_flush_timeout=50ms, napi_defer_hard_irqs=0 --> failed to GRO 35 times
- gro_flush_timeout=1ms, napi_defer_hard_irqs=1 --> failed to GRO 0 times
- gro_flush_timeout=1ms, napi_defer_hard_irqs=50 --> failed to GRO 0 times
napi_defer_hard_irqs is clearly having an effect, and deferring once is enough. I believe that deferring IRQs prevents anything else from triggering a GRO flush before gro_flush_timeout expires. Without the deferral, an incoming packet that arrives while we wait for the timeout can cause napi_complete_done, and thus napi_gro_flush, to run. Outgoing packets from the veth can also cause this: veth_xmit calls __veth_xdp_flush, which only actually does anything when IRQs are enabled.
So napi_defer_hard_irqs=1 seems sufficient to allow the full gro_flush_timeout to expire before flushing GRO.
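In case anyone wants to reproduce the numbers, the counting loop can be sketched roughly as below. This is not the exact script I used. The helper name count_failures is hypothetical, and it assumes that the command under test exits non-zero when it fails to GRO.

```shell
#!/bin/sh
# Hypothetical helper: run a command N times and print how many runs
# exited with a non-zero status.
count_failures() {
    runs="$1"
    shift                  # remaining arguments are the command to run
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard the command's output; only the exit status matters here.
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}
```

An invocation along the lines of `count_failures 1000 strace -f -o /dev/null ./gro.sh -t large`, repeated for each combination of sysfs settings, gives one failure count per setup.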