On 3/12/25 10:10 AM, Dmitry Safonov wrote:
Currently, tcp_ao tests have two timeouts: TEST_RETRANSMIT_SEC and TEST_TIMEOUT_SEC [by default 1 and 5 seconds]. The first one, TEST_RETRANSMIT_SEC is used for operations that are expected to succeed in order for a test to pass. It is usually not consumed and exists only to avoid indefinite test run if the operation didn't complete. The second one, TEST_RETRANSMIT_SEC exists for the tests that checking operations, that are expected to fail/timeout. It is shorter as it is fully consumed, with an expectation that if operation didn't succeed during that period, it will timeout. And the related test that expects the timeout is passing. The actual operation failure is then cross-verified by other means like counters checks.
The issue with TEST_RETRANSMIT_SEC timeout is that 1 second is the exact initial TCP timeout. So, in case the initial segment gets lost (quite unlikely on local veth interface between two net namespaces, yet happens in slow VMs), the retransmission never happens and as a result, the test is not actually testing the functionality. Which in the end fails counters checks.
As I want tcp_ao selftests to be fast and finishing in a reasonable amount of time on manual run, I didn't consider increasing TEST_RETRANSMIT_SEC.
Rather, initially, BPF_SOCK_OPS_TIMEOUT_INIT looked promising as a lever to make the initial TCP timeout shorter. But as it's not a socket bpf attached thing, but sock_ops (attaches to cgroups), the selftests would have to use libbpf, which I wanted to avoid if not absolutely required.
Instead, use a mixed select() and counters polling mode with the longer TEST_TIMEOUT_SEC timeout to detect running-away failed tests. It actually not only allows losing segments and succeeding after the previous TEST_RETRANSMIT_SEC timeout was consumed, but makes the tests expecting timeout/failure pass faster.
The only test case taking longer (TEST_TIMEOUT_SEC) now is connect-deny "wrong snd id", which checks for no key on SYN-ACK for which there is no counter in the kernel (see tcp_make_synack()). Yet it can be speed up by poking skpair from the trace event (see trace_tcp_ao_synack_no_key).
Reported-by: Jakub Kicinski kuba@kernel.org Closes: https://lore.kernel.org/netdev/20241205070656.6ef344d7@kernel.org/ Signed-off-by: Dmitry Safonov 0x7f454c46@gmail.com
Could you please provide a suitable Fixes tag here?
Also given a good slices of the patches here are refactor, I think the whole series could land on net-next - so that we avoid putting a bit of stuff in the last 6.14-net PR - WDYT?
Thanks,
Paolo