On Fri, 2024-04-12 at 17:55 +0200, Richard Gobert wrote:
{inet,ipv6}_gro_receive functions perform flush checks (ttl, flags, iph->id, ...) against all packets in a loop. These flush checks are currently used in all TCP flows and in some UDP flows in GRO.
These checks need to be done only once and only against the found p skb, since they only affect flush and not same_flow.
Leverage the previous commit in the series, in which correct network header offsets are saved for both outer and inner network headers, so that these checks can be done only once, in tcp_gro_receive and udp_gro_receive_segment. As a result, NAPI_GRO_CB(p)->flush is not used at all. In addition, flush_id checks are more declarative and contained in inet_gro_flush, thus removing the need for flush_id in napi_gro_cb.
This results in less parsing code for UDP flows and non-loop flush tests for TCP flows.
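As a rough illustration of the idea described above, here is a toy user-space model in C. It is not the kernel code: struct toy_pkt, toy_flush_check() and both receive_*() helpers are hypothetical names made up for this sketch, and the flush test is reduced to a TTL comparison plus a consecutive-ID check. It only shows why running the flush check once, against the matched packet, gives the same flush decision as running it for every L3-matching packet in the loop.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for a packet held by GRO. */
struct toy_pkt {
	uint32_t saddr, daddr;	/* L3 part of the flow key */
	uint16_t sport, dport;	/* L4 part of the flow key */
	uint8_t ttl;
	uint16_t id;		/* IPv4 ID, host byte order in this model */
};

/* Counts how many times the flush check ran (for comparison only). */
static unsigned int flush_checks_run;

/* Toy flush check: flush on TTL mismatch or non-consecutive IP ID. */
static bool toy_flush_check(const struct toy_pkt *p, const struct toy_pkt *in)
{
	flush_checks_run++;
	return p->ttl != in->ttl || (uint16_t)(p->id + 1) != in->id;
}

static bool l3_match(const struct toy_pkt *p, const struct toy_pkt *in)
{
	return p->saddr == in->saddr && p->daddr == in->daddr;
}

static bool l4_match(const struct toy_pkt *p, const struct toy_pkt *in)
{
	return p->sport == in->sport && p->dport == in->dport;
}

/* Loop style: the flush check runs for every held packet that matches at L3. */
static const struct toy_pkt *receive_loop_checks(const struct toy_pkt *held,
						 size_t n,
						 const struct toy_pkt *in,
						 bool *flush)
{
	const struct toy_pkt *match = NULL;
	size_t i;

	*flush = false;
	for (i = 0; i < n; i++) {
		bool f;

		if (!l3_match(&held[i], in))
			continue;
		f = toy_flush_check(&held[i], in);
		if (!match && l4_match(&held[i], in)) {
			match = &held[i];
			*flush = f;
		}
	}
	return match;
}

/* Single-check style: find the matching packet first, then check it once. */
static const struct toy_pkt *receive_single_check(const struct toy_pkt *held,
						  size_t n,
						  const struct toy_pkt *in,
						  bool *flush)
{
	const struct toy_pkt *match = NULL;
	size_t i;

	*flush = false;
	for (i = 0; i < n; i++) {
		if (l3_match(&held[i], in) && l4_match(&held[i], in)) {
			match = &held[i];
			break;
		}
	}
	if (match)
		*flush = toy_flush_check(match, in);
	return match;
}
```

With 64 held packets that share the same addresses and differ only in source port, the loop variant in this model runs the check 64 times per incoming packet and the single-check variant runs it once, while both return the same matched packet and the same flush value.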
To make sure results are not within noise range, I've made netfilter drop all TCP packets, and measured CPU performance in GRO (in this case GRO is responsible for about 50% of the CPU utilization).
L3 flush/flush_id checks are not relevant to UDP connections where skb_gro_receive_list is called. The only code change relevant to this flow is inet_gro_receive. The rest of the code parsing this flow stays the same.
All concurrent connections tested are with the same ip srcaddr and dstaddr.
perf top while replaying 64 concurrent IP/UDP connections (UDP fwd flow):
net-next:
        3.03%  [kernel]  [k] inet_gro_receive
patch applied:
        2.78%  [kernel]  [k] inet_gro_receive
Why are there no figures for udp_gro_receive_segment()/gro_network_flush() here?
Also, you should be able to observe very high CPU usage by GRO even with TCP on very high speed links, keeping the BH/GRO on one CPU and the user-space/data copy on a different one (or using rx zero copy).
Thanks,
Paolo