Paolo Abeni wrote:
On Tue, 2024-04-16 at 11:21 +0200, Paolo Abeni wrote:
On Fri, 2024-04-12 at 17:55 +0200, Richard Gobert wrote:
The {inet,ipv6}_gro_receive functions perform flush checks (ttl, flags, iph->id, ...) against all packets in a loop. These flush checks are currently used in all TCP flows and in some UDP flows in GRO.
These checks need to be done only once and only against the found p skb, since they only affect flush and not same_flow.
This patch leverages the previous commit in the series, which saves the correct network header offsets for both the outer and inner network headers, to perform these checks only once, in tcp_gro_receive and udp_gro_receive_segment. As a result, NAPI_GRO_CB(p)->flush is not used at all. In addition, the flush_id checks are more declarative and are contained in inet_gro_flush, removing the need for flush_id in napi_gro_cb.
This results in less parsing code for UDP flows and in flush checks that no longer run inside the loop for TCP flows.
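For illustration, the per-match check has roughly the following shape (a simplified sketch, not the exact patch code; the signature and the fixed-ID handling in the real helper may differ):

#include <linux/ip.h>
#include <net/gro.h>

/* Simplified sketch: compare the IPv4 headers of the new skb and of the
 * single matching packet p once, at transport level, instead of
 * accumulating these checks into NAPI_GRO_CB(p)->flush for every packet
 * in the GRO bucket loop.
 */
static inline int inet_gro_flush(const struct iphdr *iph,
				 const struct iphdr *iph2,
				 struct sk_buff *p, bool is_atomic)
{
	const u16 id = ntohs(iph->id);
	const u16 id2 = ntohs(iph2->id);
	int flush;

	/* ttl, tos and the DF bit must match for the packets to be merged */
	flush = (iph->ttl ^ iph2->ttl) | (iph->tos ^ iph2->tos) |
		(__force int)((iph->frag_off ^ iph2->frag_off) & htons(IP_DF));

	/* The IP ID either stays fixed (tolerated when DF is set) or must
	 * increment by the number of segments already aggregated in p.
	 */
	if (is_atomic)
		flush |= id ^ id2;
	else
		flush |= (u16)(id ^ (id2 + NAPI_GRO_CB(p)->count));

	return flush;
}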
To make sure the results are not within the noise range, I made netfilter drop all TCP packets and measured CPU performance in GRO (in this case GRO is responsible for about 50% of the CPU utilization).
L3 flush/flush_id checks are not relevant to UDP connections where skb_gro_receive_list is called. The only code change relevant to this flow is in inet_gro_receive. The rest of the code parsing this flow stays the same.
All concurrent connections tested use the same IP source and destination addresses.
perf top while replaying 64 concurrent IP/UDP connections (UDP fwd flow):

net-next:      3.03%  [kernel]  [k] inet_gro_receive
patch applied: 2.78%  [kernel]  [k] inet_gro_receive
Why are there no figures for udp_gro_receive_segment()/gro_network_flush() here?
Also, you should be able to observe a very high amount of CPU usage by GRO even with TCP on very high-speed links, keeping the BH/GRO on one CPU and the user-space/data copy on a different one (or using rx zero copy).
To be more explicit: I think at least the above figures are required, and I still fear the real gain in that case would range from zero to negative.
I wrote about this briefly in the commit message; sorry if I wasn't clear enough.
gro_network_flush is compiled inline into both udp_gro_receive_segment and tcp_gro_receive, and udp_gro_receive_segment is itself compiled inline into udp_gro_receive.
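For illustration, the shape is something like this (a sketch only; the helper sits in a header and the exact signatures in the patch differ):

#include <linux/ip.h>
#include <linux/ipv6.h>
#include <linux/skbuff.h>

/* Illustrative prototypes for the per-family checks; bodies not shown. */
int inet_gro_flush(const struct iphdr *iph, const struct iphdr *iph2,
		   struct sk_buff *p, bool is_atomic);
int ipv6_gro_flush(const struct ipv6hdr *iph, const struct ipv6hdr *iph2,
		   struct sk_buff *p);

/* Because the helper is static inline in a header, each caller
 * (tcp_gro_receive, udp_gro_receive_segment) gets its own inlined copy,
 * so there is no separate gro_network_flush symbol for perf to report:
 * its cost shows up inside the callers.
 */
static inline int gro_network_flush(const void *nh, const void *nh2,
				    struct sk_buff *p, bool is_atomic)
{
	const struct iphdr *iph = nh;

	if (iph->version == 4)
		return inet_gro_flush(nh, nh2, p, is_atomic);

	return ipv6_gro_flush(nh, nh2, p);
}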
The UDP numbers I posted are no longer relevant after Willem and Alexander's thread, from which we understood that flush and flush_id should be calculated for all UDP flows.
I can post new numbers for the UDP forwarding path after implementing the correct change. As for TCP, the numbers I posted stay the same.
Note that there is an increase in CPU utilization in tcp_gro_receive because of the inline compilation of gro_network_flush. The numbers make sense and show a performance improvement in the case I described, once both inet_gro_receive and tcp_gro_receive are accounted for.
If you can't do the TCP part of the testing, please provide at least the figures for a single UDP flow; that should give a better indication of the result we can expect with TCP.
Note that GRO is used mainly by TCP, and TCP packets with different src/dst ports will land in different GRO hash buckets because they have different RX hashes.
That will happen even for UDP, since at least some (most?) NICs include the UDP ports in the RX hash.
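For reference, the bucket selection in dev_gro_receive() is based on the RX hash, roughly as in this illustrative helper (not an actual kernel function):

#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Roughly how dev_gro_receive() picks the per-NAPI GRO list: the bucket is
 * derived from the RX hash, so flows differing in src/dst port normally
 * land in different buckets and each bucket's packet list stays short.
 */
static struct gro_list *gro_pick_bucket(struct napi_struct *napi,
					struct sk_buff *skb)
{
	u32 bucket = skb_get_hash_raw(skb) & (GRO_HASH_BUCKETS - 1);

	return &napi->gro_hash[bucket];
}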
Thanks,
Paolo