From: Xiaoyan Li lixiaoyan@google.com
When compound pages are enabled, although the mm layer still returns an array of page pointers, a subset (or all) of them may have the same page head since a max 180kb skb can span 2 hugepages if it is on the boundary, be a mix of pages and 1 hugepage, or fit completely in a hugepage. Instead of referencing page head on all page pointers, use page length arithmetic to only call page head when referencing a known different page head to avoid touching a cold cacheline.
Tested: See next patch with changes to tcp_mmap
Correntess: On a pair of separate hosts as send with MSG_ZEROCOPY will force a copy on tx if using loopback alone, check that the SHA on the message sent is equivalent to checksum on the message received, since the current program already checks for the length.
echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages ./tcp_mmap -s -z ./tcp_mmap -H $DADDR -z
SHA256 is correct received 2 MB (100 % mmap'ed) in 0.005914 s, 2.83686 Gbit cpu usage user:0.001984 sys:0.000963, 1473.5 usec per MB, 10 c-switches
Performance: Run neper between adjacent hosts with the same config tcp_stream -Z --skip-rx-copy -6 -T 20 -F 1000 --stime-use-proc --test-length=30
Before patch: stime_end=37.670000 After patch: stime_end=30.310000
Signed-off-by: Coco Li lixiaoyan@google.com --- net/core/datagram.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/net/core/datagram.c b/net/core/datagram.c index e4ff2db40c98..5662dff3d381 100644 --- a/net/core/datagram.c +++ b/net/core/datagram.c @@ -622,12 +622,12 @@ int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk, frag = skb_shinfo(skb)->nr_frags;
while (length && iov_iter_count(from)) { + struct page *head, *last_head = NULL; struct page *pages[MAX_SKB_FRAGS]; - struct page *last_head = NULL; + int refs, order, n = 0; size_t start; ssize_t copied; unsigned long truesize; - int refs, n = 0;
if (frag == MAX_SKB_FRAGS) return -EMSGSIZE; @@ -650,9 +650,17 @@ int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk, } else { refcount_add(truesize, &skb->sk->sk_wmem_alloc); } + + head = compound_head(pages[n]); + order = compound_order(head); + for (refs = 0; copied != 0; start = 0) { int size = min_t(int, copied, PAGE_SIZE - start); - struct page *head = compound_head(pages[n]); + + if (pages[n] - head > (1UL << order) - 1) { + head = compound_head(pages[n]); + order = compound_order(head); + }
start += (pages[n] - head) << PAGE_SHIFT; copied -= size;