The attached two patches fix an issue with running BPF sockmap on TCP sockets where applications error out because the sender issues a separate send for each scatter-gather element in the msg. The other 6.x stable and longterm kernels already have this fix, so the problem is not seen there; the issue was introduced in 6.1 by the conversion to the read_skb() interface. The 5.15 stable kernels use the read_sock() interface, which does not have this issue.
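To illustrate, the 6.1 tcp_bpf_push() loop pushes each scatter-gather element into TCP with its own call, and nothing marks the non-final elements as having more data behind them (a simplified sketch condensed from the code changed below, not the exact source):

	while (1) {
		sge = sk_msg_elem(msg, msg->sg.start);
		/* ... page/off/size taken from this one element ... */
		ret = do_tcp_sendpages(sk, page, off, size, flags);
		/* each element can hit the wire as its own write, so the
		 * receiving application may see short, fragmented reads */
	}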
We missed adding the Fixes tag to the original series because the work was a code improvement and was not originally identified as a bugfix.
The first patch applies cleanly. The second does not, because it touches smc, espintcp, and siw, which do not yet use sendmsg() on 6.1; I removed those chunks of the patch since they do not apply here.
I added a Fixes tag to both patches pointing at the commit that introduced the issue. Originally I sent something similar as a single patch in which I had incorrectly merged the two patches; Greg asked me to resend this as two patches. Thanks.
David Howells (2):
  tcp_bpf: Inline do_tcp_sendpages as it's now a wrapper around tcp_sendmsg
  tcp_bpf, smc, tls, espintcp, siw: Reduce MSG_SENDPAGE_NOTLAST usage
 net/ipv4/tcp_bpf.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)
From: David Howells <dhowells@redhat.com>
[ Upstream commit ebf2e8860eea66e2c4764316b80c6a5ee5f336ee ]
do_tcp_sendpages() is now just a small wrapper around tcp_sendmsg_locked(), so inline it. This is part of replacing ->sendpage() with a call to sendmsg() with MSG_SPLICE_PAGES set.
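Schematically, each former do_tcp_sendpages(sk, page, off, size, flags) call turns into a one-element bio_vec send (a sketch of the pattern the diff below applies):

	struct bio_vec bvec;
	struct msghdr msghdr = { .msg_flags = flags | MSG_SPLICE_PAGES, };

	/* Describe the single page fragment as a one-entry bio_vec iterator. */
	bvec_set_page(&bvec, page, size, off);
	iov_iter_bvec(&msghdr.msg_iter, ITER_SOURCE, &bvec, 1, size);
	/* The caller already holds the socket lock, hence the _locked variant. */
	ret = tcp_sendmsg_locked(sk, &msghdr, size);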
Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jakub Sitnicki <jakub@cloudflare.com>
cc: David Ahern <dsahern@kernel.org>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 net/ipv4/tcp_bpf.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index f8037d142bb7..f3def363b971 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -90,11 +90,13 @@ static int tcp_bpf_push(struct sock *sk, struct sk_msg *msg, u32 apply_bytes,
 {
 	bool apply = apply_bytes;
 	struct scatterlist *sge;
+	struct msghdr msghdr = { .msg_flags = flags | MSG_SPLICE_PAGES, };
 	struct page *page;
 	int size, ret = 0;
 	u32 off;
 
 	while (1) {
+		struct bio_vec bvec;
 		bool has_tx_ulp;
 
 		sge = sk_msg_elem(msg, msg->sg.start);
@@ -106,16 +108,18 @@ static int tcp_bpf_push(struct sock *sk, struct sk_msg *msg, u32 apply_bytes,
 
 		tcp_rate_check_app_limited(sk);
retry:
 		has_tx_ulp = tls_sw_has_ctx_tx(sk);
-		if (has_tx_ulp) {
-			flags |= MSG_SENDPAGE_NOPOLICY;
-			ret = kernel_sendpage_locked(sk,
-						     page, off, size, flags);
-		} else {
-			ret = do_tcp_sendpages(sk, page, off, size, flags);
-		}
+		if (has_tx_ulp)
+			msghdr.msg_flags |= MSG_SENDPAGE_NOPOLICY;
+
+		if (flags & MSG_SENDPAGE_NOTLAST)
+			msghdr.msg_flags |= MSG_MORE;
+
+		bvec_set_page(&bvec, page, size, off);
+		iov_iter_bvec(&msghdr.msg_iter, ITER_SOURCE, &bvec, 1, size);
+		ret = tcp_sendmsg_locked(sk, &msghdr, size);
 		if (ret <= 0)
 			return ret;
+
 		if (apply)
 			apply_bytes -= ret;
 		msg->sg.size -= ret;
@@ -495,7 +499,7 @@ static int tcp_bpf_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 	long timeo;
 	int flags;
 
-	/* Don't let internal do_tcp_sendpages() flags through */
+	/* Don't let internal sendpage flags through */
 	flags = (msg->msg_flags & ~MSG_SENDPAGE_DECRYPTED);
 	flags |= MSG_NO_SHARED_FRAGS;
On Tue, May 07, 2024 at 10:47:56AM -0700, John Fastabend wrote:
> From: David Howells <dhowells@redhat.com>
> 
> [ Upstream commit ebf2e8860eea66e2c4764316b80c6a5ee5f336ee ]
> 
> do_tcp_sendpages() is now just a small wrapper around tcp_sendmsg_locked(),
> so inline it. This is part of replacing ->sendpage() with a call to
> sendmsg() with MSG_SPLICE_PAGES set.
> 
> Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()")
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: John Fastabend <john.fastabend@gmail.com>
> cc: Jakub Sitnicki <jakub@cloudflare.com>
> cc: David Ahern <dsahern@kernel.org>
> cc: Jens Axboe <axboe@kernel.dk>
> cc: Matthew Wilcox <willy@infradead.org>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
>  net/ipv4/tcp_bpf.c | 20 ++++++++++++--------
>  1 file changed, 12 insertions(+), 8 deletions(-)
You too need to sign off on a patch you forward to someone else, that's what the DCO means :)
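For example, the forwarded backport should end up carrying your trailer after the author's, something like:

	Signed-off-by: David Howells <dhowells@redhat.com>
	Signed-off-by: John Fastabend <john.fastabend@gmail.com>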
Please fix up and resend.
thanks,
greg k-h
From: David Howells <dhowells@redhat.com>
[ Upstream commit f8dd95b29d7ef08c19ec9720564acf72243ddcf6 ]
As MSG_SENDPAGE_NOTLAST is being phased out along with sendpage(), don't use it any deeper in the stack than the sendpage methods themselves, but rather translate it to MSG_MORE and use that instead.
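For tcp_bpf this means deriving MSG_MORE from the message's own state rather than trusting a flag passed down from a sendpage caller (a sketch mirroring the hunk below):

	/* More data is known to follow only when this element is being
	 * sent short (capped by apply_bytes) and further scatter-gather
	 * elements remain queued. */
	if (size < sge->length && msg->sg.start != msg->sg.end)
		msghdr.msg_flags |= MSG_MORE;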
Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
cc: Bernard Metzler <bmt@zurich.ibm.com>
cc: Jason Gunthorpe <jgg@ziepe.ca>
cc: Leon Romanovsky <leon@kernel.org>
cc: John Fastabend <john.fastabend@gmail.com>
cc: Jakub Sitnicki <jakub@cloudflare.com>
cc: David Ahern <dsahern@kernel.org>
cc: Karsten Graul <kgraul@linux.ibm.com>
cc: Wenjia Zhang <wenjia@linux.ibm.com>
cc: Jan Karcher <jaka@linux.ibm.com>
cc: "D. Wythe" <alibuda@linux.alibaba.com>
cc: Tony Lu <tonylu@linux.alibaba.com>
cc: Wen Gu <guwen@linux.alibaba.com>
cc: Boris Pismenny <borisp@nvidia.com>
cc: Steffen Klassert <steffen.klassert@secunet.com>
cc: Herbert Xu <herbert@gondor.apana.org.au>
Link: https://lore.kernel.org/r/20230623225513.2732256-2-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 net/ipv4/tcp_bpf.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index f3def363b971..cd6648aaf570 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -88,9 +88,9 @@ static int bpf_tcp_ingress(struct sock *sk, struct sk_psock *psock,
 static int tcp_bpf_push(struct sock *sk, struct sk_msg *msg, u32 apply_bytes,
 			int flags, bool uncharge)
 {
+	struct msghdr msghdr = {};
 	bool apply = apply_bytes;
 	struct scatterlist *sge;
-	struct msghdr msghdr = { .msg_flags = flags | MSG_SPLICE_PAGES, };
 	struct page *page;
 	int size, ret = 0;
 	u32 off;
@@ -107,11 +107,12 @@ static int tcp_bpf_push(struct sock *sk, struct sk_msg *msg, u32 apply_bytes,
 
 		tcp_rate_check_app_limited(sk);
retry:
+		msghdr.msg_flags = flags | MSG_SPLICE_PAGES;
 		has_tx_ulp = tls_sw_has_ctx_tx(sk);
 		if (has_tx_ulp)
 			msghdr.msg_flags |= MSG_SENDPAGE_NOPOLICY;
 
-		if (flags & MSG_SENDPAGE_NOTLAST)
+		if (size < sge->length && msg->sg.start != msg->sg.end)
 			msghdr.msg_flags |= MSG_MORE;
 
 		bvec_set_page(&bvec, page, size, off);