Hi all,
I'm facing the same issue as https://lore.kernel.org/stable/CAFsF8vL4CGFzWMb38_XviiEgxoKX0GYup=JiUFXUOmag..., but on 5.15. I've bisected it across releases to 5.15.88, and can reproduce on 5.15.93.
However, I cannot seem to find the identified problematic commit in the 5.15 branch, so I'm unsure if this is a different issue or not.
There's a few ways to reproduce this issue, but the one I've been using is running libuv's (https://github.com/libuv/libuv) tests, specifically tests 271 and 277.
#regzbot introduced v5.15.88..
Thanks, Winter
Hi,
[CCed netdev]
On Sun, Feb 12, 2023 at 10:38:40PM -0500, Winter wrote:
Hi all,
I'm facing the same issue as https://lore.kernel.org/stable/CAFsF8vL4CGFzWMb38_XviiEgxoKX0GYup=JiUFXUOmag..., but on 5.15. I've bisected it across releases to 5.15.88, and can reproduce on 5.15.93.
However, I cannot seem to find the identified problematic commit in the 5.15 branch, so I'm unsure if this is a different issue or not.
There's a few ways to reproduce this issue, but the one I've been using is running libuv's (https://github.com/libuv/libuv) tests, specifically tests 271 and 277.
From the linked patch:
https://lore.kernel.org/stable/20221228144337.512799851@linuxfoundation.org/
I can see that:
We assume the correct errno is -EADDRINUSE when sk->sk_prot->get_port() fails, so some ->get_port() functions return just 1 on failure and the callers return -EADDRINUSE instead.
However, mptcp_get_port() can return -EINVAL. Let's not ignore the error.
Note the only exception is inet_autobind(), all of whose callers return -EAGAIN instead.
But the patch doesn't do what is documented, it preserves all return values and will happily return 1 if ->get_port() returns 1:
--- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -522,9 +522,9 @@ int __inet_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len, /* Make sure we are allowed to bind here. */ if (snum || !(inet->bind_address_no_port || (flags & BIND_FORCE_ADDRESS_NO_PORT))) {
if (sk->sk_prot->get_port(sk, snum)) {
err = sk->sk_prot->get_port(sk, snum);
if (err) { inet->inet_saddr = inet->inet_rcv_saddr = 0;
} if (!(flags & BIND_FROM_BPF)) {err = -EADDRINUSE; goto out_release_sock;
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index eb31c7158b39..971969cc7e17 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -1041,7 +1041,7 @@ int inet_csk_listen_start(struct sock *sk) { struct inet_connection_sock *icsk = inet_csk(sk); struct inet_sock *inet = inet_sk(sk);
- int err = -EADDRINUSE;
- int err;
reqsk_queue_alloc(&icsk->icsk_accept_queue); @@ -1057,7 +1057,8 @@ int inet_csk_listen_start(struct sock *sk) * after validation is complete. */ inet_sk_state_store(sk, TCP_LISTEN);
- if (!sk->sk_prot->get_port(sk, inet->inet_num)) {
- err = sk->sk_prot->get_port(sk, inet->inet_num);
- if (!err) { inet->inet_sport = htons(inet->inet_num);
IMHO in the "if (err)" block in all these places what is missing is:
if (err > 0) err = -EADDRINUSE;
so that all non-negative errors are properly mapped to -EADDRINUSE, like in the appended patch (if someone wants to give it a try, I've not even build-tested it). Note that I don't like it much and do not like the original patch either, I think a revert and a cleaner fix could be better :-/
Willy --
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index cf11f10927e1..ce9960d9448d 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -526,6 +526,9 @@ int __inet_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len, err = sk->sk_prot->get_port(sk, snum); if (err) { inet->inet_saddr = inet->inet_rcv_saddr = 0; + /* some ->get_port() return 1 on failure */ + if (err > 0) + err = -EADDRINUSE; goto out_release_sock; } if (!(flags & BIND_FROM_BPF)) { diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index f2c43f67187d..7585c440fb8c 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -1241,6 +1241,9 @@ int inet_csk_listen_start(struct sock *sk) if (likely(!err)) return 0; } + /* some ->get_port() return 1 on failure */ + if (err > 0) + err = -EADDRINUSE;
inet_sk_set_state(sk, TCP_CLOSE); return err; diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c index 847934763868..941c8ee4a144 100644 --- a/net/ipv6/af_inet6.c +++ b/net/ipv6/af_inet6.c @@ -415,6 +415,9 @@ static int __inet6_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len, if (err) { sk->sk_ipv6only = saved_ipv6only; inet_reset_saddr(sk); + /* some ->get_port() return 1 on failure */ + if (err > 0) + err = -EADDRINUSE; goto out; } if (!(flags & BIND_FROM_BPF)) {
On Mon, Feb 13, 2023 at 05:27:03AM +0100, Willy Tarreau wrote:
Hi,
[CCed netdev]
On Sun, Feb 12, 2023 at 10:38:40PM -0500, Winter wrote:
Hi all,
I'm facing the same issue as https://lore.kernel.org/stable/CAFsF8vL4CGFzWMb38_XviiEgxoKX0GYup=JiUFXUOmag..., but on 5.15. I've bisected it across releases to 5.15.88, and can reproduce on 5.15.93.
However, I cannot seem to find the identified problematic commit in the 5.15 branch, so I'm unsure if this is a different issue or not.
There's a few ways to reproduce this issue, but the one I've been using is running libuv's (https://github.com/libuv/libuv) tests, specifically tests 271 and 277.
From the linked patch:
https://lore.kernel.org/stable/20221228144337.512799851@linuxfoundation.org/
But that commit only ended up in 6.0.y, not 5.15, so how is this an issue in 5.15.y?
confused,
greg k-h
Hi Greg,
On Mon, Feb 13, 2023 at 08:25:34AM +0100, Greg KH wrote:
On Mon, Feb 13, 2023 at 05:27:03AM +0100, Willy Tarreau wrote:
Hi,
[CCed netdev]
On Sun, Feb 12, 2023 at 10:38:40PM -0500, Winter wrote:
Hi all,
I'm facing the same issue as https://lore.kernel.org/stable/CAFsF8vL4CGFzWMb38_XviiEgxoKX0GYup=JiUFXUOmag..., but on 5.15. I've bisected it across releases to 5.15.88, and can reproduce on 5.15.93.
However, I cannot seem to find the identified problematic commit in the 5.15 branch, so I'm unsure if this is a different issue or not.
There's a few ways to reproduce this issue, but the one I've been using is running libuv's (https://github.com/libuv/libuv) tests, specifically tests 271 and 277.
From the linked patch:
https://lore.kernel.org/stable/20221228144337.512799851@linuxfoundation.org/
But that commit only ended up in 6.0.y, not 5.15, so how is this an issue in 5.15.y?
Hmmm I plead -ENOCOFFEE on my side, I hadn't notice the "can't find the problematic commit", you're right indeed.
However if the issue happened in 5.15.88, the only part touching the network listening area is this one which may introduce an EINVAL on one listening path, but that seems unrelated to me given that it's only for ULP that libuv doesn't seem to be using:
dadd0dcaa67d ("net/ulp: prevent ULP without clone op from entering the LISTEN status")
I guess the reporter will need to further bisect the problem to figure the exact patch at this point.
Willy
From: Willy Tarreau w@1wt.eu Date: Mon, 13 Feb 2023 08:52:34 +0100
Hi Greg,
On Mon, Feb 13, 2023 at 08:25:34AM +0100, Greg KH wrote:
On Mon, Feb 13, 2023 at 05:27:03AM +0100, Willy Tarreau wrote:
Hi,
[CCed netdev]
On Sun, Feb 12, 2023 at 10:38:40PM -0500, Winter wrote:
Hi all,
I'm facing the same issue as https://lore.kernel.org/stable/CAFsF8vL4CGFzWMb38_XviiEgxoKX0GYup=JiUFXUOmag..., but on 5.15. I've bisected it across releases to 5.15.88, and can reproduce on 5.15.93.
However, I cannot seem to find the identified problematic commit in the 5.15 branch, so I'm unsure if this is a different issue or not.
There's a few ways to reproduce this issue, but the one I've been using is running libuv's (https://github.com/libuv/libuv) tests, specifically tests 271 and 277.
From the linked patch:
https://lore.kernel.org/stable/20221228144337.512799851@linuxfoundation.org/
But that commit only ended up in 6.0.y, not 5.15, so how is this an issue in 5.15.y?
Hmmm I plead -ENOCOFFEE on my side, I hadn't notice the "can't find the problematic commit", you're right indeed.
However if the issue happened in 5.15.88, the only part touching the network listening area is this one which may introduce an EINVAL on one listening path, but that seems unrelated to me given that it's only for ULP that libuv doesn't seem to be using:
dadd0dcaa67d ("net/ulp: prevent ULP without clone op from entering the LISTEN status")
This commit accidentally backports a part of 7a7160edf1bf ("net: Return errno in sk->sk_prot->get_port().") and removed err = -EADDRINUSE in inet_csk_listen_start(). Then, listen() will return 0 even if ->get_port() actually fails and returns 1.
I can send a small revert or a whole backport, but which is preferable ? The original patch is not for stable, but it will make future backports easy.
Thanks, Kuniyuki
On Mon, Feb 13, 2023 at 08:44:55AM -0800, Kuniyuki Iwashima wrote:
From: Willy Tarreau w@1wt.eu Date: Mon, 13 Feb 2023 08:52:34 +0100
Hi Greg,
On Mon, Feb 13, 2023 at 08:25:34AM +0100, Greg KH wrote:
On Mon, Feb 13, 2023 at 05:27:03AM +0100, Willy Tarreau wrote:
Hi,
[CCed netdev]
On Sun, Feb 12, 2023 at 10:38:40PM -0500, Winter wrote:
Hi all,
I'm facing the same issue as https://lore.kernel.org/stable/CAFsF8vL4CGFzWMb38_XviiEgxoKX0GYup=JiUFXUOmag..., but on 5.15. I've bisected it across releases to 5.15.88, and can reproduce on 5.15.93.
However, I cannot seem to find the identified problematic commit in the 5.15 branch, so I'm unsure if this is a different issue or not.
There's a few ways to reproduce this issue, but the one I've been using is running libuv's (https://github.com/libuv/libuv) tests, specifically tests 271 and 277.
From the linked patch:
https://lore.kernel.org/stable/20221228144337.512799851@linuxfoundation.org/
But that commit only ended up in 6.0.y, not 5.15, so how is this an issue in 5.15.y?
Hmmm I plead -ENOCOFFEE on my side, I hadn't notice the "can't find the problematic commit", you're right indeed.
However if the issue happened in 5.15.88, the only part touching the network listening area is this one which may introduce an EINVAL on one listening path, but that seems unrelated to me given that it's only for ULP that libuv doesn't seem to be using:
dadd0dcaa67d ("net/ulp: prevent ULP without clone op from entering the LISTEN status")
This commit accidentally backports a part of 7a7160edf1bf ("net: Return errno in sk->sk_prot->get_port().") and removed err = -EADDRINUSE in inet_csk_listen_start(). Then, listen() will return 0 even if ->get_port() actually fails and returns 1.
I can send a small revert or a whole backport, but which is preferable ? The original patch is not for stable, but it will make future backports easy.
A whole revert is probably best, if it's not needed. But if it is, a fix up would be fine to get as well.
thanks,
greg k-h
From: Greg KH gregkh@linuxfoundation.org Date: Mon, 13 Feb 2023 18:48:48 +0100
On Mon, Feb 13, 2023 at 08:44:55AM -0800, Kuniyuki Iwashima wrote:
From: Willy Tarreau w@1wt.eu Date: Mon, 13 Feb 2023 08:52:34 +0100
Hi Greg,
On Mon, Feb 13, 2023 at 08:25:34AM +0100, Greg KH wrote:
On Mon, Feb 13, 2023 at 05:27:03AM +0100, Willy Tarreau wrote:
Hi,
[CCed netdev]
On Sun, Feb 12, 2023 at 10:38:40PM -0500, Winter wrote:
Hi all,
I'm facing the same issue as https://lore.kernel.org/stable/CAFsF8vL4CGFzWMb38_XviiEgxoKX0GYup=JiUFXUOmag..., but on 5.15. I've bisected it across releases to 5.15.88, and can reproduce on 5.15.93.
However, I cannot seem to find the identified problematic commit in the 5.15 branch, so I'm unsure if this is a different issue or not.
There's a few ways to reproduce this issue, but the one I've been using is running libuv's (https://github.com/libuv/libuv) tests, specifically tests 271 and 277.
From the linked patch:
https://lore.kernel.org/stable/20221228144337.512799851@linuxfoundation.org/
But that commit only ended up in 6.0.y, not 5.15, so how is this an issue in 5.15.y?
Hmmm I plead -ENOCOFFEE on my side, I hadn't notice the "can't find the problematic commit", you're right indeed.
However if the issue happened in 5.15.88, the only part touching the network listening area is this one which may introduce an EINVAL on one listening path, but that seems unrelated to me given that it's only for ULP that libuv doesn't seem to be using:
dadd0dcaa67d ("net/ulp: prevent ULP without clone op from entering the LISTEN status")
This commit accidentally backports a part of 7a7160edf1bf ("net: Return errno in sk->sk_prot->get_port().") and removed err = -EADDRINUSE in inet_csk_listen_start(). Then, listen() will return 0 even if ->get_port() actually fails and returns 1.
I can send a small revert or a whole backport, but which is preferable ? The original patch is not for stable, but it will make future backports easy.
A whole revert is probably best, if it's not needed. But if it is, a fix up would be fine to get as well.
dadd0dcaa67d is needed to fix potential double-free, so could you queue this fixup for 5.15. ?
---8<--- From ad319ace8b5c1dd5105b7263b7ccfd0ba0926551 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima kuniyu@amazon.com Date: Mon, 13 Feb 2023 20:45:48 +0000 Subject: [PATCH] tcp: Fix listen() regression in 5.15.88.
When we backport dadd0dcaa67d ("net/ulp: prevent ULP without clone op from entering the LISTEN status"), we have accidentally backported a part of 7a7160edf1bf ("net: Return errno in sk->sk_prot->get_port().") and removed err = -EADDRINUSE in inet_csk_listen_start().
Thus, listen() no longer returns -EADDRINUSE even if ->get_port() failed as reported in [0].
We set -EADDRINUSE to err just before ->get_port() to fix the regression.
[0]: https://lore.kernel.org/stable/EF8A45D0-768A-4CD5-9A8A-0FA6E610ABF7@winter.c...
Reported-by: Winter winter@winter.cafe Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com --- net/ipv4/inet_connection_sock.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index a86140ff093c..29ec42c1f5d0 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -1070,6 +1070,7 @@ int inet_csk_listen_start(struct sock *sk, int backlog) * It is OK, because this socket enters to hash table only * after validation is complete. */ + err = -EADDRINUSE; inet_sk_state_store(sk, TCP_LISTEN); if (!sk->sk_prot->get_port(sk, inet->inet_num)) { inet->inet_sport = htons(inet->inet_num);
On Mon, Feb 13, 2023 at 12:58:35PM -0800, Kuniyuki Iwashima wrote:
From: Greg KH gregkh@linuxfoundation.org Date: Mon, 13 Feb 2023 18:48:48 +0100
On Mon, Feb 13, 2023 at 08:44:55AM -0800, Kuniyuki Iwashima wrote:
From: Willy Tarreau w@1wt.eu Date: Mon, 13 Feb 2023 08:52:34 +0100
Hi Greg,
On Mon, Feb 13, 2023 at 08:25:34AM +0100, Greg KH wrote:
On Mon, Feb 13, 2023 at 05:27:03AM +0100, Willy Tarreau wrote:
Hi,
[CCed netdev]
On Sun, Feb 12, 2023 at 10:38:40PM -0500, Winter wrote: > Hi all, > > I'm facing the same issue as > https://lore.kernel.org/stable/CAFsF8vL4CGFzWMb38_XviiEgxoKX0GYup=JiUFXUOmag..., > but on 5.15. I've bisected it across releases to 5.15.88, and can reproduce > on 5.15.93. > > However, I cannot seem to find the identified problematic commit in the 5.15 > branch, so I'm unsure if this is a different issue or not. > > There's a few ways to reproduce this issue, but the one I've been using is > running libuv's (https://github.com/libuv/libuv) tests, specifically tests > 271 and 277.
>From the linked patch:
https://lore.kernel.org/stable/20221228144337.512799851@linuxfoundation.org/
But that commit only ended up in 6.0.y, not 5.15, so how is this an issue in 5.15.y?
Hmmm I plead -ENOCOFFEE on my side, I hadn't notice the "can't find the problematic commit", you're right indeed.
However if the issue happened in 5.15.88, the only part touching the network listening area is this one which may introduce an EINVAL on one listening path, but that seems unrelated to me given that it's only for ULP that libuv doesn't seem to be using:
dadd0dcaa67d ("net/ulp: prevent ULP without clone op from entering the LISTEN status")
This commit accidentally backports a part of 7a7160edf1bf ("net: Return errno in sk->sk_prot->get_port().") and removed err = -EADDRINUSE in inet_csk_listen_start(). Then, listen() will return 0 even if ->get_port() actually fails and returns 1.
I can send a small revert or a whole backport, but which is preferable ? The original patch is not for stable, but it will make future backports easy.
A whole revert is probably best, if it's not needed. But if it is, a fix up would be fine to get as well.
dadd0dcaa67d is needed to fix potential double-free, so could you queue this fixup for 5.15. ?
---8<---
From ad319ace8b5c1dd5105b7263b7ccfd0ba0926551 Mon Sep 17 00:00:00 2001
From: Kuniyuki Iwashima kuniyu@amazon.com Date: Mon, 13 Feb 2023 20:45:48 +0000 Subject: [PATCH] tcp: Fix listen() regression in 5.15.88.
When we backport dadd0dcaa67d ("net/ulp: prevent ULP without clone op from entering the LISTEN status"), we have accidentally backported a part of 7a7160edf1bf ("net: Return errno in sk->sk_prot->get_port().") and removed err = -EADDRINUSE in inet_csk_listen_start().
Thus, listen() no longer returns -EADDRINUSE even if ->get_port() failed as reported in [0].
We set -EADDRINUSE to err just before ->get_port() to fix the regression.
Reported-by: Winter winter@winter.cafe Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com
net/ipv4/inet_connection_sock.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index a86140ff093c..29ec42c1f5d0 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -1070,6 +1070,7 @@ int inet_csk_listen_start(struct sock *sk, int backlog) * It is OK, because this socket enters to hash table only * after validation is complete. */
- err = -EADDRINUSE; inet_sk_state_store(sk, TCP_LISTEN); if (!sk->sk_prot->get_port(sk, inet->inet_num)) { inet->inet_sport = htons(inet->inet_num);
-- 2.38.1 ---8<---
Thanks, now queued up.
greg k-h
[TLDR: This mail in primarily relevant for Linux kernel regression tracking. See link in footer if these mails annoy you.]
On 13.02.23 04:38, Winter wrote:
Hi all,
I'm facing the same issue as https://lore.kernel.org/stable/CAFsF8vL4CGFzWMb38_XviiEgxoKX0GYup=JiUFXUOmag..., but on 5.15. I've bisected it across releases to 5.15.88, and can reproduce on 5.15.93.
However, I cannot seem to find the identified problematic commit in the 5.15 branch, so I'm unsure if this is a different issue or not.
There's a few ways to reproduce this issue, but the one I've been using is running libuv's (https://github.com/libuv/libuv) tests, specifically tests 271 and 277.
#regzbot introduced v5.15.88..
#regzbot fix: fdaf88531cfd17b #regzbot ignore-activity
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) -- Everything you wanna know about Linux kernel regression tracking: https://linux-regtracking.leemhuis.info/about/#tldr That page also explains what to do if mails like this annoy you.
linux-stable-mirror@lists.linaro.org