The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d80d60b0db6ff3dd2e29247cc2a5166d7e9ae37e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Sebastian=20W=C3=BCrl?= <sebastian.wuerl(a)ororatech.com>
Date: Thu, 4 Aug 2022 10:14:11 +0200
Subject: [PATCH] can: mcp251x: Fix race condition on receive interrupt
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The mcp251x driver uses both receiving mailboxes of the CAN controller
chips. For retrieving the CAN frames from the controller via SPI, it checks
once per interrupt which mailboxes have been filled and will retrieve the
messages accordingly.
This introduces a race condition, as another CAN frame can enter mailbox 1
while mailbox 0 is emptied. If now another CAN frame enters mailbox 0 until
the interrupt handler is called next, mailbox 0 is emptied before
mailbox 1, leading to out-of-order CAN frames in the network device.
This is fixed by checking the interrupt flags once again after freeing
mailbox 0, to correctly also empty mailbox 1 before leaving the handler.
For reproducing the bug I created the following setup:
- Two CAN devices, one Raspberry Pi with MCP2515, the other can be any.
- Setup CAN to 1 MHz
- Spam bursts of 5 CAN-messages with increasing CAN-ids
- Continue sending the bursts while sleeping a second between the bursts
- Check on the RPi whether the received messages have increasing CAN-ids
- Without this patch, every burst of messages will contain a flipped pair
v3: https://lore.kernel.org/all/20220804075914.67569-1-sebastian.wuerl@ororatec…
v2: https://lore.kernel.org/all/20220804064803.63157-1-sebastian.wuerl@ororatec…
v1: https://lore.kernel.org/all/20220803153300.58732-1-sebastian.wuerl@ororatec…
Fixes: bf66f3736a94 ("can: mcp251x: Move to threaded interrupts instead of workqueues.")
Signed-off-by: Sebastian Würl <sebastian.wuerl(a)ororatech.com>
Link: https://lore.kernel.org/all/20220804081411.68567-1-sebastian.wuerl@ororatec…
[mkl: reduce scope of intf1, eflag1]
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
diff --git a/drivers/net/can/spi/mcp251x.c b/drivers/net/can/spi/mcp251x.c
index e750d13c8841..c320de474f40 100644
--- a/drivers/net/can/spi/mcp251x.c
+++ b/drivers/net/can/spi/mcp251x.c
@@ -1070,9 +1070,6 @@ static irqreturn_t mcp251x_can_ist(int irq, void *dev_id)
mcp251x_read_2regs(spi, CANINTF, &intf, &eflag);
- /* mask out flags we don't care about */
- intf &= CANINTF_RX | CANINTF_TX | CANINTF_ERR;
-
/* receive buffer 0 */
if (intf & CANINTF_RX0IF) {
mcp251x_hw_rx(spi, 0);
@@ -1082,6 +1079,18 @@ static irqreturn_t mcp251x_can_ist(int irq, void *dev_id)
if (mcp251x_is_2510(spi))
mcp251x_write_bits(spi, CANINTF,
CANINTF_RX0IF, 0x00);
+
+ /* check if buffer 1 is already known to be full, no need to re-read */
+ if (!(intf & CANINTF_RX1IF)) {
+ u8 intf1, eflag1;
+
+ /* intf needs to be read again to avoid a race condition */
+ mcp251x_read_2regs(spi, CANINTF, &intf1, &eflag1);
+
+ /* combine flags from both operations for error handling */
+ intf |= intf1;
+ eflag |= eflag1;
+ }
}
/* receive buffer 1 */
@@ -1092,6 +1101,9 @@ static irqreturn_t mcp251x_can_ist(int irq, void *dev_id)
clear_intf |= CANINTF_RX1IF;
}
+ /* mask out flags we don't care about */
+ intf &= CANINTF_RX | CANINTF_TX | CANINTF_ERR;
+
/* any error or tx interrupt we need to clear? */
if (intf & (CANINTF_ERR | CANINTF_TX))
clear_intf |= intf & (CANINTF_ERR | CANINTF_TX);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8c21c54a53ab21842f5050fa090f26b03c0313d6 Mon Sep 17 00:00:00 2001
From: Fedor Pchelkin <pchelkin(a)ispras.ru>
Date: Fri, 5 Aug 2022 18:02:16 +0300
Subject: [PATCH] can: j1939: j1939_session_destroy(): fix memory leak of skbs
We need to drop skb references taken in j1939_session_skb_queue() when
destroying a session in j1939_session_destroy(). Otherwise those skbs
would be lost.
Link to Syzkaller info and repro: https://forge.ispras.ru/issues/11743.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
V1: https://lore.kernel.org/all/20220708175949.539064-1-pchelkin@ispras.ru
Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Suggested-by: Oleksij Rempel <o.rempel(a)pengutronix.de>
Signed-off-by: Fedor Pchelkin <pchelkin(a)ispras.ru>
Signed-off-by: Alexey Khoroshilov <khoroshilov(a)ispras.ru>
Acked-by: Oleksij Rempel <o.rempel(a)pengutronix.de>
Link: https://lore.kernel.org/all/20220805150216.66313-1-pchelkin@ispras.ru
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
diff --git a/net/can/j1939/transport.c b/net/can/j1939/transport.c
index 307ee1174a6e..d7d86c944d76 100644
--- a/net/can/j1939/transport.c
+++ b/net/can/j1939/transport.c
@@ -260,6 +260,8 @@ static void __j1939_session_drop(struct j1939_session *session)
static void j1939_session_destroy(struct j1939_session *session)
{
+ struct sk_buff *skb;
+
if (session->transmission) {
if (session->err)
j1939_sk_errqueue(session, J1939_ERRQUEUE_TX_ABORT);
@@ -274,7 +276,11 @@ static void j1939_session_destroy(struct j1939_session *session)
WARN_ON_ONCE(!list_empty(&session->sk_session_queue_entry));
WARN_ON_ONCE(!list_empty(&session->active_session_list_entry));
- skb_queue_purge(&session->skb_queue);
+ while ((skb = skb_dequeue(&session->skb_queue)) != NULL) {
+ /* drop ref taken in j1939_session_skb_queue() */
+ skb_unref(skb);
+ kfree_skb(skb);
+ }
__j1939_session_drop(session);
j1939_priv_put(session->priv);
kfree(session);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8c21c54a53ab21842f5050fa090f26b03c0313d6 Mon Sep 17 00:00:00 2001
From: Fedor Pchelkin <pchelkin(a)ispras.ru>
Date: Fri, 5 Aug 2022 18:02:16 +0300
Subject: [PATCH] can: j1939: j1939_session_destroy(): fix memory leak of skbs
We need to drop skb references taken in j1939_session_skb_queue() when
destroying a session in j1939_session_destroy(). Otherwise those skbs
would be lost.
Link to Syzkaller info and repro: https://forge.ispras.ru/issues/11743.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
V1: https://lore.kernel.org/all/20220708175949.539064-1-pchelkin@ispras.ru
Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Suggested-by: Oleksij Rempel <o.rempel(a)pengutronix.de>
Signed-off-by: Fedor Pchelkin <pchelkin(a)ispras.ru>
Signed-off-by: Alexey Khoroshilov <khoroshilov(a)ispras.ru>
Acked-by: Oleksij Rempel <o.rempel(a)pengutronix.de>
Link: https://lore.kernel.org/all/20220805150216.66313-1-pchelkin@ispras.ru
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
diff --git a/net/can/j1939/transport.c b/net/can/j1939/transport.c
index 307ee1174a6e..d7d86c944d76 100644
--- a/net/can/j1939/transport.c
+++ b/net/can/j1939/transport.c
@@ -260,6 +260,8 @@ static void __j1939_session_drop(struct j1939_session *session)
static void j1939_session_destroy(struct j1939_session *session)
{
+ struct sk_buff *skb;
+
if (session->transmission) {
if (session->err)
j1939_sk_errqueue(session, J1939_ERRQUEUE_TX_ABORT);
@@ -274,7 +276,11 @@ static void j1939_session_destroy(struct j1939_session *session)
WARN_ON_ONCE(!list_empty(&session->sk_session_queue_entry));
WARN_ON_ONCE(!list_empty(&session->active_session_list_entry));
- skb_queue_purge(&session->skb_queue);
+ while ((skb = skb_dequeue(&session->skb_queue)) != NULL) {
+ /* drop ref taken in j1939_session_skb_queue() */
+ skb_unref(skb);
+ kfree_skb(skb);
+ }
__j1939_session_drop(session);
j1939_priv_put(session->priv);
kfree(session);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From df9e03aec3b14970df05b72d54f8ac9da3ab29e1 Mon Sep 17 00:00:00 2001
From: Florian Westphal <fw(a)strlen.de>
Date: Thu, 4 Aug 2022 17:21:27 -0700
Subject: [PATCH] selftests: mptcp: make sendfile selftest work
When the selftest got added, sendfile() on mptcp sockets returned
-EOPNOTSUPP, so running 'mptcp_connect.sh -m sendfile' failed
immediately.
This is no longer the case, but the script fails anyway due to timeout.
Let the receiver know once the sender has sent all data, just like
with '-m mmap' mode.
v2: need to respect cfg_wait too, as pm_userspace.sh relied
on -m sendfile to keep the connection open (Mat Martineau)
Fixes: 048d19d444be ("mptcp: add basic kselftest for mptcp")
Reported-by: Xiumei Mu <xmu(a)redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau(a)linux.intel.com>
Signed-off-by: Florian Westphal <fw(a)strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau(a)linux.intel.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.c b/tools/testing/selftests/net/mptcp/mptcp_connect.c
index e2ea6c126c99..24d4e9cb617e 100644
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.c
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.c
@@ -553,6 +553,18 @@ static void set_nonblock(int fd, bool nonblock)
fcntl(fd, F_SETFL, flags & ~O_NONBLOCK);
}
+static void shut_wr(int fd)
+{
+ /* Close our write side, ev. give some time
+ * for address notification and/or checking
+ * the current status
+ */
+ if (cfg_wait)
+ usleep(cfg_wait);
+
+ shutdown(fd, SHUT_WR);
+}
+
static int copyfd_io_poll(int infd, int peerfd, int outfd, bool *in_closed_after_out)
{
struct pollfd fds = {
@@ -630,14 +642,7 @@ static int copyfd_io_poll(int infd, int peerfd, int outfd, bool *in_closed_after
/* ... and peer also closed already */
break;
- /* ... but we still receive.
- * Close our write side, ev. give some time
- * for address notification and/or checking
- * the current status
- */
- if (cfg_wait)
- usleep(cfg_wait);
- shutdown(peerfd, SHUT_WR);
+ shut_wr(peerfd);
} else {
if (errno == EINTR)
continue;
@@ -767,7 +772,7 @@ static int copyfd_io_mmap(int infd, int peerfd, int outfd,
if (err)
return err;
- shutdown(peerfd, SHUT_WR);
+ shut_wr(peerfd);
err = do_recvfile(peerfd, outfd);
*in_closed_after_out = true;
@@ -791,6 +796,9 @@ static int copyfd_io_sendfile(int infd, int peerfd, int outfd,
err = do_sendfile(infd, peerfd, size);
if (err)
return err;
+
+ shut_wr(peerfd);
+
err = do_recvfile(peerfd, outfd);
*in_closed_after_out = true;
}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From df9e03aec3b14970df05b72d54f8ac9da3ab29e1 Mon Sep 17 00:00:00 2001
From: Florian Westphal <fw(a)strlen.de>
Date: Thu, 4 Aug 2022 17:21:27 -0700
Subject: [PATCH] selftests: mptcp: make sendfile selftest work
When the selftest got added, sendfile() on mptcp sockets returned
-EOPNOTSUPP, so running 'mptcp_connect.sh -m sendfile' failed
immediately.
This is no longer the case, but the script fails anyway due to timeout.
Let the receiver know once the sender has sent all data, just like
with '-m mmap' mode.
v2: need to respect cfg_wait too, as pm_userspace.sh relied
on -m sendfile to keep the connection open (Mat Martineau)
Fixes: 048d19d444be ("mptcp: add basic kselftest for mptcp")
Reported-by: Xiumei Mu <xmu(a)redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau(a)linux.intel.com>
Signed-off-by: Florian Westphal <fw(a)strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau(a)linux.intel.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.c b/tools/testing/selftests/net/mptcp/mptcp_connect.c
index e2ea6c126c99..24d4e9cb617e 100644
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.c
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.c
@@ -553,6 +553,18 @@ static void set_nonblock(int fd, bool nonblock)
fcntl(fd, F_SETFL, flags & ~O_NONBLOCK);
}
+static void shut_wr(int fd)
+{
+ /* Close our write side, ev. give some time
+ * for address notification and/or checking
+ * the current status
+ */
+ if (cfg_wait)
+ usleep(cfg_wait);
+
+ shutdown(fd, SHUT_WR);
+}
+
static int copyfd_io_poll(int infd, int peerfd, int outfd, bool *in_closed_after_out)
{
struct pollfd fds = {
@@ -630,14 +642,7 @@ static int copyfd_io_poll(int infd, int peerfd, int outfd, bool *in_closed_after
/* ... and peer also closed already */
break;
- /* ... but we still receive.
- * Close our write side, ev. give some time
- * for address notification and/or checking
- * the current status
- */
- if (cfg_wait)
- usleep(cfg_wait);
- shutdown(peerfd, SHUT_WR);
+ shut_wr(peerfd);
} else {
if (errno == EINTR)
continue;
@@ -767,7 +772,7 @@ static int copyfd_io_mmap(int infd, int peerfd, int outfd,
if (err)
return err;
- shutdown(peerfd, SHUT_WR);
+ shut_wr(peerfd);
err = do_recvfile(peerfd, outfd);
*in_closed_after_out = true;
@@ -791,6 +796,9 @@ static int copyfd_io_sendfile(int infd, int peerfd, int outfd,
err = do_sendfile(infd, peerfd, size);
if (err)
return err;
+
+ shut_wr(peerfd);
+
err = do_recvfile(peerfd, outfd);
*in_closed_after_out = true;
}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c0bf3c6aa444a5ef44acc57ef6cfa53fd4fc1c9b Mon Sep 17 00:00:00 2001
From: Paolo Abeni <pabeni(a)redhat.com>
Date: Thu, 4 Aug 2022 17:21:25 -0700
Subject: [PATCH] mptcp: move subflow cleanup in mptcp_destroy_common()
If the mptcp socket creation fails due to a CGROUP_INET_SOCK_CREATE
eBPF program, the MPTCP protocol ends-up leaking all the subflows:
the related cleanup happens in __mptcp_destroy_sock() that is not
invoked in such code path.
Address the issue moving the subflow sockets cleanup in the
mptcp_destroy_common() helper, which is invoked in every msk cleanup
path.
Additionally get rid of the intermediate list_splice_init step, which
is an unneeded relic from the past.
The issue is present since before the reported root cause commit, but
any attempt to backport the fix before that hash will require a complete
rewrite.
Fixes: e16163b6e2 ("mptcp: refactor shutdown and close")
Reported-by: Nguyen Dinh Phi <phind.uet(a)gmail.com>
Reviewed-by: Mat Martineau <mathew.j.martineau(a)linux.intel.com>
Co-developed-by: Nguyen Dinh Phi <phind.uet(a)gmail.com>
Signed-off-by: Nguyen Dinh Phi <phind.uet(a)gmail.com>
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau(a)linux.intel.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index a3f1c1461874..07fcc86e1fc9 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2769,30 +2769,16 @@ static void __mptcp_wr_shutdown(struct sock *sk)
static void __mptcp_destroy_sock(struct sock *sk)
{
- struct mptcp_subflow_context *subflow, *tmp;
struct mptcp_sock *msk = mptcp_sk(sk);
- LIST_HEAD(conn_list);
pr_debug("msk=%p", msk);
might_sleep();
- /* join list will be eventually flushed (with rst) at sock lock release time*/
- list_splice_init(&msk->conn_list, &conn_list);
-
mptcp_stop_timer(sk);
sk_stop_timer(sk, &sk->sk_timer);
msk->pm.status = 0;
- /* clears msk->subflow, allowing the following loop to close
- * even the initial subflow
- */
- mptcp_dispose_initial_subflow(msk);
- list_for_each_entry_safe(subflow, tmp, &conn_list, node) {
- struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
- __mptcp_close_ssk(sk, ssk, subflow, 0);
- }
-
sk->sk_prot->destroy(sk);
WARN_ON_ONCE(msk->rmem_fwd_alloc);
@@ -2884,24 +2870,20 @@ static void mptcp_copy_inaddrs(struct sock *msk, const struct sock *ssk)
static int mptcp_disconnect(struct sock *sk, int flags)
{
- struct mptcp_subflow_context *subflow, *tmp;
struct mptcp_sock *msk = mptcp_sk(sk);
inet_sk_state_store(sk, TCP_CLOSE);
- list_for_each_entry_safe(subflow, tmp, &msk->conn_list, node) {
- struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
-
- __mptcp_close_ssk(sk, ssk, subflow, MPTCP_CF_FASTCLOSE);
- }
-
mptcp_stop_timer(sk);
sk_stop_timer(sk, &sk->sk_timer);
if (mptcp_sk(sk)->token)
mptcp_event(MPTCP_EVENT_CLOSED, mptcp_sk(sk), NULL, GFP_KERNEL);
- mptcp_destroy_common(msk);
+ /* msk->subflow is still intact, the following will not free the first
+ * subflow
+ */
+ mptcp_destroy_common(msk, MPTCP_CF_FASTCLOSE);
msk->last_snd = NULL;
WRITE_ONCE(msk->flags, 0);
msk->cb_flags = 0;
@@ -3051,12 +3033,17 @@ static struct sock *mptcp_accept(struct sock *sk, int flags, int *err,
return newsk;
}
-void mptcp_destroy_common(struct mptcp_sock *msk)
+void mptcp_destroy_common(struct mptcp_sock *msk, unsigned int flags)
{
+ struct mptcp_subflow_context *subflow, *tmp;
struct sock *sk = (struct sock *)msk;
__mptcp_clear_xmit(sk);
+ /* join list will be eventually flushed (with rst) at sock lock release time */
+ list_for_each_entry_safe(subflow, tmp, &msk->conn_list, node)
+ __mptcp_close_ssk(sk, mptcp_subflow_tcp_sock(subflow), subflow, flags);
+
/* move to sk_receive_queue, sk_stream_kill_queues will purge it */
mptcp_data_lock(sk);
skb_queue_splice_tail_init(&msk->receive_queue, &sk->sk_receive_queue);
@@ -3078,7 +3065,11 @@ static void mptcp_destroy(struct sock *sk)
{
struct mptcp_sock *msk = mptcp_sk(sk);
- mptcp_destroy_common(msk);
+ /* clears msk->subflow, allowing the following to close
+ * even the initial subflow
+ */
+ mptcp_dispose_initial_subflow(msk);
+ mptcp_destroy_common(msk, 0);
sk_sockets_allocated_dec(sk);
}
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 5d6043c16b09..40881a7df5d5 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -717,7 +717,7 @@ static inline void mptcp_write_space(struct sock *sk)
}
}
-void mptcp_destroy_common(struct mptcp_sock *msk);
+void mptcp_destroy_common(struct mptcp_sock *msk, unsigned int flags);
#define MPTCP_TOKEN_MAX_RETRIES 4
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 901c763dcdbb..c7d49fb6e7bd 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -621,7 +621,8 @@ static void mptcp_sock_destruct(struct sock *sk)
sock_orphan(sk);
}
- mptcp_destroy_common(mptcp_sk(sk));
+ /* We don't need to clear msk->subflow, as it's still NULL at this point */
+ mptcp_destroy_common(mptcp_sk(sk), 0);
inet_sock_destruct(sk);
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f54755f6a11accb2db5ef17f8f75aad0875aefdc Mon Sep 17 00:00:00 2001
From: Eric Dumazet <edumazet(a)google.com>
Date: Tue, 14 Jun 2022 10:17:34 -0700
Subject: [PATCH] tcp: fix possible freeze in tx path under memory pressure
Blamed commit only dealt with applications issuing small writes.
Issue here is that we allow to force memory schedule for the sk_buff
allocation, but we have no guarantee that sendmsg() is able to
copy some payload in it.
In this patch, I make sure the socket can use up to tcp_wmem[0] bytes.
For example, if we consider tcp_wmem[0] = 4096 (default on x86),
and initial skb->truesize being 1280, tcp_sendmsg() is able to
copy up to 2816 bytes under memory pressure.
Before this patch a sendmsg() sending more than 2816 bytes
would either block forever (if persistent memory pressure),
or return -EAGAIN.
For bigger MTU networks, it is advised to increase tcp_wmem[0]
to avoid sending too small packets.
v2: deal with zero copy paths.
Fixes: 8e4d980ac215 ("tcp: fix behavior for epoll edge trigger")
Signed-off-by: Eric Dumazet <edumazet(a)google.com>
Acked-by: Soheil Hassas Yeganeh <soheil(a)google.com>
Reviewed-by: Wei Wang <weiwan(a)google.com>
Reviewed-by: Shakeel Butt <shakeelb(a)google.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index b278063a1724..6c3eab485249 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -968,6 +968,23 @@ static int tcp_wmem_schedule(struct sock *sk, int copy)
return min(copy, sk->sk_forward_alloc);
}
+static int tcp_wmem_schedule(struct sock *sk, int copy)
+{
+ int left;
+
+ if (likely(sk_wmem_schedule(sk, copy)))
+ return copy;
+
+ /* We could be in trouble if we have nothing queued.
+ * Use whatever is left in sk->sk_forward_alloc and tcp_wmem[0]
+ * to guarantee some progress.
+ */
+ left = sock_net(sk)->ipv4.sysctl_tcp_wmem[0] - sk->sk_wmem_queued;
+ if (left > 0)
+ sk_forced_mem_schedule(sk, min(left, copy));
+ return min(copy, sk->sk_forward_alloc);
+}
+
static struct sk_buff *tcp_build_frag(struct sock *sk, int size_goal, int flags,
struct page *page, int offset, size_t *size)
{
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f54755f6a11accb2db5ef17f8f75aad0875aefdc Mon Sep 17 00:00:00 2001
From: Eric Dumazet <edumazet(a)google.com>
Date: Tue, 14 Jun 2022 10:17:34 -0700
Subject: [PATCH] tcp: fix possible freeze in tx path under memory pressure
Blamed commit only dealt with applications issuing small writes.
Issue here is that we allow to force memory schedule for the sk_buff
allocation, but we have no guarantee that sendmsg() is able to
copy some payload in it.
In this patch, I make sure the socket can use up to tcp_wmem[0] bytes.
For example, if we consider tcp_wmem[0] = 4096 (default on x86),
and initial skb->truesize being 1280, tcp_sendmsg() is able to
copy up to 2816 bytes under memory pressure.
Before this patch a sendmsg() sending more than 2816 bytes
would either block forever (if persistent memory pressure),
or return -EAGAIN.
For bigger MTU networks, it is advised to increase tcp_wmem[0]
to avoid sending too small packets.
v2: deal with zero copy paths.
Fixes: 8e4d980ac215 ("tcp: fix behavior for epoll edge trigger")
Signed-off-by: Eric Dumazet <edumazet(a)google.com>
Acked-by: Soheil Hassas Yeganeh <soheil(a)google.com>
Reviewed-by: Wei Wang <weiwan(a)google.com>
Reviewed-by: Shakeel Butt <shakeelb(a)google.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index b278063a1724..6c3eab485249 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -968,6 +968,23 @@ static int tcp_wmem_schedule(struct sock *sk, int copy)
return min(copy, sk->sk_forward_alloc);
}
+static int tcp_wmem_schedule(struct sock *sk, int copy)
+{
+ int left;
+
+ if (likely(sk_wmem_schedule(sk, copy)))
+ return copy;
+
+ /* We could be in trouble if we have nothing queued.
+ * Use whatever is left in sk->sk_forward_alloc and tcp_wmem[0]
+ * to guarantee some progress.
+ */
+ left = sock_net(sk)->ipv4.sysctl_tcp_wmem[0] - sk->sk_wmem_queued;
+ if (left > 0)
+ sk_forced_mem_schedule(sk, min(left, copy));
+ return min(copy, sk->sk_forward_alloc);
+}
+
static struct sk_buff *tcp_build_frag(struct sock *sk, int size_goal, int flags,
struct page *page, int offset, size_t *size)
{