This is a note to let you know that I've just added the patch titled
tcp: refresh tcp_mstamp from timers callbacks
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
tcp-refresh-tcp_mstamp-from-timers-callbacks.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Sun Dec 31 11:12:48 CET 2017
From: Eric Dumazet <edumazet(a)google.com>
Date: Tue, 12 Dec 2017 18:22:52 -0800
Subject: tcp: refresh tcp_mstamp from timers callbacks
From: Eric Dumazet <edumazet(a)google.com>
[ Upstream commit 4688eb7cf3ae2c2721d1dacff5c1384cba47d176 ]
Only the retransmit timer currently refreshes tcp_mstamp
We should do the same for delayed acks and keepalives.
Even if RFC 7323 does not request it, this is consistent to what linux
did in the past, when TS values were based on jiffies.
Fixes: 385e20706fac ("tcp: use tp->tcp_mstamp in output path")
Signed-off-by: Eric Dumazet <edumazet(a)google.com>
Cc: Soheil Hassas Yeganeh <soheil(a)google.com>
Cc: Mike Maloney <maloney(a)google.com>
Cc: Neal Cardwell <ncardwell(a)google.com>
Acked-by: Neal Cardwell <ncardwell(a)google.com>
Acked-by: Soheil Hassas Yeganeh <soheil(a)google.com>
Acked-by: Mike Maloney <maloney(a)google.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/ipv4/tcp_timer.c | 2 ++
1 file changed, 2 insertions(+)
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -264,6 +264,7 @@ void tcp_delack_timer_handler(struct soc
icsk->icsk_ack.pingpong = 0;
icsk->icsk_ack.ato = TCP_ATO_MIN;
}
+ tcp_mstamp_refresh(tcp_sk(sk));
tcp_send_ack(sk);
__NET_INC_STATS(sock_net(sk), LINUX_MIB_DELAYEDACKS);
}
@@ -627,6 +628,7 @@ static void tcp_keepalive_timer (unsigne
goto out;
}
+ tcp_mstamp_refresh(tp);
if (sk->sk_state == TCP_FIN_WAIT2 && sock_flag(sk, SOCK_DEAD)) {
if (tp->linger2 >= 0) {
const int tmo = tcp_fin_time(sk) - TCP_TIMEWAIT_LEN;
Patches currently in stable-queue which might be from edumazet(a)google.com are
queue-4.14/tcp-refresh-tcp_mstamp-from-timers-callbacks.patch
queue-4.14/net-fix-double-free-and-memory-corruption-in-get_net_ns_by_id.patch
queue-4.14/sock-free-skb-in-skb_complete_tx_timestamp-on-error.patch
queue-4.14/tcp-invalidate-rate-samples-during-sack-reneging.patch
queue-4.14/tcp-md5sig-use-skb-s-saddr-when-replying-to-an-incoming-segment.patch
queue-4.14/net-ipv4-fix-for-a-race-condition-in-raw_sendmsg.patch
queue-4.14/ipv4-igmp-guard-against-silly-mtu-values.patch
queue-4.14/tcp-fix-potential-underestimation-on-rcv_rtt.patch
queue-4.14/ipv6-mcast-better-catch-silly-mtu-values.patch
This is a note to let you know that I've just added the patch titled
tcp md5sig: Use skb's saddr when replying to an incoming segment
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
tcp-md5sig-use-skb-s-saddr-when-replying-to-an-incoming-segment.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Sun Dec 31 11:12:48 CET 2017
From: Christoph Paasch <cpaasch(a)apple.com>
Date: Mon, 11 Dec 2017 00:05:46 -0800
Subject: tcp md5sig: Use skb's saddr when replying to an incoming segment
From: Christoph Paasch <cpaasch(a)apple.com>
[ Upstream commit 30791ac41927ebd3e75486f9504b6d2280463bf0 ]
The MD5-key that belongs to a connection is identified by the peer's
IP-address. When we are in tcp_v4(6)_reqsk_send_ack(), we are replying
to an incoming segment from tcp_check_req() that failed the seq-number
checks.
Thus, to find the correct key, we need to use the skb's saddr and not
the daddr.
This bug seems to have been there since quite a while, but probably got
unnoticed because the consequences are not catastrophic. We will call
tcp_v4_reqsk_send_ack only to send a challenge-ACK back to the peer,
thus the connection doesn't really fail.
Fixes: 9501f9722922 ("tcp md5sig: Let the caller pass appropriate key for tcp_v{4,6}_do_calc_md5_hash().")
Signed-off-by: Christoph Paasch <cpaasch(a)apple.com>
Reviewed-by: Eric Dumazet <edumazet(a)google.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/ipv4/tcp_ipv4.c | 2 +-
net/ipv6/tcp_ipv6.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -844,7 +844,7 @@ static void tcp_v4_reqsk_send_ack(const
tcp_time_stamp_raw() + tcp_rsk(req)->ts_off,
req->ts_recent,
0,
- tcp_md5_do_lookup(sk, (union tcp_md5_addr *)&ip_hdr(skb)->daddr,
+ tcp_md5_do_lookup(sk, (union tcp_md5_addr *)&ip_hdr(skb)->saddr,
AF_INET),
inet_rsk(req)->no_srccheck ? IP_REPLY_ARG_NOSRCCHECK : 0,
ip_hdr(skb)->tos);
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -988,7 +988,7 @@ static void tcp_v6_reqsk_send_ack(const
req->rsk_rcv_wnd >> inet_rsk(req)->rcv_wscale,
tcp_time_stamp_raw() + tcp_rsk(req)->ts_off,
req->ts_recent, sk->sk_bound_dev_if,
- tcp_v6_md5_do_lookup(sk, &ipv6_hdr(skb)->daddr),
+ tcp_v6_md5_do_lookup(sk, &ipv6_hdr(skb)->saddr),
0, 0);
}
Patches currently in stable-queue which might be from cpaasch(a)apple.com are
queue-4.14/tcp-md5sig-use-skb-s-saddr-when-replying-to-an-incoming-segment.patch
This is a note to let you know that I've just added the patch titled
tcp: invalidate rate samples during SACK reneging
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
tcp-invalidate-rate-samples-during-sack-reneging.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Sun Dec 31 11:12:48 CET 2017
From: Yousuk Seung <ysseung(a)google.com>
Date: Thu, 7 Dec 2017 13:41:34 -0800
Subject: tcp: invalidate rate samples during SACK reneging
From: Yousuk Seung <ysseung(a)google.com>
[ Upstream commit d4761754b4fb2ef8d9a1e9d121c4bec84e1fe292 ]
Mark tcp_sock during a SACK reneging event and invalidate rate samples
while marked. Such rate samples may overestimate bw by including packets
that were SACKed before reneging.
< ack 6001 win 10000 sack 7001:38001
< ack 7001 win 0 sack 8001:38001 // Reneg detected
> seq 7001:8001 // RTO, SACK cleared.
< ack 38001 win 10000
In above example the rate sample taken after the last ack will count
7001-38001 as delivered while the actual delivery rate likely could
be much lower i.e. 7001-8001.
This patch adds a new field tcp_sock.sack_reneg and marks it when we
declare SACK reneging and entering TCP_CA_Loss, and unmarks it after
the last rate sample was taken before moving back to TCP_CA_Open. This
patch also invalidates rate samples taken while tcp_sock.is_sack_reneg
is set.
Fixes: b9f64820fb22 ("tcp: track data delivery rate for a TCP connection")
Signed-off-by: Yousuk Seung <ysseung(a)google.com>
Signed-off-by: Neal Cardwell <ncardwell(a)google.com>
Signed-off-by: Yuchung Cheng <ycheng(a)google.com>
Acked-by: Soheil Hassas Yeganeh <soheil(a)google.com>
Acked-by: Eric Dumazet <edumazet(a)google.com>
Acked-by: Priyaranjan Jha <priyarjha(a)google.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
include/linux/tcp.h | 3 ++-
include/net/tcp.h | 2 +-
net/ipv4/tcp.c | 1 +
net/ipv4/tcp_input.c | 10 ++++++++--
net/ipv4/tcp_rate.c | 10 +++++++---
5 files changed, 19 insertions(+), 7 deletions(-)
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -214,7 +214,8 @@ struct tcp_sock {
u8 chrono_type:2, /* current chronograph type */
rate_app_limited:1, /* rate_{delivered,interval_us} limited? */
fastopen_connect:1, /* FASTOPEN_CONNECT sockopt */
- unused:4;
+ is_sack_reneg:1, /* in recovery from loss with SACK reneg? */
+ unused:3;
u8 nonagle : 4,/* Disable Nagle algorithm? */
thin_lto : 1,/* Use linear timeouts for thin streams */
unused1 : 1,
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1085,7 +1085,7 @@ void tcp_rate_skb_sent(struct sock *sk,
void tcp_rate_skb_delivered(struct sock *sk, struct sk_buff *skb,
struct rate_sample *rs);
void tcp_rate_gen(struct sock *sk, u32 delivered, u32 lost,
- struct rate_sample *rs);
+ bool is_sack_reneg, struct rate_sample *rs);
void tcp_rate_check_app_limited(struct sock *sk);
/* These functions determine how the current flow behaves in respect of SACK
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2356,6 +2356,7 @@ int tcp_disconnect(struct sock *sk, int
tp->snd_cwnd_cnt = 0;
tp->window_clamp = 0;
tcp_set_ca_state(sk, TCP_CA_Open);
+ tp->is_sack_reneg = 0;
tcp_clear_retrans(tp);
inet_csk_delack_init(sk);
/* Initialize rcv_mss to TCP_MIN_MSS to avoid division by 0
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1975,6 +1975,8 @@ void tcp_enter_loss(struct sock *sk)
NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPSACKRENEGING);
tp->sacked_out = 0;
tp->fackets_out = 0;
+ /* Mark SACK reneging until we recover from this loss event. */
+ tp->is_sack_reneg = 1;
}
tcp_clear_all_retrans_hints(tp);
@@ -2428,6 +2430,7 @@ static bool tcp_try_undo_recovery(struct
return true;
}
tcp_set_ca_state(sk, TCP_CA_Open);
+ tp->is_sack_reneg = 0;
return false;
}
@@ -2459,8 +2462,10 @@ static bool tcp_try_undo_loss(struct soc
NET_INC_STATS(sock_net(sk),
LINUX_MIB_TCPSPURIOUSRTOS);
inet_csk(sk)->icsk_retransmits = 0;
- if (frto_undo || tcp_is_sack(tp))
+ if (frto_undo || tcp_is_sack(tp)) {
tcp_set_ca_state(sk, TCP_CA_Open);
+ tp->is_sack_reneg = 0;
+ }
return true;
}
return false;
@@ -3551,6 +3556,7 @@ static int tcp_ack(struct sock *sk, cons
struct tcp_sacktag_state sack_state;
struct rate_sample rs = { .prior_delivered = 0 };
u32 prior_snd_una = tp->snd_una;
+ bool is_sack_reneg = tp->is_sack_reneg;
u32 ack_seq = TCP_SKB_CB(skb)->seq;
u32 ack = TCP_SKB_CB(skb)->ack_seq;
bool is_dupack = false;
@@ -3666,7 +3672,7 @@ static int tcp_ack(struct sock *sk, cons
delivered = tp->delivered - delivered; /* freshly ACKed or SACKed */
lost = tp->lost - lost; /* freshly marked lost */
- tcp_rate_gen(sk, delivered, lost, sack_state.rate);
+ tcp_rate_gen(sk, delivered, lost, is_sack_reneg, sack_state.rate);
tcp_cong_control(sk, ack, delivered, flag, sack_state.rate);
tcp_xmit_recovery(sk, rexmit);
return 1;
--- a/net/ipv4/tcp_rate.c
+++ b/net/ipv4/tcp_rate.c
@@ -106,7 +106,7 @@ void tcp_rate_skb_delivered(struct sock
/* Update the connection delivery information and generate a rate sample. */
void tcp_rate_gen(struct sock *sk, u32 delivered, u32 lost,
- struct rate_sample *rs)
+ bool is_sack_reneg, struct rate_sample *rs)
{
struct tcp_sock *tp = tcp_sk(sk);
u32 snd_us, ack_us;
@@ -124,8 +124,12 @@ void tcp_rate_gen(struct sock *sk, u32 d
rs->acked_sacked = delivered; /* freshly ACKed or SACKed */
rs->losses = lost; /* freshly marked lost */
- /* Return an invalid sample if no timing information is available. */
- if (!rs->prior_mstamp) {
+ /* Return an invalid sample if no timing information is available or
+ * in recovery from loss with SACK reneging. Rate samples taken during
+ * a SACK reneging event may overestimate bw by including packets that
+ * were SACKed before the reneg.
+ */
+ if (!rs->prior_mstamp || is_sack_reneg) {
rs->delivered = -1;
rs->interval_us = -1;
return;
Patches currently in stable-queue which might be from ysseung(a)google.com are
queue-4.14/tcp-invalidate-rate-samples-during-sack-reneging.patch
This is a note to let you know that I've just added the patch titled
tcp: fix potential underestimation on rcv_rtt
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
tcp-fix-potential-underestimation-on-rcv_rtt.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Sun Dec 31 11:12:48 CET 2017
From: Wei Wang <weiwan(a)google.com>
Date: Tue, 12 Dec 2017 16:28:58 -0800
Subject: tcp: fix potential underestimation on rcv_rtt
From: Wei Wang <weiwan(a)google.com>
[ Upstream commit 9ee11bd03cb1a5c3ca33c2bb70e7ed325f68890f ]
When ms timestamp is used, current logic uses 1us in
tcp_rcv_rtt_update() when the real rcv_rtt is within 1 - 999us.
This could cause rcv_rtt underestimation.
Fix it by always using a min value of 1ms if ms timestamp is used.
Fixes: 645f4c6f2ebd ("tcp: switch rcv_rtt_est and rcvq_space to high resolution timestamps")
Signed-off-by: Wei Wang <weiwan(a)google.com>
Signed-off-by: Eric Dumazet <edumazet(a)google.com>
Acked-by: Neal Cardwell <ncardwell(a)google.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/ipv4/tcp_input.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -521,9 +521,6 @@ static void tcp_rcv_rtt_update(struct tc
u32 new_sample = tp->rcv_rtt_est.rtt_us;
long m = sample;
- if (m == 0)
- m = 1;
-
if (new_sample != 0) {
/* If we sample in larger samples in the non-timestamp
* case, we could grossly overestimate the RTT especially
@@ -560,6 +557,8 @@ static inline void tcp_rcv_rtt_measure(s
if (before(tp->rcv_nxt, tp->rcv_rtt_est.seq))
return;
delta_us = tcp_stamp_us_delta(tp->tcp_mstamp, tp->rcv_rtt_est.time);
+ if (!delta_us)
+ delta_us = 1;
tcp_rcv_rtt_update(tp, delta_us, 1);
new_measure:
@@ -576,8 +575,11 @@ static inline void tcp_rcv_rtt_measure_t
(TCP_SKB_CB(skb)->end_seq -
TCP_SKB_CB(skb)->seq >= inet_csk(sk)->icsk_ack.rcv_mss)) {
u32 delta = tcp_time_stamp(tp) - tp->rx_opt.rcv_tsecr;
- u32 delta_us = delta * (USEC_PER_SEC / TCP_TS_HZ);
+ u32 delta_us;
+ if (!delta)
+ delta = 1;
+ delta_us = delta * (USEC_PER_SEC / TCP_TS_HZ);
tcp_rcv_rtt_update(tp, delta_us, 0);
}
}
Patches currently in stable-queue which might be from weiwan(a)google.com are
queue-4.14/tcp-fix-potential-underestimation-on-rcv_rtt.patch
This is a note to let you know that I've just added the patch titled
sock: free skb in skb_complete_tx_timestamp on error
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
sock-free-skb-in-skb_complete_tx_timestamp-on-error.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Sun Dec 31 11:12:48 CET 2017
From: Willem de Bruijn <willemb(a)google.com>
Date: Wed, 13 Dec 2017 14:41:06 -0500
Subject: sock: free skb in skb_complete_tx_timestamp on error
From: Willem de Bruijn <willemb(a)google.com>
[ Upstream commit 35b99dffc3f710cafceee6c8c6ac6a98eb2cb4bf ]
skb_complete_tx_timestamp must ingest the skb it is passed. Call
kfree_skb if the skb cannot be enqueued.
Fixes: b245be1f4db1 ("net-timestamp: no-payload only sysctl")
Fixes: 9ac25fc06375 ("net: fix socket refcounting in skb_complete_tx_timestamp()")
Reported-by: Richard Cochran <richardcochran(a)gmail.com>
Signed-off-by: Willem de Bruijn <willemb(a)google.com>
Reviewed-by: Eric Dumazet <edumazet(a)google.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/core/skbuff.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4296,7 +4296,7 @@ void skb_complete_tx_timestamp(struct sk
struct sock *sk = skb->sk;
if (!skb_may_tx_timestamp(sk, false))
- return;
+ goto err;
/* Take a reference to prevent skb_orphan() from freeing the socket,
* but only if the socket refcount is not zero.
@@ -4305,7 +4305,11 @@ void skb_complete_tx_timestamp(struct sk
*skb_hwtstamps(skb) = *hwtstamps;
__skb_complete_tx_timestamp(skb, sk, SCM_TSTAMP_SND, false);
sock_put(sk);
+ return;
}
+
+err:
+ kfree_skb(skb);
}
EXPORT_SYMBOL_GPL(skb_complete_tx_timestamp);
Patches currently in stable-queue which might be from willemb(a)google.com are
queue-4.14/skbuff-skb_copy_ubufs-must-release-uarg-even-without-user-frags.patch
queue-4.14/sock-free-skb-in-skb_complete_tx_timestamp-on-error.patch
queue-4.14/skbuff-orphan-frags-before-zerocopy-clone.patch
queue-4.14/skbuff-in-skb_copy_ubufs-unclone-before-releasing-zerocopy.patch
This is a note to let you know that I've just added the patch titled
skbuff: skb_copy_ubufs must release uarg even without user frags
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
skbuff-skb_copy_ubufs-must-release-uarg-even-without-user-frags.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Sun Dec 31 11:12:48 CET 2017
From: Willem de Bruijn <willemb(a)google.com>
Date: Wed, 20 Dec 2017 17:37:50 -0500
Subject: skbuff: skb_copy_ubufs must release uarg even without user frags
From: Willem de Bruijn <willemb(a)google.com>
[ Upstream commit b90ddd568792bcb0054eaf0f61785c8f80c3bd1c ]
skb_copy_ubufs creates a private copy of frags[] to release its hold
on user frags, then calls uarg->callback to notify the owner.
Call uarg->callback even when no frags exist. This edge case can
happen when zerocopy_sg_from_iter finds enough room in skb_headlen
to copy all the data.
Fixes: 3ece782693c4 ("sock: skb_copy_ubufs support for compound pages")
Signed-off-by: Willem de Bruijn <willemb(a)google.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/core/skbuff.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1182,7 +1182,7 @@ int skb_copy_ubufs(struct sk_buff *skb,
u32 d_off;
if (!num_frags)
- return 0;
+ goto release;
if (skb_shared(skb) || skb_unclone(skb, gfp_mask))
return -EINVAL;
@@ -1242,6 +1242,7 @@ int skb_copy_ubufs(struct sk_buff *skb,
__skb_fill_page_desc(skb, new_frags - 1, head, 0, d_off);
skb_shinfo(skb)->nr_frags = new_frags;
+release:
skb_zcopy_clear(skb, false);
return 0;
}
Patches currently in stable-queue which might be from willemb(a)google.com are
queue-4.14/skbuff-skb_copy_ubufs-must-release-uarg-even-without-user-frags.patch
queue-4.14/sock-free-skb-in-skb_complete_tx_timestamp-on-error.patch
queue-4.14/skbuff-orphan-frags-before-zerocopy-clone.patch
queue-4.14/skbuff-in-skb_copy_ubufs-unclone-before-releasing-zerocopy.patch
This is a note to let you know that I've just added the patch titled
skbuff: orphan frags before zerocopy clone
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
skbuff-orphan-frags-before-zerocopy-clone.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Sun Dec 31 11:12:48 CET 2017
From: Willem de Bruijn <willemb(a)google.com>
Date: Wed, 20 Dec 2017 17:37:49 -0500
Subject: skbuff: orphan frags before zerocopy clone
From: Willem de Bruijn <willemb(a)google.com>
[ Upstream commit 268b790679422a89e9ab0685d9f291edae780c98 ]
Call skb_zerocopy_clone after skb_orphan_frags, to avoid duplicate
calls to skb_uarg(skb)->callback for the same data.
skb_zerocopy_clone associates skb_shinfo(skb)->uarg from frag_skb
with each segment. This is only safe for uargs that do refcounting,
which is those that pass skb_orphan_frags without dropping their
shared frags. For others, skb_orphan_frags drops the user frags and
sets the uarg to NULL, after which sock_zerocopy_clone has no effect.
Qemu hangs were reported due to duplicate vhost_net_zerocopy_callback
calls for the same data causing the vhost_net_ubuf_ref_>refcount to
drop below zero.
Link: http://lkml.kernel.org/r/<CAF=yD-LWyCD4Y0aJ9O0e_CHLR+3JOeKicRRTEVCPxgw4XOcqGQ(a)mail.gmail.com>
Fixes: 1f8b977ab32d ("sock: enable MSG_ZEROCOPY")
Reported-by: Andreas Hartmann <andihartmann(a)01019freenet.de>
Reported-by: David Hill <dhill(a)redhat.com>
Signed-off-by: Willem de Bruijn <willemb(a)google.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/core/skbuff.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3657,8 +3657,6 @@ normal:
skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags &
SKBTX_SHARED_FRAG;
- if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC))
- goto err;
while (pos < offset + len) {
if (i >= nfrags) {
@@ -3684,6 +3682,8 @@ normal:
if (unlikely(skb_orphan_frags(frag_skb, GFP_ATOMIC)))
goto err;
+ if (skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC))
+ goto err;
*nskb_frag = *frag;
__skb_frag_ref(nskb_frag);
Patches currently in stable-queue which might be from willemb(a)google.com are
queue-4.14/skbuff-skb_copy_ubufs-must-release-uarg-even-without-user-frags.patch
queue-4.14/sock-free-skb-in-skb_complete_tx_timestamp-on-error.patch
queue-4.14/skbuff-orphan-frags-before-zerocopy-clone.patch
queue-4.14/skbuff-in-skb_copy_ubufs-unclone-before-releasing-zerocopy.patch
This is a note to let you know that I've just added the patch titled
skbuff: in skb_copy_ubufs unclone before releasing zerocopy
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
skbuff-in-skb_copy_ubufs-unclone-before-releasing-zerocopy.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Sun Dec 31 11:12:48 CET 2017
From: Willem de Bruijn <willemb(a)google.com>
Date: Thu, 28 Dec 2017 12:38:13 -0500
Subject: skbuff: in skb_copy_ubufs unclone before releasing zerocopy
From: Willem de Bruijn <willemb(a)google.com>
skb_copy_ubufs must unclone before it is safe to modify its
skb_shared_info with skb_zcopy_clear.
Commit b90ddd568792 ("skbuff: skb_copy_ubufs must release uarg even
without user frags") ensures that all skbs release their zerocopy
state, even those without frags.
But I forgot an edge case where such an skb arrives that is cloned.
The stack does not build such packets. Vhost/tun skbs have their
frags orphaned before cloning. TCP skbs only attach zerocopy state
when a frag is added.
But if TCP packets can be trimmed or linearized, this might occur.
Tracing the code I found no instance so far (e.g., skb_linearize
ends up calling skb_zcopy_clear if !skb->data_len).
Still, it is non-obvious that no path exists. And it is fragile to
rely on this.
Fixes: b90ddd568792 ("skbuff: skb_copy_ubufs must release uarg even without user frags")
Signed-off-by: Willem de Bruijn <willemb(a)google.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/core/skbuff.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1181,12 +1181,12 @@ int skb_copy_ubufs(struct sk_buff *skb,
int i, new_frags;
u32 d_off;
- if (!num_frags)
- goto release;
-
if (skb_shared(skb) || skb_unclone(skb, gfp_mask))
return -EINVAL;
+ if (!num_frags)
+ goto release;
+
new_frags = (__skb_pagelen(skb) + PAGE_SIZE - 1) >> PAGE_SHIFT;
for (i = 0; i < new_frags; i++) {
page = alloc_page(gfp_mask);
Patches currently in stable-queue which might be from willemb(a)google.com are
queue-4.14/skbuff-skb_copy_ubufs-must-release-uarg-even-without-user-frags.patch
queue-4.14/sock-free-skb-in-skb_complete_tx_timestamp-on-error.patch
queue-4.14/skbuff-orphan-frags-before-zerocopy-clone.patch
queue-4.14/skbuff-in-skb_copy_ubufs-unclone-before-releasing-zerocopy.patch
This is a note to let you know that I've just added the patch titled
sfc: pass valid pointers from efx_enqueue_unwind
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
sfc-pass-valid-pointers-from-efx_enqueue_unwind.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Sun Dec 31 11:12:48 CET 2017
From: Bert Kenward <bkenward(a)solarflare.com>
Date: Thu, 7 Dec 2017 17:18:58 +0000
Subject: sfc: pass valid pointers from efx_enqueue_unwind
From: Bert Kenward <bkenward(a)solarflare.com>
[ Upstream commit d4a7a8893d4cdbc89d79ac4aa704bf8d4b67b368 ]
The bytes_compl and pkts_compl pointers passed to efx_dequeue_buffers
cannot be NULL. Add a paranoid warning to check this condition and fix
the one case where they were NULL.
efx_enqueue_unwind() is called very rarely, during error handling.
Without this fix it would fail with a NULL pointer dereference in
efx_dequeue_buffer, with efx_enqueue_skb in the call stack.
Fixes: e9117e5099ea ("sfc: Firmware-Assisted TSO version 2")
Reported-by: Jarod Wilson <jarod(a)redhat.com>
Signed-off-by: Bert Kenward <bkenward(a)solarflare.com>
Tested-by: Jarod Wilson <jarod(a)redhat.com>
Acked-by: Jarod Wilson <jarod(a)redhat.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/net/ethernet/sfc/tx.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/sfc/tx.c
+++ b/drivers/net/ethernet/sfc/tx.c
@@ -77,6 +77,7 @@ static void efx_dequeue_buffer(struct ef
}
if (buffer->flags & EFX_TX_BUF_SKB) {
+ EFX_WARN_ON_PARANOID(!pkts_compl || !bytes_compl);
(*pkts_compl)++;
(*bytes_compl) += buffer->skb->len;
dev_consume_skb_any((struct sk_buff *)buffer->skb);
@@ -426,12 +427,14 @@ static int efx_tx_map_data(struct efx_tx
static void efx_enqueue_unwind(struct efx_tx_queue *tx_queue)
{
struct efx_tx_buffer *buffer;
+ unsigned int bytes_compl = 0;
+ unsigned int pkts_compl = 0;
/* Work backwards until we hit the original insert pointer value */
while (tx_queue->insert_count != tx_queue->write_count) {
--tx_queue->insert_count;
buffer = __efx_tx_queue_get_insert_buffer(tx_queue);
- efx_dequeue_buffer(tx_queue, buffer, NULL, NULL);
+ efx_dequeue_buffer(tx_queue, buffer, &pkts_compl, &bytes_compl);
}
}
Patches currently in stable-queue which might be from bkenward(a)solarflare.com are
queue-4.14/sfc-pass-valid-pointers-from-efx_enqueue_unwind.patch
This is a note to let you know that I've just added the patch titled
sctp: Replace use of sockets_allocated with specified macro.
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
sctp-replace-use-of-sockets_allocated-with-specified-macro.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Sun Dec 31 11:12:48 CET 2017
From: Tonghao Zhang <xiangxia.m.yue(a)gmail.com>
Date: Fri, 22 Dec 2017 10:15:20 -0800
Subject: sctp: Replace use of sockets_allocated with specified macro.
From: Tonghao Zhang <xiangxia.m.yue(a)gmail.com>
[ Upstream commit 8cb38a602478e9f806571f6920b0a3298aabf042 ]
The patch(180d8cd942ce) replaces all uses of struct sock fields'
memory_pressure, memory_allocated, sockets_allocated, and sysctl_mem
to accessor macros. But the sockets_allocated field of sctp sock is
not replaced at all. Then replace it now for unifying the code.
Fixes: 180d8cd942ce ("foundations of per-cgroup memory pressure controlling.")
Cc: Glauber Costa <glommer(a)parallels.com>
Signed-off-by: Tonghao Zhang <zhangtonghao(a)didichuxing.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/sctp/socket.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -4413,7 +4413,7 @@ static int sctp_init_sock(struct sock *s
SCTP_DBG_OBJCNT_INC(sock);
local_bh_disable();
- percpu_counter_inc(&sctp_sockets_allocated);
+ sk_sockets_allocated_inc(sk);
sock_prot_inuse_add(net, sk->sk_prot, 1);
/* Nothing can fail after this block, otherwise
@@ -4457,7 +4457,7 @@ static void sctp_destroy_sock(struct soc
}
sctp_endpoint_free(sp->ep);
local_bh_disable();
- percpu_counter_dec(&sctp_sockets_allocated);
+ sk_sockets_allocated_dec(sk);
sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1);
local_bh_enable();
}
Patches currently in stable-queue which might be from xiangxia.m.yue(a)gmail.com are
queue-4.14/sctp-replace-use-of-sockets_allocated-with-specified-macro.patch