This is a note to let you know that I've just added the patch titled
adding missing rcu_read_unlock in ipxip6_rcv
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
adding-missing-rcu_read_unlock-in-ipxip6_rcv.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Sun Dec 31 11:13:15 CET 2017
From: "Nikita V. Shirokov" <tehnerd(a)fb.com>
Date: Wed, 6 Dec 2017 17:15:43 -0800
Subject: adding missing rcu_read_unlock in ipxip6_rcv
From: "Nikita V. Shirokov" <tehnerd(a)fb.com>
[ Upstream commit 74c4b656c3d92ec4c824ea1a4afd726b7b6568c8 ]
commit 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
introduced new exit point in ipxip6_rcv. however rcu_read_unlock is
missing there. this diff is fixing this
v1->v2:
instead of doing rcu_read_unlock in place, we are going to "drop"
section (to prevent skb leakage)
Fixes: 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
Signed-off-by: Nikita V. Shirokov <tehnerd(a)fb.com>
Acked-by: Alexei Starovoitov <ast(a)kernel.org>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/ipv6/ip6_tunnel.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -911,7 +911,7 @@ static int ipxip6_rcv(struct sk_buff *sk
if (t->parms.collect_md) {
tun_dst = ipv6_tun_rx_dst(skb, 0, 0, 0);
if (!tun_dst)
- return 0;
+ goto drop;
}
ret = __ip6_tnl_rcv(t, skb, tpi, tun_dst, dscp_ecn_decapsulate,
log_ecn_error);
Patches currently in stable-queue which might be from tehnerd(a)fb.com are
queue-4.9/adding-missing-rcu_read_unlock-in-ipxip6_rcv.patch
This is a note to let you know that I've just added the patch titled
vxlan: restore dev->mtu setting based on lower device
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
vxlan-restore-dev-mtu-setting-based-on-lower-device.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Sun Dec 31 11:12:48 CET 2017
From: Alexey Kodanev <alexey.kodanev(a)oracle.com>
Date: Thu, 14 Dec 2017 20:20:00 +0300
Subject: vxlan: restore dev->mtu setting based on lower device
From: Alexey Kodanev <alexey.kodanev(a)oracle.com>
[ Upstream commit f870c1ff65a6d1f3a083f277280802ee09a5b44d ]
Stefano Brivio says:
Commit a985343ba906 ("vxlan: refactor verification and
application of configuration") introduced a change in the
behaviour of initial MTU setting: earlier, the MTU for a link
created on top of a given lower device, without an initial MTU
specification, was set to the MTU of the lower device minus
headroom as a result of this path in vxlan_dev_configure():
if (!conf->mtu)
dev->mtu = lowerdev->mtu -
(use_ipv6 ? VXLAN6_HEADROOM : VXLAN_HEADROOM);
which is now gone. Now, the initial MTU, in absence of a
configured value, is simply set by ether_setup() to ETH_DATA_LEN
(1500 bytes).
This breaks userspace expectations in case the MTU of
the lower device is higher than 1500 bytes minus headroom.
This patch restores the previous behaviour on newlink operation. Since
max_mtu can be negative and we update dev->mtu directly, also check it
for valid minimum.
Reported-by: Junhan Yan <juyan(a)redhat.com>
Fixes: a985343ba906 ("vxlan: refactor verification and application of configuration")
Signed-off-by: Alexey Kodanev <alexey.kodanev(a)oracle.com>
Acked-by: Stefano Brivio <sbrivio(a)redhat.com>
Signed-off-by: Stefano Brivio <sbrivio(a)redhat.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/net/vxlan.c | 5 +++++
1 file changed, 5 insertions(+)
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -3105,6 +3105,11 @@ static void vxlan_config_apply(struct ne
max_mtu = lowerdev->mtu - (use_ipv6 ? VXLAN6_HEADROOM :
VXLAN_HEADROOM);
+ if (max_mtu < ETH_MIN_MTU)
+ max_mtu = ETH_MIN_MTU;
+
+ if (!changelink && !conf->mtu)
+ dev->mtu = max_mtu;
}
if (dev->mtu > max_mtu)
Patches currently in stable-queue which might be from alexey.kodanev(a)oracle.com are
queue-4.14/vxlan-restore-dev-mtu-setting-based-on-lower-device.patch
queue-4.14/ip6_gre-fix-device-features-for-ioctl-setup.patch
This is a note to let you know that I've just added the patch titled
tipc: fix hanging poll() for stream sockets
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
tipc-fix-hanging-poll-for-stream-sockets.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Sun Dec 31 11:12:48 CET 2017
From: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan(a)gmail.com>
Date: Thu, 28 Dec 2017 12:03:06 +0100
Subject: tipc: fix hanging poll() for stream sockets
From: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan(a)gmail.com>
[ Upstream commit 517d7c79bdb39864e617960504bdc1aa560c75c6 ]
In commit 42b531de17d2f6 ("tipc: Fix missing connection request
handling"), we replaced unconditional wakeup() with condtional
wakeup for clients with flags POLLIN | POLLRDNORM | POLLRDBAND.
This breaks the applications which do a connect followed by poll
with POLLOUT flag. These applications are not woken when the
connection is ESTABLISHED and hence sleep forever.
In this commit, we fix it by including the POLLOUT event for
sockets in TIPC_CONNECTING state.
Fixes: 42b531de17d2f6 ("tipc: Fix missing connection request handling")
Acked-by: Jon Maloy <jon.maloy(a)ericsson.com>
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan(a)gmail.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/tipc/socket.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -709,11 +709,11 @@ static unsigned int tipc_poll(struct fil
switch (sk->sk_state) {
case TIPC_ESTABLISHED:
+ case TIPC_CONNECTING:
if (!tsk->cong_link_cnt && !tsk_conn_cong(tsk))
mask |= POLLOUT;
/* fall thru' */
case TIPC_LISTEN:
- case TIPC_CONNECTING:
if (!skb_queue_empty(&sk->sk_receive_queue))
mask |= (POLLIN | POLLRDNORM);
break;
Patches currently in stable-queue which might be from parthasarathy.bhuvaragan(a)gmail.com are
queue-4.14/tipc-fix-hanging-poll-for-stream-sockets.patch
This is a note to let you know that I've just added the patch titled
tg3: Fix rx hang on MTU change with 5717/5719
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
tg3-fix-rx-hang-on-mtu-change-with-5717-5719.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Sun Dec 31 11:12:48 CET 2017
From: Brian King <brking(a)linux.vnet.ibm.com>
Date: Fri, 15 Dec 2017 15:21:50 -0600
Subject: tg3: Fix rx hang on MTU change with 5717/5719
From: Brian King <brking(a)linux.vnet.ibm.com>
[ Upstream commit 748a240c589824e9121befb1cba5341c319885bc ]
This fixes a hang issue seen when changing the MTU size from 1500 MTU
to 9000 MTU on both 5717 and 5719 chips. In discussion with Broadcom,
they've indicated that these chipsets have the same phy as the 57766
chipset, so the same workarounds apply. This has been tested by IBM
on both Power 8 and Power 9 systems as well as by Broadcom on x86
hardware and has been confirmed to resolve the hang issue.
Signed-off-by: Brian King <brking(a)linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/net/ethernet/broadcom/tg3.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -14227,7 +14227,9 @@ static int tg3_change_mtu(struct net_dev
/* Reset PHY, otherwise the read DMA engine will be in a mode that
* breaks all requests to 256 bytes.
*/
- if (tg3_asic_rev(tp) == ASIC_REV_57766)
+ if (tg3_asic_rev(tp) == ASIC_REV_57766 ||
+ tg3_asic_rev(tp) == ASIC_REV_5717 ||
+ tg3_asic_rev(tp) == ASIC_REV_5719)
reset_phy = true;
err = tg3_restart_hw(tp, reset_phy);
Patches currently in stable-queue which might be from brking(a)linux.vnet.ibm.com are
queue-4.14/tg3-fix-rx-hang-on-mtu-change-with-5717-5719.patch
This is a note to let you know that I've just added the patch titled
tcp_bbr: reset long-term bandwidth sampling on loss recovery undo
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
tcp_bbr-reset-long-term-bandwidth-sampling-on-loss-recovery-undo.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Sun Dec 31 11:12:48 CET 2017
From: Neal Cardwell <ncardwell(a)google.com>
Date: Thu, 7 Dec 2017 12:43:32 -0500
Subject: tcp_bbr: reset long-term bandwidth sampling on loss recovery undo
From: Neal Cardwell <ncardwell(a)google.com>
[ Upstream commit 600647d467c6d04b3954b41a6ee1795b5ae00550 ]
Fix BBR so that upon notification of a loss recovery undo BBR resets
long-term bandwidth sampling.
Under high reordering, reordering events can be interpreted as loss.
If the reordering and spurious loss estimates are high enough, this
can cause BBR to spuriously estimate that we are seeing loss rates
high enough to trigger long-term bandwidth estimation. To avoid that
problem, this commit resets long-term bandwidth sampling on loss
recovery undo events.
Signed-off-by: Neal Cardwell <ncardwell(a)google.com>
Reviewed-by: Yuchung Cheng <ycheng(a)google.com>
Acked-by: Soheil Hassas Yeganeh <soheil(a)google.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/ipv4/tcp_bbr.c | 1 +
1 file changed, 1 insertion(+)
--- a/net/ipv4/tcp_bbr.c
+++ b/net/ipv4/tcp_bbr.c
@@ -878,6 +878,7 @@ static u32 bbr_undo_cwnd(struct sock *sk
bbr->full_bw = 0; /* spurious slow-down; reset full pipe detection */
bbr->full_bw_cnt = 0;
+ bbr_reset_lt_bw_sampling(sk);
return tcp_sk(sk)->snd_cwnd;
}
Patches currently in stable-queue which might be from ncardwell(a)google.com are
queue-4.14/tcp-refresh-tcp_mstamp-from-timers-callbacks.patch
queue-4.14/tcp_bbr-reset-full-pipe-detection-on-loss-recovery-undo.patch
queue-4.14/tcp_bbr-reset-long-term-bandwidth-sampling-on-loss-recovery-undo.patch
queue-4.14/tcp-invalidate-rate-samples-during-sack-reneging.patch
queue-4.14/tcp-fix-potential-underestimation-on-rcv_rtt.patch
queue-4.14/tcp_bbr-record-full-bw-reached-decision-in-new-full_bw_reached-bit.patch
This is a note to let you know that I've just added the patch titled
tcp_bbr: reset full pipe detection on loss recovery undo
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
tcp_bbr-reset-full-pipe-detection-on-loss-recovery-undo.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Sun Dec 31 11:12:48 CET 2017
From: Neal Cardwell <ncardwell(a)google.com>
Date: Thu, 7 Dec 2017 12:43:31 -0500
Subject: tcp_bbr: reset full pipe detection on loss recovery undo
From: Neal Cardwell <ncardwell(a)google.com>
[ Upstream commit 2f6c498e4f15d27852c04ed46d804a39137ba364 ]
Fix BBR so that upon notification of a loss recovery undo BBR resets
the full pipe detection (STARTUP exit) state machine.
Under high reordering, reordering events can be interpreted as loss.
If the reordering and spurious loss estimates are high enough, this
could previously cause BBR to spuriously estimate that the pipe is
full.
Since spurious loss recovery means that our overall sending will have
slowed down spuriously, this commit gives a flow more time to probe
robustly for bandwidth and decide the pipe is really full.
Signed-off-by: Neal Cardwell <ncardwell(a)google.com>
Reviewed-by: Yuchung Cheng <ycheng(a)google.com>
Acked-by: Soheil Hassas Yeganeh <soheil(a)google.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/ipv4/tcp_bbr.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/net/ipv4/tcp_bbr.c
+++ b/net/ipv4/tcp_bbr.c
@@ -874,6 +874,10 @@ static u32 bbr_sndbuf_expand(struct sock
*/
static u32 bbr_undo_cwnd(struct sock *sk)
{
+ struct bbr *bbr = inet_csk_ca(sk);
+
+ bbr->full_bw = 0; /* spurious slow-down; reset full pipe detection */
+ bbr->full_bw_cnt = 0;
return tcp_sk(sk)->snd_cwnd;
}
Patches currently in stable-queue which might be from ncardwell(a)google.com are
queue-4.14/tcp-refresh-tcp_mstamp-from-timers-callbacks.patch
queue-4.14/tcp_bbr-reset-full-pipe-detection-on-loss-recovery-undo.patch
queue-4.14/tcp_bbr-reset-long-term-bandwidth-sampling-on-loss-recovery-undo.patch
queue-4.14/tcp-invalidate-rate-samples-during-sack-reneging.patch
queue-4.14/tcp-fix-potential-underestimation-on-rcv_rtt.patch
queue-4.14/tcp_bbr-record-full-bw-reached-decision-in-new-full_bw_reached-bit.patch
This is a note to let you know that I've just added the patch titled
tcp_bbr: record "full bw reached" decision in new full_bw_reached bit
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
tcp_bbr-record-full-bw-reached-decision-in-new-full_bw_reached-bit.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Sun Dec 31 11:12:48 CET 2017
From: Neal Cardwell <ncardwell(a)google.com>
Date: Thu, 7 Dec 2017 12:43:30 -0500
Subject: tcp_bbr: record "full bw reached" decision in new full_bw_reached bit
From: Neal Cardwell <ncardwell(a)google.com>
[ Upstream commit c589e69b508d29ed8e644dfecda453f71c02ec27 ]
This commit records the "full bw reached" decision in a new
full_bw_reached bit. This is a pure refactor that does not change the
current behavior, but enables subsequent fixes and improvements.
In particular, this enables simple and clean fixes because the full_bw
and full_bw_cnt can be unconditionally zeroed without worrying about
forgetting that we estimated we filled the pipe in Startup. And it
enables future improvements because multiple code paths can be used
for estimating that we filled the pipe in Startup; any new code paths
only need to set this bit when they think the pipe is full.
Note that this fix intentionally reduces the width of the full_bw_cnt
counter, since we have never used the most significant bit.
Signed-off-by: Neal Cardwell <ncardwell(a)google.com>
Reviewed-by: Yuchung Cheng <ycheng(a)google.com>
Acked-by: Soheil Hassas Yeganeh <soheil(a)google.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/ipv4/tcp_bbr.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
--- a/net/ipv4/tcp_bbr.c
+++ b/net/ipv4/tcp_bbr.c
@@ -110,7 +110,8 @@ struct bbr {
u32 lt_last_lost; /* LT intvl start: tp->lost */
u32 pacing_gain:10, /* current gain for setting pacing rate */
cwnd_gain:10, /* current gain for setting cwnd */
- full_bw_cnt:3, /* number of rounds without large bw gains */
+ full_bw_reached:1, /* reached full bw in Startup? */
+ full_bw_cnt:2, /* number of rounds without large bw gains */
cycle_idx:3, /* current index in pacing_gain cycle array */
has_seen_rtt:1, /* have we seen an RTT sample yet? */
unused_b:5;
@@ -180,7 +181,7 @@ static bool bbr_full_bw_reached(const st
{
const struct bbr *bbr = inet_csk_ca(sk);
- return bbr->full_bw_cnt >= bbr_full_bw_cnt;
+ return bbr->full_bw_reached;
}
/* Return the windowed max recent bandwidth sample, in pkts/uS << BW_SCALE. */
@@ -717,6 +718,7 @@ static void bbr_check_full_bw_reached(st
return;
}
++bbr->full_bw_cnt;
+ bbr->full_bw_reached = bbr->full_bw_cnt >= bbr_full_bw_cnt;
}
/* If pipe is probably full, drain the queue and then enter steady-state. */
@@ -850,6 +852,7 @@ static void bbr_init(struct sock *sk)
bbr->restore_cwnd = 0;
bbr->round_start = 0;
bbr->idle_restart = 0;
+ bbr->full_bw_reached = 0;
bbr->full_bw = 0;
bbr->full_bw_cnt = 0;
bbr->cycle_mstamp = 0;
Patches currently in stable-queue which might be from ncardwell(a)google.com are
queue-4.14/tcp-refresh-tcp_mstamp-from-timers-callbacks.patch
queue-4.14/tcp_bbr-reset-full-pipe-detection-on-loss-recovery-undo.patch
queue-4.14/tcp_bbr-reset-long-term-bandwidth-sampling-on-loss-recovery-undo.patch
queue-4.14/tcp-invalidate-rate-samples-during-sack-reneging.patch
queue-4.14/tcp-fix-potential-underestimation-on-rcv_rtt.patch
queue-4.14/tcp_bbr-record-full-bw-reached-decision-in-new-full_bw_reached-bit.patch
This is a note to let you know that I've just added the patch titled
tcp: refresh tcp_mstamp from timers callbacks
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
tcp-refresh-tcp_mstamp-from-timers-callbacks.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Sun Dec 31 11:12:48 CET 2017
From: Eric Dumazet <edumazet(a)google.com>
Date: Tue, 12 Dec 2017 18:22:52 -0800
Subject: tcp: refresh tcp_mstamp from timers callbacks
From: Eric Dumazet <edumazet(a)google.com>
[ Upstream commit 4688eb7cf3ae2c2721d1dacff5c1384cba47d176 ]
Only the retransmit timer currently refreshes tcp_mstamp
We should do the same for delayed acks and keepalives.
Even if RFC 7323 does not request it, this is consistent to what linux
did in the past, when TS values were based on jiffies.
Fixes: 385e20706fac ("tcp: use tp->tcp_mstamp in output path")
Signed-off-by: Eric Dumazet <edumazet(a)google.com>
Cc: Soheil Hassas Yeganeh <soheil(a)google.com>
Cc: Mike Maloney <maloney(a)google.com>
Cc: Neal Cardwell <ncardwell(a)google.com>
Acked-by: Neal Cardwell <ncardwell(a)google.com>
Acked-by: Soheil Hassas Yeganeh <soheil(a)google.com>
Acked-by: Mike Maloney <maloney(a)google.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/ipv4/tcp_timer.c | 2 ++
1 file changed, 2 insertions(+)
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -264,6 +264,7 @@ void tcp_delack_timer_handler(struct soc
icsk->icsk_ack.pingpong = 0;
icsk->icsk_ack.ato = TCP_ATO_MIN;
}
+ tcp_mstamp_refresh(tcp_sk(sk));
tcp_send_ack(sk);
__NET_INC_STATS(sock_net(sk), LINUX_MIB_DELAYEDACKS);
}
@@ -627,6 +628,7 @@ static void tcp_keepalive_timer (unsigne
goto out;
}
+ tcp_mstamp_refresh(tp);
if (sk->sk_state == TCP_FIN_WAIT2 && sock_flag(sk, SOCK_DEAD)) {
if (tp->linger2 >= 0) {
const int tmo = tcp_fin_time(sk) - TCP_TIMEWAIT_LEN;
Patches currently in stable-queue which might be from edumazet(a)google.com are
queue-4.14/tcp-refresh-tcp_mstamp-from-timers-callbacks.patch
queue-4.14/net-fix-double-free-and-memory-corruption-in-get_net_ns_by_id.patch
queue-4.14/sock-free-skb-in-skb_complete_tx_timestamp-on-error.patch
queue-4.14/tcp-invalidate-rate-samples-during-sack-reneging.patch
queue-4.14/tcp-md5sig-use-skb-s-saddr-when-replying-to-an-incoming-segment.patch
queue-4.14/net-ipv4-fix-for-a-race-condition-in-raw_sendmsg.patch
queue-4.14/ipv4-igmp-guard-against-silly-mtu-values.patch
queue-4.14/tcp-fix-potential-underestimation-on-rcv_rtt.patch
queue-4.14/ipv6-mcast-better-catch-silly-mtu-values.patch
This is a note to let you know that I've just added the patch titled
tcp md5sig: Use skb's saddr when replying to an incoming segment
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
tcp-md5sig-use-skb-s-saddr-when-replying-to-an-incoming-segment.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Sun Dec 31 11:12:48 CET 2017
From: Christoph Paasch <cpaasch(a)apple.com>
Date: Mon, 11 Dec 2017 00:05:46 -0800
Subject: tcp md5sig: Use skb's saddr when replying to an incoming segment
From: Christoph Paasch <cpaasch(a)apple.com>
[ Upstream commit 30791ac41927ebd3e75486f9504b6d2280463bf0 ]
The MD5-key that belongs to a connection is identified by the peer's
IP-address. When we are in tcp_v4(6)_reqsk_send_ack(), we are replying
to an incoming segment from tcp_check_req() that failed the seq-number
checks.
Thus, to find the correct key, we need to use the skb's saddr and not
the daddr.
This bug seems to have been there since quite a while, but probably got
unnoticed because the consequences are not catastrophic. We will call
tcp_v4_reqsk_send_ack only to send a challenge-ACK back to the peer,
thus the connection doesn't really fail.
Fixes: 9501f9722922 ("tcp md5sig: Let the caller pass appropriate key for tcp_v{4,6}_do_calc_md5_hash().")
Signed-off-by: Christoph Paasch <cpaasch(a)apple.com>
Reviewed-by: Eric Dumazet <edumazet(a)google.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/ipv4/tcp_ipv4.c | 2 +-
net/ipv6/tcp_ipv6.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -844,7 +844,7 @@ static void tcp_v4_reqsk_send_ack(const
tcp_time_stamp_raw() + tcp_rsk(req)->ts_off,
req->ts_recent,
0,
- tcp_md5_do_lookup(sk, (union tcp_md5_addr *)&ip_hdr(skb)->daddr,
+ tcp_md5_do_lookup(sk, (union tcp_md5_addr *)&ip_hdr(skb)->saddr,
AF_INET),
inet_rsk(req)->no_srccheck ? IP_REPLY_ARG_NOSRCCHECK : 0,
ip_hdr(skb)->tos);
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -988,7 +988,7 @@ static void tcp_v6_reqsk_send_ack(const
req->rsk_rcv_wnd >> inet_rsk(req)->rcv_wscale,
tcp_time_stamp_raw() + tcp_rsk(req)->ts_off,
req->ts_recent, sk->sk_bound_dev_if,
- tcp_v6_md5_do_lookup(sk, &ipv6_hdr(skb)->daddr),
+ tcp_v6_md5_do_lookup(sk, &ipv6_hdr(skb)->saddr),
0, 0);
}
Patches currently in stable-queue which might be from cpaasch(a)apple.com are
queue-4.14/tcp-md5sig-use-skb-s-saddr-when-replying-to-an-incoming-segment.patch
This is a note to let you know that I've just added the patch titled
tcp: invalidate rate samples during SACK reneging
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
tcp-invalidate-rate-samples-during-sack-reneging.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Sun Dec 31 11:12:48 CET 2017
From: Yousuk Seung <ysseung(a)google.com>
Date: Thu, 7 Dec 2017 13:41:34 -0800
Subject: tcp: invalidate rate samples during SACK reneging
From: Yousuk Seung <ysseung(a)google.com>
[ Upstream commit d4761754b4fb2ef8d9a1e9d121c4bec84e1fe292 ]
Mark tcp_sock during a SACK reneging event and invalidate rate samples
while marked. Such rate samples may overestimate bw by including packets
that were SACKed before reneging.
< ack 6001 win 10000 sack 7001:38001
< ack 7001 win 0 sack 8001:38001 // Reneg detected
> seq 7001:8001 // RTO, SACK cleared.
< ack 38001 win 10000
In above example the rate sample taken after the last ack will count
7001-38001 as delivered while the actual delivery rate likely could
be much lower i.e. 7001-8001.
This patch adds a new field tcp_sock.sack_reneg and marks it when we
declare SACK reneging and entering TCP_CA_Loss, and unmarks it after
the last rate sample was taken before moving back to TCP_CA_Open. This
patch also invalidates rate samples taken while tcp_sock.is_sack_reneg
is set.
Fixes: b9f64820fb22 ("tcp: track data delivery rate for a TCP connection")
Signed-off-by: Yousuk Seung <ysseung(a)google.com>
Signed-off-by: Neal Cardwell <ncardwell(a)google.com>
Signed-off-by: Yuchung Cheng <ycheng(a)google.com>
Acked-by: Soheil Hassas Yeganeh <soheil(a)google.com>
Acked-by: Eric Dumazet <edumazet(a)google.com>
Acked-by: Priyaranjan Jha <priyarjha(a)google.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
include/linux/tcp.h | 3 ++-
include/net/tcp.h | 2 +-
net/ipv4/tcp.c | 1 +
net/ipv4/tcp_input.c | 10 ++++++++--
net/ipv4/tcp_rate.c | 10 +++++++---
5 files changed, 19 insertions(+), 7 deletions(-)
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -214,7 +214,8 @@ struct tcp_sock {
u8 chrono_type:2, /* current chronograph type */
rate_app_limited:1, /* rate_{delivered,interval_us} limited? */
fastopen_connect:1, /* FASTOPEN_CONNECT sockopt */
- unused:4;
+ is_sack_reneg:1, /* in recovery from loss with SACK reneg? */
+ unused:3;
u8 nonagle : 4,/* Disable Nagle algorithm? */
thin_lto : 1,/* Use linear timeouts for thin streams */
unused1 : 1,
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1085,7 +1085,7 @@ void tcp_rate_skb_sent(struct sock *sk,
void tcp_rate_skb_delivered(struct sock *sk, struct sk_buff *skb,
struct rate_sample *rs);
void tcp_rate_gen(struct sock *sk, u32 delivered, u32 lost,
- struct rate_sample *rs);
+ bool is_sack_reneg, struct rate_sample *rs);
void tcp_rate_check_app_limited(struct sock *sk);
/* These functions determine how the current flow behaves in respect of SACK
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2356,6 +2356,7 @@ int tcp_disconnect(struct sock *sk, int
tp->snd_cwnd_cnt = 0;
tp->window_clamp = 0;
tcp_set_ca_state(sk, TCP_CA_Open);
+ tp->is_sack_reneg = 0;
tcp_clear_retrans(tp);
inet_csk_delack_init(sk);
/* Initialize rcv_mss to TCP_MIN_MSS to avoid division by 0
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1975,6 +1975,8 @@ void tcp_enter_loss(struct sock *sk)
NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPSACKRENEGING);
tp->sacked_out = 0;
tp->fackets_out = 0;
+ /* Mark SACK reneging until we recover from this loss event. */
+ tp->is_sack_reneg = 1;
}
tcp_clear_all_retrans_hints(tp);
@@ -2428,6 +2430,7 @@ static bool tcp_try_undo_recovery(struct
return true;
}
tcp_set_ca_state(sk, TCP_CA_Open);
+ tp->is_sack_reneg = 0;
return false;
}
@@ -2459,8 +2462,10 @@ static bool tcp_try_undo_loss(struct soc
NET_INC_STATS(sock_net(sk),
LINUX_MIB_TCPSPURIOUSRTOS);
inet_csk(sk)->icsk_retransmits = 0;
- if (frto_undo || tcp_is_sack(tp))
+ if (frto_undo || tcp_is_sack(tp)) {
tcp_set_ca_state(sk, TCP_CA_Open);
+ tp->is_sack_reneg = 0;
+ }
return true;
}
return false;
@@ -3551,6 +3556,7 @@ static int tcp_ack(struct sock *sk, cons
struct tcp_sacktag_state sack_state;
struct rate_sample rs = { .prior_delivered = 0 };
u32 prior_snd_una = tp->snd_una;
+ bool is_sack_reneg = tp->is_sack_reneg;
u32 ack_seq = TCP_SKB_CB(skb)->seq;
u32 ack = TCP_SKB_CB(skb)->ack_seq;
bool is_dupack = false;
@@ -3666,7 +3672,7 @@ static int tcp_ack(struct sock *sk, cons
delivered = tp->delivered - delivered; /* freshly ACKed or SACKed */
lost = tp->lost - lost; /* freshly marked lost */
- tcp_rate_gen(sk, delivered, lost, sack_state.rate);
+ tcp_rate_gen(sk, delivered, lost, is_sack_reneg, sack_state.rate);
tcp_cong_control(sk, ack, delivered, flag, sack_state.rate);
tcp_xmit_recovery(sk, rexmit);
return 1;
--- a/net/ipv4/tcp_rate.c
+++ b/net/ipv4/tcp_rate.c
@@ -106,7 +106,7 @@ void tcp_rate_skb_delivered(struct sock
/* Update the connection delivery information and generate a rate sample. */
void tcp_rate_gen(struct sock *sk, u32 delivered, u32 lost,
- struct rate_sample *rs)
+ bool is_sack_reneg, struct rate_sample *rs)
{
struct tcp_sock *tp = tcp_sk(sk);
u32 snd_us, ack_us;
@@ -124,8 +124,12 @@ void tcp_rate_gen(struct sock *sk, u32 d
rs->acked_sacked = delivered; /* freshly ACKed or SACKed */
rs->losses = lost; /* freshly marked lost */
- /* Return an invalid sample if no timing information is available. */
- if (!rs->prior_mstamp) {
+ /* Return an invalid sample if no timing information is available or
+ * in recovery from loss with SACK reneging. Rate samples taken during
+ * a SACK reneging event may overestimate bw by including packets that
+ * were SACKed before the reneg.
+ */
+ if (!rs->prior_mstamp || is_sack_reneg) {
rs->delivered = -1;
rs->interval_us = -1;
return;
Patches currently in stable-queue which might be from ysseung(a)google.com are
queue-4.14/tcp-invalidate-rate-samples-during-sack-reneging.patch