This is a note to let you know that I've just added the patch titled
tuntap: disable preemption during XDP processing
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
tuntap-disable-preemption-during-xdp-processing.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.
From foo@baz Tue Mar 6 19:02:56 PST 2018
From: Jason Wang <jasowang@redhat.com>
Date: Sat, 24 Feb 2018 11:32:25 +0800
Subject: tuntap: disable preemption during XDP processing
From: Jason Wang <jasowang@redhat.com>
[ Upstream commit 23e43f07f896f8578318cfcc9466f1e8b8ab21b6 ]
Except for tuntap, all other drivers implement XDP in the NAPI
poll() routine in a bh. This guarantees that all XDP operations are
done on the same CPU, which is required by e.g. BPF_MAP_TYPE_PERCPU_ARRAY.
But for tuntap, we do it in process context and try to protect XDP
processing with the RCU read lock. This is insufficient since
CONFIG_PREEMPT_RCU can preempt the RCU read-side critical section,
which breaks the assumption that all XDP is processed on the same CPU.
Fix this by simply disabling preemption during XDP processing.
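For context, the resulting pattern in tun_build_skb() looks roughly
like the sketch below (simplified from the patch that follows; error
paths omitted). The preempt_disable()/preempt_enable() pair is what
keeps the task on one CPU for the whole XDP run, which rcu_read_lock()
alone does not guarantee under CONFIG_PREEMPT_RCU:

    preempt_disable();
    rcu_read_lock();
    xdp_prog = rcu_dereference(tun->xdp_prog);
    if (xdp_prog && !*skb_xdp) {
            u32 act = bpf_prog_run_xdp(xdp_prog, &xdp);
            /* ... handle XDP_REDIRECT, XDP_TX, XDP_PASS ... */
    }
    rcu_read_unlock();
    preempt_enable();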
Fixes: 761876c857cb ("tap: XDP support")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/net/tun.c | 6 ++++++
1 file changed, 6 insertions(+)
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1471,6 +1471,7 @@ static struct sk_buff *tun_build_skb(str
else
*skb_xdp = 0;
+ preempt_disable();
rcu_read_lock();
xdp_prog = rcu_dereference(tun->xdp_prog);
if (xdp_prog && !*skb_xdp) {
@@ -1494,6 +1495,7 @@ static struct sk_buff *tun_build_skb(str
if (err)
goto err_redirect;
rcu_read_unlock();
+ preempt_enable();
return NULL;
case XDP_TX:
xdp_xmit = true;
@@ -1515,6 +1517,7 @@ static struct sk_buff *tun_build_skb(str
skb = build_skb(buf, buflen);
if (!skb) {
rcu_read_unlock();
+ preempt_enable();
return ERR_PTR(-ENOMEM);
}
@@ -1527,10 +1530,12 @@ static struct sk_buff *tun_build_skb(str
skb->dev = tun->dev;
generic_xdp_tx(skb, xdp_prog);
rcu_read_unlock();
+ preempt_enable();
return NULL;
}
rcu_read_unlock();
+ preempt_enable();
return skb;
@@ -1538,6 +1543,7 @@ err_redirect:
put_page(alloc_frag->page);
err_xdp:
rcu_read_unlock();
+ preempt_enable();
this_cpu_inc(tun->pcpu_stats->rx_dropped);
return NULL;
}
Patches currently in stable-queue which might be from jasowang@redhat.com are
queue-4.15/tuntap-correctly-add-the-missing-xdp-flush.patch
queue-4.15/virtio-net-disable-napi-only-when-enabled-during-xdp-set.patch
queue-4.15/tuntap-disable-preemption-during-xdp-processing.patch
This is a note to let you know that I've just added the patch titled
tuntap: correctly add the missing XDP flush
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
tuntap-correctly-add-the-missing-xdp-flush.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.
From foo@baz Tue Mar 6 19:02:56 PST 2018
From: Jason Wang <jasowang@redhat.com>
Date: Sat, 24 Feb 2018 11:32:26 +0800
Subject: tuntap: correctly add the missing XDP flush
From: Jason Wang <jasowang@redhat.com>
[ Upstream commit 1bb4f2e868a2891ab8bc668b8173d6ccb8c4ce6f ]
We don't flush batched XDP packets through xdp_do_flush_map(), which
causes packets to stall in the TX queue. Since we don't do XDP in NAPI
poll(), the only possible fix is to call xdp_do_flush_map()
immediately after xdp_do_redirect().
Note that this in fact gives up batching packets through the devmap;
we could address that in the future.
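A minimal sketch of the resulting sequence (taken from the hunk
below); because tun runs XDP in process context rather than in NAPI
poll(), nothing else will flush the redirect batch for us:

    err = xdp_do_redirect(tun->dev, &xdp, xdp_prog);
    xdp_do_flush_map();     /* flush now; no NAPI poll() will do it */
    if (err)
            goto err_redirect;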
Reported-by: Christoffer Dall <christoffer.dall@linaro.org>
Fixes: 761876c857cb ("tap: XDP support")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
drivers/net/tun.c | 1 +
1 file changed, 1 insertion(+)
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1490,6 +1490,7 @@ static struct sk_buff *tun_build_skb(str
get_page(alloc_frag->page);
alloc_frag->offset += buflen;
err = xdp_do_redirect(tun->dev, &xdp, xdp_prog);
+ xdp_do_flush_map();
if (err)
goto err_redirect;
rcu_read_unlock();
Patches currently in stable-queue which might be from jasowang@redhat.com are
queue-4.15/tuntap-correctly-add-the-missing-xdp-flush.patch
queue-4.15/virtio-net-disable-napi-only-when-enabled-during-xdp-set.patch
queue-4.15/tuntap-disable-preemption-during-xdp-processing.patch
This is a note to let you know that I've just added the patch titled
tls: Use correct sk->sk_prot for IPV6
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
tls-use-correct-sk-sk_prot-for-ipv6.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.
From foo@baz Tue Mar 6 19:02:56 PST 2018
From: Boris Pismenny <borisp@mellanox.com>
Date: Tue, 27 Feb 2018 14:18:39 +0200
Subject: tls: Use correct sk->sk_prot for IPV6
From: Boris Pismenny <borisp@mellanox.com>
[ Upstream commit c113187d38ff85dc302a1bb55864b203ebb2ba10 ]
The tls ulp overrides sk->sk_prot with new tls-specific proto structs.
The tls-specific structs were previously based on the ipv4-specific
tcp_prot struct.
As a result, attaching the tls ulp to an ipv6 tcp socket replaced
some ipv6 callbacks with their ipv4 equivalents.
This patch adds ipv6 tls proto structs and uses them when
attached to ipv6 sockets.
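For reference, the override is triggered when userspace attaches the
ULP; a minimal sketch of that step (standard TCP_ULP usage from
<linux/tcp.h>, identical for AF_INET and AF_INET6 sockets, which is
why tls_init() must pick the right base proto):

    /* Attach kernel TLS to a connected TCP socket; tls_init() then
     * swaps sk->sk_prot for the tls-specific one built here.
     */
    setsockopt(fd, SOL_TCP, TCP_ULP, "tls", sizeof("tls"));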
Fixes: 3c4d7559159b ('tls: kernel TLS support')
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
net/tls/tls_main.c | 52 +++++++++++++++++++++++++++++++++++++---------------
1 file changed, 37 insertions(+), 15 deletions(-)
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -46,16 +46,26 @@ MODULE_DESCRIPTION("Transport Layer Secu
MODULE_LICENSE("Dual BSD/GPL");
enum {
+ TLSV4,
+ TLSV6,
+ TLS_NUM_PROTS,
+};
+
+enum {
TLS_BASE_TX,
TLS_SW_TX,
TLS_NUM_CONFIG,
};
-static struct proto tls_prots[TLS_NUM_CONFIG];
+static struct proto *saved_tcpv6_prot;
+static DEFINE_MUTEX(tcpv6_prot_mutex);
+static struct proto tls_prots[TLS_NUM_PROTS][TLS_NUM_CONFIG];
static inline void update_sk_prot(struct sock *sk, struct tls_context *ctx)
{
- sk->sk_prot = &tls_prots[ctx->tx_conf];
+ int ip_ver = sk->sk_family == AF_INET6 ? TLSV6 : TLSV4;
+
+ sk->sk_prot = &tls_prots[ip_ver][ctx->tx_conf];
}
int wait_on_pending_writer(struct sock *sk, long *timeo)
@@ -450,8 +460,21 @@ static int tls_setsockopt(struct sock *s
return do_tls_setsockopt(sk, optname, optval, optlen);
}
+static void build_protos(struct proto *prot, struct proto *base)
+{
+ prot[TLS_BASE_TX] = *base;
+ prot[TLS_BASE_TX].setsockopt = tls_setsockopt;
+ prot[TLS_BASE_TX].getsockopt = tls_getsockopt;
+ prot[TLS_BASE_TX].close = tls_sk_proto_close;
+
+ prot[TLS_SW_TX] = prot[TLS_BASE_TX];
+ prot[TLS_SW_TX].sendmsg = tls_sw_sendmsg;
+ prot[TLS_SW_TX].sendpage = tls_sw_sendpage;
+}
+
static int tls_init(struct sock *sk)
{
+ int ip_ver = sk->sk_family == AF_INET6 ? TLSV6 : TLSV4;
struct inet_connection_sock *icsk = inet_csk(sk);
struct tls_context *ctx;
int rc = 0;
@@ -476,6 +499,17 @@ static int tls_init(struct sock *sk)
ctx->getsockopt = sk->sk_prot->getsockopt;
ctx->sk_proto_close = sk->sk_prot->close;
+ /* Build IPv6 TLS whenever the address of tcpv6_prot changes */
+ if (ip_ver == TLSV6 &&
+ unlikely(sk->sk_prot != smp_load_acquire(&saved_tcpv6_prot))) {
+ mutex_lock(&tcpv6_prot_mutex);
+ if (likely(sk->sk_prot != saved_tcpv6_prot)) {
+ build_protos(tls_prots[TLSV6], sk->sk_prot);
+ smp_store_release(&saved_tcpv6_prot, sk->sk_prot);
+ }
+ mutex_unlock(&tcpv6_prot_mutex);
+ }
+
ctx->tx_conf = TLS_BASE_TX;
update_sk_prot(sk, ctx);
out:
@@ -488,21 +522,9 @@ static struct tcp_ulp_ops tcp_tls_ulp_op
.init = tls_init,
};
-static void build_protos(struct proto *prot, struct proto *base)
-{
- prot[TLS_BASE_TX] = *base;
- prot[TLS_BASE_TX].setsockopt = tls_setsockopt;
- prot[TLS_BASE_TX].getsockopt = tls_getsockopt;
- prot[TLS_BASE_TX].close = tls_sk_proto_close;
-
- prot[TLS_SW_TX] = prot[TLS_BASE_TX];
- prot[TLS_SW_TX].sendmsg = tls_sw_sendmsg;
- prot[TLS_SW_TX].sendpage = tls_sw_sendpage;
-}
-
static int __init tls_register(void)
{
- build_protos(tls_prots, &tcp_prot);
+ build_protos(tls_prots[TLSV4], &tcp_prot);
tcp_register_ulp(&tcp_tls_ulp_ops);
Patches currently in stable-queue which might be from borisp@mellanox.com are
queue-4.15/tls-use-correct-sk-sk_prot-for-ipv6.patch
This is a note to let you know that I've just added the patch titled
tcp_bbr: better deal with suboptimal GSO
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
tcp_bbr-better-deal-with-suboptimal-gso.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.
From foo@baz Tue Mar 6 19:02:56 PST 2018
From: Eric Dumazet <edumazet@google.com>
Date: Wed, 21 Feb 2018 06:43:03 -0800
Subject: tcp_bbr: better deal with suboptimal GSO
From: Eric Dumazet <edumazet@google.com>
[ Upstream commit 350c9f484bde93ef229682eedd98cd5f74350f7f ]
BBR uses tcp_tso_autosize() in an attempt to probe what the burst
sizes would be, and to adjust cwnd in bbr_target_cwnd() with the
following formula:
/* Allow enough full-sized skbs in flight to utilize end systems. */
cwnd += 3 * bbr->tso_segs_goal;
But GSO can be lacking or constrained to very small
units (ip link set dev ... gso_max_segs 2).
What we really want is to have enough packets in flight so that both
GSO and GRO are efficient.
So even when GSO is off or downgraded, we still want to have the same
number of packets in flight as if GSO/TSO were fully operational, so
that GRO can hopefully work efficiently.
To fix this issue, we make tcp_tso_autosize() unaware of
sk->sk_gso_max_segs; only tcp_tso_segs() has to enforce the
gso_max_segs limit.
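A rough worked example of the failure mode (illustrative numbers
inferred from the Tested data below, not stated explicitly in the
commit): if the old clamp inside tcp_tso_autosize() limits
bbr->tso_segs_goal to 1 or 2 segments, the formula above adds only
3 or 6 packets of headroom, matching the "cwnd is stuck around 6"
observation. Once the clamp moves into tcp_tso_segs(), the goal is
derived from the pacing rate instead (potentially dozens of segments),
letting cwnd reach the ~386 seen after the patch.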
Tested:
ethtool -K eth0 tso off gso off
tc qd replace dev eth0 root pfifo_fast
Before patch:
# for f in {1..5}; do ./super_netperf 1 -H lpaa24 -- -K bbr; done
691 (ss -temoi shows cwnd is stuck around 6)
667
651
631
517
After patch:
# for f in {1..5}; do ./super_netperf 1 -H lpaa24 -- -K bbr; done
1733 (ss -temoi shows cwnd is around 386)
1778
1746
1781
1718
Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
net/ipv4/tcp_output.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1730,7 +1730,7 @@ u32 tcp_tso_autosize(const struct sock *
*/
segs = max_t(u32, bytes / mss_now, min_tso_segs);
- return min_t(u32, segs, sk->sk_gso_max_segs);
+ return segs;
}
EXPORT_SYMBOL(tcp_tso_autosize);
@@ -1742,9 +1742,10 @@ static u32 tcp_tso_segs(struct sock *sk,
const struct tcp_congestion_ops *ca_ops = inet_csk(sk)->icsk_ca_ops;
u32 tso_segs = ca_ops->tso_segs_goal ? ca_ops->tso_segs_goal(sk) : 0;
- return tso_segs ? :
- tcp_tso_autosize(sk, mss_now,
- sock_net(sk)->ipv4.sysctl_tcp_min_tso_segs);
+ if (!tso_segs)
+ tso_segs = tcp_tso_autosize(sk, mss_now,
+ sock_net(sk)->ipv4.sysctl_tcp_min_tso_segs);
+ return min_t(u32, tso_segs, sk->sk_gso_max_segs);
}
/* Returns the portion of skb which can be sent right away */
Patches currently in stable-queue which might be from edumazet@google.com are
queue-4.15/doc-change-the-min-default-value-of-tcp_wmem-tcp_rmem.patch
queue-4.15/tcp-purge-write-queue-upon-rst.patch
queue-4.15/net_sched-gen_estimator-fix-broken-estimators-based-on-percpu-stats.patch
queue-4.15/tcp_bbr-better-deal-with-suboptimal-gso.patch
This is a note to let you know that I've just added the patch titled
tcp: tracepoint: only call trace_tcp_send_reset with full socket
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
tcp-tracepoint-only-call-trace_tcp_send_reset-with-full-socket.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.
From foo@baz Tue Mar 6 19:02:56 PST 2018
From: Song Liu <songliubraving@fb.com>
Date: Tue, 6 Feb 2018 20:50:23 -0800
Subject: tcp: tracepoint: only call trace_tcp_send_reset with full socket
From: Song Liu <songliubraving@fb.com>
[ Upstream commit 5c487bb9adddbc1d23433e09d2548759375c2b52 ]
The tcp_send_reset tracepoint requires a full socket to work. However,
it may be called on a socket in TCP_TIME_WAIT state:
case TCP_TW_RST:
tcp_v6_send_reset(sk, skb);
inet_twsk_deschedule_put(inet_twsk(sk));
goto discard_it;
To avoid this problem, this patch checks the socket with sk_fullsock()
before calling trace_tcp_send_reset().
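sk_fullsock() is the standard guard for this: it returns false for
timewait and request sockets, which lack the fields that full-socket
tracepoints dereference. The resulting pattern, as in both hunks
below (sketch; variable names taken from the ipv6 side):

    if (sk) {
            oif = sk->sk_bound_dev_if;
            if (sk_fullsock(sk))
                    trace_tcp_send_reset(sk, skb);
    }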
Fixes: c24b14c46bb8 ("tcp: add tracepoint trace_tcp_send_reset")
Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
net/ipv4/tcp_ipv4.c | 3 ++-
net/ipv6/tcp_ipv6.c | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -705,7 +705,8 @@ static void tcp_v4_send_reset(const stru
*/
if (sk) {
arg.bound_dev_if = sk->sk_bound_dev_if;
- trace_tcp_send_reset(sk, skb);
+ if (sk_fullsock(sk))
+ trace_tcp_send_reset(sk, skb);
}
BUILD_BUG_ON(offsetof(struct sock, sk_bound_dev_if) !=
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -943,7 +943,8 @@ static void tcp_v6_send_reset(const stru
if (sk) {
oif = sk->sk_bound_dev_if;
- trace_tcp_send_reset(sk, skb);
+ if (sk_fullsock(sk))
+ trace_tcp_send_reset(sk, skb);
}
tcp_v6_send_response(sk, skb, seq, ack_seq, 0, 0, 0, oif, key, 1, 0, 0);
Patches currently in stable-queue which might be from songliubraving@fb.com are
queue-4.15/tcp-tracepoint-only-call-trace_tcp_send_reset-with-full-socket.patch
This is a note to let you know that I've just added the patch titled
tcp: revert F-RTO middle-box workaround
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
tcp-revert-f-rto-middle-box-workaround.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.
From foo@baz Tue Mar 6 19:02:56 PST 2018
From: Yuchung Cheng <ycheng@google.com>
Date: Tue, 27 Feb 2018 14:15:01 -0800
Subject: tcp: revert F-RTO middle-box workaround
From: Yuchung Cheng <ycheng@google.com>
[ Upstream commit d4131f09770d9b7471c9da65e6ecd2477746ac5c ]
This reverts commit cc663f4d4c97b7297fb45135ab23cfd508b35a77. While it
fixed some broken middle-boxes that modify receive window fields, it
does not address middle-boxes that strip off SACK options. The best
solution is to fully revert this patch and the root F-RTO enhancement.
Fixes: cc663f4d4c97 ("tcp: restrict F-RTO to work-around broken middle-boxes")
Reported-by: Teodor Milkov <tm@del.bg>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
net/ipv4/tcp_input.c | 17 +++++++----------
1 file changed, 7 insertions(+), 10 deletions(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 45f750e85714..50963f92a67d 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1915,7 +1915,6 @@ void tcp_enter_loss(struct sock *sk)
struct tcp_sock *tp = tcp_sk(sk);
struct net *net = sock_net(sk);
struct sk_buff *skb;
- bool new_recovery = icsk->icsk_ca_state < TCP_CA_Recovery;
bool is_reneg; /* is receiver reneging on SACKs? */
bool mark_lost;
@@ -1974,17 +1973,15 @@ void tcp_enter_loss(struct sock *sk)
tp->high_seq = tp->snd_nxt;
tcp_ecn_queue_cwr(tp);
- /* F-RTO RFC5682 sec 3.1 step 1: retransmit SND.UNA if no previous
- * loss recovery is underway except recurring timeout(s) on
- * the same SND.UNA (sec 3.2). Disable F-RTO on path MTU probing
- *
- * In theory F-RTO can be used repeatedly during loss recovery.
- * In practice this interacts badly with broken middle-boxes that
- * falsely raise the receive window, which results in repeated
- * timeouts and stop-and-go behavior.
+ /* F-RTO RFC5682 sec 3.1 step 1 mandates to disable F-RTO
+ * if a previous recovery is underway, otherwise it may incorrectly
+ * call a timeout spurious if some previously retransmitted packets
+ * are s/acked (sec 3.2). We do not apply that retriction since
+ * retransmitted skbs are permanently tagged with TCPCB_EVER_RETRANS
+ * so FLAG_ORIG_SACK_ACKED is always correct. But we do disable F-RTO
+ * on PTMU discovery to avoid sending new data.
*/
tp->frto = net->ipv4.sysctl_tcp_frto &&
- (new_recovery || icsk->icsk_retransmits) &&
!inet_csk(sk)->icsk_mtup.probe_size;
}
--
2.14.3
Patches currently in stable-queue which might be from ycheng@google.com are
queue-4.15/tcp-purge-write-queue-upon-rst.patch
queue-4.15/tcp-revert-f-rto-extension-to-detect-more-spurious-timeouts.patch
queue-4.15/tcp-revert-f-rto-middle-box-workaround.patch
This is a note to let you know that I've just added the patch titled
tcp: revert F-RTO extension to detect more spurious timeouts
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
tcp-revert-f-rto-extension-to-detect-more-spurious-timeouts.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.
From foo@baz Tue Mar 6 19:02:56 PST 2018
From: Yuchung Cheng <ycheng@google.com>
Date: Tue, 27 Feb 2018 14:15:02 -0800
Subject: tcp: revert F-RTO extension to detect more spurious timeouts
From: Yuchung Cheng <ycheng@google.com>
[ Upstream commit fc68e171d376c322e6777a3d7ac2f0278b68b17f ]
This reverts commit 89fe18e44f7ee5ab1c90d0dff5835acee7751427.
While the patch could detect more spurious timeouts, it could cause
poor TCP performance with broken middle-boxes that modify TCP packets
(e.g. receive window, SACK options), and the performance gain is much
smaller than the potential loss. The best solution is to fully revert
the change.
Fixes: 89fe18e44f7e ("tcp: extend F-RTO to catch more spurious timeouts")
Reported-by: Teodor Milkov <tm@del.bg>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
net/ipv4/tcp_input.c | 30 ++++++++++++------------------
1 file changed, 12 insertions(+), 18 deletions(-)
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1915,6 +1915,7 @@ void tcp_enter_loss(struct sock *sk)
struct tcp_sock *tp = tcp_sk(sk);
struct net *net = sock_net(sk);
struct sk_buff *skb;
+ bool new_recovery = icsk->icsk_ca_state < TCP_CA_Recovery;
bool is_reneg; /* is receiver reneging on SACKs? */
bool mark_lost;
@@ -1973,15 +1974,12 @@ void tcp_enter_loss(struct sock *sk)
tp->high_seq = tp->snd_nxt;
tcp_ecn_queue_cwr(tp);
- /* F-RTO RFC5682 sec 3.1 step 1 mandates to disable F-RTO
- * if a previous recovery is underway, otherwise it may incorrectly
- * call a timeout spurious if some previously retransmitted packets
- * are s/acked (sec 3.2). We do not apply that retriction since
- * retransmitted skbs are permanently tagged with TCPCB_EVER_RETRANS
- * so FLAG_ORIG_SACK_ACKED is always correct. But we do disable F-RTO
- * on PTMU discovery to avoid sending new data.
+ /* F-RTO RFC5682 sec 3.1 step 1: retransmit SND.UNA if no previous
+ * loss recovery is underway except recurring timeout(s) on
+ * the same SND.UNA (sec 3.2). Disable F-RTO on path MTU probing
*/
tp->frto = net->ipv4.sysctl_tcp_frto &&
+ (new_recovery || icsk->icsk_retransmits) &&
!inet_csk(sk)->icsk_mtup.probe_size;
}
@@ -2634,18 +2632,14 @@ static void tcp_process_loss(struct sock
tcp_try_undo_loss(sk, false))
return;
- /* The ACK (s)acks some never-retransmitted data meaning not all
- * the data packets before the timeout were lost. Therefore we
- * undo the congestion window and state. This is essentially
- * the operation in F-RTO (RFC5682 section 3.1 step 3.b). Since
- * a retransmitted skb is permantly marked, we can apply such an
- * operation even if F-RTO was not used.
- */
- if ((flag & FLAG_ORIG_SACK_ACKED) &&
- tcp_try_undo_loss(sk, tp->undo_marker))
- return;
-
if (tp->frto) { /* F-RTO RFC5682 sec 3.1 (sack enhanced version). */
+ /* Step 3.b. A timeout is spurious if not all data are
+ * lost, i.e., never-retransmitted data are (s)acked.
+ */
+ if ((flag & FLAG_ORIG_SACK_ACKED) &&
+ tcp_try_undo_loss(sk, true))
+ return;
+
if (after(tp->snd_nxt, tp->high_seq)) {
if (flag & FLAG_DATA_SACKED || is_dupack)
tp->frto = 0; /* Step 3.a. loss was real */
Patches currently in stable-queue which might be from ycheng@google.com are
queue-4.15/tcp-purge-write-queue-upon-rst.patch
queue-4.15/tcp-revert-f-rto-extension-to-detect-more-spurious-timeouts.patch
queue-4.15/tcp-revert-f-rto-middle-box-workaround.patch
This is a note to let you know that I've just added the patch titled
tcp: purge write queue upon RST
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
tcp-purge-write-queue-upon-rst.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.
From foo@baz Tue Mar 6 19:02:56 PST 2018
From: Soheil Hassas Yeganeh <soheil@google.com>
Date: Tue, 27 Feb 2018 18:32:18 -0500
Subject: tcp: purge write queue upon RST
From: Soheil Hassas Yeganeh <soheil@google.com>
[ Upstream commit a27fd7a8ed3856faaf5a2ff1c8c5f00c0667aaa0 ]
When the connection is reset, there is no point in
keeping the packets on the write queue until the connection
is closed.
RFC 793 (page 70) and RFC 793-bis (page 64) both suggest
purging the write queue upon RST:
https://tools.ietf.org/html/draft-ietf-tcpm-rfc793bis-07
Moreover, this is essential for a correct MSG_ZEROCOPY
implementation, because userspace cannot call close(fd)
before receiving zerocopy signals even when the connection
is reset.
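For background, MSG_ZEROCOPY completions are delivered on the socket
error queue, and userspace must collect them before close(fd). A
minimal sketch of that wait (standard MSG_ZEROCOPY usage; buffer and
fd names are illustrative):

    send(fd, buf, len, MSG_ZEROCOPY);

    /* Reap the completion notification; purging the write queue on
     * RST is what lets this make progress on a reset connection.
     */
    char control[128];
    struct msghdr msg = { .msg_control = control,
                          .msg_controllen = sizeof(control) };
    recvmsg(fd, &msg, MSG_ERRQUEUE);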
Fixes: f214f915e7db ("tcp: enable MSG_ZEROCOPY")
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
net/ipv4/tcp_input.c | 1 +
1 file changed, 1 insertion(+)
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3988,6 +3988,7 @@ void tcp_reset(struct sock *sk)
/* This barrier is coupled with smp_rmb() in tcp_poll() */
smp_wmb();
+ tcp_write_queue_purge(sk);
tcp_done(sk);
if (!sock_flag(sk, SOCK_DEAD))
Patches currently in stable-queue which might be from soheil@google.com are
queue-4.15/tcp-purge-write-queue-upon-rst.patch
queue-4.15/tcp_bbr-better-deal-with-suboptimal-gso.patch
This is a note to let you know that I've just added the patch titled
sctp: verify size of a new chunk in _sctp_make_chunk()
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
sctp-verify-size-of-a-new-chunk-in-_sctp_make_chunk.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.
From foo@baz Tue Mar 6 19:02:56 PST 2018
From: Alexey Kodanev <alexey.kodanev@oracle.com>
Date: Fri, 9 Feb 2018 17:35:23 +0300
Subject: sctp: verify size of a new chunk in _sctp_make_chunk()
From: Alexey Kodanev <alexey.kodanev@oracle.com>
[ Upstream commit 07f2c7ab6f8d0a7e7c5764c4e6cc9c52951b9d9c ]
When SCTP makes an INIT or INIT_ACK packet, the total chunk length
can exceed SCTP_MAX_CHUNK_LEN, which leads to a kernel panic when
transmitting these packets, e.g. the crash on sending INIT_ACK:
[ 597.804948] skbuff: skb_over_panic: text:00000000ffae06e4 len:120168
put:120156 head:000000007aa47635 data:00000000d991c2de
tail:0x1d640 end:0xfec0 dev:<NULL>
...
[ 597.976970] ------------[ cut here ]------------
[ 598.033408] kernel BUG at net/core/skbuff.c:104!
[ 600.314841] Call Trace:
[ 600.345829] <IRQ>
[ 600.371639] ? sctp_packet_transmit+0x2095/0x26d0 [sctp]
[ 600.436934] skb_put+0x16c/0x200
[ 600.477295] sctp_packet_transmit+0x2095/0x26d0 [sctp]
[ 600.540630] ? sctp_packet_config+0x890/0x890 [sctp]
[ 600.601781] ? __sctp_packet_append_chunk+0x3b4/0xd00 [sctp]
[ 600.671356] ? sctp_cmp_addr_exact+0x3f/0x90 [sctp]
[ 600.731482] sctp_outq_flush+0x663/0x30d0 [sctp]
[ 600.788565] ? sctp_make_init+0xbf0/0xbf0 [sctp]
[ 600.845555] ? sctp_check_transmitted+0x18f0/0x18f0 [sctp]
[ 600.912945] ? sctp_outq_tail+0x631/0x9d0 [sctp]
[ 600.969936] sctp_cmd_interpreter.isra.22+0x3be1/0x5cb0 [sctp]
[ 601.041593] ? sctp_sf_do_5_1B_init+0x85f/0xc30 [sctp]
[ 601.104837] ? sctp_generate_t1_cookie_event+0x20/0x20 [sctp]
[ 601.175436] ? sctp_eat_data+0x1710/0x1710 [sctp]
[ 601.233575] sctp_do_sm+0x182/0x560 [sctp]
[ 601.284328] ? sctp_has_association+0x70/0x70 [sctp]
[ 601.345586] ? sctp_rcv+0xef4/0x32f0 [sctp]
[ 601.397478] ? sctp6_rcv+0xa/0x20 [sctp]
...
Here the chunk size for the INIT_ACK packet becomes too big, mostly
because of the state cookie (the INIT packet is large, with many
address parameters), plus additional server parameters.
Later this chunk causes the panic in skb_put_data():
  sctp_packet_transmit()
    sctp_packet_pack()
      skb_put_data(nskb, chunk->skb->data, chunk->skb->len);
'nskb' (the head skb) was previously allocated with packet->size
from the u16 'chunk->chunk_hdr->length'.
As suggested by Marcelo, we should check the chunk's length in
_sctp_make_chunk() before trying to allocate an skb for it, and
discard the chunk if its size is bigger than SCTP_MAX_CHUNK_LEN.
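The check itself is cheap: compute the padded length first and refuse
to build the chunk if it cannot fit. A sketch of the added logic
(SCTP_PAD4() rounds up to a 4-byte boundary, i.e. ((x) + 3) & ~3):

    int chunklen = SCTP_PAD4(sizeof(*chunk_hdr) + paylen);

    if (chunklen > SCTP_MAX_CHUNK_LEN)
            goto nodata;            /* reported like an alloc failure */
    skb = alloc_skb(chunklen, gfp);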
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
net/sctp/sm_make_chunk.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -1378,9 +1378,14 @@ static struct sctp_chunk *_sctp_make_chu
struct sctp_chunk *retval;
struct sk_buff *skb;
struct sock *sk;
+ int chunklen;
+
+ chunklen = SCTP_PAD4(sizeof(*chunk_hdr) + paylen);
+ if (chunklen > SCTP_MAX_CHUNK_LEN)
+ goto nodata;
/* No need to allocate LL here, as this is only a chunk. */
- skb = alloc_skb(SCTP_PAD4(sizeof(*chunk_hdr) + paylen), gfp);
+ skb = alloc_skb(chunklen, gfp);
if (!skb)
goto nodata;
Patches currently in stable-queue which might be from alexey.kodanev@oracle.com are
queue-4.15/sctp-fix-dst-refcnt-leak-in-sctp_v6_get_dst.patch
queue-4.15/udplite-fix-partial-checksum-initialization.patch
queue-4.15/sctp-verify-size-of-a-new-chunk-in-_sctp_make_chunk.patch
This is a note to let you know that I've just added the patch titled
tcp: Honor the eor bit in tcp_mtu_probe
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
tcp-honor-the-eor-bit-in-tcp_mtu_probe.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.
From foo@baz Tue Mar 6 19:02:56 PST 2018
From: Ilya Lesokhin <ilyal@mellanox.com>
Date: Mon, 12 Feb 2018 12:57:04 +0200
Subject: tcp: Honor the eor bit in tcp_mtu_probe
From: Ilya Lesokhin <ilyal@mellanox.com>
[ Upstream commit 808cf9e38cd7923036a99f459ccc8cf2955e47af ]
Avoid SKB coalescing if the eor bit is set in one of the relevant
SKBs.
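For context, MSG_EOR marks a record boundary that the stack must not
merge away; tcp_sendmsg() records it in TCP_SKB_CB(skb)->eor when
userspace sends with the flag:

    /* Userspace: end of record; the kernel must not coalesce this
     * skb with the following one.
     */
    send(fd, buf, len, MSG_EOR);

The new tcp_can_coalesce_send_queue_head() helper below walks the skbs
an MTU probe would merge and bails out if any of them carries that bit.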
Fixes: c134ecb87817 ("tcp: Make use of MSG_EOR in tcp_sendmsg")
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
net/ipv4/tcp_output.c | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2026,6 +2026,24 @@ static inline void tcp_mtu_check_reprobe
}
}
+static bool tcp_can_coalesce_send_queue_head(struct sock *sk, int len)
+{
+ struct sk_buff *skb, *next;
+
+ skb = tcp_send_head(sk);
+ tcp_for_write_queue_from_safe(skb, next, sk) {
+ if (len <= skb->len)
+ break;
+
+ if (unlikely(TCP_SKB_CB(skb)->eor))
+ return false;
+
+ len -= skb->len;
+ }
+
+ return true;
+}
+
/* Create a new MTU probe if we are ready.
* MTU probe is regularly attempting to increase the path MTU by
* deliberately sending larger packets. This discovers routing
@@ -2098,6 +2116,9 @@ static int tcp_mtu_probe(struct sock *sk
return 0;
}
+ if (!tcp_can_coalesce_send_queue_head(sk, probe_size))
+ return -1;
+
/* We're allowed to probe. Build it now. */
nskb = sk_stream_alloc_skb(sk, probe_size, GFP_ATOMIC, false);
if (!nskb)
@@ -2133,6 +2154,10 @@ static int tcp_mtu_probe(struct sock *sk
/* We've eaten all the data from this skb.
* Throw it away. */
TCP_SKB_CB(nskb)->tcp_flags |= TCP_SKB_CB(skb)->tcp_flags;
+ /* If this is the last SKB we copy and eor is set
+ * we need to propagate it to the new skb.
+ */
+ TCP_SKB_CB(nskb)->eor = TCP_SKB_CB(skb)->eor;
tcp_unlink_write_queue(skb, sk);
sk_wmem_free_skb(sk, skb);
} else {
Patches currently in stable-queue which might be from ilyal@mellanox.com are
queue-4.15/tls-use-correct-sk-sk_prot-for-ipv6.patch
queue-4.15/tcp-honor-the-eor-bit-in-tcp_mtu_probe.patch