This is a note to let you know that I've just added the patch titled
netlink: do not set cb_running if dump's start() errs
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
netlink-do-not-set-cb_running-if-dump-s-start-errs.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Nov 15 17:24:03 CET 2017
From: "Jason A. Donenfeld" <Jason(a)zx2c4.com>
Date: Mon, 9 Oct 2017 14:14:51 +0200
Subject: netlink: do not set cb_running if dump's start() errs
From: "Jason A. Donenfeld" <Jason(a)zx2c4.com>
[ Upstream commit 41c87425a1ac9b633e0fcc78eb1f19640c8fb5a0 ]
It turns out that multiple places can call netlink_dump(), which means
it's still possible to dereference partially initialized values in
dump() that were the result of a faulty returned start().
This fixes the issue by calling start() _before_ setting cb_running to
true, so that there's no chance at all of hitting the dump() function
through any indirect paths.
It also moves the call to start() to be when the mutex is held. This has
the nice side effect of serializing invocations to start(), which is
likely desirable anyway. It also prevents any possible other races that
might come out of this logic.
In testing this with several different pieces of tricky code to trigger
these issues, this commit fixes all avenues that I'm aware of.
Signed-off-by: Jason A. Donenfeld <Jason(a)zx2c4.com>
Cc: Johannes Berg <johannes(a)sipsolutions.net>
Reviewed-by: Johannes Berg <johannes(a)sipsolutions.net>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/netlink/af_netlink.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2207,16 +2207,17 @@ int __netlink_dump_start(struct sock *ss
cb->min_dump_alloc = control->min_dump_alloc;
cb->skb = skb;
+ if (cb->start) {
+ ret = cb->start(cb);
+ if (ret)
+ goto error_unlock;
+ }
+
nlk->cb_running = true;
mutex_unlock(nlk->cb_mutex);
- ret = 0;
- if (cb->start)
- ret = cb->start(cb);
-
- if (!ret)
- ret = netlink_dump(sk);
+ ret = netlink_dump(sk);
sock_put(sk);
Patches currently in stable-queue which might be from Jason(a)zx2c4.com are
queue-4.9/netlink-do-not-set-cb_running-if-dump-s-start-errs.patch
This is a note to let you know that I've just added the patch titled
net_sched: avoid matching qdisc with zero handle
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
net_sched-avoid-matching-qdisc-with-zero-handle.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Nov 15 17:24:03 CET 2017
From: Cong Wang <xiyou.wangcong(a)gmail.com>
Date: Fri, 27 Oct 2017 22:08:56 -0700
Subject: net_sched: avoid matching qdisc with zero handle
From: Cong Wang <xiyou.wangcong(a)gmail.com>
[ Upstream commit 50317fce2cc70a2bbbc4b42c31bbad510382a53c ]
Davide found the following script triggers a NULL pointer
dereference:
ip l a name eth0 type dummy
tc q a dev eth0 parent :1 handle 1: htb
This is because for a freshly created netdevice noop_qdisc
is attached and when passing 'parent :1', kernel actually
tries to match the major handle which is 0 and noop_qdisc
has handle 0 so is matched by mistake. Commit 69012ae425d7
tries to fix a similar bug but still misses this case.
Handle 0 is not a valid one, should be just skipped. In
fact, kernel uses it as TC_H_UNSPEC.
Fixes: 69012ae425d7 ("net: sched: fix handling of singleton qdiscs with qdisc_hash")
Fixes: 59cc1f61f09c ("net: sched:convert qdisc linked list to hashtable")
Reported-by: Davide Caratti <dcaratti(a)redhat.com>
Cc: Jiri Kosina <jkosina(a)suse.cz>
Cc: Eric Dumazet <edumazet(a)google.com>
Cc: Jamal Hadi Salim <jhs(a)mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong(a)gmail.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/sched/sch_api.c | 2 ++
1 file changed, 2 insertions(+)
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -296,6 +296,8 @@ struct Qdisc *qdisc_lookup(struct net_de
{
struct Qdisc *q;
+ if (!handle)
+ return NULL;
q = qdisc_match_from_root(dev->qdisc, handle);
if (q)
goto out;
Patches currently in stable-queue which might be from xiyou.wangcong(a)gmail.com are
queue-4.9/tun-call-dev_get_valid_name-before-register_netdevice.patch
queue-4.9/net_sched-avoid-matching-qdisc-with-zero-handle.patch
queue-4.9/tun-allow-positive-return-values-on-dev_get_valid_name-call.patch
This is a note to let you know that I've just added the patch titled
net/unix: don't show information about sockets from other namespaces
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
net-unix-don-t-show-information-about-sockets-from-other-namespaces.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Nov 15 17:24:03 CET 2017
From: Andrei Vagin <avagin(a)openvz.org>
Date: Wed, 25 Oct 2017 10:16:42 -0700
Subject: net/unix: don't show information about sockets from other namespaces
From: Andrei Vagin <avagin(a)openvz.org>
[ Upstream commit 0f5da659d8f1810f44de14acf2c80cd6499623a0 ]
socket_diag shows information only about sockets from a namespace where
a diag socket lives.
But if we request information about one unix socket, the kernel don't
check that its netns is matched with a diag socket namespace, so any
user can get information about any unix socket in a system. This looks
like a bug.
v2: add a Fixes tag
Fixes: 51d7cccf0723 ("net: make sock diag per-namespace")
Signed-off-by: Andrei Vagin <avagin(a)openvz.org>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/unix/diag.c | 2 ++
1 file changed, 2 insertions(+)
--- a/net/unix/diag.c
+++ b/net/unix/diag.c
@@ -257,6 +257,8 @@ static int unix_diag_get_exact(struct sk
err = -ENOENT;
if (sk == NULL)
goto out_nosk;
+ if (!net_eq(sock_net(sk), net))
+ goto out;
err = sock_diag_check_cookie(sk, req->udiag_cookie);
if (err)
Patches currently in stable-queue which might be from avagin(a)openvz.org are
queue-4.9/net-unix-don-t-show-information-about-sockets-from-other-namespaces.patch
This is a note to let you know that I've just added the patch titled
net: call cgroup_sk_alloc() earlier in sk_clone_lock()
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
net-call-cgroup_sk_alloc-earlier-in-sk_clone_lock.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Nov 15 17:24:03 CET 2017
From: Eric Dumazet <edumazet(a)google.com>
Date: Tue, 10 Oct 2017 19:12:33 -0700
Subject: net: call cgroup_sk_alloc() earlier in sk_clone_lock()
From: Eric Dumazet <edumazet(a)google.com>
[ Upstream commit c0576e3975084d4699b7bfef578613fb8e1144f6 ]
If for some reason, the newly allocated child need to be freed,
we will call cgroup_put() (via sk_free_unlock_clone()) while the
corresponding cgroup_get() was not yet done, and we will free memory
too soon.
Fixes: d979a39d7242 ("cgroup: duplicate cgroup reference when cloning sockets")
Signed-off-by: Eric Dumazet <edumazet(a)google.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Tejun Heo <tj(a)kernel.org>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/core/sock.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1526,6 +1526,7 @@ struct sock *sk_clone_lock(const struct
newsk->sk_userlocks = sk->sk_userlocks & ~SOCK_BINDPORT_LOCK;
sock_reset_flag(newsk, SOCK_DONE);
+ cgroup_sk_alloc(&newsk->sk_cgrp_data);
skb_queue_head_init(&newsk->sk_error_queue);
filter = rcu_dereference_protected(newsk->sk_filter, 1);
@@ -1560,8 +1561,6 @@ struct sock *sk_clone_lock(const struct
atomic64_set(&newsk->sk_cookie, 0);
mem_cgroup_sk_alloc(newsk);
- cgroup_sk_alloc(&newsk->sk_cgrp_data);
-
/*
* Before updating sk_refcnt, we must commit prior changes to memory
* (Documentation/RCU/rculist_nulls.txt for details)
Patches currently in stable-queue which might be from edumazet(a)google.com are
queue-4.9/net-call-cgroup_sk_alloc-earlier-in-sk_clone_lock.patch
queue-4.9/tcp-dccp-fix-ireq-opt-races.patch
queue-4.9/tcp-fix-tcp_mtu_probe-vs-highest_sack.patch
queue-4.9/ipv6-addrconf-increment-ifp-refcount-before-ipv6_del_addr.patch
queue-4.9/ipv6-flowlabel-do-not-leave-opt-tot_len-with-garbage.patch
queue-4.9/packet-avoid-panic-in-packet_getsockopt.patch
queue-4.9/sctp-add-the-missing-sock_owned_by_user-check-in-sctp_icmp_redirect.patch
queue-4.9/net_sched-avoid-matching-qdisc-with-zero-handle.patch
queue-4.9/tun-tap-sanitize-tunsetsndbuf-input.patch
queue-4.9/tcp-dccp-fix-lockdep-splat-in-inet_csk_route_req.patch
queue-4.9/tcp-dccp-fix-other-lockdep-splats-accessing-ireq_opt.patch
This is a note to let you know that I've just added the patch titled
l2tp: check ps->sock before running pppol2tp_session_ioctl()
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
l2tp-check-ps-sock-before-running-pppol2tp_session_ioctl.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Nov 15 17:24:03 CET 2017
From: Guillaume Nault <g.nault(a)alphalink.fr>
Date: Fri, 13 Oct 2017 19:22:35 +0200
Subject: l2tp: check ps->sock before running pppol2tp_session_ioctl()
From: Guillaume Nault <g.nault(a)alphalink.fr>
[ Upstream commit 5903f594935a3841137c86b9d5b75143a5b7121c ]
When pppol2tp_session_ioctl() is called by pppol2tp_tunnel_ioctl(),
the session may be unconnected. That is, it was created by
pppol2tp_session_create() and hasn't been connected with
pppol2tp_connect(). In this case, ps->sock is NULL, so we need to check
for this case in order to avoid dereferencing a NULL pointer.
Fixes: 309795f4bec2 ("l2tp: Add netlink control API for L2TP")
Signed-off-by: Guillaume Nault <g.nault(a)alphalink.fr>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/l2tp/l2tp_ppp.c | 3 +++
1 file changed, 3 insertions(+)
--- a/net/l2tp/l2tp_ppp.c
+++ b/net/l2tp/l2tp_ppp.c
@@ -993,6 +993,9 @@ static int pppol2tp_session_ioctl(struct
session->name, cmd, arg);
sk = ps->sock;
+ if (!sk)
+ return -EBADR;
+
sock_hold(sk);
switch (cmd) {
Patches currently in stable-queue which might be from g.nault(a)alphalink.fr are
queue-4.9/ppp-fix-race-in-ppp-device-destruction.patch
queue-4.9/l2tp-check-ps-sock-before-running-pppol2tp_session_ioctl.patch
This is a note to let you know that I've just added the patch titled
ipv6: flowlabel: do not leave opt->tot_len with garbage
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ipv6-flowlabel-do-not-leave-opt-tot_len-with-garbage.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Nov 15 17:24:03 CET 2017
From: Eric Dumazet <edumazet(a)google.com>
Date: Sat, 21 Oct 2017 12:26:23 -0700
Subject: ipv6: flowlabel: do not leave opt->tot_len with garbage
From: Eric Dumazet <edumazet(a)google.com>
[ Upstream commit 864e2a1f8aac05effac6063ce316b480facb46ff ]
When syzkaller team brought us a C repro for the crash [1] that
had been reported many times in the past, I finally could find
the root cause.
If FlowLabel info is merged by fl6_merge_options(), we leave
part of the opt_space storage provided by udp/raw/l2tp with random value
in opt_space.tot_len, unless a control message was provided at sendmsg()
time.
Then ip6_setup_cork() would use this random value to perform a kzalloc()
call. Undefined behavior and crashes.
Fix is to properly set tot_len in fl6_merge_options()
At the same time, we can also avoid consuming memory and cpu cycles
to clear it, if every option is copied via a kmemdup(). This is the
change in ip6_setup_cork().
[1]
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 6613 Comm: syz-executor0 Not tainted 4.14.0-rc4+ #127
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
task: ffff8801cb64a100 task.stack: ffff8801cc350000
RIP: 0010:ip6_setup_cork+0x274/0x15c0 net/ipv6/ip6_output.c:1168
RSP: 0018:ffff8801cc357550 EFLAGS: 00010203
RAX: dffffc0000000000 RBX: ffff8801cc357748 RCX: 0000000000000010
RDX: 0000000000000002 RSI: ffffffff842bd1d9 RDI: 0000000000000014
RBP: ffff8801cc357620 R08: ffff8801cb17f380 R09: ffff8801cc357b10
R10: ffff8801cb64a100 R11: 0000000000000000 R12: ffff8801cc357ab0
R13: ffff8801cc357b10 R14: 0000000000000000 R15: ffff8801c3bbf0c0
FS: 00007f9c5c459700(0000) GS:ffff8801db200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020324000 CR3: 00000001d1cf2000 CR4: 00000000001406f0
DR0: 0000000020001010 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
Call Trace:
ip6_make_skb+0x282/0x530 net/ipv6/ip6_output.c:1729
udpv6_sendmsg+0x2769/0x3380 net/ipv6/udp.c:1340
inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:762
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:643
SYSC_sendto+0x358/0x5a0 net/socket.c:1750
SyS_sendto+0x40/0x50 net/socket.c:1718
entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x4520a9
RSP: 002b:00007f9c5c458c08 EFLAGS: 00000216 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000718000 RCX: 00000000004520a9
RDX: 0000000000000001 RSI: 0000000020fd1000 RDI: 0000000000000016
RBP: 0000000000000086 R08: 0000000020e0afe4 R09: 000000000000001c
R10: 0000000000000000 R11: 0000000000000216 R12: 00000000004bb1ee
R13: 00000000ffffffff R14: 0000000000000016 R15: 0000000000000029
Code: e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 ea 0f 00 00 48 8d 79 04 48 b8 00 00 00 00 00 fc ff df 45 8b 74 24 04 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85
RIP: ip6_setup_cork+0x274/0x15c0 net/ipv6/ip6_output.c:1168 RSP: ffff8801cc357550
Signed-off-by: Eric Dumazet <edumazet(a)google.com>
Reported-by: Dmitry Vyukov <dvyukov(a)google.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/ipv6/ip6_flowlabel.c | 1 +
net/ipv6/ip6_output.c | 4 ++--
2 files changed, 3 insertions(+), 2 deletions(-)
--- a/net/ipv6/ip6_flowlabel.c
+++ b/net/ipv6/ip6_flowlabel.c
@@ -315,6 +315,7 @@ struct ipv6_txoptions *fl6_merge_options
}
opt_space->dst1opt = fopt->dst1opt;
opt_space->opt_flen = fopt->opt_flen;
+ opt_space->tot_len = fopt->tot_len;
return opt_space;
}
EXPORT_SYMBOL_GPL(fl6_merge_options);
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1215,11 +1215,11 @@ static int ip6_setup_cork(struct sock *s
if (WARN_ON(v6_cork->opt))
return -EINVAL;
- v6_cork->opt = kzalloc(opt->tot_len, sk->sk_allocation);
+ v6_cork->opt = kzalloc(sizeof(*opt), sk->sk_allocation);
if (unlikely(!v6_cork->opt))
return -ENOBUFS;
- v6_cork->opt->tot_len = opt->tot_len;
+ v6_cork->opt->tot_len = sizeof(*opt);
v6_cork->opt->opt_flen = opt->opt_flen;
v6_cork->opt->opt_nflen = opt->opt_nflen;
Patches currently in stable-queue which might be from edumazet(a)google.com are
queue-4.9/net-call-cgroup_sk_alloc-earlier-in-sk_clone_lock.patch
queue-4.9/tcp-dccp-fix-ireq-opt-races.patch
queue-4.9/tcp-fix-tcp_mtu_probe-vs-highest_sack.patch
queue-4.9/ipv6-addrconf-increment-ifp-refcount-before-ipv6_del_addr.patch
queue-4.9/ipv6-flowlabel-do-not-leave-opt-tot_len-with-garbage.patch
queue-4.9/packet-avoid-panic-in-packet_getsockopt.patch
queue-4.9/sctp-add-the-missing-sock_owned_by_user-check-in-sctp_icmp_redirect.patch
queue-4.9/net_sched-avoid-matching-qdisc-with-zero-handle.patch
queue-4.9/tun-tap-sanitize-tunsetsndbuf-input.patch
queue-4.9/tcp-dccp-fix-lockdep-splat-in-inet_csk_route_req.patch
queue-4.9/tcp-dccp-fix-other-lockdep-splats-accessing-ireq_opt.patch
This is a note to let you know that I've just added the patch titled
ipip: only increase err_count for some certain type icmp in ipip_err
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ipip-only-increase-err_count-for-some-certain-type-icmp-in-ipip_err.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Nov 15 17:24:03 CET 2017
From: Xin Long <lucien.xin(a)gmail.com>
Date: Thu, 26 Oct 2017 19:19:56 +0800
Subject: ipip: only increase err_count for some certain type icmp in ipip_err
From: Xin Long <lucien.xin(a)gmail.com>
[ Upstream commit f3594f0a7ea36661d7fd942facd7f31a64245f1a ]
t->err_count is used to count the link failure on tunnel and an err
will be reported to user socket in tx path if t->err_count is not 0.
udp socket could even return EHOSTUNREACH to users.
Since commit fd58156e456d ("IPIP: Use ip-tunneling code.") removed
the 'switch check' for icmp type in ipip_err(), err_count would be
increased by the icmp packet with ICMP_EXC_FRAGTIME code. an link
failure would be reported out due to this.
In Jianlin's case, when receiving ICMP_EXC_FRAGTIME a icmp packet,
udp netperf failed with the err:
send_data: data send error: No route to host (errno 113)
We expect this error reported from tunnel to socket when receiving
some certain type icmp, but not ICMP_EXC_FRAGTIME, ICMP_SR_FAILED
or ICMP_PARAMETERPROB ones.
This patch is to bring 'switch check' for icmp type back to ipip_err
so that it only reports link failure for the right type icmp, just as
in ipgre_err() and ipip6_err().
Fixes: fd58156e456d ("IPIP: Use ip-tunneling code.")
Reported-by: Jianlin Shi <jishi(a)redhat.com>
Signed-off-by: Xin Long <lucien.xin(a)gmail.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/ipv4/ipip.c | 59 +++++++++++++++++++++++++++++++++++++++-----------------
1 file changed, 42 insertions(+), 17 deletions(-)
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -128,43 +128,68 @@ static struct rtnl_link_ops ipip_link_op
static int ipip_err(struct sk_buff *skb, u32 info)
{
-
-/* All the routers (except for Linux) return only
- 8 bytes of packet payload. It means, that precise relaying of
- ICMP in the real Internet is absolutely infeasible.
- */
+ /* All the routers (except for Linux) return only
+ * 8 bytes of packet payload. It means, that precise relaying of
+ * ICMP in the real Internet is absolutely infeasible.
+ */
struct net *net = dev_net(skb->dev);
struct ip_tunnel_net *itn = net_generic(net, ipip_net_id);
const struct iphdr *iph = (const struct iphdr *)skb->data;
- struct ip_tunnel *t;
- int err;
const int type = icmp_hdr(skb)->type;
const int code = icmp_hdr(skb)->code;
+ struct ip_tunnel *t;
+ int err = 0;
+
+ switch (type) {
+ case ICMP_DEST_UNREACH:
+ switch (code) {
+ case ICMP_SR_FAILED:
+ /* Impossible event. */
+ goto out;
+ default:
+ /* All others are translated to HOST_UNREACH.
+ * rfc2003 contains "deep thoughts" about NET_UNREACH,
+ * I believe they are just ether pollution. --ANK
+ */
+ break;
+ }
+ break;
+
+ case ICMP_TIME_EXCEEDED:
+ if (code != ICMP_EXC_TTL)
+ goto out;
+ break;
+
+ case ICMP_REDIRECT:
+ break;
+
+ default:
+ goto out;
+ }
- err = -ENOENT;
t = ip_tunnel_lookup(itn, skb->dev->ifindex, TUNNEL_NO_KEY,
iph->daddr, iph->saddr, 0);
- if (!t)
+ if (!t) {
+ err = -ENOENT;
goto out;
+ }
if (type == ICMP_DEST_UNREACH && code == ICMP_FRAG_NEEDED) {
- ipv4_update_pmtu(skb, dev_net(skb->dev), info,
- t->parms.link, 0, iph->protocol, 0);
- err = 0;
+ ipv4_update_pmtu(skb, net, info, t->parms.link, 0,
+ iph->protocol, 0);
goto out;
}
if (type == ICMP_REDIRECT) {
- ipv4_redirect(skb, dev_net(skb->dev), t->parms.link, 0,
- iph->protocol, 0);
- err = 0;
+ ipv4_redirect(skb, net, t->parms.link, 0, iph->protocol, 0);
goto out;
}
- if (t->parms.iph.daddr == 0)
+ if (t->parms.iph.daddr == 0) {
+ err = -ENOENT;
goto out;
+ }
- err = 0;
if (t->parms.iph.ttl == 0 && type == ICMP_TIME_EXCEEDED)
goto out;
Patches currently in stable-queue which might be from lucien.xin(a)gmail.com are
queue-4.9/sctp-reset-owner-sk-for-data-chunks-on-out-queues-when-migrating-a-sock.patch
queue-4.9/ipip-only-increase-err_count-for-some-certain-type-icmp-in-ipip_err.patch
queue-4.9/sctp-full-support-for-ipv6-ip_nonlocal_bind-ip_freebind.patch
queue-4.9/sctp-add-the-missing-sock_owned_by_user-check-in-sctp_icmp_redirect.patch
queue-4.9/ip6_gre-only-increase-err_count-for-some-certain-type-icmpv6-in-ip6gre_err.patch
queue-4.9/ip6_gre-update-dst-pmtu-if-dev-mtu-has-been-updated-by-toobig-in-__gre6_xmit.patch
This is a note to let you know that I've just added the patch titled
ip6_gre: update dst pmtu if dev mtu has been updated by toobig in __gre6_xmit
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ip6_gre-update-dst-pmtu-if-dev-mtu-has-been-updated-by-toobig-in-__gre6_xmit.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Nov 15 17:24:03 CET 2017
From: Xin Long <lucien.xin(a)gmail.com>
Date: Thu, 26 Oct 2017 19:27:17 +0800
Subject: ip6_gre: update dst pmtu if dev mtu has been updated by toobig in __gre6_xmit
From: Xin Long <lucien.xin(a)gmail.com>
[ Upstream commit 8aec4959d832bae0889a8e2f348973b5e4abffef ]
When receiving a Toobig icmpv6 packet, ip6gre_err would just set
tunnel dev's mtu, that's not enough. For skb_dst(skb)'s pmtu may
still be using the old value, it has no chance to be updated with
tunnel dev's mtu.
Jianlin found this issue by reducing route's mtu while running
netperf, the performance went to 0.
ip6ip6 and ip4ip6 tunnel can work well with this, as they lookup
the upper dst and update_pmtu it's pmtu or icmpv6_send a Toobig
to upper socket after setting tunnel dev's mtu.
We couldn't do that for ip6_gre, as gre's inner packet could be
any protocol, it's difficult to handle them (like lookup upper
dst) in a good way.
So this patch is to fix it by updating skb_dst(skb)'s pmtu when
dev->mtu < skb_dst(skb)'s pmtu in tx path. It's safe to do this
update there, as usually dev->mtu <= skb_dst(skb)'s pmtu and no
performance regression can be caused by this.
Fixes: c12b395a4664 ("gre: Support GRE over IPv6")
Reported-by: Jianlin Shi <jishi(a)redhat.com>
Signed-off-by: Xin Long <lucien.xin(a)gmail.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/ipv6/ip6_gre.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -508,8 +508,8 @@ static netdev_tx_t __gre6_xmit(struct sk
__u32 *pmtu, __be16 proto)
{
struct ip6_tnl *tunnel = netdev_priv(dev);
- __be16 protocol = (dev->type == ARPHRD_ETHER) ?
- htons(ETH_P_TEB) : proto;
+ struct dst_entry *dst = skb_dst(skb);
+ __be16 protocol;
if (dev->type == ARPHRD_ETHER)
IPCB(skb)->flags = 0;
@@ -523,9 +523,14 @@ static netdev_tx_t __gre6_xmit(struct sk
tunnel->o_seqno++;
/* Push GRE header. */
+ protocol = (dev->type == ARPHRD_ETHER) ? htons(ETH_P_TEB) : proto;
gre_build_header(skb, tunnel->tun_hlen, tunnel->parms.o_flags,
protocol, tunnel->parms.o_key, htonl(tunnel->o_seqno));
+ /* TooBig packet may have updated dst->dev's mtu */
+ if (dst && dst_mtu(dst) > dst->dev->mtu)
+ dst->ops->update_pmtu(dst, NULL, skb, dst->dev->mtu);
+
return ip6_tnl_xmit(skb, dev, dsfield, fl6, encap_limit, pmtu,
NEXTHDR_GRE);
}
Patches currently in stable-queue which might be from lucien.xin(a)gmail.com are
queue-4.9/sctp-reset-owner-sk-for-data-chunks-on-out-queues-when-migrating-a-sock.patch
queue-4.9/ipip-only-increase-err_count-for-some-certain-type-icmp-in-ipip_err.patch
queue-4.9/sctp-full-support-for-ipv6-ip_nonlocal_bind-ip_freebind.patch
queue-4.9/sctp-add-the-missing-sock_owned_by_user-check-in-sctp_icmp_redirect.patch
queue-4.9/ip6_gre-only-increase-err_count-for-some-certain-type-icmpv6-in-ip6gre_err.patch
queue-4.9/ip6_gre-update-dst-pmtu-if-dev-mtu-has-been-updated-by-toobig-in-__gre6_xmit.patch
This is a note to let you know that I've just added the patch titled
ip6_gre: only increase err_count for some certain type icmpv6 in ip6gre_err
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ip6_gre-only-increase-err_count-for-some-certain-type-icmpv6-in-ip6gre_err.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Nov 15 17:24:03 CET 2017
From: Xin Long <lucien.xin(a)gmail.com>
Date: Thu, 26 Oct 2017 19:23:27 +0800
Subject: ip6_gre: only increase err_count for some certain type icmpv6 in ip6gre_err
From: Xin Long <lucien.xin(a)gmail.com>
[ Upstream commit f8d20b46ce55cf40afb30dcef6d9288f7ef46d9b ]
The similar fix in patch 'ipip: only increase err_count for some
certain type icmp in ipip_err' is needed for ip6gre_err.
In Jianlin's case, udp netperf broke even when receiving a TooBig
icmpv6 packet.
Fixes: c12b395a4664 ("gre: Support GRE over IPv6")
Reported-by: Jianlin Shi <jishi(a)redhat.com>
Signed-off-by: Xin Long <lucien.xin(a)gmail.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/ipv6/ip6_gre.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -408,13 +408,16 @@ static void ip6gre_err(struct sk_buff *s
case ICMPV6_DEST_UNREACH:
net_dbg_ratelimited("%s: Path to destination invalid or inactive!\n",
t->parms.name);
- break;
+ if (code != ICMPV6_PORT_UNREACH)
+ break;
+ return;
case ICMPV6_TIME_EXCEED:
if (code == ICMPV6_EXC_HOPLIMIT) {
net_dbg_ratelimited("%s: Too small hop limit or routing loop in tunnel!\n",
t->parms.name);
+ break;
}
- break;
+ return;
case ICMPV6_PARAMPROB:
teli = 0;
if (code == ICMPV6_HDR_FIELD)
@@ -430,7 +433,7 @@ static void ip6gre_err(struct sk_buff *s
net_dbg_ratelimited("%s: Recipient unable to parse tunneled packet!\n",
t->parms.name);
}
- break;
+ return;
case ICMPV6_PKT_TOOBIG:
mtu = be32_to_cpu(info) - offset - t->tun_hlen;
if (t->dev->type == ARPHRD_ETHER)
@@ -438,7 +441,7 @@ static void ip6gre_err(struct sk_buff *s
if (mtu < IPV6_MIN_MTU)
mtu = IPV6_MIN_MTU;
t->dev->mtu = mtu;
- break;
+ return;
}
if (time_before(jiffies, t->err_time + IP6TUNNEL_ERR_TIMEO))
Patches currently in stable-queue which might be from lucien.xin(a)gmail.com are
queue-4.9/sctp-reset-owner-sk-for-data-chunks-on-out-queues-when-migrating-a-sock.patch
queue-4.9/ipip-only-increase-err_count-for-some-certain-type-icmp-in-ipip_err.patch
queue-4.9/sctp-full-support-for-ipv6-ip_nonlocal_bind-ip_freebind.patch
queue-4.9/sctp-add-the-missing-sock_owned_by_user-check-in-sctp_icmp_redirect.patch
queue-4.9/ip6_gre-only-increase-err_count-for-some-certain-type-icmpv6-in-ip6gre_err.patch
queue-4.9/ip6_gre-update-dst-pmtu-if-dev-mtu-has-been-updated-by-toobig-in-__gre6_xmit.patch