This is a note to let you know that I've just added the patch titled
ipv6: Fix SO_REUSEPORT UDP socket with implicit sk_ipv6only
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ipv6-fix-so_reuseport-udp-socket-with-implicit-sk_ipv6only.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Feb 7 11:29:20 PST 2018
From: Martin KaFai Lau <kafai(a)fb.com>
Date: Wed, 24 Jan 2018 23:15:27 -0800
Subject: ipv6: Fix SO_REUSEPORT UDP socket with implicit sk_ipv6only
From: Martin KaFai Lau <kafai(a)fb.com>
[ Upstream commit 7ece54a60ee2ba7a386308cae73c790bd580589c ]
If a sk_v6_rcv_saddr is !IPV6_ADDR_ANY and !IPV6_ADDR_MAPPED, it
implicitly implies it is an ipv6only socket. However, in inet6_bind(),
this addr_type checking and setting sk->sk_ipv6only to 1 are only done
after sk->sk_prot->get_port(sk, snum) has been completed successfully.
This inconsistency between sk_v6_rcv_saddr and sk_ipv6only confuses
the 'get_port()'.
In particular, when binding SO_REUSEPORT UDP sockets,
udp_reuseport_add_sock(sk,...) is called. udp_reuseport_add_sock()
checks "ipv6_only_sock(sk2) == ipv6_only_sock(sk)" before adding sk to
sk2->sk_reuseport_cb. In this case, ipv6_only_sock(sk2) could be
1 while ipv6_only_sock(sk) is still 0 here. The end result is,
reuseport_alloc(sk) is called instead of adding sk to the existing
sk2->sk_reuseport_cb.
It can be reproduced by binding two SO_REUSEPORT UDP sockets on an
IPv6 address (!ANY and !MAPPED). Only one of the socket will
receive packet.
The fix is to set the implicit sk_ipv6only before calling get_port().
The original sk_ipv6only has to be saved such that it can be restored
in case get_port() failed. The situation is similar to the
inet_reset_saddr(sk) after get_port() has failed.
Thanks to Calvin Owens <calvinowens(a)fb.com> who created an easy
reproduction which leads to a fix.
Fixes: e32ea7e74727 ("soreuseport: fast reuseport UDP socket selection")
Signed-off-by: Martin KaFai Lau <kafai(a)fb.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/ipv6/af_inet6.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -284,6 +284,7 @@ int inet6_bind(struct socket *sock, stru
struct net *net = sock_net(sk);
__be32 v4addr = 0;
unsigned short snum;
+ bool saved_ipv6only;
int addr_type = 0;
int err = 0;
@@ -389,19 +390,21 @@ int inet6_bind(struct socket *sock, stru
if (!(addr_type & IPV6_ADDR_MULTICAST))
np->saddr = addr->sin6_addr;
+ saved_ipv6only = sk->sk_ipv6only;
+ if (addr_type != IPV6_ADDR_ANY && addr_type != IPV6_ADDR_MAPPED)
+ sk->sk_ipv6only = 1;
+
/* Make sure we are allowed to bind here. */
if ((snum || !inet->bind_address_no_port) &&
sk->sk_prot->get_port(sk, snum)) {
+ sk->sk_ipv6only = saved_ipv6only;
inet_reset_saddr(sk);
err = -EADDRINUSE;
goto out;
}
- if (addr_type != IPV6_ADDR_ANY) {
+ if (addr_type != IPV6_ADDR_ANY)
sk->sk_userlocks |= SOCK_BINDADDR_LOCK;
- if (addr_type != IPV6_ADDR_MAPPED)
- sk->sk_ipv6only = 1;
- }
if (snum)
sk->sk_userlocks |= SOCK_BINDPORT_LOCK;
inet->inet_sport = htons(inet->inet_num);
Patches currently in stable-queue which might be from kafai(a)fb.com are
queue-4.14/ipv6-fix-so_reuseport-udp-socket-with-implicit-sk_ipv6only.patch
This is a note to let you know that I've just added the patch titled
ip6mr: fix stale iterator
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
ip6mr-fix-stale-iterator.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Feb 7 11:29:20 PST 2018
From: Nikolay Aleksandrov <nikolay(a)cumulusnetworks.com>
Date: Wed, 31 Jan 2018 16:29:30 +0200
Subject: ip6mr: fix stale iterator
From: Nikolay Aleksandrov <nikolay(a)cumulusnetworks.com>
[ Upstream commit 4adfa79fc254efb7b0eb3cd58f62c2c3f805f1ba ]
When we dump the ip6mr mfc entries via proc, we initialize an iterator
with the table to dump but we don't clear the cache pointer which might
be initialized from a prior read on the same descriptor that ended. This
can result in lock imbalance (an unnecessary unlock) leading to other
crashes and hangs. Clear the cache pointer like ipmr does to fix the issue.
Thanks for the reliable reproducer.
Here's syzbot's trace:
WARNING: bad unlock balance detected!
4.15.0-rc3+ #128 Not tainted
syzkaller971460/3195 is trying to release lock (mrt_lock) at:
[<000000006898068d>] ipmr_mfc_seq_stop+0xe1/0x130 net/ipv6/ip6mr.c:553
but there are no more locks to release!
other info that might help us debug this:
1 lock held by syzkaller971460/3195:
#0: (&p->lock){+.+.}, at: [<00000000744a6565>] seq_read+0xd5/0x13d0
fs/seq_file.c:165
stack backtrace:
CPU: 1 PID: 3195 Comm: syzkaller971460 Not tainted 4.15.0-rc3+ #128
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:53
print_unlock_imbalance_bug+0x12f/0x140 kernel/locking/lockdep.c:3561
__lock_release kernel/locking/lockdep.c:3775 [inline]
lock_release+0x5f9/0xda0 kernel/locking/lockdep.c:4023
__raw_read_unlock include/linux/rwlock_api_smp.h:225 [inline]
_raw_read_unlock+0x1a/0x30 kernel/locking/spinlock.c:255
ipmr_mfc_seq_stop+0xe1/0x130 net/ipv6/ip6mr.c:553
traverse+0x3bc/0xa00 fs/seq_file.c:135
seq_read+0x96a/0x13d0 fs/seq_file.c:189
proc_reg_read+0xef/0x170 fs/proc/inode.c:217
do_loop_readv_writev fs/read_write.c:673 [inline]
do_iter_read+0x3db/0x5b0 fs/read_write.c:897
compat_readv+0x1bf/0x270 fs/read_write.c:1140
do_compat_preadv64+0xdc/0x100 fs/read_write.c:1189
C_SYSC_preadv fs/read_write.c:1209 [inline]
compat_SyS_preadv+0x3b/0x50 fs/read_write.c:1203
do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125
RIP: 0023:0xf7f73c79
RSP: 002b:00000000e574a15c EFLAGS: 00000292 ORIG_RAX: 000000000000014d
RAX: ffffffffffffffda RBX: 000000000000000f RCX: 0000000020a3afb0
RDX: 0000000000000001 RSI: 0000000000000067 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
BUG: sleeping function called from invalid context at lib/usercopy.c:25
in_atomic(): 1, irqs_disabled(): 0, pid: 3195, name: syzkaller971460
INFO: lockdep is turned off.
CPU: 1 PID: 3195 Comm: syzkaller971460 Not tainted 4.15.0-rc3+ #128
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:53
___might_sleep+0x2b2/0x470 kernel/sched/core.c:6060
__might_sleep+0x95/0x190 kernel/sched/core.c:6013
__might_fault+0xab/0x1d0 mm/memory.c:4525
_copy_to_user+0x2c/0xc0 lib/usercopy.c:25
copy_to_user include/linux/uaccess.h:155 [inline]
seq_read+0xcb4/0x13d0 fs/seq_file.c:279
proc_reg_read+0xef/0x170 fs/proc/inode.c:217
do_loop_readv_writev fs/read_write.c:673 [inline]
do_iter_read+0x3db/0x5b0 fs/read_write.c:897
compat_readv+0x1bf/0x270 fs/read_write.c:1140
do_compat_preadv64+0xdc/0x100 fs/read_write.c:1189
C_SYSC_preadv fs/read_write.c:1209 [inline]
compat_SyS_preadv+0x3b/0x50 fs/read_write.c:1203
do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
entry_SYSENTER_compat+0x51/0x60 arch/x86/entry/entry_64_compat.S:125
RIP: 0023:0xf7f73c79
RSP: 002b:00000000e574a15c EFLAGS: 00000292 ORIG_RAX: 000000000000014d
RAX: ffffffffffffffda RBX: 000000000000000f RCX: 0000000020a3afb0
RDX: 0000000000000001 RSI: 0000000000000067 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
WARNING: CPU: 1 PID: 3195 at lib/usercopy.c:26 _copy_to_user+0xb5/0xc0
lib/usercopy.c:26
Reported-by: syzbot <bot+eceb3204562c41a438fa1f2335e0fe4f6886d669(a)syzkaller.appspotmail.com>
Signed-off-by: Nikolay Aleksandrov <nikolay(a)cumulusnetworks.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/ipv6/ip6mr.c | 1 +
1 file changed, 1 insertion(+)
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -496,6 +496,7 @@ static void *ipmr_mfc_seq_start(struct s
return ERR_PTR(-ENOENT);
it->mrt = mrt;
+ it->cache = NULL;
return *pos ? ipmr_mfc_seq_idx(net, seq->private, *pos - 1)
: SEQ_START_TOKEN;
}
Patches currently in stable-queue which might be from nikolay(a)cumulusnetworks.com are
queue-4.14/ip6mr-fix-stale-iterator.patch
This is a note to let you know that I've just added the patch titled
tcp_bbr: fix pacing_gain to always be unity when using lt_bw
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
tcp_bbr-fix-pacing_gain-to-always-be-unity-when-using-lt_bw.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Feb 7 11:29:33 PST 2018
From: Neal Cardwell <ncardwell(a)google.com>
Date: Wed, 31 Jan 2018 15:43:05 -0500
Subject: tcp_bbr: fix pacing_gain to always be unity when using lt_bw
From: Neal Cardwell <ncardwell(a)google.com>
[ Upstream commit 3aff3b4b986e51bcf4ab249e5d48d39596e0df6a ]
This commit fixes the pacing_gain to remain at BBR_UNIT (1.0) when
using lt_bw and returning from the PROBE_RTT state to PROBE_BW.
Previously, when using lt_bw, upon exiting PROBE_RTT and entering
PROBE_BW the bbr_reset_probe_bw_mode() code could sometimes randomly
end up with a cycle_idx of 0 and hence have bbr_advance_cycle_phase()
set a pacing gain above 1.0. In such cases this would result in a
pacing rate that is 1.25x higher than intended, potentially resulting
in a high loss rate for a little while until we stop using the lt_bw a
bit later.
This commit is a stable candidate for kernels back as far as 4.9.
Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Neal Cardwell <ncardwell(a)google.com>
Signed-off-by: Yuchung Cheng <ycheng(a)google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil(a)google.com>
Reported-by: Beyers Cronje <bcronje(a)gmail.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/ipv4/tcp_bbr.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- a/net/ipv4/tcp_bbr.c
+++ b/net/ipv4/tcp_bbr.c
@@ -481,7 +481,8 @@ static void bbr_advance_cycle_phase(stru
bbr->cycle_idx = (bbr->cycle_idx + 1) & (CYCLE_LEN - 1);
bbr->cycle_mstamp = tp->delivered_mstamp;
- bbr->pacing_gain = bbr_pacing_gain[bbr->cycle_idx];
+ bbr->pacing_gain = bbr->lt_use_bw ? BBR_UNIT :
+ bbr_pacing_gain[bbr->cycle_idx];
}
/* Gain cycling: cycle pacing gain to converge to fair share of available bw. */
@@ -490,8 +491,7 @@ static void bbr_update_cycle_phase(struc
{
struct bbr *bbr = inet_csk_ca(sk);
- if ((bbr->mode == BBR_PROBE_BW) && !bbr->lt_use_bw &&
- bbr_is_next_cycle_phase(sk, rs))
+ if (bbr->mode == BBR_PROBE_BW && bbr_is_next_cycle_phase(sk, rs))
bbr_advance_cycle_phase(sk);
}
Patches currently in stable-queue which might be from ncardwell(a)google.com are
queue-4.15/tcp_bbr-fix-pacing_gain-to-always-be-unity-when-using-lt_bw.patch
This is a note to let you know that I've just added the patch titled
vhost_net: stop device during reset owner
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
vhost_net-stop-device-during-reset-owner.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Feb 7 11:29:33 PST 2018
From: Jason Wang <jasowang(a)redhat.com>
Date: Thu, 25 Jan 2018 22:03:52 +0800
Subject: vhost_net: stop device during reset owner
From: Jason Wang <jasowang(a)redhat.com>
[ Upstream commit 4cd879515d686849eec5f718aeac62a70b067d82 ]
We don't stop device before reset owner, this means we could try to
serve any virtqueue kick before reset dev->worker. This will result a
warn since the work was pending at llist during owner resetting. Fix
this by stopping device during owner reset.
Reported-by: syzbot+eb17c6162478cc50632c(a)syzkaller.appspotmail.com
Fixes: 3a4d5c94e9593 ("vhost_net: a kernel-level virtio server")
Signed-off-by: Jason Wang <jasowang(a)redhat.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/vhost/net.c | 1 +
1 file changed, 1 insertion(+)
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1208,6 +1208,7 @@ static long vhost_net_reset_owner(struct
}
vhost_net_stop(n, &tx_sock, &rx_sock);
vhost_net_flush(n);
+ vhost_dev_stop(&n->dev);
vhost_dev_reset_owner(&n->dev, umem);
vhost_net_vq_reset(n);
done:
Patches currently in stable-queue which might be from jasowang(a)redhat.com are
queue-4.15/vhost_net-stop-device-during-reset-owner.patch
This is a note to let you know that I've just added the patch titled
tcp: release sk_frag.page in tcp_disconnect
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
tcp-release-sk_frag.page-in-tcp_disconnect.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Feb 7 11:29:33 PST 2018
From: Li RongQing <lirongqing(a)baidu.com>
Date: Fri, 26 Jan 2018 16:40:41 +0800
Subject: tcp: release sk_frag.page in tcp_disconnect
From: Li RongQing <lirongqing(a)baidu.com>
[ Upstream commit 9b42d55a66d388e4dd5550107df051a9637564fc ]
socket can be disconnected and gets transformed back to a listening
socket, if sk_frag.page is not released, which will be cloned into
a new socket by sk_clone_lock, but the reference count of this page
is increased, lead to a use after free or double free issue
Signed-off-by: Li RongQing <lirongqing(a)baidu.com>
Cc: Eric Dumazet <edumazet(a)google.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/ipv4/tcp.c | 6 ++++++
1 file changed, 6 insertions(+)
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2434,6 +2434,12 @@ int tcp_disconnect(struct sock *sk, int
WARN_ON(inet->inet_num && !icsk->icsk_bind_hash);
+ if (sk->sk_frag.page) {
+ put_page(sk->sk_frag.page);
+ sk->sk_frag.page = NULL;
+ sk->sk_frag.offset = 0;
+ }
+
sk->sk_error_report(sk);
return err;
}
Patches currently in stable-queue which might be from lirongqing(a)baidu.com are
queue-4.15/tcp-release-sk_frag.page-in-tcp_disconnect.patch
This is a note to let you know that I've just added the patch titled
soreuseport: fix mem leak in reuseport_add_sock()
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
soreuseport-fix-mem-leak-in-reuseport_add_sock.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Feb 7 11:29:33 PST 2018
From: Eric Dumazet <edumazet(a)google.com>
Date: Fri, 2 Feb 2018 10:27:27 -0800
Subject: soreuseport: fix mem leak in reuseport_add_sock()
From: Eric Dumazet <edumazet(a)google.com>
[ Upstream commit 4db428a7c9ab07e08783e0fcdc4ca0f555da0567 ]
reuseport_add_sock() needs to deal with attaching a socket having
its own sk_reuseport_cb, after a prior
setsockopt(SO_ATTACH_REUSEPORT_?BPF)
Without this fix, not only a WARN_ONCE() was issued, but we were also
leaking memory.
Thanks to sysbot and Eric Biggers for providing us nice C repros.
------------[ cut here ]------------
socket already in reuseport group
WARNING: CPU: 0 PID: 3496 at net/core/sock_reuseport.c:119
reuseport_add_sock+0x742/0x9b0 net/core/sock_reuseport.c:117
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 3496 Comm: syzkaller869503 Not tainted 4.15.0-rc6+ #245
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:53
panic+0x1e4/0x41c kernel/panic.c:183
__warn+0x1dc/0x200 kernel/panic.c:547
report_bug+0x211/0x2d0 lib/bug.c:184
fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178
fixup_bug arch/x86/kernel/traps.c:247 [inline]
do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
invalid_op+0x22/0x40 arch/x86/entry/entry_64.S:1079
Fixes: ef456144da8e ("soreuseport: define reuseport groups")
Signed-off-by: Eric Dumazet <edumazet(a)google.com>
Reported-by: syzbot+c0ea2226f77a42936bf7(a)syzkaller.appspotmail.com
Acked-by: Craig Gallek <kraig(a)google.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/core/sock_reuseport.c | 35 ++++++++++++++++++++---------------
1 file changed, 20 insertions(+), 15 deletions(-)
--- a/net/core/sock_reuseport.c
+++ b/net/core/sock_reuseport.c
@@ -94,6 +94,16 @@ static struct sock_reuseport *reuseport_
return more_reuse;
}
+static void reuseport_free_rcu(struct rcu_head *head)
+{
+ struct sock_reuseport *reuse;
+
+ reuse = container_of(head, struct sock_reuseport, rcu);
+ if (reuse->prog)
+ bpf_prog_destroy(reuse->prog);
+ kfree(reuse);
+}
+
/**
* reuseport_add_sock - Add a socket to the reuseport group of another.
* @sk: New socket to add to the group.
@@ -102,7 +112,7 @@ static struct sock_reuseport *reuseport_
*/
int reuseport_add_sock(struct sock *sk, struct sock *sk2)
{
- struct sock_reuseport *reuse;
+ struct sock_reuseport *old_reuse, *reuse;
if (!rcu_access_pointer(sk2->sk_reuseport_cb)) {
int err = reuseport_alloc(sk2);
@@ -113,10 +123,13 @@ int reuseport_add_sock(struct sock *sk,
spin_lock_bh(&reuseport_lock);
reuse = rcu_dereference_protected(sk2->sk_reuseport_cb,
- lockdep_is_held(&reuseport_lock)),
- WARN_ONCE(rcu_dereference_protected(sk->sk_reuseport_cb,
- lockdep_is_held(&reuseport_lock)),
- "socket already in reuseport group");
+ lockdep_is_held(&reuseport_lock));
+ old_reuse = rcu_dereference_protected(sk->sk_reuseport_cb,
+ lockdep_is_held(&reuseport_lock));
+ if (old_reuse && old_reuse->num_socks != 1) {
+ spin_unlock_bh(&reuseport_lock);
+ return -EBUSY;
+ }
if (reuse->num_socks == reuse->max_socks) {
reuse = reuseport_grow(reuse);
@@ -134,19 +147,11 @@ int reuseport_add_sock(struct sock *sk,
spin_unlock_bh(&reuseport_lock);
+ if (old_reuse)
+ call_rcu(&old_reuse->rcu, reuseport_free_rcu);
return 0;
}
-static void reuseport_free_rcu(struct rcu_head *head)
-{
- struct sock_reuseport *reuse;
-
- reuse = container_of(head, struct sock_reuseport, rcu);
- if (reuse->prog)
- bpf_prog_destroy(reuse->prog);
- kfree(reuse);
-}
-
void reuseport_detach_sock(struct sock *sk)
{
struct sock_reuseport *reuse;
Patches currently in stable-queue which might be from edumazet(a)google.com are
queue-4.15/tcp-release-sk_frag.page-in-tcp_disconnect.patch
queue-4.15/net_sched-get-rid-of-rcu_barrier-in-tcf_block_put_ext.patch
queue-4.15/net-igmp-add-a-missing-rcu-locking-section.patch
queue-4.15/ipv6-change-route-cache-aging-logic.patch
queue-4.15/soreuseport-fix-mem-leak-in-reuseport_add_sock.patch
queue-4.15/revert-defer-call-to-mem_cgroup_sk_alloc.patch
queue-4.15/ipv6-addrconf-break-critical-section-in-addrconf_verify_rtnl.patch
This is a note to let you know that I've just added the patch titled
rocker: fix possible null pointer dereference in rocker_router_fib_event_work
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
rocker-fix-possible-null-pointer-dereference-in-rocker_router_fib_event_work.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Feb 7 11:29:33 PST 2018
From: Jiri Pirko <jiri(a)mellanox.com>
Date: Thu, 1 Feb 2018 12:21:15 +0100
Subject: rocker: fix possible null pointer dereference in rocker_router_fib_event_work
From: Jiri Pirko <jiri(a)mellanox.com>
[ Upstream commit a83165f00f16c0e0ef5b7cec3cbd0d4788699265 ]
Currently, rocker user may experience following null pointer
derefence bug:
[ 3.062141] BUG: unable to handle kernel NULL pointer dereference at 00000000000000d0
[ 3.065163] IP: rocker_router_fib_event_work+0x36/0x110 [rocker]
The problem is uninitialized rocker->wops pointer that is initialized
only with the first initialized port. So move the port initialization
before registering the fib events.
Fixes: 936bd486564a ("rocker: use FIB notifications instead of switchdev calls")
Signed-off-by: Jiri Pirko <jiri(a)mellanox.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/net/ethernet/rocker/rocker_main.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
--- a/drivers/net/ethernet/rocker/rocker_main.c
+++ b/drivers/net/ethernet/rocker/rocker_main.c
@@ -2902,6 +2902,12 @@ static int rocker_probe(struct pci_dev *
goto err_alloc_ordered_workqueue;
}
+ err = rocker_probe_ports(rocker);
+ if (err) {
+ dev_err(&pdev->dev, "failed to probe ports\n");
+ goto err_probe_ports;
+ }
+
/* Only FIBs pointing to our own netdevs are programmed into
* the device, so no need to pass a callback.
*/
@@ -2918,22 +2924,16 @@ static int rocker_probe(struct pci_dev *
rocker->hw.id = rocker_read64(rocker, SWITCH_ID);
- err = rocker_probe_ports(rocker);
- if (err) {
- dev_err(&pdev->dev, "failed to probe ports\n");
- goto err_probe_ports;
- }
-
dev_info(&pdev->dev, "Rocker switch with id %*phN\n",
(int)sizeof(rocker->hw.id), &rocker->hw.id);
return 0;
-err_probe_ports:
- unregister_switchdev_notifier(&rocker_switchdev_notifier);
err_register_switchdev_notifier:
unregister_fib_notifier(&rocker->fib_nb);
err_register_fib_notifier:
+ rocker_remove_ports(rocker);
+err_probe_ports:
destroy_workqueue(rocker->rocker_owq);
err_alloc_ordered_workqueue:
free_irq(rocker_msix_vector(rocker, ROCKER_MSIX_VEC_EVENT), rocker);
@@ -2961,9 +2961,9 @@ static void rocker_remove(struct pci_dev
{
struct rocker *rocker = pci_get_drvdata(pdev);
- rocker_remove_ports(rocker);
unregister_switchdev_notifier(&rocker_switchdev_notifier);
unregister_fib_notifier(&rocker->fib_nb);
+ rocker_remove_ports(rocker);
rocker_write32(rocker, CONTROL, ROCKER_CONTROL_RESET);
destroy_workqueue(rocker->rocker_owq);
free_irq(rocker_msix_vector(rocker, ROCKER_MSIX_VEC_EVENT), rocker);
Patches currently in stable-queue which might be from jiri(a)mellanox.com are
queue-4.15/net_sched-get-rid-of-rcu_barrier-in-tcf_block_put_ext.patch
queue-4.15/rocker-fix-possible-null-pointer-dereference-in-rocker_router_fib_event_work.patch
queue-4.15/net-sched-fix-use-after-free-in-tcf_block_put_ext.patch
This is a note to let you know that I've just added the patch titled
Revert "defer call to mem_cgroup_sk_alloc()"
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
revert-defer-call-to-mem_cgroup_sk_alloc.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Feb 7 11:29:33 PST 2018
From: Roman Gushchin <guro(a)fb.com>
Date: Fri, 2 Feb 2018 15:26:57 +0000
Subject: Revert "defer call to mem_cgroup_sk_alloc()"
From: Roman Gushchin <guro(a)fb.com>
[ Upstream commit edbe69ef2c90fc86998a74b08319a01c508bd497 ]
This patch effectively reverts commit 9f1c2674b328 ("net: memcontrol:
defer call to mem_cgroup_sk_alloc()").
Moving mem_cgroup_sk_alloc() to the inet_csk_accept() completely breaks
memcg socket memory accounting, as packets received before memcg
pointer initialization are not accounted and are causing refcounting
underflow on socket release.
Actually the free-after-use problem was fixed by
commit c0576e397508 ("net: call cgroup_sk_alloc() earlier in
sk_clone_lock()") for the cgroup pointer.
So, let's revert it and call mem_cgroup_sk_alloc() just before
cgroup_sk_alloc(). This is safe, as we hold a reference to the socket
we're cloning, and it holds a reference to the memcg.
Also, let's drop BUG_ON(mem_cgroup_is_root()) check from
mem_cgroup_sk_alloc(). I see no reasons why bumping the root
memcg counter is a good reason to panic, and there are no realistic
ways to hit it.
Signed-off-by: Roman Gushchin <guro(a)fb.com>
Cc: Eric Dumazet <edumazet(a)google.com>
Cc: David S. Miller <davem(a)davemloft.net>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Tejun Heo <tj(a)kernel.org>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
mm/memcontrol.c | 14 ++++++++++++++
net/core/sock.c | 5 +----
net/ipv4/inet_connection_sock.c | 1 -
3 files changed, 15 insertions(+), 5 deletions(-)
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5828,6 +5828,20 @@ void mem_cgroup_sk_alloc(struct sock *sk
if (!mem_cgroup_sockets_enabled)
return;
+ /*
+ * Socket cloning can throw us here with sk_memcg already
+ * filled. It won't however, necessarily happen from
+ * process context. So the test for root memcg given
+ * the current task's memcg won't help us in this case.
+ *
+ * Respecting the original socket's memcg is a better
+ * decision in this case.
+ */
+ if (sk->sk_memcg) {
+ css_get(&sk->sk_memcg->css);
+ return;
+ }
+
rcu_read_lock();
memcg = mem_cgroup_from_task(current);
if (memcg == root_mem_cgroup)
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1675,16 +1675,13 @@ struct sock *sk_clone_lock(const struct
newsk->sk_dst_pending_confirm = 0;
newsk->sk_wmem_queued = 0;
newsk->sk_forward_alloc = 0;
-
- /* sk->sk_memcg will be populated at accept() time */
- newsk->sk_memcg = NULL;
-
atomic_set(&newsk->sk_drops, 0);
newsk->sk_send_head = NULL;
newsk->sk_userlocks = sk->sk_userlocks & ~SOCK_BINDPORT_LOCK;
atomic_set(&newsk->sk_zckey, 0);
sock_reset_flag(newsk, SOCK_DONE);
+ mem_cgroup_sk_alloc(newsk);
cgroup_sk_alloc(&newsk->sk_cgrp_data);
rcu_read_lock();
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -475,7 +475,6 @@ struct sock *inet_csk_accept(struct sock
}
spin_unlock_bh(&queue->fastopenq.lock);
}
- mem_cgroup_sk_alloc(newsk);
out:
release_sock(sk);
if (req)
Patches currently in stable-queue which might be from guro(a)fb.com are
queue-4.15/revert-defer-call-to-mem_cgroup_sk_alloc.patch
This is a note to let you know that I've just added the patch titled
r8169: fix RTL8168EP take too long to complete driver initialization.
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
r8169-fix-rtl8168ep-take-too-long-to-complete-driver-initialization.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Wed Feb 7 11:29:33 PST 2018
From: Chunhao Lin <hau(a)realtek.com>
Date: Wed, 31 Jan 2018 01:32:36 +0800
Subject: r8169: fix RTL8168EP take too long to complete driver initialization.
From: Chunhao Lin <hau(a)realtek.com>
[ Upstream commit 086ca23d03c0d2f4088f472386778d293e15c5f6 ]
Driver check the wrong register bit in rtl_ocp_tx_cond() that keep driver
waiting until timeout.
Fix this by waiting for the right register bit.
Signed-off-by: Chunhao Lin <hau(a)realtek.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/net/ethernet/realtek/r8169.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -1395,7 +1395,7 @@ DECLARE_RTL_COND(rtl_ocp_tx_cond)
{
void __iomem *ioaddr = tp->mmio_addr;
- return RTL_R8(IBISR0) & 0x02;
+ return RTL_R8(IBISR0) & 0x20;
}
static void rtl8168ep_stop_cmac(struct rtl8169_private *tp)
@@ -1403,7 +1403,7 @@ static void rtl8168ep_stop_cmac(struct r
void __iomem *ioaddr = tp->mmio_addr;
RTL_W8(IBCR2, RTL_R8(IBCR2) & ~0x01);
- rtl_msleep_loop_wait_low(tp, &rtl_ocp_tx_cond, 50, 2000);
+ rtl_msleep_loop_wait_high(tp, &rtl_ocp_tx_cond, 50, 2000);
RTL_W8(IBISR0, RTL_R8(IBISR0) | 0x20);
RTL_W8(IBCR0, RTL_R8(IBCR0) & ~0x01);
}
Patches currently in stable-queue which might be from hau(a)realtek.com are
queue-4.15/r8169-fix-rtl8168ep-take-too-long-to-complete-driver-initialization.patch