This is a note to let you know that I've just added the patch titled
cpufreq: governor: Ensure sufficiently large sampling intervals
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
cpufreq-governor-ensure-sufficiently-large-sampling-intervals.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 56026645e2b6f11ede34a5e6ab69d3eb56f9c8fc Mon Sep 17 00:00:00 2001
From: "Rafael J. Wysocki" <rafael.j.wysocki(a)intel.com>
Date: Mon, 18 Dec 2017 02:15:32 +0100
Subject: cpufreq: governor: Ensure sufficiently large sampling intervals
From: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
commit 56026645e2b6f11ede34a5e6ab69d3eb56f9c8fc upstream.
After commit aa7519af450d (cpufreq: Use transition_delay_us for legacy
governors as well) the sampling_rate field of struct dbs_data may be
less than the tick period which causes dbs_update() to produce
incorrect results, so make the code ensure that the value of that
field will always be sufficiently large.
Fixes: aa7519af450d (cpufreq: Use transition_delay_us for legacy governors as well)
Reported-by: Andy Tang <andy.tang(a)nxp.com>
Reported-by: Doug Smythies <dsmythies(a)telus.net>
Tested-by: Andy Tang <andy.tang(a)nxp.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Acked-by: Viresh Kumar <viresh.kumar(a)linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/cpufreq/cpufreq_governor.c | 19 ++++++++++++++++---
1 file changed, 16 insertions(+), 3 deletions(-)
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -22,6 +22,8 @@
#include "cpufreq_governor.h"
+#define CPUFREQ_DBS_MIN_SAMPLING_INTERVAL (2 * TICK_NSEC / NSEC_PER_USEC)
+
static DEFINE_PER_CPU(struct cpu_dbs_info, cpu_dbs);
static DEFINE_MUTEX(gov_dbs_data_mutex);
@@ -47,11 +49,15 @@ ssize_t store_sampling_rate(struct gov_a
{
struct dbs_data *dbs_data = to_dbs_data(attr_set);
struct policy_dbs_info *policy_dbs;
+ unsigned int sampling_interval;
int ret;
- ret = sscanf(buf, "%u", &dbs_data->sampling_rate);
- if (ret != 1)
+
+ ret = sscanf(buf, "%u", &sampling_interval);
+ if (ret != 1 || sampling_interval < CPUFREQ_DBS_MIN_SAMPLING_INTERVAL)
return -EINVAL;
+ dbs_data->sampling_rate = sampling_interval;
+
/*
* We are operating under dbs_data->mutex and so the list and its
* entries can't be freed concurrently.
@@ -430,7 +436,14 @@ int cpufreq_dbs_governor_init(struct cpu
if (ret)
goto free_policy_dbs_info;
- dbs_data->sampling_rate = cpufreq_policy_transition_delay_us(policy);
+ /*
+ * The sampling interval should not be less than the transition latency
+ * of the CPU and it also cannot be too small for dbs_update() to work
+ * correctly.
+ */
+ dbs_data->sampling_rate = max_t(unsigned int,
+ CPUFREQ_DBS_MIN_SAMPLING_INTERVAL,
+ cpufreq_policy_transition_delay_us(policy));
if (!have_governor_per_policy())
gov->gdbs_data = dbs_data;
Patches currently in stable-queue which might be from rafael.j.wysocki(a)intel.com are
queue-4.14/cpufreq-governor-ensure-sufficiently-large-sampling-intervals.patch
On 01/24/2018 11:07 AM, David Woodhouse wrote:
> On Tue, 2018-01-09 at 22:39 +0100, Daniel Borkmann wrote:
>> On 01/09/2018 07:04 PM, Alexei Starovoitov wrote:
>>>
>>> The BPF interpreter has been used as part of the spectre 2 attack CVE-2017-5715.
>>>
>>> A quote from goolge project zero blog:
>>> "At this point, it would normally be necessary to locate gadgets in
>>> the host kernel code that can be used to actually leak data by reading
>>> from an attacker-controlled location, shifting and masking the result
>>> appropriately and then using the result of that as offset to an
>>> attacker-controlled address for a load. But piecing gadgets together
>>> and figuring out which ones work in a speculation context seems annoying.
>>> So instead, we decided to use the eBPF interpreter, which is built into
>>> the host kernel - while there is no legitimate way to invoke it from inside
>>> a VM, the presence of the code in the host kernel's text section is sufficient
>>> to make it usable for the attack, just like with ordinary ROP gadgets."
>>>
>>> To make attacker job harder introduce BPF_JIT_ALWAYS_ON config
>>> option that removes interpreter from the kernel in favor of JIT-only mode.
>>> So far eBPF JIT is supported by:
>>> x64, arm64, arm32, sparc64, s390, powerpc64, mips64
>>>
>>> The start of JITed program is randomized and code page is marked as read-only.
>>> In addition "constant blinding" can be turned on with net.core.bpf_jit_harden
>>>
>>> v2->v3:
>>> - move __bpf_prog_ret0 under ifdef (Daniel)
>>>
>>> v1->v2:
>>> - fix init order, test_bpf and cBPF (Daniel's feedback)
>>> - fix offloaded bpf (Jakub's feedback)
>>> - add 'return 0' dummy in case something can invoke prog->bpf_func
>>> - retarget bpf tree. For bpf-next the patch would need one extra hunk.
>>> It will be sent when the trees are merged back to net-next
>>>
>>> Considered doing:
>>> int bpf_jit_enable __read_mostly = BPF_EBPF_JIT_DEFAULT;
>>> but it seems better to land the patch as-is and in bpf-next remove
>>> bpf_jit_enable global variable from all JITs, consolidate in one place
>>> and remove this jit_init() function.
>>>
>>> Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
>>
>> Applied to bpf tree, thanks Alexei!
>
> For stable too?
Yes, this will go into stable as well; batch of backports will come Thurs/Fri.
This is a note to let you know that I've just added the patch titled
vmxnet3: repair memory leak
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
vmxnet3-repair-memory-leak.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Mon Jan 29 11:15:06 CET 2018
From: Neil Horman <nhorman(a)tuxdriver.com>
Date: Mon, 22 Jan 2018 16:06:37 -0500
Subject: vmxnet3: repair memory leak
From: Neil Horman <nhorman(a)tuxdriver.com>
[ Upstream commit 848b159835ddef99cc4193083f7e786c3992f580 ]
with the introduction of commit
b0eb57cb97e7837ebb746404c2c58c6f536f23fa, it appears that rq->buf_info
is improperly handled. While it is heap allocated when an rx queue is
setup, and freed when torn down, an old line of code in
vmxnet3_rq_destroy was not properly removed, leading to rq->buf_info[0]
being set to NULL prior to its being freed, causing a memory leak, which
eventually exhausts the system on repeated create/destroy operations
(for example, when the mtu of a vmxnet3 interface is changed
frequently.
Fix is pretty straight forward, just move the NULL set to after the
free.
Tested by myself with successful results
Applies to net, and should likely be queued for stable, please
Signed-off-by: Neil Horman <nhorman(a)tuxdriver.com>
Reported-By: boyang(a)redhat.com
CC: boyang(a)redhat.com
CC: Shrikrishna Khare <skhare(a)vmware.com>
CC: "VMware, Inc." <pv-drivers(a)vmware.com>
CC: David S. Miller <davem(a)davemloft.net>
Acked-by: Shrikrishna Khare <skhare(a)vmware.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/net/vmxnet3/vmxnet3_drv.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -1420,7 +1420,6 @@ static void vmxnet3_rq_destroy(struct vm
rq->rx_ring[i].basePA);
rq->rx_ring[i].base = NULL;
}
- rq->buf_info[i] = NULL;
}
if (rq->comp_ring.base) {
@@ -1435,6 +1434,7 @@ static void vmxnet3_rq_destroy(struct vm
(rq->rx_ring[0].size + rq->rx_ring[1].size);
dma_free_coherent(&adapter->pdev->dev, sz, rq->buf_info[0],
rq->buf_info_pa);
+ rq->buf_info[0] = rq->buf_info[1] = NULL;
}
}
Patches currently in stable-queue which might be from nhorman(a)tuxdriver.com are
queue-3.18/sctp-do-not-allow-the-v4-socket-to-bind-a-v4mapped-v6-address.patch
queue-3.18/vmxnet3-repair-memory-leak.patch
queue-3.18/sctp-return-error-if-the-asoc-has-been-peeled-off-in-sctp_wait_for_sndbuf.patch
This is a note to let you know that I've just added the patch titled
tcp: __tcp_hdrlen() helper
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
tcp-__tcp_hdrlen-helper.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From d9b3fca27385eafe61c3ca6feab6cb1e7dc77482 Mon Sep 17 00:00:00 2001
From: Craig Gallek <kraig(a)google.com>
Date: Wed, 10 Feb 2016 11:50:37 -0500
Subject: tcp: __tcp_hdrlen() helper
From: Craig Gallek <kraig(a)google.com>
commit d9b3fca27385eafe61c3ca6feab6cb1e7dc77482 upstream.
tcp_hdrlen is wasteful if you already have a pointer to struct tcphdr.
This splits the size calculation into a helper function that can be
used if a struct tcphdr is already available.
Signed-off-by: Craig Gallek <kraig(a)google.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
include/linux/tcp.h | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -29,9 +29,14 @@ static inline struct tcphdr *tcp_hdr(con
return (struct tcphdr *)skb_transport_header(skb);
}
+static inline unsigned int __tcp_hdrlen(const struct tcphdr *th)
+{
+ return th->doff * 4;
+}
+
static inline unsigned int tcp_hdrlen(const struct sk_buff *skb)
{
- return tcp_hdr(skb)->doff * 4;
+ return __tcp_hdrlen(tcp_hdr(skb));
}
static inline struct tcphdr *inner_tcp_hdr(const struct sk_buff *skb)
Patches currently in stable-queue which might be from kraig(a)google.com are
queue-3.18/tcp-__tcp_hdrlen-helper.patch
This is a note to let you know that I've just added the patch titled
sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
sctp-return-error-if-the-asoc-has-been-peeled-off-in-sctp_wait_for_sndbuf.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Mon Jan 29 11:15:06 CET 2018
From: Xin Long <lucien.xin(a)gmail.com>
Date: Mon, 15 Jan 2018 17:01:36 +0800
Subject: sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf
From: Xin Long <lucien.xin(a)gmail.com>
[ Upstream commit a0ff660058b88d12625a783ce9e5c1371c87951f ]
After commit cea0cc80a677 ("sctp: use the right sk after waking up from
wait_buf sleep"), it may change to lock another sk if the asoc has been
peeled off in sctp_wait_for_sndbuf.
However, the asoc's new sk could be already closed elsewhere, as it's in
the sendmsg context of the old sk that can't avoid the new sk's closing.
If the sk's last one refcnt is held by this asoc, later on after putting
this asoc, the new sk will be freed, while under it's own lock.
This patch is to revert that commit, but fix the old issue by returning
error under the old sk's lock.
Fixes: cea0cc80a677 ("sctp: use the right sk after waking up from wait_buf sleep")
Reported-by: syzbot+ac6ea7baa4432811eb50(a)syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin(a)gmail.com>
Acked-by: Neil Horman <nhorman(a)tuxdriver.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/sctp/socket.c | 16 ++++++----------
1 file changed, 6 insertions(+), 10 deletions(-)
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -83,7 +83,7 @@
static int sctp_writeable(struct sock *sk);
static void sctp_wfree(struct sk_buff *skb);
static int sctp_wait_for_sndbuf(struct sctp_association *asoc, long *timeo_p,
- size_t msg_len, struct sock **orig_sk);
+ size_t msg_len);
static int sctp_wait_for_packet(struct sock *sk, int *err, long *timeo_p);
static int sctp_wait_for_connect(struct sctp_association *, long *timeo_p);
static int sctp_wait_for_accept(struct sock *sk, long timeo);
@@ -1948,7 +1948,7 @@ static int sctp_sendmsg(struct kiocb *io
timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT);
if (!sctp_wspace(asoc)) {
/* sk can be changed by peel off when waiting for buf. */
- err = sctp_wait_for_sndbuf(asoc, &timeo, msg_len, &sk);
+ err = sctp_wait_for_sndbuf(asoc, &timeo, msg_len);
if (err) {
if (err == -ESRCH) {
/* asoc is already dead. */
@@ -6981,12 +6981,12 @@ void sctp_sock_rfree(struct sk_buff *skb
/* Helper function to wait for space in the sndbuf. */
static int sctp_wait_for_sndbuf(struct sctp_association *asoc, long *timeo_p,
- size_t msg_len, struct sock **orig_sk)
+ size_t msg_len)
{
struct sock *sk = asoc->base.sk;
- int err = 0;
long current_timeo = *timeo_p;
DEFINE_WAIT(wait);
+ int err = 0;
pr_debug("%s: asoc:%p, timeo:%ld, msg_len:%zu\n", __func__, asoc,
*timeo_p, msg_len);
@@ -7015,17 +7015,13 @@ static int sctp_wait_for_sndbuf(struct s
release_sock(sk);
current_timeo = schedule_timeout(current_timeo);
lock_sock(sk);
- if (sk != asoc->base.sk) {
- release_sock(sk);
- sk = asoc->base.sk;
- lock_sock(sk);
- }
+ if (sk != asoc->base.sk)
+ goto do_error;
*timeo_p = current_timeo;
}
out:
- *orig_sk = sk;
finish_wait(&asoc->wait, &wait);
/* Release the association's refcnt. */
Patches currently in stable-queue which might be from lucien.xin(a)gmail.com are
queue-3.18/sctp-do-not-allow-the-v4-socket-to-bind-a-v4mapped-v6-address.patch
queue-3.18/pppoe-take-needed_headroom-of-lower-device-into-account-on-xmit.patch
queue-3.18/sctp-return-error-if-the-asoc-has-been-peeled-off-in-sctp_wait_for_sndbuf.patch
This is a note to let you know that I've just added the patch titled
sctp: do not allow the v4 socket to bind a v4mapped v6 address
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
sctp-do-not-allow-the-v4-socket-to-bind-a-v4mapped-v6-address.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Mon Jan 29 11:15:06 CET 2018
From: Xin Long <lucien.xin(a)gmail.com>
Date: Mon, 15 Jan 2018 17:02:00 +0800
Subject: sctp: do not allow the v4 socket to bind a v4mapped v6 address
From: Xin Long <lucien.xin(a)gmail.com>
[ Upstream commit c5006b8aa74599ce19104b31d322d2ea9ff887cc ]
The check in sctp_sockaddr_af is not robust enough to forbid binding a
v4mapped v6 addr on a v4 socket.
The worse thing is that v4 socket's bind_verify would not convert this
v4mapped v6 addr to a v4 addr. syzbot even reported a crash as the v4
socket bound a v6 addr.
This patch is to fix it by doing the common sa.sa_family check first,
then AF_INET check for v4mapped v6 addrs.
Fixes: 7dab83de50c7 ("sctp: Support ipv6only AF_INET6 sockets.")
Reported-by: syzbot+7b7b518b1228d2743963(a)syzkaller.appspotmail.com
Acked-by: Neil Horman <nhorman(a)tuxdriver.com>
Signed-off-by: Xin Long <lucien.xin(a)gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner(a)gmail.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/sctp/socket.c | 14 ++++++--------
1 file changed, 6 insertions(+), 8 deletions(-)
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -333,16 +333,14 @@ static struct sctp_af *sctp_sockaddr_af(
if (len < sizeof (struct sockaddr))
return NULL;
+ if (!opt->pf->af_supported(addr->sa.sa_family, opt))
+ return NULL;
+
/* V4 mapped address are really of AF_INET family */
if (addr->sa.sa_family == AF_INET6 &&
- ipv6_addr_v4mapped(&addr->v6.sin6_addr)) {
- if (!opt->pf->af_supported(AF_INET, opt))
- return NULL;
- } else {
- /* Does this PF support this AF? */
- if (!opt->pf->af_supported(addr->sa.sa_family, opt))
- return NULL;
- }
+ ipv6_addr_v4mapped(&addr->v6.sin6_addr) &&
+ !opt->pf->af_supported(AF_INET, opt))
+ return NULL;
/* If we get this far, af is valid. */
af = sctp_get_af_specific(addr->sa.sa_family);
Patches currently in stable-queue which might be from lucien.xin(a)gmail.com are
queue-3.18/sctp-do-not-allow-the-v4-socket-to-bind-a-v4mapped-v6-address.patch
queue-3.18/pppoe-take-needed_headroom-of-lower-device-into-account-on-xmit.patch
queue-3.18/sctp-return-error-if-the-asoc-has-been-peeled-off-in-sctp_wait_for_sndbuf.patch
This is a note to let you know that I've just added the patch titled
pppoe: take ->needed_headroom of lower device into account on xmit
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
pppoe-take-needed_headroom-of-lower-device-into-account-on-xmit.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Mon Jan 29 11:15:06 CET 2018
From: Guillaume Nault <g.nault(a)alphalink.fr>
Date: Mon, 22 Jan 2018 18:06:37 +0100
Subject: pppoe: take ->needed_headroom of lower device into account on xmit
From: Guillaume Nault <g.nault(a)alphalink.fr>
[ Upstream commit 02612bb05e51df8489db5e94d0cf8d1c81f87b0c ]
In pppoe_sendmsg(), reserving dev->hard_header_len bytes of headroom
was probably fine before the introduction of ->needed_headroom in
commit f5184d267c1a ("net: Allow netdevices to specify needed head/tailroom").
But now, virtual devices typically advertise the size of their overhead
in dev->needed_headroom, so we must also take it into account in
skb_reserve().
Allocation size of skb is also updated to take dev->needed_tailroom
into account and replace the arbitrary 32 bytes with the real size of
a PPPoE header.
This issue was discovered by syzbot, who connected a pppoe socket to a
gre device which had dev->header_ops->create == ipgre_header and
dev->hard_header_len == 0. Therefore, PPPoE didn't reserve any
headroom, and dev_hard_header() crashed when ipgre_header() tried to
prepend its header to skb->data.
skbuff: skb_under_panic: text:000000001d390b3a len:31 put:24
head:00000000d8ed776f data:000000008150e823 tail:0x7 end:0xc0 dev:gre0
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:104!
invalid opcode: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 3670 Comm: syzkaller801466 Not tainted
4.15.0-rc7-next-20180115+ #97
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:skb_panic+0x162/0x1f0 net/core/skbuff.c:100
RSP: 0018:ffff8801d9bd7840 EFLAGS: 00010282
RAX: 0000000000000083 RBX: ffff8801d4f083c0 RCX: 0000000000000000
RDX: 0000000000000083 RSI: 1ffff1003b37ae92 RDI: ffffed003b37aefc
RBP: ffff8801d9bd78a8 R08: 1ffff1003b37ae8a R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff86200de0
R13: ffffffff84a981ad R14: 0000000000000018 R15: ffff8801d2d34180
FS: 00000000019c4880(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000208bc000 CR3: 00000001d9111001 CR4: 00000000001606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
skb_under_panic net/core/skbuff.c:114 [inline]
skb_push+0xce/0xf0 net/core/skbuff.c:1714
ipgre_header+0x6d/0x4e0 net/ipv4/ip_gre.c:879
dev_hard_header include/linux/netdevice.h:2723 [inline]
pppoe_sendmsg+0x58e/0x8b0 drivers/net/ppp/pppoe.c:890
sock_sendmsg_nosec net/socket.c:630 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:640
sock_write_iter+0x31a/0x5d0 net/socket.c:909
call_write_iter include/linux/fs.h:1775 [inline]
do_iter_readv_writev+0x525/0x7f0 fs/read_write.c:653
do_iter_write+0x154/0x540 fs/read_write.c:932
vfs_writev+0x18a/0x340 fs/read_write.c:977
do_writev+0xfc/0x2a0 fs/read_write.c:1012
SYSC_writev fs/read_write.c:1085 [inline]
SyS_writev+0x27/0x30 fs/read_write.c:1082
entry_SYSCALL_64_fastpath+0x29/0xa0
Admittedly PPPoE shouldn't be allowed to run on non Ethernet-like
interfaces, but reserving space for ->needed_headroom is a more
fundamental issue that needs to be addressed first.
Same problem exists for __pppoe_xmit(), which also needs to take
dev->needed_headroom into account in skb_cow_head().
Fixes: f5184d267c1a ("net: Allow netdevices to specify needed head/tailroom")
Reported-by: syzbot+ed0838d0fa4c4f2b528e20286e6dc63effc7c14d(a)syzkaller.appspotmail.com
Signed-off-by: Guillaume Nault <g.nault(a)alphalink.fr>
Reviewed-by: Xin Long <lucien.xin(a)gmail.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/net/ppp/pppoe.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
--- a/drivers/net/ppp/pppoe.c
+++ b/drivers/net/ppp/pppoe.c
@@ -830,6 +830,7 @@ static int pppoe_sendmsg(struct kiocb *i
struct pppoe_hdr *ph;
struct net_device *dev;
char *start;
+ int hlen;
lock_sock(sk);
if (sock_flag(sk, SOCK_DEAD) || !(sk->sk_state & PPPOX_CONNECTED)) {
@@ -848,16 +849,16 @@ static int pppoe_sendmsg(struct kiocb *i
if (total_len > (dev->mtu + dev->hard_header_len))
goto end;
-
- skb = sock_wmalloc(sk, total_len + dev->hard_header_len + 32,
- 0, GFP_KERNEL);
+ hlen = LL_RESERVED_SPACE(dev);
+ skb = sock_wmalloc(sk, hlen + sizeof(*ph) + total_len +
+ dev->needed_tailroom, 0, GFP_KERNEL);
if (!skb) {
error = -ENOMEM;
goto end;
}
/* Reserve space for headers. */
- skb_reserve(skb, dev->hard_header_len);
+ skb_reserve(skb, hlen);
skb_reset_network_header(skb);
skb->dev = dev;
@@ -918,7 +919,7 @@ static int __pppoe_xmit(struct sock *sk,
/* Copy the data if there is no space for the header or if it's
* read-only.
*/
- if (skb_cow_head(skb, sizeof(*ph) + dev->hard_header_len))
+ if (skb_cow_head(skb, LL_RESERVED_SPACE(dev) + sizeof(*ph)))
goto abort;
__skb_push(skb, sizeof(*ph));
Patches currently in stable-queue which might be from g.nault(a)alphalink.fr are
queue-3.18/pppoe-take-needed_headroom-of-lower-device-into-account-on-xmit.patch
This is a note to let you know that I've just added the patch titled
net: tcp: close sock if net namespace is exiting
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
net-tcp-close-sock-if-net-namespace-is-exiting.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Mon Jan 29 10:14:57 CET 2018
From: Dan Streetman <ddstreet(a)ieee.org>
Date: Thu, 18 Jan 2018 16:14:26 -0500
Subject: net: tcp: close sock if net namespace is exiting
From: Dan Streetman <ddstreet(a)ieee.org>
[ Upstream commit 4ee806d51176ba7b8ff1efd81f271d7252e03a1d ]
When a tcp socket is closed, if it detects that its net namespace is
exiting, close immediately and do not wait for FIN sequence.
For normal sockets, a reference is taken to their net namespace, so it will
never exit while the socket is open. However, kernel sockets do not take a
reference to their net namespace, so it may begin exiting while the kernel
socket is still open. In this case if the kernel socket is a tcp socket,
it will stay open trying to complete its close sequence. The sock's dst(s)
hold a reference to their interface, which are all transferred to the
namespace's loopback interface when the real interfaces are taken down.
When the namespace tries to take down its loopback interface, it hangs
waiting for all references to the loopback interface to release, which
results in messages like:
unregister_netdevice: waiting for lo to become free. Usage count = 1
These messages continue until the socket finally times out and closes.
Since the net namespace cleanup holds the net_mutex while calling its
registered pernet callbacks, any new net namespace initialization is
blocked until the current net namespace finishes exiting.
After this change, the tcp socket notices the exiting net namespace, and
closes immediately, releasing its dst(s) and their reference to the
loopback interface, which lets the net namespace continue exiting.
Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=97811
Signed-off-by: Dan Streetman <ddstreet(a)canonical.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
include/net/net_namespace.h | 10 ++++++++++
net/ipv4/tcp.c | 3 +++
net/ipv4/tcp_timer.c | 15 +++++++++++++++
3 files changed, 28 insertions(+)
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -200,6 +200,11 @@ int net_eq(const struct net *net1, const
return net1 == net2;
}
+static inline int check_net(const struct net *net)
+{
+ return atomic_read(&net->count) != 0;
+}
+
void net_drop_ns(void *);
#else
@@ -223,6 +228,11 @@ int net_eq(const struct net *net1, const
{
return 1;
}
+
+static inline int check_net(const struct net *net)
+{
+ return 1;
+}
#define net_drop_ns NULL
#endif
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2182,6 +2182,9 @@ adjudge_to_death:
tcp_send_active_reset(sk, GFP_ATOMIC);
NET_INC_STATS_BH(sock_net(sk),
LINUX_MIB_TCPABORTONMEMORY);
+ } else if (!check_net(sock_net(sk))) {
+ /* Not possible to send reset; just close */
+ tcp_set_state(sk, TCP_CLOSE);
}
}
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -46,11 +46,19 @@ static void tcp_write_err(struct sock *s
* to prevent DoS attacks. It is called when a retransmission timeout
* or zero probe timeout occurs on orphaned socket.
*
+ * Also close if our net namespace is exiting; in that case there is no
+ * hope of ever communicating again since all netns interfaces are already
+ * down (or about to be down), and we need to release our dst references,
+ * which have been moved to the netns loopback interface, so the namespace
+ * can finish exiting. This condition is only possible if we are a kernel
+ * socket, as those do not hold references to the namespace.
+ *
* Criteria is still not confirmed experimentally and may change.
* We kill the socket, if:
* 1. If number of orphaned sockets exceeds an administratively configured
* limit.
* 2. If we have strong memory pressure.
+ * 3. If our net namespace is exiting.
*/
static int tcp_out_of_resources(struct sock *sk, bool do_reset)
{
@@ -79,6 +87,13 @@ static int tcp_out_of_resources(struct s
NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPABORTONMEMORY);
return 1;
}
+
+ if (!check_net(sock_net(sk))) {
+ /* Not possible to send reset; just close */
+ tcp_done(sk);
+ return 1;
+ }
+
return 0;
}
Patches currently in stable-queue which might be from ddstreet(a)ieee.org are
queue-3.18/net-tcp-close-sock-if-net-namespace-is-exiting.patch
This is a note to let you know that I've just added the patch titled
net: qdisc_pkt_len_init() should be more robust
to the 3.18-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
net-qdisc_pkt_len_init-should-be-more-robust.patch
and it can be found in the queue-3.18 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Mon Jan 29 11:15:06 CET 2018
From: Eric Dumazet <edumazet(a)google.com>
Date: Thu, 18 Jan 2018 19:59:19 -0800
Subject: net: qdisc_pkt_len_init() should be more robust
From: Eric Dumazet <edumazet(a)google.com>
[ Upstream commit 7c68d1a6b4db9012790af7ac0f0fdc0d2083422a ]
Without proper validation of DODGY packets, we might very well
feed qdisc_pkt_len_init() with invalid GSO packets.
tcp_hdrlen() might access out-of-bound data, so let's use
skb_header_pointer() and proper checks.
Whole story is described in commit d0c081b49137 ("flow_dissector:
properly cap thoff field")
We have the goal of validating DODGY packets earlier in the stack,
so we might very well revert this fix in the future.
Signed-off-by: Eric Dumazet <edumazet(a)google.com>
Cc: Willem de Bruijn <willemb(a)google.com>
Cc: Jason Wang <jasowang(a)redhat.com>
Reported-by: syzbot+9da69ebac7dddd804552(a)syzkaller.appspotmail.com
Acked-by: Jason Wang <jasowang(a)redhat.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/core/dev.c | 19 +++++++++++++++----
1 file changed, 15 insertions(+), 4 deletions(-)
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2772,10 +2772,21 @@ static void qdisc_pkt_len_init(struct sk
hdr_len = skb_transport_header(skb) - skb_mac_header(skb);
/* + transport layer */
- if (likely(shinfo->gso_type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6)))
- hdr_len += tcp_hdrlen(skb);
- else
- hdr_len += sizeof(struct udphdr);
+ if (likely(shinfo->gso_type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6))) {
+ const struct tcphdr *th;
+ struct tcphdr _tcphdr;
+
+ th = skb_header_pointer(skb, skb_transport_offset(skb),
+ sizeof(_tcphdr), &_tcphdr);
+ if (likely(th))
+ hdr_len += __tcp_hdrlen(th);
+ } else {
+ struct udphdr _udphdr;
+
+ if (skb_header_pointer(skb, skb_transport_offset(skb),
+ sizeof(_udphdr), &_udphdr))
+ hdr_len += sizeof(struct udphdr);
+ }
if (shinfo->gso_type & SKB_GSO_DODGY)
gso_segs = DIV_ROUND_UP(skb->len - hdr_len,
Patches currently in stable-queue which might be from edumazet(a)google.com are
queue-3.18/ipv6-fix-udpv6-sendmsg-crash-caused-by-too-small-mtu.patch
queue-3.18/dccp-don-t-restart-ccid2_hc_tx_rto_expire-if-sk-in-closed-state.patch
queue-3.18/netfilter-restart-search-if-moved-to-other-chain.patch
queue-3.18/net-qdisc_pkt_len_init-should-be-more-robust.patch