The following patches are targeted at the 4.19 stable tree.
Thanks!
Alexei Starovoitov (2):
bpf: improve verifier branch analysis
bpf: add per-insn complexity limit
Daniel Borkmann (10):
bpf: move {prev_,}insn_idx into verifier env
bpf: move tmp variable into ax register in interpreter
bpf: enable access to ax register also from verifier rewrite
bpf: restrict map value pointer arithmetic for unprivileged
bpf: restrict stack pointer arithmetic for unprivileged
bpf: restrict unknown scalars of mixed signed bounds for unprivileged
bpf: fix check_map_access smin_value test when pointer contains offset
bpf: prevent out of bounds speculation on pointer arithmetic
bpf: fix sanitation of alu op with pointer / scalar type from different paths
bpf: fix inner map masking to prevent oob under speculation
include/linux/bpf_verifier.h | 13 +
include/linux/filter.h | 10 +-
kernel/bpf/core.c | 54 ++--
kernel/bpf/map_in_map.c | 17 +-
kernel/bpf/verifier.c | 470 +++++++++++++++++++++++++++++------
5 files changed, 463 insertions(+), 101 deletions(-)
--
2.17.1
I run a qemu/kvm VM with Debian, and I've started getting segfaults and failing checksums on
downloaded files. The failures are nondeterministic and similar to the failures you get with
bad RAM. I tried to diagnose the problem with various testing tools and found that
"stress-ng --verify --cpu 1" always gives an error. Stress-ng gives one of these errors,
usually within 60 seconds:
stress-ng-cpu: Newton-Rapshon sqrt not accurate enough
stress-ng-cpu: prime error detected, number of primes between 0 and 1000000 miscalculated
Nothing relevant has changed recently in the VM, but the host kernel was upgraded from
4.14.93 to 4.14.96. I can't reproduce the stress-ng error with a 4.14.93 host kernel. There
is only one KVM-related change in that range, so I tried reverting it.
After reverting commit 4124a4cff344abbf8187775eb643d9827830e715
"x86,kvm: move qemu/guest FPU switching out to vcpu_run" on kernel 4.14.96, I can no longer
reproduce the stress-ng error, and I see no segfaults or other problems in the guest.
The commit was originally introduced in v4.15-rc3 (Nov 14 2017) and was only recently
backported to 4.14. None of the stable kernels older than 4.14 received this backport, so it
looks like the 4.14 backport is broken. That backport also causes problems for other people:
https://bugzilla.kernel.org/show_bug.cgi?id=202419
I've switched between the kernels and rebooted the VM enough times to be reasonably sure
that this commit is the problem. With the commit, stress-ng never lasts more than 10 minutes;
without it, stress-ng runs for hours.
To reproduce: create a qemu/kvm VM with Debian stretch, install stress-ng
version 0.07.16 and run "stress-ng --verify --cpu 1".
Here is the qemu-3.1.0 commandline generated by libvirt:
/usr/bin/qemu-system-x86_64 \
  -name guest=debian,debug-threads=on \
  -S \
  -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-debian/master-key.aes \
  -machine pc-i440fx-2.4,accel=kvm,usb=off,dump-guest-core=off \
  -cpu Haswell-noTSX \
  -m 2048 \
  -realtime mlock=off \
  -smp 4,sockets=4,cores=1,threads=1 \
  -uuid 0473ded4-d417-4b0e-a4f5-36ba5a2cd675 \
  -no-user-config \
  -nodefaults \
  -chardev socket,id=charmonitor,fd=21,server,nowait \
  -mon chardev=charmonitor,id=monitor,mode=control \
  -rtc base=utc,driftfix=slew \
  -global kvm-pit.lost_tick_policy=delay \
  -no-hpet \
  -no-shutdown \
  -global PIIX4_PM.disable_s3=1 \
  -global PIIX4_PM.disable_s4=1 \
  -boot strict=on \
  -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 \
  -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 \
  -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1 \
  -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2 \
  -drive if=none,id=drive-ide0-0-1,readonly=on \
  -device ide-cd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1,bootindex=2 \
  -drive file=/mnt/gemini.61rn.3T/Backups/debian.raw,format=raw,if=none,id=drive-virtio-disk0 \
  -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
  -netdev tap,fd=23,id=hostnet0 \
  -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:11:22:33:44:55,bus=pci.0,addr=0x3 \
  -spice port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on \
  -device VGA,id=video0,vgamem_mb=16,bus=pci.0,addr=0x2 \
  -device AC97,id=sound0,bus=pci.0,addr=0x7 \
  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 \
  -object rng-random,id=objrng0,filename=/dev/random \
  -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x8 \
  -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
  -msg timestamp=on
My host kernel .config is big so I put it in a paste: http://sprunge.us/u7YNBt
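Both stress-ng failures quoted above are self-checks of deterministic CPU/FPU arithmetic: the
prime count below 1,000,000 is fixed (78498), and a Newton-Raphson square root should square back
to its input. The C sketch below is NOT stress-ng's code; it is a minimal, hypothetical
stand-alone cross-check of the same two properties (the function names, the 64-iteration count
and the 1e-12 relative tolerance are assumptions). On correct hardware and a correct host kernel
both checks always pass, so intermittent failures inside the guest would point at corrupted
register/FPU state rather than at the program.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Count primes in [0, limit] with a sieve of Eratosthenes.
 * Returns -1 if the sieve buffer cannot be allocated. */
static long count_primes(unsigned long limit)
{
	unsigned char *composite = calloc(limit + 1, 1);
	long count = 0;

	if (!composite)
		return -1;
	for (unsigned long i = 2; i <= limit; i++) {
		if (composite[i])
			continue;
		count++;
		/* Only strike multiples if i * i cannot exceed limit (avoids overflow). */
		if (i <= limit / i)
			for (unsigned long j = i * i; j <= limit; j += i)
				composite[j] = 1;
	}
	free(composite);
	return count;
}

/* Approximate sqrt(v) for v >= 1 with a fixed number of Newton-Raphson steps. */
static double nr_sqrt(double v)
{
	double x = v;

	for (int i = 0; i < 64; i++)
		x = 0.5 * (x + v / x);
	return x;
}

/* True if nr_sqrt(v) squares back to v (relative error <= 1e-12) for all v in 1..max. */
static bool nr_sqrt_consistent(unsigned long max)
{
	for (unsigned long v = 1; v <= max; v++) {
		double r = nr_sqrt((double)v);
		double err = r * r - (double)v;

		if (err < 0)
			err = -err;
		if (err > 1e-12 * (double)v)
			return false;
	}
	return true;
}
```

Running both checks in a loop inside the guest, with and without the suspect host commit, gives
a second, independent signal next to stress-ng.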
The patch titled
Subject: mm: migrate: don't rely on PageMovable() of newpage after unlocking it
has been removed from the -mm tree. Its filename was
mm-migrate-dont-rely-on-pagemovable-of-newpage-after-unlocking-it.patch
This patch was dropped because it is obsolete
------------------------------------------------------
From: David Hildenbrand <david(a)redhat.com>
Subject: mm: migrate: don't rely on PageMovable() of newpage after unlocking it
While debugging some crashes related to virtio-balloon deflation that
happened under the old balloon migration code, I stumbled over a race that
still exists today.
What we experienced:
drivers/virtio/virtio_balloon.c:release_pages_balloon():
- WARNING: CPU: 13 PID: 6586 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0
- list_del corruption. prev->next should be ffffe253961090a0, but was dead000000000100
It turns out that after the page had been added to a local list on dequeue,
it was suddenly moved to an LRU list before we freed it via the local
list, corrupting both lists. In other words, a page that we owned and
that was !LRU had been moved to an LRU list.
In __unmap_and_move(), we lock the old page and newpage and perform the
migration. In the case of virtio-balloon, the new page becomes movable
and the old page is no longer movable.
However, after newpage is unlocked, nothing stops it from being dequeued
and freed by virtio-balloon. As a result, newpage
1. no longer satisfies PageMovable(), and
2. is moved to the local list (using page->lru) before it is finally freed.
Back in the migration thread in __unmap_and_move(), after unlocking
newpage we would suddenly no longer see PageMovable(newpage) and would
therefore call putback_lru_page(newpage), modifying page->lru although
that list is still in use by virtio-balloon.
To summarize, we have a race between migrating the newpage and checking
for PageMovable(newpage). Instead of checking PageMovable(newpage), we
can simply rely on is_lru of the original page.
This looks like it was introduced by d6d86c0a7f8d ("mm/balloon_compaction:
redesign ballooned pages management"), which was backported as far back
as 3.12. The old compaction code used PageBalloon() via
__is_movable_balloon_page() instead of PageMovable(), but with the same
semantics.
Link: http://lkml.kernel.org/r/20190128160403.16657-1-david@redhat.com
Fixes: d6d86c0a7f8d ("mm/balloon_compaction: redesign ballooned pages management")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reported-by: Vratislav Bendel <vbendel(a)redhat.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Rafael Aquini <aquini(a)redhat.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Dominik Brodowski <linux(a)dominikbrodowski.net>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Konstantin Khlebnikov <k.khlebnikov(a)samsung.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: <stable(a)vger.kernel.org> [3.12+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/migrate.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
--- a/mm/migrate.c~mm-migrate-dont-rely-on-pagemovable-of-newpage-after-unlocking-it
+++ a/mm/migrate.c
@@ -1135,10 +1135,12 @@ out:
* If migration is successful, decrease refcount of the newpage
* which will not free the page because new page owner increased
* refcounter. As well, if it is LRU page, add the page to LRU
- * list in here.
+ * list in here. Don't rely on PageMovable(newpage), as that could
+ * already have changed after unlocking newpage (e.g.
+ * virtio-balloon deflation).
*/
if (rc == MIGRATEPAGE_SUCCESS) {
- if (unlikely(__PageMovable(newpage)))
+ if (unlikely(!is_lru))
put_page(newpage);
else
putback_lru_page(newpage);
_
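The race described above comes down to *when* the put_page()/putback_lru_page() decision is
sampled. The C sketch below is a hypothetical, heavily simplified user-space model (struct
model_page, release_buggy(), release_fixed() and simulate_balloon_race() are invented for
illustration and are not kernel APIs): a single boolean stands in for __PageMovable(), and the
balloon driver's dequeue is modeled as clearing that flag right after newpage is unlocked.
Deciding from newpage at that point picks the LRU path for a balloon page; deciding from is_lru,
sampled from the old page before migration as __unmap_and_move() does, does not.

```c
#include <stdbool.h>

/* Hypothetical, simplified page: 'movable' stands in for __PageMovable(). */
struct model_page {
	bool movable;
};

enum release_action { PUT_PAGE, PUTBACK_LRU_PAGE };

/* Old logic: inspects newpage after it was unlocked, so a concurrent owner
 * (e.g. balloon deflation) may already have changed the answer. */
static enum release_action release_buggy(const struct model_page *newpage)
{
	return newpage->movable ? PUT_PAGE : PUTBACK_LRU_PAGE;
}

/* Patched logic: uses is_lru, sampled from the old page before migration. */
static enum release_action release_fixed(bool is_lru)
{
	return is_lru ? PUTBACK_LRU_PAGE : PUT_PAGE;
}

/* Migrate a balloon (non-LRU, movable) page whose new copy is dequeued by
 * the balloon driver right after newpage is unlocked. */
static void simulate_balloon_race(enum release_action *buggy,
				  enum release_action *fixed)
{
	struct model_page oldpage = { .movable = true };
	struct model_page newpage = { .movable = false };
	bool is_lru = !oldpage.movable;	/* sampled before migration */

	/* Migration: movability moves from oldpage to newpage. */
	oldpage.movable = false;
	newpage.movable = true;

	/* newpage unlocked; balloon deflation dequeues it, so it is no longer movable. */
	newpage.movable = false;

	*buggy = release_buggy(&newpage);	/* wrongly chooses putback_lru_page() */
	*fixed = release_fixed(is_lru);		/* still chooses put_page() */
}
```

Ordinary LRU pages are unaffected by the change: for them is_lru is true and both variants end
in putback_lru_page().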
Patches currently in -mm which might be from david(a)redhat.com are
mm-balloon-update-comment-about-isolation-migration-compaction.patch
mm-convert-pg_balloon-to-pg_offline.patch
kexec-export-pg_offline-to-vmcoreinfo.patch
xen-balloon-mark-inflated-pages-pg_offline.patch
hv_balloon-mark-inflated-pages-pg_offline.patch
vmw_balloon-mark-inflated-pages-pg_offline.patch
vmw_balloon-mark-inflated-pages-pg_offline-v2.patch
pm-hibernate-use-pfn_to_online_page.patch
pm-hibernate-exclude-all-pageoffline-pages.patch
pm-hibernate-exclude-all-pageoffline-pages-v2.patch
Hi Greg,
Can you please revert this commit in 4.14?
commit e65cd9a20343ea90f576c24c38ee85ab6e7d5fec
Author: Tycho Andersen <tycho(a)tycho.ws>
Date: Tue Feb 20 19:47:47 2018 -0700
seccomp: add a selftest for get_metadata
[ Upstream commit d057dc4e35e16050befa3dda943876dab39cbf80 ]
Let's test that we get the flags correctly, and that we preserve the filter
index across the ptrace(PTRACE_SECCOMP_GET_METADATA) correctly.
PTRACE_SECCOMP_GET_METADATA was only added in 4.16
(26500475ac1b499d8636ff281311d633909f5d20)
And it's also breaking seccomp_bpf.c compilation for me:
seccomp_bpf.c: In function ‘get_metadata’:
seccomp_bpf.c:2878:26: error: storage size of ‘md’ isn’t known
struct seccomp_metadata md;
^~
-Tommi
Hi Greg,
Can you please pick these two upstream patches to 4.14?
They fix broken perf unwinding for me.
commit 3d20c6246690219881786de10d2dda93f616d0ac
Author: Martin Vuille <jpmv27(a)aim.com>
Date: Sun Feb 11 16:24:20 2018 -0500
perf unwind: Unwind with libdw doesn't take symfs into account
commit 1fe627da30331024f453faef04d500079b901107
Author: Milian Wolff <milian.wolff(a)kdab.com>
Date: Mon Oct 29 15:16:44 2018 +0100
perf unwind: Take pgoff into account when reporting elf to libdwfl
-Tommi
From: "Gustavo A. R. Silva" <gustavo(a)embeddedor.com>
[ Upstream commit a37805098900a6e73a55b3a43b7d3bcd987bb3f4 ]
idx can be indirectly controlled by user-space, hence leading to a
potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
drivers/gpu/drm/drm_bufs.c:1420 drm_legacy_freebufs() warn: potential
spectre issue 'dma->buflist' [r] (local cap)
Fix this by sanitizing idx before using it to index dma->buflist.
Note that, because speculation windows are large, the policy is to
kill the speculation on the first load and not worry about whether it
can be completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Signed-off-by: Gustavo A. R. Silva <gustavo(a)embeddedor.com>
Signed-off-by: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20181016095549.GA23586@embedd…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/gpu/drm/drm_bufs.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/drm_bufs.c b/drivers/gpu/drm/drm_bufs.c
index 7412acaf3cde..d7d10cabb9bb 100644
--- a/drivers/gpu/drm/drm_bufs.c
+++ b/drivers/gpu/drm/drm_bufs.c
@@ -36,6 +36,8 @@
#include <drm/drmP.h>
#include "drm_legacy.h"
+#include <linux/nospec.h>
+
static struct drm_map_list *drm_find_matching_map(struct drm_device *dev,
struct drm_local_map *map)
{
@@ -1417,6 +1419,7 @@ int drm_legacy_freebufs(struct drm_device *dev, void *data,
idx, dma->buf_count - 1);
return -EINVAL;
}
+ idx = array_index_nospec(idx, dma->buf_count);
buf = dma->buflist[idx];
if (buf->file_priv != file_priv) {
DRM_ERROR("Process %d freeing buffer not owned\n",
--
2.19.1
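array_index_nospec(idx, size) clamps idx to 0 when it is out of range, using arithmetic instead
of a branch, so a mispredicted bounds check cannot steer a speculative load. The sketch below is
a user-space model of only the arithmetic of the generic array_index_mask_nospec() in
include/linux/nospec.h; the sketch_* names are hypothetical. It assumes index and size are at
most LONG_MAX and that right-shifting a negative long is arithmetic (true for GCC and Clang). It
omits the kernel's OPTIMIZER_HIDE_VAR() barrier, and some architectures (x86 among them) use
their own implementation, so it is not a mitigation by itself.

```c
#include <limits.h>

#define SKETCH_BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/* ~0UL if index < size, 0 otherwise, computed without a conditional branch:
 * if index < size, neither index nor (size - 1 - index) has the top bit set,
 * so the inverted value is negative and the arithmetic shift yields all ones;
 * if index >= size, (size - 1 - index) wraps around and sets the top bit. */
static unsigned long sketch_index_mask(unsigned long index, unsigned long size)
{
	return ~(long)(index | (size - 1UL - index)) >> (SKETCH_BITS_PER_LONG - 1);
}

/* In-range indices pass through unchanged; out-of-range ones become 0. */
static unsigned long sketch_index_nospec(unsigned long index, unsigned long size)
{
	return index & sketch_index_mask(index, size);
}
```

In the patch above the bounds check and -EINVAL return stay in place; the masked idx only
matters on the speculative path where that check was mispredicted.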
The patch titled
Subject: Revert "mm, memory_hotplug: initialize struct pages for the full memory section"
has been added to the -mm tree. Its filename is
revert-mm-memory_hotplug-initialize-struct-pages-for-the-full-memory-section.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/revert-mm-memory_hotplug-initializ…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/revert-mm-memory_hotplug-initializ…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Michal Hocko <mhocko(a)suse.com>
Subject: Revert "mm, memory_hotplug: initialize struct pages for the full memory section"
This reverts 2830bf6f05fb3e05b ("mm, memory_hotplug: initialize struct
pages for the full memory section").
The underlying assumption that one sparse section belongs to a single
NUMA node doesn't really hold. Robert Shteynfeld has reported a boot
failure. The boot log was not captured, but his memory layout is as
follows:
[ 0.286954] Early memory node ranges
[ 0.286955] node 1: [mem 0x0000000000001000-0x0000000000090fff]
[ 0.286955] node 1: [mem 0x0000000000100000-0x00000000dbdf8fff]
[ 0.286956] node 1: [mem 0x0000000100000000-0x0000001423ffffff]
[ 0.286956] node 0: [mem 0x0000001424000000-0x0000002023ffffff]
This means that node0 starts in the middle of a memory section which is
also in node1. memmap_init_zone tries to initialize the padding of a
section even when it is outside of the given pfn range, because there are
code paths (e.g. memory hotplug) which assume that the full memory section
is always initialized. In this particular case, though, such a range is
already initialized and most likely already managed by the page
allocator. Scribbling over those pages corrupts the internal state and
likely blows up when any of those pages gets used.
Link: http://lkml.kernel.org/r/20190125181549.GE20411@dhcp22.suse.cz
Fixes: 2830bf6f05fb ("mm, memory_hotplug: initialize struct pages for the full memory section")
Signed-off-by: Michal Hocko <mhocko(a)suse.com>
Reported-by: Robert Shteynfeld <robert.shteynfeld(a)gmail.com>
Cc: Mikhail Zaslonko <zaslonko(a)linux.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer(a)de.ibm.com>
Cc: Mikhail Gavrilov <mikhail.v.gavrilov(a)gmail.com>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Alexander Duyck <alexander.h.duyck(a)linux.intel.com>
Cc: Pasha Tatashin <Pavel.Tatashin(a)microsoft.com>
Cc: Martin Schwidefsky <schwidefsky(a)de.ibm.com>
Cc: Heiko Carstens <heiko.carstens(a)de.ibm.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 12 ------------
1 file changed, 12 deletions(-)
--- a/mm/page_alloc.c~revert-mm-memory_hotplug-initialize-struct-pages-for-the-full-memory-section
+++ a/mm/page_alloc.c
@@ -5701,18 +5701,6 @@ void __meminit memmap_init_zone(unsigned
cond_resched();
}
}
-#ifdef CONFIG_SPARSEMEM
- /*
- * If the zone does not span the rest of the section then
- * we should at least initialize those pages. Otherwise we
- * could blow up on a poisoned page in some paths which depend
- * on full sections being initialized (e.g. memory hotplug).
- */
- while (end_pfn % PAGES_PER_SECTION) {
- __init_single_page(pfn_to_page(end_pfn), end_pfn, zone, nid);
- end_pfn++;
- }
-#endif
}
#ifdef CONFIG_ZONE_DEVICE
_
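A worked example of the overlap, assuming x86-64 with 4 KiB pages and 128 MiB sparse sections
(PAGE_SHIFT 12, SECTION_SIZE_BITS 27, so 32768 pages per section): node 1's last range ends at
0x1423ffffff, i.e. exclusive end pfn 0x1424000, which is 0x4000 pfns (64 MiB) into its section.
The removed padding loop would therefore initialize pfns 0x1424000-0x1427fff, and node 0 starts
exactly at pfn 0x1424000. The sketch below (hypothetical helper names; the SKETCH_* constants
encode the assumed configuration) replays the removed loop on pfn numbers only.

```c
#include <stdbool.h>

/* Assumed configuration: x86-64, 4 KiB pages, 128 MiB sections. */
#define SKETCH_PAGE_SHIFT        12
#define SKETCH_SECTION_SIZE_BITS 27
#define SKETCH_PAGES_PER_SECTION (1UL << (SKETCH_SECTION_SIZE_BITS - SKETCH_PAGE_SHIFT))

/* Number of pfns the removed CONFIG_SPARSEMEM loop would initialize past
 * end_pfn (exclusive) to pad the zone out to the next section boundary. */
static unsigned long section_padding(unsigned long end_pfn)
{
	unsigned long padded = 0;

	while (end_pfn % SKETCH_PAGES_PER_SECTION) {
		end_pfn++;
		padded++;
	}
	return padded;
}

/* True if that padding would reach next_start_pfn, i.e. pages that belong
 * to the next node's range. */
static bool padding_overlaps(unsigned long end_pfn, unsigned long next_start_pfn)
{
	return next_start_pfn >= end_pfn &&
	       next_start_pfn < end_pfn + section_padding(end_pfn);
}
```

With the reported layout, section_padding(0x1424000) is 0x4000 and padding_overlaps(0x1424000,
0x1424000) is true; a section-aligned end pfn such as 0x1428000 needs no padding at all.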
Patches currently in -mm which might be from mhocko(a)suse.com are
mm-memory_hotplug-is_mem_section_removable-do-not-pass-the-end-of-a-zone.patch
revert-mm-memory_hotplug-initialize-struct-pages-for-the-full-memory-section.patch
mm-oom-marks-all-killed-tasks-as-oom-victims.patch
memcg-do-not-report-racy-no-eligible-oom-tasks.patch
The patch titled
Subject: mm: migrate: make buffer_migrate_page_norefs() actually succeed
has been added to the -mm tree. Its filename is
mm-migrate-make-buffer_migrate_page_norefs-actually-succeed.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-migrate-make-buffer_migrate_pag…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-migrate-make-buffer_migrate_pag…
------------------------------------------------------
From: Jan Kara <jack(a)suse.cz>
Subject: mm: migrate: make buffer_migrate_page_norefs() actually succeed
Currently, buffer_migrate_page_norefs() always fails because
buffer_migrate_lock_buffers() grabs a reference on each buffer. In fact,
there's no reason for buffer_migrate_lock_buffers() to grab any buffer
references, as the page is locked during the whole operation and thus
nobody can reclaim buffers from the page. So remove the grabbing of buffer
references, which also makes buffer_migrate_page_norefs() succeed.
Link: http://lkml.kernel.org/r/20190116131217.7226-1-jack@suse.cz
Fixes: 89cb0888ca14 ("mm: migrate: provide buffer_migrate_page_norefs()")
Signed-off-by: Jan Kara <jack(a)suse.cz>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work(a)gmail.com>
Cc: Pavel Machek <pavel(a)ucw.cz>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Zi Yan <zi.yan(a)cs.rutgers.edu>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/migrate.c | 5 -----
1 file changed, 5 deletions(-)
--- a/mm/migrate.c~mm-migrate-make-buffer_migrate_page_norefs-actually-succeed
+++ a/mm/migrate.c
@@ -709,7 +709,6 @@ static bool buffer_migrate_lock_buffers(
/* Simple case, sync compaction */
if (mode != MIGRATE_ASYNC) {
do {
- get_bh(bh);
lock_buffer(bh);
bh = bh->b_this_page;
@@ -720,18 +719,15 @@ static bool buffer_migrate_lock_buffers(
/* async case, we cannot block on lock_buffer so use trylock_buffer */
do {
- get_bh(bh);
if (!trylock_buffer(bh)) {
/*
* We failed to lock the buffer and cannot stall in
* async migration. Release the taken locks
*/
struct buffer_head *failed_bh = bh;
- put_bh(failed_bh);
bh = head;
while (bh != failed_bh) {
unlock_buffer(bh);
- put_bh(bh);
bh = bh->b_this_page;
}
return false;
@@ -818,7 +814,6 @@ unlock_buffers:
bh = head;
do {
unlock_buffer(bh);
- put_bh(bh);
bh = bh->b_this_page;
} while (bh != head);
_
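Why the old code could never succeed: assuming, as the commit message implies, that the "norefs"
path only proceeds when no buffer head on the page holds a reference (b_count == 0), a get_bh()
per buffer inside the locking helper guarantees that check fails. The C sketch below is a
hypothetical user-space model (struct model_bh, model_lock_buffers() and model_buffers_busy()
are invented names; real locking is omitted) that walks the circular b_this_page ring the same
way the patched code does.

```c
#include <stdbool.h>

/* Hypothetical, simplified buffer_head: a ring of buffers attached to one page. */
struct model_bh {
	int b_count;                  /* reference count, like buffer_head.b_count */
	struct model_bh *b_this_page; /* next buffer on the same page (circular) */
};

/* Stand-in for buffer_migrate_lock_buffers(): take_refs selects the old
 * behavior (get_bh() on every buffer) or the patched one (no references). */
static void model_lock_buffers(struct model_bh *head, bool take_refs)
{
	struct model_bh *bh = head;

	do {
		if (take_refs)
			bh->b_count++;	/* old code: get_bh(bh) */
		bh = bh->b_this_page;
	} while (bh != head);
}

/* Stand-in for the norefs check: any referenced buffer makes the page busy. */
static bool model_buffers_busy(struct model_bh *head)
{
	struct model_bh *bh = head;

	do {
		if (bh->b_count)
			return true;
		bh = bh->b_this_page;
	} while (bh != head);
	return false;
}
```

With take_refs set, model_buffers_busy() reports the page as busy for every page, mirroring the
constant failure; without it, an otherwise idle page passes the check.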
Patches currently in -mm which might be from jack(a)suse.cz are
mm-migrate-make-buffer_migrate_page_norefs-actually-succeed.patch