The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 951531691c4bcaa59f56a316e018bc2ff1ddf855 Mon Sep 17 00:00:00 2001
From: "Isaac J. Manjarres" <isaacm(a)codeaurora.org>
Date: Tue, 13 Aug 2019 15:37:37 -0700
Subject: [PATCH] mm/usercopy: use memory range to be accessed for wraparound
check
Currently, when checking to see if accessing n bytes starting at address
"ptr" will cause a wraparound in the memory addresses, the check in
check_bogus_address() adds an extra byte, which is incorrect, as the
range of addresses that will be accessed is [ptr, ptr + (n - 1)].
This can lead to incorrectly detecting a wraparound in the memory
address, when trying to read 4 KB from memory that is mapped to the the
last possible page in the virtual address space, when in fact, accessing
that range of memory would not cause a wraparound to occur.
Use the memory range that will actually be accessed when considering if
accessing a certain amount of bytes will cause the memory address to
wrap around.
Link: http://lkml.kernel.org/r/1564509253-23287-1-git-send-email-isaacm@codeauror…
Fixes: f5509cc18daa ("mm: Hardened usercopy")
Signed-off-by: Prasad Sodagudi <psodagud(a)codeaurora.org>
Signed-off-by: Isaac J. Manjarres <isaacm(a)codeaurora.org>
Co-developed-by: Prasad Sodagudi <psodagud(a)codeaurora.org>
Reviewed-by: William Kucharski <william.kucharski(a)oracle.com>
Acked-by: Kees Cook <keescook(a)chromium.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Trilok Soni <tsoni(a)codeaurora.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/usercopy.c b/mm/usercopy.c
index 2a09796edef8..98e924864554 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -147,7 +147,7 @@ static inline void check_bogus_address(const unsigned long ptr, unsigned long n,
bool to_user)
{
/* Reject if object wraps past end of memory. */
- if (ptr + n < ptr)
+ if (ptr + (n - 1) < ptr)
usercopy_abort("wrapped address", NULL, to_user, 0, ptr + n);
/* Reject if NULL or ZERO-allocation. */
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 951531691c4bcaa59f56a316e018bc2ff1ddf855 Mon Sep 17 00:00:00 2001
From: "Isaac J. Manjarres" <isaacm(a)codeaurora.org>
Date: Tue, 13 Aug 2019 15:37:37 -0700
Subject: [PATCH] mm/usercopy: use memory range to be accessed for wraparound
check
Currently, when checking to see if accessing n bytes starting at address
"ptr" will cause a wraparound in the memory addresses, the check in
check_bogus_address() adds an extra byte, which is incorrect, as the
range of addresses that will be accessed is [ptr, ptr + (n - 1)].
This can lead to incorrectly detecting a wraparound in the memory
address, when trying to read 4 KB from memory that is mapped to the the
last possible page in the virtual address space, when in fact, accessing
that range of memory would not cause a wraparound to occur.
Use the memory range that will actually be accessed when considering if
accessing a certain amount of bytes will cause the memory address to
wrap around.
Link: http://lkml.kernel.org/r/1564509253-23287-1-git-send-email-isaacm@codeauror…
Fixes: f5509cc18daa ("mm: Hardened usercopy")
Signed-off-by: Prasad Sodagudi <psodagud(a)codeaurora.org>
Signed-off-by: Isaac J. Manjarres <isaacm(a)codeaurora.org>
Co-developed-by: Prasad Sodagudi <psodagud(a)codeaurora.org>
Reviewed-by: William Kucharski <william.kucharski(a)oracle.com>
Acked-by: Kees Cook <keescook(a)chromium.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Trilok Soni <tsoni(a)codeaurora.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/usercopy.c b/mm/usercopy.c
index 2a09796edef8..98e924864554 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -147,7 +147,7 @@ static inline void check_bogus_address(const unsigned long ptr, unsigned long n,
bool to_user)
{
/* Reject if object wraps past end of memory. */
- if (ptr + n < ptr)
+ if (ptr + (n - 1) < ptr)
usercopy_abort("wrapped address", NULL, to_user, 0, ptr + n);
/* Reject if NULL or ZERO-allocation. */
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 6051d3bd3b91e96c59e62b8be2dba1cc2b19ee40 Mon Sep 17 00:00:00 2001
From: Henry Burns <henryburns(a)google.com>
Date: Tue, 13 Aug 2019 15:37:21 -0700
Subject: [PATCH] mm/z3fold.c: fix z3fold_destroy_pool() ordering
The constraint from the zpool use of z3fold_destroy_pool() is there are
no outstanding handles to memory (so no active allocations), but it is
possible for there to be outstanding work on either of the two wqs in
the pool.
If there is work queued on pool->compact_workqueue when it is called,
z3fold_destroy_pool() will do:
z3fold_destroy_pool()
destroy_workqueue(pool->release_wq)
destroy_workqueue(pool->compact_wq)
drain_workqueue(pool->compact_wq)
do_compact_page(zhdr)
kref_put(&zhdr->refcount)
__release_z3fold_page(zhdr, ...)
queue_work_on(pool->release_wq, &pool->work) *BOOM*
So compact_wq needs to be destroyed before release_wq.
Link: http://lkml.kernel.org/r/20190726224810.79660-1-henryburns@google.com
Fixes: 5d03a6613957 ("mm/z3fold.c: use kref to prevent page free/compact race")
Signed-off-by: Henry Burns <henryburns(a)google.com>
Reviewed-by: Shakeel Butt <shakeelb(a)google.com>
Reviewed-by: Jonathan Adams <jwadams(a)google.com>
Cc: Vitaly Vul <vitaly.vul(a)sony.com>
Cc: Vitaly Wool <vitalywool(a)gmail.com>
Cc: David Howells <dhowells(a)redhat.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Al Viro <viro(a)zeniv.linux.org.uk
Cc: Henry Burns <henrywolfeburns(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/z3fold.c b/mm/z3fold.c
index 1a029a7432ee..43de92f52961 100644
--- a/mm/z3fold.c
+++ b/mm/z3fold.c
@@ -818,8 +818,15 @@ static void z3fold_destroy_pool(struct z3fold_pool *pool)
{
kmem_cache_destroy(pool->c_handle);
z3fold_unregister_migration(pool);
- destroy_workqueue(pool->release_wq);
+
+ /*
+ * We need to destroy pool->compact_wq before pool->release_wq,
+ * as any pending work on pool->compact_wq will call
+ * queue_work(pool->release_wq, &pool->work).
+ */
+
destroy_workqueue(pool->compact_wq);
+ destroy_workqueue(pool->release_wq);
kfree(pool);
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 6051d3bd3b91e96c59e62b8be2dba1cc2b19ee40 Mon Sep 17 00:00:00 2001
From: Henry Burns <henryburns(a)google.com>
Date: Tue, 13 Aug 2019 15:37:21 -0700
Subject: [PATCH] mm/z3fold.c: fix z3fold_destroy_pool() ordering
The constraint from the zpool use of z3fold_destroy_pool() is there are
no outstanding handles to memory (so no active allocations), but it is
possible for there to be outstanding work on either of the two wqs in
the pool.
If there is work queued on pool->compact_workqueue when it is called,
z3fold_destroy_pool() will do:
z3fold_destroy_pool()
destroy_workqueue(pool->release_wq)
destroy_workqueue(pool->compact_wq)
drain_workqueue(pool->compact_wq)
do_compact_page(zhdr)
kref_put(&zhdr->refcount)
__release_z3fold_page(zhdr, ...)
queue_work_on(pool->release_wq, &pool->work) *BOOM*
So compact_wq needs to be destroyed before release_wq.
Link: http://lkml.kernel.org/r/20190726224810.79660-1-henryburns@google.com
Fixes: 5d03a6613957 ("mm/z3fold.c: use kref to prevent page free/compact race")
Signed-off-by: Henry Burns <henryburns(a)google.com>
Reviewed-by: Shakeel Butt <shakeelb(a)google.com>
Reviewed-by: Jonathan Adams <jwadams(a)google.com>
Cc: Vitaly Vul <vitaly.vul(a)sony.com>
Cc: Vitaly Wool <vitalywool(a)gmail.com>
Cc: David Howells <dhowells(a)redhat.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Al Viro <viro(a)zeniv.linux.org.uk
Cc: Henry Burns <henrywolfeburns(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/z3fold.c b/mm/z3fold.c
index 1a029a7432ee..43de92f52961 100644
--- a/mm/z3fold.c
+++ b/mm/z3fold.c
@@ -818,8 +818,15 @@ static void z3fold_destroy_pool(struct z3fold_pool *pool)
{
kmem_cache_destroy(pool->c_handle);
z3fold_unregister_migration(pool);
- destroy_workqueue(pool->release_wq);
+
+ /*
+ * We need to destroy pool->compact_wq before pool->release_wq,
+ * as any pending work on pool->compact_wq will call
+ * queue_work(pool->release_wq, &pool->work).
+ */
+
destroy_workqueue(pool->compact_wq);
+ destroy_workqueue(pool->release_wq);
kfree(pool);
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a53190a4aaa36494f4d7209fd1fcc6f2ee08e0e0 Mon Sep 17 00:00:00 2001
From: Yang Shi <yang.shi(a)linux.alibaba.com>
Date: Tue, 13 Aug 2019 15:37:18 -0700
Subject: [PATCH] mm: mempolicy: handle vma with unmovable pages mapped
correctly in mbind
When running syzkaller internally, we ran into the below bug on 4.9.x
kernel:
kernel BUG at mm/huge_memory.c:2124!
invalid opcode: 0000 [#1] SMP KASAN
CPU: 0 PID: 1518 Comm: syz-executor107 Not tainted 4.9.168+ #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.5.1 01/01/2011
task: ffff880067b34900 task.stack: ffff880068998000
RIP: split_huge_page_to_list+0x8fb/0x1030 mm/huge_memory.c:2124
Call Trace:
split_huge_page include/linux/huge_mm.h:100 [inline]
queue_pages_pte_range+0x7e1/0x1480 mm/mempolicy.c:538
walk_pmd_range mm/pagewalk.c:50 [inline]
walk_pud_range mm/pagewalk.c:90 [inline]
walk_pgd_range mm/pagewalk.c:116 [inline]
__walk_page_range+0x44a/0xdb0 mm/pagewalk.c:208
walk_page_range+0x154/0x370 mm/pagewalk.c:285
queue_pages_range+0x115/0x150 mm/mempolicy.c:694
do_mbind mm/mempolicy.c:1241 [inline]
SYSC_mbind+0x3c3/0x1030 mm/mempolicy.c:1370
SyS_mbind+0x46/0x60 mm/mempolicy.c:1352
do_syscall_64+0x1d2/0x600 arch/x86/entry/common.c:282
entry_SYSCALL_64_after_swapgs+0x5d/0xdb
Code: c7 80 1c 02 00 e8 26 0a 76 01 <0f> 0b 48 c7 c7 40 46 45 84 e8 4c
RIP [<ffffffff81895d6b>] split_huge_page_to_list+0x8fb/0x1030 mm/huge_memory.c:2124
RSP <ffff88006899f980>
with the below test:
uint64_t r[1] = {0xffffffffffffffff};
int main(void)
{
syscall(__NR_mmap, 0x20000000, 0x1000000, 3, 0x32, -1, 0);
intptr_t res = 0;
res = syscall(__NR_socket, 0x11, 3, 0x300);
if (res != -1)
r[0] = res;
*(uint32_t*)0x20000040 = 0x10000;
*(uint32_t*)0x20000044 = 1;
*(uint32_t*)0x20000048 = 0xc520;
*(uint32_t*)0x2000004c = 1;
syscall(__NR_setsockopt, r[0], 0x107, 0xd, 0x20000040, 0x10);
syscall(__NR_mmap, 0x20fed000, 0x10000, 0, 0x8811, r[0], 0);
*(uint64_t*)0x20000340 = 2;
syscall(__NR_mbind, 0x20ff9000, 0x4000, 0x4002, 0x20000340, 0x45d4, 3);
return 0;
}
Actually the test does:
mmap(0x20000000, 16777216, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x20000000
socket(AF_PACKET, SOCK_RAW, 768) = 3
setsockopt(3, SOL_PACKET, PACKET_TX_RING, {block_size=65536, block_nr=1, frame_size=50464, frame_nr=1}, 16) = 0
mmap(0x20fed000, 65536, PROT_NONE, MAP_SHARED|MAP_FIXED|MAP_POPULATE|MAP_DENYWRITE, 3, 0) = 0x20fed000
mbind(..., MPOL_MF_STRICT|MPOL_MF_MOVE) = 0
The setsockopt() would allocate compound pages (16 pages in this test)
for packet tx ring, then the mmap() would call packet_mmap() to map the
pages into the user address space specified by the mmap() call.
When calling mbind(), it would scan the vma to queue the pages for
migration to the new node. It would split any huge page since 4.9
doesn't support THP migration, however, the packet tx ring compound
pages are not THP and even not movable. So, the above bug is triggered.
However, the later kernel is not hit by this issue due to commit
d44d363f6578 ("mm: don't assume anonymous pages have SwapBacked flag"),
which just removes the PageSwapBacked check for a different reason.
But, there is a deeper issue. According to the semantic of mbind(), it
should return -EIO if MPOL_MF_MOVE or MPOL_MF_MOVE_ALL was specified and
MPOL_MF_STRICT was also specified, but the kernel was unable to move all
existing pages in the range. The tx ring of the packet socket is
definitely not movable, however, mbind() returns success for this case.
Although the most socket file associates with non-movable pages, but XDP
may have movable pages from gup. So, it sounds not fine to just check
the underlying file type of vma in vma_migratable().
Change migrate_page_add() to check if the page is movable or not, if it
is unmovable, just return -EIO. But do not abort pte walk immediately,
since there may be pages off LRU temporarily. We should migrate other
pages if MPOL_MF_MOVE* is specified. Set has_unmovable flag if some
paged could not be not moved, then return -EIO for mbind() eventually.
With this change the above test would return -EIO as expected.
[yang.shi(a)linux.alibaba.com: fix review comments from Vlastimil]
Link: http://lkml.kernel.org/r/1563556862-54056-3-git-send-email-yang.shi@linux.a…
Link: http://lkml.kernel.org/r/1561162809-59140-3-git-send-email-yang.shi@linux.a…
Signed-off-by: Yang Shi <yang.shi(a)linux.alibaba.com>
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 932c26845e3e..547cd403ed02 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -403,7 +403,7 @@ static const struct mempolicy_operations mpol_ops[MPOL_MAX] = {
},
};
-static void migrate_page_add(struct page *page, struct list_head *pagelist,
+static int migrate_page_add(struct page *page, struct list_head *pagelist,
unsigned long flags);
struct queue_pages {
@@ -463,12 +463,11 @@ static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
flags = qp->flags;
/* go to thp migration */
if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
- if (!vma_migratable(walk->vma)) {
+ if (!vma_migratable(walk->vma) ||
+ migrate_page_add(page, qp->pagelist, flags)) {
ret = 1;
goto unlock;
}
-
- migrate_page_add(page, qp->pagelist, flags);
} else
ret = -EIO;
unlock:
@@ -532,7 +531,14 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
has_unmovable = true;
break;
}
- migrate_page_add(page, qp->pagelist, flags);
+
+ /*
+ * Do not abort immediately since there may be
+ * temporary off LRU pages in the range. Still
+ * need migrate other LRU pages.
+ */
+ if (migrate_page_add(page, qp->pagelist, flags))
+ has_unmovable = true;
} else
break;
}
@@ -961,7 +967,7 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask,
/*
* page migration, thp tail pages can be passed.
*/
-static void migrate_page_add(struct page *page, struct list_head *pagelist,
+static int migrate_page_add(struct page *page, struct list_head *pagelist,
unsigned long flags)
{
struct page *head = compound_head(page);
@@ -974,8 +980,19 @@ static void migrate_page_add(struct page *page, struct list_head *pagelist,
mod_node_page_state(page_pgdat(head),
NR_ISOLATED_ANON + page_is_file_cache(head),
hpage_nr_pages(head));
+ } else if (flags & MPOL_MF_STRICT) {
+ /*
+ * Non-movable page may reach here. And, there may be
+ * temporary off LRU pages or non-LRU movable pages.
+ * Treat them as unmovable pages since they can't be
+ * isolated, so they can't be moved at the moment. It
+ * should return -EIO for this case too.
+ */
+ return -EIO;
}
}
+
+ return 0;
}
/* page allocation callback for NUMA node migration */
@@ -1178,9 +1195,10 @@ static struct page *new_page(struct page *page, unsigned long start)
}
#else
-static void migrate_page_add(struct page *page, struct list_head *pagelist,
+static int migrate_page_add(struct page *page, struct list_head *pagelist,
unsigned long flags)
{
+ return -EIO;
}
int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from,
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a53190a4aaa36494f4d7209fd1fcc6f2ee08e0e0 Mon Sep 17 00:00:00 2001
From: Yang Shi <yang.shi(a)linux.alibaba.com>
Date: Tue, 13 Aug 2019 15:37:18 -0700
Subject: [PATCH] mm: mempolicy: handle vma with unmovable pages mapped
correctly in mbind
When running syzkaller internally, we ran into the below bug on 4.9.x
kernel:
kernel BUG at mm/huge_memory.c:2124!
invalid opcode: 0000 [#1] SMP KASAN
CPU: 0 PID: 1518 Comm: syz-executor107 Not tainted 4.9.168+ #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.5.1 01/01/2011
task: ffff880067b34900 task.stack: ffff880068998000
RIP: split_huge_page_to_list+0x8fb/0x1030 mm/huge_memory.c:2124
Call Trace:
split_huge_page include/linux/huge_mm.h:100 [inline]
queue_pages_pte_range+0x7e1/0x1480 mm/mempolicy.c:538
walk_pmd_range mm/pagewalk.c:50 [inline]
walk_pud_range mm/pagewalk.c:90 [inline]
walk_pgd_range mm/pagewalk.c:116 [inline]
__walk_page_range+0x44a/0xdb0 mm/pagewalk.c:208
walk_page_range+0x154/0x370 mm/pagewalk.c:285
queue_pages_range+0x115/0x150 mm/mempolicy.c:694
do_mbind mm/mempolicy.c:1241 [inline]
SYSC_mbind+0x3c3/0x1030 mm/mempolicy.c:1370
SyS_mbind+0x46/0x60 mm/mempolicy.c:1352
do_syscall_64+0x1d2/0x600 arch/x86/entry/common.c:282
entry_SYSCALL_64_after_swapgs+0x5d/0xdb
Code: c7 80 1c 02 00 e8 26 0a 76 01 <0f> 0b 48 c7 c7 40 46 45 84 e8 4c
RIP [<ffffffff81895d6b>] split_huge_page_to_list+0x8fb/0x1030 mm/huge_memory.c:2124
RSP <ffff88006899f980>
with the below test:
uint64_t r[1] = {0xffffffffffffffff};
int main(void)
{
syscall(__NR_mmap, 0x20000000, 0x1000000, 3, 0x32, -1, 0);
intptr_t res = 0;
res = syscall(__NR_socket, 0x11, 3, 0x300);
if (res != -1)
r[0] = res;
*(uint32_t*)0x20000040 = 0x10000;
*(uint32_t*)0x20000044 = 1;
*(uint32_t*)0x20000048 = 0xc520;
*(uint32_t*)0x2000004c = 1;
syscall(__NR_setsockopt, r[0], 0x107, 0xd, 0x20000040, 0x10);
syscall(__NR_mmap, 0x20fed000, 0x10000, 0, 0x8811, r[0], 0);
*(uint64_t*)0x20000340 = 2;
syscall(__NR_mbind, 0x20ff9000, 0x4000, 0x4002, 0x20000340, 0x45d4, 3);
return 0;
}
Actually the test does:
mmap(0x20000000, 16777216, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x20000000
socket(AF_PACKET, SOCK_RAW, 768) = 3
setsockopt(3, SOL_PACKET, PACKET_TX_RING, {block_size=65536, block_nr=1, frame_size=50464, frame_nr=1}, 16) = 0
mmap(0x20fed000, 65536, PROT_NONE, MAP_SHARED|MAP_FIXED|MAP_POPULATE|MAP_DENYWRITE, 3, 0) = 0x20fed000
mbind(..., MPOL_MF_STRICT|MPOL_MF_MOVE) = 0
The setsockopt() would allocate compound pages (16 pages in this test)
for packet tx ring, then the mmap() would call packet_mmap() to map the
pages into the user address space specified by the mmap() call.
When calling mbind(), it would scan the vma to queue the pages for
migration to the new node. It would split any huge page since 4.9
doesn't support THP migration, however, the packet tx ring compound
pages are not THP and even not movable. So, the above bug is triggered.
However, the later kernel is not hit by this issue due to commit
d44d363f6578 ("mm: don't assume anonymous pages have SwapBacked flag"),
which just removes the PageSwapBacked check for a different reason.
But, there is a deeper issue. According to the semantic of mbind(), it
should return -EIO if MPOL_MF_MOVE or MPOL_MF_MOVE_ALL was specified and
MPOL_MF_STRICT was also specified, but the kernel was unable to move all
existing pages in the range. The tx ring of the packet socket is
definitely not movable, however, mbind() returns success for this case.
Although the most socket file associates with non-movable pages, but XDP
may have movable pages from gup. So, it sounds not fine to just check
the underlying file type of vma in vma_migratable().
Change migrate_page_add() to check if the page is movable or not, if it
is unmovable, just return -EIO. But do not abort pte walk immediately,
since there may be pages off LRU temporarily. We should migrate other
pages if MPOL_MF_MOVE* is specified. Set has_unmovable flag if some
paged could not be not moved, then return -EIO for mbind() eventually.
With this change the above test would return -EIO as expected.
[yang.shi(a)linux.alibaba.com: fix review comments from Vlastimil]
Link: http://lkml.kernel.org/r/1563556862-54056-3-git-send-email-yang.shi@linux.a…
Link: http://lkml.kernel.org/r/1561162809-59140-3-git-send-email-yang.shi@linux.a…
Signed-off-by: Yang Shi <yang.shi(a)linux.alibaba.com>
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 932c26845e3e..547cd403ed02 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -403,7 +403,7 @@ static const struct mempolicy_operations mpol_ops[MPOL_MAX] = {
},
};
-static void migrate_page_add(struct page *page, struct list_head *pagelist,
+static int migrate_page_add(struct page *page, struct list_head *pagelist,
unsigned long flags);
struct queue_pages {
@@ -463,12 +463,11 @@ static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
flags = qp->flags;
/* go to thp migration */
if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
- if (!vma_migratable(walk->vma)) {
+ if (!vma_migratable(walk->vma) ||
+ migrate_page_add(page, qp->pagelist, flags)) {
ret = 1;
goto unlock;
}
-
- migrate_page_add(page, qp->pagelist, flags);
} else
ret = -EIO;
unlock:
@@ -532,7 +531,14 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
has_unmovable = true;
break;
}
- migrate_page_add(page, qp->pagelist, flags);
+
+ /*
+ * Do not abort immediately since there may be
+ * temporary off LRU pages in the range. Still
+ * need migrate other LRU pages.
+ */
+ if (migrate_page_add(page, qp->pagelist, flags))
+ has_unmovable = true;
} else
break;
}
@@ -961,7 +967,7 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask,
/*
* page migration, thp tail pages can be passed.
*/
-static void migrate_page_add(struct page *page, struct list_head *pagelist,
+static int migrate_page_add(struct page *page, struct list_head *pagelist,
unsigned long flags)
{
struct page *head = compound_head(page);
@@ -974,8 +980,19 @@ static void migrate_page_add(struct page *page, struct list_head *pagelist,
mod_node_page_state(page_pgdat(head),
NR_ISOLATED_ANON + page_is_file_cache(head),
hpage_nr_pages(head));
+ } else if (flags & MPOL_MF_STRICT) {
+ /*
+ * Non-movable page may reach here. And, there may be
+ * temporary off LRU pages or non-LRU movable pages.
+ * Treat them as unmovable pages since they can't be
+ * isolated, so they can't be moved at the moment. It
+ * should return -EIO for this case too.
+ */
+ return -EIO;
}
}
+
+ return 0;
}
/* page allocation callback for NUMA node migration */
@@ -1178,9 +1195,10 @@ static struct page *new_page(struct page *page, unsigned long start)
}
#else
-static void migrate_page_add(struct page *page, struct list_head *pagelist,
+static int migrate_page_add(struct page *page, struct list_head *pagelist,
unsigned long flags)
{
+ return -EIO;
}
int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from,
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d883544515aae54842c21730b880172e7894fde9 Mon Sep 17 00:00:00 2001
From: Yang Shi <yang.shi(a)linux.alibaba.com>
Date: Tue, 13 Aug 2019 15:37:15 -0700
Subject: [PATCH] mm: mempolicy: make the behavior consistent when
MPOL_MF_MOVE* and MPOL_MF_STRICT were specified
When both MPOL_MF_MOVE* and MPOL_MF_STRICT was specified, mbind() should
try best to migrate misplaced pages, if some of the pages could not be
migrated, then return -EIO.
There are three different sub-cases:
1. vma is not migratable
2. vma is migratable, but there are unmovable pages
3. vma is migratable, pages are movable, but migrate_pages() fails
If #1 happens, kernel would just abort immediately, then return -EIO,
after a7f40cfe3b7a ("mm: mempolicy: make mbind() return -EIO when
MPOL_MF_STRICT is specified").
If #3 happens, kernel would set policy and migrate pages with
best-effort, but won't rollback the migrated pages and reset the policy
back.
Before that commit, they behaves in the same way. It'd better to keep
their behavior consistent. But, rolling back the migrated pages and
resetting the policy back sounds not feasible, so just make #1 behave as
same as #3.
Userspace will know that not everything was successfully migrated (via
-EIO), and can take whatever steps it deems necessary - attempt
rollback, determine which exact page(s) are violating the policy, etc.
Make queue_pages_range() return 1 to indicate there are unmovable pages
or vma is not migratable.
The #2 is not handled correctly in the current kernel, the following
patch will fix it.
[yang.shi(a)linux.alibaba.com: fix review comments from Vlastimil]
Link: http://lkml.kernel.org/r/1563556862-54056-2-git-send-email-yang.shi@linux.a…
Link: http://lkml.kernel.org/r/1561162809-59140-2-git-send-email-yang.shi@linux.a…
Signed-off-by: Yang Shi <yang.shi(a)linux.alibaba.com>
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index f48693f75b37..932c26845e3e 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -429,11 +429,14 @@ static inline bool queue_pages_required(struct page *page,
}
/*
- * queue_pages_pmd() has three possible return values:
- * 1 - pages are placed on the right node or queued successfully.
- * 0 - THP was split.
- * -EIO - is migration entry or MPOL_MF_STRICT was specified and an existing
- * page was already on a node that does not follow the policy.
+ * queue_pages_pmd() has four possible return values:
+ * 0 - pages are placed on the right node or queued successfully.
+ * 1 - there is unmovable page, and MPOL_MF_MOVE* & MPOL_MF_STRICT were
+ * specified.
+ * 2 - THP was split.
+ * -EIO - is migration entry or only MPOL_MF_STRICT was specified and an
+ * existing page was already on a node that does not follow the
+ * policy.
*/
static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
unsigned long end, struct mm_walk *walk)
@@ -451,19 +454,17 @@ static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
if (is_huge_zero_page(page)) {
spin_unlock(ptl);
__split_huge_pmd(walk->vma, pmd, addr, false, NULL);
+ ret = 2;
goto out;
}
- if (!queue_pages_required(page, qp)) {
- ret = 1;
+ if (!queue_pages_required(page, qp))
goto unlock;
- }
- ret = 1;
flags = qp->flags;
/* go to thp migration */
if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
if (!vma_migratable(walk->vma)) {
- ret = -EIO;
+ ret = 1;
goto unlock;
}
@@ -479,6 +480,13 @@ static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
/*
* Scan through pages checking if pages follow certain conditions,
* and move them to the pagelist if they do.
+ *
+ * queue_pages_pte_range() has three possible return values:
+ * 0 - pages are placed on the right node or queued successfully.
+ * 1 - there is unmovable page, and MPOL_MF_MOVE* & MPOL_MF_STRICT were
+ * specified.
+ * -EIO - only MPOL_MF_STRICT was specified and an existing page was already
+ * on a node that does not follow the policy.
*/
static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
unsigned long end, struct mm_walk *walk)
@@ -488,17 +496,17 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
struct queue_pages *qp = walk->private;
unsigned long flags = qp->flags;
int ret;
+ bool has_unmovable = false;
pte_t *pte;
spinlock_t *ptl;
ptl = pmd_trans_huge_lock(pmd, vma);
if (ptl) {
ret = queue_pages_pmd(pmd, ptl, addr, end, walk);
- if (ret > 0)
- return 0;
- else if (ret < 0)
+ if (ret != 2)
return ret;
}
+ /* THP was split, fall through to pte walk */
if (pmd_trans_unstable(pmd))
return 0;
@@ -519,14 +527,21 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
if (!queue_pages_required(page, qp))
continue;
if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
- if (!vma_migratable(vma))
+ /* MPOL_MF_STRICT must be specified if we get here */
+ if (!vma_migratable(vma)) {
+ has_unmovable = true;
break;
+ }
migrate_page_add(page, qp->pagelist, flags);
} else
break;
}
pte_unmap_unlock(pte - 1, ptl);
cond_resched();
+
+ if (has_unmovable)
+ return 1;
+
return addr != end ? -EIO : 0;
}
@@ -639,7 +654,13 @@ static int queue_pages_test_walk(unsigned long start, unsigned long end,
*
* If pages found in a given range are on a set of nodes (determined by
* @nodes and @flags,) it's isolated and queued to the pagelist which is
- * passed via @private.)
+ * passed via @private.
+ *
+ * queue_pages_range() has three possible return values:
+ * 1 - there is unmovable page, but MPOL_MF_MOVE* & MPOL_MF_STRICT were
+ * specified.
+ * 0 - queue pages successfully or no misplaced page.
+ * -EIO - there is misplaced page and only MPOL_MF_STRICT was specified.
*/
static int
queue_pages_range(struct mm_struct *mm, unsigned long start, unsigned long end,
@@ -1182,6 +1203,7 @@ static long do_mbind(unsigned long start, unsigned long len,
struct mempolicy *new;
unsigned long end;
int err;
+ int ret;
LIST_HEAD(pagelist);
if (flags & ~(unsigned long)MPOL_MF_VALID)
@@ -1243,10 +1265,15 @@ static long do_mbind(unsigned long start, unsigned long len,
if (err)
goto mpol_out;
- err = queue_pages_range(mm, start, end, nmask,
+ ret = queue_pages_range(mm, start, end, nmask,
flags | MPOL_MF_INVERT, &pagelist);
- if (!err)
- err = mbind_range(mm, start, end, new);
+
+ if (ret < 0) {
+ err = -EIO;
+ goto up_out;
+ }
+
+ err = mbind_range(mm, start, end, new);
if (!err) {
int nr_failed = 0;
@@ -1259,13 +1286,14 @@ static long do_mbind(unsigned long start, unsigned long len,
putback_movable_pages(&pagelist);
}
- if (nr_failed && (flags & MPOL_MF_STRICT))
+ if ((ret > 0) || (nr_failed && (flags & MPOL_MF_STRICT)))
err = -EIO;
} else
putback_movable_pages(&pagelist);
+up_out:
up_write(&mm->mmap_sem);
- mpol_out:
+mpol_out:
mpol_put(new);
return err;
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d883544515aae54842c21730b880172e7894fde9 Mon Sep 17 00:00:00 2001
From: Yang Shi <yang.shi(a)linux.alibaba.com>
Date: Tue, 13 Aug 2019 15:37:15 -0700
Subject: [PATCH] mm: mempolicy: make the behavior consistent when
MPOL_MF_MOVE* and MPOL_MF_STRICT were specified
When both MPOL_MF_MOVE* and MPOL_MF_STRICT was specified, mbind() should
try best to migrate misplaced pages, if some of the pages could not be
migrated, then return -EIO.
There are three different sub-cases:
1. vma is not migratable
2. vma is migratable, but there are unmovable pages
3. vma is migratable, pages are movable, but migrate_pages() fails
If #1 happens, kernel would just abort immediately, then return -EIO,
after a7f40cfe3b7a ("mm: mempolicy: make mbind() return -EIO when
MPOL_MF_STRICT is specified").
If #3 happens, kernel would set policy and migrate pages with
best-effort, but won't rollback the migrated pages and reset the policy
back.
Before that commit, they behaves in the same way. It'd better to keep
their behavior consistent. But, rolling back the migrated pages and
resetting the policy back sounds not feasible, so just make #1 behave as
same as #3.
Userspace will know that not everything was successfully migrated (via
-EIO), and can take whatever steps it deems necessary - attempt
rollback, determine which exact page(s) are violating the policy, etc.
Make queue_pages_range() return 1 to indicate there are unmovable pages
or vma is not migratable.
The #2 is not handled correctly in the current kernel, the following
patch will fix it.
[yang.shi(a)linux.alibaba.com: fix review comments from Vlastimil]
Link: http://lkml.kernel.org/r/1563556862-54056-2-git-send-email-yang.shi@linux.a…
Link: http://lkml.kernel.org/r/1561162809-59140-2-git-send-email-yang.shi@linux.a…
Signed-off-by: Yang Shi <yang.shi(a)linux.alibaba.com>
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index f48693f75b37..932c26845e3e 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -429,11 +429,14 @@ static inline bool queue_pages_required(struct page *page,
}
/*
- * queue_pages_pmd() has three possible return values:
- * 1 - pages are placed on the right node or queued successfully.
- * 0 - THP was split.
- * -EIO - is migration entry or MPOL_MF_STRICT was specified and an existing
- * page was already on a node that does not follow the policy.
+ * queue_pages_pmd() has four possible return values:
+ * 0 - pages are placed on the right node or queued successfully.
+ * 1 - there is unmovable page, and MPOL_MF_MOVE* & MPOL_MF_STRICT were
+ * specified.
+ * 2 - THP was split.
+ * -EIO - is migration entry or only MPOL_MF_STRICT was specified and an
+ * existing page was already on a node that does not follow the
+ * policy.
*/
static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
unsigned long end, struct mm_walk *walk)
@@ -451,19 +454,17 @@ static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
if (is_huge_zero_page(page)) {
spin_unlock(ptl);
__split_huge_pmd(walk->vma, pmd, addr, false, NULL);
+ ret = 2;
goto out;
}
- if (!queue_pages_required(page, qp)) {
- ret = 1;
+ if (!queue_pages_required(page, qp))
goto unlock;
- }
- ret = 1;
flags = qp->flags;
/* go to thp migration */
if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
if (!vma_migratable(walk->vma)) {
- ret = -EIO;
+ ret = 1;
goto unlock;
}
@@ -479,6 +480,13 @@ static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
/*
* Scan through pages checking if pages follow certain conditions,
* and move them to the pagelist if they do.
+ *
+ * queue_pages_pte_range() has three possible return values:
+ * 0 - pages are placed on the right node or queued successfully.
+ * 1 - there is unmovable page, and MPOL_MF_MOVE* & MPOL_MF_STRICT were
+ * specified.
+ * -EIO - only MPOL_MF_STRICT was specified and an existing page was already
+ * on a node that does not follow the policy.
*/
static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
unsigned long end, struct mm_walk *walk)
@@ -488,17 +496,17 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
struct queue_pages *qp = walk->private;
unsigned long flags = qp->flags;
int ret;
+ bool has_unmovable = false;
pte_t *pte;
spinlock_t *ptl;
ptl = pmd_trans_huge_lock(pmd, vma);
if (ptl) {
ret = queue_pages_pmd(pmd, ptl, addr, end, walk);
- if (ret > 0)
- return 0;
- else if (ret < 0)
+ if (ret != 2)
return ret;
}
+ /* THP was split, fall through to pte walk */
if (pmd_trans_unstable(pmd))
return 0;
@@ -519,14 +527,21 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
if (!queue_pages_required(page, qp))
continue;
if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
- if (!vma_migratable(vma))
+ /* MPOL_MF_STRICT must be specified if we get here */
+ if (!vma_migratable(vma)) {
+ has_unmovable = true;
break;
+ }
migrate_page_add(page, qp->pagelist, flags);
} else
break;
}
pte_unmap_unlock(pte - 1, ptl);
cond_resched();
+
+ if (has_unmovable)
+ return 1;
+
return addr != end ? -EIO : 0;
}
@@ -639,7 +654,13 @@ static int queue_pages_test_walk(unsigned long start, unsigned long end,
*
* If pages found in a given range are on a set of nodes (determined by
* @nodes and @flags,) it's isolated and queued to the pagelist which is
- * passed via @private.)
+ * passed via @private.
+ *
+ * queue_pages_range() has three possible return values:
+ * 1 - there is unmovable page, but MPOL_MF_MOVE* & MPOL_MF_STRICT were
+ * specified.
+ * 0 - queue pages successfully or no misplaced page.
+ * -EIO - there is misplaced page and only MPOL_MF_STRICT was specified.
*/
static int
queue_pages_range(struct mm_struct *mm, unsigned long start, unsigned long end,
@@ -1182,6 +1203,7 @@ static long do_mbind(unsigned long start, unsigned long len,
struct mempolicy *new;
unsigned long end;
int err;
+ int ret;
LIST_HEAD(pagelist);
if (flags & ~(unsigned long)MPOL_MF_VALID)
@@ -1243,10 +1265,15 @@ static long do_mbind(unsigned long start, unsigned long len,
if (err)
goto mpol_out;
- err = queue_pages_range(mm, start, end, nmask,
+ ret = queue_pages_range(mm, start, end, nmask,
flags | MPOL_MF_INVERT, &pagelist);
- if (!err)
- err = mbind_range(mm, start, end, new);
+
+ if (ret < 0) {
+ err = -EIO;
+ goto up_out;
+ }
+
+ err = mbind_range(mm, start, end, new);
if (!err) {
int nr_failed = 0;
@@ -1259,13 +1286,14 @@ static long do_mbind(unsigned long start, unsigned long len,
putback_movable_pages(&pagelist);
}
- if (nr_failed && (flags & MPOL_MF_STRICT))
+ if ((ret > 0) || (nr_failed && (flags & MPOL_MF_STRICT)))
err = -EIO;
} else
putback_movable_pages(&pagelist);
+up_out:
up_write(&mm->mmap_sem);
- mpol_out:
+mpol_out:
mpol_put(new);
return err;
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 1de13ee59225dfc98d483f8cce7d83f97c0b31de Mon Sep 17 00:00:00 2001
From: Ralph Campbell <rcampbell(a)nvidia.com>
Date: Tue, 13 Aug 2019 15:37:11 -0700
Subject: [PATCH] mm/hmm: fix bad subpage pointer in try_to_unmap_one
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
When migrating an anonymous private page to a ZONE_DEVICE private page,
the source page->mapping and page->index fields are copied to the
destination ZONE_DEVICE struct page and the page_mapcount() is
increased. This is so rmap_walk() can be used to unmap and migrate the
page back to system memory.
However, try_to_unmap_one() computes the subpage pointer from a swap pte
which computes an invalid page pointer and a kernel panic results such
as:
BUG: unable to handle page fault for address: ffffea1fffffffc8
Currently, only single pages can be migrated to device private memory so
no subpage computation is needed and it can be set to "page".
[rcampbell(a)nvidia.com: add comment]
Link: http://lkml.kernel.org/r/20190724232700.23327-4-rcampbell@nvidia.com
Link: http://lkml.kernel.org/r/20190719192955.30462-4-rcampbell@nvidia.com
Fixes: a5430dda8a3a1c ("mm/migrate: support un-addressable ZONE_DEVICE page in migration")
Signed-off-by: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: "Jérôme Glisse" <jglisse(a)redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Cc: Christoph Lameter <cl(a)linux.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Ira Weiny <ira.weiny(a)intel.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Lai Jiangshan <jiangshanlai(a)gmail.com>
Cc: Logan Gunthorpe <logang(a)deltatee.com>
Cc: Martin Schwidefsky <schwidefsky(a)de.ibm.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Pekka Enberg <penberg(a)kernel.org>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/rmap.c b/mm/rmap.c
index e5dfe2ae6b0d..003377e24232 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1475,7 +1475,15 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
/*
* No need to invalidate here it will synchronize on
* against the special swap migration pte.
+ *
+ * The assignment to subpage above was computed from a
+ * swap PTE which results in an invalid pointer.
+ * Since only PAGE_SIZE pages can currently be
+ * migrated, just set it to page. This will need to be
+ * changed when hugepage migrations to device private
+ * memory are supported.
*/
+ subpage = page;
goto discard;
}
From: Suganath Prabu S <suganath-prabu.subramani(a)broadcom.com>
Although SAS3 & SAS3.5 IT HBA controllers support
64-bit DMA addressing, as per hardware design,
if DMA able range contains all 64-bits set (0xFFFFFFFF-FFFFFFFF) then
it results in a firmware fault.
e.g. SGE's start address is 0xFFFFFFFF-FFFF000 and
data length is 0x1000 bytes. when HBA tries to DMA the data
at 0xFFFFFFFF-FFFFFFFF location then HBA will
fault the firmware.
Fix:
Driver will set 63-bit DMA mask to ensure the above address
will not be used.
Cc: <stable(a)vger.kernel.org> # 4.4.186, # 4.9.186 # 4.14.136
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani(a)broadcom.com>
---
RESEND: Resending this patch to include 4.14.136 stable kernel.
This Patch is for stable kernels 4.4.186, 4.9.186 and 4.14.136.
Original patch is applied to 5.3/scsi-fixes.
commit ID: df9a606184bfdb5ae3ca9d226184e9489f5c24f7
drivers/scsi/mpt3sas/mpt3sas_base.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 9b53672..7af7a08 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -1686,9 +1686,11 @@ _base_config_dma_addressing(struct MPT3SAS_ADAPTER *ioc, struct pci_dev *pdev)
{
struct sysinfo s;
u64 consistent_dma_mask;
+ /* Set 63 bit DMA mask for all SAS3 and SAS35 controllers */
+ int dma_mask = (ioc->hba_mpi_version_belonged > MPI2_VERSION) ? 63 : 64;
if (ioc->dma_mask)
- consistent_dma_mask = DMA_BIT_MASK(64);
+ consistent_dma_mask = DMA_BIT_MASK(dma_mask);
else
consistent_dma_mask = DMA_BIT_MASK(32);
@@ -1696,11 +1698,11 @@ _base_config_dma_addressing(struct MPT3SAS_ADAPTER *ioc, struct pci_dev *pdev)
const uint64_t required_mask =
dma_get_required_mask(&pdev->dev);
if ((required_mask > DMA_BIT_MASK(32)) &&
- !pci_set_dma_mask(pdev, DMA_BIT_MASK(64)) &&
+ !pci_set_dma_mask(pdev, DMA_BIT_MASK(dma_mask)) &&
!pci_set_consistent_dma_mask(pdev, consistent_dma_mask)) {
ioc->base_add_sg_single = &_base_add_sg_single_64;
ioc->sge_size = sizeof(Mpi2SGESimple64_t);
- ioc->dma_mask = 64;
+ ioc->dma_mask = dma_mask;
goto out;
}
}
@@ -1726,7 +1728,7 @@ static int
_base_change_consistent_dma_mask(struct MPT3SAS_ADAPTER *ioc,
struct pci_dev *pdev)
{
- if (pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64))) {
+ if (pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(ioc->dma_mask))) {
if (pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32)))
return -ENODEV;
}
@@ -3325,7 +3327,7 @@ _base_allocate_memory_pools(struct MPT3SAS_ADAPTER *ioc, int sleep_flag)
total_sz += sz;
} while (ioc->rdpq_array_enable && (++i < ioc->reply_queue_count));
- if (ioc->dma_mask == 64) {
+ if (ioc->dma_mask > 32) {
if (_base_change_consistent_dma_mask(ioc, ioc->pdev) != 0) {
pr_warn(MPT3SAS_FMT
"no suitable consistent DMA mask for %s\n",
--
2.16.3
This is the start of the stable review cycle for the 4.14.139 release.
There are 69 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri 16 Aug 2019 04:55:34 PM UTC.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.139-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.14.139-rc1
Luca Coelho <luciano.coelho(a)intel.com>
iwlwifi: mvm: fix version check for GEO_TX_POWER_LIMIT support
Luca Coelho <luciano.coelho(a)intel.com>
iwlwifi: mvm: don't send GEO_TX_POWER_LIMIT on version < 41
Emmanuel Grumbach <emmanuel.grumbach(a)intel.com>
iwlwifi: mvm: fix an out-of-bound access
Emmanuel Grumbach <emmanuel.grumbach(a)intel.com>
iwlwifi: don't unmap as page memory that was mapped as single
Brian Norris <briannorris(a)chromium.org>
mwifiex: fix 802.11n/WPA detection
Wanpeng Li <wanpengli(a)tencent.com>
KVM: Fix leak vCPU's VMCS value into other pCPU
Trond Myklebust <trond.myklebust(a)hammerspace.com>
NFSv4: Fix an Oops in nfs4_do_setattr
Trond Myklebust <trond.myklebust(a)primarydata.com>
NFSv4: Only pass the delegation to setattr if we're sending a truncate
Steve French <stfrench(a)microsoft.com>
smb3: send CAP_DFS capability during session setup
Pavel Shilovsky <pshilov(a)microsoft.com>
SMB3: Fix deadlock in validate negotiate hits reconnect
Brian Norris <briannorris(a)chromium.org>
mac80211: don't WARN on short WMM parameters from AP
Takashi Iwai <tiwai(a)suse.de>
ALSA: hda - Workaround for crackled sound on AMD controller (1022:1457)
Takashi Iwai <tiwai(a)suse.de>
ALSA: hda - Don't override global PCM hw info flag
Wenwen Wang <wenwen(a)cs.uga.edu>
ALSA: firewire: fix a memory leak bug
Stanislav Lisovskiy <stanislav.lisovskiy(a)intel.com>
drm/i915: Fix wrong escape clock divisor init for GLK
Guenter Roeck <linux(a)roeck-us.net>
hwmon: (nct7802) Fix wrong detection of in4 presence
Tomas Bortoli <tomasbortoli(a)gmail.com>
can: peak_usb: pcan_usb_fd: Fix info-leaks to USB devices
Tomas Bortoli <tomasbortoli(a)gmail.com>
can: peak_usb: pcan_usb_pro: Fix info-leaks to USB devices
Roderick Colenbrander <roderick(a)gaikai.com>
HID: sony: Fix race condition between rumble and device remove.
Leonard Crestez <leonard.crestez(a)nxp.com>
perf/core: Fix creating kernel counters for PMUs that override event->cpu
Peter Zijlstra <peterz(a)infradead.org>
tty/ldsem, locking/rwsem: Add missing ACQUIRE to read_failed sleep loop
Wenwen Wang <wenwen(a)cs.uga.edu>
test_firmware: fix a memory leak bug
Hannes Reinecke <hare(a)suse.de>
scsi: scsi_dh_alua: always use a 2 second delay before retrying RTPG
Tyrel Datwyler <tyreld(a)linux.vnet.ibm.com>
scsi: ibmvfc: fix WARN_ON during event pool release
Junxiao Bi <junxiao.bi(a)oracle.com>
scsi: megaraid_sas: fix panic on loading firmware crashdump
Arnd Bergmann <arnd(a)arndb.de>
ARM: davinci: fix sleep.S build error on ARMv4
Lorenzo Pieralisi <lorenzo.pieralisi(a)arm.com>
ACPI/IORT: Fix off-by-one check in iort_dev_find_its_id()
Arnd Bergmann <arnd(a)arndb.de>
drbd: dynamically allocate shash descriptor
Arnaldo Carvalho de Melo <acme(a)redhat.com>
perf probe: Avoid calling freeing routine multiple times for same pointer
Jiri Olsa <jolsa(a)kernel.org>
perf tools: Fix proper buffer size for feature processing
Charles Keepax <ckeepax(a)opensource.cirrus.com>
ALSA: compress: Be more restrictive about when a drain is allowed
Charles Keepax <ckeepax(a)opensource.cirrus.com>
ALSA: compress: Don't allow paritial drain operations on capture streams
Charles Keepax <ckeepax(a)opensource.cirrus.com>
ALSA: compress: Prevent bypasses of set_params
Charles Keepax <ckeepax(a)opensource.cirrus.com>
ALSA: compress: Fix regression on compressed capture streams
Julian Wiedmann <jwi(a)linux.ibm.com>
s390/qdio: add sanity checks to the fast-requeue path
Wen Yang <wen.yang99(a)zte.com.cn>
cpufreq/pasemi: fix use-after-free in pas_cpufreq_cpu_init()
Qian Cai <cai(a)lca.pw>
drm: silence variable 'conn' set but not used
Björn Gerhart <gerhart(a)posteo.de>
hwmon: (nct6775) Fix register address and added missed tolerance for nct6106
Brian Norris <briannorris(a)chromium.org>
mac80211: don't warn about CW params when not using them
Thomas Tai <thomas.tai(a)oracle.com>
iscsi_ibft: make ISCSI_IBFT dependson ACPI instead of ISCSI_IBFT_FIND
Mauro Carvalho Chehab <mchehab+samsung(a)kernel.org>
scripts/sphinx-pre-install: fix script for RHEL/CentOS
Laura Garcia Liebana <nevola(a)gmail.com>
netfilter: nft_hash: fix symhash with modulus one
Miaohe Lin <linmiaohe(a)huawei.com>
netfilter: Fix rpfilter dropping vrf packets by mistake
Farhan Ali <alifm(a)linux.ibm.com>
vfio-ccw: Set pa_nr to 0 if memory allocation fails for pa_iova_pfn
Florian Westphal <fw(a)strlen.de>
netfilter: nfnetlink: avoid deadlock due to synchronous request_module
Stephane Grosjean <s.grosjean(a)peak-system.com>
can: peak_usb: fix potential double kfree_skb()
Nikita Yushchenko <nikita.yoush(a)cogentembedded.com>
can: rcar_canfd: fix possible IRQ storm on high load
Suzuki K Poulose <suzuki.poulose(a)arm.com>
usb: yurex: Fix use-after-free in yurex_delete
Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
usb: host: xhci-rcar: Fix timeout in xhci_suspend()
Thomas Richter <tmricht(a)linux.ibm.com>
perf record: Fix module size on s390
Adrian Hunter <adrian.hunter(a)intel.com>
perf db-export: Fix thread__exec_comm()
Thomas Richter <tmricht(a)linux.ibm.com>
perf annotate: Fix s390 gap between kernel end and module start
Joerg Roedel <jroedel(a)suse.de>
mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()
Joerg Roedel <jroedel(a)suse.de>
x86/mm: Sync also unmappings in vmalloc_sync_all()
Joerg Roedel <jroedel(a)suse.de>
x86/mm: Check for pfn instead of page in vmalloc_sync_one()
Ben Hutchings <ben(a)decadent.org.uk>
tcp: Clear sk_send_head after purging the write queue
Gary R Hook <gary.hook(a)amd.com>
crypto: ccp - Add support for valid authsize values less than 16
Gary R Hook <gary.hook(a)amd.com>
crypto: ccp - Validate buffer lengths for copy operations
Nick Desaulniers <ndesaulniers(a)google.com>
lkdtm: support llvm-objcopy
Dmitry Torokhov <dmitry.torokhov(a)gmail.com>
Input: synaptics - enable RMI mode for HP Spectre X360
Mikulas Patocka <mpatocka(a)redhat.com>
loop: set PF_MEMALLOC_NOIO for the worker thread
Kevin Hao <haokexin(a)gmail.com>
mmc: cavium: Add the missing dma unmap when the dma has finished.
Kevin Hao <haokexin(a)gmail.com>
mmc: cavium: Set the correct dma max segment size for mmc_host
Wenwen Wang <wenwen(a)cs.uga.edu>
sound: fix a memory leak bug
Oliver Neukum <oneukum(a)suse.com>
usb: iowarrior: fix deadlock on disconnect
Gavin Li <git(a)thegavinli.com>
usb: usbfs: fix double-free of usb memory upon submiturb error
Gary R Hook <gary.hook(a)amd.com>
crypto: ccp - Ignore tag length when decrypting GCM ciphertext
Gary R Hook <gary.hook(a)amd.com>
crypto: ccp - Fix oops by properly managing allocated structures
Joe Perches <joe(a)perches.com>
iio: adc: max9611: Fix misuse of GENMASK macro
-------------
Diffstat:
Makefile | 4 +-
arch/arm/mach-davinci/sleep.S | 1 +
arch/powerpc/kvm/powerpc.c | 5 +
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/svm.c | 6 ++
arch/x86/kvm/vmx.c | 6 ++
arch/x86/kvm/x86.c | 16 +++
arch/x86/mm/fault.c | 15 ++-
drivers/acpi/arm64/iort.c | 4 +-
drivers/block/drbd/drbd_receiver.c | 14 ++-
drivers/block/loop.c | 2 +-
drivers/cpufreq/pasemi-cpufreq.c | 23 ++---
drivers/crypto/ccp/ccp-crypto-aes-galois.c | 14 +++
drivers/crypto/ccp/ccp-ops.c | 139 +++++++++++++++++++--------
drivers/firmware/Kconfig | 5 +-
drivers/firmware/iscsi_ibft.c | 4 +
drivers/gpu/drm/drm_framebuffer.c | 2 +-
drivers/gpu/drm/i915/intel_dsi_pll.c | 4 +-
drivers/hid/hid-sony.c | 15 ++-
drivers/hwmon/nct6775.c | 3 +-
drivers/hwmon/nct7802.c | 6 +-
drivers/iio/adc/max9611.c | 2 +-
drivers/input/mouse/synaptics.c | 1 +
drivers/misc/Makefile | 3 +-
drivers/mmc/host/cavium.c | 4 +-
drivers/net/can/rcar/rcar_canfd.c | 9 +-
drivers/net/can/usb/peak_usb/pcan_usb_core.c | 8 +-
drivers/net/can/usb/peak_usb/pcan_usb_fd.c | 2 +-
drivers/net/can/usb/peak_usb/pcan_usb_pro.c | 2 +-
drivers/net/wireless/intel/iwlwifi/mvm/fw.c | 29 ++++--
drivers/net/wireless/intel/iwlwifi/pcie/tx.c | 2 +
drivers/net/wireless/marvell/mwifiex/main.h | 1 +
drivers/net/wireless/marvell/mwifiex/scan.c | 3 +-
drivers/s390/cio/qdio_main.c | 12 +--
drivers/s390/cio/vfio_ccw_cp.c | 4 +-
drivers/scsi/device_handler/scsi_dh_alua.c | 7 +-
drivers/scsi/ibmvscsi/ibmvfc.c | 2 +-
drivers/scsi/megaraid/megaraid_sas_base.c | 3 +
drivers/tty/tty_ldsem.c | 5 +-
drivers/usb/core/devio.c | 2 -
drivers/usb/host/xhci-rcar.c | 9 +-
drivers/usb/misc/iowarrior.c | 7 +-
drivers/usb/misc/yurex.c | 2 +-
fs/cifs/smb2pdu.c | 7 +-
fs/nfs/nfs4proc.c | 12 ++-
include/linux/ccp.h | 2 +
include/linux/kvm_host.h | 1 +
include/net/tcp.h | 3 +
include/sound/compress_driver.h | 5 +-
kernel/events/core.c | 2 +-
lib/test_firmware.c | 5 +-
mm/vmalloc.c | 9 ++
net/ipv4/netfilter/ipt_rpfilter.c | 1 +
net/ipv6/netfilter/ip6t_rpfilter.c | 8 +-
net/mac80211/driver-ops.c | 13 ++-
net/mac80211/mlme.c | 10 ++
net/netfilter/nfnetlink.c | 2 +-
net/netfilter/nft_hash.c | 2 +-
scripts/sphinx-pre-install | 2 +-
sound/core/compress_offload.c | 60 +++++++++---
sound/firewire/packets-buffer.c | 2 +-
sound/pci/hda/hda_controller.c | 13 ++-
sound/pci/hda/hda_controller.h | 2 +-
sound/pci/hda/hda_intel.c | 63 +++++++++++-
sound/sound_core.c | 3 +-
tools/perf/arch/s390/util/machine.c | 31 +++++-
tools/perf/builtin-probe.c | 10 ++
tools/perf/util/header.c | 2 +-
tools/perf/util/machine.c | 3 +-
tools/perf/util/machine.h | 2 +-
tools/perf/util/symbol.c | 7 +-
tools/perf/util/symbol.h | 1 +
tools/perf/util/thread.c | 12 ++-
virt/kvm/kvm_main.c | 25 ++++-
74 files changed, 558 insertions(+), 170 deletions(-)
After _gadget_stop_activity is executed, we can consider the hardware
operation for gadget has finished, and the udc can be stopped and enter
low power mode. So, any later hardware operations (from usb_ep_ops APIs
or usb_gadget_ops APIs) should be considered invalid, any deinitializatons
has been covered at _gadget_stop_activity.
I meet this problem when I plug out usb cable from PC using mass_storage
gadget, my callstack like: vbus interrupt->.vbus_session->
composite_disconnect ->pm_runtime_put_sync(&_gadget->dev),
the composite_disconnect will call fsg_disable, but fsg_disable calls
usb_ep_disable using async way, there are register accesses for
usb_ep_disable. So sometimes, I get system hang due to visit register
without clock, sometimes not.
The Linux Kernel USB maintainer Alan Stern suggests this kinds of solution.
See: http://marc.info/?l=linux-usb&m=138541769810983&w=2.
Cc: <stable(a)vger.kernel.org> #v4.9+
Signed-off-by: Peter Chen <peter.chen(a)nxp.com>
---
This patch is at NXP internal tree long time, and no issues have found.
Submit to mainline kenrel.
drivers/usb/chipidea/udc.c | 32 ++++++++++++++++++++++++--------
1 file changed, 24 insertions(+), 8 deletions(-)
diff --git a/drivers/usb/chipidea/udc.c b/drivers/usb/chipidea/udc.c
index 053432d79bf7..8f18e7b6cadf 100644
--- a/drivers/usb/chipidea/udc.c
+++ b/drivers/usb/chipidea/udc.c
@@ -709,12 +709,6 @@ static int _gadget_stop_activity(struct usb_gadget *gadget)
struct ci_hdrc *ci = container_of(gadget, struct ci_hdrc, gadget);
unsigned long flags;
- spin_lock_irqsave(&ci->lock, flags);
- ci->gadget.speed = USB_SPEED_UNKNOWN;
- ci->remote_wakeup = 0;
- ci->suspended = 0;
- spin_unlock_irqrestore(&ci->lock, flags);
-
/* flush all endpoints */
gadget_for_each_ep(ep, gadget) {
usb_ep_fifo_flush(ep);
@@ -732,6 +726,12 @@ static int _gadget_stop_activity(struct usb_gadget *gadget)
ci->status = NULL;
}
+ spin_lock_irqsave(&ci->lock, flags);
+ ci->gadget.speed = USB_SPEED_UNKNOWN;
+ ci->remote_wakeup = 0;
+ ci->suspended = 0;
+ spin_unlock_irqrestore(&ci->lock, flags);
+
return 0;
}
@@ -1303,6 +1303,10 @@ static int ep_disable(struct usb_ep *ep)
return -EBUSY;
spin_lock_irqsave(hwep->lock, flags);
+ if (hwep->ci->gadget.speed == USB_SPEED_UNKNOWN) {
+ spin_unlock_irqrestore(hwep->lock, flags);
+ return 0;
+ }
/* only internal SW should disable ctrl endpts */
@@ -1392,6 +1396,10 @@ static int ep_queue(struct usb_ep *ep, struct usb_request *req,
return -EINVAL;
spin_lock_irqsave(hwep->lock, flags);
+ if (hwep->ci->gadget.speed == USB_SPEED_UNKNOWN) {
+ spin_unlock_irqrestore(hwep->lock, flags);
+ return 0;
+ }
retval = _ep_queue(ep, req, gfp_flags);
spin_unlock_irqrestore(hwep->lock, flags);
return retval;
@@ -1415,8 +1423,8 @@ static int ep_dequeue(struct usb_ep *ep, struct usb_request *req)
return -EINVAL;
spin_lock_irqsave(hwep->lock, flags);
-
- hw_ep_flush(hwep->ci, hwep->num, hwep->dir);
+ if (hwep->ci->gadget.speed != USB_SPEED_UNKNOWN)
+ hw_ep_flush(hwep->ci, hwep->num, hwep->dir);
list_for_each_entry_safe(node, tmpnode, &hwreq->tds, td) {
dma_pool_free(hwep->td_pool, node->ptr, node->dma);
@@ -1487,6 +1495,10 @@ static void ep_fifo_flush(struct usb_ep *ep)
}
spin_lock_irqsave(hwep->lock, flags);
+ if (hwep->ci->gadget.speed == USB_SPEED_UNKNOWN) {
+ spin_unlock_irqrestore(hwep->lock, flags);
+ return;
+ }
hw_ep_flush(hwep->ci, hwep->num, hwep->dir);
@@ -1559,6 +1571,10 @@ static int ci_udc_wakeup(struct usb_gadget *_gadget)
int ret = 0;
spin_lock_irqsave(&ci->lock, flags);
+ if (ci->gadget.speed == USB_SPEED_UNKNOWN) {
+ spin_unlock_irqrestore(&ci->lock, flags);
+ return 0;
+ }
if (!ci->remote_wakeup) {
ret = -EOPNOTSUPP;
goto out;
--
2.17.1
As reported in [1], the hisi-lpc driver has certain issues in handling
logical PIO regions, specifically unregistering regions.
This series add a method to unregister a logical PIO region, and fixes up
the driver to use them.
RCU usage in logical PIO code looks to always have been broken, so that
is fixed also. This is not a major fix as the list which RCU protects
would be rarely modified.
At this point, there are still separate ongoing discussions about how to
stop the logical PIO and PCI host bridge code leaking ranges, as in [2],
which I haven't had a chance to look at recently.
Hopefully this series can go through the arm soc tree and the maintainers
have no issue with that. I'm talking specifically about the logical PIO
code, which went through PCI tree on only previous upstreaming.
[1] https://lore.kernel.org/lkml/1560770148-57960-1-git-send-email-john.garry@h…
[2] https://lore.kernel.org/lkml/4b24fd36-e716-7c5e-31cc-13da727802e7@huawei.co…
Changes since v3:
https://lore.kernel.org/lkml/1561566418-22714-1-git-send-email-john.garry@h…
- drop optimisation patch (lib: logic_pio: Enforce LOGIC_PIO_INDIRECT
region ops are set at registration)
Not a fix
- rebase to v5.3-rc2
- cc stable
Change since v2:
- Add hisi_lpc_acpi_remove() stub to fix build for !CONFIG_ACPI
Changes since v1:
- Add more reasoning in RCU fix patch
- Create separate patch to change LOGIC_PIO_CPU_MMIO registration to
accomodate unregistration
John Garry (5):
lib: logic_pio: Fix RCU usage
lib: logic_pio: Avoid possible overlap for unregistering regions
lib: logic_pio: Add logic_pio_unregister_range()
bus: hisi_lpc: Unregister logical PIO range to avoid potential
use-after-free
bus: hisi_lpc: Add .remove method to avoid driver unbind crash
drivers/bus/hisi_lpc.c | 47 +++++++++++++++++++++----
include/linux/logic_pio.h | 1 +
lib/logic_pio.c | 73 +++++++++++++++++++++++++++++----------
3 files changed, 96 insertions(+), 25 deletions(-)
--
2.17.1
This is a note to let you know that I've just added the patch titled
tty/serial: atmel: reschedule TX after RX was started
to my tty git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
in the tty-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
>From 1bc102260d278de0af89c58536f4bbabd2ef28be Mon Sep 17 00:00:00 2001
From: Razvan Stefanescu <razvan.stefanescu(a)microchip.com>
Date: Tue, 13 Aug 2019 10:40:25 +0300
Subject: tty/serial: atmel: reschedule TX after RX was started
When half-duplex RS485 communication is used, after RX is started, TX
tasklet still needs to be scheduled tasklet. This avoids console freezing
when more data is to be transmitted, if the serial communication is not
closed.
Fixes: 69646d7a3689 ("tty/serial: atmel: RS485 HD w/DMA: enable RX after TX is stopped")
Signed-off-by: Razvan Stefanescu <razvan.stefanescu(a)microchip.com>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20190813074025.16218-1-razvan.stefanescu@microchi…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/serial/atmel_serial.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/tty/serial/atmel_serial.c b/drivers/tty/serial/atmel_serial.c
index 19a85d6fe3d2..9a54c9e6d36e 100644
--- a/drivers/tty/serial/atmel_serial.c
+++ b/drivers/tty/serial/atmel_serial.c
@@ -1400,7 +1400,6 @@ atmel_handle_transmit(struct uart_port *port, unsigned int pending)
atmel_port->hd_start_rx = false;
atmel_start_rx(port);
- return;
}
atmel_tasklet_schedule(atmel_port, &atmel_port->tasklet_tx);
--
2.22.1
Testing with RTL8822BE hardware, when available memory is low, we
frequently see a kernel panic and system freeze.
First, rtw_pci_rx_isr encounters a memory allocation failure (trimmed):
rx routine starvation
WARNING: CPU: 7 PID: 9871 at drivers/net/wireless/realtek/rtw88/pci.c:822 rtw_pci_rx_isr.constprop.25+0x35a/0x370 [rtwpci]
[ 2356.580313] RIP: 0010:rtw_pci_rx_isr.constprop.25+0x35a/0x370 [rtwpci]
Then we see a variety of different error conditions and kernel panics,
such as this one (trimmed):
rtw_pci 0000:02:00.0: pci bus timeout, check dma status
skbuff: skb_over_panic: text:00000000091b6e66 len:415 put:415 head:00000000d2880c6f data:000000007a02b1ea tail:0x1df end:0xc0 dev:<NULL>
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:105!
invalid opcode: 0000 [#1] SMP NOPTI
RIP: 0010:skb_panic+0x43/0x45
When skb allocation fails and the "rx routine starvation" is hit, the
function returns immediately without updating the RX ring. At this
point, the RX ring may continue referencing an old skb which was already
handed off to ieee80211_rx_irqsafe(). When it comes to be used again,
bad things happen.
This patch allocates a new, data-sized skb first in RX ISR. After
copying the data in, we pass it to the upper layers. However, if skb
allocation fails, we effectively drop the frame. In both cases, the
original, full size ring skb is reused.
In addition, by fixing the kernel crash, the RX routine should now
generally behave better under low memory conditions.
Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=204053
Signed-off-by: Jian-Hong Pan <jian-hong(a)endlessm.com>
Cc: <stable(a)vger.kernel.org>
---
v2:
- Allocate new data-sized skb and put data into it, then pass it to
mac80211. Reuse the original skb in RX ring by DMA sync.
- Modify the commit message.
- Introduce following [PATCH v3 2/2] rtw88: pci: Use DMA sync instead
of remapping in RX ISR.
v3:
- Same as v2.
drivers/net/wireless/realtek/rtw88/pci.c | 49 +++++++++++-------------
1 file changed, 22 insertions(+), 27 deletions(-)
diff --git a/drivers/net/wireless/realtek/rtw88/pci.c b/drivers/net/wireless/realtek/rtw88/pci.c
index cfe05ba7280d..e9fe3ad896c8 100644
--- a/drivers/net/wireless/realtek/rtw88/pci.c
+++ b/drivers/net/wireless/realtek/rtw88/pci.c
@@ -763,6 +763,7 @@ static void rtw_pci_rx_isr(struct rtw_dev *rtwdev, struct rtw_pci *rtwpci,
u32 pkt_offset;
u32 pkt_desc_sz = chip->rx_pkt_desc_sz;
u32 buf_desc_sz = chip->rx_buf_desc_sz;
+ u32 new_len;
u8 *rx_desc;
dma_addr_t dma;
@@ -790,40 +791,34 @@ static void rtw_pci_rx_isr(struct rtw_dev *rtwdev, struct rtw_pci *rtwpci,
pkt_offset = pkt_desc_sz + pkt_stat.drv_info_sz +
pkt_stat.shift;
- if (pkt_stat.is_c2h) {
- /* keep rx_desc, halmac needs it */
- skb_put(skb, pkt_stat.pkt_len + pkt_offset);
+ /* discard current skb if the new skb cannot be allocated as a
+ * new one in rx ring later
+ */
+ new_len = pkt_stat.pkt_len + pkt_offset;
+ new = dev_alloc_skb(new_len);
+ if (WARN_ONCE(!new, "rx routine starvation\n"))
+ goto next_rp;
+
+ /* put the DMA data including rx_desc from phy to new skb */
+ skb_put_data(new, skb->data, new_len);
- /* pass offset for further operation */
- *((u32 *)skb->cb) = pkt_offset;
- skb_queue_tail(&rtwdev->c2h_queue, skb);
+ if (pkt_stat.is_c2h) {
+ /* pass rx_desc & offset for further operation */
+ *((u32 *)new->cb) = pkt_offset;
+ skb_queue_tail(&rtwdev->c2h_queue, new);
ieee80211_queue_work(rtwdev->hw, &rtwdev->c2h_work);
} else {
- /* remove rx_desc, maybe use skb_pull? */
- skb_put(skb, pkt_stat.pkt_len);
- skb_reserve(skb, pkt_offset);
-
- /* alloc a smaller skb to mac80211 */
- new = dev_alloc_skb(pkt_stat.pkt_len);
- if (!new) {
- new = skb;
- } else {
- skb_put_data(new, skb->data, skb->len);
- dev_kfree_skb_any(skb);
- }
- /* TODO: merge into rx.c */
- rtw_rx_stats(rtwdev, pkt_stat.vif, skb);
+ /* remove rx_desc */
+ skb_pull(new, pkt_offset);
+
+ rtw_rx_stats(rtwdev, pkt_stat.vif, new);
memcpy(new->cb, &rx_status, sizeof(rx_status));
ieee80211_rx_irqsafe(rtwdev->hw, new);
}
- /* skb delivered to mac80211, alloc a new one in rx ring */
- new = dev_alloc_skb(RTK_PCI_RX_BUF_SIZE);
- if (WARN(!new, "rx routine starvation\n"))
- return;
-
- ring->buf[cur_rp] = new;
- rtw_pci_reset_rx_desc(rtwdev, new, ring, cur_rp, buf_desc_sz);
+next_rp:
+ /* new skb delivered to mac80211, re-enable original skb DMA */
+ rtw_pci_reset_rx_desc(rtwdev, skb, ring, cur_rp, buf_desc_sz);
/* host read next element in ring */
if (++cur_rp >= ring->r.len)
--
2.22.0
The patch titled
Subject: mm, vmscan: do not special-case slab reclaim when watermarks are boosted
has been removed from the -mm tree. Its filename was
mm-vmscan-do-not-special-case-slab-reclaim-when-watermarks-are-boosted.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Mel Gorman <mgorman(a)techsingularity.net>
Subject: mm, vmscan: do not special-case slab reclaim when watermarks are boosted
Dave Chinner reported a problem pointing a finger at commit 1c30844d2dfe
("mm: reclaim small amounts of memory when an external fragmentation event
occurs"). The report is extensive (see
https://lore.kernel.org/linux-mm/20190807091858.2857-1-david@fromorbit.com/)
and it's worth recording the most relevant parts (colorful language and
typos included).
When running a simple, steady state 4kB file creation test to
simulate extracting tarballs larger than memory full of small
files into the filesystem, I noticed that once memory fills up
the cache balance goes to hell.
The workload is creating one dirty cached inode for every dirty
page, both of which should require a single IO each to clean and
reclaim, and creation of inodes is throttled by the rate at which
dirty writeback runs at (via balance dirty pages). Hence the ingest
rate of new cached inodes and page cache pages is identical and
steady. As a result, memory reclaim should quickly find a steady
balance between page cache and inode caches.
The moment memory fills, the page cache is reclaimed at a much
faster rate than the inode cache, and evidence suggests that
the inode cache shrinker is not being called when large batches
of pages are being reclaimed. In roughly the same time period
that it takes to fill memory with 50% pages and 50% slab caches,
memory reclaim reduces the page cache down to just dirty pages
and slab caches fill the entirety of memory.
The LRU is largely full of dirty pages, and we're getting spikes
of random writeback from memory reclaim so it's all going to shit.
Behaviour never recovers, the page cache remains pinned at just
dirty pages, and nothing I could tune would make any difference.
vfs_cache_pressure makes no difference - I would set it so high
it should trim the entire inode caches in a single pass, yet it
didn't do anything. It was clear from tracing and live telemetry
that the shrinkers were pretty much not running except when
there was absolutely no memory free at all, and then they did
the minimum necessary to free memory to make progress.
So I went looking at the code, trying to find places where pages
got reclaimed and the shrinkers weren't called. There's only one
- kswapd doing boosted reclaim as per commit 1c30844d2dfe ("mm:
reclaim small amounts of memory when an external fragmentation
event occurs").
The watermark boosting introduced by the commit is triggered in response
to an allocation "fragmentation event". The boosting was not intended to
target THP specifically and triggers even if THP is disabled. However,
with Dave's perfectly reasonable workload, fragmentation events can be
very common given the ratio of slab to page cache allocations so boosting
remains active for long periods of time.
As high-order allocations might use compaction and compaction cannot move
slab pages the decision was made in the commit to special-case kswapd when
watermarks are boosted -- kswapd avoids reclaiming slab as reclaiming slab
does not directly help compaction.
As Dave notes, this decision means that slab can be artificially protected
for long periods of time and messes up the balance with slab and page
caches.
Removing the special casing can still indirectly help avoid
fragmentation by avoiding fragmentation-causing events due to slab
allocation as pages from a slab pageblock will have some slab objects
freed. Furthermore, with the special casing, reclaim behaviour is
unpredictable as kswapd sometimes examines slab and sometimes does not
in a manner that is tricky to tune or analyse.
This patch removes the special casing. The downside is that this is not a
universal performance win. Some benchmarks that depend on the residency
of data when rereading metadata may see a regression when slab reclaim is
restored to its original behaviour. Similarly, some benchmarks that only
read-once or write-once may perform better when page reclaim is too
aggressive. The primary upside is that slab shrinker is less surprising
(arguably more sane but that's a matter of opinion), behaves consistently
regardless of the fragmentation state of the system and properly obeys VM
sysctls.
A fsmark benchmark configuration was constructed similar to what Dave
reported and is codified by the mmtest configuration
config-io-fsmark-small-file-stream. It was evaluated on a 1-socket
machine to avoid dealing with NUMA-related issues and the timing of
reclaim. The storage was an SSD Samsung Evo and a fresh trimmed XFS
filesystem was used for the test data.
This is not an exact replication of Dave's setup. The configuration
scales its parameters depending on the memory size of the SUT to behave
similarly across machines. The parameters mean the first sample reported
by fs_mark is using 50% of RAM which will barely be throttled and look
like a big outlier. Dave used fake NUMA to have multiple kswapd instances
which I didn't replicate. Finally, the number of iterations differ from
Dave's test as the target disk was not large enough. While not identical,
it should be representative.
fsmark
5.3.0-rc3 5.3.0-rc3
vanilla shrinker-v1r1
Min 1-files/sec 4444.80 ( 0.00%) 4765.60 ( 7.22%)
1st-qrtle 1-files/sec 5005.10 ( 0.00%) 5091.70 ( 1.73%)
2nd-qrtle 1-files/sec 4917.80 ( 0.00%) 4855.60 ( -1.26%)
3rd-qrtle 1-files/sec 4667.40 ( 0.00%) 4831.20 ( 3.51%)
Max-1 1-files/sec 11421.50 ( 0.00%) 9999.30 ( -12.45%)
Max-5 1-files/sec 11421.50 ( 0.00%) 9999.30 ( -12.45%)
Max-10 1-files/sec 11421.50 ( 0.00%) 9999.30 ( -12.45%)
Max-90 1-files/sec 4649.60 ( 0.00%) 4780.70 ( 2.82%)
Max-95 1-files/sec 4491.00 ( 0.00%) 4768.20 ( 6.17%)
Max-99 1-files/sec 4491.00 ( 0.00%) 4768.20 ( 6.17%)
Max 1-files/sec 11421.50 ( 0.00%) 9999.30 ( -12.45%)
Hmean 1-files/sec 5004.75 ( 0.00%) 5075.96 ( 1.42%)
Stddev 1-files/sec 1778.70 ( 0.00%) 1369.66 ( 23.00%)
CoeffVar 1-files/sec 33.70 ( 0.00%) 26.05 ( 22.71%)
BHmean-99 1-files/sec 5053.72 ( 0.00%) 5101.52 ( 0.95%)
BHmean-95 1-files/sec 5053.72 ( 0.00%) 5101.52 ( 0.95%)
BHmean-90 1-files/sec 5107.05 ( 0.00%) 5131.41 ( 0.48%)
BHmean-75 1-files/sec 5208.45 ( 0.00%) 5206.68 ( -0.03%)
BHmean-50 1-files/sec 5405.53 ( 0.00%) 5381.62 ( -0.44%)
BHmean-25 1-files/sec 6179.75 ( 0.00%) 6095.14 ( -1.37%)
5.3.0-rc3 5.3.0-rc3
vanillashrinker-v1r1
Duration User 501.82 497.29
Duration System 4401.44 4424.08
Duration Elapsed 8124.76 8358.05
This is showing a slight skew for the max result representing a large
outlier for the 1st, 2nd and 3rd quartile are similar indicating that the
bulk of the results show little difference. Note that an earlier version
of the fsmark configuration showed a regression but that included more
samples taken while memory was still filling.
Note that the elapsed time is higher. Part of this is that the
configuration included time to delete all the test files when the test
completes -- the test automation handles the possibility of testing fsmark
with multiple thread counts. Without the patch, many of these objects
would be memory resident which is part of what the patch is addressing.
There are other important observations that justify the patch.
1. With the vanilla kernel, the number of dirty pages in the system
is very low for much of the test. With this patch, dirty pages
is generally kept at 10% which matches vm.dirty_background_ratio
which is normal expected historical behaviour.
2. With the vanilla kernel, the ratio of Slab/Pagecache is close to
0.95 for much of the test i.e. Slab is being left alone and dominating
memory consumption. With the patch applied, the ratio varies between
0.35 and 0.45 with the bulk of the measured ratios roughly half way
between those values. This is a different balance to what Dave reported
but it was at least consistent.
3. Slabs are scanned throughout the entire test with the patch applied.
The vanille kernel has periods with no scan activity and then relatively
massive spikes.
4. Without the patch, kswapd scan rates are very variable. With the patch,
the scan rates remain quite stead.
4. Overall vmstats are closer to normal expectations
5.3.0-rc3 5.3.0-rc3
vanilla shrinker-v1r1
Ops Direct pages scanned 99388.00 328410.00
Ops Kswapd pages scanned 45382917.00 33451026.00
Ops Kswapd pages reclaimed 30869570.00 25239655.00
Ops Direct pages reclaimed 74131.00 5830.00
Ops Kswapd efficiency % 68.02 75.45
Ops Kswapd velocity 5585.75 4002.25
Ops Page reclaim immediate 1179721.00 430927.00
Ops Slabs scanned 62367361.00 73581394.00
Ops Direct inode steals 2103.00 1002.00
Ops Kswapd inode steals 570180.00 5183206.00
o Vanilla kernel is hitting direct reclaim more frequently,
not very much in absolute terms but the fact the patch
reduces it is interesting
o "Page reclaim immediate" in the vanilla kernel indicates
dirty pages are being encountered at the tail of the LRU.
This is generally bad and means in this case that the LRU
is not long enough for dirty pages to be cleaned by the
background flush in time. This is much reduced by the
patch.
o With the patch, kswapd is reclaiming 10 times more slab
pages than with the vanilla kernel. This is indicative
of the watermark boosting over-protecting slab
A more complete set of tests were run that were part of the basis
for introducing boosting and while there are some differences, they
are well within tolerances.
Bottom line, the special casing kswapd to avoid slab behaviour is
unpredictable and can lead to abnormal results for normal workloads. This
patch restores the expected behaviour that slab and page cache is balanced
consistently for a workload with a steady allocation ratio of
slab/pagecache pages. It also means that if there are workloads that
favour the preservation of slab over pagecache that it can be tuned via
vm.vfs_cache_pressure where as the vanilla kernel effectively ignores the
parameter when boosting is active.
Link: http://lkml.kernel.org/r/20190808182946.GM2739@techsingularity.net
Fixes: 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs")
Signed-off-by: Mel Gorman <mgorman(a)techsingularity.net>
Reviewed-by: Dave Chinner <dchinner(a)redhat.com>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: <stable(a)vger.kernel.org> [5.0+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmscan.c | 13 ++-----------
1 file changed, 2 insertions(+), 11 deletions(-)
--- a/mm/vmscan.c~mm-vmscan-do-not-special-case-slab-reclaim-when-watermarks-are-boosted
+++ a/mm/vmscan.c
@@ -88,9 +88,6 @@ struct scan_control {
/* Can pages be swapped as part of reclaim? */
unsigned int may_swap:1;
- /* e.g. boosted watermark reclaim leaves slabs alone */
- unsigned int may_shrinkslab:1;
-
/*
* Cgroups are not reclaimed below their configured memory.low,
* unless we threaten to OOM. If any cgroups are skipped due to
@@ -2714,10 +2711,8 @@ static bool shrink_node(pg_data_t *pgdat
shrink_node_memcg(pgdat, memcg, sc, &lru_pages);
node_lru_pages += lru_pages;
- if (sc->may_shrinkslab) {
- shrink_slab(sc->gfp_mask, pgdat->node_id,
- memcg, sc->priority);
- }
+ shrink_slab(sc->gfp_mask, pgdat->node_id, memcg,
+ sc->priority);
/* Record the group's reclaim efficiency */
vmpressure(sc->gfp_mask, memcg, false,
@@ -3194,7 +3189,6 @@ unsigned long try_to_free_pages(struct z
.may_writepage = !laptop_mode,
.may_unmap = 1,
.may_swap = 1,
- .may_shrinkslab = 1,
};
/*
@@ -3238,7 +3232,6 @@ unsigned long mem_cgroup_shrink_node(str
.may_unmap = 1,
.reclaim_idx = MAX_NR_ZONES - 1,
.may_swap = !noswap,
- .may_shrinkslab = 1,
};
unsigned long lru_pages;
@@ -3286,7 +3279,6 @@ unsigned long try_to_free_mem_cgroup_pag
.may_writepage = !laptop_mode,
.may_unmap = 1,
.may_swap = may_swap,
- .may_shrinkslab = 1,
};
set_task_reclaim_state(current, &sc.reclaim_state);
@@ -3598,7 +3590,6 @@ restart:
*/
sc.may_writepage = !laptop_mode && !nr_boost_reclaim;
sc.may_swap = !nr_boost_reclaim;
- sc.may_shrinkslab = !nr_boost_reclaim;
/*
* Do some background aging of the anon list, to give
_
Patches currently in -mm which might be from mgorman(a)techsingularity.net are
The patch titled
Subject: seq_file: fix problem when seeking mid-record
has been removed from the -mm tree. Its filename was
seq_file-fix-problem-when-seeking-mid-record.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: NeilBrown <neilb(a)suse.com>
Subject: seq_file: fix problem when seeking mid-record
If you use lseek or similar (e.g. pread) to access a location in a
seq_file file that is within a record, rather than at a record boundary,
then the first read will return the remainder of the record, and the
second read will return the whole of that same record (instead of the next
record). When seeking to a record boundary, the next record is correctly
returned.
This bug was introduced by a recent patch (identified below). Before that
patch, seq_read() would increment m->index when the last of the buffer was
returned (m->count == 0). After that patch, we rely on ->next to
increment m->index after filling the buffer - but there was one place
where that didn't happen.
Link: https://lkml.kernel.org/lkml/877e7xl029.fsf@notabene.neil.brown.name/
Fixes: 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code and interface")
Signed-off-by: NeilBrown <neilb(a)suse.com>
Reported-by: Sergei Turchanov <turchanov(a)farpost.com>
Tested-by: Sergei Turchanov <turchanov(a)farpost.com>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Markus Elfring <Markus.Elfring(a)web.de>
Cc: <stable(a)vger.kernel.org> [4.19+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/seq_file.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/seq_file.c~seq_file-fix-problem-when-seeking-mid-record
+++ a/fs/seq_file.c
@@ -119,6 +119,7 @@ static int traverse(struct seq_file *m,
}
if (seq_has_overflowed(m))
goto Eoverflow;
+ p = m->op->next(m, p, &m->index);
if (pos + m->count > offset) {
m->from = offset - pos;
m->count -= m->from;
@@ -126,7 +127,6 @@ static int traverse(struct seq_file *m,
}
pos += m->count;
m->count = 0;
- p = m->op->next(m, p, &m->index);
if (pos == offset)
break;
}
_
Patches currently in -mm which might be from neilb(a)suse.com are
The patch titled
Subject: mm/usercopy: use memory range to be accessed for wraparound check
has been removed from the -mm tree. Its filename was
mm-usercopy-use-memory-range-to-be-accessed-for-wraparound-check.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: "Isaac J. Manjarres" <isaacm(a)codeaurora.org>
Subject: mm/usercopy: use memory range to be accessed for wraparound check
Currently, when checking to see if accessing n bytes starting at address
"ptr" will cause a wraparound in the memory addresses, the check in
check_bogus_address() adds an extra byte, which is incorrect, as the range
of addresses that will be accessed is [ptr, ptr + (n - 1)].
This can lead to incorrectly detecting a wraparound in the memory address,
when trying to read 4 KB from memory that is mapped to the the last
possible page in the virtual address space, when in fact, accessing that
range of memory would not cause a wraparound to occur.
Use the memory range that will actually be accessed when considering if
accessing a certain amount of bytes will cause the memory address to wrap
around.
Link: http://lkml.kernel.org/r/1564509253-23287-1-git-send-email-isaacm@codeauror…
Fixes: f5509cc18daa ("mm: Hardened usercopy")
Signed-off-by: Prasad Sodagudi <psodagud(a)codeaurora.org>
Signed-off-by: Isaac J. Manjarres <isaacm(a)codeaurora.org>
Co-developed-by: Prasad Sodagudi <psodagud(a)codeaurora.org>
Reviewed-by: William Kucharski <william.kucharski(a)oracle.com>
Acked-by: Kees Cook <keescook(a)chromium.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Trilok Soni <tsoni(a)codeaurora.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/usercopy.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/usercopy.c~mm-usercopy-use-memory-range-to-be-accessed-for-wraparound-check
+++ a/mm/usercopy.c
@@ -147,7 +147,7 @@ static inline void check_bogus_address(c
bool to_user)
{
/* Reject if object wraps past end of memory. */
- if (ptr + n < ptr)
+ if (ptr + (n - 1) < ptr)
usercopy_abort("wrapped address", NULL, to_user, 0, ptr + n);
/* Reject if NULL or ZERO-allocation. */
_
Patches currently in -mm which might be from isaacm(a)codeaurora.org are
The patch titled
Subject: mm/memcontrol.c: fix use after free in mem_cgroup_iter()
has been removed from the -mm tree. Its filename was
mm-memcontrol-fix-use-after-free-in-mem_cgroup_iter.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Miles Chen <miles.chen(a)mediatek.com>
Subject: mm/memcontrol.c: fix use after free in mem_cgroup_iter()
This patch is sent to report an use after free in mem_cgroup_iter() after
merging commit be2657752e9e ("mm: memcg: fix use after free in
mem_cgroup_iter()").
I work with android kernel tree (4.9 & 4.14), and commit be2657752e9e
("mm: memcg: fix use after free in mem_cgroup_iter()") has been merged to
the trees. However, I can still observe use after free issues addressed
in the commit be2657752e9e. (on low-end devices, a few times this month)
backtrace:
css_tryget <- crash here
mem_cgroup_iter
shrink_node
shrink_zones
do_try_to_free_pages
try_to_free_pages
__perform_reclaim
__alloc_pages_direct_reclaim
__alloc_pages_slowpath
__alloc_pages_nodemask
To debug, I poisoned mem_cgroup before freeing it:
static void __mem_cgroup_free(struct mem_cgroup *memcg)
for_each_node(node)
free_mem_cgroup_per_node_info(memcg, node);
free_percpu(memcg->stat);
+ /* poison memcg before freeing it */
+ memset(memcg, 0x78, sizeof(struct mem_cgroup));
kfree(memcg);
}
The coredump shows the position=0xdbbc2a00 is freed.
(gdb) p/x ((struct mem_cgroup_per_node *)0xe5009e00)->iter[8]
$13 = {position = 0xdbbc2a00, generation = 0x2efd}
0xdbbc2a00: 0xdbbc2e00 0x00000000 0xdbbc2800 0x00000100
0xdbbc2a10: 0x00000200 0x78787878 0x00026218 0x00000000
0xdbbc2a20: 0xdcad6000 0x00000001 0x78787800 0x00000000
0xdbbc2a30: 0x78780000 0x00000000 0x0068fb84 0x78787878
0xdbbc2a40: 0x78787878 0x78787878 0x78787878 0xe3fa5cc0
0xdbbc2a50: 0x78787878 0x78787878 0x00000000 0x00000000
0xdbbc2a60: 0x00000000 0x00000000 0x00000000 0x00000000
0xdbbc2a70: 0x00000000 0x00000000 0x00000000 0x00000000
0xdbbc2a80: 0x00000000 0x00000000 0x00000000 0x00000000
0xdbbc2a90: 0x00000001 0x00000000 0x00000000 0x00100000
0xdbbc2aa0: 0x00000001 0xdbbc2ac8 0x00000000 0x00000000
0xdbbc2ab0: 0x00000000 0x00000000 0x00000000 0x00000000
0xdbbc2ac0: 0x00000000 0x00000000 0xe5b02618 0x00001000
0xdbbc2ad0: 0x00000000 0x78787878 0x78787878 0x78787878
0xdbbc2ae0: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2af0: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2b00: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2b10: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2b20: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2b30: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2b40: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2b50: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2b60: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2b70: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2b80: 0x78787878 0x78787878 0x00000000 0x78787878
0xdbbc2b90: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2ba0: 0x78787878 0x78787878 0x78787878 0x78787878
In the reclaim path, try_to_free_pages() does not setup
sc.target_mem_cgroup and sc is passed to do_try_to_free_pages(), ...,
shrink_node().
In mem_cgroup_iter(), root is set to root_mem_cgroup because
sc->target_mem_cgroup is NULL. It is possible to assign a memcg to
root_mem_cgroup.nodeinfo.iter in mem_cgroup_iter().
try_to_free_pages
struct scan_control sc = {...}, target_mem_cgroup is 0x0;
do_try_to_free_pages
shrink_zones
shrink_node
mem_cgroup *root = sc->target_mem_cgroup;
memcg = mem_cgroup_iter(root, NULL, &reclaim);
mem_cgroup_iter()
if (!root)
root = root_mem_cgroup;
...
css = css_next_descendant_pre(css, &root->css);
memcg = mem_cgroup_from_css(css);
cmpxchg(&iter->position, pos, memcg);
My device uses memcg non-hierarchical mode. When we release a memcg:
invalidate_reclaim_iterators() reaches only dead_memcg and its parents.
If non-hierarchical mode is used, invalidate_reclaim_iterators() never
reaches root_mem_cgroup.
static void invalidate_reclaim_iterators(struct mem_cgroup *dead_memcg)
{
struct mem_cgroup *memcg = dead_memcg;
for (; memcg; memcg = parent_mem_cgroup(memcg)
...
}
So the use after free scenario looks like:
CPU1 CPU2
try_to_free_pages
do_try_to_free_pages
shrink_zones
shrink_node
mem_cgroup_iter()
if (!root)
root = root_mem_cgroup;
...
css = css_next_descendant_pre(css, &root->css);
memcg = mem_cgroup_from_css(css);
cmpxchg(&iter->position, pos, memcg);
invalidate_reclaim_iterators(memcg);
...
__mem_cgroup_free()
kfree(memcg);
try_to_free_pages
do_try_to_free_pages
shrink_zones
shrink_node
mem_cgroup_iter()
if (!root)
root = root_mem_cgroup;
...
mz = mem_cgroup_nodeinfo(root, reclaim->pgdat->node_id);
iter = &mz->iter[reclaim->priority];
pos = READ_ONCE(iter->position);
css_tryget(&pos->css) <- use after free
To avoid this, we should also invalidate root_mem_cgroup.nodeinfo.iter in
invalidate_reclaim_iterators().
[cai(a)lca.pw: fix -Wparentheses compilation warning]
Link: http://lkml.kernel.org/r/1564580753-17531-1-git-send-email-cai@lca.pw
Link: http://lkml.kernel.org/r/20190730015729.4406-1-miles.chen@mediatek.com
Fixes: 5ac8fb31ad2e ("mm: memcontrol: convert reclaim iterator to simple css refcounting")
Signed-off-by: Miles Chen <miles.chen(a)mediatek.com>
Signed-off-by: Qian Cai <cai(a)lca.pw>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 39 +++++++++++++++++++++++++++++----------
1 file changed, 29 insertions(+), 10 deletions(-)
--- a/mm/memcontrol.c~mm-memcontrol-fix-use-after-free-in-mem_cgroup_iter
+++ a/mm/memcontrol.c
@@ -1130,26 +1130,45 @@ void mem_cgroup_iter_break(struct mem_cg
css_put(&prev->css);
}
-static void invalidate_reclaim_iterators(struct mem_cgroup *dead_memcg)
+static void __invalidate_reclaim_iterators(struct mem_cgroup *from,
+ struct mem_cgroup *dead_memcg)
{
- struct mem_cgroup *memcg = dead_memcg;
struct mem_cgroup_reclaim_iter *iter;
struct mem_cgroup_per_node *mz;
int nid;
int i;
- for (; memcg; memcg = parent_mem_cgroup(memcg)) {
- for_each_node(nid) {
- mz = mem_cgroup_nodeinfo(memcg, nid);
- for (i = 0; i <= DEF_PRIORITY; i++) {
- iter = &mz->iter[i];
- cmpxchg(&iter->position,
- dead_memcg, NULL);
- }
+ for_each_node(nid) {
+ mz = mem_cgroup_nodeinfo(from, nid);
+ for (i = 0; i <= DEF_PRIORITY; i++) {
+ iter = &mz->iter[i];
+ cmpxchg(&iter->position,
+ dead_memcg, NULL);
}
}
}
+static void invalidate_reclaim_iterators(struct mem_cgroup *dead_memcg)
+{
+ struct mem_cgroup *memcg = dead_memcg;
+ struct mem_cgroup *last;
+
+ do {
+ __invalidate_reclaim_iterators(memcg, dead_memcg);
+ last = memcg;
+ } while ((memcg = parent_mem_cgroup(memcg)));
+
+ /*
+ * When cgruop1 non-hierarchy mode is used,
+ * parent_mem_cgroup() does not walk all the way up to the
+ * cgroup root (root_mem_cgroup). So we have to handle
+ * dead_memcg from cgroup root separately.
+ */
+ if (last != root_mem_cgroup)
+ __invalidate_reclaim_iterators(root_mem_cgroup,
+ dead_memcg);
+}
+
/**
* mem_cgroup_scan_tasks - iterate over tasks of a memory cgroup hierarchy
* @memcg: hierarchy root
_
Patches currently in -mm which might be from miles.chen(a)mediatek.com are
The patch titled
Subject: mm/z3fold.c: fix z3fold_destroy_pool() race condition
has been removed from the -mm tree. Its filename was
mm-z3foldc-fix-z3fold_destroy_pool-race-condition.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Henry Burns <henryburns(a)google.com>
Subject: mm/z3fold.c: fix z3fold_destroy_pool() race condition
The constraint from the zpool use of z3fold_destroy_pool() is there are
no outstanding handles to memory (so no active allocations), but it is
possible for there to be outstanding work on either of the two wqs in
the pool.
Calling z3fold_deregister_migration() before the workqueues are drained
means that there can be allocated pages referencing a freed inode,
causing any thread in compaction to be able to trip over the bad
pointer in PageMovable().
Link: http://lkml.kernel.org/r/20190726224810.79660-2-henryburns@google.com
Fixes: 1f862989b04a ("mm/z3fold.c: support page migration")
Signed-off-by: Henry Burns <henryburns(a)google.com>
Reviewed-by: Shakeel Butt <shakeelb(a)google.com>
Reviewed-by: Jonathan Adams <jwadams(a)google.com>
Cc: Vitaly Vul <vitaly.vul(a)sony.com>
Cc: Vitaly Wool <vitalywool(a)gmail.com>
Cc: David Howells <dhowells(a)redhat.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: Henry Burns <henrywolfeburns(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/z3fold.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
--- a/mm/z3fold.c~mm-z3foldc-fix-z3fold_destroy_pool-race-condition
+++ a/mm/z3fold.c
@@ -817,16 +817,19 @@ out:
static void z3fold_destroy_pool(struct z3fold_pool *pool)
{
kmem_cache_destroy(pool->c_handle);
- z3fold_unregister_migration(pool);
/*
* We need to destroy pool->compact_wq before pool->release_wq,
* as any pending work on pool->compact_wq will call
* queue_work(pool->release_wq, &pool->work).
+ *
+ * There are still outstanding pages until both workqueues are drained,
+ * so we cannot unregister migration until then.
*/
destroy_workqueue(pool->compact_wq);
destroy_workqueue(pool->release_wq);
+ z3fold_unregister_migration(pool);
kfree(pool);
}
_
Patches currently in -mm which might be from henryburns(a)google.com are
mm-z3foldc-fix-race-between-migration-and-destruction.patch
The patch titled
Subject: mm/z3fold.c: fix z3fold_destroy_pool() ordering
has been removed from the -mm tree. Its filename was
mm-z3foldc-fix-z3fold_destroy_pool-ordering.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Henry Burns <henryburns(a)google.com>
Subject: mm/z3fold.c: fix z3fold_destroy_pool() ordering
The constraint from the zpool use of z3fold_destroy_pool() is there are
no outstanding handles to memory (so no active allocations), but it is
possible for there to be outstanding work on either of the two wqs in
the pool.
If there is work queued on pool->compact_workqueue when it is called,
z3fold_destroy_pool() will do:
z3fold_destroy_pool()
destroy_workqueue(pool->release_wq)
destroy_workqueue(pool->compact_wq)
drain_workqueue(pool->compact_wq)
do_compact_page(zhdr)
kref_put(&zhdr->refcount)
__release_z3fold_page(zhdr, ...)
queue_work_on(pool->release_wq, &pool->work) *BOOM*
So compact_wq needs to be destroyed before release_wq.
Link: http://lkml.kernel.org/r/20190726224810.79660-1-henryburns@google.com
Fixes: 5d03a6613957 ("mm/z3fold.c: use kref to prevent page free/compact race")
Signed-off-by: Henry Burns <henryburns(a)google.com>
Reviewed-by: Shakeel Butt <shakeelb(a)google.com>
Reviewed-by: Jonathan Adams <jwadams(a)google.com>
Cc: Vitaly Vul <vitaly.vul(a)sony.com>
Cc: Vitaly Wool <vitalywool(a)gmail.com>
Cc: David Howells <dhowells(a)redhat.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Al Viro <viro(a)zeniv.linux.org.uk
Cc: Henry Burns <henrywolfeburns(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/z3fold.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
--- a/mm/z3fold.c~mm-z3foldc-fix-z3fold_destroy_pool-ordering
+++ a/mm/z3fold.c
@@ -818,8 +818,15 @@ static void z3fold_destroy_pool(struct z
{
kmem_cache_destroy(pool->c_handle);
z3fold_unregister_migration(pool);
- destroy_workqueue(pool->release_wq);
+
+ /*
+ * We need to destroy pool->compact_wq before pool->release_wq,
+ * as any pending work on pool->compact_wq will call
+ * queue_work(pool->release_wq, &pool->work).
+ */
+
destroy_workqueue(pool->compact_wq);
+ destroy_workqueue(pool->release_wq);
kfree(pool);
}
_
Patches currently in -mm which might be from henryburns(a)google.com are
mm-z3foldc-fix-race-between-migration-and-destruction.patch
The patch titled
Subject: mm: mempolicy: handle vma with unmovable pages mapped correctly in mbind
has been removed from the -mm tree. Its filename was
mm-mempolicy-handle-vma-with-unmovable-pages-mapped-correctly-in-mbind.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Yang Shi <yang.shi(a)linux.alibaba.com>
Subject: mm: mempolicy: handle vma with unmovable pages mapped correctly in mbind
When running syzkaller internally, we ran into the below bug on 4.9.x
kernel:
kernel BUG at mm/huge_memory.c:2124!
invalid opcode: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 1518 Comm: syz-executor107 Not tainted 4.9.168+ #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.5.1 01/01/2011
task: ffff880067b34900 task.stack: ffff880068998000
RIP: 0010:[<ffffffff81895d6b>] [<ffffffff81895d6b>] split_huge_page_to_list+0x8fb/0x1030 mm/huge_memory.c:2124
RSP: 0018:ffff88006899f980 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffffea00018f1700 RCX: 0000000000000000
RDX: 1ffffd400031e2e7 RSI: 0000000000000001 RDI: ffffea00018f1738
RBP: ffff88006899f9e8 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: fffffbfff0d8b13e R12: ffffea00018f1400
R13: ffffea00018f1400 R14: ffffea00018f1720 R15: ffffea00018f1401
FS: 00007fa333996740(0000) GS:ffff88006c600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000040 CR3: 0000000066b9c000 CR4: 00000000000606f0
Stack:
0000000000000246 ffff880067b34900 0000000000000000 ffff88007ffdc000
0000000000000000 ffff88006899f9e8 ffffffff812b4015 ffff880064c64e18
ffffea00018f1401 dffffc0000000000 ffffea00018f1700 0000000020ffd000
Call Trace:
[<ffffffff818490f1>] split_huge_page include/linux/huge_mm.h:100 [inline]
[<ffffffff818490f1>] queue_pages_pte_range+0x7e1/0x1480 mm/mempolicy.c:538
[<ffffffff817ed0da>] walk_pmd_range mm/pagewalk.c:50 [inline]
[<ffffffff817ed0da>] walk_pud_range mm/pagewalk.c:90 [inline]
[<ffffffff817ed0da>] walk_pgd_range mm/pagewalk.c:116 [inline]
[<ffffffff817ed0da>] __walk_page_range+0x44a/0xdb0 mm/pagewalk.c:208
[<ffffffff817edb94>] walk_page_range+0x154/0x370 mm/pagewalk.c:285
[<ffffffff81844515>] queue_pages_range+0x115/0x150 mm/mempolicy.c:694
[<ffffffff8184f493>] do_mbind mm/mempolicy.c:1241 [inline]
[<ffffffff8184f493>] SYSC_mbind+0x3c3/0x1030 mm/mempolicy.c:1370
[<ffffffff81850146>] SyS_mbind+0x46/0x60 mm/mempolicy.c:1352
[<ffffffff810097e2>] do_syscall_64+0x1d2/0x600 arch/x86/entry/common.c:282
[<ffffffff82ff6f93>] entry_SYSCALL_64_after_swapgs+0x5d/0xdb
Code: c7 80 1c 02 00 e8 26 0a 76 01 <0f> 0b 48 c7 c7 40 46 45 84 e8 4c
RIP [<ffffffff81895d6b>] split_huge_page_to_list+0x8fb/0x1030 mm/huge_memory.c:2124
RSP <ffff88006899f980>
with the below test:
---8<---
uint64_t r[1] = {0xffffffffffffffff};
int main(void)
{
syscall(__NR_mmap, 0x20000000, 0x1000000, 3, 0x32, -1, 0);
intptr_t res = 0;
res = syscall(__NR_socket, 0x11, 3, 0x300);
if (res != -1)
r[0] = res;
*(uint32_t*)0x20000040 = 0x10000;
*(uint32_t*)0x20000044 = 1;
*(uint32_t*)0x20000048 = 0xc520;
*(uint32_t*)0x2000004c = 1;
syscall(__NR_setsockopt, r[0], 0x107, 0xd, 0x20000040, 0x10);
syscall(__NR_mmap, 0x20fed000, 0x10000, 0, 0x8811, r[0], 0);
*(uint64_t*)0x20000340 = 2;
syscall(__NR_mbind, 0x20ff9000, 0x4000, 0x4002, 0x20000340,
0x45d4, 3);
return 0;
}
---8<---
Actually the test does:
mmap(0x20000000, 16777216, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x20000000
socket(AF_PACKET, SOCK_RAW, 768) = 3
setsockopt(3, SOL_PACKET, PACKET_TX_RING, {block_size=65536, block_nr=1, frame_size=50464, frame_nr=1}, 16) = 0
mmap(0x20fed000, 65536, PROT_NONE, MAP_SHARED|MAP_FIXED|MAP_POPULATE|MAP_DENYWRITE, 3, 0) = 0x20fed000
mbind(..., MPOL_MF_STRICT|MPOL_MF_MOVE) = 0
The setsockopt() would allocate compound pages (16 pages in this test) for
packet tx ring, then the mmap() would call packet_mmap() to map the pages
into the user address space specified by the mmap() call.
When calling mbind(), it would scan the vma to queue the pages for
migration to the new node. It would split any huge page since 4.9 doesn't
support THP migration, however, the packet tx ring compound pages are not
THP and even not movable. So, the above bug is triggered.
However, the later kernel is not hit by this issue due to
d44d363f65780f2ac2 ("mm: don't assume anonymous pages have SwapBacked
flag"), which just removes the PageSwapBacked check for a different
reason.
But, there is a deeper issue. According to the semantic of mbind(), it
should return -EIO if MPOL_MF_MOVE or MPOL_MF_MOVE_ALL was specified and
MPOL_MF_STRICT was also specified, but the kernel was unable to move all
existing pages in the range. The tx ring of the packet socket is
definitely not movable, however, mbind() returns success for this case.
Although the most socket file associates with non-movable pages, but XDP
may have movable pages from gup. So, it sounds not fine to just check the
underlying file type of vma in vma_migratable().
Change migrate_page_add() to check if the page is movable or not, if it is
unmovable, just return -EIO. But do not abort pte walk immediately, since
there may be pages off LRU temporarily. We should migrate other pages if
MPOL_MF_MOVE* is specified. Set has_unmovable flag if some paged could
not be not moved, then return -EIO for mbind() eventually.
With this change the above test would return -EIO as expected.
[yang.shi(a)linux.alibaba.com: fix review comments from Vlastimil]
Link: http://lkml.kernel.org/r/1563556862-54056-3-git-send-email-yang.shi@linux.a…
Link: http://lkml.kernel.org/r/1561162809-59140-3-git-send-email-yang.shi@linux.a…
Signed-off-by: Yang Shi <yang.shi(a)linux.alibaba.com>
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mempolicy.c | 32 +++++++++++++++++++++++++-------
1 file changed, 25 insertions(+), 7 deletions(-)
--- a/mm/mempolicy.c~mm-mempolicy-handle-vma-with-unmovable-pages-mapped-correctly-in-mbind
+++ a/mm/mempolicy.c
@@ -403,7 +403,7 @@ static const struct mempolicy_operations
},
};
-static void migrate_page_add(struct page *page, struct list_head *pagelist,
+static int migrate_page_add(struct page *page, struct list_head *pagelist,
unsigned long flags);
struct queue_pages {
@@ -463,12 +463,11 @@ static int queue_pages_pmd(pmd_t *pmd, s
flags = qp->flags;
/* go to thp migration */
if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
- if (!vma_migratable(walk->vma)) {
+ if (!vma_migratable(walk->vma) ||
+ migrate_page_add(page, qp->pagelist, flags)) {
ret = 1;
goto unlock;
}
-
- migrate_page_add(page, qp->pagelist, flags);
} else
ret = -EIO;
unlock:
@@ -532,7 +531,14 @@ static int queue_pages_pte_range(pmd_t *
has_unmovable = true;
break;
}
- migrate_page_add(page, qp->pagelist, flags);
+
+ /*
+ * Do not abort immediately since there may be
+ * temporary off LRU pages in the range. Still
+ * need migrate other LRU pages.
+ */
+ if (migrate_page_add(page, qp->pagelist, flags))
+ has_unmovable = true;
} else
break;
}
@@ -961,7 +967,7 @@ static long do_get_mempolicy(int *policy
/*
* page migration, thp tail pages can be passed.
*/
-static void migrate_page_add(struct page *page, struct list_head *pagelist,
+static int migrate_page_add(struct page *page, struct list_head *pagelist,
unsigned long flags)
{
struct page *head = compound_head(page);
@@ -974,8 +980,19 @@ static void migrate_page_add(struct page
mod_node_page_state(page_pgdat(head),
NR_ISOLATED_ANON + page_is_file_cache(head),
hpage_nr_pages(head));
+ } else if (flags & MPOL_MF_STRICT) {
+ /*
+ * Non-movable page may reach here. And, there may be
+ * temporary off LRU pages or non-LRU movable pages.
+ * Treat them as unmovable pages since they can't be
+ * isolated, so they can't be moved at the moment. It
+ * should return -EIO for this case too.
+ */
+ return -EIO;
}
}
+
+ return 0;
}
/* page allocation callback for NUMA node migration */
@@ -1178,9 +1195,10 @@ static struct page *new_page(struct page
}
#else
-static void migrate_page_add(struct page *page, struct list_head *pagelist,
+static int migrate_page_add(struct page *page, struct list_head *pagelist,
unsigned long flags)
{
+ return -EIO;
}
int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from,
_
Patches currently in -mm which might be from yang.shi(a)linux.alibaba.com are
mm-thp-extract-split_queue_-into-a-struct.patch
mm-move-mem_cgroup_uncharge-out-of-__page_cache_release.patch
mm-shrinker-make-shrinker-not-depend-on-memcg-kmem.patch
mm-thp-make-deferred-split-shrinker-memcg-aware.patch
The patch titled
Subject: mm: mempolicy: make the behavior consistent when MPOL_MF_MOVE* and MPOL_MF_STRICT were specified
has been removed from the -mm tree. Its filename was
mm-mempolicy-make-the-behavior-consistent-when-mpol_mf_move-and-mpol_mf_strict-were-specified.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Yang Shi <yang.shi(a)linux.alibaba.com>
Subject: mm: mempolicy: make the behavior consistent when MPOL_MF_MOVE* and MPOL_MF_STRICT were specified
When both MPOL_MF_MOVE* and MPOL_MF_STRICT was specified, mbind() should
try best to migrate misplaced pages, if some of the pages could not be
migrated, then return -EIO.
There are three different sub-cases:
1. vma is not migratable
2. vma is migratable, but there are unmovable pages
3. vma is migratable, pages are movable, but migrate_pages() fails
If #1 happens, kernel would just abort immediately, then return -EIO,
after a7f40cfe3b7ada ("mm: mempolicy: make mbind() return -EIO when
MPOL_MF_STRICT is specified").
If #3 happens, kernel would set policy and migrate pages with best-effort,
but won't rollback the migrated pages and reset the policy back.
Before that commit, they behaves in the same way. It'd better to keep
their behavior consistent. But, rolling back the migrated pages and
resetting the policy back sounds not feasible, so just make #1 behave as
same as #3.
Userspace will know that not everything was successfully migrated (via
-EIO), and can take whatever steps it deems necessary - attempt rollback,
determine which exact page(s) are violating the policy, etc.
Make queue_pages_range() return 1 to indicate there are unmovable pages or
vma is not migratable.
The #2 is not handled correctly in the current kernel, the following patch
will fix it.
[yang.shi(a)linux.alibaba.com: fix review comments from Vlastimil]
Link: http://lkml.kernel.org/r/1563556862-54056-2-git-send-email-yang.shi@linux.a…
Link: http://lkml.kernel.org/r/1561162809-59140-2-git-send-email-yang.shi@linux.a…
Signed-off-by: Yang Shi <yang.shi(a)linux.alibaba.com>
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mempolicy.c | 68 +++++++++++++++++++++++++++++++++--------------
1 file changed, 48 insertions(+), 20 deletions(-)
--- a/mm/mempolicy.c~mm-mempolicy-make-the-behavior-consistent-when-mpol_mf_move-and-mpol_mf_strict-were-specified
+++ a/mm/mempolicy.c
@@ -429,11 +429,14 @@ static inline bool queue_pages_required(
}
/*
- * queue_pages_pmd() has three possible return values:
- * 1 - pages are placed on the right node or queued successfully.
- * 0 - THP was split.
- * -EIO - is migration entry or MPOL_MF_STRICT was specified and an existing
- * page was already on a node that does not follow the policy.
+ * queue_pages_pmd() has four possible return values:
+ * 0 - pages are placed on the right node or queued successfully.
+ * 1 - there is unmovable page, and MPOL_MF_MOVE* & MPOL_MF_STRICT were
+ * specified.
+ * 2 - THP was split.
+ * -EIO - is migration entry or only MPOL_MF_STRICT was specified and an
+ * existing page was already on a node that does not follow the
+ * policy.
*/
static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
unsigned long end, struct mm_walk *walk)
@@ -451,19 +454,17 @@ static int queue_pages_pmd(pmd_t *pmd, s
if (is_huge_zero_page(page)) {
spin_unlock(ptl);
__split_huge_pmd(walk->vma, pmd, addr, false, NULL);
+ ret = 2;
goto out;
}
- if (!queue_pages_required(page, qp)) {
- ret = 1;
+ if (!queue_pages_required(page, qp))
goto unlock;
- }
- ret = 1;
flags = qp->flags;
/* go to thp migration */
if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
if (!vma_migratable(walk->vma)) {
- ret = -EIO;
+ ret = 1;
goto unlock;
}
@@ -479,6 +480,13 @@ out:
/*
* Scan through pages checking if pages follow certain conditions,
* and move them to the pagelist if they do.
+ *
+ * queue_pages_pte_range() has three possible return values:
+ * 0 - pages are placed on the right node or queued successfully.
+ * 1 - there is unmovable page, and MPOL_MF_MOVE* & MPOL_MF_STRICT were
+ * specified.
+ * -EIO - only MPOL_MF_STRICT was specified and an existing page was already
+ * on a node that does not follow the policy.
*/
static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
unsigned long end, struct mm_walk *walk)
@@ -488,17 +496,17 @@ static int queue_pages_pte_range(pmd_t *
struct queue_pages *qp = walk->private;
unsigned long flags = qp->flags;
int ret;
+ bool has_unmovable = false;
pte_t *pte;
spinlock_t *ptl;
ptl = pmd_trans_huge_lock(pmd, vma);
if (ptl) {
ret = queue_pages_pmd(pmd, ptl, addr, end, walk);
- if (ret > 0)
- return 0;
- else if (ret < 0)
+ if (ret != 2)
return ret;
}
+ /* THP was split, fall through to pte walk */
if (pmd_trans_unstable(pmd))
return 0;
@@ -519,14 +527,21 @@ static int queue_pages_pte_range(pmd_t *
if (!queue_pages_required(page, qp))
continue;
if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
- if (!vma_migratable(vma))
+ /* MPOL_MF_STRICT must be specified if we get here */
+ if (!vma_migratable(vma)) {
+ has_unmovable = true;
break;
+ }
migrate_page_add(page, qp->pagelist, flags);
} else
break;
}
pte_unmap_unlock(pte - 1, ptl);
cond_resched();
+
+ if (has_unmovable)
+ return 1;
+
return addr != end ? -EIO : 0;
}
@@ -639,7 +654,13 @@ static int queue_pages_test_walk(unsigne
*
* If pages found in a given range are on a set of nodes (determined by
* @nodes and @flags,) it's isolated and queued to the pagelist which is
- * passed via @private.)
+ * passed via @private.
+ *
+ * queue_pages_range() has three possible return values:
+ * 1 - there is unmovable page, but MPOL_MF_MOVE* & MPOL_MF_STRICT were
+ * specified.
+ * 0 - queue pages successfully or no misplaced page.
+ * -EIO - there is misplaced page and only MPOL_MF_STRICT was specified.
*/
static int
queue_pages_range(struct mm_struct *mm, unsigned long start, unsigned long end,
@@ -1182,6 +1203,7 @@ static long do_mbind(unsigned long start
struct mempolicy *new;
unsigned long end;
int err;
+ int ret;
LIST_HEAD(pagelist);
if (flags & ~(unsigned long)MPOL_MF_VALID)
@@ -1243,10 +1265,15 @@ static long do_mbind(unsigned long start
if (err)
goto mpol_out;
- err = queue_pages_range(mm, start, end, nmask,
+ ret = queue_pages_range(mm, start, end, nmask,
flags | MPOL_MF_INVERT, &pagelist);
- if (!err)
- err = mbind_range(mm, start, end, new);
+
+ if (ret < 0) {
+ err = -EIO;
+ goto up_out;
+ }
+
+ err = mbind_range(mm, start, end, new);
if (!err) {
int nr_failed = 0;
@@ -1259,13 +1286,14 @@ static long do_mbind(unsigned long start
putback_movable_pages(&pagelist);
}
- if (nr_failed && (flags & MPOL_MF_STRICT))
+ if ((ret > 0) || (nr_failed && (flags & MPOL_MF_STRICT)))
err = -EIO;
} else
putback_movable_pages(&pagelist);
+up_out:
up_write(&mm->mmap_sem);
- mpol_out:
+mpol_out:
mpol_put(new);
return err;
}
_
Patches currently in -mm which might be from yang.shi(a)linux.alibaba.com are
mm-thp-extract-split_queue_-into-a-struct.patch
mm-move-mem_cgroup_uncharge-out-of-__page_cache_release.patch
mm-shrinker-make-shrinker-not-depend-on-memcg-kmem.patch
mm-thp-make-deferred-split-shrinker-memcg-aware.patch
The patch titled
Subject: mm/hmm: fix bad subpage pointer in try_to_unmap_one
has been removed from the -mm tree. Its filename was
mm-hmm-fix-bad-subpage-pointer-in-try_to_unmap_one.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Ralph Campbell <rcampbell(a)nvidia.com>
Subject: mm/hmm: fix bad subpage pointer in try_to_unmap_one
When migrating an anonymous private page to a ZONE_DEVICE private page,
the source page->mapping and page->index fields are copied to the
destination ZONE_DEVICE struct page and the page_mapcount() is increased.
This is so rmap_walk() can be used to unmap and migrate the page back to
system memory. However, try_to_unmap_one() computes the subpage pointer
from a swap pte which computes an invalid page pointer and a kernel panic
results such as:
BUG: unable to handle page fault for address: ffffea1fffffffc8
Currently, only single pages can be migrated to device private memory so
no subpage computation is needed and it can be set to "page".
[rcampbell(a)nvidia.com: add comment]
Link: http://lkml.kernel.org/r/20190724232700.23327-4-rcampbell@nvidia.com
Link: http://lkml.kernel.org/r/20190719192955.30462-4-rcampbell@nvidia.com
Fixes: a5430dda8a3a1c ("mm/migrate: support un-addressable ZONE_DEVICE page in migration")
Signed-off-by: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: "Jérôme Glisse" <jglisse(a)redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Cc: Christoph Lameter <cl(a)linux.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Ira Weiny <ira.weiny(a)intel.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Lai Jiangshan <jiangshanlai(a)gmail.com>
Cc: Logan Gunthorpe <logang(a)deltatee.com>
Cc: Martin Schwidefsky <schwidefsky(a)de.ibm.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Pekka Enberg <penberg(a)kernel.org>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/rmap.c | 8 ++++++++
1 file changed, 8 insertions(+)
--- a/mm/rmap.c~mm-hmm-fix-bad-subpage-pointer-in-try_to_unmap_one
+++ a/mm/rmap.c
@@ -1475,7 +1475,15 @@ static bool try_to_unmap_one(struct page
/*
* No need to invalidate here it will synchronize on
* against the special swap migration pte.
+ *
+ * The assignment to subpage above was computed from a
+ * swap PTE which results in an invalid pointer.
+ * Since only PAGE_SIZE pages can currently be
+ * migrated, just set it to page. This will need to be
+ * changed when hugepage migrations to device private
+ * memory are supported.
*/
+ subpage = page;
goto discard;
}
_
Patches currently in -mm which might be from rcampbell(a)nvidia.com are
The patch titled
Subject: mm/hmm: fix ZONE_DEVICE anon page mapping reuse
has been removed from the -mm tree. Its filename was
mm-hmm-fix-zone_device-anon-page-mapping-reuse.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Ralph Campbell <rcampbell(a)nvidia.com>
Subject: mm/hmm: fix ZONE_DEVICE anon page mapping reuse
When a ZONE_DEVICE private page is freed, the page->mapping field can be
set. If this page is reused as an anonymous page, the previous value can
prevent the page from being inserted into the CPU's anon rmap table. For
example, when migrating a pte_none() page to device memory:
migrate_vma(ops, vma, start, end, src, dst, private)
migrate_vma_collect()
src[] = MIGRATE_PFN_MIGRATE
migrate_vma_prepare()
/* no page to lock or isolate so OK */
migrate_vma_unmap()
/* no page to unmap so OK */
ops->alloc_and_copy()
/* driver allocates ZONE_DEVICE page for dst[] */
migrate_vma_pages()
migrate_vma_insert_page()
page_add_new_anon_rmap()
__page_set_anon_rmap()
/* This check sees the page's stale mapping field */
if (PageAnon(page))
return
/* page->mapping is not updated */
The result is that the migration appears to succeed but a subsequent CPU
fault will be unable to migrate the page back to system memory or worse.
Clear the page->mapping field when freeing the ZONE_DEVICE page so stale
pointer data doesn't affect future page use.
Link: http://lkml.kernel.org/r/20190719192955.30462-3-rcampbell@nvidia.com
Fixes: b7a523109fb5c9d2d6dd ("mm: don't clear ->mapping in hmm_devmem_free")
Signed-off-by: Ralph Campbell <rcampbell(a)nvidia.com>
Reviewed-by: John Hubbard <jhubbard(a)nvidia.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: Logan Gunthorpe <logang(a)deltatee.com>
Cc: Ira Weiny <ira.weiny(a)intel.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Jan Kara <jack(a)suse.cz>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: "Jérôme Glisse" <jglisse(a)redhat.com>
Cc: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Cc: Christoph Lameter <cl(a)linux.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Lai Jiangshan <jiangshanlai(a)gmail.com>
Cc: Martin Schwidefsky <schwidefsky(a)de.ibm.com>
Cc: Pekka Enberg <penberg(a)kernel.org>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memremap.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
--- a/mm/memremap.c~mm-hmm-fix-zone_device-anon-page-mapping-reuse
+++ a/mm/memremap.c
@@ -403,6 +403,30 @@ void __put_devmap_managed_page(struct pa
mem_cgroup_uncharge(page);
+ /*
+ * When a device_private page is freed, the page->mapping field
+ * may still contain a (stale) mapping value. For example, the
+ * lower bits of page->mapping may still identify the page as
+ * an anonymous page. Ultimately, this entire field is just
+ * stale and wrong, and it will cause errors if not cleared.
+ * One example is:
+ *
+ * migrate_vma_pages()
+ * migrate_vma_insert_page()
+ * page_add_new_anon_rmap()
+ * __page_set_anon_rmap()
+ * ...checks page->mapping, via PageAnon(page) call,
+ * and incorrectly concludes that the page is an
+ * anonymous page. Therefore, it incorrectly,
+ * silently fails to set up the new anon rmap.
+ *
+ * For other types of ZONE_DEVICE pages, migration is either
+ * handled differently or not done at all, so there is no need
+ * to clear page->mapping.
+ */
+ if (is_device_private_page(page))
+ page->mapping = NULL;
+
page->pgmap->ops->page_free(page);
} else if (!count)
__put_page(page);
_
Patches currently in -mm which might be from rcampbell(a)nvidia.com are
This is a note to let you know that I've just added the patch titled
tty/serial: atmel: reschedule TX after RX was started
to my tty git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
in the tty-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the tty-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From a8441389f0daef2d74ecd79ab27067953f0b2483 Mon Sep 17 00:00:00 2001
From: Razvan Stefanescu <razvan.stefanescu(a)microchip.com>
Date: Tue, 13 Aug 2019 10:40:25 +0300
Subject: tty/serial: atmel: reschedule TX after RX was started
When half-duplex RS485 communication is used, after RX is started, TX
tasklet still needs to be scheduled tasklet. This avoids console freezing
when more data is to be transmitted, if the serial communication is not
closed.
Fixes: 69646d7a3689 ("tty/serial: atmel: RS485 HD w/DMA: enable RX after TX is stopped")
Signed-off-by: Razvan Stefanescu <razvan.stefanescu(a)microchip.com>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20190813074025.16218-1-razvan.stefanescu@microchi…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/serial/atmel_serial.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/tty/serial/atmel_serial.c b/drivers/tty/serial/atmel_serial.c
index 19a85d6fe3d2..9a54c9e6d36e 100644
--- a/drivers/tty/serial/atmel_serial.c
+++ b/drivers/tty/serial/atmel_serial.c
@@ -1400,7 +1400,6 @@ atmel_handle_transmit(struct uart_port *port, unsigned int pending)
atmel_port->hd_start_rx = false;
atmel_start_rx(port);
- return;
}
atmel_tasklet_schedule(atmel_port, &atmel_port->tasklet_tx);
--
2.22.1
This is a note to let you know that I've just added the patch titled
usb: chipidea: imx: fix EPROBE_DEFER support during driver probe
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 141822aa3f79efc8a2ec3ed464f2fd2c93ccd803 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Andr=C3=A9=20Draszik?= <git(a)andred.net>
Date: Sat, 10 Aug 2019 16:07:58 +0100
Subject: usb: chipidea: imx: fix EPROBE_DEFER support during driver probe
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
If driver probe needs to be deferred, e.g. because ci_hdrc_add_device()
isn't ready yet, this driver currently misbehaves badly:
a) success is still reported to the driver core (meaning a 2nd
probe attempt will never be done), leaving the driver in
a dysfunctional state and the hardware unusable
b) driver remove / shutdown OOPSes:
[ 206.786916] Unable to handle kernel paging request at virtual address fffffdff
[ 206.794148] pgd = 880b9f82
[ 206.796890] [fffffdff] *pgd=abf5e861, *pte=00000000, *ppte=00000000
[ 206.803179] Internal error: Oops: 37 [#1] PREEMPT SMP ARM
[ 206.808581] Modules linked in: wl18xx evbug
[ 206.813308] CPU: 1 PID: 1 Comm: systemd-shutdow Not tainted 4.19.35+gf345c93b4195 #1
[ 206.821053] Hardware name: Freescale i.MX7 Dual (Device Tree)
[ 206.826813] PC is at ci_hdrc_remove_device+0x4/0x20
[ 206.831699] LR is at ci_hdrc_imx_remove+0x20/0xe8
[ 206.836407] pc : [<805cd4b0>] lr : [<805d62cc>] psr: 20000013
[ 206.842678] sp : a806be40 ip : 00000001 fp : 80adbd3c
[ 206.847906] r10: 80b1b794 r9 : 80d5dfe0 r8 : a8192c44
[ 206.853136] r7 : 80db93a0 r6 : a8192c10 r5 : a8192c00 r4 : a93a4a00
[ 206.859668] r3 : 00000000 r2 : a8192ce4 r1 : ffffffff r0 : fffffdfb
[ 206.866201] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
[ 206.873341] Control: 10c5387d Table: a9e0c06a DAC: 00000051
[ 206.879092] Process systemd-shutdow (pid: 1, stack limit = 0xb271353c)
[ 206.885624] Stack: (0xa806be40 to 0xa806c000)
[ 206.889992] be40: a93a4a00 805d62cc a8192c1c a8170e10 a8192c10 8049a490 80d04d08 00000000
[ 206.898179] be60: 00000000 80d0da2c fee1dead 00000000 a806a000 00000058 00000000 80148b08
[ 206.906366] be80: 01234567 80148d8c a9858600 00000000 00000000 00000000 00000000 80d04d08
[ 206.914553] bea0: 00000000 00000000 a82741e0 a9858600 00000024 00000002 a9858608 00000005
[ 206.922740] bec0: 0000001e 8022c058 00000000 00000000 a806bf14 a9858600 00000000 a806befc
[ 206.930927] bee0: a806bf78 00000000 7ee12c30 8022c18c a806bef8 a806befc 00000000 00000001
[ 206.939115] bf00: 00000000 00000024 a806bf14 00000005 7ee13b34 7ee12c68 00000004 7ee13f20
[ 206.947302] bf20: 00000010 7ee12c7c 00000005 7ee12d04 0000000a 76e7dc00 00000001 80d0f140
[ 206.955490] bf40: ab637880 a974de40 60000013 80d0f140 ab6378a0 80d04d08 a8080470 a9858600
[ 206.963677] bf60: a9858600 00000000 00000000 8022c24c 00000000 80144310 00000000 00000000
[ 206.971864] bf80: 80101204 80d04d08 00000000 80d04d08 00000000 00000000 00000003 00000058
[ 206.980051] bfa0: 80101204 80101000 00000000 00000000 fee1dead 28121969 01234567 00000000
[ 206.988237] bfc0: 00000000 00000000 00000003 00000058 00000000 00000000 00000000 00000000
[ 206.996425] bfe0: 0049ffb0 7ee13d58 0048a84b 76f245a6 60000030 fee1dead 00000000 00000000
[ 207.004622] [<805cd4b0>] (ci_hdrc_remove_device) from [<805d62cc>] (ci_hdrc_imx_remove+0x20/0xe8)
[ 207.013509] [<805d62cc>] (ci_hdrc_imx_remove) from [<8049a490>] (device_shutdown+0x16c/0x218)
[ 207.022050] [<8049a490>] (device_shutdown) from [<80148b08>] (kernel_restart+0xc/0x50)
[ 207.029980] [<80148b08>] (kernel_restart) from [<80148d8c>] (sys_reboot+0xf4/0x1f0)
[ 207.037648] [<80148d8c>] (sys_reboot) from [<80101000>] (ret_fast_syscall+0x0/0x54)
[ 207.045308] Exception stack(0xa806bfa8 to 0xa806bff0)
[ 207.050368] bfa0: 00000000 00000000 fee1dead 28121969 01234567 00000000
[ 207.058554] bfc0: 00000000 00000000 00000003 00000058 00000000 00000000 00000000 00000000
[ 207.066737] bfe0: 0049ffb0 7ee13d58 0048a84b 76f245a6
[ 207.071799] Code: ebffffa8 e3a00000 e8bd8010 e92d4010 (e5904004)
[ 207.078021] ---[ end trace be47424e3fd46e9f ]---
[ 207.082647] Kernel panic - not syncing: Fatal exception
[ 207.087894] ---[ end Kernel panic - not syncing: Fatal exception ]---
c) the error path in combination with driver removal causes
imbalanced calls to the clk_*() and pm_()* APIs
a) happens because the original intended return value is
overwritten (with 0) by the return code of
regulator_disable() in ci_hdrc_imx_probe()'s error path
b) happens because ci_pdev is -EPROBE_DEFER, which causes
ci_hdrc_remove_device() to OOPS
Fix a) by being more careful in ci_hdrc_imx_probe()'s error
path and not overwriting the real error code
Fix b) by calling the respective cleanup functions during
remove only when needed (when ci_pdev != NULL, i.e. when
everything was initialised correctly). This also has the
side effect of not causing imbalanced clk_*() and pm_*()
API calls as part of the error code path.
Fixes: 7c8e8909417e ("usb: chipidea: imx: add HSIC support")
Signed-off-by: André Draszik <git(a)andred.net>
Cc: stable <stable(a)vger.kernel.org>
CC: Peter Chen <Peter.Chen(a)nxp.com>
CC: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
CC: Shawn Guo <shawnguo(a)kernel.org>
CC: Sascha Hauer <s.hauer(a)pengutronix.de>
CC: Pengutronix Kernel Team <kernel(a)pengutronix.de>
CC: Fabio Estevam <festevam(a)gmail.com>
CC: NXP Linux Team <linux-imx(a)nxp.com>
CC: linux-usb(a)vger.kernel.org
CC: linux-arm-kernel(a)lists.infradead.org
CC: linux-kernel(a)vger.kernel.org
Link: https://lore.kernel.org/r/20190810150758.17694-1-git@andred.net
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/chipidea/ci_hdrc_imx.c | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/drivers/usb/chipidea/ci_hdrc_imx.c b/drivers/usb/chipidea/ci_hdrc_imx.c
index b5abfe89190c..df8812c30640 100644
--- a/drivers/usb/chipidea/ci_hdrc_imx.c
+++ b/drivers/usb/chipidea/ci_hdrc_imx.c
@@ -454,9 +454,11 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev)
imx_disable_unprepare_clks(dev);
disable_hsic_regulator:
if (data->hsic_pad_regulator)
- ret = regulator_disable(data->hsic_pad_regulator);
+ /* don't overwrite original ret (cf. EPROBE_DEFER) */
+ regulator_disable(data->hsic_pad_regulator);
if (pdata.flags & CI_HDRC_PMQOS)
pm_qos_remove_request(&data->pm_qos_req);
+ data->ci_pdev = NULL;
return ret;
}
@@ -469,14 +471,17 @@ static int ci_hdrc_imx_remove(struct platform_device *pdev)
pm_runtime_disable(&pdev->dev);
pm_runtime_put_noidle(&pdev->dev);
}
- ci_hdrc_remove_device(data->ci_pdev);
+ if (data->ci_pdev)
+ ci_hdrc_remove_device(data->ci_pdev);
if (data->override_phy_control)
usb_phy_shutdown(data->phy);
- imx_disable_unprepare_clks(&pdev->dev);
- if (data->plat_data->flags & CI_HDRC_PMQOS)
- pm_qos_remove_request(&data->pm_qos_req);
- if (data->hsic_pad_regulator)
- regulator_disable(data->hsic_pad_regulator);
+ if (data->ci_pdev) {
+ imx_disable_unprepare_clks(&pdev->dev);
+ if (data->plat_data->flags & CI_HDRC_PMQOS)
+ pm_qos_remove_request(&data->pm_qos_req);
+ if (data->hsic_pad_regulator)
+ regulator_disable(data->hsic_pad_regulator);
+ }
return 0;
}
--
2.22.1
This is a note to let you know that I've just added the patch titled
usb: cdc-acm: make sure a refcount is taken early enough
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From c52873e5a1ef72f845526d9f6a50704433f9c625 Mon Sep 17 00:00:00 2001
From: Oliver Neukum <oneukum(a)suse.com>
Date: Thu, 8 Aug 2019 16:21:19 +0200
Subject: usb: cdc-acm: make sure a refcount is taken early enough
destroy() will decrement the refcount on the interface, so that
it needs to be taken so early that it never undercounts.
Fixes: 7fb57a019f94e ("USB: cdc-acm: Fix potential deadlock (lockdep warning)")
Cc: stable <stable(a)vger.kernel.org>
Reported-and-tested-by: syzbot+1b2449b7b5dc240d107a(a)syzkaller.appspotmail.com
Signed-off-by: Oliver Neukum <oneukum(a)suse.com>
Link: https://lore.kernel.org/r/20190808142119.7998-1-oneukum@suse.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/class/cdc-acm.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c
index 183b41753c98..62f4fb9b362f 100644
--- a/drivers/usb/class/cdc-acm.c
+++ b/drivers/usb/class/cdc-acm.c
@@ -1301,10 +1301,6 @@ static int acm_probe(struct usb_interface *intf,
tty_port_init(&acm->port);
acm->port.ops = &acm_port_ops;
- minor = acm_alloc_minor(acm);
- if (minor < 0)
- goto alloc_fail1;
-
ctrlsize = usb_endpoint_maxp(epctrl);
readsize = usb_endpoint_maxp(epread) *
(quirks == SINGLE_RX_URB ? 1 : 2);
@@ -1312,6 +1308,13 @@ static int acm_probe(struct usb_interface *intf,
acm->writesize = usb_endpoint_maxp(epwrite) * 20;
acm->control = control_interface;
acm->data = data_interface;
+
+ usb_get_intf(acm->control); /* undone in destruct() */
+
+ minor = acm_alloc_minor(acm);
+ if (minor < 0)
+ goto alloc_fail1;
+
acm->minor = minor;
acm->dev = usb_dev;
if (h.usb_cdc_acm_descriptor)
@@ -1458,7 +1461,6 @@ static int acm_probe(struct usb_interface *intf,
usb_driver_claim_interface(&acm_driver, data_interface, acm);
usb_set_intfdata(data_interface, acm);
- usb_get_intf(control_interface);
tty_dev = tty_port_register_device(&acm->port, acm_tty_driver, minor,
&control_interface->dev);
if (IS_ERR(tty_dev)) {
--
2.22.1
This is a note to let you know that I've just added the patch titled
USB: CDC: fix sanity checks in CDC union parser
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 54364278fb3cabdea51d6398b07c87415065b3fc Mon Sep 17 00:00:00 2001
From: Oliver Neukum <oneukum(a)suse.com>
Date: Tue, 13 Aug 2019 11:35:41 +0200
Subject: USB: CDC: fix sanity checks in CDC union parser
A few checks checked for the size of the pointer to a structure
instead of the structure itself. Copy & paste issue presumably.
Fixes: e4c6fb7794982 ("usbnet: move the CDC parser into USB core")
Cc: stable <stable(a)vger.kernel.org>
Reported-by: syzbot+45a53506b65321c1fe91(a)syzkaller.appspotmail.com
Signed-off-by: Oliver Neukum <oneukum(a)suse.com>
Link: https://lore.kernel.org/r/20190813093541.18889-1-oneukum@suse.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/core/message.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/core/message.c b/drivers/usb/core/message.c
index e844bb7b5676..5adf489428aa 100644
--- a/drivers/usb/core/message.c
+++ b/drivers/usb/core/message.c
@@ -2218,14 +2218,14 @@ int cdc_parse_cdc_header(struct usb_cdc_parsed_header *hdr,
(struct usb_cdc_dmm_desc *)buffer;
break;
case USB_CDC_MDLM_TYPE:
- if (elength < sizeof(struct usb_cdc_mdlm_desc *))
+ if (elength < sizeof(struct usb_cdc_mdlm_desc))
goto next_desc;
if (desc)
return -EINVAL;
desc = (struct usb_cdc_mdlm_desc *)buffer;
break;
case USB_CDC_MDLM_DETAIL_TYPE:
- if (elength < sizeof(struct usb_cdc_mdlm_detail_desc *))
+ if (elength < sizeof(struct usb_cdc_mdlm_detail_desc))
goto next_desc;
if (detail)
return -EINVAL;
--
2.22.1
Hello stable(a)vger.kernel.org
I am Karen Ngui by name got your details based on recommendations
from LinkedIn 5 star marketers and its feels good to be part of
your professional network.
I sent you a proposal via email on the 10th of this month did
you get it ? Do reply to my message so I can elaborate more on my
proposal.
Thank You For Your Time.
On Sun, Aug 11, 2019 at 10:08 AM kbuild test robot <lkp(a)intel.com> wrote:
>
> CC: kbuild-all(a)01.org
> TO: Daniel Borkmann <daniel(a)iogearbox.net>
> CC: "Greg Kroah-Hartman" <gregkh(a)linuxfoundation.org>
> CC: Thomas Gleixner <tglx(a)linutronix.de>
>
> tree: https://kernel.googlesource.com/pub/scm/linux/kernel/git/stable/linux-stabl… linux-4.14.y
> head: 3ffe1e79c174b2093f7ee3df589a7705572c9620
> commit: e28951100515c9fd8f8d4b06ed96576e3527ad82 [8386/9999] x86/retpolines: Disable switch jump tables when retpolines are enabled
> config: x86_64-rhel-7.6 (attached as .config)
> compiler: clang version 10.0.0 (git://gitmirror/llvm_project 45a3fd206fb06f77a08968c99a8172cbf2ccdd0f)
> reproduce:
> git checkout e28951100515c9fd8f8d4b06ed96576e3527ad82
> # save the attached .config to linux build tree
> make ARCH=x86_64
>
> If you fix the issue, kindly add following tag
> Reported-by: kbuild test robot <lkp(a)intel.com>
>
> All warnings (new ones prefixed by >>):
>
> In file included from drivers/gpu/drm/i915/gvt/opregion.c:25:
> In file included from drivers/gpu/drm/i915/i915_drv.h:61:
> In file included from drivers/gpu/drm/i915/intel_uc.h:31:
> In file included from drivers/gpu/drm/i915/i915_vma.h:34:
> drivers/gpu/drm/i915/i915_gem_object.h:290:1: warning: attribute declaration must precede definition [-Wignored-attributes]
> __deprecated
> ^
Was there another patch that fixes this that should have been
backported to stable (4.14) along with this?
> include/linux/compiler-gcc.h:106:37: note: expanded from macro '__deprecated'
> #define __deprecated __attribute__((deprecated))
> ^
> include/drm/drm_gem.h:247:20: note: previous definition is here
> static inline void drm_gem_object_reference(struct drm_gem_object *obj)
> ^
> In file included from drivers/gpu/drm/i915/gvt/opregion.c:25:
> In file included from drivers/gpu/drm/i915/i915_drv.h:61:
> In file included from drivers/gpu/drm/i915/intel_uc.h:31:
> In file included from drivers/gpu/drm/i915/i915_vma.h:34:
> drivers/gpu/drm/i915/i915_gem_object.h:300:1: warning: attribute declaration must precede definition [-Wignored-attributes]
> __deprecated
> ^
> include/linux/compiler-gcc.h:106:37: note: expanded from macro '__deprecated'
> #define __deprecated __attribute__((deprecated))
> ^
> include/drm/drm_gem.h:285:20: note: previous definition is here
> static inline void drm_gem_object_unreference(struct drm_gem_object *obj)
> ^
> In file included from drivers/gpu/drm/i915/gvt/opregion.c:25:
> In file included from drivers/gpu/drm/i915/i915_drv.h:61:
> In file included from drivers/gpu/drm/i915/intel_uc.h:31:
> In file included from drivers/gpu/drm/i915/i915_vma.h:34:
> drivers/gpu/drm/i915/i915_gem_object.h:303:1: warning: attribute declaration must precede definition [-Wignored-attributes]
> __deprecated
> ^
> include/linux/compiler-gcc.h:106:37: note: expanded from macro '__deprecated'
> #define __deprecated __attribute__((deprecated))
> ^
> include/drm/drm_gem.h:273:1: note: previous definition is here
> drm_gem_object_unreference_unlocked(struct drm_gem_object *obj)
> ^
> 3 warnings generated.
> >> drivers/gpu/drm/i915/gvt/opregion.o: warning: objtool: intel_vgpu_emulate_opregion_request()+0xbe: can't find jump dest instruction at .text+0x6dd
>
> ---
> 0-DAY kernel test infrastructure Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all Intel Corporation
--
Thanks,
~Nick Desaulniers
With the existing pintbl, we already have many entries in it. it is
better to figure out a new match to reduce the size of the pintbl.
For example, there are over 10 entries in the pintbl for:
0x10ec0255, 0x1028, "Dell", ALC255_FIXUP_DELL1_MIC_NO_PRESENCE
If we define a new tbl like below, and with the new adding match
function, we can remove those over 10 entries:
SND_HDA_PIN_QUIRK(0x10ec0255, 0x1028, "Dell", ALC255_FIXUP_DELL1_MIC_NO_PRESENCE,
{0x19, 0x40000000},
{0x1a, 0x40000000},),
Here we put 0x19 and 0x1a in the tbl just because these two pins are
undefined on Dell laptops with the codec alc255, and these two pins
will be overwritten by ALC255_FIXUP_DELL1_MIC_NO_PRESENCE.
In summary: the new match will check vendor id and codec id first,
then check the pin_cfg defined in the tbl, only all pin_cfgs in the
tbl are undef and the corresponding pin_cfgs on the laptop are undef
too, this match returns true.
This new match function has lower priority than existing match
functions, so the existing tbls still work as before after applying this
patch.
My plan is to change the existing tbl to undef tbl for MIC_NO_PRESENCE
fixups gradually.
Signed-off-by: Hui Wang <hui.wang(a)canonical.com>
---
sound/pci/hda/hda_auto_parser.c | 32 +++++++++++++++++++++++++++++++-
1 file changed, 31 insertions(+), 1 deletion(-)
diff --git a/sound/pci/hda/hda_auto_parser.c b/sound/pci/hda/hda_auto_parser.c
index 92390d457567..cfada7401b86 100644
--- a/sound/pci/hda/hda_auto_parser.c
+++ b/sound/pci/hda/hda_auto_parser.c
@@ -915,6 +915,36 @@ static bool pin_config_match(struct hda_codec *codec,
return true;
}
+/* match the pintbl which only contains specific pins with undef configuration */
+static bool pin_config_match_undef(struct hda_codec *codec,
+ const struct hda_pintbl *pins)
+{
+ bool match = false;
+
+ for (; pins->nid; pins++) {
+ const struct hda_pincfg *pin;
+ int i;
+
+ if ((pins->val & 0xf0000000) != 0x40000000)
+ return false;
+
+ match = false;
+ snd_array_for_each(&codec->init_pins, i, pin) {
+ if (pin->nid != pins->nid)
+ continue;
+ if ((pin->cfg & 0xf0000000) != 0x40000000)
+ return false;
+ match = true;
+ break;
+ }
+
+ if (match == false)
+ return false;
+ }
+
+ return match;
+}
+
/**
* snd_hda_pick_pin_fixup - Pick up a fixup matching with the pin quirk list
* @codec: the HDA codec
@@ -935,7 +965,7 @@ void snd_hda_pick_pin_fixup(struct hda_codec *codec,
continue;
if (codec->core.vendor_id != pq->codec)
continue;
- if (pin_config_match(codec, pq->pins)) {
+ if (pin_config_match(codec, pq->pins) || pin_config_match_undef(codec, pq->pins)) {
codec->fixup_id = pq->value;
#ifdef CONFIG_SND_DEBUG_VERBOSE
codec->fixup_name = pq->name;
--
2.17.1
Hi Sasha,
On Mon, Aug 12, 2019 at 07:05:47PM +0000, Sasha Levin wrote:
> Hi,
>
> [This is an automated email]
>
> This commit has been processed because it contains a "Fixes:" tag,
> fixing commit: d3862e44daa7 dma-buf/sw-sync: Fix locking around sync_timeline lists.
>
> The bot has tested the following trees: v5.2.8, v4.19.66, v4.14.138, v4.9.189.
>
> v5.2.8: Build OK!
> v4.19.66: Build OK!
> v4.14.138: Build OK!
> v4.9.189: Failed to apply! Possible dependencies:
> Unable to calculate
>
>
> NOTE: The patch will not be queued to stable trees until it is upstream.
>
> How should we proceed with this patch?
The backporting instruction has an explicit # v4.14+ in there, so failure
to apply to older kernels is expected.
Can you perhaps teach this trick to your script perhaps? Iirc we're using
the official format even.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Hello,
We ran automated tests on a patchset that was proposed for merging into this
kernel tree. The patches were applied to:
Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
Commit: d36a8d2fb62c - Linux 5.2.8
The results of these automated tests are provided below.
Overall result: PASSED
Merge: OK
Compile: OK
Tests: OK
All kernel binaries, config files, and logs are available for download here:
https://artifacts.cki-project.org/pipelines/100903
We hope that these logs can help you find the problem quickly. For the full
detail on our testing procedures, please scroll to the bottom of this message.
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
,-. ,-.
( C ) ( K ) Continuous
`-',-.`-' Kernel
( I ) Integration
`-'
______________________________________________________________________________
Merge testing
-------------
We cloned this repository and checked out the following commit:
Repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
Commit: d36a8d2fb62c - Linux 5.2.8
We grabbed the d6535b968bdc commit of the stable queue repository.
We then merged the patchset with `git am`:
revert-pci-add-missing-link-delays-required-by-the-pcie-spec.patch
iio-ingenic-jz47xx-set-clock-divider-on-probe.patch
iio-cros_ec_accel_legacy-fix-incorrect-channel-setting.patch
iio-imu-mpu6050-add-missing-available-scan-masks.patch
iio-adc-gyroadc-fix-uninitialized-return-code.patch
iio-adc-max9611-fix-misuse-of-genmask-macro.patch
staging-gasket-apex-fix-copy-paste-typo.patch
staging-wilc1000-flush-the-workqueue-before-deinit-the-host.patch
staging-android-ion-bail-out-upon-sigkill-when-allocating-memory.patch
staging-fbtft-fix-probing-of-gpio-descriptor.patch
staging-fbtft-fix-reset-assertion-when-using-gpio-descriptor.patch
crypto-ccp-fix-oops-by-properly-managing-allocated-structures.patch
crypto-ccp-add-support-for-valid-authsize-values-less-than-16.patch
crypto-ccp-ignore-tag-length-when-decrypting-gcm-ciphertext.patch
driver-core-platform-return-enxio-for-missing-gpioint.patch
usb-usbfs-fix-double-free-of-usb-memory-upon-submiturb-error.patch
revert-usb-rio500-simplify-locking.patch
usb-iowarrior-fix-deadlock-on-disconnect.patch
sound-fix-a-memory-leak-bug.patch
mmc-cavium-set-the-correct-dma-max-segment-size-for-mmc_host.patch
mmc-cavium-add-the-missing-dma-unmap-when-the-dma-has-finished.patch
loop-set-pf_memalloc_noio-for-the-worker-thread.patch
bdev-fixup-error-handling-in-blkdev_get.patch
input-usbtouchscreen-initialize-pm-mutex-before-using-it.patch
input-elantech-enable-smbus-on-new-2018-systems.patch
input-synaptics-enable-rmi-mode-for-hp-spectre-x360.patch
x86-mm-check-for-pfn-instead-of-page-in-vmalloc_sync_one.patch
x86-mm-sync-also-unmappings-in-vmalloc_sync_all.patch
mm-vmalloc-sync-unmappings-in-__purge_vmap_area_lazy.patch
coresight-fix-debug_locks_warn_on-for-uninitialized-attribute.patch
perf-annotate-fix-s390-gap-between-kernel-end-and-module-start.patch
perf-db-export-fix-thread__exec_comm.patch
perf-record-fix-module-size-on-s390.patch
x86-purgatory-do-not-use-__builtin_memcpy-and-__builtin_memset.patch
x86-purgatory-use-cflags_remove-rather-than-reset-kbuild_cflags.patch
genirq-affinity-create-affinity-mask-for-single-vector.patch
gfs2-gfs2_walk_metadata-fix.patch
usb-host-xhci-rcar-fix-timeout-in-xhci_suspend.patch
usb-yurex-fix-use-after-free-in-yurex_delete.patch
usb-typec-ucsi-ccg-fix-uninitilized-symbol-error.patch
usb-typec-tcpm-free-log-buf-memory-when-remove-debug-file.patch
usb-typec-tcpm-remove-tcpm-dir-if-no-children.patch
usb-typec-tcpm-add-null-check-before-dereferencing-config.patch
usb-typec-tcpm-ignore-unsupported-unknown-alternate-mode-requests.patch
can-rcar_canfd-fix-possible-irq-storm-on-high-load.patch
can-flexcan-fix-stop-mode-acknowledgment.patch
can-flexcan-fix-an-use-after-free-in-flexcan_setup_stop_mode.patch
can-peak_usb-fix-potential-double-kfree_skb.patch
powerpc-fix-off-by-one-in-max_zone_pfn-initializatio.patch
netfilter-nfnetlink-avoid-deadlock-due-to-synchronou.patch
vfio-ccw-set-pa_nr-to-0-if-memory-allocation-fails-f.patch
vfio-ccw-don-t-call-cp_free-if-we-are-processing-a-c.patch
netfilter-fix-rpfilter-dropping-vrf-packets-by-mista.patch
netfilter-nf_tables-fix-module-autoload-for-redir.patch
netfilter-conntrack-always-store-window-size-un-scal.patch
netfilter-nft_hash-fix-symhash-with-modulus-one.patch
scripts-sphinx-pre-install-fix-script-for-rhel-cento.patch
scripts-sphinx-pre-install-don-t-use-latex-with-cent.patch
scripts-sphinx-pre-install-fix-latexmk-dependencies.patch
rq-qos-don-t-reset-has_sleepers-on-spurious-wakeups.patch
rq-qos-set-ourself-task_uninterruptible-after-we-sch.patch
rq-qos-use-a-mb-for-got_token.patch
netfilter-nf_tables-support-auto-loading-for-inet-na.patch
drm-amd-display-no-audio-endpoint-for-dell-mst-displ.patch
drm-amd-display-clock-does-not-lower-in-updateplanes.patch
drm-amd-display-wait-for-backlight-programming-compl.patch
drm-amd-display-fix-dmcu-hang-when-going-into-modern.patch
drm-amd-display-use-encoder-s-engine-id-to-find-matc.patch
drm-amd-display-put-back-front-end-initialization-se.patch
drm-amd-display-allocate-4-ddc-engines-for-rv2.patch
drm-amd-display-fix-dc_create-failure-handling-and-6.patch
drm-amd-display-only-enable-audio-if-speaker-allocat.patch
drm-amd-display-increase-size-of-audios-array.patch
iscsi_ibft-make-iscsi_ibft-dependson-acpi-instead-of.patch
nl80211-fix-nl80211_he_max_capability_len.patch
mac80211-fix-possible-memory-leak-in-ieee80211_assig.patch
mac80211-don-t-warn-about-cw-params-when-not-using-t.patch
allocate_flower_entry-should-check-for-null-deref.patch
hwmon-occ-fix-division-by-zero-issue.patch
hwmon-nct6775-fix-register-address-and-added-missed-.patch
arm-dts-imx6ul-fix-clock-frequency-property-name-of-.patch
powerpc-papr_scm-force-a-scm-unbind-if-initial-scm-b.patch
arm64-force-ssbs-on-context-switch.patch
arm64-entry-sp-alignment-fault-doesn-t-write-to-far_.patch
iommu-vt-d-check-if-domain-pgd-was-allocated.patch
drm-msm-dpu-correct-dpu-encoder-spinlock-initializat.patch
drm-silence-variable-conn-set-but-not-used.patch
arm64-dts-imx8mm-correct-sai3-rxc-txfs-pin-s-mux-opt.patch
arm64-dts-imx8mq-fix-sai-compatible.patch
cpufreq-pasemi-fix-use-after-free-in-pas_cpufreq_cpu.patch
s390-qdio-add-sanity-checks-to-the-fast-requeue-path.patch
alsa-compress-fix-regression-on-compressed-capture-s.patch
alsa-compress-prevent-bypasses-of-set_params.patch
alsa-compress-don-t-allow-paritial-drain-operations-.patch
alsa-compress-be-more-restrictive-about-when-a-drain.patch
perf-script-fix-off-by-one-in-brstackinsn-ipc-comput.patch
perf-tools-fix-proper-buffer-size-for-feature-proces.patch
perf-stat-fix-segfault-for-event-group-in-repeat-mod.patch
perf-session-fix-loading-of-compressed-data-split-ac.patch
perf-probe-avoid-calling-freeing-routine-multiple-ti.patch
drbd-dynamically-allocate-shash-descriptor.patch
acpi-iort-fix-off-by-one-check-in-iort_dev_find_its_.patch
nvme-ignore-subnqn-for-adata-sx6000lnp.patch
nvme-fix-memory-leak-caused-by-incorrect-subsystem-f.patch
arm-davinci-fix-sleep.s-build-error-on-armv4.patch
arm-dts-bcm-bcm47094-add-missing-cells-for-mdio-bus-.patch
scsi-megaraid_sas-fix-panic-on-loading-firmware-cras.patch
scsi-ibmvfc-fix-warn_on-during-event-pool-release.patch
scsi-scsi_dh_alua-always-use-a-2-second-delay-before.patch
test_firmware-fix-a-memory-leak-bug.patch
tty-ldsem-locking-rwsem-add-missing-acquire-to-read_.patch
perf-x86-intel-fix-slots-pebs-event-constraint.patch
perf-x86-intel-fix-invalid-bit-13-for-icelake-msr_of.patch
perf-x86-apply-more-accurate-check-on-hypervisor-pla.patch
perf-core-fix-creating-kernel-counters-for-pmus-that.patch
s390-dma-provide-proper-arch_zone_dma_bits-value.patch
gen_compile_commands-lower-the-entry-count-threshold.patch
hid-sony-fix-race-condition-between-rumble-and-device-remove.patch
alsa-usb-audio-fix-a-memory-leak-bug.patch
kvm-nsvm-properly-map-nested-vmcb.patch
can-peak_usb-pcan_usb_pro-fix-info-leaks-to-usb-devices.patch
can-peak_usb-pcan_usb_fd-fix-info-leaks-to-usb-devices.patch
hwmon-nct7802-fix-wrong-detection-of-in4-presence.patch
hwmon-lm75-fixup-tmp75b-clr_mask.patch
drm-i915-fix-wrong-escape-clock-divisor-init-for-glk.patch
alsa-firewire-fix-a-memory-leak-bug.patch
alsa-hiface-fix-multiple-memory-leak-bugs.patch
alsa-hda-don-t-override-global-pcm-hw-info-flag.patch
alsa-hda-workaround-for-crackled-sound-on-amd-controller-1022-1457.patch
mac80211-don-t-warn-on-short-wmm-parameters-from-ap.patch
dax-dax_layout_busy_page-should-not-unmap-cow-pages.patch
smb3-fix-deadlock-in-validate-negotiate-hits-reconnect.patch
smb3-send-cap_dfs-capability-during-session-setup.patch
nfsv4-fix-delegation-state-recovery.patch
nfsv4-check-the-return-value-of-update_open_stateid.patch
nfsv4-fix-an-oops-in-nfs4_do_setattr.patch
kvm-fix-leak-vcpu-s-vmcs-value-into-other-pcpu.patch
kvm-arm-arm64-sync-ich_vmcr_el2-back-when-about-to-block.patch
mwifiex-fix-802.11n-wpa-detection.patch
iwlwifi-don-t-unmap-as-page-memory-that-was-mapped-as-single.patch
iwlwifi-mvm-fix-an-out-of-bound-access.patch
iwlwifi-mvm-fix-a-use-after-free-bug-in-iwl_mvm_tx_tso_segment.patch
iwlwifi-mvm-don-t-send-geo_tx_power_limit-on-version-41.patch
iwlwifi-mvm-fix-version-check-for-geo_tx_power_limit-support.patch
Compile testing
---------------
We compiled the kernel for 3 architectures:
aarch64:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
ppc64le:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
x86_64:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
Hardware testing
----------------
We booted each kernel and ran the following tests:
aarch64:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
ppc64le:
Host 1:
✅ Boot test [0]
✅ xfstests: xfs [1]
✅ selinux-policy: serge-testsuite [2]
✅ lvm thinp sanity [3]
✅ storage: software RAID testing [4]
🚧 ✅ Storage blktests [5]
Host 2:
✅ Boot test [0]
✅ Podman system integration test (as root) [6]
✅ Podman system integration test (as user) [6]
✅ LTP lite [7]
✅ Loopdev Sanity [8]
✅ jvm test suite [9]
✅ AMTU (Abstract Machine Test Utility) [10]
✅ LTP: openposix test suite [11]
✅ Ethernet drivers sanity [12]
✅ Networking socket: fuzz [13]
✅ audit: audit testsuite test [14]
✅ httpd: mod_ssl smoke sanity [15]
✅ iotop: sanity [16]
✅ tuned: tune-processes-through-perf [17]
✅ pciutils: update pci ids test [18]
✅ Usex - version 1.9-29 [19]
x86_64:
Host 1:
✅ Boot test [0]
✅ xfstests: xfs [1]
✅ selinux-policy: serge-testsuite [2]
✅ lvm thinp sanity [3]
✅ storage: software RAID testing [4]
🚧 ✅ Storage blktests [5]
Host 2:
✅ Boot test [0]
✅ Podman system integration test (as root) [6]
✅ Podman system integration test (as user) [6]
✅ LTP lite [7]
✅ Loopdev Sanity [8]
✅ jvm test suite [9]
✅ AMTU (Abstract Machine Test Utility) [10]
✅ LTP: openposix test suite [11]
✅ Ethernet drivers sanity [12]
✅ Networking socket: fuzz [13]
✅ audit: audit testsuite test [14]
✅ httpd: mod_ssl smoke sanity [15]
✅ iotop: sanity [16]
✅ tuned: tune-processes-through-perf [17]
✅ pciutils: sanity smoke test [20]
✅ pciutils: update pci ids test [18]
✅ Usex - version 1.9-29 [19]
✅ storage: SCSI VPD [21]
✅ stress: stress-ng [22]
Test source:
💚 Pull requests are welcome for new tests or improvements to existing tests!
[0]: https://github.com/CKI-project/tests-beaker/archive/master.zip#distribution…
[1]: https://github.com/CKI-project/tests-beaker/archive/master.zip#/filesystems…
[2]: https://github.com/CKI-project/tests-beaker/archive/master.zip#/packages/se…
[3]: https://github.com/CKI-project/tests-beaker/archive/master.zip#storage/lvm/…
[4]: https://github.com/CKI-project/tests-beaker/archive/master.zip#storage/swra…
[5]: https://github.com/CKI-project/tests-beaker/archive/master.zip#storage/blk
[6]: https://github.com/CKI-project/tests-beaker/archive/master.zip#/container/p…
[7]: https://github.com/CKI-project/tests-beaker/archive/master.zip#distribution…
[8]: https://github.com/CKI-project/tests-beaker/archive/master.zip#filesystems/…
[9]: https://github.com/CKI-project/tests-beaker/archive/master.zip#/jvm
[10]: https://github.com/CKI-project/tests-beaker/archive/master.zip#misc/amtu
[11]: https://github.com/CKI-project/tests-beaker/archive/master.zip#distribution…
[12]: https://github.com/CKI-project/tests-beaker/archive/master.zip#/networking/…
[13]: https://github.com/CKI-project/tests-beaker/archive/master.zip#/networking/…
[14]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/aud…
[15]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/htt…
[16]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/iot…
[17]: https://github.com/CKI-project/tests-beaker/archive/master.zip#packages/tun…
[18]: https://github.com/CKI-project/tests-beaker/archive/master.zip#pciutils/upd…
[19]: https://github.com/CKI-project/tests-beaker/archive/master.zip#standards/us…
[20]: https://github.com/CKI-project/tests-beaker/archive/master.zip#pciutils/san…
[21]: https://github.com/CKI-project/tests-beaker/archive/master.zip#storage/scsi…
[22]: https://github.com/CKI-project/tests-beaker/archive/master.zip#stress/stres…
Waived tests (marked with 🚧)
-----------------------------
This test run included waived tests. Such tests are executed but their results
are not taken into account. Tests are waived when their results are not
reliable enough, e.g. when they're just introduced or are being fixed.
FS_IOC_GETFSLABEL/FS_IOC_SETFSLABEL were added in linux-4.18
in xfs, but not in the compat_ioctl case, so add them here.
FITRIM was added earlier and also lacks a line the same function,
but this is ok because there is an entry in fs/compat_ioctl.c for
it.
Adding all three here to keep the native and compat ioctl handlers
in sync.
Cc: stable(a)vger.kernel.org
Fixes: f7664b31975b ("xfs: implement online get/set fs label")
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
---
fs/xfs/xfs_ioctl32.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
index ad91e81a2fcf..63bdc4c1535b 100644
--- a/fs/xfs/xfs_ioctl32.c
+++ b/fs/xfs/xfs_ioctl32.c
@@ -576,6 +576,9 @@ xfs_file_compat_ioctl(
case XFS_IOC_SCRUB_METADATA:
case XFS_IOC_BULKSTAT:
case XFS_IOC_INUMBERS:
+ case FITRIM:
+ case FS_IOC_GETFSLABEL:
+ case FS_IOC_SETFSLABEL:
return xfs_file_ioctl(filp, cmd, (unsigned long)arg);
#if !defined(BROKEN_X86_ALIGNMENT) || defined(CONFIG_X86_X32)
/*
--
2.20.0
From: Florian Westphal <fw(a)strlen.de>
[ Upstream commit 1b0890cd60829bd51455dc5ad689ed58c4408227 ]
Thomas and Juliana report a deadlock when running:
(rmmod nf_conntrack_netlink/xfrm_user)
conntrack -e NEW -E &
modprobe -v xfrm_user
They provided following analysis:
conntrack -e NEW -E
netlink_bind()
netlink_lock_table() -> increases "nl_table_users"
nfnetlink_bind()
# does not unlock the table as it's locked by netlink_bind()
__request_module()
call_usermodehelper_exec()
This triggers "modprobe nf_conntrack_netlink" from kernel, netlink_bind()
won't return until modprobe process is done.
"modprobe xfrm_user":
xfrm_user_init()
register_pernet_subsys()
-> grab pernet_ops_rwsem
..
netlink_table_grab()
calls schedule() as "nl_table_users" is non-zero
so modprobe is blocked because netlink_bind() increased
nl_table_users while also holding pernet_ops_rwsem.
"modprobe nf_conntrack_netlink" runs and inits nf_conntrack_netlink:
ctnetlink_init()
register_pernet_subsys()
-> blocks on "pernet_ops_rwsem" thanks to xfrm_user module
both modprobe processes wait on one another -- neither can make
progress.
Switch netlink_bind() to "nowait" modprobe -- this releases the netlink
table lock, which then allows both modprobe instances to complete.
Reported-by: Thomas Jarosch <thomas.jarosch(a)intra2net.com>
Reported-by: Juliana Rodrigueiro <juliana.rodrigueiro(a)intra2net.com>
Signed-off-by: Florian Westphal <fw(a)strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo(a)netfilter.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
net/netfilter/nfnetlink.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
index 916913454624f..7f2c1915763f8 100644
--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -575,7 +575,7 @@ static int nfnetlink_bind(struct net *net, int group)
ss = nfnetlink_get_subsys(type << 8);
rcu_read_unlock();
if (!ss)
- request_module("nfnetlink-subsys-%d", type);
+ request_module_nowait("nfnetlink-subsys-%d", type);
return 0;
}
#endif
--
2.20.1
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From df612421fe2566654047769c6852ffae1a31df16 Mon Sep 17 00:00:00 2001
From: Brian Norris <briannorris(a)chromium.org>
Date: Wed, 24 Jul 2019 12:46:34 -0700
Subject: [PATCH] mwifiex: fix 802.11n/WPA detection
Commit 63d7ef36103d ("mwifiex: Don't abort on small, spec-compliant
vendor IEs") adjusted the ieee_types_vendor_header struct, which
inadvertently messed up the offsets used in
mwifiex_is_wpa_oui_present(). Add that offset back in, mirroring
mwifiex_is_rsn_oui_present().
As it stands, commit 63d7ef36103d breaks compatibility with WPA (not
WPA2) 802.11n networks, since we hit the "info: Disable 11n if AES is
not supported by AP" case in mwifiex_is_network_compatible().
Fixes: 63d7ef36103d ("mwifiex: Don't abort on small, spec-compliant vendor IEs")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Brian Norris <briannorris(a)chromium.org>
Signed-off-by: Kalle Valo <kvalo(a)codeaurora.org>
diff --git a/drivers/net/wireless/marvell/mwifiex/main.h b/drivers/net/wireless/marvell/mwifiex/main.h
index 3e442c7f7882..095837fba300 100644
--- a/drivers/net/wireless/marvell/mwifiex/main.h
+++ b/drivers/net/wireless/marvell/mwifiex/main.h
@@ -124,6 +124,7 @@ enum {
#define MWIFIEX_MAX_TOTAL_SCAN_TIME (MWIFIEX_TIMER_10S - MWIFIEX_TIMER_1S)
+#define WPA_GTK_OUI_OFFSET 2
#define RSN_GTK_OUI_OFFSET 2
#define MWIFIEX_OUI_NOT_PRESENT 0
diff --git a/drivers/net/wireless/marvell/mwifiex/scan.c b/drivers/net/wireless/marvell/mwifiex/scan.c
index 0d6d41727037..21dda385f6c6 100644
--- a/drivers/net/wireless/marvell/mwifiex/scan.c
+++ b/drivers/net/wireless/marvell/mwifiex/scan.c
@@ -181,7 +181,8 @@ mwifiex_is_wpa_oui_present(struct mwifiex_bssdescriptor *bss_desc, u32 cipher)
u8 ret = MWIFIEX_OUI_NOT_PRESENT;
if (has_vendor_hdr(bss_desc->bcn_wpa_ie, WLAN_EID_VENDOR_SPECIFIC)) {
- iebody = (struct ie_body *) bss_desc->bcn_wpa_ie->data;
+ iebody = (struct ie_body *)((u8 *)bss_desc->bcn_wpa_ie->data +
+ WPA_GTK_OUI_OFFSET);
oui = &mwifiex_wpa_oui[cipher][0];
ret = mwifiex_search_oui_in_ie(iebody, oui);
if (ret)
Checking bypass_reg is incorrect for calculating the cnt_clk rates.
Instead we should be checking that there is a proper hardware register
that holds the clock divider.
Cc: stable(a)vger.kernel.org
Signed-off-by: Dinh Nguyen <dinguyen(a)kernel.org>
---
drivers/clk/socfpga/clk-periph-s10.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/clk/socfpga/clk-periph-s10.c b/drivers/clk/socfpga/clk-periph-s10.c
index 5c50e723ecae..1a191eeeebba 100644
--- a/drivers/clk/socfpga/clk-periph-s10.c
+++ b/drivers/clk/socfpga/clk-periph-s10.c
@@ -38,7 +38,7 @@ static unsigned long clk_peri_cnt_clk_recalc_rate(struct clk_hw *hwclk,
if (socfpgaclk->fixed_div) {
div = socfpgaclk->fixed_div;
} else {
- if (!socfpgaclk->bypass_reg)
+ if (socfpgaclk->hw.reg)
div = ((readl(socfpgaclk->hw.reg) & 0x7ff) + 1);
}
--
2.20.0
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3d584a3c85d6fe2cf878f220d4ad7145e7f89218 Mon Sep 17 00:00:00 2001
From: Anders Roxell <anders.roxell(a)linaro.org>
Date: Fri, 26 Jul 2019 13:27:05 +0200
Subject: [PATCH] arm64: KVM: regmap: Fix unexpected switch fall-through
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
When fall-through warnings was enabled by default, commit d93512ef0f0e
("Makefile: Globally enable fall-through warning"), the following
warnings was starting to show up:
In file included from ../arch/arm64/include/asm/kvm_emulate.h:19,
from ../arch/arm64/kvm/regmap.c:13:
../arch/arm64/kvm/regmap.c: In function ‘vcpu_write_spsr32’:
../arch/arm64/include/asm/kvm_hyp.h:31:3: warning: this statement may fall
through [-Wimplicit-fallthrough=]
asm volatile(ALTERNATIVE(__msr_s(r##nvh, "%x0"), \
^~~
../arch/arm64/include/asm/kvm_hyp.h:46:31: note: in expansion of macro ‘write_sysreg_elx’
#define write_sysreg_el1(v,r) write_sysreg_elx(v, r, _EL1, _EL12)
^~~~~~~~~~~~~~~~
../arch/arm64/kvm/regmap.c:180:3: note: in expansion of macro ‘write_sysreg_el1’
write_sysreg_el1(v, SYS_SPSR);
^~~~~~~~~~~~~~~~
../arch/arm64/kvm/regmap.c:181:2: note: here
case KVM_SPSR_ABT:
^~~~
In file included from ../arch/arm64/include/asm/cputype.h:132,
from ../arch/arm64/include/asm/cache.h:8,
from ../include/linux/cache.h:6,
from ../include/linux/printk.h:9,
from ../include/linux/kernel.h:15,
from ../include/asm-generic/bug.h:18,
from ../arch/arm64/include/asm/bug.h:26,
from ../include/linux/bug.h:5,
from ../include/linux/mmdebug.h:5,
from ../include/linux/mm.h:9,
from ../arch/arm64/kvm/regmap.c:11:
../arch/arm64/include/asm/sysreg.h:837:2: warning: this statement may fall
through [-Wimplicit-fallthrough=]
asm volatile("msr " __stringify(r) ", %x0" \
^~~
../arch/arm64/kvm/regmap.c:182:3: note: in expansion of macro ‘write_sysreg’
write_sysreg(v, spsr_abt);
^~~~~~~~~~~~
../arch/arm64/kvm/regmap.c:183:2: note: here
case KVM_SPSR_UND:
^~~~
Rework to add a 'break;' in the swich-case since it didn't have that,
leading to an interresting set of bugs.
Cc: stable(a)vger.kernel.org # v4.17+
Fixes: a892819560c4 ("KVM: arm64: Prepare to handle deferred save/restore of 32-bit registers")
Signed-off-by: Anders Roxell <anders.roxell(a)linaro.org>
[maz: reworked commit message, fixed stable range]
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
diff --git a/arch/arm64/kvm/regmap.c b/arch/arm64/kvm/regmap.c
index 0d60e4f0af66..a900181e3867 100644
--- a/arch/arm64/kvm/regmap.c
+++ b/arch/arm64/kvm/regmap.c
@@ -178,13 +178,18 @@ void vcpu_write_spsr32(struct kvm_vcpu *vcpu, unsigned long v)
switch (spsr_idx) {
case KVM_SPSR_SVC:
write_sysreg_el1(v, SYS_SPSR);
+ break;
case KVM_SPSR_ABT:
write_sysreg(v, spsr_abt);
+ break;
case KVM_SPSR_UND:
write_sysreg(v, spsr_und);
+ break;
case KVM_SPSR_IRQ:
write_sysreg(v, spsr_irq);
+ break;
case KVM_SPSR_FIQ:
write_sysreg(v, spsr_fiq);
+ break;
}
}
The patch below does not apply to the 5.2-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3d584a3c85d6fe2cf878f220d4ad7145e7f89218 Mon Sep 17 00:00:00 2001
From: Anders Roxell <anders.roxell(a)linaro.org>
Date: Fri, 26 Jul 2019 13:27:05 +0200
Subject: [PATCH] arm64: KVM: regmap: Fix unexpected switch fall-through
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
When fall-through warnings was enabled by default, commit d93512ef0f0e
("Makefile: Globally enable fall-through warning"), the following
warnings was starting to show up:
In file included from ../arch/arm64/include/asm/kvm_emulate.h:19,
from ../arch/arm64/kvm/regmap.c:13:
../arch/arm64/kvm/regmap.c: In function ‘vcpu_write_spsr32’:
../arch/arm64/include/asm/kvm_hyp.h:31:3: warning: this statement may fall
through [-Wimplicit-fallthrough=]
asm volatile(ALTERNATIVE(__msr_s(r##nvh, "%x0"), \
^~~
../arch/arm64/include/asm/kvm_hyp.h:46:31: note: in expansion of macro ‘write_sysreg_elx’
#define write_sysreg_el1(v,r) write_sysreg_elx(v, r, _EL1, _EL12)
^~~~~~~~~~~~~~~~
../arch/arm64/kvm/regmap.c:180:3: note: in expansion of macro ‘write_sysreg_el1’
write_sysreg_el1(v, SYS_SPSR);
^~~~~~~~~~~~~~~~
../arch/arm64/kvm/regmap.c:181:2: note: here
case KVM_SPSR_ABT:
^~~~
In file included from ../arch/arm64/include/asm/cputype.h:132,
from ../arch/arm64/include/asm/cache.h:8,
from ../include/linux/cache.h:6,
from ../include/linux/printk.h:9,
from ../include/linux/kernel.h:15,
from ../include/asm-generic/bug.h:18,
from ../arch/arm64/include/asm/bug.h:26,
from ../include/linux/bug.h:5,
from ../include/linux/mmdebug.h:5,
from ../include/linux/mm.h:9,
from ../arch/arm64/kvm/regmap.c:11:
../arch/arm64/include/asm/sysreg.h:837:2: warning: this statement may fall
through [-Wimplicit-fallthrough=]
asm volatile("msr " __stringify(r) ", %x0" \
^~~
../arch/arm64/kvm/regmap.c:182:3: note: in expansion of macro ‘write_sysreg’
write_sysreg(v, spsr_abt);
^~~~~~~~~~~~
../arch/arm64/kvm/regmap.c:183:2: note: here
case KVM_SPSR_UND:
^~~~
Rework to add a 'break;' in the swich-case since it didn't have that,
leading to an interresting set of bugs.
Cc: stable(a)vger.kernel.org # v4.17+
Fixes: a892819560c4 ("KVM: arm64: Prepare to handle deferred save/restore of 32-bit registers")
Signed-off-by: Anders Roxell <anders.roxell(a)linaro.org>
[maz: reworked commit message, fixed stable range]
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
diff --git a/arch/arm64/kvm/regmap.c b/arch/arm64/kvm/regmap.c
index 0d60e4f0af66..a900181e3867 100644
--- a/arch/arm64/kvm/regmap.c
+++ b/arch/arm64/kvm/regmap.c
@@ -178,13 +178,18 @@ void vcpu_write_spsr32(struct kvm_vcpu *vcpu, unsigned long v)
switch (spsr_idx) {
case KVM_SPSR_SVC:
write_sysreg_el1(v, SYS_SPSR);
+ break;
case KVM_SPSR_ABT:
write_sysreg(v, spsr_abt);
+ break;
case KVM_SPSR_UND:
write_sysreg(v, spsr_und);
+ break;
case KVM_SPSR_IRQ:
write_sysreg(v, spsr_irq);
+ break;
case KVM_SPSR_FIQ:
write_sysreg(v, spsr_fiq);
+ break;
}
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5eeaf10eec394b28fad2c58f1f5c3a5da0e87d1c Mon Sep 17 00:00:00 2001
From: Marc Zyngier <maz(a)kernel.org>
Date: Fri, 2 Aug 2019 10:28:32 +0100
Subject: [PATCH] KVM: arm/arm64: Sync ICH_VMCR_EL2 back when about to block
Since commit commit 328e56647944 ("KVM: arm/arm64: vgic: Defer
touching GICH_VMCR to vcpu_load/put"), we leave ICH_VMCR_EL2 (or
its GICv2 equivalent) loaded as long as we can, only syncing it
back when we're scheduled out.
There is a small snag with that though: kvm_vgic_vcpu_pending_irq(),
which is indirectly called from kvm_vcpu_check_block(), needs to
evaluate the guest's view of ICC_PMR_EL1. At the point were we
call kvm_vcpu_check_block(), the vcpu is still loaded, and whatever
changes to PMR is not visible in memory until we do a vcpu_put().
Things go really south if the guest does the following:
mov x0, #0 // or any small value masking interrupts
msr ICC_PMR_EL1, x0
[vcpu preempted, then rescheduled, VMCR sampled]
mov x0, #ff // allow all interrupts
msr ICC_PMR_EL1, x0
wfi // traps to EL2, so samping of VMCR
[interrupt arrives just after WFI]
Here, the hypervisor's view of PMR is zero, while the guest has enabled
its interrupts. kvm_vgic_vcpu_pending_irq() will then say that no
interrupts are pending (despite an interrupt being received) and we'll
block for no reason. If the guest doesn't have a periodic interrupt
firing once it has blocked, it will stay there forever.
To avoid this unfortuante situation, let's resync VMCR from
kvm_arch_vcpu_blocking(), ensuring that a following kvm_vcpu_check_block()
will observe the latest value of PMR.
This has been found by booting an arm64 Linux guest with the pseudo NMI
feature, and thus using interrupt priorities to mask interrupts instead
of the usual PSTATE masking.
Cc: stable(a)vger.kernel.org # 4.12
Fixes: 328e56647944 ("KVM: arm/arm64: vgic: Defer touching GICH_VMCR to vcpu_load/put")
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 46bbc949c20a..7a30524a80ee 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -350,6 +350,7 @@ int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
void kvm_vgic_load(struct kvm_vcpu *vcpu);
void kvm_vgic_put(struct kvm_vcpu *vcpu);
+void kvm_vgic_vmcr_sync(struct kvm_vcpu *vcpu);
#define irqchip_in_kernel(k) (!!((k)->arch.vgic.in_kernel))
#define vgic_initialized(k) ((k)->arch.vgic.initialized)
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index c704fa696184..482b20256fa8 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -323,6 +323,17 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
{
+ /*
+ * If we're about to block (most likely because we've just hit a
+ * WFI), we need to sync back the state of the GIC CPU interface
+ * so that we have the lastest PMR and group enables. This ensures
+ * that kvm_arch_vcpu_runnable has up-to-date data to decide
+ * whether we have pending interrupts.
+ */
+ preempt_disable();
+ kvm_vgic_vmcr_sync(vcpu);
+ preempt_enable();
+
kvm_vgic_v4_enable_doorbell(vcpu);
}
diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
index 6dd5ad706c92..96aab77d0471 100644
--- a/virt/kvm/arm/vgic/vgic-v2.c
+++ b/virt/kvm/arm/vgic/vgic-v2.c
@@ -484,10 +484,17 @@ void vgic_v2_load(struct kvm_vcpu *vcpu)
kvm_vgic_global_state.vctrl_base + GICH_APR);
}
-void vgic_v2_put(struct kvm_vcpu *vcpu)
+void vgic_v2_vmcr_sync(struct kvm_vcpu *vcpu)
{
struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
cpu_if->vgic_vmcr = readl_relaxed(kvm_vgic_global_state.vctrl_base + GICH_VMCR);
+}
+
+void vgic_v2_put(struct kvm_vcpu *vcpu)
+{
+ struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
+
+ vgic_v2_vmcr_sync(vcpu);
cpu_if->vgic_apr = readl_relaxed(kvm_vgic_global_state.vctrl_base + GICH_APR);
}
diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
index c2c9ce009f63..0c653a1e5215 100644
--- a/virt/kvm/arm/vgic/vgic-v3.c
+++ b/virt/kvm/arm/vgic/vgic-v3.c
@@ -662,12 +662,17 @@ void vgic_v3_load(struct kvm_vcpu *vcpu)
__vgic_v3_activate_traps(vcpu);
}
-void vgic_v3_put(struct kvm_vcpu *vcpu)
+void vgic_v3_vmcr_sync(struct kvm_vcpu *vcpu)
{
struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
if (likely(cpu_if->vgic_sre))
cpu_if->vgic_vmcr = kvm_call_hyp_ret(__vgic_v3_read_vmcr);
+}
+
+void vgic_v3_put(struct kvm_vcpu *vcpu)
+{
+ vgic_v3_vmcr_sync(vcpu);
kvm_call_hyp(__vgic_v3_save_aprs, vcpu);
diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
index 04786c8ec77e..13d4b38a94ec 100644
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -919,6 +919,17 @@ void kvm_vgic_put(struct kvm_vcpu *vcpu)
vgic_v3_put(vcpu);
}
+void kvm_vgic_vmcr_sync(struct kvm_vcpu *vcpu)
+{
+ if (unlikely(!irqchip_in_kernel(vcpu->kvm)))
+ return;
+
+ if (kvm_vgic_global_state.type == VGIC_V2)
+ vgic_v2_vmcr_sync(vcpu);
+ else
+ vgic_v3_vmcr_sync(vcpu);
+}
+
int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu)
{
struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h
index 57205beaa981..11adbdac1d56 100644
--- a/virt/kvm/arm/vgic/vgic.h
+++ b/virt/kvm/arm/vgic/vgic.h
@@ -193,6 +193,7 @@ int vgic_register_dist_iodev(struct kvm *kvm, gpa_t dist_base_address,
void vgic_v2_init_lrs(void);
void vgic_v2_load(struct kvm_vcpu *vcpu);
void vgic_v2_put(struct kvm_vcpu *vcpu);
+void vgic_v2_vmcr_sync(struct kvm_vcpu *vcpu);
void vgic_v2_save_state(struct kvm_vcpu *vcpu);
void vgic_v2_restore_state(struct kvm_vcpu *vcpu);
@@ -223,6 +224,7 @@ bool vgic_v3_check_base(struct kvm *kvm);
void vgic_v3_load(struct kvm_vcpu *vcpu);
void vgic_v3_put(struct kvm_vcpu *vcpu);
+void vgic_v3_vmcr_sync(struct kvm_vcpu *vcpu);
bool vgic_has_its(struct kvm *kvm);
int kvm_vgic_register_its_device(void);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5eeaf10eec394b28fad2c58f1f5c3a5da0e87d1c Mon Sep 17 00:00:00 2001
From: Marc Zyngier <maz(a)kernel.org>
Date: Fri, 2 Aug 2019 10:28:32 +0100
Subject: [PATCH] KVM: arm/arm64: Sync ICH_VMCR_EL2 back when about to block
Since commit commit 328e56647944 ("KVM: arm/arm64: vgic: Defer
touching GICH_VMCR to vcpu_load/put"), we leave ICH_VMCR_EL2 (or
its GICv2 equivalent) loaded as long as we can, only syncing it
back when we're scheduled out.
There is a small snag with that though: kvm_vgic_vcpu_pending_irq(),
which is indirectly called from kvm_vcpu_check_block(), needs to
evaluate the guest's view of ICC_PMR_EL1. At the point were we
call kvm_vcpu_check_block(), the vcpu is still loaded, and whatever
changes to PMR is not visible in memory until we do a vcpu_put().
Things go really south if the guest does the following:
mov x0, #0 // or any small value masking interrupts
msr ICC_PMR_EL1, x0
[vcpu preempted, then rescheduled, VMCR sampled]
mov x0, #ff // allow all interrupts
msr ICC_PMR_EL1, x0
wfi // traps to EL2, so samping of VMCR
[interrupt arrives just after WFI]
Here, the hypervisor's view of PMR is zero, while the guest has enabled
its interrupts. kvm_vgic_vcpu_pending_irq() will then say that no
interrupts are pending (despite an interrupt being received) and we'll
block for no reason. If the guest doesn't have a periodic interrupt
firing once it has blocked, it will stay there forever.
To avoid this unfortuante situation, let's resync VMCR from
kvm_arch_vcpu_blocking(), ensuring that a following kvm_vcpu_check_block()
will observe the latest value of PMR.
This has been found by booting an arm64 Linux guest with the pseudo NMI
feature, and thus using interrupt priorities to mask interrupts instead
of the usual PSTATE masking.
Cc: stable(a)vger.kernel.org # 4.12
Fixes: 328e56647944 ("KVM: arm/arm64: vgic: Defer touching GICH_VMCR to vcpu_load/put")
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 46bbc949c20a..7a30524a80ee 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -350,6 +350,7 @@ int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
void kvm_vgic_load(struct kvm_vcpu *vcpu);
void kvm_vgic_put(struct kvm_vcpu *vcpu);
+void kvm_vgic_vmcr_sync(struct kvm_vcpu *vcpu);
#define irqchip_in_kernel(k) (!!((k)->arch.vgic.in_kernel))
#define vgic_initialized(k) ((k)->arch.vgic.initialized)
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index c704fa696184..482b20256fa8 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -323,6 +323,17 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
{
+ /*
+ * If we're about to block (most likely because we've just hit a
+ * WFI), we need to sync back the state of the GIC CPU interface
+ * so that we have the lastest PMR and group enables. This ensures
+ * that kvm_arch_vcpu_runnable has up-to-date data to decide
+ * whether we have pending interrupts.
+ */
+ preempt_disable();
+ kvm_vgic_vmcr_sync(vcpu);
+ preempt_enable();
+
kvm_vgic_v4_enable_doorbell(vcpu);
}
diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
index 6dd5ad706c92..96aab77d0471 100644
--- a/virt/kvm/arm/vgic/vgic-v2.c
+++ b/virt/kvm/arm/vgic/vgic-v2.c
@@ -484,10 +484,17 @@ void vgic_v2_load(struct kvm_vcpu *vcpu)
kvm_vgic_global_state.vctrl_base + GICH_APR);
}
-void vgic_v2_put(struct kvm_vcpu *vcpu)
+void vgic_v2_vmcr_sync(struct kvm_vcpu *vcpu)
{
struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
cpu_if->vgic_vmcr = readl_relaxed(kvm_vgic_global_state.vctrl_base + GICH_VMCR);
+}
+
+void vgic_v2_put(struct kvm_vcpu *vcpu)
+{
+ struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
+
+ vgic_v2_vmcr_sync(vcpu);
cpu_if->vgic_apr = readl_relaxed(kvm_vgic_global_state.vctrl_base + GICH_APR);
}
diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
index c2c9ce009f63..0c653a1e5215 100644
--- a/virt/kvm/arm/vgic/vgic-v3.c
+++ b/virt/kvm/arm/vgic/vgic-v3.c
@@ -662,12 +662,17 @@ void vgic_v3_load(struct kvm_vcpu *vcpu)
__vgic_v3_activate_traps(vcpu);
}
-void vgic_v3_put(struct kvm_vcpu *vcpu)
+void vgic_v3_vmcr_sync(struct kvm_vcpu *vcpu)
{
struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
if (likely(cpu_if->vgic_sre))
cpu_if->vgic_vmcr = kvm_call_hyp_ret(__vgic_v3_read_vmcr);
+}
+
+void vgic_v3_put(struct kvm_vcpu *vcpu)
+{
+ vgic_v3_vmcr_sync(vcpu);
kvm_call_hyp(__vgic_v3_save_aprs, vcpu);
diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
index 04786c8ec77e..13d4b38a94ec 100644
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -919,6 +919,17 @@ void kvm_vgic_put(struct kvm_vcpu *vcpu)
vgic_v3_put(vcpu);
}
+void kvm_vgic_vmcr_sync(struct kvm_vcpu *vcpu)
+{
+ if (unlikely(!irqchip_in_kernel(vcpu->kvm)))
+ return;
+
+ if (kvm_vgic_global_state.type == VGIC_V2)
+ vgic_v2_vmcr_sync(vcpu);
+ else
+ vgic_v3_vmcr_sync(vcpu);
+}
+
int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu)
{
struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h
index 57205beaa981..11adbdac1d56 100644
--- a/virt/kvm/arm/vgic/vgic.h
+++ b/virt/kvm/arm/vgic/vgic.h
@@ -193,6 +193,7 @@ int vgic_register_dist_iodev(struct kvm *kvm, gpa_t dist_base_address,
void vgic_v2_init_lrs(void);
void vgic_v2_load(struct kvm_vcpu *vcpu);
void vgic_v2_put(struct kvm_vcpu *vcpu);
+void vgic_v2_vmcr_sync(struct kvm_vcpu *vcpu);
void vgic_v2_save_state(struct kvm_vcpu *vcpu);
void vgic_v2_restore_state(struct kvm_vcpu *vcpu);
@@ -223,6 +224,7 @@ bool vgic_v3_check_base(struct kvm *kvm);
void vgic_v3_load(struct kvm_vcpu *vcpu);
void vgic_v3_put(struct kvm_vcpu *vcpu);
+void vgic_v3_vmcr_sync(struct kvm_vcpu *vcpu);
bool vgic_has_its(struct kvm *kvm);
int kvm_vgic_register_its_device(void);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From e3c8dc761ead061da2220ee8f8132f729ac3ddfe Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust(a)hammerspace.com>
Date: Mon, 29 Jul 2019 18:25:00 +0100
Subject: [PATCH] NFSv4: Check the return value of update_open_stateid()
Ensure that we always check the return value of update_open_stateid()
so that we can retry if the update of local state failed. This fixes
infinite looping on state recovery.
Fixes: e23008ec81ef3 ("NFSv4 reduce attribute requests for open reclaim")
Signed-off-by: Trond Myklebust <trond.myklebust(a)hammerspace.com>
Cc: stable(a)vger.kernel.org # v3.7+
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index c9e14ce0b7b2..3e0b93f2b61a 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1915,8 +1915,9 @@ _nfs4_opendata_reclaim_to_nfs4_state(struct nfs4_opendata *data)
if (data->o_res.delegation_type != 0)
nfs4_opendata_check_deleg(data, state);
update:
- update_open_stateid(state, &data->o_res.stateid, NULL,
- data->o_arg.fmode);
+ if (!update_open_stateid(state, &data->o_res.stateid,
+ NULL, data->o_arg.fmode))
+ return ERR_PTR(-EAGAIN);
refcount_inc(&state->count);
return state;
@@ -1981,8 +1982,11 @@ _nfs4_opendata_to_nfs4_state(struct nfs4_opendata *data)
if (data->o_res.delegation_type != 0)
nfs4_opendata_check_deleg(data, state);
- update_open_stateid(state, &data->o_res.stateid, NULL,
- data->o_arg.fmode);
+ if (!update_open_stateid(state, &data->o_res.stateid,
+ NULL, data->o_arg.fmode)) {
+ nfs4_put_open_state(state);
+ state = ERR_PTR(-EAGAIN);
+ }
out:
nfs_release_seqid(data->o_arg.seqid);
return state;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From e3c8dc761ead061da2220ee8f8132f729ac3ddfe Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust(a)hammerspace.com>
Date: Mon, 29 Jul 2019 18:25:00 +0100
Subject: [PATCH] NFSv4: Check the return value of update_open_stateid()
Ensure that we always check the return value of update_open_stateid()
so that we can retry if the update of local state failed. This fixes
infinite looping on state recovery.
Fixes: e23008ec81ef3 ("NFSv4 reduce attribute requests for open reclaim")
Signed-off-by: Trond Myklebust <trond.myklebust(a)hammerspace.com>
Cc: stable(a)vger.kernel.org # v3.7+
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index c9e14ce0b7b2..3e0b93f2b61a 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1915,8 +1915,9 @@ _nfs4_opendata_reclaim_to_nfs4_state(struct nfs4_opendata *data)
if (data->o_res.delegation_type != 0)
nfs4_opendata_check_deleg(data, state);
update:
- update_open_stateid(state, &data->o_res.stateid, NULL,
- data->o_arg.fmode);
+ if (!update_open_stateid(state, &data->o_res.stateid,
+ NULL, data->o_arg.fmode))
+ return ERR_PTR(-EAGAIN);
refcount_inc(&state->count);
return state;
@@ -1981,8 +1982,11 @@ _nfs4_opendata_to_nfs4_state(struct nfs4_opendata *data)
if (data->o_res.delegation_type != 0)
nfs4_opendata_check_deleg(data, state);
- update_open_stateid(state, &data->o_res.stateid, NULL,
- data->o_arg.fmode);
+ if (!update_open_stateid(state, &data->o_res.stateid,
+ NULL, data->o_arg.fmode)) {
+ nfs4_put_open_state(state);
+ state = ERR_PTR(-EAGAIN);
+ }
out:
nfs_release_seqid(data->o_arg.seqid);
return state;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From e3c8dc761ead061da2220ee8f8132f729ac3ddfe Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust(a)hammerspace.com>
Date: Mon, 29 Jul 2019 18:25:00 +0100
Subject: [PATCH] NFSv4: Check the return value of update_open_stateid()
Ensure that we always check the return value of update_open_stateid()
so that we can retry if the update of local state failed. This fixes
infinite looping on state recovery.
Fixes: e23008ec81ef3 ("NFSv4 reduce attribute requests for open reclaim")
Signed-off-by: Trond Myklebust <trond.myklebust(a)hammerspace.com>
Cc: stable(a)vger.kernel.org # v3.7+
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index c9e14ce0b7b2..3e0b93f2b61a 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1915,8 +1915,9 @@ _nfs4_opendata_reclaim_to_nfs4_state(struct nfs4_opendata *data)
if (data->o_res.delegation_type != 0)
nfs4_opendata_check_deleg(data, state);
update:
- update_open_stateid(state, &data->o_res.stateid, NULL,
- data->o_arg.fmode);
+ if (!update_open_stateid(state, &data->o_res.stateid,
+ NULL, data->o_arg.fmode))
+ return ERR_PTR(-EAGAIN);
refcount_inc(&state->count);
return state;
@@ -1981,8 +1982,11 @@ _nfs4_opendata_to_nfs4_state(struct nfs4_opendata *data)
if (data->o_res.delegation_type != 0)
nfs4_opendata_check_deleg(data, state);
- update_open_stateid(state, &data->o_res.stateid, NULL,
- data->o_arg.fmode);
+ if (!update_open_stateid(state, &data->o_res.stateid,
+ NULL, data->o_arg.fmode)) {
+ nfs4_put_open_state(state);
+ state = ERR_PTR(-EAGAIN);
+ }
out:
nfs_release_seqid(data->o_arg.seqid);
return state;
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From e3c8dc761ead061da2220ee8f8132f729ac3ddfe Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust(a)hammerspace.com>
Date: Mon, 29 Jul 2019 18:25:00 +0100
Subject: [PATCH] NFSv4: Check the return value of update_open_stateid()
Ensure that we always check the return value of update_open_stateid()
so that we can retry if the update of local state failed. This fixes
infinite looping on state recovery.
Fixes: e23008ec81ef3 ("NFSv4 reduce attribute requests for open reclaim")
Signed-off-by: Trond Myklebust <trond.myklebust(a)hammerspace.com>
Cc: stable(a)vger.kernel.org # v3.7+
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index c9e14ce0b7b2..3e0b93f2b61a 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1915,8 +1915,9 @@ _nfs4_opendata_reclaim_to_nfs4_state(struct nfs4_opendata *data)
if (data->o_res.delegation_type != 0)
nfs4_opendata_check_deleg(data, state);
update:
- update_open_stateid(state, &data->o_res.stateid, NULL,
- data->o_arg.fmode);
+ if (!update_open_stateid(state, &data->o_res.stateid,
+ NULL, data->o_arg.fmode))
+ return ERR_PTR(-EAGAIN);
refcount_inc(&state->count);
return state;
@@ -1981,8 +1982,11 @@ _nfs4_opendata_to_nfs4_state(struct nfs4_opendata *data)
if (data->o_res.delegation_type != 0)
nfs4_opendata_check_deleg(data, state);
- update_open_stateid(state, &data->o_res.stateid, NULL,
- data->o_arg.fmode);
+ if (!update_open_stateid(state, &data->o_res.stateid,
+ NULL, data->o_arg.fmode)) {
+ nfs4_put_open_state(state);
+ state = ERR_PTR(-EAGAIN);
+ }
out:
nfs_release_seqid(data->o_arg.seqid);
return state;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3d92aa45fbfd7319e3a19f4ec59fd32b3862b723 Mon Sep 17 00:00:00 2001
From: Wenwen Wang <wenwen(a)cs.uga.edu>
Date: Wed, 7 Aug 2019 04:08:51 -0500
Subject: [PATCH] ALSA: hiface: fix multiple memory leak bugs
In hiface_pcm_init(), 'rt' is firstly allocated through kzalloc(). Later
on, hiface_pcm_init_urb() is invoked to initialize 'rt->out_urbs[i]'. In
hiface_pcm_init_urb(), 'rt->out_urbs[i].buffer' is allocated through
kzalloc(). However, if hiface_pcm_init_urb() fails, both 'rt' and
'rt->out_urbs[i].buffer' are not deallocated, leading to memory leak bugs.
Also, 'rt->out_urbs[i].buffer' is not deallocated if snd_pcm_new() fails.
To fix the above issues, free 'rt' and 'rt->out_urbs[i].buffer'.
Fixes: a91c3fb2f842 ("Add M2Tech hiFace USB-SPDIF driver")
Signed-off-by: Wenwen Wang <wenwen(a)cs.uga.edu>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
diff --git a/sound/usb/hiface/pcm.c b/sound/usb/hiface/pcm.c
index 14fc1e1d5d13..c406497c5919 100644
--- a/sound/usb/hiface/pcm.c
+++ b/sound/usb/hiface/pcm.c
@@ -600,14 +600,13 @@ int hiface_pcm_init(struct hiface_chip *chip, u8 extra_freq)
ret = hiface_pcm_init_urb(&rt->out_urbs[i], chip, OUT_EP,
hiface_pcm_out_urb_handler);
if (ret < 0)
- return ret;
+ goto error;
}
ret = snd_pcm_new(chip->card, "USB-SPDIF Audio", 0, 1, 0, &pcm);
if (ret < 0) {
- kfree(rt);
dev_err(&chip->dev->dev, "Cannot create pcm instance\n");
- return ret;
+ goto error;
}
pcm->private_data = rt;
@@ -620,4 +619,10 @@ int hiface_pcm_init(struct hiface_chip *chip, u8 extra_freq)
chip->pcm = rt;
return 0;
+
+error:
+ for (i = 0; i < PCM_N_URBS; i++)
+ kfree(rt->out_urbs[i].buffer);
+ kfree(rt);
+ return ret;
}
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3d92aa45fbfd7319e3a19f4ec59fd32b3862b723 Mon Sep 17 00:00:00 2001
From: Wenwen Wang <wenwen(a)cs.uga.edu>
Date: Wed, 7 Aug 2019 04:08:51 -0500
Subject: [PATCH] ALSA: hiface: fix multiple memory leak bugs
In hiface_pcm_init(), 'rt' is firstly allocated through kzalloc(). Later
on, hiface_pcm_init_urb() is invoked to initialize 'rt->out_urbs[i]'. In
hiface_pcm_init_urb(), 'rt->out_urbs[i].buffer' is allocated through
kzalloc(). However, if hiface_pcm_init_urb() fails, both 'rt' and
'rt->out_urbs[i].buffer' are not deallocated, leading to memory leak bugs.
Also, 'rt->out_urbs[i].buffer' is not deallocated if snd_pcm_new() fails.
To fix the above issues, free 'rt' and 'rt->out_urbs[i].buffer'.
Fixes: a91c3fb2f842 ("Add M2Tech hiFace USB-SPDIF driver")
Signed-off-by: Wenwen Wang <wenwen(a)cs.uga.edu>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
diff --git a/sound/usb/hiface/pcm.c b/sound/usb/hiface/pcm.c
index 14fc1e1d5d13..c406497c5919 100644
--- a/sound/usb/hiface/pcm.c
+++ b/sound/usb/hiface/pcm.c
@@ -600,14 +600,13 @@ int hiface_pcm_init(struct hiface_chip *chip, u8 extra_freq)
ret = hiface_pcm_init_urb(&rt->out_urbs[i], chip, OUT_EP,
hiface_pcm_out_urb_handler);
if (ret < 0)
- return ret;
+ goto error;
}
ret = snd_pcm_new(chip->card, "USB-SPDIF Audio", 0, 1, 0, &pcm);
if (ret < 0) {
- kfree(rt);
dev_err(&chip->dev->dev, "Cannot create pcm instance\n");
- return ret;
+ goto error;
}
pcm->private_data = rt;
@@ -620,4 +619,10 @@ int hiface_pcm_init(struct hiface_chip *chip, u8 extra_freq)
chip->pcm = rt;
return 0;
+
+error:
+ for (i = 0; i < PCM_N_URBS; i++)
+ kfree(rt->out_urbs[i].buffer);
+ kfree(rt);
+ return ret;
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3d92aa45fbfd7319e3a19f4ec59fd32b3862b723 Mon Sep 17 00:00:00 2001
From: Wenwen Wang <wenwen(a)cs.uga.edu>
Date: Wed, 7 Aug 2019 04:08:51 -0500
Subject: [PATCH] ALSA: hiface: fix multiple memory leak bugs
In hiface_pcm_init(), 'rt' is firstly allocated through kzalloc(). Later
on, hiface_pcm_init_urb() is invoked to initialize 'rt->out_urbs[i]'. In
hiface_pcm_init_urb(), 'rt->out_urbs[i].buffer' is allocated through
kzalloc(). However, if hiface_pcm_init_urb() fails, both 'rt' and
'rt->out_urbs[i].buffer' are not deallocated, leading to memory leak bugs.
Also, 'rt->out_urbs[i].buffer' is not deallocated if snd_pcm_new() fails.
To fix the above issues, free 'rt' and 'rt->out_urbs[i].buffer'.
Fixes: a91c3fb2f842 ("Add M2Tech hiFace USB-SPDIF driver")
Signed-off-by: Wenwen Wang <wenwen(a)cs.uga.edu>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
diff --git a/sound/usb/hiface/pcm.c b/sound/usb/hiface/pcm.c
index 14fc1e1d5d13..c406497c5919 100644
--- a/sound/usb/hiface/pcm.c
+++ b/sound/usb/hiface/pcm.c
@@ -600,14 +600,13 @@ int hiface_pcm_init(struct hiface_chip *chip, u8 extra_freq)
ret = hiface_pcm_init_urb(&rt->out_urbs[i], chip, OUT_EP,
hiface_pcm_out_urb_handler);
if (ret < 0)
- return ret;
+ goto error;
}
ret = snd_pcm_new(chip->card, "USB-SPDIF Audio", 0, 1, 0, &pcm);
if (ret < 0) {
- kfree(rt);
dev_err(&chip->dev->dev, "Cannot create pcm instance\n");
- return ret;
+ goto error;
}
pcm->private_data = rt;
@@ -620,4 +619,10 @@ int hiface_pcm_init(struct hiface_chip *chip, u8 extra_freq)
chip->pcm = rt;
return 0;
+
+error:
+ for (i = 0; i < PCM_N_URBS; i++)
+ kfree(rt->out_urbs[i].buffer);
+ kfree(rt);
+ return ret;
}